
Speech Processing 15-492/18-492 Speech Translation Case Study: Transtac



  1. Speech Processing 15-492/18-492 Speech Translation Case study: Transtac Details

  2. Transtac: Two-Way S2S System
     - Developed by DARPA for
       - Checkpoints, medical and civil defense
     - Requirements
       - Two-way
       - Eyes-free (no screen)
       - Portable
       - Usable by real users

  3. Transtac System
     - Close-talking microphone
     - Optional speech control
     - Push-to-talk buttons
     - Laptop secured in backpack
     - Small, powerful speakers

  4. Transtac System Details
     - Two-way system
       - 2 ASR systems: English and Iraqi
       - 2-way statistical translation
       - 2 synthesizers
     - Push-to-talk system
       - (Users don't like "translate everything" mode)
     - Echo back ASR result
       - And then the translation
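
The push-to-talk flow on this slide (recognize, echo the ASR result back to the speaker, then speak the translation) can be sketched as follows. This is a minimal illustration under stated assumptions, not Transtac's code: the class `TwoWayS2S`, the side labels `"en"`/`"iq"`, and the use of plain callables for ASR and MT are hypothetical, and the two synthesizers are replaced by a log of what would be spoken.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: ASR maps audio to text, MT maps text to text.
ASR = Callable[[bytes], str]
MT = Callable[[str], str]


@dataclass
class TwoWayS2S:
    asr: Dict[str, ASR]              # one recognizer per side, keyed "en" / "iq"
    mt: Dict[Tuple[str, str], MT]    # one translator per direction
    spoken: List[Tuple[str, str]] = field(default_factory=list)

    def say(self, lang: str, text: str) -> None:
        # Stand-in for the per-language synthesizer: record what would be spoken.
        self.spoken.append((lang, text))

    def push_to_talk(self, speaker_lang: str, audio: bytes) -> str:
        """Handle one button press: recognize, echo back, then translate."""
        other = "iq" if speaker_lang == "en" else "en"
        hyp = self.asr[speaker_lang](audio)
        self.say(speaker_lang, hyp)          # echo the ASR result to the speaker
        translation = self.mt[(speaker_lang, other)](hyp)
        self.say(other, translation)         # then speak the translation
        return translation
```

Speaking the recognized text before the translation gives the speaker a chance to notice a recognition error before the other party hears a wrong translation.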

  5. Iraqi Language
     - Iraqi Arabic is a dialect
       - Most Iraqis write Modern Standard Arabic (MSA)
       - Most Iraqis do not write their own dialect
     - No standardized spelling
       - The Transtac project invented one
       - But Iraqis may not be used to it
     - Arabic (MSA and dialects)
       - Does not write short vowels in words
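
Because short vowels are normally left unwritten, one written Arabic word can stand for several pronunciations, and any fully vowelled text has to be reduced to the unvowelled form people actually type before it is matched against a lexicon. A minimal normalization sketch, assuming Unicode input; the function name is hypothetical, and the mark range used (U+064B to U+0652, plus superscript alef U+0670) covers tanween, the short vowels, shadda and sukun, not every Arabic diacritic.

```python
import re

# Tanween, fatha, damma, kasra, shadda and sukun (U+064B..U+0652)
# plus superscript (dagger) alef (U+0670).
_MARKS = re.compile("[\u064B-\u0652\u0670]")


def strip_short_vowels(text: str) -> str:
    """Reduce vowelled Arabic text to the unvowelled form normally written."""
    return _MARKS.sub("", text)
```

For example, كَتَبَ ("he wrote") and كُتُب ("books") both reduce to كتب, which is exactly the ambiguity the ASR lexicon and TTS have to resolve.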

  6. Data for Training
     - Collected human-mediated dialogs
       - Human acts as the machine
       - Passed a microphone back and forth
       - Try to get people not to talk at the same time
       - Large number of collections (over 4 years)
       - 650 thousand sentence pairs
       - Many different speakers
       - Hand transcribed by experts (in Iraqi spelling)
       - Hand translated (source sentences and interpreter's)

  7. Iraqi ASR
     - Acoustic model from Iraqi data
       - Based on MSA phoneset
       - Models need to be small and fast
       - Discriminative training
       - Speaker-specific adaptation
     - Lexicon
       - Based on LDC-provided lexicon
       - Multiple pronunciations/typos still a problem
       - Statistically trained letter-to-sound (LTS) rules
     - Language model
       - Trained on Iraqi input (and translated output)
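
The slide says only that the LTS rules were statistically trained. One simple way to train such rules, shown here as a hedged sketch rather than the Transtac method, is to count which phone each letter produces in its immediate letter context and back off to the letter alone for unseen contexts. Systems such as Festival typically learn decision trees over a wider letter window instead. This version assumes the training words are already aligned one phone (or `_` for silent) per letter; all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Context = Tuple[str, str, str]  # (previous letter, letter, next letter)


def contexts(word: str) -> List[Context]:
    padded = "#" + word + "#"  # '#' marks the word boundaries
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def train_lts(aligned: Sequence[Tuple[str, Sequence[str]]]):
    """aligned: (word, phones) pairs with exactly one phone, or '_' for silent, per letter."""
    full: Dict[Context, Counter] = defaultdict(Counter)
    single: Dict[str, Counter] = defaultdict(Counter)
    for word, phones in aligned:
        if len(word) != len(phones):
            raise ValueError(f"{word!r} is not aligned one phone per letter")
        for ctx, ph in zip(contexts(word), phones):
            full[ctx][ph] += 1       # phone counts in the full 3-letter context
            single[ctx[1]][ph] += 1  # phone counts for the letter alone
    return full, single


def predict(model, word: str) -> List[str]:
    full, single = model
    phones = []
    for ctx in contexts(word):
        table = full.get(ctx) or single.get(ctx[1])  # back off to the letter alone
        if table:  # letters never seen in training produce nothing
            best = table.most_common(1)[0][0]
            if best != "_":
                phones.append(best)
    return phones
```

Since phones are arbitrary strings, the same code could in principle map an unvowelled Arabic letter to a consonant plus its unwritten vowel (e.g. an output like `"k a"`), provided the alignment step produces such units.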

  8. English ASR
     - Acoustic model
       - Originally used other models
       - Then trained from collected data
         - (Mostly military personnel)
     - Lexicon
       - Existing lexicon, but needed to add military speak: MRAP, IED
     - Language model
       - Trained from data provided
       - Trained from "similar" data found on the web
       - Trained from hand-created "typical" examples
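
Training the language model from several sources of different quality (collected data, web text, hand-written examples) is commonly handled by building one model per source and linearly interpolating them. The sketch below assumes add-k smoothed bigrams over a shared vocabulary and hand-picked mixture weights; in practice the weights would be tuned on held-out in-domain data. `BigramLM`, `mixture_prob`, and the choice of bigrams are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence, Set


class BigramLM:
    """Add-k smoothed bigram model over whitespace tokens with <s> and </s>."""

    def __init__(self, sentences: Sequence[str], vocab: Set[str], k: float = 0.1):
        self.k = k
        self.size = len(vocab) + 1  # predictable outcomes: vocab words plus </s>
        self.history = Counter()
        self.pairs = Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.history.update(toks[:-1])          # counts of each token as a history
            self.pairs.update(zip(toks, toks[1:]))  # counts of adjacent token pairs

    def prob(self, prev: str, word: str) -> float:
        return (self.pairs[(prev, word)] + self.k) / (self.history[prev] + self.k * self.size)


def mixture_prob(models: Sequence[BigramLM], weights: Sequence[float], prev: str, word: str) -> float:
    """Linear interpolation; weights are assumed non-negative and to sum to 1."""
    return sum(w * m.prob(prev, word) for m, w in zip(models, weights))
```

The vocabulary must be the union over all sources, otherwise the per-source distributions are not defined over the same outcomes and the mixture no longer sums to one.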

  9. TTS
     - Standard English TTS
       - Appropriate "command" voice
       - Unit selection
       - Added lots of military vocabulary
     - Iraqi TTS
       - Recorded from an Iraqi radio announcer
       - Based on example sentences in the domain
       - LDC lexicon and LTS rules (same as ASR)
       - Hand tuned
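
Unit selection picks one recorded unit per target position so that the sum of target costs (how well a unit matches what should be said) and join costs (how smoothly adjacent units concatenate) is minimal, which is a Viterbi search. A minimal generic sketch; the cost functions, the unit representation, and the name `select_units` are placeholders, not those of any particular synthesizer.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target specification (e.g. phone plus prosody)
U = TypeVar("U")  # candidate unit from the recorded database


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[float, List[U]]:
    """Viterbi search for the unit sequence with minimal total target + join cost.

    Assumes at least one target and at least one candidate per target.
    """
    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            prev_cost, back = min(
                (best[i - 1][j][0] + join_cost(p, u), j)
                for j, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(targets[i], u), back))
        best.append(row)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return total, path[::-1]
```

Giving a zero join cost to units that were adjacent in the same recording is what makes the search favor long contiguous stretches of natural speech, which is why in-domain example sentences help.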

  10. S2S Interface Issues
      - How do you teach people to use the system?
        - "Transtac, say instructions"
        - Not really sufficient
      - How can you tell it translated correctly?
        - Give (speech) feedback
          - Backtranslation
          - ASR echo back

  11. S2S Interface Issues
      - How do you translate names?
        - A correct translation/transliteration is hard to understand
      - Mark names in translations
        - "My name is … Abdullah"
        - "He lives on … al-Aqar … street"
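
Marking names can be as simple as inserting a pause marker before a name, and after it when more words follow, so the listener hears the name set off from the rest of the sentence. A sketch assuming name spans are already known as token index ranges (for example from the MT system); the function name and the use of `…` as a marker that the TTS front end renders as a pause are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def mark_names(tokens: Sequence[str], spans: Sequence[Tuple[int, int]], marker: str = "…") -> str:
    """Insert a pause marker before each name span [start, end) and after it if words follow."""
    boundaries = {s for s, _ in spans} | {e for _, e in spans}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i > 0 and i in boundaries:  # no marker at the very start of the sentence
            out.append(marker)
        out.append(tok)
    return " ".join(out)
```

With the span `(3, 4)`, "He lives on al-Aqar street" becomes "He lives on … al-Aqar … street", matching the slide's example.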

  12. S2S Evaluation (Transtac)
      - Offline tests
        - ASR->Text and Text->Text
        - Compare to translation references
        - WER and "BLEU" score
      - Online tests
        - Concept transfer (through defined scenarios)
        - Speed (number of concepts per minute)
        - (English speech masking)
      - Utility tests
        - Does it really work?
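
WER in the offline ASR tests is the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, summed over the test set and divided by the total number of reference words. A minimal sketch; the function names are hypothetical, tokenization is plain whitespace splitting, and BLEU is not reimplemented here.

```python
from typing import Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (or match)
        prev = cur
    return prev[-1]


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)
```

For instance, "open a trunk now" against the reference "open the trunk" has one substitution and one insertion, so 2 errors over 3 reference words.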

  13. Transtac Participants
      - Developer groups
        - IBM
        - SRI
        - BBN
        - CMU
        - USC
      - Evaluations
        - Twice a year in Iraqi Arabic (somewhere in DC)
        - One surprise language (Farsi, Bahasa Malay)
        - Other evaluations with military groups

  14. Does it work?
      - Yes, mostly
        - 27 concepts out of 30-ish turns
      - Systems are mostly similar
        - But some are better than others
      - Other techniques
        - Belt/holster-based PC with handheld speaker
        - Small PC in pouch
        - Chest-mounted array microphone

  15. S2S ASR Advanced Issues
      - Tight coupling
        - ASR should output N-best
        - Translate all (lattice)
        - Choose best translation
        - (MT as an LM for ASR)
      - Remove disfluencies/hesitations
      - Add more relevant data
        - Automatically convert past-tense/third-person data to present-tense/first+second-person …
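
Tight coupling can be approximated with an N-best list standing in for the lattice: translate every ASR hypothesis and keep the pair with the best weighted sum of ASR and MT log scores, so the translation model effectively acts as an extra language model for recognition. A hedged sketch; the `translate` callable, the log-score scale, and the weights are assumptions, and the weights would normally be tuned on development data.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Scored = Tuple[str, float]  # (text, log score)


def translate_nbest(
    nbest: Sequence[Scored],
    translate: Callable[[str], List[Scored]],
    asr_weight: float = 1.0,
    mt_weight: float = 1.0,
) -> Optional[Tuple[str, str, float]]:
    """Translate every ASR hypothesis; return (source, translation, combined score) of the best pair."""
    best: Optional[Tuple[str, str, float]] = None
    for src, asr_logp in nbest:
        for tgt, mt_logp in translate(src):
            score = asr_weight * asr_logp + mt_weight * mt_logp
            if best is None or score > best[2]:
                best = (src, tgt, score)
    return best
```

A lower-ranked recognition hypothesis can win here when its translation scores much better than that of the first-best, which is the sense in which MT acts as an LM for ASR.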

  16. S2S TTS Advanced Issues
      - MT output isn't grammatical
        - TTS doesn't care and just says it
        - TTS should try to say MT output with more breaks
      - TTS (unit selection)
        - As an LM on MT output
        - Choose the translation that can be said best
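
One way to let the synthesizer act as a language model on MT output is to estimate how well each candidate translation can be spoken from the recorded database and subtract that as a penalty from the MT score. The sketch below uses a deliberately crude, hypothetical word-level proxy: each word never recorded costs 5, and each adjacent word pair that never occurs together in the recorded prompts (so would need a join) costs 1. Real unit selection works on smaller units with acoustic costs; the names, costs, and weight here are illustrative only.

```python
from typing import Sequence, Set, Tuple


def recorded_pairs(prompts: Sequence[str]) -> Set[Tuple[str, str]]:
    """Adjacent word pairs that occur in the recorded prompts."""
    pairs: Set[Tuple[str, str]] = set()
    for p in prompts:
        words = p.split()
        pairs.update(zip(words, words[1:]))
    return pairs


def speakability_cost(sentence: str, pairs: Set[Tuple[str, str]], vocab: Set[str]) -> float:
    """Hypothetical proxy: unrecorded words cost 5, each non-contiguous join costs 1."""
    words = sentence.split()
    unrecorded = sum(w not in vocab for w in words)
    joins = sum(pair not in pairs for pair in zip(words, words[1:]))
    return 5.0 * unrecorded + 1.0 * joins


def pick_for_tts(candidates: Sequence[Tuple[str, float]], prompts: Sequence[str], tts_weight: float = 1.0) -> str:
    """Choose the MT candidate (text, log score) with the best MT score minus weighted TTS cost."""
    pairs = recorded_pairs(prompts)
    vocab = {w for p in prompts for w in p.split()}
    best_text, _ = max(candidates, key=lambda c: c[1] - tts_weight * speakability_cost(c[0], pairs, vocab))
    return best_text
```

Word orders that the voice talent actually read tend to win under this proxy, which also pushes selection away from scrambled, ungrammatical MT candidates.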
