11-731 Machine Translation
Speech-to-Speech Translation
Speech Translation
• Three-part systems
  – ASR -> Translation -> TTS
• System configurations
  – One way: phrasal
  – One way: broadcast/lecture
  – 1.5 way: phrasal with limited answers
  – Two way: full two-way dialog
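The three-part cascade can be sketched as plain function composition. This is a minimal illustration, not code from any of the systems in these slides: the stage signatures and the toy stand-ins (`toy_asr`, `toy_mt`, `toy_tts`, which treat "audio" as UTF-8 bytes) are hypothetical placeholders so the example runs.

```python
from typing import Callable

# Hypothetical stage signatures: audio -> text, text -> text, text -> audio.
Recognizer = Callable[[bytes], str]
Translator = Callable[[str], str]
Synthesizer = Callable[[str], bytes]


def make_cascade(asr: Recognizer, mt: Translator, tts: Synthesizer) -> Callable[[bytes], bytes]:
    """Chain ASR -> Translation -> TTS into one speech-to-speech function."""

    def speech_to_speech(audio: bytes) -> bytes:
        source_text = asr(audio)       # source-language speech -> source text
        target_text = mt(source_text)  # source text -> target text
        return tts(target_text)        # target text -> target-language speech

    return speech_to_speech


# Toy stand-ins (hypothetical) so the sketch runs: "audio" is UTF-8 encoded words.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8").lower()


TOY_LEXICON = {"hello": "marhaba"}


def toy_mt(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


s2s = make_cascade(toy_asr, toy_mt, toy_tts)
print(s2s(b"Hello"))  # b'marhaba'
```

Because each stage consumes the previous stage's output, an ASR error flows straight into translation and is then spoken aloud, which is why the Transtac system echoes the ASR result back before translating.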
Machine Translation Technologies
• Phrasal
  – Phrase-to-phrase lookup
• Template
  – Template fillers, fixed translation
• Interlingua
  – Translation into a meaning representation
• Statistical Machine Translation
  – From large collections of parallel text
• Classification-based translation
  – Identify classes and deal directly with them
Simple Translation
• Phrase to phrase
  – Greetings
  – “Do you need medical attention?”
  – Relatively easy to build, but of limited use
• Template translations
  – “The next train leaves at TIME from gate GATE for PLACE”
  – Limited but still useful
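A minimal sketch of both simple approaches, assuming exact-match lookup for phrases and one regular expression per template. The phrasebook entries, the French target strings, and the helper names `phrase_translate` and `template_translate` are illustrative only; slot fillers are copied through unchanged, whereas a real system would often need to reformat them (times, place names).

```python
import re
from typing import Optional

# Phrase to phrase: whole utterances map to fixed translations (illustrative entries).
PHRASEBOOK = {
    "hello": "bonjour",
    "do you need medical attention?": "avez-vous besoin de soins médicaux ?",
}

# Template: a fixed translation with named slots (TIME, GATE, PLACE) filled at run time.
TEMPLATES = [
    (
        re.compile(
            r"the next train leaves at (?P<TIME>\S+) from gate (?P<GATE>\S+) for (?P<PLACE>.+)",
            re.IGNORECASE,
        ),
        "Le prochain train part à {TIME} de la porte {GATE} pour {PLACE}",
    ),
]


def phrase_translate(utterance: str) -> Optional[str]:
    """Exact lookup; returns None when the phrase is not covered."""
    return PHRASEBOOK.get(utterance.strip().lower())


def template_translate(sentence: str) -> Optional[str]:
    """Try each template and copy the slot fillers into its fixed target sentence."""
    for pattern, target in TEMPLATES:
        match = pattern.fullmatch(sentence.strip())
        if match:
            return target.format(**match.groupdict())
    return None


print(template_translate("The next train leaves at 10:15 from gate 4 for Pittsburgh"))
```

Anything outside the phrasebook or the templates gets no translation at all, which is the "limited" part of both approaches.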
SPEECH Translation
• Speech isn’t text
  – Different style; hard to find lots of examples
• Speech isn’t fluent
  – False starts, hesitations, ungrammatical
• ☺ ASR never makes errors
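Disfluencies are commonly cleaned up before the recognized text reaches translation. A minimal rule-based sketch, assuming the ASR output is a list of word tokens; the filler list and the repetition rule are illustrative assumptions, and real false starts ("I want, I need water") need more than this.

```python
from typing import List

# Filled pauses to drop; an illustrative list, not an exhaustive one.
FILLED_PAUSES = {"uh", "um", "er", "ah", "hmm"}


def clean_disfluencies(tokens: List[str]) -> List[str]:
    """Drop filled pauses and immediate word repetitions from an ASR hypothesis."""
    cleaned: List[str] = []
    for token in tokens:
        lowered = token.lower()
        if lowered in FILLED_PAUSES:
            continue  # hesitation: carries no content to translate
        if cleaned and cleaned[-1].lower() == lowered:
            continue  # immediate repetition such as "I I want"
        cleaned.append(token)
    return cleaned


print(" ".join(clean_disfluencies("uh I I want um water".split())))  # I want water
```

The repetition rule is deliberately crude: it would also delete a legitimate "that that" in "I know that that is true".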
One Way: Broadcast
• One speaker
  – Lecturer: can modify the language model
• Multiple speakers
  – May be repeat speakers (news anchor)
  – May have other noises: music, etc.
  – (TV programs)
• Doesn’t need to be real time (maybe)
One Way: “Dialogue”
• Voxtec’s Phraselator
  – One-way communication
  – Recognizes “fixed” phrases
  – Looks up translations
  – *Very* fast deployment for new languages
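Phraselator-style lookup can be sketched as snapping the recognizer output to the nearest phrase in a fixed list and then playing that phrase's stored translation. This uses Python's `difflib.get_close_matches` as a stand-in for whatever matching the device actually does; the phrase list, the `.wav` file names, and the 0.6 cutoff are hypothetical.

```python
import difflib
from typing import Optional

# Fixed phrases mapped to pre-recorded translations (hypothetical file names).
FIXED_PHRASES = {
    "stop the car": "stop_the_car.wav",
    "show me your id": "show_me_your_id.wav",
    "do you need medical attention": "medical_attention.wav",
}


def lookup_phrase(recognized: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the recording for the closest fixed phrase, or None if nothing is close enough."""
    matches = difflib.get_close_matches(recognized.lower(), list(FIXED_PHRASES), n=1, cutoff=cutoff)
    return FIXED_PHRASES[matches[0]] if matches else None


print(lookup_phrase("stop a car"))    # stop_the_car.wav
print(lookup_phrase("good morning"))  # None
```

Restricting output to a fixed list is what makes deployment fast: a new language needs only recorded translations of the same phrases, not a full translation system.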
Two Way: Dialog
• Users can detect their own errors and correct them
• Needs to be real time
• One user may be much more familiar with the system
• How do you teach the other user?
• Typically domain directed
Two Way: Dialog
CMU System: Janus PDA version
• CMU SMT
• Cepstral Synthesis
• Mobile Tech models
• Platform: COTS PDA (iPAQ), VoxTec P2
• Languages: Iraqi/English, Thai/English, Chinese, Japanese, etc.
Speech Technology Issues
• ASR:
  – Disfluencies, dialects, speaking style
  – Unfamiliarity with the system
• TTS:
  – MT output isn’t always fluent
  – TTS says it anyway
  – Can be hard to understand
Speech Technology Issues
• Spoken, not written, languages
  – Arabic vs. Arabic dialects
  – Mixture of languages
  – Politeness levels
  – Gender in speech
Transtac: Two-Way S2S System
• DARPA program, developed for
  – Checkpoints, medical and civil defense
• Requirements
  – Two way
  – Eyes-free (no screen)
  – Portable
  – Usable by real users
Transtac System
• Close-talking microphone
• Optional speech control
• Push-to-talk buttons
• Laptop secured in backpack
• Small, powerful speakers
Transtac System Details
• Two-way system
  – 2 ASR systems: English and Iraqi
  – 2-way statistical translation
  – 2 synthesizers
• Push-to-talk system
  – (Users don’t like “translate everything” mode)
• Echo back the ASR result
  – And then the translation
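One push-to-talk turn, in the order given on this slide (recognize, echo the ASR result, then speak the translation), might look like the sketch below. The callable types, the `"english"`/`"iraqi"` labels, and the toy components are assumptions for illustration; the slide does not describe Transtac's internal interfaces.

```python
from typing import Callable, Dict, List, Tuple

Recognizers = Dict[str, Callable[[bytes], str]]            # one ASR per language
Translators = Dict[Tuple[str, str], Callable[[str], str]]  # one MT per direction
Speaker = Callable[[str, str], None]                       # speak(language, text)


def push_to_talk_turn(
    audio: bytes,
    speaker_lang: str,
    listener_lang: str,
    asr: Recognizers,
    mt: Translators,
    speak: Speaker,
) -> Tuple[str, str]:
    """Handle one button press: recognize, echo back, then translate and speak."""
    hypothesis = asr[speaker_lang](audio)
    speak(speaker_lang, hypothesis)  # echo back so the speaker can catch ASR errors
    translation = mt[(speaker_lang, listener_lang)](hypothesis)
    speak(listener_lang, translation)  # then say the translation to the other party
    return hypothesis, translation


# Hypothetical toy components: "audio" is UTF-8 text and "translation" just tags it.
spoken: List[Tuple[str, str]] = []
recognizers: Recognizers = {
    "english": lambda a: a.decode("utf-8"),
    "iraqi": lambda a: a.decode("utf-8"),
}
translators: Translators = {
    ("english", "iraqi"): lambda t: "[iraqi] " + t,
    ("iraqi", "english"): lambda t: "[english] " + t,
}
push_to_talk_turn(
    b"stop the car", "english", "iraqi", recognizers, translators,
    lambda lang, text: spoken.append((lang, text)),
)
print(spoken)
```

Push-to-talk makes turn boundaries explicit, so each call handles exactly one utterance instead of translating everything the microphone hears.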
Iraqi Language
• Iraqi Arabic is a dialect
  – Most Iraqis write Modern Standard Arabic
  – Most Iraqis do not write their own dialect
• No standardized spelling
  – The Transtac project invented one
  – But Iraqis may not be used to it
• Arabic (MSA and dialects)
  – Short vowels are not written in words
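Because short vowels are normally left unwritten, text that does carry vowel marks (harakat) is often normalized by stripping them so that both spellings map to the same lexicon entry. A minimal sketch, assuming this normalization is wanted; the Unicode range U+064B to U+0652 covers the three tanween marks, fatha, damma, kasra, shadda and sukun (shadda marks gemination rather than a vowel, but is usually dropped with them).

```python
import re

# Arabic diacritics U+064B..U+0652: tanween (3), fatha, damma, kasra, shadda, sukun.
HARAKAT = re.compile("[\u064b-\u0652]")


def strip_short_vowels(text: str) -> str:
    """Remove vowel diacritics so vowelled and unvowelled spellings compare equal."""
    return HARAKAT.sub("", text)


vowelled = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, written with three fathas
print(strip_short_vowels(vowelled) == "\u0643\u062a\u0628")  # True: the bare form ktb
```

The cost is ambiguity: the bare form ktb can be read as kataba ("he wrote") or kutub ("books"), so the recognizer and synthesizer must recover the vowels from the lexicon and context.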
Data for Training
• Collected human-mediated dialogs
  – A human acts as the machine
  – Passed a microphone back and forth
  – Tried to get people not to talk at the same time
  – Large number of collections (over 4 years)
  – 650 thousand sentence pairs
  – Many different speakers
  – Hand transcribed by experts (in Iraqi spelling)
  – Hand translated (source sentences and the interpreter’s)
Iraqi ASR
• Acoustic model from Iraqi data
  – Based on the MSA phone set
  – Models need to be small and fast
  – Discriminative training
  – Speaker-specific adaptation
• Lexicon
  – Based on the LDC-provided lexicon
  – Multiple pronunciations/typos still a problem
  – Statistically trained LTS rules
• Language model
  – Trained on Iraqi input (and translated output)
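The lexicon-plus-LTS arrangement can be sketched as: look the word up, and only if it is missing, predict a pronunciation from its letters. The statistical part here is deliberately tiny: a majority vote over the phones seen for each letter in a one-letter context window, backing off to the letter alone. It assumes the training words are already aligned one phone (or `_` for a silent letter) per letter; the lexicon entries and English training words are hypothetical, and real LTS training (for example, decision trees over wider letter contexts) is more involved.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical lexicon entries; a word may have several pronunciations.
LEXICON: Dict[str, List[List[str]]] = {
    "data": [["d", "ey", "t", "ax"], ["d", "ae", "t", "ax"]],
}

# Hypothetical letter-aligned training data: one phone per letter, "_" for silent.
ALIGNED = [
    ("cat", ["k", "ae", "t"]),
    ("cab", ["k", "ae", "b"]),
    ("bat", ["b", "ae", "t"]),
    ("tab", ["t", "ae", "b"]),
    ("bake", ["b", "ey", "k", "_"]),
]


def train_lts(aligned: List[Tuple[str, List[str]]]):
    """Count phones per (previous letter, letter, next letter) and per letter alone."""
    in_context: Dict[Tuple[str, str, str], Counter] = defaultdict(Counter)
    by_letter: Dict[str, Counter] = defaultdict(Counter)
    for word, phones in aligned:
        padded = "#" + word + "#"  # '#' marks the word boundary
        for i, phone in enumerate(phones):
            in_context[(padded[i], word[i], padded[i + 2])][phone] += 1
            by_letter[word[i]][phone] += 1
    return in_context, by_letter


def predict_phones(word: str, model) -> List[str]:
    """Most frequent phone per letter, using context if seen, else the letter alone."""
    in_context, by_letter = model
    padded = "#" + word + "#"
    phones: List[str] = []
    for i, letter in enumerate(word):
        counts = in_context.get((padded[i], letter, padded[i + 2])) or by_letter.get(letter)
        if not counts:
            continue  # letter never seen in training: no prediction
        phone = counts.most_common(1)[0][0]
        if phone != "_":
            phones.append(phone)
    return phones


MODEL = train_lts(ALIGNED)


def pronounce(word: str) -> List[List[str]]:
    """All lexicon pronunciations, or a single LTS prediction for unknown words."""
    word = word.lower()
    return LEXICON.get(word) or [predict_phones(word, MODEL)]


print(pronounce("data"))  # both listed variants
print(pronounce("bab"))   # [['b', 'ae', 'b']] via LTS
```

Returning every lexicon variant lets the recognizer accept either pronunciation, which is also where typo-induced or duplicate entries in the lexicon start to hurt.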
English ASR
• Acoustic model
  – Originally used other models
  – Then trained from collected data
  – (Mostly military personnel)
• Lexicon
  – Existing lexicon, but needed to add military speak: MRAP, IED
• Language model
  – Trained from the data provided
  – Trained from “similar” data found on the web
  – Trained from hand-created “typical” examples
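The slide lists three sources of language-model text but not how they were combined. One standard option is linear interpolation, P(w) = sum over k of lambda_k * P_k(w), with the lambdas summing to 1. The sketch applies it to add-one-smoothed unigram models to keep it short; the sentences and the weights are made up, in practice the weights would be tuned on held-out data, and a real model would use n-grams rather than unigrams.

```python
from collections import Counter
from typing import Dict, List, Set


def unigram_model(sentences: List[str], vocab: Set[str]) -> Dict[str, float]:
    """Add-one smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def interpolate(models: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Linear interpolation: P(w) = sum_k weight_k * P_k(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in models[0]}


# Made-up examples standing in for the three sources named on the slide.
collected = ["stop the vehicle", "show me your id"]
web = ["the vehicle was stopped at the checkpoint"]
hand_written = ["is there an ied on this road", "stop the mrap"]

sources = (collected, web, hand_written)
vocab = {w for src in sources for s in src for w in s.lower().split()}
models = [unigram_model(src, vocab) for src in sources]
mixed = interpolate(models, [0.6, 0.1, 0.3])  # hypothetical weights
print(round(sum(mixed.values()), 6))  # 1.0
```

Interpolation lets the small in-domain collection dominate while the web and hand-written text still give nonzero probability to words such as "checkpoint" or "mrap" that the collected dialogs may lack.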
TTS
• Standard English TTS
  – Appropriate “command” voice
  – Unit selection
  – Added lots of military vocabulary
• Iraqi TTS
  – Recorded from an Iraqi radio announcer
  – Based on example sentences in the domain
  – LDC lexicon and LTS rules (same as ASR)
  – Hand tuned
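Unit selection picks, for each target sound, one recorded unit from the database so that the sum of target costs (how well a unit fits what is wanted) and join costs (how smoothly neighboring units connect) is minimal; this is a Viterbi search. A minimal sketch under heavy simplifying assumptions: units are whole phones, the only feature is average pitch, and the cost functions and toy database are hypothetical. A production voice uses many more features and a far larger inventory.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Unit:
    phone: str
    pitch: float  # average F0 in Hz (the only feature in this toy)
    utt: str      # recording the unit was cut from
    index: int    # position of the unit in that recording


def target_cost(unit: Unit, wanted_pitch: float) -> float:
    # Hypothetical cost: pitch mismatch, scaled down.
    return abs(unit.pitch - wanted_pitch) / 10.0


def join_cost(left: Unit, right: Unit) -> float:
    if left.utt == right.utt and right.index == left.index + 1:
        return 0.0  # adjacent in the original recording: no audible join
    return 1.0 + abs(left.pitch - right.pitch) / 10.0


def select_units(targets: List[Tuple[str, float]], db: Dict[str, List[Unit]]) -> List[Unit]:
    """Viterbi search for the unit sequence with the lowest total cost."""
    candidates = [db[phone] for phone, _ in targets]
    cost = [[target_cost(u, targets[0][1]) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for unit in candidates[i]:
            via = [cost[i - 1][k] + join_cost(prev, unit) for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(via)), key=via.__getitem__)
            row.append(via[k_best] + target_cost(unit, targets[i][1]))
            ptr.append(k_best)
        cost.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path: List[Unit] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]


db = {
    "h": [Unit("h", 120.0, "utt1", 0), Unit("h", 100.0, "utt2", 5)],
    "ay": [Unit("ay", 118.0, "utt1", 1), Unit("ay", 100.0, "utt3", 2)],
}
chosen = select_units([("h", 110.0), ("ay", 110.0)], db)
print([u.utt for u in chosen])  # ['utt1', 'utt1']
```

Here both chosen units come from `utt1`: the pair joins for free, so the search prefers it to mixing recordings. Domain-specific recordings (military vocabulary, example sentences from the domain) pay off because they supply exactly such long contiguous stretches.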
S2S Interface Issues
• How do you teach people to use the system?
  – “Transtac say instructions”
  – Not really sufficient
• How can you tell it translated correctly?
  – Give (speech) feedback
  – Backtranslation
  – ASR echo back
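Backtranslation translates the output back into the source language and compares the result with what the user said; low similarity suggests the forward translation may be wrong. A minimal sketch, assuming word-level similarity from `difflib.SequenceMatcher` and a 0.6 threshold, both arbitrary choices; the toy translators and the placeholder target strings `T1` and `T2` are hypothetical.

```python
import difflib
from typing import Callable, Tuple


def backtranslation_check(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
    threshold: float = 0.6,
) -> Tuple[str, str, bool]:
    """Return (translation, round trip, looks_ok) for one utterance."""
    translation = forward(source)
    round_trip = backward(translation)
    similarity = difflib.SequenceMatcher(
        None, source.lower().split(), round_trip.lower().split()
    ).ratio()
    return translation, round_trip, similarity >= threshold


# Toy translators (hypothetical): target-language strings are opaque placeholders.
FORWARD = {"where is the hospital": "T1", "when does the bus leave": "T2"}
BACKWARD = {"T1": "where is hospital", "T2": "what is the price"}

print(backtranslation_check("where is the hospital", FORWARD.__getitem__, BACKWARD.__getitem__))
# ('T1', 'where is hospital', True)
print(backtranslation_check("when does the bus leave", FORWARD.__getitem__, BACKWARD.__getitem__))
# ('T2', 'what is the price', False)
```

A round trip can also look fine while the forward translation is wrong (two errors can cancel), so this is a warning signal for the user, not a guarantee.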
S2S Interface Issues
• How do you translate names?
  – A correct translation/transliteration is hard to understand
• Mark names in translations
  – “My name is … Abdullah”
  – “He lives on … al-Aqar … street”
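The examples above put an audible pause (“…”) around a name so the listener hears it as a name rather than as an unfamiliar word. A minimal sketch, assuming the names in the translation are already known (for example, copied through from the source by the MT system) and that the TTS front end renders the `…` marker as a pause; an SSML `<break/>` element would be an alternative. `mark_names` is a hypothetical helper.

```python
import re
from typing import List


def mark_names(text: str, names: List[str], marker: str = "…") -> str:
    """Surround each known name with a pause marker, as in the examples above."""
    if not names:
        return text
    # Longest names first so "Ali Hassan" wins over "Ali"; a single pass avoids double marking.
    alternatives = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    pattern = re.compile(r"(?<!\S)(" + alternatives + r")(?!\S)")
    marked = pattern.sub(lambda m: f"{marker} {m.group(1)} {marker}", text)
    # A pause after the last word adds nothing, so drop a trailing marker.
    return re.sub(r"\s*" + re.escape(marker) + r"\s*$", "", marked)


print(mark_names("My name is Abdullah", ["Abdullah"]))        # My name is … Abdullah
print(mark_names("He lives on al-Aqar street", ["al-Aqar"]))  # He lives on … al-Aqar … street
```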
Recommend
More recommend