Speech Processing 15-492/18-492 Speech Translation

Speech Translation
- Three part systems
  - ASR -> Translation -> TTS
- System configurations
  - One way: phrasal
  - One way: broadcast/lecture
  - 1.5 way: phrasal with limited answers
  - Two way: full two way
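The three part system above can be sketched as a simple chain of functions. This is only an illustration: the class name `SpeechTranslator` and the three stand-in functions are hypothetical, and a real system would plug an actual recognizer, translation engine and synthesizer into the same slots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """One-way pipeline: ASR -> Translation -> TTS (hypothetical interface)."""
    recognize: Callable[[bytes], str]   # ASR: audio -> source-language text
    translate: Callable[[str], str]     # MT: source text -> target text
    synthesize: Callable[[str], bytes]  # TTS: target text -> audio

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions): "audio" is just UTF-8 text here.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    # Unknown input is passed through unchanged.
    return {"hello": "hola"}.get(text, text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


one_way = SpeechTranslator(fake_asr, fake_mt, fake_tts)
result = one_way.run(b"hello")
```

A two way system could be approximated by holding two such pipelines, one per direction. Errors made by an earlier stage are passed straight to the next one, which is why the ASR and TTS issues on later slides matter.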

Machine Translation Technologies
- Phrasal
  - Phrase to phrase look up
- Template
  - Template fillers, fixed translation
- Interlingua
  - Translation into meaning representation
- Statistical Machine Translation
  - From a large collection of parallel text
- Classification-based translation
  - Identify classes and deal directly with them

Choices in Translation
- Choose any two …
  - High accuracy
  - Large vocabulary
  - Fully automatic
- Speech vs Text
  - Speech less clear than text
  - Less speech to train from
  - Needs to be real-time (probably)

Simple Translation
- Phrase to Phrase
  - Greetings
  - Do you need medical attention?
  - Relatively easy to build, but limited use
- Template translations
  - The next train leaves at TIME from gate GATE for PLACE
- Limited but still useful
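Both simple approaches fit in a few lines. The sketch below assumes a made-up English-to-Spanish phrase table and a simplified two-slot version of the train template (TIME and GATE only). The Spanish strings and the function name `translate` are illustrative assumptions, not taken from the lecture.

```python
import re
from typing import Optional

# Hypothetical phrase table: whole utterance in, whole utterance out.
PHRASES = {
    "hello": "hola",
    "do you need medical attention?": "¿necesita atención médica?",
}

# Template: the wording is fixed, only the slot fillers vary.
TEMPLATE = re.compile(
    r"the next train leaves at (?P<time>\S+) from gate (?P<gate>\S+)"
)
TARGET = "el próximo tren sale a las {time} de la puerta {gate}"


def translate(utterance: str) -> Optional[str]:
    text = utterance.strip().lower()
    if text in PHRASES:                   # phrase-to-phrase look up
        return PHRASES[text]
    match = TEMPLATE.fullmatch(text)      # template with slot fillers
    if match:
        return TARGET.format(**match.groupdict())
    return None                           # anything else is out of coverage
```

Calling `translate("where is the bank")` returns `None`, which shows the "limited" part: anything outside the table and the templates simply cannot be translated.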

Interlingua
- Translate sentences into standard form
- Generate sentences from standard form
- PROS:
  - Can do multiple languages easily
  - Can be very accurate
- CONS:
  - Designing universal interlingua is very hard
  - Doesn’t do well when out of domain
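A toy illustration of the analyze-then-generate idea follows. The frame format (a dictionary with `act` and `object` keys), the single supported question pattern and the tiny Spanish and German lexicons are all assumptions made for this sketch.

```python
from typing import Dict, Optional

Frame = Dict[str, str]  # hypothetical "standard form" (meaning representation)


def analyze_english(sentence: str) -> Optional[Frame]:
    """Map an English sentence to a frame; only 'Where is the X?' is covered."""
    words = sentence.lower().rstrip("?.!").split()
    if len(words) == 4 and words[:3] == ["where", "is", "the"]:
        return {"act": "ask_location", "object": words[3]}
    return None  # out of domain


# One generator table per target language, independent of the source language.
LEXICON = {
    "es": {"station": "la estación", "hospital": "el hospital"},
    "de": {"station": "der Bahnhof", "hospital": "das Krankenhaus"},
}
PATTERNS = {
    "es": {"ask_location": "¿Dónde está {object}?"},
    "de": {"ask_location": "Wo ist {object}?"},
}


def generate(frame: Frame, lang: str) -> str:
    obj = LEXICON[lang][frame["object"]]
    return PATTERNS[lang][frame["act"]].format(object=obj)


frame = analyze_english("Where is the station?")
spanish = generate(frame, "es")
german = generate(frame, "de")
```

Adding a language needs only one new analyzer or generator rather than a system per language pair, which is the "multiple languages easily" advantage. A sentence outside the covered pattern returns `None`, which is the out-of-domain weakness.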

Statistical Machine Translation
- Build probabilistic models from parallel text
- Parallel text often available from
  - Bilingual organizations
    - Governments, UN
  - Relatively easy to collect
    - Requires translators rather than MT experts
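One classic way to build a probabilistic model from parallel text is to estimate word translation probabilities with expectation-maximization, in the style of IBM Model 1. The slides do not name a specific model, so treat this as one possible illustration. The three-sentence corpus and the function names are made up, and the NULL word and any smoothing are left out for brevity.

```python
# Tiny made-up parallel corpus: (English words, German words).
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]


def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] ~ P(target word f | source word e) by EM."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Uniform start over every word pair that co-occurs in some sentence pair.
    t = {(f, e): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for f in tgt for e in src}
    for _ in range(iterations):
        count = {}   # expected count of f aligned to e
        total = {}   # expected count of e aligned to anything
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm          # E-step: soft alignment
                    count[(f, e)] = count.get((f, e), 0.0) + frac
                    total[e] = total.get(e, 0.0) + frac
        for (f, e), c in count.items():              # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t


def best_translation(t, e):
    """Most probable target word for source word e."""
    candidates = {f: p for (f, src_word), p in t.items() if src_word == e}
    return max(candidates, key=candidates.get)


table = train_model1(corpus)
```

No dictionary is supplied at any point. The pairing of "house" with "haus" emerges only from which words co-occur across sentence pairs, which is why data collection needs translators rather than MT experts.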

Learning from Parallel Text

Statistical Machine Translation
- PROS:
  - Data collection doesn’t require MT experts
  - Data driven
  - Degrades gracefully when out of domain
- CONS:
  - Needs all language pairs
  - Needs good/lots of data
  - Hard to fix specific errors

SPEECH Translation
- Speech isn’t text
  - Different style, hard to find lots of examples
- Speech isn’t fluent
  - False starts, hesitations, ungrammatical
- ☺ ASR never makes errors ☺
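Because a translation model trained on written text has rarely seen hesitations or repeated words, a common workaround is to clean the ASR output before translating it. The filter below is a deliberately naive sketch: the filler list and the function name `clean_disfluencies` are assumptions, and it will also remove legitimate repetitions such as "that that".

```python
# Hypothetical list of filled pauses to drop.
FILLERS = {"uh", "um", "er", "erm"}


def clean_disfluencies(tokens):
    """Drop filled pauses and immediate word repetitions from ASR output."""
    cleaned = []
    for tok in tokens:
        word = tok.lower()
        if word in FILLERS:
            continue                      # hesitation: "uh", "um", ...
        if cleaned and cleaned[-1].lower() == word:
            continue                      # simple false start: "want want"
        cleaned.append(tok)
    return cleaned


example = clean_disfluencies("I uh I want want to um go".split())
```

Here `example` is `["I", "want", "to", "go"]`. Longer false starts ("I want to, I need to go") are not handled by a rule this simple.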

One Way: Broadcast
- One speaker
  - Lecturer: can modify language model
- Multiple speakers
  - May be repeat speakers (News Anchor)
  - May have other noises: music etc. (TV programs)
- Doesn’t need to be real time (maybe)

Two Way: Dialog
- Users can detect own errors and correct
- Needs to be real time
- One user may be much more familiar
- How do you teach the other user
- Typically domain directed

Speech Technology Issues
- ASR:
  - Disfluencies, dialects, speaking style
  - Unfamiliarity with system
- TTS:
  - MT output isn’t always fluent
  - TTS says it anyway
  - Can be hard to understand

Speech Technology Issues
- Spoken not Written Languages
  - Arabic vs Arabic Dialects
  - Mixture of languages
  - Politeness levels
  - Gender in speech
