SDS Applications - Speech-to-speech Translation
Anca Burducea
May 28, 2015
S2S Translation

Three independent tasks: S_s → T_s → T_t → S_t

S_s = speech source
T_s = text source
T_t = text target
S_t = speech target
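The cascade above is plain function composition. A minimal sketch, with hypothetical `asr`/`mt`/`tts` stubs standing in for the real components:

```python
# Hypothetical sketch of the cascaded S2S pipeline; each stage is a
# stub standing in for a real ASR, MT, or TTS component.
def asr(speech_source):        # S_s -> T_s (speech recognition)
    return "wo ist das nächste hotel"

def mt(text_source):           # T_s -> T_t (machine translation)
    return "where is the nearest hotel"

def tts(text_target):          # T_t -> S_t (speech synthesis)
    return f"<audio: {text_target}>"

def s2s_translate(speech_source):
    # The three tasks are independent: each consumes only the
    # previous stage's output, which is why errors propagate.
    return tts(mt(asr(speech_source)))

print(s2s_translate("<audio: wo ist das nächste hotel>"))
```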
S2S Translation

S_s → T_s = ASR – evaluated with WER
T_s → T_t = MT – evaluated with BLEU
T_t → S_t = TTS – evaluated with subjective listening tests

Example: "Wo ist das nächste Hotel?" → "Where is the nearest hotel?"
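WER, the ASR metric above, is word-level edit distance normalized by reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions)
    divided by reference length, via Levenshtein distance over words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

# One deletion ("the") and one substitution ("hotel" -> "hotels"): 2/5
print(wer("where is the nearest hotel", "where is nearest hotels"))  # 0.4
```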
S2S Translation - Issues

◮ error propagation
◮ context is not used in the downstream processes
Annotations of Speech

Much contextual annotation is available for speech:
◮ dialog act (DA) tags
◮ semantic annotation
◮ pitch prominence
◮ emphasis
◮ contrast
◮ emotion
◮ speaker segmentation
Sridhar 2013

Enrich S2S translation with contextual information:
◮ DA tags
◮ prosodic word prominence

Purpose:
◮ resolve ambiguities, e.g.
  ◮ wir haben noch → we still have
  ◮ wir haben noch → we have another
◮ enrich target speech with prosody (intonation, emotion) from the source speech
Sridhar 2013

S_s = speech source
T_s = text source
T_t = text target
S_t = speech target
L_s = enriched source = text source + context labels
L_t = enriched target = text target + context labels
Sridhar 2013 - Data

◮ train a MaxEnt classifier for
  ◮ DA tagging: statement, acknowledgment, abandoned, agreement, question, appreciation, other – 82.9% accuracy
  ◮ prosodic prominence: accent, no-accent – 78.5% accuracy
◮ tested on three parallel corpora: Farsi-English, Japanese-English, Chinese-English
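A MaxEnt classifier is multinomial logistic regression over features of the utterance. A toy from-scratch sketch of a DA tagger; the training data, tag set, and bag-of-words features here are illustrative only (the paper trained on large annotated corpora with richer lexical and prosodic features):

```python
import math
from collections import defaultdict

# Illustrative toy data -- not the paper's corpus or full tag set.
TRAIN = [
    ("where is the nearest hotel", "question"),
    ("what time does it open", "question"),
    ("the meeting is at noon", "statement"),
    ("we leave on friday", "statement"),
    ("okay sounds good", "acknowledgment"),
    ("right okay", "acknowledgment"),
]
TAGS = sorted({tag for _, tag in TRAIN})

def features(utterance):
    # Bag-of-words features; a real tagger would add prosodic/contextual ones.
    return utterance.lower().split()

weights = defaultdict(float)   # one weight per (tag, word) pair

def probs(words):
    scores = {t: sum(weights[(t, w)] for w in words) for t in TAGS}
    z = sum(math.exp(v) for v in scores.values())
    return {t: math.exp(v) / z for t, v in scores.items()}

# Gradient ascent on the conditional log-likelihood (observed - expected).
for _ in range(200):
    for utt, gold in TRAIN:
        words = features(utt)
        p = probs(words)
        for t in TAGS:
            grad = (1.0 if t == gold else 0.0) - p[t]
            for w in words:
                weights[(t, w)] += 0.1 * grad

def predict(utterance):
    p = probs(features(utterance))
    return max(p, key=p.get)

print(predict("where is the station"))
```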
Sridhar 2013

Improve the translation model using source-language enrichment:
◮ bag-of-words model
◮ reorder words according to the target language model

Improve the translation model using target-language enrichment:
◮ factored model: each word is translated into a (word, pitch accent) pair
Sridhar 2013 - Results

DA tags:
◮ question (YN, WH, open), acknowledgment → significant improvement
◮ statement → no significant improvement

Prosody:
◮ improved prosodic accuracy of the target speech
◮ lexical selection accuracy not affected (same BLEU)

Conclusion: "the real benefits of such a scheme would be manifested through human evaluations. We are currently working on conducting subjective evaluations."
VERBMOBIL

◮ German S2S system developed between 1993 and 2000
◮ "verbal communication with foreign interlocutors in mobile situations"
◮ "Verbmobil is the first speech-only dialog translation system"
◮ bidirectional translation for German, English, and Japanese
◮ business-oriented domains:
  1. appointment scheduling
  2. travel planning
  3. remote PC maintenance
VERBMOBIL features

◮ context-sensitive translation, e.g. GER nächste → ENG next (train) or nearest (hotel)
◮ prosody, e.g. "wir haben noch" vs. "wir haben NOCH" (accent changes the reading)
◮ domain knowledge: it knows "things about the topic being discussed"
◮ dialog memory: it knows "things that were communicated earlier"
◮ disfluency management:
  1. filters out simple disfluencies ("ahh", "umm")
  2. removes the reparandum
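The two disfluency steps above can be sketched with simple pattern matching. This is only an illustrative approximation (Verbmobil used far more sophisticated processing); the filler list and the "I mean" repair cue are assumptions for the example:

```python
import re

# Step 1: simple filled pauses to filter out.
FILLERS = re.compile(r"\b(?:uh+|um+|ahh?|er+)\b[,.]?\s*", flags=re.IGNORECASE)

# Step 2: crude reparandum removal -- when a repair is signaled by the
# editing term "I mean", keep only the repaired word.
REPAIR = re.compile(r"\b(\w+)[,.]?\s+i mean\s+(\w+)", flags=re.IGNORECASE)

def clean_disfluencies(utterance):
    text = FILLERS.sub("", utterance)       # drop "uh", "umm", ...
    text = REPAIR.sub(r"\2", text)          # "Tuesday I mean Wednesday" -> "Wednesday"
    return " ".join(text.split())           # normalize whitespace

print(clean_disfluencies("uh we meet on Tuesday I mean Wednesday"))
# we meet on Wednesday
```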
VERBMOBIL - Disambiguation
VERBMOBIL - Control Panel Demo: Link