Simultaneous Speech Translation



  1. Simultaneous Speech Translation. Graham Neubig, Nara Institute of Science and Technology (NAIST), 10/16/2015. Joint work with: Satoshi Nakamura, Tomoki Toda, Sakriani Sakti, Tomoki Fujita, Hiroaki Shimizu, Yusuke Oda, Takashi Mieno, Quoc Truong Do

  2. Background

  3. Speech Translation. Sources: Microsoft Research http://research.microsoft.com/en-us/news/features/translator-052714.aspx ; NICT http://www.nict.go.jp/press/2010/06/29-1.html ; Karlsruhe Institute of Technology http://isl.anthropomatik.kit.edu/english/1520.php

  4. Traditional Speech Translation: divide at sentence boundaries. ASR: こんにちは、駅はどこですか? → MT: Hello, where is the station? → TTS

  5. Problem: Delay (Ear-Voice Span). The whole pipeline runs only after the sentence ends: ASR こんにちは、駅はどこですか? → MT Hello, where is the station? → TTS

  6. Speech Translation Example

  7. Simultaneous Speech Translation: delay reduced. ASR こんにちは、駅はどこですか? is translated incrementally, chunk by chunk (MT → TTS per chunk): "Hello, / the station / where is it?" But this is not easy!

  8. Professional Simultaneous Interpretation. Photo credits: https://www.flickr.com/photos/joi/2027679714 and https://www.flickr.com/photos/european_parliament/4268490015

  9. Simultaneous Interpretation Data [Shimizu+ LREC14]
● Recorded data: about 10 hours of TED Talks (English-Japanese, Japanese-English)
● Simultaneous interpreters: 3 professionals with varying experience, ranked S (15 years), A (4 years), and B (1 year)
Freely available for research purposes: http://ahclab.naist.jp/resource/stc/

  10. Simultaneous Interpreter Example

  11. So How do Simultaneous Interpreters Do It?
Source: 今ご覧いただいたこの映像は今から五年前、日本で世間を賑わせていた裁判員制度が始まる一年前、大学四年生だった私が模擬裁判用の資料として作った物です
Translation: Five years ago, as a college senior, I created the video that you just saw as a reference material for a mock trial, one year before the much-talked-about jury system commenced in Japan.
Interpretation: You just saw this video clip. Five years ago, at that time in Japan, the ordinary people's justice system, jury system, was very much talked about in Japan, and I created this video as a reference material for that.
Techniques at work: segmentation, prediction (e.g. predicting an upcoming NP), rewording, summarization

  12. Can We Do the Same in Speech Translation Systems? Four problems in this talk:
● Segmentation: When do we start translating?
● Prediction: Can we predict things that haven't been said?
● Rewording: Can we reword sentences to be conducive to simultaneous translation?
● Evaluation: How do we decide which results are better?

  13. Segmentation

  14. Heuristic Segmentation Strategies
● Division on pauses [Fugen+ 07, Bangalore+ 12]
● Division on predicted commas [Sridhar+ 13]: "hello [comma] where is the station [no comma]"
● Division based on reordering probabilities [Fujita+ 13]: e.g. "hello" → probability of reordering 0.1, "where" → probability of reordering 0.8
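As a rough illustration of the last heuristic, the sketch below closes a segment whenever the current word's estimated reordering probability falls below a threshold. The function name, the toy probabilities, and the threshold are all hypothetical stand-ins for the learned model in [Fujita+ 13], not its actual implementation.

```python
def segment_by_reordering(words, reorder_prob, threshold=0.5):
    """Close a segment after any word whose estimated probability of
    participating in future reordering is low (hypothetical sketch)."""
    segments, current = [], []
    for word in words:
        current.append(word)
        if reorder_prob(word) < threshold:
            # Unlikely to be reordered with later words: safe to translate now.
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

# Toy probabilities standing in for a learned reordering model.
probs = {"hello": 0.1, "where": 0.8, "is": 0.7, "the": 0.6, "station": 0.2}
segs = segment_by_reordering("hello where is the station".split(),
                             lambda w: probs.get(w, 0.5))
```

Here "hello" (probability 0.1) closes the first segment immediately, while "where is the" must wait because those words are likely to be reordered in the Japanese output.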

  15. Optimizing Segmentation Strategies for Simultaneous Speech Translation [Oda+ ACL14]
● All previous segmentation strategies were based on heuristics
● They do not directly take into account the effect on translation accuracy
What if we could directly optimize sentence segmentation for translation accuracy?

  16. Training/Testing Framework. Training: find the segmentation S* of the training corpus that maximizes MT accuracy, then train a segmentation model on S*. Testing: segment the test corpus with the trained model, then translate the segments.

  17. S* Search Method 1: Greedy Search. For a sentence such as "I ate lunch but she left" (reference: 私は昼食を食べたが彼女は帰った), try a segmentation boundary at each position, translate each candidate segmentation, and score it against the reference (the slide's candidate scores range from 0.2 to 1.0). Greedily keep the best-scoring boundary and repeat. An SVM classifier is then trained to recover the chosen boundaries (/) at test time.
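The greedy procedure might be sketched as follows. `evaluate` is a stand-in for the real pipeline (translate each segment, score the concatenation against the reference with an MT metric), and the toy scorer below is invented purely for illustration.

```python
def greedy_segment(sentence, evaluate, max_boundaries=3):
    """Greedy boundary search: at each step, try a boundary at every
    remaining position, keep the one that most improves the translation
    score, and stop when no boundary helps (hypothetical sketch)."""
    boundaries = set()
    best = evaluate(boundaries)
    for _ in range(max_boundaries):
        cand = None
        for pos in range(1, len(sentence)):
            if pos in boundaries:
                continue
            score = evaluate(boundaries | {pos})
            if score > best:
                best, cand = score, pos
        if cand is None:
            break  # no remaining boundary improves the score
        boundaries.add(cand)
    return sorted(boundaries), best

# Toy scorer: pretend a boundary after word 3 ("but") helps translation,
# while every extra boundary costs a small accuracy penalty.
words = "I ate lunch but she left".split()
toy_eval = lambda b: 0.5 + (0.4 if 3 in b else 0.0) - 0.05 * len(b)
best_boundaries, best_score = greedy_segment(words, toy_eval)
```

With the toy scorer the search adds the boundary after "but" and then stops, since any further boundary lowers the score.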

  18. S* Search Method 2: Grouping by Features
● Because MT/evaluation is complicated, there is the potential to overfit
● Solution: group candidate boundaries by features, e.g. the POS bigram around the boundary: Pronoun + Verb, Noun + Conjunction, Determiner + Noun
  "I ate lunch but she left" → PRN VBD NN CC PRN VBD
  "I ate an apple and an orange" → PRN VBD DET NN CC DET NN
● The search over groups can be performed using dynamic programming
● At test time the features decide the boundaries directly, so the model is trivial and no learning is needed
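A minimal sketch of the grouping idea, assuming (as the slide suggests) that a candidate boundary's feature is simply the POS bigram around it; the actual feature set and search in [Oda+ ACL14] may differ.

```python
from collections import defaultdict

def boundary_features(pos_tags):
    """Map each candidate boundary position to its POS-bigram feature."""
    return {i: (pos_tags[i - 1], pos_tags[i]) for i in range(1, len(pos_tags))}

def group_boundaries(corpus_pos):
    """Group all candidate boundaries in a corpus by shared feature.
    The search then keeps or drops boundaries per feature rather than
    per position, shrinking the search space and limiting overfitting
    (hypothetical sketch)."""
    groups = defaultdict(list)
    for sent_id, tags in enumerate(corpus_pos):
        for pos, feat in boundary_features(tags).items():
            groups[feat].append((sent_id, pos))
    return groups

corpus = [["PRN", "VBD", "NN", "CC", "PRN", "VBD"],
          ["PRN", "VBD", "DET", "NN", "CC", "DET", "NN"]]
groups = group_boundaries(corpus)
```

For example, the Noun + Conjunction feature (NN, CC) collects the boundary after "lunch" in the first sentence and after "apple" in the second, so both are segmented (or not) together.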

  19. Results on TED Talks → 2-3 times faster with no loss in BLEU

  20. Simultaneous Translation Demo: Greedy+Grouping at 10 words

  21. Future Contributions to Segmentation?
● Speech: optimized models using acoustic features?
● Parsing: incorporation of incremental parsing? e.g. [Ryu+ 06]
● Machine learning: smarter models such as neural networks?
● Algorithms: integration with incremental decoding? e.g. [Sankaran+ 10]

  22. Prediction

  23. What Kind of Prediction do Simultaneous Interpreters Do? [Wilss 78, Chernov+ 04]
● Lexical prediction: サイエンスを正しく楽しく、これを合い言葉にサイエンスCGクリエーターとして活動しています。 (gloss: science / factual / fun / this / as keyword / science CG / creator / as / working) → "Then what I wanted to do is to promote fun and factual science, that's my keyword. I'm a …"
● Structural prediction: 今ご覧頂いた映像 (gloss: now / you saw / video) → "you just saw a video clip"

  24. Predicting Sentence-final Verbs [Grissom et al., EMNLP14]
● A method for translating from verb-final languages (e.g. German)
● Train a classifier to predict the sentence-final verb
● Use reinforcement learning to decide whether to "wait", "predict", or "commit"
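The wait/predict/commit loop might look roughly like this. `translate`, `verb_predictor`, and `policy` are all hypothetical stand-ins for the paper's MT system, verb classifier, and reinforcement-learned policy; this is a sketch of the control flow, not the actual method.

```python
def interpret(stream, translate, verb_predictor, policy):
    """Incremental loop in the spirit of [Grissom+ EMNLP14]: after each
    source word, a policy chooses to WAIT for more input, PREDICT the
    unseen sentence-final verb and translate with it, or COMMIT a
    translation of what has been seen (hypothetical sketch)."""
    seen, outputs = [], []
    for word in stream:
        seen.append(word)
        action = policy(seen)
        if action == "predict":
            # Gamble on the predicted verb to translate early.
            outputs.append(translate(seen + [verb_predictor(seen)]))
        elif action == "commit":
            outputs.append(translate(seen))
        # "wait": just read the next word
    return outputs

# Toy verb-final German input: "ich habe den apfel gegessen" (I ate the apple).
out = interpret(
    ["ich", "habe", "den", "apfel", "gegessen"],
    translate=lambda ws: " ".join(ws),          # identity stand-in for MT
    verb_predictor=lambda seen: "gegessen",     # always guesses correctly here
    policy=lambda seen: "predict" if len(seen) == 4
           else ("commit" if len(seen) == 5 else "wait"))
```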

  25. Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents [Oda+ ACL15]
● Predict unseen syntactic constituents, then translate from the completed tree
● Example: given the prefix "in the next 18 minutes I", predict the unseen VP, giving the translation 今から18分で私は(VP) instead of the incomplete 今から18分私

  26. Why is Syntax Necessary?
● The tree-to-string (T2S) MT framework obtains state-of-the-art results on syntactically distant language pairs (c.f. phrase-based translation; PBMT)
● It makes it possible to use additional syntactic constituents explicitly: parse "This is NP" and translate to これはNPです
● An additional heuristic waits for more input when the translation requires reordering

  27. Making Training Data for Syntax Prediction. Decompose gold trees in the treebank:
1. Select any leaf span in the tree
2. Find the path between the leftmost/rightmost leaves
3. Delete the outside subtree
4. Replace inside subtrees with the topmost phrase label
5. For "This is a pen" this finally yields a (left syntax, leaf span, right syntax) triple such as (nil, "is a", NN)
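A rough reconstruction of this decomposition, assuming a simple `(label, children)` tree encoding. The exact output format and tie-breaking here are guesses at the slide's intent, not the paper's actual procedure: given a leaf span, it reports the topmost phrase label of the material on each side (or nil at a sentence edge).

```python
def constituents(tree, start=0, acc=None):
    """Collect (label, i, j) spans for every node of a (label, children)
    tree, where a leaf is (tag, word) with a string as children."""
    if acc is None:
        acc = []
    label, children = tree
    if isinstance(children, str):
        acc.append((label, start, start + 1))
        return start + 1, acc
    i = start
    for child in children:
        i, _ = constituents(child, i, acc)
    acc.append((label, start, i))  # parents appended after their children
    return i, acc

def widest(cands):
    """Widest span wins; ties go to later entries, i.e. topmost nodes."""
    best = None
    for lab, i, j in cands:
        if best is None or (j - i) >= (best[2] - best[1]):
            best = (lab, i, j)
    return best[0] if best else "nil"

def context_labels(tree, i, j):
    """Topmost labels of the constituents entirely left and entirely
    right of token span [i, j), or 'nil' (hypothetical sketch)."""
    _, cons = constituents(tree)
    left = widest([c for c in cons if c[2] <= i])
    right = widest([c for c in cons if c[1] >= j])
    return left, right

# "This is a pen" with span "is a" -> left context NP, right context NN.
tree = ("S", [("NP", [("DT", "This")]),
              ("VP", [("VBZ", "is"),
                      ("NP", [("DT", "a"), ("NN", "pen")])])])
labels = context_labels(tree, 1, 3)
```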

  28. Syntax Prediction Process
1. Parse the input translation unit as-is (e.g. "in the next 18 minutes I" parses to a PP over IN/NP plus a trailing NP)
2. Extract features, e.g. Word:R1=I, POS:R1=NN, Word:R1-2=I,minutes, POS:R1-2=NN,NNS, ROOT=PP, ROOT-L=IN, ROOT-R=NP, ...
3. Predict the next tag with a linear SVM (e.g. VP 0.65, NP 0.28, nil 0.04, ...)
4. Append the prediction to the sequence
5. Repeat until nil is predicted (here: "in the next 18 minutes I" + VP, then nil)
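The predict-until-nil loop in steps 3-5 can be sketched as below; `classify` is a hypothetical stand-in for the paper's featurizer plus linear SVM, and the cap on iterations is an invented safeguard.

```python
def predict_unseen_tags(parsed_prefix, classify, max_tags=5):
    """Iteratively predict the syntactic constituents expected to follow
    the partial input, in the spirit of [Oda+ ACL15]: ask a classifier
    for the next phrase label given the parse so far plus the tags
    already predicted, and stop at 'nil' (hypothetical sketch)."""
    predicted = []
    for _ in range(max_tags):
        tag = classify(parsed_prefix, predicted)
        if tag == "nil":
            break  # the classifier expects no further constituents
        predicted.append(tag)
    return predicted

# Toy classifier: predicts one VP after the prefix, then nil.
tags = predict_unseen_tags("in the next 18 minutes I",
                           lambda parse, pred: "VP" if not pred else "nil")
```

The predicted tags are appended to the input unit, so the T2S system can translate the completed tree, e.g. 今から18分で私は(VP).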
