NAIST’s Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task
Ryo Fukuda¹, Katsuhito Sudoh¹, and Satoshi Nakamura¹,²
¹ Nara Institute of Science and Technology
² AIP Center, RIKEN, Japan
Brief Overview

Challenge track: Conversational Speech Translation
Translation task from disfluent Spanish to fluent English
• Includes speech-to-text and text-to-text translation subtasks

Motivation: Tackle two problems in text-to-text NMT
1. Low-resource translation
2. Noisy input sentences
• fillers, hesitations, self-corrections, ASR errors, …

Proposal: Domain adaptation using style transfer
• transfer the style of out-of-domain data to be like the in-domain data, then perform domain adaptation
Outline
1. Introduction
2. System Description
3. Experiments
4. Discussion
5. Summary
Motivation

The “style” of the task data (in-domain): disfluent Spanish → fluent English
→ Ideally, augment the data using a large corpus of the same style

Large corpora available (out-of-domain): fluent Spanish → fluent English
→ Effects of training with them are limited
Motivation

Style transfer model: fluent to disfluent

out-of-domain data: fluent Spanish → fluent English
↓ Style Transfer
pseudo in-domain data: disfluent Spanish → fluent English

• increases the similarity between out-of-domain and in-domain data
→ Enables effective domain-adaptive training
Outline
1. Introduction
2. System Description
3. Experiments
4. Discussion
5. Summary
Overview

Generate pseudo in-domain data and adapt it for NMT

[System diagram]
(1) Style Transfer model (Fluent-to-Disfluent): trained on out-of-domain and in-domain monolingual data; transforms the out-of-domain parallel data into pseudo in-domain parallel data
(2) NMT model (Spanish-to-English): trained on the pseudo in-domain and in-domain parallel data
(1) Style Transfer model

Transfer fluent input sentences of the out-of-domain parallel data into a disfluent style

fluent:    “He estado durmiendo casi once horas” → “He’s been sleeping for almost 11 hours”
↓ Style Transfer model (Fluent-to-Disfluent)
disfluent: “Ya, ya, so, duerme casi once horas” → “He’s been sleeping for almost 11 hours”

Style Transfer model:
• based on Unsupervised NMT (Artetxe et al., 2018; Lample et al., 2018) with out-of-domain fluent data and in-domain disfluent data
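As a rough illustration (not the authors' code): UNMT of this kind alternates denoising autoencoding with on-the-fly back-translation between the fluent and disfluent styles. The sketch below shows only the word-level noise function commonly used for the denoising step, following the drop-and-local-shuffle scheme of Lample et al. (2018); the function name and parameter values are ours.

```python
import random

def add_noise(tokens, p_drop=0.1, k_shuffle=3):
    """Corrupt a sentence for UNMT denoising autoencoding
    (word dropout + local shuffle, as in Lample et al., 2018)."""
    # Word dropout: keep each token with probability 1 - p_drop;
    # fall back to the first token so the sentence is never empty.
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]
    # Local shuffle: sort by (index + uniform noise in [0, k_shuffle)),
    # so each word moves at most about k_shuffle positions.
    keys = [i + random.uniform(0, k_shuffle) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

print(add_noise("he estado durmiendo casi once horas".split()))
```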
(2) NMT model

Apply fine-tuning
• a conventional domain adaptation method for MT
• greatly improves the accuracy of low-resource, domain-specific translation (Dakwale and Monz, 2017)

Learning steps for fine-tuning:
1. Pre-training on the pseudo in-domain parallel data → pre-trained model (Spanish-to-English)
2. Fine-tuning on the in-domain parallel data → fine-tuned model (Spanish-to-English)
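A minimal sketch of this two-step schedule, assuming standard continued training in PyTorch. The stand-in linear model, random data, and hyperparameters are placeholders that keep the example runnable; the actual system uses a Transformer NMT model on (Spanish, English) sentence pairs.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)   # stand-in for a Transformer NMT model
criterion = nn.MSELoss()  # stand-in for the translation loss

def train(model, data, lr, epochs):
    # Continued training reuses the same parameters across both phases.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()

pseudo_in_domain = [(torch.randn(64, 8), torch.randn(64, 8))]  # large: Fisher-like UNCorpus
in_domain = [(torch.randn(8, 8), torch.randn(8, 8))]           # small: Fisher

train(model, pseudo_in_domain, lr=1e-3, epochs=10)  # 1. pre-training steps
train(model, in_domain, lr=1e-4, epochs=5)          # 2. fine-tuning steps
```

A lower learning rate in the second phase is a common choice to avoid overwriting what was learned in pre-training.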
Outline
1. Introduction
2. System Description
3. Experiments
4. Discussion
5. Summary
Datasets

• LDC Fisher Spanish speech with English translations (Fisher)
  • parallel in-domain data
  • disfluent Spanish to (fluent/disfluent) English
• United Nations Parallel Corpus (UNCorpus)
  • parallel out-of-domain data
  • fluent Spanish to fluent English

Data statistics (# sentences):
  Fisher (in-domain):       Train 138,720 / Dev 3,977 / Test 3,641
  UNCorpus (out-of-domain): Train 1,000,000 / Dev 4,000 / Test 4,000
(1) Spanish Style Transfer

Data: Fisher (disfluent) and UNCorpus (fluent) Spanish data
Model: Unsupervised NMT (UNMT) based on Transformer
Evaluation:
• Estimate the similarity between domains by measuring the perplexity of a 3-gram language model
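The slide does not name the LM toolkit, but one plausible realization uses KenLM's Python bindings: train a 3-gram model on the in-domain (Fisher) text, then score each corpus with it; lower perplexity suggests the corpus is closer to the in-domain style. File names below are hypothetical.

```python
import kenlm

# Assumes a 3-gram LM was trained on Fisher (in-domain) Spanish text
# beforehand, e.g. with KenLM:  lmplz -o 3 < fisher.es > fisher.3gram.arpa
model = kenlm.Model("fisher.3gram.arpa")  # hypothetical path

def corpus_perplexity(path):
    """Average per-sentence perplexity of a corpus under the in-domain LM;
    lower perplexity = closer to the in-domain style."""
    with open(path, encoding="utf-8") as f:
        scores = [model.perplexity(line.strip()) for line in f if line.strip()]
    return sum(scores) / len(scores)

print(corpus_perplexity("uncorpus.es"))        # out-of-domain
print(corpus_perplexity("fisher_like_un.es"))  # after style transfer
```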
(1) Spanish Style Transfer Results

• Style transfer reduced both the perplexity and the number of unknown words

  Training data          Perplexity   Unknown words
  Fisher                 72.46        0
  UNCorpus               589.81       5,173,539
  Fisher-like UNCorpus   474.47       4,217,819

Examples of pseudo in-domain data (Fisher-like UNCorpus):

  UNCorpus:             conducta y disciplina
  Fisher-like UNCorpus: eh conducta y disciplina

  UNCorpus:             lista amplia de verificación para la autoevaluación
  Fisher-like UNCorpus: mhm lista amplia de verificación para la la la tele

• Deletes paragraph symbols
• Inserts “disfluencies” (fillers, repetitions, missing words, ASR errors, …)
(2) NMT with Domain Adaptation

Data:
• in-domain: 130K bilingual pairs of Fisher
• out-of-domain: 1M pairs of UNCorpus or Fisher-like UNCorpus
Model: Transformer (mostly following the transformer_base settings)
Evaluation: BLEU scores calculated with sacreBLEU
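The evaluation step might look like the following, using sacreBLEU's standard corpus-level Python API; the file names are hypothetical.

```python
import sacrebleu

# Read system outputs and references, one sentence per line,
# in the same order (hypothetical file names).
with open("hypotheses.en", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("fisher_test.en", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.1f}")
```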
(2) NMT with Domain Adaptation Results (1/2) – Effect of Style Transfer

BLEU scores of trained NMT models for disfluent Spanish to fluent English:

  System                                           Fisher/test
  Single training   Fisher                         14.8
                    UNCorpus                        7.8
                    Fisher-like UNCorpus            6.7  (-1.1 vs. UNCorpus)
  Fine-tuning       UNCorpus + Fisher              18.3  (+3.5 vs. Fisher)
                    Fisher-like UNCorpus + Fisher  18.5  (+0.2 vs. UNCorpus + Fisher)

• Domain adaptation training outperformed the single-training baseline
• Using the pseudo in-domain data gave a slight further improvement
(2) NMT with Domain Adaptation Results (2/2) – Fluent vs Disfluent References

“Fisher (disfluent)” uses Fisher’s disfluent references instead of the fluent ones:

  System                                   Fisher/test
  Fisher (fluent)                          14.8
  UNCorpus + Fisher (fluent)               18.3
  Fisher-like UNCorpus + Fisher (fluent)   18.5
  Fisher (disfluent)                       11.6  (-3.2)
  UNCorpus + Fisher (disfluent)            15.2  (-3.1)
  Fisher-like UNCorpus + Fisher (disfluent) 15.6  (-2.9)

• Models trained with Fisher’s original disfluent references scored about 3 BLEU points lower
Outline
1. Introduction
2. System Description
3. Experiments
4. Discussion
5. Summary
Effect of Style Transfer

The use of pseudo in-domain data improved accuracy, but
• the improvement was not significant
• results were worse in the pre-training phase

An example of a style-transferred sentence:
  (original)  nueva york 1 a 12 de junio de 2015
  (generated) nueva york oh a mi eh de de de de

• some transferred sentences lost their original meaning
• the style-transfer constraints may be too strong
→ This problem may be mitigated by a model that can control the trade-off between style transfer and content preservation
Fluent vs Disfluent References

The models trained using Fisher’s original disfluent references had BLEU scores about 3 points lower than the models trained using fluent references.
→ Removing the disfluencies from the reference sentences improves BLEU by about three points for all the learning strategies we tried
• the use of large out-of-domain data with fluent reference sentences did not mitigate this problem
→ The style of the reference sentences has an impact on translation accuracy
Summary

Translation accuracy was improved
• by domain adaptation (+3.7)
• by style transfer of the out-of-domain data (+0.4)
• the effect was limited due to quality degradation of the parallel data

Future work: pursue a style transfer method that does not reduce the quality of the parallel data