Neural Machine Translation: Breaking the Performance Plateau

Rico Sennrich
Institute for Language, Cognition and Computation
University of Edinburgh

July 4, 2016
Is Machine Translation Getting Better Over Time? [Graham et al., 2014]

[Bar chart: BLEU on newstest2007 (EN→DE). The best system of 2007 scores 14.6; a current system (2014) scores 23.6.]
Edinburgh's WMT Results Over the Years

[Bar chart: BLEU on newstest2013 (EN→DE) for Edinburgh's phrase-based SMT, syntax-based SMT, and neural MT submissions, 2013–2016. The SMT systems cluster between 19.4 and 22.1 BLEU; the 2016 neural MT system tops the chart at 24.7.]
Neural Machine Translation [Bahdanau et al., 2015]

[Figure: attentional encoder-decoder architecture. Illustration by Kyunghyun Cho, http://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-3/]

The core equations of the model are summarized below.
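Since the architecture figure does not survive text extraction, here is the attention mechanism of [Bahdanau et al., 2015] in three equations, with notation as in the paper: the h_j are the bidirectional encoder annotations, s_{i-1} is the previous decoder state, and v_a, W_a, U_a are learned parameters.

```latex
% alignment score between target position i and source position j
e_{ij} = v_a^\top \tanh\!\left(W_a s_{i-1} + U_a h_j\right)
% normalized attention weights
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}
% context vector fed to the decoder at step i
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```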
Why Neural Machine Translation?

qualitative differences:
- main strength of neural MT: improved grammaticality [Neubig et al., 2015]

phrase-based SMT:
- strong independence assumptions
- log-linear combination of many "weak" features

neural MT:
- output conditioned on full source text and target history (spelled out below)
- end-to-end trained model
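The "conditioned on full source text and target history" point is just the model's chain-rule factorization, written out:

```latex
% An encoder-decoder NMT model factorizes the translation probability
% word by word, conditioning each target word on the entire source
% sentence x and on all previously generated target words:
p(y \mid x) = \prod_{i=1}^{|y|} p\!\left(y_i \mid y_1, \ldots, y_{i-1}, x\right)
% Phrase-based SMT instead decomposes the score into local phrase and
% n-gram LM features, which is where its independence assumptions enter.
```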
Example (WMT16 EN→DE)

source: But he wants an international reporter to be there to write about it.
reference: Aber er will, dass ein internationaler Reporter anwesend ist, um dort zu schreiben.
PBSMT: Aber er will einen internationalen Reporter zu sein, darüber zu schreiben.
SBSMT: Aber er will einen internationalen Reporter, um dort zu sein, über sie zu schreiben.
neural MT: Aber er will, dass ein internationaler Reporter da ist, um darüber zu schreiben.

Only the neural system reproduces the grammatical dass-clause of the reference; both SMT outputs mangle the infinitive construction.
Recent Advances in Neural MT

some problems:
- networks have a fixed vocabulary → poor translation of rare/unknown words
- models are trained on parallel data; how do we use monolingual data?

recent solutions:
- subword models allow translation of rare/unknown words [Sennrich et al., 2016b]
- train on back-translated monolingual data [Sennrich et al., 2016a]
Problem with Word-level Models

they charge a carry-on bag fee .
sie erheben eine Hand|gepäck|gebühr .

- neural MT architectures have a small, fixed vocabulary
- but translation is an open-vocabulary problem:
  - productive word formation (example: compounding)
  - names (may require transliteration)
Why Subword Models?

transparent translations: many translations are semantically/phonologically transparent
→ translation via subword units is possible

- morphologically complex words (e.g. compounds):
  solar system (English)
  Sonnen|system (German)
  Nap|rendszer (Hungarian)
- named entities:
  Barack Obama (English; German)
  Барак Обама (Russian)
  バラク・オバマ (ba-ra-ku o-ba-ma) (Japanese)
- cognates and loanwords:
  claustrophobia (English)
  Klaustrophobie (German)
  Клаустрофобия (Russian)
Examples

system             sentence
source             health research institutes
reference          Gesundheitsforschungsinstitute
word-level         Forschungsinstitute
character bigrams  Fo|rs|ch|un|gs|in|st|it|ut|io|ne|n
joint BPE          Gesundheits|forsch|ungsin|stitute

source             rakfisk
reference          ракфиска (rakfiska)
word-level         rakfisk → UNK → rakfisk
character bigrams  ra|kf|is|k → ра|кф|ис|к (ra|kf|is|k)
joint BPE          rak|f|isk → рак|ф|иска (rak|f|iska)
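The "joint BPE" segmentation comes from byte-pair encoding, which can be stated in a few lines. This sketch follows the merge-learning algorithm published in [Sennrich et al., 2016b]; the toy vocabulary below is illustrative only.

```python
import re
import collections

def get_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs

def merge_vocab(pair, vocab):
    """Replace every occurrence of the pair with a single merged symbol."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

# toy corpus: words as space-separated characters plus an end-of-word marker
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
num_merges = 10
for _ in range(num_merges):
    pairs = get_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent pair becomes a new symbol
    vocab = merge_vocab(best, vocab)
    print(best)
```

Each learned merge becomes a subword unit; rare and unseen words are then segmented into known units instead of mapping to UNK.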
Monolingual Training Data

why monolingual data for phrase-based SMT?
- relax independence assumptions ✓
- more training data ✓
- more appropriate training data (domain adaptation) ✓

why monolingual data for neural MT?
- relax independence assumptions ✗
- more training data ✓
- more appropriate training data (domain adaptation) ✓
Monolingual Data in NMT

solutions:
- previous work: combine NMT with a separately trained LM [Gülçehre et al., 2015]
- our idea: the decoder is already a language model → train the encoder-decoder with added monolingual data

monolingual training instances: how do we get an approximation of the source context?
- dummy source context (moderately effective)
- automatically back-translate monolingual data into the source language (a sketch of this recipe follows)
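A minimal sketch of the back-translation recipe from [Sennrich et al., 2016a]. The `train` and `translate` functions are hypothetical stand-ins for an NMT toolkit's training and decoding entry points, not a real API.

```python
def train(src_sentences, tgt_sentences):
    """Hypothetical: train an encoder-decoder model on sentence pairs."""
    raise NotImplementedError("plug in your NMT toolkit here")

def translate(model, sentence):
    """Hypothetical: decode one sentence with a trained model."""
    raise NotImplementedError("plug in your NMT toolkit here")

def with_back_translation(parallel_src, parallel_tgt, mono_tgt):
    # 1. Train a reverse (target -> source) model on the genuine parallel data.
    reverse_model = train(parallel_tgt, parallel_src)

    # 2. Back-translate the target-side monolingual corpus. The synthetic
    #    source side may be noisy, but the target side is genuine text,
    #    so the decoder still learns to produce fluent output.
    synthetic_src = [translate(reverse_model, s) for s in mono_tgt]

    # 3. Train the final source -> target model on genuine + synthetic pairs.
    return train(parallel_src + synthetic_src, parallel_tgt + mono_tgt)
```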
Results: WMT15 English→German

system                  BLEU
syntax-based            24.4
neural MT baseline      22.0
 +subwords              22.8
 +back-translated data  25.7
 +ensemble of 4         26.5
WMT16 Results (BLEU)

EN→DE: uedin-nmt 34.2, metamind 32.3, NYU-UMontreal 30.8, cambridge 30.6, uedin-syntax 30.6, KIT/LIMSI 29.1, KIT 29.0, uedin-pbmt 28.4, jhu-syntax 26.6
DE→EN: uedin-nmt 38.6, uedin-pbmt 35.1, jhu-pbmt 34.5, uedin-syntax 34.4, KIT 33.9, jhu-syntax 31.0
CS→EN: uedin-nmt 31.4, jhu-pbmt 30.4, PJATK 28.3, cu-mergedtrees 13.3
EN→CS: uedin-nmt 25.8, NYU-UMontreal 23.6, jhu-pbmt 23.6, cu-chimera 21.0, uedin-cu-syntax 20.9, cu-tamchyna 20.8, cu-TectoMT 14.7, cu-mergedtrees 8.2
RO→EN: uedin-pbmt 35.2, uedin-nmt 33.9, uedin-syntax 33.6, jhu-pbmt 32.2, LIMSI 31.0
EN→RO: QT21-HimL-SysComb 28.9, uedin-nmt 28.1, RWTH-SYSCOMB 27.1, uedin-pbmt 26.8, uedin-lmu-hiero 25.9, KIT 25.8, lmu-cuni 24.3, LIMSI 23.9, jhu-pbmt 23.5, usfd-rescoring 23.1
RU→EN: amu-uedin 29.1, NRC 29.1, uedin-nmt 28.0, AFRL-MITLL 27.6, AFRL-MITLL-contrast 27.0
EN→RU: uedin-nmt 26.0, amu-uedin 25.3, jhu-pbmt 24.0, LIMSI 23.6, AFRL-MITLL 23.5, NYU-UMontreal 23.1, AFRL-MITLL-verb-annot 20.9

uedin-nmt is Edinburgh's neural MT system; several other top-ranked entries (e.g. amu-uedin, QT21-HimL-SysComb) are system combinations that include Edinburgh NMT.
Neural MT and Phrase-based SMT

                              Neural MT            Phrase-based SMT
translation quality           ✓
model size                    ✓
training time                                      ✓
model interpretability                             ✓
decoding efficiency           ✓                    ✓
toolkits                      ✓ (for simplicity)   ✓ (for maturity)
special hardware requirement  GPU                  lots of RAM
Conclusions and Outlook

conclusions:
- neural MT is state of the art on many tasks
- subword models and back-translated data contributed to this success

future predictions:
- the performance lead over phrase-based SMT will increase
- industry adoption will happen, but beware:
  - some hard things are suddenly easy (incremental training)
  - some easy things are suddenly hard (manual changes to the model)

exciting research opportunities:
- relax independence assumptions: document-level translation, multimodal input, ...
- share parts of the network between tasks: universal translation models, multi-task models, ...