  1. Rapid Adaptation of Machine Translation to New Languages Graham Neubig, Junjie Hu @ EMNLP 11/2/2018

  2. Inspiration: Rapid Disaster Response (#HiruNews #StandBy) [Sinhala social-media post, translated: "Social media messages in which volunteer groups providing relief to families affected by the floods and landslides asked for aid for children appeared in recent days."] Disaster in Sri Lanka. Photo Credit: Wikimedia Commons

  3. How can we effectively and rapidly adapt MT to new languages?

  4. Some Crazy Ideas • Cross-lingual transfer: can we create a machine translation system by transferring across language boundaries? [Zoph+16] • Zero-shot transfer: can we do it with no data in the low-resource language?

  5. Multi-lingual Training [Firat+16, Johnson+17, Ha+17] • Train a large multi-lingual MT system (e.g. fra, por, rus, tur, ... → eng) and apply it to a low-resource language (e.g. bel, aze)

  6. Two Multilingual Training Paradigms
• Warm-start training (indicated w/ "+"): we already have some data in the test language; train a model starting with that data.
• Cold-start training (indicated w/ "-"): we initially have no data in the test language; possibilities for completely unsupervised transfer; suitable for rapid adaptation to new languages.
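The warm-start/cold-start distinction can be made concrete as a data-selection step. This is an illustrative sketch, not the paper's code; the toy corpora and the `build_training_set` helper are assumptions for the example.

```python
# Toy corpora: language -> list of (source sentence, English translation) pairs.
CORPORA = {
    "aze": [("salam", "hello")],                     # low-resource test language
    "tur": [("merhaba", "hello"), ("evet", "yes")],  # related high-resource language
    "rus": [("privet", "hello"), ("da", "yes")],
}

def build_training_set(test_lang, warm_start):
    """Concatenate data from all languages; include the test language
    only under warm-start ("+"), drop it under cold-start ("-")."""
    data = []
    for lang, pairs in CORPORA.items():
        if lang == test_lang and not warm_start:
            continue  # cold-start: no parallel data in the test language
        data.extend((lang, src, tgt) for src, tgt in pairs)
    return data
```

Under cold-start the model never sees the test language during multilingual training, which is what makes it a testbed for rapid adaptation later.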

  7. Experiments: Training Setting • TED multi-lingual corpus (Qi et al. 2018): https://github.com/neulab/word-embeddings-for-nmt • 57 source languages, plus English • Testbed languages: Azerbaijani (aze), Belarusian (bel), Galician (glg), Slovak (slk) • Related languages: Turkish (tur), Russian (rus), Portuguese (por), Czech (ces)
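A minimal sketch of reading one language pair out of a multi-parallel TED-style TSV (one column per language). The column layout and file format here are assumptions for illustration; check the linked repository for the actual data format.

```python
import csv

def load_ted_pairs(path, src_lang, tgt_lang="en"):
    """Read (source, target) sentence pairs for one language pair from a
    tab-separated file whose header row names one column per language."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            src, tgt = row.get(src_lang, ""), row.get(tgt_lang, "")
            if src and tgt:  # keep only rows where both sides are present
                pairs.append((src, tgt))
    return pairs
```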

  8. Systems • Test Systems: • Single-source Neural MT (Sing.): Test source language only • Bi-source Neural MT (Bi.): Test source language and related source • All-source Neural MT (All): All source languages • Other Baselines: • Phrase-based MT: Shown to be strong in low-resource settings • Unsupervised MT [Artetxe+17]: Learn system using only monolingual data in source/target languages (cited as effective in low-resource settings)

  9. How does Cross-lingual Transfer Help? [Chart: BLEU (0-30) for PBMT, Unsupervised, NMT Sing., NMT Bi+, NMT All+ on aze/tur, bel/rus, glg/por, slk/ces] • Unsupervised translation not competitive • Without transfer, NMT worse than PBMT • With transfer, NMT significantly better (transfer barely helped PBMT)

  10. How Does Cold-start Compare? [Chart: BLEU (0-30) for NMT Bi+, NMT All+, NMT Bi-, NMT All- on aze/tur, bel/rus, glg/por, slk/ces] • Large drop, but still much better than nothing • Up to 15 BLEU with no training data in the test language

  11. Adaptation to New Languages
• Training on all languages can be less effective, esp. in the cold-start case
• Can we further adapt to new languages?
• Problem: overfitting
• Two strategies after pre-training on all languages: adaptation to the test language alone (All → Sing.), or adaptation w/ similar language regularization, fine-tuning on the test language plus a related language (All → Bi.)
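Similar-language regularization amounts to mixing related-language data into the fine-tuning batches so the model does not overfit the small test-language corpus. A minimal sketch, assuming a 50/50 mixing ratio and the `adaptation_batches` helper name, neither of which is specified in the slides:

```python
import random

def adaptation_batches(test_pairs, related_pairs, n_batches, batch_size,
                       test_ratio=0.5):
    """Yield fine-tuning batches that mix test-language data with a related
    language (All -> Bi. adaptation). test_ratio controls the mix; setting
    it to 1.0 recovers plain All -> Sing. adaptation."""
    for _ in range(n_batches):
        batch = [random.choice(test_pairs) if random.random() < test_ratio
                 else random.choice(related_pairs)
                 for _ in range(batch_size)]
        yield batch
```

Each batch would then be fed to an ordinary fine-tuning step of the pre-trained multilingual model.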

  12. Warm-start + Adaptation [Chart: BLEU (0-30) for NMT Sing., NMT Bi+, NMT All+, All+ -> Sing., All+ -> Bi on aze/tur, bel/rus, glg/por, slk/ces] • Adaptation helps! • Helps more w/ similar language regularization

  13. Cold-start + Adaptation [Chart: BLEU (0-30) for NMT Sing., NMT Bi-, NMT All-, All- -> Sing., All- -> Bi, All+ -> Bi on aze/tur, bel/rus, glg/por, slk/ces] • Adaptation w/ similar-language regularization gains more • Approaches warm-start quality without needing test-language data a priori

  14. How Fast can we Adapt? Cold-start adaptation reaches a good point faster than training from scratch [Plot: BLEU vs. hours of training (0-10) for Sing., Bi, All-→Sing., All-→Bi, All-→Bi 1-1]

  15. Take-aways • NMT with massively multi-lingual cross-lingual transfer: a stable recipe for low-resource translation • Better results than phrase-based and unsupervised MT on real low-resource languages • Adaptation w/ similar language regularization: simple and effective, even in cold-start scenarios https://github.com/neubig/rapid-adaptation Questions?
