

  1. BREAKTHROUGHS IN NEURAL MACHINE TRANSLATION Olof Mogren Chalmers University of Technology 2016-09-29

  2. COMING SEMINARS • Today: Olof Mogren, Neural Machine Translation • October 6: John Wiedenhoeft, Fast Bayesian Inference in Hidden Markov Models using Dynamic Wavelet Compression • October 10: Haris Charalambos Themistocleous, Linguistic, signal processing, and machine learning approaches in eliciting information from speech http://www.cse.chalmers.se/research/lab/seminars/

  4. Progress in Machine Translation [Edinburgh En-De WMT newstest2013 cased BLEU; NMT 2015 from U. Montréal] [Bar chart: BLEU by year (2013-2016) for phrase-based SMT, syntax-based SMT, and neural MT; y-axis 0-25 BLEU, with neural MT overtaking both other systems by 2016] From [Sennrich 2016, http://www.meta-net.eu/events/meta-forum-2016/slides/09_sennrich.pdf]

  5. Phrase-based Statistical Machine Translation A marvelous use of big data but … it's mined out?!?
  Source: 1519年600名西班牙人在墨西哥登陆,去征服几百万人口的阿兹特克帝国,初次交锋他们损兵三分之二。
  Reference: In 1519, six hundred Spaniards landed in Mexico to conquer the Aztec Empire with a population of a few million. They lost two thirds of their soldiers in the first clash.
  translate.google.com (2009): 1519 600 Spaniards landed in Mexico, millions of people to conquer the Aztec empire, the first two-thirds of soldiers against their loss.
  translate.google.com (2013): 1519 600 Spaniards landed in Mexico to conquer the Aztec empire, hundreds of millions of people, the initial confrontation loss of soldiers two-thirds.
  translate.google.com (2014, 2015, and 2016, identical output): 1519 600 Spaniards landed in Mexico, millions of people to conquer the Aztec empire, the first two-thirds of the loss of soldiers they clash.

  6. WHAT IS NEURAL MT (NMT)? The approach of modelling the entire MT process via one big artificial neural network.

  7. MODELLING LANGUAGE USING RNNS [Diagram: recurrent network unrolled over inputs x1, x2, x3, producing outputs y1, y2, y3] • Language models: P(word_i | word_1, ..., word_{i-1}) • Recurrent Neural Networks • Gated additive sequence modelling: LSTM (and variants) • Fixed vector representation for sequences • Use with beam search for language generation (see the sketch below)
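To make the bullets above concrete, here is a minimal sketch of an RNN language model (hypothetical PyTorch code, not from the talk; layer sizes are illustrative): an LSTM reads the word history and a softmax over the vocabulary yields P(word_i | word_1, ..., word_{i-1}).

```python
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Models P(word_i | word_1, ..., word_{i-1}) with an LSTM over the history."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))  # h: (batch, seq_len, hidden_dim)
        return self.proj(h)                   # next-word logits at each position

# Distribution over word 6 given a 5-word prefix (toy vocabulary of 1000 words):
lm = RNNLanguageModel(vocab_size=1000)
prefix = torch.randint(0, 1000, (1, 5))
next_word_probs = torch.softmax(lm(prefix)[:, -1], dim=-1)
```

The final hidden state doubles as a fixed vector representation of the whole sequence, and generating from the model step by step (typically with beam search) produces language.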

  8. ENCODER-DECODER FRAMEWORK [Diagram: encoder reads x1, x2, x3 into a single vector; decoder emits y1, y2, y3] • Sequence to Sequence Learning with Neural Networks Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014

  9. ENCODER-DECODER FRAMEWORK [Diagram: as on the previous slide] • Sequence to Sequence Learning with Neural Networks Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014 • Reversed input sentence! (see the sketch below)
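A sketch of the framework (hypothetical PyTorch code showing only a teacher-forced forward pass): the encoder compresses the reversed source sentence into a fixed-size state that initializes the decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Fixed-vector encoder-decoder in the style of Sutskever et al. (2014)."""
    def __init__(self, src_vocab, tgt_vocab, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Reverse the source sentence: early source words end up close to
        # the early target words they tend to align with.
        _, state = self.encoder(self.src_embed(src.flip(1)))
        # The decoder sees only this fixed-size state, so the whole source
        # sentence must be squeezed into one vector.
        out, _ = self.decoder(self.tgt_embed(tgt), state)
        return self.proj(out)  # logits over the target vocabulary per step
```

Reversing the input shortens the distance between the first source words and the first target words, which Sutskever et al. found made optimization markedly easier.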

  10. ENCODER-DECODER WITH ATTENTION [Diagram: attention links every decoder step to all encoder states] • Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, ICLR 2015 (a sketch of the attention computation follows below)

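The attention computation can be sketched as follows (hypothetical PyTorch code): at every decoder step, each encoder state is scored against the current decoder state, the scores are normalized into a soft alignment, and the weighted sum of encoder states becomes the context vector for that step.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: a learned soft alignment over encoder states."""
    def __init__(self, enc_dim, dec_dim, attn_dim=256):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, src_len, enc_dim); dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                           # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)  # soft alignment weights
        context = (weights.unsqueeze(-1) * enc_states).sum(dim=1)
        return context, weights  # context conditions the decoder at this step
```

The alignment weights are exactly what is visualized as heatmaps on the next slide.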

  14. ALIGNMENT [Attention heatmaps (Bahdanau et al. 2015) showing soft word alignments for two English-French sentence pairs: "The agreement on the European Economic Area was signed in August 1992." / "L'accord sur la zone économique européenne a été signé en août 1992." and "It should be noted that the marine environment is the least known of environments." / "Il convient de noter que l'environnement marin est le moins connu de l'environnement."]

  15. NEURAL MACHINE TRANSLATION, NMT • End-to-end training • Distributed representations • Better exploitation of context What's not on that list?

  16. WHAT'S BEEN HOLDING NMT BACK? • Limited vocabulary - copying - dictionary lookup • Data requirements • Computation - training time - inference time - memory usage

  17. RARE WORDS 1: SUBWORD UNITS • Neural Machine Translation of Rare Words with Subword Units Rico Sennrich, Barry Haddow, Alexandra Birch, ACL 2016 • A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation Junyoung Chung, Kyunghyun Cho, Yoshua Bengio, ACL 2016 Byte-pair encoding (BPE) example (a minimal learner is sketched below):
  aaabdaaabac
  ZabdZabac   (Z=aa)
  ZYdZYac     (Y=ab, Z=aa)
  XdXac       (X=ZY, Y=ab, Z=aa)
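A minimal BPE learner for the example above (a toy sketch, not Sennrich's released subword-nmt implementation; ties between equally frequent pairs are broken arbitrarily here, so intermediate merges can differ from the slide):

```python
from collections import Counter

def learn_bpe(text, num_merges):
    """Repeatedly replace the most frequent adjacent symbol pair with a new symbol."""
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                merged.append(a + b)  # the new symbol, e.g. Z = aa
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols, merges

# First merge on "aaabdaaabac" is aa (Z in the slide's notation).
print(learn_bpe("aaabdaaabac", 3))
```

In NMT the merges are learned on the training corpus, and rare words decompose into frequent subword units instead of falling out of the vocabulary.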

  18. RARE WORDS 2: HYBRID CHAR/WORD NMT • Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models Thang Luong and Chris Manning, ACL 2016 • Hybrid architecture: word-based for most words, character-based for rare words • 2 BLEU points improvement over the copy mechanism (see the routing sketch below)
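The routing idea fits in a few lines (a hypothetical toy illustration, not the authors' code): frequent words stay in the word vocabulary, while anything out-of-vocabulary becomes <unk> at the word level and is spelled out for a character-level network.

```python
def hybrid_segment(sentence, word_vocab, unk="<unk>"):
    """Route each token: in-vocabulary words stay whole; rare words become
    <unk> at the word level and are handed to a character-level model."""
    word_stream, char_streams = [], {}
    for pos, word in enumerate(sentence.split()):
        if word in word_vocab:
            word_stream.append(word)
        else:
            word_stream.append(unk)
            # A character-level network would encode/generate this spelling.
            char_streams[pos] = list(word) + ["</w>"]
    return word_stream, char_streams

vocab = {"the", "cat", "sat"}
print(hybrid_segment("the quokka sat", vocab))
# (['the', '<unk>', 'sat'], {1: ['q', 'u', 'o', 'k', 'k', 'a', '</w>']})
```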

  19. [Architecture diagram: word-level model (4 layers); 8 stacked LSTM layers; end-to-end training]

  20. Effects of Vocabulary Sizes [Bar chart: BLEU vs. vocabulary size (1K, 10K, 20K, 50K) for word-based, word + copy mechanism, and hybrid models; hybrid gains over the word-based model: +11.4 BLEU at 1K, +4.5 at 10K, +3.5 at 20K, +2.1 at 50K] More than +2.0 BLEU over the copy mechanism!

  21. Rare Word Embeddings

  22. TRAINING WITH MONOLINGUAL DATA • Improving Neural Machine Translation Models with Monolingual Data Rico Sennrich, Barry Haddow, Alexandra Birch, ACL 2016 • Back-translate monolingual data (with a reverse-direction NMT model) • Use the back-translated data as parallel training data (see the sketch below)
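Schematically, the procedure looks like this (a sketch; `reverse_model_translate` is an assumed callable standing in for a trained target-to-source NMT system):

```python
def backtranslate_corpus(tgt_mono, reverse_model_translate):
    """Turn monolingual target-side text into synthetic parallel data.

    reverse_model_translate: assumed here to be a trained target->source
    NMT model exposed as a callable from one target sentence to one
    source sentence.
    """
    synthetic_pairs = []
    for tgt_sentence in tgt_mono:
        src_sentence = reverse_model_translate(tgt_sentence)  # back-translation
        # The synthetic source pairs with the *real* target sentence, so the
        # decoder always trains on clean, human-written target-side text.
        synthetic_pairs.append((src_sentence, tgt_sentence))
    return synthetic_pairs

# Mix the synthetic pairs with real bitext and continue training the
# source->target model on the combined data.
```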
