Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English

Maha Elbayad (1,2), Michael Ustaszewski (3), Emmanuelle Esperança-Rodier (1), Francis Brunet-Manquat (1), Jakob Verbeek (4), Laurent Besacier (1)

(1) (2) (3) (4)
Outline
1 Introduction to online translation
2 Neural architectures for online NMT
  a Transformer (Vaswani et al. 2017)
  b Pervasive Attention (Elbayad et al. 2018)
3 Automatic evaluation
4 Human evaluation
5 Conclusion
Online Neural Machine Translation

[Figure: two source-target grids over source tokens x_1..x_7 and target tokens y_1..y_8, contrasting offline translation (the full source is read before any target token is produced) with online translation (target tokens are emitted while the source is still being read).]
Wait-k Decoders for Online Translation

With a wait-k decoder, the number of source tokens read before writing target token y_t is

    z_t = min(k + t − 1, |x|),   ∀ t ∈ [1..|y|]

[Figure: source-target grids showing the decoding paths of Wait-1, Wait-3 and Wait-∞ policies over source tokens x_1..x_5 and target tokens y_1..y_5.]

Wait-k or prefix-to-prefix decoding (Dalvi et al. 2018; Ma et al. 2019; Elbayad et al. 2020).
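As a quick illustration (not taken from the slides), the schedule above can be computed directly; the function name and example values are my own:

    def wait_k_schedule(k, src_len, tgt_len):
        """Number of source tokens z_t available before emitting target token y_t,
        i.e. z_t = min(k + t - 1, |x|) for t = 1..|y| (1-indexed, as on the slide)."""
        return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]

    # Wait-3 on a 5-token source and 5-token target -> [3, 4, 5, 5, 5];
    # wait-inf reads the whole source first and degenerates to offline decoding.
    print(wait_k_schedule(3, src_len=5, tgt_len=5))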
Online Transformer

◮ Unidirectional encoder (Elbayad et al. 2020).
  [Figure: encoder states over source tokens x_1..x_6; each state depends only on the current and previous source tokens, so the states of the already-read prefix do not change when the context grows from z_t = 4 to z_{t+1} = 5.]
◮ Masked decoder: the encoder-decoder attention energies are masked w.r.t. z_t, so the decoder state h_{t−1} attends only to encoder states s_1..s_{z_t} (here z_t = 4 out of s_1..s_6).
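To make the masking concrete, here is a hedged sketch (my own simplification, not the authors' implementation) that hides attention energies beyond z_t before the softmax; array shapes and names are assumptions:

    import numpy as np

    def masked_cross_attention(energies, z_t):
        """Mask encoder-decoder attention energies w.r.t. the online context z_t.
        energies: shape (src_len,), raw scores of one decoder step over encoder states.
        z_t:      number of source tokens read so far; positions beyond z_t are hidden."""
        masked = np.where(np.arange(len(energies)) < z_t, energies, -np.inf)
        weights = np.exp(masked - masked.max())   # softmax restricted to visible states
        return weights / weights.sum()

    # With z_t = 4, only encoder states s_1..s_4 receive non-zero attention weight.
    print(masked_cross_attention(np.array([0.2, 1.5, -0.3, 0.7, 2.0, 0.1]), z_t=4))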
The Pervasive Attention Architecture (Elbayad et al. 2018)

[Figure: source and target embeddings are concatenated into a 2D source-target grid; a stack of convolutional feature maps H_0, H_1, ..., H_N is computed over this grid, then aggregated (H_conv, H_out) along the source dimension to produce p(y_1 | y_<1, x), ..., p(y_|y| | y_<|y|, x).]
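A minimal sketch of the concatenated source-target grid that the convolutions operate on; the tensor layout is my own assumption, not the authors' code:

    import numpy as np

    def source_target_grid(src_emb, tgt_emb):
        """Build the 2D joint grid of Pervasive Attention.
        src_emb: (src_len, d) source embeddings; tgt_emb: (tgt_len, d) target embeddings.
        Cell (t, j) of the result concatenates target embedding t and source embedding j."""
        tgt_len, src_len = tgt_emb.shape[0], src_emb.shape[0]
        src_tiled = np.repeat(src_emb[None, :, :], tgt_len, axis=0)  # (tgt_len, src_len, d)
        tgt_tiled = np.repeat(tgt_emb[:, None, :], src_len, axis=1)  # (tgt_len, src_len, d)
        return np.concatenate([src_tiled, tgt_tiled], axis=-1)       # (tgt_len, src_len, 2d)

    grid = source_target_grid(np.random.randn(6, 8), np.random.randn(5, 8))
    print(grid.shape)  # (5, 6, 16)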
Online Pervasive Attention

[Figure: the source-target grid with a 2D causal convolution window W around the current cell (source position z_t, previous target token y_{t−1}), followed by feature aggregation over the source axis to predict y_t.]

+ Masking the future source for unidirectional encoding.
+ The appropriate context size z_t is controlled during aggregation.
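A hedged sketch of the second point (my own simplification, using max-pooling over the source axis; not necessarily the paper's exact aggregation):

    import numpy as np

    def masked_source_aggregation(features, z_t):
        """Aggregate 2D-convolution features over the source axis for one target step.
        features: (src_len, d) features of the current target row.
        z_t:      number of source tokens read; columns beyond z_t are excluded,
                  so the prediction only depends on the available source prefix."""
        visible = features[:z_t]
        return visible.max(axis=0)   # pool over the visible source positions only

    pooled = masked_source_aggregation(np.random.randn(6, 16), z_t=4)
    print(pooled.shape)  # (16,)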
Training and Evaluation Setup

Data
◮ IWSLT'14 De-En and En-De (Cettolo et al. 2014).
◮ Sentences longer than 175 words and pairs with a length ratio above 1.5 are removed (see the filtering sketch after this slide).
◮ The data is tokenized but not lowercased.
◮ The sequences are BPE-segmented (Sennrich et al. 2016) → 32K vocabulary.
◮ Training = 160K, development = 7.3K and test = 6.7K sentence pairs.

Models
◮ For each direction and each architecture, one online and one offline model.
◮ Pervasive Attention (PA) with 14 layers and 7×7 filters (effectively 4×4).
◮ Transformer (TF) small.
◮ Online models are trained with k_train = 7 and evaluated with k_eval = 3.
◮ Greedy decoding for all models.
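A hedged sketch of the length and length-ratio filtering listed above; the thresholds come from the slide, while the function itself is illustrative rather than the authors' preprocessing script:

    def keep_pair(src_tokens, tgt_tokens, max_len=175, max_ratio=1.5):
        """True if a sentence pair survives the filtering described on the slide:
        neither side longer than 175 words, length ratio between sides at most 1.5."""
        len_s, len_t = len(src_tokens), len(tgt_tokens)
        if len_s == 0 or len_t == 0 or max(len_s, len_t) > max_len:
            return False
        return max(len_s, len_t) / min(len_s, len_t) <= max_ratio

    # A 10-word sentence paired with a 16-word one is dropped (ratio 1.6 > 1.5).
    print(keep_pair(["w"] * 10, ["w"] * 16))  # False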