  1. Context-Aware Neural Machine Translation Learns Anaphora Resolution Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov

  2. Do we really need context?

  3. Do we really need context?

  4. Do we really need context? Source: › It has 48 columns.

  5. Do we really need context? Source: › It has 48 columns. What does “it” refer to?

  6. Do we really need context? Source: › It has 48 columns. Possible translations into Russian: › У него 48 колонн. (masculine or neuter) › У нее 48 колонн. (feminine) › У них 48 колонн. (plural)

  7. Do we really need context? Source: › It has 48 columns. What does “columns” mean?

  8. Do we really need context? Source: › It has 48 columns. Possible translations into Russian: › У него/нее/них 48 колонн. › У него/нее/них 48 колонок.

  9. Do we really need context? Context: › Under the cathedral lies the antique chapel. Source: › It has 48 columns. Translation: › У нее 48 колонн.

  10. Recap: antecedent and anaphora resolution. Under the cathedral lies the antique chapel. It has 48 columns. (“the antique chapel” is the antecedent; “It” is the anaphoric pronoun.) Wikipedia: An antecedent is an expression that gives its meaning to a proform (pronoun, pro-verb, pro-adverb, etc.). Anaphora resolution is the problem of resolving references to earlier or later items in the discourse.

  11. Context in Machine Translation. SMT › focused on handling specific phenomena › used special-purpose features ([Le Nagard and Koehn, 2010]; [Hardmeier and Federico, 2010]; [Hardmeier et al., 2015]; [Meyer et al., 2012]; [Gong et al., 2012]; [Carpuat, 2009]; [Tiedemann, 2010]; [Gong et al., 2011])

  12. Context in Machine Translation. SMT › focused on handling specific phenomena › used special-purpose features ([Le Nagard and Koehn, 2010]; [Hardmeier and Federico, 2010]; [Hardmeier et al., 2015]; [Meyer et al., 2012]; [Gong et al., 2012]; [Carpuat, 2009]; [Tiedemann, 2010]; [Gong et al., 2011]) NMT › directly provide context to an NMT system at training time ([Jean et al., 2017]; [Wang et al., 2017]; [Tiedemann and Scherrer, 2017]; [Bawden et al., 2018])

  13. Context in Machine Translation. SMT › focused on handling specific phenomena › used special-purpose features ([Le Nagard and Koehn, 2010]; [Hardmeier and Federico, 2010]; [Hardmeier et al., 2015]; [Meyer et al., 2012]; [Gong et al., 2012]; [Carpuat, 2009]; [Tiedemann, 2010]; [Gong et al., 2011]) NMT › directly provide context to an NMT system at training time ([Jean et al., 2017]; [Wang et al., 2017]; [Tiedemann and Scherrer, 2017]; [Bawden et al., 2018]) › not clear: what kinds of discourse phenomena are successfully handled, and how they are modeled

  14. Our work
      Plan: 1. Model Architecture  2. Overall performance  3. Analysis
      › we introduce a context-aware neural model, which is effective and has a sufficiently simple and interpretable interface between the context and the rest of the translation model
      › we analyze the flow of information from the context and identify pronoun translation as the key phenomenon captured by the model
      › by comparing to automatically predicted or human-annotated coreference relations, we observe that the model implicitly captures anaphora

  15. Context-Aware Model Architecture

  16. Transformer model architecture › start with the Transformer [Vaswani et al., 2017]

  17. Context-aware model architecture › start with the Transformer [Vaswani et al., 2017] › incorporate context information on the encoder side

  18. Context-aware model architecture › start with the Transformer [Vaswani et al., 2017] › incorporate context information on the encoder side › use a separate encoder for context › share first N-1 layers of source and context encoders

  19. Context-aware model architecture › start with the Transformer [Vaswani et al., 2017] › incorporate context information on the encoder side › use a separate encoder for context › share first N-1 layers of source and context encoders › the last layer incorporates contextual information
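
A minimal PyTorch sketch of the encoder set-up described above (not the authors' released code): the source and context encoders share the first N-1 layers, and the last source-encoder layer additionally attends to the encoded context. How the two attention outputs are combined here (a learned gated sum) and all hyperparameters are illustrative assumptions; the decoder is assumed to be a standard Transformer decoder over the final source representation, and padding masks are omitted.

    import torch
    import torch.nn as nn

    class ContextAwareEncoder(nn.Module):
        def __init__(self, d_model=512, nhead=8, num_layers=6):
            super().__init__()
            # first N-1 layers are shared between the source and context encoders
            self.shared = nn.ModuleList(
                [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
                 for _ in range(num_layers - 1)])
            # separate top layers: a plain one for the context; for the source,
            # self-attention plus an extra attention over the context encoder output
            self.ctx_top = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
            self.ctx_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
            self.gate = nn.Linear(2 * d_model, d_model)   # gated sum: illustrative choice
            self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                     nn.Linear(4 * d_model, d_model))
            self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

        def forward(self, src, ctx):
            # src, ctx: (batch, length, d_model) token embeddings + positional encodings
            for layer in self.shared:
                src, ctx = layer(src), layer(ctx)
            ctx = self.ctx_top(ctx)                        # context representation
            s, _ = self.self_attn(src, src, src)           # source self-attention
            c, _ = self.ctx_attn(src, ctx, ctx)            # attention from source to context
            g = torch.sigmoid(self.gate(torch.cat([s, c], dim=-1)))
            src = self.norm1(src + g * s + (1 - g) * c)    # residual + gated combination
            return self.norm2(src + self.ffn(src))         # position-wise feed-forward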

  20. Overall performance Dataset: OpenSubtitles2018 (Lison et al., 2018) for English and Russian
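
As a concrete illustration of the data set-up (context = the previous sentence, as in the comparison that follows), here is a hedged sketch of building (context, source, target) triples; read_parallel_docs is a hypothetical loader, not part of the OpenSubtitles2018 tooling.

    def make_triples(docs):
        """docs: iterable of documents, each an ordered list of
        (english_sentence, russian_sentence) pairs from one film."""
        triples = []
        for doc in docs:
            for i, (src, tgt) in enumerate(doc):
                ctx = doc[i - 1][0] if i > 0 else ""   # previous English sentence, empty for the first
                triples.append((ctx, src, tgt))
        return triples

    # usage with the hypothetical loader:
    # triples = make_triples(read_parallel_docs("OpenSubtitles2018.en-ru"))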

  21. Overall performance: models comparison (context is the previous sentence)
      › baseline: context-agnostic Transformer (BLEU 29.46)
      › concatenation: modification of the approach by [Tiedemann and Scherrer, 2017] (BLEU 29.53)
      › context encoder (our work) (BLEU 30.14)

  22. Our model: different types of context (BLEU: baseline 29.46, previous sentence 30.14, next sentence 29.31, random sentence 29.69)
      › Next sentence does not appear beneficial
      › Performance drops for a random context sentence
      › Model is robust towards being shown a random context sentence
      (the only difference significant at p < 0.01 is with the best model; differences between other results are not significant)

  23. Analysis

  24. Our work
      Analysis: 1. Top words influenced by context  2. Non-lexical patterns affecting attention to context  3. Latent anaphora resolution
      › we introduce a context-aware neural model, which is effective and has a sufficiently simple and interpretable interface between the context and the rest of the translation model
      › we analyze the flow of information from the context and identify pronoun translation as the key phenomenon captured by the model
      › by comparing to automatically predicted or human-annotated coreference relations, we observe that the model implicitly captures anaphora

  25. What do we mean by “attention to context”? › attention from source to context › mean over heads of per-head attention weights

  26. What do we mean by “attention to context”? › attention from source to context › mean over heads of per-head attention weights › take sum over context words (excluding <bos>, <eos> and punctuation)
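
A hedged sketch of the quantity defined on these two slides, assuming the per-head source-to-context attention weights are available as a NumPy array: average over heads, then sum each source token's attention mass over context words, excluding <bos>, <eos> and punctuation.

    import string
    import numpy as np

    EXCLUDED = {"<bos>", "<eos>"} | set(string.punctuation)

    def attention_to_context(attn, ctx_tokens):
        """attn: (num_heads, src_len, ctx_len) source-to-context attention weights;
        ctx_tokens: context-side tokens. Returns one score per source token."""
        mean_attn = attn.mean(axis=0)                  # average over heads -> (src_len, ctx_len)
        keep = np.array([tok not in EXCLUDED for tok in ctx_tokens], dtype=float)
        return (mean_attn * keep).sum(axis=1)          # mass on non-punctuation context words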

  27. Top words influenced by context (word, average position in the sentence): it 5.5, yours 8.4, yes 2.5, i 3.3, yeah 1.4, you 4.8, ones 8.3, ‘m 5.1, wait 3.8, well 2.1

  28. Top words influenced by context (table as on slide 27). “it” is third person; its Russian translation can be singular masculine, singular feminine, singular neuter, or plural.

  29. Top words influenced by context (table as on slide 27). “you” is second person; its Russian translation can be singular impolite, singular polite, or plural.

  30. Top words influenced by context (table as on slide 27). For “i” we need to know the gender, because verbs must agree in gender with “I” (in past tense).

  31. Top words influenced by context (table as on slide 27). Many of these words appear at sentence-initial position. Maybe this is all that matters?

  32. Top words influenced by context, counting only positions after the first (word, average position): it 6.8, yours 8.3, ones 7.5, ‘m 4.8, you 5.6, am 4.4, i 5.2, ‘s 5.6, one 6.5, won 4.6
      (for comparison, all positions: it 5.5, yours 8.4, yes 2.5, i 3.3, yeah 1.4, you 4.8, ones 8.3, ‘m 5.1, wait 3.8, well 2.1)
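
A hedged sketch of how such a table could be produced, assuming the words are ranked by their average attention-to-context score (the quantity defined on slides 25-26; attention_to_context is the sketch given there) and that “pos” is the word's average position in the sentence; min_position=1 reproduces the “only positions after the first” variant. Details may differ from the authors' exact procedure.

    from collections import defaultdict

    def top_words_by_context_attention(examples, min_position=0, top_k=10):
        """examples: iterable of (src_tokens, ctx_tokens, attn) triples, with attn
        shaped (num_heads, src_len, ctx_len). Returns (word, average position)
        pairs ranked by average attention to context."""
        scores, positions = defaultdict(list), defaultdict(list)
        for src_tokens, ctx_tokens, attn in examples:
            per_token = attention_to_context(attn, ctx_tokens)  # sketch from slide 26
            for pos, (word, score) in enumerate(zip(src_tokens, per_token), start=1):
                if pos <= min_position:      # min_position=1 skips sentence-initial tokens
                    continue
                scores[word].append(float(score))
                positions[word].append(pos)
        ranked = sorted(scores, key=lambda w: sum(scores[w]) / len(scores[w]), reverse=True)
        return [(w, round(sum(positions[w]) / len(positions[w]), 1)) for w in ranked[:top_k]]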

  33. Does the amount of attention to context depend on factors such as sentence length and position?

  34. Dependence on sentence length

  35. Dependence on sentence length: a short source sentence with a long context gets high attention to context

  36. Dependence on sentence length: a long source sentence with a short context gets low attention to context

  37. Is context especially helpful for short sentences?

  38. Dependence on token position

  39. Analysis of pronoun translation

  40. Ambiguous pronouns and translation quality: how to evaluate Metric: BLEU (standard metric for MT) Specific test sets: › feed CoreNLP (Manning et al., 2014) with pairs of sentences › pick examples with a link between the pronoun and a noun group in a context › gather a test set for each pronoun › use the test sets to evaluate the context-aware NMT system
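
A hedged sketch of this test-set construction. The coref_chains argument stands in for a call to CoreNLP's coreference annotator (a hypothetical wrapper, not a real CoreNLP API); it is assumed to return, for a (context, source) sentence pair, coreference chains as lists of (sentence_index, mention_text, is_noun_phrase) tuples.

    AMBIGUOUS_PRONOUNS = {"it", "you", "i"}

    def build_pronoun_test_sets(examples, coref_chains):
        """examples: iterable of (context, source, target) triples (English context
        and source, Russian target). Returns one test set per pronoun."""
        test_sets = {p: [] for p in AMBIGUOUS_PRONOUNS}
        for ctx, src, tgt in examples:
            for chain in coref_chains(ctx, src):           # hypothetical coreference call
                in_src = [m for m in chain if m[0] == 1]   # mentions in the source sentence
                in_ctx = [m for m in chain if m[0] == 0]   # mentions in the context sentence
                for _, text, _ in in_src:
                    pron = text.lower()
                    # keep the example if an ambiguous pronoun in the source is linked
                    # to a noun group in the context sentence
                    if pron in AMBIGUOUS_PRONOUNS and any(is_np for _, _, is_np in in_ctx):
                        test_sets[pron].append((ctx, src, tgt))
        return test_sets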

  41. Ambiguous pronouns and translation quality: noun antecedent
      [Bar chart, BLEU for baseline vs. context-aware on the pronoun test sets (it, you, I): “it” improves from 23.9 to 26.1 (+2.2); the other two test sets improve by +0.6 (29.1 → 29.7) and +1.8 (29.9 → 31.7)]

  42. Ambiguous “it”: noun antecedent
      [Bar chart, BLEU for baseline vs. context-aware, by Russian gender/number of the translation: masculine 26.9 → 27.2 (+0.3), feminine 21.8 → 26.6 (+4.8), neuter 22.1 → 24.0 (+1.9), plural 18.2 → 22.5 (+4.3)]

  43. “It” with noun antecedent: example Source: › It was locked up in the hold with 20 other boxes of supplies. Possible translations into Russian: › Он был заперт в трюме с 20 другими ящиками с припасами. (masculine) › Оно было заперто в трюме с 20 другими ящиками с припасами. (neuter) › Она была заперта в трюме с 20 другими ящиками с припасами. (feminine) › Они были заперты в трюме с 20 другими ящиками с припасами. (plural)
