  1. Context-Aware Neural Machine Translation Learns Anaphora Resolution Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov

  2. Do we really need context?

  3. Do we really need context?

  4. Do we really need context? Source: › It has 48 columns.

  5. Do we really need context? Source: › It has 48 columns. What does “it” refer to?

  6. Do we really need context? Source: › It has 48 columns. Possible translations into Russian: › У него 48 колонн. (masculine or neuter) › У нее 48 колонн. (feminine) › У них 48 колонн. (plural)

  7. Do we really need context? Source: › It has 48 columns. What does “columns” mean?

  8. Do we really need context? Source: › It has 48 columns. Possible translations into Russian: › У него/нее/них 48 колонн. › У него/нее/них 48 колонок.

  9. Do we really need context? Context: › Under the cathedral lies the antique chapel. Source: › It has 48 columns. Translation: › У нее 48 колонн.

  10. Recap: antecedent and anaphora resolution. Under the cathedral lies the antique chapel. It has 48 columns. (“the antique chapel” is the antecedent; “It” is the anaphoric pronoun.) Wikipedia: An antecedent is an expression that gives its meaning to a proform (pronoun, pro-verb, pro-adverb, etc.). Anaphora resolution is the problem of resolving references to earlier or later items in the discourse.

  11. Context in Machine Translation. SMT › focused on handling specific phenomena › used special-purpose features ([Le Nagard and Koehn, 2010]; [Hardmeier and Federico, 2010]; [Hardmeier et al., 2015]; [Meyer et al., 2012]; [Gong et al., 2012]; [Carpuat, 2009]; [Tiedemann, 2010]; [Gong et al., 2011])

  12. Context in Machine Translation. SMT › focused on handling specific phenomena › used special-purpose features ([Le Nagard and Koehn, 2010]; [Hardmeier and Federico, 2010]; [Hardmeier et al., 2015]; [Meyer et al., 2012]; [Gong et al., 2012]; [Carpuat, 2009]; [Tiedemann, 2010]; [Gong et al., 2011]) NMT › directly provide context to an NMT system at training time ([Jean et al., 2017]; [Wang et al., 2017]; [Tiedemann and Scherrer, 2017]; [Bawden et al., 2018])

  13. Context in Machine Translation. SMT › focused on handling specific phenomena › used special-purpose features ([Le Nagard and Koehn, 2010]; [Hardmeier and Federico, 2010]; [Hardmeier et al., 2015]; [Meyer et al., 2012]; [Gong et al., 2012]; [Carpuat, 2009]; [Tiedemann, 2010]; [Gong et al., 2011]) NMT › directly provide context to an NMT system at training time ([Jean et al., 2017]; [Wang et al., 2017]; [Tiedemann and Scherrer, 2017]; [Bawden et al., 2018]) › not clear: what kinds of discourse phenomena are successfully handled, and how they are modeled

  14. Our work
      Plan: 1. Model Architecture  2. Overall performance  3. Analysis
      › we introduce a context-aware neural model, which is effective and has a sufficiently simple and interpretable interface between the context and the rest of the translation model
      › we analyze the flow of information from the context and identify pronoun translation as the key phenomenon captured by the model
      › by comparing to automatically predicted or human-annotated coreference relations, we observe that the model implicitly captures anaphora

  15. Context-Aware Model Architecture

  16. Transformer model architecture › start with the Transformer [Vaswani et al., 2017]

  17. Context-aware model architecture › start with the Transformer [Vaswani et al., 2017] › incorporate context information on the encoder side

  18. Context-aware model architecture › start with the Transformer [Vaswani et al., 2017] › incorporate context information on the encoder side › use a separate encoder for context › share first N-1 layers of source and context encoders

  19. Context-aware model architecture › start with the Transformer [Vaswani et al., 2017] › incorporate context information on the encoder side › use a separate encoder for context › share first N-1 layers of source and context encoders › the last layer incorporates contextual information
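
A minimal PyTorch sketch of the encoder set-up described above (not the authors' released code): the source and context encoders share the first N-1 layers, and the last source-encoder layer additionally attends to the encoded context. How the two attention outputs are combined here (a learned gated sum) and all hyperparameters are illustrative assumptions; the decoder is assumed to be a standard Transformer decoder over the final source representation, and padding masks are omitted.

    import torch
    import torch.nn as nn

    class ContextAwareEncoder(nn.Module):
        def __init__(self, d_model=512, nhead=8, num_layers=6):
            super().__init__()
            # first N-1 layers are shared between the source and context encoders
            self.shared = nn.ModuleList(
                [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
                 for _ in range(num_layers - 1)])
            # separate top layers: a plain one for the context; for the source,
            # self-attention plus an extra attention over the context encoder output
            self.ctx_top = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
            self.ctx_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
            self.gate = nn.Linear(2 * d_model, d_model)   # gated sum: illustrative choice
            self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                     nn.Linear(4 * d_model, d_model))
            self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

        def forward(self, src, ctx):
            # src, ctx: (batch, length, d_model) token embeddings + positional encodings
            for layer in self.shared:
                src, ctx = layer(src), layer(ctx)
            ctx = self.ctx_top(ctx)                        # context representation
            s, _ = self.self_attn(src, src, src)           # source self-attention
            c, _ = self.ctx_attn(src, ctx, ctx)            # attention from source to context
            g = torch.sigmoid(self.gate(torch.cat([s, c], dim=-1)))
            src = self.norm1(src + g * s + (1 - g) * c)    # residual + gated combination
            return self.norm2(src + self.ffn(src))         # position-wise feed-forward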

  20. Overall performance Dataset: OpenSubtitles2018 (Lison et al., 2018) for English and Russian
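
As a concrete illustration of the data set-up (context = the previous sentence, as in the comparison that follows), here is a hedged sketch of building (context, source, target) triples; read_parallel_docs is a hypothetical loader, not part of the OpenSubtitles2018 tooling.

    def make_triples(docs):
        """docs: iterable of documents, each an ordered list of
        (english_sentence, russian_sentence) pairs from one film."""
        triples = []
        for doc in docs:
            for i, (src, tgt) in enumerate(doc):
                ctx = doc[i - 1][0] if i > 0 else ""   # previous English sentence, empty for the first
                triples.append((ctx, src, tgt))
        return triples

    # usage with the hypothetical loader:
    # triples = make_triples(read_parallel_docs("OpenSubtitles2018.en-ru"))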

  21. Overall performance: models comparison (context is the previous sentence)
      › baseline: context-agnostic Transformer (BLEU 29.46)
      › concatenation: modification of the approach by [Tiedemann and Scherrer, 2017] (BLEU 29.53)
      › context encoder (our work) (BLEU 30.14)

  22. Our model: different types of context (BLEU: baseline 29.46, previous sentence 30.14, next sentence 29.31, random sentence 29.69)
      › Next sentence does not appear beneficial
      › Performance drops for a random context sentence
      › Model is robust towards being shown a random context sentence
      (the only difference significant at p < 0.01 is with the best model; differences between other results are not significant)

  23. Analysis

  24. Our work
      Analysis: 1. Top words influenced by context  2. Non-lexical patterns affecting attention to context  3. Latent anaphora resolution
      › we introduce a context-aware neural model, which is effective and has a sufficiently simple and interpretable interface between the context and the rest of the translation model
      › we analyze the flow of information from the context and identify pronoun translation as the key phenomenon captured by the model
      › by comparing to automatically predicted or human-annotated coreference relations, we observe that the model implicitly captures anaphora

  25. What do we mean by “attention to context”? › attention from source to context › mean over heads of per-head attention weights

  26. What do we mean by “attention to context”? › attention from source to context › mean over heads of per-head attention weights › take sum over context words (excluding <bos>, <eos> and punctuation)
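
A hedged sketch of the quantity defined on these two slides, assuming the per-head source-to-context attention weights are available as a NumPy array: average over heads, then sum each source token's attention mass over context words, excluding <bos>, <eos> and punctuation.

    import string
    import numpy as np

    EXCLUDED = {"<bos>", "<eos>"} | set(string.punctuation)

    def attention_to_context(attn, ctx_tokens):
        """attn: (num_heads, src_len, ctx_len) source-to-context attention weights;
        ctx_tokens: context-side tokens. Returns one score per source token."""
        mean_attn = attn.mean(axis=0)                  # average over heads -> (src_len, ctx_len)
        keep = np.array([tok not in EXCLUDED for tok in ctx_tokens], dtype=float)
        return (mean_attn * keep).sum(axis=1)          # mass on non-punctuation context words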

  27. Top words influenced by context (word, average position in the sentence): it 5.5, yours 8.4, yes 2.5, i 3.3, yeah 1.4, you 4.8, ones 8.3, ‘m 5.1, wait 3.8, well 2.1

  28. Top words influenced by context (table as on slide 27). “it” is third person; its Russian translation can be singular masculine, singular feminine, singular neuter, or plural.

  29. Top words influenced by context (table as on slide 27). “you” is second person; its Russian translation can be singular impolite, singular polite, or plural.

  30. Top words influenced by context (table as on slide 27). For “i” we need to know the gender, because verbs must agree in gender with “I” (in past tense).

  31. Top words influenced by context (table as on slide 27). Many of these words appear at sentence-initial position. Maybe this is all that matters?

  32. Top words influenced by context, counting only positions after the first (word, average position): it 6.8, yours 8.3, ones 7.5, ‘m 4.8, you 5.6, am 4.4, i 5.2, ‘s 5.6, one 6.5, won 4.6
      (for comparison, all positions: it 5.5, yours 8.4, yes 2.5, i 3.3, yeah 1.4, you 4.8, ones 8.3, ‘m 5.1, wait 3.8, well 2.1)
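
A hedged sketch of how such a table could be produced, assuming the words are ranked by their average attention-to-context score (the quantity defined on slides 25-26; attention_to_context is the sketch given there) and that “pos” is the word's average position in the sentence; min_position=1 reproduces the “only positions after the first” variant. Details may differ from the authors' exact procedure.

    from collections import defaultdict

    def top_words_by_context_attention(examples, min_position=0, top_k=10):
        """examples: iterable of (src_tokens, ctx_tokens, attn) triples, with attn
        shaped (num_heads, src_len, ctx_len). Returns (word, average position)
        pairs ranked by average attention to context."""
        scores, positions = defaultdict(list), defaultdict(list)
        for src_tokens, ctx_tokens, attn in examples:
            per_token = attention_to_context(attn, ctx_tokens)  # sketch from slide 26
            for pos, (word, score) in enumerate(zip(src_tokens, per_token), start=1):
                if pos <= min_position:      # min_position=1 skips sentence-initial tokens
                    continue
                scores[word].append(float(score))
                positions[word].append(pos)
        ranked = sorted(scores, key=lambda w: sum(scores[w]) / len(scores[w]), reverse=True)
        return [(w, round(sum(positions[w]) / len(positions[w]), 1)) for w in ranked[:top_k]]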

  33. Does the amount of attention to context depend on factors such as sentence length and position?

  34. Dependence on sentence length

  35. Dependence on sentence length: a short source sentence with a long context gets high attention to context

  36. Dependence on sentence length: a long source sentence with a short context gets low attention to context

  37. Is context especially helpful for short sentences?

  38. Dependence on token position

  39. Analysis of pronoun translation

  40. Ambiguous pronouns and translation quality: how to evaluate Metric: BLEU (standard metric for MT) Specific test sets: › feed CoreNLP (Manning et al., 2014) with pairs of sentences › pick examples with a link between the pronoun and a noun group in a context › gather a test set for each pronoun › use the test sets to evaluate the context-aware NMT system
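
A hedged sketch of this test-set construction. The coref_chains argument stands in for a call to CoreNLP's coreference annotator (a hypothetical wrapper, not a real CoreNLP API); it is assumed to return, for a (context, source) sentence pair, coreference chains as lists of (sentence_index, mention_text, is_noun_phrase) tuples.

    AMBIGUOUS_PRONOUNS = {"it", "you", "i"}

    def build_pronoun_test_sets(examples, coref_chains):
        """examples: iterable of (context, source, target) triples (English context
        and source, Russian target). Returns one test set per pronoun."""
        test_sets = {p: [] for p in AMBIGUOUS_PRONOUNS}
        for ctx, src, tgt in examples:
            for chain in coref_chains(ctx, src):           # hypothetical coreference call
                in_src = [m for m in chain if m[0] == 1]   # mentions in the source sentence
                in_ctx = [m for m in chain if m[0] == 0]   # mentions in the context sentence
                for _, text, _ in in_src:
                    pron = text.lower()
                    # keep the example if an ambiguous pronoun in the source is linked
                    # to a noun group in the context sentence
                    if pron in AMBIGUOUS_PRONOUNS and any(is_np for _, _, is_np in in_ctx):
                        test_sets[pron].append((ctx, src, tgt))
        return test_sets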

  41. Ambiguous pronouns and translation quality: noun antecedent
      [Bar chart, BLEU for baseline vs. context-aware on the pronoun test sets (it, you, I): “it” improves from 23.9 to 26.1 (+2.2); the other two test sets improve by +0.6 (29.1 → 29.7) and +1.8 (29.9 → 31.7)]

  42. Ambiguous “it”: noun antecedent
      [Bar chart, BLEU for baseline vs. context-aware, by Russian gender/number of the translation: masculine 26.9 → 27.2 (+0.3), feminine 21.8 → 26.6 (+4.8), neuter 22.1 → 24.0 (+1.9), plural 18.2 → 22.5 (+4.3)]

  43. “It” with noun antecedent: example Source: › It was locked up in the hold with 20 other boxes of supplies. Possible translations into Russian: › Он был заперт в трюме с 20 другими ящиками с припасами. (masculine) › Оно было заперто в трюме с 20 другими ящиками с припасами. (neuter) › Она была заперта в трюме с 20 другими ящиками с припасами. (feminine) › Они были заперты в трюме с 20 другими ящиками с припасами. (plural)
