


  1. Using Coreference Links to Improve Spanish-to-English Machine Translation Lesly Miculicich Andrei Popescu-Belis

  2. Content 1. Introduction 2. Coreference aware machine translation 3. Experiments and results 4. Conclusion

  3. Content 1. Introduction 2. Coreference aware machine translation 3. Experiments and results 4. Conclusion

  4. Motivation Source: When she ran down, the left slipper remained stuck in the stairs, it was small and dainty. MT (into French): Quand elle a couru, la pantoufle gauche est restée coincée dans les escaliers, il était petit et délicat. (The masculine pronoun « il » does not agree with its feminine antecedent « la pantoufle ».)

  5. Motivation Source: Pertenezco a un partido político respetable. – ¿Qué partido? Reference: I belong to a respectable political party. – Which party? MT: I belong to a respectable political party. – What a match? (The MT picks the sports sense of 'partido', missing the coreference with 'political party'.)

  6. Machine Translation (MT) f_best = argmax_f p(f | g), where f = (f_1, f_2, …, f_n) is the sentence in the target language and g = (g_1, g_2, …, g_m) is the sentence in the source language.

  7. Machine Translation (MT) • Approaches: • PBSMT: phrase-based statistical machine translation • NMT: neural machine translation • Evaluation is done by comparing against a human reference translation. Common metric: • BLEU: n-gram precision
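BLEU's core ingredient is clipped n-gram precision. A minimal sketch of that computation (our own illustration, not the full metric: single reference, no brevity penalty, no averaging over n):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate translation against a
    single reference, as used inside BLEU (sketch only)."""
    cand_ngrams = Counter(tuple(candidate[i:i + n])
                          for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n])
                         for i in range(len(reference) - n + 1))
    # each candidate n-gram counts only up to its frequency in the reference
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return clipped / total if total else 0.0

cand = "I belong to a respectable political party".split()
ref = "I belong to a respectable political party".split()
print(ngram_precision(cand, ref, 2))  # identical sentences -> 1.0
```

Clipping prevents a candidate from being rewarded for repeating a reference word: "the the the" against "the cat" gets unigram precision 1/3, not 1.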

  8. Coreference Resolution • Linking or grouping mentions that refer to the same entity in a text. • Mentions: nouns, pronouns, noun phrases, … • Entities: people, objects, places, … • Links: coreference links, mention clusters, mention chains, … • Evaluation is done by comparing against ground truth. Common metrics: • MUC: number of links to be inserted or deleted. • B³: precision and recall at cluster level for each mention. • CEAF: precision and recall at cluster level for each entity.
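As an illustration of the B³ metric listed above, a small sketch computing mention-level precision and recall from gold ("key") and predicted ("response") clusters (function and variable names are our own; it assumes both sides cover the same mentions):

```python
def b_cubed(key_clusters, response_clusters):
    """B^3 (sketch): for every mention, compare the overlap between its
    gold (key) cluster and its predicted (response) cluster, then
    average precision and recall over all mentions."""
    key_of = {m: frozenset(c) for c in key_clusters for m in c}
    resp_of = {m: frozenset(c) for c in response_clusters for m in c}
    mentions = key_of.keys() & resp_of.keys()
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m])
                    for m in mentions) / len(mentions)
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m])
                 for m in mentions) / len(mentions)
    return precision, recall

# gold groups three mentions of the party; the system splits them
key = [{"party1", "party2", "which"}, {"I"}]
resp = [{"party1"}, {"party2", "which"}, {"I"}]
p, r = b_cubed(key, resp)  # splitting hurts recall, not precision
```

Splitting a gold cluster lowers recall while leaving precision at 1.0; merging unrelated clusters would do the opposite.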

  9. Content 1. Introduction 2. Coreference aware machine translation 3. Experiments and results 4. Conclusion

  10. Coreference-aware MT ▪ State-of-the-art ▪ Contribution [Diagram: a coreference resolver is combined with the machine translator into a coreference-aware MT system that maps a source document to a target document.] Objective: improve the translation of documents by including coreference constraints.

  11. Coreference in translation
      Source (Spanish) [1]: La película narra la historia de [un joven parisiense]_c1 que marcha a Rumanía en busca de [una cantante zíngara]_c2, ya que [su]_c1 fallecido padre escuchaba siempre [sus]_c2 canciones. Pudiera considerarse un viaje fallido, porque [∅]_c1 no encuentra [su]_c1 objetivo, pero el azar [le]_c1 conduce a una pequeña comunidad...
      Human translation [2]: The film tells the story of [a young Parisian]_c1 who goes to Romania in search of [a gypsy singer]_c2, as [his]_c1 deceased father used to listen to [her]_c2 songs. It could be considered a failed journey, because [he]_c1 does not find [his]_c1 objective, but the fate leads [him]_c1 to a small community...
      Machine translation [2,3]: The film tells the story of [a young Parisian]_c1 who goes to Romania in search of [a gypsy singer]_c2, as [his]_c2 deceased father always listened to [his]_c2 songs. It could be considered [a failed trip]_c3 because [it]_c3 does not find [its]_c3 objective, but the chance leads ∅ to a small community...
      [1] Example from AnCora-CO with manual annotation of coreferences. [2] Automatic coreference resolution with Stanford CoreNLP (http://stanfordnlp.github.io/CoreNLP/coref.html). [3] Translation with a free online NMT.

  12. Defining a coreference similarity score Given a source document d_s and its translation d_t: 1. Apply a coreference resolver on both sides. 2. Find alignments of mentions. 3. Calculate MUC, B³, and CEAF, with one side acting as ground truth and the other as the evaluated document.
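Of the metrics computed in step 3, MUC is the easiest to sketch: it counts how many coreference links must change to turn one partition of the mentions into the other. A rough illustration (our own code and names; the experiments use the official CoNLL-2012 scorer, not this):

```python
def muc_f1(key_clusters, response_clusters):
    """MUC F1 (sketch): link-based agreement between the gold (key)
    and predicted (response) partitions of the mentions."""
    def link_score(gold, pred):
        # for each gold cluster, links recovered = |cluster| minus the
        # number of pred clusters (or singletons) it is split into
        pred_of = {m: i for i, c in enumerate(pred) for m in c}
        num = den = 0
        for c in gold:
            partitions = {pred_of.get(m, ("singleton", m)) for m in c}
            num += len(c) - len(partitions)
            den += len(c) - 1
        return num / den if den else 0.0
    r = link_score(key_clusters, response_clusters)
    p = link_score(response_clusters, key_clusters)
    return 2 * p * r / (p + r) if p + r else 0.0

# the system splits one gold cluster of three mentions in two
print(muc_f1([{"a", "b", "c"}], [{"a", "b"}, {"c"}]))
```

Swapping the roles of the two partitions turns the recall computation into precision, which is why `link_score` is called twice.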

  13. Empirical verification
                          BLEU   MUC   B³   CEAF
      Human translation     -     37    32    41
      Commercial NMT      49.7    28    26    36
      Baseline PBSMT      43.4    23    24    33
      (BLEU reflects translation quality; MUC, B³, and CEAF reflect coreference quality. Values of F1 in %.)
      • Data: 3K words from AnCora-CO with manual annotation of coreferences.
      • Automatic coreference resolution with Stanford CoreNLP (http://stanfordnlp.github.io/CoreNLP/coref.html)
      • Implementation of metrics from CoNLL 2012 (http://conll.cemantix.org/2012/)

  14. Proposed approaches 1. Re-ranking of n-best sentences  Changes at sentence level  Scoring at document level 2. Post-editing of mentions  Changes at mention level  Scoring at cluster level

  15. Re-ranking [Diagram: for each sentence 1…N of the source document d_s, the MT system produces an n-best list of hypotheses hyp_j^i (hypothesis i of sentence j); the translation d_t is built by picking one hypothesis per sentence.]

  16. Re-ranking [Diagram: the baseline translation d_t produced by the MT system takes the top hypothesis hyp_j^1 of every sentence.]

  17. Re-ranking C_sim = (MUC + B³ + CEAF) / 3; the re-ranker selects argmax C_sim(d_t, d_s) over candidate translations d_t of the source document d_s. [Diagram: the n-best grid of hypotheses hyp_j^i, one column per sentence 1…N.]

  18. Re-ranking C_sim = (MUC + B³ + CEAF) / 3; select argmax C_sim(d_t, d_s). [Diagram: after re-ranking, the selected hypothesis for each sentence need not be the top one.] ✓ Remove candidate sentences with the same set of mentions. ✓ Beam search
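The beam search over per-sentence hypotheses can be sketched as follows, assuming a document-level scoring function `doc_score` (a stand-in for C_sim, which in the real system requires running the coreference resolver on each candidate document; all names here are our own):

```python
def rerank(nbest, doc_score, beam=4):
    """Beam-search re-ranking (sketch): pick one hypothesis per sentence
    so that the document-level score is maximised.
    `nbest` is a list of per-sentence hypothesis lists."""
    beams = [([], 0.0)]  # (partial document, score)
    for hyps in nbest:
        candidates = []
        for chosen, _ in beams:
            for h in hyps:
                doc = chosen + [h]
                candidates.append((doc, doc_score(doc)))
        # keep only the `beam` highest-scoring partial documents
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam]
    return beams[0][0]

nbest = [["he ran", "it ran"], ["he fell", "she fell"]]
# toy document score: reward consistent use of "he" across sentences
score = lambda doc: sum(s.split().count("he") for s in doc)
print(rerank(nbest, score))  # -> ['he ran', 'he fell']
```

The pruning noted on the slide (dropping hypotheses with the same set of mentions) would shrink each `hyps` list before the loop, since such hypotheses cannot change the coreference score.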

  19. Re-ranking ✓ Optimization at document level. ✓ Simple to use with an MT system.  Not all mentions in a sentence can be optimized at the same time.  Need to run the coreference resolver at each step.

  20. Post-editing Given a source document d_s and its translation d_t: 1. Apply a coreference resolver on the source side. 2. Find the translation hypotheses of the mentions on the target side. 3. For each cluster: select the hypotheses that are most likely to refer to the same entity.

  21. Post-editing C_score(c_i): likelihood that all mentions in cluster c_i refer to the same entity; select argmax C_score(c_i). [Diagram: for each mention 1…M of a source cluster c_i, the MT system provides an n-best list of translation hypotheses hyp_j^i.]

  22. Post-editing Cluster score: C_score(c_i) = λ1·C + λ2·E + λ3·T, with Σ_j λ_j = 1, where C scores the elements in the cluster, E the entity representation, and T the translation frequency from the source.
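A greedy stand-in for this cluster-level selection: for each source cluster, choose for every mention the n-best hypothesis that agrees most often with the other mentions' hypotheses. This is a simplification of the λ-weighted score above, using only cluster-wide frequency (names and data are our own):

```python
from collections import Counter

def post_edit_cluster(mention_hypotheses):
    """Post-editing sketch: for one source cluster, pick for each mention
    the translation hypothesis supported most by the rest of the cluster."""
    # frequency of each hypothesis string across the whole cluster
    freq = Counter(h for hyps in mention_hypotheses for h in hyps)
    # for each mention, keep its hypothesis most frequent cluster-wide
    return [max(hyps, key=lambda h: freq[h]) for hyps in mention_hypotheses]

# "partido" should become "party", not "match", because the other
# mentions in the cluster support "party"
cluster = [["party"], ["match", "party", "it"], ["which", "that"]]
print(post_edit_cluster(cluster))  # -> ['party', 'party', 'which']
```

Ties (here "which" vs. "that") fall back to the MT system's original n-best order, which is why each mention's list is kept in that order.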

  23. Post-editing Source cluster c1: [partido político] [fue] [partido] [que]. N-best translation hypotheses by the MT system: 'partido político' → {political party}; 'fue' → {was, it was, he was, she was}; 'partido' → {match, party}; 'que' → {that, which, who}.

  24. Post-editing Source cluster c1, with the mentions' hypothesis lists reordered by number of options before selection: 'partido político' → {political party}; 'partido' → {match, party}; 'que' → {that, which, who}; 'fue' → {was, it was, he was, she was}.
