Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited

  1. Using word alignments to assist computer-aided translation users by marking which target-side words to change or keep unedited
     Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada
     {mespla,fsanchez,mlf}@dlsi.ua.es
     Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071 Alacant, Spain
     15th Annual Conference of the EAMT, Leuven, May 30, 2011

  2. Outline
     1 Introduction
     2 Related Work
     3 Methodology
     4 Experiments and Results
     5 Conclusion
     6 Current and future Work

  7. Translation Memories
     Translation units (English s_i, Catalan t_i):
     s_1: European Association for Machine Translation / t_1: Associació Europea per a la Traducció Automàtica
     s_2: The EAMT is a member of the IAMT / t_2: L’EAMT és membre de l’IAMT
     s_3: current year’s conference is held in Leuven / t_3: el congrés d’enguany se celebra a Lovaina
     ...
     New sentence s′: The AMTA is a member of the IAMT
     Best match s_2: The EAMT is a member of the IAMT
     Proposal t_2: L’EAMT és membre de l’IAMT

  9. Fuzzy Matching Scores
     Fuzzy matching scores measure the similarity between segments s′ (segment to be translated) and s_i (matching segment in the translation memory):
     $\mathrm{score}(s', s_i) = 1 - \frac{\mathrm{EditDistance}(s', s_i)}{\max(|s'|, |s_i|)}$
     Example
     s′: The Association for Machine Translation in the Americas is the American branch of the IAMT
     s_i: The European Association for Machine Translation is a member of the IAMT
     $\mathrm{score}(s', s_i) = 1 - \frac{7}{15} \simeq 0.53$
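
The score on the slide above can be reproduced with a short sketch. It assumes a word-level Levenshtein distance over whitespace-separated, case-sensitive tokens; the slides do not state the tokenization, and the function names `edit_distance` and `fuzzy_match_score` are illustrative only.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_match_score(s_new, s_tm):
    """score(s', s_i) = 1 - EditDistance(s', s_i) / max(|s'|, |s_i|)."""
    a, b = s_new.split(), s_tm.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty segments count as identical
    return 1 - edit_distance(a, b) / longest


s_new = ("The Association for Machine Translation in the Americas "
         "is the American branch of the IAMT")
s_tm = "The European Association for Machine Translation is a member of the IAMT"
score = fuzzy_match_score(s_new, s_tm)
print(round(score, 2))  # 0.53, as on the slide
```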

  10. Translation-Memory Based CAT Tools

  11. Fuzzy Match Scores + Alignment
      Edit distance provides information about the matching words between s′ and s_i.
      Example
      t_i: l’Associació Europea per a la Traducció Automàtica
      s_i: the European Association for Machine Translation
      s′: the Asia-Pacific Association for Machine Translation

  12. Fuzzy Match Scores + Alignment
      Word alignment may be used to “project” source-side matching information onto t_i to suggest which words to change and which to keep unedited.
      Example
      t_i: l’Associació Europea per a la Traducció Automàtica
      [word-alignment links between t_i and s_i, drawn as lines on the original slide]
      s_i: the European Association for Machine Translation
      s′: the Asia-Pacific Association for Machine Translation
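
One way to obtain the source-side matching information from the edit distance is to backtrace through the dynamic-programming table. The sketch below is an assumption about how this could be done, not the authors' implementation; `matched_flags` is a hypothetical name, and ties in the backtrace are resolved in favour of the diagonal step.

```python
def matched_flags(s_tm, s_new):
    """For each word of s_tm, report whether one optimal edit-distance
    alignment pairs it with an identical word of s_new."""
    n, m = len(s_tm), len(s_new)
    # dist[i][j]: edit distance between s_tm[:i] and s_new[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s_tm[i - 1] == s_new[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)

    # Walk back from the bottom-right corner along one optimal path.
    flags = [False] * n
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if s_tm[i - 1] == s_new[j - 1] else 1
        if dist[i][j] == dist[i - 1][j - 1] + cost:
            flags[i - 1] = (cost == 0)  # True only for an exact match
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # word of s_tm deleted: stays unmatched
        else:
            j -= 1  # word of s_new inserted
    return flags


s_tm = "the European Association for Machine Translation".split()
s_new = "the Asia-Pacific Association for Machine Translation".split()
flags = matched_flags(s_tm, s_new)
print(list(zip(s_tm, flags)))  # only "European" is unmatched
```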

  14. Related Work
      Simard (2003): statistical MT techniques allow exploiting TMs at the sub-segment (sub-sentential) level (translation spotting)
      Bourdaillet et al. (2009): similar approach for a bilingual concordancer, TransSearch
      Kranias and Samiotou (2004): sub-segment level alignments using a bilingual dictionary to (i) detect words to be changed and (ii) propose translations for them

  16. Rationale
      [diagram: words w_ij, w_ij′ and w_ij″ of t_i labelled [keep], [change] and [?], linked to words of s_i that are matched or unmatched with s′]
      w_ij and v_ik aligned and v_ik matched ⟹ keep w_ij
      w_ij and v_ik aligned and v_ik not matched ⟹ change w_ij
      w_ij not aligned ⟹ ???

  17. Rationale
      What to do if there is more than one alignment with contradictory evidence?
      [diagram: w_ij in t_i, labelled [???], aligned with both v_ik (matched with s′) and v_ik′ (unmatched with s′) in s_i]

  18. Rationale
      We define the likelihood of keeping the word w_ij unedited as:
      $f_K(w_{ij}, s', s_i, t_i) = \frac{\sum_{v_{ik} \in \mathrm{aligned}(w_{ij})} \mathrm{matched}(v_{ik})}{|\mathrm{aligned}(w_{ij})|}$
      aligned(w_ij): set of source-side words aligned with w_ij in s_i
      matched(v_ik): 1 if v_ik is matched in s′ and 0 otherwise

  19. Interpretation of f_K(w_ij, s′, s_i, t_i)
      Two ways to interpret f_K(w_ij, s′, s_i, t_i):
      Unanimity:
        if f_K(w_ij, s′, s_i, t_i) = 1: w_ij → keep unedited
        if f_K(w_ij, s′, s_i, t_i) = 0: w_ij → change
        otherwise → not marked
      Majority:
        if f_K(w_ij, s′, s_i, t_i) > 1/2: w_ij → keep unedited
        if f_K(w_ij, s′, s_i, t_i) < 1/2: w_ij → change
        otherwise → not marked
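
The two criteria can be sketched as follows. This is a minimal illustration under stated assumptions: the word alignment used in the example is assumed (he–él, missed–echó/de/menos, his–su, brother–hermano), unaligned target words are left unmarked, and `keep_likelihood` and `mark` are hypothetical names.

```python
from fractions import Fraction


def keep_likelihood(aligned_sources, matched):
    """f_K for one target word: the fraction of its aligned source
    positions that are matched in s'. Returns None if unaligned."""
    if not aligned_sources:
        return None
    hits = sum(1 for k in aligned_sources if matched[k])
    return Fraction(hits, len(aligned_sources))


def mark(f_k, criterion):
    """Map f_K to 'keep', 'change' or '?' (not marked)."""
    if f_k is None:
        return "?"  # assumption: unaligned words are not marked
    if criterion == "unanimity":
        if f_k == 1:
            return "keep"
        if f_k == 0:
            return "change"
        return "?"
    if criterion == "majority":
        half = Fraction(1, 2)
        if f_k > half:
            return "keep"
        if f_k < half:
            return "change"
        return "?"
    raise ValueError(f"unknown criterion: {criterion}")


# s_i: él echó de menos a su hermano / s': ella echó de casa a su hermano
matched = [False, True, True, False, True, True, True]
# Assumed alignment: source positions aligned with each word of
# t_i = "he missed his brother".
alignment = [[0], [1, 2, 3], [5], [6]]

scores = [keep_likelihood(src, matched) for src in alignment]
unanimity = [mark(f, "unanimity") for f in scores]
majority = [mark(f, "majority") for f in scores]
print(unanimity)
print(majority)
```

Exact fractions are used so that the comparisons with 0, 1/2 and 1 are not affected by floating-point rounding.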

  20. Example of Unanimity Criterion
      t_i: he [change], missed [?], his [keep], brother [keep]
      [word-alignment links between t_i and s_i, drawn as lines on the original slide]
      s_i: él echó de menos a su hermano
      s′: ella echó de casa a su hermano

  21. Example of Majority Criterion
      t_i: he [change], missed [keep], his [keep], brother [keep]
      [word-alignment links between t_i and s_i, drawn as lines on the original slide]
      s_i: él echó de menos a su hermano
      s′: ella echó de casa a su hermano

  23. Corpora

  24. Evaluation Metrics
      $\mathrm{Accuracy} = \frac{\text{correctly marked words}}{\text{marked words}}$
      $\mathrm{Coverage} = \frac{\text{marked words}}{\text{total words}}$
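
Both metrics can be computed from per-word marks, as in the sketch below. The label encoding (`"keep"`, `"change"`, `None` for unmarked words), the function name and the example data are assumptions for illustration; this section does not say how the reference labels were obtained.

```python
def accuracy_and_coverage(predicted, reference):
    """predicted: 'keep', 'change' or None (not marked) per target word.
    reference: the correct 'keep'/'change' label per target word."""
    marked = [(p, r) for p, r in zip(predicted, reference) if p is not None]
    correct = sum(1 for p, r in marked if p == r)
    # Accuracy is computed over marked words only; coverage over all words.
    accuracy = correct / len(marked) if marked else 0.0
    coverage = len(marked) / len(predicted) if predicted else 0.0
    return accuracy, coverage


# Hypothetical marks for a four-word proposal.
predicted = ["change", None, "keep", "keep"]
reference = ["change", "keep", "keep", "change"]
accuracy, coverage = accuracy_and_coverage(predicted, reference)
print(accuracy, coverage)
```

Because unmarked words are excluded from the accuracy denominator, a stricter criterion can raise accuracy while lowering coverage.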

  25. Statistical Word Alignment
      We use GIZA++ (Och and Ney, 2003), a free/open-source tool
      We obtain an SL-to-TL alignment and a TL-to-SL alignment on the TM
      We experiment with three ways to combine the alignments: union, intersection, and grow-diag-final-and
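
The two simplest combinations can be sketched with plain set operations. The representation of alignments as sets of index pairs and the name `symmetrize` are assumptions, not GIZA++ output handling; grow-diag-final-and, a heuristic that starts from the intersection and adds selected links from the union, is not sketched here.

```python
def symmetrize(src_to_tgt, tgt_to_src, method):
    """Combine two directional word alignments.

    src_to_tgt: set of (source_index, target_index) links
    tgt_to_src: set of (target_index, source_index) links
    Returns a set of (source_index, target_index) links."""
    forward = set(src_to_tgt)
    # Flip the reverse direction so both sets use (source, target) order.
    backward = {(s, t) for (t, s) in tgt_to_src}
    if method == "union":
        return forward | backward
    if method == "intersection":
        return forward & backward
    raise ValueError(f"unsupported method: {method}")


# Hypothetical links for a short sentence pair.
sl_to_tl = {(0, 0), (1, 1), (2, 1)}
tl_to_sl = {(0, 0), (1, 1), (2, 3)}
print(sorted(symmetrize(sl_to_tl, tl_to_sl, "union")))
print(sorted(symmetrize(sl_to_tl, tl_to_sl, "intersection")))
```

The intersection keeps only links found in both directions, while the union keeps every link found in either direction.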
