Machine Translation Evaluation (Based on Miloš Stanojević's slides)

Iacer Calixto, Institute for Logic, Language and Computation, University of Amsterdam. May 18, 2018.


  1. Machine Translation Evaluation (Based on Miloš Stanojević's slides). Iacer Calixto, Institute for Logic, Language and Computation, University of Amsterdam. May 18, 2018.

  2. Introduction: Machine Translation Pipeline.

  3. Introduction: “Good” versus “Bad” Translations
  • How bad can translations be?
    • Grammar errors:
      • Wrong noun-verb agreement, e.g. She do not dance.
      • Spelling mistakes, e.g. The dog is playin with the bal.
      • Etc.
    • Disfluent translations, e.g. She does not like [to] dance.
    • Etc.
  • What constitutes a good translation?
    • One that accounts for all the “units of meaning” in the source sentence?
    • One that reads fluently in the target language?
    • What about translating literature, e.g. Alice’s Adventures in Wonderland?
    • Or a philosophical treatise, e.g. Beyond Good and Evil?

  4. Introduction: Good Translations - Fluency vs. Adequacy
  • Let’s simplify the problem:
    • One axis of our evaluation should account for target-language fluency;
    • Another axis should account for how adequately the source-sentence “units of meaning” are translated into the target language.
  • Examples:
    • The man is playing football (source sentence)
    • La femme joue au football (✓ fluent but ✗ adequate)
    • ✗ Le homme joue ✗ football (✗ fluent but ✓ adequate)
    • L’homme joue au football (✓ fluent and ✓ adequate)

  5. Outline
  1 Introduction
  2 Outline
  3 Motivation
  4 Word-based Metrics
  5 Feature-based Metric(s)
  6 Wrap-up & Conclusions

  6. Motivation: Why Machine Translation Evaluation?
  • Why do we need automatic evaluation of MT output?
    • Rapid system development;
    • Tuning MT systems;
    • Comparing different systems.
  • Ideally we would also incorporate human feedback, but human judgements are too expensive to collect at this scale.

  7. Motivation: What is a Metric?
  • A function that computes the similarity between the output of an MT system (the hypothesis, or sys) and one or more human translations (the reference translations, or ref).
  • It can be interpreted in different ways:
    • Overlap between sys and ref: precision, recall, etc. (a toy sketch follows below);
    • Edit distance: insertions, deletions, shifts;
    • Etc.
  • Different metrics make different choices.
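A minimal sketch of the overlap interpretation mentioned above, assuming a single reference and whitespace tokenisation; the function name unigram_overlap and the clipping of counts are my own illustrative choices, not something defined on the slides:

    from collections import Counter

    def unigram_overlap(sys_tokens, ref_tokens):
        """Toy overlap 'metric': clipped unigram precision, recall and F1
        between a system hypothesis (sys) and one reference (ref)."""
        sys_counts = Counter(sys_tokens)
        ref_counts = Counter(ref_tokens)
        # Clip each hypothesis word so it cannot match more often than it occurs in ref.
        matches = sum(min(count, ref_counts[word]) for word, count in sys_counts.items())
        precision = matches / len(sys_tokens) if sys_tokens else 0.0
        recall = matches / len(ref_tokens) if ref_tokens else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Example (the sentence pair used on the BLEU slides below):
    print(unigram_overlap("john is playing in the park".split(),
                          "john plays in the park".split()))
    # -> (0.666..., 0.8, 0.727...)

Word-based metrics differ mainly in what they count (n-grams, edit operations) and how they combine those counts into a single score.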

  8. Word-based Metrics: BLEU (Papineni et al., 2002)
  • BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n ), where p_n is the modified n-gram precision of the candidate against the reference(s);
  • Commonly, we set N = 4 and w_n = 1/N;
  • BP stands for “Brevity Penalty” and is computed as BP = 1 if c > r, and BP = e^(1 − r/c) if c ≤ r, where:
    • c is the length of the candidate translation;
    • r is the effective reference corpus length.
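A rough sentence-level sketch of the formula above, assuming a single reference, no smoothing, and no corpus-level aggregation (the original BLEU of Papineni et al. is computed over a whole test corpus); the helper names are mine:

    import math
    from collections import Counter

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def sentence_bleu_sketch(sys_tokens, ref_tokens, N=4):
        """BLEU = BP * exp(sum_n w_n * log p_n) with uniform weights w_n = 1/N,
        for one hypothesis and one reference (illustrative, unsmoothed)."""
        log_precisions = 0.0
        for n in range(1, N + 1):
            sys_ngrams = Counter(ngrams(sys_tokens, n))
            ref_ngrams = Counter(ngrams(ref_tokens, n))
            matches = sum(min(c, ref_ngrams[g]) for g, c in sys_ngrams.items())
            total = sum(sys_ngrams.values())
            if matches == 0 or total == 0:
                return 0.0                       # no smoothing in this sketch
            log_precisions += (1.0 / N) * math.log(matches / total)
        c, r = len(sys_tokens), len(ref_tokens)
        bp = 1.0 if c > r else math.exp(1 - r / c)   # brevity penalty
        return bp * math.exp(log_precisions)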

  9. Word-based Metrics: BLEU (cont.)
  • ref: john plays in the park (length r = 5)
  • hyp: john is playing in the park (length c = 6)
  • 1-gram matches: ✓ john ✗ is ✗ playing ✓ in ✓ the ✓ park (4 matches out of 6 hypothesis tokens)
  • BP = 1 (since c > r)
  • For N = 1:
    • w_1 = 1/1 = 1
    • p_1 = 4/6, therefore BLEU_1 = 1 · exp(1 · log(4/6)) ≈ 0.67.
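Plugging this example into the sentence_bleu_sketch function from the sketch above (my own illustrative code, not the slides’) reproduces the numbers when restricted to unigrams:

    ref = "john plays in the park".split()
    hyp = "john is playing in the park".split()

    # c = 6 > r = 5, so BP = 1; p_1 = 4/6, and BLEU_1 = 1 * exp(log(4/6)) ~ 0.67
    print(round(sentence_bleu_sketch(hyp, ref, N=1), 2))   # 0.67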
