Machine Translation Evaluation
(Based on Miloš Stanojević’s slides)
Iacer Calixto
Institute for Logic, Language and Computation, University of Amsterdam
May 18, 2018
Introduction
Machine Translation Pipeline
[Figure: overview of the machine translation pipeline]
Introduction
“Good” versus “Bad” Translations
• How bad can translations be?
  • Grammar errors:
    • Wrong noun-verb agreement: e.g. She do not dance.
    • Spelling mistakes: e.g. The dog is playin with the bal.
    • Etc.
  • Disfluent translations: e.g. She does not like [to] dance.
  • Etc.
• What constitutes a good translation?
  • One that accounts for all the “units of meaning” in the source sentence?
  • One that reads fluently in the target language?
  • What about translating literature, e.g. Alice’s Adventures in Wonderland?
  • Or a philosophical treatise, e.g. Beyond Good and Evil?
Introduction
Good Translations - Fluency vs. Adequacy
• Let’s simplify the problem:
  • One axis of our evaluation should account for target-language fluency;
  • Another axis should account for how adequately the source-sentence “units of meaning” are translated into the target language.
• Examples:
  • The man is playing football (source sentence)
  • La femme joue au football (✓ fluent but ✗ adequate)
  • ✗ Le homme joue ✗ football (✗ fluent but ✓ adequate)
  • L’homme joue au football (✓ fluent and ✓ adequate)
Outline
1. Introduction
2. Outline
3. Motivation
4. Word-based Metrics
5. Feature-based Metric(s)
6. Wrap-up & Conclusions
Motivation
Why Machine Translation Evaluation?
• Why do we need automatic evaluation of MT output?
  • Rapid system development;
  • Tuning MT systems;
  • Comparing different systems;
• Ideally we would like to incorporate human feedback too, but human judgments are too expensive to collect at scale.
Motivation
What is a Metric?
• A function that computes the similarity between the output of an MT system (the hypothesis, or sys) and one or more human translations (the reference translations, or ref);
• It can be interpreted in different ways:
  • Overlap between sys and ref: precision, recall, ... (see the word-overlap sketch after this list);
  • Edit distance: insertions, deletions, shifts;
  • Etc.
• Different metrics make different choices.
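As a small illustration of the overlap view, the sketch below computes clipped word-overlap precision and recall between a hypothesis and a reference. It is written for these notes, not taken from the slides, and the function name overlap_precision_recall is invented for this example.

    from collections import Counter

    def overlap_precision_recall(sys_tokens, ref_tokens):
        """Clipped word overlap between a system output (sys) and a reference (ref).

        Precision: fraction of sys tokens that also occur in ref.
        Recall: fraction of ref tokens that are covered by sys.
        Counts are clipped so a repeated word is not credited more often
        than it appears on the other side.
        """
        sys_counts = Counter(sys_tokens)
        ref_counts = Counter(ref_tokens)
        matches = sum((sys_counts & ref_counts).values())  # clipped overlap
        precision = matches / len(sys_tokens) if sys_tokens else 0.0
        recall = matches / len(ref_tokens) if ref_tokens else 0.0
        return precision, recall

    sys_tokens = "john is playing in the park".split()
    ref_tokens = "john plays in the park".split()
    print(overlap_precision_recall(sys_tokens, ref_tokens))  # approx. (0.67, 0.8)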
Word-based Metrics
BLEU (Papineni et al., 2002)
• BLEU combines the modified (clipped) n-gram precisions p_n of the candidate against the reference(s):

      BLEU = BP · exp( Σ_{n=1..N} w_n · log p_n )

• Commonly, we set N = 4 and w_n = 1/N;
• BP stands for “Brevity Penalty” and is computed by:

      BP = 1              if c > r
      BP = exp(1 - r/c)   if c ≤ r

• c is the length of the candidate translation;
• r is the effective reference corpus length.
• A minimal implementation sketch of this formula is given below.
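To make the formula concrete, here is a minimal sentence-level BLEU sketch for a single reference, written for these notes rather than taken from the slides. It omits smoothing (so any zero n-gram precision drives the whole score to zero); in practice one would rely on a standard implementation such as sacrebleu.

    import math
    from collections import Counter

    def ngrams(tokens, n):
        """All n-grams of a token list, as tuples."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def bleu(hyp, ref, max_n=4):
        """Sentence-level BLEU against a single reference, without smoothing."""
        weights = [1.0 / max_n] * max_n                        # w_n = 1/N
        log_precisions = []
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            matches = sum((hyp_counts & ref_counts).values())  # clipped n-gram matches
            total = max(sum(hyp_counts.values()), 1)           # n-grams in the hypothesis
            if matches == 0:
                return 0.0                                     # log(0) is undefined without smoothing
            log_precisions.append(math.log(matches / total))
        c, r = len(hyp), len(ref)
        bp = 1.0 if c > r else math.exp(1 - r / c)             # brevity penalty
        return bp * math.exp(sum(w * lp for w, lp in zip(weights, log_precisions)))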
Word-based Metrics
BLEU (cont.)
• ref: john plays in the park (length = 5)
• hyp: john is playing in the park (length = 6)
• 1-grams: ✓ john ✗ is ✗ playing ✓ in ✓ the ✓ park
• BP = 1 (since c = 6 > r = 5)
• For N = 1:
  • w_1 = 1/1 = 1
  • p_1 = 4/6 ≈ 0.67 (4 of the 6 hypothesis unigrams also occur in the reference), therefore BLEU_1 = 1 · exp(1 · log(4/6)) ≈ 0.67 (checked with the code sketch below).
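Plugging this example into the bleu sketch above (a hypothetical helper written for these notes) reproduces the unigram score, and also shows why unsmoothed sentence-level BLEU with the default N = 4 collapses to zero here: no 4-gram of the hypothesis occurs in the reference.

    hyp = "john is playing in the park".split()
    ref = "john plays in the park".split()
    print(round(bleu(hyp, ref, max_n=1), 2))  # 0.67
    print(bleu(hyp, ref, max_n=4))            # 0.0: no matching 4-gram and no smoothing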