Using MT-Based Metrics for RTE Alexander Volokh and Günter Neumann DFKI, Germany
Result Analysis ● RTE-6 results: - F1-scores between 38 and 48 - Good or bad? ● What is the highest realistic result within the scope of RTE-7? - What are the different phenomena in the data? - How frequent are they? - How difficult are they?
Complexity Analysis ● Divided the data into three complexity classes: - A: syntax - B: lexical semantics (synonymy) - C: inference / world knowledge → A – 30% of the data, B – 35%, C – 35% ● Focus on A and B classes, C – too difficult for the scope of the task.
Examples (A class) ● H1: People were forced to leave their pets behind when they evacuated New Orleans. ● T1: Thousands of people were forced to leave their pets behind when they evacuated New Orleans. - the relevant information is expressed with the same words in both T and H - analysis of the syntactic structure should suffice (see the toy check below)
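To make the class-A observation concrete, here is a toy check; this is my own illustration, not the authors' system, and all names in it are hypothetical. If every content word of H also occurs in T, a class-A pair is likely a YES; a real class-A solution would additionally compare the syntactic structures.

```python
# Toy illustration for class A (not the authors' system): when T and H share
# their wording, checking that every content word of H also occurs in T is
# already a strong signal. The stop-word list is an ad-hoc placeholder.
def content_words(sentence: str) -> set:
    stop = {"the", "a", "an", "of", "to", "in", "when", "their", "they", "were"}
    return {w.strip(".,").lower() for w in sentence.split()} - stop

H1 = "People were forced to leave their pets behind when they evacuated New Orleans."
T1 = ("Thousands of people were forced to leave their pets behind "
      "when they evacuated New Orleans.")

print(content_words(H1) <= content_words(T1))  # True: H1 adds nothing beyond T1
```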
Examples (B class) ● H1: People were forced to leave their pets behind when they evacuated New Orleans. ● T2: Animal rescue officials have been collecting scores of pets and other animals from the shattered city, while many survivors have told of their distress at having to leave beloved cats and dogs behind in the watery city when they fled. ● T3: Such emotional scenes were repeated perhaps thousands of times along the Gulf Coast last week as pet owners were forced to abandon their animals in the midst of evacuation. - the words used in T2 and T3 differ from those used in H1 - one has to know about the synonymy/semantic relatedness of the words in addition to the syntactic structure (see the WordNet sketch below)
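For class B, the synonymy knowledge can come from a lexical resource such as WordNet. The sketch below is my own illustration, not part of the authors' system; it assumes NLTK with the wordnet corpus installed. It checks whether two words share a synset, and it finds some related pairs while missing others, which foreshadows the coverage problem on the Weaknesses slide.

```python
# Minimal synonymy check via WordNet (illustrative only; the talk itself
# relies on Meteor's synonym module, not this code).
# Requires: pip install nltk; then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def share_synset(word_a: str, word_b: str) -> bool:
    """True only if WordNet lists the two words in at least one common synset."""
    return bool(set(wn.synsets(word_a)) & set(wn.synsets(word_b)))

# Word pairs motivated by H1 vs. T2/T3 above, plus one positive control.
# WordNet bridges some of these and misses others.
for a, b in [("abandon", "leave"), ("fled", "evacuated"), ("city", "metropolis")]:
    print(a, b, share_synset(a, b))
```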
Examples (C class) ● H1: People were forced to leave their pets behind when they evacuated New Orleans. ● T4: For Elizabeth Finch, the owner of two dogs named Zorra and Hans Blix, the sight of citizens forced to choose between their pets and their safety was, like the disaster itself, indicative of broader social rifts. ● T5: The animals are being cared for at a farm north of Louisiana until they can be reunited with their families, many of whom were told they would not be able to bring their pets on evacuation busses and helicopters. - logical inference and/or world knowledge is required
Approach for A and B ● A and B – same or similar words are used ● Idea: assume T and H are translations of the same source sentence → The assumption is wrong in general: T contains more information than H (in the YES case) ● But: the similarity between T and H in YES cases is still higher than in NO cases ● → Entailment can be predicted (sketch below)
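A minimal sketch of this prediction scheme follows; it is my simplification, not the DFKI system, and both the similarity function and the threshold value are placeholders.

```python
# Sketch of this slide's idea (my simplification, not the DFKI system):
# pretend T and H are two translations of one source, measure their word
# overlap, and predict YES when the score clears a threshold that would in
# practice be tuned on labelled RTE training data.
def overlap_f1(t: str, h: str) -> float:
    t_tok = set(t.lower().replace(".", "").replace(",", "").split())
    h_tok = set(h.lower().replace(".", "").replace(",", "").split())
    common = t_tok & h_tok
    if not common:
        return 0.0
    precision = len(common) / len(t_tok)  # how much of T is covered by H
    recall = len(common) / len(h_tok)     # how much of H is covered by T
    return 2 * precision * recall / (precision + recall)

def predict(t: str, h: str, threshold: float = 0.5) -> str:
    # T usually contains more than H, so precision is low even for YES pairs;
    # recall (how much of H is covered by T) carries most of the signal.
    return "YES" if overlap_f1(t, h) >= threshold else "NO"
```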
Meteor ● We compute the similarity using different features ● The most important ones use Meteor 1 (Metric for Evaluation of Translation with Explicit Ordering) ● Meteor matches words using exact, stem and synonym modules ● If T entails H, the Meteor score is higher than if it does not (especially for the A class, but also for B) 1- http://www.cs.cmu.edu/~alavie/METEOR/
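The slide refers to the CMU Meteor tool linked in the footnote. As a rough stand-in, NLTK ships a re-implementation of Meteor with the same exact/stem/synonym matching, so the YES-vs-NO score gap can be illustrated as below. This is a sketch under that substitution; T_neg is an invented non-entailing passage, and treating T as the reference and H as the hypothesis is my choice, as the slides do not specify the direction.

```python
# Illustrating the Meteor-based feature with NLTK's re-implementation
# (exact + stem + WordNet-synonym matching) as a stand-in for the CMU tool.
# Requires: pip install nltk; then nltk.download("wordnet")
from nltk.translate.meteor_score import meteor_score

H1 = ("People were forced to leave their pets behind "
      "when they evacuated New Orleans.")
T1 = ("Thousands of people were forced to leave their pets behind "
      "when they evacuated New Orleans.")
T_neg = "The city council approved the new budget on Tuesday."

def score(t: str, h: str) -> float:
    # Recent NLTK versions expect pre-tokenized input; whitespace split suffices here.
    # T plays the reference, H the hypothesis (a choice, not the authors' spec).
    return meteor_score([t.lower().split()], h.lower().split())

print(score(T1, H1))     # high: T1 entails H1 (class A)
print(score(T_neg, H1))  # low: unrelated T, no entailment
```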
Weaknesses ● Problems: - Does not work if T and H have completely different lengths - The synonym module does not always find a match - Finally, T is simply not equal to H ● The final result is far below the targeted boundary of 0.65
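The length weakness is easy to reproduce with the same NLTK stand-in scorer: the long, heavily paraphrased T2 from the B-class slide entails H1 but scores far lower than the near-verbatim T1 did. Exact numbers vary with the Meteor version and parameters.

```python
# Reproducing the length weakness with the NLTK stand-in scorer: T2 entails
# H1, but it is roughly three times longer and shares few surface words
# (WordNet is unlikely to bridge "fled"/"evacuated"), so most of T2 finds no
# counterpart in H1 and the score lands near the NO range.
from nltk.translate.meteor_score import meteor_score

H1 = "People were forced to leave their pets behind when they evacuated New Orleans."
T2 = ("Animal rescue officials have been collecting scores of pets and other "
      "animals from the shattered city, while many survivors have told of their "
      "distress at having to leave beloved cats and dogs behind in the watery "
      "city when they fled.")

print(meteor_score([T2.lower().split()], H1.lower().split()))  # low despite entailment
```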
Conclusion ● 43.41 micro-average F1-score ● 46.34 macro-average F1-score → Above the median, a big improvement over last year ● A very robust solution for an extremely large amount of data - >50% of it can be solved this way if the weaknesses are accounted for ● Problem-specific alternatives can still be included for the rest of the data