Evaluation Metrics for Machine Reading Comprehension (RC): Prerequisite Skills and Readability



  1. Evaluation Metrics for Machine Reading Comprehension (RC): Prerequisite Skills and Readability. Sugawara et al., The University of Tokyo, Fujitsu Laboratories Ltd., National Institute of Informatics. Presented by: Shaima AbdulMajeed

  2. RC task: give the agent the ability to (1) read open-domain documents and (2) answer questions about them.

  3. Goal: knowing the quality of Reading Comprehension (RC) datasets.

  4. Why: to know which dataset best evaluates the developed RC system.

  5. Example of an RC dataset

  6. Datasets evaluated

  7. Current dataset metrics • Question types • Answer types • Categories

  8. Is that enough?

  9. Does the readability of a text correlate with the difficulty of answering questions about it?

  10. Proposed evaluation metrics: 1. Prerequisite skills 2. Readability metrics

  11. Prerequisite skills: 1. Object Tracking 2. Mathematical Reasoning 3. Coreference Resolution 4. Logical Reasoning 5. Analogy 6. Causal Relation 7. Spatiotemporal Relation 8. Ellipsis 9. Bridging 10. Elaboration 11. Meta-Knowledge 12. Schematic Clause Relation 13. Punctuation

  12. Object Tracking: tracking or grasping of multiple objects. Context: Tom ate apples. Mary ate apples, too. Q: Who ate apples? A: Tom and Mary (Objects: Tom, Mary)

  13. Mathematical Reasoning: statistical, mathematical, and quantitative reasoning. Context: Tom ate ten apples. Mary ate eight apples. Q: How many apples did Tom and Mary eat? A: eighteen

  14. Coreference Resolution: detection and resolution of all possible demonstratives. Context: Tom was hungry. He ate ten apples. Q: How many apples did Tom eat? A: ten (Tom = He)

  15. Logical Reasoning: understanding of predicate logic. Context: All students have a pen. Tom is a student. Q: Does Tom have a pen? A: Yes (also requires object tracking)

  16. Analogy: understanding metaphors. Context: The White House said Trump is open to ... Q: Did the President of the United States and his staff say Trump is open to ...? A: Yes (The White House said = POTUS and his staff said ...)

  17. Causal Relation: understanding causal relationships, signaled by words such as "why" and "because."

  18. Spatiotemporal Relation. Context: One day, Tom went to the park. After that, he went to the restaurant. Finally, he went to his grandma's house. Q: Where did Tom go finally? A: his grandma's house (Finally: temporal)

  19. Ellipsis: recognizing implicit information. She is a smart student = She is a student

  20. Bridging: inference supported by grammatical and lexical knowledge. She loves sushi = She likes sushi

  21. Elaboration: inference using known facts and general knowledge. The writer of Hamlet was Shakespeare → Shakespeare wrote Hamlet

  22. Meta-Knowledge: Who are the principal characters of the story? What is the main subject of this article?

  23. Schematic Clause Relation: understanding of complex sentences that have coordination or subordination. Context: Tom has a friend whose name is John. Q: What is the name of Tom's friend? A: John (whose = relative clause)

  24. Punctuation: understanding of punctuation marks. Context: The AFC champion (Denver Broncos) defeated the NFC champion (Carolina Panthers) in Super Bowl 50. Q: Which NFL team won Super Bowl 50? A: Denver Broncos. Note: the parentheses present the champion teams' names.

  25. Readability metrics: 1. Lexical features 2. Syntactic features 3. Traditional features
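To make the three feature categories concrete, here is a rough Python sketch (not the paper's exact feature set) that computes one illustrative feature from each category with spaCy; the model name en_core_web_sm, the chosen features, and the sample passage are assumptions for illustration.

```python
# Illustrative sketch only: one assumed feature per readability category,
# not the exact features used by Sugawara et al.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def tree_depth(token):
    """Depth of the dependency subtree rooted at `token`."""
    return 1 + max((tree_depth(c) for c in token.children), default=0)

def readability_features(text):
    doc = nlp(text)
    words = [t for t in doc if not t.is_punct and not t.is_space]
    sents = list(doc.sents)
    return {
        # Lexical: average word length in characters.
        "avg_word_length": sum(len(t.text) for t in words) / len(words),
        # Syntactic: average dependency-tree depth per sentence.
        "avg_tree_depth": sum(tree_depth(s.root) for s in sents) / len(sents),
        # Traditional: average sentence length in words.
        "avg_sentence_length": len(words) / len(sents),
    }

print(readability_features("Tom was hungry. He ate ten apples."))
```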

  26. Annotation procedure (100 questions):
      Step 1: annotators see the context, the question, and its answer simultaneously. e.g., Q: Why did Tom look angry? A: His sister ate his cake.
      Step 2: select the sentences (from the context) needed to answer. e.g., Context: (C1) Tom is a student. (C2) Tom looks annoyed because his sister ate his cake. (C3) His sister's name is Sylvia. → Select: C2.
      Step 3: select the skills required for answering the question. e.g., C2: Tom looks annoyed because his sister ate his cake. → Skills: causal relation ("because"), bridging (lexical knowledge that "annoyed" = "angry").
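A minimal sketch of how such annotations could be represented and aggregated; the record fields and the sample entry are assumptions for illustration, not the authors' released format. The per-question skill count is what feeds the averages reported in Results 2 below.

```python
# Sketch: a possible record format for the annotation procedure above,
# plus the per-question skill count used later (hypothetical example data).
from dataclasses import dataclass
from statistics import mean

PREREQUISITE_SKILLS = [
    "object tracking", "mathematical reasoning", "coreference resolution",
    "logical reasoning", "analogy", "causal relation",
    "spatiotemporal relation", "ellipsis", "bridging", "elaboration",
    "meta-knowledge", "schematic clause relation", "punctuation",
]

@dataclass
class Annotation:
    question: str
    answer: str
    selected_sentences: list  # context sentences needed to answer
    skills: list              # subset of PREREQUISITE_SKILLS

annotations = [
    Annotation(
        question="Why did Tom look angry?",
        answer="His sister ate his cake.",
        selected_sentences=["Tom looks annoyed because his sister ate his cake."],
        skills=["causal relation", "bridging"],
    ),
]

# Average number of prerequisite skills per question (cf. Results 2).
print(mean(len(a.skills) for a in annotations))
```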

  27. Results: 1. Prerequisite skills required for each RC dataset 2. Prerequisite skills required per question 3. Readability of each RC dataset 4. Correlation between readability and the prerequisite skills required

  28. Results 1 - Prerequisite skills required for each RC dataset: 1. QA4MRE (highest score in all skills): • Bridging • Elaboration • Clause Relation • Punctuation 2. MCTest: • Causal Relation • Meta-Knowledge • Coreference Resolution • Spatiotemporal Relation

  29. Results 2 - Average number of prerequisite skills required per question:
      QA4MRE: 3.25 | MCTest: 1.56 | SQuAD: 1.28 | WDW: 2.43 | MS MARCO: 1.19 | NewsQA: 1.99
      QA4MRE is the highest: technical documents, questions handcrafted by experts.

  30. Results - Number of nonsense/difficult questions:
      QA4MRE: 10 | MCTest: 1 | SQuAD: 3 | WDW: 27 | MS MARCO: 14 | NewsQA: 1

  31. Results 3 - Readability metrics for each RC dataset (Flesch-Kincaid grade level):
      QA4MRE: 14.9 | MCTest: 3.6 | SQuAD: 14.6 | WDW: 15.3 | MS MARCO: 12.1 | NewsQA: 12.6
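The F-K score above is the Flesch-Kincaid grade level, a standard formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch with a naive vowel-group syllable counter (an approximation; real implementations use dictionaries or better heuristics):

```python
import re

def count_syllables(word):
    """Very rough syllable count: number of vowel groups (approximation)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(round(flesch_kincaid_grade("Tom was hungry. He ate ten apples."), 2))
```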

  32. Results 4- Correlation between readability metrics and the number of required prerequisite skills

  33. Results 4- Correlation between readability metrics and the number of required prerequisite skills
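The slides do not reproduce the correlation figures, but the computation itself is straightforward. A sketch using SciPy's Pearson correlation over paired readability scores and skill counts; the data below simply reuses the per-dataset numbers from Results 2 and 3 for illustration, whereas the paper works at a finer granularity and may use a different coefficient.

```python
# Sketch: correlating readability with the number of required prerequisite
# skills (illustrative pairing of the per-dataset numbers shown above).
from scipy.stats import pearsonr

fk_grade   = [14.9, 3.6, 14.6, 15.3, 12.1, 12.6]   # Results 3 (F-K grade)
avg_skills = [3.25, 1.56, 1.28, 2.43, 1.19, 1.99]  # Results 2 (skills/question)

r, p = pearsonr(fk_grade, avg_skills)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```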

  34. Summary:
      QA4MRE: hard to read, hard to answer
      MCTest: easy to read, hard to answer
      SQuAD: hard to read, easy to answer

  35. How to utilize this study: 1. Prepare appropriate datasets for each step of RC development: I. easy-to-read and easy-to-answer datasets, II. easy-to-read but difficult-to-answer datasets, III. difficult-to-read and difficult-to-answer datasets. 2. Apply the metrics to evaluate other datasets.
