Evaluation Metrics for Machine Reading Comprehension (RC): Prerequisite Skills and Readability



  1. Evaluation Metrics for Machine Reading Comprehension (RC): Prerequisite Skills and Readability. Sugawara et al., The University of Tokyo, Fujitsu Laboratories Ltd., National Institute of Informatics. Presented by: Shaima AbdulMajeed

  2. RC task: give the agent the ability to (1) read open-domain documents and (2) answer questions about them.

  3. Goal: knowing the quality of Reading Comprehension (RC) datasets.

  4. Why: to know which dataset best evaluates the developed RC system.

  5. Example of an RC dataset

  6. Datasets evaluated

  7. Current dataset metrics • Question types • Answer types • Categories

  8. Is that enough?

  9. Does the readability of a text correlate with the difficulty of answering questions about it?

  10. Proposed evaluation metrics: 1. Prerequisite skills 2. Readability metrics

  11. Prerequisite skills: 1. Object Tracking 2. Mathematical Reasoning 3. Coreference Resolution 4. Logical Reasoning 5. Analogy 6. Causal Relation 7. Spatiotemporal Relation 8. Ellipsis 9. Bridging 10. Elaboration 11. Meta-Knowledge 12. Schematic Clause Relation 13. Punctuation

  12. Object Tracking: tracking or grasping of multiple objects. Context: Tom ate apples. Mary ate apples, too. Q: Who ate apples? A: Tom and Mary (Objects: Tom, Mary)

  13. Mathematical Reasoning: statistical, mathematical, and quantitative reasoning. Context: Tom ate ten apples. Mary ate eight apples. Q: How many apples did Tom and Mary eat? A: eighteen

  14. Coreference Resolution: detection and resolution of all possible demonstratives. Context: Tom was hungry. He ate ten apples. Q: How many apples did Tom eat? A: ten (Tom = He)

  15. Logical Reasoning: understanding of predicate logic. Context: All students have a pen. Tom is a student. Q: Does Tom have a pen? A: Yes (also requires object tracking)

  16. Analogy: understanding metaphors. Context: The White House said Trump is open to ... Q: Did the President of the United States and his staff say Trump is open to ...? A: Yes (The White House said = POTUS and his staff said ...)

  17. Causal Relation: understanding causal relationships, signaled by words such as "why" and "because."

  18. Spatiotemporal Relation. Context: One day, Tom went to the park. After that, he went to the restaurant. Finally, he went to his grandma's house. Q: Where did Tom go finally? A: his grandma's house (Finally: temporal)

  19. Ellipsis: recognizing implicit information. She is a smart student = She is a student

  20. Bridging: inference supported by grammatical and lexical knowledge. She loves sushi = She likes sushi

  21. Elaboration: inference using known facts and general knowledge. The writer of Hamlet was Shakespeare → Shakespeare wrote Hamlet

  22. Meta-Knowledge: Who are the principal characters of the story? What is the main subject of this article?

  23. Schematic Clause Relation: understanding of complex sentences that have coordination or subordination. Context: Tom has a friend whose name is John. Q: What is the name of Tom's friend? A: John (whose = relative clause)

  24. Punctuation: understanding of punctuation marks. Context: The AFC champion (Denver Broncos) defeated the NFC champion (Carolina Panthers) in Super Bowl 50. Q: Which NFL team won Super Bowl 50? A: Denver Broncos. Note: the parentheses present the champion teams' names.

  25. Readability metrics: 1. Lexical features 2. Syntactic features 3. Traditional features
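To make the three feature categories concrete, here is a rough Python sketch (not the paper's exact feature set) that computes one illustrative feature from each category with spaCy; the model name en_core_web_sm, the chosen features, and the sample passage are assumptions for illustration.

```python
# Illustrative sketch only: one assumed feature per readability category,
# not the exact features used by Sugawara et al.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def tree_depth(token):
    """Depth of the dependency subtree rooted at `token`."""
    return 1 + max((tree_depth(c) for c in token.children), default=0)

def readability_features(text):
    doc = nlp(text)
    words = [t for t in doc if not t.is_punct and not t.is_space]
    sents = list(doc.sents)
    return {
        # Lexical: average word length in characters.
        "avg_word_length": sum(len(t.text) for t in words) / len(words),
        # Syntactic: average dependency-tree depth per sentence.
        "avg_tree_depth": sum(tree_depth(s.root) for s in sents) / len(sents),
        # Traditional: average sentence length in words.
        "avg_sentence_length": len(words) / len(sents),
    }

print(readability_features("Tom was hungry. He ate ten apples."))
```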

  26. Annotation procedure (100 questions):
      Step 1: annotators see the context, the question, and its answer simultaneously. e.g., Q: Why did Tom look angry? A: His sister ate his cake.
      Step 2: select the sentences (from the context) needed to answer. e.g., Context: (C1) Tom is a student. (C2) Tom looks annoyed because his sister ate his cake. (C3) His sister's name is Sylvia. → Select: C2.
      Step 3: select the skills required for answering the question. e.g., C2: Tom looks annoyed because his sister ate his cake. → Skills: causal relation ("because"), bridging (lexical knowledge that "annoyed" = "angry").
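A minimal sketch of how such annotations could be represented and aggregated; the record fields and the sample entry are assumptions for illustration, not the authors' released format. The per-question skill count is what feeds the averages reported in Results 2 below.

```python
# Sketch: a possible record format for the annotation procedure above,
# plus the per-question skill count used later (hypothetical example data).
from dataclasses import dataclass
from statistics import mean

PREREQUISITE_SKILLS = [
    "object tracking", "mathematical reasoning", "coreference resolution",
    "logical reasoning", "analogy", "causal relation",
    "spatiotemporal relation", "ellipsis", "bridging", "elaboration",
    "meta-knowledge", "schematic clause relation", "punctuation",
]

@dataclass
class Annotation:
    question: str
    answer: str
    selected_sentences: list  # context sentences needed to answer
    skills: list              # subset of PREREQUISITE_SKILLS

annotations = [
    Annotation(
        question="Why did Tom look angry?",
        answer="His sister ate his cake.",
        selected_sentences=["Tom looks annoyed because his sister ate his cake."],
        skills=["causal relation", "bridging"],
    ),
]

# Average number of prerequisite skills per question (cf. Results 2).
print(mean(len(a.skills) for a in annotations))
```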

  27. Results: 1. Prerequisite skills required for each RC dataset 2. Prerequisite skills required per question 3. Readability of each RC dataset 4. Correlation between readability and the prerequisite skills required

  28. Results 1 - Prerequisite skills required for each RC dataset: 1. QA4MRE (highest score in all skills): • Bridging • Elaboration • Clause Relation • Punctuation 2. MCTest: • Causal Relation • Meta-Knowledge • Coreference Resolution • Spatiotemporal Relation

  29. Results 2 - Average number of prerequisite skills required per question:
      QA4MRE: 3.25 | MCTest: 1.56 | SQuAD: 1.28 | WDW: 2.43 | MS MARCO: 1.19 | NewsQA: 1.99
      QA4MRE is the highest: technical documents, questions handcrafted by experts.

  30. Results - Number of nonsense/difficult questions:
      QA4MRE: 10 | MCTest: 1 | SQuAD: 3 | WDW: 27 | MS MARCO: 14 | NewsQA: 1

  31. Results 3 - Readability metrics for each RC dataset (Flesch-Kincaid grade level):
      QA4MRE: 14.9 | MCTest: 3.6 | SQuAD: 14.6 | WDW: 15.3 | MS MARCO: 12.1 | NewsQA: 12.6
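The F-K score above is the Flesch-Kincaid grade level, a standard formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch with a naive vowel-group syllable counter (an approximation; real implementations use dictionaries or better heuristics):

```python
import re

def count_syllables(word):
    """Very rough syllable count: number of vowel groups (approximation)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(round(flesch_kincaid_grade("Tom was hungry. He ate ten apples."), 2))
```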

  32. Results 4- Correlation between readability metrics and the number of required prerequisite skills

  33. Results 4- Correlation between readability metrics and the number of required prerequisite skills
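The slides do not reproduce the correlation figures, but the computation itself is straightforward. A sketch using SciPy's Pearson correlation over paired readability scores and skill counts; the data below simply reuses the per-dataset numbers from Results 2 and 3 for illustration, whereas the paper works at a finer granularity and may use a different coefficient.

```python
# Sketch: correlating readability with the number of required prerequisite
# skills (illustrative pairing of the per-dataset numbers shown above).
from scipy.stats import pearsonr

fk_grade   = [14.9, 3.6, 14.6, 15.3, 12.1, 12.6]   # Results 3 (F-K grade)
avg_skills = [3.25, 1.56, 1.28, 2.43, 1.19, 1.99]  # Results 2 (skills/question)

r, p = pearsonr(fk_grade, avg_skills)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```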

  34. Summary:
      QA4MRE: hard to read, hard to answer
      MCTest: easy to read, hard to answer
      SQuAD: hard to read, easy to answer

  35. How to utilize this study: 1. Prepare appropriate datasets for each step of RC development: I. easy-to-read and easy-to-answer datasets, II. easy-to-read but difficult-to-answer datasets, III. difficult-to-read and difficult-to-answer datasets. 2. Apply the metrics to evaluate other datasets.
