
Simple and Effective Multi-Paragraph Reading Comprehension - PowerPoint PPT Presentation


  1. Simple and Effective Multi-Paragraph Reading Comprehension Christopher Clark and Matt Gardner

  2. Neural Question Answering Question: “What color is the sky?” Passage: “Air is made mainly from molecules of nitrogen and oxygen. These molecules scatter the blue colors of sunlight more effectively than the green and red colors. Therefore, a clean sky appears blue.”

  3. Fast Progress on Paragraph Datasets [Chart: accuracy on SQuAD 1.1 (y-axis 40 to 90), plotted monthly from Jun-16 to Jun-18]

  4. What Next?

  5. Open Question Answering Question: “What color is the sky?” → Document Retrieval → Relevant Text → Model → Answer Span: “Blue”

  6. Challenge: Scaling Models to Documents § Modern reading comprehension models have many layers and parameters § The trend is continuing in this direction, for example with the use of large language models § Efficiency drops as paragraph length increases due to long RNN chains or transformer/self-attention modules § This limits the model to processing short paragraphs

  7. Two Possible Approaches • Pipelined Systems • Select a single paragraph from the input, and run the model on that paragraph • Confidence Systems • Run the model on many paragraphs from the input, and have it assign a confidence score to its result on each paragraph (e.g., 0.83, 0.68, 0.29)

  8. This Work Improved Pipeline Method • Improve several of the key design decisions that arise when training on document-level data Improved Confidence Method • Study ways to train models to produce correct confidence scores

  9. Pipeline Method: Paragraph Selection § Train a shallow linear model to select the best paragraphs § Features include TF-IDF, word occurrences, and the paragraph’s position within the document § If there is just one document, TF-IDF alone is effective § Improves the chance of selecting an answer-containing paragraph from 83.0% to 85.1% on TriviaQA Web
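The TF-IDF feature can be illustrated with a short sketch (this approximates only that one feature, not the authors' learned linear selector; the helper name is my own):

```python
# Minimal sketch: rank paragraphs by TF-IDF cosine similarity to the question.
# The paper's selector is a learned linear model that also uses word
# occurrences and paragraph position; this shows only the TF-IDF signal.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_paragraphs(question, paragraphs):
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the paragraphs, then project the question into the same space.
    para_vecs = vectorizer.fit_transform(paragraphs)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, para_vecs).ravel()
    # Highest-scoring paragraphs first.
    return sorted(zip(scores, paragraphs), key=lambda x: -x[0])

paragraphs = [
    "Air is made mainly from molecules of nitrogen and oxygen.",
    "These molecules scatter the blue colors of sunlight more effectively "
    "than the green and red colors. Therefore, a clean sky appears blue.",
]
for score, p in rank_paragraphs("What color is the sky?", paragraphs):
    print(f"{score:.3f}  {p[:60]}")
```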

  10. Pipeline Method: Noisy Supervision Document-level data can be expected to be distantly supervised: Question: “Which British general was killed at Khartoum in 1885?” Passage: “In February 1884 Gordon returned to the Sudan to evacuate Egyptian forces. Rebels broke into the city, killing Gordon and the other defenders. The British public reacted to his death by acclaiming ‘Gordon of Khartoum’, a saint. However, historians have since suggested that Gordon defied orders and…”

  11. Pipeline Method: Noisy Supervision § Need a training objective that can handle multiple (noisy) answer spans § Use the summed objective from Kadlec et al. (2016), which optimizes the log of the summed probability of all answer spans § Remains agnostic to how probability mass is distributed among the answer spans
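A minimal sketch of the summed objective, assuming the model produces one logit per candidate span and a mask marks every span whose text matches the (distantly supervised) answer:

```python
import numpy as np

def summed_objective_loss(span_logits, is_answer_span):
    """Negative log of the summed probability over all (noisy) answer spans.

    span_logits:    1-D array of model scores, one per candidate span.
    is_answer_span: boolean array marking spans whose text matches the answer.
    """
    # Softmax over all candidate spans.
    logits = span_logits - span_logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # Sum probability mass over every matching span; the model is free to
    # distribute that mass among them however it likes.
    answer_mass = probs[is_answer_span].sum()
    return -np.log(answer_mass)

# Toy example: spans 1 and 3 both match the distantly supervised answer text.
logits = np.array([0.2, 2.0, -1.0, 1.5])
mask = np.array([False, True, False, True])
print(summed_objective_loss(logits, mask))
```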

  12. Pipeline Method: Model § Construct a fast, competitive model § Use some key ideas from prior work: bidirectional attention, self-attention, character embeddings, variational dropout § Also add learned tokens marking document and paragraph starts § < 5 hours to train for 26 epochs on SQuAD
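One of those details, the learned start markers, can be sketched as follows (dimensions and names are illustrative only, not the authors' implementation):

```python
import numpy as np

# Sketch: prepend a learned "document start" or "paragraph start" embedding to
# each paragraph's word embeddings, so the model can tell where paragraphs
# begin even when they are processed out of context. Dimensions are toy-sized.
dim = 4
doc_start = np.random.randn(dim)    # a learned parameter in the real model
para_start = np.random.randn(dim)   # a learned parameter in the real model

def embed_paragraph(word_embeddings, is_first_paragraph):
    marker = doc_start if is_first_paragraph else para_start
    return np.vstack([marker, word_embeddings])

para = np.random.randn(7, dim)      # 7 word embeddings
print(embed_paragraph(para, is_first_paragraph=False).shape)  # (8, 4)
```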

  13. Confidence Methods § We can derive confidence scores from the logit scores the model gives to each span, i.e., the scores before the softmax operator is applied § Without re-training, this can work poorly
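A rough sketch of this baseline (my illustration; `span_scorer` is a hypothetical stand-in for the trained model): run the model on each paragraph independently and keep the answer with the highest raw span logit, since per-paragraph softmax probabilities are not comparable across paragraphs.

```python
# Un-retrained confidence baseline: compare raw (pre-softmax) span logits
# across paragraphs and return the globally highest-scoring span.
def best_answer_by_logit(question, paragraphs, span_scorer):
    best_span, best_logit = None, float("-inf")
    for para in paragraphs:
        for span, logit in span_scorer(question, para):   # [(span_text, logit), ...]
            if logit > best_logit:
                best_span, best_logit = span, logit
    return best_span, best_logit

# Toy stand-in scorer: every word is a candidate span with a made-up logit.
def toy_scorer(question, para):
    return [(w, float(len(w))) for w in para.split()]

print(best_answer_by_logit("What color is the sky?",
                           ["the sky appears blue", "nitrogen and oxygen"],
                           toy_scorer))
```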

  14. Example from SQuAD Question: “When is the Members Debate held?” Model Extraction: “…majority of the Scottish electorate voted for it in a referendum to be held on 1 March 1979 that represented at least…” Correct Answer: “Immediately after Decision Time a ‘Members Debate’ is held, which lasts for 45 minutes…”

  15. Learning Well-Calibrated Confidence Scores § Train the model on both answer-containing and non-answer-containing paragraphs and use a modified objective function § Merge: Concatenate sampled paragraphs together § No-Answer: Process paragraphs independently, and allow the model to place probability mass on a “no-answer” output § Sigmoid: Assign an independent probability to each span using the sigmoid operator § Shared-Norm: Process paragraphs independently, but compute the span probabilities with a softmax normalized over the spans in all paragraphs
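A minimal sketch of the Shared-Norm objective's key step (my illustration): each paragraph is scored independently, but the softmax denominator is shared across the candidate spans of all paragraphs, so the resulting probabilities are directly comparable between paragraphs.

```python
import numpy as np

def shared_norm_probs(per_paragraph_logits):
    """Softmax over the union of candidate spans from every paragraph.

    per_paragraph_logits: list of 1-D arrays, one array of span logits per
    paragraph (each produced by running the model on that paragraph alone).
    """
    all_logits = np.concatenate(per_paragraph_logits)
    all_logits -= all_logits.max()                    # numerical stability
    probs = np.exp(all_logits) / np.exp(all_logits).sum()
    # Split back so each paragraph keeps its (now comparable) span probabilities.
    sizes = [len(l) for l in per_paragraph_logits]
    return np.split(probs, np.cumsum(sizes)[:-1])

# Two paragraphs: scores now share one scale, unlike per-paragraph softmax.
p1, p2 = shared_norm_probs([np.array([2.0, 0.5]), np.array([1.0, -0.5, 3.0])])
print(p1, p2)
```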

  16. Results

  17. Datasets • TriviaQA: Dataset of trivia questions and related documents found by web search • Includes three settings: Web (a single document for each question), Wiki (multiple Wikipedia documents for each question), and Unfiltered (multiple documents for each question) • SQuAD: Turker-generated questions about Wikipedia articles • We use the questions paired with the entire article • Manual annotation shows most (90%) of the questions remain answerable given the full document they were generated from

  18. Pipeline Method: Results on TriviaQA Web (EM) • TriviaQA Baseline: 41.08 • Our Baseline: 50.21 • +TF-IDF: 53.41 • +Sum: 56.22 • +TF-IDF +Sum: 57.2 • +Model +TF-IDF +Sum: 61.1 • Baseline implementation: uses BiDAF as the model, selects paragraphs by truncating documents, and selects answer spans randomly • 72.14 EM / 81.05 F1 on SQuAD; 78.58 EM / 85.83 F1 with contextualized word embeddings (Peters et al., 2017)

  19. TriviaQA Leaderboard (Exact Match Scores)
  Model | Web-All | Web-Verified | Wiki-All | Wiki-Verified
  Best leaderboard entry (“mingyan”) | 68.65 | 82.44 | 66.56 | 74.83
  Leaderboard entry (“dirkweissen”) | 64.60 | 67.46 | 77.63 | 72.77
  Shared-Norm (Ours) | 66.37 | 79.97 | 63.99 | 67.98
  Dynamic Integration of Background Knowledge (Weissenborn et al., 2017a) | 50.56 | 63.20 | 48.64 | 53.42
  Neural Cascades (Swayamdipta et al., 2017) | 53.75 | 63.20 | 51.59 | 58.90
  MnemonicReader (Hu et al., 2017) | 46.65 | 56.96 | 46.94 | 54.45
  SMARNET (Chen et al., 2017) | 51.11 | 40.87 | 42.41 | 50.51

  20. Error Analysis • Manually annotated 200 errors made by the TriviaQA Web model • 40.5% are due to noise or lack of context in the relevant documents • The remaining errors break down as shown on the next slide

  21. [Pie chart: breakdown of remaining errors] Sentence Reading 35% • Answer indirectly stated 20% • Paragraph Reading 18% • Document Coreference 14% • Part of answer extracted 7% • Missing background knowledge 6%

  22. Building an Open Question Answering System • Use Bing web search and a Wikipedia entity linker to locate relevant documents • Extract the top 12 paragraphs, as ranked by the linear paragraph ranker • Use the model trained on TriviaQA Unfiltered to find the final answer
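A high-level sketch of how these pieces could fit together (the component functions are stubs I made up to show the data flow, not the authors' API):

```python
# Hypothetical glue code for the open QA pipeline described above. In the real
# system, retrieval is Bing web search plus a Wikipedia entity linker, ranking
# is the linear paragraph ranker, and the reader is the shared-norm model
# trained on TriviaQA Unfiltered.
def retrieve_documents(question):
    """Stub: return raw document texts for the question."""
    return ["Air is made mainly from molecules of nitrogen and oxygen. "
            "A clean sky appears blue."]

def rank_paragraphs(question, paragraphs):
    """Stub: the linear ranker; here, keep the original order."""
    return paragraphs

def qa_model(question, paragraphs):
    """Stub: the shared-norm reading comprehension model."""
    return "blue"

def answer_question(question, top_k=12):
    docs = retrieve_documents(question)
    paragraphs = [p for doc in docs for p in doc.split("\n\n")]
    top = rank_paragraphs(question, paragraphs)[:top_k]   # top 12 paragraphs
    return qa_model(question, top)

print(answer_question("What color is the sky?"))
```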

  23. Demo

  24. Curated TREC Results (Accuracy) • YodaQA with Bing (Baudis, 2015): 37.18 • YodaQA (Baudis, 2015): 34.26 • DrQA + DS (Chen et al., 2017a): 25.7 • S-Norm (ours): 53.31

  25. Thank You Demo: https://documentqa.allenai.org/ GitHub: https://github.com/allenai/document-qa
