SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. Published in EMNLP 2016. Presented by Jiaming Shen, April 17, 2018. 1
SQuAD = Stanford Question Answering Dataset. Online challenge: https://rajpurkar.github.io/SQuAD-explorer/ 2
Overall contribution • A benchmark dataset with: • An appropriate level of difficulty • A principled curation process • Detailed data analysis 3
Outline • What are the QA datasets prior to SQuAD? • What does SQuAD look like? • How is SQuAD created? • What are the properties of SQuAD? • How well can we do on SQuAD? 4
What are the QA datasets prior to SQuAD? 5
Related Datasets Type I: Complex reading comprehension datasets Type II: Open-domain QA datasets Type III: Cloze datasets 6
Type I: Complex Reading Comprehension Datasets • Require commonsense knowledge, very challenging • Dataset sizes are too small 7
Type II: Open-domain QA Datasets • Open-domain QA: answer a question from a large collection of documents. • WikiQA: only sentence selection • TREC-QA: free-form answers -> hard to evaluate 8
Type III: Cloze Datasets • Automatically generated -> large scale • Limitations are described in an ACL 2016 Best Paper. 9
What does SQuAD look like? 10
SQuAD Dataset Format [Figure: an example passage with one QA pair highlighted] 11
SQuAD Dataset Format • One passage can have multiple question-answer pairs. • In total, 100,000+ QA pairs from 23,215 passages. 12
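To make the format concrete, below is a minimal sketch of reading a released SQuAD v1.1 JSON file in Python. The field names (data, paragraphs, context, qas, answers, answer_start) follow the public release; the file name train-v1.1.json is only an assumption for illustration.

```python
import json

# Load the released training file (file name assumed for illustration).
with open("train-v1.1.json") as f:
    squad = json.load(f)

for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]                # the passage text
        for qa in paragraph["qas"]:                   # multiple QA pairs per passage
            question = qa["question"]
            for answer in qa["answers"]:
                text = answer["text"]                 # answer span as a string
                start = answer["answer_start"]        # character offset into the passage
                span = context[start:start + len(text)]  # should recover `text`
```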
How is SQuAD created? 13
SQuAD Dataset Collection • Consists of three steps: • Step 1: Passage curation • Step 2: Question-answer collection • Step 3: Additional answer collection 14
Step 1: Passage Curation • Select the top 10,000 articles of English Wikipedia based on Wikipedia’s internal PageRank scores. • Randomly sample 536 articles out of the 10,000. • Extract paragraphs longer than 500 characters from the 536 articles -> 23,215 paragraphs. • Train/dev/test sets are split at the article level. • The train/dev sets are released; the test set is held out. 15
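A rough sketch of these curation steps follows. The articles structure, function name, and seed are assumptions for illustration; the 500-character paragraph filter and the article-level 80/10/10 split follow the description above.

```python
import random

def curate_passages(articles, n_articles=536, min_chars=500, seed=0):
    """Illustrative sketch: `articles` maps article title -> list of paragraph strings."""
    rng = random.Random(seed)
    titles = rng.sample(sorted(articles), n_articles)   # sample 536 of the top articles
    passages = {t: [p for p in articles[t] if len(p) > min_chars] for t in titles}

    # Split at the article level (80/10/10), so every paragraph of an
    # article lands in the same partition.
    rng.shuffle(titles)
    n_train, n_dev = int(0.8 * n_articles), int(0.1 * n_articles)
    splits = {
        "train": titles[:n_train],
        "dev":   titles[n_train:n_train + n_dev],
        "test":  titles[n_train + n_dev:],
    }
    return passages, splits
```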
Step 2: Question-Answer Collection • Uses crowdsourcing • Crowd-workers with at least a 97% HIT acceptance rate, more than 1,000 completed HITs, and located in the US or Canada. • Workers spend 4 minutes on each paragraph, asking up to 5 questions and highlighting each answer in the text. 16
Step 2: Question-Answer Collection 17
Step 3: Additional Answers Collection • For each question in the dev/test sets, collect at least two additional answers. • Why do this? • It makes evaluation more robust. • It allows assessing human performance. 18
What are the properties of SQuAD? 19
Data Analysis • Diversity in answers • Reasoning for answering questions • Syntactic divergence 20
Diversity in Answers • 67.4% of answers are non-entities, and many answers are not even noun phrases -> can be challenging. 21
Reasoning for answering questions 22
Syntactic divergence • Syntactic divergence is the minimum, over all possible anchors (word-lemma pairs shared by the question and the answer sentence), of the edit distance between the two unlexicalized dependency paths: from the anchor to the wh-word in the question, and from the anchor to the answer in the sentence. 23
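A sketch of how this quantity can be computed, assuming the unlexicalized dependency paths have already been extracted as sequences of edge labels (the path-extraction step itself, which needs a dependency parser, is omitted):

```python
def edit_distance(path_q, path_s):
    """Levenshtein distance between two dependency paths given as edge-label sequences."""
    m, n = len(path_q), len(path_s)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if path_q[i - 1] == path_s[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete from question path
                          d[i][j - 1] + 1,         # insert into question path
                          d[i - 1][j - 1] + cost)  # substitute
    return d[m][n]

def syntactic_divergence(anchor_paths):
    """`anchor_paths`: one (question_path, sentence_path) pair per anchor.
    The divergence is the minimum edit distance over all anchors."""
    return min(edit_distance(q, s) for q, s in anchor_paths)
```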
Syntactic divergence • Histogram of syntactic divergence 24
How well can we do on SQuAD? 25
“Baseline” method • Candidate answer generation: constituents from a constituency parse • Feature extraction • Train a logistic regression model 26
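As an illustration of the candidate-generation step, one simple way to enumerate constituent spans from a bracketed constituency parse is sketched below. The parse string would come from an off-the-shelf constituency parser, and the helper name is hypothetical.

```python
from nltk.tree import Tree

def candidate_spans(parse_str):
    """Return every constituent of a parsed sentence as a candidate answer span."""
    tree = Tree.fromstring(parse_str)
    return [" ".join(sub.leaves()) for sub in tree.subtrees()]

# Example with a toy parse: yields the full sentence, "The cat", "on the mat", etc.
spans = candidate_spans(
    "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
```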
• Help to pick the correct sentence • Resolve lexical variations • Resolve syntactic variations 27
Evaluation • After ignoring punctuation and articles, two metrics are used: • Exact Match (EM) • Macro-averaged F1 score: the maximum F1 over all ground-truth answers 28
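A small sketch of these two metrics in Python, written in the spirit of (but not identical to) the official evaluation script: answers are lower-cased, punctuation and the articles a/an/the are stripped, and both metrics take the maximum over all reference answers.

```python
import re
import string
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation and articles (a/an/the), collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, references):
    return max(float(normalize(prediction) == normalize(r)) for r in references)

def f1_score(prediction, references):
    def f1(pred, ref):
        pred_toks, ref_toks = normalize(pred).split(), normalize(ref).split()
        overlap = sum((Counter(pred_toks) & Counter(ref_toks)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(pred_toks), overlap / len(ref_toks)
        return 2 * precision * recall / (precision + recall)
    return max(f1(prediction, r) for r in references)
```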
Experiment results • Overall results • Human performance on the SQuAD v1.1 test set: EM 82.304, F1 91.221 29
Experiment results • Performance stratified by answer type 30
Experiment results • Performance stratified by syntactic divergence 31
Experiment results • Performance with feature ablations 32
Summary • SQuAD is a machine-reading-style QA dataset. • SQuAD consists of 100,000+ QA pairs. • SQuAD was constructed via crowdsourcing. • SQuAD drives the field forward. 33
Thanks Q & A 34