TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer Presenter: Zhuolun Xiang
Background • Question Answering (QA) Formulation • Answer a question q given evidence documents D • Dataset of tuples {(q_j, a_j, D_j)}, j = 1, …, n • Each answer a_j is a substring of its evidence D_j • Example
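A minimal sketch of this tuple structure in Python (the class and field names are illustrative, not the dataset's released format):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QAExample:
    """One (question, answer, evidence) tuple (q_j, a_j, D_j)."""
    question: str              # q_j
    answer: str                # a_j
    evidence_docs: List[str]   # D_j: one or more evidence documents

def answer_in_evidence(ex: QAExample) -> bool:
    """Distant-supervision check: does a_j appear as a substring of some doc in D_j?"""
    return any(ex.answer.lower() in doc.lower() for doc in ex.evidence_docs)
```

Because the question-answer pairs are collected independently of the evidence, this substring check is only a distant-supervision signal, not a guarantee that the document actually supports the answer.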
Overview • TriviaQA • Over 650K question-answer-evidence triples • First dataset whose questions were authored by trivia enthusiasts • Evidence documents from Web search and Wikipedia pages • A high percentage of the questions are challenging • Dataset samples
Dataset Collection • Gather question-answer pairs from 14 trivia websites • Remove short questions • Collect evidence from Web search and Wikipedia (a rough pipeline sketch follows below) • Web search • Pose each question as a query to Bing • Exclude trivia websites • Crawl the top 10 results • Wikipedia • Use TAGME to find Wikipedia entities mentioned in the question • Add these entities' pages as evidence
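A hedged sketch of the collection pipeline described above; `web_search`, `fetch_page`, and `tagme_entities` are hypothetical callables standing in for the Bing query, a page scraper, and the TAGME entity linker:

```python
from typing import Callable, Dict, List

def collect_evidence(question: str,
                     web_search: Callable[[str], List[str]],     # hypothetical: ranked result URLs
                     fetch_page: Callable[[str], str],           # hypothetical: download and clean page text
                     tagme_entities: Callable[[str], List[str]], # hypothetical TAGME wrapper
                     trivia_domains: set) -> Dict[str, List[str]]:
    """Sketch of the two evidence sources: top Bing results (excluding trivia sites)
    and Wikipedia pages of entities mentioned in the question."""
    # Web search evidence: issue the question as a query, drop trivia websites, keep the top 10.
    urls = [u for u in web_search(question) if _domain(u) not in trivia_domains]
    web_docs = [fetch_page(u) for u in urls[:10]]

    # Wikipedia evidence: link entities in the question with TAGME and add their pages.
    wiki_docs = [fetch_page(_wikipedia_url(e)) for e in tagme_entities(question)]

    return {"web": web_docs, "wikipedia": wiki_docs}

def _domain(url: str) -> str:
    """Naive helper for the sketch: extract the host part of a URL."""
    return url.split("/")[2] if "://" in url else url

def _wikipedia_url(title: str) -> str:
    """Illustrative helper: build a Wikipedia page URL from an entity title."""
    return "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")
```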
Dataset Analysis • Question-answer pairs • Average question length = 14 words • Manually analyzed 200 sampled questions (tables of question and answer properties omitted) • Evidence • 75.4% / 79.7% of Web / Wikipedia evidence documents contain the answer • A human test achieves 75.3 / 79.6 accuracy on the Web / Wikipedia domains • Answering 40% of the questions requires information from multiple sentences
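A rough sketch of how the evidence-coverage statistic above could be computed, assuming each example is a dict with an "answer" string and per-source "evidence" lists (this field layout is illustrative, not the released format):

```python
from typing import Dict, List

def evidence_coverage(examples: List[Dict], source: str) -> float:
    """Fraction of examples whose evidence from `source` ("web" or "wikipedia")
    contains the reference answer as a case-insensitive substring."""
    if not examples:
        return 0.0
    hits = sum(
        any(ex["answer"].lower() in doc.lower() for doc in ex["evidence"].get(source, []))
        for ex in examples
    )
    return hits / len(examples)
```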
Experiments: Baseline Methods • Random entity baseline (Wikipedia domain only) • Entities in the Wikipedia evidence pages form the candidate answer set • Randomly pick one that does not occur in the question • Entity classifier • Cast as a ranking problem over candidate answers • Ranking function learned with LambdaMART (Wu et al., 2010) • Neural model • Uses the BiDAF model (Seo et al., 2017)
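A minimal sketch of the random entity baseline, assuming the candidate entities have already been extracted from the Wikipedia evidence pages (the function name and signature are illustrative):

```python
import random
from typing import List, Optional

def random_entity_baseline(question: str,
                           candidate_entities: List[str],
                           rng: Optional[random.Random] = None) -> str:
    """Random entity baseline (Wikipedia domain): pick a candidate entity from the
    evidence pages that does not already occur in the question."""
    rng = rng or random.Random(0)
    eligible = [e for e in candidate_entities if e.lower() not in question.lower()]
    return rng.choice(eligible) if eligible else ""
```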
Experiments • Metrics • Exact match (EM) and F1 score • For numerical and free-form answers: the single given answer is the ground truth • For Wikipedia entity answers: Wikipedia aliases are also accepted • Setup • Random partition into train (80%) / development (10%) / test (10%)
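A sketch of SQuAD-style EM and F1 over normalized tokens, which the TriviaQA evaluation closely follows; this is a common reference implementation, not the official evaluation script:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize(prediction) == normalize(ground_truth))

def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # per-token overlap counts
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For Wikipedia entity answers, alias matching would simply take the maximum score over all accepted answer strings, e.g. `max(f1_score(pred, a) for a in aliases)`.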
Experiments • Results • Human baseline: 79.7% on Wikipedia, 75.4% on the Web
Conclusion • TriviaQA • 650K question-answer-evidence triples • Questions authored by trivia enthusiasts • Evidence documents from Web search and Wiki pages • Experiments show TriviaQA is a challenging testbed Thanks!