Why Is That Relevant? Collecting Annotator Rationales for Relevance Judgments
Presenter: Tyler McDonnell, Department of Computer Science, The University of Texas at Austin
Tyler McDonnell, Matthew Lease, Mucahid Kutlu, Tamer Elsayed
2016 AAAI Conference on Human Computation & Crowdsourcing
Search Relevance
Example query: "jaundice" ("What are the symptoms of jaundice?")
Search Relevance
25 Years of the National Institute of Standards & Technology Text REtrieval Conference (NIST TREC)
● Expert assessors provide relevance labels for web pages.
● The task is highly subjective: even expert assessors disagree often.* (Google's Quality Rater Guidelines run to 150 pages of instructions!)
* Voorhees 2000
A First Experiment
● Collected a sample of relevance judgments on Mechanical Turk.
● Labeled some data myself.
● Checked agreement (a minimal sketch of the check appears below):
  ○ Between workers.
  ○ Between workers vs. myself.
  ○ Between workers vs. NIST gold.
  ○ Between myself vs. NIST gold.
● Why do I disagree with NIST? Who knows!
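A minimal sketch of how such pairwise agreement checks might be computed, assuming binary labels and simple percent agreement; the data layout (dicts keyed by a (query, document) pair) is an illustrative assumption, not the study's actual format:

    def percent_agreement(labels_a, labels_b):
        """Fraction of shared (query, doc) keys on which two label sources agree.
        labels_a, labels_b: dicts mapping a (query, doc) key to a binary label."""
        shared = set(labels_a) & set(labels_b)
        if not shared:
            return 0.0
        return sum(labels_a[k] == labels_b[k] for k in shared) / len(shared)

    # Toy example: agreement between my labels and NIST gold.
    mine = {("jaundice", "doc1"): True, ("jaundice", "doc2"): False}
    nist = {("jaundice", "doc1"): True, ("jaundice", "doc2"): True}
    print(percent_agreement(mine, nist))  # 0.5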
Search Relevance
Can we do better?
The Rationale
Example query: "jaundice" ("What are the symptoms of jaundice?")
Why Rationales? 1. Transparency
● Focused context for interpreting objective or subjective answers.
● Workers can justify decisions and establish alternative truths.
● Useful for immediate verification and for future users of the collected data.
Why Rationales? 2. Reliability & Verifiability
● Logical insight into a worker's reasoning reduces the temptation to cheat.
● Makes explicit the implicit reasoning underlying labeling tasks.
● Enables sequential task design.
Why Rationales? 3. Increased Inclusivity
Hypothesis: With improved transparency and accountability, we can remove all traditional barriers to participation so that anyone interested is allowed to work.
● Scalability
● Diversity
● Equal Opportunity
Experimental Setup
● Collected relevance judgments through Mechanical Turk.
● Evaluated two main task types:
  ○ Standard Task (Baseline): Assessors provide a relevance judgment for a given (query, web page) pair.
  ○ Rationale Task: Assessors provide a relevance judgment and a rationale drawn from the document.
  ○ (Two other variants are mentioned later.)
● No worker qualifications. No "honey-pot" or verification questions.
● Equal pay across all evaluated tasks.
● 10,000 judgments collected. (Available online*)
Results - Accuracy
● Workers who provide rationales produce higher-quality work.
● Rationale tasks yielded higher binary accuracy (92-96%) than comparable studies (80-82%).*
● Collecting one rationale provides only marginally lower accuracy than aggregating five standard judgments (see the aggregation sketch below).
* Hosseini et al. 2012
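A minimal sketch of how the "five standard judgments" baseline might be aggregated and scored against NIST gold, assuming simple majority voting over binary labels; the function names and data layout are illustrative, not from the paper:

    from collections import Counter

    def majority_vote(labels):
        """Aggregate binary labels (True = relevant) by simple majority;
        ties resolve to whichever label Counter happens to return first."""
        return Counter(labels).most_common(1)[0][0]

    def accuracy_against_gold(judgments, gold):
        """judgments: {(query, doc): [label, label, ...]}; gold: {(query, doc): label}."""
        scored = [key for key in judgments if key in gold]
        if not scored:
            return 0.0
        correct = sum(majority_vote(judgments[k]) == gold[k] for k in scored)
        return correct / len(scored)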
Results - Cost-Efficiency
● Rationale tasks initially take longer to complete, but the difference becomes negligible with task familiarity.
● Rationales make explicit the implicit reasoning process underlying labeling.
But wait, there's more!
What about the rationale itself?
Using Rationales: Overlap
[Diagram: the overlap between Assessor 1's rationale and Assessor 2's rationale]
Idea: Filter judgments based on pairwise rationale overlap among assessors (a minimal sketch follows).
Motivation: Workers who converge on similar rationales are likely to agree on labels as well.
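A minimal sketch of overlap-based filtering, assuming rationales are compared as sets of word tokens with a Jaccard measure and a fixed threshold; both choices are illustrative assumptions rather than the paper's exact definition of overlap:

    def jaccard_overlap(rationale_a, rationale_b):
        """Token-level Jaccard similarity between two rationale strings."""
        tokens_a = set(rationale_a.lower().split())
        tokens_b = set(rationale_b.lower().split())
        if not tokens_a or not tokens_b:
            return 0.0
        return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

    def filter_by_overlap(judgments, threshold=0.2):
        """Keep only judgments whose rationale sufficiently overlaps at least one
        other assessor's rationale for the same (query, doc) pair.
        judgments: list of dicts with "label" and "rationale" keys."""
        kept = []
        for i, j in enumerate(judgments):
            if any(jaccard_overlap(j["rationale"], other["rationale"]) >= threshold
                   for k, other in enumerate(judgments) if k != i):
                kept.append(j)
        return kept

The surviving judgments would then be aggregated as before (e.g., by majority vote).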
Results - Accuracy (Overlap)
● Filtering collected judgments by rationale overlap prior to aggregation increases quality.
Using Rationales: Two-Stage Task Design
[Diagram: Assessor 1: Relevant, with Assessor 1's rationale; Assessor 2: ?]
Idea: A reviewer must confirm or refute the judgment of the initial assessor (a minimal sketch follows).
Motivation: Workers must consider their response in the context of a peer's reasoning.
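A minimal sketch of how a two-stage label might be resolved, assuming each reviewer either confirms or refutes the first-stage judgment after seeing its rationale; the resolution rule below (majority over reviewer-implied labels, with the assessor breaking ties) is an illustrative assumption, not the paper's exact procedure:

    def resolve_two_stage(initial_label, reviewer_confirms):
        """initial_label: bool (True = relevant), from the first-stage assessor.
        reviewer_confirms: list of bools, True = reviewer confirms the judgment.
        A confirmation implies the initial label; a refutation implies its opposite."""
        reviewer_labels = [initial_label if c else (not initial_label)
                           for c in reviewer_confirms]
        votes_relevant = sum(reviewer_labels)
        votes_not_relevant = len(reviewer_labels) - votes_relevant
        if votes_relevant != votes_not_relevant:
            return votes_relevant > votes_not_relevant
        return initial_label  # tie: fall back to the first-stage assessor

With a single reviewer, the reviewer's decision is final; with several, their implied labels are aggregated.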
Results - Accuracy (Two-Stage)
● A single review (1 Assessor + 1 Reviewer) offers the same accuracy as five aggregated standard judgments.
● Aggregating reviewers (1 Assessor + 4 Reviewers) reaches the same accuracy as the filtered approaches.
The Big Picture
● Transparency
  ○ Context for understanding and validating subjective answers.
  ○ Convergence on justification-based crowdsourcing (e.g., MicroTalk, HCOMP 2016).
● Improved Accuracy
  ○ Rationales make the implicit reasoning for labeling explicit and hold workers accountable.
● Improved Cost-Efficiency
  ○ No additional cost for collection once workers are familiar with the task.
● Improved Aggregation
  ○ Rationales are a signal that can be used for filtering or aggregating judgments.
Future Work
● Dual Supervision: How can we further leverage rationales for aggregation? Supervised learning over labels and rationales (a sketch of one such idea follows).
  ○ Zaidan, Eisner, Piatko. Using "Annotator Rationales" to Improve Machine Learning for Text Categorization. NAACL 2007.
● Task Design: What about other sequential task designs? (e.g., multi-stage)
● Generalizability: How far can we generalize rationales to other tasks? (e.g., images)
  ○ Donahue, Grauman. Annotator Rationales for Visual Recognition. ICCV 2011.
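One established way to learn from labels and rationales together, following the general idea in Zaidan et al. (NAACL 2007), is to build "contrast" examples by deleting rationale words and then constrain a classifier to score each original document higher than its contrast. The sketch below shows only the contrast-example construction; the toy document and rationale are hypothetical:

    def make_contrast_example(document, rationale_tokens):
        """Delete the annotator's rationale words to form a weakened 'contrast'
        example. In Zaidan et al.'s SVM formulation, each original/contrast pair
        adds a constraint that the original must score higher by a margin,
        pushing the learner to rely on the rationale words."""
        kept = [w for w in document.split() if w.lower() not in rationale_tokens]
        return " ".join(kept)

    doc = "jaundice often causes yellowing of the skin and eyes"  # toy document
    rationale = {"yellowing", "skin", "eyes"}                      # hypothetical rationale
    print(make_contrast_example(doc, rationale))
    # -> "jaundice often causes of the and"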
Acknowledgements
We would like to thank our many talented crowd contributors.
This work was made possible by the Qatar National Research Fund, a member of Qatar Foundation.
Questions?