Argument Retrieval in Project Debater Yufang Hou IBM Research Europe, Dublin
IBM Research: History of Grand Challenges
• 2019: First computer to successfully debate champion debaters (Project Debater)
• 2011: First computer to defeat best human Jeopardy! players (Watson)
• 1997: First computer to defeat a world champion in Chess (Deep Blue)
Segments from a Live Debate (San Francisco, Feb 11th 2019)
• Expert human debater: Mr. Harish Natarajan
• Motion: We should subsidize preschool
• Format: Oxford style debating
• Fully automatic debate
• No human intervention
• Motion selected from the test set based on an assessment of the chances of having a meaningful debate
Project Debater: Media Exposure
• 2.1 Billion social media impressions
• 100 Million people reached
• Millions of video views
• Hundreds of press articles in all leading newspapers
• Full Live Debate, Feb-2019: https://www.youtube.com/watch?v=m3u-1yttrVw&t=2469s
• "The Debater" Documentary: https://www.youtube.com/watch?v=7pHaNMdWGsk&t=1383s
Outline
• System overview
• Argument retrieval in Project Debater
• Some retrospective thoughts
Current Publications Highlight Various Aspects of the System
Publications and Datasets are available at https://www.research.ibm.com/artificial-intelligence/project-debater/research/
Outline
• System overview
• Argument retrieval in Project Debater
• Some retrospective thoughts
Related Work
• Lippi and Torroni, IJCAI, 2015
• Al-Khatib et al, NAACL 2016; Wachsmuth et al, Argument Mining Workshop, 2017, …
• Stab and Gurevych, EMNLP 2014; Stab et al, NAACL 2018, …
• Recent reviews
  - Five years of argument mining: a data-driven analysis, Cabrio and Villata, IJCAI, 2018
  - Argumentation Mining, Stede and Schneider, Synthesis Lectures on HLT, 2018
  - Argument Mining: A Survey, Lawrence and Reed, CL, 2019
Wikipedia Stage
• Context Dependent Claim Detection, Levy et al, COLING 2014.
• Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection, Rinott et al, EMNLP 2015.
Wikipedia Stage
• Wikipedia Claim/Evidence Labeled Data: Labeling Process
  Controversial Topic → Select Wikipedia Articles → Find Claim Candidates per Article → Confirm/Reject Each Claim Candidate → Find Candidate Evidence per Claim → Confirm/Reject Each Candidate Evidence
  ✓ 5 In-house Annotators Per Stage
  ✓ Exhaustive annotation
Wikipedia Stage
• Wikipedia Claim/Evidence Labeled Data: Results
  ✓ 58 Controversial Topics selected from Debatabase
    - E.g., Ban the sale of Violent Video Games for Children
  ✓ 547 relevant Wikipedia articles carefully labeled by in-house team
  ✓ 2.6K Claims & 4.5K Evidence that support/contest the claims
    - Evidence length varies from one sentence to a whole paragraph
    - Three types of Evidence: Study, Expert, and Anecdotal
  ✓ Pre-defined train/dev/test split
Wikipedia Stage
• System Design for Argument Mining
  Topic: We should subsidize preschool
  Components: Topic Analysis, Document Level IR, Claim Detection, Evidence Detection
  - Document Level IR: retrieve documents that directly address the topic and are likely to contain argumentative text segments
  - Simple logistic regression model with lots of carefully designed features
  - GrASP: Rich Patterns for Argumentation Mining, Shnarch et al., EMNLP 2017
  - Static train/dev/test datasets
  - Moderate success over a range of test topics
  - Only positive instances are annotated
  - Limited coverage
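To make "simple logistic regression model with carefully designed features" concrete, here is a minimal sketch in plain Python. The slide does not list the actual features, so the three binary features, the two small lexicons, and the training sentences below are all hypothetical and only illustrate the shape of such a classifier.

```python
import math

# Hypothetical lexicons: the slide does not list the real features.
CLAIM_MARKERS = {"should", "must", "because", "therefore"}
SENTIMENT_WORDS = {"harmful", "beneficial", "good", "bad", "improves"}

def featurize(sentence, topic_terms):
    """Map a sentence to a small hand-designed binary feature vector."""
    tokens = sentence.lower().replace(".", "").split()
    return [
        1.0,                                              # bias term
        float(any(t in topic_terms for t in tokens)),     # mentions the topic
        float(any(t in CLAIM_MARKERS for t in tokens)),   # has a claim marker
        float(any(t in SENTIMENT_WORDS for t in tokens)), # has a sentiment word
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability that the featurized sentence x is a claim."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train(examples, labels, epochs=500, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(examples, labels):
            err = predict_proba(weights, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights

# Toy, made-up training data (1 = claim, 0 = not a claim).
topic = {"preschool"}
train_sentences = [
    ("Subsidized preschool improves later outcomes because children start earlier.", 1),
    ("Governments should fund preschool.", 1),
    ("The preschool opened in 1990.", 0),
    ("The city has three bridges.", 0),
]
X = [featurize(s, topic) for s, _ in train_sentences]
y = [label for _, label in train_sentences]
w = train(X, y)
new_x = featurize("We should subsidize preschool.", topic)
```

In this toy setup the claim-marker feature is what separates the two classes, which hints at why such a model depends so heavily on feature design.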
VLC (Very Large Corpus) Stage
• Corpus wide argument mining - a working solution, Ein-Dor et al, AAAI 2020.
VLC (Very Large Corpus) Stage
Main Distinction from Prev. Work
• Sentence Level (SL) strategy, vs. Document Level used before
• SCALE
  - ~240 train/dev topics & ~100 test topics
  - ~200,000 sentences carefully annotated for train/dev → Retrospective Labeling Paradigm
  - ~10,000,000,000 Sentences - Reporting results over a massive corpus
Closer than ever to a working solution
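The slides name the Retrospective Labeling Paradigm without spelling out its mechanics. The sketch below assumes it means: in each round, score the unlabeled pool with the current model, send only the top predictions to annotators, and fold the new labels back into the training data. The word-count scorer, the `annotate` stand-in for human annotators, and all sentences are hypothetical.

```python
from collections import Counter

def train_scorer(labeled):
    """Toy stand-in for a model: weight each word by (#positive - #negative)."""
    weights = Counter()
    for sentence, label in labeled:
        for word in set(sentence.lower().split()):
            weights[word] += 1 if label else -1
    # Counter returns 0 for unseen words, so unknown words contribute nothing.
    return lambda s: sum(weights[w] for w in set(s.lower().split()))

def retrospective_labeling(pool, seed_labeled, annotate, rounds=3, top_k=2):
    """Iteratively label only the current model's top-scored predictions."""
    labeled = list(seed_labeled)
    unlabeled = list(pool)
    for _ in range(rounds):
        if not unlabeled:
            break
        score = train_scorer(labeled)            # "retrain" on all labels so far
        unlabeled.sort(key=score, reverse=True)  # rank remaining pool by the model
        batch, unlabeled = unlabeled[:top_k], unlabeled[top_k:]
        labeled.extend((s, annotate(s)) for s in batch)
    return labeled

# Made-up data; annotate() stands in for the human annotation step.
pool = [
    "studies show preschool improves test scores",
    "a study found preschool reduces crime",
    "the preschool is on main street",
    "preschool enrollment opens in may",
]
seed = [
    ("studies show exercise improves health", True),
    ("the gym is on main street", False),
]
annotate = lambda s: "stud" in s
result = retrospective_labeling(pool, seed, annotate)
```

Because only top predictions are labeled, annotation effort concentrates where positives are likely, which matters when positive examples are rare in the corpus.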
VLC (Very Large Corpus) Stage
System Architecture
• Pipeline: Controversial Topic → Queries → Massive Corpus (~10B Sentences) → Retrieved Sentences → Ranking Model (BERT) → High-precision Evidence Set
• Queries: support flexible patterns to retrieve argumentative sentences
  - Topic terms
  - Evidence connectors
  - Sentiment lexicon
  - NER
• Retrieved Sentences: retrieve 12,000 sentences per evidence type per topic
• Ranking Model: starting with LR from Rinott et al, EMNLP 2015
• Retrospective Labeling Paradigm: iteratively collected labeled data
• An infrastructure that supports quick dynamic experiments and monitors annotation quality
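The two-step flow on this slide (pattern queries for recall, then a ranking model for precision) can be sketched as follows. This is an illustration under stated assumptions, not the system's query language: the connector list, the sentiment lexicon, and the digit-counting `toy_score` (a placeholder for the BERT ranking model) are hypothetical, and the NER ingredient is omitted.

```python
import re

# Hypothetical query ingredients; the real lexicons are not given on the slide.
EVIDENCE_CONNECTORS = ["according to", "studies show", "a study found", "research suggests"]
SENTIMENT_LEXICON = {"improves", "reduces", "harms", "benefits", "increases"}

def matches_query(sentence, topic_terms):
    """A sentence matches if it has a topic term, an evidence connector, and a sentiment word."""
    text = sentence.lower()
    tokens = set(re.findall(r"[a-z]+", text))
    has_topic = any(term in tokens for term in topic_terms)
    has_connector = any(conn in text for conn in EVIDENCE_CONNECTORS)
    has_sentiment = bool(tokens & SENTIMENT_LEXICON)
    return has_topic and has_connector and has_sentiment

def retrieve_and_rank(corpus, topic_terms, score, limit):
    """Step 1: query-based retrieval. Step 2: keep the top `limit` by ranking score."""
    candidates = [s for s in corpus if matches_query(s, topic_terms)]
    return sorted(candidates, key=score, reverse=True)[:limit]

# Placeholder for the ranking model: counts digits, nothing more.
toy_score = lambda s: sum(ch.isdigit() for ch in s)

corpus = [
    "According to a 2015 study, preschool improves reading scores by 10 percent.",
    "Studies show that preschool benefits children.",
    "The preschool is located downtown.",
    "According to the report, traffic increases in summer.",
]
# 12,000 is the per-evidence-type, per-topic retrieval size quoted on the slide.
ranked = retrieve_and_rank(corpus, {"preschool"}, toy_score, limit=12000)
```

Here the third sentence is dropped for lacking a connector and the fourth for lacking a topic term, so only the first two reach the ranking step.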
VLC (Very Large Corpus) Stage
How to Collect Labeled Data?
• Collecting labeled data poses a two-fold challenge
  - Low prior of positive examples
  - Annotation through crowd requires expertise: simple guidelines, careful monitoring…
  - BTW - Kappa of ~0.4 is actually quite good
• Developing corpus-wide argument mining poses another challenge
  - Imagine ~2,000 new predictions every week… → Associated infrastructure is a must
  - Retrospective labeling of top predictions is a natural and effective solution
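The slide does not say which kappa statistic is meant. Assuming Cohen's kappa for two annotators, the sketch below shows how a low prior of positive examples can put kappa near 0.4 even when raw agreement is high. The label vectors are made up for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators pick the same class independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class
    return (observed - expected) / (1.0 - expected)

# Made-up example: each annotator marks 2 of 10 sentences as evidence,
# and they agree on only one of those positives.
annotator_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
annotator_b = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_a, annotator_b)
raw_agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
```

In this example the annotators agree on 8 of 10 items, yet chance agreement is already 0.68 because negatives dominate, so kappa comes out at 0.375.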
Why Is Evidence Detection Hard?
Motion: Blood donation should be mandatory
• "According to studies, blood donors are 88 percent less likely to suffer a heart attack…" CONFIRMED
• "Statistics … show that students are the main blood donors contributing about 80 percent…" REJECTED
VLC (Very Large Corpus) Stage
Results
• Results by various BERT models over a massive corpus of ~10B sentences
• BA baselines: BlendNet, attention-based bidirectional LSTM model [Shnarch et al. (2018)]
• High precision
• Wide coverage with diverse evidence (highly similar sentences are removed)
[Figure: Macro-Average Precision vs. Number of candidates]
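The plot's axes suggest precision measured over the top-ranked candidates and averaged across topics. A minimal sketch of that metric, assuming "macro-average" means each topic counts equally regardless of how many candidates it has, might look as follows. The topic names and labels are invented.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked candidates that are labeled positive (1)."""
    top = ranked_labels[:k]
    return sum(top) / len(top) if top else 0.0

def macro_average_precision(per_topic_labels, k):
    """Average precision@k over topics, giving each topic equal weight."""
    if not per_topic_labels:
        raise ValueError("need at least one topic")
    scores = [precision_at_k(labels, k) for labels in per_topic_labels.values()]
    return sum(scores) / len(scores)

# Made-up labels: 1 = confirmed evidence, 0 = rejected, in ranked order.
labels_by_topic = {
    "topic A": [1, 1, 0, 1],
    "topic B": [1, 0, 0, 0],
}
p_at_2 = macro_average_precision(labels_by_topic, k=2)
p_at_4 = macro_average_precision(labels_by_topic, k=4)
```

With these labels the macro-average drops from 0.75 at k=2 to 0.5 at k=4, the kind of precision-versus-candidates trade-off such a plot tracks.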
Outline
• System overview
• Argument retrieval in Project Debater
• Some retrospective thoughts
Challenges to Consider while developing a Live Debate System
Data-driven speech writing and delivery
• Digest massive corpora
• Write a well-structured speech
• Deliver with clarity and purpose
Listening comprehension
• Identify key claims hidden in long continuous spoken language
• Compare to personal assistants - simple short commands
Modeling human dilemmas
• Modeling the world of human controversy and dilemmas
• Enabling the system to suggest principled arguments
Argument retrieval is the first step to build such a system
The Problem: Many things need to succeed simultaneously and many things can go wrong…
Many things can go wrong… / Examples
• Getting the stance wrong means you support your opponent…
• Drifting from the topic: from Physical Education to Sex Education and back…
• The system is only as good as its corpus → "… global warming will lead malaria virus to creep into hilly areas…"
Progress over time / Improvement in Precision of Detecting Claims
• Wikipedia stage
  - Document level IR
  - Corpus: Wikipedia
  - Exhaustive labelling of positive instances
  - LR + Rich features
• Very large corpus, Bi-LSTM
  - Very large corpus
  - Sentence level IR
  - Attention-based Bi-LSTM
  - Retrospective labelling
  - Flexible query with weak supervision
• Very large corpus, BERT
  - Sentence level IR
  - Very Large Corpus: 400 million articles (50 times larger than Wikipedia)
  - Retrospective labelling
  - BERT fine-tuning
Beyond Project Debater
• Computational argumentation is emerging as an interesting research area
• "Argument mining" is the new keyword in the list of topics in recent *ACL conferences
Computational Argumentation
  - Argument retrieval
  - Argument Unit Identification
  - Argument Relation Prediction
  - Argument(ation) Quality
  - Argument Generation
  - …
Related areas
• Dialogue System
• Social NLP
  - Sentiment
  - Persuasiveness
  - Social bias
  - Framing
  - Fact verification
  - …
• Natural Language Generation
• Text Summarization
• Discourse and Pragmatics
  - Argumentative discourse
  - Argumentative coherence
  - …