argument retrieval in project debater

Argument Retrieval in Project Debater, Yufang Hou, IBM Research



  1. Argument Retrieval in Project Debater Yufang Hou IBM Research Europe, Dublin

  2. IBM Research: History of Grand Challenges • 2019: First computer to successfully debate champion debaters (Project Debater) • 2011: First computer to defeat best human Jeopardy! players (Watson) • 1997: First computer to defeat a world champion in Chess (Deep Blue)

  3. Segments from a Live Debate (San Francisco, Feb 11th, 2019) • Expert human debater: Mr. Harish Natarajan • Motion: We should subsidize preschool • Format: Oxford style debating • Fully automatic debate • No human intervention • Selected from test set based on assessment of chances to have a meaningful debate

  4. Project Debater: Media Exposure • 2.1 Billion social media impressions • 100 Million people reached • Millions of video views • Hundreds of press articles in all leading newspapers

  5. • Full Live Debate, Feb-2019: https://www.youtube.com/watch?v=m3u-1yttrVw&t=2469s • “The Debater” Documentary: https://www.youtube.com/watch?v=7pHaNMdWGsk&t=1383s

  6. Outline • System overview • Argument retrieval in Project Debater • Some retrospective thoughts

  7. Current Publications Highlight Various Aspects of the System

  8. Publications and Datasets are available at https://www.research.ibm.com/artificial-intelligence/project-debater/research/

  9. Outline • System overview • Argument retrieval in Project Debater • Some retrospective thoughts

  10. Related Work • Lippi and Torroni, IJCAI, 2015 • Al-Khatib et al, NAACL 2016; Wachsmuth et al, Argument-Mining Workshop, 2017, … • Stab and Gurevych, EMNLP 2014; Stab et al, NAACL 2018, … • Recent reviews: Five years of argument mining: a data-driven analysis, Cabrio and Villata, IJCAI, 2018; Argumentation Mining, Stede and Schneider, Synthesis Lectures on HLT, 2018; Argument Mining: A Survey, Lawrence and Reed, CL, 2019

  11. Wikipedia Stage • Context Dependent Claim Detection, Levy et al, COLING 2014. • Show Me Your Evidence - an Automatic Method for Context Dependent Evidence Detection, Rinott et al, EMNLP 2015.

  12. Wikipedia Stage • Wikipedia Claim/Evidence Labeled Data – Labeling Process • Steps: Controversial Topic → Select Wikipedia Articles → Find Claim Candidates per Article → Confirm/Reject Each Claim Candidate → Find Candidate Evidence per Claim → Confirm/Reject Each Candidate Evidence • 5 In-house Annotators Per Stage • Exhaustive annotation

  13. Wikipedia Stage • Wikipedia Claim/Evidence Labeled Data - Results • 58 Controversial Topics selected from Debatabase (e.g., Ban the sale of Violent Video Games for Children) • 547 relevant Wikipedia articles carefully labeled by in-house team • 2.6K Claims & 4.5K Evidence that support/contest the claims; Evidence length varies from one sentence to a whole paragraph; three types of Evidence: Study, Expert, and Anecdotal • Pre-defined train/dev/test split

  14. Wikipedia Stage • System Design for Argument Mining • Diagram components: Topic (We should subsidize preschool), Topic Analysis, Document Level IR, Claim Detection, Evidence Detection • Retrieve documents that directly address the topic and are likely to contain argumentative text segments • Simple logistic regression model with lots of carefully designed features • GrASP: Rich Patterns for Argumentation Mining, Shnarch et al., EMNLP 2017 • Static train/dev/test datasets • Moderate success over a range of test topics • Only positive instances are annotated • Limited coverage
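
As a rough illustration of "a simple logistic regression model with carefully designed features", the sketch below trains a tiny classifier on hand-written binary features. The feature set, the marker words, and the toy training sentences are all hypothetical; the slides do not list the features actually used in Project Debater.

```python
import math
import re

# Hypothetical claim-marker words, for illustration only.
CLAIM_MARKERS = ("should", "must", "because", "therefore")


def features(sentence, topic_terms):
    """Map a sentence to a small vector of hand-designed binary features."""
    text = sentence.lower()
    tokens = re.findall(r"[a-z']+", text)
    return [
        1.0,                                                      # bias term
        1.0 if any(t in tokens for t in topic_terms) else 0.0,    # mentions the topic
        1.0 if any(m in tokens for m in CLAIM_MARKERS) else 0.0,  # claim marker present
        1.0 if "that" in tokens else 0.0,                         # "that" often introduces a claim
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Probability that the feature vector x describes a claim."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights with plain stochastic gradient ascent."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = target - predict(weights, x)
            for i, xi in enumerate(x):
                weights[i] += lr * error * xi
    return weights


# Toy labeled data (invented): 1 = claim, 0 = not a claim.
TOPIC_TERMS = ("preschool",)
TRAIN = [
    ("Some argue that preschool should be subsidized because it helps children.", 1),
    ("Experts believe that subsidized preschool reduces inequality.", 1),
    ("The preschool opens at nine.", 0),
    ("The bus should arrive soon.", 0),
]
X = [features(s, TOPIC_TERMS) for s, _ in TRAIN]
y = [label for _, label in TRAIN]
WEIGHTS = train(X, y)
```

A real system would use far more features and far more data; the point here is only the shape of the approach: explicit features in, a single probability out.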

  15. VLC (Very Large Corpus) Stage • Corpus wide argument mining - a working solution, Ein-Dor et al, AAAI 2020.

  16. VLC (Very Large Corpus) Stage: Main Distinction from Prev. Work • Sentence Level (SL) strategy, vs. Document Level used before • SCALE: ~240 train/dev topics & ~100 test topics; ~200,000 sentences carefully annotated for train/dev → Retrospective Labeling Paradigm; ~10,000,000,000 Sentences - Reporting results over a massive corpus • Closer than ever to a working solution

  17. VLC (Very Large Corpus) Stage: System Architecture • Diagram components: Controversial Topic, Queries, Massive Corpus (~10B Sentences), Retrieved Sentences, Ranking Model (BERT), High-precision Evidence Set, Iteratively Collected Labeled-Data • Retrieve 12,000 sentences per evidence type per topic • Support flexible patterns to retrieve argumentative sentences: topic terms, evidence connectors, sentiment lexicon, NER • Starting with LR from Rinott et al, EMNLP 2015 • Retrospective Labeling Paradigm • An infrastructure that supports quick dynamic experiments and monitors annotation quality
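
The query step can be pictured as a cheap filter that runs before the ranking model. The sketch below is a minimal illustration, assuming a query requires a topic term, an evidence connector, and a sentiment word in the same sentence; the lexicons and function names are hypothetical, and the NER component mentioned on the slide is omitted. Only the cap of 12,000 sentences comes from the slide.

```python
import re

# Hypothetical lexicons for illustration; the slides do not list the real ones.
EVIDENCE_CONNECTORS = ("according to", "studies", "study", "survey", "research")
SENTIMENT_TERMS = ("improve", "improves", "harm", "harms", "benefit", "benefits", "risk")


def contains_any(text, phrases):
    """True if any phrase occurs in text as a whole word or whole phrase."""
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def matches_query(sentence, topic_terms):
    """Sentence-level query: topic term AND evidence connector AND sentiment word.

    topic_terms are expected in lowercase.
    """
    text = sentence.lower()
    return (
        contains_any(text, topic_terms)
        and contains_any(text, EVIDENCE_CONNECTORS)
        and contains_any(text, SENTIMENT_TERMS)
    )


def retrieve_candidates(corpus, topic_terms, limit=12000):
    """Collect up to `limit` matching sentences to hand to the ranking model."""
    hits = []
    for sentence in corpus:
        if matches_query(sentence, topic_terms):
            hits.append(sentence)
            if len(hits) == limit:
                break
    return hits


corpus = [
    "According to a survey, subsidized preschool can improve school readiness.",
    "The preschool is located on Main Street.",
    "Research shows traffic is a risk in cities.",
]
candidates = retrieve_candidates(corpus, ["preschool"])
```

Here only the first sentence satisfies all three conditions. The filter is deliberately loose: its job is recall over a massive corpus, while precision is left to the ranking model.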

  18. VLC (Very Large Corpus) Stage: How to Collect Labeled Data? • Collecting labeled data poses a two-fold challenge: low prior of positive examples; annotation through crowd requires expertise – simple guidelines, careful monitoring… (BTW - Kappa of ~0.4 is actually quite good) • Developing corpus-wide argument mining poses another challenge: imagine ~2,000 new predictions every week… → associated infrastructure is a must • Retrospective labeling of top predictions is a natural and effective solution
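
For readers unfamiliar with the agreement measure, the sketch below computes Cohen's kappa for two annotators. This is an assumption for illustration: the slide does not say which kappa variant was used, and with more than two annotators a measure such as Fleiss' kappa would be needed. The labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random with their own label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example: 10 sentences, 1 = evidence, 0 = not evidence.
annotator_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
annotator_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(annotator_a, annotator_b)
```

In this example the annotators agree on 8 of 10 items, yet kappa is only about 0.52, because much of that agreement is expected by chance when most items are negative. This is one reason a moderate kappa can still indicate usable annotations when positives are rare.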

  19. Why Is Evidence Detection Hard? Motion: Blood donation should be mandatory • According to studies, blood donors are 88 percent less likely to suffer a heart attack… CONFIRMED • Statistics … show that students are the main blood donors contributing about 80 percent… REJECTED

  20. VLC (Very Large Corpus) Stage: Results • Results by various BERT Models over a massive corpus of ~10B sentences • Baselines: BlendNet, Attention based bidirectional LSTM model [Shnarch et al. (2018)] • High precision • Wide coverage with diverse evidences (highly similar sentences are removed) • [Plot: Macro-Average Precision vs. Number of candidates]
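
The plotted metric can be read as precision over the top-k ranked candidates of each topic, averaged with equal weight per topic. That reading is an assumption based on the axis labels; the sketch below shows the computation with invented labels.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked candidates that annotators confirmed (label 1)."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def macro_average_precision(labels_by_topic, k):
    """Average precision@k over topics, giving every topic equal weight."""
    scores = [precision_at_k(labels, k) for labels in labels_by_topic.values()]
    return sum(scores) / len(scores)


# Invented example: per topic, labels of candidates sorted by model score (best first).
labels_by_topic = {
    "topic A": [1, 1, 0, 1],
    "topic B": [1, 0, 0, 0],
}
at_2 = macro_average_precision(labels_by_topic, k=2)  # (1.0 + 0.5) / 2 = 0.75
at_4 = macro_average_precision(labels_by_topic, k=4)  # (0.75 + 0.25) / 2 = 0.5
```

Macro-averaging keeps a topic with many retrieved candidates from dominating the score, which matters when topics differ widely in how much evidence the corpus contains.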

  21. Outline • System overview • Argument retrieval in Project Debater • Some retrospective thoughts

  22. Challenges to Consider while developing a Live Debate System • Data-driven speech writing and delivery: Digest massive corpora; Write a well-structured speech; Deliver with clarity and purpose • Listening comprehension: Identify key claims hidden in long continuous spoken language; Compare to personal assistants - simple short commands • Modeling human dilemmas: Modeling the world of human controversy and dilemmas; Enabling the system to suggest principled arguments • Argument retrieval is the first step to build such a system

  23. The Problem: Many things need to succeed simultaneously and many things can go wrong…

  24. Many things can go wrong… / Examples • Getting the stance wrong means you support your opponent… • Drifting from the topic – from Physical Education to Sex Education and back… • The system is only as good as its corpus → … global warming will lead malaria virus to creep into hilly areas…

  25. Progress over time / Improvement in Precision of Detecting Claims • Document level IR; Corpus: Wikipedia; Exhaustive labelling of positive instances; LR + Rich features • Sentence level IR; Very large corpus; Retrospective labelling; Flexible query with weak supervision; Attention-based Bi-LSTM • Sentence level IR; Very Large Corpus: 400 million articles (50 times larger than Wikipedia); Retrospective labelling; BERT fine-tuning

  26. Beyond Project Debater • Computational argumentation is emerging as an interesting research area • “Argument mining” is the new keyword in the list of topics in recent *ACL conferences • Computational Argumentation: Argument retrieval, Argument Unit Identification, Argument Relation Prediction, Argument(ation) Quality, Argument Generation, … • Related areas: Dialogue System; Social NLP (Sentiment, Persuasiveness, Social bias, Framing, Fact verification, …); Natural Language Generation; Text Summarization; Discourse and Pragmatics (Argumentative discourse, Argumentative coherence, …)
