  1. An Interpretable Joint Graphical Model for Fact-Checking from Crowds
     An T. Nguyen (1), Aditya Kharosekar (1), Matthew Lease (1), Byron C. Wallace (2)
     (1) University of Texas at Austin  (2) Northeastern University

  8. Problems
     Given a claim: "Facebook Shut Down an AI Experiment Because Chatbots Developed Their Own Language."
     ...and relevant article headlines: "No, Facebook Did Not Panic and Shut Down an AI Program That Was Getting Dangerously Smart." (source: gizmodo.com)
     Predict headline stance: For / Against / Observing
     Predict claim veracity: False / True / Unknown
     Our motivation:
     ◮ Make sense of general claims, incl. scientific, historical, ...
     ◮ Not just “fake news”.

  15. Solutions
     Previous work:
     ◮ Predict stance from text features (Ferreira & Vlachos 2016).
     ◮ Predict veracity from stance + source features (Popat et al. 2017).
     We propose:
     ◮ Crowdsource stance labels: a hybrid human-AI approach, available in near real-time.
     ◮ A joint graphical model of stance, veracity, and annotators: captures interactions between variables and is interpretable.

  19. Model
     [Plate diagram: crowd labels L and annotator competence A over c labelers; stance S, veracity V, and text features T over n claims; source reputation R over m sources; parameters B, U, W.]
     1. Predict stance S from text features T.
     2. Predict veracity V from stance S and source reputation R.
     3. Model stance labels L from the true stance S and annotator competence A.
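Reading off the dependencies listed above (stance from text; veracity from stance and reputation; crowd labels from stance and competence), the joint distribution for a single claim with articles j = 1..n and labelers c = 1..C plausibly factorizes along these lines. This is a sketch of the structure only; the exact parametrization via B, U, and W is defined in the paper.

```latex
P(V, S, L \mid T, R, A) =
  \underbrace{P(V \mid S, R)}_{\text{veracity}}
  \prod_{j=1}^{n} \underbrace{P(S_j \mid T_j)}_{\text{stance}}
  \prod_{j=1}^{n} \prod_{c=1}^{C} \underbrace{P(L_{jc} \mid S_j, A_c)}_{\text{crowd labels}}
```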

  23. Inference & Learning
     Inference:
     ◮ Gibbs sampling: accurate but slow.
     ◮ Variational inference: fast but biased.
     Learning: Expectation-Maximization. Details in the paper.
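The EM recipe above can be illustrated on the crowd-label component alone. The sketch below is not the paper's model: it is a simplified "one-coin" annotator model (a single competence number per labeler; all names are hypothetical), alternating a posterior update over true stances (E-step) with a competence re-estimate (M-step):

```python
import numpy as np

def em_crowd_labels(labels, n_classes=3, n_iter=50):
    """Toy one-coin annotator EM (a simplified stand-in, not the full model).

    labels: int array of (item, annotator, label) triples.
    Returns per-item posteriors over true labels and per-annotator
    competence (probability of labeling correctly).
    """
    labels = np.asarray(labels)
    n_items = labels[:, 0].max() + 1
    n_annot = labels[:, 1].max() + 1
    comp = np.full(n_annot, 0.7)  # initial guess at annotator competence
    for _ in range(n_iter):
        # E-step: posterior over each item's true label given competences.
        log_post = np.zeros((n_items, n_classes))
        for i, a, l in labels:
            for k in range(n_classes):
                p = comp[a] if l == k else (1.0 - comp[a]) / (n_classes - 1)
                log_post[i, k] += np.log(p)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step: competence = expected fraction of an annotator's labels
        # that match the (soft) true label.
        correct = np.zeros(n_annot)
        total = np.zeros(n_annot)
        for i, a, l in labels:
            correct[a] += post[i, l]
            total[a] += 1
        comp = np.clip(correct / total, 1e-3, 1 - 1e-3)
    return post, comp
```

In the full joint model the exact E-step is intractable, which is where the Gibbs-sampling and variational alternatives above come in.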

  27. Evaluation
     Data: Emergent (Ferreira and Vlachos 2016)
     ◮ 300 claims.
     ◮ 2,595 articles with stance labels.
     ◮ We collected crowd stance labels via Mechanical Turk.
     Baseline: separate models for stance, veracity & crowd labels.
     Metric: Brier score, which measures both accuracy and probability calibration.
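The Brier score mentioned above has a compact definition: the mean squared distance between the predicted probability vector and the one-hot true label, so a model is rewarded for being both accurate and well calibrated (lower is better). A minimal multi-class version, with a hypothetical helper name:

```python
import numpy as np

def brier_score(probs, labels):
    """Multi-class Brier score: mean over items of the squared
    distance between predicted probabilities and the one-hot truth."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))
```

A perfect, confident prediction scores 0; a uniform guess over three classes scores 2/3.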

  28. Results
     [Results chart; figure not recoverable from the transcript.]

  33. User study
     Interface: users enter claims, see predictions.
     A/B testing:
     ◮ A: see only veracity predictions.
     ◮ B: also see explanations (reputation, stances).

  34. User study: results
     [Results chart; figure not recoverable from the transcript.]

  41. Conclusion
     Takeaway:
     ◮ Stance/veracity predictions are hard.
     ◮ We contribute: crowdsourcing + joint modeling.
     Paper: experiments on Snopes.
     Demo: fcweb.pythonanywhere.com
     We share code + data.
     Acknowledgments: crowd annotators, reviewers, NSF.
     Questions?
