
Questioning Question Answering Answers, Sameer Singh, University of California, Irvine



  1. Questioning Question Answering Answers. Sameer Singh, University of California, Irvine

  3. QA Systems are really good! Is there a moustache in the picture? > Yes What is the moustache made of? > Banana Visual7W [Zhu et al 2016]

  4. QA Systems are really good! “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” How long is the Rhine? > 1230km Is it doing the right thing? BiDAF [Seo et al 2017]

  5. We know that they are not. [Jia and Liang, EMNLP 2017; Mudrakarta et al, ACL 2018]

  6. Overstability! What is the moustache made of? > Banana What are the eyes made of? > Bananas What is? > Banana What? > Banana
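
The overstability check on this slide can be sketched as a small probe that truncates the question word by word and records whether the answer stays the same. This is only an illustrative sketch: `overstability_probe` and the constant-answer `toy_model` are hypothetical names, and any black-box `predict(question) -> answer` callable could be substituted.

```python
def overstability_probe(question, predict):
    """Truncate the question word by word and report whether the
    black-box prediction stays the same as for the full question."""
    words = question.rstrip("?").split()
    original = predict(question)
    results = []
    # Drop one more trailing word at each step, down to a single word.
    for n in range(len(words) - 1, 0, -1):
        shortened = " ".join(words[:n]) + "?"
        results.append((shortened, predict(shortened) == original))
    return results


# Hypothetical stand-in for an overstable model: it ignores the question.
def toy_model(question):
    return "Banana"


if __name__ == "__main__":
    for shortened, unchanged in overstability_probe(
        "What is the moustache made of?", toy_model
    ):
        print(shortened, "-> unchanged" if unchanged else "-> changed")
```

A model whose answer survives every truncation, including the one-word question, is probably not reading most of the question.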

  7. Oversensitivity to phrasing! What type of road sign is shown? > STOP. What type of road sign is shown? > Do not Enter.

  8. Oversensitivity to unimportant typos! “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” How long is the Rhine? > 1230km How long is the Rhine? > More than 1,050,000

  9. QA Systems are brittle
     • Our goals are to provide automated tools
       • For both oversensitivity and overstability
       • Can we figure these out automatically, with minimal human time?
       • Can we try to rationalize/explain predictions? Analyze the mistakes?
     • Hopefully, they help design choices for:
       • Data gathering and annotations
       • Model structure and training
       • Evaluation pipelines

  10. Being Model-Agnostic … Ignore the internal structure: treat the model as a black box f(x). Not restricted to differentiable modules. Practically easy: not tied to PyTorch, TensorFlow, etc. Study models that you don’t have access to!

  11. Talk Overview. Explaining predictions: LIME (Linear Explanations), Anchors (Sufficient Conditions). Detecting oversensitivity: SEARs.

  15. Being Local … A “global” explanation is too complicated. Describe the locally accurate behavior, using interpretable representations.

  16. Talk Overview. Explaining predictions: LIME (Linear Explanations) [KDD 2016], Anchors (Sufficient Conditions). Detecting oversensitivity: SEARs.

  17. LIME: Sparse, Linear Explanations Identify the important words, and present their relative importance
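
A minimal sketch of the idea on this slide, under simplifying assumptions: perturb the input by dropping words, query the black-box model, weight each perturbed sample by its closeness to the original, and fit a weighted linear model whose largest coefficients are reported as the important words. This is not the reference LIME implementation. The function name, the kernel, the ridge term, and the toy model are all assumptions made for illustration.

```python
import numpy as np


def lime_text_explanation(words, predict_proba, top_k=3, num_samples=500,
                          kernel_width=0.75, ridge=1e-3, seed=0):
    """Return the top_k (word, weight) pairs of a local linear surrogate.

    predict_proba is any black-box callable mapping a string to a score.
    """
    rng = np.random.default_rng(seed)
    d = len(words)
    # Binary masks: 1 keeps a word, 0 drops it. Row 0 is the original text.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0, :] = 1
    texts = [" ".join(w for w, keep in zip(words, m) if keep) for m in masks]
    y = np.array([predict_proba(t) for t in texts], dtype=float)
    # Locality: samples that drop fewer words get a larger weight
    # (assumed exponential kernel on the fraction of words removed).
    distance = 1.0 - masks.mean(axis=1)
    sample_weight = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted ridge regression in closed form; last column is the intercept
    # (for simplicity the small ridge term is applied to it as well).
    X = np.hstack([masks, np.ones((num_samples, 1))])
    A = X.T @ (sample_weight[:, None] * X) + ridge * np.eye(d + 1)
    b = X.T @ (sample_weight * y)
    coef = np.linalg.solve(A, b)[:d]
    # Sparsity here is just "keep the k largest weights by magnitude".
    ranked = sorted(zip(words, coef), key=lambda pair: -abs(pair[1]))
    return ranked[:top_k]


# Hypothetical black box: the score depends only on one word being present.
def toy_model(text):
    return 1.0 if "moustache" in text.split() else 0.0


if __name__ == "__main__":
    question = "Is there a moustache in the picture".split()
    for word, weight in lime_text_explanation(question, toy_model):
        print(f"{word}: {weight:+.3f}")
```

Because only `predict_proba` is called, the same sketch applies to any model regardless of framework, which is the model-agnostic property the talk emphasizes.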

  18. What an explanation looks like. From: Keith Richards. Subject: Christianity is the answer. NNTP-Posting-Host: x.x.com. I think Christianity is the one true religion. If you’d like to know more, send me a note. Why did this happen?

  19. LIME on VisualQA. What type of road sign is shown? > STOP. [Figure: LIME highlights on the question]

  20. LIME on SQuAD. “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” What is the longest river in Central and Western Europe? > the Danube [Figure: LIME highlights on the question] BiDAF [Seo et al 2017]

  21. LIME on SQuAD. “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” What is the second longest river in Central and Western Europe? > the Danube [Figure: LIME highlights on the question] BiDAF [Seo et al 2017]

  22. Limitations of LIME. Gain understanding of local behavior, but very little generalization… “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” Which is the second longest river in Germany’s part of Europe? Unless they run it, the users have little idea of what the answer will be.

  23. Talk Overview. Explaining predictions: LIME (Linear Explanations), Anchors (Sufficient Conditions) [AAAI 2018]. Detecting oversensitivity: SEARs.

  24. Anchors: Sufficient Conditions Identify the conditions under which the classifier has the same prediction
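
One way to make "conditions under which the classifier has the same prediction" concrete is to estimate an anchor's precision: hold the anchor words fixed, resample everything else, and count how often the black-box prediction is unchanged. The sketch below is a simplification and does not include the search procedure that finds anchors. `anchor_precision`, the replacement vocabulary, and the toy model are hypothetical names chosen for illustration.

```python
import random


def anchor_precision(words, anchor_positions, predict, vocabulary,
                     num_samples=200, seed=0):
    """Estimate how often the prediction is unchanged when the words at
    anchor_positions are kept and every other word is resampled."""
    rng = random.Random(seed)
    original = predict(" ".join(words))
    unchanged = 0
    for _ in range(num_samples):
        perturbed = [
            w if i in anchor_positions else rng.choice(vocabulary)
            for i, w in enumerate(words)
        ]
        if predict(" ".join(perturbed)) == original:
            unchanged += 1
    return unchanged / num_samples


# Hypothetical black box: answers "Danube" whenever both words appear.
def toy_model(text):
    tokens = text.split()
    return "Danube" if "longest" in tokens and "river" in tokens else "unknown"


if __name__ == "__main__":
    question = "Which is the longest river".split()
    vocab = ["what", "where", "the", "lake", "shortest", "second"]
    # Candidate anchor: the words "longest" and "river" (positions 3 and 4).
    print(anchor_precision(question, {3, 4}, toy_model, vocab))
    # No anchor at all: every word is resampled.
    print(anchor_precision(question, set(), toy_model, vocab))
```

A candidate whose estimated precision is high, like the 96.8% reported for the road sign example, is a sufficient condition in this sense: users can predict the model's output on new inputs that satisfy it.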

  25. Anchors on VisualQA. What type of road sign is shown? > STOP. Anchor: if the question starts with “What” (and is similarly structured), the prediction will be STOP 96.8% of the time.

  26. Anchors on Visual QA [Figure: example anchors]

  28. Anchors on SQuAD. “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” What is the longest river in Central and Western Europe? > the Danube [Figure: anchor words highlighted in the question, 96.5%]

  29. Anchors on SQuAD. “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” What is the second longest river in Central and Western Europe? > the Danube [Figure: anchor words highlighted in the question]

  30. User study on VisualQA. Show humans predictions + explanations. Ask them to predict what the model will do in new instances (only if confident). Three conditions: No explanations, LIME, Anchor. Example: after seeing “Which is the longest river?” > Danube (with the anchor “longest river” → Danube), users predict the answer to “Which is second longest river?”: Danube, Rhine, or “I don’t know”.

  31. Summary of VisualQA Results. [Figure: three bar charts comparing No Explanations, LIME, and Anchor: how often users predict, how often they are correct, and time per prediction] Users are more precise and quicker with anchors.

  32. Anchors: Tools for Overstability What about Over-sensitivity?

  33. Talk Overview. Explaining predictions: LIME (Linear Explanations), Anchors (Sufficient Conditions). Detecting oversensitivity: SEARs [ACL 2018].

  34. Oversensitivity: Adversarial Examples. Find closest example with different prediction

  35. Oversensitivity in images: “panda” (57.7% confidence) vs. “gibbon” (99.3% confidence). Adversaries are indistinguishable to humans … But unlikely in the real world (except for attacks)

  36. What about text? What type of road sign is shown? > STOP. What type of road sign is sho wn? Perceptible by humans, unlikely in real world

  37. What about text? What type of road sign is shown? > STOP. What type of road sign is shown? A single word changes too much!

  38. Semantics matter. What type of road sign is shown? > STOP. What type of road sign is shown? > Do not Enter. Bug, and likely in the real world

  39. Semantics matter. “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” How long is the Rhine? > 1230km How long is the Rhine? > More than 1,050,000 Not all changes are the same: meaning should be the same

  40. Characterize via Rules. Find rule that generates many adversaries

  41. Characterizing via Rules. What type of road sign is shown? > STOP. Which type of road sign is shown? > Do not Enter. Rule: What NOUN → Which NOUN (flips 3.9% of examples)

  42. Characterizing via Rules. “The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)” How long is the Rhine? > 1230km How long is the Rhine?? > More than 1,050,000 Rule: ? → ?? (flips 3% of examples)

  43. SEARs: Adversarial Rules. Rules are global and actionable, more interesting than individual adversaries
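
Evaluating a rule like the ones on the preceding slides can be sketched as: apply the rewrite to every question it matches, then count how many predictions change. The sketch below approximates "What NOUN → Which NOUN" with a plain regular expression and does no part-of-speech check, so it is looser than the rule as stated. `rule_flip_counts` and the toy model are hypothetical, and the slides do not say which denominator their percentages use, so raw counts are returned.

```python
import re


def rule_flip_counts(pattern, replacement, questions, predict):
    """Apply a rewrite rule to each question and count prediction flips."""
    applicable = 0
    flipped = 0
    for question in questions:
        rewritten = re.sub(pattern, replacement, question)
        if rewritten == question:
            continue  # the rule does not apply to this question
        applicable += 1
        if predict(rewritten) != predict(question):
            flipped += 1
    return {"total": len(questions), "applicable": applicable,
            "flipped": flipped}


# Hypothetical brittle model: one specific rephrasing changes its answer.
def toy_model(question):
    return "Do not Enter" if question.startswith("Which type") else "STOP"


if __name__ == "__main__":
    data = [
        "What type of road sign is shown?",
        "What color is the tray?",
        "How long is the Rhine?",
    ]
    # Regex stand-in for "What NOUN -> Which NOUN" (no POS tagging).
    print(rule_flip_counts(r"^What (\w+)", r"Which \1", data, toy_model))
```

Because a rule is scored over a whole dataset rather than one input, a high flip count points to a systematic bug, which is what makes rules more actionable than individual adversaries.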

  44. SEARs Examples: VisualQA. Visual7W-Telling [Zhu et al 2016]

  45. SEARs Examples: SQuAD. BiDAF [Seo et al 2017]

  46. VQA User Study: Detecting adversaries. [Figure: bar charts comparing Human, SEA, and Human + SEA] SEAs find adversaries as often as humans! SEAs + Humans better than humans!

  47. VQA User study: Can experts find bugs? [Figure: bar charts on Visual QA of time (minutes) for finding rules vs. evaluating SEARs, and % predictions flipped by experts vs. SEARs] SEARs are much better than expert-produced rules. Evaluating is much easier than finding them. Closing the loop brings it down to 1.4%
