
Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets



  1. Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets Nelson F. Liu Roy Schwartz Noah A. Smith NAACL 2019—June 4, 2019 UWNLP

  2. Two Key Ingredients of NLP Systems: Training Dataset + Model Architecture → NLP System 😋

  3. Why Might NLP Systems Fail? Training Dataset + Model Architecture → NLP System 🤓

  4. Dataset Weaknesses: Training Dataset + Model Architecture → NLP System 🤓

  5. Model Weaknesses: Training Dataset + Model Architecture → NLP System 🤓

  6. Challenge Datasets Break Models

  7. Challenge Datasets Break Models

  8. Challenge Datasets Break Models

  9. NLP Systems Are Brittle

  10. NLP Systems Are Brittle

  11. Inoculation by Fine-Tuning

  12. Inoculation by Fine-Tuning

  13. Inoculation by Fine-Tuning

  14. Inoculation

  15. Inoculate Models to Better Understand Why They Fail

  16. Three Clear Outcomes of Interest: Challenge Evaluation → Inoculation → Outcome?

  17. (1) Dataset Weakness: Challenge Evaluation → Inoculation → Outcome: Dataset Weakness

  18. (2) Model Weakness: Challenge Evaluation → Inoculation → Outcome: Model Weakness

  19. (3) Predictive Artifacts / Other: Challenge Evaluation → Inoculation → Outcome: Predictive Artifacts / Other

  20. Three Clear Outcomes of Interest: Challenge Evaluation → Inoculation → Outcome: Dataset Weakness, Model Weakness, or Predictive Artifacts / Other

  21. Case Studies • Inoculating natural language inference (NLI) models • Inoculating SQuAD reading comprehension models

  22. Natural Language Inference (NLI) [Dagan et al., 2004]. Example from MultiNLI [Williams et al., 2018]: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders." Labels: Entailment / Neutral / Contradiction

  23. Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."

  24. Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders." Word Overlap Challenge Dataset: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true."

  25. Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]. Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders." Word Overlap Challenge Dataset: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true." Spelling Errors Challenge Dataset: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your ordets."

  26. Small Perturbations Break NLI Models. Word Overlap: -12.6% (absolute). Spelling Errors: -4.8% (absolute).

  27. Inoculating NLI models: Word Overlap, Spelling Errors

  28. Inoculating NLI models: Word Overlap, Spelling Errors. Model Weakness, Dataset Weakness

  29. More Examples in the Paper! Dataset Weakness, Model Weakness, Predictive Artifacts / Other

  30. SQuAD [Rajpurkar et al., 2016]. Example from Robin Jia. Question: "The number of new Huguenot colonists declined after what year?" Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined…" Correct Answer: "1700"

  31. Adversarial SQuAD [Jia and Liang, 2017]. Example from Robin Jia. Question: "The number of new Huguenot colonists declined after what year?" Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675." Correct Answer: "1700"

  32. Small Perturbations Break SQuAD Models: -24.5 F1 (absolute)

  33. Inoculating SQuAD models

  34. Inoculating SQuAD models: Predictive Artifacts / Other

  35. Takeaways • Inoculation by Fine-Tuning helps us understand why our models fail. • While all challenge datasets break our models, they stress them in different ways: Dataset Weakness, Model Weakness, Predictive Artifacts / Other. • Potentially many situations where inoculation can help clarify model results when transferring to other datasets.

  36. Thank You! Questions?

  37. Limitations of Inoculation by Fine-Tuning • Requires a somewhat balanced label distribution in the challenge dataset; otherwise, the fine-tuned model will always predict the majority label. • This method is not a silver bullet! • It is a first step toward disentangling failures of {original / challenge} datasets and models.

  39. Inoculating Multiple SQuAD Reading Comprehension Models

  40. Inoculating Multiple NLI Models Against Word Overlap Adversary

  41. Inoculating Multiple NLI Models Against Spelling Errors
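The slides name three outcomes of inoculation but do not spell out how to tell them apart. The sketch below is one possible reading, not the authors' procedure: it compares scores on the original and challenge test sets before and after fine-tuning on a small amount of challenge data. The decision rules, the `Scores` and `classify_outcome` names, the `tol` threshold, and the numbers in the usage lines are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Scores (e.g. accuracy or F1) on the two test sets."""
    original: float   # score on the original test set
    challenge: float  # score on the challenge test set


def classify_outcome(before: Scores, after: Scores, tol: float = 1.0) -> str:
    """Map before/after inoculation scores to one of the three outcome names.

    `tol` is a hypothetical tolerance in the same units as the scores;
    the rules below are a simplification assumed for this sketch.
    """
    # Original-set score dropped after fine-tuning on challenge data.
    if before.original - after.original > tol:
        return "predictive artifacts / other"
    # Challenge score caught up with the original-set score: gap closed.
    if after.original - after.challenge <= tol:
        return "dataset weakness"
    # Gap persists despite exposure to challenge data.
    return "model weakness"


# Hypothetical numbers, chosen only to exercise each branch.
closed_gap = classify_outcome(Scores(80.0, 65.0), Scores(80.0, 79.5))
persistent_gap = classify_outcome(Scores(80.0, 75.0), Scores(79.8, 75.5))
original_hurt = classify_outcome(Scores(80.0, 55.0), Scores(72.0, 70.0))
```

In practice the comparison would be made across several fine-tuning set sizes rather than at a single point, and the limitation about label balance on slide 37 still applies: a fine-tuned model that only predicts the majority label can look like an improvement on a skewed challenge set.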
