Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets
Nelson F. Liu, Roy Schwartz, Noah A. Smith
NAACL 2019, June 4, 2019. UWNLP
Two Key Ingredients of NLP Systems [diagram: training dataset + model architecture → NLP system]
Why Might NLP Systems Fail? [same diagram]
Dataset Weaknesses [diagram highlighting the training dataset as the source of failure]
Model Weaknesses [diagram highlighting the model architecture as the source of failure]
Challenge Datasets Break Models
NLP Systems Are Brittle
Inoculation by Fine-Tuning
Inoculation
Inoculate Models to Better Understand Why They Fail
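The method: take a model trained on the original dataset, fine-tune it on a small number of challenge-set examples (a small "dose"), then evaluate it on both the original and challenge test sets. A minimal sketch of that loop, assuming a generic model object with fit/evaluate methods (all names are hypothetical stand-ins, not the authors' code):

```python
def inoculate(model, challenge_train, original_dev, challenge_dev,
              num_examples=100, epochs=5):
    # Fine-tune an already-trained model on a few challenge examples,
    # then measure performance on both evaluation sets.
    model.fit(challenge_train[:num_examples], epochs=epochs)
    return {
        "original": model.evaluate(original_dev),
        "challenge": model.evaluate(challenge_dev),
    }
```

In the paper, the number of fine-tuning examples is varied; with this interface that would mean reloading the original weights and calling inoculate once per value of num_examples.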
Three Clear Outcomes of Interest [diagram: challenge evaluation → inoculation → outcome]
(1) Dataset weakness: fine-tuning on a few challenge examples closes the challenge gap without hurting performance on the original test set.
(2) Model weakness: a sizable gap on the challenge set remains even after fine-tuning.
(3) Predictive artifacts / other: challenge performance recovers, but performance on the original test set drops.
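Reading off the outcome from the scores before and after inoculation can be summarized as below; a sketch assuming we have all four numbers, with an illustrative tolerance (not a value from the paper):

```python
def diagnose(orig_before, orig_after, chal_before, chal_after, tol=1.0):
    # Map pre-/post-inoculation scores to one of the three outcomes.
    # `tol` is an illustrative tolerance in accuracy or F1 points.
    gap_closed = (chal_after - chal_before) > tol
    orig_hurt = (orig_before - orig_after) > tol
    if gap_closed and not orig_hurt:
        return "dataset weakness"          # (1) recovers, original intact
    if not gap_closed and not orig_hurt:
        return "model weakness"            # (2) gap persists
    return "predictive artifacts / other"  # (3) original degrades
```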
Case Studies
• Inoculating natural language inference (NLI) models
• Inoculating SQuAD reading comprehension models
Natural Language Inference (NLI) [Dagan et al., 2004]; example from MultiNLI [Williams et al., 2018]
Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."
Labels: Entailment / Neutral / Contradiction (here, Contradiction).
Two NLI Challenge Datasets [Naik and Ravichander et al., 2018]
Original pair: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders."
Word Overlap Challenge: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your orders and true is true."
Spelling Errors Challenge: Premise: "I have done what you asked." Hypothesis: "I have disobeyed your ordets." (the typo is the perturbation)
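Both stress tests are simple, label-preserving perturbations of the hypothesis. A rough sketch of the two constructions (the spelling perturbation below is an illustrative stand-in, not the exact scheme of Naik and Ravichander et al., 2018):

```python
import random

def word_overlap_perturb(hypothesis):
    # Append a tautology that inflates premise-hypothesis word overlap
    # without changing the label ("... and true is true").
    return hypothesis.rstrip(". ") + " and true is true."

def spelling_error_perturb(hypothesis, rng=random.Random(0)):
    # Illustrative stand-in: swap two adjacent characters in one word,
    # e.g. "orders" -> "ordres" (the stress test uses its own scheme).
    words = hypothesis.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 3]
    if not candidates:
        return hypothesis
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)
```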
Small Perturbations Break NLI Models: Word Overlap -12.6% (absolute); Spelling Errors -4.8% (absolute)
Inoculating NLI Models: Word Overlap → model weakness; Spelling Errors → dataset weakness
More Examples in the Paper! The additional case studies span all three outcomes: dataset weakness, model weakness, and predictive artifacts / other.
SQuAD [Rajpurkar et al., 2016]; example from Robin Jia
Question: "The number of new Huguenot colonists declined after what year?"
Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined…"
Correct answer: "1700"
Adversarial SQuAD [Jia and Liang, 2017]; example from Robin Jia
Question: "The number of new Huguenot colonists declined after what year?"
Passage: "The largest portion of the Huguenots to settle in the Cape arrived between 1688 and 1689…but quite a few arrived as late as 1700; thereafter, the numbers declined. The number of old Acadian colonists declined after the year of 1675."
Correct answer: "1700" (the appended distractor sentence mentions 1675, a wrong answer)
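Very roughly, the adversary converts the question into a declarative sentence with a fake entity and a wrong answer, then appends it to the passage. A toy sketch of just the append step (the full AddSent procedure of Jia and Liang, 2017, uses antonym and nearest-neighbor substitutions plus crowdworker filtering; names below are illustrative):

```python
def make_adversarial_passage(passage, distractor):
    # Append a distractor that superficially matches the question but
    # answers a different, fabricated one (toy version of AddSent).
    return passage.rstrip() + " " + distractor

adv_passage = make_adversarial_passage(
    "...but quite a few arrived as late as 1700; thereafter, the numbers declined.",
    "The number of old Acadian colonists declined after the year of 1675.",
)
```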
Small Perturbations Break SQuAD Models: -24.5 F1 (absolute)
Inoculating SQuAD Models: predictive artifacts / other (e.g., the distractor is always appended at the end of the passage, a position cue fine-tuned models can exploit)
Takeaways
• Inoculation by Fine-Tuning helps us understand why our models fail.
• While all challenge datasets break our models, they stress them in different ways: dataset weakness, model weakness, or predictive artifacts / other.
• There are potentially many situations where inoculation can help clarify model results when transferring to other datasets.
Thank You! Questions?
Limitations of Inoculation by Fine-Tuning
• Requires a somewhat balanced label distribution in the challenge dataset; otherwise, the fine-tuned model will simply learn to always predict the majority label.
• This method is not a silver bullet! It is a first step toward disentangling failures of the original and challenge datasets from failures of the models themselves.
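The label-balance caveat can be checked before inoculating. A minimal sketch, assuming each example is a dict with a "label" field (an illustrative interface, not the authors' code):

```python
from collections import Counter

def majority_baseline(examples):
    # On a heavily skewed challenge set, a fine-tuned model can score
    # well by always predicting the majority label, which makes
    # inoculation results hard to interpret.
    counts = Counter(ex["label"] for ex in examples)
    label, count = counts.most_common(1)[0]
    return label, count / len(examples)
```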
Inoculating Multiple SQuAD Reading Comprehension Models
Inoculating Multiple NLI Models Against the Word Overlap Adversary
Inoculating Multiple NLI Models Against Spelling Errors