Babble Labble: Training Classifiers with Natural Language Explanations
Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Chris Ré
ACL · 17 July 2018 · Melbourne, Australia
Machine learning can help you!*
*If you have enough training data.
Traditional Labeling
Example: Tom Brady was spotted in New York City on Monday with his wife Gisele Bündchen amid rumors of Brady's alleged role in Deflategate.
Label: Is person 1 married to person 2? Y / N
Time spent: almost all on reading/understanding, only a moment on clicking Y/N.
Higher Bandwidth Supervision
Example: Tom Brady was spotted in New York City on Monday with his wife Gisele Bündchen amid rumors of Brady's alleged role in Deflategate.
Label: Is person 1 married to person 2? Y / N
Explanation: Why do you think so? Because the words "his wife" are right before person 2.
Explanations Encode Labeling Heuristics
Explanation: Why did you label True? Because the words "his wife" are right before person 2.
Label | Example
True  | "Barack batted back tears as he thanked his wife, Michelle, for all her help."
True  | "Both Bill and his wife Hillary smiled and waved at reporters as they rode by."
True  | "George attended the event with his wife, Laura, and their two daughters."
Big Idea: Instead of collecting labels, collect labeling heuristics (in the form of explanations) that can be used to label more examples for free.
Babble Labble: a framework for generating large training sets from natural language explanations and unlabeled data.
Result: classifiers trained with Babble Labble and explanations achieved the same F1 score as ones trained with traditional labels while requiring 5–100x fewer user inputs.
Babble Labble Framework (diagram)
Inputs: explanations (e1: True, because…; e2: True, because…; e3: False, because…) and unlabeled examples (x1, x2, x3) feed into the SEMANTIC PARSER.
Explanations Encode Heuristics
Explanation: Why did you label True? Because the words "his wife" are right before person 2.
Labeling Function:
def f(x):
    return 1 if ("his wife" in left(x.person2, dist==1)) else 0  # 0 = abstain
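A minimal runnable sketch of this heuristic in plain Python; the Example type, the left() helper, and the person2_start field are stand-ins for Babble Labble's actual primitives, not its real API:

from typing import List, NamedTuple

class Example(NamedTuple):
    tokens: List[str]
    person2_start: int  # index of the first token of person 2's mention

def left(x: Example, dist: int) -> str:
    # Return the `dist` tokens immediately to the left of person 2.
    lo = max(0, x.person2_start - dist)
    return " ".join(x.tokens[lo:x.person2_start])

def lf_his_wife(x: Example) -> int:
    # Vote True (1) if "his wife" sits right before person 2; otherwise abstain (0).
    return 1 if left(x, dist=2) == "his wife" else 0

x1 = Example("Tom Brady was spotted with his wife Gisele Bündchen .".split(), 7)
print(lf_his_wife(x1))  # -> 1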
Semantic Parser
Example input: <START> label false because X and Y are the same person <STOP>
(Figure: the parse tree for this explanation, rooted at LF, built from the rules below.)
Lexical rules: <START> → START · label → LABEL · false → FALSE (unparseable tokens are ignored)
Unary rules: FALSE → BOOL · TRUE → BOOL · INT → NUM
Compositional rules: START LABEL BOOL BECAUSE CONDITION STOP → LF · ARGLIST ISEQUAL → CONDITION · ARG AND ARG → ARGLIST
Labeling Function Template:
def LF(x):
    return [label] if [condition] else [abstain]
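Filled in for the parse above ("label false because X and Y are the same person"), the template would yield something like the following; the −1/0/+1 vote convention (with 0 = abstain) and the person1/person2 accessors are assumptions for illustration:

def LF(x):
    # [label] = False (-1); [condition] = the two arguments are the same person
    return -1 if x.person1 == x.person2 else 0  # 0 = abstain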
Predicates: Logic & Comparison · String Matching · NER Tags · Sets & Mapping · Relative Positioning
Semantic Parser I/O
Typical semantic parser: 1 explanation → 1 parse. Goal: produce the correct parse.
Our semantic parser: 1 explanation → many parses. Goal: produce useful parses (whether they're correct or not).
Babble Labble Framework (diagram)
Explanations and unlabeled examples → SEMANTIC PARSER → FILTER BANK (semantic + pragmatic filters).
Filter Bank
Explanations → Semantic Parser → many candidate labeling functions → Filter Bank (semantic filter + pragmatic filter) → (filtered) labeling functions.
Semantic Filter Example
x1: Tom Brady was spotted in New York City on Monday with his wife Gisele Bündchen amid rumors of Brady's alleged role in Deflategate.
Explanation: True, because the words "his wife" are right before person 2.
Candidate labeling functions:
LF_1a ("right before" = "immediately before"):
def LF_1a(x):
    return 1 if "his wife" in left(x.person2, dist==1) else 0
LF_1a(x1) == 1 ("his wife" is, in fact, 1 word to the left of person 2)
LF_1b ("right before" = "to the right of"):
def LF_1b(x):
    return 1 if "his wife" in right(x.person2) else 0
LF_1b(x1) == 0 ("his wife" is not to the right of person 2)
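A sketch of the semantic filter under the reading this slide suggests: a candidate parse survives only if it reproduces the user's own label on the example the explanation was written about (all names here are hypothetical):

def semantic_filter(candidate_lfs, x, user_label):
    # Keep only the LFs whose vote on x matches the label the user gave.
    return [lf for lf in candidate_lfs if lf(x) == user_label]

# Stubs standing in for the two candidate parses from this slide:
LF_1a = lambda x: 1  # "his wife" one word left of person 2 -> fires on x1
LF_1b = lambda x: 0  # "his wife" to the right of person 2 -> abstains on x1
surviving = semantic_filter([LF_1a, LF_1b], x="x1", user_label=1)  # keeps LF_1a only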
Pragmatic Filters
How does the LF label our unlabeled data (x1 … xN)?
Discard LFs with a uniform labeling signature (LF1 casts the same vote on every example) and LFs with a duplicate labeling signature (LF2B votes identically to LF2A).
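A sketch of both pragmatic checks, assuming an LF's "labeling signature" is simply its tuple of votes on the unlabeled set:

def pragmatic_filter(lfs, unlabeled_xs):
    kept, seen_signatures = [], set()
    for lf in lfs:
        signature = tuple(lf(x) for x in unlabeled_xs)
        if len(set(signature)) == 1:      # uniform: same vote on every example
            continue
        if signature in seen_signatures:  # duplicate of an already-kept LF
            continue
        seen_signatures.add(signature)
        kept.append(lf)
    return kept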
Babble Labble Framework (diagram)
Explanations and unlabeled examples → SEMANTIC PARSER → FILTER BANK → LABEL AGGREGATOR → probabilistic labels ỹ.
Label Aggregator
Input: a matrix of votes (Positive / Negative / Abstain) from LF1 … LF5 over examples x1 … x9.
Output: ỹ, an estimated label for each example (unknown before aggregation).
Training data: (x1, ỹ1), (x2, ỹ2), (x3, ỹ3), (x4, ỹ4), …
Label Aggregator
Challenges visible in the vote matrix: high correlation (not independent?), high conflict (low accuracy?), low coverage but high accuracy, and ties to break.
Data Programming (Ratner et al., NIPS 2016), as implemented in Snorkel: snorkel.stanford.edu
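Data programming learns LF accuracies and dependencies with a generative model; as a crude stand-in for intuition only, here is an accuracy-weighted vote over the LF matrix (the accuracies are given here, whereas Snorkel estimates them without labels):

import numpy as np

def aggregate(votes, accuracies):
    # votes: (num_examples, num_lfs) matrix with entries in {-1, 0, +1}
    # accuracies: per-LF accuracy estimates in (0.5, 1.0)
    w = np.log(accuracies / (1 - accuracies))  # log-odds weight per LF
    score = votes @ w                          # abstentions (0) contribute nothing
    return 1 / (1 + np.exp(-score))            # P(y = +1 | votes)

votes = np.array([[ 1,  1,  0, -1, 0],
                  [ 0, -1, -1,  0, 0]])
print(aggregate(votes, np.array([0.8, 0.7, 0.9, 0.6, 0.75])))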
Babble Labble Framework (diagram)
Explanations and unlabeled examples → SEMANTIC PARSER → FILTER BANK → LABEL AGGREGATOR → ỹ → DISC. MODEL (trained on pairs (x, ỹ)).
Discriminative Classifier
Input: labeling functions + unlabeled data → Label Aggregator → Discriminative Model.
Labeling functions generate noisy, conflicting votes; the label aggregator resolves conflicts, re-weights, and combines them; the discriminative model generalizes beyond the labeling functions.
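A sketch of this last stage with scikit-learn, assuming we already have the probabilistic labels ỹ from the aggregator; confidence-weighted hard labels are one simple way to consume ỹ, not necessarily the paper's exact training objective:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["he thanked his wife , Michelle",          # unlabeled sentences
         "Alice beat Bob in the annual tournament"]
y_prob = np.array([0.92, 0.31])                     # hypothetical ỹ values

X = CountVectorizer(ngram_range=(1, 3)).fit_transform(texts)
y_hard = (y_prob > 0.5).astype(int)
confidence = np.abs(y_prob - 0.5) * 2               # down-weight uncertain labels

clf = LogisticRegression().fit(X, y_hard, sample_weight=confidence)
# clf can now pick up predictive n-grams that no labeling function ever named.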
Generalization
Task: identify disease-causing chemicals.
Keywords mentioned in LFs: "treats", "causes", "induces", "prevents", …
Highly relevant features learned by the discriminative model: "could produce a", "support diagnosis of", …
Training a discriminative model that can take advantage of additional useful features not specified in labeling functions boosted performance by 4.3 F1 points on average (10%).
Datasets
Name    | # Unlabeled | Sample Explanation
Spouse  | 22k         | Label true because "and" occurs between X and Y and "marriage" occurs one word after person 1.
Disease | 6.7k        | Label true because the disease is immediately after the chemical and "induc" or "assoc" is in the chemical name.
Protein | 5.5k        | Label true because "Ser" or "Tyr" are within 10 characters of the protein.
Results
Task    | F1 Score | # Explanations (Babble Labble) | # Labels (Traditional) | Reduction in User Inputs
Spouse  | 50.1     | 30                             | 3000+                  | 100x
Disease | 42.3     | 30                             | 1000+                  | 33x
Protein | 47.3     | 30                             | 150+                   | 5x
Classifiers trained with Babble Labble and explanations achieved the same F1 score as ones trained with traditional labels while requiring 5–100x fewer user inputs.
Utilizing Unlabeled Data
With labeling functions, training set size (and often performance) scales with the amount of unlabeled data we have.
Filter Bank Effectiveness
Task    | Babble Labble (No Filters) | Babble Labble | % Incorrect Parses Filtered
Spouse  | 15.7                       | 50.1          | 97.8%
Disease | 39.8                       | 42.3          | 96.0%
Protein | 38.2                       | 47.3          | 97.0%
AVERAGE | 31.2                       | 46.6          | 96.9%
The filters removed almost 97% of incorrect parses; without them, F1 drops by 15 points on average.
Perfect Parsers Need Not Apply
Task    | Babble Labble | Babble Labble (Perfect Parses)
Spouse  | 50.1          | 49.8
Disease | 42.3          | 43.2
Protein | 47.3          | 46.8
AVERAGE | 46.6          | 46.8
Using perfect parses yielded negligible improvements. In this framework, for these tasks, a naïve semantic parser is good enough!
Limitations
"Alice beat Bob in the annual office pool tournament."
Q: Do you think person 1 is the spouse of person 2? Why?
A: No, because it sounds like they're just co-workers.
The user prefers high-level explanations (e.g., "it says so"); the parser prefers low-level ones (e.g., keywords, word distance, capitalization) and is left asking "What's a co-worker?"
Users' reasons for labeling are sometimes high-level concepts that are hard to parse.
Related Work: Data Programming
Common theme: use weak supervision (e.g., labeling functions) to generate training sets.
• Snorkel (Ratner et al., VLDB 2018): flagship platform for dataset creation from weak supervision
• Structure Learning (Bach et al., ICML 2017): learning dependencies between correlated labeling functions
• Reef (Varma and Ré, in submission): auto-generating labeling functions from a small labeled set
snorkel.stanford.edu
Related Work: Explanations as Features (Srivastava et al., 2017)
What if we use our explanations to make features instead of training labels? Srivastava et al. feed the parses to the discriminative model as features for the classifier; Babble Labble instead passes them through the label aggregator to produce labels ỹ for the training set. Using the parses to label training data instead of using them as features boosts F1 by 4.5 points.