Snorkel + Data Programming: Beyond Hand-Labeled Training Data. Alex Ratner, Stanford University InfoLab. AAAI DeLBP Workshop, 2/3/2018
MOTIVATION: In practice, training data is often: • The bottleneck • The practical injection point for domain knowledge
KEY IDEA: We can use higher-level, weaker supervision to program ML models
Outline • The Labeling Bottleneck: The new pain point of ML • Data Programming + Snorkel: A framework for weaker, more efficient supervision • In practice: Empirical results & user studies
My Amazing Collaborators: Jason Fries, Henry Ehrenberg, Chris De Sa, Stephen Bach, Bryan He, Paroma Varma, Sen Wu, Braden Hancock, Chris Ré, and many more at Stanford & beyond…
The ML Pipeline, Pre-Deep Learning: Collection → Labeling → Feature Engineering → Training (outputting True / False predictions). Feature engineering used to be the bottleneck…
The ML Pipeline Today: Collection → Labeling → Representation Learning → Training (outputting True / False predictions). Labeling is the new pain point, and the new injection point.
Training Data: Challenges & Opportunities
• Expensive & slow: especially when domain expertise is needed
• Static: real-world problems change; hand-labeled training data does not
• An opportunity to inject domain knowledge: modern ML models are often too complex for hand-tuned structures, priors, etc.
How do we get, and use, training data more effectively?
Data Programming + Snorkel: A Framework + System for Creating Training Data with Weak Supervision (NIPS 2016; SIGMOD Demo 2017)
KEY IDEA: Get users to provide higher-level (but noisier) supervision, then model & de-noise it (using unlabeled data) to train high-quality models
Data Programming Pipeline in Snorkel (example application: Knowledge Base Creation, KBC)
1. Input: a domain expert writes labeling functions over unlabeled data to generate noisy labels:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):
        m = re.search(r'.*cause.*', x.between)
        return 1 if m else 0

    def lf3(x):
        m = re.search(r'.*not cause.*', x.between)
        return 1 if m else 0

2. We model the labeling functions' behavior with a generative, noise-aware model to de-noise them.
3. Output: the resulting probabilistic training labels are used to train a discriminative model.
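As a runnable sketch of step 1, the three labeling functions above can be applied to toy candidates to produce a noisy label matrix. The Candidate class and KB set here are minimal stand-ins for Snorkel's actual objects, not its API:

    import re

    # Toy stand-ins for Snorkel's candidate objects and an external knowledge base.
    KB = {("magnesium", "myasthenia gravis")}

    class Candidate:
        def __init__(self, chemical_id, disease_id, between):
            self.chemical_id = chemical_id   # normalized chemical mention
            self.disease_id = disease_id     # normalized disease mention
            self.between = between           # text between the two mentions

    def lf1(x):  # distant supervision: vote 1 if the pair is in the KB
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):  # heuristic: "cause" between the mentions suggests TRUE
        m = re.search(r".*cause.*", x.between)
        return 1 if m else 0

    def lf3(x):  # heuristic: a negated causal pattern
        m = re.search(r".*not cause.*", x.between)
        return 1 if m else 0

    candidates = [
        Candidate("magnesium", "myasthenia gravis", "was found to cause"),
        Candidate("magnesium", "paralysis", "did not cause"),
    ]

    # Noisy label matrix: one row per candidate, one column per labeling function.
    L = [[lf(x) for lf in (lf1, lf2, lf3)] for x in candidates]
    print(L)  # -> [[1, 1, 0], [0, 1, 1]]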
Surprising Point: No hand-labeled training data!
Step 1: Writing Labeling Functions. A Unifying Framework for Expressing Weak Supervision.
[Diagram: the three labeling functions (lf1, lf2, lf3) from the pipeline slide, written by the domain expert, feeding into the generative model.]
Example: Chemical-Disease Relation Extraction from Text
• We define candidate entity mentions: Chemicals and Diseases
• Goal: Populate a relational schema with relation mentions (a sketch of such a schema follows below)

KNOWLEDGE BASE (KB):
ID | Chemical  | Disease           | Prob.
00 | magnesium | myasthenia gravis | 0.84
01 | magnesium | quadriplegic      | 0.73
02 | magnesium | paralysis         | 0.96
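As an illustrative sketch of what populating this relational schema means, the table above can be represented as rows of a relation. The CausesRelation name and fields are hypothetical, not Snorkel's API:

    from dataclasses import dataclass

    @dataclass
    class CausesRelation:
        """One extracted relation mention with its probabilistic label."""
        id: str
        chemical: str
        disease: str
        prob: float

    kb_rows = [
        CausesRelation("00", "magnesium", "myasthenia gravis", 0.84),
        CausesRelation("01", "magnesium", "quadriplegic", 0.73),
        CausesRelation("02", "magnesium", "paralysis", 0.96),
    ]

    # Keep only confident extractions when populating the downstream KB.
    confident = [r for r in kb_rows if r.prob > 0.8]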
Labeling Functions
• Traditional "distant supervision" rule relying on an external KB:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

"Chemical A is found to cause disease B under certain conditions…" → Label = TRUE, since the existing KB contains (A,B).
This is likely to be true… but
Labeling Functions
• The same "distant supervision" rule:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

"Chemical A was found on the floor near a person with disease B…" → Label = TRUE, since the existing KB contains (A,B)…
…can be false! We will learn the accuracy of each LF (next).
Writing Labeling Functions in Snorkel
• Labeling functions take in Candidate objects, drawn from a context hierarchy: Document → Sentence → Span → Entity
• Three levels of abstraction for writing LFs in Snorkel (a sketch of the template and generator levels follows below):
  • Python code:

        def lf1(x):
            cid = (x.chemical_id, x.disease_id)
            return 1 if cid in KB else 0

  • LF templates:

        lf1 = LF_DS(KB)

  • LF generators, e.g. over a knowledge base (KB) with hierarchy:

        for lf in LF_DS_hier(KB, cut_level=2):
            yield lf

Key Point: Supervision as code
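A minimal sketch of the template and generator levels. LF_DS and LF_DS_hier appear on the slide, but their bodies and the kb_hier.subtrees interface below are assumptions for illustration, not Snorkel's actual implementation:

    def LF_DS(kb):
        """Template: build a distant-supervision LF closed over a knowledge base."""
        def lf(x):
            return 1 if (x.chemical_id, x.disease_id) in kb else 0
        return lf

    def LF_DS_hier(kb_hier, cut_level):
        """Generator: yield one distant-supervision LF per KB subtree.

        Assumes kb_hier.subtrees(cut_level) yields (node_name, set_of_pairs)
        for each subtree at the given depth of the KB's hierarchy.
        """
        for node, pairs in kb_hier.subtrees(cut_level):
            def lf(x, _pairs=pairs):  # bind this subtree's pairs at definition time
                return 1 if (x.chemical_id, x.disease_id) in _pairs else 0
            lf.__name__ = f"lf_ds_{node}"
            yield lf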
Supported by a Simple Jupyter Interface: snorkel.stanford.edu
Broader Perspective: A Template for Weak Supervision
A Unifying Method for Weak Supervision
• Distant supervision
• Crowdsourcing
• Weak classifiers
• Domain heuristics / rules
Each of these can be expressed as a labeling function with the single signature below, mapping a data point to a label or an abstention (see the sketch that follows):
λ : X → Y ∪ {∅}
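Concretely, every source above can be wrapped in one interface: a function from a data point to a label, or None for abstention (∅). A sketch; the 0.8/0.2 thresholds and the x.weak_model_score and x.crowd_votes fields are illustrative assumptions:

    from typing import Optional

    TRUE, FALSE = 1, -1  # binary label encoding used in this sketch

    def weak_classifier_lf(x) -> Optional[int]:
        """Wrap a weak classifier's score as a labeling function."""
        score = x.weak_model_score   # hypothetical precomputed model score
        if score > 0.8:
            return TRUE
        if score < 0.2:
            return FALSE
        return None                  # not confident enough: abstain (∅)

    def crowd_lf(x) -> Optional[int]:
        """Wrap crowd worker votes as a labeling function."""
        votes = x.crowd_votes        # hypothetical list of votes in {-1, +1}
        if not votes:
            return None              # no workers saw this item: abstain
        return TRUE if sum(votes) > 0 else FALSE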
Related Work in Weak Supervision
• Distant Supervision: Mintz et al. 2009; Alfonseca et al. 2012; Takamatsu et al. 2012; Roth & Klakow 2013; Augenstein et al. 2015; etc.
• Crowdsourcing: Dawid & Skene 1979; Karger et al. 2011; Dalvi et al. 2013; Ruvolo et al. 2013; Zhang et al. 2014; Berend & Kontorovich 2014; etc.
• Co-Training: Blum & Mitchell 1998
• Noisy Learning: Bootkrajang et al. 2012; Mnih & Hinton 2012; Xiao et al. 2015; etc.
• Indirect Supervision: Clarke et al. 2010; Guu et al. 2017; etc.
• Feature and Class-distribution Supervision: Zaidan & Eisner 2008; Druck et al. 2009; Liang et al. 2009; Mann & McCallum 2010; etc.
• Boosting & Ensembling: Schapire & Freund; Platanios et al. 2016; etc.
• Constraint-Based Supervision: Bilenko et al. 2004; Koestinger et al. 2012; Stewart & Ermon 2017; etc.
Check out our full list @ snorkel.stanford.edu/blog/ws_blog_post.html; we love suggested additions or other feedback!
How to handle such a diversity of weak supervision sources?
Step 2: Modeling Weak Supervision
[Diagram: the labeling functions from Step 1 feeding their noisy labels into a generative model that estimates the latent true labels.]
Weak Supervision: Core Challenges
• A unified input format
• Modeling: the accuracies of sources, the correlations between sources, and the expertise of sources (NIPS 2016)
• Using the de-noised labels to train a wide range of models
Intuition: We use agreements / disagreements between sources to learn without ground truth
Basic Generative Labeling Model
Let Λ_{i,j} be labeling function j's output on data point i (∅ = abstained) and Y_i the latent true label. The model has factors per (data point, LF) pair:
• Labeling propensity: φ^Lab_{i,j}(Λ, Y) = 1{Λ_{i,j} ≠ ∅}, giving γ_j = p_θ(Λ_{i,j} ≠ ∅)
• Accuracy: φ^Acc_{i,j}(Λ, Y) = 1{Λ_{i,j} = Y_i}, giving β_j = p_θ(Λ_{i,j} = Y_i | Λ_{i,j} ≠ ∅)
• Correlations between labeling functions (ICML 2017)
The joint distribution is log-linear in these factors: p_θ(Λ, Y) ∝ exp(θ · φ(Λ, Y))
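A minimal numeric sketch of these factors and the resulting label posterior for one data point, assuming binary labels in {-1, +1} with 0 standing in for ∅. Snorkel learns θ over the full dataset, so this only illustrates the model's form:

    import numpy as np

    def log_potential(Lam_i, y, theta_lab, theta_acc):
        """Unnormalized log p_theta(Lambda_i, Y_i = y) for one data point.

        Lam_i: length-m vector of LF outputs in {-1, 0, +1} (0 = abstain)
        y: candidate true label in {-1, +1}
        theta_lab, theta_acc: one parameter per LF for each factor type
        """
        lab = (Lam_i != 0).astype(float)  # labeling-propensity indicator factors
        acc = (Lam_i == y).astype(float)  # accuracy indicator factors
        return theta_lab @ lab + theta_acc @ acc

    def posterior_true(Lam_i, theta_lab, theta_acc):
        """p_theta(Y_i = +1 | Lambda_i), enumerating the two label values."""
        lp = log_potential(Lam_i, +1, theta_lab, theta_acc)
        ln = log_potential(Lam_i, -1, theta_lab, theta_acc)
        return 1.0 / (1.0 + np.exp(ln - lp))

    # Example: three LFs vote (+1, +1, abstain); the second LF is trusted most.
    Lam = np.array([1, 1, 0])
    print(posterior_true(Lam, np.ones(3), np.array([0.5, 2.0, 1.0])))  # ~0.92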
Intuition: Learning from Disagreements
• Learn the generative model p_θ(Y, Λ) via maximum likelihood estimation, using only the unlabeled data
• Each LF has a hidden accuracy parameter
• Intuition: like a weighted majority vote; estimate labeling function accuracies from their overlaps and conflicts
• Similar to crowdsourcing, but with different scaling: a small number of LFs, each supplying a large number of labels
• Output: a set of noisy (probabilistic) training labels, Ỹ(x) = p_θ(Y | Λ = λ(x))
[Figure: bipartite graph of unlabeled objects x_1…x_5 and labeling functions λ_1…λ_3, annotated with an estimated P(y_i | λ) per object and an estimated accuracy per LF.]
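To make "estimate accuracies from overlaps and conflicts" concrete, here is a toy EM-style sketch: it alternates between an accuracy-weighted vote (a soft label per point) and re-estimating each LF's accuracy against that vote. It is a crude stand-in for the MLE procedure, not Snorkel's learner:

    import numpy as np

    def estimate_accuracies(L, n_iters=10):
        """Estimate per-LF accuracies from votes alone (no ground truth).

        L: (n_points, n_lfs) matrix of votes in {-1, 0, +1}, 0 = abstain.
        """
        n, m = L.shape
        acc = np.full(m, 0.7)                      # optimistic initialization
        for _ in range(n_iters):
            # E-step: accuracy-weighted vote -> soft label E[Y_i] in [-1, 1]
            w = np.log(acc / (1.0 - acc))          # log-odds weight per LF
            p_pos = 1.0 / (1.0 + np.exp(-(L * w).sum(axis=1)))
            y_soft = 2.0 * p_pos - 1.0
            # M-step: expected agreement with the soft label, over non-abstains
            agree = 0.5 * (1.0 + L * y_soft[:, None])  # P(vote_j == Y_i)
            voted = L != 0
            acc = np.array([
                agree[voted[:, j], j].mean() if voted[:, j].any() else 0.5
                for j in range(m)
            ])
            acc = np.clip(acc, 0.05, 0.95)         # keep the log-odds finite
        return acc

    # Example: 5 data points, 3 LFs, with overlaps and one conflict.
    L = np.array([[1, 1, 0], [1, 1, -1], [1, 0, 1], [-1, 1, 0], [1, 1, 1]])
    print(estimate_accuracies(L))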