Snorkel + Data Programming: Beyond Hand-Labeled Training Data. Alex Ratner, Stanford University InfoLab. AAAI DeLBP Workshop, 2/3/2018
MOTIVATION: In practice, training data is often: • The bottleneck • The practical injection point for domain knowledge
KEY IDEA: We can use higher-level, weaker supervision to program ML models
Outline • The Labeling Bottleneck: The new pain point of ML • Data Programming + Snorkel: A framework for weaker, more efficient supervision • In practice: Empirical results & user studies
My Amazing Collaborators: Jason Fries, Henry Ehrenberg, Chris De Sa, Stephen Bach, Bryan He, Paroma Varma, Sen Wu, Braden Hancock, Chris Ré, and many more at Stanford & beyond…
The ML Pipeline, Pre-Deep Learning: Collection → Labeling → Feature Engineering → Training (outputting True / False predictions). Feature engineering used to be the bottleneck…
The ML Pipeline Today: Collection → Labeling → Representation Learning → Training (outputting True / False predictions). Labeling is the new pain point, and the new injection point.
Training Data: Challenges & Opportunities
• Expensive & slow: especially when domain expertise is needed
• Static: real-world problems change; hand-labeled training data does not
• An opportunity to inject domain knowledge: modern ML models are often too complex for hand-tuned structures, priors, etc.
How do we get, and use, training data more effectively?
Data Programming + Snorkel: A Framework + System for Creating Training Data with Weak Supervision (NIPS 2016; SIGMOD Demo 2017)
KEY IDEA: Get users to provide higher-level (but noisier) supervision, then model & de-noise it (using unlabeled data) to train high-quality models
Data Programming Pipeline in Snorkel (example application: Knowledge Base Creation, KBC)
1. Input: a domain expert writes labeling functions over unlabeled data to generate noisy labels:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):
        m = re.search(r'.*cause.*', x.between)
        return 1 if m else 0

    def lf3(x):
        m = re.search(r'.*not cause.*', x.between)
        return 1 if m else 0

2. We model the labeling functions' behavior with a generative, noise-aware model to de-noise them.
3. Output: the resulting probabilistic training labels are used to train a discriminative model.
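As a runnable sketch of step 1, the three labeling functions above can be applied to toy candidates to produce a noisy label matrix. The Candidate class and KB set here are minimal stand-ins for Snorkel's actual objects, not its API:

    import re

    # Toy stand-ins for Snorkel's candidate objects and an external knowledge base.
    KB = {("magnesium", "myasthenia gravis")}

    class Candidate:
        def __init__(self, chemical_id, disease_id, between):
            self.chemical_id = chemical_id   # normalized chemical mention
            self.disease_id = disease_id     # normalized disease mention
            self.between = between           # text between the two mentions

    def lf1(x):  # distant supervision: vote 1 if the pair is in the KB
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):  # heuristic: "cause" between the mentions suggests TRUE
        m = re.search(r".*cause.*", x.between)
        return 1 if m else 0

    def lf3(x):  # heuristic: a negated causal pattern
        m = re.search(r".*not cause.*", x.between)
        return 1 if m else 0

    candidates = [
        Candidate("magnesium", "myasthenia gravis", "was found to cause"),
        Candidate("magnesium", "paralysis", "did not cause"),
    ]

    # Noisy label matrix: one row per candidate, one column per labeling function.
    L = [[lf(x) for lf in (lf1, lf2, lf3)] for x in candidates]
    print(L)  # -> [[1, 1, 0], [0, 1, 1]]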
Surprising Point: No hand-labeled training data!
Step 1: Writing Labeling Functions. A Unifying Framework for Expressing Weak Supervision.
[Diagram: the three labeling functions (lf1, lf2, lf3) from the pipeline slide, written by the domain expert, feeding into the generative model.]
Example: Chemical-Disease Relation Extraction from Text
• We define candidate entity mentions: Chemicals and Diseases
• Goal: Populate a relational schema with relation mentions (a sketch of such a schema follows below)

KNOWLEDGE BASE (KB):
ID | Chemical  | Disease           | Prob.
00 | magnesium | myasthenia gravis | 0.84
01 | magnesium | quadriplegic      | 0.73
02 | magnesium | paralysis         | 0.96
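As an illustrative sketch of what populating this relational schema means, the table above can be represented as rows of a relation. The CausesRelation name and fields are hypothetical, not Snorkel's API:

    from dataclasses import dataclass

    @dataclass
    class CausesRelation:
        """One extracted relation mention with its probabilistic label."""
        id: str
        chemical: str
        disease: str
        prob: float

    kb_rows = [
        CausesRelation("00", "magnesium", "myasthenia gravis", 0.84),
        CausesRelation("01", "magnesium", "quadriplegic", 0.73),
        CausesRelation("02", "magnesium", "paralysis", 0.96),
    ]

    # Keep only confident extractions when populating the downstream KB.
    confident = [r for r in kb_rows if r.prob > 0.8]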
Labeling Functions
• Traditional "distant supervision" rule relying on an external KB:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

"Chemical A is found to cause disease B under certain conditions…" → Label = TRUE, since the existing KB contains (A,B).
This is likely to be true… but
Labeling Functions
• The same "distant supervision" rule:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

"Chemical A was found on the floor near a person with disease B…" → Label = TRUE, since the existing KB contains (A,B)…
…can be false! We will learn the accuracy of each LF (next).
Writing Labeling Functions in Snorkel
• Labeling functions take in Candidate objects, drawn from a context hierarchy: Document → Sentence → Span → Entity
• Three levels of abstraction for writing LFs in Snorkel (a sketch of the template and generator levels follows below):
  • Python code:

        def lf1(x):
            cid = (x.chemical_id, x.disease_id)
            return 1 if cid in KB else 0

  • LF templates:

        lf1 = LF_DS(KB)

  • LF generators, e.g. over a knowledge base (KB) with hierarchy:

        for lf in LF_DS_hier(KB, cut_level=2):
            yield lf

Key Point: Supervision as code
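A minimal sketch of the template and generator levels. LF_DS and LF_DS_hier appear on the slide, but their bodies and the kb_hier.subtrees interface below are assumptions for illustration, not Snorkel's actual implementation:

    def LF_DS(kb):
        """Template: build a distant-supervision LF closed over a knowledge base."""
        def lf(x):
            return 1 if (x.chemical_id, x.disease_id) in kb else 0
        return lf

    def LF_DS_hier(kb_hier, cut_level):
        """Generator: yield one distant-supervision LF per KB subtree.

        Assumes kb_hier.subtrees(cut_level) yields (node_name, set_of_pairs)
        for each subtree at the given depth of the KB's hierarchy.
        """
        for node, pairs in kb_hier.subtrees(cut_level):
            def lf(x, _pairs=pairs):  # bind this subtree's pairs at definition time
                return 1 if (x.chemical_id, x.disease_id) in _pairs else 0
            lf.__name__ = f"lf_ds_{node}"
            yield lf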
Supported by a Simple Jupyter Interface: snorkel.stanford.edu
Broader Perspective: A Template for Weak Supervision
A Unifying Method for Weak Supervision
• Distant supervision
• Crowdsourcing
• Weak classifiers
• Domain heuristics / rules
Each of these can be expressed as a labeling function with the single signature below, mapping a data point to a label or an abstention (see the sketch that follows):
λ : X → Y ∪ {∅}
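Concretely, every source above can be wrapped in one interface: a function from a data point to a label, or None for abstention (∅). A sketch; the 0.8/0.2 thresholds and the x.weak_model_score and x.crowd_votes fields are illustrative assumptions:

    from typing import Optional

    TRUE, FALSE = 1, -1  # binary label encoding used in this sketch

    def weak_classifier_lf(x) -> Optional[int]:
        """Wrap a weak classifier's score as a labeling function."""
        score = x.weak_model_score   # hypothetical precomputed model score
        if score > 0.8:
            return TRUE
        if score < 0.2:
            return FALSE
        return None                  # not confident enough: abstain (∅)

    def crowd_lf(x) -> Optional[int]:
        """Wrap crowd worker votes as a labeling function."""
        votes = x.crowd_votes        # hypothetical list of votes in {-1, +1}
        if not votes:
            return None              # no workers saw this item: abstain
        return TRUE if sum(votes) > 0 else FALSE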
Related Work in Weak Supervision
• Distant Supervision: Mintz et al. 2009; Alfonseca et al. 2012; Takamatsu et al. 2012; Roth & Klakow 2013; Augenstein et al. 2015; etc.
• Crowdsourcing: Dawid & Skene 1979; Karger et al. 2011; Dalvi et al. 2013; Ruvolo et al. 2013; Zhang et al. 2014; Berend & Kontorovich 2014; etc.
• Co-Training: Blum & Mitchell 1998
• Noisy Learning: Bootkrajang et al. 2012; Mnih & Hinton 2012; Xiao et al. 2015; etc.
• Indirect Supervision: Clarke et al. 2010; Guu et al. 2017; etc.
• Feature and Class-distribution Supervision: Zaidan & Eisner 2008; Druck et al. 2009; Liang et al. 2009; Mann & McCallum 2010; etc.
• Boosting & Ensembling: Schapire & Freund; Platanios et al. 2016; etc.
• Constraint-Based Supervision: Bilenko et al. 2004; Koestinger et al. 2012; Stewart & Ermon 2017; etc.
Check out our full list @ snorkel.stanford.edu/blog/ws_blog_post.html; we love suggested additions or other feedback!
How to handle such a diversity of weak supervision sources?
Step 2: Modeling Weak Supervision
[Diagram: the labeling functions from Step 1 feeding their noisy labels into a generative model that estimates the latent true labels.]
Weak Supervision: Core Challenges
• A unified input format
• Modeling: the accuracies of sources, the correlations between sources, and the expertise of sources (NIPS 2016)
• Using the de-noised labels to train a wide range of models
Intuition: We use agreements / disagreements between sources to learn without ground truth
Basic Generative Labeling Model
Let Λ_{i,j} be labeling function j's output on data point i (∅ = abstained) and Y_i the latent true label. The model has factors per (data point, LF) pair:
• Labeling propensity: φ^Lab_{i,j}(Λ, Y) = 1{Λ_{i,j} ≠ ∅}, giving γ_j = p_θ(Λ_{i,j} ≠ ∅)
• Accuracy: φ^Acc_{i,j}(Λ, Y) = 1{Λ_{i,j} = Y_i}, giving β_j = p_θ(Λ_{i,j} = Y_i | Λ_{i,j} ≠ ∅)
• Correlations between labeling functions (ICML 2017)
The joint distribution is log-linear in these factors: p_θ(Λ, Y) ∝ exp(θ · φ(Λ, Y))
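A minimal numeric sketch of these factors and the resulting label posterior for one data point, assuming binary labels in {-1, +1} with 0 standing in for ∅. Snorkel learns θ over the full dataset, so this only illustrates the model's form:

    import numpy as np

    def log_potential(Lam_i, y, theta_lab, theta_acc):
        """Unnormalized log p_theta(Lambda_i, Y_i = y) for one data point.

        Lam_i: length-m vector of LF outputs in {-1, 0, +1} (0 = abstain)
        y: candidate true label in {-1, +1}
        theta_lab, theta_acc: one parameter per LF for each factor type
        """
        lab = (Lam_i != 0).astype(float)  # labeling-propensity indicator factors
        acc = (Lam_i == y).astype(float)  # accuracy indicator factors
        return theta_lab @ lab + theta_acc @ acc

    def posterior_true(Lam_i, theta_lab, theta_acc):
        """p_theta(Y_i = +1 | Lambda_i), enumerating the two label values."""
        lp = log_potential(Lam_i, +1, theta_lab, theta_acc)
        ln = log_potential(Lam_i, -1, theta_lab, theta_acc)
        return 1.0 / (1.0 + np.exp(ln - lp))

    # Example: three LFs vote (+1, +1, abstain); the second LF is trusted most.
    Lam = np.array([1, 1, 0])
    print(posterior_true(Lam, np.ones(3), np.array([0.5, 2.0, 1.0])))  # ~0.92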
Intuition: Learning from Disagreements
• Learn the generative model p_θ(Y, Λ) via maximum likelihood estimation, using only the unlabeled data
• Each LF has a hidden accuracy parameter
• Intuition: like a weighted majority vote; estimate labeling function accuracies from their overlaps and conflicts
• Similar to crowdsourcing, but with different scaling: a small number of LFs, each supplying a large number of labels
• Output: a set of noisy (probabilistic) training labels, Ỹ(x) = p_θ(Y | Λ = λ(x))
[Figure: bipartite graph of unlabeled objects x_1…x_5 and labeling functions λ_1…λ_3, annotated with an estimated P(y_i | λ) per object and an estimated accuracy per LF.]
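To make "estimate accuracies from overlaps and conflicts" concrete, here is a toy EM-style sketch: it alternates between an accuracy-weighted vote (a soft label per point) and re-estimating each LF's accuracy against that vote. It is a crude stand-in for the MLE procedure, not Snorkel's learner:

    import numpy as np

    def estimate_accuracies(L, n_iters=10):
        """Estimate per-LF accuracies from votes alone (no ground truth).

        L: (n_points, n_lfs) matrix of votes in {-1, 0, +1}, 0 = abstain.
        """
        n, m = L.shape
        acc = np.full(m, 0.7)                      # optimistic initialization
        for _ in range(n_iters):
            # E-step: accuracy-weighted vote -> soft label E[Y_i] in [-1, 1]
            w = np.log(acc / (1.0 - acc))          # log-odds weight per LF
            p_pos = 1.0 / (1.0 + np.exp(-(L * w).sum(axis=1)))
            y_soft = 2.0 * p_pos - 1.0
            # M-step: expected agreement with the soft label, over non-abstains
            agree = 0.5 * (1.0 + L * y_soft[:, None])  # P(vote_j == Y_i)
            voted = L != 0
            acc = np.array([
                agree[voted[:, j], j].mean() if voted[:, j].any() else 0.5
                for j in range(m)
            ])
            acc = np.clip(acc, 0.05, 0.95)         # keep the log-odds finite
        return acc

    # Example: 5 data points, 3 LFs, with overlaps and one conflict.
    L = np.array([[1, 1, 0], [1, 1, -1], [1, 0, 1], [-1, 1, 0], [1, 1, 1]])
    print(estimate_accuracies(L))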