Learning the Structure of Generative Models without Labeled Data
Stephen Bach, Bryan He, Alex Ratner, Chris Ré
Stanford University
This Talk
• We study structure learning for generative models in which a latent variable generates weak signals
• The challenge is distinguishing dependencies that hold directly between the weak signals from those induced by the latent class
This Talk
• We propose an ℓ1-regularized pseudolikelihood approach
• We develop a new analysis technique, since previous analyses of related approaches only apply to the fully supervised case
Roadmap
• Motivation: Denoising Weak Supervision with Generative Models
• Our Work: Learn their Structure without Ground Truth
• Results
  • Provable Recovery
  • Consistent Performance Improvements on Existing Systems
Motivation: Denoising Weak Supervision with Generative Models
Training Data Creation: $$$, Slow, Static
• Expensive & Slow:
  • Especially when domain expertise is needed
  • [Image: "Grad Student Labeler"]
• With deep learning replacing feature engineering, collecting training data is now often the biggest ML bottleneck
Snorkel
• Open-source system to build ML models with weak supervision
• Users write labeling functions, model their accuracies and correlations, and train models
snorkel.stanford.edu
Example: Chemical-Disease Relations
• We have entity mentions:
  • Chemicals
  • Diseases
• Goal: Populate a table with relation mentions

ID | Chemical  | Disease           | Prob.
00 | magnesium | Myasthenia gravis | 0.84
01 | magnesium | quadriplegic      | 0.73
02 | magnesium | paralysis         | 0.96
How can we train without hand-labeling examples?
Weak Supervision
Noisy, less expensive labels
Example types:
• Domain heuristics
• Crowdsourcing
• Distant supervision
• Weak classifiers
Generative Models for Weak Supervision
• Crowdsourcing [Dawid and Skene, 1979; Dalvi et al., WWW 2013]
• Hierarchical topic models for relation extraction [Alfonseca et al., ACL 2012; Roth and Klakow, EMNLP 2013]
• Generative models for denoising distant supervision [Takamatsu et al., ACL 2012]
• Generative models for arbitrary labeling functions [Ratner et al., NIPS 2016]
Labeling Functions – Domain Heuristics
"In our study, administering Chemical A caused Disease B under certain conditions…"

import re

def LF_1(x):
    # Heuristic: label the candidate positive if the sentence mentions "caused"
    m = re.match('.*caused.*', x.sentence)
    return True if m else None
Labeling Functions – Distant Supervision
"In our study, administering Chemical A caused Disease B under certain conditions…"

def LF_2(x):
    # Distant supervision: label positive if the pair appears in the knowledge base
    in_kb = (x.chemical, x.disease) in ctd
    return True if in_kb else None

Comparative Toxicogenomics Database – http://ctdbase.org
Weak Supervision Pipeline in Snorkel
• Input: labeling functions – users write functions to label training data
• Noise-aware generative model – we model the functions' behavior to denoise their labels
• Discriminative model – we use the estimated labels to train a model
• Output: trained model

def lf1(x):
    cid = (x.chemical_id, x.disease_id)
    return 1 if cid in KB else 0

def lf2(x):
    m = re.search(r'.*cause.*', x.between)
    return 1 if m else 0

def lf3(x):
    m = re.search(r'.*not cause.*', x.between)
    return 1 if m else 0

[Figure: pipeline from labeling functions L1–L3 through the generative model (latent label y) to the discriminative model (features x, hidden units h)]
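To make the data flow concrete, here is a schematic sketch of the pipeline in plain Python. The names apply_labeling_functions, gen_model, and disc_model are illustrative placeholders, not the exact Snorkel API: labeling functions are applied to candidates to form a label matrix, the generative model is fit to that matrix to denoise it, and the discriminative model is trained on the resulting probabilistic labels.

import numpy as np

def apply_labeling_functions(lfs, candidates):
    # Label matrix: one row per candidate, one column per labeling function
    return np.array([[lf(x) for lf in lfs] for x in candidates])

# Illustrative usage (gen_model / disc_model stand in for the two models above):
# L_train = apply_labeling_functions([lf1, lf2, lf3], train_candidates)
# gen_model.fit(L_train)                      # denoise: estimate LF accuracies/correlations
# y_prob = gen_model.predict_proba(L_train)   # probabilistic training labels
# disc_model.fit(train_features, y_prob)      # train the end discriminative model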
Denoising Weak Supervision
[Figure: the latent true label generates the outputs of LF 1, LF 2, and LF 3; accuracy factors (Acc) model each labeling function's accuracy]
• We maximize the marginal likelihood of the noisy labels
• Intuitively, this compares the labeling functions' agreements and disagreements
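A minimal sketch of this marginal likelihood, assuming the simplest independent model: a uniform prior over the latent true label y ∈ {−1, +1}, one accuracy parameter per labeling function, and uninformative abstentions (0). This illustrates the idea, not the paper's exact parameterization.

import numpy as np

def marginal_log_likelihood(L, acc):
    # L: (m, n) matrix of labeling-function votes in {-1, 0, +1}
    # acc: length-n vector of accuracies in (0, 1)
    total = 0.0
    for votes in L:
        log_p = {}
        for y in (-1, +1):                   # marginalize out the latent true label
            lp = np.log(0.5)                 # uniform prior over y
            for v, a in zip(votes, acc):
                if v == y:
                    lp += np.log(a)          # vote agrees with y
                elif v == -y:
                    lp += np.log(1.0 - a)    # vote disagrees with y
                # v == 0: abstention contributes nothing
            log_p[y] = lp
        total += np.logaddexp(log_p[-1], log_p[+1])
    return total

Accuracy parameters that explain the observed agreements and disagreements well receive high likelihood, even though the true labels are never observed.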
Dependent Labeling Functions
• Correlated heuristics
  • E.g., looking for keywords in different sized windows of text (see the sketch below)
• Correlated inputs
  • E.g., looking for keywords in raw tokens or lemmas
• Correlated knowledge sources
  • E.g., distant supervision from overlapping knowledge bases
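For example, a hypothetical pair of correlated heuristics: both look for a causal keyword near the candidate, but in windows of different sizes, so the second fires whenever the first does. The accessor words_within is illustrative, not a real Snorkel attribute.

def lf_cause_narrow(x):
    # Keyword within a 5-token window around the candidate
    return 1 if 'cause' in x.words_within(5) else 0

def lf_cause_wide(x):
    # Same keyword, 20-token window: strongly correlated with lf_cause_narrow
    return 1 if 'cause' in x.words_within(20) else 0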
Structure Learning
Structure Learning
[Figure: factor graph with the latent true label as the target variable, accuracy factors (Acc) connecting it to LF 1, LF 2, and LF 3, and possible correlation dependencies (Cor?) between the labeling functions; legend: conditioning variable, dependency, possible dependency]
Structure Learning for Factor Graphs
Challenges
• Gradient requires approximation
• Possible dependencies grow quadratically or worse
Prior Work
• Ravikumar et al. (Annals of Statistics, 2010) proposed using ℓ1-regularized pseudolikelihood for supervised Ising models
Structure Learning for Generative Models
• We maximize the ℓ1-regularized marginal pseudolikelihood (see the sketch below)
• With one target variable and one latent variable, the gradient can be computed exactly and efficiently
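As a sketch (notation assumed from the description above, not copied from the paper): for each labeling function λ_j, we treat it as the target, condition on the remaining labeling functions λ_{\j}, marginalize out the latent label y, and keep the dependencies whose parameters are estimated as nonzero:

\hat{\theta} \in \arg\min_{\theta} \; -\frac{1}{m} \sum_{i=1}^{m} \log \sum_{y \in \{-1,+1\}} p_{\theta}\!\left( \lambda_j^{(i)}, \, y \,\middle|\, \lambda_{\setminus j}^{(i)} \right) \; + \; \epsilon \, \lVert \theta \rVert_{1}

Because the inner sum ranges over only the two values of y and the single target variable λ_j, the objective and its gradient can be evaluated in closed form, with no sampling.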
Structure Learning for Generative Models
[Figure sequence: for each labeling function in turn, we treat it as the target variable, condition on the other labeling functions and the latent true label, and determine which possible correlation dependencies (Cor?) are active (Cor)]
Structure Learning for Generative Models
• Without ground truth, the problem becomes harder
• The latent variable makes the marginal likelihood nonconvex
Analysis
Analysis
• Strategy
  • Focus on the case in which most labeling functions are non-adversarial
  • Show that the true model is contained in a region in which the objective is locally strongly convex
• Assumptions
  • A feasible set of parameters contains the true model
  • Over the feasible set, conditioning on a labeling function provides more information than marginalizing it out
Theorem: Guaranteed Recovery
For pairwise dependencies, such as correlations,

    m ≥ Ω((n log n) / δ)

samples are sufficient to recover the true dependency structure over n labeling functions with probability at least 1 − δ.
Empirical Results
Empirical Sample Complexity
• Better in practice than the theoretical bound
• Same as observed in the supervised setting
Speedup: 100× (vs. maximum likelihood estimation)
Improvement to End Models

Application       | Ind. F1 | Struct. F1 | F1 Diff | # LF | # Dep.
Disease Tagging   | 66.3    | 68.9       | +2.6    | 233  | 315
Chemical-Disease  | 54.6    | 55.9       | +1.3    | 33   | 21
Device-Polarity   | 88.1    | 88.7       | +0.6    | 12   | 32
Conclusion
• Generative models can help us get around the training data bottleneck, but we need to learn their structure
• Maximum pseudolikelihood gives
  • provable recovery
  • a 100× speedup
  • end-model improvement

snorkel.stanford.edu
Thank you!