FlyingSquid: Speeding Up Weak Supervision with Triplet Methods
Dan Fu*, Mayee Chen*, Fred Sala, Sarah Hooper, Kayvon Fatahalian, Chris Ré
Paper: Daniel Y. Fu*, Mayee F. Chen*, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, and Christopher Ré. Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods. ICML 2020.
* Denotes equal contribution.
The Training Data Bottleneck in ML
Collecting training data can be slow and expensive.
Weak Supervision: A Response
Noisy sources of supervision include user-defined functions, crowd workers, and external knowledge bases, e.g.:
def L_1(comment): return SPAM if "http" in comment
def L_2(comment): return NOT_SPAM if "love" in comment
How do we best use multiple noisy sources of supervision? A runnable version of these labeling functions is sketched below.
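A minimal sketch of the two labeling functions on this slide, written as plain Python. The SPAM / NOT_SPAM / ABSTAIN encoding and the else-abstain branch are assumptions added for illustration; any consistent {+1, -1, 0} scheme works.

```python
SPAM, NOT_SPAM, ABSTAIN = 1, -1, 0

def L_1(comment):
    # Vote spam if the comment contains a link, otherwise abstain.
    return SPAM if "http" in comment else ABSTAIN

def L_2(comment):
    # Vote not-spam if the comment contains "love", otherwise abstain.
    return NOT_SPAM if "love" in comment else ABSTAIN

print(L_1("check out http://cheap-deals.example"))  # 1  (SPAM)
print(L_2("love this video!"))                       # -1 (NOT_SPAM)
```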
Data Programming: Unifying Weak Supervision [1]
[Figure: pipeline from unlabeled inputs X_1, X_2, X_3, through labeling functions λ_1, ..., λ_9 and a latent variable label model with accuracy parameters μ(λ, Y), to probabilistic training labels (Y_1: 0.95, Y_2: 0.87, Y_3: 0.09) used to train an end model.]
1. Users write labeling functions (e.g., def S_1: label +1 if Bernie on screen; def S_2: label +1 if background blue; def S_3: return CROWD_WORKER_VOTE).
2. Model labeling function behavior to de-noise them.
3. Use the probabilistic labels to train a downstream model.
[1] Ratner et al. Snorkel: Rapid Training Data Creation with Weak Supervision. VLDB 2018.
Modeling Labeling Functions Is Critical, but Can Be Slow…
[Same pipeline figure as the previous slide; step 2, fitting the latent variable label model, requires expensive SGD iterations.]
FlyingSquid: Reduce the Turnaround Time
▪ Background: labeling functions and graphical models
▪ Closed-form solution for model parameters, no SGD
▪ Theoretical bounds and guarantees
▪ Run orders of magnitude faster, without losing accuracy; weakly-supervised online learning
Background
Problem Setup
Unlabeled data $\{X_i\}_{i=1}^n$ → $m$ labeling functions $S_1: \mathcal{X} \rightarrow \lambda_1 \in \{\pm 1, 0\}, \ldots, S_m: \mathcal{X} \rightarrow \lambda_m \in \{\pm 1, 0\}$ → probabilistic labels $\{\hat{Y}_i\}_{i=1}^n$ → downstream end model $f_w: \mathcal{X} \rightarrow \mathcal{Y}$.
We want to learn the joint distribution $P(\lambda, Y)$ without observing $Y$!
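A sketch of this setup, with toy placeholder labeling functions and data: apply the $m$ labeling functions to $n$ unlabeled points to build an $n \times m$ vote matrix with entries in {-1, 0, +1}. The true labels $Y$ are never observed; the label model must be estimated from these votes alone.

```python
import numpy as np

# Toy labeling functions and unlabeled points (illustrative placeholders).
lfs = [lambda x: 1 if "http" in x else 0,
       lambda x: -1 if "love" in x else 0]
unlabeled = ["buy now http://spam.example", "love this talk", "great results"]

# Vote matrix: one row per data point, one column per labeling function.
votes = np.array([[lf(x) for lf in lfs] for x in unlabeled])
print(votes)        # shape (n, m) = (3, 2), entries in {-1, 0, +1}
```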
Model Labeling Functions with Latent Graphical Models
[Figure: left, a single hidden variable Y (the true label) with observed labeling function outputs λ_1, ..., λ_5; right, a sequential model with hidden Y_1, Y_2, Y_3 linked by temporal dependencies, each with observed labeling function outputs among λ_1, ..., λ_9.]
Technical problem: learning the parameters of these graphical models.
Main challenge: recovering the accuracies μ of the labeling functions.
[2] Varma et al. Learning Dependency Structures for Weak Supervision Models. ICML 2019.
Parameter Recovery
Existing Iterative Approaches Can Be Slow
SGD over a loss function: Ratner et al. 2016, Ratner et al. 2018, Ratner et al. 2019, Bach et al. 2019 (Gibbs), Sala et al. 2019, Zhan et al. 2019, Safranchik et al. 2020.
Disadvantages: SGD can take a long time and has many hyperparameters (learning rate, momentum, etc.) to tune.
Solve Triplets of Labeling Function Parameters at a Time
[Figure: the latent variable label model broken into triplets of labeling functions.]
For conditionally independent $\lambda_1, \lambda_2, \lambda_3$ attached to $Y_1$:
$\mathbb{E}[\lambda_1 Y_1]\,\mathbb{E}[\lambda_2 Y_1] = \mathbb{E}[\lambda_1 \lambda_2]$
$\mathbb{E}[\lambda_2 Y_1]\,\mathbb{E}[\lambda_3 Y_1] = \mathbb{E}[\lambda_2 \lambda_3]$
$\mathbb{E}[\lambda_3 Y_1]\,\mathbb{E}[\lambda_1 Y_1] = \mathbb{E}[\lambda_3 \lambda_1]$
Method of moments: break the problem up into pieces and get closed-form solutions.
Triplets of Conditionally-Independent Labeling Functions
Each pairwise moment relates unobservable accuracy parameters to observable agreements: $\mu_i \mu_j = N_{i,j}$, where $N_{i,j} = \mathbb{E}[\lambda_i \lambda_j]$.
Form triplets of these equations and get closed-form solutions:
$\mu_1 \mu_4 = N_{1,4}$, $\mu_1 \mu_5 = N_{1,5}$, $\mu_4 \mu_5 = N_{4,5}$
$\Rightarrow |\mu_1| = \sqrt{N_{1,4} N_{1,5} / N_{4,5}}, \quad |\mu_4| = \sqrt{N_{1,4} N_{4,5} / N_{1,5}}, \quad |\mu_5| = \sqrt{N_{1,5} N_{4,5} / N_{1,4}}$
All we need to do is count how often the labeling functions agree - no SGD!
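Below is a minimal sketch of the closed-form triplet step (not the FlyingSquid library API): estimate $|\mathbb{E}[\lambda_i Y]|$ for three conditionally independent labeling functions purely from how often they agree. It assumes votes in {-1, +1} with no abstains, and leaves sign recovery and class balance to the rest of the pipeline, as in the paper.

```python
import numpy as np

def triplet_accuracies(L, i, j, k):
    """Closed-form estimates of |E[lambda * Y]| for a conditionally-independent triplet."""
    # Observable pairwise agreements N_{a,b} = empirical mean of lambda_a * lambda_b.
    N = lambda a, b: np.mean(L[:, a] * L[:, b])
    mu_i = np.sqrt(N(i, j) * N(i, k) / N(j, k))
    mu_j = np.sqrt(N(i, j) * N(j, k) / N(i, k))
    mu_k = np.sqrt(N(i, k) * N(j, k) / N(i, j))
    return mu_i, mu_j, mu_k

# Toy check: three voters that agree with a hidden Y with probability 0.9, 0.8, 0.7,
# so E[lambda * Y] = 2p - 1 = 0.8, 0.6, 0.4.
rng = np.random.default_rng(0)
Y = rng.choice([-1, 1], size=100_000)
L = np.stack([Y * rng.choice([1, -1], size=Y.size, p=[p, 1 - p])
              for p in (0.9, 0.8, 0.7)], axis=1)
print(triplet_accuracies(L, 0, 1, 2))   # approximately (0.8, 0.6, 0.4) -- no SGD needed
```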
Theoretical Analysis
Bounding Sampling Error (Informal)
Theorem 1 (how sampling error scales in $n$): the error in the parameter estimate satisfies $\mathbb{E}[\|\hat{\mu} - \mu\|_2] \leq O(n^{-1/2})$, where $n$ is the number of unlabeled data points.
Theorem 2 (optimal scaling rate): $\mathbb{E}[\|\hat{\mu} - \mu\|_2] \geq \Omega(n^{-1/2})$ ⇒ the bound is tight; this is the best possible scaling rate with unlabeled data.
[Figure: Y with conditionally-independent labeling functions λ_1, ..., λ_5.]
End Model Generalization Error (Informal)
Theorem 3 (end model generalization error): if you use the estimated parameters $\hat{\mu}$ to generate labels $\hat{Y}$ and train an end model $f_{\hat{w}}$, the end model generalization error satisfies $\mathbb{E}[L(\hat{w}, X, Y) - L(w^*, X, Y)] = O(n^{-1/2})$.
This is the same asymptotic rate as with supervised data!
More theory nuggets (check out our paper for details):
▪ We can achieve these rates even with model misspecification (the graph is incorrect)
▪ Bounds for distributional drift over time in the online setting
Evaluation & Implications
We run faster, and get high quality
Label model training times (s):
                     Snorkel   Temporal   FlyingSquid
Snorkel Benchmarks   3.0       --         0.06
Video Tasks          41.5      292.3      0.20

End model accuracies (F1):
                     Snorkel   Temporal   FlyingSquid
Snorkel Benchmarks   74.6      --         77.0
Video Tasks          47.4      75.2       76.2
Re-training in the End Model Training Loop
No SGD → we can re-train the label model inside the training loop of an end model.
[Figure: inputs X_1, X_2, X_3 flow through the end model; loss and gradients flow back through a FlyingSquid loss layer.]
PyTorch integration: a FlyingSquid loss layer.
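A rough sketch of how this can look in PyTorch. This is not the actual FlyingSquid loss layer; the synthetic data, the tiny end model, and the inline triplet fit are toy stand-ins. The point is only that, because the label model is solved in closed form from agreement counts, it can be cheaply re-fit inside the end model's training loop and its probabilistic labels used as training targets.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 4096
Y = (torch.rand(n) < 0.5).float() * 2 - 1              # hidden labels, never used directly
flip = lambda p: (torch.rand(n) < p).float() * 2 - 1   # +1 with probability p, else -1
lf_votes = torch.stack([Y * flip(p) for p in (0.9, 0.8, 0.7)], dim=1)
features = Y.unsqueeze(1) + torch.randn(n, 16)         # inputs correlated with Y

end_model = nn.Linear(16, 1)
opt = torch.optim.SGD(end_model.parameters(), lr=0.1)
bce = nn.BCEWithLogitsLoss()

for step in range(20):
    # Closed-form label model "re-fit": triplet accuracies from agreement rates.
    N = lambda a, b: (lf_votes[:, a] * lf_votes[:, b]).mean()
    mu = torch.stack([torch.sqrt(N(0, 1) * N(0, 2) / N(1, 2)),
                      torch.sqrt(N(0, 1) * N(1, 2) / N(0, 2)),
                      torch.sqrt(N(0, 2) * N(1, 2) / N(0, 1))])
    soft_labels = torch.sigmoid(lf_votes @ mu)          # crude stand-in for P(Y = 1 | lambda)

    loss = bce(end_model(features).squeeze(-1), soft_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```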
Speedups Enable Online Learning
Online learning: re-train on a rolling window to adapt to distributional drift over time.
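One way to picture the rolling-window variant (again an illustrative sketch, not the library API): keep only the most recent vote vectors in a fixed-size buffer and re-solve the same closed form as new data streams in, so the accuracy estimates track drift. The window size and the three-labeling-function setup are assumptions.

```python
from collections import deque
import numpy as np

WINDOW = 2000                      # assumed window size
buffer = deque(maxlen=WINDOW)      # most recent labeling-function vote vectors

def observe(votes):
    """Add one vote vector (entries in {-1, +1} for 3 LFs) and refresh the estimates."""
    buffer.append(votes)
    if len(buffer) < 100:          # wait for enough samples before solving
        return None
    L = np.array(buffer)
    N = lambda a, b: np.mean(L[:, a] * L[:, b])
    # Closed-form triplet accuracies for the current window only.
    return np.sqrt([N(0, 1) * N(0, 2) / N(1, 2),
                    N(0, 1) * N(1, 2) / N(0, 2),
                    N(0, 2) * N(1, 2) / N(0, 1)])
```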
Thank you!
Contact: Dan Fu (danfu@cs.stanford.edu, @realDanFu)
Code: https://github.com/HazyResearch/flyingsquid
Blog Post (Towards Interactive Weak Supervision with FlyingSquid): http://hazyresearch.stanford.edu/flyingsquid
Paper (Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods): https://arxiv.org/abs/2002.11955