Semi-Supervised Learning of Sequence Models via Method of Moments
EMNLP 2016 (Conference on Empirical Methods in Natural Language Processing), Austin, Texas, November 1-6, 2016
Zita Marinho (IST, University of Lisbon; Robotics Institute, CMU) — zmarinho@cmu.edu
Shay B. Cohen (School of Informatics, University of Edinburgh) — scohen@inf.ed.ac.uk
Noah A. Smith (Computer Science & Engineering, University of Washington) — nasmith@cs.washington.edu
André F. T. Martins (IT, IST, University of Lisbon; Unbabel) — andre.martins@unbabel.com
Sequence Labeling
Sentence: "Herb fights like a ninja ."
Observed data: words {w1, w2, w3, ..., w6}; hidden: labels {y1, y2, y3, ..., y6}
One candidate labeling: N V Pre Det N .
Introduction — EMNLP 16 | Semi-supervised sequence labeling with MoM
Sequence Labeling
Same sentence, another candidate labeling: N V V Det N .
Sequence Labeling
Same sentence, a third candidate labeling: ADJ N V Det N .
Sequence Labeling
With K possible labels per position, the 6-word sentence has K^6 possible label assignments.
Hidden Markov Model
Transition probabilities p(y_t | y_{t-1}) over labels; emission probabilities p(w_t | y_t) over words.
How to learn the parameters?
- supervised learning
- unsupervised / semi-supervised learning (this talk)
The model can be extended to include features (Berg-Kirkpatrick et al., Painless Unsupervised Learning with Features, NAACL-HLT 2010).
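As a concrete reference point, the two parameter tables above can be written as matrices; a minimal numpy sketch (toy sizes and random parameters are assumptions for illustration, not the paper's setup):

```python
import numpy as np

K, V = 3, 5  # toy sizes: 3 labels, 5 word types
rng = np.random.default_rng(0)

# transition p(y_t | y_{t-1}): each row is a distribution over next labels
T = rng.dirichlet(np.ones(K), size=K)
# emission p(w_t | y_t): each row is a distribution over words
O = rng.dirichlet(np.ones(V), size=K)
pi = rng.dirichlet(np.ones(K))  # start distribution p(y_1)

def joint_prob(labels, words):
    """p(y, w) of a fully labeled sequence under the HMM."""
    p = pi[labels[0]] * O[labels[0], words[0]]
    for t in range(1, len(words)):
        p *= T[labels[t - 1], labels[t]] * O[labels[t], words[t]]
    return p
```

Supervised learning fills T and O with normalized counts from labeled data; the rest of the talk is about estimating O without full supervision.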
Maximum Likelihood Estimation (MLE) vs. Method of Moments (MoM)
- MLE: exact inference is hard; MoM: computationally efficient
- MLE: EM is sensitive to local optima (depends on initialization); MoM: no local optima
- MLE: EM is expensive on large datasets (several inference passes); MoM: one pass over the data
Problem Statement
Hidden Markov Model: via Maximum Likelihood Estimation vs. via Method of Moments

                         | MLE HMM | MLE feature HMM | MoM HMM | MoM feature HMM
semi-supervised learning |    ✓    |        ✓        |    ?    |        ?
unsupervised learning    |    ✓    |        ✓        |    ✓    |        ?

Cohen, Stratos, Collins, Foster and Ungar, Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity, JMLR 2014.
Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
Learning sequence models via MoM — Outline
1. Learn HMMs via MoM
2. Solve a QP
3. Extend to a feature-based model
4. Experiments
Method of Moments
Key insights:
1. Conditional independence: infer a word's label by looking at its context.
2. Anchor trick: learn a proxy for the labels using anchor words.
Anchor Learning
1. Conditional Independence
Example (Twitter): "hehe its gonna b a good day" — words w1, ..., w7 with labels y1, ..., y7, between start and stop states.
1. Conditional Independence
Example (Twitter): "I am goin 2 wait now :)" — the context of a word is {w-1, w+1}; labels are modeled with a log-linear model.
1. Conditional Independence
Window: w_{t-1} = tasted, w_t = like, w_{t+1} = chimichangas. The context {tasted, chimichangas} suggests y_t = adp.
1. Conditional Independence
Window: w_{t-1} = i, w_t = like, w_{t+1} = fajitas. The context {i, fajitas} suggests y_t = verb.
"You shall know a word by the company it keeps." — Firth, 1957
1. Conditional Independence
word ⊥ context | label: given its label, a word is conditionally independent of its context.
2. Anchor Trick
An anchor word takes only one label: e.g., if all instances of "be" are verbs, then p(verb | be) = 1 and p(label ≠ verb | be) = 0.
(Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013)
2. Anchor Trick
More anchors per label: e.g. verb = {b, be, are, is, am, have, going, go}. Using more than one anchor word per label gives less biased context estimates.
2. Anchor Trick
How to find anchors? From a small labeled corpus or a small lexicon, e.g.:
- noun: Austin, airport, playground
- verb: am, be, is, are, go, make, made, become
- pron: he, it, she
- adp: so, on, of
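One simple way to harvest anchors from a small labeled corpus is sketched below (the selection heuristic and the frequency threshold are assumptions for illustration, not the paper's exact rule): keep words that occur at least a few times and always carry the same label.

```python
from collections import Counter, defaultdict

def find_anchors(labeled_tokens, min_count=3):
    """labeled_tokens: iterable of (word, label) pairs from a small labeled corpus.
    Returns {label: [words that always occur with that label]}."""
    word_labels = defaultdict(Counter)
    for word, label in labeled_tokens:
        word_labels[word][label] += 1
    anchors = defaultdict(list)
    for word, counts in word_labels.items():
        # anchor candidate: frequent enough, and unambiguous in the labeled data
        if sum(counts.values()) >= min_count and len(counts) == 1:
            (label,) = counts.keys()
            anchors[label].append(word)
    return dict(anchors)
```

For example, a word like "like" that appears both as verb and as adposition is rejected, while an unambiguous frequent word like "be" qualifies as a verb anchor.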
Method of Moments
Unlabeled co-occurrences in data: for each word w_t, collect its context {w_{t-1}, w_{t+1}, w_{t+2}}.
Example sentences: "Andrew fights like Jet Li." "Ann sings like me." "Eat fruit like cherry." "Children like ice-cream."
Method of Moments
Method of Moments
Q = p(context | word): a matrix with context words as rows and word types as columns. The column for "like" aggregates the contexts of "like" across the example sentences ("Andrew fights like Jet Li.", "Ann sings like me.", "Eat fruit like cherry.", "Children like ice-cream.").
Method of Moments
The column for "be" is filled in the same way from its contexts, e.g. "Let there be love." and "Bill will be a ninja."
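The matrix Q above can be estimated from unlabeled text alone by counting context/word co-occurrences and normalizing each column; a minimal sketch using a {previous word, next word} context window (the window and the function name are assumptions for illustration):

```python
import numpy as np

def estimate_Q(sentences, vocab):
    """Columns of Q approximate p(context word | word), from raw counts."""
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    counts = np.zeros((V, V))  # rows: context words, cols: center words
    for sent in sentences:
        for t, w in enumerate(sent):
            if w not in idx:
                continue
            for c in (sent[t - 1] if t > 0 else None,
                      sent[t + 1] if t + 1 < len(sent) else None):
                if c in idx:
                    counts[idx[c], idx[w]] += 1
    col = counts.sum(axis=0, keepdims=True)
    return counts / np.maximum(col, 1)  # normalize each column to a distribution
```

This is the only place unlabeled data enters: Q is a one-pass count, with no inference over latent labels.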
Method of Moments
1. Conditional independence (word ⊥ context | label) lets the moments factor over labels:
p(context | word) = Σ_labels p(context | label) p(label | word)
In matrix form: Q = R Γ, where Q = p(context | word), R = p(context | label), Γ = p(label | word).
Method of Moments
2. Anchor trick: p(context | label) can be estimated from the contexts of that label's anchor words:
p(context | word) = Σ_labels p(context | anchors) p(label | word)
i.e. Q = R Γ with R estimated from the anchors.
Learning sequence models via MoM — Outline (recap)
1. Learn HMMs via MoM
2. Solve a QP
3. Extend to a feature-based model
4. Experiments
Method of Moments
For each word type, recover γ = p(label | word) by solving a small QP (takes ~ms per word type):
γ = argmin_γ ||q − Rγ||²  subject to  0 ≤ γ ≤ 1, Σ_labels γ = 1
where q = p(context | word) and R = p(context | label), estimated from the anchors.
Method of Moments
Semi-supervised variant: add a regularizer that pulls γ toward supervised estimates:
γ = argmin_γ ||q − Rγ||² + λ ||γ_sup − γ||²  subject to  0 ≤ γ ≤ 1, Σ_labels γ = 1
where q is estimated from unlabeled data and γ_sup from labeled data.
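The per-word-type problem above is a small simplex-constrained QP. A projected-gradient sketch is shown below (the solver choice, step size, and iteration count are assumptions for illustration; any off-the-shelf QP solver would do):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def solve_gamma(q, R, gamma_sup=None, lam=0.0, iters=500):
    """min ||q - R g||^2 + lam * ||gamma_sup - g||^2, with g on the simplex."""
    K = R.shape[1]
    g = np.full(K, 1.0 / K)
    step = 1.0 / (2.0 * (np.linalg.norm(R, 2) ** 2 + lam))  # 1 / Lipschitz const
    for _ in range(iters):
        grad = 2.0 * R.T @ (R @ g - q)
        if gamma_sup is not None:
            grad += 2.0 * lam * (g - gamma_sup)
        g = project_simplex(g - step * grad)
    return g
```

With lam = 0 this is the unsupervised QP; a positive lam blends in the supervised estimates γ_sup. Each problem has only K variables (one per label), which is why solving one per word type is cheap.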
HMM Learning
Observation matrix via Bayes' rule, from the γ coefficients (γ = p(label | word)):
p(word | label) = γ p(word) / p(label),  with  p(label) = Σ_words γ p(word).
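Given the recovered Γ (one γ row per word type) and unigram word probabilities, the emission matrix follows directly from the Bayes'-rule formulas above; a small numpy sketch (function and variable names are assumptions):

```python
import numpy as np

def emissions_from_gamma(Gamma, p_word):
    """Gamma: (V, K) matrix with rows p(label | word); p_word: (V,) unigram probs.
    Returns O: (K, V) with rows p(word | label), and p_label: (K,)."""
    joint = Gamma * p_word[:, None]   # p(label, word) = p(label | word) p(word)
    p_label = joint.sum(axis=0)       # p(label) = sum_words p(label | word) p(word)
    O = (joint / p_label).T           # Bayes' rule: p(word | label)
    return O, p_label
```

Each row of the returned O is a proper distribution over words, ready to plug into the HMM as the observation matrix.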
HMM Learning
Transition matrix: estimated from labeled data only.