Light-Supervision of Structured Prediction Energy Networks
Andrew McCallum


  1. Light-Supervision of Structured Prediction Energy Networks. Andrew McCallum. Builds on SPENs [2016] and Generalized Expectation [Mann; Druck 2010-12]. Collaborators: Pedram Rooshenas (Oregon PhD → UMass Postdoc), Aishwarya Kamath (UMass MS), David Belanger (UMass PhD → Google Brain), Greg Druck (UMass PhD → Yummly).

  2. Overview. Light supervision: prior knowledge expressed as Generalized Expectation criteria, which induces extra structural dependencies. Structured prediction: complex dependencies handled with SPENs.

  3. Chapter 1 Generalized Expectation

  4. Learning from small labeled data

  5. Leverage unlabeled data

  6. Family 1: Expectation Maximization [Dempster, Laird, Rubin, 1977]

  7. Family 2: Graph-Based Methods [Szummer, Jaakkola, 2002] [Zhu, Ghahramani, 2002]

  8. Family 3: Auxiliary-Task Methods [Ando and Zhang, 2005]

  9. Family 4: Boundary in Sparse Region. Transductive SVMs [Joachims, 1999]: sparsity measured by margin. Entropy Regularization [Grandvalet & Bengio, 2005]: minimize label entropy.
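The entropy-regularization idea on this slide can be sketched in a few lines. This is a minimal illustration (names and data are mine, not from the talk): the mean label entropy of the model's predictions on unlabeled data is added to the loss, pushing the decision boundary away from dense regions.

```python
import numpy as np

def label_entropy(probs):
    """Mean Shannon entropy of predicted label distributions.

    probs: (n_examples, n_labels), rows summing to 1.
    Entropy regularization adds a scaled version of this quantity to
    the training loss, rewarding confident predictions on unlabeled data.
    """
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(np.sum(probs * np.log(probs + eps), axis=1)))

confident = np.array([[0.99, 0.01], [0.02, 0.98]])
uncertain = np.array([[0.5, 0.5], [0.6, 0.4]])
assert label_entropy(confident) < label_entropy(uncertain)
```

A model whose boundary crosses a dense cluster of unlabeled points produces many near-uniform rows, so it pays a higher entropy penalty.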

  10. Family 5: Generalized Expectation Criteria [Mann, McCallum 2010; Druck, Mann, McCallum 2011; Druck, McCallum 2012]. Best solution? Supervise with expectations instead of labels: label priors E[p(y)] and label-given-feature expectations E[p(y|f(x))]. (Figure: bar charts of target label proportions, e.g. Student vs. Faculty.)

  11. Expectations on Labels | Features. Example: classifying Baseball versus Hockey. Traditional approach: human effort goes into labeling a few examples, then (semi-)supervised training via maximum likelihood. Generalized Expectation approach: humans brainstorm a few keywords (ball, field, bat vs. puck, ice, stick) and state expectations such as p(HOCKEY | "puck") = 0.9, then semi-supervised training via Generalized Expectation.
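The labeled-feature expectation above, p(HOCKEY | "puck") = 0.9, can be scored against a model without any labeled instances. A minimal sketch (the function name and data are illustrative, not from the talk): compare the model's average HOCKEY probability over documents containing "puck" to the human-supplied target.

```python
import numpy as np

def ge_squared_error(probs, contains_feature, target):
    """Generalized-expectation score for one labeled feature.

    probs: (n_docs,) model probability of HOCKEY for each document.
    contains_feature: (n_docs,) boolean mask, e.g. doc contains "puck".
    target: human-supplied expectation, e.g. p(HOCKEY | "puck") = 0.9.
    Returns the negated squared distance between the model's
    conditional expectation and the target (larger is better).
    """
    model_expectation = probs[contains_feature].mean()
    return -(model_expectation - target) ** 2

probs = np.array([0.8, 0.95, 0.1, 0.9])
mask = np.array([True, True, False, True])
# model expectation over "puck" docs = (0.8 + 0.95 + 0.9) / 3
score = ge_squared_error(probs, mask, 0.9)
```

Training then adjusts the model parameters to drive this score toward zero, which is exactly the objective the following slides formalize.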

  12. Labeling Features. With ~1000 unlabeled examples, humans label features rather than instances. Baseball features: ball, batting, base, HR, Sox, Mets, runs. Hockey features: goal, Edmonton Oilers, Buffalo, Toronto Maple Leafs, NHL, puck, Pens, Bruins, Penguins, Lemieux. Accuracy rises as more features are labeled: 85% → 92% → 94.5% → 96%.

  13. Accuracy per Human Effort. (Figure: test accuracy vs. labeling time in seconds, comparing labeling features against labeling instances.)

  14. Prior Knowledge. Sources of feature labels: humans (e.g. for baseball/hockey classification: hit, braves, runs vs. puck, goal, nhl), other resources on the web, and data from related tasks, e.g. bibliographic records such as: W. H. Enright. Improving the efficiency of matrix operations in the numerical solution of stiff ordinary differential equations. ACM Trans. Math. Softw., 4(2), 127-136, June 1978.

  15. Generalized Expectation (GE). O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ). Here x are input variables, y are output variables, and g are constraint features, e.g. g(x, y) returns 1 if x contains "hit" and y is baseball.

  16. Generalized Expectation (GE). Assume a general CRF [Lafferty et al. 2001]: p(y | x; θ) = (1/Z_{θ,x}) exp( θᵀ f(x, y) ), with model features f and model distribution p(y | x; θ). In O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ), the inner expectation is then the model's probability of baseball if x contains "hit".
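The log-linear form on this slide is easy to compute directly. A small sketch (toy weights and features, not from the talk) of p(y | x; θ) = (1/Z_{θ,x}) exp(θᵀ f(x, y)) for a single input x:

```python
import numpy as np

def crf_conditional(theta, feats):
    """p(y | x; theta) for a log-linear model.

    feats: (n_labels, n_features), the feature vectors f(x, y)
    stacked over all labels y for one input x.
    theta: (n_features,) weight vector.
    """
    scores = feats @ theta       # theta^T f(x, y) for each label
    scores -= scores.max()       # stabilize before exponentiating
    expd = np.exp(scores)
    return expd / expd.sum()     # normalize by Z_{theta, x}

theta = np.array([1.0, -0.5])
feats = np.array([[1.0, 0.0],    # f(x, y=baseball)
                  [0.0, 1.0]])   # f(x, y=hockey)
p = crf_conditional(theta, feats)
```

The max-subtraction does not change the distribution but keeps exp from overflowing, a standard trick when θᵀ f(x, y) can be large.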

  17. Generalized Expectation (GE). O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ). Here p̃(x) is the empirical distribution (which can be defined over unlabeled data), and the nested expectation is the model's probability that documents containing "hit" are labeled baseball.

  18. Generalized Expectation (GE). In O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ), the nested expectation acts as a (soft) expectation constraint, and S is a score function: larger when the model expectation matches prior knowledge.

  19. Generalized Expectation (GE) Objective Function. O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ), where r(θ) is regularization.

  20. GE Score Functions. Let ĝ denote the target expectations and g_θ = E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] the model expectations (e.g. one entry per labeled feature such as "hit" or "puck").
Squared error: S_{ℓ2}(θ) = −‖ĝ − g_θ‖²₂
KL divergence: S_{KL}(θ) = −Σ_q ĝ_q log( ĝ_q / g_{θ,q} )
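Both score functions from this slide are a few lines of numpy. A minimal sketch (function names and the toy expectations are mine): each returns a larger (less negative) value when the model expectations sit closer to the targets.

```python
import numpy as np

def score_l2(g_hat, g_theta):
    """Squared-error GE score: -||g_hat - g_theta||_2^2."""
    return -float(np.sum((g_hat - g_theta) ** 2))

def score_kl(g_hat, g_theta):
    """KL-divergence GE score: -sum_q g_hat_q log(g_hat_q / g_theta_q)."""
    eps = 1e-12  # guard against log(0) and division by zero
    return -float(np.sum(g_hat * np.log((g_hat + eps) / (g_theta + eps))))

g_hat = np.array([0.9, 0.1])   # target, e.g. p(HOCKEY | "puck") = 0.9
close = np.array([0.85, 0.15]) # model expectations near the target
far = np.array([0.5, 0.5])     # model expectations far from it
assert score_l2(g_hat, close) > score_l2(g_hat, far)
assert score_kl(g_hat, close) > score_kl(g_hat, far)
```

Both scores peak at zero when g_θ = ĝ, so maximizing O(θ) pulls the model expectations toward the human-supplied targets.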

  21. Estimating Parameters with GE. O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ). Define a violation term v (squared error: v_i = −2(ĝ_i − g_{θ,i}); KL: v_i = ĝ_i / g_{θ,i}). Then
∇_θ O(θ) = vᵀ ( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) f(x, y)ᵀ ] − E_{p(y|x;θ)}[ g(x, y) ] E_{p(y|x;θ)}[ f(x, y)ᵀ ] ] ) + ∇_θ r(θ),
i.e. the violation times the estimated covariance between model and constraint features.
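The covariance structure of this gradient is concrete enough to sketch for a single input x (shapes and the helper name are mine, hedged against the slide's formula):

```python
import numpy as np

def ge_gradient(v, p_y_given_x, g, f):
    """Gradient contribution of the GE term for one input x.

    p_y_given_x: (n_labels,) model distribution p(y | x; theta)
    g: (n_labels, n_constraints) constraint features g(x, y)
    f: (n_labels, n_features) model features f(x, y)
    v: (n_constraints,) violation term, e.g. -2 * (g_hat - g_theta).
    Returns (n_features,): violation times the covariance of g and f.
    """
    Eg = p_y_given_x @ g                     # E[g]
    Ef = p_y_given_x @ f                     # E[f]
    Egf = g.T @ (p_y_given_x[:, None] * f)   # E[g f^T]
    cov = Egf - np.outer(Eg, Ef)             # Cov(g, f) under the model
    return v @ cov

p = np.array([0.5, 0.5])
g = np.array([[1.0], [0.0]])   # one constraint feature
f = np.eye(2)                  # two model features
grad = ge_gradient(np.array([1.0]), p, g, f)
```

Averaging this contribution over the unlabeled examples gives the expectation under p̃(x) in the slide's formula.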

  22. Learning About Unconstrained Features. The trained model generalizes beyond the prior knowledge: GE directly constrains features like hit and puck, but through the covariance computed on unlabeled data the model also learns about unconstrained features such as run, pitcher, goal, and NHL.

  23. Generalized Expectation criteria: easy communication with domain experts. • Inject domain knowledge into parameter estimation • Like an "informative prior"... • ...but rather than the "language of parameters" (difficult for humans to understand) • ...use the "language of expectations" (natural for humans)

  24. IID Prediction: "classification," e.g. logistic regression. Example: spam filtering. Observed X (emails), predicted Y ∈ {Spam, Not Spam}. (Figure: four emails, each classified independently.)

  25. Structured Prediction, e.g. "sequence labeling". Example: Chinese word segmentation, modeled with a linear-chain CRF: X = 中 国 人 民 ("Chinese People"), Y = Start/Not-Start per character. GE objective: O(θ) = S( E_{p̃(x)}[ E_{p(y|x;θ)}[ g(x, y) ] ] ) + r(θ). The GE gradient now requires terms of the form
vᵀ Σ_i Σ_j Σ_y p(y_{i−1}, y_i, y_j | x; θ) g(x, y_j, j) f(x, y_{i−1}, y_i, i)ᵀ
— a marginal over three, non-consecutive positions. (Figure: a Chinese news passage as example input.)

  26. Natural Expectations Lead to Difficult Training Inference. "AUTHOR field should be contiguous, only appearing once." Example citation with AUTHOR, EDITOR, and LOCATION fields: Anna Popescu (2004), "Interactive Clustering," Wei Li (Ed.), Learning Handbook, Athos Press, Souroti. Such a constraint requires marginals like p(y_{i−1}, y_i, y_j, y_k) — the downfall of GE.

  27. Chapter 2: Structured Prediction Energy Networks — deep learning + structured prediction. A framework providing easier inference for complex dependencies?

  28. Structured Prediction vs. "classification" (e.g. logistic regression). Example: spam filtering. Prediction as energy minimization: Ŷ = argmin_Y E(Y; X), where the energy E(Y; X) is a sum over factors. (Figure: four emails, one factor each.)
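The argmin on this slide is the key computational step in energy-based prediction, and in SPENs it is performed by gradient descent on a continuous relaxation of Y. A minimal sketch under stated assumptions: the quadratic `energy` below is a hypothetical stand-in (a SPEN would use a deep network), and inference is projected gradient descent over y ∈ [0, 1]ⁿ.

```python
import numpy as np

def energy(y, x, W):
    """Toy energy over a relaxed label vector y in [0, 1]^n.

    Unary terms -x @ y plus a quadratic coupling W; a SPEN replaces
    this hand-written form with a learned deep network E(y; x).
    """
    return -x @ y + 0.5 * y @ W @ y

def infer(x, W, steps=200, lr=0.1):
    """Approximate argmin_y E(y; x) by projected gradient descent."""
    y = np.full_like(x, 0.5)          # start at the uninformative point
    for _ in range(steps):
        grad = -x + W @ y             # dE/dy for the toy energy above
        y = np.clip(y - lr * grad, 0.0, 1.0)  # project back into [0, 1]^n
    return y

y_hat = infer(np.array([1.0, -1.0]), 0.1 * np.eye(2))
```

Because the relaxation is continuous, the same gradient machinery that trains the energy network also performs inference, which is what lets SPENs handle dependencies that make GE's discrete marginals intractable.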

  29. Structured Prediction, e.g. "sequence labeling". Example: Chinese word segmentation (中 国 人 民 → "Chinese People", Start/Not-Start per character); prediction is Ŷ = argmin_Y E(Y; X), trained by comparing the predicted Ŷ against the true Y.

  30.–31. Where does the energy come from? Traditionally, E(Y; X) is built by feature engineering.

  32.–33. "Hidden Unit Conditional Random Fields" [Maaten, Welling, Saul, AISTATS 2011]: add hidden units Z_1, …, Z_4, giving E(Y, Z; X), so part of the feature engineering is learned.

  34. The hidden units can also capture the dependency structure among the outputs.
