
Learning with Structured Output Spaces Keerthiram Murugesan - PowerPoint PPT Presentation



  1. Learning with Structured Output Spaces Keerthiram Murugesan

  2. Standard Prediction
     • Find a function from input space X to output space Y such that the prediction error is low.
     • Example (text classification): x = "Microsoft announced today that they acquired Apple for the amount equal to the gross national product of Switzerland." → y = 1; x = "Microsoft officials stated that they first wanted to buy Switzerland, but eventually were turned off by the mountains and the snowy winters…" → y = −1
     • Example (regression): x = a DNA sequence "GATACAACCTATCCCCGTATATATATTCTATGGGTATAGTATTAAATCA…" → y = 7.3
     • (Typically Y is "simple".)

  3. Structured Prediction
     [Figure: examples of mapping an input X to a structured output Y: a parse tree for "The dog chased the cat." (S → NP VP, NP → Det N, VP → V NP); a biological sequence ("APPGEAYLQPGEAYLQV"); coreference resolution in a news passage about Obama, the presidential election, and climate change; and conservation reservoir corridors on a grid.]

  4. Talk Overview
     • Structured Prediction (Quick Review)
       – Conventional Approach
     • Structured Prediction Cascades
       – Ensemble Cascades
     • Ensemble Learning for Structured Prediction
       – Online algorithm
       – Boosting-style algorithm

  5. Structured Prediction

  6. Structured Output Spaces
     • Input: x
     • Predict: y ∈ Y(x) (structured!)
     • Quality determined by a utility function (the scoring function)
     • Conventional Approach:
       – Train: learn a model U(x, y) of utility
       – Test: predict via h(x) = argmax_{y ∈ Y(x)} U(x, y) (can be challenging)

  7. Example: Sequence Prediction
     • Part-of-Speech Tagging: h(x) = argmax_{y ∈ Y(x)} U(x, y)
       – Given a sequence of words x
       – Predict a sequence of tags y
     • Example: x = "The rain wet the cat", with candidate tag sequences y such as "Det V Det N N", "Det V V Adj V", "Adv N V V Det", …

  8. Example: Sequence Prediction
     • MAP inference in 1st-order Markov models: hidden states … y_1 → y_2 → y_3 → y_4 … (1st-order dynamics), with observations x_1, x_2, x_3, x_4
     • Similar models include CRFs, Kalman Filters, Linear Dynamical Systems, etc.

  9. Example: Sequence Prediction
     • Utility function (a sum over maximal cliques): U(x, y) = Σ_{t=1..n} u(x_t, y_t, y_{t−1})
     • Prediction (by dynamic programming): h(x) = argmax_y Σ_{t=1..n} u(x_t, y_t, y_{t−1})
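The prediction step above is the classic Viterbi dynamic program over a chain. The sketch below is a minimal illustration; the toy tag set and the clique scores in `u` are invented for the example, not taken from the talk.

```python
# A minimal Viterbi sketch for the chain-structured utility on this slide:
# U(x, y) = sum_t u(x_t, y_t, y_{t-1}), maximized by dynamic programming.

def viterbi(x, tags, u):
    """Return argmax_y sum_t u(x[t], y[t], y[t-1]) over tag sequences y.

    u(x_t, y_t, y_prev) scores one clique; y_prev is None at t = 0.
    """
    n = len(x)
    # best[t][y] = best score of any prefix ending in tag y at position t
    best = [{y: u(x[0], y, None) for y in tags}]
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in tags:
            prev = max(tags, key=lambda yp: best[t - 1][yp] + u(x[t], y, yp))
            best[t][y] = best[t - 1][prev] + u(x[t], y, prev)
            back[t][y] = prev
    # Trace back from the best final tag to recover the full sequence.
    y_last = max(tags, key=lambda y: best[n - 1][y])
    path = [y_last]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), best[n - 1][y_last]

# Toy clique scores (assumed, illustrative): reward a few word/tag pairs
# and the transition Det -> N.
def u(word, tag, prev_tag):
    score = 0.0
    if word == "the" and tag == "Det":
        score += 2.0
    if word in ("rain", "cat") and tag == "N":
        score += 2.0
    if word == "wet" and tag == "V":
        score += 2.0
    if prev_tag == "Det" and tag == "N":
        score += 1.0
    return score

path, score = viterbi("the rain wet the cat".split(), ["Det", "N", "V"], u)
print(path, score)  # ['Det', 'N', 'V', 'Det', 'N'] 12.0
```

The table `best` is exactly the DP recurrence implied by the clique decomposition: the max over exponentially many sequences reduces to n·|tags|² local maximizations.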

  10. Scoring Function as a Linear Model
     • U/u is parameterized linearly: U(x, y; θ) = Σ_t u(x_t, y_t, y_{t−1}; θ)
     • With some feature representation: u(x, y_1, y_2; θ) = θᵀ f(x, y_1, y_2)
     • Prediction (by dynamic programming): h(x; θ) = argmax_y Σ_t θᵀ f(x_t, y_t, y_{t−1})
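A minimal sketch of the linear parameterization above, assuming a simple indicator-feature template; the feature names and the weights in `theta` are illustrative, not from the talk.

```python
# u(x_t, y_t, y_{t-1}; θ) = θᵀ f(x_t, y_t, y_{t-1}), with sparse features.

def f(word, tag, prev_tag):
    """Sparse feature vector f(x_t, y_t, y_{t-1}) as a dict of indicators."""
    feats = {f"word={word},tag={tag}": 1.0}
    if prev_tag is not None:
        feats[f"trans={prev_tag}->{tag}"] = 1.0
    return feats

def u(word, tag, prev_tag, theta):
    """θᵀ f(...) as a sparse dot product; absent features score 0."""
    return sum(theta.get(k, 0.0) * v for k, v in f(word, tag, prev_tag).items())

theta = {"word=the,tag=Det": 2.0, "trans=Det->N": 1.0}
print(u("the", "Det", None, theta))   # 2.0 (emission feature only)
print(u("rain", "N", "Det", theta))   # 1.0 (transition feature only)
```

Because u is linear in θ, the whole utility U(x, y; θ) = θᵀ Ψ(x, y) with Ψ the sum of the per-clique feature vectors, which is what makes the learning objectives on the following slides convex in θ.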

  11. Feature representation

  12. Generalizing to Other Structures
     • From the last slide: h(x; θ) = argmax_y Σ_t θᵀ f(x_t, y_t, y_{t−1})
     • General formulation: Ψ(x, y) = Σ_t f(x_t, y_t, y_{t−1}), and h(x; θ) = argmax_y θᵀ Ψ(x, y)
     • Inference procedures: Viterbi, CKY Parsing, Sorting, Belief Propagation, Integer Programming

  13. Learning Setting
     • argmin_θ (λ/2)‖θ‖² + Σ_{(x,y)} ℓ(y, h(x; θ)), where h(x) = argmax_{y ∈ Y(x)} U(x, y) (regularization term + loss function)
     • Generalization of conventional settings:
       – Hinge loss → Structural SVMs
       – Log-loss → Conditional Random Fields
       – Optimized via Gradient Descent, Cutting Plane, etc.
     • Requires running inference during training
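One concrete illustration of "inference during training" is the structured perceptron, a simple mistake-driven relative of the hinge-loss objective above (it is not covered in the talk; the data, features, and brute-force inference below are illustrative assumptions — Viterbi would replace the enumeration at scale).

```python
from itertools import product

# One training epoch: predict ŷ = argmax_y θᵀΨ(x, y) with the current θ,
# then on a mistake update θ += Ψ(x, y_true) − Ψ(x, ŷ).

def psi(x, y):
    """Joint feature map Ψ(x, y) = Σ_t f(x_t, y_t, y_{t-1}), as a sparse dict."""
    feats = {}
    prev = "<s>"
    for word, tag in zip(x, y):
        for k in (f"w={word},t={tag}", f"trans={prev}->{tag}"):
            feats[k] = feats.get(k, 0.0) + 1.0
        prev = tag
    return feats

def score(theta, feats):
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())

def predict(theta, x, tags):
    """Inference by brute force over all |tags|^len(x) tag sequences."""
    return max(product(tags, repeat=len(x)),
               key=lambda y: score(theta, psi(x, y)))

def perceptron_epoch(data, tags, theta):
    for x, y_true in data:
        y_hat = predict(theta, x, tags)   # inference inside the training loop
        if y_hat != tuple(y_true):        # mistake-driven update
            for k, v in psi(x, y_true).items():
                theta[k] = theta.get(k, 0.0) + v
            for k, v in psi(x, y_hat).items():
                theta[k] = theta.get(k, 0.0) - v
    return theta

data = [("the cat".split(), ("Det", "N")), ("the rain".split(), ("Det", "N"))]
theta = perceptron_epoch(data, ["Det", "N", "V"], {})
print(predict(theta, "the cat".split(), ["Det", "N", "V"]))  # ('Det', 'N')
```

The structural-SVM and CRF objectives on the slide replace the mistake-driven update with (sub)gradients of the hinge or log loss, but keep the same expensive inner argmax.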

  14. Restriction: Increased Complexity

  15. Restriction: Pre-specified Structure
     • h(x; θ) = argmax_y U(x, y; θ), with the structure fixed in advance
     • Learn a (linearly) parameterized U such that h(x) gives good predictions
     • What if U is "wrong"?
       – Known to not be consistent
       – Infinite training data ≠ converging to the best model

  16. Summary: Structured Prediction
     • h(x; θ) = argmax_y U(x, y; θ), with the structure fixed in advance
     • Conventional Approach:
       – Specify the structure & inference procedure
       – Train parameters on a training set {(x, y)}
     • Limitations:
       – Runtime proportional to model complexity
       – Structure mismatch & inconsistency

  17. Structured Prediction Cascades

  18. Classifier Cascades (Face Classifier)

  19. Classifier Cascades

  20. Tradeoffs in Cascaded Learning
     • Accuracy: minimize the number of errors incurred by each level
     • Efficiency: maximize the number of filtered assignments at each level

  21. Structured Prediction Cascades

  22. Clique Assignments
     • Valid assignment for clique (Y_{k−1}, Y_k): e.g. (Adj, N)
       – Remember the sum over cliques c ∈ C? U(x, y) = Σ_t u(x_t, y_t, y_{t−1})
     • Invalid assignment, which will be eliminated/pruned: e.g. (N, N)

  23. Clique Assignments
     • Valid assignment for clique (Y_{k−1}, Y_k): e.g. (Adj, N)
     • How do we know whether this assignment is good or bad?
       1. Score it
       2. Threshold the score
     • Invalid assignment, which will be eliminated/pruned: e.g. (N, N)
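A sketch of the score-and-threshold step, assuming the scores are max-marginals and the threshold interpolates between the best sequence score and the mean max-marginal, as in structured prediction cascades; the brute-force enumeration and the toy scoring function are illustrative only (real cascades compute max-marginals by dynamic programming).

```python
import itertools

def max_marginals(x, tags, u):
    """mm[(t, y)] = best total score of any full sequence with y_t = y."""
    n = len(x)
    mm = {}
    for y_seq in itertools.product(tags, repeat=n):
        s = sum(u(x[t], y_seq[t], y_seq[t - 1] if t else None)
                for t in range(n))
        for t, y in enumerate(y_seq):
            mm[(t, y)] = max(mm.get((t, y), float("-inf")), s)
    return mm

def cascade_threshold(mm, alpha=0.5):
    """Interpolate between the max score and the mean max-marginal."""
    scores = list(mm.values())
    return alpha * max(scores) + (1 - alpha) * sum(scores) / len(scores)

def prune(mm, threshold):
    """Keep only (position, tag) assignments scoring at or above threshold."""
    return {k for k, s in mm.items() if s >= threshold}

# Assumed toy clique scores: emissions only, no transition term.
def u(word, tag, prev_tag):
    return 1.0 if (word, tag) in {("the", "Det"), ("cat", "N")} else 0.0

mm = max_marginals("the cat".split(), ["Det", "N"], u)
keep = prune(mm, cascade_threshold(mm))
print(sorted(keep))  # [(0, 'Det'), (1, 'N')]
```

Pruned assignments never need to be scored again at the next (more expensive) cascade level, which is exactly the accuracy/efficiency tradeoff of slide 20.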

  24. Max-marginal score (sequence models)

  25. Threshold (t)

  26. Threshold (t)

  27. Threshold (t)

  28. Learning θ at each cascade level

  29. Online learning

  30. Structured Prediction Ensembles

  31. Ensemble Learning
     • Experts h_1, h_2, h_3, …, h_p predict: face, face, no face, …, face
     • Goal: combine the outputs from multiple models/hypotheses/experts:
       1. Majority voting
       2. Linear combination of hypotheses/experts
       3. Boosting, etc.
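Option 1 above, majority voting, can be sketched in a few lines (the example labels mirror the face/no-face illustration on the slide):

```python
from collections import Counter

def majority_vote(outputs):
    """Return the most common prediction among the expert outputs."""
    return Counter(outputs).most_common(1)[0][0]

print(majority_vote(["face", "face", "no face", "face"]))  # face
```

Options 2 and 3 refine this by weighting the experts instead of counting them equally, which the next slides develop.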

  32. Weighted Majority Algorithm
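A minimal sketch of the deterministic Weighted Majority Algorithm: keep one weight per expert, predict by weighted vote, and multiply the weight of every mistaken expert by a factor β ∈ (0, 1). The binary labels and the data layout below are illustrative assumptions.

```python
def weighted_majority(expert_preds, labels, beta=0.5):
    """Run weighted majority; expert_preds[i][t] = expert i's prediction at round t.

    Returns the final expert weights and the algorithm's mistake count.
    """
    n = len(expert_preds)
    w = [1.0] * n
    mistakes = 0
    for t, y in enumerate(labels):
        # Weighted vote over the experts' predictions this round.
        votes = {}
        for i in range(n):
            votes[expert_preds[i][t]] = votes.get(expert_preds[i][t], 0.0) + w[i]
        y_hat = max(votes, key=votes.get)
        if y_hat != y:
            mistakes += 1
        # Demote every expert that was wrong, regardless of the vote outcome.
        for i in range(n):
            if expert_preds[i][t] != y:
                w[i] *= beta
    return w, mistakes

experts = [[1, 1, 1], [0, 1, 1], [0, 0, 0]]
w, mistakes = weighted_majority(experts, [1, 1, 1])
print(w, mistakes)  # [1.0, 0.5, 0.125] 1
```

The standard guarantee is that the algorithm's mistake count is within a constant factor (depending on β) of the best single expert's, plus a log term in the number of experts.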

  33. Ensemble Learning for Structured Prediction
     • Ensemble members h_1, h_2, h_3, …, h_p
     • Each member's output decomposes into per-position components: h_i^1, h_i^2, …, h_i^l
     • Example: h_1 outputs the tag sequence "Adv N V V Det"

  34. Example: Sequence Model

  35. Weighted Majority Algorithm for Structured Prediction Ensembles

  36. Ensemble output from Weighted Majority Algorithm
     • Given W_1, W_2, …, W_T

  37. Boosting for Structured Prediction Ensembles

  38. Ensemble output from Boosting
     • Given the base learners h_1, h_2, …, h_T:
     • Note: the boosted base learners h_1, h_2, …, h_T are different from the ensemble members h_1, h_2, …, h_P

  39. THE END
