
Predicting Sequences: Structured Perceptron. CS 6355: Structured Prediction - PowerPoint PPT Presentation



  1. Predicting Sequences: Structured Perceptron (CS 6355: Structured Prediction)

  2. Conditional Random Fields summary
     • An undirected graphical model
       – Decompose the score over the structure into a collection of factors
       – Each factor assigns a score to the assignment of the random variables it is connected to
     • Training and prediction
       – Final prediction via argmax_y w^T φ(x, y)
       – Train by maximum (regularized) likelihood
     • Connections to other models
       – Effectively a linear classifier
       – A generalization of logistic regression to structures
       – A conditional variant of a Markov Random Field (we will see this soon)
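The prediction step in the summary above can be sketched as a brute-force argmax over label sequences. Everything below (label set, feature templates, weights) is an illustrative assumption, not from the slides; real implementations exploit the factor decomposition, e.g. with Viterbi, instead of enumerating.

```python
from itertools import product

LABELS = ["A", "B"]  # hypothetical label set

def phi(x, y):
    # Toy feature vector: counts of (previous label, label) transitions
    # and (label, token) emissions, stored sparsely in a dict.
    feats = {}
    prev = "<s>"
    for tok, lab in zip(x, y):
        feats[("trans", prev, lab)] = feats.get(("trans", prev, lab), 0) + 1
        feats[("emit", lab, tok)] = feats.get(("emit", lab, tok), 0) + 1
        prev = lab
    return feats

def score(w, x, y):
    # Linear score w^T phi(x, y) over the sparse feature dict.
    return sum(w.get(f, 0.0) * v for f, v in phi(x, y).items())

def predict(w, x):
    # argmax_y w^T phi(x, y); exponential enumeration, for illustration only.
    return max(product(LABELS, repeat=len(x)), key=lambda y: score(w, x, y))
```

The enumeration makes the argmax explicit; the point of the factor decomposition is precisely that this maximization can instead be done efficiently with dynamic programming.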

  3. Global features
     The feature function decomposes over the sequence y_0 y_1 y_2 y_3 with input x:
       w^T φ(x, y_0, y_1)   w^T φ(x, y_1, y_2)   w^T φ(x, y_2, y_3)
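A minimal sketch of the decomposition this slide depicts: the global feature vector φ(x, y) is the sum of local feature vectors, one per consecutive label pair. The specific local features below (a transition indicator and an emission indicator) are assumptions for illustration.

```python
def local_phi(x, y_prev, y_curr, i):
    # One local feature vector per position, represented sparsely:
    # a transition indicator and an emission indicator.
    return {("trans", y_prev, y_curr): 1, ("emit", y_curr, x[i]): 1}

def global_phi(x, y):
    # phi(x, y) = sum over positions i of local_phi(x, y_{i-1}, y_i);
    # "<s>" marks the start of the sequence.
    feats = {}
    prev = "<s>"
    for i, lab in enumerate(y):
        for f, v in local_phi(x, prev, lab, i).items():
            feats[f] = feats.get(f, 0) + v
        prev = lab
    return feats
```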

  4. Outline
     • Sequence models
     • Hidden Markov models
       – Inference with HMM
       – Learning
     • Conditional Models and Local Classifiers
     • Global models
       – Conditional Random Fields
       – Structured Perceptron for sequences

  5. HMM is also a linear classifier
     Consider the HMM: P(x, y) = ∏_i P(y_i | y_{i-1}) P(x_i | y_i)

  6. HMM is also a linear classifier
     Consider the HMM: P(x, y) = ∏_i P(y_i | y_{i-1}) P(x_i | y_i)
     (the P(y_i | y_{i-1}) are the transitions; the P(x_i | y_i) are the emissions)

  7. HMM is also a linear classifier
     Consider the HMM: P(x, y) = ∏_i P(y_i | y_{i-1}) P(x_i | y_i)
     Or equivalently: log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
     Log joint probability = transition scores + emission scores
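The equivalence on this slide can be checked numerically on a toy HMM; the states, vocabulary, and probability values below are made-up numbers, not from the slides.

```python
import math

# Toy HMM: states N, V; "<s>" is the start state.
trans = {("<s>", "N"): 0.6, ("<s>", "V"): 0.4,
         ("N", "V"): 0.7, ("N", "N"): 0.3,
         ("V", "N"): 0.8, ("V", "V"): 0.2}
emit = {("N", "dog"): 0.5, ("N", "cat"): 0.5,
        ("V", "runs"): 0.9, ("V", "dog"): 0.1}

def joint(x, y):
    # P(x, y) = prod_i P(y_i | y_{i-1}) * P(x_i | y_i)
    p, prev = 1.0, "<s>"
    for tok, s in zip(x, y):
        p *= trans[(prev, s)] * emit[(s, tok)]
        prev = s
    return p

def log_joint(x, y):
    # log P(x, y) = sum_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
    total, prev = 0.0, "<s>"
    for tok, s in zip(x, y):
        total += math.log(trans[(prev, s)]) + math.log(emit[(s, tok)])
        prev = s
    return total
```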

  8. HMM is also a linear classifier
     log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
     Log joint probability = transition scores + emission scores
     Let us examine this expression using a carefully defined set of indicator functions

  9. HMM is also a linear classifier
     log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
     Log joint probability = transition scores + emission scores
     Let us examine this expression using a carefully defined set of indicator functions
       I_z = 1 if z is true, 0 if z is false
     Indicators are functions that map Booleans to 0 or 1

  10. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Log joint probability = transition scores + emission scores
      Let us examine this expression using a carefully defined set of indicator functions
      log P(y_i | y_{i-1}) is equivalent to
        Σ_s Σ_{s'} log P(s | s') · I_[y_i = s] · I_[y_{i-1} = s']
      The indicators ensure that only one of the elements of the double summation is non-zero
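A quick sketch of the claim on this slide, with assumed states and transition log-probabilities: multiplying by the indicators zeroes out every term of the double sum except the one where s = y_i and s' = y_{i-1}.

```python
import math

# Assumed toy transition log-probabilities over three states.
states = ["Det", "Noun", "Verb"]
log_p = {(sp, s): math.log(0.2) for sp in states for s in states}
log_p[("Det", "Noun")] = math.log(0.9)

def transition_term(y_prev, y_curr):
    # sum_s sum_{s'} log P(s | s') * I[y_i = s] * I[y_{i-1} = s'];
    # the boolean comparisons act as the 0/1 indicators.
    return sum(log_p[(sp, s)] * (y_curr == s) * (y_prev == sp)
               for s in states for sp in states)
```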

  11. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Log joint probability = transition scores + emission scores
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x_i | y_i) is equivalent to
        Σ_s log P(x_i | s) · I_[y_i = s]
      The indicators ensure that only one of the elements of the summation is non-zero

  12. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x, y) = Σ_i Σ_s Σ_{s'} log P(s | s') · I_[y_i = s] · I_[y_{i-1} = s']
                  + Σ_i Σ_s log P(x_i | s) · I_[y_i = s]

  13. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x, y) = Σ_i Σ_s Σ_{s'} log P(s | s') · I_[y_i = s] · I_[y_{i-1} = s']
                  + Σ_i Σ_s log P(x_i | s) · I_[y_i = s]
      log P(x, y) = Σ_s Σ_{s'} log P(s | s') Σ_i I_[y_i = s] I_[y_{i-1} = s']
                  + Σ_s Σ_i log P(x_i | s) I_[y_i = s]

  14. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x, y) = Σ_s Σ_{s'} log P(s | s') Σ_i I_[y_i = s] I_[y_{i-1} = s']
                  + Σ_s Σ_i log P(x_i | s) I_[y_i = s]
      Σ_i I_[y_i = s] I_[y_{i-1} = s'] is the number of times there is a transition in the sequence from state s' to state s: Count(s' → s)

  15. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x, y) = Σ_s Σ_{s'} log P(s | s') · Count(s' → s)
                  + Σ_s Σ_i log P(x_i | s) I_[y_i = s]

  16. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x, y) = Σ_s Σ_{s'} log P(s | s') · Count(s' → s)
                  + Σ_s Σ_i log P(x_i | s) I_[y_i = s]
      Σ_i I_[y_i = s] is the number of times state s occurs in the sequence: Count(s)

  17. HMM is also a linear classifier
      log P(x, y) = Σ_i [ log P(y_i | y_{i-1}) + log P(x_i | y_i) ]
      Let us examine this expression using a carefully defined set of indicator functions
      log P(x, y) = Σ_s Σ_{s'} log P(s | s') · Count(s' → s)
                  + Σ_s log P(x_i | s) · Count(s)

  18. HMM is also a linear classifier
      log P(x, y) = Σ_s Σ_{s'} log P(s | s') · Count(s' → s)
                  + Σ_s log P(x_i | s) · Count(s)
      This is a linear function: the log P terms are the weights, and the counts, obtained via the indicators, are the features
      It can be written as w^T φ(x, y), and we can add more features
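The conclusion of this derivation can be sketched on a toy model: collect the log-probabilities into a weight vector w and the transition/emission counts into a feature vector φ(x, y), and the dot product w^T φ(x, y) equals the position-by-position log joint. All probability values below are assumed for illustration.

```python
import math
from collections import Counter

# Assumed toy log-probabilities; "<s>" is the start state.
log_t = {("<s>", "Det"): math.log(0.5), ("Det", "Noun"): math.log(0.9),
         ("Noun", "Verb"): math.log(0.6), ("Verb", "Det"): math.log(0.7)}
log_e = {("Det", "the"): math.log(0.4), ("Noun", "dog"): math.log(0.1),
         ("Verb", "ate"): math.log(0.2), ("Noun", "homework"): math.log(0.05)}

def log_joint(x, y):
    # Direct sum over positions: transition score + emission score each step.
    total, prev = 0.0, "<s>"
    for tok, s in zip(x, y):
        total += log_t[(prev, s)] + log_e[(s, tok)]
        prev = s
    return total

def linear_score(x, y):
    # w^T phi(x, y): weights are log-probs, features are counts.
    trans_counts = Counter(zip(["<s>"] + list(y), y))
    emit_counts = Counter(zip(y, x))
    return (sum(log_t[k] * c for k, c in trans_counts.items())
            + sum(log_e[k] * c for k, c in emit_counts.items()))
```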

  19. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework

  20. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider:

  21. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider: Transition scores + Emission scores

  22. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider:
      Transition scores: log P(Det → Noun) × 2 + log P(Noun → Verb) × 1 + log P(Verb → Det) × 1
      + Emission scores

  23. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider:
      Transition scores: log P(Det → Noun) × 2 + log P(Noun → Verb) × 1 + log P(Verb → Det) × 1
      + Emission scores: log P(The | Det) × 1 + log P(dog | Noun) × 1 + log P(ate | Verb) × 1 + log P(the | Det) × 1 + log P(homework | Noun) × 1

  24. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider:
      Transition scores: log P(Det → Noun) × 2 + log P(Noun → Verb) × 1 + log P(Verb → Det) × 1
      + Emission scores: log P(The | Det) × 1 + log P(dog | Noun) × 1 + log P(ate | Verb) × 1 + log P(the | Det) × 1 + log P(homework | Noun) × 1
      w: Parameters of the model

  25. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider:
      Transition scores: log P(Det → Noun) × 2 + log P(Noun → Verb) × 1 + log P(Verb → Det) × 1
      + Emission scores: log P(The | Det) × 1 + log P(dog | Noun) × 1 + log P(ate | Verb) × 1 + log P(the | Det) × 1 + log P(homework | Noun) × 1
      φ(x, y): Properties of this output and the input

  26. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider the dot product:
      [ log P(Det → Noun), log P(Noun → Verb), log P(Verb → Det), log P(The | Det), log P(dog | Noun), log P(ate | Verb), log P(the | Det), log P(homework | Noun) ] · [ 2, 1, 1, 1, 1, 1, 1, 1 ]
      w: Parameters of the model   φ(x, y): Properties of this output and the input

  27. HMM is a linear classifier: An example
      Det  Noun  Verb  Det  Noun
      The  dog   ate   the  homework
      Consider the dot product:
      [ log P(Det → Noun), log P(Noun → Verb), log P(Verb → Det), log P(The | Det), log P(dog | Noun), log P(ate | Verb), log P(the | Det), log P(homework | Noun) ] · [ 2, 1, 1, 1, 1, 1, 1, 1 ]
      w: Parameters of the model   φ(x, y): Properties of this output and the input
      log P(x, y) = a linear scoring function = w^T φ(x, y)
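The feature vector φ(x, y) on this slide is just a bag of counts, and it can be read off from the tagged sentence with a few lines (a sketch; the start transition is omitted here to match the counts shown on the slide):

```python
from collections import Counter

# The slide's example sentence and its tag sequence.
x = ["The", "dog", "ate", "the", "homework"]
y = ["Det", "Noun", "Verb", "Det", "Noun"]

transition_counts = Counter(zip(y, y[1:]))  # label bigrams, e.g. Det -> Noun
emission_counts = Counter(zip(y, x))        # (label, word) pairs
```

Dotting these counts with the corresponding log-probability weights gives exactly the score written out on the slide, e.g. log P(Det → Noun) is multiplied by 2 because that transition occurs twice.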
