Predicting Sequences: Structured Perceptron
CS 6355: Structured Prediction
Conditional Random Fields summary

• An undirected graphical model
  – Decomposes the score over the structure into a collection of factors
  – Each factor assigns a score to the assignment of the random variables it is connected to
• Training and prediction
  – Final prediction via argmax_y w^T φ(x, y)
  – Train by maximum (regularized) likelihood
• Connections to other models
  – Effectively a linear classifier
  – A generalization of logistic regression to structures
  – A conditional variant of a Markov Random Field (we will see this soon)
Global features

The feature function decomposes over the sequence:

  y_0   y_1   y_2   y_3      (with input x)

  w^T φ(x, y_0, y_1) + w^T φ(x, y_1, y_2) + w^T φ(x, y_2, y_3)
Outline

• Sequence models
• Hidden Markov models
  – Inference with HMM
  – Learning
• Conditional Models and Local Classifiers
• Global models
  – Conditional Random Fields
  – Structured Perceptron for sequences
HMM is also a linear classifier

Consider the HMM:

  P(x, y) = ∏_i P(y_i | y_{i−1}) P(x_i | y_i)

where the P(y_i | y_{i−1}) factors are the transitions and the P(x_i | y_i) factors are the emissions.

Or equivalently:

  log P(x, y) = ∑_i [ log P(y_i | y_{i−1}) + log P(x_i | y_i) ]

Log joint probability = transition scores + emission scores
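The log decomposition above can be sketched numerically. A minimal sketch, assuming made-up transition and emission tables and a start symbol `<s>` for the first transition (none of these numbers or names come from the slides):

```python
import math

# Hypothetical model parameters (illustrative only, not from the slides).
trans = {("<s>", "Det"): 0.6, ("Det", "Noun"): 0.9,
         ("Noun", "Verb"): 0.8, ("Verb", "Det"): 0.7}
emit = {("Det", "the"): 0.5, ("Noun", "dog"): 0.1,
        ("Verb", "ate"): 0.2, ("Noun", "homework"): 0.05}

def log_joint(words, tags):
    """log P(x, y) = sum_i [log P(y_i | y_{i-1}) + log P(x_i | y_i)]."""
    score = 0.0
    prev = "<s>"  # assumed start symbol for the first transition
    for w, t in zip(words, tags):
        score += math.log(trans[(prev, t)])  # transition score
        score += math.log(emit[(t, w)])      # emission score
        prev = t
    return score
```

Working with sums of log probabilities, rather than products of probabilities, also avoids numerical underflow on long sequences.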
HMM is also a linear classifier

  log P(x, y) = ∑_i [ log P(y_i | y_{i−1}) + log P(x_i | y_i) ]

Log joint probability = transition scores + emission scores

Let us examine this expression using a carefully defined set of indicator functions:

  I_z = 1 if z is true, 0 if z is false

Indicators are functions that map Booleans to 0 or 1.
HMM is also a linear classifier

For states t and t′, the transition term log P(y_i | y_{i−1}) is equivalent to

  ∑_t ∑_{t′} log P(t | t′) ⋅ I[y_i = t] ⋅ I[y_{i−1} = t′]

The indicators ensure that only one of the elements of the double summation is non-zero.

Similarly, the emission term log P(x_i | y_i) is equivalent to

  ∑_t log P(x_i | t) ⋅ I[y_i = t]

Again, the indicators ensure that only one of the elements of the summation is non-zero.
HMM is also a linear classifier

Substituting the indicator forms into log P(x, y) = ∑_i [ log P(y_i | y_{i−1}) + log P(x_i | y_i) ] gives

  log P(x, y) = ∑_i ∑_t ∑_{t′} log P(t | t′) ⋅ I[y_i = t] ⋅ I[y_{i−1} = t′]
              + ∑_i ∑_t log P(x_i | t) ⋅ I[y_i = t]

Pulling the sum over positions inside:

  log P(x, y) = ∑_t ∑_{t′} log P(t | t′) ∑_i I[y_i = t] I[y_{i−1} = t′]
              + ∑_t log P(x_i | t) ∑_i I[y_i = t]
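The indicator construction can be sanity-checked in code. A minimal sketch, assuming a three-state tag set and made-up transition probabilities (nothing here is specified by the slides):

```python
import math

# Made-up transition table P(t | t'), keyed as (t', t); rows sum to 1.
states = ["Det", "Noun", "Verb"]
trans = {("Det", "Noun"): 0.8, ("Det", "Verb"): 0.1, ("Det", "Det"): 0.1,
         ("Noun", "Verb"): 0.7, ("Noun", "Det"): 0.1, ("Noun", "Noun"): 0.2,
         ("Verb", "Det"): 0.6, ("Verb", "Noun"): 0.3, ("Verb", "Verb"): 0.1}

def I(cond):
    """Indicator: 1 if the condition holds, 0 otherwise."""
    return 1 if cond else 0

def transition_term(y_prev, y_cur):
    """sum_t sum_t' log P(t | t') * I[y_i = t] * I[y_{i-1} = t']."""
    return sum(math.log(trans[(tp, t)]) * I(y_cur == t) * I(y_prev == tp)
               for t in states for tp in states)
```

Exactly one (t, t′) pair makes both indicators equal to 1, so the double sum collapses to log P(y_cur | y_prev).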
HMM is also a linear classifier

The inner sums are counts over the sequence:

  ∑_i I[y_i = t] I[y_{i−1} = t′] is the number of times there is a transition in the sequence from state t′ to state t: Count(t′ → t)

  ∑_i I[y_i = t] is the number of times state t occurs in the sequence: Count(t)

So

  log P(x, y) = ∑_t ∑_{t′} log P(t | t′) ⋅ Count(t′ → t) + ∑_t log P(x_i | t) ⋅ Count(t)
HMM is also a linear classifier

  log P(x, y) = ∑_t ∑_{t′} log P(t | t′) ⋅ Count(t′ → t) + ∑_t log P(x_i | t) ⋅ Count(t)

This is a linear function: the log P terms are the weights, and the counts, computed via the indicators, are the features. It can be written as w^T φ(x, y), and we can add more features.
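The count-based form suggests a direct implementation of φ(x, y) and w. A sketch under the same kind of made-up tables as before (the feature-naming scheme and the `<s>` start symbol are assumptions for illustration):

```python
import math
from collections import Counter

# Hypothetical model parameters (illustrative only).
trans = {("<s>", "Det"): 0.6, ("Det", "Noun"): 0.9,
         ("Noun", "Verb"): 0.8, ("Verb", "Det"): 0.7}
emit = {("Det", "the"): 0.5, ("Noun", "dog"): 0.1,
        ("Verb", "ate"): 0.2, ("Noun", "homework"): 0.05}

def phi(words, tags):
    """Feature vector as a sparse Counter: transition counts
    Count(t' -> t) and emission counts Count(t, w)."""
    feats = Counter()
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("T", prev, t)] += 1  # transition feature fires at each position
        feats[("E", t, w)] += 1     # emission feature
        prev = t
    return feats

# Weight vector: one weight per feature, set to the log probability.
weights = {("T",) + k: math.log(v) for k, v in trans.items()}
weights.update({("E",) + k: math.log(v) for k, v in emit.items()})

def score(words, tags):
    """w^T phi(x, y); with these weights it equals log P(x, y)."""
    return sum(weights[f] * c for f, c in phi(words, tags).items())
```

With log-probability weights this reproduces the HMM exactly; the structured perceptron instead learns the weights directly, and nothing forces them to remain log probabilities.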
HMM is a linear classifier: An example

Consider the tagged sentence

  Det  Noun  Verb  Det  Noun
  The  dog   ate   the  homework

  log P(x, y) = Transition scores + Emission scores
HMM is a linear classifier: An example

  Det  Noun  Verb  Det  Noun
  The  dog   ate   the  homework

Transition scores:             Emission scores:
  log P(Det → Noun) × 2          log P(The | Det) × 1
+ log P(Noun → Verb) × 1       + log P(dog | Noun) × 1
+ log P(Verb → Det) × 1        + log P(ate | Verb) × 1
                               + log P(the | Det) × 1
                               + log P(homework | Noun) × 1
HMM is a linear classifier: An example

  Det  Noun  Verb  Det  Noun
  The  dog   ate   the  homework

The same score is a dot product of two vectors:

  w (parameters of the model)      φ(x, y) (properties of this output and the input)
  log P(Det → Noun)                2
  log P(Noun → Verb)               1
  log P(Verb → Det)                1
  log P(The | Det)          ⋅      1
  log P(dog | Noun)                1
  log P(ate | Verb)                1
  log P(the | Det)                 1
  log P(homework | Noun)           1

log P(x, y) = a linear scoring function = w^T φ(x, y)
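The dot product above can be written out concretely with hypothetical probability values (the slide fixes only the features and their counts, not the numbers):

```python
import math

# Made-up probabilities for the eight features in the example.
probs = {"Det->Noun": 0.9, "Noun->Verb": 0.8, "Verb->Det": 0.7,
         "The|Det": 0.5, "dog|Noun": 0.1, "ate|Verb": 0.2,
         "the|Det": 0.4, "homework|Noun": 0.05}

w = [math.log(p) for p in probs.values()]   # weights: the log P terms
phi = [2, 1, 1, 1, 1, 1, 1, 1]              # features: the counts

# w^T phi = log P(x, y) for this tagged sentence.
log_joint = sum(wi * fi for wi, fi in zip(w, phi))
```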