Predicting Sequences: Structured Perceptron
CS 6355: Structured Prediction
Conditional Random Fields summary

• An undirected graphical model
  – Decomposes the score over the structure into a collection of factors
  – Each factor assigns a score to the assignment of the random variables it is connected to
• Training and prediction
  – Final prediction via argmax_y w^T φ(x, y)
  – Train by maximum (regularized) likelihood
• Connections to other models
  – Effectively a linear classifier
  – A generalization of logistic regression to structures
  – A conditional variant of a Markov Random Field (we will see this soon)
Global features

The feature function decomposes over the sequence:

  y_0   y_1   y_2   y_3      (with input x)

  w^T φ(x, y_0, y_1) + w^T φ(x, y_1, y_2) + w^T φ(x, y_2, y_3)
Outline

• Sequence models
• Hidden Markov models
  – Inference with HMM
  – Learning
• Conditional Models and Local Classifiers
• Global models
  – Conditional Random Fields
  – Structured Perceptron for sequences
HMM is also a linear classifier

Consider the HMM:

  P(x, y) = ∏_i P(y_i | y_{i−1}) P(x_i | y_i)

where the P(y_i | y_{i−1}) factors are the transitions and the P(x_i | y_i) factors are the emissions.

Or equivalently:

  log P(x, y) = ∑_i [ log P(y_i | y_{i−1}) + log P(x_i | y_i) ]

Log joint probability = transition scores + emission scores
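The log decomposition above can be sketched numerically. A minimal sketch, assuming made-up transition and emission tables and a start symbol `<s>` for the first transition (none of these numbers or names come from the slides):

```python
import math

# Hypothetical model parameters (illustrative only, not from the slides).
trans = {("<s>", "Det"): 0.6, ("Det", "Noun"): 0.9,
         ("Noun", "Verb"): 0.8, ("Verb", "Det"): 0.7}
emit = {("Det", "the"): 0.5, ("Noun", "dog"): 0.1,
        ("Verb", "ate"): 0.2, ("Noun", "homework"): 0.05}

def log_joint(words, tags):
    """log P(x, y) = sum_i [log P(y_i | y_{i-1}) + log P(x_i | y_i)]."""
    score = 0.0
    prev = "<s>"  # assumed start symbol for the first transition
    for w, t in zip(words, tags):
        score += math.log(trans[(prev, t)])  # transition score
        score += math.log(emit[(t, w)])      # emission score
        prev = t
    return score
```

Working with sums of log probabilities, rather than products of probabilities, also avoids numerical underflow on long sequences.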
HMM is also a linear classifier

  log P(x, y) = ∑_i [ log P(y_i | y_{i−1}) + log P(x_i | y_i) ]

Log joint probability = transition scores + emission scores

Let us examine this expression using a carefully defined set of indicator functions:

  I_z = 1 if z is true, 0 if z is false

Indicators are functions that map Booleans to 0 or 1.
HMM is also a linear classifier

For states t and t′, the transition term log P(y_i | y_{i−1}) is equivalent to

  ∑_t ∑_{t′} log P(t | t′) ⋅ I[y_i = t] ⋅ I[y_{i−1} = t′]

The indicators ensure that only one of the elements of the double summation is non-zero.

Similarly, the emission term log P(x_i | y_i) is equivalent to

  ∑_t log P(x_i | t) ⋅ I[y_i = t]

Again, the indicators ensure that only one of the elements of the summation is non-zero.
HMM is also a linear classifier

Substituting the indicator forms into log P(x, y) = ∑_i [ log P(y_i | y_{i−1}) + log P(x_i | y_i) ] gives

  log P(x, y) = ∑_i ∑_t ∑_{t′} log P(t | t′) ⋅ I[y_i = t] ⋅ I[y_{i−1} = t′]
              + ∑_i ∑_t log P(x_i | t) ⋅ I[y_i = t]

Pulling the sum over positions inside:

  log P(x, y) = ∑_t ∑_{t′} log P(t | t′) ∑_i I[y_i = t] I[y_{i−1} = t′]
              + ∑_t log P(x_i | t) ∑_i I[y_i = t]
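The indicator construction can be sanity-checked in code. A minimal sketch, assuming a three-state tag set and made-up transition probabilities (nothing here is specified by the slides):

```python
import math

# Made-up transition table P(t | t'), keyed as (t', t); rows sum to 1.
states = ["Det", "Noun", "Verb"]
trans = {("Det", "Noun"): 0.8, ("Det", "Verb"): 0.1, ("Det", "Det"): 0.1,
         ("Noun", "Verb"): 0.7, ("Noun", "Det"): 0.1, ("Noun", "Noun"): 0.2,
         ("Verb", "Det"): 0.6, ("Verb", "Noun"): 0.3, ("Verb", "Verb"): 0.1}

def I(cond):
    """Indicator: 1 if the condition holds, 0 otherwise."""
    return 1 if cond else 0

def transition_term(y_prev, y_cur):
    """sum_t sum_t' log P(t | t') * I[y_i = t] * I[y_{i-1} = t']."""
    return sum(math.log(trans[(tp, t)]) * I(y_cur == t) * I(y_prev == tp)
               for t in states for tp in states)
```

Exactly one (t, t′) pair makes both indicators equal to 1, so the double sum collapses to log P(y_cur | y_prev).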
HMM is also a linear classifier

The inner sums are counts over the sequence:

  ∑_i I[y_i = t] I[y_{i−1} = t′] is the number of times there is a transition in the sequence from state t′ to state t: Count(t′ → t)

  ∑_i I[y_i = t] is the number of times state t occurs in the sequence: Count(t)

So

  log P(x, y) = ∑_t ∑_{t′} log P(t | t′) ⋅ Count(t′ → t) + ∑_t log P(x_i | t) ⋅ Count(t)
HMM is also a linear classifier

  log P(x, y) = ∑_t ∑_{t′} log P(t | t′) ⋅ Count(t′ → t) + ∑_t log P(x_i | t) ⋅ Count(t)

This is a linear function: the log P terms are the weights, and the counts, computed via the indicators, are the features. It can be written as w^T φ(x, y), and we can add more features.
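The count-based form suggests a direct implementation of φ(x, y) and w. A sketch under the same kind of made-up tables as before (the feature-naming scheme and the `<s>` start symbol are assumptions for illustration):

```python
import math
from collections import Counter

# Hypothetical model parameters (illustrative only).
trans = {("<s>", "Det"): 0.6, ("Det", "Noun"): 0.9,
         ("Noun", "Verb"): 0.8, ("Verb", "Det"): 0.7}
emit = {("Det", "the"): 0.5, ("Noun", "dog"): 0.1,
        ("Verb", "ate"): 0.2, ("Noun", "homework"): 0.05}

def phi(words, tags):
    """Feature vector as a sparse Counter: transition counts
    Count(t' -> t) and emission counts Count(t, w)."""
    feats = Counter()
    prev = "<s>"
    for w, t in zip(words, tags):
        feats[("T", prev, t)] += 1  # transition feature fires at each position
        feats[("E", t, w)] += 1     # emission feature
        prev = t
    return feats

# Weight vector: one weight per feature, set to the log probability.
weights = {("T",) + k: math.log(v) for k, v in trans.items()}
weights.update({("E",) + k: math.log(v) for k, v in emit.items()})

def score(words, tags):
    """w^T phi(x, y); with these weights it equals log P(x, y)."""
    return sum(weights[f] * c for f, c in phi(words, tags).items())
```

With log-probability weights this reproduces the HMM exactly; the structured perceptron instead learns the weights directly, and nothing forces them to remain log probabilities.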
HMM is a linear classifier: An example

Consider the tagged sentence

  Det  Noun  Verb  Det  Noun
  The  dog   ate   the  homework

  log P(x, y) = Transition scores + Emission scores
HMM is a linear classifier: An example

  Det  Noun  Verb  Det  Noun
  The  dog   ate   the  homework

Transition scores:             Emission scores:
  log P(Det → Noun) × 2          log P(The | Det) × 1
+ log P(Noun → Verb) × 1       + log P(dog | Noun) × 1
+ log P(Verb → Det) × 1        + log P(ate | Verb) × 1
                               + log P(the | Det) × 1
                               + log P(homework | Noun) × 1
HMM is a linear classifier: An example

  Det  Noun  Verb  Det  Noun
  The  dog   ate   the  homework

The same score is a dot product of two vectors:

  w (parameters of the model)      φ(x, y) (properties of this output and the input)
  log P(Det → Noun)                2
  log P(Noun → Verb)               1
  log P(Verb → Det)                1
  log P(The | Det)          ⋅      1
  log P(dog | Noun)                1
  log P(ate | Verb)                1
  log P(the | Det)                 1
  log P(homework | Noun)           1

log P(x, y) = a linear scoring function = w^T φ(x, y)
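The dot product above can be written out concretely with hypothetical probability values (the slide fixes only the features and their counts, not the numbers):

```python
import math

# Made-up probabilities for the eight features in the example.
probs = {"Det->Noun": 0.9, "Noun->Verb": 0.8, "Verb->Det": 0.7,
         "The|Det": 0.5, "dog|Noun": 0.1, "ate|Verb": 0.2,
         "the|Det": 0.4, "homework|Noun": 0.05}

w = [math.log(p) for p in probs.values()]   # weights: the log P terms
phi = [2, 1, 1, 1, 1, 1, 1, 1]              # features: the counts

# w^T phi = log P(x, y) for this tagged sentence.
log_joint = sum(wi * fi for wi, fi in zip(w, phi))
```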