Sequence Labeling with the Structured Perceptron
CMSC 470
Marine Carpuat
POS tagging
Sequence labeling with the perceptron

Sequence labeling problem
• Input: sequence of tokens x = [x_1 … x_L]
  • Variable length L
• Output (aka label): sequence of tags y = [y_1 … y_L]
  • # tags = K
• Size of output space? K^L possible tag sequences

Structured Perceptron
• Perceptron algorithm can be used for sequence labeling
• But there are challenges
  • How to compute the argmax efficiently?
  • What are appropriate features?
• Approach: leverage the structure of the output space
Perceptron algorithm remains the same as for multiclass classification
Note: CIML denotes the weight vector as w instead of θ, and the feature function as Φ(x, y) instead of f(x, y)
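A minimal sketch of that loop in Python, assuming a sparse dict weight vector; `features` and `viterbi_decode` are placeholders for the feature function and argmax solver developed in the following slides:

```python
# Structured perceptron: same update rule as the multiclass case,
# except the argmax ranges over all tag sequences, not a fixed label set.
def train_structured_perceptron(data, features, viterbi_decode, epochs=5):
    w = {}  # sparse weight vector: feature name -> weight
    for _ in range(epochs):
        for x, y_gold in data:               # x: token list, y_gold: tag list
            y_hat = viterbi_decode(x, w)     # argmax_y of w . Phi(x, y)
            if y_hat != y_gold:
                for f, v in features(x, y_gold).items():  # promote gold
                    w[f] = w.get(f, 0.0) + v
                for f, v in features(x, y_hat).items():   # demote prediction
                    w[f] = w.get(f, 0.0) - v
    return w
```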
Feature functions for sequence labeling
• Standard features for POS tagging
  • Unary features: # times word w has been labeled with tag l, for all words w and all tags l
  • Markov features: # times tag l is adjacent to tag l' in the output, for all tags l and l'
• Size of the feature representation is constant wrt input length
Example from CIML chapter 17
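One way to encode these counts in Python: a sketch where the feature-name strings and the `<s>` start-of-sequence marker are illustrative choices, not from the slides:

```python
from collections import Counter

def features(x, y):
    """Count unary (word/tag) and Markov (tag, tag) features for a full
    input/output pair. The number of distinct features depends only on the
    vocabulary and tag set, not on the sentence length."""
    phi = Counter()
    prev = "<s>"                          # marker for the tag before position 1
    for word, tag in zip(x, y):
        phi[f"unary:{word}/{tag}"] += 1   # word w labeled with tag l
        phi[f"markov:{prev},{tag}"] += 1  # tag l' followed by tag l
        prev = tag
    return phi
```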
Solving the argmax problem for sequences with dynamic programming
• Efficient algorithms are possible if the feature function decomposes over the input
• This holds for the unary and Markov features used for POS tagging
Decomposition of structure
• Features decompose over the input if Φ(x, y) = Σ_{l=1}^{L} Φ_l(x, y), where Φ_l is a feature function that only includes features about position l
• If features decompose over the input, structures (x, y) can be scored incrementally
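In code, decomposition means the global score is a running sum of per-position contributions, each looking only at the current word, its tag, and the previous tag. A minimal sketch using the same hypothetical feature names as above:

```python
def score(x, y, w):
    """Score of (x, y) as a sum over positions: each term is w . Phi_l(x, y),
    which here depends only on the word at l, its tag, and the previous tag."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(x, y):
        total += w.get(f"unary:{word}/{tag}", 0.0)
        total += w.get(f"markov:{prev},{tag}", 0.0)
        prev = tag
    return total
```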
Decomposition of structure: Lattice/trellis representation
[Figure: trellis for sequence labeling, with the gold standard path in red]
• Any path through the trellis represents a labeling of the input sentence
• Each edge receives a weight such that adding the weights along a path gives the score for that input/output configuration
• Any max-weight path algorithm can find the argmax
• We'll describe the Viterbi algorithm
Dynamic programming solution relies on recursively computing prefix scores α_{l,k}

α_{l,k} = max_{y_{1:l−1}} w · Φ_{1:l}(x, y_{1:l−1} ∘ k)

• α_{l,k}: score of the best possible output prefix, up to and including position l, that labels the l-th word as label k
• Φ_{1:l}: features for the sequence starting at position 1, up to and including position l
• y_{1:l−1} ∘ k: sequence of labels of length l, obtained by adding k at the end of a sequence of length l−1
Computing prefix scores α_{l,k}: Example
Let's compute α_{3,A} given
• Prefix scores for length 2: α_{2,N} = 2, α_{2,V} = 9, α_{2,A} = −1
• Unary feature weight: w_{tasty/A} = 1.2
• Markov feature weights: w_{N,A} = −5, w_{V,A} = 2.5, w_{A,A} = 2.2
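Working through the maximization over the three possible previous tags (in the CIML running example, the word at position 3 is "tasty"):

α_{3,A} = w_{tasty/A} + max( α_{2,N} + w_{N,A}, α_{2,V} + w_{V,A}, α_{2,A} + w_{A,A} )
        = 1.2 + max( 2 − 5, 9 + 2.5, −1 + 2.2 )
        = 1.2 + max( −3, 11.5, 1.2 )
        = 1.2 + 11.5 = 12.7

so the best length-3 prefix ending in A extends the V prefix: the backpointer is ζ_{3,A} = V.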
Dynamic programming solution relies on recursively computing prefix scores α_{l,k}

α_{l+1,k} = max_{k'} [ α_{l,k'} + w · Φ_{l+1}(x, ⟨…, k', k⟩) ]
  Score of the best possible output prefix, up to and including position l+1, that labels the (l+1)-th word as label k

ζ_{l+1,k} = argmax_{k'} [ α_{l,k'} + w · Φ_{l+1}(x, ⟨…, k', k⟩) ]
  Backpointer to the label that achieves the above maximum

Derivation on board + CIML ch17
Viterbi algorithm
Assumptions:
- Unary features
- Markov features based on 2 adjacent labels
Runtime: O(LK²)
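A sketch of the decoder under these assumptions, reusing the hypothetical unary:/markov: feature names from the earlier sketches; each of the L positions scores K current tags against K previous tags, which gives the O(LK²) runtime:

```python
def viterbi_decode(x, w, tags):
    """argmax_y of w . Phi(x, y) for unary + Markov features, in O(L K^2)."""
    L = len(x)
    # alpha[l][k]: score of the best prefix ending at position l with tag k
    # back[l][k]:  previous tag achieving that score (the backpointer zeta)
    alpha = [{k: w.get(f"unary:{x[0]}/{k}", 0.0)
                 + w.get(f"markov:<s>,{k}", 0.0) for k in tags}]
    back = [{}]
    for l in range(1, L):
        alpha.append({})
        back.append({})
        for k in tags:
            # Best previous tag k' for current tag k: K candidates per (l, k)
            kp = max(tags, key=lambda t: alpha[l - 1][t]
                     + w.get(f"markov:{t},{k}", 0.0))
            back[l][k] = kp
            alpha[l][k] = (alpha[l - 1][kp]
                           + w.get(f"markov:{kp},{k}", 0.0)
                           + w.get(f"unary:{x[l]}/{k}", 0.0))
    # Follow backpointers from the best final tag to recover the argmax path
    y = [max(tags, key=lambda k: alpha[L - 1][k])]
    for l in range(L - 1, 0, -1):
        y.append(back[l][y[-1]])
    return list(reversed(y))
```

Passed as the viterbi_decode argument of the training-loop sketch above, this completes a minimal structured perceptron tagger.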
Exercise: Impact of feature definitions
• Consider a structured perceptron with the following features:
  • # times word w has been labeled with tag l, for all words w and all tags l
  • # times word w has been labeled with tag l when it follows word w', for all words w, w' and all tags l
  • # times tag l occurs in the sequence (l', l'', l) in the output, for all tags l, l', l''
• What is the dimension of the perceptron weight vector?
• Can we use dynamic programming to compute the argmax?
Recap: POS tagging
• An example of a sequence labeling task
• Requires a predefined set of POS tags
  • Penn Treebank tag set commonly used for English
  • Encodes some distinctions and not others
• Given annotated examples, we can address sequence labeling with the multiclass perceptron
  • but computing the argmax naively is expensive
  • constraints on the feature definition make efficient algorithms possible
  • Viterbi algorithm for unary and Markov features