Conditional Random Fields Dietrich Klakow
Overview
• Sequence Labeling
• Bayesian Networks
• Markov Random Fields
• Conditional Random Fields
• Software example
Sequence Labeling Tasks
Sequence: a sentence Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
POS Labels
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
Chunking Task: find phrase boundaries:
Chunking
Pierre/B-NP Vinken/I-NP ,/O 61/B-NP years/I-NP old/B-ADJP ,/O will/B-VP join/I-VP the/B-NP board/I-NP as/B-PP a/B-NP nonexecutive/I-NP director/I-NP Nov./B-NP 29/I-NP ./O
Named Entity Tagging
Pierre/B-PERSON Vinken/I-PERSON ,/O 61/B-DATE:AGE years/I-DATE:AGE old/I-DATE:AGE ,/O will/O join/O the/O board/B-ORG_DESC:OTHER as/O a/O nonexecutive/O director/B-PER_DESC Nov./B-DATE:DATE 29/I-DATE:DATE ./O
Supertagging
Pierre: N/N | Vinken: N | ,: , | 61: N/N | years: N | old: (S[adj]\NP)\NP | ,: , | will: (S[dcl]\NP)/(S[b]\NP) | join: ((S[b]\NP)/PP)/NP | the: NP[nb]/N | board: N | as: PP/NP | a: NP[nb]/N | nonexecutive: N/N | director: N | Nov.: ((S\NP)\(S\NP))/N[num] | 29: N[num] | .: .
Hidden Markov Model
HMM: just an Application of a Bayes Classifier
$$(\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_N) = \arg\max_{\pi_1, \pi_2, \ldots, \pi_N} P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N)$$
Decomposition of Probabilities
$$P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) = \prod_{i=1}^{N} P(x_i \mid \pi_i)\, P(\pi_i \mid \pi_{i-1})$$
$P(\pi_i \mid \pi_{i-1})$: transition probability
$P(x_i \mid \pi_i)$: emission probability
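To make the decomposition concrete, here is a minimal Python sketch that evaluates the joint probability of a toy word/tag sequence; the probability tables and the START symbol are illustrative assumptions, not part of the slides:

    # Toy transition and emission tables (illustrative values only).
    transition = {("START", "NNP"): 0.4, ("NNP", "NNP"): 0.3, ("NNP", "MD"): 0.1}
    emission = {("NNP", "Pierre"): 0.01, ("NNP", "Vinken"): 0.005}

    def joint_probability(words, tags):
        # P(x_1..x_N, pi_1..pi_N) = prod_i P(x_i | pi_i) * P(pi_i | pi_{i-1})
        prob = 1.0
        prev = "START"
        for word, tag in zip(words, tags):
            prob *= emission.get((tag, word), 0.0) * transition.get((prev, tag), 0.0)
            prev = tag
        return prob

    print(joint_probability(["Pierre", "Vinken"], ["NNP", "NNP"]))  # 0.4*0.01*0.3*0.005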
Graphical view HMM
[Figure: HMM as a graphical model — observation sequence x_1, x_2, x_3, ..., x_N on top, label sequence π_1, π_2, π_3, ..., π_N below.]
Criticism
• HMMs model only limited dependencies
→ come up with more flexible models
→ come up with graphical description
Bayesian Networks
Example for Bayesian Network
From Russell and Norvig 1995, AI: A Modern Approach
[Figure: Bayesian network with nodes C, S, R, W and edges C → S, C → R, S → W, R → W.]
Corresponding joint distribution:
$$P(C, S, R, W) = P(W \mid S, R)\, P(S \mid C)\, P(R \mid C)\, P(C)$$
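As a sketch of how this factorization is evaluated, the following Python snippet multiplies the four conditional tables; the numerical values are illustrative placeholders, not quoted from the book:

    # Sketch of evaluating P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C);
    # the numbers below are illustrative placeholders.
    P_C_true = 0.5
    P_S_given_C = {True: 0.1, False: 0.5}                       # P(S=true | C)
    P_R_given_C = {True: 0.8, False: 0.2}                       # P(R=true | C)
    P_W_given_SR = {(True, True): 0.99, (True, False): 0.90,
                    (False, True): 0.90, (False, False): 0.01}  # P(W=true | S, R)

    def bernoulli(p_true, value):
        # Probability of a boolean value given P(value = true).
        return p_true if value else 1.0 - p_true

    def joint(c, s, r, w):
        return (bernoulli(P_W_given_SR[(s, r)], w)
                * bernoulli(P_S_given_C[c], s)
                * bernoulli(P_R_given_C[c], r)
                * bernoulli(P_C_true, c))

    print(joint(c=True, s=False, r=True, w=True))  # one cell of the joint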
Naïve Bayes
Observations x_1, ..., x_D are assumed to be conditionally independent given the class z:
$$P(x \mid z) = \prod_{i=1}^{D} P(x_i \mid z)$$
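A minimal sketch of classification with this factorization; the classes, features, and probabilities are made up for illustration:

    import math

    # Naive Bayes scoring under the independence assumption above.
    P_z = {"pos": 0.5, "neg": 0.5}
    P_x_given_z = {("sunny", "pos"): 0.6, ("warm", "pos"): 0.7,
                   ("sunny", "neg"): 0.2, ("warm", "neg"): 0.3}

    def log_score(features, z):
        # log P(z) + sum_i log P(x_i | z)
        return math.log(P_z[z]) + sum(math.log(P_x_given_z[(f, z)]) for f in features)

    print(max(P_z, key=lambda z: log_score(["sunny", "warm"], z)))  # -> pos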
Markov Random Fields
• Undirected graphical model
• New term: clique
• Clique in an undirected graph: a set of nodes such that every node is connected to every other node
• Maximal clique: a clique to which no node can be added without destroying the clique property
[Figure: example graph with highlighted cliques, green and blue; the maximal clique is the blue one.]
Factorization
$x$: all nodes $x_1 \ldots x_N$
$x_C$: nodes in clique $C$
$C_M$: set of all maximal cliques
$\Psi_C(x_C)$: potential function ($\Psi_C(x_C) \geq 0$)
Joint distribution described by the graph:
$$p(x) = \frac{1}{Z} \prod_{C \in C_M} \Psi_C(x_C)$$
Normalization:
$$Z = \sum_{x} \prod_{C \in C_M} \Psi_C(x_C)$$
Z is sometimes called the partition function.
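The following brute-force Python sketch evaluates p(x) and Z for a tiny binary MRF; the chain x1 - x2 - x3 (whose maximal cliques are the two edges) and the agreement potential are illustrative assumptions:

    import itertools

    def psi(a, b):
        # A simple "agreement" potential; psi >= 0 as required.
        return 2.0 if a == b else 1.0

    cliques = [(0, 1), (1, 2)]  # maximal cliques of the chain x1 - x2 - x3

    def unnormalized(x):
        prod = 1.0
        for i, j in cliques:
            prod *= psi(x[i], x[j])
        return prod

    # Partition function: sum over all joint configurations.
    Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=3))

    def p(x):
        return unnormalized(x) / Z

    print(Z, p((0, 0, 0)))  # -> 18.0, 4/18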
Example
[Figure: undirected graph over the nodes x_1, ..., x_5.]
What are the maximal cliques?
Write down the joint probability described by this graph → white board
Energy Function
Define
$$\Psi_C(x_C) = e^{-E(x_C)}$$
Insert into the joint distribution:
$$p(x) = \frac{1}{Z}\, e^{-\sum_{C \in C_M} E(x_C)}$$
Conditional Random Fields
Definition
Markov random field where each random variable y_i is conditioned on the complete input sequence x_1, ..., x_n
[Figure: chain of label nodes y = (y_1 ... y_n), i.e. y_1, y_2, y_3, ..., y_{n-1}, y_n, each connected to the complete input x = (x_1 ... x_n).]
Distribution
$$p(y \mid x) = \frac{1}{Z(x)}\, e^{-\sum_{i=1}^{n} \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i)}$$
$\lambda_j$: parameters to be trained
$f_j(y_{i-1}, y_i, x, i)$: feature function (see maximum entropy models)
Example feature functions
Modeling transitions:
$$f_1(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_{i-1} = \text{IN and } y_i = \text{NNP} \\ 0 & \text{else} \end{cases}$$
Modeling emissions:
$$f_2(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_i = \text{NNP and } x_i = \text{September} \\ 0 & \text{else} \end{cases}$$
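These two feature functions translate directly into code. Below is a sketch that also evaluates the unnormalized score e^{-sum_i sum_j lambda_j f_j(...)} from the previous slide; the weight values are illustrative, and for simplicity the sum starts at the second position (no explicit START symbol). Under the minus-sign convention used here, negative weights make a matching pattern more likely:

    import math

    def f1(y_prev, y_cur, x, i):
        # Transition feature: previous tag IN, current tag NNP.
        return 1.0 if y_prev == "IN" and y_cur == "NNP" else 0.0

    def f2(y_prev, y_cur, x, i):
        # Emission feature: current tag NNP, current word "September".
        return 1.0 if y_cur == "NNP" and x[i] == "September" else 0.0

    features = [f1, f2]
    lambdas = [-0.5, -1.2]  # to be trained; negative values reward the patterns

    def unnormalized_score(y, x):
        total = 0.0
        for i in range(1, len(y)):
            for lam, f in zip(lambdas, features):
                total += lam * f(y[i - 1], y[i], x, i)
        return math.exp(-total)

    x = ["in", "September"]  # toy input
    print(unnormalized_score(["IN", "NNP"], x))  # exp(0.5 + 1.2)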
Training
• Like in maximum entropy models: generalized iterative scaling
• Convergence: the log-likelihood of p(y|x) is concave → unique maximum
• Convergence is slow
• Improved algorithms exist (see the sketch below)
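A pedagogical sketch of the training objective: instead of generalized iterative scaling it uses plain gradient ascent with brute-force enumeration of all label sequences, so it only scales to tiny examples; the tag set, features, and data are illustrative assumptions. It reuses f1 and f2 from the previous slide:

    import itertools
    import math

    TAGS = ["IN", "NNP", "OTHER"]  # illustrative tag set

    def f1(y_prev, y_cur, x, i):
        return 1.0 if y_prev == "IN" and y_cur == "NNP" else 0.0

    def f2(y_prev, y_cur, x, i):
        return 1.0 if y_cur == "NNP" and x[i] == "September" else 0.0

    features = [f1, f2]
    lambdas = [0.0, 0.0]

    def counts(y, x):
        # Total count F_j(y, x) = sum_i f_j(y_{i-1}, y_i, x, i) per feature.
        return [sum(f(y[i - 1], y[i], x, i) for i in range(1, len(y)))
                for f in features]

    def weight(y, x):
        # Unnormalized probability e^{-sum_j lambda_j F_j(y, x)}.
        return math.exp(-sum(l * c for l, c in zip(lambdas, counts(y, x))))

    def gradient_step(y_star, x, eta=0.1):
        # Under the e^{-...} sign convention of the slides:
        # d log p(y*|x) / d lambda_j = E_p[F_j] - F_j(y*).
        seqs = list(itertools.product(TAGS, repeat=len(x)))
        Z = sum(weight(y, x) for y in seqs)
        expected = [sum(weight(y, x) * counts(y, x)[j] for y in seqs) / Z
                    for j in range(len(features))]
        observed = counts(y_star, x)
        for j in range(len(lambdas)):
            lambdas[j] += eta * (expected[j] - observed[j])

    x = ["in", "September"]
    for _ in range(50):
        gradient_step(["IN", "NNP"], x)
    print(lambdas)  # weights go negative: the observed patterns become more likely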
Decoding: Auxiliary Matrix
Define an additional start symbol y_0 = START and a stop symbol y_{n+1} = STOP.
Define matrices $M^i(x)$ such that
$$\left[M^i(x)\right]_{y_{i-1} y_i} = M^i_{y_{i-1} y_i}(x) = e^{-\sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i)}$$
Reformulate Probability
With that definition we have
$$p(y \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{n+1} M^i_{y_{i-1} y_i}(x)$$
with
$$Z(x) = \sum_{y_1} \sum_{y_2} \cdots \sum_{y_n} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x) \cdots M^{n+1}_{y_n y_{n+1}}(x)$$
Use Matrix Properties
Use the matrix product
$$\sum_{y_1} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x) = \left[M^1(x)\, M^2(x)\right]_{y_0 y_2}$$
with
$$Z(x) = \left[M^1(x)\, M^2(x) \cdots M^{n+1}(x)\right]_{y_0 = \mathrm{START},\; y_{n+1} = \mathrm{STOP}}$$
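A sketch of this computation with NumPy, checking the matrix-product form of Z(x) against explicit enumeration; the label set and the random matrices standing in for the M^i(x) are illustrative assumptions:

    import itertools
    import numpy as np

    labels = ["START", "IN", "NNP", "STOP"]
    n = 3  # sentence length

    rng = np.random.default_rng(0)
    # One (|labels| x |labels|) matrix per position i = 1 .. n+1, with entries
    # M^i_{y_{i-1} y_i}(x) = exp(-sum_j lambda_j f_j(y_{i-1}, y_i, x, i)) > 0.
    Ms = [np.exp(-rng.normal(size=(len(labels), len(labels)))) for _ in range(n + 1)]

    # Z(x) = [M^1(x) M^2(x) ... M^{n+1}(x)]_{START, STOP}
    product = np.linalg.multi_dot(Ms)
    Z = product[labels.index("START"), labels.index("STOP")]
    print(Z)

    # Naive check: sum over all label sequences y_1 .. y_n explicitly.
    Z_check = 0.0
    for ys in itertools.product(range(len(labels)), repeat=n):
        path = [labels.index("START"), *ys, labels.index("STOP")]
        w = 1.0
        for i in range(n + 1):
            w *= Ms[i][path[i], path[i + 1]]
        Z_check += w
    assert np.isclose(Z, Z_check)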
Software
CRF++ • See http://crfpp.sourceforge.net/
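For reference, the basic CRF++ workflow from its documentation uses two command-line tools; the file names here are placeholders:

    # training data: one token per line, feature columns first, label last
    crf_learn template_file train.data model_file
    # tagging: appends the predicted label as an extra column
    crf_test -m model_file test.data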
Summary
• Sequence labeling problems
• CRFs are
  • flexible
  • expensive to train
  • fast to decode