Markov Networks

[Reference: Michael Jordan, Graphical Models, Statistical Science (Special Issue on Bayesian Statistics), 19, 140-155, 2004.]

CS 486/686, University of Waterloo
Lecture 18: Nov 8, 2012
CS486/686 Lecture Slides (c) 2012 P. Poupart

Outline
• Markov networks (a.k.a. Markov random fields)
Recall Bayesian networks
• Directed acyclic graph
• Arcs often interpreted as causal relationships
• Joint distribution: product of conditional distributions
(Figure: directed network with Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet grass, Rain → Wet grass)

Markov networks
• Undirected graph
• Arcs simply indicate direct correlations
• Joint distribution: normalized product of potentials
• Popular in computer vision and natural language processing
(Figure: undirected network over Cloudy, Sprinkler, Rain, Wet grass with edges C–S, C–R, S–R, S–W, R–W)
Parameterization
• Joint: normalized product of potentials
  Pr(X) = 1/k ∏_j f_j(CLIQUE_j)
        = 1/k f_1(C,S,R) f_2(S,R,W)
  where k is a normalization constant:
  k = Σ_X ∏_j f_j(CLIQUE_j)
    = Σ_{C,S,R,W} f_1(C,S,R) f_2(S,R,W)
• Potential:
  – Non-negative factor
  – One potential for each maximal clique in the graph
  – Entries: "likelihood strength" of different configurations

Potential Example
f_1(C,S,R):
  csr     3
  cs~r    2.5
  c~sr    5     ← c~sr is more likely than cs~r
  c~s~r   5.5
  ~csr    0
  ~cs~r   2.5
  ~c~sr   0     ← impossible configuration
  ~c~s~r  7
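On a network this small, the normalized product of potentials can be computed by brute force. In the sketch below, the f_1 values come from the potential example table above; the f_2(S,R,W) values are made up for illustration, since the slides do not list them.

```python
from itertools import product

# Potential f1(C,S,R): values from the lecture's example table.
# Encoding: 1 = true (e.g. c), 0 = false (e.g. ~c).
f1 = {
    (1, 1, 1): 3.0, (1, 1, 0): 2.5, (1, 0, 1): 5.0, (1, 0, 0): 5.5,
    (0, 1, 1): 0.0, (0, 1, 0): 2.5, (0, 0, 1): 0.0, (0, 0, 0): 7.0,
}
# Potential f2(S,R,W): illustrative values only (not given on the slides).
f2 = {
    (1, 1, 1): 4.0, (1, 1, 0): 1.0, (1, 0, 1): 2.0, (1, 0, 0): 1.0,
    (0, 1, 1): 3.0, (0, 1, 0): 1.0, (0, 0, 1): 0.5, (0, 0, 0): 2.0,
}

def joint():
    """Pr(C,S,R,W) = (1/k) * f1(C,S,R) * f2(S,R,W)."""
    unnorm = {(c, s, r, w): f1[(c, s, r)] * f2[(s, r, w)]
              for c, s, r, w in product((0, 1), repeat=4)}
    k = sum(unnorm.values())          # normalization constant
    return {x: v / k for x, v in unnorm.items()}

pr = joint()
print(sum(pr.values()))       # 1.0 (up to rounding)
print(pr[(0, 1, 1, 1)])       # 0.0: ~c,s,r is an impossible configuration
```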
Markov property
• Markov property: a variable is independent of all other variables given its immediate neighbours
• Markov blanket: set of direct neighbours
(Figure: A connected to B, C, D and E, so MB(A) = {B, C, D, E})

Conditional Independence
• X and Y are independent given Z iff every path between X and Y contains at least one variable in Z
• Exercise (for the undirected graph over A–H drawn on the slide):
  – A,E?
  – A,E | D?
  – A,E | C?
  – A,E | B,C?
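The path-based independence test above can be sketched as a graph search that never enters nodes in Z: X and Y are independent given Z iff no Z-avoiding path connects them. The chain graph below is a made-up example, not the exercise graph from the slide.

```python
from collections import deque

def separated(adj, x, y, z):
    """True iff every path from x to y passes through some node in z
    (the undirected-graph conditional-independence test)."""
    z = set(z)
    if x in z or y in z:
        return True
    seen, frontier = {x}, deque([x])
    while frontier:
        node = frontier.popleft()
        for nbr in adj[node]:
            if nbr in z or nbr in seen:
                continue
            if nbr == y:
                return False          # found a path that avoids z
            seen.add(nbr)
            frontier.append(nbr)
    return True

# Hypothetical chain A - B - C:
adj = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}
print(separated(adj, 'A', 'C', []))     # False: path A-B-C avoids the empty set
print(separated(adj, 'A', 'C', ['B']))  # True: every path goes through B
```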
Interpretation
• The Markov property has a price:
  – Numbers are not probabilities
• What are potentials?
  – They are indicative of local correlations
• What do the numbers mean?
  – They are indicative of the likelihood of each configuration
  – Numbers are usually learnt from data, since it is hard to specify them by hand given their lack of a clear interpretation

Applications
• Natural language processing:
  – Part-of-speech tagging
• Computer vision:
  – Image segmentation
• Any other application where there is no clear causal relationship
Image Segmentation
(Figure: segmentation of the Alps)
[Kervrann, Heitz (1995). A Markov Random Field Model-based Approach to Unsupervised Texture Segmentation Using Local and Global Spatial Statistics. IEEE Transactions on Image Processing, vol. 4, no. 6, pp. 856-862.]

Image Segmentation
• Variables:
  – Pixel features (e.g. intensities): X_ij
  – Pixel labels: Y_ij
• Correlations:
  – Neighbouring pixel labels are correlated
  – The label and the features of a pixel are correlated
• Segmentation: argmax_Y Pr(Y | X)
Inference
• Markov nets have a factored representation
  – Use variable elimination
• P(X | E = e)?
  – Restrict all factors that contain E to e
  – Sum out all variables that are not X or in E
  – Normalize the answer

Parameter Learning
• Maximum likelihood
  – θ* = argmax_θ P(data | θ)
• Complete data:
  – Convex optimization, but no closed-form solution
  – Iterative techniques such as gradient descent
• Incomplete data:
  – Non-convex optimization
  – EM algorithm
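The restrict / sum out / normalize steps above can be sketched with a minimal factor implementation. This is a bare-bones version for binary variables only; the chain A – B – C and its potentials are made up for illustration.

```python
from itertools import product

# A factor is a pair (vars, table): vars is a tuple of variable names,
# table maps assignment tuples (0/1 per variable) to non-negative values.

def restrict(factor, var, val):
    """Step 1: fix an evidence variable to its observed value."""
    vs, t = factor
    if var not in vs:
        return factor
    i = vs.index(var)
    return (vs[:i] + vs[i+1:],
            {a[:i] + a[i+1:]: v for a, v in t.items() if a[i] == val})

def multiply(f, g):
    """Pointwise product over the union of the two scopes."""
    (fv, ft), (gv, gt) = f, g
    vs = fv + tuple(v for v in gv if v not in fv)
    t = {}
    for a in product((0, 1), repeat=len(vs)):
        asg = dict(zip(vs, a))
        t[a] = ft[tuple(asg[v] for v in fv)] * gt[tuple(asg[v] for v in gv)]
    return (vs, t)

def sumout(factor, var):
    """Step 2: marginalize a hidden variable out of a factor."""
    vs, t = factor
    i = vs.index(var)
    out = {}
    for a, v in t.items():
        out[a[:i] + a[i+1:]] = out.get(a[:i] + a[i+1:], 0.0) + v
    return (vs[:i] + vs[i+1:], out)

def normalize(factor):
    """Step 3: rescale so the entries sum to 1."""
    vs, t = factor
    k = sum(t.values())
    return (vs, {a: v / k for a, v in t.items()})

# Query Pr(A | C = 1) in a chain A - B - C with made-up potentials:
f = (('A', 'B'), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0})
g = (('B', 'C'), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 1.0, (1, 1): 1.0})
posterior = normalize(sumout(multiply(f, restrict(g, 'C', 1)), 'B'))
print(posterior)  # Pr(A=0|C=1) = 7/13, Pr(A=1|C=1) = 6/13
```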
Maximum likelihood
• Let θ be the set of parameters and x_i be the i-th instance in the dataset
• Optimization problem:
  θ* = argmax_θ P(data | θ)
     = argmax_θ ∏_i Pr(x_i | θ)
     = argmax_θ ∏_i [∏_j f(X[j] = x_i[j]) / Σ_x ∏_j f(X[j] = x[j])]
  where X[j] is the clique of variables that potential j depends on and x[j] is a variable assignment for that clique

Maximum likelihood
• Let θ_x = f(X = x)
• Optimization continued:
  θ* = argmax_θ ∏_i [∏_j θ_{x_i[j]} / Σ_x ∏_j θ_{x[j]}]
     = argmax_θ log ∏_i [∏_j θ_{x_i[j]} / Σ_x ∏_j θ_{x[j]}]
     = argmax_θ Σ_i [Σ_j log θ_{x_i[j]} − log Σ_x ∏_j θ_{x[j]}]
• This is a non-concave optimization problem
Maximum likelihood
• Substitute λ = log θ and the problem becomes concave:
  λ* = argmax_λ Σ_i [Σ_j λ_{x_i[j]} − log Σ_x e^{Σ_j λ_{x[j]}}]
• Possible algorithms:
  – Gradient ascent
  – Conjugate gradient

Feature-based Markov Networks
• Generalization of Markov networks
  – May not have a corresponding graph
  – Use features and weights instead of potentials
  – Use an exponential representation
• Pr(X = x) = 1/k e^{Σ_j w_j φ_j(x[j])}
  where x[j] is a variable assignment for a subset of variables specific to feature j
• Feature φ_j: Boolean function that maps partial variable assignments to 0 or 1
• Weight w_j: real number
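A log-linear model of this form can be evaluated directly on a toy case: with the weights as the free parameters (playing the role of λ = log θ), the log-likelihood below is a sum of linear terms minus a log-sum-exp, hence concave. The two features and the weights are assumptions made up for illustration, not taken from the slides.

```python
import math
from itertools import product

# Feature-based Markov network over two binary variables (toy example).
features = [
    lambda x: 1.0 if x[0] == 1 else 0.0,     # phi_1: first variable is on
    lambda x: 1.0 if x[0] == x[1] else 0.0,  # phi_2: the two variables agree
]
w = [0.5, 1.2]                               # made-up weights

def unnorm(x):
    """Unnormalized measure e^{sum_j w_j * phi_j(x)}."""
    return math.exp(sum(wj * fj(x) for wj, fj in zip(w, features)))

def prob(x):
    """Pr(X = x) = unnorm(x) / k, with k summing over all assignments."""
    k = sum(unnorm(y) for y in product((0, 1), repeat=2))
    return unnorm(x) / k

def log_likelihood(data):
    """sum_i [sum_j w_j phi_j(x_i) - log k]: concave in w."""
    return sum(math.log(prob(x)) for x in data)

print(sum(prob(y) for y in product((0, 1), repeat=2)))  # 1.0
```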
Feature-based Markov Networks
• Potential-based Markov networks can always be converted to feature-based Markov networks:
  Pr(x) = 1/k ∏_j f_j(CLIQUE_j = x[j])
        = 1/k e^{Σ_{j,clique_j} w_{j,clique_j} φ_{j,clique_j}(x[j])}
• w_{j,clique_j} = log f_j(CLIQUE_j = x[j])
• φ_{j,clique_j}(x[j]) = 1 if clique_j = x[j], 0 otherwise

Example
Potential f_1(C,S,R): csr 3, cs~r 2.5, c~sr 5, c~s~r 5.5, ~csr 0, ~cs~r 2.5, ~c~sr 0, ~c~s~r 7
Weights and features (rows with equal potential values share one feature, with * as a wildcard):
  w_{1,csr}    = log 3     φ_{1,csr}(CSR)    = 1 if CSR = csr, 0 otherwise
  w_{1,*s~r}   = log 2.5   φ_{1,*s~r}(CSR)   = 1 if CSR = *s~r, 0 otherwise   (covers cs~r and ~cs~r)
  w_{1,c~sr}   = log 5     φ_{1,c~sr}(CSR)   = 1 if CSR = c~sr, 0 otherwise
  w_{1,c~s~r}  = log 5.5   φ_{1,c~s~r}(CSR)  = 1 if CSR = c~s~r, 0 otherwise
  w_{1,~c*r}   = log 0     φ_{1,~c*r}(CSR)   = 1 if CSR = ~c*r, 0 otherwise   (covers ~csr and ~c~sr)
  w_{1,~c~s~r} = log 7     φ_{1,~c~s~r}(CSR) = 1 if CSR = ~c~s~r, 0 otherwise
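The conversion can be checked numerically. The sketch below uses one indicator feature per row of the lecture's f_1 table (rather than the grouped wildcard features on the slide) and sets w = log f_1, with weight −∞ for the impossible configurations (log 0); exponentiating the weight of the one indicator that fires recovers the original potential exactly.

```python
import math

# Potential f1(C,S,R) from the lecture's example (1 = true, 0 = false).
f1 = {
    (1, 1, 1): 3.0, (1, 1, 0): 2.5, (1, 0, 1): 5.0, (1, 0, 0): 5.5,
    (0, 1, 1): 0.0, (0, 1, 0): 2.5, (0, 0, 1): 0.0, (0, 0, 0): 7.0,
}

# One indicator feature per assignment; weight = log of the potential value.
weights = {a: (math.log(v) if v > 0 else float('-inf')) for a, v in f1.items()}

def potential_via_features(x):
    """exp(sum_a w_a * [x == a]): exactly one indicator fires."""
    s = sum(w for a, w in weights.items() if a == x)
    return math.exp(s)  # exp(-inf) = 0.0 recovers the impossible rows

for a, v in f1.items():
    assert abs(potential_via_features(a) - v) < 1e-9
print("potentials recovered from features")
```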
Features
• Features:
  – Any Boolean function
  – Provide tremendous flexibility
• Example: text categorization
  – Simplest features: presence/absence of a word in a document
  – More complex features:
    • Presence/absence of specific expressions
    • Presence/absence of two words within a certain window
    • Presence/absence of any combination of words
    • Presence/absence of a figure of style
    • Presence/absence of any linguistic feature

Next Class
• Conditional random fields