Undirected Probabilistic Graphical Models
CMSC 678, UMBC
Announcement 1: Progress Report on Project
Due Monday April 16th, 11:59 AM
Build on the proposal:
- Update to address comments
- Discuss the progress you've made
- Discuss what remains to be done
- Discuss any new blockers you've experienced (or anticipate experiencing)
Any questions?
Announcement 2: Assignment 4
Due Monday May 14th, 11:59 AM
Topic: probabilistic & graphical modeling
Recap from last time…
Hidden Markov Model Representation

$p(z_1, x_1, z_2, x_2, \ldots, z_N, x_N) = p(z_1 \mid z_0)\, p(x_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(x_N \mid z_N) = \prod_i p(x_i \mid z_i)\, p(z_i \mid z_{i-1})$

The $p(z_i \mid z_{i-1})$ factors are the transition probabilities/parameters; the $p(x_i \mid z_i)$ factors are the emission probabilities/parameters. The chain z1 → z2 → z3 → z4 → …, with each z_i emitting an observation w_i, represents the probabilities and independence assumptions in a graph.
Viterbi Algorithm

v(i, s) is the maximum probability of any path to state s from the beginning (and emitting the observations along the way).

```
v = double[N+2][K*]            // path probabilities
b = int[N+2][K*]               // backpointers / book-keeping
v[*][*] = 0
v[0][START] = 1
for (i = 1; i <= N+1; ++i) {
  for (state = 0; state < K*; ++state) {
    p_obs = p_emission(obs_i | state)
    for (old = 0; old < K*; ++old) {
      p_move = p_transition(state | old)
      if (v[i-1][old] * p_obs * p_move > v[i][state]) {
        v[i][state] = v[i-1][old] * p_obs * p_move
        b[i][state] = old
      }
    }
  }
}
```

Computing v at time i-1 will correctly incorporate (maximize over) paths through time i-2: we correctly obey the Markov property.
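As a concrete sketch, here is the same dynamic program in Python with NumPy. This is illustrative, not the slide's exact setup: the explicit START/END states are folded into an initial distribution `pi` and a final argmax, and `A`/`B` are assumed parameter matrices.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path through an HMM.

    obs: observation indices, length N
    pi:  initial distribution over the K states (stands in for START)
    A:   A[old, new] = p_transition(new | old), shape (K, K)
    B:   B[s, o]     = p_emission(o | s),       shape (K, V)
    """
    N, K = len(obs), len(pi)
    v = np.zeros((N, K))             # v[i, s]: max prob of any path ending in s at step i
    b = np.zeros((N, K), dtype=int)  # backpointers / book-keeping
    v[0] = pi * B[:, obs[0]]
    for i in range(1, N):
        for s in range(K):
            scores = v[i - 1] * A[:, s] * B[s, obs[i]]
            b[i, s] = int(np.argmax(scores))
            v[i, s] = scores[b[i, s]]
    path = [int(np.argmax(v[-1]))]   # best final state (stands in for END)
    for i in range(N - 1, 0, -1):    # walk the backpointers
        path.append(int(b[i, path[-1]]))
    return path[::-1], float(v[-1].max())
```

In practice the probabilities are usually kept in log space (sums instead of products) to avoid numerical underflow on long sequences.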
Marginal Probability (via the Forward Algorithm)

$\alpha(i, s) = \sum_{s'} \alpha(i-1, s') \cdot p(s \mid s') \cdot p(\text{obs at } i \mid s)$

Reading the recurrence: $\alpha(i-1, s')$ is the total probability up until now; summing over $s'$ covers the immediate ways to get into state $s$; and $p(s \mid s') \cdot p(\text{obs at } i \mid s)$ is how likely it is to get into state $s$ this way.

α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

Q: What do we return? (How do we return the likelihood of the sequence?)
A: α[N+1][END]

There's an analogous backward algorithm.
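A minimal NumPy sketch of the forward recurrence, under the same illustrative `pi`/`A`/`B` conventions as the Viterbi sketch above (summing over final states stands in for the explicit END state):

```python
import numpy as np

def forward(obs, pi, A, B):
    """alpha[i, s] = total probability of all paths that start at the
    beginning, end in state s at step i, and emit obs[0..i]."""
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    alpha[0] = pi * B[:, obs[0]]   # start, then emit the first symbol
    for i in range(1, N):
        # alpha(i, s) = sum_{s'} alpha(i-1, s') * p(s | s') * p(obs_i | s)
        alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]
    return alpha

# likelihood of the whole sequence (the pseudocode's alpha[N+1][END]):
# forward(obs, pi, A, B)[-1].sum()
```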
With Both Forward and Backward Values

α(i, s) · β(i, s) = total probability of paths through state s at step i:

$p(z_i = s \mid x_1, \cdots, x_N) = \frac{\alpha(i, s)\, \beta(i, s)}{\alpha(N+1, \text{END})}$

α(i, s) · p(s' | s) · p(obs at i+1 | s') · β(i+1, s') = total probability of paths through the s → s' arc (at time i):

$p(z_i = s, z_{i+1} = s' \mid x_1, \cdots, x_N) = \frac{\alpha(i, s)\, p(s' \mid s)\, p(\text{obs}_{i+1} \mid s')\, \beta(i+1, s')}{\alpha(N+1, \text{END})}$
EM for HMMs (Baum-Welch Algorithm)

```
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i >= 0; --i) {
  for (next = 0; next < K*; ++next) {
    c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for (state = 0; state < K*; ++state) {
      u = p_obs(obs_{i+1} | next) * p_trans(next | state)
      c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
```
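Below is a self-contained Python sketch of this E-step: it computes α and β and then accumulates the expected emission and transition counts of the pseudocode above. The `pi`/`A`/`B` conventions are the same illustrative ones as before, with the START/END bookkeeping folded away.

```python
import numpy as np

def e_step(obs, pi, A, B):
    """One Baum-Welch E-step: expected emission/transition counts."""
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    alpha[0] = pi * B[:, obs[0]]
    for i in range(1, N):                      # forward pass
        alpha[i] = (alpha[i - 1] @ A) * B[:, obs[i]]
    beta[-1] = 1.0
    for i in range(N - 2, -1, -1):             # backward pass
        beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])
    L = alpha[-1].sum()                        # likelihood of the sequence
    c_obs = np.zeros_like(B)
    c_trans = np.zeros_like(A)
    for i in range(N):
        # gamma[s] = p(z_i = s | x_1..x_N) = alpha(i, s) * beta(i, s) / L
        gamma = alpha[i] * beta[i] / L
        c_obs[:, obs[i]] += gamma
        if i + 1 < N:
            # xi[s, s'] = p(z_i = s, z_{i+1} = s' | x_1..x_N)
            xi = alpha[i][:, None] * A * (B[:, obs[i + 1]] * beta[i + 1])[None, :] / L
            c_trans += xi
    return c_obs, c_trans, L
```

The M-step then renormalizes each row of the accumulated counts into new transition and emission parameters, and EM alternates the two steps until the likelihood L stops improving.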
Bayesian Networks: Directed Acyclic Graphs

$p(x_1, x_2, x_3, \ldots, x_N) = \prod_i p(x_i \mid \pi(x_i))$

Here $\pi(x_i)$ denotes the "parents of" $x_i$, and the product visits the nodes in topological-sort order. Exact inference in general DAGs is NP-hard; inference in trees can be exact.
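As a small illustration of this factorization, here is a hypothetical three-node network with made-up CPTs (the node names R, S, W and all probabilities are invented for this example); the joint probability of a full assignment is the product of the local conditionals, visited in topological order:

```python
# Hypothetical DAG: R -> S, R -> W, S -> W, over binary variables.
parents = {"R": (), "S": ("R",), "W": ("R", "S")}
# p(v = 1 | parent assignment); p(v = 0 | ...) is the complement.
p1 = {
    ("R", ()): 0.2,
    ("S", (0,)): 0.5, ("S", (1,)): 0.1,
    ("W", (0, 0)): 0.05, ("W", (0, 1)): 0.8,
    ("W", (1, 0)): 0.9,  ("W", (1, 1)): 0.95,
}

def joint(assign):
    """p(x_1, ..., x_N) = prod_i p(x_i | pi(x_i)), in topological order."""
    prob = 1.0
    for v in ("R", "S", "W"):      # a topological sort of the DAG
        pa = tuple(assign[u] for u in parents[v])
        p = p1[(v, pa)]
        prob *= p if assign[v] == 1 else 1.0 - p
    return prob

print(joint({"R": 1, "S": 0, "W": 1}))   # 0.2 * (1 - 0.1) * 0.9 = 0.162
```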
D-Separation: Testing for Conditional Independence

Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.

X & Y are d-separated if, for every path P, one of the following is true:
- P has a chain X → Z → Y with an observed middle node: observing Z blocks the path from X to Y.
- P has a fork X ← Z → Y with an observed parent node: observing Z blocks the path from X to Y.
- P includes a "v-structure" or "collider" X → Z ← Y with Z and all of its descendants unobserved: not observing Z blocks the path from X to Y.

The collider case follows from the factorization: $p(x, y, z) = p(x)\, p(y)\, p(z \mid x, y)$, so $p(x, y) = p(x)\, p(y) \sum_z p(z \mid x, y) = p(x)\, p(y)$.
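The collider case is easy to check numerically. This sketch builds a hypothetical v-structure X → Z ← Y with made-up tables and verifies both halves: X and Y are independent while Z is unobserved, but become dependent once Z is observed ("explaining away"):

```python
# Hypothetical v-structure X -> Z <- Y over binary variables:
# X and Y have independent priors; Z depends on both (a noisy OR).
pX, pY = 0.3, 0.6
pZ1 = {(0, 0): 0.05, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}   # p(z=1 | x, y)

def p(x, y, z):
    """Joint p(x, y, z) = p(x) p(y) p(z | x, y)."""
    px = pX if x else 1 - pX
    py = pY if y else 1 - pY
    pz = pZ1[(x, y)] if z else 1 - pZ1[(x, y)]
    return px * py * pz

# Z unobserved: marginalizing it out leaves p(x, y) = p(x) p(y).
p_xy = sum(p(1, 1, z) for z in (0, 1))
print(abs(p_xy - pX * pY) < 1e-12)         # True: the path is blocked

# Z observed (z = 1): p(x, y | z) != p(x | z) p(y | z).
pz = sum(p(x, y, 1) for x in (0, 1) for y in (0, 1))
p_xy_z = p(1, 1, 1) / pz
p_x_z = sum(p(1, y, 1) for y in (0, 1)) / pz
p_y_z = sum(p(x, 1, 1) for x in (0, 1)) / pz
print(abs(p_xy_z - p_x_z * p_y_z) > 1e-3)  # True: observing the collider couples X and Y
```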
Markov Blanket

The Markov blanket is the set of nodes needed to form the complete conditional for a variable $x_i$:

$p(x_i \mid x_{j \ne i}) = \frac{p(x_1, \ldots, x_N)}{\int p(x_1, \ldots, x_N)\, dx_i} = \frac{\prod_k p(x_k \mid \pi(x_k))}{\int \prod_k p(x_k \mid \pi(x_k))\, dx_i} = \frac{\prod_{k:\, k = i \text{ or } i \in \pi(x_k)} p(x_k \mid \pi(x_k))}{\int \prod_{k:\, k = i \text{ or } i \in \pi(x_k)} p(x_k \mid \pi(x_k))\, dx_i}$

The second equality uses the factorization of the graph; the third factors out terms not dependent on $x_i$. The Markov blanket of a node is its parents, children, and children's parents.
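A short Python sketch of reading the blanket off a DAG; the graph, node names, and `parents` mapping are hypothetical:

```python
def markov_blanket(node, parents):
    """Markov blanket in a DAG: the node's parents, its children, and the
    children's other parents (co-parents)."""
    children = {c for c, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for c in children:
        blanket |= set(parents[c])   # co-parents
    blanket.discard(node)
    return blanket

# Hypothetical DAG: A -> C, B -> C, C -> D, E -> D
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C", "E"}, "E": set()}
print(markov_blanket("C", parents))  # {'A', 'B', 'D', 'E'}
```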
Markov Random Fields: Undirected Graphs

clique: a subset of nodes, where the nodes are pairwise connected
maximal clique: a clique that cannot add a node and remain a clique

$p(x_1, x_2, x_3, \ldots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_c)$

The product ranges over the maximal cliques C; $x_c$ denotes the variables that are part of clique C; each $\psi_C$ is a potential function (not necessarily a probability!); and $Z$ is a global normalization constant.

Q: What restrictions should we place on the potentials $\psi_C$?
A: $\psi_C \ge 0$ (or $\psi_C > 0$).
Terminology: Potential Functions

$p(x_1, x_2, x_3, \ldots, x_N) = \frac{1}{Z} \prod_C \psi_C(x_c)$

Under the Boltzmann distribution, $\psi_C(x_c) = \exp(-E(x_c))$, where $E$ is an energy function for clique C (get the total energy of a configuration by summing the individual energy functions).
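To make these pieces concrete, the sketch below builds a made-up three-variable MRF whose single maximal clique covers all three nodes, defines the potential via the Boltzmann form exp(−E) (so it is automatically nonnegative), and computes the global normalization Z by brute-force enumeration, which is feasible only for toy models:

```python
import itertools
import math

# A made-up MRF: three binary variables forming one maximal clique {x1, x2, x3}.
def energy(x1, x2, x3):
    # arbitrary illustrative energy: lower (better) when values agree
    return -((x1 == x2) + (x2 == x3) + (x1 == x3))

def psi(x1, x2, x3):
    # Boltzmann form: a nonnegative potential, not itself a probability
    return math.exp(-energy(x1, x2, x3))

# global normalization: Z sums the unnormalized product over every configuration
Z = sum(psi(*cfg) for cfg in itertools.product((0, 1), repeat=3))

def p(x1, x2, x3):
    return psi(x1, x2, x3) / Z

print(p(0, 0, 0), p(0, 1, 0))   # agreeing configurations get higher probability
```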
Ambiguity in Undirected Model Notation

For a fully connected graph over X, Y, and Z, the same picture can denote a single potential over the maximal clique,

$p(x, y, z) \propto \psi(x, y, z)$

or a product of pairwise potentials,

$p(x, y, z) \propto \psi_1(x, y)\, \psi_2(y, z)\, \psi_3(x, z)$
Example: Ising Model

Image denoising (Bishop, 2006; Fig. 8.30): $y_i$ is an observed (noisy) pixel/state, with 10% of pixels flipped from the original; $x_i$ is the original pixel/state. [The figure shows the original image, the noisy version, and two denoised solutions.]

Q: What are the cliques?

Neighboring pixels should be similar, and $x_i$ and $y_i$ should be correlated:

$E(x, y) = h \sum_i x_i - \beta \sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i$

The $h$ term allows for a bias; the $\beta$ term (over the neighboring-pixel cliques $\{x_i, x_j\}$) makes neighboring pixels similar; the $\eta$ term (over the $\{x_i, y_i\}$ cliques) makes $x_i$ and $y_i$ correlated.
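One standard way to (approximately) minimize this energy, though not necessarily the slide's, is iterated conditional modes (ICM): sweep over the pixels, setting each $x_i$ to whichever sign lowers the energy with its neighbors held fixed. A sketch with illustrative parameter values:

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=10):
    """Greedy coordinate-wise minimization (ICM) of
    E(x, y) = h*sum_i x_i - beta*sum_{ij} x_i x_j - eta*sum_i x_i y_i,
    with pixels in {-1, +1}; parameter values are illustrative."""
    x = y.copy()
    H, W = y.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # sum over the pairwise (4-neighbor) cliques touching pixel (i, j)
                nb = sum(x[a, b]
                         for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # E restricted to x_ij is x_ij * (h - beta*nb - eta*y_ij);
                # pick the sign in {-1, +1} that minimizes it
                x[i, j] = 1 if beta * nb + eta * y[i, j] - h > 0 else -1
    return x

# e.g. y = a noisy {-1, +1} image as a NumPy array; x_hat = icm_denoise(y)
```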