


  1. Undirected Probabilistic Graphical Models CMSC 678 UMBC

  2. Announcement 1: Progress Report on Project
     Due Monday April 16th, 11:59 AM
     Build on the proposal: update to address comments; discuss the progress you’ve made; discuss what remains to be done; discuss any new blockers you’ve experienced (or anticipate experiencing)
     Any questions?

  3. Announcement 2: Assignment 4
     Due Monday May 14th, 11:59 AM
     Topic: probabilistic & graphical modeling

  4. Recap from last time…

  5. Hidden Markov Model Representation
     p(z_1, x_1, z_2, x_2, …, z_N, x_N) = p(z_1 | z_0) p(x_1 | z_1) ⋯ p(z_N | z_{N-1}) p(x_N | z_N) = ∏_i p(x_i | z_i) p(z_i | z_{i-1})
     emission probabilities/parameters: p(x_i | z_i); transition probabilities/parameters: p(z_i | z_{i-1})
     Graph: a chain z_1 → z_2 → z_3 → z_4 → ⋯, with each z_i emitting an observation w_i; represent the probabilities and independence assumptions in a graph
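As a small numeric illustration of this factorization (not from the slides), the Python sketch below multiplies transition and emission probabilities for a hypothetical two-state HMM; the tables and the init vector standing in for p(z_1 | z_0 = START) are made up.

    import numpy as np

    trans = np.array([[0.7, 0.3],      # trans[old, new] = p(z_new | z_old)
                      [0.4, 0.6]])
    emit = np.array([[0.9, 0.1],       # emit[state, symbol] = p(x | z)
                     [0.2, 0.8]])
    init = np.array([0.5, 0.5])        # stands in for p(z_1 | z_0 = START)

    def joint(states, observations):
        """p(z_1, x_1, ..., z_N, x_N) = prod_i p(z_i | z_{i-1}) p(x_i | z_i)."""
        p = init[states[0]] * emit[states[0], observations[0]]
        for i in range(1, len(states)):
            p *= trans[states[i - 1], states[i]] * emit[states[i], observations[i]]
        return p

    print(joint([0, 0, 1], [0, 0, 1]))  # probability of one concrete (z, x) sequence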

  6. Viterbi Algorithm
     v = double[N+2][K*]    // v(i, s) is the maximum probability of any path to state s from the beginning (and emitting the observations)
     b = int[N+2][K*]       // backpointers / book-keeping
     v[*][*] = 0
     v[0][START] = 1
     for(i = 1; i ≤ N+1; ++i) {
       for(state = 0; state < K*; ++state) {
         p_obs = p_emission(obs_i | state)
         for(old = 0; old < K*; ++old) {
           p_move = p_transition(state | old)
           if(v[i-1][old] * p_obs * p_move > v[i][state]) {
             v[i][state] = v[i-1][old] * p_obs * p_move
             b[i][state] = old
           }
         }
       }
     }
     computing v at time i-1 will correctly incorporate (maximize over) paths through time i-2: we correctly obey the Markov property
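A minimal NumPy sketch of the same dynamic program, run in log space to avoid underflow; the array layout (trans[old, new], emit[state, symbol]) and the explicit initial distribution in place of START/END states are illustrative assumptions, not the slide's exact conventions.

    import numpy as np

    def viterbi(obs, trans, emit, init):
        """Most likely state sequence for observation indices obs (a sketch)."""
        N, K = len(obs), len(init)
        v = np.full((N, K), -np.inf)       # v[i, s]: best log-probability of any path ending in s at step i
        b = np.zeros((N, K), dtype=int)    # backpointers / book-keeping
        v[0] = np.log(init) + np.log(emit[:, obs[0]])
        for i in range(1, N):
            for s in range(K):
                scores = v[i - 1] + np.log(trans[:, s]) + np.log(emit[s, obs[i]])
                b[i, s] = np.argmax(scores)          # best previous state
                v[i, s] = scores[b[i, s]]
        path = [int(np.argmax(v[-1]))]               # best final state
        for i in range(N - 1, 0, -1):                # follow backpointers
            path.append(int(b[i, path[-1]]))
        return path[::-1], float(np.max(v[-1]))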

  7. Marginal Probability (via the Forward Algorithm)
     α(i, s) = Σ_{s′} α(i−1, s′) · p(s | s′) · p(obs at i | s)
     Σ_{s′}: what are the immediate ways to get into state s?  α(i−1, s′): what's the total probability up until now?  p(s | s′) · p(obs at i | s): how likely is it to get into state s this way?
     α(i, s) is the total probability of all paths that: 1. start from the beginning, 2. end (currently) in s at step i, 3. emit the observation obs at i
     Q: What do we return? (How do we return the likelihood of the sequence?)  A: α[N+1][END]
     There's an analogous backward algorithm
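The forward pass is the same loop as Viterbi with the max replaced by a sum; a sketch under the same assumed layout as the code above:

    import numpy as np

    def forward(obs, trans, emit, init):
        """alpha[i, s] = total probability of all paths ending in s at step i and emitting obs[0..i]."""
        N, K = len(obs), len(init)
        alpha = np.zeros((N, K))
        alpha[0] = init * emit[:, obs[0]]
        for i in range(1, N):
            # sum over previous states instead of maximizing (the only change from Viterbi)
            alpha[i] = (alpha[i - 1] @ trans) * emit[:, obs[i]]
        return alpha

    # likelihood of the whole sequence: sum over final states (this layout has no explicit END state)
    # L = forward(obs, trans, emit, init)[-1].sum()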

  8. With Both Forward and Backward Values
     α(i, s) · β(i, s) = total probability of paths through state s at step i
     p(z_i = s | x_1, ⋯, x_N) = α(i, s) · β(i, s) / α(N+1, END)
     α(i, s) · p(s′ | s) · p(obs at i+1 | s′) · β(i+1, s′) = total probability of paths through the s → s′ arc (at time i)
     p(z_i = s, z_{i+1} = s′ | x_1, ⋯, x_N) = α(i, s) · p(s′ | s) · p(obs_{i+1} | s′) · β(i+1, s′) / α(N+1, END)
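A sketch of the corresponding backward pass and the two posterior quantities above, reusing the assumed layout from the earlier sketches (so the sequence likelihood is alpha[-1].sum() rather than α[N+1][END]):

    import numpy as np

    def backward(obs, trans, emit):
        """beta[i, s] = total probability of the observations after step i, given state s at step i."""
        N, K = len(obs), trans.shape[0]
        beta = np.zeros((N, K))
        beta[-1] = 1.0
        for i in range(N - 2, -1, -1):
            beta[i] = trans @ (emit[:, obs[i + 1]] * beta[i + 1])
        return beta

    def state_posteriors(alpha, beta):
        """gamma[i, s] = p(z_i = s | x_1..x_N) = alpha(i, s) * beta(i, s) / likelihood."""
        return alpha * beta / alpha[-1].sum()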

  9. EM For HMMs (Baum-Welch Algorithm)
     α = computeForwards()
     β = computeBackwards()
     L = α[N+1][END]
     for(i = N; i ≥ 0; --i) {
       for(next = 0; next < K*; ++next) {
         c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
         for(state = 0; state < K*; ++state) {
           u = p_obs(obs_{i+1} | next) * p_trans(next | state)
           c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
         }
       }
     }
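The same E-step accumulation written with NumPy arrays (a sketch, not the course's reference code), reusing the forward/backward arrays and layout from the sketches above:

    import numpy as np

    def expected_counts(obs, trans, emit, alpha, beta):
        """Expected emission and transition counts for one sequence (Baum-Welch E-step)."""
        N, K = len(obs), trans.shape[0]
        L = alpha[-1].sum()                               # sequence likelihood
        c_obs = np.zeros_like(emit)
        c_trans = np.zeros_like(trans)
        for i in range(N):
            c_obs[:, obs[i]] += alpha[i] * beta[i] / L    # expected count of emitting obs[i] from each state
            if i + 1 < N:
                # expected count of each s -> s' arc between steps i and i+1
                c_trans += np.outer(alpha[i], emit[:, obs[i + 1]] * beta[i + 1]) * trans / L
        return c_obs, c_trans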

  10. Bayesian Networks: Directed Acyclic Graphs
      p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i)), where π(x_i) denotes the parents of x_i (variables ordered by a topological sort)

  11. Bayesian Networks: Directed Acyclic Graphs
      p(x_1, x_2, x_3, …, x_N) = ∏_i p(x_i | π(x_i)), where π(x_i) denotes the parents of x_i
      exact inference in general DAGs is NP-hard; inference in trees can be exact
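For concreteness, a toy three-node network (the rain/sprinkler/wet-grass structure and all numbers below are invented for illustration): the joint is one factor per node given its parents, evaluated in topological order.

    p_rain = {True: 0.2, False: 0.8}                        # p(rain)
    p_sprinkler = {True: {True: 0.01, False: 0.99},         # p(sprinkler | rain=True)
                   False: {True: 0.4, False: 0.6}}          # p(sprinkler | rain=False)
    p_wet = {(True, True): 0.99, (True, False): 0.8,        # p(wet=True | rain, sprinkler)
             (False, True): 0.9, (False, False): 0.05}

    def joint(r, s, w):
        """p(r, s, w) = p(r) * p(s | r) * p(w | r, s): one factor per node given its parents."""
        pw = p_wet[(r, s)] if w else 1.0 - p_wet[(r, s)]
        return p_rain[r] * p_sprinkler[r][s] * pw

    # sanity check: the joint sums to 1 over all eight configurations
    total = sum(joint(r, s, w) for r in (True, False) for s in (True, False) for w in (True, False))
    assert abs(total - 1.0) < 1e-12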

  12. D-Separation: Testing for Conditional Independence
      Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.
      X & Y are d-separated if, for all paths P, one of the following is true:
      P has a chain with an observed middle node (X → Z → Y)
      P has a fork with an observed parent node (X ← Z → Y)
      P includes a “v-structure” or “collider” with all unobserved descendants (X → Z ← Y)

  13. D-Separation: Testing for Conditional Independence
      Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.
      X & Y are d-separated if, for all paths P, one of the following is true:
      P has a chain with an observed middle node (X → Z → Y): observing Z blocks the path from X to Y
      P has a fork with an observed parent node (X ← Z → Y): observing Z blocks the path from X to Y
      P includes a “v-structure” or “collider” with all unobserved descendants (X → Z ← Y): not observing Z blocks the path from X to Y

  14. D-Separation: Testing for Conditional Independence
      Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.
      X & Y are d-separated if, for all paths P, one of the following is true:
      P has a chain with an observed middle node (X → Z → Y): observing Z blocks the path from X to Y
      P has a fork with an observed parent node (X ← Z → Y): observing Z blocks the path from X to Y
      P includes a “v-structure” or “collider” with all unobserved descendants (X → Z ← Y): not observing Z blocks the path from X to Y
      p(x, y, z) = p(x) p(y) p(z | x, y)
      p(x, y) = Σ_z p(x) p(y) p(z | x, y) = p(x) p(y)
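A quick numeric check of the collider case (all probabilities below are made up): as in the equation above, summing out the unobserved collider z leaves p(x) p(y) exactly.

    import itertools

    p_x = {0: 0.3, 1: 0.7}
    p_y = {0: 0.6, 1: 0.4}
    p_z1_given_xy = {(x, y): 0.1 + 0.4 * x + 0.3 * y        # p(z=1 | x, y), depends on both parents
                     for x in (0, 1) for y in (0, 1)}

    def p_xyz(x, y, z):
        pz = p_z1_given_xy[(x, y)] if z == 1 else 1.0 - p_z1_given_xy[(x, y)]
        return p_x[x] * p_y[y] * pz

    # marginalizing out the unobserved collider recovers p(x) * p(y)
    for x, y in itertools.product((0, 1), repeat=2):
        p_xy = sum(p_xyz(x, y, z) for z in (0, 1))
        assert abs(p_xy - p_x[x] * p_y[y]) < 1e-12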

  15. Markov Blanket
      the set of nodes needed to form the complete conditional for a variable x_i
      p(x_i | x_{k≠i}) = p(x_1, …, x_N) / ∫ p(x_1, …, x_N) dx_i
      = ∏_l p(x_l | π(x_l)) / ∫ ∏_l p(x_l | π(x_l)) dx_i    (factorization of graph)
      = ∏_{l : l = i or i ∈ π(x_l)} p(x_l | π(x_l)) / ∫ ∏_{l : l = i or i ∈ π(x_l)} p(x_l | π(x_l)) dx_i    (factor out terms not dependent on x_i)
      Markov blanket of a node x: its parents, children, and children's parents

  16. Markov Random Fields: Undirected Graphs
      p(x_1, x_2, x_3, …, x_N)

  17. Markov Random Fields: Undirected Graphs
      clique: subset of nodes, where nodes are pairwise connected
      maximal clique: a clique that cannot add a node and remain a clique
      p(x_1, x_2, x_3, …, x_N)

  18. Markov Random Fields: Undirected Graphs
      clique: subset of nodes, where nodes are pairwise connected
      maximal clique: a clique that cannot add a node and remain a clique
      p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
      Z: global normalization; the product runs over the maximal cliques C; ψ_C: potential function (not necessarily a probability!); x_C: the variables that are part of clique C

  19. Markov Random Fields: Undirected Graphs
      clique: subset of nodes, where nodes are pairwise connected
      maximal clique: a clique that cannot add a node and remain a clique
      p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
      Z: global normalization; the product runs over the maximal cliques C; ψ_C: potential function (not necessarily a probability!); x_C: the variables that are part of clique C

  20. Markov Random Fields: Undirected Graphs
      clique: subset of nodes, where nodes are pairwise connected
      maximal clique: a clique that cannot add a node and remain a clique
      p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
      Z: global normalization; the product runs over the maximal cliques C; ψ_C: potential function (not necessarily a probability!); x_C: the variables that are part of clique C
      Q: What restrictions should we place on the potentials ψ_C?

  21. Markov Random Fields: Undirected Graphs
      clique: subset of nodes, where nodes are pairwise connected
      maximal clique: a clique that cannot add a node and remain a clique
      p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
      Z: global normalization; the product runs over the maximal cliques C; ψ_C: potential function (not necessarily a probability!); x_C: the variables that are part of clique C
      Q: What restrictions should we place on the potentials ψ_C?
      A: ψ_C ≥ 0 (or ψ_C > 0)
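A tiny sketch of this factorization for a three-node chain x1 - x2 - x3 with binary variables; the potential tables are invented, and the normalization constant Z is computed by brute-force summation over all configurations.

    import itertools
    import numpy as np

    psi_12 = np.array([[2.0, 1.0], [1.0, 3.0]])   # potential on clique {x1, x2} (nonnegative, not a probability)
    psi_23 = np.array([[1.5, 0.5], [0.5, 2.0]])   # potential on clique {x2, x3}

    def score(x1, x2, x3):
        """Unnormalized product of clique potentials."""
        return psi_12[x1, x2] * psi_23[x2, x3]

    # global normalization: Z = sum of the unnormalized score over every configuration
    Z = sum(score(*cfg) for cfg in itertools.product((0, 1), repeat=3))

    def p(x1, x2, x3):
        return score(x1, x2, x3) / Z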

  22. Terminology: Potential Functions
      p(x_1, x_2, x_3, …, x_N) = (1/Z) ∏_C ψ_C(x_C)
      ψ_C(x_C) = exp(−E(x_C))    (Boltzmann distribution)
      E is an energy function (for clique C); get the total energy of a configuration by summing the individual energy functions

  23. Ambiguity in Undirected Model Notation
      The same graph (X, Y, Z pairwise connected) is consistent with either factorization:
      p(x, y, z) ∝ ψ(x, y, z)
      p(x, y, z) ∝ ψ_1(x, y) ψ_2(y, z) ψ_3(x, z)

  24. Example: Ising Model
      Image denoising (Bishop, 2006; Fig 8.30)
      y: observed (noisy) pixel/state; x: original pixel/state
      [Figure: original image; image with 10% noise; two denoised solutions]
      Q: What are the cliques?

  25. Example: Ising Model
      Image denoising (Bishop, 2006; Fig 8.30)
      y: observed (noisy) pixel/state; x: original pixel/state
      [Figure: original image; image with 10% noise; two denoised solutions]
      Q: What are the cliques?

  26. Example: Ising Model
      Image denoising (Bishop, 2006; Fig 8.30)
      y: observed (noisy) pixel/state; x: original pixel/state
      [Figure: original image; image with 10% noise; two denoised solutions]
      E(x, y) = h Σ_i x_i − β Σ_{ij} x_i x_j − η Σ_i x_i y_i
      h term: allow for a bias; β term (over neighboring pairs i, j): neighboring pixels should be similar; η term: x_i and y_i should be correlated
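One simple way to use this energy for denoising is iterated conditional modes (ICM): repeatedly set each pixel to whichever of ±1 lowers the energy, holding its neighbors fixed. The sketch and its parameter values below are illustrative; ICM is just one inference choice, and the slide itself does not fix one.

    import numpy as np

    def icm_denoise(y, h=0.0, beta=1.0, eta=2.1, sweeps=5):
        """Greedy minimization of E(x, y) = h*sum(x_i) - beta*sum over neighbors(x_i x_j) - eta*sum(x_i y_i).
        y: 2D array of +/-1 noisy pixels; returns a +/-1 array x."""
        x = y.copy()
        rows, cols = x.shape
        for _ in range(sweeps):
            for i in range(rows):
                for j in range(cols):
                    # local field from the 4-connected neighbors plus the observed pixel and the bias
                    nb = sum(x[a, b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                             if 0 <= a < rows and 0 <= b < cols)
                    field = beta * nb + eta * y[i, j] - h
                    x[i, j] = 1 if field > 0 else -1    # the sign that lowers the energy
        return x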
