Probabilistic Graphical Models
Markov Chain Monte Carlo Inference
Fall 2019
Siamak Ravanbakhsh

Learning objectives
- Markov chains
- the idea behind Markov Chain Monte Carlo (MCMC)
- two important examples:
  - Gibbs sampling
  - Metropolis-Hastings algorithm

Problem with likelihood weighting
Recap:
- use a topological ordering
- sample each variable conditioned on its parents
- if a variable is observed: keep the observed value and update the weight
Issues:
- observing the child does not affect the parent's assignment
- only applies to Bayes-nets
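
To make the first issue concrete, here is a minimal sketch of likelihood weighting on a two-node Bayes-net A -> B with B observed. The network and all of its probabilities are hypothetical, chosen only for illustration. Note that A is always drawn from its prior, so the evidence on B only enters through the weight.

```python
import random

# Hypothetical two-node Bayes-net A -> B (all numbers are illustrative).
P_A1 = 0.1                        # P(A = 1)
P_B1_GIVEN_A = {0: 0.2, 1: 0.9}   # P(B = 1 | A = a)


def likelihood_weighting(b_obs, n_samples, rng):
    """Estimate P(A = 1 | B = b_obs) by likelihood weighting."""
    weighted_hits = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        # Topological order: A comes first and is sampled from its prior.
        # The evidence on B has no influence on this draw.
        a = 1 if rng.random() < P_A1 else 0
        # B is observed: keep its value and weight the sample by the
        # probability of that value given the sampled parent.
        p_b1 = P_B1_GIVEN_A[a]
        weight = p_b1 if b_obs == 1 else 1.0 - p_b1
        weighted_hits += weight * a
        total_weight += weight
    return weighted_hits / total_weight


estimate = likelihood_weighting(b_obs=1, n_samples=20000, rng=random.Random(0))
# Exact posterior from Bayes' rule, for comparison.
exact = (P_A1 * 0.9) / (P_A1 * 0.9 + (1 - P_A1) * 0.2)
```

Only about 10% of the samples have A = 1 here, even though the posterior puts one third of its mass there; the weights have to make up the difference, which is what motivates samplers that condition on the evidence directly.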

Gibbs sampling
Idea:
- iteratively sample each variable conditioned on its Markov blanket: $X_i \sim p(x_i \mid X_{MB(i)})$
- if $X_i$ is observed: keep the observed value
  - equivalent to first simplifying the model by removing the observed variables, then sampling from the simplified Gibbs distribution
- after many Gibbs sampling iterations, $X \sim P$

Example: Ising model
Recall the Ising model:
$$p(x) \propto \exp\Big(\sum_i x_i h_i + \sum_{(i,j) \in E} x_i x_j J_{i,j}\Big), \qquad x_i \in \{-1, +1\}$$
Sample each node $i$:
$$p(x_i = +1 \mid X_{MB(i)}) = \frac{\exp\big(h_i + \sum_{j \in MB(i)} J_{i,j} X_j\big)}{\exp\big(h_i + \sum_{j \in MB(i)} J_{i,j} X_j\big) + \exp\big(-h_i - \sum_{j \in MB(i)} J_{i,j} X_j\big)} = \sigma\Big(2 h_i + 2 \sum_{j \in MB(i)} J_{i,j} X_j\Big)$$
Compare with mean-field:
$$\sigma\Big(2 h_i + 2 \sum_{j \in MB(i)} J_{i,j} \mu_j\Big)$$
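
The node-wise update for the Ising model can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the dictionary layout for `J` and `neighbors`, and the 4-node chain at the bottom are all assumptions made for the example.

```python
import math
import random


def conditional_plus(i, x, h, J, neighbors):
    """p(x_i = +1 | Markov blanket) = sigma(2 h_i + 2 sum_j J_ij x_j)."""
    field = h[i] + sum(J[(min(i, j), max(i, j))] * x[j] for j in neighbors[i])
    return 1.0 / (1.0 + math.exp(-2.0 * field))


def gibbs_ising(h, J, neighbors, n_sweeps, rng, observed=None):
    """Gibbs sampling for an Ising model with states in {-1, +1}.

    h[i]         : local field of node i
    J[(i, j)]    : coupling of edge (i, j), stored once with i < j
    neighbors[i] : nodes in the Markov blanket of i
    observed     : optional {node: value} for clamped nodes
    """
    observed = observed or {}
    x = [observed.get(i, rng.choice([-1, 1])) for i in range(len(h))]
    for _ in range(n_sweeps):
        for i in range(len(h)):
            if i in observed:
                continue  # keep the observed value
            p_plus = conditional_plus(i, x, h, J, neighbors)
            x[i] = 1 if rng.random() < p_plus else -1
    return x


# Hypothetical 4-node chain 0 - 1 - 2 - 3 with node 0 observed at -1.
h = [0.1, -0.2, 0.0, 0.3]
J = {(0, 1): 0.5, (1, 2): -0.4, (2, 3): 0.8}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
sample = gibbs_ising(h, J, neighbors, n_sweeps=100,
                     rng=random.Random(0), observed={0: -1})
```

The state returned after the final sweep is one approximate sample; in practice one would discard early sweeps and keep many later states.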

Markov Chain
A sequence of random variables with the Markov property:
$$P(X^{(t)} \mid X^{(1)}, \ldots, X^{(t-1)}) = P(X^{(t)} \mid X^{(t-1)})$$
Its graphical model: $X^{(1)} \to X^{(2)} \to \ldots \to X^{(T-1)} \to X^{(T)}$
Many applications:
- language modeling: X is a word or a character
- physics: with the correct choice of X, the world is Markov

Transition model
We assume a homogeneous chain: the conditional probabilities remain the same across time-steps,
$$P(X^{(t)} \mid X^{(t-1)}) = P(X^{(t+1)} \mid X^{(t)}) \quad \forall t$$
Notation: the conditional probability
$$P(X^{(t)} = x' \mid X^{(t-1)} = x) = T(x, x')$$
is called the transition model; think of $T$ as a matrix.
State-transition diagram and its transition matrix:
$$T = \begin{bmatrix} .25 & 0 & .75 \\ 0 & .7 & .3 \\ .5 & .5 & 0 \end{bmatrix}$$
Evolving the distribution:
$$P(X^{(t+1)} = x') = \sum_{x \in Val(X)} P(X^{(t)} = x)\, T(x, x')$$
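
As a quick check of the update rule, the sketch below applies it to the 3-state transition matrix from the slide. The helper name `evolve` and the choice of starting state are assumptions for the example.

```python
# Transition matrix from the slide: T[x][x_next] = P(next = x_next | current = x).
T = [
    [0.25, 0.0, 0.75],
    [0.0, 0.7, 0.3],
    [0.5, 0.5, 0.0],
]


def evolve(p, T):
    """One step of P(X^{t+1} = x') = sum_x P(X^t = x) T(x, x')."""
    n = len(T)
    return [sum(p[x] * T[x][x_next] for x in range(n)) for x_next in range(n)]


p0 = [1.0, 0.0, 0.0]   # start in the first state with probability 1
p1 = evolve(p0, T)     # equals the first row of T
p2 = evolve(p1, T)
```

Each row of `T` sums to 1, so `evolve` maps a probability vector to a probability vector.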

Markov Chain Monte Carlo (MCMC)
Example: state-transition diagram for the grasshopper random walk
- initial distribution: $P^{(0)}(X = 0) = 1$
- after $t = 50$ steps, the distribution is almost uniform: $P^{(t)}(x) \approx \frac{1}{9} \;\; \forall x$
- use the chain to sample from the uniform distribution: $P^{(t)}(X) \approx \frac{1}{9}$
- why is it uniform? (mixing image: Murphy's book)
MCMC generalizes this idea beyond the uniform distribution:
- we want to sample from $P^*$
- pick the transition model such that $P^{\infty}(X) = P^*(X)$
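
The idea of using the chain as a sampler can be sketched by simulation. The transition probabilities are not recoverable from the slide text, so this example assumes a common version of the grasshopper walk: nine positions -4..4, a move left or right with probability 0.25 each, stay otherwise, and a move past either end is rejected. All names are hypothetical.

```python
import random
from collections import Counter

STATES = list(range(-4, 5))  # nine positions, -4 .. 4 (assumed layout)


def step(x, rng):
    """One move of the assumed walk: left or right w.p. 0.25 each, else stay.
    A move past either end is rejected, so the walker stays where it is."""
    u = rng.random()
    if u < 0.25:
        proposal = x - 1
    elif u < 0.5:
        proposal = x + 1
    else:
        proposal = x
    return proposal if -4 <= proposal <= 4 else x


def sample_after(t, rng, start=0):
    """Run the chain for t steps from `start` and return the final state."""
    x = start
    for _ in range(t):
        x = step(x, rng)
    return x


rng = random.Random(0)
n_chains = 20000
counts = Counter(sample_after(50, rng) for _ in range(n_chains))
freqs = {x: counts[x] / n_chains for x in STATES}
```

Under these assumptions every entry of `freqs` should land close to 1/9, up to Monte Carlo noise, even though every chain starts at position 0.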

Stationary distribution
Given a transition model $T(x, x')$, if the chain converges:
$$P^{(t)}(x') \approx P^{(t+1)}(x') = \sum_{x} P^{(t)}(x)\, T(x, x') \qquad \text{(global balance equation)}$$
This condition defines the stationary distribution $\pi$:
$$\pi(X = x') = \sum_{x \in Val(X)} \pi(X = x)\, T(x, x')$$
Example: finding the stationary distribution
$$\begin{aligned}
\pi(x^1) &= .25\,\pi(x^1) + .5\,\pi(x^3) \\
\pi(x^2) &= .7\,\pi(x^2) + .5\,\pi(x^3) \\
\pi(x^3) &= .75\,\pi(x^1) + .3\,\pi(x^2) \\
\pi(x^1) + \pi(x^2) + \pi(x^3) &= 1
\end{aligned}$$
Solution: $\pi(x^1) = .2$, $\pi(x^2) = .5$, $\pi(x^3) = .3$
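
The example system can be solved mechanically. The sketch below builds the global balance equations for the 3-state chain, replaces one redundant equation with the normalization constraint, and solves by Gauss-Jordan elimination with exact rationals. The helper names are hypothetical, and the elimination routine is a bare-bones illustration that assumes the system has a unique solution.

```python
from fractions import Fraction

# Transition matrix from the example, as exact rationals.
T = [[Fraction(n, 100) for n in row]
     for row in [[25, 0, 75], [0, 70, 30], [50, 50, 0]]]


def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination (assumes a unique solution)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def stationary_distribution(T):
    n = len(T)
    # Balance equations: sum_x pi(x) T(x, x') - pi(x') = 0 for all but one x'.
    A = [[T[x][xp] - (1 if x == xp else 0) for x in range(n)]
         for xp in range(n - 1)]
    b = [0] * (n - 1)
    # The dropped equation is redundant; use normalization in its place.
    A.append([1] * n)
    b.append(1)
    return solve_linear_system(A, b)


pi = stationary_distribution(T)
```

One balance equation is dropped because the equations sum to an identity; without the normalization row the system would only fix $\pi$ up to scale.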

Stationary distribution as an eigenvector
Viewing $T(\cdot, \cdot)$ as a matrix and $P^{(t)}(x)$ as a vector:
- evolution of the distribution: $P^{(t+1)} = T^{\mathsf{T}} P^{(t)}$
- multiple steps: $P^{(t+m)} = (T^{\mathsf{T}})^m P^{(t)}$
- for the stationary distribution: $\pi = T^{\mathsf{T}} \pi$, so $\pi$ is an eigenvector of $T^{\mathsf{T}}$ with eigenvalue 1
Example: the stationary distribution $\pi(x^1) = .2$, $\pi(x^2) = .5$, $\pi(x^3) = .3$ of the 3-state chain, in matrix form:
$$\underbrace{\begin{bmatrix} .25 & 0 & .5 \\ 0 & .7 & .5 \\ .75 & .3 & 0 \end{bmatrix}}_{T^{\mathsf{T}}} \underbrace{\begin{bmatrix} .2 \\ .5 \\ .3 \end{bmatrix}}_{\pi} = \underbrace{\begin{bmatrix} .2 \\ .5 \\ .3 \end{bmatrix}}_{\pi}$$
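
The matrix view suggests a simple numerical recipe: apply $T^{\mathsf{T}}$ repeatedly until the vector stops changing. The sketch below does this for the 3-state example; the helper names, the starting vector and the number of steps are assumptions, and convergence relies on this particular chain being irreducible and aperiodic.

```python
# Transition matrix of the 3-state example: T[x][x_next].
T = [
    [0.25, 0.0, 0.75],
    [0.0, 0.7, 0.3],
    [0.5, 0.5, 0.0],
]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def power_iteration(T, p, n_steps):
    """Compute (T^T)^n_steps p by repeated multiplication."""
    Tt = transpose(T)
    for _ in range(n_steps):
        p = matvec(Tt, p)
    return p


pi = power_iteration(T, [1.0, 0.0, 0.0], n_steps=100)
# Fixed-point check: T^T pi should reproduce pi (eigenvalue 1).
residual = max(abs(a - b) for a, b in zip(matvec(transpose(T), pi), pi))
```

The result agrees with the hand-computed stationary distribution (.2, .5, .3) to within floating-point error, whatever probability vector the iteration starts from.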