Markov Chain Monte Carlo (MCMC)



  1. CS 3750 Machine Learning, Lecture 6. Approximate probabilistic inference: Markov Chain Monte Carlo (MCMC) and variational methods. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square.

     Markov chain Monte Carlo
     • Importance sampling: samples are generated according to a proposal distribution Q and every sample from Q is reweighted by a weight w, but Q may be very far from the target distribution (a minimal sketch follows this slide).
     • MCMC is a strategy for generating samples from the target distribution itself, including conditional distributions.
     • MCMC: a Markov chain defines a sampling process that initially generates samples very different from the target distribution (e.g., the posterior), but gradually refines them so that they come closer and closer to the posterior.
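     As a point of contrast with MCMC, here is a minimal importance-sampling sketch in Python. The target density, the standard-normal proposal Q, and all numbers are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed unnormalized target p(x) proportional to N(3, 1).
def p_unnorm(x):
    return np.exp(-0.5 * (x - 3.0) ** 2)

# Proposal Q: standard normal, whose mass sits far from the target's.
def q_pdf(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

xs = rng.normal(0.0, 1.0, size=10_000)   # samples drawn from Q
w = p_unnorm(xs) / q_pdf(xs)             # importance weights
w /= w.sum()                             # self-normalize the weights

# Weighted estimate of E_p[X]; a poor Q yields high-variance weights.
print((w * xs).sum())
```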

  2. MCMC
     • The construction of a Markov chain requires two basic ingredients:
       – a transition matrix P
       – an initial distribution \pi_0
     • Assume a finite set S = \{1, \ldots, m\} of states. The transition matrix is

       P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}

       where p_{ij} \ge 0 for all (i, j) \in S^2 and \sum_{j \in S} p_{ij} = 1 for all i \in S.

     Markov Chain
     • A Markov chain defines a random process of selecting states x^{(0)}, x^{(1)}, \ldots, x^{(m)}, \ldots. The initial state is selected according to \pi_0; each subsequent state is selected based on the previous state and the transition matrix.
     • Chain dynamics, giving the probability of state x' being selected at time t+1:

       P(X^{(t+1)} = x') = \sum_{x \in Dom(X)} P(X^{(t)} = x) \, T(x \to x')
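     A minimal sketch of these chain dynamics in Python, using a small 3-state transition matrix and initial distribution that are assumed for illustration: it propagates the state distribution via \pi_{t+1} = \pi_t P and also draws one sample path.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed 3-state transition matrix: rows are nonnegative and sum to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi0 = np.array([1.0, 0.0, 0.0])   # initial distribution pi_0

# Chain dynamics: P(X^{t+1} = x') = sum_x P(X^t = x) T(x -> x').
pi = pi0
for _ in range(50):
    pi = pi @ P
print("state distribution after 50 steps:", pi)

# Drawing one trajectory x^(0), x^(1), ... from the same chain.
x = rng.choice(3, p=pi0)
traj = [x]
for _ in range(10):
    x = rng.choice(3, p=P[x])     # next state depends only on current
    traj.append(x)
print("sample path:", traj)
```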

  3. MCMC
     • A Markov chain satisfies the Markov property:

       P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)

     • Irreducibility: a Markov chain is called irreducible (or un-decomposable) if there is a positive transition probability between every pair of states within a finite number of steps.
     • In irreducible chains there may still exist a periodic structure such that, for each state i, the set of possible return times to i when starting in i is a subset of \{p, 2p, 3p, \ldots\} containing all but a finite set of these elements. The smallest number p with this property is the so-called period of the chain:

       p = \gcd\{ n \in \mathbb{N} : p_{ii}^{(n)} > 0 \}

     MCMC
     • Aperiodicity: an irreducible chain is called aperiodic (or acyclic) if the period p equals 1 or, equivalently, if for every pair of states (i, j) there is an integer n_{ij} such that p_{ij}^{(n)} > 0 for all n \ge n_{ij}.
     • If a Markov chain satisfies both irreducibility and aperiodicity, then it converges to an invariant distribution q(x).
     • A Markov chain with transition matrix P has q as an equilibrium distribution iff q = qP.
     • A sufficient, but not necessary, condition to ensure that a particular q(x) is the invariant distribution of transition matrix P is the following reversibility (detailed balance) condition:

       q(x^{(i)}) \, P(x^{(i+1)} \mid x^{(i)}) = q(x^{(i+1)}) \, P(x^{(i)} \mid x^{(i+1)})
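     A short sketch, reusing the illustrative matrix from the previous example: it approximates the equilibrium q by iterating q ← qP and then tests the detailed-balance condition entrywise.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Equilibrium distribution: iterate q <- qP until it stabilizes.
q = np.full(3, 1.0 / 3.0)
for _ in range(1000):
    q = q @ P
print("stationary q:", q)              # satisfies q = qP

# Detailed balance: q(x) P(x'|x) == q(x') P(x|x') for all state pairs.
flow = q[:, None] * P                  # flow[i, j] = q(i) * P(j|i)
print("reversible:", np.allclose(flow, flow.T))
# Prints False for this P: the chain has q = qP yet is not reversible,
# illustrating that detailed balance is sufficient but not necessary.
```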

  4. Markov Chain Monte Carlo
     • Objective: generate samples from the posterior distribution.
     • Idea: a Markov chain defines a sampling process that initially generates samples very different from the target posterior, but gradually refines them so that they come closer and closer to the posterior.

     MCMC
     • P(X | e) is the query we want to compute; e_1 and e_2 are known evidence.
     • Sampling from the prior distribution P(X) is very different from sampling from the desired posterior P(X | e).
     [Figure: a small Bayesian network with query variable X and evidence nodes e_1, e_2.]

  5. Markov Chain Monte Carlo (MCMC)
     [Figure: the state space, with successive states X_1, X_2, X_3, X_4, …]

     MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some P(X) and generate a sample x_1.

  6. MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some P(X) and generate a sample x_1.
     • Apply the transition T to x_1 to generate x_2.

  7. MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some P(X) and generate a sample x_1; from x_1 and the transition T generate x_2.
     • Repeat for n steps, applying T at each step: X_1 → X_2 → … → X_n, whose distribution P'(X | e) gradually approaches the posterior.

  8. MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some P(X) and generate a sample x_1; from x_1 and the transition T generate x_2; repeat for n steps.
     • Once the distribution P'(X | e) has converged after n steps, the subsequent states X_{n+1}, X_{n+2}, … are samples from the desired P(X | e).
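     The run-then-collect pattern on these slides can be sketched generically; the 3-state kernel and the burn-in length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed transition kernel T on a small 3-state chain (illustrative).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def transition(x):
    return rng.choice(3, p=P[x])   # one application of T

x = 0                              # arbitrary initial state from some P(X)
n_burn, n_keep = 500, 2000

for _ in range(n_burn):            # X_1 ... X_n: discarded as burn-in
    x = transition(x)

samples = []
for _ in range(n_keep):            # X_{n+1}, X_{n+2}, ...: kept
    x = transition(x)
    samples.append(x)

print("empirical distribution:", np.bincount(samples) / n_keep)
```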

  9. MCMC
     • In general, an MCMC sampling process does not have to converge to a stationary distribution.
     • A finite-state Markov chain has a unique stationary distribution iff the chain is regular:
       – regular: there exists some k such that, for each pair of states x and x', the probability of getting from x to x' in exactly k steps is greater than 0 (a numerical check is sketched after this slide).
     • We want Markov chains that converge to a unique target distribution from any initial state.
     • Big question: how do we build such Markov chains?

     Gibbs Sampling
     • A simple method to define a Markov chain for a Bayesian belief network; it can benefit from the structure (independences) in the network.
     • Example: a network over x_1, …, x_6, where all variables take binary values T or F.
     • Evidence: x_5 = T, x_6 = T.
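     As referenced above, regularity can be checked numerically on the illustrative matrix used earlier: raise P to successive powers and stop at the first power whose entries are all positive.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Regular: some power P^k has every entry > 0, i.e. every state can
# reach every other state in exactly k steps.
def is_regular(P, max_k=100):
    Pk = np.eye(len(P))
    for k in range(1, max_k + 1):
        Pk = Pk @ P
        if np.all(Pk > 0):
            return True, k
    return False, None

print(is_regular(P))   # (True, 1): this P is already strictly positive
```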

  10. Gibbs Sampling
     • Initial state X^0: x_1 = F, x_2 = T, x_3 = T, x_4 = T; x_5 = x_6 = T (fixed evidence).
     • First step: update the value of x_4.

  11. Gibbs Sampling
     • Resampling x_4 yields x_4 = F, giving the new state X^1: x_1 = F, x_2 = T, x_3 = T, x_4 = F; x_5 = T, x_6 = T.
     • Next step: update the value of x_3.

  12. Gibbs Sampling
     • Resampling x_3 yields the next state X^2: x_3 = T, x_4 = F; x_5 = T, x_6 = T (evidence stays fixed).
     • After many such reassignments, the states X^n, X^{n+1}, … are samples from the desired P(X_rest | e).

  13. Gibbs Sampling
     • Keep resampling each variable using the values of the variables in its local neighborhood (its Markov blanket), e.g.

       P(X_4 \mid x_2, x_3, x_5, x_6)

     • Gibbs sampling takes advantage of the graphical model structure: the Markov blanket makes a variable independent of the rest of the network, so each resampling step involves only a few variables.
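     A minimal Gibbs-sampler sketch for a network of this shape. The function `p_given_blanket` is a hypothetical stand-in for the Markov-blanket conditionals P(X_i | MB(X_i)) that a real Bayesian network would compute from its CPTs; the probabilities used here are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for P(x_i = T | Markov blanket of x_i); in a
# real BBN this would be computed from the CPTs of x_i and its children.
def p_given_blanket(i, state):
    others = [v for j, v in enumerate(state) if j != i]
    return 0.3 + 0.4 * (sum(others) / len(others))   # illustrative only

evidence = {4: 1, 5: 1}            # x_5 = T, x_6 = T (0-indexed 4, 5)
state = [0, 1, 1, 1, 1, 1]         # initial X^0: x1=F, x2=T, x3=T, x4=T

samples = []
for sweep in range(5000):
    for i in range(6):
        if i in evidence:          # evidence variables stay fixed
            continue
        state[i] = int(rng.random() < p_given_blanket(i, state))
    if sweep >= 500:               # discard burn-in sweeps
        samples.append(tuple(state))

# Empirical estimate of P(x_4 = T | e) from the kept samples.
print(np.mean([s[3] for s in samples]))
```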

  14. Building a Markov Chain
     • A reversible Markov chain: a sufficient, but not necessary, condition to ensure that a particular q(x) is the invariant distribution of transition matrix P is the reversibility (detailed balance) condition:

       q(x^{(i)}) \, P(x^{(i+1)} \mid x^{(i)}) = q(x^{(i+1)}) \, P(x^{(i)} \mid x^{(i+1)})

     • Metropolis-Hastings algorithm:
       – builds a reversible Markov chain
       – uses a proposal distribution Q (similar to the proposal distribution in importance sampling) to generate candidate states x': T_Q(x → x'). Example: uniform over the values of the variables.
       – either accepts the proposal, with acceptance probability A(x → x'), and takes a transition to state x', or rejects it and stays at the current state x.
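     A compact Metropolis-Hastings sketch. The Gaussian random-walk proposal and the unnormalized target are illustrative assumptions, and since the slide leaves A(x → x') unspecified, the standard MH acceptance rule min(1, q(x')/q(x)) for a symmetric proposal is used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed unnormalized target q(x) proportional to N(3, 1); only
# ratios of q are ever needed, so the normalizer can be unknown.
def q_unnorm(x):
    return np.exp(-0.5 * (x - 3.0) ** 2)

x = 0.0
samples = []
for step in range(10_000):
    x_prop = x + rng.normal(0.0, 1.0)        # symmetric proposal Q
    # Acceptance probability A(x -> x'); the proposal ratio cancels
    # for a symmetric Q, leaving the Metropolis form.
    a = min(1.0, q_unnorm(x_prop) / q_unnorm(x))
    if rng.random() < a:
        x = x_prop                           # accept: move to x'
    # otherwise: reject and stay at the current state x
    if step >= 1_000:                        # discard burn-in
        samples.append(x)

print("posterior mean estimate:", np.mean(samples))  # close to 3.0
```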
