Markov Chain Monte Carlo Methods

Michel Bierlaire
michel.bierlaire@epfl.ch
Transport and Mobility Laboratory
Markov Chains

Andrey Markov, 1856–1922, Russian mathematician.
Markov Chains

Glossary:
• Stochastic process: $X_t$, $t = 0, 1, \ldots$, a collection of r.v. with the same support, or state space, $\{1, \ldots, i, \ldots, J\}$.
• Markov process (short memory): $\Pr(X_t = i \mid X_0, \ldots, X_{t-1}) = \Pr(X_t = i \mid X_{t-1})$.
• Homogeneous Markov process: $\Pr(X_t = j \mid X_{t-1} = i) = \Pr(X_{t+k} = j \mid X_{t-1+k} = i) = P_{ij}$, $\forall t \geq 1$, $k \geq 0$.
• Transition matrix: $P \in \mathbb{R}^{J \times J}$.
• Properties:
$$\sum_{j=1}^J P_{ij} = 1, \quad i = 1, \ldots, J, \qquad P_{ij} \geq 0, \quad \forall i, j.$$
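To make the definitions concrete, here is a minimal Python sketch (not part of the original slides) simulating a homogeneous Markov chain directly from its transition matrix; the function name and the 0-based state indexing are illustration choices:

```python
import numpy as np

def simulate_chain(P, x0, T, rng=None):
    """Simulate T steps of a homogeneous Markov chain with transition
    matrix P, starting from state x0. States are 0-based indices."""
    rng = np.random.default_rng() if rng is None else rng
    J = P.shape[0]
    states = np.empty(T + 1, dtype=int)
    states[0] = x0
    for t in range(T):
        # Draw the next state from row P[states[t], :] of the matrix.
        states[t + 1] = rng.choice(J, p=P[states[t]])
    return states
```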
Markov Chains

• If state j can be reached from state i with nonzero probability, and i from j, we say that i and j communicate.
• Two states that communicate belong to the same class.
• A Markov chain is irreducible, or ergodic, if it contains only one class.
• With an ergodic chain, it is possible to reach every state from any state.
Markov Chains

• $P^t_{ij}$ is the probability that the process reaches state j from i after t steps.
• Consider all t such that $P^t_{ii} > 0$. The greatest common divisor d is called the period of state i.
• A state with period 1 is aperiodic.
• If $P_{ii} > 0$, state i is aperiodic.
• The period is the same for all states in the same class.
• Therefore, if the chain is irreducible and one state is aperiodic, they all are.
A periodic chain

[Figure: transition graph on five states, each arc labeled with its transition probability.]

$$P = \begin{pmatrix}
0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 2/3 \\
0 & 0 & 0 & 1/2 & 1/2 \\
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{pmatrix}, \qquad d = 3.$$
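The period can also be computed numerically. A sketch (not from the slides): take the gcd of all return times $t \leq t_{\max}$ with $(P^t)_{ii} > 0$, where the truncation $t_{\max}$ and the function name are illustration choices. Applied to the five-state matrix above, it returns $d = 3$:

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, t_max=50):
    """Period of state i: gcd of all t <= t_max with (P^t)_{ii} > 0."""
    Pt = np.eye(P.shape[0])
    returns = []
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        if Pt[i, i] > 0:
            returns.append(t)
    return reduce(gcd, returns) if returns else 0

P = np.array([[0, 1/2, 1/2, 0,   0],
              [0, 0,   0,   1/3, 2/3],
              [0, 0,   0,   1/2, 1/2],
              [1, 0,   0,   0,   0],
              [1, 0,   0,   0,   0]])
print(period(P, 0))  # 3
```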
Markov Chains

$$\Pr(j) = \sum_{i=1}^J \Pr(j \mid i) \Pr(i)$$

• Stationary probabilities: unique solution of the system
$$\pi_j = \sum_{i=1}^J P_{ij} \pi_i, \quad \forall j = 1, \ldots, J, \qquad (1)$$
$$\sum_{j=1}^J \pi_j = 1.$$
• A solution exists for any irreducible chain.
Markov Chains

• Consider the following system of equations:
$$x_i P_{ij} = x_j P_{ji}, \; i \neq j, \qquad \sum_{i=1}^J x_i = 1. \qquad (2)$$
• We sum over i:
$$\sum_{i=1}^J x_i P_{ij} = \sum_{i=1}^J x_j P_{ji} = x_j.$$
• If (2) has a solution, it is also a solution of (1). As $\pi$ is the unique solution of (1), then $x = \pi$, so that
$$\pi_i P_{ij} = \pi_j P_{ji}, \quad i \neq j.$$
• The chain is then said to be time reversible.
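As a small sketch (an addition, not from the slides), equation (2) can be checked numerically: the chain is time reversible exactly when the flow matrix with entries $\pi_i P_{ij}$ is symmetric.

```python
import numpy as np

def is_time_reversible(P, pi, tol=1e-12):
    """Check detailed balance pi_i P_ij == pi_j P_ji for all i, j."""
    flows = pi[:, None] * P  # flows[i, j] = pi_i * P_ij
    return np.allclose(flows, flows.T, atol=tol)
```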
Example

• A machine can be in 4 states with respect to wear:
• perfect condition,
• partially damaged,
• seriously damaged,
• completely useless.
• The degradation process can be modeled by an irreducible aperiodic homogeneous Markov process, with the following transition matrix:
$$P = \begin{pmatrix}
0.95 & 0.04 & 0.01 & 0.0 \\
0.0 & 0.90 & 0.05 & 0.05 \\
0.0 & 0.0 & 0.80 & 0.20 \\
1.0 & 0.0 & 0.0 & 0.0
\end{pmatrix}$$
Example

Stationary distribution: $\left(\tfrac{5}{8}, \tfrac{1}{4}, \tfrac{3}{32}, \tfrac{1}{32}\right)$:
$$\left(\tfrac{5}{8}, \tfrac{1}{4}, \tfrac{3}{32}, \tfrac{1}{32}\right)
\begin{pmatrix}
0.95 & 0.04 & 0.01 & 0.0 \\
0.0 & 0.90 & 0.05 & 0.05 \\
0.0 & 0.0 & 0.80 & 0.20 \\
1.0 & 0.0 & 0.0 & 0.0
\end{pmatrix}
= \left(\tfrac{5}{8}, \tfrac{1}{4}, \tfrac{3}{32}, \tfrac{1}{32}\right)$$
• The machine is in perfect condition 5 days out of 8, on average.
• A repair occurs every 32 days, on average.
From now on: Markov process = irreducible aperiodic homogeneous Markov process.
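This stationary distribution can be recovered numerically. A sketch (not from the slides): solve the singular system (1) together with the normalization constraint by least squares.

```python
import numpy as np

P = np.array([[0.95, 0.04, 0.01, 0.00],
              [0.00, 0.90, 0.05, 0.05],
              [0.00, 0.00, 0.80, 0.20],
              [1.00, 0.00, 0.00, 0.00]])

# Stack the stationarity equations (P^T - I) pi = 0 with sum(pi) = 1
# and solve the overdetermined system in the least-squares sense.
J = P.shape[0]
A = np.vstack([P.T - np.eye(J), np.ones(J)])
b = np.append(np.zeros(J), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # [0.625 0.25 0.09375 0.03125] = (5/8, 1/4, 3/32, 1/32)
```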
Stationary distributions

• Property:
$$\pi_j = \lim_{t \to \infty} \Pr(X_t = j), \quad j = 1, \ldots, J.$$
• Ergodicity:
• Let f be any function on the state space.
• Then, with probability 1,
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^T f(X_t) = \sum_{j=1}^J \pi_j f(j).$$
• Computing the expectation of a function of the stationary states is the same as taking the average of its values along a trajectory of the process.
Simulation

• We want to simulate a r.v. X with pmf $\Pr(X = j) = p_j$.
• We generate a Markov process with limiting probabilities $p_j$ (how?).
• We simulate the evolution of the process:
$$p_j = \pi_j = \lim_{t \to \infty} \Pr(X_t = j), \quad j = 1, \ldots, J.$$
[Figures "Example: T = 100", "Example: T = 1000", "Example: T = 10000": empirical $\Pr(X_t = j)$ plotted against t.]
Simulation

• Assume that we are interested in simulating
$$E[f(X)] = \sum_{j=1}^J f(j) p_j.$$
• We use ergodicity to estimate it with
$$\frac{1}{T} \sum_{t=1}^T f(X_t).$$
• We should drop the early states (see the example above). Better estimate:
$$\frac{1}{T} \sum_{t=1+k}^{T+k} f(X_t).$$
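A minimal sketch of this burn-in estimator (an addition, reusing the simulate_chain helper from the earlier sketch; the burn_in argument plays the role of k):

```python
import numpy as np

def ergodic_estimate(f, states, burn_in):
    """Average f along the trajectory, dropping the first burn_in states."""
    return np.mean([f(x) for x in states[burn_in:]])

# E.g., the long-run average wear state of the machine example:
# states = simulate_chain(P, x0=0, T=11000)
# print(ergodic_estimate(lambda j: j, states, burn_in=1000))
```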
Metropolis-Hastings

Nicholas Metropolis, 1915–1999. W. Keith Hastings, 1930–2016.
Metropolis-Hastings

• Let $b_j$, $j = 1, \ldots, J$, be positive numbers.
• Let $B = \sum_j b_j$.
• Let $\pi_j = b_j / B$.
• We want to simulate a r.v. with pmf $\pi_j$.
• Consider a Markov process on $\{1, \ldots, J\}$ with transition probability Q.
• Define another Markov process with the same states in the following way:
• Assume the process is in state i, that is, $X_t = i$.
• Simulate the (candidate) next state j according to Q.
• Define
$$X_{t+1} = \begin{cases} j & \text{with probability } \alpha_{ij} \\ i & \text{with probability } 1 - \alpha_{ij} \end{cases}$$
Metropolis-Hastings

• Transition probability P:
$$P_{ij} = Q_{ij} \alpha_{ij} \quad \text{if } i \neq j, \qquad
P_{ii} = Q_{ii} \alpha_{ii} + \sum_{\ell \neq i} Q_{i\ell} (1 - \alpha_{i\ell}) \quad \text{otherwise.}$$
• Must verify the property:
$$\begin{aligned}
1 = P_{ii} + \sum_{j \neq i} P_{ij}
&= Q_{ii} \alpha_{ii} + \sum_{\ell \neq i} Q_{i\ell} (1 - \alpha_{i\ell}) + \sum_{j \neq i} Q_{ij} \alpha_{ij} \\
&= Q_{ii} \alpha_{ii} + \sum_{\ell \neq i} Q_{i\ell} - \sum_{\ell \neq i} Q_{i\ell} \alpha_{i\ell} + \sum_{j \neq i} Q_{ij} \alpha_{ij} \\
&= Q_{ii} \alpha_{ii} + \sum_{\ell \neq i} Q_{i\ell}.
\end{aligned}$$
As $\sum_j Q_{ij} = 1$, we have $\alpha_{ii} = 1$.
Metropolis-Hastings

• Stationary distribution and time reversibility:
$$\pi_i P_{ij} = \pi_j P_{ji}, \quad i \neq j,$$
• that is,
$$\pi_i Q_{ij} \alpha_{ij} = \pi_j Q_{ji} \alpha_{ji}, \quad i \neq j.$$
• It is satisfied if
$$\alpha_{ij} = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}} \text{ and } \alpha_{ji} = 1,
\qquad \text{or} \qquad
\alpha_{ji} = \frac{\pi_i Q_{ij}}{\pi_j Q_{ji}} \text{ and } \alpha_{ij} = 1.$$
Metropolis-Hastings

• Therefore
$$\alpha_{ij} = \min\left(\frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}, 1\right).$$
• Remember: $\pi_j = b_j / B$. Therefore
$$\alpha_{ij} = \min\left(\frac{(b_j/B)\, Q_{ji}}{(b_i/B)\, Q_{ij}}, 1\right) = \min\left(\frac{b_j Q_{ji}}{b_i Q_{ij}}, 1\right).$$
• The normalization constant B plays no role in the computation of $\alpha_{ij}$.
• In summary:
• Given Q and $b_j$,
• defining $\alpha$ as above
• creates a Markov process characterized by P
• with stationary distribution $\pi$.
Metropolis-Hastings

Algorithm:
1. Choose a Markov process characterized by Q.
2. Initialize the chain with a state i: $t = 0$, $X_0 = i$.
3. Simulate the (candidate) next state j based on Q.
4. Let r be a draw from $U[0, 1[$.
5. Compare r with $\alpha_{ij} = \min\left(\frac{b_j Q_{ji}}{b_i Q_{ij}}, 1\right)$. If $r < \frac{b_j Q_{ji}}{b_i Q_{ij}}$, then $X_{t+1} = j$; else $X_{t+1} = i$.
6. Increase t by one.
7. Go to step 3.
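The algorithm translates almost line by line into code. A minimal Python sketch for finite state spaces (an addition; 0-based states, illustrative function name):

```python
import numpy as np

def metropolis_hastings(b, Q, x0, T, rng=None):
    """Metropolis-Hastings on states {0, ..., J-1}.
    b: unnormalized target weights (pi_j = b_j / sum(b)).
    Q: proposal transition matrix. Returns the trajectory."""
    rng = np.random.default_rng() if rng is None else rng
    J = len(b)
    states = np.empty(T + 1, dtype=int)
    states[0] = x0
    for t in range(T):
        i = states[t]
        j = rng.choice(J, p=Q[i])  # step 3: candidate state from Q
        r = rng.random()           # step 4: r ~ U[0, 1[
        # step 5: accept the candidate if r < b_j Q_ji / (b_i Q_ij)
        states[t + 1] = j if r < (b[j] * Q[j, i]) / (b[i] * Q[i, j]) else i
    return states
```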
Example

$$b = (20, 8, 3, 1), \qquad \pi = \left(\tfrac{5}{8}, \tfrac{1}{4}, \tfrac{3}{32}, \tfrac{1}{32}\right),$$
$$Q = \begin{pmatrix}
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4 \\
1/4 & 1/4 & 1/4 & 1/4
\end{pmatrix}$$
Run MH for 10000 iterations. Collect statistics after 1000.
• Accept: [2488, 1532, 801, 283]
• Reject: [0, 952, 1705, 2239]
• Simulated: [0.627, 0.250, 0.095, 0.028]
• Target: [0.625, 0.250, 0.09375, 0.03125]
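These statistics can be reproduced, up to random variation, by reusing the metropolis_hastings sketch above:

```python
import numpy as np

b = np.array([20.0, 8.0, 3.0, 1.0])  # pi = (5/8, 1/4, 3/32, 1/32)
Q = np.full((4, 4), 0.25)            # uniform proposal
states = metropolis_hastings(b, Q, x0=0, T=10000)
kept = states[1000:]                 # collect statistics after 1000
freq = np.bincount(kept, minlength=4) / len(kept)
print(freq)  # close to [0.625, 0.25, 0.09375, 0.03125]
```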
Gibbs sampling

• Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector with pmf (or pdf) $p(x)$.
• Assume we can draw from the conditionals (the full conditionals):
$$\Pr(X_i \mid X_j = x_j, j \neq i), \quad i = 1, \ldots, n.$$
• Markov process. Assume the current state is x.
• Draw randomly (with equal probability) a coordinate i.
• Draw r from the i-th conditional.
• New state: $y = (x_1, \ldots, x_{i-1}, r, x_{i+1}, \ldots, x_n)$.
Gibbs sampling

• Transition probability:
$$Q_{xy} = \frac{1}{n} \Pr(X_i = r \mid X_j = x_j, j \neq i) = \frac{1}{n} \frac{p(y)}{\Pr(X_j = x_j, j \neq i)}.$$
• The denominator is independent of $x_i$.
• So $Q_{xy}$ is proportional to $p(y)$.
• Metropolis-Hastings:
$$\alpha_{xy} = \min\left(\frac{p(y) Q_{yx}}{p(x) Q_{xy}}, 1\right) = \min\left(\frac{p(y)\, p(x)}{p(x)\, p(y)}, 1\right) = 1.$$
• The candidate state is always accepted.
Example: bivariate normal distribution

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \right)$$
Conditional distribution:
$$Y \mid (X = x) \sim N\left( \mu_Y + \frac{\sigma_Y}{\sigma_X} \rho (x - \mu_X), \; (1 - \rho^2) \sigma_Y^2 \right)$$
Apply Gibbs sampling to draw from:
$$N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix} \right)$$
Note: just for illustration. In practice, one should use the Cholesky factor.
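A sketch of the resulting Gibbs sampler (an addition, not from the slides). For simplicity it sweeps the two coordinates in turn, a systematic scan, which is a common variant of the random coordinate choice described above; parameter names are illustrative:

```python
import numpy as np

def gibbs_bivariate_normal(mu, sigma, rho, T, rng=None):
    """Gibbs sampler for a bivariate normal, alternating draws from
    the two one-dimensional conditionals given on the slide."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = mu[0], mu[1]
    draws = np.empty((T, 2))
    for t in range(T):
        # X | Y=y ~ N(mu_X + (sigma_X/sigma_Y) rho (y - mu_Y), (1-rho^2) sigma_X^2)
        x = rng.normal(mu[0] + sigma[0] / sigma[1] * rho * (y - mu[1]),
                       np.sqrt(1 - rho**2) * sigma[0])
        # Y | X=x ~ N(mu_Y + (sigma_Y/sigma_X) rho (x - mu_X), (1-rho^2) sigma_Y^2)
        y = rng.normal(mu[1] + sigma[1] / sigma[0] * rho * (x - mu[0]),
                       np.sqrt(1 - rho**2) * sigma[1])
        draws[t] = x, y
    return draws

draws = gibbs_bivariate_normal(mu=(0.0, 0.0), sigma=(1.0, 1.0), rho=0.9, T=10000)
print(np.corrcoef(draws[1000:].T))  # off-diagonal entries close to 0.9
```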