Stochastic Simulation Markov Chain Monte Carlo Bo Friis Nielsen Institute of Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby – Denmark Email: bfni@dtu.dk
MCMC: What we aim to achieve MCMC: What we aim to achieve We have a variable X with a “complicated” distribution. We cannot sample X directly. We aim to generate a sequence of X i ’s • which each has the same distribution as X • but we allow them to be interdependent. This is an inverse problem relative to the queueing exercise: We start with the distribution of X , and aim to design a state machine which has this steady-state distribution. DTU 02443 – lecture 8 2
MCMC example from Bayesian statistics MCMC example from Bayesian statistics Prior distribution of parameter P ∼ U (0 , 1) : f P ( p ) = 1 (0 ≤ p ≤ 1) Distribution of data, conditional on parameter X for given P = p is Binomial( n, p ) i.e. the data has the conditional probabilities n P i (1 − P ) n − i P ( X = i | P ) = i DTU 02443 – lecture 8 3
The posterior distribution of P The posterior distribution of P Conditional density of parameter, given observed data X = i (the posterior distribution): f P | X = i ( p ) = f P ( p ) P ( X = i | P = p ) P ( X = i ) We need the unconditional probability of the observation: � 1 n p i (1 − p ) n − i dp P ( X = i ) = f P ( p ) i 0 We can evaluate this; in more complex models we could not. AIM: To sample from f P | X = i , without evaluating c = 1 / P ( X = i ) . DTU 02443 – lecture 8 4
The posterior distribution The posterior distribution Time series Histogram of Pvec 0.7 1400 0.6 0.5 1200 P 0.4 0.3 1000 0.2 0.1 800 Frequency 0 20 40 60 80 100 Iteration no. 600 400 200 0 0.2 0.4 0.6 0.8 DTU Pvec 02443 – lecture 8 5
When to apply MCMC? When to apply MCMC? The distribution is given by f ( x ) = c · g ( x ) where the unnormalized density g can be evaluated, but the normalising constant c cannot be evaluated (easily). 1 c = � X g ( x ) dx This is frequently the case in Bayesian statistics - the posterior density is proportional to the likelihood function Note (again) the similarity between simulation and evaluation of integrals DTU 02443 – lecture 8 6
Metropolis-Hastings algorithm Metropolis-Hastings algorithm • Proposal distribution h ( x , y ) • Acceptance of solution? The solution will be accepted with probability � � � � 1 , f ( y ) h ( y , x ) 1 , g ( y ) h ( y , x ) min = min f ( x ) h ( x , y ) g ( x ) h ( x , y ) • Avoiding the troublesome constant c ! • Frequently we apply a symmetric proposal distribution h ( y , x ) = h ( y , x ) Metropolis algorithm to get � � � � 1 , g ( y ) = min for h ( y , x ) = h ( x , y ) g ( x ) DTU 02443 – lecture 8 7
Metropolis Hastigs algorithm and local balance Metropolis Hastigs algorithm and local balance The transition rate q ( x , y ) from x to y and vice versa is � 1 , g ( y ) h ( y , x ) � q ( x , y ) = h ( x , y ) min g ( x ) h ( x , y ) and � � 1 , g ( x ) h ( x , y ) q ( y , x ) = h ( y , x ) min g ( y ) h ( y , x ) Suppose g ( y ) h ( y , x ) < g ( x ) h ( x , y ) then f ( x ) q ( x , y ) = cg ( x ) h ( x , y ) g ( y ) h ( y , x ) g ( x ) h ( x , y ) = cg ( y ) h ( y , x ) = f ( y ) q ( y , x ) DTU 02443 – lecture 8 8
Random Walk Metropolis-Hastings Random Walk Metropolis-Hastings A simple symmetric proposal distribution is the random walk 1. At iteration i , the state is X i 2. Propose to jump from X i to Y i = X i + ∆ X i where ∆ X i is sampled indepedently from a symmetric distribution • If g ( Y ) ≥ g ( X i ) , accept • If g ( Y ) ≤ g ( X i ) , accept w.p. g ( Y ) /g ( X i ) 3. On accept: Set X i +1 = Y i and goto 1. 4. On reject: Set X i +1 = X i and goto 1. DTU 02443 – lecture 8 9
Proposal distribution (Gelman 1998) Proposal distribution (Gelman 1998) • A good proposal distribution has the following properties ⋄ For any x , it is easy to sample from h ( x , y ) ⋄ It is easy to compute the accpetance probability ⋄ Each jump goes a reasonable distance in the parameter space ⋄ The proposals are not rejected too frequently DTU 02443 – lecture 8 10
Illustration of ordinary MCMC sampling Illustration of ordinary MCMC sampling A new proposal can be anywhere in the full region DTU 02443 – lecture 8 11
Gibss sampling Gibss sampling • Applies in multivariate cases where the conditional distribution among the coordinates are known. • For a multidimensional distribution x the Gibss sampler will modify only one coordinate at a time. • Typically d -steps in each iteration, where d is the dimension of the parameter space , that is of x DTU 02443 – lecture 8 12
Gibbs sampling - first dimension Gibbs sampling - first dimension • Each dimension is updated at a time. Here the first. DTU 02443 – lecture 8 13
Gibbs sampling - second dimension Gibbs sampling - second dimension • Each dimension is updated at a time. Here the second. DTU 02443 – lecture 8 14
Different perspective modelling with Markov Different perspective modelling with Markov chains as oppposed to MCMC sampling chains as oppposed to MCMC sampling • For an ordinary Markov chain we know P and find π - analytically or by simulation • When we apply MCMC ⋄ For a discrete distribution we have π = c α construct P which has no physical interpretation in general and obtain π by simulation ⋄ For a continuous distribution we have the density f ( x ) = cg ( x ) , construct a transition kernel P ( x , y ) and get f ( x ) by simulation. DTU 02443 – lecture 8 15
Remarks Remarks • The method is computer intensitive • It is hard to verify the assumptions (Read: impossible) • Warmup period strongly recommended (necessary indeed!) • The samples are correlated • Should be run several times with different starting conditions ⋄ Comparing within run variance with between run variance • Check the BUGS site: http://www.mrc-bsu.cam.ac.uk/bugs/and/or links given at the BUGS site DTU 02443 – lecture 8 16
Further reading Further reading • A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapmann & Hall 1998, ISBN 0 412 03991 5 • W.R. Gilks, S. Richarson, D.J. Spiegelhalter: Markov chain Mone Carlo in practice, Chapmann & Hall 1996, ISBN 0 412 05551 1 DTU 02443 – lecture 8 17
Beyond Random Walk Metropolis-Hastings Beyond Random Walk Metropolis-Hastings • Proposed points Y i can be generated with other schemes - this would change the acceptance probabilities. • In mulitvariate situations, we can process one co-ordinate at a time • If we know conditional distributions in the mulitvariate setting, then we can apply Gibbs sampling • This is well suited for graphical models with many variables, which each interact only with a few others • (Decision support systems is a big area of application) • Many hybrids and specialized versions exist • Very active research area, both theory and applications
Exercise 6: Markov Chain Monte Carlo Exercise 6: Markov Chain Monte Carlo simulation simulation 1. The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution A i i ! P ( i ) = , j = 0 , . . . m � m A j j =0 j ! Generate values from this distribution by applying the Metropolis-Hastings algorithm, verify with a χ 2 -test. You can use the parameter values from exercise 4. 2. For two different call types the joint number of occupied lines is given by A j A i P ( i, j ) = 1 1 2 0 ≤ i + j ≤ m K i ! j !
You can use A 1 , A 2 = 4 and n = 10 . (a) Use Metropolis-Hastings, directly to generate variates from this distribution. (b) Use Metropolis-Hastings, coordinate wise to generate variates from this distribution. (c) Use Gibbs sampling to sample from the distribution. This is (also) coordinate-wise but here we use the exact conditional distributions. You will need to find the conditional distributions analytically. In all three cases test the distribution fit with a χ 2 test The system can be extended to an arbitrary dimension, and we can add restrictions on the different call types.
Recommend
More recommend