Stochastic Simulation
Markov Chain Monte Carlo
Bo Friis Nielsen
Institute of Mathematical Modelling, Technical University of Denmark
2800 Kgs. Lyngby – Denmark
Email: bfni@dtu.dk
The queueing example

We simulated the system until "stochastic steady state". We were then able to describe this steady state:
• What is the distribution of occupied servers?
• What is the rejection probability?
The model was a "state machine", i.e. a Markov chain. To obtain steady-state statistics, we used stochastic simulation, i.e. Monte Carlo.
Discrete time Markov chains

• We observe a sequence of $X_n$'s taking values in some sample space.
• The next value $X_{n+1}$ is determined by a decision rule that depends on the value of $X_n$ only.
• For a discrete sample space we can express the decision rule as a matrix of transition probabilities $P = \{p_{ij}\}$, $p_{ij} = P(X_{n+1} = j \mid X_n = i)$.
• Under some technical assumptions we can find a stationary and limiting distribution $\pi$, with $\pi_j = P(X_\infty = j)$.
• This distribution can be found analytically by solving $\pi = \pi P$ (equilibrium distribution).
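As a concrete illustration (not from the slides), a minimal numpy sketch that finds the equilibrium distribution by solving $\pi = \pi P$ together with $\sum_j \pi_j = 1$; the transition matrix below is an arbitrary example.

```python
import numpy as np

# Example transition matrix (rows sum to 1); an arbitrary illustration.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Solve pi = pi P subject to sum(pi) = 1. The stationarity equations are
# pi (P - I) = 0; one of them is redundant, so replace it with the
# normalisation constraint.
n = P.shape[0]
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n); b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)           # stationary distribution
print(pi @ P - pi)  # ~0: pi is invariant under P
```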
Markov chains continued

• The theory can be extended to:
⋄ Continuous sample space, or
⋄ Continuous time: exercise 4 is an example of a continuous-time Markov chain.
The probability of X_n

• The behaviour of the process itself: $X_n$.
• The behaviour conditional on $X_0 = i$ is given by $(p_{ij}(n))$.
• Define $\mu_j^{(n)} = P(X_n = j)$, with $\mu_j^{(0)} = P(X_0 = j)$.
• With $\vec{\mu}^{(n)} = \{\mu_j^{(n)}\}$ we find
$$\vec{\mu}^{(n)} = \vec{\mu}^{(n-1)} P = \vec{\mu}^{(0)} P^n$$
Small example

$$P = \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix}$$

With $\vec{\mu}^{(0)} = \left(\tfrac{1}{3}, 0, 0, \tfrac{2}{3}\right)$ we get

$$\vec{\mu}^{(1)} = \vec{\mu}^{(0)} P = \left(\tfrac{1-p}{3},\; \tfrac{p}{3},\; \tfrac{2q}{3},\; \tfrac{2(1-q)}{3}\right)$$
and

With $\vec{\mu}^{(0)} = \left(\tfrac{1}{3}, 0, 0, \tfrac{2}{3}\right)$,

$$P^2 = \begin{pmatrix} (1-p)^2 + pq & (1-p)p & p^2 & 0 \\ q(1-p) & 2pq & 0 & p^2 \\ q^2 & 0 & 2pq & p(1-q) \\ 0 & q^2 & (1-q)q & (1-q)^2 + qp \end{pmatrix}$$
$$\vec{\mu}^{(2)} = \left(\tfrac{1}{3}, 0, 0, \tfrac{2}{3}\right) P^2 = \left( \tfrac{(1-p)^2 + pq}{3},\; \tfrac{(1-p)p + 2q^2}{3},\; \tfrac{p^2 + 2q(1-q)}{3},\; \tfrac{2\left(qp + (1-q)^2\right)}{3} \right)$$
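A quick numerical check of these hand calculations (a sketch; the values p = 0.3 and q = 0.4 are assumptions for illustration, not from the slides):

```python
import numpy as np

p, q = 0.3, 0.4          # assumed example values, not from the slides
P = np.array([[1-p,   p,   0,   0],
              [  q,   0,   p,   0],
              [  0,   q,   0,   p],
              [  0,   0,   q, 1-q]])
mu0 = np.array([1/3, 0, 0, 2/3])

# mu^(2) = mu^(0) P^2, computed numerically ...
mu2 = mu0 @ np.linalg.matrix_power(P, 2)
# ... and from the closed-form entries derived above.
mu2_formula = np.array([((1-p)**2 + p*q) / 3,
                        ((1-p)*p + 2*q**2) / 3,
                        (p**2 + 2*q*(1-q)) / 3,
                        2*(q*p + (1-q)**2) / 3])
print(np.allclose(mu2, mu2_formula))   # True
```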
MCMC: What we aim to achieve

We have a variable X with a "complicated" distribution. We cannot sample X directly. We aim to generate a sequence of $X_i$'s
• each of which has the same distribution as X,
• but which we allow to be interdependent.
This is an inverse problem relative to the queueing exercise: we start with the distribution of X and aim to design a state machine which has this distribution as its steady state.
MCMC example from Bayesian statistics

Prior distribution of the parameter: $P \sim U(0,1)$, i.e. $f_P(p) = 1$ for $0 \le p \le 1$.
Distribution of the data, conditional on the parameter: X for given P = p is Binomial(n, p), i.e. the data has the conditional probabilities
$$P(X = i \mid P = p) = \binom{n}{i} p^i (1-p)^{n-i}$$
The posterior distribution of P

Conditional density of the parameter, given observed data X = i:
$$f_{P \mid X=i}(p) = \frac{f_P(p)\, P(X = i \mid P = p)}{P(X = i)}$$
We need the unconditional probability of the observation:
$$P(X = i) = \int_0^1 f_P(p) \binom{n}{i} p^i (1-p)^{n-i} \, dp$$
We can evaluate this; in more complex models we could not.
AIM: To sample from $f_{P \mid X=i}$ without evaluating $P(X = i)$.
The posterior distribution

[Figure: left panel, time series of the sampled P over 100 iterations; right panel, histogram (frequency) of the samples Pvec.]
When to apply MCMC?

The distribution is given by $f(x) = c \cdot g(x)$, where the unnormalised density g can be evaluated, but the normalising constant
$$c = \frac{1}{\int_X g(x)\, dx}$$
cannot be evaluated (easily).
This is frequently the case in Bayesian statistics: the posterior density is proportional to the likelihood function.
Note (again) the similarity between simulation and evaluation of integrals.
Metropolis-Hastings algorithm

• Proposal distribution h(x, y).
• Acceptance of the proposal? The proposal will be accepted with probability
$$\min\left(1, \frac{f(y)\, h(y,x)}{f(x)\, h(x,y)}\right) = \min\left(1, \frac{g(y)\, h(y,x)}{g(x)\, h(x,y)}\right) = \min\left(1, \frac{g(y)}{g(x)}\right) \quad \text{for } h(y,x) = h(x,y)$$
• This avoids the troublesome constant c!
• Frequently we apply a symmetric proposal distribution, h(y, x) = h(x, y): the Metropolis algorithm.
• It can be shown that this Markov chain has f(x) as its stationary distribution.
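One way to code the general acceptance step above (a minimal sketch; `g`, `sample_h` and `h` are assumed to be user-supplied functions mirroring the slide's unnormalised density and proposal density):

```python
import numpy as np

rng = np.random.default_rng()

def mh_step(x, g, sample_h, h):
    """One Metropolis-Hastings transition.

    g:        unnormalised target density, g(x) proportional to f(x)
    sample_h: draws a proposal y given the current state x
    h:        proposal density h(x, y); only its ratio is used
    """
    y = sample_h(x)
    ratio = (g(y) * h(y, x)) / (g(x) * h(x, y))  # the constant c cancels
    return y if rng.random() < min(1.0, ratio) else x
```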
Random Walk Metropolis-Hastings

Sampling from the p.d.f. c · g(x), where c is unknown.
1. At iteration i, the state is $X_i$.
2. Propose a jump from $X_i$ to $Y_i = X_i + \Delta X_i$, where $\Delta X_i$ is sampled independently from a symmetric distribution.
• If $g(Y_i) \ge g(X_i)$, accept.
• If $g(Y_i) < g(X_i)$, accept with probability $g(Y_i)/g(X_i)$.
3. On accept: set $X_{i+1} = Y_i$ and go to 1.
4. On reject: set $X_{i+1} = X_i$ and go to 1.
Note that knowing c is not necessary!
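Putting this together with the Bayesian example above, a minimal random-walk sampler for the posterior $f_{P \mid X=i}(p) \propto p^i (1-p)^{n-i}$. The data values n = 10, i = 3 and the step size 0.1 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, i_obs = 10, 3        # assumed data, not from the slides

def g(p):                   # unnormalised posterior; zero outside (0, 1)
    return p**i_obs * (1 - p)**(n_obs - i_obs) if 0 < p < 1 else 0.0

x = 0.5                     # arbitrary starting value
samples = []
for _ in range(20_000):
    y = x + rng.normal(scale=0.1)        # symmetric random-walk proposal
    if rng.random() < min(1.0, g(y) / g(x)):
        x = y                            # accept
    samples.append(x)                    # on reject, x is repeated

burned = np.array(samples[2_000:])       # discard warm-up period
print(burned.mean())        # ~ (i+1)/(n+2) = 4/12, the Beta(4, 8) mean
```

Note the normalising constant $P(X = i)$ never appears: only the ratio g(y)/g(x) is evaluated.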
Proposal distribution (Gelman 1998)

• A good proposal distribution has the following properties:
⋄ For any x, it is easy to sample from h(x, y).
⋄ It is easy to compute the acceptance probability.
⋄ Each jump goes a reasonable distance in the parameter space.
⋄ The proposals are not rejected too frequently.
Gibbs sampling

• Applies in multivariate cases where the conditional distributions among the coordinates are known.
• For a multidimensional variable x, the Gibbs sampler modifies only one coordinate at a time (see the sketch below).
• Typically d steps in each iteration, where d is the dimension of the parameter space, that is, of x.
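As an illustration (not from the slides), a Gibbs sampler for a bivariate standard normal with an assumed correlation ρ, where both full conditionals are known normals: $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1-\rho^2)$ and symmetrically for $X_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                     # assumed correlation for the illustration
sd = np.sqrt(1 - rho**2)      # conditional standard deviation

x1, x2 = 0.0, 0.0             # arbitrary starting point
draws = np.empty((10_000, 2))
for t in range(10_000):
    # One iteration = d = 2 steps, one per coordinate:
    x1 = rng.normal(rho * x2, sd)   # sample x1 | x2
    x2 = rng.normal(rho * x1, sd)   # sample x2 | x1
    draws[t] = x1, x2

print(np.corrcoef(draws[1_000:].T)[0, 1])   # ~ rho after warm-up
```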
Illustration of ordinary and MCMC sampling

[Figure: ordinary sampling versus MCMC sampling.]
Gibbs sampling - first dimension

[Figure: Gibbs sampling step along the first dimension.]
Gibbs sampling - second dimension

[Figure: Gibbs sampling step along the second dimension.]
Direct Markov chain as opposed to MCMC

• For an ordinary Markov chain we know P and find π, analytically or by simulation.
• When we apply MCMC:
⋄ For a discrete distribution we know π up to a constant (Kπ), construct a P which in general has no physical interpretation, and obtain π by simulation.
⋄ For a continuous distribution we know g(x), construct a transition kernel P(x, y) and obtain f(x) by simulation.
Remarks

• The method is computationally intensive.
• It is hard to verify the assumptions (read: impossible).
• A warm-up period is strongly recommended (indeed necessary!).
• The samples are correlated.
• The sampler should be run several times with different starting conditions:
⋄ comparing within-run variance with between-run variance, as sketched below.
• Check the BUGS site, http://www.mrc-bsu.cam.ac.uk/bugs/, and/or the links given at the BUGS site.
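The within-run versus between-run comparison is usually done with the Gelman-Rubin statistic; a minimal sketch (standard formula, not from the slides) for m chains of equal length, where values near 1 suggest the runs have mixed:

```python
import numpy as np

def gelman_rubin(chains):
    """chains: (m, n) array, m independent runs of length n."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-run variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-run variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)                # R-hat, ~1 at convergence
```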
Further reading

• A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapman & Hall 1998, ISBN 0 412 03991 5
• W.R. Gilks, S. Richardson, D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice, Chapman & Hall 1996, ISBN 0 412 05551 1
Beyond Random Walk Metropolis-Hastings

• Proposed points $Y_i$ can be generated with other schemes; this changes the acceptance probabilities.
• In multivariate situations, we can process one coordinate at a time.
• If we know the conditional distributions in the multivariate setting, then we can apply Gibbs sampling.
• This is well suited for graphical models with many variables, each of which interacts only with a few others.
• (Decision support systems are a big area of application.)
• Many hybrids and specialised versions exist.
• Very active research area, both in theory and applications.
Exercise 6: Markov Chain Monte Carlo simulation

• The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution
$$P(i) = \frac{A^i / i!}{\sum_{j=0}^{n} A^j / j!}$$
• Generate values from this distribution by applying the Metropolis-Hastings algorithm; verify with a χ²-test. You can use the parameter values from exercise 4.
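One possible skeleton for this exercise (a sketch; the values A = 8 and n = 10 are placeholders, substitute the parameters from exercise 4):

```python
import numpy as np
from math import factorial
from scipy.stats import chi2

rng = np.random.default_rng(2)
A, n = 8.0, 10   # assumed values; use the parameters from exercise 4

def g(i):        # unnormalised truncated Poisson weight A^i / i!
    return A**i / factorial(i) if 0 <= i <= n else 0.0

x = 0
counts = np.zeros(n + 1)
for _ in range(100_000):
    y = x + rng.choice([-1, 1])              # symmetric +/-1 proposal
    if rng.random() < min(1.0, g(y) / g(x)):
        x = y                                # reject keeps the old state
    counts[x] += 1

# Chi-square goodness-of-fit against the exact truncated Poisson.
weights = np.array([g(i) for i in range(n + 1)])
expected = counts.sum() * weights / weights.sum()
T = ((counts - expected)**2 / expected).sum()
print(T, chi2.sf(T, df=n))  # n+1 cells minus one constraint; the samples
                            # are correlated, so treat the p-value as
                            # indicative only
```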