Stochastic Simulation
Markov Chain Monte Carlo
Bo Friis Nielsen
Institute of Mathematical Modelling, Technical University of Denmark
2800 Kgs. Lyngby – Denmark
Email: bfni@dtu.dk
The queueing example

We simulated the system until "stochastic steady state". We were then able to describe this steady state:
• What is the distribution of occupied servers?
• What is the rejection probability?
The model was a "state machine", i.e. a Markov chain. To obtain steady-state statistics, we used stochastic simulation, i.e. Monte Carlo.
Discrete time Markov chains

• We observe a sequence of $X_n$'s taking values in some sample space.
• The next value $X_{n+1}$ is determined by a decision rule that depends on the value of $X_n$ only.
• For a discrete sample space we can express the decision rule as a matrix of transition probabilities $P = \{p_{ij}\}$, $p_{ij} = P(X_{n+1} = j \mid X_n = i)$.
• Under some technical assumptions we can find a stationary and limiting distribution $\pi$, with $\pi_j = P(X_\infty = j)$.
• This distribution can be found analytically by solving $\pi = \pi P$ (equilibrium distribution).
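As a concrete illustration (not from the slides), a minimal numpy sketch that finds the equilibrium distribution by solving $\pi = \pi P$ together with $\sum_j \pi_j = 1$; the transition matrix below is an arbitrary example.

```python
import numpy as np

# Example transition matrix (rows sum to 1); an arbitrary illustration.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Solve pi = pi P subject to sum(pi) = 1. The stationarity equations are
# pi (P - I) = 0; one of them is redundant, so replace it with the
# normalisation constraint.
n = P.shape[0]
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n); b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)           # stationary distribution
print(pi @ P - pi)  # ~0: pi is invariant under P
```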
Markov chains continued

• The theory can be extended to:
⋄ Continuous sample space, or
⋄ Continuous time: exercise 4 is an example of a continuous-time Markov chain.
The probability of X_n

• The behaviour of the process itself: $X_n$.
• The behaviour conditional on $X_0 = i$ is given by $(p_{ij}(n))$.
• Define $\mu_j^{(n)} = P(X_n = j)$, with $\mu_j^{(0)} = P(X_0 = j)$.
• With $\vec{\mu}^{(n)} = \{\mu_j^{(n)}\}$ we find
$$\vec{\mu}^{(n)} = \vec{\mu}^{(n-1)} P = \vec{\mu}^{(0)} P^n$$
Small example

$$P = \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix}$$

With $\vec{\mu}^{(0)} = \left(\tfrac{1}{3}, 0, 0, \tfrac{2}{3}\right)$ we get

$$\vec{\mu}^{(1)} = \vec{\mu}^{(0)} P = \left(\tfrac{1-p}{3},\; \tfrac{p}{3},\; \tfrac{2q}{3},\; \tfrac{2(1-q)}{3}\right)$$
and

With $\vec{\mu}^{(0)} = \left(\tfrac{1}{3}, 0, 0, \tfrac{2}{3}\right)$,

$$P^2 = \begin{pmatrix} (1-p)^2 + pq & (1-p)p & p^2 & 0 \\ q(1-p) & 2pq & 0 & p^2 \\ q^2 & 0 & 2pq & p(1-q) \\ 0 & q^2 & (1-q)q & (1-q)^2 + qp \end{pmatrix}$$
$$\vec{\mu}^{(2)} = \left(\tfrac{1}{3}, 0, 0, \tfrac{2}{3}\right) P^2 = \left( \tfrac{(1-p)^2 + pq}{3},\; \tfrac{(1-p)p + 2q^2}{3},\; \tfrac{p^2 + 2q(1-q)}{3},\; \tfrac{2\left(qp + (1-q)^2\right)}{3} \right)$$
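A quick numerical check of these hand calculations (a sketch; the values p = 0.3 and q = 0.4 are assumptions for illustration, not from the slides):

```python
import numpy as np

p, q = 0.3, 0.4          # assumed example values, not from the slides
P = np.array([[1-p,   p,   0,   0],
              [  q,   0,   p,   0],
              [  0,   q,   0,   p],
              [  0,   0,   q, 1-q]])
mu0 = np.array([1/3, 0, 0, 2/3])

# mu^(2) = mu^(0) P^2, computed numerically ...
mu2 = mu0 @ np.linalg.matrix_power(P, 2)
# ... and from the closed-form entries derived above.
mu2_formula = np.array([((1-p)**2 + p*q) / 3,
                        ((1-p)*p + 2*q**2) / 3,
                        (p**2 + 2*q*(1-q)) / 3,
                        2*(q*p + (1-q)**2) / 3])
print(np.allclose(mu2, mu2_formula))   # True
```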
MCMC: What we aim to achieve

We have a variable X with a "complicated" distribution. We cannot sample X directly. We aim to generate a sequence of $X_i$'s
• each of which has the same distribution as X,
• but which we allow to be interdependent.
This is an inverse problem relative to the queueing exercise: we start with the distribution of X and aim to design a state machine which has this distribution as its steady state.
MCMC example from Bayesian statistics

Prior distribution of the parameter: $P \sim U(0,1)$, i.e. $f_P(p) = 1$ for $0 \le p \le 1$.
Distribution of the data, conditional on the parameter: X for given P = p is Binomial(n, p), i.e. the data has the conditional probabilities
$$P(X = i \mid P = p) = \binom{n}{i} p^i (1-p)^{n-i}$$
The posterior distribution of P

Conditional density of the parameter, given observed data X = i:
$$f_{P \mid X=i}(p) = \frac{f_P(p)\, P(X = i \mid P = p)}{P(X = i)}$$
We need the unconditional probability of the observation:
$$P(X = i) = \int_0^1 f_P(p) \binom{n}{i} p^i (1-p)^{n-i} \, dp$$
We can evaluate this; in more complex models we could not.
AIM: To sample from $f_{P \mid X=i}$ without evaluating $P(X = i)$.
The posterior distribution

[Figure: left panel, time series of the sampled P over 100 iterations; right panel, histogram (frequency) of the samples Pvec.]
When to apply MCMC?

The distribution is given by $f(x) = c \cdot g(x)$, where the unnormalised density g can be evaluated, but the normalising constant
$$c = \frac{1}{\int_X g(x)\, dx}$$
cannot be evaluated (easily).
This is frequently the case in Bayesian statistics: the posterior density is proportional to the likelihood function.
Note (again) the similarity between simulation and evaluation of integrals.
Metropolis-Hastings algorithm

• Proposal distribution h(x, y).
• Acceptance of the proposal? The proposal will be accepted with probability
$$\min\left(1, \frac{f(y)\, h(y,x)}{f(x)\, h(x,y)}\right) = \min\left(1, \frac{g(y)\, h(y,x)}{g(x)\, h(x,y)}\right) = \min\left(1, \frac{g(y)}{g(x)}\right) \quad \text{for } h(y,x) = h(x,y)$$
• This avoids the troublesome constant c!
• Frequently we apply a symmetric proposal distribution, h(y, x) = h(x, y): the Metropolis algorithm.
• It can be shown that this Markov chain has f(x) as its stationary distribution.
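One way to code the general acceptance step above (a minimal sketch; `g`, `sample_h` and `h` are assumed to be user-supplied functions mirroring the slide's unnormalised density and proposal density):

```python
import numpy as np

rng = np.random.default_rng()

def mh_step(x, g, sample_h, h):
    """One Metropolis-Hastings transition.

    g:        unnormalised target density, g(x) proportional to f(x)
    sample_h: draws a proposal y given the current state x
    h:        proposal density h(x, y); only its ratio is used
    """
    y = sample_h(x)
    ratio = (g(y) * h(y, x)) / (g(x) * h(x, y))  # the constant c cancels
    return y if rng.random() < min(1.0, ratio) else x
```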
Random Walk Metropolis-Hastings

Sampling from the p.d.f. c · g(x), where c is unknown.
1. At iteration i, the state is $X_i$.
2. Propose a jump from $X_i$ to $Y_i = X_i + \Delta X_i$, where $\Delta X_i$ is sampled independently from a symmetric distribution.
• If $g(Y_i) \ge g(X_i)$, accept.
• If $g(Y_i) < g(X_i)$, accept with probability $g(Y_i)/g(X_i)$.
3. On accept: set $X_{i+1} = Y_i$ and go to 1.
4. On reject: set $X_{i+1} = X_i$ and go to 1.
Note that knowing c is not necessary!
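Putting this together with the Bayesian example above, a minimal random-walk sampler for the posterior $f_{P \mid X=i}(p) \propto p^i (1-p)^{n-i}$. The data values n = 10, i = 3 and the step size 0.1 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, i_obs = 10, 3        # assumed data, not from the slides

def g(p):                   # unnormalised posterior; zero outside (0, 1)
    return p**i_obs * (1 - p)**(n_obs - i_obs) if 0 < p < 1 else 0.0

x = 0.5                     # arbitrary starting value
samples = []
for _ in range(20_000):
    y = x + rng.normal(scale=0.1)        # symmetric random-walk proposal
    if rng.random() < min(1.0, g(y) / g(x)):
        x = y                            # accept
    samples.append(x)                    # on reject, x is repeated

burned = np.array(samples[2_000:])       # discard warm-up period
print(burned.mean())        # ~ (i+1)/(n+2) = 4/12, the Beta(4, 8) mean
```

Note the normalising constant $P(X = i)$ never appears: only the ratio g(y)/g(x) is evaluated.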
Proposal distribution (Gelman 1998)

• A good proposal distribution has the following properties:
⋄ For any x, it is easy to sample from h(x, y).
⋄ It is easy to compute the acceptance probability.
⋄ Each jump goes a reasonable distance in the parameter space.
⋄ The proposals are not rejected too frequently.
Gibbs sampling

• Applies in multivariate cases where the conditional distributions among the coordinates are known.
• For a multidimensional variable x, the Gibbs sampler modifies only one coordinate at a time (see the sketch below).
• Typically d steps in each iteration, where d is the dimension of the parameter space, that is, of x.
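As an illustration (not from the slides), a Gibbs sampler for a bivariate standard normal with an assumed correlation ρ, where both full conditionals are known normals: $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1-\rho^2)$ and symmetrically for $X_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                     # assumed correlation for the illustration
sd = np.sqrt(1 - rho**2)      # conditional standard deviation

x1, x2 = 0.0, 0.0             # arbitrary starting point
draws = np.empty((10_000, 2))
for t in range(10_000):
    # One iteration = d = 2 steps, one per coordinate:
    x1 = rng.normal(rho * x2, sd)   # sample x1 | x2
    x2 = rng.normal(rho * x1, sd)   # sample x2 | x1
    draws[t] = x1, x2

print(np.corrcoef(draws[1_000:].T)[0, 1])   # ~ rho after warm-up
```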
Illustration of ordinary and MCMC sampling

[Figure: ordinary sampling versus MCMC sampling.]
Gibbs sampling - first dimension

[Figure: Gibbs sampling step along the first dimension.]
Gibbs sampling - second dimension

[Figure: Gibbs sampling step along the second dimension.]
Direct Markov chain as opposed to MCMC

• For an ordinary Markov chain we know P and find π, analytically or by simulation.
• When we apply MCMC:
⋄ For a discrete distribution we know π up to a constant (Kπ), construct a P which in general has no physical interpretation, and obtain π by simulation.
⋄ For a continuous distribution we know g(x), construct a transition kernel P(x, y) and obtain f(x) by simulation.
Remarks

• The method is computationally intensive.
• It is hard to verify the assumptions (read: impossible).
• A warm-up period is strongly recommended (indeed necessary!).
• The samples are correlated.
• The sampler should be run several times with different starting conditions:
⋄ comparing within-run variance with between-run variance, as sketched below.
• Check the BUGS site, http://www.mrc-bsu.cam.ac.uk/bugs/, and/or the links given at the BUGS site.
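The within-run versus between-run comparison is usually done with the Gelman-Rubin statistic; a minimal sketch (standard formula, not from the slides) for m chains of equal length, where values near 1 suggest the runs have mixed:

```python
import numpy as np

def gelman_rubin(chains):
    """chains: (m, n) array, m independent runs of length n."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-run variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-run variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)                # R-hat, ~1 at convergence
```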
Further reading

• A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapman & Hall 1998, ISBN 0 412 03991 5
• W.R. Gilks, S. Richardson, D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice, Chapman & Hall 1996, ISBN 0 412 05551 1
Beyond Random Walk Metropolis-Hastings

• Proposed points $Y_i$ can be generated with other schemes; this changes the acceptance probabilities.
• In multivariate situations, we can process one coordinate at a time.
• If we know the conditional distributions in the multivariate setting, then we can apply Gibbs sampling.
• This is well suited for graphical models with many variables, each of which interacts only with a few others.
• (Decision support systems are a big area of application.)
• Many hybrids and specialised versions exist.
• Very active research area, both in theory and applications.
Exercise 6: Markov Chain Monte Carlo simulation

• The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution
$$P(i) = \frac{A^i / i!}{\sum_{j=0}^{n} A^j / j!}$$
• Generate values from this distribution by applying the Metropolis-Hastings algorithm; verify with a χ²-test. You can use the parameter values from exercise 4.
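One possible skeleton for this exercise (a sketch; the values A = 8 and n = 10 are placeholders, substitute the parameters from exercise 4):

```python
import numpy as np
from math import factorial
from scipy.stats import chi2

rng = np.random.default_rng(2)
A, n = 8.0, 10   # assumed values; use the parameters from exercise 4

def g(i):        # unnormalised truncated Poisson weight A^i / i!
    return A**i / factorial(i) if 0 <= i <= n else 0.0

x = 0
counts = np.zeros(n + 1)
for _ in range(100_000):
    y = x + rng.choice([-1, 1])              # symmetric +/-1 proposal
    if rng.random() < min(1.0, g(y) / g(x)):
        x = y                                # reject keeps the old state
    counts[x] += 1

# Chi-square goodness-of-fit against the exact truncated Poisson.
weights = np.array([g(i) for i in range(n + 1)])
expected = counts.sum() * weights / weights.sum()
T = ((counts - expected)**2 / expected).sum()
print(T, chi2.sf(T, df=n))  # n+1 cells minus one constraint; the samples
                            # are correlated, so treat the p-value as
                            # indicative only
```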