

  1. Advanced Simulation - Lecture 6. Patrick Rebeschini. January 31st, 2018.

  2. Markov chain Monte Carlo. We are interested in sampling from a distribution $\pi$, for instance a posterior distribution in a Bayesian framework. Markov chains with $\pi$ as invariant distribution can be constructed to approximate expectations with respect to $\pi$. For example, the Gibbs sampler generates a Markov chain targeting $\pi$ defined on $\mathbb{R}^d$ using the full conditionals $\pi(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$.

  3. Gibbs Sampling. Assume you are interested in sampling from $\pi(x) = \pi(x_1, x_2, \ldots, x_d)$. Notation: $x_{-i} := (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$.

  Systematic scan Gibbs sampler. Let $(X_1^{(1)}, \ldots, X_d^{(1)})$ be the initial state, then iterate for $t = 2, 3, \ldots$:
  1. Sample $X_1^{(t)} \sim \pi_{X_1 \mid X_{-1}}(\cdot \mid X_2^{(t-1)}, \ldots, X_d^{(t-1)})$.
  $\cdots$
  j. Sample $X_j^{(t)} \sim \pi_{X_j \mid X_{-j}}(\cdot \mid X_1^{(t)}, \ldots, X_{j-1}^{(t)}, X_{j+1}^{(t-1)}, \ldots, X_d^{(t-1)})$.
  $\cdots$
  d. Sample $X_d^{(t)} \sim \pi_{X_d \mid X_{-d}}(\cdot \mid X_1^{(t)}, \ldots, X_{d-1}^{(t)})$.
  (A generic implementation sketch is given below.)
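A minimal Python sketch of the systematic scan, assuming the user supplies a function `sample_conditional(j, x, rng)` that draws from the full conditional $\pi_{X_j \mid X_{-j}}$; this helper and the example target below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def systematic_scan_gibbs(sample_conditional, x0, n_iter, rng=None):
    """Generic systematic-scan Gibbs sampler (sketch).

    sample_conditional(j, x, rng) is assumed to return a draw from the full
    conditional pi(x_j | x_{-j}) given the current state x.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    chain = np.empty((n_iter, d))
    chain[0] = x0
    x = np.array(x0, dtype=float)
    for t in range(1, n_iter):
        for j in range(d):          # update components in a fixed order
            x[j] = sample_conditional(j, x, rng)
        chain[t] = x
    return chain

# Example usage with a trivial target of independent N(0, 1) components,
# whose full conditionals are all N(0, 1):
chain = systematic_scan_gibbs(lambda j, x, rng: rng.standard_normal(),
                              [0.0, 0.0, 0.0], 1000)
```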

  4. Gibbs Sampling. Is the joint distribution $\pi$ uniquely specified by the conditional distributions $\pi_{X_i \mid X_{-i}}$? Does the Gibbs sampler provide a Markov chain with the correct stationary distribution $\pi$? If yes, does the Markov chain converge towards this invariant distribution? It will turn out to be the case under some mild conditions.

  5. Hammersley-Clifford Theorem.

  Theorem. Consider a distribution whose density $\pi(x_1, x_2, \ldots, x_d)$ is such that $\operatorname{supp}(\pi) = \bigotimes_{i=1}^d \operatorname{supp}(\pi_{X_i})$. Then for any $(z_1, \ldots, z_d) \in \operatorname{supp}(\pi)$, we have
  $$\pi(x_1, x_2, \ldots, x_d) \propto \prod_{j=1}^d \frac{\pi_{X_j \mid X_{-j}}(x_j \mid x_{1:j-1}, z_{j+1:d})}{\pi_{X_j \mid X_{-j}}(z_j \mid x_{1:j-1}, z_{j+1:d})}.$$
  Proof: we have
  $$\pi(x_{1:d-1}, x_d) = \pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1}) \, \pi(x_{1:d-1}),$$
  $$\pi(x_{1:d-1}, z_d) = \pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1}) \, \pi(x_{1:d-1}).$$
  Therefore
  $$\pi(x_{1:d}) = \pi(x_{1:d-1}, z_d) \, \frac{\pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1})}{\pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1})}.$$

  6. Hammersley-Clifford Theorem. Similarly, we have
  $$\pi(x_{1:d-1}, z_d) = \pi_{X_{d-1} \mid X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d) \, \pi(x_{1:d-2}, z_d),$$
  $$\pi(x_{1:d-2}, z_{d-1}, z_d) = \pi_{X_{d-1} \mid X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d) \, \pi(x_{1:d-2}, z_d),$$
  hence
  $$\pi(x_{1:d}) = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi_{X_{d-1} \mid X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)}{\pi_{X_{d-1} \mid X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)} \times \frac{\pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1})}{\pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1})}.$$
  By iterating, we obtain the theorem, where the multiplicative constant is exactly $\pi(z_1, \ldots, z_d)$. (A numerical check of the $d = 2$ identity is sketched below.)
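As a sanity check, not taken from the slides, the $d = 2$ version of the identity can be verified numerically for a bivariate normal target, whose full conditionals are known in closed form. The correlation value and the reference point $(z_1, z_2)$ below are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.7
# Full conditional of either coordinate given the other, for a standard
# bivariate normal with correlation rho: N(rho * other, 1 - rho^2).
cond = lambda a, b: norm.pdf(a, loc=rho * b, scale=np.sqrt(1 - rho**2))
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

z1, z2 = 0.3, -0.8                      # arbitrary reference point in supp(pi)
rng = np.random.default_rng(0)
for x1, x2 in rng.normal(size=(5, 2)):
    hc = (cond(x1, z2) / cond(z1, z2)) * (cond(x2, x1) / cond(z2, x1))
    # The ratio of the Hammersley-Clifford product to the true joint density
    # should be the same constant, 1 / pi(z1, z2), at every point.
    print(hc / joint.pdf([x1, x2]))
```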

  7. Example: Non-Integrable Target. Consider the following conditionals on $\mathbb{R}^+$:
  $$\pi_{X_1 \mid X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1), \qquad \pi_{X_2 \mid X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2).$$
  We might expect that these full conditionals define a joint probability density $\pi(x_1, x_2)$. Hammersley-Clifford would give
  $$\pi(x_1, x_2) \propto \frac{\pi_{X_1 \mid X_2}(x_1 \mid z_2)}{\pi_{X_1 \mid X_2}(z_1 \mid z_2)} \, \frac{\pi_{X_2 \mid X_1}(x_2 \mid x_1)}{\pi_{X_2 \mid X_1}(z_2 \mid x_1)} = \frac{z_2 \exp(-z_2 x_1) \, x_1 \exp(-x_1 x_2)}{z_2 \exp(-z_2 z_1) \, x_1 \exp(-x_1 z_2)} \propto \exp(-x_1 x_2).$$
  However $\int\!\!\int \exp(-x_1 x_2) \, dx_1 \, dx_2$ is not finite, so $\pi_{X_1 \mid X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1)$ and $\pi_{X_2 \mid X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2)$ are not compatible.
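A minimal sketch, not from the slides, of what goes wrong in practice: both full conditionals are exponential distributions, so the corresponding "Gibbs sampler" can still be run, but the chain it produces has no invariant probability distribution; the log of the first coordinate performs a random walk and never settles down.

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = 1.0, 1.0
log_x1 = np.empty(100_000)
for t in range(100_000):
    x1 = rng.exponential(1.0 / x2)   # X1 | X2 = x2 ~ Exp(rate x2)
    x2 = rng.exponential(1.0 / x1)   # X2 | X1 = x1 ~ Exp(rate x1)
    log_x1[t] = np.log(x1)
# log(X1) drifts like a random walk: its range keeps growing with the number
# of iterations, consistent with the absence of an integrable target.
print(log_x1.min(), log_x1.max())
```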

  8. Example: Positivity Condition Violated.
  Figure: Gibbs sampling targeting $\pi(x, y) \propto \mathbf{1}_{[-1,0] \times [-1,0] \,\cup\, [0,1] \times [0,1]}(x, y)$.
  Here $\operatorname{supp}(\pi)$ is not the product of the marginal supports, and a Gibbs sampler started in one of the two squares can never reach the other: the chain is not irreducible. (A short simulation is sketched below.)
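A minimal illustration of this reducibility for the two-squares target above (the code itself is not from the slides): each full conditional is uniform on $[0,1]$ or on $[-1,0]$ depending on the sign of the other coordinate, so a chain started in the upper-right square stays there forever.

```python
import numpy as np

rng = np.random.default_rng(2)

def cond_sample(other, rng):
    # Full conditional for the two-squares target: uniform on [0, 1] if the
    # other coordinate is positive, uniform on [-1, 0] if it is negative.
    return rng.uniform(0.0, 1.0) if other > 0 else rng.uniform(-1.0, 0.0)

x, y = 0.5, 0.5            # start in the upper-right square
chain = np.empty((1000, 2))
for t in range(1000):
    x = cond_sample(y, rng)
    y = cond_sample(x, rng)
    chain[t] = (x, y)
# Both coordinates stay in [0, 1]: the lower-left square is never visited.
print(chain.min(axis=0))
```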

  9. Invariance of the Gibbs Sampler. The kernel of the Gibbs sampler (case $d = 2$) is
  $$K(x^{(t-1)}, x^{(t)}) = \pi_{X_1 \mid X_2}(x_1^{(t)} \mid x_2^{(t-1)}) \, \pi_{X_2 \mid X_1}(x_2^{(t)} \mid x_1^{(t)}).$$
  Case $d > 2$:
  $$K(x^{(t-1)}, x^{(t)}) = \prod_{j=1}^d \pi_{X_j \mid X_{-j}}(x_j^{(t)} \mid x_{1:j-1}^{(t)}, x_{j+1:d}^{(t-1)}).$$
  Proposition: the systematic scan Gibbs sampler kernel admits $\pi$ as invariant distribution.
  Proof for $d = 2$. We have
  $$\int K(x, y) \pi(x) \, dx = \int \pi(y_2 \mid y_1) \pi(y_1 \mid x_2) \pi(x_1, x_2) \, dx_1 dx_2 = \pi(y_2 \mid y_1) \int \pi(y_1 \mid x_2) \pi(x_2) \, dx_2 = \pi(y_2 \mid y_1) \pi(y_1) = \pi(y_1, y_2) = \pi(y).$$
  (A simple Monte Carlo check of invariance is sketched below.)
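As an empirical illustration, not in the slides: start from exact draws of a bivariate normal $\pi$, apply one systematic-scan Gibbs step to each draw, and check that the sample covariance is unchanged. The correlation value below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])

# Exact draws from pi = N(0, cov).
X = rng.multivariate_normal(np.zeros(2), cov, size=200_000)

# One systematic-scan Gibbs step applied to each exact draw.
Y = X.copy()
s = np.sqrt(1 - rho**2)
Y[:, 0] = rho * X[:, 1] + s * rng.standard_normal(len(X))   # X1 | X2
Y[:, 1] = rho * Y[:, 0] + s * rng.standard_normal(len(X))   # X2 | new X1
print(np.cov(X.T))   # close to cov
print(np.cov(Y.T))   # close to cov as well: one kernel step leaves pi invariant
```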

  10. Irreducibility and Recurrence.
  Proposition: Assume $\pi$ satisfies the positivity condition; then the Gibbs sampler yields a $\pi$-irreducible and recurrent Markov chain.
  Theorem: Assume the positivity condition is satisfied; then for any integrable function $\varphi : \mathsf{X} \to \mathbb{R}$,
  $$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi\!\left(X^{(i)}\right) = \int_{\mathsf{X}} \varphi(x) \, \pi(x) \, dx$$
  for $\pi$-almost all starting values $X^{(1)}$.

  11. Example: Bivariate Normal Distribution. Let $X := (X_1, X_2) \sim \mathcal{N}(\mu, \Sigma)$ where $\mu = (\mu_1, \mu_2)$ and
  $$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \\ \rho & \sigma_2^2 \end{pmatrix}.$$
  The Gibbs sampler proceeds as follows in this case:
  1. Sample $X_1^{(t)} \sim \mathcal{N}\!\left(\mu_1 + \rho/\sigma_2^2 \left(X_2^{(t-1)} - \mu_2\right),\; \sigma_1^2 - \rho^2/\sigma_2^2\right)$.
  2. Sample $X_2^{(t)} \sim \mathcal{N}\!\left(\mu_2 + \rho/\sigma_1^2 \left(X_1^{(t)} - \mu_1\right),\; \sigma_2^2 - \rho^2/\sigma_1^2\right)$.
  By proceeding this way, we generate a Markov chain $(X^{(t)})$ whose successive samples are correlated. If successive values of $X^{(t)}$ are strongly correlated, then we say that the Markov chain mixes slowly. (A sketch implementation follows.)
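A minimal implementation sketch of this two-component sampler. The parameter and starting values below are illustrative; with unit variances the off-diagonal term $\rho$ is also the correlation, matching the $\rho = 0.1$ and $\rho = 0.99$ cases shown in the following figures.

```python
import numpy as np

def gibbs_bvn(mu, sigma1, sigma2, rho, n_iter, x0, rng):
    """Systematic-scan Gibbs sampler for N(mu, [[sigma1^2, rho], [rho, sigma2^2]]).
    Here rho is the off-diagonal covariance, as on the slide."""
    chain = np.empty((n_iter, 2))
    x1, x2 = x0
    for t in range(n_iter):
        x1 = rng.normal(mu[0] + rho / sigma2**2 * (x2 - mu[1]),
                        np.sqrt(sigma1**2 - rho**2 / sigma2**2))
        x2 = rng.normal(mu[1] + rho / sigma1**2 * (x1 - mu[0]),
                        np.sqrt(sigma2**2 - rho**2 / sigma1**2))
        chain[t] = (x1, x2)
    return chain

rng = np.random.default_rng(4)
for rho in (0.1, 0.99):
    chain = gibbs_bvn((0.0, 0.0), 1.0, 1.0, rho, 10_000, (2.0, 2.0), rng)
    # Ergodic average of X1 (true value 0): converges more slowly when rho is
    # close to 1, i.e. when the chain mixes slowly.
    print(rho, chain[:, 0].mean())
```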

  12. Bivariate Normal Distribution. Figure: trajectory of the chain, case where $\rho = 0.1$, first 100 steps.

  13. Bivariate Normal Distribution. Figure: trajectory of the chain, case where $\rho = 0.99$, first 100 steps.

  14. Bivariate Normal Distribution. Figure: histogram of the first component of the chain after 1,000 iterations; small $\rho$ on the left, large $\rho$ on the right.

  15. Bivariate Normal Distribution. Figure: histogram of the first component of the chain after 10,000 iterations; small $\rho$ on the left, large $\rho$ on the right.

  16. Bivariate Normal Distribution. Figure: histogram of the first component of the chain after 100,000 iterations; small $\rho$ on the left, large $\rho$ on the right.

  17. Gibbs Sampling and Auxiliary Variables. Gibbs sampling requires sampling from $\pi_{X_j \mid X_{-j}}$. In many scenarios, we can introduce a set of auxiliary variables $Z_1, \ldots, Z_p$ and work with an "extended" distribution of joint density $\pi(x_1, \ldots, x_d, z_1, \ldots, z_p)$ such that
  $$\int \pi(x_1, \ldots, x_d, z_1, \ldots, z_p) \, dz_1 \cdots dz_p = \pi(x_1, \ldots, x_d),$$
  and whose full conditionals are easy to sample. Examples: mixture models, capture-recapture models, Tobit models, probit models, etc.

  18. Mixtures of Normals. Figure: densities of three normal populations and of the resulting mixture. Independent data $y_1, \ldots, y_n$ with
  $$Y_i \mid \theta \sim \sum_{k=1}^K p_k \, \mathcal{N}(\mu_k, \sigma_k^2),$$
  where $\theta = (p_1, \ldots, p_K, \mu_1, \ldots, \mu_K, \sigma_1^2, \ldots, \sigma_K^2)$. (A data-augmentation Gibbs sketch for this model is given below.)
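Below is a sketch of a data-augmentation Gibbs sampler for this mixture, introducing allocation variables $z_i$ as auxiliary variables. The conjugate priors (Dirichlet on the weights, normal on the means, inverse-gamma on the variances) and all hyperparameter values are assumptions made here for illustration, not taken from the lecture.

```python
import numpy as np

def gibbs_mixture(y, K, n_iter, rng, alpha=1.0, m0=0.0, v0=10.0, a0=2.0, b0=1.0):
    """Data-augmentation Gibbs sampler sketch for a K-component normal mixture.
    Assumed priors: p ~ Dirichlet(alpha, ..., alpha), mu_k ~ N(m0, v0),
    sigma_k^2 ~ InvGamma(a0, b0)."""
    n = len(y)
    p = np.full(K, 1.0 / K)
    mu = rng.normal(y.mean(), y.std(), size=K)
    sig2 = np.full(K, y.var())
    draws = []
    for _ in range(n_iter):
        # 1. Allocations: P(z_i = k) propto p_k * N(y_i; mu_k, sig2_k).
        logw = (np.log(p) - 0.5 * np.log(2 * np.pi * sig2)
                - 0.5 * (y[:, None] - mu) ** 2 / sig2)
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        z = (w.cumsum(axis=1) > rng.uniform(size=(n, 1))).argmax(axis=1)
        # 2. Weights: p | z ~ Dirichlet(alpha + counts).
        counts = np.bincount(z, minlength=K)
        p = rng.dirichlet(alpha + counts)
        # 3. Means and variances, component by component (conjugate updates).
        for k in range(K):
            yk = y[z == k]
            prec = 1.0 / v0 + len(yk) / sig2[k]
            mu[k] = rng.normal((m0 / v0 + yk.sum() / sig2[k]) / prec,
                               np.sqrt(1.0 / prec))
            a_post = a0 + 0.5 * len(yk)
            b_post = b0 + 0.5 * ((yk - mu[k]) ** 2).sum()
            sig2[k] = 1.0 / rng.gamma(a_post, 1.0 / b_post)
        draws.append((p.copy(), mu.copy(), sig2.copy()))
    return draws

# Example usage on synthetic two-component data:
rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(-1.0, 0.5, 300), rng.normal(1.0, 0.3, 700)])
draws = gibbs_mixture(y, K=2, n_iter=2000, rng=rng)
```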
