

  1. Advanced Simulation - Lecture 5. George Deligiannidis, February 1st, 2016.

  2. Irreducibility and aperiodicity.

  Definition. Given a distribution $\mu$ over $\mathsf{X}$, a Markov chain is $\mu$-irreducible if
  $$\forall x \in \mathsf{X}, \; \forall A : \mu(A) > 0, \; \exists t \in \mathbb{N} : \quad K^t(x, A) > 0.$$
  A $\mu$-irreducible Markov chain with transition kernel $K$ is periodic if there exists a partition $\mathsf{X}_1, \dots, \mathsf{X}_d$ of the state space, with $d \geq 2$, such that
  $$\forall i, j, t, s : \quad \mathbb{P}\left( X_{t+s} \in \mathsf{X}_j \mid X_t \in \mathsf{X}_i \right) = \begin{cases} 1 & \text{if } j = i + s \bmod d, \\ 0 & \text{otherwise.} \end{cases}$$
  Otherwise the chain is aperiodic.

  3. Recurrence and Harris Recurrence. For any measurable set $A \subset \mathsf{X}$, let
  $$\eta_A = \sum_{k=1}^{\infty} \mathbb{I}_A(X_k) = \text{number of visits to } A.$$

  Definition. A $\mu$-irreducible Markov chain is recurrent if, for any measurable set $A \subset \mathsf{X}$ with $\mu(A) > 0$,
  $$\forall x \in A \quad \mathbb{E}_x(\eta_A) = \infty.$$
  A $\mu$-irreducible Markov chain is Harris recurrent if, for any measurable set $A \subset \mathsf{X}$ with $\mu(A) > 0$,
  $$\forall x \in \mathsf{X} \quad \mathbb{P}_x(\eta_A = \infty) = 1.$$
  Harris recurrence is stronger than recurrence.

  4. Invariant Distribution and Reversibility.

  Definition. A distribution with density $\pi$ is invariant (or stationary) for a Markov kernel $K$ if
  $$\int_{\mathsf{X}} \pi(x) K(x, y) \, dx = \pi(y).$$
  A Markov kernel $K$ is $\pi$-reversible if, for every bounded measurable function $f$,
  $$\iint f(x, y) \, \pi(x) K(x, y) \, dx \, dy = \iint f(y, x) \, \pi(x) K(x, y) \, dx \, dy.$$

  5. Detailed balance. In practice it is easier to check the detailed balance condition:
  $$\forall x, y \in \mathsf{X} \quad \pi(x) K(x, y) = \pi(y) K(y, x).$$

  Lemma. If detailed balance holds, then $\pi$ is invariant for $K$ and $K$ is $\pi$-reversible.

  Example: the Gaussian AR process is $\pi$-reversible and $\pi$-invariant for
  $$\pi(x) = \mathcal{N}\left( x ; 0, \frac{\tau^2}{1 - \rho^2} \right) \quad \text{when } |\rho| < 1.$$
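  A minimal numerical sanity check of this example, assuming the Gaussian AR kernel $K(x, \cdot) = \mathcal{N}(\rho x, \tau^2)$; the parameter values and test points below are illustrative choices, not from the lecture.

```python
# Numerical check of detailed balance for the Gaussian AR(1) kernel
# K(x, y) = N(y; rho * x, tau^2), with pi(x) = N(x; 0, tau^2 / (1 - rho^2)).
import numpy as np
from scipy.stats import norm

rho, tau = 0.7, 1.0
pi_sd = tau / np.sqrt(1.0 - rho**2)          # stationary standard deviation

def pi(x):
    return norm.pdf(x, loc=0.0, scale=pi_sd)

def K(x, y):
    return norm.pdf(y, loc=rho * x, scale=tau)

# pi(x) K(x, y) should equal pi(y) K(y, x) for all x, y.
for x, y in [(-1.3, 0.4), (0.0, 2.1), (1.7, -0.6)]:
    lhs, rhs = pi(x) * K(x, y), pi(y) * K(y, x)
    print(f"x={x:+.1f}, y={y:+.1f}: {lhs:.6e} vs {rhs:.6e}")
# The two columns agree to machine precision, so detailed balance holds here.
```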

  6. Checking for recurrence. It is often straightforward to check for irreducibility, or for an invariant measure, but not so for recurrence.

  Proposition. If the chain is $\mu$-irreducible and admits an invariant measure, then the chain is recurrent.

  Remark: a chain that is $\mu$-irreducible and admits an invariant measure is called positive.

  7. Law of Large Numbers.

  Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant Markov kernel, then for any integrable function $\varphi : \mathsf{X} \to \mathbb{R}$,
  $$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) = \int_{\mathsf{X}} \varphi(x) \pi(x) \, dx$$
  almost surely, for $\pi$-almost all starting values $x$.

  Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant, Harris recurrent Markov chain, then the same limit holds almost surely for any starting value $x$.
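  A short simulation, again assuming the Gaussian AR(1) kernel from the earlier example, illustrates the law of large numbers: the ergodic average of $\varphi(x) = x^2$ approaches $\mathbb{E}_\pi[X^2] = \tau^2 / (1 - \rho^2)$ even from a starting point far in the tails.

```python
# Ergodic averages of phi(x) = x^2 along a Gaussian AR(1) chain.
import numpy as np

rng = np.random.default_rng(0)
rho, tau, t_max = 0.7, 1.0, 200_000
target = tau**2 / (1.0 - rho**2)    # E_pi[X^2] under the stationary law

x = 10.0                            # deliberately far from stationarity
running_sum = 0.0
for t in range(1, t_max + 1):
    x = rho * x + tau * rng.normal()
    running_sum += x**2
    if t in (100, 1_000, 10_000, t_max):
        print(f"t={t:>7}: average = {running_sum / t:.4f} (target {target:.4f})")
```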

  8. Convergence.

  Theorem. Suppose the kernel $K$ is $\pi$-irreducible, $\pi$-invariant and aperiodic. Then we have
  $$\lim_{t \to \infty} \int_{\mathsf{X}} \left| K^t(x, y) - \pi(y) \right| dy = 0$$
  for $\pi$-almost all starting values $x$.

  Under some additional conditions, one can prove that a chain is geometrically ergodic, i.e. there exist $\rho < 1$ and a function $M : \mathsf{X} \to \mathbb{R}_+$ such that for every measurable set $A$,
  $$\left| K^n(x, A) - \pi(A) \right| \leq M(x) \rho^n \quad \text{for all } n \in \mathbb{N}.$$
  In other words, we can obtain a rate of convergence.
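  For the Gaussian AR(1) chain the $t$-step kernel is available in closed form, $K^t(x, \cdot) = \mathcal{N}\left( \rho^t x, \tau^2 (1 - \rho^{2t}) / (1 - \rho^2) \right)$, so the total variation distance in the theorem can be evaluated by numerical integration. The sketch below (an illustration assuming that chain, not part of the lecture) shows the distance decaying at a geometric rate, consistent with geometric ergodicity.

```python
# Total variation distance ||K^t(x0, .) - pi|| for the Gaussian AR(1) chain,
# using the closed form K^t(x0, .) = N(rho^t x0, tau^2 (1 - rho^(2t)) / (1 - rho^2)).
import numpy as np
from scipy.stats import norm

rho, tau, x0 = 0.7, 1.0, 5.0
pi_sd = tau / np.sqrt(1.0 - rho**2)
grid = np.linspace(-20.0, 20.0, 400_001)
dx = grid[1] - grid[0]
pi_dens = norm.pdf(grid, 0.0, pi_sd)

for t in [1, 5, 10, 20, 30]:
    mean_t = rho**t * x0
    sd_t = tau * np.sqrt((1.0 - rho**(2 * t)) / (1.0 - rho**2))
    kt_dens = norm.pdf(grid, mean_t, sd_t)
    tv = 0.5 * np.sum(np.abs(kt_dens - pi_dens)) * dx   # 0.5 * L1 distance
    print(f"t={t:>2}: TV = {tv:.6f}, rho^t = {rho**t:.6f}")
# TV shrinks at a geometric rate comparable to rho^t, matching M(x) rho^n.
```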

  9. Central Limit Theorem.

  Theorem. Under regularity conditions, for a Harris recurrent, $\pi$-invariant Markov chain, we can prove
  $$\sqrt{t} \left( \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) - \int_{\mathsf{X}} \varphi(x) \pi(x) \, dx \right) \xrightarrow[t \to \infty]{D} \mathcal{N}\left( 0, \sigma^2(\varphi) \right),$$
  where the asymptotic variance can be written
  $$\sigma^2(\varphi) = \mathbb{V}_{\pi}[\varphi(X_1)] + 2 \sum_{k=2}^{\infty} \mathrm{Cov}_{\pi}\left[ \varphi(X_1), \varphi(X_k) \right].$$
  This formula shows that positive correlations increase the asymptotic variance, compared to i.i.d. samples for which the variance would be $\mathbb{V}_{\pi}(\varphi(X))$.

  10. Central Limit Theorem (continued). Example: for the AR Gaussian model, $\pi(x) = \mathcal{N}\left( x ; 0, \tau^2 / (1 - \rho^2) \right)$ for $|\rho| < 1$, and
  $$\mathrm{Cov}(X_1, X_k) = \rho^{k-1} \, \mathbb{V}[X_1] = \rho^{k-1} \frac{\tau^2}{1 - \rho^2}.$$
  Therefore, with $\varphi(x) = x$,
  $$\sigma^2(\varphi) = \frac{\tau^2}{1 - \rho^2} \left( 1 + 2 \sum_{k=1}^{\infty} \rho^k \right) = \frac{\tau^2}{1 - \rho^2} \cdot \frac{1 + \rho}{1 - \rho} = \frac{\tau^2}{(1 - \rho)^2},$$
  which diverges as $\rho \to 1$.
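  This asymptotic variance can be checked empirically. The sketch below, assuming the same AR model with illustrative parameters, compares the variance of $\sqrt{t}$ times the sample mean across independent replicate chains with the formula $\tau^2 / (1 - \rho)^2$.

```python
# Empirical check of the asymptotic variance tau^2 / (1 - rho)^2 for phi(x) = x.
import numpy as np

rng = np.random.default_rng(1)
rho, tau = 0.5, 1.0
t, n_reps = 5_000, 2_000
pi_sd = tau / np.sqrt(1.0 - rho**2)

# Start each replicate at stationarity so only the Markov dependence matters.
x = rng.normal(0.0, pi_sd, size=n_reps)
sums = np.zeros(n_reps)
for _ in range(t):
    x = rho * x + tau * rng.normal(size=n_reps)
    sums += x

scaled_means = np.sqrt(t) * (sums / t)      # sqrt(t) * (sample mean - 0)
print(f"empirical variance : {scaled_means.var():.3f}")
print(f"theoretical sigma^2: {tau**2 / (1.0 - rho)**2:.3f}")   # = 4.0 here
```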

  11. Markov chain Monte Carlo. We are interested in sampling from a distribution $\pi$, for instance a posterior distribution in a Bayesian framework. Markov chains with $\pi$ as invariant distribution can be constructed to approximate expectations with respect to $\pi$. For example, the Gibbs sampler generates a Markov chain targeting $\pi$ defined on $\mathbb{R}^d$ using the full conditionals $\pi(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$.

  12. Gibbs Sampling. Assume you are interested in sampling from
  $$\pi(x) = \pi(x_1, x_2, \dots, x_d), \quad x \in \mathbb{R}^d.$$
  Notation: $x_{-i} := (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$.

  Systematic scan Gibbs sampler. Let $\left( X_1^{(1)}, \dots, X_d^{(1)} \right)$ be the initial state, then iterate for $t = 2, 3, \dots$:
  1. Sample $X_1^{(t)} \sim \pi_{X_1 \mid X_{-1}}\left( \cdot \mid X_2^{(t-1)}, \dots, X_d^{(t-1)} \right)$.
  ...
  j. Sample $X_j^{(t)} \sim \pi_{X_j \mid X_{-j}}\left( \cdot \mid X_1^{(t)}, \dots, X_{j-1}^{(t)}, X_{j+1}^{(t-1)}, \dots, X_d^{(t-1)} \right)$.
  ...
  d. Sample $X_d^{(t)} \sim \pi_{X_d \mid X_{-d}}\left( \cdot \mid X_1^{(t)}, \dots, X_{d-1}^{(t)} \right)$.
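  As a concrete sketch of this scheme (the target here, a standard bivariate Gaussian with correlation $\rho$ and full conditionals $\mathcal{N}(\rho x_{-i}, 1 - \rho^2)$, is an example of convenience rather than one from the lecture), a systematic scan Gibbs sampler might look as follows.

```python
# Systematic scan Gibbs sampler for a standard bivariate Gaussian target
# with correlation rho: pi(x1, x2) = N(0, [[1, rho], [rho, 1]]).
# Full conditionals: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
import numpy as np

rng = np.random.default_rng(2)
rho, n_iter = 0.9, 50_000
cond_sd = np.sqrt(1.0 - rho**2)

x1, x2 = 0.0, 0.0
samples = np.empty((n_iter, 2))
for t in range(n_iter):
    x1 = rho * x2 + cond_sd * rng.normal()   # step 1: sample X1 | X2
    x2 = rho * x1 + cond_sd * rng.normal()   # step 2: sample X2 | X1 (uses the new X1)
    samples[t] = (x1, x2)

# Discard a burn-in and check the sample correlation against rho.
print("sample correlation:", np.corrcoef(samples[5_000:].T)[0, 1])
```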

  13. Gibbs Sampling. Is the joint distribution $\pi$ uniquely specified by the conditional distributions $\pi_{X_i \mid X_{-i}}$? Does the Gibbs sampler provide a Markov chain with the correct stationary distribution $\pi$? If yes, does the Markov chain converge towards this invariant distribution? All of this turns out to hold under some mild conditions.

  14. Hammersley-Clifford Theorem I.

  Theorem. Consider a distribution whose density $\pi(x_1, x_2, \dots, x_d)$ is such that $\mathrm{supp}(\pi) = \otimes_{i=1}^{d} \mathrm{supp}(\pi_{X_i})$. Then for any $(z_1, \dots, z_d) \in \mathrm{supp}(\pi)$, we have
  $$\pi(x_1, x_2, \dots, x_d) \propto \prod_{j=1}^{d} \frac{\pi_{X_j \mid X_{-j}}\left( x_j \mid x_{1:j-1}, z_{j+1:d} \right)}{\pi_{X_j \mid X_{-j}}\left( z_j \mid x_{1:j-1}, z_{j+1:d} \right)}.$$
  Remark: the condition above is the positivity condition. Equivalently, if $\pi_{X_i}(x_i) > 0$ for $i = 1, \dots, d$, then $\pi(x_1, \dots, x_d) > 0$.

  15. Proof of Hammersley-Clifford Theorem.

  Proof. We have
  $$\pi(x_{1:d-1}, x_d) = \pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1}) \, \pi(x_{1:d-1}),$$
  $$\pi(x_{1:d-1}, z_d) = \pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1}) \, \pi(x_{1:d-1}).$$
  Therefore
  $$\pi(x_{1:d}) = \pi(x_{1:d-1}, z_d) \, \frac{\pi(x_{1:d-1}, x_d)}{\pi(x_{1:d-1}, z_d)} = \pi(x_{1:d-1}, z_d) \, \frac{\pi(x_{1:d-1}, x_d) / \pi(x_{1:d-1})}{\pi(x_{1:d-1}, z_d) / \pi(x_{1:d-1})} = \pi(x_{1:d-1}, z_d) \, \frac{\pi_{X_d \mid X_{1:d-1}}(x_d \mid x_{1:d-1})}{\pi_{X_d \mid X_{1:d-1}}(z_d \mid x_{1:d-1})}.$$

  16. Proof (continued). Similarly, we have
  $$\pi(x_{1:d-1}, z_d) = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi(x_{1:d-1}, z_d)}{\pi(x_{1:d-2}, z_{d-1}, z_d)} = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi(x_{1:d-1}, z_d) / \pi(x_{1:d-2}, z_d)}{\pi(x_{1:d-2}, z_{d-1}, z_d) / \pi(x_{1:d-2}, z_d)} = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi_{X_{d-1} \mid X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)}{\pi_{X_{d-1} \mid X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)},$$
  hence
  $$\pi(x_{1:d}) = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi_{X_{d-1} \mid X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)}{\pi_{X_{d-1} \mid X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)} \times \frac{\pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1})}{\pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1})}.$$

  17. Proof (end). Since $z \in \mathrm{supp}(\pi)$, we have $\pi_{X_i}(z_i) > 0$ for all $i$. We may also assume that $\pi_{X_i}(x_i) > 0$ for all $i$. Thus, by the positivity condition, all the conditional ratios we introduce are well defined and positive, since
  $$\frac{\pi_{X_j \mid X_{-j}}(x_j \mid x_{1:j-1}, z_{j+1:d})}{\pi_{X_j \mid X_{-j}}(z_j \mid x_{1:j-1}, z_{j+1:d})} = \frac{\pi(x_1, \dots, x_{j-1}, x_j, z_{j+1}, \dots, z_d)}{\pi(x_1, \dots, x_{j-1}, z_j, z_{j+1}, \dots, z_d)} > 0.$$
  Iterating the previous step over $j = d, d-1, \dots, 1$ yields the theorem.
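  To see the theorem in action, here is a small numerical check using a standard bivariate Gaussian with correlation $\rho$, an example chosen here for convenience rather than taken from the lecture: the Hammersley-Clifford product of conditional ratios should equal the true joint density up to a constant, so their pointwise ratio should be flat.

```python
# Numerical check of Hammersley-Clifford for a standard bivariate Gaussian
# with correlation rho, anchored at z = (0, 0).
import numpy as np
from scipy.stats import norm

rho = 0.8
cond_sd = np.sqrt(1.0 - rho**2)

def cond(x, given):               # pi_{Xi | X-i}(x | given) = N(rho * given, 1 - rho^2)
    return norm.pdf(x, loc=rho * given, scale=cond_sd)

def joint(x1, x2):                # true (normalized) bivariate normal density
    q = (x1**2 - 2 * rho * x1 * x2 + x2**2) / (1.0 - rho**2)
    return np.exp(-0.5 * q) / (2 * np.pi * cond_sd)

z1 = z2 = 0.0
for x1, x2 in [(0.5, -1.0), (2.0, 1.5), (-0.3, 0.7)]:
    hc = (cond(x1, z2) / cond(z1, z2)) * (cond(x2, x1) / cond(z2, x1))
    print(f"HC / joint at ({x1:+.1f}, {x2:+.1f}): {hc / joint(x1, x2):.6f}")
# The ratio is the same constant at every point: the product of conditional
# ratios recovers pi up to normalization, as the theorem states.
```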

  18. Example: Non-Integrable Target. Consider the following conditionals on $\mathbb{R}_+$:
  $$\pi_{X_1 \mid X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1), \qquad \pi_{X_2 \mid X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2).$$
  We might expect these full conditionals to define a joint probability density $\pi(x_1, x_2)$. Hammersley-Clifford would give
  $$\pi(x_1, x_2) \propto \frac{\pi_{X_1 \mid X_2}(x_1 \mid z_2) \, \pi_{X_2 \mid X_1}(x_2 \mid x_1)}{\pi_{X_1 \mid X_2}(z_1 \mid z_2) \, \pi_{X_2 \mid X_1}(z_2 \mid x_1)} = \frac{z_2 \exp(-z_2 x_1) \, x_1 \exp(-x_1 x_2)}{z_2 \exp(-z_2 z_1) \, x_1 \exp(-x_1 z_2)} \propto \exp(-x_1 x_2).$$
  However,
  $$\iint \exp(-x_1 x_2) \, dx_1 \, dx_2 = \infty,$$
  so $\pi_{X_1 \mid X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1)$ and $\pi_{X_2 \mid X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2)$ are not compatible: no joint probability density has these conditionals.
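  A brief simulation (a sketch, not from the slides) makes the incompatibility tangible: running the Gibbs sampler on these conditionals produces a chain whose logarithm behaves like a zero-mean random walk, so the chain never settles into a stationary regime.

```python
# Gibbs sampling from the incompatible exponential conditionals:
# X1 | X2 = x2 ~ Exp(rate x2), X2 | X1 = x1 ~ Exp(rate x1).
# log X1 performs a zero-mean random walk, so there is no stationary law.
import numpy as np

rng = np.random.default_rng(3)
x1, x2 = 1.0, 1.0
for t in range(1, 100_001):
    x1 = rng.exponential(scale=1.0 / x2)   # exponential with rate x2
    x2 = rng.exponential(scale=1.0 / x1)   # exponential with rate x1
    if t % 20_000 == 0:
        print(f"t={t:>6}: log x1 = {np.log(x1):+.2f}")
# The printed values wander without stabilizing, illustrating that no joint
# density is compatible with both conditionals.
```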
