Advanced Simulation - Lecture 5 Patrick Rebeschini January 29th, 2018 Patrick Rebeschini Lecture 5 1/ 23
Limits of standard Monte Carlo methods

Monte Carlo methods yield convergence rates in $1/\sqrt{n}$, a rate that is independent of the dimension $d$. On close inspection, the error still depends on $d$ through the constant in front of the rate. Unfortunately that “constant” (in $n$) typically explodes exponentially with $d$. Markov chain Monte Carlo methods yield errors that explode only polynomially in $d$, at least under some conditions.
Markov chain Monte Carlo

Revolutionary idea introduced by Metropolis et al., J. Chemical Physics, 1953.

Key idea: Given a target distribution $\pi$, build a Markov chain $(X_t)_{t \geq 1}$ such that, as $t \to \infty$, $X_t \sim \pi$ and
\[ \frac{1}{n} \sum_{t=1}^{n} \varphi(X_t) \to \int \varphi(x)\, \pi(x)\, dx \]
when $n \to \infty$, e.g. almost surely. There are also central limit theorems with a rate in $1/\sqrt{n}$.
Markov chains - discrete space

Let $\mathbb{X}$ be discrete, e.g. $\mathbb{X} = \mathbb{Z}$. $(X_t)_{t \geq 1}$ is a Markov chain if
\[ P(X_t = x_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}) = P(X_t = x_t \mid X_{t-1} = x_{t-1}). \]
Homogeneous Markov chains:
\[ \forall m \in \mathbb{N}: \quad P(X_t = y \mid X_{t-1} = x) = P(X_{t+m} = y \mid X_{t+m-1} = x). \]
The Markov transition kernel is
\[ K(i, j) = K_{ij} = P(X_t = j \mid X_{t-1} = i). \]
Markov chains - discrete space

Let $\mu_t(x) = P(X_t = x)$. The chain rule yields
\[ P(X_1 = x_1, X_2 = x_2, \ldots, X_t = x_t) = \mu_1(x_1) \prod_{i=2}^{t} K_{x_{i-1} x_i}. \]
Define the $m$-step transition matrix $K^m$ by
\[ K^m_{ij} = P(X_{t+m} = j \mid X_t = i). \]
Chapman-Kolmogorov equation:
\[ K^{m+n}_{ij} = \sum_{k \in \mathbb{X}} K^m_{ik} K^n_{kj}. \]
We obtain
\[ \mu_{t+1}(j) = \sum_{i} \mu_t(i) K_{ij}, \]
i.e. using “linear algebra notation”, $\mu_{t+1} = \mu_t K$.
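The recursion $\mu_{t+1} = \mu_t K$ and the Chapman-Kolmogorov equation can be checked numerically on a small example. The sketch below is illustrative only: the two-state kernel `K` and the helper names `mat_mul`, `mat_pow` and `step` are assumptions, not part of the lecture.

```python
# Hypothetical 2-state kernel, chosen only for illustration (rows sum to 1).
K = [[0.9, 0.1],
     [0.4, 0.6]]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(K, m):
    """m-step transition matrix K^m (m >= 0), by repeated multiplication."""
    size = len(K)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(m):
        result = mat_mul(result, K)
    return result

def step(mu, K):
    """One step of the marginal recursion: mu_{t+1}(j) = sum_i mu_t(i) K_ij."""
    return [sum(mu[i] * K[i][j] for i in range(len(mu))) for j in range(len(K[0]))]

# Start in state 0 at time 1 and compute mu_4 in two ways.
mu1 = [1.0, 0.0]
mu4_iter = mu1
for _ in range(3):
    mu4_iter = step(mu4_iter, K)       # mu_1 K K K
mu4_direct = step(mu1, mat_pow(K, 3))  # mu_1 K^3

# Chapman-Kolmogorov with m = 2, n = 3: K^5 = K^2 K^3.
K5 = mat_pow(K, 5)
K2_K3 = mat_mul(mat_pow(K, 2), mat_pow(K, 3))
```

Both routes to $\mu_4$ agree up to floating-point error, which is the content of $\mu_{t+m} = \mu_t K^m$.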
Irreducibility and aperiodicity

A Markov chain is said to be irreducible if all the states communicate with each other, that is
\[ \forall x, y \in \mathbb{X}: \quad \inf \{ t : K^t_{xy} > 0 \} < \infty. \]
A state $x$ has period $d(x)$ defined as
\[ d(x) = \gcd \{ s \geq 1 : K^s_{xx} > 0 \}. \]
An irreducible chain is aperiodic if all states have period 1.

Example:
\[ K_\theta = \begin{pmatrix} \theta & 1 - \theta \\ 1 - \theta & \theta \end{pmatrix} \]
is irreducible if $\theta \in [0, 1)$ and aperiodic if $\theta \in (0, 1)$. If $\theta = 0$, the gcd is 2.
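For a finite chain both definitions can be checked by brute force on matrix powers. The sketch below does this for the two-state example; the function names and the truncation `max_steps` are assumptions made for illustration, and the period is only computed from return times up to that truncation.

```python
from functools import reduce
from math import gcd

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def k_theta(theta):
    """Two-state kernel from the example."""
    return [[theta, 1.0 - theta], [1.0 - theta, theta]]

def is_irreducible(K):
    """True if for every pair (x, y) some K^t with 1 <= t <= n has K^t[x][y] > 0.
    For n states, checking t up to n is enough."""
    n = len(K)
    reached = [[False] * n for _ in range(n)]
    power = K
    for _ in range(n):
        for x in range(n):
            for y in range(n):
                if power[x][y] > 0:
                    reached[x][y] = True
        power = mat_mul(power, K)
    return all(all(row) for row in reached)

def period(K, x, max_steps=20):
    """gcd of the return times s <= max_steps with K^s[x][x] > 0 (0 if none)."""
    return_times = []
    power = K
    for s in range(1, max_steps + 1):
        if power[x][x] > 0:
            return_times.append(s)
        power = mat_mul(power, K)
    return reduce(gcd, return_times, 0)

period_theta_0 = period(k_theta(0.0), 0)    # chain alternates deterministically
period_theta_03 = period(k_theta(0.3), 0)   # can stay put, so period 1
irreducible_theta_0 = is_irreducible(k_theta(0.0))
irreducible_theta_1 = is_irreducible(k_theta(1.0))  # identity kernel
```

With $\theta = 1$ the kernel is the identity, so the two states never communicate, which is why the irreducibility range excludes $\theta = 1$.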
Transience and recurrence

Introduce the number of visits to $x$:
\[ \eta_x := \sum_{k=1}^{\infty} \mathbb{1}_x(X_k). \]
For a Markov chain, a state $x$ is termed transient if
\[ E_x(\eta_x) < \infty, \]
where $E_x$ refers to the law of the chain starting from $x$. Otherwise the state is called recurrent, in which case $E_x(\eta_x) = \infty$.
Invariant distribution

Definition: A distribution $\pi$ is invariant for a Markov kernel $K$ if
\[ \pi K = \pi. \]
Note: if there exists $t$ such that $X_t \sim \pi$, then $X_{t+s} \sim \pi$ for all $s \in \mathbb{N}$.

Example: for any $\theta \in [0, 1]$,
\[ K_\theta = \begin{pmatrix} \theta & 1 - \theta \\ 1 - \theta & \theta \end{pmatrix} \]
admits
\[ \pi = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} \]
as invariant distribution.
Detailed balance

A Markov kernel $K$ satisfies detailed balance for $\pi$ if
\[ \forall x, y \in \mathbb{X}: \quad \pi(x) K_{xy} = \pi(y) K_{yx}. \]
Lemma: If $K$ satisfies detailed balance for $\pi$, then $K$ is $\pi$-invariant.

If $K$ satisfies detailed balance for $\pi$, then the Markov chain is reversible, i.e. at stationarity,
\[ \forall x, y \in \mathbb{X}: \quad P(X_t = x \mid X_{t+1} = y) = P(X_t = x \mid X_{t-1} = y). \]
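The lemma follows in one line: sum the detailed balance condition over $x$ and use the fact that each row of $K$ sums to 1. Written out for the discrete case:

```latex
\[
(\pi K)(y)
  = \sum_{x \in \mathbb{X}} \pi(x) K_{xy}
  = \sum_{x \in \mathbb{X}} \pi(y) K_{yx}
  = \pi(y) \sum_{x \in \mathbb{X}} K_{yx}
  = \pi(y).
\]
```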
Lack of reversibility

Let
\[ P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \]
Check $\pi P = \pi$ for $\pi = (1/2, 1/3, 1/6)$.

$P$ cannot be $\pi$-reversible, as $1 \to 3 \to 2 \to 1$ is a possible sequence whereas $1 \to 2 \to 3 \to 1$ is not (as $P_{2,3} = 0$).

Detailed balance does not hold, as $\pi_2 P_{23} = 0 \neq \pi_3 P_{32}$.
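Both claims about this example can be verified with exact rational arithmetic. The sketch below uses Python's `fractions` module; the variable names are illustrative, and states are indexed 1 to 3 in the output to match the slide.

```python
from fractions import Fraction as F

# Transition matrix and candidate invariant distribution from the example.
P = [[F(1, 3), F(1, 3), F(1, 3)],
     [F(1),    F(0),    F(0)],
     [F(0),    F(1),    F(0)]]
pi = [F(1, 2), F(1, 3), F(1, 6)]

# Invariance: (pi P)(j) = sum_i pi(i) P_ij should reproduce pi exactly.
pi_P = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Detailed balance: list the ordered pairs (i, j) where pi_i P_ij != pi_j P_ji.
violations = [(i + 1, j + 1)
              for i in range(3) for j in range(3)
              if pi[i] * P[i][j] != pi[j] * P[j][i]]

# The specific pair quoted on the slide.
flow_2_to_3 = pi[1] * P[1][2]   # pi_2 P_23
flow_3_to_2 = pi[2] * P[2][1]   # pi_3 P_32
```

Here `pi_P` equals `pi`, so $\pi$ is invariant, yet every off-diagonal pair violates detailed balance: invariance does not imply reversibility.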
Remarks

All finite space Markov chains have at least one stationary distribution, but not all stationary distributions are also limiting distributions.
\[ P = \begin{pmatrix} 0.4 & 0.6 & 0 & 0 \\ 0.2 & 0.8 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \end{pmatrix} \]
Two left eigenvectors of eigenvalue 1:
\[ \pi_1 = (1/4, 3/4, 0, 0), \qquad \pi_2 = (0, 0, 1/4, 3/4). \]
Depending on the initial state, the chain converges to one of two different stationary distributions.
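A short numerical check makes the dependence on the initial state concrete. In the sketch below the helper names `step` and `evolve` and the horizon of 60 steps are assumptions chosen for illustration.

```python
# Block-diagonal transition matrix from the example.
P = [[0.4, 0.6, 0.0, 0.0],
     [0.2, 0.8, 0.0, 0.0],
     [0.0, 0.0, 0.4, 0.6],
     [0.0, 0.0, 0.2, 0.8]]

pi1 = [0.25, 0.75, 0.0, 0.0]
pi2 = [0.0, 0.0, 0.25, 0.75]

def step(mu, P):
    """One step of mu -> mu P."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(P[0]))]

def evolve(mu, P, t):
    """Apply t steps of mu -> mu P."""
    for _ in range(t):
        mu = step(mu, P)
    return mu

# Any mixture of pi1 and pi2 is also invariant.
mix = [0.5 * a + 0.5 * b for a, b in zip(pi1, pi2)]
mix_after_step = step(mix, P)

# The long-run distribution depends on where the chain starts.
limit_from_state_1 = evolve([1.0, 0.0, 0.0, 0.0], P, 60)
limit_from_state_3 = evolve([0.0, 0.0, 1.0, 0.0], P, 60)
```

Starting in state 1 the marginals approach `pi1`, while starting in state 3 they approach `pi2`; the mixture `mix` is stationary but is not the limit from either starting state.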
Equilibrium

Proposition: If a discrete space Markov chain is aperiodic and irreducible, and has an invariant distribution, then
\[ \forall x \in \mathbb{X}: \quad P_\mu(X_t = x) \xrightarrow[t \to \infty]{} \pi(x), \]
for any starting distribution $\mu$.

In the Monte Carlo perspective, we will be primarily interested in convergence of empirical averages, such as
\[ \hat{I}_n = \frac{1}{n} \sum_{t=1}^{n} \varphi(X_t) \xrightarrow[n \to \infty]{\text{a.s.}} I = \sum_{x \in \mathbb{X}} \varphi(x)\, \pi(x). \]
Before turning to these “ergodic theorems”, let us consider continuous spaces.
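As a rough illustration of the convergence of empirical averages, the sketch below simulates the two-state chain $K_\theta$ on states $\{0, 1\}$ with $\theta = 0.3$ and $\varphi(x) = x$, so that $I = 1/2$. The function name, the seed and the run length are arbitrary choices made for this example.

```python
import random

def simulate_average(theta, phi, n, seed=0):
    """Empirical average (1/n) sum_t phi(X_t) for the two-state chain K_theta.

    The chain starts in state 0 and, at each step, stays put with
    probability theta and switches state with probability 1 - theta.
    """
    rng = random.Random(seed)
    x = 0
    total = 0.0
    for _ in range(n):
        total += phi(x)
        if rng.random() >= theta:   # event of probability 1 - theta
            x = 1 - x
    return total / n

estimate = simulate_average(theta=0.3, phi=lambda x: float(x), n=100_000)
exact = 0.5   # sum_x phi(x) pi(x) with pi = (1/2, 1/2)
```

The estimate is random, so it only matches `exact` up to a Monte Carlo error of order $1/\sqrt{n}$.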
Markov chains - continuous space

The state space $\mathbb{X}$ is now continuous, e.g. $\mathbb{R}^d$. $(X_t)_{t \geq 1}$ is a Markov chain if for any (measurable) set $A$,
\[ P(X_t \in A \mid X_1 = x_1, X_2 = x_2, \ldots, X_{t-1} = x_{t-1}) = P(X_t \in A \mid X_{t-1} = x_{t-1}). \]
We have
\[ P(X_t \in A \mid X_{t-1} = x) = \int_A K(x, y)\, dy = K(x, A), \]
that is, conditional on $X_{t-1} = x$, $X_t$ is a random variable which admits a probability density function $K(x, \cdot)$.

$K : \mathbb{X}^2 \to \mathbb{R}$ is the kernel of the Markov chain.
Markov chains - continuous space

Denoting by $\mu_1$ the pdf of $X_1$, we obtain directly
\[ P(X_1 \in A_1, \ldots, X_t \in A_t) = \int_{A_1 \times \cdots \times A_t} \mu_1(x_1) \prod_{k=2}^{t} K(x_{k-1}, x_k)\, dx_1 \cdots dx_t. \]
Denoting by $\mu_t$ the distribution of $X_t$, the Chapman-Kolmogorov equation reads
\[ \mu_t(y) = \int_{\mathbb{X}} \mu_{t-1}(x) K(x, y)\, dx \]
and similarly for $m > 1$,
\[ \mu_{t+m}(y) = \int_{\mathbb{X}} \mu_t(x) K^m(x, y)\, dx, \]
where
\[ K^m(x_t, x_{t+m}) = \int_{\mathbb{X}^{m-1}} \prod_{k=t+1}^{t+m} K(x_{k-1}, x_k)\, dx_{t+1} \cdots dx_{t+m-1}. \]
Example

Consider the autoregressive (AR) model
\[ X_t = \rho X_{t-1} + V_t, \]
where $V_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \tau^2)$. This defines a Markov process such that
\[ K(x, y) = \frac{1}{\sqrt{2 \pi \tau^2}} \exp\left( -\frac{1}{2 \tau^2} (y - \rho x)^2 \right). \]
We also have
\[ X_{t+m} = \rho^m X_t + \sum_{k=1}^{m} \rho^{m-k} V_{t+k}, \]
so in the Gaussian case
\[ K^m(x, y) = \frac{1}{\sqrt{2 \pi \tau_m^2}} \exp\left( -\frac{1}{2} \frac{(y - \rho^m x)^2}{\tau_m^2} \right) \]
with
\[ \tau_m^2 = \tau^2 \sum_{k=1}^{m} \left( \rho^2 \right)^{m-k} = \tau^2\, \frac{1 - \rho^{2m}}{1 - \rho^2}. \]
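The expression for the $m$-step variance $\tau_m^2$ can be cross-checked in three ways: the sum, the closed form, and the one-step variance recursion $v_k = \rho^2 v_{k-1} + \tau^2$ implied by the AR equation. The parameter values and function names in this sketch are arbitrary illustrations.

```python
import math

# Hypothetical parameter values, chosen only for illustration (|rho| < 1).
rho, tau2 = 0.8, 1.5

def tau2_m_sum(m):
    """tau_m^2 as the sum tau^2 * sum_{k=1}^m (rho^2)^(m-k)."""
    return tau2 * sum((rho ** 2) ** (m - k) for k in range(1, m + 1))

def tau2_m_closed(m):
    """tau_m^2 in closed form: tau^2 (1 - rho^(2m)) / (1 - rho^2)."""
    return tau2 * (1.0 - rho ** (2 * m)) / (1.0 - rho ** 2)

def tau2_m_recursive(m):
    """tau_m^2 from the recursion v_k = rho^2 v_{k-1} + tau^2, with v_0 = 0."""
    v = 0.0
    for _ in range(m):
        v = rho ** 2 * v + tau2
    return v

# As m grows, tau_m^2 approaches tau^2 / (1 - rho^2).
limit_variance = tau2 / (1.0 - rho ** 2)
variance_after_200_steps = tau2_m_closed(200)
```

For $|\rho| < 1$ the $m$-step mean $\rho^m x$ tends to 0 and $\tau_m^2$ tends to $\tau^2 / (1 - \rho^2)$, so $K^m(x, \cdot)$ forgets its starting point.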
Irreducibility and aperiodicity

Given a distribution $\mu$ over $\mathbb{X}$, a Markov chain is $\mu$-irreducible if
\[ \forall x \in \mathbb{X}, \quad \forall A : \mu(A) > 0, \quad \exists t \in \mathbb{N}: \quad K^t(x, A) > 0. \]
A $\mu$-irreducible Markov chain of transition kernel $K$ is periodic if there exists some partition of the state space $\mathbb{X}_1, \ldots, \mathbb{X}_d$ for $d \geq 2$, such that
\[ \forall i, j, t, s: \quad P(X_{t+s} \in \mathbb{X}_j \mid X_t \in \mathbb{X}_i) = \begin{cases} 1 & j = i + s \bmod d, \\ 0 & \text{otherwise.} \end{cases} \]
Otherwise the chain is aperiodic.
Recurrence and Harris Recurrence

For any measurable set $A$ of $\mathbb{X}$, let
\[ \eta_A = \sum_{k=1}^{\infty} \mathbb{I}_A(X_k). \]
A $\mu$-irreducible Markov chain is recurrent if for any measurable set $A \subset \mathbb{X}$ with $\mu(A) > 0$,
\[ \forall x \in A: \quad E_x(\eta_A) = \infty. \]
A $\mu$-irreducible Markov chain is Harris recurrent if for any measurable set $A \subset \mathbb{X}$ with $\mu(A) > 0$,
\[ \forall x \in \mathbb{X}: \quad P_x(\eta_A = \infty) = 1. \]
Harris recurrence is stronger than recurrence.
Invariant Distribution and Reversibility

A distribution of density $\pi$ is invariant or stationary for a Markov kernel $K$ if
\[ \int_{\mathbb{X}} \pi(x) K(x, y)\, dx = \pi(y). \]
A Markov kernel $K$ is $\pi$-reversible if, for every bounded measurable function $f$,
\[ \iint f(x, y)\, \pi(x) K(x, y)\, dx\, dy = \iint f(y, x)\, \pi(x) K(x, y)\, dx\, dy. \]
Detailed balance

In practice it is easier to check the detailed balance condition:
\[ \forall x, y \in \mathbb{X}: \quad \pi(x) K(x, y) = \pi(y) K(y, x). \]
Lemma: If detailed balance holds, then $\pi$ is invariant for $K$ and $K$ is $\pi$-reversible.

Example: the Gaussian AR process is $\pi$-reversible and $\pi$-invariant for
\[ \pi(x) = \mathcal{N}\left( x;\, 0,\, \frac{\tau^2}{1 - \rho^2} \right) \]
when $|\rho| < 1$.
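Detailed balance for the Gaussian AR example can be spot-checked numerically by evaluating both sides at a few pairs $(x, y)$. This is a sanity check at hand-picked points, not a proof; the parameter values, the test points and the function names are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical parameter values with |rho| < 1.
rho, tau2 = 0.8, 1.5

def kernel(x, y):
    """AR(1) transition density K(x, y) = N(y; rho x, tau^2)."""
    return normal_pdf(y, rho * x, tau2)

def target(x):
    """Candidate invariant density pi(x) = N(x; 0, tau^2 / (1 - rho^2))."""
    return normal_pdf(x, 0.0, tau2 / (1.0 - rho ** 2))

# Compare pi(x) K(x, y) with pi(y) K(y, x) at a few arbitrary points.
points = [(-1.3, 0.4), (0.0, 2.0), (0.7, 0.7), (3.1, -2.2)]
flows = [(target(x) * kernel(x, y), target(y) * kernel(y, x)) for x, y in points]
```

The two sides agree because $\pi(x) K(x, y)$ is proportional to $\exp\left( -(x^2 + y^2 - 2 \rho x y) / (2 \tau^2) \right)$, which is symmetric in $x$ and $y$.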
Selected asymptotic results

Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant Markov kernel, then for any integrable function $\varphi : \mathbb{X} \to \mathbb{R}$:
\[ \lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) = \int_{\mathbb{X}} \varphi(x)\, \pi(x)\, dx \]
almost surely, for $\pi$-almost all starting values $x$.

Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant, Harris recurrent Markov kernel, then for any integrable function $\varphi : \mathbb{X} \to \mathbb{R}$:
\[ \lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) = \int_{\mathbb{X}} \varphi(x)\, \pi(x)\, dx \]
almost surely, for any starting value $x$.
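As an informal illustration of these ergodic averages, the sketch below simulates the Gaussian AR(1) chain from a deliberately poor starting value and averages $\varphi(x) = x^2$, whose integral against the stationary density is the stationary variance $\tau^2 / (1 - \rho^2)$. The parameter values, seed, run length and function name are arbitrary choices, and a single simulated path illustrates the statement without verifying its assumptions.

```python
import random

def ar1_average(rho, tau, phi, n, x0, seed=0):
    """Ergodic average (1/n) sum_{i=1}^n phi(X_i) for X_i = rho X_{i-1} + V_i,
    with V_i i.i.d. N(0, tau^2) and starting value X_0 = x0."""
    rng = random.Random(seed)
    x = x0
    total = 0.0
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, tau)
        total += phi(x)
    return total / n

# Hypothetical parameters; the start x0 = 10 is far out in the tail.
rho, tau = 0.5, 1.0
estimate = ar1_average(rho, tau, phi=lambda x: x * x, n=200_000, x0=10.0)
exact = tau ** 2 / (1.0 - rho ** 2)   # second moment under the stationary density
```

The influence of the starting value is averaged out as the number of terms grows, so the estimate lands near `exact` up to Monte Carlo error.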
Selected asymptotic results

Theorem. Suppose the kernel $K$ is $\pi$-irreducible, $\pi$-invariant and aperiodic. Then we have
\[ \lim_{t \to \infty} \int_{\mathbb{X}} \left| K^t(x, y) - \pi(y) \right| dy = 0 \]
for $\pi$-almost all starting values $x$.

Under some additional conditions, one can prove that a chain is geometrically ergodic, i.e. there exist $\rho < 1$ and a function $M : \mathbb{X} \to \mathbb{R}^+$ such that for all measurable sets $A$:
\[ | K^n(x, A) - \pi(A) | \leq M(x)\, \rho^n, \]
for all $n \in \mathbb{N}$. In other words, we can obtain a rate of convergence.
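A geometric rate can be seen exactly on the discrete two-state kernel $K_\theta = \begin{pmatrix} \theta & 1 - \theta \\ 1 - \theta & \theta \end{pmatrix}$ with $\pi = (1/2, 1/2)$. There the bound holds with equality for the worst-case set $A$, with $M(x) = 1/2$ and $\rho = |2\theta - 1|$. The sketch below checks this numerically; the value $\theta = 0.3$ and the helper names are assumptions made for illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

theta = 0.3                       # hypothetical value in (0, 1)
K = [[theta, 1.0 - theta],
     [1.0 - theta, theta]]
pi = [0.5, 0.5]
rate = abs(2.0 * theta - 1.0)     # modulus of the second eigenvalue of K

errors = []   # max over sets A of |K^n(0, A) - pi(A)|, starting from state 0
bounds = []   # M * rate^n with M = 1/2
power = K
for n in range(1, 21):
    # On two states the maximum over sets A is attained at a single state.
    errors.append(max(abs(power[0][j] - pi[j]) for j in range(2)))
    bounds.append(0.5 * rate ** n)
    power = mat_mul(power, K)
```

The error shrinks by the factor $|2\theta - 1|$ at every step, so convergence is fastest near $\theta = 1/2$ and degenerates as $\theta$ approaches 0 or 1.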