Convergence of Adaptive and Interacting MCMC algorithms

Gersende FORT
LTCI / CNRS - TELECOM ParisTech, France

Joint work with E. Moulines (LTCI, France) and P. Priouret (LPMA, France)
Outline
- Examples of adaptive MCMC
- Convergence of the marginals for adaptive MCMC samplers
- Law of large numbers for adaptive MCMC samplers
- Convergence of the stationary distributions π_{θ_n}
- Applications
I. Two examples of adaptive MCMC samplers
1. an adaptive MCMC algorithm
2. an interacting MCMC algorithm
Example 1: The Adaptive Metropolis [Haario et al. (1999)]

Consider the Metropolis-Hastings algorithm with
- target density π on X ⊆ R^d, density w.r.t. the Lebesgue measure,
- Gaussian proposal q_θ(x, y) = N_d(x, θ)[y].

↪ How to choose the design parameter θ?

Answer: the covariance matrix of π, up to a scalar factor [Roberts et al. (1997)], estimated iteratively by the empirical covariance matrix (or a robust estimator):

θ_{n+1} = n/(n+1) θ_n + 1/(n+1) { (X_{n+1} − μ_{n+1})(X_{n+1} − μ_{n+1})^T + κ Id_d }
μ_{n+1} = μ_n + 1/(n+1) (X_{n+1} − μ_n)
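The recursive mean/covariance update above can be sketched in pure Python as follows. This is an illustration only: the initial values μ_0, θ_0 and the regularisation constant κ are arbitrary choices, not prescribed by the slides.

```python
import random


def am_update(n, mu, theta, x, kappa=1e-6):
    """One step of the recursive estimator: given (mu_n, theta_n) and the
    new draw X_{n+1}, return (mu_{n+1}, theta_{n+1}).  The kappa * Id_d
    term keeps theta positive definite."""
    d = len(mu)
    # mu_{n+1} = mu_n + (X_{n+1} - mu_n) / (n + 1)
    mu_new = [mu[i] + (x[i] - mu[i]) / (n + 1) for i in range(d)]
    c = [x[i] - mu_new[i] for i in range(d)]
    # theta_{n+1} = n/(n+1) theta_n + 1/(n+1) (c c^T + kappa Id_d)
    theta_new = [[n / (n + 1) * theta[i][j]
                  + (c[i] * c[j] + kappa * (i == j)) / (n + 1)
                  for j in range(d)] for i in range(d)]
    return mu_new, theta_new
```

Fed with i.i.d. draws from a distribution with covariance Σ, the iterates θ_n converge to Σ + κ Id_d; the chain draws X_n of the adaptive algorithm are not i.i.d., but the same recursion applies.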
This yields the adaptive Metropolis algorithm: iteratively,
- draw X_{n+1} ~ P_{θ_n}(X_n, ·), the transition kernel of a Hastings-Metropolis algorithm with Gaussian proposal with covariance matrix ∝ θ_n;
- update the parameter θ_{n+1}, based on θ_n and X_{1:n+1}.

In this example:
- π P_θ = π, i.e. all the kernels have the same invariant distribution;
- θ_n ∈ Θ, where Θ is a finite-dimensional parameter space.
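The draw X_{n+1} ~ P_{θ_n}(X_n, ·) can be sketched as follows, here in dimension d = 2 with a hand-rolled Cholesky factorisation. The scaling 2.38²/d is the usual rule derived from Roberts et al. (1997); the function log_pi is any log-density supplied by the user (both are assumptions of this sketch, not fixed by the slides).

```python
import math
import random


def am_draw(x, theta, log_pi, scale=2.38 ** 2 / 2):
    """One Hastings-Metropolis step with Gaussian proposal N(x, scale * theta),
    written out for d = 2."""
    # Cholesky factor L of scale * theta, so that L z ~ N(0, scale * theta)
    a = math.sqrt(scale * theta[0][0])
    b = scale * theta[1][0] / a
    c = math.sqrt(scale * theta[1][1] - b * b)
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y = [x[0] + a * z1, x[1] + b * z1 + c * z2]
    # the proposal is symmetric, so the acceptance ratio is pi(y) / pi(x)
    if math.log(random.random()) < log_pi(y) - log_pi(x):
        return y
    return x
```

In the full algorithm this draw alternates with the recursive update of (μ_n, θ_n) from the previous slide.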
Example 2: The Equi-Energy sampler (simplified) [Kou et al. (2006)]

↪ For the simulation of a multi-modal density π.

[Figure: four histograms on a multi-modal target, comparing Hastings-Metropolis, the auxiliary process, Equi-Energy with selection, and Equi-Energy without selection.]
Let
- a transition kernel P such that πP = π,
- a probability of swap ε ∈ (0, 1),
- an auxiliary process {Y_n, n ≥ 0} that "targets" the tempered density π^{1−β} (β ∈ (0, 1)).

Define iteratively the process of interest {X_n, n ≥ 0}:
- with probability (1 − ε): draw X_{n+1} ~ P(X_n, ·);
- with probability ε: draw Y at random from the past values Y_{0:n}, and accept Y or not as the new value, by an acceptance-rejection step. (simplified EE)
This yields the (simplified) Equi-Energy sampler: X_{n+1} ~ P_{θ_n}(X_n, ·), where

θ_{n+1} = 1/(n+1) Σ_{k=0}^{n} δ_{Y_k}

P_θ(x, A) = (1 − ε) P(x, A) + ε { ∫_A α(x, y) θ(dy) + 1_A(x) ∫ (1 − α(x, y)) θ(dy) }

and α(x, y) is defined such that π P_{θ⋆} = π, where θ⋆ = lim_n θ_n ∝ π^{1−β}.

In this example:
- π_θ P_θ = π_θ, i.e. the invariant distributions depend upon θ;
- θ_n ∈ Θ, where Θ is an infinite-dimensional parameter space.
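A one-dimensional sketch of one transition of P_{θ_n}: the slides do not spell out α, so this sketch uses the standard choice α(x, y) = 1 ∧ (π(y)/π(x))^β, which is the ratio for an independence proposal with density ∝ π^{1−β} and makes π invariant once θ = θ⋆. The choice of P as a random-walk Metropolis kernel is also an assumption of the sketch.

```python
import math
import random


def ee_step(x, past_y, log_pi, beta=0.5, eps=0.1, step=1.0):
    """One transition of the (simplified) Equi-Energy chain on R.
    past_y holds the past values of the auxiliary chain, whose empirical
    measure plays the role of theta_n (it "targets" pi^(1-beta))."""
    if not past_y or random.random() > eps:
        # local move: one step of P, here a symmetric random-walk Metropolis
        y = x + random.gauss(0, step)
        if math.log(random.random()) < log_pi(y) - log_pi(x):
            return y
        return x
    # swap move: propose a draw from theta_n; accept with probability
    # alpha(x, y) = min(1, (pi(y)/pi(x))**beta)
    y = random.choice(past_y)
    if math.log(random.random()) < beta * (log_pi(y) - log_pi(x)):
        return y
    return x
```

With the empirical measure θ_n in place of θ⋆, each transition is only approximately π-invariant; controlling this approximation is precisely the subject of the convergence results below.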
Conclusion

Let {P_θ, θ ∈ Θ} be a family of transition kernels on X. Consider an X × Θ-valued process {(X_n, θ_n), n ≥ 0}, adapted to a filtration F_n, such that

P(X_{n+1} ∈ A | F_n) = P_{θ_n}(X_n, A).

↪ What kind of conditions on the adaptation mechanism {θ_n, n ≥ 0} and on the transition kernels {P_θ, θ ∈ Θ} imply that there exists a distribution π such that
- convergence of the marginals: E[f(X_n)] → π(f), for f bounded;
- law of large numbers: n^{-1} Σ_{k=1}^{n} f(X_k) → π(f) a.s., for f unbounded?
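The abstract scheme — draw X_{n+1} ~ P_{θ_n}(X_n, ·), then move θ_n — can be instantiated minimally as follows. This particular instance (a scalar θ_n equal to the log step size of a random-walk Metropolis kernel, adapted by stochastic approximation toward a target acceptance rate) is an assumption of the sketch, not an example from the slides; it is chosen because the O(1/n) gains make the amount of adaptation vanish.

```python
import math
import random


def adaptive_rwm(log_pi, x0, n_iter, target_acc=0.44):
    """Minimal instance of the abstract adaptive scheme on R:
    X_{n+1} ~ P_{theta_n}(X_n, .), then theta_{n+1} = theta_n + O(1/n)."""
    x, log_step, xs = x0, 0.0, []
    for n in range(1, n_iter + 1):
        # one random-walk Metropolis step with step size exp(theta_n)
        y = x + math.exp(log_step) * random.gauss(0, 1)
        acc = math.exp(min(0.0, log_pi(y) - log_pi(x)))
        if random.random() < acc:
            x = y
        # diminishing adaptation: the parameter moves by O(1/n)
        log_step += (acc - target_acc) / n
        xs.append(x)
    return xs, math.exp(log_step)
```

Here all kernels share the invariant distribution π (as in Example 1), so the open question is whether the randomly chosen, slowly varying kernels still yield convergence of the marginals and a law of large numbers.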
II. Convergence of the marginals for adaptive MCMC samplers

For a bounded function f: E[f(X_n)] − π(f) → 0.

Even in the case where all the kernels P_θ have the same invariant distribution, it is NOT true that ergodicity automatically holds, since the kernels are chosen at random. Conditions on the adaptation mechanism are required.
Sketch of the proof

We write

E[f(X_n)] − π(f) = E[ f(X_n) − P^N_{θ_{n−N}} f(X_{n−N}) ]
                 + E[ P^N_{θ_{n−N}} f(X_{n−N}) − π_{θ_{n−N}}(f) ]
                 + E[ π_{θ_{n−N}}(f) − π(f) ]

↪ [A] For the second term, a condition on the ergodicity of the transition kernels. "Usually", the transition kernels {P_θ, θ ∈ Θ} are geometrically ergodic:

sup_{f, |f| ≤ 1} |P^n_θ f(x) − π_θ(f)| ≤ C_θ ρ_θ^n V(x),   ρ_θ ∈ (0, 1).

↪ [B] For the first term, a condition on the adaptation mechanism, since

| E[ f(X_n) − P^N_{θ_{n−N}} f(X_{n−N}) ] | ≤ Σ_{j=1}^{N−1} (N − j) E[ sup_x ‖ P_{θ_{n−N+j}}(x, ·) − P_{θ_{n−N+j−1}}(x, ·) ‖_TV ].

↪ [C] For the third term, when π_θ ≠ π, a condition on the convergence of {π_{θ_n}, n ≥ 0} to π.
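Condition [B] asks that successive kernels become close in total variation. In the Adaptive Metropolis example, successive Gaussian proposals differ by an O(1/n) change of the covariance; the following sketch (dimension 1, numerical integration, proposal distributions rather than the full kernels — an illustration only) shows how the total-variation distance shrinks with the parameter increment.

```python
import math


def tv_normal(s1, s2, grid=20001, lim=12.0):
    """Total-variation distance between N(0, s1^2) and N(0, s2^2),
    computed as (1/2) * integral of |p1 - p2| on a uniform grid."""
    h = 2 * lim / (grid - 1)
    acc = 0.0
    for i in range(grid):
        x = -lim + i * h
        p1 = math.exp(-0.5 * (x / s1) ** 2) / (s1 * math.sqrt(2 * math.pi))
        p2 = math.exp(-0.5 * (x / s2) ** 2) / (s2 * math.sqrt(2 * math.pi))
        acc += abs(p1 - p2) * h
    return acc / 2
```

For instance, tv_normal(1.0, 1.0 + 1.0 / n) decreases toward 0 as n grows, which is the kind of vanishing increment that makes the right-hand side of the bound in [B] small.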