Parallel tempering and Interacting MCMC algorithms — Adaptive Equi-Energy sampler
Gersende Fort, Eric Moulines
Telecom ParisTech, CNRS - LTCI
Part II: Adaptive Equi-Energy samplers
Joint work with Amandine Schreck, Aurélien Garivier and Eric Moulines from LTCI, Telecom ParisTech & CNRS, France.
From Parallel Tempering to Interacting Tempering
◮ The Equi-Energy sampler (Kou et al., 2006) is an example of an Interacting Tempering algorithm.
◮ The idea is to replace an instantaneous swap by an interaction with the whole past of a neighboring process on the temperature ladder.

Equi-Energy sampler (Kou et al., 2006)
◮ Define K processes X^(t) = {X^(t)_n, n ≥ 0}, from X^(1) (hot temperature) down to X^(K), the target process.
◮ Algorithm: given the past of the previous level X^(k−1)_{1:n−1} and the current point X^(k)_{n−1}, define X^(k)_n as follows:
  ◮ (MCMC step / local moves) with probability ǫ, draw X^(k)_n ∼ P^(k)(X^(k)_{n−1}, ·), where P^(k) is a transition kernel s.t. π^(k) P^(k) = π^(k);
  ◮ (Interaction step / global moves) otherwise:
    (i) select a point among the set {X^(k−1)_{1:n−1}} with the same energy level as X^(k)_{n−1};
    (ii) accept or reject it via an acceptance-rejection ratio.
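The step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helpers `mcmc_kernel` and `ring_index` are hypothetical placeholders, and `eps` is the probability of a local move, following the slide's convention.

```python
import numpy as np

rng = np.random.default_rng(0)

def ee_step(x, past_lower, logpi, T_k, T_km1, mcmc_kernel, ring_index, eps=0.9):
    """One step of level k of the Equi-Energy sampler (sketch).

    x           : current point X^(k)_{n-1}
    past_lower  : list of past points X^(k-1)_{1:n-1} of the neighboring level
    logpi       : log of the target density pi (up to a constant)
    T_k, T_km1  : temperatures of levels k and k-1
    mcmc_kernel : x -> x', a kernel leaving pi^{1/T_k} invariant
    ring_index  : x -> index of the energy ring containing x
    eps         : probability of a local MCMC move
    """
    if rng.random() < eps or not past_lower:
        return mcmc_kernel(x)                       # local move
    # global move: pick a past point of level k-1 in the same energy ring
    same_ring = [y for y in past_lower if ring_index(y) == ring_index(x)]
    if not same_ring:
        return mcmc_kernel(x)
    y = same_ring[rng.integers(len(same_ring))]
    # equi-energy jump: accept with prob 1 ∧ pi^{1/T_k - 1/T_{k-1}}(y) / pi^{1/T_k - 1/T_{k-1}}(x)
    log_alpha = (1.0 / T_k - 1.0 / T_km1) * (logpi(y) - logpi(x))
    return y if np.log(rng.random()) < log_alpha else x
```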
Numerical application: on the interest of EE
◮ target density: π ∝ Σ_{i=1}^{20} N_2(μ_i, Σ_i), a mixture of 20 two-dimensional Gaussians
◮ K processes with target distribution π^{1/T_k} (T_K = 1)

[Figures: draws and means of the components, for the target density at temperatures 1 to 5 and for a plain Hastings-Metropolis sampler.]
“Design parameters” of the EE sampler
1. How to choose the probability of interaction ǫ?
2. How many temperatures, and which ones?
3. How many energy levels, and which ones?

Despite many convergence analyses (on EE with no selection):
◮ ergodicity: lim_n E[h(X^(K)_n)] = π(h)
◮ law of large numbers: (1/n) Σ_{j=1}^{n} h(X^(K)_j) → π(h) in probability or a.s.
◮ CLT: √n { (1/n) Σ_{j=1}^{n} h(X^(K)_j) − π(h) } →_D N(0, σ²)
see e.g. Kou, Zhou, Wong (2006); Atchadé (2010); Andrieu, Jasra, Doucet, Del Moral (2011); Fort, Moulines, Priouret (2012); Fort, Moulines, Priouret, Vandekerkhove (2012),
these problems are still open.
“Design parameters” of the EE sampler (cont.): the energy levels
◮ In the original EE, the energy rings are strata in the range of the energy H of the target π, where π(x) = exp(−H(x)):
  choose H_i s.t. min H < H_1 < ··· < H_L, and set Energy Ring #i = {x : H(x) ∈ [H_{i−1}, H_i)}.
◮ Our contribution: tune adaptively the boundaries of the strata.
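The ring lookup follows directly from this definition. A short sketch (the energy `H` and the boundary values below are illustrative placeholders):

```python
import numpy as np

def ring_index(x, H, boundaries):
    """Index of the energy ring containing x, for boundaries H_1 < ... < H_L:
    ring i = {x : H(x) in [H_{i-1}, H_i)}, with H_0 = min H and H_{L+1} = +inf."""
    return int(np.searchsorted(boundaries, H(x), side="right"))

# example: energy of a standard Gaussian target, H(x) = |x|^2 / 2
H = lambda x: 0.5 * float(np.dot(x, x))
boundaries = [1.0, 2.0, 3.0]
```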
Num. Appl.: fixed boundaries vs adapted boundaries
◮ Target distribution on R^6:
  π = (1/2) N_6(μ, 0.3 Id) + (1/2) N_6(−μ, 0.2 Id), μ = [2, ···, 2]
◮ We compare Hastings-Metropolis (HM), the EE sampler and the Adaptive EE sampler, applied with 3 temperatures and 11 strata.
◮ The last plot is the 2-d projection (u^T X; v^T X) with u^T ∝ [1, 1, ···, 1] and v^T ∝ [1, 1, 1, −1, −1, −1].
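This bimodal target is simple to code. A sketch of its log-density (using log-sum-exp for numerical stability) together with the 2-d projection used in the plots; the normalization of `u` and `v` is an assumption, since the slides only give them up to proportionality:

```python
import numpy as np

mu = np.full(6, 2.0)

def logpi(x):
    """log-density of pi = 0.5 N_6(mu, 0.3 Id) + 0.5 N_6(-mu, 0.2 Id)."""
    def log_gauss(x, m, s2):
        d = len(m)
        return -0.5 * d * np.log(2.0 * np.pi * s2) - 0.5 * np.sum((x - m) ** 2) / s2
    a = np.log(0.5) + log_gauss(x, mu, 0.3)
    b = np.log(0.5) + log_gauss(x, -mu, 0.2)
    m = max(a, b)
    return m + np.log(np.exp(a - m) + np.exp(b - m))

# 2-d projection used in the last plot (unit-norm versions of u and v)
u = np.ones(6) / np.sqrt(6.0)
v = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]) / np.sqrt(6.0)

def project(x):
    return u @ x, v @ x
```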
Behavior along one path: HM, EE, A-EE
[Top] L1 error when estimating the means, for MH, EES and SA-AEES:
  (1/6) Σ_{i=1}^{6} | (1/n) Σ_{j=1}^{n} X^(K)_{j,i} − E_π[X_i] |
[Bottom left] Time spent in the mode where the path is initialized.
[Bottom right] Probability of being in some ellipsoids, for the first mode (solid line) and the second one (dashed line); the true probability is 0.05.
Behavior on 50 independent runs: HM, EE, A-EE
[Top] L1 error when estimating the means, for HM (red), EES (black) and AEES (blue):
  (1/6) Σ_{i=1}^{6} | (1/n) Σ_{j=1}^{n} X^(K)_{j,i} − E_π[X_i] |
[Bottom left] Percent of the time spent in the mode where the path is initialized (first component), for EES (black) and AEES (blue).
[Bottom right] Probability of being in the left ellipsoid (first mode), for EES (black) and AEES (blue).
Adaptive tuning of the boundaries of the energy rings
֒→ How to define the boundaries H_1, ···, H_L of the energy rings?
Algorithm
◮ Level 1 (hot level):
  ◮ sample X^(1) with target π^{1/T_1} (MCMC);
  ◮ at each time n, update the boundaries H^(1)_{n,1}, ···, H^(1)_{n,L} computed from X^(1)_{1:n}.
◮ Level 2:
  ◮ sample X^(2) (MCMC step and interaction step) with target π^{1/T_2}; for the interaction step, use the boundaries H^(1)_•;
  ◮ at each time n, update the boundaries H^(2)_{n,1}, ···, H^(2)_{n,L} computed from X^(2)_{1:n}.
◮ Repeat until Level K.
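The slides do not spell out the update rule for the boundaries. One natural choice (an assumption here, not a statement of the authors' exact rule) is to take them as empirical quantiles of the energies H(X^(k)_{1:n}) observed so far, so that each ring captures a comparable fraction of the samples:

```python
import numpy as np

def update_boundaries(energies, L):
    """Boundaries H_{n,1} < ... < H_{n,L} as empirical quantiles of the
    energies of the chain up to time n (assumed update rule, for
    illustration only)."""
    probs = np.arange(1, L + 1) / (L + 1)
    return np.quantile(np.asarray(energies), probs)
```

With this choice, adding one new energy at time n moves each empirical quantile by O(1/n), which is the kind of slowdown the diminishing-adaptation condition on H^(k)_{n,•} asks for.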
On the convergence of such adaptive schemes
Convergence result: we prove ergodicity and a strong law of large numbers for A-EE.
Our approach for the proof is by induction:
◮ we assume the process X^(k−1) “converges”;
◮ we prove that the process X^(k) has the same convergence properties;
◮ repeat from level 1 to K.
Tools for the proof:
◮ the conditional distribution L(X^(k)_n | past^(1:k)_{n−1}) is P^(k)_{θ_{n−1}}(X^(k)_{n−1}, ·), with

  P^(k)_θ(x, dy) = ǫ P^(k)(x, dy) + (1 − ǫ) K^(k)_θ(x, dy)

  K^(k)_θ(x, A) = ∫_A α^(k)_θ(x, y) [g_θ(x, y) θ(dy) / ∫ g_θ(x, z) θ(dz)] + δ_x(A) ∫ {1 − α^(k)_θ(x, y)} [g_θ(x, y) θ(dy) / ∫ g_θ(x, z) θ(dz)]

  α^(k)_θ(x, y) = 1 ∧ [π^{1/T_k − 1/T_{k−1}}(y) / π^{1/T_k − 1/T_{k−1}}(x)] · [∫ g_θ(x, z) θ(dz) / ∫ g_θ(y, z) θ(dz)]

  θ_n = (1/n) Σ_{j=1}^{n} δ_{X^(k−1)_j}

  g_{θ_n}(x, y) = “x and y are in the same energy ring with boundaries defined by H^(k−1)_{n,•}”,
  e.g. g_{θ_n}(x, y) = 1 if x, y are in the same energy ring, 0 otherwise.
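The acceptance ratio α_θ above corrects the tempering ratio by the empirical masses ∫ g_θ(x, z) θ(dz) and ∫ g_θ(y, z) θ(dz), i.e. by the fraction of past points in the rings of x and y. A sketch in Python (helper names `ring_of` and `ring_counts` are hypothetical):

```python
import numpy as np

def log_alpha(x, y, logpi, T_k, T_km1, ring_of, ring_counts):
    """log alpha_theta(x, y): tempering ratio times the ratio of the
    empirical masses theta_n puts on the rings of x and y (sketch)."""
    log_temper = (1.0 / T_k - 1.0 / T_km1) * (logpi(y) - logpi(x))
    # integral of g_theta(x, .) d theta  ~  count of past points in x's ring
    log_mass = np.log(ring_counts[ring_of(x)]) - np.log(ring_counts[ring_of(y)])
    return min(0.0, log_temper + log_mass)
```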
On the convergence of such adaptive schemes (cont.)
Tools for the proof (cont.):
◮ containment and diminishing adaptation conditions, extending the pioneering work of Roberts, Rosenthal (2005), + Poisson equation + limit theorems for martingales;
◮ conditions on the adapted boundaries:
  (i) there exists β > 0 s.t. lim_n n^β ‖ H^(k)_{n,•} − H^(k)_{n−1,•} ‖ = 0 w.p.1;
  (ii) H^(k)_{n,•} → H^(k)_{∞,•} w.p.1 when n → ∞;
  (iii) assumption on the limiting boundaries: inf_x ∫ g^(k)_∞(x, y) π^{1/T_k}(dy) > 0.