Stochastic approximation for adaptive Markov chain Monte Carlo algorithms
Gersende Fort, LTCI / CNRS - TELECOM ParisTech, France


  1. Stochastic approximation for adaptive Markov chain Monte Carlo algorithms. Gersende Fort, LTCI / CNRS - TELECOM ParisTech, France

  2. Examples of adaptive and interacting MCMC samplers
     1. Adaptive Hastings-Metropolis algorithm [Haario et al. 1999]
     2. Equi-Energy algorithm [Kou et al. 2006]
     3. Wang-Landau algorithm [Wang & Landau, 2001]

  3. Adaptive Hastings-Metropolis algorithm
     ◮ Symmetric Random Walk Hastings-Metropolis algorithm
     Goal: sample a Markov chain with stationary distribution π on R^d, where π is known up to a normalizing constant.
     Iterative mechanism: given the current sample X_n, propose a move to X_n + Y with increment Y ∼ q (so the proposed point has density q(· − X_n)); accept the move with probability
        α(X_n, X_n + Y) = 1 ∧ π(X_n + Y) / π(X_n)
     and set X_{n+1} = X_n + Y; otherwise, X_{n+1} = X_n.

  4. Design parameter: how to choose the proposal distribution q? For example, in the case q(· − x) = N_d(x; θ), how to scale the proposal, i.e. how to choose the covariance matrix θ?
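The symmetric random-walk sampler on slide 3 can be sketched as follows. This is a minimal illustration, not the speaker's code; the function and variable names are ours, and the Gaussian proposal covariance `cov` stands in for the design parameter θ discussed on slide 4.

```python
import numpy as np

def rw_metropolis(log_pi, x0, n_steps, cov, rng=None):
    """Symmetric random-walk Hastings-Metropolis.

    log_pi: log-density of the target pi, up to an additive constant
    (pi only needs to be known up to its normalizing constant).
    cov: covariance of the Gaussian proposal increment Y.
    Returns the chain and the empirical acceptance rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    chain = np.empty((n_steps + 1, d))
    chain[0] = x0
    n_accept = 0
    for n in range(n_steps):
        # propose a move to X_n + Y, with Y ~ N_d(0, cov)
        y = chain[n] + rng.multivariate_normal(np.zeros(d), cov)
        # accept with probability 1 ∧ pi(X_n + Y) / pi(X_n)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(chain[n]):
            chain[n + 1] = y
            n_accept += 1
        else:
            chain[n + 1] = chain[n]
    return chain, n_accept / n_steps
```

Note that only the ratio π(X_n + Y)/π(X_n) appears, which is why the normalizing constant of π is never needed.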

  5. [Figure: trace plots of the chain for three proposal variances (too small, too large, better), illustrating the "Goldilocks principle", with the corresponding autocorrelation plots below each trace.]

  6. ◮ Adaptive Hastings-Metropolis algorithm(s)
     Based on theoretical results [Gelman et al. 1996; ...], when the proposal is Gaussian N_d(x, θ), choose θ proportional to the covariance structure of π [Haario et al. 1999]: θ ∝ Σ_π. In practice, Σ_π is unknown and this quantity is computed "online" from the past samples of the chain:
        θ_{n+1} = n/(n+1) θ_n + 1/(n+1) { (X_{n+1} − µ_{n+1})(X_{n+1} − µ_{n+1})^T + κ Id_d }
     where µ_{n+1} is the empirical mean and κ > 0 prevents a badly scaled matrix.

  7. OR choose θ such that the mean acceptance rate converges to α⋆ [Andrieu & Robert 2001]. In practice this θ is unknown, and the parameter is adapted during the run of the algorithm: θ_n = τ_n Id with
        log τ_{n+1} = log τ_n + γ_{n+1} (α_{n+1} − α⋆)
     where α_n is the mean acceptance rate. OR ...
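The online covariance recursion of slide 6 can be written as a one-step update, paired with the usual running-mean recursion. This is a sketch under our own naming; the initialization and the value of κ are assumptions, not taken from the slides.

```python
import numpy as np

def update_adaptive_cov(theta, mu, x_new, n, kappa=1e-6):
    """One step of the online covariance recursion:

        theta_{n+1} = n/(n+1) * theta_n
                    + 1/(n+1) * ((x - mu_{n+1})(x - mu_{n+1})^T + kappa * I_d)

    where mu_{n+1} is the running empirical mean and kappa > 0
    keeps the matrix from becoming badly scaled (degenerate).
    """
    # running empirical mean: mu_{n+1} = mu_n + (x - mu_n) / (n + 1)
    mu_new = mu + (x_new - mu) / (n + 1)
    dev = (x_new - mu_new).reshape(-1, 1)
    theta_new = (n * theta + dev @ dev.T + kappa * np.eye(len(mu))) / (n + 1)
    return theta_new, mu_new
```

Fed the successive samples X_1, X_2, ..., this recursion averages the outer products and so tracks the empirical covariance of the chain, the online surrogate for Σ_π.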

  8. ◮ In practice, simultaneous adaptation of the design parameter and simulation. Given the current value of the chain X_n and the design parameter θ_n:
     Draw the next sample X_{n+1} with the transition kernel P_{θ_n}(X_n, ·).
     Update the design parameter: θ_{n+1} = Ξ_{n+1}(θ_n, X_{n+1}, ·).

  9. ◮ In this MCMC context, we are interested in the behavior of the chain {X_n, n ≥ 0}, e.g.
     Convergence of the marginals: E[f(X_n)] → π(f) for f bounded.
     Law of large numbers: n^{-1} Σ_{k=1}^n f(X_k) → π(f) (a.s. or in P).
     Central limit theorem.
     But we have π P_θ = π for any θ: all the transition kernels have the same invariant distribution π, so stability / convergence of the adaptation process {θ_n, n ≥ 0} is not the main issue.
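The two-step loop of slide 8 (simulate with P_{θ_n}, then update θ_n) can be sketched with the acceptance-rate adaptation of slide 7, θ_n = τ_n Id and log τ_{n+1} = log τ_n + γ_{n+1}(α_{n+1} − α⋆). This is our own minimal instantiation: we take γ_n = 1/n and use the acceptance probability of the current move as a noisy stand-in for the mean acceptance rate, both of which are assumptions rather than choices stated on the slides.

```python
import numpy as np

def adaptive_scale_rwm(log_pi, x0, n_steps, alpha_star=0.234, rng=None):
    """Random-walk Metropolis with stochastic-approximation scaling:
        log tau_{n+1} = log tau_n + gamma_{n+1} (alpha_{n+1} - alpha_star)
    with gamma_n = 1/n, run simultaneously with the simulation.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    x = np.asarray(x0, dtype=float)
    log_tau = 0.0
    for n in range(1, n_steps + 1):
        # (1) draw X_{n+1} from the current kernel P_{theta_n}(X_n, .)
        y = x + np.exp(log_tau) * rng.standard_normal(d)
        log_alpha = min(0.0, log_pi(y) - log_pi(x))
        if np.log(rng.uniform()) < log_alpha:
            x = y
        # (2) update the design parameter: theta_{n+1} = Xi(theta_n, X_{n+1})
        log_tau += (np.exp(log_alpha) - alpha_star) / n
    return x, np.exp(log_tau)
```

The point of the slide is visible in the structure: the adaptation step reads the freshly simulated state, and the next simulation step reads the freshly adapted parameter.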

  10. Equi-Energy sampler
      ◮ Proposed by Kou et al. (2006) for the simulation of a multi-modal density π. How to define a sampler that allows both local moves, for a local exploration of the density, and large jumps, in order to visit other modes of the target?

  11. ◮ Idea: (a) build an auxiliary process that moves between the modes far more easily, and (b) define the process of interest by running a "classical" MCMC algorithm and, sometimes, choosing a value of the auxiliary process as the new value of the process of interest: draw a point at random + an acceptance-rejection mechanism.

  12. How to define such an auxiliary process? Answer: as a process with stationary distribution π^β (β ∈ (0, 1)), a tempered version of the target π.
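Why does the tempered target π^β move between modes more easily? Raising π to a power β < 1 shrinks the ratio between the density at a mode and the density in the valley separating the modes, so a local sampler crosses far more often. A small numerical illustration on a toy bimodal mixture (the mixture itself is our example, not from the slides):

```python
import numpy as np

def mixture_pdf(x):
    """Toy bimodal target: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2),
    unnormalized (the normalizing constant is irrelevant for ratios)."""
    g = lambda m: np.exp(-0.5 * ((x - m) / 0.5) ** 2)
    return 0.5 * (g(-3.0) + g(3.0))

# Ratio of the density at a mode (x = 3) to the density in the valley
# (x = 0): a large ratio means a deep valley that a local sampler
# almost never crosses. Tempering with beta < 1 flattens it.
for beta in (1.0, 0.2):
    ratio = mixture_pdf(3.0) ** beta / mixture_pdf(0.0) ** beta
    print(f"beta={beta}: mode/valley ratio = {ratio:.3g}")
```

At β = 1 the ratio is astronomically large; at β = 0.2 it drops by several orders of magnitude, which is exactly why the auxiliary π^β process supplies useful jump proposals.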

  13. ◮ An example: a K-stage Equi-Energy sampler. Target density: a mixture of 20 two-dimensional Gaussians, π = Σ_{i=1}^{20} N_2(µ_i, Σ_i). K auxiliary processes with targets π^{1/T_i}, where T_1 > T_2 > ... > T_{K+1} = 1. [Figure: scatter plots of the draws and the component means for the target at temperatures 1 through 5, and for a plain Hastings-Metropolis sampler.]

  14. ◮ Algorithm (2 stages). Repeat:
      Update the adaptation process: θ_n = (1/n) Σ_{k=0}^{n−1} δ_{Y_k}, where {Y_n, n ≥ 0} is the auxiliary process with stationary distribution π^β.
      Update the process of interest with transition X_{n+1} ∼ P_{θ_n}(X_n, ·), where
         P_{θ_n}(x, A) = (1 − ε) P(x, A) + ε { ∫_A α(x, y) θ_n(dy) + δ_x(A) ∫ (1 − α(x, y)) θ_n(dy) }
      and the term in braces is the accept/reject mechanism.
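One transition of a kernel of this form can be sketched as follows: with probability 1 − ε make a local move with P(x, ·), and with probability ε draw a jump proposal from the empirical measure θ_n of the auxiliary chain and accept it with the ratio that corrects for the change of target from π^β to π. The names (`local_kernel`, `log_pi_aux`) and the default ε are ours, assumed for illustration.

```python
import numpy as np

def ee_step(x, log_pi, log_pi_aux, aux_samples, local_kernel, eps=0.1, rng=None):
    """One transition of an Equi-Energy-type kernel:
        P_theta(x, .) = (1 - eps) P(x, .) + eps * (equi-energy jump),
    where the jump proposes y ~ theta_n (empirical measure of the
    auxiliary chain targeting pi^beta) and accepts with probability
        alpha(x, y) = 1 ∧ [pi(y)/pi(x)] * [pi^beta(x)/pi^beta(y)].
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.uniform() > eps or len(aux_samples) == 0:
        return local_kernel(x, rng)                     # local exploration
    y = aux_samples[rng.integers(len(aux_samples))]     # draw y ~ theta_n
    log_alpha = (log_pi(y) - log_pi(x)) - (log_pi_aux(y) - log_pi_aux(x))
    return y if np.log(rng.uniform()) < log_alpha else x
```

The correction factor π^β(x)/π^β(y) is what makes the jump leave π invariant even though the proposal comes from (the empirical approximation of) the tempered distribution.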
