Adaptive and Interacting Markov chain Monte Carlo


1. Adaptive and Interacting Markov chain Monte Carlo. Gersende FORT, LTCI, CNRS & Telecom ParisTech, Paris, France. Talk based on joint works with Eric Moulines (Telecom ParisTech, France), Pierre Priouret (Univ. Paris 6, France) and Pierre Vandekerkhove (Univ. Marne-la-Vallée, France); Amandine Schreck (Telecom ParisTech, France); Benjamin Jourdain (ENPC, France), Estelle Kuhn (INRA, France), Tony Lelièvre (ENPC, France) and Gabriel Stoltz (ENPC, France).

2. Adaptive and Interacting Markov chain Monte Carlo. Introduction. Hastings-Metropolis algorithm (1/2). Given, on X ⊆ R^d (to simplify the talk), a target density π and a proposal transition kernel q(x, y), define {X_k, k ≥ 0} iteratively as: (i) draw Y ∼ q(X_k, ·); (ii) compute α(X_k, Y) = 1 ∧ [π(Y) q(Y, X_k)] / [π(X_k) q(X_k, Y)]; (iii) set X_{k+1} = Y with prob. α(X_k, Y) and X_{k+1} = X_k with prob. 1 − α(X_k, Y).
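A minimal sketch of this transition, assuming a symmetric Gaussian random-walk proposal q(x, ·) = N(x, θ) so that the proposal ratio cancels in α; the names (hm_step, log_pi) are illustrative, not from the talk.

```python
import numpy as np

def hm_step(x, log_pi, theta, rng):
    """One Hastings-Metropolis transition with a Gaussian random-walk
    proposal N(x, theta); the ratio q(Y, x)/q(x, Y) cancels by symmetry."""
    y = rng.normal(x, np.sqrt(theta))              # (i) draw Y ~ q(X_k, .)
    log_alpha = min(0.0, log_pi(y) - log_pi(x))    # (ii) alpha = 1 ^ pi(Y)/pi(X_k)
    if np.log(rng.uniform()) < log_alpha:          # (iii) accept with prob. alpha
        return y
    return x

# Illustration on a standard Gaussian target
rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * x**2
chain = np.empty(5000)
chain[0] = 0.0
for k in range(1, chain.size):
    chain[k] = hm_step(chain[k - 1], log_pi, theta=1.0, rng=rng)
```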

3. Adaptive and Interacting Markov chain Monte Carlo. Introduction. Hastings-Metropolis algorithm (2/2). Then (X_k)_{k ≥ 0} is a Markov chain with transition kernel P given by P(x, A) = ∫_A α(x, y) q(x, y) λ(dy) + 1_A(x) ∫ (1 − α(x, y)) q(x, y) λ(dy). Under conditions on π and q: Ergodic behavior: P^k(x, ·) → π, with an explicit control of ergodicity ‖P^k(x, ·) − π‖_TV ≤ B(x, k). Law of Large Numbers: (1/n) Σ_{k=1}^n f(X_k) → ∫ f π dλ a.s. Central Limit Theorem: √n [ (1/n) Σ_{k=1}^n f(X_k) − ∫ f π dλ ] →_d N(0, σ_f²).

4. Adaptive and Interacting Markov chain Monte Carlo. Introduction. Ex.: efficiency of a Gaussian Random Walk Hastings-Metropolis. When λ ≡ Lebesgue on R and q(x, ·) ≡ N(x, θ), efficiency is compared through the (estimated) lag-s autocovariance function γ_s = E[X_0 X_s] − (E[X_0])² when X_0 ∼ π. [Figure: for 3 different values of θ, [top] a path (X_k, k ≥ 1), [bottom] s ↦ γ(s)/γ(0).] ↪ Online adaptation of the design parameter θ.
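A sketch of how the bottom-row curves s ↦ γ(s)/γ(0) could be reproduced, reusing hm_step, log_pi and rng from the sketch above; the helper name acf and the three values of θ are illustrative.

```python
def acf(path, max_lag=100):
    """Empirical lag-s autocorrelations gamma(s)/gamma(0) of a scalar chain."""
    x = path - path.mean()
    gamma0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - s], x[s:]) / len(x) / gamma0
                     for s in range(max_lag + 1)])

# Three proposal variances theta: too small, moderate, too large
for theta in (0.1, 1.0, 100.0):
    path = np.empty(5000)
    path[0] = 0.0
    for k in range(1, path.size):
        path[k] = hm_step(path[k - 1], log_pi, theta=theta, rng=rng)
    print(theta, acf(path, 50)[1:6].round(2))   # slow decay <=> poor mixing
```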

5. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. Outline: Introduction; Examples of adaptive and interacting MCMC (the Adaptive Metropolis sampler, the Wang-Landau sampler, the Equi-Energy sampler); Convergence results (Unfortunately ..., Ergodic behavior, Central Limit Theorems); Conclusion; Bibliography.

6. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Adaptive Metropolis sampler. Example 1: Adaptive Metropolis (1/2). Proposed by Haario et al. (2001): learn on the fly the optimal covariance of the Gaussian proposal distribution. Define a process {X_k, k ≥ 0} such that: (i) update the chain: P(X_{k+1} ∈ A | F_k) ≡ one step of a Gaussian HM kernel with covariance matrix θ_k; (ii) update the estimate of the covariance matrix: θ_{k+1} = function(k, θ_k, X_{k+1}).

7. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Adaptive Metropolis sampler. Example 1: Adaptive Metropolis (2/2). The general framework: let P_θ be a Gaussian Hastings-Metropolis kernel, where θ is the covariance matrix of the Gaussian proposal distribution. For any θ: π P_θ = π. The adaptive algorithm: (i) sample X_{k+1} | F_k ∼ P_{θ_k}(X_k, ·); (ii) update the parameter θ_{k+1} by using θ_k, X_{k+1}. Here, θ is a covariance matrix.
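A sketch of this adaptive loop in the spirit of Haario et al. (2001), assuming the proposal covariance θ_k is a scaled running estimate of the chain's covariance plus a small regularisation; the scaling 2.38²/d, the Robbins-Monro-type covariance recursion and all names are illustrative choices, not the talk's exact update.

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n, rng, eps=1e-6):
    """Adaptive Metropolis sketch: Gaussian random-walk HM step whose
    proposal covariance theta_k is learned on the fly from the past chain."""
    d = len(x0)
    scale = 2.38**2 / d                     # classical scaling for Gaussian targets
    mean, cov = np.array(x0, float), np.eye(d)
    chain = np.empty((n, d))
    chain[0] = x0
    for k in range(1, n):
        x = chain[k - 1]
        theta_k = scale * cov + eps * np.eye(d)      # current proposal covariance
        y = rng.multivariate_normal(x, theta_k)      # (i) one Gaussian HM step
        if np.log(rng.uniform()) < min(0.0, log_pi(y) - log_pi(x)):
            x = y
        chain[k] = x
        # (ii) recursive update of the empirical mean and covariance
        gamma = 1.0 / (k + 1)
        delta = x - mean
        mean = mean + gamma * delta
        cov = cov + gamma * (np.outer(delta, delta) - cov)
    return chain
```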

8. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Wang-Landau sampler. Example 2: Wang-Landau (1/4). Proposed by Wang and Landau (2001) for sampling systems in molecular dynamics; many metastable states ↔ many local modes separated by deep valleys. Idea: let X_1, ..., X_d be a partition of X and set π_{θ⋆}(x) ∝ Σ_{i=1}^d [π(x)/θ⋆(i)] 1_{X_i}(x), with θ⋆(i) = π(X_i). The idea is to obtain samples (approx.) under π_{θ⋆}; then, by an importance ratio, these samples will approximate π. Roughly: (1/n) Σ_{k=1}^n δ_{X_k} ≈ π_{θ⋆} ⇒ (1/n) Σ_{k=1}^n Σ_{i=1}^d θ⋆(i) 1_{X_k ∈ X_i} δ_{X_k} ≈ π. WL is an algorithm which provides an estimation of θ⋆ and samples approx. distributed under π_{θ⋆}.
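A sketch of the importance-reweighting step described above, assuming samples xs approximately distributed under π_{θ⋆}, a function strata mapping a point to its stratum index, and an estimate theta of (θ⋆(1), ..., θ⋆(d)); all names are illustrative.

```python
import numpy as np

def reweighted_estimate(f, xs, strata, theta):
    """Estimate the integral of f under pi from samples drawn (approximately)
    under pi_theta*, weighting a sample in stratum X_i by theta*(i)."""
    w = np.array([theta[strata(x)] for x in xs])          # weight theta*(i)
    fx = np.array([f(x) for x in xs])
    return np.sum(w * fx) / np.sum(w)                     # self-normalised estimate
```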

9. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Wang-Landau sampler. Example 2: Wang-Landau (2/4). Define {X_k, k ≥ 0} iteratively: (i) sample X_{k+1} | F_k ∼ an MCMC sampler with target distribution π_{θ_k}; (ii) update the parameter θ_{k+1} = function(k, θ_k, X_{k+1}). The parameter {θ_k, k ≥ 0} is updated through a Stochastic Approximation procedure θ_{n+1} = θ_n + γ_{n+1} h(θ_n) + γ_{n+1} noise_{n+1}, with mean field h such that if {θ_k, k ≥ 0} converges, its limiting value is θ⋆.
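A sketch of one such iteration under simple assumptions: the MCMC move is a single random-walk HM step targeting π_{θ_k} (π reweighted by 1/θ_k(i) on stratum X_i), and θ is updated by a common stochastic-approximation variant that boosts the weight of the visited stratum and renormalises; strata, log_pi, the step size γ_k = 1/(k+1) and the proposal scale are illustrative.

```python
import numpy as np

def wang_landau(log_pi, strata, d, x0, n, rng, prop_std=1.0):
    """Wang-Landau sketch: HM step targeting pi_theta, then a stochastic
    approximation update of theta = (theta(1), ..., theta(d))."""
    theta = np.full(d, 1.0 / d)                    # initial guess for theta*
    log_pi_theta = lambda z: log_pi(z) - np.log(theta[strata(z)])
    x = np.array(x0, float)
    xs = np.empty((n, x.size))
    for k in range(n):
        # (i) one HM step with target pi_{theta_k}
        y = x + prop_std * rng.standard_normal(x.size)
        if np.log(rng.uniform()) < min(0.0, log_pi_theta(y) - log_pi_theta(x)):
            x = y
        xs[k] = x
        # (ii) SA update: increase the weight of the visited stratum, renormalise
        gamma = 1.0 / (k + 1)
        theta = theta * (1.0 + gamma * (np.arange(d) == strata(x)))
        theta = theta / theta.sum()
    return xs, theta
```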

10. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Wang-Landau sampler. Example 2: Wang-Landau (3/4). Figure: [left] level curves of π, [center] target density π, [right] partition of the state space. Figure: [left] the sequences (θ_k(i))_k, [right] the limiting values θ⋆(i).

11. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Wang-Landau sampler. Example 2: Wang-Landau (3/4). Figure: [left] level curves of π, [center] target density π, [right] partition of the state space. Figure (β = 4): [left] Wang-Landau, T = 110 000, [right] Hastings-Metropolis, T = 2 × 10^6; the red line is at x = 110 000.

12. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Wang-Landau sampler. Example 2: Wang-Landau (4/4). The general framework: let π_θ be a distribution and let P_θ be an MCMC sampler with target distribution π_θ. For any θ: π_θ P_θ = π_θ. The adaptive algorithm: (i) sample X_{k+1} | F_k ∼ P_{θ_k}(X_k, ·); (ii) update the parameter θ_{k+1} by using θ_k, X_{k+1}. Here, θ = (θ(1), ..., θ(d)) is a probability on {1, ..., d}.

13. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Equi-Energy sampler. Example 3: Equi-Energy (1/3). Proposed by Kou et al. (2006) to sample a multimodal target density π. Based on an auxiliary process designed to admit π^{1/T} (T > 1) as target distribution. [Figure: target distribution and tempered distribution π^{1/T}, with a local move, an equi-energy jump from the current state, and energy boundaries 1 and 2.] The transition kernel X_k → X_{k+1} is P̃_{θ_k}(X_k, ·) = (1 − ε) Q(X_k, ·) + ε Q_{θ_k}(X_k, ·), where Q is an MCMC kernel with target π and Q_{θ_k} is a kernel depending on the empirical distribution θ_k of the auxiliary process.

14. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Equi-Energy sampler. Example 3: Equi-Energy (2/3). Target density: mixture of 2-dim Gaussians, π = Σ_{i=1}^{20} N_2(μ_i, Σ_i); 5 processes with target distributions π^{1/T_k} (T_K = 1). [Figure: draws and means of the components for the target density at temperatures 1 to 5, and for Hastings-Metropolis.]

15. Adaptive and Interacting Markov chain Monte Carlo. Examples of adaptive and interacting MCMC. The Equi-Energy sampler. Example 3: Equi-Energy (3/3). The general framework: let P_θ be the kernel associated to an EE-transition when the equi-energy jump uses a point sampled under the distribution θ. Under assumptions, for any θ: ∃ π_θ s.t. π_θ P_θ = π_θ. The adaptive algorithm: (i) sample X_{k+1} | F_k ∼ P_{θ_k}(X_k, ·); (ii) update the distribution θ_{k+1} by using θ_k and (auxiliary process)_{k+1}. Here, θ_k is an empirical distribution on X.
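A sketch of one EE-transition under simplifying assumptions: with probability 1 − ε a local random-walk HM move targeting π, otherwise a jump to a stored auxiliary sample lying in the same energy ring as the current state, accepted with the ratio π(y) π^{1/T}(x) / (π(x) π^{1/T}(y)); the ring function, names and this idealised acceptance rule follow Kou et al. (2006) in spirit only.

```python
import numpy as np

def ee_step(x, log_pi, aux_samples, ring, T, eps, prop_std, rng):
    """One equi-energy transition (sketch): local HM move with prob. 1 - eps,
    equi-energy jump with prob. eps using samples of the auxiliary process."""
    if rng.uniform() > eps:
        # local move: symmetric random-walk HM step targeting pi
        y = x + prop_std * rng.standard_normal(x.shape)
        return y if np.log(rng.uniform()) < min(0.0, log_pi(y) - log_pi(x)) else x
    # equi-energy jump: propose a stored auxiliary point from the same energy ring
    candidates = [z for z in aux_samples if ring(z) == ring(x)]
    if not candidates:
        return x
    y = candidates[rng.integers(len(candidates))]
    # accept with ratio pi(y) pi^{1/T}(x) / (pi(x) pi^{1/T}(y))
    log_alpha = (1.0 - 1.0 / T) * (log_pi(y) - log_pi(x))
    return y if np.log(rng.uniform()) < min(0.0, log_alpha) else x
```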
