Convergence and Efficiency of the Wang Landau Algorithm

  1. Convergence and Efficiency of the Wang Landau algorithm. Gersende FORT, CNRS & Telecom ParisTech, Paris, France. Joint work with Benjamin Jourdain, Tony Lelièvre and Gabriel Stoltz (ENPC, France) and Estelle Kuhn (INRA Jouy-en-Josas, France).

  2. Goal: convergence analysis of a Monte Carlo sampler designed to sample from $\pi(x)\,d\lambda(x)$ on $\mathsf{X} \subseteq \mathbb{R}^p$ when $\pi$ is multimodal.

  3. The Wang Landau algorithm: a biasing potential approach. Instead of sampling from $\pi$, sample from $\pi^\star$,
$$\pi^\star(x) \propto \pi(x)\,\exp(-A^\star(x)),$$
where $A^\star$ is a biasing potential chosen such that $\pi^\star$ satisfies some efficiency criterion. Such a "perfect" $A^\star$ is unknown: it has to be estimated on the fly, while running the sampler. To obtain samples approximating $\pi$, use an importance sampling strategy.
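One standard way to realize this last step (a generic importance sampling identity, not spelled out on the slide): since $\pi \propto \pi^\star e^{A^\star}$, for any integrable $f$,
$$\pi(f) = \frac{\pi^\star\!\left(f\, e^{A^\star}\right)}{\pi^\star\!\left(e^{A^\star}\right)},$$
so draws targeting $\pi^\star$ can be reweighted by $e^{A^\star(X_t)}$ and self-normalized to estimate expectations under $\pi$.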

  4. The Wang Landau algorithm: definition of $\pi^\star$.
$$\pi^\star(x) \propto \pi(x)\,\exp(-A^\star(x))$$
Choose a partition $\mathcal{X}_1, \cdots, \mathcal{X}_d$ of $\mathsf{X}$ and choose $A^\star$ constant on each $\mathcal{X}_i$,
$$\pi^\star(x) \propto \sum_{i=1}^d \mathbb{1}_{\mathcal{X}_i}(x)\, \pi(x)\, \exp(-A^\star(i)),$$
and such that under $\pi^\star$ each subset $\mathcal{X}_i$ has the same weight:
$$\pi^\star(\mathcal{X}_i) = \frac{1}{d}, \qquad \frac{1}{d} = \pi(\mathcal{X}_i)\,\exp(-A^\star(i)).$$
Then
$$\pi^\star(x) = \frac{1}{d} \sum_{i=1}^d \frac{\pi(x)}{\pi(\mathcal{X}_i)}\, \mathbb{1}_{\mathcal{X}_i}(x).$$
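As a concrete illustration (an assumed toy case, not from the slides): take $d = 2$ with $\pi(\mathcal{X}_1) = 0.9$ and $\pi(\mathcal{X}_2) = 0.1$. Then $\pi^\star(x) = \frac{1}{2}\,\pi(x)/0.9$ on $\mathcal{X}_1$ and $\pi^\star(x) = \frac{1}{2}\,\pi(x)/0.1$ on $\mathcal{X}_2$, so $\pi^\star(\mathcal{X}_1) = \pi^\star(\mathcal{X}_2) = 1/2$: mass is moved from the dominant set to the rare one, which is what lets the sampler cross low-probability regions between modes.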

  5. The Wang Landau algorithm: an adaptive biasing potential algorithm. $\pi(\mathcal{X}_i)$ is unknown, so we cannot sample from $\pi^\star$ directly. Define the family of biased densities, indexed by a weight vector $\theta = (\theta(1), \cdots, \theta(d))$,
$$\pi_\theta(x) \propto \sum_{i=1}^d \frac{\pi(x)}{\theta(i)}\, \mathbb{1}_{\mathcal{X}_i}(x).$$
The algorithm iteratively produces a sequence $((\theta_t, X_t))_t$ such that
(i) $X_t \sim \pi_{\theta_t}$ or, if exact sampling is not possible, $X_t \sim P_{\theta_t}(X_{t-1}, \cdot)$ where $\pi_\theta P_\theta = \pi_\theta$;
(ii) $\lim_t \theta_t = (\pi(\mathcal{X}_1), \cdots, \pi(\mathcal{X}_d))$.

  6. The Wang Landau algorithm: update rules for the bias $\theta_t$. By definition, $\pi^\star(\mathcal{X}_i) = 1/d$. The update rules penalize the subsets $\mathcal{X}_i$ that are visited, in order to force the sampler to spend the same time in each subset $\mathcal{X}_i$. Since $\pi_\theta(\mathcal{X}_i) \propto \pi(\mathcal{X}_i)/\theta(i)$,
$$\text{if } X_{t+1} \in \mathcal{X}_i: \qquad \theta_{t+1}(i) > \theta_t(i), \qquad \theta_{t+1}(k) < \theta_t(k),\ k \neq i.$$
The rules are designed so that $\lim_t \theta_t = (\pi(\mathcal{X}_1), \cdots, \pi(\mathcal{X}_d))$.

  7. Update rules for the bias $\theta_t$ (continued). Ex. Strategy 1: non-linear update with deterministic step size $(\gamma_t)_t$,
$$\theta_{t+1}(i) = \theta_t(i)\,\frac{1 + \gamma_{t+1}}{1 + \gamma_{t+1}\theta_t(i)}, \qquad \theta_{t+1}(k) = \frac{\theta_t(k)}{1 + \gamma_{t+1}\theta_t(i)}, \quad k \neq i.$$

  8. Update rules for the bias $\theta_t$ (continued). Ex. Strategy 2: linear update with deterministic step size $(\gamma_t)_t$,
$$\theta_{t+1}(i) = \theta_t(i) + \gamma_{t+1}\,\theta_t(i)\,(1 - \theta_t(i)), \qquad \theta_{t+1}(k) = \theta_t(k) - \gamma_{t+1}\,\theta_t(i)\,\theta_t(k), \quad k \neq i.$$
Both strategies are implemented in the sketch below.
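A minimal sketch of the two update rules in Python (the NumPy setup and names are ours, not from the slides; `i` is the index of the stratum containing $X_{t+1}$):

```python
import numpy as np

def wl_update_nonlinear(theta, i, gamma):
    """Strategy 1: multiply the visited stratum's weight by (1 + gamma),
    then renormalize; this reproduces the non-linear update above."""
    theta = theta.copy()
    theta[i] *= 1.0 + gamma
    return theta / theta.sum()   # sum equals 1 + gamma * old theta[i]

def wl_update_linear(theta, i, gamma):
    """Strategy 2: linear update; the total weight stays equal to 1."""
    ti = theta[i]
    theta = theta - gamma * ti * theta        # theta(k) -= gamma*ti*theta(k)
    theta[i] = ti + gamma * ti * (1.0 - ti)   # boost the visited stratum
    return theta
```

Both maps increase $\theta(i)$, decrease the other components, and keep $\theta$ on the probability simplex, as required.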

  9. The Wang Landau algorithm: conclusion. Hereafter in the talk, WL is an iterative algorithm; each iteration consists of
(i) sampling a point $X_{t+1} \sim P_{\theta_t}(X_t, \cdot)$, where $\pi_\theta P_\theta = \pi_\theta$;
(ii) updating the biasing potential: $\theta_{t+1} = \Xi(\theta_t, X_{t+1}, t)$.
We then prove that
1. $\lim_t \theta_t = (\pi(\mathcal{X}_1), \cdots, \pi(\mathcal{X}_d))$ a.s.;
2. as $t \to \infty$, $X_t$ "approximates" $\pi^\star$: for a large class of functions $f$,
$$\lim_t \mathbb{E}[f(X_t)] = \pi^\star(f), \qquad \lim_T \frac{1}{T}\sum_{t=1}^T f(X_t) = \pi^\star(f) \ \text{a.s.};$$
and we propose an adaptive importance sampling estimator of $\pi$. A full iteration is sketched in code below.
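To fix ideas, here is a minimal, self-contained sketch of one possible WL implementation in one dimension, with a random-walk Metropolis kernel as $P_\theta$ and the linear update (Strategy 2). All names, the Gaussian proposal, and the step-size schedule $\gamma_t = 1/t$ are illustrative assumptions, not the speakers' code:

```python
import numpy as np

def wang_landau(log_pi, stratum, d, x0, n_iter, prop_std=1.0, seed=0):
    """Minimal Wang-Landau sketch on a 1-d state space.

    log_pi  : unnormalized log-density of the multimodal target pi
    stratum : maps a point x to its stratum index in {0, ..., d-1}
    Each iteration: one Metropolis step targeting
    pi_theta(x) ~ pi(x) / theta(I(x)), then the linear update of theta.
    """
    rng = np.random.default_rng(seed)
    theta = np.full(d, 1.0 / d)          # initial weights: uniform
    x, xs, thetas = x0, [], []
    for t in range(1, n_iter + 1):
        # (i) one step of P_theta: random-walk Metropolis, invariant law pi_theta
        y = x + prop_std * rng.normal()
        log_ratio = (log_pi(y) - np.log(theta[stratum(y)])) \
                  - (log_pi(x) - np.log(theta[stratum(x)]))
        if np.log(rng.uniform()) < log_ratio:
            x = y
        # (ii) linear update (Strategy 2) with step size gamma_t = 1/t
        g, i = 1.0 / t, stratum(x)
        ti = theta[i]
        theta = theta - g * ti * theta
        theta[i] = ti + g * ti * (1.0 - ti)
        xs.append(x)
        thetas.append(theta.copy())
    return np.array(xs), np.array(thetas)
```

Since $\pi(x)/\pi_\theta(x) \propto \theta(I(x))$, one natural reweighting, consistent here because $\theta_t$ converges, is the self-normalized estimator $\widehat{\pi(f)} = \sum_t f(X_t)\,\theta_t(I(X_t)) \big/ \sum_t \theta_t(I(X_t))$, one concrete form of the adaptive importance sampling strategy mentioned above.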

  10. Outline:
- The Wang Landau algorithm; conclusion
- Asymptotic behavior of the weights $(\theta_t)_t$: WL as a Stochastic Approximation algorithm; convergence of the weight sequence; rate of convergence
- Asymptotic distribution of $X_t$: WL as a sampler; ergodicity and law of large numbers; approximation of $\pi$
- Efficiency of the WL algorithm: a toy example; a second example
- References

  11. Asymptotic behavior of the weights $(\theta_t)_t$. In this section, the update of $\theta_t$ follows one of the two previous strategies,
$$\theta_{t+1} = \Xi(\theta_t, X_{t+1}, \gamma_{t+1}),$$
where $(\gamma_t)_t$ is a non-increasing positive sequence, chosen by the user, controlling the adaptation rate of the weight sequence $(\theta_t)_t$. We address (1) the convergence and (2) the rate of convergence of the weight sequence $(\theta_t)_t$.

  12. WL as a Stochastic Approximation algorithm. WL is a stochastic approximation algorithm with controlled Markovian dynamics: it produces a sequence of weights $(\theta_t)_t$ defined by
$$\theta_{t+1} = \theta_t + \gamma_{t+1}\, H(\theta_t, X_{t+1}) + O\!\left(\gamma_{t+1}^2\right),$$
where, for $i \in \{1, \cdots, d\}$,
$$H_i(\theta, x) = \theta(i)\left(\mathbb{1}_{\mathcal{X}_i}(x) - \theta(I(x))\right),$$
and $I(x)$ denotes the index of the subset $\mathcal{X}_i$ containing $x$.

  13. WL as a Stochastic Approximation algorithm (continued). The dynamics: $(X_t)_t$ is a controlled Markov chain,
$$\mathbb{P}(X_{t+1} \in A \mid \text{past}_t) = P_{\theta_t}(X_t, A).$$
Note that the field $H(\theta, X_{t+1})$ is a (random) approximation of the mean field
$$h(\theta) = \int H(\theta, x)\, \pi_\theta(x)\, \lambda(dx).$$
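Spelling out this integral (a short computation, not on the slide, using $\pi_\theta(\mathcal{X}_j) = (\pi(\mathcal{X}_j)/\theta(j)) \big/ \sum_k \pi(\mathcal{X}_k)/\theta(k)$ and $\sum_j \pi(\mathcal{X}_j) = 1$) gives, coordinatewise,
$$h_i(\theta) = \theta(i)\Big(\pi_\theta(\mathcal{X}_i) - \sum_{j=1}^d \theta(j)\,\pi_\theta(\mathcal{X}_j)\Big) = \Big(\sum_{j=1}^d \frac{\pi(\mathcal{X}_j)}{\theta(j)}\Big)^{-1}\big(\pi(\mathcal{X}_i) - \theta(i)\big),$$
which is the closed form used in the proof sketch below and which vanishes exactly at $\theta = (\pi(\mathcal{X}_1), \cdots, \pi(\mathcal{X}_d))$.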

  14. Almost-sure convergence of the WL weight sequence.
Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2014-a)). Assume:
1. The target distribution $\pi\, d\lambda$ satisfies $0 < \inf_{\mathsf{X}} \pi \leq \sup_{\mathsf{X}} \pi < \infty$ and $\inf_i \pi(\mathcal{X}_i) > 0$.
2. For any $\theta$, $P_\theta$ is a Hastings-Metropolis kernel with invariant distribution
$$\pi_\theta(x) \propto \sum_{i=1}^d \frac{\pi(x)}{\theta(i)}\, \mathbb{1}_{\mathcal{X}_i}(x)$$
and proposal distribution $q(x, y)\, d\lambda(y)$ such that $\inf_{\mathsf{X}^2} q > 0$.
3. The step-size sequence is non-increasing, positive, and satisfies
$$\sum_t \gamma_t = \infty, \qquad \sum_t \gamma_t^2 < \infty.$$
Then $\lim_t \theta_t = (\pi(\mathcal{X}_1), \cdots, \pi(\mathcal{X}_d))$ almost surely.
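For instance (our example, not the slides'), the polynomial schedules $\gamma_t = \gamma_0\, t^{-\alpha}$ with $\alpha \in (1/2, 1]$ satisfy assumption 3: $\sum_t t^{-\alpha}$ diverges for $\alpha \leq 1$, while $\sum_t t^{-2\alpha}$ converges for $\alpha > 1/2$.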

  15. Sketch of the proof (1/2).
$$\theta_{t+1} = \theta_t + \gamma_{t+1}\, H(\theta_t, X_{t+1}) + \gamma_{t+1}^2\, O(1).$$
(1.) Rewrite the update rule as a perturbation of a discretized ODE $\dot{u} = h(u)$:
$$u_{t+1} = u_t + \gamma_{t+1}\, h(u_t) + \gamma_{t+1}\, \xi_{t+1}.$$
In our case,
$$h(\theta) = \Big(\sum_{j=1}^d \frac{\pi(\mathcal{X}_j)}{\theta(j)}\Big)^{-1} \left(\begin{pmatrix} \pi(\mathcal{X}_1) \\ \vdots \\ \pi(\mathcal{X}_d) \end{pmatrix} - \theta\right).$$
(2.) Show that the ODE $\dot{u} = h(u)$ converges to the set $\mathcal{L} = \{\theta : h(\theta) = 0\} = \{(\pi(\mathcal{X}_1), \cdots, \pi(\mathcal{X}_d))\}$.
(3.) Show that the noisy discretization $(u_t)_t$ inherits the same limiting behavior and converges to $\mathcal{L}$.
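Step (2.) is easy to check numerically; here is a small sketch (with made-up stratum weights) that Euler-discretizes $\dot{u} = h(u)$ without noise and watches it settle at the unique zero of $h$:

```python
import numpy as np

# Assumed toy stratum weights pi(X_1), pi(X_2), pi(X_3); they sum to one.
pi_strata = np.array([0.7, 0.2, 0.1])

def h(u):
    """Mean field: h(u) = (sum_j pi_j/u_j)^(-1) * (pi - u)."""
    return (pi_strata - u) / np.sum(pi_strata / u)

u = np.full(3, 1.0 / 3.0)     # start from uniform weights
for _ in range(20000):
    u = u + 0.01 * h(u)       # Euler step for du/dt = h(u)

print(u)                      # -> approx. [0.7, 0.2, 0.1], i.e. the set L
```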

  16. Sketch of the proof (2/2). The last step is the most technical.
(3a.) The noisy discretization has to visit an attractive neighborhood of the limiting set $\mathcal{L}$ infinitely often.
(3b.) The noise $\xi_t$ has to be small (at least when $t$ is large),
$$\xi_{t+1} = H(\theta_t, X_{t+1}) - h(\theta_t) + \gamma_{t+1}\, O(1),$$
and this holds true since we have:
- Uniform geometric ergodicity: there exists $\rho \in (0, 1)$ such that
$$\sup_{x \in \mathsf{X},\, \theta \in \Theta} \left\| P_\theta^n(x, \cdot) - \pi_\theta \right\|_{\mathrm{TV}} \leq 2\,(1 - \rho)^n.$$
- Regularity in $\theta$ of $\pi_\theta$ and $P_\theta$: there exists $C$ such that for any $\theta, \theta' \in \Theta$ and any $x \in \mathsf{X}$,
$$\left\| P_\theta(x, \cdot) - P_{\theta'}(x, \cdot) \right\|_{\mathrm{TV}} + \left\| \pi_\theta\, d\lambda - \pi_{\theta'}\, d\lambda \right\|_{\mathrm{TV}} \leq C \sum_{i=1}^d \left| 1 - \frac{\theta'(i)}{\theta(i)} \right|.$$
