The Metropolis-Hastings algorithm: introduction and optimal scaling


  1. The Metropolis-Hastings algorithm: introduction and optimal scaling of the transient phase
   Benjamin Jourdain, joint work with T. Lelièvre and B. Miasojedow
   Summer school CEMRACS 2017, July 19 2017
   Benjamin Jourdain (Université Paris-Est, CERMICS), July 19 2017, 1 / 33

  2. Outline of the talk
   1 Introduction to the Metropolis-Hastings algorithm
   2 Optimal scaling of the transient phase of RWMH
   3 Optimisation strategies for the RWMH algorithm
     Long time convergence of the nonlinear SDE
     Optimization strategies for the RWMH algorithm

  3. Motivation
   Simulation according to a measure π(dx) = η(x)λ(dx) / ∫_E η(y)λ(dy) on E, where
   - λ is a reference measure on (E, 𝓔),
   - η : E → R_+ is measurable and such that ∫_E η(x)λ(dx) ∈ (0, ∞).
   Examples
   - Statistical physics: simulation according to the Boltzmann-Gibbs probability measure with density proportional to η(x) = e^{−U(x)/(k_B T)} w.r.t. the Lebesgue measure λ on E = R^n (k_B Boltzmann constant, T temperature, U : R^n → R potential function).
   - Bayesian statistics: θ an E-valued parameter with prior density p_Θ(θ) with respect to λ. Denoting by p_{Y|Θ}(y|θ) the density of the observation Y when the parameter is θ, the posterior density of Θ is
     η(θ) = p_{Θ|Y}(θ|y) = p_{Y|Θ}(y|θ) p_Θ(θ) / ∫_E p_{Y|Θ}(y|ϑ) p_Θ(ϑ) λ(dϑ).
   The computation of the normalizing constant is difficult in both cases.

  4. The Metropolis-Hastings algorithm
   Let q : E × E → R_+ be a measurable function such that
   - ∀x ∈ E, ∫_E q(x, y)λ(dy) = 1,
   - simulation according to the probability measure q(x, y)λ(dy) is possible.
   Let α(x, y) = min(1, η(y)q(y, x) / (η(x)q(x, y))) if η(x)q(x, y) > 0, and α(x, y) = 1 if η(x)q(x, y) = 0.
   No need of the normalizing constant to compute α.
   Starting from an initial E-valued random variable X_0, construct a Markov chain (X_k)_{k∈N} by the following induction:
   - given (X_0, …, X_k), generate a proposal Y_{k+1} ∼ q(X_k, y)λ(dy) and an independent random variable U_{k+1} ∼ U[0, 1],
   - set X_{k+1} = Y_{k+1} 1_{U_{k+1} ≤ α(X_k, Y_{k+1})} + X_k 1_{U_{k+1} > α(X_k, Y_{k+1})}, i.e. the proposal is accepted with probability α(X_k, Y_{k+1}) and otherwise the position X_k is kept.
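The induction above can be sketched in a few lines of Python. This is an illustrative implementation, not part of the slides; the target η(x) = exp(−x⁴) and the Gaussian proposal are placeholder choices, and note that η is only needed up to its normalizing constant.

```python
import numpy as np

def metropolis_hastings(eta, q_sample, q_density, x0, n_steps, rng):
    """Generic Metropolis-Hastings chain targeting pi proportional to eta.

    eta       -- unnormalised target density
    q_sample  -- q_sample(x, rng) draws a proposal Y ~ q(x, .)
    q_density -- q_density(x, y) evaluates the proposal density q(x, y)
    """
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        y = q_sample(x, rng)
        num = eta(y) * q_density(y, x)
        den = eta(x) * q_density(x, y)
        # alpha = min(1, num/den) if den > 0, else 1
        alpha = 1.0 if den == 0 else min(1.0, num / den)
        if rng.uniform() <= alpha:
            x = y                  # proposal accepted
        chain.append(x)            # otherwise the position X_k is kept
    return np.array(chain)

# Example: eta(x) = exp(-x^4) on R, Gaussian random-walk proposal
rng = np.random.default_rng(0)
eta = lambda x: np.exp(-x**4)
q_sample = lambda x, rng: x + rng.normal(scale=0.8)
q_density = lambda x, y: np.exp(-(y - x)**2 / (2 * 0.8**2))  # symmetric in (x, y)
chain = metropolis_hastings(eta, q_sample, q_density, 0.0, 5000, rng)
```

Since the proposal here is symmetric, the densities q(y, x) and q(x, y) cancel in α; they are kept in the code to match the general algorithm on the slide.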

  5. Markov kernel of (X_k)_k
   For f : E → R measurable and bounded and X_{0:k} = (X_0, X_1, …, X_k),
   E[f(X_{k+1}) | X_{0:k}] = E[ E[ f(Y_{k+1}) 1_{U_{k+1} ≤ α(X_k, Y_{k+1})} + f(X_k) 1_{U_{k+1} > α(X_k, Y_{k+1})} | X_{0:k}, Y_{k+1} ] | X_{0:k} ]
   = E[ f(Y_{k+1}) α(X_k, Y_{k+1}) + f(X_k)(1 − α(X_k, Y_{k+1})) | X_{0:k} ]
   = ∫_E f(y) α(X_k, y) q(X_k, y) λ(dy) + f(X_k) ∫_E (1 − α(X_k, y)) q(X_k, y) λ(dy)
   = ∫_E f(y) P(X_k, dy)
   where P(x, dy) = 1_{y≠x} α(x, y) q(x, y) λ(dy) + ( ∫_{E∖{x}} (1 − α(x, z)) q(x, z) λ(dz) + q(x, x) λ({x}) ) δ_x(dy).
   Thus (X_k)_{k∈N} is a Markov chain with kernel P.

  6. Reversibility of π
   For y ≠ x,
   η(x)q(x, y) α(x, y) = η(x)q(x, y) × min(1, η(y)q(y, x)/(η(x)q(x, y))) if η(x)q(x, y) > 0 (and = 0 if η(x)q(x, y) = 0)
   = min(η(x)q(x, y), η(y)q(y, x))
   is a symmetric function of (x, y). As a consequence,
   1_{x≠y} η(x)λ(dx) P(x, dy) = 1_{x≠y} η(x)q(x, y)α(x, y) λ(dx)λ(dy) = 1_{x≠y} η(y)λ(dy) P(y, dx).
   Since the equality clearly remains true with 1_{x=y} replacing 1_{x≠y},
   π(dx)P(x, dy) = π(dy)P(y, dx), i.e. π is reversible for the Markov kernel P. This implies that
   ∫_{x∈E} π(dx)P(x, dy) = ∫_{x∈E} π(dy)P(y, dx) = π(dy)P(y, E) = π(dy), using P(y, E) = 1.
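Detailed balance can be checked numerically on a small finite state space (an illustrative sketch, not from the slides): build the MH kernel P from an unnormalised weight vector η and a proposal matrix q, then verify π_i P_ij = π_j P_ji entrywise, which in turn gives invariance π P = π.

```python
import numpy as np

# Unnormalised target weights eta on a 4-state space; uniform proposal q
eta = np.array([1.0, 3.0, 2.0, 0.5])
q = np.full((4, 4), 0.25)                 # q(i, j), each row sums to 1

# Acceptance probabilities alpha_ij = min(1, eta_j q_ji / (eta_i q_ij))
alpha = np.minimum(1.0, (eta[None, :] * q.T) / (eta[:, None] * q))

# MH kernel: off-diagonal P_ij = q_ij * alpha_ij, rejection mass stays put
P = q * alpha
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

pi = eta / eta.sum()
# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
# Hence pi is invariant: pi P = pi
assert np.allclose(pi @ P, pi)
```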

  7. Remarks
   - The reversibility of π for the kernel P is preserved when
     α(x, y) = a( η(y)q(y, x)/(η(x)q(x, y)) ) if η(x)q(x, y) > 0, and α(x, y) = 1 if η(x)q(x, y) = 0,
     where a : R_+ → [0, 1] satisfies a(0) = 0 and a(u) = u a(1/u) for u > 0. The previous choice a(u) = min(u, 1) leads to better asymptotic properties (Peskun 1973). Other example: a(u) = u/(1 + u).
   - When E = R^n and q(x, y) = φ(y − x) for some symmetric probability density φ w.r.t. the Lebesgue measure λ (e.g. φ(z) = e^{−|z|²/(2σ²)} / (2πσ²)^{n/2}), then η(y)q(y, x)/(η(x)q(x, y)) = η(y)φ(x − y)/(η(x)φ(y − x)) = η(y)/η(x). The algorithm is then called Random Walk Metropolis-Hastings since the random variables (Y_{n+1} − X_n)_{n∈N} are i.i.d. according to φ(z)dz.
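Both acceptance functions named above satisfy the balance condition a(u) = u a(1/u), which is what makes π reversible. A quick numerical sketch (not from the slides) checking this for the Metropolis choice min(u, 1) and for u/(1 + u):

```python
import numpy as np

metropolis = lambda u: np.minimum(u, 1.0)   # a(u) = min(u, 1)
barker = lambda u: u / (1.0 + u)            # a(u) = u/(1 + u)

u = np.linspace(0.01, 10.0, 500)
for a in (metropolis, barker):
    # balance condition required for reversibility: a(u) = u * a(1/u), u > 0
    assert np.allclose(a(u), u * a(1.0 / u))
```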

  8. Ergodic theory for Markov chains
   Conditions on P and π ensuring that, as k → ∞,
   - the law of X_k converges weakly to π,
   - for f : E → R measurable and such that ∫_E |f(x)| π(dx) < ∞, (1/k) Σ_{j=0}^{k−1} f(X_j) converges a.s. to ∫_E f(x)π(dx),
   - √k ( (1/k) Σ_{j=0}^{k−1} f(X_j) − ∫_E f(x)π(dx) ) converges in law to N_1(0, σ_f²),
     where σ_f² = ∫_E ( F²(x) − ( ∫_E F(y)P(x, dy) )² ) π(dx),
     with F solving the Poisson equation
     ∀x ∈ E, F(x) − ∫_E F(y)P(x, dy) = f(x) − ∫_E f(y)π(dy), i.e. F(x) − PF(x) = f(x) − π(f).
   The central limit theorem rests on the martingale decomposition
   Σ_{j=0}^{k−1} (f(X_j) − π(f)) = Σ_{j=1}^{k−1} (F(X_j) − E[F(X_j) | X_{0:j−1}]) + F(X_0) − PF(X_{k−1}).
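The variance formula can be verified on a two-state chain, where the Poisson equation is solvable by hand. This is an illustrative sketch, not from the slides: for P = [[1−a, 1−b; ...]] style two-state kernels, σ_f² for f = 1_{state 1} has the known closed form ab(2 − a − b)/(a + b)³, which the Poisson-equation computation reproduces.

```python
import numpy as np

# Two-state Markov chain: P = [[1-a, a], [b, 1-b]], invariant pi = (b, a)/(a+b)
a, b = 0.3, 0.5
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)
f = np.array([0.0, 1.0])               # f = indicator of state 1
pi_f = pi @ f

# Poisson equation (I - P) F = f - pi(f); F is defined up to an additive
# constant, so pin F[0] = 0 and solve row 1: F1 * (1 - P11) = rhs1.
rhs = f - pi_f
F = np.zeros(2)
F[1] = rhs[1] / (1 - P[1, 1])

# sigma_f^2 = sum_x pi(x) * (F(x)^2 - (PF)(x)^2)
PF = P @ F
sigma2 = pi @ (F**2 - PF**2)

# Known closed form for the two-state chain
sigma2_exact = a * b * (2 - a - b) / (a + b) ** 3
```

With a = 0.3, b = 0.5 both routes give σ_f² = 0.3515625, agreeing to machine precision.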

  9. Outline of the talk
   1 Introduction to the Metropolis-Hastings algorithm
   2 Optimal scaling of the transient phase of RWMH
   3 Optimisation strategies for the RWMH algorithm
     Long time convergence of the nonlinear SDE
     Optimization strategies for the RWMH algorithm

  10. Random Walk Metropolis-Hastings algorithm
   Sampling of a target probability measure with density η on R^n:
   Y^n_{k+1} = X^n_k + σ G_{k+1}, where (G_k)_{k≥1} are i.i.d. ∼ N_n(0, I_n),
   q(x, y) = (2πσ²)^{−n/2} exp( −|x − y|²/(2σ²) ) = q(y, x).
   Acceptance probability α(x, y) = (η(y)/η(x)) ∧ 1.
   How to choose σ as a function of the dimension n?
   Bad exploration of the space (and therefore poor ergodic properties) in the two opposite situations:
   - σ too large: large moves are proposed but almost always rejected,
   - σ too small: the chain only moves by small steps, even if a large proportion of the proposed moves is then accepted.
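The two failure modes can be observed directly by measuring acceptance rates. A minimal sketch (not from the slides) for a standard Gaussian target in R^n; the intermediate value 2.38/√n is the classical Roberts-Gelman-Gilks stationary-phase scaling, included here only for comparison:

```python
import numpy as np

def rwmh_acceptance_rate(sigma, n, n_steps, rng):
    """Run RWMH on a standard Gaussian target in R^n, return acceptance rate."""
    log_eta = lambda x: -0.5 * x @ x       # log-density up to a constant
    x = np.zeros(n)
    accepted = 0
    for _ in range(n_steps):
        y = x + sigma * rng.standard_normal(n)
        # accept with probability min(1, eta(y)/eta(x)); work in log scale
        if np.log(rng.uniform()) <= log_eta(y) - log_eta(x):
            x = y
            accepted += 1
    return accepted / n_steps

rng = np.random.default_rng(1)
n = 20
for sigma in (0.05, 2.38 / np.sqrt(n), 2.0):
    rate = rwmh_acceptance_rate(sigma, n, 4000, rng)
    print(f"sigma = {sigma:.3f}: acceptance rate = {rate:.2f}")
```

Tiny σ yields near-certain acceptance but negligible displacement, while large σ proposes big jumps that are almost always rejected; both extremes explore the space poorly, as the slide states.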
