The Metropolis-Hastings algorithm: introduction and optimal scaling


  1. The Metropolis-Hastings algorithm: introduction and optimal scaling of the transient phase
   Benjamin Jourdain, joint work with T. Lelièvre and B. Miasojedow
   Summer school CEMRACS 2017, July 19 2017
   Benjamin Jourdain (Université Paris-Est, CERMICS), July 19 2017, 1 / 33

  2. Outline of the talk
   1 Introduction to the Metropolis-Hastings algorithm
   2 Optimal scaling of the transient phase of RWMH
   3 Optimisation strategies for the RWMH algorithm
     Long time convergence of the nonlinear SDE
     Optimization strategies for the RWMH algorithm

  3. Motivation
   Simulation according to a measure π(dx) = η(x)λ(dx) / ∫_E η(y)λ(dy) on E, where
   - λ is a reference measure on (E, 𝓔),
   - η : E → R_+ is measurable and such that ∫_E η(x)λ(dx) ∈ (0, ∞).
   Examples
   - Statistical physics: simulation according to the Boltzmann-Gibbs probability measure with density proportional to η(x) = e^{−U(x)/(k_B T)} w.r.t. the Lebesgue measure λ on E = R^n (k_B Boltzmann constant, T temperature, U : R^n → R potential function).
   - Bayesian statistics: θ an E-valued parameter with prior density p_Θ(θ) with respect to λ. Denoting by p_{Y|Θ}(y|θ) the density of the observation Y when the parameter is θ, the posterior density of Θ is
     η(θ) = p_{Θ|Y}(θ|y) = p_{Y|Θ}(y|θ) p_Θ(θ) / ∫_E p_{Y|Θ}(y|ϑ) p_Θ(ϑ) λ(dϑ).
   The computation of the normalizing constant is difficult in both cases.

  4. The Metropolis-Hastings algorithm
   Let q : E × E → R_+ be a measurable function such that
   - ∀x ∈ E, ∫_E q(x, y)λ(dy) = 1,
   - simulation according to the probability measure q(x, y)λ(dy) is possible.
   Let α(x, y) = min(1, η(y)q(y, x) / (η(x)q(x, y))) if η(x)q(x, y) > 0, and α(x, y) = 1 if η(x)q(x, y) = 0.
   No need of the normalizing constant to compute α.
   Starting from an initial E-valued random variable X_0, construct a Markov chain (X_k)_{k∈N} by the following induction:
   - given (X_0, …, X_k), generate a proposal Y_{k+1} ∼ q(X_k, y)λ(dy) and an independent random variable U_{k+1} ∼ U[0, 1],
   - set X_{k+1} = Y_{k+1} 1_{U_{k+1} ≤ α(X_k, Y_{k+1})} + X_k 1_{U_{k+1} > α(X_k, Y_{k+1})}, i.e. the proposal is accepted with probability α(X_k, Y_{k+1}) and otherwise the position X_k is kept.
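The induction above can be sketched in a few lines of Python. This is an illustrative implementation, not part of the slides; the target η(x) = exp(−x⁴) and the Gaussian proposal are placeholder choices, and note that η is only needed up to its normalizing constant.

```python
import numpy as np

def metropolis_hastings(eta, q_sample, q_density, x0, n_steps, rng):
    """Generic Metropolis-Hastings chain targeting pi proportional to eta.

    eta       -- unnormalised target density
    q_sample  -- q_sample(x, rng) draws a proposal Y ~ q(x, .)
    q_density -- q_density(x, y) evaluates the proposal density q(x, y)
    """
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        y = q_sample(x, rng)
        num = eta(y) * q_density(y, x)
        den = eta(x) * q_density(x, y)
        # alpha = min(1, num/den) if den > 0, else 1
        alpha = 1.0 if den == 0 else min(1.0, num / den)
        if rng.uniform() <= alpha:
            x = y                  # proposal accepted
        chain.append(x)            # otherwise the position X_k is kept
    return np.array(chain)

# Example: eta(x) = exp(-x^4) on R, Gaussian random-walk proposal
rng = np.random.default_rng(0)
eta = lambda x: np.exp(-x**4)
q_sample = lambda x, rng: x + rng.normal(scale=0.8)
q_density = lambda x, y: np.exp(-(y - x)**2 / (2 * 0.8**2))  # symmetric in (x, y)
chain = metropolis_hastings(eta, q_sample, q_density, 0.0, 5000, rng)
```

Since the proposal here is symmetric, the densities q(y, x) and q(x, y) cancel in α; they are kept in the code to match the general algorithm on the slide.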

  5. Markov kernel of (X_k)_k
   For f : E → R measurable and bounded and X_{0:k} = (X_0, X_1, …, X_k),
   E[f(X_{k+1}) | X_{0:k}] = E[ E[ f(Y_{k+1}) 1_{U_{k+1} ≤ α(X_k, Y_{k+1})} + f(X_k) 1_{U_{k+1} > α(X_k, Y_{k+1})} | X_{0:k}, Y_{k+1} ] | X_{0:k} ]
   = E[ f(Y_{k+1}) α(X_k, Y_{k+1}) + f(X_k)(1 − α(X_k, Y_{k+1})) | X_{0:k} ]
   = ∫_E f(y) α(X_k, y) q(X_k, y) λ(dy) + f(X_k) ∫_E (1 − α(X_k, y)) q(X_k, y) λ(dy)
   = ∫_E f(y) P(X_k, dy)
   where P(x, dy) = 1_{y≠x} α(x, y) q(x, y) λ(dy) + ( ∫_{E∖{x}} (1 − α(x, z)) q(x, z) λ(dz) + q(x, x) λ({x}) ) δ_x(dy).
   Thus (X_k)_{k∈N} is a Markov chain with kernel P.

  6. Reversibility of π
   For y ≠ x,
   η(x)q(x, y) α(x, y) = η(x)q(x, y) × min(1, η(y)q(y, x)/(η(x)q(x, y))) if η(x)q(x, y) > 0 (and = 0 if η(x)q(x, y) = 0)
   = min(η(x)q(x, y), η(y)q(y, x))
   is a symmetric function of (x, y). As a consequence,
   1_{x≠y} η(x)λ(dx) P(x, dy) = 1_{x≠y} η(x)q(x, y)α(x, y) λ(dx)λ(dy) = 1_{x≠y} η(y)λ(dy) P(y, dx).
   Since the equality clearly remains true with 1_{x=y} replacing 1_{x≠y},
   π(dx)P(x, dy) = π(dy)P(y, dx), i.e. π is reversible for the Markov kernel P. This implies that
   ∫_{x∈E} π(dx)P(x, dy) = ∫_{x∈E} π(dy)P(y, dx) = π(dy)P(y, E) = π(dy), using P(y, E) = 1.
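Detailed balance can be checked numerically on a small finite state space (an illustrative sketch, not from the slides): build the MH kernel P from an unnormalised weight vector η and a proposal matrix q, then verify π_i P_ij = π_j P_ji entrywise, which in turn gives invariance π P = π.

```python
import numpy as np

# Unnormalised target weights eta on a 4-state space; uniform proposal q
eta = np.array([1.0, 3.0, 2.0, 0.5])
q = np.full((4, 4), 0.25)                 # q(i, j), each row sums to 1

# Acceptance probabilities alpha_ij = min(1, eta_j q_ji / (eta_i q_ij))
alpha = np.minimum(1.0, (eta[None, :] * q.T) / (eta[:, None] * q))

# MH kernel: off-diagonal P_ij = q_ij * alpha_ij, rejection mass stays put
P = q * alpha
np.fill_diagonal(P, 0.0)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

pi = eta / eta.sum()
# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
# Hence pi is invariant: pi P = pi
assert np.allclose(pi @ P, pi)
```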

  7. Remarks
   - The reversibility of π for the kernel P is preserved when
     α(x, y) = a( η(y)q(y, x)/(η(x)q(x, y)) ) if η(x)q(x, y) > 0, and α(x, y) = 1 if η(x)q(x, y) = 0,
     where a : R_+ → [0, 1] satisfies a(0) = 0 and a(u) = u a(1/u) for u > 0. The previous choice a(u) = min(u, 1) leads to better asymptotic properties (Peskun 1973). Other example: a(u) = u/(1 + u).
   - When E = R^n and q(x, y) = φ(y − x) for some symmetric probability density φ w.r.t. the Lebesgue measure λ (e.g. φ(z) = e^{−|z|²/(2σ²)} / (2πσ²)^{n/2}), then η(y)q(y, x)/(η(x)q(x, y)) = η(y)φ(x − y)/(η(x)φ(y − x)) = η(y)/η(x). The algorithm is then called Random Walk Metropolis-Hastings since the random variables (Y_{n+1} − X_n)_{n∈N} are i.i.d. according to φ(z)dz.
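Both acceptance functions named above satisfy the balance condition a(u) = u a(1/u), which is what makes π reversible. A quick numerical sketch (not from the slides) checking this for the Metropolis choice min(u, 1) and for u/(1 + u):

```python
import numpy as np

metropolis = lambda u: np.minimum(u, 1.0)   # a(u) = min(u, 1)
barker = lambda u: u / (1.0 + u)            # a(u) = u/(1 + u)

u = np.linspace(0.01, 10.0, 500)
for a in (metropolis, barker):
    # balance condition required for reversibility: a(u) = u * a(1/u), u > 0
    assert np.allclose(a(u), u * a(1.0 / u))
```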

  8. Ergodic theory for Markov chains
   Conditions on P and π ensuring that, as k → ∞,
   - the law of X_k converges weakly to π,
   - for f : E → R measurable and such that ∫_E |f(x)| π(dx) < ∞, (1/k) Σ_{j=0}^{k−1} f(X_j) converges a.s. to ∫_E f(x)π(dx),
   - √k ( (1/k) Σ_{j=0}^{k−1} f(X_j) − ∫_E f(x)π(dx) ) converges in law to N_1(0, σ_f²),
     where σ_f² = ∫_E ( F²(x) − ( ∫_E F(y)P(x, dy) )² ) π(dx),
     with F solving the Poisson equation
     ∀x ∈ E, F(x) − ∫_E F(y)P(x, dy) = f(x) − ∫_E f(y)π(dy), i.e. F(x) − PF(x) = f(x) − π(f).
   The central limit theorem rests on the martingale decomposition
   Σ_{j=0}^{k−1} (f(X_j) − π(f)) = Σ_{j=1}^{k−1} (F(X_j) − E[F(X_j) | X_{0:j−1}]) + F(X_0) − PF(X_{k−1}).
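The variance formula can be verified on a two-state chain, where the Poisson equation is solvable by hand. This is an illustrative sketch, not from the slides: for P = [[1−a, 1−b; ...]] style two-state kernels, σ_f² for f = 1_{state 1} has the known closed form ab(2 − a − b)/(a + b)³, which the Poisson-equation computation reproduces.

```python
import numpy as np

# Two-state Markov chain: P = [[1-a, a], [b, 1-b]], invariant pi = (b, a)/(a+b)
a, b = 0.3, 0.5
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)
f = np.array([0.0, 1.0])               # f = indicator of state 1
pi_f = pi @ f

# Poisson equation (I - P) F = f - pi(f); F is defined up to an additive
# constant, so pin F[0] = 0 and solve row 1: F1 * (1 - P11) = rhs1.
rhs = f - pi_f
F = np.zeros(2)
F[1] = rhs[1] / (1 - P[1, 1])

# sigma_f^2 = sum_x pi(x) * (F(x)^2 - (PF)(x)^2)
PF = P @ F
sigma2 = pi @ (F**2 - PF**2)

# Known closed form for the two-state chain
sigma2_exact = a * b * (2 - a - b) / (a + b) ** 3
```

With a = 0.3, b = 0.5 both routes give σ_f² = 0.3515625, agreeing to machine precision.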

  9. Outline of the talk
   1 Introduction to the Metropolis-Hastings algorithm
   2 Optimal scaling of the transient phase of RWMH
   3 Optimisation strategies for the RWMH algorithm
     Long time convergence of the nonlinear SDE
     Optimization strategies for the RWMH algorithm

  10. Random Walk Metropolis-Hastings algorithm
   Sampling of a target probability measure with density η on R^n:
   Y^n_{k+1} = X^n_k + σ G_{k+1}, where (G_k)_{k≥1} are i.i.d. ∼ N_n(0, I_n),
   q(x, y) = (2πσ²)^{−n/2} exp( −|x − y|²/(2σ²) ) = q(y, x).
   Acceptance probability α(x, y) = (η(y)/η(x)) ∧ 1.
   How to choose σ as a function of the dimension n?
   Bad exploration of the space (and therefore poor ergodic properties) in the two opposite situations:
   - σ too large: large moves are proposed but almost always rejected,
   - σ too small: the chain only moves by small steps, even if a large proportion of the proposed moves is then accepted.
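The two failure modes can be observed directly by measuring acceptance rates. A minimal sketch (not from the slides) for a standard Gaussian target in R^n; the intermediate value 2.38/√n is the classical Roberts-Gelman-Gilks stationary-phase scaling, included here only for comparison:

```python
import numpy as np

def rwmh_acceptance_rate(sigma, n, n_steps, rng):
    """Run RWMH on a standard Gaussian target in R^n, return acceptance rate."""
    log_eta = lambda x: -0.5 * x @ x       # log-density up to a constant
    x = np.zeros(n)
    accepted = 0
    for _ in range(n_steps):
        y = x + sigma * rng.standard_normal(n)
        # accept with probability min(1, eta(y)/eta(x)); work in log scale
        if np.log(rng.uniform()) <= log_eta(y) - log_eta(x):
            x = y
            accepted += 1
    return accepted / n_steps

rng = np.random.default_rng(1)
n = 20
for sigma in (0.05, 2.38 / np.sqrt(n), 2.0):
    rate = rwmh_acceptance_rate(sigma, n, 4000, rng)
    print(f"sigma = {sigma:.3f}: acceptance rate = {rate:.2f}")
```

Tiny σ yields near-certain acceptance but negligible displacement, while large σ proposes big jumps that are almost always rejected; both extremes explore the space poorly, as the slide states.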
