Advanced Simulation - Lecture 7
George Deligiannidis
February 8th, 2016
Metropolis–Hastings algorithm

Target distribution on X = R^d with density π(x). Proposal distribution: for any x, x′ ∈ X, we have q(x′ | x) ≥ 0 and ∫_X q(x′ | x) dx′ = 1.

Starting with X^(1), for t = 2, 3, ...

1. Sample X⋆ ∼ q(· | X^(t−1)).
2. Compute

   α(X⋆ | X^(t−1)) = min( 1, [π(X⋆) q(X^(t−1) | X⋆)] / [π(X^(t−1)) q(X⋆ | X^(t−1))] ).

3. Sample U ∼ U[0, 1]. If U ≤ α(X⋆ | X^(t−1)), set X^(t) = X⋆; otherwise set X^(t) = X^(t−1).

Lecture 7 Metropolis–Hastings Properties 2 / 30
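The three steps above can be sketched in a few lines of generic code. This is not from the slides, just a minimal illustration (function and argument names are ours); it works with log-densities for numerical stability, which is standard practice.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, x0, n_iter, rng):
    """Generic Metropolis-Hastings sampler.

    log_pi  : x -> log pi(x), up to an additive constant
    propose : x -> one draw from q(. | x)
    log_q   : (x_new, x_old) -> log q(x_new | x_old), up to a constant
    """
    x = x0
    chain = [x]
    for _ in range(n_iter):
        x_star = propose(x)                                  # step 1: propose
        log_alpha = (log_pi(x_star) + log_q(x, x_star)
                     - log_pi(x) - log_q(x_star, x))         # step 2: log acceptance ratio
        if np.log(rng.uniform()) <= min(0.0, log_alpha):     # step 3: accept or reject
            x = x_star
        chain.append(x)
    return np.array(chain)
```

For a symmetric proposal, `log_q` can simply return a constant, since it cancels in the ratio.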
Some results

Proposition. If q(x⋆ | x) > 0 for any x, x⋆ ∈ supp(π), then the Metropolis–Hastings chain is irreducible; in fact every state can be reached in a single step (strong irreducibility). Less strict conditions are given in (Roberts & Rosenthal, 2004).

Proposition. If the MH chain is irreducible then it is also Harris recurrent (see Tierney, 1994).
LLN for MH

Theorem. If the Markov chain generated by the Metropolis–Hastings sampler is π-irreducible, then for any integrable function φ : X → R,

   lim_{t→∞} (1/t) ∑_{i=1}^t φ(X^(i)) = ∫_X φ(x) π(x) dx

for every starting value X^(1).
Random Walk Metropolis–Hastings

In Metropolis–Hastings, pick q(x⋆ | x) = g(x⋆ − x) with g a symmetric distribution; thus X⋆ = X + ε with ε ∼ g. For example, g can be a zero-mean multivariate normal or Student-t. The acceptance probability becomes

   α(x⋆ | x) = min( 1, π(x⋆) / π(x) ).

We accept
- a move to a more probable state with probability 1;
- a move to a less probable state with probability π(x⋆)/π(x) < 1.
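One random-walk step can be sketched as follows (our own naming, assuming a Gaussian increment); since g is symmetric, the proposal density cancels and only the target ratio appears.

```python
import numpy as np

def rwmh_step(log_pi, x, sigma, rng):
    """One random-walk Metropolis step: X* = X + eps, eps ~ N(0, sigma^2 I).
    Because g is symmetric, q cancels and alpha = min(1, pi(x*)/pi(x))."""
    x_star = x + sigma * rng.normal(size=np.shape(x))
    log_alpha = min(0.0, log_pi(x_star) - log_pi(x))
    return x_star if np.log(rng.uniform()) <= log_alpha else x
```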
Independent Metropolis–Hastings

Independent proposal: a proposal distribution q(x⋆ | x) = q(x⋆) which does not depend on x. The acceptance probability becomes

   α(x⋆ | x) = min( 1, [π(x⋆) q(x)] / [π(x) q(x⋆)] ).

For instance, a multivariate normal or Student-t distribution. If π(x)/q(x) < M for all x and some M < ∞, then the chain is uniformly ergodic. The acceptance probability at stationarity is at least 1/M (Lemma 7.9 of Robert & Casella). On the other hand, if such an M does not exist, the chain is not even geometrically ergodic!
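As an illustrative sketch (not from the slides; the target and proposal are our own choices), here is an independence sampler for a standard normal target with a Student-t(3) proposal. The t tails dominate the Gaussian tails, so π(x)/q(x) is bounded and the boundedness condition above holds.

```python
import math
import numpy as np

def log_norm(x):
    # log density of N(0, 1)
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_t3(x):
    # log density of Student-t with 3 degrees of freedom
    c = math.lgamma(2.0) - math.lgamma(1.5) - 0.5 * math.log(3.0 * math.pi)
    return c - 2.0 * math.log1p(x * x / 3.0)

def independent_mh(n_iter, rng):
    """Independence sampler: target N(0,1), proposal t_3.
    The acceptance ratio involves the weights w(x) = pi(x)/q(x)."""
    x, chain = 0.0, []
    for _ in range(n_iter):
        x_star = rng.standard_t(3)
        log_alpha = ((log_norm(x_star) - log_t3(x_star))
                     - (log_norm(x) - log_t3(x)))
        if math.log(rng.uniform()) <= min(0.0, log_alpha):
            x = x_star
        chain.append(x)
    return np.array(chain)
```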
Choosing a good proposal distribution

Goal: design a Markov chain with small correlation ρ(X^(t−1), X^(t)) between subsequent values (why?). Two sources of correlation:
- between the current state X^(t−1) and the proposed value X⋆ ∼ q(· | X^(t−1));
- correlation induced when X^(t) = X^(t−1), i.e. when the proposal is rejected.

Trade-off: there is a compromise between proposing large moves and obtaining a decent acceptance probability. For multivariate distributions, the covariance of the proposal should reflect the covariance structure of the target.
Choice of proposal

Target distribution we want to sample from:

   π(x) = N( x; (0, 0), [[1, 0.5], [0.5, 1]] ).

We use a random walk Metropolis–Hastings algorithm with

   g(ε) = N( ε; 0, σ² [[1, 0], [0, 1]] ).

What is the optimal choice of σ²? We consider three choices: σ² = 0.1², 1, 10².
Metropolis–Hastings algorithm

[Figures: trace plots over 10,000 steps and marginal density estimates of X₁ and X₂, for the Metropolis–Hastings chain on the bivariate Gaussian target.
- With σ² = 0.1², the acceptance rate is ≈ 94%.
- With σ² = 1, the acceptance rate is ≈ 52%.
- With σ² = 10², the acceptance rate is ≈ 1.5%.]

Lecture 7 Metropolis–Hastings Which proposal? 9 / 30
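The acceptance rates quoted for the three values of σ² can be reproduced with a short script (a sketch with our own function names, not the lecture's code):

```python
import numpy as np

def rwmh_acceptance(sigma, n_iter, rng):
    """Random-walk MH on the bivariate Gaussian target of the example
    (zero mean, unit variances, correlation 0.5); returns the acceptance rate."""
    prec = np.linalg.inv(np.array([[1.0, 0.5], [0.5, 1.0]]))
    log_pi = lambda x: -0.5 * x @ prec @ x   # log density up to a constant
    x = np.zeros(2)
    accepted = 0
    for _ in range(n_iter):
        x_star = x + sigma * rng.normal(size=2)   # isotropic Gaussian increment
        if np.log(rng.uniform()) <= log_pi(x_star) - log_pi(x):
            x = x_star
            accepted += 1
    return accepted / n_iter
```

Small σ gives near-certain acceptance but tiny moves; large σ gives bold moves that are almost always rejected.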
Choice of proposal

Aim at some intermediate acceptance rate: 20%? 40%? Some hints come from the literature on "optimal scaling", which suggests tuning toward an acceptance rate of about 0.234. Alternatively, maximize the expected square jumping distance

   E[ ||X^(t+1) − X^(t)||² ].

In multivariate cases, try to mimic the covariance structure of the target distribution.

Cooking recipe: run the algorithm for T iterations, check some criterion, tune the proposal distribution accordingly, run the algorithm for T iterations again, etc.

"Constructing a chain that mixes well is somewhat of an art." All of Statistics, L. Wasserman.
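The expected square jumping distance can be estimated directly from a stored chain; a small helper along these lines (our own naming) is all that is needed when comparing proposal scales:

```python
import numpy as np

def esjd(chain):
    """Estimate E[||X_{t+1} - X_t||^2] from a chain stored as an (n, d) array.
    Rejected proposals contribute zero-length jumps, so ESJD penalises both
    timid proposals and high rejection rates."""
    chain = np.asarray(chain, dtype=float)
    diffs = np.diff(chain, axis=0)            # successive displacements
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```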
The adaptive MCMC approach

One can make the transition kernel K adaptive, i.e. use K_t at iteration t and choose K_t using the past samples (X^(1), ..., X^(t−1)). The Markov chain is then no longer homogeneous: the mathematical study of the algorithm is much more complicated. Adaptation can be counterproductive in some cases (see Atchadé & Rosenthal, 2005)! Adaptive Gibbs samplers also exist.
Sophisticated Proposals

The "Langevin" proposal relies on

   X⋆ = X^(t−1) + (σ²/2) ∇log π(X^(t−1)) + σW, where W ∼ N(0, I_d),

so the Metropolis–Hastings acceptance ratio is

   [π(X⋆) q(X^(t−1) | X⋆)] / [π(X^(t−1)) q(X⋆ | X^(t−1))]
   = [π(X⋆) N( X^(t−1); X⋆ + (σ²/2) ∇log π(X⋆), σ² I_d )]
     / [π(X^(t−1)) N( X⋆; X^(t−1) + (σ²/2) ∇log π(X^(t−1)), σ² I_d )].

There is also the possibility to use higher-order derivatives:

   X⋆ = X^(t−1) + (σ²/2) ( ∇²log π(X^(t−1)) )^(−1) ∇log π(X^(t−1)) + σW.
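One Langevin (MALA) step can be sketched as follows. This is illustrative code with our own names, assuming the gradient of log π is available; the Gaussian normalising constants cancel in the ratio because both proposal densities share the same σ, so they are omitted.

```python
import numpy as np

def mala_step(log_pi, grad_log_pi, x, sigma, rng):
    """One Metropolis-adjusted Langevin step with proposal
    N(x + (sigma^2/2) grad log pi(x), sigma^2 I)."""
    def mean(z):
        return z + 0.5 * sigma ** 2 * grad_log_pi(z)
    x_star = mean(x) + sigma * rng.normal(size=np.shape(x))
    # log q(y | z), dropping the constant that cancels in the ratio
    log_q = lambda y, z: -0.5 * np.sum((y - mean(z)) ** 2) / sigma ** 2
    log_alpha = (log_pi(x_star) + log_q(x, x_star)
                 - log_pi(x) - log_q(x_star, x))
    return x_star if np.log(rng.uniform()) <= min(0.0, log_alpha) else x
```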
Sophisticated Proposals

We can use q(X⋆ | X^(t−1)) = g(X⋆; φ(X^(t−1))), where g is a distribution on X with parameters φ(X^(t−1)) and φ is a deterministic mapping. Then

   [π(X⋆) q(X^(t−1) | X⋆)] / [π(X^(t−1)) q(X⋆ | X^(t−1))]
   = [π(X⋆) g(X^(t−1); φ(X⋆))] / [π(X^(t−1)) g(X⋆; φ(X^(t−1)))].

For instance, one can use heuristics borrowed from optimization techniques.
Sophisticated Proposals

The following link shows a comparison of adaptive Metropolis–Hastings, Gibbs sampling, and the No-U-Turn Sampler (a variant of Hamiltonian Monte Carlo) on a simple linear model: twiecki.github.io/blog/2014/01/02/visualizing-mcmc/
Sophisticated Proposals

Assume you want to sample from a target π with supp(π) ⊂ R⁺, e.g. the posterior distribution of a variance/scale parameter. Any proposed move to R⁻, e.g. using a normal random walk, is a waste of time. Given X^(t−1), propose instead

   X⋆ = exp( log X^(t−1) + ε ), with ε ∼ N(0, σ²).

What is the acceptance probability then?

   α(X⋆ | X^(t−1)) = min( 1, [q(X^(t−1) | X⋆) π(X⋆)] / [q(X⋆ | X^(t−1)) π(X^(t−1))] )
                   = min( 1, [π(X⋆) X⋆] / [π(X^(t−1)) X^(t−1)] ).

Why? The proposal density is log-normal,

   q(y | x) = 1/(yσ√(2π)) exp( −(log y − log x)² / (2σ²) ),

so q(y | x) / q(x | y) = x/y.
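This log-scale random walk can be sketched as follows (illustrative code, our own names; the X⋆/X factor is the Jacobian correction derived above):

```python
import math
import numpy as np

def log_scale_rwmh(log_pi, x0, sigma, n_iter, rng):
    """Random walk on log X for a target supported on (0, inf):
    X* = exp(log X + eps), eps ~ N(0, sigma^2), with acceptance
    alpha = min(1, pi(x*) x* / (pi(x) x))."""
    x = x0
    chain = []
    for _ in range(n_iter):
        x_star = math.exp(math.log(x) + sigma * rng.normal())
        log_alpha = (log_pi(x_star) + math.log(x_star)
                     - log_pi(x) - math.log(x))   # Jacobian term log(x*/x)
        if math.log(rng.uniform()) <= min(0.0, log_alpha):
            x = x_star
        chain.append(x)
    return np.array(chain)
```

Every proposal stays in R⁺ by construction, so no move is wasted on the negative half-line.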