Scaling Analysis of MCMC algorithms

The Scaling Analysis Method High Dimensional MCMC Concentration near a manifold
Scaling Analysis of MCMC algorithms
Alexandre Thiéry, University of Warwick
MCQMC, February 2012
Collaboration with Andrew Stuart (Warwick), Gareth Roberts (Warwick), Natesh Pillai (Harvard) and Alex Beskos (UCL).
Funded by CRISM.

Outline
1. The Scaling Analysis Method
2. High Dimensional MCMC
3. Concentration near a manifold

Purposes
Analysis of asymptotic complexity [Roberts and co-workers, 1997].
Avoid spectral gaps, log-Sobolev inequalities, etc.
Provide more intuition on the behaviour of algorithms.
Easy-to-follow guidelines for tuning MCMC algorithms.

Sequence of MCMC algorithms
Sequence of target distributions $\pi^{\alpha}$ indexed by a parameter $\alpha$.
Sequence of MCMC proposals: (almost always) local proposals of the form $x^{\star} = a(\alpha)\,x + \sigma(\alpha)\,Z$.
Sequence of MCMC chains indexed by the parameter $\alpha$: $x^{\alpha} = (x_{1,\alpha}, x_{2,\alpha}, x_{3,\alpha}, \ldots)$.
We are interested in the limit $\alpha \to \alpha_{\infty}$.

Example of Limiting Regime
$\alpha$ = dimension of the state space. Interest in $\alpha \to \alpha_{\infty} = \infty$.
Consider target distributions with density of the form $\pi^{\alpha}(x) \propto \exp\left(-\Psi(x)/\alpha\right)$. Interest in $\alpha \to \alpha_{\infty} = 0$.

Interpolation
Choose a time discretisation parameter $\delta = \delta(\alpha)$ such that $\delta \to 0$ as $\alpha \to \alpha_{\infty}$.
Define the accelerated process $z^{\alpha}$ by $z^{\alpha}(t) = x_{t/\delta(\alpha),\,\alpha}$.
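A minimal sketch of the accelerated process, under assumptions not stated on the slide: the chain is scalar, it is indexed from 0 rather than 1, and values between the grid times $k\delta$ are filled in by linear interpolation so that the path is continuous. The function name `accelerated_process` is hypothetical.

```python
import numpy as np

def accelerated_process(chain, delta):
    """Return t -> z(t) for a scalar chain x_0, x_1, ... sped up by 1/delta.

    Assumption: between grid times k*delta the path is linearly
    interpolated, so that z(k*delta) equals chain[k].
    """
    chain = np.asarray(chain, dtype=float)
    grid = delta * np.arange(len(chain))  # times 0, delta, 2*delta, ...

    def z(t):
        return np.interp(t, grid, chain)

    return z

chain = [0.0, 1.0, 0.5, 2.0]
z = accelerated_process(chain, delta=0.1)
```

With this convention one unit of time for `z` corresponds to `1/delta` steps of the chain, which is the sense in which the process is accelerated.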


Scaling Limit
A scaling limit result is a theorem of the form
Theorem (Scaling Limit): $\lim_{\alpha \to \alpha_{\infty}} z^{\alpha} = z$.
The convergence is on the path space $C([0,T], H)$.
The limiting process is typically a non-trivial diffusion, jump or Lévy process.

Interpretation
The limiting process $z(t)$ takes time $T_{\mathrm{mix}}$ to mix.
Using the approximation $x_{k,\alpha} = z^{\alpha}(k\delta) \approx z(k\delta)$, it follows that the chain $x_{\cdot,\alpha}$ takes roughly $k \approx T_{\mathrm{mix}}/\delta(\alpha)$ steps to mix.
Consequently, as $\alpha \to \alpha_{\infty}$ the complexity of the MCMC algorithm grows as $\delta(\alpha)^{-1}$.

[Figure: Mixing]

Some motivations: Bayesian Inverse Problems
Consider an infinite dimensional Hilbert space $H$.
Reconstruction of unknown data $x \in H$ from a noisy observation $y = F(x) + \text{(Noise)}$.
Suppose that the noise is Gaussian and put a Gaussian prior $\pi_0 = N(0, C)$ on the data $x$ to be estimated.
The posterior probability distribution $\pi$ (living on $H$) is given by $\frac{d\pi}{d\pi_0}(x) \propto e^{-\Phi(x)}$, where $\Phi(x) = \frac{1}{2}\|F(x) - y\|_{\Gamma}^{2}$.

[Figure: Temperature Field Reconstruction]

Some motivations: Conditioned Diffusions
Consider a diffusion with constant volatility coefficient (see Lamperti) on the interval $I = [0, T]$: $dX = -\nabla U(X)\,dt + \sigma\,dW$ with $X_0 = x_{-}$, $X_T = x_{+}$.
Call $\pi$ the law of $(X_t)_{t \in I} \in H = L^2(I)$.
The law of the diffusion $X$ is absolutely continuous (Girsanov) with respect to the Wiener bridge measure $\pi_0$: $dY = \sigma\,dW$ with $Y_0 = x_{-}$, $Y_T = x_{+}$.
One can explicitly write down (without stochastic integral) the change of probability $\frac{d\pi}{d\pi_0}(x) \propto e^{-\Phi(x)}$.

[Figure: Conditioned Diffusion]

Finite Dimensional Discretisation
Let $\varphi_1, \varphi_2, \ldots, \varphi_k, \ldots$ be eigenfunctions of the covariance operator $C$.
Let $P^N(\cdot)$ denote orthogonal projection, in $H$, onto $\mathrm{span}(\varphi_1, \ldots, \varphi_N)$, and $\pi_0^N \overset{D}{\sim} P^N(\pi_0)$.
The finite dimensional (but living on $H$) posterior $\pi^N$ is given by $\frac{d\pi^N}{d\pi_0^N}(x) \propto e^{-\Phi(P^N x)}$.
One can implement all the algorithms in $\mathbb{R}^N$ but analyse them in $H$.
Other (more natural) discretisations are possible.
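A sketch of how a draw from the projected prior $\pi_0^N$ could be generated in $\mathbb{R}^N$, working with coefficients in the eigenbasis of $C$. The eigenvalues $\lambda_j = j^{-2}$ and the helper name `sample_prior_coeffs` are assumptions made for illustration; the slides do not specify a spectrum.

```python
import numpy as np

def sample_prior_coeffs(N, eigenvalues, rng):
    """Draw the first N eigenbasis coefficients of xi ~ N(0, C).

    In the eigenbasis of C the coefficients are independent N(0, lambda_j),
    so projecting onto span(phi_1, ..., phi_N) keeps the first N of them.
    """
    lam = np.asarray(eigenvalues[:N], dtype=float)
    return np.sqrt(lam) * rng.standard_normal(N)

rng = np.random.default_rng(0)
N = 50
eigenvalues = 1.0 / np.arange(1, N + 1) ** 2  # hypothetical spectrum of C
draws = np.array([sample_prior_coeffs(N, eigenvalues, rng) for _ in range(20000)])
empirical_var = draws.var(axis=0)  # should be close to the eigenvalues
```

Working in coefficient space is one possible choice; the function-space object is recovered as $\sum_{j \le N} c_j \varphi_j$ from the coefficients $c_j$.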

Random Walk Metropolis (RWM) algorithm
RWM for target distribution $\pi^{\alpha} = \pi^N$: $x^{\star} = x + \sqrt{\delta(N)}\,P^N(\xi)$ with $\xi \overset{D}{\sim} \pi_0 = N(0, C)$.
Discretisation of Brownian motion with covariance $P^N(C)$ between $t$ and $t + \delta(N)$.
Diffusion Limit: take $\delta(N) = N^{-p}$ for any $p \geq 1$. The limit $\lim_{N \to \infty} z^N = z$ exists and is a non-trivial ergodic diffusion process.
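A minimal sketch of RWM on $\pi^N$ in eigenbasis coordinates, useful for seeing how the acceptance rate reacts to the choice of $\delta$. The spectrum $\lambda_j = j^{-2}$, the choice $\Phi \equiv 0$ in the example run, and the function name `rwm_acceptance_rate` are all assumptions, not taken from the slides.

```python
import numpy as np

def rwm_acceptance_rate(N, delta, phi, n_steps=5000, seed=0):
    """Run RWM on pi^N and return the fraction of accepted proposals."""
    rng = np.random.default_rng(seed)
    lam = 1.0 / np.arange(1, N + 1) ** 2  # hypothetical eigenvalues of C

    def log_target(x):
        # log pi^N(x) up to a constant: Gaussian prior term minus Phi
        return -0.5 * np.sum(x**2 / lam) - phi(x)

    x = np.sqrt(lam) * rng.standard_normal(N)  # start from the prior
    accepted = 0
    for _ in range(n_steps):
        xi = np.sqrt(lam) * rng.standard_normal(N)  # xi ~ N(0, P^N C)
        proposal = x + np.sqrt(delta) * xi
        # Symmetric proposal, so the Metropolis ratio is pi(x*) / pi(x).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
            accepted += 1
    return accepted / n_steps

phi = lambda x: 0.0  # hypothetical potential: Gaussian target
N = 200
rate_scaled = rwm_acceptance_rate(N, delta=1.0 / N, phi=phi)
rate_fixed = rwm_acceptance_rate(N, delta=0.5, phi=phi)
```

In this toy setting `rate_scaled` (step size $\delta = 1/N$) stays well inside $(0, 1)$, while `rate_fixed` (step size independent of $N$) is close to zero.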

Scaling Analysis of RWM
J. Mattingly, N. Pillai and A.M. Stuart, 2011
Theorem: Consider RWM with increment $\delta(N) \approx N^{-1}$. The limit $z^N \Rightarrow z$ holds weakly in $C([0,T], H^s)$. The limit process $z$ is an $H$-valued Langevin diffusion that is reversible with respect to $\pi$.
For $\delta(N) \propto N^{-1}$, the limiting acceptance probability satisfies $0 < p < 1$.
For $\delta(N) \propto N^{-(1+\varepsilon)}$, the limiting acceptance probability is $p = 1$.
For $\delta(N) \propto N^{-(1-\varepsilon)}$, the acceptance probability is exponentially small.
The complexity of RWM grows as $O(N)$ as $N \to \infty$.

Langevin Diffusion
Probability distribution with density $\pi(x) \propto e^{-L(x)}$.
$dx = -\nabla L(x)\,dt + \sqrt{2}\,dW$ is $\pi$-reversible.
$dx = -M\,\nabla L(x)\,dt + \sqrt{2M}\,dW$ is $\pi$-reversible.
Case $\frac{d\pi}{d\pi_0}(x) \propto e^{-\Phi(x)}$ with $\pi_0 = N(0, C)$. Informally $\pi(x) \propto e^{-\frac{1}{2}\langle x, C^{-1}x \rangle - \Phi(x)}$.
Because $\nabla\left(\frac{1}{2}\langle x, C^{-1}x \rangle + \Phi(x)\right) = C^{-1}x + \nabla\Phi(x)$, the diffusion $dx = -(x + C\,\nabla\Phi(x))\,dt + \sqrt{2C}\,dW = \mathrm{drift}(x)\,dt + \sqrt{2C}\,dW$ is $\pi$-reversible.
Notice the diffusion term.
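A one-dimensional sketch of the first Langevin diffusion above, simulated with the Euler-Maruyama scheme. The target $L(x) = x^2/2$ (so $\pi = N(0,1)$), the step size, and the function name `euler_langevin` are assumptions chosen only to make the check easy.

```python
import numpy as np

def euler_langevin(grad_L, x0, step, n_steps, rng):
    """Euler-Maruyama for dx = -grad L(x) dt + sqrt(2) dW, vectorised over particles."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_L(x) + np.sqrt(2.0 * step) * noise
    return x

rng = np.random.default_rng(0)
# Hypothetical target: L(x) = x^2 / 2, i.e. pi = N(0, 1), so grad L(x) = x.
particles = euler_langevin(lambda x: x, np.zeros(5000), step=0.01, n_steps=1000, rng=rng)
sample_var = particles.var()
```

For this linear example the Euler scheme with step $h$ has stationary variance $1/(1 - h/2)$ rather than exactly 1, a small discretisation bias of the kind that a Metropolis accept/reject step removes.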

MALA algorithm
MALA for target distribution $\pi^{\alpha} = \pi^N$: $x^{\star} = x + \mathrm{drift}(x)\,\delta(N) + \sqrt{2\,\delta(N)}\,P^N(\xi)$ with $\xi \overset{D}{\sim} N(0, C)$, where $\mathrm{drift}(x) = -(x + C\,\nabla\Phi(x))$.
Euler discretisation of the Langevin diffusion between $t$ and $t + \delta(N)$.
Diffusion Limit: take $\delta(N) = N^{-p}$ for any $p \geq 1/3$. The limit $\lim_{N \to \infty} z^N = z$ exists and is a non-trivial ergodic diffusion process.
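A minimal sketch of MALA on $\pi^N$ in eigenbasis coordinates, with the Metropolis-Hastings correction written out for the Gaussian proposal $N(x + \delta\,\mathrm{drift}(x),\, 2\delta\,C)$. The spectrum $\lambda_j = j^{-2}$, the choice $\Phi \equiv 0$ in the example run, and the name `mala_acceptance_rate` are assumptions for illustration.

```python
import numpy as np

def mala_acceptance_rate(N, delta, phi, grad_phi, n_steps=5000, seed=0):
    """Run MALA on pi^N and return the fraction of accepted proposals."""
    rng = np.random.default_rng(seed)
    lam = 1.0 / np.arange(1, N + 1) ** 2  # hypothetical eigenvalues of C

    def log_target(x):
        return -0.5 * np.sum(x**2 / lam) - phi(x)

    def proposal_mean(x):
        drift = -(x + lam * grad_phi(x))  # drift(x) = -(x + C grad Phi(x))
        return x + delta * drift

    def log_q(x_to, x_from):
        # Log density, up to a constant, of N(proposal_mean(x_from), 2*delta*C).
        diff = x_to - proposal_mean(x_from)
        return -np.sum(diff**2 / lam) / (4.0 * delta)

    x = np.sqrt(lam) * rng.standard_normal(N)  # start from the prior
    accepted = 0
    for _ in range(n_steps):
        xi = np.sqrt(lam) * rng.standard_normal(N)
        y = proposal_mean(x) + np.sqrt(2.0 * delta) * xi
        # Hastings ratio: pi(y) q(y -> x) / (pi(x) q(x -> y)).
        log_ratio = (log_target(y) + log_q(x, y)) - (log_target(x) + log_q(y, x))
        if np.log(rng.uniform()) < log_ratio:
            x, accepted = y, accepted + 1
    return accepted / n_steps

N = 200
rate = mala_acceptance_rate(N, delta=N ** (-1.0 / 3.0),
                            phi=lambda x: 0.0,
                            grad_phi=lambda x: np.zeros_like(x))
```

With the step size $\delta = N^{-1/3}$ the acceptance rate in this toy run stays away from both 0 and 1, whereas the same step size would be far too large for a random walk proposal.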

Scaling Analysis of MALA
N. Pillai, A.M. Stuart and A.T., 2011
Theorem: Consider MALA with increment $\delta(N) \approx N^{-1/3}$. The limit $z^N \Rightarrow z$ holds weakly in $C([0,T], H^s)$. The limit process $z$ is an $H$-valued Langevin diffusion that is reversible with respect to $\pi$.
For $\delta(N) \propto N^{-1/3}$, the limiting acceptance probability satisfies $0 < p < 1$.
For $\delta(N) \propto N^{-(1/3+\varepsilon)}$, the limiting acceptance probability is $p = 1$.
For $\delta(N) \propto N^{-(1/3-\varepsilon)}$, the acceptance probability is exponentially small.
The complexity of MALA grows as $O(N^{1/3})$ as $N \to \infty$.

What is going wrong?
Consider RWM and MALA for Gaussian targets $\pi = \pi_0 = N(0, C)$ and $\Phi \equiv 0$.
(RWM) $x^{\star} = x + \sqrt{\delta}\,\xi$ with $\xi \overset{D}{\sim} N(0, C)$.
(MALA) $x^{\star} = (1 - \delta)\,x + \sqrt{2\delta}\,\xi$ with $\xi \overset{D}{\sim} N(0, C)$.
Consequently, if $x \overset{D}{\sim} \pi = N(0, C)$ we have
(RWM) $x^{\star} \overset{D}{\sim} N(0, (1 + \delta)\,C)$.
(MALA) $x^{\star} \overset{D}{\sim} N(0, (1 + \delta^2)\,C)$.
In the infinite dimensional setting, the Gaussian measures $N(0, C)$ and $N(0, (1 + \varepsilon)\,C)$ are singular for any $\varepsilon > 0$.
RWM and MALA are NOT well-defined on $H$.
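The two proposal covariances can be checked in one line each, assuming (as is implicit in the proposals) that the noise $\xi$ is drawn independently of the current state $x$:

```latex
\begin{aligned}
\text{(RWM)}\quad \operatorname{Cov}(x^{\star})
  &= \operatorname{Cov}(x) + \delta\,\operatorname{Cov}(\xi)
   = C + \delta\,C = (1+\delta)\,C, \\
\text{(MALA)}\quad \operatorname{Cov}(x^{\star})
  &= (1-\delta)^2\,C + 2\delta\,C
   = (1 - 2\delta + \delta^2 + 2\delta)\,C = (1+\delta^2)\,C.
\end{aligned}
```

In both cases the proposal inflates the covariance by a constant factor different from 1, which is exactly the situation covered by the singularity statement.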

A Robust Algorithm
Target density $\frac{d\pi}{d\pi_0}(x) \propto e^{-\Phi(x)}$.
The 'right' proposal (called pCN) should be $x^{\star} = \sqrt{1-\delta}\,x + \sqrt{\delta}\,P^N(\xi)$ with $\xi \overset{D}{\sim} N(0, C)$.
It is well-defined on $H$ and preserves $\pi_0 = N(0, C)$.
Theorem (N. Pillai, A.M. Stuart and A.T. (2011)): Under growth conditions on the potential $\Phi$, the pCN algorithm is robust. For any fixed parameter $\delta > 0$ the average acceptance probability stays bounded away from $0$.
RWM complexity grows as $O(N)$.
MALA complexity grows as $O(N^{1/3})$.
pCN complexity is $O(1)$.
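A minimal sketch of pCN in eigenbasis coordinates. The accept/reject rule $\min(1, e^{\Phi(x) - \Phi(x^{\star})})$ is the standard one for pCN but is not spelled out on the slide; the spectrum $\lambda_j = j^{-2}$, the quadratic potential, and the name `pcn_acceptance_rate` are assumptions for illustration.

```python
import numpy as np

def pcn_acceptance_rate(N, delta, phi, n_steps=5000, seed=0):
    """Run pCN on pi^N and return the fraction of accepted proposals."""
    rng = np.random.default_rng(seed)
    lam = 1.0 / np.arange(1, N + 1) ** 2  # hypothetical eigenvalues of C
    x = np.sqrt(lam) * rng.standard_normal(N)  # start from the prior
    accepted = 0
    for _ in range(n_steps):
        xi = np.sqrt(lam) * rng.standard_normal(N)  # xi ~ N(0, P^N C)
        proposal = np.sqrt(1.0 - delta) * x + np.sqrt(delta) * xi
        # The proposal is reversible w.r.t. the prior, so only Phi enters the ratio.
        if np.log(rng.uniform()) < phi(x) - phi(proposal):
            x, accepted = proposal, accepted + 1
    return accepted / n_steps

phi = lambda x: 0.5 * np.sum(x**2)  # hypothetical potential
rates = {N: pcn_acceptance_rate(N, delta=0.5, phi=phi) for N in (50, 800)}
rate_gaussian = pcn_acceptance_rate(50, delta=0.5, phi=lambda x: 0.0)
```

In this toy run the acceptance rate with a fixed $\delta$ is essentially the same for $N = 50$ and $N = 800$, and with $\Phi \equiv 0$ every proposal is accepted because the proposal preserves $N(0, C)$: $(1-\delta)\,C + \delta\,C = C$.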

Optimal Proposal Design
Principle: Designing proposals which are well-defined on the infinite dimensional parameter space results in MCMC methods which do not suffer from the curse of dimensionality.
