Second-order particle MCMC for Bayesian parameter inference IFAC - PowerPoint PPT Presentation

Second-order particle MCMC for Bayesian parameter inference IFAC World Congress 2014, Cape Town, South Africa, August 28, 2014. Johan Dahlin johan.dahlin@liu.se Division of Automatic Control, Linköping University, Sweden. AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

This is collaborative work with Dr. Fredrik Lindsten (University of Cambridge, United Kingdom) Prof. Thomas B. Schön (Uppsala University, Sweden) AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Summary Aim Efficient Bayesian parameter inference in nonlinear SSMs. Methods Markov chain Monte Carlo. Sequential Monte Carlo methods. Contributions Efficient estimation of first/second order information. Inclusion of first/second order in the proposal. PMH1 and PMH2. AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: Earthquakes between 1900 and 2013 40 Number of major earthquakes 35 30 25 20 15 10 5 1900 1920 1940 1960 1980 2000 2020 Year 1.0 0.06 0.8 0.6 0.04 Density ACF 0.4 0.02 0.2 0.0 0.00 −0.2 0 10 20 30 40 50 0 5 10 15 20 Number of earthquakes Lag AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: A simple model of annual earthquake counts � x t +1 ; φ x t , σ 2 � x t +1 | x t ∼ N , � � y t | x t ∼ P y t ; β exp( x t ) . Task: Estimate π ( θ ) = p ( θ | y 1: T ) ∝ p θ ( y 1: T ) p ( θ ) with θ = { φ , σ } . AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Metropolis-Hastings algorithm Compute Accept or Propose acc. prob reject? - Propose: θ ′ ∼ q ( θ ′ | θ k − 1 ) . - Compute acceptance probability: � � π ( θ ′ ) q ( θ k − 1 | θ ′ ) α ( θ ′ , θ k − 1 ) = min 1 , π ( θ k − 1 ) q ( θ ′ | θ k − 1 ) - Accept or reject? θ ′ → θ k w.p. α ( θ ′ , θ k − 1 ) . AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Particle Metropolis-Hastings algorithm (cont.) Compute Accept or Propose acc. prob reject? - Propose: θ ′ ∼ q ( θ ′ | θ k − 1 , u ′ ) and u ′ ∼ PF ( θ ′ ) . - Compute � p θ ′ ( y 1: T | u ′ ) and the acceptance probability: p ( θ ′ ) p θ ′ ( y 1: T | u ′ ) q ( θ k − 1 | θ ′ , u ′ ) � α ( θ ′ , θ k − 1 ) = 1 ∧ q ( θ ′ | θ k − 1 , u k − 1 ) . p ( θ k − 1 ) p θ k − 1 ( y 1: T | u k − 1 ) � - Accept or reject? θ ′ → θ k and u ′ → u k w.p. α ( θ ′ , θ k − 1 ) . AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Particle filtering Resampling Propagation Weighting Given the particle system (the random variables u ) � � N x ( i ) 1: T , w ( i ) u = i =1 , 1: T we can obtain (consistent) estimates of: - the likelihood p θ ( y 1: T ) . - the first and second order information of π ( θ ) . AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Zeroth order proposal (PMH0) Gaussian random walk θ ′′ = θ ′ + ǫz, z ∼ N ( z ; 0 , 1) . gives the zeroth order (marginal) proposal � � θ ′′ ; θ ′ , ǫ 2 I d q ( θ ′′ | θ ′ , u ′ ) = N . AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: Parameter inference in earthquake model −300 Iteration: 2 0.5 ● −320 Log−likelihood ● −340 0.4 −360 −380 ● ● ● −400 0.3 0.0 0.5 1.0 1.5 2.0 σ Iteration 0.2 8 Marginal posterior density 6 0.1 4 2 0.0 0 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0 φ φ AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: Parameter inference in earthquake model AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: Parameter inference in earthquake model 10 20 8 15 6 Density Density 10 4 5 2 0 0 0.70 0.75 0.80 0.85 0.90 0.95 1.00 0.00 0.05 0.10 0.15 0.20 0.25 0.30 φ σ φ σ Posterior mean 0.86 0.15 Posterior median 0.86 0.15 Posterior mode 0.90 0.14 AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: State inference in the earthquake model Number of major earthquakes 50 40 30 20 10 0 1900 1920 1940 1960 1980 2000 2020 Year 1.0 Estimated latent state 0.5 0.0 −0.5 −1.0 1900 1920 1940 1960 1980 2000 2020 Year AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Fixed-lag particle smoothing: overview The first and second order information can be estimated using � � N x ( i ) 1: T , w ( i ) u = i =1 , 1: T and the fixed-lag particle smoother approximation, p θ ( ❞ x t : t − 1 | y 1: T ) ≈ � p θ ( ❞ x t : t − 1 | y 1: κ t ) , κ t = min { T, t + ∆ } , � with no additional computational complexity. AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Fixed-lag particle smoothing: motivation 60 40 20 state 0 -20 -40 -60 0 2 4 6 8 time AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

First order proposal (PMH1) Noisy gradient-based ascent update θ ′′ = θ ′ + ǫ 2 2 S ( θ ′ ) + ǫz, z ∼ N ( z ; 0 , 1) , with the first order information � � S ( θ ′ )= ∇ θ log π ( θ ) θ = θ ′ , gives the first order proposal � � θ ′′ ; θ ′ + ǫ 2 � S ( θ ′ | u ′ ) , ǫ 2 I d q ( θ ′′ | θ ′ , u ′ ) = N . 2 AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Second order proposal (PMH2) Noisy Newton update θ ′′ = θ ′ + ǫ 2 � � − 1 S ( θ ′ ) + ǫ � � − 1 / 2 z, J ( θ ′ ) J ( θ ′ ) z ∼ N ( z ; 0 , 1) , 2 with the second order information � J ( θ ′ )= −∇ 2 � θ log π ( θ ) θ = θ ′ , gives the second order proposal � � θ ′′ ; θ ′ + ǫ 2 � � � − 1 , ǫ 2 � � � − 1 � q ( θ ′′ | θ ′ , u ′ ) = N S ( θ ′ | u ′ ) J ( θ ′ | u ′ ) J ( θ ′ | u ′ ) . 2 AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: Parameter inference in earthquake model PMH0 PMH1 PMH2 0.5 0.5 0.5 ● ● ● 0.4 0.4 0.4 ● 0.3 0.3 0.3 σ σ σ 0.2 0.2 0.2 0.1 0.1 0.1 0.0 0.0 0.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.4 0.5 0.6 0.7 0.8 0.9 1.0 φ φ φ AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Example: Parameter inference in earthquake model AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Integrated autocorrelation time 1.0 1.0 1.0 PMH0: φ PMH1: φ PMH2: φ 0.8 0.8 0.8 0.6 0.6 0.6 ACF ACF ACF 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 Lag Lag Lag 1.0 1.0 1.0 PMH0: σ PMH1: σ PMH2: σ 0.8 0.8 0.8 0.6 0.6 0.6 ACF ACF ACF 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 Lag Lag Lag AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Integrated autocorrelation time (cont.) IACT: the number of iterations between two uncorrelated samples. Acceptance rate max IACT PMH0 0.47 31.82 PMH1 0.38 21.38 PMH2 0.54 14.15 AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Conclusions Results Shorter burn-in phase. Increased efficiency (about 2 times). Simplified tuning. Retained linear computational complexity in N . Methods Include u into the proposal. Particle fixed-lag smoothing (almost for free). Laplace approximation / Random walk on a Riemann manifold. Future work Non-reversible Markov chains. Adaptive methods. Approximate Bayesian computations. AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Thank you for your attention! Questions, comments and suggestions are most welcome. Further developments Extended version available at arXiv. The paper and code to replicate the results within it are found at: http://work.johandahlin.com/ . AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Particle Metropolis-Hastings algorithm The target distribution is given by the parameter proposal π ( θ ) = p θ ( y 1: T ) p ( θ ) . p ( y 1: T ) An unbiased estimator of the likelihood is given by � � � p θ ( y 1: T | u ) � = p θ ( y 1: T | u ) m θ ( u ) ❞ u = p θ ( y 1: T ) . � E m An extended target is given by p θ ( y 1: T | u ) m θ ( u ) p ( θ ) p θ ( y 1: T | u ) m θ ( u ) π ( θ ) π ( θ, u ) = � = � . p ( y 1: T ) p θ ( y 1: T ) AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Particle Metropolis-Hastings algorithm (cont.) � � � p θ ( y 1: T | u ) m θ ( u ) π ( θ ) π ( θ, u ) ❞ u = ❞ u p θ ( y 1: T ) � π ( θ ) p θ ( y 1: T | u ) m θ ( u ) ❞ u = � , p θ ( y 1: T ) � �� = p θ ( y 1: T ) = π ( θ ) . That is, the marginal is the desired target distribution and the Markov chain is kept invariant. AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

(bootstrap) Particle filtering Resampling Propagation Weighting t − 1 = x a ( i ) - Resampling: P ( a ( i ) w ( j ) x ( i ) = j ) = � t − 1 and set � t − 1 . t t � � � - Propagation: x ( i ) x ( i ) x ( i ) �� ∼ R θ = f θ ( x t | � t − 1 ) . x t t t − 1 � � - Weighting: w ( i ) x ( i ) x ( i ) = W θ t , � = g θ ( y t | x t ) . t t − 1 AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

Second-order particle MCMC for Bayesian parameter inference IFAC - PowerPoint PPT Presentation

Second-order particle MCMC for Bayesian parameter inference IFAC World Congress 2014, Cape Town, South Africa, August 28, 2014. Johan Dahlin johan.dahlin@liu.se Division of Automatic Control, Linkping University, Sweden. AUTOMATIC CONTROL

Part 7 Bayesian hierarchical modelling, simulation and MCMC by Gero Walter 252 Bayesian

Testing MCMC Samplers Jason M.T. Roos First European Bayesian Summit in Marketing Testing MCMC

I 4 - Bayesian parameter estimation in a normal model STAT 587 (Engineering) Iowa State

An MCMC library for probabilistic programming Rob Zinkov June 13th, 2014 Rob Zinkov An MCMC

Additional notes on MCMC sampling Shravan Vasishth March 18, 2020 For more details on MCMC, some

Particle dynamics Second-order motion Particle system Forces Constraints

Lecture 6. Bayesian estimation Lecture 6. Bayesian estimation 1 (172) 6. Bayesian estimation

6. Parameter Passing Parameter Passing CS 381 Spring 2016 Example (Formal) Parameter void

10/16/19 Parameter Control Genetic Algorithms Motivation Parameter setting Tuning

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Outline Intro to RL and Bayesian Learning History of Bayesian RL Model-based Bayesian

Particle dynamics Particle overview Particle system Forces Constraints

Particle dynamics Particle overview Particle system Forces Constraints

Bayesian Zig Zag Developing probabilistic models using grid methods and MCMC Allen Downey ACM

Bayesian method probabilities Application of Bayesian methods Demo: McRobot (P . Lewis)

MCMC for Cut Models or Chasing a Moving Target with MCMC Martyn Plummer International Agency

On the use of monetary and macroprudential policies for small open economies F. Gulcin Ozkan

High-Order Finite-Volume Methods for Hyperbolic Problems Phillip Colella Computational

Decision trees and Ensemble methods Camilo Fosco CS109A Introduction to Data Science Pavlos

An Efficient Computational Solution Scheme of the Random Eigenvalue Problems Rajib Chowdhury

DNS of turbulent channel flow with different types of spanwise forcing Sergio Pirozzoli ,

Focus Topic: Propagators for the TDSE Part I: The Time-Evolution Operator and Crank-Nicolson

Electronic Materials and Extreme Conditions J. Paul Attfield School of Chemistry and Centre for

When and how should I combine patient- level data and literature data in a meta- analysis?