Particle Metropolis-Hastings



  1. Particle Metropolis-Hastings. Johan Dahlin, Department of Information Technology, Uppsala University, Sweden. September 28, 2016. work@johandahlin.com

  2. This is collaborative work together with Dr. Fredrik Lindsten (Uppsala University, Sweden), Prof. Thomas Schön (Uppsala University, Sweden) and Andreas Svensson (Uppsala University, Sweden).

  3. What are we going to do?
     - Give a (hopefully) gentle introduction to (P)MCMC.
     - Develop some intuition for PMH and its pros/cons.
     Why are we doing this?
     - PMH is a general algorithm for Bayesian inference.
     - It is relatively simple to implement and tune.
     How will we do this?
     - Employ intuition and analogies with optimisation.
     - Investigate PMH using simulations rather than mathematics.
     - Illustrate PMH on a water tank benchmark.
     - By asking questions.

  4. Mapping a lake

  5. State space models
     A Markov chain $[X_{0:T}, Y_{1:T}]$ with $X_t \in \mathcal{X} = \mathbb{R}$, $Y_t \in \mathcal{Y} = \mathbb{R}$, $t \in \mathbb{N}$.
     (Figure: graphical model with hidden states $\cdots \to x_{t-1} \to x_t \to x_{t+1} \to \cdots$ and observations $y_{t-1}, y_t, y_{t+1}$.)
     $$x_0 \sim \mu_\theta(x_0), \qquad x_{t+1} \mid x_t \sim f_\theta(x_{t+1} \mid x_t), \qquad y_t \mid x_t \sim g_\theta(y_t \mid x_t).$$
     Example: linear Gaussian SSM with $\theta = [\mu, \phi, \sigma_v, \sigma_e]$:
     $$x_{t+1} \mid x_t \sim \mathcal{N}\big(x_{t+1};\, \mu + \phi(x_t - \mu),\, \sigma_v^2\big), \qquad y_t \mid x_t \sim \mathcal{N}\big(y_t;\, x_t,\, \sigma_e^2\big).$$
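     To make the linear Gaussian SSM above concrete, here is a minimal Python simulation sketch (not part of the original slides). The parameter values used in the call at the bottom are illustrative assumptions, not values taken from the slides.

     ```python
     import numpy as np

     def simulate_lgss(mu, phi, sigma_v, sigma_e, T, seed=0):
         """Simulate T steps of the linear Gaussian state space model on slide 5."""
         rng = np.random.default_rng(seed)
         x = np.zeros(T + 1)
         y = np.zeros(T + 1)
         x[0] = mu  # simple choice for x_0 ~ mu_theta(x_0): start at the stationary mean
         for t in range(1, T + 1):
             # x_{t+1} | x_t ~ N(mu + phi * (x_t - mu), sigma_v^2)
             x[t] = mu + phi * (x[t - 1] - mu) + sigma_v * rng.standard_normal()
             # y_t | x_t ~ N(x_t, sigma_e^2)
             y[t] = x[t] + sigma_e * rng.standard_normal()
         return x, y[1:]

     # Illustrative parameter values (assumptions, not from the slides).
     x, y = simulate_lgss(mu=0.0, phi=0.75, sigma_v=1.0, sigma_e=0.1, T=250)
     ```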

  6. Bayesian parameter inference
     (Figure: three example posterior densities over $\theta$.)
     $$\pi(\theta) = p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta), \qquad \pi[\varphi] = \mathbb{E}_\pi\big[\varphi(\theta)\big] = \int \varphi(\theta')\, \pi(\theta')\, d\theta'.$$

  7. Exploring posteriors by Markov chains

  8. Markov chains: basic properties
     A sequence of random variables $\{X_k\}_{k=0}^K$ with the property
     $$\mathbb{P}[X_k \in A \mid x_{0:k-1}] = \mathbb{P}[X_k \in A \mid x_{k-1}] = \int_A R(x_{k-1}, x_k)\, dx_k.$$
     We will consider ergodic chains with the properties: the chain can reach any point (irreducible), has no cycles (aperiodic) and does not get stuck (recurrent).

  9. Markov chains: stationary distribution
     $$\theta_k \mid \theta_{k-1} \sim \mathcal{N}\big(\theta_k;\, \mu + \phi(\theta_{k-1} - \mu),\, \sigma^2\big).$$
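     A small worked check (not on the original slide): assuming $|\phi| < 1$, the stationary mean $m$ and variance $s^2$ of this Gaussian chain follow from matching moments,
     $$m = \mu + \phi(m - \mu) \;\Rightarrow\; m = \mu, \qquad s^2 = \phi^2 s^2 + \sigma^2 \;\Rightarrow\; s^2 = \frac{\sigma^2}{1 - \phi^2},$$
     so the chain forgets its initialisation and its samples eventually behave like draws from $\mathcal{N}\big(\mu, \sigma^2/(1-\phi^2)\big)$, which is the property MCMC exploits.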

  10. Metropolis-Hastings: algorithm
      Initialise in $\theta_0$ and then generate samples $\{\theta_k\}_{k=1}^K$ from $p(\theta \mid y)$ by:
      (i) Sample a candidate parameter $\theta'$ by $\theta' \sim \mathcal{N}(\theta'; \theta_{k-1}, \Sigma)$.
      (ii) Accept $\theta'$ by setting $\theta_k \leftarrow \theta'$ with probability
      $$\min\left\{1,\, \frac{p(\theta' \mid y)}{p(\theta_{k-1} \mid y)}\right\} = \min\left\{1,\, \frac{p(y \mid \theta')\, p(\theta')}{p(y \mid \theta_{k-1})\, p(\theta_{k-1})}\right\},$$
      and otherwise reject $\theta'$ by setting $\theta_k \leftarrow \theta_{k-1}$.
      User choices: $K$ and $\Sigma$.
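      A minimal Python sketch of this random walk Metropolis-Hastings loop (not from the original slides); the one-dimensional Gaussian toy target and the step size at the bottom are illustrative assumptions.

      ```python
      import numpy as np

      def metropolis_hastings(log_target, theta0, step_size, K, seed=0):
          """Random walk Metropolis-Hastings as on slide 10 (scalar parameter)."""
          rng = np.random.default_rng(seed)
          theta = np.zeros(K + 1)
          theta[0] = theta0
          log_post = log_target(theta0)
          for k in range(1, K + 1):
              # (i) propose a candidate from a Gaussian random walk
              theta_prop = theta[k - 1] + step_size * rng.standard_normal()
              log_post_prop = log_target(theta_prop)
              # (ii) accept with probability min{1, p(theta'|y) / p(theta_{k-1}|y)}
              if np.log(rng.uniform()) < log_post_prop - log_post:
                  theta[k], log_post = theta_prop, log_post_prop
              else:
                  theta[k] = theta[k - 1]
          return theta

      # Toy target (an assumption for illustration): unnormalised N(1, 0.5^2) posterior.
      log_target = lambda th: -0.5 * ((th - 1.0) / 0.5) ** 2
      samples = metropolis_hastings(log_target, theta0=0.0, step_size=1.0, K=5000)
      ```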

  11. Metropolis-Hastings: toy example
      (Figure: samples in the $(q_1, q_2)$ plane, the estimated marginal posterior density of $q_2$, and the log-posterior at the current state around the first iteration.)

  12. Metropolis-Hastings: toy example
      (Figure: the same three panels up to iteration 2.)

  13. Metropolis-Hastings: toy example
      (Figure: the same three panels up to iteration 50.)

  14. Metropolis-Hastings: proposal and mixing
      (Figure: trace plots of $q_1$ and $q_2$ over 2,500 iterations together with their autocorrelation functions up to lag 200.)

  15. Metropolis-Hastings: proposal and mixing
      (Figure: samples in the $(q_1, q_2)$ plane together with the autocorrelation functions of $q_1$ and $q_2$ up to lag 200.)

  16. Approximating the target by particles

  17. Mapping a stormy lake

  18. Particle Metropolis-Hastings (PMH)
      Initialise in $\theta_0$ and then generate samples $\{\theta_k\}_{k=1}^K$ from $p(\theta \mid y)$ by:
      (i) Sample a candidate parameter $\theta'$ by $\theta' \sim \mathcal{N}(\theta'; \theta_{k-1}, \Sigma)$.
      (ii) Run a particle filter with $N$ particles to estimate $\widehat{p}^N(\theta' \mid y)$.
      (iii) Accept $\theta'$ by setting $\theta_k \leftarrow \theta'$ with probability
      $$\min\left\{1,\, \frac{\widehat{p}^N(\theta' \mid y)}{\widehat{p}^N(\theta_{k-1} \mid y)}\right\},$$
      and otherwise reject $\theta'$ by setting $\theta_k \leftarrow \theta_{k-1}$.
      User choices: $K$, $\Sigma$ and $N$.
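      Below is a hedged Python sketch of the PMH loop (not from the original slides). The only change from plain Metropolis-Hastings is that the exact posterior is replaced by a noisy estimate. The argument `estimate_log_post` is assumed to return $\log[\widehat{p}^N(y \mid \theta)\, p(\theta)]$, for example from the bootstrap particle filter sketched after slide 30; the Gaussian stand-in used in the demo at the bottom is an assumption for illustration only.

      ```python
      import numpy as np

      def particle_metropolis_hastings(estimate_log_post, theta0, step_size, K, seed=0):
          """PMH as on slide 18: MH where the log-posterior is replaced by a noisy estimate.

          estimate_log_post(theta) should return an estimate of log[p_hat^N(y | theta) p(theta)],
          e.g. from a bootstrap particle filter (see the sketch after slide 30).
          """
          rng = np.random.default_rng(seed)
          theta = np.zeros(K + 1)
          theta[0] = theta0
          log_post = estimate_log_post(theta0)  # the estimate for the current state is kept
          for k in range(1, K + 1):
              # (i) propose a candidate from a Gaussian random walk
              theta_prop = theta[k - 1] + step_size * rng.standard_normal()
              # (ii) re-estimate the posterior at the candidate
              log_post_prop = estimate_log_post(theta_prop)
              # (iii) accept with probability min{1, p_hat(theta'|y) / p_hat(theta_{k-1}|y)}
              if np.log(rng.uniform()) < log_post_prop - log_post:
                  theta[k], log_post = theta_prop, log_post_prop
              else:
                  theta[k] = theta[k - 1]
          return theta

      # Stand-in estimator (assumption, for illustration only): an exact Gaussian log-posterior
      # plus artificial noise, mimicking the variability of a particle filter estimate.
      noisy_log_post = lambda th: -0.5 * ((th - 1.0) / 0.5) ** 2 + 0.1 * np.random.randn()
      samples = particle_metropolis_hastings(noisy_log_post, theta0=0.0, step_size=1.0, K=5000)
      ```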

  19. Water tank example: model
      Consider the model
      $$\dot{x}_1(t) = -k_1 \sqrt{x_1(t)} + k_4 u(t) + w_1(t),$$
      $$\dot{x}_2(t) = k_2 \sqrt{x_1(t)} - k_3 \sqrt{x_2(t)} + w_2(t),$$
      $$y(t) = x_2(t) + e(t),$$
      where $w_1(t)$, $w_2(t)$ and $e(t)$ are independent Gaussian noise. The parameters are $\theta = \{k_1, k_2, k_3, k_4, \ldots\}$ with $p(\theta) \propto 1$.
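      As a rough illustration (not in the original slides), the tank dynamics above can be simulated with a simple Euler discretisation. Everything numerical below is an assumption chosen only to produce plausible trajectories: the step length, the square-wave input, the parameter values and the noise standard deviations; the continuous-time noise is also simplified to noise added per discrete step.

      ```python
      import numpy as np

      def simulate_tanks(k, T, dt=1.0, sigma_w=0.05, sigma_e=0.1, seed=0):
          """Euler discretisation of the water tank model on slide 19.

          k = (k1, k2, k3, k4); all numerical values here are illustrative assumptions.
          """
          k1, k2, k3, k4 = k
          rng = np.random.default_rng(seed)
          u = 4.0 + 2.0 * np.sign(np.sin(2 * np.pi * np.arange(T) / 200))  # square-wave input (assumption)
          x1, x2, y = np.zeros(T), np.zeros(T), np.zeros(T)
          for t in range(1, T):
              # clip at zero so the square roots stay real (tank levels cannot be negative)
              s1, s2 = np.sqrt(max(x1[t - 1], 0.0)), np.sqrt(max(x2[t - 1], 0.0))
              x1[t] = x1[t - 1] + dt * (-k1 * s1 + k4 * u[t - 1]) + sigma_w * rng.standard_normal()
              x2[t] = x2[t - 1] + dt * (k2 * s1 - k3 * s2) + sigma_w * rng.standard_normal()
              y[t] = x2[t] + sigma_e * rng.standard_normal()
          return u, x1, x2, y

      # Illustrative parameter values (assumptions).
      u, x1, x2, y = simulate_tanks(k=(0.1, 0.05, 0.05, 0.1), T=1000)
      ```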

  20. Water tank example: data
      (Figure: the input $u(t)$ and the observations $y(t)$ over 1,000 time steps.)

  21. Water tank example: parameter posteriors
      (Figure: marginal posterior densities of $k_1$, $k_2$, $k_3$ and $k_4$.)

  22. Water tank example: state predictions
      (Figure: the input $u(t)$ and the observations $y(t)$ together with the resulting state predictions over 1,000 time steps.)

  23. Improving the PMH algorithm

  24. Water tank example: trace plots
      (Figure: Markov chain traces of $k_1$, $k_2$, $k_3$ and $k_4$ over 8,000 iterations.)

  25. What are some open questions?
      - Decreasing the computational time when $T$ is large: correlating and improving the particle filter.
      - Obtaining better mixing when $p = |\theta|$ is large (> 5): adding gradient and Hessian information to the proposal.
      - Obtaining better mixing when $n_x = |x|$ is large (> 10): improving the particle filter.
      - Decreasing the amount of tuning by the user: adaptive algorithms and rules of thumb.

  26. What did we do?
      - Gave a (hopefully) gentle introduction to (P)MCMC.
      - Developed some intuition for PMH and its pros/cons.
      Why did we do this?
      - PMH is a general algorithm for Bayesian inference.
      - It is relatively simple to implement and tune.
      What are you going to do now?
      - Remember that the PMH algorithm exists.
      - Learn more by reading our tutorial.
      - Try to implement the algorithm yourself.

  27. A complete tutorial on PMH is available at arXiv:1511.01707.

  28. Thank you for listening! Comments, suggestions and/or questions? Johan Dahlin (work@johandahlin.com, work.johandahlin.com). Remember: the tutorial is available at arXiv:1511.01707.

  29. Particle filtering [I/II]
      The particle filter is an instance of sequential Monte Carlo (SMC) samplers. It estimates $\mathbb{E}[\varphi(x_t) \mid y_{1:t}]$ and $p_\theta(y_{1:T})$, has a computational cost of order $\mathcal{O}(NT)$ (with $N \sim T$), and has well-understood statistical properties (unbiasedness, large deviation inequalities, CLTs).
      References:
      A. Doucet and A. Johansen, A tutorial on particle filtering and smoothing. In D. Crisan and B. Rozovsky (editors), The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011.
      O. Cappé, S. J. Godsill and E. Moulines, An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE 95(5), 2007.

  30. Particle filtering [II/II]
      By iterating resampling, propagation and weighting:
      Resampling: $\mathbb{P}\big(a_t^{(i)} = j\big) = w_{t-1}^{(j)}$, for $i, j = 1, \ldots, N$.
      Propagation: $x_t^{(i)} \sim f_\theta\big(x_t \mid x_{t-1}^{a_t^{(i)}}\big)$, for $i = 1, \ldots, N$.
      Weighting: $w_t^{(i)} = g_\theta\big(y_t \mid x_t^{(i)}\big)$, for $i = 1, \ldots, N$.
      We obtain the particle system $\big\{x_{0:T}^{(i)}, w_{0:T}^{(i)}\big\}_{i=1}^N$.
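      A minimal Python sketch of one bootstrap particle filter pass for the linear Gaussian SSM from slide 5 (not part of the original slides). It follows the resampling/propagation/weighting steps above and returns the log-likelihood estimate from slide 32; the parameter ordering $\theta = (\mu, \phi, \sigma_v, \sigma_e)$ is an assumption carried over from the earlier sketches.

      ```python
      import numpy as np

      def bootstrap_pf_loglik(theta, y, N=500, seed=None):
          """Bootstrap particle filter for the LGSS on slide 5; returns log p_hat^N_theta(y_{1:T})."""
          mu, phi, sigma_v, sigma_e = theta
          rng = np.random.default_rng(seed)
          T = len(y)
          # initialise particles from the stationary distribution of the latent AR(1) process
          x = mu + sigma_v / np.sqrt(1.0 - phi**2) * rng.standard_normal(N)
          weights = np.full(N, 1.0 / N)
          log_lik = 0.0
          for t in range(T):
              # resampling: draw ancestors with probabilities given by the normalised weights
              ancestors = rng.choice(N, size=N, p=weights)
              # propagation: move the resampled particles through the state dynamics
              x = mu + phi * (x[ancestors] - mu) + sigma_v * rng.standard_normal(N)
              # weighting: unnormalised weights are the observation densities g_theta(y_t | x_t)
              log_w = -0.5 * np.log(2 * np.pi * sigma_e**2) - 0.5 * ((y[t] - x) / sigma_e) ** 2
              max_log_w = np.max(log_w)
              w = np.exp(log_w - max_log_w)                 # shift for numerical stability
              log_lik += max_log_w + np.log(np.mean(w))     # add the log of the average weight
              weights = w / np.sum(w)
          return log_lik
      ```

      With data from the LGSS simulator sketched after slide 5, `bootstrap_pf_loglik((0.0, 0.75, 1.0, 0.1), y)` returns a noisy estimate of the log-likelihood (the underlying likelihood estimate is unbiased), which, together with a flat prior, could play the role of `estimate_log_post` in the PMH sketch after slide 18.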

  31. Particle filtering: state estimation
      (Figure: two particle approximations of the filtering density over $x$.)
      $$\widehat{\varphi}_t^N \triangleq \widehat{\mathbb{E}}^N\big[\varphi(x_t) \mid y_{1:t}\big] = \sum_{i=1}^N w_t^{(i)}\, \varphi\big(x_t^{(i)}\big), \qquad \sqrt{N}\,\big(\widehat{\varphi}_t^N - \varphi_t\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_t^2(\varphi)\big).$$

  32. Particle filtering: likelihood estimation
      (Figure: density estimate of the error in the log-likelihood estimate and a Q-Q plot against standard Gaussian quantiles.)
      $$\log \widehat{p}_\theta^N(y_{1:T}) = \underbrace{\sum_{t=1}^T \log\Big[\sum_{i=1}^N w_t^{(i)}\Big] - T \log N}_{\widehat{\ell}(\theta)}, \qquad \sqrt{N}\,\Big(\widehat{\ell}(\theta) - \ell(\theta) + \frac{\sigma_\pi^2}{2N}\Big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_\pi^2\big).$$
