Particle Metropolis-Hastings

Johan Dahlin
work@johandahlin.com
Department of Information Technology, Uppsala University, Sweden.

September 28, 2016
This is collaborative work with Dr. Fredrik Lindsten (Uppsala University, Sweden), Prof. Thomas Schön (Uppsala University, Sweden) and Andreas Svensson (Uppsala University, Sweden).
What are we going to do?
- Give a (hopefully) gentle introduction to (P)MCMC.
- Develop some intuition for PMH and its pros/cons.

Why are we doing this?
- PMH is a general algorithm for Bayesian inference.
- It is relatively simple to implement and tune.

How will we do this?
- Employ intuition and analogies with optimisation.
- Investigate PMH using simulations rather than maths.
- Illustrate PMH on a water tank benchmark.
- By asking questions.
Mapping a lake
State space models

A state space model is a Markov chain $\{x_{0:T}, y_{1:T}\}$ with $x_t \in \mathsf{X} = \mathbb{R}$, $y_t \in \mathsf{Y} = \mathbb{R}$, $t \in \mathbb{N}$.

[Figure: graphical model with latent states $\cdots \to x_{t-1} \to x_t \to x_{t+1} \to \cdots$ and observations $y_{t-1}, y_t, y_{t+1}$.]

$x_0 \sim \mu_\theta(x_0), \qquad x_{t+1} \mid x_t \sim f_\theta(x_{t+1} \mid x_t), \qquad y_t \mid x_t \sim g_\theta(y_t \mid x_t).$

Example: linear Gaussian SSM ($\theta = [\mu, \phi, \sigma_v, \sigma_e]$):

$x_{t+1} \mid x_t \sim \mathcal{N}\big(x_{t+1};\, \mu + \phi(x_t - \mu), \sigma_v^2\big), \qquad y_t \mid x_t \sim \mathcal{N}\big(y_t;\, x_t, \sigma_e^2\big).$
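As a concrete reference point, here is a minimal Python sketch (not part of the slides) that simulates data from this linear Gaussian SSM; the function name and parameter values are illustrative choices only.

```python
import numpy as np

def simulate_lgss(theta, T, seed=0):
    """Simulate x_{0:T} and y_{1:T} from the linear Gaussian SSM.

    theta = (mu, phi, sigma_v, sigma_e); values below are chosen only
    for illustration.
    """
    mu, phi, sigma_v, sigma_e = theta
    rng = np.random.default_rng(seed)
    x = np.zeros(T + 1)
    y = np.zeros(T + 1)
    # Draw x_0 from the stationary distribution of the state process.
    x[0] = mu + sigma_v / np.sqrt(1 - phi**2) * rng.standard_normal()
    for t in range(T):
        x[t + 1] = mu + phi * (x[t] - mu) + sigma_v * rng.standard_normal()
        y[t + 1] = x[t + 1] + sigma_e * rng.standard_normal()
    return x, y[1:]

if __name__ == "__main__":
    x, y = simulate_lgss(theta=(0.5, 0.9, 0.3, 0.1), T=250)
    print(y[:5])
```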
Bayesian parameter inference

[Figure: three sketches of posterior densities over $\theta$.]

$\pi(\theta) = p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta), \qquad \pi[\varphi] = \mathbb{E}_\pi\big[\varphi(\theta)\big] = \int \varphi(\theta')\, \pi(\theta')\, \mathrm{d}\theta'.$
Exploring posteriors by Markov chains
Markov chains: basic properties

A sequence of random variables $\{X_k\}_{k=0}^K$ with the property

$\mathbb{P}\big[X_k \in A \mid x_{0:k-1}\big] = \mathbb{P}\big[X_k \in A \mid x_{k-1}\big] = \int_A R(x_{k-1}, x_k)\, \mathrm{d}x_k.$

We will consider ergodic chains with the properties:
- Reaches any point (irreducible)
- No cycles (aperiodic)
- Does not get stuck (recurrent)
Markov chains: stationary distribution

$\theta_k \mid \theta_{k-1} \sim \mathcal{N}\big(\theta_k;\, \mu + \phi(\theta_{k-1} - \mu), \sigma^2\big).$
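For a Gaussian AR(1) chain with $|\phi| < 1$, the stationary distribution is $\mathcal{N}\big(\mu, \sigma^2/(1-\phi^2)\big)$. The short Python check below (an illustration added here, with arbitrary parameter values) simulates the chain and compares its empirical moments to the stationary ones.

```python
import numpy as np

# A Gaussian AR(1) chain has stationary distribution N(mu, sigma^2 / (1 - phi^2))
# when |phi| < 1; the simulation below checks this empirically.
mu, phi, sigma = 1.0, 0.8, 0.5   # arbitrary illustrative values
rng = np.random.default_rng(1)

K = 100_000
theta = np.zeros(K)
for k in range(1, K):
    theta[k] = mu + phi * (theta[k - 1] - mu) + sigma * rng.standard_normal()

burn_in = 1_000
print("empirical mean/std: ", theta[burn_in:].mean(), theta[burn_in:].std())
print("stationary mean/std:", mu, sigma / np.sqrt(1 - phi**2))
```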
Metropolis-Hastings: algorithm

Initialise in $\theta_0$ and then generate samples $\{\theta_k\}_{k=1}^K$ from $p(\theta \mid y)$ by:

(i) Sample a candidate parameter $\theta'$ by $\theta' \sim \mathcal{N}(\theta'; \theta_{k-1}, \Sigma)$.

(ii) Accept $\theta'$ by setting $\theta_k \leftarrow \theta'$ with probability

$\min\left\{1, \frac{p(\theta' \mid y)}{p(\theta_{k-1} \mid y)}\right\} = \min\left\{1, \frac{p(\theta')\, p(y \mid \theta') / p(y)}{p(\theta_{k-1})\, p(y \mid \theta_{k-1}) / p(y)}\right\},$

where the normalising constant $p(y)$ cancels, and otherwise reject $\theta'$ by setting $\theta_k \leftarrow \theta_{k-1}$.

User choices: $K$ and $\Sigma$.
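A minimal random-walk Metropolis-Hastings sketch in Python for a scalar parameter (my own illustration, not code from the slides); it assumes a user-supplied function `log_post` returning the log-posterior up to a constant, so $p(y)$ never needs to be evaluated.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, K, step_size, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter.

    log_post : function returning log p(theta | y) up to a constant.
    The acceptance probability only involves a ratio of posteriors,
    so the normalising constant cancels.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(K + 1)
    theta[0] = theta0
    log_post_curr = log_post(theta0)
    for k in range(1, K + 1):
        proposal = theta[k - 1] + step_size * rng.standard_normal()
        log_post_prop = log_post(proposal)
        if np.log(rng.uniform()) < log_post_prop - log_post_curr:
            theta[k], log_post_curr = proposal, log_post_prop   # accept
        else:
            theta[k] = theta[k - 1]                             # reject
    return theta

# Example: sample from a standard Gaussian "posterior".
samples = metropolis_hastings(lambda t: -0.5 * t**2, theta0=0.0, K=5000, step_size=1.0)
print(samples.mean(), samples.std())
```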
Metropolis-Hastings: toy example

[Figures over three slides: samples in the $(q_1, q_2)$-plane against the marginal posterior density, and the log-posterior at the current state against the iteration number, shown after the first couple of iterations and after about 50 iterations.]
Metropolis-Hastings: proposal and mixing

[Figures: trace plots of $q_1$ and $q_2$ over 2,500 iterations, the corresponding autocorrelation functions (ACFs) up to lag 200, and the sampled pairs in the $(q_1, q_2)$-plane.]
Approximating the target by particles
Mapping a stormy lake
Particle Metropolis-Hastings (PMH)

Initialise in $\theta_0$ and then generate samples $\{\theta_k\}_{k=1}^K$ from $p(\theta \mid y)$ by:

(i) Sample a candidate parameter $\theta'$ by $\theta' \sim \mathcal{N}(\theta'; \theta_{k-1}, \Sigma)$.

(ii) Run a particle filter with $N$ particles to estimate $\hat{p}^N(\theta' \mid y)$.

(iii) Accept $\theta'$ by setting $\theta_k \leftarrow \theta'$ with probability

$\min\left\{1, \frac{\hat{p}^N(\theta' \mid y)}{\hat{p}^N(\theta_{k-1} \mid y)}\right\},$

and otherwise reject $\theta'$ by setting $\theta_k \leftarrow \theta_{k-1}$.

User choices: $K$, $\Sigma$ and $N$.
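A sketch of the PMH loop under the Gaussian random-walk proposal above (illustrative, not the slides' code). Here `particle_filter_loglik(y, theta, N)` and `log_prior(theta)` are placeholder names for an unbiased log-likelihood estimator and the log-prior supplied by the user.

```python
import numpy as np

def particle_metropolis_hastings(y, log_prior, particle_filter_loglik,
                                 theta0, K, Sigma, N, seed=0):
    """Particle Metropolis-Hastings with a Gaussian random-walk proposal.

    The exact likelihood is replaced by a particle filter estimate; the
    resulting chain still targets p(theta | y) (an "exact approximation").
    """
    rng = np.random.default_rng(seed)
    p = len(theta0)
    theta = np.zeros((K + 1, p))
    theta[0] = theta0
    log_target = log_prior(theta0) + particle_filter_loglik(y, theta0, N)
    for k in range(1, K + 1):
        proposal = rng.multivariate_normal(theta[k - 1], Sigma)
        log_target_prop = log_prior(proposal) + particle_filter_loglik(y, proposal, N)
        if np.log(rng.uniform()) < log_target_prop - log_target:
            theta[k], log_target = proposal, log_target_prop   # accept
        else:
            theta[k] = theta[k - 1]                            # reject
    return theta
```

With the bootstrap particle filter sketched later in the particle-filtering appendix, one could for example pass `particle_filter_loglik=lambda y, th, N: bootstrap_particle_filter(y, th, N)[1]`.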
Water tank example: model

Consider the model

$\dot{x}_1(t) = -k_1 \sqrt{x_1(t)} + k_4\, u(t) + w_1(t),$
$\dot{x}_2(t) = k_2 \sqrt{x_1(t)} - k_3 \sqrt{x_2(t)} + w_2(t),$
$y(t) = x_2(t) + e(t),$

where $w_1(t)$, $w_2(t)$ and $e(t)$ are independent Gaussian noise processes. The parameters are $\theta = \{k_1, k_2, k_3, k_4, \ldots\}$ with $p(\theta) \propto 1$.
Water tank example: data

[Figure: the input $u(t)$ and the observations $y(t)$ over 1,000 time points.]
Water tank example: parameter posteriors

[Figure: estimated marginal posterior densities for $k_1$, $k_2$, $k_3$ and $k_4$.]
Water tank example: state predictions

[Figure: the input $u(t)$ and the observations $y(t)$ over 1,000 time points together with the corresponding predictions from the estimated model.]
Improving the PMH algorithm
Water tank example: trace plots

[Figure: PMH trace plots of $k_1$, $k_2$, $k_3$ and $k_4$ over 8,000 iterations.]
What are some open questions?
- Decreasing the computational time when $T$ is large: correlating and improving the particle filter.
- Obtaining better mixing when $p = |\theta|$ is large (> 5): adding gradient and Hessian information to the proposal.
- Obtaining better mixing when $n_x = |x|$ is large (> 10): improving the particle filter.
- Decreasing the amount of tuning required from the user: adaptive algorithms and rules-of-thumb.
What did we do?
- Gave a (hopefully) gentle introduction to (P)MCMC.
- Developed some intuition for PMH and its pros/cons.

Why did we do this?
- PMH is a general algorithm for Bayesian inference.
- It is relatively simple to implement and tune.

What are you going to do now?
- Remember that the PMH algorithm exists.
- Learn more by reading our tutorial.
- Try to implement the algorithm yourself.
A complete tutorial on PMH is available at arXiv:1511.01707.
Thank you for listening.
Comments, suggestions and/or questions?

Johan Dahlin
work@johandahlin.com
work.johandahlin.com

Remember: the tutorial is available at arXiv:1511.01707.
Particle filtering [I/II]

An instance of sequential Monte Carlo (SMC) samplers.

Estimates $\mathbb{E}\big[\varphi(x_t) \mid y_{1:t}\big]$ and $p_\theta(y_{1:T})$.

Computational cost of order $\mathcal{O}(NT)$ (with $N \sim T$).

Well-understood statistical properties (unbiasedness, large deviation inequalities, CLTs).

References:
A. Doucet and A. Johansen, "A tutorial on particle filtering and smoothing". In D. Crisan and B. Rozovsky (editors), The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011.
O. Cappé, S. J. Godsill and E. Moulines, "An overview of existing methods and recent advances in sequential Monte Carlo". Proceedings of the IEEE 95(5), 2007.
Particle filtering [II/II]

[Figure: schematic of the resampling, propagation and weighting steps.]

By iterating:

Resampling: $\mathbb{P}\big(a_t^{(i)} = j\big) = w_{t-1}^{(j)}$, for $i, j = 1, \ldots, N$.

Propagation: $x_t^{(i)} \sim f_\theta\big(x_t \mid x_{t-1}^{a_t^{(i)}}\big)$, for $i = 1, \ldots, N$.

Weighting: $w_t^{(i)} = g_\theta\big(y_t \mid x_t^{(i)}\big)$, for $i = 1, \ldots, N$.

We obtain the particle system $\big\{x_{0:T}^{(i)}, w_{0:T}^{(i)}\big\}_{i=1}^N$.
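A minimal bootstrap particle filter sketch for the linear Gaussian SSM from the introduction (my own illustration, not the slides' code). It iterates resampling, propagation and weighting, and returns both the filtered state means and the log-likelihood estimate that PMH needs.

```python
import numpy as np

def bootstrap_particle_filter(y, theta, N, seed=None):
    """Bootstrap particle filter for the linear Gaussian SSM example.

    Returns the filtered state means and an estimate of log p_theta(y_{1:T}).
    """
    mu, phi, sigma_v, sigma_e = theta
    rng = np.random.default_rng(seed)
    T = len(y)
    # Initialise particles from the stationary distribution of the state process.
    particles = mu + sigma_v / np.sqrt(1 - phi**2) * rng.standard_normal(N)
    log_lik, filtered_means = 0.0, np.zeros(T)
    for t in range(T):
        # Resampling: draw ancestors with probabilities given by the
        # normalised weights (at t = 0 the weights are uniform, so skip).
        if t > 0:
            ancestors = rng.choice(N, size=N, p=weights)
            particles = particles[ancestors]
        # Propagation: sample from f_theta(x_t | x_{t-1}).
        particles = mu + phi * (particles - mu) + sigma_v * rng.standard_normal(N)
        # Weighting: evaluate g_theta(y_t | x_t) for each particle (log scale).
        log_w = -0.5 * ((y[t] - particles) / sigma_e) ** 2 \
                - 0.5 * np.log(2 * np.pi * sigma_e**2)
        max_log_w = log_w.max()
        w = np.exp(log_w - max_log_w)
        # Accumulate the log-likelihood estimate via log(mean of weights).
        log_lik += max_log_w + np.log(w.mean())
        weights = w / w.sum()
        filtered_means[t] = np.sum(weights * particles)
    return filtered_means, log_lik
```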
Particle filtering: state estimation

[Figure: particle approximations of the filtering density for two different numbers of particles.]

$\hat{\varphi}_t^N = \widehat{\mathbb{E}}\big[\varphi(x_t) \mid y_{1:t}\big] = \sum_{i=1}^N w_t^{(i)}\, \varphi\big(x_t^{(i)}\big), \qquad \sqrt{N}\big(\hat{\varphi}_t^N - \varphi_t\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_t^2(\varphi)\big).$
Particle filtering: likelihood estimation

[Figure: density of the error in the log-likelihood estimate and a QQ-plot against standard Gaussian quantiles.]

$\underbrace{\log \hat{p}_\theta^N(y_{1:T})}_{\hat{\ell}(\theta)} = \sum_{t=1}^T \log\Big[\sum_{i=1}^N w_t^{(i)}\Big] - T \log N, \qquad \sqrt{N}\Big(\hat{\ell}(\theta) - \ell(\theta) + \frac{\sigma_\pi^2}{2N}\Big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_\pi^2\big).$
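As a quick empirical check (my own illustration, using the `simulate_lgss` and `bootstrap_particle_filter` sketches from earlier, with arbitrary parameter values), one can rerun the particle filter for fixed data and $\theta$ and inspect how the spread of the log-likelihood estimates shrinks as $N$ grows.

```python
import numpy as np

# Assumes simulate_lgss and bootstrap_particle_filter from the earlier sketches.
theta = (0.5, 0.9, 0.3, 0.1)          # arbitrary illustrative values
_, y = simulate_lgss(theta, T=250)

for N in (50, 500):
    estimates = np.array([
        bootstrap_particle_filter(y, theta, N, seed=s)[1] for s in range(100)
    ])
    print(f"N = {N:4d}: mean = {estimates.mean():.2f}, std = {estimates.std():.2f}")
```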