Particle Metropolis-Hastings



  1. Particle Metropolis-Hastings. Johan Dahlin, Department of Information Technology, Uppsala University, Sweden. September 28, 2016. work@johandahlin.com

  2. This is collaborative work together with Dr. Fredrik Lindsten (Uppsala University, Sweden), Prof. Thomas Schön (Uppsala University, Sweden) and Andreas Svensson (Uppsala University, Sweden).

  3. What are we going to do?
     - Give a (hopefully) gentle introduction to (P)MCMC.
     - Develop some intuition for PMH and its pros/cons.
     Why are we doing this?
     - PMH is a general algorithm for Bayesian inference.
     - It is relatively simple to implement and tune.
     How will we do this?
     - Employ intuition and analogies with optimisation.
     - Investigate PMH using simulations rather than mathematics.
     - Illustrate PMH on a water tank benchmark.
     - By asking questions.

  4. Mapping a lake

  5. State space models
     A Markov chain $[X_{0:T}, Y_{1:T}]$ with $X_t \in \mathcal{X} = \mathbb{R}$, $Y_t \in \mathcal{Y} = \mathbb{R}$, $t \in \mathbb{N}$.
     (Figure: graphical model with hidden states $\cdots \to x_{t-1} \to x_t \to x_{t+1} \to \cdots$ and observations $y_{t-1}, y_t, y_{t+1}$.)
     $$x_0 \sim \mu_\theta(x_0), \qquad x_{t+1} \mid x_t \sim f_\theta(x_{t+1} \mid x_t), \qquad y_t \mid x_t \sim g_\theta(y_t \mid x_t).$$
     Example: linear Gaussian SSM with $\theta = [\mu, \phi, \sigma_v, \sigma_e]$:
     $$x_{t+1} \mid x_t \sim \mathcal{N}\big(x_{t+1};\, \mu + \phi(x_t - \mu),\, \sigma_v^2\big), \qquad y_t \mid x_t \sim \mathcal{N}\big(y_t;\, x_t,\, \sigma_e^2\big).$$
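     To make the linear Gaussian SSM above concrete, here is a minimal Python simulation sketch (not part of the original slides). The parameter values used in the call at the bottom are illustrative assumptions, not values taken from the slides.

     ```python
     import numpy as np

     def simulate_lgss(mu, phi, sigma_v, sigma_e, T, seed=0):
         """Simulate T steps of the linear Gaussian state space model on slide 5."""
         rng = np.random.default_rng(seed)
         x = np.zeros(T + 1)
         y = np.zeros(T + 1)
         x[0] = mu  # simple choice for x_0 ~ mu_theta(x_0): start at the stationary mean
         for t in range(1, T + 1):
             # x_{t+1} | x_t ~ N(mu + phi * (x_t - mu), sigma_v^2)
             x[t] = mu + phi * (x[t - 1] - mu) + sigma_v * rng.standard_normal()
             # y_t | x_t ~ N(x_t, sigma_e^2)
             y[t] = x[t] + sigma_e * rng.standard_normal()
         return x, y[1:]

     # Illustrative parameter values (assumptions, not from the slides).
     x, y = simulate_lgss(mu=0.0, phi=0.75, sigma_v=1.0, sigma_e=0.1, T=250)
     ```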

  6. Bayesian parameter inference
     (Figure: three example posterior densities over $\theta$.)
     $$\pi(\theta) = p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta), \qquad \pi[\varphi] = \mathbb{E}_\pi\big[\varphi(\theta)\big] = \int \varphi(\theta')\, \pi(\theta')\, d\theta'.$$

  7. Exploring posteriors by Markov chains

  8. Markov chains: basic properties
     A sequence of random variables $\{X_k\}_{k=0}^K$ with the property
     $$\mathbb{P}[X_k \in A \mid x_{0:k-1}] = \mathbb{P}[X_k \in A \mid x_{k-1}] = \int_A R(x_{k-1}, x_k)\, dx_k.$$
     We will consider ergodic chains with the properties: the chain can reach any point (irreducible), has no cycles (aperiodic) and does not get stuck (recurrent).

  9. Markov chains: stationary distribution
     $$\theta_k \mid \theta_{k-1} \sim \mathcal{N}\big(\theta_k;\, \mu + \phi(\theta_{k-1} - \mu),\, \sigma^2\big).$$
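     A small worked check (not on the original slide): assuming $|\phi| < 1$, the stationary mean $m$ and variance $s^2$ of this Gaussian chain follow from matching moments,
     $$m = \mu + \phi(m - \mu) \;\Rightarrow\; m = \mu, \qquad s^2 = \phi^2 s^2 + \sigma^2 \;\Rightarrow\; s^2 = \frac{\sigma^2}{1 - \phi^2},$$
     so the chain forgets its initialisation and its samples eventually behave like draws from $\mathcal{N}\big(\mu, \sigma^2/(1-\phi^2)\big)$, which is the property MCMC exploits.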

  10. Metropolis-Hastings: algorithm
      Initialise in $\theta_0$ and then generate samples $\{\theta_k\}_{k=1}^K$ from $p(\theta \mid y)$ by:
      (i) Sample a candidate parameter $\theta'$ by $\theta' \sim \mathcal{N}(\theta'; \theta_{k-1}, \Sigma)$.
      (ii) Accept $\theta'$ by setting $\theta_k \leftarrow \theta'$ with probability
      $$\min\left\{1,\, \frac{p(\theta' \mid y)}{p(\theta_{k-1} \mid y)}\right\} = \min\left\{1,\, \frac{p(y \mid \theta')\, p(\theta')}{p(y \mid \theta_{k-1})\, p(\theta_{k-1})}\right\},$$
      and otherwise reject $\theta'$ by setting $\theta_k \leftarrow \theta_{k-1}$.
      User choices: $K$ and $\Sigma$.
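      A minimal Python sketch of this random walk Metropolis-Hastings loop (not from the original slides); the one-dimensional Gaussian toy target and the step size at the bottom are illustrative assumptions.

      ```python
      import numpy as np

      def metropolis_hastings(log_target, theta0, step_size, K, seed=0):
          """Random walk Metropolis-Hastings as on slide 10 (scalar parameter)."""
          rng = np.random.default_rng(seed)
          theta = np.zeros(K + 1)
          theta[0] = theta0
          log_post = log_target(theta0)
          for k in range(1, K + 1):
              # (i) propose a candidate from a Gaussian random walk
              theta_prop = theta[k - 1] + step_size * rng.standard_normal()
              log_post_prop = log_target(theta_prop)
              # (ii) accept with probability min{1, p(theta'|y) / p(theta_{k-1}|y)}
              if np.log(rng.uniform()) < log_post_prop - log_post:
                  theta[k], log_post = theta_prop, log_post_prop
              else:
                  theta[k] = theta[k - 1]
          return theta

      # Toy target (an assumption for illustration): unnormalised N(1, 0.5^2) posterior.
      log_target = lambda th: -0.5 * ((th - 1.0) / 0.5) ** 2
      samples = metropolis_hastings(log_target, theta0=0.0, step_size=1.0, K=5000)
      ```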

  11. Metropolis-Hastings: toy example
      (Figure: samples in the $(q_1, q_2)$ plane, the estimated marginal posterior density of $q_2$, and the log-posterior at the current state around the first iteration.)

  12. Metropolis-Hastings: toy example
      (Figure: the same three panels up to iteration 2.)

  13. Metropolis-Hastings: toy example
      (Figure: the same three panels up to iteration 50.)

  14. Metropolis-Hastings: proposal and mixing
      (Figure: trace plots of $q_1$ and $q_2$ over 2,500 iterations together with their autocorrelation functions up to lag 200.)

  15. Metropolis-Hastings: proposal and mixing
      (Figure: samples in the $(q_1, q_2)$ plane together with the autocorrelation functions of $q_1$ and $q_2$ up to lag 200.)

  16. Approximating the target by particles

  17. Mapping a stormy lake

  18. Particle Metropolis-Hastings (PMH)
      Initialise in $\theta_0$ and then generate samples $\{\theta_k\}_{k=1}^K$ from $p(\theta \mid y)$ by:
      (i) Sample a candidate parameter $\theta'$ by $\theta' \sim \mathcal{N}(\theta'; \theta_{k-1}, \Sigma)$.
      (ii) Run a particle filter with $N$ particles to estimate $\widehat{p}^N(\theta' \mid y)$.
      (iii) Accept $\theta'$ by setting $\theta_k \leftarrow \theta'$ with probability
      $$\min\left\{1,\, \frac{\widehat{p}^N(\theta' \mid y)}{\widehat{p}^N(\theta_{k-1} \mid y)}\right\},$$
      and otherwise reject $\theta'$ by setting $\theta_k \leftarrow \theta_{k-1}$.
      User choices: $K$, $\Sigma$ and $N$.
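      Below is a hedged Python sketch of the PMH loop (not from the original slides). The only change from plain Metropolis-Hastings is that the exact posterior is replaced by a noisy estimate. The argument `estimate_log_post` is assumed to return $\log[\widehat{p}^N(y \mid \theta)\, p(\theta)]$, for example from the bootstrap particle filter sketched after slide 30; the Gaussian stand-in used in the demo at the bottom is an assumption for illustration only.

      ```python
      import numpy as np

      def particle_metropolis_hastings(estimate_log_post, theta0, step_size, K, seed=0):
          """PMH as on slide 18: MH where the log-posterior is replaced by a noisy estimate.

          estimate_log_post(theta) should return an estimate of log[p_hat^N(y | theta) p(theta)],
          e.g. from a bootstrap particle filter (see the sketch after slide 30).
          """
          rng = np.random.default_rng(seed)
          theta = np.zeros(K + 1)
          theta[0] = theta0
          log_post = estimate_log_post(theta0)  # the estimate for the current state is kept
          for k in range(1, K + 1):
              # (i) propose a candidate from a Gaussian random walk
              theta_prop = theta[k - 1] + step_size * rng.standard_normal()
              # (ii) re-estimate the posterior at the candidate
              log_post_prop = estimate_log_post(theta_prop)
              # (iii) accept with probability min{1, p_hat(theta'|y) / p_hat(theta_{k-1}|y)}
              if np.log(rng.uniform()) < log_post_prop - log_post:
                  theta[k], log_post = theta_prop, log_post_prop
              else:
                  theta[k] = theta[k - 1]
          return theta

      # Stand-in estimator (assumption, for illustration only): an exact Gaussian log-posterior
      # plus artificial noise, mimicking the variability of a particle filter estimate.
      noisy_log_post = lambda th: -0.5 * ((th - 1.0) / 0.5) ** 2 + 0.1 * np.random.randn()
      samples = particle_metropolis_hastings(noisy_log_post, theta0=0.0, step_size=1.0, K=5000)
      ```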

  19. Water tank example: model
      Consider the model
      $$\dot{x}_1(t) = -k_1 \sqrt{x_1(t)} + k_4 u(t) + w_1(t),$$
      $$\dot{x}_2(t) = k_2 \sqrt{x_1(t)} - k_3 \sqrt{x_2(t)} + w_2(t),$$
      $$y(t) = x_2(t) + e(t),$$
      where $w_1(t)$, $w_2(t)$ and $e(t)$ are independent Gaussian noise. The parameters are $\theta = \{k_1, k_2, k_3, k_4, \ldots\}$ with $p(\theta) \propto 1$.
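      As a rough illustration (not in the original slides), the tank dynamics above can be simulated with a simple Euler discretisation. Everything numerical below is an assumption chosen only to produce plausible trajectories: the step length, the square-wave input, the parameter values and the noise standard deviations; the continuous-time noise is also simplified to noise added per discrete step.

      ```python
      import numpy as np

      def simulate_tanks(k, T, dt=1.0, sigma_w=0.05, sigma_e=0.1, seed=0):
          """Euler discretisation of the water tank model on slide 19.

          k = (k1, k2, k3, k4); all numerical values here are illustrative assumptions.
          """
          k1, k2, k3, k4 = k
          rng = np.random.default_rng(seed)
          u = 4.0 + 2.0 * np.sign(np.sin(2 * np.pi * np.arange(T) / 200))  # square-wave input (assumption)
          x1, x2, y = np.zeros(T), np.zeros(T), np.zeros(T)
          for t in range(1, T):
              # clip at zero so the square roots stay real (tank levels cannot be negative)
              s1, s2 = np.sqrt(max(x1[t - 1], 0.0)), np.sqrt(max(x2[t - 1], 0.0))
              x1[t] = x1[t - 1] + dt * (-k1 * s1 + k4 * u[t - 1]) + sigma_w * rng.standard_normal()
              x2[t] = x2[t - 1] + dt * (k2 * s1 - k3 * s2) + sigma_w * rng.standard_normal()
              y[t] = x2[t] + sigma_e * rng.standard_normal()
          return u, x1, x2, y

      # Illustrative parameter values (assumptions).
      u, x1, x2, y = simulate_tanks(k=(0.1, 0.05, 0.05, 0.1), T=1000)
      ```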

  20. Water tank example: data
      (Figure: the input $u(t)$ and the observations $y(t)$ over 1,000 time steps.)

  21. Water tank example: parameter posteriors
      (Figure: marginal posterior densities of $k_1$, $k_2$, $k_3$ and $k_4$.)

  22. Water tank example: state predictions
      (Figure: the input $u(t)$ and the observations $y(t)$ together with the resulting state predictions over 1,000 time steps.)

  23. Improving the PMH algorithm

  24. Water tank example: trace plots
      (Figure: Markov chain traces of $k_1$, $k_2$, $k_3$ and $k_4$ over 8,000 iterations.)

  25. What are some open questions?
      - Decreasing the computational time when $T$ is large: correlating and improving the particle filter.
      - Obtaining better mixing when $p = |\theta|$ is large (> 5): adding gradient and Hessian information to the proposal.
      - Obtaining better mixing when $n_x = |x|$ is large (> 10): improving the particle filter.
      - Decreasing the amount of tuning by the user: adaptive algorithms and rules of thumb.

  26. What did we do?
      - Gave a (hopefully) gentle introduction to (P)MCMC.
      - Developed some intuition for PMH and its pros/cons.
      Why did we do this?
      - PMH is a general algorithm for Bayesian inference.
      - It is relatively simple to implement and tune.
      What are you going to do now?
      - Remember that the PMH algorithm exists.
      - Learn more by reading our tutorial.
      - Try to implement the algorithm yourself.

  27. A complete tutorial on PMH is available at arXiv:1511.01707.

  28. Thank you for listening! Comments, suggestions and/or questions? Johan Dahlin (work@johandahlin.com, work.johandahlin.com). Remember: the tutorial is available at arXiv:1511.01707.

  29. Particle filtering [I/II]
      The particle filter is an instance of sequential Monte Carlo (SMC) samplers. It estimates $\mathbb{E}[\varphi(x_t) \mid y_{1:t}]$ and $p_\theta(y_{1:T})$, has a computational cost of order $\mathcal{O}(NT)$ (with $N \sim T$), and has well-understood statistical properties (unbiasedness, large deviation inequalities, CLTs).
      References:
      A. Doucet and A. Johansen, A tutorial on particle filtering and smoothing. In D. Crisan and B. Rozovsky (editors), The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011.
      O. Cappé, S. J. Godsill and E. Moulines, An overview of existing methods and recent advances in sequential Monte Carlo. Proceedings of the IEEE 95(5), 2007.

  30. Particle filtering [II/II]
      By iterating resampling, propagation and weighting:
      Resampling: $\mathbb{P}\big(a_t^{(i)} = j\big) = w_{t-1}^{(j)}$, for $i, j = 1, \ldots, N$.
      Propagation: $x_t^{(i)} \sim f_\theta\big(x_t \mid x_{t-1}^{a_t^{(i)}}\big)$, for $i = 1, \ldots, N$.
      Weighting: $w_t^{(i)} = g_\theta\big(y_t \mid x_t^{(i)}\big)$, for $i = 1, \ldots, N$.
      We obtain the particle system $\big\{x_{0:T}^{(i)}, w_{0:T}^{(i)}\big\}_{i=1}^N$.
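      A minimal Python sketch of one bootstrap particle filter pass for the linear Gaussian SSM from slide 5 (not part of the original slides). It follows the resampling/propagation/weighting steps above and returns the log-likelihood estimate from slide 32; the parameter ordering $\theta = (\mu, \phi, \sigma_v, \sigma_e)$ is an assumption carried over from the earlier sketches.

      ```python
      import numpy as np

      def bootstrap_pf_loglik(theta, y, N=500, seed=None):
          """Bootstrap particle filter for the LGSS on slide 5; returns log p_hat^N_theta(y_{1:T})."""
          mu, phi, sigma_v, sigma_e = theta
          rng = np.random.default_rng(seed)
          T = len(y)
          # initialise particles from the stationary distribution of the latent AR(1) process
          x = mu + sigma_v / np.sqrt(1.0 - phi**2) * rng.standard_normal(N)
          weights = np.full(N, 1.0 / N)
          log_lik = 0.0
          for t in range(T):
              # resampling: draw ancestors with probabilities given by the normalised weights
              ancestors = rng.choice(N, size=N, p=weights)
              # propagation: move the resampled particles through the state dynamics
              x = mu + phi * (x[ancestors] - mu) + sigma_v * rng.standard_normal(N)
              # weighting: unnormalised weights are the observation densities g_theta(y_t | x_t)
              log_w = -0.5 * np.log(2 * np.pi * sigma_e**2) - 0.5 * ((y[t] - x) / sigma_e) ** 2
              max_log_w = np.max(log_w)
              w = np.exp(log_w - max_log_w)                 # shift for numerical stability
              log_lik += max_log_w + np.log(np.mean(w))     # add the log of the average weight
              weights = w / np.sum(w)
          return log_lik
      ```

      With data from the LGSS simulator sketched after slide 5, `bootstrap_pf_loglik((0.0, 0.75, 1.0, 0.1), y)` returns a noisy estimate of the log-likelihood (the underlying likelihood estimate is unbiased), which, together with a flat prior, could play the role of `estimate_log_post` in the PMH sketch after slide 18.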

  31. Particle filtering: state estimation
      (Figure: two particle approximations of the filtering density over $x$.)
      $$\widehat{\varphi}_t^N \triangleq \widehat{\mathbb{E}}^N\big[\varphi(x_t) \mid y_{1:t}\big] = \sum_{i=1}^N w_t^{(i)}\, \varphi\big(x_t^{(i)}\big), \qquad \sqrt{N}\,\big(\widehat{\varphi}_t^N - \varphi_t\big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_t^2(\varphi)\big).$$

  32. Particle filtering: likelihood estimation
      (Figure: density estimate of the error in the log-likelihood estimate and a Q-Q plot against standard Gaussian quantiles.)
      $$\log \widehat{p}_\theta^N(y_{1:T}) = \underbrace{\sum_{t=1}^T \log\Big[\sum_{i=1}^N w_t^{(i)}\Big] - T \log N}_{\widehat{\ell}(\theta)}, \qquad \sqrt{N}\,\Big(\widehat{\ell}(\theta) - \ell(\theta) + \frac{\sigma_\pi^2}{2N}\Big) \xrightarrow{d} \mathcal{N}\big(0, \sigma_\pi^2\big).$$
