Advanced Simulation - Lecture 13. Patrick Rebeschini. February 26th, 2018.
Outline: Sequential Importance Sampling. Resampling step. Sequential Monte Carlo / Particle Filters.
Hidden Markov Models

Figure: Graph representation of a general HMM, with hidden states X_0, X_1, X_2, ..., X_T depending on θ and observations y_1, y_2, ..., y_T.

(X_t): initial distribution μ_θ, transition f_θ.
(Y_t) given (X_t): measurement g_θ.
Prior on the parameter θ ∈ Θ.

Reference: Inference in Hidden Markov Models, Cappé, Moulines, Rydén, 2005.
General inference in HMM

Proposition: The posterior p(x_{1:t} | y_{1:t}, θ) satisfies

  p(x_{1:t} | y_{1:t}, θ) = p(x_{1:t-1} | y_{1:t-1}, θ) f_θ(x_t | x_{t-1}) g_θ(y_t | x_t) / p(y_t | y_{1:t-1}, θ),

where

  p(y_t | y_{1:t-1}, θ) = ∫ p(x_{1:t-1} | y_{1:t-1}, θ) f_θ(x_t | x_{t-1}) g_θ(y_t | x_t) dx_{1:t}.

Proposition: The marginal posterior p(x_t | y_{1:t}) satisfies the following recursion

  p(x_t | y_{1:t-1}) = ∫ f(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1},
  p(x_t | y_{1:t}) = g(y_t | x_t) p(x_t | y_{1:t-1}) / p(y_t | y_{1:t-1}),

where

  p(y_t | y_{1:t-1}) = ∫ g(y_t | x_t) p(x_t | y_{1:t-1}) dx_t.
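These recursions are available in closed form only in special cases. As a point of reference for the Kalman filter comparisons in the examples below, here is a minimal sketch in Python (assuming numpy) for a hypothetical univariate linear Gaussian model x_t = φ x_{t-1} + σ_v v_t, y_t = x_t + σ_w w_t with standard normal noise; the model and parameter values are illustrative assumptions, not taken from the slides.

import numpy as np

def kalman_filter(y, phi=0.95, sigma_v=1.0, sigma_w=1.0, m0=0.0, P0=1.0):
    # Exact filtering for the (assumed) linear Gaussian model:
    #   x_1 ~ N(m0, P0),  x_t = phi * x_{t-1} + sigma_v * v_t,  y_t = x_t + sigma_w * w_t.
    m, P, loglik = m0, P0, 0.0
    means, variances = [], []
    for t, y_t in enumerate(y):
        # Prediction step: p(x_t | y_{1:t-1}) = N(m_pred, P_pred)
        if t == 0:
            m_pred, P_pred = m0, P0
        else:
            m_pred, P_pred = phi * m, phi ** 2 * P + sigma_v ** 2
        # Update step: multiply by g(y_t | x_t) = N(y_t; x_t, sigma_w^2) and normalise
        S = P_pred + sigma_w ** 2                      # variance of p(y_t | y_{1:t-1})
        loglik += -0.5 * (np.log(2 * np.pi * S) + (y_t - m_pred) ** 2 / S)
        K = P_pred / S                                 # Kalman gain
        m = m_pred + K * (y_t - m_pred)
        P = (1.0 - K) * P_pred
        means.append(m)
        variances.append(P)
    return np.array(means), np.array(variances), loglik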
General inference in HMM

In general, the filtering problem is thus intractable:

  ∫ φ(x_t) p(x_t | y_{1:t}, θ) dx_t = ∫ φ(x_t) p(x_{1:t} | y_{1:t}, θ) dx_{1:t}
                                    ∝ ∫ φ(x_t) μ_θ(x_1) ∏_{s=2}^{t} f_θ(x_s | x_{s-1}) ∏_{s=1}^{t} g_θ(y_s | x_s) dx_{1:t}.

It is a t × dim(X)-dimensional integral.

The likelihood is also intractable:

  p(y_{1:t} | θ) = ∫ p(x_{1:t}, y_{1:t} | θ) dx_{1:t}
                 = ∫ μ_θ(x_1) ∏_{s=2}^{t} f_θ(x_s | x_{s-1}) ∏_{s=1}^{t} g_θ(y_s | x_s) dx_{1:t}.

Thus we cannot compute the likelihood pointwise, e.g. to perform a Metropolis–Hastings algorithm on the parameters.
Sequential Importance Sampling

We now consider the parameter θ to be fixed. We want to infer X_{1:t} given y_{1:t}.

Two ingredients: importance sampling, and "sampling via composition", or "via conditioning".

IS: if we have a weighted sample (w_1^i, X^i) approximating π_1, then (w_2^i, X^i) approximates π_2 if we define

  w_2^i = w_1^i × π_2(X^i) / π_1(X^i).

In standard IS, π_1 and π_2 are defined on the same space.
Sequential Importance Sampling

Sampling via composition: if (w^i, X^i) approximates p_X(x), and if Y^i ∼ q_{Y|X}(y | X^i), then (w^i, (X^i, Y^i)) approximates p_X(x) q_{Y|X}(y | x). The space has been extended. Marginally, (w^i, Y^i) approximates

  q_Y(y) = ∫ p_X(x) q_{Y|X}(y | x) dx.

Sequential Importance Sampling combines both ingredients to iteratively approximate p(x_{1:t} | y_{1:t}).
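A minimal sketch of the two ingredients in Python (assuming numpy); the densities pi1, pi2 and the sampler sample_q_given_x are hypothetical user-supplied functions, not part of the slides.

import numpy as np

def reweight(w1, X, pi1, pi2):
    # Ingredient 1 (importance sampling): (w_1^i, X^i) targets pi_1,
    # so (w_2^i, X^i) with w_2^i = w_1^i * pi_2(X^i) / pi_1(X^i) targets pi_2.
    return w1 * pi2(X) / pi1(X)

def extend(w, X, sample_q_given_x):
    # Ingredient 2 (sampling via composition): draw Y^i ~ q(y | X^i);
    # (w^i, (X^i, Y^i)) then targets p_X(x) q(y | x) on the extended space.
    Y = sample_q_given_x(X)
    return w, np.column_stack([X, Y])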
Sequential Importance Sampling: algorithm

At time t = 1:
  Sample X_1^i ∼ q_1(·).
  Compute the weights
    w_1^i = μ(X_1^i) g(y_1 | X_1^i) / q_1(X_1^i).

At time t ≥ 2:
  Sample X_t^i ∼ q_{t|t-1}(· | X_{t-1}^i).
  Compute the weights
    w_t^i = w_{t-1}^i × ω_t^i = w_{t-1}^i × f(X_t^i | X_{t-1}^i) g(y_t | X_t^i) / q_{t|t-1}(X_t^i | X_{t-1}^i).
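A minimal sketch of SIS in Python (assuming numpy), specialised to the prior proposal q_1 = μ and q_{t|t-1} = f, so the incremental weight reduces to g(y_t | X_t^i); the names sample_mu, sample_f and g_density are hypothetical user-supplied model ingredients.

import numpy as np

def sis_prior_proposal(y, N, sample_mu, sample_f, g_density):
    # Sequential importance sampling with q_1 = mu and q_{t|t-1} = f,
    # so the incremental weight is omega_t^i = g(y_t | X_t^i).
    T = len(y)
    X = np.empty((T, N))
    logw = np.zeros(N)                            # work on the log scale for stability
    X[0] = sample_mu(N)                           # X_1^i ~ q_1(.)
    logw += np.log(g_density(y[0], X[0]))         # w_1^i
    for t in range(1, T):
        X[t] = sample_f(X[t - 1])                 # X_t^i ~ q_{t|t-1}(. | X_{t-1}^i)
        logw += np.log(g_density(y[t], X[t]))     # w_t^i = w_{t-1}^i * omega_t^i
    w = np.exp(logw - logw.max())                 # normalised weights at time T
    return X, w / w.sum()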
Sequential Importance Sampling: example

Figure: Estimation of filtering means E(x_t | y_{1:t}) over time, comparing the prior and optimal proposals to the Kalman filter.
Sequential Importance Sampling: example

Figure: Estimation of filtering variances V(x_t | y_{1:t}) over time, comparing the prior and optimal proposals to the Kalman filter.
Sequential Importance Sampling: example

Figure: Estimation of marginal log-likelihoods log p(y_{1:t}) over time, comparing the prior and optimal proposals to the Kalman filter.
Sequential Importance Sampling: example

Figure: Effective sample size over time, for the prior and optimal proposals.
Sequential Importance Sampling: example

Figure: Spread of 100 paths drawn from the prior proposal, and Kalman filter means in blue. Darker lines indicate higher weights.
Resampling

Idea: at time t, select particles with high weights, and remove particles with low weights. Spend the fixed computational budget "N" on the most promising paths.

Obtain an equally weighted sample (N^{-1}, X̄^i) from a weighted sample (w^i, X^i).

Resampling on empirical probability measures: input

  π^N(x) = Σ_i ( w^i / Σ_j w^j ) δ_{X^i}(x)

and output

  π̄^N(x) = N^{-1} Σ_i δ_{X̄^i}(x).
Multinomial resampling

How to draw from an empirical probability distribution

  π^N(x) = Σ_{i=1}^{N} ( w^i / Σ_{j=1}^{N} w^j ) δ_{X^i}(x)?

Remember how to draw from a mixture model

  Σ_{i=1}^{K} ω^i p_i(x)?

Draw k with probabilities ω^1, ..., ω^K, then draw from p_k.
Multinomial resampling

Draw an "ancestry vector" A^{1:N} = (A^1, ..., A^N) ∈ {1, ..., N}^N independently from a categorical distribution,

  A^1, ..., A^N i.i.d. ∼ Cat( w^1, ..., w^N ),

in other words

  P(A^i = k) = w^k / Σ_{j=1}^{N} w^j   for all i ∈ {1, ..., N} and all k ∈ {1, ..., N}.

Define X̄^i to be X^{A^i} for all i ∈ {1, ..., N}. X^{A^i} is said to be the "parent" or "ancestor" of X̄^i.

Return X̄ = (X̄^1, ..., X̄^N).
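A minimal sketch of this ancestor-based formulation in Python, assuming numpy:

import numpy as np

def multinomial_resample_ancestors(w, X):
    # Draw A^1, ..., A^N i.i.d. from Cat(w^1, ..., w^N) (weights normalised),
    # then set Xbar^i = X^{A^i}.
    N = len(w)
    p = np.asarray(w) / np.sum(w)
    A = np.random.choice(N, size=N, replace=True, p=p)
    return X[A]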
Multinomial resampling

Draw an "offspring vector" O^{1:N} = (O^1, ..., O^N) ∈ {0, ..., N}^N from a multinomial distribution

  O^{1:N} ∼ Multinomial( N; w^1, ..., w^N )

so that

  E[O^i] = N w^i / Σ_{j=1}^{N} w^j   for all i ∈ {1, ..., N},   and   Σ_{i=1}^{N} O^i = N.

Each particle X^i is replicated O^i times (possibly zero times) to create the sample X̄:
  X̄ ← {}
  For i = 1, ..., N, for k = 1, ..., O^i: X̄ ← (X̄, X^i).

Return X̄ = (X̄^1, ..., X̄^N).
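The same resampling step expressed through offspring counts, as a sketch assuming numpy; np.repeat copies each X^i exactly O^i times:

import numpy as np

def multinomial_resample_offspring(w, X):
    # Draw (O^1, ..., O^N) ~ Multinomial(N; w^1, ..., w^N) and copy X^i exactly O^i times.
    p = np.asarray(w) / np.sum(w)
    O = np.random.multinomial(len(w), p)
    return np.repeat(X, O, axis=0)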
Multinomial resampling

Other strategies are possible to perform resampling. Some properties are desirable:

  E[O^i] = N w^i / Σ_{j=1}^{N} w^j,   or   P(A^i = k) = w^k / Σ_{j=1}^{N} w^j.

This is sometimes called "unbiasedness", because then

  N^{-1} Σ_{k=1}^{N} φ(X̄^k) = N^{-1} Σ_{k=1}^{N} O^k φ(X^k)

has expectation

  Σ_{k=1}^{N} ( w^k / Σ_{j=1}^{N} w^j ) φ(X^k).
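A quick Monte Carlo check of this property, as a sketch assuming numpy (the particles, weights and test function φ below are arbitrary illustrative choices): averaged over many independent resampling draws, the equally weighted average of φ over the resampled particles should match the weighted average over the original particles.

import numpy as np

np.random.seed(1)
N = 5
X = np.random.randn(N)                        # arbitrary particles
w = np.random.rand(N); w /= w.sum()           # arbitrary normalised weights
phi = lambda x: x ** 2

weighted_avg = np.sum(w * phi(X))
resampled_avg = np.mean([phi(X[np.random.choice(N, N, p=w)]).mean()
                         for _ in range(20000)])
print(weighted_avg, resampled_avg)            # should agree up to Monte Carlo error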
Sequential MC / Sequential Importance Resampling

At time t = 1:
  Sample X_1^i ∼ q_1(·).
  Compute the weights
    w_1^i = μ(X_1^i) g(y_1 | X_1^i) / q_1(X_1^i).

At time t ≥ 2:
  Resample ( w_{t-1}^i, X_{1:t-1}^i ) → ( N^{-1}, X̄_{1:t-1}^i ).
  Sample X_t^i ∼ q_{t|t-1}(· | X̄_{t-1}^i), and set X_{1:t}^i := ( X̄_{1:t-1}^i, X_t^i ).
  Compute the weights
    w_t^i = ω_t^i = f(X_t^i | X̄_{t-1}^i) g(y_t | X_t^i) / q_{t|t-1}(X_t^i | X̄_{t-1}^i).
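A minimal sketch of this algorithm with the prior proposal (the bootstrap particle filter) in Python, assuming numpy; sample_mu, sample_f and g_density are hypothetical user-supplied model ingredients. It also accumulates an estimate of log p(y_{1:t}) as the running sum of log( N^{-1} Σ_i w_t^i ), one standard way to obtain the marginal log-likelihood estimates shown in the example figures.

import numpy as np

def bootstrap_particle_filter(y, N, sample_mu, sample_f, g_density):
    # Sequential importance resampling with the prior proposal.
    X = sample_mu(N)                                   # X_1^i ~ mu
    w = g_density(y[0], X)                             # w_1^i = g(y_1 | X_1^i)
    loglik = np.log(np.mean(w))                        # estimate of log p(y_1)
    filtering_means = [np.average(X, weights=w)]
    for t in range(1, len(y)):
        A = np.random.choice(N, size=N, p=w / w.sum()) # resampling step
        X = sample_f(X[A])                             # X_t^i ~ f(. | Xbar_{t-1}^i)
        w = g_density(y[t], X)                         # w_t^i = g(y_t | X_t^i)
        loglik += np.log(np.mean(w))                   # adds estimate of log p(y_t | y_{1:t-1})
        filtering_means.append(np.average(X, weights=w))
    return np.array(filtering_means), loglik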
Sequential Importance Sampling

Figure: Illustration of sequential importance sampling, panels (a)-(c); level sets of the likelihood function g(x, Y_n) are shown around the observation Y_n.
Sequential Importance Resampling

Figure: Illustration of sequential importance resampling, panels (a)-(d); level sets of the likelihood function g(x, Y_n) are shown around the observation Y_n, and particles are replicated according to their weights (e.g. 3 particles, 2 particles) while the others are discarded.
Sequential Monte Carlo: example

Figure: Estimation of filtering means E(x_t | y_{1:t}) over time, comparing the prior and optimal proposals to the Kalman filter.
Sequential Monte Carlo: example

Figure: Estimation of filtering variances V(x_t | y_{1:t}) over time, comparing the prior and optimal proposals to the Kalman filter.
Sequential Monte Carlo: example

Figure: Estimation of marginal log-likelihoods log p(y_{1:t}) over time, comparing the prior and optimal proposals to the Kalman filter.
Sequential Monte Carlo: example

Figure: Effective sample size over time, for the prior and optimal proposals.
Sequential Monte Carlo: example

Figure: Support of the approximation of p(x_t | y_{1:t}), over time.
Sequential Monte Carlo: example

Figure: Trajectories X̄_{1:T}^i, for i ∈ {1, ..., N} and N = 100.