Ergodicity and Accuracy in Optimal Particle Filters
David Kelly and Andrew Stuart
Courant Institute, New York University, New York, NY
www.dtbkelly.com
Math colloquium, CMU, December 9, 2016
What is data assimilation?
What is data assimilation?
Suppose $u$ is an evolving state, satisfying known dynamics, e.g.
$$\frac{du(t)}{dt} = F(u(t)) + \dot{W},$$
with some unknown initial condition $u_0 \sim \mu_0$. There is a true trajectory of $u$ that is producing partial, noisy observations at times $t = h, 2h, \dots, nh$:
$$y_n = H u_n + \xi_n,$$
where $H$ is a linear operator (think low-rank projection), $u_n = u(nh)$, and $\xi_n \sim N(0, \Gamma)$ i.i.d.
The aim of data assimilation is to characterize the conditional distribution of $u_n$ given the observations $\{y_1, \dots, y_n\}$.
Eg (A chaotic toy model)
The state $u \in \mathbb{R}^d$ satisfies the Lorenz-96 equations
$$\frac{du_i}{dt} = -u_{i-2} u_{i-1} + u_{i-1} u_{i+1} - u_i + F, \qquad i = 1, \dots, d,$$
with periodic BCs ($u_0 = u_d$). Model for turbulent ‘atmospheric variables’ in the midlatitudes.
We observe the nodes with odd index:
$$y(nh) = (u_1(nh), u_3(nh), \dots, u_d(nh)) + \text{noise}.$$
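As a rough illustration, here is a minimal Python sketch of the Lorenz-96 vector field with a crude explicit-Euler stand-in for the time-$h$ flow map $\Psi$; the dimension $d = 40$, forcing $F = 8$, step size, and noise level are illustrative choices, not values taken from the talk.

```python
# Minimal sketch: Lorenz-96 right-hand side and a forward-Euler step.
import numpy as np

def lorenz96_rhs(u, F=8.0):
    """du_i/dt = -u_{i-2} u_{i-1} + u_{i-1} u_{i+1} - u_i + F, periodic indices."""
    return (-np.roll(u, 2) * np.roll(u, 1)
            + np.roll(u, 1) * np.roll(u, -1)
            - u + F)

def step(u, dt=0.01, F=8.0):
    """One explicit Euler step (a crude stand-in for the time-h flow map Psi)."""
    return u + dt * lorenz96_rhs(u, F)

d = 40
u = 8.0 * np.ones(d)
u[0] += 0.01                       # small perturbation off the unstable fixed point
for _ in range(1000):
    u = step(u)

# Observe the odd-indexed nodes u_1, u_3, ... (Python indices 0, 2, ...) plus noise.
y = u[0::2] + 0.1 * np.random.randn(d // 2)
```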
The conditional distribution is updated via the filtering cycle.
Illustration (Initialization)
Figure: The blue circle represents our initial uncertainty $u_0 \sim \mu_0$.
Illustration (Forecast step)
Figure: Apply the time-$h$ flow map $\Psi$. This produces a new probability measure, which is our forecast estimate of $u_1$. This is called the forecast step.
Illustration (Make an observation)
Figure: We make an observation $y_1 = H u_1 + \xi_1$. In the picture, we only observe the $x$ variable.
Illustration (Analysis step)
Figure: Using Bayes' formula we compute the conditional distribution of $u_1 \mid y_1$. This new measure (called the posterior) is the new estimate of $u_1$. The uncertainty of the estimate is reduced by incorporating the observation. The forecast distribution steers the update from the observation.
Bayes' formula filtering update
Let $Y_n = \{y_1, \dots, y_n\}$. We want to compute the conditional density $P(u_{n+1} \mid Y_{n+1})$, using $P(u_n \mid Y_n)$ and $y_{n+1}$.
By Bayes' formula, we have
$$P(u_{n+1} \mid Y_{n+1}) = P(u_{n+1} \mid Y_n, y_{n+1}) \propto P(y_{n+1} \mid u_{n+1})\, P(u_{n+1} \mid Y_n).$$
But we need to compute the integral
$$P(u_{n+1} \mid Y_n) = \int P(u_{n+1} \mid Y_n, u_n)\, P(u_n \mid Y_n)\, du_n.$$
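To make the two-step structure concrete, here is a minimal sketch of this exact recursion for a scalar state discretized on a grid; the map $\psi$, noise variances, grid, and observation values are all illustrative assumptions, not quantities from the talk.

```python
# Minimal sketch: predict (integrate over u_n), then Bayes update (reweight, normalize).
import numpy as np

grid = np.linspace(-5, 5, 400)
du = grid[1] - grid[0]

def gauss(x, m, s2):
    return np.exp(-0.5 * (x - m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)

def predict(post, psi=lambda u: 0.9 * u, sigma2=0.1):
    """P(u_{n+1} | Y_n) = integral of P(u_{n+1} | u_n) P(u_n | Y_n) du_n."""
    K = gauss(grid[:, None], psi(grid)[None, :], sigma2)   # transition kernel on the grid
    return K @ post * du

def analyze(prior, y, gamma2=0.1):
    """P(u_{n+1} | Y_{n+1}) proportional to P(y_{n+1} | u_{n+1}) P(u_{n+1} | Y_n)."""
    post = gauss(y, grid, gamma2) * prior
    return post / (post.sum() * du)

post = gauss(grid, 0.0, 1.0)        # u_0 ~ N(0, 1)
for y in [0.3, -0.1, 0.5]:          # made-up observations
    post = analyze(predict(post), y)
```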
In general there are no closed formulas for the Bayesian densities. One typically approximates the densities with a sampling procedure.
Particle filters approximate the posterior empirically:
$$P(u_k \mid Y_k) \approx \frac{1}{N} \sum_{n=1}^{N} \delta\big(u_k - u_k^{(n)}\big).$$
The particles $\{u_k^{(n)}\}_{n=1}^{N}$ can be updated in different ways, giving rise to different particle filters.
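As a minimal illustration, an equally weighted ensemble approximates posterior moments by ensemble averages; everything below is a placeholder stand-in for $\{u_k^{(n)}\}$.

```python
# Minimal sketch: posterior mean/covariance from an equally weighted ensemble.
import numpy as np

N, d = 100, 2
particles = np.random.randn(N, d)           # stand-in for {u_k^(n)}, n = 1..N

post_mean = particles.mean(axis=0)          # approximates E(u_k | Y_k)
post_cov = np.cov(particles, rowvar=False)  # approximates Cov(u_k | Y_k)
```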
Model Assumption
We always assume a conditionally Gaussian model
$$u_{k+1} = \psi(u_k) + \eta_k, \qquad \eta_k \sim N(0, \Sigma) \text{ i.i.d.},$$
where $\psi$ is deterministic, with observations
$$y_{k+1} = H u_{k+1} + \xi_{k+1}, \qquad \xi_k \sim N(0, \Gamma) \text{ i.i.d.}$$
This facilitates the implementation and theory of particle filters and is a realistic assumption for many practical problems.
We denote the posterior, with density $P(u_k \mid Y_k)$, by $\mu_k$ and denote the particle filter approximations by $\mu_k^N$.
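A minimal sketch of simulating this signal/observation model; the particular $\psi$, $H$, $\Sigma$, $\Gamma$ and sizes below are placeholder choices, not values from the talk.

```python
# Minimal sketch: u_{k+1} = psi(u_k) + eta_k,  y_{k+1} = H u_{k+1} + xi_{k+1}.
import numpy as np

rng = np.random.default_rng(1)
d, p, steps = 4, 2, 50
psi = lambda u: np.tanh(u)                 # some deterministic map (illustrative)
H = np.eye(p, d)                           # observe the first p coordinates
Sigma, Gamma = 0.1 * np.eye(d), 0.05 * np.eye(p)

u = rng.standard_normal(d)                 # u_0 ~ mu_0 (here a standard normal)
us, ys = [], []
for _ in range(steps):
    u = psi(u) + rng.multivariate_normal(np.zeros(d), Sigma)
    ys.append(H @ u + rng.multivariate_normal(np.zeros(p), Gamma))
    us.append(u)
```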
The motivation: importance sampling
If $p(x)$ is a probability density, the empirical approximation of $p$ is given by
$$p(x) \approx \frac{1}{N} \sum_{n=1}^{N} \delta\big(x - x^{(n)}\big),$$
where the $x^{(n)}$ are samples from $p$. When $p$ is difficult to sample from, we can instead use the importance sampling approximation
$$p(x) \approx Z_N^{-1} \sum_{n=1}^{N} \frac{p(\hat{x}^{(n)})}{q(\hat{x}^{(n)})}\, \delta\big(x - \hat{x}^{(n)}\big),$$
where the $\hat{x}^{(n)}$ are samples from a different probability density $q$ and $Z_N$ is the normalization constant
$$Z_N = \sum_{n=1}^{N} \frac{p(\hat{x}^{(n)})}{q(\hat{x}^{(n)})}.$$
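A minimal sketch of self-normalized importance sampling; the target $p$ (standard normal) and proposal $q$ (a wider normal) are illustrative choices.

```python
# Minimal sketch: estimate an expectation under p using samples from q.
import numpy as np

rng = np.random.default_rng(2)
p = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)        # target density
q = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))  # proposal N(0, 4)

x = 2.0 * rng.standard_normal(10_000)    # samples from q
w = p(x) / q(x)
w /= w.sum()                             # self-normalization by Z_N
est_mean = np.sum(w * x)                 # approximates E_p[X] = 0
```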
The motivation: standard particle filter
We have samples $\{u_k^{(n)}\}_{n=1}^{N}$ from $P(u_k \mid Y_k)$ that we wish to update into samples from $P(u_{k+1} \mid Y_{k+1})$.
Note that $u_k \mid Y_k$ is a Markov chain with kernel
$$p_k(u_k, du_{k+1}) = Z^{-1}\, P(y_{k+1} \mid u_{k+1})\, P(u_{k+1} \mid u_k)\, du_{k+1}.$$
If we could draw $u_{k+1}^{(n)}$ from $p_k(u_k^{(n)}, du_{k+1})$ then we would have $u_{k+1}^{(n)} \sim P(u_{k+1} \mid Y_{k+1})$.
The motivation: standard particle filter
It is too difficult to sample directly, so we instead draw $\hat{u}_{k+1}^{(n)}$ from $q(u_{k+1}) = P(u_{k+1} \mid u_k^{(n)})$ and get the importance sampling approximation
$$P(u_{k+1} \mid Y_{k+1}) \approx Z_N^{-1} \sum_{n=1}^{N} P(y_{k+1} \mid \hat{u}_{k+1}^{(n)})\, \delta\big(u_{k+1} - \hat{u}_{k+1}^{(n)}\big).$$
Thus the weights in $\sum_{n=1}^{N} w_{k+1}^{(n)} \delta(u_{k+1} - \hat{u}_{k+1}^{(n)})$ are computed by
$$w_{k+1}^{(n),*} = P(y_{k+1} \mid \hat{u}_{k+1}^{(n)}) \propto \exp\Big(-\tfrac{1}{2}\, \big|y_{k+1} - H \hat{u}_{k+1}^{(n)}\big|_\Gamma^2\Big), \qquad w_{k+1}^{(n)} = \frac{w_{k+1}^{(n),*}}{\sum_{n=1}^{N} w_{k+1}^{(n),*}}.$$
Notation: $|v|_A^2 = \langle A^{-1} v, v \rangle$.
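A minimal sketch of one step of this standard particle filter in the conditionally Gaussian model above: propose from $P(u_{k+1} \mid u_k^{(n)})$ and weight by the likelihood. The helper name standard_pf_step and its arguments are assumptions for illustration, following the earlier sketches.

```python
# Minimal sketch: standard (bootstrap) proposal plus likelihood weights.
import numpy as np

def standard_pf_step(particles, y, psi_flow, H, Sigma, Gamma, rng):
    """Propose psi(u) + eta, eta ~ N(0, Sigma); weight by exp(-0.5 |y - Hu|^2_Gamma)."""
    d = particles.shape[1]
    proposed = np.array([psi_flow(u) for u in particles]) \
        + rng.multivariate_normal(np.zeros(d), Sigma, len(particles))
    innov = y - proposed @ H.T
    logw = -0.5 * np.einsum('ni,ij,nj->n', innov, np.linalg.inv(Gamma), innov)
    w = np.exp(logw - logw.max())
    return proposed, w / w.sum()
```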
A different approach
Another approach:
$$p_k(u_k^{(n)}, du_{k+1}) \propto P(y_{k+1} \mid u_{k+1})\, P(u_{k+1} \mid u_k^{(n)})$$
$$= Z_\Gamma^{-1} \exp\Big(-\tfrac{1}{2}\, |y_{k+1} - H u_{k+1}|_\Gamma^2\Big)\, Z_\Sigma^{-1} \exp\Big(-\tfrac{1}{2}\, \big|u_{k+1} - \psi(u_k^{(n)})\big|_\Sigma^2\Big)$$
$$= Z_S^{-1} \exp\Big(-\tfrac{1}{2}\, \big|y_{k+1} - H \psi(u_k^{(n)})\big|_S^2\Big)\, Z_C^{-1} \exp\Big(-\tfrac{1}{2}\, \big|u_{k+1} - m_{k+1}^{(n)}\big|_C^2\Big)$$
by the product of Gaussian densities formulae, where
$$S = H \Sigma H^T + \Gamma, \qquad C^{-1} = \Sigma^{-1} + H^T \Gamma^{-1} H,$$
$$m_{k+1}^{(n)} = C\big(\Sigma^{-1} \psi(u_k^{(n)}) + H^T \Gamma^{-1} y_{k+1}\big) = (I - KH)\, \psi(u_k^{(n)}) + K y_{k+1},$$
with gain $K = C H^T \Gamma^{-1}$.
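A quick numerical check (not from the talk) of this Gaussian product identity, using randomly chosen SPD matrices $\Sigma$, $\Gamma$ and a random $H$:

```python
# Minimal check: N(y; Hu, Gamma) N(u; psi, Sigma) = N(y; H psi, S) N(u; m, C),
# with S = H Sigma H^T + Gamma, C^{-1} = Sigma^{-1} + H^T Gamma^{-1} H,
# K = C H^T Gamma^{-1}, m = (I - KH) psi + K y.
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(0)
d, p = 3, 2
A = rng.standard_normal((d, d)); Sigma = A @ A.T + np.eye(d)
B = rng.standard_normal((p, p)); Gamma = B @ B.T + np.eye(p)
H = rng.standard_normal((p, d))
psi, u, y = rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal(p)

S = H @ Sigma @ H.T + Gamma
C = np.linalg.inv(np.linalg.inv(Sigma) + H.T @ np.linalg.inv(Gamma) @ H)
K = C @ H.T @ np.linalg.inv(Gamma)
m = (np.eye(d) - K @ H) @ psi + K @ y

lhs = mvn.pdf(y, H @ u, Gamma) * mvn.pdf(u, psi, Sigma)
rhs = mvn.pdf(y, H @ psi, S) * mvn.pdf(u, m, C)
assert np.isclose(lhs, rhs)
```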
If $q(u_{k+1}) = Z_C^{-1} \exp\big(-\tfrac{1}{2}\, |u_{k+1} - m_{k+1}^{(n)}|_C^2\big)$ then the importance sampling approximation is given by
$$P(u_{k+1} \mid Y_{k+1}) \approx Z_N^{-1} \sum_{n=1}^{N} \exp\Big(-\tfrac{1}{2}\, \big|y_{k+1} - H \psi(u_k^{(n)})\big|_S^2\Big)\, \delta\big(u_{k+1} - \hat{u}_{k+1}^{(n)}\big),$$
where the $\hat{u}_{k+1}^{(n)}$ are sampled from $q$, i.e.
$$\hat{u}_{k+1}^{(n)} = m_{k+1}^{(n)} + \zeta_{k+1}^{(n)} = (I - KH)\, \psi(u_k^{(n)}) + K y_{k+1} + \zeta_{k+1}^{(n)},$$
where $\zeta_{k+1}^{(n)} \sim N(0, C)$. The weights are computed by
$$w_{k+1}^{(n),*} = \exp\Big(-\tfrac{1}{2}\, \big|y_{k+1} - H \psi(u_k^{(n)})\big|_S^2\Big), \qquad w_{k+1}^{(n)} = \frac{w_{k+1}^{(n),*}}{\sum_{n=1}^{N} w_{k+1}^{(n),*}}.$$
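A minimal sketch of this optimal-proposal update for a whole ensemble, before resampling; the helper name optimal_proposal_step, the generic psi_flow argument, and the matrix handling are illustrative assumptions, following the earlier sketches.

```python
# Minimal sketch: move each particle to (I-KH) psi(u) + K y + zeta, zeta ~ N(0, C),
# and weight it by exp(-0.5 |y - H psi(u)|^2_S).
import numpy as np

def optimal_proposal_step(particles, y, psi_flow, H, Sigma, Gamma, rng):
    d = particles.shape[1]
    S = H @ Sigma @ H.T + Gamma
    C = np.linalg.inv(np.linalg.inv(Sigma) + H.T @ np.linalg.inv(Gamma) @ H)
    K = C @ H.T @ np.linalg.inv(Gamma)

    pushed = np.array([psi_flow(u) for u in particles])       # psi(u_k^(n))
    means = pushed @ (np.eye(d) - K @ H).T + y @ K.T           # m_{k+1}^(n)
    new_particles = means + rng.multivariate_normal(np.zeros(d), C, len(particles))

    innov = y - pushed @ H.T                                   # y_{k+1} - H psi(u_k^(n))
    logw = -0.5 * np.einsum('ni,ij,nj->n', innov, np.linalg.inv(S), innov)
    w = np.exp(logw - logw.max())
    return new_particles, w / w.sum()
```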
The optimal particle filter
Propagate the particles
$$\hat{u}_{k+1}^{(n)} = (I - KH)\, \psi(u_k^{(n)}) + K y_{k+1} + \zeta_{k+1}^{(n)}, \qquad \zeta_{k+1}^{(n)} \sim N(0, C) \text{ i.i.d.}$$
Weight the particles using the observation
$$w_{k+1}^{(n),*} = \exp\Big(-\tfrac{1}{2}\, \big|y_{k+1} - H \psi(u_k^{(n)})\big|_S^2\Big), \qquad w_{k+1}^{(n)} = \frac{w_{k+1}^{(n),*}}{\sum_{n=1}^{N} w_{k+1}^{(n),*}}.$$
Resample the weighted particles
$$P(u_{k+1} \mid Y_{k+1}) \approx \frac{1}{N} \sum_{n=1}^{N} \delta\big(u_{k+1} - u_{k+1}^{(n)}\big).$$
The optimal particle filter
We represent the optimal particle filter as a random dynamical system:
$$\hat{u}_{k+1}^{(n)} = (I - KH)\, \psi(u_k^{(n)}) + K y_{k+1} + \zeta_{k+1}^{(n)}, \qquad \zeta_{k+1}^{(n)} \sim N(0, C) \text{ i.i.d.},$$
$$u_{k+1}^{(n)} = \sum_{m=1}^{N} \mathbf{1}_{\big[x_{k+1}^{(m)},\, x_{k+1}^{(m+1)}\big)}\big(r_{k+1}^{(n)}\big)\, \hat{u}_{k+1}^{(m)},$$
where $r_{k+1}^{(n)}$ is uniformly distributed on $[0, 1]$ and $x_{k+1}^{(m+1)} = x_{k+1}^{(m)} + w_{k+1}^{(m)}$, i.e. pick $\hat{u}_{k+1}^{(m)}$ with probability $w_{k+1}^{(m)}$.
Note that $U_k = (u_k^{(1)}, \dots, u_k^{(N)})$ is a Markov chain.
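The resampling step in the random dynamical system above can be written compactly with a multinomial draw; this numpy sketch is equivalent to the uniform-$r$, cumulative-weight construction (the function name is an illustrative choice).

```python
# Minimal sketch: multinomial resampling of a weighted ensemble.
import numpy as np

def resample(particles, weights, rng):
    """Draw N indices with probabilities w^(m) and return the equally weighted ensemble."""
    N = len(particles)
    idx = rng.choice(N, size=N, p=weights)
    return particles[idx]
```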
What do we know about particle filters?
Theory for filtering distributions
The true posterior (filtering distribution) $\mu_k$ is known to be:

Conditionally ergodic: if $\mu_k'$, $\mu_k''$ are two copies of the filtering distribution with $\mu_0' = \delta_{u_0'}$ and $\mu_0'' = \delta_{u_0''}$, then
$$d_{TV}(\mu_k', \mu_k'') = O(\delta^k) \quad \text{as } k \to \infty,$$
where $\delta \in (0, 1)$. This is a type of stability.

And accurate (with sufficient assumptions on observations):
$$\limsup_{k \to \infty}\; E\, \|m_k - u_k\|^2 = O(\text{obs noise}),$$
where $u_k$ is the trajectory producing $Y_k$, $m_k = E(u_k \mid Y_k)$, and we take $E$ over all randomness.