Ergodicity and Accuracy in Optimal Particle Filters


  1. Ergodicity and Accuracy in Optimal Particle Filters. David Kelly and Andrew Stuart, Courant Institute, New York University, New York, NY. www.dtbkelly.com. December 9, 2016. Math colloquium, CMU.

  2. What is data assimilation?

  3. What is data assimilation? Suppose $u$ is an evolving state satisfying known dynamics
$$\frac{du(t)}{dt} = F(u(t)) + \dot{W} \quad \text{(e.g.)}$$
with some unknown initial condition $u_0 \sim \mu_0$. There is a true trajectory of $u$ that is producing partial, noisy observations at times $t = h, 2h, \dots, nh$:
$$y_n = H u_n + \xi_n,$$
where $H$ is a linear operator (think low-rank projection), $u_n = u(nh)$, and $\xi_n \sim N(0, \Gamma)$ i.i.d. The aim of data assimilation is to characterize the conditional distribution of $u_n$ given the observations $\{y_1, \dots, y_n\}$.

  4. E.g. (a chaotic toy model). The state $u \in \mathbb{R}^d$ satisfies the Lorenz-96 equations
$$\frac{du_i}{dt} = -u_{i-2} u_{i-1} + u_{i-1} u_{i+1} - u_i + F, \qquad i = 1, \dots, d,$$
with periodic BCs ($u_0 = u_d$). Model for turbulent 'atmospheric variables' in the midlatitudes. We observe the nodes with odd index:
$$y(nh) = (u_1(nh), u_3(nh), \dots, u_d(nh)) + \text{noise}.$$
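As a concrete illustration (not from the slides), here is a minimal Python sketch that integrates Lorenz-96 and generates noisy observations of the odd-indexed nodes. The forcing $F = 8$, dimension $d = 40$, observation interval $h = 0.1$, observation noise level, and the RK4 integrator are all illustrative assumptions.

```python
import numpy as np

def lorenz96(u, F=8.0):
    """Right-hand side of Lorenz-96: du_i/dt = (u_{i+1} - u_{i-2}) u_{i-1} - u_i + F."""
    return (np.roll(u, -1) - np.roll(u, 2)) * np.roll(u, 1) - u + F

def rk4_step(u, dt, F=8.0):
    """One fourth-order Runge-Kutta step of length dt."""
    k1 = lorenz96(u, F)
    k2 = lorenz96(u + 0.5 * dt * k1, F)
    k3 = lorenz96(u + 0.5 * dt * k2, F)
    k4 = lorenz96(u + dt * k3, F)
    return u + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# Generate a true trajectory and noisy observations of the odd-indexed nodes
# at times t = h, 2h, ..., n*h (all numerical values here are illustrative).
d, h, n_obs, obs_std = 40, 0.1, 100, 1.0
rng = np.random.default_rng(0)
u = rng.standard_normal(d)                  # u_0 ~ mu_0 (here a standard Gaussian)
dt = 0.01
steps_per_obs = round(h / dt)
obs_idx = np.arange(0, d, 2)                # Python indices 0,2,4,... = nodes u_1, u_3, ...
ys = []
for _ in range(n_obs):
    for _ in range(steps_per_obs):
        u = rk4_step(u, dt)
    ys.append(u[obs_idx] + obs_std * rng.standard_normal(obs_idx.size))
```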

  5. The conditional distribution is updated via the filtering cycle.

  6. Illustration (Initialization). Figure: The blue circle represents our initial uncertainty $u_0 \sim \mu_0$.

  7. Illustration (Forecast step). Figure: Apply the time-$h$ flow map $\Psi$. This produces a new probability measure, which is our forecasted estimate of $u_1$. This is called the forecast step.

  8. Illustration (Make an observation). Figure: We make an observation $y_1 = H u_1 + \xi_1$. In the picture, we only observe the $x$ variable.

  9. Illustration (Analysis step). Figure: Using Bayes' formula we compute the conditional distribution of $u_1 \mid y_1$. This new measure (called the posterior) is the new estimate of $u_1$. The uncertainty of the estimate is reduced by incorporating the observation. The forecast distribution steers the update from the observation.

  10. Bayes' formula filtering update. Let $Y_n = \{y_1, \dots, y_n\}$. We want to compute the conditional density $P(u_{n+1} \mid Y_{n+1})$, using $P(u_n \mid Y_n)$ and $y_{n+1}$. By Bayes' formula, we have
$$P(u_{n+1} \mid Y_{n+1}) = P(u_{n+1} \mid Y_n, y_{n+1}) \propto P(y_{n+1} \mid u_{n+1}) \, P(u_{n+1} \mid Y_n).$$
But we need to compute the integral
$$P(u_{n+1} \mid Y_n) = \int P(u_{n+1} \mid Y_n, u_n) \, P(u_n \mid Y_n) \, du_n.$$
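To make the recursion concrete, here is a toy one-dimensional sketch (not from the talk) that evaluates the prediction integral and then the Bayes update on a grid. The scalar dynamics $\psi(u) = 0.9u$, the noise variances, the observation value, and the grid are hypothetical choices used only for illustration.

```python
import numpy as np

# Toy 1-D illustration of one filtering cycle on a grid: the prediction integral
# P(u_{k+1} | Y_k) = \int P(u_{k+1} | u_k) P(u_k | Y_k) du_k, followed by the
# Bayes update P(u_{k+1} | Y_{k+1}) \propto P(y_{k+1} | u_{k+1}) P(u_{k+1} | Y_k).
grid = np.linspace(-10.0, 10.0, 2001)
du = grid[1] - grid[0]

def gauss(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

prior = gauss(grid, 0.0, 4.0)               # current filtering density P(u_k | Y_k)
psi = lambda u: 0.9 * u                      # placeholder deterministic dynamics
sigma2, gamma2, y_next = 1.0, 0.5, 1.3       # model noise, obs noise, next observation

# Prediction step: integrate the transition density against the current filter.
transition = gauss(grid[:, None], psi(grid)[None, :], sigma2)   # P(u' | u) on grid x grid
forecast = transition @ prior * du                              # P(u_{k+1} | Y_k)

# Analysis step: multiply by the likelihood (here H = identity) and renormalise.
posterior = gauss(y_next, grid, gamma2) * forecast
posterior /= posterior.sum() * du
```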

  11. In general there are no closed formulas for the Bayesian densities. One typically approximates the densities with a sampling procedure.

  12. Particle filters approximate the posterior empirically:
$$P(u_k \mid Y_k) \approx \frac{1}{N} \sum_{n=1}^N \delta\big(u_k - u_k^{(n)}\big).$$
The particles $\{u_k^{(n)}\}_{n=1}^N$ can be updated in different ways, giving rise to different particle filters.

  13. Model Assumption. We always assume a conditionally Gaussian model
$$u_{k+1} = \psi(u_k) + \eta_k, \qquad \eta_k \sim N(0, \Sigma) \text{ i.i.d.},$$
where $\psi$ is deterministic, with observations
$$y_{k+1} = H u_{k+1} + \xi_{k+1}, \qquad \xi_{k+1} \sim N(0, \Gamma) \text{ i.i.d.}$$
This facilitates the implementation and theory for particle filters and is a realistic assumption for many practical problems. We denote the posterior, with density $P(u_k \mid Y_k)$, by $\mu_k$ and denote the particle filter approximations by $\mu_k^N$.
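A minimal sketch of simulating this conditionally Gaussian model; the particular $\psi$ (a planar rotation), $H$, $\Sigma$, $\Gamma$, and dimensions below are placeholder choices, not from the talk.

```python
import numpy as np

# Simulate the assumed conditionally Gaussian model
#   u_{k+1} = psi(u_k) + eta_k,      eta_k ~ N(0, Sigma) i.i.d.,
#   y_{k+1} = H u_{k+1} + xi_{k+1},  xi_{k+1} ~ N(0, Gamma) i.i.d.
rng = np.random.default_rng(1)
d, K = 2, 50
theta = 0.5
psi = lambda u: np.array([u[0] * np.cos(theta) - u[1] * np.sin(theta),
                          u[0] * np.sin(theta) + u[1] * np.cos(theta)])  # placeholder dynamics
H = np.array([[1.0, 0.0]])                  # observe the first coordinate only
Sigma = 0.1 * np.eye(d)
Gamma = 0.2 * np.eye(1)

u = rng.standard_normal(d)                  # u_0 ~ mu_0
us, ys = [], []
for _ in range(K):
    u = psi(u) + rng.multivariate_normal(np.zeros(d), Sigma)   # state update
    y = H @ u + rng.multivariate_normal(np.zeros(1), Gamma)    # noisy observation
    us.append(u)
    ys.append(y)
```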

  14. The motivation: importance sampling. If $p(x)$ is a probability density, the empirical approximation of $p$ is given by
$$p(x) \approx \frac{1}{N} \sum_{n=1}^N \delta\big(x - x^{(n)}\big),$$
where the $x^{(n)}$ are samples from $p$. When $p$ is difficult to sample from, we can instead use the importance sampling approximation
$$p(x) \approx Z_N^{-1} \sum_{n=1}^N \frac{p(\tilde{x}^{(n)})}{q(\tilde{x}^{(n)})} \, \delta\big(x - \tilde{x}^{(n)}\big),$$
where the $\tilde{x}^{(n)}$ are samples from a different probability density $q$ and $Z_N$ is the normalization constant $Z_N = \sum_{n=1}^N p(\tilde{x}^{(n)}) / q(\tilde{x}^{(n)})$.
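A short sketch of the self-normalized importance sampling estimator described above; the target $p$ (a two-component Gaussian mixture) and the proposal $q = N(0, 16)$ are hypothetical examples, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

p = lambda x: 0.5 * normal_pdf(x, 2.0, 1.0) + 0.5 * normal_pdf(x, -2.0, 1.0)  # target density
q = lambda x: normal_pdf(x, 0.0, 16.0)                                        # proposal density

x = 4.0 * rng.standard_normal(N)     # samples x~^(n) drawn from the proposal N(0, 16)
w_star = p(x) / q(x)                 # unnormalised weights p(x~)/q(x~)
w = w_star / w_star.sum()            # normalised weights; Z_N is the sum of w_star

# The weighted sample {(x~^(n), w^(n))} approximates p; e.g. estimate E_p[x^2]:
second_moment = np.sum(w * x ** 2)   # should be close to 1 + 4 = 5 for this mixture
```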

  15. The motivation: standard particle filter. We have samples $\{u_k^{(n)}\}_{n=1}^N$ from $P(u_k \mid Y_k)$ that we wish to update into samples from $P(u_{k+1} \mid Y_{k+1})$. Note that $u_k \mid Y_k$ is a Markov chain with kernel
$$p_k(u_k, du_{k+1}) = Z^{-1} P(y_{k+1} \mid u_{k+1}) \, P(u_{k+1} \mid u_k).$$
If we could draw $u_{k+1}^{(n)}$ from $p_k(u_k^{(n)}, du_{k+1})$, then we would have $u_{k+1}^{(n)} \sim P(u_{k+1} \mid Y_{k+1})$.

  16. The motivation: standard particle filter. It is too difficult to sample directly, so we instead draw $\tilde{u}_{k+1}^{(n)}$ from $q(u_{k+1}) = P(u_{k+1} \mid u_k^{(n)})$ and get the importance sampling approximation
$$P(u_{k+1} \mid Y_{k+1}) \approx Z_N^{-1} \sum_{n=1}^N P\big(y_{k+1} \mid \tilde{u}_{k+1}^{(n)}\big) \, \delta\big(u_{k+1} - \tilde{u}_{k+1}^{(n)}\big).$$
Thus the weights in $\sum_{n=1}^N w_{k+1}^{(n)} \delta(u_{k+1} - u_{k+1}^{(n)})$ are computed by
$$w_{k+1}^{(n),*} = P\big(y_{k+1} \mid \tilde{u}_{k+1}^{(n)}\big) \propto \exp\Big(-\tfrac{1}{2}\big|y_{k+1} - H \tilde{u}_{k+1}^{(n)}\big|_\Gamma^2\Big), \qquad w_{k+1}^{(n)} = \frac{w_{k+1}^{(n),*}}{\sum_{n=1}^N w_{k+1}^{(n),*}}.$$
Notation: $|v|_A^2 = \langle A^{-1} v, v \rangle$.
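One assimilation step of this standard ('bootstrap') particle filter under the conditionally Gaussian model of slide 13, written as a sketch; the function name, its inputs ($\psi$, $H$, $\Sigma$, $\Gamma$ and the placeholder model in the usage lines), and the choice of multinomial resampling are illustrative assumptions.

```python
import numpy as np

def standard_pf_step(particles, y_next, psi, H, Sigma, Gamma, rng):
    """One step of the standard particle filter: propagate with the model,
    weight by the likelihood exp(-0.5 |y - H u|^2_Gamma), then resample."""
    N, d = particles.shape
    # Proposal q = P(u_{k+1} | u_k^(n)): push each particle through the dynamics, add noise.
    forecast = np.array([psi(u) for u in particles]) \
        + rng.multivariate_normal(np.zeros(d), Sigma, size=N)
    # Unnormalised weights w* = exp(-0.5 |y - H u~|^2_Gamma).
    innov = y_next[None, :] - forecast @ H.T
    Gamma_inv = np.linalg.inv(Gamma)
    w_star = np.exp(-0.5 * np.einsum('ni,ij,nj->n', innov, Gamma_inv, innov))
    w = w_star / w_star.sum()
    # Multinomial resampling: keep particle m with probability w^(m).
    idx = rng.choice(N, size=N, p=w)
    return forecast[idx]

# Illustrative usage with placeholder model ingredients:
rng = np.random.default_rng(0)
d = 2
psi = lambda u: 0.95 * u
H = np.array([[1.0, 0.0]])
Sigma, Gamma = 0.1 * np.eye(d), 0.2 * np.eye(1)
particles = rng.standard_normal((100, d))
particles = standard_pf_step(particles, np.array([0.5]), psi, H, Sigma, Gamma, rng)
```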

  17. A different approach. Another approach:
$$p_k(u_k^{(n)}, du_{k+1}) \propto P(y_{k+1} \mid u_{k+1}) \, P(u_{k+1} \mid u_k^{(n)})$$
$$= Z_\Gamma^{-1} \exp\Big(-\tfrac{1}{2}\big|y_{k+1} - H u_{k+1}\big|_\Gamma^2\Big) \, Z_\Sigma^{-1} \exp\Big(-\tfrac{1}{2}\big|u_{k+1} - \psi(u_k^{(n)})\big|_\Sigma^2\Big)$$
$$= Z_S^{-1} \exp\Big(-\tfrac{1}{2}\big|y_{k+1} - H \psi(u_k^{(n)})\big|_S^2\Big) \, Z_C^{-1} \exp\Big(-\tfrac{1}{2}\big|u_{k+1} - m_{k+1}^{(n)}\big|_C^2\Big)$$
by the product of Gaussian densities formulae, with
$$C^{-1} = \Sigma^{-1} + H^T \Gamma^{-1} H, \qquad S = H \Sigma H^T + \Gamma,$$
$$m_{k+1}^{(n)} = C\big(\Sigma^{-1} \psi(u_k^{(n)}) + H^T \Gamma^{-1} y_{k+1}\big) = (I - KH)\,\psi(u_k^{(n)}) + K y_{k+1},$$
where $K = \Sigma H^T S^{-1}$ is the Kalman gain.
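A quick numerical sanity check (not from the talk) that the two factorizations above agree pointwise, using SciPy's multivariate normal density; the matrices, vectors, and dimensions are arbitrary test data.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Check that N(y; H u', Gamma) N(u'; psi_u, Sigma) = N(y; H psi_u, S) N(u'; m, C)
# with C^{-1} = Sigma^{-1} + H^T Gamma^{-1} H, S = H Sigma H^T + Gamma,
# and m = C (Sigma^{-1} psi_u + H^T Gamma^{-1} y).
rng = np.random.default_rng(3)
d, p = 3, 2
H = rng.standard_normal((p, d))
Sigma = np.eye(d)
Gamma = 0.5 * np.eye(p)
psi_u = rng.standard_normal(d)              # psi(u_k^(n)) for some particle
y = rng.standard_normal(p)                  # y_{k+1}
u_next = rng.standard_normal(d)             # an arbitrary test point u_{k+1}

C = np.linalg.inv(np.linalg.inv(Sigma) + H.T @ np.linalg.inv(Gamma) @ H)
S = H @ Sigma @ H.T + Gamma
m = C @ (np.linalg.inv(Sigma) @ psi_u + H.T @ np.linalg.inv(Gamma) @ y)

lhs = mvn.pdf(y, mean=H @ psi_u, cov=S) * mvn.pdf(u_next, mean=m, cov=C)
rhs = mvn.pdf(y, mean=H @ u_next, cov=Gamma) * mvn.pdf(u_next, mean=psi_u, cov=Sigma)
assert np.isclose(lhs, rhs)
```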

  18. If $q(u_{k+1}) = Z_C^{-1} \exp\big(-\tfrac{1}{2}|u_{k+1} - m_{k+1}^{(n)}|_C^2\big)$, then the importance sampling approximation is given by
$$P(u_{k+1} \mid Y_{k+1}) \approx Z_N^{-1} \sum_{n=1}^N \exp\Big(-\tfrac{1}{2}\big|y_{k+1} - H\psi(u_k^{(n)})\big|_S^2\Big) \, \delta\big(u_{k+1} - \tilde{u}_{k+1}^{(n)}\big),$$
where the $\tilde{u}_{k+1}^{(n)}$ are sampled from $q$, i.e.
$$\tilde{u}_{k+1}^{(n)} = m_{k+1}^{(n)} + \zeta_{k+1}^{(n)} = (I - KH)\,\psi(u_k^{(n)}) + K y_{k+1} + \zeta_{k+1}^{(n)},$$
where $\zeta_{k+1}^{(n)} \sim N(0, C)$. The weights are computed by
$$w_{k+1}^{(n),*} = \exp\Big(-\tfrac{1}{2}\big|y_{k+1} - H\psi(u_k^{(n)})\big|_S^2\Big), \qquad w_{k+1}^{(n)} = \frac{w_{k+1}^{(n),*}}{\sum_{n=1}^N w_{k+1}^{(n),*}}.$$
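A sketch of the proposal-and-weighting step that these formulas define: compute $S$, $C$, and the gain $K$, move each particle through $(I - KH)\psi(u) + Ky + \zeta$, and weight by $\exp(-\tfrac{1}{2}|y - H\psi(u)|_S^2)$. The helper name and its inputs are illustrative assumptions; composed with a resampling step (sketched after slide 23), this gives one full update.

```python
import numpy as np

def optimal_proposal_step(particles, y_next, psi, H, Sigma, Gamma, rng):
    """Propagate particles with the optimal proposal and return them together
    with their normalised weights (resampling is done separately)."""
    N, d = particles.shape
    S = H @ Sigma @ H.T + Gamma
    C = np.linalg.inv(np.linalg.inv(Sigma) + H.T @ np.linalg.inv(Gamma) @ H)
    K = Sigma @ H.T @ np.linalg.inv(S)          # Kalman gain, K = Sigma H^T S^{-1}
    psi_u = np.array([psi(u) for u in particles])
    # Sample u~_{k+1}^(n) = (I - K H) psi(u_k^(n)) + K y_{k+1} + zeta,  zeta ~ N(0, C).
    mean = psi_u @ (np.eye(d) - K @ H).T + y_next @ K.T
    proposals = mean + rng.multivariate_normal(np.zeros(d), C, size=N)
    # Weights w* = exp(-0.5 |y_{k+1} - H psi(u_k^(n))|^2_S); note they depend only
    # on the previous particle positions, not on the sampled noise.
    innov = y_next[None, :] - psi_u @ H.T
    S_inv = np.linalg.inv(S)
    w_star = np.exp(-0.5 * np.einsum('ni,ij,nj->n', innov, S_inv, innov))
    return proposals, w_star / w_star.sum()
```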

  19. The optimal particle filter

  20. Propagate the particles.
$$\tilde{u}_{k+1}^{(n)} = (I - KH)\,\psi(u_k^{(n)}) + K y_{k+1} + \zeta_{k+1}^{(n)}, \qquad \zeta_{k+1}^{(n)} \sim N(0, C) \text{ i.i.d.}$$

  21. Weight the particles using the observation.
$$w_{k+1}^{(n),*} = \exp\Big(-\tfrac{1}{2}\big|y_{k+1} - H\psi(u_k^{(n)})\big|_S^2\Big), \qquad w_{k+1}^{(n)} = \frac{w_{k+1}^{(n),*}}{\sum_{n=1}^N w_{k+1}^{(n),*}}.$$

  22. Resample the weighted particles.
$$P(u_{k+1} \mid Y_{k+1}) \approx \frac{1}{N} \sum_{n=1}^N \delta\big(u_{k+1} - u_{k+1}^{(n)}\big).$$

  23. The optimal particle filter. We represent the optimal particle filter as a random dynamical system:
$$\tilde{u}_{k+1}^{(n)} = (I - KH)\,\psi(u_k^{(n)}) + K y_{k+1} + \zeta_{k+1}^{(n)}, \qquad \zeta_{k+1}^{(n)} \sim N(0, C) \text{ i.i.d.},$$
$$u_{k+1}^{(n)} = \sum_{m=1}^N \mathbf{1}_{[x_{k+1}^{(m)},\, x_{k+1}^{(m+1)})}\big(r_{k+1}^{(n)}\big) \, \tilde{u}_{k+1}^{(m)},$$
where $r_{k+1}^{(n)}$ is uniformly distributed on $[0, 1]$ and $x_{k+1}^{(1)} = 0$, $x_{k+1}^{(m+1)} = x_{k+1}^{(m)} + w_{k+1}^{(m)}$, i.e. pick $\tilde{u}_{k+1}^{(m)}$ with probability $w_{k+1}^{(m)}$. Note that $U_k = (u_k^{(1)}, \dots, u_k^{(N)})$ is a Markov chain.
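A sketch of the resampling map written as on this slide (cumulative weights plus i.i.d. uniform draws); the helper name and the dummy weighted proposals at the end are illustrative, and in practice the proposals and weights would come from an optimal-proposal step like the one sketched after slide 18.

```python
import numpy as np

def opf_resample(proposals, weights, rng):
    """Resampling step of the optimal particle filter:
    u_{k+1}^(n) = sum_m 1_{[x^(m), x^(m+1))}(r^(n)) u~_{k+1}^(m), where
    x^(1) = 0, x^(m+1) = x^(m) + w^(m), and r^(n) ~ Uniform[0, 1] i.i.d."""
    N = proposals.shape[0]
    x = np.concatenate(([0.0], np.cumsum(weights)))    # cumulative weights x^(1), ..., x^(N+1)
    r = rng.uniform(size=N)                             # r^(n) ~ U[0, 1]
    # Index m with x^(m) <= r^(n) < x^(m+1); the min guards against round-off at the right end.
    idx = np.minimum(np.searchsorted(x, r, side='right') - 1, N - 1)
    return proposals[idx]

# Illustrative usage with hypothetical weighted proposals:
rng = np.random.default_rng(4)
proposals = rng.standard_normal((100, 2))
w_star = rng.uniform(size=100)
weights = w_star / w_star.sum()
particles_next = opf_resample(proposals, weights, rng)
```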

  24. What do we know about particle filters?

  25. Theory for filtering distributions. The true posterior (filtering distribution) $\mu_k$ is known to be:
Conditionally ergodic: if $\mu_k'$, $\mu_k''$ are two copies of the filtering distribution with $\mu_0' = \delta_{u_0'}$ and $\mu_0'' = \delta_{u_0''}$, then $d_{TV}(\mu_k', \mu_k'') = O(\delta^k)$ as $k \to \infty$, where $\delta \in (0, 1)$. This is a type of stability.
And accurate (with sufficient assumptions on observations):
$$\limsup_{k \to \infty} \, \mathbb{E}\,\|m_k - u_k\|^2 = O(\text{obs noise}),$$
where $u_k$ is the trajectory producing $Y_k$, $m_k = \mathbb{E}(u_k \mid Y_k)$, and we take $\mathbb{E}$ over all randomness.
