Information Geometric Nonlinear Filtering: a Hilbert Space Approach


1. Information Geometric Nonlinear Filtering: a Hilbert Space Approach. Nigel Newton (University of Essex). Information Geometry and its Applications IV, Liblice, June 2016. In honour of Shun-ichi Amari on the occasion of his 80th birthday.

2. Overview
• Nonlinear filtering (recursive Bayesian estimation), and the need for a proper state space for posterior distributions
• The infinite-dimensional Hilbert manifold of probability measures, M (and Banach variants)
• An M-valued Itô stochastic differential equation for the nonlinear filter
• Information geometric properties of the nonlinear filter

3-4. Nonlinear Filtering
• Markov "signal" process: $X = (X_t,\ t \in [0,\infty))$, $X_t \in \mathbb{X}$, where $\mathbb{X}$ is a metric space with reference probability measure $\mu$; e.g. $\mathbb{X} = \mathbb{R}^d$, $X_0 \sim N(0, I)$
• Partial "observation" process: $(Y_t \in \mathbb{R},\ t \in [0,\infty))$, where
  $Y_t = \int_0^t h(X_s)\,ds + W_t$
  and $W$ is a Brownian motion independent of $X$ (a simulation sketch of this model follows below)
• Estimate $X_t$ at each time $t$ from its prior distribution $P_t$ and the history of the observation, $Y_0^t := (Y_s,\ s \in [0,t])$
• The linear-Gaussian case yields the Kalman-Bucy filter
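To make the setup concrete, here is a minimal simulation sketch of the signal/observation pair. The model choices (a scalar Ornstein-Uhlenbeck signal with $dX_t = -X_t\,dt + dB_t$ and sensor function $h(x) = x$) are illustrative assumptions, not taken from the talk.

import numpy as np

# Simulate the signal/observation model Y_t = \int_0^t h(X_s) ds + W_t.
# Illustrative choices: dX = -X dt + dB (Ornstein-Uhlenbeck), h(x) = x.
rng = np.random.default_rng(0)
T, n = 5.0, 5000
dt = T / n

X = np.empty(n + 1)
Y = np.empty(n + 1)
X[0] = rng.standard_normal()      # X_0 ~ N(0, 1), the reference prior
Y[0] = 0.0
for k in range(n):
    dB = np.sqrt(dt) * rng.standard_normal()  # signal noise increment
    dW = np.sqrt(dt) * rng.standard_normal()  # observation noise, independent of X
    X[k + 1] = X[k] - X[k] * dt + dB          # Euler-Maruyama step for the signal
    Y[k + 1] = Y[k] + X[k] * dt + dW          # dY = h(X) dt + dW with h(x) = x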

5-6. Nonlinear Filtering
• Regular conditional (posterior) distribution: $\Pi_t \in \mathcal{P}(\mathbb{X})$,
  $\Pi_t(B) := \mathbf{P}(X_t \in B \mid Y_0^t)$
• $\Pi_t$ is a random probability measure evolving on $\mathcal{P}(\mathbb{X})$. How should we represent it?
• We could consider the conditional density (w.r.t. $\mu$), $p_t$, with its typical differential equation (Shiryaev, Wonham, Stratonovich, Kushner):
  $dp_t = \mathcal{A}^* p_t\,dt + p_t\,(h - \bar{h}_t)(dY_t - \bar{h}_t\,dt)$, where $\bar{h}_t := \int h(x)\,p_t(x)\,\mu(dx)$
  (a discretised sketch of this equation follows below)
• Spaces of densities are not necessarily optimal
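For intuition, a grid-based discretisation of the filtering density equation can be sketched as follows. This assumes the same illustrative model as before ($dX = -X\,dt + dB$, $h(x) = x$, so $\mathcal{A}^* p = \partial_x(xp) + \tfrac{1}{2}\partial_x^2 p$); it is an explicit Euler scheme, adequate only for small time steps, not a production implementation.

import numpy as np

def kushner_step(p, x, dx, dt, dY):
    # One explicit Euler step of dp = A*p dt + p (h - hbar)(dY - hbar dt)
    # on a grid x, for the illustrative model above.
    drift = np.gradient(x * p, dx)                    # d/dx (x p)
    diff = 0.5 * np.gradient(np.gradient(p, dx), dx)  # 0.5 d^2/dx^2 p
    h = x                                             # h(x) = x
    hbar = np.trapz(h * p, dx=dx)                     # \bar h_t = \int h p dmu
    p_new = p + (drift + diff) * dt + p * (h - hbar) * (dY - hbar * dt)
    p_new = np.clip(p_new, 0.0, None)                 # guard against negativity
    return p_new / np.trapz(p_new, dx=dx)             # renormalise to a density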

7-10. Mean-Square Errors
• Suppose $\mathbf{E}\,f(X_t)^2 < \infty$ for some $f : \mathbb{X} \to \mathbb{R}$
• Then $\Pi_t f := \mathbf{E}(f(X_t) \mid Y_0^t)$ minimises the mean-square error: for any $Y_0^t$-measurable estimate $\hat{f}_t$,
  $\mathbf{E}\big(f(X_t) - \hat{f}_t\big)^2 = \mathbf{E}\big(f(X_t) - \Pi_t f\big)^2 + \mathbf{E}\big(\Pi_t f - \hat{f}_t\big)^2$
  (estimation error plus approximation error; a short derivation follows below)
• If $\hat{f}_t = \hat{\Pi}_t f$ for some $\hat{\Pi}_t \in \mathcal{P}(\mathbb{X})$ with $\hat{\Pi}_t \ll \mu$, and $\Pi_t \ll \mu$, then
  $\mathbf{E}\big(\Pi_t f - \hat{\Pi}_t f\big)^2 \le \|f\|_{L^2(\mu)}^2\, \mathbf{E}\,\|p_t - \hat{p}_t\|_{L^2(\mu)}^2$,
  and so the $L^2(\mu)$ norm on densities may be useful
• Not if $f = \mathbf{1}_B$ and $\Pi_t(B)$ is very small (e.g. fault detection)
• When topologised in this way, $\mathcal{P}(\mathbb{X})$ has a boundary
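The decomposition follows from the orthogonality of the conditional mean; a brief derivation, consistent with the definitions above. Expanding the square,
$\mathbf{E}\big(f(X_t) - \hat{f}_t\big)^2 = \mathbf{E}\big(f(X_t) - \Pi_t f\big)^2 + 2\,\mathbf{E}\big[(f(X_t) - \Pi_t f)(\Pi_t f - \hat{f}_t)\big] + \mathbf{E}\big(\Pi_t f - \hat{f}_t\big)^2,$
and the cross term vanishes by the tower property, since $\Pi_t f - \hat{f}_t$ is $Y_0^t$-measurable and $\mathbf{E}\big(f(X_t) - \Pi_t f \mid Y_0^t\big) = 0$. The density bound is Cauchy-Schwarz:
$\big(\Pi_t f - \hat{\Pi}_t f\big)^2 = \Big(\int f\,(p_t - \hat{p}_t)\,d\mu\Big)^2 \le \|f\|_{L^2(\mu)}^2\,\|p_t - \hat{p}_t\|_{L^2(\mu)}^2.$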

11-12. Multi-Objective Mean-Square Errors
• Maximising the approximation error, relative to the estimation error, over square-integrable functions:
  $\mathcal{M}(\hat{\Pi}_t \mid \Pi_t) := \sup_{f \in L^2(\Pi_t)} \frac{\big(\Pi_t f - \hat{\Pi}_t f\big)^2}{\mathbf{E}\big((f(X_t) - \Pi_t f)^2 \mid Y_0^t\big)}$ (approximation error over estimation error)
  $= \sup_{f \in F_t} \Big(\mathbf{E}_{\Pi_t}\big[f\,\big(d\hat{\Pi}_t/d\Pi_t - 1\big)\big]\Big)^2 = \mathbf{E}_{\Pi_t}\big(d\hat{\Pi}_t/d\Pi_t - 1\big)^2,$
  where $F_t := \big\{ f \in L^2(\Pi_t) : \mathbf{E}_{\Pi_t} f = 0,\ \mathbf{E}_{\Pi_t} f^2 = 1 \big\}$ (the last equality is verified below)
• In time-recursive approximations, the accuracy of $\hat{\Pi}_t$ is affected by that of $\hat{\Pi}_s$ ($s < t$). This naturally induces multi-objective criteria at time $s$ (nonlinear dynamics).
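The last equality is Cauchy-Schwarz, under the reconstruction above. For $f \in F_t$,
$\Big(\mathbf{E}_{\Pi_t}\big[f\,(d\hat{\Pi}_t/d\Pi_t - 1)\big]\Big)^2 \le \mathbf{E}_{\Pi_t} f^2 \cdot \mathbf{E}_{\Pi_t}\big(d\hat{\Pi}_t/d\Pi_t - 1\big)^2 = \mathbf{E}_{\Pi_t}\big(d\hat{\Pi}_t/d\Pi_t - 1\big)^2,$
with equality when $f$ is proportional to $d\hat{\Pi}_t/d\Pi_t - 1$ (which has $\Pi_t$-mean zero), so the supremum is attained and equals Pearson's $\chi^2$ divergence between $\hat{\Pi}_t$ and $\Pi_t$.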

13-16. Geometric Sensitivity
• $\mathcal{M}$ is "geometrically sensitive": it requires small probabilities to be approximated with greater absolute accuracy than large probabilities (a numerical illustration follows below)
• When topologised by $\mathcal{M}$, $\mathcal{P}(\mathbb{X})$ does not have a boundary
• This is highly desirable in the context of recursive Bayesian estimation, where conditional probabilities are repeatedly multiplied by the likelihood functions of new observations
• $\mathcal{M}$ is Pearson's $\chi^2$ divergence. It belongs to the one-parameter family of $\alpha$-divergences: $\mathcal{M} = \mathcal{D}^{(3)}$
• It is too restrictive to use in practice
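A toy numerical illustration of this sensitivity (the distributions and error sizes are hypothetical, chosen only to make the point): the same absolute error costs far more under $\chi^2$ when it falls on a small probability than on a large one, whereas an $L^2$ criterion on densities treats the two errors identically.

import numpy as np

def chi2(p_hat, p):                 # Pearson chi^2 = E_p (dp_hat/dp - 1)^2
    return np.sum((p_hat - p) ** 2 / p)

p = np.array([1e-4, 0.5, 0.4999])   # "true" posterior; first cell is a rare event
delta = 5e-5                        # fixed absolute approximation error

err_rare = np.array([-delta, delta, 0.0])  # error hitting the rare cell
err_big = np.array([0.0, delta, -delta])   # same-size error on large cells

print(chi2(p + err_rare, p))  # ~2.5e-5: the rare-cell error is heavily penalised
print(chi2(p + err_big, p))   # ~1.0e-8: same absolute error, tiny chi^2 penalty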

17-19. α-Divergences
• As $|\alpha|$ becomes larger, $\mathcal{D}^{(\alpha)}$ becomes increasingly "geometrically sensitive" (one common parameterisation of the family is written out below)
• The case $\alpha = 0$ yields the Hellinger metric
• The case $\alpha = \pm 1$ yields the KL divergence:
  $\mathcal{D}^{(1)}(P \mid Q) := \mathcal{D}^{(-1)}(Q \mid P) = \mathbf{E}_Q\Big[\frac{dP}{dQ}\log\frac{dP}{dQ}\Big]$
• This is widely used in practice
• Symmetric error criteria may be appropriate, such as
  $\mathcal{D}(\Pi_t \mid \hat{\Pi}_t) + \mathcal{D}(\hat{\Pi}_t \mid \Pi_t)$
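For reference, in Amari's convention (one common parameterisation; the talk's normalisation and argument ordering may differ) the $\alpha$-divergence family is
$\mathcal{D}^{(\alpha)}(P \mid Q) = \frac{4}{1-\alpha^2}\Big(1 - \int p^{\frac{1-\alpha}{2}}\, q^{\frac{1+\alpha}{2}}\, d\mu\Big), \qquad \alpha \neq \pm 1,$
where $p = dP/d\mu$ and $q = dQ/d\mu$. Setting $\alpha = 3$ gives $\frac{1}{2}\int (q - p)^2/p\, d\mu$, Pearson's $\chi^2$ up to a constant factor; setting $\alpha = 0$ gives $2\int (\sqrt{p} - \sqrt{q})^2\, d\mu$, proportional to the squared Hellinger distance; and letting $\alpha \to \pm 1$ recovers the two KL divergences above.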
