Information Geometric Nonlinear Filtering: a Hilbert Space Approach
Nigel Newton (University of Essex)
Information Geometry and its Applications IV, Liblice, June 2016
In honour of Shun-ichi Amari on the occasion of his 80th birthday
Overview
• Nonlinear filtering (recursive Bayesian estimation)
  – The need for a proper state space for posterior distributions
• The infinite-dimensional Hilbert manifold of probability measures, M (and Banach variants)
• An M-valued Itô stochastic differential equation for the nonlinear filter
• Information geometric properties of the nonlinear filter
Nonlinear Filtering
• Markov "signal" process: X_t ∈ 𝕏, t ∈ [0, ∞)
  – 𝕏 is a metric space, with reference probability measure m
  – E.g. 𝕏 = R^d, m = N(0, I)
• Partial "observation" process: Y_t ∈ R, t ∈ [0, ∞),
  Y_t = ∫_0^t h(X_s) ds + W_t,  with W a Brownian motion independent of X (simulation sketched below)
• Estimate X_t at each time t from its prior distribution and the history of the observation, Y_0^t := (Y_s, s ∈ [0, t])
• The linear-Gaussian case yields the Kalman-Bucy filter
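The following is a minimal simulation sketch of the signal-observation pair above. The slide leaves 𝕏, h and the signal dynamics general; the scalar Ornstein-Uhlenbeck signal and h(x) = x used here are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 1000            # time horizon and number of Euler steps
dt = T / n

x = rng.standard_normal()   # X_0 ~ m = N(0, 1) (assumed)
y = 0.0
path_X, path_Y = [x], [y]
for _ in range(n):
    # assumed signal dynamics: dX_t = -X_t dt + dB_t (Ornstein-Uhlenbeck)
    x = x - x * dt + np.sqrt(dt) * rng.standard_normal()
    # observation: dY_t = h(X_t) dt + dW_t, with h(x) = x and W independent of X
    y = y + x * dt + np.sqrt(dt) * rng.standard_normal()
    path_X.append(x)
    path_Y.append(y)
```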
Nonlinear Filtering
• Regular conditional (posterior) distribution: Π_t(B) := P(X_t ∈ B | Y_0^t)
• Π_t is a random probability measure evolving on P(𝕏). How should we represent it?
• We could consider the conditional density (w.r.t. m), p_t
  – typical stochastic differential equation (Shiryaev, Wonham, Stratonovich, Kushner):
    dp_t = A*p_t dt + p_t (h − ĥ_t)(dY_t − ĥ_t dt),  where ĥ_t := ∫ h(x) p_t(x) m(dx)
    (A* is the formal adjoint of the signal generator; a finite-state sketch follows below)
• Spaces of densities are not necessarily optimal
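A sketch of the density equation in the special case of a finite-state signal (the Wonham filter), where m is counting measure, p_t is a probability vector and A* is the transpose of the chain's generator. The two-state generator and observation function below are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
Q = np.array([[-1.0,  1.0],    # generator of the signal chain (assumed)
              [ 2.0, -2.0]])
h = np.array([0.0, 1.0])       # observation function on the two states (assumed)
dt, n = 1e-3, 5000

p = np.array([0.5, 0.5])       # p_0: prior probability vector
x = 0                          # hidden state, simulated only to generate dY
for _ in range(n):
    # simulate the signal and an observation increment dY = h(X) dt + dW
    if rng.random() < -Q[x, x] * dt:
        x = 1 - x
    dY = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
    # Euler step of dp = Q^T p dt + p (h - h_bar)(dY - h_bar dt)
    h_bar = p @ h
    p = p + (Q.T @ p) * dt + p * (h - h_bar) * (dY - h_bar * dt)
    p = np.clip(p, 1e-12, None)
    p = p / p.sum()            # renormalise to control discretisation error
print(p)                       # approximate posterior P(X_t = i | Y_0^t)
```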
Mean-Square Errors
• Suppose E f(X_t)² < ∞ for some f: 𝕏 → R
• Then the conditional mean Π_t f := E[f(X_t) | Y_0^t] minimises the mean-square error: for any Y_0^t-measurable estimate f̂_t,
  E(f(X_t) − f̂_t)² = E(f(X_t) − Π_t f)² + E(Π_t f − f̂_t)²
                    =  estimation error  +  approximation error
• If f ∈ L²(m) and f̂_t = Π̂_t f for some Π̂_t ∈ P(𝕏) with density p̂_t w.r.t. m, then
  E(Π_t f − f̂_t)² ≤ ‖f‖²_{L²(m)} E‖p_t − p̂_t‖²_{L²(m)},
  and so the L²(m) norm on densities may be useful
• Not if f = 1_B and Π_t(B) is very small (e.g. fault detection; see the toy example below)
• When topologised in this way, P(𝕏) has a boundary
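A toy numerical illustration of the fault-detection caveat, with invented numbers: two densities on a three-point space whose L²(m) distance is tiny, yet the probability of the rare "fault" event B is wrong by an order of magnitude.

```python
import numpy as np

p_true = np.array([1e-4, 0.4999, 0.5])   # "true" posterior density (m = counting measure)
p_appr = np.array([1e-3, 0.4990, 0.5])   # an approximation of it
B = np.array([True, False, False])       # the rare "fault" event B

l2_error = np.sqrt(np.sum((p_true - p_appr) ** 2))   # ~1.3e-3: looks excellent in L^2(m)
ratio = p_appr[B].sum() / p_true[B].sum()            # = 10: fault probability off by 10x
print(l2_error, ratio)
```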
Multi-Objective Mean-Square Errors
• Maximising the ratio of approximation error to estimation error over square-integrable functions (checked numerically in the sketch below):
  M(Π̂_t | Π_t) := sup_{f ∈ L²(Π_t)} (Π_t f − Π̂_t f)² / E_{Π_t}(f − Π_t f)²
                 = sup_{f ∈ F_t} ( E_{Π_t}[ f (dΠ̂_t/dΠ_t − 1) ] )²
                 = E_{Π_t}(dΠ̂_t/dΠ_t − 1)²
  where F_t := { f ∈ L²(Π_t) : Π_t f = 0, Π_t f² = 1 }
• In time-recursive approximations, the accuracy of Π̂_t is affected by that of Π̂_s (s < t). This naturally induces multi-objective criteria at time s (nonlinear dynamics).
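A sketch, on a hypothetical three-point space, checking the closed form M(Π̂ | Π) = E_Π(dΠ̂/dΠ − 1)² against a Monte Carlo search for the supremum over test functions; the two distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
P = np.array([0.2, 0.3, 0.5])        # stands in for the filter Pi_t
P_hat = np.array([0.25, 0.35, 0.4])  # stands in for the approximation

chi2 = np.sum(P * (P_hat / P - 1.0) ** 2)     # closed form E_P (dP_hat/dP - 1)^2

best = 0.0
for _ in range(100_000):
    f = rng.standard_normal(3)                # random test function
    num = (P @ f - P_hat @ f) ** 2            # squared approximation error
    den = P @ (f - P @ f) ** 2                # estimation error: Var_P(f)
    best = max(best, num / den)

print(chi2, best)   # the empirical supremum approaches chi2 from below
```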
Geometric Sensitivity
• M is "geometrically sensitive": it requires small probabilities to be approximated with greater absolute accuracy than large probabilities.
• When topologised by M, P(𝕏) does not have a boundary.
• This is highly desirable in the context of recursive Bayesian estimation, where conditional probabilities are repeatedly multiplied by the likelihood functions of new observations.
• M is Pearson's χ² divergence. It belongs to the one-parameter family of α-divergences: M = D_3.
• It is too restrictive to use in practice.
α-Divergences
• As |α| becomes larger, D_α becomes increasingly "geometrically sensitive"
• The case α = 0 yields the Hellinger metric
• The case α = ±1 yields the KL divergence:
  D(P | Q) := D_{−1}(P | Q) = E_Q[ (dP/dQ) log(dP/dQ) ]
• This is widely used in practice.
• Symmetric error criteria may be appropriate, such as D(Π_t | Π̂_t) + D(Π̂_t | Π_t)  (compared numerically below)
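A short sketch evaluating the divergences mentioned above on an invented discrete example: the squared Hellinger distance (α = 0), the KL divergence in both directions and its symmetrisation (α = ±1), and the Pearson χ² criterion M (α = 3).

```python
import numpy as np

P = np.array([0.01, 0.49, 0.50])   # invented distributions, for illustration only
Q = np.array([0.10, 0.40, 0.50])

hellinger2 = np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2)   # squared Hellinger distance (alpha = 0)
kl_PQ = np.sum(P * np.log(P / Q))                      # D(P | Q)  (alpha = +/-1)
kl_QP = np.sum(Q * np.log(Q / P))                      # D(Q | P)
sym = kl_PQ + kl_QP                                    # symmetrised KL criterion
chi2 = np.sum(Q * (P / Q - 1.0) ** 2)                  # M(P | Q), Pearson chi^2 (alpha = 3)

print(hellinger2, kl_PQ, kl_QP, sym, chi2)
```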