Why are nonlinear filters stable?
Ramon van Handel
Department of Operations Research & Financial Engineering
5th Oxford-Princeton Conference, March 27, 2009
Filtering models
Markov additive process $(X_t, Y_t)_{t \ge 0}$:
◮ $(X_t, Y_t)_{t \ge 0}$ is a Markov process with càdlàg paths.
◮ Signal $(X_t)_{t \ge 0}$ is itself a Markov process.
◮ Observations $(Y_t)_{t \ge 0}$ have conditionally independent increments.
Standard examples:
1. White noise observations: $dY_t = h(X_t)\,dt + \sigma\,dW_t$.
2. Counting observations: $Y_t$ Poisson with rate $\lambda(X_t)$.
3. Marked point process observations, stochastic volatility, etc.
Counterpart in discrete time: Hidden Markov Models.
Nonlinear filtering and stability
Definition: The nonlinear filter is the measure-valued process $(\pi_t)_{t \ge 0}$ such that $\pi_t(f)$ is the optional projection of $(f(X_t))_{t \ge 0}$ on $(\mathcal{F}^Y_t)_{t \ge 0}$ for every $f$.
Notation:
◮ $\mathcal{F}^Y_t = \sigma\{Y_s : s \le t\}$, etc. (suitably augmented).
◮ Under $P^\mu$, the signal has initial measure $X_0 \sim \mu$. The corresponding filter is denoted $(\pi^\mu_t)_{t \ge 0}$, i.e., $\pi^\mu_t(f) = E^\mu(f(X_t) \mid \mathcal{F}^Y_t)$.
Question: When is the filter stable, i.e., $E^\mu(\|\pi^\mu_t - \pi^\nu_t\|) \xrightarrow{t \to \infty} 0$?
◮ This problem lies at the heart of the asymptotic theory of nonlinear filters: it is the key to ergodic theory and other uniform properties of the filter.
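The stability question can be made concrete in the discrete-time (hidden Markov model) counterpart. The following is a minimal sketch, not taken from the talk: the two-state transition matrix `P`, the emission matrix `B`, and the initial measures `mu` and `nu` are hypothetical values chosen for illustration. It runs the same filter recursion from two different initial measures on one observation path and records the total variation distance between the two filters.

```python
import random


def filter_step(pi, P, B, y):
    """One step of the discrete-time filter: predict with P, correct with B[.][y]."""
    n = len(pi)
    pred = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    unnorm = [pred[j] * B[j][y] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def tv(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def stability_gap(steps=50, seed=0):
    """TV distance between filters started at mu and nu, along one path drawn under mu."""
    rng = random.Random(seed)
    P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical signal transition matrix
    B = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical emission probabilities B[x][y]
    mu = [0.99, 0.01]             # true initial measure (assumption)
    nu = [0.01, 0.99]             # wrong initial measure (assumption)
    x = 0 if rng.random() < mu[0] else 1
    pi_mu, pi_nu = mu[:], nu[:]
    gaps = [tv(pi_mu, pi_nu)]
    for _ in range(steps):
        x = 0 if rng.random() < P[x][0] else 1  # signal moves
        y = 0 if rng.random() < B[x][0] else 1  # noisy observation of the signal
        pi_mu = filter_step(pi_mu, P, B, y)
        pi_nu = filter_step(pi_nu, P, B, y)
        gaps.append(tv(pi_mu, pi_nu))
    return gaps


if __name__ == "__main__":
    g = stability_gap()
    print(g[0], g[-1])
```

In this toy model every entry of `P` and `B` is strictly positive, which is a much stronger condition than the ones discussed in the talk, so the rapid decay of the gap here should be read only as an illustration of what stability means.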
Example (discrete time)
[Figure: three panels, N = 25, N = 500 and N = 10000; horizontal axis 0 to 100, vertical axis −5 to 5; curves Kalman/SIS/SIS-R.]
Model: $X_n = 0.9\,X_{n-1} + \beta_n$, $Y_n = X_n + \gamma_n$
Example (discrete time)
[Figure: three panels, all with N = 50; horizontal axes 0 to 100, 0 to 1000 and 0 to 5000; vertical axes −5 to 5, −10 to 10 and −10 to 10; curves Kalman/SIS/SIS-R.]
Model: $X_n = 0.9\,X_{n-1} + \beta_n$, $Y_n = X_n + \gamma_n$
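The example model can be reproduced in a few lines. The sketch below rests on assumptions that the slides do not state: $\beta_n$ and $\gamma_n$ are taken to be independent standard Gaussians, the initial law is taken to be $N(0,1)$, and SIS-R is read as sequential importance sampling with multinomial resampling at every step (a bootstrap particle filter). It compares the particle estimate of the conditional mean with the exact Kalman filter mean.

```python
import math
import random

A, Q, R = 0.9, 1.0, 1.0  # Q, R: assumed noise variances (not given on the slide)


def simulate(n, rng):
    """Draw observations Y_1..Y_n from X_n = A X_{n-1} + beta_n, Y_n = X_n + gamma_n."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        x = A * x + rng.gauss(0.0, math.sqrt(Q))
        ys.append(x + rng.gauss(0.0, math.sqrt(R)))
    return ys


def kalman_means(ys, m=0.0, p=1.0):
    """Exact conditional means for the linear-Gaussian model."""
    out = []
    for y in ys:
        m, p = A * m, A * A * p + Q            # predict
        k = p / (p + R)                        # Kalman gain
        m, p = m + k * (y - m), (1.0 - k) * p  # correct
        out.append(m)
    return out


def sisr_means(ys, n_particles, rng):
    """Particle estimate of the conditional means, resampling at every step."""
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    out = []
    for y in ys:
        parts = [A * x + rng.gauss(0.0, math.sqrt(Q)) for x in parts]  # propagate
        logw = [-0.5 * (y - x) ** 2 / R for x in parts]                # log-likelihoods
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]                        # stabilised weights
        total = sum(w)
        out.append(sum(wi * x for wi, x in zip(w, parts)) / total)
        parts = rng.choices(parts, weights=w, k=n_particles)           # resample
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    ys = simulate(100, rng)
    km, pm = kalman_means(ys), sisr_means(ys, 500, rng)
    print(sum(abs(a - b) for a, b in zip(km, pm)) / len(ys))
```

Subtracting the largest log-weight before exponentiating is a numerical safeguard only; it does not change the normalised weights.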
Intuition
Filter stability is caused by two mechanisms:
1. When the signal is ergodic, the filter should be also.
2. When the observations are sufficiently informative, the resulting information gain should render the prior measure obsolete.
In the special linear-Gaussian case (Kalman filter), this intuition can be made explicit: ergodic, observable, detectable models.
Goal: develop a general theory.
◮ The proof in the linear-Gaussian case is useless for this purpose!
◮ Most results need very strong assumptions (uniform contraction).
◮ Ergodic case: all known general results are based on a paper by Kunita (1971). However, the key step in his proof is incorrect.
◮ Results beyond the ergodic case are very limited.
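In the scalar linear-Gaussian case the forgetting of the prior can be computed by hand. If two Kalman filters share the same prior variance and differ only in the prior mean, they share the same gain $K_n$, and the update $m_n = (1 - K_n)\,a\,m_{n-1} + K_n y_n$ shows that the gap between their means is multiplied by $|a|(1 - K_n)$ at each step, whatever the observations are. The sketch below tracks this gap; the parameter values `q = r = 1` and the prior variance `p0 = 1` are assumptions made for illustration.

```python
def kalman_mean_gap(steps, m1, m2, p0=1.0, a=0.9, q=1.0, r=1.0):
    """Gap between the means of two scalar Kalman filters that share the prior
    variance p0 and see the same observations, for X_n = a X_{n-1} + noise(q),
    Y_n = X_n + noise(r). The gap does not depend on the observations."""
    gap, p = abs(m1 - m2), p0
    gaps = [gap]
    for _ in range(steps):
        p_pred = a * a * p + q     # predicted variance
        k = p_pred / (p_pred + r)  # Kalman gain, common to both filters
        p = (1.0 - k) * p_pred     # corrected variance
        gap *= abs(a) * (1.0 - k)  # contraction of the mean gap
        gaps.append(gap)
    return gaps


if __name__ == "__main__":
    print(kalman_mean_gap(30, -5.0, 5.0)[-1])         # ergodic signal, a = 0.9
    print(kalman_mean_gap(30, -5.0, 5.0, a=1.2)[-1])  # unstable signal, a = 1.2
```

With `a=1.2` the signal is not ergodic, yet the gap still contracts because the observations are informative enough; this is the second mechanism in the list above, seen in the simplest possible setting.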
Ergodic signal: a general result
Ergodicity Assumption: The signal possesses an invariant probability measure $\lambda$ such that $\|P^z(X_t \in \cdot\,) - \lambda\|_{TV} \to 0$ as $t \to \infty$ for $\lambda$-a.e. $z$.
Nondegeneracy Assumption: $P^\mu|_{\mathcal{F}^X_t \vee \mathcal{F}^Y_t} \sim P^\mu|_{\mathcal{F}^X_t} \otimes \Phi|_{\mathcal{F}^Y_t}$ for all $t < \infty$, $\mu$.
Theorem: Suppose that the above assumptions hold. Then $E^\mu(\|\pi^\mu_t - \pi^\lambda_t\|_{TV}) \to 0$ iff $\|P^\mu|_{\sigma(X_t)} - \lambda\|_{TV} \to 0$.
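Idea of proof
The problem can be reduced to the case $\mu \ll \lambda$. We can prove:
$$E^\mu(\|\pi^\mu_t - \pi^\lambda_t\|_{TV}) = E^\lambda\left( \left| E^\lambda\left( \frac{d\mu}{d\lambda}(X_0) \,\Big|\, \mathcal{F}^Y_\infty \vee \mathcal{F}^X_{[t,\infty[} \right) - E^\lambda\left( \frac{d\mu}{d\lambda}(X_0) \,\Big|\, \mathcal{F}^Y_t \right) \right| \right).$$
By martingale convergence,
$$\bigcap_{t \ge 0} \mathcal{F}^Y_\infty \vee \mathcal{F}^X_{[t,\infty[} = \mathcal{F}^Y_\infty \implies E^\mu(\|\pi^\mu_t - \pi^\lambda_t\|_{TV}) \xrightarrow{t \to \infty} 0.$$
Wrong proof:
$$(X_t)_{t \ge 0} \text{ ergodic} \implies \bigcap_{t \ge 0} \mathcal{F}^X_{[t,\infty[} \text{ is trivial} \implies \bigcap_{t \ge 0} \mathcal{F}^Y_\infty \vee \mathcal{F}^X_{[t,\infty[} = \mathcal{F}^Y_\infty.$$
This fundamental mistake is made in Kunita (1971)!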
Idea of proof
Correct statement (von Weizsäcker 1983):
$$\bigcap_{t \ge 0} \mathcal{F}^Y_\infty \vee \mathcal{F}^X_{[t,\infty[} = \mathcal{F}^Y_\infty \ \ P^\lambda\text{-a.s.} \iff \bigcap_{t \ge 0} \mathcal{F}^X_{[t,\infty[} \text{ is } P^\lambda(\,\cdot \mid \mathcal{F}^Y_\infty)\text{-trivial} \ \ P^\lambda\text{-a.s.}$$
So, we must prove that $(X_t)_{t \ge 0}$ is ergodic under $P^\lambda(\,\cdot \mid \mathcal{F}^Y_\infty)$.
Key ideas:
◮ $(X_t)_{t \ge 0}$ is a Markov process in a random environment under $P^\lambda(\,\cdot \mid \mathcal{F}^Y_\infty)$.
◮ Prove a general ergodic theorem for such processes.
◮ Use coupling, disintegration and time reversal methods to relate the ergodic properties under $P^\lambda(\,\cdot \mid \mathcal{F}^Y_\infty)$ to those under $P^\lambda$.
◮ Nondegeneracy enters in the last step.
Informative observations: a general result
Definition: The model is called uniformly observable if $\forall \varepsilon > 0$, $\exists \delta > 0$ such that
$$\|P^\mu|_{\mathcal{F}^Y_\infty} - P^\nu|_{\mathcal{F}^Y_\infty}\|_{TV} < \delta \quad \text{implies} \quad \|\mu - \nu\|_{BL} < \varepsilon.$$
The model is called observable if $P^\mu|_{\mathcal{F}^Y_\infty} = P^\nu|_{\mathcal{F}^Y_\infty}$ implies $\mu = \nu$.
Theorem: If the model is uniformly observable, then
$$E^\mu(\|\pi^\mu_t - \pi^\nu_t\|_{BL}) \xrightarrow{t \to \infty} 0 \quad \text{whenever} \quad P^\mu|_{\mathcal{F}^Y_\infty} \ll P^\nu|_{\mathcal{F}^Y_\infty}.$$
Moreover, if $(X_t)_{t \ge 0}$ is Feller and takes values in a compact state space, then the conclusion already holds if the model is observable.
Proof: Martingale convergence arguments.
Verifying observability
How to prove (uniform) observability?
◮ Finite state space: observability reduces to linear algebra.
◮ Kalman filter: observability $\iff$ uniform observability.
◮ Additive noise: the model $dX_t = b(X_t)\,dt + g(X_t)\,dW_t$, $dY_t = h(X_t)\,dt + \sigma\,dB_t$ is uniformly observable if $h$ is strongly invertible.
Proposition: Let $\mu, \nu, \xi \in \mathcal{P}(\mathbb{R}^d)$ and let $\left| \int e^{i k \cdot x}\, \xi(dx) \right| > 0$. Then $\forall \varepsilon > 0$, $\exists \delta > 0$ s.t.
$$\|\mu * \xi - \nu * \xi\|_{BL} < \delta \implies \|\mu - \nu\|_{BL} < \varepsilon.$$
Proof: basic ideas from Banach space theory and harmonic analysis.
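For a finite state space, the map from the initial measure to the law of the observations is linear, so observability becomes a rank question. The sketch below works in the discrete-time hidden Markov counterpart rather than the continuous-time model of the talk, and the convention (signal started at time 0, observations at times 1, 2, ...) and all function names are assumptions made for illustration. It stacks the vectors $v_w(x_0) = P(Y_1 \ldots Y_L = w \mid X_0 = x_0)$ over all observation words $w$ up to a given length; if the stacked matrix has rank equal to the number of states, distinct initial measures give distinct observation laws.

```python
from itertools import product


def word_vector(word, P, B):
    """v[x0] = P(Y_1..Y_L = word | X_0 = x0), computed by a backward recursion."""
    n = len(P)
    v = [1.0] * n
    for y in reversed(word):
        v = [sum(P[i][j] * B[j][y] * v[j] for j in range(n)) for i in range(n)]
    return v


def rank(rows, tol=1e-10):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    rows = [r[:] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        piv = max(range(rk, len(rows)), key=lambda i: abs(rows[i][c]), default=None)
        if piv is None or abs(rows[piv][c]) < tol:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][c] / rows[rk][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def observability_rank(P, B, max_len):
    """Rank of the word-probability matrix over all words of length 1..max_len."""
    n_obs = len(B[0])
    rows = [word_vector(w, P, B)
            for L in range(1, max_len + 1)
            for w in product(range(n_obs), repeat=L)]
    return rank(rows)


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]            # hypothetical transition matrix
    informative = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical emissions B[x][y]
    flat = [[0.5, 0.5], [0.5, 0.5]]         # observations independent of the signal
    print(observability_rank(P, informative, 2), observability_rank(P, flat, 3))
```

The number of words grows exponentially in `max_len`, so this brute-force check is only practical for small alphabets and short words.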
A necessary and sufficient condition
Detectability Assumption: For every pair $\mu, \nu$ of initial measures, either
1. $P^\mu|_{\mathcal{F}^Y_\infty} \ne P^\nu|_{\mathcal{F}^Y_\infty}$; or
2. $\|P^\mu|_{\sigma(X_t)} - P^\nu|_{\sigma(X_t)}\|_{TV} \to 0$ as $t \to \infty$.
Theorem: Suppose that $(X_t)_{t \ge 0}$ is a finite state Markov process and that the observations are nondegenerate. Then the following are equivalent:
1. The detectability condition is satisfied.
2. $E^\mu(\|\pi^\mu_t - \pi^\nu_t\|_{TV}) \to 0$ whenever $P^\mu|_{\mathcal{F}^Y_\infty} \ll P^\nu|_{\mathcal{F}^Y_\infty}$.
◮ Detectability is necessary and sufficient!
◮ Very satisfying, but the proof does not generalize (so far...)
Filter approximation: a general result
Theorem: Let $(\pi^N_k)_{k \ge 0}$, $N \ge 1$ be a sequence of recursive approximations of the nonlinear filter $(\pi_k)_{k \ge 0}$. Suppose that the following assumptions hold:
1. The signal is ergodic and the observations are nondegenerate.
2. The one step transition probability $\Pi^N$ of $(X_k, \pi^N_k)_{k \ge 0}$ converges to the transition probability $\Pi$ of $(X_k, \pi_k)_{k \ge 0}$ uniformly on compacts.
3. The family $\{\pi^N_k : k \ge 0,\ N \ge 1\}$ is tight.
Then $(\pi^N_k)_{k \ge 0}$ approximates $(\pi_k)_{k \ge 0}$ uniformly in time average:
$$\lim_{N \to \infty} \sup_{T \ge 0} E\left[ \frac{1}{T} \sum_{k=1}^{T} \|\pi^N_k - \pi_k\|_{BL} \right] = 0.$$
Inspired by an argument of Budhiraja and Kushner (2001), but the new stability results are key to developing the technique in its generality.
Particle filters
◮ The SIS-R algorithm satisfies condition 2; SIS violates it.
◮ To prove the approximation property, one need "only" prove that the particle system is tight. This is surprisingly difficult!
◮ Tightness proofs exist for geometrically ergodic signals with either (1) bounded observations, or (2) radially unbounded observations.
◮ Significant improvement over previous results (Del Moral 2004), and at present the only approach that can feasibly be extended.
◮ Continuous time should be no problem; the nonergodic case is a mystery.
[Figure: one panel; horizontal axis 0 to 100; curves Kalman/SIS/SIS-R.]
Model: $X_n = 0.9\,X_{n-1} + \beta_n$, $Y_n = X_n + \gamma_n$
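One familiar symptom of the difference between SIS and SIS-R is weight degeneracy, which can be seen on the example model with a short experiment. The sketch below is an illustration under assumptions not stated on the slides: standard Gaussian noises, an $N(0,1)$ initial law, multinomial resampling at every step for SIS-R, and the effective sample size $(\sum_i w_i)^2 / \sum_i w_i^2$ as the diagnostic. It does not attempt to verify condition 2 of the approximation theorem.

```python
import math
import random


def simulate_obs(n, rng, a=0.9):
    """Observations from X_n = a X_{n-1} + beta_n, Y_n = X_n + gamma_n (unit variances assumed)."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys


def ess_path(ys, n_particles, resample, rng, a=0.9):
    """Effective sample size after each observation, with or without resampling."""
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    logw = [0.0] * n_particles
    path = []
    for y in ys:
        parts = [a * x + rng.gauss(0.0, 1.0) for x in parts]            # propagate
        logw = [lw - 0.5 * (y - x) ** 2 for lw, x in zip(logw, parts)]  # accumulate log-weights
        top = max(logw)
        logw = [lw - top for lw in logw]                                # keep the largest at 0
        w = [math.exp(lw) for lw in logw]
        total = sum(w)
        path.append(total * total / sum(wi * wi for wi in w))
        if resample:                                                    # SIS-R: resample, reset weights
            parts = rng.choices(parts, weights=w, k=n_particles)
            logw = [0.0] * n_particles
    return path


if __name__ == "__main__":
    ys = simulate_obs(100, random.Random(0))
    sis = ess_path(ys, 200, False, random.Random(1))
    sisr = ess_path(ys, 200, True, random.Random(2))
    print(sis[-1], sum(sisr) / len(sisr))
```

Without resampling the weights multiply along each particle path, so after many steps almost all of the mass typically sits on a handful of particles; resampling resets the weights at every step and keeps the effective sample size at a sizeable fraction of the particle count.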
Conclusion
◮ A surprisingly general asymptotic theory answers the basic question: why are nonlinear filters stable?
◮ Application: new insight into the performance of particle filters.
◮ Various open problems remain both in the fundamental theory and in applications (particle filters, stochastic control, statistical inference).
References at http://www.princeton.edu/~rvan/