On-line estimation of a smooth regression function

R. Liptser, jointly with L. Goldentyer
Tel Aviv University, Dept. of Electrical Engineering-Systems
December 19, 2002
SETTING

We consider a tracking problem for a smooth function $f = f(t)$, $0 \le t \le T$, observed as
$$X_{in} = f(t_{in}) + \sigma\xi_i, \qquad t_{in} = \frac{i}{n},$$
where
- $n$ is large;
- $(\xi_i)$ is i.i.d. with $E\xi_i = 0$, $E\xi_i^2 = 1$;
- $\sigma^2$ is a positive constant.

Without additional assumptions on $f$, it is difficult to construct an estimator even if $n$ is large.
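A minimal simulation of this observation scheme, as a sketch. It assumes Gaussian $\xi_i$ (the setting only requires i.i.d. noise with zero mean and unit variance) and indices $i = 1, \dots, \lfloor nT \rfloor$; `simulate_observations` is a hypothetical helper name.

```python
import numpy as np


def simulate_observations(f, n, T=1.0, sigma=1.0, rng=None):
    """Sample X_i = f(t_i) + sigma * xi_i on the grid t_i = i / n, i = 1..floor(n T).

    Gaussian xi_i is an assumption; the model only needs i.i.d. noise with
    E xi_i = 0 and E xi_i^2 = 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, int(np.floor(n * T)) + 1) / n
    xi = rng.standard_normal(t.size)
    return t, f(t) + sigma * xi


t, X = simulate_observations(np.sin, n=1000, T=1.0, sigma=0.25,
                             rng=np.random.default_rng(0))
```

A seeded generator is passed so that repeated runs give the same observations.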
Main assumption

$f$ is $k$ times differentiable and its highest-order derivative is Lipschitz continuous.

Filtering approach (Bar-Shalom and Li). Model $f^{(k)}(t)$ with the help of white noise:
$$\frac{df^{(k)}(t)}{dt} = \text{"white noise"}.$$
It sounds like nonsense, but it works pretty well.

Nonparametric statistics approach. $f \in \Sigma(k,\alpha,L)$, the Stone-Ibragimov-Khasminskii class of $k$ times differentiable functions with
$$\bigl|f^{(k)}(t'') - f^{(k)}(t')\bigr| \le L\,|t'' - t'|^{\alpha}, \qquad 0 < \alpha \le 1.$$
Task: to combine both approaches

Since the quality of estimation depends on $n$, every estimate is indexed by $n$: $\hat f_n^{(j)}(t)$ denotes the estimate of $f^{(j)}(t)$, $j = 0, 1, \dots, k$.

It is known from Ibragimov and Khasminskii that, for a wide class of loss functions $\mathcal L$,
$$\sup_{f\in\Sigma(k,\alpha,L)} E\,\mathcal L\Bigl(n^{\frac{k+\alpha-j}{2(k+\alpha)+1}}\bigl\|\hat f_n^{(j)} - f^{(j)}\bigr\|_{L_p}\Bigr) < C,$$
and that the normalization $n^{\frac{k+\alpha-j}{2(k+\alpha)+1}}$, $j = 0, 1, \dots, k$, is the best possible uniformly over the class: no estimator makes the risk converge to zero faster as $n \to \infty$.
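The normalizing exponents are simple rationals and can be tabulated exactly. A small sketch with a hypothetical helper `rate_exponent`; for the squared risk the exponent simply doubles.

```python
from fractions import Fraction


def rate_exponent(k, alpha, j):
    """Exponent r_j = (k + alpha - j) / (2(k + alpha) + 1) of the normalizing factor n^{r_j}.

    alpha is converted with Fraction, so pass an int, a Fraction or an exact string like "1/2".
    """
    if not 0 <= j <= k:
        raise ValueError("j must lie in 0..k")
    beta = Fraction(k) + Fraction(alpha)
    return (beta - j) / (2 * beta + 1)


# k = 2, alpha = 1: normalizing exponents for f, f' and f''
print([str(rate_exponent(2, 1, j)) for j in range(3)])  # ['3/7', '2/7', '1/7']
```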
In particular, the risks $E\bigl(\hat f_n^{(j)}(t) - f^{(j)}(t)\bigr)^2$, $j = 0, 1, \dots, k$, have the same rates in $n$:
$$\sup_{f\in\Sigma(k,\alpha,L)}\ \limsup_{n\to\infty}\ n^{\frac{2(k+\alpha-j)}{2(k+\alpha)+1}}\,E\bigl|\hat f_n^{(j)}(t) - f^{(j)}(t)\bigr|^2 < C.$$
These rates cannot be improved uniformly on any nonempty open subset of $(0,T)$.

Jointly with Khasminskii, we designed an on-line filter that guarantees the optimal rates in $n$.
Here $\hat f_n^{(j)}(t_{in})$ and $f^{(j)}(t_{in})$ are abbreviated to $\hat f^{(j)}(t_i)$ and $f^{(j)}(t_i)$. For $j = 0, 1, \dots, k-1$,
$$\hat f^{(j)}(t_i) = \hat f^{(j)}(t_{i-1}) + \frac1n\,\hat f^{(j+1)}(t_{i-1}) + \frac{q_j}{n^{\frac{2(k+\alpha)-j}{2(k+\alpha)+1}}}\bigl(X_i - \hat f^{(0)}(t_{i-1})\bigr),$$
and for $j = k$
$$\hat f^{(k)}(t_i) = \hat f^{(k)}(t_{i-1}) + \frac{q_k}{n^{\frac{2(k+\alpha)-k}{2(k+\alpha)+1}}}\bigl(X_i - \hat f^{(0)}(t_{i-1})\bigr).$$
The vector $q$ with entries $q_0, \dots, q_k$ has to be chosen so that all roots of the characteristic polynomial
$$p_k(u,q) = u^{k+1} + q_0u^k + q_1u^{k-1} + \dots + q_{k-1}u + q_k$$
are distinct and have negative real parts.
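A direct transcription of the recursion into NumPy, as a sketch. The name `track`, the zero default for the initial estimates $\hat f^{(j)}(0)$ and the default $\alpha = 1$ are our assumptions. The gains $q$ are taken as given, and the root condition on $p_k(u,q)$ is not checked.

```python
import numpy as np


def track(X, n, q, alpha=1.0, init=None):
    """Run the on-line recursion for hat f^{(0)}, ..., hat f^{(k)} at t_i = i / n.

    X     : observations X_1, ..., X_N
    q     : gains q_0, ..., q_k (assumed to make p_k(u, q) stable; not checked)
    init  : initial estimates hat f^{(j)}(0); zeros if None (an assumption)
    Returns an (N, k+1) array whose row i-1 holds hat f^{(j)}(t_i), j = 0..k.
    """
    q = np.asarray(q, dtype=float)
    k = q.size - 1
    beta = k + alpha
    j = np.arange(k + 1)
    # per-step gains q_j / n^{(2(k+alpha) - j) / (2(k+alpha) + 1)}
    gains = q / n ** ((2 * beta - j) / (2 * beta + 1))
    est = np.zeros(k + 1) if init is None else np.array(init, dtype=float)
    out = np.empty((len(X), k + 1))
    for i, x in enumerate(X):
        innovation = x - est[0]          # X_i - hat f^{(0)}(t_{i-1})
        drift = np.zeros(k + 1)
        drift[:-1] = est[1:] / n         # (1/n) hat f^{(j+1)}(t_{i-1}); no drift term for j = k
        est = est + drift + gains * innovation
        out[i] = est
    return out
```

All components are updated from the previous estimates simultaneously, as in the equations. With $k = 0$ the same function reproduces the single-gain estimator of the $\Sigma(0,1,L)$ example.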
Two problems

1. The choice of appropriate initial conditions $\hat f^{(0)}(0), \hat f^{(1)}(0), \dots, \hat f^{(k)}(0)$ to minimize the boundary layer.
2. The choice of the vector $q$ such that the assumption about the roots of the polynomial $p_k(u,q)$ remains valid and
$$C(q) \ge \sup_{f\in\Sigma(k,\alpha,L)} n^{\frac{2(k+\alpha-j)}{2(k+\alpha)+1}}\,E\bigl|\hat f_n^{(j)}(t) - f^{(j)}(t)\bigr|^2$$
is as small as possible.

To handle these problems we have to restrict ourselves to $\alpha = 1$.
Boundary layer

The left-side boundary layer, of width
$$c(q)\,n^{-\frac{1}{2\beta+1}}\log n, \qquad \beta = k + \alpha,$$
where the optimal rates in $n$ might be lost, is inevitable. It is due to the on-line nature of the above tracking system.

One can readily suggest an off-line modification that runs the same recursion in backward time, subject to some boundary conditions independent of the observations $X_i$. This modification has a right-side boundary layer instead. So a combination of the forward- and backward-time tracking algorithms supports the optimal rate in $n$ on the whole of $[0,T]$.
Suitable choice of q

The vector $q$ should satisfy several requirements, regarding
- $C(q)$, the upper bound for the normalized risk;
- $c(q)$, the parameter of the boundary layer;
- the roots of the polynomial $p_k(u,q)$.

These requirements may contradict each other.
Example 1: $\Sigma(0,1,L)$

The worst case is $f(t) = f(0) \pm Lt$. Applying the Arzela-Ascoli theorem, we find that
$$C(q) = \frac{\sigma^2 q}{2} + \frac{L^2}{q^2}$$
and
$$q^\circ := \operatorname*{argmin}_{q>0} C(q) = \Bigl(\frac{2L}{\sigma}\Bigr)^{2/3}.$$
Hence a reasonable estimator is
$$\hat f(t_i) = \hat f(t_{i-1}) + \Bigl(\frac{2L}{n\sigma}\Bigr)^{2/3}\bigl(X_i - \hat f(t_{i-1})\bigr).$$
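A quick numerical check of the closed-form minimizer, as a sketch. It assumes the bound $C(q) = \sigma^2 q/2 + L^2/q^2$ (the $k = 0$ case of the trace formula for the general case). The values of $L$, $\sigma$ and $n$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def risk_bound_k0(q, L, sigma):
    """C(q) = sigma^2 q / 2 + L^2 / q^2 for the class Sigma(0, 1, L)."""
    return sigma**2 * q / 2 + L**2 / q**2


def q_opt_k0(L, sigma):
    """Stationary point of C: sigma^2 / 2 - 2 L^2 / q^3 = 0, i.e. q = (2L / sigma)^(2/3)."""
    return (2 * L / sigma) ** (2 / 3)


L, sigma, n = 10.0, 0.5, 10_000
grid = np.logspace(-2, 4, 200_001)
q_grid = grid[np.argmin(risk_bound_k0(grid, L, sigma))]   # brute-force check of the closed form
gain = q_opt_k0(L, sigma) / n ** (2 / 3)                  # per-step gain, equals (2L / (n sigma))^(2/3)
print(q_opt_k0(L, sigma), q_grid, gain)
```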
General case: $\Sigma(k>0,1,L)$

With the worst $f(t)$ such that $f^{(k)}(t) = f^{(k)}(0) \pm Lt$, applying the Arzela-Ascoli theorem we find
$$C(q) = \operatorname{trace}\bigl(P(q) + M(q)M^*(q)\bigr),$$
where $M(q) = L\,(a - qA)^{-1}b$ and $P(q)$ solves the Lyapunov equation
$$(a - qA)P(q) + P(q)(a - qA)^* + \sigma^2 qq^* = 0.$$
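Here
$$a = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ 0 & 0 & 0 & \cdots & 0\end{pmatrix}_{(k+1)\times(k+1)}, \qquad A = \begin{pmatrix}1 & 0 & \cdots & 0\end{pmatrix}_{1\times(k+1)}, \qquad b = \begin{pmatrix}0\\ \vdots\\ 0\\ 1\end{pmatrix}_{(k+1)\times 1}.$$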
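A sketch of how $C(q)$ could be evaluated for a given gain vector $q$. It builds $a$, $A$, $b$ as defined above and solves the Lyapunov equation through its Kronecker (vectorized) form, which is adequate for small $k$. The helper names are hypothetical; a dedicated Lyapunov solver would be preferable for larger systems. Note that $a - qA$ has $-q$ in its first column and ones on the superdiagonal, so its characteristic polynomial is exactly $p_k(u,q)$.

```python
import numpy as np


def model_matrices(k):
    """a: (k+1)x(k+1) with ones on the superdiagonal; A = (1, 0, ..., 0); b = (0, ..., 0, 1)^T."""
    a = np.eye(k + 1, k=1)
    A = np.zeros((1, k + 1))
    A[0, 0] = 1.0
    b = np.zeros((k + 1, 1))
    b[-1, 0] = 1.0
    return a, A, b


def solve_lyapunov(F, W):
    """Solve F P + P F^T + W = 0 through its Kronecker (vectorised) form; adequate for small k."""
    m = F.shape[0]
    I = np.eye(m)
    K = np.kron(F, I) + np.kron(I, F)
    return np.linalg.solve(K, -W.reshape(-1)).reshape(m, m)


def risk_bound(q, L, sigma):
    """C(q) = trace(P(q) + M(q) M(q)^T) with M(q) = L (a - qA)^{-1} b and
    (a - qA) P + P (a - qA)^T + sigma^2 q q^T = 0."""
    q = np.asarray(q, dtype=float).reshape(-1, 1)
    a, A, b = model_matrices(q.shape[0] - 1)
    F = a - q @ A
    M = L * np.linalg.solve(F, b)
    P = solve_lyapunov(F, sigma**2 * (q @ q.T))
    return float(np.trace(P + M @ M.T))


print(risk_bound([3.0], L=2.0, sigma=0.5))   # k = 0: equals sigma^2 q / 2 + L^2 / q^2
```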
Conditional minimization

A direct minimization of $C(q)$ is of little use: a computer implementation is rather heavy, and even if $q^\circ = \operatorname*{argmin}_q C(q)$ is found, the main requirement, expressed in terms of the roots of the polynomial $p_k(u,q)$, might not be satisfied (numerical computations show this). So some kind of conditional minimization procedure in the vector $q$ is desirable. The main tool for such a minimization is an adaptation of Kalman filter design.
Kalman filter design

Following Bar-Shalom's idea, set
$$f^{(k)}(t_i) = f^{(k)}(t_{i-1}) + n^{-\frac{k+2}{2(k+1)+1}}\,\gamma\,\eta_i,$$
$$X_i = f^{(0)}(t_{i-1}) + \sigma\xi_i,$$
where $(\eta_i)$ is a white noise independent of $(\xi_i)$ with $E\eta_1 = 0$, $E\eta_1^2 = 1$, and $\gamma$ is a free parameter.

For any $\gamma \ne 0$, the Kalman filter has an asymptotic form as $n \to \infty$ and, when applied to the original function $f(t)$, guarantees the optimal rate in $n$ for the estimation risk. In other words, this Kalman filter coincides with our proposed filter. The remarkable fact is that $q = q(\gamma)$ and, for any positive $\gamma$, the roots of the polynomial $p_k(u, q(\gamma))$ are distinct and have negative real parts.
Thus
$$q(\gamma) = \frac{Q(\gamma)A^*}{\sigma^2},$$
with $Q(\gamma)$ the solution of the algebraic Riccati equation
$$aQ(\gamma) + Q(\gamma)a^* + \gamma^2 bb^* - \frac{Q(\gamma)A^*AQ(\gamma)}{\sigma^2} = 0.$$
This equation has a unique positive-definite solution since the block matrices
$$G_1 = \begin{pmatrix} A\\ Aa\\ \vdots\\ Aa^k\end{pmatrix} \quad\text{and}\quad G_2 = \begin{pmatrix} bb^* & abb^* & \cdots & a^kbb^*\end{pmatrix}$$
have full rank (the so-called observability and controllability conditions).
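A NumPy-only sketch of $q(\gamma)$. The stabilizing solution of the Riccati equation is taken from the stable invariant subspace of the associated Hamiltonian matrix. This is a standard method we substitute here, since the slides do not say how $Q(\gamma)$ was computed; `scipy.linalg.solve_continuous_are` would be a more robust alternative. The helper names are hypothetical. The sketch also checks the rank conditions on $G_1$, $G_2$ and that the eigenvalues of $a - q(\gamma)A$, which are the roots of $p_k(u, q(\gamma))$, have negative real parts.

```python
import numpy as np


def model_matrices(k):
    a = np.eye(k + 1, k=1)        # ones on the superdiagonal
    A = np.zeros((1, k + 1))
    A[0, 0] = 1.0                 # observation row (1, 0, ..., 0)
    b = np.zeros((k + 1, 1))
    b[-1, 0] = 1.0                # noise enters the k-th derivative only
    return a, A, b


def riccati_gain(k, gamma, sigma):
    """Stabilising Q of a Q + Q a^T + gamma^2 b b^T - Q A^T A Q / sigma^2 = 0
    and q = Q A^T / sigma^2, via the stable invariant subspace of the Hamiltonian matrix."""
    a, A, b = model_matrices(k)
    m = k + 1
    Z = np.block([[a.T, -(A.T @ A) / sigma**2],
                  [-gamma**2 * (b @ b.T), -a]])
    w, V = np.linalg.eig(Z)
    S = V[:, w.real < 0]          # m columns spanning the stable invariant subspace
    Q = np.real(S[m:] @ np.linalg.inv(S[:m]))
    Q = (Q + Q.T) / 2             # remove round-off asymmetry
    return Q, (Q @ A.T).ravel() / sigma**2


def full_rank_conditions(k):
    """Ranks of G1 = [A; Aa; ...; Aa^k] and G2 = [bb^T, a bb^T, ..., a^k bb^T]."""
    a, A, b = model_matrices(k)
    powers = [np.linalg.matrix_power(a, j) for j in range(k + 1)]
    G1 = np.vstack([A @ P for P in powers])
    G2 = np.hstack([P @ b @ b.T for P in powers])
    return np.linalg.matrix_rank(G1), np.linalg.matrix_rank(G2)


Q, q = riccati_gain(2, gamma=1.0, sigma=1.0)
a, A, _ = model_matrices(2)
print(q, np.linalg.eigvals(a - np.outer(q, A)))
```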
C(q(γ))-minimization

We reduce the minimization of $C(q)$ with respect to the vector $q$ to the minimization of $C(q(\gamma))$ with respect to a positive parameter $\gamma$.

[Figure: $C(q(\gamma))$ versus $\gamma$ in logarithmic scale for $k = 2$, in three panels ($\sigma = 0.25$, $\sigma = 1$, $\sigma = 4$), each with curves for $L = 1, 10, 100$.]

For each curve, the figure legend reports the following $\gamma$ and $C$:
$$\begin{array}{rrrr} L & \sigma & \gamma & C\\ \hline 1 & 0.25 & 0.74082 & 5.278\\ 10 & 0.25 & 4.4817 & 102.1574\\ 100 & 0.25 & 24.5325 & 2695.7356\\ 1 & 1 & 1 & 18.8975\\ 10 & 1 & 6.0496 & 260.7145\\ 100 & 1 & 33.1155 & 5839.3352\\ 1 & 4 & 1.3499 & 92.0461\\ 10 & 4 & 8.1662 & 787.7197\\ 100 & 4 & 49.4024 & 13850.9423 \end{array}$$
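A sketch of the one-dimensional search for $k = 2$. It assumes the explicit form $q_j(\gamma) = U_{0j}(\gamma/\sigma)^{(j+1)/(k+1)}$ with $(U_{00}, U_{01}, U_{02}) = (2, 2, 1)$, as tabulated later in the slides. The $\gamma$ values in the legend all appear to be of the form $e^{m/10}$ (for example $e^{-0.3} \approx 0.74082$ and $e^{3.2} \approx 24.5325$), which suggests a grid search in $\log\gamma$. The grid below mimics that, but its range is our assumption, and we do not claim the sketch reproduces the plotted values.

```python
import numpy as np

U0 = np.array([2.0, 2.0, 1.0])  # (U_00, U_01, U_02) for k = 2, from the table of U entries


def q_of_gamma(gamma, sigma):
    """q_j(gamma) = U_0j (gamma / sigma)^{(j+1)/(k+1)} for k = 2."""
    j = np.arange(3)
    return U0 * (gamma / sigma) ** ((j + 1) / 3)


def risk_bound(q, L, sigma):
    """C(q) = trace(P(q) + M(q) M(q)^T), with P(q) from the Lyapunov equation."""
    m = q.size
    F = np.eye(m, k=1)
    F[:, 0] -= q                          # a - qA: qA only touches the first column
    b = np.zeros(m)
    b[-1] = 1.0
    M = L * np.linalg.solve(F, b)
    I = np.eye(m)
    K = np.kron(F, I) + np.kron(I, F)     # vectorised operator P -> F P + P F^T
    P = np.linalg.solve(K, -sigma**2 * np.outer(q, q).ravel()).reshape(m, m)
    return float(np.trace(P) + M @ M)


GRID = np.exp(np.arange(-50, 101) / 10)   # gamma = e^{m/10}, m = -50..100 (assumed range)


def minimise_over_gamma(L, sigma, grid=GRID):
    """Grid search of gamma -> C(q(gamma)); returns (gamma*, C(q(gamma*)))."""
    values = np.array([risk_bound(q_of_gamma(g, sigma), L, sigma) for g in grid])
    i = int(np.argmin(values))
    return float(grid[i]), float(values[i])


for L in (1.0, 10.0, 100.0):
    print(L, minimise_over_gamma(L, sigma=0.25))
```

Because every $\gamma > 0$ gives stable roots, the search never leaves the admissible set; that is the point of parametrizing $q$ by $\gamma$.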
Explicit minimization procedure

The entries of $Q(\gamma)$ admit the representation
$$Q_{ij}(\gamma,\sigma) = \sigma^2 U_{ij}\Bigl(\frac{\gamma}{\sigma}\Bigr)^{\frac{i+j+1}{k+1}}, \qquad i,j = 0,1,\dots,k,$$
where $U_{ij}$ are the entries of the matrix $U$, the solution of the algebraic Riccati equation free of $\sigma$ and $\gamma$:
$$aU + Ua^* + bb^* - UA^*AU = 0.$$
We have
$$q_0(\gamma) = U_{00}\Bigl(\frac{\gamma}{\sigma}\Bigr)^{1/(k+1)}, \quad q_1(\gamma) = U_{01}\Bigl(\frac{\gamma}{\sigma}\Bigr)^{2/(k+1)}, \quad \dots, \quad q_k(\gamma) = U_{0k}\,\frac{\gamma}{\sigma}.$$
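For $k = 0, \dots, 4$:
$$\begin{array}{c|ccccc} k & U_{00} & U_{01} & U_{02} & U_{03} & U_{04}\\ \hline 0 & 1 & \text{NA} & \text{NA} & \text{NA} & \text{NA}\\ 1 & \sqrt2 & 1 & \text{NA} & \text{NA} & \text{NA}\\ 2 & 2 & 2 & 1 & \text{NA} & \text{NA}\\ 3 & \sqrt{4+\sqrt8} & 2+\sqrt2 & \sqrt{4+\sqrt8} & 1 & \text{NA}\\ 4 & 1+\sqrt5 & 3+\sqrt5 & 3+\sqrt5 & 1+\sqrt5 & 1 \end{array}$$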
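A numerical check of the table and of the scaling representation, as a sketch. $U$ is computed as the stabilizing solution of the $\sigma$- and $\gamma$-free Riccati equation (Hamiltonian method, our choice), and its first row is compared with the closed forms in the table. $Q(\gamma)$ rebuilt from $U$ is then compared with a direct solve. Helper names are hypothetical.

```python
import numpy as np


def model_matrices(k):
    a = np.eye(k + 1, k=1)                          # ones on the superdiagonal
    A = np.zeros((1, k + 1))
    A[0, 0] = 1.0                                   # (1, 0, ..., 0)
    b = np.zeros((k + 1, 1))
    b[-1, 0] = 1.0                                  # (0, ..., 0, 1)^T
    return a, A, b


def riccati_Q(k, gamma, sigma):
    """Stabilising Q of a Q + Q a^T + gamma^2 b b^T - Q A^T A Q / sigma^2 = 0 (Hamiltonian method)."""
    a, A, b = model_matrices(k)
    m = k + 1
    Z = np.block([[a.T, -(A.T @ A) / sigma**2], [-gamma**2 * (b @ b.T), -a]])
    w, V = np.linalg.eig(Z)
    S = V[:, w.real < 0]                            # basis of the stable invariant subspace
    Q = np.real(S[m:] @ np.linalg.inv(S[:m]))
    return (Q + Q.T) / 2


def Q_from_U(U, gamma, sigma):
    """Q_ij = sigma^2 U_ij (gamma / sigma)^{(i+j+1)/(k+1)}."""
    m = U.shape[0]
    i, j = np.indices((m, m))
    return sigma**2 * U * (gamma / sigma) ** ((i + j + 1) / m)


r2, r5, r8 = np.sqrt(2.0), np.sqrt(5.0), np.sqrt(8.0)
TABLE = {                                           # first rows (U_00, ..., U_0k) from the table
    0: [1.0],
    1: [r2, 1.0],
    2: [2.0, 2.0, 1.0],
    3: [np.sqrt(4 + r8), 2 + r2, np.sqrt(4 + r8), 1.0],
    4: [1 + r5, 3 + r5, 3 + r5, 1 + r5, 1.0],
}

for k, row in TABLE.items():
    U = riccati_Q(k, gamma=1.0, sigma=1.0)          # with gamma = sigma = 1 this is the U-equation
    print(k, np.round(U[0], 4), np.allclose(U[0], row))
```

Since $U$ does not depend on $\sigma$ or $\gamma$, it needs to be computed only once per $k$; every $q(\gamma)$ then follows by rescaling.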
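Roots of $p_k(u,q)$
$$\begin{aligned} k=0:&\quad -\frac{\gamma}{\sigma};\\ k=1:&\quad -\Bigl(\frac{\gamma}{\sigma}\Bigr)^{1/2}\Bigl(\frac{1}{\sqrt2} \pm i\,\frac{1}{\sqrt2}\Bigr);\\ k=2:&\quad -\Bigl(\frac{\gamma}{\sigma}\Bigr)^{1/3}\Bigl\{1;\ \frac12 \pm i\,\frac{\sqrt3}{2}\Bigr\};\\ k=3:&\quad -\Bigl(\frac{\gamma}{\sigma}\Bigr)^{1/4}\bigl\{0.924 \pm i\,0.383;\ 0.383 \pm i\,0.924\bigr\};\\ k=4:&\quad -\Bigl(\frac{\gamma}{\sigma}\Bigr)^{1/5}\bigl\{1;\ 0.809 \pm i\,0.588;\ 0.309 \pm i\,0.951\bigr\}. \end{aligned}$$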
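The listed roots coincide with the poles of the normalized Butterworth polynomial of degree $k+1$, namely $e^{i\pi(2m+k)/(2(k+1))}$, $m = 1, \dots, k+1$, scaled by $(\gamma/\sigma)^{1/(k+1)}$. This is our observation rather than a claim made in the slides. The sketch below checks it numerically from the tabulated $U_{0j}$; the function names and the test values of $\gamma$, $\sigma$ are illustrative.

```python
import numpy as np

s2, s5 = np.sqrt(2.0), np.sqrt(5.0)
U0 = {0: [1.0], 1: [s2, 1.0], 2: [2.0, 2.0, 1.0],
      3: [np.sqrt(4 + np.sqrt(8.0)), 2 + s2, np.sqrt(4 + np.sqrt(8.0)), 1.0],
      4: [1 + s5, 3 + s5, 3 + s5, 1 + s5, 1.0]}


def roots_p(k, gamma, sigma):
    """Roots of p_k(u, q(gamma)) = u^{k+1} + q_0 u^k + ... + q_k, q_j = U_0j (gamma/sigma)^{(j+1)/(k+1)}."""
    r = gamma / sigma
    q = [U0[k][j] * r ** ((j + 1) / (k + 1)) for j in range(k + 1)]
    return np.roots([1.0] + q)


def butterworth_roots(k, gamma, sigma):
    """(gamma/sigma)^{1/(k+1)} * exp(i pi (2m + k) / (2(k+1))), m = 1..k+1."""
    m = np.arange(1, k + 2)
    return (gamma / sigma) ** (1 / (k + 1)) * np.exp(1j * np.pi * (2 * m + k) / (2 * (k + 1)))


for k in range(5):
    print(k, np.round(roots_p(k, gamma=1.0, sigma=1.0), 3))
```

Roots are matched by nearest distance rather than by sorting, because conjugate pairs with numerically equal real parts can sort in different orders.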
Example 2

$k = 2$, $L = 100$, $\sigma = 0.25$:
$$\begin{aligned} \hat f^{(0)}(t_i) &= \hat f^{(0)}(t_{i-1}) + \frac1n\,\hat f^{(1)}(t_{i-1}) + \frac{9.225}{n^{6/7}}\bigl(X_i - \hat f^{(0)}(t_{i-1})\bigr),\\ \hat f^{(1)}(t_i) &= \hat f^{(1)}(t_{i-1}) + \frac1n\,\hat f^{(2)}(t_{i-1}) + \frac{42.550}{n^{5/7}}\bigl(X_i - \hat f^{(0)}(t_{i-1})\bigr),\\ \hat f^{(2)}(t_i) &= \hat f^{(2)}(t_{i-1}) + \frac{98.132}{n^{4/7}}\bigl(X_i - \hat f^{(0)}(t_{i-1})\bigr). \end{aligned}$$
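The gains of Example 2 can be recovered from the explicit representation, as a sketch. It assumes $\gamma = 24.5325$ (the legend value for $L = 100$, $\sigma = 0.25$) and $(U_{00}, U_{01}, U_{02}) = (2, 2, 1)$. The computed $q$ agrees with $9.225$, $42.550$ and $98.132$ to within about $0.002$. The value of $n$ is an arbitrary illustration.

```python
import numpy as np

k, alpha = 2, 1
sigma, gamma = 0.25, 24.5325          # gamma for L = 100, sigma = 0.25 (from the C(q(gamma)) legend)
U0 = np.array([2.0, 2.0, 1.0])        # U_00, U_01, U_02 for k = 2

j = np.arange(k + 1)
q = U0 * (gamma / sigma) ** ((j + 1) / (k + 1))             # q_j(gamma)
exponents = (2 * (k + alpha) - j) / (2 * (k + alpha) + 1)   # 6/7, 5/7, 4/7
print(np.round(q, 3), exponents * 7)

n = 10_000
print(q / n ** exponents)             # per-step gains multiplying X_i - hat f^{(0)}(t_{i-1})
```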