1. On-line estimation of a smooth regression function

R. Liptser, jointly with L. Goldentyer
Tel Aviv University, Dept. of Electrical Engineering-Systems
December 19, 2002

2. SETTING

We consider a tracking problem for a smooth function $f = f(t)$, $0 \le t \le T$, under observations
$$X_{in} = f(t_{in}) + \sigma \xi_i, \qquad t_{in} = \frac{i}{n}, \quad n \text{ is large};$$
- $(\xi_i)$ is i.i.d., $E\xi_i = 0$, $E\xi_i^2 = 1$;
- $\sigma^2$ is a positive constant.

Without additional assumptions on $f$, it is difficult to create an estimator even when $n$ is large.

3. Main assumption

$f$ is $k$-times differentiable and the highest-order derivative is Lipschitz continuous.

Filtering approach (Bar-Shalom and Li): simulate $f^{(k)}(t)$ with the help of WHITE NOISE:
$$\frac{d}{dt} f^{(k)}(t) = \text{``white noise''};$$
it sounds like nonsense but works pretty well.

Nonparametric statistics approach: $f \in \Sigma(k, \alpha, L)$, the Stone-Ibragimov-Khasminskii class containing $k$-times differentiable functions with
$$\big| f^{(k)}(t'') - f^{(k)}(t') \big| \le L |t'' - t'|^\alpha, \qquad 0 < \alpha \le 1.$$

4. Task: to combine both approaches

Since the quality of estimation depends on $n$, any estimate of $f$ is marked by $n$; that is, $\hat f^{(j)}_n(t)$ are estimates of $f^{(j)}(t)$, $j = 0, 1, \ldots, k$, respectively.

It is known from Ibragimov and Khasminskii that for a wide class of losses $L$
$$\sup_{f \in \Sigma(k,\alpha,L)} E\, L\Big( n^{\frac{k+\alpha-j}{2(k+\alpha)+1}} \big\| \hat f^{(j)}_n - f^{(j)} \big\|_{L_p} \Big) < C,$$
and
$$n^{-\frac{k+\alpha-j}{2(k+\alpha)+1}}, \qquad j = 0, 1, \ldots, k,$$
is the best rate, uniformly over the class, of convergence of the estimation risk to zero as $n \to \infty$.

5. In particular, the risks
$$E\big( \hat f^{(j)}_n(t) - f^{(j)}(t) \big)^2, \qquad j = 0, 1, \ldots, k,$$
have the same rates in $n$:
$$\sup_{f \in \Sigma(k,\alpha,L)} \overline{\lim_n}\; n^{\frac{2(k+\alpha-j)}{2(k+\alpha)+1}} E\big| \hat f^{(j)}_n(t) - f^{(j)}(t) \big|^2 < C.$$
These rates cannot be exceeded uniformly on any nonempty open subset of $(0, T)$.

Jointly with Khasminskii, we realize an on-line filter guaranteeing the optimal rates in $n$.

6. Here $t_{in}$ and $\hat f^{(j)}_n(t_{in})$ are identified with $t_i$ and $\hat f^{(j)}(t_i)$.

For $j = 0, 1, \ldots, k-1$,
$$\hat f^{(j)}(t_i) = \hat f^{(j)}(t_{i-1}) + \frac{1}{n} \hat f^{(j+1)}(t_{i-1}) + \frac{q_j}{n^{\frac{2(k+\alpha)-j}{2(k+\alpha)+1}}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big),$$
and for $j = k$,
$$\hat f^{(k)}(t_i) = \hat f^{(k)}(t_{i-1}) + \frac{q_k}{n^{\frac{2(k+\alpha)-k}{2(k+\alpha)+1}}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big).$$
The vector $q$ with entries $q_0, \ldots, q_k$ has to be chosen such that all roots of the characteristic polynomial
$$p_k(u, q) = u^{k+1} + q_0 u^k + q_1 u^{k-1} + \ldots + q_{k-1} u + q_k$$
are distinct and have negative real parts.
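A minimal sketch of this recursion in Python, assuming $\alpha = 1$, a uniform grid $t_i = i/n$, and a user-supplied gain vector $q$ (the function name `track` is ours):

```python
import numpy as np

def track(X, q, k, alpha=1.0, f0=None):
    """On-line tracking recursion for estimates of f^(0), ..., f^(k).

    X  : observations X_i = f(i/n) + sigma * xi_i, i = 1, ..., n
    q  : gains (q_0, ..., q_k); all roots of p_k(u, q) are assumed
         distinct with negative real parts
    f0 : initial estimates (hat f^(j)(0), j = 0..k); zeros by default
    Returns an (n+1) x (k+1) array whose row i holds hat f^(j)(t_i).
    """
    n = len(X)
    beta = k + alpha
    # gain_j = q_j / n^{(2*beta - j)/(2*beta + 1)}, as on the slide
    gains = np.array([q[j] / n ** ((2 * beta - j) / (2 * beta + 1))
                      for j in range(k + 1)])
    F = np.zeros((n + 1, k + 1))
    if f0 is not None:
        F[0] = f0
    for i in range(1, n + 1):
        prev = F[i - 1]
        innov = X[i - 1] - prev[0]                     # X_i - hat f^(0)(t_{i-1})
        for j in range(k + 1):
            drift = prev[j + 1] / n if j < k else 0.0  # Taylor-chain drift term
            F[i, j] = prev[j] + drift + gains[j] * innov
    return F
```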

7. Two problems

1. Choice of appropriate initial conditions
$$\hat f^{(0)}(0), \hat f^{(1)}(0), \ldots, \hat f^{(k)}(0)$$
to minimize the boundary layer.

2. Choice of the vector $q$ such that the assumption about the roots of the polynomial $p_k(u, q)$ remains valid and
$$C(q) \ge \sup_{f \in \Sigma(k,\alpha,L)} n^{\frac{2(k+\alpha-j)}{2(k+\alpha)+1}} E\big| \hat f^{(j)}(t) - f^{(j)}(t) \big|^2$$
is as small as possible.

To manage these problems we restrict ourselves to $\alpha = 1$.

8. Boundary layer

A left-side boundary layer of width
$$c(q)\, n^{-\frac{1}{2\beta+1}} \log n$$
(with $\beta = k + \alpha$), where the optimal rates in $n$ might be lost, is inevitable. This boundary layer is due to the on-line limitations of the above tracking system.

One can readily suggest an off-line modification with the same recursion in backward time, subject to boundary conditions independent of the observations $X_i$. This modification obeys a right-side boundary layer.

So, a combination of the forward- and backward-time tracking algorithms allows one to support the optimal rate in $n$ on all of $[0, T]$.

9. Suitable choice of q

The vector $q$ should satisfy multiple requirements regarding
- $C(q)$, the upper bound for the normalized risk;
- $c(q)$, the parameter of the boundary layer;
- the roots of the polynomial $p_k(u, q)$.

These requirements might contradict each other.

10. Example 1, $\Sigma(0, 1, L)$

The worst case is $f(t) = f(0) \pm Lt$. Applying the Arzelà-Ascoli theorem we find that
$$C(q) = \frac{\sigma q}{2} + \frac{L^2 \sigma^2}{q^2}$$
and
$$q^\circ := \operatorname*{argmin}_{q > 0} C(q) = (2L)^{2/3} \sigma^{1/3}.$$
Hence, a reasonable estimator is
$$\hat f(t_i) = \hat f(t_{i-1}) + \Big( \frac{2L}{n\sigma} \Big)^{2/3} \big( X_i - \hat f(t_{i-1}) \big).$$
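A sketch of this $k = 0$ estimator on simulated data (the test function, the seed, and the value of $n$ are illustrative; $L$ is set to the Lipschitz constant of the test function):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 10_000, 0.25
L = 2 * np.pi                               # Lipschitz constant of sin(2*pi*t)
t = np.arange(1, n + 1) / n
f = np.sin(2 * np.pi * t)                   # f belongs to Sigma(0, 1, L)
X = f + sigma * rng.standard_normal(n)      # X_i = f(t_i) + sigma * xi_i

gain = (2 * L / (n * sigma)) ** (2 / 3)     # the gain from the slide
fhat = np.zeros(n + 1)
for i in range(1, n + 1):
    fhat[i] = fhat[i - 1] + gain * (X[i - 1] - fhat[i - 1])
print(np.mean((fhat[1:] - f) ** 2))         # empirical squared risk
```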

11. General case, $\Sigma(k > 0, 1, L)$

With the worst $f(t)$ such that
$$f^{(k)}(t) = f^{(k)}(0) \pm Lt,$$
applying the Arzelà-Ascoli theorem we find
$$C(q) = \operatorname{trace}\big( P(q) + M(q) M^*(q) \big),$$
where $M(q) = L (a - qA)^{-1} b$ and $P(q)$ solves the Lyapunov equation
$$(a - qA) P(q) + P(q) (a - qA)^* + \sigma^2 q q^* = 0.$$

12. Here,
$$a = \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ 0 & 0 & 0 & \ldots & 0 \end{pmatrix}_{(k+1) \times (k+1)},$$
$$A = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}_{1 \times (k+1)}, \qquad b = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}_{(k+1) \times 1}.$$
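With $a$, $A$, and $b$ in place, $C(q)$ from the previous slide can be evaluated numerically. A sketch using SciPy's Lyapunov solver (the helper names `model_matrices` and `risk_bound` are ours; the formulas are those of slide 11):

```python
import numpy as np
from scipy.linalg import solve_lyapunov      # solves F X + X F* = W

def model_matrices(k):
    a = np.diag(np.ones(k), 1)               # (k+1)x(k+1) shift matrix
    A = np.zeros((1, k + 1)); A[0, 0] = 1.0
    b = np.zeros((k + 1, 1)); b[-1, 0] = 1.0
    return a, A, b

def risk_bound(q, k, L, sigma):
    """C(q) = trace(P(q) + M(q) M(q)*) for the worst f in Sigma(k, 1, L)."""
    a, A, b = model_matrices(k)
    q = np.asarray(q, float).reshape(-1, 1)  # column vector, (k+1) x 1
    aq = a - q @ A
    # Lyapunov equation: (a - qA) P + P (a - qA)* = -sigma^2 q q*
    P = solve_lyapunov(aq, -sigma**2 * (q @ q.T))
    M = L * np.linalg.solve(aq, b)           # M(q) = L (a - qA)^{-1} b
    return float(np.trace(P + M @ M.T))
```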

13. Conditional minimization

A direct minimization of $C(q)$ is useless: a computer implementation is heavy enough, and even if
$$q^\circ = \operatorname*{argmin}_q C(q)$$
is found, the main requirement, expressed in terms of the roots of the polynomial $p_k(u, q)$, might not be satisfied (numerical computations show that).

So, some kind of conditional minimization procedure in the vector $q$ is desirable. The main tool for such minimization is an adaptation of the Kalman filter design.

14. Kalman filter design

In the frame of the Bar-Shalom idea, set
$$f^{(k)}(t_i) = f^{(k)}(t_{i-1}) + n^{-\frac{k+2}{2(k+1)+1}} \gamma \eta_i,$$
$$X_i = f^{(0)}(t_{i-1}) + \sigma \xi_i,$$
where $(\eta_i)$ is a white noise, independent of $(\xi_i)$, with $E\eta_1 = 0$, $E\eta_1^2 = 1$, and $\gamma$ is a free parameter.

For any $\gamma \ne 0$, the Kalman filter possesses an asymptotic form as $n \to \infty$ and, being applied to the original function $f(t)$, guarantees the optimal rate as $n \to \infty$ for the estimation risk. In other words, that Kalman filter coincides with our proposed filter.

The remarkable fact is that $q = q(\gamma)$ and, for any positive $\gamma$, the roots of the polynomial $p_k(u, q(\gamma))$ are distinct and have negative real parts.

15. Thus,
$$q(\gamma) = \frac{Q(\gamma) A^*}{\sigma^2},$$
with $Q(\gamma)$ being the solution of the algebraic Riccati equation
$$a Q(\gamma) + Q(\gamma) a^* + \gamma^2 b b^* - \frac{Q(\gamma) A^* A Q(\gamma)}{\sigma^2} = 0,$$
which possesses a unique positive-definite solution since the block matrices
$$G_1 = \begin{pmatrix} b b^* & a b b^* & \ldots & a^k b b^* \end{pmatrix} \qquad \text{and} \qquad G_2 = \begin{pmatrix} A \\ A a \\ \vdots \\ A a^k \end{pmatrix}$$
are of full rank (the so-called controllability and observability conditions, respectively).
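A sketch of computing $q(\gamma)$ with SciPy's continuous-time algebraic Riccati solver: with $X = Q(\gamma)$, the slide's equation is the standard form $F^*X + XF - XBR^{-1}B^*X + W = 0$ for $F = a^\top$, $B = A^\top$, $R = \sigma^2$, $W = \gamma^2 b b^\top$. The helper name `kalman_gain` is ours:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def kalman_gain(gamma, sigma, k):
    """q(gamma) = Q(gamma) A* / sigma^2 from the slide's Riccati equation."""
    a = np.diag(np.ones(k), 1)
    A = np.zeros((1, k + 1)); A[0, 0] = 1.0
    b = np.zeros((k + 1, 1)); b[-1, 0] = 1.0
    # solve_continuous_are(F, B, W, R) solves F^T X + X F - X B R^{-1} B^T X + W = 0;
    # F = a^T, B = A^T, R = sigma^2, W = gamma^2 b b^T recovers the slide's equation.
    Q = solve_continuous_are(a.T, A.T, gamma**2 * (b @ b.T), np.array([[sigma**2]]))
    return (Q @ A.T / sigma**2).ravel()
```

For $k = 2$, $\sigma = 0.25$, $\gamma = 24.5325$ this returns approximately $(9.225, 42.55, 98.13)$, the gains of Example 2 below.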

16. $C(q(\gamma))$-minimization

We reduce the minimization of $C(q)$ with respect to the vector $q$ to the minimization of $C(q(\gamma))$ with respect to a positive parameter $\gamma$.

[Figure: $C(q(\gamma))$ in logarithmic scale for $k = 2$; three panels for $\sigma = 0.25, 1, 4$, each with curves for $L = 1, 10, 100$.]

The minimizing $\gamma$ and the corresponding values of $C$:

  L     sigma   gamma      C
  1     0.25    0.74082    5.278
  10    0.25    4.4817     102.1574
  100   0.25    24.5325    2695.7356
  1     1       1          18.8975
  10    1       6.0496     260.7145
  100   1       33.1155    5839.3352
  1     4       1.3499     92.0461
  10    4       8.1662     787.7197
  100   4       49.4024    13850.9423

17. Explicit minimization procedure

Entries of $Q(\gamma)$ obey the following representation:
$$Q_{ij}(\gamma, \sigma) = U_{ij}\, \sigma^2 \Big( \frac{\gamma}{\sigma} \Big)^{\frac{i+j+1}{k+1}}, \qquad i, j = 0, 1, \ldots, k,$$
where $U_{ij}$ are the entries of the matrix $U$, itself the solution of the algebraic Riccati equation free of $\sigma$ and $\gamma$:
$$a U + U a^* + b b^* - U A^* A U = 0.$$
We have
$$q_0(\gamma) = U_{00} \Big( \frac{\gamma}{\sigma} \Big)^{1/(k+1)}, \quad q_1(\gamma) = U_{01} \Big( \frac{\gamma}{\sigma} \Big)^{2/(k+1)}, \quad \ldots, \quad q_k(\gamma) = U_{0k} \Big( \frac{\gamma}{\sigma} \Big).$$
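A sketch exploiting this scaling: solve the normalized equation once via the hypothetical `kalman_gain` helper above with $\sigma = \gamma = 1$ (its output is then $(U_{00}, \ldots, U_{0k})$, by the symmetry of the Riccati solution), and rescale:

```python
import numpy as np

def gains_from_U(gamma, sigma, k):
    """q_j(gamma) = U_{0j} (gamma/sigma)^{(j+1)/(k+1)}."""
    U_row0 = kalman_gain(1.0, 1.0, k)   # equals (U_00, ..., U_0k)
    j = np.arange(k + 1)
    return U_row0 * (gamma / sigma) ** ((j + 1) / (k + 1))
```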

18. For $k = 0, \ldots, 4$:

  k   U_00        U_01     U_02        U_03     U_04
  0   1           NA       NA          NA       NA
  1   √2          1        NA          NA       NA
  2   2           2        1           NA       NA
  3   √(4+√8)     2+√2     √(4+√8)     1        NA
  4   1+√5        3+√5     3+√5        1+√5     1

19. Roots of $p_k(u, q)$

$$k = 0: \quad -\frac{\gamma}{\sigma}$$
$$k = 1: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/2} \Big\{ \frac{1}{\sqrt{2}} \pm i \frac{1}{\sqrt{2}} \Big\}$$
$$k = 2: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/3} \Big\{ 1;\ \frac{1}{2} \pm i \frac{\sqrt{3}}{2} \Big\}$$
$$k = 3: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/4} \big\{ 0.924 \pm i\, 0.383;\ 0.383 \pm i\, 0.924 \big\}$$
$$k = 4: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/5} \big\{ 1;\ 0.809 \pm i\, 0.588;\ 0.309 \pm i\, 0.951 \big\}$$
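These roots can be checked numerically: $p_k(u, q)$ has coefficients $(1, q_0, \ldots, q_k)$, so for $\gamma = \sigma = 1$ the listed values are just the roots of the normalized polynomial. A sketch reusing the hypothetical `gains_from_U` helper:

```python
import numpy as np

for k in range(5):
    q = gains_from_U(1.0, 1.0, k)                    # gamma = sigma = 1
    roots = np.roots(np.concatenate(([1.0], q)))     # roots of p_k(u, q)
    print(k, np.round(np.sort_complex(roots), 3))
```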

20. Example 2

$k = 2$, $L = 100$, $\sigma = 0.25$:
$$\hat f^{(0)}(t_i) = \hat f^{(0)}(t_{i-1}) + \frac{1}{n} \hat f^{(1)}(t_{i-1}) + \frac{9.225}{n^{6/7}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big),$$
$$\hat f^{(1)}(t_i) = \hat f^{(1)}(t_{i-1}) + \frac{1}{n} \hat f^{(2)}(t_{i-1}) + \frac{42.550}{n^{5/7}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big),$$
$$\hat f^{(2)}(t_i) = \hat f^{(2)}(t_{i-1}) + \frac{98.132}{n^{4/7}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big).$$
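A sketch running this $k = 2$ filter via the hypothetical `track` helper from slide 6 (the test function and $n$ are illustrative; the function is chosen so that $f''$ has Lipschitz constant below $L = 100$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma = 100_000, 2, 0.25
t = np.arange(1, n + 1) / n
f = 0.4 * np.sin(2 * np.pi * t)             # |f'''| <= 0.4 * (2*pi)^3 < 100
X = f + sigma * rng.standard_normal(n)

q = np.array([9.225, 42.550, 98.132])       # the gains from the slide
F = track(X, q, k)                          # columns: hat f, hat f', hat f''
print(np.mean((F[1:, 0] - f) ** 2))         # empirical risk for hat f^(0)
```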
