Adaptive Regularization Algorithms in Learning Theory – Case Study: Prediction of Blood Glucose Level

Sergei V. Pereverzev, Sivananthan Sampath, Huajun Wang (RICAM, Austria)
Joint research with E. De Vito (Uni. Genova) and L. Rosasco (MIT, Boston).
Workshop "Inverse and Partial Information Problems", RICAM, Linz, October 2008.
Learning from examples

Vapnik (1995); Evgeniou, Pontil, Poggio (2000); Cucker, Smale (2001):
1) Two sets of variables $X \subset \mathbb{R}^d$, $Y \subset \mathbb{R}$ are related by a probabilistic relationship: each $x \in X$ is associated with an (unknown) probability distribution $\rho(\cdot\,|\,x)$ on $Y$.
2) Training data: $z = \{(x_1, y_1), \ldots, (x_n, y_n)\} \in (X \times Y)^n$.
The goal: provide an estimator $f = f_z : X \to Y$ to predict $y \in Y$ for any given $x \in X$.
EU project "DIAdvisor – diabetes adviser": glucose prediction using patient vital data.

1) Input: $x = x_i = (t_i, x_i^1, x_i^2, \ldots, x_i^{d-1}) \in \mathbb{R}^d$, where $x_i^k$, $k = 1, 2, \ldots, d-1$, are measurements of vital signs (e.g. glucose concentration, blood pH, temperature, ...) taken at time $t = t_i$, $i = 1, 2, \ldots, n$.
2) Output: $y$ is the blood glucose concentration at a time $t > t_n$ in the future.

State of the art (R. Gillis et al., Abstract 0415-P, 2007, Santa Barbara, CA): "With the estimator blinded to meals one can accurately (i.e. with an error less than 2 mmol/l) predict glucose levels 45 minutes into the future. This is a promising result..."
"The Uncertainty... It is rather a matter of Efficiency" (David Mumford, "The Mathematics of Perception")

If the blood glucose concentration is assumed to be a function $y = y(t, x^1, x^2, \ldots, x^{d-1}, x^d, \ldots)$, then the training data are
$(t_i, x_i^1, x_i^2, \ldots, x_i^{d-1})$, $y_i = y(t_i, x_i^1, x_i^2, \ldots, x_i^{d-1}, x_i^d, \ldots)$, $i = 1, 2, \ldots, n$.
In the first phase of "DIAdvisor" only the data $(t_i, y_i)$, $i = 1, 2, \ldots, n$, are available. The goal is to predict the value $y_m = y(t_m, \ldots)$ for $t_m > t_n$ with $t_m - t_n > 45$ minutes.
Statistical framework

1) $\rho_X(\cdot)$ is the (marginal) probability distribution on $X$ (also unknown).
2) Expected risk of an estimator $f : X \to Y$:
$\mathcal{E}(f) = \int_X \int_Y (f(x) - y)^2 \rho(y|x)\, \rho_X(x)\, dy\, dx$.
3) Regression function:
$f_\rho(x) = \operatorname{argmin}\{\mathcal{E}(f) : f \in L_2(X, \rho_X dx)\} = \int_Y y\, \rho(y|x)\, dy$.
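The identity $f_\rho(x) = \int_Y y\, \rho(y|x)\, dy$ says that, pointwise, the conditional mean minimizes the expected squared loss. A one-point numerical check (the conditional distribution below is a toy choice, assumed only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy rho(.|x) at one fixed x: y = 2 + Gaussian noise
ys = 2.0 + 0.3 * rng.standard_normal(200_000)

def risk(c):
    # E[(c - y)^2] under rho(.|x), estimated by Monte Carlo
    return np.mean((c - ys) ** 2)

c_mean = ys.mean()
# the conditional mean beats any other constant prediction at this x
print(risk(c_mean) <= min(risk(c_mean - 0.1), risk(c_mean + 0.1)))  # True
```

The sample mean minimizes the sample squared error exactly, so the comparison holds deterministically, not just up to Monte Carlo noise.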
Hypothesis space and Target function

1) $\mathcal{H}$ is a Hilbert space; $J : \mathcal{H} \hookrightarrow L_2(X, \rho_X dx)$ is the compact embedding.
2) $f_{\mathcal{H}} = \operatorname{argmin}\{\mathcal{E}(f) : f \in \mathcal{H}\} = \operatorname{argmin}\{\|f - f_\rho\|_\rho : f \in \mathcal{H}\}$, since
$\mathcal{E}(f) = \|f - f_\rho\|_\rho^2 + \mathcal{E}(f_\rho)$, where $\|\cdot\|_\rho = \|\cdot\|_{L_2(X, \rho_X dx)}$.
For all $f \in \mathcal{H}$, $\|f - f_\rho\|_\rho = \|Jf - f_\rho\|_\rho$; with the adjoint $J^* : L_2(X, \rho_X dx) \to \mathcal{H}$, the target $f_{\mathcal{H}}$ solves the normal equation $J^* J f = J^* f_\rho$.
Picard criterion and Source conditions

$L = J J^* = \sum_{i=1}^\infty t_i \langle \cdot, l_i \rangle_\rho\, l_i$, \quad $T = J^* J = \sum_{i=1}^\infty t_i \langle \cdot, e_i \rangle_{\mathcal{H}}\, e_i$.

Picard criterion:
$f_{\mathcal{H}} = \sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho}{\sqrt{t_i}}\, e_i \in \mathcal{H} \iff \sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho^2}{t_i} < \infty$.

Source condition: there exists $\varphi : [0, t_1] \to \mathbb{R}_+$, $\varphi(0) = 0$, $\varphi$ increasing, such that
$\sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho^2}{t_i\, \varphi^2(t_i)} < \infty$.
Then $v = \sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho}{\sqrt{t_i}\, \varphi(t_i)}\, e_i \in \mathcal{H}$ and $f_{\mathcal{H}} = \varphi(T)\, v$.

$\mathcal{H}_\varphi = \{ f \in \mathcal{H} : f = \varphi(T)\, v,\ v \in \mathcal{H} \}$.
Reproducing Kernel Hilbert Space $\mathcal{H} = \mathcal{H}_K$

1) $K : X \times X \to \mathbb{R}$ is continuous, symmetric, positive semidefinite; $K_x = K(x, \cdot)$.
2) $\mathcal{H}_K^0 = \{ f : f = \sum_{j=1}^r c_j K_{x_j} \}$, $K_{x_j} = K(x_j, \cdot)$.
3) $\langle f, g \rangle_K = \langle \sum_{j=1}^r c_j K_{x_j}, \sum_{i=1}^s d_i K_{t_i} \rangle_K := \sum_{j=1}^r \sum_{i=1}^s c_j d_i K(x_j, t_i)$.
4) $\mathcal{H}_K$ is the completion of $\mathcal{H}_K^0$ w.r.t. $\|\cdot\|_K$.
Reproducing property: $\forall f \in \mathcal{H}_K$, $f(x) = \langle K_x, f \rangle_K$.
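As a concrete illustration, here is a small numerical sketch (a Gaussian kernel with made-up nodes and coefficients, chosen only for the example) that builds a function in the finite span of kernel sections, evaluates its RKHS norm via the Gram matrix, and checks the reproducing property:

```python
import numpy as np

def K(s, t, sigma=1.0):
    # Gaussian kernel: continuous, symmetric, positive semidefinite
    return np.exp(-(s - t) ** 2 / (2 * sigma ** 2))

# f = sum_j c_j K_{x_j}, a member of the finite span of kernel sections
x_nodes = np.array([0.0, 0.5, 1.0])
c = np.array([1.0, -2.0, 0.5])

def f(x):
    return c @ K(x_nodes, x)

# Gram matrix G_{ji} = K(x_j, x_i); RKHS norm ||f||_K^2 = c^T G c
G = K(x_nodes[:, None], x_nodes[None, :])
norm_K_sq = c @ G @ c

# reproducing property: f(x) = <K_x, f>_K = sum_j c_j K(x, x_j)
x = 0.3
print(np.isclose(f(x), c @ K(x, x_nodes)))  # True
print(norm_K_sq > 0)                        # True, since G is positive definite here
```

The inner product in item 3) reduces exactly to the quadratic form $c^\top G c$, which is why positive semidefiniteness of $K$ makes $\|\cdot\|_K$ a genuine (semi)norm.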
Discrete version of the equation $J^* J f = J^* f_\rho$ for $J = J_{\mathcal{H}_K}$

$z = \{(x_i, y_i)\}_{i=1}^n$, $\mathbf{x} = (x_i)_{i=1}^n$, $\mathbf{y} = (y_i)_{i=1}^n \in \mathbb{R}^n$; $\langle u, v \rangle_{\mathbb{R}^n} = \frac{1}{n} \sum_{i=1}^n u_i v_i$.

Sampling operator $S_x : \mathcal{H}_K \to \mathbb{R}^n$, $S_x f = (f(x_i))_{i=1}^n$; its adjoint $S_x^* : \mathbb{R}^n \to \mathcal{H}_K$,
$S_x^* \mathbf{y} = \frac{1}{n} \sum_{i=1}^n y_i K_{x_i}$, \quad $T_x = S_x^* S_x = \frac{1}{n} \sum_{i=1}^n K_{x_i} \langle K_{x_i}, \cdot \rangle_K$.

$J_{\mathcal{H}_K} : \mathcal{H}_K \hookrightarrow L_2(X, \rho_X dx)$: the equation $J_{\mathcal{H}_K} f = f_\rho$ is discretized as $S_x f = \mathbf{y}$;
$T = J_{\mathcal{H}_K}^* J_{\mathcal{H}_K} : \mathcal{H}_K \to \mathcal{H}_K$: the equation $T f = J_{\mathcal{H}_K}^* f_\rho$ is discretized as $T_x f = S_x^* \mathbf{y}$.
Regularization of $T_x f = S_x^* \mathbf{y}$

Poggio et al. (2000), ..., Smale, Zhou (2005): Tikhonov regularization
$f_z^\lambda = \operatorname{argmin}\{ \frac{1}{n} \sum_{i=1}^n (f(x_i) - y_i)^2 + \lambda \|f\|_K^2 \} = (\lambda I + T_x)^{-1} S_x^* \mathbf{y}$.

General regularization scheme: $f_z^\lambda = g_\lambda(T_x) S_x^* \mathbf{y} = \sum_{i=1}^n \gamma_i K_{x_i}$, where $g_\lambda : [0, \|T_x\|] \to \mathbb{R}$ satisfies
1) $\sup_t |g_\lambda(t)| \le c_0 / \lambda$;
2) $\exists p : \forall \nu \in [0, p]$, $\sup_t |(1 - g_\lambda(t)\, t)\, t^\nu| \le c_p \lambda^\nu$.
For Tikhonov $g_\lambda(t) = (\lambda + t)^{-1}$ and $p = 1$.

Remark: De Vore et al. (2006), Maiorov (2006): $\lambda = 0$, $\mathcal{H}$ is a finite ball in a finite-dimensional space. Cortes, Vapnik (1995): other forms of the loss function $V(y_i, f(x_i))$.
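In coordinates, the Tikhonov estimator reduces to a linear solve for the coefficients $\gamma$: substituting $f = \sum_i \gamma_i K_{x_i}$ into $(\lambda I + T_x) f = S_x^* \mathbf{y}$ gives $(K + n\lambda I)\gamma = \mathbf{y}$, where $K$ is the kernel matrix. A minimal sketch (Gaussian kernel and synthetic data; all parameter values are illustrative):

```python
import numpy as np

def K(s, t, sigma=0.3):
    return np.exp(-(s - t) ** 2 / (2 * sigma ** 2))

# synthetic training data z = {(x_i, y_i)}
rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Tikhonov estimator f_z^lambda = (lambda I + T_x)^{-1} S_x^* y,
# in coordinates: gamma = (K + n*lambda*I)^{-1} y
lam = 1e-3
Kmat = K(x[:, None], x[None, :])
gamma = np.linalg.solve(Kmat + n * lam * np.eye(n), y)

def f_z(t):
    # f_z^lambda(t) = sum_i gamma_i K(x_i, t)
    return K(x, t) @ gamma

# mean squared residual on the training sample
resid = np.mean((Kmat @ gamma - y) ** 2)
```

The factor $n\lambda$ (rather than $\lambda$) appears because the empirical inner product on $\mathbb{R}^n$ carries the $1/n$ normalization; with the unnormalized inner product the solve would read $(K + \lambda I)\gamma = \mathbf{y}$.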
Basic Theorem

Assume:
1) $f_{\mathcal{H}_K} \in \mathcal{H}_\varphi$, $\varphi \in F_p$; $\kappa \ge \sup_{x \in X} \sqrt{K(x, x)}$;
2) $g_\lambda$ satisfies $\sup_t |(1 - g_\lambda(t)\, t)\, t^q| \le c \lambda^q$ for all $q \le p + 1/2$.

Then for $f_z^\lambda = g_\lambda(T_x) S_x^* \mathbf{y}$ with $\lambda \ge \frac{2\kappa}{\sqrt{n}} \log \frac{4\sqrt{2}}{h}$, with probability $1 - h$:
$\|f_{\mathcal{H}_K} - f_z^\lambda\|_\rho \le \left( c_1 \varphi(\lambda) \sqrt{\lambda} + \frac{c_2}{\sqrt{\lambda n}} \right) \log \frac{1}{h}$,
$\|f_{\mathcal{H}_K} - f_z^\lambda\|_K \le \left( c_3 \varphi(\lambda) + \frac{c_4}{\lambda \sqrt{n}} \right) \log \frac{1}{h}$.
A priori parameter choice

Th. 1. Let $\theta(t) = \varphi(t)\, t$, $f_{\mathcal{H}_K} \in \mathcal{H}_\varphi$. Under the assumptions of the Basic Theorem, for $\lambda_n = \theta^{-1}(n^{-1/2})$, with probability $1 - h$:
$\|f_{\mathcal{H}_K} - f_z^{\lambda_n}\|_\rho \le c\, \varphi(\theta^{-1}(n^{-1/2}))\, \sqrt{\theta^{-1}(n^{-1/2})}\, \log \frac{1}{h}$,
$\|f_{\mathcal{H}_K} - f_z^{\lambda_n}\|_K \le c\, \varphi(\theta^{-1}(n^{-1/2}))\, \log \frac{1}{h}$.

Remark 1: For $\varphi(t) = t^r$: $\|\cdot\|_\rho \sim n^{-\frac{2r+1}{4(r+1)}}$, $\|\cdot\|_K \sim n^{-\frac{r}{2(r+1)}}$.
Remark 2: Smale, Zhou (2005): $0 < r \le 1/2$; Caponnetto et al. (2005): $r > 1/2$, $\|\cdot\|_\rho \lesssim n^{-\frac{2r+1}{4(r+3/2)}}$.
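For the Hölder-type case $\varphi(t) = t^r$ the a priori choice has a closed form: $\theta(t) = t^{r+1}$, so $\lambda_n = n^{-1/(2(r+1))}$, and this value exactly balances the approximation term $\varphi(\lambda)\sqrt{\lambda}$ against the sample-error term $1/\sqrt{\lambda n}$. A quick numerical check (the values of $n$ and $r$ are arbitrary):

```python
import numpy as np

def lambda_a_priori(n, r):
    # theta(t) = phi(t)*t = t^(r+1)  =>  lambda_n = theta^{-1}(n^{-1/2})
    return n ** (-1.0 / (2 * (r + 1)))

n, r = 10_000, 0.5
lam = lambda_a_priori(n, r)

# lambda_n balances approximation against sample error
approx = lam ** r * np.sqrt(lam)   # phi(lambda)*sqrt(lambda)
noise = 1.0 / np.sqrt(lam * n)     # 1/sqrt(lambda*n)
print(np.isclose(approx, noise))   # True

# both terms equal the rate n^{-(2r+1)/(4(r+1))} from Remark 1
rate = n ** (-(2 * r + 1) / (4 * (r + 1)))
print(np.isclose(approx, rate))    # True
```

The algebra behind the check: with $\lambda^{r+1} = n^{-1/2}$, both $\varphi(\lambda)\sqrt{\lambda} = \lambda^{r+1/2}$ and $1/\sqrt{\lambda n} = \lambda^{-1/2} \lambda^{r+1}$ equal $n^{-(2r+1)/(4(r+1))}$.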
Regularization in the empirical norm

$\|f\|_{\{x_i\}}^2 := \frac{1}{n} \sum_{i=1}^n f^2(x_i)$.

Th. 2. For $f \in \mathcal{H}_K$, with probability $1 - h$:
$\left| \|f\|_\rho^2 - \|f\|_{\{x_i\}}^2 \right| \le \frac{c_1 \log \frac{1}{h}}{\sqrt{n}}\, \|f\|_K^2$.
Moreover, under the assumptions of the Basic Theorem, with the same probability
$\|f_{\mathcal{H}_K} - f_z^\lambda\|_{\{x_i\}} \le \left( c_5 \varphi(\lambda) \sqrt{\lambda} + \frac{c_6}{\sqrt{\lambda n}} \right) \log \frac{1}{h}$.
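A Monte Carlo illustration of the first inequality of Th. 2 (uniform $\rho_X$ on $[0,1]$ and a fixed $f \in \mathcal{H}_K$, both assumed for the example; the constant is not tracked, only the $1/\sqrt{n}$ decay of the gap between the two norms):

```python
import numpy as np

def K(s, t, sigma=0.5):
    return np.exp(-(s - t) ** 2 / (2 * sigma ** 2))

# a fixed f = sum_j c_j K_{x_j} in H_K
x_nodes = np.array([0.2, 0.5, 0.8])
c = np.array([1.0, -1.0, 0.5])
def f(x):
    return K(x[:, None], x_nodes[None, :]) @ c

# ||f||_rho^2 for rho_X uniform on [0,1], via a fine grid
grid = np.linspace(0, 1, 100_001)
norm_rho_sq = np.mean(f(grid) ** 2)

rng = np.random.default_rng(2)
def rms_gap(n, reps=50):
    # RMS over repetitions of | ||f||_rho^2 - ||f||_{x_i}^2 |
    g = [abs(norm_rho_sq - np.mean(f(rng.uniform(0, 1, n)) ** 2))
         for _ in range(reps)]
    return np.sqrt(np.mean(np.square(g)))

gap_small, gap_large = rms_gap(100), rms_gap(10_000)
# gap_large should be roughly 10x smaller (sqrt(10_000/100) = 10)
```

Since $f$ is fixed, $\|f\|_K^2$ is a constant here and the bound predicts the observed $n^{-1/2}$ shrinkage of the gap.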
Balancing Principle for Learning Theory

Candidate values: $\lambda_i = \lambda_0 q^i$, $i = 0, 1, \ldots, M$; $\lambda_0 = \frac{2\kappa}{\sqrt{n}} \log \frac{4\sqrt{2}}{h}$, $q > 1$; estimators $\{f_z^{\lambda_i}\}$.

$\lambda_{emp} = \max \{ \lambda_k : \|f_z^{\lambda_k} - f_z^{\lambda_j}\|_{\{x_i\}} \le \frac{4 c_6 \log \frac{1}{h}}{\sqrt{\lambda_j n}},\ j = 0, 1, \ldots, k-1 \}$,
$\lambda_{\mathcal{H}_K} = \max \{ \lambda_k : \|f_z^{\lambda_k} - f_z^{\lambda_j}\|_K \le \frac{4 c_4 \log \frac{1}{h}}{\lambda_j \sqrt{n}},\ j = 0, 1, \ldots, k-1 \}$.

Th. 3. Under the assumptions of the Basic Theorem, the choice $\lambda_+ = \min \{\lambda_{emp}, \lambda_{\mathcal{H}_K}\}$ guarantees the optimal order of the risk without knowledge of the function $\varphi$ generating the source condition.
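A sketch of the $\lambda_{emp}$ selection for the Tikhonov family on synthetic data. The constant C below is a stand-in for $4 c_6 \log(1/h)$ (the theorem's constants are not computed here), so this only illustrates the selection logic, not a calibrated implementation:

```python
import numpy as np

def K(s, t, sigma=0.3):
    return np.exp(-(s - t) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
n = 60
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)
Kmat = K(x[:, None], x[None, :])

# geometric grid lambda_i = lambda_0 * q^i and the Tikhonov estimators on it
lam0, q, M = 1e-5, 1.5, 30
lams = lam0 * q ** np.arange(M)
gammas = [np.linalg.solve(Kmat + n * lam * np.eye(n), y) for lam in lams]

def emp_norm(g):
    # empirical norm of f = sum_i g_i K_{x_i}: sqrt((1/n) sum_i f(x_i)^2)
    return np.sqrt(np.mean((Kmat @ g) ** 2))

C = 0.5  # illustrative stand-in for 4*c_6*log(1/h)
k_sel = 0
for k in range(M):
    # keep lambda_k while its estimator stays within the noise-level tube
    # around every estimator with a smaller regularization parameter
    if all(emp_norm(gammas[k] - gammas[j]) <= C / np.sqrt(lams[j] * n)
           for j in range(k)):
        k_sel = k
lam_emp = lams[k_sel]
```

$\lambda_{\mathcal{H}_K}$ would be computed analogously with $\|f\|_K^2 = \gamma^\top K \gamma$ in place of the empirical norm, and the final choice is $\lambda_+ = \min\{\lambda_{emp}, \lambda_{\mathcal{H}_K}\}$.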
Adaptive scheme (idea of the proof)

Both error bounds have the common form
$\|f_{\mathcal{H}_K} - f_z^\lambda\| \le c \left( \varphi(\lambda)\, \lambda^{1-v} + \frac{1}{\lambda^v \sqrt{n}} \right)$, with $v = \frac{1}{2}$ for $\|\cdot\|_{\{x_i\}}$ and $v = 1$ for $\|\cdot\|_K$.
For $\lambda_j \le \lambda_k$ below the point balancing the two terms, the triangle inequality gives
$\|f_z^{\lambda_k} - f_z^{\lambda_j}\| \le \|f_{\mathcal{H}_K} - f_z^{\lambda_k}\| + \|f_{\mathcal{H}_K} - f_z^{\lambda_j}\| \le \frac{4c}{\lambda_j^v \sqrt{n}}$.