Machine learning theory for time series
Exponential inequalities for nonstationary Markov chains

Pierre Alquier
CIMFAV seminar, January 16, 2019
Outline
- Short introduction to machine learning theory
- Machine learning and time series
- Machine learning & stationary time series
- Nonstationary Markov chains
Generic machine learning problem

Main ingredients:
- observations: (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)
  → usually i.i.d. from an unknown distribution P.
- a restricted set of predictors (f_θ, θ ∈ Θ)
  → f_θ(X) is meant to predict Y.
- a loss function ℓ
  → ℓ(y′ − y) is the loss incurred by predicting y′ while the truth is y.
- the risk R(θ)
  → R(θ) = E_{(X,Y)∼P}[ℓ(f_θ(X) − Y)]. Not observable.
- an empirical proxy r(θ) for R(θ)
  → for example r(θ) = (1/n) Σ_{i=1}^n ℓ(f_θ(X_i) − Y_i).
- the empirical risk minimizer θ̂
  → θ̂ = argmin_{θ ∈ Θ} r(θ).
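The ingredients above can be sketched in a few lines of code. This is a minimal illustration, not from the talk: the data-generating distribution P, the linear predictor class f_θ(x) = θx, and the squared loss are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: (X_i, Y_i) i.i.d. from an (in practice unknown) P,
# here Y = 2*X + Gaussian noise.
n = 200
X = rng.normal(size=n)
Y = 2.0 * X + rng.normal(scale=0.5, size=n)

# Predictors f_theta(x) = theta * x, loss l(u) = u**2.
def empirical_risk(theta):
    # r(theta) = (1/n) * sum_i l(f_theta(X_i) - Y_i)
    return np.mean((theta * X - Y) ** 2)

# Empirical risk minimizer over a grid discretizing Theta = [-5, 5].
thetas = np.linspace(-5, 5, 2001)
risks = np.array([empirical_risk(t) for t in thetas])
theta_hat = thetas[np.argmin(risks)]
print(theta_hat)  # should be close to the true slope 2
```

The risk R(θ) itself is not computable here, which is exactly the point: all the learner can minimize is the empirical proxy r(θ).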
Sub-gamma random variables

Definition
T is said to be sub-gamma iff there exist (v, w) such that for all k ≥ 2,
E[|T|^k] ≤ (k!/2) v w^{k−2}.

Examples:
- T ∼ Γ(a, b): the condition holds with (v, w) = (ab², b).
- any Z with P(|Z| ≥ t) ≤ P(|T| ≥ t) for a sub-gamma T: then E[|Z|^k] ≤ E[|T|^k] for all k, so Z is sub-gamma with the same (v, w).
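The moment condition is precisely what makes the Laplace-transform (Chernoff) argument of the next slide work. As a sketch, under the assumption (a common convention, not stated explicitly above) that the moment bound applies to the centered variable T − E T:

```latex
% Sketch: from the moment condition to the Laplace-transform bound,
% assuming E[|T - E T|^k] <= (k!/2) v w^{k-2} for all k >= 2.
\begin{align*}
\mathbb{E}\, e^{\zeta (T - \mathbb{E} T)}
  &= 1 + \sum_{k \ge 2} \frac{\zeta^k\, \mathbb{E}\big[(T - \mathbb{E} T)^k\big]}{k!} \\
  &\le 1 + \frac{v \zeta^2}{2} \sum_{k \ge 2} (w\zeta)^{k-2}
   = 1 + \frac{v \zeta^2}{2(1 - w\zeta)}
  \le \exp\!\left( \frac{v \zeta^2}{2(1 - w\zeta)} \right),
\end{align*}
% valid for 0 < zeta < 1/w, where the last step uses 1 + x <= e^x.
```

Multiplying this single-variable bound over n i.i.d. copies gives exactly the n-fold exponential bound of Bernstein's inequality.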
Bernstein's inequality

Theorem
Let T_1, ..., T_n be i.i.d. and (v, w)-sub-gamma random variables. Then, for all ζ ∈ (0, 1/w),
E exp( ζ Σ_{i=1}^n [T_i − E T_i] ) ≤ exp( nvζ² / (2(1 − wζ)) ).

Consequence in ML: put T_i = −ℓ(f_θ(X_i) − Y_i) and assume T_i is (v, w)-sub-gamma. Then E T_i = −R(θ) and (1/n) Σ_{i=1}^n T_i = −r(θ), so for any s > 0,

P( R(θ) − r(θ) > t ) = P( exp[ s(R(θ) − r(θ)) ] > exp(st) )
                     ≤ E exp[ s(R(θ) − r(θ)) − st ]          (Markov's inequality)
                     = E exp[ (s/n) Σ_{i=1}^n (T_i − E T_i) − st ].
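As a numerical sanity check (not part of the slides), one can compare the exact log-Laplace transform of a sum of centered Exp(1) variables with the Bernstein bound, using the Γ(a, b) example above with a = b = 1, i.e. (v, w) = (1, 1):

```python
import numpy as np

# Check of Bernstein's inequality for T_i ~ Exp(1) = Gamma(1, 1), i.i.d.,
# with (v, w) = (1, 1) as in the Gamma example of the previous slide.
# The exact Laplace transform of a centered exponential is
#   E exp(zeta * (T - 1)) = exp(-zeta) / (1 - zeta)   for zeta < 1,
# so the sum of n i.i.d. copies has log-Laplace n * (-zeta - log(1 - zeta)).

n = 50
v, w = 1.0, 1.0

for zeta in [0.1, 0.3, 0.5, 0.9]:  # zeta must lie in (0, 1/w)
    exact = n * (-zeta - np.log(1.0 - zeta))            # exact log E exp(zeta * sum)
    bound = n * v * zeta**2 / (2.0 * (1.0 - w * zeta))  # log of the Bernstein bound
    assert exact <= bound
    print(f"zeta={zeta}: exact log-MGF {exact:.4f} <= bound {bound:.4f}")
```

The bound is tight for small ζ (both sides behave like nζ²/2) and becomes loose as ζ approaches 1/w, which is the usual trade-off when optimizing over s in the Chernoff argument above.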