machine learning theory for time series

Machine learning theory for time series Exponential inequalities for - PowerPoint PPT Presentation

Short introduction to machine learning theory Machine learning and time series Machine learning theory for time series Exponential inequalities for nonstationary Markov chains Pierre Alquier CIMFAV seminar January 16, 2019 Pierre Alquier


  1. Machine learning theory for time series: Exponential inequalities for nonstationary Markov chains. Pierre Alquier, CIMFAV seminar, January 16, 2019.

  2. Outline:
      1. Short introduction to machine learning theory
      2. Machine learning and time series: Machine learning & stationary time series; Nonstationary Markov chains

  13. Generic machine learning problem. Main ingredients:
      - observations: $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$
        → usually i.i.d. from an unknown distribution $P$...
      - a restricted set of predictors $(f_\theta, \theta \in \Theta)$
        → $f_\theta(X)$ is meant to predict $Y$.
      - a loss function $\ell$
        → $\ell(y' - y)$ is incurred by predicting $y'$ while the truth is $y$.
      - the risk $R(\theta)$
        → $R(\theta) = \mathbb{E}_{(X,Y) \sim P}\left[\ell(f_\theta(X) - Y)\right]$. Not observable.
      - an empirical proxy $r(\theta)$ for $R(\theta)$
        → for example $r(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell(f_\theta(X_i) - Y_i)$.
      - the empirical risk minimizer $\hat{\theta}$
        → $\hat{\theta} = \operatorname{argmin}_{\theta \in \Theta} r(\theta)$.
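The ingredients listed on this slide can be put together in a few lines. The following is only a minimal sketch under assumptions that are not on the slides: scalar linear predictors $f_\theta(x) = \theta x$, the quadratic loss, a finite grid standing in for $\Theta$, and synthetic data with true slope 2. The names `empirical_risk`, `erm` and `quadratic_loss` are hypothetical helpers chosen for illustration.

```python
import random

def empirical_risk(theta, xs, ys, loss):
    """r(theta) = (1/n) * sum_i loss(f_theta(x_i) - y_i), with f_theta(x) = theta * x."""
    return sum(loss(theta * x - y) for x, y in zip(xs, ys)) / len(xs)

def erm(thetas, xs, ys, loss):
    """Return the element of the finite grid `thetas` with the smallest empirical risk."""
    return min(thetas, key=lambda theta: empirical_risk(theta, xs, ys, loss))

def quadratic_loss(u):
    """Loss applied to the prediction error u = y' - y (illustrative choice)."""
    return u * u

random.seed(0)
n = 500
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
# Synthetic data with true slope 2.0 and small Gaussian noise (illustrative assumption).
ys = [2.0 * x + random.gauss(0.0, 0.1) for x in xs]
# Finite grid standing in for Theta: -5.00, -4.95, ..., 5.00.
thetas = [-5.0 + 0.05 * k for k in range(201)]
theta_hat = erm(thetas, xs, ys, quadratic_loss)
print(theta_hat)
```

Only the empirical risk $r(\theta)$ is computed here; the risk $R(\theta)$ stays unobservable, which is why the deviation $R(\theta) - r(\theta)$ has to be controlled by other means.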

  16. Sub-gamma random variables. Definition: $T$ is said to be sub-gamma iff $\exists (v, w)$ such that $\forall k \geq 2$,
      $\mathbb{E}\left(|T|^k\right) \leq \frac{k!\, v\, w^{k-2}}{2}$.
      Examples:
      - $T \sim \Gamma(a, b)$: holds with $(v, w) = (ab^2, b)$.
      - any $Z$ with $\mathbb{P}(|Z| \geq t) \leq \mathbb{P}(|T| \geq t)$.
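As a quick sanity check of the Gamma example, the case $k = 2$ can be worked out by hand. This is only a sketch under two assumptions that the slide does not state explicitly: $\Gamma(a, b)$ is read as shape $a$ and scale $b$, and the moment condition is read for the centered variable $T - \mathbb{E}T$. Under that reading the bound is attained with equality at $k = 2$:

```latex
\mathbb{E}\big[(T - \mathbb{E}T)^2\big] = \operatorname{Var}(T) = ab^2,
\qquad
\left.\frac{k!\, v\, w^{k-2}}{2}\right|_{k=2,\ (v,w)=(ab^2,\,b)}
  = \frac{2 \cdot ab^2 \cdot b^{0}}{2} = ab^2 .
```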

  19. Bernstein's inequality. Theorem: let $T_1, \dots, T_n$ be i.i.d. and $(v, w)$-sub-gamma random variables. Then, $\forall \zeta \in (0, 1/w)$,
      $\mathbb{E} \exp\left( \zeta \sum_{i=1}^{n} [T_i - \mathbb{E} T_i] \right) \leq \exp\left( \frac{n v \zeta^2}{2(1 - w\zeta)} \right)$.
      Consequence in ML: put $T_i = -\ell(f_\theta(X_i) - Y_i)$ and assume $T_i$ is $(v, w)$-sub-gamma, then for any $s > 0$,
      $\mathbb{P}\left( R(\theta) - r(\theta) > t \right) = \mathbb{P}\left( \exp\left[ s(R(\theta) - r(\theta)) \right] > \exp(st) \right)$

  20. Bernstein's inequality (same theorem and setting), next step by Markov's inequality: for any $s > 0$,
      $\mathbb{P}\left( R(\theta) - r(\theta) > t \right) \leq \mathbb{E} \exp\left( s(R(\theta) - r(\theta)) - st \right)$

  21. Bernstein's inequality (same theorem and setting), next step: since $T_i = -\ell(f_\theta(X_i) - Y_i)$ gives $R(\theta) - r(\theta) = \frac{1}{n} \sum_{i=1}^{n} [T_i - \mathbb{E} T_i]$, for any $s > 0$,
      $\mathbb{P}\left( R(\theta) - r(\theta) > t \right) \leq \mathbb{E} \exp\left( \frac{s}{n} \sum_{i=1}^{n} [T_i - \mathbb{E} T_i] - st \right)$
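The exponential-moment bound in the theorem can be checked numerically on the Gamma example, where the left-hand side has a closed form. The sketch below makes several assumptions that are not on the slides: $\Gamma(a, b)$ is taken as shape $a$ and scale $b$, the values $n = 50$, $a = 2$, $b = 0.5$ are arbitrary, and all function names are hypothetical. The helper `chernoff_tail_bound` plugs $\zeta = s/n$ (which requires $s < n/w$) into the theorem, a step that does not appear on the slides shown here.

```python
import math

def log_mgf_centered_gamma_sum(zeta, n, a, b):
    """Exact log E exp(zeta * sum_i (T_i - E T_i)) for n i.i.d. Gamma(shape=a, scale=b)."""
    # Gamma MGF: E exp(zeta * T) = (1 - b * zeta) ** (-a) for zeta < 1 / b; E T = a * b.
    return n * (-a * math.log1p(-b * zeta) - a * b * zeta)

def bernstein_exponent(zeta, n, v, w):
    """Exponent on the right-hand side of the theorem: n v zeta^2 / (2 (1 - w zeta))."""
    return n * v * zeta ** 2 / (2.0 * (1.0 - w * zeta))

def chernoff_tail_bound(t, s, n, v, w):
    """exp(bernstein_exponent(s / n) - s * t), bounding P((1/n) sum_i (T_i - E T_i) > t)."""
    return math.exp(bernstein_exponent(s / n, n, v, w) - s * t)

n, a, b = 50, 2.0, 0.5                # arbitrary illustrative values
v, w = a * b ** 2, b                  # (v, w) = (a b^2, b), as in the Gamma example
zetas = [k / (100.0 * w) for k in range(1, 100)]   # grid strictly inside (0, 1/w)
bound_holds = all(
    log_mgf_centered_gamma_sum(z, n, a, b) <= bernstein_exponent(z, n, v, w)
    for z in zetas
)
tail = chernoff_tail_bound(t=0.5, s=20.0, n=n, v=v, w=w)
print(bound_holds, tail)
```

The comparison is done on the log scale, which avoids overflow when $\zeta$ approaches $1/w$. In practice $s$ would then be tuned to make the tail bound as small as possible.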
