Adaptive inference and its relations to sequential decision making

Adaptive inference and its relations to sequential decision making. Alexandra Carpentier, OvGU Magdeburg. Based on joint works with Olga Klopp, Samory Kpotufe, Andréa Locatelli, Matthias Löffler, Richard Nickl. Criteo, Oct. 2nd, 2019.


  1. Adaptive inference and its relations to sequential decision making. Alexandra Carpentier, OvGU Magdeburg. Based on joint works with Olga Klopp, Samory Kpotufe, Andréa Locatelli, Matthias Löffler, Richard Nickl. Criteo, Oct. 2nd, 2019. Partly funded by the DFG EN CA1488, the CRC 1294, the GK 2297, the GK 2433.

  2. Non-Convex Optimization. Problem: finding/exploiting the maximum $M(f)$ of an unknown function $f$.

  3. Non-Convex Optimization. Problem: finding/exploiting the maximum $M(f)$ of an unknown function $f$. Question: can we design algorithms that adapt to the difficulty of the problem?

  4. Non-Convex Optimization. Depending on the difficulty of the problem, we would hope to get different performances.

  5. Non-Convex Optimization. Depending on the difficulty of the problem, we would hope to get different performances. Question: can we adapt to the hyperparameters?

  6. Scope of this talk Talk : ◮ Presentation of adaptive inference in statistics. ◮ Adaptivity in continuously armed bandits.

  7. ADAPTIVE INFERENCE

  8. Adaptive inference for non-parametric regression. Problem: non-parametric regression. [Figure: scatter plot of the data points.] Inference (estimation + uncertainty quantification) of the function?

  10. Adaptive inference for non-parametric regression. Problem: non-parametric regression. The model: $f$ is a function on $[0,1]^d$. We observe $n$ data samples $(X_i, Y_i)_{i \le n}$: $Y_i = f(X_i) + \varepsilon_i$, $i = 1, \dots, n$, where $X_i \sim_{\text{iid}} U[0,1]^d$ and $\varepsilon$ is an independent centered noise such that $|\varepsilon| \le 1$. [Figure: scatter plot of the data points.] Inference (estimation + uncertainty quantification) of the function?

  12. Adaptive inference for non-parametric regression. Problem: non-parametric regression. The model: $f$ is a function on $[0,1]^d$. We observe $n$ data samples $(X_i, Y_i)_{i \le n}$: $Y_i = f(X_i) + \varepsilon_i$, $i = 1, \dots, n$, where $X_i \sim_{\text{iid}} U[0,1]^d$ and $\varepsilon$ is an independent centered noise such that $|\varepsilon| \le 1$. $\mathcal{C}(\alpha) = \{\text{Hölder ball}(\alpha)\}$, e.g. for $\alpha \le 1$: $\{f : |f(x) - f(y)| \le \|x - y\|_\infty^\alpha\}$. [Figure: scatter plot of the data points.] Inference (estimation + uncertainty quantification) of the function?
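The regression model on this slide can be simulated in a few lines. The following is a minimal sketch, not taken from the talk: the function names `sample_regression` and `holder_example`, the uniform noise on $[-1, 1]$, and the test function $f(x) = \|x - 1/2\|_\infty^\alpha$ are illustrative assumptions. That test function does satisfy the Hölder condition stated above for $\alpha \le 1$.

```python
import numpy as np

def holder_example(X, alpha=0.5):
    """Hypothetical test function f(x) = ||x - 1/2||_inf^alpha.

    For alpha <= 1 it satisfies |f(x) - f(y)| <= ||x - y||_inf^alpha.
    """
    return np.max(np.abs(X - 0.5), axis=1) ** alpha

def sample_regression(f, n, d, rng):
    """Draw n samples from Y_i = f(X_i) + eps_i with X_i uniform on [0, 1]^d.

    The noise law is an assumption: uniform on [-1, 1], which is centered
    and bounded by 1 as the model requires.
    """
    X = rng.uniform(0.0, 1.0, size=(n, d))
    eps = rng.uniform(-1.0, 1.0, size=n)
    return X, f(X) + eps

rng = np.random.default_rng(0)
X, Y = sample_regression(holder_example, n=500, d=2, rng=rng)
print(X.shape, Y.shape)
```

Any other centered noise bounded by 1 (for example Rademacher signs) would fit the model equally well.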

  13. Adaptive inference for non-parametric regression. Problem: non-parametric regression. Question: if $f \in \mathcal{C}(\alpha)$, then the “optimal” precision of inference should depend on $\alpha$. Inference adaptive to $\alpha$? [Figure: scatter plot of the data points.] Inference (estimation + uncertainty quantification) of the function?

  14. Adaptive inference. Adaptive estimation and confidence statements: see [Lepski, 1990-92], [Juditsky and Lambert-Lacroix, 1994], [Donoho and Johnstone, 1990-92], [Low, 2004-06], [Birgé and Massart, 1994-00], [Giné and Nickl, 2010], etc. ◮ “Large” sets $\mathcal{C}_0 \subset \mathcal{C}_1$, e.g. $\mathcal{C}_0 =: \mathcal{C}(\gamma)$ and $\mathcal{C}_1 =: \mathcal{C}(\alpha)$ with $\alpha < \gamma$. ◮ Associated probability distributions $P_f$ for $f \in \mathcal{C}_1$. ◮ Receive a dataset of $n$ i.i.d. entries according to $P_f$. Adaptive inference: adaptation to the set $\mathcal{C}_h$ when $f \in \mathcal{C}_h$, $h \in \{0, 1\}$. [Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$.]

  15. Adaptive inference. Estimation: ◮ Minimax-optimal estimation errors $r_0$ (over $\mathcal{C}_0$) and $r_1$ (over $\mathcal{C}_1$) in $\|\cdot\|$ norm. Minimax-optimal estimation error: $r_h = \inf_{\tilde f \text{ est.}} \sup_{f \in \mathcal{C}_h} E_f \|\tilde f - f\|$, $h \in \{0, 1\}$. Minimax-optimal $\|\cdot\|_\infty$ estimation error in non-parametric regression over $\mathcal{C}(\alpha)$: $\left(\frac{\log(n)}{n}\right)^{\alpha/(2\alpha + d)}$. See [Lepski, 1990-92, etc].
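To get a feel for how strongly the sup-norm rate $(\log(n)/n)^{\alpha/(2\alpha+d)}$ depends on the smoothness, the short sketch below evaluates it for a few values of $\alpha$. The helper name `minimax_rate` and the chosen values of $n$, $\alpha$ and $d$ are illustrative assumptions, and constants in front of the rate are ignored.

```python
import math

def minimax_rate(n, alpha, d):
    """Sup-norm rate (log(n) / n) ** (alpha / (2 * alpha + d)), constants ignored."""
    return (math.log(n) / n) ** (alpha / (2 * alpha + d))

# Smoother classes (larger alpha) allow a faster rate at the same sample size.
for alpha in (0.25, 0.5, 1.0):
    print(alpha, minimax_rate(n=10_000, alpha=alpha, d=1))
```

The gap between the rates for two smoothness levels is what an adaptive procedure tries to exploit without knowing $\alpha$ in advance.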

  16. Adaptive inference. Adaptive estimation: ◮ Minimax-optimal estimation errors $r_0$ (over $\mathcal{C}_0$) and $r_1$ (over $\mathcal{C}_1$) in $\|\cdot\|$ norm. [Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$, each with a function $f$, an estimator $\hat f$ and the corresponding radius $r_0$ or $r_1$.]

  17. Adaptive inference. Adaptive estimation: ◮ Minimax-optimal estimation errors $r_0$ (over $\mathcal{C}_0$) and $r_1$ (over $\mathcal{C}_1$) in $\|\cdot\|$ norm. ◮ In many models an adaptive estimator $\hat f$ exists. Adaptive estimation: $\sup_{f \in \mathcal{C}_h} E_f \|\hat f - f\| \lesssim r_h$, $\forall h \in \{0, 1\}$. Adaptive estimators exist in non-parametric regression. See [Lepski, 1990-92, Donoho and Johnstone, 1998, etc].
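One classical route to an adaptive estimator is a Lepski-type comparison of estimators at different resolutions. The code below is only a rough sketch of that idea for $d = 1$, not the construction used in the cited papers: the regressogram (bin-average) estimator, the candidate resolutions, the threshold $c\sqrt{k \log(n)/n}$ and the constant `c = 2.0` are all assumptions made for illustration.

```python
import numpy as np

def regressogram(x, y, m):
    """Bin averages of y on m equal-width bins of [0, 1); empty bins get 0."""
    bins = np.minimum((x * m).astype(int), m - 1)
    sums = np.bincount(bins, weights=y, minlength=m)
    counts = np.bincount(bins, minlength=m)
    return sums / np.maximum(counts, 1)

def evaluate(means, grid):
    """Evaluate a piecewise-constant fit with len(means) bins on grid points."""
    m = len(means)
    return means[np.minimum((grid * m).astype(int), m - 1)]

def lepski_select(x, y, candidates, c=2.0):
    """Return the coarsest resolution m whose fit agrees, in sup norm, with
    every finer fit k > m up to the (assumed) noise level c*sqrt(k log(n)/n).

    `candidates` must be sorted in increasing order.
    """
    n = len(x)
    grid = np.linspace(0.0, 1.0, 1024, endpoint=False)
    fits = {m: evaluate(regressogram(x, y, m), grid) for m in candidates}
    for m in candidates:
        if all(np.max(np.abs(fits[m] - fits[k])) <= c * np.sqrt(k * np.log(n) / n)
               for k in candidates if k > m):
            return m
    return candidates[-1]

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
y = 3.0 * x + rng.uniform(-1.0, 1.0, size=2000)
print(lepski_select(x, y, [1, 2, 4, 8, 16, 32, 64]))
```

The selected resolution grows when the data show structure that coarse fits cannot capture, which is the mechanism behind adaptation: no smoothness parameter is passed to the procedure.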

  18. Adaptive inference. Adaptive and honest confidence sets: ◮ Minimax-optimal estimation errors $r_0$, $r_1$ in $\|\cdot\|$ norm. ◮ Confidence set $\hat C$: contains $f$ and has adaptive diameter. [Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$, each with a function $f$, an estimator $\hat f$, a confidence set $\hat C$ and the corresponding radius $r_0$ or $r_1$.]

  19. Adaptive inference. Adaptive and honest confidence sets: ◮ Minimax-optimal estimation errors $r_0$, $r_1$ in $\|\cdot\|$ norm. ◮ Confidence set $\hat C$: contains $f$ and has adaptive diameter. $\eta$-adaptive and honest confidence set. Honesty: $\inf_{f \in \mathcal{C}_1} P_f(f \in \hat C) \ge 1 - \eta$. Adaptivity: $\sup_{f \in \mathcal{C}_h} E_f \|\hat C\| \lesssim r_h$, $\forall h \in \{0, 1\}$, where $\|\hat C\|$ denotes the diameter of $\hat C$.

  20. Adaptive inference. Adaptive and honest confidence sets: ◮ Minimax-optimal estimation errors $r_0$, $r_1$ in $\|\cdot\|$ norm. ◮ Confidence set $\hat C$: contains $f$ and has adaptive diameter. [Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$, each with a function $f$, an estimator $\hat f$, a confidence set $\hat C$ and the corresponding radius $r_0$ or $r_1$.]

  21. Adaptive inference. In non-parametric regression: adaptive and honest confidence sets do not exist. See [Cai and Low (2004)], [Hoffmann and Nickl (2011)], etc. Indeed, the minimax rate for testing between $\mathcal{C}_0 = \mathcal{C}(\gamma)$ and $\mathcal{C}_1 = \mathcal{C}(\alpha)$ in $\|\cdot\|_\infty$ norm is $\left(\frac{\log(n)}{n}\right)^{\alpha/(2\alpha + d)} = r_1 \gg r_0$. This is a common situation, the adaptive inference paradox - see [Giné and Nickl, 2011], [C, Klopp, Löffler, Nickl, 2017] for a systematic study and relations to a testing problem.
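The claim $r_1 \gg r_0$ can be checked directly. The derivation below is a short worked step, assuming that $r_0$ and $r_1$ are the sup-norm rates $(\log(n)/n)^{\gamma/(2\gamma+d)}$ and $(\log(n)/n)^{\alpha/(2\alpha+d)}$ over $\mathcal{C}(\gamma)$ and $\mathcal{C}(\alpha)$ with $\alpha < \gamma$, and ignoring constants.

```latex
\[
\frac{r_1}{r_0}
  = \left(\frac{\log n}{n}\right)^{\frac{\alpha}{2\alpha+d}-\frac{\gamma}{2\gamma+d}},
\qquad
\frac{\alpha}{2\alpha+d}-\frac{\gamma}{2\gamma+d}
  = \frac{d\,(\alpha-\gamma)}{(2\alpha+d)(2\gamma+d)} < 0 .
\]
```

Since $\log(n)/n \to 0$ and the exponent is negative, the ratio $r_1/r_0$ tends to infinity: a confidence set that must remain honest over the larger class cannot shrink at the faster rate $r_0$ on the smaller one when the two classes can only be told apart at rate $r_1$.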

  22. Subtle problem: Matrix completion. Problem: matrix completion. Application: recommendation system (e.g. Netflix). [Figure: partially observed ratings matrix for the users Alice, Bob, Carine, Daniel and Ed.] Inference (estimation + uncertainty quantification) of the matrix?
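The observation scheme behind a recommendation matrix can be mimicked with a random mask. The slide does not specify a model, so everything in this sketch is an assumption made for illustration: the low-rank structure, the Gaussian factors, the noise-free entries, the uniform sampling probability `p_obs` and the function name `sample_matrix_completion`.

```python
import numpy as np

def sample_matrix_completion(n_users, n_items, rank, p_obs, rng):
    """Generate a rank-`rank` matrix and reveal each entry with probability p_obs.

    Returns the full matrix, the boolean observation mask, and the observed
    matrix with NaN at the unobserved positions.
    """
    U = rng.normal(size=(n_users, rank))
    V = rng.normal(size=(n_items, rank))
    M = U @ V.T
    mask = rng.random((n_users, n_items)) < p_obs
    observed = np.where(mask, M, np.nan)
    return M, mask, observed

rng = np.random.default_rng(0)
M, mask, observed = sample_matrix_completion(5, 6, rank=2, p_obs=0.5, rng=rng)
print(int(mask.sum()), "of", mask.size, "entries observed")
```

The inference question on the slide is then to estimate `M`, and to quantify the uncertainty of that estimate, from `observed` alone.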
