Adaptive inference and its relations to sequential decision making

Adaptive inference and its relations to sequential decision making
Alexandra Carpentier¹, OvGU Magdeburg
Based on joint works with Olga Klopp, Samory Kpotufe, Andréa Locatelli, Matthias Löffler, Richard Nickl
Criteo, Oct. 2nd, 2019
¹ Partly funded by the DFG EN CA1488, the CRC 1294, the GK 2297, the GK 2433.

Non-Convex Optimization
Problem: Finding/exploiting the maximum $M(f)$ of an unknown function $f$.
Question: Can we design algorithms that adapt to the difficulty of the problem?

Non-Convex Optimization
Depending on the difficulty of the problem, we would hope to get different performances.
Question: Can we adapt to the hyperparameters?

Scope of this talk
◮ Presentation of adaptive inference in statistics.
◮ Adaptivity in continuously armed bandits.

ADAPTIVE INFERENCE

Adaptive inference for non-parametric regression
Problem: Non-parametric regression. Inference (estimation + uncertainty quantification) of the function?
The Model: $f$ is a function on $[0,1]^d$. We observe $n$ data samples $(X_i, Y_i)_{i \le n}$:
$$Y_i = f(X_i) + \varepsilon_i, \quad i = 1, \dots, n,$$
where $X_i \sim_{iid} \mathcal{U}([0,1]^d)$ and $\varepsilon$ is an independent centered noise such that $|\varepsilon| \le 1$.
$\mathcal{C}(\alpha)$ = {Hölder ball($\alpha$)}. E.g. for $\alpha \le 1$: $\{ f : |f(x) - f(y)| \le \|x - y\|_\infty^\alpha \}$.
[Figure: scatter plot of the observed samples.]
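The model above can be simulated in a few lines. The following is a minimal sketch, not taken from the talk: it assumes $d = 1$, uniform noise on $[-1, 1]$ (one possible centered noise with $|\varepsilon| \le 1$), the illustrative function $f(x) = |x - 1/2|^{\alpha}$ (which satisfies the Hölder condition for $\alpha \le 1$), and a simple piecewise-constant estimator. The names `holder_example`, `sample_regression_data` and `regressogram` are hypothetical.

```python
import random


def holder_example(x, alpha=0.5):
    """Illustrative f on [0, 1] with |f(x) - f(y)| <= |x - y|**alpha (alpha <= 1)."""
    return abs(x - 0.5) ** alpha


def sample_regression_data(n, f, rng):
    """Draw (X_i, Y_i) with X_i ~ U[0, 1) and Y_i = f(X_i) + eps_i."""
    xs = [rng.random() for _ in range(n)]
    # Assumption: uniform noise on [-1, 1], which is centered and bounded by 1.
    ys = [f(x) + rng.uniform(-1.0, 1.0) for x in xs]
    return xs, ys


def regressogram(xs, ys, m):
    """Piecewise-constant estimate: average of the Y_i falling in each of m equal bins."""
    sums, counts = [0.0] * m, [0] * m
    for x, y in zip(xs, ys):
        j = min(int(x * m), m - 1)
        sums[j] += y
        counts[j] += 1
    # Empty bins get the value 0.0 (arbitrary convention for this sketch).
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


rng = random.Random(0)
xs, ys = sample_regression_data(2000, holder_example, rng)
estimate = regressogram(xs, ys, m=10)

# Error of the estimate, evaluated at the 10 bin midpoints only.
sup_error = max(
    abs(estimate[j] - holder_example((j + 0.5) / 10)) for j in range(10)
)
print(f"max error at bin midpoints: {sup_error:.3f}")
```

The number of bins `m` plays the role of a smoothing parameter: its best value depends on the unknown smoothness $\alpha$, which is exactly what adaptive procedures try to avoid having to know.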

Adaptive inference for non-parametric regression
Problem: Non-parametric regression. Inference (estimation + uncertainty quantification) of the function?
Question: If $f \in \mathcal{C}(\alpha)$, then the "optimal" precision of inference should depend on $\alpha$. Inference adaptive to $\alpha$?
[Figure: scatter plot of the observed samples.]

Adaptive inference
Adaptive estimation and confidence statements: see [Lepski, 1990-92], [Juditsky and Lambert-Lacroix, 1994], [Donoho and Johnstone, 1990-92], [Low, 2004-06], [Birgé and Massart, 1994-00], [Giné and Nickl, 2010], etc.
◮ "Large" sets $\mathcal{C}_0 \subset \mathcal{C}_1$, e.g. $\mathcal{C}_0 =: \mathcal{C}(\gamma)$ and $\mathcal{C}_1 =: \mathcal{C}(\alpha)$ with $\alpha < \gamma$.
◮ Associated probability distributions $P_f$ for $f \in \mathcal{C}_1$.
◮ Receive a dataset of $n$ i.i.d. entries according to $P_f$.
Adaptive inference: adaptation to the set $\mathcal{C}_h$ when $f \in \mathcal{C}_h$, $h \in \{0, 1\}$.
[Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$.]

Adaptive inference
Estimation:
◮ Minimax-optimal estimation errors $r_0$ (over $\mathcal{C}_0$) and $r_1$ (over $\mathcal{C}_1$) in $\|\cdot\|$ norm.
Minimax-optimal estimation error:
$$r_h = \inf_{\tilde f \text{ est.}} \sup_{f \in \mathcal{C}_h} E_f \|\tilde f - f\|, \quad h \in \{0, 1\}.$$
Minimax-optimal $\|\cdot\|_\infty$ estimation error in non-parametric regression over $\mathcal{C}(\alpha)$:
$$\left(\frac{\log(n)}{n}\right)^{\alpha/(2\alpha+d)}.$$
See [Lepski, 1990-92, etc].
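To get a feel for how the rate $(\log(n)/n)^{\alpha/(2\alpha+d)}$ depends on the smoothness, the short sketch below evaluates it for a few values. The function name `minimax_sup_rate` is hypothetical, multiplicative constants are ignored, and the chosen values of $n$, $\alpha$ and $d$ are only illustrative.

```python
import math


def minimax_sup_rate(n, alpha, d=1):
    """Rate (log(n) / n) ** (alpha / (2 * alpha + d)), constants omitted."""
    if n < 2 or alpha <= 0 or d < 1:
        raise ValueError("need n >= 2, alpha > 0 and d >= 1")
    return (math.log(n) / n) ** (alpha / (2 * alpha + d))


# Smoother classes (larger alpha) give a faster (smaller) rate for the same n.
for alpha in (0.5, 1.0):
    for n in (10**3, 10**6):
        print(f"alpha={alpha}, n={n}: rate={minimax_sup_rate(n, alpha):.4f}")
```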

Adaptive inference
Adaptive estimation:
◮ Minimax-optimal estimation errors $r_0$ (over $\mathcal{C}_0$) and $r_1$ (over $\mathcal{C}_1$) in $\|\cdot\|$ norm.
[Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$, with $f$ and its estimate $\hat f$ drawn in each set and labelled with the errors $r_0$ and $r_1$.]

Adaptive inference
Adaptive estimation:
◮ Minimax-optimal estimation errors $r_0$ (over $\mathcal{C}_0$) and $r_1$ (over $\mathcal{C}_1$) in $\|\cdot\|$ norm.
◮ In many models: an adaptive estimator $\hat f$ exists.
Adaptive estimation:
$$\sup_{f \in \mathcal{C}_h} E_f \|\hat f - f\| \lesssim r_h, \quad \forall h \in \{0, 1\}.$$
Adaptive estimators exist in non-parametric regression. See [Lepski, 1990-92, Donoho and Johnstone, 1998, etc].
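One classical route to adaptive estimation is a Lepski-type selection rule: compute estimators at several resolutions and keep the coarsest one that agrees with every finer one up to the finer one's noise level. The sketch below illustrates the idea only and is not the exact procedure of the cited works. It assumes $d = 1$, piecewise-constant estimators on dyadic bins, a noise level of order $\sqrt{m \log(n)/n}$ for $m$ bins, and an untuned constant `c = 1.0`; all function names are hypothetical.

```python
import math
import random


def regressogram(xs, ys, m):
    """Average of the Y_i in each of m equal bins of [0, 1)."""
    sums, counts = [0.0] * m, [0] * m
    for x, y in zip(xs, ys):
        j = min(int(x * m), m - 1)
        sums[j] += y
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def sup_distance(est_a, est_b, grid=1024):
    """Sup-norm distance between two piecewise-constant fits, on a fine grid."""
    dist = 0.0
    for k in range(grid):
        x = (k + 0.5) / grid
        a = est_a[int(x * len(est_a))]
        b = est_b[int(x * len(est_b))]
        dist = max(dist, abs(a - b))
    return dist


def lepski_select(xs, ys, max_level=6, c=1.0):
    """Return (m, fit) for the coarsest dyadic m compatible with all finer fits."""
    n = len(xs)
    bins = [2 ** k for k in range(max_level + 1)]
    fits = {m: regressogram(xs, ys, m) for m in bins}
    for i, m in enumerate(bins):
        # Accept m if each finer fit is within c * (noise level of that finer fit).
        # The finest level has nothing finer to compare with, so it always passes.
        if all(
            sup_distance(fits[m], fits[m2]) <= c * math.sqrt(m2 * math.log(n) / n)
            for m2 in bins[i + 1:]
        ):
            return m, fits[m]


rng = random.Random(1)
n = 20000
xs = [rng.random() for _ in range(n)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

flat_ys = list(noise)                                       # f = 0 (very smooth)
rough_ys = [math.sqrt(abs(x - 0.5)) + e for x, e in zip(xs, noise)]  # Hoelder 1/2

m_flat, _ = lepski_select(xs, flat_ys)
m_rough, _ = lepski_select(xs, rough_ys)
print("selected bins (flat f):", m_flat)
print("selected bins (rough f):", m_rough)
```

The rule never uses the smoothness of $f$: the same code is run on both data sets, and the selected resolution adjusts to the function at hand.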

Adaptive inference
Adaptive and honest confidence sets:
◮ Minimax-optimal estimation errors $r_0$, $r_1$ in $\|\cdot\|$ norm.
◮ Confidence set $\hat C$: contains $f$ and has adaptive diameter.
[Figure: nested sets $\mathcal{C}_0 \subset \mathcal{C}_1$, with $f$, its estimate $\hat f$ and a confidence set $\hat C$ drawn in each set and labelled with $r_0$ and $r_1$.]

Adaptive inference
Adaptive and honest confidence sets:
◮ Minimax-optimal estimation errors $r_0$, $r_1$ in $\|\cdot\|$ norm.
◮ Confidence set $\hat C$: contains $f$ and has adaptive diameter.
$\eta$-adaptive and honest confidence set:
Honesty:
$$\inf_{f \in \mathcal{C}_1} P_f(f \in \hat C) \ge 1 - \eta.$$
Adaptivity (with $\|\hat C\|$ the diameter of $\hat C$):
$$\sup_{f \in \mathcal{C}_h} E_f \|\hat C\| \lesssim r_h, \quad \forall h \in \{0, 1\}.$$

Adaptive inference
In non-parametric regression: adaptive and honest confidence sets do not exist. See [Cai and Low (2004)], [Hoffmann and Nickl (2011)], etc.
Indeed, the minimax rate for testing between $\mathcal{C}_0 = \mathcal{C}(\gamma)$ and $\mathcal{C}_1 = \mathcal{C}(\alpha)$ in $\|\cdot\|_\infty$ norm is:
$$\left(\frac{\log(n)}{n}\right)^{\alpha/(2\alpha+d)} = r_1 \gg r_0.$$
This is a common situation, the adaptive inference paradox: see [Giné and Nickl, 2011], [C, Klopp, Löffler, Nickl, 2017] for a systematic study and relations to a testing problem.
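To make the gap $r_1 \gg r_0$ concrete, here is a short worked comparison. It assumes that both $r_0$ and $r_1$ are given by the sup-norm rate $(\log(n)/n)^{s/(2s+d)}$ with $s = \gamma$ and $s = \alpha$ respectively, ignoring constants; the values $d = 1$, $\alpha = 1/2$, $\gamma = 1$ are chosen purely for illustration.

```latex
\[
\frac{r_1}{r_0}
= \left(\frac{\log n}{n}\right)^{\frac{\alpha}{2\alpha+d} - \frac{\gamma}{2\gamma+d}}
= \left(\frac{n}{\log n}\right)^{\frac{\gamma}{2\gamma+d} - \frac{\alpha}{2\alpha+d}}
\longrightarrow \infty \quad (n \to \infty),
\]
% The exponent is positive because s -> s / (2s + d) is increasing and alpha < gamma.
\[
d = 1,\ \alpha = \tfrac12,\ \gamma = 1:
\qquad
r_1 = \left(\frac{\log n}{n}\right)^{1/4},
\quad
r_0 = \left(\frac{\log n}{n}\right)^{1/3},
\quad
\frac{r_1}{r_0} = \left(\frac{n}{\log n}\right)^{1/12}.
\]
```

So distinguishing the two classes from data is only possible at the slow rate $r_1$, while an adaptive confidence set would need a diameter of the much smaller order $r_0$ on $\mathcal{C}_0$.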

Subtle problem: Matrix completion
Problem: Matrix completion. Inference (estimation + uncertainty quantification) of the matrix?
Application: Recommendation system (e.g. Netflix).
[Figure: partially observed matrix with entries for Carine, Daniel, Alice, Bob and Ed.]
