Learning Faster from Easy Data
Peter Grünwald, Wouter M. Koolen, Sasha Rakhlin, Karthik Sridharan

How Natural is the Worst Case?
Predict $T$ coin flips.
$\text{Regret} = \text{My total loss} - \min\{\text{All-heads total loss},\ \text{All-tails total loss}\}$
Minimax regret is of order $\sqrt{T}$ (IID fair coin).
Any other IID coin:
◮ FTL gives constant regret . . .
◮ . . . but is no solution: terrible worst-case regret (010101. . . )
◮ . . . yet standard low-regret algorithms retain $\sqrt{T}$ regret.
Not useful in practice.
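To make the contrast concrete, here is a small Python sketch (our own illustration, not code from the talk) that runs Follow-the-Leader (FTL) on the coin-flip problem with the two constant experts "all heads" and "all tails". The tie-breaking rule (ties go to heads) is an assumption; under it, the alternating sequence 0, 1, 0, 1, ... makes FTL wrong in every round. The function name `ftl_regret` is hypothetical.

```python
import random


def ftl_regret(outcomes):
    """Regret of Follow-the-Leader on a 0/1 sequence (1 = heads).

    Experts: "all heads" and "all tails"; 0/1 loss for a wrong guess.
    Assumption (not from the slides): FTL breaks ties in favour of heads.
    """
    heads = tails = 0  # counts of outcomes seen so far
    my_loss = 0
    for y in outcomes:
        guess = 1 if heads >= tails else 0  # follow the current leader
        my_loss += int(guess != y)
        heads += y
        tails += 1 - y
    # All-heads loses on every tail; all-tails loses on every head.
    best_expert_loss = min(tails, heads)
    return my_loss - best_expert_loss


T = 10_000
rng = random.Random(0)
biased = [int(rng.random() < 0.7) for _ in range(T)]  # IID coin with P(heads) = 0.7
alternating = [t % 2 for t in range(T)]                # 0, 1, 0, 1, ...

print("IID biased coin:", ftl_regret(biased))
print("alternating 0101...:", ftl_regret(alternating))
```

On the alternating sequence FTL errs in every round while each expert errs in half of them, so the regret is exactly $T/2 = 5000$, i.e. linear in $T$. On the IID biased coin the regret should stay small, because FTL soon locks onto heads.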

This Problem is Everywhere

Individual sequence: $R = \text{Regret}/T$
◮ $\min_{\text{alg}} \max_{\text{data}} R = \sqrt{\ln K / T}$, achieved by Hedge/EW with $\eta = 1/\sqrt{T}$; constant $\eta$ is bad.
◮ Easy case, stochastic with a gap: $R = c \cdot \ln K / T$, achieved by FTL/EW with constant $\eta$; $\eta = 1/\sqrt{T}$ is bad.

Stochastic IID: $R = \text{Excess risk}$
◮ $\min_{\text{alg}} \max_{\text{dist}} R = \sqrt{-\ln \pi(\text{best}) / T}$, achieved by "Bayes" with $\eta = 1/\sqrt{T}$; higher $\eta$ are bad.
◮ Easy case, Tsybakov($\kappa$) condition: $R = \left(-\ln \pi(\text{best}) / T\right)^{\kappa/(2\kappa-1)}$, achieved by Bayes with $\eta = T^{(1-\kappa)/(2\kappa-1)}$; other $\eta$ are bad.
◮ For a uniform prior, $-\ln \pi(\text{best}) = \ln K$, and the corresponding rates with $\ln K$ are also achieved (resp. exploited) by ERM.
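The role of the learning rate can be checked with a minimal sketch of Hedge/exponential weights with a fixed $\eta$ (our own illustration; the function `hedge_regret` and both loss sequences are hypothetical). The optional `prior` argument plays the role of $\pi$ in the $-\ln \pi(\text{best})$ term, and letting $\eta \to \infty$ recovers FTL. The "easy" data has one expert that never errs (a large gap); the "hard" data is the alternating coin sequence from the previous slide, written as per-expert losses.

```python
import math


def hedge_regret(losses, eta, prior=None):
    """Regret of Hedge / exponential weights (EW) with a fixed learning rate.

    losses: list of rounds, each a list of K expert losses in [0, 1].
    prior:  initial weights pi over the experts (uniform if omitted).
    """
    K = len(losses[0])
    prior = prior or [1.0 / K] * K
    cum = [0.0] * K  # cumulative loss of each expert
    total = 0.0      # cumulative (expected) loss of Hedge
    for row in losses:
        # log-weights ln pi_k - eta * L_k, shifted by the max for numerical stability
        logw = [math.log(p) - eta * L for p, L in zip(prior, cum)]
        top = max(logw)
        w = [math.exp(v - top) for v in logw]
        z = sum(w)
        total += sum(wk * lk for wk, lk in zip(w, row)) / z
        cum = [L + l for L, l in zip(cum, row)]
    return total - min(cum)


T = 10_000
easy = [[0, 1]] * T                                          # expert 0 never errs: large gap
hard = [[1, 0] if t % 2 == 0 else [0, 1] for t in range(T)]  # the 0101... coin sequence

for name, eta in [("const eta = 1", 1.0), ("eta = 1/sqrt(T)", 1 / math.sqrt(T))]:
    print(name, "| easy:", round(hedge_regret(easy, eta), 1),
          "| alternating:", round(hedge_regret(hard, eta), 1))
```

With these inputs, constant $\eta = 1$ should give regret of about 1 on the easy data but above 1000 on the alternating data, whereas $\eta = 1/\sqrt{T}$ gives roughly 70 and 12.5 respectively: each fixed tuning is bad on one of the two.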

Punchline
No single algorithm seems to work in general. Different degrees of easiness seem to require different algorithms, or do they . . . ?
Adaptive algorithms exist that adapt to some types of luckiness in some settings while preserving minimax guarantees:
◮ Srebro: low target error in the non-parametric setting
◮ Agarwal: high margin in the active learning setting
◮ Sridharan: the past proves the future cannot be worst-case
◮ Van Erven: data for which FTL works well (e.g. stochastic)
◮ Bubeck: stochastic bandit feedback
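The slide does not name a specific adaptive method. As one example we picked ourselves, here is a sketch of AdaHedge (de Rooij, van Erven, Grünwald & Koolen, "Follow the Leader If You Can, Hedge If You Must", 2014), which starts at $\eta = \infty$ (FTL) and sets $\eta_t = \ln K / \Delta_{t-1}$, where $\Delta$ is the cumulative mixability gap between the Hedge loss and the mix loss. The code is a simplified reading of that rule (uniform prior, losses in [0, 1]), not a reference implementation; `adahedge_regret` is a hypothetical name.

```python
import math


def adahedge_regret(losses):
    """Regret of a simplified AdaHedge (uniform prior, losses in [0, 1])."""
    K = len(losses[0])
    cum = [0.0] * K  # cumulative loss of each expert
    gap = 0.0        # cumulative mixability gap Delta
    total = 0.0      # cumulative Hedge loss
    for row in losses:
        eta = math.log(K) / gap if gap > 0 else math.inf  # eta = inf means FTL
        best = min(cum)
        if math.isinf(eta):
            w = [1.0 if L == best else 0.0 for L in cum]    # uniform over the leaders
        else:
            w = [math.exp(-eta * (L - best)) for L in cum]  # shifted for stability
        z = sum(w)
        w = [wk / z for wk in w]
        support = [(wk, lk) for wk, lk in zip(w, row) if wk > 0]
        h = sum(wk * lk for wk, lk in support)  # Hedge loss this round
        lo = min(lk for _, lk in support)
        if math.isinf(eta):
            m = lo  # mix loss in the limit eta -> infinity
        else:
            s = sum(wk * math.exp(-eta * (lk - lo)) for wk, lk in support)
            m = lo - math.log(s) / eta  # mix loss: -(1/eta) ln sum_k w_k e^{-eta l_k}
        gap += max(h - m, 0.0)  # delta_t >= 0; clamp rounding error
        total += h
        cum = [L + l for L, l in zip(cum, row)]
    return total - min(cum)


T = 10_000
easy = [[0, 1]] * T                                          # one expert never errs
hard = [[1, 0] if t % 2 == 0 else [0, 1] for t in range(T)]  # the 0101... sequence
print("easy:", round(adahedge_regret(easy), 2))
print("alternating:", round(adahedge_regret(hard), 2))
```

On these two sequences AdaHedge should keep the regret around 1 on the easy data and of order $\sqrt{T}$ (a few dozen for $T = 10\,000$) on the alternating data, without being told which case it faces.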

Goals of this workshop
◮ Develop general methods for constructing algorithms that adapt to general types of easiness
◮ Determine classes of easiness worth exploiting in practice
Recent developments suggest answers may be within our reach.

Partial Unification of Easiness Notions
[vEGRW12] subsume three important easiness criteria under stochastic mixability:
◮ Statistical learning: (Generalised) Tsybakov condition
◮ Density estimation when the model is wrong: Barron-Li-Van der Vaart martingale condition
◮ Individual-sequence prediction with an easy loss function: Vovk mixability ⊃ exp-concavity ⊃ strong convexity

Stochastic mixability (SM-$\eta$): for every action $a$,
$$\mathbb{E}_{Y \sim P}\left[\frac{e^{-\eta \ell(Y, a)}}{e^{-\eta \ell(Y, a^*)}}\right] \le 1, \qquad \text{(SM-}\eta\text{)}$$
where $a^*$ is the action with minimal expected loss under $P$.
A loss is Vovk mixable iff it is stochastically mixable for all distributions.
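Condition (SM-$\eta$) can be checked numerically for small finite problems. The sketch below (helper names `sm_lhs` and `abs_loss` are hypothetical) computes the left-hand side, maximised over actions, for absolute loss on a coin with actions $\{0, 1\}$. For a coin with $P(\text{heads}) = q < 1/2$ the condition reduces to $(1-q)e^{-\eta} + q e^{\eta} \le 1$, i.e. $\eta \le \ln((1-q)/q)$; for a fair coin it fails for every $\eta > 0$ (the left-hand side is $\cosh \eta$). This matches the coin-flip slide: biased coins are easy, the fair coin is the worst case.

```python
import math


def sm_lhs(p, actions, loss, eta):
    """max over actions a of E_{Y~P}[exp(-eta * (loss(Y, a) - loss(Y, a_star)))].

    p: dict mapping outcome -> probability; a_star minimises expected loss.
    (SM-eta) holds iff the returned value is <= 1 (a = a_star always gives 1).
    """
    def risk(a):
        return sum(py * loss(y, a) for y, py in p.items())

    a_star = min(actions, key=risk)
    return max(
        sum(py * math.exp(-eta * (loss(y, a) - loss(y, a_star))) for y, py in p.items())
        for a in actions
    )


def abs_loss(y, a):
    return abs(y - a)


actions = [0, 1]
for q in (0.25, 0.5):  # probability of heads (Y = 1)
    coin = {1: q, 0: 1 - q}
    for eta in (0.5, 1.0, 1.5):
        holds = sm_lhs(coin, actions, abs_loss, eta) <= 1 + 1e-12
        print(f"P(heads)={q}, eta={eta}: SM-eta holds: {holds}")
```

For $q = 0.25$ the condition should hold at $\eta = 0.5$ and $\eta = 1$ but fail at $\eta = 1.5$ (the threshold is $\ln 3 \approx 1.10$); for $q = 0.5$ it should fail at all three values.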

Easiness sans Stochastics
Small regret when:
◮ Prior luckiness
  ◮ simple (high prior) best expert [Hutter & Poland 2005]
  ◮ many good experts [Chaudhuri, Freund & Hsu 2009]
  ◮ few leaders [Gofer, Cesa-Bianchi, Gentile & Mansour 2013]
◮ IID-type luckiness
  ◮ best expert has low loss [Auer, Cesa-Bianchi & Gentile 2002]
  ◮ algorithm issues low-variance predictions [Cesa-Bianchi, Mansour & Stoltz 2007]
  ◮ best expert loss has low variance [Hazan & Kale 2008]
◮ Non-stationary luckiness
  ◮ expert losses evolve slowly over time [Chiang, Yang, Lee, Mahdavi, Lu, Jin & Zhu 2012]
  ◮ expert losses are predictable [Rakhlin & Sridharan 2013]
  ◮ . . .
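For the prior-luckiness item, the standard exponential-weights bound (a consequence of Hoeffding's lemma) shows where a high prior on the best expert pays off. Assuming losses in $[0, 1]$, a prior $\pi$ over the $K$ experts, Hedge weights $\mathbf{w}_t$, a fixed learning rate $\eta$ and $L_{T,k}$ the cumulative loss of expert $k$ (notation ours), the first line holds for every expert $k$; the second tunes $\eta$ for one particular $k$.

```latex
\begin{gather*}
\sum_{t=1}^{T} \mathbf{w}_t^{\top}\boldsymbol{\ell}_t - L_{T,k}
  \;\le\; \frac{-\ln \pi(k)}{\eta} + \frac{\eta T}{8}
  \qquad \text{for every expert } k, \\
\eta = \sqrt{\frac{8\,(-\ln \pi(k))}{T}}
  \quad\Longrightarrow\quad
  \sum_{t=1}^{T} \mathbf{w}_t^{\top}\boldsymbol{\ell}_t - L_{T,k}
  \;\le\; \sqrt{\frac{T\,(-\ln \pi(k))}{2}} .
\end{gather*}
```

With a uniform prior, $-\ln \pi(k) = \ln K$. A best expert with large prior mass gives a smaller bound, but the optimal $\eta$ depends on $k$, which is exactly the kind of unknown an adaptive algorithm has to cope with.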

We insist: your next algorithm is both robust in the worst case and optimal in the lucky case.
Enjoy!
