Easy Data
Peter Grünwald
Centrum Wiskunde & Informatica, Amsterdam – Mathematical Institute, Leiden University
Joint work with W. Koolen, T. Van Erven, N. Mehta, T. Sterkenburg
Today: Three Things To Tell You
1. Nifty Reformulation of Conditions for Fast Rates in Statistical Learning
   – Tsybakov, Bernstein, Exp-Concavity, ...
2. Do this via new concept: ESI
3. Precise Analogue of Bernstein Condition for Fast Rates in Individual Sequence Setting
   – ...and an algorithm that achieves these rates!
Today: Three Things To Tell You
1. Nifty Reformulation of Conditions for Fast Rates in Statistical Learning
2. Do this via new concept: ESI
3. Precise Analogue of Bernstein Condition for Fast Rates in Individual Sequence Setting
   – ...and an algorithm that achieves these rates!
Van Erven, Grünwald, Mehta, Reid, Williamson: Fast Rates in Statistical and Online Learning. JMLR, Special Issue in Memory of A. Chervonenkis, Oct. 2015
• VC: Vapnik–Chervonenkis (1974!) optimistic (realizability) condition
• TM: Tsybakov (2004) margin condition (special case: Massart condition)
• β-BC: Audibert, Bousquet (2005), Bartlett, Mendelson (2006) "Bernstein condition"
  – Does not require 0/1 or absolute loss
  – Does not require the Bayes act to be in the model
Decision Problem
• A decision problem (DP) is defined as a tuple (𝒵, Q, 𝒢, ℓ), where Q is the distribution of a random quantity Z taking values in 𝒵, the model 𝒢 is a set of predictors g, and for each g ∈ 𝒢 the loss function ℓ_g(z) indicates the loss that g makes on z.
• Example: squared error loss ℓ_g(x, y) = (y − g(x))², with z = (x, y).
Decision Problem
• A decision problem (DP) is defined as a tuple (𝒵, Q, 𝒢, ℓ), where Q is the distribution of a random quantity Z taking values in 𝒵, the model 𝒢 is a set of predictors g, and for each g ∈ 𝒢 the loss function ℓ_g(z) indicates the loss that g makes on z.
• We assume throughout that the model contains a risk minimizer g∗, achieving min_{g∈𝒢} E[ℓ_g(Z)].
• E[·] abbreviates E_{Z∼Q}[·].
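To make the setup concrete, here is a minimal Python sketch (an illustration, not from the slides; the toy data-generating distribution and all names are assumptions) of a decision problem with squared error loss over a finite model, together with a Monte Carlo approximation of the risk minimizer g∗:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DP: Z = (X, Y) with Y = 0.5*X + noise, model G a finite set of
# linear predictors g_a(x) = a*x, squared error loss l_g(x, y) = (y - g(x))^2.
def sample_z(n):
    x = rng.uniform(-1, 1, n)
    y = 0.5 * x + 0.1 * rng.standard_normal(n)
    return x, y

slopes = np.linspace(-1, 1, 21)          # the model G, indexed by slope a

def loss(a, x, y):
    return (y - a * x) ** 2

# Monte Carlo approximation of the risk E[l_g(Z)] for each g in G;
# the risk minimizer g* is the predictor attaining the minimum.
x, y = sample_z(100_000)
risks = np.array([loss(a, x, y).mean() for a in slopes])
print("risk minimizer g* has slope ~", slopes[np.argmin(risks)])
```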
Bernstein Condition
• Fix a DP with (for now) bounded loss.
• The DP satisfies the (D, β)-Bernstein condition if there exist D > 0, β ∈ [0, 1], such that for all g ∈ 𝒢:
  E[X_g²] ≤ D · (E[X_g])^β,
  where we set X_g := ℓ_g(Z) − ℓ_{g∗}(Z), and X_g is the 'regret of g relative to g∗'.
Bernstein Condition
• Fix a DP with (for now) bounded loss.
• The DP satisfies the (D, β)-Bernstein condition if there exist D > 0, β ∈ [0, 1], such that for all g ∈ 𝒢:
  E[X_g²] ≤ D · (E[X_g])^β, where X_g := ℓ_g(Z) − ℓ_{g∗}(Z).
• This generalizes the Tsybakov condition: g∗ does not need to be the Bayes act, and the loss does not need to be 0/1.
Bernstein Condition
• Fix a DP with (for now) bounded loss.
• The DP satisfies the (D, β)-Bernstein condition if there exist D > 0, β ∈ [0, 1], such that for all g ∈ 𝒢:
  E[X_g²] ≤ D · (E[X_g])^β, where X_g := ℓ_g(Z) − ℓ_{g∗}(Z).
• Suppose data are i.i.d. and the (D, β)-Bernstein condition holds. Then...
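As a sanity check (again an illustration, not from the slides), one can estimate both sides of the Bernstein inequality by Monte Carlo for the toy squared-loss problem above; for this well-specified squared-loss DP the ratio E[X_g²]/E[X_g] should stay bounded, i.e., the β = 1 case holds:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate E[X_g] and E[X_g^2] with X_g = l_g(Z) - l_{g*}(Z) for squared loss.
def sample_z(n):
    x = rng.uniform(-1, 1, n)
    y = 0.5 * x + 0.1 * rng.standard_normal(n)
    return x, y

x, y = sample_z(200_000)
a_star = 0.5                                 # risk minimizer of the toy DP
for a in [0.4, 0.0, -0.5, 1.0]:              # a few predictors g in the model
    x_g = (y - a * x) ** 2 - (y - a_star * x) ** 2
    m1, m2 = x_g.mean(), (x_g ** 2).mean()
    # bounded ratio E[X_g^2]/E[X_g]  <=>  (D, 1)-Bernstein (Massart-type case)
    print(f"slope {a:+.1f}:  E[X_g]={m1:.4f}  E[X_g^2]={m2:.4f}  ratio={m2 / m1:.2f}")
```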
Under Bernstein (D, β)
• Empirical risk minimization satisfies, with high prob*:
  E[X_ĝ] = O( ((log|𝒢|)/T)^{1/(2−β)} ).
• β = 0: condition trivially satisfied, get minimax rate √((log|𝒢|)/T).
• β = 1: nice case (Massart condition), get 'log-loss' rate (log|𝒢|)/T.
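Spelling out the exponent (a small worked example; the log|𝒢|/T form of the complexity term is my assumption for a finite model):

```latex
\[
  \mathbf{E}[X_{\hat g}] = O\!\Bigl(\bigl(\tfrac{\log|\mathcal{G}|}{T}\bigr)^{\frac{1}{2-\beta}}\Bigr):
  \qquad
  \beta = 0 \;\Rightarrow\; T^{-1/2}, \qquad
  \beta = \tfrac{1}{2} \;\Rightarrow\; T^{-2/3}, \qquad
  \beta = 1 \;\Rightarrow\; T^{-1}.
\]
```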
Under Bernstein (D, β)
• θ-"Bayes" MAP satisfies, with high prob*, the same O(((log|𝒢|)/T)^{1/(2−β)}) rate.
• This requires setting the "learning rate" θ in terms of β and T!
• β = 0: slow rate; β = 1: fast rate.
GOAL: Sequential Bernstein
• θ-"Bayes" MAP satisfies, with high prob*, rate O(((log|𝒢|)/T)^{1/(2−β)}).
• GOAL: design a 'sequential Bernstein condition' and an accompanying sequential prediction algorithm such that
  1. the cumulative regret always satisfies, for all g∗ and all sequences, the worst-case bound;
  2. if the condition holds, it also satisfies, with high prob*, the fast-rate bound.
GOAL: Sequential Bernstein
• GOAL: design a 'sequential Bernstein condition' and an accompanying sequential prediction algorithm such that
  1. the cumulative regret always satisfies, for all g∗ and all sequences, the worst-case bound;
  2. if the condition holds, it also satisfies, with high prob*, the fast-rate bound.
DREAM
• DREAM: design a 'sequential Bernstein condition' and an accompanying sequential prediction algorithm such that
  1. the cumulative regret always satisfies, for all g∗ and all sequences, the worst-case bound;
  2. if the condition holds for a given sequence, then the cumulative regret satisfies, for that sequence, the fast-rate bound.
GOAL: Sequential Bernstein
• GOAL: design a 'sequential Bernstein condition' such that
  1. for all g∗, all sequences: the worst-case bound holds;
  2. if the condition holds, also, with high prob*: the fast-rate bound holds (both bounds are spelled out below).
• Approach 1: define seq. Bernstein as standard Bernstein + i.i.d. Even then, none of the standard algorithms achieve this... with one (?) exception!
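One plausible way to spell out the two desiderata (my reconstruction, not verbatim from the slides), for a finite model with K = |𝒢| and horizon T:

```latex
\begin{align*}
\text{1. always:} \quad
  & \sum_{t=1}^{T}\bigl(\ell_{\hat g_{t-1}}(z_t)-\ell_{g^*}(z_t)\bigr)
    = O\bigl(\sqrt{T\log K}\bigr)
    \quad \text{for all } g^*, \text{ all } z_1,\dots,z_T; \\
\text{2. under the condition:} \quad
  & \sum_{t=1}^{T}\bigl(\ell_{\hat g_{t-1}}(z_t)-\ell_{g^*}(z_t)\bigr)
    = O\bigl(T^{\frac{1-\beta}{2-\beta}}(\log K)^{\frac{1}{2-\beta}}\bigr)
    \quad \text{with high prob*.}
\end{align*}
```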
Today: Three Things To Tell You
1. Nifty Reformulation of Fast Rate Conditions in Statistical Learning
2. Do this via new concept: ESI
3. Precise Analogue of Bernstein Condition for Fast Rates in Individual Sequence Setting
   – ...and an algorithm that achieves these rates!
Exponential Stochastic Inequality (ESI)
• For any given θ > 0 we write Y ≤∗_θ ϑ as shorthand for E[exp(θ(Y − ϑ))] ≤ 1.
• Y ≤∗_θ ϑ implies, via Jensen, E[Y] ≤ ϑ.
• Y ≤∗_θ ϑ implies, via Markov, for all K > 0: P(Y ≥ ϑ + K/θ) ≤ e^{−K}.
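The two one-line derivations, spelled out (using the reconstruction of the shorthand above):

```latex
\begin{align*}
\text{Jensen:} \quad
  & e^{\theta(\mathbf{E}[Y]-\vartheta)}
    \le \mathbf{E}\bigl[e^{\theta(Y-\vartheta)}\bigr] \le 1
    \;\Longrightarrow\; \mathbf{E}[Y] \le \vartheta, \\
\text{Markov:} \quad
  & \Pr\bigl(Y \ge \vartheta + K/\theta\bigr)
    = \Pr\bigl(e^{\theta(Y-\vartheta)} \ge e^{K}\bigr)
    \le e^{-K}\,\mathbf{E}\bigl[e^{\theta(Y-\vartheta)}\bigr] \le e^{-K}.
\end{align*}
```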
ESI Example
• Hoeffding's inequality: suppose that Y has support [−1, 1] and mean 0. Then, for all θ > 0:
  Y ≤∗_θ θ/2.
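A quick numerical sanity check of this ESI form (illustrative, not from the slides), using Rademacher Y, for which E[exp(θY)] = cosh θ ≤ exp(θ²/2):

```python
import numpy as np

rng = np.random.default_rng(2)

# Check Y <=*_theta theta/2, i.e. E[exp(theta*(Y - theta/2))] <= 1,
# for Y in [-1, 1] with mean 0 (here: Rademacher).
y = rng.choice([-1.0, 1.0], size=1_000_000)
for theta in [0.1, 0.5, 1.0, 2.0]:
    esi_moment = np.exp(theta * (y - theta / 2)).mean()
    print(f"theta={theta}:  E[exp(theta(Y - theta/2))] = {esi_moment:.4f}  (should be <= 1)")
```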
ESI – More Properties
• For i.i.d. rvs Y, Y₁, …, Y_T: if Y ≤∗_θ ϑ, then ∑_{t=1}^T Y_t ≤∗_θ T·ϑ (equivalently, (1/T)·∑_{t=1}^T Y_t ≤∗_{Tθ} ϑ).
• For arbitrary (possibly dependent) rvs Y, Z: if Y ≤∗_θ ϑ and Z ≤∗_θ ϑ′, then Y + Z ≤∗_{θ/2} ϑ + ϑ′.
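Both properties have short proofs (my reconstruction of the standard arguments): independence factorizes the exponential moment, and Cauchy–Schwarz handles arbitrary dependence at the cost of halving the ESI rate:

```latex
\begin{align*}
\text{i.i.d.:} \quad
  & \mathbf{E}\Bigl[e^{\theta\sum_{t=1}^{T}(Y_t-\vartheta)}\Bigr]
    = \prod_{t=1}^{T}\mathbf{E}\bigl[e^{\theta(Y_t-\vartheta)}\bigr] \le 1, \\
\text{arbitrary:} \quad
  & \mathbf{E}\Bigl[e^{\frac{\theta}{2}(Y+Z-\vartheta-\vartheta')}\Bigr]
    \le \sqrt{\mathbf{E}\bigl[e^{\theta(Y-\vartheta)}\bigr]\,
              \mathbf{E}\bigl[e^{\theta(Z-\vartheta')}\bigr]} \le 1
    \quad \text{(Cauchy–Schwarz)}.
\end{align*}
```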
Bernstein in ESI Terms
• Most general form of the Bernstein condition: for some nondecreasing function u, for all g ∈ 𝒢:
  E[X_g²] ≤ u(E[X_g]).
Bernstein in ESI Terms
• Most general form of the Bernstein condition: for some nondecreasing function u, for all g ∈ 𝒢: E[X_g²] ≤ u(E[X_g]).
• Van Erven et al. (2015) show this is equivalent to having, for all ε > 0 and all g ∈ 𝒢, −X_g ≤∗_{v(ε)} ε for some nondecreasing function v with v(x) ≍ x/u(x).
v-Central Condition
• Van Erven et al. (2015) show the Bernstein condition is equivalent to the existence of an increasing function v such that, for all ε > 0 and all g ∈ 𝒢:
  E[exp(v(ε) · (ℓ_{g∗}(Z) − ℓ_g(Z)))] ≤ e^{v(ε)·ε}.
  They term this the v-central condition.
v-Central Condition
• Van Erven et al. (2015) show the Bernstein condition is equivalent to the existence of an increasing function v such that, for all ε > 0 and all g ∈ 𝒢: E[exp(v(ε)·(ℓ_{g∗}(Z) − ℓ_g(Z)))] ≤ e^{v(ε)·ε}. They term this the v-central condition.
  – It can also be related to mixability, exp-concavity, the JRT condition, and a condition for well-behavedness of Bayesian inference under misspecification.
v-Central Condition
• Van Erven et al. (2015) show the Bernstein condition is equivalent to the existence of an increasing function v such that, for all ε > 0 and all g ∈ 𝒢: E[exp(v(ε)·(ℓ_{g∗}(Z) − ℓ_g(Z)))] ≤ e^{v(ε)·ε}. They term this the v-central condition.
  – It can also be related to mixability, exp-concavity, the JRT condition, and a condition for well-behavedness of Bayesian inference under misspecification.
  – For unbounded losses it becomes different from (and better than!) the Bernstein condition.
  – It is one-sided.
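A worked special case (standard, though the exact constants are my reconstruction): the (D, β)-Bernstein condition corresponds to v of power form,

```latex
\[
  u(x) = D\,x^{\beta}
  \quad\Longleftrightarrow\quad
  v(x) \asymp \frac{x}{u(x)} = \frac{x^{1-\beta}}{D},
\]
```

so β = 1 gives constant v (the fixed-rate central condition, the Massart-type fast-rate case), while β = 0 gives v(x) ∝ x (the slow-rate case).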
Three Equivalent Notions for Bounded Losses
• The v-central condition in terms of the regret X_g = ℓ_g − ℓ_{g∗}: for all ε > 0, all g ∈ 𝒢:
  −X_g ≤∗_{v(ε)} ε
  ...or equivalently (extending notation): −X_g ≤∗_v 0, where Y ≤∗_v ϑ abbreviates: Y ≤∗_{v(ε)} ϑ + ε for all ε > 0.
Three Equivalent Notions for Bounded Losses
• The v-central condition in terms of regret: −X_g ≤∗_v 0, with v as above.
• For bounded losses, this turns out to be equivalent to a Bernstein-like statement: for some appropriately chosen function w with w(x) ≍ x/u(x), the regret X_g satisfies an ESI version of the Bernstein condition at rate w.
Three Equivalent Notions for Bounded Losses
• The v-central condition in terms of regret: −X_g ≤∗_v 0, with v as above.
• For bounded losses, this turns out to be equivalent to a Bernstein-like statement: for some appropriately chosen function w with w(x) ≍ x/u(x), the regret X_g satisfies an ESI version of the Bernstein condition at rate w.
• This is more similar to the original Bernstein condition. However, the condition is now in 'exponential' (ESI) rather than 'expectation' form.
Today: Three Things To Tell You
1. Nifty Reformulation of Fast Rate Conditions in Statistical Learning
2. Do this via new concept: ESI
3. Precise Analogue of Bernstein Condition for Fast Rates in Individual Sequence Setting
   – ...and an algorithm that achieves these rates!
T-fold v-Central Condition
• Suppose that the v-central condition holds (i.e., x/v(x)-Bernstein holds) and that the data Z₁, …, Z_T are i.i.d. Then, by the generic i.i.d. property of ESI, with θ_ϑ = D₁ · v(ϑ):
  −∑_{t=1}^T X_g(Z_t) ≤∗_{θ_ϑ} T·ϑ for all g ∈ 𝒢,
  where X_g(z) = ℓ_g(z) − ℓ_{g∗}(z).
T-fold v-Central Condition
• Under the v-central condition and i.i.d. data, with θ_ϑ = D₁ · v(ϑ):
  −∑_{t=1}^T X_g(Z_t) ≤∗_{θ_ϑ} T·ϑ for all fixed g ∈ 𝒢,
  but also, for every learning algorithm, with ĝ_{t−1} the algorithm's prediction based on Z₁, …, Z_{t−1}:
  −∑_{t=1}^T X_{ĝ_{t−1}}(Z_t) ≤∗_{θ_ϑ} T·ϑ.
Cumulative v-Central Condition
• Under the v-central condition and i.i.d. data, with θ_ϑ = D₁ · v(ϑ):
  −∑_{t=1}^T X_g(Z_t) ≤∗_{θ_ϑ} T·ϑ for all fixed g ∈ 𝒢, but also for every learning algorithm.
• This condition may of course also hold for non-i.i.d. data. It is the condition we need, so we term it the cumulative v-central condition.
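A plausible reconstruction of why the condition extends from fixed g to learning algorithms: since ĝ_{t−1} depends only on Z₁, …, Z_{t−1}, the per-round ESI holds conditionally on the past, and the exponential moments peel off one round at a time:

```latex
\begin{align*}
\mathbf{E}\Bigl[e^{\theta\sum_{t=1}^{T}(-X_{\hat g_{t-1}}(Z_t)-\vartheta)}\Bigr]
 &= \mathbf{E}\Bigl[e^{\theta\sum_{t=1}^{T-1}(-X_{\hat g_{t-1}}(Z_t)-\vartheta)}
     \underbrace{\mathbf{E}\bigl[e^{\theta(-X_{\hat g_{T-1}}(Z_T)-\vartheta)}
       \,\big|\, Z_1,\dots,Z_{T-1}\bigr]}_{\le\,1}\Bigr] \\
 &\le \mathbf{E}\Bigl[e^{\theta\sum_{t=1}^{T-1}(-X_{\hat g_{t-1}}(Z_t)-\vartheta)}\Bigr]
  \;\le\; \dots \;\le\; 1.
\end{align*}
```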
Hedge with Oracle Learning Rate
• Hedge with learning rate θ achieves the regret bound, for all g∗ ∈ 𝒢 and all sequences z₁, …, z_T (K = |𝒢|, losses in [0, 1]):
  ∑_{t=1}^T X_{ĝ_{t−1}}(z_t) ≤ (log K)/θ + θ·T/8.
• We assume the cumulative v-central condition for some v. For simplicity assume v(ϑ) ≍ ϑ^{1−β}; then, with high prob*, the stochastic part of the regret is at most T·ϑ, and the same even holds in ESI form for some other constant.
Hedge with Oracle Learning Rate
• Combining, we get a bound whose terms trade off against each other. We can set ϑ (or, equivalently, θ) as we like. The best possible bound is achieved if we make sure all terms are of the same order, i.e., we set, at time T,
  ϑ_T ≍ ((log K)/T)^{1/(2−β)},
  and then θ ≍ v(ϑ_T) and the cumulative regret is O(T^{(1−β)/(2−β)} · (log K)^{1/(2−β)}) with high prob*.
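For concreteness, a minimal sketch of Hedge with a fixed learning rate (my illustration, not the slides' code; the 'oracle' tuning above would set theta from T, K, and β):

```python
import numpy as np

def hedge(losses, theta):
    """Hedge / exponential weights over K experts.

    losses: (T, K) array of per-round expert losses in [0, 1].
    theta:  fixed learning rate (oracle tuning would set it from T, K, beta).
    Returns the algorithm's cumulative loss and the best expert's.
    """
    T, K = losses.shape
    log_w = np.zeros(K)                    # log-weights; uniform prior
    alg_loss = 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                       # normalized weights
        alg_loss += w @ losses[t]          # expected loss this round
        log_w -= theta * losses[t]         # exponential-weights update
    return alg_loss, losses.sum(axis=0).min()

# Toy run: 2 experts, one slightly better on average.
rng = np.random.default_rng(3)
L = rng.uniform(0, 1, size=(1000, 2))
L[:, 0] *= 0.9                             # expert 0 is better
alg, best = hedge(L, theta=np.sqrt(8 * np.log(2) / 1000))
print(f"algorithm loss {alg:.1f}, best expert {best:.1f}, regret {alg - best:.1f}")
```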
Squint without Oracle Learning Rate!
• Hedge achieves the ESI (!) bound above... but needs to know g∗, γ, and T to set the learning rate!
• Squint (Koolen and Van Erven '15) achieves the same bound without knowing these!
• It gets the bound with γ = 0 automatically for individual sequences.
• What about AdaNormalHedge? (Luo & Schapire '15)
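For concreteness, a sketch of the Squint update with a discrete prior over learning rates (my paraphrase of Koolen and Van Erven '15, with an evenly spaced grid standing in for their continuous prior): expert k gets weight proportional to the prior times the average over η of η·exp(η·R_k − η²·V_k), where R_k is its cumulative instantaneous regret and V_k the cumulative squared regret:

```python
import numpy as np

def squint(losses, n_eta=20):
    """Squint over K experts, uniform discrete prior on eta in (0, 1/2].

    losses: (T, K) array of per-round expert losses in [0, 1].
    Tracks R_k = sum_t r_tk and V_k = sum_t r_tk^2, where
    r_tk = <w_t, l_t> - l_tk is the instantaneous regret of expert k.
    """
    T, K = losses.shape
    etas = np.linspace(1e-3, 0.5, n_eta)             # grid of learning rates
    R, V = np.zeros(K), np.zeros(K)
    alg_loss = 0.0
    for t in range(T):
        E = np.outer(R, etas) - np.outer(V, etas ** 2)   # (K, n_eta) exponents
        E -= E.max()                                 # stabilize; cancels on normalizing
        evidence = (etas[None, :] * np.exp(E)).mean(axis=1)
        w = evidence / evidence.sum()                # Squint weights
        r = w @ losses[t] - losses[t]                # instantaneous regrets r_tk
        alg_loss += w @ losses[t]
        R += r
        V += r ** 2
    return alg_loss, losses.sum(axis=0).min()
```

Note that no oracle tuning appears anywhere: the averaging over η is what removes the need to know g∗, γ, and T.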
Dessert: Easy Data Rather than Distributions
• We are working with algorithms such as Hedge and Squint, designed for individual, nonstochastic sequences.
• Yet the cumulative v-central condition is stochastic.
• Does there exist a nonstochastic analogue?
• The answer is yes.