How can we generalize well?
Paul Mineiro
ECML 2015 Big Targets Workshop
Extreme Challenges
- How can we generalize well?
- Can we compete with OAA?
- When can we predict quickly?
How can we generalize well?
Chasing Tails
- Typical extreme datasets have many rare classes.
- What are the implications for generalization?
- Let's use the bootstrap to get intuition.
Bootstrap Lesson
Observation (Tail Frequencies): the true frequencies of tail classes are not clear given the training set.
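To make this observation concrete, here is a minimal numpy sketch (my own illustration, not from the talk) that bootstrap-resamples a hypothetical long-tailed training set and shows how unstable the empirical count of a single tail class is.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical long-tailed label distribution: a few head classes, many tail classes.
n_classes, n_train = 1000, 10_000
true_freq = 1.0 / np.arange(1, n_classes + 1)   # Zipf-like tail
true_freq /= true_freq.sum()
train_labels = rng.choice(n_classes, size=n_train, p=true_freq)

# Bootstrap: resample the training set and track the count of one tail class,
# which appears only a couple of times in expectation.
tail_class = 800
counts = [
    np.sum(rng.choice(train_labels, size=n_train, replace=True) == tail_class)
    for _ in range(1000)
]
print("bootstrap counts for tail class:", np.percentile(counts, [5, 50, 95]))
# Typical output: the count swings between 0 and a handful of examples, i.e. the
# training set pins down head frequencies well but says little about any one tail class.
```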
Two Loss Patterns
All classes below have 1 training example. Which hypothesis do you like better?

          h_1    h_2
class 1   1      0.6
class 2   1      0.6
class 3   0      0.42
class 4   0      0.42

ERM likes h_1 better. I like h_2 better.
The Extreme Deficiencies of ERM
- ERM cares only about average loss:
  h* = argmin_{h ∈ H} E_{(x,y)∼D}[ l(h(x); y) ]
  ... but in extreme learning, empirical losses can have high variance.
- ERM doesn't care about empirical loss variance.
- ERM is based upon a uniform bound on the hypothesis space.
eXtreme Risk Minimization
Sample Variance Penalization (XRM) penalizes a combination of expected loss and loss variance:
  h* = argmin_{h ∈ H} ( E[ l(h(x); y) ] + κ √( V[ l(h(x); y) ] ) )
(κ is a hyperparameter in practice.)
XRM is based upon empirical Bernstein bounds.
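As a concrete illustration (my own sketch, not code from the talk; it penalizes the standard deviation of the per-example losses, consistent with the mini-batch gradient on the next slide), the penalized empirical objective separates the two hypotheses from the earlier "Two Loss Patterns" slide:

```python
import numpy as np

def xrm_objective(per_example_losses, kappa=0.25):
    """Empirical sample-variance-penalized objective: mean loss + kappa * std of loss."""
    losses = np.asarray(per_example_losses, dtype=float)
    return losses.mean() + kappa * losses.std()

# Two hypotheses with nearly the same mean loss but very different loss variance.
h1 = [1.0, 1.0, 0.0, 0.0]       # the ERM favourite from the earlier slide
h2 = [0.6, 0.6, 0.42, 0.42]
print(xrm_objective(h1), xrm_objective(h2))   # h2 wins once variance is penalized
```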
Example: Neural Language Modeling
Mini-batch XRM gradient:
  E_i[ ( 1 + κ ( l_i(φ) − E_j[ l_j(φ) ] ) / √( E_j[ l_j²(φ) ] − E_j[ l_j(φ) ]² ) ) ∂l_i(φ)/∂φ ]
- Smaller than average loss ⟹ lower learning rate.
- Larger than average loss ⟹ larger learning rate.
- Loss variance is the unit of loss measurement.
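One minimal way to realize this (an assumed mini-batch re-weighting sketch, not the talk's actual implementation) is to scale each example's gradient contribution by the parenthesized factor:

```python
import numpy as np

def xrm_example_weights(minibatch_losses, kappa=0.25, eps=1e-8):
    """Per-example weights 1 + kappa * (l_i - mean) / std, i.e. the factor that
    multiplies each example's gradient in the mini-batch XRM gradient."""
    l = np.asarray(minibatch_losses, dtype=float)
    centered = l - l.mean()
    # Biased (population) standard deviation, as in the slide's formula.
    std = np.sqrt(np.maximum((l**2).mean() - l.mean()**2, 0.0)) + eps
    return 1.0 + kappa * centered / std

# Usage: multiply each example's loss (hence its gradient) by its weight before backprop.
losses = np.array([2.3, 0.7, 1.1, 4.0])
print(xrm_example_weights(losses))   # >1 for above-average losses, <1 for below-average
```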
Example: Neural Language Modeling
- enwik9 data set
- FNN-LM of Zhang et al.
- Everything the same except κ.

method           perplexity
ERM (κ = 0)      106.3
XRM (κ = 0.25)   104.1

Modest lift, but over a SOTA baseline and with minimal code changes.
Example: Neural Language Modeling
[Figure: progressive loss variance vs. example number (10^4 to 10^10) for ERM and XRM.]
Example: Randomized Embeddings
Based upon (randomized) SVD: the weight matrix mapping features X (n × d) to labels Y (n × c) is approximated by a rank-k factorization W = T V_k^⊤, with T (d × k) and V_k (c × k).
- How to adapt a black-box technique to XRM?
- Idea: proxy model ⟹ importance weights.
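A naive sketch of the rank-k factorization (my own illustration: it forms the full least-squares weight matrix explicitly rather than using a scalable randomized scheme, assumes scikit-learn is available, and the ridge parameter is an arbitrary choice):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def rank_k_weights(X, Y, k, ridge=1e-3):
    """Fit W minimizing ||XW - Y||^2 + ridge * ||W||^2, then keep a rank-k
    factorization W ≈ T @ Vk.T via randomized SVD."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)   # d x c
    U, s, Vt = randomized_svd(W, n_components=k, random_state=0)
    T = U * s      # d x k
    Vk = Vt.T      # c x k
    return T, Vk

# Usage: scores for all c classes are X @ T @ Vk.T; predict the argmax per row.
```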
Imbalanced binary XRM
Binary classification with a constant predictor q, evaluated at the positive base rate q = p:
  l(y; q) = −( y log(q) + (1 − y) log(1 − q) )

  ( 1 + κ ( l(y; q) − E[ l(·; q) ] ) / √( E[ l²(·; q) ] − E[ l(·; q) ]² ) ) |_{q = p}
    = { 1 − κ √( p / (1 − p) )      if y = 0
      { 1 + κ √( (1 − p) / p )      if y = 1        (for p ≤ 0.5)
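In code, a small sketch of the closed form above (the clipping of negative weights at zero for large κ is my own assumption, not stated on the slide):

```python
import numpy as np

def binary_xrm_weight(y, p, kappa=1.0):
    """Closed-form XRM importance weight for a binary example with label y,
    when the positive base rate is p <= 0.5."""
    if y == 1:
        w = 1.0 + kappa * np.sqrt((1.0 - p) / p)   # rare positives get up-weighted
    else:
        w = 1.0 - kappa * np.sqrt(p / (1.0 - p))   # common negatives get down-weighted
    return max(w, 0.0)                              # assumption: clip negative weights

print(binary_xrm_weight(1, 0.01), binary_xrm_weight(0, 0.01))
```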
XRM Rembed for ODP
- Compute the base rate q_c for each class c.
- Importance weight: 1 + κ / √( q_{y_i} ).

method             error rate (%)
ODP ERM            [80.3, 80.4]
ODP XRM (κ = 1)    [78.5, 78.7]

Modest lift, but over a SOTA baseline and with minimal code changes.
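A sketch of the multiclass weighting rule above (my own illustration, vectorized over the training labels, with q_c taken as the empirical class frequency):

```python
import numpy as np

def xrm_importance_weights(labels, n_classes, kappa=1.0):
    """Per-example importance weights 1 + kappa / sqrt(q_{y_i}), where q_c is the
    empirical base rate of class c in the training set."""
    counts = np.bincount(labels, minlength=n_classes)
    q = counts / counts.sum()
    return 1.0 + kappa / np.sqrt(q[labels])

labels = np.array([0, 0, 0, 1, 2])          # class 0 is common, classes 1 and 2 are rare
print(xrm_importance_weights(labels, 3))    # rare-class examples get larger weights
```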
Summary
- The tail can deviate wildly between train and test.
- Controlling loss variance helps a little bit.
- Speculation: explicitly treat the head and tail differently?