The Ladder: A Reliable Leaderboard for Machine Learning Competitions

COMS 6998-4 2017, Topics in Learning Theory

Qinyao He
qh2183@columbia.edu
Columbia University

November 30, 2017
Outline

Introduction
Problem Formulation
Ladder Mechanism
Parameter-Free Modification
Boosting Attack
Experiments on Real Data
Outline

Introduction
Problem Formulation
Ladder Mechanism
Boosting Attack
Experiments on Real Data
Kaggle Competition

Figure: Public and Private Leaderboard
Overfitting

▶ Repeated submissions to the Kaggle leaderboard tend to overfit the public leaderboard dataset.
▶ The public leaderboard score may therefore not reflect actual performance, and participants can be misled.
▶ In fact, the gap between the public leaderboard score and the actual performance can be as large as $O(\sqrt{k/n})$, where $k$ is the number of submissions and $n$ is the size of the public leaderboard dataset.
▶ How should we deal with this? How can we maintain a leaderboard that gives a reliable, accurate estimate of the true performance?
Ways to Reduce This Effect

▶ Limit the rate of submission (e.g., a maximum of 10 submissions per day).
▶ Limit the numerical accuracy returned by the leaderboard (rounding to a fixed number of decimal digits).

We want a theoretical guarantee that holds even for a very large number of submissions.
Outline

Introduction
Problem Formulation
Ladder Mechanism
Boosting Attack
Experiments on Real Data
Preliminaries and Notation

▶ Data domain $X$ and label domain $Y$; unknown distribution $\mathcal{D}$ over $X \times Y$.
▶ Classifier $f : X \to Y$; loss function $\ell : Y \times Y \to [0,1]$.
▶ Sample $S = \{(x_1, y_1), \dots, (x_n, y_n)\}$ drawn i.i.d. from $\mathcal{D}$.
▶ Empirical loss:
$$R_S(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i)$$
▶ True loss:
$$R_{\mathcal{D}}(f) = \mathbb{E}_{(x,y) \sim \mathcal{D}}[\ell(f(x), y)]$$
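To make the notation concrete, here is a minimal sketch of the empirical loss $R_S(f)$ for the 0/1 loss; the function and variable names are mine, not from the paper.

    import numpy as np

    def empirical_loss(predictions, labels):
        """Empirical 0/1 loss R_S(f): the fraction of leaderboard samples
        on which the submitted predictions disagree with the true labels."""
        return np.mean(np.asarray(predictions) != np.asarray(labels))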
Leaderboard Model

1. At each time $t$, a competitor submits a classifier $f_t$ (in practice, a vector of predictions over the holdout dataset).
2. The leaderboard returns a score estimate $R_t$ to the competitor, computed on the public leaderboard dataset $S$.
3. Finally, the true score over $\mathcal{D}$ is estimated on a separate private dataset.
Error Evaluation

Given a sequence of classifiers $f_1, f_2, \dots, f_k$ and the scores $R_t$ returned by the leaderboard, we want to bound
$$\max_{1 \le t \le k} |R_{\mathcal{D}}(f_t) - R_t|$$
i.e., we should ensure
$$\Pr[\exists t \in [k] : |R_{\mathcal{D}}(f_t) - R_t| > \epsilon] \le \delta$$
The score on the private leaderboard is close to the true loss, since the private data are never revealed to the competitor.
Kaggle Algorithm

Algorithm 1 Kaggle Algorithm
Input: data set $S$, rounding parameter $\alpha > 0$ (typically $0.00001$)
for each round $t \leftarrow 1, 2, \dots$ do
    Receive function $f_t : X \to Y$
    return $[R_S(f_t)]_\alpha$
end for

$[x]_\alpha$ denotes rounding $x$ to the nearest integer multiple of $\alpha$, e.g., $[3.14159]_{0.01} = 3.14$.
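A minimal Python sketch of this mechanism, assuming 0/1 loss on binary predictions (the class name is my own choice):

    import numpy as np

    class KaggleLeaderboard:
        """Naive mechanism: return the rounded empirical loss of every submission."""

        def __init__(self, labels, alpha=1e-5):
            self.labels = np.asarray(labels)  # public leaderboard labels S
            self.alpha = alpha                # rounding parameter

        def submit(self, predictions):
            # Empirical 0/1 loss of the submission on S
            loss = np.mean(np.asarray(predictions) != self.labels)
            # Round to the nearest integer multiple of alpha
            return self.alpha * round(loss / self.alpha)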
Simple Non-adaptive Case

▶ Assume all $f_1, \dots, f_k$ are fixed independently of $S$.
▶ Simply report the empirical loss $R_S(f_t)$ as $R_t$.
▶ Directly applying Hoeffding's inequality and a union bound, we have
$$\Pr[\exists t \in [k] : |R_{\mathcal{D}}(f_t) - R_S(f_t)| > \epsilon] \le 2k \exp(-2\epsilon^2 n)$$
▶ Equivalently,
$$\epsilon = O\left(\sqrt{\frac{\log k}{n}}\right), \qquad k = O(\exp(\epsilon^2 n))$$
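As a quick sanity check on this bound, the snippet below inverts it for a target failure probability $\delta$ (the function name and example numbers are mine):

    import math

    def hoeffding_epsilon(n, k, delta=0.05):
        """Smallest epsilon with 2*k*exp(-2*eps^2*n) <= delta."""
        return math.sqrt(math.log(2 * k / delta) / (2 * n))

    # e.g., n = 10000 leaderboard samples, k = 1000 fixed submissions:
    # hoeffding_epsilon(10000, 1000) ~= 0.023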
Adaptive Setting

▶ Classifier $f_t$ may be chosen as a function of the previous estimates:
$$f_t = A(f_1, R_1, \dots, f_{t-1}, R_{t-1})$$
Independence of $f_1, \dots, f_k$ from $S$ never holds, so we can no longer union bound over just $k$ classifiers!
▶ We will later show a simple attack that forces the Kaggle algorithm to incur error $\epsilon = \Omega(\sqrt{k/n})$; a sketch follows this list.
▶ In fact, there is no computationally efficient way to achieve $o(1)$ error once $k \ge n^{2+o(1)}$.
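A hedged sketch of that attack in the spirit of the paper's boosting attack, assuming binary labels and 0/1 loss (details such as the threshold and tie-breaking are simplified):

    import numpy as np

    def boosting_attack(leaderboard, n, k, rng=None):
        """Submit k uniformly random 0/1 prediction vectors, keep those
        that score better than chance on the holdout set, and return
        their coordinate-wise majority vote, which overfits the public
        leaderboard by roughly sqrt(k/n)."""
        rng = np.random.default_rng(rng)
        good = []
        for _ in range(k):
            u = rng.integers(0, 2, size=n)   # pure-noise predictions
            if leaderboard.submit(u) < 0.5:  # lucky on S? keep it
                good.append(u)
        if not good:
            return rng.integers(0, 2, size=n)
        # Aggregate the lucky submissions by majority vote
        return (np.mean(good, axis=0) > 0.5).astype(int)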
Leaderboard Error

Bounding the error at every step, as above, is not possible. We therefore introduce a weaker notion: we only care about the best classifier submitted so far, rather than accurately estimating every $f_i$. Let $R_t$, returned by the leaderboard at time $t$, represent the estimated loss of the best classifier seen so far.

Definition
Given adaptively chosen $f_1, \dots, f_k$, define the leaderboard error of estimates $R_1, \dots, R_k$ as
$$\mathrm{lberr}(R_1, \dots, R_k) = \max_{1 \le t \le k} \left| \min_{1 \le i \le t} R_{\mathcal{D}}(f_i) - R_t \right|$$
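This definition translates directly to code; here is a small helper for simulations where the true losses are known (names are mine):

    import numpy as np

    def leaderboard_error(true_losses, estimates):
        """lberr: worst gap, over rounds t, between the best true loss
        among the first t submissions and the leaderboard's estimate R_t."""
        best_so_far = np.minimum.accumulate(np.asarray(true_losses, dtype=float))
        return float(np.max(np.abs(best_so_far - np.asarray(estimates))))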
Outline

Introduction
Problem Formulation
Ladder Mechanism
Boosting Attack
Experiments on Real Data
Ladder Algorithm

Algorithm 2 Ladder Algorithm
Input: data set $S$, step size $\eta > 0$
Assign initial estimate $R_0 \leftarrow \infty$
for each round $t \leftarrow 1, 2, \dots$ do
    Receive function $f_t : X \to Y$
    if $R_S(f_t) < R_{t-1} - \eta$ then
        Assign $R_t \leftarrow [R_S(f_t)]_\eta$
    else
        Assign $R_t \leftarrow R_{t-1}$
    end if
    return $R_t$
end for

A submission must improve on the current best loss by a margin of at least $\eta$ to be accepted as the new best.
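A minimal Python sketch of the Ladder mechanism, mirroring the KaggleLeaderboard sketch above (class name and 0/1 loss are again my own choices):

    import numpy as np

    class LadderLeaderboard:
        """Ladder mechanism: only release a new score when a submission
        improves on the best score so far by at least eta."""

        def __init__(self, labels, eta=0.01):
            self.labels = np.asarray(labels)
            self.eta = eta
            self.best = float("inf")  # R_0 = infinity

        def submit(self, predictions):
            loss = np.mean(np.asarray(predictions) != self.labels)
            if loss < self.best - self.eta:
                # Significant improvement: round to a multiple of eta, release
                self.best = self.eta * round(loss / self.eta)
            return self.best

Intuitively, an attacker now learns only whether a submission beat the running best by $\eta$, so the better-than-chance filter in the boosting attack above collapses and its majority vote stays near chance.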
Error Bound

Theorem
For any adaptively chosen $f_1, \dots, f_k$, the Ladder Mechanism with a suitably chosen step size $\eta$ satisfies
$$\mathrm{lberr}(R_1, \dots, R_k) = O\left(\frac{\log^{1/3}(kn)}{n^{1/3}}\right)$$

Put another way, we can allow up to $k = O\left(\frac{1}{n} \exp(\epsilon^3 n)\right)$ submissions and still expect the leaderboard error to stay below $\epsilon$. Previously, $k = O(n^2)$ was the barrier (recall that no computationally efficient mechanism achieves $o(1)$ per-step error once $k \ge n^{2+o(1)}$).
Proof

▶ Recall the union bound technique applied in the non-adaptive setting:
$$\Pr[\exists t \in [k] : |R_{\mathcal{D}}(f_t) - R_S(f_t)| > \epsilon] \le 2k \exp(-2\epsilon^2 n)$$
▶ There are no longer only $k$ possible classifiers; to apply the union bound we must consider every classifier that could possibly arise during the interaction.
▶ The problem now becomes counting the total number of distinct classifiers, sketched below.
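A rough version of the counting step, paraphrasing the standard Ladder argument with constants omitted (treat this as an outline, not the paper's exact proof):

▶ Assume the competitor's algorithm $A$ is deterministic; then $f_t$ is determined entirely by the previous answers $R_1, \dots, R_{t-1}$.
▶ Each answer is a multiple of $\eta$ in $[0,1]$ (or $\infty$), and the answer sequence drops by at least $\eta$ whenever it changes, so it changes at most $1/\eta + 1$ times.
▶ Encoding each drop by its round in $[k]$ and its new value bounds the number of reachable classifiers $\mathcal{F}$:
$$|\mathcal{F}| \le \bigl(k \cdot (1/\eta + 1)\bigr)^{1/\eta + 1}, \qquad \log |\mathcal{F}| = O\left(\frac{1}{\eta} \log \frac{k}{\eta}\right)$$
▶ Hoeffding plus a union bound over $\mathcal{F}$, together with the $\eta$ rounding, gives error $O(\sqrt{\log|\mathcal{F}|/n}) + \eta$; choosing $\eta \approx (\log(kn)/n)^{1/3}$ yields the stated bound.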