Learnability Beyond Uniform Convergence
Shai Shalev-Shwartz
School of CS and Engineering, The Hebrew University of Jerusalem
“Mathematical and Computational Foundations of Learning Theory”, Dagstuhl 2011
Joint work with:
N. Srebro, O. Shamir, K. Sridharan (COLT’09, JMLR’11)
A. Daniely, S. Sabato, S. Ben-David (COLT’11)
Shai Shalev-Shwartz (Hebrew U) Learnability Beyond Uniform Convergence Jul’11 1 / 34

The Fundamental Theorem of Learning Theory
For Binary Classification
Finite VC ⇒ Uniform Convergence (VC’71)
Uniform Convergence ⇒ Learnable with ERM (trivial)
Learnable with ERM ⇒ Learnable (trivial)
Learnable ⇒ Finite VC (NFL, W’96)

The Fundamental Theorem of Learning Theory
For Regression
Finite fat-shattering ⇒ Uniform Convergence (BLW’96, ABCH’97)
Uniform Convergence ⇒ Learnable with ERM (trivial)
Learnable with ERM ⇒ Learnable (trivial)
Learnable ⇒ Finite fat-shattering (KS’94, BLW’96, ABCH’97)

For general learning problems?
Uniform Convergence ⇒ Learnable with ERM (trivial)
Learnable with ERM ⇒ Learnable (trivial)
Learnable ⇒ Uniform Convergence? Not true, even in multiclass classification!
What is learnable? How to learn?

Outline
1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Open Questions

The General Learning Setting
Vapnik’s General Learning Setting
Hypothesis class $\mathcal{H}$
Instance space $\mathcal{Z}$ with unknown distribution $\mathcal{D}$
Loss function $\ell : \mathcal{H} \times \mathcal{Z} \to \mathbb{R}$
Given: training set $S \sim \mathcal{D}^m$
Goal: probably approximately solve $\min_{h \in \mathcal{H}} L(h)$, where $L(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$

Examples
Binary classification: $\mathcal{Z} = \mathcal{X} \times \{0, 1\}$; $h \in \mathcal{H}$ is a predictor $h : \mathcal{X} \to \{0, 1\}$; $\ell(h, (x, y)) = \mathbb{1}[h(x) \neq y]$
Multiclass categorization: $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$; $h \in \mathcal{H}$ is a predictor $h : \mathcal{X} \to \mathcal{Y}$; $\ell(h, (x, y)) = \mathbb{1}[h(x) \neq y]$
k-means clustering: $\mathcal{Z} = \mathbb{R}^d$; $\mathcal{H} \subset (\mathbb{R}^d)^k$ specifies $k$ cluster centers; $\ell((\mu_1, \ldots, \mu_k), z) = \min_j \|\mu_j - z\|$
Density estimation: $h$ is a parameter of a density $p_h(z)$; $\ell(h, z) = -\log p_h(z)$
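
The example losses above all fit the single signature $\ell(h, z)$, which is what makes the general setting convenient to work with. The sketch below is an illustration only, assuming hypotheses are plain Python callables or tuples of centers; the function names (`zero_one_loss`, `k_means_loss`, `neg_log_likelihood`, `empirical_risk`) are hypothetical and not taken from the slides.

```python
import math
from typing import Callable, Hashable, Sequence, Tuple

Vector = Sequence[float]


def zero_one_loss(h: Callable[[Hashable], Hashable],
                  z: Tuple[Hashable, Hashable]) -> float:
    """Binary or multiclass loss: 1[h(x) != y] for z = (x, y)."""
    x, y = z
    return 0.0 if h(x) == y else 1.0


def k_means_loss(centers: Sequence[Vector], z: Vector) -> float:
    """k-means loss: distance from z to the nearest of the k centers."""
    return min(math.dist(mu, z) for mu in centers)


def neg_log_likelihood(density: Callable[[float], float], z: float) -> float:
    """Density-estimation loss: -log p_h(z), with p_h passed as a callable."""
    return -math.log(density(z))


def empirical_risk(loss, h, sample) -> float:
    """L_S(h): average loss of h over the training set S."""
    return sum(loss(h, z) for z in sample) / len(sample)
```

For instance, `empirical_risk(zero_one_loss, h, S)` and `empirical_risk(k_means_loss, centers, S)` evaluate the same quantity $L_S(h)$ for two different learning problems.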

Learnability, ERM, Uniform convergence
Uniform Convergence: for $m \ge m_{\mathrm{UC}}(\epsilon, \delta)$, $\mathbb{P}_{S \sim \mathcal{D}^m}\left[\forall h \in \mathcal{H},\ |L_S(h) - L(h)| \le \epsilon\right] \ge 1 - \delta$
Learnable: $\exists A$ s.t. for $m \ge m_{\mathrm{PAC}}(\epsilon, \delta)$, $\mathbb{P}_{S \sim \mathcal{D}^m}\left[L(A(S)) \le \min_{h \in \mathcal{H}} L(h) + \epsilon\right] \ge 1 - \delta$
ERM: an algorithm that returns $A(S) \in \operatorname{argmin}_{h \in \mathcal{H}} L_S(h)$
Learnable by arbitrary ERM: like “Learnable”, but $A$ should be an ERM. Denote the sample complexity by $m_{\mathrm{ERM}}(\epsilon, \delta)$
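
To make the three definitions concrete, here is a small simulation on a toy problem that is not from the slides: threshold classifiers $h_t(x) = \mathbb{1}[x \ge t]$ on a grid of 21 thresholds, with $x$ uniform on $[0, 1]$ and noiseless labels $y = \mathbb{1}[x \ge 0.5]$. Under these assumptions the true risk is $L(h_t) = |t - 0.5|$, so both the uniform-convergence gap and the ERM output can be checked directly.

```python
import random


def true_risk(t: float) -> float:
    """L(h_t) for x ~ Uniform[0, 1], y = 1[x >= 0.5], h_t(x) = 1[x >= t]."""
    return abs(t - 0.5)


def empirical_risk(t: float, sample) -> float:
    """L_S(h_t): fraction of training examples that h_t misclassifies."""
    return sum((x >= t) != y for x, y in sample) / len(sample)


def draw_sample(m: int, rng: random.Random):
    """Draw S ~ D^m for the toy distribution described above."""
    xs = [rng.random() for _ in range(m)]
    return [(x, x >= 0.5) for x in xs]


thresholds = [i / 20 for i in range(21)]  # the finite hypothesis class
rng = random.Random(0)
sample = draw_sample(2000, rng)

# Uniform convergence: the largest gap |L_S(h) - L(h)| over the whole class.
uc_gap = max(abs(empirical_risk(t, sample) - true_risk(t)) for t in thresholds)

# ERM: any minimizer of the empirical risk (ties broken by list order).
erm = min(thresholds, key=lambda t: empirical_risk(t, sample))

print(f"sup_h |L_S(h) - L(h)| = {uc_gap:.4f}")
print(f"ERM threshold = {erm}, true risk = {true_risk(erm):.4f}")
```

When the gap is at most $\epsilon$ for every hypothesis, any ERM has excess risk at most $2\epsilon$, which is the "trivial" implication from uniform convergence to learnability with ERM.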

For Binary Classification
Finite VC ⇒ Uniform Convergence (VC’71)
Uniform Convergence ⇒ Learnable with ERM (trivial)
Learnable with ERM ⇒ Learnable (trivial)
Learnable ⇒ Finite VC (NFL, W’96)
$m_{\mathrm{UC}}(\epsilon, \delta) \approx m_{\mathrm{ERM}}(\epsilon, \delta) \approx m_{\mathrm{PAC}}(\epsilon, \delta) \approx \frac{\mathrm{VC}(\mathcal{H}) \log(1/\delta)}{\epsilon^2}$

First (trivial) Counter Example
Minorizing function:
Let $\mathcal{H}'$ be a class of binary classifiers with infinite VC dimension
Let $\mathcal{H} = \mathcal{H}' \cup \{h_0\}$
Let $\ell(h, (x, y)) = \begin{cases} 1 & \text{if } h \neq h_0 \wedge h(x) \neq y \\ 1/2 & \text{if } h \neq h_0 \wedge h(x) = y \\ 0 & \text{if } h = h_0 \end{cases}$
No uniform convergence ($m_{\mathrm{UC}} = \infty$)
Learnable by ERM ($m_{\mathrm{ERM}} = 0$)
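
A minimal sketch of why ERM is trivial here: $h_0$ has loss 0 on every example while every other hypothesis pays at least 1/2, so ERM returns $h_0$ regardless of the data. The three-element `stand_in_class` below is only a finite stand-in for $\mathcal{H}'$ (an assumption made so the code can run; the slide's $\mathcal{H}'$ has infinite VC dimension), and the sentinel `H0` is a hypothetical way to represent $h_0$.

```python
H0 = "h0"  # sentinel object standing for the extra hypothesis h_0


def loss(h, z) -> float:
    """The minorized loss from the slide."""
    if h is H0:
        return 0.0
    x, y = z
    return 0.5 if h(x) == y else 1.0


def empirical_risk(h, sample) -> float:
    return sum(loss(h, z) for z in sample) / len(sample)


def erm(hypotheses, sample):
    """Return a minimizer of the empirical risk."""
    return min(hypotheses, key=lambda h: empirical_risk(h, sample))


# Finite stand-in for H' (assumption: the real H' is infinite).
stand_in_class = [lambda x: 0, lambda x: 1, lambda x: x % 2]
hypotheses = stand_in_class + [H0]

sample = [(3, 1)]  # a single arbitrary labeled example
print(erm(hypotheses, sample) is H0)
```

Uniform convergence still fails on $\mathcal{H}'$ because the loss restricted to $\mathcal{H}'$ is an affine function of the 0-1 loss, so the estimation problem there is as hard as before.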

From Vapnik’s book ...

Second Counter Example: Multiclass
$\mathcal{X}$ is a set, $\mathcal{Y} = 2^{\mathcal{X}} \cup \{*\}$.
$\mathcal{H} = \{h_T : T \subset \mathcal{X}\}$, where $h_T(x) = \begin{cases} * & x \notin T \\ T & x \in T \end{cases}$
Claim: No uniform convergence: $m_{\mathrm{UC}} \ge |\mathcal{X}| / \epsilon$
Target function is $h_\emptyset$
For any training set $S$, take $T = \mathcal{X} \setminus S$
$L_S(h_T) = 0$ but $L(h_T) = \mathbb{P}[T]$
Claim: $\mathcal{H}$ is learnable: $m_{\mathrm{PAC}} \le 1/\epsilon$
Let $T$ be the target
$A(S) = h_T$ if $(x, T) \in S$
$A(S) = h_\emptyset$ if $S = \{(x_1, *), \ldots, (x_m, *)\}$
In the 1st case, $L(A(S)) = 0$. In the 2nd case, $L(A(S)) = \mathbb{P}[T]$
With high probability, if $\mathbb{P}[T] > \epsilon$ then we’ll be in the 1st case

Second Counter Example: Multiclass
Corollary
$\frac{m_{\mathrm{UC}}}{m_{\mathrm{PAC}}} \approx |\mathcal{X}|$.
If $|\mathcal{X}| \to \infty$ then the problem is learnable but there is no uniform convergence!
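
Both claims of the second counter example can be checked on a small finite instance. The sketch below assumes $\mathcal{X} = \{0, \ldots, 999\}$ with a uniform marginal and target $h_\emptyset$ (choices made here for illustration); `make_h`, `risk`, and `learner` are hypothetical names. It builds the bad ERM $h_T$ with $T = \mathcal{X} \setminus S$ and compares it with the learner $A$ from the slide.

```python
import random

STAR = "*"


def make_h(T):
    """h_T(x) = T if x is in T, and '*' otherwise."""
    T = frozenset(T)
    return lambda x: T if x in T else STAR


def risk(h, target, points) -> float:
    """Average 0-1 loss of h against the target over the given points."""
    return sum(h(x) != target(x) for x in points) / len(points)


def learner(sample):
    """The rule A: output h_T if some label reveals T, else h_emptyset."""
    for _, y in sample:
        if y != STAR:
            return make_h(y)
    return make_h(())


domain = range(1000)
rng = random.Random(0)
target = make_h(())  # target function h_emptyset
xs = [rng.choice(domain) for _ in range(50)]
sample = [(x, target(x)) for x in xs]

# A bad ERM: T is everything the sample did not touch.
bad_erm = make_h(set(domain) - set(xs))
print("bad ERM empirical risk :", risk(bad_erm, target, xs))
print("bad ERM population risk:", risk(bad_erm, target, domain))
print("learner population risk:", risk(learner(sample), target, domain))
```

The bad ERM has zero empirical risk yet errs on every unseen point, while the learner $A$ is exact here; for a nonempty target, a single example from $T$ already reveals $T$ through its label.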

Third Counter Example: Stochastic Convex Optimization
Consider the family of problems:
$\mathcal{H}$ is a convex set with $\max_{h \in \mathcal{H}} \|h\| \le 1$
For all $z$, $\ell(h, z)$ is convex and Lipschitz w.r.t. $h$
Claim:
Problem is learnable by the rule: $\operatorname{argmin}_{h \in \mathcal{H}} \frac{\lambda_m}{2} \|h\|^2 + \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i)$
No uniform convergence
Not learnable by ERM
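
The regularized rule above can be approximated numerically. The sketch below is one possible way to do it, not the method from the slides: it assumes $\mathcal{H}$ is the Euclidean unit ball, uses the convex 1-Lipschitz loss $\ell(h, z) = \|h - z\|$ as a concrete example, picks $\lambda_m = 1/\sqrt{m}$ (an assumption; the slide leaves $\lambda_m$ unspecified), and runs projected subgradient descent while keeping the best iterate.

```python
import math


def norm(v) -> float:
    return math.sqrt(sum(c * c for c in v))


def project_unit_ball(v):
    """Euclidean projection onto H = {h : ||h|| <= 1}."""
    n = norm(v)
    return list(v) if n <= 1.0 else [c / n for c in v]


def loss(h, z) -> float:
    """Example loss l(h, z) = ||h - z||: convex and 1-Lipschitz in h."""
    return norm([a - b for a, b in zip(h, z)])


def loss_subgrad(h, z):
    """A subgradient of ||h - z|| with respect to h."""
    diff = [a - b for a, b in zip(h, z)]
    n = norm(diff)
    return [0.0] * len(diff) if n == 0.0 else [c / n for c in diff]


def objective(h, sample, lam) -> float:
    """(lam / 2) ||h||^2 + (1 / m) sum_i l(h, z_i)."""
    return 0.5 * lam * norm(h) ** 2 + sum(loss(h, z) for z in sample) / len(sample)


def regularized_erm(sample, lam, steps=500):
    """Projected subgradient descent; returns the best iterate found."""
    m, d = len(sample), len(sample[0])
    h = [0.0] * d
    best, best_val = h, objective(h, sample, lam)
    for t in range(1, steps + 1):
        g = [lam * c for c in h]  # gradient of the regularizer
        for z in sample:
            g = [gi + si / m for gi, si in zip(g, loss_subgrad(h, z))]
        eta = 1.0 / (lam * t)  # step size for a lam-strongly convex objective
        h = project_unit_ball([c - eta * gi for c, gi in zip(h, g)])
        val = objective(h, sample, lam)
        if val < best_val:
            best, best_val = h, val
    return best


sample = [(0.5, 0.0), (0.3, 0.2), (0.4, -0.1), (0.6, 0.1)]
lam = 1.0 / math.sqrt(len(sample))
h_hat = regularized_erm(sample, lam)
print(h_hat, objective(h_hat, sample, lam))
```

The regularizer makes the objective strongly convex, so its minimizer is unique and changes little when one training example is replaced, which is the stability property the later slides build on.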

Third Counter Example: Stochastic Convex Optimization
Proof (of “not learnable by arbitrary ERM”)
1-Mean + missing features
$z = (\alpha, x)$, $\alpha \in \{0, 1\}^d$, $x \in \mathbb{R}^d$, $\|x\| \le 1$
$\ell(h, (\alpha, x)) = \sqrt{\sum_i \alpha_i (h_i - x_i)^2}$
Take $\mathbb{P}[\alpha_i = 1] = 1/2$, $\mathbb{P}[x = \mu] = 1$
Let $h^{(i)}$ be s.t. $h^{(i)}_j = \begin{cases} 1 - \mu_j & \text{if } j = i \\ \mu_j & \text{o.w.} \end{cases}$
If $d$ is large enough, there exists $i$ such that $h^{(i)}$ is an ERM
But $L(h^{(i)}) \ge 1/2$
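
The construction in this proof is easy to reproduce numerically. The sketch below assumes $\mu = 0$ and small values $d = 1000$, $m = 5$ (all chosen here for illustration): with $d$ much larger than $2^m$, some coordinate $i$ is almost surely masked out ($\alpha_i = 0$) in every training example, so $h^{(i)}$ has zero empirical risk even though its population risk is $\mathbb{P}[\alpha_i = 1] \cdot |h^{(i)}_i - \mu_i| = 1/2$.

```python
import math
import random


def loss(h, z) -> float:
    """l(h, (alpha, x)) = sqrt(sum_i alpha_i * (h_i - x_i)^2)."""
    alpha, x = z
    return math.sqrt(sum(a * (hi - xi) ** 2 for a, hi, xi in zip(alpha, h, x)))


d, m = 1000, 5  # d is much larger than 2^m
rng = random.Random(0)
mu = [0.0] * d  # assumption: x = mu = 0 with probability 1

# Each example is (alpha, x) with alpha_i ~ Bernoulli(1/2) and x = mu.
sample = [([rng.randint(0, 1) for _ in range(d)], mu) for _ in range(m)]

# Coordinates that were never observed in the training set.
unseen = [i for i in range(d) if all(alpha[i] == 0 for alpha, _ in sample)]
i = unseen[0]

# h^(i): equal to mu except at coordinate i, where it is 1 - mu_i.
h_bad = list(mu)
h_bad[i] = 1.0 - mu[i]

empirical = sum(loss(h_bad, z) for z in sample) / m
population = 0.5 * abs(h_bad[i] - mu[i])  # exact, since alpha_i = 1 w.p. 1/2

print(f"unseen coordinates: {len(unseen)}")
print(f"empirical risk = {empirical}, population risk = {population}")
```

Since $h = \mu$ attains population risk 0, this ERM is far from optimal no matter how the sample size grows, as long as $d$ grows faster than $2^m$.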

Third Counter Example: Stochastic Convex Optimization
Proof (of “not even learnable by a unique ERM”)
Perturb the loss a little bit:
$\ell(h, (\alpha, x)) = \sqrt{\sum_i \alpha_i (h_i - x_i)^2} + \epsilon \sum_i 2^{-i} (h_i - 1)^2$
Now the loss is strictly convex, so the ERM is unique
But the unique ERM does not generalize (as before)

Characterizing Learnability using Stability
Theorem
A necessary and sufficient condition for learnability is the existence of an asymptotic ERM (AERM) that is stable.
Diagram nodes: Uniform Convergence, ERM is stable, ∃ stable AERM, Learnable (arrow labels: RMP’05, MNPR’06; trivial)
