VC Dimension and classification
Prof. John Duchi

Outline
I. Setting: classification problems
II. Finite hypothesis classes
    1. Union bounds
    2. Zero error case
III. Shatter coefficients and Rademacher complexity
IV. VC Dimension

Setting for the lecture
Binary classification problems: data $X \in \mathcal{X}$ and labels $Y \in \{-1, 1\}$.
Hypothesis class $\mathcal{H} \subset \{h : \mathcal{X} \to \mathbb{R}\}$.
Goal: Find $h \in \mathcal{H}$ with $L(h) := \mathbb{E}[1\{h(X) Y \le 0\}]$ small.
Loss is always
$$\ell(h; (x, y)) = 1\{h(x) y \le 0\} = \begin{cases} 1 & \text{if } \operatorname{sign}(h(x)) \ne y \\ 0 & \text{if } \operatorname{sign}(h(x)) = y \end{cases}$$

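Finite hypothesis classes
Theorem. Let $\mathcal{H}$ be a finite class. Then
$$\mathbb{P}\left(\exists\, h \in \mathcal{H} \text{ s.t. } |L(h) - \hat{L}_n(h)| \ge \sqrt{\frac{\log |\mathcal{H}| + t}{2n}}\right) \le 2 e^{-t}.$$
Here $\hat{L}_n(h) := \frac{1}{n} \sum_{i=1}^n 1\{h(X_i) Y_i \le 0\}$ is the empirical risk on $n$ samples.
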
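To get a feel for the scale of this bound, the following minimal sketch simply evaluates the two sides of the theorem. The function names and the example values (1000 hypotheses, 10,000 samples, 95% confidence) are illustrative assumptions, not part of the lecture.

```python
import math

def uniform_deviation(num_hypotheses: int, n: int, t: float) -> tuple[float, float]:
    """Return (epsilon, failure_probability) from the finite-class theorem.

    epsilon is sqrt((log|H| + t) / (2n)); the failure probability is 2 * exp(-t).
    """
    epsilon = math.sqrt((math.log(num_hypotheses) + t) / (2 * n))
    failure_probability = 2 * math.exp(-t)
    return epsilon, failure_probability

def t_for_confidence(delta: float) -> float:
    """Choose t so that 2 * exp(-t) equals delta."""
    return math.log(2 / delta)

# Illustrative values only (assumed, not from the slides).
eps, prob = uniform_deviation(num_hypotheses=1000, n=10_000, t=t_for_confidence(0.05))
print(f"deviation below {eps:.4f} for all h with probability at least {1 - prob:.2f}")
```

Note that the class size enters only through $\log |\mathcal{H}|$, so squaring the number of hypotheses merely doubles that term.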
Finite hypothesis classes: generalization
Corollary. Let $\mathcal{H}$ be a finite class, $\hat{h}_n \in \operatorname{argmin}_h \hat{L}_n(h)$. Then (for a numerical constant $C < \infty$)
$$L(\hat{h}_n) \le \min_{h \in \mathcal{H}} L(h) + C \sqrt{\frac{\log (|\mathcal{H}| / \delta)}{n}} \quad \text{w.p.} \ge 1 - \delta.$$

Finite hypothesis classes: perfect classifiers
Possible to give better guarantees if there are good classifiers! We won’t bother looking at bad ones.
Theorem. Let $\mathcal{H}$ be a finite hypothesis class and assume $\min_h L(h) = 0$. Then for $t \ge 0$
$$\mathbb{P}\left(L(\hat{h}_n) \ge L(h^\star) + \frac{\log |\mathcal{H}| + t}{n}\right) \le e^{-t}.$$

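The gain from the zero-error assumption is the move from a $1/\sqrt{n}$ rate to a $1/n$ rate. The sketch below compares the two deviation terms numerically; the function names `slow_rate` and `fast_rate` and the chosen values of $|\mathcal{H}|$ and $t$ are assumptions made for illustration.

```python
import math

def slow_rate(num_hypotheses: int, n: int, t: float) -> float:
    """Deviation term sqrt((log|H| + t) / (2n)) from the general finite-class theorem."""
    return math.sqrt((math.log(num_hypotheses) + t) / (2 * n))

def fast_rate(num_hypotheses: int, n: int, t: float) -> float:
    """Deviation term (log|H| + t) / n from the zero-error theorem."""
    return (math.log(num_hypotheses) + t) / n

# Illustrative values only (assumed, not from the slides).
for n in (100, 1_000, 10_000):
    print(n, round(slow_rate(1000, n, 3.0), 4), round(fast_rate(1000, n, 3.0), 4))
```

The fast rate is the smaller of the two whenever $(\log |\mathcal{H}| + t)/n < 1/2$, which covers every regime in which either bound is informative.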
Do not pick the bad ones

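The slide leaves the argument to the lecture. One standard way to prove the zero-error theorem is a union bound over the "bad" hypotheses, sketched here; this is an assumption about the intended proof, and $\epsilon$ is shorthand introduced only for this sketch.

```latex
% Call h "bad" if L(h) >= epsilon. Since min_h L(h) = 0 and H is finite, some h*
% has L(h*) = 0, so the empirical minimizer has zero empirical risk almost surely.
\begin{align*}
\mathbb{P}\big(L(\hat{h}_n) \ge \epsilon\big)
  &\le \mathbb{P}\big(\exists\, h \in \mathcal{H} : L(h) \ge \epsilon,\ \hat{L}_n(h) = 0\big) \\
  &\le \sum_{h \,:\, L(h) \ge \epsilon} (1 - L(h))^n
   \le |\mathcal{H}| (1 - \epsilon)^n
   \le |\mathcal{H}| e^{-n \epsilon}.
\end{align*}
% Choosing epsilon = (log|H| + t) / n makes the right-hand side equal to e^{-t}.
```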
Finite function classes: Rademacher complexity
Idea: Use Rademacher complexity to understand generalization even for these?
Let $\mathcal{F}$ be finite with $|f| \le 1$ for $f \in \mathcal{F}$. Then
$$R_n(\mathcal{F}) := \mathbb{E}\left[\max_{f \in \mathcal{F}} \left|\frac{1}{n} \sum_{i=1}^n \varepsilon_i f(X_i)\right|\right]$$
satisfies
$$\mathbb{P}\left(\max_{f \in \mathcal{F}} \left|\frac{1}{n} \sum_{i=1}^n \big(f(X_i) - \mathbb{E}[f(X_i)]\big)\right| \ge 2 R_n(\mathcal{F}) + t\right) \le 2 \exp(-c n t^2).$$

Finite function classes: sub-Gaussianity
- Let $P_n$ be the empirical distribution
- Define $\|f\|_{L^2(P_n)}^2 = \frac{1}{n} \sum_{i=1}^n f(x_i)^2$
- What about the sum $\frac{1}{\sqrt{n}} \sum_{i=1}^n \varepsilon_i f(x_i)$?

Finite function classes: Rademacher complexity
Proposition (Massart’s finite class bound). Let $\mathcal{F}$ be finite with $M := \max_{f \in \mathcal{F}} \|f\|_{L^2(P_n)}$. Then
$$\hat{R}_n(\mathcal{F}) \le \sqrt{\frac{2 M^2 \log(2 \operatorname{card}(\mathcal{F}))}{n}}.$$

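For a very small sample, the empirical Rademacher complexity can be computed exactly by enumerating all $2^n$ sign patterns, which gives a direct check of Massart's bound. The sketch below does this in plain Python; the four value vectors are a made-up class chosen for illustration, and the function names are assumptions.

```python
import itertools
import math

def empirical_rademacher(vectors):
    """Exact E max_f |(1/n) sum_i eps_i f(x_i)|, averaging over all 2^n sign patterns."""
    n = len(vectors[0])
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        total += max(abs(sum(s * v for s, v in zip(signs, vec))) / n for vec in vectors)
    return total / 2 ** n

def massart_bound(vectors):
    """sqrt(2 M^2 log(2 card(F)) / n) with M the largest L2(P_n) norm in the class."""
    n = len(vectors[0])
    m_squared = max(sum(v * v for v in vec) / n for vec in vectors)
    return math.sqrt(2 * m_squared * math.log(2 * len(vectors)) / n)

# Hypothetical class: each row lists (f(x_1), ..., f(x_n)) for one f, with n = 8.
vectors = [
    (1, 1, 1, 1, 1, 1, 1, 1),
    (1, -1, 1, -1, 1, -1, 1, -1),
    (1, 1, -1, -1, 1, 1, -1, -1),
    (0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5),
]
print(empirical_rademacher(vectors), massart_bound(vectors))
```

Enumeration costs $2^n$ evaluations, so this is only a sanity check for tiny $n$; a Monte Carlo average over random signs would be the usual substitute for larger samples.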
Infinite classes with finite labels
What if we had a classifier $h : \mathcal{X} \to \{-1, 1\}$ that could only give a certain number of different labelings to a data set?
Example (Sketchy). Say $\mathcal{X} = \mathbb{R}$ and $h_t(x) = \operatorname{sign}(x - t)$. Complexity of $\mathcal{F} := \{f(x) = 1\{h_t(x) \le 0\}\}$?

Complexity of function classes
Define $\mathcal{F}(x_{1:n}) := \{(f(x_1), \ldots, f(x_n)) \mid f \in \mathcal{F}\}$. Then
$$\hat{R}_n(\mathcal{F}) = \hat{R}_n(\mathcal{F}') \quad \text{whenever} \quad \mathcal{F}(x_{1:n}) = \mathcal{F}'(x_{1:n}).$$
Proposition. Rademacher complexity depends on values of $\mathcal{F}$: if $|f(x)| \le M$ for all $x$ then
$$R_n(\mathcal{F}) \le c \cdot M \sup_{x_1, \ldots, x_n \in \mathcal{X}} \sqrt{\frac{\log \operatorname{card}(\mathcal{F}(x_{1:n}))}{n}}.$$

Proof of complexity

Shatter coefficients
Given function class $\mathcal{F}$, the shattering coefficient (growth function) is
$$s_n(\mathcal{F}) := \sup_{x_1, \ldots, x_n \in \mathcal{X}} \operatorname{card}(\mathcal{F}(x_{1:n})) = \sup_{x_{1:n} \in \mathcal{X}^n} \operatorname{card}\big(\{(f(x_1), \ldots, f(x_n)) \mid f \in \mathcal{F}\}\big).$$
Example (Thresholds in $\mathbb{R}$)

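For thresholds on the line, the set $\mathcal{F}(x_{1:n})$ can be enumerated directly. The sketch below assumes the threshold class $f_t(x) = 1\{x \le t\}$ (equivalently $1\{\operatorname{sign}(x - t) \le 0\}$) and distinct sample points; the function name is hypothetical.

```python
def threshold_labelings(points):
    """Distinct vectors (1{x_i <= t})_i over all thresholds t, for distinct points."""
    xs = sorted(points)
    # One threshold below all points, one between each adjacent pair, one above all.
    # Any other threshold produces the same label vector as one of these.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return {tuple(int(x <= t) for x in points) for t in candidates}

labelings = threshold_labelings([0.3, 1.2, 2.5, 4.0])
print(len(labelings), sorted(labelings))
```

With $n$ distinct points this yields $n + 1$ labelings, far fewer than the $2^n$ conceivable ones, even though the class itself is infinite.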
Shatter coefficients and Rademacher complexity
Proposition. For any function class $\mathcal{F}$ with $|f(x)| \le M$ we have
$$R_n(\mathcal{F}) \le c M \sqrt{\frac{\log s_n(\mathcal{F})}{n}}.$$

VC Dimension
How do we use shatter coefficients to give complexity guarantees?
Definition (VC Dimension). Let $\mathcal{H}$ be a collection of boolean functions. The Vapnik-Chervonenkis (VC) Dimension of $\mathcal{H}$ is
$$\mathrm{VC}(\mathcal{H}) := \sup\{n \in \mathbb{N} : s_n(\mathcal{H}) = 2^n\}.$$

VC Dimension: examples
Example (Thresholds in $\mathbb{R}$)
Example (Intervals in $\mathbb{R}$)

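Both examples can be checked by brute force, since on the real line the achievable labelings of distinct points depend only on their order. The sketch below assumes thresholds of the form $h_t(x) = \operatorname{sign}(x - t)$ and intervals that label $x$ as $+1$ exactly when $a \le x \le b$; these parameterizations and all function names are assumptions, as the slide leaves the examples to the lecture.

```python
from itertools import combinations

def _cuts(points):
    """Representative cut locations: below all points, between neighbors, above all."""
    xs = sorted(points)
    return [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

def threshold_patterns(points):
    """Labelings in {-1, 1}^n achievable by h_t(x) = sign(x - t)."""
    return {tuple(1 if x > t else -1 for x in points) for t in _cuts(points)}

def interval_patterns(points):
    """Labelings achievable by h(x) = +1 if a <= x <= b, else -1."""
    patterns = {tuple(-1 for _ in points)}  # an interval containing no sample point
    for a, b in combinations(_cuts(points), 2):  # cuts are sorted, so a < b
        patterns.add(tuple(1 if a <= x <= b else -1 for x in points))
    return patterns

def is_shattered(patterns, n):
    """A set of n points is shattered when all 2^n labelings are achievable."""
    return len(patterns) == 2 ** n

print(is_shattered(threshold_patterns([0.0]), 1), is_shattered(threshold_patterns([0.0, 1.0]), 2))
print(is_shattered(interval_patterns([0.0, 1.0]), 2), is_shattered(interval_patterns([0.0, 1.0, 2.0]), 3))
```

Under these assumptions the checks are consistent with VC dimension 1 for thresholds and 2 for intervals: one (respectively two) points can be shattered, but one more cannot.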
VC Dimension: examples
Example (Half-spaces in $\mathbb{R}^2$)

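A small experiment can illustrate the half-space example, assuming half-spaces with an offset, $h(x) = \operatorname{sign}(w^\top x + b)$. The sketch uses the perceptron algorithm (a stand-in chosen here, not something the slides prescribe) to search for a separator for each labeling: three points in general position admit all 8 labelings, while the "XOR" labeling of the unit square's corners admits none. A failed search is only evidence in general, and one unshatterable four-point set does not by itself prove that no four points can be shattered.

```python
from itertools import product

def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if the perceptron finds (w, b) with y * (w.x + b) > 0 on every point."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x0, x1), y in zip(points, labels):
            if y * (w0 * x0 + w1 * x1 + b) <= 0:
                # Standard perceptron update on a misclassified (or boundary) point.
                w0, w1, b = w0 + y * x0, w1 + y * x1, b + y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # no separator found within the epoch budget

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shattered = all(
    perceptron_separates(triangle, labels) for labels in product((-1, 1), repeat=3)
)

square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
xor_labels = (1, 1, -1, -1)
print(shattered, perceptron_separates(square, xor_labels))
```

This is consistent with the standard fact that half-spaces in $\mathbb{R}^2$ have VC dimension 3.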
Finite dimensional hypothesis classes
Let $\mathcal{F}$ be functions $f : \mathcal{X} \to \mathbb{R}$ and suppose $\dim(\mathcal{F}) = d$
- Definition of dimension:
Example (Linear functionals). If $\mathcal{F} = \{f(x) = w^\top x, w \in \mathbb{R}^d\}$ then $\dim(\mathcal{F}) = d$
Example (Nonlinear functionals). If $\mathcal{F} = \{f(x) = w^\top \phi(x), w \in \mathbb{R}^d\}$ then $\dim(\mathcal{F}) = d$

VC dimension of finite dimensional classes
Let $\mathcal{F}$ have $\dim(\mathcal{F}) = d$ and let
$$\mathcal{H} := \{h : \mathcal{X} \to \{-1, 1\} \text{ s.t. } h(x) = \operatorname{sign}(f(x)), f \in \mathcal{F}\}.$$
Proposition (Dimension bounds VC dimension). $\mathrm{VC}(\mathcal{H}) \le \dim(\mathcal{F})$

Finite dimensional hypothesis classes: proof

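Sauer-Shelah Lemma
Theorem. Let $\mathcal{H}$ be boolean functions with $\mathrm{VC}(\mathcal{H}) = d$. Then
$$s_n(\mathcal{H}) \le \sum_{i=0}^{d} \binom{n}{i} \le \begin{cases} 2^n & \text{if } n \le d \\ \left(\frac{ne}{d}\right)^d & \text{if } n > d \end{cases}$$
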
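The second inequality in the lemma is easy to check numerically. The sketch below evaluates the binomial sum and the closed-form cap for a range of $n$ and $d$; the function names are assumptions made for illustration.

```python
import math

def sauer_sum(n: int, d: int) -> int:
    """sum_{i=0}^{d} C(n, i), the combinatorial bound on s_n(H) when VC(H) = d."""
    return sum(math.comb(n, i) for i in range(d + 1))

def sauer_cap(n: int, d: int) -> float:
    """2^n when n <= d, and (n e / d)^d when n > d."""
    return float(2 ** n) if n <= d else (n * math.e / d) ** d

for n, d in [(5, 10), (20, 3), (100, 5)]:
    print(n, d, sauer_sum(n, d), sauer_cap(n, d))
```

For $d = 1$ the sum equals $n + 1$, the number of labelings that thresholds on the line produce on $n$ distinct points. The key qualitative point is the switch from exponential growth $2^n$ to polynomial growth of order $n^d$ once $n$ exceeds $d$.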
Rademacher complexity of VC classes
Proposition. Let $\mathcal{H}$ be a collection of boolean functions with $\mathrm{VC}(\mathcal{H}) = d$. Then
$$R_n(\mathcal{H}) \le c \sqrt{\frac{d \log \frac{n}{d}}{n}}.$$
Proof is immediate (but a tighter result is possible):

Generalization bounds for VC classes
Proposition. Let $\mathcal{H}$ have VC-dimension $d$ and $\ell(h; (x, y)) = 1\{h(x) \ne y\}$. Then
$$\mathbb{P}\left(\exists\, h \in \mathcal{H} \text{ s.t. } |\hat{L}_n(h) - L(h)| \ge c \sqrt{\frac{d \log \frac{n}{d}}{n}} + t\right) \le 2 e^{-n t^2}.$$

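To see how this bound scales with the sample size, the sketch below evaluates its right-hand deviation at a target failure probability $\delta$ by solving $2 e^{-n t^2} = \delta$ for $t$. The constant $c$ is not specified on the slide, so `c=1.0` is a placeholder assumption and the printed numbers are only meaningful up to that constant; the function name is hypothetical.

```python
import math

def vc_deviation_bound(n: int, d: int, delta: float, c: float = 1.0) -> float:
    """Evaluate c * sqrt(d * log(n / d) / n) + t with t chosen so 2 * exp(-n t^2) = delta.

    c is a placeholder: the proposition only asserts that some numerical constant works.
    """
    if n <= d:
        raise ValueError("need n > d so that log(n / d) is positive")
    t = math.sqrt(math.log(2 / delta) / n)
    return c * math.sqrt(d * math.log(n / d) / n) + t

# Illustrative values only (assumed, not from the slides).
for n in (1_000, 10_000, 100_000):
    print(n, round(vc_deviation_bound(n, d=10, delta=0.05), 4))
```

Up to the logarithmic factor, the deviation shrinks like $\sqrt{d/n}$, so roughly $n \gg d$ samples are needed before the guarantee becomes informative.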
Things we have not addressed
- Multiclass problems (Natarajan dimension, due to Bala Natarajan; see also Multiclass Learnability and the ERM Principle by Daniely et al.)
- Extending “zero error” results to infinite classes
- Non-boolean classes

Reading and bibliography
1. M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
2. P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
3. S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005.
4. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York, 1996 (Ch. 2.6).
5. Scribe notes for Statistics 300b: http://web.stanford.edu/class/stats300b/