Self-bounding functions and concentration of variance
Andreas Maurer
Advances in stochastic inequalities and their applications, BIRS 2009
Notation and definitions

$\Omega := \prod_{k=1}^{n} \Omega_k$ is some product space with product probability $\mu = \otimes_{k=1}^{n} \mu_k$.

For $x \in \Omega$ write $x_{y,k} := (x_1, \dots, x_{k-1}, y, x_{k+1}, \dots, x_n)$.

$f : \Omega \to \mathbb{R}$ is some generic function, bounded below.

For $1 \le k \le n$ define functions $\inf_k f$, $Df : \Omega \to \mathbb{R}$ by

$$\inf_k f(x) := \inf_{y \in \Omega_k} f(x_{y,k}), \qquad Df(x) := \sum_{k=1}^{n} \bigl(f(x) - \inf_k f(x)\bigr)^2 .$$

$Df$ is a local measure of the sensitivity of $f$ to modifications of individual arguments.
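The definitions of $\inf_k f$ and $Df$ can be made concrete on a finite product space, where each infimum is a plain minimum. The following is a minimal sketch under that assumption; the helper names (`replace_coordinate`, `inf_k`, `D`) and the example function (the number of distinct values among the coordinates) are illustrative choices, not part of the talk.

```python
from itertools import product

def replace_coordinate(x, k, y):
    """Return x_{y,k}: the tuple x with its k-th coordinate (0-based) replaced by y."""
    return x[:k] + (y,) + x[k + 1:]

def inf_k(f, x, k, spaces):
    """inf over y in spaces[k] of f(x_{y,k}); a minimum because spaces[k] is finite."""
    return min(f(replace_coordinate(x, k, y)) for y in spaces[k])

def D(f, x, spaces):
    """Df(x) = sum over k of (f(x) - inf_k f(x))^2."""
    fx = f(x)
    return sum((fx - inf_k(f, x, k, spaces)) ** 2 for k in range(len(x)))

# Hypothetical example: three coordinates, each ranging over {0, 1, 2};
# f counts the distinct values among the coordinates.
spaces = [(0, 1, 2)] * 3
f = lambda x: len(set(x))

# Check Df <= f at every point of the (small) product space.
self_bounding = all(D(f, x, spaces) <= f(x) for x in product(*spaces))
print(D(f, (0, 1, 1), spaces), self_bounding)
```

For this example $Df \le f$ holds at every point, so the function satisfies the hypothesis $Df \le af$ of Theorem 2 with $a = 1$.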
Theorem 1

Boucheron, Lugosi, Massart (2003), Maurer (2006):

$$\Pr\{f - E[f] \ge t\} \le \exp\left(\frac{-t^2}{2\|Df\|_\infty}\right).$$

If also $\forall k,\ f - \inf_k f \le 1$ a.s., then

$$\Pr\{E[f] - f \ge t\} \le \exp\left(\frac{-t^2}{2\|Df\|_\infty + 2t/3}\right).$$

Applies to convex Lipschitz functions, eigenvalues of random symmetric matrices, shortest TSP tours, ...
Theorem 2

Boucheron, Lugosi, Massart (2003), Maurer (2006):

Suppose $Df \le af$ a.s., with $a > 0$. Then

$$\Pr\{f - E[f] \ge t\} \le \exp\left(\frac{-t^2}{2aE[f] + at}\right).$$

If also $\forall k,\ f - \inf_k f \le 1$ a.s. and $a \ge 1$, then

$$\Pr\{E[f] - f \ge t\} \le \exp\left(\frac{-t^2}{2aE[f]}\right).$$

This talk is about applications of this result.
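To read the two tail bounds of Theorem 2 as confidence statements, one can set each right-hand side equal to $\delta$ and solve for $t$. For the upper tail this is a quadratic in $t$; for the lower tail it is $t = \sqrt{2aE[f]\ln(1/\delta)}$. The sketch below does this; the function names and the parameter `mean_f` (standing for $E[f]$, assumed known) are illustrative.

```python
import math

def upper_tail(t, a, mean_f):
    """Theorem 2 bound on Pr{f - E[f] >= t}."""
    return math.exp(-t * t / (2 * a * mean_f + a * t))

def lower_tail(t, a, mean_f):
    """Theorem 2 bound on Pr{E[f] - f >= t} (needs f - inf_k f <= 1 and a >= 1)."""
    return math.exp(-t * t / (2 * a * mean_f))

def upper_deviation(delta, a, mean_f):
    """Positive root of t^2 - a*L*t - 2*a*E[f]*L = 0, where L = ln(1/delta)."""
    L = math.log(1 / delta)
    return (a * L + math.sqrt((a * L) ** 2 + 8 * a * mean_f * L)) / 2

def lower_deviation(delta, a, mean_f):
    """Solves exp(-t^2 / (2*a*E[f])) = delta for t."""
    return math.sqrt(2 * a * mean_f * math.log(1 / delta))

# Hypothetical numbers: a = 4, E[f] = 10, confidence level 95%.
t_up = upper_deviation(0.05, 4, 10)
t_low = lower_deviation(0.05, 4, 10)
print(t_up > t_low)
```

The upper deviation is always the larger of the two, reflecting the extra $at$ term in the denominator of the upper tail bound.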
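Application 1

Amendment to Theorem 1, idea from Boucheron, Lugosi, Massart (2009).

If $f \ge 0$ and $f^2 - \inf_k f^2 \le 1$, then

$$\Pr\{E[f] - f \ge t\} \le \exp\left(\frac{-t^2}{8\|Df\|_\infty}\right).$$

Proof:

$$D(f^2) = \sum_k \bigl(f^2 - \inf_k f^2\bigr)^2 = \sum_k \bigl(f - \inf_k f\bigr)^2 \bigl(f + \inf_k f\bigr)^2 \le (Df)(2f)^2 \le 4\|Df\|_\infty f^2,$$

so by Theorem 2 applied to $f^2$

$$\Pr\{E[f] - f \ge t\} \le \Pr\bigl\{E[f^2] - f^2 \ge E[f]\,t\bigr\} \le \exp\left(\frac{-t^2}{8\|Df\|_\infty}\right).$$

The first inequality uses $E[f^2] \ge E[f]^2$ and $f \ge 0$: on the event $E[f] - f \ge t$ we have $E[f^2] - f^2 \ge (E[f] - f)(E[f] + f) \ge E[f]\,t$.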
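Application 2 (with Massi Pontil for COLT09)

$X, X_1, \dots, X_n$ iid r.v. with values in $[0,1]$. Want to give bounds on $EX$ in terms of the sample $\mathbf{X} = (X_1, \dots, X_n)$ with high confidence $1 - \delta$. Below, $\bar{X} = \frac{1}{n}\sum_i X_i$ is the sample mean and $V$ is the variance of $X$.

Hoeffding:

$$\Pr\left\{ EX - \bar{X} \le \sqrt{\frac{\ln 1/\delta}{2n}} \right\} \ge 1 - \delta.$$

Bernstein/Bennett:

$$\Pr\left\{ EX - \bar{X} \le \sqrt{V}\,\sqrt{\frac{2\ln 1/\delta}{n}} + \frac{\ln 1/\delta}{3n} \right\} \ge 1 - \delta.$$

To use Bernstein without other information we need a bound on the standard deviation $\sqrt{V}$ in terms of the sample.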
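The gain from Bernstein over Hoeffding depends on the size of the variance, which a small numerical comparison makes visible. This is a minimal sketch; the function names and the sample values of $n$, $\delta$ and $V$ are illustrative assumptions.

```python
import math

def hoeffding_radius(n, delta):
    """Width of the Hoeffding bound: sqrt(ln(1/delta) / (2n))."""
    return math.sqrt(math.log(1 / delta) / (2 * n))

def bernstein_radius(n, delta, variance):
    """Width of the Bernstein/Bennett bound: sqrt(2 V ln(1/delta) / n) + ln(1/delta) / (3n)."""
    log_term = math.log(1 / delta)
    return math.sqrt(2 * variance * log_term / n) + log_term / (3 * n)

# Hypothetical setting: n = 1000 observations, confidence 95%.
n, delta = 1000, 0.05
for variance in (0.01, 0.25):  # 0.25 is the largest variance possible on [0, 1]
    print(variance, hoeffding_radius(n, delta), bernstein_radius(n, delta, variance))
```

With small variance the Bernstein width is well below the Hoeffding width; at the maximal variance 1/4 the leading terms coincide and the extra $\ln(1/\delta)/(3n)$ term makes Bernstein slightly wider.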
Estimators for variance and standard deviation

For the variance use the sample variance $\hat{V}$: for $x \in [0,1]^n$

$$\hat{V}(x) = \frac{1}{2n(n-1)} \sum_{i,j} (x_i - x_j)^2 .$$

For the standard deviation we use $\sqrt{\hat{V}}$.

Then we can show this: $f := n\hat{V}$ satisfies

$$f - \inf_k f \le 1 \quad\text{and}\quad Df \le \frac{n}{n-1}\, f,$$

and Theorem 2 gives the lower tail bounds

$$\Pr\bigl\{V - \hat{V} > t\bigr\} \le \exp\left(\frac{-(n-1)t^2}{2V}\right), \quad\text{and}$$

$$\Pr\Bigl\{\sqrt{V} - \sqrt{\hat{V}} > t\Bigr\} \le \exp\left(\frac{-(n-1)t^2}{2}\right).$$
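The two properties claimed for $f = n\hat{V}$ can be checked numerically. The sketch below relies on one observation that is not on the slide: as a function of the $k$-th coordinate alone, $f$ is a quadratic minimized at the mean of the other coordinates, which lies in $[0,1]$, so $\inf_k f$ can be evaluated exactly. Function names and the random test points are illustrative.

```python
import random

def sample_variance(x):
    """V_hat(x) = sum_{i,j} (x_i - x_j)^2 / (2 n (n - 1))."""
    n = len(x)
    return sum((xi - xj) ** 2 for xi in x for xj in x) / (2 * n * (n - 1))

def f(x):
    """f = n * V_hat, the function to which Theorem 2 is applied."""
    return len(x) * sample_variance(x)

def inf_k_f(x, k):
    """inf over y in [0, 1] of f(x_{y,k}); attained at the mean of the other coordinates."""
    others = x[:k] + x[k + 1:]
    y_star = sum(others) / len(others)
    return f(x[:k] + [y_star] + x[k + 1:])

def check_self_bounding(x, tol=1e-9):
    """Check f - inf_k f <= 1 for all k and Df <= n/(n-1) * f, up to rounding."""
    n = len(x)
    gaps = [f(x) - inf_k_f(x, k) for k in range(n)]
    Df = sum(g ** 2 for g in gaps)
    return max(gaps) <= 1 + tol and Df <= n / (n - 1) * f(x) + tol

rng = random.Random(0)  # fixed seed so the run is reproducible
results = [check_self_bounding([rng.random() for _ in range(8)]) for _ in range(200)]
print(all(results))
```

A random check like this cannot replace the proof, but it catches transcription errors in the constants, for instance confusing $n/(n-1)$ with $(n-1)/n$ would not be detected here since the tighter constant fails only on some points.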
Other methods to get such bounds

Audibert, Munos, Szepesvári (2007): Apply Bernstein-like bounds to $X_i$, $-X_i$ and $-(X_i - EX)^2$ respectively, combine to get

$$\Pr\Bigl\{\sqrt{V} - \sqrt{\hat{V}_{\mathrm{emp}}} > t\Bigr\} \le 3\exp\left(\frac{-nt^2}{3.24}\right),$$

where $\hat{V}_{\mathrm{emp}} = (n-1)\hat{V}/n$ (= variance of the empirical distribution).

Alternative: $\hat{V}$ is a U-statistic with kernel $q(x, x') = (x - x')^2/2$. Hoeffding's version of Bennett's inequality for U-statistics leads to

$$\Pr\Bigl\{\sqrt{V} - \sqrt{\hat{V}} > t\Bigr\} \le \exp\left(\frac{-(n-1)t^2}{2.62}\right).$$
Empirical Bernstein bounds

Substitution of the above in Bernstein's inequality gives the empirical version:

$$\Pr\left\{ EX - \bar{X} \le \sqrt{\frac{2\hat{V}\ln 2/\delta}{n}} + \frac{7\ln 2/\delta}{3(n-1)} \right\} \ge 1 - \delta.$$

Applications: multi-armed bandit problem (Audibert, Munos, Szepesvári, 2007), stopping algorithms (Mnih, Szepesvári, Audibert, 2008), sample variance penalization (Pontil, Maurer, 2009).
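The empirical Bernstein bound uses only the sample, so it can be computed directly. The following is a minimal sketch; the function name and the example data are illustrative, and observations are assumed to lie in $[0,1]$ as in Application 2.

```python
import math

def empirical_bernstein_upper_bound(sample, delta):
    """Upper confidence bound on EX holding with probability at least 1 - delta.

    Assumes i.i.d. observations with values in [0, 1] and len(sample) >= 2.
    """
    n = len(sample)
    mean = sum(sample) / n
    # The pairwise form sum_{i,j} (x_i - x_j)^2 / (2 n (n - 1)) equals the
    # usual unbiased sample variance, computed here in its one-pass form.
    variance = sum((xi - mean) ** 2 for xi in sample) / (n - 1)
    log_term = math.log(2 / delta)
    return mean + math.sqrt(2 * variance * log_term / n) + 7 * log_term / (3 * (n - 1))

# Hypothetical data: mostly small values, so the variance term is small.
sample = [0.1, 0.12, 0.09, 0.11, 0.1, 0.13, 0.08, 0.1, 0.11, 0.1]
print(empirical_bernstein_upper_bound(sample, 0.05))
```

For very small samples the $7\ln(2/\delta)/(3(n-1))$ term dominates, so the bound only becomes competitive with Hoeffding once $n$ is reasonably large and the sample variance is small.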
Application 3 (Largest eigenvalue of the Gramian)

$\mathbf{X} = (X_1, \dots, X_n)$ indep. r.v. distributed in the unit ball $B$ of a Hilbert space $H$.

$G(x)_{ij} = \langle x_i, x_j \rangle$, $f(x) = \lambda_{\max}(x)$ = largest eigenvalue of $G(x)$.

By Weyl's monotonicity $\inf_k f(x) = f(x_{0,k})$. Also $\exists u \in \mathbb{R}^n$, $\|u\|_{\mathbb{R}^n} = 1$, such that

$$f(x) - f(x_{0,k}) \le \Bigl\|\sum_i u_i x_i\Bigr\|^2 - \Bigl\|\sum_{i \ne k} u_i x_i\Bigr\|^2 = \Bigl\langle u_k x_k,\ \sum_i u_i x_i + \sum_{i \ne k} u_i x_i \Bigr\rangle \le 2|u_k|\,\Bigl\|\sum_i u_i x_i\Bigr\| = 2|u_k|\sqrt{f(x)}.$$

Conclusion 1: $f - \inf_k f \le 1$.

Conclusion 2: Square and sum over $k$ to get $Df \le 4f$.
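The two conclusions can be sanity-checked in the smallest nontrivial case, $n = 2$, where the largest eigenvalue of the $2 \times 2$ Gramian has a closed form. The sketch below takes $H = \mathbb{R}^3$; that choice, the function names and the rejection sampler are illustrative assumptions, and the check says nothing about larger $n$.

```python
import math
import random

def random_point_in_unit_ball(rng, dim):
    """Rejection-sample a point from the unit ball of R^dim."""
    while True:
        p = [rng.uniform(-1, 1) for _ in range(dim)]
        if sum(c * c for c in p) <= 1:
            return p

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lambda_max_two_points(x1, x2):
    """Largest eigenvalue of the Gramian [[a, b], [b, c]] in closed form."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    return (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)

def check(x1, x2, tol=1e-9):
    """Check f - inf_k f <= 1 and Df <= 4 f, using inf_k f(x) = f(x_{0,k})."""
    f = lambda_max_two_points(x1, x2)
    zero = [0.0] * len(x1)
    gaps = [f - lambda_max_two_points(zero, x2), f - lambda_max_two_points(x1, zero)]
    Df = sum(g * g for g in gaps)
    return max(gaps) <= 1 + tol and Df <= 4 * f + tol

rng = random.Random(1)  # fixed seed so the run is reproducible
ok = all(
    check(random_point_in_unit_ball(rng, 3), random_point_in_unit_ball(rng, 3))
    for _ in range(500)
)
print(ok)
```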
Application 3 (Largest eigenvalue of the Gramian), continued

From Theorem 2 (with $a = 4$) we get

$$\Pr\{\lambda_{\max} - E\lambda_{\max} > t\} \le \exp\left(\frac{-t^2}{8E\lambda_{\max} + 4t}\right),$$

$$\Pr\{E\lambda_{\max} - \lambda_{\max} > t\} \le \exp\left(\frac{-t^2}{8E\lambda_{\max}}\right).$$

For the largest singular value $\sigma_{\max} = \sqrt{\lambda_{\max}}$ of the matrix $\mathbf{X}$ we get

$$\Pr\{\pm(\sigma_{\max} - E\sigma_{\max}) > t\} \le e^{-t^2/8}.$$
Another result related to self-bounded functions

Theorem 3

Suppose $f, g : \Omega \to \mathbb{R}$, $0 \le f \le g$, $Df \le ag$, $Dg \le ag$ and $a \ge 1$. Then

$$\Pr\{f - Ef > t\} \le \exp\left(\frac{-t^2}{4aEg + 3at/2}\right).$$

If also $f - \inf_k f \le 1$, then

$$\Pr\{Ef - f > t\} \le \exp\left(\frac{-t^2}{4aEg + at}\right).$$
Application 4 (any eigenvalue of the Gramian)

$\mathbf{X} = (X_1, \dots, X_n)$ indep. r.v. distributed in the unit ball $B$ of a Hilbert space $H$.

$G(\mathbf{X})_{ij} = \langle X_i, X_j \rangle$; now let $\lambda_d(\mathbf{X})$ be any eigenvalue of $G(\mathbf{X})$.

Set $f := \lambda_d/2$ and $g := \lambda_{\max}/2$. We can show

$$0 \le f \le g, \quad f - \inf_k f \le 1, \quad Df \le 2g, \quad Dg \le 2g.$$

Applying Theorem 3 (with $a = 2$) gives

$$\Pr\{\lambda_d - E\lambda_d > t\} \le \exp\left(\frac{-t^2}{16E\lambda_{\max} + 6t}\right),$$

$$\Pr\{E\lambda_d - \lambda_d > t\} \le \exp\left(\frac{-t^2}{16E\lambda_{\max} + 4t}\right).$$
References

[1] J. Y. Audibert, R. Munos, C. Szepesvári. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 2008.
[2] S. Boucheron, G. Lugosi, P. Massart. Concentration inequalities using the entropy method. Annals of Probability 31:1583-1614, 2003.
[3] M. Ledoux. The Concentration of Measure Phenomenon. AMS Surveys and Monographs 89, 2001.
[4] A. Maurer. Concentration inequalities for functions of independent variables. Random Structures Algorithms 29:121-138, 2006.
[5] V. Mnih, C. Szepesvári, J. Y. Audibert. Empirical Bernstein Stopping. ICML 2008.