Concentration inequalities and tail bounds
John Duchi


  1. Concentration inequalities and tail bounds
     Prof. John Duchi

  2. Outline
     I. Basics and motivation: 1. Law of large numbers; 2. Markov inequality; 3. Chernoff bounds.
     II. Sub-Gaussian random variables: 1. Definitions; 2. Examples; 3. Hoeffding inequalities.
     III. Sub-exponential random variables: 1. Definitions; 2. Examples; 3. Chernoff/Bernstein bounds.

  3. Motivation
     Often in this class, the goal is to argue that a sequence of random variables (or vectors) $X_1, X_2, \ldots$ satisfies
     $$\frac{1}{n} \sum_{i=1}^n X_i \stackrel{p}{\to} \mathbb{E}[X].$$
     Law of large numbers: if $\mathbb{E}[\|X\|] < \infty$, then
     $$\mathbb{P}\Big(\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i \neq \mathbb{E}[X]\Big) = 0.$$
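     A quick numerical illustration of the law of large numbers; a minimal sketch assuming NumPy, with Exp(1) as an arbitrary example distribution, so $\mathbb{E}[X] = 1$:

```python
import numpy as np

# Sample means of i.i.d. Exp(1) draws approach E[X] = 1 as n grows.
rng = np.random.default_rng(0)
for n in [10, 100, 10_000, 1_000_000]:
    x = rng.exponential(scale=1.0, size=n)
    print(f"n = {n:>9}: sample mean = {x.mean():.4f}")
```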

  4. Markov inequalities
     Theorem (Markov's inequality). Let X be a non-negative random variable. Then for t > 0,
     $$\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}[X]}{t}.$$
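     As a sanity check, the inequality can be verified by simulation; a small sketch (NumPy assumed, Exp(1) again an arbitrary non-negative choice with $\mathbb{E}[X] = 1$):

```python
import numpy as np

# Compare empirical tail probabilities against the Markov bound E[X]/t.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
for t in [1.0, 2.0, 5.0]:
    print(f"t = {t}: P(X >= t) = {(x >= t).mean():.4f}"
          f" <= E[X]/t = {x.mean() / t:.4f}")
```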

  5. Chebyshev inequalities
     Theorem (Chebyshev's inequality). Let X be a real-valued random variable with $\mathbb{E}[X^2] < \infty$. Then for t > 0,
     $$\mathbb{P}(|X - \mathbb{E}[X]| \geq t) \leq \frac{\mathbb{E}[(X - \mathbb{E}[X])^2]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}.$$
     Example: i.i.d. sampling.
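     In the i.i.d. sampling example, the sample mean of n draws has variance $\mathrm{Var}(X)/n$, so Chebyshev bounds its deviation by $\mathrm{Var}(X)/(n t^2)$. A simulation sketch (NumPy assumed; Uniform(0,1) is an arbitrary choice, with mean 1/2 and variance 1/12):

```python
import numpy as np

# Chebyshev for the sample mean: P(|mean - mu| >= t) <= Var(X) / (n t^2).
rng = np.random.default_rng(0)
n, t, reps = 100, 0.05, 100_000
means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)  # mu = 0.5
empirical = (np.abs(means - 0.5) >= t).mean()
bound = (1 / 12) / (n * t**2)
print(f"P(|mean - mu| >= {t}) = {empirical:.5f} <= {bound:.5f}")
```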

  6. Chernoff bounds
     Moment generating function: for a random variable X, the MGF is
     $$M_X(\lambda) := \mathbb{E}[e^{\lambda X}].$$
     Example: normally distributed random variables.
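     The normal example has a closed form; the standard completing-the-square computation for $X \sim N(0, \sigma^2)$:

```latex
\begin{align*}
M_X(\lambda)
  &= \int_{-\infty}^{\infty} e^{\lambda x}\,
     \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}\, dx \\
  &= e^{\lambda^2\sigma^2/2}
     \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,
     e^{-(x - \lambda\sigma^2)^2/(2\sigma^2)}\, dx
   = \exp\Big(\frac{\lambda^2\sigma^2}{2}\Big).
\end{align*}
```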

  7. Chernoff bounds
     Theorem (Chernoff bound). For any random variable X and $t \geq 0$,
     $$\mathbb{P}(X - \mathbb{E}[X] \geq t) \leq \inf_{\lambda \geq 0} M_{X - \mathbb{E}[X]}(\lambda)\, e^{-\lambda t} = \inf_{\lambda \geq 0} \mathbb{E}\big[e^{\lambda (X - \mathbb{E}[X])}\big]\, e^{-\lambda t}.$$
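     Instantiating the bound for $X \sim N(\mu, \sigma^2)$ with the Gaussian MGF above shows how the infimum is computed: the exponent $\lambda^2 \sigma^2 / 2 - \lambda t$ is minimized at $\lambda = t/\sigma^2$, giving

```latex
\begin{align*}
\mathbb{P}(X - \mathbb{E}[X] \geq t)
  \leq \inf_{\lambda \geq 0} e^{\lambda^2 \sigma^2 / 2 - \lambda t}
  = \exp\Big(-\frac{t^2}{2\sigma^2}\Big),
  \quad \text{attained at } \lambda = t/\sigma^2.
\end{align*}
```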

  8. Sub-Gaussian random variables
     Definition (Sub-Gaussianity). A mean-zero random variable X is $\sigma^2$-sub-Gaussian if
     $$\mathbb{E}\big[e^{\lambda X}\big] \leq \exp\Big(\frac{\lambda^2 \sigma^2}{2}\Big) \quad \text{for all } \lambda \in \mathbb{R}.$$
     Example: $X \sim N(0, \sigma^2)$.

  9. Properties of sub-Gaussians
     Proposition (sums of sub-Gaussians). Let $X_i$ be independent, mean-zero, $\sigma_i^2$-sub-Gaussian. Then $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma_i^2$-sub-Gaussian.
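     The proof is one line once independence is used to factor the MGF:

```latex
\begin{align*}
\mathbb{E}\Big[e^{\lambda \sum_{i=1}^n X_i}\Big]
  = \prod_{i=1}^n \mathbb{E}\big[e^{\lambda X_i}\big]
  \leq \prod_{i=1}^n e^{\lambda^2 \sigma_i^2 / 2}
  = \exp\Big(\frac{\lambda^2 \sum_{i=1}^n \sigma_i^2}{2}\Big).
\end{align*}
```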

  10. Concentration inequalities
     Theorem. Let X be $\sigma^2$-sub-Gaussian. Then for $t \geq 0$,
     $$\mathbb{P}(X - \mathbb{E}[X] \geq t) \leq \exp\Big(-\frac{t^2}{2\sigma^2}\Big), \qquad \mathbb{P}(X - \mathbb{E}[X] \leq -t) \leq \exp\Big(-\frac{t^2}{2\sigma^2}\Big).$$
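     A quick simulation sketch (NumPy assumed) comparing the empirical tail of a standard normal, which is 1-sub-Gaussian, against the bound $\exp(-t^2/2)$:

```python
import numpy as np

# Empirical Gaussian tail vs. the sub-Gaussian tail bound exp(-t^2/2).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
for t in [1.0, 2.0, 3.0]:
    print(f"t = {t}: P(X >= t) = {(x >= t).mean():.5f}"
          f" <= exp(-t^2/2) = {np.exp(-t**2 / 2):.5f}")
```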

  11. Concentration: convergence of an independent sum
     Corollary. Let $X_i$ be independent $\sigma_i^2$-sub-Gaussian. Then for $t \geq 0$,
     $$\mathbb{P}\Big(\frac{1}{n} \sum_{i=1}^n X_i \geq t\Big) \leq \exp\Big(-\frac{n t^2}{\frac{2}{n} \sum_{i=1}^n \sigma_i^2}\Big).$$

  12. Example: bounded random variables
     Proposition. Let $X \in [a, b]$ with $\mathbb{E}[X] = 0$. Then
     $$\mathbb{E}\big[e^{\lambda X}\big] \leq \exp\Big(\frac{\lambda^2 (b - a)^2}{8}\Big).$$
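     A numeric check of the proposition for a Rademacher variable, uniform on $\{-1, +1\}$ (an arbitrary bounded example: $a = -1$, $b = 1$, so $(b - a)^2/8 = 1/2$), whose MGF is $\cosh(\lambda)$:

```python
import numpy as np

# Hoeffding's lemma for a Rademacher variable:
# E[e^(lam X)] = cosh(lam) <= exp(lam^2 / 2).
for lam in [0.5, 1.0, 2.0, 4.0]:
    print(f"lambda = {lam}: cosh(lam) = {np.cosh(lam):.4f}"
          f" <= exp(lam^2/2) = {np.exp(lam**2 / 2):.4f}")
```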

  13. Maxima of sub-Gaussian random variables (in expectation)
     If $X_1, \ldots, X_n$ are each $\sigma^2$-sub-Gaussian, then
     $$\mathbb{E}\Big[\max_{j \leq n} X_j\Big] \leq \sqrt{2 \sigma^2 \log n}.$$
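     A simulation sketch (NumPy assumed) for i.i.d. $N(0,1)$ maxima, where $\sigma^2 = 1$:

```python
import numpy as np

# E[max of n standard normals] stays below sqrt(2 log n).
rng = np.random.default_rng(0)
for n in [10, 100, 1000]:
    maxima = rng.standard_normal((10_000, n)).max(axis=1)
    print(f"n = {n:>4}: E[max] = {maxima.mean():.3f}"
          f" <= sqrt(2 log n) = {np.sqrt(2 * np.log(n)):.3f}")
```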

  14. Maxima of sub-Gaussian random variables (in probability)
     Under the same assumptions,
     $$\mathbb{P}\Big(\max_{j \leq n} X_j \geq \sqrt{2 \sigma^2 (\log n + t)}\Big) \leq e^{-t}.$$

  15. Hoeffding's inequality
     If the $X_i$ are independent and bounded in $[a_i, b_i]$, then for $t \geq 0$,
     $$\mathbb{P}\Big(\frac{1}{n} \sum_{i=1}^n (X_i - \mathbb{E}[X_i]) \geq t\Big) \leq \exp\Big(-\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2}\Big),$$
     $$\mathbb{P}\Big(\frac{1}{n} \sum_{i=1}^n (X_i - \mathbb{E}[X_i]) \leq -t\Big) \leq \exp\Big(-\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2}\Big).$$
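     A simulation sketch (NumPy assumed) for i.i.d. Bernoulli(1/2) variables, an arbitrary choice lying in $[0, 1]$, so the bound becomes $\exp(-2 n t^2)$:

```python
import numpy as np

# Hoeffding's inequality for averages of Bernoulli(1/2) draws.
rng = np.random.default_rng(0)
n, t, reps = 200, 0.1, 100_000
means = rng.integers(0, 2, size=(reps, n)).mean(axis=1)
empirical = (means - 0.5 >= t).mean()
print(f"P(mean - 1/2 >= {t}) = {empirical:.5f}"
      f" <= exp(-2 n t^2) = {np.exp(-2 * n * t**2):.5f}")
```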

  16. Equivalent definitions of sub-Gaussianity
     Theorem. The following are equivalent (up to constants):
     (i) $\mathbb{E}[\exp(X^2 / \sigma^2)] \leq e$;
     (ii) $\mathbb{E}[|X|^k]^{1/k} \leq \sigma \sqrt{k}$ for all $k \geq 1$;
     (iii) $\mathbb{P}(|X| \geq t) \leq \exp(-t^2 / (2\sigma^2))$.
     If in addition X is mean-zero, then (i)-(iii) are also equivalent to
     (iv) X is $\sigma^2$-sub-Gaussian.

  17. Sub-exponential random variables
     Definition (Sub-exponential). A mean-zero random variable X is $(\tau^2, b)$-sub-exponential if
     $$\mathbb{E}[\exp(\lambda X)] \leq \exp\Big(\frac{\lambda^2 \tau^2}{2}\Big) \quad \text{for } |\lambda| \leq \frac{1}{b}.$$
     Example: exponential random variable, with density $p(x) = \beta e^{-\beta x}$ for $x \geq 0$.
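     The exponential example's MGF is finite only on a bounded interval of $\lambda$, which is exactly why it is sub-exponential rather than sub-Gaussian:

```latex
\begin{align*}
\mathbb{E}\big[e^{\lambda X}\big]
  = \int_0^\infty e^{\lambda x}\, \beta e^{-\beta x}\, dx
  = \frac{\beta}{\beta - \lambda}
  \quad \text{for } \lambda < \beta,
\end{align*}
```

     while $\mathbb{E}[e^{\lambda X}] = +\infty$ for $\lambda \geq \beta$, so no bound of the sub-Gaussian form can hold for all $\lambda \in \mathbb{R}$.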

  18. Sub-exponential random variables
     Example: $\chi^2$ random variable. Let $Z \sim N(0, \sigma^2)$ and $X = Z^2$. Then
     $$\mathbb{E}\big[e^{\lambda X}\big] = \frac{1}{[1 - 2\lambda\sigma^2]_+^{1/2}}.$$
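     A Monte Carlo sanity check (NumPy assumed) of this identity with $\sigma = 1$, valid for $\lambda < 1/2$; the Monte Carlo estimate has finite variance only for $\lambda < 1/4$, so we keep $\lambda$ small:

```python
import numpy as np

# Check E[exp(lam Z^2)] = (1 - 2 lam)^(-1/2) for Z ~ N(0, 1).
rng = np.random.default_rng(0)
z = rng.standard_normal(2_000_000)
for lam in [-1.0, 0.1, 0.2]:
    mc = np.exp(lam * z**2).mean()
    exact = (1 - 2 * lam) ** -0.5
    print(f"lambda = {lam:>4}: Monte Carlo {mc:.4f} vs exact {exact:.4f}")
```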

  19. Concentration of sub-exponentials
     Theorem. Let X be $(\tau^2, b)$-sub-exponential. Then
     $$\mathbb{P}(X \geq \mathbb{E}[X] + t) \leq \max\Big\{ e^{-\frac{t^2}{2\tau^2}},\ e^{-\frac{t}{2b}} \Big\} = \begin{cases} e^{-\frac{t^2}{2\tau^2}} & \text{if } 0 \leq t \leq \frac{\tau^2}{b}, \\ e^{-\frac{t}{2b}} & \text{if } t \geq \frac{\tau^2}{b}. \end{cases}$$

  20. Sums of sub-exponential random variables
     Let $X_i$ be independent $(\tau_i^2, b_i)$-sub-exponential random variables. Then $\sum_{i=1}^n X_i$ is $\big(\sum_{i=1}^n \tau_i^2, b_*\big)$-sub-exponential, where $b_* = \max_i b_i$.
     Corollary: if the $X_i$ satisfy the above, then
     $$\mathbb{P}\Big(\Big|\frac{1}{n} \sum_{i=1}^n (X_i - \mathbb{E}[X_i])\Big| \geq t\Big) \leq 2 \exp\Big(-\min\Big\{ \frac{n t^2}{\frac{2}{n} \sum_{i=1}^n \tau_i^2},\ \frac{n t}{2 b_*} \Big\}\Big).$$
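     A simulation sketch of the corollary for averages of centered $\chi^2_1$ variables $X_i = Z_i^2 - 1$; the parameters $(\tau^2, b) = (4, 4)$ are a standard choice for this example but are an assumption here, not stated on the slide:

```python
import numpy as np

# Two-regime sub-exponential bound for averages of Z_i^2 - 1, Z_i ~ N(0,1),
# using the (assumed) parameters tau_i^2 = 4 and b_star = 4.
rng = np.random.default_rng(0)
n, t, reps = 100, 0.5, 200_000
means = (rng.standard_normal((reps, n)) ** 2 - 1).mean(axis=1)
empirical = (np.abs(means) >= t).mean()
bound = 2 * np.exp(-min(n * t**2 / (2 * 4), n * t / (2 * 4)))
print(f"P(|mean| >= {t}) = {empirical:.5f} <= {bound:.5f}")
```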

  21. Bernstein conditions and sub-exponentials
     Suppose X is mean-zero with
     $$\big|\mathbb{E}[X^k]\big| \leq \frac{1}{2} k!\, \sigma^2 b^{k-2} \quad \text{for } k = 2, 3, \ldots$$
     Then
     $$\mathbb{E}\big[e^{\lambda X}\big] \leq \exp\Big(\frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}\Big) \quad \text{for } |\lambda| < \frac{1}{b}.$$

  22. Johnson-Lindenstrauss and high-dimensional embedding
     Question: let $u_1, \ldots, u_m \in \mathbb{R}^d$ be arbitrary. Can we find a mapping $F : \mathbb{R}^d \to \mathbb{R}^n$, $n \ll d$, such that
     $$(1 - \delta)\, \|u_i - u_j\|_2^2 \leq \|F(u_i) - F(u_j)\|_2^2 \leq (1 + \delta)\, \|u_i - u_j\|_2^2 \,?$$
     Theorem (Johnson-Lindenstrauss embedding). For $n \gtrsim \frac{1}{\delta^2} \log m$, such a mapping exists.
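     One standard construction, consistent with the proof on the next slide, takes $F(u) = X u / \sqrt{n}$ with X an $n \times d$ matrix of i.i.d. $N(0, 1)$ entries. A small demo sketch (NumPy assumed; the dimensions are arbitrary choices):

```python
import numpy as np

# Random Gaussian projection as a Johnson-Lindenstrauss map.
rng = np.random.default_rng(0)
d, n, m = 10_000, 500, 20
U = rng.standard_normal((m, d))               # m arbitrary points in R^d
X = rng.standard_normal((n, d)) / np.sqrt(n)  # projection, F(u) = X u
V = U @ X.T                                   # embedded points in R^n
i, j = np.triu_indices(m, k=1)                # all pairs i < j
ratios = (np.linalg.norm(V[i] - V[j], axis=1) /
          np.linalg.norm(U[i] - U[j], axis=1)) ** 2
print(f"squared-distance ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")
```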

  23. Proof of Johnson-Lindenstrauss, continued
     $$\mathbb{P}\Big(\Big|\frac{\|X u\|_2^2}{n \|u\|_2^2} - 1\Big| \geq t\Big) \leq 2 \exp\Big(-\frac{n t^2}{8}\Big) \quad \text{for } t \in [0, 1].$$

  24. Reading and bibliography
     1. S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 208-240. Springer, 2004.
     2. V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes, volume 188 of Translations of Mathematical Monographs. American Mathematical Society, 2000.
     3. M. Ledoux. The Concentration of Measure Phenomenon. American Mathematical Society, 2001.
     4. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
