Concentration inequalities and tail bounds
Prof. John Duchi
Outline

I.   Basics and motivation
     1. Law of large numbers
     2. Markov's inequality
     3. Chernoff bounds
II.  Sub-Gaussian random variables
     1. Definitions
     2. Examples
     3. Hoeffding inequalities
III. Sub-exponential random variables
     1. Definitions
     2. Examples
     3. Chernoff/Bernstein bounds
Motivation

Often in this class, the goal is to argue that a sequence of random vectors X_1, X_2, ... satisfies
\[
\frac{1}{n} \sum_{i=1}^n X_i \stackrel{p}{\to} E[X].
\]
Law of large numbers: if E[\|X\|] < \infty, then
\[
P\left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i \neq E[X] \right) = 0.
\]
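A quick numerical sketch of this convergence (my own illustration, not from the slides; it uses i.i.d. Uniform[0, 1] samples, so E[X] = 1/2):

```python
import numpy as np

# Illustrative simulation of the law of large numbers: i.i.d.
# Uniform[0, 1] samples, so E[X] = 1/2; the running mean of the first
# k samples should settle near 1/2 as k grows.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[9], running_mean[999], running_mean[-1])
```

The early running means fluctuate noticeably, while the final one sits within about a hundredth of 1/2.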
Markov inequalities

Theorem (Markov's inequality). Let X be a non-negative random variable. Then for t > 0,
\[
P(X \geq t) \leq \frac{E[X]}{t}.
\]
Chebyshev inequalities

Theorem (Chebyshev's inequality). Let X be a real-valued random variable with E[X^2] < \infty. Then
\[
P(|X - E[X]| \geq t) \leq \frac{E[(X - E[X])^2]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}.
\]

Example: i.i.d. sampling.
Chernoff bounds

Moment generating function: for a random variable X, the MGF is
\[
M_X(\lambda) := E[e^{\lambda X}].
\]

Example: normally distributed random variables. If X ~ N(\mu, \sigma^2), then M_X(\lambda) = \exp(\lambda \mu + \lambda^2 \sigma^2 / 2).
Chernoff bounds

Theorem (Chernoff bound). For any random variable X and t \geq 0,
\[
P(X - E[X] \geq t) \leq \inf_{\lambda \geq 0} M_{X - E[X]}(\lambda) e^{-\lambda t}
= \inf_{\lambda \geq 0} E[e^{\lambda (X - E[X])}] e^{-\lambda t}.
\]
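For a mean-zero Gaussian the infimum can be computed explicitly: \inf_{\lambda \geq 0} e^{\lambda^2 \sigma^2 / 2 - \lambda t} is attained at \lambda = t / \sigma^2 and equals e^{-t^2 / (2\sigma^2)}. A small sketch (grid search over \lambda, my own illustration) confirming the closed form:

```python
import numpy as np

# Minimize the Chernoff objective e^{λ²σ²/2 - λt} over a fine grid of
# λ >= 0 and compare with the closed form e^{-t²/(2σ²)} (optimum at
# λ = t/σ²).
sigma2, t = 1.0, 2.0
lam = np.linspace(0.0, 10.0, 100_001)
chernoff = np.min(np.exp(lam**2 * sigma2 / 2 - lam * t))
closed_form = np.exp(-t**2 / (2 * sigma2))
print(chernoff, closed_form)
```

The grid minimum and the closed form agree to high precision, since the optimal λ = 2 lies on the grid.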
Sub-Gaussian random variables

Definition (sub-Gaussianity). A mean-zero random variable X is \sigma^2-sub-Gaussian if
\[
E\left[ e^{\lambda X} \right] \leq \exp\left( \frac{\lambda^2 \sigma^2}{2} \right) \quad \text{for all } \lambda \in \mathbb{R}.
\]

Example: X ~ N(0, \sigma^2).
Properties of sub-Gaussians

Proposition (sums of sub-Gaussians). Let X_i be independent, mean-zero, and \sigma_i^2-sub-Gaussian. Then \sum_{i=1}^n X_i is \sum_{i=1}^n \sigma_i^2-sub-Gaussian.
Concentration inequalities

Theorem. Let X be \sigma^2-sub-Gaussian. Then for t \geq 0,
\[
P(X - E[X] \geq t) \leq \exp\left( -\frac{t^2}{2\sigma^2} \right)
\quad \text{and} \quad
P(X - E[X] \leq -t) \leq \exp\left( -\frac{t^2}{2\sigma^2} \right).
\]
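As a sanity check (my own, using the exact Gaussian tail via the complementary error function), the bound dominates the true tail of N(0, \sigma^2):

```python
import math

# Compare the exact Gaussian tail P(X >= t) = (1/2) erfc(t / (σ√2))
# with the sub-Gaussian tail bound exp(-t²/(2σ²)); the bound should
# dominate for every t >= 0.
sigma = 1.5
for t in [0.5, 1.0, 2.0, 4.0]:
    exact_tail = 0.5 * math.erfc(t / (sigma * math.sqrt(2)))
    bound = math.exp(-t**2 / (2 * sigma**2))
    assert exact_tail <= bound
    print(t, exact_tail, bound)
```

The gap is roughly a factor of 2 or more, consistent with the standard refinement P(X \geq t) \leq \tfrac12 e^{-t^2/(2\sigma^2)} for Gaussians.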
Concentration: convergence of an independent sum

Corollary. Let X_i be independent \sigma_i^2-sub-Gaussian. Then for t \geq 0,
\[
P\left( \frac{1}{n} \sum_{i=1}^n X_i \geq t \right)
\leq \exp\left( -\frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^n \sigma_i^2} \right).
\]
Example: bounded random variables

Proposition. Let X \in [a, b] with E[X] = 0. Then
\[
E[e^{\lambda X}] \leq e^{\lambda^2 (b - a)^2 / 8}.
\]
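For a Rademacher variable (X = \pm 1 with probability 1/2 each, so a = -1, b = 1, and (b-a)^2/8 = 1/2) the proposition reads \cosh \lambda = E[e^{\lambda X}] \leq e^{\lambda^2 / 2}. A quick sketch of the check:

```python
import math

# Rademacher X ∈ {-1, +1} with mean zero: E[e^{λX}] = cosh(λ), and
# the proposition's bound is e^{λ²(b-a)²/8} = e^{λ²/2}.
for lam in [-3.0, -1.0, -0.25, 0.25, 1.0, 3.0]:
    mgf = math.cosh(lam)
    bound = math.exp(lam**2 / 2)
    assert mgf <= bound
    print(lam, mgf, bound)
```

The inequality \cosh\lambda \leq e^{\lambda^2/2} follows term-by-term from (2k)! \geq 2^k k! in the Taylor expansions, which is the essence of the bounded-variable proof.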
Maxima of sub-Gaussian random variables (in expectation)

Let X_1, ..., X_n be \sigma^2-sub-Gaussian. Then
\[
E\left[ \max_{j \leq n} X_j \right] \leq \sqrt{2 \sigma^2 \log n}.
\]
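A simulation sketch (my own setup: n i.i.d. N(0, 1) draws, averaging the maximum over many trials) comparing E[\max_j X_j] with \sqrt{2 \log n}:

```python
import numpy as np

# Estimate E[max_{j<=n} X_j] for n i.i.d. standard normals by Monte
# Carlo and compare with the bound sqrt(2 σ² log n), σ² = 1.
rng = np.random.default_rng(0)
n, trials = 1000, 2000
emp_max_mean = rng.normal(size=(trials, n)).max(axis=1).mean()
bound = np.sqrt(2 * np.log(n))
print(emp_max_mean, bound)
```

For n = 1000 the empirical expectation is around 3.2 while the bound is about 3.7, so the \sqrt{2\sigma^2 \log n} rate is sharp up to lower-order terms.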
Maxima of sub-Gaussian random variables (in probability)

Let X_1, ..., X_n be \sigma^2-sub-Gaussian. Then
\[
P\left( \max_{j \leq n} X_j \geq \sqrt{2 \sigma^2 (\log n + t)} \right) \leq e^{-t}.
\]
Hoeffding's inequality

If the X_i are independent and bounded in [a_i, b_i], then for t \geq 0,
\[
P\left( \frac{1}{n} \sum_{i=1}^n (X_i - E[X_i]) \geq t \right)
\leq \exp\left( -\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2} \right)
\]
and
\[
P\left( \frac{1}{n} \sum_{i=1}^n (X_i - E[X_i]) \leq -t \right)
\leq \exp\left( -\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2} \right).
\]
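For Bernoulli(1/2) variables the two-sided tail can be computed exactly from the binomial distribution, giving a deterministic check of the bound (the two-sided form carries a factor of 2; n = 100 and t = 0.1 are my own illustrative choices, with (b_i - a_i)^2 = 1):

```python
import math

# Exact two-sided tail P(|S/n - 1/2| >= t) for S ~ Binomial(n, 1/2),
# versus the two-sided Hoeffding bound 2 exp(-2 n t²).
n, t = 100, 0.1
tail = sum(math.comb(n, k)
           for k in range(n + 1) if abs(k / n - 0.5) >= t) / 2**n
bound = 2 * math.exp(-2 * n * t**2)
print(tail, bound)
```

The exact tail (a little under 6%) sits comfortably below the Hoeffding bound 2e^{-2} \approx 0.27.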
Equivalent definitions of sub-Gaussianity

Theorem. The following are equivalent, up to constants:
(i)   E[\exp(X^2 / \sigma^2)] \leq e
(ii)  E[|X|^k]^{1/k} \leq \sigma \sqrt{k}
(iii) P(|X| \geq t) \leq \exp(-t^2 / (2\sigma^2))
If in addition X is mean-zero, then (i)-(iii) above are also equivalent to
(iv)  X is \sigma^2-sub-Gaussian.
Sub-exponential random variables

Definition (sub-exponential). A mean-zero random variable X is (\tau^2, b)-sub-exponential if
\[
E[\exp(\lambda X)] \leq \exp\left( \frac{\lambda^2 \tau^2}{2} \right) \quad \text{for } |\lambda| \leq \frac{1}{b}.
\]

Example: exponential random variable, with density p(x) = \beta e^{-\beta x} for x \geq 0.
Sub-exponential random variables

Example: \chi^2-random variable. Let Z ~ N(0, \sigma^2) and X = Z^2. Then
\[
E[e^{\lambda X}] = \frac{1}{[1 - 2\lambda\sigma^2]_+^{1/2}}.
\]
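The formula can be checked by numerical integration against the Gaussian density (a sketch; \lambda = 0.2 and \sigma^2 = 1 are arbitrary choices satisfying 2\lambda\sigma^2 < 1, where the MGF is finite):

```python
import numpy as np

# Numerically integrate E[e^{λ Z²}] for Z ~ N(0, σ²) via a fine
# Riemann sum and compare with the closed form (1 - 2λσ²)^{-1/2}.
sigma2, lam = 1.0, 0.2
z = np.linspace(-12.0, 12.0, 200_001)
dz = z[1] - z[0]
density = np.exp(-z**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
mgf = np.sum(np.exp(lam * z**2) * density) * dz
closed_form = 1.0 / np.sqrt(1 - 2 * lam * sigma2)
print(mgf, closed_form)
```

The integrand decays like e^{-0.3 z^2} here, so truncating at |z| = 12 and using a step of about 10^{-4} leaves negligible error.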
Concentration of sub-exponentials

Theorem. Let X be (\tau^2, b)-sub-exponential. Then
\[
P(X \geq E[X] + t) \leq \max\left\{ e^{-\frac{t^2}{2\tau^2}}, \, e^{-\frac{t}{2b}} \right\}
= \begin{cases} e^{-\frac{t^2}{2\tau^2}} & \text{if } 0 \leq t \leq \frac{\tau^2}{b} \\ e^{-\frac{t}{2b}} & \text{if } t \geq \frac{\tau^2}{b}. \end{cases}
\]
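The two regimes meet exactly at t = \tau^2 / b: below it the Gaussian-type term dominates the max, above it the exponential-type term does. A small sketch verifying which term achieves the max (\tau^2 = 2 and b = 1/2 are arbitrary choices):

```python
import math

# The max of the two tail terms switches from the Gaussian-type bound
# e^{-t²/(2τ²)} to the exponential-type bound e^{-t/(2b)} at t = τ²/b.
tau2, b = 2.0, 0.5
for t in [0.5, 2.0, tau2 / b, 6.0, 10.0]:
    gauss = math.exp(-t**2 / (2 * tau2))
    expo = math.exp(-t / (2 * b))
    winner = max(gauss, expo)
    assert winner == (gauss if t <= tau2 / b else expo)
    print(t, gauss, expo)
```

This is why sub-exponential tails look Gaussian for moderate deviations and only degrade to an exponential rate for large t.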
Sums of sub-exponential random variables

Let X_i be independent (\tau_i^2, b_i)-sub-exponential random variables. Then \sum_{i=1}^n X_i is (\sum_{i=1}^n \tau_i^2, b_*)-sub-exponential, where b_* = \max_i b_i.

Corollary. If the X_i satisfy the above, then
\[
P\left( \left| \frac{1}{n} \sum_{i=1}^n (X_i - E[X_i]) \right| \geq t \right)
\leq 2 \exp\left( -\min\left\{ \frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^n \tau_i^2}, \, \frac{n t}{2 b_*} \right\} \right).
\]
Bernstein conditions and sub-exponentials

Suppose X is mean-zero with
\[
|E[X^k]| \leq \frac{1}{2} k! \, \sigma^2 b^{k-2} \quad \text{for } k = 2, 3, \ldots
\]
Then
\[
E[e^{\lambda X}] \leq \exp\left( \frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)} \right) \quad \text{for } |\lambda| < \frac{1}{b}.
\]
Johnson-Lindenstrauss and high-dimensional embedding

Question: let u_1, ..., u_m \in \mathbb{R}^d be arbitrary. Can we find a mapping F : \mathbb{R}^d \to \mathbb{R}^n, with n \ll d, such that
\[
(1 - \delta) \left\| u_i - u_j \right\|_2^2 \leq \left\| F(u_i) - F(u_j) \right\|_2^2 \leq (1 + \delta) \left\| u_i - u_j \right\|_2^2 ?
\]

Theorem (Johnson-Lindenstrauss embedding). For n \gtrsim \frac{1}{\delta^2} \log m, such a mapping exists.
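A standard construction (the specifics here are my illustration, not spelled out on this slide) takes F(u) = Xu/\sqrt{n} with X having i.i.d. N(0, 1) entries; a simulation sketch of the distortion of pairwise squared distances:

```python
import numpy as np

# Random-projection sketch of Johnson-Lindenstrauss: project m points
# in R^d down to R^n via F(u) = X u / sqrt(n) with X i.i.d. N(0, 1),
# and record the squared-distance ratios over all pairs of points.
rng = np.random.default_rng(0)
d, n, m = 1000, 400, 20
U = rng.normal(size=(m, d))
X = rng.normal(size=(n, d))
V = U @ X.T / np.sqrt(n)
ratios = [np.sum((V[i] - V[j])**2) / np.sum((U[i] - U[j])**2)
          for i in range(m) for j in range(i + 1, m)]
print(min(ratios), max(ratios))
```

Each ratio is distributed as \chi^2_n / n (a sub-exponential quantity with standard deviation about \sqrt{2/n}), so with n = 400 every one of the 190 pairwise ratios lands close to 1.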
Proof of Johnson-Lindenstrauss, continued
\[
P\left( \left| \frac{\|X u\|_2^2}{n \|u\|_2^2} - 1 \right| \geq t \right)
\leq 2 \exp\left( -\frac{n t^2}{8} \right) \quad \text{for } t \in [0, 1].
\]
Reading and bibliography

1. S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. von Luxburg, and G. Rätsch, editors, Advanced Lectures on Machine Learning, pages 208-240. Springer, 2004.
2. V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes, volume 188 of Translations of Mathematical Monographs. American Mathematical Society, 2000.
3. M. Ledoux. The Concentration of Measure Phenomenon. American Mathematical Society, 2001.
4. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.