example: kernel density estimation

Let $X_1, \ldots, X_n$ be i.i.d. real samples drawn according to some density $\phi$. The kernel density estimate is
$$\phi_n(x) = \frac{1}{nh} \sum_{i=1}^n K\Bigl(\frac{x - X_i}{h}\Bigr),$$
where $h > 0$ and $K$ is a nonnegative "kernel" with $\int K = 1$. The $L_1$ error is
$$Z = f(X_1, \ldots, X_n) = \int |\phi(x) - \phi_n(x)| \, dx.$$
It is easy to see that
$$|f(x_1, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \le \frac{1}{nh} \int \Bigl| K\Bigl(\frac{x - x_i}{h}\Bigr) - K\Bigl(\frac{x - x_i'}{h}\Bigr) \Bigr| \, dx \le \frac{2}{n},$$
so we get $\mathrm{Var}(Z) \le \frac{2}{n}$.
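A quick numerical illustration (an added sketch, not part of the original slides): assuming the true density is standard normal and $K$ is a Gaussian kernel, the $L_1$ error can be approximated on a grid and its empirical variance compared with the Efron-Stein bound $2/n$.

```python
import numpy as np

def l1_error_kde(n, h, rng, grid):
    """L1 distance between a Gaussian-kernel KDE and the true N(0,1) density phi."""
    x = rng.standard_normal(n)                       # i.i.d. sample from phi
    diffs = (grid[:, None] - x[None, :]) / h
    # phi_n(t) = (1/(n h)) * sum_i K((t - X_i)/h), with K the standard normal pdf
    phi_n = np.exp(-0.5 * diffs**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    phi = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
    dx = grid[1] - grid[0]
    return np.abs(phi - phi_n).sum() * dx            # Riemann approximation of the L1 integral

rng = np.random.default_rng(0)
grid = np.linspace(-6.0, 6.0, 2001)
n, h = 500, 0.3
z = np.array([l1_error_kde(n, h, rng, grid) for _ in range(200)])
print("empirical Var(Z) =", z.var(), "   Efron-Stein bound 2/n =", 2 / n)
```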
example: uniform deviations

Let $\mathcal{A}$ be a collection of subsets of $\mathcal{X}$, and let $X_1, \ldots, X_n$ be $n$ random points in $\mathcal{X}$ drawn i.i.d. Let
$$P(A) = P\{X_1 \in A\} \quad \text{and} \quad P_n(A) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{X_i \in A}.$$
If $Z = \sup_{A \in \mathcal{A}} |P(A) - P_n(A)|$, then
$$\mathrm{Var}(Z) \le \frac{1}{2n},$$
regardless of the distribution and the richness of $\mathcal{A}$.
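As a concrete illustration (an added sketch; the half-line class is my choice, not the slides'): for $\mathcal{A} = \{(-\infty, a] : a \in \mathbb{R}\}$, $Z$ is the Kolmogorov-Smirnov statistic, and the bound $\mathrm{Var}(Z) \le 1/(2n)$ can be checked by simulation.

```python
import numpy as np

def ks_statistic(x):
    """sup over half-lines (-inf, a] of |P_n(A) - P(A)| when P = Uniform(0, 1).

    The supremum is attained just before or after an order statistic, so it
    can be computed exactly from the sorted sample.
    """
    n = len(x)
    xs = np.sort(x)
    ecdf_after = np.arange(1, n + 1) / n   # P_n((-inf, a]) just after the i-th order statistic
    ecdf_before = np.arange(0, n) / n      # ... and just before it
    return max(np.max(ecdf_after - xs), np.max(xs - ecdf_before))

rng = np.random.default_rng(1)
n = 1000
z = np.array([ks_statistic(rng.uniform(size=n)) for _ in range(500)])
print("empirical Var(Z) =", z.var(), "   bound 1/(2n) =", 1 / (2 * n))
```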
bounding the expectation

Let $P_n'(A) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{X_i' \in A}$ and let $E'$ denote expectation only with respect to $X_1', \ldots, X_n'$. Then
$$E \sup_{A \in \mathcal{A}} |P_n(A) - P(A)| = E \sup_{A \in \mathcal{A}} \bigl| E'[P_n(A) - P_n'(A)] \bigr| \le E \sup_{A \in \mathcal{A}} |P_n(A) - P_n'(A)| = \frac{1}{n} E \sup_{A \in \mathcal{A}} \Bigl| \sum_{i=1}^n \bigl( \mathbb{1}_{X_i \in A} - \mathbb{1}_{X_i' \in A} \bigr) \Bigr|.$$
Second symmetrization: if $\varepsilon_1, \ldots, \varepsilon_n$ are independent Rademacher variables, then
$$\frac{1}{n} E \sup_{A \in \mathcal{A}} \Bigl| \sum_{i=1}^n \bigl( \mathbb{1}_{X_i \in A} - \mathbb{1}_{X_i' \in A} \bigr) \Bigr| = \frac{1}{n} E \sup_{A \in \mathcal{A}} \Bigl| \sum_{i=1}^n \varepsilon_i \bigl( \mathbb{1}_{X_i \in A} - \mathbb{1}_{X_i' \in A} \bigr) \Bigr| \le \frac{2}{n} E \sup_{A \in \mathcal{A}} \Bigl| \sum_{i=1}^n \varepsilon_i \mathbb{1}_{X_i \in A} \Bigr|.$$
conditional rademacher average

If
$$R_n = E_\varepsilon \sup_{A \in \mathcal{A}} \Bigl| \sum_{i=1}^n \varepsilon_i \mathbb{1}_{X_i \in A} \Bigr|,$$
then
$$E \sup_{A \in \mathcal{A}} |P_n(A) - P(A)| \le \frac{2}{n} E R_n.$$
$R_n$ is a data-dependent quantity!
concentration of conditional rademacher average

Define
$$R_n^{(i)} = E_\varepsilon \sup_{A \in \mathcal{A}} \Bigl| \sum_{j \ne i} \varepsilon_j \mathbb{1}_{X_j \in A} \Bigr|.$$
One can show easily that
$$0 \le R_n - R_n^{(i)} \le 1 \quad \text{and} \quad \sum_{i=1}^n \bigl( R_n - R_n^{(i)} \bigr) \le R_n.$$
By the Efron-Stein inequality,
$$\mathrm{Var}(R_n) \le E \sum_{i=1}^n \bigl( R_n - R_n^{(i)} \bigr)^2 \le E R_n.$$
The standard deviation is at most $\sqrt{E R_n}$! Such functions are called self-bounding.
bounding the conditional rademacher average

If $S(X_1^n, \mathcal{A})$ is the number of different sets of the form
$$\{X_1, \ldots, X_n\} \cap A : \quad A \in \mathcal{A},$$
then $R_n$ is the maximum of $S(X_1^n, \mathcal{A})$ sub-Gaussian random variables. By the maximal inequality,
$$\frac{1}{n} R_n \le \sqrt{\frac{\log S(X_1^n, \mathcal{A})}{2n}}.$$
In particular,
$$E \sup_{A \in \mathcal{A}} |P_n(A) - P(A)| \le 2\, E \sqrt{\frac{\log S(X_1^n, \mathcal{A})}{2n}}.$$
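An added illustration, not from the slides: for the half-line class $\mathcal{A} = \{(-\infty, a]\}$ the induced sets are prefixes of the sorted sample, so $S(X_1^n, \mathcal{A}) = n + 1$, and a Monte Carlo estimate of $R_n$ can be compared with the maximal-inequality bound $n \sqrt{\log S / (2n)}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(size=n)
order = np.argsort(x)                           # half-lines pick out prefixes of the sorted sample

# Monte Carlo estimate of R_n = E_eps sup_A |sum_i eps_i 1{X_i in A}|
m = 2000
eps = rng.choice([-1.0, 1.0], size=(m, n))
prefix_sums = np.cumsum(eps[:, order], axis=1)  # the n nontrivial prefixes; the empty prefix gives 0
r_n = np.abs(prefix_sums).max(axis=1).mean()

bound = n * np.sqrt(np.log(n + 1) / (2 * n))    # maximal-inequality bound with S = n + 1
print("Monte Carlo R_n =", r_n, "   bound =", bound)
```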
random VC dimension

Let $V = V(x_1^n, \mathcal{A})$ be the size of the largest subset of $\{x_1, \ldots, x_n\}$ shattered by $\mathcal{A}$. By Sauer's lemma,
$$\log S(X_1^n, \mathcal{A}) \le V(X_1^n, \mathcal{A}) \log(n + 1).$$
$V$ is also self-bounding:
$$\sum_{i=1}^n \bigl( V - V^{(i)} \bigr)^2 \le V,$$
so by Efron-Stein, $\mathrm{Var}(V) \le E V$.
vapnik and chervonenkis Alexey Chervonenkis Vladimir Vapnik
beyond the variance

$X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f: \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Recall the Doob martingale representation:
$$Z - E Z = \sum_{i=1}^n \Delta_i, \quad \text{where} \quad \Delta_i = E_i Z - E_{i-1} Z,$$
with $E_i[\cdot] = E[\,\cdot \mid X_1, \ldots, X_i]$. To get exponential inequalities, we bound the moment generating function $E e^{\lambda(Z - E Z)}$.
azuma's inequality

Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then
$$E e^{\lambda(Z - E Z)} = E e^{\lambda \sum_{i=1}^n \Delta_i} = E \Bigl[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \, E_{n-1} e^{\lambda \Delta_n} \Bigr] \le E \Bigl[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \Bigr] e^{\lambda^2 c_n^2 / 2} \quad \text{(by Hoeffding)}$$
$$\le \cdots \le e^{\lambda^2 \left( \sum_{i=1}^n c_i^2 \right) / 2}.$$
This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.
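Spelling out the step from the moment generating function to a tail bound (a standard Chernoff argument, added here for completeness): by Markov's inequality, for any $\lambda > 0$,
$$P\{ Z - E Z > t \} \le e^{-\lambda t}\, E e^{\lambda(Z - E Z)} \le \exp\Bigl( -\lambda t + \frac{\lambda^2}{2} \sum_{i=1}^n c_i^2 \Bigr) = \exp\Bigl( -\frac{t^2}{2 \sum_{i=1}^n c_i^2} \Bigr) \quad \text{at } \lambda = t \Big/ \sum_{i=1}^n c_i^2.$$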
bounded differences inequality

If $Z = f(X_1, \ldots, X_n)$ and $f$ is such that
$$|f(x_1, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \le c_i,$$
then the martingale differences are bounded.

Bounded differences inequality: if $X_1, \ldots, X_n$ are independent, then
$$P\{ |Z - E Z| > t \} \le 2 e^{-2 t^2 / \sum_{i=1}^n c_i^2}.$$
Also known as McDiarmid's inequality (Colin McDiarmid).
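For instance, applied to the $L_1$ error of the kernel density estimate above (where each $c_i = 2/n$), the inequality gives
$$P\{ |Z - E Z| > t \} \le 2 \exp\Bigl( -\frac{2 t^2}{n (2/n)^2} \Bigr) = 2 e^{-n t^2 / 2}.$$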
hoeffding in a hilbert space

Let $X_1, \ldots, X_n$ be independent zero-mean random variables in a separable Hilbert space such that $\|X_i\| \le c/2$, and denote $v = n c^2 / 4$. Then, for all $t \ge \sqrt{v}$,
$$P\Bigl\{ \Bigl\| \sum_{i=1}^n X_i \Bigr\| > t \Bigr\} \le e^{-(t - \sqrt{v})^2 / (2v)}.$$
Proof: By the triangle inequality, $\bigl\| \sum_{i=1}^n X_i \bigr\|$ has the bounded differences property with constants $c$, so
$$P\Bigl\{ \Bigl\| \sum_{i=1}^n X_i \Bigr\| > t \Bigr\} = P\Bigl\{ \Bigl\| \sum_{i=1}^n X_i \Bigr\| - E \Bigl\| \sum_{i=1}^n X_i \Bigr\| > t - E \Bigl\| \sum_{i=1}^n X_i \Bigr\| \Bigr\} \le \exp\Biggl( -\frac{\bigl( t - E \bigl\| \sum_{i=1}^n X_i \bigr\| \bigr)^2}{2v} \Biggr).$$
Also,
$$E \Bigl\| \sum_{i=1}^n X_i \Bigr\| \le \sqrt{ E \Bigl\| \sum_{i=1}^n X_i \Bigr\|^2 } = \sqrt{ \sum_{i=1}^n E \|X_i\|^2 } \le \sqrt{v}.$$
bounded differences inequality

Easy to use. Distribution free. Often close to optimal (e.g., for the $L_1$ error of the kernel density estimate). Does not exploit "variance information." Often too rigid. Other methods are necessary.
shannon entropy

If $X, Y$ are random variables taking values in a set of size $N$,
$$H(X) = -\sum_x p(x) \log p(x),$$
$$H(X \mid Y) = H(X, Y) - H(Y) = -\sum_{x, y} p(x, y) \log p(x \mid y).$$
Then $H(X) \le \log N$ and $H(X \mid Y) \le H(X)$. (Claude Shannon, 1916–2001)
han's inequality

If $X = (X_1, \ldots, X_n)$ and $X^{(i)} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n)$, then
$$\sum_{i=1}^n \bigl( H(X) - H(X^{(i)}) \bigr) \le H(X).$$
Proof:
$$H(X) = H(X^{(i)}) + H(X_i \mid X^{(i)}) \le H(X^{(i)}) + H(X_i \mid X_1, \ldots, X_{i-1}).$$
Since $\sum_{i=1}^n H(X_i \mid X_1, \ldots, X_{i-1}) = H(X)$, summing the inequality we get
$$(n - 1) H(X) \le \sum_{i=1}^n H(X^{(i)}).$$
(Te Sun Han)
edge isoperimetric inequality on the hypercube

Let $A \subset \{-1, 1\}^n$. Let $E(A)$ be the collection of pairs $x, x' \in A$ such that $d_H(x, x') = 1$. Then
$$|E(A)| \le \frac{|A|}{2} \log_2 |A|.$$
Proof: Let $X = (X_1, \ldots, X_n)$ be uniformly distributed over $A$. Then $p(x) = \mathbb{1}_{x \in A} / |A|$. Clearly, $H(X) = \log |A|$. Also,
$$H(X) - H(X^{(i)}) = H(X_i \mid X^{(i)}) = -\sum_{x \in A} p(x) \log p(x_i \mid x^{(i)}).$$
For $x \in A$,
$$p(x_i \mid x^{(i)}) = \begin{cases} 1/2 & \text{if } \bar{x}^{(i)} \in A \\ 1 & \text{otherwise,} \end{cases}$$
where $\bar{x}^{(i)} = (x_1, \ldots, x_{i-1}, -x_i, x_{i+1}, \ldots, x_n)$ is $x$ with its $i$-th coordinate flipped. Therefore
$$H(X) - H(X^{(i)}) = \frac{\log 2}{|A|} \sum_{x \in A} \mathbb{1}_{\bar{x}^{(i)} \in A},$$
and, summing over $i$,
$$\sum_{i=1}^n \bigl( H(X) - H(X^{(i)}) \bigr) = \frac{\log 2}{|A|} \sum_{i=1}^n \sum_{x \in A} \mathbb{1}_{\bar{x}^{(i)} \in A} = \frac{2 |E(A)| \log 2}{|A|}.$$
Thus, by Han's inequality,
$$\frac{2 |E(A)| \log 2}{|A|} = \sum_{i=1}^n \bigl( H(X) - H(X^{(i)}) \bigr) \le H(X) = \log |A|.$$
This is equivalent to the edge isoperimetric inequality on the hypercube: if
$$\partial_E(A) = \bigl\{ (x, x') : x \in A, \; x' \in A^c, \; d_H(x, x') = 1 \bigr\}$$
is the edge boundary of $A$, then
$$|\partial_E(A)| \ge |A| \log_2 \frac{2^n}{|A|}.$$
Equality is achieved for sub-cubes.
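Both the internal-edge bound and the boundary form can be checked by brute force on a small hypercube. This is an added illustration, not part of the slides; the dimension and number of random subsets are arbitrary choices.

```python
import itertools, math, random

def check_edge_isoperimetry(n=4, trials=200, seed=0):
    """Brute-force check of |E(A)| <= (|A|/2) log2|A| and
    |boundary(A)| >= |A| log2(2^n / |A|) for random subsets A of {-1,1}^n."""
    rng = random.Random(seed)
    cube = list(itertools.product([-1, 1], repeat=n))
    for _ in range(trials):
        a = {x for x in cube if rng.random() < 0.5}
        if not a:
            continue
        internal = boundary = 0
        for x in a:
            for i in range(n):
                y = x[:i] + (-x[i],) + x[i + 1:]   # flip coordinate i
                if y in a:
                    internal += 1                   # counted once per ordered pair
                else:
                    boundary += 1
        edges = internal // 2                       # unordered internal edges |E(A)|
        assert edges <= len(a) / 2 * math.log2(len(a)) + 1e-9
        assert boundary >= len(a) * math.log2(2**n / len(a)) - 1e-9
    print("all", trials, "random subsets satisfy both inequalities")

check_edge_isoperimetry()
```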
VC entropy is self-bounding

Let $\mathcal{A}$ be a class of subsets of $\mathcal{X}$ and $x = (x_1, \ldots, x_n) \in \mathcal{X}^n$. Recall that $S(x, \mathcal{A})$ is the number of different sets of the form
$$\{x_1, \ldots, x_n\} \cap A : \quad A \in \mathcal{A}.$$
Let $f_n(x) = \log_2 S(x, \mathcal{A})$ be the VC entropy. Then
$$0 \le f_n(x) - f_{n-1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \le 1$$
and
$$\sum_{i=1}^n \bigl( f_n(x) - f_{n-1}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \bigr) \le f_n(x).$$
Proof: Put the uniform distribution on the class of sets $\{x_1, \ldots, x_n\} \cap A$ and use Han's inequality.

Corollary: if $X_1, \ldots, X_n$ are independent, then
$$\mathrm{Var}\bigl( \log_2 S(X, \mathcal{A}) \bigr) \le E \log_2 S(X, \mathcal{A}).$$
subadditivity of entropy

The entropy of a random variable $Z \ge 0$ is
$$\mathrm{Ent}(Z) = E \Phi(Z) - \Phi(E Z), \quad \text{where } \Phi(x) = x \log x.$$
By Jensen's inequality, $\mathrm{Ent}(Z) \ge 0$.

Han's inequality implies the following sub-additivity property. Let $X_1, \ldots, X_n$ be independent and let $Z = f(X_1, \ldots, X_n)$, where $f \ge 0$. Denote
$$\mathrm{Ent}^{(i)}(Z) = E^{(i)} \Phi(Z) - \Phi(E^{(i)} Z),$$
where $E^{(i)}$ denotes expectation with respect to $X_i$ only (i.e., conditionally on $X^{(i)}$). Then
$$\mathrm{Ent}(Z) \le E \sum_{i=1}^n \mathrm{Ent}^{(i)}(Z).$$
a logarithmic sobolev inequality on the hypercube

Let $X = (X_1, \ldots, X_n)$ be uniformly distributed over $\{-1, 1\}^n$. If $f: \{-1, 1\}^n \to \mathbb{R}$ and $Z = f(X)$, then
$$\mathrm{Ent}(Z^2) \le \frac{1}{2} E \sum_{i=1}^n (Z - Z_i')^2.$$
The proof uses subadditivity of the entropy and calculus for the case $n = 1$. It implies the Efron-Stein inequality.
herbst's argument: exponential concentration

If $f: \{-1, 1\}^n \to \mathbb{R}$, the log-Sobolev inequality may be used with $g(x) = e^{\lambda f(x)/2}$, where $\lambda \in \mathbb{R}$. If $F(\lambda) = E e^{\lambda Z}$ is the moment generating function of $Z = f(X)$,
$$\mathrm{Ent}\bigl( g(X)^2 \bigr) = \lambda E\bigl[ Z e^{\lambda Z} \bigr] - E\bigl[ e^{\lambda Z} \bigr] \log E\bigl[ e^{\lambda Z} \bigr] = \lambda F'(\lambda) - F(\lambda) \log F(\lambda).$$
Differential inequalities are obtained for $F(\lambda)$.
herbst's argument

As an example, suppose $f$ is such that $\sum_{i=1}^n (Z - Z_i')_+^2 \le v$. Then by the log-Sobolev inequality,
$$\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \le \frac{v \lambda^2}{4} F(\lambda).$$
If $G(\lambda) = \log F(\lambda)$, this becomes
$$\Bigl( \frac{G(\lambda)}{\lambda} \Bigr)' \le \frac{v}{4}.$$
This can be integrated: $G(\lambda) \le \lambda E Z + \lambda^2 v / 4$, so
$$F(\lambda) \le e^{\lambda E Z + \lambda^2 v / 4}.$$
This implies
$$P\{ Z > E Z + t \} \le e^{-t^2 / v}.$$
Stronger than the bounded differences inequality!
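The last implication is again a Chernoff step, spelled out here for completeness:
$$P\{ Z > E Z + t \} \le e^{-\lambda t}\, E e^{\lambda(Z - E Z)} \le e^{-\lambda t + \lambda^2 v / 4} = e^{-t^2 / v} \quad \text{at } \lambda = 2t / v.$$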
gaussian log-sobolev inequality

Let $X = (X_1, \ldots, X_n)$ be a vector of i.i.d. standard normal random variables. If $f: \mathbb{R}^n \to \mathbb{R}$ and $Z = f(X)$,
$$\mathrm{Ent}(Z^2) \le 2 E\bigl[ \|\nabla f(X)\|^2 \bigr]$$
(Gross, 1975).

Proof sketch: By the subadditivity of entropy, it suffices to prove it for $n = 1$. Approximate $Z = f(X)$ by
$$f\Bigl( \frac{1}{\sqrt{m}} \sum_{i=1}^m \varepsilon_i \Bigr),$$
where the $\varepsilon_i$ are i.i.d. Rademacher random variables. Use the log-Sobolev inequality on the hypercube and the central limit theorem.
gaussian concentration inequality

Herbst's argument may now be repeated. Suppose $f$ is Lipschitz: for all $x, y \in \mathbb{R}^n$,
$$|f(x) - f(y)| \le L \|x - y\|.$$
Then, for all $t > 0$,
$$P\{ f(X) - E f(X) \ge t \} \le e^{-t^2 / (2 L^2)}$$
(Tsirelson, Ibragimov, and Sudakov, 1976).
an application: supremum of a gaussian process

Let $(X_t)_{t \in \mathcal{T}}$ be an almost surely continuous centered Gaussian process. Let $Z = \sup_{t \in \mathcal{T}} X_t$. If
$$\sigma^2 = \sup_{t \in \mathcal{T}} E\bigl[ X_t^2 \bigr],$$
then
$$P\{ |Z - E Z| \ge u \} \le 2 e^{-u^2 / (2 \sigma^2)}.$$
Proof: We may assume $\mathcal{T} = \{1, \ldots, n\}$. Let $\Gamma$ be the covariance matrix of $X = (X_1, \ldots, X_n)$ and let $A = \Gamma^{1/2}$. If $Y$ is a standard normal vector, then
$$f(Y) = \max_{i=1,\ldots,n} (A Y)_i \stackrel{\mathrm{distr.}}{=} \max_{i=1,\ldots,n} X_i.$$
By Cauchy-Schwarz,
$$|(A u)_i - (A v)_i| = \Bigl| \sum_j A_{i,j} (u_j - v_j) \Bigr| \le \Bigl( \sum_j A_{i,j}^2 \Bigr)^{1/2} \|u - v\| \le \sigma \|u - v\|,$$
since $\sum_j A_{i,j}^2 = \Gamma_{i,i} = E[X_i^2] \le \sigma^2$. Hence $f$ is Lipschitz with constant $\sigma$ and the Gaussian concentration inequality applies.
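A numerical check (an added sketch; the AR(1)-type covariance $\Gamma_{i,j} = \rho^{|i-j|}$ is just an assumption for illustration, and a Cholesky factor is used as the square root of $\Gamma$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
rho = 0.9
idx = np.arange(n)
gamma = rho ** np.abs(idx[:, None] - idx[None, :])   # covariance Gamma[i,j] = rho^|i-j|
a = np.linalg.cholesky(gamma)                        # a @ a.T = Gamma, so a @ Y has covariance Gamma

sigma2 = np.max(np.diag(gamma))                      # sup_t E[X_t^2] = 1 here
samples = a @ rng.standard_normal((n, 20000))
z = samples.max(axis=0)                              # Z = sup_t X_t
print("E Z ~", z.mean())
print("empirical P{|Z - EZ| >= 2} =", np.mean(np.abs(z - z.mean()) >= 2.0),
      "   bound 2*exp(-4/(2*sigma^2)) =", 2 * np.exp(-4 / (2 * sigma2)))
```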
beyond bernoulli and gaussian: the entropy method

For general distributions, logarithmic Sobolev inequalities are not available. Solution: modified logarithmic Sobolev inequalities.

Suppose $X_1, \ldots, X_n$ are independent. Let $Z = f(X_1, \ldots, X_n)$ and
$$Z_i = f_i(X^{(i)}) = f_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n).$$
Let $\phi(x) = e^x - x - 1$. Then for all $\lambda \in \mathbb{R}$,
$$\lambda E\bigl[ Z e^{\lambda Z} \bigr] - E\bigl[ e^{\lambda Z} \bigr] \log E\bigl[ e^{\lambda Z} \bigr] \le \sum_{i=1}^n E\bigl[ e^{\lambda Z} \phi\bigl( -\lambda (Z - Z_i) \bigr) \bigr].$$
(Michel Ledoux)
the entropy method

Define $Z_i = \inf_{x_i'} f(X_1, \ldots, x_i', \ldots, X_n)$ and suppose
$$\sum_{i=1}^n (Z - Z_i)^2 \le v.$$
Then for all $t > 0$,
$$P\{ Z - E Z > t \} \le e^{-t^2 / (2v)}.$$
This implies the bounded differences inequality and much more.
example: the largest eigenvalue of a symmetric matrix

Let $A = (X_{i,j})_{n \times n}$ be symmetric, with the $X_{i,j}$, $i \le j$, independent and $|X_{i,j}| \le 1$. Let
$$Z = \lambda_1 = \sup_{u : \|u\| = 1} u^T A u,$$
and suppose $v$ is such that $Z = v^T A v$. Let $A'_{i,j}$ be the matrix obtained by replacing $X_{i,j}$ by $x'_{i,j}$, and let $Z'_{i,j}$ be the corresponding largest eigenvalue. Then
$$(Z - Z'_{i,j})_+ \le \bigl( v^T A v - v^T A'_{i,j} v \bigr) \mathbb{1}_{Z > Z'_{i,j}} = \bigl( v^T (A - A'_{i,j}) v \bigr) \mathbb{1}_{Z > Z'_{i,j}} \le 2 \bigl( v_i v_j (X_{i,j} - x'_{i,j}) \bigr)_+ \le 4 |v_i v_j|.$$
Therefore,
$$\sum_{1 \le i \le j \le n} (Z - Z'_{i,j})_+^2 \le \sum_{1 \le i \le j \le n} 16 |v_i v_j|^2 \le 16 \Bigl( \sum_{i=1}^n v_i^2 \Bigr)^2 = 16.$$
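A quick simulation (an added sketch, using uniform entries in $[-1, 1]$): the largest eigenvalue itself grows with $n$, but its fluctuations remain of constant order, consistent with the bound $\sum (Z - Z'_{i,j})_+^2 \le 16$ above (which, by the Efron-Stein bound $\mathrm{Var}(Z) \le E V_+$ stated later, gives $\mathrm{Var}(Z) \le 16$).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100

def lambda_max(rng, n):
    """Largest eigenvalue of a random symmetric matrix with independent entries in [-1, 1]."""
    m = rng.uniform(-1, 1, size=(n, n))
    m = np.triu(m) + np.triu(m, 1).T          # symmetrize, keeping all entries in [-1, 1]
    return np.linalg.eigvalsh(m).max()

z = np.array([lambda_max(rng, n) for _ in range(300)])
print("E Z ~", z.mean(), "   empirical Var(Z) =", z.var(), "   (bound from the slide: 16)")
```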
example: convex lipschitz functions

Let $f: [0, 1]^n \to \mathbb{R}$ be a convex Lipschitz function with Lipschitz constant $L$. Let $Z_i = \inf_{x_i'} f(X_1, \ldots, x_i', \ldots, X_n)$ and let $X_i'$ be the value of $x_i'$ for which the minimum is achieved. Then, writing $\overline{X}^{(i)} = (X_1, \ldots, X_{i-1}, X_i', X_{i+1}, \ldots, X_n)$,
$$\sum_{i=1}^n (Z - Z_i)^2 = \sum_{i=1}^n \bigl( f(X) - f(\overline{X}^{(i)}) \bigr)^2 \le \sum_{i=1}^n \Bigl( \frac{\partial f}{\partial x_i}(X) \Bigr)^2 (X_i - X_i')^2 \quad \text{(by convexity)}$$
$$\le \sum_{i=1}^n \Bigl( \frac{\partial f}{\partial x_i}(X) \Bigr)^2 = \|\nabla f(X)\|^2 \le L^2.$$
convex lipschitz functions

If $f: [0, 1]^n \to \mathbb{R}$ is a convex Lipschitz function and $X_1, \ldots, X_n$ are independent random variables taking values in $[0, 1]$, then $Z = f(X_1, \ldots, X_n)$ satisfies
$$P\{ Z > E Z + t \} \le e^{-t^2 / (2 L^2)}.$$
A similar lower tail bound also holds.
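A quick numerical check (an added sketch; the choice $f(x) = \|x\|$, which is convex and 1-Lipschitz, and the uniform coordinates are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, t = 50, 20000, 0.5
z = np.linalg.norm(rng.uniform(size=(reps, n)), axis=1)   # f(X) = ||X||, convex, L = 1
print("empirical P{Z > EZ + t} =", np.mean(z > z.mean() + t),
      "   bound exp(-t^2/2) =", np.exp(-t**2 / 2))
```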
self-bounding functions

Suppose $Z$ satisfies
$$0 \le Z - Z_i \le 1 \quad \text{and} \quad \sum_{i=1}^n (Z - Z_i) \le Z.$$
Recall that $\mathrm{Var}(Z) \le E Z$. We have much more:
$$P\{ Z > E Z + t \} \le e^{-t^2 / (2 E Z + 2t/3)}$$
and
$$P\{ Z < E Z - t \} \le e^{-t^2 / (2 E Z)}.$$
Rademacher averages, the random VC dimension, the random VC entropy, and the length of the longest increasing subsequence in a random permutation are all examples of self-bounding functions. So are configuration functions.
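For the last example (an added sketch, not from the slides), the length of the longest increasing subsequence of a random permutation can be simulated and the self-bounding variance bound $\mathrm{Var}(Z) \le E Z$ checked directly:

```python
import bisect
import numpy as np

def lis_length(perm):
    """Length of the longest increasing subsequence (patience sorting, O(n log n))."""
    piles = []
    for v in perm:
        i = bisect.bisect_left(piles, v)
        if i == len(piles):
            piles.append(v)
        else:
            piles[i] = v
    return len(piles)

rng = np.random.default_rng(5)
n = 1000
z = np.array([lis_length(rng.permutation(n)) for _ in range(300)])
print("E Z ~", z.mean(), "   empirical Var(Z) =", z.var(), "   self-bounding bound E Z =", z.mean())
```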
exponential efron-stein inequality

Define
$$V_+ = E'\Bigl[ \sum_{i=1}^n (Z - Z_i')_+^2 \Bigr] \quad \text{and} \quad V_- = E'\Bigl[ \sum_{i=1}^n (Z - Z_i')_-^2 \Bigr].$$
By Efron-Stein,
$$\mathrm{Var}(Z) \le E V_+ \quad \text{and} \quad \mathrm{Var}(Z) \le E V_-.$$
The following exponential versions hold for all $\lambda, \theta > 0$ with $\lambda \theta < 1$:
$$\log E e^{\lambda(Z - E Z)} \le \frac{\lambda \theta}{1 - \lambda \theta} \log E e^{\lambda V_+ / \theta}.$$
If also $Z_i' - Z \le 1$ for every $i$, then for all $\lambda \in (0, 1/2)$,
$$\log E e^{\lambda(Z - E Z)} \le \frac{2 \lambda}{1 - 2 \lambda} \log E e^{\lambda V_-}.$$
weakly self-bounding functions

$f: \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i: \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$,
$$\sum_{i=1}^n \bigl( f(x) - f_i(x^{(i)}) \bigr)^2 \le a f(x) + b.$$
Then
$$P\{ Z \ge E Z + t \} \le \exp\Bigl( -\frac{t^2}{2(a E Z + b + a t / 2)} \Bigr).$$
If, in addition, $f(x) - f_i(x^{(i)}) \le 1$, then for $0 < t \le E Z$,
$$P\{ Z \le E Z - t \} \le \exp\Bigl( -\frac{t^2}{2(a E Z + b + c_- t)} \Bigr),$$
where $c = (3a - 1)/6$ and $c_-$ denotes its negative part.
the isoperimetric view

Let $X = (X_1, \ldots, X_n)$ have independent components, taking values in $\mathcal{X}^n$. Let $A \subset \mathcal{X}^n$. The Hamming distance of $X$ to $A$ is
$$d(X, A) = \min_{y \in A} d(X, y) = \min_{y \in A} \sum_{i=1}^n \mathbb{1}_{X_i \ne y_i}.$$
(Michel Talagrand)

Then
$$P\Biggl\{ d(X, A) \ge t + \sqrt{\frac{n}{2} \log \frac{1}{P\{A\}}} \Biggr\} \le e^{-2 t^2 / n}.$$
Concentration of measure!
the isoperimetric view

Proof: By the bounded differences inequality,
$$P\{ E d(X, A) - d(X, A) \ge t \} \le e^{-2 t^2 / n}.$$
Taking $t = E d(X, A)$, the left-hand side is at least $P\{ d(X, A) = 0 \} = P\{A\}$, so $P\{A\} \le e^{-2 (E d(X, A))^2 / n}$, i.e.,
$$E d(X, A) \le \sqrt{\frac{n}{2} \log \frac{1}{P\{A\}}}.$$
By the bounded differences inequality again,
$$P\Biggl\{ d(X, A) \ge t + \sqrt{\frac{n}{2} \log \frac{1}{P\{A\}}} \Biggr\} \le e^{-2 t^2 / n}.$$
talagrand's convex distance

The weighted Hamming distance is
$$d_\alpha(x, A) = \inf_{y \in A} d_\alpha(x, y) = \inf_{y \in A} \sum_{i : x_i \ne y_i} |\alpha_i|,$$
where $\alpha = (\alpha_1, \ldots, \alpha_n)$. The same argument as before gives
$$P\Biggl\{ d_\alpha(X, A) \ge t + \sqrt{\frac{\|\alpha\|^2}{2} \log \frac{1}{P\{A\}}} \Biggr\} \le e^{-2 t^2 / \|\alpha\|^2}.$$
This implies
$$\sup_{\alpha : \|\alpha\| = 1} \min\bigl( P\{A\}, \, P\{ d_\alpha(X, A) \ge t \} \bigr) \le e^{-t^2 / 2}.$$
convex distance inequality

The convex distance is
$$d_T(x, A) = \sup_{\alpha \in [0, \infty)^n : \|\alpha\| = 1} d_\alpha(x, A).$$
Talagrand's convex distance inequality:
$$P\{A\} \, P\{ d_T(X, A) \ge t \} \le e^{-t^2 / 4}.$$
It follows from the fact that $d_T(X, A)^2$ is weakly $(4, 0)$-self-bounding (by a saddle point representation of $d_T$). Talagrand's original proof was different.
convex lipschitz functions

For $A \subset [0, 1]^n$ and $x \in [0, 1]^n$, define $D(x, A) = \inf_{y \in A} \|x - y\|$. If $A$ is convex, then $D(x, A) \le d_T(x, A)$.

Proof: Let $\mathcal{M}(A)$ denote the set of probability measures on $A$ and let $Y$ have distribution $\nu$. Then
$$D(x, A) = \inf_{\nu \in \mathcal{M}(A)} \|x - E_\nu Y\| \quad \text{(since $A$ is convex)}$$
$$\le \inf_{\nu \in \mathcal{M}(A)} \sqrt{ \sum_{j=1}^n \bigl( E_\nu \mathbb{1}_{x_j \ne Y_j} \bigr)^2 } \quad \text{(since $x_j, Y_j \in [0, 1]$)}$$
$$= \inf_{\nu \in \mathcal{M}(A)} \sup_{\alpha : \|\alpha\| \le 1} \sum_{j=1}^n \alpha_j E_\nu \mathbb{1}_{x_j \ne Y_j} \quad \text{(by Cauchy-Schwarz)}$$
$$= d_T(x, A) \quad \text{(by a minimax theorem)}.$$
John von Neumann (1903–1957)
Sergei Lvovich Sobolev (1908–1989)