Concentration inequalities and the entropy method

Gábor Lugosi
ICREA and Pompeu Fabra University, Barcelona
what is concentration?

We are interested in bounding random fluctuations of functions of many independent random variables. $X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. How large are "typical" deviations of $Z$ from $\mathbf{E} Z$? In particular, we seek upper bounds for
$$\mathbf{P}\{Z > \mathbf{E} Z + t\} \quad \text{and} \quad \mathbf{P}\{Z < \mathbf{E} Z - t\}$$
for $t > 0$.
various approaches

- martingales (Yurinskii, 1974; Milman and Schechtman, 1986; Shamir and Spencer, 1987; McDiarmid, 1989, 1998);
- information-theoretic and transportation methods (Ahlswede, Gács, and Körner, 1976; Marton, 1986, 1996, 1997; Dembo, 1997);
- Talagrand's induction method, 1996;
- logarithmic Sobolev inequalities (Ledoux, 1996; Massart, 1998; Boucheron, Lugosi, Massart, 1999, 2001).
chernoff bounds

By Markov's inequality, if $\lambda > 0$,
$$\mathbf{P}\{Z - \mathbf{E} Z > t\} = \mathbf{P}\left\{ e^{\lambda(Z - \mathbf{E} Z)} > e^{\lambda t} \right\} \le \frac{\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)}}{e^{\lambda t}}.$$
Next derive bounds for the moment generating function $\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)}$ and optimize $\lambda$.

If $Z = \sum_{i=1}^n X_i$ is a sum of independent random variables,
$$\mathbf{E}\, e^{\lambda Z} = \mathbf{E} \prod_{i=1}^n e^{\lambda X_i} = \prod_{i=1}^n \mathbf{E}\, e^{\lambda X_i}$$
by independence. It suffices to find bounds for $\mathbf{E}\, e^{\lambda X_i}$.

[Photos: Serguei Bernstein (1880–1968); Herman Chernoff (1923– )]
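As a worked instance of the recipe (added here for concreteness, not on the original slide): if one can show $\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)} \le e^{\lambda^2 v/2}$ for all $\lambda > 0$ and some $v > 0$, then the choice $\lambda = t/v$ is optimal:
$$\mathbf{P}\{Z - \mathbf{E} Z > t\} \le \inf_{\lambda > 0} e^{-\lambda t + \lambda^2 v/2} = e^{-t^2/(2v)}.$$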
hoeffding's inequality

If $X_1, \ldots, X_n \in [0, 1]$, then
$$\mathbf{E}\, e^{\lambda(X_i - \mathbf{E} X_i)} \le e^{\lambda^2/8}.$$
We obtain
$$\mathbf{P}\left\{ \left| \frac{1}{n} \sum_{i=1}^n X_i - \frac{1}{n} \sum_{i=1}^n \mathbf{E} X_i \right| > t \right\} \le 2 e^{-2nt^2}.$$

[Photo: Wassily Hoeffding (1914–1991)]
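A quick numerical sanity check (my illustration, not part of the slides), a minimal Python sketch comparing the empirical tail of the mean of i.i.d. uniform variables with the Hoeffding bound $2e^{-2nt^2}$:

```python
import numpy as np

# Monte Carlo sanity check of Hoeffding's inequality (an illustration,
# not from the slides): for i.i.d. uniform X_i on [0, 1], compare the
# empirical tail P{|mean - E mean| > t} with the bound 2 exp(-2 n t^2).
rng = np.random.default_rng(0)
n, trials, t = 100, 100_000, 0.1

X = rng.uniform(0.0, 1.0, size=(trials, n))
deviations = np.abs(X.mean(axis=1) - 0.5)   # E X_i = 1/2 for uniforms
empirical = (deviations > t).mean()
bound = 2 * np.exp(-2 * n * t**2)
print(f"empirical tail: {empirical:.5f}, Hoeffding bound: {bound:.5f}")
```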
bernstein's inequality

Hoeffding's inequality is distribution-free: it does not take variance information into account. Bernstein's inequality is an often useful variant. Let $X_1, \ldots, X_n$ be independent such that $X_i \le 1$, and let $v = \sum_{i=1}^n \mathbf{E}[X_i^2]$. Then
$$\mathbf{P}\left\{ \sum_{i=1}^n (X_i - \mathbf{E} X_i) \ge t \right\} \le \exp\left( - \frac{t^2}{2(v + t/3)} \right).$$
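To see what variance information buys, here is a small numerical comparison (my example, not from the slides) of the two bounds for a sum of $n$ Bernoulli$(p)$ variables with small $p$, where $v = np$ is far below the worst case:

```python
import numpy as np

# A sketch (illustration only) comparing the two tail bounds for a sum
# of n Bernoulli(p) variables with small p. Here
# v = sum_i E[X_i^2] = n * p, which is far smaller than the worst case,
# so Bernstein's bound beats the distribution-free Hoeffding tail
# P{sum (X_i - E X_i) >= t} <= exp(-2 t^2 / n) for [0,1]-valued terms.
n, p, t = 1000, 0.01, 20.0
v = n * p
bernstein = np.exp(-t**2 / (2 * (v + t / 3)))
hoeffding = np.exp(-2 * t**2 / n)
print(f"Bernstein: {bernstein:.2e}   Hoeffding: {hoeffding:.2e}")
```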
martingale representation

$X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Denote $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \ldots, X_i]$. Thus, $\mathbf{E}_0 Z = \mathbf{E} Z$ and $\mathbf{E}_n Z = Z$.

Writing $\Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z$, we have
$$Z - \mathbf{E} Z = \sum_{i=1}^n \Delta_i.$$
This is the Doob martingale representation of $Z$.

[Photo: Joseph Leo Doob (1910–2004)]
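As an illustration of the representation (a sketch under the toy assumption $Z = \max_i X_i$ with uniform $X_i$, where $\mathbf{E}_i Z$ happens to have a closed form; the example is mine, not from the slides), one can verify numerically that the increments telescope to $Z - \mathbf{E} Z$:

```python
import numpy as np

# A sketch of the Doob martingale for the toy case Z = max(X_1,...,X_n)
# with i.i.d. uniform X_i on [0, 1]. With M_i = max(X_1,...,X_i) and
# k = n - i variables still unseen,
#   E_i Z = (M_i^(k+1) + k) / (k + 1),
# so the increments Delta_i = E_i Z - E_{i-1} Z telescope to Z - E Z.
rng = np.random.default_rng(1)
n = 10
x = rng.uniform(size=n)

def cond_exp(i):
    # E[Z | X_1, ..., X_i], in closed form for the maximum of uniforms.
    m = x[:i].max() if i > 0 else 0.0
    k = n - i
    return (m ** (k + 1) + k) / (k + 1)

deltas = [cond_exp(i) - cond_exp(i - 1) for i in range(1, n + 1)]
print(sum(deltas), x.max() - n / (n + 1))  # both equal Z - E Z
```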
martingale representation: the variance

$$\mathrm{Var}(Z) = \mathbf{E}\left( \sum_{i=1}^n \Delta_i \right)^2 = \sum_{i=1}^n \mathbf{E} \Delta_i^2 + 2 \sum_{j > i} \mathbf{E}\, \Delta_i \Delta_j.$$
Now if $j > i$, $\mathbf{E}_i \Delta_j = 0$, so
$$\mathbf{E}_i[\Delta_i \Delta_j] = \Delta_i\, \mathbf{E}_i \Delta_j = 0,$$
and hence $\mathbf{E}\, \Delta_i \Delta_j = 0$. We obtain
$$\mathrm{Var}(Z) = \mathbf{E}\left( \sum_{i=1}^n \Delta_i \right)^2 = \sum_{i=1}^n \mathbf{E} \Delta_i^2.$$
From this, using independence, it is easy to derive the Efron-Stein inequality.
efron-stein inequality (1981)

Let $X_1, \ldots, X_n$ be independent random variables taking values in $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Then
$$\mathrm{Var}(Z) \le \mathbf{E} \sum_{i=1}^n \left( Z - \mathbf{E}^{(i)} Z \right)^2 = \mathbf{E} \sum_{i=1}^n \mathrm{Var}^{(i)}(Z),$$
where $\mathbf{E}^{(i)} Z$ is expectation with respect to the $i$-th variable $X_i$ only.

We obtain more useful forms by using that
$$\mathrm{Var}(X) = \tfrac{1}{2}\, \mathbf{E}(X - X')^2 \quad \text{and} \quad \mathrm{Var}(X) \le \mathbf{E}(X - a)^2$$
for an independent copy $X'$ of $X$ and any constant $a$.
efron-stein inequality (1981)

If $X'_1, \ldots, X'_n$ are independent copies of $X_1, \ldots, X_n$, and
$$Z'_i = f(X_1, \ldots, X_{i-1}, X'_i, X_{i+1}, \ldots, X_n),$$
then
$$\mathrm{Var}(Z) \le \frac{1}{2}\, \mathbf{E}\left[ \sum_{i=1}^n (Z - Z'_i)^2 \right].$$
$Z$ is concentrated if it doesn't depend too much on any of its variables. If $Z = \sum_{i=1}^n X_i$, then we have an equality. Sums are the "least concentrated" of all functions!
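A Monte Carlo sketch of this bound (again for the toy case $Z = \max_i X_i$ with uniform coordinates; the setup and helper names are mine):

```python
import numpy as np

# Monte Carlo sketch of the Efron-Stein bound (illustration only) for
# Z = max(X_1,...,X_n) with uniform X_i: compare Var(Z) with
# (1/2) E sum_i (Z - Z'_i)^2, where Z'_i replaces the i-th coordinate
# by an independent copy.
rng = np.random.default_rng(2)
n, trials = 20, 50_000

X = rng.uniform(size=(trials, n))
Xprime = rng.uniform(size=(trials, n))
Z = X.max(axis=1)

es_sum = 0.0
for i in range(n):
    Xi = X.copy()
    Xi[:, i] = Xprime[:, i]        # swap in the independent copy X'_i
    es_sum += ((Z - Xi.max(axis=1)) ** 2).mean()
print(f"Var(Z) = {Z.var():.5f}, Efron-Stein bound = {es_sum / 2:.5f}")
```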
efron-stein inequality (1981)

If, for some arbitrary functions $f_i$,
$$Z_i = f_i(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n),$$
then
$$\mathrm{Var}(Z) \le \mathbf{E}\left[ \sum_{i=1}^n (Z - Z_i)^2 \right].$$
efron, stein, and steele

[Photos: Mike Steele; Charles Stein; Bradley Efron]
weakly self-bounding functions

$f : \mathcal{X}^n \to [0, \infty)$ is weakly $(a, b)$-self-bounding if there exist $f_i : \mathcal{X}^{n-1} \to [0, \infty)$ such that for all $x \in \mathcal{X}^n$,
$$\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right)^2 \le a f(x) + b.$$
Then
$$\mathrm{Var}(f(X)) \le a\, \mathbf{E} f(X) + b.$$
self-bounding functions

If $0 \le f(x) - f_i(x^{(i)}) \le 1$ and
$$\sum_{i=1}^n \left( f(x) - f_i(x^{(i)}) \right) \le f(x),$$
then $f$ is self-bounding and $\mathrm{Var}(f(X)) \le \mathbf{E} f(X)$.

Rademacher averages, random VC dimension, random VC entropy, and the length of the longest increasing subsequence in a random permutation are all examples of self-bounding functions. Configuration functions are another general class of examples.
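For instance, for the longest increasing subsequence one can check $\mathrm{Var}(Z) \le \mathbf{E} Z$ by simulation; the sketch below (my illustration, not part of the slides) computes $Z$ by patience sorting:

```python
import numpy as np
from bisect import bisect_left

# Simulation sketch: the length Z of the longest increasing
# subsequence of a random permutation is self-bounding, so
# Var(Z) <= E Z. Z is computed by patience sorting.
rng = np.random.default_rng(3)

def lis_length(perm):
    piles = []
    for v in perm:
        j = bisect_left(piles, v)
        if j == len(piles):
            piles.append(v)   # start a new pile: LIS grows by one
        else:
            piles[j] = v      # keep pile tops minimal
    return len(piles)

n, trials = 500, 2000
Z = np.array([lis_length(rng.permutation(n)) for _ in range(trials)])
print(f"E Z ~ {Z.mean():.2f}, Var(Z) ~ {Z.var():.2f}")  # Var(Z) << E Z
```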
example: uniform deviations

Let $\mathcal{A}$ be a collection of subsets of $\mathcal{X}$, and let $X_1, \ldots, X_n$ be $n$ random points in $\mathcal{X}$ drawn i.i.d. Let
$$P(A) = \mathbf{P}\{X_1 \in A\} \quad \text{and} \quad P_n(A) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{\{X_i \in A\}}.$$
If $Z = \sup_{A \in \mathcal{A}} |P(A) - P_n(A)|$, then
$$\mathrm{Var}(Z) \le \frac{1}{2n},$$
regardless of the distribution and the richness of $\mathcal{A}$.
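For a concrete check (my instance of the example): take $\mathcal{A}$ to be the half-lines $(-\infty, a]$, so that $Z$ is the Kolmogorov-Smirnov statistic, and compare its simulated variance with $1/(2n)$:

```python
import numpy as np

# Sketch: with A the class of half-lines (-inf, a],
# Z = sup_a |F(a) - F_n(a)| is the Kolmogorov-Smirnov statistic.
# Efron-Stein gives Var(Z) <= 1/(2n) for *any* class A; we check it
# by simulation with uniform samples.
rng = np.random.default_rng(4)
n, trials = 50, 20_000

X = np.sort(rng.uniform(size=(trials, n)), axis=1)
grid = np.arange(1, n + 1) / n
# For half-lines, the supremum is attained at the order statistics:
# D_n = max_i max(i/n - X_(i), X_(i) - (i-1)/n).
Z = np.maximum(np.abs(grid - X), np.abs(grid - 1.0 / n - X)).max(axis=1)
print(f"Var(Z) = {Z.var():.6f}, bound 1/(2n) = {1 / (2 * n):.6f}")
```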
beyond the variance

$X_1, \ldots, X_n$ are independent random variables taking values in some set $\mathcal{X}$. Let $f : \mathcal{X}^n \to \mathbb{R}$ and $Z = f(X_1, \ldots, X_n)$. Recall the Doob martingale representation:
$$Z - \mathbf{E} Z = \sum_{i=1}^n \Delta_i, \quad \text{where } \Delta_i = \mathbf{E}_i Z - \mathbf{E}_{i-1} Z,$$
with $\mathbf{E}_i[\cdot] = \mathbf{E}[\cdot \mid X_1, \ldots, X_i]$. To get exponential inequalities, we bound the moment generating function $\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)}$.
azuma's inequality

Suppose that the martingale differences are bounded: $|\Delta_i| \le c_i$. Then
$$\begin{aligned}
\mathbf{E}\, e^{\lambda(Z - \mathbf{E} Z)} &= \mathbf{E}\, e^{\lambda \sum_{i=1}^n \Delta_i}
= \mathbf{E}\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i}\, \mathbf{E}_{n-1}\, e^{\lambda \Delta_n} \right] \\
&\le \mathbf{E}\left[ e^{\lambda \sum_{i=1}^{n-1} \Delta_i} \right] e^{\lambda^2 c_n^2 / 2} \quad \text{(by Hoeffding)} \\
&\le \cdots \le e^{\lambda^2 \left( \sum_{i=1}^n c_i^2 \right) / 2}.
\end{aligned}$$
This is the Azuma-Hoeffding inequality for sums of bounded martingale differences.
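Combined with the Chernoff recipe above, optimizing $\lambda = t / \sum_{i=1}^n c_i^2$ yields the usual tail form of Azuma-Hoeffding (stated here for completeness):
$$\mathbf{P}\{Z - \mathbf{E} Z > t\} \le \exp\left( - \frac{t^2}{2 \sum_{i=1}^n c_i^2} \right).$$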