Matrix-valued Chernoff Bounds and Applications
China Theory Week
Anastasios Zouzias, University of Toronto
September 2010
Introduction

Probability theory is the backbone of the analysis of randomized algorithms, and random sampling is the most fundamental technique. Several inequalities are used to analyze the quality of approximation: Markov, Chebyshev, Chernoff, Azuma, etc.

In this talk: recent matrix-valued probabilistic inequalities and their applications.

Agenda:
1. Review real-valued probabilistic inequalities
2. Present recent matrix-valued variants
3. A low-rank matrix-valued inequality
4. Two applications: matrix sparsification and approximate matrix multiplication
Law of Large Numbers

The fundamental principle behind random sampling is the Law of Large Numbers (LLN): the empirical average converges to the true average. Its classical form is stated for reals rather than matrices.

Let $X_1, \dots, X_t$ be independent copies of a random variable $X$. Goal: estimate the mean $\mathbb{E}[X]$ using the samples $X_1, \dots, X_t$, approximating it by the empirical mean
$$\frac{1}{t} \sum_{i=1}^{t} X_i \approx \mathbb{E}[X].$$
How good is the approximation (non-asymptotically)?

Question: Is there a matrix-valued LLN?
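As a quick illustration (a minimal sketch of my own, not from the slides; it assumes NumPy and a Uniform[0, 1] variable with true mean 1/2), the empirical mean tightens around $\mathbb{E}[X]$ as the number of samples $t$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw t independent copies of X ~ Uniform[0, 1] (true mean 1/2)
# and compare the empirical mean against E[X] as t grows.
for t in [10, 100, 10_000]:
    samples = rng.uniform(0.0, 1.0, size=t)
    empirical_mean = samples.mean()
    print(f"t = {t:6d}   empirical mean = {empirical_mean:.4f}   "
          f"error = {abs(empirical_mean - 0.5):.4f}")
```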
Matrix-valued Random Variables

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A matrix-valued random variable is a measurable function $M : \Omega \to \mathbb{R}^{d \times d}$. Its expectation is a $d \times d$ matrix, denoted $\mathbb{E}[M] \in \mathbb{R}^{d \times d}$. A self-adjoint matrix-valued random variable is a measurable function $M : \Omega \to \mathcal{S}^{d \times d}$.

Caveat: the entries may or may not be correlated with each other. In other words, a matrix-valued random variable is a random matrix with (possibly) correlated entries.
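As a concrete example (my own sketch, not from the slides; it assumes NumPy), consider the rank-one sample $M = xx^{\top}$ with $x \sim N(0, I_d)$. Its entries $M_{ij} = x_i x_j$ are strongly correlated, and $\mathbb{E}[M] = I_d$, which the empirical average recovers:

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 4, 50_000

# A self-adjoint matrix-valued r.v.: M = x x^T with x ~ N(0, I_d).
# Its entries M_ij = x_i x_j are correlated, and E[M] = I_d.
samples = rng.standard_normal((t, d))
empirical = sum(np.outer(x, x) for x in samples) / t

print(np.round(empirical, 2))  # approximately the identity matrix
print("spectral-norm error:",
      np.linalg.norm(empirical - np.eye(d), ord=2))
```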
Real-valued Probabilistic Inequalities

Lemma (Markov). Let $X \ge 0$ be a real-valued random variable (r.v.) and $\alpha > 0$. Then
$$\mathbb{P}(X \ge \alpha) \le \frac{\mathbb{E}[X]}{\alpha}.$$

Lemma (Chernoff-Hoeffding). Let $X_1, X_2, \dots, X_t$ be i.i.d. copies of a real-valued r.v. $X$ and $\varepsilon > 0$. If $|X| \le \gamma$, then
$$\mathbb{P}\left( \left| \frac{1}{t} \sum_{i=1}^{t} X_i - \mathbb{E}[X] \right| > \varepsilon \right) \le 2 \exp\left( \frac{-C \varepsilon^2 t}{\gamma^2} \right).$$

Lemma (Bernstein). Let $X_1, X_2, \dots, X_t$ be i.i.d. copies of a real-valued r.v. $X$ and $\varepsilon > 0$. If $|X| \le \gamma$ and $\mathrm{Var}(X) \le \rho^2$, then
$$\mathbb{P}\left( \left| \frac{1}{t} \sum_{i=1}^{t} X_i - \mathbb{E}[X] \right| > \varepsilon \right) \le 2 \exp\left( \frac{-C \varepsilon^2 t}{\rho^2 + \gamma \varepsilon / 3} \right).$$

...and many more...

Question: What would the matrix-valued generalizations look like?
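To see the Chernoff-Hoeffding bound in action, here is a small numerical check (my own sketch, not from the talk). The slides leave the constant $C$ unspecified; $C = 1/2$ is a valid choice here, since $|X| \le \gamma$ gives a range of $2\gamma$ in Hoeffding's inequality:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, eps, t, trials = 1.0, 0.1, 500, 20_000

# X ~ Uniform[-1, 1], so |X| <= gamma = 1 and E[X] = 0.
# Estimate P(|mean of t samples - E[X]| > eps) over many trials.
deviations = np.abs(
    rng.uniform(-gamma, gamma, size=(trials, t)).mean(axis=1))
empirical_prob = (deviations > eps).mean()

# Hoeffding's bound with C = 1/2 (valid since the range is 2*gamma).
bound = 2 * np.exp(-0.5 * eps**2 * t / gamma**2)
print(f"empirical P = {empirical_prob:.4f}   bound = {bound:.4f}")
```

As expected, the empirical probability sits well below the bound: the inequality is valid but not tight for this distribution.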
Real-valued to Matrix-valued

Is there a meaningful way to generalize the real-valued inequalities to matrix-valued ones? Would these inequalities be useful to us?

Dictionary between the two settings:

    A, B ∈ S^{d×d}      α, β ∈ R      Comments
    A ⪰ B               α ≥ β         A − B is p.s.d.
    ‖A‖                 |α|           Spectral norm
    e^A                 e^α           Matrix exponential
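A small sketch (my own, assuming NumPy and SciPy) making the three matrix analogues concrete:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.eye(2)

# Loewner order: A >= B iff A - B is positive semidefinite,
# i.e. all eigenvalues of A - B are nonnegative.
print("A - B p.s.d.:", np.all(np.linalg.eigvalsh(A - B) >= -1e-12))

# Spectral norm: the matrix analogue of |alpha|.
print("||A|| =", np.linalg.norm(A, ord=2))

# Matrix exponential: the analogue of e^alpha.
print("e^A =\n", expm(A))
```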
Matrix-valued Probabilistic Inequalities

Lemma (Markov). Let $X \ge 0$ be a real-valued r.v. and $\alpha > 0$. Then
$$\mathbb{P}(X \ge \alpha) \le \frac{\mathbb{E}[X]}{\alpha}.$$

Lemma (Matrix-valued Markov [AW02]). Let $M \succeq 0$ be a self-adjoint matrix-valued r.v. and $\alpha > 0$. Then
$$\mathbb{P}(M \npreceq \alpha \cdot I) \le \frac{\mathrm{tr}(\mathbb{E}[M])}{\alpha}.$$

Remark: $\mathbb{P}(M \npreceq \alpha \cdot I) = \mathbb{P}(\lambda_{\max}(M) > \alpha)$.
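A quick numerical sanity check of the matrix-valued Markov inequality (my own sketch, assuming NumPy; it reuses the rank-one example $M = xx^{\top}$ with $x \sim N(0, I_d)$, so $\mathbb{E}[M] = I_d$ and $\mathrm{tr}(\mathbb{E}[M]) = d$):

```python
import numpy as np

rng = np.random.default_rng(3)
d, alpha, trials = 4, 16.0, 100_000

# M = x x^T with x ~ N(0, I_d): M >= 0, E[M] = I_d, tr(E[M]) = d.
# lambda_max(x x^T) = ||x||^2, so M exceeds alpha*I iff ||x||^2 > alpha.
x = rng.standard_normal((trials, d))
lam_max = (x ** 2).sum(axis=1)

print(f"P(lambda_max(M) > {alpha}) = {(lam_max > alpha).mean():.4f}")
print(f"tr(E[M]) / alpha          = {d / alpha:.4f}")
```

The empirical probability is far below $d/\alpha$, mirroring the looseness of scalar Markov.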
Matrix-valued Probabilistic Inequalities

Theorem (Chernoff). Let $X_1, X_2, \dots, X_t$ be i.i.d. copies of a real-valued r.v. $X$ and $\varepsilon > 0$. If $|X| \le \gamma$, then
$$\mathbb{P}\left( \left| \frac{1}{t} \sum_{i=1}^{t} X_i - \mathbb{E}[X] \right| > \varepsilon \right) \le 2 \exp\left( \frac{-C \varepsilon^2 t}{\gamma^2} \right).$$