Matrix-valued Chernoff Bounds and Applications
China Theory Week, Anastasios Zouzias, University of Toronto, September 2010


  1. Matrix-valued Chernoff Bounds and Applications
  China Theory Week, September 2010
  Anastasios Zouzias, University of Toronto

  2. Introduction
  Probability theory is the backbone of the analysis of randomized algorithms, and random sampling is its most fundamental technique.
  Several inequalities are used to analyze the quality of approximation: Markov, Chebyshev, Chernoff, Azuma, etc.
  In this talk: recent matrix-valued probabilistic inequalities and their applications.
  Agenda:
  1. Review real-valued probabilistic inequalities
  2. Present recent matrix-valued variants
  3. A low-rank matrix-valued inequality
  4. Two applications: matrix sparsification and approximate matrix multiplication


  3. Law of Large Numbers
  Fundamental principle of random sampling: the Law of Large Numbers (LLN).
  It states that the empirical average converges to the true average.
  Classical form: for reals rather than matrices.
  Let X_1, ..., X_t be independent copies of a random variable X.
  Goal: estimate the mean E[X] using the samples X_1, ..., X_t.
  Approximate by the empirical mean: (1/t) Σ_{i=1}^t X_i ≈ E[X].
  How good is the approximation (non-asymptotically)?
  Question: Is there a matrix-valued LLN?
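  As a sanity check on the non-asymptotic question, here is a minimal numerical sketch, assuming numpy; the Bernoulli distribution, seed, and sample sizes are illustrative choices, not from the talk:

```python
# Minimal illustration of the LLN estimate (1/t) * sum(X_i) ~ E[X].
import numpy as np

rng = np.random.default_rng(0)
true_mean = 0.3  # E[X] for a Bernoulli(0.3) random variable

for t in (10, 100, 10_000):
    samples = rng.binomial(1, true_mean, size=t)  # t independent copies of X
    empirical_mean = samples.mean()               # (1/t) * sum_{i=1}^t X_i
    print(f"t={t:>6}: |empirical - true| = {abs(empirical_mean - true_mean):.4f}")
```

  The deviation shrinks as t grows; the inequalities below quantify exactly how fast.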

  4. Matrix-valued Random Variables
  Let (Ω, F, P) be a probability space. A matrix-valued random variable is a measurable function M : Ω → R^{d×d}.
  Its expectation is a d × d matrix, denoted by E[M] ∈ R^{d×d}.
  Self-adjoint matrix-valued random variable: M : Ω → S^{d×d}.
  Caveat: the entries may or may not be correlated with each other.
  A matrix-valued random variable is a random matrix with (possibly) correlated entries.
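  For concreteness, a hypothetical example of such a random matrix is M = x xᵀ for a random vector x, whose entries are clearly correlated. The sketch below, assuming numpy with an illustrative dimension and distribution, estimates E[M] by averaging samples:

```python
# Empirical expectation of the self-adjoint matrix-valued r.v. M = x x^T.
import numpy as np

rng = np.random.default_rng(0)
d, t = 3, 100_000

samples = []
for _ in range(t):
    x = rng.standard_normal(d)
    samples.append(np.outer(x, x))  # a symmetric (self-adjoint) d x d matrix

empirical = np.mean(samples, axis=0)  # entrywise average, a d x d matrix
print(empirical)  # close to E[M] = I_d for standard Gaussian x
```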

  5. Real-valued Probabilistic Inequalities
  Lemma (Markov). Let X ≥ 0 be a real-valued random variable (r.v.) and α > 0. Then
    P(X ≥ α) ≤ E[X] / α.
  Lemma (Chernoff-Hoeffding). Let X_1, X_2, ..., X_t be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then
    P( |(1/t) Σ_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp(−C ε² t / γ²).
  Lemma (Bernstein). Let X_1, X_2, ..., X_t be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ and Var(X) ≤ ρ², then
    P( |(1/t) Σ_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp(−C ε² t / (ρ² + γε/3)).
  ...and many more...
  Question: What would the matrix-valued generalizations look like?
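  A rough empirical check of the Chernoff-Hoeffding shape, assuming numpy; the choice of X uniform on {−1, +1}, the parameters, and the constant C = 1/2 are illustrative, not from the talk:

```python
# Compare the empirical tail of the sample mean against the Chernoff bound.
import numpy as np

rng = np.random.default_rng(0)
t, eps, gamma, trials = 1_000, 0.1, 1.0, 5_000

# Each row is one experiment: t i.i.d. samples of X with E[X] = 0, |X| <= 1.
means = rng.choice([-1.0, 1.0], size=(trials, t)).mean(axis=1)
tail = np.mean(np.abs(means) > eps)                # estimated P(|mean - E[X]| > eps)
bound = 2 * np.exp(-0.5 * eps**2 * t / gamma**2)   # 2 exp(-C eps^2 t / gamma^2), C = 1/2
print(f"empirical tail {tail:.4f} <= bound {bound:.4f}")
```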

  6. Real-valued to Matrix-valued
  Is there a meaningful way to generalize the real-valued inequalities to matrix-valued ones? Would these inequalities be useful to us?
  A dictionary between the two settings:

  A, B ∈ S^{d×d}   α, β ∈ R   Comments
  A ⪰ B            α > β      A − B is p.s.d.
  ‖A‖              |α|        Spectral norm
  e^A              e^α        Matrix exponential
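  The dictionary can be exercised directly in code. A small sketch, assuming numpy and scipy; the matrices A and B are illustrative:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential e^A

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])

# A >= B in the semidefinite order iff A - B is positive semidefinite,
# i.e., all eigenvalues of the symmetric matrix A - B are nonnegative.
eigs = np.linalg.eigvalsh(A - B)
print(np.all(eigs >= -1e-12))  # True (up to floating-point tolerance)

# Spectral norm ||A|| plays the role of the absolute value |alpha|.
print(np.linalg.norm(A, 2))  # largest singular value of A

# Matrix exponential e^A plays the role of the scalar exponential e^alpha.
print(expm(A))
```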

  7. Matrix-valued Probabilistic Inequalities
  Lemma (Markov). Let X ≥ 0 be a real-valued r.v. and α > 0. Then
    P(X ≥ α) ≤ E[X] / α.
  Lemma (Matrix-valued Markov [AW02]). Let M ⪰ 0 be a self-adjoint matrix-valued r.v. and α > 0. Then
    P(M ⋠ α · I) ≤ tr(E[M]) / α.
  Remark: P(M ⋠ α · I) = P(λ_max(M) > α).
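  A rough numerical check of the matrix-valued Markov bound, assuming numpy; the choice M = x xᵀ (so that M ⪰ 0 and tr(E[M]) = d), the dimension, and α are illustrative:

```python
# Compare P(lambda_max(M) > alpha) against tr(E[M]) / alpha for M = x x^T.
import numpy as np

rng = np.random.default_rng(0)
d, alpha, trials = 3, 20.0, 50_000

lmax = np.empty(trials)
for i in range(trials):
    x = rng.standard_normal(d)
    lmax[i] = np.linalg.eigvalsh(np.outer(x, x)).max()  # lambda_max(M) = ||x||^2

tail = np.mean(lmax > alpha)   # estimated P(lambda_max(M) > alpha)
bound = d / alpha              # tr(E[M]) / alpha, since E[M] = I_d
print(f"empirical tail {tail:.4f} <= bound {bound:.4f}")
```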

  8. Matrix-valued Probabilistic Inequalities
  Theorem (Chernoff). Let X_1, X_2, ..., X_t be i.i.d. copies of a real-valued r.v. X and ε > 0. If |X| ≤ γ, then
    P( |(1/t) Σ_{i=1}^t X_i − E[X]| > ε ) ≤ 2 exp(−C ε² t / γ²).
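  A matrix-valued analogue of this theorem would control the spectral-norm deviation ‖(1/t) Σ_{i=1}^t M_i − E[M]‖. The sketch below, assuming numpy with illustrative choices of dimension, distribution, and sample sizes, shows that deviation shrinking as t grows:

```python
# Spectral-norm deviation of the empirical matrix mean from E[M], for M = x x^T.
import numpy as np

rng = np.random.default_rng(0)
d = 10

for t in (10, 100, 10_000):
    mats = [np.outer(x, x) for x in rng.standard_normal((t, d))]
    deviation = np.linalg.norm(np.mean(mats, axis=0) - np.eye(d), 2)  # E[M] = I_d
    print(f"t={t:>6}: spectral-norm deviation = {deviation:.4f}")
```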
