User-Friendly Tail Bounds for Sums of Random Matrices

❦

Joel A. Tropp
Computing + Mathematical Sciences
California Institute of Technology
jtropp@cms.caltech.edu

Research supported in part by NSF, DARPA, ONR, and AFOSR
Matrix Rademacher Series

Joel A. Tropp, User-Friendly Tail Bounds, IMA, 27 September 2011
The Norm of a Matrix Rademacher Series

Theorem 1. [Oliveira 2010, T 2010] Suppose
❧ B_1, B_2, ... are fixed matrices with dimensions d_1 × d_2, and
❧ ε_1, ε_2, ... are independent Rademacher RVs.
Define d := d_1 + d_2, and introduce the matrix variance
\[ \sigma^2 := \max\left\{ \Big\| \sum\nolimits_j B_j B_j^* \Big\|,\ \Big\| \sum\nolimits_j B_j^* B_j \Big\| \right\}. \]
Then
\[ \mathbb{E}\,\Big\| \sum\nolimits_j \varepsilon_j B_j \Big\| \le \sqrt{2\sigma^2 \log d}
\quad\text{and}\quad
\mathbb{P}\left\{ \Big\| \sum\nolimits_j \varepsilon_j B_j \Big\| \ge t \right\} \le d \cdot e^{-t^2/2\sigma^2}. \]
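The two bounds in Theorem 1 are easy to probe numerically. The following sketch (not from the talk; the matrices, dimensions, and seed are arbitrary choices) computes the matrix variance for a random collection of coefficient matrices and checks that a Monte Carlo estimate of the expected norm sits below \( \sqrt{2\sigma^2 \log d} \):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 8, 6, 20
d = d1 + d2

# Fixed coefficient matrices B_1, ..., B_n (arbitrary choice for illustration)
Bs = [rng.standard_normal((d1, d2)) for _ in range(n)]

# Matrix variance: sigma^2 = max( ||sum_j B_j B_j*||, ||sum_j B_j* B_j|| )
S1 = sum(B @ B.T for B in Bs)
S2 = sum(B.T @ B for B in Bs)
sigma2 = max(np.linalg.norm(S1, 2), np.linalg.norm(S2, 2))

# Monte Carlo estimate of E || sum_j eps_j B_j ||
norms = []
for _ in range(200):
    eps = rng.choice([-1.0, 1.0], size=n)
    Z = sum(e * B for e, B in zip(eps, Bs))
    norms.append(np.linalg.norm(Z, 2))
mean_norm = np.mean(norms)

bound = np.sqrt(2 * sigma2 * np.log(d))
print(mean_norm <= bound)
```

With 200 trials the sample mean should comfortably respect the theorem's bound, which is typically loose by a modest constant factor.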
Example: Modulation by Random Signs

Fixed matrix, in captivity:
\[ C = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \cdots \\ c_{21} & c_{22} & c_{23} & \cdots \\ c_{31} & c_{32} & c_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \quad (d_1 \times d_2) \]
Random matrix, formed by randomly flipping the signs of the entries:
\[ Z = \begin{bmatrix} \varepsilon_{11} c_{11} & \varepsilon_{12} c_{12} & \varepsilon_{13} c_{13} & \cdots \\ \varepsilon_{21} c_{21} & \varepsilon_{22} c_{22} & \varepsilon_{23} c_{23} & \cdots \\ \varepsilon_{31} c_{31} & \varepsilon_{32} c_{32} & \varepsilon_{33} c_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \quad (d_1 \times d_2) \]
The family \( \{\varepsilon_{jk}\} \) consists of independent Rademacher random variables.

[Q] What is the typical value of \( \|Z\| \)?
The Random Matrix, qua Rademacher Series

Rewrite the random matrix as a Rademacher series:
\[ Z = \sum\nolimits_{jk} \varepsilon_{jk}\, c_{jk}\, \mathbf{E}_{jk}, \]
where \( \mathbf{E}_{jk} \) denotes the \( d_1 \times d_2 \) matrix unit: the matrix with a one in position \( (j, k) \) and zeros elsewhere.
Computing the Matrix Variance

The first term in the matrix variance \( \sigma^2 \) satisfies
\[ \Big\| \sum\nolimits_{jk} (c_{jk}\mathbf{E}_{jk})(c_{jk}\mathbf{E}_{jk})^* \Big\|
= \Big\| \sum\nolimits_{jk} |c_{jk}|^2\, \mathbf{E}_{jk}\mathbf{E}_{kj} \Big\|
= \Big\| \sum\nolimits_{j} \Big( \sum\nolimits_k |c_{jk}|^2 \Big) \mathbf{E}_{jj} \Big\|
= \max_j \sum\nolimits_k |c_{jk}|^2, \]
since the middle expression is a diagonal matrix with entries \( \sum_k |c_{1k}|^2, \sum_k |c_{2k}|^2, \ldots \). The same argument applies to the second term. Thus,
\[ \sigma^2 = \max\Big\{ \max_j \sum\nolimits_k |c_{jk}|^2,\ \max_k \sum\nolimits_j |c_{jk}|^2 \Big\}. \]
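In words, \( \sigma^2 \) is the larger of the maximum squared row norm and the maximum squared column norm of \( C \). A minimal sketch (the example matrix is an arbitrary choice, not from the slides):

```python
import numpy as np

C = np.array([[3.0, 0.0, 4.0],
              [1.0, 2.0, 2.0]])

row = (np.abs(C) ** 2).sum(axis=1).max()   # max_j sum_k |c_jk|^2
col = (np.abs(C) ** 2).sum(axis=0).max()   # max_k sum_j |c_jk|^2
sigma2 = max(row, col)
print(sigma2)  # 25.0: the first row has squared norm 9 + 0 + 16 = 25
```

Here the row sums of squares are 25 and 9, the column sums are 10, 4, and 20, so the variance parameter is 25.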
The Norm of a Randomly Modulated Matrix

Theorem 2. [T 2010] Suppose \( Z = \sum_{jk} \varepsilon_{jk} c_{jk} \mathbf{E}_{jk} \), where
❧ C is a fixed d_1 × d_2 matrix, and
❧ \( \{\varepsilon_{jk}\} \) is an independent family of Rademacher RVs.
Define d := d_1 + d_2, and compute the matrix variance
\[ \sigma^2 = \max\Big\{ \max_j \sum\nolimits_k |c_{jk}|^2,\ \max_k \sum\nolimits_j |c_{jk}|^2 \Big\}. \]
Then
\[ \mathbb{E}\,\|Z\| \le \sqrt{2\sigma^2 \log d}
\quad\text{and}\quad
\mathbb{P}\{\|Z\| \ge t\} \le d \cdot e^{-t^2/2\sigma^2}. \]
This result also holds when \( \{\varepsilon_{jk}\} \) is an iid family of standard normal RVs.
Comparison with the Literature

For the random matrix \( Z = [\varepsilon_{jk} c_{jk}] \)...

[T 2010], obtained via the matrix Rademacher bound:
\[ \mathbb{E}\,\|Z\| \le \sqrt{2 \log d} \cdot \sigma \]
[Seginer 2000], obtained with path-counting arguments:
\[ \mathbb{E}\,\|Z\| \le \mathrm{const} \cdot \sqrt[4]{\log d} \cdot \sigma \]
[Latała 2005], obtained with chaining arguments:
\[ \mathbb{E}\,\|Z\| \le \mathrm{const} \cdot \Big( \sigma + \sqrt[4]{\textstyle\sum_{jk} |c_{jk}|^4} \Big) \]
Matrix Chernoff Inequality
The Matrix Chernoff Bound

Theorem 3. [T 2010] Suppose \( Y = \sum_j X_j \), where
❧ X_1, X_2, ... are independent random psd matrices with dimension d, and
❧ \( \lambda_{\max}(X_j) \le R \) almost surely.
Define \( \mu_{\min} := \lambda_{\min}(\mathbb{E}\,Y) \) and \( \mu_{\max} := \lambda_{\max}(\mathbb{E}\,Y) \). Then
\[ \mathbb{E}\,\lambda_{\min}(Y) \ge 0.6\,\mu_{\min} - R \log d
\qquad
\mathbb{E}\,\lambda_{\max}(Y) \le 1.8\,\mu_{\max} + R \log d \]
\[ \mathbb{P}\{\lambda_{\min}(Y) \le (1-t)\,\mu_{\min}\} \le d \cdot \left[ \frac{e^{-t}}{(1-t)^{1-t}} \right]^{\mu_{\min}/R} \]
\[ \mathbb{P}\{\lambda_{\max}(Y) \ge (1+t)\,\mu_{\max}\} \le d \cdot \left[ \frac{e^{t}}{(1+t)^{1+t}} \right]^{\mu_{\max}/R} \]
Example: Random Submatrices

Fixed matrix, in captivity:
\[ C = \begin{bmatrix} | & | & | & | & & | \\ c_1 & c_2 & c_3 & c_4 & \cdots & c_n \\ | & | & | & | & & | \end{bmatrix} \quad (d \times n) \]
Random matrix, formed by keeping a random subset of the columns (here \( c_2, c_3, \ldots, c_n \) were picked) and zeroing out the rest:
\[ Z = \begin{bmatrix} | & | & & | \\ c_2 & c_3 & \cdots & c_n \\ | & | & & | \end{bmatrix} \quad (d \times n) \]
[Q] What is the typical value of \( \sigma_1(Z) \)? What about \( \sigma_d(Z) \)?
Model for Random Submatrix

❧ Let C be a fixed d × n matrix with columns c_1, ..., c_n
❧ Let δ_1, ..., δ_n be independent 0–1 random variables with mean s/n
❧ Define Δ = diag(δ_1, ..., δ_n)
❧ Form a random submatrix Z by turning off columns from C:
\[ Z = C\Delta = \begin{bmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_n \\ | & | & & | \end{bmatrix}_{d \times n} \begin{bmatrix} \delta_1 & & & \\ & \delta_2 & & \\ & & \ddots & \\ & & & \delta_n \end{bmatrix}_{n \times n} \]
❧ Note that Z typically consists of about s columns from C
The Random Submatrix, qua PSD Sum

❧ The largest and smallest singular values of Z satisfy
\[ \sigma_1(Z)^2 = \lambda_{\max}(ZZ^*) \quad\text{and}\quad \sigma_d(Z)^2 = \lambda_{\min}(ZZ^*) \]
❧ Define the psd matrix Y = ZZ*, and observe that
\[ Y = ZZ^* = C\Delta^2 C^* = C\Delta C^* = \sum\nolimits_{k=1}^n \delta_k\, c_k c_k^* \]
(using \( \Delta^2 = \Delta \), since each \( \delta_k \) takes only the values 0 and 1)
❧ We have expressed Y as a sum of independent psd random matrices
Preparing to Apply the Chernoff Bound

❧ Consider the random matrix \( Y = \sum_k \delta_k\, c_k c_k^* \)
❧ The maximal eigenvalue of each summand is bounded as
\[ R = \max_k \lambda_{\max}(\delta_k\, c_k c_k^*) \le \max_k \|c_k\|^2 \]
❧ The expectation of the random matrix Y is
\[ \mathbb{E}(Y) = \frac{s}{n} \sum\nolimits_{k=1}^n c_k c_k^* = \frac{s}{n}\, CC^* \]
❧ The mean parameters satisfy
\[ \mu_{\max} = \lambda_{\max}(\mathbb{E}\,Y) = \frac{s}{n}\,\sigma_1(C)^2
\quad\text{and}\quad
\mu_{\min} = \lambda_{\min}(\mathbb{E}\,Y) = \frac{s}{n}\,\sigma_d(C)^2 \]
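These parameters can be computed directly for a concrete matrix. The sketch below (an illustration, not from the talk; the matrix, sampling rate, and seed are arbitrary) forms R, μ_min, and μ_max, then checks by simulation that the sampled Gram matrix obeys the Chernoff expectation bound for λ_max:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, s = 4, 40, 20
C = rng.standard_normal((d, n))

# Chernoff parameters for Y = sum_k delta_k c_k c_k*
R = max(np.sum(C[:, k] ** 2) for k in range(n))   # max_k ||c_k||^2
EY = (s / n) * (C @ C.T)                          # E(Y) = (s/n) C C*
eig = np.linalg.eigvalsh(EY)
mu_min, mu_max = eig[0], eig[-1]                  # (s/n) sigma_d(C)^2, (s/n) sigma_1(C)^2

# Monte Carlo check of E lambda_max(Y) <= 1.8 mu_max + R log d
vals = []
for _ in range(300):
    keep = rng.random(n) < s / n       # delta_k = 1 with probability s/n
    Z = C[:, keep]
    vals.append(np.linalg.eigvalsh(Z @ Z.T)[-1])
print(np.mean(vals) <= 1.8 * mu_max + R * np.log(d))
```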
What the Chernoff Bound Says

Applying the Chernoff bound, we reach
\[ \mathbb{E}\,\sigma_1(Z)^2 = \mathbb{E}\,\lambda_{\max}(Y) \le 1.8 \cdot \frac{s}{n}\,\sigma_1(C)^2 + \max_k \|c_k\|^2 \cdot \log d \]
\[ \mathbb{E}\,\sigma_d(Z)^2 = \mathbb{E}\,\lambda_{\min}(Y) \ge 0.6 \cdot \frac{s}{n}\,\sigma_d(C)^2 - \max_k \|c_k\|^2 \cdot \log d \]
❧ Matrix C has n columns; the random submatrix Z includes about s
❧ The singular value σ_i(Z)² inherits an s/n share of σ_i(C)² for i = 1, d
❧ The additive correction reflects the number d of rows of C and the maximum column norm
❧ [Gittens, T 2011] The remaining singular values have similar behavior
Key Example: Unit-Norm Tight Frame

❧ A d × n unit-norm tight frame C satisfies
\[ CC^* = \frac{n}{d}\,\mathbf{I} \quad\text{and}\quad \|c_k\|_2 = 1 \ \text{ for } k = 1, 2, \ldots, n \]
❧ Specializing the inequalities from the previous slide...
\[ \mathbb{E}\,\sigma_1(Z)^2 \le 1.8 \cdot \frac{s}{d} + \log d
\qquad
\mathbb{E}\,\sigma_d(Z)^2 \ge 0.6 \cdot \frac{s}{d} - \log d \]
❧ Choose s ≥ 1.67 d log d columns for a nontrivial lower bound
❧ The sharp condition s > d log d also follows from the matrix Chernoff bound
❧ Earlier work: [Rudelson 1999, Rudelson–Vershynin 2007, T 2008]
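A quick way to exhibit a unit-norm tight frame (my example, not one from the talk) is to concatenate several copies of an orthonormal basis: every column has unit norm, and the frame is tight with \( CC^* = (n/d)\,\mathbf{I} \):

```python
import numpy as np

d, m = 5, 8
n = d * m

# m copies of the standard orthonormal basis of R^d, glued side by side
C = np.hstack([np.eye(d)] * m)

tight = np.allclose(C @ C.T, (n / d) * np.eye(d))      # C C* = (n/d) I
unit_norm = np.allclose((C ** 2).sum(axis=0), 1.0)     # ||c_k||_2 = 1 for all k
print(tight, unit_norm)  # True True
```

The Fourier (harmonic) frame and other equal-norm tight frames work the same way; this construction is just the simplest one to verify.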
Matrix Bernstein Inequality
The Matrix Bernstein Inequality

Theorem 4. [Oliveira 2010, T 2010] Suppose \( Z = \sum_j W_j \), where
❧ W_1, W_2, ... are independent random matrices with dimension d_1 × d_2,
❧ \( \mathbb{E}\,W_j = 0 \), and
❧ \( \|W_j\| \le R \) almost surely.
Define d := d_1 + d_2, and introduce the matrix variance
\[ \sigma^2 := \max\Big\{ \Big\| \sum\nolimits_j \mathbb{E}(W_j W_j^*) \Big\|,\ \Big\| \sum\nolimits_j \mathbb{E}(W_j^* W_j) \Big\| \Big\}. \]
Then
\[ \mathbb{E}\,\|Z\| \le \sqrt{2\sigma^2 \log d} + \frac{1}{3} R \log d
\quad\text{and}\quad
\mathbb{P}\{\|Z\| \ge t\} \le d \cdot \exp\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right). \]
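As a sanity check (my sketch, with arbitrary choices of matrices and seed), take the bounded zero-mean summands \( W_j = \varepsilon_j B_j \) for fixed matrices \( B_j \); then \( \mathbb{E}(W_j W_j^*) = B_j B_j^* \), and the Bernstein expectation bound can be compared against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, n = 6, 5, 30
d = d1 + d2

# Fixed matrices B_j; the summands W_j = eps_j * B_j are zero mean and bounded
Bs = [rng.standard_normal((d1, d2)) / np.sqrt(d1 * d2) for _ in range(n)]
R = max(np.linalg.norm(B, 2) for B in Bs)             # ||W_j|| <= ||B_j|| <= R
sigma2 = max(np.linalg.norm(sum(B @ B.T for B in Bs), 2),
             np.linalg.norm(sum(B.T @ B for B in Bs), 2))
bound = np.sqrt(2 * sigma2 * np.log(d)) + R * np.log(d) / 3

norms = []
for _ in range(300):
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(sum(e * B for e, B in zip(eps, Bs)), 2))
print(np.mean(norms) <= bound)
```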
Example: Randomized Matrix Multiplication

Product of two matrices, in captivity:
\[ BC^* = \begin{bmatrix} | & | & | & | & & | \\ b_1 & b_2 & b_3 & b_4 & \cdots & b_n \\ | & | & | & | & & | \end{bmatrix}_{d_1 \times n} \begin{bmatrix} \text{—} & c_1^* & \text{—} \\ \text{—} & c_2^* & \text{—} \\ \text{—} & c_3^* & \text{—} \\ \text{—} & c_4^* & \text{—} \\ & \vdots & \\ \text{—} & c_n^* & \text{—} \end{bmatrix}_{n \times d_2} \]
[Idea] Approximate the multiplication by random sampling.

First reference (?): [Drineas–Mahoney–Kannan 2004]
Some recent work: [Magen–Zouzias 2010], [Magdon-Ismail 2010], [Hsu–Kakade–Zhang 2011]
A Sampling Model for Tutorial Purposes

❧ Assume \( \|b_k\|_2 = 1 \) and \( \|c_k\|_2 = 1 \) for k = 1, 2, ..., n
❧ Construct a random variable W whose value is a d_1 × d_2 matrix:
draw \( K \sim \mathrm{uniform}\{1, 2, \ldots, n\} \) and set \( W = n \cdot b_K c_K^* \)
❧ The random matrix W is an unbiased estimator of the product BC*:
\[ \mathbb{E}\,W = \sum\nolimits_{k=1}^n (n \cdot b_k c_k^*) \cdot \mathbb{P}\{K = k\} = \sum\nolimits_{k=1}^n b_k c_k^* = BC^* \]
❧ Approximate BC* by averaging s independent copies of W:
\[ Z = \frac{1}{s} \sum\nolimits_{j=1}^s W_j \approx BC^* \]
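The sampling estimator above is a few lines of code. This sketch (my illustration; the dimensions, seed, and sample sizes are arbitrary) verifies the unbiasedness identity exactly and shows the averaged estimate approaching BC* as s grows:

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, n = 10, 8, 200

# Unit-norm columns, as the tutorial model assumes
B = rng.standard_normal((d1, n)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((d2, n)); C /= np.linalg.norm(C, axis=0)
P = B @ C.T                          # the exact product BC*

# E W = sum_k (n b_k c_k*) / n = BC*  (unbiasedness, checked exactly)
EW = sum(n * np.outer(B[:, k], C[:, k]) for k in range(n)) / n

def estimate(s):
    """Average s iid copies of W = n * b_K c_K*, K uniform on {0, ..., n-1}."""
    K = rng.integers(n, size=s)
    return sum(n * np.outer(B[:, k], C[:, k]) for k in K) / s

err_small = np.linalg.norm(estimate(50) - P, 2)
err_big = np.linalg.norm(estimate(5000) - P, 2)
print(err_big < err_small)  # more samples, smaller spectral-norm error (typically)
```

The matrix Bernstein inequality is exactly the tool that quantifies how fast the spectral-norm error of Z decays in s.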