User-Friendly Tail Bounds for Sums of Random Matrices

❦

Joel A. Tropp
Computing + Mathematical Sciences
California Institute of Technology
jtropp@cms.caltech.edu

Research supported in part by NSF, DARPA, ONR, and AFOSR
Matrix Rademacher Series

Joel A. Tropp, User-Friendly Tail Bounds, IMA, 27 September 2011
The Norm of a Matrix Rademacher Series

Theorem 1. [Oliveira 2010, T 2010] Suppose
❧ B_1, B_2, ... are fixed matrices with dimensions d_1 × d_2, and
❧ ε_1, ε_2, ... are independent Rademacher RVs.
Define d := d_1 + d_2, and introduce the matrix variance
\[ \sigma^2 := \max\left\{ \Big\| \sum\nolimits_j B_j B_j^* \Big\|,\ \Big\| \sum\nolimits_j B_j^* B_j \Big\| \right\}. \]
Then
\[ \mathbb{E}\,\Big\| \sum\nolimits_j \varepsilon_j B_j \Big\| \le \sqrt{2\sigma^2 \log d}
\quad\text{and}\quad
\mathbb{P}\left\{ \Big\| \sum\nolimits_j \varepsilon_j B_j \Big\| \ge t \right\} \le d \cdot e^{-t^2/2\sigma^2}. \]
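The two bounds in Theorem 1 are easy to probe numerically. The following sketch (not from the talk; the matrices, dimensions, and seed are arbitrary choices) computes the matrix variance for a random collection of coefficient matrices and checks that a Monte Carlo estimate of the expected norm sits below \( \sqrt{2\sigma^2 \log d} \):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 8, 6, 20
d = d1 + d2

# Fixed coefficient matrices B_1, ..., B_n (arbitrary choice for illustration)
Bs = [rng.standard_normal((d1, d2)) for _ in range(n)]

# Matrix variance: sigma^2 = max( ||sum_j B_j B_j*||, ||sum_j B_j* B_j|| )
S1 = sum(B @ B.T for B in Bs)
S2 = sum(B.T @ B for B in Bs)
sigma2 = max(np.linalg.norm(S1, 2), np.linalg.norm(S2, 2))

# Monte Carlo estimate of E || sum_j eps_j B_j ||
norms = []
for _ in range(200):
    eps = rng.choice([-1.0, 1.0], size=n)
    Z = sum(e * B for e, B in zip(eps, Bs))
    norms.append(np.linalg.norm(Z, 2))
mean_norm = np.mean(norms)

bound = np.sqrt(2 * sigma2 * np.log(d))
print(mean_norm <= bound)
```

With 200 trials the sample mean should comfortably respect the theorem's bound, which is typically loose by a modest constant factor.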
Example: Modulation by Random Signs

Fixed matrix, in captivity:
\[ C = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \cdots \\ c_{21} & c_{22} & c_{23} & \cdots \\ c_{31} & c_{32} & c_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \quad (d_1 \times d_2) \]
Random matrix, formed by randomly flipping the signs of the entries:
\[ Z = \begin{bmatrix} \varepsilon_{11} c_{11} & \varepsilon_{12} c_{12} & \varepsilon_{13} c_{13} & \cdots \\ \varepsilon_{21} c_{21} & \varepsilon_{22} c_{22} & \varepsilon_{23} c_{23} & \cdots \\ \varepsilon_{31} c_{31} & \varepsilon_{32} c_{32} & \varepsilon_{33} c_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \quad (d_1 \times d_2) \]
The family \( \{\varepsilon_{jk}\} \) consists of independent Rademacher random variables.

[Q] What is the typical value of \( \|Z\| \)?
The Random Matrix, qua Rademacher Series

Rewrite the random matrix as a Rademacher series:
\[ Z = \sum\nolimits_{jk} \varepsilon_{jk}\, c_{jk}\, \mathbf{E}_{jk}, \]
where \( \mathbf{E}_{jk} \) denotes the \( d_1 \times d_2 \) matrix unit: the matrix with a one in position \( (j, k) \) and zeros elsewhere.
Computing the Matrix Variance

The first term in the matrix variance \( \sigma^2 \) satisfies
\[ \Big\| \sum\nolimits_{jk} (c_{jk}\mathbf{E}_{jk})(c_{jk}\mathbf{E}_{jk})^* \Big\|
= \Big\| \sum\nolimits_{jk} |c_{jk}|^2\, \mathbf{E}_{jk}\mathbf{E}_{kj} \Big\|
= \Big\| \sum\nolimits_{j} \Big( \sum\nolimits_k |c_{jk}|^2 \Big) \mathbf{E}_{jj} \Big\|
= \max_j \sum\nolimits_k |c_{jk}|^2, \]
since the middle expression is a diagonal matrix with entries \( \sum_k |c_{1k}|^2, \sum_k |c_{2k}|^2, \ldots \). The same argument applies to the second term. Thus,
\[ \sigma^2 = \max\Big\{ \max_j \sum\nolimits_k |c_{jk}|^2,\ \max_k \sum\nolimits_j |c_{jk}|^2 \Big\}. \]
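In words, \( \sigma^2 \) is the larger of the maximum squared row norm and the maximum squared column norm of \( C \). A minimal sketch (the example matrix is an arbitrary choice, not from the slides):

```python
import numpy as np

C = np.array([[3.0, 0.0, 4.0],
              [1.0, 2.0, 2.0]])

row = (np.abs(C) ** 2).sum(axis=1).max()   # max_j sum_k |c_jk|^2
col = (np.abs(C) ** 2).sum(axis=0).max()   # max_k sum_j |c_jk|^2
sigma2 = max(row, col)
print(sigma2)  # 25.0: the first row has squared norm 9 + 0 + 16 = 25
```

Here the row sums of squares are 25 and 9, the column sums are 10, 4, and 20, so the variance parameter is 25.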
The Norm of a Randomly Modulated Matrix

Theorem 2. [T 2010] Suppose \( Z = \sum_{jk} \varepsilon_{jk} c_{jk} \mathbf{E}_{jk} \), where
❧ C is a fixed d_1 × d_2 matrix, and
❧ \( \{\varepsilon_{jk}\} \) is an independent family of Rademacher RVs.
Define d := d_1 + d_2, and compute the matrix variance
\[ \sigma^2 = \max\Big\{ \max_j \sum\nolimits_k |c_{jk}|^2,\ \max_k \sum\nolimits_j |c_{jk}|^2 \Big\}. \]
Then
\[ \mathbb{E}\,\|Z\| \le \sqrt{2\sigma^2 \log d}
\quad\text{and}\quad
\mathbb{P}\{\|Z\| \ge t\} \le d \cdot e^{-t^2/2\sigma^2}. \]
This result also holds when \( \{\varepsilon_{jk}\} \) is an iid family of standard normal RVs.
Comparison with the Literature

For the random matrix \( Z = [\varepsilon_{jk} c_{jk}] \)...

[T 2010], obtained via the matrix Rademacher bound:
\[ \mathbb{E}\,\|Z\| \le \sqrt{2 \log d} \cdot \sigma \]
[Seginer 2000], obtained with path-counting arguments:
\[ \mathbb{E}\,\|Z\| \le \mathrm{const} \cdot \sqrt[4]{\log d} \cdot \sigma \]
[Latała 2005], obtained with chaining arguments:
\[ \mathbb{E}\,\|Z\| \le \mathrm{const} \cdot \Big( \sigma + \sqrt[4]{\textstyle\sum_{jk} |c_{jk}|^4} \Big) \]
Matrix Chernoff Inequality
The Matrix Chernoff Bound

Theorem 3. [T 2010] Suppose \( Y = \sum_j X_j \), where
❧ X_1, X_2, ... are independent random psd matrices with dimension d, and
❧ \( \lambda_{\max}(X_j) \le R \) almost surely.
Define \( \mu_{\min} := \lambda_{\min}(\mathbb{E}\,Y) \) and \( \mu_{\max} := \lambda_{\max}(\mathbb{E}\,Y) \). Then
\[ \mathbb{E}\,\lambda_{\min}(Y) \ge 0.6\,\mu_{\min} - R \log d
\qquad
\mathbb{E}\,\lambda_{\max}(Y) \le 1.8\,\mu_{\max} + R \log d \]
\[ \mathbb{P}\{\lambda_{\min}(Y) \le (1-t)\,\mu_{\min}\} \le d \cdot \left[ \frac{e^{-t}}{(1-t)^{1-t}} \right]^{\mu_{\min}/R} \]
\[ \mathbb{P}\{\lambda_{\max}(Y) \ge (1+t)\,\mu_{\max}\} \le d \cdot \left[ \frac{e^{t}}{(1+t)^{1+t}} \right]^{\mu_{\max}/R} \]
Example: Random Submatrices

Fixed matrix, in captivity:
\[ C = \begin{bmatrix} | & | & | & | & & | \\ c_1 & c_2 & c_3 & c_4 & \cdots & c_n \\ | & | & | & | & & | \end{bmatrix} \quad (d \times n) \]
Random matrix, formed by keeping a random subset of the columns (here \( c_2, c_3, \ldots, c_n \) were picked) and zeroing out the rest:
\[ Z = \begin{bmatrix} | & | & & | \\ c_2 & c_3 & \cdots & c_n \\ | & | & & | \end{bmatrix} \quad (d \times n) \]
[Q] What is the typical value of \( \sigma_1(Z) \)? What about \( \sigma_d(Z) \)?
Model for Random Submatrix

❧ Let C be a fixed d × n matrix with columns c_1, ..., c_n
❧ Let δ_1, ..., δ_n be independent 0–1 random variables with mean s/n
❧ Define Δ = diag(δ_1, ..., δ_n)
❧ Form a random submatrix Z by turning off columns from C:
\[ Z = C\Delta = \begin{bmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_n \\ | & | & & | \end{bmatrix}_{d \times n} \begin{bmatrix} \delta_1 & & & \\ & \delta_2 & & \\ & & \ddots & \\ & & & \delta_n \end{bmatrix}_{n \times n} \]
❧ Note that Z typically consists of about s columns from C
The Random Submatrix, qua PSD Sum

❧ The largest and smallest singular values of Z satisfy
\[ \sigma_1(Z)^2 = \lambda_{\max}(ZZ^*) \quad\text{and}\quad \sigma_d(Z)^2 = \lambda_{\min}(ZZ^*) \]
❧ Define the psd matrix Y = ZZ*, and observe that
\[ Y = ZZ^* = C\Delta^2 C^* = C\Delta C^* = \sum\nolimits_{k=1}^n \delta_k\, c_k c_k^* \]
(using \( \Delta^2 = \Delta \), since each \( \delta_k \) takes only the values 0 and 1)
❧ We have expressed Y as a sum of independent psd random matrices
Preparing to Apply the Chernoff Bound

❧ Consider the random matrix \( Y = \sum_k \delta_k\, c_k c_k^* \)
❧ The maximal eigenvalue of each summand is bounded as
\[ R = \max_k \lambda_{\max}(\delta_k\, c_k c_k^*) \le \max_k \|c_k\|^2 \]
❧ The expectation of the random matrix Y is
\[ \mathbb{E}(Y) = \frac{s}{n} \sum\nolimits_{k=1}^n c_k c_k^* = \frac{s}{n}\, CC^* \]
❧ The mean parameters satisfy
\[ \mu_{\max} = \lambda_{\max}(\mathbb{E}\,Y) = \frac{s}{n}\,\sigma_1(C)^2
\quad\text{and}\quad
\mu_{\min} = \lambda_{\min}(\mathbb{E}\,Y) = \frac{s}{n}\,\sigma_d(C)^2 \]
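These parameters can be computed directly for a concrete matrix. The sketch below (an illustration, not from the talk; the matrix, sampling rate, and seed are arbitrary) forms R, μ_min, and μ_max, then checks by simulation that the sampled Gram matrix obeys the Chernoff expectation bound for λ_max:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, s = 4, 40, 20
C = rng.standard_normal((d, n))

# Chernoff parameters for Y = sum_k delta_k c_k c_k*
R = max(np.sum(C[:, k] ** 2) for k in range(n))   # max_k ||c_k||^2
EY = (s / n) * (C @ C.T)                          # E(Y) = (s/n) C C*
eig = np.linalg.eigvalsh(EY)
mu_min, mu_max = eig[0], eig[-1]                  # (s/n) sigma_d(C)^2, (s/n) sigma_1(C)^2

# Monte Carlo check of E lambda_max(Y) <= 1.8 mu_max + R log d
vals = []
for _ in range(300):
    keep = rng.random(n) < s / n       # delta_k = 1 with probability s/n
    Z = C[:, keep]
    vals.append(np.linalg.eigvalsh(Z @ Z.T)[-1])
print(np.mean(vals) <= 1.8 * mu_max + R * np.log(d))
```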
What the Chernoff Bound Says

Applying the Chernoff bound, we reach
\[ \mathbb{E}\,\sigma_1(Z)^2 = \mathbb{E}\,\lambda_{\max}(Y) \le 1.8 \cdot \frac{s}{n}\,\sigma_1(C)^2 + \max_k \|c_k\|^2 \cdot \log d \]
\[ \mathbb{E}\,\sigma_d(Z)^2 = \mathbb{E}\,\lambda_{\min}(Y) \ge 0.6 \cdot \frac{s}{n}\,\sigma_d(C)^2 - \max_k \|c_k\|^2 \cdot \log d \]
❧ Matrix C has n columns; the random submatrix Z includes about s
❧ The singular value σ_i(Z)² inherits an s/n share of σ_i(C)² for i = 1, d
❧ The additive correction reflects the number d of rows of C and the maximum column norm
❧ [Gittens, T 2011] The remaining singular values have similar behavior
Key Example: Unit-Norm Tight Frame

❧ A d × n unit-norm tight frame C satisfies
\[ CC^* = \frac{n}{d}\,\mathbf{I} \quad\text{and}\quad \|c_k\|_2 = 1 \ \text{ for } k = 1, 2, \ldots, n \]
❧ Specializing the inequalities from the previous slide...
\[ \mathbb{E}\,\sigma_1(Z)^2 \le 1.8 \cdot \frac{s}{d} + \log d
\qquad
\mathbb{E}\,\sigma_d(Z)^2 \ge 0.6 \cdot \frac{s}{d} - \log d \]
❧ Choose s ≥ 1.67 d log d columns for a nontrivial lower bound
❧ The sharp condition s > d log d also follows from the matrix Chernoff bound
❧ Earlier work: [Rudelson 1999, Rudelson–Vershynin 2007, T 2008]
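A quick way to exhibit a unit-norm tight frame (my example, not one from the talk) is to concatenate several copies of an orthonormal basis: every column has unit norm, and the frame is tight with \( CC^* = (n/d)\,\mathbf{I} \):

```python
import numpy as np

d, m = 5, 8
n = d * m

# m copies of the standard orthonormal basis of R^d, glued side by side
C = np.hstack([np.eye(d)] * m)

tight = np.allclose(C @ C.T, (n / d) * np.eye(d))      # C C* = (n/d) I
unit_norm = np.allclose((C ** 2).sum(axis=0), 1.0)     # ||c_k||_2 = 1 for all k
print(tight, unit_norm)  # True True
```

The Fourier (harmonic) frame and other equal-norm tight frames work the same way; this construction is just the simplest one to verify.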
Matrix Bernstein Inequality
The Matrix Bernstein Inequality

Theorem 4. [Oliveira 2010, T 2010] Suppose \( Z = \sum_j W_j \), where
❧ W_1, W_2, ... are independent random matrices with dimension d_1 × d_2,
❧ \( \mathbb{E}\,W_j = 0 \), and
❧ \( \|W_j\| \le R \) almost surely.
Define d := d_1 + d_2, and introduce the matrix variance
\[ \sigma^2 := \max\Big\{ \Big\| \sum\nolimits_j \mathbb{E}(W_j W_j^*) \Big\|,\ \Big\| \sum\nolimits_j \mathbb{E}(W_j^* W_j) \Big\| \Big\}. \]
Then
\[ \mathbb{E}\,\|Z\| \le \sqrt{2\sigma^2 \log d} + \frac{1}{3} R \log d
\quad\text{and}\quad
\mathbb{P}\{\|Z\| \ge t\} \le d \cdot \exp\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right). \]
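As a sanity check (my sketch, with arbitrary choices of matrices and seed), take the bounded zero-mean summands \( W_j = \varepsilon_j B_j \) for fixed matrices \( B_j \); then \( \mathbb{E}(W_j W_j^*) = B_j B_j^* \), and the Bernstein expectation bound can be compared against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, n = 6, 5, 30
d = d1 + d2

# Fixed matrices B_j; the summands W_j = eps_j * B_j are zero mean and bounded
Bs = [rng.standard_normal((d1, d2)) / np.sqrt(d1 * d2) for _ in range(n)]
R = max(np.linalg.norm(B, 2) for B in Bs)             # ||W_j|| <= ||B_j|| <= R
sigma2 = max(np.linalg.norm(sum(B @ B.T for B in Bs), 2),
             np.linalg.norm(sum(B.T @ B for B in Bs), 2))
bound = np.sqrt(2 * sigma2 * np.log(d)) + R * np.log(d) / 3

norms = []
for _ in range(300):
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(sum(e * B for e, B in zip(eps, Bs)), 2))
print(np.mean(norms) <= bound)
```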
Example: Randomized Matrix Multiplication

Product of two matrices, in captivity:
\[ BC^* = \begin{bmatrix} | & | & | & | & & | \\ b_1 & b_2 & b_3 & b_4 & \cdots & b_n \\ | & | & | & | & & | \end{bmatrix}_{d_1 \times n} \begin{bmatrix} \text{—} & c_1^* & \text{—} \\ \text{—} & c_2^* & \text{—} \\ \text{—} & c_3^* & \text{—} \\ \text{—} & c_4^* & \text{—} \\ & \vdots & \\ \text{—} & c_n^* & \text{—} \end{bmatrix}_{n \times d_2} \]
[Idea] Approximate the multiplication by random sampling.

First reference (?): [Drineas–Mahoney–Kannan 2004]
Some recent work: [Magen–Zouzias 2010], [Magdon-Ismail 2010], [Hsu–Kakade–Zhang 2011]
A Sampling Model for Tutorial Purposes

❧ Assume \( \|b_k\|_2 = 1 \) and \( \|c_k\|_2 = 1 \) for k = 1, 2, ..., n
❧ Construct a random variable W whose value is a d_1 × d_2 matrix:
draw \( K \sim \mathrm{uniform}\{1, 2, \ldots, n\} \) and set \( W = n \cdot b_K c_K^* \)
❧ The random matrix W is an unbiased estimator of the product BC*:
\[ \mathbb{E}\,W = \sum\nolimits_{k=1}^n (n \cdot b_k c_k^*) \cdot \mathbb{P}\{K = k\} = \sum\nolimits_{k=1}^n b_k c_k^* = BC^* \]
❧ Approximate BC* by averaging s independent copies of W:
\[ Z = \frac{1}{s} \sum\nolimits_{j=1}^s W_j \approx BC^* \]
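The sampling estimator above is a few lines of code. This sketch (my illustration; the dimensions, seed, and sample sizes are arbitrary) verifies the unbiasedness identity exactly and shows the averaged estimate approaching BC* as s grows:

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, n = 10, 8, 200

# Unit-norm columns, as the tutorial model assumes
B = rng.standard_normal((d1, n)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((d2, n)); C /= np.linalg.norm(C, axis=0)
P = B @ C.T                          # the exact product BC*

# E W = sum_k (n b_k c_k*) / n = BC*  (unbiasedness, checked exactly)
EW = sum(n * np.outer(B[:, k], C[:, k]) for k in range(n)) / n

def estimate(s):
    """Average s iid copies of W = n * b_K c_K*, K uniform on {0, ..., n-1}."""
    K = rng.integers(n, size=s)
    return sum(n * np.outer(B[:, k], C[:, k]) for k in K) / s

err_small = np.linalg.norm(estimate(50) - P, 2)
err_big = np.linalg.norm(estimate(5000) - P, 2)
print(err_big < err_small)  # more samples, smaller spectral-norm error (typically)
```

The matrix Bernstein inequality is exactly the tool that quantifies how fast the spectral-norm error of Z decays in s.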