CS 498ABD: Algorithms for Big Data, Spring 2019 Approximate Matrix Multiplication Lecture 21 April 11, 2019 Chandra (UIUC) CS498ABD 1 Spring 2019 1 / 34
Matrix data
A lot of data can be viewed as defining a matrix. We have already seen vectors modeling data/signals. More generally we can use tensors too.
There are $n$ data items, and each data item $a_i$ is a vector over some features (say $m$ features).
$A$ is the matrix defined by the $n$ data items. Assuming $a_1, \ldots, a_n$ are the columns, $A$ is an $m \times n$ matrix.
Combinatorial objects such as graphs can also be modeled via matrices.

Numerical Linear Algebra
Basic problems in linear algebra:
Matrix vector product: compute $Ax$
Matrix multiplication: compute $AB$
Linear equations: solve $Ax = b$
Matrix inversion: compute $A^{-1}$
Least squares: solve $\min_x \|Ax - b\|$
Singular value decomposition, eigenvalues, principal component analysis, low-rank approximations, ...
Fundamental in all areas of applied mathematics and engineering. Many applications to statistics and data analysis.

Numerical Linear Algebra
NLA has a vast literature.
In practice, iterative methods that converge to an optimum solution are used. They can take advantage of sparsity in the input data better than exact methods.
Some TCS contributions in the recent past:
randomized NLA for faster algorithms with provable approximation guarantees (sampling, JL based techniques, and others)
revisiting preconditioning methods for Laplacians and beyond
Many powerful applications in theory and practice.

Norms and matrix norms
Definition: A norm $\|\cdot\|$ on a real vector space $V$ is a real-valued function that has three properties:
(i) $\|x\| \ge 0$ for all $x \in V$, and $\|x\| = 0$ implies $x = 0$
(ii) $\|ax\| = |a| \, \|x\|$ for all scalars $a$
(iii) $\|x + y\| \le \|x\| + \|y\|$
Familiar vector norms: $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$
If $A$ is an injective linear transformation, then $\|Ax\|$ is also a norm on the original space.
Norms and metrics: $d(x, y) = \|x - y\|$ is a metric.

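As a small illustration of the $\ell_p$ norms defined above, the following sketch evaluates $\|x\|_p$ on a concrete vector and checks the triangle inequality on one pair of vectors. The function name `p_norm` and the sample vectors are illustrative choices, not part of the lecture.

```python
def p_norm(x, p):
    """Return the l_p norm (sum_i |x_i|^p)^(1/p) of a sequence x, for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3.0, -4.0]
y = [1.0, 2.0]
norm1 = p_norm(x, 1)   # |3| + |-4|
norm2 = p_norm(x, 2)   # sqrt(9 + 16)

# Triangle inequality (property iii), checked on this one pair only.
x_plus_y = [a + b for a, b in zip(x, y)]
triangle_ok = p_norm(x_plus_y, 2) <= p_norm(x, 2) + p_norm(y, 2)
```

A single numerical check like this is only a sanity test; it does not prove property (iii).
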
Matrix norms
Consider the vector space of all matrices $A \in \mathbb{R}^{m \times n}$. What are useful norms over matrices?
Treat the matrix like a vector of dimension $mn$ and apply a vector norm. For instance, the Frobenius norm is $\|A\|_F = (\sum_{i,j} |A_{i,j}|^2)^{1/2}$.
Treat the matrix as a linear operator and see what it does to the norms of the vectors it operates on. The spectral norm is $\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2$.
Schatten $p$-norms, based on the singular values of $A$.
Trace norm, nuclear norm, ...
Norms are related in some cases (different perspectives on the same norm).

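To make the two viewpoints concrete, here is a minimal sketch that computes the Frobenius norm directly from its definition and estimates the spectral norm. Power iteration on $A^T A$ is an assumption made for this example (the lecture does not prescribe a method), and the names `frobenius_norm` and `spectral_norm` are hypothetical.

```python
import math

def frobenius_norm(A):
    """Treat the matrix as one long vector and take its l_2 norm."""
    return math.sqrt(sum(v * v for row in A for v in row))

def spectral_norm(A, iters=200):
    """Estimate sup_{||x||_2 = 1} ||Ax||_2 by power iteration on A^T A.

    Assumes the all-ones start vector is not orthogonal to the top
    singular direction; otherwise the estimate can be too small.
    """
    m, n = len(A), len(A[0])
    x = [1.0] * n
    for _ in range(iters):
        Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
        y = [sum(A[i][j] * Ax[i] for i in range(m)) for j in range(n)]  # A^T (A x)
        norm_y = math.sqrt(sum(v * v for v in y))
        if norm_y == 0.0:
            return 0.0
        x = [v / norm_y for v in y]          # keep x a unit vector
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return math.sqrt(sum(v * v for v in Ax))

A = [[3.0, 0.0], [0.0, 4.0]]
fro = frobenius_norm(A)    # sqrt(9 + 16)
spec = spectral_norm(A)    # largest singular value of this diagonal matrix
```

For this diagonal matrix the two norms differ (5 versus 4), which shows that the "vector" and "operator" views measure different things.
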
Frobenius and Spectral norms
Submultiplicative property:
$\|AB\|_F \le \|A\|_F \|B\|_F$
$\|AB\|_2 \le \|A\|_2 \|B\|_2$

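One way to see the Frobenius inequality is via Cauchy-Schwarz applied entry by entry. The sketch below writes $A_{(i)}$ for the $i$-th row of $A$ and $B^{(h)}$ for the $h$-th column of $B$, with $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$; it is a standard argument given here for reference, not taken from the slides.

```latex
\[
\begin{aligned}
\|AB\|_F^2
  &= \sum_{i=1}^{m} \sum_{h=1}^{p} \langle A_{(i)}, B^{(h)} \rangle^2 \\
  &\le \sum_{i=1}^{m} \sum_{h=1}^{p} \|A_{(i)}\|_2^2 \, \|B^{(h)}\|_2^2 \\
  &= \Big( \sum_{i=1}^{m} \|A_{(i)}\|_2^2 \Big) \Big( \sum_{h=1}^{p} \|B^{(h)}\|_2^2 \Big)
   = \|A\|_F^2 \, \|B\|_F^2 .
\end{aligned}
\]
```
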
Matrix Multiplication
Problem: Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, compute the matrix $AB$.
Standard algorithm based on the definition: $O(mnp)$ time.
Faster algorithms via non-trivial Strassen-like divide and conquer.
Approximation: Compute $D \in \mathbb{R}^{m \times p}$ such that $\|D - AB\|$ is small in some appropriate matrix norm.
Two methods: random sampling, and random projections (fast JL).

Matrix Multiplication
Problem: Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, compute the matrix $AB$.
Notation: $M^{(j)}$ for the $j$'th column of $M$ and $M_{(i)}$ for the $i$'th row of $M$, both interpreted as vectors.
From the textbook definition, with $D = AB$: $D_{i,h} = \langle A_{(i)}, B^{(h)} \rangle = \sum_{k=1}^{n} A_{i,k} B_{k,h}$
Consider $A^T$ as consisting of $m$ column vectors from $\mathbb{R}^n$ and $B$ as $p$ column vectors from $\mathbb{R}^n$. We want to compute all $mp$ inner products of these vectors.

Approximate Matrix Multiplication
Want to approximate $AB$ in the Frobenius norm.
Want $D$ such that $\|D - AB\|_F \le \epsilon \|AB\|_F$, but $\|AB\|_F$ can be $0$.
Instead we will settle for $\|D - AB\|_F \le \epsilon \|A\|_F \|B\|_F$.

Part I: Random Sampling for Approx Matrix Mult

Matrix Multiplication and Outer Products
Alternate definition of matrix multiplication based on outer products: $AB = \sum_{j=1}^{n} A^{(j)} B_{(j)}$
Each $A^{(j)} B_{(j)}$ is an $m \times p$ matrix of rank at most 1.

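The outer-product view can be checked against the textbook definition on a small example. This is a minimal sketch using plain Python lists; the helper names `matmul`, `outer`, and `matmul_by_outer_products` are illustrative.

```python
def matmul(A, B):
    """Textbook product: entry (i, h) is <row i of A, column h of B>."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def outer(u, v):
    """Outer product of a column u (length m) and a row v (length p)."""
    return [[ui * vj for vj in v] for ui in u]

def matmul_by_outer_products(A, B):
    """Sum over j of (column j of A) times (row j of B)."""
    m, n, p = len(A), len(B), len(B[0])
    total = [[0.0] * p for _ in range(m)]
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]
        term = outer(column_j, B[j])          # m x p matrix of rank at most 1
        for i in range(m):
            for h in range(p):
                total[i][h] += term[i][h]
    return total

A = [[1.0, 2.0, 0.0], [3.0, -1.0, 4.0]]        # 2 x 3
B = [[2.0, 1.0], [0.0, 5.0], [-1.0, 3.0]]      # 3 x 2
product = matmul(A, B)
product_outer = matmul_by_outer_products(A, B)
```

Both routines do the same $O(mnp)$ arithmetic; the outer-product form matters because it expresses $AB$ as a sum of $n$ terms that can be sampled.
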
Importance Sampling
$AB = \sum_{j=1}^{n} A^{(j)} B_{(j)}$
Pick a probability distribution over $[n]$: $p_1 + p_2 + \ldots + p_n = 1$.
For $\ell = 1$ to $t$: pick an index $j_\ell \in [n]$ according to the distribution $p$ (independently, with replacement).
Output $C = \frac{1}{t} \sum_{\ell=1}^{t} \frac{1}{p_{j_\ell}} A^{(j_\ell)} B_{(j_\ell)}$
$C = \frac{1}{t} \sum_{\ell} C_\ell$, where $C_\ell = \frac{1}{p_{j_\ell}} A^{(j_\ell)} B_{(j_\ell)}$ and $E[C_\ell] = AB$.
By linearity of expectation: $E[C] = AB$.

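The unbiasedness of a single sample follows directly from the definition of expectation. The short derivation below assumes $p_j > 0$ for every $j$ whose term $A^{(j)} B_{(j)}$ is nonzero, a condition the slides leave implicit.

```latex
\[
\mathbb{E}[C_\ell]
  = \sum_{j=1}^{n} p_j \cdot \frac{1}{p_j} A^{(j)} B_{(j)}
  = \sum_{j=1}^{n} A^{(j)} B_{(j)}
  = AB .
\]
```
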
Importance Sampling
Question: How should we choose $p_1, p_2, \ldots, p_n$?
$p_j$ should correspond to the contribution of $A^{(j)} B_{(j)}$ to $\|AB\|_F$.
Use the spectral norm of $A^{(j)} B_{(j)}$, which is $\|A^{(j)} B_{(j)}\|_2$.
Claim: $\|A^{(j)} B_{(j)}\|_2 = \|A^{(j)}\|_2 \|B_{(j)}\|_2$.
Choose $p_j = \frac{\|A^{(j)}\|_2 \|B_{(j)}\|_2}{\sum_{\ell} \|A^{(\ell)}\|_2 \|B_{(\ell)}\|_2}$
Due to [Drineas-Kannan-Mahoney].

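The claim about the spectral norm of a rank-one matrix can be verified in a few lines. In this sketch, $u = A^{(j)} \in \mathbb{R}^m$ and $v \in \mathbb{R}^p$ is the row $B_{(j)}$ written as a column vector, so that $A^{(j)} B_{(j)} = u v^T$; the symbols $u$ and $v$ are introduced here only for readability.

```latex
\[
\|u v^T x\|_2 = |v^T x| \, \|u\|_2 \le \|v\|_2 \, \|u\|_2
\quad \text{for every } x \text{ with } \|x\|_2 = 1,
\]
\[
\text{with equality at } x = v / \|v\|_2 \ (v \ne 0),
\quad \text{so} \quad
\|u v^T\|_2 = \|u\|_2 \, \|v\|_2 .
\]
```
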
Running time
For all $j$ compute $\|A^{(j)}\|_2$ and $\|B_{(j)}\|_2$. This takes one pass over $A$ and $B$ and allows one to compute $p_1, p_2, \ldots, p_n$.
$C = \frac{1}{t} \sum_{\ell=1}^{t} \frac{1}{p_{j_\ell}} A^{(j_\ell)} B_{(j_\ell)}$
At most $O(tmp + N_A + N_B)$ time, where $N_A$ and $N_B$ are the numbers of non-zeroes in $A$ and $B$.
Full computation takes $O(nmp)$ time.

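The sampling scheme can be sketched end to end as follows. This is a minimal illustration under stated assumptions: dense matrices stored as lists of rows, uniformly random test data, an arbitrary fixed seed, and hypothetical helper names (`sampling_probabilities`, `approx_matmul`); it does not exploit sparsity.

```python
import math
import random

def l2(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))

def frobenius_norm(M):
    return math.sqrt(sum(x * x for row in M for x in row))

def sampling_probabilities(A, B):
    """p_j proportional to ||A^(j)||_2 * ||B_(j)||_2 (column j of A, row j of B)."""
    weights = [l2([row[j] for row in A]) * l2(B[j]) for j in range(len(B))]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("every outer product is zero, so AB = 0")
    return [w / total for w in weights]

def approx_matmul(A, B, t, rng):
    """Average of t sampled outer products, each rescaled by 1 / p_j."""
    m, n, p = len(A), len(B), len(B[0])
    probs = sampling_probabilities(A, B)
    # t independent draws from [n], with replacement, according to probs.
    samples = rng.choices(range(n), weights=probs, k=t)
    C = [[0.0] * p for _ in range(m)]
    for j in samples:
        factor = 1.0 / (t * probs[j])
        for i in range(m):
            a = A[i][j] * factor
            for h in range(p):
                C[i][h] += a * B[j][h]      # O(mp) work per sample
    return C

rng = random.Random(0)                      # fixed seed, arbitrary choice
m, n, p, t = 5, 3000, 4, 300
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
B = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
exact = [[sum(A[i][k] * B[k][h] for k in range(n)) for h in range(p)] for i in range(m)]

C = approx_matmul(A, B, t, rng)
diff = [[C[i][h] - exact[i][h] for h in range(p)] for i in range(m)]
relative_error = frobenius_norm(diff) / (frobenius_norm(A) * frobenius_norm(B))
```

Here `relative_error` is $\|C - AB\|_F / (\|A\|_F \|B\|_F)$, the quantity the lecture bounds by $\epsilon$. With $t = 300$ samples out of $n = 3000$, the estimate uses a tenth of the outer products.
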
Analysis of approximation
Want to analyse $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F]$.
Using Markov: $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le \frac{E[\|C - AB\|_F^2]}{\epsilon^2 \|A\|_F^2 \|B\|_F^2}$
Lemma: $E[\|C - AB\|_F^2] \le \frac{1}{t} \left( \sum_{j=1}^{n} \|A^{(j)}\|_2 \|B_{(j)}\|_2 \right)^2 - \frac{1}{t} \|AB\|_F^2$

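A possible route to the lemma, sketched here under the assumption that all $p_j > 0$ and that the $p_j$ are the norm-based probabilities chosen earlier: the samples $C_\ell$ are independent with mean $AB$, so cross terms vanish, and the Frobenius norm of a rank-one matrix $A^{(j)} B_{(j)}$ equals $\|A^{(j)}\|_2 \|B_{(j)}\|_2$. This derivation is not shown on the slides.

```latex
\[
\mathbb{E}\,\|C - AB\|_F^2
  = \frac{1}{t}\, \mathbb{E}\,\|C_1 - AB\|_F^2
  = \frac{1}{t} \Big( \mathbb{E}\,\|C_1\|_F^2 - \|AB\|_F^2 \Big),
\]
\[
\mathbb{E}\,\|C_1\|_F^2
  = \sum_{j=1}^{n} p_j \, \frac{\|A^{(j)}\|_2^2 \, \|B_{(j)}\|_2^2}{p_j^2}
  = \sum_{j=1}^{n} \frac{\|A^{(j)}\|_2^2 \, \|B_{(j)}\|_2^2}{p_j}
  = \Big( \sum_{j=1}^{n} \|A^{(j)}\|_2 \, \|B_{(j)}\|_2 \Big)^2 .
\]
```
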
Analysis continued
$\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le \frac{E[\|C - AB\|_F^2]}{\epsilon^2 \|A\|_F^2 \|B\|_F^2}$
$E[\|C - AB\|_F^2] \le \frac{1}{t} \left( \sum_{j=1}^{n} \|A^{(j)}\|_2 \|B_{(j)}\|_2 \right)^2 - \frac{1}{t} \|AB\|_F^2 \le \frac{1}{t} \|A\|_F^2 \|B\|_F^2$
Thus, if $t = \frac{1}{\epsilon^2 \delta}$ then $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le \delta$.

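The last inequality and the choice of $t$ can be spelled out as follows. The step uses Cauchy-Schwarz on the sums of column norms of $A$ and row norms of $B$, after dropping the non-positive term $-\frac{1}{t}\|AB\|_F^2$; it is a standard filling-in of the slide's chain, given here as a sketch.

```latex
\[
\Big( \sum_{j=1}^{n} \|A^{(j)}\|_2 \, \|B_{(j)}\|_2 \Big)^2
  \le \Big( \sum_{j=1}^{n} \|A^{(j)}\|_2^2 \Big) \Big( \sum_{j=1}^{n} \|B_{(j)}\|_2^2 \Big)
  = \|A\|_F^2 \, \|B\|_F^2 ,
\]
\[
\Pr\big[ \|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F \big]
  \le \frac{\tfrac{1}{t} \|A\|_F^2 \|B\|_F^2}{\epsilon^2 \|A\|_F^2 \|B\|_F^2}
  = \frac{1}{t \epsilon^2}
  = \delta
  \quad \text{when } t = \frac{1}{\epsilon^2 \delta} .
\]
```
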
Median trick
Recall that we used the median trick to improve the dependence on $\delta$ from $1/\delta$ to $\log(1/\delta)$.
If $t = \frac{3}{\epsilon^2}$ then $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le 1/3$.
Repeat independently to obtain $C_1, C_2, \ldots, C_r$ where $r = \Theta(\log(1/\delta))$.
By Chernoff bounds, a majority of the estimators are good.
How do we pick the "median" matrix?

Median trick
If $t = \frac{3}{\epsilon^2}$ then $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le 1/3$.
Repeat independently to obtain $C_1, C_2, \ldots, C_r$ where $r = \Theta(\log(1/\delta))$.
For each $1 \le i \le r$ compute $\rho_i = |\{ j \mid j \ne i, \ \|C_i - C_j\|_F \le 2\epsilon \|A\|_F \|B\|_F \}|$
Output $C_s$ such that $\rho_s \ge r/2$ [Clarkson-Woodruff].

Median trick
For each $1 \le i \le r$ compute $\rho_i = |\{ j \mid j \ne i, \ \|C_i - C_j\|_F \le 2\epsilon \|A\|_F \|B\|_F \}|$
Output $C_s$ such that $\rho_s \ge r/2$.
Correctness follows from the triangle inequality:
$\|C_i - C_j\|_F \le \|C_i - AB\|_F + \|C_j - AB\|_F$ and $\|C_i - C_j\|_F \ge \|C_i - AB\|_F - \|C_j - AB\|_F$.

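The selection rule can be sketched as below. The function name `select_estimate`, the `radius` parameter (standing for $\epsilon \|A\|_F \|B\|_F$), and the toy $1 \times 1$ estimates are assumptions made for illustration; the sketch returns the first estimate that passes the test.

```python
import math

def frobenius_distance(X, Y):
    return math.sqrt(sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)))

def select_estimate(estimates, radius):
    """Return the first C_s with at least r/2 other estimates within 2 * radius.

    `radius` stands for epsilon * ||A||_F * ||B||_F. Returns None when no
    estimate qualifies, which can happen if too few estimates are good.
    """
    r = len(estimates)
    for i, C_i in enumerate(estimates):
        rho = sum(
            1
            for j, C_j in enumerate(estimates)
            if j != i and frobenius_distance(C_i, C_j) <= 2 * radius
        )
        if rho >= r / 2:
            return C_i
    return None

# Toy data: the true product is taken to be the 1 x 1 matrix [[10]], radius = 1.
# Four estimates are within distance 1 of it and two are far off.
estimates = [[[50.0]], [[10.4]], [[9.5]], [[-30.0]], [[10.9]], [[9.2]]]
chosen = select_estimate(estimates, radius=1.0)
```

The rule never needs to know $AB$. When a clear majority of the estimates are within `radius` of $AB$, any selected $C_s$ is within $2 \cdot$ `radius` of at least one good estimate, so the triangle inequality places it within $3 \cdot$ `radius` of $AB$.
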