Matrix Multiplication
Rasmus Pagh, IT University of Copenhagen
ITCS, January 10, 2012
Outline
• Algorithm and analysis
• Related work
• Case study: Correlations
• Open problems
Informal problem statement
• Input: n-by-n matrices A and B, parameter b.
• Output: Approximation of AB that is good if AB is dominated by its b largest entries ("compressible").
Basic algorithm
1. Take hash functions s1, s2 : [n] → {-1, 1} and h1, h2 : [n] → [b].
2. Compute the polynomial
$$\sum_i c_i x^i \;=\; \sum_{k=1}^{n} \left( \sum_{i=1}^{n} A_{ik}\, s_1(i)\, x^{h_1(i)} \right) \left( \sum_{j=1}^{n} B_{kj}\, s_2(j)\, x^{h_2(j)} \right).$$
3. Extract the unbiased estimator
$$(AB)_{ij} \approx s_1(i)\, s_2(j)\, c_{h_1(i)+h_2(j)}.$$
Observation: Each coefficient c_i is a sum of entries of AB with random signs.
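To make the three steps concrete, here is a minimal NumPy sketch (my illustration, not code from the paper; the function names are made up, and for simplicity the hash functions are stored as fully random arrays rather than the pairwise/3-wise independent families the analysis assumes):

    import numpy as np

    rng = np.random.default_rng(0)

    def compressed_product_sketch(A, B, b, s1, s2, h1, h2):
        # One degree-(b-1) polynomial per column of A and matching row of B;
        # multiply each pair with the FFT and sum the products (degree 2b-2).
        c = np.zeros(2 * b - 1)
        for k in range(A.shape[1]):
            p = np.zeros(b)
            q = np.zeros(b)
            np.add.at(p, h1, s1 * A[:, k])   # Count-Sketch of column k of A
            np.add.at(q, h2, s2 * B[k, :])   # Count-Sketch of row k of B
            c += np.fft.irfft(np.fft.rfft(p, 2 * b - 1) *
                              np.fft.rfft(q, 2 * b - 1), 2 * b - 1)
        return c

    def estimate_entry(c, i, j, s1, s2, h1, h2):
        # Step 3: signed coefficient at degree h1(i) + h2(j).
        return s1[i] * s2[j] * c[h1[i] + h2[j]]

    # Tiny usage example (parameters chosen arbitrarily for the demo).
    n, b = 100, 64
    A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    s1, s2 = rng.choice([-1, 1], n), rng.choice([-1, 1], n)
    h1, h2 = rng.integers(0, b, n), rng.integers(0, b, n)
    c = compressed_product_sketch(A, B, b, s1, s2, h1, h2)
    # A single estimate is noisy; the next two slides quantify mean and variance.
    print(estimate_entry(c, 0, 0, s1, s2, h1, h2), (A @ B)[0, 0])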
Why unbiased?
Lemma: If s1 and s2 are pairwise independent,
$$E[s_1(i_1)\, s_1(i_2)\, s_2(j_1)\, s_2(j_2)] = \begin{cases} 1 & \text{if } i_1 = i_2 \text{ and } j_1 = j_2, \\ 0 & \text{otherwise.} \end{cases}$$
Using the lemma, the expected value of $s_1(i)\, s_2(j) \sum_t c_t x^t$ is:
$$E\!\left[ s_1(i)\, s_2(j) \sum_{k=1}^{n} \left( \sum_{i'=1}^{n} A_{i'k}\, s_1(i')\, x^{h_1(i')} \right) \left( \sum_{j'=1}^{n} B_{kj'}\, s_2(j')\, x^{h_2(j')} \right) \right]$$
$$= \sum_{k=1}^{n} A_{ik}\, s_1(i)^2\, x^{h_1(i)}\, s_2(j)^2\, B_{kj}\, x^{h_2(j)} \;=\; (AB)_{ij}\, x^{h_1(i)+h_2(j)}.$$
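A quick empirical sanity check of unbiasedness (a demo of mine, reusing the functions and rng from the sketch above): average the estimator for one fixed entry over many independent draws of the hash functions.

    n, b, trials = 30, 32, 2000
    A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    i, j = 3, 7
    samples = np.empty(trials)
    for t in range(trials):
        s1, s2 = rng.choice([-1, 1], n), rng.choice([-1, 1], n)
        h1, h2 = rng.integers(0, b, n), rng.integers(0, b, n)
        c = compressed_product_sketch(A, B, b, s1, s2, h1, h2)
        samples[t] = estimate_entry(c, i, j, s1, s2, h1, h2)
    # The sample mean should agree with the exact entry up to sampling error.
    print(samples.mean(), (A @ B)[i, j])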
What is the variance?
• Consider the "noise" in the estimator of (AB)_{ij} caused by (AB)_{i'j'}:
$$X_{i'j'} = \begin{cases} s_1(i')\, s_2(j')\, (AB)_{i'j'} & \text{if } h_1(i) + h_2(j) = h_1(i') + h_2(j'), \\ 0 & \text{otherwise.} \end{cases}$$
• If h1, h2 are 3-wise independent, these random variables are uncorrelated, so:
$$\mathrm{Var}\!\left( \sum_{i',j'} X_{i'j'} \right) = \sum_{i',j'} \mathrm{Var}(X_{i'j'}) = \sum_{i',j'} E[X_{i'j'}^2] \le \sum_{i',j'} (AB)_{i'j'}^2 / b = \|AB\|_F^2 / b.$$
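Continuing the experiment from the previous snippet, the sample variance of the collected estimates should respect this bound:

    # Sample variance vs. the bound ||AB||_F^2 / b (`samples` from the snippet above).
    print(samples.var(), np.linalg.norm(A @ B, 'fro') ** 2 / b)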
Sparse outputs
• Suppose AB has at most b/3 nonzero entries.
• Then with probability 2/3 there is no noise in a given estimator.
• Repeat O(log n) times and take the median estimate, to get the exact result w.h.p.
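A sketch of this repetition scheme (my code, reusing compressed_product_sketch from above; choosing reps = O(log n) is left to the caller):

    def estimate_median(A, B, b, reps, rng):
        # Median of `reps` independent estimates for every entry of AB.
        n1, n2 = A.shape[0], B.shape[1]
        ests = np.empty((reps, n1, n2))
        for r in range(reps):
            s1, s2 = rng.choice([-1, 1], n1), rng.choice([-1, 1], n2)
            h1, h2 = rng.integers(0, b, n1), rng.integers(0, b, n2)
            c = compressed_product_sketch(A, B, b, s1, s2, h1, h2)
            # Extract all n1*n2 estimates at once via broadcasting:
            # entry (i, j) reads coefficient h1[i] + h2[j], with sign s1[i]*s2[j].
            ests[r] = s1[:, None] * s2[None, :] * c[h1[:, None] + h2[None, :]]
        return np.median(ests, axis=0)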
Time analysis
• Construct 2n degree-b polynomials: O(n² + nb).
• Multiply n pairs of degree-b polynomials, using FFT: O(nb log b).
• Extract estimates: O(n²).
Total time: O(n² + nb log b).
Background
• The polynomial computed is in fact a Count-Sketch [Charikar et al. '04], an early compressed sensing method.
• Polynomial multiplication combines the Count-Sketches of a column vector of A and a row vector of B into a Count-Sketch of their outer product.
• Adding up the outer product sketches gives a sketch for AB.
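This outer-product view can be checked directly (a minimal demo of mine, with a hypothetical helper name, reusing the earlier functions): a direct Count-Sketch of the outer product with bucket map (i, j) → h1(i) + h2(j) and sign s1(i)·s2(j) coincides with one FFT polynomial product.

    def outer_product_sketch(u, v, b, s1, s2, h1, h2):
        # Direct Count-Sketch of the outer product u v^T.
        sk = np.zeros(2 * b - 1)
        np.add.at(sk, h1[:, None] + h2[None, :], np.outer(s1 * u, s2 * v))
        return sk

    n, b = 50, 16
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    s1, s2 = rng.choice([-1, 1], n), rng.choice([-1, 1], n)
    h1, h2 = rng.integers(0, b, n), rng.integers(0, b, n)
    direct = outer_product_sketch(u, v, b, s1, s2, h1, h2)
    via_fft = compressed_product_sketch(u[:, None], v[None, :], b, s1, s2, h1, h2)
    print(np.allclose(direct, via_fft))  # True: the two sketches coincide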
Some related results
• Folklore: Computing AB with b nonzeros in time O(nb) if there are no cancellations.
• Cohen and Lewis '99: For nonnegative matrices, estimate AB with low relative error.
• Iwen and Spencer '09: Computing AB with ≤ b/n nonzeros in each column in time Õ(nb).
• Drineas, Kannan, Mahoney '06; Sarlós '06: Computing AB with low total error in terms of ||A||_F and ||B||_F.
Case study: Correlations
A = (figure: data matrix). Two rows of A are correlated. Which ones?
Sample covariance matrix
AA^T = (figure: heat map)
Sample covariance matrix estimated using compressed matrix multiplication
AA^T ≈ (figure: heat map of the estimate)
Sample covariance matrix estimated using compressed matrix multiplication
f(AA^T) = (figure: heat map after thresholding)
Showing large values not explained by hash collisions.
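An illustrative end-to-end run of this case study (my demo, reusing estimate_median from above; the sizes, the planted pair, and the threshold are all made up, and the exact ‖AA^T‖_F is computed only to set a noise scale for the demo, which a real application would avoid):

    n, d, b = 200, 500, 1 << 14
    A = rng.standard_normal((n, d))
    A[50] = A[120] + 0.1 * rng.standard_normal(d)  # plant one correlated pair of rows
    est = estimate_median(A, A.T, b, reps=5, rng=rng)
    np.fill_diagonal(est, 0)  # the diagonal holds row norms, not correlations
    # Per-entry noise scale from the variance bound: ||AB||_F / sqrt(b).
    noise = np.linalg.norm(A @ A.T, 'fro') / np.sqrt(b)
    print(np.argwhere(np.abs(est) > 4 * noise))  # should report (50, 120) and (120, 50)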
Some open problems
• Can other problems with "sparse solutions" be solved efficiently using compressed sensing techniques?
  - Matrix inversion?
  - Linear systems with a sparse solution?
  - Sparse transitive closure of a graph?
  - Product of > 2 matrices?
Discussion: Combinatorial algorithms
• Compressed MM can be considered "combinatorial".
• Another view: No large hidden constants (in contrast to "algebraic" approaches leading to ω < 2.3727).
• It is interesting to consider which other subclasses of matrix products can be computed in time, say, n^{2+ε}, using algorithms with these properties.
Hidden slide: Extra application
(Figure: comic.) http://xkcd.com/651/