Approximate Matrix Multiplication: Lecture 21, April 11, 2019

CS 498ABD: Algorithms for Big Data, Spring 2019 Approximate Matrix Multiplication Lecture 21 April 11, 2019 Chandra (UIUC) CS498ABD 1 Spring 2019 1 / 34

Matrix data
A lot of data can be viewed as defining a matrix. We have already seen vectors modeling data/signals. More generally we can use tensors too.
There are $n$ data items, and each data item $a_i$ is a vector over some features (say $m$ features).
$A$ is the matrix defined by the $n$ data items. Assuming $a_1, \ldots, a_n$ are the columns, $A$ is an $m \times n$ matrix.
Combinatorial objects such as graphs can also be modeled via matrices.

Numerical Linear Algebra
Basic problems in linear algebra:
Matrix-vector product: compute $Ax$.
Matrix multiplication: compute $AB$.
Linear equations: solve $Ax = b$.
Matrix inversion: compute $A^{-1}$.
Least squares: solve $\min_x \|Ax - b\|$.
Singular value decomposition, eigenvalues, principal component analysis, low-rank approximations, ...
These problems are fundamental in all areas of applied mathematics and engineering, with many applications to statistics and data analysis.

Numerical Linear Algebra
NLA has a vast literature.
In practice, iterative methods that converge to an optimum solution are used. They can take advantage of sparsity in the input data better than exact methods.
Some TCS contributions in the recent past: randomized NLA for faster algorithms with provable approximation guarantees (sampling, JL-based techniques and others); revisiting preconditioning methods for Laplacians and beyond.
Many powerful applications in theory and practice.

Norms and matrix norms
Definition: A norm $\|\cdot\|$ on a real vector space $V$ is a real-valued function with three properties: (i) $\|x\| \ge 0$ for all $x \in V$, and $\|x\| = 0$ implies $x = 0$; (ii) $\|ax\| = |a| \, \|x\|$ for all scalars $a$; (iii) $\|x + y\| \le \|x\| + \|y\|$.
Familiar vector norms: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$ for $p \ge 1$.
If $A$ is an injective linear transformation, then $\|Ax\|$ is also a norm on the original space.
Norms and metrics: $d(x, y) = \|x - y\|$ is a metric.
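As a small illustration of the vector norms in the definition, the sketch below evaluates the $p$-norm of a vector and spot-checks the triangle inequality on one pair of vectors. The function name `p_norm` is a hypothetical helper introduced only for this example.

```python
def p_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p); this is a norm only for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for a norm")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3.0, -4.0]
y = [1.0, 2.0]

print(p_norm(x, 1))  # 7.0
print(p_norm(x, 2))  # 5.0

# Property (iii), the triangle inequality, checked on this one pair only.
x_plus_y = [a + b for a, b in zip(x, y)]
assert p_norm(x_plus_y, 2) <= p_norm(x, 2) + p_norm(y, 2)
```

A single numerical check is of course not a proof; it only makes the three properties concrete.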

Matrix norms
Consider the vector space of all matrices $A \in \mathbb{R}^{m \times n}$. What are useful norms over matrices?
Treat the matrix as a vector of dimension $mn$ and apply a vector norm. For instance, the Frobenius norm is $\|A\|_F = \left(\sum_{i,j} |A_{i,j}|^2\right)^{1/2}$.
Treat the matrix as a linear operator and see what it does to the norms of the vectors it operates on. The spectral norm is $\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2$.
Schatten $p$-norms are based on the singular values of $A$. Trace norm, nuclear norm, ...
Norms are related in some cases (different perspectives on the same norm).
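The two viewpoints on matrix norms can be compared numerically. The sketch below computes the Frobenius norm directly from its definition and estimates the spectral norm by power iteration on $A^T A$. Power iteration is a technique swapped in here for illustration, not something the slides prescribe, and the function names, the all-ones starting vector, and the iteration count are assumptions.

```python
import math


def frobenius_norm(A):
    """Treat A (a list of rows) as one long vector and take its 2-norm."""
    return math.sqrt(sum(v * v for row in A for v in row))


def spectral_norm(A, iters=200):
    """Estimate sup over unit x of ||Ax||_2 by power iteration on A^T A.

    Assumes the all-ones start is not orthogonal to the top right singular
    vector; a random start would be more robust.
    """
    m, n = len(A), len(A[0])
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]  # y = A x
        z = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]  # z = A^T y
        norm_z = math.sqrt(sum(v * v for v in z))
        if norm_z == 0.0:
            return 0.0  # the start vector was mapped to zero; estimate fails
        x = [v / norm_z for v in z]
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(v * v for v in y))


A = [[3.0, 0.0], [0.0, 4.0]]
print(frobenius_norm(A))  # 5.0
print(spectral_norm(A))   # close to 4, the largest singular value
```

For this diagonal example the spectral norm is the largest diagonal entry in absolute value, while the Frobenius norm aggregates all entries, so the spectral norm is never the larger of the two.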

Frobenius and spectral norms
Submultiplicative property: $\|AB\|_F \le \|A\|_F \|B\|_F$ and $\|AB\|_2 \le \|A\|_2 \|B\|_2$.

Matrix Multiplication
Problem: Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, compute the matrix $AB$.
The standard algorithm based on the definition takes $O(mnp)$ time. Faster algorithms use non-trivial Strassen-like divide and conquer.
Approximation: Compute $D \in \mathbb{R}^{m \times p}$ such that $\|D - AB\|$ is small in some appropriate matrix norm.
Two methods: random sampling, and random projections (fast JL).

Matrix Multiplication
Problem: Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, compute the matrix $AB$.
Notation: $M^{(j)}$ denotes the $j$'th column of $M$ and $M_{(i)}$ the $i$'th row of $M$, both interpreted as vectors.
From the textbook definition: $D_{i,h} = \langle A_{(i)}, B^{(h)} \rangle = \sum_{k=1}^{n} A_{i,k} B_{k,h}$.
Consider $A^T$ as consisting of $m$ column vectors from $\mathbb{R}^n$ and $B$ as $p$ column vectors from $\mathbb{R}^n$. We want to compute all $mp$ inner products of these vectors.
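The inner-product view translates directly into code. The following is a minimal sketch that forms the product as all $mp$ inner products between rows of the first matrix and columns of the second. Matrices are plain lists of rows, and the names `inner` and `matmul_inner` are hypothetical.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matmul_inner(A, B):
    """Entry (i, h) is the inner product of row i of A with column h of B.

    A is m x n and B is n x p, so this does m * p inner products of
    length n each, i.e. O(mnp) arithmetic operations.
    """
    cols_B = [list(col) for col in zip(*B)]  # transpose B to get its columns
    return [[inner(row, col) for col in cols_B] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
print(matmul_inner(A, B))  # [[4, 5], [10, 11]]
```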

Approximate Matrix Multiplication
We want to approximate $AB$ in the Frobenius norm.
Ideally we want $D$ such that $\|D - AB\|_F \le \epsilon \|AB\|_F$, but $\|AB\|_F$ can be $0$.
Instead we will settle for $\|D - AB\|_F \le \epsilon \|A\|_F \|B\|_F$.

Part I: Random Sampling for Approximate Matrix Multiplication

Matrix Multiplication and Outer Products
Alternate definition of matrix multiplication based on outer products: $AB = \sum_{j=1}^{n} A^{(j)} B_{(j)}$, where $A^{(j)}$ is the $j$'th column of $A$ and $B_{(j)}$ is the $j$'th row of $B$.
Each $A^{(j)} B_{(j)}$ is an $m \times p$ matrix of rank at most 1.
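The outer-product definition can be sketched the same way: accumulate, over the shared index, the outer product of a column of the first matrix with the matching row of the second. The helper names `outer` and `matmul_outer` are hypothetical, and matrices are lists of rows.

```python
def outer(u, v):
    """Outer product of column vector u (length m) and row vector v (length p)."""
    return [[a * b for b in v] for a in u]


def matmul_outer(A, B):
    """Sum over j of (column j of A) times (row j of B)."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for j in range(n):
        col_j = [A[i][j] for i in range(m)]  # column j of A
        row_j = B[j]                         # row j of B
        R = outer(col_j, row_j)              # an m x p matrix of rank at most 1
        for i in range(m):
            for h in range(p):
                C[i][h] += R[i][h]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [1, 1]]
print(matmul_outer(A, B))  # [[4.0, 5.0], [10.0, 11.0]]
```

The sum has one term per index in the shared dimension, which is what makes it natural to sample only a few of the terms.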

Importance Sampling
$AB = \sum_{j=1}^{n} A^{(j)} B_{(j)}$
1. Pick a probability distribution over $[n]$: $p_1 + p_2 + \ldots + p_n = 1$.
2. For $\ell = 1$ to $t$: pick an index $j_\ell \in [n]$ according to the distribution $p$ (independently, with replacement).
3. Output $C = \frac{1}{t} \sum_{\ell=1}^{t} \frac{1}{p_{j_\ell}} A^{(j_\ell)} B_{(j_\ell)}$.
Write $C = \frac{1}{t} \sum_{\ell} C_\ell$ with $C_\ell = \frac{1}{p_{j_\ell}} A^{(j_\ell)} B_{(j_\ell)}$. Then $E[C_\ell] = \sum_{j=1}^{n} p_j \cdot \frac{1}{p_j} A^{(j)} B_{(j)} = AB$.
By linearity of expectation, $E[C] = AB$.
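The sampling procedure can be sketched as follows for an arbitrary distribution supplied by the caller. The function name `sampled_product` and its argument layout are assumptions made for this example; sampling uses `random.Random.choices`, which draws independently with replacement.

```python
import random


def sampled_product(A, B, probs, t, rng):
    """Estimate AB from t sampled outer products.

    A is m x n and B is n x p (lists of rows). probs[j] is the probability
    of picking index j. Each sampled term (column j of A) * (row j of B)
    is rescaled by 1 / (t * probs[j]) so that the estimate is unbiased.
    """
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    picks = rng.choices(range(n), weights=probs, k=t)
    for j in picks:
        scale = 1.0 / (t * probs[j])
        for i in range(m):
            a = A[i][j] * scale
            if a != 0.0:
                for h in range(p):
                    C[i][h] += a * B[j][h]
    return C


# A case where every rescaled term equals AB, so the estimate has no variance:
A = [[1.0, 2.0]]
B = [[1.0], [1.0]]       # AB = [[3.0]]
probs = [1 / 3, 2 / 3]   # proportional to the size of each term
C = sampled_product(A, B, probs, t=50, rng=random.Random(0))
print(C)  # approximately [[3.0]], whichever indices were drawn
```

In general the estimate fluctuates around the true product, and how much depends on the choice of probabilities.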

Importance Sampling
Question: How should we choose $p_1, p_2, \ldots, p_n$?
$p_j$ should correspond to the contribution of $A^{(j)} B_{(j)}$ to $\|AB\|_F$.
Use the spectral norm of $A^{(j)} B_{(j)}$, which is $\|A^{(j)} B_{(j)}\|_2$.
Claim: $\|A^{(j)} B_{(j)}\|_2 = \|A^{(j)}\|_2 \|B_{(j)}\|_2$.
Choose $p_j = \frac{\|A^{(j)}\|_2 \|B_{(j)}\|_2}{\sum_{\ell} \|A^{(\ell)}\|_2 \|B_{(\ell)}\|_2}$.
Due to [Drineas-Kannan-Mahoney].
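Computing this distribution needs only the column norms of the first matrix and the row norms of the second. A minimal sketch, with the hypothetical name `sampling_probs` and matrices given as lists of rows:

```python
import math


def sampling_probs(A, B):
    """Probabilities proportional to ||column j of A||_2 * ||row j of B||_2."""
    m, n = len(A), len(B)
    col_norms = [math.sqrt(sum(A[i][j] ** 2 for i in range(m))) for j in range(n)]
    row_norms = [math.sqrt(sum(v * v for v in B[j])) for j in range(n)]
    weights = [c * r for c, r in zip(col_norms, row_norms)]
    total = sum(weights)
    if total == 0.0:
        # Every outer product is the zero matrix, so AB = 0 and there is
        # nothing to sample.
        raise ValueError("all column/row norm products are zero")
    return [w / total for w in weights]


A = [[3.0, 1.0],
     [4.0, 0.0]]
B = [[1.0, 0.0],
     [0.0, 2.0]]
# Column norms of A are 5 and 1; row norms of B are 1 and 2.
print(sampling_probs(A, B))  # approximately [5/7, 2/7]
```

Indices whose outer product is zero get probability zero and are never sampled, which is harmless because they contribute nothing to the product.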

Running time
For all $j$, compute $\|A^{(j)}\|_2$ and $\|B_{(j)}\|_2$. This takes one pass over $A$ and $B$ and allows one to compute $p_1, p_2, \ldots, p_n$.
$C = \frac{1}{t} \sum_{\ell=1}^{t} \frac{1}{p_{j_\ell}} A^{(j_\ell)} B_{(j_\ell)}$
Computing $C$ takes at most $O(tmp + N_A + N_B)$ time, where $N_A$ and $N_B$ are the numbers of non-zeroes in $A$ and $B$, and $A$ is $m \times n$ and $B$ is $n \times p$.
The full computation takes $O(nmp)$ time.

Analysis of approximation
We want to analyse $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F]$.
Using Markov: $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le \frac{E[\|C - AB\|_F^2]}{\epsilon^2 \|A\|_F^2 \|B\|_F^2}$.
Lemma: $E[\|C - AB\|_F^2] \le \frac{1}{t} \left( \sum_{j=1}^{n} \|A^{(j)}\|_2 \|B_{(j)}\|_2 \right)^2 - \frac{1}{t} \|AB\|_F^2$.
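A short derivation may help explain where the bound in the lemma comes from. The sketch below assumes the sampling probabilities are proportional to the product of the norm of column $j$ of $A$ and the norm of row $j$ of $B$. The symbols $w_j$ (that product) and $W$ (the sum of all $w_j$) are introduced here for convenience and do not appear in the slides; sums run over indices with $w_j > 0$.

```latex
\begin{align*}
E\big[\|C - AB\|_F^2\big]
  &= \frac{1}{t}\Big(E\big[\|C_1\|_F^2\big] - \|AB\|_F^2\Big)
  && \text{($C_1, \ldots, C_t$ i.i.d.\ with mean $AB$)} \\
E\big[\|C_1\|_F^2\big]
  &= \sum_{j} p_j \cdot \frac{\|A^{(j)} B_{(j)}\|_F^2}{p_j^2}
   = \sum_{j} \frac{w_j^2}{p_j}
  && \text{(rank one: $\|A^{(j)} B_{(j)}\|_F = w_j$)} \\
  &= \sum_{j} w_j \, W = W^2
  && \text{(substituting $p_j = w_j / W$)}
\end{align*}
```

Combining the two lines gives $E[\|C - AB\|_F^2] = \frac{1}{t}\left(W^2 - \|AB\|_F^2\right)$, which matches the right-hand side of the lemma under this assumption on the probabilities.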

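Analysis continued
$\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le \frac{E[\|C - AB\|_F^2]}{\epsilon^2 \|A\|_F^2 \|B\|_F^2}$
$E[\|C - AB\|_F^2] \le \frac{1}{t} \left( \sum_{j=1}^{n} \|A^{(j)}\|_2 \|B_{(j)}\|_2 \right)^2 - \frac{1}{t} \|AB\|_F^2 \le \frac{1}{t} \|A\|_F^2 \|B\|_F^2$.
The last step drops the non-positive term and applies Cauchy-Schwarz, using $\sum_j \|A^{(j)}\|_2^2 = \|A\|_F^2$ and $\sum_j \|B_{(j)}\|_2^2 = \|B\|_F^2$.
Thus, if $t = \frac{1}{\epsilon^2 \delta}$ then $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le \delta$.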
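To get a feel for the sample sizes this bound asks for, the sketch below rounds the required number of samples up to an integer. The function name `samples_for_markov` is hypothetical, and floating-point rounding may shift the result by one for some inputs.

```python
import math


def samples_for_markov(eps, delta):
    """Number of samples t with t >= 1 / (eps^2 * delta), rounded up."""
    if not (eps > 0 and 0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return math.ceil(1.0 / (eps * eps * delta))


print(samples_for_markov(0.5, 0.25))  # 16
print(samples_for_markov(0.5, 0.5))   # 8
```

The count does not depend on the dimensions of the matrices, but it grows linearly in $1/\delta$, which is the dependence the median trick improves.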

Median trick
Recall that we used the median trick to improve the dependence on $\delta$ from $1/\delta$ to $\log(1/\delta)$.
If $t = \frac{3}{\epsilon^2}$ then $\Pr[\|C - AB\|_F \ge \epsilon \|A\|_F \|B\|_F] \le 1/3$.
Repeat independently to obtain $C_1, C_2, \ldots, C_r$ where $r = \Theta(\log(1/\delta))$.
By Chernoff bounds, a majority of the estimators are good. How do we pick the "median" matrix?

Median trick
For each $1 \le i \le r$, compute $\rho_i = |\{ j \mid j \ne i, \ \|C_i - C_j\|_F \le 2\epsilon \|A\|_F \|B\|_F \}|$.
Output $C_s$ such that $\rho_s \ge r/2$. [Clarkson-Woodruff]

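Median trick
Correctness follows from the triangle inequality: $\|C_i - C_j\|_F \le \|C_i - AB\|_F + \|C_j - AB\|_F$ and $\|C_i - C_j\|_F \ge \|C_i - AB\|_F - \|C_j - AB\|_F$.
Call $C_i$ good if $\|C_i - AB\|_F \le \epsilon \|A\|_F \|B\|_F$. Two good estimators are within $2\epsilon \|A\|_F \|B\|_F$ of each other, so if at least $r/2 + 1$ of the estimators are good, every good $C_i$ has $\rho_i \ge r/2$.
Conversely, any $C_s$ with $\rho_s \ge r/2$ is within $2\epsilon \|A\|_F \|B\|_F$ of at least one good estimator, since fewer than $r/2$ estimators are bad. Hence $\|C_s - AB\|_F \le 3\epsilon \|A\|_F \|B\|_F$.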
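The selection rule can be sketched in a few lines: count, for each estimate, how many other estimates lie within the given Frobenius distance, and return the first one that is close to at least half of them. The names `frobenius_dist` and `select_estimate` are hypothetical, and the caller is assumed to pass the radius $2\epsilon \|A\|_F \|B\|_F$.

```python
import math


def frobenius_dist(X, Y):
    """Frobenius norm of X - Y for two matrices of the same shape."""
    return math.sqrt(sum((x - y) ** 2
                         for rx, ry in zip(X, Y)
                         for x, y in zip(rx, ry)))


def select_estimate(estimates, radius):
    """Return the first estimate within `radius` of at least r/2 others.

    Returns None if no estimate qualifies, which can happen when too few
    of the estimates are good.
    """
    r = len(estimates)
    for i, Ci in enumerate(estimates):
        rho = sum(1 for j, Cj in enumerate(estimates)
                  if j != i and frobenius_dist(Ci, Cj) <= radius)
        if rho >= r / 2:
            return Ci
    return None


# Toy 1 x 1 "matrices": four estimates near 1.0 and one far-off outlier.
estimates = [[[50.0]], [[1.0]], [[1.1]], [[0.9]], [[1.05]]]
print(select_estimate(estimates, radius=0.5))  # [[1.0]]
```

This uses about $r^2$ distance computations between $m \times p$ matrices, but it never needs the exact product $AB$, which is the point of the rule.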
