The Input/Output Complexity of Sparse Matrix Multiplication
Rasmus Pagh (1), Morten Stöckel (2)
(1) IT University of Copenhagen, (2) University of Copenhagen
SIAM LA, October 26 2015
Pagh, Stöckel   ITU, DIKU   October 26 2015   1 / 30

Sparse matrix multiplication
  Problem description
Upper bound
  Size estimation
  Partitioning
  Outputting from partitions
  Summary
Lower bound
  Technique used
  Bounding #phases

Sparse matrix multiplication: Problem description
Overview
- Let A and C be matrices over a semiring R with N nonzero entries in total.
- The problem: compute the matrix product $[AC]_{i,j} = \sum_k A_{i,k} C_{k,j}$, which has Z nonzero entries.
- Central result: this can be done in $\tilde{O}\left(\frac{N\sqrt{Z}}{B\sqrt{M}}\right)$ I/Os, which is optimal for most of the parameter space.

Cancellation of elementary products
[Figure: A (n rows, p columns) times C (p rows, q columns) gives AC = A × C (n rows, q columns). The entry $ac_{22}$ is shown as the sum $a_{21} c_{12} + a_{22} c_{22} + \dots + a_{2p} c_{p2}$.]
We say that we have cancellation when two or more summands of $[AC]_{i,j} = \sum_k A_{i,k} C_{k,j}$ are nonzero but the sum is zero. Our algorithm handles such cases.
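
A small illustration of cancellation, as a sketch that assumes integer entries. The helper name `entry_with_summands` and the two matrices are made up for this example and do not come from the talk.

```python
def entry_with_summands(A, C, i, j):
    """Return ([AC]_{i,j}, the nonzero elementary products A[i][k] * C[k][j])."""
    summands = [A[i][k] * C[k][j] for k in range(len(C))]
    nonzero = [s for s in summands if s != 0]
    return sum(summands), nonzero

# Hypothetical 2 x 2 integer matrices chosen so that entry (0, 0) cancels.
A = [[1, 1],
     [0, 2]]
C = [[3, 1],
     [-3, 1]]

value, nonzero = entry_with_summands(A, C, 0, 0)
# Cancellation: at least two nonzero summands whose sum is zero.
is_cancellation = (value == 0 and len(nonzero) >= 2)
```

Here the elementary products 3 and -3 are both nonzero, yet the output entry is zero, so it must not be emitted.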

Motivation
Lots of applications. Some of them:
- Computing determinants and inverses of matrices.
- Bioinformatics.
- Graphs: counting cycles, computing matchings.

The semiring I/O model, 1
- A word is big enough to hold a matrix element plus its coordinates.
- Internal memory holds M words; the disk has infinite size.
- One I/O: transfer B words from disk to internal memory.
- Cost of an algorithm: the number of I/Os used.
- Operations allowed: semiring operations, copy and equality check.

The semiring I/O model, 2
- We make no assumptions about cancellation.
- To produce output: must invoke emit(·) on every nonzero output entry once.
- Matrices are of size U × U.
- $\tilde{O}$ suppresses polylog factors in U and N.

Our results, 1
- Let A and C be U × U matrices over a semiring R with N nonzero input and Z nonzero output entries. There exist algorithms 1 and 2 such that:
  1. Algorithm 1 emits the set of nonzero entries of AC with probability at least $1 - 1/U$, using $\tilde{O}\left(N\sqrt{Z}/(B\sqrt{M})\right)$ I/Os.
  2. Algorithm 2 emits the set of nonzero entries of AC, and uses $O\left(N^2/(MB)\right)$ I/Os.
- Previous best [Amossen-Pagh, '09]: $\tilde{O}\left(N\sqrt{Z}/(B M^{1/8})\right)$ I/Os (Boolean matrices ⟹ no cancellation).

Our results, 2
- Same setting: A and C are U × U matrices over a semiring R with N nonzero input and Z nonzero output entries. Algorithm 1 uses $\tilde{O}\left(N\sqrt{Z}/(B\sqrt{M})\right)$ I/Os and succeeds with probability at least $1 - 1/U$; algorithm 2 uses $O\left(N^2/(MB)\right)$ I/Os.
- Lower bound: there exist matrices that require $\Omega\left(\min\left(\frac{N^2}{MB}, \frac{N\sqrt{Z}}{B\sqrt{M}}\right)\right)$ I/Os to compute all nonzero entries of AC.
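
To get a feel for how the two upper bounds and the lower bound compare, the following sketch evaluates the stated expressions with all constants and polylog factors dropped. The function name `io_bounds` and the parameter values are illustrative assumptions, not part of the talk.

```python
import math

def io_bounds(N, Z, M, B):
    """Evaluate the stated bounds, ignoring constants and polylog factors."""
    algorithm_1 = N * math.sqrt(Z) / (B * math.sqrt(M))  # randomized bound
    algorithm_2 = N * N / (M * B)                        # deterministic bound
    return {
        "algorithm_1": algorithm_1,
        "algorithm_2": algorithm_2,
        "lower_bound": min(algorithm_1, algorithm_2),
    }

# Hypothetical parameters: N = 10^6, Z = 10^8, M = 10^6 words, B = 10^3 words.
bounds = io_bounds(N=10**6, Z=10**8, M=10**6, B=10**3)
```

With these values the $N^2/(MB)$ term is the smaller one. In general it is the smaller term exactly when $N < \sqrt{ZM}$, which follows from comparing the two expressions.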

Upper bound: Size estimation
Output size estimation
Size estimation tool: Given matrices A and C with N nonzero entries, compute an ε-estimate of the number of nonzeroes of each column of AC using $\tilde{O}(\varepsilon^{-3} N/B)$ I/Os.
Fact (Bender et al., '07): For a dense 1 × U vector y and a sparse U × U matrix S we can compute yS in $\tilde{O}(\mathrm{nnz}(S)/B)$ I/Os.

Distinct elements and matrix size
- Distinct elements: given a frequency vector x of size n, where $x_i$ denotes the number of times element i occurs, $F_0 = \sum_i |x_i|^0$.
- Fundamental problem in streaming: estimate $F_0$ without materializing x.
- Observation: the number of distinct elements of AC is nnz(AC).
- Good news: use existing machinery. A matrix F of size $O(\varepsilon^{-3} \log n \log \delta^{-1}) \times n$ exists s.t. Fx gives $F_0$ whp [Flajolet-Martin, '85].
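
The definition of $F_0$ and the observation about nnz(AC) can be checked on a toy example. This sketch computes $F_0$ exactly by materializing x, which is precisely what the streaming setting avoids. It assumes the usual convention $0^0 = 0$, and the matrix `AC` is a made-up example.

```python
def f0(x):
    """F_0 = sum_i |x_i|^0 with the convention 0^0 = 0: the count of nonzero coordinates."""
    return sum(1 for xi in x if xi != 0)

def nnz(matrix):
    """Number of nonzero entries of a dense list-of-lists matrix."""
    return sum(1 for row in matrix for v in row if v != 0)

# Hypothetical 3 x 3 product matrix, flattened into a vector of length U^2.
AC = [[0, 5, 0],
      [2, 0, 0],
      [0, 0, 0]]
x = [v for row in AC for v in row]
distinct = f0(x)
```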

Output estimation
F is $(\varepsilon^{-3} \log \delta^{-1} \log U) \times U$. A and C are U × U.
To get the size estimate we must compute: (F × A) × C
Due to associativity: pick the cheap order.
Analysis: $\varepsilon^{-3} \log \delta^{-1} \log U$ invocations of the dense-vector sparse-matrix black box: $\tilde{O}(\varepsilon^{-3} N/B)$ I/Os.
Note: works with cancellation, contrary to previous size estimation.
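
The choice of multiplication order can be sketched in plain Python. This is an in-memory illustration only, not an I/O-efficient implementation. The sparse format `{(row, col): value}`, the function names, and the random ±1 rows standing in for F are all assumptions made for the example; the actual sketch matrix F is not specified here.

```python
import random

def dvsm(y, S, U):
    """Dense row vector y (length U) times sparse U x U matrix S, stored as {(row, col): value}."""
    out = [0] * U
    for (r, c), v in S.items():  # one pass over the nonzeros of S
        out[c] += y[r] * v
    return out

def sketch_then_multiply(F, A, C, U):
    """(F x A) x C: two dense-vector sparse-matrix products per row of F."""
    return [dvsm(dvsm(f_row, A, U), C, U) for f_row in F]

def multiply_then_sketch(F, A, C, U):
    """F x (A x C): forms the product AC explicitly first (the expensive order)."""
    AC = {}
    for (i, k), a in A.items():
        for (k2, j), c in C.items():
            if k == k2:
                AC[(i, j)] = AC.get((i, j), 0) + a * c
    return [dvsm(f_row, AC, U) for f_row in F]

U = 4
A = {(0, 0): 1, (0, 1): 1, (2, 3): 5}
C = {(0, 2): 3, (1, 2): -3, (3, 0): 2}  # entry (0, 2) of AC cancels
rng = random.Random(0)
F = [[rng.choice((-1, 1)) for _ in range(U)] for _ in range(3)]  # placeholder rows, not the real F
same = sketch_then_multiply(F, A, C, U) == multiply_then_sketch(F, A, C, U)
```

Both orders give the same result by associativity, but the first touches each nonzero of A and C once per row of F and never forms AC.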

Upper bound: Partitioning
Matrix mult partitioning, 1
[Figure: the product A × C.]

Matrix mult partitioning, 2
[Figure: A × C decomposed as a sum of four partial products.]

Partitioning the matrices
- What we want: split the matrices into disjoint colored groups s.t. every color combination has at most M nonzero output entries.
- Problem: can't be done.
- Instead: color the rows of A using c colors. For each of the c groups of rows, do an independent coloring with c colors of the columns of C.

Partitioning the matrices, 2
Overview of how to partition matrices A and C:
1. Pick the number of colors $c = \sqrt{\frac{\mathrm{nnz}(AC) \log U}{M}} + O(1)$.
2. Recurse: split A into $A_1$ and $A_2$ such that $\mathrm{nnz}(A_1 C) \approx \mathrm{nnz}(AC)/2$ and $\mathrm{nnz}(A_2 C) \approx \mathrm{nnz}(AC)/2$.
3. After $\log c + O(1)$ recursive levels we have $O(c)$ disjoint colored groups of rows of A.
4. For each of those groups: repeat the procedure for the columns of C.
5. The key point: $O(c^2)$ problems of size $\mathrm{nnz}(AC)/c^2 = O(M/\log U)$.
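
The choice of c and the resulting subproblem size can be checked numerically. This sketch assumes base-2 logarithms and uses a hypothetical parameter `extra` in place of the unspecified O(1) term; the function names and example values are also assumptions.

```python
import math

def num_colors(nnz_ac, U, M, extra=1):
    """c = sqrt(nnz(AC) * log U / M) + O(1); `extra` stands in for the O(1) term."""
    return math.ceil(math.sqrt(nnz_ac * math.log2(U) / M)) + extra

def subproblem_output_size(nnz_ac, c):
    """Target output size nnz(AC) / c^2 of each of the c^2 subproblems."""
    return nnz_ac / (c * c)

# Hypothetical parameters: nnz(AC) = 2^30, U = 2^16, M = 2^20 words.
c = num_colors(2**30, 2**16, 2**20)
size = subproblem_output_size(2**30, c)
target = 2**20 / math.log2(2**16)  # M / log U
```

Because c is at least $\sqrt{\mathrm{nnz}(AC) \log U / M}$, the ideal subproblem size $\mathrm{nnz}(AC)/c^2$ never exceeds $M/\log U$.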

Getting the correct subproblem size
Say we can do splits of A into $A_1$, $A_2$ s.t.
1. $\mathrm{nnz}(A_1 C) \in \left[(1 - \log^{-1} U)\,\mathrm{nnz}(AC)/2;\ (1 + \log^{-1} U)\,\mathrm{nnz}(AC)/2\right]$.
2. $\mathrm{nnz}(A_2 C) \in \left[(1 - \log^{-1} U)\,\mathrm{nnz}(AC)/2;\ (1 + \log^{-1} U)\,\mathrm{nnz}(AC)/2\right]$.
Assume the biggest possible positive error: after q recursions the problem output size is $\mathrm{nnz}(AC)\left(\frac{1}{2} + \frac{1}{2 \log U}\right)^q$.
Then after $\log c^2 + O(1)$ recursions:
$$\mathrm{nnz}(AC)\left(\frac{1}{2} + \frac{1}{2 \log U}\right)^{\log c^2} \le \mathrm{nnz}(AC)\, 2^{-\log c^2}\, e^{\frac{\log c^2}{\log U}} \le \mathrm{nnz}(AC)\, O(1)/c^2 = O(M/\log U)$$
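
The first inequality in the chain uses $(1 + x)^q \le e^{qx}$ with $x = 1/\log U$. The following sketch checks it numerically for one set of values, assuming base-2 logarithms; the function names and the parameters are illustrative assumptions.

```python
import math

def worst_case_size(nnz_ac, U, q):
    """Output size after q splits if every split errs upward by the factor (1 + 1/log U)."""
    return nnz_ac * (0.5 + 1 / (2 * math.log2(U))) ** q

def exponential_bound(nnz_ac, U, q):
    """The upper bound nnz(AC) * 2^(-q) * e^(q / log U) from the derivation."""
    return nnz_ac * 2.0 ** (-q) * math.exp(q / math.log2(U))

# Hypothetical parameters: nnz(AC) = 2^30, U = 2^16, c = 129 colors.
nnz_ac, U, c = 2**30, 2**16, 129
q = math.ceil(math.log2(c * c))  # number of recursion levels, about log c^2
worst = worst_case_size(nnz_ac, U, q)
bound = exponential_bound(nnz_ac, U, q)
```

As long as q is at most $2 \log U$, the exponential factor is at most $e^2$, which is the O(1) in the last step.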

How to compute the split
How to do relative error $1/\log U$ splits: use the size estimation tool. For any set r of rows we have access to $\hat{z}_i$'s s.t.
$$(1 - \log^{-1} U)\,\mathrm{nnz}\left(\sum_{i \in r} [AC]_{i*}\right) \le \sum_{i \in r} \hat{z}_i \le (1 + \log^{-1} U)\,\mathrm{nnz}\left(\sum_{i \in r} [AC]_{i*}\right).$$
Splitting A into $A_1$ and $A_2$:
1. Let $\hat{Z} = \sum_i \hat{z}_i$.
2. Add rows from A to $A_1$ until $\sum_{i \in A_1} \hat{z}_i \ge \hat{Z}/2$.
3. The row y that overflows $A_1$: compute y × C directly.
4. Add the remaining rows to $A_2$.
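
The four splitting steps can be sketched as a greedy pass over the estimates. This is one reading of the slide, under the assumption that the overflowing row is kept out of both halves and handled on its own; the function name `split_rows` and the example estimates are made up.

```python
def split_rows(z_hat):
    """Greedy split of row indices by estimated output sizes z_hat[i].

    Returns (a1_rows, overflow_row, a2_rows). The overflow row is the one whose
    addition makes the running sum reach Z_hat / 2; it is meant to be
    multiplied with C directly instead of joining either half.
    """
    total = sum(z_hat)                      # step 1: Z_hat
    a1, running = [], 0
    for i, z in enumerate(z_hat):
        if running + z >= total / 2:        # step 3: row i overflows A_1
            return a1, i, list(range(i + 1, len(z_hat)))  # step 4: the rest is A_2
        a1.append(i)                        # step 2: keep filling A_1
        running += z
    return a1, None, []                     # only reached for an empty input

# Hypothetical estimates for five rows; Z_hat = 16, so the threshold is 8.
a1, overflow, a2 = split_rows([4, 1, 3, 6, 2])
```

With this reading, both halves have estimated output size at most $\hat{Z}/2$, at the cost of one extra vector-matrix product per split.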

I/O cost of splitting
I/O cost:
- Initial size estimate: $\tilde{O}(N/B)$.
- Partition A: c dense-vector sparse-matrix (DVSM) products: $\tilde{O}(cN/B)$.
- For the c A-partitions: one size estimate of total $\tilde{O}(N/B)$ and c DVSM of total $\tilde{O}(cN/B)$.
- Total: $\tilde{O}(cN/B) = \tilde{O}\left(\frac{N\sqrt{\mathrm{nnz}(AC)}}{B\sqrt{M}}\right)$ since $c = \sqrt{\frac{\mathrm{nnz}(AC) \log U}{M}}$.

Upper bound: Outputting from partitions
Are we done?
[Figure: the partitioned product of A and C.]