Sparse Matrix Multiplication and Triangle Listing in the Congested Clique Model
Keren Censor-Hillel, Dean Leitersdorf, Elia Turner (Technion)
OPODIS 2018
This project received funding from the European Union's Horizon 2020 Research and Innovation Program under grant agreement no. 755839
Overview
The Congested Clique
● Input graph and overlay network: n nodes in both graphs
● Synchronous; O(log n) bits per message
● All-to-all communication
● Goal: minimize the number of communication rounds
Sparse Algorithms
● Sparse input graphs are common in practice
● Leverage sparsity to reduce runtime
● Congested Clique: restricting to sparse inputs does not decrease the strength of the model
Sparse Algorithms - Our Results
● New load balancing building blocks
● New algorithms for sparse matrix multiplication and triangle listing
● These imply sparse graph algorithms:
○ Triangle and 4-cycle counting
○ APSP
Sparse Matrix Multiplication (Sparse MM)
Previous Work on MM
● Boolean MM [Drucker, Kuhn, Oshman, PODC 2014]
● Ring MM [Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela, PODC 2015]
● Semiring MM [Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela, PODC 2015]
● Rectangular matrices and multiple concurrent instances of MM [Le Gall, DISC 2016]
ω = exponent of sequential MM < 2.372864
Matrix Multiplication (MM)
● Input: square matrices S, T. Output: P = S*T
● Node i holds row i of each matrix
● Example: S = T = adjacency matrix of the input graph
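As a reference point for P = S*T, a minimal sequential sketch (the distributed convention on the slide is that node i holds row i of S and of T; this is not the distributed algorithm):

```python
# Reference point for P = S*T (sequential sketch). In the Congested Clique
# convention, node i would hold row i of S and row i of T.

def mat_mul(S, T):
    """Plain semiring (here: integer +, *) matrix product."""
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Example: S = T = adjacency matrix of a triangle on 3 nodes; P[i][j]
# then counts the length-2 walks from i to j.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
P = mat_mul(A, A)
```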
Sparse MM
● Many beautiful works in the sequential and parallel settings, typically with different runtime measures
● New algorithm: deterministic, with a communication pattern that adapts dynamically to the sparsity structure
Sparse MM
(figure: matrices S, T, P with non-zero and zero entries marked)
● Implicit communication of zeros!
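One way to picture the implicit treatment of zeros is a dictionary-based sparse product, where zero entries are never stored and hence, by analogy, never communicated (an illustrative sequential sketch, not the paper's algorithm):

```python
# Illustrative sketch: rows stored as {column: value} dicts, so zero
# entries are never represented, and work is proportional to the number
# of non-zero partial products only.

def sparse_mat_mul(S_rows, T_rows):
    """S_rows[i], T_rows[k] map column index -> non-zero value."""
    P_rows = [dict() for _ in S_rows]
    for i, row in enumerate(S_rows):
        for k, s in row.items():              # only non-zeros of S
            for j, t in T_rows[k].items():    # only non-zeros of T
                P_rows[i][j] = P_rows[i].get(j, 0) + s * t
    return P_rows

S = [{1: 2}, {}, {0: 1, 2: 3}]
T = [{2: 5}, {0: 7}, {1: 1}]
```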
Sparse MM - Our Main Result
● P = S*T
➔ nz(A) = number of non-zero elements in A
Let's see it!
Semiring MM [Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela, PODC 2015]
● 3 parts:
1. Distribute matrix entries
2. Locally compute partial products
3. Sum partial products
● Our novelty: performing parts 1 and 3 in a sparsity-aware manner
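The three-part structure can be mimicked sequentially, with the "distribution" modeled by assigning inner indices to hypothetical nodes (a sketch of the structure only, not the actual communication pattern):

```python
# Sequential stand-in for the three-part semiring MM:
# 1. distribute entries, 2. compute partial products, 3. sum them.

def semiring_mm_three_phase(S, T, num_nodes):
    n = len(S)
    partials = []
    for p in range(num_nodes):
        # Part 1 (modeled): node p is responsible for inner indices
        # k = p, p + num_nodes, p + 2*num_nodes, ...
        # Part 2: each node computes its partial products locally.
        part = [[sum(S[i][k] * T[k][j] for k in range(p, n, num_nodes))
                 for j in range(n)] for i in range(n)]
        partials.append(part)
    # Part 3: sum the partial products.
    return [[sum(part[i][j] for part in partials) for j in range(n)]
            for i in range(n)]

S = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
T = [[0, 1, 1], [2, 0, 1], [1, 1, 0]]
```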
The Challenges
(figure: matrices S, T, P with non-zero and zero entries marked; the non-zeros are distributed unevenly)
Sparse MM: Two Challenges
● [Lenzen, 2013]: runtime depends on the maximum number of messages any node sends or receives
● Receiving challenge: every node should receive roughly the same number of messages
● Sending challenge: every node should send roughly the same number of messages
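The role of the per-node maximum can be captured by a tiny helper, assuming [Lenzen, 2013]-style routing delivers any instance with per-node send/receive load at most L in O(ceil(L/n)) rounds:

```python
# Sketch of the load measure: with all-to-all links, a routing instance in
# which every node sends and receives at most L messages can be scheduled
# in O(ceil(L / n)) rounds, so the maximum per-node load governs runtime.

def round_bound(send_counts, recv_counts, n):
    L = max(max(send_counts), max(recv_counts))
    return -(-L // n)  # ceil(L / n)
```

This is why one overloaded node is as bad as many: balancing both directions of traffic is what the two challenges address.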
Step 1: (a, b)-split
● Square MM → several instances of rectangular MM
Step 1: (a, b)-split - Detailed Example
(figure: S split into a = 3 strips and T into b = 4 strips, inducing a grid of blocks in P)
Step 1: (a, b)-split - Finally:
● There are a*b rectangular MMs
● Assign n/(ab) nodes to compute each
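A hypothetical sequential stand-in for the split, assuming S is cut into a row strips and T into b column strips, so that P decomposes into a*b independent rectangular products (one per block of P):

```python
# (a, b)-split sketch: block (x, y) of P is the rectangular product of
# row strip x of S with column strip y of T; there are a*b such products.

def ab_split_mm(S, T, a, b):
    n = len(S)
    P = [[0] * n for _ in range(n)]
    for x in range(a):                  # row strip of S
        for y in range(b):              # column strip of T
            # In the algorithm, n/(a*b) nodes would compute this block.
            for i in range(x * n // a, (x + 1) * n // a):
                for j in range(y * n // b, (y + 1) * n // b):
                    P[i][j] = sum(S[i][k] * T[k][j] for k in range(n))
    return P

n = 12
S = [[(i + j) % 3 for j in range(n)] for i in range(n)]
T = [[(i * j) % 2 for j in range(n)] for i in range(n)]
```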
Step 2: Receiving Challenge
● Step 2.1: Make the rectangular MMs roughly similar (density-wise)
● Step 2.2: Sparsity awareness within each rectangular MM
Step 2.1: Similar Rectangular MM
Observation:
● OK to reorder S-rows and T-cols
● Reorder to achieve similar rectangular MMs
● O(1) rounds in the Congested Clique
● Deterministic
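Why reordering S-rows and T-cols is safe can be checked in a few lines: the permutations carry over to the rows and columns of P and can be inverted at the end (a sequential illustration, not the O(1)-round distributed reordering):

```python
# Permuting the rows of S permutes the rows of P identically, and
# permuting the columns of T permutes the columns of P identically,
# so the true P is recovered by inverting the permutations.

def mat_mul(S, T):
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def permute_rows(M, perm):
    return [M[p] for p in perm]

def permute_cols(M, perm):
    return [[row[p] for p in perm] for row in M]

S = [[1, 2], [3, 4]]
T = [[5, 6], [7, 8]]
```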
Step 2.2: Sparsity Aware Rectangular MM
● How do we split this between the n/(ab) nodes?
Step 2.2: Sparsity Aware Rectangular MM
Observation:
● Swapping two S-cols and the two respective T-rows cancels out
Step 2.2: Sparsity Aware Rectangular MM
● Phase 1: Count non-zeros in S-cols and T-rows
● Phase 2: Sum the two counts per inner index (example sums: 0 0 0 6 6 6 0 0 0)
● Phase 3: Reorder so the sums are spread evenly (example after reordering: 6 0 0 6 0 0 6 0 0)
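A sequential sketch of the cancellation and of the count-sum-reorder idea; the sort-by-count criterion here is an illustrative simplification, not the paper's exact reordering:

```python
# Swapping S-columns together with the matching T-rows only re-sums the
# inner index k in a different order, so S*T is unchanged; this is what
# makes the balancing reorder legal.

def mat_mul(S, T):
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def balance_inner_index(S, T):
    n = len(S)
    # Phase 1: count non-zeros in each S-column and each T-row.
    # Phase 2: sum the two counts for every inner index k.
    count = [sum(1 for i in range(n) if S[i][k] != 0) +
             sum(1 for v in T[k] if v != 0) for k in range(n)]
    # Phase 3: reorder the inner index (illustration: sort by count).
    perm = sorted(range(n), key=lambda k: count[k])
    S2 = [[S[i][perm[k]] for k in range(n)] for i in range(n)]
    T2 = [T[perm[k]] for k in range(n)]
    return S2, T2

S = [[0, 0, 5], [0, 0, 7], [1, 0, 0]]
T = [[0, 2, 0], [0, 0, 0], [3, 0, 4]]
```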
Step 2: Receiving Challenge - SOLVED!
Step 2.2: Sparsity Aware Rectangular MM - Notice!
● a*b different rectangular MMs, with n/(ab) nodes in each
● The inner reorderings are local knowledge of those n/(ab) nodes
● Making them global knowledge would be too expensive!
● This will be problematic soon
Step 3: Sending Challenge
● Need every node to send roughly the same number of messages
● Solution: balancing message duplication
Dense Nodes Will Be Slow
(figure: non-zero and zero entries of T; a dense node must send many messages)
Message Duplication
● S, T duplicated b, a times respectively
Step 3: Sending Challenge
● Key Point 1: Duplication is expensive!
● Key Point 2: It is very easily load balanced - sparse nodes help dense nodes
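A back-of-the-envelope sketch of the duplication volume, with illustrative numbers not taken from the paper: every non-zero of S is needed by b of the rectangular MMs and every non-zero of T by a of them, and the balancing spreads this volume evenly over the n nodes.

```python
# Duplication volume sketch: total message volume is b*nz(S) + a*nz(T);
# with sparse nodes helping dense ones, each of the n nodes sends roughly
# a 1/n fraction of it.

def per_node_send_load(nz_S, nz_T, a, b, n):
    total = b * nz_S + a * nz_T
    return -(-total // n)  # ceil(total / n)

# Example with illustrative numbers: n = 100 nodes,
# nz(S) = nz(T) = 10_000, a = 4, b = 5.
```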
Step 4: Knowledge Challenge
● Problem: the inner reorderings are local knowledge, so senders do not know whom to send messages to
● We show an O(1)-round solution:
○ Requires a specific redistribution of elements in the sending challenge - receivers know who needs to message them
○ Nodes request their messages
Summary
● For any (a, b), total runtime:
● Optimal (a, b):
● Resulting overall runtime:
Sparse MM - Our Main Result
● P = S*T
➔ nz(A) = number of non-zero elements in A
Sparse Triangle Listing
Triangle Listing
● Every triangle must be known to at least one node
Previous Work on Triangle Listing
● [Dolev, Lenzen, Peled, DISC 2012]
● [Izumi, Le Gall, PODC 2017], [Pandurangan, Robinson, Scquizzato, SPAA 2018]
● w.h.p. [Pandurangan, Robinson, Scquizzato, SPAA 2018]
● Our Result: deterministic
Our Result
● Triangle = path of length 1 (v → u) + path of length 2 (u → v)
● Our runtime:
● Notice: this is faster than the time for squaring!
○ No need to compute all of A²
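The characterization above can be turned into a small listing routine: each triangle is an edge plus a length-2 path (a common neighbor) closing it (a sequential sketch; the distributed algorithm splits this work across nodes and never materializes all of A²):

```python
# A triangle through u and v is the edge {u, v} plus a common neighbor w
# (a length-2 path between them). Listing common neighbors of every edge
# therefore lists each triangle at least once.

def list_triangles(adj):
    """adj: node -> set of neighbors, undirected graph."""
    triangles = set()
    for u in adj:
        for v in adj[u]:
            if u < v:                        # visit each edge once
                for w in adj[u] & adj[v]:    # common neighbors = 2-paths
                    triangles.add(tuple(sorted((u, v, w))))
    return triangles

# Example: a 4-cycle 0-1-2-3-0 with chord 0-2 has exactly two triangles.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```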
Conclusion
Conclusion
Our Work
● New load balancing building blocks in the Congested Clique
● New algorithms for Sparse MM and Triangle Listing
Open Questions
● Can the complexity of Sparse MM be improved in the clique? Sparse Ring MM?
● Lower bound for Sparse Triangle Listing?
● Using these algorithms/techniques in other models