Edge-based graph partitioning
Rob H. Bisseling
Mathematical Institute, Utrecht University
Joint work with Daniël M. Pelt (CWI Amsterdam)
Workshop Sparse Days at St. Girons, June 30, 2015
Outline
◮ Introduction
 • 2D matrix partitioning
◮ Heuristic medium-grain partitioning
 • Medium-grain method
 • Results
◮ Optimal sparse matrix bipartitioning
 • Branch-and-Bound method
 • Results
 • Pretty pictures
◮ Conclusion and future work
Parallel sparse matrix-vector multiplication
◮ Parallel multiplication of a 5 × 5 sparse matrix $A$ and a dense input vector $\vec{v}$, giving $\vec{u} = A\vec{v}$
◮ 2D matrix distribution over 2 processors
◮ 4 data words of communication
◮ Perfect load balance: 8 nonzeros per processor
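As a minimal sketch (not taken from the talk), the communication volume of a distribution can be counted with the usual λ − 1 metric: a column whose nonzeros live on λ processors costs λ − 1 words when the input vector component is sent out, and a row on λ processors costs λ − 1 words when partial sums are gathered. This assumes each vector component is owned by one of the processors holding nonzeros in its row or column; the helper name `spmv_cost` and the toy matrix are hypothetical.

```python
from collections import defaultdict


def spmv_cost(owner):
    """Return (communication volume, nonzeros per processor) for a 2D distribution.

    owner maps a nonzero position (i, j) to the processor holding a_ij.
    Assumes v_j is owned by a processor holding a nonzero of column j and
    u_i by a processor holding a nonzero of row i (hypothetical helper).
    """
    row_procs = defaultdict(set)  # processors that hold part of row i
    col_procs = defaultdict(set)  # processors that hold part of column j
    load = defaultdict(int)
    for (i, j), p in owner.items():
        row_procs[i].add(p)
        col_procs[j].add(p)
        load[p] += 1
    # Fan-out: v_j is sent to the other processors of column j.
    fan_out = sum(len(s) - 1 for s in col_procs.values())
    # Fan-in: partial sums for u_i are sent by the other processors of row i.
    fan_in = sum(len(s) - 1 for s in row_procs.values())
    return fan_out + fan_in, dict(load)


# Toy 3 x 3 example: columns 0 and 1 are split between P(0) and P(1).
owner = {(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 1, (2, 2): 1}
print(spmv_cost(owner))  # (2, {0: 2, 1: 3})
```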
Sparse matrix partitioning and graph partitioning
◮ A sparse matrix is the adjacency matrix of a sparse graph: $a_{ij} \neq 0 \Leftrightarrow (i, j) \in E$
◮ Partitioning the nonzeros of a matrix is the same as partitioning the edges of a graph.
◮ 2D partitioning splits both rows and columns.
◮ Partitioning for parallel sparse matrix-vector multiplication (SpMV) can be used in Google PageRank computation.
◮ Partitioning for SpMV also gives a good partitioning for many graph computations.
Advantage of 2D partitioning
◮ We can use both dimensions of the matrix to reduce SpMV communication.
◮ For a √p × √p block distribution, each matrix row or column is distributed over at most √p processors, instead of up to p processors for a 1D distribution.
◮ Relatively dense rows and columns can be split, so they do not cause load imbalance or memory overflow.
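The √p bound can be checked on the worst case, a dense n × n matrix. The sketch below compares a 1D row-block distribution with a √p × √p block distribution; the helper name and the choice of a dense matrix are illustrative assumptions.

```python
import math


def max_procs_per_line(n, p, scheme):
    """Largest number of processors sharing a row or column of a dense n x n matrix.

    scheme is "1d" (blocks of consecutive rows) or "2d" (sqrt(p) x sqrt(p) grid).
    """
    q = math.isqrt(p)
    if scheme == "2d" and q * q != p:
        raise ValueError("a 2D block distribution needs p to be a perfect square")

    def owner(i, j):
        if scheme == "1d":
            return i * p // n                    # processor of the row block of i
        return (i * q // n) * q + (j * q // n)   # (row block, column block) on the grid

    rows = [set() for _ in range(n)]
    cols = [set() for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = owner(i, j)
            rows[i].add(s)
            cols[j].add(s)
    return max(len(s) for s in rows + cols)


print(max_procs_per_line(16, 16, "1d"))  # 16: every column touches all processors
print(max_procs_per_line(16, 16, "2d"))  # 4 = sqrt(16)
```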
2D (edge-based) parallel matching

Name                   SpMV 1D   SpMV 2D   Matching 1D   Matching 2D
rw9 (af_shell10)           113       105           169           150
rw10 (boneS10)             150       145           228           189
rw11 (Stanford)            340       141           479           234
rw12 (gupta3)              710        44         1,305            61
rw13 (St Berk.)            716       448         1,152           812
rw14 (F1)                  139       130           148           139
sw1 (small world)        1,007       417         2,111           303
sw2                      1,957       829         3,999           563
sw3                      2,017       832         4,255           528
er1 (random)             1,856     1,133         1,788         1,157
er2                      3,451     1,841         3,721         1,635
er3                      5,476     2,569         6,350         1,990

Communication volume in parallel sparse matrix–vector multiplication and Karp–Sipser matching. Source: Patwary, Bisseling, Manne (HLPP 2010).
Medium-grain partitioning method
◮ The m × n matrix $A$ is split by a simple method into $A = A_r + A_c$.
◮ The $(m+n) \times (m+n)$ matrix $B$ is formed and partitioned by column using a 1D method:
$$B = \begin{bmatrix} I_n & (A_r)^T \\ A_c & I_m \end{bmatrix}$$
"A medium-grain method for fast 2D bipartitioning of sparse matrices", by Daniël M. Pelt and Rob H. Bisseling, Proc. IPDPS 2014, IEEE Press, pp. 529–539.
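A minimal sketch of the construction, assuming nonzeros are stored as 0-based (i, j) coordinate pairs. It builds the nonzero pattern of B, maps a column bipartitioning of B back to the nonzeros of A (a nonzero of A_r follows column n + i of B, a nonzero of A_c follows column j), and compares the 1D volume of B with the 2D volume of A on a toy example. All function names and the toy matrix are hypothetical; this is not the Mondriaan implementation.

```python
def medium_grain_matrix(m, n, a_r, a_c):
    """Nonzero pattern of B = [[I_n, A_r^T], [A_c, I_m]], of size (n + m) x (n + m)."""
    b = {(k, k) for k in range(n + m)}   # the two identity blocks
    b |= {(j, n + i) for i, j in a_r}    # (A_r)^T: a_ij -> row j, column n + i
    b |= {(n + i, j) for i, j in a_c}    # A_c:     a_ij -> row n + i, column j
    return b


def lambda_volume(parts_of_line):
    """Sum of (number of parts - 1) over all lines (rows or columns)."""
    return sum(len(s) - 1 for s in parts_of_line.values())


def volume_2d(part_a):
    """Volume of a 2D bipartitioning of A; part_a maps (i, j) -> 0 or 1."""
    rows, cols = {}, {}
    for (i, j), p in part_a.items():
        rows.setdefault(i, set()).add(p)
        cols.setdefault(j, set()).add(p)
    return lambda_volume(rows) + lambda_volume(cols)


def volume_1d_columns(b, col_part):
    """Volume of a 1D column bipartitioning of B: rows of B that span both parts."""
    rows = {}
    for k, l in b:
        rows.setdefault(k, set()).add(col_part[l])
    return lambda_volume(rows)


def map_back(n, a_r, a_c, col_part):
    """Each nonzero of A inherits the part of the column of B that contains it."""
    part_a = {(i, j): col_part[n + i] for i, j in a_r}
    part_a.update({(i, j): col_part[j] for i, j in a_c})
    return part_a


# Toy 3 x 3 "arrow" matrix with a dense row 0 and a dense column 0.
a_r = [(0, 0), (1, 0), (2, 0)]
a_c = [(0, 1), (0, 2)]
b = medium_grain_matrix(3, 3, a_r, a_c)
col_part = [0, 0, 1, 0, 0, 1]  # a bipartitioning of the 6 columns of B
part_a = map_back(3, a_r, a_c, col_part)
print(volume_1d_columns(b, col_part), volume_2d(part_a))  # 2 2
```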
Simple split
34 × 34 matrix karate, nz(A) = 156 (Zachary's karate club, 1977), V = 8
◮ Matrix nonzero $a_{ij}$ is assigned to $A_c$ if $\mathrm{nz}_c(j) < \mathrm{nz}_r(i)$, and to $A_r$ otherwise, where $\mathrm{nz}_r(i)$ and $\mathrm{nz}_c(j)$ are the numbers of nonzeros in row $i$ and column $j$.
◮ A column with fewer nonzeros has a better chance of staying together in a good partitioning.
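The split rule translates directly into code: $a_{ij}$ goes to $A_c$ when column $j$ has fewer nonzeros than row $i$, and to $A_r$ otherwise (ties included). A sketch, assuming nonzeros are given as (i, j) pairs; the function name is hypothetical.

```python
from collections import Counter


def simple_split(nonzeros):
    """Split the nonzeros of A into A_r and A_c with the simple medium-grain rule."""
    nz_r = Counter(i for i, _ in nonzeros)   # nonzeros per row
    nz_c = Counter(j for _, j in nonzeros)   # nonzeros per column
    a_r, a_c = [], []
    for i, j in nonzeros:
        if nz_c[j] < nz_r[i]:
            a_c.append((i, j))   # shorter column: keep a_ij with its column
        else:
            a_r.append((i, j))   # otherwise (ties included): keep it with its row
    return a_r, a_c


# 3 x 3 "arrow" matrix: dense row 0 and dense column 0.
print(simple_split([(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]))
# ([(0, 0), (1, 0), (2, 0)], [(0, 1), (0, 2)])
```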
Result for matrix from Graph Drawing contest 1997
47 × 47 matrix GD97_b, nz(A) = 264
◮ The medium-grain method achieves the optimal volume, $V_{\mathrm{opt}} = 11$.
◮ The communication volume of the 1D partitioning of $B$ equals the volume of the corresponding 2D partitioning of $A$.
The corresponding graph
http://www.cise.ufl.edu/research/sparse/matrices/Pajek/GD97_b.html
◮ 46 vertices, 132 edges
◮ One matrix row and column were empty, so the 47 × 47 matrix gives 46 vertices.
Comparing 3 methods for p = 2 using Mondriaan
[Plot: fraction of test cases (0 to 1) versus communication volume relative to best (1.0 to 2.0), for LB, LB+IR, FG, FG+IR, MG, MG+IR]
◮ LB = localbest = best of 1D row and 1D column partitioning (Mondriaan v1–v3)
◮ MG = medium-grain method (Mondriaan v4.0)
◮ FG = fine-grain model (Çatalyürek and Aykanat 2001)
◮ IR = iterative refinement, a cheap Kernighan–Lin based postprocessing procedure using the MG idea
◮ 2267 matrices from the U. Florida collection with 500 ≤ nz ≤ 5,000,000
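Plots of this kind are performance profiles. A minimal sketch of how one curve point is computed: for each method, the fraction of test matrices on which its volume is within a factor τ of the best volume found by any method. The method names X, Y and their volumes are made-up toy data, not results from the talk. Testing volume ≤ τ · best (rather than dividing by best) also handles matrices whose best volume is 0.

```python
def performance_profile(volumes, tau):
    """volumes: method -> list of volumes, one entry per test matrix (same order).

    Returns method -> fraction of matrices with volume <= tau * best volume.
    """
    methods = list(volumes)
    count = len(volumes[methods[0]])
    best = [min(volumes[m][t] for m in methods) for t in range(count)]
    return {
        m: sum(volumes[m][t] <= tau * best[t] for t in range(count)) / count
        for m in methods
    }


toy = {"X": [10, 20, 30], "Y": [12, 18, 40]}  # hypothetical volumes
print(performance_profile(toy, 1.0))   # X best on 2 of 3 matrices, Y on 1 of 3
print(performance_profile(toy, 1.25))  # X within 25% everywhere, Y on 2 of 3
```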
Comparing 3 methods for p = 2 using PaToH
[Plot: fraction of test cases (0 to 1) versus communication volume relative to best (1.0 to 2.0), for LB, LB+IR, FG, FG+IR, MG, MG+IR]
Comparing 3 methods for p = 64 using PaToH
[Plot: fraction of test cases (0 to 1) versus communication volume relative to best (1.0 to 2.0), for LB, LB+IR, FG, FG+IR, MG, MG+IR]
Optimal bipartitioning
7 × 7 matrix b1_ss, nz(A) = 15
◮ We benchmark p = 2 because heuristic partitioners are often based on recursive bipartitioning.
◮ The problem for p = 2 is easier to solve than for p > 2.
◮ The load balance criterion is $\mathrm{nz}(A_i) \le (1+\varepsilon)\left\lceil \frac{\mathrm{nz}(A)}{2} \right\rceil$ for $i = 0, 1$.
◮ Rounding up enables a feasible solution even for ε = 0 and odd nz(A).
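Because part sizes are integers, the criterion amounts to a maximum part size of $\lfloor (1+\varepsilon)\lceil \mathrm{nz}(A)/2 \rceil \rfloor$. A small sketch with a hypothetical helper name, evaluated for b1_ss with nz(A) = 15:

```python
import math


def max_part_size(nz, eps):
    """Largest allowed nz(A_i) under nz(A_i) <= (1 + eps) * ceil(nz / 2)."""
    return math.floor((1 + eps) * math.ceil(nz / 2))


cap = max_part_size(15, 0.0)
print(cap)            # 8: the split 8 + 7 is allowed for eps = 0
print(2 * cap >= 15)  # True; without rounding, the limit 7.5 would exclude every split
```

The product is computed in floating point; for some ε (for example 0.15) it can land just below an integer, so exact rational arithmetic would be the safer choice in a real partitioner.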
Branch-and-bound method
(Image: Piet Mondriaan, 1908, "Evening - the red tree")
◮ Construct a ternary tree representing all possible solutions.
◮ Every node in the tree has 3 branches, representing a choice for a matrix row or column:
 • completely assigned to processor P(0)
 • completely assigned to processor P(1)
 • cut
◮ The tree is pruned by using lower bounds on the communication volume or on the number of nonzeros per processor.
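A compact sketch of the ternary branch-and-bound for p = 2, meant for tiny matrices only. It assumes rows are branched on before columns and that a nonzero in a cut row and a cut column may go to either processor. It prunes with two simple bounds: the number of lines already cut (each cut line costs one word for p = 2) and the number of nonzeros already forced onto each processor. The names and the line ordering are illustrative choices, not the published implementation; since only one optimal solution is sought, it prunes when LB ≥ UB.

```python
import math

CUT = "c"  # value for a row or column that is cut


def node_bound(nonzeros, row, col, cap):
    """Return the number of cut lines, or None if the partial solution is infeasible.

    row[i] and col[j] are 0, 1, CUT, or None (not yet assigned). Nonzeros in a cut
    row and a cut column are free; they always fit because 2 * cap >= nz.
    """
    load = [0, 0]
    for i, j in nonzeros:
        forced = {v for v in (row[i], col[j]) if v in (0, 1)}
        if len(forced) == 2:
            return None          # row and column force a_ij onto different processors
        if forced:
            load[forced.pop()] += 1
    if max(load) > cap:
        return None              # a processor already has too many nonzeros
    return sum(v == CUT for v in row + col)


def optimal_bipartition(nonzeros, m, n, eps=0.0):
    """Minimum communication volume of a 2D bipartitioning (exhaustive B&B)."""
    cap = math.floor((1 + eps) * math.ceil(len(nonzeros) / 2))
    row, col = [None] * m, [None] * n
    best = m + n  # upper bound UB: cutting every row and column is always feasible

    def branch(k):
        nonlocal best
        lb = node_bound(nonzeros, row, col, cap)
        if lb is None or lb >= best:
            return               # prune this subtree
        if k == m + n:
            best = lb            # leaf: lb equals the volume of this solution
            return
        line, idx = (row, k) if k < m else (col, k - m)
        for value in (0, 1, CUT):  # the three branches of a tree node
            line[idx] = value
            branch(k + 1)
        line[idx] = None

    branch(0)
    return best


print(optimal_bipartition([(0, 0), (1, 1)], 2, 2))                  # 0
print(optimal_bipartition([(0, 0), (0, 1), (1, 0), (1, 1)], 2, 2))  # 2
```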
Lower bounds L1, L2 on communication volume
◮ Partial solution: a value 0, 1, or c (cut) has been assigned to 2 rows and 2 columns.
◮ Row 0 has been cut: lower bound on the volume L1 = 1.
◮ Rows 2 and 4 have been implicitly cut: each has nonzeros in a column assigned to P(0) and in a column assigned to P(1), so it must be cut. L2 = 2.
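A sketch of how L1 and L2 can be read off a partial solution, using 0, 1, "c" and None (unassigned) as an assumed encoding. An unassigned row is implicitly cut when the assigned columns force its nonzeros onto both processors, and symmetrically for unassigned columns. The 5 × 5 pattern below is a hypothetical example constructed to give L1 = 1 and L2 = 2; it is not the matrix on the slide.

```python
CUT = "c"  # value for a row or column that is cut


def bounds_l1_l2(nonzeros, row, col):
    """Return (L1, L2) for a partial solution; row/col entries are 0, 1, CUT or None."""
    l1 = sum(v == CUT for v in row + col)        # explicitly cut rows and columns
    row_parts, col_parts = {}, {}
    for i, j in nonzeros:
        if row[i] is None and col[j] in (0, 1):  # column j forces a_ij into its part
            row_parts.setdefault(i, set()).add(col[j])
        if col[j] is None and row[i] in (0, 1):  # row i forces a_ij into its part
            col_parts.setdefault(j, set()).add(row[i])
    # An unassigned line whose nonzeros are forced onto both processors must be cut.
    forced_sets = list(row_parts.values()) + list(col_parts.values())
    l2 = sum(len(s) == 2 for s in forced_sets)
    return l1, l2


nonzeros = [(0, 0), (0, 1), (1, 0), (1, 2), (2, 0), (2, 1),
            (3, 0), (3, 3), (4, 0), (4, 1), (4, 4)]
row = [CUT, 0, None, None, None]  # row 0 cut, row 1 assigned to P(0)
col = [0, 1, None, None, None]    # column 0 to P(0), column 1 to P(1)
print(bounds_l1_l2(nonzeros, row, col))  # (1, 2): rows 2 and 4 are implicitly cut
```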
Lower bound L3 on communication volume
◮ Columns 3, 4, 5 have been partially assigned to P(0).
◮ They can only be completely assigned to P(0) or cut.
◮ For perfect load balance (ε = 0), we can assign at most 2 more red nonzeros (nonzeros of P(0)).
◮ Thus we have to cut column 3, and one more: L3 = 2.
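The counting argument behind L3 can be sketched as follows. Different columns share no nonzeros, so each column partially assigned to P(0) needs its own room on P(0) if it is to be assigned there completely. Keeping the columns with the fewest free nonzeros maximizes the number that avoid being cut; the rest must be cut. The function name is hypothetical, and the free-nonzero counts in the example are made up so that the outcome matches the slide.

```python
def bound_l3(free_counts, room):
    """Minimum number of columns partially assigned to P(0) that must be cut.

    free_counts[k]: nonzeros of such a column k not yet forced to any processor;
    assigning column k completely to P(0) adds all of them to P(0).
    room: how many more nonzeros P(0) may still receive.
    """
    kept = 0
    for count in sorted(free_counts):  # smallest columns first keeps the most columns
        if count > room:
            break
        room -= count
        kept += 1
    return len(free_counts) - kept


# Hypothetical free counts for columns 3, 4, 5; at most 2 more nonzeros on P(0).
print(bound_l3([3, 2, 2], 2))  # 2
```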
Optimal solution
◮ Optimal solution: volume = 4, which serves as the upper bound UB.
◮ For the partial solution of the previous slides, the total lower bound is LB = L1 + L2 + L3 = 1 + 2 + 2 = 5.
◮ Prune that partial solution, since LB > UB.
Lower bound L4 by conflicting partial assignments
◮ Permute the matrix to create blocks:
 • $\hat{B}_0$: completely assigned to processor P(0)
 • $P_0$: partially assigned to processor P(0)
 • $\hat{B}_c$: cut
 • $\hat{I}_c$: implicitly cut
◮ Conflict for a nonzero in row block $P_1$ ∩ column block $P_0$: its row can only go to P(1) or be cut, and its column only to P(0) or be cut, so at least one of them must be cut: L4 = 1.
The perfect match
Coming soon to this theatre!