Investigating hypergraph-partitioning-based sparse matrix partitioning methods


  1. Investigating hypergraph-partitioning-based sparse matrix partitioning methods. Bora Uçar, ro:ma, Lyon, France, 22 October 2009. Jointly with Ümit V. Çatalyürek (VMWIP).

  2. Outline: 1. Hypergraphs; 2. Parallel SpMxV; 3. Scalability analysis of partitioning methods; 4. Concluding remarks.

  3. Hypergraphs: definitions. A hypergraph H = (V, N) is a set of vertices V and a set of hyperedges (nets) N. A hyperedge h ∈ N is a subset of vertices. A cost c(h) is associated with each hyperedge h, and a weight w(v) is associated with each vertex v. An undirected graph can be seen as a hypergraph in which each hyperedge contains exactly two vertices.
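
To make the definitions concrete, here is a minimal sketch of a hypergraph data structure in Python; the class and field names are illustrative, not from the talk, and unit costs and weights are assumed by default.

```python
from dataclasses import dataclass, field

@dataclass
class Hypergraph:
    num_vertices: int
    nets: list[set[int]]        # each net (hyperedge) is a subset of vertices
    cost: dict[int, float] = field(default_factory=dict)    # c(h), keyed by net index
    weight: dict[int, float] = field(default_factory=dict)  # w(v), keyed by vertex

    def c(self, h: int) -> float:
        return self.cost.get(h, 1.0)    # unit cost unless specified

    def w(self, v: int) -> float:
        return self.weight.get(v, 1.0)  # unit weight unless specified

# The graph special case: every net contains exactly two vertices.
H = Hypergraph(num_vertices=3, nets=[{0, 1}, {1, 2}])
```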

  4. Hypergraphs: partitioning. Π = {V1, V2, ..., VK} is a K-way vertex partition if each part is non-empty (Vk ≠ ∅), parts are mutually exclusive (Vk ∩ Vℓ = ∅), and parts are collectively exhaustive (V = ∪ Vk). The connectivity λ(h) of a hyperedge h is equal to the number of parts in which h has vertices. Objective: minimize cutsize(Π) = Σ_h c(h)(λ(h) − 1). Constraint: balanced part weights, Σ_{v ∈ Vk} w(v) ≤ (1 + ε) · Σ_{v ∈ V} w(v) / K for every part Vk.
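
A sketch of the connectivity, cutsize, and balance definitions above, reusing the illustrative `Hypergraph` class from the previous snippet; `part_of[v]` is assumed to give the part index of vertex v.

```python
def connectivity(net: set[int], part_of: list[int]) -> int:
    """lambda(h): the number of parts in which net h has vertices."""
    return len({part_of[v] for v in net})

def cutsize(H: Hypergraph, part_of: list[int]) -> float:
    """The connectivity-1 metric: sum over nets of c(h) * (lambda(h) - 1)."""
    return sum(H.c(h) * (connectivity(net, part_of) - 1)
               for h, net in enumerate(H.nets))

def is_balanced(H: Hypergraph, part_of: list[int], K: int, eps: float) -> bool:
    """Check that every part weight is at most (1 + eps) * W_total / K."""
    loads = [0.0] * K
    for v in range(H.num_vertices):
        loads[part_of[v]] += H.w(v)
    total = sum(loads)
    return max(loads) <= (1 + eps) * total / K
```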

  5. Hypergraph partitioning example: 10 vertices and 4 nets, partitioned into 4 parts: {4, 5}, {7, 10}, {3, 8, 9}, and {1, 2, 6}. The connectivities are λ(n1) = 2, λ(n2) = 3, λ(n3) = 3, and λ(n4) = 2, so cutsize(Π) = c(n1) + 2c(n2) + 2c(n3) + c(n4) = 6 with unit costs. [Figure: the partitioned hypergraph.]

  6. Hypergraphs: partitioning tools and applications.
Tools: hMETIS (Karypis and Kumar, Univ. Minnesota); MLPart (Caldwell, Kahng, and Markov, UCLA/UMich); Mondriaan (Bisseling and Meesen, Utrecht Univ.); Parkway (Trifunovic and Knottenbelt, Imperial Coll. London); PaToH (Çatalyürek and Aykanat, Bilkent Univ.); Zoltan-PHG (Devine, Boman, Heaphy, Bisseling, and Çatalyürek, Sandia National Labs.).
Applications: VLSI circuit partitioning; scientific computing (matrix partitioning, ordering); cryptology; parallel/distributed computing (volume rendering, data aggregation, declustering/clustering, scheduling); software engineering; information retrieval; processing spatial join queries; etc.

  7. Parallel sparse matrix-vector multiplies: row-column-parallel multiplies. To compute y ← Ax: Expand (P1 sends x5 to P2 and P3); scalar multiply and add (P2 computes a partial result y′6 = a65 x5 + a66 x6 + a68 x8); fold (P2 sends its partial result y′6 to P4). [Figure: an 8×8 matrix A with its nonzeros and the entries of x and y distributed among processors P1–P4.]
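
The three phases can be simulated sequentially. The sketch below assumes a COO list of nonzeros and maps assigning each x entry, y entry, and nonzero to a processor; all names are illustrative, not from the talk.

```python
from collections import defaultdict

def row_column_parallel_spmv(entries, x, x_owner, y_owner, nz_owner):
    """entries: list of (i, j, a_ij); *_owner: index or position -> processor."""
    # Expand: x_j travels from its owner to every other processor that
    # holds a nonzero in column j.
    expand = {(nz_owner[(i, j)], j) for i, j, _ in entries
              if nz_owner[(i, j)] != x_owner[j]}

    # Scalar multiply and add: each processor forms partial results y'_i
    # from the nonzeros it owns.
    partial = defaultdict(float)          # (processor, i) -> partial y'_i
    for i, j, a in entries:
        partial[(nz_owner[(i, j)], i)] += a * x[j]

    # Fold: partial results travel to the owner of y_i and are summed.
    fold = {(p, i) for (p, i) in partial if p != y_owner[i]}
    y = defaultdict(float)
    for (p, i), val in partial.items():
        y[i] += val
    return dict(y), len(expand), len(fold)  # result, expand/fold volumes
```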

  8. Parallelization objectives. Achieve load balance: the load of a processor is its number of nonzeros, so assign an almost equal number of nonzeros to each processor. Minimize communication cost: the communication cost is a complex function (depending on the machine architecture and the problem size) of the total volume of messages, the total number of messages, the maximum volume of messages per processor (sends, receives, or both), and the maximum number of messages per processor (sends, receives, or both). The common metric in different works is the total volume of communication; the sketch below computes all four metrics.
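
A sketch of the four metrics just listed, assuming a hypothetical send table where `sends[p][q]` is the number of words processor p sends to processor q:

```python
def comm_metrics(sends: dict[int, dict[int, int]]):
    total_volume = sum(v for row in sends.values() for v in row.values())
    total_messages = sum(1 for row in sends.values() for v in row.values() if v > 0)
    max_send_volume = max((sum(row.values()) for row in sends.values()), default=0)
    max_send_messages = max((sum(1 for v in row.values() if v > 0)
                             for row in sends.values()), default=0)
    return total_volume, total_messages, max_send_volume, max_send_messages
```

Receive-side maxima follow by transposing the table.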

  9. Parallelization problem. Problem definition: partition the matrix so that processors have an equal number of nonzeros while minimizing the total volume of messages. Volume of messages: consider x5. If it is assigned to P1, P2, or P3, it incurs 2 units of communication; if assigned to P4, it incurs 3 units, and the assignment does not make sense, since P4 holds no nonzero in column 5. The situation for y6 is similar.

  10. Parallelization problem (cont'd). Volume of messages: if the nonzeros in column cj lie on s_c(j) processors, the volume of communication for xj is s_c(j) − 1; if the nonzeros in row ri lie on s_r(i) processors, the volume of communication for yi is s_r(i) − 1. The total volume of communication is Σ_j (s_c(j) − 1) + Σ_i (s_r(i) − 1). We must balance the number of nonzeros per processor while minimizing the number of processors sharing a column/row. This is equivalent to the hypergraph partitioning problem, which is NP-complete.
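
A direct sketch of the total-volume formula above, assuming each vector entry is owned by one of the processors that also holds a matching nonzero (names are illustrative):

```python
from collections import defaultdict

def total_volume(entries, nz_owner):
    """entries: list of (i, j) nonzero positions; nz_owner: (i, j) -> processor."""
    col_procs, row_procs = defaultdict(set), defaultdict(set)
    for i, j in entries:
        col_procs[j].add(nz_owner[(i, j)])  # s_c(j): processors touching column j
        row_procs[i].add(nz_owner[(i, j)])  # s_r(i): processors touching row i
    return (sum(len(p) - 1 for p in col_procs.values()) +
            sum(len(p) - 1 for p in row_procs.values()))
```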

  11. Three main models for matrix partitioning. Column-net model: used for rowwise partitioning; each column is a net and each row is a vertex. Row-net model: used for columnwise partitioning; each row is a net and each column is a vertex. Fine-grain model: used for nonzero-based partitioning; each row is a net, each column is a net, and each nonzero is a vertex. [Figure: the net/vertex correspondence for the three models.]
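
As an illustration, here is how the column-net model could be built from a sparse matrix, reusing the illustrative `Hypergraph` class; vertex weights are set to the number of nonzeros per row, matching the load metric above, and unit net costs are assumed. The row-net model is obtained by exchanging the roles of rows and columns.

```python
def column_net_model(num_rows: int, entries) -> Hypergraph:
    """entries: iterable of (i, j) nonzero positions of A."""
    nets: dict[int, set[int]] = {}
    weights: dict[int, float] = {}
    for i, j in entries:
        nets.setdefault(j, set()).add(i)        # net for column j gathers its rows
        weights[i] = weights.get(i, 0.0) + 1.0  # w(v_i) = nnz in row i
    return Hypergraph(num_vertices=num_rows,
                      nets=list(nets.values()),
                      weight=weights)
```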

  12. Parallel sparse matrix-vector multiplies (the row-column-parallel example of slide 7, shown again): expand, scalar multiply and add, fold.

  13. Taxonomy of sparse matrix partitioning methods and models, for parallel y ← Ax computation. Parallel algorithm: row-parallel, column-parallel, or row-column-parallel. Partitioning schemes: 1D RW, 1D CW, 2D JL (jagged-like), 2D ORB, 2D CH, 2D ML2D, and 2D FG (fine-grain); the 2D schemes divide into those obtained via orthogonal partitioning and those that are nonzero based. Hypergraph models: column-net, row-net, column-row-net, and multi-constraint.

  14. Example: jagged-like partitioning, first step. [Figure: a 16×16 matrix with nnz = 47, split rowwise into two blocks R1 and R2; vol = 3, imbal = [−2.1%, 2.1%].]

  15. Example: jagged-like partitioning, second step. [Figure: the resulting 4-way partition into P1–P4; nnz = 47, vol = 8, imbal = [−6.4%, 2.1%].]

  16. Too many alternatives? Ideally, for partitioning a given matrix, we would like to choose the best method, but "best" is not well-defined. Given that the main objective of the hypergraph-partitioning-based methods is the minimization of the total communication volume, we may content ourselves with the least total communication volume. Can we know which method will give the best result without applying them all? The landscape is not too complicated: the fine-grain method usually obtains the least total communication volume, but also has the highest run time and, even worse, the highest total number of messages.

  17. A recipe for matrix partitioning. [Figure: a decision tree that selects among FGS/FGU, CWU, RWU, JLS/JLU, JLU, and JLU^T. Its tests include whether the matrix is square (M = N); whether it is pathological (e.g., M < 0.35 N, or mode(d_r, d_c) = 0, or max(d_r, d_c) ≥ (1 − ε) · 2Z/√K); the symmetry ratio (sym(A) > 0.95); and statistics of the row and column degree distributions d_r and d_c, such as medians, third quartiles Q3, and maxima.]
