Partitioning sparse matrices for parallel preconditioned iterative methods


1. Partitioning sparse matrices for parallel preconditioned iterative methods
Bora Uçar, Emory University, Atlanta, GA
Joint work with Prof. C. Aykanat, Bilkent University, Ankara, Turkey

2. Iterative methods
• Used for solving linear systems Ax = b; usually A is sparse
• Structure: while not converged, do computations and check convergence
• Involves
  – linear vector operations: x = x + αy, i.e., x_i = x_i + α·y_i
  – inner products: α = ⟨x, y⟩ = Σ_i x_i·y_i
  – sparse matrix-vector multiplies (SpMxV):
      y = Ax, i.e., y_i = ⟨A_i, x⟩
      y = A^T x, i.e., y_i = ⟨(A^T)_i, x⟩
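To make the three kernel types concrete, a minimal Python/SciPy sketch (the random test matrix and the scalar alpha are illustrative, not from the talk):

```python
import numpy as np
import scipy.sparse as sp

# Minimal sketch of the three kernel types (the random test matrix and
# the scalar alpha are illustrative, not from the talk).
n = 8
A = sp.random(n, n, density=0.4, format="csr")
x = np.ones(n)
y = np.arange(n, dtype=float)
alpha = 0.5

x = x + alpha * y      # linear vector operation: x_i = x_i + alpha * y_i
alpha = np.dot(x, y)   # inner product: alpha = sum_i x_i * y_i
y = A @ x              # SpMxV: y_i = <A_i, x>
y = A.T @ x            # SpMxV with the transpose: y_i = <(A^T)_i, x>
```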

3. Preconditioned iterative methods
• Transform Ax = b into another system that is easier to solve
• The preconditioner is a matrix that does the desired transformation
• Focus: approximate inverse preconditioners
• A right approximate inverse M provides AM ≈ I
• Instead of solving Ax = b directly, use right preconditioning: solve AMy = b and then set x = My (sketched below)
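A minimal sketch of this right-preconditioned solve, assuming SciPy and a crude Jacobi-style approximate inverse (the test matrix, M, and sizes are all illustrative); the LinearOperator applies v ↦ A(Mv) without ever forming the product AM:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, bicgstab

# Illustrative test problem: a sparse matrix with a strengthened diagonal.
n = 100
A = sp.random(n, n, density=0.05, format="csr") + 4 * sp.eye(n, format="csr")
M = sp.diags(1.0 / A.diagonal())   # crude approximate inverse (Jacobi-style)
b = np.ones(n)

# Apply v -> A(Mv) as two SpMxVs; AM is never formed explicitly.
AM = LinearOperator((n, n), matvec=lambda v: A @ (M @ v))
y, info = bicgstab(AM, b)          # solve AMy = b
x = M @ y                          # recover x = My
```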

4. Parallelizing iterative methods
• Avoid communicating vector entries for linear vector operations and inner products
• Inner products require communication
  – regular communication; its cost remains the same as the problem size increases
  – there are cost-optimal algorithms to perform these communications
• Efficiently parallelize the SpMxV operations
• Efficiently parallelize the application of the preconditioner

5. Preconditioned iterative methods
• Applying approximate inverse preconditioners means additional SpMxV operations with M
  – never form the matrix AM; perform the SpMxVs separately
• Parallelizing a full step requires efficient SpMxV with both A and M
  – partition A and M simultaneously
• What has been done? A bipartite graph model (Hendrickson and Kolda, SISC 00)

6. Row-parallel y = Ax
• The rows of A (and hence y) and the entries of x are partitioned among processors P1–P4
[Figure: a sparse matrix partitioned rowwise into four stripes; x1–x4 partition the columns, y1–y4 the row stripes; nonzeros fall in diagonal and off-diagonal blocks]
The algorithm (sketched in code below):
1. Expand the x vector (sends/receives)
2. Compute with the diagonal blocks
3. Receive x entries and compute with the off-diagonal blocks
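A serial Python sketch of these three phases, with step 1's sends/receives simulated by reading the other parts' x entries directly (the partition and the test matrix are illustrative):

```python
import numpy as np
import scipy.sparse as sp

# Serial sketch of row-parallel y = Ax. Each "processor" owns a stripe
# of rows of A and the conformal pieces of x and y.
n, K = 16, 4
A = sp.random(n, n, density=0.3, format="csr")
x = np.arange(n, dtype=float)
parts = np.array_split(np.arange(n), K)   # contiguous rowwise partition

y = np.zeros(n)
for p, rows in enumerate(parts):
    stripe = A[rows, :]
    # step 2: compute with the diagonal block using locally owned x
    y[rows] += stripe[:, parts[p]] @ x[parts[p]]
    # steps 1 and 3: "receive" remote x entries, then compute with the
    # off-diagonal blocks
    for q, qcols in enumerate(parts):
        if q != p:
            y[rows] += stripe[:, qcols] @ x[qcols]

assert np.allclose(y, A @ x)              # matches the serial product
```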

7. Row-parallel y = Ax: communication requirements
[Figure: the same rowwise-partitioned matrix]
• Total volume: the number of nonzero column segments in the off-diagonal blocks (13 in the figure)
• Total number of messages: the number of nonzero off-diagonal blocks (9 in the figure)
• Per processor: the above two quantities confined within a column stripe
• Total volume and number of messages were addressed previously (Çatalyürek and Aykanat, IEEE TPDS 99; Uçar and Aykanat, SISC 04; Vastenhouw and Bisseling, SIREV 05)
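Both metrics can be read off the off-diagonal blocks directly; a sketch, assuming a conformal rowwise partition given as an owner array (the function name is illustrative):

```python
import numpy as np
import scipy.sparse as sp

# Sketch: communication requirements of a conformal rowwise partition,
# where owner[i] is the part owning row i, x_i, and y_i. Volume counts
# nonzero column segments in off-diagonal blocks; the message count
# counts nonzero off-diagonal blocks.
def rowwise_comm_stats(A, owner):
    A = sp.csc_matrix(A)
    volume, msgs = 0, set()
    for j in range(A.shape[1]):
        rows = A.indices[A.indptr[j]:A.indptr[j + 1]]
        needers = set(owner[rows]) - {owner[j]}      # parts needing x_j
        volume += len(needers)                       # one word of x_j each
        msgs.update((owner[j], p) for p in needers)  # sender-receiver pairs
    return volume, len(msgs)

owner = np.repeat(np.arange(4), 4)                   # 16 rows, 4 parts
print(rowwise_comm_stats(sp.random(16, 16, density=0.3), owner))
```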

8. Minimize volume in row-parallel y = Ax: revisiting 1D hypergraph models
• Three entities to partition: y, the rows of A, and x
  – three types of vertices: y_i, r_i, and x_j
• y_i is computed by a single r_i
  – connect y_i and r_i (an edge, or two-pin hyperedge)
• x_j is a data source; every r_i with a_ij ≠ 0 needs x_j
  – connect x_j and all such r_i (definitely a hyperedge)
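A sketch of building these hyperedges from a sparse matrix, with y_i and r_i already combined per the owner-computes rule of the next slide, and x_j placed with row j (the conformal, symmetric-partition case; the function name is illustrative):

```python
import scipy.sparse as sp

# Sketch of the column-net model: one vertex per row, one net per
# column; net j connects x_j's vertex and every row i with a_ij != 0.
def column_net_hypergraph(A):
    A = sp.csc_matrix(A)
    nets = []
    for j in range(A.shape[1]):
        pins = set(A.indices[A.indptr[j]:A.indptr[j + 1]])
        pins.add(j)               # the data source x_j lives with row j
        nets.append(sorted(pins))
    return nets                    # nets[j] = pins of hyperedge n_j
```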

9. Minimize volume in row-parallel y = Ax: revisiting 1D hypergraph models
• Combine y_i and r_i: the owner-computes rule
• This gives the general hypergraph model for 1D rowwise partitioning
• Partition the vertices into K parts (partitioning the data among K processors)

10. Hypergraph partitioning
• Partition the vertices of a hypergraph into two or more parts such that:
  – Σ_i (con(n_i) − 1) is minimized (total volume), where con(n_i) is the number of parts connected by hyperedge n_i
  – a balance criterion among the part weights is maintained (load balance)
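A sketch of this objective on the net lists built above (part maps each vertex to its part; names are illustrative):

```python
# Sketch of the connectivity-1 cutsize: part[v] is the part of vertex v,
# nets are pin lists as built by column_net_hypergraph above.
def cutsize(nets, part):
    total = 0
    for pins in nets:
        con = len({part[v] for v in pins})  # parts connected by this net
        total += con - 1                    # each extra part costs one word
    return total
```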

11. Column-parallel y = Ax: communication requirements
[Figure: a sparse matrix partitioned columnwise into four stripes; x1–x4 partition the column stripes, y1–y4 the rows]
• Total volume: the number of nonzero row segments in the off-diagonal blocks (13 in the figure)
• Total number of messages: the number of nonzero off-diagonal blocks (9 in the figure)
• Per processor: the above two quantities confined within a row stripe
• Total volume and number of messages were addressed previously (Çatalyürek and Aykanat, IEEE TPDS 99; Uçar and Aykanat, SISC 04; Vastenhouw and Bisseling, SIREV 05)

12. Preconditioned iterative methods
• Linear vector operations and inner-product computations work out as long as all vectors in a single operation have the same partition
• Partition A and M simultaneously
• A blend of dependencies and interactions among matrices and vectors
  – different methods have different partitioning requirements
• Figure out the partitioning requirements by analyzing the linear vector operations and inner products

13. Preconditioned BiCG-STAB (one iteration)

  p_i = r_{i−1} + β_i (p_{i−1} − ω_{i−1} v_{i−1})   ← p, r, v should be partitioned conformably
  p̂ = M p_i
  v_i = A p̂
  s = r_{i−1} − α_i v_i                             ← s should be with r and v
  ŝ = M s
  t = A ŝ                                           ← t should be with s
  ω_i = ⟨t, s⟩ / ⟨t, t⟩
  x_i = x_{i−1} + α_i p_i + ω_i s                   ← x should be with p and s
  r_i = s − ω_i t
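For orientation, a Python sketch of one common right-preconditioned BiCG-STAB formulation (van der Vorst's); note this variant updates x with p̂ and ŝ, and the shadow vector, tolerance, and iteration cap are assumptions, not from the slide. Each iteration performs exactly two SpMxVs with M and two with A:

```python
import numpy as np

# Sketch of right-preconditioned BiCG-STAB (one common formulation).
def bicgstab_right(A, M, b, x=None, tol=1e-8, maxiter=500):
    x = np.zeros_like(b) if x is None else x
    r = b - A @ x
    r0 = r.copy()                     # shadow residual (an assumption)
    rho = alpha = omega = 1.0
    v = p = np.zeros_like(b)
    for _ in range(maxiter):
        rho, rho_old = np.dot(r0, r), rho
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        p_hat = M @ p                 # SpMxV with M
        v = A @ p_hat                 # SpMxV with A
        alpha = rho / np.dot(r0, v)
        s = r - alpha * v
        s_hat = M @ s                 # SpMxV with M
        t = A @ s_hat                 # SpMxV with A
        omega = np.dot(t, s) / np.dot(t, t)
        x = x + alpha * p_hat + omega * s_hat
        r = s - omega * t
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

# usage with the illustrative A, M, b from the earlier sketch:
# x = bicgstab_right(A, M, b)
```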

14. Preconditioned BiCG-STAB
• p, r, v, s, t, and x should be partitioned conformably
• What remains?
  – p̂ = M p and v = A p̂: the columns of M and the rows of A should be conformal
  – ŝ = M s and t = A ŝ: the rows of M and the columns of A should be conformal
• Hence partition A as PAQ^T and M as QMP^T

15. Partitioning requirements

  BiCG-STAB   PAQ^T QMP^T
  TFQMR       PAP^T and PM_1M_2P^T
  GMRES       PAP^T and PMP^T
  CGNE        PAQ and PMP^T

• "and" means there is a synchronization point between the SpMxV's
  – load balance each SpMxV individually

16. Model for simultaneous partitioning
• We use the previously proposed models
  – and define operators on them to build composite models
[Figure: the rowwise model for y = Ax and the columnwise model for w = Mz]

17. Combining hypergraph models
• Vertex amalgamation: combine vertices of the individual hypergraphs, and connect the composite vertex to the hyperedges of the individual vertices
• Vertex weighting: define multiple weights; the individual vertex weights are not added up
• Never amalgamate hyperedges of the individual hypergraphs!
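A sketch of the two vertex operations on a simple data layout (a hypergraph as a (nets, weights) pair over a shared vertex id space; the layout and function name are illustrative). Note the hyperedges of the individual hypergraphs are kept as-is, never merged:

```python
# Sketch: amalgamate vertex i of h1 with vertex i of h2.
def amalgamate(h1, h2):
    nets1, w1 = h1
    nets2, w2 = h2
    nets = nets1 + nets2          # composite vertex i inherits the
                                  # hyperedges of both original vertices
    weights = list(zip(w1, w2))   # vertex weighting: keep two weights per
                                  # vertex, never add them up
    return nets, weights          # hyperedges themselves are untouched
```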

18. Combining guideline
1. Determine the partitioning requirements
2. Decide on the partitioning dimensions
   • generate the rowwise model for the matrices to be partitioned rowwise
   • generate the columnwise model for the matrices to be partitioned columnwise
3. Apply vertex operations
   • to impose an identical partition on two vertices, amalgamate them
   • if the applications of the matrices are interleaved with synchronization, apply vertex weighting

19. Combining example
• BiCG-STAB requires PAQ^T QMP^T (step 1: determine requirements)
• A rowwise (y = Ax), M columnwise (w = Mz) (step 2: choose dimensions)
[Figure: the rowwise model of A and the columnwise model of M]

20. Combining example (cont'd)
• AQ^T QM: amalgamate columns of A and rows of M (y = Ax, w = Mz) (step 3)
[Figure: the composite model after this amalgamation]

21. Combining example (cont'd)
• PAMP^T: amalgamate rows of A and columns of M (y = Ax, w = Mz) (step 3)
[Figure: the final composite model]

22. Remarks on composite models
• Partitioning the composite hypergraphs
  – balances the computational loads of the processors
  – minimizes the total communication volume in a full step of the preconditioned iterative methods
• Assumption: A and M (or their sparsity patterns) are available

23. Experiments: setup
• Sparse nonsymmetric square matrices from the University of Florida sparse matrix collection
• SPAI by Grote and Huckle (SISC 97)
• AINV by Benzi and Tůma (SISC 98)
• PaToH by Çatalyürek and Aykanat (TPDS 99)

24. Experiments: comparison
With respect to partitioning A alone and applying the same partition to M (SPAI experiments, ten different matrices), percent gain in total volume:

               CC                 RR
           32-way  64-way    32-way  64-way
  min         7       8         6       8
  max        31      34        36      36
  average    20      20        20      20
