
Sparse Matrix Partitioning, Reordering and Vector Multiplication - PowerPoint PPT Presentation



  1. Sparse Matrix Partitioning, Reordering and Vector Multiplication. Albert-Jan Yzelman, Utrecht University (NL), May 2010.

  2. Outline: 1. Bulk Synchronous Parallel, 2. Matrix partitioning for SpMV, 3. Reordering for Sequential SpMV, 4. In relation to PSPIKE.

  3. Bulk Synchronous Parallel: Supercomputers use many different processor types: Reduced Instruction Set chips (RISC, e.g. IBM Power), Intel Itanium, x86-type (your average home PC/laptop), vector (co-)processors, GPUs, stream processors, ...

  4. Bulk Synchronous Parallel: Supercomputers also use many different interconnects: ring, all-to-all, Ethernet, InfiniBand, cube, hierarchical, Internet, ...

  5. Bulk Synchronous Parallel: Programming model. One model must bridge all these architectures; two such bridging models are the Message Passing Interface (MPI) and Bulk Synchronous Parallel (BSP). Leslie G. Valiant, A bridging model for parallel computation, Communications of the ACM, Volume 33 (1990), pp. 103–111.

  6. Bulk Synchronous Parallel: A BSP computer: consists of P processors, each with local memory; executes a Single Program on Multiple Data (SPMD); performs no communication during calculation; communicates only during barrier synchronisation.

  7. Bulk Synchronous Parallel: [Diagram: Superstep 0, followed by a synchronisation barrier, followed by Superstep 1]

  8. Bulk Synchronous Parallel: A BSP computer furthermore: has homogeneous processors, each able to do r flops per second; takes l time to synchronise; has a communication speed of g. The model thus uses only four parameters (P, r, l, g).

  9. Bulk Synchronous Parallel: A BSP algorithm can (using BSPlib, BSPonMPI): ask for environment variables: bsp_nprocs(), bsp_pid(); synchronise: bsp_sync(); perform "direct" remote memory access (DRMA): bsp_put(source, dest, dest_PID), bsp_get(source, source_PID, dest); send messages in bulk-synchronous fashion (BSMP): bsp_send(data, dest_PID), bsp_move().

 10. Bulk Synchronous Parallel: Exercise. Design a parallel inner-product calculation:

    double spmd_ip( double *x, double *y, int length ) {
        double sum = ...;
        ...
        return sum;
    }

using bsp_nprocs(), bsp_pid(), bsp_put(...).

 11. Bulk Synchronous Parallel: Exercise. Design a parallel inner-product calculation:

    double spmd_ip( double *x, double *y, int length ) {
        int i;
        double sum = 0.0, total = 0.0;
        double res[ bsp_nprocs() ];
        /* local partial sum */
        for( i = 0; i < length; i++ )
            sum += x[i] * y[i];
        /* write the local partial sum into slot bsp_pid() on every processor */
        for( i = 0; i < bsp_nprocs(); i++ )
            bsp_put( &sum, &res[ bsp_pid() ], i );
        bsp_sync();
        /* after the sync, res[i] holds the partial sum of processor i */
        for( i = 0; i < bsp_nprocs(); i++ )
            total += res[i];
        return total;
    }

 12. Bulk Synchronous Parallel: BSP cost model. Let w_i(s) be the work to be done by processor s in superstep i, let r_i(s) be the communication received by processor s between supersteps i and i+1, and let t_i(s) be the communication transmitted by processor s. Furthermore, let T be the total number of supersteps, and let the communication bound of superstep i be given by

    c_i = max( max_{s in [0, P-1]} r_i(s), max_{s in [0, P-1]} t_i(s) ).

Similarly, the upper bound on the amount of work is

    w_i = max_{s in [0, P-1]} w_i(s).

Then the cost of a BSP algorithm is given by

    sum_{i=0}^{T-1} w_i + sum_{i=0}^{T-1} ( l + g * c_i ).

 13. Bulk Synchronous Parallel: Exercise. Now think about how to do sparse matrix–vector multiplication (SpMV) using BSP.

 14. Matrix partitioning for SpMV: Outline: 1. Bulk Synchronous Parallel, 2. Matrix partitioning for SpMV, 3. Reordering for Sequential SpMV, 4. In relation to PSPIKE.

 15. Matrix partitioning for SpMV: Sparse matrix, dense vector multiplication y = Ax: for each nonzero k from A, add x[k.column] * k.value to y[k.row]. [Figure: sparse matrix with its nonzeros, and the vectors x and y]

 16. Matrix partitioning for SpMV: Sparse matrix, dense vector multiplication (parallel). Step 1 (fan-out): not all processors have the elements of x they need; processors must get the missing items. [Figure: partitioned sparse matrix showing which elements of x each processor needs]
