SparseBLAS Products in UPC: an Evaluation of Storage Formats

  1. SparseBLAS Products in UPC: an Evaluation of Storage Formats. Jorge González-Domínguez*, Óscar García-López, Guillermo L. Taboada, María J. Martín, Juan Touriño. Computer Architecture Group, University of A Coruña (Spain). {jgonzalezd,oscar.garcia,taboada,mariam,juan}@udc.es. International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE 2011).

  2. Outline: (1) Introduction; (2) Sparse Matrix-Vector Product; (3) Sparse Matrix-Matrix Product; (4) Experimental Evaluation; (5) Conclusions.

  3. Section 1: Introduction.

  4. UPC: a Suitable Alternative for HPC in the Multi-core Era. Programming models: traditionally, shared/distributed memory programming models; challenge: hybrid memory architectures; PGAS (Partitioned Global Address Space). PGAS languages: UPC -> C; Titanium -> Java; Co-Array Fortran -> Fortran. UPC compilers: Berkeley UPC; GCC (Intrepid); Michigan TU; HP, Cray and IBM UPC compilers.

  6. Studied Numerical Operations. BLAS libraries (Basic Linear Algebra Subprograms): specification of a set of numerical functions; widely used by scientists and engineers; SparseBLAS and PBLAS (Parallel BLAS). Studied routines: usmv, sparse matrix-vector product (y ← α ∗ A ∗ x + y); usmm, sparse matrix-matrix product (C ← α ∗ A ∗ B + C).

  7. Studied Storage Formats. Elements ordered by rows: Coordinate, Compressed Sparse Row (CSR), Block Sparse Row (BSR), Skyline with lower matrices. Elements ordered by columns: Compressed Sparse Column (CSC), Skyline with upper matrices. Elements ordered by diagonals: Diagonal.
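
To make the difference between two of the row-ordered formats concrete, the following is a minimal C sketch that converts a coordinate matrix into CSR. The type and function names (`coo_matrix`, `csr_matrix`, `coo_to_csr`, `csr_free`) are hypothetical and chosen for illustration only; they are not taken from the presentation or from any SparseBLAS implementation. The sketch assumes zero-based indices and that every row index is smaller than `n_rows`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Coordinate format: one (row, column, value) triple per nonzero. */
typedef struct {
    size_t nnz;
    const size_t *row;
    const size_t *col;
    const double *val;
} coo_matrix;

/* CSR: the nonzeros of row i occupy positions
 * row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and val. */
typedef struct {
    size_t n_rows;
    size_t nnz;
    size_t *row_ptr; /* n_rows + 1 entries */
    size_t *col_idx; /* nnz entries */
    double *val;     /* nnz entries */
} csr_matrix;

void csr_free(csr_matrix *csr)
{
    free(csr->row_ptr);
    free(csr->col_idx);
    free(csr->val);
    csr->row_ptr = NULL;
    csr->col_idx = NULL;
    csr->val = NULL;
}

/* Builds a CSR copy of coo. Returns 0 on success, -1 if allocation fails. */
int coo_to_csr(const coo_matrix *coo, size_t n_rows, csr_matrix *csr)
{
    size_t slots = coo->nnz ? coo->nnz : 1; /* avoid zero-size malloc */
    size_t *next = malloc((n_rows ? n_rows : 1) * sizeof *next);

    csr->n_rows = n_rows;
    csr->nnz = coo->nnz;
    csr->row_ptr = calloc(n_rows + 1, sizeof *csr->row_ptr);
    csr->col_idx = malloc(slots * sizeof *csr->col_idx);
    csr->val = malloc(slots * sizeof *csr->val);
    if (!next || !csr->row_ptr || !csr->col_idx || !csr->val) {
        free(next);
        csr_free(csr);
        return -1;
    }

    /* Count the nonzeros of each row into row_ptr[row + 1]. */
    for (size_t k = 0; k < coo->nnz; k++)
        csr->row_ptr[coo->row[k] + 1]++;

    /* A prefix sum turns the counts into row start offsets. */
    for (size_t i = 0; i < n_rows; i++)
        csr->row_ptr[i + 1] += csr->row_ptr[i];

    /* Scatter each triple into the next free slot of its row. */
    for (size_t i = 0; i < n_rows; i++)
        next[i] = csr->row_ptr[i];
    for (size_t k = 0; k < coo->nnz; k++) {
        size_t dst = next[coo->row[k]]++;
        csr->col_idx[dst] = coo->col[k];
        csr->val[dst] = coo->val[k];
    }

    free(next);
    return 0;
}
```

Coordinate storage keeps three entries per nonzero, while CSR replaces the row array with `n_rows + 1` offsets, which is what lets a thread locate all nonzeros of a given row directly.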

  8. Section 2: Sparse Matrix-Vector Product.

  9. Sparse Matrix-Vector Product: Syntax. y ← α ∗ A ∗ x + y. Structures: α -> scalar; A -> sparse matrix; x -> dense vector; y -> dense vector.
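
As an illustration of the operation on this slide, here is a minimal sequential C sketch of the update y ← α ∗ A ∗ x + y with A stored in CSR. The function name `csr_usmv` and its argument order are assumptions made for this example; they do not reproduce the SparseBLAS `usmv` interface or the authors' UPC code.

```c
#include <stddef.h>

/* y <- alpha * A * x + y, with A in CSR (zero-based indices).
 * The nonzeros of row i are val[row_ptr[i] .. row_ptr[i + 1] - 1]. */
void csr_usmv(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        /* Dot product of sparse row i with the dense vector x. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] += alpha * sum;
    }
}
```

Each element of y depends on a single row of A, which is why row-ordered formats pair naturally with a distribution by rows.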

  10. Distribution by rows. Characteristics: well-balanced computational workload in the multiplication; unbalanced computational workload in the final additions; gathering of data with only one copy per thread.
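
The sketch below illustrates one way a distribution by rows can be organised: each thread owns a contiguous block of rows and writes only the matching block of y, so its results can be gathered with a single contiguous copy. It is an assumption-laden illustration rather than the authors' scheme: threads are simulated by a sequential loop in plain C, rows are split evenly by count (the slides do not state the balancing criterion), and the names `row_block` and `csr_usmv_by_rows` are hypothetical.

```c
#include <stddef.h>

/* Rows [*first, *last) owned by thread t when n_rows rows are split into
 * contiguous blocks. The first n_rows % n_threads threads get one extra
 * row. Requires n_threads >= 1. */
void row_block(size_t n_rows, size_t n_threads, size_t t,
               size_t *first, size_t *last)
{
    size_t base = n_rows / n_threads;
    size_t extra = n_rows % n_threads;
    *first = t * base + (t < extra ? t : extra);
    *last = *first + base + (t < extra ? 1 : 0);
}

/* y <- alpha * A * x + y with A in CSR, computed block by block.
 * Each iteration of the outer loop stands for the work of one thread. */
void csr_usmv_by_rows(size_t n_rows, const size_t *row_ptr,
                      const size_t *col_idx, const double *val,
                      double alpha, const double *x, double *y,
                      size_t n_threads)
{
    for (size_t t = 0; t < n_threads; t++) {
        size_t first, last;
        row_block(n_rows, n_threads, t, &first, &last);
        /* Thread t touches only y[first .. last - 1]. */
        for (size_t i = first; i < last; i++) {
            double sum = 0.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];
            y[i] += alpha * sum;
        }
    }
}
```

Because the blocks of y are disjoint, no reduction across threads is needed in this sketch.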

  11. Distribution by columns. Characteristics: well-balanced computational workload; gathering of data with one reduce per vector element.
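
The need for "one reduce per vector element" can be seen in the following C sketch of a column distribution over a CSC matrix. Each simulated thread owns a block of columns and accumulates into a private partial vector, because any column can contribute to any row of y; the partial vectors are then summed element by element. The threads are simulated sequentially, the even split of columns is an assumption, and `csc_usmv_by_columns` is a hypothetical name, not the authors' routine.

```c
#include <stddef.h>
#include <stdlib.h>

/* y <- alpha * A * x + y with A in CSC (zero-based indices).
 * Returns 0 on success, -1 on invalid thread count or allocation failure. */
int csc_usmv_by_columns(size_t n_rows, size_t n_cols, const size_t *col_ptr,
                        const size_t *row_idx, const double *val,
                        double alpha, const double *x, double *y,
                        size_t n_threads)
{
    if (n_threads == 0)
        return -1;
    if (n_rows == 0)
        return 0;

    /* One private partial result vector per thread, initialised to zero. */
    double *partial = calloc(n_threads * n_rows, sizeof *partial);
    if (!partial)
        return -1;

    size_t chunk = (n_cols + n_threads - 1) / n_threads;
    for (size_t t = 0; t < n_threads; t++) { /* work of thread t */
        size_t first = t * chunk < n_cols ? t * chunk : n_cols;
        size_t last = first + chunk < n_cols ? first + chunk : n_cols;
        double *p = partial + t * n_rows;
        for (size_t j = first; j < last; j++)
            for (size_t k = col_ptr[j]; k < col_ptr[j + 1]; k++)
                p[row_idx[k]] += val[k] * x[j];
    }

    /* Reduction: every element of y sums one contribution per thread. */
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (size_t t = 0; t < n_threads; t++)
            sum += partial[t * n_rows + i];
        y[i] += alpha * sum;
    }

    free(partial);
    return 0;
}
```

Compared with a row distribution, the multiplication work splits just as easily, but the gathering step grows from one copy per thread to a reduction over all threads for each element of y.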

  12. Distribution by diagonals. Characteristics: unbalanced computational workload; gathering of data with one reduce per vector element.
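
A short C sketch of a diagonal-format product may help explain the imbalance noted on this slide. It assumes a common layout in which diagonal `d` has offset `offsets[d]` (0 for the main diagonal, positive above it, negative below it) and the element A[i][i + offset] is stored at `data[d * n + i]` for a square n × n matrix; this layout and the name `dia_usmv` are assumptions, since the slides do not describe the storage details.

```c
#include <stddef.h>

/* y <- alpha * A * x + y for a square n x n matrix stored by diagonals.
 * Element A[i][i + offsets[d]] is kept at data[d * n + i]; positions that
 * fall outside the matrix are padding and are never read. */
void dia_usmv(size_t n, size_t n_diags, const ptrdiff_t *offsets,
              const double *data, double alpha, const double *x, double *y)
{
    for (size_t d = 0; d < n_diags; d++) {
        ptrdiff_t off = offsets[d];
        size_t shift = off < 0 ? (size_t)(-off) : (size_t)off;
        if (shift >= n)
            continue; /* diagonal lies entirely outside the matrix */

        /* A diagonal with offset off has only n - |off| elements. */
        size_t i_begin = off < 0 ? shift : 0;
        size_t i_end = off > 0 ? n - shift : n;
        for (size_t i = i_begin; i < i_end; i++) {
            size_t j = off < 0 ? i - shift : i + shift;
            y[i] += alpha * data[d * n + i] * x[j];
        }
    }
}
```

Diagonals far from the main one are shorter, so assigning whole diagonals to threads gives them different amounts of work, and several diagonals update the same element of y, which is consistent with the per-element reduce mentioned above.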

  13. Section 3: Sparse Matrix-Matrix Product.

  14. Sparse Matrix-Matrix Product: Syntax. C ← α ∗ A ∗ B + C. Structures: α -> scalar; A -> sparse matrix; B -> dense matrix; C -> dense matrix.
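
The update on this slide can be sketched sequentially in C as follows, with A in CSR and the dense matrices B and C stored in row-major order with `m` columns each. The name `csr_usmm`, the argument order and the row-major layout are assumptions for illustration; they are not the SparseBLAS `usmm` interface or the authors' UPC implementation.

```c
#include <stddef.h>

/* C <- alpha * A * B + C, with A in CSR (n_rows rows, zero-based indices)
 * and B, C dense row-major matrices with m columns. */
void csr_usmm(size_t n_rows, size_t m, const size_t *row_ptr,
              const size_t *col_idx, const double *val, double alpha,
              const double *b, double *c)
{
    for (size_t i = 0; i < n_rows; i++) {
        double *c_row = c + i * m;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* Nonzero A[i][col] scales row col of B into row i of C. */
            double a = alpha * val[k];
            const double *b_row = b + col_idx[k] * m;
            for (size_t j = 0; j < m; j++)
                c_row[j] += a * b_row[j];
        }
    }
}
```

Each nonzero of A now triggers `m` multiply-adds instead of one, so the cost of gathering results grows with the width of C as well as with the number of rows.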

  15. Distribution by rows. Characteristics: well-balanced computational workload in the multiplication; unbalanced computational workload in the additions; gathering of data with only one copy per thread.

  16. Distribution by columns. Characteristics: well-balanced computational workload in the multiplication; well-balanced computational workload in the additions; gathering of data with one copy per thread and row.

  17. Section 4: Experimental Evaluation.

  18. Experimental Results with a Regular Matrix (I).

  19. Experimental Results with a Regular Matrix (II). [Figure: matrix-vector product, nemeth26-large; speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc, diagonal, sky-upper and sky-lower formats.]

  20. Experimental Results with a Regular Matrix (and III). [Figure: matrix-matrix product, nemeth26; speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc, diagonal, sky-upper and sky-lower formats.]

  21. Experimental Results with an Irregular Matrix (I).

  22. Experimental Results with an Irregular Matrix (II). [Figure: matrix-vector product, exdata-large; speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc and diagonal formats.]

  23. Experimental Results with an Irregular Matrix (and III). [Figure: matrix-matrix product, exdata; speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc and diagonal formats.]

  24. Section 5: Conclusions.

  25. Main Conclusions. Summary: high speedups for both routines. Best approach: sparse matrix-vector product -> by rows; sparse matrix-matrix product -> by rows if the sparse matrix is regular, by columns if it is irregular. Future work: study the impact of performing each distribution.
