
SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks. Ernie Chan, Field G. Van Zee, Robert van de Geijn, Paolo Bientinesi, Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí. Software Engineering Seminar – Luc Humair


  1. SuperMatrix: A Multithreaded Runtime Scheduling System for Algorithms-by-Blocks Ernie Chan, Field G. Van Zee, Robert van de Geijn, Paolo Bientinesi, Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí Software Engineering Seminar – Luc Humair

  2. Motivation • Multicore architectures demand concurrent algorithms • Linear algebra libraries are complicated and error-prone


  4. Motivation • SuperMatrix offers a level of abstraction for algorithms-by-blocks: – Automatic parallelization – Straightforward implementation of algorithms-by-blocks • Work with blocked matrices (FLAME/FLASH API) • Dependency analysis • Out-of-order scheduling


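  8. Inversion of a SPD Matrix. Given a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$ (entries below are rounded to one decimal):
     $$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{pmatrix}$$
     1. Cholesky factorization (CHOL): $A = U^T U$
     $$U \approx \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1.4 \end{pmatrix}$$
     2. Inversion of triangular matrix (TRINV): $R := U^{-1}$
     $$R \approx \begin{pmatrix} 1 & -1 & -0.7 \\ 0 & 1 & 0 \\ 0 & 0 & 0.7 \end{pmatrix}$$
     3. Triangular transpose matrix mult. (TTMM): $A^{-1} := R R^T$
     $$A^{-1} = \begin{pmatrix} 2.5 & -1 & -0.5 \\ -1 & 1 & 0 \\ -0.5 & 0 & 0.5 \end{pmatrix}$$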
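The three steps on slide 8 can be sketched in plain Python on the same 3x3 example. This is an unblocked illustration only, not the SuperMatrix or FLAME implementation; the function names `chol_upper`, `trinv_upper`, and `ttmm` are hypothetical and merely mirror the slide's labels.

```python
import math

def chol_upper(A):
    """CHOL: return upper-triangular U with A = U^T U (A assumed SPD)."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal entry: remove the contribution of the rows above.
        s = A[i][i] - sum(U[k][i] ** 2 for k in range(i))
        U[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            t = A[i][j] - sum(U[k][i] * U[k][j] for k in range(i))
            U[i][j] = t / U[i][i]
    return U

def trinv_upper(U):
    """TRINV: return R = U^{-1} for upper-triangular U, column by column."""
    n = len(U)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(U[i][k] * R[k][j] for k in range(i + 1, j + 1))
            R[i][j] = -s / U[i][i]
    return R

def ttmm(R):
    """TTMM: return R R^T for upper-triangular R."""
    n = len(R)
    return [[sum(R[i][k] * R[j][k] for k in range(max(i, j), n))
             for j in range(n)] for i in range(n)]

A = [[1.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 1.0, 3.0]]
U = chol_upper(A)       # step 1
R = trinv_upper(U)      # step 2
A_inv = ttmm(R)         # step 3
for row in A_inv:
    print([round(x, 1) for x in row])
```

Up to rounding, `A_inv` agrees with the matrix shown for step 3, and `U[2][2]` is the square root of 2, which the slide rounds to 1.4.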

  9. Inversion of a SPD Matrix – Proof. For an SPD matrix $A \in \mathbb{R}^{n \times n}$: 1. (CHOL) $A = U^T U$; 2. (TRINV) $R := U^{-1}$; 3. (TTMM) $A^{-1} := R R^T$. Proof:
     $$A A^{-1} \overset{1,3}{=} U^T U R R^T \overset{2}{=} U^T U U^{-1} U^{-T} = U^T U^{-T} = I$$


  14. One variant of computing (CHOL) [figure not reproduced]. Source: Paper


  16. First iteration (4x4 matrix blocks). Source: Paper
      $$\begin{pmatrix} A_{1,1} & A_{1,2} & A_{1,3} & A_{1,4} \\ (A_{2,1}) & A_{2,2} & A_{2,3} & A_{2,4} \\ (A_{3,1}) & (A_{3,2}) & A_{3,3} & A_{3,4} \\ (A_{4,1}) & (A_{4,2}) & (A_{4,3}) & A_{4,4} \end{pmatrix}$$
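As a rough illustration of the 4x4 grid of blocks on slide 16, the sketch below views a dense matrix as a grid of square blocks. It uses plain Python lists and a hypothetical helper `partition`; it is not the FLAME/FLASH API, and it assumes the matrix dimension is divisible by the number of blocks per side.

```python
def partition(A, nb):
    """View an n x n matrix (list of rows) as an nb x nb grid of b x b blocks.

    Assumption for this sketch: n is divisible by nb.
    blocks[i][j] holds a copy of block A_{i+1,j+1} in the slide's 1-based notation.
    """
    n = len(A)
    assert n % nb == 0, "matrix size must be a multiple of the block count"
    b = n // nb
    return [[[row[j * b:(j + 1) * b] for row in A[i * b:(i + 1) * b]]
             for j in range(nb)]
            for i in range(nb)]

# An 8x8 matrix with entries 8*i + j, split into a 4x4 grid of 2x2 blocks.
A = [[8 * i + j for j in range(8)] for i in range(8)]
blocks = partition(A, 4)
print(blocks[1][2])  # block A_{2,3}: rows 2-3, columns 4-5 of A
```

Treating each block as the unit of data is what lets an algorithm-by-blocks express its work as operations on whole blocks rather than on scalar entries.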



  23. First iteration (4x4 matrix blocks). Computations (the slide also shows the dependency graph as a figure, not reproduced here). Source: Paper
      CHOL_0: $A_{1,1} := \mathrm{CHOL}(A_{1,1})$
      TRSM_1: $A_{1,2} := \mathrm{Inv}(A_{1,1})\,A_{1,2}$
      TRSM_2: $A_{1,3} := \mathrm{Inv}(A_{1,1})\,A_{1,3}$
      TRSM_3: $A_{1,4} := \mathrm{Inv}(A_{1,1})\,A_{1,4}$
      SYRK_4: $A_{2,2} := A_{2,2} - A_{1,2}^T A_{1,2}$
      GEMM_5: $A_{2,3} := A_{2,3} - A_{1,2}^T A_{1,3}$
      GEMM_6: $A_{2,4} := A_{2,4} - A_{1,2}^T A_{1,4}$
      SYRK_7: $A_{3,3} := A_{3,3} - A_{1,3}^T A_{1,3}$
      GEMM_8: $A_{3,4} := A_{3,4} - A_{1,3}^T A_{1,4}$
      SYRK_9: $A_{4,4} := A_{4,4} - A_{1,4}^T A_{1,4}$
      • CHOL: Cholesky factorization • TRSM: Triangular solves with multiple right hand sides • SYRK: Symmetric rank-k update • GEMM: Matrix-Matrix multiplication
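The ten first-iteration tasks listed on this slide follow a regular loop pattern. The sketch below enumerates them for an `nb` x `nb` grid of blocks; it only records which operation touches which blocks and performs no arithmetic. The function name `cholesky_tasks` and the tuple layout are assumptions made for illustration, not part of SuperMatrix.

```python
def cholesky_tasks(nb):
    """Enumerate block operations of a right-looking Cholesky by blocks.

    Each task is (operation, output_block, input_blocks); block indices are
    1-based to match the slides. Only the upper triangle is touched.
    """
    tasks = []
    for k in range(1, nb + 1):
        tasks.append(("CHOL", (k, k), []))                  # factor diagonal block
        for j in range(k + 1, nb + 1):
            tasks.append(("TRSM", (k, j), [(k, k)]))        # solve along block row k
        for i in range(k + 1, nb + 1):
            tasks.append(("SYRK", (i, i), [(k, i)]))        # update diagonal block
            for j in range(i + 1, nb + 1):
                tasks.append(("GEMM", (i, j), [(k, i), (k, j)]))  # update off-diagonal
    return tasks

tasks = cholesky_tasks(4)
first_iteration = tasks[:10]
for index, (op, out, ins) in enumerate(first_iteration):
    print(f"{op}_{index}: writes A{out}, reads {ins}")
```

For `nb = 4` the first ten entries come out in the order CHOL, TRSM, TRSM, TRSM, SYRK, GEMM, GEMM, SYRK, GEMM, SYRK, matching the subscripts 0 to 9 on the slide.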


  25. First iteration (4x4 matrix blocks). Computations and dependency graph (figure not reproduced). Source: Paper. The first-iteration tasks CHOL_0 to SYRK_9 are followed by the tasks of the next iterations:
      Second iteration:
      CHOL: $A_{2,2} := \mathrm{CHOL}(A_{2,2})$
      TRSM: $A_{2,3} := \mathrm{Inv}(A_{2,2})\,A_{2,3}$
      TRSM: $A_{2,4} := \mathrm{Inv}(A_{2,2})\,A_{2,4}$
      SYRK: $A_{3,3} := A_{3,3} - A_{2,3}^T A_{2,3}$
      GEMM: $A_{3,4} := A_{3,4} - A_{2,3}^T A_{2,4}$
      SYRK: $A_{4,4} := A_{4,4} - A_{2,4}^T A_{2,4}$
      Third iteration:
      CHOL: $A_{3,3} := \mathrm{CHOL}(A_{3,3})$
      TRSM: $A_{3,4} := \mathrm{Inv}(A_{3,3})\,A_{3,4}$
      SYRK: $A_{4,4} := A_{4,4} - A_{3,4}^T A_{3,4}$
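One way to picture the dependency analysis and out-of-order scheduling named in the motivation is to derive dependencies from the blocks each task reads and writes, then group tasks whose predecessors are all finished. The sketch below does this for the tasks of the 4x4 example. It is a simplified model under stated assumptions (hypothetical function names, a static level-by-level grouping instead of a runtime task queue) and does not reproduce SuperMatrix's actual data structures. The task generator is repeated so the block runs on its own.

```python
def cholesky_tasks(nb):
    """(operation, output_block, input_blocks) for Cholesky by blocks, 1-based."""
    tasks = []
    for k in range(1, nb + 1):
        tasks.append(("CHOL", (k, k), []))
        for j in range(k + 1, nb + 1):
            tasks.append(("TRSM", (k, j), [(k, k)]))
        for i in range(k + 1, nb + 1):
            tasks.append(("SYRK", (i, i), [(k, i)]))
            for j in range(i + 1, nb + 1):
                tasks.append(("GEMM", (i, j), [(k, i), (k, j)]))
    return tasks

def build_dependencies(tasks):
    """deps[t] = indices of earlier tasks that task t must wait for."""
    last_writer = {}   # block -> index of the task that last wrote it
    readers = {}       # block -> tasks that read it since that write
    deps = []
    for t, (op, out, ins) in enumerate(tasks):
        d = set()
        for block in ins + [out]:        # a task also reads the block it updates
            if block in last_writer:
                d.add(last_writer[block])    # flow (read-after-write) dependency
        d.update(readers.get(out, []))       # anti (write-after-read) dependency
        deps.append(d)
        for block in ins:
            readers.setdefault(block, []).append(t)
        last_writer[out] = t
        readers[out] = []
    return deps

def schedule_levels(deps):
    """Group tasks into waves; tasks in one wave are mutually independent."""
    level = []
    for d in deps:
        level.append(1 + max((level[p] for p in d), default=-1))
    waves = {}
    for t, lv in enumerate(level):
        waves.setdefault(lv, []).append(t)
    return [waves[lv] for lv in sorted(waves)]

tasks = cholesky_tasks(4)
deps = build_dependencies(tasks)
waves = schedule_levels(deps)
for step, wave in enumerate(waves):
    print(step, [f"{tasks[t][0]}{tasks[t][1]}" for t in wave])
```

In this model the second-iteration CHOL on block (2,2) depends only on the first-iteration SYRK that updated that block, so a dynamic scheduler could start it before the remaining first-iteration updates finish. That freedom is the kind of out-of-order execution the slides refer to.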
