How Two-sided Matrix Transformation Algorithms Can Benefit from Task Parallelism
Mirko Myllykoski
Department of Computing Science, Umeå University
Nordic Numerical Linear Algebra Meeting, KTH, Stockholm, 21-22 October, 2019
Eigenvalue problems
◮ Given $A, B \in \mathbb{R}^{n \times n}$, find $\lambda_i \in \mathbb{C}$ and $x_i \in \mathbb{C}^n$ such that $A x_i = \lambda_i x_i$ or $A x_i = \lambda_i B x_i$.
◮ Assumption: The matrices $A$ and $B$ are dense and nonsymmetric.
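As a small illustration of why the eigenvalues are sought in $\mathbb{C}$ even though $A$ is real, the sketch below solves the standard problem for a 2 x 2 nonsymmetric matrix through its characteristic polynomial. The helper names `eig2x2` and `residual` are hypothetical and chosen only for this example; this is not how a dense solver computes eigenvalues for general $n$.

```python
import cmath


def eig2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def residual(A, lam, x):
    """Largest entry of |A x - lam x| for a 2 x 2 matrix A."""
    return max(
        abs(sum(A[i][j] * x[j] for j in range(2)) - lam * x[i])
        for i in range(2)
    )


# A real nonsymmetric matrix (rotation by 90 degrees) with a complex pair.
A = [[0.0, -1.0], [1.0, 0.0]]
lam1, lam2 = eig2x2(A[0][0], A[0][1], A[1][0], A[1][1])

# x = (1, -i) is an eigenvector that belongs to lam1 = i.
res = residual(A, lam1, [1.0, -1.0j])
```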
Eigenvalue problems (reduction to real Schur form)
◮ The matrix $A$ is
  ◮ first reduced to upper Hessenberg form $H = Q_1^T A Q_1$ and
  ◮ then gradually reduced to real Schur form $S = Q_2^T H Q_2$.
Figure: An illustration of the two reduction steps in the standard case (panels: Dense, Hessenberg, Schur).
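The first of the two steps can be sketched with Householder reflectors. The code below is a minimal, unblocked, pure-Python illustration under the assumption that $Q_1$ itself is not needed; the function name `hessenberg` and the test matrix are made up for this example, and production codes use blocked variants instead.

```python
import math


def hessenberg(A):
    """Return H = Q^T A Q in upper Hessenberg form (Q is not stored)."""
    n = len(A)
    H = [row[:] for row in A]
    for k in range(n - 2):
        # Householder vector v that maps H[k+1:, k] onto a multiple of e_1.
        x = [H[i][k] for i in range(k + 1, n)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue  # the column is already reduced
        if x[0] > 0.0:
            alpha = -alpha  # choose the sign that avoids cancellation
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        m = len(v)
        # Left update: rows k+1..n-1 of H <- (I - 2 v v^T / v^T v) H.
        for j in range(n):
            f = 2.0 * sum(v[i] * H[k + 1 + i][j] for i in range(m)) / vnorm2
            for i in range(m):
                H[k + 1 + i][j] -= f * v[i]
        # Right update: columns k+1..n-1 of H <- H (I - 2 v v^T / v^T v).
        for i in range(n):
            f = 2.0 * sum(H[i][k + 1 + j] * v[j] for j in range(m)) / vnorm2
            for j in range(m):
                H[i][k + 1 + j] -= f * v[j]
    return H


A = [
    [4.0, 1.0, 2.0, 3.0],
    [2.0, 3.0, 0.0, 1.0],
    [1.0, 5.0, 2.0, 1.0],
    [3.0, 1.0, 6.0, 1.0],
]
H = hessenberg(A)
```

Because each reflector is applied from both sides, the result is orthogonally similar to $A$, so quantities such as the trace are preserved.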
Why? (task-based approach versus ScaLAPACK)
Figure: Improvement compared to ScaLAPACK. Up to 256 cores. Both panels plot relative runtime against matrix dimension (20k to 120k).
(a) Schur reduction [1]: StarNEig versus PDHSEQR, 1.6 - 2.9 fold speedup.
(b) Eigenvalue reordering: StarNEig versus PDTRSEN, 2.8 - 5.0 fold speedup.
[1] https://github.com/NLAFET/SEVP-PDHSEQR-Alg953/
Background (double-shift QR algorithm)
◮ The first column of $H$ is transformed to the first column of $(H - \lambda_1 I)(H - \lambda_2 I)$, where the shifts $\lambda_1, \lambda_2 \in \mathbb{C}$ are the eigenvalues of a small $2 \times 2$ submatrix.
◮ The resulting $3 \times 3$ bulge is chased across the diagonal of $H$.
Figure: An illustration of how the bulge is created and chased (labels: 2 x 2 eigenvalue problem, bulge).
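The first column of the shift polynomial can be formed in real arithmetic, because $(H - \lambda_1 I)(H - \lambda_2 I) = H^2 - (\lambda_1 + \lambda_2) H + \lambda_1 \lambda_2 I$ and the sum and product of the shifts are the trace and determinant of the $2 \times 2$ submatrix. The sketch below assumes that the submatrix is the trailing $2 \times 2$ block, which is a common choice but is not stated on the slide; the function name and the test matrix are hypothetical.

```python
def shift_polynomial_first_column(H):
    """First column of (H - l1 I)(H - l2 I) for an upper Hessenberg H.

    Assumption: l1 and l2 are the eigenvalues of the trailing 2 x 2 block.
    """
    n = len(H)
    s = H[n - 2][n - 2] + H[n - 1][n - 1]  # l1 + l2 (trace)
    t = H[n - 2][n - 2] * H[n - 1][n - 1] - H[n - 2][n - 1] * H[n - 1][n - 2]  # l1 * l2 (determinant)
    h1 = [H[i][0] for i in range(n)]  # first column of H
    hh1 = [sum(H[i][k] * h1[k] for k in range(n)) for i in range(n)]  # first column of H^2
    col = [hh1[i] - s * h1[i] for i in range(n)]
    col[0] += t
    return col


H = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 2.0, 3.0, 4.0],
    [0.0, 3.0, 1.0, 2.0, 3.0],
    [0.0, 0.0, 4.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 5.0, 1.0],
]
col = shift_polynomial_first_column(H)
```

Only the first three entries of `col` can be nonzero when $H$ is upper Hessenberg, which is why a reflector built from this column disturbs the Hessenberg structure only in a $3 \times 3$ bulge.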
Background (multi-shift QR algorithm and level 3 BLAS)
◮ A modern multi-shift QR algorithm
  ◮ groups together a set of bulges and
  ◮ initially applies the transformations only within a small diagonal window.
◮ The transformations are accumulated and propagated with level 3 BLAS operations.
Figure: An illustration of accumulated transformations (labels: Group transformations; Apply locally, in L2 cache; Propagate with BLAS-3 updates).
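The idea of accumulating transformations can be illustrated on a toy scale. In the sketch below, Givens rotations stand in for the small transformations produced inside a window (an assumption made for brevity; the actual algorithm uses small reflectors). Applying the rotations to an off-window block one at a time gives the same result as accumulating them into a single matrix `U` and applying `U` with one matrix-matrix product, which is the operation that level 3 BLAS performs efficiently. All names and sizes are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain matrix-matrix product (stands in for a level 3 BLAS call)."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def apply_rotation_rows(M, i, c, s):
    """Apply a Givens rotation to rows i and i+1 of M in place."""
    for j in range(len(M[0])):
        a, b = M[i][j], M[i + 1][j]
        M[i][j] = c * a + s * b
        M[i + 1][j] = -s * a + c * b


w = 4  # window size
rotations = [(i, math.cos(th), math.sin(th)) for i, th in [(0, 0.3), (1, 1.1), (2, -0.7)]]

# Off-window block that shares its rows with the window.
B = [[float(i + 2 * j + 1) for j in range(6)] for i in range(w)]

# Path 1: apply every rotation to the block directly, one at a time.
B1 = [row[:] for row in B]
for i, c, s in rotations:
    apply_rotation_rows(B1, i, c, s)

# Path 2: accumulate the rotations into U, then propagate with one product.
U = [[1.0 if i == j else 0.0 for j in range(w)] for i in range(w)]
for i, c, s in rotations:
    apply_rotation_rows(U, i, c, s)
B2 = matmul(U, B)

max_diff = max(abs(B1[i][j] - B2[i][j]) for i in range(w) for j in range(6))
```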
Background (bulge chasing in ScaLAPACK)
◮ With $p$ cores, we can have up to $\sqrt{p}$ concurrent windows.
◮ The transformations are broadcast and applied in parallel.
◮ The theoretically possible degree of parallelism is $p$.
Figure: An illustration of the bulge chasing stage.
Figure: A hypothetical trace for the bulge chasing stage (axes: cores / ranks, time).
Background (multi-shift QR algorithm with AED)
Figure: An illustration of the multi-shift QR algorithm with AED (aggressive early deflation). Labels: Hessenberg reduction, Schur reduction, Spike, Deflate, Reorder, Shifts, Bulge chasing.
Background (AED in ScaLAPACK)
Figure: A trace over cores / ranks and time in which the AED phases are marked.
Task-based approach (task graphs)
◮ The computational work is cut into self-contained tasks.
◮ The tasks are inserted into a runtime system.
◮ The runtime system derives the task dependencies.
◮ The task dependencies can be visualized as a task graph.
Figure: A task graph whose nodes are tasks labeled W, L and R and whose edges are dependencies.
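How a runtime system might derive dependencies can be sketched from the data each task reads and writes. The code below is a toy model, not the interface of any particular runtime system: tasks are inserted in sequential order, and an edge is added whenever a task touches data that an earlier task wrote, or overwrites data that an earlier task read. The task names reuse the W, L and R labels from the figure, but the data handles (`D0`, `Q0`, `A`, `B`, ...) and the access patterns are hypothetical.

```python
def derive_dependencies(tasks):
    """Derive task graph edges from sequentially inserted tasks.

    tasks: list of (name, reads, writes) in insertion order.
    Returns a set of (predecessor, successor) pairs.
    """
    last_writer = {}  # data handle -> task that wrote it last
    readers = {}      # data handle -> tasks that read it since the last write
    edges = set()
    for name, reads, writes in tasks:
        for d in reads:
            if d in last_writer:
                edges.add((last_writer[d], name))  # read after write
        for d in writes:
            if d in last_writer:
                edges.add((last_writer[d], name))  # write after write
            for r in readers.get(d, []):
                edges.add((r, name))               # write after read
        for d in reads:
            readers.setdefault(d, []).append(name)
        for d in writes:
            last_writer[d] = name
            readers[d] = []
    return edges


tasks = [
    ("W0", {"D0"}, {"D0", "Q0"}),    # window task produces transformations Q0
    ("R0", {"Q0", "A"}, {"A"}),      # update that consumes Q0
    ("L0", {"Q0", "B"}, {"B"}),      # another update that consumes Q0
    ("W1", {"D1"}, {"D1", "Q1"}),    # independent window task
    ("R1", {"Q1", "A"}, {"A"}),      # must wait for W1 and for R0
]
edges = derive_dependencies(tasks)
```

In this toy graph `W0` and `W1` share no data, so nothing forces one to wait for the other.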
Task-based approach (more opportunities for concurrency)
◮ Real-life task graphs are much more complex,
◮ but they expose more opportunities for increased concurrency.
◮ The runtime system traverses the task graph.
◮ No global synchronization.
◮ Computational steps are allowed to overlap and merge.
◮ Other benefits of the task-based approach include
  ◮ better load balancing,
  ◮ task priorities,
  ◮ accelerator support (GPUs) and
  ◮ implicit MPI communication.
Task-based approach (traversal)
Figure: An illustration of how the task graph is traversed (labels: critical path, dependences, ready for scheduling, can be scheduled).
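The traversal can be mimicked with a ready queue: a task becomes ready once all of its predecessors have finished, and among the ready tasks the one with the highest priority is picked first. The sketch below is a single-threaded simulation under assumed priorities (window tasks on the critical path get the highest value); the task names, edges and priority values are hypothetical, and a real runtime system would run ready tasks concurrently on several workers.

```python
import heapq


def traverse(tasks, edges, priority):
    """Return one execution order that respects edges and prefers high priority."""
    pending = {t: 0 for t in tasks}      # number of unfinished predecessors
    successors = {t: [] for t in tasks}
    for a, b in edges:
        pending[b] += 1
        successors[a].append(b)
    # heapq is a min-heap, so priorities are negated; ties break by name.
    ready = [(-priority[t], t) for t in tasks if pending[t] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, t = heapq.heappop(ready)
        order.append(t)                  # "execute" the task
        for s in successors[t]:
            pending[s] -= 1
            if pending[s] == 0:          # all inputs available: ready for scheduling
                heapq.heappush(ready, (-priority[s], s))
    return order


tasks = ["W0", "R0", "L0", "W1", "R1"]
edges = [("W0", "R0"), ("W0", "L0"), ("W1", "R1"), ("R0", "R1")]
priority = {"W0": 2, "W1": 2, "R0": 1, "R1": 1, "L0": 0}
order = traverse(tasks, edges, priority)
```

There is no global synchronization point in this loop: `W1` runs before the updates that follow `W0`, which is a small-scale version of computational steps overlapping.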
Trace (first AED)
Figure: An illustration of the first AED window.
Trace (bulges are introduced from the top left corner)
Figure: An illustration of the beginning of the bulge chasing stage.
Trace (bulges are chased across the diagonal)
Figure: An illustration of the middle of the bulge chasing stage.
Trace (delayed update wave follows the bulges)
Figure: An illustration of the update wave that follows the bulges.
Trace (bulge chasing stage ends)
Figure: An illustration of the end of the bulge chasing stage.
Trace (second AED)
Figure: An illustration of the second AED window.
Trace (third AED)
Figure: An illustration of the third AED window.
Trace (fourth AED)
Figure: An illustration of the fourth AED window.
Trace (second bulge chasing stage begins)
Figure: An illustration of the beginning of the second bulge chasing stage.
Trace (two merged bulge chasing stages)
Figure: An illustration of two merged bulge chasing stages.