Communication-avoiding Krylov subspace methods
Mark Hoemmen, mhoemmen@cs.berkeley.edu
University of California Berkeley EECS
SIAM Parallel Processing for Scientific Computing 2008
Outline: Motivation, Break the dependency, Previous work, Preconditioning, Future work, Summary

Overview
  Current Krylov methods: communication-limited
  Can rearrange them to avoid communication
  Can do this in a numerically stable way
  Requires rethinking preconditioning

Motivation
  Two communication-bound kernels
  Can rearrange each kernel to avoid communication, but...
  Data dependency between the two precludes rearrangement...
  Unless you rearrange the Krylov method!

Krylov methods: Two communication-bound kernels
  Sparse matrix-vector multiplication (SpMV)
    Share/communicate source vector w/ neighbors
    Low computational intensity per processor
  Orthogonalization: Θ(1) reductions per vector
    Arnoldi/GMRES: Modified Gram-Schmidt or Householder QR
    Lanczos/CG: Recurrence orthogonalizes implicitly

Potential to avoid communication
  SpMV: Matrix powers kernel (Marghoob)
    Compute [v, Av, A^2 v, ..., A^s v]
    Tiling to reuse matrix entries
    Parallel: same latency cost as one SpMV
    Sequential: only read matrix O(1) times
  Orthogonalization: TSQR (Julien)
    Just as stable as Householder QR
    Parallel: same latency cost as one reduction
    Sequential: only read vectors once
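
The TSQR idea from the slide above can be sketched in NumPy. This is a minimal sequential illustration, not the implementation from the talk: the function name `tsqr`, the `num_blocks` parameter, and the flat (one-level) combine step are assumptions made for the example. A parallel version would typically combine the small R factors along a reduction tree, with one row block per processor.

```python
import numpy as np

def tsqr(W, num_blocks=4):
    """One-level TSQR sketch: QR of a tall-skinny W via per-block QRs
    followed by a single QR of the stacked R factors."""
    blocks = np.array_split(W, num_blocks, axis=0)
    # Local QR on each row block; these are independent of each other.
    local = [np.linalg.qr(B) for B in blocks]
    # Stack the small R factors and factor them once (the one "reduction").
    R_stack = np.vstack([Ri for _, Ri in local])
    Q2, R = np.linalg.qr(R_stack)
    # Row offsets of each local R inside the stacked matrix.
    offsets = np.cumsum([0] + [Ri.shape[0] for _, Ri in local])
    # Global Q: each local Q times its matching slice of Q2.
    Q = np.vstack([
        Qi @ Q2[offsets[i]:offsets[i + 1]]
        for i, (Qi, _) in enumerate(local)
    ])
    return Q, R
```

Each row block is read once for its local QR and once more to form the explicit Q; the only step that combines information across blocks is the small QR of the stacked R factors.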

Problem: Data dependencies limit reuse
  Krylov methods advance one vector at a time
  SpMV, then orthogonalize, then SpMV, ...
  Figure: Data dependencies in Krylov subspace methods.

s-step Krylov methods: break the dependency
  Matrix powers kernel
    Compute basis of span{v, Av, A^2 v, ..., A^s v}
  TSQR
    Orthogonalize basis
  Use R factor to reconstruct upper Hessenberg H (resp. tridiagonal T)
  Solve least squares problem or linear system with H (resp. T) for coefficients of solution update

Example: GMRES

Original GMRES
  for k = 1 to s do
    w = A v_{k-1}
    Orthogonalize w against v_0, ..., v_{k-1} using Modified Gram-Schmidt
  end for
  Compute solution using H

Version 2: Matrix powers kernel & TSQR
  W = [v_0, A v_0, A^2 v_0, ..., A^s v_0]
  [Q, R] = TSQR(W)
  Compute H using R
  Compute solution using H
s powers of A for no extra latency cost
s steps of QR for one step of latency
But...
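
One way to carry out the "Compute H using R" step, sketched under assumptions: since A W[:, 0:s] = W[:, 1:s+1] and W = QR, it follows that A Q[:, 0:s] = Q H with H = R[:, 1:s+1] R[0:s, 0:s]^{-1}. The slides do not spell out this formula, so treat it as one consistent reconstruction. The names `monomial_basis`, `hessenberg_from_r`, and `s_step_gmres_cycle` are hypothetical, and `np.linalg.qr` stands in for TSQR.

```python
import numpy as np

def monomial_basis(A, v, s):
    """W = [v, A v, ..., A^s v] with v scaled to unit norm (plain loop;
    a matrix powers kernel would compute the same columns with less traffic)."""
    W = np.empty((v.shape[0], s + 1))
    W[:, 0] = v / np.linalg.norm(v)
    for k in range(s):
        W[:, k + 1] = A @ W[:, k]
    return W

def hessenberg_from_r(R):
    """H = R[:, 1:] @ inv(R[:s, :s]), computed with a solve instead of an inverse."""
    s = R.shape[1] - 1
    return np.linalg.solve(R[:s, :s].T, R[:, 1:].T).T

def s_step_gmres_cycle(A, b, x0, s):
    """One group of s steps: basis first, then one QR, then the small solve."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    W = monomial_basis(A, r0, s)
    Q, R = np.linalg.qr(W)               # stand-in for TSQR
    H = hessenberg_from_r(R)
    rhs = np.zeros(s + 1)
    rhs[0] = beta * R[0, 0]              # r0 = beta * W[:, 0] = beta * R[0, 0] * Q[:, 0]
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return x0 + Q[:, :s] @ y
```

The sketch is only trustworthy for small s and a benign spectrum, because the solve with R[0:s, 0:s] inherits the conditioning of the monomial basis.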

Basis computation not stable
  v, Av, A^2 v, ... looks familiar...
  It's the power method!
    Converges to principal eigenvector of A
  Expect increasing linear dependence...
  Basis condition number exponential in s
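
The growth of the monomial basis condition number can be observed directly with a small experiment. This is a sketch under assumptions: the helper name `monomial_basis_cond` is hypothetical, columns are normalized so that only the loss of linear independence is measured (not the growth in column norms), and the test matrix is a small 1-D Poisson operator chosen only to keep the example cheap.

```python
import numpy as np

def monomial_basis_cond(A, v, s):
    """2-norm condition number of [v, A v, ..., A^s v] with unit-norm columns."""
    W = np.empty((v.shape[0], s + 1))
    W[:, 0] = v / np.linalg.norm(v)
    for k in range(s):
        w = A @ W[:, k]
        W[:, k + 1] = w / np.linalg.norm(w)   # rescaling keeps the same span
    return np.linalg.cond(W)

if __name__ == "__main__":
    n = 100
    # Small 1-D Poisson (tridiagonal) matrix, an assumption for illustration.
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    v = np.random.default_rng(0).standard_normal(n)
    for s in (2, 4, 8, 12):
        print(s, monomial_basis_cond(A, v, s))
```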

Version 3: Different basis
  Just like polynomial interpolation
  Use a different basis, e.g.:
  Newton basis W = [v, (A − θ_1 I) v, (A − θ_2 I)(A − θ_1 I) v, ...]
    Get shifts θ_i for free: Ritz values
    Can change shifts with each group of s
  Chebyshev basis W = [v, T_1(A) v, T_2(A) v, ...]
    Use condition number bounds to scale T_k(z)
    Uncertain sensitivity of κ_2(W) to bounds
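
The Newton basis recurrence can be sketched as follows. The function name `newton_basis` is hypothetical, and the shifts are simply passed in: obtaining them from Ritz values and choosing their order are outside this sketch. Each column is rescaled to unit norm, which does not change the span.

```python
import numpy as np

def newton_basis(A, v, shifts):
    """Columns span {v, (A - th_1 I) v, (A - th_2 I)(A - th_1 I) v, ...}.
    `shifts` holds th_1, ..., th_s; each new column is normalized."""
    W = np.empty((v.shape[0], len(shifts) + 1))
    W[:, 0] = v / np.linalg.norm(v)
    for k, theta in enumerate(shifts):
        w = A @ W[:, k] - theta * W[:, k]   # apply (A - theta I) to the last column
        W[:, k + 1] = w / np.linalg.norm(w)
    return W
```

With all shifts set to zero this reduces to the (normalized) monomial basis, so `np.linalg.cond(newton_basis(A, v, shifts))` can be compared against the zero-shift case for a given matrix and shift choice.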

Basis condition number
  Figure: Condition number of various bases as a function of basis length s. Matrix A is a 10^6 × 10^6 2-D Poisson operator.

Numerical experiments
  Diagonal 10^4 × 10^4 matrix, κ_2(A) = 10^8
  s = 24
    Newton: basis condition # about 10^14
    Monomial: basis condition # about 10^16

Better basis pays off: restarting
  Figure: GMRES(24,1) residuals, cond(A) = 1e8, n = 1e4. Log base 10 of the 2-norm relative residual error versus iteration count for Standard(24,1), Monomial(24,1), and Newton(24,1). Restart after every group of s steps.

Better basis pays off: less restarting
  Figure: GMRES(24,8) residuals, cond(A) = 1e8, n = 1e4. Log base 10 of the 2-norm relative residual error versus iteration count for Standard(24,8), Monomial(24,8), and Newton(24,8). Restart after 8 groups of s = 24 steps.