MaPHyS, or the development of a parallel algebraic domain decomposition solver in the course of the Solstice project (PowerPoint presentation transcript)


  1. MaPHyS, or the development of a parallel algebraic domain decomposition solver in the course of the Solstice project
  Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Azzam Haidar, Yohan Lee-Tin-Yien, Jean Roman
  HiePACS project, INRIA Bordeaux Sud-Ouest, joint INRIA-CERFACS lab on High Performance Computing
  CERFACS Sparse Days, Toulouse, June 2010

  2. Outline
  1. Motivations
  2. A parallel algebraic domain decomposition solver
  3. Parallel and numerical scalability on 3D academic problems
  4. Parallel and numerical scalability on 3D Solstice problems
  5. Prospectives

  3. Motivations
  A x = b: the "spectrum" of linear algebra solvers

  Direct solvers:
  - Robust/accurate for general problems
  - BLAS-3 based implementations
  - Memory/CPU prohibitive for large 3D problems
  - Limited parallel scalability

  Iterative solvers:
  - Problem-dependent efficiency, controlled accuracy
  - Only matrix-vector products required, fine-grain computation
  - Less memory, possible trade-off between memory and CPU
  - Attractive "built-in" parallel features

  4. Overlapping Domain Decomposition: Classical Additive Schwarz preconditioners

  [Figure: two subdomains Ω1 and Ω2 with an overlap of width δ]

  - Goal: solve the linear system A x = b
  - Use an iterative method and apply the preconditioner at each step
  - The convergence rate deteriorates as the number of subdomains increases

  A = \begin{pmatrix} A_{1,1} & A_{1,\delta} & \\ A_{\delta,1} & A_{\delta,\delta} & A_{\delta,2} \\ & A_{2,\delta} & A_{2,2} \end{pmatrix}
  \;\Longrightarrow\;
  M_{AS} = \mathcal{R}_{\delta_1}^T \begin{pmatrix} A_{1,1} & A_{1,\delta} \\ A_{\delta,1} & A_{\delta,\delta} \end{pmatrix}^{-1} \mathcal{R}_{\delta_1}
         + \mathcal{R}_{\delta_2}^T \begin{pmatrix} A_{\delta,\delta} & A_{\delta,2} \\ A_{2,\delta} & A_{2,2} \end{pmatrix}^{-1} \mathcal{R}_{\delta_2}

  Classical Additive Schwarz preconditioner, N-subdomain case:
  M_{AS} = \sum_{i=1}^{N} \mathcal{R}_{\delta_i}^T \left( A_{\delta_i} \right)^{-1} \mathcal{R}_{\delta_i}
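
  As a minimal sketch of the classical one-level additive Schwarz preconditioner (not the MaPHyS code), the example below builds M_AS = Σ_i R_i^T A_{δ_i}^{-1} R_i for a 1D Poisson test matrix with two overlapping index sets chosen purely for illustration, and passes it to SciPy's CG:

```python
# Toy one-level additive Schwarz preconditioner for CG (illustrative sketch).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")  # 1D Poisson
b = np.ones(n)

# Two overlapping subdomains with an overlap of width delta (illustrative sizes).
delta = 10
domains = [np.arange(0, n // 2 + delta), np.arange(n // 2 - delta, n)]

# Factorize each local Dirichlet matrix A_i = R_i A R_i^T once.
local_lu = [spla.splu(A[d, :][:, d].tocsc()) for d in domains]

def apply_M_AS(r):
    """z = M_AS r = sum_i R_i^T A_i^{-1} R_i r."""
    z = np.zeros_like(r)
    for d, lu in zip(domains, local_lu):
        z[d] += lu.solve(r[d])
    return z

M = spla.LinearOperator((n, n), matvec=apply_M_AS)
x, info = spla.cg(A, b, M=M)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```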

  5. Non-overlapping Domain Decomposition: Schur complement reduced system

  [Figure: two subdomains Ω1 and Ω2 separated by the interface Γ]

  - Goal: solve the linear system A x = b
  - Apply a partial Gaussian elimination
  - Solve the reduced system S x_Γ = f
  - Then solve A_{i,i} x_i = b_i - A_{i,Γ} x_Γ

  After partial elimination of the interior unknowns:
  \begin{pmatrix} A_{1,1} & 0 & A_{1,\Gamma} \\ 0 & A_{2,2} & A_{2,\Gamma} \\ 0 & 0 & S \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_\Gamma \end{pmatrix}
  = \begin{pmatrix} b_1 \\ b_2 \\ b_\Gamma - \sum_{i=1}^{2} A_{\Gamma,i} A_{i,i}^{-1} b_i \end{pmatrix}

  Solve A x = b  ⟹  solve the reduced system S x_Γ = f  ⟹  then solve A_{i,i} x_i = b_i - A_{i,Γ} x_Γ, where
  S = A_{\Gamma,\Gamma} - \sum_{i=1}^{2} A_{\Gamma,i} A_{i,i}^{-1} A_{i,\Gamma}
  \quad\text{and}\quad
  f = b_\Gamma - \sum_{i=1}^{2} A_{\Gamma,i} A_{i,i}^{-1} b_i .
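
  The two-subdomain reduction can be written out directly in dense linear algebra. The sketch below uses a small random test matrix with arbitrary block sizes (purely illustrative) to form S and f, solve the reduced interface system, and recover the interior unknowns by back-substitution:

```python
# Dense toy sketch of the Schur complement reduction for two subdomains.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, ng = 6, 5, 3                              # interior sizes and interface size
N = n1 + n2 + ng
A = rng.standard_normal((N, N)) + N * np.eye(N)   # well-conditioned test matrix
b = rng.standard_normal(N)

i1 = slice(0, n1)
i2 = slice(n1, n1 + n2)
iG = slice(n1 + n2, N)
A[i1, i2] = 0.0
A[i2, i1] = 0.0                                   # no direct coupling between interiors

# S = A_GG - sum_i A_{G,i} A_{i,i}^{-1} A_{i,G};  f = b_G - sum_i A_{G,i} A_{i,i}^{-1} b_i
S = A[iG, iG].copy()
f = b[iG].copy()
for ii in (i1, i2):
    S -= A[iG, ii] @ np.linalg.solve(A[ii, ii], A[ii, iG])
    f -= A[iG, ii] @ np.linalg.solve(A[ii, ii], b[ii])

xG = np.linalg.solve(S, f)                        # reduced interface system
x = np.empty(N)
x[iG] = xG
for ii in (i1, i2):                               # back-substitution for the interiors
    x[ii] = np.linalg.solve(A[ii, ii], b[ii] - A[ii, iG] @ xG)

print("residual:", np.linalg.norm(A @ x - b))
```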

  6. Non-overlapping Domain Decomposition: Schur complement reduced system

  [Figure: subdomains Ω_ι, Ω_{ι+1}, Ω_{ι+2} with interface pieces k, ℓ, m, n; Γ = k ∪ ℓ ∪ m ∪ n]

  Distributed Schur complement (one local Schur complement per subdomain):
  \Omega_\iota:\;\begin{pmatrix} S^{(\iota)}_{kk} & S_{k\ell} \\ S_{\ell k} & S^{(\iota)}_{\ell\ell} \end{pmatrix},\qquad
  \Omega_{\iota+1}:\;\begin{pmatrix} S^{(\iota+1)}_{\ell\ell} & S_{\ell m} \\ S_{m\ell} & S^{(\iota+1)}_{mm} \end{pmatrix},\qquad
  \Omega_{\iota+2}:\;\begin{pmatrix} S^{(\iota+2)}_{mm} & S_{mn} \\ S_{nm} & S^{(\iota+2)}_{nn} \end{pmatrix}

  In an assembled form: S_{\ell\ell} = S^{(\iota)}_{\ell\ell} + S^{(\iota+1)}_{\ell\ell}
  \;\Longrightarrow\; S_{\ell\ell} = \sum_{\iota \in \mathrm{adj}} S^{(\iota)}_{\ell\ell}
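
  A small illustration of the assembly rule above, with an assumed dictionary layout (not the MaPHyS data structure): each subdomain contributes its local diagonal block for a shared interface piece, and the assembled block is the sum over the adjacent subdomains:

```python
# Assemble S_ll = sum_{iota in adj(l)} S_ll^(iota) from per-subdomain contributions.
import numpy as np

local_blocks = {
    ("dom_i",   "l"): np.array([[4.0, -1.0], [-1.0, 4.0]]),
    ("dom_i+1", "l"): np.array([[3.0, -0.5], [-0.5, 3.0]]),
    ("dom_i+1", "m"): np.array([[5.0, -1.0], [-1.0, 5.0]]),
    ("dom_i+2", "m"): np.array([[2.0, -0.5], [-0.5, 2.0]]),
}

assembled = {}
for (dom, piece), block in local_blocks.items():
    assembled[piece] = assembled.get(piece, 0.0) + block

print("assembled S_ll =\n", assembled["l"])   # sum of the two contributions on piece l
```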

  7. Non-overlapping Domain Decomposition: Algebraic Additive Schwarz preconditioner [L. Carvalho, L. Giraud, G. Meurant - 01]

  S = \sum_{i=1}^{N} R_{\Gamma_i}^T S^{(i)} R_{\Gamma_i}

  S = \begin{pmatrix} \ddots & & & & \\ & S_{kk} & S_{k\ell} & & \\ & S_{\ell k} & S_{\ell\ell} & S_{\ell m} & \\ & & S_{m\ell} & S_{mm} & S_{mn} \\ & & & S_{nm} & S_{nn} \end{pmatrix}
  \;\Longrightarrow\;
  M = \sum_{i=1}^{N} R_{\Gamma_i}^T \left( \bar{S}^{(i)} \right)^{-1} R_{\Gamma_i}

  where \bar{S}^{(i)} is obtained from S^{(i)} by assembling its diagonal blocks:

  S^{(i)} = \begin{pmatrix} S^{(i)}_{kk} & S_{k\ell} \\ S_{\ell k} & S^{(i)}_{\ell\ell} \end{pmatrix} \text{ (local Schur)}
  \;\Longrightarrow\;
  \bar{S}^{(i)} = \begin{pmatrix} S_{kk} & S_{k\ell} \\ S_{\ell k} & S_{\ell\ell} \end{pmatrix} \text{ (local assembled Schur)},
  \qquad S_{\ell\ell} = \sum_{\iota \in \mathrm{adj}} S^{(\iota)}_{\ell\ell}

  Similarity with the Neumann-Neumann preconditioner [J.-F. Bourgat, R. Glowinski, P. Le Tallec and M. Vidrascu - 89], [Y.-H. De Roeck, P. Le Tallec and M. Vidrascu - 91].

  8. Parallel preconditioning features

  [Figure: subdomain Ω_i with interface pieces E_g, E_m, E_k, E_ℓ and a neighbouring subdomain Ω_j]

  S^{(i)} = A^{(i)}_{\Gamma_i \Gamma_i} - A_{\Gamma_i I_i} A_{I_i I_i}^{-1} A_{I_i \Gamma_i}

  M_{AS} = \sum_{i=1}^{\#\mathrm{domains}} R_i^T \left( \bar{S}^{(i)} \right)^{-1} R_i

  Local Schur complement:
  S^{(i)} = \begin{pmatrix} S^{(i)}_{mm} & S_{mg} & S_{mk} & S_{m\ell} \\ S_{gm} & S^{(i)}_{gg} & S_{gk} & S_{g\ell} \\ S_{km} & S_{kg} & S^{(i)}_{kk} & S_{k\ell} \\ S_{\ell m} & S_{\ell g} & S_{\ell k} & S^{(i)}_{\ell\ell} \end{pmatrix}

  Assembled local Schur complement:
  \bar{S}^{(i)} = \begin{pmatrix} S_{mm} & S_{mg} & S_{mk} & S_{m\ell} \\ S_{gm} & S_{gg} & S_{gk} & S_{g\ell} \\ S_{km} & S_{kg} & S_{kk} & S_{k\ell} \\ S_{\ell m} & S_{\ell g} & S_{\ell k} & S_{\ell\ell} \end{pmatrix},
  \qquad S_{mm} = \sum_{j \in \mathrm{adj}(m)} S^{(j)}_{mm}

  9. Parallel implementation

  - Each subdomain A^{(i)} is handled by one processor:
    A^{(i)} \equiv \begin{pmatrix} A_{I_i I_i} & A_{I_i \Gamma_i} \\ A_{\Gamma_i I_i} & A^{(i)}_{\Gamma\Gamma} \end{pmatrix}
  - Concurrent partial factorizations are performed on each processor to form the so-called "local Schur complement":
    S^{(i)} = A^{(i)}_{\Gamma\Gamma} - A_{\Gamma_i I_i} A_{I_i I_i}^{-1} A_{I_i \Gamma_i}
  - The reduced system S x_Γ = f is solved using a distributed Krylov solver:
    - one matrix-vector product per iteration: each processor computes S^{(i)} (x^{(i)}_\Gamma)_k = (y^{(i)})_k (see the sketch below)
    - one local preconditioner application: (M^{(i)}) (z^{(i)})_k = (r^{(i)})_k
    - local neighbour-to-neighbour communication per iteration
    - global reduction (dot products)
  - Compute simultaneously the solution for the interior unknowns:
    A_{I_i I_i} x_{I_i} = b_{I_i} - A_{I_i \Gamma_i} x_{\Gamma_i}
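
  In the actual solver each subdomain is owned by one process and the shared-entry sums are neighbour-to-neighbour exchanges; the serial loop below is only a stand-in for that layout, sketching the structure of the matrix-vector product performed at every Krylov iteration (y = S x_Γ with S = Σ_i R_i^T S^{(i)} R_i):

```python
# Serial sketch of the distributed Schur matrix-vector product (illustrative only).
import numpy as np

def distributed_schur_matvec(x_gamma, local_S, local_idx):
    """y = S x_gamma with S = sum_i R_i^T S^(i) R_i; local_idx[i] maps local to
    global interface unknowns (per-processor compute, then assembly of shared entries)."""
    y = np.zeros_like(x_gamma)
    for S_i, idx in zip(local_S, local_idx):
        y[idx] += S_i @ x_gamma[idx]   # local S^(i) x^(i), summed on shared entries
    return y

# Two subdomains sharing interface unknown 1 of a 3-unknown interface.
local_S = [np.array([[2.0, -1.0], [-1.0, 2.0]]),
           np.array([[3.0, -1.0], [-1.0, 3.0]])]
local_idx = [np.array([0, 1]), np.array([1, 2])]
print(distributed_schur_matvec(np.ones(3), local_S, local_idx))
```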

  10. Algebraic Additive Schwarz preconditioner

  Main characteristics in 2D:
  - The ratio interface/interior is small
  - Does not require a large amount of memory to store the preconditioner
  - Computation and application of the preconditioner are fast: they consist in calls to LAPACK/BLAS-2 kernels

  Main characteristics in 3D:
  - The ratio interface/interior is large
  - The storage of the preconditioner might not be affordable
  - The construction of the preconditioner can be computationally expensive

  ⟹ Need cheaper forms of the Algebraic Additive Schwarz preconditioner

  11. How to alleviate the preconditioner construction

  Sparsification strategy through dropping (see the sketch below):
  \hat{s}_{k\ell} = \begin{cases} \bar{s}_{k\ell} & \text{if } |\bar{s}_{k\ell}| \geq \xi \,( |\bar{s}_{kk}| + |\bar{s}_{\ell\ell}| ) \\ 0 & \text{else} \end{cases}

  Approximation through ILU [INRIA PhyLeas - A. Haidar, L. Giraud, Y. Saad - 10]:
  \mathrm{pILU}(A^{(i)}) \equiv \mathrm{pILU}\begin{pmatrix} A_{ii} & A_{i\Gamma_i} \\ A_{\Gamma_i i} & A^{(i)}_{\Gamma_i\Gamma_i} \end{pmatrix}
  \equiv \begin{pmatrix} \tilde{L}_i & 0 \\ A_{\Gamma_i i}\,\tilde{U}_i^{-1} & I \end{pmatrix}
  \begin{pmatrix} \tilde{U}_i & \tilde{L}_i^{-1} A_{i\Gamma_i} \\ 0 & \tilde{S}^{(i)} \end{pmatrix}

  Mixed arithmetic strategy:
  - Compute and store the preconditioner in 32-bit precision arithmetic
  - Remark: the backward stability result of GMRES indicates that it is hopeless to expect convergence at a backward error level smaller than the 32-bit accuracy [C. Paige, M. Rozložník, Z. Strakoš - 06]
  - Idea: to overcome this limitation we use FGMRES [Y. Saad - 93; Arioli, Duff - 09]

  Exploit two levels of parallelism:
  - Use a parallel sparse direct solver on each sub-domain/sub-graph
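
  The dropping rule translates into an entrywise mask on the dense assembled local Schur block. A minimal sketch, assuming a dense NumPy block S_bar and a threshold xi as inputs (names are illustrative, not the MaPHyS API):

```python
# Sparsify an assembled local Schur block with the dropping threshold xi.
import numpy as np
import scipy.sparse as sp

def sparsify_schur(S_bar, xi):
    """Keep s_kl only if |s_kl| >= xi * (|s_kk| + |s_ll|); the diagonal is always kept."""
    d = np.abs(np.diag(S_bar))
    keep = np.abs(S_bar) >= xi * (d[:, None] + d[None, :])
    np.fill_diagonal(keep, True)
    return sp.csr_matrix(np.where(keep, S_bar, 0.0))
```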

  12. Academic model problems

  [Figure: problem patterns and circular flow velocity field]

  Diffusion equation (ε = 1 and v = 0) and convection-diffusion equation:
  -\epsilon \,\mathrm{div}(K \nabla u) + v \cdot \nabla u = f \ \text{in } \Omega,
  \qquad u = 0 \ \text{on } \partial\Omega.

  - Heterogeneous problems
  - Anisotropic-heterogeneous problems
  - Convection-dominated term
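
  For concreteness, a toy assembly of the constant-coefficient diffusion operator on a regular n x n x n grid, using a standard (unscaled) 7-point finite-difference stencil. This discretization is an assumption made for illustration only; the heterogeneous and anisotropic test cases would use cell-dependent coefficients K:

```python
# Toy 3D diffusion operator -div(K grad u), constant K, 7-point stencil,
# homogeneous Dirichlet boundary conditions (grid spacing not scaled in).
import scipy.sparse as sp

def diffusion_3d(n, K=1.0):
    I = sp.identity(n)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    A1d = K * T
    return (sp.kron(sp.kron(A1d, I), I)
            + sp.kron(sp.kron(I, A1d), I)
            + sp.kron(sp.kron(I, I), A1d)).tocsr()

A = diffusion_3d(20)
print(A.shape)   # (8000, 8000)
```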

  13. Numerical behaviour of sparse preconditioners

  [Figure: convergence history (||r_k||/||b|| vs. iterations) and time history (||r_k||/||b|| vs. time in seconds) of PCG on a 3D heterogeneous diffusion problem, comparing the dense preconditioner with sparsified variants ξ = 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}]

  3D heterogeneous diffusion problem with 43 Mdof mapped on 1000 processors:
  - For ξ small, the convergence is marginally affected while the memory saving is significant (15%)
  - For ξ large, a lot of resources are saved but the convergence becomes very poor (1%)
  - Even though they require more iterations, the sparsified variants converge faster, as the time per iteration is smaller and the setup of the preconditioner is cheaper
