Fast preconditioners for time-harmonic wave equations
Jack Poulson (1), Lexing Ying (1,2), Björn Engquist (1,2), Sergey Fomel (1,3), Siwei Li (1,3)
(1) ICES, UT Austin; (2) Department of Mathematics, UT Austin; (3) Jackson School of Geosciences
ICERM, January 9, 2012 (revised January 15, 2012)
1 / 34
Outline
◮ Time-harmonic wave equations
◮ Sweeping preconditioners
◮ H-matrix approach
◮ Moving PML approach
◮ General algorithm
◮ Scalability issues
◮ Results
◮ Conclusions
2 / 34
Outline
◮ Time-harmonic wave equations
◮ Sweeping preconditioners
◮ H-matrix approach
◮ Moving PML approach
◮ General algorithm
◮ Scalability issues
◮ Results
◮ Conclusions
3 / 34
Time-harmonic wave equations
Wave equations are often approximated by superimposing solutions of their time-harmonic form. Three common categories:
◮ Helmholtz equation (from acoustic wave equation)
◮ Time-harmonic Maxwell’s equations
◮ Time-harmonic linear elasticity
Our strategy is independent of the specifics of the equation and heavily exploits absorbing boundary conditions.¹ This talk focuses on the simplest case, the Helmholtz equation.
¹ P. Tsuji et al., “A sweeping preconditioner for time-harmonic Maxwell’s equations with finite elements”
4 / 34
The Helmholtz equation

    ( −∆ − ω^2 / c^2(x) ) u(x) = f(x),   x ∈ Ω ⊂ R^d

◮ Helmholtz operator is elliptic, but indefinite
◮ With real Dirichlet boundary conditions, usual discretizations will be real symmetric (Hermitian) and indefinite
◮ Sommerfeld radiation condition often imposed on at least one side, but PML yields complex symmetric (non-Hermitian) matrices (least squares methods are another story...)
◮ Solving large 3d Helmholtz equations is challenging:
  ◮ Standard preconditioners ineffective for high frequencies
  ◮ Sparse-direct solves prohibitively expensive (with n grid points per dimension, O(N^2) = O(n^6) work)
5 / 34
The damped Helmholtz equation

    ( −∆ − (ω + iα)^2 / c^2(x) ) u(x) = f(x),   α ≈ 2π

Rough idea: the preconditioning operator’s long-range interactions will be less accurate than its short-range ones, so damp the waves by adding a positive imaginary component to the frequency.
◮ Basic strategy is to use an approximate inverse of the damped Helmholtz equation as a preconditioner for GMRES
◮ The damping parameter affects the convergence rate and is velocity and frequency dependent, but it can typically be chosen near 2π.
6 / 34
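As a concrete illustration (not the authors’ code), the sketch below assembles a 1-d damped Helmholtz operator with second-order finite differences, homogeneous Dirichlet ends, and the shifted frequency ω + iα; the routine name, the unit interval, and the velocity profile are assumptions made only for this example. It shows that the matrix is complex symmetric but non-Hermitian once α > 0.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def damped_helmholtz_1d(n, omega, alpha, c):
    """Second-order FD discretization of -u'' - ((omega + i*alpha)^2 / c(x)^2) u
    on (0,1) with n interior points and homogeneous Dirichlet ends."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    mass = (omega + 1j * alpha) ** 2 / c(x) ** 2                         # complex "mass" term
    lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2   # approximates -d^2/dx^2
    return (lap - sp.diags(mass)).tocsr()

A = damped_helmholtz_1d(n=2000, omega=2 * np.pi * 20,
                        alpha=2 * np.pi, c=lambda x: 1.0 + 0.5 * x)
# Complex symmetric (A = A^T) but non-Hermitian (A != A^H) once alpha > 0:
print(spla.norm(A - A.T), spla.norm(A - A.conj().T))
```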
Outline
◮ Time-harmonic wave equations
◮ Sweeping preconditioners
◮ H-matrix approach
◮ Moving PML approach
◮ General algorithm
◮ Scalability issues
◮ Results
◮ Conclusions
7 / 34
Sweeping preconditioners
Engquist and Ying recently introduced two sweeping preconditioners. Both approximate Schur complements in a block LDL^T factorization with a particular ordering:
[Figure: the physical domain with a PML layer; axes x, y, and z, with the sweep along z]
8 / 34
Sweeping preconditioners
Engquist and Ying recently introduced two sweeping preconditioners. Both approximate Schur complements in a block LDL^T factorization with a particular ordering:
[Figure: the domain partitioned into panels corresponding to the diagonal blocks A_{1,1}, A_{2,2}, A_{3,3}, ...]
9 / 34
Sweeping preconditioners
Engquist and Ying recently introduced two sweeping preconditioners. Both approximate Schur complements in a block LDL^T factorization with a particular ordering:

    A = L_1 ⋯ L_{m-1} diag(S_1, S_2, ..., S_m) L_{m-1}^T ⋯ L_1^T

◮ A is the block-tridiagonal discrete damped Helmholtz operator (diagonal blocks A_{i,i}, subdiagonal blocks A_{i,i-1}, superdiagonal blocks A_{i,i-1}^T)
◮ Each block corresponds to one panel
◮ A_{1,1} must correspond to a boundary panel with PML
◮ S_1 = A_{1,1}, and S_i^{-1} = (A_{i,i} − A_{i,i-1} S_{i-1}^{-1} A_{i-1,i})^{-1} is a restricted half-space Green’s function!
◮ Each L_i is a block Gauss transform², L_i = I + E_{i+1} A_{i+1,i} S_i^{-1} E_i^T.
² The elementary matrix kind, not a sum of Gaussians
10 / 34
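To make the factorization concrete, here is a small dense NumPy sketch (purely illustrative, not the authors’ implementation) of the recursion above: it forms the Schur complement inverses S_i^{-1} exactly and applies A^{-1} by a forward sweep, a block-diagonal solve, and a backward sweep. A complex symmetric block-tridiagonal A is assumed throughout; the two sweeping preconditioners replace the exact S_i^{-1} with an H-matrix or moving-PML approximation.

```python
import numpy as np

def sweeping_factor(diag_blocks, sub_blocks):
    """diag_blocks[i] = A_{i,i}, sub_blocks[i] = A_{i+1,i}; returns the S_i^{-1}."""
    Sinv = [np.linalg.inv(diag_blocks[0])]                  # S_1 = A_{1,1}
    for i in range(1, len(diag_blocks)):
        S = diag_blocks[i] - sub_blocks[i - 1] @ Sinv[i - 1] @ sub_blocks[i - 1].T
        Sinv.append(np.linalg.inv(S))
    return Sinv

def sweeping_solve(diag_blocks, sub_blocks, Sinv, f):
    """Apply A^{-1} f via A = L_1...L_{m-1} diag(S_i) L_{m-1}^T...L_1^T."""
    u = [np.array(fi, copy=True) for fi in f]
    for i in range(1, len(u)):                              # forward sweep (L_i^{-1})
        u[i] -= sub_blocks[i - 1] @ (Sinv[i - 1] @ u[i - 1])
    u = [Sinv[i] @ u[i] for i in range(len(u))]             # block-diagonal solve
    for i in range(len(u) - 2, -1, -1):                     # backward sweep (L_i^{-T})
        u[i] -= Sinv[i] @ (sub_blocks[i].T @ u[i + 1])
    return u

# Quick check: assemble the dense A and compare against a direct solve.
rng = np.random.default_rng(0)
m, b = 4, 5
diag_blocks = []
for _ in range(m):
    M = rng.standard_normal((b, b)) + 1j * rng.standard_normal((b, b))
    diag_blocks.append(M + M.T + 6.0 * np.eye(b))           # complex symmetric blocks
sub_blocks = [rng.standard_normal((b, b)) + 1j * rng.standard_normal((b, b))
              for _ in range(m - 1)]

A = np.zeros((m * b, m * b), dtype=complex)
for i in range(m):
    A[i*b:(i+1)*b, i*b:(i+1)*b] = diag_blocks[i]
for i in range(m - 1):
    A[(i+1)*b:(i+2)*b, i*b:(i+1)*b] = sub_blocks[i]
    A[i*b:(i+1)*b, (i+1)*b:(i+2)*b] = sub_blocks[i].T

f = [rng.standard_normal(b) + 1j * rng.standard_normal(b) for _ in range(m)]
u = sweeping_solve(diag_blocks, sub_blocks, sweeping_factor(diag_blocks, sub_blocks), f)
print(np.linalg.norm(np.concatenate(u) - np.linalg.solve(A, np.concatenate(f))))
```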
Outline
◮ Time-harmonic wave equations
◮ Sweeping preconditioners
◮ H-matrix approach
◮ Moving PML approach
◮ General algorithm
◮ Scalability issues
◮ Results
◮ Conclusions
11 / 34
H-matrix approach
◮ Original sweeping preconditioner approach
◮ “Simply” updates and inverts the Schur complements of the implicit block LDL^T factorization of the damped Helmholtz operator, in the particular ordering, in H-matrix arithmetic
◮ Inverting H-matrices in parallel is more expensive but scalable (with the Schulz iteration)
◮ Subject of another talk (PP12)... sandbox code called DMHM
12 / 34
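For reference, the Schulz iteration mentioned above is X_{k+1} = X_k (2I − A X_k), which converges quadratically to A^{-1} when ||I − A X_0|| < 1. The dense NumPy sketch below is an illustration only (the test matrix and iteration count are assumptions); the H-matrix approach performs each product in parallel H-matrix arithmetic rather than dense arithmetic.

```python
import numpy as np

def schulz_inverse(A, num_iters=30):
    """Newton-Schulz iteration for A^{-1}, starting from the classic safe guess
    X_0 = A^H / (||A||_1 ||A||_inf)."""
    n = A.shape[0]
    X = A.conj().T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(num_iters):
        X = X @ (2.0 * np.eye(n) - A @ X)
    return X

rng = np.random.default_rng(0)
A = rng.random((50, 50)) + 5.0 * np.eye(50)     # comfortably invertible test matrix
print(np.linalg.norm(schulz_inverse(A) @ A - np.eye(50)))
```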
Outline
◮ Time-harmonic wave equations
◮ Sweeping preconditioners
◮ H-matrix approach
◮ Moving PML approach
◮ General algorithm
◮ Scalability issues
◮ Results
◮ Conclusions
13 / 34
Moving PML approach
Key point: S_i^{-1} is the discrete half-space Green’s function restricted to the i’th panel. Approximate it by putting an artificial absorbing boundary condition directly on the panel (which preserves sparsity).³
[Figure: the half-space behind the i’th panel is replaced by a thin artificial PML attached directly to the panel]
³ C.f. Atle and Engquist, "On surface radiation conditions for high-frequency wave scattering"
14 / 34
Moving PML approach
Key point: S_i^{-1} is the discrete half-space Green’s function restricted to the i’th panel. Approximate it by putting an artificial absorbing boundary condition directly on the panel (which preserves sparsity).³
The preconditioner setup is just sparse-direct LDL^T factorizations on each PML-padded subdomain. With O(n) subdomains of O(n^2) degrees of freedom each, the complexity is O(n (n^2)^{3/2}) = O(n^4) = O(N^{4/3}), and the memory requirement is only O(n (n^2 log n^2)) = O(n^3 log n) = O(N log N).
³ C.f. Atle and Engquist, "On surface radiation conditions for high-frequency wave scattering"
15 / 34
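A minimal sketch of this setup phase is below. The routine `assemble_pml_padded_panel(i)` is an assumed helper (not part of the original slides) that returns the sparse damped Helmholtz matrix for the i’th quasi-2d panel with its artificial PML attached, and scipy’s sparse LU stands in for the sparse-direct LDL^T factorization used in practice.

```python
import scipy.sparse.linalg as spla

def setup_moving_pml_preconditioner(num_panels, assemble_pml_padded_panel):
    """Factor each PML-padded subdomain once; the factors are reused in every apply."""
    factors = []
    for i in range(num_panels):
        H_i = assemble_pml_padded_panel(i)          # O(n^2) degrees of freedom per panel
        factors.append(spla.splu(H_i.tocsc()))      # O((n^2)^{3/2}) work per panel
    return factors
```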
Moving PML approach
Key point: S_i^{-1} is the discrete half-space Green’s function restricted to the i’th panel. Approximate it by putting an artificial absorbing boundary condition directly on the panel (which preserves sparsity).³
Each preconditioner application requires two solves against each subdomain (one each for solving against L and L^T). The application complexity is thus O(n (n^2 log n)) = O(n^3 log n) = O(N log N). Note that the subdomains must be solved against one at a time!
³ C.f. Atle and Engquist, "On surface radiation conditions for high-frequency wave scattering"
16 / 34
Applying approximate Green’s functions

    S_i^{-1} g_i ≈ v_i,  where [∗; v_i] = H_i^{-1} [0; g_i] (the ∗ entries live on the artificial PML region and are discarded)

Applying the approximate Green’s function takes three steps:
1. Extend the right-hand side by zeroes on the artificial PML region: g_i ↦ [0; g_i]
17 / 34
Applying approximate Green’s functions

    S_i^{-1} g_i ≈ v_i,  where [∗; v_i] = H_i^{-1} [0; g_i]

Applying the approximate Green’s function takes three steps:
2. Perform a sparse-direct solve against H_i: [∗; v_i] := H_i^{-1} [0; g_i]
18 / 34
Applying approximate Green’s functions

    S_i^{-1} g_i ≈ v_i,  where [∗; v_i] = H_i^{-1} [0; g_i]

Applying the approximate Green’s function takes three steps:
3. Extract the original degrees of freedom: [∗; v_i] ↦ v_i
19 / 34
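The three steps fit in a few lines. In the sketch below, `factor_i` is assumed to be the sparse factorization of the PML-padded panel matrix H_i (e.g. the splu factor from the setup sketch above), and the artificial-PML unknowns are assumed to be ordered before the panel’s own unknowns, purely for illustration.

```python
import numpy as np

def apply_panel_greens_function(factor_i, num_pml_dofs, g_i):
    """Approximate S_i^{-1} g_i using the PML-padded panel factorization."""
    padded = np.concatenate([np.zeros(num_pml_dofs, dtype=complex), g_i])  # 1. extend by zeros
    v_padded = factor_i.solve(padded)                                      # 2. sparse-direct solve against H_i
    return v_padded[num_pml_dofs:]                                         # 3. extract the panel's dofs
```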
Challenges for scalability
◮ Roughly half of the work is in sparse-direct triangular solves (and therefore, dense triangular solves)
◮ Dense triangular solves with O(1) right-hand sides are, at best, weakly scalable
◮ Triangular solves with O(p) right-hand sides are scalable, but this requires too much memory
◮ Parallelism in preconditioner application limited to quasi-2d subdomains!
◮ Black-box sparse-direct redistributes right-hand sides for solve
◮ MUMPS and SuperLU_Dist were not memory scalable, and WSMP is not open source, nor does it support large numbers of simultaneous factorizations
20 / 34
Fighting for scalability
◮ Wrote a custom sparse-direct solver, Clique, on top of my distributed dense linear algebra library, Elemental (and made sure it was memory scalable!)
◮ Subdomain sparse-direct factorizations use subtree-to-subcube mappings and 2d front distributions (and redistribute fronts to a 1d distribution after factorization)
◮ Globally reordering right-hand sides based upon the subdomain front distributions avoids communication in the sparse-direct subdomain solves
◮ Dense triangular matrix-vector multiplication has a much lower latency cost than a dense triangular solve... so invert the diagonal blocks of distributed fronts after factorization (solve latency drops from O(m log p) to O(log p) for an m × m matrix).
21 / 34
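As a serial illustration of the last bullet (not the distributed implementation), the sketch below inverts each front’s dense triangular diagonal block once after factorization so that the solve phase applies triangular matrix-vector products instead of triangular solves; the function names and the flat list of fronts are assumptions made only for the example. In a distributed setting the matvec needs only an O(log p) reduction, whereas a distributed triangular solve marches through its rows sequentially.

```python
import numpy as np
import scipy.linalg as sla

def invert_front_diagonal_blocks(front_diag_blocks):
    """front_diag_blocks: dense lower-triangular diagonal blocks of the fronts."""
    return [sla.solve_triangular(L, np.eye(L.shape[0]), lower=True)
            for L in front_diag_blocks]

def apply_front_solves(front_diag_inverses, rhs_pieces):
    """What used to be triangular solves are now triangular matvecs."""
    return [L_inv @ b for L_inv, b in zip(front_diag_inverses, rhs_pieces)]
```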
Outline
◮ Time-harmonic wave equations
◮ Sweeping preconditioners
◮ H-matrix approach
◮ Moving PML approach
◮ General algorithm
◮ Scalability issues
◮ Results
◮ Conclusions
22 / 34
Overthrust model
Velocity [km/s] in the middle XY, XZ, and YZ planes: [figure]
The domain is 20 km x 20 km x 4.65 km, with low velocity and faults near the surface and high velocity near the bottom. The grid is 801 × 801 × 187.
23 / 34