CONVERGENCE ACCELERATION TECHNIQUES FOR DUAL TIME STEPPING
Niki A. Loppi (AI & HPC Solution Architect, NVIDIA)
Brian C. Vermeire (Aerospace Engineering, Concordia University)
Peter E. Vincent (Department of Aeronautics, Imperial College London)
OVERVIEW • Incompressible flows require a divergence-free velocity field • The Artificial Compressibility Method (ACM) is a suitable approach • A range of novel convergence acceleration techniques: • Locally Adaptive Pseudo-Timestepping (LAPTS) • Polynomial Multigrid (P-MG) • Optimal explicit Runge-Kutta methods
ARTIFICIAL COMPRESSIBILITY • An alternative to pressure projection for steady-state problems • ACM uses a pseudo-time problem to enforce incompressibility • Dual time-stepping can extend the ACM to unsteady flows • This introduces a global hyperbolic problem in pseudo-time • Leverages the explicit solver technology already in PyFR
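ARTIFICIAL COMPRESSIBILITY
Conservation law:
$$\frac{\partial u}{\partial \tau} + I_c \frac{\partial u}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = 0$$
Physical time:
$$\frac{\partial u}{\partial \tau} = R^{n+1,m} - \frac{I_c}{2\Delta t}\left(3u^{n+1,m} - 4u^{n} + u^{n-1}\right)$$
Pseudo time algorithm:
$$u^{(k)} = u^{(0)} - \alpha_m \Delta\tau \left( R^{(k-1)} - \frac{I_c}{2\Delta t}\left(3u^{(k-1)} - 4u^{n} + u^{n-1}\right) \right) \qquad (1)$$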
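A minimal sketch of the dual time-stepping idea on this slide, under loudly stated assumptions: the spatial residual is replaced by a hypothetical scalar model problem R(u) = -lam * u, I_c is taken as 1, and forward Euler stands in for the multi-stage pseudo-time scheme. The update follows the sign convention of the physical-time form, adding Δτ times (R minus the BDF2 term). The names `residual` and `dual_time_step` are illustrative and are not PyFR functions.

```python
def residual(u, lam=1.0):
    """Spatial residual R(u) for the hypothetical model problem du/dt = -lam * u."""
    return -lam * u


def dual_time_step(u_n, u_nm1, dt, dtau, lam=1.0, tol=1e-12, max_iters=10_000):
    """Advance one BDF2 physical step by marching to steady state in pseudo-time.

    Returns the converged u^{n+1} and the number of pseudo-iterations used.
    """
    u = u_n  # initial guess for u^{n+1}
    for m in range(max_iters):
        # Pseudo-time right-hand side: residual minus the BDF2 time derivative.
        rhs = residual(u, lam) - (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt)
        u = u + dtau * rhs  # forward Euler in pseudo-time (assumption)
        if abs(rhs) < tol:
            return u, m + 1
    raise RuntimeError("pseudo-time iteration did not converge")


if __name__ == "__main__":
    u_next, iters = dual_time_step(u_n=0.9, u_nm1=1.0, dt=0.1, dtau=0.05)
    print(u_next, iters)
```

Once the pseudo-time derivative vanishes, the converged value satisfies the implicit BDF2 equation for the model problem, which is why rapid pseudo-time convergence governs the overall cost.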
OVERVIEW • ACM performance relies on rapid convergence in pseudo-time • A range of novel convergence acceleration techniques in PyFR • Polynomial Multigrid (P-MG) • Locally Adaptive Pseudo-Timestepping (LAPTS) • Optimal explicit Runge-Kutta Methods
POLYNOMIAL MULTIGRID • Leverage lower polynomial degrees to accelerate convergence • Less strict CFL limits on the coarser levels • Less expensive per iteration on the coarser levels • Low-frequency error is converged faster on coarse levels • Correction from coarse levels is then prolongated to fine levels
POLYNOMIAL MULTIGRID [Figure: P-MG cycle diagram showing Iterate, Restrict and Prolongate steps between polynomial levels]
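The transfer steps of a P-MG cycle can be sketched as follows. This is an illustration only, assuming the solution in one element is stored as coefficients of an orthogonal modal basis (e.g. Legendre), in which case L2 projection to a lower degree reduces to truncation and prolongation to zero-padding. The function names are hypothetical and the smoothing iterations on each level are omitted.

```python
def restrict(coeffs, p_coarse):
    """Project modal coefficients onto degree p_coarse by dropping higher modes."""
    if p_coarse + 1 > len(coeffs):
        raise ValueError("coarse degree must not exceed the fine degree")
    return list(coeffs[: p_coarse + 1])


def prolongate(coeffs, p_fine):
    """Embed modal coefficients in the degree p_fine space by zero-padding."""
    if p_fine + 1 < len(coeffs):
        raise ValueError("fine degree must not be below the coarse degree")
    return list(coeffs) + [0.0] * (p_fine + 1 - len(coeffs))


def apply_correction(fine, coarse_before, coarse_after):
    """Add the prolongated change made on the coarse level to the fine solution."""
    p_fine = len(fine) - 1
    delta = [a - b for a, b in zip(coarse_after, coarse_before)]
    return [f + d for f, d in zip(fine, prolongate(delta, p_fine))]


if __name__ == "__main__":
    fine = [1.0, 2.0, 3.0, 4.0]       # degree-3 coefficients
    coarse = restrict(fine, 1)         # degree-1 coefficients
    smoothed = [c + 0.5 for c in coarse]  # stand-in for coarse-level iterations
    print(apply_correction(fine, coarse, smoothed))
```

Only the low modes receive a correction from the coarse level; the high modes are left to the fine-level iterations.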
POLYNOMIAL MULTIGRID • Unsteady Circular Cylinder ~ 6.2x Speedup
POLYNOMIAL MULTIGRID • Incompressible Taylor Green Vortex ~ 3.5x Speedup
LAPTS • Convergence is accelerated by using local pseudo-time steps • Maximum permissible step size is limited by local CFL criteria • Element size • Polynomial degree • Local wave speeds and viscous effects • Runge-Kutta scheme properties • This limit is estimated via embedded pair Runge-Kutta schemes
LAPTS • The embedded pair gives an estimate of the truncation error • The pseudo-time step size is then adapted using a PI-controller • For each element • For each field variable • Scaled up on coarser grid levels when combined with P-MG
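A minimal sketch of a PI step-size controller applied per element and per field variable. It assumes the embedded-pair error estimate has already been normalised by a tolerance, so that values below 1 are acceptable. The exponents, safety factor and clipping bounds are illustrative assumptions, not the values used in PyFR, and the function names are hypothetical.

```python
def pi_step_size(dtau, err, err_prev, alpha=0.7, beta=0.4,
                 safety=0.8, f_min=0.3, f_max=2.0):
    """Return an adapted pseudo-time step from current and previous normalised errors."""
    tiny = 1e-30  # guard against division by zero for vanishing error estimates
    err = max(err, tiny)
    err_prev = max(err_prev, tiny)
    # Proportional term reacts to the current error, integral term to the previous one.
    factor = safety * err ** (-alpha) * err_prev ** beta
    # Limit how quickly the step may grow or shrink in a single update.
    factor = min(f_max, max(f_min, factor))
    return dtau * factor


def adapt_local_steps(dtaus, errs, errs_prev, **kwargs):
    """Apply the controller independently to dtaus[element][variable]."""
    return [
        [pi_step_size(d, e, ep, **kwargs) for d, e, ep in zip(d_row, e_row, ep_row)]
        for d_row, e_row, ep_row in zip(dtaus, errs, errs_prev)
    ]


if __name__ == "__main__":
    # One element, two field variables: one well resolved, one with a large error.
    print(adapt_local_steps([[1.0, 1.0]], [[1e-8, 1e4]], [[1e-8, 1e4]]))
```

Entries with a small error estimate grow towards the upper bound, while entries with a large error estimate shrink, so each element and variable settles near its own local limit.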
LAPTS • Unsteady Circular Cylinder ~ 4.1x Speedup
LAPTS • SD7003 Airfoil ~ 2.4x Speedup
OPTIMAL RUNGE-KUTTA SCHEMES • Properties of the Runge-Kutta scheme limit the pseudo-time step size • Each Runge-Kutta scheme has a stability polynomial • Each stability polynomial has a region of absolute stability • The pseudo-time step is limited by the size of this region • For the ACM, first-order accuracy in pseudo-time is sufficient
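OPTIMAL RUNGE-KUTTA SCHEMES
Stability polynomial:
$$P_{s,1}(z) = 1 + z + \sum_{j=2}^{s} \gamma_j z^j, \qquad z = \Delta\tau\,\omega_\delta$$
Optimise $\{\gamma_2, \gamma_3, \ldots, \gamma_s\}$ to yield maximum $\Delta\tau$ subject to
$$\left|P_{s,1}(\Delta\tau\,\omega_\delta)\right| - 1 \le 0, \quad \forall\, \omega_\delta$$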
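The stability constraint can be checked numerically for a given set of coefficients. The sketch below does not perform the optimisation over the coefficients; it only evaluates the stability polynomial and bisects for the largest stable pseudo-time step. It assumes the stable step sizes form a single interval starting at zero, and the function names and sample spectrum are hypothetical.

```python
def stability_poly(z, gammas):
    """Evaluate P(z) = 1 + z + sum_{j>=2} gamma_j z^j with gammas = [gamma_2, ..., gamma_s]."""
    p = 1.0 + z
    zj = z
    for g in gammas:
        zj = zj * z  # raise the power of z by one
        p = p + g * zj
    return p


def is_stable(dtau, gammas, omegas, tol=1e-12):
    """Check |P(dtau * omega)| - 1 <= 0 (up to tol) for every eigenvalue omega."""
    return all(abs(stability_poly(dtau * w, gammas)) - 1.0 <= tol for w in omegas)


def max_stable_dtau(gammas, omegas, hi=100.0, iters=60):
    """Bisect for the largest stable dtau in [0, hi] (assumes one stable interval)."""
    if is_stable(hi, gammas, omegas):
        return hi
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_stable(mid, gammas, omegas):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    spectrum = [-1.0]  # single real eigenvalue, chosen for illustration only
    print(max_stable_dtau([], spectrum))                       # forward Euler, s = 1
    print(max_stable_dtau([1 / 2, 1 / 6, 1 / 24], spectrum))   # classical RK4 polynomial
```

An optimiser would wrap `max_stable_dtau` and search over the coefficients for the spectrum of the actual spatial discretisation; complex eigenvalues work unchanged because Python's `abs` handles complex numbers.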
OPTIMAL RUNGE-KUTTA SCHEMES • Optimal stability polynomials can be used for embedded pairs • Divergence of a “test” scheme controls pseudo-time step • Allows automatic pseudo-time step size selection
OPTIMAL RUNGE-KUTTA SCHEMES • Unsteady Circular Cylinder ~ 2.1x Speedup
OPTIMAL RUNGE-KUTTA SCHEMES • Turbulent Jet ~ 2x Speedup
PERFORMANCE • Advancements in numerical methods (2015 - 2020): ~21x speedup [Figure: bar chart of speedup for the cylinder benchmark; bars: RK4, RK-Opt, LTS, PMG, RK-Opt+LTS+PMG]
PERFORMANCE • Advancements in hardware (2015 - 2020): ~16x speedup [Figure: bar chart of peak DP TFLOP/s; bars: K20, P100, V100, A100]
PERFORMANCE • Combined ~350x speedup (2015 - 2020) [Figure: two bar charts side by side; peak DP TFLOP/s for K20, P100, V100, A100, and speedup for the cylinder benchmark for RK4, LTS, RK-Opt+LTS+PMG]
RESULTS • DARPA SUBOFF at Re = 1.2 × 10^6
CONFIGURATION • P-MG • LAPTS • Optimal Runge-Kutta
REFERENCES
• NA Loppi, FD Witherden, A Jameson, PE Vincent, A high-order cross-platform incompressible Navier–Stokes solver via artificial compressibility with application to a turbulent jet, Computer Physics Communications 233, 193-205, 2018.
• NA Loppi, FD Witherden, A Jameson, PE Vincent, Locally adaptive pseudo-time stepping for high-order Flux Reconstruction, Journal of Computational Physics 399, 2019.
• BC Vermeire, NA Loppi, PE Vincent, Optimal Runge-Kutta schemes for pseudo time-stepping with high-order unstructured methods, Journal of Computational Physics 383, 55-71, 2019.
• BC Vermeire, NA Loppi, PE Vincent, Optimal embedded pair Runge-Kutta schemes for pseudo-time stepping, Journal of Computational Physics 415, 2020.
QUESTIONS