convergence acceleration techniques for dual time stepping




  1. CONVERGENCE ACCELERATION TECHNIQUES FOR DUAL TIME STEPPING • Niki A. Loppi (AI & HPC Solution Architect, NVIDIA) • Brian C. Vermeire (Aerospace Engineering, Concordia University) • Peter E. Vincent (Department of Aeronautics, Imperial College London)

  2. OVERVIEW • Incompressible flows require a divergence-free velocity field • Artificial Compressibility Method (ACM) is a suitable approach • A range of novel convergence acceleration techniques • Locally Adaptive Pseudo-Timestepping (LAPTS) • Polynomial Multigrid (P-MG) • Optimal explicit Runge-Kutta Methods

  3. ARTIFICIAL COMPRESSIBILITY • An alternative to pressure projection in steady state • ACM uses a pseudo-time problem to enforce incompressibility • Dual time-stepping can extend the ACM to unsteady flows • This introduces a global hyperbolic problem in pseudo-time • Leverage the explicit solver technology already in PyFR

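  4. ARTIFICIAL COMPRESSIBILITY • Conservation law: $\frac{\partial u}{\partial \tau} + I_c \frac{\partial u}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = 0$ • Physical time: $\frac{\partial u}{\partial \tau} = R^{n+1,m} - \frac{I_c}{2\Delta t}\left(3u^{n+1,m} - 4u^{n} + u^{n-1}\right)$, with $R = -\left(\frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z}\right)$ • Pseudo time, Algorithm (1): $u^{(k)} = u^{(0)} + \alpha_k \Delta\tau \left( R^{(k-1)} - \frac{I_c}{2\Delta t}\left(3u^{(k-1)} - 4u^{n} + u^{n-1}\right) \right)$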
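
The dual time-stepping loop on this slide can be illustrated on a scalar model problem. The following is a minimal sketch, not PyFR code: it assumes R is the right-hand side of the physical-time equation (the negative flux divergence), so each pseudo-time stage adds α Δτ times the bracketed pseudo-time residual. The function name `dual_time_step`, the default stage coefficients (1/4, 1/3, 1/2, 1), and the tolerance are all illustrative assumptions.

```python
import numpy as np


def dual_time_step(u_n, u_nm1, R, dt, dtau, alphas=(0.25, 1.0 / 3.0, 0.5, 1.0),
                   I_c=1.0, tol=1e-12, max_iters=10000):
    """Advance one physical step with the three-level backward difference
    from the slide, converging the solution in pseudo-time.

    R is a callable returning the physical right-hand side (assumed sign
    convention: du/dt = R(u)). Returns (u^{n+1}, pseudo-iterations used).
    """
    def pseudo_residual(u):
        # R - I_c / (2 dt) * (3 u - 4 u^n + u^{n-1})
        return R(u) - I_c / (2.0 * dt) * (3.0 * u - 4.0 * u_n + u_nm1)

    u = u_n  # initial guess u^{n+1,0}
    for it in range(1, max_iters + 1):
        u0 = u
        uk = u0
        for alpha in alphas:
            # Stage update: u^(k) = u^(0) + alpha_k * dtau * residual(u^(k-1))
            uk = u0 + alpha * dtau * pseudo_residual(uk)
        u = uk
        if np.max(np.abs(pseudo_residual(u))) < tol:
            break
    return u, it


# Model problem du/dt = -lam * u (hypothetical test case).
lam, dt = 1.0, 0.1
u_prev, u_curr = 1.0, float(np.exp(-lam * dt))
u_next, n_iters = dual_time_step(u_curr, u_prev, lambda u: -lam * u, dt, dtau=0.1)
```

For this linear model problem the converged value can be checked against the closed-form solution of the implicit step, (4 u^n − u^{n−1}) / (3 + 2 Δt λ).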

  5. OVERVIEW • ACM performance relies on rapid convergence in pseudo-time • A range of novel convergence acceleration techniques in PyFR • Polynomial Multigrid (P-MG) • Locally Adaptive Pseudo-Timestepping (LAPTS) • Optimal explicit Runge-Kutta Methods

  6. POLYNOMIAL MULTIGRID • Leverage lower polynomial degrees to accelerate convergence • Less strict CFL limits on the coarser levels • Less expensive per iteration on the coarser levels • Low-frequency error is converged faster on coarse levels • Correction from coarse levels is then prolongated to fine levels

  7. POLYNOMIAL MULTIGRID • [Figure: multigrid cycle across polynomial levels, with stages labelled Iterate, Restrict, and Prolongate]
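
The Restrict and Prolongate stages can be sketched for a single 1D element. This example assumes the solution is stored as Legendre modal coefficients, in which case restriction to a lower degree is a truncation (an L2 projection, since the modes are orthogonal) and prolongation is zero-padding. The function names and the modal storage are assumptions for illustration; the transfer operators used in PyFR may be formulated differently.

```python
import numpy as np
from numpy.polynomial import legendre


def restrict(coeffs, p_coarse):
    """Restrict to degree p_coarse by dropping the higher Legendre modes."""
    return np.asarray(coeffs, dtype=float)[: p_coarse + 1].copy()


def prolongate(coeffs, p_fine):
    """Prolongate to degree p_fine by zero-padding the missing modes."""
    out = np.zeros(p_fine + 1)
    out[: len(coeffs)] = coeffs
    return out


# Hypothetical degree-4 solution on one element.
fine = np.array([1.0, 0.5, -0.25, 0.1, 0.05])
coarse = restrict(fine, p_coarse=2)     # degree-2 representation
back = prolongate(coarse, p_fine=4)     # same polynomial, stored at degree 4

# The prolongated field is the same polynomial as the coarse one.
x = np.linspace(-1.0, 1.0, 7)
same = np.allclose(legendre.legval(x, back), legendre.legval(x, coarse))
```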

  8. POLYNOMIAL MULTIGRID • Unsteady Circular Cylinder ~ 6.2x Speedup

  9. POLYNOMIAL MULTIGRID • Incompressible Taylor Green Vortex ~ 3.5x Speedup

  10. LAPTS • Convergence is accelerated by using local pseudo-time steps • Maximum permissible step size is limited by local CFL criteria • Element size • Polynomial degree • Local wave speeds and viscous effects • Runge-Kutta scheme properties • This limit is estimated via embedded pair Runge-Kutta schemes

  11. LAPTS • Embedded pair gives an estimate of the truncation error • Pseudo-time step size is then adapted using a PI-controller • For each element • For each field variable • Scaled up on coarser grid levels when combined with P-MG
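
The PI-controller step can be sketched as follows. This is a generic textbook-style PI step-size controller applied per element and per field variable, not the exact controller from the LAPTS paper; the function name, the exponents `alpha` and `beta`, the safety factor, and all limits are illustrative assumptions. The error arrays stand in for the embedded-pair estimates.

```python
import numpy as np


def pi_controller_step(dtau, err, err_prev, tol=1e-4, order=2, safety=0.8,
                       alpha=0.7, beta=0.4, f_min=0.5, f_max=1.5,
                       dtau_min=1e-8, dtau_max=1.0):
    """Return updated local pseudo-time steps.

    dtau, err, err_prev: arrays of shape (n_elements, n_variables).
    err and err_prev are the current and previous error estimates.
    """
    tiny = np.finfo(float).tiny
    # Errors normalised by the tolerance; guarded against division by zero.
    e = np.maximum(np.asarray(err, dtype=float) / tol, tiny)
    e_prev = np.maximum(np.asarray(err_prev, dtype=float) / tol, tiny)
    # PI law: react to the current error and to the trend from the last step.
    factor = safety * e ** (-alpha / order) * e_prev ** (beta / order)
    # Limit the change per update, then limit the step itself.
    factor = np.clip(factor, f_min, f_max)
    return np.clip(dtau * factor, dtau_min, dtau_max)


# Two elements, three field variables (hypothetical values).
dtau = np.full((2, 3), 0.01)
err = np.array([[1e-3] * 3, [1e-6] * 3])  # element 0 too inaccurate, element 1 well resolved
new_dtau = pi_controller_step(dtau, err, err)
```

Element 0, whose error exceeds the tolerance, receives a smaller pseudo-time step, while element 1 receives a larger one, capped by `f_max`.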

  12. LAPTS • Unsteady Circular Cylinder ~ 4.1x Speedup

  13. LAPTS • SD7003 Airfoil ~ 2.4x Speedup

  14. OPTIMAL RUNGE-KUTTA SCHEMES • Properties of Runge-Kutta scheme limit pseudo-time step size • Each Runge-Kutta scheme has a stability polynomial • Each stability polynomial has a region of absolute stability • Pseudo-time step is limited by the size of this region • For the ACM, first-order accuracy in pseudo-time is sufficient

  15. OPTIMAL RUNGE-KUTTA SCHEMES • Stability polynomial: $P_{s,1}(z) = 1 + z + \sum_{j=2}^{s} \gamma_j z^j$, $z = \Delta\tau\,\omega_\delta$ • Optimise $\{\gamma_2, \gamma_3, \ldots, \gamma_s\}$ to yield maximum $\Delta\tau$ subject to $|P_{s,1}(\Delta\tau\,\omega_\delta)| - 1 \le 0, \ \forall\, \omega_\delta$
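
The constraint in this optimisation can be evaluated numerically. The sketch below covers only the inner problem: for fixed coefficients γ_j it finds the largest Δτ for which |P_{s,1}(Δτ ω_δ)| ≤ 1 holds over a given set of eigenvalues ω_δ. It does not optimise the γ_j themselves. The function names, the bisection bracket, and the assumption that the stable values of Δτ form a single interval starting at zero are all illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as poly


def stability_poly(z, gammas):
    """Evaluate P_{s,1}(z) = 1 + z + sum_{j=2}^{s} gamma_j z^j."""
    coeffs = np.concatenate(([1.0, 1.0], np.asarray(gammas, dtype=float)))
    return poly.polyval(z, coeffs)  # coefficients in ascending powers of z


def max_stable_dtau(omegas, gammas, upper=100.0, n_bisect=60, slack=1e-12):
    """Largest dtau with |P(dtau * omega)| <= 1 for every omega, by bisection.

    Assumes the scheme is stable for all dtau below the returned value.
    """
    omegas = np.asarray(omegas)
    lo, hi = 0.0, upper
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.all(np.abs(stability_poly(mid * omegas, gammas)) <= 1.0 + slack):
            lo = mid
        else:
            hi = mid
    return lo


# Single real eigenvalue omega = -1 (hypothetical spectrum).
euler_limit = max_stable_dtau([-1.0], gammas=[])                       # s = 1
rk4_limit = max_stable_dtau([-1.0], gammas=[1 / 2, 1 / 6, 1 / 24])     # classical RK4 polynomial
```

For a realistic case the list of eigenvalues would come from the spatial discretisation, and an outer optimiser would vary the γ_j to push the returned Δτ as high as possible.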

  16. OPTIMAL RUNGE-KUTTA SCHEMES • Optimal stability polynomials can be used for embedded pairs • Divergence of a “test” scheme controls pseudo-time step • Allows automatic pseudo-time step size selection

  17. OPTIMAL RUNGE-KUTTA SCHEMES • Unsteady Circular Cylinder ~ 2.1x Speedup

  18. OPTIMAL RUNGE-KUTTA SCHEMES • Turbulent Jet ~ 2x Speedup

  19. PERFORMANCE • Advancements in numerical methods (2015 - 2020) • ~ 21x Speedup • [Bar chart: Speed Up for Cylinder Benchmark, with bars for RK4, RK-Opt, LTS, PMG, and RK-Opt+LTS+PMG]

  20. PERFORMANCE • Advancements in hardware (2015 - 2020) • ~ 16x Speedup • [Bar chart: Peak DP TFLOP/s for K20, P100, V100, and A100]

  21. PERFORMANCE • Combined ~350x speedup (2015 - 2020) • [Bar charts: Peak DP TFLOP/s for K20, P100, V100, and A100; Speed Up for Cylinder Benchmark for RK4, LTS, and RK-Opt+LTS+PMG]
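
As a rough check, and assuming the algorithmic and hardware gains compound multiplicatively, the two rounded factors quoted on the preceding slides give a product of the same order as the combined figure:

$$
21 \times 16 = 336
$$

Both inputs are approximate values, so the product is only expected to land near the quoted ~350x rather than match it exactly.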

  22. RESULTS • DARPA SUBOFF at Re = 1.2 × 10^6

  23. RESULTS

  24. RESULTS

  25. CONFIGURATION • P-MG • LAPTS • Optimal Runge-Kutta

  26. REFERENCES
  • NA Loppi, FD Witherden, A Jameson, PE Vincent, A high-order cross-platform incompressible Navier–Stokes solver via artificial compressibility with application to a turbulent jet, Computer Physics Communications 233, 193-205, 2018.
  • NA Loppi, FD Witherden, A Jameson, PE Vincent, Locally adaptive pseudo-time stepping for high-order Flux Reconstruction, Journal of Computational Physics 399, 2019.
  • BC Vermeire, NA Loppi, PE Vincent, Optimal Runge–Kutta schemes for pseudo time-stepping with high-order unstructured methods, Journal of Computational Physics 383, 55-71, 2019.
  • BC Vermeire, NA Loppi, PE Vincent, Optimal embedded pair Runge-Kutta schemes for pseudo-time stepping, Journal of Computational Physics 415, 2020.

  27. QUESTIONS
