Acceleration of Turbomachinery Steady CFD Simulations on GPU


  1. Acceleration of Turbomachinery Steady CFD Simulations on GPU. Mohamed H. Aissa (1,2), Dr. Tom Verstraete (1), Prof. Cornelis Vuik (2). (1) Turbomachinery Department, Von Karman Institute, Belgium; (2) Delft Institute of Applied Mathematics, TU Delft, the Netherlands. Unconventional HPC, EuroPar 2016 WS, Grenoble, 23.08.2016

  2. Topic of Interest: Reduce Fuel Consumption and CO2 Emission (image source: Wikipedia.org)

  3. Topic of Interest: Turbomachinery is about Performance and Efficiency

  4. Topic of Interest: Axial Jet Engine (source: Wikipedia.org)

  5. Content • Multidisciplinary Optimization • CFD simulations on GPU • Literature review • Implicit RANS Implementation • Benchmark • Optimization Case

  6. Optimization algorithm. Derivative-based optimization: • fast convergence, but ... • derivative evaluation can be complicated and problem-specific (adjoint, automatic differentiation). Derivative-free methods (e.g. population-based): • simplicity • black-box approach to the evaluation, but ... • large number of evaluations. Problem statement: min f(x) subject to g(x) ≤ 0
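
The population-based, black-box approach on this slide can be illustrated with a minimal differential evolution loop. This is only a sketch under assumptions: the DE/rand/1/bin variant, the static penalty used for the constraint g(x) ≤ 0, and all parameter names and defaults (`F`, `CR`, `penalty`, `pop_size`) are illustrative choices, not taken from the talk or from CADO.

```python
import random


def differential_evolution(f, g, bounds, pop_size=20, generations=100,
                           F=0.8, CR=0.9, penalty=1e3, seed=0):
    """Minimise f(x) subject to g(x) <= 0 with a basic DE/rand/1/bin loop."""
    rng = random.Random(seed)
    dim = len(bounds)

    def cost(x):
        # Static penalty (an assumption): violation of g(x) <= 0 is added to f.
        return f(x) + penalty * max(0.0, g(x))

    # Random initial population inside the box bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                lo, hi = bounds[j]
                trial.append(min(max(v, lo), hi))  # clip to the bounds
            trial_cost = cost(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost

    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]
```

Each call to `cost` stands in for one full evaluation (one CFD run in the setting of the talk), so the loop performs `pop_size * (generations + 1)` evaluations. That count is the "large number of evaluations" drawback listed above.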

  8. CFD: Core of the Optimization • CADO: the VKI in-house optimizer • CFD is much slower than CSM • Need for acceleration -> GPU

  9. Steady CFD Simulations • A simulation with a unique solution for given boundary conditions. • An initial solution is advanced iteratively in time until convergence.
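
The idea of advancing an initial solution in time until convergence can be sketched on a toy problem. The example below is an assumption-laden stand-in, not the RANS solver from the talk: it marches the 1D heat equation with fixed end values using explicit pseudo-time steps until the residual is small. The function name and parameters are hypothetical.

```python
def march_to_steady_state(n=11, left=0.0, right=1.0, cfl=0.4,
                          tol=1e-10, max_iter=100000):
    """Pseudo-time marching for u_t = u_xx on [0, 1] with fixed end values."""
    dx = 1.0 / (n - 1)
    dt = cfl * dx * dx  # explicit stepping is stable for cfl <= 0.5
    u = [0.0] * n       # initial (start) solution
    u[0], u[-1] = left, right

    for iteration in range(1, max_iter + 1):
        # Residual of the steady equation u_xx = 0 at the interior points.
        residual = [0.0] * n
        for i in range(1, n - 1):
            residual[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (dx * dx)

        # Advance the solution by one pseudo-time step.
        for i in range(1, n - 1):
            u[i] += dt * residual[i]

        # Converged once the residual computed before this update is small.
        if max(abs(r) for r in residual) < tol:
            return u, iteration

    raise RuntimeError("no convergence within max_iter iterations")
```

With the given boundary values the steady solution is unique (a straight line from `left` to `right`), whatever start solution is chosen.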

  11. Numerical Scheme • Explicit time stepping (β = 0) • Implicit time stepping (β = 1). Reference: Aissa, M.H., Verstraete, T., Vuik, C., "Aerodynamic Optimization of Supersonic Compressor Cascade using Differential Evolution on GPU", 13th Int. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM 2015)
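
The equations of this slide did not survive in the transcript. As a hedged illustration only, a common way to write a scheme that switches between explicit and implicit time stepping with a parameter β is given below. It assumes a semi-discrete system dU/dt + R(U) = 0 and a single linearized Euler step; the symbols U, R, and Δt are assumptions, and the formula used in the talk (a Runge-Kutta scheme) may differ.

```latex
\left( \frac{I}{\Delta t} + \beta \, \frac{\partial R}{\partial U} \right) \Delta U^{n} = -R\!\left(U^{n}\right),
\qquad
U^{n+1} = U^{n} + \Delta U^{n}
```

With β = 0 this reduces to the explicit update U^{n+1} = U^{n} − Δt R(U^{n}). With β = 1 each step requires the solution of a linear system with the residual Jacobian ∂R/∂U.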

  12. Implicit Time Stepping is more Stable but … (figure: unknowns X1, X2, …, Xn)

  13. Literature Review • What to port: only the linear solver when it is dominant; porting both assembly and solve is optimal (no communication) • Linear solver: a library offers code maturity but is restrictive (petsc-dev, Paralution, AmgX, ViennaCL …); own code offers flexibility • Storage format: standard (CSR, DIA …) or new (hybrid)
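
To make the storage-format bullet concrete, here is a minimal sketch of the standard CSR (compressed sparse row) layout and the matrix-vector product built on it. It is plain Python for readability and is not the solver code from the talk; the helper names `dense_to_csr` and `csr_matvec` are hypothetical.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)   # nonzero value
                col_idx.append(j)  # its column index
        # row_ptr[i + 1] marks where row i ends in values/col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        total = 0.0
        # Only the stored nonzeros of row i are visited.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_idx[k]]
        y.append(total)
    return y
```

The outer loop over rows is independent per row, which is why a simple GPU mapping assigns one thread to each row. DIA and hybrid formats trade this generality for more regular memory access.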

  14. CFD Solver (Standard): Implicit Runge-Kutta scheme (Xu et al., JCP 2015). See also: http://mhaissa.blogspot.be/2015/10/for-paralution-gpu-conversion-and.html

  15. CFD Solver (Standard) Implicit Runge-Kutta scheme

  16. CFD Solver (On-demand Factorization): Implicit Runge-Kutta scheme

  17. CFD Solver (On-demand Factorization) • Stop condition: relative, absolute, or a combination
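
The three kinds of stop condition can be sketched as a small helper. This assumes the condition is applied to a residual norm; the slide does not say which quantity is monitored. The function name, the tolerances, and the way the two criteria are combined (whichever threshold is looser) are all illustrative assumptions.

```python
def should_stop(res_norm, initial_norm, rtol=1e-3, atol=1e-12,
                mode="combined"):
    """Return True when the residual norm satisfies the chosen criterion."""
    if mode == "relative":
        # Residual dropped by the factor rtol relative to its initial value.
        return res_norm <= rtol * initial_norm
    if mode == "absolute":
        # Residual fell below a fixed threshold.
        return res_norm <= atol
    if mode == "combined":
        # Either criterion is enough (one possible combination).
        return res_norm <= max(rtol * initial_norm, atol)
    raise ValueError("mode must be 'relative', 'absolute' or 'combined'")
```

A purely relative test can never be met when the initial residual is already near round-off, and a purely absolute test depends on the scaling of the problem, which is one reason to combine the two.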

  19. Benchmark: Flow around LS89 (2-stage Runge-Kutta) 1/12

  20. Assembly Acceleration (bar charts of assembly, linear solve, and global speedups on the coarse and fine meshes; CPU with 2, 3, and 4 cores, GPU with standard ILU and on-demand ILU). Highlighted values: x 7.8 on the coarse mesh, x 12.2 on the fine mesh. Further annotations: 10%, 70%, 90%, 30%.

  21. Linear Solver Acceleration (same bar charts: assembly, linear solve, and global speedups; CPU with 2, 3, and 4 cores, GPU with standard ILU and on-demand ILU). Highlighted values: x 1.2 and x 0.7 on the coarse mesh, x 5.7 and x 1.8 on the fine mesh.

  22. Global Acceleration (same bar charts: assembly, linear solve, and global speedups; CPU with 2, 3, and 4 cores, GPU with standard ILU and on-demand ILU). Highlighted values: x 3.2 and x 2.0 on the coarse mesh, x 9.6 and x 4.8 on the fine mesh. Suggestions for better performance assessment are very welcome!
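
How assembly and linear-solve speedups combine into a global speedup can be sketched with a simple time-fraction model. The model assumes the run time consists only of assembly and solve, and the numbers in the usage note are hypothetical, not values from the charts.

```python
def global_speedup(assembly_fraction, assembly_speedup, solve_speedup):
    """Overall speedup when assembly and solve are accelerated separately.

    assembly_fraction is the share of the reference run time spent in
    assembly; the remainder is assumed to be spent in the linear solve.
    """
    if not 0.0 <= assembly_fraction <= 1.0:
        raise ValueError("assembly_fraction must lie in [0, 1]")
    solve_fraction = 1.0 - assembly_fraction
    # Accelerated run time, normalised by the reference run time.
    accelerated_time = (assembly_fraction / assembly_speedup
                        + solve_fraction / solve_speedup)
    return 1.0 / accelerated_time
```

For example, if half of the reference time is assembly and the two parts are accelerated by factors of 12 and 6, `global_speedup(0.5, 12.0, 6.0)` gives 8. The slower part dominates, so the global value always lies between the two individual speedups.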

  23. Increase of the Speedup for Higher Numbers of Runge-Kutta Stages on Fine Mesh (plot: speedup versus number of stages, N = 2 to 6; series: Assembly, Solve, Global)

  24. Content • Multidisciplinary Optimization • CFD simulations on GPU • Literature review • Implicit RANS Implementation • Benchmark • Optimization Case

  25. Test Case 3: TU Berlin TurboLab Stator. Optimization requirements. Objectives: • Decrease outflow axial deviation • Decrease total pressure loss. Considering 3 operating points

  26. TurboLab Manufacturing Constraints • N_blades = 15 • Chord length fixed • Casing fixture (figure dimensions: 60 mm, d = 10 mm, h = 20 mm, d = 2 mm)

  27. TurboLab: Boundary Conditions and Summary • Inlet P0: 102713.0 Pa • Inlet T0: 294.314 K • Inlet whirl angle: 42° • Inlet pitch angle: 0° • Mass flow imposed: 9 kg/s ± 0.1, with P2 adapted. Objectives: • Decrease outflow axial deviation • Decrease total pressure loss. Considering 3 operating points
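
One way to read "mass flow imposed, P2 adapted" is that the outlet pressure P2 is adjusted until the computed mass flow lands in the 9 kg/s ± 0.1 band. The bisection below sketches that reading. The mass-flow model is a purely hypothetical stand-in for a CFD run, and the function names and the constant `k` are assumptions made for illustration.

```python
import math


def toy_mass_flow(p2, p0=102713.0, k=0.1):
    """Hypothetical stand-in for a CFD run: flow drops as back pressure rises."""
    return k * math.sqrt(max(p0 - p2, 0.0))


def adapt_outlet_pressure(mass_flow, target=9.0, tol=0.1,
                          p_low=0.0, p_high=102713.0, max_iter=60):
    """Bisect on the outlet pressure until the mass flow is within tol."""
    for _ in range(max_iter):
        p2 = 0.5 * (p_low + p_high)
        m = mass_flow(p2)
        if abs(m - target) <= tol:
            return p2, m
        if m > target:
            p_low = p2   # too much flow: raise the back pressure
        else:
            p_high = p2  # too little flow: lower the back pressure
    raise RuntimeError("mass flow target not reached")
```

Calling `adapt_outlet_pressure(toy_mass_flow)` returns an outlet pressure and the corresponding mass flow inside the tolerance band. In a real setup each `mass_flow` call would be a flow solution, so a method needing fewer evaluations than bisection would be preferable.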

  28. Parametrization: 21 design variables (figure axis: Span [-])

  29. TurboLab Parametrization

  30. Optimization Results (figure annotations: 0.17%, 1.7%, 60%; label: IT074IND6)

  31. Optimized Blade

  32. Baseline Vs Optimized

  33. Baseline Vs Optimized

  34. Isentropic Mach Number at mid-span

  35. Conclusion • Optimization • GPU Solver with implicit time stepping • On-demand (incomplete) Factorization • 10x speedup • Aerodynamic shape optimization

  36. Future Work. Benchmark case: Transonic Turbine Stator T106c (plot: speedup based on CPU explicit versus mesh size, 50k, 450k, and 900k; series: CPU explicit, CPU implicit, GPU explicit, GPU implicit)

  37. Thanks for your attention. Mohamed Hassanine Aissa, Turbomachinery & Propulsion Department, 72, chaussee de Waterloo, B1640 - Rhode Saint Genese - Belgium. Email: aissa@vki.ac.be. Acknowledgements: Support, Hardware
