Acceleration of Turbomachinery Steady CFD Simulations on GPU. Mohamed H. Aissa (1,2), Dr. Tom Verstraete (1), Prof. Cornelis Vuik (2). (1) Turbomachinery Department, Von Karman Institute, Belgium. (2) Delft Institute of Applied Mathematics, TU Delft, the Netherlands. Unconventional HPC, EuroPar 2016 Workshop, Grenoble, 23.08.2016
Topic of Interest: Reduce Fuel Consumption and CO2 Emissions (Source: Wikipedia.org)
Topic of Interest: Turbomachinery is about Performance and Efficiency
Topic of Interest: Axial Jet Engine (Source: Wikipedia.org)
Content • Multidisciplinary Optimization • CFD simulations on GPU • Literature review • Implicit RANS Implementation • Benchmark • Optimization Case
Optimization algorithm: $\min_{\mathbf{x}} f(\mathbf{x})$ subject to $\mathbf{g}(\mathbf{x}) \le 0$. Derivative-based optimization: • fast convergence, but • derivative evaluation can be complicated and problem specific (adjoint, automatic differentiation). Derivative-free methods (e.g. population based): • simplicity • black-box approach to the evaluation, but • large number of evaluations.
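To make the "large number of evaluations" trade-off concrete, here is a minimal sketch of a population-based, derivative-free optimizer. It implements the classic DE/rand/1/bin variant of Differential Evolution (the method named in the ICNAAM 2015 reference cited in this deck), but it is a generic textbook version, not the CADO implementation. The sphere function stands in for a CFD evaluation, constraints are not handled, and the parameter values (`pop_size`, `F`, `CR`, `generations`) are illustrative assumptions.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimal DE/rand/1/bin sketch; f is treated as a black box."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the design space
                else:
                    v = pop[i][j]
                trial.append(v)
            t_cost = f(trial)  # one "expensive" evaluation per trial vector
            if t_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, t_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

With the defaults, the optimizer calls `f` exactly `pop_size * (generations + 1)` = 4020 times. For a cheap test function that is negligible; when every call is a CFD run, it is precisely why the solver has to be fast.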
CFD: Core of the Optimization. CADO: the VKI in-house optimizer. • CFD is much slower than CSM (computational structural mechanics) • Need for acceleration → GPU
Steady CFD Simulations • A simulation with a unique solution for given boundary conditions. • An initial solution is advanced iteratively in time until convergence.
Numerical Scheme: explicit time stepping (β = 0) and implicit time stepping (β = 1). Reference: Aissa, M.H., Verstraete, T., Vuik, C. "Aerodynamic Optimization of Supersonic Compressor Cascade using Differential Evolution on GPU". 13th Int. Conf. of Numerical Analysis and Applied Mathematics (ICNAAM 2015).
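The slide's equations did not survive extraction. One common way to write a β-weighted, linearized time step that reduces to explicit stepping for β = 0 and to implicit (backward Euler) stepping for β = 1 is given below. Here $\mathbf{U}$ is the vector of conservative variables, $\mathbf{R}$ the spatial residual, $\Delta t$ the time step and $\partial\mathbf{R}/\partial\mathbf{U}$ the Jacobian. These symbols are assumptions and may differ from the authors' notation.

$$
\left(\frac{\mathbf{I}}{\Delta t} + \beta\,\frac{\partial \mathbf{R}}{\partial \mathbf{U}}\right)\Delta\mathbf{U}^{n} = -\mathbf{R}(\mathbf{U}^{n}),
\qquad \mathbf{U}^{n+1} = \mathbf{U}^{n} + \Delta\mathbf{U}^{n}
$$

$$
\beta = 0:\quad \Delta\mathbf{U}^{n} = -\Delta t\,\mathbf{R}(\mathbf{U}^{n}),
\qquad
\beta = 1:\quad \left(\frac{\mathbf{I}}{\Delta t} + \frac{\partial \mathbf{R}}{\partial \mathbf{U}}\right)\Delta\mathbf{U}^{n} = -\mathbf{R}(\mathbf{U}^{n})
$$

For β = 1, every step requires the solution of a large sparse linear system with the Jacobian.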
Implicit Time Stepping is more stable but … [figure: coupled unknowns X1, X2, …, Xn]
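The stability gap can be seen even on a toy problem. The sketch below assumes a hypothetical linear residual R(u) = A u − b with two coupled unknowns (a stand-in for the discretized flow equations, whose Jacobian is simply A) and marches it to steady state with the β-weighted update (I/Δt + βA) Δu = −R(u). The explicit march is stable only if Δt < 2/λ_max, here 2/300. The implicit march converges at any Δt, but it pays for that with a linear solve at every step.

```python
def residual(A, b, u):
    """R(u) = A u - b for a 2x2 toy system."""
    return [A[0][0] * u[0] + A[0][1] * u[1] - b[0],
            A[1][0] * u[0] + A[1][1] * u[1] - b[1]]

def solve2x2(M, r):
    """Direct solve of M x = r (Cramer's rule)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (-M[1][0] * r[0] + M[0][0] * r[1]) / det]

def march(A, b, dt, beta, n_iter=200):
    """Pseudo-time march: (I/dt + beta*A) du = -R(u), u <- u + du."""
    u = [0.0, 0.0]
    for _ in range(n_iter):
        r = residual(A, b, u)
        M = [[1.0 / dt + beta * A[0][0], beta * A[0][1]],
             [beta * A[1][0], 1.0 / dt + beta * A[1][1]]]
        du = solve2x2(M, [-r[0], -r[1]])  # for beta = 0 this is du = -dt * r
        u = [u[0] + du[0], u[1] + du[1]]
    return u, max(abs(x) for x in residual(A, b, u))

# Coupled system with eigenvalues 100 and 300; steady solution u = (2/3, 1/3).
A = [[200.0, -100.0], [-100.0, 200.0]]
b = [100.0, 0.0]
```

With `dt=0.01`, `march(A, b, 0.01, beta=0.0)` diverges because the amplification factor of the stiff mode is 1 − 300·0.01 = −2. `march(A, b, 0.01, beta=1.0)` converges because that factor becomes 1/(1 + 3) = 0.25.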
Literature Review • What to port: only the linear solver, when it dominates the run time; or both assembly and solve, which is optimal (no CPU–GPU communication) • Linear solver: a library offers code maturity but is restrictive (petsc-dev, Paralution, AmgX, ViennaCL …); own code offers flexibility • Storage format: standard (CSR, DIA …) or new (hybrid)
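For readers unfamiliar with the storage formats listed above, here is a minimal sketch of CSR (compressed sparse row) and the sparse matrix-vector product that dominates Krylov-type linear solvers. It is a plain serial reference. A GPU kernel would typically assign each row (or a group of threads) to one output entry `y[i]`, but that mapping is not shown here.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense list of rows."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:          # store nonzeros only
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i occupies [row_ptr[i], row_ptr[i+1])
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A x using CSR storage; each row is independent (parallel on GPU)."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y
```

For the tridiagonal matrix `[[4,-1,0],[-1,4,-1],[0,-1,4]]`, CSR stores 7 values with `row_ptr = [0, 2, 5, 7]`, and multiplying by `x = [1, 2, 3]` gives `[2, 4, 10]`.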
CFD Solver (Standard): implicit Runge-Kutta scheme (Xu et al., JCP 2015). See http://mhaissa.blogspot.be/2015/10/for-paralution-gpu-conversion-and.html
CFD Solver (On-demand Factorization): implicit Runge-Kutta scheme
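Both solver variants rely on an ILU (incomplete LU) factorization; the benchmark labels them "ILU" and "ILU OD". As background, here is a minimal ILU(0) sketch: Gaussian elimination restricted to the nonzero pattern of the matrix, followed by the forward/backward substitution used to apply the preconditioner. This is the standard textbook IKJ variant on a dense array for readability. It is not the authors' on-demand variant, and the function names are hypothetical.

```python
def ilu0(A):
    """ILU(0): factors L (unit lower) and U stored in one matrix, no fill-in."""
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    nz = [[A[i][j] != 0 for j in range(n)] for i in range(n)]  # sparsity pattern
    for i in range(1, n):
        for k in range(i):
            if not nz[i][k]:
                continue
            LU[i][k] /= LU[k][k]                 # multiplier l_ik
            for j in range(k + 1, n):
                if nz[i][j]:                     # update only inside the pattern
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def ilu_solve(LU, r):
    """Apply the preconditioner: solve L y = r, then U z = y."""
    n = len(LU)
    y = [0.0] * n
    for i in range(n):                           # forward substitution (unit L)
        y[i] = r[i] - sum(LU[i][j] * y[j] for j in range(i))
    z = [0.0] * n
    for i in reversed(range(n)):                 # backward substitution
        z[i] = (y[i] - sum(LU[i][j] * z[j] for j in range(i + 1, n))) / LU[i][i]
    return z
```

For a tridiagonal matrix no fill-in occurs, so ILU(0) equals the exact LU and `ilu_solve` solves the system exactly. For general sparse matrices it is only an approximation, with the product L·U matching A on A's nonzero pattern. The triangular solves are inherently sequential, which is one reason ILU is hard to accelerate on GPUs.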
CFD Solver (On-demand Factorization): stop condition relative, absolute, or a combination of both
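Here is a minimal sketch of the three stop conditions for the iterative linear solve. The tolerance names and default values are assumptions. The "combined" mode shown, stopping as soon as either tolerance is met via `max(rtol * ||r0||, atol)`, is one common way to combine them; the slide does not say which combination the solver uses.

```python
def converged(res_norm, res0_norm, rtol=1e-3, atol=1e-10, mode="combined"):
    """Return True when the linear-solver residual is small enough."""
    if mode == "relative":   # reduction w.r.t. the initial residual
        return res_norm <= rtol * res0_norm
    if mode == "absolute":   # fixed threshold, independent of the start
        return res_norm <= atol
    if mode == "combined":   # stop when either criterion is satisfied
        return res_norm <= max(rtol * res0_norm, atol)
    raise ValueError(f"unknown mode: {mode}")
```

The combination matters near steady state. When the initial residual is already tiny (say 1e-9), a purely relative test asks for a reduction to 1e-12, which wastes iterations, while the absolute floor stops the solve once the residual reaches `atol`.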
Benchmark: Flow around the LS89, 2-stage Runge-Kutta, 1/12
Assembly Acceleration [figure: assembly, linear-solve and global speedups on the coarse and fine meshes, CPU with 2, 3 and 4 cores vs. GPU, Standard ILU vs. On-demand ILU (ILU OD); highlighted assembly speedups: ×7.8 on the coarse mesh, ×12.2 on the fine mesh; annotations 10%, 70%, 90%, 30%]
Linear Solver Acceleration [figure: assembly, linear-solve and global speedups on the coarse and fine meshes, CPU with 2, 3 and 4 cores vs. GPU, Standard ILU vs. On-demand ILU (ILU OD); highlighted linear-solve speedups: ×1.2 and ×0.7 on the coarse mesh, ×5.7 and ×1.8 on the fine mesh]
Global Acceleration [figure: assembly, linear-solve and global speedups on the coarse and fine meshes, CPU with 2, 3 and 4 cores vs. GPU, Standard ILU vs. On-demand ILU (ILU OD); highlighted global speedups: ×3.2 and ×2.0 on the coarse mesh, ×9.6 and ×4.8 on the fine mesh]. Suggestions for a better performance assessment are very welcome!
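The global speedup is bounded by the two partial speedups, and its exact value depends on how the baseline time splits between assembly and linear solve (Amdahl's law). The sketch below assumes the run time is exactly assembly + solve, with `f_asm` the assembly fraction of the baseline time. It also backs out the fraction implied by a measured global speedup. The function names are hypothetical, and the numbers in the usage note are illustrative, not the measured ones.

```python
def global_speedup(f_asm, s_asm, s_solve):
    """Overall speedup when a fraction f_asm of the baseline time is assembly
    (accelerated by s_asm) and the rest is linear solve (accelerated by s_solve)."""
    return 1.0 / (f_asm / s_asm + (1.0 - f_asm) / s_solve)

def implied_assembly_fraction(s_global, s_asm, s_solve):
    """Invert 1/s_global = f/s_asm + (1 - f)/s_solve for f (needs s_asm != s_solve)."""
    return (1.0 / s_global - 1.0 / s_solve) / (1.0 / s_asm - 1.0 / s_solve)
```

For example, if half the baseline time is assembly, a 10× assembly speedup combined with a 2× solve speedup gives only about 3.3× overall. The slower part quickly dominates, which is why accelerating the linear solver matters as much as the assembly.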
Increase of the Speedup for Higher Numbers of Runge-Kutta Stages on the Fine Mesh [figure: assembly, solve and global speedup (0–16) vs. number of stages N (2–6)]
Test Case 3: TU Berlin TurboLab Stator. Optimization requirements. Objectives: • Decrease outflow axial deviation • Decrease total pressure loss. Considering 3 operating points.
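With two objectives to minimize, a multi-objective optimizer keeps the designs that no other design beats in both objectives at once (the Pareto front). Here is a minimal sketch of that dominance test. It assumes each design has already been reduced to a tuple (deviation, pressure loss), for example by aggregating over the three operating points. That aggregation, and the sample numbers, are hypothetical.

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in every objective, better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical designs as (axial deviation, total pressure loss) pairs.
designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
```

Here `(3.0, 4.0)` is dominated by `(2.0, 3.0)`, so `pareto_front(designs)` keeps the other three designs as the trade-off curve between deviation and loss.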
TurboLab Manufacturing Constraints • N_blades = 15 • Chord length fixed • Casing fixture 60 mm [figure dimensions: d = 10 mm, h = 20 mm, d = 2 mm]
TurboLab: Boundary Conditions and Summary • Mass flow imposed: 9 kg/s ± 0.1, P2 adapted • Inlet P0: 102713.0 Pa • Inlet T0: 294.314 K • Inlet whirl angle: 42° • Inlet pitch angle: 0°. Objectives: • Decrease outflow axial deviation • Decrease total pressure loss. Considering 3 operating points.
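"Mass flow imposed, P2 adapted" means the back pressure P2 is adjusted until the computed mass flow falls inside the 9 ± 0.1 kg/s band. The sketch below shows one simple way to do this: a proportional controller. The controller, its `gain`, and the linear toy mass-flow model are hypothetical assumptions for illustration, not the authors' procedure. In a real run, each call to `massflow_of` would be a (partially converged) CFD evaluation.

```python
def adapt_outlet_pressure(massflow_of, p2_start, target=9.0, tol=0.1,
                          gain=150.0, max_iter=100):
    """Hypothetical proportional update of the outlet pressure P2 [Pa]
    until |mass flow - target| <= tol [kg/s]."""
    p2 = p2_start
    for it in range(max_iter):
        mdot = massflow_of(p2)
        err = mdot - target
        if abs(err) <= tol:
            return p2, mdot, it
        # Too much flow -> raise the back pressure; too little -> lower it.
        p2 += gain * err
    raise RuntimeError("mass flow did not reach the target band")

# Toy model (assumption): mass flow falls linearly as P2 approaches the inlet P0.
P0 = 102713.0
def toy_massflow(p2):
    return 9.0 / 2713.0 * (P0 - p2)  # equals 9 kg/s at P2 = 100000 Pa
```

Starting from `p2_start = 101000.0`, the toy model reaches the ±0.1 kg/s band in a few updates. The gain must stay below 2 / |d(mdot)/dP2| (about 603 Pa per kg/s here) or the iteration overshoots and diverges.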
Parametrization: 21 design variables [figure axis: span [-]]
TurboLab Parameterization
Optimization Results [figure: selected individual IT074IND6; annotated values 0.17%, 1.7%, 60%]
Optimized Blade
Baseline vs. Optimized
Isentropic Mach Number at mid-span
Conclusion • Optimization • GPU solver with implicit time stepping • On-demand (incomplete) factorization • About 10× global speedup (fine mesh) • Aerodynamic shape optimization
Future Work: Benchmark Case: Transonic Turbine Stator T106c [figure: speedup relative to CPU explicit (0–80) vs. mesh size (50k, 450k, 900k) for CPU explicit, CPU implicit, GPU explicit and GPU implicit]
Thanks for your attention. Mohamed Hassanine Aissa, Turbomachinery & Propulsion Department, 72, chaussee de Waterloo, B1640 Rhode Saint Genese, Belgium. Email: aissa@vki.ac.be. Acknowledgements: Hardware Support