
Jerry Lee, Vaidya Sankaran, United Technologies Research Center

Hybrid Simulations Using CPU-GPU Paradigm for Reacting Flows, in Accelerating Industrial Competitiveness through Extreme-Scale Computing. Jerry Lee, Vaidya Sankaran, United Technologies Research Center (UTC), East Hartford.


  1. Hybrid Simulations Using CPU-GPU Paradigm for Reacting Flows, in Accelerating Industrial Competitiveness through Extreme-Scale Computing. Jerry Lee, Vaidya Sankaran, United Technologies Research Center (UTC), East Hartford. Acknowledgement: Vivek Venugopal and Hui Gao, for helping to implement the GPU code. This page contains no technical information subject to EAR and ITAR.

  2. Special thanks to Dr. Ramanan Sankaran, Dr. Suzy Tichenor, and Dr. Jack Wells, Oak Ridge National Laboratory, USA.

  3. Reactive flow adds a lot more PDEs to cold-flow CFD. Each transported quantity $A$ obeys $$\frac{\partial (\rho A)}{\partial t} + \frac{\partial (\rho u_i A)}{\partial x_i} = -\frac{\partial (\rho V_i A)}{\partial x_i} + \dot{\omega}_A, \qquad A = 1,\ u_i,\ e,\ Y_\alpha;\quad \alpha = 1, \dots, N_s;\quad i = 1, 2, 3,$$ with $N_s \sim 40$. Combustion adds a lot more PDEs: 9x cold flow.
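
The "9x" figure on this slide can be checked by counting the transported quantities $A$. The sketch below assumes one equation each for mass and energy, one per velocity component, and one per species; the function name `equation_count` is illustrative and not from the presentation.

```python
def equation_count(n_species, n_dims=3):
    """Count transported quantities: mass + momentum components + energy + species."""
    return 1 + n_dims + 1 + n_species


cold_flow = equation_count(0)    # no species equations
reactive = equation_count(40)    # N_s ~ 40, as quoted on the slide
ratio = reactive / cold_flow

print(cold_flow, reactive, ratio)
```

With $N_s = 40$ this gives 45 equations against 5 for cold flow, which matches the quoted factor of 9.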

  4. “Fuel + O2 → product + heat” has a lot of paths, steps, and transcendental functions. ONE step, $\mathrm{H + O_2\ (+M) \rightarrow HO_2\ (+M)}$, already needs 14 transcendental function evaluations: $$\frac{dY_{\mathrm{H}}}{dt} \propto -k\,[\mathrm{H}]\,[\mathrm{O_2}]\,[\mathrm{M}], \qquad k = k_\infty \left(\frac{P_r}{1 + P_r}\right) F, \qquad P_r = \frac{k_0\,[\mathrm{M}]}{k_\infty},$$ $$k_0 = A_0 T^{\beta_0} \exp(-E_0 / RT), \qquad k_\infty = A_\infty T^{\beta_\infty} \exp(-E_\infty / RT),$$ $$\log F = \left[1 + \left(\frac{\log P_r + c}{n - d\,(\log P_r + c)}\right)^{2}\right]^{-1} \log F_c,$$ $$F_c = (1 - \alpha)\exp(-T/T^{***}) + \alpha \exp(-T/T^{*}) + \exp(-T^{**}/T),$$ with $c$, $n$, $d$ linear functions of $\log F_c$. ~200-300 steps for jet fuel!
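
To make the cost of this one step concrete, here is a minimal sketch of a Troe falloff rate evaluation. The coefficients for `c`, `n`, and `d` are the standard Troe values; the Arrhenius and Troe parameters in the usage lines are illustrative placeholders, not the mechanism used in the presentation, and all function names are assumptions.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol K)


def arrhenius(A, beta, E, T):
    """Modified Arrhenius rate: A * T**beta * exp(-E / (R T))."""
    return A * T**beta * math.exp(-E / (R_CAL * T))


def troe_center(troe, T):
    """Troe centering factor F_c from (alpha, T***, T*, T**)."""
    alpha, T3, T1, T2 = troe
    return ((1.0 - alpha) * math.exp(-T / T3)
            + alpha * math.exp(-T / T1)
            + math.exp(-T2 / T))


def troe_F(pr, fc):
    """Broadening factor F for reduced pressure pr and centering factor fc."""
    log_fc = math.log10(fc)
    c = -0.4 - 0.67 * log_fc   # standard Troe coefficients
    n = 0.75 - 1.27 * log_fc
    d = 0.14
    x = math.log10(pr) + c
    return 10.0 ** (log_fc / (1.0 + (x / (n - d * x)) ** 2))


def falloff_rate(T, M, low, high, troe):
    """Effective rate k = k_inf * Pr / (1 + Pr) * F."""
    k0 = arrhenius(*low, T)
    k_inf = arrhenius(*high, T)
    pr = k0 * M / k_inf
    return k_inf * pr / (1.0 + pr) * troe_F(pr, troe_center(troe, T))


# Illustrative placeholder parameters (NOT a validated mechanism).
LOW = (6.0e20, -1.7, 500.0)          # A_0, beta_0, E_0
HIGH = (1.5e12, 0.6, 0.0)            # A_inf, beta_inf, E_inf
TROE = (0.8, 1.0e-30, 1.0e30, 1.0e30)

k = falloff_rate(1500.0, 8.0e-6, LOW, HIGH, TROE)
print(k)
```

Every call goes through several `exp`, `log10`, and power evaluations, which is why a mechanism with hundreds of such steps is expensive per cell.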

  5. Performance of standalone GPU chem solver (explicit). [Plot: time per step (s), log scale from 1.0E-02 to 1.0E+03, versus number of threads, 0 to 2.0E+07, for CPU and GPU; 1 thread = 1 ODE system.] Cuts cost of chemistry compute way down in CFD.
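
The "1 thread = 1 ODE system" mapping can be sketched on the CPU as a loop over independent systems, each advanced by an explicit integrator. This is a sketch under assumptions: classical RK4 stands in for whatever explicit scheme the solver uses, and `toy_rhs` is a hypothetical two-species reaction A → B, not the presentation's chemistry.

```python
import math


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def toy_rhs(y):
    """Hypothetical reaction A -> B with unit rate constant."""
    a, _b = y
    return [-a, a]


def advance_all(systems, dt, n_steps):
    """Advance each ODE system independently; each loop body emulates one thread."""
    out = []
    for y in systems:
        for _ in range(n_steps):
            y = rk4_step(toy_rhs, y, dt)
        out.append(y)
    return out


result = advance_all([[1.0, 0.0], [0.5, 0.5]], dt=0.01, n_steps=100)
print(result)
```

Because no system reads another system's state, the outer loop is the part that would be distributed across GPU threads.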

  6. Performance of standalone GPU chem solver (implicit). [Plot: wall time per ODE per time step (s), log scale from 1.0E-06 to 1.0E-03, versus number of threads, 256 to 4194304; curves for GPU and for DVODE on CPU; labels: trans, chem. on CPU.] Cuts cost of chemistry compute way down in CFD: reactive code cost → transport code cost.
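
Why an implicit solver is worth its extra cost for stiff chemistry can be seen on a toy problem. The sketch below is not DVODE: it uses backward Euler (the first-order BDF method) on a hypothetical stiff relaxation $dy/dt = -k\,(y - y_{eq})$, where the implicit update has a closed form, and compares it with explicit Euler at the same step size.

```python
def explicit_euler(y, dt, k, y_eq):
    """Forward Euler step for dy/dt = -k (y - y_eq)."""
    return y + dt * (-k * (y - y_eq))


def backward_euler(y, dt, k, y_eq):
    """Backward Euler step; for this linear problem the implicit solve is closed-form."""
    return (y + dt * k * y_eq) / (1.0 + dt * k)


def integrate(step, y0, dt, k, y_eq, n_steps):
    y = y0
    for _ in range(n_steps):
        y = step(y, dt, k, y_eq)
    return y


# Hypothetical stiff case: dt * k = 10, far outside forward Euler's stability limit of 2.
K, Y_EQ, DT = 1.0e4, 0.25, 1.0e-3
y_explicit = integrate(explicit_euler, 1.0, DT, K, Y_EQ, 20)
y_implicit = integrate(backward_euler, 1.0, DT, K, Y_EQ, 20)
print(y_explicit, y_implicit)
```

At this step size the explicit result diverges while the implicit one relaxes to `Y_EQ`, which is the trade-off a stiff solver such as DVODE exploits with higher-order formulas and adaptive steps.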

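  7. Operator splitting in CFD → thread the chemistry compute. Integrate the terms of $$\frac{\partial (\rho A)}{\partial t} + \frac{\partial (\rho u_i A)}{\partial x_i} = -\frac{\partial (\rho V_i A)}{\partial x_i} + \dot{\omega}_A$$ in tandem. The chemical source term $\dot{\omega}_A$ is independent of neighbors: collect it from many cells and do threading. Overall acceleration depends on chem. compute load.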
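
A minimal sketch of the splitting idea, assuming a first-order (sequential) split on a 1D periodic grid: the transport step needs neighbor values, while the chemistry step touches each cell on its own. The diffusion-only transport and the first-order decay reaction are hypothetical stand-ins chosen so the example stays short.

```python
import math


def transport_step(y, dt, diff, dx):
    """Explicit diffusion on a periodic 1D grid; every cell reads its neighbors."""
    n = len(y)
    coef = diff * dt / dx**2
    return [y[i] + coef * (y[(i + 1) % n] - 2.0 * y[i] + y[(i - 1) % n])
            for i in range(n)]


def chemistry_step(y, dt, k):
    """Exact solution of dy/dt = -k y in each cell; no neighbor data is needed."""
    decay = math.exp(-k * dt)
    return [yi * decay for yi in y]


def split_step(y, dt, diff, dx, k):
    """Integrate the two operators in tandem: transport first, then chemistry."""
    return chemistry_step(transport_step(y, dt, diff, dx), dt, k)


field = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(50):
    field = split_step(field, dt=0.1, diff=1.0, dx=1.0, k=0.5)
print(sum(field))
```

Since `chemistry_step` is a pure per-cell map, it is the piece that can be handed to GPU threads without any halo exchange.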

  8. Domain decomposition → CPU transport, GPU chemistry. [Diagram: CPU cores, each doing all transport terms for its subdomain and communicating over MPI; in tandem, GPU threads for concurrent CHEMISTRY.] One GPU keeps up with all 16 CPU cores?
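
The hand-off between many CPU subdomains and one GPU can be sketched as a gather, a batched chemistry call, and a scatter. Everything here is illustrative: `chemistry_batch` is a hypothetical CPU stand-in for the GPU kernel, and the real code would move data over MPI and device transfers rather than Python lists.

```python
import math


def gather(subdomains):
    """Concatenate per-subdomain cell values into one batch, recording offsets."""
    batch, offsets = [], []
    for cells in subdomains:
        offsets.append((len(batch), len(cells)))
        batch.extend(cells)
    return batch, offsets


def scatter(batch, offsets):
    """Split the batch back into per-subdomain lists."""
    return [batch[start:start + count] for start, count in offsets]


def chemistry_batch(batch, dt, k):
    """Hypothetical stand-in for the GPU kernel: one independent update per cell."""
    decay = math.exp(-k * dt)
    return [y * decay for y in batch]


# Three subdomains of different sizes, as owned by three CPU ranks.
subdomains = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
batch, offsets = gather(subdomains)
updated = scatter(chemistry_batch(batch, dt=0.1, k=1.0), offsets)
print(updated)
```

Batching the cells of all subdomains into one call is what lets a single device serve several CPU cores at once.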

  9. GPU-CPU hybrid code tested on a shear-layer turbulent flame, 3D Direct Numerical Simulation: (1) fully compressible reactive code; (2) detailed chemical kinetics; (3) detailed multicomponent transport; (4) three dimensions; (5) all scales fully resolved. Streams: air and fuel (CO, H2).

  10. GPU chemistry hidden completely.

  11. CPU-only runs → chemistry takes the lion’s share (12 scalars). 6x speed-up potential.

  12. GPU: chemistry compute load reduced from 83% to 7%. Capacity for bigger chemistry: 35-40 species → good for jet fuel!
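
The percentages here connect to the "6x" potential quoted on slide 11 through an Amdahl-style estimate. The sketch assumes that 83% and 7% are chemistry's shares of total wall time before and after offloading, and that transport time is unchanged; both are assumptions, and the function names are illustrative.

```python
def ideal_speedup(chem_fraction):
    """Upper bound if chemistry cost vanishes entirely (Amdahl's law)."""
    return 1.0 / (1.0 - chem_fraction)


def speedup_from_shares(chem_before, chem_after):
    """Speedup implied by chemistry's share of wall time before and after.

    Normalizes the CPU-only run to 1.0 and assumes transport time is unchanged.
    """
    transport = 1.0 - chem_before
    total_after = transport / (1.0 - chem_after)
    return 1.0 / total_after


print(ideal_speedup(0.83))
print(speedup_from_shares(0.83, 0.07))
```

Under these assumptions the bound is about 5.9x and the implied speedup about 5.5x, consistent with the roughly 6x potential on slide 11.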

  13. Strong scalability of CPU-GPU hybrid. [Plot: wall time versus number of cores.]

  14. Weak scalability of CPU-GPU hybrid: good up to 64M cells.
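
For reference, the two scalability measures on slides 13 and 14 are usually summarized as efficiencies. The definitions below are the common ones; the timings in the usage lines are made-up numbers for illustration only, since the plotted data did not survive in this transcript.

```python
def strong_efficiency(t_ref, n_ref, t, n):
    """Fixed total problem size: ideal wall time scales as 1 / cores."""
    return (t_ref * n_ref) / (t * n)


def weak_efficiency(t_ref, t):
    """Fixed work per core: ideal wall time stays constant as cores grow."""
    return t_ref / t


# Hypothetical timings, NOT taken from the presentation.
print(strong_efficiency(t_ref=100.0, n_ref=16, t=30.0, n=64))
print(weak_efficiency(t_ref=10.0, t=12.5))
```

An efficiency of 1.0 means perfect scaling in either sense; values below 1.0 measure the overhead from communication and load imbalance.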

  15. Looking forward: GPU-CPU hybrid is a significant tech. enabler. ~100 nodes doable with + Tune existing or create new turbulence-chemistry model. Image: http://www.happynews.com/news/11142008/visualizing-unseen-forces-turbulence.htm, courtesy of Stanford University Center for Turbulence Research.
