Burning on the GPU: Fast and Accurate Chemical Kinetics — GPU Technology Conference presentation


  1. Burning on the GPU: Fast and Accurate Chemical Kinetics. GPU Technology Conference, April 7, 2016. Russell Whitesides, Session 6195. Funded by: U.S. Department of Energy Vehicle Technologies Program. Program Managers: Gurpreet Singh & Leo Breton. LLNL-PRES-687782. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

  2. Why? To make it go faster?

  3. Why? • Transportation efficiency • Chemistry is vital to predictive simulations • Chemistry can be > 90% of simulation time. We burn a lot of gasoline.

  4. Why? Supercomputing @ DOE labs: strong investment in GPUs with an eye towards exascale. OEM engine designers: require fast turnaround on desktop-class hardware. National lab compute power and industry need.

  5. “Typical” engine simulation w/ detailed chemistry [figure: temperature and O2 mass-fraction (Y_O2) fields] “Colorful Fluid Dynamics”

  6. Detailed Chemistry in Reacting Flow CFD. Operator Splitting Technique: solve an independent set of ordinary differential equations (ODEs) in each cell to calculate chemical source terms for the species and energy advection/diffusion equations. [figure: cells advanced from t to t+∆t] Each cell is treated as an isolated system for chemistry.
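The per-cell chemistry solve described above can be sketched in a few lines. This is a toy model, not the stiff solver behind the talk: the "chemistry" here is a single first-order decay dY/dt = −kY, and each cell is advanced independently over the CFD step ∆t with fixed-step RK4 (the names `rhs` and `integrate_cell` are illustrative).

```python
def rhs(y, k):
    """Toy chemical source term: first-order consumption, dY/dt = -k*Y."""
    return -k * y

def integrate_cell(y0, k, dt, n_sub=100):
    """Advance one cell's state over the CFD step dt with n_sub RK4 sub-steps."""
    h = dt / n_sub
    y = y0
    for _ in range(n_sub):
        k1 = rhs(y, k)
        k2 = rhs(y + 0.5 * h * k1, k)
        k3 = rhs(y + 0.5 * h * k2, k)
        k4 = rhs(y + h * k3, k)
        y += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

# Each cell is an isolated system for chemistry: its own state (Y0) and rate (k).
cells = [(1.0, 2.0), (0.5, 10.0), (0.8, 0.1)]   # (Y0, k) per cell
dt = 0.1
states = [integrate_cell(y0, k, dt) for y0, k in cells]
```

Real combustion chemistry is stiff, so production codes use implicit adaptive solvers rather than fixed-step RK4; the point here is only the structure: one independent ODE solve per cell.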

  7. CPU (un-coupled) chemistry integration [figure: cells advanced from t to t+∆t] Each cell is treated as an isolated system for chemistry.

  8. GPU (batched) chemistry integration [figure: cells advanced from t to t+∆t] On the GPU we solve chemistry in batches of cells simultaneously.

  9. Previously at GTC: see also Whitesides & McNenly, GTC 2015; McNenly & Whitesides, GTC 2014.

  10. Note: most CFD simulations are done on distributed-memory systems [figure: eight MPI ranks (rank0–rank7), each with its own CPU] n_gpu = 0;

  11. Note: most CFD simulations are done on distributed-memory systems [figure: eight MPI ranks (rank0–rank7), each with its own CPU] ++n_gpu; //now what?

  12. Ideal CPU-GPU Work-sharing: S_GPU = walltime(CPU) / walltime(GPU). Here CPU is a single core.

  13. Ideal CPU-GPU Work-sharing: S_GPU = walltime(CPU) / walltime(GPU); S_total = (N_CPU + N_GPU (S_GPU − 1)) / N_CPU, where N_CPU = # CPU cores and N_GPU = # GPU devices. [figure: S_total vs. N_GPU for N_CPU = 4, 8, 16, and 32, with markers for TITAN (1.4375) and Surface (1.8750)] Let’s make use of the whole machine.
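The work-sharing model above can be checked numerically. A minimal sketch, assuming the reconstructed formula S_total = (N_CPU + N_GPU (S_GPU − 1)) / N_CPU, i.e. each GPU replaces the one core that drives it with S_GPU cores' worth of throughput:

```python
def ideal_speedup(n_cpu, n_gpu, s_gpu):
    """Ideal whole-machine speedup over n_cpu CPU cores alone, when each
    GPU does the work of s_gpu cores but ties up one core to drive it."""
    return (n_cpu + n_gpu * (s_gpu - 1.0)) / n_cpu

# Engine-case numbers from later in the talk: 16 cores, 2 GPUs, S_GPU = 6.6.
speedup = ideal_speedup(16, 2, 6.6)   # ≈ 1.7
```

With those inputs the model reproduces the S_total = 1.7 reported on the final results slide, which is the consistency check behind this reconstruction.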

  14. Good performance in simple case with both CPU and GPU doing work [figure: chemistry time (100–10000 s) vs. number of processors (1–16) for CPU chemistry and GPU chemistry (std work sharing)] Distribute based on number of cells and give more to GPU.

  15. Good performance in simple case with both CPU and GPU doing work [figure: chemistry time (100–10000 s) vs. number of processors (1–16) for CPU chemistry, GPU chemistry (std work sharing), and GPU chemistry (custom work sharing); annotations: S_total = 1.7, S_GPU = 7 (S_GPU = 6.6)] Distribute based on number of cells and give more to GPU.
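"Distribute based on number of cells and give more to GPU" could look like the following sketch. The `share_cells` helper is hypothetical (the slides do not show the production scheme): it splits the cell count across ranks in proportion to throughput, weighting each GPU rank by S_GPU.

```python
def share_cells(n_cells, n_cpu_ranks, n_gpu_ranks, s_gpu):
    """Split cells so each GPU rank gets s_gpu times a CPU rank's share."""
    weights = [1.0] * n_cpu_ranks + [s_gpu] * n_gpu_ranks
    total = sum(weights)
    counts = [int(n_cells * w / total) for w in weights]
    counts[-1] += n_cells - sum(counts)   # hand the rounding remainder to a GPU rank
    return counts

# e.g. 14 CPU-only ranks plus 2 GPU-driving ranks with per-device speedup 6.6
counts = share_cells(100000, 14, 2, 6.6)
```

Every cell is assigned exactly once, and each GPU rank ends up with roughly 6.6x the cells of a CPU rank, so all ranks should finish the chemistry step at about the same time.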

  16. First attempt @ engine calculation on GPU+CPU Let’s go!

  17. First attempt @ engine calculation on GPU+CPU § 2x Xeon E5-2670 (16 cores) => 21.2 hours § 2x Xeon E5-2670 + 2 Tesla K40m => 17.6 hours (S_GPU = 2.6) § S_total = 21.2/17.6 = 1.20 What happened?

  18. Integrator performance when doing batch vs. individual solution: if the systems are not similar, how much extra work needs to be done?

  19. Batches of dissimilar reactors will suffer from excessive extra steps What penalty do we pay when batching?


  21. Batches of dissimilar reactors will suffer from excessive extra steps Possibly a lot of extra steps.
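The batching penalty is easy to quantify in a toy model: if an adaptive solver advances a whole batch in lockstep, every reactor pays for the step count of the stiffest member. The step counts below are hypothetical, chosen to mimic one igniting cell batched with several quiescent ones.

```python
def lockstep_cost(steps_needed):
    """Steps actually executed when a batch advances in lockstep: every
    reactor pays for the slowest (stiffest) member of its batch."""
    return len(steps_needed) * max(steps_needed)

batch = [5, 6, 5, 400]        # hypothetical per-reactor step counts
useful = sum(batch)           # steps of real work: 416
paid = lockstep_cost(batch)   # steps executed in lockstep: 1600
penalty = paid / useful       # roughly 3.8x extra work
```

One stiff reactor in an otherwise easy batch nearly quadruples the work, which is the "possibly a lot of extra steps" problem the next slides address by sorting.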

  22. Sort reactors by how many steps they took to solve on the last CFD step [figure: reactors binned by n_steps, from 1 to >100, into batch0–batch3] Easy as pie?
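The sorting idea can be sketched with a hypothetical `make_batches` helper: order reactors by the step count they needed on the previous CFD step, then cut the sorted list into contiguous batches so each batch holds reactors of similar cost.

```python
def make_batches(step_counts, n_batches=4):
    """Sort reactor indices by last step count and cut into contiguous
    batches of similar cost (an illustrative sketch)."""
    order = sorted(range(len(step_counts)), key=lambda i: step_counts[i])
    size = -(-len(order) // n_batches)   # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]

# hypothetical previous-step counts for eight reactors
batches = make_batches([3, 120, 7, 95, 4, 110, 6, 90])
```

The cheap reactors land together in the early batches and the stiff ones in the later batches, so no lockstep batch mixes a 3-step reactor with a 120-step one.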

  23. Have to manage the sorting and load-balancing in a distributed-memory system [figure: eight MPI ranks (rank0–rank7)] Not so fast.

  24. Load balance based on expected cost and expected performance. [figure: eight MPI ranks (rank0–rank7)] MPI communication to re-balance for chemistry.
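The cost-based re-balancing could be sketched as a greedy longest-processing-time assignment: hand each piece of work, most expensive first, to the currently least-loaded rank. This is an illustrative stand-in for the MPI scheme in the talk, with hypothetical per-batch costs.

```python
import heapq

def balance(costs, n_ranks):
    """Greedy longest-processing-time assignment of work items to ranks.
    Costs would be expected chemistry costs from the previous CFD step."""
    heap = [(0.0, rank, []) for rank in range(n_ranks)]
    heapq.heapify(heap)
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, rank, items = heapq.heappop(heap)   # least-loaded rank
        items.append(i)
        heapq.heappush(heap, (load + costs[i], rank, items))
    return {rank: items for _, rank, items in heap}

costs = [10.0, 1.0, 1.0, 1.0, 1.0, 10.0]   # hypothetical per-batch costs
assignment = balance(costs, n_ranks=2)
```

In a real code the assignment would be followed by MPI communication to move reactor states to their assigned ranks and gather the results back, as the slide's arrows suggest.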

  25. Second attempt @ engine calculation on GPU+CPU Let’s go again!

  26. Total steps significantly reduced by batching appropriately How much difference does it make?

  27. Engine results with improved work-sharing and reactor sorting [chart: chemistry time 13.0 hrs, 9.1 hrs, 7.6 hrs; S_total = 1.7, S_GPU = 6.6] ~40% reduction in chemistry time; ~36% reduction in overall time :)

  28. Future directions § Improve S_GPU • Derivative kernels • Matrix operations § Extrapolative integration methods • Less “startup” cost when re-initializing • Potentially well suited for GPU § Non-chemistry calculations on GPU • Multi-species transport • Particle spray Possibilities for significant further improvements.

  29. Summary § Much improved CFD chemistry work-sharing with GPU § ~40% reduction in chemistry time for real engine case (~36% total time) § Working on further improvement Thank you!
