NVIDIA Technology Center MBE --- A GPU-based, fast, robust and precise solver for chemical ODEs Fan Feng, Zifa Wang GTC 2016 San Jose, USA, 04-07 Apr. 2016
Contents • Motivation of MBE solver • Introduction of MBE solver • Parallelization and GPU implementation • CPU vs. GPU numerical results • Performance of the CPU vs. GPU • Conclusions
Motivation of MBE solver • Air quality simulations: statistical model or kinetic model • Kinetic model: reaction-diffusion-advection PDEs • Operator splitting techniques: chemical reactions (ODEs), diffusion, advection, etc.
Motivation of MBE solver • Nested Air Quality Prediction Modeling System (NAQPMS), Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP/CAS) • Old chemical solver --- LSODE – Slow (>70% of NAQPMS time is spent on chemical ODEs) – Simulation errors (e.g. the computation may fail because the iteration procedure in LSODE does not succeed) • Need a faster, more robust and more precise solver
Introduction of MBE solver Chemical equations for m species: the right-hand side for each species consists of a production rate term and a loss rate term.
Introduction of MBE solver Numerical difficulty: the algorithm must conquer stiffness and maintain nonnegativity.
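To make stiffness and nonnegativity concrete, here is a minimal Python sketch on a single hypothetical species in production-loss form, dc/dt = P - L*c, with constant P and L chosen purely for illustration. It contrasts explicit Euler with a non-iterative, linearly implicit Euler update (a Patankar-type step). This stand-in is not the MBE formula, which is not reproduced in this text; it only shows how a stiff loss term can drive an explicit step negative, while treating the loss term implicitly keeps the concentration nonnegative without any iteration.

```python
def explicit_euler_step(c, P, L, dt):
    """One forward Euler step for dc/dt = P - L*c."""
    return c + dt * (P - L * c)


def linearly_implicit_step(c, P, L, dt):
    """One non-iterative step with the loss term treated implicitly.

    Solves c_new = c + dt*(P - L*c_new) for c_new in closed form.
    For c >= 0, P >= 0, L >= 0 the result is always >= 0.
    This is an illustrative stand-in, NOT the MBE update itself.
    """
    return (c + dt * P) / (1.0 + dt * L)


# Hypothetical stiff case: fast loss (L*dt = 10), slow production.
c0, P, L, dt = 1.0, 1.0, 1000.0, 0.01

c_explicit = explicit_euler_step(c0, P, L, dt)
c_implicit = linearly_implicit_step(c0, P, L, dt)

print(f"explicit Euler:    {c_explicit:.4f}")  # goes negative
print(f"linearly implicit: {c_implicit:.4f}")  # stays between 0 and c0
```

With these values the explicit step overshoots below zero, whereas the implicit step moves toward the steady state P/L from above and stays positive.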
Introduction of MBE solver Modified-Backward-Euler Method (MBE):
Introduction of MBE solver CBM-Z: a given set of chemical ODEs with 67 species (m = 67); the right-hand-side functions are given.
Introduction of MBE solver Species of CBM-Z:
Introduction of MBE solver MBE --- A fast, robust and precise solver for chemical ODEs
Parallelization and GPU Implementation • Spatial discretization: nx × ny × nz grid • Total spatial points = nx · ny · nz • Each thread handles one spatial point
Parallelization and GPU Implementation • No iteration in MBE • Almost the same amount of calculation for each spatial point • Load balance • MBE --- A GPU-based solver
Parallelization and GPU Implementation GPU Implementation • 256 threads per block • Total number of blocks = (nx * ny * nz + 256 − 1) / 256 (integer division, i.e. rounded up to cover every spatial point)
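The block-count formula is a ceiling division, and the last block may contain threads with no spatial point to work on. The sketch below mirrors that arithmetic in plain Python rather than CUDA. The function names and the grid sizes are hypothetical, and the flat index `block_idx * threads + thread_idx` is the usual one-dimensional mapping, assumed here rather than taken from the slides.

```python
THREADS_PER_BLOCK = 256


def num_blocks(nx, ny, nz, threads=THREADS_PER_BLOCK):
    """Ceiling division: smallest block count covering every point."""
    total = nx * ny * nz
    return (total + threads - 1) // threads


def point_for_thread(block_idx, thread_idx, total, threads=THREADS_PER_BLOCK):
    """Flat grid-point index for a thread, or None for a padding thread."""
    i = block_idx * threads + thread_idx
    return i if i < total else None


# Hypothetical grid sizes, chosen only for illustration.
nx, ny, nz = 10, 10, 3                  # 300 spatial points
total = nx * ny * nz
blocks = num_blocks(nx, ny, nz)         # (300 + 255) // 256

print(blocks)
print(point_for_thread(1, 43, total))   # last real point in the grid
print(point_for_thread(1, 44, total))   # padding thread, must do nothing
```

In a real kernel the `None` case corresponds to an early return guarded by a bounds check, so the padding threads in the final block never touch memory outside the grid.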
CPU vs. GPU numerical results Validation check of the GPU implementation. [Figure: O₃ concentration (ppb) vs. time (hours), CPU and GPU curves.] The two lines almost coincide with each other.
CPU vs. GPU numerical results [Figures: CPU and GPU concentration curves for NO, NO₂, SO₂ and H₂O₂.]
CPU vs. GPU numerical results [Figures: CPU and GPU concentration curves for CO, O(¹D), O(³P) and H₂SO₄.]
Performance of the CPU vs. GPU
Device | Nodes Num. | Run Time (Sec) | Speedup (X)
CPU (1), Intel(R) Xeon E5-2690 @ 3.0 GHz | 473600 | 24295.7 | -
K40 | 473600 | 375.9 | 64.6
K80 (2) | 473600 | 376.5 | 64.5
1: In the test, only one core is used. We did not parallelize the CPU code.
2: K80 has two GPU chips, and only one chip is used in this test.
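The speedup column can be checked directly from the reported run times, as CPU run time divided by GPU run time. A minimal Python sketch follows; the variable names are illustrative, and the numbers are the ones reported in the table.

```python
cpu_time = 24295.7  # seconds, single CPU core
gpu_times = {"K40": 375.9, "K80 (one chip)": 376.5}  # seconds

# Speedup = CPU run time / GPU run time.
speedups = {name: cpu_time / t for name, t in gpu_times.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
```

Both ratios round to the values in the table. Because the baseline is a single, unparallelized CPU core, the figures compare one GPU chip against one core rather than against a full CPU socket.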
Conclusions • MBE is a GPU-based, fast, robust and precise solver for chemical ODEs • The GPU implementation of MBE achieves high accuracy and computational efficiency – The numerical results of the GPU code are nearly the same as those of the CPU code – On K40, a 64.6x speedup over the single-core CPU code – Nearly the same speedup (64.5x) is achieved with a single K80 chip – We expect to double the performance on K80 if both chips are used • Better performance is expected with further optimization