

  1. GPU Accelerated Solver for the 3D Groundwater Flow Equation, GTC 2015. Robert Zigon, Sr. Staff Research Engineer, Beckman Coulter

  2. Outline • Background • Legacy Fortran • The Algorithm and CUDA attempts • Results • Lessons Learned

  3. Background • Hydrogeology: the study of the distribution and movement of water in the Earth’s crust.

  4. Questions asked by Hydrogeologists • Can an aquifer support another subdivision in a residential area? • Will a dam dry up if irrigation doubles? • Will waste products from a coal mine negatively impact wetlands?

  5. A PDE to model the water flow (Freeze, 1971)
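A standard form of the variably saturated flow model described by Freeze (1971) is the 3D Richards equation, with pressure head ψ, hydraulic conductivity K(ψ), and specific moisture capacity C(ψ); this is a reconstruction, and the deck's exact form may differ:

\[ \frac{\partial}{\partial x}\!\left[K(\psi)\frac{\partial \psi}{\partial x}\right] + \frac{\partial}{\partial y}\!\left[K(\psi)\frac{\partial \psi}{\partial y}\right] + \frac{\partial}{\partial z}\!\left[K(\psi)\left(\frac{\partial \psi}{\partial z} + 1\right)\right] = C(\psi)\,\frac{\partial \psi}{\partial t} \]

The +1 in the z term accounts for gravity.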

  6. Discretizing the PDE
• First order for time:
\[ \frac{\partial \psi(x,t)}{\partial t} \;\approx\; \frac{\psi_i^{(j+1)} - \psi_i^{(j)}}{\Delta t} \]
• Second order for space:
\[ \frac{\partial}{\partial x}\!\left[K(\psi)\,\frac{\partial \psi(x,t)}{\partial x}\right] \;\approx\; \frac{K_{i+1/2}\left(\psi_{i+1}^{(j)} - \psi_i^{(j)}\right) - K_{i-1/2}\left(\psi_i^{(j)} - \psi_{i-1}^{(j)}\right)}{(\Delta x)^2} \]
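A minimal CUDA sketch of this discretization for interior nodes along one axis, assuming a simple explicit update, arithmetic-mean face conductivities, and hypothetical array names (psiOld, psiNew, K); the deck's solver iterates implicitly to convergence, so this is illustrative only:

    // Sketch (not the author's kernel): one explicit update of the
    // discretized flux term for interior nodes in 1D.
    __global__ void updateInterior(const double* psiOld, double* psiNew,
                                   const double* K, int n,
                                   double dt, double dx)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i <= 0 || i >= n - 1) return;          // interior nodes only

        // Face conductivities K_{i+1/2}, K_{i-1/2} (arithmetic mean assumed)
        double kR = 0.5 * (K[i] + K[i + 1]);
        double kL = 0.5 * (K[i - 1] + K[i]);

        // Second-order conservative flux divergence
        double flux = (kR * (psiOld[i + 1] - psiOld[i])
                     - kL * (psiOld[i] - psiOld[i - 1])) / (dx * dx);

        // First-order time step
        psiNew[i] = psiOld[i] + dt * flux;
    }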

  7. Legacy Fortran • About 15 pages of code (Intel compiler) • In use for over 10 years • 7 day simulation, 24 hr step, 1M elements → 2 hr run time • 30 day simulation, 24 hr step, 19M elements → 8 day run time

  8. Algorithm Overview
For each time step t:
  While pressure not converged at (t):
    1. Predict Psi(t)
    2. Compute K(Psi(t))
    3. Compute Psi(t)
  4. Update Psi(t-2), Psi(t-1)
  5. Generate discharge field Q(t)
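A host-side sketch of this loop, assuming hypothetical kernel names (predictPsi, computeK, solvePsi, updateHistory, computeDischarge) and a residual-based convergence test; none of these interfaces appear in the deck:

    #include <cuda_runtime.h>

    // Hypothetical kernels mirroring steps 1-5; signatures are assumptions.
    __global__ void predictPsi(double* psi, const double* psiM1, const double* psiM2, int n);
    __global__ void computeK(double* K, const double* psi, int n);
    __global__ void solvePsi(double* psi, const double* K, double* dResidual, int n);
    __global__ void updateHistory(double* psiM2, double* psiM1, const double* psi, int n);
    __global__ void computeDischarge(double* Q, const double* psi, const double* K, int n);

    void timeStepLoop(int nSteps, int n, double tol,
                      double* psi, double* psiM1, double* psiM2,
                      double* K, double* Q, double* dResidual)
    {
        int block = 256, grid = (n + block - 1) / block;
        for (int t = 0; t < nSteps; ++t) {
            double residual = 1e30;
            while (residual > tol) {                      // pressure convergence loop
                predictPsi<<<grid, block>>>(psi, psiM1, psiM2, n); // 1. predict
                computeK  <<<grid, block>>>(K, psi, n);            // 2. nonlinear K(psi)
                solvePsi  <<<grid, block>>>(psi, K, dResidual, n); // 3. pressure update;
                                                                   //    assumed to write the max residual
                cudaMemcpy(&residual, dResidual, sizeof(double), cudaMemcpyDeviceToHost);
            }
            updateHistory   <<<grid, block>>>(psiM2, psiM1, psi, n); // 4. shift history
            computeDischarge<<<grid, block>>>(Q, psi, K, n);         // 5. discharge field
        }
    }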

  9. First CUDA attempt
Kernel 1: Predict Psi(t), Compute K(Psi(t)), Compute Psi(t), Update Psi(t-2), Psi(t-1)
Kernel 2: Generate discharge field Q(t)
• Launch 250,000 threads for the 19M volume elements
• Advance the plane of threads across the volume (see the sketch below)
Result: not enough registers!
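A minimal sketch of the plane-of-threads idea from Micikevicius' 3D finite-difference paper, using a generic 7-point Laplacian and assumed array names rather than the author's fused kernel: a 2D grid of threads keeps the z-neighborhood in registers and marches through the volume one plane at a time.

    // Plane-marching stencil sketch (after Micikevicius); assumes nz >= 3.
    __global__ void planeMarch(const double* in, double* out,
                               int nx, int ny, int nz)
    {
        int ix = blockIdx.x * blockDim.x + threadIdx.x;
        int iy = blockIdx.y * blockDim.y + threadIdx.y;
        if (ix <= 0 || ix >= nx - 1 || iy <= 0 || iy >= ny - 1) return;

        int stride = nx * ny;                 // distance between z-planes
        int idx = iy * nx + ix;               // start at plane z = 0

        // Hold the z-neighborhood in registers while marching in z.
        double behind  = in[idx];
        double current = in[idx + stride];
        double ahead   = in[idx + 2 * stride];

        for (int iz = 1; iz < nz - 1; ++iz) {
            idx += stride;                    // advance to plane iz
            out[idx] = current * -6.0
                     + behind + ahead
                     + in[idx - 1] + in[idx + 1]
                     + in[idx - nx] + in[idx + nx];
            behind  = current;                // shift the register pipeline
            current = ahead;
            if (iz + 2 < nz) ahead = in[idx + 2 * stride];
        }
    }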

  10. Second CUDA attempt
Kernel 1: Predict Psi(t), Compute K(Psi(t))
Kernel 2: Compute Psi(t)
Kernel 3: Update Psi(t-2), Psi(t-1)
Kernel 4: Generate discharge field Q(t)
Result: K1 still does not have enough registers!

  11. Third CUDA attempt
Kernel 1: Predict Psi(t)
Kernel 2: Compute K(Psi(t))
Kernel 3: Compute Psi(t)
Kernel 4: Update Psi(t-2), Psi(t-1)
Kernel 5: Generate discharge field Q(t)
Results: K2’s nonlinear coefficients are expensive; K3 suffers warp divergence on boundary conditions (see the sketch below); numerous matrix reads from GMEM.
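One common remedy for boundary-condition divergence, shown here as a technique sketch under assumptions rather than the deck's fix, is to launch separate interior and boundary kernels so that interior warps follow a single path:

    // Names, the stencil, and the Dirichlet condition are illustrative.
    __global__ void interiorUpdate(const double* in, double* out,
                                   int nx, int ny, int nz)
    {
        // Launch over the (nx-2) x (ny-2) x (nz-2) interior only: the +1
        // offset means no thread tests "am I on a boundary?" mid-stencil.
        int ix = blockIdx.x * blockDim.x + threadIdx.x + 1;
        int iy = blockIdx.y * blockDim.y + threadIdx.y + 1;
        int iz = blockIdx.z * blockDim.z + threadIdx.z + 1;
        if (ix >= nx - 1 || iy >= ny - 1 || iz >= nz - 1) return;

        int s = nx * ny, idx = (iz * ny + iy) * nx + ix;
        out[idx] = (in[idx - 1] + in[idx + 1] + in[idx - nx] + in[idx + nx]
                  + in[idx - s] + in[idx + s]) / 6.0;  // uniform path for all warps
    }

    __global__ void boundaryUpdate(double* out, int nx, int ny, double psiBC)
    {
        // One thread per node of the z = 0 face; every thread takes the same path.
        int ix = blockIdx.x * blockDim.x + threadIdx.x;
        int iy = blockIdx.y * blockDim.y + threadIdx.y;
        if (ix >= nx || iy >= ny) return;
        out[iy * nx + ix] = psiBC;                     // Dirichlet face value
    }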

  12. Results – 7 Day simulation, 19M elements

Time step   1 CPU (mins)   4 CPU (mins)   K20c (mins)   1 CPU / K20c   4 CPU / K20c
24 hrs           120             72             10           12.6           7.6
12 hrs           251            165             21           12.0           7.9
6 hrs            532            352             41           13.0           8.6
4 hrs            826            510             63           13.1           8.1
2 hrs           1557            967            123           12.7           7.9

[Chart: run time in minutes (log scale) for 1 CPU, 4 CPU, and Tesla K20C across the five step sizes]

All arithmetic in double precision. CUDA 5.5, K20c, VS 2008, Win7/64.

  13. Lessons Learned
• Advance a “plane of threads” through the volume
• A matrix multi-splitting operator could reduce reads
• Simplify non-linear terms with splines (a lookup sketch follows below)
• Porting code → 10x
• Re-architecting code → 100x
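A minimal sketch of the spline idea, assuming K(ψ) is tabulated at uniform pressure-head samples and evaluated on the device by interpolation; the table size, ψ range, and linear (rather than cubic spline) interpolation are all assumptions:

    // Replace the expensive closed-form K(psi) with a tabulated curve.
    // A cubic spline would interpolate with precomputed coefficients instead.
    #define TABLE_N    1024
    #define PSI_MIN  (-10.0)
    #define PSI_MAX    (0.0)

    __constant__ double d_Ktable[TABLE_N];   // K sampled at uniform psi values

    __device__ double lookupK(double psi)
    {
        // Map psi into the table and interpolate the two nearest samples.
        double u = (psi - PSI_MIN) / (PSI_MAX - PSI_MIN) * (TABLE_N - 1);
        u = fmin(fmax(u, 0.0), (double)(TABLE_N - 1));  // clamp to table range
        int    i = (int)u;
        double f = u - i;
        int    j = min(i + 1, TABLE_N - 1);
        return d_Ktable[i] * (1.0 - f) + d_Ktable[j] * f;
    }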

  14. Collaborators
• Prof. Sally Letsinger, Indiana University
• Prof. Raymond Chin, Indiana University–Purdue University Indianapolis
References
• O’Leary, “Multi-splittings of Matrices and Parallel Solution of Linear Systems”
• Freeze, “Three-dimensional, transient, saturated-unsaturated flow in a groundwater basin”
• Micikevicius, “3D Finite Difference Computation on GPUs using CUDA”

  15. Questions? robert.zigon@beckman.com
