Massively Parallel Landscape-Evolution Modelling using General - PowerPoint PPT Presentation


  1. Massively Parallel Landscape-Evolution Modelling using General Purpose Graphical Processing Units S7331 Tuesday 9th May 2017 GPU Technology Conference A. Stephen McGough, Darrel Maddy, J. Wainwright, S. Liang, M. Rapoportas, A. Trueman, R. Grey, G. Kumar Vinod, and James Bell Durham University, UK Newcastle University, UK

  2. Outline • What is Landscape Evolution Modelling (LEM) • Parallelization of LEM • Preliminary Results

  3. Landscape Evolution Modeling • Landscapes change over time due to water/weathering • Physical and chemical weathering require water to break down material • Higher-energy flowing water both erodes and transports material until decreasing energy conditions result in deposition of material • These processes take a long time • Many glacial-interglacial cycles • Cycles are ~100ka for the last 800ka; prior to 800ka, cycles were ~40ka in length • We want to use retrodiction to work out how the landscape has changed

  4. Landscape Evolution Modeling • Use a simulation to model how the landscape changes • 3D landscape is discretized as a regular 2D grid (x, y) with cell values representing surface heights (z) derived from a digital elevation model (DEM) • Cells can be 10m x 10m or larger [Figure: example grid of DEM cell heights]

  5. Landscape Evolution Modeling (simplified) • Each iteration of the simulation: flow routing, flow accumulation, erosion/deposition (How much material will be removed? How much material will be deposited?) • Each step is ‘fairly’ fast… • But we want to do lots of them: 120K to 1M years • On landscapes of 6-56M cells • If we could simulate 1 year in 1 minute this would take 83 – 694 days! • assuming 1 year = 1 iteration • may need more • Current sequential version is much slower than this… [Figure: example grids for flow routing, flow accumulation, and erosion/deposition]
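The 83 to 694 day figure follows directly from the stated assumptions (one iteration per simulated year, one minute per iteration). A minimal sketch of that arithmetic; the function name and parameters are illustrative and not taken from the slides:

```python
MINUTES_PER_DAY = 24 * 60

def wall_clock_days(simulated_years, minutes_per_iteration=1.0, iterations_per_year=1):
    """Wall-clock days needed to simulate the given number of years."""
    total_minutes = simulated_years * iterations_per_year * minutes_per_iteration
    return total_minutes / MINUTES_PER_DAY

# The two run lengths quoted on the slide: 120K and 1M simulated years.
for years in (120_000, 1_000_000):
    print(years, "years ->", int(wall_clock_days(years)), "days")
```

Raising `iterations_per_year` above 1 (the slide notes that more may be needed) scales the estimate linearly.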

  6. Execution analysis of Sequential LEM • We started from an existing sequential LEM • 51x100 cells for just 120K years took 72 hours • Estimate for 25M cells: 64,000 years • This was non-optimal code • Optimization reduced execution time from 72 to 4.7 hours • Estimate of 64,000 years down to 300 years • But this is still not enough for our needs

  7. Execution analysis of Sequential LEM • Performance Analysis: • ~74% of time spent routing and accumulating • Need orders of magnitude speedup • So focus was on flow routing / accumulation

  8. Outline • What is Landscape Evolution Modelling (LEM) • Parallelization of LEM • Preliminary Results

  9. Parallel Flow Routing • Each cell can be done independently of all others • SFD (single flow direction) • 100% of flow goes in the direction of steepest descent (normally the lowest neighbour) • MFD (multiple flow direction) • Flow is proportioned between all lower neighbours • Proportional to the slope to each neighbour • MFD is ‘better’ but much more computationally demanding • Almost linear speed-up • Problems with code divergence • CUDA warps split when code contains a fork [Figure: single flow direction vs multiple flow direction on the example grid 3 2 4 / 7 5 8 / 7 1 9]
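The difference between SFD and MFD can be sketched for a single cell, using the 3x3 example grid from the slide. This is a sequential illustration only, not the PARALLEM kernel. The function names, the 8-connected neighbourhood, and the use of a sqrt(2) distance for diagonal neighbours are assumptions made for the example.

```python
import math

# 8-connected neighbour offsets as (row, col); an assumption for this sketch.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]

def downhill_slopes(dem, r, c, cell_size=1.0):
    """Return {(row, col): slope} for every strictly lower neighbour of (r, c)."""
    slopes = {}
    for dr, dc in NEIGHBOURS:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(dem) and 0 <= nc < len(dem[0]):
            drop = dem[r][c] - dem[nr][nc]
            if drop > 0:
                # Diagonal neighbours are sqrt(2) cell widths away.
                slopes[(nr, nc)] = drop / (cell_size * math.hypot(dr, dc))
    return slopes

def sfd(dem, r, c):
    """Single flow direction: steepest lower neighbour, or None on a sink/plateau."""
    slopes = downhill_slopes(dem, r, c)
    return max(slopes, key=slopes.get) if slopes else None

def mfd(dem, r, c):
    """Multiple flow direction: fraction of flow sent to each lower neighbour."""
    slopes = downhill_slopes(dem, r, c)
    total = sum(slopes.values())
    return {cell: s / total for cell, s in slopes.items()}

dem = [[3, 2, 4],
       [7, 5, 8],
       [7, 1, 9]]
print(sfd(dem, 1, 1))   # all flow from the centre cell goes to one neighbour
print(mfd(dem, 1, 1))   # flow from the centre cell is split by slope
```

Each call reads only the cell and its neighbours, which is why every cell can be handled by its own thread. The `if drop > 0` test is the kind of per-cell branch that can cause warp divergence on a GPU.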

  10. Parallel Accumulation: Correct Flow • Iterate: • Do not compute a cell until it has no incorrect cells flowing into it • Sum all inputs and add self • All cells can work independently of each other • Some restriction on updates not happening immediately • Cell values are not normally 1, but the initial rainfall on the cell [Figure: example grids showing flow routing, accumulation, and which cells are correct]
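The iteration described above can be sketched sequentially as follows. It assumes single-flow-direction routing given as a flat `receiver` list (a hypothetical representation chosen for the example, not the authors' data layout). Updates are deferred to the end of each pass to mimic threads that all read the previous pass's state.

```python
def accumulate_correct_flow(receiver, rainfall):
    """receiver[i] is the index that cell i drains to, or None for an outlet.
    Returns (accumulation per cell, number of passes needed)."""
    n = len(receiver)
    donors = [[] for _ in range(n)]
    for i, r in enumerate(receiver):
        if r is not None:
            donors[r].append(i)

    acc = [0] * n
    correct = [False] * n
    passes = 0
    while not all(correct):
        # Each cell is tested independently against the state of the previous pass.
        newly_correct = []
        for i in range(n):
            if not correct[i] and all(correct[d] for d in donors[i]):
                # Sum all inputs and add the cell's own rainfall.
                newly_correct.append((i, rainfall[i] + sum(acc[d] for d in donors[i])))
        if not newly_correct:
            raise ValueError("flow directions contain a cycle")
        # Deferred update: results only become visible in the next pass.
        for i, value in newly_correct:
            acc[i] = value
            correct[i] = True
        passes += 1
    return acc, passes

# Cells 0 and 1 drain to 2; cells 2 and 3 drain to 4; cell 4 drains to outlet 5.
receiver = [2, 2, 4, 4, 5, None]
acc, passes = accumulate_correct_flow(receiver, [1] * 6)
print(acc, passes)
```

With unit rainfall the outlet accumulates all six cells, and the number of passes equals the length of the longest flow path.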

  11. Not the whole story… • Sinks and Plateaus • Can’t work out flow routing on sinks and plateaus • Need to ‘fake’ a flow routing • Fill a sink until it can flow out • This turns it into a plateau • Fake flow directions on a plateau to the outlet

  12. Parallel Plateau routing • Need to find the outflow of a plateau and flow all water to it • A common solution is to use a breadth first search algorithm • Parallel implementation • Though the result does look ‘unnatural’, it is acceptable • Alternative patterns are possible • We are investigating alternative solutions
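A breadth-first search from the outlet can be sketched as below. This is a sequential illustration under assumed inputs (a set of plateau cells and a list of known outlet cells), and the function name is hypothetical. In a parallel version, all cells on the same BFS level could be processed together.

```python
from collections import deque

def route_plateau(plateau_cells, outlets):
    """Breadth-first search outward from the outlet cells.

    plateau_cells: set of (row, col) cells on the plateau, outlets included.
    outlets: plateau cells that already have somewhere lower to drain to.
    Returns {cell: next_cell}, pointing each cell one step closer to an outlet."""
    direction = {}
    visited = set(outlets)
    frontier = deque(outlets)
    while frontier:
        r, c = frontier.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                # (dr, dc) == (0, 0) is skipped because the cell is already visited.
                if nxt in plateau_cells and nxt not in visited:
                    visited.add(nxt)
                    direction[nxt] = (r, c)
                    frontier.append(nxt)
    return direction

# A flat 1x4 strip whose right-hand cell is the outlet.
plateau = {(0, 0), (0, 1), (0, 2), (0, 3)}
print(route_plateau(plateau, [(0, 3)]))
```

Because each cell simply points at whichever neighbour reached it first, the resulting pattern depends on neighbour visiting order, which is one reason the routing can look 'unnatural'.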

  13. Sink filling • Dealing with a single sink is (relatively) simple • Fill sink until we end up with a plateau (lake) • But what if we have multiple nested sinks?

  14. Nested Sink filling • Implemented parallel version of the sink filling algorithm proposed by Arge et al [2003] • Identify each sink (parallel) • Determine which cells flow into this sink - watershed (parallel) • Determine the lowest cell joining each pair of sinks (parallel/sequential) • Work out how high cells in each sink need to be raised to allow all cells to flow out of the DEM (sequential) • Fill all sink cells to this height (parallel)
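The first two steps (identifying sinks and labelling each cell with the sink it drains to) can be sketched as follows. This is a simplified sequential illustration, not the published algorithm: it follows the lowest neighbour from every cell, treats edge cells like any other cell, and uses hypothetical function names.

```python
def lowest_lower_neighbour(dem, r, c):
    """Return the lowest strictly lower 8-connected neighbour, or None for a sink."""
    best = None
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr or dc) and 0 <= nr < len(dem) and 0 <= nc < len(dem[0]):
                if dem[nr][nc] < dem[r][c]:
                    if best is None or dem[nr][nc] < dem[best[0]][best[1]]:
                        best = (nr, nc)
    return best

def watersheds(dem):
    """Label every cell with the sink reached by repeatedly stepping downhill."""
    labels = {}
    for r in range(len(dem)):
        for c in range(len(dem[0])):
            cell = (r, c)
            nxt = lowest_lower_neighbour(dem, *cell)
            while nxt is not None:          # heights strictly decrease, so this ends
                cell = nxt
                nxt = lowest_lower_neighbour(dem, *cell)
            labels[(r, c)] = cell
    return labels

# A single-row DEM with two sinks, at heights 3 and 1.
dem = [[5, 3, 4, 1, 2]]
labels = watersheds(dem)
print(sorted(set(labels.values())))   # the sinks
print(labels)                         # which sink each cell drains to
```

Every cell's walk is independent of the others, which is what makes these two steps candidates for parallel execution. The later steps, which join sinks and compute fill heights, are not covered here.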

  15. GPGPU Solution: PARALLEM • Massively parallel version of the LEM • Flow direction (including plateaus and sinks) and accumulation have now been parallelized • Runs on NVIDIA Fermi/Kepler based graphics cards • Tesla C2050, GTX580, K20, K40, K80 • ~two orders of magnitude speedup over the optimized sequential code (up to 56M cells)

  16. Outline • What is Landscape Evolution Modelling (LEM) • Parallelization of LEM • Preliminary Results

  17. Results: Performance • Overall performance [Figure: log-log plot of time (s) against DEM size (millions); series: CybErosion-slim, Tesla single iteration, 580 single iteration, Tesla average 10, 580 average 10]

  18. Results: Performance • Flow Direction • Including sink & plateau solution [Figure: log-log plot of time (s) against DEM size (millions); series: Sequential Flow Direction, Tesla Flow Direction, 580 Flow Direction, Terraflow Flow Direction]

  19. Results: Performance • Flow Accumulation [Figure: log-log plots of time (s) against DEM size (millions); series: Sequential Flow Accumulation, Tesla Flow Accumulation, 580 Flow Accumulation, Terraflow Flow Accumulation, Tesla K20 Flow Accumulation, Tesla K20 - removed conditionals]

  20. The Current Simulation • Core Model now extended with processes • Most only affect individual cells (weathering, vegetation) • Some have cross DEM effects (mass movement) but can use same process as before

  21. The Current Simulation • Actively running landscape models on K40/K80 GPGPUs • Taking ~7 weeks to run our model (MFD) • Leading to interesting results • Not seen as models have traditionally been much smaller • Currently running on just 1 GPGPU • Running multiple models simultaneously • Now have a multi-GPGPU code for running flow accumulation • Designed to ‘sweep’ over the landscape [Figure: Upper Thames Valley + 120K]

  22. Problem: Landscape Cutting with SFD [Figure: series of plots showing landscape cutting with SFD; axis values not recoverable]

  23. [Figure: panels labelled SFD and MFD]

  24. Comparing ‘cut in’ between SFD and MFD [Figure: plot with series 1k, 20k SFD, 20k MFD; axis ranges 70 to 84 and 0 to 3000]

  25. Problem: Algorithm Slow-down • Correct flow algorithm requires all input cells to be correct before progressing • Becomes a problem for rivers [Figure: correct flow completion profile; percentage complete (0 to 100) against iteration (1 to 10000, log scale)]
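Why rivers are the problem case can be illustrated by counting passes. Under the "correct flow" rule a cell can only be completed in the pass after its last upstream cell, so a long single-thread channel needs one pass per cell. The sketch below uses a hypothetical flat `receiver` list (index each cell drains to, `None` for an outlet) and is an illustration, not the authors' code.

```python
def passes_needed(receiver):
    """Count passes of the 'correct flow' rule until every cell is correct."""
    n = len(receiver)
    pending_donors = [0] * n
    for r in receiver:
        if r is not None:
            pending_donors[r] += 1

    correct = [False] * n
    passes = 0
    while not all(correct):
        # Cells whose upstream cells were all correct at the start of this pass.
        ready = [i for i in range(n) if not correct[i] and pending_donors[i] == 0]
        if not ready:
            raise ValueError("flow directions contain a cycle")
        for i in ready:
            correct[i] = True
            if receiver[i] is not None:
                pending_donors[receiver[i]] -= 1
        passes += 1
    return passes

river = list(range(1, 1000)) + [None]   # 1000 cells draining in a single chain
hillslope = [999] * 999 + [None]        # 999 cells draining straight to one outlet
print(passes_needed(river), passes_needed(hillslope))
```

Both layouts have the same number of cells, but the chain completes only one cell per pass, so most threads sit idle for most of the run.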
