Blasting Sand with CUDA: MPM Sand Simulation for VFX, Gergely Klár


  1. Blasting Sand with CUDA: MPM Sand Simulation for VFX Gergely Klár DreamWorks Animation

  2. [Figure: time step from t^n to t^(n+1)]

  3. [Figure: time step from t^n to t^(n+1)]

  4. [Figure: time step from t^n to t^(n+1)]

  5. Grid influence

  6. Naïve Particles-to-Grid

  7. Gather Particles-to-Grid

  8. Our Solution
     • Each particle is read only once.
     • We efficiently use shared memory for the grids.
     • We significantly reduce the number of atomic operations.
     • And our secret sauce: a special data structure for particle queries.

  9. [Figure: 3 × 3 arrangement of tiles, each labeled "1 CUDA Block"]

  10. CellBins → ParticleIDs → Actual particle data

  11. TileBins → CellBins → ParticleIDs → Actual particle data

  12. • In each block/tile:
         – Get blockIdx
         – Cells in the tile are TileBins[blockIdx-1] .. TileBins[blockIdx]-1
         – Get a cellId for each warp from this list
      • Each thread works on two affected grid nodes
      • Particles of a cell are CellBins[cellId-1] .. CellBins[cellId]-1
      • Compute the contribution from the particle
      • Store in shared
         – Write back to global
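The two-level lookup on this slide can be sketched on the CPU as below. This is a sequential illustration only, not the presenter's kernel: the helper names `binBegin` and `particlesOfTile` are hypothetical, the bins are assumed to be inclusive running sums (so the first bin starts at 0), and the weight computation, shared-memory accumulation, and write-back are left out. In the CUDA version the `tile` argument would correspond to `blockIdx`, and each cell of the outer loop would be handed to one warp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Start of bin n in an inclusive running sum: bins[n - 1], or 0 for the first bin.
inline std::uint32_t binBegin(const std::vector<std::uint32_t>& bins, std::uint32_t n) {
    return n == 0 ? 0u : bins[n - 1];
}

// Collect the particle IDs that one tile (one CUDA block in the talk) would process.
std::vector<std::uint32_t> particlesOfTile(std::uint32_t tile,
                                           const std::vector<std::uint32_t>& tileBins,
                                           const std::vector<std::uint32_t>& cellBins,
                                           const std::vector<std::uint32_t>& particleIds) {
    assert(tile < tileBins.size());
    std::vector<std::uint32_t> out;
    // Cells of the tile: TileBins[tile-1] .. TileBins[tile]-1
    for (std::uint32_t cell = binBegin(tileBins, tile); cell < tileBins[tile]; ++cell) {
        // Particles of the cell: CellBins[cell-1] .. CellBins[cell]-1
        for (std::uint32_t p = binBegin(cellBins, cell); p < cellBins[cell]; ++p) {
            out.push_back(particleIds[p]);  // index into the actual particle data
        }
    }
    return out;
}
```

Calling `particlesOfTile(t, tileBins, cellBins, particleIds)` for every `t` in `0 .. tileBins.size()-1` visits each particle exactly once, which matches the "each particle is read only once" goal stated earlier.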

  13. Tile & Cell Keys
      ● Particle coordinates: (px, py, pz)
      ● Cell coordinates: (ci, cj, ck) = ⌊(px, py, pz) / Δx⌋, with cell size Δx
      ● Tile and in-tile coordinates: (ci, cj, ck) = (ti, tj, tk) ∙ TILE_SIZE + (ri, rj, rk)
      ● Key layout in a 32-bit unsigned integer: ti (7 bits), tj (7 bits), tk (7 bits), ri (3 bits), rj (3 bits), rk (3 bits)
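A minimal sketch of how such a key could be packed, under stated assumptions: the fields are taken to be laid out from the most significant end in the order ti, tj, tk, ri, rj, rk (30 of the 32 bits used), TILE_SIZE is taken to be 8 because each in-tile coordinate has 3 bits, and particle positions are assumed non-negative. The names `CellCoord`, `cellOf`, `packKey`, and `tileBits` are illustrative and do not come from the talk.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: 3 in-tile bits per axis, so a tile spans 8 x 8 x 8 cells.
constexpr std::uint32_t TILE_BITS = 3;
constexpr std::uint32_t TILE_SIZE = 1u << TILE_BITS;  // 8
constexpr std::uint32_t TILE_MASK = TILE_SIZE - 1;    // 0b111
constexpr std::uint32_t TILE_COORD_LIMIT = 1u << 7;   // 7 bits per tile coordinate

struct CellCoord { std::uint32_t i, j, k; };

// Cell coordinates: floor(p / dx) per axis. Assumes non-negative positions.
inline CellCoord cellOf(float px, float py, float pz, float dx) {
    return { static_cast<std::uint32_t>(std::floor(px / dx)),
             static_cast<std::uint32_t>(std::floor(py / dx)),
             static_cast<std::uint32_t>(std::floor(pz / dx)) };
}

// Pack (ti, tj, tk, ri, rj, rk) into one key; tile bits occupy the high positions.
inline std::uint32_t packKey(CellCoord c) {
    const std::uint32_t ti = c.i >> TILE_BITS, ri = c.i & TILE_MASK;
    const std::uint32_t tj = c.j >> TILE_BITS, rj = c.j & TILE_MASK;
    const std::uint32_t tk = c.k >> TILE_BITS, rk = c.k & TILE_MASK;
    assert(ti < TILE_COORD_LIMIT && tj < TILE_COORD_LIMIT && tk < TILE_COORD_LIMIT);
    return (ti << 23) | (tj << 16) | (tk << 9) | (ri << 6) | (rj << 3) | rk;
}

// The tile part of a key: drop the 9 in-tile bits.
inline std::uint32_t tileBits(std::uint32_t key) { return key >> 9; }
```

With the tile coordinates in the high bits, all cells of one tile share the same `tileBits` value, so sorting the keys as plain unsigned integers groups them by tile first and by cell within the tile.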

  14. Tile & Cell Keys
      ● When sorted as uint32s, keys of the same tile will be consecutive
      ● RLE encoding counts the number of particles per cell
      ● The running sum of the counts gives the offsets to particles
      ● RLE encoding with a mask for the tile bits counts the number of non-empty cells per tile
      ● The running sum of these counts gives the offsets to cells
      [Diagram: Initial Particle IDs → sort → Particle IDs → RLE → inc. sum → Cell Bins → masked RLE → inc. sum → Tile Bins]
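The sort, RLE, and running-sum pipeline above can be sketched sequentially as follows. This is a CPU illustration using the standard library, not the presenter's GPU code; the `Bins` struct, `runLengths`, and `buildBins` are hypothetical names, and the tile mask assumes the low 9 bits of a key hold the in-tile coordinates. On the GPU each stage would presumably be a parallel primitive (sort, run-length encode, inclusive scan).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumption: the low 9 key bits are in-tile coordinates; the rest identify the tile.
constexpr std::uint32_t TILE_KEY_MASK = ~std::uint32_t{0x1FF};

struct Bins {
    std::vector<std::uint32_t> particleIds;  // particle indices, sorted by key
    std::vector<std::uint32_t> cellKeys;     // one key per non-empty cell
    std::vector<std::uint32_t> cellBins;     // inclusive sum of particles per cell
    std::vector<std::uint32_t> tileBins;     // inclusive sum of non-empty cells per tile
};

// Run-length encode sorted keys under a mask; optionally record the first key of each run.
static std::vector<std::uint32_t> runLengths(const std::vector<std::uint32_t>& keys,
                                             std::uint32_t mask,
                                             std::vector<std::uint32_t>* firstKeys = nullptr) {
    std::vector<std::uint32_t> counts;
    for (std::size_t n = 0; n < keys.size(); ++n) {
        if (n == 0 || (keys[n] & mask) != (keys[n - 1] & mask)) {
            counts.push_back(0);
            if (firstKeys) firstKeys->push_back(keys[n]);
        }
        ++counts.back();
    }
    return counts;
}

Bins buildBins(const std::vector<std::uint32_t>& keys) {
    assert(keys.size() <= 0xFFFFFFFFu);  // particle indices must fit in 32 bits
    Bins b;
    b.particleIds.resize(keys.size());
    std::iota(b.particleIds.begin(), b.particleIds.end(), 0u);
    // Sort particle IDs by key (stable, so equal keys keep their input order).
    std::stable_sort(b.particleIds.begin(), b.particleIds.end(),
                     [&](std::uint32_t a, std::uint32_t c) { return keys[a] < keys[c]; });
    std::vector<std::uint32_t> sorted(keys.size());
    for (std::size_t n = 0; n < keys.size(); ++n) sorted[n] = keys[b.particleIds[n]];
    // RLE counts particles per cell; the running sum turns counts into offsets.
    b.cellBins = runLengths(sorted, 0xFFFFFFFFu, &b.cellKeys);
    std::partial_sum(b.cellBins.begin(), b.cellBins.end(), b.cellBins.begin());
    // Masked RLE counts non-empty cells per tile; the running sum gives cell offsets.
    b.tileBins = runLengths(b.cellKeys, TILE_KEY_MASK);
    std::partial_sum(b.tileBins.begin(), b.tileBins.end(), b.tileBins.begin());
    return b;
}
```

Because the sums are inclusive, bin `n` covers the index range from the previous entry (or 0 for the first bin) up to, but not including, entry `n`.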

  15. Results

  16. Overall
      [Chart: milliseconds per time step (smaller is better), GPU vs. CPU, at 262K, 884K, 2,097K, and 7,000K particles; vertical axis from 0 to 1000]
      GPU: NVIDIA Quadro K5200. CPU: Intel Xeon CPU E5-2697 v3 @ 2.60GHz w/ 28 cores.

  17. Particles to Grids / Grids to Particles
      [Two charts: milliseconds per time step (smaller is better) at 262K, 884K, 2,097K, and 7,000K particles; vertical axes from 0 to 600]

  18. Summary
      • Particle binning with sort-RLE-scan
      • Breaking the domain into tiles that fit in shared memory
      • Processing the particles of a cell with a single warp

  19. Special thanks to:
      • Ken Museth
      • Rob Tesdahl
      • Stephen Jones
      • David Tonnesen
      • Jeff Budsberg
      • Ibrahim Sani
      • Lawrence Lee

  20. Thank you!
