
High Performance In-Situ Visualization on Thousands of GPUs



  1. High Performance In-Situ Visualization on Thousands of GPUs. Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart, Peter Messmer. Leiden Observatory


  3. Ex-situ visualization. Compute machine: simulation, I/O layer, disk I/O software → Storage → Ex-situ visualization machine: disk I/O software, I/O layer, analysis & visualization software

  4. In-situ visualization. Compute & in-situ visualization machine: simulation, I/O layer, analysis & visualization and simulation steering sw, disk I/O software → Storage

  5. “Hoax object” Discovered at SC14!

  6. Gravitational tree code :: Bonsai. Showcased at GTC12 & SC14. Gordon Bell Prize Finalist (2014). Features: • Scales up to 25 Pflops on Titan supercomputer • Async parallel I/O • In-situ (parallel) visualization. http://github.com/treecode/Bonsai


  8. In-situ visualization pipeline on the compute & in-situ visualization machine (Bonsai, I/O layer, analysis & visualization, simulation steering sw): 1. Simulation step (80 ms) 2. Data partitioning (50 ms) 3. OpenGL rendering (60 ms) 4. Parallel compositing (50 ms). [Timeline figure, 10 ms ticks: the four stages run back to back and a frame is displayed every 240 ms]

  9. 2. Data partitioning

  10-11. [Figure: domains labeled 1 to 9]

  12. Space Filling Curve (SFC) Domain decomposition in Bonsai
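
A minimal sketch of domain decomposition along a space-filling curve, to make the idea on this slide concrete. The slides do not say which curve is used or how the cuts are chosen, so this sketch swaps in a simple 2D Morton (Z-order) key and equal-count cuts; the function names `morton_key_2d` and `sfc_decompose` are hypothetical and are not taken from the Bonsai source.

```python
def morton_key_2d(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of two integer grid coordinates (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def sfc_decompose(points, n_domains, bits=16):
    """Sort particles along the curve and cut the sorted list into
    n_domains contiguous pieces of (nearly) equal particle count.

    points: list of (x, y) with coordinates in [0, 1).
    Returns a list of n_domains lists of particle indices.
    """
    scale = 1 << bits
    ordered = sorted(
        range(len(points)),
        key=lambda i: morton_key_2d(int(points[i][0] * scale),
                                    int(points[i][1] * scale), bits),
    )
    base, extra = divmod(len(ordered), n_domains)
    domains, start = [], 0
    for d in range(n_domains):
        size = base + (1 if d < extra else 0)  # spread the remainder
        domains.append(ordered[start:start + size])
        start += size
    return domains
```

Each domain is a contiguous stretch of the curve, which balances particle counts but can give domains an irregular shape in space.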

  13-16. Rendering along depth, built up over four slides: ray casting, sampling data, shading, compositing (samples labeled 1 to 5 along the ray)
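
The compositing step along a single ray can be sketched as follows. The slides only name the stages, so the front-to-back "over" blend used here is an assumption (it is the standard choice for this kind of accumulation), and `composite_ray` is a hypothetical helper, not part of Bonsai.

```python
def composite_ray(samples):
    """Front-to-back 'over' compositing of shaded samples on one ray.

    samples: list of (color, alpha) ordered from nearest to farthest,
             both in [0, 1], color not premultiplied by alpha.
    Returns (accumulated_color, accumulated_alpha).
    """
    color_acc, alpha_acc = 0.0, 0.0
    for color, alpha in samples:
        weight = (1.0 - alpha_acc) * alpha  # share of light not yet blocked
        color_acc += weight * color
        alpha_acc += weight
        if alpha_acc >= 0.999:              # early ray termination
            break
    return color_acc, alpha_acc
```

The result depends on the order of the samples, which is why the depth order of the contributions matters when partial results from several GPUs are combined later.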

  17-23. [Figure sequence: domains labeled 1 to 9 with markers P, L and Q; the domain numbering changes across the sequence]

  24. Recursive multi-section domain decomposition
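
A rough sketch of a recursive multi-section decomposition in 2D, under the assumption that it means cutting the particle set into equal-count slabs along one axis and then cutting each slab along the next axis. The slides do not give the algorithm; `multisection` and its parameters are hypothetical names for illustration.

```python
def multisection(points, nx, ny):
    """Split 2D points into nx * ny domains of nearly equal count.

    First cut the x-sorted points into nx slabs, then cut each slab,
    sorted by y, into ny cells. Returns a list of lists of point indices.
    """
    def split(indices, parts):
        base, extra = divmod(len(indices), parts)
        out, start = [], 0
        for p in range(parts):
            size = base + (1 if p < extra else 0)
            out.append(indices[start:start + size])
            start += size
        return out

    by_x = sorted(range(len(points)), key=lambda i: points[i][0])
    domains = []
    for slab in split(by_x, nx):
        by_y = sorted(slab, key=lambda i: points[i][1])
        domains.extend(split(by_y, ny))
    return domains
```

Every resulting domain is bounded by axis-aligned cuts, so the domains are rectangular regions rather than stretches of a curve.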

  25. Every new in-situ data update: recursive multi-section, SFC. Both a CPU- and interconnect-heavy operation


  27-36. [Figures: one slide each for GPU-0 to GPU-8 (slides 27-35), then GPU-0 to GPU-8 together (slide 36)]

  37. Final image

  38. 4. Parallel compositing [Figure: domains labeled 1 to 9 with markers P and Q]

  39-42. [Figure build-up: proc 0 to proc 7, then with G1, G3, G6 and G7 added]

  43. [Figure: proc 0 to proc 7; pixel grid marked with the GPU images that cover each pixel (1, 3, 6, 7 and the overlaps 1,3 / 1,6 / 3,6 / 1,3,6)] A bit of math & data exchange is done with a single operation: MPI_Alltoallv(..)

  44. [Figure: proc 0 to proc 7; pixel grid marked with the GPU images that cover each pixel] P2: blends pixels from G1 & G3. P3: blends pixels from G1, G3 & G6. P4: blends pixels from G3 & G6

  45. [Figure: proc 0 to proc 7; pixel grid with blended regions 1+3, 1+6, 3+6 and 1+3+6] Glue scan-lines together with a single operation: MPI_Gather(..)
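
The exchange, blend and gather steps on slides 43 to 45 can be imitated in plain Python without MPI. This is only a sketch under stated assumptions: pixels are premultiplied (color, alpha) pairs, every process owns one band of scan-lines, and each rank sends every band to its owner, whereas the slides indicate that only the ranks whose images overlap a band take part. The list slicing stands in for `MPI_Alltoallv` and the final concatenation for `MPI_Gather`; all function names are hypothetical.

```python
def over(front, back):
    """Blend two premultiplied (color, alpha) pixels, front over back."""
    fc, fa = front
    bc, ba = back
    return (fc + (1.0 - fa) * bc, fa + (1.0 - fa) * ba)


def band_bounds(height, n_ranks, rank):
    """Rows [lo, hi) of the image owned by `rank`."""
    return rank * height // n_ranks, (rank + 1) * height // n_ranks


def composite_direct_send(partial_images, depth_order):
    """partial_images[r] is rank r's full-size image (list of rows of pixels).
    depth_order lists ranks from nearest to farthest from the camera.
    Returns the final image as it would be assembled on the root rank.
    """
    n_ranks = len(partial_images)
    height = len(partial_images[0])

    # Exchange (the role MPI_Alltoallv plays on the slide):
    # inbox[dst][src] holds the rows of src's image that dst owns.
    inbox = []
    for dst in range(n_ranks):
        lo, hi = band_bounds(height, n_ranks, dst)
        inbox.append([img[lo:hi] for img in partial_images])

    # Local blend: every rank composites only its own band.
    bands = []
    for dst in range(n_ranks):
        rows = None
        for src in reversed(depth_order):  # start with the farthest layer
            layer = inbox[dst][src]
            if rows is None:
                rows = [list(row) for row in layer]
            else:
                rows = [[over(f, b) for f, b in zip(frow, brow)]
                        for frow, brow in zip(layer, rows)]
        bands.append(rows)

    # Gather (the role MPI_Gather plays): the root glues the bands together.
    return [row for band in bands for row in band]
```

Because each rank blends only its own band, the blending work is spread over all processes instead of landing on a single one.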

  46-50. [Timeline figures, 10 ms ticks. Slide 46, sequential timeline: 1 Simulation step (80 ms), 2 Data partition (50 ms), 3 OpenGL rendering (60 ms), 4 Compositing (50 ms), Display (240 ms). Slides 47-50 build up the overlapped timeline: simulation steps (80 ms), data partitions (50 ms), OpenGL rendering (60 ms) and compositing (50 ms) of successive frames run concurrently, and Display repeats every 60 ms]

  51. Sequential timeline (Display every 240 ms): 4 fps. Overlapped timeline (Display every 60 ms): 16 fps
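
The two frame rates follow from the stage times on the timeline. The small calculation below is a sketch that assumes the frame rate is simply the inverse of the display interval shown on the slide; the names `STAGES_MS`, `sequential_fps` and `fps_from_interval` are introduced here for illustration only.

```python
# Stage durations in milliseconds, as labeled on the timeline.
STAGES_MS = {
    "simulation step": 80,
    "data partition": 50,
    "OpenGL rendering": 60,
    "compositing": 50,
}


def sequential_fps(stages_ms):
    """Frame rate when one frame needs a full pass through all stages."""
    return 1000.0 / sum(stages_ms.values())


def fps_from_interval(display_interval_ms):
    """Frame rate when a new frame is shown every display_interval_ms."""
    return 1000.0 / display_interval_ms
```

With these numbers, `sequential_fps(STAGES_MS)` is about 4.2 and `fps_from_interval(60)` is about 16.7, in line with the 4 fps and 16 fps labels.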

  52. 4 fps (sequential timeline) versus 15 fps (overlapped timeline). • 16 bit colors • delegated MPI_Alltoallv with MPI rank placement • dedicated remote displaying machine to gather final image • image compression. http://github.com/treecode/Bonsai

  53. • In-situ visualization as I/O workflow (e.g. ADIOS) • Take advantage of existing software (e.g. ParaView) • Interoperability with job schedulers (e.g. Slurm) • More use cases (astro, chem, bio, automotive, aerospace)
