Image Compositing on GPU-Accelerated Supercomputers (GTC 2016 presentation by Pascal Grosset and Charles Hansen)

  1. Image Compositing on GPU-Accelerated Supercomputers Pascal Grosset & Charles (Chuck) Hansen Tuesday 5 April 2016 GTC 2016

  2. Outline - Direct Volume Rendering - Distributed Volume Rendering - Rendering Pipeline - Setup - Rendering - Compositing - Test Setup - Results & Discussion - Conclusion & Future Work GTC 2016

  3. Direct Volume Rendering (figure: a block of scalar values rendered directly into an image) GTC 2016

  4. Distributed Volume Rendering Sort-last Parallel Rendering 1. Partition the data among the nodes (loading) 2. Form an image from the data (rendering) 3. Assemble the images (compositing) (figure: block of scalar values) GTC 2016

  5.-8. Distributed Volume Rendering (build slides) Sort-last Parallel Rendering: the same three steps, with loading, rendering, and compositing highlighted in turn GTC 2016

  9. Distributed Volume Rendering on GPU Rendering: OpenGL: Most common way to render Compositing: Transfer to CPU and composite there? GTC 2016

  10. Inter-node GPU Communication GTC 2016

  11. Inter-node GPU Communication Without GPUDirect RDMA: GPU memory → CUDA driver buffer → network driver buffer → network → network driver buffer → CUDA driver buffer → GPU memory, i.e. 5 operations! GTC 2016

  12. Inter-node GPU Communication Without GPUDirect RDMA: 5 operations. With GPUDirect RDMA: 1 operation. GTC 2016
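
To make the difference concrete, the following is a minimal sketch (not the presentation's code) of sending a rendered tile to another rank with and without a CUDA-aware MPI / GPUDirect RDMA path; function names, ranks, and sizes are illustrative.

    // Sketch: shipping a GPU-resident image tile to another rank.
    #include <mpi.h>
    #include <cuda_runtime.h>

    // Without GPUDirect RDMA: stage the tile through host memory first,
    // which is what forces the extra copy operations.
    void send_tile_staged(const float* d_tile, int count, int dest, float* h_staging)
    {
        cudaMemcpy(h_staging, d_tile, count * sizeof(float), cudaMemcpyDeviceToHost);
        MPI_Send(h_staging, count, MPI_FLOAT, dest, /*tag=*/0, MPI_COMM_WORLD);
    }

    // With a CUDA-aware MPI built on GPUDirect RDMA: hand the device pointer
    // to MPI directly; the NIC reads GPU memory without host staging.
    void send_tile_direct(const float* d_tile, int count, int dest)
    {
        MPI_Send(const_cast<float*>(d_tile), count, MPI_FLOAT, dest, /*tag=*/0, MPI_COMM_WORLD);
    }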

  13. Distributed Volume Rendering on GPU Rendering: OpenGL: Most common way to render Compositing: Transfer to CPU and composite there? Use the GPU: CUDA GTC 2016

  14. Distributed Volume Rendering on GPU Rendering: OpenGL Shaders Compositing: CUDA: Computation + Communication Using OpenGL would imply 5 copies when compositing! GTC 2016

  15. Distributed Volume Rendering on GPU Rendering: OpenGL Shaders Compositing: CUDA (computation + communication) CUDA OpenGL Interop links OpenGL with CUDA; CUDA and OpenGL can run together on Tesla-class GPUs GTC 2016
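
As an illustration of the interop step, here is a minimal sketch (assuming the rendered pixels live in an OpenGL buffer object of float4 RGBA values; names are illustrative, not the presentation's code) that registers the buffer with CUDA and maps it to obtain a device pointer for the compositing kernels.

    // Register the OpenGL buffer once, then map it each frame to obtain a
    // CUDA device pointer usable by kernels and CUDA-aware MPI.
    #include <GL/gl.h>
    #include <cuda_runtime.h>
    #include <cuda_gl_interop.h>

    cudaGraphicsResource* g_imageResource = nullptr;

    void register_gl_buffer(GLuint imageBuffer)
    {
        // One-time registration of the OpenGL buffer object with CUDA.
        cudaGraphicsGLRegisterBuffer(&g_imageResource, imageBuffer,
                                     cudaGraphicsRegisterFlagsNone);
    }

    float4* map_image_for_cuda(size_t* numBytes)
    {
        // Per frame: map the resource and fetch a device pointer to the pixels.
        float4* d_pixels = nullptr;
        cudaGraphicsMapResources(1, &g_imageResource, 0);
        cudaGraphicsResourceGetMappedPointer((void**)&d_pixels, numBytes,
                                             g_imageResource);
        return d_pixels;   // call cudaGraphicsUnmapResources when CUDA is done
    }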

  16. Pipeline: Setup → Volume Rendering → CUDA OpenGL Interop → Compositing GTC 2016

  17. Pipeline Volume Rendering: OpenGL with shaders, offscreen rendering. Caveat: GPUDirect RDMA does NOT work with texture memory! GTC 2016

  18. Pipeline Volume Rendering: OpenGL with shaders, offscreen rendering to a GL_TEXTURE_BUFFER (a buffer object rather than texture memory, so it can be shared with CUDA and GPUDirect RDMA) GTC 2016

  19. Pipeline Compositing: CUDA kernels + GPUDirect RDMA. Constraint: computation >> communication, so use an algorithm that minimizes communication GTC 2016
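
For the computation side, a minimal CUDA "over" blending kernel is sketched below. It assumes premultiplied-alpha float4 RGBA pixels and a front-to-back compositing order; it illustrates the kind of kernel used, not the presentation's exact code.

    // Composite a "front" tile over a "back" tile in place (premultiplied alpha).
    #include <cuda_runtime.h>

    __global__ void blend_over(float4* back, const float4* front, int numPixels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numPixels) return;

        float4 f = front[i];
        float4 b = back[i];
        float t = 1.0f - f.w;   // transparency remaining after the front pixel
        back[i] = make_float4(f.x + t * b.x,
                              f.y + t * b.y,
                              f.z + t * b.z,
                              f.w + t * b.w);
    }

    // Launch example:
    // blend_over<<<(numPixels + 255) / 256, 256>>>(d_back, d_front, numPixels);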

  20. TOD-Tree Task-Overlapped Direct send Tree (TOD-Tree): 1. Direct Send 2. K-ary Tree compositing 3. Gather Aim: - Minimize communication - Overlap communication with computation GTC 2016

  21. TOD-Tree: Direct Send (stage 1) Each node: - Determines the nodes in its locality of size r - Creates and advertises a receiving buffer - Does a parallel direct send GTC 2016
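
A rough sketch of this stage follows, assuming a CUDA-aware MPI so device pointers can be posted directly; the locality size r, the region layout, and the overlap of blending with outstanding receives are all simplified, and the names are illustrative.

    // Stage 1 sketch: each rank exchanges sub-image regions with the other
    // ranks in its locality of size r, posting receives before sends so that
    // transfers can overlap with later blending work.
    #include <mpi.h>
    #include <vector>
    #include <cuda_runtime.h>   // float4

    void direct_send_stage(int rank, int r, int regionPixels,
                           float4* d_myRegions,    // r regions of my rendered image
                           float4* d_recvRegions)  // incoming copies of my assigned region
    {
        int localityStart = (rank / r) * r;        // first rank of my locality
        std::vector<MPI_Request> reqs;

        for (int k = 0; k < r; ++k) {
            int peer = localityStart + k;
            if (peer == rank) continue;
            MPI_Request recvReq, sendReq;
            // Receive the peer's rendering of the region assigned to me.
            MPI_Irecv(d_recvRegions + k * regionPixels, regionPixels * 4, MPI_FLOAT,
                      peer, 0, MPI_COMM_WORLD, &recvReq);
            // Send the peer my rendering of the region assigned to it.
            MPI_Isend(d_myRegions + k * regionPixels, regionPixels * 4, MPI_FLOAT,
                      peer, 0, MPI_COMM_WORLD, &sendReq);
            reqs.push_back(recvReq);
            reqs.push_back(sendReq);
        }
        // The real algorithm starts blending as soon as individual receives
        // complete (e.g. MPI_Waitany); for brevity we wait for everything.
        MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);
    }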

  22. TOD-Tree: K-ary Tree (stage 2) Each node: ● Determines if it is sending or receiving Sending node: ● Sends its image to the receiving node Receiving node: ● Creates and advertises a buffer ● Blends the images GTC 2016
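
A simplified sketch of the tree stage follows. After stage 1, each rank owns one region blended within its locality; ranks owning the same region in different localities are then composited down a k-ary tree. In the sketch, rank stands for a rank's index within one such set (0..numNodes-1), depth-based blend ordering is omitted, and blend_over is the kernel sketched earlier; this is illustrative, not the presentation's code.

    // Stage 2 sketch: in each round, groups of k ranks composite onto the
    // lowest-indexed rank of the group; senders drop out after sending.
    #include <mpi.h>
    #include <cuda_runtime.h>

    __global__ void blend_over(float4* back, const float4* front, int numPixels); // earlier sketch

    void kary_tree_stage(int rank, int numNodes, int k, int regionPixels,
                         float4* d_region, float4* d_incoming)
    {
        int group  = rank;   // this rank's slot index in the current round
        int stride = 1;      // distance in ranks between adjacent slots
        while (numNodes > 1) {
            if (group % k == 0) {
                // Receiver: blend up to k-1 children, one at a time.
                for (int c = 1; c < k && group + c < numNodes; ++c) {
                    int src = rank + c * stride;
                    MPI_Recv(d_incoming, regionPixels * 4, MPI_FLOAT, src, 1,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    blend_over<<<(regionPixels + 255) / 256, 256>>>(d_region, d_incoming,
                                                                    regionPixels);
                    cudaDeviceSynchronize();
                }
            } else {
                // Sender: ship the region to the group's receiver and stop.
                int dst = rank - (group % k) * stride;
                MPI_Send(d_region, regionPixels * 4, MPI_FLOAT, dst, 1, MPI_COMM_WORLD);
                return;
            }
            group    /= k;
            numNodes  = (numNodes + k - 1) / k;
            stride   *= k;
        }
    }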

  23. TOD-Tree: Gather (stage 3) ● Display node: receives the image regions from the other nodes ● Other nodes: nodes that still hold images send their data to the display node GTC 2016
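
A minimal sketch of the gather follows, again assuming a CUDA-aware MPI and that the display rank knows which ranks still hold regions and where each region lands in the final frame; all names are illustrative.

    // Stage 3 sketch: remaining region holders send to the display rank,
    // which drops each region into its slot of the full frame.
    #include <mpi.h>
    #include <vector>
    #include <cuda_runtime.h>

    void gather_stage(int rank, int displayRank, bool holdsRegion, int regionPixels,
                      const std::vector<int>& senderRanks,   // ranks holding final regions
                      const std::vector<int>& frameOffsets,  // pixel offset per sender
                      float4* d_region,                      // my final region, if any
                      float4* d_frame)                       // full frame (display rank only)
    {
        if (rank == displayRank) {
            // If the display rank holds a region itself, it is copied locally (omitted).
            std::vector<MPI_Request> reqs(senderRanks.size());
            for (size_t i = 0; i < senderRanks.size(); ++i) {
                MPI_Irecv(d_frame + frameOffsets[i], regionPixels * 4, MPI_FLOAT,
                          senderRanks[i], 2, MPI_COMM_WORLD, &reqs[i]);
            }
            MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);
        } else if (holdsRegion) {
            MPI_Send(d_region, regionPixels * 4, MPI_FLOAT, displayRank, 2, MPI_COMM_WORLD);
        }
    }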

  24. TOD-Tree vs Radix-k and Binary Swap Comparison on CPU against IceT's Binary Swap and Radix-k (chart) GTC 2016

  25. Pipeline Setup: Activate X Server; Create OpenGL context using GLX (Driver 358 requires no X Server for an OpenGL context) Volume Rendering: OpenGL Buffer Object; Write offscreen using shaders; OpenGL CUDA Interop Compositing: CUDA Kernel - Blending; GPU Direct RDMA - Communication; TOD-Tree - Logic GTC 2016
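
For the setup column, here is a minimal sketch of creating an offscreen OpenGL context through GLX, the path named on the slide (error handling omitted; the newer X-free path mentioned for driver 358 is not shown). All names are illustrative.

    // Create a small GLX pbuffer context; actual rendering then goes to an
    // offscreen framebuffer / buffer object as in the earlier pipeline slides.
    #include <X11/Xlib.h>
    #include <GL/glx.h>

    GLXContext create_offscreen_context(Display** outDisplay, GLXPbuffer* outPbuffer)
    {
        Display* dpy = XOpenDisplay(nullptr);   // requires an X server, e.g. DISPLAY=:0

        static const int fbAttribs[] = {
            GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
            GLX_RENDER_TYPE,   GLX_RGBA_BIT,
            None
        };
        int numConfigs = 0;
        GLXFBConfig* configs = glXChooseFBConfig(dpy, DefaultScreen(dpy),
                                                 fbAttribs, &numConfigs);

        static const int pbAttribs[] = { GLX_PBUFFER_WIDTH, 1,
                                         GLX_PBUFFER_HEIGHT, 1, None };
        GLXPbuffer pbuffer = glXCreatePbuffer(dpy, configs[0], pbAttribs);

        GLXContext ctx = glXCreateNewContext(dpy, configs[0], GLX_RGBA_TYPE,
                                             nullptr, True);
        glXMakeContextCurrent(dpy, pbuffer, pbuffer, ctx);

        *outDisplay = dpy;
        *outPbuffer = pbuffer;
        return ctx;
    }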

  26. Setup for testing Test Data: Cube dataset - one cube per node Test Platform: Piz Daint at the Swiss National Supercomputing Center (CSCS), a Cray XC30 with 5,272 Tesla K20X GPUs, 7th in the Top 500 supercomputers Algorithm: TOD-Tree GTC 2016

  27. Results: TOD-Tree on Edison vs Piz Daint (charts: compositing time in ms) GTC 2016

  28. Results: TOD-Tree Edison vs Piz Daint GTC 2016

  29. Conclusion Image compositing on GPUs is now feasible! Rendering: OpenGL shaders, offscreen to a GL_TEXTURE_BUFFER, CUDA OpenGL Interop Compositing: Blending: CUDA kernels Communication: GPU Direct RDMA Logic: TOD-Tree Scales very well as the image size increases GTC 2016

  30. Future Work - Test in-situ rendering - Scale to a larger number of nodes - Vulkan instead of OpenGL for volume rendering GTC 2016

  31. More details ... Paper: - A. V. Pascal Grosset, Manasa Prasad, Cameron Christensen, Aaron Knoll, Charles Hansen, "TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism and GPUs", IEEE Transactions on Visualization & Computer Graphics, no. 1, pp. 1, PrePrints, doi:10.1109/TVCG.2016.2542069 GTC 2016

  32. Thank you! Any Questions? Special thanks to Tom Fogal, Peter Messmer and Jean Favre. My email: pgrosset@sci.utah.edu GTC 2016
