Image Compositing on GPU-Accelerated Supercomputers
Pascal Grosset & Charles (Chuck) Hansen
Tuesday 5 April 2016, GTC 2016
Outline
- Direct Volume Rendering
- Distributed Volume Rendering
- Rendering Pipeline
  - Setup
  - Rendering
  - Compositing
- Test Setup
- Results & Discussion
- Conclusion & Future Work
Direct Volume Rendering
An image is produced directly from a block of scalar values (the volume), with no intermediate geometry.
Distributed Volume Rendering
Sort-last parallel rendering:
1. Partition the data among the nodes (loading)
2. Form an image from each node's data (rendering)
3. Assemble the partial images into the final image (compositing)
Distributed Volume Rendering on GPU
Rendering: OpenGL, the most common way to render
Compositing: transfer the images to the CPU and composite there?
Inter-node GPU Communication
Without GPUDirect RDMA, data travels GPU memory → CUDA driver buffer → network driver buffer → network → network driver buffer → CUDA driver buffer → GPU memory: 5 operations!
With GPUDirect RDMA, the NIC reads and writes GPU memory directly: 1 operation.
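In practice GPUDirect RDMA is reached through a CUDA-aware MPI, which accepts device pointers directly. A minimal sketch, assuming a CUDA-aware MPI build (the buffer size and ranks are illustrative):

```cuda
// Sketch: with a CUDA-aware MPI, a device pointer is passed straight to
// MPI; GPUDirect RDMA then moves the data NIC-to-GPU with no host staging.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1024 * 1024;
    float* d_buf;                          // lives in GPU memory only
    cudaMalloc(&d_buf, n * sizeof(float));

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```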
Distributed Volume Rendering on GPU
Rendering: OpenGL shaders
Compositing: use the GPU, with CUDA for both computation and communication; routing the images back through OpenGL would imply 5 copies when compositing!
CUDA-OpenGL interop links OpenGL with CUDA, and CUDA and OpenGL can run together on Tesla-class GPUs.
Pipeline
Setup → Volume Rendering → Compositing, with CUDA-OpenGL interop between rendering and compositing.
Volume rendering: OpenGL with shaders, rendered offscreen.
Problem: GPUDirect RDMA does NOT work with texture memory!
Solution: render offscreen to GL_TEXTURE_BUFFER, which is backed by buffer memory rather than texture memory.
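A minimal sketch of the interop handoff under that scheme: the buffer object backing the GL_TEXTURE_BUFFER is registered with CUDA once, then mapped each frame to expose a plain device pointer. GLEW for GL loading and the names buf, tex, and imageBytes are assumptions for illustration:

```cuda
#include <GL/glew.h>
#include <cuda_gl_interop.h>

// Create a buffer object that backs a GL_TEXTURE_BUFFER, register it with
// CUDA once, then map it each frame to get a device pointer that the
// compositing kernels (and GPUDirect RDMA) can use.
GLuint buf, tex;
cudaGraphicsResource* cudaRes = nullptr;

void setupInterop(size_t imageBytes) {
    glGenBuffers(1, &buf);
    glBindBuffer(GL_TEXTURE_BUFFER, buf);
    glBufferData(GL_TEXTURE_BUFFER, imageBytes, nullptr, GL_DYNAMIC_COPY);

    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_BUFFER, tex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buf); // shader output lands in buf

    cudaGraphicsGLRegisterBuffer(&cudaRes, buf, cudaGraphicsRegisterFlagsNone);
}

float4* mapForCompositing() {
    cudaGraphicsMapResources(1, &cudaRes, 0);
    void* dPtr = nullptr;
    size_t bytes = 0;
    cudaGraphicsResourceGetMappedPointer(&dPtr, &bytes, cudaRes);
    return static_cast<float4*>(dPtr);    // pass to CUDA kernels / MPI
}

void unmapAfterCompositing() {
    cudaGraphicsUnmapResources(1, &cudaRes, 0); // hand buffer back to OpenGL
}
```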
Pipeline
Compositing: CUDA kernels for blending, with GPUDirect RDMA for communication.
Constraint: computation >> communication, so we want an algorithm that minimizes communication.
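The blending itself is the front-to-back "over" operator. A minimal sketch of such a kernel, assuming premultiplied-alpha RGBA float images (the names are illustrative, not the talk's code):

```cuda
// Front-to-back "over" blend of two premultiplied-alpha RGBA images.
// front is updated in place: front = front + (1 - front.a) * back.
__global__ void blendOver(float4* front, const float4* back, int numPixels) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;
    float4 f = front[i];
    float4 b = back[i];
    float t = 1.0f - f.w;   // remaining transparency of the front image
    f.x += t * b.x;
    f.y += t * b.y;
    f.z += t * b.z;
    f.w += t * b.w;
    front[i] = f;
}

// launch: blendOver<<<(numPixels + 255) / 256, 256>>>(d_front, d_back, numPixels);
```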
TOD-Tree
Task-Overlapped Direct Send Tree (TOD-Tree):
1. Direct send
2. K-ary tree compositing
3. Gather
Aim: minimize communication, and overlap communication with computation.
TOD-Tree: Direct Send (stage 1)
Each node:
- Determines the nodes in its locality of size r
- Creates and advertises a receiving buffer
- Does a parallel direct send (sketched below)
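A minimal sketch of one such exchange with non-blocking MPI, assuming a CUDA-aware MPI so the device pointers from the interop mapping go straight on the wire. The region layout, tag, and names (d_myImage, peers, regionPixels) are illustrative:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Each rank advertises receive buffers first, then sends its rendered
// regions to their owners; blending overlaps with the remaining traffic.
void directSend(float4* d_myImage, float4* d_recvBufs, int regionPixels,
                const int* peers, int numPeers, MPI_Comm comm) {
    MPI_Request* reqs = new MPI_Request[2 * numPeers];

    // 1. Post receives up front so incoming sends never block.
    for (int p = 0; p < numPeers; ++p)
        MPI_Irecv(d_recvBufs + p * regionPixels, 4 * regionPixels, MPI_FLOAT,
                  peers[p], 0, comm, &reqs[p]);

    // 2. Direct send: ship each region of my image to the rank that owns it.
    for (int p = 0; p < numPeers; ++p)
        MPI_Isend(d_myImage + p * regionPixels, 4 * regionPixels, MPI_FLOAT,
                  peers[p], 0, comm, &reqs[numPeers + p]);

    // 3. Overlap: blend whichever region arrives first while others are in flight.
    for (int done = 0; done < numPeers; ++done) {
        int p;
        MPI_Waitany(numPeers, reqs, &p, MPI_STATUS_IGNORE);
        // blendOver<<<...>>>(d_myRegion, d_recvBufs + p * regionPixels, ...);
    }
    MPI_Waitall(numPeers, reqs + numPeers, MPI_STATUSES_IGNORE);
    delete[] reqs;
}
```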
TOD-Tree: K-ary Tree (stage 2)
Each node determines whether it is sending or receiving.
Sending node: sends its image to the receiving node.
Receiving node: creates and advertises a buffer, then blends the incoming images.
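One plausible way to assign those roles round by round. This scheme (receivers at multiples of k^(round+1), senders spaced k^round above them) is a hypothetical reconstruction for illustration, not the paper's exact indexing:

```cuda
#include <vector>

struct Role {
    bool receiver;             // does this rank blend in this round?
    int  target;               // if sender: rank to send to
    std::vector<int> sources;  // if receiver: ranks to receive from
};

// Hypothetical role assignment for round `round` of a k-ary compositing
// tree: receivers sit at multiples of k^(round+1); each receives from up
// to k-1 senders spaced step = k^round apart. Other ranks are idle.
Role kAryRole(int rank, int numRanks, int k, int round) {
    long step = 1;
    for (int i = 0; i < round; ++i) step *= k;  // k^round
    long span = step * k;                       // k^(round+1)

    Role role{};
    if (rank % span == 0) {                     // receiver this round
        role.receiver = true;
        for (int j = 1; j < k; ++j) {
            long src = rank + j * step;
            if (src < numRanks) role.sources.push_back((int)src);
        }
    } else if (rank % step == 0) {              // sender this round
        role.receiver = false;
        role.target = (int)(rank - rank % span); // owning receiver
    }
    return role;
}
```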
TOD-Tree: Gather (stage 3)
Display node: receives the remaining images from the other nodes.
Other nodes: nodes that still have images send their data to the display node.
TOD-Tree vs Radix-k and Binary Swap
CPU comparison against IceT's Binary Swap and Radix-k.
[Chart: compositing time for Binary Swap, Radix-k, and TOD-Tree]
Pipeline
Setup:
- Activate X Server; create an OpenGL context using GLX
- Driver 358 requires no X Server for an OpenGL context
Volume Rendering:
- Set up an OpenGL buffer object
- Write offscreen using shaders
- OpenGL-CUDA interop
Compositing:
- CUDA kernel: blending
- GPUDirect RDMA: communication
- TOD-Tree: logic
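For the X-free route the newer driver enables, the context can come from EGL instead of GLX. This is a hedged sketch of that alternative, not the talk's GLX path, and it assumes the driver exposes EGL surfaceless contexts:

```cuda
// Hypothetical headless OpenGL context via EGL (no X server), as enabled
// by NVIDIA driver 358+. Error handling abbreviated for brevity.
#include <EGL/egl.h>

bool createHeadlessContext() {
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, nullptr, nullptr))
        return false;

    eglBindAPI(EGL_OPENGL_API);                  // desktop OpenGL, not GLES

    const EGLint attribs[] = {
        EGL_SURFACE_TYPE,    EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
        EGL_NONE
    };
    EGLConfig cfg;
    EGLint n = 0;
    if (!eglChooseConfig(dpy, attribs, &cfg, 1, &n) || n == 0)
        return false;

    EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, nullptr);
    // Surfaceless binding: all rendering goes to an FBO, never a window.
    return ctx != EGL_NO_CONTEXT &&
           eglMakeCurrent(dpy, EGL_NO_SURFACE, EGL_NO_SURFACE, ctx);
}
```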
Setup for Testing
Test data: cube dataset, one cube per node
Test platform: Piz Daint at the Swiss National Supercomputing Centre (CSCS)
- Cray XC30 with 5,272 Tesla K20X GPUs
- 7th in the Top500 list of supercomputers
Algorithm: TOD-Tree
Results: TOD-Tree, Edison vs Piz Daint
[Charts: compositing time in ms on Edison and Piz Daint]
Conclusion
Image compositing on GPUs is now feasible!
Rendering: OpenGL shaders, offscreen to GL_TEXTURE_BUFFER, with CUDA-OpenGL interop
Compositing:
- Blending: CUDA kernels
- Communication: GPUDirect RDMA
- Logic: TOD-Tree
Scales very well as the image size increases.
Future Work
- Test in-situ rendering
- Scale to a larger number of nodes
- Vulkan in place of OpenGL for volume rendering
More details
Paper:
- A. V. Pascal Grosset, Manasa Prasad, Cameron Christensen, Aaron Knoll, Charles Hansen, "TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism and GPUs", IEEE Transactions on Visualization & Computer Graphics, PrePrints, doi:10.1109/TVCG.2016.2542069
Thank you! Any questions?
Special thanks to Tom Fogal, Peter Messmer and Jean Favre.
Email: pgrosset@sci.utah.edu