
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Chao Peng, Peng Mi and Yong Cao
Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
Ultrascale Visualization Workshop 2012

Motivation
• How can a large 3D model that contains many objects and triangles be rendered efficiently?
• The Boeing 777 model:
– Triangles: 332 million
– Vertices: 223 million
– Objects: 719 thousand
• Rendering difficulties:
– The objects have dramatically different shapes and are topologically disconnected.
– The data size is far beyond the GPU rendering capabilities.
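A rough sense of the data size can be had from the counts on this slide. The sketch below is a back-of-the-envelope estimate, not a figure from the presentation: it assumes each vertex stores only a position as three 32-bit floats and each triangle stores three 32-bit vertex indices (normals, colors and LOD data would add more), and it compares the result with the 3 GB of memory per GPU listed on the Implementation slide. The constant and function names are illustrative.

```python
# Counts taken from the Motivation slide (Boeing 777 model).
TRIANGLES = 332_000_000
VERTICES = 223_000_000

# Assumptions (not stated in the slides): positions only, 32-bit values.
BYTES_PER_VERTEX = 3 * 4    # x, y, z as 32-bit floats
BYTES_PER_TRIANGLE = 3 * 4  # three 32-bit vertex indices

# 3 GB per GPU, as listed on the Implementation slide.
GPU_MEMORY_BYTES = 3 * 1024**3


def raw_mesh_bytes(vertices, triangles):
    """Return the size in bytes of the bare vertex and index arrays."""
    return vertices * BYTES_PER_VERTEX + triangles * BYTES_PER_TRIANGLE


total = raw_mesh_bytes(VERTICES, TRIANGLES)
print(f"{total / 1024**3:.2f} GiB of raw geometry")
print(f"{total / GPU_MEMORY_BYTES:.2f}x the memory of one 3 GB GPU")
```

Even under these minimal assumptions the bare geometry is more than twice the memory of a single GPU, which is why the data has to be streamed rather than loaded whole.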

The Previous Approach
• Our GPU-based approach in Eurographics'12.
– Parallel Continuous LOD: triangle-level mesh simplification.
– GPU Out-of-Core: CPU-GPU data streaming.

A Multi-GPU and Multi-Display System
Diagram labels: the input triangle data set; CPU Core (×2); GPU Device (×2).

The approach on a single GPU
Pipeline stages: LOD Selection; GPU Out-of-Core; Triangle Reformation; Rendering (VBO).

Diagram labels: objects O1 to O7; LOD Selection; Coherence Evaluation; Existing Data; CPU; GPU Out-of-Core; Defragmentation; GPU; Triangle Reformation; Rendering (VBO).

Performance Bottleneck
• GPU Out-of-Core (Coherence Evaluation, Defragmentation): 45%
• Triangle Reformation: 20%
• OpenGL VBO Rendering: 28%

Contributions
Diagram labels: LOD Selection (×2); Load Balancing; GPU Out-of-Core (×2); Triangle Reformation (×2); Rendering (VBO) (×2); Inter-GPU Communication; Final Display (×2).

Load Balancing
Diagram labels: partitions 1 to 5; Viewpoint.
LOD Selection Result: n1, n2, n3, n4, n5

Load Balancing
LOD Selection Result: n1, n2, n3, n4, n5
GPU 1: n1, n2, 0, 0, 0
GPU 2: 0, 0, n3, n4, n5
(n1 + n2) / (n3 + n4 + n5) ∉ [1-t, 1+t]


Load Balancing
LOD Selection Result: n1, n2, n3, n4, n5
GPU 1: n1, n2, n3, n4, 0
GPU 2: 0, 0, 0, 0, n5
(n1 + n2 + n3 + n4) / n5 ∉ [1-t, 1+t]

Load Balancing
LOD Selection Result: n1, n2, n3, n4, n5
GPU 1: n1, n2, n3, 0, 0
GPU 2: 0, 0, 0, n4, n5
(n1 + n2 + n3) / (n4 + n5) ∈ [1-t, 1+t]
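The split search shown on the Load Balancing slides can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the partitions are ordered, the first k partitions go to GPU 1 and the rest to GPU 2, and a split is accepted once the ratio of the two triangle sums lies in [1-t, 1+t]. The function name `find_split`, the left-to-right scan order, and the fallback to the closest split when no split satisfies the tolerance are all hypothetical choices made for illustration.

```python
from itertools import accumulate


def find_split(counts, t):
    """Return k so that partitions [0, k) go to GPU 1 and [k, len) to GPU 2.

    counts: selected triangle count per partition (n1, n2, ...).
    t: tolerance; a split is accepted when sum(GPU 1) / sum(GPU 2)
       lies in [1 - t, 1 + t].
    """
    if len(counts) < 2:
        raise ValueError("need at least two partitions")

    total = sum(counts)
    prefix = list(accumulate(counts))  # prefix[i] = counts[0] + ... + counts[i]

    best_k, best_gap = 1, float("inf")
    for k in range(1, len(counts)):
        left = prefix[k - 1]
        right = total - left
        if right == 0:
            continue  # ratio undefined; GPU 2 would receive nothing
        ratio = left / right
        if 1 - t <= ratio <= 1 + t:
            return k  # balanced within tolerance
        gap = abs(ratio - 1)
        if gap < best_gap:
            best_k, best_gap = k, gap

    # Assumption: if no split is within tolerance, use the closest one.
    return best_k


# Example with made-up counts: the split after the third partition balances.
counts = [3, 2, 4, 5, 4]
k = find_split(counts, t=0.1)
print("GPU 1:", counts[:k], "GPU 2:", counts[k:])
```

Using prefix sums keeps each candidate split at constant cost, so the scan is linear in the number of partitions.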

Inter-GPU Communication
• Diagram labels: rendered image on GPU 1; rendered image on GPU 2; displayed image on GPU 1; displayed image on GPU 2.
• CUDA Inter-Process Communication (CUDA 4.1 IPC) for transferring the image buffer.

Implementation
• Two GPUs:
– NVIDIA GTX 580.
– 512 cores, 3 GB GDDR5.
– 192.4 GB/s memory bandwidth.
• CPU main memory:
– 16 GB RAM.
• Rendering performance:
– An average of 20 fps on a Linux system with MPI and CUDA 4.2.


Performance Evaluation
• Comparison:
– Dual-GPU with load balancing (our approach).
– Dual-GPU without load balancing.
– Single GPU.


Performance Evaluation
NB = without load balancing; B = with load balancing.

| Approach | FPS | Diff. Triangle Num. | Visible Triangle Num. | Load Balancing | GPU Out-of-Core | Triangle Reformation | GL Rendering |
| Single-GPU | 14.94 | --- | 12.29 M | --- | 29.62 ms | 3.62 ms | 30.24 ms |
| Dual-GPU (NB) | 17.84 | 7.94 M | 12.29 M | --- | 24.54 ms | 2.85 ms | 25.31 ms |
| Dual-GPU (B) | 20.40 | 0.37 M | 12.29 M | 5.38 ms | 18.56 ms | 1.97 ms | 19.13 ms |
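The relative gains implied by the reported frame rates can be worked out directly. The short sketch below only divides the FPS values from the evaluation results; the dictionary and the `speedup` helper are illustrative names, not part of the presented system.

```python
# Frame rates (frames per second) copied from the evaluation results.
fps = {
    "Single-GPU": 14.94,
    "Dual-GPU (NB)": 17.84,  # without load balancing
    "Dual-GPU (B)": 20.40,   # with load balancing
}


def speedup(approach, baseline="Single-GPU"):
    """Return the frame-rate ratio of `approach` over `baseline`."""
    return fps[approach] / fps[baseline]


for name in fps:
    print(f"{name}: {speedup(name):.2f}x the single-GPU frame rate")

print(
    "Load balancing gain on two GPUs: "
    f"{speedup('Dual-GPU (B)', 'Dual-GPU (NB)'):.2f}x"
)
```

By these ratios, the balanced dual-GPU configuration runs at roughly 1.37 times the single-GPU frame rate, and about 1.14 times the unbalanced dual-GPU frame rate.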


Conclusion
• A rendering system with two GPUs:
– A workload balancer based on a view-frustum partitioning method.
• Inter-GPU communication for image re-arrangement.
• Future work:
– Scalability beyond two GPUs.

Acknowledgment

Thank you.
