Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Chao Peng, Peng Mi and Yong Cao
Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
Ultrascale Visualization Workshop 2012
Motivation
• How can a large 3D model that contains many objects and triangles be rendered efficiently?
• The Boeing 777 model:
– Triangles: 332 million
– Vertices: 223 million
– Objects: 719 thousand
• Rendering difficulties:
– The objects have dramatically different shapes and are topologically disconnected.
– The data size is far beyond the GPU's rendering capabilities.
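To make the data-size difficulty concrete, the raw footprint of the mesh can be estimated from the counts above. This is only a rough sketch: the storage layout (three float32 coordinates per vertex, three uint32 indices per triangle) is an assumption, not something the slides state, and the 3 GiB budget approximates the 3 GB card listed under Implementation.

```python
# Rough size estimate for the Boeing 777 mesh.
# ASSUMPTION (not from the slides): float32 xyz positions and uint32 indices,
# with no normals, colors, or LOD metadata.
TRIANGLES = 332_000_000
VERTICES = 223_000_000
BYTES_PER_VERTEX = 3 * 4      # x, y, z as float32
BYTES_PER_TRIANGLE = 3 * 4    # three uint32 vertex indices
GPU_MEMORY_BYTES = 3 * 1024**3  # 3 GiB, approximating one 3 GB GPU


def raw_mesh_bytes(triangles, vertices):
    """Bytes needed to hold positions plus the index list."""
    return triangles * BYTES_PER_TRIANGLE + vertices * BYTES_PER_VERTEX


total = raw_mesh_bytes(TRIANGLES, VERTICES)
print(f"raw mesh: {total / 1024**3:.2f} GiB")
print(f"fits in GPU memory: {total <= GPU_MEMORY_BYTES}")
```

Even under this minimal layout the mesh is roughly twice the memory of one GPU, which is why the data has to be streamed rather than uploaded once.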
The Previous Approach
• Our GPU-based approach in Eurographics '12.
– Parallel Continuous LOD: triangle-level mesh simplification.
– GPU Out-of-Core: CPU-GPU data streaming.
A Multi-GPU and Multi-Display System
[Diagram labels: the input triangle data set; CPU Core (x2); GPU Device (x2).]
The approach on a single GPU
[Pipeline diagram, in order: LOD Selection; GPU Out-of-Core; Triangle Reformation; Rendering (VBO).]
[Diagram labels: objects O1 to O7; LOD Selection; Coherence Evaluation; Existing Data; CPU; GPU Out-of-Core; Defragmentation; GPU; Triangle Reformation; Rendering (VBO).]
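The diagram names a coherence evaluation step against the existing data but does not spell out how it works. One common way to exploit frame-to-frame coherence, shown here purely as an illustration and not as the authors' method, is to compare the objects already resident on the GPU with the objects selected for the new frame and stream only the difference. The function name `plan_streaming` and the object IDs are hypothetical.

```python
def plan_streaming(resident, needed):
    """Split object IDs into what to upload, what to evict, and what to reuse.

    resident: IDs of objects whose data is already in GPU memory.
    needed:   IDs of objects selected for the current frame.
    ASSUMPTION: coherence is tracked per object; the slides do not say so.
    """
    resident, needed = set(resident), set(needed)
    upload = sorted(needed - resident)  # must be streamed from CPU memory
    evict = sorted(resident - needed)   # space that can be freed and compacted
    reuse = sorted(needed & resident)   # already on the GPU, no transfer
    return upload, evict, reuse


# Hypothetical frame: objects 1-4 are resident, the new view needs 3-7.
upload, evict, reuse = plan_streaming({1, 2, 3, 4}, {3, 4, 5, 6, 7})
print("upload:", upload)
print("evict: ", evict)
print("reuse: ", reuse)
```

Freeing the evicted entries leaves holes in the GPU buffer, which is one plausible reason a defragmentation step appears in the diagram.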
Performance Bottleneck
[Diagram labels: objects O1 to O7; Coherence Evaluation (CPU); Defragmentation (GPU).]
Time share per stage:
– GPU Out-Of-Core: 45%
– Triangle Reformation: 20%
– OpenGL VBO Rendering: 28%
Contributions
[Diagram: two per-GPU pipelines, each with LOD Selection, GPU Out-of-Core, Triangle Reformation, Rendering (VBO) and Final Display. Added components: Load Balancing and Inter-GPU Communication.]
Load Balancing
[Figure: five numbered partitions (1 to 5) seen from the viewpoint.]
LOD Selection Result: n1, n2, n3, n4, n5
Load Balancing (split after partition 2)
LOD Selection Result: n1, n2, n3, n4, n5
GPU 1: n1, n2, 0, 0, 0
GPU 2: 0, 0, n3, n4, n5
(n1 + n2) / (n3 + n4 + n5) ∉ [1-t, 1+t]
Load Balancing (split after partition 4)
LOD Selection Result: n1, n2, n3, n4, n5
GPU 1: n1, n2, n3, n4, 0
GPU 2: 0, 0, 0, 0, n5
(n1 + n2 + n3 + n4) / n5 ∉ [1-t, 1+t]
Load Balancing (split after partition 3)
LOD Selection Result: n1, n2, n3, n4, n5
GPU 1: n1, n2, n3, 0, 0
GPU 2: 0, 0, 0, n4, n5
(n1 + n2 + n3) / (n4 + n5) ∈ [1-t, 1+t]
Inter-GPU Communication
[Figure: rendered image on GPU 1 and on GPU 2; displayed image on GPU 1 and on GPU 2.]
• CUDA Inter-Process Communication (CUDA 4.1 IPC) is used to transfer the image buffer.
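Because a balanced split need not coincide with the boundary between the two displays, part of one GPU's rendered image has to move to the other before display. The sketch below models only the bookkeeping on the host with Python lists; the actual transfer in the talk uses CUDA IPC, which is not shown. It assumes two side-by-side displays that each show half of the combined image, which the slide does not state, and the helper names are hypothetical.

```python
def columns_to_transfer(width, split):
    """Return (source, destination, columns) for the strip that must move.

    width: total number of pixel columns across both displays.
    split: GPU 1 rendered columns [0, split), GPU 2 rendered [split, width).
    ASSUMPTION: display 1 shows the left half, display 2 the right half.
    """
    half = width // 2
    if split > half:
        return "GPU 1", "GPU 2", list(range(half, split))
    if split < half:
        return "GPU 2", "GPU 1", list(range(split, half))
    return None  # the render split already matches the display split


def rearrange(rendered_1, rendered_2):
    """Recombine two rendered strips (lists of columns) into two halves."""
    full = rendered_1 + rendered_2
    half = len(full) // 2
    return full[:half], full[half:]


# Hypothetical 8-column image: GPU 1 rendered 5 columns, GPU 2 rendered 3.
rendered_1 = ["a0", "a1", "a2", "a3", "a4"]
rendered_2 = ["b5", "b6", "b7"]
move = columns_to_transfer(8, len(rendered_1))
display_1, display_2 = rearrange(rendered_1, rendered_2)
print("transfer:", move)
print("display 1:", display_1)
print("display 2:", display_2)
```

In this example only column 4 crosses between GPUs, so the transferred buffer is small compared with the full frame.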
Implementation
• Two GPUs:
– NVIDIA GTX 580.
– 512 cores, 3 GB GDDR5.
– 192.4 GB/s memory bandwidth.
• CPU main memory:
– 16 GB RAM.
• Rendering performance:
– An average of 20 fps on a Linux system with MPI and CUDA 4.2.
Performance Evaluation
• Comparison:
– Dual-GPU with load balancing (our approach).
– Dual-GPU without load balancing.
– Single GPU.
Performance Evaluation
Approach | FPS | Diff. Triangle Num. | Visible Triangle Num. | Load Balancing | GPU Out-Of-Core | Triangle Reformation | GL Rendering
Single-GPU | 14.94 | --- | 12.29 M | --- | 29.62 ms | 3.62 ms | 30.24 ms
Dual-GPU (NB) | 17.84 | 7.94 M | 12.29 M | --- | 24.54 ms | 2.85 ms | 25.31 ms
Dual-GPU (B) | 20.40 | 0.37 M | 12.29 M | 5.38 ms | 18.56 ms | 1.97 ms | 19.13 ms
NB: without load balancing. B: with load balancing.
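The figures in the table can be cross-checked with a little arithmetic: the frame time implied by each FPS value, the sum of the listed stage times, and the speedup over the single-GPU baseline. The sketch below uses only numbers from the table; the dictionary layout and function names are illustrative.

```python
# Values copied from the table: FPS and per-stage times in milliseconds.
rows = {
    "Single-GPU": {"fps": 14.94, "stages_ms": [29.62, 3.62, 30.24]},
    "Dual-GPU (NB)": {"fps": 17.84, "stages_ms": [24.54, 2.85, 25.31]},
    "Dual-GPU (B)": {"fps": 20.40, "stages_ms": [5.38, 18.56, 1.97, 19.13]},
}


def frame_time_ms(fps):
    """Average time per frame implied by a frame rate."""
    return 1000.0 / fps


def speedup(fps, baseline_fps):
    """Frame-rate ratio relative to a baseline."""
    return fps / baseline_fps


baseline = rows["Single-GPU"]["fps"]
for name, row in rows.items():
    listed = sum(row["stages_ms"])
    print(
        f"{name}: {frame_time_ms(row['fps']):.2f} ms/frame, "
        f"{listed:.2f} ms in listed stages, "
        f"{speedup(row['fps'], baseline):.2f}x vs single GPU"
    )
```

The listed stages account for most of each frame, and the 5.38 ms spent on load balancing is more than recovered by the shorter out-of-core and rendering stages.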
Conclusion
• A rendering system with two GPUs:
– A workload balancer based on a view-frustum partitioning method.
– Inter-GPU communication for image re-arrangement.
• Future work:
– Scalability beyond two GPUs.
Acknowledgment
Thank you.