Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
Chao Peng, Peng Mi and Yong Cao
Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
Ultrascale Visualization Workshop 2012
Motivation
• How can a large 3D model that contains many objects and triangles be rendered efficiently?
• The Boeing 777 model:
  – Triangles: 332 million
  – Vertices: 223 million
  – Objects: 719 thousand
• Rendering difficulties:
  – The objects have dramatically different shapes and are topologically disconnected.
  – The data size far exceeds the GPU's rendering capabilities.
The Previous Approach
• Our GPU-based approach from Eurographics 2012.
  – Parallel Continuous LOD: triangle-level mesh simplification.
  – GPU Out-of-Core: CPU-GPU data streaming.
A Multi-GPU and Multi-Display System
[Diagram: the input triangle data set, two CPU cores, and two GPU devices.]
The approach on a single GPU
LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO)
[Diagram: objects O1 to O7 pass through LOD Selection, Coherence Evaluation against the existing data (CPU), GPU Out-of-Core, Defragmentation (GPU), Triangle Reformation, and Rendering (VBO).]
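The coherence evaluation and defragmentation steps in the diagram can be sketched as set bookkeeping between two frames. This is a minimal illustration under stated assumptions, not the authors' implementation: it works at object granularity (the slides describe triangle-level LOD), and the function names `coherence_evaluation` and `stream_and_defragment` are hypothetical.

```python
def coherence_evaluation(resident, selected):
    """Compare the objects already on the GPU with the objects LOD selection wants.

    Returns (reuse, upload, evict) as lists of object ids.
    Assumption: data is tracked per object, not per triangle.
    """
    resident_set, selected_set = set(resident), set(selected)
    reuse = [o for o in resident if o in selected_set]       # already on the GPU
    upload = [o for o in selected if o not in resident_set]  # must be streamed from the CPU
    evict = [o for o in resident if o not in selected_set]   # no longer needed
    return reuse, upload, evict


def stream_and_defragment(resident, selected):
    """Return the new GPU buffer layout: kept data compacted to the front,
    newly streamed data appended after it."""
    reuse, upload, _ = coherence_evaluation(resident, selected)
    return reuse + upload


# Hypothetical example: two consecutive frames over objects O1 to O7.
previous_frame = ["O1", "O2", "O3", "O4"]
current_frame = ["O2", "O4", "O5", "O7"]
reuse, upload, evict = coherence_evaluation(previous_frame, current_frame)
buffer_layout = stream_and_defragment(previous_frame, current_frame)
```

In this example only O5 and O7 would cross the CPU-GPU bus; O2 and O4 stay resident and are moved to close the holes left by O1 and O3.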
Performance Bottleneck
Share per pipeline stage (objects O1 to O7; Coherence Evaluation on the CPU, Defragmentation on the GPU):
  – GPU Out-of-Core: 45%
  – Triangle Reformation: 20%
  – OpenGL VBO Rendering: 28%
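A rough way to read these shares is through Amdahl's law. The sketch below assumes that the percentages are fractions of the single-GPU frame time and that all three listed stages divide evenly across GPUs; neither assumption is stated on the slide, so the result is only an idealized upper bound.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup when only parallel_fraction of the work scales with workers."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# Stage shares taken from the slide (assumed to be fractions of frame time).
stage_share = {
    "gpu_out_of_core": 0.45,
    "triangle_reformation": 0.20,
    "vbo_rendering": 0.28,
}

parallel_fraction = sum(stage_share.values())           # 0.93 under the assumption above
ideal_two_gpu_bound = amdahl_speedup(parallel_fraction, 2)
print(f"idealized two-GPU bound: {ideal_two_gpu_bound:.2f}x")
```

Any imbalance between the two GPUs or any added communication cost would push the real speedup below this bound.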
Contributions
[Diagram: two parallel pipelines, one per GPU, each running LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO) → Final Display. Load Balancing sits between LOD Selection and GPU Out-of-Core; Inter-GPU Communication sits between Rendering (VBO) and Final Display.]
Load Balancing
[Diagram: objects 1 to 5 and the viewpoint; figure labels 2, 5, 3, 1, 4.]
LOD selection result: $n_1, n_2, n_3, n_4, n_5$
Load Balancing
[Diagram: objects 1 to 5 (figure labels 2, 5, 3, 1, 4) divided between GPU 1 and GPU 2.]
LOD selection result: $n_1, n_2, n_3, n_4, n_5$
GPU 1: $n_1, n_2, 0, 0, 0$
GPU 2: $0, 0, n_3, n_4, n_5$
$\frac{n_1 + n_2}{n_3 + n_4 + n_5} \notin [1-t, 1+t]$
Load Balancing
[Diagram: objects 1 to 5 (figure labels 2, 5, 3, 1, 4) divided between GPU 1 and GPU 2.]
LOD selection result: $n_1, n_2, n_3, n_4, n_5$
GPU 1: $n_1, n_2, n_3, n_4, 0$
GPU 2: $0, 0, 0, 0, n_5$
$\frac{n_1 + n_2 + n_3 + n_4}{n_5} \notin [1-t, 1+t]$
Load Balancing
[Diagram: objects 1 to 5 (figure labels 2, 5, 3, 1, 4) divided between GPU 1 and GPU 2.]
LOD selection result: $n_1, n_2, n_3, n_4, n_5$
GPU 1: $n_1, n_2, n_3, 0, 0$
GPU 2: $0, 0, 0, n_4, n_5$
$\frac{n_1 + n_2 + n_3}{n_4 + n_5} \in [1-t, 1+t]$
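The split search walked through on the load balancing slides can be sketched as a bisection over the split index. This is an illustration under assumptions, not the authors' code: the values n_i are taken to be non-negative per-object triangle counts, the objects are assumed to be already ordered along the partitioning direction, and the function name `balance_split` is hypothetical. The order in which candidate splits are visited may differ from the slides.

```python
from itertools import accumulate


def balance_split(counts, t):
    """Find k so that GPU 1 takes counts[:k] and GPU 2 takes counts[k:],
    with sum(counts[:k]) / sum(counts[k:]) inside [1 - t, 1 + t].

    Returns (k, ratio). If no split meets the tolerance, returns the visited
    split whose ratio is closest to 1.
    """
    if len(counts) < 2:
        raise ValueError("need at least two objects to split")
    prefix = [0] + list(accumulate(counts))
    total = prefix[-1]
    lo, hi = 1, len(counts) - 1
    best = None
    while lo <= hi:
        k = (lo + hi) // 2
        left, right = prefix[k], total - prefix[k]
        ratio = left / right if right else float("inf")
        if best is None or abs(ratio - 1) < abs(best[1] - 1):
            best = (k, ratio)
        if 1 - t <= ratio <= 1 + t:
            return k, ratio
        if ratio < 1 - t:
            lo = k + 1  # GPU 1 has too little work: move the split right
        else:
            hi = k - 1  # GPU 1 has too much work: move the split left
    return best


# Hypothetical triangle counts for objects 1 to 5.
k, ratio = balance_split([10, 20, 30, 25, 35], t=0.1)
```

With these made-up counts the search settles on k = 3, so GPU 1 takes the first three objects and GPU 2 the last two. Bisection is valid here because the ratio grows monotonically as the split index moves right.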
Inter-GPU Communication
[Diagram: rendered image on GPU 1 and rendered image on GPU 2; displayed image on GPU 1 and displayed image on GPU 2.]
• CUDA Inter-Process Communication (CUDA 4.1 IPC) is used to transfer the image buffer.
Implementation
• Two GPUs:
  – NVIDIA GTX 580.
  – 512 cores, 3 GB GDDR5.
  – 192.4 GB/s memory bandwidth.
• CPU main memory:
  – 16 GB RAM.
• Rendering performance:
  – An average of 20 fps on a Linux system with MPI and CUDA 4.2.
Performance Evaluation
• Comparison:
  – Dual-GPU with load balancing (our approach).
  – Dual-GPU without load balancing.
  – Single GPU.
Performance Evaluation

Approach        FPS    Diff. Triangle Num.  Visible Triangle Num.  Load Balancing  GPU Out-of-Core  Triangle Reformation  GL Rendering
Single-GPU      14.94  ---                  12.29 M                ---             29.62 ms         3.62 ms               30.24 ms
Dual-GPU (NB)   17.84  7.94 M               12.29 M                ---             24.54 ms         2.85 ms               25.31 ms
Dual-GPU (B)    20.40  0.37 M               12.29 M                5.38 ms         18.56 ms         1.97 ms               19.13 ms

NB: without load balancing. B: with load balancing.
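The relative gains implied by the FPS column can be derived with a few lines of arithmetic. The numbers below are copied from the table; the dictionary keys and helper names are hypothetical, and the per-frame time is simply 1000 / FPS rather than a measured value.

```python
# FPS values copied from the table above.
fps = {"single_gpu": 14.94, "dual_gpu_nb": 17.84, "dual_gpu_b": 20.40}


def speedup(new_fps, base_fps):
    """Ratio of frame rates: values above 1 mean the new setup is faster."""
    return new_fps / base_fps


def frame_time_ms(frames_per_second):
    """Average time per frame in milliseconds, derived from the frame rate."""
    return 1000.0 / frames_per_second


balanced_vs_single = speedup(fps["dual_gpu_b"], fps["single_gpu"])
unbalanced_vs_single = speedup(fps["dual_gpu_nb"], fps["single_gpu"])
balanced_vs_unbalanced = speedup(fps["dual_gpu_b"], fps["dual_gpu_nb"])

print(f"balanced dual-GPU vs single GPU:   {balanced_vs_single:.2f}x")
print(f"unbalanced dual-GPU vs single GPU: {unbalanced_vs_single:.2f}x")
print(f"balanced vs unbalanced dual-GPU:   {balanced_vs_unbalanced:.2f}x")
print(f"balanced dual-GPU frame time:      {frame_time_ms(fps['dual_gpu_b']):.2f} ms")
```

Read this way, load balancing accounts for roughly a 1.14x gain over the unbalanced dual-GPU setup, and the balanced setup runs at about 1.37x the single-GPU frame rate.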
Conclusion
• A rendering system with two GPUs:
  – A workload balancer based on a view-frustum partitioning method.
• Inter-GPU communication for image re-arrangement.
• Future work:
  – Scalability beyond two GPUs.
Acknowledgment
Thank you.