
Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization



  1. Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization
     Chao Peng, Peng Mi and Yong Cao
     Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
     Ultrascale Visualization Workshop 2012

  2. Motivation
     • How can we efficiently render a large 3D model that contains many objects and triangles?
     • The Boeing 777 model:
       – Triangles: 332 million
       – Vertices: 223 million
       – Objects: 719 thousand
     • Rendering difficulties:
       – The objects have dramatically different shapes and are topologically disconnected.
       – The data size is far beyond the GPU's rendering capability.
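
A rough size estimate makes the last point concrete. The sketch below is a back-of-envelope calculation, not a figure from the slides: it assumes each vertex stores only an xyz position as three 32-bit floats, each triangle stores three 32-bit vertex indices, and the GPU is the 3 GB GTX 580 listed under Implementation (taken as 3 GiB). Normals, colors, and LOD data would only widen the gap.

```python
# Back-of-envelope size of the Boeing 777 model's raw geometry.
# ASSUMED layout (not from the slides): xyz position only, stored as three
# float32 values per vertex, and three uint32 vertex indices per triangle.
TRIANGLES = 332_000_000
VERTICES = 223_000_000
BYTES_PER_VERTEX = 3 * 4        # float32 x, y, z
BYTES_PER_TRIANGLE = 3 * 4      # three uint32 indices
GPU_MEMORY_BYTES = 3 * 1024**3  # one 3 GB GTX 580, taken as 3 GiB


def model_bytes(triangles: int, vertices: int) -> int:
    """Raw geometry size in bytes under the assumed layout above."""
    return vertices * BYTES_PER_VERTEX + triangles * BYTES_PER_TRIANGLE


if __name__ == "__main__":
    size = model_bytes(TRIANGLES, VERTICES)
    print(f"raw geometry: {size / 1e9:.2f} GB")
    print(f"GPU memory:   {GPU_MEMORY_BYTES / 1e9:.2f} GB")
    print(f"ratio:        {size / GPU_MEMORY_BYTES:.2f}x")
```

Under these assumptions the geometry alone comes to 6.66 GB, about twice the memory of a single GPU, so it cannot stay resident and has to be streamed.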

  3. The Previous Approach
     • Our GPU-based approach in EuroGraphics’12:
       – Parallel Continuous LOD: triangle-level mesh simplification.
       – GPU Out-of-Core: CPU-GPU data streaming.

  4. A Multi-GPU and Multi-Display System
     (Figure: the input triangle data set, two CPU cores, and two GPU devices.)

  5. The Approach on a Single GPU
     (Figure: pipeline LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO).)
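
The slides do not say how LOD Selection picks each object's resolution; its output, written later as n1 to n5, is one triangle count per object. The sketch below is a hypothetical stand-in for that stage, not the authors' metric: it splits an assumed per-frame triangle budget across objects in proportion to each object's full triangle count divided by its squared distance from the viewpoint, capped at the object's full resolution. `select_lod` and its parameters are illustrative names.

```python
def select_lod(objects, viewpoint, budget):
    """HYPOTHETICAL LOD selection: one triangle count n_i per object.

    objects:   list of (center_xyz, full_triangle_count)
    viewpoint: camera position (x, y, z)
    budget:    assumed total number of triangles to draw this frame
    The importance weight full_count / distance**2 is an assumption,
    not the metric used in the slides.
    """
    weights = []
    for center, full in objects:
        dist2 = sum((c - v) ** 2 for c, v in zip(center, viewpoint))
        weights.append(full / max(dist2, 1e-6))  # guard objects at the eye
    total_w = sum(weights)

    counts = []
    for (_, full), w in zip(objects, weights):
        share = int(budget * w / total_w) if total_w > 0 else 0
        counts.append(min(full, share))  # never exceed the original mesh
    return counts


if __name__ == "__main__":
    scene = [((0, 0, 1), 100), ((0, 0, 2), 100), ((0, 0, 8), 100)]
    print(select_lod(scene, (0, 0, 0), 150))
```

Budget freed by capped objects is not redistributed here; a real selector might iterate, or use a screen-space error metric instead of distance.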

  6. (Figure: objects O1–O7 → LOD Selection → GPU Out-of-Core [Coherence Evaluation, Existing Data, CPU → GPU, Defragmentation] → Triangle Reformation on the GPU → Rendering (VBO).)
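
One way to read the Coherence Evaluation, Existing Data, and Defragmentation labels is as a per-frame update plan: reuse what the previous frame already left on the GPU, stream only what is missing, then compact the buffer. The sketch below assumes, without confirmation from the slides, that each object's triangles are stored progressively so that a coarser LOD is a prefix of a finer one; `plan_update` is a hypothetical helper that only computes the plan and moves no data.

```python
def plan_update(prev_counts, new_counts):
    """HYPOTHETICAL frame-to-frame coherence plan for the out-of-core stage.

    ASSUMPTION: a coarser LOD is a prefix of a finer one, so data already
    on the GPU can be reused and only the missing suffix is streamed.
    Returns (reused, streamed, offsets):
      reused   - triangles kept from the previous frame, per object
      streamed - triangles to copy CPU -> GPU, per object
      offsets  - start of each object after compaction (defragmentation)
                 into one contiguous buffer
    """
    reused, streamed, offsets = [], [], []
    cursor = 0
    for prev, new in zip(prev_counts, new_counts):
        keep = min(prev, new)        # existing data still useful this frame
        reused.append(keep)
        streamed.append(new - keep)  # only the missing part crosses the bus
        offsets.append(cursor)       # objects packed back to back
        cursor += new
    return reused, streamed, offsets


if __name__ == "__main__":
    print(plan_update([10, 0, 5], [6, 4, 8]))
```

Under this assumption the per-frame transfer is the sum of `streamed`, which stays small when the view changes little between frames.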

  7. Performance Bottleneck
     (Figure: the same pipeline, annotated with the share of time spent in each stage: GPU Out-of-Core 45%, Triangle Reformation 20%, OpenGL VBO Rendering 28%.)

  8. Contributions
     (Figure: two pipelines, one per GPU, each running LOD Selection → GPU Out-of-Core → Triangle Reformation → Rendering (VBO) → Final Display, with Load Balancing added after LOD Selection and Inter-GPU Communication added before Final Display.)

  9. Load Balancing
     (Figure: objects 1–5 in the view frustum, seen from the viewpoint.)
     • LOD selection result: n1, n2, n3, n4, n5.

  10–11. Load Balancing
     • GPU 1: n1, n2, 0, 0, 0; GPU 2: 0, 0, n3, n4, n5.
     • (n1 + n2) / (n3 + n4 + n5) ∉ [1-t, 1+t]: not balanced.

  12–13. Load Balancing
     • GPU 1: n1, n2, n3, n4, 0; GPU 2: 0, 0, 0, 0, n5.
     • (n1 + n2 + n3 + n4) / n5 ∉ [1-t, 1+t]: not balanced.

  14. Load Balancing
     • GPU 1: n1, n2, n3, 0, 0; GPU 2: 0, 0, 0, n4, n5.
     • (n1 + n2 + n3) / (n4 + n5) ∈ [1-t, 1+t]: balanced.
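
The load-balancing walkthrough moves the split between GPU 1 and GPU 2 until the ratio of their triangle counts falls inside [1-t, 1+t]. Because that ratio never decreases as the split moves right (counts are non-negative), a bisection over the split index is one natural way to drive the search. The sketch below assumes the objects are already ordered along the frustum-partitioning direction; the slides do not give the exact update rule, so `balance_split` is a hypothetical illustration rather than the authors' balancer.

```python
from itertools import accumulate


def balance_split(counts, t):
    """HYPOTHETICAL bisection for a two-GPU split point.

    counts: per-object triangle counts n_i from LOD selection, ASSUMED to be
            ordered along the frustum-partitioning direction.
    t:      tolerance; a split is accepted when left/right is in [1-t, 1+t].
    Returns k: GPU 1 renders counts[:k], GPU 2 renders counts[k:].
    """
    m = len(counts)
    if m < 2:
        raise ValueError("need at least two objects to split")
    prefix = [0] + list(accumulate(counts))  # prefix[k] == sum(counts[:k])
    total = prefix[-1]

    def ratio(k):
        left, right = prefix[k], total - prefix[k]
        return float("inf") if right == 0 else left / right

    lo, hi = 1, m - 1
    best = lo
    while lo <= hi:
        k = (lo + hi) // 2
        r = ratio(k)
        if abs(r - 1) < abs(ratio(best) - 1):
            best = k          # remember the most balanced split seen so far
        if 1 - t <= r <= 1 + t:
            return k          # balanced within tolerance
        if r < 1 - t:
            lo = k + 1        # GPU 1 underloaded: move the split right
        else:
            hi = k - 1        # GPU 1 overloaded: move the split left
    return best               # no split within tolerance; closest probed one


if __name__ == "__main__":
    print(balance_split([4, 3, 2, 5, 6], 0.2))
```

With prefix sums each probe costs O(1), so after one O(m) pass the search needs only O(log m) probes.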

  15. Inter-GPU Communication
     (Figure: the rendered image and the displayed image on GPU 1 and on GPU 2.)
     • CUDA Inter-Process Communication (CUDA 4.1 IPC) is used to transfer the image buffer.
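
The figure distinguishes each GPU's rendered image from its displayed image. Under assumptions the slides do not state (the frustum split is a vertical screen line at some column, and display 1 and display 2 show fixed left and right halves of the screen), the re-arrangement reduces to cutting the stitched image at the display boundary. Only the columns between the split and the midpoint must cross GPUs, which is the buffer the slides move with CUDA IPC. `rearrange` is a hypothetical helper operating on plain lists of pixel rows, not on GPU memory.

```python
def rearrange(rendered1, rendered2, width):
    """HYPOTHETICAL re-arrangement for two GPUs driving two displays.

    Assumptions (not from the slides):
      * rendered1 holds screen columns [0, split) rendered by GPU 1,
        rendered2 holds columns [split, width) rendered by GPU 2;
      * display 1 shows columns [0, width // 2), display 2 the rest.
    Images are lists of pixel rows with at least one row.
    Returns (display1, display2, columns_moved).
    """
    split = len(rendered1[0])
    half = width // 2
    # Logically stitch each full-width row, then cut it at the display boundary.
    full = [r1 + r2 for r1, r2 in zip(rendered1, rendered2)]
    display1 = [row[:half] for row in full]
    display2 = [row[half:] for row in full]
    # Columns between split and half were rendered on the other GPU; this is
    # the part of the image buffer that has to be transferred between GPUs.
    columns_moved = abs(split - half)
    return display1, display2, columns_moved


if __name__ == "__main__":
    # Pixel value == screen column, so the cut is easy to see.
    print(rearrange([[0, 1, 2]], [[3, 4, 5, 6, 7]], 8))
```

Because the balanced split changes from frame to frame, so does the number of exchanged columns; it is zero only when the split falls exactly on the display boundary.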

  16. Implementation
     • Two GPUs:
       – NVIDIA GTX 580.
       – 512 cores, 3 GB GDDR5.
       – 192.4 GB/s memory bandwidth.
     • CPU main memory:
       – 16 GB RAM.
     • Rendering performance:
       – An average of 20 fps on Linux with MPI and CUDA 4.2.

  17. (No text on this slide.)

  18. Performance Evaluation
     • Comparison:
       – Dual-GPU with load balancing (our approach).
       – Dual-GPU without load balancing.
       – Single GPU.

  19. Performance Evaluation

  20. Performance Evaluation

     Approach       | FPS   | Diff. Triangle Num. | Visible Triangle Num. | Load Balancing | GPU Out-of-Core | Triangle Reformation | GL Rendering
     Single-GPU     | 14.94 | ---                 | 12.29 M               | ---            | 29.62 ms        | 3.62 ms              | 30.24 ms
     Dual-GPU (NB)  | 17.84 | 7.94 M              | 12.29 M               | ---            | 24.54 ms        | 2.85 ms              | 25.31 ms
     Dual-GPU (B)   | 20.40 | 0.37 M              | 12.29 M               | 5.38 ms        | 18.56 ms        | 1.97 ms              | 19.13 ms

     NB = without load balancing; B = with load balancing. Diff. Triangle Num. is the difference in triangle count between the two GPUs.
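
The performance figures can be cross-checked with a little arithmetic: frame time is 1000 / FPS, and the listed stage times should sum to somewhat less than the frame time, since stages not in the table (for example LOD selection and final display) presumably account for the rest. The script below only re-derives values from the table; `summarize` is an illustrative helper.

```python
# Figures copied from the Performance Evaluation table (FPS, per-stage ms).
results = {
    "Single-GPU":    {"fps": 14.94, "stages_ms": [29.62, 3.62, 30.24]},
    "Dual-GPU (NB)": {"fps": 17.84, "stages_ms": [24.54, 2.85, 25.31]},
    "Dual-GPU (B)":  {"fps": 20.40, "stages_ms": [5.38, 18.56, 1.97, 19.13]},
}


def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


def summarize(results, baseline="Single-GPU"):
    """Frame time, sum of listed stage times, and speedup over the baseline."""
    base_fps = results[baseline]["fps"]
    rows = {}
    for name, r in results.items():
        rows[name] = {
            "frame_ms": round(frame_time_ms(r["fps"]), 1),
            "listed_stages_ms": round(sum(r["stages_ms"]), 2),
            "speedup": round(r["fps"] / base_fps, 2),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(results).items():
        print(f"{name:14s} {row}")
```

This gives a speedup of about 1.37x for Dual-GPU (B) and 1.19x for Dual-GPU (NB) over Single-GPU; for Dual-GPU (B) the frame time is about 49.0 ms against 45.04 ms of listed stage time.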

  21. Performance Evaluation

  22. Conclusion
     • A rendering system with two GPUs:
       – A workload balancer based on a view-frustum partitioning method.
     • Inter-GPU communication for image re-arrangement.
     • Future work:
       – Scalability beyond two GPUs.

  23. Acknowledgment

  24. Thank you.
