Parallel Frame Rendering: Trading Responsiveness for Energy on a Mobile GPU
PACT 2013, 09 September 2013
Jose-Maria Arnau (1), jarnau@ac.upc.edu
Joan-Manuel Parcerisa (1), jmanel@ac.upc.edu
Polychronis Xekalakis (2), polychronis.xekalakis@intel.com
(1) Universitat Politecnica de Catalunya
(2) Intel Labs, Intel Corporation
Bandwidth Usage for Graphics
[Figure: bandwidth usage chart with 62% highlighted. Textures and 3D models from: http://www.turbosquid.com]
Texture Reuse
[Figure: Frame i and Frame i+1 shown side by side]
86% of the texture dataset is shared.
Mobile games exhibit a high degree of texture similarity between consecutive frames.
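The sharing figure above can be illustrated with a small sketch that compares the texture blocks touched by two consecutive frames. The slide does not say how the 86% is measured, so the overlap metric used here (shared blocks divided by all blocks touched by either frame) is an assumption, and the function name and block addresses are made up for illustration.

```python
def shared_fraction(frame_a, frame_b):
    """Fraction of the combined texture blocks that both frames touch.

    Assumed metric: |A & B| / |A | B|. The slides do not define the
    measurement behind their 86% figure.
    """
    a, b = set(frame_a), set(frame_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical texture block addresses touched by two consecutive frames.
frame_i = [0x100, 0x140, 0x180, 0x1C0]
frame_next = [0x140, 0x180, 0x1C0, 0x200]

# Three blocks are shared out of five touched overall.
print(shared_fraction(frame_i, frame_next))
```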
Outline
1. Motivation
2. Conventional Rendering
3. Parallel Frame Rendering
4. Experimental Results
5. Conclusions
Conventional Tile-Based Rendering
[Diagram labels: CPU (process user inputs, physical simulation, dispatch drawing commands); GPU (Command Processor, Geometry Unit, Tiling Engine, Raster Unit 0, Raster Unit 1, Color Buffer, L2 Cache, Memory Controller); System Memory (Textures, Geometry)]
[Timeline: CPU stages F0 to F4, GPU stages F0 to F4 and screen refreshes F0 to F4 along the time axis. Legend: CPU stage, GPU stage, Screen refresh. Annotation: Capacity miss]
L2 Cache Reuse Distances
The L2 cache cannot capture the inter-frame texture reuse because the reuse distances are huge.
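As a rough illustration of what a reuse distance is, the sketch below counts the distinct blocks accessed between two consecutive accesses to the same block. This is the usual textbook definition, assumed here because the slide does not spell one out; the trace and the `reuse_distances` helper are hypothetical. Under this definition, a fully associative LRU cache with C lines hits on an access only if its reuse distance is smaller than C.

```python
def reuse_distances(trace):
    """Return one reuse distance per access (None for a first access).

    Distance = number of distinct blocks touched since the previous
    access to the same block. Quadratic time, kept simple on purpose.
    """
    last_seen = {}
    distances = []
    for i, block in enumerate(trace):
        if block in last_seen:
            between = trace[last_seen[block] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[block] = i
    return distances


# Hypothetical trace: two consecutive frames touching the same 4 textures.
frame = ["t0", "t1", "t2", "t3"]
trace = frame + frame

# Every reuse in the second frame sees 3 other blocks in between, so the
# distance grows with the amount of texture data a frame touches.
print(reuse_distances(trace))
```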
Parallel Frame Rendering
[Diagram, Conventional GPU: Geometry Unit, Tiling Engine, Raster Unit 0, Raster Unit 1, L2 Cache, Memory Controller. Frame 0 and Frame 1 are rendered one after the other along the time axis]
[Diagram, Clustered GPU: Cluster 0 (Geometry Unit 0, Tiling Engine 0, Raster Unit 0) renders Frame 0 while Cluster 1 (Geometry Unit 1, Tiling Engine 1, Raster Unit 1) renders Frame 1. Both clusters use the Shared L2 Cache and the Memory Controller]
Textures are fetched once into the shared L2 cache and accessed by the 2 clusters in a short timespan.
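The effect of rendering two frames at once on a shared cache can be sketched with a toy LRU simulation. Everything here is an assumption made for illustration: the cache is fully associative, both frames touch exactly the same 8 texture blocks, and the two clusters advance in perfect lockstep. It is not the simulator or the configuration used by the authors.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses of a fully associative LRU cache with `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return misses


textures = list(range(8))  # hypothetical texture blocks used by each frame

# Conventional: frame 0 is rendered completely, then frame 1.
sequential = textures + textures

# Clustered: both clusters touch each block back to back.
interleaved = [t for t in textures for _ in range(2)]

print(count_misses(sequential, capacity=4))   # 16: every access misses
print(count_misses(interleaved, capacity=4))  # 8: one miss per block
```

With a cache smaller than one frame's texture footprint, the sequential order evicts each block before the next frame reaches it, while the interleaved order reuses it immediately.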
Parallel Frame Rendering
[Timeline, conventional rendering: CPU stages F0 to F4, GPU stages F0 to F4, screen refreshes F0 to F4]
[Timeline, Parallel Frame Rendering: CPU stages F0 to F4; GPU Cluster 0 renders F0, then F2; GPU Cluster 1 renders F1, then F3; screen refreshes F0 to F2]
Legend: CPU stage, GPU stage, Screen refresh
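The two timelines can be compared with a toy timing model. All of it is assumed for illustration and none of it comes from the slides' measurements: the CPU stage of a frame takes one time unit `t`, the conventional GPU renders a frame in `t`, each cluster of the clustered GPU takes `2 * t` per frame, and both clusters start a pair of frames together once the CPU has finished the second frame of the pair.

```python
def conventional(n, t=1.0):
    """(cpu_start, gpu_end) per frame; the GPU stage follows the CPU stage."""
    return [(k * t, k * t + 2 * t) for k in range(n)]


def parallel(n, t=1.0):
    """(cpu_start, gpu_end) per frame under the assumed pairing model."""
    schedule = []
    for k in range(n):
        # The CPU finishes both frames of pair k // 2 at this time.
        pair_ready = (k // 2) * 2 * t + 2 * t
        # Each cluster has half the resources, so rendering takes 2 * t.
        schedule.append((k * t, pair_ready + 2 * t))
    return schedule


def lags(schedule):
    """Time from the start of the CPU stage to the end of rendering."""
    return [end - start for start, end in schedule]


print(lags(conventional(4)))  # [2.0, 2.0, 2.0, 2.0]
print(lags(parallel(4)))      # [4.0, 3.0, 4.0, 3.0]
```

In this model both schemes finish one frame per time unit on average, but each frame takes longer to go from input processing to a rendered image in the parallel case, which is the responsiveness cost named in the title.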