TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments
Shinpei Kato*, Karthik Lakshmanan*, Raj Rajkumar*, and Yutaka Ishikawa**
* Carnegie Mellon University, ** The University of Tokyo
USENIX Annual Technical Conference 2011 – Shinpei Kato (CMU), June 15, 2011 Graphics Applications
Graphics Processing Unit (GPU)
[Figure: block diagram of an NVIDIA GeForce GTX 480 GPU: 480 simple cores with L1 caches, an L2 cache, and device memory, alongside the CPU and host memory.]
Peak Performance
[Figure: peak GFLOPS (0 to 1600) over time (2006 to 2011), NVIDIA GPU vs. Intel CPU. GPUs: 7900 GTX, 8800 GTX, 9800 GTX, GTX 280, GTX 285, GTX 480, GTX 580. CPUs: E4300, E6850, Q9650, X7460, 980 XE.]
Peak Performance “per Watt”
[Figure: GFLOPS / Watt (0 to 7) over time (2006 to 2011), NVIDIA GPU vs. Intel CPU. GPUs: 7900 GTX, 8800 GTX, 9800 GT, GTX 280, GTX 285, GTX 480, GTX 580. CPUs: E4300, E6850, Q9650, X7460, 980 XE.]
General-Purpose Computing on GPU (GPGPU)
Example applications: 3-D On-line Game, Autonomous Driving, Virtual Reality, Scientific Simulation, 3-D Interface, Computer Vision
Outline
1. Introduction
2. What’s the Problem
3. Our Solution – “TimeGraph”
4. Evaluation
5. Summary
GPU Is Command-Driven
[Figure: four panels, each showing host memory and device memory.
1. CMD_HtoD: the GPU code is copied from host memory to device memory.
2. CMD_HtoD: the input data is copied from host memory to device memory.
3. CMD_LAUNCH: the GPU code runs on the input data; output data appears in device memory.
4. CMD_DtoH: the output data is copied from device memory back to host memory.]
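The command sequence on this slide can be sketched as a toy interpreter. This is only an illustration, not how a real driver works: host and device memory are modeled as Python dictionaries, the GPU code is an ordinary Python function, and the argument layout of each command is a hypothetical choice. Only the command names CMD_HtoD, CMD_LAUNCH, and CMD_DtoH come from the slide.

```python
def execute(commands, host, device):
    """Apply a list of GPU-style commands to toy host/device memories."""
    for op, *args in commands:
        if op == "CMD_HtoD":
            # Copy one named buffer from host memory to device memory.
            (name,) = args
            device[name] = host[name]
        elif op == "CMD_LAUNCH":
            # Run the code already resident on the device on a device buffer.
            code, src, dst = args
            device[dst] = device[code](device[src])
        elif op == "CMD_DtoH":
            # Copy one named buffer from device memory back to host memory.
            (name,) = args
            host[name] = device[name]
        else:
            raise ValueError(f"unknown command: {op}")


# Hypothetical workload: double every element of the input.
host = {"code": lambda xs: [2 * x for x in xs], "input": [1, 2, 3]}
device = {}
execute(
    [
        ("CMD_HtoD", "code"),
        ("CMD_HtoD", "input"),
        ("CMD_LAUNCH", "code", "input", "output"),
        ("CMD_DtoH", "output"),
    ],
    host,
    device,
)
```

The point of the sketch is that the GPU does nothing on its own: every copy and every launch is an explicit command issued from the host side, which is what makes command scheduling possible.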
Multi-Tasking Problem
[Figure: timeline of CPU time and GPU time for a high-priority task, a low-priority task, and the GPU driver, showing GPU commands; two intervals are marked "Blocked".]
Impact of Interference
Observe the frame rate of OpenArena (3-D game) on Linux.
[Figure: frame rate relative to standalone (0 to 1) when competing with Widget (low GPU workload) and with Bomb (high GPU workload), under the NVIDIA proprietary driver and the Nouveau open-source driver, on GeForce 9500 and GeForce GTX 285.]
TimeGraph Architecture
Software approach.
[Figure: layered architecture diagram.
User space: Applications, OpenGL/CUDA Library, User-space GPU Driver.
Kernel space: Kernel-space GPU Driver with TimeGraph, consisting of the GPU Command Scheduler (GPU command queue, GPU command groups, high-priority submission interface), the GPU Reserve Manager (GPU resource control), the GPU Command Profiler (GPU exec. time prediction), and the IRQ Handler (notification).
Device space: Graphics Processing Unit (GPU), which raises interrupts.]
Priority Support – Predictable Response Time (PRT) Policy
• When the GPU is not idle, GPU commands are queued
• When the GPU becomes idle, queued GPU commands are dispatched
[Figure: timeline of CPU time and GPU time for a high-priority task, a low-priority task, and the GPU driver, showing GPU commands and interrupts; annotations: "Prioritized correctly", "Overhead".]
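The two PRT rules above can be sketched as a small simulation. This is a minimal model under stated assumptions, not TimeGraph's driver code: the class and method names are hypothetical, a larger number means a higher priority, and picking the highest-priority queued command on each completion interrupt is an assumption drawn from the "Prioritized correctly" annotation.

```python
import heapq
from itertools import count


class PRTScheduler:
    """Toy PRT model: at most one GPU command is on the GPU at a time."""

    def __init__(self):
        self.running = None      # (priority, name) on the GPU, or None if idle
        self._queue = []         # min-heap of (-priority, seq, name)
        self._seq = count()      # FIFO tie-breaker among equal priorities
        self.dispatched = []     # order in which commands reached the GPU

    def submit(self, name, priority):
        if self.running is None:
            # GPU is idle: dispatch right away.
            self._dispatch(name, priority)
        else:
            # GPU is busy: queue the command regardless of its priority.
            heapq.heappush(self._queue, (-priority, next(self._seq), name))

    def on_completion_interrupt(self):
        # The GPU has gone idle: dispatch the highest-priority queued command.
        self.running = None
        if self._queue:
            neg_priority, _, name = heapq.heappop(self._queue)
            self._dispatch(name, -neg_priority)

    def _dispatch(self, name, priority):
        self.running = (priority, name)
        self.dispatched.append(name)


sched = PRTScheduler()
sched.submit("low-1", priority=1)    # GPU idle, runs immediately
sched.submit("low-2", priority=1)    # queued
sched.submit("high-1", priority=9)   # queued, but ahead of low-2
sched.on_completion_interrupt()      # low-1 done
sched.on_completion_interrupt()      # next command done
```

In this model the high-priority command waits for at most one already-running command, at the cost of one interrupt per dispatched command, which corresponds to the "Overhead" annotation on the slide.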
Priority Support – High Throughput (HT) Policy
• When the GPU is not idle, GPU commands are queued only if their priority is lower than that of the current GPU context
• When the GPU becomes idle, queued GPU commands are dispatched
[Figure: timeline of CPU time and GPU time for a high-priority task, a low-priority task, and the GPU driver, showing GPU commands and interrupts; annotation: "Overhead reduced".]
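A minimal sketch of the HT rule as stated on this slide follows. The names are hypothetical, a larger number means a higher priority, and treating the most recently dispatched command as the "current GPU context" is an assumption made for illustration.

```python
import heapq
from itertools import count


class HTScheduler:
    """Toy HT model: commands bypass the queue unless their priority is lower."""

    def __init__(self):
        self.in_flight = 0             # commands submitted but not yet completed
        self.current_priority = None   # priority of the current GPU context
        self._queue = []               # min-heap of (-priority, seq, name)
        self._seq = count()
        self.dispatched = []

    def submit(self, name, priority):
        busy = self.in_flight > 0
        if busy and priority < self.current_priority:
            # Lower priority than the current context: wait in the queue.
            heapq.heappush(self._queue, (-priority, next(self._seq), name))
            return False
        # Idle GPU, or equal/higher priority: submit directly.
        self._dispatch(name, priority)
        return True

    def on_completion(self):
        if self.in_flight == 0:
            return
        self.in_flight -= 1
        if self.in_flight == 0:
            # GPU is idle again: release the best queued command, if any.
            self.current_priority = None
            if self._queue:
                neg_priority, _, name = heapq.heappop(self._queue)
                self._dispatch(name, -neg_priority)

    def _dispatch(self, name, priority):
        self.in_flight += 1
        self.current_priority = priority
        self.dispatched.append(name)


ht = HTScheduler()
first = ht.submit("high-1", priority=9)   # GPU idle
second = ht.submit("high-2", priority=9)  # not lower, so submitted directly
third = ht.submit("low-1", priority=1)    # lower, so queued
ht.on_completion()
ht.on_completion()                        # GPU idle, low-1 is released
```

Compared with the PRT sketch, commands from the current context no longer wait for the GPU to go idle, which is where the reduced overhead comes from.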
Reservation Support – Posterior Enforcement (PE) Policy
• Enforce GPU resource usage optimistically
• Specify capacity (C) and period (P) per task (/proc/GPU/$TASK)
[Figure: timeline of CPU time and GPU time against the execution budget, with capacity C and period P marked; one interval is marked "Enforced".]
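The optimistic enforcement described above can be sketched as a budget account that is charged only after a command has run. This is a sketch under assumptions: the class and method names are hypothetical, times are in milliseconds, and the replenishment rule (add C each period, capped at C, so that an overrun is carried forward as debt) is an assumption rather than something stated on the slide.

```python
class PEReserve:
    """Toy posterior enforcement: charge measured GPU time after execution."""

    def __init__(self, capacity_ms, period_ms):
        self.capacity = capacity_ms   # C: GPU time allowed per period
        self.period = period_ms       # P: replenishment period
        self.budget = capacity_ms

    def may_submit(self):
        # Optimistic: any positive budget lets the next command through,
        # even if that command turns out to cost more than what is left.
        return self.budget > 0

    def on_completion(self, measured_ms):
        # The actual cost is known only now; the budget may go negative.
        self.budget -= measured_ms

    def on_period_start(self):
        # Assumed rule: replenish by C, never exceeding C.
        self.budget = min(self.capacity, self.budget + self.capacity)


reserve = PEReserve(capacity_ms=2.0, period_ms=20.0)
ok_before = reserve.may_submit()
reserve.on_completion(5.0)            # overrun: budget is now -3.0
ok_after_overrun = reserve.may_submit()
reserve.on_period_start()             # budget -1.0, still suspended
ok_after_one_period = reserve.may_submit()
reserve.on_period_start()             # budget 1.0, allowed again
ok_after_two_periods = reserve.may_submit()
```

The overrun is not prevented, only paid back later, which is why this policy is called optimistic.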
Reservation Support – Apriori Enforcement (AE) Policy
• Enforce GPU resource usage pessimistically
• Specify capacity (C) and period (P) per task (/proc/GPU/$TASK)
[Figure: timeline of CPU time and GPU time against the execution budget, with capacity C and period P marked; each GPU command is preceded by a "Predict" step, and two intervals are marked "Enforced".]
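By contrast with PE, the pessimistic policy checks a predicted cost before a command is allowed onto the GPU. The sketch below assumes hypothetical names, times in milliseconds, a budget that is charged with the predicted cost at submission, and a full reset to C at each period; none of these details are given on the slide.

```python
class AEReserve:
    """Toy a priori enforcement: admit a command only if its predicted
    GPU time fits in the remaining budget."""

    def __init__(self, capacity_ms, period_ms):
        self.capacity = capacity_ms   # C: GPU time allowed per period
        self.period = period_ms       # P: replenishment period
        self.budget = capacity_ms

    def try_submit(self, predicted_ms):
        if predicted_ms > self.budget:
            # Would overrun the budget: hold the command until replenishment.
            return False
        # Assumed accounting: charge the predicted cost up front.
        self.budget -= predicted_ms
        return True

    def on_period_start(self):
        # Assumed rule: reset the budget to C.
        self.budget = self.capacity


ae = AEReserve(capacity_ms=2.0, period_ms=20.0)
a = ae.try_submit(1.5)    # fits, budget is now 0.5
b = ae.try_submit(1.0)    # does not fit, held back
ae.on_period_start()
c = ae.try_submit(1.0)    # fits after replenishment
```

The budget never goes negative in this model, but a command whose predicted cost exceeds C would never be admitted, so the quality of the prediction matters.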
GPU Execution Time Prediction
• History-based approach
  – Search records of previous GPU command sequences that match the incoming GPU command sequence
  – Works for 2-D but needs investigation for 3-D and Compute
• Please see the paper for details
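The history-based lookup can be sketched as a table keyed by the command sequence. This is an illustration only: the command names are made up, exact matching on the whole sequence and averaging over past measurements are assumptions, and the fallback for unseen sequences is a hypothetical parameter.

```python
from collections import defaultdict


class HistoryPredictor:
    """Toy history table: command sequence -> average measured GPU time."""

    def __init__(self, default_ms=0.0):
        self._totals = defaultdict(lambda: [0.0, 0])  # key -> [sum_ms, count]
        self.default_ms = default_ms                  # used for unseen sequences

    def record(self, commands, measured_ms):
        # Store one measurement under the exact sequence of commands.
        entry = self._totals[tuple(commands)]
        entry[0] += measured_ms
        entry[1] += 1

    def predict(self, commands):
        key = tuple(commands)
        if key not in self._totals:
            return self.default_ms
        total_ms, samples = self._totals[key]
        return total_ms / samples


predictor = HistoryPredictor(default_ms=0.0)
predictor.record(["FILL_RECT", "BLIT"], 2.0)   # hypothetical command names
predictor.record(["FILL_RECT", "BLIT"], 4.0)
known = predictor.predict(["FILL_RECT", "BLIT"])
unknown = predictor.predict(["DRAW_MESH"])
```

A table like this works well when the same command sequences recur with similar cost, which fits the slide's remark that the approach works for 2-D workloads and needs investigation for 3-D and compute.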
Experimental Setup
• GPU: NVIDIA GeForce 9800 GT
• CPU: Intel Xeon E5504
• OS: Linux Kernel 2.6.36
  – Nouveau open-source driver
• Benchmark:
  – Phoronix Test Suite http://www.phoronix-test-suite.com/
    • Including OpenGL 3-D game programs
  – Gallium3D Demo Suite http://www.mesa3d.org/
    • Including OpenGL 3-D widget and graphics-bomb programs
Performance Protection
Frame rate of a 3-D game competing with Graphics Bomb in the background.
[Figure: average frame rate (fps, 0 to 60) per 3-D game application: OpenArena, World of Padman, Urban Terror, Unreal Tournament. Series:
• No TimeGraph Support
• Priority Support (High Priority -> 3-D Game)
• Priority & PE Reservation Support (GPU Util. 10% -> Graphics Bomb)
• Priority & AE Reservation Support (GPU Util. 10% -> Graphics Bomb)]
Interference on Time
[Figure: three panels plotting frames per second (0 to 200) against elapsed time (0 to 120 seconds), each with series labeled Widget Engine #1, Widget Engine #2, Widget Engine #3. Panels: No TimeGraph Support; Priority Support (PRT); Priority Support (PRT) + Reservation Support (PE).]
Standalone Performance
The X server is assigned the PRT policy.
[Figure: average frame rate (fps, 0 to 70) per 3-D game application: OpenArena, World of Padman, Urban Terror, Unreal Tournament. Series:
• No TimeGraph Support
• Priority Support (HT)
• Priority Support (PRT)
• Priority & Reservation Support (PRT & PE)
• Priority & Reservation Support (PRT & AE)]
The overhead is acceptable for protecting the GPU.
Concluding Remarks
• TimeGraph enables prioritization and isolation for GPU applications in multi-tasking environments
  – Device-driver solution: no modification to user-space
  – Scheduling of GPU commands
  – Reservation of GPU resource usage
• http://rtml.ece.cmu.edu/projects/timegraph/
Current Status
• GPGPU support (collaboration with PathScale Inc.)
  – Visit http://github.com/pathscale/pscnv
• Making open-source fast and reliable
  – It’s getting competitive with the proprietary driver!
  – Some results from our OSPERT’11 paper (*) below:
[Figure: matrix multiplication on an NVIDIA GeForce GTX 480. Execution time (ms, log scale from 0.01 to 100), broken down into Launch, HtoD, and DtoH, comparing NVIDIA and Ours for matrix sizes 16 x 16, 32 x 32, 64 x 64, 128 x 128, 256 x 256, 512 x 512, and 1024 x 1024.]
* Available at http://www.contrib.andrew.cmu.edu/~shinpei/papers/ospert11.pdf
Thank you for your attention! Questions?