Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems
Vishakha Gupta, Karsten Schwan @ Georgia Tech
Niraj Tolia @ Maginatics
Vanish Talwar, Parthasarathy Ranganathan @ HP Labs
USENIX ATC 2011 – Portland, OR, USA
Increasing Popularity of Accelerators
Timeline, 2007-2011:
• IBM Cell-based Playstation
• IBM Cell-based RoadRunner
• CUDA-programmable GPUs for developers
• Increasing popularity of NVIDIA GPU-powered desktops and laptops
• Tegras in cellphones
• Amazon EC2 adopts GPUs
• Keeneland, Tianhe-1A and Nebulae supercomputers in Top500
Example x86-GPU System
• C-like CUDA-based applications: a host portion running on the x86 CPUs and CUDA kernels running on the GPU, which is attached over PCIe
• Proprietary NVIDIA driver and CUDA runtime, responsible for:
− Memory management
− Communication with the device
− Scheduling logic
− Binary translation
Design flaw: the bulk of the logic lives in drivers, which were meant for simple operations such as read, write, and interrupt handling
Shortcoming: the driver is inaccessible, and its scheduling is one-size-fits-all
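To make the host-portion / CUDA-kernel split on this slide concrete, here is a minimal CUDA sketch (illustrative only, not from the talk; the kernel, array size, and launch configuration are made up). Every runtime call in the host portion goes through the proprietary CUDA runtime and driver described above.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// CUDA kernel: runs on the GPU, one thread per array element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Host portion: runs on the x86 CPU; allocates device memory, moves data
// over PCIe, launches the kernel, and copies the result back.
int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *d;
    cudaMalloc((void **)&d, bytes);                   // memory management
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // communication with the device

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);      // kernel launch, ordered by the driver

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    free(h);
    return 0;
}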
Sharing Accelerators
Recent timeline (2010-2011): Tegras in cellphones; Amazon EC2 adopts GPUs; HPC GPU cluster (Keeneland); other cloud offerings by AMD, NVIDIA
• Most applications fail to occupy GPUs completely
− With the exception of extensively tuned (e.g. supercomputing) applications
• Expected utilization of GPUs across applications in some domains "may" follow patterns that allow sharing
Need for accelerator sharing: resource sharing is now supported in NVIDIA's Fermi architecture
Concern: can driver scheduling do a good job?
NVIDIA GPU Sharing – Driver Default
• Testbed: quad-core Xeon with two NVIDIA 8800GTX GPUs, driver 169.09, CUDA SDK 1.1
• Benchmark: Coulomb Potential [CP] from the Parboil benchmark suite
• Result of sharing the two GPUs among four instances of the application
[Figure: min, median, and max performance across the four instances; 50% marked on the axis]
What the driver does well: it efficiently implements the computation and data interactions between host and accelerator
Limitations: call ordering suffers when sharing; whatever scheme the driver uses is static and cannot adapt to different system expectations
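One common way to arrange the sharing measured on this slide is for each of the four application instances to bind to one of the two GPUs before issuing any other CUDA call; after that, the driver alone decides how the calls of co-located instances interleave. The sketch below is an assumption about such a setup, not code from the CP benchmark, and the command-line instance id is hypothetical.

#include <cuda_runtime.h>
#include <stdlib.h>

// Map this application instance onto one of the available GPUs.
// With four instances and two GPUs, two instances end up sharing each device.
static int pick_gpu(int instance_id) {
    int count = 0;
    cudaGetDeviceCount(&count);      // e.g. 2 on the dual-8800GTX testbed
    if (count < 1)
        return -1;
    int dev = instance_id % count;
    cudaSetDevice(dev);              // all subsequent CUDA calls use this GPU
    return dev;
}

int main(int argc, char **argv) {
    int instance_id = (argc > 1) ? atoi(argv[1]) : 0;
    pick_gpu(instance_id);
    // ... the benchmark's host portion would run here, contending with the
    // other instance that picked the same device ...
    return 0;
}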
Re-thinking Accelerator-based Systems
• Accelerators as first class citizens
− Why treat such powerful processing resources as devices?
− How can such heterogeneous resources be managed, especially with evolving programming models, evolving hardware, and proprietary software?
• Sharing of accelerators
− Are there efficient methods to utilize a heterogeneous pool of resources?
− Can applications share accelerators without a big hit in efficiency?
• Coordination across different processor types
− How do you deal with multiple scheduling domains?
− Does coordination yield any performance gains?
Pegasus addresses the urgent need for systems support to smartly manage accelerators (demonstrated through x86-NVIDIA GPU-based systems).
It leverages new opportunities presented by the increased adoption of virtualization technology in commercial, cloud computing, and even high-performance infrastructures (virtualization provided by the Xen hypervisor and the Dom0 management domain).
ACCELERATORS AS FIRST CLASS CITIZENS
Manageability: Extending Xen for Closed NVIDIA GPUs
• Guest VMs: GPU application → CUDA API → GPU frontend, running on the guest's Linux
• Management domain (Dom0): GPU backend plus a management extension, layered over the CUDA runtime + GPU driver, alongside the traditional device drivers
• Hypervisor (Xen)
• Hardware: general purpose multicores, compute accelerators (NVIDIA GPUs), traditional devices
NVIDIA's CUDA (Compute Unified Device Architecture) for managing GPUs
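The GPU frontend/backend split shown here amounts to interposing on the CUDA API inside the guest and shipping each call to the backend in Dom0, where the real runtime and driver execute it. The sketch below shows one interposed call under stated assumptions: pegasus_msg_t and pegasus_forward() are hypothetical stand-ins for the actual Pegasus message layout and Xen frontend-backend transport, which the slide does not detail.

#include <cuda_runtime.h>
#include <string.h>

// Hypothetical message carried from the guest's GPU frontend to the GPU
// backend in Dom0 (e.g. over a shared ring set up through Xen).
typedef struct {
    int     call_id;   // which CUDA API call is being forwarded
    size_t  size;      // argument: allocation size
    void   *dev_ptr;   // result: device pointer returned by the backend
    int     status;    // result: cudaError_t produced by the real driver in Dom0
} pegasus_msg_t;

enum { PEGASUS_CUDA_MALLOC = 1 };

// Assumed to be provided by the frontend driver: blocks until the backend
// has executed the real CUDA call and filled in the result fields.
extern int pegasus_forward(pegasus_msg_t *msg);

// Guest-side stub with the same signature as the real cudaMalloc; this file
// stands in for the vendor runtime inside the VM, so applications link
// against it unmodified.
#ifdef __cplusplus
extern "C"
#endif
cudaError_t cudaMalloc(void **devPtr, size_t size) {
    pegasus_msg_t msg;
    memset(&msg, 0, sizeof(msg));
    msg.call_id = PEGASUS_CUDA_MALLOC;
    msg.size = size;

    if (pegasus_forward(&msg) != 0)
        return cudaErrorUnknown;

    *devPtr = msg.dev_ptr;            // opaque handle, only meaningful in Dom0
    return (cudaError_t)msg.status;
}

Because the guest only ever handles such opaque results, memory management and call ordering stay in Dom0, which is what allows a management layer like the one on this slide to apply its own scheduling instead of relying solely on the driver default.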