DELIVERING HIGH-PERFORMANCE REMOTE GRAPHICS WITH NVIDIA GRID VIRTUAL GPU
Andy Currid, NVIDIA
WHAT YOU’LL LEARN IN THIS SESSION
— NVIDIA's GRID Virtual GPU architecture: what it is and how it works
— Using GRID Virtual GPU on Citrix XenServer
— How to deliver great remote graphics from GRID Virtual GPU
WHY VIRTUALIZE?
— Engineer / Designer: workstation
— Power user: high-end PC
— Knowledge worker: entry-level PC
WHY VIRTUALIZE?
Desktop workstation with Quadro GPU:
— Awesome performance!
— But: high cost, hard to fully utilize, limited mobility
— Challenging to manage; data security can be a problem
… CENTRALIZE THE WORKSTATION
Move the desktop workstation with its Quadro GPU into the datacenter; the user connects from a notebook or thin client over remote graphics:
— Awesome performance!
— Easier to fully utilize, manage and secure
— But: even more expensive!
… VIRTUALIZE THE WORKSTATION
A GPU-enabled server in the datacenter runs virtual machines, each with its own guest OS, apps and NVIDIA driver. Each guest VM has direct access to a dedicated NVIDIA GRID GPU and delivers remote graphics to a notebook or thin client.
— Hypervisors: Citrix XenServer, VMware ESX, Red Hat Enterprise Linux, open source Xen, KVM
— Dedicated GPU per user
… SHARE THE GPU
The hypervisor hosts the NVIDIA GRID Virtual GPU Manager, which performs physical GPU management and shares a single NVIDIA GRID GPU among multiple guest VMs. Each guest VM still runs the NVIDIA driver, retains direct GPU access, and delivers remote graphics to a notebook or thin client.
NVIDIA GRID VIRTUAL GPU
Standard NVIDIA driver stack in each guest VM
— API compatibility
Direct hardware access from the guest
— Highest performance
GRID Virtual GPU Manager in the hypervisor
— Increased manageability
VIRTUAL GPU RESOURCE SHARING
Framebuffer
— Allocated per VM at VM startup (VM1 FB, VM2 FB)
Channels
— Used to post work to the GPU
— A VM accesses its channels via the GPU's Base Address Register (BAR); per-VM BAR regions (VM1 BAR, VM2 BAR) are isolated by the CPU's Memory Management Unit (MMU)
GPU engines (3D, Copy Engine, NVENC, NVDEC)
— Timeshared among VMs, like multiple contexts on a single OS
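The timesharing of GPU engines among VMs can be pictured as a simple round-robin rotation. A minimal conceptual sketch (the class and timeslice granularity are invented for illustration; the real vGPU scheduler is more sophisticated):

```python
from collections import deque

class RoundRobinScheduler:
    """Conceptual sketch: timesharing one GPU engine among vGPU-backed VMs."""
    def __init__(self, vms):
        self.runqueue = deque(vms)  # VMs with pending work on their channels

    def next_timeslice(self):
        # Pick the next VM, grant it a timeslice, rotate it to the back
        vm = self.runqueue.popleft()
        self.runqueue.append(vm)
        return vm

sched = RoundRobinScheduler(["VM1", "VM2"])
order = [sched.next_timeslice() for _ in range(4)]
# Each VM gets alternating timeslices on the shared engine
```

This mirrors the slide's analogy: just as one OS timeshares a GPU among multiple graphics contexts, the vGPU Manager timeshares the physical engines among VMs.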
VIRTUAL GPU ISOLATION
— The GPU MMU controls access from GPU engines (3D, CE, NVENC, NVDEC) to framebuffer and system memory
— The vGPU Manager maintains per-VM pagetables in the GPU's framebuffer
— Valid accesses are translated and routed to the VM's framebuffer or system memory (translated DMA access to VM physical memory and FB)
— Invalid accesses are blocked
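The validate-and-translate behavior described above can be sketched in a few lines. This is purely conceptual: the page size, table layout and function names are invented for illustration, and real GPU pagetables are far more complex:

```python
PAGE_SIZE = 4096  # illustrative page size

# Per-VM pagetables: virtual page number -> physical page number (absent = unmapped)
pagetables = {
    "VM1": {0: 100, 1: 101},
    "VM2": {0: 200},
}

def translate(vm, vaddr):
    """Return the physical address for a valid access, or None to block it."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = pagetables[vm].get(vpn)
    if ppn is None:
        return None  # invalid access: blocked by the MMU
    return ppn * PAGE_SIZE + offset  # valid access: routed to memory

translate("VM1", 4100)   # valid: falls in VM1's mapped page 1
translate("VM2", 8192)   # invalid: VM2 has no mapping for page 2, so blocked
```

The key point the sketch captures: because each VM's accesses go through its own pagetables, one VM can never reach another VM's framebuffer or system memory.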
VIRTUAL GPU DISPLAY
— The virtual GPU exposes virtual display heads to each VM, e.g. 2 heads at 2560x1600 resolution
— Primary surfaces (front buffers) for each head are maintained in the VM's framebuffer
— Physical scanout to a monitor is replaced by hardware delivery direct to system memory
NVIDIA GRID REMOTE GRAPHICS SDK
— Available on vGPU and passthrough GPU
— Fast readback of the desktop (NVFBC, from the front buffer) or of individual render targets (NVIFR)
— Hardware H.264 encoder (NVENC) delivers H.264 or raw streams to the remote graphics stack over the network
— Used by remote graphics stacks including Citrix XenDesktop, VMware View, NICE DCV and HP RGS
USING NVIDIA GRID vGPU
Citrix XenServer
— First hypervisor to support GRID vGPU
— Also supports GPU passthrough
— Open source
— Full tools integration for GPU
— GRID certified server platforms
VMware vSphere
— Coming soon!
XENSERVER SETUP
— Install XenServer
— Install the XenCenter management GUI on a PC
— Install the GRID Virtual GPU Manager:
  rpm -i NVIDIA-vgx-xenserver-6.2-331.30.i386.rpm
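After installing the rpm and rebooting the host, the install can be sanity-checked from the dom0 shell. A minimal sketch, assuming the vGPU Manager's kernel module carries the usual nvidia name:

```shell
# Run in the XenServer dom0 shell after installing the rpm and rebooting.
# Check that the NVIDIA kernel module is loaded...
lsmod | grep nvidia

# ...and that the vGPU Manager can see the physical GPUs
nvidia-smi
```

These are host-configuration checks that require GRID hardware, so they are shown as a fragment rather than a runnable example.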
ASSIGNING A vGPU TO A VIRTUAL MACHINE
— In the Citrix XenCenter management GUI, assign a virtual GPU to the VM, or pass through a dedicated GPU
BOOT AND INSTALL NVIDIA DRIVERS
— Access the VM's console through XenCenter
— Install the NVIDIA guest vGPU driver
vGPU OPERATION
— With the NVIDIA driver loaded, the vGPU is fully operational
— Verify with the NVIDIA control panel
DELIVERING GREAT REMOTE GRAPHICS
— Use a high performance remote graphics stack
— Tune the platform for best graphics performance
TUNING THE PLATFORM
— Platform basics
— GPU selection
— NUMA considerations
PLATFORM BASICS
Use sufficient CPU!
— Graphically intensive apps typically need multiple cores
Ensure CPUs can reach their highest clock speeds
— Enable extended P-states / TurboBoost in the system BIOS
— Set XenServer's frequency governor to performance mode:
  xenpm set-scaling-governor performance
  /opt/xensource/libexec/xen-cmdline --set-xen cpufreq=xen:performance
Use sufficient RAM! Don't overcommit memory
Fast storage subsystem: local SSD or fast NAS / SAN
MEASURING UTILIZATION
The nvidia-smi command line utility reports GPU utilization, memory usage, temperature, and much more:

[root@xenserver-vgx-test2 ~]# nvidia-smi
Mon Mar 24 09:56:42 2014
+------------------------------------------------------+
| NVIDIA-SMI 331.62     Driver Version: 331.62         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K1             On   | 0000:04:00.0     Off |                  N/A |
| N/A   31C    P0    20W /  31W |    530MiB /  4095MiB |     61%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GRID K1             On   | 0000:05:00.0     Off |                  N/A |
| N/A   29C    P0    19W /  31W |    270MiB /  4095MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GRID K1             On   | 0000:06:00.0     Off |                  N/A |
| N/A   26C    P0    15W /  31W |    270MiB /  4095MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GRID K1             On   | 0000:07:00.0     Off |                  N/A |
| N/A   28C    P0    19W /  31W |    270MiB /  4095MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GRID K1             On   | 0000:86:00.0     Off |                  N/A |
| N/A   26C    P0    19W /  31W |    270MiB /  4095MiB |     45%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GRID K1             On   | 0000:87:00.0     Off |                  N/A |
| N/A   27C    P0    15W /  31W |     10MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GRID K1             On   | 0000:88:00.0     Off |                  N/A |
| N/A   33C    P0    19W /  31W |    270MiB /  4095MiB |     53%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GRID K1             On   | 0000:89:00.0     Off |                  N/A |
| N/A   32C    P0    19W /  31W |    270MiB /  4095MiB |     46%      Default |
+-------------------------------+----------------------+----------------------+
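For scripted monitoring, nvidia-smi can also emit machine-readable CSV via its --query-gpu and --format options. A minimal parser sketch; the sample text mimics the output of "nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits" (the helper function name is our own):

```python
# Sample output in the shape produced by:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
sample = """\
0, 61, 530
1, 46, 270
5, 0, 10
"""

def parse_gpu_stats(text):
    """Parse CSV rows into (gpu index, utilization %, memory used MiB) tuples."""
    stats = []
    for line in text.strip().splitlines():
        index, util, mem = (int(field.strip()) for field in line.split(","))
        stats.append((index, util, mem))
    return stats

stats = parse_gpu_stats(sample)
busiest = max(stats, key=lambda s: s[1])  # GPU with highest utilization
```

In practice the sample string would be replaced by the captured output of the nvidia-smi command, e.g. via subprocess.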
MEASURING UTILIZATION
— GPU utilization graphs are also available in XenCenter
PICK THE RIGHT GRID GPU
GRID K2 (Engineer / Designer, Power User)
— 2 high-end Kepler GPUs
— 3072 CUDA cores (1536 / GPU)
— 8GB GDDR5 (4GB / GPU)
GRID K1 (Power User, Knowledge Worker)
— 4 entry Kepler GPUs
— 768 CUDA cores (192 / GPU)
— 16GB DDR3 (4GB / GPU)
SELECT THE RIGHT vGPU
GRID K2: 2 high-end Kepler GPUs, 3072 CUDA cores (1536 / GPU), 8GB GDDR5 (4GB / GPU)
— GRID K260Q (Engineer / Designer): 2GB framebuffer, 4 heads, 2560x1600
— GRID K240Q (Power User): 1GB framebuffer, 2 heads, 2560x1600
— GRID K200 (Knowledge Worker): 256MB framebuffer, 2 heads, 1920x1200
SELECT THE RIGHT vGPU
GRID K1: 4 entry Kepler GPUs, 768 CUDA cores (192 / GPU), 16GB DDR3 (4GB / GPU)
— GRID K140Q (Power User): 1GB framebuffer, 2 heads, 2560x1600
— GRID K100 (Knowledge Worker): 256MB framebuffer, 2 heads, 1920x1200
TAKE ACCOUNT OF NUMA
Non-Uniform Memory Access
— Memory and GPUs (via PCI Express) are connected to each CPU socket
— CPU sockets are connected via a proprietary interconnect
— CPU / GPU access to memory on the same socket is fastest
— Access to memory on the remote socket is slower
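One practical mitigation is to keep a VM's vCPUs on the same socket as the GPU assigned to it. A sketch using the XenServer CLI (the uuid is a placeholder and the core numbering is illustrative; check your platform's actual socket topology first):

```shell
# Pin the VM's vCPUs to cores 0-3, assumed here to be on the same
# socket as the GPU assigned to this VM (placeholder uuid)
xe vm-param-set uuid=<vm-uuid> VCPUs-params:mask=0,1,2,3
```

This is a host-configuration fragment rather than a runnable example; the pinning takes effect the next time the VM starts.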