April 4-7, 2016 | Silicon Valley
REAL PERFORMANCE RESULTS WITH VMWARE HORIZON AND VIEWPLANNER
Manvender Rawat, NVIDIA; Jason K. Lee, NVIDIA; Uday Kurkure, VMware Inc.
AGENDA
• Overview of VMware Horizon 7 and NVIDIA GRID 2.0
• Overview of VMware View Planner
• Blast Protocol
• Performance and Scaling Results with Knowledge Worker Workloads
• Blast Extreme (GPU) vs. Blast Extreme (CPU) vs. PCoIP
INTRODUCTION
VMWARE HORIZON WITH NVIDIA GRID
HOW DOES NVIDIA GRID WORK?
[Diagram: virtual PCs run the NVIDIA Graphics driver and virtual workstations run the NVIDIA Quadro driver, each on its own vGPU; the NVIDIA GRID vGPU manager runs in the hypervisor; the server hardware provides CPUs and NVIDIA GPUs with H.264 encode.]
HOW IT WORKS TODAY: PCoIP
[Diagram: a server with a GRID GPU (render, capture, encode, CPU, NIC) streams over an IP network to the client (decode, render, keyboard/mouse); shading distinguishes GRID GPU workload from non-GPU workload.]
NVIDIA BLAST EXTREME ACCELERATION
[Diagram: a server with a GRID GPU (render, capture, encode, CPU, NIC) streams over an IP network to the client (decode, render, keyboard/mouse); shading distinguishes GRID GPU workload from non-GPU workload.]
CPU BASED CAPTURE & ENCODE PIPELINE
[Diagram, stages per frame: Load App data → Execute CPU workload (CPU) → Load GPU data → Execute GPU workload → Display in FB (GPU) → Capture → Transfer output to sys-mem → Transfer image to sys-mem → Encode → Packetize & transmit (CPU)]
• Increased CPU workload
• Limited scalability
• Multiple memory transfers
GPU BASED CAPTURE & ENCODE PIPELINE
[Diagram, stages per frame: Load App data → Execute CPU workload (CPU) → Load GPU data → Execute GPU workload → Display in FB → Capture → Execute Encode (GPU) → Packetize & transmit (CPU); successive frames overlap in a staggered pipeline.]
• CPU workload offloaded to GPU
• Increased scalability
• Reduced memory transfers
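To make the contrast between the two pipelines concrete, here is a minimal sketch that counts, per frame, how many stages run on the CPU and how many raw-frame copies into system memory each pipeline makes. The stage names come from the two pipeline slides; the CPU/GPU assignment of each stage is an assumption read off the slide colour bands, and the small copy of the already-encoded bitstream back to the CPU for packetizing is not modelled.

```python
from typing import List, Tuple

# (stage name, processor that runs it). Stage names follow the slides;
# the CPU/GPU assignment is an assumption, not taken from a spec.
Pipeline = List[Tuple[str, str]]

CPU_BASED: Pipeline = [
    ("Load app data", "CPU"),
    ("Execute CPU workload", "CPU"),
    ("Load GPU data", "GPU"),
    ("Execute GPU workload", "GPU"),
    ("Display in FB", "GPU"),
    ("Capture", "CPU"),
    ("Transfer output to sys-mem", "CPU"),
    ("Transfer image to sys-mem", "CPU"),
    ("Encode", "CPU"),
    ("Packetize & transmit", "CPU"),
]

GPU_BASED: Pipeline = [
    ("Load app data", "CPU"),
    ("Execute CPU workload", "CPU"),
    ("Load GPU data", "GPU"),
    ("Execute GPU workload", "GPU"),
    ("Display in FB", "GPU"),
    ("Capture", "GPU"),
    ("Encode", "GPU"),
    ("Packetize & transmit", "CPU"),
]


def cpu_stages(p: Pipeline) -> int:
    """Number of per-frame stages that run on the CPU."""
    return sum(1 for _, dev in p if dev == "CPU")


def sysmem_transfers(p: Pipeline) -> int:
    """Number of explicit raw-frame copies into system memory."""
    return sum(1 for name, _ in p if name.startswith("Transfer"))


for label, p in (("CPU-based", CPU_BASED), ("GPU-based", GPU_BASED)):
    print(f"{label}: {cpu_stages(p)} CPU stages, {sysmem_transfers(p)} sys-mem transfers")
```

Under these assumptions the CPU-based pipeline has 7 CPU stages and 2 system-memory transfers per frame, against 3 and 0 for the GPU-based one. That is the "increased scalability, reduced memory transfers" argument in counting form.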
CHALLENGES IN PERFORMANCE BENCHMARKING
• Selection of workloads/applications
• Automation
• Performance metrics
• Scaling
BENCHMARKING FRAMEWORK: VIEWPLANNER
• Simplicity: ease of use through a simple web interface
• Expandability: new workloads are easy to add
• Elasticity: easy scaling with View and ViewPlanner
BENCHMARKING WITH VIEWPLANNER
• Select the workload applications.
• Provision the desired number of desktop virtual machines with View and ViewPlanner.
• Automatically launch the Horizon clients to connect to the desktops.
• Automatically start the workload on each desktop VM.
• Measure response times on the remote clients.
• Analyze response times and resource utilization.
• Run the scaling experiments.
VMWARE VIEWPLANNER
USER EXPERIENCE AND RESOURCE UTILIZATION
User experience in ViewPlanner is defined by:
• Frames per second
• Response times
Measuring resource utilization:
• nvidia-smi: GPU utilization
• Built-in VMware vSphere tools: CPU utilization, memory usage, network statistics, I/O statistics
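The GPU side of this measurement can be scripted around nvidia-smi's query mode. The sketch below assumes the host was sampled with the command shown in the comment, which produces one `index, utilization.gpu, utilization.memory` line per GPU per sample. It averages those samples per GPU. The file name and the 5-second interval are arbitrary choices for illustration. Note that nvidia-smi's `utilization.memory` is the percentage of time device memory was being read or written, not the fraction of memory in use.

```python
import statistics
from collections import defaultdict

# Assumed sampling command (run on the host; -l 5 samples every 5 s):
#   nvidia-smi --query-gpu=index,utilization.gpu,utilization.memory \
#              --format=csv,noheader,nounits -l 5 > gpu_util.csv


def average_utilization(lines):
    """Return {gpu_index: (mean GPU util %, mean memory util %)}."""
    samples = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        idx, gpu, mem = (field.strip() for field in line.split(","))
        samples[int(idx)].append((float(gpu), float(mem)))
    return {
        idx: (statistics.mean(g for g, _ in s), statistics.mean(m for _, m in s))
        for idx, s in samples.items()
    }


# Usage (sketch):
#   with open("gpu_util.csv") as f:
#       print(average_utilization(f))
```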
PERFORMANCE METRICS MEASUREMENT
[Chart: run phases over iterations: ramp up, steady state, ramp down]
For accurate results, scores are computed over the steady-state range only; ramp-up and ramp-down iteration results are excluded.
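The steady-state rule amounts to trimming the leading and trailing iterations before averaging. The slide does not say how many iterations are treated as ramp-up or ramp-down, so here they are hypothetical parameters supplied by the caller. This is a minimal sketch, not ViewPlanner's own scoring code.

```python
def steady_state_mean(iteration_scores, ramp_up, ramp_down):
    """Mean score over the steady-state iterations only.

    iteration_scores: per-iteration scores in run order.
    ramp_up / ramp_down: hypothetical counts of leading / trailing
    iterations to discard (not specified on the slide).
    """
    if ramp_up < 0 or ramp_down < 0:
        raise ValueError("ramp counts must be non-negative")
    end = len(iteration_scores) - ramp_down
    steady = iteration_scores[ramp_up:end]
    if not steady:
        raise ValueError("no steady-state iterations left")
    return sum(steady) / len(steady)
```

For example, with one ramp-up and one ramp-down iteration, `steady_state_mean([5.0, 2.0, 2.0, 2.0, 9.0], 1, 1)` ignores the outlying first and last iterations and returns 2.0.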
PARTNERS AND CUSTOMERS Using ViewPlanner
KNOWLEDGE WORKLOAD TEST RESULTS
NVIDIA TEST SETUP
Virtual client VMs
• 64-bit Win7 (SP1)
• 4 vCPU, 4 GB RAM
• View Client 4.0
• Host: SuperMicro SYS-2027GR-TRFH, Intel Xeon E5-2690 v2 @ 3.00GHz + 2 x NVIDIA GRID K1, 20 cores (2 x 10-core socket) Intel Ivy Bridge, 256 GB RAM
Virtual VDI desktop VMs
• 64-bit Win7 (SP1)
• 6 vCPU, 14 GB RAM, 50 GB HD
• Horizon View 7.0 agent
• Host: SuperMicro SYS-2028GR-TRT, Intel Xeon E5-2698 v3 @ 2.30GHz + 2 x NVIDIA GRID M60, 32 cores (2 x 16-core socket) Intel Haswell, 256 GB RAM
Remote display protocol: Blast Extreme / PCoIP
[Diagram: client host and desktop host connected through the remote display protocol, with storage]
ADOBE PHOTOSHOP OPENGL WORKLOAD OVERVIEW
ADOBE PHOTOSHOP OPENGL WORKLOAD
• Workload scaling: 1 VM to 48 VMs
• 3D-intensive application
AUTOCAD BENCHMARK – USER EXPERIENCE METRIC
• We assume user experience is measured as FPS on our NVIDIA AutoCAD benchmark (only one measurement so far).
• For AutoCAD, anything above 20 FPS is excellent, but users generally don't notice the difference once frame rates exceed 30 FPS.
• Once frame rates drop below 10 FPS, the software feels very sluggish, and by 5 FPS it becomes unusable.
Thresholds:
• 20 FPS and above: good (Autodesk cites this as the minimum UX threshold)
• Below 10 FPS: sluggish
• 5 FPS: unusable
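The thresholds above can be applied mechanically to benchmark output. The slide names three bands (20 FPS and above is good, below 10 FPS is sluggish, 5 FPS is unusable) but gives no name to the 10 to 20 FPS range. The label "below target" used for that range in this sketch is hypothetical.

```python
def classify_fps(fps: float) -> str:
    """Map an AutoCAD FPS reading to the UX bands quoted on the slide.

    >= 20 FPS: good (Autodesk's minimum UX threshold)
    < 10 FPS: sluggish; <= 5 FPS: unusable.
    "below target" for 10-20 FPS is a hypothetical label, not from the slide.
    """
    if fps <= 5:
        return "unusable"
    if fps < 10:
        return "sluggish"
    if fps < 20:
        return "below target"
    return "good"
```

For instance, `classify_fps(36.81)` returns `"good"`. Because users rarely notice gains above 30 FPS, the sketch deliberately has no separate band above "good".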
AUTOCAD WORKLOAD HOST UTILIZATION
[Chart: host CPU utilization (%) over time, NVEnc vs. PCoIP, approx. 23:10 to 0:14; lower is better. Totals: 10913 (NVEnc) vs. 10570 (PCoIP), very similar.]
[Chart: NvEnc encoder utilization]
• The AutoCAD benchmark does not involve rapidly moving pixels or large changing screen regions, so the NVEnc encoder was not heavily utilized (around 50% throughout the benchmark).
• Blast Extreme (NVEnc GPU) and PCoIP hosts show similar host CPU utilization.
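The totals support the "very similar" claim numerically. The sketch below assumes both totals are sums of the per-sample host CPU utilization percentages over runs of comparable length; the slide does not state how the totals were computed.

```python
def relative_difference(a: float, b: float) -> float:
    """(a - b) / b: how much larger a is than b, as a fraction of b."""
    return (a - b) / b


# Totals from the host CPU utilization chart: NVEnc (Blast Extreme GPU) vs. PCoIP.
diff = relative_difference(10913, 10570)
print(f"NVEnc total is {diff:.1%} higher than PCoIP")  # prints "NVEnc total is 3.2% higher than PCoIP"
```

A gap of about 3% in summed utilization is small compared with the sample-to-sample swings visible in the chart, which is consistent with the slide's conclusion.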
AUTOCAD WORKLOAD 32 VM GPU UTILIZATION
[Chart: GPU utilization and GPU memory utilization (%) over time, 19:54:47 to 21:33:50]
BLAST EXTREME (GPU) AVERAGE FPS (UX)
[Chart: AutoCAD average FPS, M60-1Q, 32 VMs, Blast Extreme (GPU) vs. PCoIP; higher is better. NvEnc (build3): 36.81 FPS; PCoIP: 36.49 FPS; both well above the minimum-FPS-for-UX line.]
• With 32 VMs launched concurrently, the host does not saturate its CPU resources (100%), so we can scale beyond 32 VMs; further testing is planned.
• The GPU isn't the bottleneck for scaling.
VMware Test-bed for NVIDIA GRID on Horizon View
Virtual client VMs
• 64-bit Win7 (SP1)
• 1 vCPU, 2 GB RAM
• View Client 4.0
Virtual VDI desktop VMs
• 64-bit Win7 (SP1)
• 2 vCPU, 4 GB RAM, 40 GB HD
• Horizon View 7.0 agent
Remote display protocol: Blast Extreme / PCoIP
Hosts (client and desktop, same configuration): Dell R730, Intel Haswell E5-2680 v3 CPUs + 2 x NVIDIA GRID M60, 24 cores (2 x 12-core socket), 384 GB RAM
[Diagram: client host and desktop host connected through the remote display protocol, with storage]
REMOTE DISPLAY PROTOCOLS IN HORIZON
Blast Extreme, VMware's remote display protocol:
• Based on standard H.264
• Exploits NVIDIA GPU capabilities for encoding
• Clients can use any GPU or CPU for decoding
Blast Extreme (GPU), "Blast GPU": uses GPU assist for H.264 encoding (NVIDIA Tesla M60, virtual GRID in the enterprise cloud)
Blast Extreme (CPU), "Blast CPU": does not use hardware GPU assist for H.264 encoding
Also: PCoIP and Microsoft RDP
KNOWLEDGE WORKER APPS
Knowledge worker applications in ViewPlanner 3.6:
• Office apps: Word, Excel, PowerPoint, Outlook
• Adobe Acrobat Reader, Firefox, 7-Zip
• Windows Media Player
VIEWPLANNER QOS METHODOLOGY
Operations are split into groups:
• Group A: interactive, fast-running, CPU-bound operations; the user expects minimal latency (e.g., Word and Excel editing operations)
• Group B: long-running, slow, I/O-bound operations; the user can tolerate longer latency (e.g., saving a PowerPoint file, zip/unzip)
QoS criteria:
• Group A: 95th percentile = 0.70 s (limit ≤ 1.0 s)
• Group B: 95th percentile = 2.3 s (limit ≤ 6.0 s)
4/20/2016
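The QoS test reduces to comparing each group's 95th-percentile latency with its limit. The slide does not say how ViewPlanner computes the percentile. This sketch uses the nearest-rank method as an assumption, and the group keys "A" and "B" are illustrative.

```python
# QoS limits from the slide: 95th-percentile latency ceilings, in seconds.
QOS_LIMITS = {"A": 1.0, "B": 6.0}


def percentile_95(latencies):
    """95th percentile by the nearest-rank method (an assumed method)."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def passes_qos(latencies_by_group):
    """True if every group's 95th-percentile latency is within its limit."""
    return all(
        percentile_95(samples) <= QOS_LIMITS[group]
        for group, samples in latencies_by_group.items()
    )
```

In a scaling experiment, a check like `passes_qos({"A": group_a, "B": group_b})` would be run at each VM count. The largest count that still passes is then the supportable density.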
VP MEASUREMENTS ON REMOTE CLIENTS
Measures the true remote user experience: all measurements are taken on the remote clients.
• Latency measurement: each operation's start and end times are recorded on the remote client, as the remote client sees them.
• Frames/second metric for the video workload: frames seen by the remote client are counted.
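Both client-side metrics are simple once the timestamps are available on the remote client. The sketch assumes a hypothetical event format: `(name, start_s, end_s)` tuples for operations and a list of frame arrival times in seconds. It is an illustration of the two calculations, not ViewPlanner's internal data model.

```python
def operation_latencies(events):
    """(name, latency_s) for each (name, start_s, end_s) recorded on the client."""
    return [(name, end - start) for name, start, end in events]


def frames_per_second(frame_times, window_start, window_end):
    """Frames seen by the client in [window_start, window_end) per second."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    seen = sum(1 for t in frame_times if window_start <= t < window_end)
    return seen / (window_end - window_start)
```

Because both timestamps of an operation and all frame arrivals are taken on the client, any delay added by encoding, the network, or decoding is included in the result. This is what "true remote user experience" refers to.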
KNOWLEDGE WORKER WORKLOAD GROUP A LATENCIES
[Chart: Group A latency (seconds, left axis) and latency normalized w.r.t. PCoIP (right axis) vs. number of VMs (1, 8, 16, 32, 48, 64); series BlastGPU, BlastCPU, PCoIP, BlastGPU/PCoIP, BlastCPU/PCoIP; lower is better.]
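The normalized series (BlastGPU/PCoIP, BlastCPU/PCoIP) divide each protocol's latency by the PCoIP latency at the same VM count. The sketch below assumes latencies are stored as a hypothetical `{vm_count: seconds}` mapping per protocol. It only normalizes VM counts present in both inputs.

```python
def normalize_to_baseline(latencies, baseline):
    """Divide each latency by the baseline latency at the same VM count.

    latencies, baseline: {vm_count: seconds}. Returns {vm_count: ratio};
    a ratio below 1.0 means lower (better) latency than the baseline.
    """
    return {vms: latencies[vms] / baseline[vms] for vms in baseline if vms in latencies}
```

Called as `normalize_to_baseline(blast_gpu, pcoip)`, this yields the BlastGPU/PCoIP series. A value of 1.0 means parity with PCoIP at that density.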