S5393 - EVOLUTION OF AN NVIDIA GRID™ DEPLOYMENT
ERIK BOHNHORST, SR. GRID SOLUTION ARCHITECT, NVIDIA
RONALD GRASS, SR. SYSTEMS ENGINEER, CITRIX SYSTEMS
What we will cover
- Who implemented NVIDIA GRID with Citrix XenDesktop
- Why they wanted to move to a remote desktop solution
- How they evaluated and implemented NVIDIA GRID
  - Sales pitch & TechDemo
  - Proof of concept
  - Production environment
- Challenges and learnings
- How they will move forward
Who are we talking about?
- Manufacturing vertical
- NVIDIA QUADRO customer
- Competitive market
- Wide range of CAD/CAE applications
- Experienced with remote desktop solutions
Business drivers and initiatives
- Growing globalization within the company
  - Enabling remote sites across the globe
- Increasing competition to hire the best
  - Allowing employees, partners and contractors to work from anywhere
- Increasing competition to design and build faster with better quality
  - Increasing productivity and flexibility
  - Enabling collaboration between internal and external teams
- Increasing security breaches
  - Increasing security and compliance
- German law ("Arbeitnehmerueberlassung", the temporary labor leasing act)
  - Enabling contractors to work off premise
Wouldn’t it be great if...
- FROM ANYWHERE, ON ANY DEVICE
- PRODUCTIVITY INCREASE
- SECURITY & COMPLIANCE
- LESS REDUNDANT INFRASTRUCTURE
- COLLABORATION
Project start – early 2013
- Evaluation of multiple remote solutions
- Interest in HP blades due to the high density of GPUs
- Customer received a sales pitch on NVIDIA vGPU & XenDesktop
- Overall plan: evaluate NVIDIA vGPU in early beta under NDA and compare NVIDIA vGPU vs. GPU passthrough
Once upon a time ... when the customer started
Citrix & NVIDIA
- Partnership since 2008
- May 2012: GRID announced during the NVIDIA GTC keynote
- May 2013: Citrix vGPU announced during the Synergy keynote
- 2013: NVIDIA RTM (somewhere in between)
- Sep/Oct 2013: vGPU Tech Preview
- Dec 2013: vGPU General Availability
Evolution of NVIDIA GRID / vGPU: 2013 vGPU beta
- Only 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
- Limited to Windows 7 only
- Creating passthrough or vGPU objects was possible through the CLI only :(
- No way to use passthrough and vGPU VMs at the same time
- XenServer 6.2 only (specially patched)
- Very limited hardware available

CLI walk-through from the slide (XenServer 6.2):

[root@SM01 ~]# xe vm-list name-label=Win7-vGPU-01
uuid ( RO)           : 831ab2f3-8e23-e876-d92a-16810a85499e
     name-label ( RW): Win7-vGPU-01
    power-state ( RO): halted
[root@SM01 ~]# xe vgpu-create vm-uuid=831ab2f3-8e23-e876-d92a-16810a85499e gpu-group-uuid=d840caad-2ce0-6395-78a5-9ac984667412 vgpu-type-uuid=5514073f-6d7b-90c6-6648-2335ad1cc81a
23908c99-eecb-835e-fd46-5936e0a3bf652

The uuid returned by vgpu-create is the vGPU object as seen by the hypervisor. (Slide annotations: "w0rk5 f0r m3 ... s0 ch3ck the uuids, bl00dy n00b!!" – "I can‘t get it to work :-(")

Evolution of NVIDIA GRID / vGPU: 2013 RTM
- Same 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
- Creation of vGPUs and monitoring of pGPUs through CLI or XenCenter (GUI)
- Mass creation of vGPU-enabled VMs through Desktop Studio (XenDesktop >= 7.1)
- Passthrough and vGPU VMs can be run simultaneously
- XenServer 6.2 SP1 with 64 vGPUs
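Because everything lived in the CLI back then, automating the beta workflow meant scripting around `xe` output. A minimal sketch (not the customer's actual tooling) of pulling the VM uuid out of `xe vm-list` output before handing it to `xe vgpu-create`; the sample output is copied from the slide, so no XenServer is needed to run the parsing step itself:

```shell
# On a real host you would capture live output instead:
#   out=$(xe vm-list name-label=Win7-vGPU-01)
# Here we reuse the output shown on the slide.
out='uuid ( RO)           : 831ab2f3-8e23-e876-d92a-16810a85499e
     name-label ( RW): Win7-vGPU-01
    power-state ( RO): halted'

# Grab the value after the first "uuid ... : " line, as a wrapper script
# would before calling: xe vgpu-create vm-uuid="$vm_uuid" ...
vm_uuid=$(printf '%s\n' "$out" | awk -F': ' '/uuid/ {print $2; exit}')
echo "$vm_uuid"
```

The same pattern works for `gpu-group-uuid` and `vgpu-type-uuid`, which is how mass creation was possible at all before Desktop Studio took over that job in the RTM release.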
Lifecycle of a successful GRID implementation
- Phase 1 (TechDemo): conduct a TechDemo for CAD/CAM stakeholders and engineers that leads to a "WOW" effect
- Phase 2 (Assessment & small, focused PoC)
- Phase 3 (Widened PoC based on feedback)
- Phase 4 (Implementation / User Acceptance / Production)
- Phase 5 (Maintenance / Update / Daily Use)
Sales pitch & TechDemo – create the "WOW" effect
- We did a sales pitch on NVIDIA GRID and a very convincing TechDemo of Citrix XenDesktop with vGPU on XenServer to create the WOW effect
- Demo applications like NVIDIA Hair, NVIDIA FaceWorks, Design Garage, Blender, VRRender, Autodesk 30-day trials or JT2Go were used because of the lack of licenses and deep CATIA / Solid Edge / Siemens NX knowledge
- Demonstrated access from mobile platforms (Android – Galaxy Tab 10.1, iOS – iPad)
- Used a cloud-hosted Demo Center, which proved the solution also works over WAN
- Focused on user experience and used peripherals (e.g. SpacePilot)
From WOW to HOW? Next steps
- Phase 1 (TechDemo)
- Phase 2 (Assessment and very focused PoC)
  - Start with a strictly defined use case (LAN only, specific applications, small user group)
  - Collect feedback on user experience and network
- Phase 3 (Widened PoC based on feedback)
  - Evaluate user feedback
  - Widen use cases, e.g. remote access (WAN)
  - Use more complex drawings/models and higher-end use cases (Engineer vs. Viewer only)
- Phase 4 (Implementation / User Acceptance / Production)
- Phase 5 (Maintenance / Update / Daily Use)
Components involved
- CAD applications: Dassault CATIA, Siemens NX, Autodesk products, PTC Creo, JT2Go
- Citrix XenDesktop 7.1 or 7.5 with Citrix Virtual Desktop Agent
- NVIDIA display driver 332.83 & corresponding NVIDIA vGPU Manager version
- Hypervisor: Citrix XenServer 6.2 SP1
- Dual-socket server: 2x Intel E5-2690 v2, 256 GB RAM, SSDs, 2x NVIDIA GRID K2
POC – Define virtual workstations

User     | Remoting stack    | OS        | vCPUs | Virtual GPU  | Frame buffer | GPU mode    | Segmentation per host (2x GRID K2)
Entry    | Citrix XenDesktop | Windows 7 | 4     | GRID K220Q   | 512 MB       | NVIDIA vGPU | 32
Medium   | Citrix XenDesktop | Windows 7 | 4     | GRID K240Q   | 1024 MB      | NVIDIA vGPU | 16
Advanced | Citrix XenDesktop | Windows 7 | 4     | GRID K260Q   | 2048 MB      | NVIDIA vGPU | 8
Expert   | Citrix XenDesktop | Windows 7 | 4     | GRID “K280Q” | 4096 MB      | Passthrough | 4
Medium   | NICE DCV, HP RGS  | Linux     | 4     | GRID K2      | 4096 MB      | Passthrough | 4
Expert   | NICE DCV, HP RGS  | Linux     | 4     | GRID K2      | 4096 MB      | Passthrough | 4
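The segmentation column follows directly from the frame buffer sizes: each GRID K2 carries two physical GPUs with 4096 MB each, so two cards per host give 16 GB of frame buffer to partition. A quick sanity check of the table's densities:

```shell
# 2x GRID K2 per host, each K2 = 2 GPUs x 4096 MB frame buffer
total_mb=$((2 * 2 * 4096))
for fb in 512 1024 2048 4096; do
  echo "${fb} MB profile -> $((total_mb / fb)) VMs per host"
done
```

This reproduces the 32 / 16 / 8 / 4 segmentation figures above; the 4096 MB case corresponds to passthrough of a whole physical GPU.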
Technical challenges
- Physical laws (latency, bandwidth, packet loss)
- Matching workstation-like user experience
- Server- vs. client-side rendered mouse cursor
- Endpoint devices & endpoint performance (e.g. thin clients)
- High screen resolutions – lots of data (UHD/4K)
- Frame rate / low bandwidth / graphics quality
- API support
- Distributed locations
- Peripheral devices
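To put the UHD/4K point in numbers, a back-of-the-envelope calculation (assuming 24-bit color at 60 fps, figures not from the slide) shows the raw pixel stream the remoting protocol has to compress down to a few Mbps:

```shell
# Uncompressed pixel rate for a single 3840x2160 monitor, 24-bit color, 60 fps
gbps=$(awk 'BEGIN { printf "%.1f", 3840 * 2160 * 24 * 60 / 1e9 }')
echo "UHD/4K uncompressed: ${gbps} Gbit/s"
```

Roughly 12 Gbit/s of raw pixels per monitor is why resolution and monitor count dominate the bandwidth discussion on the next slide.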
Bandwidth, latency, network quality
- Quality and performance are closely related to available network bandwidth and distance (latency)
- Average user: ~1-2 Mbps *
- Expert user: ~4-5 Mbps *
- 20 Mbps for ~15 CAD/CAM engineers *
- Influencing parameters:
  - Window size and number of monitors
  - Screen resolution
  - Size of models, different usage patterns (VR, CAD, DMU, 3D viewing, etc.)
  - Individual perception / level of acceptance (user experience)
* average measurements. Source: Customer presentation
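The aggregate figure is the interesting one: 20 Mbps for ~15 engineers is far below 15 times the expert rate, meaning the link is sized for statistical multiplexing, not for every user peaking simultaneously. The arithmetic behind that reading:

```shell
# Slide figures: 20 Mbps shared by ~15 CAD/CAM engineers
share=$(awk 'BEGIN { printf "%.2f", 20 / 15 }')   # average per-user share
peak=$((15 * 5))                                  # if all 15 peaked as experts (~5 Mbps)
echo "per-user share : ${share} Mbps"
echo "worst-case peak: ${peak} Mbps"
```

The ~1.33 Mbps per-user share lands inside the measured 1-2 Mbps average, while the 75 Mbps worst case almost never occurs in practice because users rarely rotate models at the same instant.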
Technical pitfalls we experienced
- 64-bit hardware (MMIO BAR mapping)
- Server and GRID card BIOSes
- NUMA server architecture
- Endpoint devices & performance (e.g. thin clients + supported protocols)
- Framebuffer grabbing (NVFBC / Monterey API)
POC – End user feedback Source: Customer presentation
POC – IT administrator evaluation

User     | Remoting stack    | OS        | vCPUs | Virtual GPU  | Frame buffer | GPU mode    | Segmentation | Evaluation
Entry    | Citrix XenDesktop | Windows 7 | 4     | GRID K220Q   | 512 MB       | NVIDIA vGPU | 32           | Too little GPU frame buffer and not enough CPU resources
Medium   | Citrix XenDesktop | Windows 7 | 4     | GRID K240Q   | 1024 MB      | NVIDIA vGPU | 16           | Great performance and great scalability for most users
Advanced | Citrix XenDesktop | Windows 7 | 4     | GRID K260Q   | 2048 MB      | NVIDIA vGPU | 8            | Great performance and good scalability for many users
Expert   | Citrix XenDesktop | Windows 7 | 4     | GRID “K280Q” | 4096 MB      | Passthrough | 4            | Great performance but doesn’t build the business case
Medium   | NICE DCV, HP RGS  | Linux     | 4     | GRID K2      | 4096 MB      | Passthrough | 4            |
Expert   | NICE DCV, HP RGS  | Linux     | 4     | GRID K2      | 4096 MB      | Passthrough | 4            |
POC – Sizing learning
[Diagram: NVIDIA GRID vGPU vs. NVIDIA QUADRO – a QUADRO board has a dedicated 3D engine and frame buffer, while vGPU partitions the frame buffer and time-shares the 3D engine]
- Time scheduling allows highest densities without compromising performance
- Customers need to understand the GPU requirements of their applications
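The diagram's point is that frame buffer is hard-partitioned per vGPU while the 3D engine is time-shared, so density only costs performance when several VMs on the same physical GPU render at once. With four physical GPUs per host, the per-host segmentations of 32/16/8 translate to 8/4/2 vGPUs per physical GPU; a worst-case sketch (hypothetical, all vGPUs rendering simultaneously):

```shell
# Worst-case 3D engine time per VM when every vGPU on one physical GPU renders at once
for n in 2 4 8; do
  awk -v n="$n" 'BEGIN { printf "%d vGPUs/GPU -> %.1f%% engine time each\n", n, 100 / n }'
done
```

In practice interactive CAD work is bursty, so the scheduler usually gives each busy VM far more than this worst-case slice, which is why the Medium profile scored so well despite 4 vGPUs per GPU.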
POC – Organizational challenges
Targets / project schedule:
- Clarification of support by the software vendors
- Decision on the license model for CAx applications on virtual machines (international usage, usage by external partners, etc.)
- Adjusting applications or the associated environment for optimal use of the applications on virtual machines
- Support model for company-internal and external users
- The project result must be a validated technical solution which will be provided to internal departments and their external development partners as an IT service
Source: Customer presentation
Lifecycle of a successful GRID implementation
- Phase 1 (TechDemo)
- Phase 2 (Assessment & small, focused PoC)
- Phase 3 (Widened PoC based on feedback)
- Phase 4 (Implementation / User Acceptance / Production)
  - Educate support engineers / introduce a support matrix
  - Implement daily management processes like provisioning of new VMs and patching of existing VMs
- Phase 5 (Maintenance / Update / Daily Use)
Meanwhile things changed …