
S9670 VIRTUAL DESKTOPS BY DAY, COMPUTATIONAL WORKLOADS BY NIGHT - AN EXAMPLE INFRASTRUCTURE
Shailesh Deshmukh, Senior Solution Architect
Konstantin Cvetanov, Senior Solution Architect
Eric Kana, Senior Solution Architect
GPU Technology Conference 2019

AGENDA
• What We Will Discuss
• Benefits of VDI
• Computation Defined and Context
• Dual-Use and Workflow Scenarios
• Operational Challenges
• Solution Options
• Reference Architecture
• Demonstration
• Summary
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

WHAT WE WILL DISCUSS
A practical approach to configuring daily intervals of VDI and computational resources, in an environment primarily designed for VDI, using commonly available tools.
More about perspective than technology.

BENEFITS OF VIRTUAL DESKTOP INFRASTRUCTURE
• Enable flexible workflow scenarios
• Utilize centralized, shared, and protected storage
• Enable intellectual property protection
• Provide flexibility in configuration
• Enable user/workforce mobility
• Widely supported GPU acceleration
What you planned the system to do.

COMPUTATIONAL SPECTRUM
[Figure: three compute tiers, ‘Lite’ Compute, General Compute, and Classic High End Compute, arranged along an "Additive Scale of Requirements" axis and a "System Complexity" axis.]
Compute Requirements:
• High Performance Interconnects
• High Performance Storage
• Multi-node Support
• Double Precision Math
• Job Scheduling
• Multi-GPU Support
• Bandwidth Sensitivity
• Latency Pressure
• Long runtimes
• Storage Pressure
• Memory Page Retirement
• Short to medium runtimes
• ECC Memory
• CUDA
• Higher CPU Utilization
• OpenCL
• Linux Support
• Single Precision Math
• Latency tolerant
• Very short runtimes
• Windows Support

WHY DUAL USE?
• Cost and/or space savings
• Variable usage trends/rates
• Desire for on-prem elasticity
• Unpredictable user community
• Provide more workflow options to more users
• Effective cost justification (capital/operational)
Make best use of available resources

SCENARIO CONSIDERATIONS FOR DUAL USE
• Creative Studio – Artists go home during late hours
• Architecture Firm – Engineers/Designers work daylight hours
• University/College – Lower utilization during summer sessions
• Financial Services Firm – Lower utilization when markets are closed
• Gov’t Agency – Multiple programs, duplicate (idle) resources
Primary goal is user experience

WORKFLOW CONSIDERATIONS FOR DUAL USE
• Creative Studio – Create by day / Render by night
• Architecture Firm – Design by day / Render-Compute by night
• University/College – Sell cycles or run experiments during summer
• Financial Services Firm – Traders by day / Numerical analysis by night
• Gov’t Agency – Analysis work by day / Image processing at night
Get creative with workflow overlap

OPERATIONAL CHALLENGES
• What to do with our user VMs?
• How do we best provision user VMs?
• How do we monitor utilization?
• How do we orchestrate user VM state, migration, and timing?
• How do we manage compute jobs, and be ready for user VM restart?
• How will users be productive in a scheduled environment?
Manage users, balanced with compute productivity

VECTORS FOR SUCCESS
• User policies – reboot per day or week
• Single precision math jobs
• Single GPU compute jobs
• Jobs that may be coalesced
• Excess capacity
• Stakeholder buy-in
• Skilled admin staff

COMMON VDI INFRASTRUCTURE ASSETS
• Hypervisor(s) – vSphere, AHV, RHVH, XenServer
• vGPU Software
• Compute cluster of nodes (chassis)
• CPUs, GPUs, Storage, Network Assets
• Monitoring Tools
• Orchestration / Layering Tools
• Containers
• Job Schedulers
Many common building blocks available

SOLUTION VECTORS
• Shut down (all users) and swap (in all the compute)
• Shut down (some users) and swap in (some) compute
• Migrate/degrade (users) to fewer hosts, swap (in some/all) compute
• Shut down (all users) and reprovision (to bare metal) nodes
• Keep all users intact; initiate a cycle harvester
• Some mixture of the above
• Other options…
GOAL = Use common and available tools

OPTION 1: SHUT DOWN / SWAP IN
• Shut down user pool
• Spin up compute pool
• Run scheduled jobs
• Spin down compute pool
• Restart user pool
(Partial shutdown also applies)
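The five steps above can be sketched as one ordered cycle. This is a minimal illustration, not the presenters' tooling: the function name `run_swap_cycle` and its five callables are hypothetical stand-ins for whatever actually powers pools on and off (the deck uses vCenter scheduling for that). The one design choice shown is that the user pool is always restarted, even when the compute phase fails.

```python
from typing import Callable, List


def run_swap_cycle(
    stop_user_pool: Callable[[], None],
    start_compute_pool: Callable[[], None],
    run_jobs: Callable[[], None],
    stop_compute_pool: Callable[[], None],
    start_user_pool: Callable[[], None],
) -> List[str]:
    """Run one shut down / swap in cycle; return the steps that completed.

    All five callables are hypothetical hooks supplied by the caller.
    """
    completed: List[str] = []
    stop_user_pool()
    completed.append("user pool down")
    try:
        start_compute_pool()
        completed.append("compute pool up")
        run_jobs()
        completed.append("jobs run")
    finally:
        # Hand the hardware back to users even if the compute phase raised.
        stop_compute_pool()
        completed.append("compute pool down")
        start_user_pool()
        completed.append("user pool up")
    return completed


if __name__ == "__main__":
    steps = run_swap_cycle(
        lambda: print("stopping VDI pool"),
        lambda: print("starting compute pool"),
        lambda: print("running scheduled jobs"),
        lambda: print("stopping compute pool"),
        lambda: print("restarting VDI pool"),
    )
    print(steps)
```

If a job step raises, the exception still propagates to the caller after the pools have been swapped back, so a failed night of compute does not cost users their morning desktops.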

ARCHITECTURE DIAGRAM
[Diagram: Control Resources (SLURM Controller, License Managers, Active Directory, vRealize Manager, VIEW Broker) and Compute Resources (Windows 10 VDI VM Pool and Ubuntu Compute VM Pool(s) running on vSphere hosts across multiple chassis), with Shared Storage.]

SLURM WORKLOAD MANAGER
“Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.”
Source: https://slurm.schedmd.com/overview.html
Components:
• Centralized Manager: slurmctld – monitors resources and work
• Compute Node daemon: slurmd – waits for and executes work, returns work status
In this example:
• Slurm-ctrl = cluster controller VM
• Compute[01-07] = compute VMs (nodes)
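The bracket notation `Compute[01-07]` is Slurm's compact way of naming a range of nodes. As a rough sketch of what it stands for, the helper below expands a single zero-padded range into individual host names. `expand_hostlist` is a hypothetical name, and it handles only the simple `prefix[lo-hi]` form used on this slide; on a real cluster, `scontrol show hostnames` does the full job.

```python
import re
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple 'prefix[lo-hi]' host expression into host names.

    Only one numeric range is supported; anything else is returned unchanged.
    """
    match = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]
    prefix, low, high = match.group(1), match.group(2), match.group(3)
    width = len(low)  # keep the zero padding of the lower bound
    return [f"{prefix}{i:0{width}d}" for i in range(int(low), int(high) + 1)]


if __name__ == "__main__":
    print(expand_hostlist("Compute[01-07]"))
    print(expand_hostlist("Slurm-ctrl"))
```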

ANATOMY OF A COMPUTE VM
• Ubuntu 16.04/18.04
• Docker, nv-docker, Anaconda, Python3-pip, ipython-notebook
• vGPU 7.1
• CUDA 10, toolkit, and samples
• SLURM
• VMware VIEW agent
• DHCP per Active Directory DNS
• Packaged as a VM template

COMPUTE PARTITION ORGANIZATION
[Diagram: Ubuntu compute pool partitions on vSphere hosts across chassis. Hardware is grouped by type (GPU Type A / CPU Type A, GPU Type B / CPU Type B, GPU Type C / CPU Type C), each with its own template (Template A, the master image; Template B; Template C). Templates correspond to resource partitions in SLURM.]

SLURM COMPUTE PARTITION CONFIG
[Screenshots: /etc/slurm/slurm.conf and sinfo output]
Linux VM Templates mapped to Compute Partitions
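The slide's slurm.conf screenshot is not recoverable, so the following is only a guess at the general shape of such a mapping. It renders `NodeName` and `PartitionName` lines, which are standard slurm.conf entries, for one partition per VM template. The partition names (`t4`, `v100`, `rtx`) and the node ranges assigned to them are assumptions made up for illustration, not values from the presentation.

```python
from typing import Dict, List


def render_partition_lines(partitions: Dict[str, str], default: str) -> List[str]:
    """Render slurm.conf-style node and partition lines.

    partitions maps a partition name to a Slurm node expression.
    default names the partition that receives Default=YES.
    """
    lines: List[str] = []
    for nodes in partitions.values():
        lines.append(f"NodeName={nodes} State=UNKNOWN")
    for name, nodes in partitions.items():
        flag = "YES" if name == default else "NO"
        lines.append(
            f"PartitionName={name} Nodes={nodes} "
            f"Default={flag} MaxTime=INFINITE State=UP"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical mapping: one partition per GPU type / VM template.
    example = {
        "t4": "Compute[01-04]",
        "v100": "Compute05",
        "rtx": "Compute[06-07]",
    }
    print("\n".join(render_partition_lines(example, default="t4")))
```

With one partition per template, a job can target a GPU type simply by naming the partition at submission time, for example with `sbatch --partition=v100`.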

OPERATIONAL TIMELINE
[Timeline: VDI State, then Compute State, then VDI State, along a time axis marked t1, t2, t3 (labels: Midnight, 6 am, 6 am). Transitions: "Evacuate VDI / Start Compute" and "Start VDI / Evacuate Compute".]
VDI State:
• 6 VMs (Linked-clones)
• Windows 10
• Non-persistent VMs
• T4-8Q vDWS Profiles
Compute State:
• 7 compute VMs
• 4 x T4-16Q
• 1 x V100-32Q
• 2 x RTXx24Q
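The timeline boils down to a rule for which state owns the hardware at a given time of day. The sketch below assumes a compute window from midnight to 6 am, which is one reading of the slide's axis labels; the function name `active_state` and the default window are illustrative assumptions. It also handles a window that wraps past midnight, in case the compute interval starts in the evening.

```python
from datetime import time

# Assumed compute window, read off the slide's "Midnight" and "6 am" labels.
COMPUTE_START = time(0, 0)
COMPUTE_END = time(6, 0)


def active_state(now: time, start: time = COMPUTE_START, end: time = COMPUTE_END) -> str:
    """Return 'compute' inside the window [start, end), otherwise 'vdi'."""
    if start <= end:
        in_compute = start <= now < end
    else:
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        in_compute = now >= start or now < end
    return "compute" if in_compute else "vdi"


if __name__ == "__main__":
    for hour in (3, 6, 12, 23):
        print(f"{hour:02d}:00 -> {active_state(time(hour, 0))}")
```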

VCENTER INTERVAL SCHEDULING
[Screenshots: VDI Interval, Compute Interval]

SHUT DOWN / SWAP IN - HARDWARE
Component    Name
GPU          Tesla T4, V100, P40, RTX
Chassis      Supermicro 4029GP, Dell R740, HP DL380 Gen9
Storage      FA-M20R2 (Pure Storage)
Network      Cisco 10G
Endpoints    Various

SHUT DOWN / SWAP IN - SOFTWARE
Component               Name
Hypervisor              vSphere 6.7u1
Hypervisor Manager      vCenter 6.7
Job Scheduler           Slurm 17.11.12
Interval Scheduler      vCenter 6.7
VDI Guest OS            Windows 10
Compute Guest OS        Ubuntu 16.04
NVIDIA vGPU Software    vGPU 7.1


ENVIRONMENT MONITORING

FUTURE NEEDS AND ASKS
• Multiple GPUs per VM – limited availability today
• Dynamic vGPU assignment per Template provisioning
• Dynamic vGPU on live migration
• vGPU + GPU ECC + UVM + P2P – supports relevant compute
• vGPU + GPU memory page retirement
• VM snapshots and user sessions
• Storage optimizations
• Live migration integration – exists today

IMPORTANT: VGPU VM DEPLOYMENT POLICY (VMWARE / CITRIX)
By default, VMware vSphere Hypervisor (ESXi) uses a breadth-first allocation scheme for vGPU-enabled VMs: each new vGPU-enabled VM is placed on the least-loaded available physical GPU. We need to change that.
For Citrix, it's easy.
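To see why the placement policy matters for dual use, the toy simulation below compares breadth-first placement (least-loaded GPU first) with depth-first placement (fill a GPU before opening the next). It is a simplified model, not the hypervisor's actual scheduler. The example numbers assume the six T4-8Q desktop VMs from the timeline land on four T4 boards with two 8Q slots each; that mapping is an assumption for illustration.

```python
from typing import List


def place_vms(num_vms: int, num_gpus: int, slots_per_gpu: int, policy: str) -> List[int]:
    """Return the number of VMs placed on each physical GPU.

    policy is 'breadth' (least-loaded GPU first) or 'depth'
    (most-loaded GPU that still has a free slot first).
    """
    load = [0] * num_gpus
    for _ in range(num_vms):
        open_gpus = [g for g in range(num_gpus) if load[g] < slots_per_gpu]
        if not open_gpus:
            raise ValueError("no free vGPU slot left")
        if policy == "breadth":
            target = min(open_gpus, key=lambda g: load[g])
        elif policy == "depth":
            target = max(open_gpus, key=lambda g: load[g])
        else:
            raise ValueError(f"unknown policy: {policy}")
        load[target] += 1
    return load


if __name__ == "__main__":
    print("breadth:", place_vms(6, 4, 2, "breadth"))
    print("depth:  ", place_vms(6, 4, 2, "depth"))
```

In this model, breadth-first leaves every GPU partly occupied, while depth-first leaves one GPU completely empty, which is the kind of headroom a partial shutdown needs in order to swap in a compute VM.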

FINDINGS
• At least 1 vCenter VM powered on in a pool (20/80 best practice)
• Unify the storage for users and data – both VDI and Linux
• Alert users when jobs don’t start properly – SLURM
• Care for permissions – SLURM, containers, renderers, storage
• SLURM is very powerful and potentially complex – understand it
• Manage user VDI logistics and operations
• Keep the UX paramount
