
ZEN AND THE ART OF VGPU SELECTION



  1. ZEN AND THE ART OF VGPU SELECTION. Jeremy Main, Lead Solution Architect, NVIDIA GRID, Japan (jmain@nvidia.com)

  2. “The real purpose of the scientific method is to make sure nature hasn’t misled you into thinking you know something you actually don’t know.” Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values

  3. FUNCTIONAL VIEWPOINTS WITH APPLICATION OF RATIONAL ANALYSIS

  4. FUNDAMENTALS: FRAME RATE

  5. 3D APPLICATION: CATIA V5

  6. FRAMERATE (timeline label: 4 seconds)

  7. FRAMERATE (timeline labels: 4 seconds, 1 second)

  8. FRAMERATE: 4 frames/second, 250 ms/frame (timeline: 1 second)

  9. FRAMERATE: 8 frames/second, 125 ms/frame (timeline: 1 second)

  10. FRAMERATE: 16 frames/second, ~62 ms/frame (timeline: 1 second)

  11. FRAMERATE: 30 frames/second, ~33 ms/frame (timeline: 1 second)

  12. FRAMERATE: 60 frames/second, ~16 ms/frame (timeline: 1 second)
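
The frame-time figures on the slides above follow from dividing one second by the frame rate. A minimal sketch of that arithmetic (the function name `frame_time_ms` is illustrative, not from the deck):

```python
def frame_time_ms(fps: float) -> float:
    """Return the time budget per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates used on the slides; the slides round the results to whole milliseconds.
for fps in (4, 8, 16, 30, 60):
    print(f"{fps:>2} FPS -> {frame_time_ms(fps):.1f} ms/frame")
```

The slides truncate the results (for example, 62.5 ms is shown as 62 ms and 16.7 ms as 16 ms).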

  13. AND SO?

  14. FRAMERATE: IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough… Max FPS = 60 FPS, GPU Utilization = 100% (grossly simplified for illustrative purposes only; timeline: 1 second)

  15. FUNDAMENTALS: GPU UTILIZATION

  16. GPU UTILIZATION: IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 60 FPS, GPU Utilization = 50% (grossly simplified for illustrative purposes only; timeline: 1 second)

  17. GPU UTILIZATION: IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 60 FPS, GPU Utilization = 50% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  18. GPU UTILIZATION: IF the application can’t construct 3D data fast enough (inefficient geometry representation) AND the GPU is powerful enough: Max FPS = 15 FPS, GPU Utilization = 20% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  19. GPU UTILIZATION: IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is NOT powerful enough: Max FPS = 15 FPS, GPU Utilization = 100% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU always BUSY)
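
The three GPU utilization scenarios above can be reproduced with a toy model in the same "grossly simplified" spirit. This is a sketch under stated assumptions, not how a driver measures utilization. The delivered frame rate is taken as the smallest of what the application can submit, what the GPU could render if never idle, and any cap. Utilization is the delivered rate as a share of the GPU's capability. The capability figures (120 and 75 FPS) are assumptions back-calculated from the slide percentages.

```python
def simplified_gpu_model(app_fps, gpu_fps, cap_fps=None):
    """Toy model: return (delivered FPS, GPU utilization in percent).

    app_fps: frames/second the application can construct and submit (assumed).
    gpu_fps: frames/second the GPU could render if it were never idle (assumed).
    cap_fps: optional cap such as VSYNC or a frame rate limiter.
    """
    limits = [app_fps, gpu_fps]
    if cap_fps is not None:
        limits.append(cap_fps)
    delivered = min(limits)
    utilization = 100.0 * delivered / gpu_fps
    return delivered, utilization


# Fast application, powerful GPU, capped at 60 FPS: GPU is busy half the time.
print(simplified_gpu_model(app_fps=1000, gpu_fps=120, cap_fps=60))
# Application-bound: the GPU mostly waits for work.
print(simplified_gpu_model(app_fps=15, gpu_fps=75, cap_fps=60))
# GPU-bound: the GPU is always busy and still only reaches 15 FPS.
print(simplified_gpu_model(app_fps=1000, gpu_fps=15, cap_fps=60))
```

In this model, a low frame rate with low GPU utilization points at the application or CPU, while a low frame rate with 100% utilization points at the GPU.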

  20. FUNDAMENTALS: VSYNC

  21. VSYNC = ON: approximates the (virtual) display's vertical sync. Ex: 60 Hz == ~16 ms/frame. IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 60 FPS, GPU Utilization = 50% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  22. VSYNC = ON (Half Display Refresh): approximates the (virtual) display's vertical sync at half rate. Ex: (60 Hz / 2) == ~33 ms/frame. IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 30 FPS, GPU Utilization = 25% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  23. FUNDAMENTALS: FRAME RATE LIMITER

  24. FRAME RATE LIMITER = ON: at most ~60 potential frames rendered per second. IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 60 FPS, GPU Utilization = 50% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  25. FUNDAMENTALS: GOING (OR NOT GOING) FASTER

  26. GOING (OR NOT GOING) FASTER: Frame Rate Limiter = OFF, VSYNC = ON. IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 60 FPS, GPU Utilization = 50% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  27. GOING (OR NOT GOING) FASTER: Frame Rate Limiter = ON, VSYNC = OFF. IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = 60 FPS, GPU Utilization = 50% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU alternating BUSY / IDLE)

  28. GOING (OR NOT GOING) FASTER: Frame Rate Limiter = OFF, VSYNC = OFF. IF the application can construct 3D data fast enough (efficient geometry representation) AND the GPU is powerful enough: Max FPS = (until CPU or GPU bottleneck), GPU Utilization = 100% (grossly simplified for illustrative purposes only; timeline: 1 second, GPU always BUSY)
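
The VSYNC and frame rate limiter combinations above boil down to "the lowest active cap wins, and with no cap the frame rate runs until something bottlenecks". A minimal sketch of that rule, assuming a 60 Hz display and a 60 FPS limiter as on the slides (the function and parameter names are illustrative):

```python
def effective_fps_cap(vsync_on, limiter_on, refresh_hz=60.0,
                      limiter_fps=60.0, half_refresh=False):
    """Return the frame rate cap in FPS, or None when nothing caps rendering."""
    caps = []
    if vsync_on:
        # Half-refresh VSYNC presents a frame on every second display refresh.
        caps.append(refresh_hz / 2 if half_refresh else refresh_hz)
    if limiter_on:
        caps.append(limiter_fps)
    # With both mechanisms off, only a CPU or GPU bottleneck limits the frame rate.
    return min(caps) if caps else None


print(effective_fps_cap(vsync_on=True, limiter_on=False))   # VSYNC only
print(effective_fps_cap(vsync_on=False, limiter_on=True))   # limiter only
print(effective_fps_cap(vsync_on=False, limiter_on=False))  # uncapped
print(effective_fps_cap(vsync_on=True, limiter_on=False, half_refresh=True))
```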

  29. FUNDAMENTALS: RENDERED VS. SAMPLED

  30. Sampling: 20 frames/second, 50 ms/sample. Rendering: 60 frames/second, ~16 ms/frame.

  31. Sampling: 20 frames/second, 50 ms/sample. Rendering: 60 frames/second, ~16 ms/frame.

  32. Sampling: 20 frames/second, 50 ms/sample. Rendering: 60 frames/second, ~16 ms/frame. Rendered but… unused frames.

  33. @VIRTUALIZED_RESOURCE: “A BAD, VERY BAD WASTE OF SHARED RESOURCES”

  34. RENDERED VS. SAMPLED: Options… Cautions
      • If your sample framerate is < 30 FPS, consider changing the VSYNC policy to “Adaptive Half Refresh” to lock max FPS @ 30 FPS and reduce “waste”
      • May lead to additional input/output latency due to the longer period between frame updates
      • CPU-based image compression can limit the actual delivered framerate, depending on quality settings, percentage of display changed, and number of displays
      • Network bandwidth deficiencies and network quality affect the delivered framerate
      • Endpoint performance (ability to decode compression) affects the displayable framerate
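
The "waste" in the rendered-versus-sampled example above can be put in numbers. This is a rough sketch that counts every frame beyond the sampling rate as unused, which ignores timing alignment between rendering and sampling (the function name is illustrative):

```python
def unused_frames(rendered_fps, sampled_fps):
    """Return (unused frames per second, unused share of rendered frames)."""
    unused = max(rendered_fps - sampled_fps, 0.0)
    fraction = unused / rendered_fps if rendered_fps > 0 else 0.0
    return unused, fraction


# Rendering at 60 FPS while the remoting protocol samples at 20 FPS.
per_second, share = unused_frames(60, 20)
print(f"{per_second} unused frames/s ({share:.0%} of rendered frames)")

# Same sampling rate with rendering locked at 30 FPS (half refresh).
per_second, share = unused_frames(30, 20)
print(f"{per_second} unused frames/s ({share:.0%} of rendered frames)")
```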

  35. FUNDAMENTALS: CPU

  36. CPU / VCPU UTILIZATION: More is not always better
      • 1 of 1 vCPUs @ 100% utilization = 100% reported utilization
      • 1 of 2 vCPUs @ 100% utilization = 50% reported utilization
      • 1 of 4 vCPUs @ 100% utilization = 25% reported utilization
      • 1 of 8 vCPUs @ 100% utilization = 13% reported utilization
      • Virtual environments using CPU-based image compression with full-screen updates can expect the compressor process to consume a single vCPU
      • Adding more vCPU cores can negatively impact VM performance due to pCPU scheduling contention by the hypervisor
      • Know how much CPU your application and workload require
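
The reported-utilization figures above are simply the average across vCPUs, which is why one saturated core can hide behind a low overall number. A minimal sketch (the function name is illustrative):

```python
def reported_cpu_utilization(per_vcpu_percent):
    """Average per-vCPU utilization, as an overall CPU counter would report it."""
    if not per_vcpu_percent:
        raise ValueError("at least one vCPU is required")
    return sum(per_vcpu_percent) / len(per_vcpu_percent)


# One vCPU pegged at 100%, the rest idle.
for vcpus in (1, 2, 4, 8):
    load = [100.0] + [0.0] * (vcpus - 1)
    print(f"1 of {vcpus} vCPUs busy -> {reported_cpu_utilization(load):.1f}% reported")
```

The 8-vCPU case works out to 12.5%, which the slide rounds to 13%. Checking per-core counters avoids missing a single-threaded bottleneck.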

  37. FUNDAMENTALS: SYSTEM MEMORY

  38. SYSTEM MEMORY: Locked in (like a time-share contract)
      • vDGA and vGPU VMs require all VM memory to be locked on startup
      • Important consideration during the PoC phase as well as production
      • Be aware of VM memory exceeding the per-socket capacity (NUMA traversal)

  39. FUNDAMENTALS: FRAMEBUFFER

  40. FRAMEBUFFER: I own thee… until shutdown
      • It is yours for the duration, so ensure you get the correct “size”, i.e. profile
      • Cannot use another GPU’s framebuffer
      • Does not support dynamic resizing
      • Cannot use excess “unused” capacity of other VM framebuffers on the same GPU
      • Applications may efficiently represent geometry but will fall back to legacy methods when the framebuffer is exhausted, which leads to reduced rendering performance

  41. FUNDAMENTALS: DECODE

  42. DECODE: For most of your video playback needs
      • Stream must be H.264, VP8, HEVC Main Profile, or VP9 Profile 0
      • Complete details in NVIDIA Video Codec SDK Application Notes – Decoder
      • Application must support GPU decode capability for supported streams
      • YouTube playback on Chrome uses VP9 (caution: VP9 decode not verified)
      • Firefox and Edge will play back with hardware decode
      • Splash player with GPU decode enabled will play back with hardware decode
      • Other video players natively support available GPU decode as well

  43. FUNDAMENTALS: ENCODE

  44. ENCODER: Free a vCPU to do other special things
      • Dedicated silicon for encode on each GPU
      • Out-of-band encoding does not impact rendering performance
      • NVENC support added from Citrix XenDesktop 7.11 and VMware Horizon 7.0 Blast Extreme
      • Confirm endpoints can perform H.264 decode and that it is enabled in client settings
      • Up-to-date endpoint software required
      • Ensure policies or settings do not override GPU encoder use, e.g. “build to lossless”

  45. MEASUREMENT

  46. MEASUREMENT PRINCIPLES: Not all possible data points!
      • Clarify and document the context(s) being measured
      • Select metrics that will help explain different points of resource contention
      • Capture workstation / PC data for pre-PoC sizing investigation
      • (Optional) Capture screenshots @ 1 FPS -> PNG -> ffmpeg -> MP4 file
      • Capture VM, endpoint and host metrics (nvidia-smi) for PoC
      • Save data in a consistent manner, document testing procedures
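
One way to save host GPU metrics "in a consistent manner" is to log nvidia-smi in CSV form and parse it afterwards. The sketch below assumes the log was produced with `nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv,noheader,nounits -l 1`. The field selection is an assumption for illustration, and the sample rows are made-up values, not measurements.

```python
import csv
import io

# Assumed field order, matching the --query-gpu list in the lead-in.
FIELDS = ("timestamp", "utilization.gpu", "memory.used")


def parse_nvidia_smi_csv(text):
    """Parse noheader/nounits CSV output into a list of dicts, skipping bad rows."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # blank or malformed line
        timestamp, util, mem = (value.strip() for value in record)
        try:
            rows.append({
                "timestamp": timestamp,
                "gpu_util_pct": float(util),
                "mem_used_mib": float(mem),
            })
        except ValueError:
            continue  # non-numeric cell, e.g. an unsupported counter
    return rows


# Hypothetical sample data for illustration only.
SAMPLE = """2017/05/08 10:00:00.123, 50, 1024
2017/05/08 10:00:01.125, 100, 1030
"""

samples = parse_nvidia_smi_csv(SAMPLE)
peak = max(row["gpu_util_pct"] for row in samples)
print(f"{len(samples)} samples, peak GPU utilization {peak:.0f}%")
```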

  47. TOOLS

  48. TOOLS: MSINFO32 (Available in all Windows environments)
      • Use msinfo32 “System Information” to capture the measurement context
      • CPU model, clocks, logical cores
      • Operating system
      • Display adapters
      • Lots of ‘other’ information that surely must be interesting to someone?


  50. TOOLS: PERFMON (Available in all Windows environments)
      • A large variety of counters!
      • Very powerful for local or remote collection
      • Some counters only exist in WMI, sadly
      • Export hundreds of data points to CSV for endless sorting
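
Once counters are exported to CSV, a short script can replace some of the "endless sorting". The sketch below assumes the usual perfmon CSV layout: a first column of timestamps, then one column per counter path, with blank cells for missed samples. The host name `VM01` and the values are hypothetical.

```python
import csv
import io
from statistics import mean


def summarize_perfmon_csv(text):
    """Return {counter path: {"mean": ..., "max": ...}} for each counter column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    counters = header[1:]  # column 0 holds the sample timestamp
    columns = {name: [] for name in counters}
    for row in reader:
        for name, cell in zip(counters, row[1:]):
            try:
                columns[name].append(float(cell))
            except ValueError:
                pass  # blank or non-numeric cell: sample not available
    return {name: {"mean": mean(values), "max": max(values)}
            for name, values in columns.items() if values}


# Hypothetical export for illustration only.
SAMPLE = r'''"(PDH-CSV 4.0) (Tokyo Standard Time)(-540)","\\VM01\Processor(_Total)\% Processor Time"
"05/08/2017 10:00:00.000"," "
"05/08/2017 10:00:01.000","25.0"
"05/08/2017 10:00:02.000","75.0"
'''

for counter, stats in summarize_perfmon_csv(SAMPLE).items():
    print(f"{counter}: mean {stats['mean']:.1f}, max {stats['max']:.1f}")
```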

  51. COUNTER CREATION AND USAGE

  52. Create new “User Defined” collector
      • Start “perfmon”
      • Expand “Data Collector Sets”
      • Select “User Defined” -> “New” -> “Data Collector Set”

  53. Set base collector properties
      • Enter a name for the collector
      • Select “Create Manually”
      • Click “Next”

  54. Configuration (continued 1)
      • Select “Performance Counter”

  55. Configuration (continued 2)
      • Change the sample interval to 1 second
      • Click “Add” to add counters
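
The GUI steps above can also be scripted. The deck does not show this: as an alternative, Windows ships the `logman` command-line tool, which can create a counter collector with a 1-second sample interval and CSV output. The sketch below only builds the command line and does not run it. The collector name, counter list, and output path are hypothetical placeholders.

```python
def build_logman_command(name, counters, output_path, interval_seconds=1):
    """Build a `logman create counter` argument list (not executed here)."""
    if not counters:
        raise ValueError("at least one counter path is required")
    return [
        "logman", "create", "counter", name,
        "-c", *counters,               # performance counter paths to collect
        "-si", str(interval_seconds),  # sample interval in seconds
        "-f", "csv",                   # log file format
        "-o", output_path,             # output file path (without extension)
    ]


# Hypothetical collector definition for illustration only.
command = build_logman_command(
    name="vgpu_poc",
    counters=[r"\Processor(_Total)\% Processor Time",
              r"\Memory\Available MBytes"],
    output_path=r"C:\PerfLogs\vgpu_poc",
)
print(" ".join(f'"{arg}"' if " " in arg else arg for arg in command))
```

On a Windows machine the list could be passed to `subprocess.run(command, check=True)`, followed by `logman start vgpu_poc` to begin collection.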
