Full Virtualization for GPUs Reconsidered

Full Virtualization for GPUs Reconsidered. Revisit: Suzuki, Yusuke, et al. "GPUvm: Why not virtualizing GPUs at the hypervisor?" USENIX ATC '14. Hangchen Yu (1), Christopher J. Rossbach (1, 2). (1) The University of Texas at Austin; (2) VMware


  1. Full Virtualization for GPUs Reconsidered
     Revisit: Suzuki, Yusuke, et al. "GPUvm: Why not virtualizing GPUs at the hypervisor?" USENIX ATC '14.
     Hangchen Yu (1), Christopher J. Rossbach (1, 2)
     (1) The University of Texas at Austin; (2) VMware Research Group

  2. Overview
     • Demands, introductions, challenges of virtual GPUs
     • Distinctive features of GPUvm
     • Re-evaluate GPUvm with additional benchmarks
       – Hard to set up the testbed
       – Some functionalities do not work
       – Over 200x overheads on average
       – Unfairness issue
       – Over 40% throughput loss

  3. Do we still need GPU virtualization?
     • Share GPUs in datacenter
     • Different end-user demands
     • Hidden scenarios

  6. GPU Virtualization Challenges
     • Diverse hardware
     • Undocumented APIs
     • Closed-source GPUs and drivers
     • Deep graphics stack
     • Coupled layers
     • Significant overheads
     • Limited flexibility

  7. GPU Virtualization Comparisons
     • Front-end
       – Device emulation: synthesizes host graphics operations
       – API remoting: forwards graphics API calls to external graphics stack
     • Back-end
       – Mediated-passthrough: dedicates a set of contexts
       – Passthrough: provides exclusive access

  8. GPU Virtualization Comparisons
     • Comparison criteria: performance, fidelity, multiplexing, interposition, complexity
     • Front-end
       – Device emulation: synthesizes host graphics operations
       – API remoting: forwards graphics API calls to external graphics stack
     • Back-end
       – Mediated-passthrough: dedicates a set of contexts
       – Passthrough: provides exclusive access

  14. GPU Virtualization Examples
     • Front-end
       – Device emulation: M. Dowty, VMware SVGA, SIGOPS-OSR'09
       – API remoting: J. Duato, rCUDA, HiPC'10,11; G. Giunta, gVirtuS, European Conference on Parallel Processing'10
     • Back-end
       – Mediated-passthrough: AMD MxGPU (FirePro), VMworld'15; NVIDIA GRID vGPU, '15; KVMGT (Intel GVT-g), '14
       – Passthrough: Amazon Elastic Compute Cloud (AWS EC2)

  15. GPUvm Features
     Similar approaches when virtualizing at hypervisor level
     • Front-end
       – Device emulation: exposes a native device model to VMs
       – API remoting: forwards commands to GPU virtual aggregator
     • Back-end
       – Mediated-passthrough: passes through some operations (I/O requests) to hardware

  16. Full-virtualization vs. Para-virtualization
     • Para-virtualization: split device model
       [Diagram labels: Apps, vGPU driver API, Front End, Back End, GPU driver API, GPU driver, GPU]
     • Full-virtualization: trap-and-emulate
       [Diagram labels: Apps, GPU driver API, GPU driver, Device model, Hypervisor, GPU]

  17. Full-virtualization vs. Para-virtualization
     • Comparison criteria: performance, interposition, fidelity, multiplexing
     • Para-virtualization: split device model
       [Diagram labels: Apps, vGPU driver API, Front End, Back End, GPU driver API, GPU driver, GPU]
     • Full-virtualization: trap-and-emulate
       [Diagram labels: Apps, GPU driver API, GPU driver, Device model, Hypervisor, GPU]

  18. Full-virtualization vs. Para-virtualization
     • Comparison criteria: performance, interposition, fidelity, multiplexing
     • Para-virtualization: split device model
       [Diagram labels: Apps, vGPU driver, vGPU driver API, Front End, Back End, Back End API, GPU driver API, GPU driver, GPU]
     • Full-virtualization: trap-and-emulate
       [Diagram labels: Apps, GPU driver API, GPU driver, Device model, Hypervisor, Hypervisor API, GPU]
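
The trap-and-emulate side of this comparison can be pictured as a device model that sees every guest MMIO access before the hardware does. The following is a minimal Python sketch of that idea, not GPUvm code: the class name `DeviceModel`, the register offsets, the `0x1000` threshold, and the `forward` callback are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple


class DeviceModel:
    """Toy hypervisor-side device model for one guest (hypothetical names)."""

    def __init__(self, forward: Callable[[int, int], None]):
        self.regs: Dict[int, int] = {}   # virtual register state seen by the guest
        self.forward = forward           # invoked for writes that must reach hardware
        self.trapped: List[Tuple[str, int]] = []  # log of intercepted accesses

    def mmio_write(self, offset: int, value: int) -> None:
        # Every guest write traps here; virtual state is updated first.
        self.trapped.append(("w", offset))
        self.regs[offset] = value
        # Assumption: offsets >= 0x1000 stand in for registers with side effects.
        if offset >= 0x1000:
            self.forward(offset, value)

    def mmio_read(self, offset: int) -> int:
        # Reads are answered from this guest's virtual state only.
        self.trapped.append(("r", offset))
        return self.regs.get(offset, 0)


hw_log: List[Tuple[int, int]] = []
dm = DeviceModel(forward=lambda off, val: hw_log.append((off, val)))
dm.mmio_write(0x0010, 7)    # emulated only
dm.mmio_write(0x1004, 42)   # emulated and forwarded to "hardware"
print(dm.mmio_read(0x0010), hw_log, len(dm.trapped))
```

Each access costs a trap into the device model, which is the interposition point the slides weigh against performance.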

  19. Full Virtualization: A Reasonable Goal?
     • Full-featured vGPU (3D acceleration)
     • Strong isolation
     • Slow performance
     • Hard to map different GPUs
     [Diagram: full-virtualization, trap-and-emulate. Labels: Apps, GPU driver API, GPU driver, Device model, Hypervisor, GPU]

  21. GPUvm Overview
     • Access aggregator

  24. GPUvm Overview
     • Access aggregator
     • Shadow channel
       – Mapped by a virtual channel
     • Shadow page table
     [Diagram: shadowing mechanism. Labels: VM, virtual context, driver, virtual channels, shadow channels, shadow page tables]
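
One way to picture "a shadow channel mapped by a virtual channel" is a fixed offset per VM into the pool of physical channels. The sketch below assumes a static, even partition of channels among VMs. That partitioning scheme, the class name `ShadowChannelMap`, and the 128-channel figure are assumptions for illustration and are not stated on the slide.

```python
class ShadowChannelMap:
    """Maps (vm_id, virtual channel) to a physical shadow channel index."""

    def __init__(self, total_channels: int, num_vms: int):
        if num_vms <= 0 or total_channels % num_vms != 0:
            raise ValueError("this sketch needs channels to divide evenly among VMs")
        self.per_vm = total_channels // num_vms
        self.num_vms = num_vms

    def to_shadow(self, vm_id: int, virt_channel: int) -> int:
        # Each VM sees channels 0..per_vm-1; the per-VM offset keeps VMs disjoint.
        if not (0 <= vm_id < self.num_vms and 0 <= virt_channel < self.per_vm):
            raise IndexError("channel outside this VM's partition")
        return vm_id * self.per_vm + virt_channel


cmap = ShadowChannelMap(total_channels=128, num_vms=2)
print(cmap.to_shadow(0, 5), cmap.to_shadow(1, 5))
```

Because the ranges never overlap, a guest that names its own channel 5 can never reach another guest's channel 5.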

  28. GPUvm Overview
     • Access aggregator
     • Shadow channel
       – Mapped by a virtual channel
     • Shadow page table
     • Virtual scheduler
       – FIFO
       – CREDIT
       – BAND (bandwidth-aware non-preemptive device)
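
To make the difference between the FIFO and CREDIT policies concrete, here is a small Python sketch. It is a simplified illustration, not GPUvm's scheduler: the function names, the cost units, and the absence of periodic credit replenishment are assumptions, and BAND is not modeled.

```python
from collections import deque


def fifo_order(requests):
    """FIFO: serve requests strictly in arrival order."""
    return [vm for vm, _cost in requests]


def credit_order(requests, credits):
    """Credit-style policy: among VMs with pending work, run the one with the
    most remaining credit, then charge it the cost of the command it ran."""
    pending = {}
    for vm, cost in requests:
        pending.setdefault(vm, deque()).append(cost)
    credits = dict(credits)            # do not mutate the caller's budget
    order = []
    while any(pending.values()):
        ready = [vm for vm, queue in pending.items() if queue]
        vm = max(ready, key=lambda v: credits[v])  # ties go to the earliest arrival
        cost = pending[vm].popleft()   # non-preemptive: the command runs to completion
        credits[vm] -= cost            # charged only after it finishes
        order.append(vm)
    return order


# vm0 submits three expensive commands before vm1 submits two cheap ones.
reqs = [("vm0", 10), ("vm0", 10), ("vm0", 10), ("vm1", 1), ("vm1", 1)]
print(fifo_order(reqs))
print(credit_order(reqs, {"vm0": 10, "vm1": 10}))
```

Under FIFO the cheap requests wait behind all of vm0's work; under the credit policy vm0 exhausts its budget after one command and vm1 runs next.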

  29. Why GPUvm?
     • Open-source (easier to analyze performance/mechanism)
     • Overheads
       – FV (36x), PV (1.9x)
     • Good open architecture (easier to upgrade/swap/optimize components)
       – Decoupled components
       – Native device model, virtual MMIO, shadow channels, shadow page tables, virtual schedulers
     • Not-so-good aspects (significant performance impact)
       – Interposes guest access to memory-mapped resources
       – Shadows expensive resources
     • Trade-off of hypervisor-level full-virtualization

  30. GPUvm Optimizations
     (MMIO through PCIe base address register)
     • Sync virtual & shadow channels
       – Intercept data accesses
       – BAR3 remapping
         • BAR3 accesses are passed through
     • Sync guest & shadow page tables
       – GPU-side page faults
       – Lazy shadowing
         • Updates shadow page tables only when referenced
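
Lazy shadowing, as described in the last bullet, defers the shadow page table update until the table is actually referenced. The Python sketch below illustrates only that deferral. The class name `LazyShadowPageTable`, the `translate` callback, the `on_channel_use` trigger, and the full-table rescan are assumptions for illustration, not GPUvm's implementation.

```python
class LazyShadowPageTable:
    """Keeps a shadow copy of a guest page table, refreshed only on use."""

    def __init__(self, translate):
        self.translate = translate   # guest-physical -> host-physical (hypothetical)
        self.guest = {}              # guest table: virtual page -> guest-physical page
        self.shadow = {}             # shadow table: virtual page -> host-physical page
        self.dirty = False
        self.rescans = 0             # number of times the shadow table was rebuilt

    def guest_update(self, vpage, gpage):
        # Guest edits are only recorded; the shadow copy is left stale for now.
        self.guest[vpage] = gpage
        self.dirty = True

    def on_channel_use(self):
        # The shadow table is rebuilt only when the GPU is about to reference it.
        if self.dirty:
            self.shadow = {v: self.translate(g) for v, g in self.guest.items()}
            self.dirty = False
            self.rescans += 1
        return self.shadow


spt = LazyShadowPageTable(translate=lambda g: g + 0x1000)
for v in range(100):
    spt.guest_update(v, v)   # 100 guest updates are only recorded
spt.on_channel_use()         # first use triggers a single rescan
spt.on_channel_use()         # nothing changed since, so no rescan
print(spt.rescans, hex(spt.shadow[3]))
```

An eager design would pay for a shadow update on each of the 100 guest writes; here the cost is paid once, at the point of reference.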

  32. Testbed
     • Specific hardware
       – NVIDIA Quadro 6000 NVC0
       – GF100GL vs. GF100 (GTX 480) (different region addresses)
     • Specific software
       – Fedora 16 (Kernel 3.6.5)
       – Xen HVM (4.2.0)
       – Gdev (commit 605e69e7)
       – GCC 4.6.3
       – NVCC 4.2
       – Boost 1.4.7

  33. Performance
     • BAR3 remapping
       – 1.6x speed-up
       – Fails for some benchmarks
     • Lazy shadowing
       – 1.2x speed-up
       – Fails for some benchmarks
     • Overhead
       – Up to 737x, 232x on average
     • 7.4x boot slowdown
     [Figure: relative execution time]

                             hotspot      lud     srad     mmul
     WRITE bytes             659,664  662,544  666,784  660,832
     Original WRITE bytes      6,736    7,240    6,352    6,672
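
The two WRITE-byte rows on this slide differ by roughly two orders of magnitude for every benchmark. The short Python calculation below divides the first row by the second. It uses only the numbers from the slide and keeps the slide's own row labels, without assuming which configuration each row represents.

```python
benchmarks = ["hotspot", "lud", "srad", "mmul"]
write_bytes = [659_664, 662_544, 666_784, 660_832]    # row "WRITE bytes"
original_write_bytes = [6_736, 7_240, 6_352, 6_672]   # row "Original WRITE bytes"

# Per-benchmark ratio between the two rows.
ratios = {
    name: a / b
    for name, a, b in zip(benchmarks, write_bytes, original_write_bytes)
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```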
