
Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA



  1. Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA. Federico Silla, Universitat Politècnica de València, Spain

  2. Outline
     1. Using CUDA GPUs from virtual machines
     2. rCUDA: GPU virtualization
     3. Performance of rCUDA with one virtual machine
     4. Performance of rCUDA with several virtual machines
     5. Conclusions
     GPU Technology Conference 2017


  4. Using CUDA GPUs from virtual machines
     • How can the GPU in the native domain be accessed from inside a virtual machine?

  5. Using CUDA GPUs from virtual machines
     • The PCI passthrough technique can be used to assign the GPU to a virtual machine
     • However, the GPU is assigned exclusively
     • Concurrent usage of the GPU is not possible

  6. Using CUDA GPUs from virtual machines
     • With PCI passthrough, the number of virtual machines using CUDA acceleration cannot be larger than the number of GPUs present in the host: virtual machines ≤ GPUs

  7. Using CUDA GPUs from virtual machines
     • GPU virtualization allows as many virtual machines as required to share the GPU in the host

  8. Outline
     1. Using CUDA GPUs from virtual machines
     2. rCUDA: GPU virtualization for CUDA
     3. Performance of rCUDA with one virtual machine
     4. Performance of rCUDA with several virtual machines
     5. Conclusions

  9. rCUDA … CUDA … they sound similar

  10. Basics of GPU computing: basic behavior of CUDA [diagram]

  11. Basics of GPU computing [diagram]

  12. rCUDA … remote CUDA
      • A software technology that enables a more flexible use of GPUs in computing facilities
      [diagram: nodes labeled "No GPU"]
      rCUDA is a development by Universitat Politècnica de València, Spain

  13. Basics of rCUDA [diagram]


  15. rCUDA GPU virtualization vision
      • rCUDA allows a new vision of a GPU deployment, moving from the usual cluster configuration to a logical configuration
      [diagram, physical configuration: nodes 1 to n, each with CPUs, RAM, a network interface, and a PCIe-attached GPU with its own RAM, joined by an interconnection network]
      [diagram, logical configuration: the same nodes 1 to n, with the GPUs drawn apart from the nodes and reached through logical connections over the interconnection network]

  16. Performance of applications using rCUDA
      • Several applications executed with CUDA and rCUDA
      • K20 GPU and FDR InfiniBand
      • K40 GPU and EDR InfiniBand
      [plots: lower is better]

  17. Performance of applications using rCUDA
      • EDR InfiniBand and P100 GPU
      [plots: BarraCUDA and CUDA-MEME; lower is better]

  18. Why the good performance of rCUDA?
      The low overhead of applications using rCUDA is due to:
      • Data copies with rCUDA attaining higher bandwidth to the remote GPU than CUDA does to the local GPU
      • Some internal synchronization mechanisms being faster in rCUDA than in CUDA
      • A very careful implementation of the rCUDA framework
      “Ideas Are Easy, Implementation Is Hard” (Guy Kawasaki, marketing specialist and Silicon Valley venture capitalist)

  19. Example of performance with P2P copies
      • rCUDA provides the same semantics as CUDA
      [diagram: CUDA model and rCUDA model; rCUDA scenario 1 and rCUDA scenario 2]

  20. Example of performance with P2P copies: rCUDA scenario 2 [plot: higher is better]


  22. Using rCUDA to access the GPU (low performance network fabric available)
      • In clusters where InfiniBand is not available, the rCUDA server may be placed in the native domain and the rCUDA client would be placed inside the VMs
      • The virtual network provided by the hypervisor would be used to exchange data between the rCUDA clients and the rCUDA server
      • This configuration allows the use of more than one GPU at the host
      [diagram: KVM]

  23. Using rCUDA to access the GPU (high performance network fabric available)
      • If InfiniBand is available, the rCUDA server can be placed in another node
      • Several GPUs can be provided to the VMs, either in a single remote node or in several remote nodes
      [diagram: KVM]

  24. Application performance with KVM: FDR InfiniBand + K20 [plots: LAMMPS, CUDA-MEME, CUDASW++, GPU-BLAST]


  26. CUDA approach
      • Consider a computer with two GPUs and four virtual machines:
      • Two virtual machines use one GPU each (PCI passthrough)
      • Two virtual machines must run applications on the CPU

  27. rCUDA approach
      • With rCUDA, the four virtual machines can share both GPUs. The two GPUs can be either in the same host or in another computer
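
The contrast between the two approaches (two GPUs, four virtual machines) can be sketched as a toy assignment model. This is an illustration only, not rCUDA code: the function names are hypothetical, and the round-robin policy is an assumption made for simplicity. The slides state only that the four virtual machines can share both GPUs.

```python
from typing import List, Optional


def assign_passthrough(num_vms: int, num_gpus: int) -> List[Optional[int]]:
    """Exclusive assignment, as with PCI passthrough.

    Each GPU serves at most one VM; VMs left without a GPU get None,
    meaning they must run their applications on the CPU.
    """
    return [vm if vm < num_gpus else None for vm in range(num_vms)]


def assign_shared(num_vms: int, num_gpus: int) -> List[Optional[int]]:
    """Shared assignment (assumed round-robin policy, hypothetical).

    Every VM gets access to a GPU as long as at least one GPU exists.
    """
    if num_gpus == 0:
        return [None] * num_vms
    return [vm % num_gpus for vm in range(num_vms)]


if __name__ == "__main__":
    # Scenario from the slides: two GPUs, four virtual machines.
    print(assign_passthrough(4, 2))  # [0, 1, None, None]
    print(assign_shared(4, 2))       # [0, 1, 0, 1]
```

In this model, exclusive assignment leaves two of the four virtual machines without a GPU, while the shared assignment gives every virtual machine a GPU at the cost of two virtual machines per GPU.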

  28. Performance comparison
      • Each of the 4 virtual machines executes as many instances as possible of one of the following 4 applications:
        • LAMMPS (red in the plot)
        • NAMD (green)
        • GPU-Blast (blue)
        • Fluidsim (yellow)
      • For each experiment, applications are shifted across virtual machines
      Sharing GPUs among applications increases the overall number of executed jobs


  30. Conclusions
      • rCUDA allows GPUs to be shared among several virtual machines
      • Applications do not need to be modified in order to use rCUDA
      • Performance with rCUDA when GPUs are not shared is not significantly reduced
      • Overall performance is increased when GPUs are shared among virtual machines


  32. Get a free copy of rCUDA at http://www.rcuda.net
      More than 800 requests worldwide
      @rcuda_
      Javier Prades, Jaime Sierra, Tony Díaz, Pablo Higueras, Carlos Reaño

  33. Thanks! Questions?
