building a gpu focused ci solution

BUILDING A GPU-FOCUSED CI SOLUTION Mike Wendt @mike_wendt - PowerPoint PPT Presentation


  1. BUILDING A GPU-FOCUSED CI SOLUTION Mike Wendt @mike_wendt github.com/nvidia github.com/mike-wendt

  2. AGENDA
     • Need for GPU CI
     • Challenges of GPU CI
     • Methods to Implement GPU CI
     • Improving GPU CI Today
     • Demo
     • Lessons Learned
     • Next Steps
     • Getting Started

  3. NEED FOR GPU CI
     The number of GPU-accelerated applications is growing
     • The leading open-source software projects from Apache and others rely on CI
     • External demand
       • Partners are collaborating with us on projects like GPU Open Analytics Initiative (GoAi) and need GPU CI to ensure stable builds
     • Internal demand
       • Large internal code bases for all kinds of GPU-accelerated applications require testing across different platforms/hardware
       • Performance testing of new drivers and hardware needs repeatable methods to make sure we continue to deliver performance

  4. CHALLENGES OF GPU CI
     GPUs bring a different set of problems than traditional CI
     • Need GPUs
       • Cloud or physical
     • Resource management
     • Expose GPU configuration to developers
       • Driver, CUDA, GPU type
     • Many traditional tools like Travis CI, Circle CI, and others do not support GPUs
       • For good reasons, dangers of misuse
     • For tools that offer support, many times it is not native
       • Still feels “hacky,” but it gets the job done
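One way to expose the GPU configuration to developers is to record it at the start of every CI job. The following is a minimal sketch, not something shown in the slides: it assumes `nvidia-smi` is on the PATH of the test machine, and the helper names `parse_gpu_report` and `query_gpus` are hypothetical. It reports GPU type and driver version only.

```python
import csv
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; chosen here for illustration.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_report(text):
    """Turn `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = csv.reader(text.strip().splitlines(), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row
    ]


def query_gpus():
    """Run nvidia-smi on this machine and return one dict per visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_report(result.stdout)
```

Printing the result of `query_gpus()` into the job log gives developers the driver and GPU type a test actually ran against.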

  5. METHODS TO IMPLEMENT GPU CI

  6. BARE-METAL + GPU
     Fastest to get started with the most limitations
     Benefits
     • Reduces complexity with minimal setup
     • Works well for a small set of projects that use the same/similar dependencies
     Challenges
     • Managing dependencies can be tricky for multiple projects
     • Limits ability to test multiple platforms, limited to installed CUDA/OS
     • Resource management is difficult

  7. BARE-METAL + GPU
     Fastest to get started with the most limitations
     [Diagram labels: CI Environment; Source Code; Tests; Test Results; GPUs; Server]

  8. DOCKER + NVIDIA CONTAINER RUNTIME
     github.com/nvidia/nvidia-docker
     • Docker runtime that allows for GPU pass-thru on Linux systems
     • Works with Debian/Ubuntu, RHEL/CentOS, and Amazon Linux
     • Allows for testing multiple CUDA/OS environments on one machine
     • Includes options to set supported driver operations and restrict GPU visibility
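The options for driver operations and GPU visibility are set through environment variables that the NVIDIA runtime reads (`NVIDIA_DRIVER_CAPABILITIES` and `NVIDIA_VISIBLE_DEVICES`). The sketch below shows how a CI script might assemble such a `docker run` invocation. The helper name `gpu_docker_command` and the image name in the example are assumptions for illustration, not taken from the slides.

```python
import shlex


def gpu_docker_command(image, command, devices="all", capabilities="compute,utility"):
    """Build a `docker run` argument list for a GPU-enabled CI job.

    devices      -> NVIDIA_VISIBLE_DEVICES (e.g. "all", "0", "0,1")
    capabilities -> NVIDIA_DRIVER_CAPABILITIES (e.g. "compute,utility")
    """
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",  # select the NVIDIA container runtime
        "-e", f"NVIDIA_VISIBLE_DEVICES={devices}",
        "-e", f"NVIDIA_DRIVER_CAPABILITIES={capabilities}",
        image,
        *command,
    ]


if __name__ == "__main__":
    # Illustrative image name; substitute the image your project tests against.
    args = gpu_docker_command("example/cuda-tests:latest", ["nvidia-smi"], devices="0")
    print(shlex.join(args))
```

Returning a list rather than a string lets the caller pass it straight to `subprocess.run` without shell quoting concerns.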

  9. DOCKER + GPU
     Easier to use with some hacking still required
     Benefits
     • Ability to test multiple CUDA/OS combinations
     • Handles dependency management for all projects
     • Enables fine-grained resource management
     • Supports scale needed for larger projects and teams
     Challenges
     • Typically requires pre-built Docker images containing the test environment, with the code under test injected into the container
     • Configuration tends to be a lot of environment variables and is cumbersome to manage
     • GitLab CI and Jenkins require “runners” for multiple nodes

  10. DOCKER + GPU
      Easier to use with some hacking still required
      [Diagram labels: CI Environment; Source Code; Dockerfile or Container; Custom Config; Docker Container; Tests; Test Results; Docker + NVIDIA Runtime; GPUs; Server]

  13. KUBERNETES + DOCKER + GPU
      Promises to be the easiest to use with minimal hacking
      Benefits
      • GPU support in v1.8+ of Kubernetes
      • Takes care of the “runner” challenge with GitLab/Jenkins
      • Resource management and scheduling are handled by Kubernetes
      Challenges
      • Can only target GPUs on homogeneous nodes (heterogeneous support coming)
      • Not all tools support GPU CI out of the box
      • Docker containers required for testing, but this can be the previous step in a pipeline
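To illustrate how Kubernetes takes over GPU scheduling, the sketch below builds a Pod manifest for a test job that requests GPUs. It assumes the NVIDIA device plugin is deployed on the cluster, which advertises GPUs under the `nvidia.com/gpu` resource name. The function name `gpu_test_pod` and the pod, image, and command values are hypothetical.

```python
import json


def gpu_test_pod(name, image, command, gpus=1):
    """Return a Pod manifest (as a dict) that requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # a CI test run should not be restarted
            "containers": [
                {
                    "name": "tests",
                    "image": image,
                    "command": list(command),
                    # GPUs are requested as a limit; the scheduler places the
                    # pod on a node with enough free GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; write the output to a file and `kubectl apply -f` it.
    pod = gpu_test_pod("gpu-ci-run", "example/cuda-tests:latest", ["make", "test"])
    print(json.dumps(pod, indent=2))
```

The manifest is emitted as JSON, which `kubectl` accepts alongside YAML, so the sketch needs nothing beyond the standard library.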

  14. KUBERNETES + DOCKER + GPU
      Promises to be the easiest to use with minimal hacking
      [Diagram labels: CI Environment; Source Code; Dockerfile or Container; Custom Config; Docker Container Repo; Docker Test Container; Kubernetes Master; Scheduler; Tests; Test Results; Server; Docker + NVIDIA Runtime; GPUs; Kubernetes Worker]

  18. HOW CAN WE MAKE THIS BETTER TODAY?

  19. JENKINS PLUGIN FOR NVIDIA + DOCKER
      • Based on Jenkins docker-slaves plugin
      • Simplifies the configuration of Docker containers for GPU CI testing
      • Allows for targeting a Dockerfile within the repo to build and use for testing or a Docker image in a remote hub
      • Supports side-containers with GPU support
      • Easy to use and adapt a project for GPU CI

  20. DEMO

  21. JENKINS PLUGIN FOR NVIDIA + DOCKER
      Simplifying the configuration for GPU CI
      [Diagram labels: Jenkins CI Environment; Source Code; Dockerfile or Container + Plugin Config; Docker Container; Tests; Test Results; Docker + NVIDIA Runtime; GPUs; Server]

  22. LESSONS LEARNED
      • CI best practices apply to GPU code as well
        • Pull request testing is one of the best methods to ensure code quality
      • GitLab CI works great if there are only a few GPU-enabled repos to test
        • For scale-out, GitLab on Kubernetes is best
      • Larger organizations and projects need a centralized CI platform like Jenkins
        • Setup of a new repo is easy and with parameterized builds we can make use of existing pipelines
      • Advanced uses of Jenkins
        • Tagging is key to test on multiple GPU architectures and pipelines for multiple CUDA version testing
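The tagging point can be pictured as a build matrix: each job pairs a node tag (one per GPU architecture) with a CUDA version. The sketch below enumerates those pairs. It is only an illustration; the function name `build_matrix`, the tag names, and the CUDA versions are assumptions, not values from the talk.

```python
from itertools import product


def build_matrix(gpu_tags, cuda_versions):
    """Enumerate one CI job per (GPU node tag, CUDA version) combination."""
    return [
        {
            "node_tag": tag,        # CI node label to schedule on
            "cuda_version": cuda,   # selects the container image to test in
            "job_name": f"test-{tag}-cuda{cuda}",
        }
        for tag, cuda in product(gpu_tags, cuda_versions)
    ]


if __name__ == "__main__":
    # Hypothetical tags and versions for illustration.
    for job in build_matrix(["pascal", "volta"], ["8.0", "9.0"]):
        print(job["job_name"], "->", job["node_tag"])
```

With parameterized builds, each entry would map to one run of an existing pipeline, using the tag and CUDA version as its parameters.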

  23. NEXT STEPS
      • Continue plugin development and release as an open source project
      • Internal
        • Continue deployment of GPU CI and migrate performance testing toward full GPU CI
        • Leverage capabilities of Jenkins to go beyond CI with CD and workflow automation
      • External
        • Expand GPU CI testing by testing pull requests of open source projects using Jenkins and the plugin
      • Take advantage of the GPU targeting within Kubernetes and new GPU features in the coming months
      • Look at ways to more closely integrate GPU CI with GitLab CI and Jenkins plugins for Kubernetes

  24. GETTING STARTED
      Links to useful repos
      • github.com/nvidia
        • NVIDIA Docker Runtime: nvidia-docker
        • NVIDIA Kubernetes Device Plugin: k8s-device-plugin
      • github.com/mike-wendt
        • Jenkins Plugin For NVIDIA: Coming soon
        • Docker + NVIDIA Runtime on Ubuntu: nvidia-docker-ubuntu

  25. THANK YOU Mike Wendt @mike_wendt github.com/nvidia github.com/mike-wendt
