Building On-prem GPU Training Infrastructure By Stephen Balaban - PowerPoint PPT Presentation

  1. Building On-prem GPU Training Infrastructure By Stephen Balaban CEO, Lambda

  2. Lambda Customers

  3. About Me ● Started using CNNs for face recognition in 2012. ● First employee at Perceptio. We developed image recognition CNNs that ran locally on the iPhone. Acquired by Apple in 2015. ● Published in SPIE and NeurIPS.

  4. Workshop Structure ● Audience survey ● Presentation w/ Q&A ● Q&A + Workshop

  5. 5 Stages of GPU Cloud Grief

  6. It all starts with the Shock of an expensive AWS bill.

  7. Stage 1 - Denial “This won’t happen again next month.”

  8. Stage 2 - Anger “The bill doubled again!”

  9. Stage 3 - Bargaining with your account manager.

  10. Stage 4 - Depression “Spot instances and reserved instances aren’t enough, this is hopeless.”

  11. Stage 5 - Acceptance “GPU cloud services are expensive. Managing hardware is scary.”

  12. Hardware: A Quick Rundown 1. GPUs 2. CPUs 3. GPU-GPU Bandwidth & PCIe Topology

  13. GPUs

  14. GPU Speed Comparisons Source: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/

  15. Performance / $ Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
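The performance-per-dollar ranking in the linked benchmark boils down to dividing measured training throughput by street price. A minimal sketch of that calculation; the throughput and price numbers below are rough illustrative placeholders, not the benchmark's exact figures:

```python
# Rank GPUs by training throughput per dollar.
# images_per_sec and price_usd are illustrative placeholders,
# not the exact numbers from the Lambda benchmark.
gpus = {
    "RTX 2080 Ti": {"images_per_sec": 300, "price_usd": 1199},
    "Titan RTX":   {"images_per_sec": 330, "price_usd": 2499},
    "Tesla V100":  {"images_per_sec": 380, "price_usd": 8999},
}

def perf_per_dollar(stats):
    return stats["images_per_sec"] / stats["price_usd"]

ranked = sorted(gpus.items(), key=lambda kv: perf_per_dollar(kv[1]), reverse=True)
for name, stats in ranked:
    print(f"{name}: {perf_per_dollar(stats):.3f} images/sec per $")
```

Even with placeholder numbers, the shape of the result matches the benchmark's conclusion: the consumer card wins on throughput per dollar despite losing on raw throughput.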

  16. CPUs

  17. What to look for 1. Number of PCIe lanes. (Affects total bandwidth.) 2. NUMA Node Topology. (Affects GPU peering.) Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
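The PCIe-lane point can be made concrete with a back-of-the-envelope budget. The lane counts below are typical examples (e.g. many Xeon CPUs expose 48 lanes), not a spec for any particular SKU:

```python
# Back-of-the-envelope PCIe lane budget for an 8-GPU training node.
# CPU_LANES is a typical single-socket figure, not a specific SKU.
GPUS = 8
LANES_PER_GPU = 16   # each GPU wants a full 16x link
CPU_LANES = 48       # typical single-socket Xeon lane budget

lanes_needed = GPUS * LANES_PER_GPU
print(f"lanes needed: {lanes_needed}, lanes available: {CPU_LANES}")

if lanes_needed > CPU_LANES:
    # This shortfall is why the topologies on the following slides
    # use PCIe switches (PEX 8748/8796) or a second CPU socket.
    print(f"short by {lanes_needed - CPU_LANES} lanes -> "
          f"need PCIe switches or a second socket")
```

Eight full 16x links want 128 lanes, far more than one typical socket provides, which is exactly the constraint the single-root, dual-root, and cascaded topologies on the next slides are working around.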

  18. GPU Peering & PCIe Topology

  19. PCIe Topology (diagram: GPUs attached over 16x PCIe links)

  20. Dual Root PCIe Topology (diagram: two CPUs joined by a CPU-CPU interconnect; each CPU hosts two PEX 8748 switches, with GPUs 0-3 under one CPU and GPUs 4-7 under the other; each arrow is a 16x PCIe connection.) Source: Lambda

  21. Single Root PCIe Topology (diagram: a single CPU with two PEX 8796 switches; GPUs 0-7 all sit under the same root complex; each arrow is a 16x PCIe connection.) Source: Lambda

  22. Cascaded PCIe Topology (diagram: a single CPU connected to a PEX 8796 switch with a second PEX 8796 cascaded behind it; GPUs 0-7 hang off the switches; each arrow is a 16x PCIe connection.) Source: Lambda

  23. NVLink System Topology (diagram: the dual-root PCIe layout of slide 20, with NVLink added between GPUs; green double arrows are NVLink, plain arrows are 16x PCIe connections, and the open circle marks CPU-CPU communication.) Source: Lambda
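The topology slides above determine which GPU pairs can do direct peer-to-peer transfers: in a dual-root system, GPUs under different roots must cross the CPU-CPU interconnect, while a single-root system keeps every pair under one root complex. A sketch counting directly peerable pairs, under the simplifying assumption that P2P works within a root complex and not across sockets:

```python
from itertools import combinations

def peerable_pairs(root_of_gpu):
    """Count GPU pairs under the same root complex.

    root_of_gpu maps GPU index -> root complex id. Simplifying
    assumption: P2P works within a root complex, not across the
    CPU-CPU interconnect.
    """
    return sum(1 for a, b in combinations(sorted(root_of_gpu), 2)
               if root_of_gpu[a] == root_of_gpu[b])

# Dual root (slide 20): GPUs 0-3 under CPU0, GPUs 4-7 under CPU1.
dual_root = {g: (0 if g < 4 else 1) for g in range(8)}
# Single root (slide 21): all eight GPUs under one CPU.
single_root = {g: 0 for g in range(8)}

print(peerable_pairs(dual_root))    # 12 of 28 pairs
print(peerable_pairs(single_root))  # 28 of 28 pairs
```

Only 12 of the 28 GPU pairs can peer directly in the dual-root layout versus all 28 in the single-root layout, which is why single-root boxes are attractive for all-to-all collective communication. On a real machine, `nvidia-smi topo -m` prints the actual link matrix.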

  24. Real Life Examples

  25. Source: ASUS

  26. Single Root Complex (4029GP-TRT2) vs. Dual Root Complex (4028GR-TRT). Source: Supermicro

  27. 1080 Ti GPUDirect Peer-to-Peer Bandwidth Benchmark (chart: bandwidth measured across 16x PCIe links) Source: Lambda

  28. No peering on the new 2080 Ti. (Topology used in this experiment; for the 1080 Ti, no NVLink.) Source: Lambda

  29. Lambda Stack = GPU-enabled frameworks for Ubuntu 16.04 or 18.04. One command:

      LAMBDA_REPO=$(mktemp) && \
      wget -O ${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
      sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
      sudo apt-get update && sudo apt-get install -y lambda-stack-cuda

      Also comes as a Docker container. Source: https://lambdalabs.com/lambda-stack-deep-learning-software

  30. Cost Comparison: On-prem vs. Cloud (p3dn.24xlarge instance). Lambda Hyperplane: $109,008 once. AWS: $160,308/year with reserved pricing. (Add $15,000/year if you want to co-locate instead.)

  31. Cost Comparison: On-prem vs. Cloud (p3.16xlarge instance). Lambda Blade: $28,389 once. AWS: $139,371/year with reserved pricing. (Add $15,000/year if you want to co-locate instead.)

  32. Cost Comparison: On-prem vs. Cloud (p3.8xlarge instance). Lambda Quad: $12,472 once. AWS: $69,729/year with reserved pricing.
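The three cost slides imply a simple break-even calculation: how many months of the AWS reserved-instance bill it takes to cover the one-time hardware price. A sketch using the figures from the slides (ignoring power, co-location, and depreciation):

```python
# Months until the one-time on-prem hardware cost equals the
# cumulative AWS reserved-instance bill, using the slide figures.
# Ignores power, co-location fees, and hardware depreciation.
comparisons = {
    "Lambda Hyperplane vs p3dn.24xlarge": (109_008, 160_308),
    "Lambda Blade vs p3.16xlarge":        (28_389, 139_371),
    "Lambda Quad vs p3.8xlarge":          (12_472,  69_729),
}

for name, (once, per_year) in comparisons.items():
    months = once / (per_year / 12)
    print(f"{name}: breaks even after {months:.1f} months")
```

On these numbers the hardware pays for itself in roughly two to eight months of equivalent reserved-instance spend, which is the core argument of the deck.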

  33. Thank You! Tweet @LambdaAPI @stephenbalaban LAMBDALABS.COM/BLOG
