
Putting Deep Learning Models in Production Sahil Dua @sahildua2305 - PowerPoint PPT Presentation



  1. Putting Deep Learning Models in Production Sahil Dua @sahildua2305

  2. Let’s imagine!

  3. But ...

  4. whoami ➔ Software Developer @ Booking.com ➔ Previously - Deep Learning Infrastructure ➔ Open Source Contributor (Git, Pandas, Kinto, go-github, etc.) ➔ Tech Speaker

  5. Agenda ➔ Deep Learning at Booking.com ➔ Life-cycle of a model ➔ Training Models ➔ Serving Predictions

  6. Deep Learning at Booking.com

  7. Scale highlights ➔ 1,500,000+ room nights booked every 24 hours ➔ 1.4 million+ active properties in 220+ countries

  8. Deep Learning ➔ Image understanding ➔ Translations ➔ Ads bidding ➔ ...

  9. Image Tagging

  10. Image Tagging

  11. Image Tagging ➔ Sea view: 6.38 ➔ Balcony/Terrace: 4.82 ➔ Photo of the whole room: 4.21 ➔ Bed: 3.47 ➔ Decorative details: 3.15 ➔ Seating area: 2.70

  12. (image-only slide)

  13. Image Tagging ➔ Using the image tag information in the right context: Swimming pool, Breakfast Buffet, etc.

  14. Lifecycle of a model

  15. Lifecycle of a model: Data → Train → Deploy → Analysis → (back to Data)

  16. Training a Model - on laptop

  17. Training a Model - on laptop

  18. Machine Learning workload ➔ Computationally intensive workload ➔ Often not highly parallelizable algorithms ➔ 10 to 100 GBs of data

  19. Why Kubernetes (k8s)? ➔ Isolation ➔ Elasticity ➔ Flexibility

  20. Why k8s – GPUs? ➔ In alpha since 1.3 ➔ Speed up 20X-50X
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
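The resource limit on this slide can be placed in context with a fuller pod spec; this is a hypothetical sketch (image name, pod name, and command are made up), using the `alpha.kubernetes.io/nvidia-gpu` resource name the talk references (newer clusters use `nvidia.com/gpu` via device plugins):

```yaml
# Hypothetical training pod requesting one GPU via the alpha resource
# name from the talk; all names here are illustrative, not Booking.com's.
apiVersion: v1
kind: Pod
metadata:
  name: dl-training
spec:
  restartPolicy: Never
  containers:
    - name: train
      image: ml-base/tensorflow:latest   # assumed base image with the ML framework
      command: ["./start.sh"]            # installs and runs the training code
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
```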

  21. Training with k8s ➔ Base images with ML frameworks ◆ TensorFlow, Torch, VowpalWabbit, etc. ➔ Training code is installed at start time ➔ Data access - Hadoop (or PVs)

  22. Startup: the training pod (start.sh, train.py, evaluate.py) pulls in the code

  23. Startup: the training pod mounts the data (PV)

  24. Streaming logs back: the training pod streams logs out while it runs

  25. Exports the model: the training pod writes the trained model back (PV)
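The training-pod flow on slides 22-25 can be sketched as one driver function; every helper name here is a hypothetical stand-in, not Booking.com's actual tooling:

```python
# Hypothetical sketch of the training pod lifecycle (slides 22-25):
# code is installed at start time, data comes from Hadoop or a PV,
# logs stream back while training runs, and the model is exported.

def run_training_pod(fetch_code, fetch_data, train, export_model, log):
    code = fetch_code()         # start.sh installs train.py / evaluate.py
    data = fetch_data()         # e.g. read from Hadoop or a mounted PV
    log("training started")
    model = train(code, data)   # long-running step; logs stream back meanwhile
    log("training finished")
    export_model(model)         # trained model written to shared storage
    return model
```

In a real pod each callable would be a separate step of start.sh; injecting them as functions just keeps the sketch self-contained and testable.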

  26. Serving predictions

  27. Serving Predictions: Client sends Input Features to the Model and gets a Prediction back

  28. Serving Predictions: each client calls its own model - Client → Model 1 → Prediction, ..., Client → Model X → Prediction

  29. Serving Predictions: each client calls its own model - Client → Model 1 → Prediction, ..., Client → Model X → Prediction

  30. Serving Predictions ➔ Stateless app with common code ➔ Containerized ➔ No model in image ➔ REST API for predictions

  31. Serving Predictions: Client sends Input Features to the App (which has loaded the model) and gets a Prediction back

  32. Serving Predictions ➔ Get trained model from Hadoop ➔ Load model in memory ➔ Warm it up ➔ Expose HTTP API ➔ Respond to the probes
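The startup sequence on this slide can be sketched with only the standard library; `load_model` and `predict` are toy stand-ins (the real service fetches a trained network from Hadoop, whose format the talk doesn't show):

```python
# Minimal sketch of the prediction service startup (slide 32).
# All model details are assumed for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def load_model():
    # stand-in for "get trained model from Hadoop"
    return {"weights": [0.5, -0.2]}

def predict(model, features):
    # toy linear scorer in place of the real network
    return sum(w * x for w, x in zip(model["weights"], features))

def warm_up(model):
    # one dummy prediction so the first real request isn't slow
    predict(model, [0.0] * len(model["weights"]))

def make_handler(model):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # "respond to the probes": liveness/readiness endpoint
            self.send_response(200 if self.path == "/healthz" else 404)
            self.end_headers()

        def do_POST(self):
            # prediction endpoint: POST {"features": [...]}
            length = int(self.headers["Content-Length"])
            body = json.loads(self.rfile.read(length))
            out = json.dumps({"prediction": predict(model, body["features"])})
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(out.encode())
    return Handler

def serve(port=8080):
    model = load_model()
    warm_up(model)  # warm before the readiness probe starts passing
    HTTPServer(("", port), make_handler(model)).serve_forever()
```

Calling `serve()` blocks and serves forever, matching the stateless one-model-per-app deployment described on slide 30.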

  33. Serving Predictions: Client → Input Features → Prediction

  34. Serving Predictions: two clients in parallel - Client → Input Features → Prediction, Client → Input Features → Prediction

  35. Deploying a new model ➔ Create new Deployment ➔ Create new HTTP Route ➔ Wait for liveness/readiness probe

  36. Performance: PredictionTime = RequestOverhead + N × ComputationTime, where N is the number of instances to predict on
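The latency model on this slide is a one-liner as code; the timing numbers in the comment are illustrative, not from the talk:

```python
# Slide 36's latency model: the request overhead is paid once,
# computation scales with the number of instances N in the request.

def prediction_time(request_overhead, computation_time, n):
    return request_overhead + n * computation_time
```

With made-up numbers (5 ms overhead, 2 ms per instance), one instance costs 7 ms, while ten instances in a single request cost 25 ms total, i.e. 2.5 ms each; that gap is what the following two slides optimize.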

  37. Optimizing for Latency ➔ Do not predict if you can precompute ➔ Reduce request overhead ➔ Predict for one instance ➔ Quantization (float32 => fixed 8-bit) ➔ TensorFlow-specific: freeze the network & optimize for inference
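The quantization bullet can be shown in miniature with a generic affine float-to-8-bit mapping; this is a sketch of the idea only, not TensorFlow's exact scheme:

```python
# Generic affine quantization sketch for "float32 => fixed 8-bit":
# map each weight onto one of 256 levels between the tensor's min and max.

def quantize(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0                  # guard the all-equal case
    q = [round((w - lo) / scale) for w in weights]  # ints in 0..255
    return q, scale, lo

def dequantize(q, scale, lo):
    # recover approximate floats; error is at most about half of `scale`
    return [v * scale + lo for v in q]
```

Storing 8-bit levels plus one (scale, offset) pair per tensor cuts model size roughly 4x versus float32 and enables faster fixed-point arithmetic, at the cost of small rounding error.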

  38. Optimizing for Throughput ➔ Do not predict if you can precompute ➔ Batch requests ➔ Parallelize requests
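The batching bullet follows directly from the slide-36 cost model: pay the request overhead once per batch instead of once per instance. A minimal sketch, with illustrative timing defaults:

```python
# Request batching sketch: amortize per-request overhead across a batch.
# The 5 ms / 2 ms defaults are made-up numbers for illustration.

def batched(instances, batch_size):
    """Split a list of instances into fixed-size batches."""
    for i in range(0, len(instances), batch_size):
        yield instances[i:i + batch_size]

def total_time(instances, batch_size, overhead=5, per_instance=2):
    # cost per batch = overhead + len(batch) * per_instance (slide 36)
    return sum(overhead + len(b) * per_instance
               for b in batched(instances, batch_size))
```

Sending 10 instances one at a time costs 10 × (5 + 2) = 70 ms, while two batches of five cost 2 × (5 + 10) = 30 ms; larger batches trade per-request latency for throughput.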

  39. Summary ➔ Training models in pods ➔ Serving models ➔ Optimizing serving for latency/throughput

  40. Next steps ➔ Tooling to control hundreds of deployments ➔ Autoscale the prediction service ➔ Hyperparameter tuning for training

  41. Want to get in touch? LinkedIn / Twitter / GitHub: @sahildua2305 ➔ Website: www.sahildua.com

  42. THANK YOU @sahildua2305
