
Putting Deep Learning Models in Production Sahil Dua @sahildua2305 - PowerPoint PPT Presentation



  1. Putting Deep Learning Models in Production Sahil Dua @sahildua2305

  2. Let’s imagine!

  3. But ...

  4. whoami ➔ Software Developer @ Booking.com ➔ Previously - Deep Learning Infrastructure ➔ Open Source Contributor (Git, Pandas, Kinto, go-github, etc.) ➔ Tech Speaker

  5. Agenda ➔ Deep Learning at Booking.com ➔ Life-cycle of a model ➔ Training Models ➔ Serving Predictions

  6. Deep Learning at Booking.com

  7. Scale highlights ➔ 1,500,000+ room nights booked every 24 hours ➔ 1.4 million+ active properties in 220+ countries

  8. Deep Learning ➔ Image understanding ➔ Translations ➔ Ads bidding ➔ ...

  9. Image Tagging

  10. Image Tagging

  11. Image Tagging ➔ Sea view: 6.38 ➔ Balcony/Terrace: 4.82 ➔ Photo of the whole room: 4.21 ➔ Bed: 3.47 ➔ Decorative details: 3.15 ➔ Seating area: 2.70

  12. (image-only slide)

  13. Image Tagging ➔ Using the image tag information in the right context: Swimming pool, Breakfast Buffet, etc.

  14. Lifecycle of a model

  15. Lifecycle of a model: Data → Train → Deploy → Analysis → (back to Data)

  16. Training a Model - on laptop

  17. Training a Model - on laptop

  18. Machine Learning workload ➔ Computationally intensive workload ➔ Often not highly parallelizable algorithms ➔ 10 to 100 GBs of data

  19. Why Kubernetes (k8s)? ➔ Isolation ➔ Elasticity ➔ Flexibility

  20. Why k8s – GPUs? ➔ In alpha since 1.3 ➔ Speed up 20X-50X
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
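The resource limit on this slide can be placed in context with a fuller pod spec; this is a hypothetical sketch (image name, pod name, and command are made up), using the `alpha.kubernetes.io/nvidia-gpu` resource name the talk references (newer clusters use `nvidia.com/gpu` via device plugins):

```yaml
# Hypothetical training pod requesting one GPU via the alpha resource
# name from the talk; all names here are illustrative, not Booking.com's.
apiVersion: v1
kind: Pod
metadata:
  name: dl-training
spec:
  restartPolicy: Never
  containers:
    - name: train
      image: ml-base/tensorflow:latest   # assumed base image with the ML framework
      command: ["./start.sh"]            # installs and runs the training code
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
```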

  21. Training with k8s ➔ Base images with ML frameworks ◆ TensorFlow, Torch, VowpalWabbit, etc. ➔ Training code is installed at start time ➔ Data access - Hadoop (or PVs)

  22. Startup: the training pod (start.sh, train.py, evaluate.py) pulls in the code

  23. Startup: the training pod mounts the data (PV)

  24. Streaming logs back: the training pod streams logs out while it runs

  25. Exports the model: the training pod writes the trained model back (PV)
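The training-pod flow on slides 22-25 can be sketched as one driver function; every helper name here is a hypothetical stand-in, not Booking.com's actual tooling:

```python
# Hypothetical sketch of the training pod lifecycle (slides 22-25):
# code is installed at start time, data comes from Hadoop or a PV,
# logs stream back while training runs, and the model is exported.

def run_training_pod(fetch_code, fetch_data, train, export_model, log):
    code = fetch_code()         # start.sh installs train.py / evaluate.py
    data = fetch_data()         # e.g. read from Hadoop or a mounted PV
    log("training started")
    model = train(code, data)   # long-running step; logs stream back meanwhile
    log("training finished")
    export_model(model)         # trained model written to shared storage
    return model
```

In a real pod each callable would be a separate step of start.sh; injecting them as functions just keeps the sketch self-contained and testable.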

  26. Serving predictions

  27. Serving Predictions: Client sends Input Features to the Model and gets a Prediction back

  28. Serving Predictions: each client calls its own model - Client → Model 1 → Prediction, ..., Client → Model X → Prediction

  29. Serving Predictions: each client calls its own model - Client → Model 1 → Prediction, ..., Client → Model X → Prediction

  30. Serving Predictions ➔ Stateless app with common code ➔ Containerized ➔ No model in image ➔ REST API for predictions

  31. Serving Predictions: Client sends Input Features to the App (which has loaded the model) and gets a Prediction back

  32. Serving Predictions ➔ Get trained model from Hadoop ➔ Load model in memory ➔ Warm it up ➔ Expose HTTP API ➔ Respond to the probes
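The startup sequence on this slide can be sketched with only the standard library; `load_model` and `predict` are toy stand-ins (the real service fetches a trained network from Hadoop, whose format the talk doesn't show):

```python
# Minimal sketch of the prediction service startup (slide 32).
# All model details are assumed for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def load_model():
    # stand-in for "get trained model from Hadoop"
    return {"weights": [0.5, -0.2]}

def predict(model, features):
    # toy linear scorer in place of the real network
    return sum(w * x for w, x in zip(model["weights"], features))

def warm_up(model):
    # one dummy prediction so the first real request isn't slow
    predict(model, [0.0] * len(model["weights"]))

def make_handler(model):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # "respond to the probes": liveness/readiness endpoint
            self.send_response(200 if self.path == "/healthz" else 404)
            self.end_headers()

        def do_POST(self):
            # prediction endpoint: POST {"features": [...]}
            length = int(self.headers["Content-Length"])
            body = json.loads(self.rfile.read(length))
            out = json.dumps({"prediction": predict(model, body["features"])})
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(out.encode())
    return Handler

def serve(port=8080):
    model = load_model()
    warm_up(model)  # warm before the readiness probe starts passing
    HTTPServer(("", port), make_handler(model)).serve_forever()
```

Calling `serve()` blocks and serves forever, matching the stateless one-model-per-app deployment described on slide 30.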

  33. Serving Predictions: Client → Input Features → Prediction

  34. Serving Predictions: two clients in parallel - Client → Input Features → Prediction, Client → Input Features → Prediction

  35. Deploying a new model ➔ Create new Deployment ➔ Create new HTTP Route ➔ Wait for liveness/readiness probe

  36. Performance: PredictionTime = RequestOverhead + N × ComputationTime, where N is the number of instances to predict on
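The latency model on this slide is a one-liner as code; the timing numbers in the comment are illustrative, not from the talk:

```python
# Slide 36's latency model: the request overhead is paid once,
# computation scales with the number of instances N in the request.

def prediction_time(request_overhead, computation_time, n):
    return request_overhead + n * computation_time
```

With made-up numbers (5 ms overhead, 2 ms per instance), one instance costs 7 ms, while ten instances in a single request cost 25 ms total, i.e. 2.5 ms each; that gap is what the following two slides optimize.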

  37. Optimizing for Latency ➔ Do not predict if you can precompute ➔ Reduce request overhead ➔ Predict for one instance ➔ Quantization (float32 => fixed 8-bit) ➔ TensorFlow-specific: freeze the network & optimize for inference
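The quantization bullet can be shown in miniature with a generic affine float-to-8-bit mapping; this is a sketch of the idea only, not TensorFlow's exact scheme:

```python
# Generic affine quantization sketch for "float32 => fixed 8-bit":
# map each weight onto one of 256 levels between the tensor's min and max.

def quantize(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0                  # guard the all-equal case
    q = [round((w - lo) / scale) for w in weights]  # ints in 0..255
    return q, scale, lo

def dequantize(q, scale, lo):
    # recover approximate floats; error is at most about half of `scale`
    return [v * scale + lo for v in q]
```

Storing 8-bit levels plus one (scale, offset) pair per tensor cuts model size roughly 4x versus float32 and enables faster fixed-point arithmetic, at the cost of small rounding error.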

  38. Optimizing for Throughput ➔ Do not predict if you can precompute ➔ Batch requests ➔ Parallelize requests
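The batching bullet follows directly from the slide-36 cost model: pay the request overhead once per batch instead of once per instance. A minimal sketch, with illustrative timing defaults:

```python
# Request batching sketch: amortize per-request overhead across a batch.
# The 5 ms / 2 ms defaults are made-up numbers for illustration.

def batched(instances, batch_size):
    """Split a list of instances into fixed-size batches."""
    for i in range(0, len(instances), batch_size):
        yield instances[i:i + batch_size]

def total_time(instances, batch_size, overhead=5, per_instance=2):
    # cost per batch = overhead + len(batch) * per_instance (slide 36)
    return sum(overhead + len(b) * per_instance
               for b in batched(instances, batch_size))
```

Sending 10 instances one at a time costs 10 × (5 + 2) = 70 ms, while two batches of five cost 2 × (5 + 10) = 30 ms; larger batches trade per-request latency for throughput.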

  39. Summary ➔ Training models in pods ➔ Serving models ➔ Optimizing serving for latency/throughput

  40. Next steps ➔ Tooling to control hundreds of deployments ➔ Autoscale the prediction service ➔ Hyperparameter tuning for training

  41. Want to get in touch? LinkedIn / Twitter / GitHub: @sahildua2305 ➔ Website: www.sahildua.com

  42. THANK YOU @sahildua2305
