Automating Cloud Deployment for Deep Learning Inference of Real-time Online Services



  1. Automating Cloud Deployment for Deep Learning Inference of Real-time Online Services. Yang Li, Zhenhua Han, Quanlu Zhang, Zhenhua Li, Haisheng Tan

  2. DNN-driven Real-time Services: speech recognition, image classification, neural machine translation.

  3. Cloud Deployment. Real-time DNN services require low latency: the end-to-end delay is the sum of network transmission time, task scheduling time, and DNN inference time. Cloud deployment therefore faces a trade-off between execution time and economic cost, i.e., cost efficiency.

  4. Cloud Deployment (same slide, with a figure). Figure: inference cost (per 10,000 requests) of different models across different cloud configurations.
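The latency budget and cost metric on these slides can be sketched as follows. All numbers, and the function names `meets_qos` and `cost_per_requests`, are illustrative, not from the talk.

```python
# Hedged sketch: check a deployment against a QoS latency budget and compute
# its cost for a batch of requests. All numbers are made up for illustration.

def meets_qos(network_s, scheduling_s, inference_s, qos_s):
    """End-to-end latency is the sum of the three components on the slide."""
    return network_s + scheduling_s + inference_s <= qos_s

def cost_per_requests(price_per_hour, inference_s, n_requests=10_000):
    """Cost of serving n_requests, matching the slide's per-10,000 figure."""
    return price_per_hour / 3600.0 * inference_s * n_requests

# 20 ms network + 5 ms scheduling + 60 ms inference against a 100 ms QoS:
ok = meets_qos(0.020, 0.005, 0.060, qos_s=0.100)
cost = cost_per_requests(price_per_hour=0.90, inference_s=0.060)
print(ok, round(cost, 3))  # the 10,000-request cost on a $0.90/h machine
```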

  5. Here come the problems. I want to deploy my face recognition service on the cloud. How should I choose the cloud configuration? And given a configuration, how can I minimize the DNN inference time?

  6. Choose Cloud Configurations. Both providers shown on the slide offer over 100 types of cloud configurations! Example: 2 series out of the more than 40 series on Azure.

  7. Reduce DNN Inference Time. A DNN model can have hundreds to thousands of operations, and each operation can be placed on any of a list of feasible devices (e.g., CPUs or GPUs) to reduce execution time. How do we choose the optimal device placement plan? Example: the computation graph of Inception-V3.
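Why placement matters can be seen on a toy chain of operations: some ops are faster on CPU, and moving a tensor between devices costs time. The op names, runtimes, and transfer penalty below are invented for illustration; a real system measures them on the actual hardware.

```python
# Hedged sketch: evaluate a device placement plan on a toy chain of ops.

# Per-op runtime in ms on each device (illustrative numbers).
OP_COST = {
    "embed": {"cpu": 0.8, "gpu": 2.0},   # some ops are faster on CPU
    "conv1": {"cpu": 9.0, "gpu": 1.5},
    "conv2": {"cpu": 8.0, "gpu": 1.2},
}
TRANSFER_MS = 0.7  # cost of moving a tensor between devices

def plan_latency(ops, placement):
    """Sum op runtimes along the chain, adding a transfer cost whenever
    consecutive ops sit on different devices."""
    total, prev_dev = 0.0, None
    for op in ops:
        dev = placement[op]
        total += OP_COST[op][dev]
        if prev_dev is not None and dev != prev_dev:
            total += TRANSFER_MS
        prev_dev = dev
    return total

ops = ["embed", "conv1", "conv2"]
all_gpu = {op: "gpu" for op in ops}
mixed = {"embed": "cpu", "conv1": "gpu", "conv2": "gpu"}
print(plan_latency(ops, all_gpu), plan_latency(ops, mixed))
```

With these made-up numbers the mixed plan beats the all-GPU plan, which is exactly the kind of non-obvious choice a placement optimizer has to discover.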

  8. Challenge. Huge search space: the cloud configuration space times the device placement space. Inference cost = price of the cloud configuration ($/hour) × inference time (seconds/request). This is a black-box optimization problem: how can we automatically determine the cloud configuration and device placement for the inference of a DNN model, so as to minimize the inference cost while satisfying the inference time constraint (QoS)?
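The size of the joint search space and the cost metric can be made concrete with illustrative numbers (the counts below are assumptions, not figures from the talk):

```python
# Hedged sketch of the joint search space and cost metric from the slide.
n_configs = 100        # cloud configurations offered by a provider (slide says 100+)
n_ops = 300            # operations in the DNN graph (illustrative)
n_devices = 4          # feasible devices per operation (illustrative)

placement_space = n_devices ** n_ops          # one device choice per op
joint_space = n_configs * placement_space     # configurations x placements
print(f"joint search space ~ 10^{len(str(joint_space)) - 1} points")

def inference_cost(price_per_hour, inference_seconds):
    """Cost metric from the slide: price ($/hour) * time (s/request)."""
    return price_per_hour / 3600.0 * inference_seconds
```

Even these modest assumptions give a search space far beyond exhaustive enumeration, which is why the talk turns to black-box optimization.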

  9. AutoDeep. Given: a DNN model and an inference time constraint (QoS constraint). Goal: compute the cloud deployment with the lowest inference cost. Two-fold joint optimization: (1) cloud configuration search, via a black-box method, Bayesian Optimization (BO); (2) device placement optimization, cast as a Markov decision process and solved with Deep Reinforcement Learning (DRL).
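The two-fold structure can be sketched as an outer loop over cloud configurations wrapping an inner placement optimizer. Everything below is a toy stand-in: the configuration pool, prices, and the `optimize_placement` stub are invented, and the outer loop just sweeps the pool where BO would propose configurations adaptively.

```python
# Hedged skeleton of the two-fold optimization. Not AutoDeep's implementation.
import random
random.seed(0)

CONFIG_POOL = [  # (name, $/hour, naive inference seconds) -- all illustrative
    ("small-cpu", 0.10, 0.40),
    ("big-cpu",   0.40, 0.20),
    ("one-gpu",   0.90, 0.05),
    ("four-gpu",  3.60, 0.03),
]

def optimize_placement(base_seconds):
    """Stand-in for the DRL inner loop: pretend placement search shaves a
    random fraction off the naive inference time."""
    return base_seconds * random.uniform(0.7, 1.0)

def autodeep_search(qos_seconds):
    """Stand-in for the BO outer loop: here it just sweeps the pool."""
    best = None
    for name, price, base in CONFIG_POOL:
        seconds = optimize_placement(base)
        if seconds > qos_seconds:
            continue  # violates the QoS constraint
        cost = price / 3600.0 * seconds  # $/hour * s/request
        if best is None or cost < best[1]:
            best = (name, cost)
    return best

print(autodeep_search(qos_seconds=0.10))
```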

  10. Black-box Optimization via Bayesian Optimization. Regard the inference cost of a given DNN model under a QoS constraint as a black-box function g; the goal is to minimize g. In each iteration: select a cloud configuration from the configuration pool, optimize the DNN device placement on the selected configuration, and calculate the resulting inference cost (the observation). Iterate until convergence, then output a (nearly) optimal cloud configuration together with the optimized device placement plan of the DNN.
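The loop above can be sketched with a bandit-style simplification of BO over a discrete pool: keep a running estimate of g per configuration plus an exploration bonus, and always evaluate the configuration minimizing the resulting lower confidence bound. A real implementation (and AutoDeep) would use a proper Gaussian-process surrogate and acquisition function; the costs in `noisy_g` are invented.

```python
# Hedged, simplified stand-in for BO over a discrete configuration pool.
import math
import random
random.seed(1)

def noisy_g(x):
    """Black-box stand-in for 'optimize placement, measure inference cost'."""
    true_cost = {"A": 0.30, "B": 0.12, "C": 0.45}[x]  # illustrative costs
    return true_cost + random.gauss(0, 0.01)

def minimize_g(pool, rounds=30, beta=0.3):
    obs = {x: [] for x in pool}
    for t in range(1, rounds + 1):
        def lcb(x):  # lower confidence bound: mean minus exploration bonus
            if not obs[x]:
                return float("-inf")  # try every configuration at least once
            mean = sum(obs[x]) / len(obs[x])
            return mean - beta * math.sqrt(math.log(t) / len(obs[x]))
        x = min(pool, key=lcb)        # most promising configuration
        obs[x].append(noisy_g(x))     # observe its (noisy) cost
    return min(pool, key=lambda x: sum(obs[x]) / len(obs[x]) if obs[x] else float("inf"))

print(minimize_g(["A", "B", "C"]))
```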

  11. Optimize Device Placement with a DRL Model: an encoder, an attention mechanism, and a decoder.
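The idea behind the DRL placement optimizer can be sketched with plain REINFORCE: a policy samples a device per operation, the placement's latency gives a (negative) reward, and the gradient nudges the policy toward faster placements. AutoDeep's actual policy is the seq2seq network above; this toy uses independent per-op logits and made-up latencies to stay self-contained.

```python
# Hedged REINFORCE sketch for device placement (toy, not AutoDeep's model).
import math
import random
random.seed(2)

OPS = ["embed", "conv1", "conv2"]
DEVICES = ["cpu", "gpu"]
LAT = {"embed": {"cpu": 0.8, "gpu": 2.0},   # illustrative latencies (ms)
       "conv1": {"cpu": 9.0, "gpu": 1.5},
       "conv2": {"cpu": 8.0, "gpu": 1.2}}

def softmax(v):
    e = [math.exp(x) for x in v]
    s = sum(e)
    return [x / s for x in e]

def latency(plan):
    return sum(LAT[op][plan[op]] for op in OPS)

logits = {op: [0.0, 0.0] for op in OPS}  # per-op device preferences
lr, baseline = 0.15, None
for step in range(500):
    plan, chosen = {}, {}
    for op in OPS:                       # sample a placement from the policy
        p = softmax(logits[op])
        i = 0 if random.random() < p[0] else 1
        plan[op], chosen[op] = DEVICES[i], i
    r = -latency(plan)                   # reward = negative latency
    baseline = r if baseline is None else 0.9 * baseline + 0.1 * r
    adv = r - baseline                   # advantage w.r.t. moving baseline
    for op in OPS:                       # REINFORCE: grad log pi * advantage
        p = softmax(logits[op])
        for i in range(2):
            grad = (1.0 if i == chosen[op] else 0.0) - p[i]
            logits[op][i] += lr * adv * grad

best = {op: DEVICES[max(range(2), key=lambda i: logits[op][i])] for op in OPS}
print(best, latency(best))
```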

  12. AutoDeep: Architectural Overview

  13. Experiments – Device Placement. Baselines: Google RL, the algorithm designed by Mirhoseini et al. [ICML'17], "Device placement optimization with reinforcement learning", run on 4 K80 GPUs; Expert Designed, the hand-crafted placements given by Mirhoseini et al.; Single GPU, execution on a single GPU.

  14. Experiments – Cloud Configuration Search. Baselines: LCF (Lowest Cost First), which tries configurations in ascending order of their unit price; Uniform, which tries configurations with uniform probability. Figures: inference cost of RNNLM and of Inception-V3 under varying (increasing) QoS constraints.
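The two baseline search strategies are simple enough to sketch directly; the configuration pool and prices below are invented for illustration.

```python
# Hedged sketch of the two baseline strategies over an illustrative pool.
import random
random.seed(3)

POOL = [  # (name, $/hour) -- made-up prices
    ("big-cpu", 0.40), ("four-gpu", 3.60), ("small-cpu", 0.10), ("one-gpu", 0.90),
]

def lcf_order(pool):
    """LCF (Lowest Cost First): try configurations in ascending unit price."""
    return [name for name, price in sorted(pool, key=lambda c: c[1])]

def uniform_order(pool, n_trials):
    """Uniform: each trial draws a configuration with equal probability."""
    return [random.choice(pool)[0] for _ in range(n_trials)]

print(lcf_order(POOL))
print(uniform_order(POOL, 3))
```

Neither baseline uses what it learns from earlier trials, which is the gap the BO-driven search is meant to close.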

  15. Experiments – Search Cost. AutoDeep achieves the lowest search cost on both RNNLM and Inception-V3.

  16. Future Work (contact: liyang14thu@gmail.com). Improve learning efficiency: develop a general network architecture so that re-training is not needed for new DNN inference models; accelerate the DRL training process; … Optimize system efficiency: over 90% of the search time is spent initializing the DNN computation graph; allow placing operations in a fine-grained manner (i.e., without restarting a job).
