Towards GPU Utilization Prediction for Cloud Deep Learning
USENIX HotCloud 2020
Gingfung Yeung, Damian Borowiec, Adrian Friday, Richard Harper, Peter Garraghan
Evolving Distributed System Lab, School of Computing & Communications, Lancaster University, UK

Deep Learning (DL) Systems
• Growing number of Machine Learning engineers, researchers, users
• More Deep Learning (DL) workloads
• Expensive GPUs
Require efficient resource usage & high DL performance

DL System Challenges
• Avg. GPU utilization ~52% in production systems [Jeon et al. ’19]
• Long job completion + queue times, up to hours [Jeon et al. ’19; Gu et al. ’19]
Addressed via understanding and exploiting workload patterns

Online profiling approach
Deploy the workload onto isolated machines and GPUs to obtain workload patterns.
[Diagram: Workload → Node (GPU-1, GPU-2) → Resource Monitor → Workload Profile response]
• GPU-1 {Utilization = 20, Memory = 4GiB, Bytes…}
• GPU-2 {Utilization = 40, Memory = 6GiB, Bytes…}
Per-workload profiling usually ranges from minutes to hours.
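A minimal sketch of how such a per-GPU profile could be assembled from monitor samples. It assumes the samples come from `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits`, polled repeatedly while the workload runs; the names `GpuSample`, `parse_nvidia_smi_csv` and `build_profile` are hypothetical and not from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GpuSample:
    index: int
    utilization_pct: float
    memory_used_mib: float


def parse_nvidia_smi_csv(text: str) -> List[GpuSample]:
    """Parse one poll of nvidia-smi CSV output (index, utilization.gpu, memory.used).

    Assumes the noheader,nounits format, so every non-empty line is
    three comma-separated numbers.
    """
    samples = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        idx, util, mem = (field.strip() for field in line.split(","))
        samples.append(GpuSample(int(idx), float(util), float(mem)))
    return samples


def build_profile(polls: List[List[GpuSample]]) -> Dict[int, Dict[str, float]]:
    """Aggregate repeated polls into mean utilization and peak memory per GPU."""
    by_gpu: Dict[int, List[GpuSample]] = {}
    for poll in polls:
        for sample in poll:
            by_gpu.setdefault(sample.index, []).append(sample)
    return {
        gpu: {
            "mean_utilization_pct": sum(s.utilization_pct for s in ss) / len(ss),
            "peak_memory_mib": max(s.memory_used_mib for s in ss),
        }
        for gpu, ss in by_gpu.items()
    }
```

The parser is kept separate from the subprocess call so it can be exercised on captured text without a GPU present.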

DL Metrics
• Iteration time
  • Useful for scale-out workers, migration, SLA-aware inference
  • [Peng et al. ’18; Xiao et al. ’18; Shen et al. ’19]
• Network I/O
  • Useful for efficient distributed training
  • [Gu et al. ’19]
• GPU Utilization
  • For packing and calculating interference
  • [Thinakaran et al. ’19; Xu et al. ’19]

Case: Scheduling
Make decisions based on workload patterns from profiling.
[Diagram: Scheduler, Resource Monitor, Resource Management Framework]
Scheduling Loop
1. Query (Resource Monitor)
2. Issue (Resource Management Framework)
3. Migrate (Resource Management Framework)

Time is Money
If the system has many heterogeneous workloads, profiling each one will lead to head-of-line blocking.
• N workloads × mins
[Diagram: Workload Queue → Profiling Stage (mins) → Scheduling Stage]

Online Profiling
• Pros
  • Accurate, near real-time workload patterns
  • Provides insights to the system
• Cons
  • Heterogeneous workloads require different profiles
  • Time consuming (~mins to ~hours)
  • Requires actual execution on an isolated machine
  • Requires modifying underlying frameworks
Obtain prior to execution?

Prediction
• N workloads × seconds
• Reduces blocking
[Diagram: Workload Queue → Prediction Stage (sub-second – seconds) → Scheduling Stage]
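The "N workloads × mins" versus "N workloads × seconds" comparison can be made concrete with a small calculation. This is only a sketch: it assumes a single stage that handles queued workloads one at a time, and the 5-minute and 1-second per-workload costs are illustrative numbers, not measurements from the talk.

```python
def stage_delay_seconds(position: int, per_workload_seconds: float) -> float:
    """Delay before the workload at 0-based `position` enters the stage,
    assuming the stage processes the queue strictly one workload at a time."""
    return position * per_workload_seconds


def total_stage_seconds(num_workloads: int, per_workload_seconds: float) -> float:
    """Time for the stage to drain a queue of `num_workloads` workloads."""
    return num_workloads * per_workload_seconds


# Illustrative (assumed) per-workload costs.
PROFILING_SECONDS = 5 * 60   # profiling: minutes per workload
PREDICTION_SECONDS = 1.0     # prediction: about a second per workload

queue_length = 100
profiling_total = total_stage_seconds(queue_length, PROFILING_SECONDS)
prediction_total = total_stage_seconds(queue_length, PREDICTION_SECONDS)
print(f"profiling:  {profiling_total / 3600:.1f} h to clear the queue")
print(f"prediction: {prediction_total / 60:.1f} min to clear the queue")
```

Under these assumed costs the last workload in the queue waits hours behind profiling but only minutes behind prediction, which is the head-of-line blocking the slides describe.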

Objective
GPU utilization prediction engine for Cloud DL Systems
Benefits
• Estimates GPU utilization of unseen workloads
• Prior to execution
• No modification of existing DL frameworks
  • E.g. PyTorch, TensorFlow, MXNet…
Analysis, prediction model, case study

DL computation graph
[Figure: network architecture from "Going deeper with convolutions" [Szegedy et al. 2014]]
Leverage graph information to predict workload usage.
Features: Num. Convs, FLOPs, layers, etc. (See paper for full features list)
$f(x) \rightarrow y$
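As a rough sketch of what extracting such features could look like, the example below counts convolutions, layers and FLOPs over a simplified layer list. The `Conv2d` and `Linear` classes and the `extract_features` function are hypothetical stand-ins for a real framework graph, and the FLOP formulas (two operations per multiply-accumulate, bias ignored) are a common convention assumed here, not the paper's definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Conv2d:
    in_channels: int
    out_channels: int
    kernel_size: int   # square kernel assumed
    out_height: int
    out_width: int

    def flops(self) -> int:
        # One multiply-accumulate per kernel element, per input channel,
        # per output element; counted as 2 FLOPs each.
        macs = (self.kernel_size ** 2 * self.in_channels
                * self.out_channels * self.out_height * self.out_width)
        return 2 * macs


@dataclass
class Linear:
    in_features: int
    out_features: int

    def flops(self) -> int:
        return 2 * self.in_features * self.out_features


Layer = Union[Conv2d, Linear]


def extract_features(layers: List[Layer]) -> Dict[str, int]:
    """Build a feature vector x from the graph, to be fed to a predictor f(x)."""
    return {
        "num_convs": sum(isinstance(layer, Conv2d) for layer in layers),
        "num_layers": len(layers),
        "flops": sum(layer.flops() for layer in layers),
    }


toy_model = [Conv2d(3, 64, 3, 32, 32), Linear(512, 10)]
print(extract_features(toy_model))
```

Because the features come from the graph definition alone, nothing has to be executed on a GPU to compute them.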

Analysis
• Profile DL workload utilization
• Determine important model features
• Set up
  • Nvidia 1080, Nvidia 2080, Intel i7-6850K
  • 13 DNN model architectures, 81 workloads (see paper for full list of models and permutations)
• Tools
  • nvidia-smi
  • Nvidia Nsight Systems

Analysis
[Figure: GPU Utilization (%) and GFLOPs for CNN and RNN workloads]

Analysis
[Figures: GPU Utilization (%) on Nvidia 1080 and Nvidia 2080 for batch sizes 16, 64 and 128; Normalized JCT increase (1x to 5x) vs. Summative GPU Utilization (%)]
1.5x – 4x slowdown from co-location
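One way to act on summative GPU utilization is to cap it when packing workloads onto a GPU. The sketch below uses a simple first-fit placement; the 100% cap, the function names and the workload names are assumptions made for illustration and are not the scheduling policy evaluated in the talk.

```python
from typing import Dict, List, Optional


def summative_utilization(utilizations: List[float]) -> float:
    """Sum of the individual GPU utilizations (%) of co-located workloads."""
    return sum(utilizations)


def first_fit(workloads: Dict[str, float],
              num_gpus: int,
              cap_pct: float = 100.0) -> Dict[str, Optional[int]]:
    """Place each workload on the first GPU whose summative utilization
    stays at or below `cap_pct`; None means the workload stays queued.

    `workloads` maps a workload name to its (predicted) utilization in %.
    """
    load = [0.0] * num_gpus
    placement: Dict[str, Optional[int]] = {}
    for name, util in workloads.items():
        placement[name] = None
        for gpu in range(num_gpus):
            if load[gpu] + util <= cap_pct:
                load[gpu] += util
                placement[name] = gpu
                break
    return placement


predicted = {"resnet": 60.0, "lstm": 50.0, "mlp": 30.0}  # hypothetical values
print(first_fit(predicted, num_gpus=2))
```

A lower cap trades utilization for less interference; where to set it would depend on how much slowdown the workloads can tolerate.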

GPU Utilization Prediction
$$\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(p_i + 1) - \log(y_i + 1)\bigr)^2$$
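The expression on this slide reads as a mean squared logarithmic error over n workloads. A minimal sketch of that computation follows, assuming the first term inside each log is the predicted utilization and the second is the measured one (the slide does not label them); the function name `msle` is hypothetical.

```python
import math
from typing import Sequence


def msle(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean squared logarithmic error: mean of (log(p + 1) - log(y + 1))^2.

    Inputs are assumed to be non-negative, e.g. GPU utilization in percent.
    """
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(y)) ** 2
        for p, y in zip(predicted, measured)
    ]
    return sum(squared) / len(squared)


print(msle([20.0, 45.0], [25.0, 40.0]))
```

Working in log space penalizes relative rather than absolute error, so a 5-point miss on a 10% workload costs more than the same miss on a 90% workload.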

Evaluation
[Figure: Avg Cluster GPU Utilization (%) over Time (minutes) for Slot-based, Reactive and Proactive]
• 33.5% makespan reduction
• 61.5% utilization improvement

Open Challenges
• Hardware
  • Number of processing elements, memory bandwidth and cache sizes.
• DL Compilers
  • Extract lower-level IR to determine optimization decisions for more accurate prediction (e.g. Op fusion – ConvBatchNorm).
• Distributed Workload
  • Network I/O, parallelism strategy and system configuration (e.g. ring topology).
• Co-location Scheduling
  • Incorporate prediction and system constraints.
  • Derive an optimization algorithm (e.g. Mixed Integer Programming).
