Large-scale GPU Deep Learning Platform Design and Case Analysis
Zhang Qing, Alfie Lew
YOUR SUCCESS, WE SUCCEED
The AI Age Has Arrived
• Steam Age: 1760s, the first technological revolution
• Electric Age: 1870s, the second technological revolution
• Information Age: 1940s-1950s, the third technological revolution
• AI Age: 2012, the fourth technological revolution
AI Application Trends
• More and more users: the Internet, security and surveillance, finance and health care, car manufacturers, smart city, robots and entertainment
• More and more application scenarios: image/video analysis, speech recognition, NLP/OCR, automobile, household, entertainment, …
Deep Learning Process Flow
• Data preprocessing: raw data sets are cleaned and prepared
• Training: a model is learned from the prepared data sets
• Inference: the trained model processes new inputs (e.g. flagging an abnormal sample, or recognizing the utterance "Thank you")
Deep Learning Computing Characteristics
• Data preprocessing: high I/O intensity
• Training: extreme computing and communication intensity
• Inference: high throughput and low latency
Deep Learning Computing System Trends
• Computing mode
– From single node to clusters
– From local to cloud
• Data storage
– From dedicated (training and inference) to unified storage
• System management
– Development platform
– Production platform
– Cloud platform
• Application mode
– From single user to multi-user
– From single framework to multiple frameworks
Deep Learning Challenges
• Obtaining large amounts of labeled data and preprocessing it efficiently
• Implementing distributed parallel neural network algorithms for speed, scale, and expandability
• Building a large-scale deep learning computing platform
Architecture of a Large-scale Deep Learning System
• App level: image/video apps, NLP apps, speech apps
• Framework level: Caffe-MPI, TensorFlow, Caffe, CNTK, MXNet
• Management level: scheduling management, container image management, monitoring, application analysis (Inspur Teye, Inspur AIStation)
• Platform level: GPU training platform, GPU inference platform, CPU preprocessing platform, parallel storage, 10GbE/IB network
Deep Learning Challenges - Platform-Level Design
• I/O efficiency for data preprocessing
• Computing resources required for modeling, tuning, and optimization
• Inference speed and throughput for processing large numbers of samples
Architecture of the Large-scale Deep Learning Platform
• Computing architecture
– Data preprocessing platform: CPU cluster (Hadoop)
– Training platform: CPU + P100/P40 GPUs (HPC cluster)
– Inference platform: CPU + P4 GPUs
• Data storage
– Offline with Lustre
– Online with HDFS
• Network
– Offline with InfiniBand
– Online with 10GbE
Deep Learning Challenges - Management Layer
• Managing different computing platforms and their configurations/devices
• Managing different frameworks for different computing tasks
• Managing the whole system and monitoring the different computing tasks
Deep Learning Management System
AIStation is deep learning cluster and training-task management software: it rapidly deploys training environments for deep learning, comprehensively manages deep learning training tasks, and provides an efficient, convenient platform for users.
Key functions:
• Deployment of the deep learning environment
• Management of deep learning training tasks
• GPU & CPU monitoring
• GPU resource management and scheduling
• Cluster statistics & reports
AIStation - Workflow
1. User interaction - compose training jobs: resources (GPU), templates (TF1), images (TF/v1.0), parameters (ps, ws, …), data (volume)
2. Resource scheduling - assign GPUs: job starter, TF1.yaml
3. Container installation - applications start: run containers, execute job
4. Containers run - training management: shell access, VNC access, training visualization
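The compose and scheduling steps above can be sketched in a few lines of Python. This is an illustrative toy, not AIStation's API: the field names mirror the slide (template, image, ps/ws parameters, volume), while `compose_job` and `assign_gpus` are hypothetical helpers.

```python
def compose_job(name, gpus, template, image, params, volume):
    """Step 1 (user interaction): describe a training job."""
    return {"name": name, "gpus": gpus, "template": template,
            "image": image, "params": params, "volume": volume}

def assign_gpus(job, free_gpus):
    """Step 2 (resource scheduling): reserve GPUs first-fit."""
    if len(free_gpus) < job["gpus"]:
        raise RuntimeError("not enough free GPUs")
    granted = free_gpus[:job["gpus"]]
    remaining = free_gpus[job["gpus"]:]
    return granted, remaining

# A 2-GPU TensorFlow job submitted to a node with four free GPUs.
job = compose_job("demo-train", gpus=2, template="TF1", image="TF/v1.0",
                  params={"ps": 1, "ws": 2}, volume="/data/samples")
granted, remaining = assign_gpus(job, free_gpus=[0, 1, 2, 3])
print(granted)    # -> [0, 1]
print(remaining)  # -> [2, 3]
```

The remaining steps (container installation, job execution, shell/VNC access) would then consume the granted GPU list.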
AIStation - Integrating Deep Learning Frameworks
• Supports multiple deep learning frameworks: Caffe, TensorFlow, CNTK, etc.
• Supports various models: GoogLeNet, VGG, ResNet, etc.
• One-key deployment of the deep learning environment
• Training job submission & scheduling
• Training process management & visualization
Reported gains: GPU resource utilization +20%, training job throughput +30%
Teye: Application Optimization Analysis Tool
Analyzes application bottlenecks and characteristics:
• GPU driver data: clock, ECC, power
• GPU runtime data: memory utilization, memory copy, cache, SP/DP GFLOPS
• CPU runtime info: AVX, SSE, SP/DP GFLOPS, CPI
Deep Learning Challenges - Framework
• How to select from the many deep learning frameworks? Caffe, TensorFlow, MXNet, CNTK, Torch, Theano, DeepLearning4j, PaddlePaddle, …
• Which framework fits a given scenario and model?
• Use a single framework or multiple frameworks?
A Framework Comparison
• Compute platform: Inspur SR-AI Rack (16 GPUs) + AIStation + Teye (management)
• Frameworks: Caffe, TensorFlow, MXNet
• Models: AlexNet, GoogLeNet
• Performance
– AlexNet: 4675.8 images/s on 16 GPUs (14x over 1 GPU), Caffe is best
– GoogLeNet: 2462 images/s on 16 GPUs (13x over 1 GPU), MXNet is best
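The per-GPU parallel efficiency implied by these speedups (the deck's own 14x and 13x figures on 16 GPUs) can be checked with a short calculation:

```python
def scaling_efficiency(speedup, n_gpus):
    """Parallel efficiency = measured speedup / ideal linear speedup."""
    return speedup / n_gpus

# Numbers from the comparison above.
print(f"AlexNet (Caffe):   {scaling_efficiency(14, 16):.2%}")   # -> 87.50%
print(f"GoogLeNet (MXNet): {scaling_efficiency(13, 16):.2%}")   # -> 81.25%
```

Both runs stay well above 80% efficiency, which is the point of the comparison: communication overhead costs only 2-3 of the 16 GPUs' worth of throughput.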
Factors to Consider when Selecting a Framework
• Model size and complexity
• Application scenario
– Image
– Speech
– NLP
• Data size, when selecting a distributed framework
– Caffe-MPI
– TensorFlow
– MXNet
Deep Learning Challenges - Application Layer
• How to improve recognition accuracy?
– Model design
– Data preprocessing
• How to improve training performance?
– CUDA programming for half precision (Pascal)
– CUDA programming for mixed precision
• How to improve inference performance?
– CUDA programming for INT8
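The INT8 inference idea behind the last bullet can be illustrated without CUDA: quantize float32 weights to 8-bit integers with one per-tensor scale, then dequantize. The symmetric max-abs scheme below is a common textbook approach, shown as a sketch rather than the deck's actual kernels.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights onto [-127, 127] with one per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.75], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)                        # int8 codes
print(np.abs(w - w_hat).max())  # worst-case reconstruction error
```

Inference then runs the matrix math in int8 (fast on P4-class GPUs) and pays only this small reconstruction error in accuracy.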
Deep Learning Applications on GPU
• Speech training (1M samples, 180 dimensions): CPU 256.1 s vs. GPU 115.2 s
• Image training and network security: compared CPU (C + MKL), 1-GPU, and 4-GPU versions [bar chart; values not recoverable]
• Further cases: image search
Deep Learning Platform End-to-End
• Model & algorithm: AlexNet/GoogLeNet/ResNet, CNN/RNN/LSTM
• DL framework (deep learning training platform): Caffe-MPI, TensorFlow, MXNet, PaddlePaddle
• DL management: AIStation management system, T-Eye tuning tool
• Training hardware: GPU cluster (16-card SR-AI Rack, 2U8-card GPU box, 2U4-card, 4U4-card NF5280M4, P8000 workstation), 10G/IB network, flash storage (AS5600/13000 storage)
• Inference and applications: speech/image/video and natural-language AI recognition, e.g. speech recognition ("Big Win!"), face recognition ("This is Daniel Wu"), video monitoring ("Pursuit staff"), medical imaging ("Retinopathy"), personal assistant ("Have booked G6")
Inspur Deep Learning GPU Servers
• NF5280M4: 2-GPU server, inference
• NF5568M4: 4-GPU server, training
• AGX-2: 8-GPU server, training
• SR-AI Rack: 64-GPU server, training
Inspur is a leading AI computing provider, supplying >60% of the AI hardware used by cloud service providers in China.
Thank You Visit us in Booth #911 COMPUTING INSPIRES FUTURE