Industrial-Level Deep Learning Training Infrastructure — Practice and Experience from SenseTime
Shengen Yan, SenseTime Group Limited
The Success of Deep Learning
[Chart: Google search interest in deep learning, 2006–2016; interest surges after AlexNet won ImageNet in 2012.]
What Led to the Success?
Model Capacity: The Key to High Performance
[Chart: number of layers. LeNet: 5, AlexNet (2012): 8, GoogLeNet (2014): 22, ResNet (2016): 169, Ours: 1207]
Computation Power
[Chart: training time reduced from years to months, weeks, days.]
Accelerate the training time from several years to several days!
01 Deep Learning Package: a deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: a large-scale cluster platform designed for deep learning.
03 Applications: delivers many application models.
Deep Learning is Complicated
GoogLeNet (2014)
The deep learning community developed frameworks to make life easier.
Deep Learning Training Frameworks
‣ SenseTime Deep Learning Training Package
• Both model parallelism & data parallelism
• Scalability
• Memory efficient
• Computation efficient
• Supports huge models
Memory Footprint Optimization
Optimizations: liveness analysis and high-level compiler backend optimization algorithms applied to the computation graph's intermediate representation.
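The liveness-analysis idea can be illustrated with a small sketch (hypothetical; not the actual compiler backend from the slides): on a linearized computation graph, a tensor's buffer can be recycled as soon as its last consumer has executed.

```python
# Hypothetical sketch of liveness-based buffer reuse on a linearized
# computation graph. A tensor's memory is recycled after its last use.

def plan_buffers(ops):
    """ops: list of (output_name, [input_names]) in execution order."""
    # Liveness analysis: find the last step at which each tensor is read.
    last_use = {}
    for step, (out, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = step

    free_buffers = []          # pool of recycled buffer ids
    assignment, next_id = {}, 0
    for step, (out, inputs) in enumerate(ops):
        # Allocate a buffer for the output, reusing a freed one if possible.
        if free_buffers:
            buf = free_buffers.pop()
        else:
            buf = next_id
            next_id += 1
        assignment[out] = buf
        # Release buffers of inputs whose last use is this step.
        for name in inputs:
            if last_use.get(name) == step and name in assignment:
                free_buffers.append(assignment[name])
    return assignment

# Example: a 4-op chain needs only 2 physical buffers instead of 4.
ops = [("a", ["x"]), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
print(plan_buffers(ops))   # {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```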
Memory Footprint Optimization
[Figure: generated computation graph with mirror (re-compute) nodes.]
Chen T, Xu B, Zhang C, et al. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016.
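A minimal sketch of the re-compute ("mirror node") idea from Chen et al. (2016), here expressed with PyTorch's checkpointing utility as a stand-in for the framework on the slide (the model and sizes are assumptions for illustration): activations inside each checkpointed segment are dropped in the forward pass and recomputed during backward, trading extra compute for lower memory.

```python
# Re-compute (sublinear memory) sketch using torch.utils.checkpoint.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of blocks (toy example, not from the slides).
blocks = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                         for _ in range(40)])

x = torch.randn(32, 512, requires_grad=True)
# Split the 40 blocks into 8 segments; only segment-boundary activations are
# kept, the rest are recomputed during the backward pass.
# (Newer PyTorch versions may warn about the use_reentrant flag.)
y = checkpoint_sequential(blocks, 8, x)
y.sum().backward()
```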
Model Capacity
[Chart: memory usage efficiency (higher is better) on VGG, ResNet50, ResNet152, Inception V4, ResNet269, and Inception-ResNet for Ours, MXNet, TensorFlow, Chainer, Caffe, and Torch.]
Single-GPU Performance (milliseconds / iteration)
Framework     Batch-32   Batch-64   Batch-128
Caffe         497.5      1045       1965
Chainer       200        290        543
TensorFlow    178.6      315.7      587.2
Parrots       122.7      225.6      471
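Milliseconds-per-iteration figures like the ones above are usually measured with a warm-up phase followed by an averaged timed loop, synchronizing the GPU before reading the clock. The sketch below is a generic measurement loop (assumed PyTorch tooling; not the actual benchmark behind the slide).

```python
# Generic timing loop for ms/iteration of one training step.
import time
import torch

def ms_per_iteration(model, batch, target, optimizer, loss_fn,
                     iters=50, warmup=10):
    def sync():
        if torch.cuda.is_available():
            torch.cuda.synchronize()   # wait for queued GPU kernels to finish

    for i in range(warmup + iters):
        if i == warmup:
            sync()                     # exclude warm-up from the measurement
            start = time.perf_counter()
        optimizer.zero_grad()
        loss = loss_fn(model(batch), target)
        loss.backward()
        optimizer.step()
    sync()
    return (time.perf_counter() - start) / iters * 1000.0
```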
Communication Optimization
Supports multiple GPUs and multiple nodes.
Three procedures: Copy, Allreduce, Copy.
Optimizations:
• Master-slave threads to overlap communication and computation
• GPU direct communication
• Ring allreduce message passing
[Diagram: GPU0–GPU3 copy gradients to/from CPU memory; allreduce with other nodes.]
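To illustrate the ring allreduce pattern named on the slide, here is a pure-Python simulation over N workers (an illustrative sketch, not the customized communication library): N-1 reduce-scatter steps followed by N-1 allgather steps leave every worker with the full sum, while each link only carries about 2(N-1)/N of the data.

```python
import numpy as np

def ring_allreduce(grads):
    n = len(grads)
    # Each worker splits its gradient vector into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Reduce-scatter: at each step, worker r sends chunk (r - step) % n to its
    # right neighbor, which adds it in. Sends are buffered so all workers act
    # "simultaneously" within a step.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n].copy())
                 for r in range(n)]
        for r, c, data in sends:
            chunks[(r + 1) % n][c] += data
    # Now worker r holds the fully reduced chunk (r + 1) % n.

    # Allgather: circulate the reduced chunks around the ring so every worker
    # ends up holding all of them.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n].copy())
                 for r in range(n)]
        for r, c, data in sends:
            chunks[(r + 1) % n][c] = data

    return [np.concatenate(c) for c in chunks]

# Example: 4 workers with gradient vectors of 1s, 2s, 3s, 4s; all end with 10s.
grads = [np.full(8, i + 1) for i in range(4)]
print(ring_allreduce(grads)[0])
```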
Scalability
[Charts: milliseconds/iteration vs. # GPUs on a single node (1–4 GPUs), and scaling efficiency vs. # GPUs across multiple nodes (8–32 GPUs).]
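The slide does not give the exact formula behind the "scale efficiency" curve; a common definition under weak scaling (fixed per-GPU batch size) is shown below as an assumption.

```python
def scale_efficiency(ms_per_iter_1gpu, ms_per_iter_ngpu):
    """Weak-scaling efficiency: 1.0 when iteration time stays flat as GPUs
    are added with a fixed per-GPU batch size."""
    return ms_per_iter_1gpu / ms_per_iter_ngpu

# Example (hypothetical numbers): 122.7 ms on 1 GPU vs. 140 ms on 32 GPUs.
print(scale_efficiency(122.7, 140.0))   # ~0.88
```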
01 Deep Learning Package: a deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: a large-scale cluster platform designed for deep learning.
03 Applications: delivers many application models.
The Role of the Supercomputer
Like the highway system in a city, it is a key infrastructure for AI.
Supercomputing Centers for AI
The key infrastructure for AI research.
[Diagram: DeepLink at the center of DATA, COMPUTATION, and MODEL.]
Challenges
‣ Interconnects at multiple levels
• GPUs, nodes, sub-networks
‣ Distributed data
• Random access becomes particularly difficult
‣ Scale vs. stability
• Failures of individual nodes/links
‣ Human resources
• Engineers who understand both deep learning & HPC are difficult to come by
DeepLink: Clusters Designed for Deep Learning
Software/hardware co-design: maximize respective strengths while ensuring optimal cooperation.
High-performance hardware:
• High-speed interconnects
• High-performance GPU computing
• Efficient distributed storage
Customized middlewares:
• Distributed storage & cache system (optimized for small files)
• Distributed deep learning framework
• Task scheduling & monitoring
Platform Overview (software stack, top to bottom):
• Deep learning training visualization system
• Task scheduling system
• Distributed training software | Computation library | Customized communication library for deep learning
• High-speed storage | Lightweight virtualization system | Distributed cache system
• Platform operation/maintenance/monitoring system
• Heterogeneous deep learning supercomputer
Training Visualization
DeepLink in SenseTime: >3000 GPUs
01 Deep Learning Package: a deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: a large-scale cluster platform designed for deep learning.
03 Applications: delivers many application models.
THANK YOU