Industrial-Level Deep Learning Training Infrastructure — Practice and Experience from SenseTime
Shengen Yan, SenseTime Group Limited
The Success of Deep Learning
[Chart: Google search interest in deep learning, 2006–2016; interest surges after AlexNet won ImageNet in 2012.]
What Led to the Success?
Model Capacity: The Key to High Performance
[Chart: number of layers. LeNet: 5, AlexNet (2012): 8, GoogLeNet (2014): 22, ResNet (2016): 169, Ours: 1207]
Computation Power
[Chart: training time reduced from years to months, weeks, days.]
Accelerate the training time from several years to several days!
01 Deep Learning Package: a deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: a large-scale cluster platform designed for deep learning.
03 Applications: delivers many application models.
Deep Learning is Complicated
GoogLeNet (2014)
The deep learning community developed frameworks to make life easier.
Deep Learning Training Frameworks
‣ SenseTime Deep Learning Training Package
• Both model parallelism & data parallelism
• Scalability
• Memory efficient
• Computation efficient
• Supports huge models
Memory Footprint Optimization
Optimizations: liveness analysis and high-level compiler backend optimization algorithms applied to the computation graph's intermediate representation.
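The liveness-analysis idea can be illustrated with a small sketch (hypothetical; not the actual compiler backend from the slides): on a linearized computation graph, a tensor's buffer can be recycled as soon as its last consumer has executed.

```python
# Hypothetical sketch of liveness-based buffer reuse on a linearized
# computation graph. A tensor's memory is recycled after its last use.

def plan_buffers(ops):
    """ops: list of (output_name, [input_names]) in execution order."""
    # Liveness analysis: find the last step at which each tensor is read.
    last_use = {}
    for step, (out, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = step

    free_buffers = []          # pool of recycled buffer ids
    assignment, next_id = {}, 0
    for step, (out, inputs) in enumerate(ops):
        # Allocate a buffer for the output, reusing a freed one if possible.
        if free_buffers:
            buf = free_buffers.pop()
        else:
            buf = next_id
            next_id += 1
        assignment[out] = buf
        # Release buffers of inputs whose last use is this step.
        for name in inputs:
            if last_use.get(name) == step and name in assignment:
                free_buffers.append(assignment[name])
    return assignment

# Example: a 4-op chain needs only 2 physical buffers instead of 4.
ops = [("a", ["x"]), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
print(plan_buffers(ops))   # {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```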
Memory Footprint Optimization
[Figure: generated computation graph with mirror (re-compute) nodes.]
Chen T, Xu B, Zhang C, et al. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016.
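A minimal sketch of the re-compute ("mirror node") idea from Chen et al. (2016), here expressed with PyTorch's checkpointing utility as a stand-in for the framework on the slide (the model and sizes are assumptions for illustration): activations inside each checkpointed segment are dropped in the forward pass and recomputed during backward, trading extra compute for lower memory.

```python
# Re-compute (sublinear memory) sketch using torch.utils.checkpoint.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of blocks (toy example, not from the slides).
blocks = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                         for _ in range(40)])

x = torch.randn(32, 512, requires_grad=True)
# Split the 40 blocks into 8 segments; only segment-boundary activations are
# kept, the rest are recomputed during the backward pass.
# (Newer PyTorch versions may warn about the use_reentrant flag.)
y = checkpoint_sequential(blocks, 8, x)
y.sum().backward()
```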
Model Capacity
[Chart: memory usage efficiency (higher is better) on VGG, ResNet50, ResNet152, Inception V4, ResNet269, and Inception-ResNet for Ours, MXNet, TensorFlow, Chainer, Caffe, and Torch.]
Single-GPU Performance (milliseconds / iteration)
Framework     Batch-32   Batch-64   Batch-128
Caffe         497.5      1045       1965
Chainer       200        290        543
TensorFlow    178.6      315.7      587.2
Parrots       122.7      225.6      471
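Milliseconds-per-iteration figures like the ones above are usually measured with a warm-up phase followed by an averaged timed loop, synchronizing the GPU before reading the clock. The sketch below is a generic measurement loop (assumed PyTorch tooling; not the actual benchmark behind the slide).

```python
# Generic timing loop for ms/iteration of one training step.
import time
import torch

def ms_per_iteration(model, batch, target, optimizer, loss_fn,
                     iters=50, warmup=10):
    def sync():
        if torch.cuda.is_available():
            torch.cuda.synchronize()   # wait for queued GPU kernels to finish

    for i in range(warmup + iters):
        if i == warmup:
            sync()                     # exclude warm-up from the measurement
            start = time.perf_counter()
        optimizer.zero_grad()
        loss = loss_fn(model(batch), target)
        loss.backward()
        optimizer.step()
    sync()
    return (time.perf_counter() - start) / iters * 1000.0
```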
Communication Optimization
Supports multiple GPUs and multiple nodes.
Three procedures: Copy, Allreduce, Copy.
Optimizations:
• Master-slave threads to overlap communication and computation
• GPU direct communication
• Ring allreduce message passing
[Diagram: GPU0–GPU3 copy gradients to/from CPU memory; allreduce with other nodes.]
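To illustrate the ring allreduce pattern named on the slide, here is a pure-Python simulation over N workers (an illustrative sketch, not the customized communication library): N-1 reduce-scatter steps followed by N-1 allgather steps leave every worker with the full sum, while each link only carries about 2(N-1)/N of the data.

```python
import numpy as np

def ring_allreduce(grads):
    n = len(grads)
    # Each worker splits its gradient vector into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Reduce-scatter: at each step, worker r sends chunk (r - step) % n to its
    # right neighbor, which adds it in. Sends are buffered so all workers act
    # "simultaneously" within a step.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n].copy())
                 for r in range(n)]
        for r, c, data in sends:
            chunks[(r + 1) % n][c] += data
    # Now worker r holds the fully reduced chunk (r + 1) % n.

    # Allgather: circulate the reduced chunks around the ring so every worker
    # ends up holding all of them.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n].copy())
                 for r in range(n)]
        for r, c, data in sends:
            chunks[(r + 1) % n][c] = data

    return [np.concatenate(c) for c in chunks]

# Example: 4 workers with gradient vectors of 1s, 2s, 3s, 4s; all end with 10s.
grads = [np.full(8, i + 1) for i in range(4)]
print(ring_allreduce(grads)[0])
```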
Scalability
[Charts: milliseconds/iteration vs. # GPUs on a single node (1–4 GPUs), and scaling efficiency vs. # GPUs across multiple nodes (8–32 GPUs).]
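The slide does not give the exact formula behind the "scale efficiency" curve; a common definition under weak scaling (fixed per-GPU batch size) is shown below as an assumption.

```python
def scale_efficiency(ms_per_iter_1gpu, ms_per_iter_ngpu):
    """Weak-scaling efficiency: 1.0 when iteration time stays flat as GPUs
    are added with a fixed per-GPU batch size."""
    return ms_per_iter_1gpu / ms_per_iter_ngpu

# Example (hypothetical numbers): 122.7 ms on 1 GPU vs. 140 ms on 32 GPUs.
print(scale_efficiency(122.7, 140.0))   # ~0.88
```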
01 Deep Learning Package: a deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: a large-scale cluster platform designed for deep learning.
03 Applications: delivers many application models.
The Role of the Supercomputer
Like the highway system in a city, it is a key infrastructure for AI.
Supercomputing Centers for AI
The key infrastructure for AI research.
[Diagram: DeepLink at the center of DATA, COMPUTATION, and MODEL.]
Challenges
‣ Interconnects at multiple levels
• GPUs, nodes, sub-networks
‣ Distributed data
• Random access becomes particularly difficult
‣ Scale vs. stability
• Failures of individual nodes/links
‣ Human resources
• Engineers who understand both deep learning & HPC are difficult to come by
DeepLink: Clusters Designed for Deep Learning
Software/hardware co-design: maximize respective strengths while ensuring optimal cooperation.
High-performance hardware:
• High-speed interconnects
• High-performance GPU computing
• Efficient distributed storage
Customized middlewares:
• Distributed storage & cache system (optimized for small files)
• Distributed deep learning framework
• Task scheduling & monitoring
Platform Overview (software stack, top to bottom):
• Deep learning training visualization system
• Task scheduling system
• Distributed training software | Computation library | Customized communication library for deep learning
• High-speed storage | Lightweight virtualization system | Distributed cache system
• Platform operation/maintenance/monitoring system
• Heterogeneous deep learning supercomputer
Training Visualization
DeepLink in SenseTime: >3000 GPUs
01 Deep Learning Package: a deep learning framework that is efficient, scalable, and flexible.
02 DeepLink: a large-scale cluster platform designed for deep learning.
03 Applications: delivers many application models.
THANK YOU