
Industrial Level Deep Learning Training Infrastructure: the Practice and Experience from SenseTime



  1. Industrial Level Deep Learning Training Infrastructure: the Practice and Experience from SenseTime. Shengen Yan, SenseTime Group Limited.

  2. The Success of Deep Learning [Figure: Google Search chart, 2006-01 to 2016-01, annotated "AlexNet won ImageNet".]

  3. What Led to the Success?

  4. Model Capacity: The Key to High Performance ‣ # Layers: LeNet 5 • AlexNet (2012) 8 • GoogLeNet (2014) 22 • ResNet (2016) 169 • Ours 1207

  5. Computation Power ‣ Years, months, weeks, days: training time accelerated from several years to several days!

  6. 01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible. 02 DeepLink: A large-scale cluster platform designed for deep learning. 03 Applications: Delivers many application models.

  7. Deep Learning is Complicated [Figure: GoogLeNet (2014) architecture.] The deep learning community developed frameworks to make life easier.

  8. Deep Learning Training Frameworks ‣ SenseTime Deep Learning Training Package • Both model parallel & data parallel • Scalability • Memory efficient • Computation efficient • Support huge model

  9. Memory Footprint Optimization ‣ Optimizations: liveness analysis on the computation graph; high-level compiler backend optimization algorithms applied to the intermediate representation.
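
To make the liveness-analysis idea concrete, here is a minimal sketch, not the package's actual implementation. It assumes a hypothetical toy graph given as a list of ops in execution order, each producing one buffer, and frees a buffer as soon as its last reader has run. The names `last_use`, `peak_memory`, and the buffer sizes are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical op format: (output_name, output_size, [input_names]).
Op = Tuple[str, int, List[str]]


def last_use(ops: List[Op], keep: List[str]) -> Dict[str, int]:
    """Map each buffer to the index of the last op that touches it."""
    last: Dict[str, int] = {}
    for i, (out, _, ins) in enumerate(ops):
        last[out] = i              # live at least until it is produced
        for name in ins:
            last[name] = i         # later readers extend the lifetime
    for name in keep:
        last[name] = len(ops)      # graph outputs are never freed
    return last


def peak_memory(ops: List[Op], keep: List[str]) -> int:
    """Peak live size when buffers are freed right after their last use."""
    last = last_use(ops, keep)
    sizes: Dict[str, int] = {}     # currently allocated buffers
    live = peak = 0
    for i, (out, size, _) in enumerate(ops):
        sizes[out] = size
        live += size
        peak = max(peak, live)
        # Release every allocated buffer whose last reader was this op.
        for name in [n for n in sizes if last[n] == i]:
            live -= sizes.pop(name)
    return peak


# Toy chain a -> b -> c -> d, each buffer of size 4 (made-up numbers).
ops = [("a", 4, []), ("b", 4, ["a"]), ("c", 4, ["b"]), ("d", 4, ["c"])]
naive = sum(size for _, size, _ in ops)   # keep everything alive
reused = peak_memory(ops, keep=["d"])     # free after last use
print(naive, reused)
```

In this forward-only toy chain at most two buffers are alive at once. A training graph keeps activations alive until the backward pass, which is where re-computation becomes relevant.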

  10. Memory Footprint Optimization ‣ Generated graph with mirror (re-compute) nodes. Reference: Chen T, Xu B, Zhang C, et al. Training deep nets with sublinear memory cost[J]. arXiv preprint arXiv:1604.06174, 2016.
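
The re-compute idea from the cited paper can be sketched on a simple chain of scalar layers. This is an illustrative toy under stated assumptions, not the code behind the slide: each layer is a hypothetical `(f, df)` pair, only the input of every `k`-th layer is stored in the forward pass, and the activations inside a segment are recomputed when the backward pass reaches it.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical layer format: (function, derivative of function).
Layer = Tuple[Callable[[float], float], Callable[[float], float]]


def forward_checkpointed(x: float, layers: List[Layer], k: int):
    """Run the chain, storing only the input of every k-th layer."""
    checkpoints: Dict[int, float] = {}
    for i, (f, _) in enumerate(layers):
        if i % k == 0:
            checkpoints[i] = x
        x = f(x)
    return x, checkpoints


def backward_checkpointed(layers, checkpoints, k, grad_out=1.0):
    """Backpropagate segment by segment, recomputing dropped activations."""
    grad = grad_out
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, len(layers))
        acts = [checkpoints[start]]            # acts[j] feeds layer start + j
        for i in range(start, end - 1):
            acts.append(layers[i][0](acts[-1]))
        for i in range(end - 1, start - 1, -1):
            grad *= layers[i][1](acts[i - start])
    return grad


def backward_full(x: float, layers: List[Layer]) -> float:
    """Reference: store every activation, then backpropagate."""
    acts = [x]
    for f, _ in layers:
        acts.append(f(acts[-1]))
    grad = 1.0
    for i in range(len(layers) - 1, -1, -1):
        grad *= layers[i][1](acts[i])
    return grad


layers = [(math.sin, math.cos), (lambda v: 1.5 * v, lambda v: 1.5)] * 8
k = math.isqrt(len(layers))                    # about sqrt(n) layers per segment
y, ckpts = forward_checkpointed(0.3, layers, k)
grad_ckpt = backward_checkpointed(layers, ckpts, k)
grad_full = backward_full(0.3, layers)
```

With 16 layers and `k = 4`, the forward pass stores 4 checkpoints instead of 16 activations, at the cost of roughly one extra forward pass during backpropagation.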

  11. Model Capacity [Figure: bar chart of memory usage efficiency (higher is better) for Ours, MXNet, TensorFlow, Chainer, Caffe, and Torch on VGG, ResNet50, ResNet152, Inception V4, ResNet269, and Inception ResNet.]

  12. Single-GPU Performance (milliseconds / iteration)

      | Framework  | Batch-32 | Batch-64 | Batch-128 |
      |------------|----------|----------|-----------|
      | Caffe      | 497.5    | 1045     | 1965      |
      | Chainer    | 200      | 290      | 543       |
      | TensorFlow | 178.6    | 315.7    | 587.2     |
      | Parrots    | 122.7    | 225.6    | 471       |
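
The per-iteration times on this slide can be turned into relative speedups and throughput with a few lines of arithmetic. The sketch below uses only the numbers from the slide. It assumes that "Batch-N" means N samples per iteration, and the helper names `speedup` and `samples_per_second` are made up for illustration.

```python
# Milliseconds per iteration, copied from the slide.
MS_PER_ITER = {
    "Caffe":      {32: 497.5, 64: 1045.0, 128: 1965.0},
    "Chainer":    {32: 200.0, 64: 290.0,  128: 543.0},
    "TensorFlow": {32: 178.6, 64: 315.7,  128: 587.2},
    "Parrots":    {32: 122.7, 64: 225.6,  128: 471.0},
}


def speedup(baseline: str, target: str, batch: int) -> float:
    """How many times faster `target` is than `baseline` at this batch size."""
    return MS_PER_ITER[baseline][batch] / MS_PER_ITER[target][batch]


def samples_per_second(framework: str, batch: int) -> float:
    """Throughput, assuming one iteration processes `batch` samples."""
    return batch / (MS_PER_ITER[framework][batch] / 1000.0)


for batch in (32, 64, 128):
    for name in ("Caffe", "Chainer", "TensorFlow"):
        print(f"batch {batch}: Parrots vs {name}: {speedup(name, 'Parrots', batch):.2f}x")
```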

  13. Communication Optimization ‣ Support multi-GPU and multi-node training ‣ Three procedures: Copy, Allreduce, Copy ‣ Optimizations: • Master-slave threads to overlap communication and computation • GPU direct communication • Ring allreduce message passing [Diagram labels: GPU0, GPU1, GPU2, GPU3, Copy, CPU Memory, Allreduce, Other Nodes.]
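
As an illustration of the ring allreduce item, here is a single-process simulation of the generic algorithm (a reduce-scatter phase followed by an allgather phase). It is a sketch of the textbook technique, not the customized communication library described in the talk. Workers are plain Python lists, and the function name `ring_allreduce` is an assumption.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-allreduce over n workers arranged in a ring."""
    n = len(buffers)
    length = len(buffers[0])
    assert length % n == 0, "sketch assumes the buffer splits into n equal chunks"
    size = length // n
    data = [list(b) for b in buffers]          # do not modify the inputs

    def chunk(c: int) -> slice:
        return slice(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: after n - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so the sends are simultaneous.
        msgs = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][chunk(c)] for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % n
            data[dst][chunk(c)] = [a + b for a, b in zip(data[dst][chunk(c)], payload)]

    # Phase 2, allgather: pass the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][chunk(c)] for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            data[(r + 1) % n][chunk(c)] = payload

    return data


# Four simulated GPUs, each holding an 8-element gradient buffer.
grads = [[float(r * 10 + i) for i in range(8)] for r in range(4)]
reduced = ring_allreduce(grads)
expected = [float(60 + 4 * i) for i in range(8)]
```

Each worker sends only one chunk per step, so the data sent per worker is 2(n - 1)/n times the buffer size and does not grow with the number of workers.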

  14. Scalability [Figure: two panels, single node and multiple nodes; x-axis: # GPUs (1, 2, 3, 4, 8, 16, 24, 32); y-axes: millisec/iter and scale efficiency.]
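
The slide does not define "scale efficiency". One common definition, assumed here, is per-GPU throughput at N GPUs divided by per-GPU throughput of the baseline run. The sketch below uses that definition. The function names and the example timings are hypothetical and are not taken from the chart.

```python
def throughput(samples_per_iter: int, ms_per_iter: float) -> float:
    """Samples processed per second."""
    return samples_per_iter / (ms_per_iter / 1000.0)


def scale_efficiency(base: tuple, scaled: tuple) -> float:
    """Per-GPU throughput of `scaled` relative to `base`.

    Each run is (num_gpus, samples_per_iter, ms_per_iter).
    A value of 1.0 means perfectly linear scaling.
    """
    g0, s0, t0 = base
    g1, s1, t1 = scaled
    return (throughput(s1, t1) / g1) / (throughput(s0, t0) / g0)


# Made-up numbers: 32 samples per GPU, 100 ms on 1 GPU, 110 ms on 8 GPUs.
eff = scale_efficiency((1, 32, 100.0), (8, 256, 110.0))
print(f"scale efficiency: {eff:.3f}")
```

With a fixed per-GPU batch this reduces to the ratio of the two iteration times. With a fixed global batch it reduces to the baseline time divided by N times the scaled time.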

  15. 01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible. 02 DeepLink: A large-scale cluster platform designed for deep learning. 03 Applications: Delivers many application models.

  16. The Role of the Supercomputer ‣ It is like a highway in a city: a key piece of infrastructure for AI.

  17. Supercomputing Centers for AI ‣ The key infrastructure for AI research. [Diagram labels: DATA, COMPUTATION, MODEL, DeepLink.]

  18. Challenges ‣ Interconnects at multiple levels • GPUs, Nodes, Sub-networks ‣ Distributed data • Random access becomes particularly difficult ‣ Scale vs. Stability • Failures of individual nodes/links ‣ Human resources • Engineers who understand both Deep Learning & HPC are difficult to come by

  19. DeepLink Clusters Designed for Deep Learning ‣ Software-Hardware Co-design: maximize respective strengths while ensuring optimal cooperation. ‣ High-performance Hardware • High speed interconnects • High performance GPU computing • Efficient distributed storage ‣ Customized Middlewares • Distributed storage & cache system (optimized for small files) • Distributed deep learning framework • Task scheduling & monitoring

  20. Platform Overview ‣ Software • Deep Learning Training Visualization System • Task scheduling system • Distributed training software • Computation library • Customized communication library for deep learning ‣ Platform • High speed storage system • Lightweight virtualization • Distributed cache system • Operation/Maintenance/Monitoring System • Heterogeneous deep learning super computer

  21. Training Visualization

  22. DeepLink in SenseTime >3000 GPUs

  23. 01 Deep Learning Package: A deep learning framework that is efficient, scalable, and flexible. 02 DeepLink: A large-scale cluster platform designed for deep learning. 03 Applications: Delivers many application models.

  24. THANK YOU
