Tsinghua University Introduction: Deep Learning - PowerPoint Presentation

  1. Deep500 BOF 2018, Jidong Zhai, Tsinghua University

  2. Introduction • Deep learning has been widely used in many areas

  3. Introduction • A wide range of deep learning frameworks (CNTK, …), compute libraries (BLAS, …), and acceleration devices (TPU, …) is available

  4. Introduction • However, how should they be evaluated? With a benchmark spanning frameworks (CNTK, …), compute libraries (BLAS, …), and compute devices (TPU, …)

  5. Introduction • However, how should they be evaluated? A benchmark set shows which option is better and promotes optimization toward targets such as running time, resource use, scalability, and development efficiency, … across frameworks (CNTK, …), compute libraries (BLAS, …), and compute devices (TPU, …)

  6. Related Deep Learning Benchmarks
     • convnet-benchmarks [1]: target: framework, compute library; granularity: neural network; models: only CNN; dataset: ImageNet; metric: time per iteration
     • DeepBench [2]: target: compute library; granularity: basic operation; models: training and inference operations; dataset: dummy data; metric: time
     • DAWNBench [3]: target: framework, compute library, compute device; granularity: neural network; models: 2 CNN + 1 RNN; datasets: CIFAR10, ImageNet, SQuAD; metrics: training time and cost to a certain accuracy
     • TensorFlow Benchmark [4]: target: framework; granularity: neural network; models: 4 CNN; dataset: ImageNet; metric: total training time
     • Limitations: low diversity, limited datasets, single metric
     [1] convnet-benchmarks: https://github.com/soumith/convnet-benchmarks
     [2] Baidu DeepBench: https://github.com/baidu-research/DeepBench
     [3] Cody A. Coleman et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS 2017.
     [4] TensorFlow Benchmarks: https://www.tensorflow.org/performance/benchmarks

  7. Related Deep Learning Benchmarks: MLPerf [1]
     • Evaluation target: framework, compute device
     • Granularity: neural network
     • Diversity (various applications): 1. Image (classification, detection); 2. NLP (translation, sentiment analysis); 3. Speech (recognition); 4. Reinforcement learning and recommendation
     • Dataset (various datasets): ImageNet, COCO, WMT, Librispeech, MovieLens, …
     • Evaluation metrics: training time, power use, and cost to a certain accuracy
     [1] https://mlperf.org/
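DAWNBench and MLPerf report time (and cost) to reach a certain accuracy rather than time per iteration. A minimal sketch of that metric, assuming a training log of (elapsed seconds, validation accuracy) pairs in time order and a flat hourly price; the function names and log format are hypothetical and not taken from either benchmark:

```python
from typing import Optional, Sequence, Tuple

# Hypothetical log format: (elapsed wall-clock seconds, validation accuracy), in time order.
Log = Sequence[Tuple[float, float]]


def time_to_accuracy(log: Log, target: float) -> Optional[float]:
    """Return the first elapsed time at which accuracy reaches `target`, or None."""
    for elapsed_s, accuracy in log:
        if accuracy >= target:
            return elapsed_s
    return None  # target never reached within the log


def cost_to_accuracy(log: Log, target: float, price_per_hour: float) -> Optional[float]:
    """Convert time-to-accuracy into a cost, assuming a flat hourly price."""
    t = time_to_accuracy(log, target)
    return None if t is None else t / 3600.0 * price_per_hour


if __name__ == "__main__":
    # Hypothetical training log, not measured data.
    log = [(600.0, 0.62), (1200.0, 0.81), (1800.0, 0.93), (2400.0, 0.94)]
    print(time_to_accuracy(log, 0.93))       # 1800.0
    print(cost_to_accuracy(log, 0.93, 3.0))  # 1.5
```

Unlike time per iteration, this metric also penalizes optimizations that speed up each step but need more steps to converge.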

  8. How should HPC systems be evaluated for machine learning?

  9. Our Work on Workload Analysis for Deep Learning
     • Preliminary workload analysis
     • Applications, models, and datasets:
       • Image classification: VGG, ResNet (CIFAR)
       • Machine translation: Seq2seq (Tatoeba)
       • Language model: RNN LM (WikiText-2)
       • Question answering: AoA Reader (CBTest)
     • Data: real data vs. dummy data (properties compared: easy to obtain, real time, controllable, generative)
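The slide contrasts real datasets with dummy data that can be generated on the fly and controlled. A minimal sketch of such a generator, assuming only that batch size, image shape, and sequence length need to be adjustable and that runs should be reproducible; the function names are hypothetical, and plain Python lists stand in for the framework tensors a real harness would likely use:

```python
import random
from typing import Iterator, List, Tuple


def dummy_image_batches(
    batch_size: int, channels: int, height: int, width: int,
    num_classes: int, num_batches: int, seed: int = 0,
) -> Iterator[Tuple[List, List[int]]]:
    """Yield (images, labels); images are nested lists shaped [N][C][H][W]."""
    rng = random.Random(seed)  # a fixed seed makes every run identical
    for _ in range(num_batches):
        images = [
            [[[rng.random() for _ in range(width)] for _ in range(height)]
             for _ in range(channels)]
            for _ in range(batch_size)
        ]
        labels = [rng.randrange(num_classes) for _ in range(batch_size)]
        yield images, labels


def dummy_token_batches(
    batch_size: int, seq_len: int, vocab_size: int,
    num_batches: int, seed: int = 0,
) -> Iterator[List[List[int]]]:
    """Yield batches of random token IDs shaped [N][seq_len]."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [[rng.randrange(vocab_size) for _ in range(seq_len)]
               for _ in range(batch_size)]


if __name__ == "__main__":
    images, labels = next(dummy_image_batches(2, 3, 4, 4, num_classes=10, num_batches=1))
    print(len(images), len(images[0]), len(images[0][0]), len(images[0][0][0]))  # 2 3 4 4
    tokens = next(dummy_token_batches(2, seq_len=5, vocab_size=100, num_batches=1))
    print(len(tokens), len(tokens[0]))  # 2 5
```

Because the shapes are parameters, the same generator can sweep input size or sequence length without downloading or preprocessing a dataset.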

  10. Our Work
     • Time
       • Time of each operation type within one iteration
       • Time of each phase within one iteration
     [Figure: time per iteration (ms) for Seq2seq, AoA Reader, RNN LM, ResNet, and VGG, broken down into Data, Forward, Backward, Loss, and Update phases]
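Per-phase timing of one iteration can be collected with a small accumulating timer. A minimal sketch, assuming wall-clock timing on the host; `PhaseTimer` and `busy_work` are hypothetical names, and `busy_work` merely stands in for real data loading, forward, backward, loss, and update code:

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class PhaseTimer:
    """Accumulate wall-clock time (ms) per named phase across iterations."""

    def __init__(self) -> None:
        self.totals_ms: Dict[str, float] = defaultdict(float)
        self.iterations = 0

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        # On a GPU, synchronize the device when entering and leaving a phase,
        # because kernel launches return before the kernels finish.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals_ms[name] += (time.perf_counter() - start) * 1000.0

    def per_iteration_ms(self) -> Dict[str, float]:
        return {k: v / max(self.iterations, 1) for k, v in self.totals_ms.items()}


def busy_work(n: int) -> int:
    """Hypothetical stand-in for a real phase of a training step."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(5):
        for name in ("Data", "Forward", "Backward", "Loss", "Update"):
            with timer.phase(name):
                busy_work(20_000)
        timer.iterations += 1
    for name, ms in timer.per_iteration_ms().items():
        print(f"{name:>8}: {ms:.3f} ms")
```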

  11. Workload Analysis
     • Memory usage
     • Memory usage breakdown
     • Memory usage versus input size
     [Figure: memory use (MB) for training and inference, and the training/inference ratio, versus picture area (pixel²)]
     [Figure: memory use (MB) of Seq2seq, AoA Reader, RNN LM, ResNet, and VGG, broken down into weights and intermediate results + temporaries, for training and inference, with the training/inference ratio]
     [Figure: memory use (MB) for training and inference, and the training/inference ratio, versus sequence length]
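The split between weights and intermediate results helps explain why training needs more memory than inference and why the gap grows with input size. A simplified estimate for a hypothetical fully connected network in float32, assuming plain SGD (no optimizer state) and ignoring workspace buffers and framework overhead; training keeps weights, equally sized gradients, and every layer's activations for backpropagation, while inference only needs the current layer's input and output:

```python
from typing import Sequence

BYTES_PER_FLOAT32 = 4


def weight_bytes(widths: Sequence[int]) -> int:
    """Parameters (weights + biases) of a fully connected net with these layer widths."""
    params = sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))
    return params * BYTES_PER_FLOAT32


def training_bytes(widths: Sequence[int], batch: int) -> int:
    """Weights + same-sized gradients + every layer's activations kept for backprop."""
    activations = sum(widths) * batch * BYTES_PER_FLOAT32
    return 2 * weight_bytes(widths) + activations


def inference_bytes(widths: Sequence[int], batch: int) -> int:
    """Weights + only the largest pair of adjacent activation buffers alive at once."""
    largest_pair = max(a + b for a, b in zip(widths, widths[1:]))
    return weight_bytes(widths) + largest_pair * batch * BYTES_PER_FLOAT32


if __name__ == "__main__":
    widths = [784, 1024, 1024, 10]  # hypothetical MLP
    for batch in (1, 64, 256):
        t, i = training_bytes(widths, batch), inference_bytes(widths, batch)
        print(f"batch={batch:>3}  train={t / 2**20:.1f} MiB  "
              f"infer={i / 2**20:.1f} MiB  ratio={t / i:.2f}")
```

In this model the weight term is fixed while the activation term scales with batch (or input) size, so the training/inference ratio changes as inputs grow.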

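  12. Workload Characterization
     • Hardware counters
     • For GPU: GPU occupancy, warp execution efficiency, warp non-predicated execution efficiency, bandwidth utilization, TFLOPS
     [Figure (normalized): GPU occupancy 0.46, warp execution efficiency 1.00, warp non-predicated execution efficiency 1.00, bandwidth utilization 4.02, TFLOPS 5.65]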
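Several of these counters are ratios of raw hardware events. A minimal sketch of the derived metrics, following the definitions NVIDIA's profiler uses for `achieved_occupancy` and `warp_execution_efficiency`; the inputs are hypothetical readings, the maximum warps per SM is device-dependent (64 below is an assumption), and the TFLOPS helper assumes a dense matrix multiply with a multiply-add counted as 2 FLOPs:

```python
THREADS_PER_WARP = 32  # warp size on NVIDIA GPUs


def achieved_occupancy(avg_active_warps_per_cycle: float, max_warps_per_sm: int) -> float:
    """Average active warps per active cycle / maximum warps an SM supports."""
    return avg_active_warps_per_cycle / max_warps_per_sm


def warp_execution_efficiency(avg_active_threads_per_warp: float) -> float:
    """Average active threads per warp / maximum threads per warp."""
    return avg_active_threads_per_warp / THREADS_PER_WARP


def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS of an (m x k) @ (k x n) product, multiply-add = 2 FLOPs."""
    return 2.0 * m * n * k / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical counter readings and timing, not measurements from the slides.
    print(achieved_occupancy(32.0, max_warps_per_sm=64))   # 0.5
    print(warp_execution_efficiency(32.0))                 # 1.0
    print(round(gemm_tflops(4096, 4096, 4096, 0.025), 2))  # 5.5
```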

  13. Questions about an HPC-Oriented Deep Learning Benchmark
     • Questions to consider:
       • Model selection: various application areas? A synthetic model with the main features?
       • Dataset: a fixed dataset (ImageNet)? Generated data?
       • Metrics: time for training? GFLOPS? AI operations per second?
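Reporting GFLOPS requires counting the arithmetic a model performs. A minimal sketch for one dense 2-D convolution layer, assuming a square kernel, no bias, no grouping, and a multiply-add counted as 2 FLOPs; the function names and the example layer shape are hypothetical:

```python
def conv2d_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height or width of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_flops(batch: int, in_ch: int, out_ch: int, height: int, width: int,
                 kernel: int, stride: int = 1, padding: int = 0) -> int:
    """FLOPs of a dense 2-D convolution (square kernel, no bias, no groups)."""
    h_out = conv2d_output_size(height, kernel, stride, padding)
    w_out = conv2d_output_size(width, kernel, stride, padding)
    multiply_adds = batch * out_ch * h_out * w_out * in_ch * kernel * kernel
    return 2 * multiply_adds


def gflops(flop_count: int, seconds: float) -> float:
    """Achieved GFLOPS given a FLOP count and a measured run time."""
    return flop_count / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical 3x3 layer on a 56x56 feature map with 64 input/output channels.
    flops = conv2d_flops(1, 64, 64, 56, 56, kernel=3, stride=1, padding=1)
    print(flops)                # 231211008
    print(gflops(flops, 0.01))  # for a hypothetical 10 ms run
```

Because the count depends only on layer shapes, it can be computed offline and divided by measured time; it says nothing about accuracy, which is why time-to-accuracy metrics are used alongside it.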
