  1. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks. Vahideh Akhlaghi*, Amir Yazdanbakhsh*†, Kambiz Samadi‡, Rajesh K. Gupta, Hadi Esmaeilzadeh. *Equal contribution. University of California, San Diego; †Georgia Institute of Technology; ‡Qualcomm Technologies, Inc. ISCA '18

  2. CNNs perform trillions of operations to classify a single input (e.g., an image of a dog):

     CNN model  | Operations per inference
     VGG-16     | 16,362,000,000,000 Ops
     AlexNet    |  1,147,000,000,000 Ops
     GoogLeNet  |    283,000,000,000 Ops
     SqueezeNet |    222,000,000,000 Ops

  3. Convolutions dominate CNN computation: across the same four models, ≥ 90% of these operations are in the convolutional layers.

  4. Research challenge: how can CNN computation be reduced with minimal effect on accuracy? Our solution, SnaPEA: (1) leverage the algorithmic structure of CNNs, (2) exploit runtime information, and (3) tune with a static multi-variable optimization.

  5. (1) The algorithmic structure of CNNs guides SnaPEA. [Figure: a CNN pipeline of repeated Conv → ReLU stages followed by normalization and pooling; each kernel's convolution produces an output $x_{lkj}$ that feeds the ReLU, which produces $y_{lkj}$.]

  6. (1) The algorithmic structure of CNNs guides SnaPEA: every convolution output passes through a Rectified Linear Unit (ReLU), $y_{lkj} = \max(0, x_{lkj})$, where $x_{lkj}$ is the corresponding convolution output.

  7. Opportunity to reduce computation. [Figure: percentage of negative inputs to the activation layers in AlexNet, GoogLeNet, SqueezeNet, and VGGNet, alongside GoogLeNet feature maps in which black pixels are zero values.] A large fraction of convolution outputs are negative: 61% on average.

  8. Early termination of convolution. [Figure: input and output feature maps; blue boxes mark the operations performed in two highlighted convolution windows.] Because ReLU maps every negative output to zero, a convolution whose output is known to end up negative can be cut short without changing the result.

  9. (2) Runtime information enables reducing computation. [Figure: Conv → ReLU pipeline and GoogLeNet activation maps.] The distribution of zero and non-zero outputs varies from input to input, so it can only be exploited with runtime information.

  10. SnaPEA principles. By leveraging the algorithmic structure of CNNs and runtime information, SnaPEA (1) reduces computation without accuracy loss, (2) trades accuracy for further computation reduction, and (3) adds minimal hardware overhead.

  11. SnaPEA: an illustrative example. [Figure: an original convolution multiplies a kernel w with mixed-sign weights against all-positive inputs x; the sign of the partial sum fluctuates as terms accumulate, and ReLU maps the final negative result to 0.]

  12. SnaPEA: an illustrative example (exact mode). [Figure: SnaPEA reorders the kernel so all positive weights come first and all negative weights come last. Because the inputs are non-negative (post-ReLU), the partial sum rises monotonically through the positive weights and then falls monotonically through the negative ones; as soon as it drops to or below zero in the negative region, the output is guaranteed to be negative, and the convolution terminates early with output 0.]
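
A minimal NumPy sketch of the exact mode for a single output value (the function name and the flattened window/kernel interface are illustrative assumptions, not the paper's implementation):

    import numpy as np

    def exact_mode_output(x, w):
        # x: flattened, non-negative input window (post-ReLU activations)
        # w: flattened kernel weights
        order = np.argsort(w)[::-1]   # positive weights first, negative last
        acc = 0.0
        for i in order:
            acc += w[i] * x[i]
            # In the negative-weight region every remaining product is <= 0,
            # so a non-positive partial sum can never recover: stop early.
            if w[i] < 0 and acc <= 0:
                return 0.0
        return max(acc, 0.0)          # apply ReLU to the completed sum

Because the early exit fires only when the final sum is provably negative, this mode saves operations with no accuracy loss.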

  13. Potential benefits in the exact mode. [Figure: percentage of negative weights in AlexNet, GoogLeNet, SqueezeNet, and VGGNet.] On average, 54% of the weights are negative, so a large portion of every convolution lies in the early-terminable region.

  14. SnaPEA: an illustrative example (predictive mode). [Figure: before the full convolution, SnaPEA runs a few speculation operations using the largest-magnitude weights. If the speculative partial sum is at or below a threshold th, the output is predicted to be negative and is set to 0 immediately; otherwise ("No"), the remaining multiply-accumulate operations are performed and ReLU is applied as usual.]
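
Continuing the sketch above, the predictive mode might look as follows (th and n are the per-kernel speculation parameters introduced on the next slides; the interface remains illustrative):

    import numpy as np

    def predictive_mode_output(x, w, n, th):
        # Speculate with the n largest-magnitude weights.
        order = np.argsort(-np.abs(w))
        spec = float(np.dot(w[order[:n]], x[order[:n]]))
        if spec <= th:
            return 0.0                        # predict a negative output
        return max(float(np.dot(w, x)), 0.0)  # otherwise run the full convolution

A misprediction here clamps a would-be positive activation to zero; that is the source of the accuracy loss that the optimization on the following slides budgets for.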

  15. Speculation operations. [Figure: kernel weights sorted by absolute value from large to small. Speculation either uses the n largest-magnitude weights directly, or splits the sorted weights into n groups and takes the largest-magnitude weight from each group.]
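
Both selection policies from the figure are easy to express; this is a sketch, and the paper's actual grouping mechanics may differ:

    import numpy as np

    def speculation_indices(w, n):
        # Assumes 1 <= n <= len(w).
        order = np.argsort(-np.abs(w))      # indices, magnitude-sorted descending
        top_n = order[:n]                   # policy 1: the n largest weights
        groups = np.array_split(order, n)   # policy 2: n groups over the sorted list,
        per_group = np.array([g[0] for g in groups])  # largest weight from each group
        return top_n, per_group

Taking one weight per group spreads the speculation operations across the kernel's whole magnitude range instead of concentrating them on its heaviest weights.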

  16. Optimize the level of speculation. Speculation parameters: Th, the prediction threshold, and N, the number of speculation operations. The goal is to find a (Th, N) pair for every kernel in the CNN that minimizes the number of operations while satisfying the accuracy constraint.

  17. Optimize the level of speculation. [Figure: all convolution kernels across layers 1..L pass through three stages — kernel profiling, local optimization, and global optimization — to produce a (Th, N) setting per kernel.]

  18. Kernel profiling: per-kernel sensitivity analysis. [Figure: for each kernel, profiling sweeps the threshold value and records the number of operations performed and the effect on the network's output over profiling inputs (e.g., whether a dog image is still classified as "Dog").]

  21. Local optimization: a set of candidate configurations per layer. [Figure: the profiled kernels of each layer (kernel 1..k) are combined into per-layer sets of (Th, N) configurations.]

  23. Global optimization: adjust the parameters to account for cross-layer effects. [Figure: the per-layer configurations are tuned jointly across layers 1..L and validated end to end on profiling inputs (dog, cat, bird images), since an aggressive setting in an early layer changes the activations seen by every later layer.]
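
A hedged sketch of this three-stage search; the candidate grids and the ops_cost and accuracy callbacks are hypothetical stand-ins, and the paper's optimizer (built on top of Caffe) is more elaborate:

    import itertools

    def tune_speculation(kernels, th_grid, n_grid, ops_cost, accuracy, target):
        # Kernel profiling + local optimization: rank each kernel's candidate
        # (Th, N) pairs by profiled operation count, cheapest first; the final
        # (None, 0) entry means "run this kernel in exact mode".
        ranked = {
            k: sorted(itertools.product(th_grid, n_grid),
                      key=lambda c: ops_cost(k, *c)) + [(None, 0)]
            for k in kernels
        }
        config = {k: ranked[k][0] for k in kernels}   # most aggressive guess
        idx = {k: 0 for k in kernels}

        # Global optimization: while end-to-end accuracy misses the target,
        # step kernels back toward their exact-mode fallback.
        while accuracy(config) < target:
            movable = [k for k in kernels if idx[k] + 1 < len(ranked[k])]
            if not movable:
                break          # every kernel is exact; accuracy is at its ceiling
            k = movable[0]     # simplest policy: relax kernels in a fixed order
            idx[k] += 1
            config[k] = ranked[k][idx[k]]
        return config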

  26. SnaPEA: hardware implementation. [Figure: an n×m array of processing engines (PEs), each built from MAC units paired with Prediction Activation Units (PAUs). In exact mode, a PAU monitors the sign bit of the partial result and terminates the computation once it goes negative; in predictive mode, it compares the speculative partial sum against the threshold and terminates on a predicted-negative output.] Only low-overhead sign and threshold checks are added to the hardware.

  27. SnaPEA: hardware implementation. [Figure: inside a processing engine (PE): K compute lanes, each containing a MAC unit and a Prediction Activation Unit (PAU), fed by a weight and in/out buffer and an index buffer.]

  28. Experimental setup.

  Benchmarks:

     CNN model  | Year | Top-1 accuracy | Top-5 accuracy
     AlexNet    | 2012 | 57.2%          | 80.1%
     GoogLeNet  | 2015 | 68.7%          | 89.0%
     SqueezeNet | 2016 | 57.5%          | 80.3%
     VGG-16     | 2014 | 70.5%          | 89.9%

  Optimization: the optimization algorithm is built on top of Caffe.
  Hardware implementation: cycle-accurate simulation; power estimated with Design Compiler on TSMC 45 nm.
  Baseline design: Eyeriss with the same number of MAC units (256); SnaPEA's area overhead over Eyeriss is 4.5%.

  29. Experimental results. [Figure: speedup over Eyeriss for AlexNet, GoogLeNet, SqueezeNet, VGGNet, and the geometric mean, in exact mode and in predictive mode with accuracy loss bounded by 1%, 2%, and 3%; the bars range from roughly 1.24x to 2.08x and grow as the accuracy budget loosens.]

  30. Experimental results: layers in the predictive mode for accuracy loss ≤ 3%.

     Network    | % of conv layers | Speedup | Energy improvement
     AlexNet    | 60.0             | 2.11x   | 1.97x
     GoogLeNet  | 84.2             | 2.14x   | 2.04x
     SqueezeNet | 65.4             | 1.94x   | 1.84x
     VGGNet     | 61.5             | 1.87x   | 1.73x

  On average, 68% of the layers operate in the predictive mode at a 3% accuracy drop.

  31. Experimental results. [Figure: per-layer speedup.] The highest per-layer speedup is 3.6x, observed in a GoogLeNet layer.

  32. Conclusion. SnaPEA exploits the algorithmic structure of CNNs and runtime information to reduce computation in convolutional layers, controls accuracy with a multi-variable optimization, and adds minimal hardware overhead. Future directions: leverage richer runtime information (e.g., patterns in inputs and activations), extend to other activation functions (e.g., sigmoid), and tune the hardware for more parallelism.
