CNVLUTIN: Ineffectual-neuron-free DNN computing




  1. CNVLUTIN: Ineffectual-neuron-free DNN computing. J. Albericio, P. Judd, T. Hetherington*, T. Aamodt*, N. E. Jerger, A. Moshovos. Please cite the original source.

  2. CNVLUTIN: Ineffectual-neuron-free DNN computing. J. Albericio, P. Judd, T. Hetherington*, T. Aamodt*, N. Enright Jerger, A. Moshovos.

  3.–7. DNNs = SIMD Heaven [diagram build: a wide SIMD datapath of multiply (x) and add (+) lanes, 100's to 1000's of them, all operating in parallel].

  8. CNVLUTIN: Smarter SIMD. 52% performance, 2x ED²P, on out-of-the-box networks.

  9. Outline: (1) What's a CNN? (2) A wide SIMD design. (3) CNVLUTIN: skipping neurons in a wide SIMD design. (4) Evaluation. (5) Our approach.

  10. What's a CNN? [figure: an input image passes through 10's of layers and is classified, e.g., "Korean ... mask!"].

  11.–17. What's a CNN? [diagram build: input neurons are combined with synapses (filters) to produce output neurons].

  18.–19. What's a CNN? [figure: each of the 10's of layers in the "Korean ... mask!" example is a Convolution, ReLU, Pool sequence].

  20. What's a CNN? A typical CNN layer: Convolution (inner products), ReLU (negatives set to 0), Pool (data size reduction).
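
A minimal NumPy sketch (not from the paper; the function names are illustrative) of this typical layer, showing how the ReLU stage turns every negative inner-product result into an exact zero, which is where the runtime zeros measured on the next slides come from:

    import numpy as np

    def conv2d(x, w):
        """Valid 2-D convolution: x is an (H, W) input, w is a (K, K) filter."""
        H, W = x.shape
        K = w.shape[0]
        out = np.zeros((H - K + 1, W - K + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i+K, j:j+K] * w)   # inner product
        return out

    def relu(x):
        return np.maximum(x, 0.0)                         # negatives become exact zeros

    def max_pool(x, s=2):
        H, W = x.shape
        return x[:H//s*s, :W//s*s].reshape(H//s, s, W//s, s).max(axis=(1, 3))

    x, w = np.random.randn(8, 8), np.random.randn(3, 3)
    activations = relu(conv2d(x, w))
    print("fraction of zero neurons:", np.mean(activations == 0))
    print("pooled output shape:", max_pool(activations).shape)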

  21. ~90% of execution time is spent in convolutions.

  22.–24. Lots of Runtime Zeroes [bar chart: fraction of zero neurons in multiplications for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, and AVG; y-axis from 0 to 0.6]. These multiplications are a waste of time and energy, and because the zeros are dynamically generated they are not predictable.

  25. How to compute DNNs: DaDianNao* [unit diagram: an NBin buffer supplies Neuron Lane 0 through Neuron Lane 15; the SB (eDRAM) supplies the synapses of Filter 0 through Filter 15; inner-product units IP0 through IP15 multiply and accumulate, apply the activation function f, and write results to NBout]. *Chen et al., MICRO 2014.

  26.–28. Processing in DaDianNao [diagram: the 16 neuron lanes advance in lock-step with the 16 synapse lanes of each of Filter 0 through Filter 15; every cycle, corresponding neuron and synapse elements are multiplied].
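
A minimal sketch (plain NumPy, not DaDianNao's actual hardware) of this lock-step dataflow: every cycle a group of 16 neurons is broadcast to 16 filter lanes and multiplied against the corresponding synapses, whether or not a neuron is zero. The lane and filter counts follow the slide; everything else is illustrative:

    import numpy as np

    LANES, FILTERS = 16, 16    # neuron lanes and filters processed per cycle

    def dadiannao_cycle(neurons, synapses, partial_sums):
        """neurons: (LANES,); synapses: (FILTERS, LANES); partial_sums: (FILTERS,)."""
        for f in range(FILTERS):
            for lane in range(LANES):
                # Lock-step: the multiply happens even when neurons[lane] == 0.
                partial_sums[f] += neurons[lane] * synapses[f, lane]
        return partial_sums

    neurons = np.where(np.random.rand(LANES) < 0.4, 0.0, np.random.randn(LANES))
    synapses = np.random.randn(FILTERS, LANES)
    print(dadiannao_cycle(neurons, synapses, np.zeros(FILTERS)))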

  29.–33. Zero-skipping in DaDianNao? [diagram: removing the zeros from each neuron lane leaves the lanes with different numbers of values, so the lanes can no longer operate in lock-step].

  34. CNVLUTIN: Decoupling Lanes [diagram comparing the two organizations: in DaDianNao, the 16 neuron lanes and the synapse lanes of Filter 0 through Filter 15 form one wide lock-step unit; in CNVLUTIN, each of Subunit 0 through Subunit 15 pairs a single neuron lane with its own synapse lanes for Filter 0 through Filter 15].

  35.–37. CNVLUTIN: Decoupling Lanes [diagram: each subunit's neuron lane holds only non-zero neuron values together with their offsets; the offset selects the matching synapse of each filter, so every subunit advances at its own pace].
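
A minimal sketch (assumed, not the paper's implementation) of one decoupled lane: the lane keeps only its non-zero neurons plus their offsets, and each offset picks the matching synapse column, so the lane spends steps only on effectual neurons while still producing the same partial sums:

    import numpy as np

    def encode_lane(neuron_window):
        """Keep only the non-zero neurons, paired with their positions (offsets)."""
        offsets = np.flatnonzero(neuron_window)
        return neuron_window[offsets], offsets

    def cnvlutin_lane(neuron_window, synapse_columns):
        """synapse_columns: (FILTERS, window) synapses owned by this neuron lane."""
        values, offsets = encode_lane(neuron_window)
        psums = np.zeros(synapse_columns.shape[0])
        for v, off in zip(values, offsets):       # one step per NON-zero neuron
            psums += v * synapse_columns[:, off]  # the offset selects the synapse
        return psums

    window = np.array([0.0, 1.5, 0.0, 0.0, 2.0])  # 3 of the 5 neurons are zero
    syn = np.random.randn(16, window.size)
    assert np.allclose(cnvlutin_lane(window, syn), syn @ window)  # same result, fewer steps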

  38.–40. CNVLUTIN: Ineffectual-neuron Filtering [diagram: between Layer i and Layer i+1, a Dispatcher reads neurons from eDRAM and feeds the units, and an Encoder writes the units' outputs back to eDRAM in packed form].

  41.–43. CNVLUTIN: Ineffectual-neuron Filtering [packing example with three bricks of neurons: brick 0 = (0, 2, 1, 0) is stored as packed neurons (2, 1) with offsets (2, 1); brick 1 = (0, 0, 0, 0) stores nothing; brick 2 = (7, 6, 5, 0) is stored as packed neurons (7, 6, 5) with offsets (3, 2, 1). The unit buffers receive only these zero-free (ZF) neurons and their offsets].
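
A minimal sketch (my reading of the slide's example; the brick size of 4 is simply what the slide shows) of the encoder's zero-free packing and the corresponding decode, in plain Python:

    BRICK = 4  # brick size used in the slide example; the real design may differ

    def encode_zf(neurons):
        """Per brick, keep the non-zero values plus each value's offset within the brick."""
        bricks = []
        for start in range(0, len(neurons), BRICK):
            brick = neurons[start:start + BRICK]
            offs = [i for i, v in enumerate(brick) if v != 0]
            bricks.append(([brick[i] for i in offs], offs))
        return bricks

    def decode_zf(bricks):
        """Rebuild the dense neuron stream from the (values, offsets) bricks."""
        out = []
        for values, offs in bricks:
            brick = [0] * BRICK
            for v, o in zip(values, offs):
                brick[o] = v
            out.extend(brick)
        return out

    # Slide example, written brick 0 first with the lowest offset first within a brick.
    neurons = [0, 1, 2, 0,  0, 0, 0, 0,  0, 5, 6, 7]
    packed = encode_zf(neurons)   # [([1, 2], [1, 2]), ([], []), ([5, 6, 7], [1, 2, 3])]
    assert decode_zf(packed) == neurons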

  44. CNVLUTIN: Computation Slicing [diagram: Neuron Lane 0, Neuron Lane 1, ..., Neuron Lane 15].

  45. Methodology: in-house timing simulator (baseline + CNVLUTIN); logic + SRAM synthesized for 65nm TSMC; eDRAM modeled with Destiny; DNNs: trained models from the Caffe model zoo.

  46. Area: only +4.5% area overhead.

  47.–49. Speedup, ineffectual = 0 [bar chart: speedup for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, and Geo (geometric mean); higher is better]: 1.37x performance on average.

  50.–51. Loosening the Ineffectual Neuron Criterion. So far the only value CNVLUTIN treats as ineffectual is zero ("If all you have is a hammer, everything looks like a nail", Maslow's hammer). [Example neuron values: 37 0 13 10 15 1 123 0 0 7 1 3 0 1 20 0 18 31 0 33.] Example: consider a neuron ineffectual if its value < 2.

  52.–53. Speedup, ineffectual >= 0 [bar chart: speedup for Alexnet, Google, NiN, VGG19, VGG_M, VGG_S, and Geo, with one bar for "only 0's" and one for "0's and more"; higher is better]: 1.52x performance, no accuracy lost.

  54.–55. Loosening the Ineffectual Neuron Criterion [same example values; thresholds illustrated: consider ineffectual if value < 2, and if value < 8].
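
A minimal sketch (assumed; the slides suggest thresholds are chosen so that accuracy is not affected) of the loosened criterion: instead of dropping only exact zeros, the encoder drops any neuron whose magnitude falls below a threshold:

    def encode_effectual(neurons, threshold=0):
        """Return (value, offset) pairs for the neurons deemed effectual."""
        return [(v, i) for i, v in enumerate(neurons) if abs(v) > threshold]

    # Example values from slides 50-55.
    neurons = [37, 0, 13, 10, 15, 1, 123, 0, 0, 7, 1, 3, 0, 1, 20, 0, 18, 31, 0, 33]
    print(len(encode_effectual(neurons, 0)))   # only exact zeros are ineffectual
    print(len(encode_effectual(neurons, 1)))   # "ineffectual if value < 2"
    print(len(encode_effectual(neurons, 7)))   # "ineffectual if value < 8"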
