Computation Reuse in DNNs by Exploiting Input Similarity
Marc Riera, Jose Maria Arnau, Antonio González
Sequence Processing Applications (figure: speech audio signal) 4/06/2018 ISCA 2018
Sequence Processing Applications: Speech Recognition
DNN executions classify a sequence of audio frames into phonemes
Benchmarks
DNN Name    DNN Type   DNN Application        #Parameters   Accuracy
Kaldi       MLP        Acoustic Scoring       4.7M          89.04%
EESEN       RNN        Speech Recognition     11M           68.85%
C3D         CNN        Video Classification   78M           93.48%
AutoPilot   CNN        Self-Driving Cars      1.6M          99.63%
Input Similarity
[Figure: bar chart of input similarity (%) for Kaldi, C3D, Autopilot, EESEN and the average. Bar labels on the slide: 77%, 69%, 61%, 52%, 45%.]
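As a rough illustration of the metric plotted here, the sketch below counts the fraction of input elements that are identical in consecutive executions. The function name `input_similarity`, the `(num_frames, num_inputs)` array layout, and the exact-equality comparison are assumptions made for illustration; the slides do not specify how inputs are compared.

```python
import numpy as np

def input_similarity(frames):
    """Fraction of inputs that equal the same input in the previous frame.

    frames: array of shape (num_frames, num_inputs) (hypothetical layout).
    """
    frames = np.asarray(frames)
    if frames.shape[0] < 2:
        raise ValueError("need at least two frames to compare")
    # Compare every frame with its predecessor, element by element.
    unchanged = frames[1:] == frames[:-1]
    return float(unchanged.mean())

# Toy data: in each step exactly one of the four inputs changes.
frames = np.array([[1, 2, 3, 4],
                   [1, 2, 0, 4],
                   [1, 5, 0, 4]])
similarity = input_similarity(frames)
print(f"input similarity: {similarity:.0%}")
```

In this toy sequence 6 of the 8 compared inputs are unchanged, so the function reports 75%.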
Exploiting Temporal Similarity: Example (Baseline)
Neuron N with inputs $x_0, x_1, x_2$, weights $w_0, w_1, w_2$ and bias $b$.
Frame i: $y^{i} = w_0 x_0^{i} + w_1 x_1^{i} + w_2 x_2^{i} + b$
Frame i+1: $y^{i+1} = w_0 x_0^{i+1} + w_1 x_1^{i+1} + w_2 x_2^{i+1} + b$
Exploiting Temporal Similarity: Example (Proposal)
Neuron N with inputs $x_0, x_1, x_2$, weights $w_0, w_1, w_2$ and bias $b$.
Frame i: $y^{i} = w_0 x_0^{i} + w_1 x_1^{i} + w_2 x_2^{i} + b$
Frame i+1 (only $x_0$ changes): $y^{i+1} = y^{i} + (x_0^{i+1} - x_0^{i})\, w_0$
Number of computations before = 6
Number of computations after = 2
Note: Subtraction of the inputs is almost negligible since it is performed once per input.
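The example on this slide can be checked numerically with a minimal sketch of a single neuron. The function names, the concrete weights and inputs, and the operation counter are all hypothetical. Following the slide's note, the counter tallies one multiply and one add per processed input and leaves out the input subtraction.

```python
def neuron_baseline(x, w, b):
    """Full evaluation: one multiply and one add per input."""
    y, ops = b, 0
    for xi, wi in zip(x, w):
        y += wi * xi
        ops += 2
    return y, ops

def neuron_reuse(y_prev, x_prev, x_curr, w):
    """Correct the previous output using only the inputs that changed."""
    y, ops = y_prev, 0
    for xp, xc, wi in zip(x_prev, x_curr, w):
        if xc != xp:
            # The subtraction (xc - xp) is not counted, as in the slide's note.
            y += (xc - xp) * wi
            ops += 2
    return y, ops

w, b = [2, -1, 3], 1          # hypothetical weights and bias
x_i = [4, 5, 6]               # inputs for frame i
x_next = [7, 5, 6]            # frame i+1: only the first input changes

y_i, ops_full = neuron_baseline(x_i, w, b)
y_full, _ = neuron_baseline(x_next, w, b)
y_reuse, ops_reuse = neuron_reuse(y_i, x_i, x_next, w)

print(y_full, y_reuse)        # both evaluations agree
print(ops_full, ops_reuse)    # 6 operations versus 2
```

Integer values are used so that the corrected output matches the full evaluation exactly; with floating-point data the two results can differ by rounding error.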
Computation Reuse
[Figure: bar chart of computation reuse (%) for Kaldi, C3D, Autopilot, EESEN and the average. Bar labels on the slide: 79%, 74%, 66%, 55%, 53%.]
DNN Processing Unit Tile
FC Execution in the Reuse Accelerator (1)
FC Execution in the Reuse Accelerator (2)
FC Execution in the Reuse Accelerator (3)
Other Supported Layers: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN)
Evaluation Methodology
• Simulator to evaluate the performance and energy of the accelerator
• Design Compiler to obtain power and delay of logic modules
• 28/32nm library from Synopsys and the DesignWare logic modules
• CACTI used for SRAM and eDRAM memories
• Micron LPDDR4 for main memory
• Accelerator Configuration:
Memory Footprint Overheads
[Figure: bar chart of memory increase (%), y-axis from 0 to 20, for the on-chip IO buffer and the off-chip main memory.]
Results: Speedup
Results: Energy Savings
Conclusions
• More than 60% of the inputs remain unmodified with respect to the previous execution
• Our proposed scheme checks which inputs have changed:
  • Unmodified inputs are ignored, avoiding computations and memory accesses
  • Modified inputs are used to correct the previous output of each neuron
• On average, 63% energy savings and 3.5x speedup
• Small area overhead of less than 1%, mainly for additional storage
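The scheme summarized above can be sketched in software for a whole fully connected layer. This is a minimal NumPy illustration under stated assumptions, not the accelerator's implementation: the function name `fc_reuse`, the `(num_inputs, num_neurons)` weight layout, and the `skipped_fraction` measure are hypothetical, and the stored outputs are assumed to be the linear (pre-activation) values so that the correction stays exact.

```python
import numpy as np

def fc_reuse(y_prev, x_prev, x_curr, W):
    """Update the outputs of a fully connected layer from the previous execution.

    W has shape (num_inputs, num_neurons); y_prev holds the previous
    pre-activation outputs, shape (num_neurons,).
    """
    # Check which inputs have changed since the previous execution.
    changed = np.flatnonzero(x_curr != x_prev)
    # One subtraction per changed input, shared by all neurons.
    delta = x_curr[changed] - x_prev[changed]
    # Only the weight rows of changed inputs are read and multiplied.
    y_curr = y_prev + delta @ W[changed]
    skipped_fraction = 1.0 - changed.size / x_curr.size
    return y_curr, skipped_fraction

rng = np.random.default_rng(0)
W = rng.integers(-3, 4, size=(8, 4))   # hypothetical integer weights
b = rng.integers(-2, 3, size=4)        # hypothetical biases
x_prev = rng.integers(0, 5, size=8)
x_curr = x_prev.copy()
x_curr[[1, 6]] += 1                    # two of the eight inputs change

y_prev = x_prev @ W + b                # full evaluation for the first frame
y_curr, skipped_fraction = fc_reuse(y_prev, x_prev, x_curr, W)

print(np.array_equal(y_curr, x_curr @ W + b))
print(skipped_fraction)
```

The bias needs no special handling because it is already contained in the stored previous output. When no input changes, `changed` is empty and the previous outputs are returned unmodified.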