A Case for Dynamic Activation Quantization in CNNs


  1. A Case for Dynamic Activation Quantization in CNNs Karl Taht, Surya Narayanan, Rajeev Balasubramonian University of Utah

  2. Overview • Background • Proposal • Search Space • Architecture • Results • Future Work

  3. Improving CNN Efficiency • Stripes: Bit-Serial Deep Neural Network Computing • Per-layer bit precisions net significant savings with <1% accuracy loss • Brute-force approach to find the best quantization, retraining at each step! • Good end result, but expensive! • Weight-Entropy-Based Quantization for Deep Neural Networks • Quantizes both weights and activations • Guided search to find the optimal quantization (entropy and clustering) • Still requires retraining, still a passive approach • Can we exploit adaptive reduced precision during inference?

  4. Proposal: Adaptive Quantization Approach (AQuA) • Most images contain regions that are irrelevant to the classification task • Can we avoid computing on those regions altogether? • Quantize such regions completely, down to 0 bits • More simply: crop them!
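The core idea fits in a few lines: quantizing a region to 0 bits is equivalent to never computing it, i.e. cropping. A minimal sketch; the array shape and border width are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Illustrative sketch: a "0-bit" region contributes nothing, so it can
# simply be sliced away instead of being computed and then discarded.
act = np.random.rand(64, 56, 56)      # channels x H x W activation map (assumed size)
border = 8                            # hypothetical crop width

zeroed = act.copy()
zeroed[:, :border, :] = 0             # quantize the top strip to 0 bits
cropped = act[:, border:, :]          # equivalently: just drop it

print(zeroed.shape, cropped.shape)    # (64, 56, 56) (64, 48, 56)
```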

  5. Proposal: Activation Cropping

  6. Proposal: Activation Cropping • Concept: add a lightweight predictor up front; save computations in the layers that follow (diagram callouts over the network pipeline)

  7. Search Space – How to Crop • Exploit domain knowledge • Information is typically centered within the image (>55% in our tests) • Utilize a regular pattern • Less control logic required • Maps more easily to different hardware • Added bonus: while objects are centered, the majority of the area (and thus computation) is on the outside!
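A quick back-of-the-envelope check of the "majority of area is on the outside" point, assuming a 224x224 input and a hypothetical border width (neither number is from the talk):

```python
# Fraction of pixels (and hence of per-pixel computation) that lie in the
# border when a regular crop of width n is taken from every side.
def border_fraction(h, w, n):
    inner = max(h - 2 * n, 0) * max(w - 2 * n, 0)
    return 1.0 - inner / (h * w)

print(border_fraction(224, 224, 25))  # ~0.40: even a 25-pixel border covers ~40% of the area
print(border_fraction(224, 224, 56))  # 0.75: the central quarter holds only 25% of the pixels
```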

  8. Proposal: Activation Cropping • Concept: scale feature maps proportionally across layers (diagram shows crops of N = 25, 10, 8, 5, 2)
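One way to read "scale feature maps proportionally" is that the input-level crop N is shrunk with each layer's spatial resolution. A minimal sketch; the layer sizes are illustrative, so the resulting values will not exactly match the slide's figure.

```python
# Sketch: scale an input-level crop of N pixels down to each layer's
# resolution so the cropped fraction stays roughly constant.
def scaled_crops(input_size, input_crop, layer_sizes):
    return [round(input_crop * s / input_size) for s in layer_sizes]

# Hypothetical 224x224 input, N = 25, and a typical pyramid of feature-map sizes.
print(scaled_crops(224, 25, [224, 112, 56, 28, 14]))  # [25, 12, 6, 3, 2]
```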

  9. Search Space – Crop Directions • We consider 16 possible crops as permutations of top, bottom, left, and right crops, encoded as a vector [TOP, BOTTOM, LEFT, RIGHT] (e.g., [1 0 0 0], [0 1 0 0], [0 0 1 0], [0 0 0 1], [0 1 0 1], [1 0 1 1]) • Unlike traditional pruning, AQuA can exploit image-based information to enhance pruning options
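A small sketch of the 16-entry crop search space and of applying one [TOP, BOTTOM, LEFT, RIGHT] mask to an activation map; the crop width n and the tensor shape are assumptions for illustration.

```python
import itertools
import numpy as np

# The 16 crop patterns: every 0/1 combination of [TOP, BOTTOM, LEFT, RIGHT].
CROPS = list(itertools.product([0, 1], repeat=4))   # 16 masks, [0,0,0,0] = no crop

def apply_crop(activation, mask, n):
    """Remove n rows/columns on each side whose flag is set."""
    top, bottom, left, right = mask
    h, w = activation.shape[-2:]
    return activation[..., n * top : h - n * bottom, n * left : w - n * right]

x = np.random.rand(64, 56, 56)                 # channels x H x W (assumed shape)
print(len(CROPS))                              # 16
print(apply_crop(x, (0, 1, 0, 1), n=8).shape)  # (64, 48, 48)
```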

  10. Quantifying Potentials • While maintaining the original top-1 accuracy, 75% of images can tolerate some type of crop! • Greater savings with top-5 predictions • Technique is invariant to weight quantization (chart: number of edges cropped vs. weight set)

  11. Exploiting Energy Savings with ISAAC • The activation cropping technique can be applied to any architecture • We use the ISAAC accelerator due to its flexibility • Future work includes leveraging additional variable-precision techniques (diagram: inputs and weights mapped onto 1-bit/2-bit crossbar cells, producing 8-bit outputs)

  12. Weight Precision Savings • Diagram: with 2-bit cells, 10-bit weights map to 5 crossbar columns (5x ADC operations), while 16-bit weights need 8 columns (8x ADC operations), all read through a multiplexed 8-bit ADC
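The column and ADC-operation counts on the slide follow from simple arithmetic, assuming 2-bit cells and one ADC read per active column:

```python
import math

# With 2-bit cells, a W-bit weight is spread over ceil(W / 2) crossbar
# columns, and each column costs one read of the multiplexed 8-bit ADC.
def adc_ops(weight_bits, bits_per_cell=2):
    return math.ceil(weight_bits / bits_per_cell)

print(adc_ops(10))  # 5 columns -> 5x ADC operations
print(adc_ops(16))  # 8 columns -> 8x ADC operations
```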

  13. “FlexPoint” Support • Can vary the shift amount to compute fixed-point results with different exponents • Diagram: same 10-bit (5-column) vs. 16-bit (8-column) crossbar layout as the previous slide, with the multiplexed 8-bit ADC
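A sketch of the shift-and-add that such support relies on: per-column partial sums are combined with shifts, and varying the final shift realizes fixed-point formats with different exponents. The function and values below are illustrative, not the accelerator's actual datapath.

```python
# Combine per-column partial sums (most-significant column first) and apply
# a variable final shift so the same hardware can serve different exponents.
def combine_columns(partial_sums, bits_per_cell=2, exponent=0):
    total = 0
    for i, p in enumerate(partial_sums):
        shift = bits_per_cell * (len(partial_sums) - 1 - i)
        total += p << shift
    return total << exponent

print(combine_columns([1, 2, 3]))               # 1*16 + 2*4 + 3 = 27
print(combine_columns([1, 2, 3], exponent=2))   # same digits, shifted exponent: 108
```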

  14. Activation Quantization Savings • K-bit activations (inputs) require K time steps • Diagram: buffered k-bit inputs are streamed one bit per time step into 1-bit/2-bit crossbar cells, producing 8-bit outputs
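A bit-serial sketch of why K-bit activations take K time steps: one activation bit is fed per step, and each step's partial sum is weighted by the corresponding power of two. A pure-Python illustration, not the ISAAC pipeline itself.

```python
# Bit-serial dot product: one time step per activation bit.
def bit_serial_dot(activations, weights, k):
    total = 0
    for step in range(k):                              # K-bit inputs -> K steps
        bits = [(a >> step) & 1 for a in activations]  # this step's bit-slice
        partial = sum(b * w for b, w in zip(bits, weights))
        total += partial << step                       # weight by 2^step
    return total

a, w = [5, 3, 7], [2, 1, 4]                            # 3-bit activations -> 3 steps
print(bit_serial_dot(a, w, k=3))                       # 41
print(sum(x * y for x, y in zip(a, w)))                # 41 (reference result)
```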

  15. Activation Quantization Savings • Fewer computations mean increased throughput, reduced area requirements, and lower energy • (Same bit-serial diagram as the previous slide: K-bit activations require K time steps)

  16. Naive Approach – Crop Everything • Substantial energy savings at a cost to accuracy • Theoretically, can save over 33% energy and maintain original accuracy!

  17. Overall Energy Savings • Adaptive quantization saves 33% on average compared to an uncropped baseline. • Technique can be applied in conjunction with weight quantization techniques with nearly identical relative savings

  18. Future Work • Predict unimportant regions using a “0th” layer with just a few gradient-based kernels (figure: original image, Sobel gradient, detected unimportant regions) • Use variable low-precision computations (not just cropping) • Quantify energy and latency changes due to the additional prediction step but fewer overall computations
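A rough sketch of what such a “0th” gradient-based layer might look like: a Sobel pass whose per-border average gradient decides the [TOP, BOTTOM, LEFT, RIGHT] crop vector. The strip width and threshold are assumptions for illustration, not values from the talk.

```python
import numpy as np
from scipy import ndimage

# Hypothetical "0th layer": flag a border for cropping when its average
# gradient magnitude falls below a threshold (all parameters illustrative).
def predict_crop(image, strip=16, threshold=0.05):
    g = np.hypot(ndimage.sobel(image, axis=0), ndimage.sobel(image, axis=1))
    g = g / (g.max() + 1e-8)
    return [
        int(g[:strip, :].mean() < threshold),   # TOP
        int(g[-strip:, :].mean() < threshold),  # BOTTOM
        int(g[:, :strip].mean() < threshold),   # LEFT
        int(g[:, -strip:].mean() < threshold),  # RIGHT
    ]

img = np.zeros((224, 224))
img[80:140, 80:140] = 1.0        # toy image with a centred object
print(predict_crop(img))         # [1, 1, 1, 1]: all four borders look croppable
```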

  19. Conclusion • Adaptive quantization saves 33% on average compared to an uncropped baseline. • Technique can be applied in conjunction with weight quantization techniques with nearly identical relative savings

  20. Thank you! Questions?
