A Case for Dynamic Activation Quantization in CNNs. Karl Taht, Surya Narayanan, Rajeev Balasubramonian. University of Utah
Overview • Background • Proposal • Search Space • Architecture • Results • Future Work
Improving CNN Efficiency • Stripes: Bit-Serial Deep Neural Network Computing • Per-layer bit precisions net significant savings with <1% accuracy loss • Brute force approach to find best quantization – retraining at each step! • Good end result, but expensive! • Weight-Entropy-Based Quantization for Deep Neural Networks • Quantize both weights and activations • Guided search to find optimal quantization (entropy and clustering) • Still requires retraining, still a passive approach Can we exploit adaptive reduced precision during inference?
Proposal: Adaptive Quantization Approach (AQuA) • Most images contain regions of irrelevant information for the classification task • Can we avoid such computations altogether? • Quantize whole regions to 0 bits • More simply – crop them!
Proposal: Activation Cropping
Proposal: Activation Cropping (Figure callouts: save computations here; add a lightweight predictor here)
Search Space – How to Crop • Exploit domain knowledge • Information is typically centered within the image (>55% in our tests) • Utilize a regular pattern • Less control logic required • Maps more easily to different hardware • Added bonus: while objects are centered, the majority of area (and thus computation) is on the outside! (Figure: N x N input image)
Proposal: Activation Cropping • Concept: Scale feature maps proportionally (Figure: crops at N = 25, 10, 8, 5, 2)
Search Space – Crop Directions • We consider 16 possible crops as permutations of top, bottom, left, and right crops, encoded as a vector: [ TOP , BOTTOM , LEFT , RIGHT ] • Unlike traditional pruning, AQuA can exploit image-based information to enhance pruning options. (Figure: example crop vectors [0 1 0 0], [1 0 0 0], [0 0 1 0], [0 0 0 1], [0 1 0 1], [1 0 1 1] applied to an image; a sketch of applying such a vector follows below.)
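To make the crop encoding concrete, here is a minimal NumPy sketch (not the authors' implementation) that applies a [top, bottom, left, right] crop vector to a CHW activation map; the function name crop_activation and the parameter n (rows/columns removed per flagged edge) are illustrative assumptions.

```python
import numpy as np

def crop_activation(fmap, crop_vec, n):
    """Crop a CHW activation map according to a [top, bottom, left, right] flag vector.

    crop_vec follows the slide's encoding; n is the number of rows/columns
    removed per flagged edge. Cropped regions are never computed downstream,
    which we model here by simply slicing them away.
    """
    top, bottom, left, right = crop_vec
    c, h, w = fmap.shape
    r0, r1 = top * n, h - bottom * n
    c0, c1 = left * n, w - right * n
    return fmap[:, r0:r1, c0:c1]

# Example: crop 2 rows from the top and 2 columns from the right
# of a 3x8x8 activation map, leaving a 3x6x6 map to compute on.
fmap = np.random.rand(3, 8, 8).astype(np.float32)
cropped = crop_activation(fmap, crop_vec=[1, 0, 0, 1], n=2)
print(cropped.shape)  # (3, 6, 6)
```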
Quantifying Potentials • To maintain original top-1 accuracy, 75% of images can tolerate some type of crop! • Greater savings with top-5 predictions • Technique is invariant to weight quantization (Figure: number of edges cropped per weight set)
Exploiting Energy Savings with ISAAC • Activation cropping can be applied to any architecture • We use the ISAAC accelerator due to its flexibility • Future work includes leveraging additional variable precision techniques (Figure: ISAAC crossbar with inputs along the rows, weights stored as 1-bit and 2-bit cells per column, and 8-bit outputs)
Weight Precision Savings (Figure: a 10-bit weight maps to 5 crossbar columns of 1-bit and 2-bit cells, requiring 5 ADC operations; a 16-bit weight maps to 8 columns, requiring 8 ADC operations, read through a multiplexed 8-bit ADC; see the sketch below.)
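The column and ADC counts on this slide follow from storing 2 bits of weight per cell; a tiny sketch of that arithmetic, with the assumed helper name adc_ops_per_read:

```python
import math

def adc_ops_per_read(weight_bits, bits_per_cell=2):
    """Columns (and thus ADC conversions) needed for one crossbar read,
    assuming ISAAC-style weight slicing at `bits_per_cell` bits per cell.
    """
    return math.ceil(weight_bits / bits_per_cell)

print(adc_ops_per_read(10))  # 5 columns -> 5 ADC operations
print(adc_ops_per_read(16))  # 8 columns -> 8 ADC operations
```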
“FlexPoint” Support • Can vary the shift amount to compute fixed-point results with different exponents (Figure: same crossbar layout as the previous slide, with 10-bit/16-bit weights across 5/8 columns and a multiplexed 8-bit ADC; a sketch of the shift follows below.)
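A minimal sketch of the shift idea, assuming integer partial sums and a shared per-tensor exponent; the helper name and the handling of negative exponents are illustrative assumptions, not ISAAC's actual datapath.

```python
def apply_flexpoint_shift(partial_sum, exponent):
    """Scale an integer partial sum by a shared (per-tensor) exponent.

    The crossbar produces integer partial sums; a variable shift amount
    realizes fixed-point results with different exponents. Positive
    exponents shift left, negative exponents shift right (dropping
    low-order bits).
    """
    return partial_sum << exponent if exponent >= 0 else partial_sum >> -exponent

# Example: the same integer sum interpreted under two exponents.
print(apply_flexpoint_shift(0b1011, 2))   # 44
print(apply_flexpoint_shift(0b1011, -1))  # 5
```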
Activation Quantization Savings • K-bit activations (inputs) require K time steps. (Figure: buffered k-bit inputs streamed one bit per time step into crossbar columns of 1-bit and 2-bit cells, producing 8-bit outputs over time steps 1..k.)
Activation Quantization Savings • Fewer computations mean increased throughput, reduced area requirements, and lower energy. • K-bit activations (inputs) require K time steps. (Figure: same bit-serial input streaming diagram as the previous slide; see the sketch below.)
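To make the K-time-step point concrete, here is a minimal bit-serial dot-product sketch in NumPy, a functional model rather than the crossbar hardware; the function name and the unsigned-activation assumption are illustrative. Lowering the activation bit width k (or cropping an activation to 0 bits) removes time steps outright.

```python
import numpy as np

def bit_serial_dot(activations, weights, k):
    """Bit-serial dot product over k-bit unsigned activations.

    Activations are fed one bit-plane per time step, so a k-bit
    activation takes k steps to process.
    """
    acc = 0
    for step in range(k):                       # one time step per bit
        bits = (activations >> step) & 1        # current bit-plane
        acc += int(np.dot(bits, weights)) << step
    return acc

a = np.array([5, 3, 7], dtype=np.int64)   # 3-bit activations
w = np.array([2, 1, 4], dtype=np.int64)
print(bit_serial_dot(a, w, k=3))           # 41
print(np.dot(a, w))                        # 41, same result
```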
Naive Approach – Crop Everything • Substantial energy savings at a cost to accuracy • Theoretically, can save over 33% energy and maintain original accuracy!
Overall Energy Savings • Adaptive quantization saves 33% on average compared to an uncropped baseline. • Technique can be applied in conjunction with weight quantization techniques with nearly identical relative savings
Future Work • Predict unimportant regions using a “0th” layer with just a few gradient-based kernels (Figure: original image vs. Sobel gradient highlighting unimportant regions; a sketch follows below.) • Use variable low-precision computations, not just cropping • Quantify energy and latency changes due to the additional prediction step, but fewer overall computations
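One possible shape for such a gradient-based predictor, sketched with SciPy's Sobel filter; the helper name predict_crop, the energy threshold, and the per-edge decision rule are assumptions for illustration, not the authors' design.

```python
import numpy as np
from scipy.ndimage import sobel

def predict_crop(image, n, threshold=0.05):
    """Hypothetical '0th-layer' predictor: use Sobel gradient energy to
    decide which edges of the image are unimportant enough to crop.

    n is the crop width in pixels; threshold is an assumed fraction of
    total gradient energy below which an edge strip is considered
    croppable. Returns a [top, bottom, left, right] crop vector.
    """
    gx, gy = sobel(image, axis=1), sobel(image, axis=0)
    energy = np.hypot(gx, gy)
    total = energy.sum() + 1e-9
    strips = {
        "top": energy[:n, :], "bottom": energy[-n:, :],
        "left": energy[:, :n], "right": energy[:, -n:],
    }
    return [int(strips[k].sum() / total < threshold)
            for k in ("top", "bottom", "left", "right")]

image = np.random.rand(224, 224).astype(np.float32)
print(predict_crop(image, n=28))  # a crop vector, e.g. [0, 0, 0, 0]
```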
Conclusion • Adaptive quantization saves 33% on average compared to an uncropped baseline. • Technique can be applied in conjunction with weight quantization techniques with nearly identical relative savings
Thank you! Questions?