Towards Accurate Post-training Network Quantization via Bit-split and Stitching
Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng
Institute of Automation, Chinese Academy of Sciences
Outline • Background • Motivation • Approach • Experiments
Background • Low-bit quantization has emerged as a promising compression technique • Robust across network architectures • Hardware friendly • Problem: low-bit quantization typically relies on • Training data • Large computational resources (CPUs, GPUs) • Quantization skills and expertise
Background • Training-aware quantization: starts from a pre-trained model, then finetunes using data and labels • Post-training quantization (this work): starts from a pre-trained model, data-free, BP-free, easy to use
Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
Motivation • Post-training quantization turns a pretrained model into a low-bit model • I. Define the distance between the pretrained model and the low-bit model • II. Minimize that distance
Related works • TF-Lite: maps the maximum weight (activation) magnitude to the maximum low-bit value, i.e. |Max| to 127 and -|Max| to -127
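The max-abs mapping above can be sketched in a few lines of NumPy (function name and test values are illustrative, not from the slides):

```python
import numpy as np

def quantize_max_abs(w, n_bits=8):
    """Symmetric quantization: map the tensor's |max| to the largest
    signed n-bit integer (127 for 8 bits), TF-Lite style."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for n_bits = 8
    scale = np.abs(w).max() / qmax        # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.array([-1.0, 0.25, 0.5, 1.0])
q, s = quantize_max_abs(w)                # q = [-127, 32, 64, 127]
# Dequantize with q * s to approximate the original weights.
```

One scale per tensor keeps the hardware mapping trivial, but a single outlier stretches the range and wastes resolution on the bulk of the values, which motivates the clipping approach on the next slide.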
Related works • TensorRT: maps a clip value, rather than the maximum, to the maximum low-bit value; outliers beyond the clip saturate at ±127
Szymon Migacz. 8-bit Inference with TensorRT. GTC 2017.
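A minimal sketch of clip-then-quantize. TensorRT chooses the clip by minimizing KL divergence between the original and quantized distributions; the grid search below minimizes MSE instead to keep the sketch short, and all names are illustrative:

```python
import numpy as np

def quantize_clipped(x, clip, n_bits=8):
    """Saturate values at +/-clip, then map [-clip, clip] onto the
    signed integer range; returns the dequantized tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def search_clip(x, n_grid=100, n_bits=8):
    """Grid-search the clip value that minimizes reconstruction MSE
    (stand-in for TensorRT's KL-divergence calibration)."""
    best_clip, best_err = None, np.inf
    max_abs = np.abs(x).max()
    for c in np.linspace(max_abs / n_grid, max_abs, n_grid):
        err = np.mean((x - quantize_clipped(x, c, n_bits)) ** 2)
        if err < best_err:
            best_clip, best_err = c, err
    return best_clip
```

Trading a bounded saturation error on outliers for finer resolution on the bulk of the distribution usually lowers the overall quantization error.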
Method • Objective • Previous works minimize the distance between the pretrained model and the low-bit model in the weight space • This work learns a low-bit mapping from the input to the output of every convolution
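A concrete reading of that layer-wise objective, with hypothetical helper names and a linear layer standing in for a convolution:

```python
import numpy as np

def layer_output_mse(x, w_fp, w_q):
    """MSE between the outputs of a full-precision layer and its
    quantized counterpart on the same input batch x.
    A linear layer (x @ w) stands in for a convolution."""
    return np.mean((x @ w_fp - x @ w_q) ** 2)
```

Matching layer outputs rather than raw weights lets the quantizer account for how the input distribution weighs each parameter, which is exactly what a weight-space distance ignores.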
Method • I. Define the distance • II. Minimize the distance (Bit-split) • An m-bit integer code is split into per-bit terms: q = 2^(m-1) r_m + 2^(m-2) r_(m-1) + ... + 2^1 r_2 + 2^0 r_1
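The split-and-stitch decomposition can be sketched directly (a minimal illustration with signed ternary bits in {-1, 0, 1}; function names are mine, not the paper's):

```python
import numpy as np

def split(q, n_bits):
    """Split a signed integer (array) into ternary bits: the sign times
    each binary digit of the magnitude, least-significant first."""
    sign = np.sign(q)
    mag = np.abs(q)
    return [sign * ((mag >> m) & 1) for m in range(n_bits)]

def stitch(bits):
    """Stitch bits back into an integer: q = sum_m 2^m * b_m
    (m starting at 0 for the least-significant bit)."""
    return sum((2 ** m) * b for m, b in enumerate(bits))

# Round trip: 5 = 101b splits into [1, 0, 1] and stitches back to 5.
```

Splitting makes each bit a small, separately optimizable variable; stitching recovers the ordinary integer code once every bit has been optimized.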
Method • Alternately optimize the scale 𝛽 and the m-th bit
Wang, P., Hu, Q., Zhang, Y., Zhang, C., Liu, Y. and Cheng, J., 2018. Two-step quantization for low-bit neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4376-4384).
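A simplified sketch of the alternating scheme: the full Bit-Split method updates one ternary bit at a time, while the stand-in below updates the whole integer code per step for brevity, keeping only the scale/code alternation (all names are illustrative):

```python
import numpy as np

def alternating_quantize(w, n_bits=4, n_iter=20):
    """Alternately optimize the scale alpha and the integer code q
    to minimize ||w - alpha * q||^2.
    Each half-step is optimal given the other, so the objective
    never increases across iterations."""
    qmax = 2 ** (n_bits - 1) - 1
    alpha = np.abs(w).max() / qmax                       # max-abs init
    for _ in range(n_iter):
        q = np.clip(np.round(w / alpha), -qmax, qmax)    # fix alpha, solve q
        alpha = (w @ q) / (q @ q)                        # fix q, least-squares alpha
    return alpha, q
```

Starting from the max-abs scale, a handful of iterations is typically enough for the objective to plateau.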
Bit-Split for Post-training Network Quantization • Problem formulation • Optimization
Bit-Split Results • Weight quantization • Both weight and activation quantization
Comparison with State-of-the-art Methods
Results on Detection and Instance Segmentation
Thanks for your attention. • Code is available at https://github.com/wps712/BitSplit • peisong.wang@nlpr.ia.ac.cn