Towards Accurate Post-training Network Quantization via Bit-split and Stitching - PowerPoint PPT Presentation

  1. Towards Accurate Post-training Network Quantization via Bit-split and Stitching. Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng. Institute of Automation, Chinese Academy of Sciences.

  2. Outline • Background • Motivation • Approach • Experiments

  3. Background • Low-bit quantization has emerged as a promising compression technique: it is robust across network architectures and hardware friendly. • Problem: existing low-bit quantization methods rely on training data, large computational resources (CPUs, GPUs), and quantization skills and expertise.

  4. Background • Training-aware quantization: pre-trained model → network quantization → fine-tune using data and labels. • Post-training quantization (this work): pre-trained model → network quantization; data-free, BP-free, and easy to use. Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).

  5. Motivation • Post-training quantization starts from a pretrained model.

  6. Motivation • Post-training quantization converts the pretrained model into a low-bit model.

  7. Motivation • Post-training quantization converts the pretrained model into a low-bit model while minimizing the distance between the two.

  8. Motivation • Post-training quantization thus involves two steps: I. define the distance between the pretrained model and the low-bit model; II. minimize that distance.
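One simple way to make step I concrete is to measure how far a tensor moves when it is quantized. A minimal sketch (my own illustration, not the authors' code), using the mean-squared error between a full-precision tensor and its low-bit reconstruction:

    import numpy as np

    def quantization_error(w, scale, q):
        """Mean-squared error between a full-precision tensor w and its
        low-bit reconstruction scale * q: one simple choice of 'distance'
        between the pretrained model and the low-bit model."""
        return float(np.mean((w - scale * q) ** 2))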

  9. Related works (I. Define the distance / II. Minimize the distance) • TF-Lite: map the maximum weight (activation) value to the maximum low-bit number, i.e., the range [-|Max|, |Max|] is mapped onto [-127, 127]. Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
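A sketch of this max-based mapping, assuming symmetric signed quantization (the function and argument names are mine, not the TF-Lite API):

    import numpy as np

    def quantize_max(w, num_bits=8):
        """Map the largest magnitude |Max| to the largest signed integer
        (127 for 8 bits) and round every value to the nearest step.
        The tensor w is then approximated by q * scale."""
        qmax = 2 ** (num_bits - 1) - 1
        scale = max(float(np.abs(w).max()), 1e-12) / qmax   # guard against all-zero w
        q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
        return q, scale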

  10. Related works (I. Define the distance / II. Minimize the distance) • TensorRT: clip outliers and map the clip value to the maximum low-bit number, i.e., the range [-clip, clip] is mapped onto [-127, 127]. Szymon Migacz. "8-bit Inference with TensorRT." GTC 2017.
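A sketch of the clip-based idea (names are mine; TensorRT itself selects the clip threshold by minimizing a KL divergence between activation histograms, while the toy search below minimizes mean-squared reconstruction error to keep the example short):

    import numpy as np

    def quantize_clip(w, clip, num_bits=8):
        """Saturate values beyond +/-clip (the outliers) and map +/-clip
        to +/-(2^(num_bits-1) - 1)."""
        qmax = 2 ** (num_bits - 1) - 1
        scale = clip / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
        return q, scale

    def search_clip(w, num_bits=8, num_candidates=100):
        """Grid-search the clip value with the smallest reconstruction error."""
        top = float(np.abs(w).max())
        candidates = np.linspace(top / num_candidates, top, num_candidates)
        errors = []
        for c in candidates:
            q, s = quantize_clip(w, c, num_bits)
            errors.append(float(np.mean((w - q * s) ** 2)))
        return candidates[int(np.argmin(errors))]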

  11. Method: Objective (I. Define the distance / II. Minimize the distance) • Previous work: minimize the distance between the pretrained model and the low-bit model. • This work: learn a low-bit mapping from the input to the output of every convolution.
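Learning such a mapping layer by layer needs the input and the full-precision output of every convolution on a small calibration set. A minimal PyTorch sketch (my own helper, assuming a loader that yields (image, label) batches; it is not taken from the authors' repository):

    import torch

    def collect_layer_io(model, layer, calib_loader, device="cpu"):
        """Run calibration batches through the pretrained model and record
        the input/output of one layer, so that its low-bit weights can be
        fitted to reproduce that output rather than the weights themselves."""
        inputs, outputs = [], []

        def hook(module, inp, out):
            inputs.append(inp[0].detach().cpu())
            outputs.append(out.detach().cpu())

        handle = layer.register_forward_hook(hook)
        model.eval()
        with torch.no_grad():
            for images, _ in calib_loader:
                model(images.to(device))
        handle.remove()
        return torch.cat(inputs), torch.cat(outputs)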

  12. Method (I. Define the distance / II. Minimize the distance: Bit-split) • Split the low-bit integer q into bit values r_m: q = 2^(M-2) r_(M-1) + … + 2^1 r_2 + 2^0 r_1.
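A sketch of the split-and-stitch idea, under the assumption (my reading of the formulation; the slide's exact indexing may differ) that each split value r_m is ternary, taking values in {-1, 0, +1}, so that M-1 of them cover every signed integer of magnitude up to 2^(M-1) - 1:

    import numpy as np

    def bit_split(q, num_bits):
        """Split signed integers with |q| <= 2^(num_bits-1) - 1 into ternary
        'bits' in {-1, 0, +1} such that q = sum_i 2^i * bits[i]."""
        bits = []
        residual = np.asarray(q, dtype=np.int64).copy()
        for i in reversed(range(num_bits - 1)):        # most significant first
            r = (np.sign(residual) * (np.abs(residual) >= 2 ** i)).astype(np.int64)
            bits.insert(0, r)
            residual -= r * (2 ** i)
        return bits

    def stitch(bits):
        """Stitch the ternary bits back into integers."""
        return sum((2 ** i) * r for i, r in enumerate(bits))

For example, stitch(bit_split(np.array([5, -7, 0]), 4)) recovers array([5, -7, 0]).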

  13. Method • Optimize the scale factor β. • Optimize the m-th bit. Wang, P., Hu, Q., Zhang, Y., Zhang, C., Liu, Y. and Cheng, J., 2018. Two-step quantization for low-bit neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4376-4384).
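A simplified sketch of such an alternating scheme (my own illustration: it fits the weights directly, with a closed-form least-squares scale and a per-element ternary update for each bit, whereas the method in the talk fits the layer output on calibration activations):

    import numpy as np

    def optimize_scale(w, q):
        """Least-squares scale for w ~ beta * q: beta = <w, q> / <q, q>."""
        denom = float(np.dot(q.ravel(), q.ravel()))
        return float(np.dot(w.ravel(), q.ravel())) / denom if denom > 0 else 1.0

    def update_bit(w, beta, bits, m):
        """With the scale and all other bits fixed, choose the m-th ternary
        bit element-wise in {-1, 0, +1} to best explain the residual."""
        others = sum((2 ** k) * b for k, b in enumerate(bits) if k != m)
        residual = w / beta - others
        bits[m] = np.clip(np.round(residual / (2 ** m)), -1, 1).astype(np.int64)

    def alternating_fit(w, num_bits=4, iters=5):
        """Alternate between the per-bit updates and the scale factor."""
        qmax = 2 ** (num_bits - 1) - 1
        beta = max(float(np.abs(w).max()), 1e-12) / qmax       # max-based init
        bits = [np.zeros_like(w, dtype=np.int64) for _ in range(num_bits - 1)]
        for _ in range(iters):
            for m in reversed(range(len(bits))):               # most significant first
                update_bit(w, beta, bits, m)
            q = sum((2 ** k) * b for k, b in enumerate(bits))  # stitch
            beta = optimize_scale(w, q)
        return beta, q

The low-bit weights are then approximated by beta * q; in the post-training setting the same alternation is applied to the per-layer output-reconstruction objective rather than to the weights.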

  14. Bit-Split for Post-training Network Quantization • Problem: (formula on slide) • Optimization: (formula on slide)
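The formulas on this slide are not recoverable from the transcript. As a rough sketch of what the per-layer problem looks like in the bit-split notation above (my reading only: X and Y denote a layer's calibration input and full-precision output, and the bits r_m are ternary), one might write

    \min_{\beta,\ \{r_m\}} \Big\| \, Y - \beta \, X \Big( \sum_{m} 2^{\,m-1} r_m \Big) \Big\|_F^2
    \qquad \text{s.t.} \quad r_m \text{ ternary, entries in } \{-1, 0, +1\},

which is then solved by alternating between the scale beta and the individual bits r_m, as on the previous slide.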

  15. Bit-Split Results • Weight quantization: (results table on slide) • Both weight and activation quantization: (results table on slide)

  16. Comparison with state-of-the-art methods (results on slide)

  17. Results on Detection and Instance Segmentation (results on slide)

  18. Thanks for your attention. Code is available at https://github.com/wps712/BitSplit. Contact: peisong.wang@nlpr.ia.ac.cn
