Towards Accurate Post-training Network Quantization via Bit-split and Stitching
Peisong Wang, Qiang Chen, Xiangyu He, Jian Cheng
Institute of Automation, Chinese Academy of Sciences
Outline • Background • Motivation • Approach • Experiments
Background • Low-bit quantization has emerged as a promising compression technique • Robust across network architectures • Hardware friendly • Problem: low-bit quantization typically relies on • Training data • Large computational resources (CPUs, GPUs) • Quantization skills and expertise
Background • Training-aware quantization: starts from a pre-trained model, then finetunes using data and labels • Post-training quantization (this work): starts from a pre-trained model, data-free, BP-free, easy to use
Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
Motivation • Post-training quantization turns a pretrained model into a low-bit model • I. Define the distance between the pretrained model and the low-bit model • II. Minimize that distance
Related works • TF-Lite: maps the maximum weight (activation) magnitude to the maximum low-bit value, i.e. |Max| to 127 and -|Max| to -127
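The max-abs mapping above can be sketched in a few lines of NumPy (function name and test values are illustrative, not from the slides):

```python
import numpy as np

def quantize_max_abs(w, n_bits=8):
    """Symmetric quantization: map the tensor's |max| to the largest
    signed n-bit integer (127 for 8 bits), TF-Lite style."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for n_bits = 8
    scale = np.abs(w).max() / qmax        # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.array([-1.0, 0.25, 0.5, 1.0])
q, s = quantize_max_abs(w)                # q = [-127, 32, 64, 127]
# Dequantize with q * s to approximate the original weights.
```

One scale per tensor keeps the hardware mapping trivial, but a single outlier stretches the range and wastes resolution on the bulk of the values, which motivates the clipping approach on the next slide.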
Related works • TensorRT: maps a clip value, rather than the maximum, to the maximum low-bit value; outliers beyond the clip saturate at ±127
Szymon Migacz. 8-bit Inference with TensorRT. GTC 2017.
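A minimal sketch of clip-then-quantize. TensorRT chooses the clip by minimizing KL divergence between the original and quantized distributions; the grid search below minimizes MSE instead to keep the sketch short, and all names are illustrative:

```python
import numpy as np

def quantize_clipped(x, clip, n_bits=8):
    """Saturate values at +/-clip, then map [-clip, clip] onto the
    signed integer range; returns the dequantized tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def search_clip(x, n_grid=100, n_bits=8):
    """Grid-search the clip value that minimizes reconstruction MSE
    (stand-in for TensorRT's KL-divergence calibration)."""
    best_clip, best_err = None, np.inf
    max_abs = np.abs(x).max()
    for c in np.linspace(max_abs / n_grid, max_abs, n_grid):
        err = np.mean((x - quantize_clipped(x, c, n_bits)) ** 2)
        if err < best_err:
            best_clip, best_err = c, err
    return best_clip
```

Trading a bounded saturation error on outliers for finer resolution on the bulk of the distribution usually lowers the overall quantization error.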
Method • Objective • Previous works minimize the distance between the pretrained model and the low-bit model in the weight space • This work learns a low-bit mapping from the input to the output of every convolution
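A concrete reading of that layer-wise objective, with hypothetical helper names and a linear layer standing in for a convolution:

```python
import numpy as np

def layer_output_mse(x, w_fp, w_q):
    """MSE between the outputs of a full-precision layer and its
    quantized counterpart on the same input batch x.
    A linear layer (x @ w) stands in for a convolution."""
    return np.mean((x @ w_fp - x @ w_q) ** 2)
```

Matching layer outputs rather than raw weights lets the quantizer account for how the input distribution weighs each parameter, which is exactly what a weight-space distance ignores.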
Method • I. Define the distance • II. Minimize the distance (Bit-split) • An m-bit integer code is split into per-bit terms: q = 2^(m-1) r_m + 2^(m-2) r_(m-1) + ... + 2^1 r_2 + 2^0 r_1
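The split-and-stitch decomposition can be sketched directly (a minimal illustration with signed ternary bits in {-1, 0, 1}; function names are mine, not the paper's):

```python
import numpy as np

def split(q, n_bits):
    """Split a signed integer (array) into ternary bits: the sign times
    each binary digit of the magnitude, least-significant first."""
    sign = np.sign(q)
    mag = np.abs(q)
    return [sign * ((mag >> m) & 1) for m in range(n_bits)]

def stitch(bits):
    """Stitch bits back into an integer: q = sum_m 2^m * b_m
    (m starting at 0 for the least-significant bit)."""
    return sum((2 ** m) * b for m, b in enumerate(bits))

# Round trip: 5 = 101b splits into [1, 0, 1] and stitches back to 5.
```

Splitting makes each bit a small, separately optimizable variable; stitching recovers the ordinary integer code once every bit has been optimized.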
Method • Alternately optimize the scale 𝛽 and the m-th bit
Wang, P., Hu, Q., Zhang, Y., Zhang, C., Liu, Y. and Cheng, J., 2018. Two-step quantization for low-bit neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4376-4384).
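A simplified sketch of the alternating scheme: the full Bit-Split method updates one ternary bit at a time, while the stand-in below updates the whole integer code per step for brevity, keeping only the scale/code alternation (all names are illustrative):

```python
import numpy as np

def alternating_quantize(w, n_bits=4, n_iter=20):
    """Alternately optimize the scale alpha and the integer code q
    to minimize ||w - alpha * q||^2.
    Each half-step is optimal given the other, so the objective
    never increases across iterations."""
    qmax = 2 ** (n_bits - 1) - 1
    alpha = np.abs(w).max() / qmax                       # max-abs init
    for _ in range(n_iter):
        q = np.clip(np.round(w / alpha), -qmax, qmax)    # fix alpha, solve q
        alpha = (w @ q) / (q @ q)                        # fix q, least-squares alpha
    return alpha, q
```

Starting from the max-abs scale, a handful of iterations is typically enough for the objective to plateau.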
Bit-Split for Post-training Network Quantization • Problem formulation • Optimization
Bit-Split Results • Weight quantization • Both weight and activation quantization
Comparison with State-of-the-art Methods
Results on Detection and Instance Segmentation
Thanks for your attention. • Code is available at https://github.com/wps712/BitSplit • peisong.wang@nlpr.ia.ac.cn