Learning Accurate Low-bit Deep Neural Networks with Stochastic Quantization
Yinpeng Dong¹, Renkun Ni², Jianguo Li³, Yurong Chen³, Jun Zhu¹, Hang Su¹
¹ Department of CST, Tsinghua University  ² University of Virginia  ³ Intel Labs China
Deep Learning is Everywhere: Self-Driving, AlphaGo, Machine Translation, Dota 2
Limitations
- More data + deeper models → more FLOPs + larger memory
- Computation intensive
- Memory intensive
- Hard to deploy on mobile devices
Low-bit DNNs for Efficient Inference
- High redundancy in DNNs;
- Quantize full-precision (32-bit) weights to binary (1-bit) or ternary (2-bit) weights;
- Replace multiplication (convolution) by addition and subtraction.
Typical Low-bit DNNs
- BinaryConnect: $B_i = +1$ with probability $p = \sigma(W_i)$, $-1$ with probability $1 - p$
- BWN: minimize $\|W - \alpha B\|$, with $\alpha = \frac{1}{n}\sum_{i=1}^{n} |W_i|$ and $B_i = \mathrm{sign}(W_i)$
- TWN: minimize $\|W - \alpha T\|$, with
  $T_i = +1$ if $W_i > \Delta$, $0$ if $|W_i| \le \Delta$, $-1$ if $W_i < -\Delta$,
  $\Delta = \frac{0.7}{n}\sum_{i=1}^{n} |W_i|$, and $\alpha = \frac{1}{|I_\Delta|}\sum_{i \in I_\Delta} |W_i|$, where $I_\Delta = \{i : |W_i| > \Delta\}$
(a NumPy sketch of the BWN and TWN quantizers follows)
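A minimal NumPy sketch of the BWN and TWN quantizers defined above; the function names are ours, not from the authors' released code, and the scaling factor is computed over the whole array for simplicity (per-filter scaling is also common).

```python
import numpy as np

def bwn_quantize(w):
    """BWN: Q = alpha * sign(W), with alpha the mean absolute weight."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def twn_quantize(w):
    """TWN: Q = alpha * T with T in {-1, 0, +1}.
    Threshold delta = 0.7 * mean(|W|); alpha averages |W| over weights above delta."""
    delta = 0.7 * np.mean(np.abs(w))
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    t = np.where(w > delta, 1.0, np.where(w < -delta, -1.0, 0.0))
    return alpha * t
```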
Training & Inference of Low-bit DNN n Let ๐ be the full-precision weights, ๐ be the low-bit weights ( ๐ถ , ๐ , ฮฑ๐ถ , ฮฑ๐ ). n Forward propagation: quantize ๐ to ๐ and perform convolution or multiplication n Backward propagation: use ๐ to calculate gradients n Parameter update: ๐ TUB = ๐ T โ ๐ T WX WY Z n Inference: only need to keep low-bit weights ๐ 6
Motivations
- Existing methods quantize all weights simultaneously;
- The quantization error $\|W_i - Q_i\|$ may be large for some elements/filters;
- Large errors induce inappropriate gradient directions.
- Our proposal: quantize only a portion of the weights at each iteration
  - Stochastic selection of which weights to quantize
  - Applicable to any low-bit setting
Roulette Selection Algorithm
[Figure: a full-precision weight matrix and its per-channel quantization errors; roulette-wheel selection with ratio r = 50% picks channels (here C2 and C3) to quantize, yielding a hybrid weight matrix of quantized and full-precision channels.]
- Quantization error: $e_i = \frac{\|W_i - Q_i\|_1}{\|W_i\|_1}$
- Quantization probability: a larger quantization error means a smaller quantization probability, e.g. $p_i \propto \frac{1}{e_i}$
- Quantization ratio $r$: gradually increased to 100%
(a sketch of the roulette selection follows)
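A minimal sketch of the roulette-style channel selection, assuming per-channel (row-wise) quantization and the inverse-error probability p_i ∝ 1/e_i mentioned above; the exact probability function is one of the choices compared in the ablation, so this is illustrative rather than definitive.

```python
def roulette_select(W, Q, r, eps=1e-7):
    """Pick round(r * num_channels) channels to quantize; channels with a
    smaller relative quantization error are more likely to be selected."""
    e = np.abs(W - Q).sum(axis=1) / (np.abs(W).sum(axis=1) + eps)  # e_i = ||W_i - Q_i||_1 / ||W_i||_1
    p = 1.0 / (e + eps)                                            # p_i proportional to 1 / e_i
    p /= p.sum()
    n_sel = int(round(r * W.shape[0]))
    # roulette wheel: sample channel indices without replacement according to p
    return np.random.choice(W.shape[0], size=n_sel, replace=False, p=p)
```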
Training & Inference _ n Hybrid weight matrix ๐ _ " = $๐ " if channel i being selected ๐ ๐ " else n Parameter update ๐ TUB = ๐ T โ ๐ T ๐๐ _ T ๐๐ n Inference: all weights are quantized; use ๐ to perform inference 9
Ablation Studies
- Selection granularity:
  - Filter-level > element-level
- Selection/partition algorithms:
  - Stochastic (roulette) > deterministic (sorting) ~ fixed (selection only at the first iteration)
- Quantization functions (mapping quantization error to selection probability):
  - Linear > Sigmoid > Constant ~ Softmax
- Quantization ratio update scheme (see the sketch below):
  - Exponential > fine-tune > uniform
  - 50% → 75% → 87.5% → 100%
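For concreteness, a small sketch of the exponential ratio schedule listed above, assuming the ratio is raised stage by stage during training so that each stage halves the remaining full-precision portion; the stage boundaries are a training-recipe choice not specified on this slide.

```python
def exponential_ratio(stage, n_stages=4):
    """Ratio schedule 50% -> 75% -> 87.5% -> 100%: each stage halves the
    remaining full-precision portion; the final stage quantizes everything."""
    if stage >= n_stages - 1:
        return 1.0
    return 1.0 - 0.5 ** (stage + 1)

# [exponential_ratio(s) for s in range(4)]  ->  [0.5, 0.75, 0.875, 1.0]
```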
Results -- CIFAR

Test error (%) of VGG-9 and ResNet-56 trained with 5 different methods on CIFAR-10 and CIFAR-100:

                CIFAR-10              CIFAR-100
Method   Bits   VGG-9   ResNet-56     VGG-9   ResNet-56
FWN       32    9.00    6.69          30.68   29.49
BWN        1    10.67   16.42         37.68   35.01
SQ-BWN     1    9.40    7.15          35.25   31.56
TWN        2    9.87    7.64          34.80   32.09
SQ-TWN     2    8.37    6.20          34.24   28.90

[Figure: training loss vs. iterations (k); left: FWN, BWN, SQ-BWN; right: FWN, TWN, SQ-TWN.]
Results -- ImageNet

Test error (%) of AlexNet-BN and ResNet-18 trained with 5 different methods on ImageNet:

                AlexNet-BN            ResNet-18
Method   Bits   top-1   top-5         top-1   top-5
FWN       32    44.18   20.83         34.80   13.60
BWN        1    51.22   27.18         45.20   21.08
SQ-BWN     1    48.78   24.86         41.64   18.35
TWN        2    47.54   23.81         39.83   17.02
SQ-TWN     2    44.70   21.40         36.18   14.26
Conclusions
- We propose a stochastic quantization algorithm for training low-bit DNNs;
- The algorithm can be flexibly applied to any low-bit setting;
- It consistently improves performance;
- We release our code publicly for future development:
  https://github.com/dongyp13/Stochastic-Quantization
Q & A