  1. Learning Accurate Low-bit Deep Neural Networks with Stochastic Quantization. Yinpeng Dong¹, Renkun Ni², Jianguo Li³, Yurong Chen³, Jun Zhu¹, Hang Su¹. ¹Department of CST, Tsinghua University; ²University of Virginia; ³Intel Labs China

  2. Deep Learning is Everywhere: Self-Driving, AlphaGo, Machine Translation, Dota 2

  3. Limitations
     - More data + deeper models → more FLOPs + larger memory
     - Computation intensive
     - Memory intensive
     - Hard to deploy on mobile devices

  4. Low-bit DNNs for Efficient Inference
     - High redundancy in DNNs;
     - Quantize full-precision (32-bit) weights to binary (1-bit) or ternary (2-bit) weights;
     - Replace multiplication (convolution) by addition and subtraction;

  5. Typical Low-bit DNNs
     - BinaryConnect: B_i = +1 with probability p = σ(W_i), −1 with probability 1 − p
     - BWN: minimize ‖W − αB‖, with α = (1/n) Σ_i |W_i| and B_i = sign(W_i)
     - TWN: minimize ‖W − αT‖, with
           T_i = +1 if W_i > Δ, 0 if |W_i| ≤ Δ, −1 if W_i < −Δ,
           Δ = (0.7/n) Σ_i |W_i|, α = (1/|I_Δ|) Σ_{i∈I_Δ} |W_i|, where I_Δ = {i : |W_i| > Δ}
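For concreteness, here is a minimal NumPy sketch of the BWN and TWN quantizers summarized above; the function names are illustrative and not taken from the authors' released code.

```python
import numpy as np

def bwn_quantize(W):
    """BWN: approximate W by alpha * B with B in {-1, +1}."""
    alpha = np.mean(np.abs(W))           # scale alpha = (1/n) * sum |W_i|
    B = np.where(W >= 0, 1.0, -1.0)      # sign of each weight
    return alpha, B

def twn_quantize(W):
    """TWN: approximate W by alpha * T with T in {-1, 0, +1}."""
    delta = 0.7 * np.mean(np.abs(W))     # threshold delta = (0.7/n) * sum |W_i|
    T = np.zeros_like(W)
    T[W > delta] = 1.0
    T[W < -delta] = -1.0
    mask = np.abs(W) > delta             # I_delta = {i : |W_i| > delta}
    alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
    return alpha, T
```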

  6. Training & Inference of Low-bit DNN n Let ๐‘‹ be the full-precision weights, ๐‘… be the low-bit weights ( ๐ถ , ๐‘ˆ , ฮฑ๐ถ , ฮฑ๐‘ˆ ). n Forward propagation: quantize ๐‘‹ to ๐‘… and perform convolution or multiplication n Backward propagation: use ๐‘… to calculate gradients n Parameter update: ๐‘‹ TUB = ๐‘‹ T โˆ’ ๐œƒ T WX WY Z n Inference: only need to keep low-bit weights ๐‘… 6

  7. Motivations
     - Quantizing all weights simultaneously:
        - the quantization error ‖W − Q‖ may be large for some elements/filters;
        - this induces inappropriate gradient directions.
     - Instead, quantize only a portion of the weights:
        - stochastic selection;
        - can be applied to any low-bit setting.

  8. Roulette Selection Algorithm
     [Figure: a 4-channel weight matrix is stochastically partitioned with ratio r = 50%; two roulette selections (v = 0.58 and v = 0.37 pick channels C2 and C3) produce a hybrid weight matrix in which the selected channels are quantized and the rest stay full-precision.]
     - Quantization error: e_i = ‖W_i − Q_i‖₁ / ‖W_i‖₁
     - Quantization probability: a larger quantization error means a smaller quantization probability, e.g. p_i ∝ 1/e_i
     - Quantization ratio r: gradually increased to 100%
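A minimal NumPy sketch of this roulette-style channel selection, assuming the per-channel quantized weights Q have already been computed; the function name and signature are illustrative.

```python
import numpy as np

def roulette_select(W, Q, r, eps=1e-9):
    """Pick round(r * n_channels) channels to quantize, favoring small quantization error.

    W, Q: full-precision and quantized weights with shape (n_channels, ...).
    r: quantization ratio in [0, 1]. Returns the indices of the channels to quantize.
    """
    n = W.shape[0]
    # Per-channel relative error e_i = ||W_i - Q_i||_1 / ||W_i||_1
    err = np.abs(W - Q).reshape(n, -1).sum(1) / (np.abs(W).reshape(n, -1).sum(1) + eps)
    p = 1.0 / (err + eps)                 # smaller error -> larger selection probability
    p /= p.sum()
    k = int(round(r * n))
    return np.random.choice(n, size=k, replace=False, p=p)
```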

  9. Training & Inference _ n Hybrid weight matrix ๐‘… _ " = $๐‘… " if channel i being selected ๐‘… ๐‘‹ " else n Parameter update ๐‘‹ TUB = ๐‘‹ T โˆ’ ๐œƒ T ๐œ–๐‘€ _ T ๐œ–๐‘… n Inference: all weights are quantized; use ๐‘… to perform inference 9

  10. Ablation Studies
     - Selection granularity: filter-level > element-level
     - Selection/partition algorithms: stochastic (roulette) > deterministic (sorting) ≈ fixed (selection only at the first iteration)
     - Quantization probability functions: linear > sigmoid > constant ≈ softmax, where p_i = exp(f_i) / Σ_j exp(f_j) and f_i = 1/e_i
     - Quantization ratio update scheme: exponential > fine-tune > uniform
        - 50% → 75% → 87.5% → 100%
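A small sketch of how such probability functions might look; only the softmax form is spelled out on the slide, so the linear and constant variants below are plain-normalization assumptions.

```python
import numpy as np

def selection_probs(err, kind="softmax", eps=1e-9):
    """Map per-channel quantization errors to selection probabilities."""
    f = 1.0 / (err + eps)              # smaller error -> larger f_i
    if kind == "linear":               # assumption: direct normalization of f_i
        p = f
    elif kind == "softmax":            # p_i = exp(f_i) / sum_j exp(f_j)
        p = np.exp(f - f.max())        # subtract the max for numerical stability
    elif kind == "constant":           # assumption: uniform selection probability
        p = np.ones_like(f)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return p / p.sum()
```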

  11. Results -- CIFAR
     Error (%) of VGG-9 and ResNet-56 trained with 5 different methods on CIFAR-10 and CIFAR-100:

                       CIFAR-10              CIFAR-100
     Method   Bits     VGG-9   ResNet-56     VGG-9   ResNet-56
     FWN       32       9.00    6.69         30.68   29.49
     BWN        1      10.67   16.42         37.68   35.01
     SQ-BWN     1       9.40    7.15         35.25   31.56
     TWN        2       9.87    7.64         34.80   32.09
     SQ-TWN     2       8.37    6.20         34.24   28.90

     [Figure: training loss vs. iteration (k), comparing FWN, BWN, SQ-BWN and FWN, TWN, SQ-TWN.]

  12. Results -- ImageNet
     Error (%) of AlexNet-BN and ResNet-18 trained with 5 different methods on ImageNet:

                       AlexNet-BN            ResNet-18
     Method   Bits     top-1   top-5         top-1   top-5
     FWN       32      44.18   20.83         34.80   13.60
     BWN        1      51.22   27.18         45.20   21.08
     SQ-BWN     1      48.78   24.86         41.64   18.35
     TWN        2      47.54   23.81         39.83   17.02
     SQ-TWN     2      44.70   21.40         36.18   14.26

  13. Conclusions
     - We propose a stochastic quantization algorithm for low-bit DNN training;
     - Our algorithm can be flexibly applied to all low-bit settings;
     - Our algorithm consistently improves performance;
     - We release our code for future development: https://github.com/dongyp13/Stochastic-Quantization

  14. Q & A
