CMSC5743 L06: Binary/Ternary Network, Bei Yu (Latest update: November 2, 2020)


  1. CMSC5743 L06: Binary/Ternary Network. Bei Yu, Fall 2020 (latest update: November 2, 2020).

  2. These slides contain/adapt materials developed by:
     - Ritchie Zhao et al. (2017). “Accelerating binarized convolutional neural networks with software-programmable FPGAs”. In: Proc. FPGA, pp. 15–24.
     - Mohammad Rastegari et al. (2016). “XNOR-NET: Imagenet classification using binary convolutional neural networks”. In: Proc. ECCV, pp. 525–542.

  3. Binary / Ternary Net: Motivation. [Figure: histogram of weight values (x-axis: Weight Value, from −0.05 to 0.05; y-axis: Count, up to 6400), mapped (=>) to the discrete values −1, 0, 1.]

  4. Binarized Neural Networks (BNN): Key Differences from a CNN
     1. Inputs are binarized (−1 or +1)
     2. Weights are binarized (−1 or +1)
     3. Results are binarized after batch normalization
     CNN: Input Map ∗ Weights = Output Map, all real-valued. Example from the figure: the input patch [2.4 6.2; 3.3 1.8] with weights [0.8 0.1; 0.3 0.8] gives an output value of about 5.0.
     BNN: Input Map (Binary) ∗ Weights (Binary) = Output Map (Integer) → Batch Normalization → Binarization → Output Map (Binary).
     Batch Normalization: $z_{ij} = \frac{y_{ij} - \mu}{\sqrt{\sigma^2 + \epsilon}}\,\gamma + \beta$
     Binarization: $b_{ij} = +1$ if $z_{ij} \ge 0$, and $-1$ otherwise.
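The batch-normalization and binarization stage of a BNN layer can be sketched in a few lines. This is a minimal illustration, assuming NumPy and standard batch normalization; the function name `bn_binarize` and the parameter values are hypothetical, chosen only to show the flow from integer outputs to binary outputs.

```python
import numpy as np

def bn_binarize(y, mu, var, gamma, beta, eps=1e-5):
    """Batch-normalize integer conv outputs, then binarize to +1 / -1."""
    # Standard batch normalization with learned scale (gamma) and shift (beta).
    z = (y - mu) / np.sqrt(var + eps) * gamma + beta
    # Binarization: +1 where z >= 0, -1 otherwise.
    return np.where(z >= 0, 1, -1)

# Integer output map of a binary convolution (illustrative values).
y = np.array([[1, -3], [3, -7]])
b = bn_binarize(y, mu=0.0, var=1.0, gamma=1.0, beta=0.0)
print(b)
```

With an identity-like normalization (zero mean, unit variance, `gamma = 1`, `beta = 0`), the result is simply the sign of each integer output.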

  5. BNN CIFAR-10 Architecture [2]
     - 6 conv layers, 3 dense layers, 3 max pooling layers
     - All conv filters are 3x3
     - First conv layer takes in floating-point input
     - 13.4 Mbits total model size (after hardware optimizations)
     [Figure: feature map dimensions 32x32, 16x16, 8x8, 4x4; number of feature maps 3, 128, 128, 256, 256, 512, 512, 1024, 1024, 10.]
     [2] M. Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv:1602.02830, Feb 2016.

  6. Advantages of BNN
     1. Floating point ops replaced with binary logic ops

        b1   b2   b1 × b2   |   b1   b2   b1 XOR b2
        +1   +1     +1      |    0    0       0
        +1   −1     −1      |    0    1       1
        −1   +1     −1      |    1    0       1
        −1   −1     +1      |    1    1       0

        – Encode {+1, −1} as {0, 1} → multiplies become XORs
        – Conv/dense layers do dot products → XOR and popcount
        – Operations can map to LUT fabric as opposed to DSPs
     2. Binarized weights may reduce total model size
        – Fewer bits per weight may be offset by having more weights
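The XOR-and-popcount dot product can be sketched as follows. This is a minimal illustration in plain Python, assuming the encoding from the slide (+1 as bit 0, −1 as bit 1); the helper names `encode` and `binary_dot` are hypothetical. Each mismatching bit pair contributes −1 to the dot product and each matching pair contributes +1, so for vectors of length n the result is n − 2 · popcount(a XOR b).

```python
def encode(values):
    """Pack a list of +1/-1 values into an int: +1 -> bit 0, -1 -> bit 1."""
    bits = 0
    for i, v in enumerate(values):
        if v == -1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors from their bit encodings."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount of the XOR
    return n - 2 * mismatches

a = [+1, -1, +1, +1]
b = [+1, -1, +1, -1]
result = binary_dot(encode(a), encode(b), len(a))
reference = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate
print(result, reference)
```

On hardware, the same XOR and popcount operate on whole machine words or LUTs at once, which is where the savings over multiply-accumulate come from.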

  7. BNN vs CNN Parameter Efficiency

     Architecture             Depth   Param Bits (Float)   Param Bits (Fixed-Point)   Error Rate (%)
     ResNet [3] (CIFAR-10)    164     51.9M                13.0M*                     11.26
     BNN [2]                  9       -                    13.4M                      11.40

     * Assuming each float param can be quantized to 8-bit fixed-point
     Comparison:
     – Conservative assumption: ResNet can use 8-bit weights
     – BNN is based on VGG (less advanced architecture)
     – BNN seems to hold promise!
     [3] K. He, X. Zhang, S. Ren, and J. Sun. Identity Mappings in Deep Residual Networks. ECCV 2016.
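As a rough check of the starred fixed-point entry, assume the float parameters are stored as 32-bit values (the slide does not state the float width). Quantizing each parameter to 8 bits then shrinks the parameter bits by a factor of four:

```latex
\[
51.9\,\text{M bits} \times \frac{8}{32} \approx 13.0\,\text{M bits}
\]
```

Under that assumption, the quantized ResNet and the BNN (13.4M bits) have nearly the same parameter storage.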

  8. Overview: (1) Minimize the Quantization Error; (2) Reduce the Gradient Error.

  16. Training Binary Weight Networks. Naive Solution: 1. Train a network with real value parameters. 2. Binarize the weight filters.

  17. [Chart: AlexNet Top-1 (%) on ILSVRC2012. Full Precision: 56.7; Naive: 0.2.]

  18. [Figure: Binarization maps the real-valued (R) weights W to binary (B) weights.]

  20. Binary Weight Network. Train for binary weights:
      1. Randomly initialize $W$
      2. For iter = 1 to N
      3.    Load a random input image $X$
      4.    $W^B = \mathrm{sign}(W)$
      5.    $\alpha = \frac{\lVert W \rVert_{\ell_1}}{n}$
      6.    Forward pass with $\alpha, W^B$
      7.    Compute loss function $C$
      8.    $\frac{\partial C}{\partial W}$ = Backward pass with $\alpha, W^B$
      9.    Update $W$ ($W = W - \frac{\partial C}{\partial W}$)
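The training steps above can be sketched on a toy problem. This is a minimal illustration, assuming NumPy, a single linear neuron with squared-error loss, and a straight-through gradient (the gradient with respect to the scaled binary weights is applied directly to the real-valued W). The names `binarize` and `train_step` and the learning rate `lr` are hypothetical and not taken from the slides, which write the update without a step size.

```python
import numpy as np

def binarize(W):
    """Steps 4-5: W_B = sign(W) and alpha = ||W||_1 / n."""
    W_B = np.where(W >= 0, 1.0, -1.0)  # sign(0) is taken as +1 here
    alpha = np.abs(W).mean()           # L1 norm divided by number of weights
    return alpha, W_B

def train_step(W, x, y, lr=0.1):
    """Steps 4-9 for a toy model y_hat = (alpha * W_B) . x."""
    alpha, W_B = binarize(W)
    y_hat = (alpha * W_B) @ x          # forward pass with alpha, W_B
    loss = 0.5 * (y_hat - y) ** 2      # loss function C
    grad = (y_hat - y) * x             # dC/d(alpha * W_B)
    # Assumption: straight-through estimator, gradient passed to W unchanged.
    W_new = W - lr * grad              # update the real-valued weights
    return W_new, loss

W = np.array([0.5, -0.25, 0.75, -0.5])
x = np.array([1.0, 0.0, 0.0, 0.0])
W_new, loss = train_step(W, x, y=1.0)
print(binarize(W), W_new, loss)
```

Note that the real-valued weights W are kept and updated throughout training; only the forward and backward passes see the binarized copy.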

  27. [Chart: AlexNet Top-1 (%) on ILSVRC2012. Full Precision: 56.7; Naive: 0.2; Binary Weight: 56.8.]

  31. XNOR-Net steps:
      (1) Binarizing Weights: the real-valued (R) weights are replaced by binary (B) weights.
      (2) Binarizing Input, inefficient: binarizing each sub-tensor X separately with sign(X) causes redundant computation in overlapping areas.
      (2) Binarizing Input, efficient: average the absolute values over the $c$ channels, $\frac{1}{c}\sum_i |X_{:,:,i}|$, apply an average filter to the result once, and take sign(X).
      (3) Convolution with XNOR-Bitcount: the real-valued (R) convolution is approximated (≈) by a binary (B) convolution on sign(X).
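The efficient input-scaling step can be sketched as follows. This is a minimal illustration, assuming NumPy and an input laid out as (height, width, channels) to match the $X_{:,:,i}$ indexing; the function name `input_scales` and the window sizes `kh`, `kw` are hypothetical. The channel-wise mean of |X| is computed once, and an average filter over it yields one scale per sub-tensor, which matches computing the mean absolute value of every window separately.

```python
import numpy as np

def input_scales(X, kh, kw):
    """Per-window input scales for X of shape (H, W, C)."""
    A = np.abs(X).mean(axis=2)  # (1/c) * sum_i |X[:, :, i]|, computed once
    H, W = A.shape
    K = np.empty((H - kh + 1, W - kw + 1))
    for r in range(K.shape[0]):
        for s in range(K.shape[1]):
            # Average filter over the kh x kw window of A.
            K[r, s] = A[r:r + kh, s:s + kw].mean()
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5, 3))
K = input_scales(X, 3, 3)

# Inefficient equivalent for one window: mean |X| over the whole sub-tensor.
naive = np.abs(X[1:4, 2:5, :]).mean()
print(K.shape, np.isclose(K[1, 2], naive))
```

The explicit loops keep the sketch readable; a real implementation would use a library convolution for the average filter.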

  33. [Chart: AlexNet Top-1 (%) on ILSVRC2012. Bar values: 56.7, 56.8, 30.5, 0.2; the bar labels were not preserved.]

  34. Network Structure in XNOR-Networks. A typical block in CNN (blocks shown: BNorm, Conv, Activ, Pool), where the activation sign(x) outputs +1 or −1 and the pooling is Max-Pooling. ✗ Information Loss, ✓ Multiple Maximums.

  35. Network Structure in XNOR-Networks. Blocks shown: BNorm, Conv, Activ, Pool. ✗ Information Loss, ✓ Multiple Maximums.

  36. Network Structure in XNOR-Networks. Blocks shown: BNorm, BNorm, Conv, Pool, Activ, Activ. ✓ Information Loss, ✓ Multiple Maximums.
