 
Mixed Precision Neural Architecture Search for Energy Efficient Deep Learning
Chengyue Gong*¹, Zixuan Jiang*², Dilin Wang¹, Yibo Lin², Qiang Liu¹, and David Z. Pan²
¹CS Department, ²ECE Department, The University of Texas at Austin
* indicates equal contributions

Contents
• Introduction
• Our algorithm
• Experimental results
• Conclusion

Success of Machine Learning
[Figure: example applications, with labels "cat", "Chinese", "English"]

Energy Efficient Computation
• Energy, computation, latency, security, etc. are critical metrics of edge inference.
• Tradeoff between accuracy and complexity of models.
• Efficient computation
  › Neural architecture
  › Quantization
[Figure: training in the cloud, inference on the edge; goal: higher accuracy, less energy]

Neural Architecture Design
• The mechanism of neural networks is not well understood.
• Designing neural architectures is challenging.
• Can we advance AI/ML using artificial intelligence instead of human intelligence?
[Figure: "ImageNet Results". Left axis: Layers / Speed (ms), log scale; right axis: Error rate (%). Series: Layers, Speed (ms), Top-1 error, Top-5 error, plotted per network]

Neural Architecture Search
[Figure: search loop. Labels: Search space, Sample networks, Environment as a black box (Training and evaluation, Hardware simulation), Update controller]

Neural Architecture Search
• Black box optimization
  › Find the optimal network configuration to maximize the performance
  › Huge search space
• Available methods
  › Reinforcement learning
  › Evolutionary algorithm
  › Differentiable architecture search
[Figure: Search space, Sample following policy, Black box, Update policy]
H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," ICLR 2019.

Quantization
• Weights and activations can be quantized due to the inherent redundancy in representations.
• Mixed precision for different layers
  › HAQ
[Figure: "MobileNet-V1 on ImageNet". Axes: Accuracy and Energy (mJ). Configurations: fix8, fix6, fix4, HAQ mixed. Series: top 1, top 5, energy (mJ)]
K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, "HAQ: Hardware-aware automated quantization with mixed precision," CVPR 2019.

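To make the bitwidth tradeoff concrete, here is a minimal sketch of quantizing one weight tensor at several fixed bitwidths. It assumes symmetric, per-tensor uniform quantization; this scheme, the function name `quantize_uniform`, and the random weights are illustrative assumptions, not the quantizer used in HAQ or in this work.

```python
import numpy as np

def quantize_uniform(x, num_bits):
    """Symmetric uniform quantization to num_bits, returned in dequantized form.

    Assumption: one scale per tensor, integer levels in [-qmax, qmax].
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()                     # all-zero tensor: nothing to quantize
    scale = max_abs / qmax                  # real value represented by one integer step
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# Hypothetical weights standing in for one layer of a network.
rng = np.random.default_rng(0)
weights = rng.normal(size=1000)

# Mean squared quantization error for each candidate bitwidth.
errors = {
    bits: float(np.mean((weights - quantize_uniform(weights, bits)) ** 2))
    for bits in (8, 6, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit: mse = {err:.6f}")
```

Fewer bits leave fewer representable levels, so the error grows as the bitwidth shrinks. A mixed precision scheme picks a separate bitwidth per layer instead of one value for the whole network.
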
Our Work
[Figure: overlapping areas "Mixed Precision Quantization", "Neural Architecture Search", and "Energy Efficient Computation"; "Our work" sits at their intersection]

Search Space
• Basis: MobileNetV2 block (MB)
• Expand ratio $e \in \{1, 3, 6\}$
• Kernel size $k \in \{3, 5, 7\}$
• Network connectivity $c \in \{0, 1\}$
• Layer-wise bitwidths $b_1, b_2 \in \{2, 4, 6, 8\}$
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," CVPR 2018.

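Search Space

Input Shape      Block Type       Bitwidth   #Channels   Stride   #Blocks
---------------  ---------------  ---------  ----------  -------  -------
224 × 224 × 3    Conv 3 × 3       8          32          2        1
112 × 112 × 32   Search space     searched   16          2        1
56 × 56 × 16     Search space     searched   24          1        2
56 × 56 × 24     Search space     searched   32          2        4
28 × 28 × 32     Search space     searched   64          2        4
14 × 14 × 64     Search space     searched   128         1        4
14 × 14 × 128    Search space     searched   160         2        5
7 × 7 × 160      Search space     searched   256         1        2
7 × 7 × 256      Conv 1 × 1       8          1280        1        1
7 × 7 × 1280     Pooling and FC   8          -           1        1

• Each searched block is block($e, k, c, b_1, b_2$)
  › Neural architecture: $e, k, c$
  › Quantization: $b_1, b_2$
• Number of settings: $288^{22} \approx 1.28 \times 10^{54}$
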
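The size of the search space follows from the per-block choices and the block counts in the table. The short check below recomputes it. The constant names are illustrative, and the reading that the two bitwidths are chosen independently per block is an assumption that is consistent with the count of 288.

```python
# Candidate values per searched block, as listed on the Search Space slides.
EXPAND_RATIOS = (1, 3, 6)
KERNEL_SIZES = (3, 5, 7)
CONNECTIVITY = (0, 1)
BITWIDTHS = (2, 4, 6, 8)   # assumed to be chosen independently for each of the two bitwidths

# Number of searched blocks in each stage (the "#Blocks" column, excluding
# the fixed first convolution, last convolution, and pooling/FC rows).
BLOCKS_PER_STAGE = (1, 2, 4, 4, 4, 5, 2)

choices_per_block = (
    len(EXPAND_RATIOS) * len(KERNEL_SIZES) * len(CONNECTIVITY) * len(BITWIDTHS) ** 2
)
num_blocks = sum(BLOCKS_PER_STAGE)
search_space_size = choices_per_block ** num_blocks   # exact Python integer

print(choices_per_block)            # 288
print(num_blocks)                   # 22
print(f"{search_space_size:.2e}")   # about 1.28e+54
```

A space of this size rules out exhaustive evaluation, which is why a learned controller is used to sample configurations.
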
Our Framework
[Figure: framework diagram. Labels: Controller, Environment, Training, Evaluation, Deployment, Loss, Energy, Weighted Reward]

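Problem Formulation
• Discover neural architectures that minimize the task-related loss while satisfying the energy constraint.

$$\min_{\theta} \; \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ \mathcal{L}(\mathcal{N}, w^*; \mathcal{D}_{\text{validate}}) \right] \quad \text{(expectation of loss)}$$
$$\text{s.t.} \quad w^* = \arg\min_{w} \mathcal{L}(\mathcal{N}, w; \mathcal{D}_{\text{train}}) \quad \text{(training of NN)}$$
$$\mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ \mathcal{E}(\mathcal{N}) \right] < C \quad \text{(energy constraint)}$$

$\pi_\theta$ is the policy with parameter $\theta$. $\mathcal{N}$ represents a neural network with weights $w$.
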
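Problem Formulation
• Discover neural architectures that minimize the task-related loss while satisfying the energy constraint.

$$\min_{\theta} \; \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ \mathcal{L}(\mathcal{N}, w^*; \mathcal{D}_{\text{validate}}) \right]$$
$$\text{s.t.} \quad w^* = \arg\min_{w} \mathcal{L}(\mathcal{N}, w; \mathcal{D}_{\text{train}}), \qquad \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ \mathcal{E}(\mathcal{N}) \right] < C$$

• Relaxation

$$\min_{\theta} \; \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ \mathcal{L}(\mathcal{N}, w^*; \mathcal{D}_{\text{validate}}) \right] + \lambda \max\left( \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ \mathcal{E}(\mathcal{N}) \right] - C, \; 0 \right)$$
$$\text{s.t.} \quad w^* = \arg\min_{w} \mathcal{L}(\mathcal{N}, w; \mathcal{D}_{\text{train}})$$
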
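The relaxed objective can be estimated from a batch of sampled networks by replacing both expectations with sample means. The sketch below does this; the function name, argument names, and numbers are illustrative assumptions rather than values from the experiments.

```python
def relaxed_objective(losses, energies, energy_budget, penalty_weight):
    """Monte Carlo estimate of E[loss] + lambda * max(E[energy] - C, 0).

    losses and energies hold one value per sampled network; both
    expectations are approximated by the mean over the batch.
    """
    expected_loss = sum(losses) / len(losses)
    expected_energy = sum(energies) / len(energies)
    violation = max(expected_energy - energy_budget, 0.0)   # zero while within budget
    return expected_loss + penalty_weight * violation

# Hypothetical batch of three sampled networks (validation loss, energy in mJ).
within_budget = relaxed_objective([0.6, 0.5, 0.4], [1.0, 1.5, 2.0], 2.0, 0.1)
over_budget = relaxed_objective([0.6, 0.5, 0.4], [2.0, 3.0, 4.0], 2.0, 0.1)
print(within_budget, over_budget)
```

The penalty is inactive while the expected energy stays below the budget, so the objective reduces to the plain expected loss in that regime. A larger penalty weight pushes the search toward cheaper networks once the budget is exceeded.
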
Hardware
[Figure: framework diagram, hardware part. Labels: Controller, Environment, Training, Evaluation, Deployment, Loss, Energy, Weighted Reward]

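REINFORCE algorithm
• Policy gradient theorem: for any differentiable policy $\pi_\theta$ and any policy objective function $J$, the policy gradient is

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \nabla_\theta \log \pi_\theta \; Q^{\pi_\theta} \right]$$

• Non-differentiable energy measures

$$\nabla_\theta \, \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ J(\mathcal{N}) \right] = \mathbb{E}_{\mathcal{N} \sim \pi_\theta} \left[ J(\mathcal{N}) \, \nabla_\theta \log \pi_\theta(\mathcal{N}) \right] \approx \frac{1}{B} \sum_{i=1}^{B} J(\mathcal{N}_i) \, \nabla_\theta \log \pi_\theta(\mathcal{N}_i)$$
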
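The sample-based estimator above can be checked on a small categorical policy, where the exact gradient is available in closed form. This is a minimal sketch under stated assumptions: the policy is a softmax over three discrete choices, and `energy_mj` is a made-up lookup table standing in for a non-differentiable hardware simulator.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))   # shift by the max for numerical stability
    return e / e.sum()

def reinforce_gradient(logits, objective, num_samples, rng):
    """Estimate grad_theta E_{a ~ pi_theta}[J(a)] as mean_i J(a_i) * grad log pi(a_i)."""
    probs = softmax(logits)
    samples = rng.choice(len(probs), size=num_samples, p=probs)
    # For a softmax policy, grad of log pi(a) w.r.t. the logits is one_hot(a) - probs.
    score = np.eye(len(probs))[samples] - probs
    values = objective[samples]            # J(a_i), treated as a black box lookup
    return (values[:, None] * score).mean(axis=0)

# Hypothetical energy (mJ) of three candidate configurations.
energy_mj = np.array([1.0, 2.0, 4.0])
logits = np.zeros(3)
rng = np.random.default_rng(0)

estimate = reinforce_gradient(logits, energy_mj, 200_000, rng)

# Closed-form gradient for a softmax policy, used only to check the estimate.
probs = softmax(logits)
exact = probs * (energy_mj - probs @ energy_mj)
print(estimate, exact)
```

The estimator only needs the value of the objective at sampled points, which is what makes it usable with simulated or measured energy.
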
Software
[Figure: framework diagram, software part. Labels: Controller, Environment, Training, Evaluation, Deployment, Loss, Energy, Weighted Reward]

Non-Differentiability
• Relax the discrete mask variable $m$ to be a continuous random variable computed by the Gumbel-Softmax function

$$m_i = \frac{\exp\left( (g_i + \log \pi_i) / \tau \right)}{\sum_j \exp\left( (g_j + \log \pi_j) / \tau \right)}$$

where $\log \pi_i$ is the logit of choice $i$, $g_i \sim \text{Gumbel}(0, 1)$, and $\tau$ is the temperature.
• Example: one-hot $[0, 1, 0]$ becomes continuous $[.3, .5, .2]$
E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-Softmax," ICLR 2017.

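The Gumbel-Softmax formula can be sketched in a few lines of NumPy. The function name and the example probabilities are illustrative; Gumbel noise is drawn with the standard inverse-CDF transform $g = -\log(-\log u)$ for uniform $u$.

```python
import numpy as np

def gumbel_softmax(log_probs, temperature, rng):
    """Draw one relaxed (continuous) sample from a categorical distribution."""
    # Gumbel(0, 1) noise via inverse transform; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=log_probs.shape)
    g = -np.log(-np.log(u))
    z = (g + log_probs) / temperature
    z -= z.max()                      # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

log_probs = np.log(np.array([0.3, 0.5, 0.2]))

# Same seed, hence same Gumbel noise, at two temperatures.
soft = gumbel_softmax(log_probs, 1.0, np.random.default_rng(0))
sharp = gumbel_softmax(log_probs, 0.1, np.random.default_rng(0))
print(soft, sharp)
```

With the noise held fixed, lowering the temperature keeps the same winning entry but moves the sample closer to a one-hot vector. Higher temperatures give smoother samples that are easier to differentiate through.
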
Bilevel Optimization
• Whenever the policy parameter $\theta$ changes, the network weights $w(\mathcal{N})$ need to be retrained.
• Motivated by differentiable architecture search (DARTS), we propose the following algorithm.
  1. Sample a minibatch of network configurations $\mathcal{N}$ from the controller
  2. Update the network models $w(\mathcal{N})$ by minimizing the training loss
  3. Update the controller parameters $\theta$
H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," ICLR 2019.

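The three alternating steps can be illustrated on a toy problem. Everything below is a hypothetical stand-in and not the authors' implementation: three candidate configurations, one scalar weight per configuration, a quadratic training loss, made-up energy values, and a REINFORCE update on a penalized validation cost.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

# Hypothetical toy setup: numbers are made up for illustration only.
IRREDUCIBLE_ERROR = np.array([0.5, 0.2, 0.1])   # validation error floor per configuration
ENERGY = np.array([1.0, 2.0, 4.0])              # energy per configuration
ENERGY_BUDGET, PENALTY_WEIGHT = 2.5, 1.0

def train_loss_grad(w):
    return 2.0 * (w - 1.0)                      # gradient of the toy training loss (w - 1)^2

def validation_loss(configs, w):
    return IRREDUCIBLE_ERROR[configs] + (w - 1.0) ** 2

def search(num_steps=200, batch_size=8, lr_w=0.1, lr_theta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)                         # controller logits
    weights = np.zeros(3)                       # one scalar weight per configuration
    for _ in range(num_steps):
        probs = softmax(theta)
        # Step 1: sample a minibatch of configurations from the controller.
        configs = rng.choice(3, size=batch_size, p=probs)
        # Step 2: one gradient step on the training loss per sampled network.
        for c in configs:
            weights[c] -= lr_w * train_loss_grad(weights[c])
        # Step 3: REINFORCE update of the controller on the relaxed objective.
        cost = validation_loss(configs, weights[configs])
        if ENERGY[configs].mean() > ENERGY_BUDGET:      # penalty active only over budget
            cost = cost + PENALTY_WEIGHT * ENERGY[configs]
        score = np.eye(3)[configs] - probs              # grad of log pi w.r.t. logits
        theta -= lr_theta * (cost[:, None] * score).mean(axis=0)
    return softmax(theta), weights

final_probs, final_weights = search()
print(final_probs, final_weights)
```

Taking a single weight update per sampled network, rather than retraining to convergence after every controller change, is what keeps the alternating scheme affordable.
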
Experimental Settings
• Hardware simulator of Bit Fusion [1, 2]
• First, search architectures and mixed precision for each layer on a proxy task, tiny ImageNet
  › Trained for a fixed 60 epochs
  › 5 days on 1 NVIDIA Tesla P100
• Next, train the discovered architectures on CIFAR-100 and ImageNet from scratch.
[1] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, and H. Esmaeilzadeh, "Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network," in Proc. ISCA, June 2018, pp. 764–775.
[2] https://github.com/hsharma35/bitfusion

Searched Results
[Figure: searched architectures. $\lambda = 0.1$: Ours-small; $\lambda = 0.01$: Ours-base]

Results on ImageNet
[Figure: bar chart "ImageNet Result". Series: HAQ-small, Ours-small, HAQ-base, Ours-base. Metrics: Top-5 Error, Model Size (MB), Energy (mJ), Latency (ms). Data labels in extraction order: 40.2, 32.1, 24.7, 21.2, 16.3, 12.9, 12.7, 11.6, 10.9, 10.1, 9.94, 8.91, 2.12, 2.06, 1.7, 1.44]

Joint NAS and Mixed Precision Quantization
[Figure: Error (%) versus Energy (mJ), with model size shown in the legend. Points: NAS + quantization (small), Ours-small, NAS + quantization (base), Ours-base]

Adaptive Mixed Precision Quantization
• Pareto front for error rate, latency, and energy

Conclusion
• We propose a new methodology that jointly optimizes NAS and mixed precision quantization in an extended search space.
• Hardware performance is incorporated into the objective function.
• Our methodology facilitates an end-to-end design automation flow for neural network design and deployment, especially for edge inference.

Thank you!

Backup Framework
• Pipelines of hardware-centric model design automation for efficient neural networks
• Limited research considers each stage of the pipeline collaboratively
• Our proposed framework: Mixed Precision NAS
[Figure: pipeline stages Neural Architecture Search, Model Pruning, Quantization; Mixed Precision NAS spans stages]

Backup result
[Figure: bar chart "ImageNet Result". Series: VGG-16 FXP 8, ResNet-50 FXP 8, MobileNetV2 FXP 8, FBNet-B FXP 8, FBNet-B FXP3, HAQ-small Mixed, Ours-small Mixed, HAQ-base Mixed, Ours-base Mixed. Metrics: Top-1 Error, Top-5 Error, Model Size (MB), Energy (mJ), Latency (ms). Data labels in extraction order: 838, 753, 591, 557, 138, 83.9, 73.9, 36.29, 40.2, 33.01, 31.62, 34.7, 32.1, 28.19, 28.23, 26.84, 29.1, 29.1, 27.9, 25.5, 24.7, 24.7, 29, 21.2, 16.3, 15.4, 13.5, 12.7, 12.9, 11.6, 10.9, 10.1, 9.94, 9.75, 8.97, 8.91, 7.4, 5.3, 4.5, 3.4, 2.12, 2.06, 1.68, 1.7, 1.44]