AutoML for TinyML with Once-for-All Network

Song Han, Massachusetts Institute of Technology (Once-for-All, ICLR'20)


  1. AutoML for TinyML with Once-for-All Network. Song Han, Massachusetts Institute of Technology. Once-for-All, ICLR'20.

  2. AutoML for TinyML with Once-for-All Network. Two resource bottlenecks, two remedies: designing models by hand takes many engineers, and large models take a lot of computation. AutoML cuts the engineering resources (fewer engineers); TinyML cuts the computational resources (small models, less computation).

  3. Challenge: Efficient Inference on Diverse Hardware Platforms.
  - Cloud AI: Memory 32 GB; Computation on the order of TFLOPS
  - Mobile AI: Memory 4 GB; Computation on the order of GFLOPS
  - Tiny AI (AIoT): Memory < 100 KB; Computation < MFLOPS
  Resources shrink steadily from cloud to tiny. Different hardware platforms have different resource constraints, so we need to customize our models for each platform to achieve the best accuracy-efficiency trade-off, especially on resource-constrained edge devices. A constraint-check sketch follows this slide.
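
As a concrete (hypothetical) reading of these budgets, the sketch below checks whether a candidate model fits a platform. The budget numbers mirror the slide; the dictionary layout, the fits() helper, and the 100 ms latency target are illustrative assumptions, not from the talk:

```python
# Hypothetical sketch: checking a candidate model against per-platform budgets.
# Budget numbers mirror the slide; names and the latency target are illustrative.
PLATFORMS = {
    "cloud_ai":  {"memory_bytes": 32e9,  "flops_per_s": 1e12},
    "mobile_ai": {"memory_bytes": 4e9,   "flops_per_s": 1e9},
    "tiny_ai":   {"memory_bytes": 100e3, "flops_per_s": 1e6},
}

def fits(platform, model_bytes, model_flops, target_latency_s=0.1):
    """A model fits if its weights fit in memory and one forward pass
    finishes within the latency target at the platform's compute budget."""
    budget = PLATFORMS[platform]
    return (model_bytes <= budget["memory_bytes"]
            and model_flops / budget["flops_per_s"] <= target_latency_s)

print(fits("mobile_ai", 5e6, 50e6))  # True: a 5 MB, 50 MFLOPs model fits mobile
print(fits("tiny_ai",   5e6, 50e6))  # False: 5 MB blows the < 100 KB budget
```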

  4. Challenge: Efficient Inference on Diverse Hardware Platforms. Design cost of training one model: 200 GPU hours.
    for training iterations:
        forward-backward();
  The design cost is calculated under the assumption of using MobileNet-v2.

  5. Challenge: Efficient Inference on Diverse Hardware Platforms. Design cost of neural architecture search: 40K GPU hours.
    for search episodes:                    # (1)
        for training iterations:
            forward-backward();
        if good_model: break;
    for post-search training iterations:
        forward-backward();
  The design cost is calculated under the assumption of using MnasNet [1].
  [1] Tan, Mingxing, et al. "MnasNet: Platform-aware neural architecture search for mobile." CVPR 2019.

  6. Challenge: Efficient Inference on Diverse Hardware Platforms. Hardware is diverse (2013, 2015, 2017, 2019 platforms), and the search must be repeated per device, so the design cost grows from 40K to 160K GPU hours.
    for devices:                            # (2)
        for search episodes:                # (1)
            for training iterations:
                forward-backward();
            if good_model: break;
        for post-search training iterations:
            forward-backward();
  The design cost is calculated under the assumption of using MnasNet [1].

  7. Challenge: Efficient Inference on Diverse Hardware Platforms. Platforms span six orders of magnitude in compute: Cloud AI (~10^12 FLOPS), Mobile AI (~10^9 FLOPS), Tiny AI (~10^6 FLOPS). Repeating the search for many devices pushes the design cost from 40K (one device) to 160K (four devices) to 1600K GPU hours (many devices).
    for many devices:                       # (2)
        for search episodes:                # (1)
            for training iterations:
                forward-backward();
            if good_model: break;
        for post-search training iterations:
            forward-backward();
  The design cost is calculated under the assumption of using MnasNet [1].
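
The scaling is linear in the number of target devices; a back-of-the-envelope sketch using the slide's 40K-GPU-hours-per-device figure (the function itself is illustrative):

```python
# Illustrative cost model: conventional NAS pays the full search cost
# once per target device (40K GPU hours per device, MnasNet assumption).
GPU_HOURS_PER_DEVICE = 40_000

def nas_design_cost(num_devices):
    """Total GPU hours when the whole search is repeated for every device."""
    return num_devices * GPU_HOURS_PER_DEVICE

print(nas_design_cost(1))   # 40000    -> one platform
print(nas_design_cost(4))   # 160000   -> four hardware generations
print(nas_design_cost(40))  # 1600000  -> many diverse devices
```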

  8. Challenge: Efficient Inference on Diverse Hardware Platforms. The same design costs, expressed as CO2 emission:
    40K GPU hours → 11.4k lbs CO2
    160K GPU hours → 45.4k lbs CO2
    1600K GPU hours → 454.4k lbs CO2
  1 GPU hour translates to 0.284 lbs CO2 emission according to Strubell, Emma, et al. "Energy and policy considerations for deep learning in NLP." ACL 2019.
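
The conversion is a straight multiplication; a quick check of the slide's numbers:

```python
# CO2 check: GPU hours x 0.284 lbs/hour (Strubell et al., ACL 2019).
LBS_CO2_PER_GPU_HOUR = 0.284

for gpu_hours in (40_000, 160_000, 1_600_000):
    lbs = gpu_hours * LBS_CO2_PER_GPU_HOUR
    print(f"{gpu_hours:>9} GPU hours -> {lbs / 1000:.1f}k lbs CO2")
# 40000 -> 11.4k, 160000 -> 45.4k, 1600000 -> 454.4k lbs CO2
```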

  9. Problem: TinyML (inference) comes at the cost of BigML (training/search). We need Green AI: solve the environmental problem of NAS. Compared with the Evolved Transformer (ICML'19; its cost was highlighted by Strubell et al., ACL'19), our Hardware-Aware Transformer (ACL'20) emits about 52 lbs of CO2 for the search, a reduction of 4 orders of magnitude.

  10. OFA: Decouple Training and Search. Conventional NAS couples them, paying the full training cost inside every device's search:
    for devices:                            # (2)
        for search episodes:                # (1)
            for training iterations:
                forward-backward();
            if good_model: break;
        for post-search training iterations:
            forward-backward();
  Once-for-All decouples training from search: train once, then search per device with no further training.
    for OFA training iterations:           # training, paid once
        forward-backward();
    for devices:                            # search, per device
        for search episodes:
            sample from OFA;
            if good_model: break;
        direct deploy without training;
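
A minimal runnable sketch of the decoupled per-device search, assuming the once-for-all network has already been trained; sample_subnet, accuracy_of, and latency_on are hypothetical stand-ins, not the OFA API:

```python
import random

# Hypothetical search space: a sub-network is one choice per dimension.
CHOICES = {
    "resolution": [128, 160, 192, 224],
    "kernel":     [3, 5, 7],
    "depth":      [2, 3, 4],
    "width":      [3, 4, 6],
}

def sample_subnet():
    """Sampling replaces training: a sub-network is just an index into
    the already-trained once-for-all network's weights."""
    return {dim: random.choice(opts) for dim, opts in CHOICES.items()}

def search_for_device(latency_budget_ms, accuracy_of, latency_on, episodes=1000):
    """Per-device search: no forward-backward passes, only cheap evaluations."""
    best, best_acc = None, 0.0
    for _ in range(episodes):
        net = sample_subnet()
        if latency_on(net) <= latency_budget_ms and accuracy_of(net) > best_acc:
            best, best_acc = net, accuracy_of(net)
    return best  # direct deploy without training
```

In the paper, the evaluators corresponding to accuracy_of and latency_on are trained predictors, so each search episode is a cheap lookup rather than a measurement or a training run.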

  11. Challenge: Efficient Inference on Diverse Hardware Platforms. With a once-for-all network, the design cost (and its footprint: 40K GPU hours → 11.4k lbs CO2, 160K → 45.4k lbs, 1600K → 454.4k lbs) no longer multiplies with the number of devices: training is paid once, and each device only searches.
    for OFA training iterations:           # training, paid once
        forward-backward();
    for devices:                            # search, per device
        for search episodes:
            sample from OFA;
            if good_model: break;
        direct deploy without training;
  1 GPU hour translates to 0.284 lbs CO2 emission according to Strubell, Emma, et al. "Energy and policy considerations for deep learning in NLP." ACL 2019.

  12.–15. Once-for-All Network: Decouple Model Training and Architecture Design. Train a single once-for-all network once; specialized sub-networks for different deployment scenarios are then derived from it without retraining.

  16. Challenge: how do we prevent different sub-networks from interfering with each other?

  17. Solution: Progressive Shrinking.
  • More than 10^19 different sub-networks live in a single once-for-all network, covering 4 different dimensions: resolution, kernel size, depth, width (a worked count follows this slide).
  • Directly optimizing the once-for-all network from scratch is much more challenging than training a normal neural network, given so many sub-networks to support.
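
Where the 10^19 comes from, as a worked count under the OFA paper's mobile search space (5 units; per unit, depth in {2, 3, 4}; per layer, 3 kernel sizes × 3 width ratios = 9 configurations; the elastic resolution multiplies the count further but is conventionally left out):

```python
# Counting sub-networks in the OFA mobile search space.
per_layer = 3 * 3                                   # {3,5,7} kernels x {3,4,6} widths
per_unit = sum(per_layer ** d for d in (2, 3, 4))   # 81 + 729 + 6561 = 7371 variants
total = per_unit ** 5                               # 5 independent units
print(f"{total:.1e}")                               # ~2.2e+19 sub-networks
```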

  18. Solution: Progressive Shrinking. The recipe: Train the full model → Shrink the model (4 dimensions) → Jointly fine-tune both large and small sub-networks → once-for-all network.
  • Small sub-networks are nested in large sub-networks.
  • Cast the training process of the once-for-all network as a progressive shrinking and joint fine-tuning process (see the schedule sketch after this slide).
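
A runnable sketch of that schedule, assuming the option values from the OFA paper's mobile setting ({3,5,7} kernels, {2,3,4} depths, {3,4,6} width ratios, 128–224 resolutions); the stage layout is the point, and the sampling helper is illustrative:

```python
import random

# Progressive-shrinking schedule: each stage enlarges the set of
# sub-networks that training samples from, one dimension at a time.
STAGES = [
    ("full model",     {"kernel": [7],       "depth": [4],       "width": [6]}),
    ("elastic kernel", {"kernel": [3, 5, 7], "depth": [4],       "width": [6]}),
    ("elastic depth",  {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [6]}),
    ("elastic width",  {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [3, 4, 6]}),
]
RESOLUTIONS = [128, 160, 192, 224]  # elastic resolution is sampled throughout

def sample_subnet(stage_space):
    """Each training step optimizes one randomly sampled sub-network;
    because small sub-nets are nested in large ones, its gradients
    update a slice of the shared supernet weights."""
    cfg = {dim: random.choice(opts) for dim, opts in stage_space.items()}
    cfg["resolution"] = random.choice(RESOLUTIONS)
    return cfg

for name, space in STAGES:
    print(name, "->", sample_subnet(space))
```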

  19. Connection to Network Pruning.
  Network Pruning: Train the full model → Shrink the model (only width) → Fine-tune the small net → a single pruned network.
  Progressive Shrinking: Train the full model → Shrink the model (4 dimensions) → Fine-tune both large and small sub-nets → a once-for-all network.
  • Progressive shrinking can be viewed as a generalized network pruning with much higher flexibility across 4 dimensions (a width-shrinking sketch follows this slide).
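
To make "small sub-networks are nested in large ones" concrete, here is a minimal width-shrinking sketch: keep the most important output channels, so the small net's weights are literally a slice of the full net's. Sorting channels by L1 importance mirrors the OFA paper's channel-sorting idea, but this code is illustrative, not the OFA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
full_weight = rng.standard_normal((6, 6, 3, 3))  # conv weight: (out_ch, in_ch, kH, kW)

def shrink_width(weight, out_ch):
    """Keep the `out_ch` most important output channels (by L1 norm);
    the shrunk sub-net shares these rows with the full network."""
    importance = np.abs(weight).sum(axis=(1, 2, 3))
    keep = np.sort(np.argsort(importance)[::-1][:out_ch])
    return weight[keep]

small = shrink_width(full_weight, 4)  # nested 4-channel sub-network
print(small.shape)                    # (4, 6, 3, 3)
```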

  20.–32. Progressive Shrinking (animated diagram, built up over these slides). The diagram is a table with columns Resolution, Kernel Size, Depth, Width; each column starts at Full and is switched to Elastic step by step, passing through Partial support, until all four dimensions are elastic.
