AI on the Edge — Discussion on the Gap Between Industry and Academia
Yunhe Wang, Huawei Noah’s Ark Lab
ABOUT ME
Yunhe Wang: enthusiastic PKUer, researcher, and programmer.
www.wangyunhe.site | yunhe.wang@huawei.com
Deep Model Compression [Han et al., NIPS 2015; Han et al., ICLR 2016 best paper award]
Restrictions for using AI on the edge:
• It is surprising to see that over 90% of the pre-trained parameters in AlexNet and VGGNet are redundant.
• Techniques from visual compression, e.g. quantization and Huffman encoding, transfer successfully to networks.
• Compressed networks can achieve the same performance as the original baselines after fine-tuning.
• However, this approach cannot directly obtain a considerable speed-up on mainstream hardware.
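To make the pruning and weight-sharing steps above concrete, here is a minimal NumPy sketch of magnitude pruning followed by k-means-style weight sharing; fine-tuning and Huffman coding are omitted, and the layer size, pruning ratio, and bit width are illustrative assumptions rather than values from the slide.

    # Minimal sketch of the Deep Compression pipeline [Han et al.]:
    # magnitude pruning + weight sharing (k-means quantization).
    # All shapes and thresholds below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.05, size=(256, 256))       # a dense layer's weights

    # 1) Magnitude pruning: zero out ~90% of the smallest-magnitude weights.
    threshold = np.quantile(np.abs(W), 0.90)
    mask = np.abs(W) > threshold
    W_pruned = W * mask

    # 2) Weight sharing: cluster surviving weights into 2^bits centroids,
    #    so each weight is stored as a small index into a codebook.
    bits = 4
    nonzero = W_pruned[mask]
    codebook = np.quantile(nonzero, np.linspace(0, 1, 2 ** bits))  # simple init
    for _ in range(10):                               # a few Lloyd iterations
        idx = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(len(codebook)):
            if np.any(idx == k):
                codebook[k] = nonzero[idx == k].mean()

    W_quant = np.zeros_like(W)
    W_quant[mask] = codebook[idx]

    print("kept weights:", mask.mean())               # ~10% of parameters remain
    print("quantization error:", np.abs(W_quant - W_pruned).max())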
CNNpack: Packing Convolutional Neural Networks in the Frequency Domain (NIPS 2016)
Pipeline: original filters -> DCT bases -> k-means clustering -> l1-shrinkage -> quantization -> Huffman encoding -> CSR storage; feature maps of a layer are computed as weighted combinations of DCT feature maps of the input data.
Compression results:
                               AlexNet    VGGNet-16    ResNet-50
  Compression ratio (r_c)      39x        46x          12x
  Speed-up ratio (r_s)         25x        9.4x         4.4x
  Top-1 error                  41.6%      29.7%        25.2%
  Top-5 error                  19.2%      10.4%        7.8%
[Figure: bar charts of memory (MB) and number of multiplications for AlexNet, VGGNet-16, and ResNet-50 before and after compression.]
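A rough sketch of the frequency-domain idea, keeping only the DCT and l1-shrinkage steps (k-means clustering, quantization, Huffman encoding, and CSR storage are omitted); the filter bank and shrinkage threshold are made up for illustration.

    # Represent convolution filters in the DCT frequency domain and discard
    # small coefficients (l1-shrinkage). A simplification of CNNpack.
    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(0)
    filters = rng.normal(0.0, 0.1, size=(64, 7, 7))        # 64 spatial filters

    compressed = []
    for f in filters:
        coeff = dctn(f, norm="ortho")                       # 2-D DCT of one filter
        thr = 0.02                                          # illustrative threshold
        compressed.append(np.where(np.abs(coeff) > thr, coeff, 0.0))

    compressed = np.stack(compressed)
    sparsity = (compressed == 0).mean()
    recon = np.stack([idctn(c, norm="ortho") for c in compressed])
    err = np.abs(recon - filters).mean()
    print(f"zeroed coefficients: {sparsity:.1%}, mean reconstruction error: {err:.4f}")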
Adversarial Learning of Portable Student Networks (AAAI 2018)
[Figure: input images are fed to both the teacher network and the student network; a discriminator (the teaching assistant) compares teacher features and student features in a shared feature space.]
We suggest developing a teaching-assistant (discriminator) network to identify the difference between the features generated by the student and the teacher networks:
\mathcal{L}_{GAN} = \frac{1}{n}\sum_{i=1}^{n} H\left(o_i^{S}, y_i\right) + \gamma \, \frac{1}{n}\sum_{i=1}^{n} \left[ \log D\left(z_i^{T}\right) + \log\left(1 - D\left(z_i^{S}\right)\right) \right]
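A minimal PyTorch sketch of this objective: cross-entropy on the student's predictions plus the adversarial term in which the discriminator (teaching assistant) separates teacher features from student features. The toy networks, the dummy batch, and the value of gamma are assumptions, and the min-max alternation between discriminator and student updates is not shown.

    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))       # frozen in practice
    student = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))
    classifier = nn.Linear(128, 10)                                   # student head -> o^S
    discriminator = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())    # D(.)

    x = torch.randn(32, 1, 28, 28)           # dummy batch
    y = torch.randint(0, 10, (32,))
    gamma = 0.1                               # weight of the adversarial term (assumed)

    z_t = teacher(x).detach()                 # teacher features z^T
    z_s = student(x)                          # student features z^S
    o_s = classifier(z_s)

    ce = nn.functional.cross_entropy(o_s, y)                          # H(o^S, y)
    eps = 1e-6
    adv = (torch.log(discriminator(z_t) + eps) +
           torch.log(1 - discriminator(z_s) + eps)).mean()            # GAN term
    loss_gan = ce + gamma * adv
    print(loss_gan.item())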
Adversarial Learning of Portable Student Networks (AAAI 2018)
Visualization results of different networks trained on the MNIST dataset, where features of a specific category in each sub-figure are shown in the same color: (a) features of the original teacher network (accuracy = 99.2%); (b) features of the student network learned with the standard back-propagation strategy (accuracy = 97.2%); (c) features of the student network learned with the proposed method with a teaching assistant (accuracy = 99.1%).
Toward Evolutionary Compression (SIGKDD 2018)
An illustration of the evolution of LeNet on the MNIST dataset. Each dot represents an individual in the population, and the thirty best individuals are shown at each evolutionary iteration. The fitness of individuals gradually improves as the number of iterations increases, implying that the network becomes more compact while retaining the same accuracy.
[Figure legend: original filters, remaining filters, retrained filters.]
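A toy sketch of the evolutionary search, assuming each individual is a binary mask over a layer's filters and fitness trades a stand-in accuracy term against compression; the real method evaluates the pruned network on data, so the importance scores and weights here are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_filters, pop_size, n_iters = 64, 30, 50
    importance = rng.random(n_filters)              # stand-in for filter importance

    def fitness(mask):
        acc_proxy = importance[mask.astype(bool)].sum() / importance.sum()
        compression = 1.0 - mask.mean()             # fraction of filters removed
        return acc_proxy + 0.5 * compression        # weighted combination (assumed)

    population = rng.integers(0, 2, size=(pop_size, n_filters))
    for it in range(n_iters):
        scores = np.array([fitness(ind) for ind in population])
        parents = population[np.argsort(scores)[-pop_size // 2:]]    # keep the best half
        children = []                                # crossover + bit-flip mutation
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_filters)
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_filters) < 0.02
            children.append(np.where(flip, 1 - child, child))
        population = np.vstack([parents, children])

    best = population[np.argmax([fitness(ind) for ind in population])]
    print("filters kept:", int(best.sum()), "of", n_filters)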
Co-Evolutionary Compression for GANs (ICCV 2019)
The two generators in CycleGAN are compressed simultaneously: two populations of pruned generators (Population A and Population B) are co-evolved over iterations 1, 2, …, T.
[Figures: evolution of Population A and Population B across iterations; qualitative comparison of input, baseline, ThiNet, and ours; statistics of the compressed generators.]
Latency on the P30 Pro: 6.8s -> 2.1s.
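A toy sketch of the co-evolution idea under the same stand-in fitness as the previous slide: two populations of pruning masks, one per generator, are evolved together, and an individual's fitness is coupled to the best partner in the other population to mimic the cycle-consistency coupling; all quantities are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n_filters, pop = 32, 20
    imp_a, imp_b = rng.random(n_filters), rng.random(n_filters)   # stand-in importances

    def quality(mask, importance):
        return importance[mask.astype(bool)].sum() / importance.sum()

    def fitness(mask, importance, partner_masks, partner_importance):
        # reward own quality and compression, coupled with the best partner's quality
        best_partner = max(quality(p, partner_importance) for p in partner_masks)
        return quality(mask, importance) * best_partner + 0.5 * (1 - mask.mean())

    pop_a = rng.integers(0, 2, size=(pop, n_filters))
    pop_b = rng.integers(0, 2, size=(pop, n_filters))
    for _ in range(30):
        for P, imp, Q, imp_q in ((pop_a, imp_a, pop_b, imp_b),
                                 (pop_b, imp_b, pop_a, imp_a)):
            scores = [fitness(m, imp, Q, imp_q) for m in P]
            order = np.argsort(scores)[::-1]
            survivors = P[order[: pop // 2]]
            mutants = np.where(rng.random(survivors.shape) < 0.05,
                               1 - survivors, survivors)
            P[:] = np.vstack([survivors, mutants])   # update population in place

    print("generator A filters kept:", int(pop_a[0].sum()), "of", n_filters)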
DAFL: Data-Free Learning of Student Networks (ICCV 2019)
How can we provide a perfect model-optimization service on the cloud?
Privacy-related AI applications: FaceID, voice assistant, fingerprint, entertainment apps.
A generator is introduced to approximate the training data: random signals are fed into a generative network, and its outputs are used to distill the teacher network into a student network.
[Figure: original and generated face images; distillation pipeline from random signals through the generative network, teacher network, and student network.]
Results: 98.20% on MNIST, 92.22% on CIFAR-10, 74.47% on CIFAR-100.
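A compressed PyTorch sketch of the data-free setup: a generator maps random signals to pseudo-images, the fixed teacher labels them, and the student is distilled on those soft labels. The tiny MLPs and the single entropy-based generator loss are simplifications; DAFL's full objective (one-hot, activation, and information-entropy terms) is not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))     # pretrained in practice
    student = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
    generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())     # noise -> pseudo-image

    opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

    for step in range(100):
        z = torch.randn(64, 100)                                  # random signals
        fake = generator(z).view(64, 1, 28, 28)

        with torch.no_grad():
            t_logits = teacher(fake)

        # generator: encourage confident (one-hot-like) teacher predictions,
        # a stand-in for DAFL's one-hot loss
        t_prob = F.softmax(teacher(fake), dim=1)
        g_loss = -(t_prob * torch.log(t_prob + 1e-6)).sum(1).mean()
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

        # student: match the teacher's soft outputs on the generated images
        s_loss = F.kl_div(F.log_softmax(student(fake.detach()), dim=1),
                          F.softmax(t_logits, dim=1), reduction="batchmean")
        opt_s.zero_grad(); s_loss.backward(); opt_s.step()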
AdderNet: Do We Really Need Multiplications in Deep Learning? (CVPR 2020)
[Figures: feature visualization on MNIST for the adder network vs. the convolutional network; validations on ImageNet.]
Using additions instead of multiplications in deep learning can significantly reduce the energy consumption and area cost of chips:
https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf
http://eecs.oregonstate.edu/research/vlsi/teaching/ECE471_WIN15/mark_horowitz_ISSCC_2014.pdf
http://eyeriss.mit.edu/2019_neurips_tutorial.pdf
Feature calculation in a convolutional neural network: Y(m,n,t) = \sum_{i}\sum_{j}\sum_{k} X(m+i, n+j, k) \times F(i, j, k, t)
Feature calculation in an adder neural network: Y(m,n,t) = -\sum_{i}\sum_{j}\sum_{k} \left| X(m+i, n+j, k) - F(i, j, k, t) \right|
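The two formulas above map directly to code. Below is a minimal PyTorch sketch comparing a standard convolution (multiply-accumulate) with an adder-style output built from negative L1 distances; shapes are illustrative and the paper's gradient and normalization tricks are omitted.

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 8, 8)             # input feature map
    w = torch.randn(16, 3, 3, 3)            # 16 filters of size 3x3x3

    # Convolution: Y[m,n,t] = sum_{i,j,k} X[m+i, n+j, k] * F[i, j, k, t]
    y_conv = F.conv2d(x, w)

    # AdderNet:    Y[m,n,t] = -sum_{i,j,k} |X[m+i, n+j, k] - F[i, j, k, t]|
    patches = F.unfold(x, kernel_size=3)             # (1, 27, 36) input patches
    wf = w.view(16, -1)                              # (16, 27) flattened filters
    y_add = -(patches.unsqueeze(1) - wf.unsqueeze(0).unsqueeze(-1)).abs().sum(2)
    y_add = y_add.view(1, 16, 6, 6)                  # same spatial size as y_conv

    print(y_conv.shape, y_add.shape)                 # both (1, 16, 6, 6)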
Huawei HDC 2020: Real-time Video Style Transfer
https://developer.huaweicloud.com/exhibition/Atlas_neural_style.html
Inference time reduced from about 630ms to 60ms on the Huawei Atlas 200 AI accelerator module.
The key approaches used to complete this task:
1. Model distillation: remove the optical-flow module from the original network.
2. Filter pruning: reduce the computational complexity of the video generator.
3. Operator optimization: automatically select the suitable operators on the Atlas 200.
Discussions – Edge Computing
The four reasons to move deep learning workloads from the cloud down onto the device:
1. Privacy & security: your data cannot leave the premises where it is captured.
2. Latency: you need a real-time response, e.g. for a robotics workload or a self-driving car.
3. Reliability: the network up to the cloud might not always be reliable.
4. Cost: the channel used to send the data up to the cloud may be costly.
Running deep neural networks on different platforms:
  Mobile device: fast response, but small memory and limited energy resources.
  Server/Cloud: large memory and essentially free energy resources, but slow.
Thank You! Contact me: yunhe.wang@huawei.com, wangyunhe@pku.edu.cn http://www.wangyunhe.site