AI on the Edge — Discussion on the Gap Between Industry and Academia
Yunhe Wang, Huawei Noah’s Ark Lab
ABOUT ME
Yunhe Wang: enthusiastic PKUer, researcher, and programmer.
www.wangyunhe.site | yunhe.wang@huawei.com
Deep Model Compression [Han et al., NIPS 2015; Han et al., ICLR 2016 best paper award]
Restrictions for using AI on the edge:
• It is surprising to see that over 90% of the pre-trained parameters in AlexNet and VGGNet are redundant.
• Techniques from visual compression, e.g. quantization and Huffman encoding, transfer successfully to networks.
• Compressed networks can achieve the same performance as the original baselines after fine-tuning.
• However, this approach cannot directly obtain a considerable speed-up on mainstream hardware.
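To make the pruning and weight-sharing steps above concrete, here is a minimal NumPy sketch of magnitude pruning followed by k-means-style weight sharing; fine-tuning and Huffman coding are omitted, and the layer size, pruning ratio, and bit width are illustrative assumptions rather than values from the slide.

    # Minimal sketch of the Deep Compression pipeline [Han et al.]:
    # magnitude pruning + weight sharing (k-means quantization).
    # All shapes and thresholds below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.05, size=(256, 256))       # a dense layer's weights

    # 1) Magnitude pruning: zero out ~90% of the smallest-magnitude weights.
    threshold = np.quantile(np.abs(W), 0.90)
    mask = np.abs(W) > threshold
    W_pruned = W * mask

    # 2) Weight sharing: cluster surviving weights into 2^bits centroids,
    #    so each weight is stored as a small index into a codebook.
    bits = 4
    nonzero = W_pruned[mask]
    codebook = np.quantile(nonzero, np.linspace(0, 1, 2 ** bits))  # simple init
    for _ in range(10):                               # a few Lloyd iterations
        idx = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(len(codebook)):
            if np.any(idx == k):
                codebook[k] = nonzero[idx == k].mean()

    W_quant = np.zeros_like(W)
    W_quant[mask] = codebook[idx]

    print("kept weights:", mask.mean())               # ~10% of parameters remain
    print("quantization error:", np.abs(W_quant - W_pruned).max())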
CNNpack: Packing Convolutional Neural Networks in the Frequency Domain (NIPS 2016)
Pipeline: original filters -> DCT bases -> k-means clustering -> l1-shrinkage -> quantization -> Huffman encoding -> CSR storage; feature maps of a layer are computed as weighted combinations of DCT feature maps of the input data.
Compression results:
                               AlexNet    VGGNet-16    ResNet-50
  Compression ratio (r_c)      39x        46x          12x
  Speed-up ratio (r_s)         25x        9.4x         4.4x
  Top-1 error                  41.6%      29.7%        25.2%
  Top-5 error                  19.2%      10.4%        7.8%
[Figure: bar charts of memory (MB) and number of multiplications for AlexNet, VGGNet-16, and ResNet-50 before and after compression.]
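A rough sketch of the frequency-domain idea, keeping only the DCT and l1-shrinkage steps (k-means clustering, quantization, Huffman encoding, and CSR storage are omitted); the filter bank and shrinkage threshold are made up for illustration.

    # Represent convolution filters in the DCT frequency domain and discard
    # small coefficients (l1-shrinkage). A simplification of CNNpack.
    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(0)
    filters = rng.normal(0.0, 0.1, size=(64, 7, 7))        # 64 spatial filters

    compressed = []
    for f in filters:
        coeff = dctn(f, norm="ortho")                       # 2-D DCT of one filter
        thr = 0.02                                          # illustrative threshold
        compressed.append(np.where(np.abs(coeff) > thr, coeff, 0.0))

    compressed = np.stack(compressed)
    sparsity = (compressed == 0).mean()
    recon = np.stack([idctn(c, norm="ortho") for c in compressed])
    err = np.abs(recon - filters).mean()
    print(f"zeroed coefficients: {sparsity:.1%}, mean reconstruction error: {err:.4f}")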
Adversarial Learning of Portable Student Networks (AAAI 2018)
[Figure: input images are fed to both the teacher network and the student network; a discriminator (the teaching assistant) compares teacher features and student features in a shared feature space.]
We suggest developing a teaching-assistant (discriminator) network to identify the difference between the features generated by the student and the teacher networks:
\mathcal{L}_{GAN} = \frac{1}{n}\sum_{i=1}^{n} H\left(o_i^{S}, y_i\right) + \gamma \, \frac{1}{n}\sum_{i=1}^{n} \left[ \log D\left(z_i^{T}\right) + \log\left(1 - D\left(z_i^{S}\right)\right) \right]
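A minimal PyTorch sketch of this objective: cross-entropy on the student's predictions plus the adversarial term in which the discriminator (teaching assistant) separates teacher features from student features. The toy networks, the dummy batch, and the value of gamma are assumptions, and the min-max alternation between discriminator and student updates is not shown.

    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))       # frozen in practice
    student = nn.Sequential(nn.Flatten(), nn.Linear(784, 128))
    classifier = nn.Linear(128, 10)                                   # student head -> o^S
    discriminator = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())    # D(.)

    x = torch.randn(32, 1, 28, 28)           # dummy batch
    y = torch.randint(0, 10, (32,))
    gamma = 0.1                               # weight of the adversarial term (assumed)

    z_t = teacher(x).detach()                 # teacher features z^T
    z_s = student(x)                          # student features z^S
    o_s = classifier(z_s)

    ce = nn.functional.cross_entropy(o_s, y)                          # H(o^S, y)
    eps = 1e-6
    adv = (torch.log(discriminator(z_t) + eps) +
           torch.log(1 - discriminator(z_s) + eps)).mean()            # GAN term
    loss_gan = ce + gamma * adv
    print(loss_gan.item())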
Adversarial Learning of Portable Student Networks (AAAI 2018)
Visualization results of different networks trained on the MNIST dataset, where features of a specific category in each sub-figure are shown in the same color: (a) features of the original teacher network (accuracy = 99.2%); (b) features of the student network learned with the standard back-propagation strategy (accuracy = 97.2%); (c) features of the student network learned with the proposed method with a teaching assistant (accuracy = 99.1%).
Toward Evolutionary Compression (SIGKDD 2018)
An illustration of the evolution of LeNet on the MNIST dataset. Each dot represents an individual in the population, and the thirty best individuals are shown at each evolutionary iteration. The fitness of individuals gradually improves as the number of iterations increases, implying that the network becomes more compact while retaining the same accuracy.
[Figure legend: original filters, remaining filters, retrained filters.]
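A toy sketch of the evolutionary search, assuming each individual is a binary mask over a layer's filters and fitness trades a stand-in accuracy term against compression; the real method evaluates the pruned network on data, so the importance scores and weights here are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_filters, pop_size, n_iters = 64, 30, 50
    importance = rng.random(n_filters)              # stand-in for filter importance

    def fitness(mask):
        acc_proxy = importance[mask.astype(bool)].sum() / importance.sum()
        compression = 1.0 - mask.mean()             # fraction of filters removed
        return acc_proxy + 0.5 * compression        # weighted combination (assumed)

    population = rng.integers(0, 2, size=(pop_size, n_filters))
    for it in range(n_iters):
        scores = np.array([fitness(ind) for ind in population])
        parents = population[np.argsort(scores)[-pop_size // 2:]]    # keep the best half
        children = []                                # crossover + bit-flip mutation
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_filters)
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_filters) < 0.02
            children.append(np.where(flip, 1 - child, child))
        population = np.vstack([parents, children])

    best = population[np.argmax([fitness(ind) for ind in population])]
    print("filters kept:", int(best.sum()), "of", n_filters)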
Co-Evolutionary Compression for GANs (ICCV 2019)
The two generators in CycleGAN are compressed simultaneously: two populations of pruned generators (Population A and Population B) are co-evolved over iterations 1, 2, …, T.
[Figures: evolution of Population A and Population B across iterations; qualitative comparison of input, baseline, ThiNet, and ours; statistics of the compressed generators.]
Latency on the P30 Pro: 6.8s -> 2.1s.
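A toy sketch of the co-evolution idea under the same stand-in fitness as the previous slide: two populations of pruning masks, one per generator, are evolved together, and an individual's fitness is coupled to the best partner in the other population to mimic the cycle-consistency coupling; all quantities are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n_filters, pop = 32, 20
    imp_a, imp_b = rng.random(n_filters), rng.random(n_filters)   # stand-in importances

    def quality(mask, importance):
        return importance[mask.astype(bool)].sum() / importance.sum()

    def fitness(mask, importance, partner_masks, partner_importance):
        # reward own quality and compression, coupled with the best partner's quality
        best_partner = max(quality(p, partner_importance) for p in partner_masks)
        return quality(mask, importance) * best_partner + 0.5 * (1 - mask.mean())

    pop_a = rng.integers(0, 2, size=(pop, n_filters))
    pop_b = rng.integers(0, 2, size=(pop, n_filters))
    for _ in range(30):
        for P, imp, Q, imp_q in ((pop_a, imp_a, pop_b, imp_b),
                                 (pop_b, imp_b, pop_a, imp_a)):
            scores = [fitness(m, imp, Q, imp_q) for m in P]
            order = np.argsort(scores)[::-1]
            survivors = P[order[: pop // 2]]
            mutants = np.where(rng.random(survivors.shape) < 0.05,
                               1 - survivors, survivors)
            P[:] = np.vstack([survivors, mutants])   # update population in place

    print("generator A filters kept:", int(pop_a[0].sum()), "of", n_filters)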
DAFL: Data-Free Learning of Student Networks (ICCV 2019)
How can we provide a perfect model-optimization service on the cloud?
Privacy-related AI applications: FaceID, voice assistant, fingerprint, entertainment apps.
A generator is introduced to approximate the training data: random signals are fed into a generative network, and its outputs are used to distill the teacher network into a student network.
[Figure: original and generated face images; distillation pipeline from random signals through the generative network, teacher network, and student network.]
Results: 98.20% on MNIST, 92.22% on CIFAR-10, 74.47% on CIFAR-100.
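A compressed PyTorch sketch of the data-free setup: a generator maps random signals to pseudo-images, the fixed teacher labels them, and the student is distilled on those soft labels. The tiny MLPs and the single entropy-based generator loss are simplifications; DAFL's full objective (one-hot, activation, and information-entropy terms) is not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))     # pretrained in practice
    student = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
    generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh())     # noise -> pseudo-image

    opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

    for step in range(100):
        z = torch.randn(64, 100)                                  # random signals
        fake = generator(z).view(64, 1, 28, 28)

        with torch.no_grad():
            t_logits = teacher(fake)

        # generator: encourage confident (one-hot-like) teacher predictions,
        # a stand-in for DAFL's one-hot loss
        t_prob = F.softmax(teacher(fake), dim=1)
        g_loss = -(t_prob * torch.log(t_prob + 1e-6)).sum(1).mean()
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

        # student: match the teacher's soft outputs on the generated images
        s_loss = F.kl_div(F.log_softmax(student(fake.detach()), dim=1),
                          F.softmax(t_logits, dim=1), reduction="batchmean")
        opt_s.zero_grad(); s_loss.backward(); opt_s.step()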
AdderNet: Do We Really Need Multiplications in Deep Learning? (CVPR 2020)
[Figures: feature visualization on MNIST for the adder network vs. the convolutional network; validations on ImageNet.]
Using additions instead of multiplications in deep learning can significantly reduce the energy consumption and area cost of chips:
https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf
http://eecs.oregonstate.edu/research/vlsi/teaching/ECE471_WIN15/mark_horowitz_ISSCC_2014.pdf
http://eyeriss.mit.edu/2019_neurips_tutorial.pdf
Feature calculation in a convolutional neural network: Y(m,n,t) = \sum_{i}\sum_{j}\sum_{k} X(m+i, n+j, k) \times F(i, j, k, t)
Feature calculation in an adder neural network: Y(m,n,t) = -\sum_{i}\sum_{j}\sum_{k} \left| X(m+i, n+j, k) - F(i, j, k, t) \right|
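The two formulas above map directly to code. Below is a minimal PyTorch sketch comparing a standard convolution (multiply-accumulate) with an adder-style output built from negative L1 distances; shapes are illustrative and the paper's gradient and normalization tricks are omitted.

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 8, 8)             # input feature map
    w = torch.randn(16, 3, 3, 3)            # 16 filters of size 3x3x3

    # Convolution: Y[m,n,t] = sum_{i,j,k} X[m+i, n+j, k] * F[i, j, k, t]
    y_conv = F.conv2d(x, w)

    # AdderNet:    Y[m,n,t] = -sum_{i,j,k} |X[m+i, n+j, k] - F[i, j, k, t]|
    patches = F.unfold(x, kernel_size=3)             # (1, 27, 36) input patches
    wf = w.view(16, -1)                              # (16, 27) flattened filters
    y_add = -(patches.unsqueeze(1) - wf.unsqueeze(0).unsqueeze(-1)).abs().sum(2)
    y_add = y_add.view(1, 16, 6, 6)                  # same spatial size as y_conv

    print(y_conv.shape, y_add.shape)                 # both (1, 16, 6, 6)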
Huawei HDC 2020: Real-time Video Style Transfer
https://developer.huaweicloud.com/exhibition/Atlas_neural_style.html
Inference time reduced from about 630ms to 60ms on the Huawei Atlas 200 AI accelerator module.
The key approaches used to complete this task:
1. Model distillation: remove the optical-flow module from the original network.
2. Filter pruning: reduce the computational complexity of the video generator.
3. Operator optimization: automatically select the suitable operators on the Atlas 200.
Discussions – Edge Computing
The four reasons to move deep learning workloads from the cloud down onto the device:
1. Privacy & security: your data cannot leave the premises where it is captured.
2. Latency: you need a real-time response, e.g. for a robotics workload or a self-driving car.
3. Reliability: the network up to the cloud might not always be reliable.
4. Cost: the channel used to send the data up to the cloud may be costly.
Running deep neural networks on different platforms:
  Mobile device: fast response, but small memory and limited energy resources.
  Server/Cloud: large memory and essentially free energy resources, but slow.
Thank You! Contact me: yunhe.wang@huawei.com, wangyunhe@pku.edu.cn http://www.wangyunhe.site