XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips


SLIDE 1

XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips

2019 BenchCouncil International Artificial Intelligence System Challenges

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences

Team XiaoDianNao: Guangli Li, Xueying Wang, Xiu Ma
Advisor: Prof. Xiaobing Feng

SLIDE 2

International Artificial Intelligence System Challenges XiaoDianNao (XDN)

XiaoDianNao Team

Team Members:

  • Guangli Li is a Ph.D. student at ICT, CAS. His research interests include Programming Systems and Deep Learning.
  • Xueying Wang is a Ph.D. student at ICT, CAS. Her research interests include Parallel Programming and Deep Learning.
  • Xiu Ma is a Ph.D. student at Jilin University and a visiting student at ICT, CAS. Her research interests include Programming Languages and Compilers.

Advisor:

  • Prof. Xiaobing Feng, Institute of Computing Technology, Chinese Academy of Sciences

SLIDE 3

Track 2

International AI System Challenge based on Cambricon Chip

Subject:

Implementation and optimization of a convolutional neural network based image classification task on Cambricon AI Chips (MLUs).

Metrics:

Maximize the execution performance and minimize the prediction time (wall clock time) on provided test data.

  • Provided model: ResNet-50 (ACC: 84.39%)
  • Dataset: CIFAR-10
  • Experiment Platform: BenchCouncil Testbed
  • Hardware: Cambricon Artificial Intelligence Chips (MLUs)
  • Provided Development Tools: Cambricon Caffe
  • Challenge period: 18 Sept. to 14 Oct. (about 4 weeks)

SLIDE 4

Contents

  • I. Optimization Methodology
  • II. Implementation of the XDN Engine
  • III. Experimental Results
  • IV. Conclusion and Future Work

SLIDE 5

Methodology

[Figure: ResNet building blocks (input, conv, conv, sum) shown at successive optimization stages, with conv, bn, scale and ReLU layers and fused conv+ layers. Steps marked in the figure: 1 pre-fusion of layers, 2 fusion-guided pruning, 3 post-fusion of layers.]

  • Aggressive Fusion Strategy
  • Fusion-Guided Pruning Method
  • Traditional Fusion Strategy
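
The arithmetic behind fusing a conv layer with the bn and scale layers that follow it can be illustrated with a minimal sketch. It assumes the usual Caffe split, where BatchNorm normalizes with a stored mean and variance and Scale applies a per-channel gamma and beta. The function names, the flat-filter layout, the eps value, and the toy numbers are illustrative assumptions, not taken from the XDN code.

```python
import math

def fold_bn_scale_into_conv(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Return conv weights and biases that already include bn + scale.

    weights: one flat list of filter coefficients per output channel.
    The other arguments hold one value per output channel.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, m, v, g, bt in zip(weights, bias, mean, var, gamma, beta):
        s = g / math.sqrt(v + eps)             # per-channel multiplier
        fused_w.append([s * w for w in w_c])   # rescale the whole filter
        fused_b.append(s * (b_c - m) + bt)     # fold mean and beta into the bias
    return fused_w, fused_b

def conv_outputs(weights, bias, patch):
    """Conv output of every channel for one flattened input patch."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b_c
            for w_c, b_c in zip(weights, bias)]

def bn_scale(values, mean, var, gamma, beta, eps=1e-5):
    """Reference bn + scale applied channel by channel."""
    return [g * (y - m) / math.sqrt(v + eps) + bt
            for y, m, v, g, bt in zip(values, mean, var, gamma, beta)]

# Toy numbers (made up for illustration): 2 output channels, 3 inputs per patch.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
B = [0.1, -0.2]
MEAN, VAR = [0.3, -0.1], [4.0, 0.25]
GAMMA, BETA = [1.2, 0.8], [0.05, -0.4]
PATCH = [1.0, 2.0, -1.0]

unfused = bn_scale(conv_outputs(W, B, PATCH), MEAN, VAR, GAMMA, BETA)
fused = conv_outputs(*fold_bn_scale_into_conv(W, B, MEAN, VAR, GAMMA, BETA), PATCH)
```

Both paths give the same values up to floating-point rounding, so the fused layer replaces three layers with one. ReLU is not linear, so this sketch leaves it out.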

SLIDE 6

XDN (XiaoDianNao): An Efficient Optimization and Inference Engine

  • I. Pruning Optimizer and Trainer (aggressive fusion and pruning)
  • II. Fusion Optimizer (traditional fusion)
  • III. Auto-Tuner
  • IV. Data Preprocessor and Executor
  • V. Evaluator

SLIDE 7

Implementation of XDN

  • Pruning and Fusion

[Figure: pipeline diagram. Labels: Original Model (Caffe Model, Caffe Prototxt), Pruning Optimizer, Cambricon Caffe (Trainer), Fusion Optimizer.]

SLIDE 8

Implementation of XDN

  • Pruning and Fusion

[Figure: pipeline diagram. Labels: Pruning Optimizer, Cambricon Caffe (Trainer), Fusion Optimizer, Optimized Model (Caffe Model, Caffe Prototxt).]
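
The slides do not state which criterion the Pruning Optimizer uses to select filters. As an illustration only, the sketch below assumes the common L1-norm ranking: filters with the smallest sum of absolute weights are dropped, and the matching input channels of the next layer are dropped with them. All function names and the toy weights are hypothetical.

```python
def l1_norm(filt):
    """Sum of absolute weights of one filter."""
    return sum(abs(w) for w in filt)

def select_filters(weights, keep_ratio):
    """Indices of the filters to keep, chosen by largest L1 norm (assumed criterion)."""
    n_keep = max(1, int(len(weights) * keep_ratio))
    ranked = sorted(range(len(weights)),
                    key=lambda i: l1_norm(weights[i]), reverse=True)
    return sorted(ranked[:n_keep])   # restore the original channel order

def prune_layer(weights, bias, kept):
    """Keep only the selected output filters and their biases."""
    return [weights[i] for i in kept], [bias[i] for i in kept]

def prune_next_layer_inputs(next_weights, kept):
    """next_weights[o][c] is the kernel of output o for input channel c."""
    return [[per_input[c] for c in kept] for per_input in next_weights]

# Toy layer with 4 filters of 2 coefficients each (made-up values).
weights = [[0.1, -0.1], [2.0, 1.0], [0.0, 0.05], [-1.5, 0.5]]
bias = [0.0, 0.1, 0.2, 0.3]
next_weights = [[10, 11, 12, 13], [20, 21, 22, 23]]

kept = select_filters(weights, keep_ratio=0.5)
pruned_w, pruned_b = prune_layer(weights, bias, kept)
pruned_next = prune_next_layer_inputs(next_weights, kept)
```

A pruned model normally loses some accuracy and is fine-tuned afterwards, which is presumably the role of the Trainer in the pipeline above.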

SLIDE 9

Implementation of XDN

  • Data Preprocessing

Original Executor: On-line Data Preprocess

[Figure: CIFAR-10 Images are preprocessed with OpenCV at run time and then passed to Inference.]

SLIDE 10

Implementation of XDN

  • Data Preprocessing

Off-line Data Preprocess & Efficient Memory-mapped File

[Figure: CIFAR-10 Images are preprocessed off-line with OpenCV into a Memory-mapped Binary File, which Inference reads through mmap() (very fast).]
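
The off-line preprocessing idea can be sketched with the Python standard library. This is an assumption-laden illustration rather than the XDN implementation: the file layout (raw float32 values, one image after another), the function names, and the constant are hypothetical, and the OpenCV step is replaced by already-preprocessed lists of numbers.

```python
import array
import mmap

VALUES_PER_IMAGE = 3 * 32 * 32   # CIFAR-10 image: 3 channels of 32x32 pixels

def write_preprocessed(path, images):
    """Off-line step: store every preprocessed image as raw float32 values."""
    with open(path, "wb") as f:
        for img in images:
            if len(img) != VALUES_PER_IMAGE:
                raise ValueError("unexpected image size")
            array.array("f", img).tofile(f)

def open_mapped(path):
    """Inference step: map the file and expose it as a read-only float32 view."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapped, memoryview(mapped).cast("f")

def image_at(view, index):
    """Zero-copy slice holding the values of one image."""
    start = index * VALUES_PER_IMAGE
    return view[start:start + VALUES_PER_IMAGE]
```

With this layout the executor no longer decodes or normalizes images at run time; it only indexes into the mapping, and the operating system pages the data in on demand. Views created from the mapping should be released before the mapping is closed.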

SLIDE 11

Implementation of XDN

  • Auto-Tuning of Hyper-Parameters: A Grid Search Approach

  • Search Space: thread, data parallel, model parallel
  • Score Function
  • Components: Auto-Tuner, Executor, Evaluator
  • Goal: finding the best combination of hyper-parameters
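
A grid search over such a search space can be sketched as follows. The hyper-parameter names mirror the labels on the slide, but the candidate values and the `toy_time_ms` cost model are made up; in a real run the score would come from executing the model and measuring it.

```python
import itertools

def grid_search(search_space, score):
    """Try every combination in search_space and return the best one.

    search_space maps a hyper-parameter name to its candidate values.
    score maps a configuration dict to a number; higher is better.
    """
    names = list(search_space)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        cfg = dict(zip(names, values))
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

def toy_time_ms(cfg):
    """Hypothetical stand-in for running the Executor and timing it."""
    return (1200.0 / (cfg["thread"] * cfg["data_parallel"])
            + 40.0 * cfg["model_parallel"])

SPACE = {
    "thread": [1, 2, 4],
    "data_parallel": [1, 2, 4],
    "model_parallel": [1, 2, 4],
}

# Shorter time means higher score, so the score is the negated time.
best_cfg, best_score = grid_search(SPACE, lambda cfg: -toy_time_ms(cfg))
```

Grid search is exhaustive, so its cost grows with the product of the candidate counts; it is practical here because the space has only a few dimensions. A realistic score function would also reject configurations whose accuracy drops.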

SLIDE 12

Final Results

Performance of ResNet-50 with XDN Engine

Optimization                  Total Time (ms)   Accuracy
Original
  Baseline                    2263              0.8439
Filter Pruning
  OPT-A-1                     505               0.8439
  OPT-B-1                     449               0.8462
  OPT-C-1                     428               0.8191
Pruning + Conv&BN Fusion
  OPT-A-2                     485               0.8441
  OPT-B-2                     304               0.8455
  OPT-C-2                     297               0.8203

7.44x Speedup

Best Result: OPT-B-2, TIME=304ms, ACC=84.55%
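
The reported speedup follows directly from the table: 2263 ms / 304 ms ≈ 7.44. The short sketch below reproduces that arithmetic and the choice of OPT-B-2, assuming the selection rule is "fastest variant whose accuracy is not below the baseline"; the rule and the helper names are an interpretation of the slide, not something it states.

```python
# (name, total time in ms, accuracy) copied from the results table.
BASELINE = ("Baseline", 2263, 0.8439)
VARIANTS = [
    ("OPT-A-1", 505, 0.8439),
    ("OPT-B-1", 449, 0.8462),
    ("OPT-C-1", 428, 0.8191),
    ("OPT-A-2", 485, 0.8441),
    ("OPT-B-2", 304, 0.8455),
    ("OPT-C-2", 297, 0.8203),
]

def speedup(baseline_ms, time_ms):
    """How many times faster a variant is than the baseline."""
    return baseline_ms / time_ms

def best_without_accuracy_loss(baseline, variants):
    """Fastest variant whose accuracy is at least the baseline accuracy."""
    base_acc = baseline[2]
    eligible = [v for v in variants if v[2] >= base_acc]
    return min(eligible, key=lambda v: v[1])

best = best_without_accuracy_loss(BASELINE, VARIANTS)
best_speedup = speedup(BASELINE[1], best[1])
```

OPT-C-2 is slightly faster (297 ms) but falls to 82.03% accuracy, which is why it is excluded under this rule.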

SLIDE 13

Performance Analysis

Performance of OPT-B-2

Optimizations of XDN (applied cumulatively)   Total Time (ms)
None (baseline)                               2263
+ Pruning                                     1291
+ Fusion                                      833
+ Data Preprocess                             746
+ Auto-Tuning                                 304

Each part contributes to the final performance.

SLIDE 14

Artifact Description

BenchHub: http://125.39.136.212:8090/liguangli/bench19_xiaodiannao

  • Detailed XDN Documentation (English and Chinese)
  • Code for Reproducible Experiments

Step-by-Step

SLIDE 15

Conclusion and Future Work

Highlight Contributions:

  • 1. We proposed an optimization methodology
      • Characteristics of Cambricon Chips
      • Fusion-Guided Pruning Method
  • 2. We implemented the efficient XDN engine
      • Pruner, Trainer, Optimizer, Auto-Tuner, Executor…
  • 3. We evaluated the performance
      • Achieves a high speedup (7.44x) without accuracy loss

In the future, we plan to:

  • test the method on large-scale datasets, such as ImageNet;
  • extend the XDN engine to support more DNN models;
  • test the method on other AI chips.

SLIDE 16

Thank You

XDN: Towards Efficient Inference of Residual Neural Networks on Cambricon Chips

liguangli@ict.ac.cn