NNBench-X: A Benchmarking Methodology for Neural Network Accelerator Designs
Xinfeng Xie, Xing Hu, Peng Gu, Shuangchen Li, Yu Ji, and Yuan Xie
Scalable and Energy-Efficient Architecture Lab (SEAL)
University of California, Santa Barbara
02/17/2019
Outline
• Background & Motivation
  • NN Benchmarks for Accelerators: Why, What?
• Benchmark Method
• NN Workload Characterization
  • Case Study: TensorFlow Model Zoo
• SW-HW Co-design Evaluation
  • Case Study: Neurocube, DianNao, and Cambricon-X
• Conclusion & Future Work
NN Benchmark: Why?
• NN accelerators have attracted a lot of attention.
  • How good are existing accelerators?
  • How to design a better one?
[Figure: diverse accelerator designs — TPU-v1 (systolic array, MXU), GPU-Volta (sea of small cores), DeePhi (sparse), DaDianNao (tile-based architecture with HBM/GDDR5 memory)]
• We need a benchmark suite with diverse and representative workloads for evaluating accelerators and providing design guidelines.
NN Benchmark: What?
• 3Vs in NN models:
  • Volume: a large number of NN models
  • Velocity: rapid growth of that volume
  • Variety: diverse NN architectures
[Figure: model count growing from AlexNet to 856 models by 2016; the Inception module as the building block of GoogLeNet]
• A benchmark suite needs to select representative NN models and keep the suite up to date.
NN Benchmark: What?
• SW-HW co-design: model compression + hardware design
  • Pruning: prune out insignificant weights
  • Quantization: use fewer bits for data representation
[Figure: an original model compressed into a pruned model (run on EIE) and an INT8 quantized model (run on TPU-v1)]
NN Benchmark: What?
• SW-HW co-design: model compression + hardware design
  • Pruning: prune out insignificant weights
  • Quantization: use fewer bits for data representation
• How can a benchmark include such techniques to evaluate SW-HW co-designs?
• A benchmark suite needs to cover SW-HW co-designs for NN accelerators. (The sketch below illustrates the two compression techniques.)
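To make the two compression techniques concrete, here is a minimal NumPy sketch of magnitude pruning and symmetric per-tensor INT8 quantization. This is an illustrative sketch, not the specific compression pipeline used in NNBench-X; the function names and the per-tensor scaling scheme are assumptions.

```python
import numpy as np

def prune(w, sparsity=0.9):
    # Magnitude pruning (illustrative): zero out the smallest weights
    # until the requested sparsity level is reached.
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w):
    # Symmetric per-tensor INT8 quantization (an assumed scheme):
    # map float weights onto the integer range [-127, 127].
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)
w_pruned = prune(w, sparsity=0.9)
q, scale = quantize_int8(w)
print(f"sparsity: {(w_pruned == 0).mean():.2%}, "
      f"max dequantization error: {np.abs(q.astype(np.float32) * scale - w).max():.4f}")
```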
NN Benchmark: Related Work
• We need a new NN benchmark for accelerators!

Project Name | Platform       | Phase                | App Selection | SW-HW Co-design
Fathom       | CPU/GPU        | Training + Inference | Empirical     | ✖
BenchIP      | Accelerator    | Inference            | Empirical     | ✖
MLPerf       | Cloud + Mobile | Training + Inference | Empirical     | ✖
NNBench-X    | Accelerator    | Inference            | Quantitative  | ☑
Benchmark Method
• Overall idea: both SW and HW designs are inputs.
• Pipeline:
  • Application candidate pool → application feature extraction + similarity analysis → application set
  • Application set + model compression methods → benchmark-suite generation → benchmark suite
  • Benchmark suite + hardware designs → hardware evaluation → PPA (power, performance, area) results
NN Workload Characterization
• Application features for NN applications
• Two-level analysis: operator-level and application-level
  • Operator-level: collect the operators of all applications into an operator pool and group them into operator clusters.
  • Application-level: the application feature is the time breakdown across the different operator clusters.
[Figure: operators from App1 and App2 gathered into an operator pool and grouped into operator clusters 1 and 2]
Operator Feature
• Operator features:
  • Locality: #data / #comps
  • Parallelism: the ratio of #comps that can be parallelized
• Example: element-wise add C = A + B
  • #data: sizeof(A) + sizeof(B) + sizeof(C)
  • #comps: length(A) scalar add operations
  • Locality: #data / #comps
  • Parallelism: 100% (every scalar add is independent; see the sketch below)
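The two features can be computed directly from the definitions above. Below is a minimal sketch for the element-wise add example; the function name and the bytes-per-operation unit for locality are illustrative choices, not part of the original methodology.

```python
import numpy as np

def elementwise_add_features(shape, dtype=np.float32):
    # Features for C = A + B, following the definitions above.
    n = int(np.prod(shape))
    bytes_per_elem = np.dtype(dtype).itemsize
    num_data = 3 * n * bytes_per_elem  # sizeof(A) + sizeof(B) + sizeof(C)
    num_comps = n                      # one scalar add per element
    locality = num_data / num_comps    # bytes touched per scalar operation
    parallelism = 1.0                  # all scalar adds are independent
    return locality, parallelism

loc, par = elementwise_add_features((1024, 1024))
print(f"locality = {loc:.1f} bytes/op, parallelism = {par:.0%}")
```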
Case Study: TensorFlow Model Zoo
• Up-to-date models from the machine learning community
  • Source code: https://github.com/tensorflow/models
• A wide range of application domains:
  • Computer vision (CV), natural language processing (NLP), informatics, etc.
  • 24 NN applications with 57 models
• Diverse neural network architectures and learning methods:
  • Convolutional neural networks (CNN), recurrent neural networks (RNN), etc.
  • Supervised learning, unsupervised learning, reinforcement learning, etc.
Workload Characterization (1/5)
• Observation #1: Convolution and matrix multiplication operators are similar to each other in terms of locality and parallelism features.
• Observation #2: Operators with the same functionality can exhibit very different locality and parallelism features.
(A clustering sketch over these two features follows below.)
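The methodology groups operators by their (locality, parallelism) features. As a rough illustration, the sketch below clusters a few hypothetical operators with k-means; the feature values, the log scaling of locality, and the choice of k-means itself are all assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (locality, parallelism) feature pairs for a small operator pool.
operators = {
    "conv2d":   (0.02, 1.0),   # low bytes/op, fully parallel
    "matmul":   (0.03, 1.0),
    "add":      (12.0, 1.0),   # bandwidth-heavy but fully parallel
    "softmax":  (8.0,  0.6),
    "gather":   (16.0, 0.3),
    "ctc_loss": (20.0, 0.1),   # largely sequential
}
names = list(operators)
# Log-scale locality since it spans orders of magnitude (an assumption).
feats = np.array([[np.log10(loc), par] for loc, par in operators.values()])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```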
Workload Characterization (2/5)
• Cluster 1: inferior parallelism
  • Hard to parallelize.
  • Bad news from Amdahl's Law.
• Cluster 2: moderate parallelism and locality
  • Benefits from parallelization and the cache hierarchy.
• Cluster 3: ample parallelism
  • Benefits from an increased amount of computation resources.
  • Memory bandwidth could be the bottleneck.
• Application feature: (R1, R2, R3), where R1, R2, and R3 are the fractions of time spent in operators from the three clusters, respectively. (See the sketch below.)
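A minimal sketch of how the application feature could be computed from a per-operator runtime profile; the profiling numbers and the cluster assignment are hypothetical, and cluster index i in the code corresponds to R(i+1).

```python
from collections import defaultdict

def application_feature(op_times, op_cluster, num_clusters=3):
    # Time breakdown over operator clusters: index 0 -> R1, 1 -> R2, 2 -> R3.
    cluster_time = defaultdict(float)
    for op, t in op_times.items():
        cluster_time[op_cluster[op]] += t
    total = sum(cluster_time.values())
    return tuple(cluster_time[c] / total for c in range(num_clusters))

# Hypothetical per-operator runtimes (ms) and cluster labels for one application.
op_times   = {"conv2d": 120.0, "matmul": 45.0, "add": 20.0, "gather": 15.0}
op_cluster = {"conv2d": 1, "matmul": 1, "add": 2, "gather": 0}  # 1 = cluster 2, etc.
print(application_feature(op_times, op_cluster))  # -> (R1, R2, R3)
```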
Workload Characterization (3/5)
• Observation #3: The bottleneck of an application is related to its application domain.
  • CV applications are bounded by R2 (mostly Conv and MatMul).
  • NLP applications are bounded by R3 (mostly element-wise operators).
Workload Characterization (4/5)
[Figure: application features on (a) CPU and (b) GPU]
• Observation #4: Applications on GPU have a larger R1 because the parallelizable parts are well accelerated (Amdahl's Law).
Workload Characterization (5/5)
• Select applications along the line R2 + R3 = 1.
[Table: brief descriptions of the ten applications in NNBench-X]
• See our recently published paper for more details:
  X. Xie, X. Hu, P. Gu, S. Li, Y. Ji, and Y. Xie, "NNBench-X: Benchmarking and Understanding Neural Network Workloads for Accelerator Designs," IEEE Computer Architecture Letters.
Benchmark Method
• After the first stage, we have obtained the application set.
[Figure: pipeline recap — candidate pool → feature extraction + similarity analysis → application set; application set + model compression methods → benchmark-suite generation → benchmark suite; benchmark suite + hardware designs → hardware evaluation → PPA results]
Benchmark-suite Generation
• Export a new computation graph according to the input model compression technique.
• Example: exporting a pruned model. The dense graph computes Y = WX + b with MatMul followed by BiasAdd; in the exported graph, the pruned weight matrix W is stored in a sparse format and MatMul is replaced by SpMV, still followed by BiasAdd. (See the sketch below.)
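A minimal SciPy sketch of this graph rewrite for a single layer: prune W, store it in CSR format, and replace the dense MatMul with an SpMV. The 90% sparsity level and the CSR choice are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)
b = rng.standard_normal(512).astype(np.float32)

# Prune 90% of the weights by magnitude, then export W in a sparse format.
W_pruned = np.where(np.abs(W) < np.quantile(np.abs(W), 0.9), 0.0, W)
W_sparse = csr_matrix(W_pruned)

# Original graph: MatMul + BiasAdd. Exported graph: SpMV + BiasAdd.
y_dense  = W_pruned @ x + b
y_sparse = W_sparse @ x + b  # SpMV on the sparse weight matrix
assert np.allclose(y_dense, y_sparse, atol=1e-4)
```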
Hardware Evaluation
• Operator-based simulation framework
[Figure: an application's operators scheduled onto an accelerator and a host connected by an interconnect, driven by hardware PPA models]
• Scheduling strategy (sketched below):
  • Schedule operators onto the accelerator.
  • Fallback: operators unsupported by the accelerator are scheduled onto the host.
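A minimal sketch of the fallback scheduling policy; per-operator latencies would come from the hardware PPA models. The numbers are hypothetical, and interconnect transfer costs are omitted for brevity.

```python
def schedule(ops, accel_time, host_time):
    # Run each operator on the accelerator if it is supported there;
    # otherwise fall back to the host.
    total = 0.0
    for op in ops:
        if op in accel_time:
            total += accel_time[op]
        else:
            total += host_time[op]  # fallback path
    return total

# Hypothetical per-operator latencies (ms) from PPA models.
accel_time = {"conv2d": 1.2, "matmul": 0.8}
host_time  = {"conv2d": 30.0, "matmul": 15.0, "gather": 2.5, "ctc_loss": 4.0}
ops = ["conv2d", "matmul", "gather", "ctc_loss"]
print(f"end-to-end latency: {schedule(ops, accel_time, host_time):.1f} ms")
```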
SW-HW Co-design Evaluation
• Evaluated hardware: GPU, Neurocube, DianNao, and Cambricon-X
• Case Study I: memory-centric vs. compute-centric designs
  • Evaluated hardware: GPU and Neurocube
• Case Study II: benefits of model compression
  • Solution I: DianNao + dense models
  • Solution II: Cambricon-X + sparse models (90% sparsity)
  • Solution III: Cambricon-X + sparse models (95% sparsity)
Compute-centric vs. Memory-centric
• Observation #5: The GPU benefits applications bounded by R2 because of its rich on-chip computation resources and scratchpad memory.
• Observation #6: Neurocube benefits applications bounded by R3 by providing large effective memory bandwidth.
[Figure: speedups on (a) GPU and (b) Neurocube; applications are listed along the x-axis in increasing R2 order (i.e., decreasing R3 order)]
Benefits of Model Compression
• Observation #7: Pruning weights helps CV and NLP applications differently.
  • Pruning weights helps CV applications significantly.
  • NLP applications are not as sensitive to weight sparsity as CV applications.
[Figure: performance of DianNao (0% weight sparsity), Cambricon-X (90% weight sparsity), and Cambricon-X (95% weight sparsity)]
Conclusion & Future Work
• Two main takeaways:
  • CV and NLP applications are very different from the perspective of NN accelerator design.
  • Conv and MatMul are not always the bottleneck of NN applications.
• Future work:
  • Hardware modeling in the early design stage of accelerators.
  • Other model compression techniques in addition to quantization and pruning.
  • Value-dependent behaviors in NN applications, such as graph convolutional networks (GCN).
Thank You! Q & A
Please contact the authors for further discussion.
E-mail: xinfeng@ucsb.edu, yuanxie@ucsb.edu