Early Experience in Benchmarking Edge AI Processors with Object Detection Workloads
Bench 2019
Yujie Hui (1), Jeffrey Lien (2), and Xiaoyi Lu (1)
(1) Department of Computer Science and Engineering, The Ohio State University, {hui.82, lu.932}@osu.edu
(2) NovuMind Inc., jlien@novumind.com

Overview
• Introduction
• Overview of Edge AI Processors
• Benchmarking Methodology
• Evaluation
• Conclusion

Edge Computing
[Figure: edge network with DATA and APP nodes]
• Store and process the data closer to the location where it is needed
• Deliver low latency to the end users

Artificial Intelligence at the Edge
[Figure: the Data, Features, Training, Evaluation, Inference pipeline, shown for the datacenter (e.g., GPU) and for edge devices]
• Inference is moving to the edge
❖ Heavy workloads in datacenters
❖ Less computationally demanding
❖ Low power consumption
❖ Low cost

Killer Applications for AI@Edge – Object Detection
[Figure: pie chart, Machine Learning Use Cases in Facebook; categories: Object Detection, Image Classification, Object Segmentation, RNN Translator, RNN ASR, Face ID, Recommendation]
• Object Detection:
❖ Higher resolution of input images
❖ Larger output tensors
❖ More complicated tasks
Wu et al., Machine Learning at Facebook: Understanding Inference at Edge, HPCA-2019
C. Wu, At-Scale Infrastructure Challenges for Machine Learning, IISWC-2019 (Invited Talk)

Object Detection Workloads - Demo
Real-life applications:
❖ Self-driving cars
❖ Tracking objects
❖ Face detection
❖ Pedestrian detection
❖ Medical imaging
❖ Robotics
Low-latency, high-accuracy inference needs high-performance edge devices!

Overview
• Introduction
• Overview of Edge AI Processors
  • Edge TPU
  • NVIDIA Xavier
  • NovuTensor
• Benchmarking Methodology
• Evaluation
• Conclusion

Edge AI Processors - EdgeTPU
• A single-board computer
• On-board Edge TPU coprocessor capable of performing 4 TOPS
• 1 GB LPDDR4 memory
• Precision: INT8
• Power: 2.5 watts
• Supports TensorFlow Lite models
https://coral.withgoogle.com/products/dev-board

Edge AI Processors - Xavier
• Volta GPU with 512 CUDA cores
• TOPS: 22.6/11.3/1.3
• 16GB LPDDR4X memory
• Precision: INT8/FP16/FP32
• Power: 10/15/30 watts
• Supports CUDA, cuDNN, TensorRT
https://developer.nvidia.com/embedded/jetson-agx-xavier-developer-kit

Edge AI Processors – NovuTensor
• Domain-specific architecture focused on 3D tensor computation
• 2GB DDR4 memory
• Precision: INT8
• TOPS: 15
• Power: 20 watts
• Supports PyTorch
[Figure: NovuTensor's 3D Operation [1]: tensor convolution of a Data Tensor with a Weight Tensor, producing an Output Tensor]
[1] https://patentscope.wipo.int/search/en/detail.jsf?docId=US225521272&tab=NATIONALBIBLIO

Challenges of Benchmarking Edge AI Processors
• Challenge-1: Workload Selection
❖ What are the representative models and datasets for benchmarking edge AI processors with object detection workloads?
• Challenge-2: Deployment
❖ How to deploy deep neural networks on edge devices, given that each edge device needs a specific framework?
• Challenge-3: Metrics and Dimensions
❖ How to select an essential set of metrics and dimensions to comprehensively evaluate edge AI devices?

Overview
• Introduction
• Overview of Edge AI Processors
• Benchmarking Methodology
  • Workload and Dataset Selection
  • Deployment Experience
  • Metrics and Dimensions Selection
• Evaluation
• Conclusion

Object Detection Workloads – YOLOv2
[Figure: Darknet-19, https://pjreddie.com/darknet/yolov2/]
• A real-time object detection system that reports which objects are seen
• Tiny-YOLO is a lite version of YOLOv2
• Based on the Darknet framework; can detect objects in an image or a video
• Darknet-19 neural network
YOLO9000: Better, Faster, Stronger. Joseph Redmon, Ali Farhadi

Object Detection Workloads – MS COCO
• 330K images (>200K labeled)
• 1.5 million object instances
• 80 object categories
[Figure: Microsoft COCO Dataset Examples]
❖ Images contain rich information, with many objects per image
❖ Large number of instances per category
http://cocodataset.org/#home
Microsoft COCO: Common Objects in Context. Lin et al.
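To make the "larger output tensors" point concrete for this workload, the sketch below estimates the size of the YOLOv2 detection tensor for the 80 COCO categories. The stride of 32 and the 5 anchor boxes per grid cell are assumptions taken from the commonly used YOLOv2 COCO configuration; they are not stated on the slides, and the function names are hypothetical.

```python
def yolov2_output_shape(input_size, num_classes=80, num_anchors=5, stride=32):
    """Return (grid_h, grid_w, channels) of the detection tensor.

    Assumes a square input, a total downsampling stride of 32, and
    5 anchors per cell (assumed YOLOv2 COCO settings, not from the slides).
    """
    if input_size % stride != 0:
        raise ValueError("input size must be a multiple of the stride")
    grid = input_size // stride
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    channels = num_anchors * (5 + num_classes)
    return grid, grid, channels


def num_values(shape):
    """Total number of scalars in a (height, width, channels) tensor."""
    height, width, channels = shape
    return height * width * channels


for size in (416, 1024):
    shape = yolov2_output_shape(size)
    print(size, shape, num_values(shape))
```

Under these assumptions a 416x416 input yields a 13x13x425 tensor, and a 1024x1024 input yields a 32x32x425 tensor, roughly six times as many values to post-process.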

Deployment Experience
• Retrain the model using ReLU activation function
• EdgeTPU
❖ Modify the weights of first convolutional layer
❖ DarkFlow [1]
❖ Post-Training Integer Quantization [2]
❖ EdgeTPU compiler
• Xavier
❖ NVIDIA's deepstream reference applications [3]
❖ TensorRT 5.0.3
❖ 15-watt and 30-watt modes
• NovuTensor
❖ NovuSDK
[1] https://github.com/thtrieu/darkflow
[2] https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-post-training-integer-quantization-b4964a1ea9ba
[3] https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps
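The post-training integer quantization step maps 32-bit floating-point values to 8-bit integers. The following is a minimal sketch of generic affine (scale and zero-point) quantization in plain Python, intended only to illustrate the idea; it is not the TensorFlow Lite or EdgeTPU compiler implementation, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Affine-quantize floats to int8 codes; return (codes, scale, zero_point)."""
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all inputs are zero; any scale works
    zero_point = round(-128 - lo / scale)
    codes = []
    for v in values:
        q = round(v / scale) + zero_point
        codes.append(max(-128, min(127, q)))  # clamp to the int8 range
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


weights = [-0.5, 0.0, 0.25, 1.5]  # illustrative values, not model weights
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
print(codes, scale, zero_point)
print([abs(a - b) for a, b in zip(weights, restored)])
```

The round-trip error per value is bounded by roughly one quantization step, which is the kind of small precision loss that lower-precision inference trades for speed and power.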

Metrics and Dimensions
• Accuracy: Mean Average Precision (mAP)
  $AP = \int_0^1 p(r)\,dr$
  $mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$
  where $p(r)$ is precision as a function of recall $r$ and $N$ is the number of object categories
• Latency (ms): execution time
❖ Preprocess
❖ Execution
❖ Postprocess
• Energy Efficiency (Images/sec/watt): number of input images that can be fully processed per unit of power
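The accuracy metric can be sketched numerically as follows. The slides do not say which interpolation scheme the evaluation uses, so this example assumes the common all-point interpolation of the precision-recall curve; the function names and the sample points are illustrative only.

```python
def average_precision(recalls, precisions):
    """Approximate AP, the area under the precision-recall curve.

    `recalls` must be sorted in increasing order and paired with
    `precisions`. Uses all-point interpolation (an assumption).
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Interpolate: precision at recall r becomes the max precision
    # observed at any recall >= r.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas over each recall interval.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Average the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap = average_precision([0.5, 1.0], [1.0, 0.5])  # made-up curve
print(ap)
print(mean_average_precision([ap, 0.25]))
```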

Overview
• Introduction
• Overview of Edge AI Processors
• Benchmarking Methodology
• Evaluation
• Conclusion

Accuracy Dimension
• Accuracy: Mean Average Precision (mAP)

Evaluation Results - Accuracy
[Figure: mAP of Tiny-YOLO and YOLOv2 on Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, and 1080Ti]
Performance running YOLOv2 and Tiny-YOLO with 416x416 input images
• The processors provide accurate results, with a 1% to 3% accuracy difference due to lower-precision arithmetic
• Accuracy degradation differs across devices because their quantization implementations differ

Latency Dimension
• Latency (ms): execution time
❖ Preprocess
❖ Execution
❖ Postprocess

Evaluation Results - Latency
[Figure: latency (ms) of Tiny-YOLO and YOLOv2 on Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, and 1080Ti]
Performance running YOLOv2 and Tiny-YOLO with 416x416 input images
❖ EdgeTPU is 9.5X and 14.79X slower than the GPU when running Tiny-YOLO and YOLOv2
❖ NovuTensor and Xavier are 4.66X - 6.08X slower than the GPU
❖ Xavier in the max power mode is 2X and 5.28X faster than EdgeTPU
❖ NovuTensor is 2.04X and 3.8X faster than EdgeTPU for YOLOv2 and Tiny-YOLO
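Latency figures of this kind split each inference into preprocess, execution, and postprocess stages. A minimal timing harness might look like the sketch below; the three stage callables are hypothetical stand-ins for device-specific code, and the warm-up count is an assumption rather than the procedure used in the study.

```python
import statistics
import time


def time_stages(preprocess, execute, postprocess, images, warmup=2):
    """Return the mean per-image latency in milliseconds for each stage.

    The first `warmup` images are run but excluded from the statistics.
    """
    if len(images) <= warmup:
        raise ValueError("need more images than warm-up iterations")
    samples = {"preprocess": [], "execution": [], "postprocess": []}
    for index, image in enumerate(images):
        t0 = time.perf_counter()
        tensor = preprocess(image)
        t1 = time.perf_counter()
        raw = execute(tensor)
        t2 = time.perf_counter()
        postprocess(raw)
        t3 = time.perf_counter()
        if index < warmup:
            continue  # skip warm-up runs
        samples["preprocess"].append(t1 - t0)
        samples["execution"].append(t2 - t1)
        samples["postprocess"].append(t3 - t2)
    return {name: statistics.mean(vals) * 1000.0 for name, vals in samples.items()}


# Dummy stages stand in for real resize, inference, and box-decoding code.
result = time_stages(
    preprocess=lambda img: [x / 255.0 for x in img],
    execute=lambda tensor: sum(tensor),
    postprocess=lambda raw: raw > 0,
    images=[[1, 2, 3]] * 10,
)
print(result)
```

A slowdown factor such as those listed above is then the ratio of one device's mean latency to another's.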

Energy Efficiency Dimension
• Energy Efficiency (Images/sec/watt): number of input images that can be fully processed per unit of power

Evaluation Results – Energy Efficiency
[Figure: energy efficiency (image/sec/watt) of Tiny-YOLO and YOLOv2 on Edge TPU, Xavier 15w, Xavier MAXW, NovuTensor, 1080Ti+TensorRT, and 1080Ti]
Performance running YOLOv2 and Tiny-YOLO with 416x416 input images
❖ All edge AI processors have higher energy efficiency than the GPU due to low power consumption
❖ EdgeTPU delivers 2.9X and 1.13X higher energy efficiency than Xavier, and 1.96X and 1.04X higher than NovuTensor
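The images/sec/watt metric follows directly from latency and power draw. The sketch below assumes one image is processed at a time and that power is constant during inference; the numbers are made up for illustration and are not measurements from this study.

```python
def energy_efficiency(latency_ms, power_watts):
    """Images processed per second per watt, for one image at a time."""
    if latency_ms <= 0 or power_watts <= 0:
        raise ValueError("latency and power must be positive")
    images_per_sec = 1000.0 / latency_ms
    return images_per_sec / power_watts


# Hypothetical devices: a slow low-power part and a fast high-power part.
low_power = energy_efficiency(latency_ms=40.0, power_watts=10.0)
high_power = energy_efficiency(latency_ms=10.0, power_watts=250.0)
print(low_power, high_power)
```

With these illustrative inputs the slower device is still the more efficient one, which is the pattern the bullets above describe for the edge processors.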

Evaluation Results – Large Images
[Figure: (a) Latency (ms), (b) Energy Efficiency (image/sec/watt)]
Performance running YOLOv2 and Tiny-YOLO with 1024x1024 input images
• Xavier in the 15-watt mode delivers the best energy efficiency
• 1080Ti using TensorRT has the lowest latency
