Adaptive Distributed Convolutional Neural Network Inference at the Network Edge with ADCNN
ICPP 2020, 17 August 2020
Sai Qian Zhang, Jieyu Lin, Qi Zhang
Executing DNN Inference Tasks for End Users
[Figure: Option 1 (edge only): user data (image, audio, video) processed on edge devices with limited computing capability. Option 2 (cloud only): user data sent to a cloud data center, incurring a large communication overhead.]
● Using edge devices to handle end-user data leads to long processing times, while using cloud servers to process end-user data incurs a large communication delay.
Motivation
● Edge devices
○ Resource-limited
○ Pervasive
● Adaptive Distributed Convolutional Neural Network (ADCNN)
○ We propose a framework for agile execution of Convolutional Neural Network (CNN) inference tasks on edge clusters
● Challenges
○ Reduce inference latency while preserving accuracy
○ Handle device heterogeneity and performance fluctuation
○ Remain applicable to different CNN models
Agenda ● Background ● CNN partitioning strategies ● ADCNN framework ● Modification on CNN architecture ● Evaluation ● Conclusion
CNN Background -- Convolutional Layer
[Figure: K weight filters of size 3x3x3 are convolved with 224x224x3 input feature maps (ifmaps) to produce K output feature maps (ofmaps).]
● The weight filters slide across the ifmaps; at each position, the dot product between the filter entries and the corresponding ifmap window is computed.
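A minimal NumPy sketch of this sliding-window computation (the `conv2d` helper and the small shapes below are illustrative, not from the paper):

```python
import numpy as np

def conv2d(ifmaps, filters):
    """Naive 'valid' convolution: each of the K filters slides across the
    input feature maps; the dot product at each position forms one ofmap."""
    C, H, W = ifmaps.shape           # input channels, height, width
    K, _, R, S = filters.shape       # filter count, channels, filter height/width
    out = np.zeros((K, H - R + 1, W - S + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(ifmaps[:, i:i+R, j:j+S] * filters[k])
    return out

ifmaps = np.random.rand(3, 8, 8)      # 3 ifmaps of size 8x8
filters = np.random.rand(4, 3, 3, 3)  # 4 filters of size 3x3x3
ofmaps = conv2d(ifmaps, filters)
print(ofmaps.shape)  # (4, 6, 6)
```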
Background -- CNN Workload Characteristics
[Figure: per-layer processing time for VGG16]
● The earlier convolutional layers take much longer to process than the later layers.
CNN Partitioning Strategies: Channelwise Partitioning
[Figure: the C input channels and the K filters are split in half (C/2, K/2) between device 1 and device 2.]
● In channelwise partitioning, the nodes must exchange their partially accumulated ofmaps to produce the final ofmaps, which may lead to significant communication overhead.
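The exchange-and-sum requirement can be seen in a small NumPy sketch; the two-device split and shapes below are illustrative:

```python
import numpy as np

def conv2d(ifmaps, filters):
    """Naive 'valid' convolution over all input channels."""
    C, H, W = ifmaps.shape
    K, _, R, S = filters.shape
    out = np.zeros((K, H - R + 1, W - S + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(ifmaps[:, i:i+R, j:j+S] * filters[k])
    return out

C = 4
ifmaps = np.random.rand(C, 8, 8)
filters = np.random.rand(2, C, 3, 3)

# Device 1 convolves the first C/2 input channels, device 2 the rest;
# each produces only a *partial* ofmap, so the partial sums must be
# exchanged and added to recover the true ofmaps.
partial1 = conv2d(ifmaps[:C//2], filters[:, :C//2])
partial2 = conv2d(ifmaps[C//2:], filters[:, C//2:])
combined = partial1 + partial2

assert np.allclose(combined, conv2d(ifmaps, filters))
```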
CNN Partitioning Strategies: Spatial Partitioning
[Figure: the ifmap is split into four tiles A, B, C, D; neighboring tiles exchange data halos.]
● In spatial partitioning, each tile must receive a data halo from its neighboring tiles in order to compute the correct result.
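A small sketch of the halo requirement, assuming a single-channel 'valid' convolution and a two-tile row split (the names and sizes are mine, not the paper's):

```python
import numpy as np

def conv2d(x, f):
    """Single-channel 'valid' convolution."""
    R = f.shape[0]
    H, W = x.shape
    out = np.zeros((H - R + 1, W - R + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+R, j:j+R] * f)
    return out

x = np.random.rand(8, 8)
f = np.random.rand(3, 3)
full = conv2d(x, f)  # reference: whole ifmap, 6x6 output

# Split rows 0..3 / 4..7 between two tiles. For a 3x3 filter each tile
# needs one halo row from its neighbor to compute its border outputs.
top = conv2d(x[0:5], f)  # own rows 0..3 + halo row 4 from the bottom tile
bot = conv2d(x[3:8], f)  # own rows 4..7 + halo row 3 from the top tile
assert np.allclose(np.vstack([top, bot]), full)
```

Without the halo rows, each tile could only produce the interior of its output, so the border results would be missing or wrong.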
Fully Decomposable Spatial Partition (FDSP)
[Figure: normal spatial partition vs. FDSP, where the border pixels of each tile are padded with zeros.]
● The cross-tile information transfer can be eliminated by padding the edge pixels of each tile with zeros.
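A minimal sketch of FDSP's zero-padding idea, assuming a single-channel 'same' convolution over four 4x4 tiles (an illustration, not the paper's implementation):

```python
import numpy as np

def conv2d_same(x, f):
    """'Same' convolution: zero-pad so the output matches the input size."""
    R = f.shape[0]
    p = R // 2
    xp = np.pad(x, p)
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i+R, j:j+R] * f)
    return out

x = np.random.rand(8, 8)
f = np.random.rand(3, 3)

# FDSP: each 4x4 tile is zero-padded and convolved independently --
# no halo exchange at all. Only the cross-tile border pixels deviate
# from the exact result, which progressive retraining compensates for.
tiles = [x[:4, :4], x[:4, 4:], x[4:, :4], x[4:, 4:]]
outs = [conv2d_same(t, f) for t in tiles]
approx = np.vstack([np.hstack(outs[:2]), np.hstack(outs[2:])])

exact = conv2d_same(x, f)
# The interior of each tile matches the exact convolution.
assert np.allclose(approx[1:3, 1:3], exact[1:3, 1:3])
```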
ADCNN Framework
[Figure: Step 1, the original CNN model is progressively retrained into the output CNN model; Step 2, input tiles are distributed across Conv nodes in the edge device cluster, and a Central node aggregates the results (e.g., classifying the input as "Dog").]
ADCNN Framework
[Figure: input tiles are processed by the Conv nodes in the edge device cluster; their intermediate results are gathered by the Central node.]
● The Conv nodes must transmit their intermediate results to the Central node, which may still cause significant communication overhead.
Modification on CNN Topology
[Figure: the Conv-node output is compressed in four steps: (1) apply clipped ReLU, (2) quantize, (3) unroll the neurons, (4) run-length encode, e.g. [1,0,2,0,1,0,0,0,0,0,0,0,2,0,0,0] becomes [1,1,2,1,1,7,2,3].]
● We modify the CNN model to reduce this communication overhead.
● We apply progressive retraining after adding these modifications to the CNN architecture.
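A sketch of this compression pipeline. The (nonzero value, following-zero-count) encoding below is my reading of the slide's example, and the clip threshold and quantization levels are assumptions:

```python
import numpy as np

def clipped_relu_quantize(acts, clip=3.0, levels=4):
    """Clipped ReLU followed by uniform quantization to small integers."""
    a = np.clip(acts, 0.0, clip)                       # clipped ReLU
    return np.round(a / clip * (levels - 1)).astype(int)

def run_length_encode(flat):
    """Encode the unrolled neurons as (value, count-of-following-zeros)
    pairs, matching the slide's example."""
    out, i = [], 0
    while i < len(flat):
        v = flat[i]
        i += 1
        zeros = 0
        while i < len(flat) and flat[i] == 0:
            zeros += 1
            i += 1
        out += [v, zeros]
    return out

flat = [1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]
print(run_length_encode(flat))  # [1, 1, 2, 1, 1, 7, 2, 3]
```

The clipped ReLU makes many activations exactly zero and bounds the rest, so the quantized output is sparse and compresses well under RLE.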
ADCNN Architecture
[Figure: the ADCNN pipeline: (1) the input is partitioned into tiles, each tagged with an image id (i_id), tile id (t_id), and node id (n_id); (2) the Central node collects statistics such as the number of received results; (3) Conv nodes return intermediate results, e.g. d:[-0.9,...,1.1], i_id:6, t_id:1, n_id:1; (4) the Central node performs the remaining layer computation and outputs the prediction, e.g. "Dog".]
● ADCNN takes advantage of the fine-grained, fully independent tiles generated by FDSP and adapts the tile assignment to dynamic conditions, achieving fine-grained load balancing across heterogeneous edge nodes.
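One way the collected statistics could drive tile assignment is a greedy least-finish-time scheduler. This is a hypothetical sketch of such load balancing, not the paper's actual policy:

```python
def assign_tiles(num_tiles, throughputs):
    """Hypothetical greedy scheduler: give each tile to the node whose
    estimated finish time (tiles assigned / observed throughput) would
    be lowest after taking it."""
    assignment = [[] for _ in throughputs]
    for t in range(num_tiles):
        n = min(range(len(throughputs)),
                key=lambda i: (len(assignment[i]) + 1) / throughputs[i])
        assignment[n].append(t)
    return assignment

# A node with twice the throughput receives roughly twice the tiles.
tiles = assign_tiles(12, [1.0, 1.0, 2.0])
print([len(a) for a in tiles])  # [3, 3, 6]
```

As node throughputs fluctuate, re-running the assignment shifts tiles away from slowed nodes, which is the fine-grained balancing behavior described above.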
Accuracy Evaluation
[Figure: accuracy results for VGG16 and a Fully Convolutional Network.]
● We evaluate different CNN models from different applications.
● Accuracy degradation is around 1% for an 8-by-8 FDSP of the input sample.
Inference Latency Comparison
● We implement the ADCNN system with nine identical Raspberry Pi devices emulating the edge devices: eight serve as Conv nodes, and the remaining one serves as the Central node.
● Baselines:
○ Single-device scheme
○ Remote-cloud scheme
● ADCNN decreases the average processing latency by 6.68x and 4.42x over these two baselines, respectively.
ADCNN Performance in a Dynamic Environment
[Figures: variation in inference latency; changes in tile assignment.]
● We adjust the CPU processing speed on four of the Conv nodes (nodes 5, 6, 7, 8) midway through processing 50 input images, and measure the impact on tile assignment and overall inference latency.
● ADCNN handles dynamic changes in node performance effectively.
Conclusion
● We introduce ADCNN, a distributed inference framework that jointly optimizes the CNN architecture and the computing system for better performance in dynamic network environments.
● ADCNN applies FDSP to partition the compute-intensive convolutional layers into many small, independent computational tasks that can be executed in parallel on separate edge devices.
● The ADCNN system takes advantage of the fine-grained, fully independent tiles generated by FDSP and adapts to dynamic conditions, achieving fine-grained load balancing across heterogeneous edge nodes.
● Compared to existing distributed CNN inference approaches, ADCNN provides up to 2.8x lower latency while achieving competitive inference accuracy, and it quickly adapts to variations in edge device performance.