

  1. Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge
 by Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars and L. Tang
 Stefanos Laskaridis (sl829@cam.ac.uk)
 R244: Large-Scale Data Processing and Optimisation

  2. Summary
 a. Status quo approach
 b. Mobile-only approach
 c. Neurosurgeon approach
 Image taken from [1]

  3. Status Quo

  4. Status Quo
 • Deep Neural Networks power “intelligent” applications: Apple Siri, Google Now, Microsoft Cortana
 • DNN applications are mostly offloaded to powerful private or public clouds for computation: Computer Vision, Natural Language Processing, Speech Recognition
 • Large volumes of data transfer cause latency and energy consumption.
 • However, SoC advancements urged the authors to revisit the problem.

  5. The Mobile Edge

  6. Experiment Setup
 Power consumption: Watts Up? meter
 Software:
 • Deep Learning: Caffe
 • mCPU: OpenBLAS
 • GPU: cuDNN
 Server Platform:
 • 4U Intel Dual CPU Chassis, 8x PCIe 3.0 x16 slots
 • 2x Intel Xeon E5-2620, 2.1 GHz
 • 16x 16GB DDR3 1866MHz ECC
 • 1TB HDD
 • NVIDIA Tesla K40 M-class, 12GB, PCIe
 Mobile Platform:
 • Tegra K1 SoC
 • 4+1 quad-core ARM Cortex-A15 CPU
 • 2GB DDR3L 933MHz
 • NVIDIA Kepler GPU with 192 CUDA cores

  7. Testing the Mobile Edge
 • Experiment: running a 152KB image through AlexNet [3]
 • Measuring:
   • Communication latency: 3G, LTE, WiFi
   • Computation latency: mCPU, mGPU, cloud GPU
   • End-to-end latency
   • Energy consumption
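
 To make the communication/computation trade-off concrete, here is a back-of-the-envelope Python sketch of cloud-only latency for this experiment. The uplink rates and server compute time below are illustrative assumptions, not the paper's measurements.

    # Back-of-the-envelope cloud-only latency: upload the input image, then
    # compute remotely. All bandwidth/compute numbers below are assumed for
    # illustration, not taken from the paper.

    INPUT_KB = 152  # the AlexNet input image used in the experiment

    uplink_mbps = {"3G": 1.0, "LTE": 6.0, "WiFi": 19.0}  # assumed uplink rates
    server_gpu_ms = 10.0                                 # assumed forward-pass time

    for net, mbps in uplink_mbps.items():
        upload_ms = INPUT_KB * 8 / (mbps * 1000) * 1000  # KB -> kilobits -> ms
        print(f"{net:4s}: upload {upload_ms:6.0f} ms + compute {server_gpu_ms:.0f} ms"
              f" = {upload_ms + server_gpu_ms:6.0f} ms")

 Even with these rough numbers, the upload dominates on 3G, matching the slide's finding that transmission is the dominating cost.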

  8. Testing the Mobile Edge
 • Transmission has the dominating cost.
 • More power, but in shorter bursts.
 Images taken from [1]

  9. Neurosurgeon: Partitioning between Cloud and Mobile

  10. CNN building blocks: Convolution and Pooling
 Images taken from [2]

  11. DNN Layer types
 • Fully Connected Layer (fc): All neurons are connected to all the neurons of the previous layer.
 • Convolutional & Local Layer (conv, local): Convolves an image with one or more filters to produce a set of feature maps. Depth is the number of filters; stride is how far the filter slides each time. [2]
 • Pooling Layer (pool): Downsamples an image to simplify representation. Can be average, max, or L2. [2]
 • Activation Layer (sig, relu, htanh): Applies a non-linear function to its input (sigmoid, Rectified Linear Unit, tanh).
 • Normalisation Layer (norm): Normalises features across the feature map.
 • Softmax Layer (softmax): Produces a probability distribution over possible classes.
 • Argmax Layer (argmax): Chooses the class with the highest probability.
 • Dropout Layer (dropout): Randomly ignores neurons as regularisation, to prevent overfitting.
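
 For intuition, a minimal NumPy sketch of a few of these layer types (shapes and many details simplified; this is not the paper's code):

    import numpy as np

    def relu(x):
        # Activation: element-wise max(0, x)
        return np.maximum(0, x)

    def max_pool_2x2(x):
        # Pooling: downsample an (H, W) map by taking the max of each 2x2 block
        h, w = x.shape
        x = x[:h - h % 2, :w - w % 2]
        return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).max(axis=(1, 3))

    def fully_connected(x, W, b):
        # Fully connected: every output neuron sees every input neuron
        return W @ x + b

    def softmax(z):
        # Probability distribution over possible classes (shifted for stability)
        e = np.exp(z - z.max())
        return e / e.sum()

    def argmax(p):
        # Choose the class with the highest probability
        return int(np.argmax(p))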

  12. AlexNet
 • Inference-only (forward propagation)
 • 2x speedup over cloud-only
 • 18% more energy-efficient
 Images taken from [1] and [3]

  13. AlexNet
 • Convolutional layers produce a lot of data.
 • Pooling layers greatly reduce the data.
 • Fully connected layers operate on little data but have high latency.
 Images taken from [3]

  14. Partitioning
 • First layers hold most of the data (convolutions and pooling)
 • Later layers account for most of the latency (fully connected layers)
 • Key idea: Compute locally up to the point where it makes sense, then offload to the cloud (see the sketch below).
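
 A minimal sketch of this key idea, assuming per-layer latencies and output sizes are already known (e.g. from profiling); it illustrates the selection logic only, not the authors' implementation:

    # Evaluate every candidate partition point and pick the one that
    # minimises predicted end-to-end latency.

    def best_partition(mobile_ms, server_ms, sizes_bytes, uplink_bps):
        """Pick the layer index at which to cut the DNN.

        mobile_ms[i]   -- latency of layer i on the mobile device (ms)
        server_ms[i]   -- latency of layer i on the server (ms)
        sizes_bytes[k] -- bytes uploaded if we cut before layer k
                          (sizes_bytes[0] is the raw input; sizes_bytes[n]
                          is the final output, tiny for classification)
        """
        n = len(mobile_ms)
        best_k, best_total = 0, float("inf")
        for k in range(n + 1):  # run layers [0, k) locally, [k, n) remotely
            transfer_ms = sizes_bytes[k] * 8 / uplink_bps * 1000
            total = sum(mobile_ms[:k]) + transfer_ms + sum(server_ms[k:])
            if total < best_total:
                best_k, best_total = k, total
        return best_k, best_total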

  15. More Applications
 Application | Abbreviation | Network | Input | Layers
 Image Classification | IMC | AlexNet | Image | 24
 Image Classification | VGG | VGG | Image | 46
 Facial Recognition | FACE | DeepFace | Image | 10
 Digit Recognition | DIG | MNIST | Image | 9
 Speech Recognition | ASR | Kaldi | Speech | 13
 Part-of-speech Tagging | POS | SENNA | Word vectors | 3
 Named Entity Recognition | NER | SENNA | Word vectors | 3
 Word Chunking | CHK | SENNA | Word vectors | 3

  16. VGG
 Figures taken from [1], panel (a) VGG: latency (s) and energy (J) at each candidate partition point, split into server processing, data communication and mobile processing; plus per-layer latency (ms) vs. output data size (MB) from input through the conv/relu/pool stacks to fc6-fc8, softmax and argmax.

  17. FACE
 Figures taken from [1], panel (b) FACE: latency and energy at each candidate partition point (server processing, data communication, mobile processing); plus per-layer latency vs. output data size across input, conv1, pool2, conv3, local4-local6, fc7, fc8, softmax, argmax.

  18. DIG
 Figures taken from [1], panel (c) DIG: latency and energy at each candidate partition point (server processing, data communication, mobile processing); plus per-layer latency vs. output data size across input, conv1, pool1, conv2, pool2, fc1, relu1, fc2, softmax, argmax.

  19. ASR
 Figures taken from [1], panel (d) ASR: latency and energy at each candidate partition point (server processing, data communication, mobile processing); plus per-layer latency vs. output data size across input and fc1-fc7 with sigmoid activations in between.

  20. POS
 Figures taken from [1], panel (e) POS: latency and energy at each candidate partition point (server processing, data communication, mobile processing); plus per-layer latency vs. output data size across input, fc1, htanh, fc3.

  21. NER
 Figures taken from [1], panel (f) NER: latency and energy at each candidate partition point (server processing, data communication, mobile processing); plus per-layer latency vs. output data size across input, fc1, htanh, fc3.

  22. CHK
 Figures taken from [1], panel (g) CHK: latency and energy at each candidate partition point (server processing, data communication, mobile processing); plus per-layer latency vs. output data size across input, fc1, htanh, fc3.

  23. Neurosurgeon

  24. Neurosurgeon
 • Partitions the DNN based on:
   • DNN topology
   • Computation latency
   • Output data size
 • Dynamic factors:
   • Wireless network
   • Datacenter workload

  25. Neurosurgeon
 • Profiles the device and the cloud server to generate prediction models
   • One time, in advance
   • Results stored on the device for decision-making
 • Two distinct goals:
   • Latency minimisation
   • Energy optimisation

  26. Neurosurgeon
 Deployment phase: 1) Generate prediction models per layer type (CONV, ACT, POOL, FC, …) for both mobile and server.
 Runtime phase: 1) Extract layer configurations from the target application, 2) Predict layer performance, 3) Evaluate partition points, 4) Partitioned execution.
 Image taken from [1]
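
 The runtime phase could be sketched as follows, reusing best_partition() from the partitioning slide. predict_mobile_ms, predict_server_ms and the output_bytes field are hypothetical stand-ins for the stored prediction models and layer metadata:

    # Sketch of the runtime phase under the assumptions above; all names
    # here are hypothetical, not the authors' API.

    def plan_execution(layer_configs, input_bytes, uplink_bps,
                       predict_mobile_ms, predict_server_ms):
        # 1) Extract layer configurations (type, filter sizes, neuron counts, ...)
        # 2) Predict each layer's latency on both mobile and server
        mobile_ms = [predict_mobile_ms(cfg) for cfg in layer_configs]
        server_ms = [predict_server_ms(cfg) for cfg in layer_configs]
        # Upload size at each candidate cut: raw input, then each layer's output
        sizes = [input_bytes] + [cfg["output_bytes"] for cfg in layer_configs]
        # 3) Evaluate all candidate partition points under the current bandwidth
        k, total_ms = best_partition(mobile_ms, server_ms, sizes, uplink_bps)
        # 4) Partitioned execution: run layers [0, k) locally, offload the rest
        return k, total_ms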

  27. Regression model per DNN Layer
 Linear or logarithmic regression model; GFLOPS for performance.
 Layer | Regression Variables
 Convolution | (filter_size / stride)^2 * (# filters)
 Local, Pooling | input, output feature maps
 Fully Connected | # input/output neurons
 Softmax, Argmax | # input/output neurons
 Activation, Normalisation | # neurons
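
 As an illustration, one such model for convolution layers can be fitted with NumPy; the profiled (configuration, latency) pairs below are made-up placeholders standing in for the one-time profiling measurements:

    import numpy as np

    # Linear regression of latency on the slide's convolution variable:
    # (filter_size / stride)^2 * (# filters).

    def conv_feature(filter_size, stride, num_filters):
        return (filter_size / stride) ** 2 * num_filters

    profiled = [  # ((filter_size, stride, num_filters), measured latency in ms)
        ((11, 4, 96), 12.0),
        ((5, 1, 256), 95.0),
        ((3, 1, 384), 70.0),
    ]

    x = np.array([conv_feature(*cfg) for cfg, _ in profiled])
    y = np.array([ms for _, ms in profiled])
    slope, intercept = np.polyfit(x, y, 1)  # latency ~ slope * feature + intercept

    def predict_conv_ms(filter_size, stride, num_filters):
        return slope * conv_feature(filter_size, stride, num_filters) + intercept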
