

  1. Real-Time Image Recognition Using Collaborative IoT Devices. Jiashen Cao*, Matthew Woodward*, Michael S. Ryoo**, and Hyesoon Kim*. *Georgia Institute of Technology; **Indiana University and EgoVid Inc.

  2. Prevalence of IoT Devices
     - Internet of Things (IoT) devices are everywhere: smart locks, smart sprinklers, smart plugs, smart baby monitors, smart cookers, smart thermostats, smart mirrors, smart cleaners, and smart refrigerators.
     - Many of these generate or capture an abundance of real-time raw data such as images.
     https://www.pentasecurity.com/blog/10-smartest-iot-devices-2017/
     ReQuEST workshop 2018

  3. How to Process IoT Data?
     - Advances in deep neural networks (DNNs) provide high-accuracy solutions to previously impossible tasks: image recognition, face recognition, video (action) recognition, and voice recognition.
     - Performing these tasks in real time requires high computational power.

  4. Where to Process (I)
     - (Option A) Use the individual IoT device
       - Limited energy (e.g., battery powered)
       - Limited compute power
       - So, unable to meet timing constraints
     - (Option B) Offload to the cloud
       - Examples: the voice recognition services of Apple's Siri, Amazon's Echo, Microsoft's Cortana, and Google Home
       - Any problem?

  5. Where to Process (II)
     - (Option B) Cloud processing is promising, but:
       - Not scalable: more traffic, data, and storage; IoT devices outnumbered the world population in 2017
       - Privacy and security: voice recognition? Big Brother's spying devices in the novel 1984; requires multiple layers of network security, encryption, etc.
       - Quality of Service (QoS) and reliability: real-time recognition has a tight timing constraint
     F. Biscotti et al., "The Impact of the Internet of Things on Data Centers," Gartner Research, vol. 18, 2014.

  6. Where to Process (III)
     - (Option C) What if we could harvest the aggregated computational power of local IoT devices?
       - At any given time, not all devices are fully utilized.

  7. Collaborative IoT Devices
     - (Option C) We study such collaboration between IoT devices in our paper, Musical Chair.
     - Our performance metric: inferences per second (IPS)
     - We use the same models, so accuracy is unchanged.
     - In this work, we showcase Musical Chair with image recognition models on a farm of Raspberry Pis.
     Hadidi et al., "Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices," arXiv preprint arXiv:1802.02138 (2018).

  8. Outline
     - Motivation
     - Musical Chair
     - Data and Model Parallelism
     - Hardware and Software Overview
     - System Evaluations
     - Conclusion

  9. Musical Chair
     - Musical Chair is a technique for distributing DNN computations over multiple IoT devices.
     - [Figure: the Musical Chair workflow. (i) Profile the behavior of DNN layers on the target hardware. (ii) Gather data on the environment (number of devices, communication latency and bandwidth) and on the DNN model (spatial/temporal CNNs, max pooling, and fully connected layers). (iii) Generate task assignments for {1…n} devices, e.g., a four-node system mapping recording, optical flow, the spatial and temporal CNNs, max pooling, and the fc layers to different nodes.]
     Hadidi et al., "Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices," arXiv preprint arXiv:1802.02138 (2018).
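The task-assignment phase can be illustrated with a small sketch (this is not the paper's actual algorithm, and the layer names and latencies are made up for illustration): given the per-layer latency profile from phase (i), split the layers into contiguous pipeline stages, one per device, so that the slowest stage, which bounds the pipeline's inferences per second, is as fast as possible.

```python
from itertools import combinations

def best_pipeline_split(latencies, n_devices):
    """Split layers into n contiguous stages, minimizing the slowest
    stage; pipeline throughput is bounded by 1 / max(stage latency)."""
    n = len(latencies)
    best = None
    for cuts in combinations(range(1, n), n_devices - 1):
        bounds = (0,) + cuts + (n,)
        stages = [sum(latencies[a:b]) for a, b in zip(bounds, bounds[1:])]
        if best is None or max(stages) < max(best):
            best = stages
    return best

# Hypothetical per-layer latencies (ms) from the profiling phase
profile = [12.0, 9.0, 7.0, 4.0, 30.0, 8.0]
stages = best_pipeline_split(profile, n_devices=3)
print(stages)              # [32.0, 30.0, 8.0]
print(1000 / max(stages))  # pipeline IPS upper bound: 31.25
```

The brute-force search over cut points is only practical for small layer counts, but it makes the objective explicit: balance stage latencies so no single device becomes the bottleneck.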

  10. Model & Data Parallelism
     - Two forms of distribution, shown with an arbitrary task assignment (Task A → Task B → Task C) on a custom DNN model.
     - Data parallelism: provide the next input to copies of Task B running on multiple devices in the network.

  11. Model & Data Parallelism
     - Model parallelism: split parts of a given layer, or a group of layers, over multiple devices (Task B part 1 and part 2).

  12. Model & Data Parallelism
     - Convolution layers: mostly data parallelism.
     - Fully connected layers: either data or model parallelism, depending on the size of the layer, the input, and memory.
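As a minimal illustration of the two schemes (using NumPy stand-ins rather than real devices or Keras layers), the difference is in what each device holds: under data parallelism every device keeps a full copy of the layer and handles different inputs, while under model parallelism each device keeps a slice of the weight matrix and a downstream device concatenates the partial outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))   # a fully connected layer's weights
x1 = rng.standard_normal(256)
x2 = rng.standard_normal(256)

# Data parallelism: two devices hold full copies of W and take
# alternating inputs from the stream.
out1_dev_a = x1 @ W
out2_dev_b = x2 @ W

# Model parallelism: each device holds half of W's output columns and
# computes a partial result; a downstream device merges (concatenates).
W_left, W_right = W[:, :64], W[:, 64:]
merged = np.concatenate([x1 @ W_left, x1 @ W_right])

print(np.allclose(merged, out1_dev_a))  # True: same output, split work
```

Model parallelism also halves the weight memory per device, which matters when a layer's weights do not fit in a Raspberry Pi's RAM.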

  13. Hardware Overview
     - Raspberry Pi 3: a cheap and accessible platform; connected via a Wi-Fi router; no GPU.
     - Nvidia Jetson TX2: a high-end embedded platform with a GPU.
     - We also measured whole-system power with a power analyzer.

  14. Software Overview
     - Dependencies: Ubuntu 16.04; Keras 2.1 with the TensorFlow backend on the Raspberry Pis and the TensorFlow-GPU backend on the TX2; Apache Avro for remote procedure calls and data serialization.
     - Image recognition models: AlexNet and VGG16.

  15. AlexNet
     - Input size: 220x220x3; five convolution layers; three fully connected layers.
     - [Figure: the AlexNet pipeline: input → conv2D (11x11) → maxpool → conv2D (5x5) → maxpool → three conv2D (3x3) → maxpool → fc_1 (4092) → fc_2 (4092) → fc_3 (1000); feature maps shrink from 220 to 55, 27, and 13.]
     A. Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks," in NIPS 2012.
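The feature-map sizes on this slide follow the standard convolution arithmetic out = floor((in - kernel + 2*padding) / stride) + 1. A small helper makes this concrete; note that the stride and padding values below, and the 227x227 input, are the commonly cited AlexNet settings rather than numbers taken from the slide:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer."""
    return (size - kernel + 2 * padding) // stride + 1

# Commonly cited AlexNet front end: 227 -> 55 -> 27 -> 13
s = conv_out(227, kernel=11, stride=4)  # conv1: 55
s = conv_out(s, kernel=3, stride=2)     # maxpool: 27
s = conv_out(s, kernel=5, padding=2)    # conv2 (padded): 27
s = conv_out(s, kernel=3, stride=2)     # maxpool: 13
print(s)  # 13
```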

  16. AlexNet Distribution I
     - Five-device system: one device supplies the input stream, one runs the convolution (CNN) layers, two each hold half of fc_1 (2k units each, model parallelism), and one merges the halves and runs fc_2 (4k) and fc_3 (1k).
     - [Figure: the task assignment for nodes A–E drawn over the AlexNet architecture.]

  17. AlexNet Distribution II
     - Six-device system: one device supplies the input stream, two run the CNN layers with data parallelism, two each hold half of fc_1 (2k units each, model parallelism), and one merges the halves and runs fc_2 (4k) and fc_3 (1k).
     - [Figure: the task assignment for nodes A–F drawn over the AlexNet architecture.]
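In this six-device layout, replicating the CNN stage over two devices doubles that stage's throughput because the devices take alternating frames. A toy scheduler (pure Python, with hypothetical timings) shows the effect: one device sustains 1/t frames per second, while round-robin over two sustains 2/t, ignoring network overhead.

```python
def round_robin(frames, n_devices, time_per_frame):
    """Assign frames to devices in turn; report per-device load and the
    steady-state throughput of the replicated stage (network ignored)."""
    loads = [0.0] * n_devices
    for i, _ in enumerate(frames):
        loads[i % n_devices] += time_per_frame
    throughput = n_devices / time_per_frame  # frames per second
    return loads, throughput

frames = list(range(8))
loads, ips = round_robin(frames, n_devices=2, time_per_frame=0.5)
print(loads)  # [2.0, 2.0]: work is split evenly
print(ips)    # 4.0 frames/s vs. 2.0 with a single device
```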

  18. AlexNet Results
     - [Figure: inferences per second (IPS) for TX2 (GPU), TX2 (CPU), the 5-device system, and the 6-device system; y-axis 0–8 IPS.]
     - 5-device system: lower dynamic energy consumption.
     - 6-device system: IPS comparable to the TX2 (-30%).
     - [Figure: dynamic vs. static energy, and total energy, per inference (J) for TX2 (GPU), TX2 (CPU), the 5-device system, and the 6-device system.]
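The energy-per-inference values in plots like these follow from a simple relation: energy per inference (J) equals average power (W) divided by IPS, since a watt is a joule per second and IPS is inferences per second. The power and IPS numbers below are made up for illustration, not measurements from the paper.

```python
def energy_per_inference(avg_power_w, ips):
    """Joules per inference from average power and throughput."""
    return avg_power_w / ips

# Hypothetical numbers: a cluster drawing 12 W while sustaining 6 IPS
print(energy_per_inference(12.0, 6.0))  # 2.0 J per inference
```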

  19. VGG16
     - Input size: 224x224x3; 13 convolution layers; three fully connected layers.
     - [Figure: the VGG16 architecture in five blocks of 3x3 conv2D layers with maxpooling; feature maps shrink from 224 to 112, 56, 28, and 14 while channels grow from 64 to 512, followed by fc_1 (4092), fc_2 (4092), and fc_3 (1000).]
     K. Simonyan et al., "Very Deep Convolutional Networks for Large-Scale Image Recognition," in ICLR, 2015.
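The fully connected layers are where memory pressure bites on a Raspberry Pi: in the standard VGG16, fc_1 alone maps a flattened 7x7x512 feature map to 4096 units (the slide writes 4092). Counting its parameters shows why such a layer is a natural candidate for model parallelism:

```python
def fc_params(in_features, out_features, bias=True):
    """Parameter count of a dense (fully connected) layer."""
    return in_features * out_features + (out_features if bias else 0)

flat = 7 * 7 * 512           # VGG16 flattened feature map: 25088
p = fc_params(flat, 4096)    # 102,764,544 parameters in fc_1 alone
print(p)
print(p * 4 / 1e6)           # ~411 MB at float32: heavy for one Pi
```

Splitting fc_1's output columns across devices divides both this memory footprint and the matrix-multiply work.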
