

  1. REAL-TIME ANALYTICS WITH APACHE STORM Mevlut Demir PhD Student The University of Texas at San Antonio – Department of Electrical and Computer Engineering

  2. IN TODAY’S TALK
     1- Problem Formulation
     2- A Real-Time Framework and Its Components, with Existing Applications
     3- Proposed Framework
     4- Conclusion

  3. 1- INTRODUCTION
     • The number of IoT devices has increased:
       - currently ~7 billion, ~50 billion by 2020 (exponentially growing)
       - low manufacturing costs
       - availability of internet connections
     • IoT devices consist of:
       - CPU
       - memory storage
       - a wireless connection
     • IoT devices are equipped with:
       - sensors (produce data)
       - actuators (capable of receiving commands)

  4. 1- INTRODUCTION
     • An example of IoT in modern life: robots
       - limited on-board computation power
       - generate large amounts of data
     • Challenges:
       - latency
       - computation needs (on-board computing limits the robot’s mobility due to weight and power demands)
     *Google Images

  5. 1- INTRODUCTION
     • Solution: scalable data processing platforms -> CLOUD
       - Cloud computing is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. [9]
       - It is becoming the standard for computation.
     • Advantages of using central data processing:
       - the ability to easily draw from vast stores of information
       - efficient allocation of computing resources
       - a proclivity for parallelization

  6. 1.1- REQUIREMENTS FOR IOT DEVICES
     • Data should be transferred in an efficient and scalable manner.
       - The traditional GET/POST approach is not suitable because it increases latency and network traffic.
     • Parallel processing
     • Real-time analysis
     • Batch analysis

  7. 2- A REAL-TIME ARCHITECTURE
     • Gateway layer: drivers are deployed in the gateway layer.
     • Publish-subscribe messaging layer
     • Cloud-based big data processing layer: Apache Storm processes the data and sends the results back to the device.
     Figure: IoT Cloud Architecture [1]

  8. 2.1- GATEWAY LAYER
     • Gateways (each has a unique ID) are responsible for:
       - Managing drivers
       - Managing connections to the brokers
       - Load balancing the device data across the brokers
       - Updating the gateway master
     • The gateway master is responsible for:
       - Controlling the gateways
       - Updating the state information of the gateways in ZooKeeper
       - Deploying/undeploying and starting/stopping the drivers
     Figure: Gateway layer [2]

  9. 2.1- GATEWAY LAYER
     • Driver:
       - Data bridge between a device and the cloud application
       - Responsible for data conversion
       - Has a name and a set of communication channels; each channel has a unique name
       - Can be deployed multiple times
     Figure: MQ Layer [2]

  10. 2.2- MESSAGING LAYER
      • RabbitMQ
        - Topic-based publish-subscribe broker
        - Has a rich API; topics can be created easily
        - Supports the Advanced Message Queuing Protocol (AMQP) and Message Queue Telemetry Transport (MQTT)
        - Low latency
        - Creates lightweight topics
      Figure: RabbitMQ [3]
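
The topic-based publish-subscribe idea on this slide can be illustrated with a minimal in-memory sketch. This is not the RabbitMQ API: the `TopicBroker` class and the topic names are hypothetical and exist only to show how publishers and subscribers are decoupled through a named topic.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicBroker:
    """Toy in-memory broker (hypothetical, not RabbitMQ): one subscriber list per topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        """Register a callback that receives every message published to the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver the payload to all subscribers of the topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


broker = TopicBroker()
received: List[bytes] = []

# A cloud-side consumer subscribes once; the device-side driver only knows the topic name.
broker.subscribe("turtlebot.depth", received.append)
delivered = broker.publish("turtlebot.depth", b"frame-0")
ignored = broker.publish("turtlebot.unused", b"frame-1")  # no subscribers on this topic
```

Because the publisher never addresses a consumer directly, consumers can be added or removed without touching the device side, which is the property the GET/POST approach lacks.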

  11. 2.2- MESSAGING LAYER
      • Kafka
        - Topic-based publish-subscribe broker
        - Messages are appended to a commit log
        - Topics are divided into partitions
        - Consumers can read the same topic in parallel
        - Has its own messaging protocol
        - Does not support AMQP or MQTT
      Figure: Kafka [4]
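
The commit-log and partition bullets can be made concrete with a small sketch. It is a simplified model, not Kafka's client API: the `PartitionedLog` class is hypothetical, and CRC32 stands in for whatever hash a real partitioner would use. It shows only that each partition is an append-only list addressed by offset, so separate consumers can read separate partitions in parallel.

```python
import zlib
from typing import List, Tuple


class PartitionedLog:
    """Toy model of one topic (hypothetical): a fixed number of append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, message: bytes) -> Tuple[int, int]:
        """Append to the partition chosen by hashing the key; return (partition, offset)."""
        # CRC32 is an assumption made for determinism, not Kafka's actual partitioner.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(message)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int) -> List[bytes]:
        """Return all messages in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("robot-1", b"a")
p2, o2 = log.append("robot-1", b"b")

# Messages with the same key land in the same partition, in append order.
replayed = log.read(p1, 0)
```

Since messages are never modified after being appended, a consumer only needs to remember its own offset per partition to resume or replay.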

  12. 2.3- ZOOKEEPER
      - Needed to detect online and offline devices
      - Storm requires coordination among its processing units because of its distributed nature
      Figure: Discovery [2]

  13. 2.4- PROCESSING LAYER
      • Apache Storm
        - Fault tolerant
        - Horizontally scalable
        - Handles large amounts of streaming data
        - Open source
        - Message processing guarantees
        - Simple programming model
        - Supports multiple programming languages

  14. 2.4- PROCESSING LAYER
      • Apache Storm Concepts
        - Stream: Storm’s data model -> an unbounded sequence of tuples
        - Spout: a source of streams
        - Bolt: processes input streams and may emit new streams
        - Topology: a directed acyclic graph
          Vertices: computation
          Edges: streams of data tuples
      Figure: Apache Storm [5]
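
The spout, bolt, and topology concepts can be sketched in plain Python. This does not use the Storm API; every name below (`sensor_spout`, `to_centimetres`, `far_only`, `run_topology`) is hypothetical, and the stream is finite so the example terminates, whereas a real stream is unbounded.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]
Bolt = Callable[[Record], Iterable[Record]]


def sensor_spout() -> Iterator[Record]:
    """Spout: the source of the stream, here (device, distance in millimetres)."""
    yield from [("robot-1", 800), ("robot-2", 1400), ("robot-1", 1100)]


def to_centimetres(record: Record) -> Iterator[Record]:
    """Bolt: transforms each tuple and emits exactly one new tuple."""
    device, millimetres = record
    yield (device, millimetres // 10)


def far_only(record: Record) -> Iterator[Record]:
    """Bolt: filters the stream, emitting zero or one tuple per input."""
    if record[1] > 100:
        yield record


def run_topology(spout: Callable[[], Iterable[Record]],
                 bolts: Dict[str, Bolt],
                 edges: Dict[str, List[str]]) -> Dict[str, List[Record]]:
    """Push every spout tuple along the edges of the graph; collect each bolt's output."""
    emitted: Dict[str, List[Record]] = {name: [] for name in bolts}

    def deliver(node: str, record: Record) -> None:
        for downstream in edges.get(node, []):
            for out in bolts[downstream](record):
                emitted[downstream].append(out)
                deliver(downstream, out)

    for record in spout():
        deliver("spout", record)
    return emitted


# Vertices are computations, edges are streams: spout -> convert -> filter.
bolts = {"convert": to_centimetres, "filter": far_only}
edges = {"spout": ["convert"], "convert": ["filter"]}
result = run_topology(sensor_spout, bolts, edges)
```

In this sketch the graph is walked in a single thread; in Storm the same graph is distributed across worker processes and each vertex can run as many parallel tasks.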

  15. 2.4- PROCESSING LAYER
      • Apache Storm - Grouping
      Figure: Twitter [6]
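
The slide names grouping without detail. Storm's documentation describes, among others, shuffle grouping (tuples spread evenly across a bolt's tasks) and fields grouping (tuples with the same field value always go to the same task). The sketch below is an assumption-laden illustration: round-robin stands in for the randomized shuffle, CRC32 stands in for Storm's internal hashing, and both function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import List, Tuple


def shuffle_like_grouping(tuple_index: int, num_tasks: int) -> int:
    """Round-robin stand-in for shuffle grouping: spreads tuples evenly over tasks."""
    return tuple_index % num_tasks


def fields_grouping(field_value: str, num_tasks: int) -> int:
    """Same field value -> same task (CRC32 is an assumed, illustrative hash)."""
    return zlib.crc32(field_value.encode("utf-8")) % num_tasks


stream: List[Tuple[str, int]] = [
    ("robot-1", 80), ("robot-2", 140), ("robot-1", 110),
    ("robot-3", 95), ("robot-2", 150), ("robot-1", 120),
]
NUM_TASKS = 3

# Even spread: each of the 3 tasks receives 2 of the 6 tuples.
shuffle_load = Counter(shuffle_like_grouping(i, NUM_TASKS) for i in range(len(stream)))

# Key affinity: every tuple of a given device is routed to a single task.
tasks_per_device = {
    device: {fields_grouping(d, NUM_TASKS) for d, _ in stream if d == device}
    for device, _ in stream
}
```

Fields grouping matters when a bolt keeps per-key state (for example, a running value per device), because all tuples for that key must reach the same task.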

  16. 2.4- PROCESSING LAYER
      • Apache Storm
      Figure: Storm cluster [5]

  17. 2.4- PROCESSING LAYER
      • Apache Storm Topology

  18. 2.5- WRAP UP
      Figure: IoT Cloud [2]

  19. 3- EXISTING APPLICATIONS
      TurtleBot follows a large target in front of it by trying to maintain a constant distance to the target. Compressed depth images from the Kinect camera are sent to the cloud, and the processing topology calculates command messages, in the form of velocity vectors, that keep TurtleBot at a set distance from the large object in front of it.
      Figure: TurtleBot [7]
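
The step from a measured distance to a velocity command can be sketched as a simple proportional controller. This is only an assumed illustration of the idea, not the method used in the cited application: the function name, the target distance, the gain, and the speed limit are all hypothetical, and the depth-image processing that produces the distance is omitted.

```python
from typing import Tuple


def follow_command(distance_m: float,
                   target_m: float = 1.0,
                   gain: float = 0.5,
                   max_speed: float = 0.3) -> Tuple[float, float]:
    """Return a (linear, angular) velocity that moves the robot toward the set distance.

    All default values are illustrative assumptions, not figures from the talk.
    """
    error = distance_m - target_m  # positive: target too far away, drive forward
    linear = max(-max_speed, min(max_speed, gain * error))  # clamp to the speed limit
    return (linear, 0.0)  # steering is left out of this sketch


at_target = follow_command(1.0)  # already at the set distance
too_far = follow_command(3.0)    # far away: forward speed saturates at max_speed
too_close = follow_command(0.5)  # too close: back away
```

In the architecture described in the talk, a computation of this kind would run inside a bolt, with the resulting command published back to the robot through the broker.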

  20. 3- EXISTING APPLICATIONS
      • Storm Nimbus and ZooKeeper -> 1 node
      • Gateway -> 2 nodes
      • Storm supervisors -> 3 nodes
      • Brokers -> 2 nodes
      A medium-flavor instance has 2 VCPUs, 4 GB of memory, and 40 GB of HDD. 4 spouts and 4 bolts run in parallel.

  21. 3- EXISTING APPLICATIONS
      Figure: Cloud Drivers [8]

  22. 3- EXISTING APPLICATIONS
      Figures: Latency with RabbitMQ; Latency with Kafka *[2]

  23. 3- EXISTING APPLICATIONS
      Figures: Latency with RabbitMQ; Latency with Kafka *[2]

  24. 3- EXISTING APPLICATIONS
      Figure: Latency observed in the TurtleBot application. *[2]

  25. 4- CONCLUSION
      • Introduced a scalable, distributed architecture and its components.
      • Apache Storm is a leading real-time processing engine.
      • RabbitMQ can be chosen when low latency is a requirement.
      • The proof of concept was verified with an example.
      • Proposed a new framework.

  26. 5- REFERENCES
      • [1] Kamburugamuve, Supun, et al. "Cloud-based parallel implementation of SLAM for mobile robots." Proceedings of the International Conference on Internet of Things and Cloud Computing. ACM, 2016.
      • [2] Kamburugamuve, Supun, Leif Christiansen, and Geoffrey Fox. "A framework for real time processing of sensor data in the cloud." Journal of Sensors 2015 (2015).
      • [3] http://www.rabbitmq.com/
      • [4] http://kafka.apache.org/
      • [5] http://storm.apache.org/
      • [6] http://www.twitter.com/
      • [7] http://www.turtlebot.com
      • [8] He, Hengjing, et al. "Cloud based real-time multi-robot collision avoidance for swarm robotics." International Journal of Grid and Distributed Computing, May 7 (2015).
      • [9] http://www.wikipedia.com
      • [10] http://www.tensorflow.org
      • [11] http://www.kubernetes.io
      • [12] http://www.github.com
