A (Probably not) Project Proposal: Spark Streaming vs Apache Storm

Jun 09, 2023

A (Probably not) Project Proposal: Spark Streaming vs Apache Storm for Real-time Event Detection
Niall Egan
November 2019

Streaming Dataflow
◮ Dataflow systems we’ve seen so far (e.g. MapReduce, Spark) are batch-processing systems
◮ Optimised for throughput, not latency

Spark Streaming
◮ Spark is a batch-based system built on RDDs: collections of objects spread across the cluster
◮ RDDs are rebuilt on failure through the lineage graph
◮ In-memory RDDs are faster than Hadoop
◮ How to get lower latencies?
◮ Micro-batching, exposed as D-Streams
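
As a rough illustration of micro-batching, the sketch below groups timestamped events into fixed-length intervals and runs a small batch job (a word count) on each one. It is a toy model in plain Python, not the Spark Streaming API; the function name `micro_batches`, the `batch_interval` parameter, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed-length time intervals.

    Returns a list of (batch_start_time, [values]) sorted by start time.
    Toy model of micro-batching; this is NOT the Spark Streaming API.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Each event belongs to the interval that contains its timestamp.
        start = (timestamp // batch_interval) * batch_interval
        batches[start].append(value)
    return sorted(batches.items())


# Hypothetical events: (arrival time in seconds, word seen in a tweet)
events = [(0.2, "quake"), (0.7, "shaking"), (1.1, "quake"), (2.6, "quake")]

# Every batch is processed as a small, ordinary batch job (here: a word count).
word_counts = []
for start, values in micro_batches(events, batch_interval=1.0):
    counts = {}
    for word in values:
        counts[word] = counts.get(word, 0) + 1
    word_counts.append((start, counts))
```

In this model an event cannot be processed before its interval closes, so the batch interval puts a floor on latency. That is the trade-off the comparison with Storm is meant to probe.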

Apache Storm
◮ Apache Storm is a streaming system from the ground up
◮ Consists of:
  ◮ Streams (unbounded sequences of tuples)
  ◮ Spouts (sources of streams)
  ◮ Bolts (process streams)
  ◮ Topologies (graphs of spouts and bolts)
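
The spout, bolt, and topology vocabulary can be mimicked with Python generators, which also process one tuple at a time. This is only a conceptual sketch, not Storm's API: `tweet_spout`, `keyword_bolt`, and `count_bolt` are hypothetical names, and the input is a finite list so that the example terminates, whereas a real stream is unbounded.

```python
def tweet_spout(tweets):
    """Spout: a source that emits one (user, text) tuple at a time."""
    for user, text in tweets:
        yield (user, text)


def keyword_bolt(stream, keyword):
    """Bolt: passes on only the tuples whose text mentions the keyword."""
    for user, text in stream:
        if keyword in text.lower():
            yield (user, text)


def count_bolt(stream):
    """Bolt: emits a running count after every tuple it receives."""
    count = 0
    for _ in stream:
        count += 1
        yield count


# Hypothetical input; a finite list stands in for an unbounded stream.
tweets = [
    ("a", "Earthquake right now!"),
    ("b", "Nice lunch"),
    ("c", "earthquake??"),
]

# "Topology": wire the spout into a chain of bolts.
topology = count_bolt(keyword_bolt(tweet_spout(tweets), "earthquake"))
running_counts = list(topology)
```

Each tuple flows through the whole chain as soon as it is emitted, with no waiting for a batch to fill. This record-at-a-time behaviour is why Storm is expected to have the lower latency.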

Proposed Application Comparison
◮ Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors (Sakaki et al.)
◮ First step: tweet classification. Use an SVM to classify tweets as positively or negatively related to the target event. Have to avoid tweets such as ‘The earthquake yesterday was scary’.
◮ Second step: tweet as a sensory value. Regard each Twitter user as a sensor with an associated time and place. Then use Kalman filters to predict where the earthquake is happening.
◮ Put this onto Spark and Storm to do real-time, large-scale tweet classification and Kalman filtering
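
To make the second step more concrete, here is a minimal scalar Kalman filter that smooths a sequence of noisy position reports (for example, one coordinate of the locations attached to positive tweets). It is a textbook one-dimensional filter under a "state barely changes" assumption, not the model used by Sakaki et al.; the function name, parameters, and sample latitudes are illustrative assumptions.

```python
def kalman_1d(measurements, x0, p0, process_var, measurement_var):
    """Scalar Kalman filter for a (nearly) constant hidden state.

    x0, p0: initial estimate and its variance (assumed values).
    process_var: how much the true state may drift between steps.
    measurement_var: noise variance of each report.
    Returns the estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical latitudes reported by users who tweeted about the event.
reported_latitudes = [35.2, 35.9, 35.4, 35.6]
estimates = kalman_1d(
    reported_latitudes, x0=35.0, p0=1.0, process_var=0.0, measurement_var=1.0
)
```

Because the filter keeps only `x` and `p` between updates, it can be applied one tweet at a time, which is what makes it a natural fit for a streaming job.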

Things to Compare On
◮ Latency (Storm should win)
◮ Memory usage
◮ Fault recovery times
◮ Scalability to number of nodes

Project Plan
1. Think of a better idea
2. Write a new project plan
