Big Data Processing with Apache Spark
Jay Urbain, PhD

Credits: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica.
http://spark.apache.org/

Motivation

Example: MapReduce

Idea: cache data in-memory
http://people.csail.mit.edu/matei/papers/2010/hotcloud_spark.pdf

Example: MapReduce
http://people.csail.mit.edu/matei/papers/2010/hotcloud_spark.pdf

Goal: In-Memory Data Sharing

Challenge

Challenge
http://web.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf
http://piccolo.news.cs.nyu.edu/piccolo.pdf

Solution: Resilient Distributed Datasets (RDDs)

RDD Recovery
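
The slide gives only its title, so the following is a minimal sketch rather than the deck's own example. It assumes that "recovery" here means the lineage-based recomputation described in the RDD paper credited on the title slide: a lost in-memory result is rebuilt by replaying the transformations that produced it. `ToyRDD` and its methods are hypothetical names in plain Python and are not part of the Spark API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: it remembers how to rebuild itself."""
    source: Optional[List[int]] = None      # base data (stands in for stable storage)
    parent: Optional["ToyRDD"] = None       # lineage pointer to the parent dataset
    fn: Optional[Callable[[List[int]], List[int]]] = None  # transformation applied to the parent
    cache: Optional[List[int]] = field(default=None, repr=False)  # in-memory copy, may be lost

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Record the transformation; nothing is computed yet.
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, p: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda xs: [x for x in xs if p(x)])

    def collect(self) -> List[int]:
        # If the cached copy is missing, rebuild it from the lineage.
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.fn(self.parent.collect())
        return list(self.cache)

    def lose_cache(self) -> None:
        # Simulate losing the in-memory copy (for example, a failed node).
        self.cache = None


base = ToyRDD(source=[1, 2, 3, 4, 5, 6])
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = evens_squared.collect()
evens_squared.lose_cache()           # the cached result disappears
recovered = evens_squared.collect()  # recomputed by replaying filter, then map
```

In this sketch the recovered result equals the first one because the recorded transformations are deterministic; no copy of the derived data is kept anywhere else.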

Generality of RDDs

Tradeoffs

http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html

Programming API

Programming Spark
• Written in Scala ("scah-lah"); runs on the JVM
• Can write applications in Scala, Java, Python, and R
• Interactive: Scala, Python, R

http://mesos.apache.org/

Spark References
• http://spark.apache.org/docs/latest/programming-guide.html
• http://spark.apache.org/docs/latest/api/python/index.html

http://shop.oreilly.com/product/0636920028512.do
