Apache Spark Dr. Mihail Content derived from: Ankam, Venkat. Big - PowerPoint PPT Presentation

Mar 16, 2024

Apache Spark Dr. Mihail Content derived from: Ankam, Venkat. Big Data Analytics. Packt Publishing, 2016. July 9, 2019 (Dr. Mihail ) Intro Big Data July 9, 2019 1 / 8
Apache Spark: Why? Hadoop and MapReduce have been around for 10 years and have proven to be the best solution for processing massive amounts of data with high performance. However, MapReduce (MR) performs poorly in iterative computing: the output of each MR job has to be written to disk before the next job can read it, and this disk I/O is the bottleneck. Spark's focus is all-in-memory computation.
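The disk bottleneck in iterative jobs can be illustrated with a toy sketch in plain Python. This is not Hadoop or Spark code: the names `step`, `run_with_disk_handoff`, and `run_in_memory` are hypothetical, and the "jobs" are simple loops. It only contrasts handing intermediate results to the next iteration through files with keeping them in memory, and counts the disk operations each approach needs.

```python
import json
import os
import tempfile


def step(values):
    # One pass of a toy iterative algorithm: move each value halfway toward 10.
    return [v + (10.0 - v) / 2 for v in values]


def run_with_disk_handoff(values, n_iters):
    # MapReduce-style chaining (simplified): each "job" writes its output
    # to disk, and the next job must read it back before it can start.
    disk_ops = 0
    with tempfile.TemporaryDirectory() as workdir:
        current = values
        for i in range(n_iters):
            path = os.path.join(workdir, f"job_{i}.json")
            with open(path, "w") as f:
                json.dump(step(current), f)
            disk_ops += 1  # write of the intermediate result
            with open(path) as f:
                current = json.load(f)
            disk_ops += 1  # read by the following job
    return current, disk_ops


def run_in_memory(values, n_iters):
    # Spark-style idea (simplified): intermediate results stay in memory
    # between iterations, so no disk operations are needed in the loop.
    current = values
    for _ in range(n_iters):
        current = step(current)
    return current, 0
```

Both functions return the same values; the disk version performs two disk operations per iteration, while the in-memory version performs none. In a real cluster the gap is larger, because each handoff also involves serialization and replicated writes to a distributed file system.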
Memory
Spark: Why Spark? Designed to be interoperable with Hadoop. Enables applications to distribute data reliably in memory during processing; this is key to its performance and allows applications to avoid expensive disk access. Suitable for iterative algorithms. Spark programs run up to 100x faster than Hadoop MapReduce when the data is held in memory. Provides native support for Java, Scala, Python, and R. Spark powers a stack of libraries, including Spark SQL and DataFrames (for interactive analytics), MLlib (for machine learning), and GraphX (for graph processing). Spark runs on Hadoop (YARN), Mesos, or its own standalone cluster manager, on on-premises hardware or in the cloud.
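As a rough sketch of what keeping data in memory looks like through the Python API, the word count below caches an RDD so that two actions reuse it without recomputation. It assumes PySpark is installed and runs in local mode; the function names `spark_word_count` and `local_word_count`, and the application name, are illustrative choices rather than anything from the slides. The plain-Python version is included only as a reference for checking the Spark result on small inputs.

```python
from collections import Counter


def local_word_count(lines):
    # Plain-Python reference: count whitespace-separated words.
    return dict(Counter(word for line in lines for word in line.split()))


def spark_word_count(lines):
    # Imported here so the module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")           # assumption: run locally on all cores
        .appName("word-count-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        counts = (
            spark.sparkContext.parallelize(lines)
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
            .cache()  # keep the result in memory for the actions below
        )
        counts.count()                 # first action materializes and caches the RDD
        return dict(counts.collect())  # second action reuses the cached data
    finally:
        spark.stop()
```

On a small input such as `["spark runs in memory", "spark is fast"]`, `spark_word_count` should return the same dictionary as `local_word_count`. The `cache()` call matters most in iterative algorithms, where the same dataset is read on every pass.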
MR vs. Spark


