Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Presentation by Zbigniew Chlebicki, based on the paper by Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica; University of California, Berkeley. Some images and code samples are from the paper, the NSDI presentation, or the Spark Project website ( http://spark-project.org/ ).
MapReduce in Hadoop
Resilient Distributed Datasets (RDD) ● Immutable, partitioned collection of records ● Created by deterministic coarse-grained transformations ● Materialized on action ● Fault-tolerant through lineage ● Controllable persistence and partitioning
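A minimal sketch of these properties in code (assuming the Spark Scala API used in the paper; the input path is a placeholder): transformations only record lineage, an action materializes the result, and a lost cached partition is rebuilt by replaying its lineage.

  val data   = spark.textFile("hdfs://…")             // placeholder path; nothing is read yet
  val pairs  = data.flatMap(line => line.split(" "))  // transformations only build a lineage graph
                   .map(word => (word, 1))
  val counts = pairs.reduceByKey(_ + _)               // deterministic, coarse-grained transformation
  counts.cache()                                      // request in-memory persistence
  counts.count()                                      // action: the lineage graph is now executed
  // If a cached partition of counts is lost, only that partition is recomputed
  // by re-running its lineage on the corresponding partitions of the input.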
Example: Log mining

  val file = spark.textFile("hdfs://…")
  val errors = file.filter(line => line.contains("ERROR")).cache()

  // Count all the errors
  errors.count()

  // Count errors mentioning MySQL
  errors.filter(line => line.contains("MySQL")).count()

  // Fetch the MySQL errors as an array of strings
  errors.filter(line => line.contains("MySQL")).collect()
Example: Logistic Regression

  val points = spark.textFile(…).map(parsePoint).cache()
  var w = Vector.random(D)  // current separating plane
  for (i <- 1 to ITERATIONS) {
    val gradient = points.map(p =>
      (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
    ).reduce(_ + _)
    w -= gradient
  }
  println("Final separating plane: " + w)
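For reference, the expression inside map is the standard gradient of the logistic loss, so each iteration performs one full-batch gradient-descent step (with unit step size):

  \nabla_w \sum_p \log\left(1 + e^{-y_p\, w \cdot x_p}\right) \;=\; \sum_p \left(\frac{1}{1 + e^{-y_p\, w \cdot x_p}} - 1\right) y_p\, x_p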
Example: PageRank

  links = // RDD of (url, neighbors) pairs
  ranks = // RDD of (url, rank) pairs
  for (i <- 1 to ITERATIONS) {
    ranks = links.join(ranks).flatMap {
      (url, (links, rank)) =>
        links.map(dest => (dest, rank / links.size))
    }.reduceByKey(_ + _)
  }
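The paper notes an optimization for this loop: hash-partition links by URL once up front, so the per-iteration join does not reshuffle the large, static link lists (the ranks RDD produced by reduceByKey ends up partitioned the same way). A rough sketch, assuming the pre-1.0 Spark Scala API; parseLinks and the partition count are placeholders:

  val links = spark.textFile("hdfs://…")                    // placeholder input path
    .map(parseLinks)                                        // hypothetical: line => (url, neighbors)
    .partitionBy(new spark.HashPartitioner(numPartitions))  // co-partition by URL once
    .cache()                                                // links are reused every iteration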
Representation

  // Interface used to represent each RDD (simplified):
  abstract def compute(split: Split): Iterator[T]        // compute the elements of one partition
  abstract val dependencies: List[spark.Dependency[_]]   // dependencies on parent RDDs
  abstract def splits: Array[Split]                      // set of partitions
  val partitioner: Option[Partitioner]                   // how records are hash/range partitioned, if at all
  def preferredLocations(split: Split): Seq[String]      // preferred nodes for a partition (data locality)
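For illustration, a sketch of how a narrow-dependency RDD (the result of filter) might implement this interface. This is a simplification, not the actual Spark source; OneToOneDependency and iterator(split) are assumed to exist as in the Spark codebase of that era:

  class FilteredRDD[T](prev: RDD[T], f: T => Boolean) extends RDD[T] {
    // Filtering is a per-partition operation, so partitions mirror the parent's
    override def splits: Array[Split] = prev.splits
    // Each output split depends on exactly one parent split (narrow dependency)
    override val dependencies = List(new OneToOneDependency(prev))
    // Computing a split streams the parent's iterator through the predicate
    override def compute(split: Split): Iterator[T] = prev.iterator(split).filter(f)
    // Filtering changes neither record keys nor placement
    override val partitioner = prev.partitioner
    override def preferredLocations(split: Split) = prev.preferredLocations(split)
  }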
Scheduling
Evaluation: PageRank
Scalability
Fault Recovery (k-means)
Behavior with Insufficient RAM (logistic regression)
User Applications ● Conviva, data mining (40x speedup) ● Mobile Millennium, traffic modeling ● Twitter, spam classification ● ...
Expressing other Models ● MapReduce, DryadLINQ ● Pregel graph processing ● Iterative MapReduce ● SQL
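As a small illustration of the first bullet, MapReduce can be expressed with two RDD transformations: flatMap for the map phase and groupByKey (plus a final map) for the reduce phase. myMap and myReduce are hypothetical user functions, and the sketch ignores Scala's ClassManifest/implicit-conversion plumbing:

  // myMap:    I => Seq[(K, V)]     -- the MapReduce "map" function
  // myReduce: (K, Seq[V]) => R     -- the MapReduce "reduce" function
  def mapReduce[I, K, V, R](input: RDD[I],
                            myMap: I => Seq[(K, V)],
                            myReduce: (K, Seq[V]) => R): RDD[R] =
    input.flatMap(myMap)                           // map phase: emit key-value pairs
         .groupByKey()                             // shuffle: group values by key
         .map { case (k, vs) => myReduce(k, vs) }  // reduce phase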
Conclusion ● RDDs are an efficient, general, and fault-tolerant abstraction for cluster computing ● Up to 20x faster than Hadoop for memory-bound applications ● Can be used for interactive data mining ● Available as open source at http://spark-project.org