Incoop: MapReduce for Incremental Computations

Sep 21, 2022

Incoop: MapReduce for Incremental Computations. Bhatotia, P., Wieder, A., Rodrigues, R., Acar, U.A., and Pasquini, R. (2011). Reviewed by Neil Satra
Why? You are calculating PageRank at Google, crawling petabytes of web pages. Only 1% of web pages have changed each time you crawl.
Why? Iterative batch processing is hard to scale efficiently: the entire computation must be redone for updated data. The answer: incremental batch data processing.
How? Caching:
Option A: Give programmers the primitives
Option B: Do it transparently
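A minimal sketch of what Option B (transparent caching) could look like, assuming map outputs are memoized under a hash of each input chunk. This is an illustration only, not Incoop's implementation; `chunk_key`, `word_count_map`, `run_job`, and the in-memory `memo` dictionary are hypothetical names. The point is that the user's map function is unchanged and only tasks whose input changed are re-executed.

```python
import hashlib
from collections import Counter


def chunk_key(chunk: bytes) -> str:
    """Identify a chunk by its content, so unchanged chunks map to the same key."""
    return hashlib.sha256(chunk).hexdigest()


def word_count_map(chunk: bytes) -> Counter:
    """An ordinary, unmodified map function: count words in one input chunk."""
    return Counter(chunk.decode("utf-8").split())


def run_job(chunks, map_fn, memo):
    """Apply map_fn to every chunk, reusing memoized outputs where possible.

    Returns the combined result and the number of map tasks actually executed.
    """
    total = Counter()
    executed = 0
    for chunk in chunks:
        key = chunk_key(chunk)
        if key not in memo:  # cache miss: run the map task and remember its output
            memo[key] = map_fn(chunk)
            executed += 1
        total.update(memo[key])
    return total, executed


memo = {}  # hypothetical stand-in for a memoization server

first_input = [b"a b a", b"c d", b"e e e"]
result1, ran1 = run_job(first_input, word_count_map, memo)

# Only the middle chunk changes between runs.
second_input = [b"a b a", b"c x", b"e e e"]
result2, ran2 = run_job(second_input, word_count_map, memo)

print(ran1, ran2)
```

The first run executes all three map tasks; the second re-executes only the task for the changed chunk and reads the other two outputs from `memo`.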
How?
                        Not transparent      Transparent
Dryad and other tools   Yahoo! CBP           DryadInc, Nectar
MapReduce               Google Percolator    Incoop
How? 3 optimizations:
• Partitioning of file system
• Fine-grained Reduce phase
• Memoization-aware scheduling
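To illustrate the first optimization, here is a rough sketch of content-defined chunking, assuming that "partitioning" means cutting the input at boundaries chosen from the data itself rather than at fixed offsets. The function names, the window size, the divisor, and the use of CRC32 as the fingerprint are all assumptions made for this example and are not taken from the slides. Because each boundary depends only on the bytes just before it, inserting data near the start leaves later chunks byte-for-byte identical, so memoized results for them stay valid.

```python
import zlib


def content_chunks(data: bytes, window: int = 4, divisor: int = 8) -> list:
    """Split data wherever the fingerprint of the last `window` bytes hits a marker.

    A boundary depends only on local content, never on the absolute offset.
    """
    chunks = []
    start = 0
    for end in range(window, len(data) + 1):
        fingerprint = zlib.crc32(data[end - window:end])
        if fingerprint % divisor == 0:  # marker found: close the current chunk
            chunks.append(data[start:end])
            start = end
    if start < len(data):  # trailing bytes after the last boundary
        chunks.append(data[start:])
    return chunks


def reused_chunks(old: list, new: list) -> int:
    """Count chunks of `new` whose exact bytes already appeared in `old`."""
    seen = set(old)
    return sum(1 for chunk in new if chunk in seen)


original = b"the quick brown fox jumps over the lazy dog " * 20
edited = b"NEW " + original  # a small insertion at the very beginning

old_chunks = content_chunks(original)
new_chunks = content_chunks(edited)

print(len(new_chunks), "chunks after the edit,",
      reused_chunks(old_chunks, new_chunks), "already seen before")
```

With fixed-size splits, the same four-byte insertion would shift every later boundary and change every chunk; here only the chunks around the insertion differ.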
How? [Figure. Source: the paper]
Strengths
- Results: 10x to 1000x speedup, with negligible processing overhead
- Evaluation: used unmodified code for 5 realistic applications and showed improvements both quantitatively and with mathematical proofs
- The optimizations show attention to detail beyond the surface level
Weaknesses
- Evaluation: no quantitative comparison with non-transparent systems (Google Percolator)
- Insufficient discussion of the memoization server, which could be a bottleneck or a single point of failure; no attempt to decentralize that component
- Storage is linear in the size of the input
- Assumptions about the application
- Garbage collection of old cache entries
- Evaluation: replaced part of the data with equal-sized chunks, rather than appending new data
Summary
o Modified version of Hadoop (MapReduce)
o Efficient processing of large-scale data with incremental updates
o Works with existing code, transparently
o Memoizes computations and tunes the operation of MapReduce to take maximum advantage of memoization
o Strong contributions, decently evaluated; a number of potential concerns have been addressed
By Neil Satra
Bibliography
Bhatotia, P., Wieder, A., Rodrigues, R., Acar, U.A., and Pasquini, R. (2011a). Incoop: MapReduce for incremental computations. In Proceedings of the 2nd ACM Symposium on Cloud Computing (ACM), p. 7.
Bhatotia, P., Wieder, A., Akkuş, İ.E., Rodrigues, R., and Acar, U.A. (2011b). Large-scale incremental data processing with change propagation. In Proceedings of the 3rd USENIX Conference on Hot Topics in Cloud Computing (Berkeley, CA, USA: USENIX Association), p. 18.
Gunda, P.K., Ravindranath, L., Thekkath, R.A., Yu, Y., and Zhuang, L. (2010). Nectar: Automatic management of data and computation in datacenters. In OSDI '10.
Logothetis, D., Olston, C., Reed, B., Webb, K.C., and Yocum, K. (2010). Stateful bulk processing for incremental analytics. In Proceedings of the 1st ACM Symposium on Cloud Computing (New York, NY, USA: ACM), pp. 51-62.
Peng, D., and Dabek, F. (2010). Large-scale incremental processing using distributed transactions and notifications. In OSDI, pp. 1-15.
Popa, L., Budiu, M., Yu, Y., and Isard, M. DryadInc: Reusing work in large-scale computations.
