Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
By Ahmad and Cheung
Presented by: Ishank Jain
Department of Computer Science
03/19/2019

CONTENT
- Background
- Research Question
- Method
- Results
- Conclusion
- Questions

BACKGROUND
- Implementations of MapReduce
- Source-to-Source Compilers
- Synthesizing Efficient Implementations
- Query Optimizers and IRs

BACKGROUND: Implementations of MapReduce

BACKGROUND: Source-to-Source Compilers

BACKGROUND: Synthesizing Efficient Implementations

BACKGROUND: Query Optimizers and IRs

MOTIVATION

CASPER
- Casper is a compiler that can automatically retarget sequential Java programs to Big Data processing frameworks such as Spark, Hadoop, or Flink.
Image credit: https://casper.uwplse.org

CASPER

MapReduce OPERATORS
- Map operator:
  - Converts a value of type τ into a multiset of key-value pairs of types κ and ν.
- Reduce operator:
  - Combines two values of type ν to produce a final value.
- Shuffling.
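
The operator shapes above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Casper's IR or any framework's API: the names `map_stage`, `reduce_stage`, and `word_count` are hypothetical, and shuffling is modeled as an in-memory group-by-key.

```python
from collections import defaultdict
from functools import reduce


def map_stage(data, emit):
    """Apply emit to each input value and collect every (key, value) pair it yields."""
    return [pair for value in data for pair in emit(value)]


def reduce_stage(pairs, combine):
    """Shuffle pairs by key, then fold each key's values pairwise with combine."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # shuffling: bring equal keys together
    return {key: reduce(combine, values) for key, values in groups.items()}


def word_count(words):
    """Hypothetical example: count occurrences of each word."""
    pairs = map_stage(words, lambda w: [(w, 1)])
    return reduce_stage(pairs, lambda a, b: a + b)


print(word_count(["a", "b", "a"]))
```

Here `emit` plays the role of the map operator (one value in, a multiset of pairs out) and `combine` plays the role of the reduce operator (two values in, one value out).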

PROGRAM SUMMARY
- The program summary, a high-level intermediate representation (IR), describes how the output of the code fragment (i.e., m) can be computed using a series of map and reduce stages from the input data (i.e., mat).
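
To make the idea concrete, here is a sketch that assumes the code fragment computes the row-wise mean of a matrix `mat` into a list `m`. That choice of fragment is an assumption for illustration, and the Python below only mimics what a summary says; it is not Casper's IR syntax. Both functions assume a non-empty rectangular matrix.

```python
def row_means_sequential(mat):
    """The kind of sequential loop nest a summary would describe."""
    m = []
    for row in mat:
        total = 0
        for x in row:
            total += x
        m.append(total / len(row))
    return m


def row_means_as_stages(mat):
    """The same output expressed as map, reduce, map stages over mat."""
    cols = len(mat[0])
    # Map stage: emit (row index, cell value) for every cell.
    pairs = [(i, x) for i, row in enumerate(mat) for x in row]
    # Reduce stage: sum the values that share a row index.
    sums = {}
    for i, x in pairs:
        sums[i] = sums.get(i, 0) + x
    # Map stage: divide each row sum by the number of columns.
    return [sums[i] / cols for i in range(len(mat))]


mat = [[1, 2, 3], [4, 5, 6]]
print(row_means_sequential(mat), row_means_as_stages(mat))
```

A summary is useful precisely because the second form can be translated stage by stage into a MapReduce-style API, while the first cannot.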

SYSTEM ARCHITECTURE
- Program analyzer:
  - Search space description
  - Verification condition
- Summary generator.
- Code generator.

PROGRAM SUMMARIES
- High-level IR:
  - Expresses summaries that are translatable into the target API.
  - Lets the synthesizer efficiently search for summaries that are equivalent to the input program.
  - Has a limited number of operations.

SEARCH SPACE
- To generate the search space grammar, Casper analyzes the input code.
- Code analyzer:
  - Dataflow analysis
  - Scanning function

SEARCH SPACE

VERIFYING SUMMARIES
- Verification conditions:
  - Hoare logic
  - Predicate logic
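
For a loop of the form `while (b) { S }`, Hoare-style verification conditions typically take the textbook shape below. The notation is chosen here for illustration and is not taken from the slides: $\mathit{pre}$ is the state before the loop, $\mathit{Inv}$ a candidate loop invariant, $\mathit{PS}$ the candidate program summary used as the postcondition, and $\sigma' = S(\sigma)$ the state after one execution of the loop body.

```latex
\begin{align*}
\text{Initiation:}   &\quad \forall \sigma.\; \mathit{pre}(\sigma) \rightarrow \mathit{Inv}(\sigma) \\
\text{Continuation:} &\quad \forall \sigma, \sigma'.\; \mathit{Inv}(\sigma) \land b(\sigma) \land \sigma' = S(\sigma) \rightarrow \mathit{Inv}(\sigma') \\
\text{Termination:}  &\quad \forall \sigma.\; \mathit{Inv}(\sigma) \land \lnot b(\sigma) \rightarrow \mathit{PS}(\sigma)
\end{align*}
```

If all three implications hold for every program state, the summary is a valid postcondition of the loop, which is why both the summary and the invariant have to be found together.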

SEARCH STRATEGY
- Input:
  - A set of candidate summaries and invariants encoded as a grammar.
  - The correctness specification for the summary in the form of verification conditions.
- CEGIS algorithm
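
A toy version of a counterexample-guided inductive synthesis (CEGIS) loop may help fix the idea. Everything here is a simplification under stated assumptions: the "grammar" is a short list of named Python functions, the "verifier" is an exhaustive check over a small finite `domain` rather than a theorem prover, and the function name `cegis` is hypothetical.

```python
def cegis(candidates, spec, domain, seed_inputs):
    """Return the name of a candidate that matches spec on all of domain, or None.

    candidates: list of (name, function) pairs, standing in for the grammar.
    spec: reference function, standing in for the original sequential code.
    domain: finite list of inputs, standing in for the verifier's check.
    """
    examples = list(seed_inputs)
    while True:
        # Synthesis step: first candidate consistent with every example so far.
        chosen = next(
            ((name, f) for name, f in candidates
             if all(f(x) == spec(x) for x in examples)),
            None,
        )
        if chosen is None:
            return None  # search space exhausted
        name, f = chosen
        # Verification step: look for an input where the candidate is wrong.
        counterexample = next((x for x in domain if f(x) != spec(x)), None)
        if counterexample is None:
            return name
        # Feed the counterexample back so the failed candidate is ruled out.
        examples.append(counterexample)


CANDIDATES = [("max", max), ("len", len), ("sum", sum)]
DOMAIN = [(1,), (2, 2), (0, 3, 4)]
result = cegis(CANDIDATES, spec=sum, domain=DOMAIN, seed_inputs=[(1,)])
print(result)
```

With the single seed input `(1,)`, `max` looks correct at first; the verification step then produces `(2, 2)` as a counterexample, which rules out both `max` and `len` in the next synthesis round.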

IMPROVISATION
- Verifier failures:
  - Casper must first prevent summaries that failed the theorem prover from being regenerated by the synthesizer.
- Incremental grammar generation:
  - Helps find summaries more quickly while still allowing more syntactically expressive grammars.

IMPROVISATION
- Search algorithm for summaries:
  - Each synthesized summary (correct or not) is eliminated from the search space, forcing the synthesizer to generate a new summary each time.
  - When the grammar is exhausted, Casper returns the set of correct summaries Δ if it is non-empty.
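
The search loop described on these two slides can be sketched as follows. This is a hedged approximation: `find_summaries`, `passes_bounded_check`, and `fully_verifies` are hypothetical names, candidates are plain strings, and moving on to the next (larger) grammar when Δ is empty is an assumption drawn from the incremental grammar generation bullet rather than something the slides state step by step.

```python
def find_summaries(grammars, passes_bounded_check, fully_verifies):
    """Search grammars from smallest to largest; return the correct summaries found.

    grammars: list of candidate lists, ordered from least to most expressive.
    passes_bounded_check: cheap check used by the synthesizer stand-in.
    fully_verifies: expensive check standing in for the theorem prover.
    """
    for grammar in grammars:
        blocked = set()  # every summary synthesized so far, correct or not
        delta = []       # summaries that passed full verification
        while True:
            candidate = next(
                (c for c in grammar
                 if c not in blocked and passes_bounded_check(c)),
                None,
            )
            if candidate is None:
                break  # this grammar is exhausted
            blocked.add(candidate)  # never regenerate the same summary
            if fully_verifies(candidate):
                delta.append(candidate)
        if delta:
            return delta
    return []


grammars = [["a", "b"], ["c", "d", "e"]]
found = find_summaries(
    grammars,
    passes_bounded_check=lambda c: c != "a",
    fully_verifies=lambda c: c in {"d", "e"},
)
print(found)
```

Blocking every synthesized summary, including correct ones, is what lets the loop collect several valid summaries from one grammar instead of stopping at the first.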

COST MODEL
- Dynamic cost estimation:
  - It counts the number of unique data values that are emitted as keys.
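
Counting unique emitted keys over a data sample could look like the sketch below. The helper `unique_key_count` and the two emit functions are hypothetical, and how the resulting count enters the overall cost formula is not specified on the slide, so none is assumed here.

```python
def unique_key_count(sample, emit):
    """Count distinct keys produced when emit runs over a data sample."""
    return len({key for record in sample for key, _ in emit(record)})


def emit_per_word(word):
    """Candidate map function: one key per distinct word."""
    return [(word, 1)]


def emit_per_initial(word):
    """Candidate map function: one key per first letter."""
    return [(word[0], 1)]


SAMPLE = ["apple", "avocado", "banana", "apple", "cherry"]
print(unique_key_count(SAMPLE, emit_per_word))
print(unique_key_count(SAMPLE, emit_per_initial))
```

Because the count depends on the actual data, it has to be measured at run time on real input rather than derived from the program text alone.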

IMPORTANT POINTS AND LIMITATIONS
- The IR does not currently model the full range of operators across different MapReduce implementations.
- Biasing the search towards smaller grammars likely produces program summaries that run more efficiently, although this is not sufficient to guarantee optimality of the generated summaries. It is a tradeoff between the efficiency of the solution and the time spent generating the grammar.
- Casper currently handles basic Java statements, conditionals, functions, user-defined types, and loops.
- Recursive methods and methods with side effects are not currently supported.

EVALUATION

QUESTIONS
- Casper covers a limited set of operations and doesn’t perform well on ML-related and scientific image datasets. Does this make it usable only for beginner programmers?
- “Summaries are restricted to only those expressible using the IR, which lacks many features (e.g., pointers) that a general purpose language would have”. Does this restrict the scope for finding better target code?
- Certain methods, such as recursive methods, are not supported (reason: they don’t gain any speedup). Is the paper failing to address issues that are an essential part of general-purpose coding?
- NOTE: The paper wanted to spare users the complexity of learning multiple DSLs.

REFERENCE
Maaz Bin Safeer Ahmad, Alvin Cheung. Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications. Proc. ACM SIGMOD International Conference on Management of Data, pages 1205-1220, 2018.
