A Framework for Distributed Data-Parallel Execution in the Kepler Scientific Workflow System
Jianwu Wang, Daniel Crawl, Ilkay Altintas
San Diego Supercomputer Center, University of California, San Diego
http://biokepler.org/
Background
• Scientific data
  – Enormous growth in the amount of scientific data
  – Applications need to process large-scale data sets
• Data-intensive computing
  – Distributed data-parallel (DDP) patterns, e.g., PACT and MapReduce, facilitate data-intensive applications
  – Increasing number of execution engines available for these patterns, such as Hadoop and Stratosphere
Challenges
• Applications or workflows built using these DDP patterns are usually tightly coupled with the underlying DDP execution engine
• No existing application or system supports workflow execution on more than one DDP execution engine
The bioKepler Approach
• Use Distributed Data-Parallel (DDP) frameworks, e.g., MapReduce, to execute bioinformatics tools (sketched below)
• Create configurable and reusable DDP components in the Kepler scientific workflow system
• Support different execution engines and computational environments
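To make the first bullet concrete, below is a minimal sketch, not bioKepler's actual code, of running a command-line bioinformatics tool from a Hadoop map task; the tool name some_tool and its arguments are placeholders.

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative only: runs an external tool on each input record and emits its output.
public class ToolMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        // Write the record to a temporary file so the command-line tool can read it.
        File input = File.createTempFile("chunk", ".fasta");
        Files.write(input.toPath(), record.toString().getBytes());

        // Invoke the placeholder tool on the temporary file.
        Process p = new ProcessBuilder("some_tool", "-in", input.getAbsolutePath())
                .redirectErrorStream(true)
                .start();
        StringBuilder output = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            output.append(line).append('\n');
        }
        p.waitFor();
        input.delete();

        // Emit the tool output keyed by the byte offset of the input record.
        context.write(new Text(Long.toString(offset.get())), new Text(output.toString()));
    }
}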
Conceptual Framework
bioKepler Architecture
Distributed Data-Parallel bioActors
• A set of steps to execute a bioinformatics tool in a DDP environment
• Can be implemented either:
  – as sub-workflows (composite)
  – in Java code (atomic)
• Includes (interfaces sketched below):
  – data-parallel patterns, e.g., Map, Reduce, and All-Pairs, to specify data grouping
  – I/O to interface with storage
  – a data format specifying how to split inputs and join outputs
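The following is an illustrative sketch of the contracts a DDP bioActor builds on: Map- and Reduce-style patterns over key/value pairs, and a data format that defines split and join. The interface names are assumptions for illustration, not the actual Kepler/bioKepler API.

import java.util.List;

// A Map-style DDP pattern: each key/value pair is processed independently.
interface MapFunction<KI, VI, KO, VO> {
    List<Pair<KO, VO>> map(KI key, VI value);
}

// A Reduce-style DDP pattern: all values grouped under one key are processed together.
interface ReduceFunction<KI, VI, KO, VO> {
    List<Pair<KO, VO>> reduce(KI key, Iterable<VI> values);
}

// Data format: how to split the input into independent partitions
// and how to join the partial outputs back into one result.
interface DataFormat<T> {
    List<T> split(T input, int numberOfPartitions);
    T join(List<T> partialOutputs);
}

// Minimal key/value pair holder used by the interfaces above.
class Pair<K, V> {
    final K key;
    final V value;
    Pair(K key, V value) { this.key = key; this.value = value; }
}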
Distributed Data-Parallel Directors
• Directors implement a Model of Computation
  – Specify when actors execute
  – Specify how data is transferred between actors
• DDP directors run bioActors on DDP execution engines
  – The Hadoop director converts the workflow into MapReduce jobs and runs them on Hadoop
  – The Stratosphere director converts the workflow into a PACT program and executes it on Nephele
  – The generic DDP director automatically detects the available DDP engines and selects the best one (selection sketched below)
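A minimal sketch of how a generic director might detect and select an engine; the system property name and the Stratosphere class name checked here are assumptions for illustration, not actual director code.

// Picks a DDP engine: an explicit user choice first, otherwise whichever
// engine's client classes can be found on the classpath.
public class EngineSelector {

    public String selectEngine() {
        String requested = System.getProperty("ddp.engine");   // assumption: set from a workflow parameter
        if (requested != null) {
            return requested;
        }
        if (isOnClasspath("org.apache.hadoop.mapreduce.Job")) {
            return "Hadoop";
        }
        if (isOnClasspath("eu.stratosphere.pact.common.plan.Plan")) {  // assumed class name
            return "Stratosphere";
        }
        throw new IllegalStateException("No DDP execution engine available");
    }

    private boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}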
DDP BLAST Workflow
• Data is partitioned for each parallel BLAST execution (partitioning sketched below)
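One plausible way to implement the partitioning step, assuming the BLAST queries are in FASTA format: split the query file into fixed-size chunks so each chunk can be aligned by a separate parallel BLAST run. The file names and chunk size are illustrative only.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Splits a FASTA query file into chunks of a fixed number of sequences.
public class FastaSplitter {

    public static void main(String[] args) throws IOException {
        int sequencesPerChunk = 100;          // assumption: partition size
        int chunkIndex = 0;
        int sequencesInChunk = 0;

        BufferedReader in = new BufferedReader(new FileReader("query.fasta"));
        FileWriter out = new FileWriter("query.chunk" + chunkIndex + ".fasta");
        String line;
        while ((line = in.readLine()) != null) {
            // A '>' header marks the start of a new sequence record.
            if (line.startsWith(">") && sequencesInChunk == sequencesPerChunk) {
                out.close();
                chunkIndex++;
                sequencesInChunk = 0;
                out = new FileWriter("query.chunk" + chunkIndex + ".fasta");
            }
            if (line.startsWith(">")) {
                sequencesInChunk++;
            }
            out.write(line);
            out.write('\n');
        }
        out.close();
        in.close();
    }
}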
DDP bioActor Usage Model
DDP BLAST Workflow Experiments
Summary
• The bioKepler approach
  – Facilitates using data-parallel patterns for distributed execution of bioinformatics tools
  – Interfaces with different execution engines to use various computational resources
• Future work
  – Which patterns are best suited for which tools?
  – Are new patterns needed?
Questions?
• More information
  – {jianwu, crawl, altintas}@sdsc.edu
  – http://www.kepler-project.org
  – http://www.bioKepler.org
• Acknowledgements
  – NSF OCI-0722079 for Kepler/CORE, DBI-1062565 for bioKepler
  – Gordon and Betty Moore Foundation for CAMERA
  – UCSD Triton Research Opportunities Grant