Getting a grip on the grid

Getting a grip on the grid: A knowledge base to trace grid experiments - PowerPoint PPT Presentation



  1. Getting a grip on the grid: A knowledge base to trace grid experiments. Ammar Benabdelkader ammarb@nikhef.nl, Mark Santcroos m.a.santcroos@amc.uva.nl, Victor Guevara Masis vguevara@nikhef.nl, Souley Madougou souleym@nikhef.nl, Antoine van Kampen a.h.vankampen@amc.uva.nl, Silvia Olabarriaga S.D.Olabarriaga@amc.uva.nl

  2. Presentation Outline • Background, challenges and focus • Provenance: an overview • Provenance API (Plier): database schema; architecture & implementation • eBioCrawler: abstract/concrete graph; challenges • Plier Toolbox: generic functionalities; customized functionalities • Scientific impact • Conclusion & future work

  3. Big Grid (Dutch NGI) • Founding partners: NCF, Nikhef and NBIC (2007-2011) • Mission: To realise a fully operational, world-class and resource-rich grid environment at the national level in the Netherlands to serve public scientific research, including particle physics, life sciences and all other disciplines, and to actively encourage general grid usage across all disciplines. • Details: ca. 25% for “user support” and “application-specific support”; ca. 50% for “hardware infrastructure”; ca. 25% for “running costs” • Focus: Grid: networking, compute, storage (resources), databases, sensors, backup, ....; e-science: conducting science using all kinds of ICT infrastructure and opportunities

  4. AMC: e-BioScience Group • Bioinformatics Laboratory – Dept. Clinical Epidemiology, Biostatistics and Bioinformatics – Academic Medical Centre, University of Amsterdam • Filling the “gap” between medical researchers and the Dutch NGI • Supporting a wide range of applications – Next Generation Sequencing – Medical Imaging – -Omics

  5. e-BioScience Group: Layered Architecture

  6. Background • To run their experiments, the e-BioScience group deploys: – the Moteur2/DIANE workflow engine, and – GWENDIA (Grid Workflow Efficient Enactment for Data Intensive Applications) • Most experiments are complex due to: – Iteration over input parameters of running experiments: each job is instantiated several times according to the number of input data links. – Re-trial of failing processes: each failing job is re-tried until it succeeds (or reaches the re-trial limit). – Each workflow experiment may consist of a large number of failed and succeeded jobs.

  7. Challenges • Hard to validate workflow experiments: – Identify whether an experiment succeeded or failed – Verify the validity of the output results – Identify the source of failure • Hard to instrument and document experiments: – How to document validated experiments? – What to do with failed experiments? – How to keep track of the validation process? – How to preserve/publish the knowledge and expertise? • Hard to make use of the gained expertise: – How to prevent similar sources of failure? – How to spread the gained expertise? – How to better exploit the gained expertise?

  8. Focus Build a knowledge base to instrument scientific experimentation • Start with … – Building a knowledge base to instrument scientific experimentation – The knowledge base should be flexible enough … • Adopt the Open Provenance Model (OPM) … – Better suited to our case, since it provides the history of occurrence of things (with flexibility) – Implement tools to build and store OPM-compliant data objects related to scientific experimentation • Build customized tools to explore the data • Enhance the database and Toolbox whenever needed.

  9. Open Provenance Model (1) http://openprovenance.org/ • Allows us to express all the causes of an item – e.g., the provenance of a scientific experiment includes: • the processes composing the experiment • where the processes ran • what input they used • what results they generated, when and where • who launched and monitored the experiment • etc. • Allows for process-oriented and dataflow-oriented views • Based on a notion of annotated causality graph
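The annotated causality graph behind OPM can be sketched in a few lines of plain Java. The edge names (used, wasGeneratedBy, wasControlledBy) and node kinds (ARTIFACT, PROCESS, AGENT) are OPM terminology; the class names and the example job are illustrative only and are not the PLIER API.

```java
import java.util.*;

// Minimal sketch of an OPM-style annotated causality graph.
// Node kinds and edge relations follow OPM; everything else is illustrative.
public class OpmSketch {
    enum NodeKind { ARTIFACT, PROCESS, AGENT }

    record Node(String id, NodeKind kind) {}
    // An OPM edge points from an effect back to one of its causes.
    record Edge(String relation, Node effect, Node cause) {}

    public static void main(String[] args) {
        Node input  = new Node("input.fastq", NodeKind.ARTIFACT);   // hypothetical input file
        Node job    = new Node("align-job-1", NodeKind.PROCESS);    // hypothetical grid job
        Node output = new Node("aligned.bam", NodeKind.ARTIFACT);   // hypothetical result
        Node user   = new Node("researcher", NodeKind.AGENT);

        List<Edge> edges = List.of(
            new Edge("used", job, input),             // the job used the input artifact
            new Edge("wasGeneratedBy", output, job),  // the result was generated by the job
            new Edge("wasControlledBy", job, user)    // the job was controlled by the agent
        );

        // Tracing provenance = walking the causal edges backwards from a result.
        for (Edge e : edges) {
            System.out.println(e.effect().id() + " --" + e.relation() + "--> " + e.cause().id());
        }
    }
}
```

Because every edge runs from effect to cause, answering "where did this result come from?" is a backward walk over the graph, which is exactly the dataflow-oriented view mentioned above.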

  10. Open Provenance Model (2) http://openprovenance.org/

  11. PLIER Development The Provenance Layer Infrastructure for E-science Resources (PLIER) provides an implementation of the Open Provenance Model (OPM). Four main components constitute the Plier development: 1. Implementing the most suitable OPM-compliant relational database schema 2. Developing the Plier Core API: a Java-based API to build and store OPM graphs 3. Developing the eBioCrawler: Java-based agents that crawl the input/output data for each experiment and store it into the knowledge base 4. Developing the Plier Toolbox: a Java-based UI to visualize, search, and share OPM graphs

  12. PLIER: Database Schema The OPM-compliant database schema used by Plier:

  13. PLIER: Core API (1) The Plier API is implemented using recent standards and mechanisms: 1. JDO 3.1 is used as a Java-centric API to access persistent data 2. DataNucleus is used as a reference implementation of the JDO API 3. MySQL is used as a back-end database to store provenance data The Plier Core API provides means to build OPM-compliant data objects and store them into the knowledge base
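The "build, then store" flow of the Core API can be sketched as below. The real Plier class names are not shown on the slides, and the real back end is JDO/DataNucleus over MySQL; this runnable sketch stands in an in-memory map for the persistence layer and notes the equivalent JDO calls in a comment, so every name here is an assumption.

```java
import java.util.*;

// Hypothetical sketch of the Plier Core API's build-then-store pattern.
// The persistence layer is replaced by an in-memory map so the sketch runs
// standalone; the real API persists via JDO (DataNucleus) into MySQL.
public class PlierCoreSketch {
    record ProcessNode(String id, Map<String, String> annotations) {}

    // Stand-in for the JDO-backed store used by the real API.
    static final Map<String, ProcessNode> store = new HashMap<>();

    static ProcessNode buildProcess(String id, String account, String timestamp) {
        // Step 1: build the OPM-compliant object; the annotation keys mirror
        // the <event> Account/Timestamp fields on the architecture slide.
        return new ProcessNode(id, Map.of("account", account, "timestamp", timestamp));
    }

    static void storeProcess(ProcessNode p) {
        // Step 2: persist it. With JDO this would roughly be:
        //   tx.begin(); pm.makePersistent(p); tx.commit();
        store.put(p.id(), p);
    }

    public static void main(String[] args) {
        ProcessNode p = buildProcess("job-42", "biomed-vo", "2011-06-01T12:00:00Z");
        storeProcess(p);
        System.out.println("stored " + store.size() + " process node(s)");
    }
}
```

Separating the build step from the store step is what lets the same API serve both integration modes described on the next slide: the workflow engine can call it directly, or a crawler can call it after the fact.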

  14. PLIER: Core API (2) The Plier API can be used in two manners: 1. Integrated within the workflow management system (WF with data provenance capabilities): • Scientists only need to enable the data provenance capabilities from the WF. • WF developers need to implement the DPC inside the workflow engine. 2. Building the provenance data from the input/output used/generated by the workflow system: • No need to change the workflow engine. • Risk of building incomplete OPM graphs.

  15. PLIER: Core API (3) [Architecture diagram: workflow system clients and a WF with provenance capabilities send <event> Account, Timestamp </event> records, plus user Profile data, to the Provenance Layer …]

  16. eBioCrawler Java-based agents that crawl the input/output data for each experiment and store it into the knowledge base. • Uses the GWENDIA workflow description to build the abstract model of the experiment • Uses other input/output/log files to build the concrete model of the experiment • Workflow experiment data is available through a secure https server • RISK: not being able to collect/extract the required minimum data set of each experiment

  17. eBioCrawler: Abstract Graph The abstract graph is extracted from the workflow description (GWENDIA XML format) • Straightforward process
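The "straightforward" extraction step can be sketched as an XML walk over the workflow description. The element and attribute names below (`<workflow>`, `<processor name="...">`) are illustrative only, not the actual GWENDIA schema, which is considerably richer.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Sketch of building an abstract workflow graph from a GWENDIA-style XML
// description; the element names here are hypothetical stand-ins.
public class AbstractGraphSketch {
    public static void main(String[] args) throws Exception {
        String xml = """
            <workflow name="pipeline">
              <processor name="align"/>
              <processor name="filter"/>
            </workflow>""";

        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Each processor element becomes one abstract PROCESS node of the graph.
        NodeList procs = doc.getElementsByTagName("processor");
        for (int i = 0; i < procs.getLength(); i++) {
            Element e = (Element) procs.item(i);
            System.out.println("abstract process: " + e.getAttribute("name"));
        }
    }
}
```

The abstract graph needs only the static description, which is why this direction is simple; the concrete graph on the next slides, which must reconcile logs, retries, and actual files, is where the difficulty lies.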

  18. eBioCrawler: Concrete Graph The concrete graph is extracted from the different input/output/log files used/generated by the workflow engine • Complex process … For each workflow experiment: • Users and host machines are modelled as AGENTs • Executed jobs are modelled as PROCESSes • Input files/parameters are modelled as ARTIFACTs • Output results are also modelled as ARTIFACTs • Nodes are linked using CAUSAL DEPENDENCIES

  19. eBioCrawler: Concrete Graph Major issues we faced: • Re-tried processes cause data duplication, mainly with input files, which results in heavy graphs • It was hard to identify the input files/parameters for each job (values and order) • Output results were hard to link to their corresponding processes • Most of the issues were solved by dedicating more programming effort to eBioCrawler
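One plausible shape for the duplication fix is interning: artifacts are keyed by their logical identifier so every retry of a job links to the same ARTIFACT node instead of minting a duplicate. The slides do not show how eBioCrawler actually solved this, so the approach and all names below are assumptions.

```java
import java.util.*;

// Hypothetical sketch of de-duplicating input artifacts across job retries:
// artifact nodes are interned by their logical identifier (e.g. a file URL),
// so re-tried jobs reuse the existing node rather than duplicating it.
public class ArtifactInterning {
    record Artifact(String url) {}

    static final Map<String, Artifact> interned = new HashMap<>();

    static Artifact artifactFor(String url) {
        // Return the existing node if this input was already seen.
        return interned.computeIfAbsent(url, Artifact::new);
    }

    public static void main(String[] args) {
        // Three retries of the same job all reference one input file...
        Artifact a1 = artifactFor("lfn://grid/input.fastq");  // hypothetical file URL
        Artifact a2 = artifactFor("lfn://grid/input.fastq");
        Artifact a3 = artifactFor("lfn://grid/input.fastq");

        // ...but only one ARTIFACT node exists in the graph.
        System.out.println("nodes created: " + interned.size());
        System.out.println("same node reused: " + (a1 == a2 && a2 == a3));
    }
}
```

With interning in place, a retried job adds only its own PROCESS node and causal edges, keeping the graph from growing with every failed attempt.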
