
Reproducibility in the Cloud Rawaa Qasha, Jacek Cała, Paul Watson - PowerPoint PPT Presentation



  1. A Framework for Scientific Workflow Reproducibility in the Cloud
     Rawaa Qasha, Jacek Cała, Paul Watson
     Newcastle University, Newcastle upon Tyne, UK
     Email: {r.qasha, jacek.cala, paul.watson}@newcastle.ac.uk

  2. In this paper
     • A new framework for the repeatability and reproducibility of scientific workflows
     • Integrating logical and physical preservation approaches
     • Offering workflow/task repositories with version control
     • Supporting automatic deployment and image capture of workflows and tasks

  3. Outline
     • Background
     • Challenges for workflow reproducibility
     • Our solution for logical and physical preservation
     • Overview of the reproducibility framework
     • Experiments and results
     • Conclusions

  4. Workflows & Reproducibility
     [Chart: total number of workflows vs. number of workflows that could be re-executed. Study 1*: 92 workflows, 18 re-executable (~20%). Study 2**: 1443 workflows, 341 re-executable (~24%).]
     * Zhao et al., “Why workflows break: Understanding and combating decay in Taverna workflows,” 2012
     ** Mayer et al., “A Quantitative Study on the Re-executability of Publicly Shared Scientific Workflows,” 2015
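The percentages in the chart follow directly from the counts. A minimal Python check, where `reexecution_rate` is a hypothetical helper name and the pairing of study1 with the 92-workflow study (Zhao et al.) and study2 with the 1443-workflow study (Mayer et al.) is read from the chart labels:

```python
def reexecution_rate(reexecutable: int, total: int) -> float:
    """Percentage of workflows in a study that could be re-executed."""
    if total <= 0:
        raise ValueError("total must be a positive number of workflows")
    return 100.0 * reexecutable / total


# (re-executable, total) counts as read from the chart on slide 4
studies = {
    "study1 (Zhao et al., 2012)": (18, 92),
    "study2 (Mayer et al., 2015)": (341, 1443),
}

for name, (ok, total) in studies.items():
    print(f"{name}: {ok}/{total} = {reexecution_rate(ok, total):.1f}%")
# study1 (Zhao et al., 2012): 18/92 = 19.6%
# study2 (Mayer et al., 2015): 341/1443 = 23.6%
```

In other words, roughly four out of five workflows in both studies could no longer be re-executed.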

  5. Challenges for workflow reproducibility
     • Insufficiently detailed workflow description
     • Insufficient description of the execution environment
     • Unavailable execution environments
     • Missing or changed external dependencies
     • Missing input data

  6. Common reproducibility approaches
     • Logical preservation
     • Physical preservation
     [Figure: example workflow with tasks T1, T2, T3 and T4]

  7. Using TOSCA for logical preservation
     [Figure: a TOSCA Service Template in which each workflow task (T1 to T4) is a Node Template of a Node Type, and the connections between tasks are described by Relationship Types]
     Workflow and execution environment description
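To illustrate why a TOSCA-style description is enough to drive deployment, here is a minimal Python sketch that models the service template as plain data and derives a valid deployment order from the relationships. The node type name `hypothetical.nodes.Task`, the `depends_on` field and the exact edges between T1 to T4 are assumptions for illustration only; they are not the framework's actual TOSCA types.

```python
from graphlib import TopologicalSorter

# Hypothetical TOSCA-style service template: each workflow task is a node
# template of a node type; "depends_on" lists the nodes it receives data from.
service_template = {
    "T1": {"type": "hypothetical.nodes.Task", "depends_on": []},
    "T2": {"type": "hypothetical.nodes.Task", "depends_on": ["T1"]},
    "T3": {"type": "hypothetical.nodes.Task", "depends_on": ["T1"]},
    "T4": {"type": "hypothetical.nodes.Task", "depends_on": ["T2", "T3"]},
}


def deployment_order(template: dict) -> list[str]:
    """Order node templates so every node comes after the nodes it depends on."""
    graph = {name: node["depends_on"] for name, node in template.items()}
    # static_order() raises graphlib.CycleError if the relationships form a cycle
    return list(TopologicalSorter(graph).static_order())


print(deployment_order(service_template))
```

The point of the sketch is that the description alone, without any live environment, carries enough information to re-create and order the deployment, which is what logical preservation relies on.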

  8. Using Docker for physical preservation
     [Figure: (a) initial task deployment and execution: a container is created from a base image, the tools, task libraries and task artifact are installed, and a task image with its dependencies is created; (b) task deployment and execution with the task image: the container is created directly from the task image and run on the data]
     Preserving the execution environment and dependencies, tracking changes
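The difference between paths (a) and (b) can be sketched as a simple cache check. This is a minimal Python sketch under stated assumptions: `ImageRepository` is a hypothetical in-memory stand-in for an image repository such as Docker Hub, image "layers" are just strings, and `deploy_task` and the `<task>-image` naming scheme are illustrative, not the framework's code.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRepository:
    """In-memory stand-in for an image repository such as Docker Hub (hypothetical)."""
    images: dict[str, list[str]] = field(default_factory=dict)

    def has(self, tag: str) -> bool:
        return tag in self.images

    def push(self, tag: str, layers: list[str]) -> None:
        self.images[tag] = list(layers)


def deploy_task(task: str, dependencies: list[str], repo: ImageRepository,
                base_image: str = "base-image") -> str:
    """Deploy one task, capturing a task image on first use (hypothetical logic)."""
    task_image = f"{task}-image"
    if repo.has(task_image):
        # (b) A task image already exists: create the container straight from it,
        # skipping installation of tools, libraries and the task artifact.
        return f"run {task_image}"
    # (a) Initial deployment: start from the base image, install dependencies and
    # the task artifact, then capture the result as a reusable task image.
    layers = [base_image]
    layers += [f"install {dep}" for dep in dependencies]
    layers.append(f"add {task} artifact")
    repo.push(task_image, layers)
    return f"build {task_image}, then run {task_image}"


repo = ImageRepository()
print(deploy_task("T1", ["tool-a", "lib-b"], repo))  # build T1-image, then run T1-image
print(deploy_task("T1", ["tool-a", "lib-b"], repo))  # run T1-image
```

With the real Docker CLI, the capture step in (a) would correspond to committing the prepared container as a new image (for example with `docker commit`) and pushing it to the image repository; later runs then skip the installation work entirely.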

  9. Reproducibility Framework
     [Figure: framework components]
     • Core Repository (GitHub): Basic Types, LifeCycle Scripts
     • Task/WF Repository (GitHub)
     • Images Repository (Docker Hub)
     • Automated Workflow Deployment & Enactment Engine (TOSCA Runtime Environment: Cloudify), with Image Creation
     • Target Execution Environment (Docker over local VM, AWS, Azure, GCE, …)

  10. Multi-container deployment

  11. Single container deployment
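The two deployment modes can be contrasted with a small planner sketch. It assumes, as an interpretation of the slide titles only, that multi-container deployment places each task in its own container while single container deployment runs the whole workflow in one container; `plan_containers` and the container names are hypothetical.

```python
def plan_containers(tasks: list[str], mode: str) -> dict[str, list[str]]:
    """Map container names to the workflow tasks they host (hypothetical planner)."""
    if mode == "multi":
        # Multi-container: every task gets its own container and environment.
        return {f"container-{task}": [task] for task in tasks}
    if mode == "single":
        # Single container: all tasks share one container and one environment.
        return {"container-workflow": list(tasks)}
    raise ValueError(f"unknown deployment mode: {mode!r}")


tasks = ["T1", "T2", "T3", "T4"]
print(plan_containers(tasks, "multi"))
print(plan_containers(tasks, "single"))
```

The usual trade-off is isolation versus overhead: separate containers keep conflicting task dependencies apart, while a single container means fewer containers to start and no data transfer between containers.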

  12. Timeline of workflow DevOps

  13. Workflow repository
     Preserving the workflow description, input data, change history and deployment instructions
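Tracking changes to a workflow repository comes down to detecting which preserved files differ between two versions. A minimal Python sketch, assuming content hashes as the change signal (git, which the framework uses via GitHub, is also content-addressed, though its object hashing differs in detail); the file names and contents below are hypothetical examples of a description, input data and deployment instructions.

```python
import hashlib


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Map each repository path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_paths(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Paths added, removed or modified between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


v1 = snapshot({
    "workflow.yaml": b"tasks: [T1, T2, T3, T4]",  # workflow description
    "inputs/sample.csv": b"id,value\n1,42\n",    # input data
    "deploy.sh": b"deploy --target local",       # deployment instructions
})
v2 = snapshot({
    "workflow.yaml": b"tasks: [T1, T2, T3, T4]",
    "inputs/sample.csv": b"id,value\n1,42\n",
    "deploy.sh": b"deploy --target aws",
})
print(changed_paths(v1, v2))  # ['deploy.sh']
```

Because every version is identified by its content, any earlier combination of description, data and deployment instructions can be pinned and re-deployed exactly.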

  14. Experiments and Results

  15. Experiment 1: Repeatability of a workflow on different clouds

  16. Experiment 2: Automatic image capture for improved performance

  17. Experiment 3: Reproducibility in the face of development changes

  18. Conclusions
     • Full workflow reproducibility is a long-standing issue
     • TOSCA descriptions are used for logical preservation
     • Docker images of tasks/workflows support physical preservation
     • Change tracking and automatic deployment also contribute to a comprehensive solution to the problem
     • Integrating these techniques addresses the majority of the issues related to workflow decay

  19. THANK YOU
