
Enabling Reproducible Computing on the EPOS ICS-D Alessandro Spinuso - PowerPoint PPT Presentation



  1. Enabling Reproducible Computing on the EPOS ICS-D. Alessandro Spinuso (KNMI), Daniele Bailo (INGV), Jonas Matser (KNMI), Chris Card (BGS), Jean-Baptiste Roquencourt (BRGM), Wayne Shelley (BGS).

  2. Computational Earth Science (CES). European Plate Observing System (EPOS): a long-term plan to facilitate the integrated use of data, data products, and facilities from distributed research infrastructures for solid Earth science in Europe. ICS-D: the distributed Integrated Core Services element of EPOS. • Computational and data storage infrastructures (HPC, Cloud). • Services of general interest (data publishing services, external metadata catalogues, AAAI).

  3. CES: Earthquake Simulation VREs (portal.verce.eu). • Earthquake Simulation: produce synthetic seismograms for Earth models and earthquakes via the execution of HPC simulation software (SPECFEM3D, SPECFEM3D_GLOBE). • Data Processing & Misfit Analysis: observed data and synthetics are processed and compared via data-intensive methods. Data are accessed from federated international archives (FDSN), indexed, and reused.
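Waveform access from the FDSN federation works through the standard `fdsnws-dataselect` web service that every compliant data centre exposes. A minimal sketch of building such a query URL, assuming the IRIS node as one example endpoint (the station/channel codes below are illustrative, not from the slides):

```python
from urllib.parse import urlencode

def fdsn_dataselect_url(base, network, station, channel, start, end):
    """Build an FDSN dataselect query URL for waveform retrieval.

    Any FDSN-compliant data centre exposes this endpoint; the IRIS
    base URL used below is just one node of the federation.
    """
    params = {
        "net": network, "sta": station, "cha": channel,
        "start": start, "end": end, "format": "miniseed",
    }
    return f"{base}/fdsnws/dataselect/1/query?{urlencode(params)}"

url = fdsn_dataselect_url(
    "https://service.iris.edu",
    "IU", "ANMO", "BHZ",
    "2020-01-01T00:00:00", "2020-01-01T01:00:00",
)
print(url)
```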

  4. Traceable and Combinable Computations across Workspaces (portal.verce.eu). [Diagram: workspaces spanning ICS-D Cloud and HPC resources (SCAI), with provenance expressed via S-PROV, ProvONE, and PROV-O.]

  5. CES embedded into the EPOS portal workspaces. From data discovery to analysis in dedicated workspaces: distributed data discovery through the ICS-C catalogue, spatial integration, temporal integration, processing.

  6. CES embedded into the EPOS portal workspaces. [Diagram: Data Resources A and B feeding a Workflow for Data-Staging & Preprocessing and a Processing Workspace.] A researcher wants to: stage the distributed raw data onto a computational environment to develop and apply custom methods; apply preprocessing workflows to the raw data before custom analysis; be informed about libraries that fit the selected data, and use them; update raw data that is already in the computational environment; keep old versions of raw data (reproducibility / comparison); archive the state of their environment (track progress / restore).
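The "update raw data but keep old versions" requirement amounts to versioned staging: re-staging a resource must not overwrite what earlier analyses saw. A toy sketch of this idea in plain Python (illustrative only; class and field names are invented, not the EPOS implementation):

```python
import hashlib
from datetime import datetime, timezone

class StagingHistory:
    """Toy versioned staging store: re-staging a resource appends a
    new version instead of overwriting, so old results stay
    reproducible and versions can be compared."""

    def __init__(self):
        self._history = {}  # resource id -> list of version records

    def stage(self, resource_id, payload: bytes):
        version = {
            "checksum": hashlib.sha256(payload).hexdigest(),
            "staged_at": datetime.now(timezone.utc).isoformat(),
            "data": payload,
        }
        self._history.setdefault(resource_id, []).append(version)
        return version["checksum"]

    def latest(self, resource_id):
        return self._history[resource_id][-1]

    def versions(self, resource_id):
        return list(self._history[resource_id])

store = StagingHistory()
store.stage("FDSN/IU.ANMO", b"raw waveforms v1")
store.stage("FDSN/IU.ANMO", b"raw waveforms v2 (provider updated)")
print(len(store.versions("FDSN/IU.ANMO")))  # both versions retained
```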

  7. Notebook Service: architecture and requirements. [Diagram: AAAI and contextualisation; workflow workers and containers running Data-Staging & Preprocessing workflows; Notebook Service API; notebook containers on the ICS-D with raw-data volumes, results, notebook pages, and library requirements.] • Read-only and extensible input data (staging_history). • Workflow and notebook container(s) share volumes (Workflow as a Service). • Users' data volumes archived on demand together with notebook pages and library requirements (snapshot). • Libraries selectable from the EPOS ICS catalogue. • Controlled by the EPOS GUI through a dedicated API.

  8. Notebook Service: architecture and requirements (continued). Similar systems we learn from: services that build and run Docker images from GitHub repositories containing notebook pages, with environment version control. [Same architecture diagram as the previous slide.] • Read-only and extensible input data. • Workflow and notebook container(s) share volumes (Workflow as a Service). • Users' data volumes archived on demand with notebook pages and library setup (snapshot). • Libraries selectable from the EPOS ICS catalogue. • Controlled by the EPOS GUI through a dedicated API.

  9. Notebook Service API specification. • Creation and management of notebook instances and their snapshots. • Upload and execution of workflows (data-staging, on-demand preprocessing). • Workflow runs are associated with an active notebook through a notebookID. • API implemented adopting REST verbs to manage workflows, notebooks, snapshots, and runs.
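One plausible way to read slide 9 is as a REST resource layout over the four nouns it names. The route table below is purely hypothetical (the slides give no concrete paths); it only illustrates how REST verbs could map onto notebooks, snapshots, workflows, and runs, with the notebookID linking a run to its notebook:

```python
# Hypothetical route layout consistent with the slide; the real
# Notebook Service API paths may differ.
ROUTES = {
    ("POST",   "/notebooks"):                        "create a notebook instance",
    ("DELETE", "/notebooks/{notebookID}"):           "tear down a notebook instance",
    ("POST",   "/notebooks/{notebookID}/snapshots"): "archive pages, libraries and volumes",
    ("POST",   "/workflows"):                        "upload a workflow (data-staging, preprocessing)",
    ("POST",   "/workflows/{workflowID}/runs"):      "execute; request body carries the notebookID",
    ("GET",    "/notebooks/{notebookID}/runs"):      "list runs associated with a notebook",
}

def describe(method, path):
    """Look up what a (verb, route) pair does in this sketch."""
    return ROUTES[(method, path)]

print(describe("POST", "/workflows/{workflowID}/runs"))
```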

  10. Workflows' role in EPOS. Objective: performing routine operations as well as custom computations (at scale). Technology: ● Common Workflow Language for portable and scalable descriptions (CWLTool). ● dispel4py, a Python-based workflow system: ○ parallel streaming computational API; ○ multiple mappings (HPC, Cloud, multiprocessing); ○ customisable provenance capture and semantic contextualisation.
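The "parallel streaming" style means each processing element consumes a stream and emits a stream, so stages can be remapped to different back ends. A plain-Python generator pipeline can sketch the pattern (this is NOT the dispel4py API, and the detrend/peak steps are invented preprocessing stand-ins):

```python
# Toy streaming pipeline: each stage is a processing element that
# consumes an upstream iterator and yields results downstream,
# mimicking the PE composition style of streaming workflow systems.

def source(values):
    for v in values:
        yield v

def detrend(stream):
    # hypothetical preprocessing step: subtract the mean of each window
    for window in stream:
        mean = sum(window) / len(window)
        yield [x - mean for x in window]

def peak(stream):
    # hypothetical analysis step: peak absolute amplitude per window
    for window in stream:
        yield max(abs(x) for x in window)

pipeline = peak(detrend(source([[1.0, 2.0, 3.0], [10.0, 10.0, 13.0]])))
print(list(pipeline))  # -> [1.0, 2.0]
```

Because each stage only touches one window at a time, a runtime can place the stages on separate processes or nodes, which is what the "multiple mappings" bullet refers to.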

  11. Data-Intensive Provenance Model: Streaming Stateful Operators (S-PROV). [Diagram: agents representing the abstract workflow and the user's context; agents representing the concrete software actors.]

  12. Data-Intensive Provenance Model: Streaming Stateful Operators (S-PROV). [Diagram: multilayered provenance linking agents (abstract workflow & user's context) to agents (concrete software actors), covering state, semantic clustering, process delegation, resource mapping, actors, and I/O data and metadata.]

  13. Data-Intensive Provenance Model: Streaming Stateful Operators (S-PROV). [Same diagram as the previous slide.] Further extension: notebook snapshots' dependencies and users' configurations.

  14. S-ProvFlow: data-intensive provenance as a service. API methods: provenance acquisition (bulk); monitoring and lineage queries; contextual metadata discovery; comprehensive summaries; export to PROV formats (PROV-XML / RDF Turtle). https://github.com/KNMI/s-provenance
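The Turtle export target uses the W3C PROV-O vocabulary, where a workflow run is a `prov:Activity` that `prov:used` its inputs and whose outputs are `prov:wasGeneratedBy` it. A minimal sketch of what one exported run might look like; the identifiers and the `ex:` namespace are made up, and the real S-ProvFlow export is certainly richer:

```python
# Build a tiny W3C PROV-O Turtle fragment for a single workflow run.
# Only the vocabulary (prov:Activity, prov:used, prov:wasGeneratedBy)
# is standard; all URIs below are illustrative.

def run_to_turtle(run_id, input_uri, output_uri):
    return f"""\
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/sprov/> .

ex:{run_id} a prov:Activity .
ex:{run_id} prov:used <{input_uri}> .
<{output_uri}> prov:wasGeneratedBy ex:{run_id} .
"""

ttl = run_to_turtle("run42",
                    "http://example.org/data/raw1",
                    "http://example.org/data/out1")
print(ttl)
```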

  16. Conclusions. • Balance between automation and user control in coupling data discovery and processing. • Exploiting containerised software and infrastructures, integrating workflows and notebooks associated with EPOS workspaces. • Workflows as a Service (WaaS) for routine operations; provenance and contextual metadata for validation and traceability. • Reproducibility mechanisms (in progress): resilience to changes at remote data providers (staging_history); on-demand archiving and restore of intermediate progress (snapshots).
