EGI-EUDAT joint access to data and computing services: an executive report
DI4R - Brussels
Michaela BARTH caela@kth.se
Ute KARSTENS ute.karstens@nateko.lu.se
Matthew VILJOEN matthew.viljoen@egi.eu
Peter GILLE petergil@kth.se
Maggie HELLSTRÖM margareta.hellstrom@nateko.lu.se
Xavier PIVAN xavier.pivan@cerfacs.fr
Christian PAGÉ christian.page@cerfacs.fr
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
WP7 Task 7.2: Joint Access to Data, HTC and Cloud Computing Resources
• The EGI-EUDAT collaboration started in March 2015 and officially continues until the end of EUDAT (February 2018).
• Aim: a production cross-infrastructure service
  • providing end users with seamless access to an integrated infrastructure offering both EGI and EUDAT services
  • pairing data and high-throughput computing resources
• Concrete community pilots
  • EPOS
  • ICOS
  • ENES
• Harmonization on all levels (technical, operational, policies)
Benefits of EGI-EUDAT interoperability
• Federated services through EGI paired with EUDAT's set of research data services:
  • EGI: computation on the Federated Cloud and HTC
  • EUDAT services: transfer, syncing, sharing, staging and preservation of data
Initial user community pilots
• EGI and EUDAT selected a set of relevant user communities
• The identified user communities were prominent European research infrastructures in the fields of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D).
• The integration activity has been driven by the end users from the start!
• Process of gathering requirements from the user communities, with the communities indicating how those requirements should be prioritized
• Definition of a universal use case
Definition of the universal use case
• Demo at EGI Community Forum 2015
User community pilots
• EPOS:
  • Fostering worldwide interoperability in Earth Sciences and providing services to a broad community of users
• ICOS:
  • Creating a web-based service ("Footprint tool") at the ICOS Carbon Portal, providing on-demand computing facilities
• ENES:
  • Performing on-demand climate data analytics for the climate research and climate change impact communities
• Planned joint open call to expand the pilot activity to further early adopters:
  • Not many new user communities were expected to participate
  • The remaining user communities were very productive in providing feedback
  • Instead: integrate user communities that had already used both EGI and EUDAT services, but not yet in combination.
European Plate Observing System (EPOS)
• EPOS is the integrated solid Earth Sciences research infrastructure
• Aims to be an effective coordinated European-scale monitoring facility for solid Earth dynamics
• Aims to establish a long-term plan to facilitate the integrated use of data, models and facilities from existing and new distributed research infrastructures (RIs), for solid Earth science
EPOS data workflow plan
EPOS use case status
• First stage of the use case:
  • Defining the best strategy for accessing Federated Cloud partners and identifying secure and efficient data transfer protocols towards the iRODS system
• Now in the second stage:
  • Integration with data storage services (EUDAT) and cloud computing resources (EGI)
Footprint tool calculations for atmospheric sites using the Lagrangian atmospheric transport model STILT
[Figure: ICOS atmospheric observations, GHG concentrations and station footprints at the ICOS Carbon Portal (≈ 1 GB); STILT Lagrangian transport model driven by meteorological driver fields (≈ 2-3 TB), emissions (≈ 1-2 TB) and prior fluxes (≈ 0.5-1 TB per year); ≈ 670 CPUs per footprint => 1700 CPUh per station per year]
…
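The compute estimate on the footprint slide can be checked with a few lines of arithmetic. This is a minimal sketch under two assumptions the slide does not state: that "670 CPUs per footprint" means roughly 670 CPU-seconds per footprint, and that one footprint is computed for every hour of the year. Under those assumptions a station needs about 1630 CPU hours per year, consistent with the quoted ≈ 1700 CPUh. The function name `annual_cpu_hours` is illustrative only.

```python
# Back-of-the-envelope check of the footprint tool compute estimate.
# ASSUMPTIONS (not stated on the slide): "670 CPUs per footprint" is read as
# 670 CPU-seconds, and one footprint is computed for every hour of the year.

SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def annual_cpu_hours(cpu_seconds_per_footprint: float,
                     footprints_per_year: int = HOURS_PER_YEAR) -> float:
    """Return the CPU hours needed for one station's footprints over a year."""
    total_cpu_seconds = cpu_seconds_per_footprint * footprints_per_year
    return total_cpu_seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    estimate = annual_cpu_hours(670)
    print(f"{estimate:.0f} CPUh per station per year")  # prints "1630 CPUh per station per year"
```

Changing `footprints_per_year` shows how directly the yearly budget scales with the temporal resolution of the footprints.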
ICOS Carbon Portal use case status
• Virtual machines with attached block storage instantiated in the EGI Federated Cloud
• Docker container for computations with local VM storage
• Data transfer between the VMs and the B2STAGE instance at PDC/KTH Stockholm
• Storage of ICOS data tested on the B2SAFE system at KTH
• Robot certificates installed to allow further automation of the workflow
• OneData software solution being tested
Next steps:
• Replicating ICOS data in B2SAFE and accessing it via the B2STAGE service
• Access to common storage for several VMs (via the EGI DataHub)
• Load balancing to distribute computations/user requests across several VMs
Beyond the current EGI-EUDAT collaboration:
• ICOS competence centre within EOSC-hub
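The planned load balancing across several VMs could follow many strategies; round-robin assignment is one of the simplest. The sketch below is not the Carbon Portal's design: it assumes that each computation request is independent and that every VM can serve any request. The request and VM names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def distribute_requests(requests: List[str], vms: List[str]) -> Dict[str, List[str]]:
    """Assign computation requests to VMs in round-robin order.

    Request IDs and VM names are hypothetical placeholders; a production
    setup would also track VM load and health.
    """
    if not vms:
        raise ValueError("at least one VM is required")
    assignment: Dict[str, List[str]] = {vm: [] for vm in vms}
    # cycle() repeats the VM list indefinitely; zip() stops when requests run out.
    for request, vm in zip(requests, cycle(vms)):
        assignment[vm].append(request)
    return assignment


if __name__ == "__main__":
    plan = distribute_requests(["station-A", "station-B", "station-C"], ["vm-1", "vm-2"])
    print(plan)  # {'vm-1': ['station-A', 'station-C'], 'vm-2': ['station-B']}
```

Round-robin ignores how long each request takes; if footprint runtimes vary widely, a least-loaded policy would be the natural refinement.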
Infrastructure for the European Network for Earth System Modelling (IS-ENES)
• Spawned from work on EGI-EUDAT interoperability in WP7/WP8 and the ICOS Carbon Portal use case developed there.
• Goal: enabling computation on data stored in the Earth System Grid Federation (ESGF) infrastructure.
• Calculations will be performed using the EUDAT General Execution Framework (GEF) Workflow API combined with EUDAT B2 services and EGI FedCloud
• Results to be sent back to the climate4impact.eu platform
IS-ENES: Current situation in the Climate Research Community ◆ Substantial increase in the federated climate data archive volume ◆ Download locally then analyze: not a sustainable workflow!
ENES use case status
• Simplified view of the steps of the current adapted use case:
  1: Researcher finds data (e.g. via B2FIND) and provides a PID/URL.
  2: Researcher prepares the configuration of the analysis that will be applied to the selected data using the GEF.
  3: The GEF backend launches an EGI FedCloud VM and deploys a GEF Docker container. Calculations are executed based on the input parameters. Output is stored in an EGI volume.
  4: Results are sent back to B2DROP for the researcher to download, or to run another GEF execution for further calculations or to generate a figure. (The resulting figure could be put into B2DROP.)
• So far: automation of all steps is completed. Data currently comes from the ENES/ESGF data nodes and will eventually come from B2SHARE. The dockerized jOCCI API is used for automatic instantiation of VMs.
• Next steps:
  • All AAI aspects need to be revisited, since three infrastructures are involved.
  • Better integration with EUDAT B2 services; the implementation is at a prototype stage, and generic aspects still need to be developed.
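The four ENES workflow steps can be read as a simple pipeline in which each step's output feeds the next. The following sketch only illustrates that control flow: every function, identifier and URL scheme is a hypothetical stand-in, and no real B2FIND, GEF, EGI FedCloud or B2DROP interface is called.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All function names, identifiers and URL schemes below are hypothetical
# stand-ins; none of them call the real B2FIND, GEF, FedCloud or B2DROP APIs.


@dataclass
class AnalysisConfig:
    """Step 2: analysis configuration prepared by the researcher."""
    input_pid: str
    operation: str


def find_data(query: str) -> str:
    """Step 1: stand-in for a B2FIND search that yields a PID/URL."""
    return f"pid:example/{query}"


def run_on_fedcloud(config: AnalysisConfig, log: List[str]) -> str:
    """Step 3: stand-in for the GEF backend launching a VM and running the container."""
    log.append("launch FedCloud VM")
    log.append("deploy GEF Docker container")
    log.append(f"run {config.operation} on {config.input_pid}")
    return f"/egi-volume/output/{config.operation}.nc"  # output kept on an EGI volume


def send_to_b2drop(result_path: str, log: List[str]) -> str:
    """Step 4: stand-in for uploading results to B2DROP for download."""
    log.append(f"upload {result_path}")
    filename = result_path.rsplit("/", 1)[-1]
    return f"b2drop-example://results/{filename}"


def run_workflow(query: str, operation: str) -> Tuple[str, List[str]]:
    """Chain steps 1-4 and return the result location plus an action log."""
    log: List[str] = []
    pid = find_data(query)
    config = AnalysisConfig(input_pid=pid, operation=operation)
    result_path = run_on_fedcloud(config, log)
    return send_to_b2drop(result_path, log), log


if __name__ == "__main__":
    location, actions = run_workflow("tas_monthly", "mean")
    print(location)  # b2drop-example://results/mean.nc
    for action in actions:
        print(action)
```

Keeping each step behind its own function mirrors the slide's separation of concerns: swapping the data source from ESGF nodes to B2SHARE would only touch the step 1 stand-in.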
Prototype overview: Deploying GEF execution on EGI FedCloud
B2STAGE/B2SAFE architecture example
[Figure: third-party B2STAGE transfer, ingestion, B2SAFE replica rule, iRODS]
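The architecture example names a third-party B2STAGE transfer, ingestion, a B2SAFE replica rule and iRODS. In B2SAFE itself such policies are implemented as iRODS rules; the Python sketch below only models the idea under simplifying assumptions: storage sites are in-memory dictionaries, replication is a direct copy, and integrity is checked with SHA-256. The `Site` class and `ingest_and_replicate` function are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Site:
    """A hypothetical storage site holding objects keyed by logical path."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ingest_and_replicate(path: str, data: bytes, primary: Site,
                         replicas: List[Site]) -> Dict[str, str]:
    """Ingest an object at the primary site, then apply a simple replica rule.

    The rule copies the object to every replica site and verifies each copy
    against the checksum computed at ingestion. Returns site name -> checksum.
    """
    primary.objects[path] = data
    expected = sha256_of(data)
    checksums = {primary.name: expected}
    for site in replicas:
        # In this in-memory sketch the copy is exact; over a real network
        # transfer the checksum comparison is what would catch corruption.
        site.objects[path] = primary.objects[path]
        actual = sha256_of(site.objects[path])
        if actual != expected:
            raise RuntimeError(f"checksum mismatch at {site.name} for {path}")
        checksums[site.name] = actual
    return checksums


if __name__ == "__main__":
    home = Site("primary")
    copies = [Site("replica-1"), Site("replica-2")]
    result = ingest_and_replicate("/zone/icos/obs.nc", b"example payload", home, copies)
    print(sorted(result))  # ['primary', 'replica-1', 'replica-2']
```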
Use Case Pilot Challenges Encountered
• Scaling up (esp. AAI interoperation)
• Managing co-existing support systems and channels
• User-friendly documentation often missing or inadequate
• Steep learning curve for the user communities
• Substantial investment of time and trust
• Third-party dependencies and technical problems:
  • Globus Toolkit GridFTP → B2STAGE HTTPS API
  • Support of metadata handling for B2SAFE (GraphDB)
  • Large numbers of small files, to be used as input for further model runs, were a problem within EGI OneData prior to version 17.06.0-beta2
• Freedom of automation as an unforeseen requirement
Pilot concept interpretation mismatch
• EGI and EUDAT developers eagerly reacting to feedback
• Pilots are not free beta-users for non-production-ready, undocumented new features
• Other parties using T7.2 to find the right contact within the user community for their own agendas
• Personal contacts still highly appreciated
Outcomes
• General:
  • Feedback on data-handling support within the EGI DataHub
  • Testing the EGI Federated Cloud with automatic submission
  • Data transfer tests between the VMs and B2STAGE instances, using both OneData and the EGI DataHub to access common storage from several VMs
  • Evaluating the new B2STAGE HTTP API
• AAI interoperability:
  • Transparent access: once authenticated, see the EGI and EUDAT services as if offered by a single infrastructure
  • Access all (web + non-web) EGI and EUDAT services with the same credentials
  • Access delegation from one service to another
  • Data privacy considerations and policy harmonisation
  • AAI overview document created for understanding each other's AAI layers; agreement on, e.g., RCauth as the common link
• Establishing and revising a common roadmap