Open Data in gCube: the iMarine case
Andrea Manieri (Engineering Ing. Inf. SpA), Pasquale Pagano (CNR-ISTI), Anton Ellenbroek (FAO)
A journey 10+ years long
EGI Conference 2015, 21 May 2015, Lisboa
Multi-tenant Delivery Model
• Infrastructure as a Service: Dynamic deployment, Hosting, Resource Lifecycle, Monitoring, Accounting, Security
• Software as a Service: BiolCube, ConnectCube, GeosCube, StatsCube
• Platform as a Service: FeatherWeightStack, SmartGears, ApplicationSupportLayer, SOA3
iMarine
iMarine exploits a Hybrid Data Infrastructure by
• combining over 500 software components
• providing access to more than 25k datasets
• serving more than 1000 jobs a day
iMarine capacities are offered as services to 1700 researchers in 44 countries
Open Data
“Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)” (http://opendefinition.org/)
What else?
• Legal interoperability: data from two or more databases may be combined or otherwise reused without compromising the legal rights of any of the data sources used.
• Confidentiality of usage data: operations performed by users are accounted and visible to the VRE and community managers, but details are hidden (e.g. the total volume used by a user but not the file names, or the total number of operations and CPU time used by a user but not the algorithm used or details about the execution).
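The confidentiality rule above can be illustrated with a minimal sketch. It assumes a hypothetical `UsageRecord` schema and a hypothetical `summarize_for_manager` helper; neither is taken from gCube or iMarine. Managers receive per-user totals only, while file names and algorithm names never leave the raw records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One accounted operation (hypothetical schema, not the gCube one)."""
    user: str
    kind: str                 # "storage" or "job"
    detail: str               # file name or algorithm name: confidential
    bytes_used: int = 0
    cpu_seconds: float = 0.0


def summarize_for_manager(records):
    """Aggregate per-user totals and drop the confidential `detail` field."""
    totals = defaultdict(lambda: {"bytes_used": 0, "jobs": 0, "cpu_seconds": 0.0})
    for rec in records:
        entry = totals[rec.user]
        entry["bytes_used"] += rec.bytes_used
        entry["cpu_seconds"] += rec.cpu_seconds
        if rec.kind == "job":
            entry["jobs"] += 1
    return dict(totals)


records = [
    UsageRecord("alice", "storage", "survey_2014.csv", bytes_used=2_000),
    UsageRecord("alice", "job", "hypothetical-algorithm", cpu_seconds=12.5),
    UsageRecord("alice", "job", "hypothetical-algorithm", cpu_seconds=7.5),
]
summary = summarize_for_manager(records)
```

The summary keeps the volume, job count and CPU time that a VRE or community manager needs for accounting, and nothing else.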
What else?
• (Digital) Data preservation: the series of managed activities necessary to ensure continued access to digital materials for as long as necessary (source: http://ifdo.org/wordpress/).
• The default commitment is long-term maintenance.
• Eligibility criteria for standards are established by the community to decide which formats are supported.
• The iMarine Platform commits to:
  – maintain content through supported metadata;
  – support a format as long as needed;
  – support the service for a fixed amount of time after decommissioning;
  – notify any service discontinuity.
What’s still to be explored?
• Liability of the infrastructure for infringements and violations (ensuring legal interoperability, IPR infringement)
• Long-term technical support
• How to deal with the increasing amount of storage (specific hardware or software solutions, e.g. deduplication)
• How to deal with the increasing number of formats (complexity in maintenance)
• How to demonstrate that access rights ensure the privacy, confidentiality and security of sensitive data
• How to ensure the provenance of data and keep track of their transformations
• Relevance of data to be preserved
• Software maintenance and its evolution
• Costs of the overall infrastructure operation
All-you-need services: iMarine Capacities
• Data
• Computing
• Applications
Data: Storage as a Service to host and maintain data
• Database: High-availability, Standard, Ready-to-use
• Cloud Storage: Scalable, Reliable, Secure
• Geographical DB: Scalable, OGC Standard, Privacy and Attribution
Data: Applications as a Service to curate and manage data
• Metadata Generation: Geospatial Data, Biodiversity Data, Statistical Data
• Harmonization: Disambiguate, Validate, Integrate and Consistency Check
• Data Exchange: OGC protocols, DarwinCore, SDMX
Data
[Figure: iMarine validation, sharing, enriching and processing of data from sources and registries including OBIS, WoRMS, WoRDS, FAO, Eurostat, GBIF, WOA, CoL, MyOcean, ITIS, IRMNG, NCBI, …]
Data
• Documents and Data Warehouses (OAI-PMH, OpenSearch): FAO Factsheets, Aquatic Commons, Bioline International, Biodiversity Heritage, OceanDocs, Nature, PenSoft Journals, …
• Ontologies (RDF, OWL): FAO FLOD, Marine Top Level Ontology, IRD Ecoscope, FactForge, Yago2, …
• Biological and Ecological Data (DarwinCore / ISO19139):
  – >35 M Observations (OBIS)
  – ≈ 120 K Observed Species (OBIS)
  – ≈ 500 K Taxa (WoRMS)
  – >600 K Scientific Names (ITIS)
  – >12 K Species Maps (AquaMaps)
  – ≈ 600 Species Extent (FAO)
  – … FishBase, SeaLifeBase
  – … CoL, GBIF
• Statistical Data (SDMX*): FAO CodeLists, IRD CodeLists, FAO datasets, Eurostat, …
• GeoSpatial Data (ISO19139, OGC W*S), > 350 variables:
  – 10 years of chemical and physical variables in 2D space: Ice concentration and velocity, Chlorophyll, Oxygen, Nitrate, Phosphate, Phytoplankton as carbon, Salinity, Temperature, …
  – On-demand chemical and physical variables in 3D space: Apparent Oxygen Utilization, Dissolved Oxygen, Salinity, Temperature, …
Capacities: Computing as a Service to process and extract knowledge
• Scalable
• Elastic
• Rich and Heterogeneous
• Easy to Manage
• Assignment of Computing
• High Throughput
• Across Boundaries
• Assignment of Processors
• Map-Reduce
• Tailored Virtual Research Environment
• Parallel R
Capacities: Computing as a Service
Applications as a Service
A BUNDLE is a set of services and technologies grouped according to a family of related tasks for achieving a common objective.
Bundles used in iMarine
• Occurrence and Taxonomic Data Discovery
• Occurrence Data Processing
• Species Distribution Modeling
• Species Distribution Maps Discovery
• Taxonomic Data Comparison
• Taxonomic Data Matching
• Code List Discovery
• Code List Management
• Statistical Engine
• Tabular Data Discovery
• Tabular Data Enrichment
• Tabular Data Management
• Tabular Data Processing
• Geospatial Data Discovery
• Geospatial Data Processing
• Enhanced Documents Management
• Fact-sheets Management
• Information Object Discovery
• Messaging
• Shared Workspace
• Social Networking Facilities
Virtual Research Environment to share and collaborate
• Share: Database Tables, Workflow, Files
• Communicate: Post, Favourite, Connection
• Organize: Dynamic VRE Creation, Secure, Policy Control
Methodology
• Common Approach
  – Import
  – Harmonization
  – Generation of Metadata
  – Publication in Standard Format
• Specialized Implementation
  – Geospatial Data
  – Biodiversity Data
  – Statistical Data
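One way to read "common approach, specialized implementation" is as a template-method pattern: the four steps run in a fixed order, and each data family supplies its own version of every step. The sketch below is only an illustration under that assumption; `PublicationPipeline` and `ToyPipeline` are hypothetical names, not gCube classes.

```python
from abc import ABC, abstractmethod


class PublicationPipeline(ABC):
    """Common approach: the same four steps for every data family."""

    def run(self, source):
        records = self.import_data(source)
        records = self.harmonize(records)
        metadata = self.generate_metadata(records)
        return self.publish(records, metadata)

    @abstractmethod
    def import_data(self, source): ...

    @abstractmethod
    def harmonize(self, records): ...

    @abstractmethod
    def generate_metadata(self, records): ...

    @abstractmethod
    def publish(self, records, metadata): ...


class ToyPipeline(PublicationPipeline):
    """Specialized implementation (toy): records which step ran, in order."""

    def __init__(self):
        self.steps = []

    def import_data(self, source):
        self.steps.append("import")
        return list(source)

    def harmonize(self, records):
        self.steps.append("harmonize")
        return [r.strip().lower() for r in records]

    def generate_metadata(self, records):
        self.steps.append("metadata")
        return {"count": len(records)}

    def publish(self, records, metadata):
        self.steps.append("publish")
        return {"metadata": metadata, "data": records}


pipeline = ToyPipeline()
published = pipeline.run([" Tuna ", "COD"])
```

A geospatial, biodiversity or statistical pipeline would subclass the same base and replace the toy steps with format-specific logic.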
Geospatial Data
• Import from different sources
• Harmonization and Validation of data
  – spatial and temporal coverage
  – extraction of features
• Generation of metadata
  – Citation
  – Provenance
  – ISO19139
• Publication in Standard Format
  – WFS, WCS, WMS, WPS
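As a rough illustration of the "spatial and temporal coverage" check, the sketch below validates coordinates and derives a bounding box and a date range from point observations. The `extract_coverage` function and the `(lon, lat, day)` tuple layout are assumptions made for this example. The naive min/max bounding box also ignores datasets that cross the antimeridian.

```python
from datetime import date


def extract_coverage(observations):
    """Compute spatial and temporal coverage of (lon, lat, day) tuples."""
    if not observations:
        raise ValueError("no observations to summarize")
    for lon, lat, _ in observations:
        # Validation step: reject coordinates outside the usual lon/lat ranges.
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            raise ValueError(f"coordinate out of range: ({lon}, {lat})")
    lons = [o[0] for o in observations]
    lats = [o[1] for o in observations]
    days = [o[2] for o in observations]
    return {
        # Order: west, south, east, north.
        "bbox": (min(lons), min(lats), max(lons), max(lats)),
        "time": (min(days), max(days)),
    }


points = [
    (-9.14, 38.72, date(2015, 5, 21)),
    (12.50, 41.90, date(2014, 1, 1)),
]
cov = extract_coverage(points)
```

The resulting extent is the kind of information that an ISO19139 metadata record carries as geographic and temporal extent.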
Biodiversity Data
• Import from different sources
• Harmonization and Validation of data
  – Status, names
• Generation of metadata
  – Citation
  – Provenance
  – DwC
• Publication in Standard Format
  – Sharable and accessible through permanent REST identifiers
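Harmonizing "status, names" might look like the following sketch, which normalizes a raw species name and maps a source-specific status label onto a Darwin Core style record. The output keys `scientificName` and `taxonomicStatus` are Darwin Core terms; the `STATUS_MAP` vocabulary, the input layout and the `harmonize_record` function are hypothetical, and the capitalization rule only suits simple names without authorship.

```python
# Hypothetical mapping from source labels to a small controlled vocabulary.
STATUS_MAP = {
    "valid": "accepted",
    "accepted": "accepted",
    "syn": "synonym",
    "synonym": "synonym",
}


def harmonize_record(raw):
    """Normalize name and status of one raw record into Darwin Core terms."""
    parts = raw["name"].split()          # also collapses repeated whitespace
    if not parts:
        raise ValueError("empty scientific name")
    genus, rest = parts[0], parts[1:]
    # Genus capitalized, epithets lower case (simple names only).
    scientific_name = " ".join([genus.capitalize(), *(p.lower() for p in rest)])
    status = STATUS_MAP.get(raw["status"].strip().lower())
    if status is None:
        raise ValueError(f"unknown status: {raw['status']!r}")
    return {"scientificName": scientific_name, "taxonomicStatus": status}


record = harmonize_record({"name": "  thunnus   ALBACARES ", "status": "Valid"})
```

Rejecting unknown status labels, instead of passing them through, keeps the validation step explicit.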
Statistical Data
• Import from different formats (CSV, SDMX, SDMX files)
• Harmonization and Validation of data
  – spatial and temporal dimensions
  – extraction of features
• Generation of metadata
  – Citation
  – Provenance
  – SDMX
• Publication in Standard Format
  – SDMX*
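For the CSV case, the import and validation of spatial and temporal dimensions could be sketched as below, using only the Python standard library. The column names `area`, `year` and `value`, and the `load_observations` function, are assumptions for illustration; a real dataset would take its dimensions from its own code lists.

```python
import csv
import io

REQUIRED = ("area", "year", "value")  # hypothetical column names


def load_observations(text):
    """Parse CSV text; return (valid rows, [(line number, error message)])."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows, errors = [], []
    # Data starts on line 2 because line 1 holds the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            year = int(row["year"])          # temporal dimension
            value = float(row["value"])
        except (TypeError, ValueError) as exc:
            errors.append((line_no, str(exc)))
            continue
        area = (row["area"] or "").strip().upper()   # spatial dimension
        rows.append({"area": area, "year": year, "value": value})
    return rows, errors


sample = "area,year,value\nit,2013,10.5\npt,n/a,3\n"
rows, errors = load_observations(sample)
```

Collecting errors per line, instead of failing on the first one, lets a curator review every rejected row in one pass.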