Development of the new Research Infrastructure for Europe’s Natural Science Collections using novel building blocks in EOSC
wouter.addink@naturalis.nl
Naturalis Biodiversity Center
Distributed System of Scientific Collections (DiSSCo)
European Collections
European Collection facilities:
> 1 billion specimens
> 80% of world’s species
> 5,000 scientists employed
> 16,000 scientific visitors pa
> 10 million public visitors pa
> 25 million web visitors pa
Lowering barriers for users
25,000 researchers travel every year to physically access scientific collections, and 800k objects are packed and shipped, at an annual public cost of more than €70M
DiSSCo Collections 530 Million
DiSSCo: A new European infrastructure
115 National Facilities, 21 Countries
• Largest ever formal agreement between natural science collection facilities
• Centralised governance model
• Synchronisation of facilities at access, data and policy level
• One European virtual Collection
DiSSCo science services single entry point
A one-stop shop for services
1 e-Science services: providing unified discovery, access, interpretation and analysis of complex linked data
2 Physical and remote access services: a universal harmonised physical access service and digitisation on demand service
3 Support & Training services: integrated user support desk and implementation of multi-modal training programmes to enhance data skills
All data classes unambiguously linked to the physical objects they derive from
Data classes shown around the Specimen: Taxon Interaction, Sequence, Occurrence, Taxon Concept, Gene, Taxon Name, Collection, Trait, Publication
Specimen representations become the centrepiece of the DiSSCo knowledge base. They are used as anchoring points for dispersed data classes.
Re-unite and Serve
Collections-related data classes: Occurrence / Taxonomy, Images, Species Interactions, Genomic information, Literature - Traits, Treatments, Nomenclature
Sources shown: GBIF, Catalogue of Life, GenBank, GloBI, Plazi, EoL - TraitBank, TreatmentBank, IPNI / ZooBank
Building block: Digital Objects (DO)
A new, simple model for organising data
Diagram labels: DO, repository, d-entity, bit sequence, collection, persistent ID, metadata; relations: is_stored_in, is_represented-by, aggregates, is_a, is_described-by, is_referenced-by
• Digital Objects are widely discussed in RDA GEDE by experts from 47 large Research Infrastructures
• Piloted in C2Camp (github.com/c2camp/core/wiki) by RIs (ICOS, CLARIN, DISSCO, ENES) and others to create critical mass across 3+ continents.
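The relation labels on this slide can be read as a small data model. The sketch below is one possible reading, not the C2Camp or GEDE implementation: the class and field names are assumptions chosen to mirror the labels, and the PIDs are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical in-memory stand-in for a DO (names are assumptions)."""
    pid: str               # is_referenced-by: persistent ID
    metadata: dict         # is_described-by: metadata
    repository: str        # is_stored_in: repository
    content: bytes = b""   # is_represented-by: bit sequence


@dataclass
class Collection(DigitalObject):
    """A collection is_a DO that aggregates other DOs."""
    members: list = field(default_factory=list)


# Placeholder identifiers, not real Handles or DOIs.
specimen = DigitalObject(
    pid="example/specimen-1",
    metadata={"type": "specimen"},
    repository="example-repository",
    content=b"\x00\x01",
)
drawer = Collection(
    pid="example/collection-1",
    metadata={"type": "collection"},
    repository="example-repository",
    members=[specimen],
)
```

Because `Collection` subclasses `DigitalObject`, a collection carries its own PID and metadata and can itself be a member of another collection.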
Why DO?
• Data heterogeneity hampers data exchange and reuse
• 80% of researchers' time in data-intensive projects is wasted on data wrangling (RDA EU 2013 Survey: 75%; M. Brodie, MIT S.: 80%; CrowdFlower 2017 S.: 79%)
• To a large extent inefficiencies are due to bad data organisation
Developments in science towards a stable data domain:
• DONA Foundation: global domain of resolvable PIDs (Handles, DOIs, ePIC, etc.)
• FAIR principles: globally agreed guidance for proper data management/stewardship
• Various RDA WG results
• Organisation: Research Infrastructures, eInfrastructures, clusters, EOSC
But: a breakthrough for harmonised infrastructure building is still needed! We are using HTML/HTTP now for everything; the web is great, but not for creating a stable data domain.
DO Architecture is a logical extension of the Internet to simplify the task of information management.
What will DO bring us?
(See presentation by Ulrich Schwardmann, GWDG, GEDE Workshop on DO, 2018)
• Abstraction for cross-domain data management
• Reusability (by binding metadata and data with PID to a digital object)
• Interoperability by Registration of Types (RDA working group on Data Type Registries) and the Digital Object Interface Protocol (DOIP)
• Collections (PIDs pointing to a list of PIDs) enabling recursion
• Encapsulated Complexity for the User's View of the DO Cloud
The DOIP Specification (version 2.0) will be conveyed by CNRI to the DONA Foundation in the coming weeks for public release: https://www.dona.net
Figure: Global Digital Object Cloud, Larry Lannom, 2016
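The point about collections (PIDs pointing to a list of PIDs) enabling recursion can be illustrated with a small sketch. It assumes a plain dictionary as a stand-in for a PID resolver; the `expand` function, the registry and all PIDs are hypothetical, and no real Handle or DOIP call is made.

```python
def expand(pid, registry, seen=None):
    """Recursively expand a collection PID into the leaf PIDs it contains.

    `registry` maps a collection PID to its list of member PIDs; any PID
    that is not a key is treated as a leaf object. `seen` guards against
    collections that reference each other in a cycle.
    """
    seen = set() if seen is None else seen
    if pid in seen:
        return []
    seen.add(pid)
    members = registry.get(pid)
    if members is None:
        return [pid]  # leaf object, nothing to expand
    leaves = []
    for member in members:
        leaves.extend(expand(member, registry, seen))
    return leaves


# Placeholder PIDs: one collection nested inside another.
registry = {
    "col/1": ["obj/a", "col/2"],
    "col/2": ["obj/b", "obj/c"],
}
leaves = expand("col/1", registry)
```

Here `leaves` contains the three object PIDs in traversal order, with the nested collection resolved transparently.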
DSDO: Digital Specimen Digital Object
PO PID: Physical object PID
Identifiers linked in the DSDO: Occurrence ID, GenBank Accession No, Taxon PID
Resolution steps: GET Physical Object (PO) PID; GET PO PID metadata; PO PID to DSDO
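One way to read the resolution steps on this slide is as a two-step lookup: the physical object PID is resolved to its PID metadata, which in turn points to the digital specimen. The sketch below assumes that reading and uses in-memory dictionaries in place of a real resolver; every name, field and identifier is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalSpecimen:
    """Hypothetical DSDO record holding the identifiers named on the slide."""
    dsdo_pid: str
    po_pid: str
    occurrence_id: str
    genbank_accession: str
    taxon_pid: str


# Stand-ins for a PID resolver and a DSDO store (placeholder values only).
PO_PID_RECORDS = {"po/0001": {"dsdo": "ds/0001"}}
DSDO_STORE = {
    "ds/0001": DigitalSpecimen(
        dsdo_pid="ds/0001",
        po_pid="po/0001",
        occurrence_id="occ/0001",
        genbank_accession="ACC0001",
        taxon_pid="taxon/0001",
    )
}


def get_po_pid_metadata(po_pid):
    """Step 2: fetch the metadata record attached to a physical object PID."""
    return PO_PID_RECORDS[po_pid]


def resolve_specimen(po_pid):
    """Step 3: follow the PID metadata to the digital specimen."""
    return DSDO_STORE[get_po_pid_metadata(po_pid)["dsdo"]]


specimen = resolve_specimen("po/0001")
```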
Why Digital Objects?
Why the DO approach is appropriate for re-uniting natural science collections-derived information
1 Specimens are atomic items
• Like journal articles, archaeological artefacts, DNA sequences, YouTube videos, taxon concepts, software programs, workflows, etc.
• Deserve individual and unique identification to avoid ambiguity around use and interpretation
2 Digital objects collect all core information about the thing in one place
• What it is, how it came into being, where it can be found, and pointers to other related things
• Virtual collection joined together through logical and temporal relations, networks, etc.
3 A new kind of industrial object that pervades every aspect of our life today, a technical essence of a thing in cyberspace
• Editable but accuracy/authenticity can be controlled
Further reading: 1) Yuk Hui, On the existence of digital objects; 2) Jannis Kallinikos et al., A theory of digital objects
Building block: PIDs & Minimal metadata
• Enabling DOIP protocol
• Building on RDA Kernel Information WG
• To be aligned with minimum metadata schema in EOSC Service catalogue
Developed in RDA Data Fabric IG
Building block: AARC based AAI
• Piloted in Synthesys+ by the EGI AAI technology provider GRNET
• Experimentation with user profiles based on augmented ORCIDs
• Enables monitoring of Open Science done in the RI
• Supports implementation of metrics for open science and collection management
Building block: Attribution model
RDA/TDWG Attribution Metadata Working Group Recommendation: linking people, the curatorial actions they perform, and the objects they are curating.
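The three-way link described here (person, curatorial action, object) can be sketched as a simple record type. This is only an illustration: the field names are assumptions and do not reproduce the schema of the RDA/TDWG recommendation, and the agent identifiers and PIDs are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Hypothetical record linking a person, an action and an object."""
    agent_id: str    # e.g. an ORCID; placeholder values are used below
    action: str      # the curatorial action performed
    object_pid: str  # PID of the object that was curated


def actions_by_agent(records):
    """Group (action, object) pairs per person, e.g. for credit metrics."""
    summary = defaultdict(list)
    for record in records:
        summary[record.agent_id].append((record.action, record.object_pid))
    return dict(summary)


records = [
    Attribution("agent-1", "identified", "example/specimen-1"),
    Attribution("agent-2", "digitised", "example/specimen-1"),
    Attribution("agent-1", "georeferenced", "example/specimen-2"),
]
summary = actions_by_agent(records)
```

Grouping by `agent_id` gives a per-person list of curatorial work; grouping by `object_pid` instead would give the curation history of a single object.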
Building block: Data packaging
Requirements:
• Easy to use by end users
• Flexible (extensible, scalable and customisable)
• Machine readable metadata that is human-editable
• Use of existing standard formats
• Language, technology and infrastructure agnostic
Examples:
• DarwinCore Archives (github.com/gbif/ipt/wiki/DwCAHowToGuide)
• Data Packages (frictionlessdata.io)
• Linked Data Fragments (linkeddatafragments.org)
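To make the requirement of machine-readable yet human-editable metadata concrete, the sketch below builds a minimal descriptor in the style of a Data Package (a JSON document listing named resources with a path and a field schema). The package name, file name and column names are illustrative assumptions, and only a small subset of descriptor properties is shown.

```python
import json

# Illustrative descriptor: names, path and fields are assumptions.
descriptor = {
    "name": "example-specimens",
    "resources": [
        {
            "name": "specimens",
            "path": "specimens.csv",
            "schema": {
                "fields": [
                    {"name": "catalogNumber", "type": "string"},
                    {"name": "scientificName", "type": "string"},
                ]
            },
        }
    ],
}

# Serialise to indented JSON, which a person can edit in any text editor,
# then parse it back to show that it stays machine readable.
text = json.dumps(descriptor, indent=2)
restored = json.loads(text)
```

Plain JSON keeps the package language and infrastructure agnostic: any tool that can parse JSON and read CSV can consume it.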
Questions on DiSSCo Technical Architecture? Contact info@dissco.eu