preservation / curation
• "Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly."
  Nature 461, 145 (10 September 2009), doi:10.1038/461145a

Photon Sciences + X-ray Facilities
• motivation: costs of data creation
  – 2,000 €/h beamline + overhead
  – 120 k€ – 50 M€ (ribosome structure)
• data preservation and re-use (status)
  – portable USB hard disks, responsibility of the researcher
• data preservation and re-use (goal)
  – sharing, re-use and integration, validation of results
• requirements
  – federated repositories
  – open access (after a 5-year moving wall)
  – Web-based (Shibboleth) but integrated in Grid

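The "5-year moving wall" requirement can be read as a simple date rule: a data set becomes openly accessible once a fixed number of years has passed. The following is a minimal sketch of that reading, not the facilities' actual policy engine. The function names are hypothetical, and counting the wall from the creation date of the data set is an assumption, since the slide does not say which date starts the clock.

```python
from datetime import date

# From the slide: open access after a 5-year moving wall.
MOVING_WALL_YEARS = 5


def open_access_date(created: date, wall_years: int = MOVING_WALL_YEARS) -> date:
    """Return the first day on which a data set is openly accessible.

    Assumption: the wall is counted from the creation date of the data set.
    """
    try:
        return created.replace(year=created.year + wall_years)
    except ValueError:
        # 29 February mapped onto a non-leap year: use 1 March (assumption).
        return date(created.year + wall_years, 3, 1)


def is_open_access(created: date, today: date,
                   wall_years: int = MOVING_WALL_YEARS) -> bool:
    """True once the moving wall has passed for this data set."""
    return today >= open_access_date(created, wall_years)
```

Because the wall moves with each data set, a repository would evaluate this rule at access time rather than store a fixed "open" flag at ingest.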
German Language Studies
• preserve, share, analyze, re-use German literature texts
• 1.5 million texts + metadata + annotations = 5 TB
• requirements
  – versioning
  – provenance
  – licensing + IPR (author, publisher)
  → data are irretrievable, but may need to be deleted
Image source: http://www.nlcphs.org/Academics/English/Pictures/shakespeare.jpg

variation in community requirements
• existing infrastructure
  – available (climate)
  – not available (biostatistics, social sciences, archaeology)
• data types
  – homogeneous (literature)
  – heterogeneous (medicine)
• data integrity
  – erasable (literature)
  – phases, versioning (humanities)
  – not changeable (climate)
• rights
  – open access: climate
  – licensing: literature
  – de-personalization: medicine
  – private data: photon sciences

curation layers
• research data: digital object
• research data / semantics: conceptual object
• formats, e.g. images, XML: logical object
• bit-stream: physical object
cf. Thibodeau: Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years, 2002. http://www.clir.org/pubs/reports/pub107/thibodeau.html

roles / responsibilities
• Community-Grid: Data Curation
• WissGrid: Content Preservation
• D-Grid: Bitstream Preservation

services (pluggable)
• community-specific services
  – validation, quality control
  – metadata extraction
  – conversion
• archive and repository
  – ingest, storage, access
  – storage integrity checks, replication
  – preservation planning
• repository
  – catalogue services: search
  – metadata management
  – rights management
  – registries: formats, ontologies, ...
• infrastructure
  – persistent identifier
  – AAI, user management
  – monitoring, logging
  – workflow
  – provenance
Legend: WissGrid, Community, D-Grid ++

grid/repository integration
1. data from repositories to the grid
  • employing grid compute resources
  • our focus: data to service
2. storage coupling between repositories and the grid
  • employing grid storage resources
  • Bit Preservation + Trust Zones
3. grid of repositories
  • repository federation

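Bit preservation on replicated grid storage usually rests on fixity checks: each replica is re-hashed and compared with a checksum recorded at ingest. The sketch below illustrates that idea only. It is not the WissGrid or D-Grid implementation; the function names and site names are hypothetical, and in-memory byte strings stand in for files held at different storage sites.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest of a bit-stream, as recorded at ingest time."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes], expected: str) -> dict[str, bool]:
    """Map each storage site to True if its copy still matches the checksum."""
    return {site: checksum(blob) == expected for site, blob in replicas.items()}


def damaged_sites(replicas: dict[str, bytes], expected: str) -> list[str]:
    """Sites whose copy no longer matches and should be repaired from a good one."""
    report = audit_replicas(replicas, expected)
    return sorted(site for site, ok in report.items() if not ok)


# Illustrative data: three replicas, one of which has been silently corrupted.
original = b"beamline frame 0001"
recorded = checksum(original)
replicas = {
    "site_a": original,
    "site_b": original,
    "site_c": b"beamline frame 0O01",  # corrupted copy
}
print(damaged_sites(replicas, recorded))
```

Keeping the recorded checksum separate from the storage sites that hold the replicas is one way to read the "Trust Zones" item: a single compromised site cannot then alter both the data and its reference value.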
architecture layers (top to bottom)
• community-specific tools
• community-specific data handling
• curation services
• research data repository
• D-Grid infrastructure
• across all layers: AAI, security, VO management, licenses, SLA, ...

finally
• there is no single approach that serves all needs (curation)
• foster interoperability so that it emerges locally (do not force it)
• link activities / communities