Neuroimaging Research Data Life-cycle Management
Hurng-Chun (Hong) Lee, Robert Oostenveld, Erik van den Boogert, Eric Maris
Outline
• Lifecycle RDM: objectives and challenges
• The method
  • the RDM protocol
  • Donders Research Data Repository (DRDR): usage of iRODS
• Strengths and weaknesses of DRDR
• Future focuses
Lifecycle RDM
• RDM spans the entire research lifecycle
• Objectives:
  • long-term data preservation
  • scientific-process documentation
  • data publication
[Figure: research lifecycle with the stages conception of research, data acquisition, data analysis, publication]
Challenges
• large institute with heterogeneous scientific-administrative workflows
  • 600 researchers in PI groups ➞ more than 150 projects per year
  • 3 centres with 4 administrative domains
• data complexity:
  • text, audio/video, imaging or signal data, etc.
  • sensitive data
  • size ranges from a few large (>2GB) files to a huge number of small files (<1MB)
• user expectations
The RDM protocol
• collection: a container of
  ✓ data (files/folders)
  ✓ metadata
  and has a “state” attribute
• collection types: data acquisition collection, research documentation collection, data sharing collection
• roles: manager, contributor, viewer, research administrator (RA)
[Figure: the research lifecycle (conception of research, data acquisition, data analysis, data publication) with the three collection types placed along it. Other labels in the diagram: Organisational Unit (OU), PID, workflow, responsibility, eligibility]
* Box icons are made by Roundicons and Pixel Buddha from www.flaticon.com and are licensed under CC 3.0 BY.
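The collection concept above can be sketched as a small data model. This is only an illustration: the class and field names, the example state value, and the rule that managers and contributors may write while viewers may not are assumptions, since the slides name the roles and the state attribute but do not spell out their values or permissions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Collection roles named on the slide."""
    MANAGER = "manager"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


@dataclass
class Collection:
    """A container of data and metadata that carries a state attribute."""
    identifier: str
    state: str  # hypothetical value set; the slides do not list the states
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)
    members: dict = field(default_factory=dict)  # user name -> Role

    def role_of(self, user):
        """Return the user's role in this collection, or None."""
        return self.members.get(user)

    def can_write(self, user):
        # Assumption: managers and contributors may add data, viewers may not.
        return self.role_of(user) in (Role.MANAGER, Role.CONTRIBUTOR)


example = Collection(
    identifier="DAC_3010000.01_123",
    state="editable",  # hypothetical state name
    members={"alice": Role.MANAGER, "bob": Role.VIEWER},
)
```

Keeping roles per collection, rather than per user, mirrors the slide's picture of each collection having its own manager, contributors and viewers.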
The data repository
• an iRODS-based ICT system implementing the workflow defined by the protocol
• a single data-management system enabling internal collaboration and external data sharing
  ✓ file-based system interfaces
  ✓ single and uniform namespace management
  ✓ access-controlled metadata management
  ✓ authentication via trusted identity providers
  ✓ role-based authorisation management
  ✓ data replication for disaster recovery
  ✓ workflow automation and policy enforcement
[Figure: architecture diagram with the labels WebDAV, Stager, portal; data management middleware; iRODS, rules, ELK stack; storage system]
Storage resources
• data of the collections of each OU (OU1, OU2) is held in iRODS resources: resc_ou1, resc_ou2, resc_nl
• vaults: vault_ou1_1, vault_ou1_2, vault_ou2_1, vault_ou2_2, vault_nl_1
• load-balancing and OU-level quota; a quota on each filesystem
• filesystems are NFS exports from the storage
• two identical copies of the data (disaster recovery): Location A (first replica) and Location B (second replica)
[Figure: storage layout diagram; arrows marked asynchronous data replication, dataflow, control]
Collection namespace
• namespace reflects the administrative hierarchy
• metadata in KVU triplets
• role-based authorisation with iRODS groups
[Figure: the DRDR collection path /rdm/DI/DCCN/DAC_3010000.01_123/ annotated with iRODS zone, organisation, organisational unit and collection, and with the roles admin, manager, contributor and viewer]
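As a minimal sketch of how the namespace encodes the administrative hierarchy, the example path from the slide can be split into its four levels. The one-to-one mapping of path segments to zone, organisation, organisational unit and collection is read off the diagram labels and is an assumption; the function and type names are hypothetical.

```python
from typing import NamedTuple


class CollectionPath(NamedTuple):
    """The four hierarchy levels assumed to be encoded in a collection path."""
    zone: str
    organisation: str
    organisational_unit: str
    collection: str


def parse_collection_path(path: str) -> CollectionPath:
    """Split a path such as /rdm/DI/DCCN/DAC_3010000.01_123/ into its levels."""
    parts = [segment for segment in path.split("/") if segment]
    if len(parts) != 4:
        raise ValueError("expected /zone/organisation/unit/collection, got: " + path)
    return CollectionPath(*parts)


parsed = parse_collection_path("/rdm/DI/DCCN/DAC_3010000.01_123/")
```

Because the hierarchy is part of the path, the owning organisational unit of any collection can be derived without a metadata lookup.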
Management rules
• an RPC-like interface for collection management
• example: rdmUpdateCollectionMetadata, called by the client and executed on the server (core.re)
  • inputs: *collName, *kvp
  • server steps:
    1. filter out attributes the client doesn’t have the right to set
    2. verify the attribute values
    3. update the collection attributes
  • outputs: *errorcode, *collectionAttrs (the up-to-date collection attributes)
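The three server-side steps can be sketched in Python as follows. This is not the actual iRODS rule from core.re; it only mirrors the filter, verify, update sequence. The `settable` set, the `validators` mapping, the attribute names and the error-code values (0 for success, 1 for a rejected value) are all assumptions made for illustration.

```python
def rdm_update_collection_metadata(collection_attrs, kvp, settable, validators):
    """Return (errorcode, collection_attrs) after a filter/verify/update pass.

    collection_attrs: current attributes of the collection (dict)
    kvp:              attributes the client asks to set (dict)
    settable:         attribute names this client may set (hypothetical input)
    validators:       attribute name -> function(value) -> bool (hypothetical)
    """
    # Step 1: drop attributes the client has no right to set.
    allowed = {key: value for key, value in kvp.items() if key in settable}

    # Step 2: verify each remaining value; reject the whole call on failure.
    for key, value in allowed.items():
        check = validators.get(key)
        if check is not None and not check(value):
            return 1, dict(collection_attrs)

    # Step 3: apply the update and return the up-to-date attributes.
    updated = dict(collection_attrs)
    updated.update(allowed)
    return 0, updated


current = {"title": "old title", "state": "editable"}
request = {"title": "new title", "state": "published"}
code, attrs = rdm_update_collection_metadata(
    current,
    request,
    settable={"title"},  # this client may not change the state
    validators={"title": lambda value: len(value) > 0},
)
```

Returning the full attribute set with every call lets the client refresh its view without a second request, which matches the output described on the slide.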
User provisioning and authentication
• authentication via a national federated IdP
• users are provisioned upon sign-up to the management portal
• IdP attributes are stored as user KVU triplets in iRODS
• PAM authentication with an OTP (one-time password) is set up for data access
Event logging
• essential user actions are logged as events
• events are streamed to the Elastic stack in a non-blocking way via “filebeat”
[Figure: logging pipeline with the labels PEP, iCAT, rodsLog, filebeat, reporting, auditing]
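One common way to keep event logging non-blocking is to append each event as a single JSON line to a local log file and let a log shipper such as Filebeat pick the lines up asynchronously. The sketch below shows only that idea: the event fields and function names are hypothetical, and the slides do not describe the actual event format used in the repository.

```python
import json
import time


def format_event(user, action, target, timestamp=None):
    """Serialise one user action as a single JSON line (field names are assumed)."""
    event = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "user": user,
        "action": action,
        "target": target,
    }
    return json.dumps(event, sort_keys=True)


def append_event(log_path, line):
    """Append the event to a local log file; a shipper reads the file later."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


line = format_event(
    "alice", "put", "/rdm/DI/DCCN/DAC_3010000.01_123/", timestamp=0.0
)
```

The user action only pays for a local file append; delivery to the search backend happens outside the request path.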
User interfaces
• data access is separated from collection management
• web portal for collection management
• WebDAV (Davrods) for “easy” data transfer
• file stager: a service “intelligently” managing bulk file transfer between local storage and the repository
[Figure: file stager components: web interface, transfer job manager/queue, transfer agents; the actual transfer between local storage and DRDR uses irsync]
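The stager's split between a job queue and transfer agents can be sketched as below. The class and function names are hypothetical, and the sketch only builds the command an agent would run rather than executing it. It assumes the standard irsync convention of an `i:` prefix for iRODS paths and `-r` for recursive synchronisation; the local directory in the example is made up.

```python
from collections import deque


def build_irsync_command(local_dir, collection_path):
    """Build an irsync call that syncs a local directory into a collection."""
    return ["irsync", "-r", local_dir, "i:" + collection_path]


class TransferQueue:
    """A first-in, first-out queue of transfer jobs (a stand-in for the job manager)."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, local_dir, collection_path):
        """Register a transfer job and return the number of waiting jobs."""
        self._jobs.append((local_dir, collection_path))
        return len(self._jobs)

    def next_command(self):
        """Hand the oldest job to a transfer agent as a command, or None if idle."""
        if not self._jobs:
            return None
        local_dir, collection_path = self._jobs.popleft()
        return build_irsync_command(local_dir, collection_path)


queue = TransferQueue()
queue.submit("/project/raw", "/rdm/DI/DCCN/DAC_3010000.01_123")
command = queue.next_command()
```

An agent would pass the returned list to a process runner; queuing jobs first lets the service throttle how many bulk transfers run at once.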
Strengths and weaknesses
👏 It fits the combined scientific-administrative workflow of a large and heterogeneous institute
👏 It provides sufficient functionality for
  1. sharing data for publication
  2. implementing a Data Management Plan (DMP)
👎 It has weak integration with the data analysis facility
👎 It doesn’t implement a standard way of organising collection content
Future focuses
• seamless integration with the computing (data-analysis) facility
• FAIR-ness of published data collections
• adoption beyond neuroimaging
Summary
• We structured an RDM workflow
  • which covers the entire research lifecycle
  • in which both researcher and administrator share responsibility
  • which is specified by a protocol and implemented by an iRODS-based digital repository
https://data.donders.ru.nl