EGI-InSPIRE
iRODS: Setup and Use of a National Data Management System in the French NGI
Jerome PANSANEL & the FG-iRODS Team
jerome.pansanel@iphc.cnrs.fr
Hubert Curien Multidisciplinary Institute, Strasbourg, France
05/21/14, www.egi.eu, EGI-InSPIRE RI-261323
iRODS
Scientific Data Today
● Large amounts of data are collected by scientists and have to be analyzed
● Data are distributed across the world and have to be shared between the different partners
● Each data center has its own storage infrastructure (most of the time based on heterogeneous systems)
● The physical organization of the data should be transparent to the users
● Data should be easy to manage
● Data should be easily retrievable (for example with metadata search)
● Data and data access have to be protected (data replication, specific ACLs, ...)
● Data have to be available from anywhere
→ How can these challenges be solved?
iRODS: integrated Rule-Oriented Data System
• Project started in 2006 (based on SRB, the Storage Resource Broker)
• Released under an Open Source license (BSD)
• Developed by the DICE group and several collaborators
• A rule engine applies user-defined policies and rules
• Integrates a descriptive metadata system to manipulate data
• Data collections can be managed over several sites and heterogeneous hardware
• The logical organization of files is independent of their physical implementation
• Enforcement of data consistency and homogeneity
[Figure: iRODS layer over heterogeneous storage (MS, Disk, Disk)]
http://irods.org/
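The independence between the logical organization of files and their physical location can be illustrated with a toy catalogue. This is a hypothetical sketch, not the real iCAT schema: the `Catalogue` class, the resource names `diskResc` and `archiveResc` and the vault paths are invented for illustration. It only shows that moving a file to another storage resource changes its physical location while the logical path seen by users stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy catalogue mapping logical paths to physical locations.

    Hypothetical illustration only: this is not the iCAT schema.
    """

    # logical path -> (resource name, physical path on that resource)
    entries: dict = field(default_factory=dict)

    def register(self, logical: str, resource: str, physical: str) -> None:
        self.entries[logical] = (resource, physical)

    def move(self, logical: str, resource: str, physical: str) -> None:
        # Only the physical location changes; the logical path stays the same key.
        if logical not in self.entries:
            raise KeyError(logical)
        self.entries[logical] = (resource, physical)

    def locate(self, logical: str) -> tuple:
        return self.entries[logical]


cat = Catalogue()
path = "/frgrid/home/UNECOLLAB/RAWDATA/BE/run0977156_123.raw"
cat.register(path, "diskResc", "/vault/disk1/run0977156_123.raw")
cat.move(path, "archiveResc", "/vault/archive/run0977156_123.raw")
print(cat.locate(path))  # ('archiveResc', '/vault/archive/run0977156_123.raw')
```

Users and applications keep addressing the file through `path`; only the catalogue entry knows where the bytes actually live.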
iRODS Consortium
● Objectives:
  ◦ Guide the continuous development of iRODS
  ◦ Obtain funding to support this development
  ◦ Provide fully tested software, using the complementary testing, packaging and expertise processes developed at RENCI
  ◦ Evangelize iRODS among potential users
● For more information:
→ http://irods-consortium.org/
Fact Sheet
● Usable on anything from a personal laptop to institutional repositories and international projects
● Thousands of users
● Billions of files and several petabytes of data
● Extensive documentation: https://wiki.irods.org/index.php/Documentation
● Binary packages available
Under the Hood
● An iRODS system (a “zone”) is based on three main elements: a database (the iCAT metadata catalogue), a rule engine and storage resources
[Figure: Database, Rule engine and Resources; Desktop, Grid, Cloud]
● Data servers can be spread geographically within one zone
● Different zones can be interconnected (federation)
● Available user interfaces: GUIs, CLIs and APIs (C, Java, Python, …)
● Automatic and manual data processing is possible through the rule engine
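An iRODS logical path starts with the name of its zone (for example `/frgrid/home/...` or `/tempZone/home/...` in this presentation), which is what lets interconnected zones tell which catalogue owns a path. The sketch below illustrates that idea conceptually; the real routing is done by the iRODS servers themselves, and the `ZONE_HOSTS` table, host names and function names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical map of federated zones to their catalogue (iCAT) hosts.
ZONE_HOSTS = {
    "frgrid": "icat.frgrid.example.org",
    "targetZone": "icat.target.example.org",
}


def zone_of(logical_path: str) -> str:
    """Return the zone name: the first component of an absolute logical path."""
    parts = PurePosixPath(logical_path).parts
    if len(parts) < 2 or parts[0] != "/":
        raise ValueError(f"not an absolute logical path: {logical_path!r}")
    return parts[1]


def catalogue_for(logical_path: str) -> str:
    """Pick the catalogue host that owns the path (hypothetical routing)."""
    zone = zone_of(logical_path)
    try:
        return ZONE_HOSTS[zone]
    except KeyError:
        raise LookupError(f"zone {zone!r} is not federated") from None


print(zone_of("/frgrid/home/UNECOLLAB/RAWDATA"))  # frgrid
print(catalogue_for("/targetZone/home/rods/safe-copy"))  # icat.target.example.org
```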
User Interfaces
[user ~]$ ils
/frgrid/home/UNECOLLAB/RAWDATA:
  C- /frgrid/home/UNECOLLAB/RAWDATA/CALIBRATION
  C- /frgrid/home/UNECOLLAB/RAWDATA/BE
  C- /frgrid/home/UNECOLLAB/RAWDATA/ZR
[user ~]$ ils -l BE/
/frgrid/home/UNECOLLAB/RAWDATA/BE:
  owner 0 ps-lpsc-lpscdata7-fr 80072192 2013-11-11.16:21 & run0977156_123.dst
  owner 0 ps-lpsc-lpscdata7-fr 1748189011 2013-11-11.15:48 & run0977156_123.raw
  owner 1 iphcCache1 1748189011 2013-11-11.16:42 & run0977156_123.raw
  owner 0 ps-lpsc-lpscdata7-fr 80072192 2013-11-11.16:21 & run0977234_673.dst
  ...
In the long listing, each line shows the owner, the replica number, the storage resource, the size in bytes, the modification time, a status flag and the file name: run0977156_123.raw has two replicas (0 and 1), stored on ps-lpsc-lpscdata7-fr and iphcCache1.
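Because `ils -l` prints one line per replica, a small script can summarize where each file is stored. This is a minimal sketch assuming the seven-column layout visible in the listing above (owner, replica number, resource, size, timestamp, status flag, name); the function name `replicas_by_object` is hypothetical, and file names containing spaces are not handled.

```python
from collections import defaultdict

# Output copied from the `ils -l BE/` example above.
LISTING = """\
/frgrid/home/UNECOLLAB/RAWDATA/BE:
  owner 0 ps-lpsc-lpscdata7-fr 80072192 2013-11-11.16:21 & run0977156_123.dst
  owner 0 ps-lpsc-lpscdata7-fr 1748189011 2013-11-11.15:48 & run0977156_123.raw
  owner 1 iphcCache1 1748189011 2013-11-11.16:42 & run0977156_123.raw
  owner 0 ps-lpsc-lpscdata7-fr 80072192 2013-11-11.16:21 & run0977234_673.dst
  ...
"""


def replicas_by_object(listing: str) -> dict:
    """Map each data object name to a list of (replica number, resource)."""
    replicas = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Assumed layout: owner, replica, resource, size, date, status, name.
        if len(fields) != 7:
            continue  # skip the collection header and the "..." line
        _owner, repl, resource, _size, _date, _status, name = fields
        replicas[name].append((int(repl), resource))
    return dict(replicas)


result = replicas_by_object(LISTING)
for name, reps in sorted(result.items()):
    print(name, [resource for _, resource in reps])
# run0977156_123.dst ['ps-lpsc-lpscdata7-fr']
# run0977156_123.raw ['ps-lpsc-lpscdata7-fr', 'iphcCache1']
# run0977234_673.dst ['ps-lpsc-lpscdata7-fr']
```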
Metadata
● Associated with a file, a collection, a resource or a user
● Based on a triplet: attribute, value and unit (AVU)
[user ~]$ imeta add -d run0977156_123.raw length 10 cm
[user ~]$ imeta add -d run0977156_123.raw hall east
[user ~]$ imeta ls -d run0977156_123.raw
AVUs defined for dataObj run0977156_123.raw:
attribute: length
value: 10
units: cm
----
attribute: hall
value: east
units:
[user ~]$ imeta qu -d hall = east
collection: /frgrid/home/UNECOLLAB/RAWDATA/ZR
dataObj: run0977156_123.raw
----
collection: /frgrid/home/UNECOLLAB/RAWDATA/ZR
dataObj: run0817773_556.raw
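The attribute-value-unit model behind `imeta` can be mimicked with a small in-memory store, which makes the add / list / query cycle above easier to follow. This is a hypothetical sketch: the `AVUStore` class is invented, it only supports equality queries (whereas `imeta qu` also accepts other operators such as `like`), and it does not talk to an iRODS server.

```python
from collections import defaultdict


class AVUStore:
    """Hypothetical in-memory model of iRODS attribute-value-unit metadata."""

    def __init__(self):
        # (collection, data object) -> list of (attribute, value, unit)
        self._avus = defaultdict(list)

    def add(self, coll, obj, attribute, value, unit=""):
        # Values are kept as strings, like the values printed by imeta.
        self._avus[(coll, obj)].append((attribute, str(value), unit))

    def ls(self, coll, obj):
        return list(self._avus.get((coll, obj), []))

    def query(self, attribute, value):
        """Return sorted (collection, object) pairs having attribute == value."""
        return sorted(
            key for key, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        )


store = AVUStore()
zr = "/frgrid/home/UNECOLLAB/RAWDATA/ZR"
store.add(zr, "run0977156_123.raw", "length", 10, "cm")
store.add(zr, "run0977156_123.raw", "hall", "east")
store.add(zr, "run0817773_556.raw", "hall", "east")
print(store.ls(zr, "run0977156_123.raw"))  # [('length', '10', 'cm'), ('hall', 'east', '')]
print(store.query("hall", "east"))  # both ZR objects, sorted by name
```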
iRODS Rule Sample
● Structure of a rule: actionDef | condition | workflow-chain | recovery-chain
● Example: when a file is put under /tempZone/home/rods/monitored/, synchronize its collection to a safe copy in another zone and log the result
acPostProcForPut {
  ON($objPath like "/tempZone/home/rods/monitored/*") {
    msiSplitPath($objPath, *collection, *fileName);
    msiCollRsync(*collection, "/targetZone/home/rods/safe-copy", "demoResc", "IRODS_TO_IRODS", *Status);
    writeLine("serverLog", "Rsync of *collection to its safe copy done (status=*Status). Triggered by creation of $objPath");
  }
}
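The event / condition / action pattern of such a rule can be sketched in a few lines to show how the rule engine decides whether to run a workflow. This is a conceptual illustration, not how the iRODS rule engine is implemented: the `rule` decorator, `fire` function and `LOG` list are hypothetical, `fnmatchcase` stands in for the rule language's `like` operator (its `*` also matches `/`), the `msiCollRsync` call is only logged, and the recovery chain is not modeled.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (action name, path pattern, handler).
RULES = []
LOG = []


def rule(action_name, pattern):
    """Register a handler run when action_name fires on a matching $objPath."""
    def decorator(func):
        RULES.append((action_name, pattern, func))
        return func
    return decorator


def fire(action_name, obj_path):
    """Run every rule whose action name matches and whose condition holds."""
    for name, pattern, func in RULES:
        if name == action_name and fnmatchcase(obj_path, pattern):
            func(obj_path)


@rule("acPostProcForPut", "/tempZone/home/rods/monitored/*")
def sync_safe_copy(obj_path):
    # Equivalent of msiSplitPath: split into collection and file name.
    collection, _, file_name = obj_path.rpartition("/")
    # A real rule would call msiCollRsync here; this sketch only logs it.
    LOG.append(f"Rsync of {collection} to its safe copy triggered by {file_name}")


fire("acPostProcForPut", "/tempZone/home/rods/monitored/a.dat")
fire("acPostProcForPut", "/tempZone/home/rods/other/b.dat")  # condition false
print(LOG)
# ['Rsync of /tempZone/home/rods/monitored to its safe copy triggered by a.dat']
```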
Genomic Data Management with iRODS: WTSI Use Case [1]
● Managing and accessing sequencing Binary Alignment/Map (BAM) files
● 500 TB of SAN storage
● Integrated in the sequencing pipeline
● Fine-grained access control
● Data replication
● Alignment metadata are added automatically
● Data federation with other research institutes
[1] G.-T. Chiang, P. Clapham, G. Qi, K. Sale & G. Coates: Implementing a genomics data management system using iRODS in the Wellcome Trust Sanger Institute. BMC Bioinformatics 2011, 12, 361.
Other Examples
● Astrophysics: Auger supernova search
● Atmospheric science: NASA Langley Atmospheric Sciences Center
● Biology: Phylogenetics at CC IN2P3
● Climate: NOAA National Climatic Data Center
● Cognitive Science: Temporal Dynamics of Learning Center
● Computer Science: GENI experimental network
● Cosmic Ray: AMS experiment on the International Space Station
● Dark Matter Physics: Edelweiss II
● Digital Library: French National Library, Texas Digital Libraries
● Earth Science: NASA Center for Climate Simulations, Vhub (volcanism)
● Ecology: CEED Caveat Emptor Ecological Data
● Engineering: CIBER-U
● High Energy Physics: BaBar
● Hydrology: Institute for the Environment, UNC-CH; Hydroshare
● Genomics: Broad Institute, Wellcome Trust Sanger Institute, NGS
● Indexing: Cheshire
● Institutional repository: Carolina Digital Repository
● Medicine: Sick Kids Hospital
● Neuroscience: International Neuroinformatics Coordinating Facility
● Neutrino Physics: T2K and dChooz neutrino experiments
● Oceanography: Ocean Observatories Initiative
● Optical Astronomy: National Optical Astronomy Observatory
● Particle Physics: Indra
FG-iRODS
FG-iRODS Federated Infrastructure
• Coordinated by France Grilles
• A single production instance:
  ◦ Federated resources and workforce
  ◦ Hosting users from any scientific domain
  ◦ Designed for small and medium projects
  ◦ Open to new resource providers
  ◦ User support and training
[Figure: iCAT catalogue and storage resources of 20 TB, 40 TB and 20 TB (replicated)]
French iRODS Federated Infrastructure Collaboration
● National instance coordinated by the French NGI "France Grilles"
● Project started in 2013
● Users authenticate with a login and password or with a certificate
● Administered collectively by four partners
● Centralised iRODS rule engine and catalogue to enforce coherent and homogeneous data management
● Resources distributed in different locations for high data availability
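Spreading replicas over resources at different sites is what keeps data readable when one site is down. The sketch below illustrates that reasoning only; it does not reproduce how iRODS actually selects a replica, and the function `pick_replica` and the resource names are hypothetical.

```python
def pick_replica(replicas: list[str], online: set[str]) -> str:
    """Return the first resource, in preference order, that is reachable."""
    for resource in replicas:
        if resource in online:
            return resource
    raise RuntimeError("no replica available on an online resource")


# Hypothetical resources at three partner sites holding the same file.
replicas = ["lyonResc", "strasbourgResc", "bordeauxResc"]
print(pick_replica(replicas, online={"lyonResc", "bordeauxResc"}))  # lyonResc
print(pick_replica(replicas, online={"bordeauxResc"}))  # bordeauxResc
```

With three replicas, the file stays readable as long as at least one of the three sites is online.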
FG-iRODS Team
• Yonny CARDENAS (CC-IN2P3, Lyon)
• Jean-Yves NIEF (CC-IN2P3, Lyon)
• Gilles MATHIEU (France Grilles, Lyon)
• Geneviève ROMIER (France Grilles, Lyon)
• Jerome PANSANEL (IPHC, Strasbourg)
• Catherine BISCARAT (LPSC, Grenoble)
• David BENABEN (CBIB & INRA, Bordeaux)
• Pierre GAY (MCIA, Bordeaux)
• Benoît HIROUX (MCIA, Bordeaux)
Achievements
• Federated set of resources totalling 80 TB
• Real synergy between the administrators
• Reliable and highly available storage
• A usage policy has been published
• First training session held (Clermont-Ferrand, February 2014)
• First users are currently hosted (proteomics and biological data)
• iRODS clients installed on all grid sites supporting the france-grilles VO
• iRODS packages with GSI support (deb, rpm)
• Web interface available
• VM appliance provided to access both the computing grid and iRODS
Perspectives
● Extend the storage pool with new resource providers
● Welcome new users
● Deploy a monitoring solution to ensure infrastructure reliability
● Test the S3 plugin
● Find new funding to ensure the sustainability of the infrastructure
● Share expertise on data management and user support with other groups
→ http://www.france-grilles.fr/Pour-les-chercheurs-ou-ingenieurs#iRODS