
EGI-InSPIRE iRODS: Setup and Use of a National Data Management System in the French NGI

EGI-InSPIRE iRODS: Setup and Use of a National Data Management System in the French NGI. Jerome PANSANEL & the FG-iRODS Team (jerome.pansanel@iphc.cnrs.fr), Hubert Curien Multidisciplinary Institute, Strasbourg, France. 05/21/14. www.egi.eu


  1. EGI-InSPIRE iRODS: Setup and Use of a National Data Management System in the French NGI
     Jerome PANSANEL & the FG-iRODS Team, jerome.pansanel@iphc.cnrs.fr
     Hubert Curien Multidisciplinary Institute, Strasbourg, France
     05/21/14, www.egi.eu, EGI-InSPIRE RI-261323

  2. iRODS

  3. Scientific Data Today
     ● Large amounts of data are collected by scientists and have to be analyzed
     ● Data are distributed across the world and have to be shared between the different partners
     ● Each data center has its own storage infrastructure (most of the time based on heterogeneous systems)
     ● The physical organization of the data should be transparent to the users
     ● Data should be easy to manage
     ● Data should be easily retrievable (for example with a metadata search)
     ● Data access has to be protected (data replication, specific ACLs, ...)
     ● Data has to be available from anywhere
     → How to solve these challenges?

  4. iRODS: Integrated Rule-Oriented Data System
     • Project started in 2006 (based on SRB)
     • Released under an Open Source license (BSD)
     • Developed by the DICE group and several collaborators
     • The rule engine applies user-defined policies and rules
     • Integrates a descriptive metadata system to manipulate data
     • Data collections are manageable over several sites and heterogeneous hardware
     • The logical organization of files is independent of their physical implementation
     • Enforcement of data consistency and homogeneity
     [Diagram labels: iRODS, MS, Disk, Disk]
     http://irods.org/

  5. iRODS Consortium
     ● Objectives:
       - Guide the continuous development of iRODS
       - Obtain funding to support this development
       - Provide fully tested software by using the complementary processes of testing, packaging, and expertise developed at RENCI
       - Evangelize iRODS among potential users
     ● For more information:
       → http://irods-consortium.org/

  6. Fact Sheet
     ● Usable from personal laptops to institutional repositories to international projects
     ● Thousands of users
     ● Billions of files and several petabytes of data
     ● Extensive documentation: https://wiki.irods.org/index.php/Documentation
     ● Binary packages available

  7. Under the Hood
     ● An iRODS system (a “zone”) is based on three main elements:
       - Database
       - Rule engine
       - Resources
     ● Data servers can be spread geographically within one zone
     ● Different zones can be interconnected
     ● Available user interfaces: GUIs, CLIs and APIs (C, Java, Python, …)
     ● Automatic and manual data processing is possible through the rule engine
     [Diagram labels: Desktop, Grid, Cloud]
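The separation between the logical namespace kept in the zone database and the physical copies held on resources can be illustrated with a small sketch. This is a toy model, not iRODS code: the `Catalog` and `Replica` classes and the physical paths are hypothetical, and only the logical path and resource names are borrowed from the listing on slide 8.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    number: int          # replica index of the data object
    resource: str        # storage resource holding this copy
    physical_path: str   # hypothetical location on that resource


@dataclass
class Catalog:
    """Toy stand-in for the zone database: logical path -> replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, replica):
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path):
        """Return the replicas of a logical path, lowest number first."""
        return sorted(self.entries.get(logical_path, []),
                      key=lambda r: r.number)


catalog = Catalog()
path = "/frgrid/home/UNECOLLAB/RAWDATA/BE/run0977156_123.raw"
# Physical paths below are invented for illustration only.
catalog.register(path, Replica(1, "iphcCache1", "/cache/vault/run0977156_123.raw"))
catalog.register(path, Replica(0, "ps-lpsc-lpscdata7-fr", "/data7/vault/run0977156_123.raw"))

for rep in catalog.resolve(path):
    print(rep.number, rep.resource)
```

The user only ever handles the logical path; which server and disk hold each copy is a lookup in the catalogue.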

  8. User Interfaces
     [user ~]$ ils
     /frgrid/home/UNECOLLAB/RAWDATA:
       C- /frgrid/home/UNECOLLAB/RAWDATA/CALIBRATION
       C- /frgrid/home/UNECOLLAB/RAWDATA/BE
       C- /frgrid/home/UNECOLLAB/RAWDATA/ZR
     [user ~]$ ils -l BE/
     /frgrid/home/UNECOLLAB/RAWDATA/BE:
       owner 0 ps-lpsc-lpscdata7-fr   80072192 2013-11-11.16:21 & run0977156_123.dst
       owner 0 ps-lpsc-lpscdata7-fr 1748189011 2013-11-11.15:48 & run0977156_123.raw
       owner 1 iphcCache1           1748189011 2013-11-11.16:42 & run0977156_123.raw
       owner 0 ps-lpsc-lpscdata7-fr   80072192 2013-11-11.16:21 & run0977234_673.dst
       ...
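The long listing above carries one line per replica (owner, replica number, resource, size, date, name), so a script can find where each copy of a file lives. The following is a minimal sketch that parses text in the layout shown on the slide; the `Entry` type and `parse_ils_long` function are hypothetical helpers, the layout is assumed to match the slide exactly, and file names containing spaces are not handled.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    owner: str
    replica: int
    resource: str
    size: int
    modified: str
    name: str


def parse_ils_long(text):
    """Parse the data-object lines of an `ils -l` style listing."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data-object lines have 7 fields with "&" before the file name;
        # the collection header and any other lines are skipped.
        if len(fields) == 7 and fields[5] == "&":
            owner, replica, resource, size, modified, _, name = fields
            entries.append(
                Entry(owner, int(replica), resource, int(size), modified, name))
    return entries


listing = """\
/frgrid/home/UNECOLLAB/RAWDATA/BE:
  owner 0 ps-lpsc-lpscdata7-fr 80072192 2013-11-11.16:21 & run0977156_123.dst
  owner 0 ps-lpsc-lpscdata7-fr 1748189011 2013-11-11.15:48 & run0977156_123.raw
  owner 1 iphcCache1 1748189011 2013-11-11.16:42 & run0977156_123.raw
"""

entries = parse_ils_long(listing)
replicas = [e.resource for e in entries if e.name == "run0977156_123.raw"]
print(replicas)
```

Here the `.raw` file appears twice because it has two replicas, one on each resource.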

  9. Metadata
     ● Associated with a file, a collection, a resource or a user
     ● Based on a triplet: name, value and unit
     [user ~]$ imeta add -d run0977156_123.raw length 10 cm
     [user ~]$ imeta add -d run0977156_123.raw hall east
     [user ~]$ imeta ls -d run0977156_123.raw
     AVUs defined for dataObj run0977156_123.raw:
     attribute: length
     value: 10
     units: cm
     ----
     attribute: hall
     value: east
     units:
     [user ~]$ imeta qu -d hall = east
     collection: /frgrid/home/UNECOLLAB/RAWDATA/ZR
     dataObj: run0977156_123.raw
     ----
     collection: /frgrid/home/UNECOLLAB/RAWDATA/ZR
     dataObj: run0817773_556.raw
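The attribute-value-unit triplet model shown above can be sketched in a few lines. This is an in-memory illustration of the data model only, not the iRODS API: the `MetadataStore` class and its method names are hypothetical and simply mirror the `imeta add`, `imeta ls` and `imeta qu` commands on the slide.

```python
from collections import defaultdict


class MetadataStore:
    """Toy AVU store: each object carries a set of (attribute, value, unit)."""

    def __init__(self):
        self._avus = defaultdict(set)

    def add(self, obj, attribute, value, unit=""):
        self._avus[obj].add((attribute, value, unit))

    def ls(self, obj):
        """List the triplets attached to one object, sorted."""
        return sorted(self._avus.get(obj, set()))

    def query(self, attribute, value):
        """Return the objects that carry attribute == value, sorted."""
        return sorted(
            obj for obj, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        )


store = MetadataStore()
store.add("run0977156_123.raw", "length", "10", "cm")
store.add("run0977156_123.raw", "hall", "east")
store.add("run0817773_556.raw", "hall", "east")

print(store.ls("run0977156_123.raw"))
print(store.query("hall", "east"))
```

The unit is optional, as in the `hall east` example, and a query matches on attribute and value regardless of unit.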

  10. iRODS Rule Sample
     ● Structure of a rule: actionDef | condition | workflow-chain | recovery-chain
     ● Example:
     acPostProcForPut {
       ON($objPath like "/tempZone/home/rods/monitored/\*") {
         msiSplitPath($objPath, *collection, *fileName);
         msiCollRsync(*collection, "/targetZone/home/rods/safe-copy", "demoResc", "IRODS_TO_IRODS", *Status);
         writeLine("serverLog", "Rsync of *collection to its safe copy done (status=*Status) Triggered by creation of $objPath");
       }
     }
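The four parts of a rule (action, condition, workflow chain, recovery chain) can be modelled with a short sketch. It is a simplified illustration and not the iRODS rule engine: the `Rule` class, the `fire` function and the step functions are hypothetical, and the recovery behaviour (undoing only the completed steps, newest first) is an assumption that may differ from the real semantics.

```python
from fnmatch import fnmatchcase


class Rule:
    """Toy rule: action name, condition, workflow steps, recovery steps."""

    def __init__(self, action, condition, workflow, recovery):
        self.action = action
        self.condition = condition
        self.workflow = workflow
        self.recovery = recovery


def fire(rules, action, ctx):
    """Run the first rule whose action and condition match the context."""
    for rule in rules:
        if rule.action != action or not rule.condition(ctx):
            continue
        completed = []
        try:
            for step, undo in zip(rule.workflow, rule.recovery):
                step(ctx)
                completed.append(undo)
        except Exception:
            # Assumption: undo only the completed steps, newest first.
            for undo in reversed(completed):
                undo(ctx)
            return "recovered"
        return "applied"
    return "no-match"


log = []


def split_path(ctx):
    ctx["collection"], ctx["fileName"] = ctx["objPath"].rsplit("/", 1)


def sync_collection(ctx):
    # Stands in for the collection sync; it only records the intent.
    log.append("rsync " + ctx["collection"] + " -> /targetZone/home/rods/safe-copy")


def nop(ctx):
    pass


rule = Rule(
    "acPostProcForPut",
    lambda ctx: fnmatchcase(ctx["objPath"], "/tempZone/home/rods/monitored/*"),
    [split_path, sync_collection],
    [nop, nop],
)

status = fire([rule], "acPostProcForPut",
              {"objPath": "/tempZone/home/rods/monitored/run1.raw"})
print(status, log)
```

As in the slide's example, the rule fires only for objects created under the monitored collection; anything else falls through without a match.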

  11. Genomic Data Management with iRODS
     WTSI Use Case [1]:
     ● Managing and accessing sequencing Binary Alignment/Map (BAM) files
     ● 500 TB SAN storage
     ● Integrated in the sequencing pipeline
     ● Fine-grained access control
     ● Data replication
     ● Metadata on alignment are automatically added
     ● Data federation with other research institutes
     [1] G.-T. Chiang, P. Clapham, G. Qi, K. Sale & G. Coates: Implementing a genomics data management system using iRODS in the Wellcome Trust Sanger Institute. BMC Bioinformatics 2011, 12, 361.

  12. Other Examples
     ● Astrophysics: Auger supernova search
     ● Atmospheric science: NASA Langley Atmospheric Sciences Center
     ● Biology: Phylogenetics at CC IN2P3
     ● Climate: NOAA National Climatic Data Center
     ● Cognitive Science: Temporal Dynamics of Learning Center
     ● Computer Science: GENI experimental network
     ● Cosmic Ray: AMS experiment on the International Space Station
     ● Dark Matter Physics: Edelweiss II
     ● Digital Library: French National Library, Texas Digital Libraries
     ● Earth Science: NASA Center for Climate Simulations, Vhub - vulcanism
     ● Ecology: CEED Caveat Emptor Ecological Data
     ● Engineering: CIBER-U
     ● High Energy Physics: BaBar
     ● Hydrology: Institute for the Environment, UNC-CH; Hydroshare
     ● Genomics: Broad Institute, Wellcome Trust Sanger Institute, NGS
     ● Indexing: Cheshire
     ● Institutional repository: Carolina Digital Repository
     ● Medicine: Sick Kids Hospital
     ● Neuroscience: International Neuroinformatics Coordinating Facility
     ● Neutrino Physics: T2K and dChooz neutrino experiments
     ● Oceanography: Ocean Observatories Initiative
     ● Optical Astronomy: National Optical Astronomy Observatory
     ● Particle Physics: Indra

  13. FG-iRODS

  14. FG-iRODS Federated Infrastructure
     • Coordinated by France Grilles
     • A single production instance:
       - Federated resources and workforce
       - Hosting users from any scientific domain
       - Designed for small and medium projects
       - Open to new resource providers
       - User support and training
     [Diagram labels: iCAT, replicated, 20 TB, 40 TB, 20 TB]

  15. French iRODS Federated Infrastructure
     Collaboration:
     ● National instance coordinated by the French NGI "France Grilles"
     ● Project started in 2013
     ● Authentication by identifiers or certificates
     ● Administered collectively by four partners
     ● Centralised iRODS rule engine and catalogue to enforce coherent and homogeneous data management
     ● Resources distributed in different locations for high data availability

  16. FG-iRODS Team
     • Yonny CARDENAS (CC-IN2P3, Lyon)
     • Jean-Yves NIEF (CC-IN2P3, Lyon)
     • Gilles MATHIEU (France Grilles, Lyon)
     • Geneviève ROMIER (France Grilles, Lyon)
     • Jerome PANSANEL (IPHC, Strasbourg)
     • Catherine BISCARAT (LPSC, Grenoble)
     • David BENABEN (CBIB & INRA, Bordeaux)
     • Pierre GAY (MCIA, Bordeaux)
     • Benoît HIROUX (MCIA, Bordeaux)

  17. Achievements
     • Federated set of resources for a total of 80 TB
     • Real synergy between the administrators
     • Reliable and highly available storage
     • Usage policies are published
     • First training session has been held (Clermont-Ferrand, February 2014)
     • First users are currently hosted (proteomics and biological data)
     • iRODS clients installed on all grid sites supporting the france-grilles VO
     • iRODS packaging with GSI support (deb, rpm)
     • Web interface available
     • VM appliance provided to access the computing Grid and iRODS

  18. Perspectives
     ● Extend the storage pool with new resource providers
     ● Welcome more new users
     ● Deploy a monitoring solution to ensure infrastructure reliability
     ● Test the S3 plugin
     ● Find new financial resources to ensure the sustainability of the infrastructure
     ● Share expertise regarding data management and user support with other groups
     → http://www.france-grilles.fr/Pour-les-chercheurs-ou-ingenieurs#iRODS
