
Efficient and Scalable Climate Metadata Management with the GRelC DAIS



  1. Efficient and Scalable Climate Metadata Management with the GRelC DAIS
     G. Aloisio, S. Fiore
     CMCC Scientific Computing and Operations Division, University of Salento, Lecce

  2. Context: countdown to the Intergovernmental Panel on Climate Change (IPCC) report
     • End of 2009 - Autumn 2010: Climate simulations
     • End of 2010 - ?: Data distribution
     • End of 2010 - Early 2012: Scientific publications
     • Early 2013: Report publication, IPCC AR5 (Assessment Report #5)
     (Boucher and Pham, 2002)

  3. Scenario, issues and needs
     • Huge amount of data (PBs) produced at an international level
     • Need to share data among several centres
     • Data integration and sharing (FP7 keywords)
     • Need to move towards open, distributed and service-based environments
     • A lot of issues:
       o Data distribution
       o Data format heterogeneity
       o Metadata management
       o Metadata schema
       o Security and local policies
       o Transparent access to the system
       o Scalable approach
       o …

  4. Euro-Mediterranean Centre for Climate Change
     The Euro-Mediterranean Centre for Climate Change (CMCC) is a national initiative of scientific research in the field of climate change.
     • Research Divisions (SCO, ANS, CIP, ISC, IAFENT, FDD)
     • Partners (INGV, UNILE, CIRA, etc.)
     • Associated Centres (SPACI, etc.)
     [Diagram: Associate Centers and Partners. Labels: FEEM, CVR, INGV, UNITUS, UNISS, SANNIO, CIRA, UNILE, IAMB, CRMPA, SPACI]

  5. CMCC: An Integrated and Ubiquitous Environment for Climate Change
     • Data Management: Metadata/data services
     • CMCC Environment: acts as an incubator for the proposed technologies
     • Interdisciplinary: Climate and Computing Scientists
     • Key Points: Transparency and Interoperability
     • Expertise and Know-how: Computing Scientists (Unile), SPACI support
     • Middleware: gLite, Globus, etc.
     • Metadata Mng: Grid Metadata Handling System (GMHS)
     [Diagram: Associate Centers and Partners. Labels: FEEM, CVR, INGV, UNITUS, SANNIO, UNISS, CIRA, IAMB, UNILE, CRMPA, SPACI]

  6. CMCC Data & Metadata: issues & requirements
     Data & Metadata Management
     • Data distribution, access, management, delivery, etc.
     • Metadata management, access, integration
     • Metadata search & discovery facilities
     • Metadata access & browsing
     • Pervasive and ubiquitous access (Data Portal)
     • Metadata Agreement: design and schema implementation
     Main (non-functional) requirements
     • Scalability
     • Transparency
     • Efficiency
     • Interoperability
     • Security
     • Loosely coupled system
     • Easy-to-access system

  7. Data Management @ CMCC - Phase 1 (alias METADATA)
     CMCC Metadata Services
     • Grid-based solutions
     • Centralized Solution
       o Initially deployed
       o Based on GRelC DAS
         - Centralized Metadata Mng
     • Distributed Solution
       o Work in progress
       o GRelC DAIS
         - P2P Solution
         - Metadata distribution
     CMCC Metadata Agreement
     • Standard Analysis
       o ISO19115 / ISO19139
       o Dublin Core Metadata
       o Other standards and schema currently used
     • Schema definition
       o Design and schema implementation
       o CMCC Working Group
         - Interdisciplinary Group
         - Climate and Computer scientists
       o Schema describes
         - Models
         - Algorithms
         - Datasets
         - …
     CMCC Data Distribution Centre
     • CMCC Data Grid Portal
     • Data-oriented functionalities
     • Search and discovery
       o Dataset browsing
       o Editing functionalities
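To make the Dublin Core option on this slide concrete, here is a minimal sketch of a flat metadata record written with Python's standard library. Only the Dublin Core element namespace is standard; the `record` wrapper element, the helper name `build_dc_record`, and all sample values are assumptions made for illustration and are not taken from the CMCC schema.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a flat Dublin Core style XML record from a dict of element name -> value."""
    # "record" is a hypothetical wrapper element, not part of any CMCC schema.
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Invented sample values, used only to show the shape of a record.
record = build_dc_record({
    "title": "Example climate simulation output",
    "creator": "Example modelling group",
    "type": "Dataset",
    "format": "NetCDF",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A richer schema such as ISO 19115 / ISO 19139 would need nested elements; the flat form above only illustrates the simplest of the standards the slide lists.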

  8. Metadata Management Stack (layers, top to bottom)
     • CMCC Application Layer: Graphical User Interface, Data Grid Portal, Command Line Interface
     • Interoperable WS-I Interface: SOAP over GSI, httpg protocol
     • High Level Services: SEARCH - DISCOVERY - PUBLISHING - BROWSING - DISPLAY - METADATA EXTRACTION
     • Low Level Services: ACCESS - BROWSING - QUERY - AGGREGATION - VALIDATION - DISPLAY - SEARCH - DISCOVERY - DELIVERY
     • Low level APIs: BASIC ACCESS SERVICES - TRANSLATION LIBRARIES - AUTOMATIC INGESTION LIBRARIES
     • Metadata Physical Layer: Metadata Catalog, XML Doc/DB, Schema

  9. Metadata Management: Stack

  10. GRelC Project (starting date 2001)
     Grid Relational Catalog (GRelC) is a project which aims at designing and developing a set of efficient, secure and transparent Data Grid Services.
     [Diagram labels: XML DB, DB, DB, Grid]

  11. Grid Metadata Handling System: Data Integration Layer
     [Diagram labels: Data Grid Portal; Grid Service Catalog]

  12. Grid Metadata Handling System: architecture in the small
     [Diagram label: GRelC DAIS]

  13. National & International Testbeds
     [Diagram: GRelC Data Access and Data Sources (DB); locations shown: Lecce (Italy), Beijing (China)]
     Hosts and platforms, in the order shown on the slide:
     • sepac00.projects.cscs.ch (Linux x86)
     • gandalf.unile.it (Linux x86)
     • spacina.na.infn.it, sara.unile.it (Linux IA64, Mac OS X)
     • sigma2.unile.it (Linux IA64)
     • gridsurfer.unile.it (FreeBSD)
     • galileo.hpcc.unical.it (Linux IA64)

  14. Test Performance

  15. GRelC & EGEE RESPECT PROGRAM

  16. CMCC Metadata Grid Service
     • A Metadata Grid Service Infrastructure
     • GRelC Project based solution
       o Moving from GRelC DAS to GRelC DAIS
       o Data Access and Integration capabilities
       o Scalable approach to distributed database management
       o P2P and Grid Protocols/Services
       o CMCC customization
     • GRelC DAIS
       o Deployment is ongoing
       o 4 sites within the preliminary phase: Lecce, Bologna, Capua, Sassari
       o Distributed solution
       o Data Grid Portal available for metadata access
       o Two-step search & discovery process based on different data models
       o SOA-based approach with full security support through GSI
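The two-step search & discovery idea mentioned on this slide can be sketched as follows: a relational index answers the search, and the full XML document is read only for the hits. This is an illustration under stated assumptions, not the GRelC DAIS implementation; the in-memory SQLite index, the `XML_STORE` dictionary standing in for an XML database, the field names, and the sample records are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical full metadata documents keyed by record id (stand-in for an XML DB).
XML_STORE = {
    "rec-1": "<dataset><title>Run A</title><model>model-x</model><variable>tas</variable></dataset>",
    "rec-2": "<dataset><title>Run B</title><model>model-y</model><variable>pr</variable></dataset>",
}


def build_index(store):
    """Create a small relational index over a few searchable fields of each document."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE idx (record_id TEXT PRIMARY KEY, model TEXT, variable TEXT)")
    for record_id, doc in store.items():
        root = ET.fromstring(doc)
        conn.execute(
            "INSERT INTO idx VALUES (?, ?, ?)",
            (record_id, root.findtext("model"), root.findtext("variable")),
        )
    conn.commit()
    return conn


def search(conn, variable):
    """Step 1: relational query that returns only the ids of matching records."""
    rows = conn.execute(
        "SELECT record_id FROM idx WHERE variable = ? ORDER BY record_id", (variable,)
    )
    return [row[0] for row in rows]


def discover(store, record_id):
    """Step 2: fetch and parse the full XML record for a single id."""
    return ET.fromstring(store[record_id])


conn = build_index(XML_STORE)
hits = search(conn, "tas")
titles = [discover(XML_STORE, rid).findtext("title") for rid in hits]
print(hits, titles)
```

Keeping the index narrow means the search step stays cheap, while the complete description is parsed only for records the user actually asks to see.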

  17. CMCC on iSGTW
     Key issues:
     • GRelC DAIS 3.0
     • CMCC GMHS
     • RESPECT Program
     • CMCC Deployment
     • …
     See: http://www.isgtw.org/?pid=1001234

  18. Scenario, issues and needs
     • Huge amount of data produced at an international level
     • Need to share data among several centres
     • Data integration and sharing (FP7 keywords)
     • Need to move towards open, distributed and service-based environments
     • A lot of issues:
       o Data distribution
       o Data format heterogeneity
       o Metadata management
       o Metadata schema
       o Security and local policies
       o Transparent access to the system
       o Scalable approach
       o …

  19. A new research effort: Climate-G
     The main goal of Climate-G is to create a unified environment for climate change research that brings together, in a single context, large amounts of data geographically spread among several centres, rich metadata descriptions, efficient data access services, advanced data analysis and visualization tools, etc., combining knowledge and skills from the fields of climate change and computational science.

  20. Climate-G partners
     [Partner logos; recoverable label: Università del Salento]

  21. Climate-G: Involved People
     Principal Investigators
     • Giovanni Aloisio - Euro-Mediterranean Centre for Climate Change (CMCC) and University of Salento, Italy
     • Sandro Fiore - Euro-Mediterranean Centre for Climate Change (CMCC) and University of Salento, Italy
     • Sébastien Denvil - Institut Pierre-Simon Laplace (IPSL), France
     • Monique Petitdidier - Institut Pierre-Simon Laplace (IPSL), France
     Involved people
     Giovanni Aloisio (1,6), Sandro Fiore (1,6), Sébastien Denvil (2), Monique Petitdidier (2), Peter Fox (3), Horst Schwichtenberg (4), Jon Blower (5), Roberto Barbera (7), David Weissenbach (8), André Gemuend (4)
     Institutions
     1. Euro-Mediterranean Centre for Climate Change (CMCC), Italy
     2. Institut Pierre-Simon Laplace (IPSL), France
     3. High Altitude Observatory (HAO) at NCAR, USA
     4. Fraunhofer-SCAI, Germany
     5. University of Reading, UK
     6. University of Salento, Italy
     7. University of Catania, Italy
     8. Institut de Physique du Globe de Paris, France

  22. Climate-G: Metadata (XML and RDB)

  23. Metadata Distribution and Virtualization
     For each site:
     • Relational DB (index)
     • XML DB (entire schema)
     Virtualization/Integration layer: GRelC DAIS
     Virtualization conceals:
     • Data distribution
     • Number of sites, RDBMS and XML back-ends
     • P2P topology
     • Data integration aspects
     • Technological details
     • …
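The role of the virtualization layer can be sketched with a single entry point that fans a query out to every site and merges the answers, so the caller never sees how many sites exist or where a record is stored. This is a toy model under stated assumptions, not the GRelC DAIS API: the `Site` and `VirtualCatalog` classes, the keyword-set index, the site names, and the sample records are hypothetical, and record ids are assumed to be globally unique.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One hypothetical site holding a searchable index and the full records."""
    name: str
    index: dict = field(default_factory=dict)    # record id -> keyword set (stand-in for the relational index)
    records: dict = field(default_factory=dict)  # record id -> full document (stand-in for the XML DB)

    def query(self, keyword):
        """Return the ids of local records tagged with the keyword."""
        return [rid for rid, words in self.index.items() if keyword in words]


class VirtualCatalog:
    """Single entry point that hides the number of sites and the location of each record."""

    def __init__(self, sites):
        self._sites = list(sites)

    def search(self, keyword):
        # Fan the query out to every site and merge the answers into one sorted list.
        hits = []
        for site in self._sites:
            hits.extend(site.query(keyword))
        return sorted(hits)

    def fetch(self, record_id):
        # Locate the record without exposing its site to the caller.
        for site in self._sites:
            if record_id in site.records:
                return site.records[record_id]
        raise KeyError(record_id)


site_a = Site("site-a", {"a-1": {"ocean", "tas"}}, {"a-1": "<dataset id='a-1'/>"})
site_b = Site(
    "site-b",
    {"b-7": {"ocean"}, "b-8": {"pr"}},
    {"b-7": "<dataset id='b-7'/>", "b-8": "<dataset id='b-8'/>"},
)
catalog = VirtualCatalog([site_a, site_b])
hits = catalog.search("ocean")
print(hits)
print(catalog.fetch("b-7"))
```

Adding or removing a site changes only the list passed to `VirtualCatalog`; code that calls `search` and `fetch` stays the same, which is the kind of transparency the slide describes.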

  24. GRelC DAIS deployment in Climate-G
     Thanks to all these efforts, we published an article in the EGEE Newsletter.
     Title: Climate Modelling and EGEE
     Link: http://eu-egee.org/newsletter/automn08/Autumn08_draft.html#news5

  25. Climate-G Data Distribution Centre
     • Main Functionalities
       o Search & Discovery
       o Data access & viz
       o Metadata browsing
       o Users and roles mng
       o …
     • Features
       o Filters and listeners
       o Design Pattern approach
       o Easy-to-use interfaces
       o Platform independent
       o Secured by design
       o No additional software is required
       o It entirely replaces the Command Line Interface
     Developed by CMCC ADM Team

  26. Climate-G DDC: Snapshots

  27. Complete OPeNDAP Support

  28. Data Visualization (IDV support)

  29. For more information…
     • P.I.s: G. Aloisio, S. Fiore, S. Denvil, M. Petitdidier
     • Climate-G URL: http://grelc.unile.it:8080/ClimateG-DDC
     • Newsletter: climateg-news@sara.unisalento.it (to join, send an email to climateg-info@cmcc.it)
     • Questions/Information: climateg-info@cmcc.it
     • To request a new Grid Certificate: climateg-ca@cmcc.it
