The Climate-G testbed: issues, requirements and results
S. Fiore, Ph.D.
SPACI and University of Salento, Italy
sandro.fiore@unisalento.it
On behalf of Climate-G Team
EGI Technical Forum - Sept 16, 2010
Outline
• Introduction
• Issues and requirements
  – data, metadata
  – scientific gateways
• The Climate-G testbed
  – user requirements
  – architecture, infrastructure
  – the Climate-G Portal Snapshots
  – future work
Climate change data deluge
• The huge amount of data produced across several countries leads to:
  – Need to share data among centers at an international level
  – Need to move towards open, distributed and transparent environments
  – Need to access data easily through Scientific Data gateways
  – Need to carry out post-processing activities as well as analysis
  – Need to move towards domain-specific metadata schemas
  – Need to exploit large infrastructures to implement world-wide “production-level” environments for climate change scientists
  – More emphasis on publishing data into “global” contexts
Requirements, needs, issues and challenges (I)
• Management of data and metadata
  • Data is distributed among several centers
  • Easy joining (in terms of startup costs) for new sites
  • Metadata management needs to be distributed too
  • Local autonomy needs to be preserved
  • Data formats: mainly NetCDF, but also CSV, GRIB, etc.
  • Metadata schema
• Fine-grained and coarse-grained data management
  • Coarse-grained (e.g. climate datasets)
  • Fine-grained (e.g. impacts data)
• Main functionalities to address the first use cases
  • Data search & discovery
  • Data access (e.g. download functionality)
  • Data subsetting (e.g. slicing and dicing of data)
  • Data visualization through different tools
  • Metadata management
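To make the "data subsetting" item above concrete, here is a minimal sketch of slicing a gridded field to a latitude/longitude bounding box. It is illustrative only: the grid, the coordinate values and the helper names `index_range` and `subset` are assumptions, not part of the Climate-G software, and a real deployment would operate on NetCDF variables rather than nested lists.

```python
from bisect import bisect_left, bisect_right


def index_range(coords, lo, hi):
    """Return (start, stop) indices of the ascending coords lying within [lo, hi]."""
    return bisect_left(coords, lo), bisect_right(coords, hi)


def subset(grid, lats, lons, lat_bounds, lon_bounds):
    """Slice a 2-D grid (rows = latitude, columns = longitude) to a bounding box."""
    r0, r1 = index_range(lats, *lat_bounds)
    c0, c1 = index_range(lons, *lon_bounds)
    values = [row[c0:c1] for row in grid[r0:r1]]
    return values, lats[r0:r1], lons[c0:c1]


# Hypothetical 4 x 5 field on a regular grid (values are placeholders).
lats = [30.0, 35.0, 40.0, 45.0]
lons = [0.0, 5.0, 10.0, 15.0, 20.0]
grid = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]

values, sub_lats, sub_lons = subset(grid, lats, lons, (35.0, 40.0), (5.0, 15.0))
print(values)  # the 2 x 3 block covering the requested box
```

The same index arithmetic applies to any additional dimension (time, level), which is what makes subsetting cheap compared with downloading whole files.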
Requirements, needs, issues and challenges (II)
• Security
  • Secure access to data (at data service level)
  • Secure access to metadata (at metadata service level)
  • Secure access to the portal (security at portal level)
  • Different roles (admin, data provider, metadata contributor, etc.) must be defined at several levels to set different privileges
  • Uniform security approach
• Access to the distributed environment via “Scientific Gateways”
  • Data Distribution Centre to manage data, metadata, tools, services, users, etc.
  • Integration of services and tools widely deployed, tested and adopted by the community
  • Pervasive, easy to access, easy to extend, ubiquitous, web based
• Portal-centric infrastructure & data-centric portal
  • The data must be placed at the center of the scene
  • Multiple options must be available to manage, display, download and analyze the data as needed
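The requirement that roles carry different privileges at several levels (portal, metadata service, data service) can be sketched as a simple lookup table. The role names come from the slide; the privilege table, the action names and the `is_allowed` helper are hypothetical and only illustrate the idea, not the actual Climate-G security model.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    DATA_PROVIDER = "data provider"
    METADATA_CONTRIBUTOR = "metadata contributor"


# Hypothetical privileges per role, defined separately for each level.
GRANTS = {
    Role.ADMIN: {
        "portal": {"read", "manage_users"},
        "metadata": {"read", "write"},
        "data": {"read", "write"},
    },
    Role.DATA_PROVIDER: {
        "portal": {"read"},
        "metadata": {"read"},
        "data": {"read", "write"},
    },
    Role.METADATA_CONTRIBUTOR: {
        "portal": {"read"},
        "metadata": {"read", "write"},
        "data": {"read"},
    },
}


def is_allowed(role, level, action):
    """Check whether a role may perform an action at a given level."""
    return action in GRANTS.get(role, {}).get(level, set())


print(is_allowed(Role.DATA_PROVIDER, "data", "write"))
print(is_allowed(Role.DATA_PROVIDER, "metadata", "write"))
```

Keeping one table for all three levels is one way to approach the "uniform security approach" item: each service consults the same definition instead of maintaining its own.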
Search & Discovery: Metadata Management
• “Context” description: from data to information
• Exhaustive schemas are needed
  – Domain-based schema, community-driven vocabulary
  – Examples come from ES Curator and Metafor
• Provenance metadata (to identify, trace and record the history of data) are still a challenge today
• Metadata Tools and Services
• Metadata services to manage project/experiment/dataset descriptions
• Main approaches
  – DBMS based
  – Grid based
  – OGC based
• The definition of common interfaces is an ongoing process
  – OGC, OGF and other standardization bodies
  – Standardization activity is still needed
• Metadata tools (automatic extraction, ingestion, validation, etc.) are needed because of the large volume of metadata
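As an illustration of the automatic validation step mentioned above, the sketch below checks a metadata record against a minimal schema before ingestion. The schema fields and the `validate` function are assumptions made for this example; they do not reproduce the Metafor or ES Curator schemas.

```python
# Hypothetical minimal schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "project": str,
    "experiment": str,
    "dataset": str,
    "variables": list,
}


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
        elif not record[name]:
            problems.append(f"{name}: must not be empty")
    return problems


# Placeholder records, not real Climate-G entries.
good = {"project": "demo", "experiment": "run1", "dataset": "ds1", "variables": ["tas"]}
bad = {"project": "demo", "experiment": "", "variables": "tas"}

print(validate(good))
print(validate(bad))
```

Returning all problems at once, rather than stopping at the first, suits bulk ingestion where a contributor needs a full report per record.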
Scientific Gateways (I)
• From Data Portals to Scientific Data Gateways
• From simple web data access applications to rich integrated environments
  – Besides data, users can find rich metadata descriptions, data visualization tools, a wide variety of services, etc.
• Data-centric approach
  – Looking at the same data from different perspectives
  – Looking at the same data in complementary ways
• Different grain-level approach for the data services
  – From coarse (file access/download) to fine (variable aggregation)
  – Union: join data from different datasets
  – Tiling: join data along an existing dimension
  – …
• Different metadata support approach
  – Domain based, community driven, widely adopted
  – Metafor (Europe)
  – ES Curator (US)
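The two aggregation modes named above can be sketched on toy data. The sketch follows the slide's own definitions (union joins variables from different datasets; tiling joins data along an existing dimension); representing a dataset as a dictionary of lists, and the function names `union` and `tile`, are simplifying assumptions.

```python
def union(*datasets):
    """Combine the variables of several datasets; variable names must not clash."""
    merged = {}
    for ds in datasets:
        for name, values in ds.items():
            if name in merged:
                raise ValueError(f"variable {name!r} appears in more than one dataset")
            merged[name] = list(values)
    return merged


def tile(*datasets):
    """Concatenate datasets holding the same variables along their shared dimension."""
    names = set(datasets[0])
    if any(set(ds) != names for ds in datasets):
        raise ValueError("all datasets must hold the same variables")
    return {name: [v for ds in datasets for v in ds[name]] for name in datasets[0]}


# Hypothetical datasets: each variable is a short series along one dimension (e.g. time).
temperature = {"tas": [280.1, 281.3]}
precipitation = {"pr": [1.2, 0.0]}
later_temperature = {"tas": [282.0, 279.5]}

print(union(temperature, precipitation))
print(tile(temperature, later_temperature))
```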
Scientific Gateways (II)
• From Data Portals to Scientific Data Gateways
• Towards a Web2.0 approach
  – Usability and sharing as key concepts in Web2.0
  – From personal websites to blogging
  – From publishing to participation
  – From content management systems to wikis
  – Mashups, widgets and tagging are some important features of Web2.0
  – A good reference on Web2.0 (Tim O’Reilly): http://oreilly.com/web2/archive/what-is-web-20.html
• Stronger integration of scientific, collaborative and social aspects
  – Social networking capabilities are poorly exploited today, but they can increase the level of discussion, feedback, data exploitation, scientific results and dissemination among different groups, scientific teams, etc.
Scientific Gateways (III)
CMCC Workshop - June 10-12, 2009 - Ugento
A real use case: the Climate-G testbed
The main goal of Climate-G is to create an open and unified environment for climate change, enabling geographical and cross-institutional data discovery, access, analysis, visualization and sharing. This effort has been conceived as a proof of concept for the technologies involved (in particular the GRelC service) and was supported during the EGEE project by the Earth Science Cluster Community. It acts as a virtual laboratory involving partners in both Europe and the US.
The Climate-G partnership
The central role of the User Community
Key assumption:
• “The user community must be an active part in the whole process (requirements, tools to be integrated into the system, semantics of metadata, feedback and validation, list of priorities, meetings, etc.)”
• Several partners of the Climate-G testbed work in the Earth Sciences and Environmental domains
• Most of the users (about 80%) come from the target community
• The activity has been disseminated at Geosciences conferences: EGU09, EGU2010, ESA2009, AGU2010 (tentative), etc.
  • Attract new users
  • Identify new needs and requirements
  • Define new use cases
  • Improve the existing software
  • …
Data and Metadata distribution
Grid Metadata Service: GRelC (EGEE RESPECT)
Portal-centric view of the infrastructure
Climate-G Portal
• Main Functionalities
  o Search & Discovery
  o Data access & visualization
  o Metadata management
  o Users and roles management
  o List of experiments
  o List of entries/datasets
• Features
  o Easy-to-use interfaces
  o Platform independent
  o Secured by design
  o No additional software is required
  o It entirely replaces the Command Line Interface
  o JSP/Servlets based, AJAX (dynamic web pages)
  o Fast adoption of components in mashups, like Google Maps
How are data organized?
[Figure: data hierarchy with the levels Projects, Experiments, Datasets and Variable, including a relation labelled 1:1. Institutions shown: IPSL/CNRS, Fraunhofer-SCAI, University of Cantabria, Euro-Med Centre for Climate Change.]
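The hierarchy named on this slide (projects, experiments, datasets, variable) can be sketched as nested record types. The figure itself is not recoverable here, so the cardinalities below are assumptions: a project holds many experiments, an experiment holds many datasets, and each dataset carries one variable. All class, field and record names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Dataset:
    name: str
    variable: str  # assumption: one variable per dataset


@dataclass
class Experiment:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Project:
    name: str
    experiments: List[Experiment] = field(default_factory=list)

    def find_datasets(self, variable: str) -> List[Tuple[str, str]]:
        """Return (experiment, dataset) name pairs that provide the given variable."""
        return [
            (exp.name, ds.name)
            for exp in self.experiments
            for ds in exp.datasets
            if ds.variable == variable
        ]


# Placeholder content, not real Climate-G entries.
project = Project(
    "demo-project",
    [
        Experiment("run-a", [Dataset("a-tas", "tas"), Dataset("a-pr", "pr")]),
        Experiment("run-b", [Dataset("b-tas", "tas")]),
    ],
)

print(project.find_datasets("tas"))
```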
Climate-G Portal: Snapshots
Climate-G: domain based services/tools
Climate-G includes domain-based services & tools in the infrastructure
- User community requirement: domain-based services as part of the infrastructure
- They provide domain-specific tasks and are well known, tested and widely adopted
- Legacy systems already available and accessible
Some examples:
• OPeNDAP (OPeNDAP Consortium)
  • Provides access to climate data sources
  • Widely adopted in the Climate community
• nc Web Map Service (Univ. of Reading)
  • HTTP interface for requesting geo-registered map images from geospatial databases
• Integrated Data Viewer (UNIDATA, UCAR) and Godiva2 (Univ. of Reading)
  • Data visualization tools widely adopted by the Climate community
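The "HTTP interface for requesting geo-registered map images" can be illustrated with a sketch that builds a Web Map Service GetMap request URL. The query parameters are the standard WMS 1.3.0 ones; the endpoint, layer name and bounding box are placeholders, and `getmap_url` is a hypothetical helper rather than part of any Climate-G component.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def getmap_url(endpoint, layer, bbox, width, height, image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",      # empty value requests the server's default style
        "CRS": "CRS:84",   # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and layer; no request is actually sent.
url = getmap_url("https://example.org/wms", "dataset/tas", (-10.0, 30.0, 40.0, 60.0), 512, 307)
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
print(query["BBOX"])
```

Because the whole request is a plain URL, a portal page can embed the resulting image directly or hand the same URL to a mapping widget.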
Data Access - Complete OPeNDAP Support
Data Visualization (IDV support)
Godiva2 Integration
• Two-dimensional data visualization tool
• Google Earth