Distributed data management, the power and challenge of metadata
Øystein Godøy
Why bother with data management?
● Maximise public investment in data collection and production
● Promote scientific collaboration
● Promote interdisciplinary science
● Promote scientific transparency
● Leave a legacy
● Science paradigms (according to Jim Gray)
    ● empirical science
    ● theoretical science
    ● computational science
    ● data exploration science
Document data through metadata
● Discovery level
    ● What
    ● Where
    ● When
    ● Who
    ● Constraints on sharing and usage
    ● How to access data
    ● ...
● Use level
    ● Variable names/descriptions
    ● Missing values
    ● Units
    ● Coordinates
    ● Interdependency between variables
    ● Linkages between data
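
The two metadata levels can be illustrated as a pair of record structures. This is a minimal sketch in Python; the class names, field names and example values are assumptions made for illustration and are not taken from any specific metadata standard or from the METSIS software.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryMetadata:
    """Discovery level: enough to find a dataset and decide whether to fetch it."""
    title: str            # what
    bounding_box: tuple   # where: (south, north, west, east) in degrees
    time_range: tuple     # when: (start, end) as ISO 8601 date strings
    contact: str          # who
    use_constraint: str   # constraints on sharing and usage
    access_url: str       # how to access data


@dataclass
class VariableMetadata:
    """Use level: enough to interpret the values of one variable."""
    name: str
    description: str
    units: str
    missing_value: float
    coordinates: list = field(default_factory=list)


def is_discoverable(record: DiscoveryMetadata) -> bool:
    """Return True if every discovery-level field is filled in."""
    return all(bool(value) for value in vars(record).values())


# Illustrative values only; this is not a real dataset.
record = DiscoveryMetadata(
    title="Example sea ice concentration",
    bounding_box=(60.0, 90.0, -180.0, 180.0),
    time_range=("2010-01-01", "2010-12-31"),
    contact="data.manager@example.org",
    use_constraint="Free with attribution",
    access_url="https://example.org/data/ice_conc.nc",
)
variable = VariableMetadata(
    name="ice_conc",
    description="Sea ice area fraction",
    units="1",
    missing_value=-999.0,
    coordinates=["time", "lat", "lon"],
)
```

A check such as `is_discoverable(record)` is the kind of test a harvester could apply before exposing a record in a search interface; which fields are mandatory is a policy choice, not something the slides specify.
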
Metadata
The toolbox
METSIS implementations
● Arctic Data Centre (WMO DCPC)
● WMO Global Cryosphere Watch
● EU FP6 DAMOCLES
● EU FP7 ACCESS
● EUMETSAT Ocean and Sea Ice SAF (High Latitude centre)
● International Polar Year
    ● National node in Norway
    ● International operational data coordination
● Norwegian Satellite Earth Observation Database for Marine and Polar Research (RCN)
● CryoClim: climate consistent satellite remote sensing products for the cryosphere (ESA/NRS)
● Svalbard Integrated Arctic Observing System (demo)
● SAON (demo)
NORMAP
● Norwegian Satellite Earth Observation Database for Marine and Polar Research
● http://normap.nersc.no/
● Satellite products
    ● Level 2 and higher
● Distributed data repository
    ● Nansen Environmental and Remote Sensing Centre
    ● Norwegian Meteorological Institute
    ● Kongsberg Satellite Services
    ● (CERSAT)
● Features
    ● Subscription
    ● Collocated visualisation
    ● Transformation of individual products
    ● (Transformation of multiple products to a common reference)
Global Cryosphere Watch
● A coordinated framework for
    ● Observations
    ● Data management
    ● Monitoring
    ● Assessment
    ● Product development
● Focusing on the current and future state of the cryosphere
● WMO Information System
    ● Relies on interoperability interfaces
● Much of the data is of scientific origin
● Data are served from the host data centre
Collocated visualisation
Lessons learned harvesting
● End point availability
● Few dedicated subsets of available data
● Filtering of records is required, but requires e.g. proper keywords
● CSDGM and ISO19115 metadata often lack standardised controlled vocabularies
● Transformation of metadata to the search model is done using XSLT and SKOS
● Harvest semi-automatically at first, to fully understand the nature of each linked data centre
● Interfaces to data, and documentation of these, are frequently lacking
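
The filtering and vocabulary lessons above can be sketched in code. This is a minimal Python illustration and not the METSIS implementation, which according to the slide uses XSLT and SKOS; the record layout, the keyword mapping and the function names are all assumptions. The dictionary stands in for the role a SKOS mapping between vocabularies would play.

```python
# Hypothetical mapping from free-text keywords found in harvested records
# to the terms of a single controlled vocabulary.
KEYWORD_MAP = {
    "sea ice": "Sea Ice",
    "seaice": "Sea Ice",
    "snow cover": "Snow Cover",
    "sst": "Sea Surface Temperature",
}


def normalise_keywords(keywords):
    """Map raw keywords to controlled terms; drop those with no mapping."""
    terms = set()
    for raw in keywords:
        term = KEYWORD_MAP.get(raw.strip().lower())
        if term is not None:
            terms.add(term)
    return sorted(terms)


def filter_records(records, wanted_terms):
    """Keep records that carry at least one wanted controlled term."""
    wanted = set(wanted_terms)
    kept = []
    for record in records:
        terms = normalise_keywords(record.get("keywords", []))
        if wanted.intersection(terms):
            kept.append({**record, "keywords": terms})
    return kept


# Illustrative harvested records; identifiers and keywords are made up.
harvested = [
    {"id": "a", "keywords": ["SeaIce", "Arctic"]},
    {"id": "b", "keywords": ["precipitation"]},
    {"id": "c", "keywords": []},  # no keywords: cannot be selected
]
selected = filter_records(harvested, ["Sea Ice", "Snow Cover"])
```

Record "c" shows why proper keywords matter: a record without keywords, or with keywords outside any known mapping, is silently excluded by this kind of filter.
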
Main challenges
● Standardised controlled vocabularies in machine readable form
● Filtering
● Transformation
● Identification of interfaces
● Metadata granularity differs
● Propagation of metadata to global frameworks
    ● Duplication of records
    ● Require agreement to propagate metadata
● Interpretation of standards
● Not all are using standard interfaces to data
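
Duplication of records may arise, for example, when the same dataset is harvested through more than one data centre. One possible way to detect such duplicates is to compare a normalised fingerprint of each record. The sketch below is in Python and assumes hypothetical fields (`title`, `start`, `end`, `source`); it is not a method described in the slides.

```python
import hashlib


def fingerprint(record):
    """Hash the fields assumed to identify a dataset, ignoring case and spacing."""
    parts = [
        " ".join(record["title"].lower().split()),
        record["start"],
        record["end"],
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record seen for each fingerprint."""
    seen = {}
    for record in records:
        seen.setdefault(fingerprint(record), record)
    return list(seen.values())


# Illustrative records; titles, dates and source names are made up.
records = [
    {"title": "Sea Ice Concentration", "start": "2010-01-01",
     "end": "2010-12-31", "source": "centre-a"},
    {"title": "sea  ice concentration", "start": "2010-01-01",
     "end": "2010-12-31", "source": "centre-b"},
    {"title": "Snow Cover", "start": "2010-01-01",
     "end": "2010-12-31", "source": "centre-a"},
]
unique = deduplicate(records)
```

Which fields identify a dataset depends on metadata granularity, so a fingerprint like this only works where the contributing centres describe data at a comparable level.
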
Summary
● Integrating metadata sources is challenging because it requires integrating technology and science through documentation standards and controlled vocabularies
● An increasing number of data centres support interoperability standards, but interpretation of the standards often differs
● Metadata brokering is the short-term key to integrating data centres