Web tools for accessing and disseminating data of different formats
Mauro Michielon
ICT Development
NTTS conference, Brussels, 12th March 2015
EEA’s context
The EEA has to deal with a vast number of very heterogeneous datasets coming from Member States, EU institutions, universities, research centres and the private sector. This is a challenging task, and there has long been a need for procedures that streamline incoming data in order to overcome:
• inconsistent dataset layouts, which have led to data management difficulties
• poor interoperability due to data format fragmentation
• a constant need to re-adapt processing chains…
• …or a need for constant manual interaction to normalize datasets
• …and the resulting instability of layouts in data visualizations
Technological advancements based on Linked Open Data and web-based interactive visualization libraries are helping to remove these obstacles for our stakeholders.
Data storage – Open Data – Virtuoso OpenLink
Linked Open Data approach, with a triple store based on: https://github.com/openlink/virtuoso-opensource
Data transformation to RDF and ingestion from reliable sources are automated.
Datasets are exposed and can be queried via the EEA’s public endpoint: http://semantic.eea.europa.eu/sparql
Data coming from different sources can be linked through the use of common dictionaries.
The endpoint processes requests written in the SPARQL query language. Results can be returned as:
• human-readable formats: HTML, CSV, TSV
• machine-to-machine formats: JSON, XML, XML+schema
• Interoperability: easier (machine-to-machine) exchange of data with partner institutions
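As an illustration of how a client might ask the endpoint for one of these output formats, here is a minimal Python sketch using only the standard library. It assumes the endpoint follows the standard SPARQL protocol (a `query` parameter plus an `Accept` header for content negotiation); the names `build_request`, `run_query` and `MEDIA_TYPES` are hypothetical, and whether the endpoint honours each media type should be verified.

```python
import urllib.parse
import urllib.request

# Endpoint URL taken from the slide.
ENDPOINT = "http://semantic.eea.europa.eu/sparql"

# Standard media types for some of the formats listed on the slide.
# Assumption: the endpoint selects the output format from the Accept header.
MEDIA_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_request(query: str, fmt: str = "json") -> urllib.request.Request:
    """Build (but do not send) a SPARQL protocol GET request."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def run_query(query: str, fmt: str = "json", timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_request(query, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request does not touch the network; run_query() would.
example = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5", "csv")
print(example.full_url)
print(example.get_header("Accept"))
```

Keeping request construction separate from sending makes it possible to inspect the generated URL and headers before contacting the public endpoint.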
Data visualization: Daviz
Daviz: https://github.com/eea/eea.daviz (based on the Google Charts libraries)
A data visualization tool developed and used by the EEA to create interactive data visualizations.
Daviz can consume the output of a SPARQL query and visualize the results:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
    PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
    PREFIX property: <http://rdfdata.eionet.europa.eu/eurostat/property#>
    PREFIX geo: <http://dd.eionet.europa.eu/vocabulary/eurostat/geo/>
    PREFIX unit: <http://dd.eionet.europa.eu/vocabulary/eurostat/unit/>
    PREFIX product: <http://dd.eionet.europa.eu/vocabulary/eurostat/product/>
    PREFIX indic_nrg: <http://dd.eionet.europa.eu/vocabulary/eurostat/indic_nrg/>
    PREFIX sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#>

    SELECT year(?date) as ?date
           ?product_label
           ( sum(?B_100900) - COALESCE(sum(?B_101600),0) ) as ?value
    WHERE {
      {
        GRAPH <http://rdfdata.eionet.europa.eu/eurostat/data/nrg_100a.rdf.gz> {
          _:nrg_100a sdmx-dimension:refArea ?geo .
          FILTER (?geo = geo:EU28) .
          _:nrg_100a sdmx-attribute:unitMeasure unit:1000TOE .
          _:nrg_100a sdmx-dimension:timePeriod ?date .
          _:nrg_100a property:product ?product .
          FILTER (?product in (product:2000, product:3000, product:4000, product:5100, product:5500))
          {
            _:nrg_100a property:indic_nrg indic_nrg:B_100900 .
            _:nrg_100a sdmx-measure:obsValue ?B_100900 .
          }
          UNION
          {
            _:nrg_100a property:indic_nrg indic_nrg:B_101600 .
            _:nrg_100a sdmx-measure:obsValue ?B_101600 .
          }
        }
        ?product rdfs:label ?product_label .
      }
    }
    GROUP BY ?date ?product_label ?product
    ORDER BY ?date ?product_label

The query is published: it serves as the document that exposes the methodology to the general public.
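To make the step from query output to chart input more concrete, the following Python sketch flattens a response in the standard SPARQL 1.1 JSON results layout and pivots it into one row per date and one column per product label, the kind of table a charting library typically expects. This is not Daviz code: the functions `to_rows` and `pivot` are hypothetical, and the sample labels and numbers are invented for illustration, not real Eurostat values.

```python
from collections import defaultdict

# Hypothetical response using the ?date, ?product_label and ?value
# variables of the query. All values below are invented.
SAMPLE = {
    "head": {"vars": ["date", "product_label", "value"]},
    "results": {"bindings": [
        {"date": {"type": "literal", "value": "2011"},
         "product_label": {"type": "literal", "value": "Gas"},
         "value": {"type": "literal", "value": "10.5"}},
        {"date": {"type": "literal", "value": "2011"},
         "product_label": {"type": "literal", "value": "Oil"},
         "value": {"type": "literal", "value": "20.0"}},
        {"date": {"type": "literal", "value": "2012"},
         "product_label": {"type": "literal", "value": "Gas"},
         "value": {"type": "literal", "value": "11.0"}},
    ]},
}


def to_rows(result):
    """Flatten SPARQL JSON bindings into plain {variable: value} dicts."""
    variables = result["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in result["results"]["bindings"]
    ]


def pivot(rows):
    """Return a table: header row, then one row per date with one column per label."""
    labels = sorted({row["product_label"] for row in rows})
    by_date = defaultdict(dict)
    for row in rows:
        by_date[row["date"]][row["product_label"]] = float(row["value"])
    table = [["date"] + labels]
    for date in sorted(by_date):
        # Missing date/label combinations are left as None.
        table.append([date] + [by_date[date].get(label) for label in labels])
    return table


for line in pivot(to_rows(SAMPLE)):
    print(line)
```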
Key facts
By combining Linked Open Data techniques with the Daviz application in the context of web products (indicators, SOER 2015, etc.), we try to enforce:
• Consistency: through reusable programmatic procedures for data processing
• Transparency: methodologies and algorithms are made public (allowing QC/QA by the general public)
• Traceability: data harvesting processes and SPARQL query executions are time-stamped
consistency + transparency + traceability = trust
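The traceability point can be sketched as a small time-stamped record of a query execution. The slides do not describe how the EEA actually stores this information, so the record layout, the `execution_record` function and the use of a SHA-256 digest to identify the query text are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def execution_record(query, endpoint, when=None):
    """Describe one query execution: where, what (as a digest) and when."""
    # Hypothetical layout; default to the current UTC time if none is given.
    when = when or datetime.now(timezone.utc)
    return {
        "endpoint": endpoint,
        # The digest changes whenever the published query text changes.
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "executed_at": when.isoformat(),
    }


record = execution_record(
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
    "http://semantic.eea.europa.eu/sparql",
)
print(record)
```

Storing a digest next to the timestamp would let a reader check that a visualization was produced by exactly the query text that was published.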