Adding Biodiversity Datasets from Argentinian Patagonia to the Web of Data

  1. Adding Biodiversity Datasets from Argentinian Patagonia to the Web of Data S4BioDiv 2017 – 2nd International Workshop on Semantics for Biodiversity Marcos Zárate, CENPAT - CONICET Germán Braun, GILIA - UNCOMA Pablo Fillottrani, DCIC - UNS

  2. Motivation
     • A steadily growing wealth of biodiversity data from a wide range of disciplines is available through online information systems around the world.
     • The biodiversity community has standardized shared vocabularies such as Darwin Core (DwC), together with publishing platforms such as the Integrated Publishing Toolkit (IPT).

  3. Motivation
     • Since 2011, CENPAT has publicly shared its biodiversity data under an Open Data license.
     • These data are available as Darwin Core Archives (DwC-A) through IPT (http://ipt.cenpat-conicet.gob.ar:8081/).

  4. The Problem
     • The IPT platform focuses on publishing content in unstructured or semi-structured formats, which limits interoperability with other datasets and makes the data harder for machines to access.
     • Although DwC is defined in an RDF document, the integration of biodiversity data into the Semantic Web (SW) is still in its early stages.

  5. Proposed Solution
     • We present a transformation process to publish biodiversity data as RDF datasets.
     • This process uses OpenRefine and RDF Refine to generate RDF triples and define URIs.
     • We use GraphDB for storing, browsing, accessing and linking the data with external RDF datasets.

  6. Proposed Architecture

  7. URI Definition
     • To generate a URI for each resource, we use GREL (General Refine Expression Language), which is provided by OpenRefine.
     • The general structure of the URIs is:
       ▫ http://[base uri]/[DwC class]/[value]
     • The resulting RDF triple for an occurrence is:
       ▫ SUBJECT <base_uri/occurrence/f6bbf85d-85ea-4605-87fad81aca73a1cd>
       ▫ PREDICATE rdf:type
       ▫ OBJECT dwc:Occurrence
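The URI pattern can also be illustrated outside OpenRefine. The Python snippet below is only a minimal sketch, not the GREL expression used in the project: the base URI `http://example.org` and the helper names `mint_uri` and `type_triple` are assumptions made for illustration. The `rdf:` and `dwc:` namespaces are the standard ones, and the occurrence identifier is the one shown on the slide.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace
BASE_URI = "http://example.org"  # placeholder: the project's real base URI is not given


def mint_uri(dwc_class: str, value: str, base_uri: str = BASE_URI) -> str:
    """Build http://[base uri]/[DwC class]/[value], percent-encoding the value."""
    path_class = dwc_class.lower()
    path_value = quote(value.strip(), safe="")
    return f"{base_uri.rstrip('/')}/{path_class}/{path_value}"


def type_triple(dwc_class: str, value: str) -> str:
    """Return one N-Triples line that types the resource as the given DwC class."""
    subject = mint_uri(dwc_class, value)
    return f"<{subject}> <{RDF_TYPE}> <{DWC}{dwc_class}> ."


if __name__ == "__main__":
    # Identifier copied from the slide's example occurrence.
    print(type_triple("Occurrence", "f6bbf85d-85ea-4605-87fad81aca73a1cd"))
```

Percent-encoding the value is a design choice of this sketch; it keeps the URI valid when an identifier contains spaces or other reserved characters.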

  8. Exploitation: Conservation Status of Species
     • Information about the conservation status of species is missing from the CENPAT datasets.

  9. Exploitation: Occurrences by Year
     • A SPARQL query retrieves the number of occurrences per year, showing how they are distributed over time; the results are visualized using R and the ggplot2 package.
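A query of this kind might look like the sketch below. It is an illustration under assumptions, not the project's actual query: it supposes that occurrences are typed `dwc:Occurrence` and carry a `dwc:eventDate` literal whose first four characters are the year. The names `YEAR_QUERY` and `counts_by_year` are hypothetical. The helper reads the standard SPARQL JSON results format and returns a year-to-count mapping that could be handed to any plotting tool.

```python
import json
from typing import Dict

# Hypothetical query: the property layout is an assumption about the RDF model.
YEAR_QUERY = """
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?year (COUNT(?occ) AS ?n)
WHERE {
  ?occ a dwc:Occurrence ;
       dwc:eventDate ?date .
  BIND(SUBSTR(STR(?date), 1, 4) AS ?year)
}
GROUP BY ?year
ORDER BY ?year
"""


def counts_by_year(sparql_json: str) -> Dict[str, int]:
    """Convert a SPARQL JSON result for YEAR_QUERY into {year: count}."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return {row["year"]["value"]: int(row["n"]["value"]) for row in bindings}
```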

  10. Exploitation: Locations of Marine Mammals
     • A SPARQL query retrieves the locations (latitude and longitude) of occurrences of the species Mirounga leonina; the results are visualized using R and the ggmap package.
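As a rough sketch of how such a location query could be sent to a SPARQL endpoint over HTTP, the snippet below uses only the Python standard library. Several things are assumptions: the flat modelling with `dwc:scientificName`, `dwc:decimalLatitude` and `dwc:decimalLongitude` attached directly to the occurrence, the plain-literal species name, and the helper names. The endpoint URL is the one listed under "Links of interest"; whether it is still reachable is not guaranteed.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the "Links of interest" slide; availability is not guaranteed.
ENDPOINT = "http://crowd.fi.uncoma.edu.ar:3333/repositories/BIO_CNP_GILIA"

# Hypothetical query: the actual RDF model of the project may differ.
LOCATION_QUERY = """
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?lat ?long
WHERE {
  ?occ a dwc:Occurrence ;
       dwc:scientificName "Mirounga leonina" ;
       dwc:decimalLatitude ?lat ;
       dwc:decimalLongitude ?long .
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_points(sparql_json: str) -> List[Tuple[float, float]]:
    """Extract (latitude, longitude) pairs from a SPARQL JSON result."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return [(float(b["lat"]["value"]), float(b["long"]["value"])) for b in bindings]


def fetch_points(query: str = LOCATION_QUERY) -> List[Tuple[float, float]]:
    """Run the query against the endpoint; requires network access."""
    with urlopen(build_request(query), timeout=30) as response:
        return parse_points(response.read().decode("utf-8"))
```

Only `fetch_points()` touches the network; `build_request` and `parse_points` can be checked offline, and the resulting coordinate pairs can be passed to any mapping tool.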

  11. Results
     • In this initial stage, only a few datasets were converted to RDF; our platform stored 502.00 RDF triples.
     • To help users exploit the dataset, we defined some SPARQL queries and corresponding visualizations using the statistical software R.

  12. Future work
     • We plan to automate some tasks of the process and to interlink with more datasets.
     • We also plan to provide easier SPARQL access for non-expert users.
     • We are analyzing other ontologies such as ENVO, NCBI and OWL-Time, and working on a suite of complementary ontologies for describing every aspect of semantic biodiversity.

  13. Links of interest
     • GitHub project
       ▫ https://github.com/cenpat-gilia/CENPAT-GILIA-LOD
     • SPARQL Endpoint
       ▫ http://crowd.fi.uncoma.edu.ar:3333/repositories/BIO_CNP_GILIA
     • R scripts
       ▫ https://github.com/cenpat-gilia/CENPAT-GILIA-LOD/tree/master/r-scripts

  14. Thank you for your attention
