
Semantic integration of bibliographic records (Linked Open Data)



  1. Semantic integration of bibliographic records (Linked Open Data). Author: Malakhov D. A.

  2. Introduction. There are many different sources of library data. Each organization can use only its own information, which is not connected with other sources. Integration through the LOD (Linked Open Data) space is a universal solution to this problem. LOD was created to integrate as much information as possible within each subject area. Publishing data in this space makes it possible to enrich the information and to provide access to it.

  3. Formulation of the problem. The purpose is to integrate the bibliographic records of the NLR (National Library of Russia) with the records of the BNB (British National Bibliography). The NLR dataset contains millions of records (the test set has 17 thousand). The BNB dataset consists of 3.5 million records and is already published as LOD. To reach this goal, two problems have to be solved: publishing the NLR data according to the principles of LOD, and integrating the NLR data with the BNB data.

  4. Publication of data according to the principles of LOD. The steps needed to publish the data: describing the subject area (creating an ontology); converting the NLR data (RUSMARC/bin) to RDF; configuring a semantic RDF repository for the NLR data; providing access to the NLR data (via HTTP and SPARQL).

  5. Ontology. Three vocabularies were considered for representing bibliographic records in RDF: MODS, the data model of the Library of Congress (USA); Dublin Core, a set of terms for describing network resources; FOAF, a set of terms for describing a person. The BNB publishes its data using Dublin Core and FOAF, so the same vocabularies were used for the NLR data.
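As an illustration of what a record described with Dublin Core and FOAF could look like, here is a minimal Python sketch that emits N-Triples for one record and its author. The `example.org` URIs, the function names, and the choice of exactly these four properties are assumptions for illustration; the slides do not show the actual NLR ontology.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as an N-Triples literal (escapes backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_record(record_uri, title, author_uri, author_name):
    """Return N-Triples lines for one bibliographic record and its author.

    Hypothetical helper: the property selection is an assumption.
    """
    return [
        f"<{record_uri}> <{DCTERMS}title> {literal(title)} .",
        f"<{record_uri}> <{DCTERMS}creator> <{author_uri}> .",
        f"<{author_uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f"<{author_uri}> <{FOAF}name> {literal(author_name)} .",
    ]


if __name__ == "__main__":
    for line in describe_record(
        "http://example.org/nlr/record/1",   # hypothetical record URI
        "War and Peace",
        "http://example.org/nlr/person/1",   # hypothetical author URI
        "Tolstoy, L. N.",
    ):
        print(line)
```

Modelling the author as a separate `foaf:Person` resource, rather than as a plain string, is what would later allow the same person to be linked across datasets.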

  6. Ontology (figure).

  7. Preparation of RDF. An XSLT transformation (RUSMARC/xml to RDF) was prepared and used to convert the RUSMARC/bin data to RDF.
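The field-extraction step of such a conversion can be sketched in Python. This is a stand-in for the XSLT transformation, not the author's stylesheet: the XML element and attribute names, the sample record, and the field mapping (200$a as title, 700$a as author, 010$a as ISBN, following UNIMARC conventions) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical RUSMARC/XML fragment; real exports may use other
# element names, attributes, or namespaces.
SAMPLE = """<record>
  <field id="010"><subfield id="a">0-00-000000-0</subfield></field>
  <field id="200"><subfield id="a">War and Peace</subfield></field>
  <field id="700"><subfield id="a">Tolstoy</subfield></field>
</record>"""

# Assumed mapping from (field, subfield) to a property name.
FIELD_MAP = {
    ("200", "a"): "title",
    ("700", "a"): "author",
    ("010", "a"): "isbn",
}


def extract_fields(xml_text):
    """Collect the mapped subfields of one record into a dict.

    Only the first occurrence of each mapped subfield is kept.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for field in root.iter("field"):
        for sub in field.iter("subfield"):
            key = FIELD_MAP.get((field.get("id"), sub.get("id")))
            if key and sub.text:
                result.setdefault(key, sub.text.strip())
    return result


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

The resulting dictionary would then be serialized as RDF triples; that step is omitted here.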

  8. Storage creation. There are several ways to store semantic data: in a relational database, or in the TDB format. Three APIs for semantic storage were considered: Jena, Sesame, and Virtuoso. The TDB format and Jena were selected.

  9. Providing access to the NLR data. The Jetty server was chosen for processing HTTP requests. When the server is asked for a record, an author, or links, it retrieves the full information about the object from the semantic storage via SPARQL. The Fuseki endpoint, configured with the Pellet OWL reasoner, was chosen for processing SPARQL queries to the storage.
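A lookup of this kind can be sketched as a SPARQL query sent over HTTP GET, as defined by the SPARQL protocol. The endpoint URL, dataset name, and record URI below are assumptions, and the sketch only builds the request so that it can be inspected without a running server.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, record_uri):
    """Build a GET request that asks for every property of one resource."""
    query = f"SELECT ?p ?o WHERE {{ <{record_uri}> ?p ?o }}"
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


if __name__ == "__main__":
    request = build_sparql_request(
        "http://localhost:3030/nlr/query",   # assumed Fuseki query endpoint
        "http://example.org/nlr/record/1",   # hypothetical record URI
    )
    print(request.full_url)
```

Against a running endpoint, `urllib.request.urlopen(request)` would send the query and return the result bindings as JSON.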

  10. Creating links. A clustering algorithm was developed to create the links; documents are linked through clusters. The algorithm has two steps: 1) clusters are created from a base dataset (in a few passes over this set); 2) the remaining elements are distributed among the clusters (in one pass over these elements). First, clusters were created from the NLR data. Then the BNB data were distributed among these clusters. The links between documents and clusters are represented in RDF.
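The two-step scheme can be sketched as follows. The slides do not state which features or similarity measure were used, so this sketch swaps in a simple leader-style clustering over title tokens with Jaccard similarity; the threshold, the number of passes, and all function names are assumptions.

```python
from collections import Counter


def tokens(title):
    """Represent a title as a set of lowercase words (assumed feature)."""
    return frozenset(title.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest(centers, item):
    """Return (index, similarity) of the most similar center, or (None, 0.0)."""
    best_index, best_sim = None, 0.0
    for index, center in enumerate(centers):
        sim = jaccard(center, item)
        if sim > best_sim:
            best_index, best_sim = index, sim
    return best_index, best_sim


def majority(members):
    """Tokens that occur in more than half of the members."""
    counts = Counter(tok for member in members for tok in member)
    return frozenset(t for t, n in counts.items() if 2 * n > len(members))


def build_clusters(titles, threshold=0.5, passes=3):
    """Step 1: build cluster centers in a few passes over the base set."""
    items = [tokens(t) for t in titles]
    centers = []
    # First pass: an item that fits no existing center starts a new cluster.
    for item in items:
        index, sim = nearest(centers, item)
        if index is None or sim < threshold:
            centers.append(item)
    # Further passes: reassign items, then recompute each center.
    for _ in range(passes - 1):
        members = [[] for _center in centers]
        for item in items:
            index, _sim = nearest(centers, item)
            if index is not None:
                members[index].append(item)
        # Keep the old center if a cluster lost its members or its majority set is empty.
        centers = [majority(m) or c if m else c for m, c in zip(members, centers)]
    return centers


def assign(titles, centers, threshold=0.5):
    """Step 2: place each remaining title into a cluster in a single pass."""
    links = {}
    for title in titles:
        index, sim = nearest(centers, tokens(title))
        if index is not None and sim >= threshold:
            links[title] = index
    return links


if __name__ == "__main__":
    nlr = ["war and peace", "war and peace volume one",
           "anna karenina", "crime and punishment"]
    bnb = ["War and Peace", "Anna Karenina a novel", "Moby Dick"]
    centers = build_clusters(nlr)
    print(assign(bnb, centers))
```

Titles that match no cluster closely enough stay unlinked. Each resulting (document, cluster) pair could then be written out as an RDF triple.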

  11. System architecture (figure).

  12. Conclusion. Further work can be carried out in the following areas: full-text search in titles and descriptions; a distributed semantic repository; search by the UDC and BDC classifiers; search by ISSN and ISBN.

  13. Thank you for your attention!
