
Linking library data: contributions and role of subject data



  1. Linking library data: contributions and role of subject data
     Nuno Freire, The European Library

  2. Outline
     • Introduction to The European Library
     • Motivation for Linked Library Data
     • The European Library Open Dataset
     • Linking Subject Data
       • Linking person names
       • Linking place names
       • Linking other entities/concepts
     • Concluding remarks

  3. About The European Library
     • Project started in 1996; fully operational service since 2005
     • Membership of national and research libraries of 47 Council of Europe states
     • European hub of library bibliographic resources and full-text collections

  4. Library data aggregator
     • Library domain aggregator for Europeana
       • Digital collections of cultural heritage resources
       • Aggregation of metadata
       • Aggregation of full text of historical newspapers
     • General library domain aggregator
       • Aggregation of other bibliographic resources
       • Currently developing its capabilities for aggregating content (metadata + digital resource)

  5. Library open data distribution and supporting its reuse
     http://www.theeuropeanlibrary.org/tel4/access
     • Opening access to library data
     • Distributing data into strategic channels
       • European research infrastructures, portals, etc.
     • Facilitating re-use
       • Supporting interoperability: APIs, data formats, etc.
       • ...and Linking Data

  6. The European Library Open Dataset www.theeuropeanlibrary.org

  7. Library LOD: Motivation
     LOD provides a set of procedures and technical standards to allow the reuse of data across communities. LOD allows for:
     ● Opening access to the data, so that others can obtain, process, and re-use it.
     ● Linking the data to other datasets, so that others can find the data more easily, better understand its meaning, match it with other data...

  8. Library LOD: Motivation Linking data makes it more precise and informative. Data links allow computers to better understand the data, enabling more use cases.

  9. Library LOD: Motivation
     Linked data is not new to libraries, and its value is clearly recognized.
     ● Libraries have perceived the value of linked data for decades:
       o Authority files, union catalogues, ...
     ● Libraries already contribute LOD datasets that are re-used across all communities.
     Today the LOD framework offers the same benefits:
     ● but beyond libraries, at a global level, across all communities.

  10. The Data Model

  11. The Data Model
      • RDA Element Vocabularies
        • The most extensively used vocabularies
        • Used extensively in the properties of the Bibliographic Resources
      • FRBRer model
        • Used for context
        • Not used for Item, Manifestation, Expression, Work
        • The LOD data is derived from non-FRBR MARC data
      • Europeana Data Model
        • Used for Web Resources
      • OWL 2 Web Ontology Language
        • Used for linking to external datasets
        • Used for linking duplicate Bibliographic Resources within libraries
      • Dublin Core Terms
        • Used where more general semantics could/should be applied
      • WGS84 Geo Positioning
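To make the mix of vocabularies on this slide more concrete, the following is a minimal sketch of how statements from Dublin Core Terms, OWL, and WGS84 could sit side by side for one record. The namespace URIs are the published ones for those vocabularies. Everything else is an assumption made for illustration: the `example.org` resource URIs, the placeholder Geonames identifier, the literal values, and the choice of `owl:sameAs` as the linking property (the slide only says that OWL is used for linking).

```python
# Namespace URIs of three vocabularies named on the slide.
DCTERMS = "http://purl.org/dc/terms/"
OWL = "http://www.w3.org/2002/07/owl#"
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical resource URIs, invented for this sketch.
BOOK = "http://example.org/resource/book-1"
DUPLICATE = "http://example.org/resource/book-1-copy"
PLACE = "http://example.org/place/place-1"
# Placeholder identifier, not a real Geonames entry.
GEONAMES_PLACE = "http://sws.geonames.org/0000000/"

# Each statement is a (subject, predicate, object) tuple.
# Objects starting with a double quote are literals, the rest are URIs.
triples = [
    (BOOK, DCTERMS + "title", '"An example title"'),
    (BOOK, DCTERMS + "spatial", PLACE),
    (BOOK, OWL + "sameAs", DUPLICATE),        # duplicate record (assumed property)
    (PLACE, OWL + "sameAs", GEONAMES_PLACE),  # link to an external dataset
    (PLACE, WGS84 + "lat", '"38.72"'),
    (PLACE, WGS84 + "long", '"-9.13"'),
]


def to_ntriples(statements):
    """Serialize the tuples in an N-Triples-like form, one statement per line."""
    lines = []
    for subject, predicate, obj in statements:
        rendered = obj if obj.startswith('"') else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Plain tuples are used instead of an RDF library so that the sketch stays self-contained; a real pipeline would more likely rely on an RDF toolkit.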

  12. Resulting usage of classes (from MARC data) Example statistics from the Research Libraries UK collection

  13. Resulting usage of properties (from MARC data) Example statistics from the Research Libraries UK collection

  14. External LOD Datasets Linked To
      • Links to external datasets are available for the following:
        • VIAF (Virtual International Authority File)
        • Geonames
        • Library of Congress Subject Headings
        • Library of Congress Children’s Subject Headings
        • Library of Congress Classification
        • data.bnf.fr
        • Gemeinsame Normdatei
        • Dewey Decimal Classification
        • Universal Decimal Classification
        • ISO 639-2 Languages
        • MARC Countries

  15. The European Library Open Dataset Current Status

  16. Linking the main entities present in Subject Data
      • Person names
      • Place names
      • Other entities/concepts

  17. Linked Data at The European Library Linking person names

  18. The matching process
      • VIAF data is used for matching, disambiguation, and estimating match probability

  19. Matching person names with VIAF
      • Names are matched by similarity
      • Confirmation of the correctness of a name match is taken from other matching data
        • The dates of birth and death
        • The title of the work is compared against the list of titles available in VIAF
        • All the contributors of the work are matched against the list of known co-authors in VIAF
        • The publisher(s) of the work are matched against the list of known publishers in VIAF
      • A match is only chosen if enough supporting evidence is found
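The matching logic on this slide can be sketched roughly as follows. This is not the implementation used at The European Library: the record and candidate dictionaries, the field names, the use of `difflib` for name similarity, and both thresholds are assumptions chosen to illustrate the idea of accepting a name match only when enough supporting evidence is found. Birth and death dates are counted as two separate pieces of evidence here, which is also an assumption.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def count_evidence(record, candidate):
    """Count how many kinds of supporting evidence agree with the candidate."""
    evidence = 0
    if record.get("birth") and record.get("birth") == candidate.get("birth"):
        evidence += 1
    if record.get("death") and record.get("death") == candidate.get("death"):
        evidence += 1
    if record.get("title") in candidate.get("titles", ()):
        evidence += 1
    if set(record.get("contributors", ())) & set(candidate.get("coauthors", ())):
        evidence += 1
    if set(record.get("publishers", ())) & set(candidate.get("publishers", ())):
        evidence += 1
    return evidence


def choose_match(record, candidates, min_similarity=0.85, min_evidence=2):
    """Return the id of the best-supported candidate, or None if none qualifies."""
    best = None
    for candidate in candidates:
        if name_similarity(record["name"], candidate["name"]) < min_similarity:
            continue  # the name itself is not similar enough
        evidence = count_evidence(record, candidate)
        if evidence >= min_evidence and (best is None or evidence > best[0]):
            best = (evidence, candidate)
    return best[1]["id"] if best else None


# Hypothetical data: the identifiers are invented, not real VIAF ids.
record = {"name": "Pessoa, Fernando", "birth": 1888, "death": 1935, "title": "Mensagem"}
candidates = [
    {"id": "authority-a", "name": "Pessoa, Fernando", "birth": 1888, "death": 1935,
     "titles": ["Mensagem"]},
    {"id": "authority-b", "name": "Pessoa, Fernando", "birth": 1950},
]
print(choose_match(record, candidates))
```

Requiring a minimum amount of evidence trades recall for precision: a record that carries only a name is never linked, even if the name matches exactly.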

  20. Linked Data at The European Library Linking place names

  21. The approach for place name linking
      • The alignment is performed with Geonames
        • Using the RDF dump of Geonames
      • It aims to find a single entity in Geonames for linking to the place name
      • The first step of this task is to find all possible candidates for the resolution in Geonames
      • Uses a heuristic-based predictive model:
        • Assigns each resolution candidate a probability of being a match
        • A link is established if a minimum probability threshold for a match is achieved.
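The two steps described above, finding all candidates and then linking only when a probability threshold is reached, could look like the sketch below. The in-memory gazetteer, its field names, the identifiers, the population figures, the threshold value, and the toy `population_share` scorer are all assumptions made for illustration; the slide does not specify how the predictive model computes its probabilities.

```python
def find_candidates(place_name, gazetteer):
    """Return every gazetteer entry that carries the given name (case-insensitive)."""
    wanted = place_name.casefold()
    return [entry for entry in gazetteer
            if wanted in {name.casefold() for name in entry["names"]}]


def link_place(place_name, gazetteer, score, threshold=0.7):
    """Link to the most probable candidate, but only if it reaches the threshold."""
    candidates = find_candidates(place_name, gazetteer)
    if not candidates:
        return None
    scored = [(score(candidate, candidates), candidate) for candidate in candidates]
    best_probability, best = max(scored, key=lambda pair: pair[0])
    return best["id"] if best_probability >= threshold else None


def population_share(candidate, candidates):
    """Toy stand-in for the predictive model: share of the total population."""
    total = sum(c["population"] for c in candidates)
    return candidate["population"] / total if total else 0.0


# Hypothetical gazetteer entries with invented ids and rough figures.
gazetteer = [
    {"id": "place-1", "names": ["Paris"], "population": 2_000_000},
    {"id": "place-2", "names": ["Paris"], "population": 25_000},
    {"id": "place-3", "names": ["Springfield"], "population": 100_000},
    {"id": "place-4", "names": ["Springfield"], "population": 100_000},
]

print(link_place("Paris", gazetteer, population_share))
print(link_place("Springfield", gazetteer, population_share))
```

With this toy scorer the ambiguous name with two equally sized candidates stays unlinked, which mirrors the slide's point that no link is established below the minimum probability.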

  22. Which information supports the place name linking

      Feature                   Description
      Number of words           The number of words in the place name.
      Name match                If the recognized place name matched: the main name of the place, an alternate name, etc.
      Exact name match          If the recognized place name matched exactly the place name.
      Relative population       Relative population of the candidate in comparison with other candidates.
      Geographic feature type   The type of geographic feature: continent, country, city, etc.
      Related places found      The number of other place names found in the administrative hierarchy.
      Relative related places   The relative number of administrative divisions found in the bibliographic record.
      In source country         If it is located in one of the source countries of the bibliographic record.
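As a rough illustration of how the features listed above might be computed for one candidate, consider the sketch below. The dictionary layouts and field names are hypothetical, and several features are interpreted loosely because the slide gives only one-line descriptions: "exact name match" is taken as case-sensitive equality with the main name, and "relative related places" as the fraction of the candidate's administrative hierarchy that also appears in the record.

```python
def extract_features(place_name, candidate, candidates, record):
    """Build a feature dictionary for one resolution candidate (assumed layout)."""
    name = place_name.casefold()
    main_name = candidate["name"].casefold()
    alternates = {n.casefold() for n in candidate.get("alternate_names", ())}
    total_population = sum(c.get("population", 0) for c in candidates)
    hierarchy = [h.casefold() for h in candidate.get("hierarchy", ())]
    other_places = {p.casefold() for p in record.get("other_places", ())}
    found = sum(1 for division in hierarchy if division in other_places)

    if name == main_name:
        name_match = "main"
    elif name in alternates:
        name_match = "alternate"
    else:
        name_match = "none"

    return {
        "number_of_words": len(place_name.split()),
        "name_match": name_match,
        "exact_name_match": place_name == candidate["name"],
        "relative_population": (candidate.get("population", 0) / total_population
                                if total_population else 0.0),
        "feature_type": candidate.get("feature_type"),
        "related_places_found": found,
        "relative_related_places": found / len(hierarchy) if hierarchy else 0.0,
        "in_source_country": candidate.get("country") in record.get("source_countries", ()),
    }


# Hypothetical candidates and bibliographic record.
porto = {"name": "Porto", "alternate_names": ["Oporto"], "population": 200,
         "feature_type": "city", "hierarchy": ["Portugal", "Norte"], "country": "PT"}
other = {"name": "Porto", "population": 50, "feature_type": "village",
         "hierarchy": ["Italy"], "country": "IT"}
record = {"other_places": ["Portugal"], "source_countries": ["PT"]}

features = extract_features("Porto", porto, [porto, other], record)
print(features)
```

A feature dictionary like this would then be passed to whatever model assigns the match probability; that model is outside the scope of this sketch.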

  23. The approach for place name linking • This approach was recently evaluated by the EuropeanaTech task force on Evaluation and Enrichments. • Final report of this task force will be published very soon: http://pro.europeana.eu/taskforce/evaluation-and-enrichments

  24. Linked Data at The European Library Linking Subject Data

  25. Linking Subject Data
      • The challenges
        • Diversity of languages
        • Diversity of knowledge organization systems in use across European libraries
        • Heterogeneous levels of detail in subject information

  26. Linking Subject Data
      • Current status at The European Library
        • Use of alignments between ontologies:
          • Alignments were created manually or semi-automatically
          • Alignments in use include: CERIF, MACS (LCSH, RAMEAU, SWD)
        • Linking is also performed for classifications:
          • For UDC and DDC
          • … but only shallow linking is done, for the most general classifications
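The two mechanisms on this slide, alignment lookups between subject heading systems and shallow linking of classification numbers, might be sketched as below. The alignment table is a hand-written stand-in in the spirit of MACS, with a single entry chosen for illustration and not taken from the MACS data. Reducing a DDC number to its leading digit(s) is an assumption about what "shallow linking for the most general classifications" means in practice.

```python
# Hypothetical alignment table: (scheme, heading) -> equivalent headings elsewhere.
ALIGNMENTS = {
    ("LCSH", "Libraries"): {"RAMEAU": "Bibliothèques", "SWD": "Bibliothek"},
}


def aligned_headings(scheme, heading):
    """Return the headings aligned with the given one, keyed by target scheme."""
    return ALIGNMENTS.get((scheme, heading), {})


def shallow_ddc_class(notation, digits=1):
    """Reduce a DDC number to a general class, keeping only the leading digits."""
    main = notation.strip().split(".")[0]
    if len(main) != 3 or not main.isdigit():
        raise ValueError(f"not a plain DDC number: {notation!r}")
    if not 1 <= digits <= 3:
        raise ValueError("digits must be between 1 and 3")
    return main[:digits] + "0" * (3 - digits)


print(aligned_headings("LCSH", "Libraries"))
print(shallow_ddc_class("943.085"))           # main class only
print(shallow_ddc_class("025.04", digits=2))  # division level
```

Shallow linking keeps the link target stable and easy to resolve, at the cost of discarding the more specific part of the notation.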

  27. Linking Subject Data
      • …but much more subject data is known to exist at The European Library
      • A data mining study conducted on the aggregated digital collections revealed several knowledge organization systems in use:
        • 5 systems which are available as LOD
        • 34 systems not known to us at this time

  28. Concluding remarks (1/3)
      • Some of the best LOD datasets for linking bibliographic subject data originate from the library domain
        • High-quality, mature datasets exist for Subject Heading Systems
        • Classification systems as LOD are not as well developed
        • In the case of UDC, very good LOD data is available and complies with standards and best practices, but it lacks links to other LOD datasets
        • Linking to DBpedia could promote wider usage of UDC

  29. Concluding remarks (2/3)
      • Linking classifications to LOD datasets is not as straightforward, due to combined classifications
        • In Semantic Web terms, the combination of concepts represented in this kind of classification requires multiple RDF statements
        • Most LOD linking tools are not prepared for these cases and require adaptation
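To illustrate why a combined classification needs more than one RDF statement, the sketch below splits a UDC-style notation on the `:` (relation) and `+` (addition) connectors and emits one subject statement per component. The base URI for UDC concepts is hypothetical, `dcterms:subject` is an assumed choice of property, and the splitting deliberately ignores other UDC devices such as parenthesized auxiliaries and `/` ranges.

```python
import re

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
# Hypothetical base URI for classification concepts, for illustration only.
UDC_BASE = "http://example.org/udc/"


def split_combined(notation):
    """Split a combined notation on the ':' and '+' connectors."""
    return [part for part in re.split(r"[:+]", notation.replace(" ", "")) if part]


def subject_statements(resource_uri, notation):
    """Emit one (subject, predicate, object) statement per component class."""
    return [(resource_uri, DCTERMS_SUBJECT, UDC_BASE + part)
            for part in split_combined(notation)]


for statement in subject_statements("http://example.org/resource/book-1", "02:004"):
    print(statement)
```

Note that splitting in this way loses the information that the two classes were related to each other in the original notation, which is part of what makes these cases hard for generic linking tools.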

  30. Concluding remarks (3/3)
      • Use cases supporting Linked Subject Data are plentiful
        • … but the great diversity of knowledge organization systems in use makes this a challenge in terms of scale

  31. Thank you
      Nuno Freire
      nuno.freire@theeuropeanlibrary.org
