Linking library data: contributions and role of subject data Nuno Freire The European Library
Outline Introduction to The European Library Motivation for Linked Library Data The European Library Open Dataset Linking Subject Data • Linking person names • Linking place names • Linking other entities/concepts Concluding remarks
About The European Library Project started 1996, full operational service from 2005 Membership of national and research libraries of 47 Council of Europe states European hub of library bibliographic resources, and full text collections.
Library data aggregator Library domain aggregator for Europeana • Digital collections of cultural heritage resources • Aggregation of metadata • Aggregation of full text of historical newspapers General library domain aggregator • Aggregation of other bibliographic resources • Currently developing its capabilities for aggregating content (metadata + digital resource)
Library open data distribution and supporting its reuse http://www.theeuropeanlibrary.org/tel4/access Opening access to library data Distributing data into strategic channels • European research infrastructures, Portals, etc. Facilitating re-use • Supporting interoperability: APIs, data formats, etc. • ...and Linking Data
The European Library Open Dataset www.theeuropeanlibrary.org
Library LOD: Motivation LOD provides a set of procedures and technical standards to allow the reuse of data across communities. LOD allows for: ● Opening access to the data … in order to allow others to obtain, process and re-use the data. ● Linking the data to other datasets … in order to allow others to find the data more easily, better understand its meaning, match it with other data...
Library LOD: Motivation Linking data makes it more precise and informative. Data links allow computers to better understand the data, enabling more use cases.
Library LOD: Motivation Linked data is not new to libraries, and its value is clearly recognized ● Libraries have perceived the value of linked data for decades: o Authority files, union catalogues, ... ● Libraries are already contributing LOD datasets which are being re-used across all communities Nowadays the LOD framework addresses the same benefits: ● but beyond libraries … at a global level … across all communities.
The Data Model
The Data Model RDA Element Vocabularies • The most extensively used vocabularies • Used extensively in the properties of the Bibliographic Resources FRBRer model • Used for context • Not used for Item, Manifestation, Expression, Work • The LOD data is derived from non-FRBR MARC data Europeana Data Model • Used for Web Resources OWL 2 Web Ontology Language • Used for linking to external datasets • For linking duplicate Bibliographic Resources within libraries Dublin Core Terms • Used where more general semantics could/should be applied WGS84 Geo Positioning
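A minimal sketch of how OWL-based links to external datasets could be serialized. The slide only states that OWL 2 is used for linking; the choice of the owl:sameAs property, the example URIs, and the function names are assumptions made for illustration.

```python
# Sketch only: owl:sameAs is an assumed choice of linking property,
# and all URIs below are hypothetical placeholders.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri: str, external_uri: str) -> str:
    """Serialize one link as a single N-Triples line."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


def link_statements(local_uri: str, external_uris: list) -> list:
    """One statement per distinct external URI, in a stable order."""
    return [same_as_triple(local_uri, uri) for uri in sorted(set(external_uris))]


statements = link_statements(
    "http://example.org/agent/123",
    ["http://example.org/viaf/0000", "http://example.org/other/abc"],
)
for line in statements:
    print(line)
```

The same helper would cover links between duplicate Bibliographic Resources, since those are also pairs of URIs.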
Resulting usage of classes (from MARC data) Example statistics from the Research Libraries UK collection
Resulting properties usage (from MARC data) Example statistics from the Research Libraries UK collection
External LOD Datasets Linked To Links to external datasets are available for the following: • VIAF (Virtual International Authority File) • Geonames • Library of Congress Subject Headings • Library of Congress Children’s Subject Headings • Library of Congress Classification • data.bnf.fr • Gemeinsame Normdatei • Dewey Decimal Classification • Universal Decimal Classification • ISO639-2 Languages • MARC Countries
The European Library Open Dataset Current Status
Linking the main entities present in Subject Data • Person names • Place names • Other entities/concepts
Linked Data at The European Library Linking person names
The matching process VIAF data used for matching, disambiguation, and match probability
Matching Person names with VIAF Names are matched by similarity Confirmation of the correctness of a name match is taken from other matching data • The dates of birth and death • The title of the work is compared against the list of titles available in VIAF • All the contributors of the work are matched against the list of known co-authors in VIAF • The publisher(s) of the work are matched against the list of known publishers in VIAF A match is only chosen if enough supporting evidence is found
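The matching rule above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the similarity measure (difflib from the Python standard library), both thresholds, the record layout, and the candidate identifiers are all hypothetical.

```python
from difflib import SequenceMatcher

NAME_SIMILARITY_MIN = 0.85  # assumed threshold, not from the slides
MIN_EVIDENCE = 2            # assumed meaning of "enough supporting evidence"


def normalize(text: str) -> str:
    """Lowercase, drop commas, and collapse whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def count_evidence(record: dict, candidate: dict) -> int:
    """Count the supporting signals listed on the slide."""
    evidence = 0
    for key in ("birth", "death"):
        if record.get(key) and record.get(key) == candidate.get(key):
            evidence += 1
    title = normalize(record.get("title", ""))
    if title and title in {normalize(t) for t in candidate.get("titles", [])}:
        evidence += 1
    contributors = {normalize(c) for c in record.get("contributors", [])}
    if contributors & {normalize(c) for c in candidate.get("co_authors", [])}:
        evidence += 1
    publishers = {normalize(p) for p in record.get("publishers", [])}
    if publishers & {normalize(p) for p in candidate.get("publishers", [])}:
        evidence += 1
    return evidence


def choose_match(record: dict, candidates: list):
    """Return the best-supported candidate, or None if none is convincing."""
    best, best_key = None, None
    for cand in candidates:
        sim = max((name_similarity(record["name"], n) for n in cand["names"]), default=0.0)
        if sim < NAME_SIMILARITY_MIN:
            continue
        evidence = count_evidence(record, cand)
        if evidence < MIN_EVIDENCE:
            continue
        key = (evidence, sim)
        if best_key is None or key > best_key:
            best, best_key = cand, key
    return best


record = {"name": "Pessoa, Fernando", "birth": "1888", "death": "1935", "title": "Mensagem"}
candidates = [
    {"id": "candidate-A", "names": ["Fernando Pessoa", "Pessoa, Fernando"],
     "birth": "1888", "death": "1935", "titles": ["Mensagem"]},
    {"id": "candidate-B", "names": ["Pessoa, Fernando"], "birth": "1950"},
]
match = choose_match(record, candidates)
print(match["id"] if match else "no link")
```

Requiring evidence in addition to name similarity means that a record carrying only a name is left unlinked rather than linked to the wrong person.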
Linked Data at The European Library Linking place names
The approach for place name linking • The alignment is performed with Geonames • Using the RDF dump of Geonames • It aims to find a single entity in Geonames for linking to the place name • The first step of this task is to find all possible candidates for the resolution in Geonames • Uses a heuristic-based predictive model: • Assigns to each resolution candidate a probability of being a match • A link is established if a minimum probability threshold for a match is achieved.
Which information supports the place name linking
Feature: Description
• Number of words: The number of words in the place name.
• Name match: If the recognized place name matched the main name of the place, an alternate name, etc.
• Exact name match: If the recognized place name matched exactly the place name.
• Relative population: Relative population of the candidate in comparison with other candidates.
• Geographic feature type: The type of geographic feature: continent, country, city, etc.
• Related places found: The number of other place names found in the administrative hierarchy.
• Relative related places: The relative number of administrative divisions found in the bibliographic record.
• In source country: If it is located in one of the source countries of the bibliographic record.
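A minimal sketch of how features like these could be combined into a score with a linking threshold. The slides do not give the model, so the weights, the feature-type priors, the threshold, and the candidate data are hypothetical; the sketch also covers only a subset of the listed features and uses a weighted sum as a stand-in for the match probability.

```python
from dataclasses import dataclass

# All numbers below are illustrative assumptions, not the model's real parameters.
WEIGHTS = {
    "exact_name_match": 0.20,
    "name_match": 0.20,
    "relative_population": 0.20,
    "feature_type": 0.10,
    "relative_related_places": 0.15,
    "in_source_country": 0.15,
}
FEATURE_TYPE_PRIOR = {"continent": 1.0, "country": 0.9, "city": 0.7}
MATCH_THRESHOLD = 0.5


@dataclass
class Candidate:
    geonames_id: str          # placeholder identifiers in this sketch
    name: str
    alternate_names: tuple
    population: int
    feature_type: str
    country_code: str
    admin_hierarchy: tuple    # names of the enclosing administrative divisions


def features(place_name, cand, all_cands, record_places, source_countries):
    """Compute each feature as a value in [0, 1]."""
    name = place_name.casefold()
    exact = name == cand.name.casefold()
    alternate = name in {a.casefold() for a in cand.alternate_names}
    max_pop = max(c.population for c in all_cands) or 1
    hierarchy = {h.casefold() for h in cand.admin_hierarchy}
    found = hierarchy & {p.casefold() for p in record_places}
    return {
        "exact_name_match": 1.0 if exact else 0.0,
        "name_match": 1.0 if exact else (0.5 if alternate else 0.0),
        "relative_population": cand.population / max_pop,
        "feature_type": FEATURE_TYPE_PRIOR.get(cand.feature_type, 0.3),
        "relative_related_places": len(found) / len(hierarchy) if hierarchy else 0.0,
        "in_source_country": 1.0 if cand.country_code in source_countries else 0.0,
    }


def score(place_name, cand, all_cands, record_places, source_countries):
    values = features(place_name, cand, all_cands, record_places, source_countries)
    return sum(WEIGHTS[k] * v for k, v in values.items())


def resolve(place_name, candidates, record_places, source_countries):
    """Return the single best candidate if it reaches the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: score(
        place_name, c, candidates, record_places, source_countries))
    best_score = score(place_name, best, candidates, record_places, source_countries)
    return best if best_score >= MATCH_THRESHOLD else None


paris_fr = Candidate("gn-paris-fr", "Paris", ("Lutetia",), 2_000_000, "city", "FR",
                     ("France", "Ile-de-France"))
paris_tx = Candidate("gn-paris-tx", "Paris", (), 25_000, "city", "US",
                     ("United States", "Texas"))
link = resolve("Paris", [paris_fr, paris_tx], ["France"], {"FR"})
print(link.geonames_id if link else "no link")
```

In this example the two candidates share the same name, so the decision rests on the context features: population, the administrative divisions mentioned in the record, and the source country.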
The approach for place name linking • This approach was recently evaluated by the EuropeanaTech task force on Evaluation and Enrichments. • Final report of this task force will be published very soon: http://pro.europeana.eu/taskforce/evaluation-and-enrichments
Linked Data at The European Library Linking Subject Data
Linking Subject Data The challenges • Diversity of languages • Diversity of knowledge organization systems in use across European libraries • Heterogeneous levels of detail in subject information
Linking Subject Data Current status at The European Library • Use of alignments between ontologies: • Alignments were created manually or semi-automatically • Alignments in use include: CERIF, MACS (LCSH, RAMEAU, SWD) • Linking is also performed for classifications: • For UDC and DDC • … but only shallow linking is done, for the most general classifications
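A minimal sketch of how a set of pairwise alignments could be used to find equivalent subject concepts across systems. The concept identifiers are hypothetical placeholders, and treating alignments as symmetric and transitive is an assumption that real alignment data may not justify.

```python
from collections import defaultdict

# Hypothetical alignment pairs; real MACS identifiers are not shown here.
ALIGNMENTS = [
    ("lcsh:ex-1", "rameau:ex-1"),
    ("lcsh:ex-1", "swd:ex-1"),
    ("rameau:ex-2", "swd:ex-2"),
]


def build_index(pairs):
    """Index each concept by the concepts it is directly aligned with."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def equivalents(concept, index):
    """All concepts reachable from `concept` by following alignment links."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for neighbour in index.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(concept)
    return seen


index = build_index(ALIGNMENTS)
print(sorted(equivalents("rameau:ex-1", index)))
```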
Linking Subject Data …but much more subject data is known to exist at The European Library A data mining study conducted on the aggregated digital collections revealed several knowledge organization systems in use: • 5 systems which are available as LOD • 34 systems not known to us at this time
Concluding remarks (1/3) Some of the best LOD datasets for linking bibliographic subject data originate from the library domain • High quality and mature datasets exist for Subject Heading Systems • Classification systems as LOD are not as well developed • In the case of UDC, very good LOD data is available and is compliant with standards and best practices, but it lacks links to other LOD datasets • Linking to DBpedia would potentially promote the wider usage of UDC
Concluding remarks (2/3) Linking classifications to LOD datasets is not as straightforward, due to combined classifications • In Semantic Web terms, the combination of concepts represented in this kind of classification requires multiple RDF statements • Most LOD linking tools are not prepared for these cases and require adaptation
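A minimal sketch of why a combined classification needs multiple RDF statements. It assumes a very simplified view of UDC syntax in which only "+" and ":" join component notations; the base URI, the resource URI, and the use of dcterms:subject are illustrative assumptions.

```python
import re

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
UDC_BASE = "http://example.org/udc/"  # hypothetical base URI for UDC concepts


def split_combined(notation: str) -> list:
    """Split a combined notation on '+' and ':' (simplified UDC syntax)."""
    return [part.strip() for part in re.split(r"[+:]", notation) if part.strip()]


def subject_statements(resource_uri: str, notation: str) -> list:
    """Emit one N-Triples subject statement per component concept."""
    return [
        f"<{resource_uri}> <{DCTERMS_SUBJECT}> <{UDC_BASE}{part}> ."
        for part in split_combined(notation)
    ]


for line in subject_statements("http://example.org/resource/1", "622+669"):
    print(line)
```

Splitting in this way loses the relationship that the connecting symbol expressed between the components, which is one reason such cases call for adapted tooling.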
Concluding remarks (3/3) Use cases for Linked Subject Data are plentiful • … but a great diversity of knowledge organization systems is in use, which makes linking a challenge in terms of scale
Thank you Nuno Freire nuno.freire@theeuropeanlibrary.org