From MARC silos to Linked Data silos?
Osma Suominen and Nina Hyvönen
SWIB16, Bonn, November 30, 2016
Original image by Doc Searls, CC BY 2.0: https://www.flickr.com/photos/docsearls/5500714140
Overview of current data models for bibliographic data
[Diagram: "Family forest" of bibliographic data models, conversion tools, application profiles and data sets. Flat/record-based models (MARC, MODS, MODS RDF, Dublin Core, DC-RDF, BIBO, schema.org + bib.extensions) shade into entity-based models (BIBFRAME 1.0/2.0, bibfra.me (Zepheira), RDA Vocabulary, FaBiO, FRBR, FRBR Core, FRBRer, FRBRoo, eFRBRoo, BNE ontology, LD4L and LD4P ontologies). They are connected by conversion tools (marcmods2rdf, Catmandu, Metafacture, marc2bibframe, pybibframe, MARiMbA, ALIADA), application profiles (DC-NDL, BNB, DNB, Swissbib, BNF, BNE) and data sets (NDL, BNB, DNB, WorldCat, LibHub, BNE, Artium). The schema.org variants are marked by whether or not they have Works. Legend: non-RDF data model / RDF data model / conversion tool / application profile / data set.]
Libraryish
- used for producing and maintaining (meta)data
- lossless conversion to/from legacy (MARC) formats
- modelling of abstractions (records, authorities)
- housekeeping metadata (status, timestamps)
- favour self-contained modelling over reuse of other data models

Webbish
- used for publishing data for others to reuse
- interoperability with other (non-library) data models
- modelling of Real World Objects (books, people, places, organizations...)
- favour simplicity over exhaustive detail

Bibliographic data models span this spectrum: MODS RDF, LD4L ontology, BIBFRAME, LD4P ontology and RDA Vocabulary sit towards the libraryish end; BIBO, Dublin Core RDF, FaBiO, schema.org + bib.extensions and Wikidata towards the webbish end. Authority data models: MADS/RDF on the libraryish side; Wikidata properties, SKOS and FOAF on the webbish side.
BIBLIOGRAPHIC DATA MODELS (cf. xkcd "Standards": https://xkcd.com/927/)
Why does it have to be like this?
Reason 1: Different use cases require different kinds of data models. None of the existing models fits them all. But surely, for basic MARC records (e.g. a "regular" national library collection), a single model would be enough?
Reason 2: Converting existing data (i.e. MARC) into a modern entity-based model is difficult, which prevents the adoption of such data models in practice for real data. All FRBR-based models require "FRBRization", which is difficult to get right. BIBFRAME is somewhat easier because of its more relaxed view of Works.
Reason 3: Libraries want to control their data, including their data models. Defining your own ontology, or a custom application profile, allows maximum control. Issues like localization and language- or culture-specific requirements (e.g. the Japanese dual representation of titles in hiragana and katakana) are not always adequately addressed in the general models.
Reason 4: Once you've chosen a data model, you're likely to stick to it.
Choosing an RDF data model for a bibliographic data set:
1. Do you want to have Works, or just records?
2. Is your use case libraryish (maintaining) or webbish (publishing)?
For maintaining metadata as RDF, suitable data models (BIBFRAME, RDA Vocabulary etc.) are not yet mature. For publishing, we already have too many data models.
What can we do about this?
Don't create another data model, especially if it's only for publishing. Help improve the existing ones! We need more efforts like LD4P that consider the production and maintenance of library data as modern, entity-based RDF instead of records. How could we share and reuse each other's Works and other entities, instead of all having to maintain our own?
Will Google, or some other big player, sort this out for us? A big actor offering a compelling use case for publishing bibliographic LOD would make a big difference.
● a global bibliographic knowledge base?
● pushing all bibliographic data into Wikidata?
● Search Engine Optimization (SEO) using schema.org?
This is already happening for scientific datasets: Google recently defined a schema for them within schema.org.
Bibliographic data as LOD at the National Library of Finland
Our bibliographic databases:
● Fennica - national bibliography (1M records)
● Melinda - union catalogue (9M records)
● Arto - national article database (1.7M records)
● Viola - national discography (1M records)
All are MARC record based Voyager or Aleph systems. Their Z39.50/SRU APIs were opened in September 2016.
[Comic slide: "My assignment" - the NATIONAL BIBLIOGRAPHY; with apologies to Scott Adams]
Not very Linked to start with ● Only some of our bibliographic records are in WorldCat ○ ...and we don’t know their OCLC numbers ● Our bibliographic records don’t have explicit (ID) links to authority records ○ ...but we’re working on it! ● Only some of our person and corporate name authority records are in VIAF ○ ...and we don’t know their VIAF IDs ● Our name authorities are not in ISNI either ● Our main subject headings (YSA) are linked via YSO to LCSH
Targeting schema.org: Schema.org + bibliographic extensions allows surprisingly rich descriptions! Modelling of Works is possible, similar to BIBFRAME [1].
[1] Godby, Carol Jean, and Denenberg, Ray. 2015. Common Ground: Exploring Compatibilities Between the Linked Data Models of the Library of Congress and OCLC. Dublin, Ohio: Library of Congress and OCLC Research. http://www.oclc.org/content/dam/research/publications/2015/oclcresearch-loc-linked-data-2015.pdf
schema.org forces us to think about the data from a web user's point of view.
"We have these 1M bibliographic records."
vs.
"The National Library maintains this amazing collection of literary works! We have these editions of those works in our collection. They are available free of charge for reading/borrowing from our library building (Unioninkatu 36, 00170 Helsinki, Finland), which is open Mon-Fri 10-17, except Wed 10-20. The electronic versions are available online from these URLs."
Fennica using schema.org (special thanks to Richard Wallis for help with applying schema.org!)

# The original English language work
fennica:000215259work9 a schema:CreativeWork ;
    schema:about ysa:Y94527, ysa:Y96623, ysa:Y97136, ysa:Y97137, ysa:Y97575, ysa:Y99040,
        yso:p18360, yso:p19627, yso:p21034, yso:p2872, yso:p4403, yso:p9145 ;
    schema:author fennica:000215259person10 ;
    schema:inLanguage "en" ;
    schema:name "The illustrated A brief history of time" ;
    schema:workTranslation fennica:000215259 .

# The Finnish translation (~expression in FRBR/RDA)
fennica:000215259 a schema:CreativeWork ;
    schema:about ysa:Y94527, ysa:Y96623, ysa:Y97136, ysa:Y97137, ysa:Y97575, ysa:Y99040,
        yso:p18360, yso:p19627, yso:p21034, yso:p2872, yso:p4403, yso:p9145 ;
    schema:author fennica:000215259person10 ;
    schema:contributor fennica:000215259person11 ;
    schema:inLanguage "fi" ;
    schema:name "Ajan lyhyt historia" ;
    schema:translationOfWork fennica:000215259work9 ;
    schema:workExample fennica:000215259instance26 .

# The manifestation (FRBR/RDA) / instance (BIBFRAME)
fennica:000215259instance26 a schema:Book, schema:CreativeWork ;
    schema:author fennica:000215259person10 ;
    schema:contributor fennica:000215259person11 ;
    schema:datePublished "2000" ;
    schema:description "Lisäpainokset: 4. p. 2002. - 5. p. 2005." ;
    schema:exampleOfWork fennica:000215259 ;
    schema:isbn "9510248215", "9789510248218" ;
    schema:name "Ajan lyhyt historia" ;
    schema:numberOfPages "248, 6 s. :" ;
    schema:publisher [
        a schema:Organization ;
        schema:name "WSOY"
    ] .

# The original author
fennica:000215259person10 a schema:Person ;
    schema:name "Hawking, Stephen" .

# The translator
fennica:000215259person11 a schema:Person ;
    schema:name "Varteva, Risto" .
Fennica RDF conversion pipeline (draft)
● batch process driven by a Makefile, which defines dependencies
○ incremental updates: only changed batches are reprocessed
● parallel execution on multiple CPU cores, on a single virtual machine
● unit tested using Bats

Pipeline stages (input: Aleph bib dump, 1M records, 2.5 GB, split into 300 batches of max 10k records each):
1. filter, fix 240$l and convert to MARCXML using Catmandu (txt → mrcx, ~11 min)
2. BIBFRAME conversion using marc2bibframe (mrcx → rdf, ~75 min)
3. Schema.org conversion using SPARQL CONSTRUCT (rdf → nt, ~35 min)
4. create work keys (SPARQL, ~2 min) and work mappings (~35 min)
5. merge works using SPARQL into raw merged data (~30M triples)
6. consolidate & cleanup using SPARQL → RDF for publishing (~3 GB, nt + hdt)

Under construction: https://github.com/NatLibFi/bib-rdf-pipeline
A simplified sketch of the Schema.org conversion step is shown below.
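To give a flavour of the Schema.org conversion step, here is a minimal, hypothetical SPARQL CONSTRUCT sketch that turns a BIBFRAME Instance/Work pair into a schema:Book linked to its schema:CreativeWork. It is only an illustration under stated assumptions: the bf: prefix and the use of bf:Instance, bf:instanceOf and bf:title are assumptions about the intermediate BIBFRAME data, and the actual queries in the bib-rdf-pipeline repository cover far more fields.

PREFIX bf:     <http://bibframe.org/vocab/>
PREFIX schema: <http://schema.org/>

CONSTRUCT {
  # one schema.org "instance" per BIBFRAME Instance...
  ?inst a schema:Book, schema:CreativeWork ;
        schema:name ?title ;
        schema:exampleOfWork ?work .
  # ...linked to its Work in both directions
  ?work a schema:CreativeWork ;
        schema:workExample ?inst .
}
WHERE {
  ?inst a bf:Instance ;
        bf:instanceOf ?work .
  OPTIONAL { ?inst bf:title ?title }   # assumed title property, for illustration only
}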
Current challenges
1. problems caused by errors & omissions in MARC records
2. extracting works: the initial implementation needs fine-tuning
○ the result will not be perfect; establishing a work registry would help
3. dumbing down MARC to match schema.org expectations (see the sketch after this list)
○ e.g. structured page counts such as "vii, 89, 31 p." - schema.org only defines a numeric numberOfPages property
4. linking internally - from strings to things
○ subjects from YSA and YSO - already working
○ using person and corporate name authorities
5. linking externally
○ linking name authorities to VIAF, ISNI, Wikidata...
○ linking works to WorldCat Works?
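As one possible way to handle the "dumbing down" in challenge 3, a small SPARQL post-processing step could derive a numeric page count from the structured extent statement. This is only a sketch under stated assumptions: the ex:extent property is a hypothetical placeholder for wherever the raw MARC 300$a string ends up during conversion, and taking the last group of digits is just one heuristic among several possible ones.

PREFIX ex:     <http://example.org/>   # hypothetical namespace, for illustration only
PREFIX schema: <http://schema.org/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT {
  ?book schema:numberOfPages ?numPages .
}
WHERE {
  # ?extent holds the raw extent string from MARC 300$a, e.g. "vii, 89, 31 p. :"
  ?book ex:extent ?extent .
  FILTER(REGEX(?extent, "[0-9]+"))
  # keep only the last group of digits: "vii, 89, 31 p. :" -> 31
  BIND(xsd:integer(REPLACE(?extent, "^.*?([0-9]+)[^0-9]*$", "$1")) AS ?numPages)
}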