
Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure



  1. Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure
     Twan Goosen (1) (CLARIN ERIC), Nuno Freire (2), Clemens Neudecker (3), Maria Eskevich (1)
     (1) CLARIN ERIC; (2) Europeana / INESC-ID; (3) Berlin State Library / Europeana Newspapers
     Digital Infrastructures for Research (DI4R) 2017, Brussels, BE, 30 November 2017

  2. Europeana in six bullets
     • Europeana is the European digital platform for cultural heritage that
     • seeks to enable users to search and access knowledge in all the languages of Europe, either directly via its web portals or indirectly via third-party applications leveraging its data service
     • Europeana enables people to explore the digital resources of Europe's galleries, museums, libraries, archives and audiovisual collections
     • working with partners and allies to develop frameworks, standards, strategy and policy relevant to digital cultural heritage, and to raise funds
     • providing digital expertise and platforms for bringing cultural heritage to wider audiences
     • championing the use of digitised cultural heritage in education, research and the creative industries through partnerships and international engagement campaigns

  3. CLARIN in seven bullets
     • CLARIN is the Common Language Resources and Technology Infrastructure
     • ESFRI ERIC status since 2012, Landmark since 2016
     • that provides easy and sustainable access for scholars in the humanities and social sciences and beyond
     • to digital language data (in written, spoken, video or multimodal form)
     • and advanced tools to discover, explore, exploit, annotate, analyse or combine them, wherever they are located
     • through a single sign-on online environment
     • and that serves as an ecosystem for knowledge sharing

  4. CLARIN ERIC in members and centres
     A consortium of:
     • 19 members: AT, BG, CZ, DE, DK, DLU, EE, FI, GR, HU, IT, LT, LV, NL, NO, PL, PT, SE, SI
     • 2 observers: FR, UK
     • >40 centres

  5. CLARIN & Europeana partnership in the context of DSI
     Digital Service Infrastructure (DSI): creation of a complete, cohesive and integrated Digital Service Infrastructure
     • DSI (01.2015 – 06.2016):
       – European Research Distribution Plan
       – Assessment of relevant data sets available from The European Library (TEL)
     • DSI-2 (07.2016 – 08.2017):
       – Improvement of data quality and implementation of quality frameworks to improve metadata quality
       – Integration of Europeana data into the CLARIN infrastructure
     • DSI-3 (09.2017 – 08.2018):
       – Fostering content supply by optimising Europeana data and aggregation infrastructure
       – Improving (meta-)data and content quality
       – Fostering reuse of digital cultural heritage resources by improving content distribution mechanisms
       – Maintaining an international interoperable licensing framework

  6. Steps towards CLARIN & Europeana interoperability
     1) Incorporate Europeana metadata in the Virtual Language Observatory (VLO)
     2) Open up full-text resources, such as those from Europeana Newspapers, through CLARIN’s Federated Content Search (FCS) mechanism
     3) Exploit CLARIN’s communication channels to increase awareness of Europeana within the community
     4) Measure the impact of the dissemination of Europeana data

  7. Metadata: access to cultural heritage
     Challenge: CLARIN and Europeana do not share a common metadata model
     Europeana:
     • Aggregation and exploitation of (meta)data about digitised objects from very different contexts.
     • Europeana Data Model (EDM) as its model for interoperability of metadata, in line with the vision of linked open vocabularies
     CLARIN:
     • Aggregation of metadata from resource providers (CLARIN centres and selected “external” parties)
     • Virtual Language Observatory (VLO) provides a uniform experience and consistent workflow.
     • Language Resource Switchboard (LRS) allows researchers to invoke tools with the selected resources directly from its user interface.

  8. The CLARIN data architecture: repositories
     [Diagram: a repository at a CLARIN centre. Metadata describes both Language Data (single text or recording, corpus, lexicon, wordnet, grammar, …) and Language Tools (web application, web service, web service pipeline, stand-alone application, …).]

  9. The CLARIN data architecture: harvesting
     [Diagram: each centre's repository holds Metadata, Language Data and Language Tools; the metadata from all repositories is harvested into a central copy.]

  10. The CLARIN data architecture: processing

  11. The CLARIN data architecture: content search
      [Diagram: (Federated) Content Search across the centres' repositories: (1) enter query, (2) perform local search, (3) retrieve results, (4) show aggregated results.]
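      The four steps of the content search diagram can be sketched as a fan-out and merge. This is a minimal illustration only, not the actual FCS protocol or aggregator: the endpoints are in-memory stand-ins, and the names `Endpoint`, `local_search` and `federated_search` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical stand-in for one centre's local search endpoint."""
    name: str
    documents: list[str]

    def local_search(self, query: str) -> list[str]:
        # (2) each centre searches its own content (case-insensitive substring match)
        q = query.lower()
        return [d for d in self.documents if q in d.lower()]


def federated_search(query: str, endpoints: list[Endpoint]) -> list[tuple[str, str]]:
    """Send one query to every endpoint and merge the hits."""
    with ThreadPoolExecutor() as pool:
        # (3) retrieve results from all endpoints in parallel; map keeps input order
        per_endpoint = list(pool.map(lambda e: e.local_search(query), endpoints))
    # (4) aggregate, keeping track of which endpoint each hit came from
    return [(e.name, hit) for e, hits in zip(endpoints, per_endpoint) for hit in hits]


endpoints = [
    Endpoint("centre-a", ["Die Zeitung von gestern", "A letter from Berlin"]),
    Endpoint("centre-b", ["Berlin newspaper, 1905", "Parish records"]),
]
results = federated_search("berlin", endpoints)  # (1) enter query
```

      Here `results` holds one `(endpoint name, hit)` pair per match, so the user interface can show where each result lives without the content ever leaving its repository.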

  12. The CLARIN data architecture: workflows
      [Diagram: Web Service Pipelines across the centres' repositories: (1) select input data, (2) construct pipeline, (3) execute, (4) use/analyse output data.]
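      As a rough sketch of the workflow steps, a pipeline can be modelled as a list of functions where each step consumes the previous step's output. The three steps (`tokenize`, `lowercase`, `count`) and `run_pipeline` are hypothetical local functions standing in for remote web services.

```python
from collections import Counter
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def tokenize(text: str) -> list[str]:
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def count(tokens: list[str]) -> Counter:
    return Counter(tokens)


def run_pipeline(data: Any, steps: list[Step]) -> Any:
    """Feed the output of each step into the next one."""
    return reduce(lambda value, step: step(value), steps, data)


input_data = "The press the Press the"       # (1) select input data
pipeline = [tokenize, lowercase, count]      # (2) construct pipeline
output = run_pipeline(input_data, pipeline)  # (3) execute
most_common = output.most_common(1)          # (4) use/analyse output data
```

      Chaining only works when each step accepts what the previous one emits, which is why agreed exchange formats between tools matter.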

  13. Interoperability is key
      • to the exchange of metadata
      • to the exchange formats for the output of analytic tools
      • to the options for supporting comparative research

  14. CLARIN & Europeana interoperability highlights
      • CLARIN’s ingestion pipeline, based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), was extended to retrieve a set of selected collections from Europeana and apply the conversion in the process.
      • Several infrastructure components had to be adapted to accommodate the significant increase in the amount of data to be handled and stored.
        – Current status:
          • 775 Europeana data sets (e.g. Newspapers) now found in the VLO
          • 10 K are technically suitable for processing with the LRS
        – Goal:
          • More records in the foreseeable future
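      To illustrate how an OAI-PMH harvest pages through a collection, the sketch below builds `ListRecords` request URLs and reads the resumption token from a response. It is not CLARIN's harvester: the base URL, the `edm` metadata prefix, the set name and the sample response are assumptions made for the example, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, set_spec=None, token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix and set are left out.
    """
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iter(f"{OAI_NS}identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # A missing or empty token element means this was the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hand-written sample response (assumed), trimmed to the parts used here.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>rec-1</identifier></header></record>
    <record><header><identifier>rec-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai", "edm", "example_set")
ids, token = parse_page(SAMPLE_PAGE)
next_url = list_records_url("https://example.org/oai", token=token)
```

      A harvester repeats the last two steps until `parse_page` returns no token, which is what makes a multi-million-record harvest a long sequence of small requests.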

  15. Metadata retrieval and conversion: OAI-PMH protocol
      • Europeana:
        – EDM-structured Europeana metadata as RDF/XML documents
      • CLARIN:
        – The harvester applies an XSLT stylesheet that converts the RDF/XML metadata documents to Component Metadata (CMD)
        – Creation of a CMD profile for EDM in the CMDI Component Registry
        – Implementation of an XSLT stylesheet that produces instances of the corresponding schema on the basis of the EDM records
        – Properties are defined as CMD elements in the order in which they appear in the EDM specification, while object order is based on relevance.
        – Concept links are assigned to most components and elements.
        – Implemented conversion stylesheet: the header information and resource proxies (entities representing external documents) in the resulting record are produced on the basis of a list of static XPaths in the original document.
        – The record’s payload is produced mostly by means of a straightforward crosswalk, where the properties in the document are mapped to CMD components or elements of an equivalent name.
      • Test harvest of 11 selected metadata sets:
        – Total of 3.2 million records successfully retrieved and converted, all schema-valid
        – A full harvest and import of a sample of this size takes roughly 48 hours
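      The crosswalk idea can be sketched as follows. The conversion described on the slide is an XSLT stylesheet; this Python version only illustrates the two mechanisms named there (resource proxies taken from a static path, payload properties mapped to elements of the same name). The sample record, the chosen path and the output layout are simplified assumptions, and the result is not a schema-valid CMD record.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "ore": "http://www.openarchives.org/ore/terms/",
}
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

# Static path used to find the linked resource (assumed for this sketch).
PROXY_PATH = "ore:Aggregation/edm:isShownAt"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def edm_to_cmd(edm_xml: str) -> ET.Element:
    root = ET.fromstring(edm_xml)
    cmd = ET.Element("CMD")

    # Resource proxies come from a fixed path in the source document.
    proxies = ET.SubElement(ET.SubElement(cmd, "Resources"), "ResourceProxyList")
    for link in root.findall(PROXY_PATH, NS):
        proxy = ET.SubElement(proxies, "ResourceProxy")
        ET.SubElement(proxy, "ResourceRef").text = link.get(RDF_RESOURCE)

    # Payload: each property becomes an element of the same (local) name.
    components = ET.SubElement(cmd, "Components")
    for cho in root.findall("edm:ProvidedCHO", NS):
        component = ET.SubElement(components, local_name(cho.tag))
        for prop in cho:
            ET.SubElement(component, local_name(prop.tag)).text = prop.text
    return cmd


# Hand-written, heavily reduced EDM record (assumed).
SAMPLE_EDM = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:edm="http://www.europeana.eu/schemas/edm/"
    xmlns:ore="http://www.openarchives.org/ore/terms/">
  <edm:ProvidedCHO rdf:about="#item">
    <dc:title>Example newspaper issue</dc:title>
    <dc:language>de</dc:language>
  </edm:ProvidedCHO>
  <ore:Aggregation rdf:about="#aggregation">
    <edm:isShownAt rdf:resource="https://example.org/item/1"/>
  </ore:Aggregation>
</rdf:RDF>"""

cmd = edm_to_cmd(SAMPLE_EDM)
```

      Because the mapping is by name, a new property in the source shows up in the output without changes to the conversion logic, at the cost of carrying over any gaps in the source metadata.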

  16. Processing pipeline issues
      • General lack of technical information in the provided EDM (e.g. the media type of linked resources)
      • Direct links to machine-processable resources are commonly missing
      • Limited functionality provided by the tools that are connected to the LRS (e.g. variability in supported languages, resource types, accessibility)

  17. Get in touch
      www.clarin.eu
      clarin@clarin.eu
      https://www.europeana.eu
      https://pro.europeana.eu
      https://pro.europeana.eu/project/europeana-dsi-3
