The Europeana Use Case
Multilingual & Semantic Interoperability in Cultural Heritage Information Systems
Vivien Petras
Berlin School of Library and Information Science
12 March 2013
W3C Multilingual Web Workshop
Contents
• Europeana: Multilingual Collections & Users
• Multilingual Interoperability
• Semantic Enrichment
• Preview: New Enrichment Plans
• Playing with Europeana Data
Image: http://www.europeana.eu/portal/record/08535/D53FE7B7621E65A5E01E16E3D72785C68F2E2059.html
Europeana
• 15.2 million images
• 10 million texts
• 450,000 sound files
• 170,000 video files
> 2,200 institutions
> 30 countries
Europeana Multilingual Collections
• Language shares (from chart): German 18%, Multilingual 12%, French 11%, Dutch 10%, Swedish 9%, Spanish 8%, English 7%, Italian 6%, Polish 6%, Norwegian 6%, Finnish 3%, Danish 2%, Slovenian 1%, Hungarian 1%
→ Most Europeana objects are language-independent (e.g. images), but the metadata is multilingual.
Multilingual Europeana Users
• Native language browser: 69%
• Native language Google (entry point): 91%
• Native language objects: 43% (SV 77%, DE 71%)
→ Native language use increases as soon as native language content increases.
Gäde, Maria (forthcoming). “User Behavior through the Language Glass” – Language-specific Behavior in Multilingual Digital Libraries.
Image: http://www.europeana.eu/resolve/record/9200105/AF5C65B3CC6A71CC0E4FF6FE5AAEB4CDAA1873C9
Multilingual Interface in 31 Languages
• Users seem to assume that the interface language also affects search
Query Result Filtering by Language
• language of record vs. language of content
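The distinction between the language of the record and the language of the content can be illustrated with a minimal sketch. The `Record` class, its field names, and the sample data are hypothetical and are not taken from the Europeana data model; the point is only that the same language filter returns different result sets depending on which language field it reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical, simplified record; field names are illustrative only.
    identifier: str
    record_language: str             # language of the metadata record
    content_language: Optional[str]  # language of the object; None if language-independent


def filter_by_language(records, lang, use_content_language=False):
    """Keep records whose metadata language (default) or content language matches."""
    if use_content_language:
        return [r for r in records if r.content_language == lang]
    return [r for r in records if r.record_language == lang]


records = [
    Record("a", "de", None),   # image described in German
    Record("b", "de", "fr"),   # French text catalogued in German
    Record("c", "fr", "fr"),   # French text catalogued in French
]

by_record = [r.identifier for r in filter_by_language(records, "fr")]
by_content = [r.identifier for r in filter_by_language(records, "fr", use_content_language=True)]
```

In this toy data set, filtering for French on the record language finds only record "c", while filtering on the content language also finds the French text that was catalogued in German.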
Document Translation
• general-purpose machine translation (MT), not domain-specific
Query Translation – Planned for 2013
• How many languages?
• How much user interaction?
Semantic Enrichment
• concept (GEMET Thesaurus), agent (DBpedia), period (Semium time ontology), place (GeoNames)
Poisonous India …
Enrichment Challenges
• Metadata quality & sparsity
• Vocabulary ambiguity
  – domain: (German) Druck → GEMET "print" vs. "pressure"
  – language: Strom = "electrical power" (German) vs. strom = "tree" (Czech)
  – context: Córdoba = Spain | Argentina
Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.
Image: http://www.europeana.eu/portal/record/03919/FCD38BDE7A03579F24BEDA5D157943B75BB36F11.html
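The language and domain ambiguities on this slide can be made concrete with a small sketch of a language-aware vocabulary lookup. This is not Europeana's enrichment code: the toy vocabulary reuses the slide's examples, while the concept identifiers and function names are made up for illustration.

```python
from collections import defaultdict

# Toy vocabulary entries: (language code, label, concept identifier).
# Labels come from the slide's examples; the identifiers are hypothetical.
VOCAB = [
    ("de", "Strom", "electrical-power"),
    ("cs", "strom", "tree"),
    ("de", "Druck", "print"),
    ("de", "Druck", "pressure"),
]


def build_index(entries):
    """Index concepts by (language, case-folded label)."""
    index = defaultdict(set)
    for lang, label, concept in entries:
        index[(lang, label.casefold())].add(concept)
    return index


def candidates(index, term, lang=None):
    """Return candidate concepts for a term, optionally restricted to one language."""
    key = term.casefold()
    if lang is not None:
        return set(index.get((lang, key), set()))
    found = set()
    for (_, label), concepts in index.items():
        if label == key:
            found |= concepts
    return found


INDEX = build_index(VOCAB)
no_language = candidates(INDEX, "Strom")        # matches across languages
german_only = candidates(INDEX, "Strom", "de")  # restricted to German labels
still_ambiguous = candidates(INDEX, "Druck", "de")
```

Knowing the metadata language resolves the Strom/strom clash, but "Druck" still maps to two concepts within German, so domain ambiguity needs additional context beyond the language tag.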
Preview: New Enrichment Plans
→ transition to linked data-based Europeana Data Model (EDM)
• links to contextual vocabularies from providers
• enrich during ingestion
Playing with Europeana Data
• CHiC: Cultural Heritage in CLEF
  → Europeana data (XML) & queries / 13 languages
  → ad-hoc retrieval / semantic enrichment tasks
  → Submission deadline: 14 April 2013
  → http://www.culturalheritageevaluation.org
• Europeana Linked Open Data
  → RDF file dumps in EDM (Europeana Data Model)
  → SPARQL endpoint
  → CC0 open license
  → http://data.europeana.eu/
• Contact: vivien.petras@ibi.hu-berlin.de
Image: http://www.europeana.eu/resolve/record/03486/DF559A7721E55BAE5BF5095FB9AA55406C0269C4
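As a starting point for experimenting with a SPARQL endpoint, the sketch below builds a query for titles in one language and encodes it as a GET request URL following the SPARQL protocol's `query` parameter. The slides do not give the endpoint path, so `ENDPOINT` is a placeholder, and the assumption that titles are exposed as language-tagged `dc:title` literals is illustrative and should be checked against the published data.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder only: the actual endpoint address is not stated in the slides.
ENDPOINT = "http://example.org/sparql"


def build_query(lang, limit=10):
    """Build a SPARQL query for items with a title tagged with the given language.

    Assumes titles are dc:title literals with language tags (an assumption).
    """
    if not lang.isalpha():
        raise ValueError("language code must be alphabetic")
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?item ?title WHERE {\n"
        "  ?item dc:title ?title .\n"
        f'  FILTER(lang(?title) = "{lang}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, build_query("de", limit=5))
```

The sketch stops at URL construction and sends no request; replacing the placeholder with the real endpoint address and fetching the URL with any HTTP client would be the next step.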