Semantic Web Techniques for Multiple Views on Heterogeneous Collections: A Case Study
Marjolein van Gendt, Antoine Isaac, Lourens van der Meij, Stefan Schlobach
ECDL 2006

Outline
• Motivations and project
• Experiment
• Collection formalization
• Collection integration
• Integrated collection access
• Conclusion

Motivation
• Current cultural heritage (CH) trend: portals that build on heterogeneous collections
  • Different databases
  • Documents described/accessed according to different points of view (controlled vocabularies/metadata (MD) schemes)

[Figure: Collection X (Base X) and Collection Y (Base Y), each holding document descriptions structured by its own metadata scheme (MDS 1, MDS 2) and indexed with its own thesaurus (Thesaurus x, Thesaurus y)]

CH Interoperability Problems
• Current CH trend: portals that build on heterogeneous collections
  Different databases/vocabularies/MD schemes
• Syntactic interoperability problem is being solved
  Access can be granted, cf. deployed portals
• Semantic interoperability still to be addressed
  Links with original vocabularies/MD structures are lost

[Figure: MDS 1 and MDS 2 merged into a Unified MD Scheme; DB X and DB Y accessed through a Unified (Virtual) Description Base. No semantic information for description vocabulary]

STITCH General Goals
[SemanTic Interoperability To access Cultural Heritage]
Allow heterogeneous CH collections to be accessed
• In a seamless way
• Still benefiting from specific collection commitments
  Keeping original metadata schemes and vocabularies

[Figure: MDS 1 and MDS 2 kept side by side; DB X and DB Y connected through a Knowledge base]

STITCH General Goals (2)
Allow heterogeneous CH collections to be accessed
• In a seamless way
• Still benefiting from specific collection commitments
  Keeping original metadata schemes and vocabularies
Using Semantic Web means for
• Representation of the different points of view in one system
• Creation and use of the alignment knowledge
2 methodological concerns
• Generalize as much as possible
• Automate as much as possible

Experiment
On a reduced scale
• 2 collections and associated vocabularies
Desired output: insights on
• Use of off-the-shelf Semantic Web (SW) techniques with CH-specific resources
  • Impact of turning to standard proposals (SW-linked tools and methods)
  • In a context of natural semantics (thesauri)
• Added value of this effort
  • Quantitative and qualitative evaluation
  • Simple prototype for accessing documents

1st Collection: KB Illustrated Manuscripts
2nd Collection: Rijksmuseum ARIA collection

Outline
• Motivations and project
• Experiment
• Collection formalization
• Collection integration
• Integrated collection access
• Conclusion

Experiment Steps

Steps
• Gathering vocabulary and collection data
• Analyzing it
• Transforming it using SW standards
  All record/vocabulary information in one repository

Collection Formalization Choices
• Representation of vocabularies
  • Standard RDFS/OWL encoding scheme: SKOS
• Representation of records
  • Ad hoc ontologies for collection MD schemes
  • Linking to SKOS concepts
• RDF Schema repository: Sesame

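A minimal sketch of what these choices amount to, written as plain Python tuples rather than against a real RDF store. The SKOS property names (`skos:prefLabel`, `skos:broader`, `skos:inScheme`) are the standard ones; the `example.org` URIs, the `ex:subject` property, the record identifier and the parent notation `25G` are assumptions made for illustration only.

```python
# The SKOS and RDF namespaces are the standard W3C ones.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical base URI, not from the slides


def concept_triples(scheme, code, label, broader=None):
    """Return (subject, predicate, object) triples describing one SKOS concept."""
    subject = f"{EX}{scheme}/{code}"
    triples = [
        (subject, RDF_TYPE, f"{SKOS}Concept"),
        (subject, f"{SKOS}inScheme", f"{EX}{scheme}"),
        (subject, f"{SKOS}prefLabel", label),
    ]
    if broader is not None:
        # Hierarchical link to the parent concept in the same scheme.
        triples.append((subject, f"{SKOS}broader", f"{EX}{scheme}/{broader}"))
    return triples


def record_triples(record_id, concept_uri):
    """Link a collection record to the SKOS concept that indexes it."""
    # ex:subject stands in for a property of an ad hoc collection ontology.
    return [(f"{EX}record/{record_id}", f"{EX}subject", concept_uri)]


repository = []
# Code and label come from the mapping slide; the parent "25G" is assumed
# from the notation prefix.
repository += concept_triples("iconclass", "25G1", "plants (in general)", broader="25G")
# "manuscript-1" is a hypothetical record identifier.
repository += record_triples("manuscript-1", f"{EX}iconclass/25G1")

for triple in repository:
    print(triple)
```

In the setup described on the slide, such statements would live in the Sesame repository, where both vocabularies and records can be queried together.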
Vocabulary Formalization: ARIA in SKOS

Steps
• Provide mappers with vocabulary data
• Evaluate and select their results
• Put the alignment in the repository

[Figure: MDS 1 and MDS 2 kept side by side; DB X and DB Y connected through a Knowledge base]

Collection Integration: Ontology Mapping Tools
Tests with 2 mapping tools
• S-Match, Trento
  • Tree-like structures mapper
• Falcon-AO, Nanjing
  • Standard OWL ontology mapper
• Using
  • Lexical comparisons
  • Structural comparisons
  • Third resource (WordNet as ‘oracle’)

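To make "lexical comparisons" concrete, here is a deliberately naive sketch that proposes a link whenever two labels share a token. It is not the algorithm of S-Match or Falcon-AO, which also use structure and background knowledge; the function names are hypothetical, and the labels are taken from the mapping slide.

```python
import re


def tokens(label):
    """Lowercase a label and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", label.lower()))


def lexical_matches(source_concepts, target_labels):
    """Propose (source code, target label) pairs whose labels share a token."""
    proposals = []
    for code, label in source_concepts.items():
        for target in target_labels:
            if tokens(label) & tokens(target):
                proposals.append((code, target))
    return proposals


# Labels as they appear on the mapping slide.
ic_subset = {
    "25G1": "plants (in general)",
    "29A": "animals acting as human beings",
    "29B": "plants behaving as human beings or animals",
}
aria_labels = ["Flowers, plants", "Marine and other animals"]

proposals = lexical_matches(ic_subset, aria_labels)
for code, target in proposals:
    print(code, "->", target)
```

Even this crude matcher links 29B to both ARIA labels, because its label contains both "plants" and "animals". Purely lexical evidence can therefore produce questionable links, which is one reason for evaluating and selecting mapper output.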
Collection Integration: Mappings
IC code   IC label                                      ARIA label
29B       plants behaving as human beings or animals    Flowers, plants
25G1      plants (in general)                           Flowers, plants
25G7      language of flowers                           Flowers, plants
25GG3     fabulous trees                                Flowers, plants
42G       family, relationship, descent                 Brothel scenes
25GG5     fabulous lower plants                         Flowers, plants
25H151    deciduous forest                              Flowers, plants
25H152    forest of coniferous trees                    Flowers, plants
25H153    bush, shrubs ~ forest                         Flowers, plants
29A       animals acting as human beings                Marine and other animals
29B       plants behaving as human beings or animals    Marine and other animals

Partial evaluation
• Conceptual level
  • Evaluating links, not results of document searches
• S-Match: 46% precision (subset of IC: 1500 concepts)
• Falcon-AO: 16% precision (subset of IC)
Not much sense?
• A complete evaluation is difficult to carry out
• Qualitative analysis reveals that improvement is possible

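Precision here is the share of proposed links that an assessor accepts. The sketch below shows the computation on made-up link identifiers; the pairs and the resulting value are illustrative only and are not the study's data.

```python
def precision(proposed, correct):
    """Fraction of proposed links that were judged correct."""
    proposed = set(proposed)
    if not proposed:
        raise ValueError("no links proposed, precision is undefined")
    return len(proposed & set(correct)) / len(proposed)


# Hypothetical links: four proposed by a mapper, three accepted by an assessor.
proposed_links = [("s1", "t1"), ("s2", "t1"), ("s3", "t2"), ("s4", "t2")]
accepted_links = [("s1", "t1"), ("s3", "t2"), ("s4", "t2")]

score = precision(proposed_links, accepted_links)
print(f"precision = {score:.2f}")
```

Precision says nothing about links the mapper failed to propose. Measuring that (recall) requires a complete reference alignment, which is where the difficulty of a complete evaluation comes in.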
Nice results (S-Match)
• Lexical matching: 23L
• Lemmatization: 25A271
• Background knowledge: 23U1

Errors
• Not enough NLP: 23H
• Wrong WordNet disambiguation: 29D

Steps
• Adapted faceted browsing paradigm (Flamenco)
  • Search by navigating through several dimensions
• Adaptation of the paradigm: from facets corresponding to orthogonal dimensions of object description (‘material’, ‘location’) to facets corresponding to different conceptual schemes (ARIA, IconClass)
• 3 views (sets of facet definitions) on integrated collections
  • Single view
  • Combined view
  • Merged view

Collections Access: Single View
• Facets based on 1 concept scheme
• Access to objects indexed against concepts from other schemes
  If a mapping exists between their index concepts and the selected concepts
A single point of view on the integrated data set

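A sketch of how a single view might retrieve objects, assuming an in-memory index from record to indexing concepts and an alignment table from concepts of the view's scheme to mapped concepts of the other scheme. The record identifiers, the `aria:`/`ic:` prefixes and the function name are hypothetical; the concept pairs come from the mapping slide.

```python
def single_view(selected, index, alignment):
    """Return records reachable from one concept of the view's scheme.

    A record matches if it is indexed with the selected concept itself
    or with a concept of another scheme that is mapped to it.
    """
    concepts = {selected} | alignment.get(selected, set())
    return sorted(rec for rec, subjects in index.items() if subjects & concepts)


# Hypothetical records; each maps to the set of concepts it is indexed with.
index = {
    "kb-ms-1": {"ic:25G1"},
    "kb-ms-2": {"ic:29A"},
    "aria-obj-1": {"aria:Flowers, plants"},
}
# Alignment from the ARIA concept to Iconclass codes on the mapping slide.
alignment = {"aria:Flowers, plants": {"ic:25G1", "ic:25G7"}}

results = single_view("aria:Flowers, plants", index, alignment)
print(results)
```

Browsing with the ARIA facet alone thus returns the manuscript indexed with Iconclass code 25G1 alongside the ARIA object, while the manuscript indexed with 29A stays out because no mapping connects it to the selected concept.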
Collections Access: Combined View
• Search based on 2 concept schemes
  Facets attached to the different vocabularies are presented
Simultaneous access from different points of view on the same data

Collections Access: Merged View
• Facets using a merged concept scheme with hierarchical links coming from schemes and alignment
Making the links between vocabularies more visible during search
A way to ‘enrich’ weakly structured vocabularies

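One way to picture the merged scheme is as a single graph of narrower/broader edges drawn from both vocabularies and from the alignment. The sketch below makes two assumptions not stated on the slides: that an alignment link can be read as "the Iconclass concept is narrower than the ARIA concept", and that `25H15` is the Iconclass parent of `25H151` and `25H152` (inferred from the notation prefix).

```python
from collections import defaultdict


def merge_hierarchies(*edge_lists):
    """Merge several lists of (narrower, broader) edges into one child map."""
    children = defaultdict(set)
    for edges in edge_lists:
        for narrower, broader in edges:
            children[broader].add(narrower)
    return children


def descendants(children, concept):
    """Collect every concept below `concept` in the merged hierarchy."""
    seen = set()
    stack = [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Iconclass hierarchy fragment (parent assumed from the notation prefix).
ic_edges = [("ic:25H151", "ic:25H15"), ("ic:25H152", "ic:25H15")]
# Alignment pairs from the mapping slide, read here as narrower-than links.
alignment_edges = [
    ("ic:25H151", "aria:Flowers, plants"),
    ("ic:25H152", "aria:Flowers, plants"),
]

merged = merge_hierarchies(ic_edges, alignment_edges)
below_aria = descendants(merged, "aria:Flowers, plants")
print(sorted(below_aria))
```

The flat ARIA term gains two narrower concepts it did not have on its own, which is the sense in which a merged view can enrich a weakly structured vocabulary.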
Outline
• Motivations and project
• Experiment
• Collection formalization
• Collection integration
• Integrated collection access
• Conclusion

Lessons learned: Collection Formalization
Representing different vocabulary types using formal standards is feasible, but not trivial
• Influence of the use of vocabularies on interpretation
• Expressivity level is variable (weakly structured models vs. complex ones)
• Implies some loss of data
Part of the formalization is application- and system-specific
• E.g. depending on standard RDF Schema reasoning services for SKOS axioms