TRANSFORMATION OF A LEGACY UDC-BASED CLASSIFICATION SYSTEM: EXPLOITING AND REMODELLING SEMANTIC RELATIONSHIPS
Fran Alexander, Taxonomy Manager, BBC Information and Archives, London, UK
Andy Heather, Chief Technical Officer, Dods Parliamentary Communications, London, UK (formerly Principal Programme Architect, BBC Technology, London, UK)
*All views expressed here are entirely our own personal views and in no way represent the BBC or official BBC policy.
INTRODUCTION TO THE BBC ARCHIVE
- 2 million items of TV and video
- 300,000 hours of audio
- still photographs, sheet music, and documents
- 4,000 loans per week
- Lonclass (London Classification), based on UDC, introduced 1964
- Telclass (Television Classification), used mainly by the Natural History Unit (NHU), established 1979
DMI PROJECT – “Fabric”
- launched in 2008
- preserve intellectual property and semantic richness of classifications
- facilitate publishing of classification data in semantically rich and interoperable forms
FACET CLASSES AS A BASIS FOR ONTOLOGICAL RELATIONSHIP MODELLING
Facet/Class             | Example               | Format
Subject                 | Emergency Services    | Polyhierarchy
Geographic              | Birmingham            | Simple hierarchy
Event date              | 1585                  | Simple hierarchy
Motion                  | Takeoff               | Flat list
Organisations           | The British Library   | Flat list, divided into sections
Person                  | Elizabeth I           | Flat list, divided into sections
Artistic work           | The Mill on the Floss | Flat list, divided into sections
Shot type               | POV                   | Flat list
Shooting date (archive) | 1971                  | Simple hierarchy
ANALYSIS OF LONCLASS
Lonclass: 370,000 concepts
- 20,000 simple concepts
- 350,000 compound concepts
- some 150,000 KOS concepts (40%) used for only 1 catalogue item
- 50,000 (14%) used for only 2 catalogue items
- 300,000 (80%) used 10 times or fewer
- less than 5% of the concepts (approximately 16,000) were used 100 times or more
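The proportions quoted on this slide can be checked against the 370,000-concept total. The following is a minimal sketch that uses only the rounded counts given above; the variable and function names are illustrative and do not come from the original analysis.

```python
# Rounded counts taken from the slide; all names here are illustrative only.
TOTAL_CONCEPTS = 370_000
SIMPLE_CONCEPTS = 20_000
COMPOUND_CONCEPTS = 350_000

usage_buckets = {
    "used for only 1 catalogue item": 150_000,
    "used for only 2 catalogue items": 50_000,
    "used 10 times or fewer": 300_000,
    "used 100 times or more": 16_000,
}


def share(count: int, total: int = TOTAL_CONCEPTS) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


shares = {label: share(count) for label, count in usage_buckets.items()}

for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

With these rounded inputs the shares come out at roughly 40.5%, 13.5%, 81.1% and 4.3%, which is consistent with the rounded percentages on the slide.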
ANALYSIS OF A LONCLASS COMPOUND TERM
DECOMPOSITION METHODOLOGY
- decompose the PCCs in Lonclass and build term hierarchies for each of the defined Classes of Concepts
- use multiple, redundant classification points to mitigate loss of semantic accuracy
- define a set of terms from the legacy KOS with value as classifications for clustering assets
- utilise terms in the legacy KOS with additional semantic value
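The decomposition step can be pictured as splitting a compound concept into its components and filing each component under its facet class. The sketch below is illustrative only: the component codes, the colon separator and the lookup table are assumptions made for the example and are not real Lonclass notation or the project's actual implementation. The facet names and example terms are taken from the facet table earlier in the deck.

```python
from collections import defaultdict

# Hypothetical lookup from component notation to (facet class, term).
# These codes are invented for illustration; they are not real Lonclass notation.
COMPONENTS = {
    "S100": ("Subject", "Emergency Services"),
    "G200": ("Geographic", "Birmingham"),
    "D1971": ("Shooting date (archive)", "1971"),
}


def decompose(compound: str, separator: str = ":") -> dict[str, list[str]]:
    """Split a compound notation and group its known components by facet class.

    Unknown components are kept under the key "Unresolved" so that no part
    of the original compound is silently dropped.
    """
    by_facet: dict[str, list[str]] = defaultdict(list)
    for part in compound.split(separator):
        facet, term = COMPONENTS.get(part, ("Unresolved", part))
        by_facet[facet].append(term)
    return dict(by_facet)


# "X999" is deliberately absent from the lookup to show the fallback.
result = decompose("S100:G200:X999")
print(result)
```

Keeping unresolved components alongside the resolved ones is one way to retain redundant classification points, so that a component that cannot yet be mapped to a facet is still available for later review.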
TO ASSET PATHWAYS
CLASSIFICATION DATA MODEL
- nodes in the classification space modelled as Concepts with a variable number of alternate and preferred terms
- URIs to provide access to concepts and terms
  http://fabric.bbc.co.uk/classification/<UUID>
- classification groups containing multiple sets of classification terms
- classification groups attached at all levels in the Product Information hierarchy
- classification groups attached at any point on the media timeline
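The data model described on this slide might be sketched as follows. Only the URI pattern comes from the slide; the class names, field names, the facet-keyed grouping and the use of a seconds offset for the timeline are assumptions made for illustration, not the Fabric schema.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

BASE_URI = "http://fabric.bbc.co.uk/classification/"  # URI pattern from the slide


@dataclass
class Concept:
    """A node in the classification space with any number of terms."""
    preferred_terms: list[str]
    alternate_terms: list[str] = field(default_factory=list)
    concept_id: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def uri(self) -> str:
        # The UUID is rendered in its standard 36-character hyphenated form.
        return f"{BASE_URI}{self.concept_id}"


@dataclass
class ClassificationGroup:
    """Several sets of classification terms attached to one point."""
    # Hypothetical: sets of concepts keyed by facet class name.
    term_sets: dict[str, list[Concept]]
    # Hypothetical: identifier of a node in the Product Information hierarchy.
    attached_to: str
    # Hypothetical: offset on the media timeline; None means the whole item.
    timeline_offset_seconds: Optional[float] = None


# Illustrative data only; the alternate term and identifiers are made up.
birmingham = Concept(preferred_terms=["Birmingham"],
                     alternate_terms=["Birmingham, England"])
group = ClassificationGroup(
    term_sets={"Geographic": [birmingham]},
    attached_to="example-programme-item",
    timeline_offset_seconds=42.0,
)
print(birmingham.uri)
```

Making the timeline offset optional lets the same structure serve both cases on the slide: a group attached to a level of the hierarchy as a whole, and a group attached at a specific point on the media timeline.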
INFORMATION DISCOVERY ENVIRONMENT
- integrate the classification space into the Search environment
- match queries against the taxonomy to increase the degree of relevance of the response
- open source Solr search engine selected
- classification space denormalised in the engine to allow runtime node counts to be calculated
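The idea of denormalising the classification space so that node counts can be computed at runtime can be illustrated without a search engine. In the sketch below each asset is indexed with its classified nodes plus all of their ancestors, so counting assets per node becomes a single pass over one field, much as a facet count over a multi-valued field would be. The hierarchy, the asset identifiers and all names other than "Emergency Services" are hypothetical, and this is plain Python rather than Solr configuration.

```python
from collections import Counter

# Hypothetical polyhierarchy: child -> list of parents (a node may have several).
PARENTS = {
    "Fire Brigades": ["Emergency Services"],
    "Ambulance Services": ["Emergency Services", "Health Services"],
    "Emergency Services": ["Public Services"],
    "Health Services": ["Public Services"],
    "Public Services": [],
}


def with_ancestors(node: str) -> set[str]:
    """Return the node plus every ancestor reachable through any parent."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


# Hypothetical assets, each classified at one or more nodes.
ASSETS = {
    "asset-1": ["Fire Brigades"],
    "asset-2": ["Ambulance Services"],
    "asset-3": ["Health Services"],
}

# Denormalise: store the full ancestor closure on each indexed document.
index = {
    asset_id: set().union(*(with_ancestors(n) for n in nodes))
    for asset_id, nodes in ASSETS.items()
}

# Node counts are now a single pass over the denormalised field.
node_counts = Counter(node for nodes in index.values() for node in nodes)
print(node_counts.most_common())
```

Because each asset stores a set of nodes, an asset reached through two parents in the polyhierarchy is still counted once per node, which is the property a runtime count needs.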
PROBLEMS AND LIMITATIONS
- inability of SKOS to fully model the order of relationships between multiple concept instances
- SKOS vocabulary of relationship types is limited
- stopping point for decomposition
BENEFITS OF EXPORTING TAXONOMIES IN OPEN FORMATS
CONCLUSIONS
- preserve semantics through migrations
- export in open formats
KEY REFERENCES
- Ben-Yitzhak, Neumann, Sznajder et al. (2008). Beyond Basic Faceted Search. IBM Research Labs.
- Bergman, M. K. (2009). Confronting Misconceptions with Adaptive Ontologies. [Blog post.] Available at: http://www.mkbergman.com/553/confronting-misconceptions-with-adaptive-ontologies/
- Black, P. E. (2004). Dictionary of Algorithms and Data Structures [online], ed., U.S. National Institute of Standards and Technology. Available at: http://www.nist.gov/dads/HTML/directAcycGraph.html
- Bosch, M. (2006). Ontologies, Different Reasoning Strategies, Different Logics, Different Kinds of Knowledge Representation: Working Together. Knowledge Organization, 33(3), pp. 153-159.
- Brickley, D. (2010). Lonclass and RDF. [Blog post.] Available at: http://danbri.org/words/2010/11/18/585
- Brickley, D. (2011). Video Linking: Archives and Encyclopedias. [Blog post.] Available at: http://danbri.org/words/2011/02/01/658
- Foskett, A. C. (1971). The Subject Approach to Information. London, UK: Clive Bingley.
- Frické, M. (2011). Classification, Facets, and Metaproperties. Journal of Information Architecture, 2(2). Available at: http://journalofia.org/volume2/issue2/04-fricke/
- NoTube. http://notube.tv/about-3/partners/
- Rodriguez-Castro, B.; Glaser, H.; Carr, L. (2010). How to Reuse a Faceted Classification and Put It on the Semantic Web. In ISWC 2010, Part I, LNCS 6496; P.F. Patel-Schneider et al. (eds.), pp. 663-678. Berlin/Heidelberg: Springer-Verlag.

Acknowledgements
Nicholas Chivers; Ken Haylock; Kathryn Stickley; Helen Pritchard (DMI development team); Oliver Gardiner; John Jordan
Map of the Semantic Web: http://www.flickr.com/photos/jurvetson/3277667570/