Vocabulary management and SKOS: Putting Business in the Lead
Jan Voskuil (Taxonic)
September 5th, 2014, Leipzig, SEMANTiCS 2014
Introduction
• Jan Voskuil, Taxonic (co-founder), Consultancy in Semantic Technology
• “SKOS is used for findability, but should be used also for vocabulary management in organizations. Business owns the dictionary, not IT”
• What are dictionaries and what for?
• SKOS: Tooling and benefits
• Practicalities
Dienst Justitiële Inrichtingen (DJI): Custodial Institutions Agency
• Approx. 10,000 employees
• Approx. 70,000 inmates per year
• Approx. 50 facilities
• Four groups of detainees
  – Adult detainees
  – Juvenile offenders
  – Patients in forensic care
  – Foreign nationals
Dictionaries: Benefits
• Knowledge management
  – Quality of information
  – Manageability
  – “If your systems contain 100K+ attribute names, then they contain unstructured information” (Dave McComb)
• Findability
  – Documents (DMS)
  – Data (DBMS)
• Exchangeability
How many key words are enough?
• Zipf’s Law: 5,000 words are enough to understand 95% of any corpus. For the other 5% you need to know the other 200,000 words.
  Source: Tiberius and Schoonheim, A Frequency Dictionary of Dutch, 2014
• [Figure labels: frequency of the most frequent word; frequency of the second most frequent word]
• Pocket dictionary: 5K
• General dictionary: 100K
• Lexicographic dictionary: 1M+
The Real World
• What is the correct definition of x?
• Who decides this?
• My project introduces new terms, how can I get these accepted?
Dictionary: Owner
• Begrippenwoordenboek DJI: Dept X
• Begrippenlijst Project Y: Project Y
• Mega Glossary: ICT-Dept
Information chain dictionaries
• Ketenwoordenboek Strafrecht: JustID
• Ketenwoordenboek Vreemdelingen: JustID
• Justitiethesaurus: WODC
Data Dictionaries
• Gegevenswoordenboek MITS: ICT-Dept
• Datadictionary Tulp MIR: ICT-Dept
• …
It just does not work!
OLD SITUATION → NEW SITUATION
• Various lists → Single source of truth
• Various versions → Single source of truth
• Word documents → Intranet (Internet)
• Distribution per mail → Intranet (Internet)
• Endless discussions → Clear-cut governance
• Responsibility of IT dept or project → Ownership by the business
Some How To’s
• Keep the dictionary lean and mean
  – Create a “pocket dictionary”
  – Example: 1200 key words
• Governance: be pragmatic
• Ownership within the business!
• Use clear, explanatory descriptions
  – Language of the work force
  – Avoid legal speak!
• Dictionary maintenance is a continuous process!
  – Release cycle
  – One major, four minor releases per year
  – Major release is approved by senior executives
Why SKOS is so great: just enough semantics
• Semantic relations
  – Compare one-dimensional lists
• A LIMITED number of STANDARDIZED semantic relations
  – Broader, Narrower, Related Term
  – Semantics is sufficiently vague
• Intuitive, easy to understand
  – Ideal for “pidginization”
  – Use is far broader than Class Diagrams, ERDs and ontologies
• Only most relevant info
• “GENERALIZED CLASSIFICATION”
[Diagram: three example hierarchies]
• Justitiabele (“Detainee”), narrower: Adult detainee, Juvenile offender, Foreign national, Patient in forensic care
• Criminal Law, narrower: Penal Institution
• Sex, narrower: Male, Female, Unknown, Undisclosed
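The three hierarchies on this slide can be sketched in a few lines of Python. This is only an illustration with a plain dictionary, not the API of PoolParty or of any SKOS library; the names NARROWER, broader_of and narrower_transitive are invented for the example, and the concept labels are copied from the slide.

```python
# Hypothetical sketch: SKOS-style "narrower" relations held in a plain dict.
# Keys are broader concepts, values are their direct narrower concepts.
NARROWER = {
    "Justitiabele": [
        "Adult detainee",
        "Juvenile offender",
        "Foreign national",
        "Patient in forensic care",
    ],
    "Criminal Law": ["Penal Institution"],
    "Sex": ["Male", "Female", "Unknown", "Undisclosed"],
}


def broader_of(label):
    """Return every concept that lists `label` as narrower.

    In SKOS, broader is the inverse of narrower, so it can be derived
    instead of being stored a second time.
    """
    return [parent for parent, children in NARROWER.items() if label in children]


def narrower_transitive(label):
    """Return all concepts below `label`, following narrower links depth-first."""
    found = []
    for child in NARROWER.get(label, []):
        found.append(child)
        found.extend(narrower_transitive(child))
    return found


if __name__ == "__main__":
    print(broader_of("Juvenile offender"))
    print(narrower_transitive("Sex"))
```

Two relations (plus "related") are enough to navigate the whole structure, which is the point of "just enough semantics".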
Why SKOS is so great: tooling
Tooling: PoolParty Thesaurus Manager
End User View
SKOS is an Open Standard: Project Linking
http://vocabulary.wolterskluwer.de
prefLabel: Unfallverhütung (accident prevention)
• From Wolters Kluwer: Alternative labels, Broaders, Narrowers, Related terms
• Other thesauri on the web: From DBPedia, From lod.gesis.org, From eurovoc.org
DJI and the POLICE have very different meanings for the word ARRESTANT
DO:
> RESPECT DIFFERENCES BETWEEN ORGANIZATIONS
> MAKE LEXICOGRAPHIC DIFFERENCES EXPLICIT USING LINKED THESAURI
DON’T:
> TRY MAKING ALL ORGANIZATIONS USE EXACTLY THE SAME LANGUAGE
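One way to picture "linked thesauri" is as two separate concepts that share a label and are connected by a mapping relation. The sketch below rests on assumptions: the example.org namespaces and concept URIs are invented, the triples are plain Python tuples rather than a real RDF store, and skos:relatedMatch is chosen only because the slide says the meanings differ; the talk does not say which mapping property to use.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical namespaces, NOT taken from the talk.
DJI = "http://example.org/dji/"
POLICE = "http://example.org/police/"

# Each organization keeps its own concept in its own scheme.
TRIPLES = [
    (DJI + "arrestant", SKOS + "prefLabel", "Arrestant"),
    (DJI + "arrestant", SKOS + "inScheme", DJI + "thesaurus"),
    (POLICE + "arrestant", SKOS + "prefLabel", "Arrestant"),
    (POLICE + "arrestant", SKOS + "inScheme", POLICE + "thesaurus"),
    # Assumed mapping: the concepts are related, but deliberately not
    # declared equivalent (no skos:exactMatch).
    (DJI + "arrestant", SKOS + "relatedMatch", POLICE + "arrestant"),
]


def concepts_with_label(label):
    """Return all concept URIs whose preferred label equals `label`."""
    return sorted(
        s for s, p, o in TRIPLES if p == SKOS + "prefLabel" and o == label
    )


def mappings_from(concept):
    """Return (mapping property, target) pairs leaving `concept`."""
    return [
        (p, o)
        for s, p, o in TRIPLES
        if s == concept and p.startswith(SKOS) and p.endswith("Match")
    ]


if __name__ == "__main__":
    print(concepts_with_label("Arrestant"))
    print(mappings_from(DJI + "arrestant"))
```

The same word resolves to two concepts, and the difference stays visible in the data instead of being negotiated away.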
Conclusion and next step: Linking Thesauri to Data Models
• Data models: not owned by business
  – too detailed
  – too complex
  – NO ownership at the strategic level
• Thesauri
  – Relatively abstract
  – Relatively simple
  – Ownership by the business
• SKOS bridges the gap
  – With data models in RDF, the gap can be bridged!
THESAURUS AND DOMAIN MODELS: SCENARIO 1
THESAURUS
• voc:4862 rdf:type skos:Concept
• voc:4862 skos:prefLabel “Penitentiary Institution”
• voc:4862 skos:definition “A prison,[3] gaol or jail[4] is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…”
• voc:4862 skos:broader “Detention Facility” (rdf:type skos:Concept)
• voc:4862 skos:exactMatch eurovoc:C877
• eurovoc:C877 skos:prefLabel “Penal Institution”@en
• eurovoc:C877 skos:prefLabel “място за лишаване от свобода”@bg
LINK between thesaurus and domain model: owl:sameAs? skos:exactMatch?
DOMAIN MODEL | Data dictionary
• :pi_Dordrecht rdf:type :penitentiaryInstitution
• :inmate#9818763 :isRegisteredAt :pi_Dordrecht
• :cell “B.23.a”
THESAURUS AND DOMAIN MODELS: SCENARIO 2
THESAURUS
• “Detention Facility” rdf:type skos:Concept
• eurovoc:C877 skos:prefLabel “Penal Institution”@en
• eurovoc:C877 skos:prefLabel “място за лишаване от свобода”@bg
• “A prison,[3] gaol or jail[4] is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…”
DOMAIN MODEL | Data dictionary
• :penitentiaryInstitution rdf:type skos:Concept
• :penitentiaryInstitution skos:prefLabel “Penitentiary Institution”
• :penitentiaryInstitution skos:exactMatch eurovoc:C877
• :pi_Dordrecht rdf:type :penitentiaryInstitution
• :inmate#9818763 :isRegisteredAt :pi_Dordrecht
• :cell “B.23.a”
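The difference between the two scenarios can be sketched with triples held as Python tuples. This is one possible reading of the diagrams, not the speaker's implementation: in Scenario 1 the thesaurus concept (voc:4862) and the domain-model class are separate resources joined by a link whose type the slide leaves open, while in Scenario 2 the domain-model resource is itself typed as a skos:Concept. The function names and the lookup logic are assumptions made for the example.

```python
# Link properties the slide proposes (with question marks) for Scenario 1.
LINKS = {"owl:sameAs", "skos:exactMatch"}


def scenario_1(link="skos:exactMatch"):
    """Thesaurus concept and domain-model class are two linked resources."""
    return [
        ("voc:4862", "rdf:type", "skos:Concept"),
        ("voc:4862", "skos:prefLabel", "Penitentiary Institution"),
        ("voc:4862", "skos:exactMatch", "eurovoc:C877"),
        ("voc:4862", link, ":penitentiaryInstitution"),
        (":pi_Dordrecht", "rdf:type", ":penitentiaryInstitution"),
    ]


def scenario_2():
    """The domain-model resource doubles as the thesaurus concept."""
    return [
        (":penitentiaryInstitution", "rdf:type", "skos:Concept"),
        (":penitentiaryInstitution", "skos:prefLabel", "Penitentiary Institution"),
        (":penitentiaryInstitution", "skos:exactMatch", "eurovoc:C877"),
        (":pi_Dordrecht", "rdf:type", ":penitentiaryInstitution"),
    ]


def objects(triples, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def business_label(triples, instance):
    """Find the thesaurus label that describes the type of `instance`."""
    for cls in objects(triples, instance, "rdf:type"):
        # Scenario 2: the class carries the label itself.
        labels = objects(triples, cls, "skos:prefLabel")
        if labels:
            return labels[0]
        # Scenario 1: walk the link back to the thesaurus concept.
        for s, p, o in triples:
            if o == cls and p in LINKS:
                labels = objects(triples, s, "skos:prefLabel")
                if labels:
                    return labels[0]
    return None


if __name__ == "__main__":
    print(business_label(scenario_1(), ":pi_Dordrecht"))
    print(business_label(scenario_2(), ":pi_Dordrecht"))
```

Both scenarios lead from a data record to the same business term; Scenario 1 needs one extra hop over the link, Scenario 2 trades that hop for mixing thesaurus and domain-model roles in one resource.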
jan.voskuil@taxonic.com www.taxonic.com