Ontology Maintenance Support: Text, Tools, and Theories
Chris Welty
IBM Research

Outline
• Opening joke
• Motivation
• Maintenance
• Support
  – Tools
  – Theories
  – Text Analysis

Motivation
• Given: Ontologies matter
  – Does quality matter?

Does quality matter?
• Good quality ontologies cost more
  – Coverage, correctness, richness, commitment [Kashyap, 2003]
  – Organization, meta-level consistency [Guarino & Welty, 2000] [Rector, 2002]
  – Required for some applications
• Improvements in quality can improve performance [Welty, et al, 2004]
  – 18% f-improvement in search
  – Cleanup cost ~ 1mw/3000 classes
  – BUT … low quality ontology still improved base

Motivation
• Given: Ontologies matter
  – Does quality matter? Sometimes
• Problem: How to create them
• Bigger problem: how to maintain them
  – From SE: 80% of the cost is maintenance [Schrobe, 1996]

Software Maintenance
• Fixing Bugs
• Testing
• Enhancing

Ontology Maintenance
• Fixing Bugs
  – Inconsistent
  – Inaccurate
  – Inefficient
• Testing
  – Regression tests
  – Test Suites
  – Meta tag sets for test content
  – Ablation tests
• Enhancing
  – Tweaking
    • Richness
    • Correctness
    • Organization
    • Meta-level consistency
    • Efficiency
  – Extending
    • Improving coverage
    • Extending commitment
    • Integration
  – Refactoring
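
The "Regression tests" bullet above can be pictured with a small sketch. This is only an illustration under stated assumptions, not a tool from the talk: the toy taxonomy, the class names, and the helpers `ancestors` and `run_regression` are all hypothetical. The idea is to record subsumptions that must keep holding, and ones that must never appear, and re-check them after every edit.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy: class name -> set of direct superclasses.
TAXONOMY: Dict[str, Set[str]] = {
    "MicroDrive": {"DiskDrive"},
    "DiskDrive": {"ComputerPart"},
    "Memory": {"ComputerPart"},
    "ComputerPart": set(),
    "Computer": set(),
}


def ancestors(cls: str, taxonomy: Dict[str, Set[str]]) -> Set[str]:
    """Return all direct and indirect superclasses of cls."""
    seen: Set[str] = set()
    stack = list(taxonomy.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxonomy.get(current, ()))
    return seen


def run_regression(
    taxonomy: Dict[str, Set[str]],
    must_hold: List[Tuple[str, str]],
    must_not_hold: List[Tuple[str, str]],
) -> List[str]:
    """Check (subclass, superclass) expectations; return failure messages."""
    failures: List[str] = []
    for sub, sup in must_hold:
        if sup not in ancestors(sub, taxonomy):
            failures.append(f"missing: {sub} < {sup}")
    for sub, sup in must_not_hold:
        if sup in ancestors(sub, taxonomy):
            failures.append(f"unexpected: {sub} < {sup}")
    return failures


# Expectations kept alongside the ontology and re-run after each change.
MUST_HOLD = [("MicroDrive", "ComputerPart")]
MUST_NOT_HOLD = [("DiskDrive", "Computer")]
```

An empty list from `run_regression(TAXONOMY, MUST_HOLD, MUST_NOT_HOLD)` means the edit did not disturb the recorded expectations; a real test suite would presumably also cover properties and instances.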

A looming problem
• Prediction
  – Ontology maintenance will become the significant problem as ontologies become more mainstream
  – Will follow the SE model (80% of cost)
• Observation/Conjecture
  – High quality ontologies are easier to maintain

Tool Support
• Hierarchical view of classes
• Hierarchical view of properties
• Consistency Reasoning
  – But… no "segmentation faults"
• Inferential Reasoning
• View non-tree taxonomies
• View relations between classes
• Global axioms
• View meta-level
• Basic Upper-level Theories
  – Space, Time, Parts, …
• Assistance for integration

Theory Support
• Meta-level analysis
  – OntoClean [Guarino & Welty, 2000]
• Good organizing principles
  – R-Normalization [Rector, 2002]
• Well-founded upper levels
  – Dolce [Gangemi, et al., 2003]
  – DAML-Time [Hobbs, 2003]
  – RCC [Randell, Cui & Cohn, 1992]

OntoClean
• Draw fundamental notions from Formal Ontology
• Establish a set of useful meta-properties, based on behavior wrt above notions
• Explore the way these meta-properties combine to form relevant property kinds
• Explore the taxonomic constraints imposed by these property kinds
  – Expose common modeling pitfalls
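
To make "taxonomic constraints imposed by meta-properties" concrete, here is a minimal sketch of one constraint from the OntoClean literature: an anti-rigid property cannot subsume a rigid one. The slide itself does not list the meta-properties, so the choice of rigidity, the tag strings, the example classes, and the function `rigidity_violations` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed tag notation for this sketch: "+R" rigid, "~R" anti-rigid.
RIGID = "+R"
ANTI_RIGID = "~R"


def rigidity_violations(
    tags: Dict[str, str],
    subsumptions: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (superclass, subclass) links where an anti-rigid class
    subsumes a rigid one. Untagged classes are never flagged."""
    return [
        (sup, sub)
        for sup, sub in subsumptions
        if tags.get(sup) == ANTI_RIGID and tags.get(sub) == RIGID
    ]


# Hypothetical tagging: every person is necessarily a person (rigid),
# but being a student is a role one can leave (anti-rigid).
TAGS = {"Person": RIGID, "Student": ANTI_RIGID}
```

With these tags, placing Person under Student is flagged, while placing Student under Person is not. Other meta-properties would need analogous checks, which this sketch does not attempt.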

Overloading Subsumption
Common modeling pitfalls
• Instantiation
• Constitution
• Composition
• Disjunction
• Polysemy

Instantiation
[Diagram: ThinkPad Model, with T21 below it and My ThinkPad (s# xx123) below T21]
Does this ontology mean that My ThinkPad is a ThinkPad Model? Ooops…
Question: What ThinkPad models do you sell?
Answer should NOT include My ThinkPad -- nor yours.
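
The ThinkPad example can be replayed in a few lines. This is a sketch under assumptions: the link lists and the helper `below` are hypothetical, and keeping a separate list of instance links is just one simple way to avoid the overloading, not necessarily the modeling the talk recommends.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]  # (child, parent)


def below(root: str, links: Iterable[Link]) -> Set[str]:
    """Everything reachable downward from root by following the links."""
    links = list(links)
    found: Set[str] = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, link_parent in links:
            if link_parent == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found


# Overloaded: a single "is a" link type is used for everything.
OVERLOADED = [("T21", "ThinkPad Model"), ("My ThinkPad", "T21")]

# Separated: links between models are kept apart from links that
# attach an individual machine to its model.
MODEL_LINKS = [("T21", "ThinkPad Model")]
INSTANCE_LINKS = [("My ThinkPad", "T21")]

overloaded_answer = below("ThinkPad Model", OVERLOADED)
separated_answer = below("ThinkPad Model", MODEL_LINKS)
```

Here `overloaded_answer` contains My ThinkPad, which is the "Ooops" on the slide, while `separated_answer` contains only T21.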

Composition
[Diagram: taxonomy with Computer above Disk Drive and Memory; Micro Drive also appears]
Question: What Computers do you sell?
Answer should NOT include Disk Drives or Memory.

Disjunction
[Diagram: Computer has-part Computer Part; Disk Drive, Memory, and Micro Drive appear in the Computer Part taxonomy; Camera-15 has-part Flashcard-110]
Unintended model: flashcard-110 is a computer-part

Polysemy
[Diagram: taxonomy with Physical Object, Abstract Entity, and Book]
Question: How many books do you have on Hemingway?
Answer: 5,000

Constitution
[Diagram: taxonomy with Entity, Amount of Matter, Physical Object, Clay, Metal, and Computer]
Question: What types of matter will conduct electricity?
Answer should NOT include computers.

Text Analysis Support
• Document Classification
  – Subject hierarchies
  – Identify relevant concepts
• Information Extraction
  – Find individuals
  – Glossary extraction [Park, 2004]

Concept-specific Ontology Building through Search
• Human expert knows what she is interested in: anchor concept
• Find relations and other related concepts for the anchor concept
• Active acquisition of knowledge sources through search
  – Concept-defining knowledge source: glossaries or dictionaries
  – Up-to-date knowledge source: web documents
• Very useful for recognizing missing terms

Domain Term Recognition
• Nominal Expressions
  – acute radiation syndrome
  – intercontinental and submarine-launched ballistic missile
  – highly enriched uranium
• New Domain Word Identification
  – agroterrorism, astrobiology, biocomputation
• Generic Premodifier Filtering
  – average radial first harmonic runout
  – absolute amazement/zero
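
The "Generic Premodifier Filtering" examples suggest dropping leading modifiers such as "average" that carry no domain meaning, while keeping them where they belong to the term itself ("absolute zero" versus "absolute amazement"). The sketch below is a guess at the general idea, not the method used in the talk: the word list, the protected-term list, and the function `filter_premodifiers` are all hypothetical.

```python
# Hypothetical stoplist of generic premodifiers.
GENERIC_PREMODIFIERS = {"average", "absolute"}

# Hypothetical list of terms in which a listed modifier is part of the term.
PROTECTED_TERMS = {"absolute zero"}


def filter_premodifiers(candidate: str) -> str:
    """Strip leading generic premodifiers from a candidate term,
    stopping as soon as the remainder is a protected term."""
    words = candidate.lower().split()
    while (
        len(words) > 1
        and " ".join(words) not in PROTECTED_TERMS
        and words[0] in GENERIC_PREMODIFIERS
    ):
        words = words[1:]
    return " ".join(words)


runout_term = filter_premodifiers("average radial first harmonic runout")
zero_term = filter_premodifiers("absolute zero")
amazement_term = filter_premodifiers("absolute amazement")
```

A fixed stoplist is the simplest possible design; a practical system would more likely decide from corpus statistics whether a modifier is generic in a given context.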