
Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology (DDI)



  1. ICBO 2013 Workshop on Vaccine and Drug Ontology Studies. Auditing Redundant Import in Reuse of a Top Level Ontology for the Drug Discovery Investigations Ontology (DDI). Zhe He (1), Christopher Ochs (1), Larisa Soldatova (2), Yehoshua Perl (1), Sivaram Arabandi (3), James Geller (1). Affiliations: (1) New Jersey Institute of Technology, (2) Brunel University, (3) Ontopro LLC.

  2. Outline
     • Introduction
       – Environment
       – Motivation
       – Ontology for Drug Discovery Investigations (DDI)
       – Abstraction Networks & Partial Area Taxonomy
     • Algorithm Hide
       – Hiding Redundant BFO (Basic Formal Ontology) classes from DDI
     • Future work
     • Conclusions

  3. Environment
     • BioPortal: a large repository of over 340 biomedical ontologies covering a wide range of domains.
     • Many ontologies in BioPortal are released in OWL or OBO format.
     • OWL (Web Ontology Language): based on Description Logic, maintained by a working group of W3C.
     • OBO (Open Biological and Biomedical Ontologies) Foundry: a collaborative experiment involving developers of ontologies who are establishing a set of principles for ontology development.

  4. Motivation
     • Using a top-level ontology as a template for a domain ontology is recommended.
     • OBO Foundry recommends importing BFO (Basic Formal Ontology).
     • The top-domain ontologies OGMS (Ontology for General Medical Science) and BioTop (Beisswanger et al. 2008) reuse BFO.
     • Some domain ontologies reuse OGMS, thereby indirectly reusing BFO.

  5. Motivation (cont.)
     • Ontologies need to go through Quality Assurance before being put to use.
       – Quality Assurance discovers modeling errors and inconsistencies in the design.
       – Unused imported top-level classes diminish the usability of the ontology.
       – Currently, there is no mechanism to remove unused imported classes.
       – Redundant imported top-level classes should be hidden.

  6. Ontology for Drug Discovery Investigations
     • DDI was developed to support automatic drug discovery investigations run by a Robot Scientist “Eve” (Qi et al. 2010).
     • DDI is used for reasoning with data about the biological activity of compounds with regard to various drug targets.
     • DDI uses BFO (Basic Formal Ontology) and RO (Relations Ontology) as design templates and extends BFO and OBI (Ontology for Biomedical Investigations).
     • Some imported BFO classes were left unused in DDI, for example:
       – connected_temporal_region
       – temporal_instant
       – temporal_interval

  7. Abstraction Networks
     • An abstraction network is a secondary network that provides a compact view of the structure and content of the primary ontology.
     • Abstraction of an ontology is the process by which subsets of classes are each replaced by a higher-level conceptual entity (node).
     [Figure: an ontology and its abstraction network; each node of the network models a subset of classes.]

  8. Partial Area Taxonomy
     • Partial area taxonomy is an abstraction network developed by our research group that summarizes sets of structurally and semantically similar classes.
     • Partial area taxonomies have been derived for
       – SNOMED CT (Wang et al. 2007)
       – Ontology of Clinical Research (OCRe) (Ochs et al. 2012)
       – Sleep Domain Ontology (SDO) (Ochs et al. 2013)
       – Cancer Chemoprevention Ontology (CanCo) (He et al. 2013)
       – etc.

  9. Area Taxonomy
     • Area: the set of all classes that are explicitly defined or inferred as being in exactly the domain of a given set of object properties.

  10. Partial Area Taxonomy
      • Root: a class with no superclasses in its area.
      • Partial area: a root together with all of its descendants in the area.
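The two definitions above (area, partial area) can be illustrated with a minimal Python sketch. It assumes a tree hierarchy and assumes that "inferred" means a class inherits the object-property domains of its superclasses. The function names, class names, and the `has_participant` property are hypothetical and are not taken from DDI or from the authors' tooling.

```python
from collections import defaultdict


def build_areas(parent, domains):
    """Group classes by the exact set of object properties they are in the domain of.

    parent:  dict mapping each class to its single parent (None for the top class).
    domains: dict mapping each object property to the set of classes
             explicitly declared as its domain.
    """
    def props_of(cls):
        # Collect explicit domains of cls and of all its ancestors (inheritance).
        props = set()
        while cls is not None:
            props |= {p for p, ds in domains.items() if cls in ds}
            cls = parent[cls]
        return frozenset(props)

    areas = defaultdict(set)
    for cls in parent:
        areas[props_of(cls)].add(cls)
    return dict(areas)


def partial_areas(parent, area):
    """Split one area into partial areas: each root plus its descendants in the area."""
    roots = {c for c in area if parent[c] not in area}
    result = {r: set() for r in roots}
    for c in area:
        r = c
        while r not in roots:  # walk up until the root of this partial area
            r = parent[r]
        result[r].add(c)
    return result


# Hypothetical toy hierarchy, for illustration only.
parent = {
    "entity": None,
    "continuant": "entity",
    "occurrent": "entity",
    "process": "occurrent",
    "assay": "process",
}
domains = {"has_participant": {"process"}}

areas = build_areas(parent, domains)
root_area = areas[frozenset()]  # classes with no object properties
root_partial_areas = partial_areas(parent, root_area)
```

In this toy input, the classes without object properties form one area whose only root is `entity`; `process` and `assay` form a second area rooted at `process`.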

  11. Algorithm Hide
      • Hide is a post-order recursive algorithm requiring linear time.
      • Hide identifies imported classes that are not used in the domain ontology.
      • Applicability:
        – Ontologies in OWL or OBO format
        – Both domain ontology and top-level ontology are trees.
        – Top-level ontology does not have object properties.
      • A class is redundant if it is:
        – Imported from the top-level ontology AND
        – In the Root partial area of the taxonomy AND
        – A leaf in the domain ontology (at some stage of the algorithm) AND
        – Not used as range of an object property
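The tree requirement in the applicability list could be checked before running Hide. The following is only a sketch under assumptions: the hierarchy is given as a dict from each class to its list of direct superclasses, every superclass also appears as a key, and `is_tree` is a hypothetical helper name.

```python
def is_tree(parents):
    """Return True if the hierarchy has one root, single inheritance, and no cycles.

    parents: dict mapping each class to the list of its direct superclasses
             (an empty list for the root). Every superclass must also be a key.
    """
    roots = [c for c, ps in parents.items() if not ps]
    if len(roots) != 1:
        return False
    if any(len(ps) > 1 for ps in parents.values()):
        return False  # multiple inheritance means a DAG, not a tree
    for c in parents:
        seen = set()
        while parents[c]:
            if c in seen:
                return False  # cycle among non-root classes
            seen.add(c)
            c = parents[c][0]
    return True


# Hypothetical inputs, for illustration only.
tree = {"entity": [], "continuant": ["entity"], "occurrent": ["entity"]}
dag = dict(tree, mixed=["continuant", "occurrent"])
tree_ok = is_tree(tree)
dag_ok = is_tree(dag)
```

A hierarchy that fails this check (for example, one with multiple inheritance) falls outside the stated applicability of Hide.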

  12. Partial Area Taxonomy for DDI

  13. Entity Node of DDI Taxonomy
      • 81 classes in Entity root partial area of DDI taxonomy
      • BFO has 38 classes.
      • 32 out of 81 classes are imported from BFO.
      • 6 BFO classes are used as domains of object properties.
      • Hence, we reviewed the remaining 32 BFO classes (38 − 6) for redundancy.

  14. BFO Classes in Entity Node Before Hiding
      Entity (2 children)
        continuant (3 children)
          dependent_continuant (2 children)
          independent_continuant (3 children)
            material_entity (10 children)
              fiat_object_part
              object
              object_aggregate
            object_boundary
            site (3 children)
          spatial_region (4 children)
            one_dimensional_region
            two_dimensional_region
            three_dimensional_region
            zero_dimensional_region
        occurrent (3 children)
          processual_entity (6 children)
            fiat_process_part
            process (2 children)
            process_aggregate
            process_boundary
            processual_context
          spatiotemporal_region (2 children)
            connected_spatiotemporal_region (2 children)
              spatiotemporal_instant
              spatiotemporal_interval
            scattered_spatiotemporal_region
          temporal_region (2 children)
            connected_temporal_region (2 children)
              temporal_instant
              temporal_interval
            scattered_temporal_region
      Legend (color-coded on the slide): leaf; parent of classes that are all leaves; grandparent of grandchildren that are all leaves.

  15. BFO Classes in Entity Partial Area After Hiding
      Entity (2 children)
        continuant (3 children)
          dependent_continuant (2 children)
          independent_continuant (3 children)
            material_entity (10 children)
            site (3 children)
          spatial_region (4 children)
            one_dimensional_region
            two_dimensional_region
            three_dimensional_region
            zero_dimensional_region
        occurrent (3 children)
          processual_entity (6 children)
            process (2 children)
      • 18 unused BFO classes are hidden.
      • That is, 18/32 = 56% of the BFO classes in the Entity partial area are hidden.

  16. Future Work
      • As many as 35 out of the 186 ontologies we investigated in BioPortal reuse BFO classes.
      • Some ontologies have a Directed Acyclic Graph (DAG) hierarchy, e.g. SDO (Sleep Domain Ontology) (Arabandi 2010).
      • We need to consider cases where both top-level and domain ontologies are DAG hierarchies.
      • Some top-domain ontologies have object properties, e.g. BioTop.
      • We need to design an algorithm that deals with redundant import of relationships in the reuse of top-domain ontologies.

  17. Conclusions
      • We described a recursive linear-time algorithm for hiding unused imported top-level ontology classes of an OWL-based ontology.
      • The algorithm was demonstrated by hiding 18 (56%) of the imported BFO classes in DDI.
      • Hiding of unused imported top-level classes should be part of the Quality Assurance process of OWL-based ontologies.

  18. References
      • Qi, D., R. D. King, et al. (2010). "An ontology for description of drug discovery investigations." J Integr Bioinform 7(3).
      • Arabandi, S. (2010). "Developing a Sleep Domain Ontology." AMIA TBI/CRI Summit. San Francisco, CA.
      • Beisswanger, E., S. Schulz, et al. (2008). "BioTop: An Upper Domain Ontology for the Life Sciences." Appl Ontology 3(4): 205-212.
      • Wang, Y., et al. (2007). "Structural methodologies for auditing SNOMED." J Biomed Inform 40(5): 561-581.
      • Ochs, C., A. Agrawal, et al. (2012). "Deriving an Abstraction Network to Support Quality Assurance in OCRe." AMIA Annu Symp Proc: 681-689.
      • Ochs, C., Z. He, et al. (2013). "Choosing the Granularity of Abstraction Networks for Orientation and Quality Assurance of the Sleep Domain Ontology." The 4th International Conference on Biomedical Ontology Proc.
      • He, Z., C. Ochs, et al. (2013). "A Family-based Framework for Supporting Quality Assurance of Biomedical Ontologies in BioPortal." To appear in AMIA Annu Symp Proc.

  19. Thank you! Any Questions?

  20. Algorithm of Hide
      Inputs: O = domain ontology; T = top-level ontology; R = root partial area of O; v = class in O.

      Algorithm Hide(R, O, T, v)
        IF isInternal(O, v) THEN
          FOR EACH Class w IN subclasses(R, v) {
            Hide(R, O, T, w)
          }
        END IF

        IF NOT(isInternal(O, v)) THEN
          IF isClassFrom(v, O, T) AND NOT(in_op_range(v, O)) THEN
            hide(v, O)
          END IF
        END IF

        RETURN

      Main Program
        // Initially, call Hide on the root class r of the root partial area R.
        Hide(R, O, T, r)

      Function Name: Function Description
      • isInternal(O, v): Boolean function that returns true if class v has any subclasses in ontology O.
      • subclasses(R, v): Returns an iterator to the set of subclasses of class v in root partial area R.
      • isClassFrom(v, O, T): Boolean function that returns true if the class v in ontology O is imported from Top-Level ontology T.
      • in_op_range(v, O): Boolean function that returns true if class v is in the range of an object property of ontology O.
      • hide(v, O): Hides class v from ontology O and therefore also removes all subclass relationships from v.
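The Hide procedure on this slide can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes the ontology is represented by plain dicts and sets instead of an OWL API; the function name `hide_redundant`, its parameters, and the toy class `drug_screening` are hypothetical. Hiding is modeled by adding the class to a `hidden` set, so a parent becomes a leaf once all of its subclasses are hidden.

```python
def hide_redundant(root, children, imported, op_range, area):
    """Post-order pass over the root partial area; returns the set of hidden classes.

    root:     root class r of the root partial area R
    children: dict mapping a class to its direct subclasses in domain ontology O (a tree)
    imported: set of classes imported from the top-level ontology T
    op_range: set of classes used as the range of an object property of O
    area:     set of classes belonging to the root partial area R
    """
    hidden = set()

    def visible_children(v):
        # Subclasses of v in O that have not been hidden so far.
        return [w for w in children.get(v, []) if w not in hidden]

    def visit(v):
        # Post-order: process the subclasses inside R first.
        for w in [w for w in visible_children(v) if w in area]:
            visit(w)
        # v is now a leaf if every subclass it had in O has been hidden.
        if not visible_children(v) and v in imported and v not in op_range:
            hidden.add(v)

    visit(root)
    return hidden


# Hypothetical toy input; only "drug_screening" stands in for a domain class.
children = {
    "entity": ["continuant", "occurrent"],
    "continuant": ["spatial_region"],
    "occurrent": ["processual_entity", "temporal_region"],
    "processual_entity": ["process", "process_aggregate"],
    "process": ["drug_screening"],
    "temporal_region": ["connected_temporal_region", "scattered_temporal_region"],
    "connected_temporal_region": ["temporal_instant", "temporal_interval"],
}
all_classes = set(children) | {w for ws in children.values() for w in ws}
imported = all_classes - {"drug_screening"}
op_range = {"spatial_region"}  # assumed to be the range of some object property

hidden = hide_redundant("entity", children, imported, op_range, area=all_classes)
```

In this toy input the whole `temporal_region` subtree is hidden bottom-up, while `process` stays visible because it still has a domain subclass. Each class is visited once; the recursion depth equals the depth of the hierarchy, which is small for BFO-sized trees.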
