Modelling Biological Knowledge with OWL

Robert Stevens and Georgina Moulton
Bio-Health Informatics Group
School of Computer Science
University of Manchester
UK
robert.stevens@manchester.ac.uk
georgina.moulton@manchester.ac.uk

Introduction
• Much has been written about what KR languages can offer domain experts in terms of modelling facilities
• Much less has been written about what domain experts need to capture in such languages
• OWL is the latest standard in ontology languages - how does it stack up when representing biological knowledge?

Talk Aims
• To provide an insight into how OWL’s model matches some of the requirements of the domain of biology
• To illustrate the design patterns that can be used to overcome some of the limitations of OWL
• To give a flavour of some of the ‘hard’ problems - the challenges posed by biology

Talk Outline
• Introduction to OWL
• Representing biological knowledge in OWL
• A case study - the phosphatase example
• Ontological design patterns for the biologist
• Limitations posed by OWL
• Summary
A Shared Understanding
• A common understanding of that which exists in biology
• Currently mostly human orientated
• A move towards a shared understanding for computers
• Needs strict semantics, appropriate expressivity and ontological distinction

[Figure: the range of OBO bio-ontologies, arranged around the subject areas Anatomy, Development, Phenotype, Genotype, Sequence, Proteins, Pathways, Gene products, Transcript, Cell type and Plasmodium life cycle. Ontologies named in the figure: Mosquito gross anatomy; Mouse adult gross anatomy; Mouse gross anatomy and development; C. elegans gross anatomy; Arabidopsis gross anatomy; Cereal plant gross anatomy; Drosophila gross anatomy; Dictyostelium discoideum anatomy; Fungal gross anatomy (FAO); Plant structure; Maize gross anatomy; Medaka fish anatomy and development; Zebrafish anatomy and development; BRENDA tissue / enzyme source; Human developmental anatomy, abstract version; Human developmental anatomy, timed version; Protein covalent bond; Protein domain; Protein-protein interaction; UniProt taxonomy; Pathway ontology; Event (INOH pathway ontology); Systems Biology; Sequence types and features; Genetic Context; NCI Thesaurus; Mouse pathology; Human disease; Cereal plant trait; PATO attribute and value; Mammalian phenotype; Habronattus courtship; Loggerhead nesting; Animal natural history and life history; Arabidopsis development; Cereal plant development; Plant growth and developmental stage; C. elegans development; Drosophila development; Molecule role; Molecular Function; Biological process; Cellular component; eVOC (Expressed Sequence Annotation for Humans).]

So What Counts as an Ontology?
• After Chris Welty et al
[Figure: a spectrum of artefacts, from informal to formal: Catalog/ID; Terms/glossary; Thesauri; Informal Is-a; Formal Is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, Inverse, partof; General Logical constraints.]

Knowledge Representation Languages
[Figure: example resources (Arom, Gene Ontology, TAMBIS, EcoCyc, Mouse Anatomy, PharmGKB) placed along three axes: Language Semantics (Lax to Strict), Language Expressivity (Low to High) and Ontological Distinction (Blurred to Sharp).]
OWL
• Ontologies will form the backbone of the semantic web
• OWL is the latest standard in ontology languages from the W3C
• Layered on top of RDF and RDF Schema
• Underpinned by Description Logics

OWL in One Slide
[Figure: diagram; only the labels C, A, B and P survive.]

Amino Acid Onto
[Figure]

Description Logics
• A decidable fragment of First Order Logic
• Well defined & strict semantics
• Possible to use machine reasoning:
  - Make implicit knowledge explicit
  - Aid the construction of an ontology
• Reasoning services provided by DL reasoners include:
  - Subsumption
  - Equivalence
  - Consistency
  - Instantiation
What it Means

  Class: AminoAcidSideChain
  SubClassOf: ChemicalGroup THAT
    hasCharge SOME Charge and
    hasPolarity SOME polarity and
    hasSize SOME GroupSize and
    hasHydrophobicity SOME Hydrophobicity

• Each and every instance of AminoAcidSideChain is an instance of ChemicalGroup
• Each and every instance is constrained to follow these properties and restrictions
• Functional property: each instance of the class can have at most one value for the property

Valine Side Chain

  Class: ValineSideChain
  SubClassOf: AminoAcidSideChain THAT
    hasCharge SOME NeutralCharge and
    hasPolarity SOME NonPolar and
    hasHydrophobicity SOME Hydrophobicity and
    hasSize SOME TinySize

• Each and every instance of ValineSideChain follows the same constraints as AminoAcidSideChain, BUT with finer constraints

Bio-Ontologies
• Biology poses huge challenges to logicians, computer scientists and other people whose job it is to make the technology work...
• Scaling issues
• Representation of complex relationships
• Many exceptions
• Exceptions to the exceptions!

Defining a Large, Positively Charged Side Chain
A LargePositivelyChargedSideChain is any AminoAcidSideChain that amongst other things is Large and PositivelyCharged

  Class: LargePositiveChargedAminoAcidSideChain
  EquivalentTo: AminoAcidSideChain THAT
    hasCharge SOME positiveCharge and
    hasSize SOME LargeSize

• The EquivalentTo conditions are those that are sufficient to recognise an instance to be a member of this class
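The idea of sufficient conditions can be sketched in a few lines of Python. This is a toy illustration over plain data, not an OWL reasoner: the `SideChain` record, the `is_large_positively_charged` helper and the lysine entry are assumptions made for the example, and the lookup is closed-world, whereas a DL reasoner works under the open world assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideChain:
    """Hypothetical record standing in for an individual in the ontology."""
    name: str
    charge: str  # e.g. "PositiveCharge", "NeutralCharge"
    size: str    # e.g. "LargeSize", "TinySize"


def is_large_positively_charged(sc: SideChain) -> bool:
    """Mirror of the EquivalentTo conditions: meeting both is enough
    to be recognised as a member of the defined class."""
    return sc.charge == "PositiveCharge" and sc.size == "LargeSize"


# Example data. The valine values follow the slide; the lysine values
# are an assumption added for illustration.
side_chains = [
    SideChain("LysineSideChain", "PositiveCharge", "LargeSize"),
    SideChain("ValineSideChain", "NeutralCharge", "TinySize"),
]

recognised = [sc.name for sc in side_chains if is_large_positively_charged(sc)]
print(recognised)
```

Nothing in the data says that a side chain belongs to the defined class; membership is recognised from the properties alone, which is what an EquivalentTo definition lets a reasoner do.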
A Case Study
• A peek at how OWL can successfully be used to model biological knowledge
• Motivation: Use OWL to automate the classification of proteins from new genomic sequences

Protein Classification
• Bioinformaticians use tools to identify functional domains (e.g., InterProScan)
• Tools simply show the presence of domains - they do not classify proteins
• Experts classify proteins according to domain arrangements - the presence and number of each domain is important

Phosphatase Functional Domains
[Figure]

Phosphat Ontolog
[Figure]
Definition of Tyrosine Phosphatase

  Class: ProteinPhosphatase
  EquivalentTo: Protein that
    hasDomain min 1 PhosphataseCatalyticDomain AND
    hasDomain 1 TransmembraneDomain

Any protein that has at least 1 PhosphataseCatalyticDomain and exactly 1 transmembrane domain is a receptor tyrosine phosphatase.
We haven’t described functionality, other domains, size, structure, etc., but just because they are not described doesn’t mean they are not possible.

The Open World
• OWL has an open world assumption
• Just because I’ve not said it, doesn’t mean it is not true
• All I’ve said is that a receptor tyrosine phosphatase has these domains - it may have others
• In direct contrast to a relational DB, where if it isn’t stated then it isn’t true
• In OWL we mostly “don’t know”

… there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.

Definition for R2A Pase
• Class R2A
• EquivalentTo: Protein that
  - hasDomain 2 ProteinTyrosinePhosphataseDomain AND
  - hasDomain 1 TransmembraneDomain AND
  - hasDomain 4 FibronectinDomain AND
  - hasDomain 1 ImmunoglobulinDomain AND
  - hasDomain 1 MAMDomain AND
  - hasDomain 1 Cadherin-LikeDomain AND
  - hasDomain only (ProteinTyrosinePhosphataseDomain OR TransmembraneDomain OR FibronectinDomain OR ImmunoglobulinDomain OR Cadherin-LikeDomain OR MAMDomain)

We have described all domains, and the final "only" restriction states that the protein is allowed to contain these domains and no others. An instance with any other kind of domain would be inconsistent.
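The difference between the open world and a closed description can be sketched as a three-valued lookup. This is a minimal Python illustration, not how a DL reasoner is implemented; the `has_domain` helper and its `closed` flag are hypothetical names introduced here, with `closed=True` standing in for the effect of an "only" closure restriction.

```python
from typing import Optional, Set


def has_domain(asserted: Set[str], domain: str, closed: bool) -> Optional[bool]:
    """Answer 'does this protein have the domain?' from asserted facts.

    Returns True if asserted, False if the description is closed and the
    domain is absent, and None ("don't know") if the description is open.
    """
    if domain in asserted:
        return True
    return False if closed else None


# Domains stated for a receptor tyrosine phosphatase on the slide.
asserted = {"PhosphataseCatalyticDomain", "TransmembraneDomain"}

open_answer = has_domain(asserted, "ImmunoglobulinDomain", closed=False)
closed_answer = has_domain(asserted, "ImmunoglobulinDomain", closed=True)

print(open_answer)    # open world: not stated, so unknown
print(closed_answer)  # closed (database-style or with closure): not present
```

The closed answer corresponds to the relational database reading and to an OWL description that carries a closure restriction; the open answer is the default OWL reading.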
Description of an Instance of a Protein
• Instance: P21592
  TypeOf: Protein That
  Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and
  Fact: hasDomain 1 TransmembraneDomain and
  Fact: hasDomain 4 FibronectinDomain and
  Fact: hasDomain 1 ImmunoglobulinDomain and
  Fact: hasDomain 1 MAMDomain and
  Fact: hasDomain 1 Cadherin-LikeDomain

Qualified Cardinality Constraints
• Restrictions are often just existential
• At least one successor of the given type
• Can specify how many instances are involved by qualifying the cardinality
• hasDomain 2 FibronectinDomain
• Min-2, max-4, etc.
• OWL 1.0 didn’t have QCR, though the reasoners could use it

Classification of Protein Tyrosine Phosphatases
[Figure: classification hierarchy with the following definitions.]

  Tyrosine Phosphatase
    (containsDomain some TransmembraneDomain) and
    (containsDomain at least 1 ProteinTyrosinePhosphataseDomain)

  R2A Phosphatase
    (containsDomain some MAMDomain) and
    (containsDomain some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and
    (containsDomain some FibronectinDomain or FibronectinTypeIIIFoldDomain) and
    (containsDomain exactly 2 ProteinTyrosinePhosphataseDomain)

  R2A
    Instance: P21592, with the same facts as listed under "Description of an Instance of a Protein"
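Classification by qualified cardinality can be sketched as checking domain counts against per-class bounds. The Python below is a hedged illustration, not the reasoning procedure itself: the `Bounds` type, the `satisfies` helper and the class tables are names introduced for the example, and the domain counts are treated as complete and distinct, which an OWL reasoner would not assume without closure and distinctness assertions.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# (min, max) number of domains of each type; max=None means no upper bound.
Bounds = Dict[str, Tuple[int, Optional[int]]]


def satisfies(domains: Counter, bounds: Bounds) -> bool:
    """True if every domain count falls inside the class's bounds."""
    for domain, (lo, hi) in bounds.items():
        n = domains[domain]  # Counter gives 0 for absent domains
        if n < lo or (hi is not None and n > hi):
            return False
    return True


# Simplified class definitions based on the slides (cardinality parts only).
CLASSES: Dict[str, Bounds] = {
    "TyrosinePhosphatase": {
        "TransmembraneDomain": (1, None),
        "ProteinTyrosinePhosphataseDomain": (1, None),
    },
    "R2A": {
        "ProteinTyrosinePhosphataseDomain": (2, 2),
        "TransmembraneDomain": (1, 1),
        "FibronectinDomain": (4, 4),
        "ImmunoglobulinDomain": (1, 1),
        "MAMDomain": (1, 1),
        "Cadherin-LikeDomain": (1, 1),
    },
}

# Domain counts for instance P21592, as stated on the slide.
p21592 = Counter({
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
})

matches = [name for name, bounds in CLASSES.items() if satisfies(p21592, bounds)]
print(matches)
```

The instance satisfies both the general and the specific definition, which mirrors the hierarchy in the figure: every R2A is also a tyrosine phosphatase because its bounds are tighter on the same domains.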