
Management of Quantified Semantic Taxonomies for Biothreat Response



  1. Management of Quantified Semantic Taxonomies for Biothreat Response
     Cliff Joslyn
     Computer and Computational Sciences, Los Alamos National Laboratory
     Modeling, Algorithms, and Informatics (CCS-3)
     DIMACS Tutorial and Working Group on Order-Theoretic Aspects of Epidemiology, March, 2005
     Los Alamos Unlimited Release 04-8407, 05-0340, 05-0640, 05-0907, 05-1621

  2. OUTLINE
     • Knowledge integration for biothreat response
     • Bio-ontologies
     • Order theoretical representations and approaches: POSet Ontologies (POSOs)
     • Categorization and annotation problems
     • Quantified POSOs
     • Interoperability problem: towards a mathematical definition
     Cliff Joslyn, joslyn@lanl.gov dimacs05f, p. 1, 3/8/2005

  3. KNOWLEDGE INTEGRATION FOR BIOTHREAT RESPONSE
     • Rapid response to a novel biothreat
     • Past experiences: flu, resistant TB, SARS, ebola, anthrax
     • Natural or engineered
     • Mucho funding: NIH, NSF, DHS, DOD, DARPA, DOE
     • New Los Alamos effort in computational and theoretical pathomics
     • Integration of knowledge bases within a biothreat response workflow
     [Figure: biothreat response workflow; labels include Presentation, Alert, Genomic/Proteomic, Diagnostic, Agent Identification, Lethality, Virulence, Agent Characterization, Immunological, Pathogenesis, Pathways, Transmissibility, Disease Characterization, Containment, Therapeutic, Response, Attribution]
     KM Verspoor, CA Joslyn, JA Ambrosiano, A Bäcker, O Bodenreider, L Hirschman, P Karp, H Kelly, S Loranger, M Musen, R Sriram, C Wroe: (2005) “Knowledge Integration for Biothreat Response”, Los Alamos Technical Report 05-0907

  4. BIO-ONTOLOGIES
     • Domain-specific concepts and their semantic relations
     • At least: taxonomic, semantic hierarchies of typed objects and relations
     • In addition: inference engines over these data objects
     • Genomic revolution: large collections of hierarchically organized categorizations of biological objects such as genes and proteins
     • IT revolution generally: anatomy, clinical, epidemiological
     • Computational biology primary success story for ontology development
     • Rapid proliferation: many more, more coming, other fields
     Gene Ontology: http://www.geneontology.org
     Fundamental Model of Anatomy: http://sig.biostr.washington.edu/projects/fm/AboutFM.html
     Unified Medical Language System: http://www.nlm.nih.gov/research/umls
     Open Biology Ontologies: http://obo.sourceforge.net
     MEdical Subject Headings: http://www.nlm.nih.gov/mesh/meshhome.html
     Enzyme Structures Database: http://www.biochem.ucl.ac.uk/bsm/enzymes

  5. GENE ONTOLOGY (GO): DNA METABOLISM PORTION
     • Taxonomic controlled vocabulary
     • $\sim 16$K nodes $P_{GO}$ populated by genes, proteins
     • Two orders on $P_{GO}$: $\le_{isa}$, $\le_{has}$
     • Major community effort: assuming primary position in general bioinformatics
     • Tremendous computational resource: large, semantically rich, validated, middle ontology, first (?) in major use
     Gene Ontology Consortium (2000): “Gene Ontology: Tool For the Unification of Biology”, Nature Genetics, 25:25-29

  6. GO CA. 2001
     [Figure: image of the GO, courtesy of Robert Kueffner, NCGR, 2001]

  7. CATEGORIZATION TASK: “CLUSTER” GENES IN ONTOLOGY SPACE
     • Develop functional hypotheses about genes identified through expression experiments
     • Given the Gene Ontology (GO) . . .
     • And a list of hundreds of genes of interest . . .
     • “Splatter” them over the GO . . .
     • Where do they end up?
       – Concentrated?
       – Dispersed?
       – Clustered?
       – High or low?
       – Overlapping or distinct?
     Joslyn, Cliff; Mniszewski, Susan; Fulmer, Andy; and Heaton, Gary: (2004) “The Gene Ontology Categorizer”, Bioinformatics, v. 20:s1, pp. 169-177
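
To make the "splatter" idea concrete, here is a minimal sketch that counts, for each ontology node, how many genes of interest are annotated at or below it. The mini-ontology, the gene names, and the functions `ancestors_or_self` and `coverage` are all hypothetical illustrations. This is only a plain roll-up count, not the scoring used by the Gene Ontology Categorizer.

```python
from collections import Counter

# Hypothetical mini-ontology: child -> set of parents (not real GO data).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "dna_metabolism": {"metabolism"},
    "dna_repair": {"dna_metabolism"},
    "dna_replication": {"dna_metabolism"},
    "transport": {"root"},
}

# Hypothetical gene -> directly annotated nodes.
ANNOTATIONS = {
    "geneA": {"dna_repair"},
    "geneB": {"dna_replication"},
    "geneC": {"dna_repair", "transport"},
}

def ancestors_or_self(node):
    """Return the node together with everything above it in the hierarchy."""
    seen, stack = {node}, [node]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def coverage(genes):
    """Count, per node, the genes annotated at or below that node."""
    counts = Counter()
    for gene in genes:
        covered = set()
        for node in ANNOTATIONS[gene]:
            covered |= ancestors_or_self(node)
        counts.update(covered)  # each gene counts once per node
    return counts

counts = coverage(["geneA", "geneB", "geneC"])
```

In this toy case all three genes land under `dna_metabolism`, which is the kind of concentration the questions on the slide ask about; nodes near the root are covered trivially, which is why depth matters as well as count.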

  8. ANNOTATION TASK
     • Mappings among regions of sequence, structure, keyword spaces
     • Mappings into regions of biological function space: taxonomic bio-ontologies of molecular function
     • Characterize formal structure of bio-ontologies:
       – Order theoretical approaches
       – Combinatoric algorithms
     [Figure: mapping diagram; labels Sequences, Structures, Functions, Keywords/Literature]
     KM Verspoor, JD Cohn, SM Mniszewski, and CA Joslyn: (2004) “Nearest Neighbor Categorization for Function Prediction”, in: Proc. 5th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP 05), in press

  9. INTEROPERABILITY TASKS: MERGING AND MATCHING
     Matching: Measure similarity between two regions of a single ontology
     Comparing: Twist one ontology on a given term set into another ordering
     Merging: Given two completely distinct ontologies:
     • Identify structurally similar regions: intersection
     • Create encompassing meta-ontologies: product or union?
     [Figure: two example ontologies, labeled GO and EC, with annotated nodes]

  10. ORDER THEORETICAL KNOWLEDGE DISCOVERY
      • Cast databases as (collections of) ordered data objects:
        Native: Constructed explicitly (e.g. ontologies)
        Induced: From other relational data (e.g. concept lattices)
      • With inherent semantics: node, link types; metadata; text
      • Equipped with measures:
        Combinatorial: Distance, rank
        Statistical: Various scores, entropy measures . . .
      • Tasks: Induction, navigation, visualization, link analysis, search, classification, retrieval, anomaly detection, merger, linkage
      • Motivated now by appearance of databases and methods
      • Substantial progress and value from novel applications of elementary concepts
      • Need help: algorithms, mathematics, applications, funding, concepts, organization?
      Joslyn, Cliff; Oliveira, Joseph; and Scherrer, Chad: (2004) “Order Theoretical Knowledge Discovery: A White Paper”, Los Alamos Technical Report 04-5812, ftp://ftp.c3.lanl.gov/pub/users/joslyn/white.pdf

  11. SEMANTIC HIERARCHIES AS PARTIALLY ORDERED SETS
      • Partial Order: Set $P$; relation $\le \subseteq P^2$: reflexive, anti-symmetric, transitive
      • Poset: $\mathcal{P} = \langle P, \le \rangle$
      • Simplest mathematical structures which admit to descriptions in terms of “levels” and “hierarchies”
      • More specific than graphs or networks: no cycles, equivalent to Directed Acyclic Graphs (DAGs)
      • More general than trees, lattices: single nodes, pairs of nodes can have multiple parents
      • Ubiquitous in knowledge systems: constructed, induced, empirical
      [Figure: hierarchy of structures; labels Directed Graph, Partial Order = Poset = DAG, Lattice, Tree, Antichain, Chain]
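
The three defining properties can be checked directly on a small relation. The sketch below is illustrative only: the function name `is_partial_order` and the four-node "diamond" example are assumptions, not part of the slides.

```python
def is_partial_order(elements, leq):
    """Check that `leq`, a set of pairs (a, b) meaning a <= b, is a partial order."""
    reflexive = all((a, a) in leq for a in elements)
    antisymmetric = all(a == b for (a, b) in leq if (b, a) in leq)
    transitive = all(
        (a, d) in leq
        for (a, b) in leq
        for (c, d) in leq
        if b == c
    )
    return reflexive and antisymmetric and transitive

# Hypothetical diamond: bottom <= left, right <= top.
# "top" has two children and "bottom" has two parents, so this is not a tree.
elements = {"bottom", "left", "right", "top"}
leq = {(x, x) for x in elements} | {
    ("bottom", "left"), ("bottom", "right"),
    ("left", "top"), ("right", "top"),
    ("bottom", "top"),
}

diamond_ok = is_partial_order(elements, leq)
# Adding top <= bottom creates a cycle, which breaks anti-symmetry.
cyclic_ok = is_partial_order(elements, leq | {("top", "bottom")})
```

The cyclic variant fails, which matches the slide's point that posets exclude cycles while still allowing multiple parents.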

  12. BASIC POSET CONCEPTS
      Comparable Nodes: $a \sim b := a \le b$ or $b \le a$
      Chain: Collection of comparable nodes: $a_1 \le a_2 \le \ldots \le a_n$
      Chains: $a \le b \to \mathcal{C}(a,b) := \{ C_1(a,b), \ldots, C_j(a,b), \ldots, C_M(a,b) \} \subseteq 2^{2^P}$, and use $C_j$, $1 \le j \le M$.
      Height: Size of maximal chain: $H(\mathcal{P})$
      Noncomparable Nodes: $a \not\sim b$
      Antichain: Collection of noncomparable nodes: $a_1 \not\sim a_2 \not\sim \ldots \not\sim a_n$
      Width: Size of maximal antichain $W(\mathcal{P})$
      Interval: $[a,b] := \{ c \in P : a \le c \le b \}$ is a bounded sub-poset of $\mathcal{P}$
      [Figure: example poset with top node 1, nodes A through K, and lowercase annotations on some nodes]
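
As a rough illustration of these definitions, the sketch below computes comparability, an interval, the height, and the width of a small poset. The six-node poset and all function names are hypothetical, height is counted in nodes (following "size of maximal chain"), and the width is found by brute force, which is only practical for tiny examples.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical cover relation: child -> set of parents (edges point upward).
PARENTS = {
    "0": {"A", "D"},
    "A": {"B"},
    "D": {"B", "C"},
    "B": {"1"},
    "C": {"1"},
    "1": set(),
}

def up_set(a):
    """All nodes b with a <= b, including a itself."""
    seen, stack = {a}, [a]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def leq(a, b):
    return b in up_set(a)

def comparable(a, b):
    return leq(a, b) or leq(b, a)

def interval(a, b):
    """[a, b] = {c : a <= c <= b}; empty if a is not below b."""
    return {c for c in PARENTS if leq(a, c) and leq(c, b)}

@lru_cache(maxsize=None)
def longest_chain_from(a):
    """Number of nodes in the longest upward chain starting at a."""
    return 1 + max((longest_chain_from(p) for p in PARENTS[a]), default=0)

def height():
    return max(longest_chain_from(a) for a in PARENTS)

def width():
    """Size of the largest antichain, by exhaustive search over subsets."""
    nodes = list(PARENTS)
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(not comparable(x, y) for x, y in combinations(subset, 2)):
                return k
    return 0
```

For example, `interval("D", "1")` collects `D`, `B`, `C`, and `1`, while `A` and `D` are noncomparable because neither lies below the other.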

  13. SOME GO POSET STATISTICS

            Nodes    Leaves   Interior  Edges    H    W
      MF    7.0K     5.6K     1.3K      8.1K     13   ≥ 3.5K
      BP    7.7K     4.1K     3.6K      11.8K    15   ≥ 2.9K
      CC    1.3K     0.9K     0.4K      1.7K     13   ≥ 0.4K
      GO    16.0K    10.6K    5.4K      21.5K    16   ≥ 5.9K

      • GO for September, 2003
      • Model as $\mathcal{P}_{GO} = \langle P_{GO}, \le_{isa} \cup \le_{has} \rangle$

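  14. DAGS, POSETS, AND COVERS
      Graphical DAG: $\Gamma := \{ \gamma_1, \gamma_2, \ldots, \gamma_i, \ldots, \gamma_n \}$
      Directed Edge: $\gamma_i = \langle a, b \rangle \in P^2$, $a, b \in P$. Also use $\gamma(a,b)$.
      Relational DAG: $\mathcal{D}(\Gamma) := \langle P, \Leftarrow \rangle$, where $\Leftarrow \subseteq P^2$, $\forall a, b \in P$, $a \Leftarrow b \leftrightarrow \langle a, b \rangle \in \Gamma$.
      Cover: $\mathcal{V}(\mathcal{D}) := \langle P, \lessdot \rangle$, transitive reduction of $\Leftarrow$
      Poset: $\mathcal{P}(\mathcal{D}) := \langle P, \le \rangle$, transitive and reflexive closure of $\Leftarrow$.
      Ideal, Filter: $\downarrow(a) := \{ b \in P : b \le a \}$, $\uparrow(a) := \{ b \in P : a \le b \}$
      Children, Parents: $\dot{\downarrow}(a) := \{ b \in P : b \lessdot a \}$, $\dot{\uparrow}(a) := \{ b \in P : a \lessdot b \}$
      [Figure: example poset with top node 1, bottom node 0, and nodes A through K]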
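
The relationship between a DAG's edge set, its poset (closure), and its cover (reduction) can be sketched as follows. An edge `(a, b)` is read as $a \Leftarrow b$, so `a` sits below `b`. The function names and the four-node example with one redundant edge are assumptions for illustration, and the naive closure loop is intended only for small inputs.

```python
def reflexive_transitive_closure(nodes, edges):
    """Poset order <= as a set of pairs, built from DAG edges (a, b) with a below b."""
    leq = {(a, a) for a in nodes} | set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(leq):
            for (c, d) in list(leq):
                if b == c and (a, d) not in leq:
                    leq.add((a, d))
                    changed = True
    return leq

def cover(nodes, leq):
    """Covering relation: a <. b iff a < b with no c strictly between them."""
    lt = {(a, b) for (a, b) in leq if a != b}
    return {
        (a, b)
        for (a, b) in lt
        if not any((a, c) in lt and (c, b) in lt for c in nodes)
    }

def ideal(a, leq):
    return {b for (b, x) in leq if x == a}

def filter_of(a, leq):
    return {b for (x, b) in leq if x == a}

def children(a, cov):
    return {b for (b, x) in cov if x == a}

def parents(a, cov):
    return {b for (x, b) in cov if x == a}

# Hypothetical DAG with one redundant edge ("0", "B").
nodes = {"0", "A", "B", "1"}
edges = {("0", "A"), ("A", "B"), ("B", "1"), ("0", "B")}

leq = reflexive_transitive_closure(nodes, edges)
cov = cover(nodes, leq)
```

The redundant edge `("0", "B")` survives in the closure but is dropped from the cover, since `A` lies strictly between `0` and `B`.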

  15. CHAIN DECOMPOSITION OF INTERVALS
      Assume $a \le b \in P$
      Chain Decomposition: $[a,b] = \bigcup_{j=1}^{M} C_j$
      Dilworth: $M \ge W([a,b])$
      Chain Length: $h_j := |C_j| - 1$, $\bar{h}_j := h_j / (H - 1)$
      Vectors of Chain Lengths: $\vec{h}(a,b) := \langle h_1, h_2, \ldots, h_j, \ldots, h_M \rangle$, $\vec{\bar{h}}(a,b) := \vec{h} / (H - 1)$
      Extremes:
        $h_*(a,b) = \min_{h_j \in \vec{h}(a,b)} h_j$, $\bar{h}_*(a,b) = \min_{\bar{h}_j \in \vec{\bar{h}}(a,b)} \bar{h}_j$,
        $h^*(a,b) = \max_{h_j \in \vec{h}(a,b)} h_j$, $\bar{h}^*(a,b) = \max_{\bar{h}_j \in \vec{\bar{h}}(a,b)} \bar{h}_j$.
      Chains: $C_j = \{ \gamma(a, c_1), \ldots, \gamma(c_{h_j - 3}, c_{h_j - 2}), \gamma(c_{h_j - 2}, b) \}$ for some collection of nodes $\{ c_1, c_2, \ldots, c_i, \ldots, c_{h_j - 2} \} \subseteq P$, $1 \le i \le h_j - 2$.
      $C_j = a \lessdot c_1 \lessdot \ldots \lessdot c_{h_j - 3} \lessdot c_{h_j - 2} \lessdot b$, $\gamma_i \in C_j$, $1 \le i \le h_j$
      [Figure: example poset with top node 1, bottom node 0, and nodes A through K]
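
A small sketch of the chain-length vectors, under stated assumptions: chains are represented as node lists, every saturated chain from $a$ to $b$ is enumerated (one possible choice of decomposition, not necessarily the one intended on the slide), $h_j$ is taken as the number of nodes minus one, and the height $H$ of the hypothetical five-node poset is supplied by hand as 4 nodes. All names are illustrative.

```python
# Hypothetical cover relation: child -> set of parents.
PARENTS = {
    "0": {"A", "E"},
    "A": {"B"},
    "B": {"1"},
    "E": {"1"},
    "1": set(),
}

def saturated_chains(a, b):
    """All chains a <. c_1 <. ... <. b, as lists of nodes, following cover edges upward."""
    if a == b:
        return [[a]]
    chains = []
    for parent in PARENTS[a]:
        for tail in saturated_chains(parent, b):
            chains.append([a] + tail)
    return chains

def chain_lengths(a, b, poset_height):
    """Return sorted raw lengths h_j and their normalized values h_j / (H - 1)."""
    chains = saturated_chains(a, b)
    h = sorted(len(chain) - 1 for chain in chains)
    h_bar = [length / (poset_height - 1) for length in h]
    return h, h_bar

H = 4  # assumed height of this toy poset, counted in nodes: 0 <. A <. B <. 1
h, h_bar = chain_lengths("0", "1", H)
h_min, h_max = min(h), max(h)

# The chains jointly cover the whole interval [0, 1].
covered = set().union(*saturated_chains("0", "1"))
```

Here the two chains have lengths 2 and 3, so the normalized extremes are 2/3 and 1; a spread between the minimum and maximum signals that the interval is "uneven" in depth.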
