
Large Scale Knowledge Representation of Distributed Biomedical Information



  1. Large Scale Knowledge Representation of Distributed Biomedical Information
     - Volker Stümpflen, Thorsten Barnickel, Karamfilka Nenova
     - MIPS / Institute for Bioinformatics, GSF – National Research Center for Environment and Health
     - TMRA 07

  2. Understanding Complex Biological Systems (figure labels: Data, Knowledge)

  3. Systems Biology

  4. Questions
     - Different knowledge domains?
     - Ontologies for semantic structuring?
     - Semantic structures from free text?
     - Knowledge representation from distributed resources?

  5. Merging Knowledge from Different Domains

  6. Semantic Structuring: Demands for Ontologies
     - Life sciences have a long tradition in classification …
     - … various ontologies are available and in use
     - Ontologies (in the broadest sense):
       - Controlled vocabularies
       - Taxonomies
       - Frames
       - …
     - Examples for Ontologies:
       - MeSH terms, Gene Ontology (GO), FunCat, …
       - Many more from e.g. Open Biomedical Ontologies (http://obofoundry.org/)

  7. Example: Extending the Functional Context of Proteins

  8. Semantic Structuring and Knowledge Representation
     - Diagram components: Knowledge Portal, Topic Map Generation, Textmining, Distributed access system, Web Service
     - Several hundreds of biomedical resources
     - Distributed
     - > 1-2 PetaByte

  9. Knowledge in Free Text
     - Free text: "… of pathogen response genes that prevent disease progression. The expression of ERF1 can be activated rapidly by ethylene or jasmonate and can be activated synergistically by both hormones. In addition, both signalling …"
     - Figure label: Topic Map

  10. REBIMET
      - Relation Extraction from Biomedical Texts

  11. Entity Recognition
      - Identification of relevant biological entities:
        - Based on synonym lists created from terms in taxonomies, gene names, …
      - Realized with Apache Lucene
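The synonym-list lookup described on this slide can be illustrated with a minimal sketch. The slide states the original system was realized with Apache Lucene; the plain-Python lookup below is a stand-in for illustration only, and the synonym entries and the function names `build_index` and `recognize` are assumptions, not part of the original system.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical synonym lists: canonical entity name -> known synonyms.
SYNONYMS: Dict[str, List[str]] = {
    "ERF1": ["ERF1", "ethylene response factor 1"],
    "ethylene": ["ethylene", "ethene"],
    "jasmonate": ["jasmonate", "jasmonic acid"],
}


def build_index(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each lower-cased synonym to its canonical entity name."""
    return {syn.lower(): canon for canon, syns in synonyms.items() for syn in syns}


def recognize(text: str, index: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (start offset, matched text, canonical name) for every synonym hit."""
    hits = []
    for syn, canon in index.items():
        # Whole-word, case-insensitive match of the synonym in the text.
        pattern = r"\b" + re.escape(syn) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), canon))
    # Report hits in the order they occur in the text.
    return sorted(hits)
```

Calling `recognize(sentence, build_index(SYNONYMS))` on a sentence yields the recognized entities in text order; a full-text index such as Lucene would replace the linear scan at scale.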

  12. Information Extraction with Semantic Role Labeling and Cooccurrence
      - ASSERT tool (Pradhan S. et al., 2005)
      - 1. Semantic Role Labeling:
        - 1.1 SPA structure for verb a)
        - 1.2 SPA structure for verb b)
      - 2. Information Extraction:
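As a rough illustration of the co-occurrence part of this slide only, the sketch below counts entity pairs that appear in the same sentence. It does not reproduce the semantic role labeling step or the ASSERT tool; the input format (one list of recognized entity names per sentence) and the function name `cooccurrences` are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrences(sentences: Iterable[List[str]]) -> Counter:
    """Count unordered entity pairs that appear together in one sentence.

    Each element of `sentences` is the list of entity names recognized in
    that sentence (an assumed input format).
    """
    counts: Counter = Counter()
    for entities in sentences:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts
```

Pairs with high counts are candidates for a relation; the role labels from the semantic role labeling step would be needed to say which relation holds.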

  13. Simplified TM Representation
      - Generation of Topic Map fragments
      - Connection to evidence in text by reification
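One way to picture a Topic Map fragment whose association is tied to its textual evidence by reification is the sketch below. It is a simplified data model under stated assumptions: the class names, the role names `agent` and `target`, and the source label are hypothetical and do not come from the original system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Topic:
    id: str
    name: str
    occurrences: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    type: str
    roles: Dict[str, str]          # role type -> topic id
    reifier: Optional[str] = None  # id of the topic that reifies this association


@dataclass
class Fragment:
    topics: Dict[str, Topic] = field(default_factory=dict)
    associations: List[Association] = field(default_factory=list)


def relation_fragment(subject: str, predicate: str, obj: str,
                      sentence: str, source: str) -> Fragment:
    """Build a fragment for one extracted relation plus its evidence."""
    frag = Fragment()
    for name in (subject, obj):
        frag.topics[name] = Topic(id=name, name=name)
    # The reifying topic stands for the association itself, so the evidence
    # sentence can be attached to it as an occurrence.
    reifier_id = f"{subject}-{predicate}-{obj}-evidence"
    frag.topics[reifier_id] = Topic(
        id=reifier_id,
        name="evidence",
        occurrences={"sentence": sentence, "source": source},
    )
    frag.associations.append(
        Association(type=predicate,
                    roles={"agent": subject, "target": obj},
                    reifier=reifier_id)
    )
    return frag
```

Because the evidence hangs off the reifying topic rather than off either player, the same two topics can take part in several relations, each with its own supporting sentence.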

  14. Screenshot Portal
      - PSI based merging of textmining model with genome model
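PSI based merging can be sketched as follows: topics from different models are treated as the same subject when they share a subject identifier. The dictionary layout (`psis`, `names`), the `example.org` identifiers, and the function name `merge_by_psi` are assumptions for illustration, not the portal's actual code.

```python
from typing import Dict, List, Set


def merge_by_psi(*models: List[Dict[str, Set[str]]]) -> List[Dict[str, Set[str]]]:
    """Merge topics from several models when they share a subject identifier.

    Each topic is a dict with a set of subject identifiers under "psis"
    and a set of names under "names" (an assumed, simplified layout).
    """
    merged: List[Dict[str, Set[str]]] = []
    for model in models:
        for topic in model:
            psis = set(topic["psis"])
            names = set(topic["names"])
            keep = []
            for existing in merged:
                if existing["psis"] & psis:
                    # Shared identifier: absorb the existing topic.
                    psis |= existing["psis"]
                    names |= existing["names"]
                else:
                    keep.append(existing)
            keep.append({"psis": psis, "names": names})
            merged = keep
    return merged
```

A topic that shares identifiers with two previously separate topics pulls both into one, so merging stays consistent regardless of the order in which the models arrive.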

  15. Large Scale Integration and Knowledge Representation
      - Diagram components: Topic Map Generation, Textmining, Distributed access system, Web Service

  16. GeKnow: Integration of PEDANT, SIMAP, NCBI data, NCBI PubMed
      - PEDANT 3 ~ 600 GB
        - contains 450 genomes, each stored in a single MySQL database
        - no possibility of simultaneous cross-genome comparison
      - SIMAP ~ 540 GB compressed
        - contains over 7 million unique protein sequences
      - NCBI
        - Taxonomy information (some thousands)
      - Textmining from PubMed
        - 16 million abstracts, 65 million hits, 15 million sentences, 13 million SPA structures
      - Integration of these data on the fly
        - Semantic linking of PEDANT databases with SIMAP and NCBI Taxonomy
        - No redundant data

  17. How To Generate the Topic Maps?
      - Generation of TM fragments
      - Problems with generation of one large TM
        - Very large data collections (storage problems)
        - Distributed
        - Update problems

  18. System Architecture (GeKnow)
      - Extension of our n-Tier J2EE based component and service oriented architecture (EJBs and Web Services)
      - Simply by adding some semantic components ..
      - .. and one semantic Tier

  19. Concept: Independent semantic layer on top of arbitrary data sources
      - Diagram components: Semantic level, Semantic manager (merging, fragments), TM, Resource manager, Configuration, Integration level, Web Service

  20. Integration Tier
      - Resource:
        - Aware of mapping between topic / association types and methods from data source
      - Handler:
        - Proxy
        - Manages connections
        - Executes query methods
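The division of labor between Resource and Handler might look roughly like the sketch below. The slides describe a J2EE based system; this Python version is only an illustration of the idea, and every name in it (`Handler`, `Resource`, `InMemoryGenomeSource`, `find_proteins`, the `protein` topic type, and the returned values) is hypothetical.

```python
from typing import Any, Dict, List


class InMemoryGenomeSource:
    """Stand-in data source with made-up content, used only for this sketch."""

    def find_proteins(self, genome: str) -> List[str]:
        return ["protein-a", "protein-b"] if genome == "example-genome" else []


class Handler:
    """Proxy in front of one data source: manages the connection, runs queries."""

    def __init__(self, data_source: Any) -> None:
        self._data_source = data_source
        self.connected = False

    def connect(self) -> None:
        # A real handler would open a database or web service connection here.
        self.connected = True

    def execute(self, method_name: str, **params: Any) -> Any:
        if not self.connected:
            self.connect()
        return getattr(self._data_source, method_name)(**params)


class Resource:
    """Knows which data source method serves which topic or association type."""

    def __init__(self, handler: Handler, type_to_method: Dict[str, str]) -> None:
        self._handler = handler
        self._type_to_method = type_to_method

    def query(self, tm_type: str, **params: Any) -> Any:
        return self._handler.execute(self._type_to_method[tm_type], **params)
```

With this split, supporting a new topic type on an existing data source means adding one entry to the mapping rather than writing new connection code.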

  21. Syntax Tier – Topic Types
      - Converts resource specific format into TM fragments
      - May access multiple resources (handled by Resource Manager)

  22. Syntax Tier – Association Types
      - Converts resource specific format into TM fragments
      - May access multiple resources (handled by Resource Manager)

  23. Semantic Tier
      - Responsible for fragment generation
      - Merging
      - No programming required (only configuration)
      - Figure label: Configuration

  24. Portal / Portlets (JSR-168)

  25. Portal
      - Currently JSF based
      - Caused several problems
      - Migration to more generic portlets (XSLT based)
