GO2PUB: PubMed Query Tool Based on Semantic Expansion of Gene Ontology Terms, a Lipid Metabolism Case Study

  1. GO2PUB PubMed Query Tool Based on Semantic Expansion of Gene Ontology Terms, a Lipid Metabolism Case Study Charles Bettembourg, Christian Diot, Anita Burgun and Olivier Dameron INRA UMR598 - INSERM U936 03/07/2012 — Bettembourg, Diot, Burgun, Dameron — (INRA/INSERM)GO2PUB 03/07/2012 1 / 39

  2. Introduction. Context: literature search
     PubMed: more than 20 million citations, with fast and continuous growth
     Numerous queries to build and relevant results to select
     Need for automatic search tools

  3. Introduction. Requirements
     Precise and complex queries
     Low silence and noise
        ◮ silence: relevant articles that the search does not retrieve
        ◮ noise: retrieved articles that are not relevant
     [Figure: the available articles (relevant and non-relevant) overlapped by the search results, showing silence and noise]
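Silence and noise are simply set differences between the relevant articles and the search results. A minimal Python sketch, assuming articles are identified by hypothetical PubMed IDs held in sets (the slides do not specify any data representation):

```python
def silence_and_noise(retrieved: set[str], relevant: set[str]) -> tuple[set[str], set[str]]:
    """Split a search outcome into silence and noise.

    retrieved: IDs returned by the search tool (hypothetical PubMed IDs)
    relevant:  IDs judged relevant for the information need
    """
    silence = relevant - retrieved  # relevant articles the tool missed
    noise = retrieved - relevant    # non-relevant articles the tool returned
    return silence, noise


# Hypothetical IDs, for illustration only.
retrieved = {"101", "102", "103", "104"}
relevant = {"102", "103", "105"}
silence, noise = silence_and_noise(retrieved, relevant)
print(sorted(silence))  # ['105']
print(sorted(noise))    # ['101', '104']
```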

  4. Introduction. Requirements: relevance measures, precision
     [Figure: PubMed search results; legend: relevant article, non-relevant article, silence, noise]
     $\text{Precision} = \frac{\text{relevant retrieved documents}}{\text{all retrieved documents}}$
     Precision is the ratio between the relevant retrieved documents and all the results returned by the search tool for a query.
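Precision can be computed directly from the set of retrieved documents and the set of relevant ones. A minimal sketch, assuming both are sets of hypothetical article IDs; returning 0.0 for an empty result set is a convention chosen here, not something the slides define:

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention: no results means precision 0
    return len(retrieved & relevant) / len(retrieved)


retrieved = {"101", "102", "103", "104"}  # hypothetical IDs
relevant = {"102", "103", "105"}
print(precision(retrieved, relevant))  # 0.5
```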

  5. Introduction. Requirements: relevance measures, recall
     [Figure: PubMed search results; legend: relevant article, non-relevant article, silence, noise]
     $\text{Recall} = \frac{\text{relevant retrieved documents}}{\text{all relevant documents}}$
     Recall is the ratio between the relevant retrieved documents and all the relevant documents available in the database.
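Recall uses the same intersection but divides by the number of relevant documents instead. A minimal sketch under the same assumptions (hypothetical ID sets; 0.0 when there are no relevant documents is an assumed convention):

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention: nothing to find means recall 0
    return len(retrieved & relevant) / len(relevant)


retrieved = {"101", "102", "103", "104"}  # hypothetical IDs
relevant = {"102", "103", "105"}
print(round(recall(retrieved, relevant), 3))  # 0.667
```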

  6. Introduction. Requirements: relevance measures, F-score
     The F-score is a measure combining precision and recall:
     $F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$
     With $\beta = 1$ (precision and recall weighted equally):
     $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
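The F-score formula translates directly into code. A minimal sketch; the function name `f_score` is hypothetical, and returning 0.0 when both precision and recall are 0 is an assumed convention:

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta).

    beta > 1 gives more weight to recall; beta < 1 favours precision.
    """
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0  # assumed convention when precision = recall = 0
    return (1 + beta**2) * precision * recall / denom


p, r = 0.5, 2 / 3
print(round(f_score(p, r), 3))          # 0.571  (F1)
print(round(f_score(p, r, beta=2), 3))  # 0.625  (F2)
```

With precision 0.5 and recall 2/3, F1 equals 4/7; choosing beta = 2 shifts the weight toward recall, which is the higher of the two values here, so the score rises to 0.625.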

  7. Introduction. Problems: domain-specific vocabulary
     Application: literature search for species-specific metabolisms
        ◮ e.g. lipid metabolism in chicken
     Several methods and tools exist:
        ◮ an interface to set filters on PubMed queries
        ◮ text-mining approaches:
           ⋆ Natural Language Processing
           ⋆ Latent Semantic Analysis
     BUT: these approaches need a corpus of domain-specific vocabulary

  8. Introduction. Problems: complex querying process
     Writing exhaustive and complex queries relies on domain-specific knowledge
        ◮ e.g. lipid metabolism
     A complex query needs many keywords
        ◮ this contradicts the user-friendliness requirement
     Automatic query enrichment using ontologies reconciles both requirements
     Ontology definition (Bard, 2004): an ontology is a formal way of representing knowledge in which concepts are described both by their meaning and by their relationship to each other.

  9. Semantic expansion and query enrichment. Gene Ontology
     Controlled vocabulary
     Hierarchy with inheritance
     More than 34,000 terms describing:
        ◮ Biological Processes
        ◮ Molecular Functions
        ◮ Cellular Components

  10. Semantic expansion and query enrichment. Gene Ontology Annotations
     Functional annotation of genes
     Multi-species
     One GO term may annotate many genes
     Each gene can be used as a PubMed keyword
     Main idea: the genes annotated by a GO term of interest, or by one of its descendants, can be used as keywords in a PubMed query.

  11. Semantic expansion and query enrichment. Expansion illustration
     [Figure: GO hierarchy. Regulation of fatty acid metabolic process (GO:0019217) has the descendant Regulation of fatty acid biosynthetic process (GO:0042304), which in turn has the descendants Negative regulation of fatty acid biosynthetic process (GO:0045717) and Positive regulation of fatty acid biosynthetic process (GO:0045723). Annotated genes shown in the figure: PPAR, CAV1, ChREBP, APOA1, BRCA1.]
     Extending the query to the descendants is important
     This extension is not supported by other tools
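Collecting the genes annotated by a term or by any of its descendants amounts to a graph traversal. A minimal sketch, assuming the GO fragment from the illustration is stored as a parent-to-children dictionary; the gene annotations below are hypothetical placeholders (GENE_A to GENE_E), not real GO annotations, and the slides do not say which GO relation types GO2PUB follows:

```python
from collections import deque

# Parent -> children edges for the GO fragment in the illustration.
CHILDREN: dict[str, list[str]] = {
    "GO:0019217": ["GO:0042304"],
    "GO:0042304": ["GO:0045717", "GO:0045723"],
}

# Hypothetical direct annotations (term -> genes); placeholders only.
ANNOTATIONS: dict[str, set[str]] = {
    "GO:0019217": {"GENE_A"},
    "GO:0042304": {"GENE_B"},
    "GO:0045717": {"GENE_C"},
    "GO:0045723": {"GENE_D", "GENE_E"},
}


def descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """Return the term itself plus all its descendants (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # GO is a DAG: skip terms already reached
                seen.add(child)
                queue.append(child)
    return seen


def expanded_genes(term: str, children: dict[str, list[str]],
                   annotations: dict[str, set[str]]) -> set[str]:
    """Union of the genes annotated by the term or any of its descendants."""
    genes: set[str] = set()
    for t in descendants(term, children):
        genes |= annotations.get(t, set())
    return genes


print(sorted(ANNOTATIONS["GO:0019217"]))  # ['GENE_A']
print(sorted(expanded_genes("GO:0019217", CHILDREN, ANNOTATIONS)))
# ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D', 'GENE_E']
```

Because a GO term can have several parents, the traversal keeps a `seen` set so that a shared descendant is visited only once.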

  12. Semantic expansion and query enrichment. Example
     GO:0019217 (Regulation of fatty acid metabolic process): 14 genes, yielding 57 symbols, names and synonyms used as keywords
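Once collected, the keywords can be combined into a single boolean PubMed query. A minimal sketch, assuming keywords are ORed together, tagged with PubMed's `[tiab]` (title/abstract) field, and ANDed with a species term; the exact query GO2PUB generates is not shown on these slides, `build_pubmed_query` is a hypothetical helper, and the keyword list is illustrative only:

```python
def build_pubmed_query(keywords: list[str], species_term: str = "") -> str:
    """OR the keywords together; optionally AND the result with a species term."""
    # Deduplicate and drop blanks; sort for a reproducible query string.
    unique = sorted({k.strip() for k in keywords if k.strip()})
    # Quote each keyword so multi-word names are searched as phrases.
    gene_clause = " OR ".join(f'"{k}"[tiab]' for k in unique)
    query = f"({gene_clause})"
    if species_term:
        query += f" AND {species_term}"
    return query


# Illustrative keywords: gene symbols and a full gene name.
keywords = ["CAV1", "caveolin 1", "BRCA1"]
print(build_pubmed_query(keywords, species_term='"chickens"[MeSH Terms]'))
# ("BRCA1"[tiab] OR "CAV1"[tiab] OR "caveolin 1"[tiab]) AND "chickens"[MeSH Terms]
```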

  13. Semantic expansion and query enrichment. Example: semantic expansion is useful
     GO:0019217 (Regulation of fatty acid metabolic process), chicken
     Without query expansion:
        ◮ 2 articles concerning only 1 gene
     With query expansion:
        ◮ 9 articles concerning 7 genes

  14. GO2PUB website
     http://go2pub.genouest.org

  15. GO2PUB website. Filling in the form
     [Screenshot: the field "Please enter a GO term", with "lipi" typed in]

  16. GO2PUB website. Query example
     [Screenshot of a query example]

  17. Relevance of GO2PUB. Method: relevance analysis
     Comparison of GO2PUB with the PubMed query system and with GoPubMed
     Qualitative analysis:
        ◮ selection of the relevant results of 3 very specific queries sent to GO2PUB, PubMed and GoPubMed
        ◮ calculation of precision, recall and F-score for each tool
     Generalization study:
        ◮ comparison of the results obtained with 20 random GO terms
        ◮ selection of the relevant results and computation of precision, recall and F-score for GO2PUB and GoPubMed
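Recall needs the set of all relevant documents, which is unknown for a database the size of PubMed. The slides do not say how it was estimated; one common approach in information-retrieval evaluation, assumed here, is pooling: treat every document judged relevant among the results of all compared tools as the relevant set. A minimal sketch with hypothetical tool names and article IDs:

```python
def evaluate(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one tool on one query."""
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical results of two tools for the same GO term.
results = {
    "tool_A": {"1", "2", "3", "4"},
    "tool_B": {"3", "5"},
}
# Hypothetical expert judgements over everything the tools returned.
judged_relevant = {"2", "3", "5"}

# Pooled relevant set: relevant documents found by at least one tool.
pool = set().union(*results.values()) & judged_relevant

for tool, retrieved in results.items():
    p, r, f1 = evaluate(retrieved, pool)
    print(f"{tool}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
# tool_A: P=0.50 R=0.67 F1=0.57
# tool_B: P=1.00 R=0.67 F1=0.80
```

A pooled relevant set only contains documents that some tool retrieved, so recall computed this way is relative to the compared tools rather than to the whole database.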
