Prototyping a Biomedical Ontology Recommender Service
Clement Jonquet, Nigam H. Shah, Mark A. Musen
jonquet@stanford.edu
Ontologies & data & annotations (1/2)
• It is hard for biomedical researchers to find the data they need
  • Data integration problem
  • Translational discoveries are prevented
• Annotating data with biomedical ontologies is a solution
  • Annotations describe data with ontology concepts
  • Semantic annotations (GO annotations, MeSH in PubMed)
  • Ontologies play a common-denominator role
• For example:
  • A researcher wants to integrate data from different gene expression dataset repositories
  • A curator wants to triage articles for better information retrieval
BioOntologies 2009 ‐ Prototyping a Biomedical Recommender Service ‐ June 28th, 2009
Ontologies & data & annotations (2/2)
[Figure: annotation of PubMed article PMID 19550360 with four ontologies (Human Disease, FMA, MeSH, SNOMED‐CT); the panels report 1, 17, 33 and 5 matches]
Which ontology to use?
• Large number of ontologies
  • Different versions, platforms, formats, etc.
• Which ontology is relevant? Accurate?
• What’s the risk of a bad choice?
  • Missing a relevant ontology
  • Missing possible reuse and starting a new ontology
  • Missing connections/integration with other data that use the right ontologies
The recommender service
• Given representative textual metadata (text description, keywords, etc.)
• Use a method based on semantic annotations generated by the NCBO Annotator
• Use 206 ontologies from UMLS & NCBO BioPortal
• Recommend and score the appropriate ontologies to annotate the data
NCBO Annotator workflow [Jonquet et al, AMIA STB 2009]
• Extract annotations from text by concept recognition
• Expand annotations using the knowledge represented in ontologies
• Score annotations according to their context and return them to the user
Concept recognition (step 1)
• Use a dictionary: a list of strings that identify ontology concepts
  • 206 ontologies, ~3.5M concepts & ~7M terms
• Use NCIBI Mgrep, a syntactic concept recognizer
  • High degree of accuracy
  • Fast, scalable, domain independent
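Mgrep's own matching algorithm is not described on this slide. As a rough, hypothetical stand-in, the Python sketch below shows the general idea of dictionary-based concept recognition: a dictionary maps lower-cased term strings to (ontology, concept ID) pairs, and the text is scanned for the longest dictionary term starting at each word. The dictionary contents, `MAX_TERM_WORDS` and `recognize` are illustrative assumptions, with toy entries taken loosely from the melanoma example in this talk.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lower-cased term -> list of (ontology, concept id).
# The real dictionary holds ~7M terms from 206 ontologies.
DICTIONARY: Dict[str, List[Tuple[str, str]]] = {
    "melanoma": [("NCI", "C0025202"), ("39228", "DOID:1909")],
    "melanocytes": [("NCI", "C0025201")],
    "tumor": [("NCI", "C0027651")],  # Neoplasm, via its synonym "Tumor"
}
MAX_TERM_WORDS = 3  # assumed upper bound on term length, in words


def recognize(text: str) -> List[Tuple[str, str, str]]:
    """Return (matched term, ontology, concept id) for each dictionary hit.

    Simple longest-match scan over lower-cased words; NOT Mgrep's algorithm.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    hits: List[Tuple[str, str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in DICTIONARY:
                hits.extend((phrase, ont, cid) for ont, cid in DICTIONARY[phrase])
                i += n  # continue after the matched phrase
                break
        else:
            i += 1  # no dictionary term starts at this word
    return hits


print(recognize("Melanoma is a malignant tumor of melanocytes"))
```

Trying the longest phrase first keeps a multi-word term from also being reported as its individual words. A production recognizer would also need plural and variant handling, which this sketch skips (the toy dictionary simply stores the plural "melanocytes").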
Semantic expansion (step 2)
• Use is_a hierarchies defined by original ontologies
• Use mappings in UMLS Metathesaurus and NCBO BioPortal
• Use semantic‐similarity algorithms based on the is_a graph (ongoing work)
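A minimal sketch of the first two expansion strategies, assuming the hierarchy and the mappings are available as simple child-to-parents and concept-to-targets dictionaries (hypothetical structures, with toy entries taken from the melanoma example). The is_a closure is computed by a breadth-first walk up the hierarchy, recording how many levels separate each ancestor from the directly matched concept; mappings are followed one hop. Scores are deliberately left out here, since assigning them by context belongs to the scoring step.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Concept = Tuple[str, str]  # (ontology, concept id)

# Hypothetical toy is_a links (child -> parents), after the Human Disease example.
IS_A: Dict[Concept, List[Concept]] = {
    ("39228", "DOID:1909"): [("39228", "DOID:191")],     # Melanoma -> Melanocytic neoplasm
    ("39228", "DOID:191"): [("39228", "DOID:0000818")],  # -> cell proliferation disease
}

# Hypothetical toy mappings, stored in one direction only for brevity.
MAPPINGS: Dict[Concept, List[Concept]] = {
    ("NCI", "C0025201"): [("FMA", "C0025201")],  # Melanocyte: NCI Thesaurus -> FMA
}


def expand(direct: Set[Concept]) -> List[Tuple[Concept, str, int]]:
    """Return (concept, context, level) for every concept reached by expansion.

    context is "is_a" (level = number of is_a steps up) or "mapping" (level 1).
    """
    results: List[Tuple[Concept, str, int]] = []
    for start in sorted(direct):  # sorted for deterministic output
        # Breadth-first walk up the hierarchy: the is_a transitive closure.
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, level = queue.popleft()
            for parent in IS_A.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    results.append((parent, "is_a", level + 1))
                    queue.append((parent, level + 1))
        # One-hop mapping expansion of the directly matched concept.
        for target in MAPPINGS.get(start, []):
            results.append((target, "mapping", 1))
    return results


print(expand({("39228", "DOID:1909"), ("NCI", "C0025201")}))
```

Keeping the level with each expanded concept lets a later scoring step weight a direct parent differently from a more distant ancestor.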
An example
“Melanoma is a malignant tumor of melanocytes which are found predominantly in skin but also in the bowel and the eye.”
Concept recognition
• NCI/C0025201, Melanocyte in NCI Thesaurus {score: 10}
• NCI/C0025202, Melanoma in NCI Thesaurus {score: 10}
• 39228/DOID:1909, Melanoma in Human Disease {score: 10}
• NCI/C0027651, Neoplasm (synonym of Tumor) in NCI Thesaurus {score: 8}
Is_a closure expansion
• 39228/DOID:191, Melanocytic neoplasm, direct parent of Melanoma in Human Disease {score: 8}
• 39228/DOID:0000818, cell proliferation disease, grandparent of Melanoma in Human Disease {score: 8}
• NCI/C0027651, Neoplasm in NCI Thesaurus, great-grandparent of Melanoma in NCI Thesaurus {score: 7}
Mapping expansion
• FMA/C0025201, Melanocyte in Foundational Model of Anatomy, concept mapped to NCI/C0025201 in UMLS {score: 7}
Ontology scoring method
• Each annotation computed by the NCBO Annotator has a score that depends on the context of the annotation (e.g., direct, expanded, etc.)
• Ontologies are sorted by the sum of the scores of the annotations they have generated
• In the previous example, NCI Thesaurus scores: NCI/C0025201 {score: 10} + NCI/C0025202 {score: 10} + NCI/C0027651 {score: 8} + NCI/C0027651 {score: 7} = 35
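The sum-and-sort rule can be sketched in a few lines of Python, using the scored annotations from the melanoma example as input. The ontology labels ("NCI", "39228", "FMA") are the prefixes used in that example; the function name `rank_ontologies` is hypothetical. NCI/C0027651 contributes twice, once as a direct synonym match and once through is_a expansion, which is how the NCI Thesaurus reaches 35.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Scored annotations from the melanoma example: (ontology, concept id, score).
ANNOTATIONS: List[Tuple[str, str, int]] = [
    ("NCI", "C0025201", 10),
    ("NCI", "C0025202", 10),
    ("39228", "DOID:1909", 10),
    ("NCI", "C0027651", 8),       # Neoplasm, synonym of Tumor
    ("39228", "DOID:191", 8),
    ("39228", "DOID:0000818", 8),
    ("NCI", "C0027651", 7),       # Neoplasm, reached by is_a expansion
    ("FMA", "C0025201", 7),
]


def rank_ontologies(annotations: List[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Sum annotation scores per ontology and sort by decreasing total."""
    totals: Dict[str, int] = defaultdict(int)
    for ontology, _concept, score in annotations:
        totals[ontology] += score  # the same concept may contribute more than once
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(rank_ontologies(ANNOTATIONS))
# [('NCI', 35), ('39228', 26), ('FMA', 7)]
```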
Recommendation & datasets
• We have compared results for 3 types of datasets about “Melanoma”:
  • PubMed article citations
  • ClinicalTrials.gov trials
  • Gene Expression Omnibus (GEO) datasets
• Recommendation is data dependent
  • Different types of data will require different types of ontologies for annotation
Ontologies identified for each resource
[Figure: ontologies identified for the PubMed, GEO and ClinicalTrials.gov datasets. Ontologies shown: Phenotypic quality, Human phenotype, RadLex, NCI Thesaurus, Mouse pathology, Human disease, Galen, Experimental Factor, FMA, Xenopus anatomy, Mouse adult gross anatomy, Zebrafish anatomy, Human developmental anatomy, NCI anatomy, Mosquito gross anatomy, Medaka fish anatomy]
Results and analysis
• Big ontologies get high scores
• Key ontologies are identified
  • Regardless of the dataset, some ontologies are always present
  • Some ontologies appear only with a specific type of data
• Importance of appropriate recommendation
  • The score does not grow linearly with the number of annotations
  • Scoring matters, as do the weights given to each annotation context
Future work
• Enhance the backend annotation workflow
  • Concept recognition
  • Semantic expansion
• Different scoring methods that will support different kinds of recommendation scenarios
  • Use the size of the ontologies to normalize the score
  • Prefer “key” ontologies over other ontologies
• Parameterized scoring methods
  • Customization of weights for specific contexts
  • Predefined domain‐specific preferences
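How size normalization would work is left open here. One hypothetical option, sketched below, divides each ontology's summed score by `1 + log10(size)`, where size is the number of concepts; the logarithm damps, rather than cancels, the advantage of very large ontologies. The function name, the formula, and the toy ontologies "A" and "B" with their sizes are all assumptions for illustration, not real measurements.

```python
import math
from typing import Dict, List, Tuple


def normalize_by_size(
    totals: Dict[str, float], sizes: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Re-rank ontologies by total score / (1 + log10(size)).

    Hypothetical normalization; sizes are concept counts and must be >= 1.
    """
    normalized = {
        ontology: score / (1 + math.log10(sizes[ontology]))
        for ontology, score in totals.items()
    }
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


# Toy inputs (not real ontologies): A has the higher raw score but is far larger.
ranking = normalize_by_size({"A": 40, "B": 20}, {"A": 1_000_000, "B": 100})
print([name for name, _ in ranking])  # ['B', 'A']
```

With these toy numbers A scores 40 / 7 and B scores 20 / 3, so the smaller ontology moves ahead; a different damping function would shift that balance.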
Conclusion
• Enabling data integration and translational discoveries requires scalable data annotation using ontologies
• It may be hard for a scientist to know which ontology to (re)use in an annotation task
• We prototyped an ontology recommender service which, given sample textual metadata, recommends appropriate ontologies to use
• Please try it and join us!
Thank you
http://obs.bioontology.org
http://www.bioontology.org
Why use ontologies?
Biomedical resources indexed with ontology concepts
• We have used the annotation workflow to index several important biomedical resources with ontology concepts
• The index can be used to enhance search and data integration
Resources tab in BioPortal
[Figure: screenshot of the BioPortal Resources tab, with callouts for an example of an available resource, the number of annotations in the OBR index, the ontology concept browsed, the link to the original element, the ID of an element, and the annotation context]
Good use of the semantics (1/2)
Good use of the semantics (2/2)
Recommend
More recommend