Annotation Analytics for Gene and Protein functions Nigam Shah, MBBS, PhD nigam@stanford.edu
Annotation service Process textual metadata to automatically tag text with as many ontology terms as possible. 107 million calls, ~1000 GB data
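As a rough illustration of what tagging text with ontology terms involves, here is a minimal dictionary-lookup sketch. It is not the annotation service's implementation or API: the function `tag_text`, the `lexicon` mapping, and the `EX:` term identifiers are all hypothetical and used only for illustration.

```python
import re

def tag_text(text, lexicon):
    """Return (start, end, matched_text, term_id) for every lexicon phrase found.

    `lexicon` maps a term label to a term identifier. Both are made-up
    placeholders here, not entries from a real ontology.
    """
    hits = []
    for phrase, term_id in lexicon.items():
        # Whole-word, case-insensitive match of the term label.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0), term_id))
    # Keep every hit, including overlapping ones, ordered by position.
    return sorted(hits)

toy_lexicon = {"melanoma": "EX:0001", "skin": "EX:0002"}
annotations = tag_text("Melanoma is a skin cancer.", toy_lexicon)
```

Keeping overlapping hits rather than only the longest match follows the stated goal of tagging with as many terms as possible; a production annotator would also need synonym handling and far larger dictionaries.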
Resource index Won 1st prize at the 2010 Semantic Web Challenge @ ISWC. Indexed resources: PubMed abstracts, Adverse Events (AERS), GEO, …, Clinical Trials, DrugBank
Understanding the genome • Units of study range in length from ‘whole chromosome’ to ‘single nucleotide’ • E.g. three copies of Chr. 21 cause Down’s syndrome • The focus is on finding the functional associations of strings in the genome
Generic GO-based analysis routine • Get annotations for each gene in a set • Count the occurrence of each annotation term in the study set • Count the occurrence of that term in some reference set (whole genome?) • P-value for how surprising their overlap is. [Figure labels: Genome, Study set, Reference set]
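The routine above can be sketched in a few lines of Python. The slide does not say which statistical test produces the P-value; this sketch assumes a one-sided hypergeometric test, a common choice for this kind of overlap. The names `hypergeom_sf` and `term_enrichment` and the gene-to-terms dictionary layout are illustrative assumptions, and no multiple-testing correction is applied.

```python
from collections import Counter
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def term_enrichment(study_genes, reference_genes, annotations):
    """Return {term: (study_count, reference_count, p_value)}.

    `annotations` maps a gene to the set of terms it is annotated with
    (assumed layout; unannotated genes may simply be absent).
    """
    reference = set(reference_genes)
    study = set(study_genes) & reference  # study set must lie inside the reference
    # Count each term's occurrences in the reference set and in the study set.
    ref_counts = Counter(t for g in reference for t in annotations.get(g, ()))
    study_counts = Counter(t for g in study for t in annotations.get(g, ()))
    N, n = len(reference), len(study)
    return {
        term: (k, ref_counts[term], hypergeom_sf(k, N, ref_counts[term], n))
        for term, k in study_counts.items()
    }

# Toy data: 10 reference genes, 3 of them annotated with a placeholder term "T1".
reference = [f"g{i}" for i in range(10)]
toy_annotations = {"g0": {"T1"}, "g1": {"T1"}, "g2": {"T1"}, "g3": {"T2"}}
result = term_enrichment(["g0", "g1", "g3"], reference, toy_annotations)
```

A small P-value means the term appears in the study set more often than random sampling from the reference set would suggest. In practice many terms are tested at once, so a correction for multiple comparisons would be needed before interpreting the values.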
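Annotation Analytics Landscape [Diagram: ontologies and vocabularies (SNOMED-CT, NCIT, ICD-9, MeSH, …, Drugs, Chemicals, Cell Type, Human Disease, Gene Ontology) against collections to analyze (Gene sets, Patient sets, Paper sets, Grant sets, Drug sets, Health Indicator Warehouse datasets); labeled entries: Genes2MSH, GOPubMed; open entries marked “?”]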
Mutation enrichment
Profiling a set of aging genes [Figure labels: 261 age-related genes; Genome; Disease Ontology, ~30% of genome]
Annotation Analytics Landscape: What else can we do? 1. Units of study range in length from ‘whole chromosome’ to ‘single nucleotide’ 2. The focus is on finding the functional associations of strings in the genome 3. For each type of “string”, there will be some textual descriptions that you can process computationally. [Diagram: ontologies and vocabularies (SNOMED-CT, NCIT, ICD-9, MeSH, …, Drugs, Chemicals, Cell Type, Human Disease, Gene Ontology) against collections to analyze (Gene sets, Paper sets, Patient sets, Grant sets, Drug sets, Health Indicator Warehouse datasets); labeled entries: Mutations, Aging, Genes2MSH, GOPubMed]
The team @ www.bioontology.org/project-team NIH Roadmap grant U54 HG004028