Systematic Annotation Mark Voorhies 4/5/2012
Review RTFM PNAS 95:14863
The Gene Ontology
Three directed acyclic graphs (aspects):
  Biological Process
  Molecular Function
  Cellular Component
The AmiGO browser
The Gene Ontology
  How might we annotate genes with GO terms?
  How do we calculate the significance of the GO terms associated with a particular group of genes?
Associating GO terms
How might we annotate genes with GO terms?
  By sequence homology (e.g., BLAST)
  By domain homology (e.g., InterProScan)
  Mapping from an annotated relative (e.g., INPARANOID)
  Human curation of the literature (e.g., SGD)
Associating GO terms: Evidence codes
Experimental
  EXP: Inferred from Experiment
  IDA: Inferred from Direct Assay
  IPI: Inferred from Physical Interaction
  IMP: Inferred from Mutant Phenotype
  IGI: Inferred from Genetic Interaction
  IEP: Inferred from Expression Pattern
Computational Analysis
  ISS: Inferred from Sequence or Structural Similarity
  ISO: Inferred from Sequence Orthology
  ISA: Inferred from Sequence Alignment
  ISM: Inferred from Sequence Model
  IGC: Inferred from Genomic Context
  RCA: Inferred from Reviewed Computational Analysis
Author Statement
  TAS: Traceable Author Statement
  NAS: Non-traceable Author Statement
Curator Statement
  IC: Inferred by Curator
  ND: No biological Data available
Automatically-assigned
  IEA: Inferred from Electronic Annotation
Obsolete
  NR: Not Recorded
The Gene Ontology
  How do we calculate the significance of the GO terms associated with a particular group of genes?
Sampling with replacement: Mutagenesis
How many transformants do we have to screen in order to “cover” a genome?
  Probability that a transformant has one disrupted gene: $p_m$
  Number of genes in organism: $N_g$
Probability that a specific gene is disrupted in a specific transformant:
$$p_d = p_m \left(\frac{1}{N_g}\right) = \frac{p_m}{N_g} \tag{1}$$
Probability of not disrupting that gene:
$$p_u = 1 - \frac{p_m}{N_g} \tag{2}$$
Sampling with replacement: Mutagenesis
Probability of not disrupting that gene:
$$p_u = 1 - \frac{p_m}{N_g} \tag{3}$$
The probability of not disrupting that gene in n independent transformants is:
$$p_{u,n} = \left(1 - \frac{p_m}{N_g}\right)^n \tag{4}$$
And the probability of disrupting that gene at least once in n independent transformants is:
$$p_{d,n} = 1 - p_{u,n} = 1 - \left(1 - \frac{p_m}{N_g}\right)^n \tag{5}$$
This is also the expected genome coverage.
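Equation (5) can be evaluated directly, and inverted to estimate how many transformants to screen for a target coverage. The following is a minimal sketch, not part of the original slides; the function names and the example values (p_m = 0.5, N_g = 6000, 95% coverage) are assumptions chosen only for illustration.

```python
import math


def expected_coverage(n, p_m, n_genes):
    """Equation (5): chance a given gene is disrupted at least once in n transformants."""
    return 1.0 - (1.0 - p_m / n_genes) ** n


def transformants_needed(target, p_m, n_genes):
    """Smallest n whose expected coverage reaches `target`, from inverting equation (5)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve (1 - p_m / N_g)^n <= 1 - target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log1p(-p_m / n_genes))


# Hypothetical example values (assumptions, not taken from the lecture).
P_M = 0.5        # probability a transformant carries one disrupted gene
N_GENES = 6000   # number of genes in the organism
n_95 = transformants_needed(0.95, P_M, N_GENES)
print(n_95, expected_coverage(n_95, P_M, N_GENES))
```

Because coverage approaches 1 only asymptotically, each additional increment of coverage costs more transformants than the last.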
Sampling with replacement: Mutagenesis
[Figure: p_i or coverage (vertical axis, 0.0 to 1.0) plotted against n (horizontal axis, 0 to 200000)]
Sampling with replacement: General Cases
Calculating the probability of zero events was easy.
$$p_{0,n} = \left(1 - \frac{p_m}{N_g}\right)^n \tag{6}$$
What about exactly k events? Binomial distribution:
$$p_{k,n} = \binom{n}{k} p_m^k (1 - p_m)^{n-k} \tag{7}$$
What if there is more than one type of event? Multinomial distribution:
$$p_{k_1,k_2,\ldots,n} = n! \prod_i \frac{p_i^{k_i}}{k_i!} \tag{8}$$
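Equations (7) and (8) translate almost directly into code. The sketch below is an illustration under assumptions rather than anything from the slides: the function names are hypothetical, and it uses only the Python standard library (3.8 or later for `math.comb` and `math.prod`).

```python
from math import comb, factorial, prod


def binomial_pmf(k, n, p):
    """Equation (7): probability of exactly k events in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def multinomial_pmf(counts, probs):
    """Equation (8): probability of seeing counts[i] events of type i.

    `probs` are the per-trial probabilities of each event type and should sum to 1.
    """
    n = sum(counts)
    coefficient = factorial(n)
    for k_i in counts:
        coefficient //= factorial(k_i)  # stays an exact integer at every step
    return coefficient * prod(p_i**k_i for p_i, k_i in zip(probs, counts))


# With two event types the multinomial reduces to the binomial.
print(binomial_pmf(3, 10, 0.3), multinomial_pmf([3, 7], [0.3, 0.7]))
```

Setting k = 0 in `binomial_pmf` recovers the zero-event case, (1 - p)^n.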
Sampling without replacement: GO Annotation
The binomial distribution assumes that event probabilities are constant:
$$p_{k,n} = \binom{n}{k} p_m^k (1 - p_m)^{n-k} \tag{9}$$
What if there are m virulence factors in our genome, and every time we discover one it is magically removed from our library? Hypergeometric distribution:
$$p_{k,m,n} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}} \tag{10}$$
Here N is the total number of genes in the library, n is the number drawn, and k is the number of virulence factors among them.
More than one disjoint type of label:
$$p_{k_1,k_2,\ldots,m_1,m_2,\ldots,n} = \frac{\prod_i \binom{m_i}{k_i}}{\binom{N}{n}} \tag{11}$$
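One common way to apply equation (10) to GO annotation is to ask how likely it is that a gene group contains at least k genes carrying a given term by chance. The sketch below assumes that reading: N genes in the genome, m of them annotated with the term, n genes in the group, k of those annotated. The function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, m, n, N):
    """Equation (10): probability of exactly k labelled genes among n drawn without replacement."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def enrichment_p_value(k, m, n, N):
    """Upper tail of equation (10): probability of k or more labelled genes in the sample."""
    upper = min(m, n)  # cannot draw more labelled genes than exist or than were drawn
    return sum(hypergeom_pmf(j, m, n, N) for j in range(k, upper + 1))


# Hypothetical example: 6000 genes, 40 carry the term, a cluster of 100 genes contains 5 of them.
print(enrichment_p_value(5, 40, 100, 6000))
```

Summing the tail rather than reporting the single-point probability gives the usual one-sided significance for over-representation.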
Extracting gene lists from JavaTreeView
The SGD GO Slim Mapper
Multiple Hypothesis Testing http://xkcd.com/882/
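The multiple-testing problem follows from the same "at least one event in n independent trials" arithmetic used for genome coverage. The sketch below is an illustration, not the lecture's method: it computes the chance of at least one false positive across independent tests and shows the Bonferroni correction, a standard remedy that the slides do not name. The function names and the numbers (20 tests at a 0.05 threshold) are assumptions.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive in n_tests independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Hypothetical example: 20 GO terms tested, each at p < 0.05.
ALPHA = 0.05
N_TESTS = 20
print(family_wise_error_rate(ALPHA, N_TESTS))
print(family_wise_error_rate(bonferroni_threshold(ALPHA, N_TESTS), N_TESTS))
```

Testing many GO terms against one gene list is exactly this situation, so uncorrected per-term p-values overstate significance.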
Alternatives to Hierarchical Clustering
  GORDER and pre-clustering by SOM
  Pre-calling number of clusters: k-means and k-medians
  Principal Component Analysis (PCA)
Homework: Download PyMOL