Systematic Annotation, Mark Voorhies, 4/5/2012

  1. Systematic Annotation, Mark Voorhies, 4/5/2012

  2. Review: RTFM (PNAS 95:14863)

  3. The Gene Ontology
     Three directed acyclic graphs (aspects): Biological Process, Molecular Function, Cellular Component
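Because each aspect is a directed acyclic graph rather than a tree, a term can have several parents, and a gene annotated to a term is implicitly annotated to every ancestor of that term (GO's "true path rule"). A minimal sketch of that propagation, assuming a hypothetical toy DAG stored as a child-to-parents dictionary; the term names are illustrative placeholders, not real GO identifiers.

```python
from collections import deque

# Hypothetical toy fragment of one GO aspect: each term maps to its parent terms.
# Term names are illustrative placeholders, not real GO identifiers.
PARENTS = {
    "biological_process": [],
    "metabolic_process": ["biological_process"],
    "catabolic_process": ["metabolic_process"],
    "carbohydrate_metabolic_process": ["metabolic_process"],
    # Two parents: this is why GO is a DAG and not a tree.
    "carbohydrate_catabolic_process": ["catabolic_process", "carbohydrate_metabolic_process"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent edges (term itself excluded)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen


def propagate(direct_terms, parents=PARENTS):
    """Expand a gene's direct annotations to include all ancestor terms."""
    full = set(direct_terms)
    for t in direct_terms:
        full |= ancestors(t, parents)
    return full
```

For example, `propagate(["catabolic_process"])` returns the term plus `metabolic_process` and `biological_process`. Counting genes per term after this expansion is what lets a general term "collect" genes annotated only to its descendants.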

  4. The Gene Ontology

  5. The Gene Ontology

  6. The AmiGO browser

  7. The Gene Ontology
     How might we annotate genes with GO terms?
     How do we calculate the significance of the GO terms associated with a particular group of genes?

  9. Associating GO terms
     How might we annotate genes with GO terms?
       - By sequence homology (e.g., BLAST)
       - By domain homology (e.g., InterProScan)
       - Mapping from an annotated relative (e.g., INPARANOID)
       - Human curation of the literature (e.g., SGD)

  10. Associating GO terms: Evidence codes
      Experimental
        EXP: Inferred from Experiment
        IDA: Inferred from Direct Assay
        IPI: Inferred from Physical Interaction
        IMP: Inferred from Mutant Phenotype
        IGI: Inferred from Genetic Interaction
        IEP: Inferred from Expression Pattern
      Computational Analysis
        ISS: Inferred from Sequence or Structural Similarity
        ISO: Inferred from Sequence Orthology
        ISA: Inferred from Sequence Alignment
        ISM: Inferred from Sequence Model
        IGC: Inferred from Genomic Context
        RCA: Inferred from Reviewed Computational Analysis
      Author Statement
        TAS: Traceable Author Statement
        NAS: Non-traceable Author Statement
      Curator Statement
        IC: Inferred by Curator
        ND: No biological Data available
      Automatically-assigned
        IEA: Inferred from Electronic Annotation
      Obsolete
        NR: Not Recorded
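One practical use of evidence codes is restricting an analysis to annotations with a particular kind of support, for instance leaving out automatically assigned (IEA) annotations. A minimal sketch, assuming annotations are hypothetical `(gene, term, code)` triples; the category table simply mirrors the slide, and the gene and term names in the usage are made up.

```python
# Evidence codes grouped by the categories listed on the slide.
EVIDENCE_CATEGORIES = {
    "Experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "Computational Analysis": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "Author Statement": ["TAS", "NAS"],
    "Curator Statement": ["IC", "ND"],
    "Automatically-assigned": ["IEA"],
    "Obsolete": ["NR"],
}

# Inverted lookup: evidence code -> category.
CODE_TO_CATEGORY = {
    code: category
    for category, codes in EVIDENCE_CATEGORIES.items()
    for code in codes
}


def filter_annotations(annotations, allowed_categories):
    """Keep (gene, term, code) triples whose evidence code is in an allowed category.

    Codes missing from the table are dropped rather than raising an error.
    """
    return [a for a in annotations if CODE_TO_CATEGORY.get(a[2]) in allowed_categories]
```

For example, filtering `[("GENE1", "term_a", "IDA"), ("GENE2", "term_a", "IEA")]` with `{"Experimental"}` keeps only the `GENE1` triple.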

  11. The Gene Ontology
      How might we annotate genes with GO terms?
      How do we calculate the significance of the GO terms associated with a particular group of genes?

  15. Sampling with replacement: Mutagenesis
      How many transformants do we have to screen in order to "cover" a genome?
      Probability that a transformant has one disrupted gene: $p_m$
      Number of genes in the organism: $N_g$
      Probability that a specific gene is disrupted in a specific transformant:
      $$p_d = p_m\left(\frac{1}{N_g}\right) = \frac{p_m}{N_g} \tag{1}$$
      Probability of not disrupting that gene:
      $$p_u = 1 - \frac{p_m}{N_g} \tag{2}$$

  19. Sampling with replacement: Mutagenesis
      Probability of not disrupting that gene:
      $$p_u = 1 - \frac{p_m}{N_g} \tag{3}$$
      The probability of not disrupting that gene in any of $n$ independent transformants is:
      $$p_{u,n} = \left(1 - \frac{p_m}{N_g}\right)^n \tag{4}$$
      And the probability of disrupting that gene at least once in $n$ independent transformants is:
      $$p_{d,n} = 1 - p_{u,n} = 1 - \left(1 - \frac{p_m}{N_g}\right)^n \tag{5}$$
      This is also the expected genome coverage, i.e., the expected fraction of the $N_g$ genes disrupted at least once.
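Solving Eq. (5) for $n$ answers the opening question directly: coverage of at least $c$ requires $n \ge \ln(1-c) / \ln(1 - p_m/N_g)$. A minimal sketch of both directions; the function names and the example parameter values are hypothetical.

```python
import math


def coverage(n, p_m, n_genes):
    """Expected genome coverage after n transformants, Eq. (5): 1 - (1 - p_m/N_g)^n."""
    return 1.0 - (1.0 - p_m / n_genes) ** n


def transformants_needed(target, p_m, n_genes):
    """Smallest n with coverage(n) >= target, from solving Eq. (5) for n.

    Assumes 0 <= target < 1 and 0 < p_m <= n_genes, so p_d = p_m/N_g lies in (0, 1].
    """
    if not 0.0 <= target < 1.0:
        raise ValueError("target coverage must be in [0, 1)")
    p_d = p_m / n_genes
    if p_d == 1.0:
        # Every transformant disrupts every gene's "slot": one suffices for any positive target.
        return 1 if target > 0 else 0
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_d))
```

For example, with a hypothetical genome of 6000 genes and one disruption per transformant ($p_m = 1$), `transformants_needed(0.95, 1, 6000)` comes out at roughly 18,000, close to the large-$N_g$ approximation $n \approx -N_g \ln(1-c)/p_m$.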

  20. Sampling with replacement: Mutagenesis
      [Figure: p_i or coverage (y-axis, 0.0 to 1.0) versus n (x-axis, 0 to 200000)]

  25. Sampling with replacement: General Cases
      Calculating the probability of zero events was easy:
      $$p_{0,n} = \left(1 - \frac{p_m}{N_g}\right)^n \tag{6}$$
      What about exactly $k$ events? Binomial distribution:
      $$p_{k,n} = \binom{n}{k} p^k (1 - p)^{n-k} \tag{7}$$
      where $p$ is the per-trial probability of the event. For disrupting a specific gene, $p = p_d = p_m/N_g$, and setting $k = 0$ recovers Eq. (6).
      What if there is more than one type of event? Multinomial distribution:
      $$p_{k_1,k_2,\ldots,n} = \frac{n!}{\prod_i k_i!}\prod_i p_i^{k_i} \tag{8}$$
      where $p_i$ is the per-trial probability of event type $i$, $\sum_i p_i = 1$, and $\sum_i k_i = n$.
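Eqs. (7) and (8) translate almost symbol for symbol into code. A minimal sketch using only the standard library (`math.comb` and `math.prod` need Python 3.8 or later); the function names are hypothetical.

```python
from math import comb, factorial, prod


def binomial_pmf(k, n, p):
    """Eq. (7): probability of exactly k events in n independent trials of probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def multinomial_pmf(counts, probs):
    """Eq. (8): probability of counts k_1, k_2, ... of each event type in n = sum(counts) trials.

    `probs` holds the per-trial probabilities p_1, p_2, ... and should sum to 1.
    """
    n = sum(counts)
    coeff = factorial(n) // prod(factorial(k) for k in counts)
    return coeff * prod(p**k for p, k in zip(probs, counts))
```

With two event types the multinomial reduces to the binomial, so `multinomial_pmf([3, 7], [0.3, 0.7])` equals `binomial_pmf(3, 10, 0.3)`, and `binomial_pmf(0, n, p_m / N_g)` reproduces Eq. (6).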

  29. Sampling without replacement: GO Annotation
      The binomial distribution assumes that event probabilities are constant:
      $$p_{k,n} = \binom{n}{k} p^k (1 - p)^{n-k} \tag{9}$$
      What if there are $m$ virulence factors among the $N$ genes in our genome, and every time we discover one it is magically removed from our library (as is every other gene we draw)? Then the probability of finding exactly $k$ virulence factors in $n$ draws follows the hypergeometric distribution:
      $$p_{k,m,n} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}} \tag{10}$$
      More than one disjoint type of label (with $m_i$ genes carrying label $i$, $\sum_i m_i = N$, and $\sum_i k_i = n$):
      $$p_{k_1,k_2,\ldots,m_1,m_2,\ldots,n} = \frac{\prod_i \binom{m_i}{k_i}}{\binom{N}{n}} \tag{11}$$
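For GO annotation, read $N$ as the genes in the genome, $m$ as the genes annotated to a term, $n$ as the size of a gene list (e.g., a cluster), and $k$ as the list genes carrying the term. A common way to turn Eq. (10) into a significance value is the upper tail $P(X \ge k)$, the chance that a random list of $n$ genes contains at least $k$ annotated genes. A minimal sketch of that one-sided test; the function names are hypothetical, and this is one common convention rather than necessarily the method used by any tool named in these slides.

```python
from math import comb


def hypergeom_pmf(k, m, n, N):
    """Eq. (10): probability that exactly k of n genes drawn without replacement
    from N genes carry a label that m of the N genes have."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def enrichment_p_value(k, m, n, N):
    """Upper-tail probability P(X >= k): the chance of at least k labeled genes
    in a random list of n genes (one-sided test for over-representation)."""
    upper = min(m, n)  # X can never exceed the number of labeled genes or the list size
    return sum(hypergeom_pmf(i, m, n, N) for i in range(k, upper + 1))
```

`math.comb` returns 0 when asked for more items than exist, so impossible outcomes (e.g., more unlabeled draws than unlabeled genes) contribute zero without special-casing.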

  30. Extracting gene lists from JavaTreeView

  31. The SGD GO Slim Mapper

  32. Multiple Hypothesis Testing http://xkcd.com/882/
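Testing many GO terms at once means some will look significant by chance alone, which is the point of the linked comic. Two standard corrections are sketched below: Bonferroni (controls the chance of any false positive) and Benjamini-Hochberg (controls the expected false discovery rate). The slide does not say which correction the course used; these are illustrations with hypothetical function names.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p to largest
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term is then called significant if its adjusted value falls below the chosen threshold (e.g., 0.05).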

  35. Alternatives to Hierarchical Clustering
        - GORDER and pre-clustering by SOM
        - Pre-calling the number of clusters: k-means and k-medians
        - Principal Component Analysis (PCA)
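Unlike hierarchical clustering, k-means needs the number of clusters $k$ fixed in advance. A minimal sketch of Lloyd's algorithm in pure Python, assuming points are equal-length tuples of floats (e.g., expression profiles). For reproducibility it seeds centroids with the first $k$ points, a simplifying assumption; practical implementations usually use random or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of its assigned points.

    Centroids are seeded with the first k points (a simplifying assumption).
    Returns (centroids, labels).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its cluster (left unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels
```

Replacing the mean in the update step with the coordinate-wise median (typically paired with Manhattan distance) gives k-medians, which is less sensitive to outlying measurements.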

  36. Homework: download PyMOL
