Beyond Genomics: Detecting Codes and Signals in the Cellular Transcriptome

  1. Beyond Genomics: Detecting Codes and Signals in the Cellular Transcriptome Brendan J. Frey University of Toronto Brendan Frey

  2. Purpose of my talk: To identify aspects of bioinformatics in which attendees of ISIT may be able to make significant contributions

  3. Beyond Genomics: Detecting Codes and Signals in the Cellular Transcriptome Brendan J. Frey University of Toronto

  4. The Genome

  5. Starting point: Discrete biological sequences
     • Symbols are Bases: G, C, A, T
     • Examples of biological sequences
       – Genes
       – Peptides
       – DNA
       – RNA
       – Chromosomes
       – Viruses
       – Proteins
       – HIV
     (RED indicates a definition that you should remember.)

  6. Chromosomes: Inherited DNA sequence [Figure labels: nucleus; DNA sequence (GCATTCATGC…); cell replication; sexual cell reproduction]

  7. The genome
     • Genome: Chromosomal DNA sequence from an organism or species
     • Examples
       Genome   Length (bases)
       Human    3,000 million (750 MB)
       Mouse    2,600 million
       Fly      100 million
       Yeast    13 million
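
The 750 MB figure for the human genome is consistent with storing each base in 2 bits, since the alphabet has four symbols. A minimal sketch of that arithmetic follows; the function name `genome_megabytes` is hypothetical, and 1 MB is assumed to mean 10^6 bytes.

```python
import math

BASES = "GCAT"  # the four symbols listed on the slides

def genome_megabytes(num_bases: int) -> float:
    """Storage in MB (assumed 10**6 bytes) at log2(4) = 2 bits per base."""
    bits_per_base = math.log2(len(BASES))
    return num_bases * bits_per_base / 8 / 1e6

# Genome lengths as given on the slide.
for name, length in [("Human", 3_000_000_000), ("Mouse", 2_600_000_000),
                     ("Fly", 100_000_000), ("Yeast", 13_000_000)]:
    print(f"{name}: {genome_megabytes(length):.2f} MB")
```

This counts only the raw sequence at a fixed-length encoding; it ignores any compression or annotation overhead.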

  8. Genes
     • A gene is a subsequence of the genome that encodes a functioning bio-molecule
     • The library of known genes
       – Comprises only 1% of genome sequence
       – Increases in diversity every year
       – Is probably far from complete

  9. The Transcriptome

  10. Genome: The digital backbone of molecular biology. Transcripts: Perform functions encoded in the genome.

  11. Traditional genes: DNA → (transcription) → Transcript (RNA) → (translation) → Protein
      • Transcription: Input: DNA; Output: Transcript
      • Translation: Input: Transcript; Output: Protein

  12. Traditional genes: DNA (Genome) → (transcription) → Transcript, RNA (Transcriptome) → (translation) → Protein (Proteome)
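
The two mappings on these slides (DNA in, transcript out; transcript in, protein out) can be sketched as a pair of functions. This is a simplified illustration, not something from the talk: it assumes the coding strand is given, ignores splicing and regulation, starts reading at position 0, and uses the standard genetic code. The names `transcribe` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO)}

def transcribe(dna: str) -> str:
    """DNA -> transcript (RNA): the same sequence with T written as U."""
    return dna.upper().replace("T", "U")

def translate(rna: str) -> str:
    """Transcript -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[i:i + 3].replace("U", "T")]
        if amino == "*":  # stop codon ends translation
            break
        protein.append(amino)
    return "".join(protein)

transcript = transcribe("ATGTGGTTTTAG")
print(transcript, translate(transcript))
```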

  13. Transcription [Figure labels: gene, upstream region, exon, intron, DNA, transcription proteins, regulatory proteins, transcript (RNA)]

  14. Transcription [Figure labels: upstream region, exon, intron, DNA]

  15. Transcription
      • Codewords in the upstream region bind to corresponding regulatory proteins
      • Code: Set of regulatory codewords
      • Signals: Concentrations of regulatory proteins and the output transcript
      [Figure labels: regulatory protein, CGTGGATAGTGAT, upstream region, exon, DNA]
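
Treating the code as a set of codewords suggests a simple scan of the upstream region for exact matches. The sketch below is only an illustration: the codewords and protein names are hypothetical, exact string matching is an assumption (real binding sites are usually degenerate), and `find_codewords` is not a function from the talk.

```python
def find_codewords(upstream: str, codewords: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (position, codeword, protein) for every exact match, sorted by position."""
    hits = []
    for word, protein in codewords.items():
        start = upstream.find(word)
        while start != -1:
            hits.append((start, word, protein))
            start = upstream.find(word, start + 1)  # also catches overlapping matches
    return sorted(hits)

# Hypothetical codebook; the region embeds the sequence shown on the slide.
codebook = {"CGTGG": "regulator_A", "GTGAT": "regulator_B"}
print(find_codewords("AACGTGGATAGTGATCC", codebook))
```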

  16. Splicing of transcripts [Figure labels: exon, intron, transcript (RNA), regulatory proteins]

  17. Splicing of transcripts [Figure labels: exon, intron, transcript (RNA), regulatory proteins, splicing proteins]

  18. Splicing of transcripts
      • The intron is spliced out
      • However, splicing may occur quite differently…
      [Figure labels: exon, intron, transcript (RNA)]

  19. Splicing of transcripts [Figure labels: exon, intron, transcript (RNA), regulatory proteins, splicing proteins]

  20. Splicing of transcripts [Figure labels: regulatory proteins, splicing proteins]

  21. Splicing of transcripts: The middle exon is ‘skipped’, leading to a different transcript [Figure labels: regulatory proteins, splicing proteins]

  22. Splicing of transcripts
      • Codewords in the introns and exons bind to corresponding regulatory proteins
      • Code: Set of regulatory codewords
      • Signals: Concentrations of regulatory proteins and different spliced transcripts
      [Figure labels: regulatory proteins, TTAGAT, TGGGGT]
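
The splicing slides describe two outcomes for the same transcript: introns removed with all exons kept, or the middle exon skipped. A minimal sketch, assuming exon positions are already known as half-open intervals; the function `splice` and the toy sequence are hypothetical.

```python
def splice(transcript: str, exons: list[tuple[int, int]],
           skip: frozenset[int] = frozenset()) -> str:
    """Join exon segments [start, end), dropping introns and any skipped exons."""
    return "".join(transcript[s:e] for k, (s, e) in enumerate(exons) if k not in skip)

# Toy transcript: exons in upper case, introns in lower case.
pre_mrna = "AAAA" + "gtag" + "CCCC" + "gtag" + "GGGG"
exon_spans = [(0, 4), (8, 12), (16, 20)]

print(splice(pre_mrna, exon_spans))            # all three exons joined
print(splice(pre_mrna, exon_spans, skip={1}))  # middle exon skipped
```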

  23. The modern transcriptome [Figure labels: cell nucleus; genome; TRANSCRIPTION (liver; brain and liver); transcript (RNA); non-functional transcripts; SPLICING; non-traditional transcript; mRNA (brain, liver); TRANSLATION; Protein, Protein A, Protein B]

  24. The modern transcriptome: … it turns out to be surprising in many ways
      [Figure labels: cell nucleus; genomic DNA; TRANSCRIPTION (in liver; in brain and liver); transcript (RNA); non-functional transcripts; SPLICING; spliced transcript (mRNA); non-traditional transcript; alternative transcripts (brain, liver); TRANSLATION; protein; ncRNA]
      Notes: # genes, ½ trans, 60% AS, 18k AS, 20% dis, 10k ncRNA

  25. The Resources

  26. Your collaborators can do lab work…
      • Sequencing: Snag an actual transcript and figure out its sequence
      • Microarrays: Find out if your predicted transcript fragment is expressed in a tissue sample
      • Mass spectrometry: Find out if a protein is present in a sample

  27. Databases
      • Genomes
      • Genome annotations
      • Libraries of observed transcript fragments
      • Microarray datasets containing measured concentrations of transcripts
      • …

  28. Measuring transcript concentrations using microarrays
      1. Fabricate microarray with probes
      2. Extract transcripts from cell
      3. Add fluorescent tag
      4. Hybridize tagged sequence to microarray
      5. Excite fluorescent tag with laser and measure intensity
      [Figure labels: cell, probes, example base sequences]
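
The hybridization step relies on base pairing: a probe binds a tagged sequence that contains the probe's reverse complement. The sketch below is an idealized illustration under several assumptions: sequences are written in the DNA alphabet, matching is exact, and measured intensity is simply the number of matching transcript copies. The function names and toy data are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G) and reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

def probe_intensities(probes: list[str], transcripts: dict[str, int]) -> list[int]:
    """For each probe, sum the copy numbers of transcripts containing its reverse complement."""
    intensities = []
    for probe in probes:
        target = reverse_complement(probe)
        intensities.append(sum(n for seq, n in transcripts.items() if target in seq))
    return intensities

# Toy sample: transcript sequence -> number of tagged copies.
sample = {"CCGAATTTA": 5, "GGCATTGCC": 2}
print(probe_intensities(["ATTCGG", "GGCAAT", "AAAAAA"], sample))
```

Real intensities also depend on probe sensitivity, cross-hybridization, and noise, none of which this sketch models.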

  29. Inkjet printer technology (Hughes et al, Nature Biotech 2001): Print nucleic acid sequences using an inkjet printer

  30. Then and now…
      • First microarrays (late 1990s)
        – ‘Cancer chips’, ‘gene chips’, …
        – 5,000-10,000 probes per slide
        – Noisy
      • Current microarrays
        – ‘Sub-gene resolution’
        – 200,000 probes per slide
        – Low noise
        – Multi-chip designs are cost-effective

  31. The Case Study: Discovering protein-making transcripts using factor graphs (BJ Frey, …, TR Hughes, Nature Genetics, September 2005)

  32. Controversy about the gene library
      “Despite Frey et al’s impressive computational reconstruction of gene structure, we argue that this does not prove the complexity of the transcriptome” – FANTOM/RIKEN Consortium, Science, March 2006
      How it all started…

  33. Research on the transcriptome [Timeline figure. Labels: analysis of genome; detection of transcripts; our project. Date ranges: 1960’s-2000, 2001-2005, 2001-2006, 2003-2005]

  34. Estimates of number of undiscovered genes (timeline axis: 2000 to 2005)
      • Bertone et al: ~11,000 (Science)
      • Genome: ~10,000 (IHGSC, Nature)
      • Genome: ~3000 (IHGSC, Nature)
      • Kapranov et al, Rinn et al, Shoemaker et al: ~300,000

  35. Our microarrays
      • Our genome analysis highlighted 1 million possible exons (~180,000 already known)
      • We designed one 60-base probe for each possible exon
      [Figure: number of probes per 8000 bases and number of known exons per 8000 bases, against coordinates (in bases) in Chromosome 4]

  36. Our samples (37 tissues): Twelve pools of mouse mRNA
      Pool  Composition (mRNA per array hybridization)
      1     Heart (2 µg), Skeletal muscle (2 µg)
      2     Liver (2 µg)
      3     Whole brain (1.5 µg), Cerebellum (0.48 µg), Olfactory bulb (0.15 µg)
      4     Colon (0.96 µg), Intestine (1.04 µg)
      5     Testis (3 µg), Epididymis (0.4 µg)
      6     Femur (0.9 µg), Knee (0.4 µg), Calvaria (0.06 µg), Teeth+mandible (1.3 µg), Teeth (0.4 µg)
      7     15d Embryo (1.3 µg), 12.5d Embryo (12.5 µg), 9.5d Embryo (0.3 µg), 14.5d Embryo head (0.25 µg), ES cells (0.24 µg)
      8     Digit (1.3 µg), Tongue (0.6 µg), Trachea (0.15 µg)
      9     Pancreas (1 µg), Mammary gland (0.9 µg), Adrenal gland (0.25 µg), Prostate gland (0.25 µg)
      10    Salivary gland (1.26 µg), Lymph node (0.74 µg)
      11    12.5d Placenta (1.15 µg), 9.5d Placenta (0.5 µg), 15d Placenta (0.35 µg)
      12    Lung (1 µg), Kidney (1 µg), Adipose (1 µg), Bladder (0.05 µg)

  37. Signal: The data (small part of the data from Chromosome 4)
      • Each column is an expression profile
      • Code: A ‘vector repetition code with deletions’
      [Figure label: example of a transcript]

  38. The transcript model: Each transcript is modeled using
      • A prototype expression profile
      • # probes before prototype (eg, 1)
      • # probes after prototype (eg, 4)
      • Flag indicating whether each probe corresponds to an exon
      [Figure labels: e, e, e, e, e]
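
One way to read the ‘vector repetition code with deletions’ is that every exon probe in a transcript repeats the prototype expression profile, while probes flagged as non-exons are the deletions. The sketch below illustrates that reading only; it is not the published model. The per-probe scale factor, the constant background level, and the function name `expected_profiles` are all assumptions, and noise is omitted.

```python
def expected_profiles(prototype: list[float], sensitivities: list[float],
                      exon_flags: list[bool], background: float = 0.0) -> list[list[float]]:
    """Noise-free profile for each probe in one transcript.

    Exon probes repeat the prototype scaled by an assumed per-probe sensitivity;
    non-exon probes (the 'deletions') show only an assumed constant background.
    """
    rows = []
    for scale, is_exon in zip(sensitivities, exon_flags):
        if is_exon:
            rows.append([scale * x for x in prototype])
        else:
            rows.append([background] * len(prototype))
    return rows

# Toy prototype over three tissue pools; the middle probe is not an exon.
print(expected_profiles([1.0, 0.0, 2.0], [1.0, 0.5, 2.0], [True, False, True]))
```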

  39. The factor graph
      • t_1, t_2, …, t_n: Transcription start/stop indicator
      • r_1, r_2, …, r_n: Relative index of prototype
      • e_1, e_2, …, e_n: Exon versus non-exon indicator
      • s_1, s_2, …, s_n: Probe sensitivity & noise
      • x_1, x_2, …, x_n: Expression profile (genomic order)
      The prototype for x_i is x_{i+r_i}, r_i ∈ {-W,…,W}. We use W=100.
      ONLY 1 FREE PARAMETER: κ, probability of starting a transcript

  40. After expression data (x) is observed, the factor graph becomes a tree
      • t_1, t_2, …, t_n: Transcription start/stop indicator
      • r_1, r_2, …, r_n: Relative index of prototype
      • e_1, e_2, …, e_n: Exon versus non-exon indicator
      • s_1, s_2, …, s_n: Probe sensitivity & noise
      • x_1, x_2, …, x_n: Expression profile (genomic order)

  41. After expression data (x) is observed, the factor graph becomes a tree
      • t_1, t_2, …, t_n: Transcription start/stop indicator
      • r_1, r_2, …, r_n: Relative index of prototype
      • e_1, e_2, …, e_n: Exon versus non-exon indicator
      • s_1, s_2, …, s_n: Probe sensitivity & noise
      Computation: The max-product algorithm performs exact inference and learning.
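
Max-product is exact on trees, and a chain is the simplest tree. The sketch below runs max-product (Viterbi) in the log domain on a two-state chain as a much-reduced stand-in for the factor graph on the slides: it keeps only an inside/outside-transcript indicator per probe and takes per-probe log-likelihoods as given. The parameter `stay`, the function name, and the toy likelihoods are assumptions; only κ (here `kappa`) comes from the slides.

```python
import math

def max_product_chain(log_lik: list[list[float]], kappa: float, stay: float = 0.9) -> list[int]:
    """Most probable 0/1 state sequence on a chain via max-product in the log domain.

    log_lik[i][s] is the log-likelihood of probe i under state s
    (0 = outside a transcript, 1 = inside). kappa is the probability of
    starting a transcript; stay is an assumed probability of continuing one.
    """
    log_t = [[math.log(1 - kappa), math.log(kappa)],
             [math.log(1 - stay), math.log(stay)]]
    score = [math.log(1 - kappa) + log_lik[0][0], math.log(kappa) + log_lik[0][1]]
    back = []
    for obs in log_lik[1:]:
        pointers, new_score = [], []
        for s in (0, 1):
            # Best predecessor for state s (the "max" in max-product).
            prev = max((0, 1), key=lambda p: score[p] + log_t[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_t[prev][s] + obs[s])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

hi, lo = math.log(0.99), math.log(0.01)
evidence = [[hi, lo], [hi, lo], [lo, hi], [lo, hi], [lo, hi], [hi, lo]]
print(max_product_chain(evidence, kappa=0.1))
```

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long chains.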
