
CSE182-L12 Gene Finding - PowerPoint PPT Presentation



  1. CSE182-L12 Gene Finding

  2. Quiz • Who are these people, and what is the occasion?

  3. De novo Gene prediction: Summary • Various signals distinguish coding regions from non-coding regions. • HMMs are a reasonable model for gene structures, and provide a uniform method for combining various signals. • Further improvement may come from improved signal detection.

  4. How many genes do we have? [Figure panels: Nature; Science]

  5. Alternative splicing

  6. Comparative methods • Gene prediction is harder with alternative splicing. • One approach is to use comparative methods to detect genes. • Given a similar mRNA/protein (from another species, perhaps), can you find the best parse of a genomic sequence that matches that target sequence? • Yes, with a variant of alignment algorithms that penalizes introns separately from other gaps.
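A minimal sketch of such a variant, assuming a deliberately simple scoring scheme: a linear penalty for ordinary gaps, plus a flat, length-independent penalty for skipping any run of genomic bases as an intron. The function name `spliced_align_score` and all score values are hypothetical, and real spliced-alignment tools typically also model splice signals, which this sketch ignores. The only change from standard global alignment is the intron move: `best[j]` tracks the best score over all earlier genomic rows, so an intron of any length is considered in constant time per cell.

```python
def spliced_align_score(genome: str, mrna: str, match: int = 1, mismatch: int = -1,
                        gap: int = -2, intron: int = -5) -> int:
    """Best global alignment score of mrna against genome, where any run of
    genomic bases may be skipped as an 'intron' at a flat cost (hypothetical scheme)."""
    m = len(mrna)
    # prev[j]: best score aligning the genome prefix of the previous row with mrna[:j]
    prev = [j * gap for j in range(m + 1)]
    # best[j]: max over all earlier rows i' of D[i'][j]; an intron can jump from any of them
    best = prev[:]
    for i in range(1, len(genome) + 1):
        cur = [0] * (m + 1)
        for j in range(m + 1):
            options = [prev[j] + gap,        # genome[i-1] left unaligned (ordinary gap)
                       best[j] + intron]     # genome[i':i] skipped as a single intron
            if j > 0:
                s = match if genome[i - 1] == mrna[j - 1] else mismatch
                options.append(prev[j - 1] + s)   # align genome[i-1] with mrna[j-1]
                options.append(cur[j - 1] + gap)  # mrna[j-1] left unaligned
            cur[j] = max(options)
        best = [max(b, x) for b, x in zip(best, cur)]
        prev = cur
    return prev[m]

print(spliced_align_score("AAAGTTTTTTAGCCC", "AAACCC"))
```

With the default scores, the two exons AAA and CCC align exactly and the 9-base intron costs 5, for a score of 6 - 5 = 1; treating the same 9 bases as ordinary gaps would score 6 - 18 = -12.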

  7. Comparative gene finding tools • Procrustes/Sim4: mRNA vs. genomic • Genewise: proteins vs. genomic • CEM: genomic vs. genomic • Twinscan: combines comparative and de novo approaches.

  8. Course • Sequence Comparison (BLAST & other tools) • Protein Motifs: – Profiles/Regular Expressions/HMMs • Protein Sequence Identification via Mass Spec. • Discovering protein coding genes – Gene finding HMMs – DNA signals (splice signals)

  9. Genome Assembly

  10. DNA Sequencing • DNA is double-stranded. • The strands are separated, and a polymerase is used to synthesize a complementary copy of one strand. • Special chain-terminating bases stop this process early.

  11. • A break (chain termination) at T is shown here. • Measuring the fragment lengths using electrophoresis gives the position of each T. • The same can be done for every nucleotide. Color coding helps distinguish the different nucleotides.
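An idealized sketch of how fragment lengths spell out a sequence, assuming one terminated fragment for every prefix length and error-free length measurement (both simplifications of a real run). The helper names are hypothetical. Sorting all fragments by length, as electrophoresis does, and reporting each fragment's terminating base reads off the newly synthesized strand, which is complementary to the template.

```python
def fragments_from_sequence(seq: str) -> dict[str, list[int]]:
    """Hypothetical idealized run: one terminated fragment for every prefix
    length of the synthesized strand, grouped by the base that terminated it."""
    frags: dict[str, list[int]] = {b: [] for b in "ACGT"}
    for n, base in enumerate(seq, start=1):
        frags[base].append(n)
    return frags

def read_from_fragments(frags: dict[str, list[int]]) -> str:
    """Order all fragments by length and emit the terminating base of each."""
    by_length = {n: base for base, lengths in frags.items() for n in lengths}
    if sorted(by_length) != list(range(1, len(by_length) + 1)):
        raise ValueError("fragment lengths must cover 1..n with no gaps")
    return "".join(by_length[n] for n in range(1, len(by_length) + 1))

frags = fragments_from_sequence("GATTACA")
print(frags["T"])                  # lengths of the fragments that stopped at a T
print(read_from_fragments(frags))
```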

  12. • Automated detectors ‘read’ the terminating bases. • The signal degrades after roughly 1000 bases.

  13. Sequencing Genomes: Clone by Clone • Clones are constructed to span the entire length of the genome. • These clones are ordered and oriented correctly (Mapping) • Each clone is sequenced individually

  14. Shotgun Sequencing • Shotgun sequencing of clones was considered viable • However, researchers in 1999 proposed shotgunning the entire genome.

  15. Library • Insert pieces of the sequence into vectors and introduce them into bacteria. As the bacteria multiply, you will have many copies of the same clone.

  16. Sequencing

  17. Questions • Algorithmic: How do you put the genome back together from the pieces? This will be discussed in the next lecture. • Statistical: How many pieces do you need to sequence, etc.? – The answer to the statistical questions had already been given in the context of mapping, by Lander and Waterman.

  18. Lander-Waterman Statistics • $G$ = genome length • $L$ = clone length • $N$ = number of clones • $T$ = required overlap • $c$ = coverage = $LN/G$ • $\alpha = N/G$ • $\theta = T/L$ • $\sigma = 1 - \theta$

  19. LW statistics: questions • As the coverage $c$ increases, more and more of the genome is likely to be covered. Ideally, you want to see 1 island. • Q1: What is the expected number of islands? • Ans: $N e^{-c\sigma}$ • The number increases at first, then gradually decreases.
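A short sketch that evaluates the expected number of islands, $N e^{-c\sigma}$ with $c = LN/G$ and $\sigma = 1 - T/L$, as more clones are added to a hypothetical genome; the function name and the numbers are illustrative only. Writing $N e^{-c\sigma} = (G/L)\,c\,e^{-c\sigma}$ and using $\frac{d}{dc}\left(c e^{-c\sigma}\right) = e^{-c\sigma}(1 - c\sigma)$ shows why the count rises and then falls: it peaks at $c = 1/\sigma$.

```python
import math

def expected_islands(G: float, L: float, N: float, T: float) -> float:
    """Lander-Waterman expected number of islands, N * exp(-c * sigma)."""
    c = L * N / G          # coverage
    sigma = 1 - T / L      # sigma = 1 - theta, with theta = T / L
    return N * math.exp(-c * sigma)

G, L, T = 100_000, 1_000, 0   # hypothetical genome length, clone length, required overlap
for N in (10, 50, 100, 200, 500, 1000):
    print(N, L * N / G, round(expected_islands(G, L, N, T), 1))
```

With $T = 0$ (so $\sigma = 1$) the peak falls at $c = 1$, here $N = 100$ clones; requiring more overlap lowers $\sigma$ and pushes the peak to higher coverage.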

  20. Analysis: Expected Number of Islands • Computing the expected # of islands. • Let $X_i = 1$ if an island ends at position $i$, and $X_i = 0$ otherwise. • Number of islands $= \sum_i X_i$ • Expected # islands $= E\left(\sum_i X_i\right) = \sum_i E(X_i)$ (linearity of expectation)

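  21. Prob. of an island ending at i [Figure labels: L, i, T] • $E(X_i)$ = Prob(island ends at pos. $i$) • = Prob(a clone began at position $i-L+1$ AND no clone began in the next $L-T$ positions) • $E(X_i) = \alpha(1-\alpha)^{L-T} \approx \alpha e^{-c\sigma}$, since $(1-\alpha)^{L-T} \approx e^{-\alpha(L-T)}$ and $\alpha(L-T) = \frac{N}{G} L\sigma = c\sigma$ • Expected # islands $= \sum_i E(X_i) = G\alpha e^{-c\sigma} = N e^{-c\sigma}$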
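A minimal Monte Carlo sketch for checking the expected island count against $N e^{-c\sigma}$. It assumes a circular genome (to avoid edge effects at the ends) and $N$ distinct clone start positions drawn uniformly at random; the helper names and parameter values are hypothetical. Following the indicator argument, each clone whose next neighbour starts more than $L - T$ positions later ends an island, so counting those clones counts islands.

```python
import math
import random

def count_islands(starts, G, L, T):
    """Islands on a circular genome of length G: a clone ends an island when the
    next clone start is more than L - T positions away (overlap below T)."""
    s = sorted(starts)
    n = len(s)
    if n == 0:
        return 0
    ends = sum(1 for k in range(n) if (s[(k + 1) % n] - s[k]) % G > L - T)
    return max(ends, 1)  # no break at all means one island wrapping the circle

def simulate_mean_islands(G, L, N, T, trials=2000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts = rng.sample(range(G), N)   # N distinct clone start positions
        total += count_islands(starts, G, L, T)
    return total / trials

G, L, N, T = 10_000, 200, 100, 40          # hypothetical values: c = 2, sigma = 0.8
c, sigma = L * N / G, 1 - T / L
print(simulate_mean_islands(G, L, N, T), N * math.exp(-c * sigma))
```

The two printed numbers should agree to within a few percent; the small residual gap comes from discrete positions and a fixed $N$ versus the continuous approximation behind the formula.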

  22. LW statistics • Pr[Island contains exactly $j$ clones]? • Consider an island that has already begun. With probability $e^{-c\sigma}$, it will never be continued past its current last clone. Therefore • Pr[Island contains exactly $j$ clones] $= (1 - e^{-c\sigma})^{j-1} e^{-c\sigma}$ • Expected # $j$-clone islands $= N e^{-c\sigma} (1 - e^{-c\sigma})^{j-1} e^{-c\sigma}$
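A small sketch that tabulates the geometric island-size distribution and the expected number of $j$-clone islands; the function names and the values of $N$, $c$, $\sigma$ are hypothetical.

```python
import math

def island_size_pmf(j: int, c: float, sigma: float) -> float:
    """Pr[island contains exactly j clones] = (1 - e^{-c sigma})^(j-1) * e^{-c sigma}."""
    p = math.exp(-c * sigma)   # chance that an island is never continued past a clone
    return (1 - p) ** (j - 1) * p

def expected_j_clone_islands(j: int, N: float, c: float, sigma: float) -> float:
    """Expected number of j-clone islands: (expected # islands) * Pr[j clones]."""
    return N * math.exp(-c * sigma) * island_size_pmf(j, c, sigma)

N, c, sigma = 100, 2.0, 0.8   # hypothetical values
for j in range(1, 6):
    print(j, round(expected_j_clone_islands(j, N, c, sigma), 2))
```

Summed over all $j$, these counts recover the $N e^{-c\sigma}$ expected islands, and weighting each count by $j$ recovers all $N$ clones, which makes a quick consistency check on the formula.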

  23. Expected # of clones in an island: $e^{c\sigma}$. Why?
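One way to see why: after each clone, the island stops with probability $p = e^{-c\sigma}$, so the number of clones in an island is geometric with mean $1/p = e^{c\sigma}$. Equivalently, the $N$ clones are shared among $N e^{-c\sigma}$ expected islands, giving $N / (N e^{-c\sigma}) = e^{c\sigma}$ clones per island. The hypothetical helper below simply sums the series $\sum_j j\,\Pr[j \text{ clones}]$ numerically.

```python
import math

def mean_clones_per_island(c: float, sigma: float, jmax: int = 10_000) -> float:
    """Numerically sum j * Pr[island has j clones] for the geometric model.
    Truncating at jmax is accurate only while (1 - e^{-c*sigma})^jmax is negligible."""
    p = math.exp(-c * sigma)   # probability that an island stops after a given clone
    return sum(j * (1 - p) ** (j - 1) * p for j in range(1, jmax + 1))

c, sigma = 2.0, 0.8   # hypothetical coverage and sigma
print(mean_clones_per_island(c, sigma), math.exp(c * sigma))
```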

  24. Expected length of an island: $L\left[\left(\frac{e^{c\sigma} - 1}{c}\right) + (1 - \sigma)\right]$
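A Monte Carlo sketch for comparing simulated island lengths with $L\left[(e^{c\sigma} - 1)/c + (1 - \sigma)\right]$, under the same assumptions as a simple simulation would need: a circular genome, $N$ distinct uniformly random clone starts, and an island's length measured from its first clone start to its last clone end. All names and parameter values are hypothetical.

```python
import math
import random

def island_lengths(starts, G, L, T):
    """Lengths of all islands formed by clones of length L starting at `starts`
    on a circular genome of length G (circular to avoid edge effects).
    Consecutive clones share an island when their starts differ by at most L - T."""
    s = sorted(starts)
    n = len(s)
    gaps = [(s[(k + 1) % n] - s[k]) % G for k in range(n)]
    breaks = [k for k in range(n) if gaps[k] > L - T]
    if not breaks:
        raise ValueError("all clones chain into a single circular island")
    k0 = (breaks[0] + 1) % n      # clone k0 begins an island; treat it as position 0
    lengths, first, pos = [], 0, 0
    for step in range(1, n):
        g = gaps[(k0 + step - 1) % n]  # distance from the previous clone start
        pos += g
        if g > L - T:                  # the previous clone closed its island
            lengths.append(pos - g - first + L)
            first = pos
    lengths.append(pos - first + L)    # close the last island
    return lengths

def simulate_mean_island_length(G, L, N, T, trials=500, seed=1):
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        lengths += island_lengths(rng.sample(range(G), N), G, L, T)
    return sum(lengths) / len(lengths)

def lw_island_length(L, c, sigma):
    """Lander-Waterman expected island length: L[(e^{c sigma} - 1)/c + (1 - sigma)]."""
    return L * ((math.exp(c * sigma) - 1) / c + (1 - sigma))

G, L, N, T = 10_000, 200, 100, 40           # hypothetical values: c = 2, sigma = 0.8
c, sigma = L * N / G, 1 - T / L
print(simulate_mean_island_length(G, L, N, T), lw_island_length(L, c, sigma))
```

Agreement within a few percent is what to expect here; the formula comes from a continuous approximation, while the simulation uses discrete positions and a fixed number of clones.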
