
Outline CSEP 590A Summer 2006 Biological roles for RNA What is - PowerPoint PPT Presentation



  1. CSEP 590A Summer 2006, Lecture 8: RNA Secondary Structure Prediction
     Outline
       Biological roles for RNA
       What is “secondary structure”?
       How is it represented?
       Why is it important?
       Examples
       Approaches
     RNA Structure
       Primary Structure: Sequence
       Secondary Structure: Pairing
       Tertiary Structure: 3D shape
     RNA Pairing
       Watson-Crick Pairing: C - G ~ 3 kcal/mole; A - U ~ 2 kcal/mole
       “Wobble Pair”: G - U ~ 1 kcal/mole
       Non-canonical Pairs (esp. if modified)

  2. tRNA 3D Structure (figure)
     Ribosomes (figure)
     tRNA - Alt. Representations (two slides; figure labels: 3’ end, 5’ end, anticodon, anticodon loop)
     Figure credit: Watson, Gilman, Witkowski, & Zoller, 1992

  3. “Classical” RNAs
       tRNA - transfer RNA (~61 kinds, ~75 nt)
       rRNA - ribosomal RNA (~4 kinds, 120-5k nt)
       snRNA - small nuclear RNA (splicing: U1, etc., 60-300 nt)
       RNaseP - tRNA processing (~300 nt)
       RNase MRP - rRNA processing; mito. rep. (~225 nt)
       SRP - signal recognition particle; membrane targeting (~100-300 nt)
       SECIS - selenocysteine insertion element (~65 nt)
       6S - ? (~175 nt)
     Semi-classical RNAs (discovery in mid 90’s)
       tmRNA - resetting stalled ribosomes
       Telomerase - (200-400 nt)
       snoRNA - small nucleolar RNA (many varieties; 80-200 nt)
     Recent discoveries
       microRNAs
       riboswitches
       many ribozymes
       regulatory elements
       …
       Hundreds of families
         Rfam release 1, 1/2003: 25 families, 55k instances
         Rfam release 7, 3/2005: 503 families, 300k instances
     Why?
       RNAs fold, and function
       Nature uses what works

  4. Noncoding RNAs
       Breakthrough of the Year (figure)
     Example: Glycine Regulation
       How is glycine level regulated?
       Plausible answer: transcription factors (proteins) bind to DNA to turn nearby genes on or off
       (figure labels: glycine (g), TF, gce protein, DNA, glycine cleavage enzyme gene)
     Gene Regulation: The Met Repressor
       (figure labels: SAM, DNA, Protein; Alberts, et al, 3e.)
     The Glycine Riboswitch
       Actual answer (in many bacteria):
       (figure labels: glycine (g), gce protein, gce mRNA 5′ to 3′, DNA, glycine cleavage enzyme gene)
       Mandal et al. Science 2004

  5. 6S mimics an open promoter
       (figure labels: Bacillus/Clostridium, Actinobacteria, E. coli)
     Two SAM Riboswitches
     Citations on these slides: Barrick et al. RNA 2005; Trotochaud et al. NSMB 2005; Corbino et al., Genome Biol. 2005; Willkomm et al. NAR 2005; Alberts, et al, 3e.
     The Hammerhead Ribozyme
       Involved in “rolling circle replication” of viruses.
     Wanted
       Good structure prediction tools
       Good motif descriptions/models
       Good, fast search tools (“RNA BLAST”, etc.)
       Good, fast motif discovery tools (“RNA MEME”, etc.)
       Importance of structure makes last 3 hard

  6. Why is RNA hard to deal with?
       (figure: an example RNA sequence drawn as a folded structure)
       A: Structure often more important than sequence
     Task 1: Structure Prediction
     RNA Pairing
       Watson-Crick Pairing: C - G ~ 3 kcal/mole; A - U ~ 2 kcal/mole
       “Wobble Pair”: G - U ~ 1 kcal/mole
       Non-canonical Pairs (esp. if modified)
     Definitions
       Sequence: 5’ r_1 r_2 r_3 ... r_n 3’, with each r_i in {A, C, G, U}
       A Secondary Structure is a set of pairs i•j such that:
         i < j-4 (no sharp turns), and
         if i•j and i’•j’ are two different pairs with i ≤ i’, then either
           j < i’ (the 2nd pair follows the 1st), or
           i < i’ < j’ < j (the 2nd pair is nested within the 1st);
         that is, no “pseudoknots.”
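The definition of a secondary structure can be checked mechanically. The following is a minimal Python sketch, assuming 1-based positions as on the slide; the function name `is_secondary_structure` and the constant `MIN_LOOP` are illustrative names, not part of the lecture.

```python
from itertools import combinations

MIN_LOOP = 4  # "no sharp turns": a pair i.j needs i < j - 4


def is_secondary_structure(n, pairs):
    """Return True if `pairs` (1-based (i, j) tuples) is a pseudoknot-free
    secondary structure on a sequence of length n."""
    pairs = sorted(set(pairs))
    for i, j in pairs:
        if not (1 <= i and j <= n and i < j - MIN_LOOP):
            return False  # out of range, or a sharp turn
    for (i, j), (ip, jp) in combinations(pairs, 2):
        # sorting guarantees i <= ip; the second pair must either
        # follow the first (j < ip) or be nested within it
        if not (j < ip or i < ip < jp < j):
            return False  # shared base or crossing pairs (pseudoknot)
    return True


print(is_secondary_structure(20, [(1, 20), (2, 19), (3, 9)]))  # nested
print(is_secondary_structure(20, [(1, 10), (5, 15)]))          # crossing
```

The pairwise test also rejects a base that appears in two pairs, since such pairs neither follow one another nor nest strictly.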

  7. Nested / Precedes / Pseudoknot (figures)
     A Pseudoknot
       (ASCII drawing, flattened in extraction: A-C / \ 3’ - A-G-G-C-U U U-C-C-G-A-G-G-G | C-C-C - 5’ \ / U-C-U-C)
     Approaches to Structure Prediction
       Maximum Pairing
         + works on single sequences
         + simple
         - too inaccurate
       Minimum Energy
         + works on single sequences
         - ignores pseudoknots
         - only finds “optimal” fold
       Partition Function
         + finds all folds
         - ignores pseudoknots
     Approaches, II
       Comparative sequence analysis
         + handles all pairings (incl. pseudoknots)
         - requires several (many?) aligned, appropriately diverged sequences
       Stochastic Context-free Grammars
         Roughly combines min energy & comparative, but no pseudoknots
       Physical experiments (x-ray crystallography, NMR)

  8. Nussinov: Max Pairing
       B(i,j) = # pairs in optimal pairing of r_i ... r_j
       B(i,j) = 0 for all i, j with i ≥ j-4; otherwise
       B(i,j) = max of:
         B(i,j-1)
         max { B(i,k-1) + 1 + B(k+1,j-1) | i ≤ k < j-4 and r_k-r_j may pair }
       Time: O(n^3)
     “Optimal pairing of r_i ... r_j”
       Two possibilities
         j Unpaired: Find best pairing of r_i ... r_{j-1}
         j Paired (with some k): Find best r_i ... r_{k-1} + best r_{k+1} ... r_{j-1} plus 1
       Why is it slow?
       Why do pseudoknots matter?
       (figure labels: i, k-1, k, k+1, j-1, j)
     Pair-based Energy Minimization
       E(i,j) = energy of pairs in optimal pairing of r_i ... r_j
       E(i,j) = 0 for all i, j with i ≥ j-4 (no pair is possible, so the empty pairing is optimal); otherwise
       E(i,j) = min of:
         E(i,j-1)
         min { E(i,k-1) + e(r_k, r_j) + E(k+1,j-1) | i ≤ k < j-4 }
       where e(r_k, r_j) is the energy of the j-k pair
       Time: O(n^3)
     Loop-based Energy Minimization
       Detailed experiments show it’s more accurate to model based on loops, rather than just pairs
       Loop types (numbered in the figure)
         1 Hairpin loop
         2 Stack
         3 Bulge
         4 Interior loop
         5 Multiloop
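The max-pairing recurrence can be written directly as a table-filling loop. This is a minimal Python sketch using 0-based indices; the function name `nussinov` and the `CAN_PAIR` set (Watson-Crick plus the G-U wobble pair) are illustrative choices rather than anything fixed by the slides.

```python
# Pairs allowed to form: Watson-Crick plus G-U wobble (an assumption here).
CAN_PAIR = {("A", "U"), ("U", "A"), ("C", "G"),
            ("G", "C"), ("G", "U"), ("U", "G")}


def nussinov(seq):
    """Return B(0, n-1): the maximum number of nested pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # B[i][j] stays 0 whenever i >= j - 4 (too short to hold a pair).
    B = [[0] * n for _ in range(n)]
    for span in range(5, n):              # span = j - i, shortest first
        for i in range(n - span):
            j = i + span
            best = B[i][j - 1]            # case 1: j unpaired
            for k in range(i, j - 4):     # case 2: j paired with k
                if (seq[k], seq[j]) in CAN_PAIR:
                    left = B[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + B[k + 1][j - 1])
            B[i][j] = best
    return B[0][n - 1]


print(nussinov("GGGAAAACCC"))
```

The three nested loops over `span`, `i`, and `k` are where the O(n^3) running time comes from.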

  9. Base Pairs and Stacking (figure)
     The Double Helix (figure labels: cytosine, uracil, thymine, guanine, adenine)
     Loop Examples (figure)
     Zuker: Loop-based Energy, I
       W(i,j) = energy of optimal pairing of r_i ... r_j
       V(i,j) = as above, but forcing pair i•j
       For all i, j with i ≥ j-4: W(i,j) = 0 (the unfolded stretch) and V(i,j) = ∞ (the pair i•j cannot form)
       W(i,j) = min( W(i,j-1), min { W(i,k-1) + V(k,j) | i ≤ k < j-4 } )

  10. Zuker: Loop-based Energy, II
       V(i,j) = min( eh(i,j), es(i,j) + V(i+1,j-1), VBI(i,j), VM(i,j) )
         (the four terms are, in order: hairpin, stack, bulge/interior, multi-loop)
       VM(i,j) = min { W(i,k) + W(k+1,j) | i < k < j }
       VBI(i,j) = min { ebi(i,j,i’,j’) + V(i’,j’) | i < i’ < j’ < j & i’-i+j-j’ > 2 }
       Time: O(n^4)
       O(n^3) possible if ebi(.) is “nice”
     Suboptimal Energy
       There are always alternate folds with near-optimal energies.
       Thermodynamics: populations of identical molecules will exist in different folds; individual molecules even flicker among different folds
       Mod to Zuker’s algorithm finds subopt folds
       McCaskill: more elaborate dyn. prog. algorithm calculates the “partition function,” which defines the probability distribution over all these states.
     Example of suboptimal folding (figure)
       Black dots: pairs in opt fold
       Colored dots: pairs in folds 2-5% worse than optimal fold
     Two competing secondary structures for the Leptomonas collosoma spliced leader mRNA. (figure)
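To make the W/V recurrences concrete, here is a toy Python sketch. Everything numeric in it is an assumption for illustration: the flat `HAIRPIN` and `MULTILOOP` costs, the `STACK` table (which simply reuses the per-pair values quoted earlier), and the `ebi` formula are not the measured loop parameters that real folding tools use. It also makes two labeled choices of its own: spans too short to pair get W = 0 and V = ∞, and the multiloop term splits the interior of the closing pair, W(i+1,k) + W(k+1,j-1), plus a flat penalty.

```python
import math

INF = math.inf
MIN_LOOP = 4  # a pair i.j needs i < j - 4

# Hypothetical toy energies in kcal/mole (assumptions, not measured values).
STACK = {("C", "G"): -3.0, ("G", "C"): -3.0,
         ("A", "U"): -2.0, ("U", "A"): -2.0,
         ("G", "U"): -1.0, ("U", "G"): -1.0}
HAIRPIN = 4.0    # assumed flat cost of closing a hairpin loop
MULTILOOP = 4.0  # assumed flat cost of closing a multiloop


def ebi(i, j, ip, jp):
    """Assumed bulge/interior cost: grows with the unpaired bases."""
    return 1.0 + 0.5 * ((ip - i - 1) + (j - jp - 1))


def zuker_toy(seq):
    """Return W(0, n-1), the minimum toy energy (0.0 means unfolded)."""
    n = len(seq)
    if n == 0:
        return 0.0
    W = [[0.0] * n for _ in range(n)]  # best energy of r_i..r_j
    V = [[INF] * n for _ in range(n)]  # same, but forcing pair i.j
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            if (seq[i], seq[j]) in STACK:
                stack = STACK[(seq[i], seq[j])] + V[i + 1][j - 1]
                vbi = min((ebi(i, j, ip, jp) + V[ip][jp]
                           for ip in range(i + 1, j)
                           for jp in range(ip + 1, j)
                           if (ip - i) + (j - jp) > 2), default=INF)
                vm = MULTILOOP + min((W[i + 1][k] + W[k + 1][j - 1]
                                      for k in range(i + 1, j - 1)),
                                     default=INF)
                V[i][j] = min(HAIRPIN, stack, vbi, vm)
            best = W[i][j - 1]                 # j unpaired
            for k in range(i, j - MIN_LOOP):   # j paired with k
                left = W[i][k - 1] if k > i else 0.0
                best = min(best, left + V[k][j])
            W[i][j] = best
    return W[0][n - 1]


print(zuker_toy("GGGAAAACCC"))
```

The double loop over (i’, j’) inside the double loop over (i, j) is what gives the O(n^4) bound; restricting `ebi` to a "nice" form is what allows the O(n^3) variant mentioned on the slide.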

  11. Accuracy
       Latest estimates suggest ~50-75% of base pairs predicted correctly in sequences of up to ~300 nt
       Definitely useful, but obviously imperfect
     Task 2: Motif Description
     How to model an RNA “Motif”?
       Conceptually, start with a profile HMM:
         from a multiple alignment, estimate nucleotide/insert/delete preferences for each position
         given a new seq, estimate likelihood that it could be generated by the model, & align it to the model
       (figure labels: mostly G, ins, all G, del)
     How to model an RNA “Motif”? (continued)
       Add “column pairs” and pair emission probabilities for base-paired regions
       (figure: <<<<<<< … >>>>>>> marks the paired columns)
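The idea of per-column preferences plus pair emission probabilities can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the alignment is a made-up ungapped toy, insert/delete states are omitted entirely, and the names `column_probs`, `pair_probs`, `log_likelihood`, and the add-one pseudocount are illustrative choices, not the lecture's model.

```python
import math
from collections import Counter

ALPHABET = "ACGU"


def column_probs(alignment, col, pseudo=1.0):
    """Nucleotide probabilities for one unpaired column."""
    counts = Counter(row[col] for row in alignment)
    total = len(alignment) + pseudo * len(ALPHABET)
    return {a: (counts[a] + pseudo) / total for a in ALPHABET}


def pair_probs(alignment, c1, c2, pseudo=1.0):
    """Joint (pair emission) probabilities for two paired columns."""
    counts = Counter((row[c1], row[c2]) for row in alignment)
    total = len(alignment) + pseudo * len(ALPHABET) ** 2
    return {(a, b): (counts[(a, b)] + pseudo) / total
            for a in ALPHABET for b in ALPHABET}


def log_likelihood(seq, alignment, paired_columns):
    """Log-probability of seq: paired columns scored jointly, rest singly."""
    paired = {c for pair in paired_columns for c in pair}
    ll = 0.0
    for c1, c2 in paired_columns:
        ll += math.log(pair_probs(alignment, c1, c2)[(seq[c1], seq[c2])])
    for c in range(len(seq)):
        if c not in paired:
            ll += math.log(column_probs(alignment, c)[seq[c]])
    return ll


# Hypothetical toy alignment: columns 0/7 and 1/6 covary to stay paired.
alignment = ["GCAAAAGC", "CGAAAACG", "AUGAAAAU"]
paired_columns = [(0, 7), (1, 6)]
print(log_likelihood("GCAAAAGC", alignment, paired_columns))
print(log_likelihood("GCAAAAGG", alignment, paired_columns))
```

Scoring the paired columns jointly is the point: a sequence that keeps a compatible pair (G-C) scores higher than one that breaks it (G-G), even though each individual column looks equally variable.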

  12. RNA Motif Models
       “Covariance Models” (Eddy & Durbin 1994)
         aka profile stochastic context-free grammars
         aka hidden Markov models on steroids
       Model position-specific nucleotide preferences and base-pair preferences
       Pro: accurate
       Con: model building hard, search sloooow
     Summary
       RNA has important roles beyond mRNA
         Many unexpected recent discoveries
       Structure is critical to function
         True of proteins, too, but they’re easier to find, due, e.g., to codon structure, which RNAs lack
       RNA secondary structure can be predicted (to useful accuracy) by dynamic programming
       RNA “motifs” (seq + 2-ary struct) well-captured by “covariance models”
