Prediction and Analysis of RNA Secondary Structures
Peter Schuster
Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien
RNA Secondary Structures in Dijon, Dijon, 24.–26.06.2002
Three-dimensional structure of phenylalanyl-transfer-RNA
RNA Secondary Structures and their Properties
RNA secondary structures are listings of Watson-Crick and GU wobble base pairs, which are free of knots and pseudoknots. Secondary structures are folding intermediates in the formation of full three-dimensional structures.
D. Thirumalai, N. Lee, S.A. Woodson, and D.K. Klimov. Annu. Rev. Phys. Chem. 52:751-762 (2001)
Definition and formation of the secondary structure of phenylalanyl-tRNA
Sequence (5'-end to 3'-end): GCGGAU UUA GCUC AGDDGGGA GAGC M CCAGA CUGAAYA UCUGG AGMUC CUGUG TPCGAUC CACAG A AUUCGC ACCA
[Figure: sequence, secondary structure (positions 10 to 70 marked), and symbolic notation]
[Figure: Circle representation of tRNA^phe, 5'-end to 3'-end, positions 10 to 70 marked]
[Figure: Tree representation of tRNA^phe, with a virtual root, 5'-end to 3'-end]
[Figure: Mountain representation of tRNA^phe, 5'-end to 3'-end, positions 10 to 76 marked]
Mountain representation used in structure prediction of medium size RNA molecules
Mountain representation used in structure prediction of large RNA molecules
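The mountain representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the structure is assumed to be given in dot-bracket form (the function names `pair_table` and `mountain_heights` are hypothetical and not taken from the slides), and the height convention (step up at an opening bracket, step down at a closing bracket) is one common choice.

```python
def pair_table(structure):
    """Return a list giving the pairing partner of each position, or -1 if unpaired."""
    partner = [-1] * len(structure)
    stack = []  # positions of opening brackets still waiting for a partner
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return partner


def mountain_heights(structure):
    """Height after each position: +1 for '(', -1 for ')', unchanged for '.'."""
    pair_table(structure)  # validates the brackets before computing heights
    heights = []
    height = 0
    for ch in structure:
        if ch == "(":
            height += 1
        elif ch == ")":
            height -= 1
        heights.append(height)
    return heights


if __name__ == "__main__":
    print(pair_table("((..))"))
    print(mountain_heights("((..))"))
```

Plateaus in the height profile correspond to unpaired stretches, and the profile returns to zero at the end of every closed substructure.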
Different notions of RNA structure
[Figure: free energy level diagram of structures S0 to S10 with three panels: minimum free energy structure (T = 0 K, t → ∞), suboptimal structures (T > 0 K, t → ∞), and kinetic structures (T > 0 K, t finite)]
RNA Minimum Free Energy Structures
Efficient algorithms based on dynamic programming are available for the computation of secondary structures for given sequences. Inverse folding algorithms compute sequences for given secondary structures.
M. Zuker and P. Stiegler. Nucleic Acids Res. 9:133-148 (1981)
Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.)
I.L. Hofacker, W. Fontana, P.F. Stadler, L.S. Bonhoeffer, M. Tacker, and P. Schuster. Mh. Chem. 125:167-188 (1994)
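To give a flavour of the dynamic programming idea, here is a minimal Python sketch. It is deliberately a simplification and not the Zuker-Stiegler algorithm or the Vienna RNA Package code: instead of minimizing free energy with measured loop parameters, it maximizes the number of base pairs (a Nussinov-style recursion). The function name `max_pairs_fold` and the minimum hairpin loop size of 3 are assumptions made for this illustration.

```python
# Watson-Crick and GU wobble pairs, as in the definition of secondary structures.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs_fold(seq, min_loop=3):
    """Return (number of pairs, dot-bracket string) maximizing the base pair count."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs in seq[i..j] (inclusive bounds)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(max_pairs_fold("GGGAAACCC"))
```

The table has O(n^2) entries and each entry scans O(n) split points, which is why this family of algorithms runs in polynomial time. A realistic energy model replaces the pair count by loop-dependent free energy contributions.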
Inverse folding
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
Trial sequences (1st to 5th) tested against the minimum free energy criterion:
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Criterion of Minimum Free Energy
[Figure: the five sequences listed above, shown in sequence space and related to shape space by the minimum free energy criterion]
Point mutations as moves in sequence space
....GC CA UC.... to ....GC GA UC.... : $d_H = 1$
....GC CA UC.... to ....GC CU UC.... : $d_H = 1$
....GC CA UC.... to ....GC GU UC.... : $d_H = 2$
Hamming distance
$S_1$: CGTCGTTACAATTTA G GTTATGTGCGAATTC A CAAATT G AAAA T ACAAGAG.....
$S_2$: CGTCGTTACAATTTA A GTTATGTGCGAATTC C CAAATT A AAAA C ACAAGAG.....
$d_H(S_1, S_2) = 4$
(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$
The Hamming distance induces a metric in sequence space
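The Hamming distance is simple enough to state as code. The following Python sketch (the function name `hamming_distance` is chosen here for illustration) counts the positions at which two sequences of equal length differ and spot-checks the three metric properties on the short fragments used in the point mutation slide.

```python
def hamming_distance(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance is defined for sequences of equal length")
    return sum(a != b for a, b in zip(s1, s2))


if __name__ == "__main__":
    a, b, c = "GCCAUC", "GCGAUC", "GCGUUC"
    print(hamming_distance(a, b))  # one point mutation
    print(hamming_distance(a, c))  # two point mutations
    # (i) identity, (ii) symmetry, (iii) triangle inequality
    print(hamming_distance(a, a) == 0)
    print(hamming_distance(a, b) == hamming_distance(b, a))
    print(hamming_distance(a, c) <= hamming_distance(a, b) + hamming_distance(b, c))
```

Checking the properties on a few examples does not prove them, but it makes the definition concrete: each point mutation is a move of length one in sequence space.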
Sequence space of binary sequences of chain length n = 5
Binary sequences are encoded by their decimal equivalents: C = 0 and G = 1, for example, "0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.
Mutant class 0: 0
Mutant class 1: 1 2 4 8 16
Mutant class 2: 3 5 6 9 10 12 17 18 20 24
Mutant class 3: 7 11 13 14 19 21 22 25 26 28
Mutant class 4: 15 23 27 29 30
Mutant class 5: 31
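The mutant classes of the binary sequence space can be generated directly from the decimal encoding. The Python sketch below assumes the encoding stated on the slide (C = 0, G = 1) and takes the all-C sequence as the reference; the function names `mutant_classes` and `decode` are illustrative only.

```python
def mutant_classes(n):
    """Group the integers 0 .. 2**n - 1 by Hamming distance from 0 (number of 1 bits)."""
    classes = {k: [] for k in range(n + 1)}
    for value in range(2 ** n):
        classes[bin(value).count("1")].append(value)
    return classes


def decode(value, n, zero="C", one="G"):
    """Translate a decimal code into its binary sequence over the letters C and G."""
    bits = format(value, f"0{n}b")
    return "".join(one if bit == "1" else zero for bit in bits)


if __name__ == "__main__":
    for k, members in mutant_classes(5).items():
        print(k, members)
    print(decode(14, 5), decode(29, 5))
```

Class k contains all sequences that differ from the reference in exactly k positions, so the class sizes follow the binomial coefficients.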
Mapping from sequence space into phenotype space and into fitness values
Sequence space to phenotype space: $S_k = \psi(I_{\cdot})$
Phenotype space to non-negative numbers: $f_k = f(S_k)$
Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length n. The number of sequences, $N = 4^n$, becomes very large with increasing chain length, which makes exhaustive computation prohibitive. Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
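The random graph construction can be sketched as follows. This is a toy version under explicit assumptions: nodes are drawn uniformly without replacement until a target size is reached, two nodes are joined when their Hamming distance is 1, and the chain length is kept small enough to enumerate. The function names (`index_to_sequence`, `random_neutral_network`, `components`) and the fixed random seed are illustrative choices, not part of the original work.

```python
import random


def index_to_sequence(index, n, alphabet):
    """Decode an integer in 0 .. len(alphabet)**n - 1 into a sequence of length n."""
    letters = []
    for _ in range(n):
        index, remainder = divmod(index, len(alphabet))
        letters.append(alphabet[remainder])
    return "".join(letters)


def random_neutral_network(n, size, alphabet="AUGC", seed=0):
    """Insert `size` distinct random nodes into the sequence space of chain length n."""
    total = len(alphabet) ** n
    if size > total:
        raise ValueError("network cannot be larger than sequence space")
    rng = random.Random(seed)
    return {index_to_sequence(i, n, alphabet) for i in rng.sample(range(total), size)}


def components(nodes, alphabet="AUGC"):
    """Connected components, largest first; edges join nodes at Hamming distance 1."""
    unvisited = set(nodes)
    result = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        frontier = [start]
        while frontier:
            seq = frontier.pop()
            for pos in range(len(seq)):
                for letter in alphabet:
                    if letter == seq[pos]:
                        continue
                    neighbour = seq[:pos] + letter + seq[pos + 1:]
                    if neighbour in unvisited:
                        unvisited.remove(neighbour)
                        component.add(neighbour)
                        frontier.append(neighbour)
        result.append(component)
    return sorted(result, key=len, reverse=True)


if __name__ == "__main__":
    network = random_neutral_network(n=6, size=1500)
    sizes = [len(c) for c in components(network)]
    print(len(sizes), sizes[:5])
```

Varying `size` relative to the size of sequence space changes the fraction of neutral neighbours and therefore whether the sampled network is likely to fall into one component or several.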
Random graph approach to neutral networks
[Figure sequence: sketch of sequence space at steps 0, 1, 2, 3, 4, 5, 10, 15, 25, 50, 75, and 100 of the random insertion of nodes]
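Mean degree of neutrality and connectivity of neutral networks
Neutral network (pre-image of structure $S_k$): $G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$
Mean degree of neutrality: $\bar{\lambda}_k = \frac{\sum_{j \in G_k} \lambda_j^{(k)}}{|G_k|}$, with the example $\lambda_j = 12/27$
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$
Alphabet size $\kappa$: for AUGC, $\kappa = 4$
κ    λ_cr
2    0.5
3    0.4226
4    0.3700
$\bar{\lambda}_k > \lambda_{cr}$: network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$: network $G_k$ is not connected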
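The tabulated threshold values follow from the formula $\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$ and can be checked numerically. The short Python sketch below does this; the function names are illustrative, and the list of local neutrality values passed to `mean_neutrality` is hypothetical sample input rather than data from the slides.

```python
def connectivity_threshold(kappa):
    """Critical mean degree of neutrality for an alphabet of size kappa."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def mean_neutrality(local_values):
    """Average of the local degrees of neutrality over the nodes of a network."""
    return sum(local_values) / len(local_values)


def is_connected(mean_lambda, kappa):
    """Random graph prediction: connected if the mean neutrality exceeds the threshold."""
    return mean_lambda > connectivity_threshold(kappa)


if __name__ == "__main__":
    for kappa in (2, 3, 4):
        print(kappa, round(connectivity_threshold(kappa), 4))
    # Hypothetical network in which every node has 12 neutral neighbours out of 27.
    print(is_connected(mean_neutrality([12 / 27] * 10), kappa=4))
```

The threshold decreases with alphabet size, so networks over the four-letter alphabet AUGC connect at a lower mean neutrality than networks over a binary alphabet.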
A multi-component neutral network with a giant component
A connected neutral network
Suboptimal RNA Secondary Structures Michael Zuker. On finding all suboptimal foldings of an RNA molecule . Science 244 (1989), 48-52 Stefan Wuchty, Walter Fontana, Ivo L. Hofacker, Peter Schuster. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49 (1999), 145-165
Example of a small RNA molecule: n = 30
Sequence: AAAGGGCACAGGGUGAUUUCAAUAAUUUUA
Total number of structures including all suboptimal conformations, stable and unstable (with $\Delta G_0 > 0$): #conformations = 1 416 661
[Figure: minimum free energy structure, 5'-end to 3'-end]
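Counting all conformations does not require enumerating them: a dynamic programming recursion over subsequences suffices. The Python sketch below is a generic counting scheme under assumptions chosen for this illustration (AU, GC, and GU pairs, a minimum hairpin loop of 3 unpaired bases, isolated base pairs allowed). The rules behind the number quoted on the slide are not stated here, so the sketch is not claimed to reproduce that figure.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq, min_loop=3):
    """Count secondary structures (including the open chain) compatible with seq."""
    n = len(seq)
    # c[i][j] = number of structures on the half-open slice seq[i:j];
    # slices too short to hold a pair keep the initial value 1 (open chain only).
    c = [[1] * (n + 1) for _ in range(n + 1)]
    for length in range(min_loop + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = c[i][j - 1]  # last base of the slice is unpaired
            for k in range(i, j - 1 - min_loop):  # last base pairs with position k
                if (seq[k], seq[j - 1]) in PAIRS:
                    total += c[i][k] * c[k + 1][j - 1]
            c[i][j] = total
    return c[0][n]


if __name__ == "__main__":
    print(count_structures("GGAAACC"))
    print(count_structures("AAAGGGCACAGGGUGAUUUCAAUAAUUUUA"))
```

The same recursion with Boltzmann factors in place of the constant 1 per pair is the starting point for partition function calculations.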
Density of states of suboptimal structures of the RNA molecule with the sequence: AAAGGGCACAGGGUGAUUUCAAUAAUUUUA
Partition Function of RNA Secondary Structures
John S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29 (1990), 1105-1119
Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, Peter Schuster. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie 125 (1994), 167-188
Example of a small RNA molecule: n = 28
Sequence: UUGGAGUACACAACCUGUACACUCUUUC
Example of a small RNA molecule with two low-lying suboptimal conformations which contribute substantially to the partition function
[Figure: secondary structure, 5'-end to 3'-end]
"Dot plot" of the minimum free energy structure (lower triangle) and the partition function (upper triangle) of a small RNA molecule (n = 28) with low energy suboptimal configurations
Sequence (both axes): UUGGAGUACACAACCUGUACACUCUUUC
Minimum free energy configuration: $\Delta G_0 = -5.39$ kcal/mole
First suboptimal configuration: $\Delta E_{0 \to 1} = 0.50$ kcal/mole
Second suboptimal configuration: $\Delta E_{0 \to 2} = 0.55$ kcal/mole
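The statement that low-lying suboptimal configurations contribute substantially to the partition function can be made concrete with Boltzmann weights. The Python sketch below uses the two energy gaps given for the n = 28 molecule and rests on two simplifying assumptions that are not from the slides: a temperature of 37 °C (310.15 K), and a truncated ensemble containing only the three listed configurations, whereas the full partition function sums over all structures.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal / (mol K)


def boltzmann_populations(delta_e, temperature_k=310.15):
    """Relative populations of states with energies delta_e (kcal/mole) above the ground state."""
    rt = R_KCAL * temperature_k
    weights = [math.exp(-e / rt) for e in delta_e]
    z = sum(weights)  # partition function of the truncated ensemble
    return [w / z for w in weights]


if __name__ == "__main__":
    # Ground state, first and second suboptimal configuration (energy gaps from the slide).
    for label, p in zip(("mfe", "first", "second"), boltzmann_populations([0.0, 0.50, 0.55])):
        print(f"{label}: {p:.3f}")
```

Because both gaps are smaller than RT at this temperature, each suboptimal configuration carries a weight of the same order as the minimum free energy structure, which is what makes the upper triangle of the dot plot differ visibly from the lower one.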
Phenylalanyl-tRNA as an example for the computation of the partition function
Sequence (5'-end to 3'-end): GCGGAU UUA GCUC AGDDGGGA GAGC M CCAGA CUGAAYA UCUGG AGMUC CUGUG TPCGAUC CACAG A AUUCGC ACCA
[Figure: sequence, secondary structure (positions 10 to 70 marked), and symbolic notation]
tRNA^phe without modified bases
First suboptimal configuration: $\Delta E_{0 \to 1} = 0.43$ kcal/mole
[Figure: dot plot, 5'-end to 3'-end]