Kinetic Folding and Evolution of RNA
Peter Schuster
Institut für Theoretische Chemie, Universität Wien, Austria and The Santa Fe Institute, Santa Fe, New Mexico, USA
Biophysik Kolloquium, Humboldt-Universität Berlin, 05.07.2006
Recent review article: Peter Schuster, Prediction of RNA secondary structures: From theory to models and real molecules. Rep. Prog. Phys. 69:1419-1477, 2006. Web page for further information: http://www.tbi.univie.ac.at/~pks
Definition of RNA structure
[Figure: chemical formula of the RNA sugar-phosphate backbone running from the 5'-end to the 3'-end, with nucleobases $N_k$ = A, U, G, C attached to the ribose units ($N_1$ to $N_4$ shown) and Na+ counterions at the phosphate groups]
Sequence, 5'-end to 3'-end: GCGGAU UUA GCUC AGUUGGGA GAGC CCAGA G CUGAAGA UCUGG AGGUC CUGUG UUCGAUC CACAG A AUUCGC ACCA
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
Number of sequences: $N = 4^n$; number of structures: $N_S < 3^n$
Criterion: minimum free energy (mfe)
Rules: _ ( _ ) _ ∈ { AU, CG, GC, GU, UA, UG }
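The parentheses notation and the base pairing rules can be illustrated with a short Python sketch. The function names `parse_pairs` and `is_compatible` and the example sequences are illustrative assumptions, not part of the talk; the sketch only checks that a structure string is well formed and that every pair falls in the allowed set.

```python
# Allowed base pairs as listed on the slide.
ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def parse_pairs(structure):
    """Return the sorted list of base pairs (i, j) encoded in a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)                # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))  # close the innermost open pair
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every base pair of the structure is an allowed pair in the sequence."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS
               for i, j in parse_pairs(structure))


# Hypothetical example: a three-pair hairpin with a loop of three bases.
print(parse_pairs("(((...)))"))
print(is_compatible("GGGAAACCC", "(((...)))"))
print(is_compatible("GGGAAACCA", "(((...)))"))
```

Compatibility is necessary but not sufficient: whether a compatible structure is actually the mfe structure depends on the energy model, which this sketch does not include.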
Conventional definition of RNA secondary structures
1. Sequence space and shape space
2. Neutral networks
3. Evolutionary optimization of structure
4. Suboptimal structures and kinetic folding
5. Comparison of kinetic folding and evolution
6. How to model evolution of kinetic folding?
1. Sequence space and shape space
Sequence space
$I_1$: CGTCGTTACAATTTA[G]GTTATGTGCGAATTC[A]CAAATT[G]AAAA[T]ACAAGAG.....
$I_2$: CGTCGTTACAATTTA[A]GTTATGTGCGAATTC[C]CAAATT[A]AAAA[C]ACAAGAG.....
(bracketed letters mark the four positions at which the two sequences differ)
Hamming distance: $d_H(I_1, I_2) = 4$
(i) $d_H(I_1, I_1) = 0$
(ii) $d_H(I_1, I_2) = d_H(I_2, I_1)$
(iii) $d_H(I_1, I_3) \le d_H(I_1, I_2) + d_H(I_2, I_3)$
The Hamming distance between sequences induces a metric in sequence space
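A minimal Python sketch of the Hamming distance and of the three metric properties listed on the slide. The short sequences `i1`, `i2`, `i3` are made-up examples chosen for readability, not sequences from the talk.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical example sequences of chain length 8.
i1 = "ACGUACGU"
i2 = "ACGAACCU"
i3 = "UCGAACCA"

print(hamming_distance(i1, i1))  # (i) distance to itself
print(hamming_distance(i1, i2), hamming_distance(i2, i1))  # (ii) symmetry
# (iii) triangle inequality: the direct distance never exceeds the detour via i2
print(hamming_distance(i1, i3)
      <= hamming_distance(i1, i2) + hamming_distance(i2, i3))
```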
Every point in sequence space is equivalent.
Sequence space of binary sequences with chain length n = 5
Sequence space and structure space
Hamming distance: $d_H(S_1, S_2) = 4$
(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$
The Hamming distance between structures in parentheses notation forms a metric in structure space
Two measures of distance in shape space: Hamming distance between structures, $d_H(S_i, S_j)$, and base pair distance, $d_P(S_i, S_j)$
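The two distance measures can be compared in a short Python sketch. The base pair distance is taken here as the number of base pairs present in exactly one of the two structures; that definition is an assumption, since the slide does not spell it out. Function names and example structures are illustrative.

```python
def pair_set(structure):
    """Set of base pairs (i, j) of a well-formed dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.add((stack.pop(), i))
    return pairs


def structure_hamming_distance(s1, s2):
    """Hamming distance between two parentheses strings of equal length."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


def base_pair_distance(s1, s2):
    """Assumed definition: size of the symmetric difference of the pair sets."""
    return len(pair_set(s1) ^ pair_set(s2))


# Hypothetical example: the same hairpin shifted by one position.
s1 = "(((...)))."
s2 = ".(((...)))"
print(structure_hamming_distance(s1, s2))
print(base_pair_distance(s1, s2))
```

In this example the shifted hairpin differs in only four symbols of the string but shares no base pair with the original, so the two measures rank structural similarity differently.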
Structures are not equivalent in structure space.
Sketch of structure space
$S_{n+1} = S_n + \sum_{j=1}^{n-1} S_j \cdot S_{n-j-1}$
Counting the numbers of structures of chain length n → n+1
M.S. Waterman, T.F. Smith (1978) Math. Biosciences 42:257-266
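The counting recursion for the step n → n+1 can be evaluated directly. In the sketch below, the initial values $S_0 = S_1 = S_2 = 1$ are an assumption (they are not given on the slide); they correspond to counting the open chain as one structure and requiring at least one unpaired base inside a hairpin.

```python
def count_structures(n_max):
    """Return [S_0, ..., S_n_max] from S_{n+1} = S_n + sum_{j=1}^{n-1} S_j * S_{n-j-1}."""
    s = [1, 1, 1]  # assumed initial values S_0 = S_1 = S_2 = 1
    for n in range(2, n_max):
        # s[n] holds S_n; the appended value is S_{n+1}
        s.append(s[n] + sum(s[j] * s[n - j - 1] for j in range(1, n)))
    return s[: n_max + 1]


counts = count_structures(8)
print(counts)
# The number of structures stays far below the number of sequences, 4**n.
print([4 ** n for n in range(9)])
```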
[Diagram: RNA sequence → RNA folding → RNA structure of minimal free energy]
RNA folding: biophysical chemistry (thermodynamics and kinetics), empirical parameters
Applications: structural biology, spectroscopy of biomolecules, understanding molecular function
Sequence, structure, and design
[Figure: free energy levels of the conformations of one sequence, with the minimum free energy structure $S_0^{(h)}$ and the suboptimal conformations $S_1^{(h)}$ to $S_9^{(h)}$]
The minimum free energy structures on a discrete space of conformations
Restrictions on physically acceptable mfe-structures: $\lambda \ge 3$ and $\sigma \ge 2$
Size restriction of elements:
(i) hairpin loop: $n_{loop} \ge \lambda$
(ii) stack: $n_{stack} \ge \sigma$
$S_{m+1} = \Xi_{m+1} + \Phi_{m-1}$
$\Xi_{m+1} = S_m + \sum_{k=\lambda+2\sigma-2}^{m-2} \Phi_k \cdot S_{m-k-1}$
$\Phi_{m+1} = \sum_{k=\sigma-1}^{\lfloor (m-\lambda+1)/2 \rfloor} \Xi_{m-2k+1}$
$S_n$: number of structures of a sequence with chain length n
Recursion formula for the number of physically acceptable stable structures
I.L. Hofacker, P. Schuster, P.F. Stadler. 1998. Discr. Appl. Math. 89:177-207
2. Neutral networks
[Diagram: RNA sequence → RNA folding → RNA structure of minimal free energy → inverse folding → RNA sequence]
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Inverse folding of RNA: iterative determination of a sequence for the given secondary structure (inverse folding algorithm); biotechnology, design of biomolecules with predefined structures and functions
Sequence, structure, and design
Inverse folding
Minimum free energy criterion
Trial sequences (labelled 1st to 5th trial on the slide):
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
Space of genotypes: $I = \{I_1, I_2, I_3, I_4, \ldots, I_N\}$; Hamming metric
Space of phenotypes: $S = \{S_1, S_2, S_3, S_4, \ldots, S_M\}$; metric (not required)
$N \gg M$
$\psi(I_j) = S_k$
$G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$
A mapping and its inversion
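The inversion of the genotype-phenotype mapping can be sketched by grouping all sequences according to the structure they map to. The function `toy_fold` below is a hypothetical stand-in for real mfe folding, used only to keep the example self-contained: it pairs bases greedily from the two ends inward while the pair is allowed and at least three bases remain in between. It is not the folding algorithm used in the talk.

```python
from collections import defaultdict
from itertools import product

ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def toy_fold(sequence, min_loop=3):
    """Hypothetical stand-in for mfe folding: greedy stem from the ends inward."""
    structure = ["."] * len(sequence)
    i, j = 0, len(sequence) - 1
    while j - i - 1 >= min_loop and sequence[i] + sequence[j] in ALLOWED_PAIRS:
        structure[i], structure[j] = "(", ")"
        i, j = i + 1, j - 1
    return "".join(structure)


def preimages(sequences, fold):
    """Group sequences by structure: each value is the preimage of one structure."""
    neutral_sets = defaultdict(list)
    for sequence in sequences:
        neutral_sets[fold(sequence)].append(sequence)
    return dict(neutral_sets)


# All 4**5 = 1024 sequences of chain length 5 over the AUGC alphabet.
all_sequences = ["".join(letters) for letters in product("ACGU", repeat=5)]
neutral_sets = preimages(all_sequences, toy_fold)
for structure, members in sorted(neutral_sets.items()):
    print(structure, len(members))
```

Even in this toy setting many sequences share each structure, which is the many-to-one character of the mapping that the inversion makes explicit.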
Degree of neutrality of neutral networks and the connectivity threshold
A multi-component neutral network formed by a rare structure: $\lambda < \lambda_{cr}$
A connected neutral network formed by a common structure: $\lambda > \lambda_{cr}$
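For a single sequence, the degree of neutrality can be estimated as the fraction of one-point mutants that keep the structure; averaging this over a neutral network gives the network value compared with the connectivity threshold. The sketch below assumes that reading and again uses a hypothetical `toy_fold` in place of real mfe folding, so the numbers are illustrative only.

```python
ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def toy_fold(sequence, min_loop=3):
    """Hypothetical stand-in for mfe folding: greedy stem from the ends inward."""
    structure = ["."] * len(sequence)
    i, j = 0, len(sequence) - 1
    while j - i - 1 >= min_loop and sequence[i] + sequence[j] in ALLOWED_PAIRS:
        structure[i], structure[j] = "(", ")"
        i, j = i + 1, j - 1
    return "".join(structure)


def degree_of_neutrality(sequence, fold, alphabet="ACGU"):
    """Fraction of one-point mutants that fold into the same structure."""
    reference = fold(sequence)
    neutral = total = 0
    for position, old in enumerate(sequence):
        for new in alphabet:
            if new == old:
                continue
            mutant = sequence[:position] + new + sequence[position + 1:]
            total += 1
            neutral += fold(mutant) == reference
    return neutral / total


# Hypothetical example: mutations in the loop are neutral, most stem mutations are not.
print(degree_of_neutrality("GAAAC", toy_fold))
```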
[Figure: four cloverleaf structures, panels A, B, C, D, each drawn from the 5'-end to the 3'-end with positions numbered in steps of 10]
RNA clover-leaf secondary structures of sequences with chain length n = 76
Degree of neutrality of cloverleaf RNA secondary structures over different alphabets
Reference for postulation and in silico verification of neutral networks
Properties of RNA sequence to secondary structure mapping
1. More sequences than structures
2. Few common versus many rare structures
   [Figure: n = 100, stem-loop structures; n = 30] RNA secondary structures and Zipf's law
3. Shape space covering of common structures
4. Neutral networks of common structures are connected
RNA 9:1456-1463, 2003. Evidence for neutral networks and shape space covering
Evidence for neutral networks and intersection of aptamer functions
tRNA clover leaves with increasing stack lengths (1 → 4), n = 76

Alphabet   Clover leaf 1   Clover leaf 2   Clover leaf 3   Clover leaf 4
AU         --              --              --              0.07
AUG        --              0.22            0.21            0.20
AUGC       0.28            0.28            0.29            0.31
UGC        0.26            0.26            0.25            0.25
GC         0.05            0.06            0.06            0.07

AUGC, n = 100

                            Degree of neutrality λ   Mean length of path h
Unconstrained fold          0.33                     > 95
Cofold with one sequence    0.32                     75
Cofold with two sequences   0.18                     40

Degree of neutrality and lengths of neutral path
3. Evolutionary optimization of structure
Evolution in silico W. Fontana, P. Schuster, Science 280 (1998), 1451-1455
Replication rate constant: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$, with $\Delta d_S^{(k)} = d_H(S_k, S_\tau)$
Selection constraint: population size, N = # RNA molecules, is controlled by the flow: $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
Mutation rate: p = 0.001 / site × replication
The flowreactor as a device for studies of evolution in vitro and in silico
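A small Python sketch of how the replication rate constant depends on the Hamming distance between a structure and the target structure. The parameter values `gamma = 1.0` and `alpha = 1.0` and the example structures are hypothetical placeholders. The second function assumes independent errors at each site, so that a chain of length n is copied without error with probability $(1-p)^n$.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def replication_rate(structure, target, gamma=1.0, alpha=1.0):
    """f_k = gamma / (alpha + d_H(S_k, S_target)); gamma and alpha are placeholders."""
    return gamma / (alpha + hamming_distance(structure, target))


def error_free_copy_probability(chain_length, p=0.001):
    """Probability of an exact copy, assuming independent per-site errors."""
    return (1.0 - p) ** chain_length


target = "((...))"  # hypothetical target structure
print(replication_rate(target, target))      # largest rate, reached at the target
print(replication_rate("(.....)", target))   # smaller rate two symbols away
print(error_free_copy_probability(76))       # chain length of the tRNA examples
```

Structures closer to the target replicate faster, which is what drives the population toward the target in the flow reactor.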