1. Controlled experiments on evolution and RNA replication
2. Evolution in silico and optimization of RNA structures
3. Sequence-structure maps, neutral networks, and intersections
4. Design of RNA molecules with predefined properties
Optimization of RNA molecules in silico

W. Fontana, P. Schuster, A computer model of evolutionary optimization. Biophysical Chemistry 26 (1987), 123-147
W. Fontana, W. Schnabl, P. Schuster, Physical aspects of evolutionary optimization and adaptation. Phys. Rev. A 40 (1989), 3301-3321
M.A. Huynen, W. Fontana, P.F. Stadler, Smoothness within ruggedness. The role of neutrality in adaptation. Proc. Natl. Acad. Sci. USA 93 (1996), 397-401
W. Fontana, P. Schuster, Continuity in evolution. On the nature of transitions. Science 280 (1998), 1451-1455
W. Fontana, P. Schuster, Shaping space. The possible and the attainable in RNA genotype-phenotype mapping. J. Theor. Biol. 194 (1998), 491-515
B.M.R. Stadler, P.F. Stadler, G.P. Wagner, W. Fontana, The topology of the possible: Formal spaces underlying patterns of evolutionary change. J. Theor. Biol. 213 (2001), 241-274
The flow reactor as a device for studies of evolution in vitro and in silico

[Figure: flow reactor with stock solution and reaction mixture]

Replication rate constant: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$, with $\Delta d_S^{(k)} = d_H(S_k, S_\tau)$

Selection constraint: the number of RNA molecules is controlled by the flow, $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
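Reading the replication rate constant on this slide as f_k = γ / (α + Δd_S^(k)), where Δd_S^(k) is the Hamming distance between the structure S_k and the target structure, the fitness of a structure can be sketched as follows. The default values of `gamma` and `alpha` and the example structures are illustrative assumptions, not values from the slides.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def replication_rate(structure: str, target: str,
                     gamma: float = 1.0, alpha: float = 1.0) -> float:
    """f_k = gamma / (alpha + d_H(S_k, S_target)).

    gamma and alpha are assumed illustrative constants; alpha > 0 keeps
    the rate finite when the target structure is reached.
    """
    return gamma / (alpha + hamming(structure, target))


# Hypothetical structures in parentheses notation.
target = "((((....))))"
for s in ("((((....))))", "(((......)))", "............"):
    print(s, hamming(s, target), replication_rate(s, target))
```

The rate grows as the structure distance to the target shrinks, so selection in the reactor favors sequences whose structures are closer to the target.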
[Figure: randomly chosen initial structure; phenylalanyl-tRNA as target structure]
[Figure: copying of a plus strand via the minus strand with three kinds of errors: point mutation, insertion (GAA UCCCG AA becomes GAA UCCCGUCCCG AA), and deletion (GAAUCC CGA A becomes GAAUCCA)]
Mutations in nucleic acids represent the mechanism for variation of genotypes.
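The three mutation classes named on this slide can be sketched as simple string operations. The function names and the use of a seeded random generator are assumptions made for illustration; the insertion and deletion examples reuse the sequences shown on the slide.

```python
import random

ALPHABET = "AUGC"


def point_mutation(seq: str, pos: int, rng: random.Random) -> str:
    """Replace the nucleotide at pos by a different, randomly chosen one."""
    choices = [n for n in ALPHABET if n != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def insertion(seq: str, pos: int, fragment: str) -> str:
    """Insert fragment in front of position pos."""
    return seq[:pos] + fragment + seq[pos:]


def deletion(seq: str, pos: int, length: int = 1) -> str:
    """Remove length nucleotides starting at position pos."""
    return seq[:pos] + seq[pos + length:]


seq = "GAAUCCCGAA"
print(point_mutation(seq, 3, random.Random(0)))  # one position changed
print(insertion(seq, 8, "UCCCG"))                # duplication of UCCCG
print(deletion(seq, 6, 3))                       # loss of CGA
```

Point mutations keep the chain length constant, which is why the sequence space discussed later is defined at constant chain length.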
[Figure: concentration plotted over sequence space, showing the master sequence, the mutant cloud, and "off-the-cloud" mutations]
The molecular quasispecies in sequence space
[Figure: cycle of mutation ($Q$), genotype-phenotype mapping ($S_k = \psi(I_k)$), and evaluation of the phenotype ($f_k = f(S_k)$) for genotypes $I_1, I_2, \dots, I_n, I_{n+1}$ with fitness values $f_1, f_2, \dots, f_n, f_{n+1}$]
Evolutionary dynamics including molecular phenotypes
[Figure: average structure distance from initial structure, $50 - \Delta d_S$, versus time (arbitrary units, 0 to 1250); evolutionary trajectory]
In silico optimization in the flow reactor: Trajectory (biologists' view)
[Figure: average structure distance to target, $\Delta d_S$, versus time (arbitrary units, 0 to 1250); evolutionary trajectory]
In silico optimization in the flow reactor: Trajectory (physicists' view)
[Figure series: average structure distance to target, $\Delta d_S$, versus time near the end of the evolutionary trajectory, with the numbers of the relay steps (36 to 44) marked. Successive slides reconstruct the relay series backwards from the final structure:]
Relay step 44: end conformation of optimization
Reconstruction of the last step 43 → 44
Reconstruction of the last-but-one step 42 → 43 (→ 44)
Reconstruction of step 41 → 42 (→ 43 → 44)
Reconstruction of step 40 → 41 (→ 42 → 43 → 44)
Reconstruction of the relay series 39, 40, 41, 42, 43, 44 (labels: evolutionary process, reconstruction)
[Figure: change in RNA sequences during the final five relay steps 39 → 44. Legend: transition-inducing point mutations; neutral point mutations]
[Figure: average structure distance to target, $\Delta d_S$, versus time (arbitrary units, 0 to 1250), with relay steps marked along the evolutionary trajectory]
In silico optimization in the flow reactor: Trajectory and relay steps
[Figure: average structure distance to target, $\Delta d_S$, versus time (arbitrary units, 0 to 500), with relay steps 08 to 14 and the label "Uninterrupted presence"; 28 neutral point mutations occur during a long quasi-stationary epoch. Legend: transition-inducing point mutations; neutral point mutations]
Neutral genotype evolution during phenotypic stasis
Variation in genotype space during optimization of phenotypes: mean Hamming distance within the population and drift velocity of the population center in sequence space.
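The two quantities named on this slide can be sketched for a small population. The slides do not define the population center, so the position-wise consensus sequence is used here as an assumption, and the drift velocity is taken as the Hamming distance between successive centers divided by the elapsed time.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(population: list[str]) -> float:
    """Mean Hamming distance over all unordered pairs in the population."""
    pairs = list(combinations(population, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def consensus(population: list[str]) -> str:
    """Most frequent nucleotide per position (assumed population center)."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*population))


def drift_velocity(center_before: str, center_after: str, dt: float) -> float:
    """Distance moved by the population center per unit time."""
    return hamming(center_before, center_after) / dt


# Hypothetical four-nucleotide population at two time points.
pop_t0 = ["GGCA", "GGCU", "GACA"]
pop_t1 = ["GACA", "GACU", "GACA"]
print(mean_pairwise_hamming(pop_t0))
print(consensus(pop_t0), consensus(pop_t1))
print(drift_velocity(consensus(pop_t0), consensus(pop_t1), dt=2.0))
```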
[Figure: average structure distance to target, $\Delta d_S$, versus time (arbitrary units, 0 to 1250), with relay steps and main transitions marked along the evolutionary trajectory]
In silico optimization in the flow reactor: Main transitions
[Figure: structures labeled 00, 09, 31, and 44]
Three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure, corresponding to three main transitions.
[Figure: classes of structural transitions: shift, roll-over, flip, double flip, closing of constrained stacks, multi-loop]
Main or discontinuous transitions: structural innovations, which occur rarely on single point mutations.
[Movies: optimization trajectories over the AUGC and the GC alphabet]
[Figure: histogram of frequency versus runtime of trajectories (0 to 5000)]
Statistics of the lengths of trajectories from initial structure to target (AUGC sequences)
[Figure: histograms of frequency versus number of transitions (0 to 100), for all transitions and for main transitions]
Statistics of the numbers of transitions from initial structure to target (AUGC sequences)
Statistics of trajectories and relay series (mean values of log-normal distributions)

Alphabet   Runtime   Transitions   Main transitions   No. of runs
AUGC         385.6          22.5               12.6          1017
GUC          448.9          30.5               16.5           611
GC          2188.3          40.0               20.6           107
Inverse folding of RNA secondary structures
The idea of the inverse folding algorithm is to search for sequences that form a given RNA secondary structure under the minimum free energy criterion.
[Figure series: a structure shown first alone, then with compatible sequences, and finally with an incompatible sequence]
A sequence is compatible with a structure if every base pair of the structure can be formed by the nucleotides at the paired positions.
Single nucleotides: A, U, G, C
Base pairs: AU, UA, GC, CG, GU, UG
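The compatibility rule on these slides (each base pair of the structure must be one of AU, UA, GC, CG, GU, UG) can be checked mechanically. This is a minimal sketch that assumes structures are written in parentheses (dot-bracket) notation; the function names and the example sequences are illustrative.

```python
ALLOWED_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_list(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(seq: str, structure: str) -> bool:
    """True if every pair of the structure is an allowed base pair in seq."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return all(seq[i] + seq[j] in ALLOWED_PAIRS
               for i, j in pair_list(structure))


print(is_compatible("GGGAAAUCC", "(((...)))"))  # GU, GC, GC pairs
print(is_compatible("GGGAAAACC", "(((...)))"))  # contains a GA pair
```

Compatibility says only that the structure can be formed; whether it is the minimum free energy structure of the sequence is a separate question that requires folding.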
[Figure: approach to the target structure $S_k$ in the inverse folding algorithm, showing initial trial sequences, intermediate compatible sequences, the stop sequence of an unsuccessful trial, and the target sequence that forms the target structure $S_k$]
[Figure: inverse folding of RNA secondary structures under the minimum free energy criterion, 1st to 5th trial]
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
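The search idea can be illustrated with a deliberately simplified sketch. Two substitutions are made here and both are assumptions, not the method behind the slides: `toy_fold` replaces minimum free energy folding by base-pair maximization (a Nussinov-style recursion), and `inverse_fold` is a plain adaptive walk that accepts point mutations when the structure distance to the target does not increase. A single call corresponds to one trial and may stop without reaching the target.

```python
import random

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def toy_fold(seq: str, min_loop: int = 3) -> str:
    """Toy stand-in for MFE folding: maximize the number of base pairs."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if seq[i] + seq[k] in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + right)
            best[i][j] = score
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:  # trace back one optimal structure
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if seq[i] + seq[k] in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == 1 + best[i + 1][k - 1] + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return "".join(structure)


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def inverse_fold(target: str, rng: random.Random, max_steps: int = 2000):
    """One trial: adaptive walk toward a sequence whose toy fold is target."""
    seq = [rng.choice("AUGC") for _ in target]
    dist = hamming(toy_fold("".join(seq)), target)
    for _ in range(max_steps):
        if dist == 0:
            break
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice([c for c in "AUGC" if c != old])
        new_dist = hamming(toy_fold("".join(seq)), target)
        if new_dist <= dist:
            dist = new_dist   # accept neutral or improving mutation
        else:
            seq[pos] = old    # reject and restore
    return "".join(seq), dist


sequence, distance = inverse_fold("((((....))))", random.Random(1))
print(sequence, toy_fold(sequence), distance)
```

Accepting neutral mutations (equal distance) lets the walk drift along sequences with the same structure distance instead of getting stuck at the first plateau.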
Hamming distance between sequences

$I_1$: CGTCGTTACAATTTA G GTTATGTGCGAATTC A CAAATT G AAAA T ACAAGAG.....
$I_2$: CGTCGTTACAATTTA A GTTATGTGCGAATTC C CAAATT A AAAA C ACAAGAG.....

$d_H(I_1, I_2) = 4$

(i) $d_H(I_1, I_1) = 0$
(ii) $d_H(I_1, I_2) = d_H(I_2, I_1)$
(iii) $d_H(I_1, I_3) \le d_H(I_1, I_2) + d_H(I_2, I_3)$

The Hamming distance between sequences induces a metric in sequence space.
Hamming distance between structures

$d_H(S_1, S_2) = 4$

(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$

The Hamming distance between structures in parentheses notation forms a metric in structure space.
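The three metric properties can be checked numerically on a small space. The sketch below tests them exhaustively for all AUGC sequences of length 2 and then applies the same distance to two structures in parentheses notation. The two example structures are hypothetical and were chosen so that their distance is 4.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_metric(points: list[str]) -> bool:
    """Check identity, symmetry, and the triangle inequality on all triples."""
    for a, b, c in product(points, repeat=3):
        if hamming(a, a) != 0:
            return False
        if hamming(a, b) != hamming(b, a):
            return False
        if hamming(a, c) > hamming(a, b) + hamming(b, c):
            return False
    return True


sequences = ["".join(p) for p in product("AUGC", repeat=2)]
print(is_metric(sequences))

s1 = "((((....))))"
s2 = "((......)).."
print(hamming(s1, s2))
```

The exhaustive check is only feasible for very short chains; it illustrates the axioms rather than proving them.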
RNA sequences as well as RNA secondary structures can be visualized as objects in metric spaces. At constant chain length the sequence space is a (generalized) hypercube.

The mapping from RNA sequences into RNA secondary structures is many-to-one. Hence, it is redundant and not invertible.

RNA sequences that are mapped onto the same RNA secondary structure are neutral with respect to structure. The pre-images of structures in sequence space are neutral networks. They can be represented by graphs whose edges connect sequences of Hamming distance $d_H = 1$.
[Figure: mapping from sequence space into structure space, $S_k = \psi(I_\cdot)$, and from structure space into the real numbers, $f_k = f(S_k)$]
Mapping from sequence space into structure space and into function
The pre-image of the structure $S_k$ in sequence space is the neutral network $G_k$.
Neutral networks are sets of sequences forming the same structure. $G_k$ is the pre-image of the structure $S_k$ in sequence space:

$G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$

The set is converted into a graph by connecting all sequences of Hamming distance one.

Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. The number of sequences, $N = 4^n$, becomes very large with increasing chain length $n$, so exhaustive folding soon becomes prohibitive for numerical computation.

Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
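Exhaustive enumeration of a pre-image and its conversion into a graph can be sketched for a very short chain. The mapping `toy_psi` below is a hypothetical stand-in for folding (it closes one pair between the two ends if they can pair) and is used only so that the example runs without a folding program; any function from sequences to structures can be passed in its place.

```python
from itertools import product

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def toy_psi(seq: str) -> str:
    """Hypothetical stand-in for folding a chain of length 5."""
    return "(...)" if seq[0] + seq[-1] in PAIRS else "....."


def neutral_network(psi, target: str, n: int, alphabet: str = "AUGC"):
    """Pre-image of target under psi as a graph with Hamming-1 edges."""
    members = {"".join(s) for s in product(alphabet, repeat=n)
               if psi("".join(s)) == target}
    graph = {}
    for seq in members:
        neighbors = []
        for pos in range(n):
            for base in alphabet:
                if base != seq[pos]:
                    cand = seq[:pos] + base + seq[pos + 1:]
                    if cand in members:
                        neighbors.append(cand)
        graph[seq] = neighbors
    return graph


def component_sizes(graph) -> list[int]:
    """Sizes of the connected components, largest first."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


graph = neutral_network(toy_psi, "(...)", n=5)
print(len(graph), component_sizes(graph))
```

For this toy mapping the network splits into two components, because sequences ending in AU, GU, or GC cannot reach those ending in UA, UG, or CG by a single point mutation without breaking the pair.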
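Mean degree of neutrality and connectivity of neutral networks

$G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$

Mean degree of neutrality: $\bar{\lambda}_k = \sum_{j \in G_k} \lambda_j^{(k)} / |G_k|$; example for a single sequence: $\lambda_j = 12/27 = 0.444$

Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$

Alphabet size $\kappa$ (AUGC: $\kappa = 4$):

$\kappa$   $\lambda_{cr}$   Alphabets
2          0.5              GC, AU
3          0.423            GUC, AUG
4          0.370            AUGC

$\bar{\lambda}_k > \lambda_{cr}$: network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$: network $G_k$ is not connected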
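Reading the threshold on this slide as λ_cr = 1 − κ^(−1/(κ−1)), the tabulated values can be reproduced directly. The helper `local_neutrality` interprets the 12/27 example as the fraction of neutral one-error neighbors of a single sequence; that interpretation and the function names are assumptions.

```python
def connectivity_threshold(kappa: int) -> float:
    """lambda_cr = 1 - kappa ** (-1 / (kappa - 1)) for alphabet size kappa."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def local_neutrality(neutral_neighbors: int, total_neighbors: int) -> float:
    """Fraction of one-error neighbors that fold into the same structure."""
    return neutral_neighbors / total_neighbors


def predicted_connected(mean_neutrality: float, kappa: int) -> bool:
    """Random-graph prediction: connected above the threshold."""
    return mean_neutrality > connectivity_threshold(kappa)


for kappa, alphabets in [(2, "GC, AU"), (3, "GUC, AUG"), (4, "AUGC")]:
    print(kappa, round(connectivity_threshold(kappa), 3), alphabets)

print(round(local_neutrality(12, 27), 3))
print(predicted_connected(local_neutrality(12, 27), kappa=4))
```

The threshold decreases with growing alphabet size, so a given degree of neutrality is more likely to yield a connected network over AUGC than over a two-letter alphabet.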
A connected neutral network
[Figure: a multi-component neutral network with a giant component]
Degree of neutrality of cloverleaf RNA secondary structures over different alphabets

[Figure: cloverleaf secondary structures]

Alphabet   Degree of neutrality $\bar{\lambda}$
AU         -                -                0.073 ± 0.032
AUG        -                0.217 ± 0.051    0.201 ± 0.056
AUGC       0.275 ± 0.064    0.279 ± 0.063    0.313 ± 0.058
UGC        0.263 ± 0.071    0.257 ± 0.070    0.250 ± 0.064
GC         0.052 ± 0.033    0.057 ± 0.034    0.068 ± 0.034
[Figure: tRNA clover leaf structure]

Stable tRNA clover leaf structures built from binary, GC-only, sequences exist. The corresponding sequences are found through inverse folding. Optimization by mutation and selection in the flow reactor turned out to be a hard problem.

The neutral network of the tRNA clover leaf in GC sequence space is not connected, whereas the corresponding neutral network in AUGC sequence space is close to the connectivity threshold, $\lambda_{cr}$. Here, both inverse folding and optimization in the flow reactor are much more effective than with GC sequences.

The hardness of the structure optimization problem depends on the connectivity of neutral networks.
Reference for postulation and in silico verification of neutral networks
[Figure: structure $S_k$, its neutral network $G_k$, and its compatible set $C_k$, with $G_k \subset C_k$]
The compatible set $C_k$ of a structure $S_k$ consists of all sequences that form $S_k$ either as their minimum free energy structure (the neutral network $G_k$) or as one of their suboptimal structures.
[Figure: structure $S_0$ and structure $S_1$]
The intersection of two compatible sets is always non-empty: $C_0 \cap C_1 \neq \emptyset$
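One way to see why the intersection cannot be empty is constructive. If every position is linked to its pairing partners in both structures, each position has at most two partners, and any cycle alternates between pairs of the first and the second structure and therefore has even length. The positions can then be labeled alternately G and C. The following sketch builds such a sequence; it assumes parentheses notation, ignores energetic considerations, and uses illustrative function names and example structures.

```python
from collections import deque


def pair_list(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def is_compatible(seq: str, structure: str) -> bool:
    allowed = {"AU", "UA", "GC", "CG", "GU", "UG"}
    return all(seq[i] + seq[j] in allowed for i, j in pair_list(structure))


def intersection_sequence(s0: str, s1: str) -> str:
    """Return a GC-only sequence compatible with both structures."""
    if len(s0) != len(s1):
        raise ValueError("structures must have equal length")
    n = len(s0)
    partners = [[] for _ in range(n)]
    for structure in (s0, s1):
        for i, j in pair_list(structure):
            partners[i].append(j)
            partners[j].append(i)
    seq = [None] * n
    for start in range(n):
        if seq[start] is not None:
            continue
        seq[start] = "G"
        queue = deque([start])
        while queue:  # alternate G and C along each chain or cycle
            i = queue.popleft()
            for j in partners[i]:
                if seq[j] is None:
                    seq[j] = "C" if seq[i] == "G" else "G"
                    queue.append(j)
    return "".join(seq)


s0 = "((((....))))"
s1 = "((...))(...)"
seq = intersection_sequence(s0, s1)
print(seq, is_compatible(seq, s0), is_compatible(seq, s1))
```

The constructed sequence is compatible with both structures; whether it actually folds into either of them as its minimum free energy structure is not addressed by this sketch.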
Reference for the definition of the intersection and the proof of the intersection theorem
[Figure: one sequence drawn in two conformations: the minimum free energy conformation $S_0$ and the suboptimal conformation $S_1$]
A sequence at the intersection of two neutral networks is compatible with both structures.
[Figure: barrier tree for two long-lived structures: the minimum free energy structure $S_0$ (basin '0') and the metastable structure $S_1$ (basin '1'); conformations numbered 2 to 49, numeric labels 3.10, 3.30, 5.10, 5.90, 7.40]
Kinetics of RNA refolding between a long-lived metastable conformation and the minimum free energy structure
A ribozyme switch
E.A. Schultes, D.P. Bartel, Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis-δ-virus (B)
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
Evolutionary design of RNA molecules

A.D. Ellington, J.W. Szostak, In vitro selection of RNA molecules that bind specific ligands. Nature 346 (1990), 818-822
C. Tuerk, L. Gold, SELEX - Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (1990), 505-510
D.P. Bartel, J.W. Szostak, Isolation of new ribozymes from a large pool of random sequences. Science 261 (1993), 1411-1418
R.D. Jenison, S.C. Gill, A. Pardi, B. Polisky, High-resolution molecular discrimination by RNA. Science 263 (1994), 1425-1429
Y. Wang, R.R. Rando, Specific binding of aminoglycoside antibiotics to RNA. Chemistry & Biology 2 (1995), 281-290
L. Jiang, A.K. Suri, R. Fiala, D.J. Patel, Saccharide-RNA recognition in an aminoglycoside antibiotic-RNA aptamer complex. Chemistry & Biology 4 (1997), 35-50
Aptamer binding to aminoglycoside antibiotics: Structure of ligands
Y. Wang, R.R. Rando, Specific binding of aminoglycoside antibiotics to RNA. Chemistry & Biology 2 (1995), 281-290
[Figure: tobramycin and the RNA aptamer, drawn as an open chain and as a folded secondary structure]
Formation of secondary structure of the tobramycin-binding RNA aptamer
L. Jiang, A.K. Suri, R. Fiala, D.J. Patel, Saccharide-RNA recognition in an aminoglycoside antibiotic-RNA aptamer complex. Chemistry & Biology 4 (1997), 35-50
The three-dimensional structure of the tobramycin aptamer complex L. Jiang, A. K. Suri, R. Fiala, D. J. Patel, Chemistry & Biology 4 :35-50 (1997)
Questions that cannot be answered by current experimental techniques:
(i) How does the distribution of genotypes change with time?
(ii) Which intermediates are passed during an optimization experiment?
(iii) Why does optimization occur in steps?
(iv) What happens at the edges of the quasi-stationary epochs?
(v) How much do individual trajectories differ?
(vi) Is there a proper statistics for evolutionary optimization?
[Slide build: questions (i) to (iv) are marked one after another on successive slides]