Mutation-selection equation: $[I_i] = x_i \ge 0$, $f_i > 0$, $Q_{ij} \ge 0$

$$\frac{dx_i}{dt} = \sum_{j=1}^{n} f_j Q_{ji}\, x_j - x_i\, \phi, \quad i = 1, 2, \dots, n; \qquad \sum_{i=1}^{n} x_i = 1; \qquad \phi = \sum_{j=1}^{n} f_j x_j$$

Solutions are obtained after an integrating-factor transformation by means of an eigenvalue problem:

$$x_i(t) = \frac{\sum_{k=0}^{n-1} l_{ik}\, c_k(0) \exp(\lambda_k t)}{\sum_{j=1}^{n} \sum_{k=0}^{n-1} l_{jk}\, c_k(0) \exp(\lambda_k t)}, \quad i = 1, 2, \dots, n; \qquad c_k(0) = \sum_{i=1}^{n} h_{ki}\, x_i(0)$$

$$W \doteq \{ w_{ij} = f_j Q_{ji};\ i,j = 1, 2, \dots, n \}; \quad L = \{ l_{ij};\ i,j = 1, 2, \dots, n \}; \quad L^{-1} = H = \{ h_{ij};\ i,j = 1, 2, \dots, n \}$$

$$L^{-1} \cdot W \cdot L = \Lambda = \{ \lambda_k;\ k = 0, 1, \dots, n-1 \}$$
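A minimal numerical sketch of the eigenvalue solution follows. It assumes an illustrative three-variant system with fitness values f = (2, 1, 1) and a symmetric mutation matrix with error rate p = 0.05; none of these numbers come from the source. The code forms the matrix with entries f_j Q_ji, diagonalizes it with NumPy, and evaluates the closed-form x_i(t). It then cross-checks the result against a direct Runge-Kutta integration of the differential equation.

```python
import numpy as np

def mutation_matrix(n, p):
    # Illustrative symmetric choice: Q[j, i] = probability that copying I_j yields I_i.
    Q = np.full((n, n), p / (n - 1))
    np.fill_diagonal(Q, 1.0 - p)
    return Q  # every row sums to one

def value_matrix(f, Q):
    # W[i, j] = f_j * Q_ji, the matrix that multiplies x in the mutation-selection equation
    return Q.T @ np.diag(f)

def eigen_solution(f, Q, x0, t):
    W = value_matrix(f, Q)
    lam, L = np.linalg.eig(W)           # columns of L are right eigenvectors
    H = np.linalg.inv(L)                # H = L^{-1}
    c0 = H @ x0                         # c_k(0) = sum_i h_ki x_i(0)
    y = L @ (c0 * np.exp(lam * t))      # solution of the linear (integrating-factor) system
    return np.real(y / y.sum())         # normalization restores the -x_i * phi term

def rk4_solution(f, Q, x0, t, steps=2000):
    W = value_matrix(f, Q)
    def rhs(x):
        return W @ x - x * (f @ x)      # dx_i/dt = sum_j f_j Q_ji x_j - x_i * phi
    x, h = np.array(x0, dtype=float), t / steps
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x = x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

f = np.array([2.0, 1.0, 1.0])
Q = mutation_matrix(3, 0.05)
x0 = np.array([0.1, 0.45, 0.45])
print(eigen_solution(f, Q, x0, 2.0))
print(rk4_solution(f, Q, x0, 2.0))
```

The two printed vectors should agree to numerical precision. This agreement is what the integrating-factor transformation promises: the linear system carries all of the dynamics, and the nonlinear term only renormalizes.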
Figure: The quasispecies on the concentration simplex $S_3 = \{ x_i \ge 0,\ i = 1, 2, 3;\ \sum_{i=1}^{3} x_i = 1 \}$ (corners $e_1, e_2, e_3$; $l_0, l_1, l_2$ marked).
In the case of non-zero mutation rates (p > 0 or q < 1), the Darwinian principle of optimization of mean fitness can be understood only as an optimization heuristic. It is valid only on part of the concentration simplex. There are other well-defined areas where the mean fitness decreases monotonically or where it may show non-monotonic behavior. The volume of the part of the simplex where the mean fitness is non-decreasing in the conventional sense decreases with increasing mutation rate p. In systems with recombination, a similar restriction holds for Fisher's "universal selection equation": its global validity is restricted to the one-gene (single locus) model.
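The restricted validity can be illustrated numerically. The sketch integrates the mutation-selection equation, using the matrix entries f_j Q_ji, for an illustrative three-variant system. The fitness values (2, 1, 1), the uniform error rate, and the starting points are assumptions chosen for demonstration. Without mutation (p = 0), φ(t) never decreases. With p = 0.1 and a population that starts as pure master sequence, φ(t) falls from the master's fitness value 2, because mutants of lower fitness are produced.

```python
import numpy as np

def mean_fitness_trajectory(f, p, x0, t_end=5.0, steps=5000):
    """Integrate dx_i/dt = sum_j f_j Q_ji x_j - x_i*phi (RK4) and record phi(t)."""
    n = len(f)
    # Q[j, i]: probability that replication of I_j yields I_i (uniform error rate p)
    Q = np.full((n, n), p / (n - 1))
    np.fill_diagonal(Q, 1.0 - p)
    W = Q.T @ np.diag(f)                # entries f_j * Q_ji
    def rhs(x):
        return W @ x - x * (f @ x)
    x, h = np.array(x0, dtype=float), t_end / steps
    phi = [f @ x]
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x = x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        phi.append(f @ x)
    return np.array(phi)

f = np.array([2.0, 1.0, 1.0])
selection_only = mean_fitness_trajectory(f, 0.0, [1 / 3, 1 / 3, 1 / 3])
with_mutation = mean_fitness_trajectory(f, 0.1, [1.0, 0.0, 0.0])
print(selection_only[0], selection_only[-1])   # phi rises under pure selection
print(with_mutation[0], with_mutation[-1])     # phi falls from the master's value
```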
1. Optimization through variation and selection in populations
2. Neutral networks in genotype-phenotype mappings
3. Optimization in the RNA model
4. Evolution experiments with molecules in the laboratory
Theory of genotype-phenotype mapping
P.Schuster, W.Fontana, P.F.Stadler, I.L.Hofacker, From sequences to shapes and back: A case study in RNA secondary structures. Proc.Roy.Soc.London B 255 (1994), 279-284
W.Grüner, R.Giegerich, D.Strothmann, C.Reidys, I.L.Hofacker, P.Schuster, Analysis of RNA sequence structure maps by exhaustive enumeration. I. Neutral networks. Mh.Chem. 127 (1996), 355-374
W.Grüner, R.Giegerich, D.Strothmann, C.Reidys, I.L.Hofacker, P.Schuster, Analysis of RNA sequence structure maps by exhaustive enumeration. II. Structure of neutral networks and shape space covering. Mh.Chem. 127 (1996), 375-389
C.M.Reidys, P.F.Stadler, P.Schuster, Generic properties of combinatory maps. Bull.Math.Biol. 59 (1997), 339-397
I.L.Hofacker, P.Schuster, P.F.Stadler, Combinatorics of RNA secondary structures. Discr.Appl.Math. 89 (1998), 177-207
C.M.Reidys, P.F.Stadler, Combinatory landscapes. SIAM Review 44 (2002), 3-54
Genotype-phenotype relations are highly complex, and only the simplest cases can be studied. One example is the folding of RNA sequences into RNA structures, represented in coarse-grained form as secondary structures. The RNA genotype-phenotype relation is understood as a mapping from the space of RNA sequences into a space of RNA structures.
Figure: A tRNA shown as sequence (5'-end to 3'-end: GCGGAU UUA GCUC AGDDGGGA GAGC M CCAGA CUGAAYA UCUGG AGMUC CUGUG TPCGAUC CACAG A AUUCGC ACCA), as secondary structure, in symbolic notation, and as tertiary structure.
The RNA secondary structure is a listing of GC, AU, and GU base pairs. It is understood in contrast to the full 3D or tertiary structure at the resolution of atomic coordinates. RNA secondary structures are biologically relevant; they are, for example, conserved in evolution.
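A secondary structure can be written as a listing of base pairs using the common dot-bracket notation. Dot-bracket is an assumption here; the source figure uses its own symbolic notation. The sketch converts a dot-bracket string into its list of base pairs. It also checks whether a sequence can form all of those pairs as GC, AU, or GU pairs.

```python
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def base_pairs(dot_bracket):
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def pair_types(sequence, dot_bracket):
    """Nucleotide pair formed at each base pair of the structure."""
    return [sequence[i] + sequence[j] for i, j in base_pairs(dot_bracket)]

def is_compatible(sequence, dot_bracket):
    """True if every base pair of the structure is a GC, AU, or GU pair in the sequence."""
    if len(sequence) != len(dot_bracket):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in base_pairs(dot_bracket))

print(base_pairs("((...))"))              # [(0, 6), (1, 5)]
print(pair_types("GCAAAGU", "((...))"))   # ['GU', 'CG']
```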
RNA Minimum Free Energy Structures
Efficient algorithms based on dynamic programming are available for computing the secondary structures of given sequences. Inverse folding algorithms compute sequences for given secondary structures.
M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981)
Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.)
I.L.Hofacker, W.Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P.Schuster. Mh.Chem. 125:167-188 (1994)
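The minimum free energy algorithms cited here rely on a detailed nearest-neighbor energy model. The sketch below is a much simpler stand-in that shows the dynamic programming idea: Nussinov-style base-pair maximization. It does not minimize free energy. It assumes GC, AU, and GU pairs and a minimum hairpin loop of three unpaired bases.

```python
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def nussinov(sequence, min_loop=3):
    """Maximize the number of base pairs by dynamic programming (not an energy model).

    N[i][j] holds the maximal number of pairs on the subsequence i..j; hairpin
    loops must enclose at least `min_loop` unpaired bases."""
    n = len(sequence)
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                        # case 1: j stays unpaired
            for k in range(i, j - min_loop):          # case 2: j pairs with some k
                if (sequence[k], sequence[j]) in ALLOWED_PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best
    # Traceback: recover one optimal structure in dot-bracket notation
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (sequence[k], sequence[j]) in ALLOWED_PAIRS:
                left = N[i][k - 1] if k > i else 0
                if left + N[k + 1][j - 1] + 1 == N[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (N[0][n - 1] if n else 0), "".join(structure)

print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Real folding programs replace the "+1 per pair" score with stacking and loop energies, but the recursion over subsequences i..j has the same shape.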
Figure: Minimum free energy criterion. Sequences obtained in five inverse folding trials (1st to 5th trial):
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Inverse folding: The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
Figure: Criterion of minimum free energy: the same five sequences mapped from sequence space into shape space.
The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, which are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks . Neutral networks are represented by graphs in sequence space.
Figure: Two sequences $S_1$ and $S_2$ (CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG.....) that differ at four positions (G, A, G, T in $S_1$ versus A, C, A, C in $S_2$): Hamming distance $d_H(S_1, S_2) = 4$.
The Hamming distance induces a metric in sequence space:
(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$
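The Hamming distance and the three metric properties listed above can be checked directly in code. The sequences used here are illustrative.

```python
from itertools import product

def hamming(s1, s2):
    """Number of positions at which two equally long sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))

def check_metric(sequences):
    """Verify properties (i) to (iii) on all ordered triples of the given sequences."""
    for s1, s2, s3 in product(sequences, repeat=3):
        assert hamming(s1, s1) == 0                                     # (i)
        assert hamming(s1, s2) == hamming(s2, s1)                       # (ii)
        assert hamming(s1, s3) <= hamming(s1, s2) + hamming(s2, s3)     # (iii)
    return True

print(hamming("GCCAUC", "GCGUUC"))  # 2
```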
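Figure: Single point mutations as moves in sequence space. One point mutation ($d_H = 1$) changes ....GCCAUC.... into ....GCGAUC.... or into ....GCCUUC..... Each of these is one further point mutation ($d_H = 1$) away from ....GCGUUC...., which lies at $d_H = 2$ from the starting sequence.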
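Each sequence of length n over an alphabet of κ letters has exactly n(κ − 1) one-error neighbors, the sequences reachable by a single point mutation. The short fragment below is taken from the example above; the default alphabet AUGC is an assumption.

```python
def one_error_neighbors(sequence, alphabet="AUGC"):
    """All sequences reachable from `sequence` by a single point mutation."""
    neighbors = []
    for i, current in enumerate(sequence):
        for letter in alphabet:
            if letter != current:
                neighbors.append(sequence[:i] + letter + sequence[i + 1:])
    return neighbors

moves = one_error_neighbors("GCCAUC")
print(len(moves))  # n * (kappa - 1) = 6 * 3 = 18
```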
Figure: Sequence space of binary sequences of chain length n = 5.
Binary sequences are encoded by their decimal equivalents, with C ≡ 0 and G ≡ 1, for example:
"0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.
Mutant classes (Hamming distance from "0" ≡ CCCCC):
class 0: 0
class 1: 1, 2, 4, 8, 16
class 2: 3, 5, 6, 9, 10, 12, 17, 18, 20, 24
class 3: 7, 11, 13, 14, 19, 21, 22, 25, 26, 28
class 4: 15, 23, 27, 29, 30
class 5: 31
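The decimal encoding and the mutant classes can be reproduced directly. With C ≡ 0 and G ≡ 1, the mutant class of a sequence relative to CCCCC is its number of G's, which is its Hamming distance from the origin.

```python
def decode(number, length=5):
    """Binary sequence (C = 0, G = 1, most significant digit first) for a decimal index."""
    return format(number, f"0{length}b").replace("0", "C").replace("1", "G")

def encode(sequence):
    """Decimal index of a C/G sequence."""
    return int(sequence.replace("C", "0").replace("G", "1"), 2)

def mutant_classes(length=5):
    """Group all 2**length indices by their Hamming distance from 0 (= CCCCC)."""
    classes = {}
    for number in range(2 ** length):
        classes.setdefault(bin(number).count("1"), []).append(number)
    return classes

print(decode(14), decode(29))  # CGGGC GGGCG
for distance, members in mutant_classes().items():
    print(distance, members)
```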
Figure: Mapping from sequence space into phenotype space and into fitness values (non-negative numbers): $S_k = \psi(I_k)$, $f_k = f(S_k)$.
Figure: The pre-image of the structure $S_k$ in sequence space is the neutral network $G_k$.
Neutral networks are sets of sequences forming the same structure. $G_k$ is the pre-image of the structure $S_k$ in sequence space:
$G_k = \psi^{-1}(S_k) \doteq \{ I_j \mid \psi(I_j) = S_k \}$
The set is converted into a graph by connecting all sequences at Hamming distance one. Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length n. The number of sequences, $N = 4^n$, grows so rapidly with chain length that exhaustive folding soon becomes computationally prohibitive. Neutral networks can instead be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches that of the neutral network to be studied.
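The bookkeeping behind exhaustive enumeration is sketched below: grouping sequences into pre-images, connecting sequences at Hamming distance one, and finding connected components. Folding real RNA would need a folding program. The code therefore uses a hypothetical toy genotype-phenotype map, `toy_phenotype`, which assigns to a binary C/G sequence the number of G's among its first three positions. This map is not RNA folding and only illustrates the procedure.

```python
from collections import defaultdict, deque
from itertools import product

ALPHABET = "CG"

def toy_phenotype(sequence):
    """Hypothetical stand-in for RNA folding: number of G's among the first three positions."""
    return sequence[:3].count("G")

def neutral_networks(n, phenotype):
    """Exhaustive enumeration: group all sequences of length n by phenotype (pre-images)."""
    networks = defaultdict(set)
    for letters in product(ALPHABET, repeat=n):
        sequence = "".join(letters)
        networks[phenotype(sequence)].add(sequence)
    return dict(networks)

def one_error_neighbors(sequence):
    for i, current in enumerate(sequence):
        for letter in ALPHABET:
            if letter != current:
                yield sequence[:i] + letter + sequence[i + 1:]

def components(network):
    """Connected components of the graph whose edges join sequences at Hamming distance one."""
    unseen, result = set(network), []
    while unseen:
        start = unseen.pop()
        component, queue = {start}, deque([start])
        while queue:
            for neighbor in one_error_neighbors(queue.popleft()):
                if neighbor in unseen:
                    unseen.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        result.append(component)
    return result

networks = neutral_networks(6, toy_phenotype)
for structure in sorted(networks):
    sizes = [len(c) for c in components(networks[structure])]
    print(structure, len(networks[structure]), sizes)
```

With this toy map, the networks for phenotypes 1 and 2 split into three components each. This is an example of a multi-component neutral network.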
Figure: Random graph approach to neutral networks (sketch of sequence space after steps 00, 01, 02, 03, 04, 05, 10, 15, 25, 50, 75, and 100).
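The random graph approach can be sketched in a few lines. Distinct sequences are drawn uniformly from sequence space until the prescribed network size is reached. The resulting set is then treated as a graph with edges between sequences at Hamming distance one. The sequence length, alphabet, and sizes below are illustrative assumptions.

```python
import random
from collections import deque

def random_neutral_network(n, alphabet, size, rng):
    """Insert randomly chosen sequences until `size` distinct nodes are present."""
    if size > len(alphabet) ** n:
        raise ValueError("network larger than sequence space")
    nodes = set()
    while len(nodes) < size:
        nodes.add("".join(rng.choice(alphabet) for _ in range(n)))
    return nodes

def components(nodes, alphabet):
    """Connected components with respect to single point mutations."""
    unseen, result = set(nodes), []
    while unseen:
        start = unseen.pop()
        component, queue = {start}, deque([start])
        while queue:
            s = queue.popleft()
            for i, current in enumerate(s):
                for letter in alphabet:
                    if letter == current:
                        continue
                    t = s[:i] + letter + s[i + 1:]
                    if t in unseen:
                        unseen.remove(t)
                        component.add(t)
                        queue.append(t)
        result.append(component)
    return result

rng = random.Random(7)
for size in (20, 100, 200):
    network = random_neutral_network(8, "CG", size, rng)
    sizes = sorted((len(c) for c in components(network, "CG")), reverse=True)
    print(size, "nodes ->", len(sizes), "components, largest:", sizes[0])
```

As more nodes are inserted, the components merge and a giant component emerges.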
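Mean degree of neutrality and connectivity of neutral networks
$G_k = \psi^{-1}(S_k) \doteq \{ I_j \mid \psi(I_j) = S_k \}$
$\lambda_k = \dfrac{\sum_{j \in G_k} \lambda_j^{(k)}}{|G_k|}$, where $\lambda_j^{(k)}$ is the fraction of the $(\kappa - 1)\, n$ one-error neighbors of $I_j$ that also lie on $G_k$ (in the sketched example, $\lambda_j^{(k)} = 12/27$).
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa - 1)}$, with alphabet size κ (AUGC: κ = 4):
κ = 2: λ_cr = 0.5
κ = 3: λ_cr = 0.4226
κ = 4: λ_cr = 0.3700
$\lambda_k > \lambda_{cr}$: network $G_k$ is connected
$\lambda_k < \lambda_{cr}$: network $G_k$ is not connected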
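The connectivity threshold and the mean degree of neutrality are straightforward to evaluate. The sketch reproduces the tabulated values of λ_cr = 1 − κ^(−1/(κ−1)). It also computes the mean fraction of neutral one-error neighbors for a given set of sequences. The example set is an arbitrary illustration.

```python
def critical_neutrality(kappa):
    """Connectivity threshold lambda_cr = 1 - kappa**(-1/(kappa - 1))."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))

def mean_degree_of_neutrality(network, alphabet):
    """Average over the network of the fraction of one-error neighbors that are also in it."""
    network = set(network)
    total = 0.0
    for s in network:
        n = len(s)
        neutral = sum(
            s[:i] + letter + s[i + 1:] in network
            for i in range(n)
            for letter in alphabet
            if letter != s[i]
        )
        total += neutral / ((len(alphabet) - 1) * n)
    return total / len(network)

for kappa in (2, 3, 4):
    print(kappa, round(critical_neutrality(kappa), 4))   # 0.5, 0.4226, 0.37

example = {"CCC", "CCG", "CGC", "GGG"}                   # illustrative binary sequences
lam = mean_degree_of_neutrality(example, "CG")
print(lam, lam > critical_neutrality(2))
```

In the example, λ = 1/3 lies below λ_cr = 0.5. Consistent with the criterion, the set is not connected: GGG has no neighbor in it.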
Figure: A multi-component neutral network with a giant component.
A connected neutral network
Figure: Compatibility of sequences with structures (a compatible and an incompatible sequence drawn on the same structure, 5'- and 3'-ends marked).
A sequence is compatible with a structure if it can form every base pair of that structure as a GC, AU, or GU pair. A sequence is compatible with its minimum free energy structure and all its suboptimal structures.
Figure: Neutral network $G_k$ within the compatible set $C_k$ ($G_k \subseteq C_k$).
The compatible set $C_k$ of a structure $S_k$ consists of all sequences that form $S_k$ as their minimum free energy structure (neutral network $G_k$) or as one of their suboptimal structures.
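The size of a compatible set follows from simple counting. Every unpaired position can carry any of the κ = 4 nucleotides, and every base pair can be one of the six allowed pairs (GC, CG, AU, UA, GU, UG). Hence |C_k| = 4^(n_u) · 6^(n_p) for n_u unpaired positions and n_p base pairs. The sketch confirms this by brute force for a small illustrative structure. It counts compatibility only; it does not decide which sequences have S_k as their minimum free energy structure.

```python
from itertools import product

ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def base_pairs(dot_bracket):
    """Base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs

def compatible_set_size_formula(dot_bracket):
    n_pairs = len(base_pairs(dot_bracket))
    n_unpaired = len(dot_bracket) - 2 * n_pairs
    return 4 ** n_unpaired * 6 ** n_pairs

def compatible_set_size_brute_force(dot_bracket):
    pairs = base_pairs(dot_bracket)
    return sum(
        all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pairs)
        for seq in product("AUGC", repeat=len(dot_bracket))
    )

structure = "((...))"
print(compatible_set_size_formula(structure), compatible_set_size_brute_force(structure))  # 2304 2304
```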
Figure: Minimum free energy conformation $S_0$ and suboptimal conformation $S_1$ of the same sequence (3'-end marked).
A sequence in the intersection of two compatible sets (here one that forms $S_0$ as its minimum free energy structure and $S_1$ as a suboptimal one) is compatible with both structures.
Figure: Two disjoint neutral networks, $G_1 \cap G_2 = \emptyset$, and their compatible sets $C_1$ and $C_2$.
The intersection of two compatible sets is always non-empty: $C_1 \cap C_2 \neq \emptyset$.
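The intersection result can be illustrated constructively. Each position takes part in at most one base pair of each structure. In the graph formed by the base pairs of both structures, any cycle therefore alternates between pairs of S_1 and pairs of S_2 and has even length. The graph can thus be two-colored, and writing G on one color class and C on the other yields a sequence compatible with both structures. The two structures below are illustrative, and unpaired positions are set to A arbitrarily.

```python
from collections import deque

ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def base_pairs(dot_bracket):
    """Base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs

def is_compatible(sequence, dot_bracket):
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in base_pairs(dot_bracket))

def sequence_in_intersection(structure_1, structure_2):
    """Construct one sequence in C_1 and C_2 by two-coloring the union of both pair sets."""
    if len(structure_1) != len(structure_2):
        raise ValueError("structures must have equal length")
    n = len(structure_1)
    adjacency = [set() for _ in range(n)]
    for i, j in base_pairs(structure_1) + base_pairs(structure_2):
        adjacency[i].add(j)
        adjacency[j].add(i)
    sequence = [None] * n
    for start in range(n):
        if sequence[start] is not None:
            continue
        if not adjacency[start]:
            sequence[start] = "A"          # unpaired in both structures: any nucleotide works
            continue
        sequence[start] = "G"
        queue = deque([start])
        while queue:                       # breadth-first two-coloring: G and C alternate
            v = queue.popleft()
            for w in adjacency[v]:
                if sequence[w] is None:
                    sequence[w] = "C" if sequence[v] == "G" else "G"
                    queue.append(w)
    return "".join(sequence)

s1, s2 = "(((...)))..", "..(((...)))"
seq = sequence_in_intersection(s1, s2)
print(seq, is_compatible(seq, s1), is_compatible(seq, s2))  # GGGGGACCCCC True True
```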
Optimization of RNA molecules in silico
W.Fontana, P.Schuster, A computer model of evolutionary optimization. Biophysical Chemistry 26 (1987), 123-147
W.Fontana, W.Schnabl, P.Schuster, Physical aspects of evolutionary optimization and adaptation. Phys.Rev.A 40 (1989), 3301-3321
M.A.Huynen, W.Fontana, P.F.Stadler, Smoothness within ruggedness. The role of neutrality in adaptation. Proc.Natl.Acad.Sci.USA 93 (1996), 397-401
W.Fontana, P.Schuster, Continuity in evolution. On the nature of transitions. Science 280 (1998), 1451-1455
W.Fontana, P.Schuster, Shaping space. The possible and the attainable in RNA genotype-phenotype mapping. J.Theor.Biol. 194 (1998), 491-515
B.M.R.Stadler, P.F.Stadler, G.P.Wagner, W.Fontana, The topology of the possible: Formal spaces underlying patterns of evolutionary change. J.Theor.Biol. 213 (2001), 241-274
Figure: A randomly chosen initial structure and phenylalanyl-tRNA as target structure.
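Figure: The flow reactor (a stock solution feeding the reaction mixture) as a device for studies of evolution in vitro and in silico.
Fitness function: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$, with $\Delta d_S^{(k)} = d_S(I_k, I_\tau)$, the structure distance of $I_k$ to the target.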
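The fitness function can be written down directly. For the structure distance d_S, the sketch assumes the base-pair distance, i.e. the number of base pairs present in exactly one of the two structures. The constants α = 0.1 and γ = 1 are illustrative choices, not values from the source.

```python
def base_pairs(dot_bracket):
    """Set of base pairs (i, j) of a balanced dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs

def structure_distance(structure, target):
    """Assumed d_S: base-pair distance (size of the symmetric difference of the pair sets)."""
    return len(base_pairs(structure) ^ base_pairs(target))

def fitness(structure, target, alpha=0.1, gamma=1.0):
    """f_k = gamma / (alpha + Delta d_S^(k)); largest when the target structure is reached."""
    return gamma / (alpha + structure_distance(structure, target))

target = "((((...))))"
for candidate in ("((((...))))", ".(((...))).", "." * 11):
    print(candidate, structure_distance(candidate, target), round(fitness(candidate, target), 3))
```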
Figure: The molecular quasispecies in sequence space: concentrations of the master sequence and its mutant cloud, with "off-the-cloud" mutations.
Figure: Evolutionary dynamics including molecular phenotypes: genotype-phenotype mapping $S_k = \psi(I_k)$, evaluation of the phenotype $f_k = f(S_k)$, and mutation with probabilities $Q_{jk}$ among the sequences $I_1, I_2, \dots, I_n, I_{n+1}$ with fitness values $f_1, f_2, \dots, f_n, f_{n+1}$.
Figure: In silico optimization in the flow reactor: trajectory (biologists' view). Average distance from the initial structure, $50 - d_S$, versus time (arbitrary units, 0 to 1250).
Figure: In silico optimization in the flow reactor: trajectory (physicists' view). Average structure distance to target, $d_S$, versus time (arbitrary units, 0 to 1250).
Figure: Average structure distance to target $d_S$ and relay steps (numbers 36 to 44) along the evolutionary trajectory (time 0 to 1250). Panels: end conformation of the optimization (relay step 44); reconstruction of the last step 43 → 44; reconstruction of the last-but-one step 42 → 43 (→ 44); reconstruction of step 41 → 42 (→ 43 → 44); reconstruction of step 40 → 41 (→ 42 → 43 → 44); reconstruction of the relay series 39 → 44, compared with the evolutionary process.
Figure: Change in RNA sequences during the final five relay steps 39 → 44 (transition-inducing point mutations and neutral point mutations marked).
Figure: In silico optimization in the flow reactor: trajectory and relay steps (average structure distance to target $d_S$ versus time, arbitrary units, 0 to 1250).
Figure: In silico optimization in the flow reactor: uninterrupted presence.
Figure: Neutral genotype evolution during phenotypic stasis (relay steps 08 to 14, time 0 to 500 arbitrary units; uninterrupted presence, transition-inducing point mutations, and neutral point mutations marked).
Figure: A random sequence of minor or continuous transitions in the relay series (relay steps 18 to 35, time 750 to 1250 arbitrary units; structures of relay steps 18, 19, 20, 21, 26, 28, 29, and 31 shown).
Minor or continuous transitions occur frequently on single point mutations. Examples shown: shortening of stacks, elongation of stacks, and opening of constrained stacks (multi-loop).
Figure: Reconstruction of a main transition 36 → 37 (→ 38), the main transition leading to the clover leaf (average structure distance to target $d_S$ and relay steps 36 to 44 along the evolutionary trajectory).
Figure: In silico optimization in the flow reactor: main transitions (average structure distance to target $d_S$ versus time, arbitrary units, 0 to 1250, with relay steps marked).
Main or discontinuous transitions are structural innovations and occur rarely on single point mutations. Examples shown: roll-over, shift, flip, double flip, and closing of constrained stacks (multi-loop).
Figure: In silico optimization in the flow reactor: trajectory with relay steps, main transitions, and periods of uninterrupted presence.
The one-error neighborhood of the neutral network $G_k$ corresponding to the structure $S_k$ is defined by
$\Gamma(S_k) = \{ S_j \mid S_j = \psi(I_i),\ d_H(I_i, I_m) = 1,\ I_m \in G_k \}$.
Let $B_{jk}$ be the number of points at which the two neutral networks $G_k$ and $G_j$ are in Hamming distance one contact, with $B_{jk} = B_{kj}$. The probability of occurrence of $S_j$ in the neighborhood of $S_k$ is then given by
$\pi(S_j; S_k) = \dfrac{B_{jk}}{(\kappa - 1)\, l\, |G_k|}$,
where l is the chain length and κ the alphabet size. We note that this probability is not symmetric, $\pi(S_j; S_k) \neq \pi(S_k; S_j)$, unless the two networks are of equal size, $|G_k| = |G_j|$. The definition of a statistical $\nu$-neighborhood of the structure $S_k$ allows a precise distinction between frequent and rare neighbors. Frequent neighbors are contained in the statistical neighborhood
$\Gamma_\nu(S_k) = \{ S_j \in \Gamma(S_k) \mid \pi(S_j; S_k) \ge \nu \}$.
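The neighborhood probabilities can be computed exactly for a small sequence space. The sketch reuses a hypothetical toy genotype-phenotype map, the number of G's among the first three positions of a binary C/G sequence, as a stand-in for RNA folding. It counts the Hamming-distance-one contacts B_jk and divides by (κ − 1) l |G_k|, where the code calls the chain length `n`. The threshold ν used for the statistical neighborhood is an arbitrary illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ALPHABET = "CG"

def toy_phenotype(sequence):
    """Hypothetical genotype-phenotype map: number of G's among the first three positions."""
    return sequence[:3].count("G")

def neighbor_probabilities(n, phenotype, k):
    """pi(S_j; S_k) = B_jk / ((kappa - 1) * n * |G_k|) for every S_j met at distance one."""
    contacts, size = Counter(), 0
    for letters in product(ALPHABET, repeat=n):
        sequence = "".join(letters)
        if phenotype(sequence) != k:
            continue
        size += 1                                        # sequence lies on G_k
        for i, current in enumerate(sequence):
            for letter in ALPHABET:
                if letter != current:
                    mutant = sequence[:i] + letter + sequence[i + 1:]
                    contacts[phenotype(mutant)] += 1     # one distance-one contact
    denominator = (len(ALPHABET) - 1) * n * size
    # The entry j = k equals the mean degree of neutrality of G_k.
    return {j: Fraction(c, denominator) for j, c in contacts.items()}

def statistical_neighborhood(probabilities, nu):
    """Frequent neighbors: structures with pi(S_j; S_k) >= nu."""
    return {j for j, p in probabilities.items() if p >= nu}

p1 = neighbor_probabilities(6, toy_phenotype, 1)
p0 = neighbor_probabilities(6, toy_phenotype, 0)
print(p1[0], p0[1])   # 1/6 1/2: the probabilities are not symmetric
```

Here the two networks share the same number of contacts (B_01 = B_10 = 24) but differ in size (|G_1| = 24, |G_0| = 8). This size difference is exactly what makes π asymmetric.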
Figure: Probability of occurrence of different structures in the mutational neighborhood of tRNA^phe: frequency of occurrence (10^-1 to 10^-6) versus rank (10^0 to 10^5, logarithmic scales). Frequent neighbors correspond to minor transitions, rare neighbors to main transitions.
Statistics of evolutionary trajectories
Table: population size N, number of replications ⟨n_rep⟩, number of transitions ⟨n_tr⟩, and number of main transitions ⟨n_dtr⟩.
The number of main transitions or evolutionary innovations is approximately constant.
Figure: Transition probabilities P between the phenotypes $S_1, S_2, S_3, \dots, S_m$ and $S_k$, determining the presence of phenotype $S_k$ in the population.
Figure: Birth-and-death process with immigration on the states 0, 1, 2, …, N−1, N, with death rates $\mu(x) = x\,\mu$ and birth rates $\nu(x) = x\,\nu + \lambda\,(N - x)$. A sample path of the particle number X(t) is plotted versus time t, with the first-passage times $T_{0,1}$ and $T_{1,0}$ marked.
Calculation of transition probabilities by means of a birth-and-death process with immigration.
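First-passage times of this process can be estimated by stochastic simulation. The sketch uses Gillespie's direct method, a standard technique swapped in here for illustration. The rates follow μ(x) = xμ and ν(x) = xν + λ(N − x). The sketch additionally assumes that no birth occurs out of the full state x = N, and all parameter values are illustrative. From x = 0 only immigration is possible, so the mean of T_{0,1} is exactly 1/(λN), which provides a check on the simulation.

```python
import random

def birth_rate(x, N, nu, lam):
    # nu(x) = x*nu + lam*(N - x); assumption: no birth out of the full state x = N
    return 0.0 if x >= N else x * nu + lam * (N - x)

def death_rate(x, mu):
    # mu(x) = x*mu
    return x * mu

def first_passage_time(x0, target, N, nu, mu, lam, rng):
    """Simulate from x0 until the process first reaches `target` (requires lam > 0)."""
    x, t = x0, 0.0
    while x != target:
        b, d = birth_rate(x, N, nu, lam), death_rate(x, mu)
        total = b + d
        t += rng.expovariate(total)                  # exponential waiting time
        x += 1 if rng.random() * total < b else -1   # birth with probability b/total
    return t

rng = random.Random(42)
N, nu, mu, lam = 10, 1.0, 1.2, 0.01
samples = [first_passage_time(0, 1, N, nu, mu, lam, rng) for _ in range(10000)]
print(sum(samples) / len(samples), 1 / (lam * N))    # estimate versus exact mean of T_{0,1}
```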
Figure: Transition probabilities determining the presence of phenotype $S_k$ in the population, annotated with a saturation population size $N_{\mathrm{sat}}^{(j)}$.
Figure: Relay steps 00, 09, 31, and 44: three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure, corresponding to three main transitions.
Figure: tRNA clover leaf structure (5'- and 3'-ends marked).
Stable tRNA clover leaf structures built from binary, GC-only, sequences exist. The corresponding sequences are readily found through inverse folding. Optimization by mutation and selection in the flow reactor, however, has so far always been unsuccessful. The neutral network of the tRNA clover leaf in GC sequence space is not connected. The corresponding neutral network in AUGC sequence space, by contrast, is very close to the critical connectivity threshold, λ_cr, and there both inverse folding and optimization in the flow reactor are successful. The success of optimization depends on the connectivity of neutral networks.
Main results of computer simulations of molecular evolution
• No trajectory was reproducible in detail. The sequences forming the target structure were always different. Nevertheless, solutions of the same quality are almost always achieved.
• Transitions between molecular phenotypes represented by RNA structures can be classified with respect to the induced structural changes. Highly probable minor transitions are opposed by main transitions with low probability of occurrence.
• Main transitions represent important innovations in the course of evolution.
• The number of minor transitions decreases with increasing population size.
• The number of main transitions or evolutionary innovations is approximately constant for given start and stop structures.
• Not all known structures are accessible through evolution in the flow reactor. An example is the tRNA clover leaf for GC-only sequences.
Generation times and evolutionary timescales (a = years)
Organisms | Generation time | 10 000 generations | 10^6 generations | 10^7 generations
RNA molecules | 10 sec | 27.8 h = 1.16 d | 115.7 d | 3.17 a
RNA molecules | 1 min | 6.94 d | 1.90 a | 19.01 a
Bacteria | 20 min | 138.9 d | 38.03 a | 380 a
Bacteria | 10 h | 11.40 a | 1 140 a | 11 408 a
Higher multicellular organisms | 10 d | 274 a | 27 380 a | 273 800 a
Higher multicellular organisms | 20 a | 200 000 a | 2 × 10^7 a | 2 × 10^8 a
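Each table entry is the generation time multiplied by the number of generations. The tabulated values match a year of 365.25 days; that choice of year length is an assumption about how the table was computed.

```python
MINUTE, HOUR, DAY = 60.0, 3600.0, 86400.0
YEAR = 365.25 * DAY   # assumed Julian year; reproduces the tabulated values

def evolutionary_timescale_years(generation_time_seconds, generations):
    """Total time in years for a given number of generations."""
    return generation_time_seconds * generations / YEAR

rows = [("RNA molecules", 10.0), ("RNA molecules", MINUTE),
        ("Bacteria", 20 * MINUTE), ("Bacteria", 10 * HOUR),
        ("Higher multicellular organisms", 10 * DAY),
        ("Higher multicellular organisms", 20 * YEAR)]
for label, generation_time in rows:
    print(label, [round(evolutionary_timescale_years(generation_time, g), 3)
                  for g in (10**4, 10**6, 10**7)])
```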
Evolution of RNA molecules based on Qβ phage
D.R.Mills, R.L.Peterson, S.Spiegelman, An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc.Natl.Acad.Sci.USA 58 (1967), 217-224
S.Spiegelman, An approach to the experimental analysis of precellular evolution. Quart.Rev.Biophys. 4 (1971), 213-253
C.K.Biebricher, Darwinian selection of self-replicating RNA molecules. Evolutionary Biology 16 (1983), 1-52
C.K.Biebricher, W.C.Gardiner, Molecular evolution of RNA in vitro. Biophysical Chemistry 66 (1997), 179-192
G.Strunk, T.Ederhof, Machines for automated evolution experiments in vitro based on the serial transfer concept. Biophysical Chemistry 66 (1997), 193-202
Figure: The serial transfer technique applied to RNA evolution in vitro. An RNA sample is transferred repeatedly (transfers 0, 1, 2, …, 69, 70) into fresh stock solution containing Qβ RNA-replicase, ATP, CTP, GTP and UTP, and buffer.
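The sketch below models serial transfer in a deliberately idealized way. Two RNA variants grow exponentially with different, hypothetical replication rates during each incubation period. After each period, a small aliquot is transferred into fresh stock solution. Dilution scales both variants equally, so only the growth phase changes relative frequencies, and the faster replicator takes over. Mutation, resource exhaustion, and saturation of the replicase are ignored.

```python
import math

def serial_transfer(rates, initial, transfers, incubation, dilution=0.01):
    """Relative frequencies after each transfer (idealized exponential growth, no mutation)."""
    amounts = list(initial)
    history = []
    for _ in range(transfers):
        grown = [a * math.exp(r * incubation) for a, r in zip(amounts, rates)]
        amounts = [dilution * g for g in grown]      # aliquot into fresh stock solution
        total = sum(amounts)
        history.append([a / total for a in amounts])
    return history

history = serial_transfer(rates=[1.0, 1.1], initial=[0.99, 0.01], transfers=70, incubation=5.0)
for step in (0, 9, 69):
    print(step + 1, [round(x, 4) for x in history[step]])
```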
Reproduction of the original figure of the serial transfer experiment with Qβ RNA
D.R.Mills, R.L.Peterson, S.Spiegelman, An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc.Natl.Acad.Sci.USA 58 (1967), 217-224
Figure: The increase in RNA production rate during a serial transfer experiment (annotated: decrease in mean fitness due to quasispecies formation).
Evolutionary design of RNA molecules
A.D.Ellington, J.W.Szostak, In vitro selection of RNA molecules that bind specific ligands. Nature 346 (1990), 818-822
C.Tuerk, L.Gold, SELEX - Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (1990), 505-510
D.P.Bartel, J.W.Szostak, Isolation of new ribozymes from a large pool of random sequences. Science 261 (1993), 1411-1418
R.D.Jenison, S.C.Gill, A.Pardi, B.Polisky, High-resolution molecular discrimination by RNA. Science 263 (1994), 1425-1429
Figure: Selection cycle used in applied molecular evolution to design molecules with predefined properties. Amplification, diversification (genetic diversity), and selection are repeated. If the desired properties are not yet reached, another cycle follows; if they are, the cycle ends.
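The selection cycle can be mimicked in silico as a loop of selection, amplification, and diversification by random point mutations. In this sketch the "desired property" is a hypothetical score: the number of positions matching an arbitrary target sequence. In a SELEX experiment it would instead be, for example, binding to a ligand. The pool size, mutation rate, number of survivors, and target are illustrative assumptions.

```python
import random

def score(sequence, target):
    """Hypothetical property: number of positions agreeing with a target sequence."""
    return sum(a == b for a, b in zip(sequence, target))

def selection_cycle(target, pool_size=200, survivors=20, mutation_rate=0.02,
                    max_rounds=100, seed=0, alphabet="AUGC"):
    """Run selection cycles; return the best score observed in each round."""
    rng = random.Random(seed)
    n = len(target)
    pool = ["".join(rng.choice(alphabet) for _ in range(n)) for _ in range(pool_size)]
    best_per_round = []
    for _ in range(max_rounds):
        # Selection: keep the best-scoring molecules
        selected = sorted(pool, key=lambda s: score(s, target), reverse=True)[:survivors]
        best_per_round.append(score(selected[0], target))
        if best_per_round[-1] == n:              # desired property reached: leave the cycle
            break
        # Amplification with diversification: copies may carry random point mutations
        pool = list(selected)                    # selected molecules are kept unchanged
        while len(pool) < pool_size:
            parent = rng.choice(selected)
            child = "".join(
                rng.choice(alphabet) if rng.random() < mutation_rate else base
                for base in parent
            )
            pool.append(child)
    return best_per_round

rounds = selection_cycle("GGACUUCGGUCC")
print(len(rounds), rounds[0], rounds[-1])
```

Keeping the selected molecules in the next pool makes the best score non-decreasing from round to round. This is a design choice of the sketch, not a feature of the laboratory protocol.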