How innovation occurs in evolution of molecules. Peter Schuster, Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien. Evolutionary innovation, Praha, 30.05.2002
The Darwinian principle is based on three functions:
• Reproduction efficiency, expressed by the fitness of phenotypes.
• Variation of genotypes through imperfect copying and recombination.
• Selection of phenotypes based on differences in fitness.
Two additional features are required:
• Large reservoirs of genotypes and sufficiently rich repertoires of phenotypes.
• A mapping of genotypes into phenotypes with suitable properties.
The genotypes or genomes of individuals are DNA or RNA sequences. They change from generation to generation through mutation and recombination. Species are reproductively related ensembles of individuals. Genotypes unfold into phenotypes (molecular structures, viruses or organisms), which are the targets of the evolutionary selection process. The most common mutations are point mutations, which consist of single nucleotide exchanges. The Hamming distance between two sequences is the minimal number of single nucleotide exchanges that converts one sequence into the other.
Genotype: The sequence of an RNA molecule consisting of monomers chosen from four classes, A, U, G, and C (A = adenylate, U = uridylate, G = guanylate, C = cytidylate). [Figure: an example RNA sequence written from the 5'-end to the 3'-end.]
Phenotype: Three-dimensional structure of phenylalanyl transfer RNA
Hydrogen bonding between nucleotide bases is the principle of template action of RNA and DNA. [Figure: hydrogen bonds between paired bases.]
Complementary replication as the simplest copying mechanism of RNA. [Figure: the plus strand 5'-GCCCG-3' acts as template for stepwise synthesis of the minus strand 3'-CGGGC-5'; the plus-minus complex then dissociates into a free plus strand and a free minus strand.]
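The template step of complementary replication can be sketched in a few lines. This is a minimal illustration that assumes error-free Watson-Crick pairing; the function name `minus_strand` is hypothetical and not taken from the slides.

```python
# Watson-Crick complements for RNA (assumption: copying is error-free).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def minus_strand(plus: str) -> str:
    """Return the complementary strand of `plus`, written 5' -> 3'.

    The template is read 3' -> 5', so the input is reversed before
    each base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(plus))

plus = "GCCCG"
minus = minus_strand(plus)      # complement of the plus strand
restored = minus_strand(minus)  # copying the minus strand restores the plus strand
print(plus, minus, restored)
```

Two rounds of copying return the original sequence, which is why plus and minus strands together form one replicating unit.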
Mutations represent the mechanism of variation in nucleic acids. [Figure: three classes of copying errors during replication of a plus strand: point mutation, insertion, and deletion.]
Combinatorial diversity of sequences: $N = 4^{n}$ for chain length $n$; for $n = 27$ there are $4^{27} = 1.801 \times 10^{16}$ possible different sequences. Combinatorial diversity of heteropolymers illustrated by means of an RNA aptamer that binds to the antibiotic tobramycin. [Figure: the 27-nucleotide aptamer sequence, 5'-end to 3'-end; A = adenylate, U = uridylate, C = cytidylate, G = guanylate.]
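The count quoted on this slide can be checked directly. The short sketch below assumes only that each of the 27 positions is chosen independently from four nucleotides; `sequence_count` is an illustrative name.

```python
def sequence_count(chain_length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of a given length over the alphabet."""
    return alphabet_size ** chain_length

n_sequences = sequence_count(27)
print(n_sequences)            # exact integer value of 4**27
print(f"{n_sequences:.3e}")   # the same value in scientific notation
```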
Mapping from sequence space into phenotype space and into fitness values: $S_k = \psi(I_k)$ maps sequence space into phenotype space, and $f_k = f(S_k)$ maps phenotype space into the non-negative numbers.
Evolution of RNA molecules based on Qβ phage
D.R.Mills, R.L.Peterson, S.Spiegelman, An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc.Natl.Acad.Sci.USA 58 (1967), 217-224
S.Spiegelman, An approach to the experimental analysis of precellular evolution. Quart.Rev.Biophys. 4 (1971), 213-253
C.K.Biebricher, Darwinian selection of self-replicating RNA molecules. Evolutionary Biology 16 (1983), 1-52
C.K.Biebricher, W.C.Gardiner, Molecular evolution of RNA in vitro. Biophysical Chemistry 66 (1997), 179-192
The serial transfer technique applied to RNA evolution in vitro. Stock solution: Qβ RNA replicase, ATP, CTP, GTP and UTP, buffer. [Figure: an RNA sample is carried through successive transfers 0, 1, 2, ..., 69, 70 along the time axis.]
The increase in RNA production rate during a serial transfer experiment
A ribozyme switch. E.A.Schultes, D.P.Bartel, One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis δ virus (B)
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Reference for the definition of the intersection and the proof of the intersection theorem
Two neutral walks through sequence space with conservation of structure and catalytic activity
Sequence of mutants from the intersection to both reference ribozymes
Reference for postulation and in silico verification of neutral networks
No new principle will declare itself from below a heap of facts. Sir Peter Medawar, 1985
Selection of the "fittest" or fastest replicating species.
Reactions: $(A) + I_j \xrightarrow{k_j} 2\,I_j$, $j = 1, 2, \ldots, n$; $[A] = a = \text{constant}$
$dx_j/dt = k_j x_j - x_j \Phi = x_j (k_j - \Phi)$; $\Phi = \sum_i k_i x_i$; $\sum_i x_i = 1$
$k_m = \max\{k_j;\ j = 1, 2, \ldots, n\}$: $x_m(t) \to 1$ for $t \to \infty$
$s = (k_{m+1} - k_m)/k_m$
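The selection equation $dx_j/dt = x_j (k_j - \Phi)$ with $\Phi = \sum_i k_i x_i$ has the closed-form solution $x_j(t) = x_j(0)\,e^{k_j t} / \sum_i x_i(0)\,e^{k_i t}$. The sketch below evaluates it; the rate constants and initial fractions are made-up illustration values, not data from the talk.

```python
import math

def selection(x0, k, t):
    """Relative concentrations x_j(t) under dx_j/dt = x_j * (k_j - Phi).

    Uses the closed-form solution; the largest exponent is subtracted
    before exponentiation to avoid overflow at long times.
    """
    exponents = [kj * t for kj in k]
    shift = max(exponents)
    weights = [x * math.exp(e - shift) for x, e in zip(x0, exponents)]
    total = sum(weights)
    return [w / total for w in weights]

k = [1.0, 1.1, 1.2]          # hypothetical rate constants
x0 = [1 / 3, 1 / 3, 1 / 3]   # hypothetical initial fractions
for t in (0, 10, 50, 200):
    print(t, [round(x, 4) for x in selection(x0, k, t)])
```

The species with the largest rate constant approaches a fraction of 1, while the fractions always sum to 1.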
Selection of advantageous mutants in populations of N = 10 000 individuals. [Figure: fraction of the advantageous variant (0 to 1) versus time in generations (0 to 1000) for s = 0.1, s = 0.02 and s = 0.01.]
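The effect of the selective advantage s on the time scale of selection can be illustrated with a deterministic two-variant model. This is a sketch under stated assumptions: discrete generations, a single advantageous mutant at the start (initial fraction 1/N), and no random drift. The slide does not say how its curves were computed, so the numbers below need not reproduce the figure.

```python
def advantageous_fraction(x0: float, s: float, generations: int) -> float:
    """Fraction of a variant with relative fitness 1 + s after some generations.

    The ratio x / (1 - x) is multiplied by (1 + s) in every generation.
    """
    growth = (1.0 + s) ** generations
    return x0 * growth / (1.0 - x0 + x0 * growth)

N = 10_000
x0 = 1.0 / N   # assumption: one advantageous mutant among N individuals
for s in (0.1, 0.02, 0.01):
    curve = [advantageous_fraction(x0, s, t) for t in (0, 200, 400, 600, 800, 1000)]
    print(s, [round(x, 3) for x in curve])
```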
Theory of molecular evolution
M.Eigen, Self-organization of matter and the evolution of biological macromolecules. Naturwissenschaften 58 (1971), 465-523
M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part A: Emergence of the hypercycle. Naturwissenschaften 64 (1977), 541-565
M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part B: The abstract hypercycle. Naturwissenschaften 65 (1978), 7-41
M.Eigen, P.Schuster, The hypercycle. A principle of natural self-organization. Part C: The realistic hypercycle. Naturwissenschaften 65 (1978), 341-369
J.S.McCaskill, A localization threshold for macromolecular quasi-species from continuously distributed replication rates. J.Chem.Phys. 80 (1984), 5194-5205
M.Eigen, J.McCaskill, P.Schuster, The molecular quasispecies. Adv.Chem.Phys. 75 (1989), 149-263
C.Reidys, C.Forst, P.Schuster, Replication and mutation on neutral networks. Bull.Math.Biol. 63 (2001), 57-94
Chemical kinetics of replication and mutation as parallel reactions.
Reactions: $(M) + I_j \xrightarrow{k_j Q_{ij}} I_i + I_j$, $i = 1, 2, \ldots, n$; $\sum_i Q_{ij} = 1$
$Q_{ij} = (1-p)^{\,n - d(i,j)}\, p^{\,d(i,j)}$; $p$ ...... error rate per digit; $d(i,j)$ ...... Hamming distance between $I_i$ and $I_j$
$dx_j/dt = \sum_i k_i Q_{ji} x_i - x_j \Phi$; $\Phi = \sum_i k_i x_i$; $\sum_i x_i = 1$
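The mutation matrix $Q_{ij} = (1-p)^{n-d(i,j)} p^{d(i,j)}$ can be built explicitly for short sequences. The sketch below assumes a two-letter alphabet, for which each column sums to 1 exactly as stated; with four nucleotides and uniform errors, each specific wrong digit would instead carry probability p/3. Here n denotes the chain length, and the function names are illustrative.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mutation_matrix(n: int, p: float):
    """All binary sequences of length n and the matrix Q[i][j].

    Q[i][j] is the probability that copying sequence j yields sequence i,
    given an independent error rate p per digit.
    """
    seqs = ["".join(bits) for bits in product("01", repeat=n)]
    q = [[(1 - p) ** (n - hamming(si, sj)) * p ** hamming(si, sj)
          for sj in seqs] for si in seqs]
    return seqs, q

seqs, q = mutation_matrix(n=3, p=0.05)
column_sums = [sum(q[i][j] for i in range(len(seqs))) for j in range(len(seqs))]
print(seqs)
print([round(c, 12) for c in column_sums])   # each column sums to 1
```

The diagonal element, (1-p)^n, is the probability of an error-free copy.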
The molecular quasispecies in sequence space. [Figure: concentration plotted over sequence space, showing the master sequence surrounded by its mutant cloud.]
The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. Variation is restricted to point mutations. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, that are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks. Neutral networks are represented by graphs in sequence space.
RNA secondary structures and their properties. RNA secondary structures are listings of Watson-Crick and GU wobble base pairs, which are free of knots and pseudoknots. Secondary structures are folding intermediates in the formation of full three-dimensional structures. D.Thirumalai, N.Lee, S.A.Woodson, and D.K.Klimov. Annu.Rev.Phys.Chem. 52:751-762 (2001)
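The definition of a secondary structure as a pseudoknot-free listing of Watson-Crick and GU pairs can be turned into a simple validity check. This is a minimal sketch: positions are 0-based, the function name is hypothetical, and steric constraints such as a minimum hairpin loop size are not enforced.

```python
# Allowed pairs: Watson-Crick (AU, GC) and the GU wobble pair.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def is_secondary_structure(seq: str, pairs) -> bool:
    """Check that `pairs` (tuples i < j) is a valid pair listing for `seq`."""
    used = set()
    for i, j in pairs:
        if not (0 <= i < j < len(seq)):
            return False                 # indices out of range or unordered
        if (seq[i], seq[j]) not in ALLOWED:
            return False                 # not a Watson-Crick or GU pair
        if i in used or j in used:
            return False                 # a base may pair at most once
        used.update((i, j))
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False             # crossing pairs form a pseudoknot
    return True

seq = "GGGAAAUCC"
print(is_secondary_structure(seq, [(0, 8), (1, 7), (2, 6)]))   # nested hairpin stem
print(is_secondary_structure(seq, [(0, 7), (1, 8)]))           # crossing pairs
```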
Definition and formation of the secondary structure of phenylalanyl-tRNA. Sequence (5'-end to 3'-end): GCGGAU UUA GCUC AGDDGGGA GAGC M CCAGA CUGAAYA UCUGG AGMUC CUGUG TPCGAUC CACAG A AUUCGC ACCA. [Figure: the sequence, its secondary structure with nucleotide positions 10 to 70 marked, and the corresponding symbolic notation.]
RNA minimum free energy structures. Efficient algorithms based on dynamic programming are available for the computation of secondary structures for given sequences. Inverse folding algorithms compute sequences for given secondary structures. M.Zuker and P.Stiegler. Nucleic Acids Res. 9:133-148 (1981). Vienna RNA Package: http://www.tbi.univie.ac.at (includes inverse folding, suboptimal structures, kinetic folding, etc.). I.L.Hofacker, W.Fontana, P.F.Stadler, L.S.Bonhoeffer, M.Tacker, and P.Schuster. Mh.Chem. 125:167-188 (1994)
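The dynamic programming idea behind structure prediction can be shown with a deliberately simplified stand-in. The sketch below maximizes the number of base pairs (a Nussinov-style recursion) instead of minimizing free energy; it is not the Zuker-Stiegler algorithm and does not use the Vienna RNA Package. The minimum hairpin loop size of 3 is an assumption.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `seq`.

    best[i][j] holds the optimum for the subsequence seq[i..j]; it is
    filled in order of increasing subsequence length.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                 # case 1: j stays unpaired
            for k in range(i, j - min_loop):       # case 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCC"))
```

Energy-based algorithms follow the same table-filling scheme but score loops and stacked pairs with measured free energy parameters.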
Criterion of Minimum Free Energy. [Figure: example sequences in sequence space mapped into shape space.] Example sequences:
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
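Point mutations as moves in sequence space. [Figure: the sequences ....GC CA UC...., ....GC GA UC...., ....GC CU UC.... and ....GC GU UC....; sequences that differ by a single point mutation are connected at Hamming distance $d_H = 1$, and sequences that differ at two positions lie at $d_H = 2$.]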
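A point mutation moves a sequence to one of its nearest neighbours in sequence space. The sketch below lists all one-error neighbours of a short RNA sequence; the six-letter example is the variable segment shown on the slide, and `point_mutants` is an illustrative name.

```python
def point_mutants(seq: str, alphabet: str = "AUGC"):
    """All sequences that differ from `seq` at exactly one position."""
    return [seq[:i] + base + seq[i + 1:]
            for i in range(len(seq))
            for base in alphabet
            if base != seq[i]]

neighbours = point_mutants("GCCAUC")
print(len(neighbours))           # 3 alternatives at each of the 6 positions
print("GCGAUC" in neighbours)    # one exchange away
print("GCGUUC" in neighbours)    # two exchanges away, hence not a neighbour
```

Each sequence of chain length n over four letters has 3n such neighbours.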
The Hamming distance induces a metric in sequence space. [Figure: two sequences $S_1$ and $S_2$ that share the segments CGTCGTTACAATTTA, GTTATGTGCGAATTC, CAAATT, AAAA and ACAAGAG..... and differ at four positions (G, A, G, T in one sequence; A, C, A, C in the other), so that $d_H(S_1,S_2) = 4$.]
(i) $d_H(S_1,S_1) = 0$
(ii) $d_H(S_1,S_2) = d_H(S_2,S_1)$
(iii) $d_H(S_1,S_3) \le d_H(S_1,S_2) + d_H(S_2,S_3)$
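The three metric properties can be checked numerically on small examples. The sketch below defines the Hamming distance for equal-length sequences and tests properties (i) to (iii) on three short RNA sequences chosen for illustration.

```python
def hamming_distance(s1: str, s2: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(s1, s2))

s1, s2, s3 = "GCCAUC", "GCGAUC", "GCGUUC"
d12 = hamming_distance(s1, s2)
d23 = hamming_distance(s2, s3)
d13 = hamming_distance(s1, s3)

print(hamming_distance(s1, s1) == 0)        # (i) identity
print(d12 == hamming_distance(s2, s1))      # (ii) symmetry
print(d13 <= d12 + d23)                     # (iii) triangle inequality
print(d12, d23, d13)
```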