RNA Bioinformatics Beyond the One Sequence-One Structure Paradigm
Peter Schuster
Institut für Theoretische Chemie, Universität Wien, Austria, and The Santa Fe Institute, Santa Fe, New Mexico, USA
2008 Molecular Informatics and Bioinformatics, Collegium Budapest, 27.–29.03.2008
Web-Page for further information: http://www.tbi.univie.ac.at/~pks
1. Computation of RNA equilibrium structures
2. Inverse folding and neutral networks
3. Evolutionary optimization of structure
4. Suboptimal conformations and kinetic folding
1. Computation of RNA equilibrium structures
Definition of RNA structure
[Figure: chemical formula of the RNA sugar-phosphate backbone (sodium salt) running from the 5'-end to the 3'-end, with nucleobases $N_k$ = A, U, G, C, shown alongside the sequence GCGGAU UUA GCUC AGUUGGGA GAGC CCAGA G CUGAAGA UCUGG AGGUC CUGUG UUCGAUC CACAG A AUUCGC ACCA]
$N = 4^n$, $N_S < 3^n$
Criterion: minimum free energy (mfe)
Rules: _ ( _ ) _ ∈ { AU, CG, GC, GU, UA, UG }
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
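The symbolic notation and the pairing rules can be illustrated with a minimal sketch. It assumes the common dot-bracket convention (a dot for an unpaired base, matching parentheses for a base pair); the function names `pair_table` and `is_compatible` are hypothetical and chosen only for this illustration.

```python
# Allowed base pairs, as listed on the slide.
ALLOWED = {"AU", "CG", "GC", "GU", "UA", "UG"}

def pair_table(structure):
    """Map each opening position to its closing position (0-based)."""
    stack, pairs = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs[stack.pop()] = pos
        elif ch != ".":
            raise ValueError("unexpected symbol %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs

def is_compatible(seq, structure):
    """True if every base pair of the structure is an allowed pair in seq."""
    if len(seq) != len(structure):
        return False
    return all(seq[i] + seq[j] in ALLOWED
               for i, j in pair_table(structure).items())
```

A sequence that is compatible with a structure need not fold into it; compatibility only says that the structure is among the candidates considered by the mfe criterion.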
Conventional definition of RNA secondary structures
H-type pseudoknot
Counting the numbers of structures of chain length n → n+1:
$S_{n+1} = S_n + \sum_{j=1}^{n-1} S_j \cdot S_{n-j-1}$
M.S. Waterman, T.F. Smith (1978) Math. Biosci. 42:257-266
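The counting recursion can be evaluated directly. The sketch below assumes the initial values $S_0 = S_1 = S_2 = 1$ (only the open chain is possible for such short chains) and reads the recursion as $S_{n+1} = S_n + \sum_{j=1}^{n-1} S_j S_{n-j-1}$; the function name `count_structures` is hypothetical.

```python
def count_structures(n_max):
    """Return [S_0, ..., S_n_max] for the Waterman-Smith counting recursion."""
    # Assumed initial values: chains of length 0, 1, 2 only admit the open chain.
    s = [1] * (n_max + 1)
    for n in range(2, n_max):
        # Base n+1 is either unpaired (S_n) or paired, which splits the
        # remaining bases into two independent segments.
        s[n + 1] = s[n] + sum(s[j] * s[n - j - 1] for j in range(1, n))
    return s
```

For example, `count_structures(8)` gives the counts 1, 1, 1, 2, 4, 8, 17, 37, 82 for chain lengths 0 to 8.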
Restrictions on physically acceptable mfe-structures: $\lambda \geq 3$ and $\sigma \geq 2$
Size restriction of elements: (i) hairpin loop: $n_{\text{loop}} \geq \lambda$, (ii) stack: $n_{\text{stack}} \geq \sigma$
$S_{m+1} = \Xi_{m+1} + \Phi_{m-1}$
$\Xi_{m+1} = S_m + \sum_{k=\lambda+2\sigma-2}^{m-2} \Phi_k \cdot S_{m-k-1}$
$\Phi_{m+1} = \sum_{k=\sigma-1}^{\lfloor (m-\lambda+1)/2 \rfloor} \Xi_{m-2k+1}$
$S_n$ = # structures of a sequence with chain length n
Recursion formula for the number of physically acceptable stable structures
I.L. Hofacker, P. Schuster, P.F. Stadler. 1998. Discr. Appl. Math. 89:177-207
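A minimal sketch of how this recursion might be evaluated. It rests on assumptions that are not stated on the slide: the interpretation that $\Phi_k$ counts structures closed by an outer stack with $k$ interior bases, and the initial values $S_n = \Xi_n = 1$ for $n < \lambda + 2\sigma$ and $\Phi_k = 0$ for $k < \lambda + 2\sigma - 2$. The name `count_stable_structures` is hypothetical.

```python
def count_stable_structures(n_max, lam=3, sigma=2):
    """Return [S_0, ..., S_n_max] with hairpin loops >= lam and stacks >= sigma."""
    size = n_max + 1
    s = [1] * size    # assumed: only the open chain below the smallest hairpin
    xi = [1] * size   # structures not closed by a pair between first and last base
    phi = [0] * size  # closed structures, indexed by the number of interior bases
    k_min = lam + 2 * sigma - 2
    for n in range(lam + 2 * sigma, n_max + 1):
        j = n - 2  # interior length of a closed structure on n bases
        phi[j] = sum(xi[j - 2 * k]
                     for k in range(sigma - 1, (j - lam) // 2 + 1))
        m = n - 1
        xi[n] = s[m] + sum(phi[k] * s[m - k - 1] for k in range(k_min, m - 1))
        s[n] = xi[n] + phi[j]
    return s
```

With `lam=1` and `sigma=1` the sketch reproduces the unrestricted counts 1, 1, 1, 2, 4, 8 for chain lengths 0 to 5; with the default restrictions the first hairpin appears at chain length 7.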
RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA
Biophysical chemistry: thermodynamics and kinetics; empirical parameters
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
RNA structure of minimal free energy
Sequence, structure, and design
The minimum free energy structures on a discrete space of conformations
[Figure: free energy $\Delta G_0$ of the conformations $S_0^{(h)}, S_1^{(h)}, \dots, S_9^{(h)}$; $S_0^{(h)}$ is the minimum of free energy, $S_1^{(h)}$ to $S_9^{(h)}$ are suboptimal conformations]
Elements of RNA secondary structures as used in free energy calculations
$\Delta G_0^{300} = \sum_{\text{stacks of base pairs}} g_{ij,kl} + \sum_{\text{hairpin loops}} h(n_l) + \sum_{\text{bulges}} b(n_b) + \sum_{\text{internal loops}} i(n_i) + \cdots$
Maximum matching
An example of a dynamic programming computation of the maximum number of base pairs. Back tracking yields the structure(s).

 i\j    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
        G  G  C  G  C  G  C  C  C  G  G  C  G  C  C
 1 G    *  *  1  1  1  1  2  3  3  3  4  4  5  6  6
 2 G       *  *  0  1  1  2  2  2  3  3  4  4  5  6
 3 C          *  *  0  1  1  1  2  3  3  3  4  5  5
 4 G             *  *  0  1  1  2  2  2  3  4  5  5
 5 C                *  *  0  1  1  2  2  3  4  4  4
 6 G                   *  *  1  1  1  2  3  3  3  4
 7 C                      *  *  0  1  2  2  2  2  3
 8 C                         *  *  1  1  1  2  2  2
 9 C                            *  *  1  1  2  2  2
10 G                               *  *  1  1  1  2
11 G                                  *  *  0  1  1
12 C                                     *  *  0  1
13 G                                        *  *  1
14 C                                           *  *
15 C                                              *

[Figure: base j+1 pairs with base k, which splits the stretch i ... j into the segments [i, k-1] and [k+1, j] with the entries $X_{i,k-1}$ and $X_{k+1,j}$]
$X_{i,j+1} = \max \{ X_{i,j},\ \max_{i \leq k \leq j-1} [ (X_{i,k-1} + 1 + X_{k+1,j}) \, \rho_{k,j+1} ] \}$
Minimum free energy computations are based on empirical energies
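The maximum matching recursion can be sketched as follows. The code assumes that $\rho_{k,j+1}$ equals 1 when bases k and j+1 can form one of the pairs AU, CG, GC, GU, UA, UG and 0 otherwise, and that at least one unpaired base separates the partners (the bound k ≤ j-1). Indices are 0-based, so `x[0][n-1]` corresponds to $X_{1,n}$; the function name `max_matching` is hypothetical.

```python
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def max_matching(seq, min_loop=1):
    """Fill the table x[i][j] = maximum number of base pairs in seq[i..j]."""
    n = len(seq)
    x = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span                # the newly added last base
            best = x[i][j - 1]          # case 1: base j stays unpaired
            for k in range(i, j - min_loop):
                if seq[k] + seq[j] in PAIRS:   # case 2: base j pairs with k
                    left = x[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + x[k + 1][j - 1])
            x[i][j] = best
    return x
```

For the example sequence GGCGCGCCCGGCGCC the top-right entry `x[0][14]` is 6, in agreement with the table. Back tracking through the table, which is omitted here, would yield the structure(s).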
2. Inverse folding and neutral networks
RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Inverse folding of RNA: biotechnology, design of biomolecules with predefined structures and functions
Inverse folding algorithm: iterative determination of a sequence for the given secondary structure
RNA structure of minimal free energy
Sequence, structure, and design
Compatibility of sequences and structures
Inverse folding algorithm
$I_0 \to I_1 \to I_2 \to I_3 \to I_4 \to \dots \to I_k \to I_{k+1} \to \dots \to I_t$
$S_0 \to S_1 \to S_2 \to S_3 \to S_4 \to \dots \to S_k \to S_{k+1} \to \dots \to S_t$
$I_{k+1} = M_k(I_k)$ and $\Delta d_S(S_k, S_{k+1}) = d_S(S_{k+1}, S_t) - d_S(S_k, S_t) < 0$
M ... base or base pair mutation operator
$d_S(S_i, S_j)$ ... distance between the two structures $S_i$ and $S_j$
'Unsuccessful trial' ... termination after n steps
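The iteration can be sketched as an adaptive walk that accepts a mutation only if the structure distance to the target decreases. Everything beyond that acceptance rule is an assumption made for illustration: `toy_fold` is a maximum matching folder with back tracking that stands in for a real mfe folding routine, `distance` is the Hamming distance between dot-bracket strings, and only point mutations are used (base pair mutations are left out).

```python
import random

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def toy_fold(seq, min_loop=3):
    """Toy stand-in for mfe folding: maximum matching plus back tracking."""
    n = len(seq)
    x = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = x[i][j - 1]
            for k in range(i, j - min_loop):
                if seq[k] + seq[j] in PAIRS:
                    left = x[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + x[k + 1][j - 1])
            x[i][j] = best
    struct, stack = ["."] * n, [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if x[i][j] == x[i][j - 1]:      # base j unpaired
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = x[i][k - 1] if k > i else 0
            if seq[k] + seq[j] in PAIRS and x[i][j] == left + 1 + x[k + 1][j - 1]:
                struct[k], struct[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return "".join(struct)

def distance(s1, s2):
    """Hamming distance between two dot-bracket strings of equal length."""
    return sum(a != b for a, b in zip(s1, s2))

def inverse_fold(target, max_steps=500, seed=0):
    """Adaptive walk: keep a point mutation only if the distance decreases."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGU") for _ in target]
    d = distance(toy_fold("".join(seq)), target)
    history = [d]
    for _ in range(max_steps):          # 'unsuccessful trial' after max_steps
        if d == 0:
            break
        trial = seq[:]
        trial[rng.randrange(len(trial))] = rng.choice("ACGU")
        d_new = distance(toy_fold("".join(trial)), target)
        if d_new < d:
            seq, d = trial, d_new
            history.append(d)
    return "".join(seq), d, history
```

A call such as `inverse_fold("(((....)))")` returns the final sequence, its remaining distance to the target, and the strictly decreasing series of distances. A remaining distance greater than zero corresponds to an unsuccessful trial.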
Approach to the target structure $S_k$ in the inverse folding algorithm
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
Space of genotypes: $I = \{ I_1, I_2, I_3, I_4, \dots, I_N \}$; Hamming metric
Space of phenotypes: $S = \{ S_1, S_2, S_3, S_4, \dots, S_M \}$; metric (not required)
$N \gg M$
$\psi(I_j) = S_k$
$G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$
A mapping $\psi$ and its inversion
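The inversion of the mapping can be illustrated by exhaustive enumeration for a very short chain. The map `toy_psi` below is a purely hypothetical stand-in for folding (it closes the two chain ends if they can pair and returns the open chain otherwise); `neutral_sets` collects, for every structure, all sequences that map onto it.

```python
from collections import defaultdict
from itertools import product

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def toy_psi(seq):
    """Hypothetical toy map: pair the chain ends if allowed, else open chain."""
    if len(seq) >= 5 and seq[0] + seq[-1] in PAIRS:
        return "(" + "." * (len(seq) - 2) + ")"
    return "." * len(seq)

def neutral_sets(psi, n, alphabet="ACGU"):
    """Group all sequences of length n by the structure they map onto."""
    sets = defaultdict(list)
    for letters in product(alphabet, repeat=n):
        seq = "".join(letters)
        sets[psi(seq)].append(seq)
    return dict(sets)
```

For n = 5 the toy map sends all 1024 sequences onto just two structures, 384 onto the closed one and 640 onto the open chain, a small-scale picture of many more genotypes than phenotypes.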
3. Evolutionary optimization of structure
Structure of randomly chosen initial sequence; Phenylalanyl-tRNA as target structure
Evolution in silico W. Fontana, P. Schuster, Science 280 (1998), 1451-1455
Evolution of RNA molecules as a Markov process and its analysis by means of the relay series
Replication rate constant: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$, with $\Delta d_S^{(k)} = d_H(S_k, S_\tau)$
Selection constraint: population size, N = # RNA molecules, is controlled by the flow: $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
Mutation rate: p = 0.001 / site × replication
The flow reactor as a device for studies of evolution in vitro and in silico
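The three ingredients of the flow reactor (replication rate, mutation, and flow-controlled population size) can be sketched as small helper functions. The sketch assumes that the rate has the form gamma / (alpha + distance) with the Hamming distance between dot-bracket strings; the default values of `gamma` and `alpha`, and all function names, are hypothetical and not taken from the talk.

```python
import random

def replication_rate(structure, target, gamma=1.0, alpha=0.01):
    """Rate that grows as the structure approaches the target structure."""
    d = sum(a != b for a, b in zip(structure, target))  # Hamming distance
    return gamma / (alpha + d)

def replicate(seq, p=0.001, rng=random):
    """Copy a sequence; each site mutates to a different base with probability p."""
    out = []
    for base in seq:
        if rng.random() < p:
            out.append(rng.choice([b for b in "ACGU" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def dilute(population, n_target, rng=random):
    """Flow: remove randomly chosen molecules until n_target remain."""
    while len(population) > n_target:
        population.pop(rng.randrange(len(population)))
    return population
```

In a full simulation, molecules would be chosen for replication with probability proportional to `replication_rate`, copied with `replicate`, and the population trimmed with `dilute`.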
In silico optimization in the flow reactor: Evolutionary Trajectory
28 neutral point mutations during a long quasi-stationary epoch
Transition inducing point mutations change the molecular structure
Neutral point mutations leave the molecular structure unchanged
Neutral genotype evolution during phenotypic stasis
A sketch of optimization on neutral networks
Randomly chosen initial structure; Phenylalanyl-tRNA as target structure
Application of molecular evolution to problems in biotechnology
4. Suboptimal conformations and kinetic folding
RNA secondary structures derived from a single sequence
An algorithm for the computation of all suboptimal structures of RNA molecules using the same concept for retrieval as applied in the sequence alignment algorithm by M.S. Waterman and T.F. Smith. Math.Biosci. 42:257-266, 1978.
An algorithm for the computation of RNA folding kinetics