From Sequences to Structures and Back: The Vienna RNA Package
Peter Schuster
Institut für Theoretische Chemie, Universität Wien, Austria, and The Santa Fe Institute, Santa Fe, New Mexico, USA
Siemens PSE Life Science Symposium, Brno, 14.03.2006
Web-Page for further information: http://www.tbi.univie.ac.at/~pks
Definition of RNA structure
[Figure: chemical formula of the RNA sugar-phosphate backbone from the 5'-end to the 3'-end carrying the nucleobases N_1, N_2, N_3, N_4, ..., with N_k ∈ {A, U, G, C}, next to a sequence drawn in segments from the 5'-end to the 3'-end: GCGGAU UUA GCUC AGUUGGGA GAGC CCAGA G CUGAAGA UCUGG AGGUC CUGUG UUCGAUC CACAG A AUUCGC ACCA]
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
Number of sequences: $N = 4^n$; number of structures: $N_S < 3^n$
Criterion: minimum free energy (mfe)
Rules: _ ( _ ) _ ∈ {AU, CG, GC, GU, UA, UG}
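The symbolic notation and the pairing rule on this slide can be illustrated with a short Python sketch. It assumes the common dot-bracket convention ("." for an unpaired base, matching "(" and ")" for a base pair); the function names `parse_dot_bracket` and `is_compatible` are hypothetical and are not part of the Vienna RNA Package.

```python
# Allowed base pairs, taken from the rule on the slide.
ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def parse_dot_bracket(structure):
    """Return the sorted list of base pairs (i, j), 0-based, with i < j."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every pair in the structure is one of the six allowed pairs."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


print(parse_dot_bracket("((...))"))
print(is_compatible("GGAAACC", "((...))"))
```

Because the notation uses three symbols per position and most strings are unbalanced, the number of valid structures stays below 3^n, while the number of sequences is 4^n.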
Conventional definition of RNA secondary structures
Restrictions on physically acceptable mfe-structures: λ ≥ 3 and σ ≥ 2 (λ: hairpin loop size, σ: stack length)
Vienna RNA Package: RNAfold, RNAdistance, RNAinverse, RNAduplex, RNAsubopt, RNAeval, RNAheat, RNAcofold, RNApdist, RNAalifold, RNAplot
http://www.tbi.univie.ac.at/RNA/
Sequence, structure, and design
RNA folding: RNA sequence → RNA structure of minimal free energy
Input to RNA folding: biophysical chemistry (thermodynamics and kinetics), empirical parameters
Uses of RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
The minimum free energy structures on a discrete space of conformations
[Figure: a folded RNA sequence drawn from the 5'-end to the 3'-end next to a free energy axis, with the suboptimal conformations S_1^(h) to S_9^(h) lying above the minimum of free energy S_0^(h)]
Elements of RNA secondary structures as used in free energy calculations
[Figure: secondary structure with labeled elements: stack, hairpin loop, bulge, internal loop, multiloop, joint, free end]

$$\Delta G_0^{300} = \sum_{\text{stacks of base pairs}} g_{ij,kl} + \sum_{\text{hairpin loops}} h(n_l) + \sum_{\text{bulges}} b(n_b) + \sum_{\text{internal loops}} i(n_i) + \cdots$$
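The additive energy model on this slide can be sketched in Python as a loop decomposition followed by a sum over structural elements. This is a simplified illustration, not the energy evaluation used by the Vienna RNA Package: the function names are hypothetical, the stacking term is reduced to a constant instead of the pair-dependent g_{ij,kl}, free ends and joints are ignored, and the numbers in `PLACEHOLDER_PARAMS` are made up for demonstration only.

```python
def pair_table(structure):
    """table[i] is the partner of position i, or -1 if i is unpaired."""
    table, stack = [-1] * len(structure), []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            partner = stack.pop()
            table[partner], table[pos] = pos, partner
    return table


def decompose(structure):
    """Classify the loop closed by each base pair as (kind, unpaired_count)."""
    table, loops = pair_table(structure), []
    for i, j in enumerate(table):
        if j <= i:  # unpaired position or closing partner of a pair
            continue
        branches, unpaired, k = [], 0, i + 1
        while k < j:
            if table[k] > k:  # an enclosed pair: record it and jump over it
                branches.append((k, table[k]))
                k = table[k] + 1
            else:
                unpaired += 1
                k += 1
        if not branches:
            kind = "hairpin"
        elif len(branches) > 1:
            kind = "multiloop"
        elif unpaired == 0:
            kind = "stack"
        elif branches[0][0] == i + 1 or branches[0][1] == j - 1:
            kind = "bulge"
        else:
            kind = "internal"
        loops.append((kind, unpaired))
    return loops


# PLACEHOLDER values (kcal/mole), invented for illustration, not measured data.
PLACEHOLDER_PARAMS = {
    "stack": lambda n: -1.0,
    "hairpin": lambda n: 4.0 + 0.1 * n,
    "bulge": lambda n: 3.0 + 0.1 * n,
    "internal": lambda n: 2.0 + 0.1 * n,
    "multiloop": lambda n: 3.0 + 0.1 * n,
}


def total_energy(structure, params=PLACEHOLDER_PARAMS):
    """Sum the contributions of all loops, mirroring the additive formula."""
    return sum(params[kind](size) for kind, size in decompose(structure))


print(decompose("((.((...))))"))
print(total_energy("(((...)))"))
```

Keeping the parameters in a dictionary of functions makes it easy to swap the placeholder values for a real parameter table without touching the decomposition.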
Sequence, structure, and design
RNA folding: RNA sequence → RNA structure of minimal free energy (structural biology, spectroscopy of biomolecules, understanding molecular function)
Inverse folding of RNA: RNA structure → RNA sequence, by iterative determination of a sequence for the given secondary structure with the inverse folding algorithm (biotechnology, design of biomolecules with predefined structures and functions)
Inverse folding algorithm

$$I_0 \to I_1 \to I_2 \to I_3 \to I_4 \to \dots \to I_k \to I_{k+1} \to \dots \to I_t$$
$$S_0 \to S_1 \to S_2 \to S_3 \to S_4 \to \dots \to S_k \to S_{k+1} \to \dots \to S_t$$
$$I_{k+1} = M_k(I_k) \quad \text{and} \quad \Delta d_S(S_k, S_{k+1}) = d_S(S_{k+1}, S_t) - d_S(S_k, S_t) < 0$$

M ... base or base pair mutation operator
d_S(S_i, S_j) ... distance between the two structures S_i and S_j
'Unsuccessful trial' ... termination after n steps
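The iteration on this slide can be sketched as an adaptive walk in Python. This is a minimal illustration under stated assumptions, not the RNAinverse implementation: `toy_fold` is a hypothetical stand-in that maximizes the number of base pairs (Nussinov-style) instead of minimizing free energy, the structure distance d_S is taken to be the base pair distance, and all function names are invented for this example.

```python
import random

PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
PAIR_SET = set(PAIRS)


def toy_fold(seq, min_loop=3):
    """Stand-in folder: maximize the number of base pairs, not the mfe."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case: position i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIR_SET:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + best[i + 1][k - 1] + right)
            best[i][j] = score
    structure, todo = ["."] * n, [(0, n - 1)]
    while todo:  # traceback of one optimal structure
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            todo.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            right = best[k + 1][j] if k < j else 0
            if (seq[i], seq[k]) in PAIR_SET and best[i][j] == 1 + best[i + 1][k - 1] + right:
                structure[i], structure[k] = "(", ")"
                todo.extend([(i + 1, k - 1), (k + 1, j)])
                break
    return "".join(structure)


def pair_set(structure):
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def distance(s1, s2):
    """Base pair distance: number of pairs present in exactly one structure."""
    return len(pair_set(s1) ^ pair_set(s2))


def inverse_fold(target, max_steps=300, seed=0):
    """Accept a mutation only if it strictly reduces the distance to the target."""
    rng = random.Random(seed)
    partner = {}
    for i, j in pair_set(target):
        partner[i], partner[j] = j, i
    seq = [rng.choice("ACGU") for _ in target]
    dist = distance(toy_fold("".join(seq)), target)
    history = [dist]
    for _ in range(max_steps):  # an unsuccessful trial stops after max_steps
        if dist == 0:
            break
        trial, pos = seq[:], rng.randrange(len(seq))
        if pos in partner:  # base pair mutation on a paired target position
            i, j = sorted((pos, partner[pos]))
            trial[i], trial[j] = rng.choice(PAIRS)
        else:  # single base mutation
            trial[pos] = rng.choice("ACGU")
        new_dist = distance(toy_fold("".join(trial)), target)
        if new_dist < dist:
            seq, dist = trial, new_dist
            history.append(dist)
    return "".join(seq), dist, history


sequence, final_distance, history = inverse_fold("((((....))))")
print(sequence, final_distance, history)
```

Passing a real minimum free energy folder in place of `toy_fold` would turn the same loop into the procedure the slide describes; the walk may still end as an unsuccessful trial when no single mutation improves the distance.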
Approach to the target structure S_k in the inverse folding algorithm
[Figure labels: initial trial sequences, intermediate compatible sequences, stop sequence of an unsuccessful trial, target sequence, target structure S_k]
Inverse folding of RNA secondary structures
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
[Figure: 1st to 5th trial under the minimum free energy criterion]
Base pair probability derived from the partition function Q(T)

Base pair probability:
$$p_{ij}(T) = \frac{\sum_k \gamma_k(T)\, a_{ij}(S_k)}{Q(T)}, \quad \text{with } \gamma_k(T) = g_k\, e^{-(\varepsilon_k - \varepsilon_0)/kT}$$
$$Q(T) = \sum_k \gamma_k(T)$$

Base pairing entropy:
$$s_i = -\sum_j p_{ij} \ln p_{ij}, \quad \text{with } p_{ii} = 1 - \sum_{j \neq i} p_{ij}$$
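For a small, explicitly listed ensemble of structures the quantities on this slide can be computed directly. The Python sketch below assumes that a_ij(S_k) equals 1 if structure S_k contains the pair (i, j) and 0 otherwise, that all degeneracies g_k equal 1, and that energies are given in kcal/mole at 37 °C; the function names and the two-structure ensemble are hypothetical. Real programs obtain Q(T) by dynamic programming rather than by enumeration.

```python
import math

RT = 0.0019872 * 310.15  # gas constant in kcal/(mole K) times 310.15 K (assumed)


def pairs(structure):
    stack, found = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            found.append((stack.pop(), pos))
    return found


def base_pair_probabilities(ensemble):
    """ensemble: list of (dot-bracket structure, energy). Returns matrix p[i][j]."""
    n = len(ensemble[0][0])
    e0 = min(energy for _, energy in ensemble)
    weights = [math.exp(-(energy - e0) / RT) for _, energy in ensemble]
    q = sum(weights)  # partition function relative to the ground state
    prob = [[0.0] * n for _ in range(n)]
    for (structure, _), weight in zip(ensemble, weights):
        for i, j in pairs(structure):
            prob[i][j] += weight / q
            prob[j][i] += weight / q
    for i in range(n):  # diagonal: probability that position i is unpaired
        prob[i][i] = max(0.0, 1.0 - sum(prob[i][j] for j in range(n) if j != i))
    return prob


def pairing_entropy(prob):
    """s_i = -sum_j p_ij ln p_ij, skipping zero entries."""
    return [sum(-p * math.log(p) for p in row if p > 0.0) for row in prob]


# Hypothetical ensemble: two structures with equal energy.
toy_ensemble = [("(...)", 0.0), (".....", 0.0)]
probabilities = base_pair_probabilities(toy_ensemble)
print(probabilities[0][4], pairing_entropy(probabilities)[0])
```

A position with entropy close to zero is paired (or unpaired) in nearly all structures that carry weight, which is why the entropy serves as a reliability measure.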
Example of a small RNA molecule (n = 28) with two low-lying suboptimal conformations which contribute substantially to the partition function
5'-UUGGAGUACACAACCUGUACACUCUUUC-3'
"Dot plot" of the minimum free energy structure (lower triangle) and the partition function (upper triangle) of a small RNA molecule (n = 28) with low energy suboptimal configurations
[Figure: dot plot with the sequence UUGGAGUACACAACCUGUACACUCUUUC on the axes, and three structure drawings: minimum free energy configuration, ΔG_0 = −5.39 kcal/mole; first suboptimal configuration, ΔE_{0→1} = 0.50 kcal/mole; second suboptimal configuration, ΔE_{0→2} = 0.55 kcal/mole]
Phenylalanyl-tRNA as an example for the computation of the partition function
tRNA^phe without modified bases
[Figure: first suboptimal configuration, ΔE_{0→1} = 0.43 kcal/mole]
tRNA^phe with modified bases
[Figure: sequence GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA on the axes; first suboptimal configuration, ΔE_{0→1} = 0.94 kcal/mole]
Reliability measures for structure prediction

Base pair probability:
$$p_{ij}(T) = \frac{\sum_k \gamma_k(T)\, a_{ij}(S_k)}{Q(T)}, \quad \text{with } \gamma_k(T) = g_k\, e^{-(\varepsilon_k - \varepsilon_0)/kT}$$
$$Q(T) = \sum_k \gamma_k(T)$$

Base pairing entropy:
$$s_i = -\sum_j p_{ij} \ln p_{ij}, \quad \text{with } p_{ii} = 1 - \sum_{j \neq i} p_{ij}$$
Base pairing entropy and base pair probability in a model RNA molecule
Reliability of structure prediction in tRNA^phe
[Figure: base pairing entropy and base pair probability along the nucleotides, without modification and with modification]
Reliability of structure prediction in 5S ribosomal RNA
[Figure: base pairing entropy and base pair probability compared with the native structure]
Formulation of kinetic RNA folding as a stochastic process

The Folding Algorithm
A sequence I specifies an energy ordered set of compatible structures S(I):
$$\mathcal{S}(I) = \{S_0, S_1, \dots, S_m, O\}$$
A trajectory T_k(I) is a time ordered series of structures in S(I). A folding trajectory is defined by starting with the open chain O and ending with the global minimum free energy structure S_0 or a metastable structure S_k which represents a local energy minimum:
$$T_0(I) = \{O, S(1), \dots, S(t-1), S(t), S(t+1), \dots, S_0\}$$
$$T_k(I) = \{O, S(1), \dots, S(t-1), S(t), S(t+1), \dots, S_k\}$$

Master equation
$$\frac{dP_k}{dt} = \sum_{i=0}^{m+1} \bigl(P_{ik}(t) - P_{ki}(t)\bigr) = \sum_{i=0}^{m+1} k_{ik} P_i - P_k \sum_{i=0}^{m+1} k_{ki}, \qquad k = 0, 1, \dots, m+1$$
Transition probabilities P_ij(t) = Prob{S_i → S_j} are defined by
$$P_{ij}(t) = P_i(t)\, k_{ij} = P_i(t)\, \exp(-\Delta G_{ij}/2RT) / \Sigma_i$$
$$P_{ji}(t) = P_j(t)\, k_{ji} = P_j(t)\, \exp(-\Delta G_{ji}/2RT) / \Sigma_j$$
$$\Sigma_k = \sum_{i=1,\, i \neq k}^{m+2} \exp(-\Delta G_{ki}/2RT)$$
The symmetric rule for transition rate parameters is due to Kawasaki (K. Kawasaki, Diffusion constants near the critical point for time dependent Ising models. Phys. Rev. 145:224-230, 1966).
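The master equation with Kawasaki-type transition rates can be integrated numerically for a small set of states. The Python sketch below is only an illustration under stated assumptions: every state may move to every other state, ΔG_ij is taken as G_j − G_i, the energies are hypothetical values in kcal/mole, time is in arbitrary units, and a plain explicit Euler step stands in for a proper ODE solver or a stochastic simulation. The function names are invented for this example.

```python
import math

RT = 0.0019872 * 310.15  # kcal/mole at 37 °C (assumed temperature)


def kawasaki_rates(energies):
    """k_ij = exp(-(G_j - G_i) / 2RT) / Sigma_i, with k_ii = 0."""
    m = len(energies)
    rates = []
    for i in range(m):
        raw = [math.exp(-(energies[j] - energies[i]) / (2.0 * RT)) if j != i else 0.0
               for j in range(m)]
        norm = sum(raw)  # Sigma_i: sum over all states other than i
        rates.append([value / norm for value in raw])
    return rates


def derivative(p, rates):
    """dP_k/dt = sum_i k_ik P_i - P_k sum_i k_ki."""
    m = len(p)
    return [sum(rates[i][k] * p[i] for i in range(m))
            - p[k] * sum(rates[k][i] for i in range(m))
            for k in range(m)]


def integrate(p0, rates, dt=0.01, steps=1000):
    """Explicit Euler integration of the master equation."""
    p = list(p0)
    for _ in range(steps):
        p = [value + dt * change for value, change in zip(p, derivative(p, rates))]
    return p


# Hypothetical energies: open chain O first, then three folded structures.
energies = [0.0, -2.0, -3.5, -5.0]
rates = kawasaki_rates(energies)
final = integrate([1.0, 0.0, 0.0, 0.0], rates)  # start in the open chain
print([round(value, 4) for value in final])
```

Because gain and loss terms cancel when summed over all states, the total probability is conserved at every step, which is a useful sanity check for any implementation of the master equation.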
Corresponds to base pair distance : d P ( S 1 , S 2 ) Base pair formation and base pair cleavage moves for nucleation and elongation of stacks
Base pair closure, opening and shift correspond to Hamming distance: d_H(S_1, S_2)
Base pair shift move of class 1: shift inside internal loops or bulges
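The two structure distances named on these slides can be written down in a few lines of Python. The sketch assumes that d_P counts the base pairs present in exactly one of the two structures and that d_H is the Hamming distance between the dot-bracket strings; the function names are hypothetical.

```python
def pair_set(structure):
    """Set of base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def base_pair_distance(s1, s2):
    """d_P: number of base pairs found in exactly one of the two structures."""
    return len(pair_set(s1) ^ pair_set(s2))


def hamming_distance(s1, s2):
    """d_H: number of positions at which the dot-bracket strings differ."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


# Closing one base pair: one move in the formation/cleavage move set.
print(base_pair_distance(".....", "(...)"), hamming_distance(".....", "(...)"))
# Shifting one side of a base pair: one pair removed and one pair added.
print(base_pair_distance("(..).", "(...)"), hamming_distance("(..).", "(...)"))
```

In this example a shift costs two units of base pair distance but the same Hamming distance as a single closure, which is consistent with pairing the shift move set with d_H.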
Search for local minima in conformation space
[Figure: free energy of the suboptimal conformations S_1^(h), S_2^(h), S_5^(h), S_6^(h), S_7^(h), S_9^(h); S_h marks a local minimum]
Definition of a 'barrier tree'
[Figure: free energy along a "reaction coordinate" with local minima S_k and a saddle point T_k, shown next to the corresponding "barrier tree"]
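Local minima and the saddle heights that a barrier tree records can be illustrated on a tiny hand-made landscape. The Python sketch below assumes that a local minimum is a state with lower energy than all of its neighbors and that the saddle between two states is the lowest achievable maximum energy along any connecting path; the five-state landscape, its energies, and the function names are hypothetical. It is not the algorithm of any particular program.

```python
import heapq


def local_minima(energy, neighbors):
    """States whose energy is lower than that of every neighbor."""
    return [state for state in energy
            if all(energy[state] < energy[other] for other in neighbors[state])]


def saddle_height(energy, neighbors, start, goal):
    """Lowest possible maximum energy along any path from start to goal."""
    best = {start: energy[start]}
    heap = [(energy[start], start)]
    while heap:
        height, state = heapq.heappop(heap)
        if state == goal:
            return height
        if height > best.get(state, float("inf")):
            continue  # outdated heap entry
        for other in neighbors[state]:
            candidate = max(height, energy[other])
            if candidate < best.get(other, float("inf")):
                best[other] = candidate
                heapq.heappush(heap, (candidate, other))
    return None  # goal not reachable


# Hypothetical landscape: a chain of five conformations a-b-c-d-e.
energy = {"a": -5.0, "b": -2.0, "c": -4.0, "d": -1.0, "e": -3.0}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

print(local_minima(energy, neighbors))
print(saddle_height(energy, neighbors, "c", "a") - energy["c"])
```

The printed difference is the barrier for leaving the minimum c toward a; collecting such saddle heights for all pairs of minima gives the information from which a barrier tree is drawn.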