Lecture 7: RNA folding Chapter 6 – Problem 6.51 in Jones and Pevzner and the Turner model Fall 2019 September 19, 2019
RNA Basics
• RNA bases: A, C, G, U
• Canonical base pairs:
  ◦ A-U
  ◦ G-C
  ◦ G-U "wobble" pairing
• Each base can pair with only one other base.
Image: http://www.bioalgorithms.info/
RNA Structural Levels
[Figure: primary structure (sequence AAUCG...CUUCUUCCA), secondary structure, and tertiary structure]
RNA Secondary Structure
[Figure: secondary structure elements: pseudoknot, stack, internal loop, single-stranded region, bulge loop, junction (multiloop), hairpin loop]
Base Pair Maximization
[Figure: example structure with bases labeled A G C G C A U C]
Zuker (1981) Nucleic Acids Research 9(1) 133-149
Base Pair Maximization – Dynamic Programming Algorithm
Simple Example: Maximizing Base Pairing
Base Pair Maximization – Dynamic Programming Algorithm
S(i,j) is the folding of the subsequence of the RNA strand from index i to index j that yields the highest number of base pairs.
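The definition of S(i,j) can be turned into a small dynamic program. The sketch below is not taken from the slides: it assumes the standard Nussinov-style recurrence (base i is either unpaired, or paired with some base k, which splits the interval into two independent subproblems), allows the canonical pairs A-U, G-C and G-U, and imposes no minimum hairpin loop size. The names `can_pair` and `max_base_pairs` are illustrative.

```python
# Allowed pairs (assumption: canonical pairs plus the G-U wobble pair).
PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def can_pair(a, b):
    """True if bases a and b form an allowed pair."""
    return frozenset((a, b)) in PAIRS


def max_base_pairs(seq):
    """Return (maximum pair count, sorted list of (i, k) pairs) without pseudoknots."""
    n = len(seq)
    # S[i][j] = maximum number of base pairs in seq[i..j], inclusive.
    S = [[0] * n for _ in range(n)]

    def get(i, j):
        # Empty or out-of-range intervals contribute no pairs.
        return S[i][j] if 0 <= i <= j < n else 0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = get(i + 1, j)  # case 1: base i stays unpaired
            for k in range(i + 1, j + 1):  # case 2: base i pairs with base k
                if can_pair(seq[i], seq[k]):
                    best = max(best, 1 + get(i + 1, k - 1) + get(k + 1, j))
            S[i][j] = best

    # Traceback: recover one optimal set of pairs from the filled table.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if S[i][j] == get(i + 1, j):
            stack.append((i + 1, j))
            continue
        for k in range(i + 1, j + 1):
            if can_pair(seq[i], seq[k]) and \
                    S[i][j] == 1 + get(i + 1, k - 1) + get(k + 1, j):
                pairs.append((i, k))
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return (S[0][n - 1] if n else 0), sorted(pairs)
```

For example, `max_base_pairs("GGGAAAUCC")` reports 3 pairs. The table has O(n^2) entries and each entry scans O(n) split points, so the sketch runs in O(n^3) time.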
Circular Representation
Images – David Mount
Pseudoknots
Images – David Mount
Pseudoknots break the dynamic programming algorithm presented earlier. In a pseudoknot, base pairs cross, so the algorithm would have to check whether a base is already paired. The subproblems are then no longer independent, which breaks the divide-and-conquer recurrence relations.
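Whether a given set of base pairs contains a pseudoknot can be checked directly from the crossing condition. The following is a minimal sketch, assuming a structure is given as a list of index pairs (i, j); the function name `has_pseudoknot` is illustrative and not from the slides.

```python
def has_pseudoknot(pairs):
    """Return True if two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    # Normalize each pair so the smaller index comes first, then sort by it.
    ordered = sorted((min(p), max(p)) for p in pairs)
    for a, (i, j) in enumerate(ordered):
        # Only later pairs can open after i, so only they can cross (i, j).
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False
```

Nested pairs such as (0, 8) and (1, 7) and side-by-side pairs such as (0, 3) and (4, 7) are accepted, while (0, 5) and (3, 8) cross and are reported as a pseudoknot.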
Simplifying Assumptions
• RNA folds into one minimum free-energy structure.
• There are no knots (base pairs never cross).
• The energy of a particular base pair in a double-stranded region is sequence independent.
• Neighbors do not influence the energy.
• Under these assumptions, the problem was solved by dynamic programming (Zuker and Stiegler, 1981).
Sequence Dependent Base Pair Energy Values (Nearest Neighbor Model)
[Figure: two example structures with labeled 5' and 3' ends]
Example values:
Pair              GC    GC    GC    GC
Neighboring pair  AU    GC    CG    UA
Energy           -2.3  -2.9  -3.4  -2.1
Free Energy Computation (Nearest Neighbor Model)
[Figure: hairpin structure annotated with the free-energy contribution of each element]
+5.9  4 nt loop
-1.1  mismatch of hairpin
-2.9  stacking
-2.9  stacking
+3.3  1 nt bulge
-1.8  stacking
-0.9  stacking
-1.8  stacking
-2.1  stacking
-0.3  unpaired A at the 3' end
-0.3  unpaired A at the 5' end (5' dangling)
ΔG = -4.9 kcal/mol
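The total on this slide is the sum of the individual contributions, which can be checked with a few lines. The values below are transcribed from the slide; the labels on the two -0.3 terms are a reading of the figure, and the names `contributions` and `total_free_energy` are illustrative.

```python
# (energy in kcal/mol, structural element) as annotated on the slide.
contributions = [
    (+5.9, "4 nt loop"),
    (-1.1, "mismatch of hairpin"),
    (-2.9, "stacking"),
    (-2.9, "stacking"),
    (+3.3, "1 nt bulge"),
    (-1.8, "stacking"),
    (-0.9, "stacking"),
    (-1.8, "stacking"),
    (-2.1, "stacking"),
    (-0.3, "unpaired A, 3' end"),
    (-0.3, "unpaired A, 5' end"),
]


def total_free_energy(terms):
    """Sum the energy terms, rounded to the one decimal place used on the slide."""
    return round(sum(energy for energy, _ in terms), 1)


print(total_free_energy(contributions))  # -4.9
```

The stabilizing stacking terms (negative) outweigh the loop and bulge penalties (positive), giving the net -4.9 kcal/mol.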
RNA Secondary Structure: Stack
Nearest Neighbor Model
• Stacking energy: negative energies are assigned to stacks of adjacent base pairs in double-stranded regions.
• The energy is influenced by the nearest closing base pair.
• These energies are estimated experimentally from small synthetic RNAs.
• Positive energy: added for low-entropy regions such as bulges and loops.
RNA Secondary Structure: Hairpin loop
Nearest Neighbor Model
• Hairpin energy:
  ◦ Experimentally measured for hairpins of length 5, 6, 7, 8, … up to a maximum; values above the maximum are extrapolated.
  ◦ The closing pair affects the energy: A-U and C-G closing pairs are distinguished.
RNA Secondary Structure: Internal Loop, Bulge Loop
Nearest Neighbor Model
• Bulge/internal energy:
  ◦ Let L1, L2 denote the lengths of the two sides of the bulge/internal loop.
  ◦ Experimentally measured for different values of L1, L2.
  ◦ In practice, for computational convenience, the energy is given as a function of L1 + L2 by a lookup table and extrapolation.
RNA Secondary Structure: Junction (Multiloop)
Nearest Neighbor Model
• Multiloop energy:
  ◦ Let U denote the number of unpaired bases.
  ◦ Let P denote the number of base pairs.
  ◦ The free energy is an affine function of U and P: a1 + a2 U + a3 P.
  ◦ This is the least accurate component of the NN model.