Algorithmics of RNA structures
Hélène Touzet
COMATEGE working group: combinatorics on words, text algorithmics and genome algorithmics
RNA - RiboNucleic Acid
[Diagram: DNA, messenger RNA, noncoding RNA, protein]
RNA structure
[Figure: secondary structure drawing of an RNA, with helices labelled from P1 to P14 and the 5´ and 3´ ends marked]
RNA structure: base pairs, secondary structure
[Figure: an RNA sequence with its base pairs and the resulting secondary structure]
Overview of the talk ◦ RNA folding ◦ Comparison of RNA structures ◦ Dynamic programming 2.0
RNA folding problem
◦ an RNA sequence + a folding model [Figure: sequence positions 1 to 14]
◦ find a secondary structure with the maximum number of base pairs [Figure: a structure on positions 1 to 14]
RNA folding problem
[Figure: positions 1 to 14, with the base pairs (9,14), (2,4), (10,11), (1,6), (12,13), (3,5), (5,10), (7,8)]
RNA folding problem
S(i, j): maximum number of base pairs for the substring i..j
Either position i is unpaired, or it is paired with some position k, which splits the substring into the part enclosed by the pair and the part after it:
$$S(i, j) = \max \begin{cases} S(i+1, j) \\ S(i+1, k-1) + S(k+1, j) + 1, & i < k \le j \end{cases}$$
Implementation by dynamic programming
R. Nussinov, A. Jacobson, PNAS 1980
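The recurrence above can be sketched in Python as follows. This is a minimal illustration, not the implementation from the cited paper. The folding model is an assumption (Watson-Crick pairs plus the G-U wobble pair, no minimum hairpin loop length), and the names `ALLOWED` and `max_base_pairs` are hypothetical.

```python
# Assumed folding model: Watson-Crick pairs and the G-U wobble pair.
ALLOWED = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}


def max_base_pairs(seq, allowed=ALLOWED):
    """Maximum number of nested base pairs on seq (0-based positions)."""
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] holds the optimum for the substring seq[i..j], inclusive.
    S = [[0] * n for _ in range(n)]

    def get(i, j):
        # An empty substring (i > j) contains no base pair.
        return S[i][j] if i <= j else 0

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = get(i + 1, j)  # case 1: position i is unpaired
            for k in range(i + 1, j + 1):  # case 2: i is paired with k
                if (seq[i], seq[k]) in allowed:
                    best = max(best, get(i + 1, k - 1) + get(k + 1, j) + 1)
            S[i][j] = best
    return S[0][n - 1]


print(max_base_pairs("gggaaaccc"))
```

The table is filled by increasing substring length, so every value read on the right-hand side of the recurrence is already available. The three nested loops give cubic time in the length of the sequence.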
RNA folding – Locally optimal secondary structures
no base pairs can be added without violating the definition of secondary structure
[Figure: four structures drawn on positions 1 to 14]
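The definition can be checked directly, as in the sketch below. It rests on assumptions that the slide does not state: a secondary structure is taken to be a set of non-crossing base pairs in which each position belongs to at most one pair, the pairing model is Watson-Crick plus G-U, and positions are 0-based. The names `ALLOWED` and `is_locally_optimal` are hypothetical.

```python
# Assumed pairing model: Watson-Crick pairs and the G-U wobble pair.
ALLOWED = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}


def is_locally_optimal(seq, pairs, allowed=ALLOWED):
    """True if no allowed base pair can be added to pairs without crossing."""
    paired = {p for pair in pairs for p in pair}
    n = len(seq)
    for i in range(n):
        if i in paired:
            continue
        for j in range(i + 1, n):
            if j in paired or (seq[i], seq[j]) not in allowed:
                continue
            # (i, j) crosses (k, l) when exactly one of k, l lies between i and j.
            if all((i < k < j) == (i < l < j) for k, l in pairs):
                return False  # (i, j) could still be added
    return True


print(is_locally_optimal("gaaac", [(0, 4)]))
```

This brute-force test only verifies one given structure. It does not enumerate all locally optimal structures.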
Construction of all locally optimal secondary structures
◦ maximal horizontal structure: set of juxtaposed base pairs, such that there is no pairing between any pair of visible positions [Figure: example on positions 1 to 14]
◦ locally optimal secondary structures: combinations of maximal horizontal structures
◦ implementation: dynamic programming × 2
[Figure: structures drawn on subsets of positions 1 to 14]
Saffarian, Giraud, de Monte, Touzet 2012
Comparison of RNA structures
Comparison of RNA structures
◦ Unlimited
◦ Crossing
◦ Nested
Comparison of RNA structures
◦ Simple operations (figure labels): single del ins pair pair ins pair del
◦ Full operations: pairL del, pairR del, pairLR del, pairL ins, pairR ins, pairLR ins
Comparison of RNA structures
Nested, Crossing, Unlimited
Tree alignment, Tree edit distance, Tree edit distance
Simple: O(n^3 log(n)) [2], O(n^3 log(n)) [2], O(n^4) [1]
General: O(n^4) [3]
Full: NP-complete [3]
edit distance: NP-complete [4]
[1] Jiang, Wang, Zhang, 1995
[2] Klein, 1998
[3] Blin, Touzet, 2006
[4] Blin, Fertin, Rusu, Sinoquet, 2007
From RNAs to ICOREs
◦ myriad of problems over sequences, trees and graphs
◦ extensive use of dynamic programming
◦ ICORE: a universal specification framework for dynamic programming problems
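Dynamic Programming 2.0
[Diagram: two pipelines side by side. First pipeline: optimization problem, dynamic programming equations, algorithm, code, annotated with "new choices", "optimizations" and "debugging". Second pipeline: optimization problem, specification, dynamic programming equations, algorithm, code.]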
Levenshtein distance
◦ 2 strings: sand and aunt
◦ 3 operations: replacement, deletion, insertion
◦ edit script
  s a - n d
    |   | :
  - a u n t
  del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
◦ rewrite rules
  a ∼ X ← rep(a, c, X) → c ∼ X
  a ∼ X ← del(a, X) → X
  X ← ins(c, X) → c ∼ X
  ε ← mty → ε
◦ rewrite rules, applied to the edit script (left branch: first string, right branch: second string)
  del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
  ↓ ↓
  del(s, a ∼ ins(u, rep(n, n, rep(d, t, mty))))
  ↓ ↓
  del(s, a ∼ ins(u, n ∼ rep(d, t, mty)))
  ↙ ↘
  del(s, a ∼ ins(u, n ∼ d ∼ mty))        del(s, a ∼ ins(u, n ∼ t ∼ mty))
  ↓                                       ↓
  s ∼ a ∼ ins(u, n ∼ d ∼ mty)            a ∼ ins(u, n ∼ t ∼ mty)
  ↓                                       ↓
  s ∼ a ∼ n ∼ d ∼ mty                    a ∼ u ∼ n ∼ t ∼ mty
  ↓                                       ↓
  s ∼ a ∼ n ∼ d                          a ∼ u ∼ n ∼ t
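The two rewrite systems can be sketched as a single recursive function that returns both strings at once. The encoding of terms as nested tuples and the names `MTY`, `rewrite` and `script` are assumptions made for this illustration.

```python
# Terms as nested tuples (an assumed encoding):
# ("rep", a, c, X), ("del", a, X), ("ins", c, X) and ("mty",).
MTY = ("mty",)


def rewrite(term):
    """Return (left, right): the two strings that the term rewrites to."""
    op = term[0]
    if op == "mty":  # ε ← mty → ε
        return "", ""
    if op == "rep":  # a ∼ X ← rep(a, c, X) → c ∼ X
        _, a, c, sub = term
        left, right = rewrite(sub)
        return a + left, c + right
    if op == "del":  # a ∼ X ← del(a, X) → X
        _, a, sub = term
        left, right = rewrite(sub)
        return a + left, right
    if op == "ins":  # X ← ins(c, X) → c ∼ X
        _, c, sub = term
        left, right = rewrite(sub)
        return left, c + right
    raise ValueError(f"unknown operator: {op!r}")


# The edit script from the slides.
script = ("del", "s",
          ("rep", "a", "a",
           ("ins", "u",
            ("rep", "n", "n",
             ("rep", "d", "t", MTY)))))

print(rewrite(script))  # → ('sand', 'aunt')
```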
◦ evaluation algebra
  rep(a, c, x) = if a == c then x else x + 1
  del(a, x) = x + 1
  ins(c, x) = x + 1
  mty = 0
  φ = min
  example term: del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
◦ solving the Levenshtein distance: finding a term over rep, del, ins, mty that rewrites to sand and aunt, and that is optimal for the evaluation algebra
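As an illustration, the evaluation algebra can be applied to a single term, and the search for an optimal term can be done by dynamic programming over pairs of suffixes. The tuple encoding of terms and the names `evaluate`, `levenshtein` and `script` are assumptions; this sketch is the usual edit-distance table written in the vocabulary of the slides, not code produced by an ICORE tool.

```python
# Terms as nested tuples (an assumed encoding):
# ("rep", a, c, X), ("del", a, X), ("ins", c, X) and ("mty",).
MTY = ("mty",)


def evaluate(term):
    """Score of one term under the evaluation algebra."""
    op = term[0]
    if op == "mty":
        return 0
    if op == "rep":
        _, a, c, sub = term
        x = evaluate(sub)
        return x if a == c else x + 1
    if op in ("del", "ins"):
        return evaluate(term[2]) + 1
    raise ValueError(f"unknown operator: {op!r}")


def levenshtein(u, v):
    """Minimum score (φ = min) over all terms that rewrite to (u, v)."""
    n, m = len(u), len(v)
    # D[i][j]: best score of a term rewriting to the suffixes (u[i:], v[j:]).
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue  # mty = 0
            candidates = []
            if i < n and j < m:  # rep(u[i], v[j], X)
                x = D[i + 1][j + 1]
                candidates.append(x if u[i] == v[j] else x + 1)
            if i < n:  # del(u[i], X)
                candidates.append(D[i + 1][j] + 1)
            if j < m:  # ins(v[j], X)
                candidates.append(D[i][j + 1] + 1)
            D[i][j] = min(candidates)
    return D[0][0]


script = ("del", "s",
          ("rep", "a", "a",
           ("ins", "u",
            ("rep", "n", "n",
             ("rep", "d", "t", MTY)))))

print(evaluate(script), levenshtein("sand", "aunt"))  # → 3 3
```

The example term scores 3 (one deletion, one insertion, one mismatching replacement), which equals the optimum for sand and aunt, so this edit script is an optimal one.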
ICOREs – definition
Inverted Coupled Rewrite Systems
Let k be a positive natural number. An ICORE of dimension k consists of:
◦ a set V of variables
◦ a core signature ζ, and k satellite signatures Σ_1, ..., Σ_k
◦ k term rewrite systems, which all have the same left-hand sides in T(ζ, V)
◦ optionally a tree grammar G over the core signature ζ
◦ an evaluation algebra A for the core signature ζ, including an objective function φ
Back to RNA problems
[Figure: secondary structure drawing of an RNA, with helices labelled from P1 to P14 and the 5´ and 3´ ends marked]
RNA folding problem
Input: an RNA sequence
g g g g g g g g g g a a a c c c a u u c u u u c u c a a a c a a c c c
Output: its optimal secondary structure
g g g g a a a c c c a g g u u c g u u u c g g u c a a g a c a a c c c
ICORE
  single(a, X) → a ∼ X
  split(X, Y) → X ∼ Y
  pair(a, X, b) → a ∼ X ∼ b
  mty → ε
RNA folding problem
[Figure: the term for the example sequence and its secondary structure, drawn as a tree over split, pair, single and mty]
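The folding ICORE has a single rewrite system, so a term rewrites to one sequence. The sketch below encodes terms as nested tuples and adds a scoring function. The scoring function is an assumption: the slides give only the rewrite rules, and counting one point per pair operator is simply the natural algebra for the goal of maximizing the number of base pairs. The names `rewrite`, `count_pairs` and `hairpin` are hypothetical.

```python
# Terms as nested tuples (an assumed encoding):
# ("single", a, X), ("split", X, Y), ("pair", a, X, b) and ("mty",).
MTY = ("mty",)


def rewrite(term):
    """Sequence that the term rewrites to."""
    op = term[0]
    if op == "mty":  # mty → ε
        return ""
    if op == "single":  # single(a, X) → a ∼ X
        _, a, x = term
        return a + rewrite(x)
    if op == "split":  # split(X, Y) → X ∼ Y
        _, x, y = term
        return rewrite(x) + rewrite(y)
    if op == "pair":  # pair(a, X, b) → a ∼ X ∼ b
        _, a, x, b = term
        return a + rewrite(x) + b
    raise ValueError(f"unknown operator: {op!r}")


def count_pairs(term):
    """Assumed evaluation algebra: each pair operator contributes 1."""
    op = term[0]
    if op == "mty":
        return 0
    if op == "single":
        return count_pairs(term[2])
    if op == "split":
        return count_pairs(term[1]) + count_pairs(term[2])
    if op == "pair":
        return count_pairs(term[2]) + 1
    raise ValueError(f"unknown operator: {op!r}")


# A small hairpin g(aaa)c, and two such hairpins side by side.
hairpin = ("pair", "g", ("single", "a", ("single", "a", ("single", "a", MTY))), "c")
two_hairpins = ("split", hairpin, hairpin)

print(rewrite(two_hairpins), count_pairs(two_hairpins))  # → gaaacgaaac 2
```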
RNA folding
  mty → ε
  split(X, Y) → X ∼ Y
  single(a, X) → a ∼ X
  pair(a, X, b) → a ∼ X ∼ b
Levenshtein distance
  ε ← mty → ε
  a ∼ X ← rep(a, c, X) → c ∼ X
  a ∼ X ← del(a, X) → X
  X ← ins(c, X) → c ∼ X