Algorithms for RNA structures (Algorithmique des structures d'ARN)



  1. Algorithms for RNA structures (Algorithmique des structures d'ARN). Hélène Touzet. COMATEGE working group (Combinatoire des mots, algorithmique du texte et du génome: combinatorics on words, text algorithmics and genome algorithmics)

  2. RNA: RiboNucleic Acid. [Diagram relating DNA, messenger RNA, noncoding RNA and protein]

  3. RNA structure. [Figure: secondary structure drawing of an RNA molecule from its 5´ end to its 3´ end, with paired regions labeled P1, P2, P3, P5 and P7 to P14]

  4. RNA structure. Base pairs: a-u, c-g. [Figure: an example sequence folded into its secondary structure]

  5. Overview of the talk ◦ RNA folding ◦ Comparison of RNA structures ◦ Dynamic programming 2.0

  6. RNA folding problem ◦ an RNA sequence + a folding model [Figure: sequence positions 1 to 14] ◦ find a secondary structure with the maximum number of base pairs [Figure: positions 1 to 14 with base pairs drawn]

  7. RNA folding problem. [Figure: positions 1 to 14 with the base pairs (9,14), (2,4), (10,11), (1,6), (12,13), (3,5), (5,10), (7,8)]

  9. RNA folding problem. S(i, j): number of base pairs for the substring i..j. Either position i is unpaired, or i is paired with some position k, which splits i..j into the independent substrings i+1..k−1 and k+1..j:

     \[
     S(i,j) = \max
     \begin{cases}
       S(i+1,\, j) \\
       S(i+1,\, k-1) + S(k+1,\, j) + 1, & i < k \le j
     \end{cases}
     \]

     Implementation by dynamic programming. R. Nussinov, A. Jacobson, PNAS 1980
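The recurrence on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `max_base_pairs`, the 0-based indices, and the pairing rule (Watson-Crick pairs plus the G-U wobble pair) are assumptions, since the slide leaves the folding model open. The second case of the recurrence is applied only when positions i and k are allowed to pair.

```python
from functools import lru_cache

# Assumption: the folding model allows Watson-Crick pairs and the G-U wobble pair.
ALLOWED_PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}


def max_base_pairs(seq, can_pair=lambda x, y: (x, y) in ALLOWED_PAIRS):
    """Maximum number of base pairs in a secondary structure of seq."""
    seq = seq.lower()

    @lru_cache(maxsize=None)
    def S(i, j):
        # Empty or single-position substring: no base pair is possible.
        if i >= j:
            return 0
        best = S(i + 1, j)  # case 1: position i stays unpaired
        for k in range(i + 1, j + 1):  # case 2: position i pairs with k
            if can_pair(seq[i], seq[k]):
                best = max(best, S(i + 1, k - 1) + S(k + 1, j) + 1)
        return best

    return S(0, len(seq) - 1)


print(max_base_pairs("gggaaaccc"))  # three g-c pairs around the aaa loop
```

Memoizing `S` gives the usual cubic running time: there are O(n²) subproblems and each one scans O(n) choices of k. The recursion is fine for short sequences; a bottom-up table would avoid Python's recursion limit on long ones.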

  10. RNA folding: locally optimal secondary structures. No base pair can be added without violating the definition of a secondary structure. [Figure: four example structures on positions 1 to 14]
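Local optimality can be tested directly from its definition. The sketch below assumes that a secondary structure is a set of base pairs in which every position belongs to at most one pair and no two pairs cross; the slides do not spell this definition out. The function names are hypothetical, and the candidate set reuses the eight pairs listed on slide 7, assumed here to be all the pairs the folding model allows.

```python
from itertools import combinations


def crosses(p, q):
    """True if the pairs p and q interleave (i < k < j < l)."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def is_secondary_structure(pairs):
    """Each position is used at most once and no two pairs cross (assumed definition)."""
    positions = [x for p in pairs for x in p]
    if len(positions) != len(set(positions)):
        return False
    return not any(crosses(p, q) for p, q in combinations(pairs, 2))


def is_locally_optimal(structure, candidates):
    """True if no candidate pair can be added while keeping a valid structure."""
    structure = set(structure)
    if not is_secondary_structure(structure):
        return False
    return not any(
        is_secondary_structure(structure | {p})
        for p in set(candidates) - structure
    )


# Pairs listed on slide 7, taken here as the full set of admissible pairs.
CANDIDATES = {(9, 14), (2, 4), (10, 11), (1, 6), (12, 13), (3, 5), (5, 10), (7, 8)}

print(is_locally_optimal({(1, 6), (2, 4), (7, 8), (9, 14), (10, 11), (12, 13)}, CANDIDATES))
print(is_locally_optimal({(1, 6), (7, 8)}, CANDIDATES))  # (2, 4) can still be added
```

This brute-force check only verifies one structure; enumerating all locally optimal structures efficiently is the subject of the following slides.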

  11. Construction of all locally optimal secondary structures ◦ maximal horizontal structure: a set of juxtaposed base pairs such that there is no possible pairing between any two visible positions [Figure: example on positions 1 to 14] ◦ locally optimal secondary structures: combinations of maximal horizontal structures ◦ implementation: dynamic programming × 2

  12. [Figure: worked example on positions 1 to 14, showing the visible positions at each step of the construction] Saffarian, Giraud, de Monte, Touzet 2012

  13. Comparison of RNA structures

  14. Comparison of RNA structures Unlimited

  15. Comparison of RNA structures Unlimited Crossing

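  16. Comparison of RNA structures. Three classes of structures [Figure: one example per class]: Unlimited (no restriction on base pairs), Crossing (each base in at most one pair, crossing pairs allowed), Nested (each base in at most one pair, no crossing pairs)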

  17. Comparison of RNA structures. Simple operations [Figure: one illustration per operation]: single, del, ins, pair, pair ins, pair del

  19. Comparison of RNA structures. Simple operations: single, del, ins, pair, pair ins, pair del. Full operations: pairL del, pairR del, pairLR del, pairL ins, pairR ins, pairLR ins

  20. Comparison of RNA structures. [Table: complexity results for the Nested, Crossing and Unlimited classes] Cell contents in slide order: Tree alignment; Tree edit distance; Tree edit distance; Simple: O(n^3 log n) [2], O(n^3 log n) [2], O(n^4) [1]; General: O(n^4) [3]; Full: NP-complete [3]; edit distance: NP-complete [4]. References: [1] Jiang, Wang, Zhang 1995; [2] Klein 1998; [3] Blin, Touzet 2006; [4] Blin, Fertin, Rusu, Sinoquet 2007

  21. From RNAs to ICOREs ◦ a myriad of problems over sequences, trees and graphs ◦ extensive use of dynamic programming ◦ ICORE: a universal specification framework for dynamic programming problems

  22. Dynamic Programming 2.0. [Diagram: optimization problem → dynamic programming equations → algorithm → code]

  23. Dynamic Programming 2.0. [Diagram: optimization problem → dynamic programming equations → algorithm → code, annotated with: new choices, optimizations, debugging]

  24. Dynamic Programming 2.0. [Diagram: two pipelines side by side. First: optimization problem → dynamic programming equations → algorithm → code, annotated with: new choices, optimizations, debugging. Second: optimization problem → specification → dynamic programming equations → algorithm → code]

  25. Levenshtein distance ◦ 2 strings: sand and aunt ◦ 3 operations: replacement, deletion, insertion ◦ edit script:

       s a - n d
         |   | :
       - a u n t

       del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
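For reference, the Levenshtein distance between the two strings of this slide can be computed with the classic two-row dynamic programming table. This is the textbook formulation rather than anything specific to the talk; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(u, v):
    """Minimum number of replacements, deletions and insertions turning u into v."""
    prev = list(range(len(v) + 1))  # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        curr = [i]  # deleting the first i characters of u
        for j, c in enumerate(v, start=1):
            curr.append(min(
                prev[j - 1] + (a != c),  # replacement (free when a == c)
                prev[j] + 1,             # deletion of a
                curr[j - 1] + 1,         # insertion of c
            ))
        prev = curr
    return prev[-1]


print(levenshtein("sand", "aunt"))  # 3, the cost of the edit script above
```

The edit script on the slide uses one deletion, one insertion and one replacement of d by t, which matches this distance of 3.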

  27. ◦ rewrite rules

       a ∼ X ← rep(a, c, X) → c ∼ X
       a ∼ X ← del(a, X) → X
           X ← ins(c, X) → c ∼ X
           ε ← mty → ε

  28. ◦ rewrite rules, applied to the edit script. The first steps are common to both rewrite systems; the derivation then splits into the left system (←) and the right system (→):

       del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
       ↓
       del(s, a ∼ ins(u, rep(n, n, rep(d, t, mty))))
       ↓
       del(s, a ∼ ins(u, n ∼ rep(d, t, mty)))
       ↙ ↘
       left:  del(s, a ∼ ins(u, n ∼ d ∼ mty))      right: del(s, a ∼ ins(u, n ∼ t ∼ mty))
       ↓
       left:  s ∼ a ∼ ins(u, n ∼ d ∼ mty)          right: a ∼ ins(u, n ∼ t ∼ mty)
       ↓
       left:  s ∼ a ∼ n ∼ d ∼ mty                  right: a ∼ u ∼ n ∼ t ∼ mty
       ↓
       left:  s ∼ a ∼ n ∼ d                        right: a ∼ u ∼ n ∼ t
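The two coupled rewrite systems can be sketched as one recursive function that returns both results at once. The encoding of core terms as nested Python tuples and the names `rewrite`, `MTY` and `SCRIPT` are assumptions made for this illustration; the rules themselves are the ones on the slide.

```python
MTY = ("mty",)


def rewrite(term):
    """Return (left, right): the strings produced by the left and right rewrite systems."""
    op = term[0]
    if op == "mty":  # ε ← mty → ε
        return "", ""
    if op == "rep":  # a∼X ← rep(a, c, X) → c∼X
        _, a, c, x = term
        left, right = rewrite(x)
        return a + left, c + right
    if op == "del":  # a∼X ← del(a, X) → X
        _, a, x = term
        left, right = rewrite(x)
        return a + left, right
    if op == "ins":  # X ← ins(c, X) → c∼X
        _, c, x = term
        left, right = rewrite(x)
        return left, c + right
    raise ValueError(f"unknown operator: {op!r}")


# del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
SCRIPT = ("del", "s", ("rep", "a", "a", ("ins", "u",
          ("rep", "n", "n", ("rep", "d", "t", MTY)))))

print(rewrite(SCRIPT))  # ('sand', 'aunt')
```

Concatenation with `+` plays the role of ∼, and the empty string plays the role of ε.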

  29. ◦ evaluation algebra

       rep(a, c, x) = if a == c then x else x + 1
       del(a, x) = x + 1
       ins(c, x) = x + 1
       mty = 0
       φ = min

       Example term: del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty))))), which evaluates to 3 (one deletion, one insertion, one replacement of d by t)
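The evaluation algebra can be read as an interpreter over core terms. The sketch below encodes terms as nested tuples (an assumption of this illustration, as are the names `evaluate` and `phi`) and applies the four equations of the slide, with φ = min choosing among candidate terms.

```python
MTY = ("mty",)


def evaluate(term):
    """Score a core term with the evaluation algebra of the slide."""
    op = term[0]
    if op == "mty":
        return 0
    if op == "rep":
        _, a, c, x = term
        cost = evaluate(x)
        return cost if a == c else cost + 1
    if op in ("del", "ins"):
        return evaluate(term[2]) + 1
    raise ValueError(f"unknown operator: {op!r}")


def phi(terms):
    """Objective function: keep the smallest score among candidate terms."""
    return min(evaluate(t) for t in terms)


# del(s, rep(a, a, ins(u, rep(n, n, rep(d, t, mty)))))
SCRIPT = ("del", "s", ("rep", "a", "a", ("ins", "u",
          ("rep", "n", "n", ("rep", "d", "t", MTY)))))
# A worse candidate for the same two strings: delete all of "sand", insert all of "aunt".
NAIVE = ("del", "s", ("del", "a", ("del", "n", ("del", "d",
         ("ins", "a", ("ins", "u", ("ins", "n", ("ins", "t", MTY))))))))

print(evaluate(SCRIPT), evaluate(NAIVE), phi([SCRIPT, NAIVE]))  # 3 8 3
```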

  30. ◦ solving the Levenshtein distance: finding a term over rep, del, ins, mty that rewrites to sand and aunt, and that is optimal for the evaluation algebra
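One way to make this search concrete is a memoized recursion over pairs of suffixes that returns both the optimal score and a term achieving it. This is a hand-written sketch for this one problem, not the generic ICORE machinery of the talk; the tuple encoding of terms and the names `best_term` and `rewrite` are assumptions.

```python
from functools import lru_cache


def rewrite(term):
    """Return the (left, right) strings a term rewrites to."""
    if term[0] == "mty":
        return "", ""
    left, right = rewrite(term[-1])
    if term[0] == "rep":
        return term[1] + left, term[2] + right
    if term[0] == "del":
        return term[1] + left, right
    return left, term[1] + right  # "ins"


def best_term(u, v):
    """Return (cost, term) with term rewriting to (u, v) and minimal cost."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Cheapest term that rewrites to the suffixes u[i:] and v[j:].
        if i == len(u) and j == len(v):
            return 0, ("mty",)
        options = []
        if i < len(u) and j < len(v):
            cost, x = solve(i + 1, j + 1)
            options.append((cost + (u[i] != v[j]), ("rep", u[i], v[j], x)))
        if i < len(u):
            cost, x = solve(i + 1, j)
            options.append((cost + 1, ("del", u[i], x)))
        if j < len(v):
            cost, x = solve(i, j + 1)
            options.append((cost + 1, ("ins", v[j], x)))
        return min(options, key=lambda option: option[0])  # φ = min

    return solve(0, 0)


cost, term = best_term("sand", "aunt")
print(cost, rewrite(term))
```

Several terms reach the optimal score for sand and aunt, so the term returned need not be the edit script shown on the slides; only its score and the strings it rewrites to are determined.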

  31. ICOREs: definition. Inverted Coupled Rewrite Systems. For k a positive natural number, an ICORE of dimension k consists of: ◦ a set V of variables ◦ a core signature ζ, and k satellite signatures Σ_1, ..., Σ_k ◦ k term rewrite systems, which all have the same left-hand sides in T(ζ, V) ◦ optionally, a tree grammar G over the core signature ζ ◦ an evaluation algebra A for the core signature ζ, including an objective function φ

  32. Back to RNA problems. [Figure: the RNA secondary structure drawing of slide 3, with paired regions labeled P1, P2, P3, P5 and P7 to P14]

  33. RNA folding problem. Input: an RNA sequence [Figure: example sequence]. Output: its optimal secondary structure [Figure: the same sequence folded]

  34. RNA folding problem. ICORE:

       single(a, X) → a ∼ X
       split(X, Y) → X ∼ Y
       pair(a, X, b) → a ∼ X ∼ b
       mty → ε

  35. RNA folding problem. [Figure: the input sequence, its optimal secondary structure, and the corresponding core term drawn as a tree of split, pair, single and mty nodes]
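The folding ICORE can be sketched in the same style as the Levenshtein one. The rewrite rules below are those of slide 34; everything else is an assumption of this illustration: the tuple encoding of terms, the function names, the scoring that counts one point per `pair` node (the slides do not give an evaluation algebra for folding), and the dot-bracket rendering of the structure.

```python
MTY = ("mty",)


def rewrite(term):
    """Rewrite a core term to the RNA sequence it derives."""
    op = term[0]
    if op == "mty":  # mty → ε
        return ""
    if op == "single":  # single(a, X) → a∼X
        _, a, x = term
        return a + rewrite(x)
    if op == "split":  # split(X, Y) → X∼Y
        _, x, y = term
        return rewrite(x) + rewrite(y)
    if op == "pair":  # pair(a, X, b) → a∼X∼b
        _, a, x, b = term
        return a + rewrite(x) + b
    raise ValueError(f"unknown operator: {op!r}")


def count_pairs(term):
    """Assumed scoring: one point per pair node."""
    op = term[0]
    if op == "mty":
        return 0
    if op == "single":
        return count_pairs(term[2])
    if op == "split":
        return count_pairs(term[1]) + count_pairs(term[2])
    if op == "pair":
        return count_pairs(term[2]) + 1
    raise ValueError(f"unknown operator: {op!r}")


def dot_bracket(term):
    """Render the structure encoded by the term: '.' unpaired, '(' and ')' paired."""
    op = term[0]
    if op == "mty":
        return ""
    if op == "single":
        return "." + dot_bracket(term[2])
    if op == "split":
        return dot_bracket(term[1]) + dot_bracket(term[2])
    if op == "pair":
        return "(" + dot_bracket(term[2]) + ")"
    raise ValueError(f"unknown operator: {op!r}")


# pair(g, pair(g, single(a, single(a, single(a, mty))), c), c)
HAIRPIN = ("pair", "g", ("pair", "g",
           ("single", "a", ("single", "a", ("single", "a", MTY))), "c"), "c")
TWO_STEMS = ("split", ("pair", "a", MTY, "u"), HAIRPIN)

print(rewrite(TWO_STEMS), dot_bracket(TWO_STEMS), count_pairs(TWO_STEMS))
```

The term carries the structure and the rewriting recovers the sequence, which is the inverted view the ICORE framework takes: folding means searching for the best term that rewrites to the given input.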

  36. The two ICOREs side by side.

       RNA folding:
         mty → ε
         split(X, Y) → X ∼ Y
         single(a, X) → a ∼ X
         pair(a, X, b) → a ∼ X ∼ b

       Levenshtein distance:
             ε ← mty → ε
         a ∼ X ← rep(a, c, X) → c ∼ X
         a ∼ X ← del(a, X) → X
             X ← ins(c, X) → c ∼ X
