
Modeling and studying RNA secondary structure (Eugène Asarin, LIAFA)


  1. Modeling and studying RNA secondary structure Eugène Asarin LIAFA, CNRS & Univ. Paris Diderot

  2. Credits Co-authors, partners and teachers: • Vassily Lyubetsky, Alexander Seliverstov (IITP) • Thierry Cachat, Tayssir Touili (LIAFA) Sponsor: • CNRS/RAS convention d’échanges EVOLVER/REVERA Special thanks to: • Hervé Isambert (I. Curie)

  3. Disclaimer • I am not a bioinformatician (yet) • I am (still) a computer scientist with a verification background – everything is a transition system – one should explore its state space in a smart way • This talk – More informatics than biology and physics – More models than solutions – More questions and speculations than answers

  4. Motivating example: Classical Attenuation Regulation

  5. Transcription and Translation DNA A G C T G C

  6. Transcription and Translation [Figure labels: Gene, Polymerase, DNA, RNA, Ribosome, Amino Acids] • Transcription: DNA to RNA, done by the Polymerase • Translation: RNA to Amino Acids, done by the Ribosome

  7. Gene Expression [Figure labels: Gene, DNA, Ribosome, RNA, Amino Acids] The Gene is expressed if the Polymerase reaches the Gene

  8. Classical Attenuation Regulation Gene DNA RNA Ribosome Depends on the structure of the RNA between Ribosome and Polymerase

  9. RNA Secondary Structure RNA: a sequence of nucleotides A, G, C, U with links A-U and G-C [Figure: an example nucleotide sequence folded back on itself, with the paired region labelled "Helix"]
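The pairing rule on this slide can be made concrete with a small sketch. It assumes only the A-U and G-C links named above (no wobble pairs), uses 0-based positions, and the helper names `can_pair` and `is_helix` are hypothetical, introduced here for illustration.

```python
# Pairs named on the slide: A-U and G-C (assumption: nothing else pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(x: str, y: str) -> bool:
    """True if nucleotides x and y can form a link."""
    return (x, y) in PAIRS


def is_helix(seq: str, i: int, j: int, length: int) -> bool:
    """True if seq[i], seq[i+1], ... pairs with seq[j], seq[j-1], ...
    for `length` consecutive positions (an antiparallel stack)."""
    if i < 0 or j >= len(seq) or length < 1:
        return False
    if i + length - 1 >= j - length + 1:
        return False  # the two strands would overlap
    return all(can_pair(seq[i + k], seq[j - k]) for k in range(length))


print(is_helix("GGGAAAACCC", 0, 9, 3))  # True: three stacked G-C pairs
print(is_helix("GGGAAAACCC", 0, 9, 4))  # False: A cannot pair with A
```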

  10. RNA Secondary Structure (a simplified view) This structure is dynamic, changes very fast, and can cause the slippage of the Polymerase

  11. Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA

  12. Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA Slippage of the polymerase and the gene is not expressed: Termination

  13. Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA Polymerase reaches the Gene and the Gene is expressed: Antitermination. Each of these two situations can happen with some probability.

  14. Regulation mechanism : causal chain • Concentration of a product (say trp) • Speed of Ribosome • Dynamics of secondary structure • Probability of Polymerase slippage • Gene expression • (production of trp)

  15. Wanted • A model of the dynamics of the RNA secondary structure capable of predicting the probability of gene expression. – Should be quantitative – Should represent transient behaviours (steady state is not enough) – Should be validated by biological data on regulation

  16. Other motivations • Other kinds of regulation • Other alternative behaviors/structures on an RNA, e.g. in ribozymes • Scientific curiosity • Interesting transfers between Bio and Info

  17. Models, Analyses, Tools

  18. Tools (2 examples) • Rnamodel - Lyubetsky et al. • Kinefold – Isambert et al.

  19. The approach: Markov chain • Features – The sequence is fixed. – States of the MC: all possible secondary structures on this sequence (or part of it) – Transitions: simple events (see below) – Rates: determined by ΔE • Difficulties – As usual, many parameters are difficult to find – The Markov Chain is huge and complex – Only Monte-Carlo simulation is possible – It is still heavy and slow

  20. Some recipes • Find a succinct structured representation of MC • Use on-the-fly state generation • Use symbolic representations • Use abstractions • Use acceleration • Use other advanced techniques – perfect simulation etc. • Think

  21. A succinct representation We suggest Probabilistic Rewriting Systems

  22. The idea? • Represent the RNA secondary structure by a term • Represent the dynamics of the secondary structure by rewriting rules

  23. The set of windows Polymerase RNA Ribosome w = (R,P) R: position of the Ribosome in the RNA P: position of the Polymerase in the RNA W = {w = (R,P) s.t. 13 ≤ R ≤ P ≤ l} l: the length of the regulatory region

  24. The helices B C D A f = (A,B,C,D)

  25. The hypohelices F G E H f = (A,B,C,D) B C g = (E,F,G,H) f(g) D A

  26. The structure as a term B C R D P A f = (A,B,C,D) w(f) w =(R,P)

  27. The structure as a term B C F G R P D H A E f = (A,B,C,D) w(f , g) g = (E,F,G,H) w =(R,P)

  28. The structure as a term j f R P h k g w =(R,P) w(f , g(h,k) , j) The order is not important
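A minimal sketch of how such a term could be stored, assuming that "the order is not important" means the children of a node form an unordered set. The `Node` class and the `node` and `show` helpers are hypothetical names. Labels are plain strings here; in a real model they would carry the coordinates (R,P) or (A,B,C,D).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Node:
    """A term node: a label (window or helix) plus an unordered set of sub-terms."""
    label: str
    children: FrozenSet["Node"] = frozenset()


def node(label: str, *children: Node) -> Node:
    """Convenience constructor: node("g", h, k) stands for g(h, k)."""
    return Node(label, frozenset(children))


def show(t: Node) -> str:
    """Print a term in a canonical (sorted) order so equal terms print identically."""
    if not t.children:
        return t.label
    inner = ", ".join(sorted(show(c) for c in t.children))
    return f"{t.label}({inner})"


t1 = node("w", node("f"), node("g", node("h"), node("k")), node("j"))
t2 = node("w", node("j"), node("g", node("k"), node("h")), node("f"))
print(show(t1))   # w(f, g(h, k), j)
print(t1 == t2)   # True: the order of sub-terms does not matter
```

Because the nodes are immutable and hashable, whole structures can be used directly as dictionary keys, which is convenient when states are created on the fly.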

  29. The Dynamics as a Probabilistic Rewriting System

  30. Extension and Reduction of a helix B C G F E H D A g = (E,F,G,H) f = (A,B,C,D) Rewriting rule: f ↔ g (one rule for multiple contexts)

  31. (De)-Composition of a Hypohelix j f h k

  32. (De)-Composition of a Helix j f h k g Rewriting rule: w(f , h , k , j) ↔ w(f , g(h , k) , j) Meta Rewriting rule: w( , h , k ) ↔ w( , g(h , k) )

  33. (De)-Composition of a Helix j f h k m

  34. (De)-Composition of a Helix j f h k g m Meta Rewriting rule: m( , h , k ) ↔ m( , g(h , k) ) (one rule for multiple contexts)
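The (de)-composition rule can be sketched on a set-based term representation. This is an illustration under assumptions, not the authors' implementation: the blank slots in the meta rule are read as "all remaining children of m", the `Node` class and the `compose` and `decompose` functions are hypothetical, and the geometric check that g really encloses h and k is omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Node:
    """A term node with an unordered set of sub-terms."""
    label: str
    children: FrozenSet["Node"] = frozenset()


def compose(m: Node, g_label: str, h: Node, k: Node) -> Node:
    """Apply m(rest, h, k) -> m(rest, g(h, k)) for any context m."""
    if h == k or h not in m.children or k not in m.children:
        raise ValueError("h and k must be two distinct children of m")
    g = Node(g_label, frozenset({h, k}))
    return Node(m.label, (m.children - {h, k}) | {g})


def decompose(m: Node, g: Node) -> Node:
    """Apply the reverse direction: m(rest, g(h, k)) -> m(rest, h, k)."""
    if g not in m.children:
        raise ValueError("g must be a child of m")
    return Node(m.label, (m.children - {g}) | g.children)


f, h, k, j = Node("f"), Node("h"), Node("k"), Node("j")
w = Node("w", frozenset({f, h, k, j}))      # w(f, h, k, j)
w2 = compose(w, "g", h, k)                  # w(f, g(h, k), j)
print(decompose(w2, Node("g", frozenset({h, k}))) == w)  # True
```

Since the functions take the context `m` as a parameter, the same code covers the top-level window `w` and any enclosing helix, which is the point of the meta rule.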

  35. The Window Movement Polymerase RNA Ribosome w = (R,P) Three rules for multiple contexts: • Movement of the Ribosome: (R , P)(x) → (R+3 , P)(x’) • Movement of the Polymerase: (R , P)(x) → (R , P+1)(x) • Termination (slippage of the Polymerase): (R , P)(x) → [terminal state]
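The three window rules can be sketched as a successor function on windows alone. Several things are assumptions here: the window constraint from the "set of windows" slide is read as 13 ≤ R ≤ P ≤ l, the terminated state is represented by `None`, termination is listed for every window (its rate would decide where it can really happen), and the update of the structure x to x’ after a Ribosome step is left out. The name `window_moves` is hypothetical.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (R, P): positions of the Ribosome and the Polymerase


def window_moves(w: Window, l: int) -> List[Tuple[str, Optional[Window]]]:
    """List the window rules applicable to w for a regulatory region of length l."""
    R, P = w
    moves: List[Tuple[str, Optional[Window]]] = []
    if R + 3 <= P:                 # the Ribosome advances by one codon, staying behind P
        moves.append(("ribosome", (R + 3, P)))
    if P + 1 <= l:                 # the Polymerase advances by one nucleotide
        moves.append(("polymerase", (R, P + 1)))
    moves.append(("termination", None))  # slippage of the Polymerase
    return moves


print(window_moves((13, 16), 20))
# [('ribosome', (16, 16)), ('polymerase', (13, 17)), ('termination', None)]
```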

  36. Rates of the rewriting Rules • Rates of the Markov Chain are determined by ΔE. Two terms – Stacking energy (easy) – Free energy (a bit obscure) • Movements of Rib and Pol – even more obscure

  37. Other optimisations Implemented or not

  38. On-the-fly (everybody does it) • States of the MC are created only when visited. • Efficient data structures for the states are still needed • Efficient algorithms are needed to find all the successors of a given state (and the rates)

  39. Concretely
      repeat 1000 times
        s = empty window
        repeat
          find all successors s’ of s and the rates s -> s’
          s = a random s’
        until expressed or aborted
      output statistics
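The loop above can be sketched in Python as a standard stochastic simulation of a continuous-time Markov chain. The assumptions: "a random s’" is read as a choice weighted by the rates, the waiting time is drawn from an exponential law with the total exit rate, and the tiny three-state model at the bottom is a made-up stand-in for the real successor computation. The names `run_once`, `estimate` and `toy_successors` are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Successors = Callable[[State], List[Tuple[State, float]]]


def run_once(start: State, successors: Successors,
             is_final: Callable[[State], bool], rng: random.Random) -> Tuple[State, float]:
    """Follow one random trajectory until a final state; return it and the elapsed time."""
    s, t = start, 0.0
    while not is_final(s):
        succ = successors(s)                     # generated on the fly
        total = sum(rate for _, rate in succ)
        if total <= 0:
            raise RuntimeError(f"deadlock in state {s!r}")
        t += rng.expovariate(total)              # exponential waiting time
        s = rng.choices([x for x, _ in succ],
                        weights=[r for _, r in succ])[0]
    return s, t


def estimate(start: State, successors: Successors,
             is_final: Callable[[State], bool],
             runs: int = 1000, seed: int = 0) -> Dict[State, float]:
    """Repeat the simulation and return the observed frequency of each final state."""
    rng = random.Random(seed)
    counts: Dict[State, int] = {}
    for _ in range(runs):
        s, _ = run_once(start, successors, is_final, rng)
        counts[s] = counts.get(s, 0) + 1
    return {s: c / runs for s, c in counts.items()}


# Toy model (illustrative rates only, not biological data).
TOY = {"start": [("expressed", 3.0), ("aborted", 1.0)]}


def toy_successors(s: State) -> List[Tuple[State, float]]:
    return TOY.get(s, [])


freq = estimate("start", toy_successors, lambda s: s in ("expressed", "aborted"))
print(freq)  # frequencies of "expressed" and "aborted", close to 0.75 and 0.25
```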

  40. Symbolic representation (not used yet) A data structure (close to a formula) to represent the current set of states or probability distribution. • Very successful in the verification domain • We tried probabilistic tree automata; they explode • Thinking

  41. Abstraction (everybody uses it in a naive way) • Aim: to have fewer states • Idea: to group together several states • Rnamodel and Kinefold: only maximal helices are stored. • We try to group some close helices: 800,000 → 300,000 states • Abstraction-based algorithms should be done systematically

  42. Abstraction (what if) • Biological description is very abstract (terminator, antiterminator, noise) • What if … a model is possible at this level? • To think

  43. Acceleration • Problem : many fast transitions without any progress, major event rare • Partial solution: group similar states together • A better one : Isambert’s clustering • To think more

  44. Advanced simulation methods • « Perfect simulation » - nice but only for steady state • Other methods from Performance Evaluation – mostly for N^k • To read and to think

  45. More complex structures

  46. Other structures • Helices can be « flipped » - not a big deal • They can be pseudoknotted • Biologically relevant

  47. Consequences for modeling and analysis • What are legal configurations? • How to compute free energy? • Data structure: a set of helices, a tree with shortcuts, a colored graph? • Alphabet and state space much bigger… • Abstractions more involved. • Kinefold – yes, Rnamodel – in progress

  48. Dreams • To find simple abstract models • To analyze them w/o Monte-Carlo • To transfer techniques from Verification and Performance evaluation and backwards

  49. Questions?

  50. Continuous Markov Chain • Probability for moving from s to s’ within t time units is: $1 - e^{-\lambda(s,s')\,t}$

  51. Continuous Markov Chain • Probability for moving from s to s’ within t time units is: $\frac{\lambda(s,s')}{E(s)}\left(1 - e^{-E(s)\,t}\right)$, where $E(s) = \sum_{s'} \lambda(s,s')$
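A small numeric check of the two formulas, taking the probability of jumping from s to s’ within t to be λ(s,s’)/E(s) · (1 − e^{−E(s)t}), with E(s) the sum of the outgoing rates (the standard continuous-time Markov chain form). The symbol λ for the rate and the function name `prob_jump_within` are assumptions, and the rates are arbitrary.

```python
import math
from typing import Dict


def prob_jump_within(rates: Dict[str, float], target: str, t: float) -> float:
    """Probability that the first jump happens within time t and goes to `target`.

    `rates` maps each successor s' to its rate lambda(s, s').
    """
    exit_rate = sum(rates.values())            # E(s)
    return rates[target] / exit_rate * (1.0 - math.exp(-exit_rate * t))


# One successor: the formula reduces to 1 - exp(-lambda * t).
single = prob_jump_within({"a": 2.0}, "a", 0.5)
print(math.isclose(single, 1.0 - math.exp(-1.0)))   # True

# Two competing successors: the probabilities add up to 1 - exp(-E(s) * t).
rates = {"a": 2.0, "b": 1.0}
total = sum(prob_jump_within(rates, s, 1.0) for s in rates)
print(math.isclose(total, 1.0 - math.exp(-3.0)))    # True
```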
