Modeling and studying RNA secondary structure Eugène Asarin LIAFA, CNRS & Univ. Paris Diderot
Credits Co-authors, partners and teachers: • Vassily Lyubetsky, Alexander Seliverstov (IITP) • Thierry Cachat, Tayssir Touili (LIAFA) Sponsor: • CNRS/RAS convention d’échanges EVOLVER/REVERA Special thanks to: • Hervé Isambert (I. Curie)
Disclaimer • I am not a bioinformatician (yet) • I am (still) a computer scientist with a verification background – everything is a transition system – one should explore its state space in a smart way • This talk – More informatics than biology+physics – More models than solutions – More questions and speculations than answers
Motivating example: Classical Attenuation Regulation
Transcription and Translation DNA A G C T G C
Transcription and Translation Gene Polymerase DNA Amino Acids RNA Ribosome Transcription : DNA to RNA, done by Polymerase Translation : RNA to Amino Acids , done by Ribosome
Gene Expression Gene DNA Ribosome RNA Amino Acids Gene expressed if Polymerase reaches the Gene
Classical Attenuation Regulation Gene DNA RNA Ribosome Depends on the structure of the RNA between Ribosome and Polymerase
RNA Secondary Structure RNA: a sequence of nucleotides A, G, C, U with links A-U and G-C [Figure: an example sequence folded so that two complementary segments pair up and form a helix]
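A minimal sketch of the pairing rule on this slide, assuming 0-based positions and an antiparallel helix (position i pairs with j, i+1 with j-1, and so on). The names `can_pair` and `is_helix` are hypothetical and do not come from Rnamodel or Kinefold.

```python
# Links named on the slide: A-U and G-C (both orientations).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def can_pair(x: str, y: str) -> bool:
    """True if nucleotides x and y can form a link."""
    return (x, y) in PAIRS

def is_helix(seq: str, i: int, j: int, length: int) -> bool:
    """True if seq[i], seq[i+1], ... pair with seq[j], seq[j-1], ...
    for `length` consecutive links (0-based positions, an assumption)."""
    if i < 0 or j >= len(seq) or length <= 0:
        return False
    if i + length - 1 >= j - length + 1:
        return False  # the two strands would overlap
    return all(can_pair(seq[i + k], seq[j - k]) for k in range(length))
```

For instance, `is_helix("GGGAAACCC", 0, 8, 3)` checks three stacked G-C links around an AAA loop.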
RNA Secondary Structure (a simplified view) This structure is dynamic and changes very fast and can cause the slippage of the Polymerase
Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA
Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA Slippage of the polymerase and the gene is not expressed: Termination
Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA Polymerase reaches Gene and the Gene is expressed: Antitermination Each of these two situations can happen with some probability.
Regulation mechanism : causal chain • Concentration of a product (say trp) • Speed of Ribosome • Dynamics of secondary structure • Probability of Polymerase slippage • Gene expression • (production of trp)
Wanted • A model of the dynamics of the RNA secondary structure capable of predicting the probability of gene expression. – Should be quantitative – Should represent transient behaviours (steady state not enough) – Should be validated by biological data on regulation
Other motivations • Other kinds of regulation • Other alternative behaviors/structures on an RNA, e.g. in ribozymes • Scientific curiosity • Interesting transfers between Bio and Info
Models, Analyses, Tools
Tools (2 examples) • Rnamodel - Lyubetsky et al. • Kinefold – Isambert et al.
The approach: Markov chain • Features – The sequence is fixed. – States of the MC: all possible secondary structures on this sequence (or part of it) – Transitions: simple events (see below) – Rates: determined by ΔE • Difficulties – As usual, many parameters are difficult to find – The Markov Chain is huge and complex – Only Monte-Carlo simulation is possible – It is still heavy and slow
Some recipes • Find a succinct structured representation of MC • Use on-the-fly state generation • Use symbolic representations • Use abstractions • Use acceleration • Use other advanced techniques – perfect simulation etc. • Think
A succinct representation We suggest Probabilistic Rewriting Systems
The idea? • Represent the RNA secondary structure by a term • Represent the dynamics of the secondary structure by rewriting rules
The set of windows Polymerase RNA Ribosome w = (R,P) R: position of the Ribosome in the RNA P: position of the Polymerase in the RNA W = {w = (R,P) s.t. 13 ≤ R ≤ P ≤ l} l: the length of the regulatory region
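A small sketch enumerating the set of windows W, reading the constraint on this slide as 13 ≤ R ≤ P ≤ l (the non-strict inequalities are an assumption). The function name `windows` is hypothetical.

```python
def windows(l: int, r_min: int = 13):
    """All windows w = (R, P) with r_min <= R <= P <= l.

    l is the length of the regulatory region; r_min = 13 is the lower
    bound given on the slide. Non-strict inequalities are assumed.
    """
    return [(r, p) for r in range(r_min, l + 1) for p in range(r, l + 1)]
```

The number of windows grows quadratically with l, which is small compared with the number of structures that can sit inside each window.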
The helices B C D A f = (A,B,C,D)
The hypohelices F G E H f = (A,B,C,D) B C g = (E,F,G,H) f(g) D A
The structure as a term B C R D P A f = (A,B,C,D) w(f) w =(R,P)
The structure as a term B C F G R P D H A E f = (A,B,C,D) w(f , g) g = (E,F,G,H) w =(R,P)
The structure as a term j f R P h k g w =(R,P) w(f , g(h,k) , j) The order is not important
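One possible encoding of such terms, as a sketch: a helix is a 4-tuple of positions plus an unordered set of inner helices, and a window is (R, P) plus an unordered set of top-level helices. The class names, the helper functions and all coordinates below are hypothetical; using `frozenset` for the children is a design choice that makes "the order is not important" hold by construction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Helix:
    a: int
    b: int
    c: int
    d: int
    children: frozenset = frozenset()  # helices nested inside this one

@dataclass(frozen=True)
class Window:
    r: int  # position of the Ribosome
    p: int  # position of the Polymerase
    children: frozenset = frozenset()  # top-level helices

def helix(a, b, c, d, *inner):
    return Helix(a, b, c, d, frozenset(inner))

def window(r, p, *top):
    return Window(r, p, frozenset(top))

# The term w(f, g(h, k), j) with made-up coordinates.
f = helix(14, 16, 20, 22)
h = helix(27, 28, 32, 33)
k = helix(36, 37, 41, 42)
g = helix(24, 25, 44, 45, h, k)
j = helix(48, 50, 54, 56)
t1 = window(13, 60, f, g, j)
t2 = window(13, 60, j, g, f)  # same term, arguments in another order
```

Because the terms are immutable and hashable, they can be used directly as keys of a state table.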
The Dynamics as a Probabilistic Rewriting System
Extension and Reduction of a helix f = (A,B,C,D) g = (E,F,G,H) Rewriting rule: f ↔ g One rule for multiple contexts
(De)-Composition of a Hypohelix j f h k
(De)-Composition of a Helix j f h k g Rewriting rule: w(f , h , k , j) ↔ w(f , g(h , k) , j) Meta Rewriting rule: w( … , h , k ) ↔ w( … , g(h , k) )
(De)-Composition of a Helix j f h k m
(De)-Composition of a Helix j f h k g One rule for multiple contexts Meta Rewriting rule: m( … , h , k ) ↔ m( … , g(h , k) )
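The (de)composition rule can be sketched on unordered terms as follows. Terms are encoded here as pairs (label, frozenset of children), which is an assumption of this sketch; `term`, `compose` and `decompose` are hypothetical names. The sketch only manipulates the term: it does not check that g can geometrically enclose h and k, and it does not compute any rate.

```python
def term(label, *children):
    """An unordered term: a label and a set of sub-terms."""
    return (label, frozenset(children))

def compose(m, g_label, h, k):
    """m(..., h, k) -> m(..., g(h, k)): a new helix g forms around h and k."""
    label, kids = m
    if h not in kids or k not in kids:
        raise ValueError("h and k must both be direct children of m")
    return (label, (kids - {h, k}) | {term(g_label, h, k)})

def decompose(m, g):
    """m(..., g(h, k)) -> m(..., h, k): helix g opens, freeing its children."""
    label, kids = m
    if g not in kids:
        raise ValueError("g must be a direct child of m")
    return (label, (kids - {g}) | g[1])

# w(f, h, k, j) <-> w(f, g(h, k), j)
f, h, k, j = term("f"), term("h"), term("k"), term("j")
w = term("w", f, h, k, j)
w_composed = compose(w, "g", h, k)
```

The context m can be the window or any helix, which is what "one rule for multiple contexts" amounts to in this encoding.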
The Window Movement Polymerase RNA Ribosome w = (R,P) Three rules for multiple contexts Movement of the Ribosome: (R , P)(x) → (R+3 , P)(x’) Movement of the Polymerase: (R , P)(x) → (R , P+1)(x) Termination (slippage of the Polymerase): (R , P)(x) → ∅
Rates of the rewriting Rules • Rates of the Markov Chain determined by ΔE. Two terms – Stacking energy (easy) – Free energy (a bit obscure) • Movements of Rib and Pol – even more obscure
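A sketch of the three window rules as a successor function. Several things here are assumptions: the guards R+3 ≤ P and P+1 ≤ l (derived from reading the window constraint as R ≤ P ≤ l), the fact that termination is offered in every state (its rate would decide how likely it is), and the `trim` callback that turns x into x’ when the Ribosome advances. The slides do not say how x’ is computed, so `trim` defaults to the identity.

```python
def window_moves(r, p, l, x, trim=lambda x, new_r: x):
    """Successors of the state (R, P)(x) under the window rules.

    Returns a list of (rule_name, new_state); new_state is None after
    termination. `trim` is a hypothetical hook computing x' from x.
    """
    moves = []
    if r + 3 <= p:  # Ribosome advances by one codon (3 nucleotides)
        moves.append(("ribosome", (r + 3, p, trim(x, r + 3))))
    if p + 1 <= l:  # Polymerase advances by one nucleotide
        moves.append(("polymerase", (r, p + 1, x)))
    moves.append(("termination", None))  # slippage of the Polymerase
    return moves
```

Rates are deliberately left out: the slides describe the rates of Ribosome and Polymerase movements as obscure.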
Other optimisations Implemented or not
On-the-fly (everybody does it) • States of the MC are created only when visited. • Still needed efficient data structures for the states • Needed efficient algorithms to find all the successors of a given state (and the rates)
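On-the-fly generation can be sketched as a lazy exploration that stores a state and its outgoing transitions only when the state is first visited. The function `explore`, its `max_states` cap and the shape of `successors` (a function returning a list of (state, rate) pairs) are assumptions of this sketch, not the interface of any existing tool. States must be hashable.

```python
from collections import deque

def explore(initial, successors, max_states=100_000):
    """Breadth-first, on-the-fly construction of the reachable part of the MC.

    Returns a dict: state -> list of (successor, rate). A state gets an
    entry only when it is visited; exploration stops at max_states.
    """
    table = {}
    frontier = deque([initial])
    while frontier and len(table) < max_states:
        s = frontier.popleft()
        if s in table:
            continue
        table[s] = successors(s)
        for t, _rate in table[s]:
            if t not in table:
                frontier.append(t)
    return table
```

A Monte-Carlo run needs even less: it can ask for the successors of the current state only and forget the rest.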
Concretely
repeat 1000 times
    s = empty window
    repeat
        find all successors s’ of s and rates s -> s’
        s = a random s’
    until expressed or aborted
output statistics
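The loop above could look like this in Python. This is a sketch under assumptions: "a random s’" is taken to mean a successor drawn with probability proportional to its rate (the usual jump-chain choice for a continuous-time Markov chain), elapsed time is not tracked, and `simulate`, `successors` and `is_final` are hypothetical names. The two-outcome toy chain at the end is made up for illustration.

```python
import random

def simulate(initial, successors, is_final, runs=1000, rng=None):
    """Monte-Carlo estimate of how often each final state is reached.

    successors(s) returns a non-empty list of (state, rate) pairs for
    every non-final state s; is_final(s) tells when a run stops
    (expressed or aborted).
    """
    rng = rng or random.Random()
    counts = {}
    for _ in range(runs):
        s = initial
        while not is_final(s):
            succ = successors(s)
            states = [t for t, _rate in succ]
            rates = [rate for _t, rate in succ]
            # pick the next state with probability rate / sum(rates)
            s = rng.choices(states, weights=rates, k=1)[0]
        counts[s] = counts.get(s, 0) + 1
    return counts

# Toy chain with made-up rates: one step decides the outcome.
def toy_successors(s):
    return [("expressed", 3.0), ("aborted", 1.0)]

def toy_is_final(s):
    return s in ("expressed", "aborted")
```

Calling `simulate("start", toy_successors, toy_is_final)` returns a dictionary of counts whose ratio approximates the probability of expression.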
Symbolic representation (not used yet) A data structure (close to a formula) to represent the current set of states or probability distribution. • Very successful in the verification domain • We tried probabilistic tree automata, they explode • Thinking
Abstraction (everybody uses it in a naive way) • Aim: to have fewer states • Idea: to group together several states • Rnamodel and Kinefold: only maximal helices stored. • We try to group some close helices: 800000->300000 states • Abstraction-based algorithms should be done systematically
Abstraction (what if) • Biological description is very abstract (terminator, antiterminator, noise) • What if … a model is possible at this level? • To think
Acceleration • Problem : many fast transitions without any progress, major event rare • Partial solution: group similar states together • A better one : Isambert’s clustering • To think more
Advanced simulation methods • « Perfect simulation » - nice but only for steady state • Other methods from Performance Evaluation – mostly for N^k • To read and to think
More complex structures
Other structures • Helices can be « flipped » - not a big deal • They can be pseudoknotted • Biologically relevant
Consequences for modeling and analysis • What are legal configurations? • How to compute free energy? • Data structure: a set of helices, a tree with shortcuts, a colored graph? • Alphabet and state space much bigger… • Abstractions more involved. • Kinefold – yes, Rnamodel – in progress
Dreams • To find simple abstract models • To analyze them w/o Monte-Carlo • To transfer techniques from Verification and Performance evaluation and backwards
Questions?
Continuous Markov Chain • Probability of moving from s to s’ within t time units is: $1 - e^{-\lambda(s,s')\,t}$
Continuous Markov Chain • Probability of moving from s to s’ within t time units is: $\frac{\lambda(s,s')}{E(s)}\left(1 - e^{-E(s)\,t}\right)$, where $E(s) = \sum_{s'} \lambda(s,s')$
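A numeric sketch of this probability, assuming the slide states the usual race formula for a continuous-time Markov chain: with λ(s,s’) the rate of the transition and E(s) the sum of all rates leaving s, the probability of moving to s’ within time t is λ(s,s’)/E(s) · (1 − e^(−E(s)·t)). The function names and the example rates are hypothetical.

```python
import math

def exit_rate(rates):
    """E(s): total rate of leaving s, given rates = {successor: rate}."""
    return sum(rates.values())

def prob_move_within(rates, target, t):
    """Probability that s jumps to `target` within t time units."""
    e = exit_rate(rates)
    if e == 0:
        return 0.0  # absorbing state: no transition ever fires
    return rates.get(target, 0.0) / e * (1.0 - math.exp(-e * t))

# Made-up rates out of some state s.
example_rates = {"fold": 3.0, "slip": 1.0}
```

Summing `prob_move_within` over all successors gives 1 − e^(−E(s)·t), the probability of leaving s at all within t, and for large t each term tends to the branching probability λ(s,s’)/E(s).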