Modeling and studying RNA secondary structure Eugène Asarin LIAFA, CNRS & Univ. Paris Diderot

Credits Co-authors, partners and teachers: • Vassily Lyubetsky, Alexander Seliverstov (IITP) • Thierry Cachat, Tayssir Touili (LIAFA) Sponsor: • CNRS/RAS convention d’échanges EVOLVER/REVERA Special thanks to: • Hervé Isambert (I. Curie)

Disclaimer • I am not a bioinformatician (yet) • I am (still) a computer scientist with a verification background – everything is a transition system – one should explore its state space in a smart way • This talk – More informatics than biology+physics – More models than solutions – More questions and speculations than answers

Motivating example: Classical Attenuation Regulation

Transcription and Translation DNA A G C T G C

Transcription and Translation Transcription: DNA to RNA, done by Polymerase. Translation: RNA to Amino Acids, done by Ribosome. [Figure labels: Gene, Polymerase, DNA, Amino Acids, RNA, Ribosome]

Gene Expression The Gene is expressed if the Polymerase reaches the Gene. [Figure labels: Gene, DNA, Ribosome, RNA, Amino Acids]

Classical Attenuation Regulation Gene DNA RNA Ribosome Depends on the structure of the RNA between Ribosome and Polymerase

RNA Secondary Structure RNA: a sequence of nucleotides A, G, C, U with links A-U and G-C. [Figure: an example nucleotide sequence folded into a Helix]
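The pairing rule on this slide can be sketched in a few lines of Python. The helper names `can_pair` and `is_helix` are illustrative and do not come from the talk. Only the A-U and G-C links named on the slide are allowed, and no minimum loop length is enforced.

```python
# Links named on the slide: A-U and G-C (in both orientations).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def can_pair(x: str, y: str) -> bool:
    """True if nucleotides x and y can form a link."""
    return (x, y) in PAIRS

def is_helix(seq: str, i: int, j: int, length: int) -> bool:
    """Check that seq[i .. i+length-1] pairs antiparallel with seq[j-length+1 .. j].

    Positions are 0-based; i is the outer 5' end, j is the outer 3' end.
    """
    if i < 0 or j >= len(seq) or i + length > j - length + 1:
        return False  # out of range, or the two strands would overlap
    return all(can_pair(seq[i + k], seq[j - k]) for k in range(length))
```

For example, `is_helix("GGGAAAACCC", 0, 9, 3)` holds because the three outer G-C links stack, while asking for a fourth link fails on the A-A mismatch.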

RNA Secondary Structure (a simplified view) This structure is dynamic: it changes very fast and can cause the slippage of the Polymerase

Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA

Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA Slippage of the polymerase and the gene is not expressed: Termination

Polymerase Slippage T-rich region: connection of Pol and DNA weakens Gene Ribosome RNA Polymerase reaches Gene and the Gene is expressed: Antitermination Each of these two situations can happen with some probability .

Regulation mechanism : causal chain • Concentration of a product (say trp) • Speed of Ribosome • Dynamics of secondary structure • Probability of Polymerase slippage • Gene expression • (production of trp)

Wanted • A model of the dynamics of the RNA secondary structure capable of predicting the probability of gene expression. – Should be quantitative – Should represent transient behaviours (steady state not enough) – Should be validated by biological data on regulation

Other motivations • Other kinds of regulation • Other alternative behaviors/structures on an RNA, e.g. in ribozymes • Scientific curiosity • Interesting transfers between Bio and Info

Models, Analyses, Tools

Tools (2 examples) • Rnamodel - Lyubetsky et al. • Kinefold – Isambert et al.

The approach: Markov chain • Features – The sequence is fixed. – States of the MC: all possible secondary structures on this sequence (or part of it) – Transitions: simple events (see below) – Rates: determined by ΔE • Difficulties – As usual, many parameters are difficult to find – The Markov Chain is huge and complex – Only Monte-Carlo simulation is possible – It is still heavy and slow

Some recipes • Find a succinct structured representation of MC • Use on-the-fly state generation • Use symbolic representations • Use abstractions • Use acceleration • Use other advanced techniques – perfect simulation etc. • Think

A succinct representation We suggest Probabilistic Rewriting Systems

The idea? • Represent the RNA secondary structure by a term • Represent the dynamics of the secondary structure by rewriting rules

The set of windows w = (R, P) R: position of the Ribosome in the RNA P: position of the Polymerase in the RNA W = {w = (R, P) s.t. 13 ≤ R ≤ P ≤ l} l: the length of the regulatory region [Figure labels: Polymerase, RNA, Ribosome]
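As a small illustration, the set W can be enumerated directly, reading the constraint as 13 ≤ R ≤ P ≤ l. The function name `windows` and the constant name `R_MIN` are hypothetical.

```python
from typing import Iterator, Tuple

R_MIN = 13  # lower bound on the Ribosome position, as on the slide

def windows(l: int) -> Iterator[Tuple[int, int]]:
    """Yield every window w = (R, P) with R_MIN <= R <= P <= l."""
    for r in range(R_MIN, l + 1):
        for p in range(r, l + 1):
            yield (r, p)
```

The number of windows grows quadratically with the length of the regulatory region, which is small compared with the number of secondary structures possible inside each window.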

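The helices f = (A,B,C,D) [Figure: a helix f with endpoints labeled A, B, C, D]

The hypohelices f = (A,B,C,D) g = (E,F,G,H) f(g) [Figure: helices f and g with endpoints labeled A to H]

The structure as a term w = (R,P) f = (A,B,C,D) w(f) [Figure: window (R, P) containing helix f]

The structure as a term w = (R,P) f = (A,B,C,D) g = (E,F,G,H) w(f , g) [Figure: window (R, P) containing helices f and g]

The structure as a term w = (R,P) w(f , g(h,k) , j) The order is not important [Figure: window (R, P) containing helices f, g, h, k, j]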
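A minimal sketch of such terms in Python, assuming a term is just a label with a tuple of children. The helpers `term`, `canonical` and `same_structure` are illustrative names. Real helices would carry their four positions rather than a bare letter.

```python
def term(label, *children):
    """Build a term as (label, children)."""
    return (label, tuple(children))

def canonical(t):
    """Render a term with children sorted, so sibling order does not matter."""
    label, children = t
    if not children:
        return label
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"

def same_structure(a, b):
    """Two terms denote the same structure if their canonical forms agree."""
    return canonical(a) == canonical(b)

# w(f, g(h, k), j) from the slide, written in two different sibling orders.
t1 = term("w", term("f"), term("g", term("h"), term("k")), term("j"))
t2 = term("w", term("j"), term("g", term("k"), term("h")), term("f"))
```

Here `canonical(t1)` and `canonical(t2)` both give `w(f,g(h,k),j)`, which is one simple way to make "the order is not important" concrete when states are stored in a hash table.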

The Dynamics as a Probabilistic Rewriting System

Extension and Reduction of a helix f = (A,B,C,D) g = (E,F,G,H) Rewriting rule: f ↔ g (one rule for multiple contexts) [Figure: helices f and g with endpoints labeled A to H]

(De)-Composition of a Hypohelix j f h k

(De)-Composition of a Helix Rewriting rule: w(f , h , k , j) ↔ w(f , g(h , k) , j) Meta rewriting rule: w(… , h , k) ↔ w(… , g(h , k)) [Figure labels: j, f, h, k, g]

(De)-Composition of a Helix j f h k m

(De)-Composition of a Helix Meta rewriting rule: m(… , h , k) ↔ m(… , g(h , k)) (one rule for multiple contexts) [Figure labels: j, f, h, k, g, m]
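The meta rule can be sketched as two functions on terms, one per direction. This assumes terms are `(label, children)` tuples. The names `compose` and `decompose` are hypothetical, and the sketch does not check whether the nucleotides around h and k can actually pair to form g.

```python
def term(label, *children):
    """Build a term as (label, children); the order of children carries no meaning."""
    return (label, tuple(children))

def compose(m, h, k, g_label):
    """m(..., h, k) -> m(..., g(h, k)): wrap the siblings h and k in a new helix g."""
    label, children = m
    rest = list(children)
    rest.remove(h)  # raises ValueError if h is not a child of m
    rest.remove(k)
    return (label, tuple(rest) + (term(g_label, h, k),))

def decompose(m, g):
    """m(..., g(h, k)) -> m(..., h, k): remove helix g and release its children."""
    label, children = m
    rest = list(children)
    rest.remove(g)
    return (label, tuple(rest) + g[1])

f, h, k, j = term("f"), term("h"), term("k"), term("j")
w = term("w", f, h, k, j)
w2 = compose(w, h, k, "g")            # w(f, j, g(h, k))
w3 = decompose(w2, term("g", h, k))   # back to w(f, j, h, k)
```

Because the parent label is a parameter, the same two functions cover the window w and any enclosing helix m, which is the point of stating one rule for multiple contexts.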

The Window Movement w = (R , P) Three rules for multiple contexts: Movement of the Ribosome: (R , P)(x) → (R+3 , P)(x’) Movement of the Polymerase: (R , P)(x) → (R , P+1)(x) Termination (slippage of the Polymerase): (R , P)(x) → ∅ [Figure labels: Polymerase, RNA, Ribosome]
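The three window rules can be sketched on the pair (R, P) alone. The guards below (the Ribosome stays behind the Polymerase, and the Polymerase stays within the region of length l) are assumptions taken from the definition of W. The update of the structure x to x’ when the Ribosome advances is omitted, and `TERMINATED` is a hypothetical marker for the absorbing state.

```python
TERMINATED = "terminated"  # hypothetical marker for the absorbing state

def move_ribosome(w):
    """(R, P) -> (R + 3, P): the Ribosome advances by one codon (three nucleotides)."""
    r, p = w
    return (r + 3, p) if r + 3 <= p else None  # None: the move is not enabled

def move_polymerase(w, l):
    """(R, P) -> (R, P + 1): the Polymerase advances by one nucleotide."""
    r, p = w
    return (r, p + 1) if p + 1 <= l else None

def slip(w):
    """Termination: the Polymerase slips off and the run ends."""
    return TERMINATED
```

In a full model each of these functions would also return the rate of the move.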

Rates of the rewriting Rules • Rates of the Markov Chain determined by ΔE. Two terms – Stacking energy (easy) – Free energy (a bit obscure) • Movements of Rib and Pol – even more obscure
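The slides say only that the rates are determined by the energy difference. One common choice in kinetic models, used here purely as an assumption and not as the talk's method, is a Metropolis-type rate: a move that lowers the energy fires at a base rate k0, and a move that raises it by ΔE is slowed by the Boltzmann factor exp(-ΔE / RT). The base rate `k0` and the temperature are hypothetical parameters.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal / (mol K)
TEMPERATURE = 310.15     # K (37 degrees Celsius); an assumed value

def metropolis_rate(delta_e: float, k0: float = 1.0) -> float:
    """Rate of a move that changes the energy by delta_e (in kcal/mol).

    Assumption: Metropolis rule; the slides do not specify the rate law.
    """
    if delta_e <= 0:
        return k0
    return k0 * math.exp(-delta_e / (GAS_CONSTANT * TEMPERATURE))
```

With this rule the ratio of forward and backward rates equals exp(-ΔE / RT), so the chain has the Boltzmann distribution as its steady state. The slides stress that transient behaviour is what matters, which is why the absolute scale k0 would also need calibration.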

Other optimisations Implemented or not

On-the-fly (everybody does it) • States of the MC are created only when visited. • Efficient data structures for the states are still needed • Efficient algorithms are needed to find all the successors of a given state (and their rates)

Concretely
repeat 1000 times
    s = empty window
    repeat
        find all successors s’ of s and the rates s -> s’
        s = a random s’ (chosen with probability proportional to its rate)
    until expressed or aborted
output statistics
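The loop above can be sketched in Python as follows. The functions `simulate` and `estimate`, the `successors` callback and the toy rates are all illustrative assumptions. Only the sequence of jumps is simulated, since the question is which final state is reached, not when.

```python
import random

def simulate(initial, successors, is_final, rng, max_steps=100_000):
    """Run one trajectory; return the final state reached, or None if aborted."""
    s = initial
    for _ in range(max_steps):
        if is_final(s):
            return s
        moves = successors(s)  # list of (next_state, rate), built on the fly
        if not moves:
            return None        # dead end: no enabled transition
        states, rates = zip(*moves)
        # Pick the next state with probability proportional to its rate.
        s = rng.choices(states, weights=rates, k=1)[0]
    return None

def estimate(initial, successors, is_final, runs=1000, seed=0):
    """Repeat the simulation and count how often each final state is reached."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        end = simulate(initial, successors, is_final, rng)
        counts[end] = counts.get(end, 0) + 1
    return counts

# Toy chain with hypothetical rates: from "start" the gene is expressed with
# rate 3 and transcription terminates with rate 1.
def toy_successors(s):
    return [("expressed", 3.0), ("terminated", 1.0)] if s == "start" else []

counts = estimate("start", toy_successors, lambda s: s != "start")
```

On the toy chain the fraction of runs ending in "expressed" should be close to 3/4. In the real model `successors` would apply the rewriting rules to the current term.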

Symbolic representation (not used yet) A data structure (close to a formula) to represent the current set of states or probability distribution. • Very successful in the verification domain • We tried probabilistic tree automata, they explode • Thinking

Abstraction (everybody uses it in a naive way) • Aim: to have fewer states • Idea: to group together several states • Rnamodel and Kinefold: only maximal helices stored. • We try to group some close helices: 800000 -> 300000 states • Abstraction-based algorithms should be done systematically

Abstraction (what if) • Biological description is very abstract (terminator, antiterminator, noise) • What if … a model is possible at this level? • To think

Acceleration • Problem: many fast transitions without any progress, major events are rare • Partial solution: group similar states together • A better one: Isambert’s clustering • To think more

Advanced simulation methods • « Perfect simulation » - nice but only for steady state • Other methods from Performance Evaluation – mostly for N^k • To read and to think

More complex structures

Other structures • Helices can be « flipped » - not a big deal • They can be pseudoknotted • Biologically relevant

Consequences for modeling and analysis • What are legal configurations? • How to compute free energy? • Data structure: a set of helices, a tree with shortcuts, a colored graph? • Alphabet and state space much bigger… • Abstractions more involved. • Kinefold – yes, Rnamodel – in progress

Dreams • To find simple abstract models • To analyze them w/o Monte-Carlo • To transfer techniques from Verification and Performance evaluation and backwards

Questions?

Continuous Markov Chain • Probability for moving from s to s’ within t time units is: $1 - e^{-\lambda(s,s')\,t}$

Continuous Markov Chain • Probability for moving from s to s’ within t time units is: $\frac{\lambda(s,s')}{E(s)}\left(1 - e^{-E(s)\,t}\right)$, where $E(s) = \sum_{s''} \lambda(s,s'')$
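As a numerical illustration, write λ(s, s’) for the rate of the transition from s to s’ (the symbol is an assumption) and E(s) for the sum of all rates leaving s. The probability above can then be computed as follows. The function names and the example rates are hypothetical.

```python
import math

def exit_rate(rates):
    """E(s): the sum of the rates of all transitions leaving s."""
    return sum(rates.values())

def prob_within(rates, target, t):
    """Probability that the move to `target` is taken within time t.

    Computed as rate(target) / E(s) * (1 - exp(-E(s) * t)).
    """
    e = exit_rate(rates)
    return rates[target] / e * (1.0 - math.exp(-e * t))

rates = {"a": 3.0, "b": 1.0}  # hypothetical rates out of one state s
```

With a single outgoing transition the expression reduces to 1 - exp(-λ t). With several competing transitions, the probabilities over all targets add up to 1 - exp(-E(s) t), and for large t each one tends to the branching ratio λ(s, s’) / E(s).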
