Prediction of RNA-RNA-Interaction
S.Will, 18.417, Fall 2011
Can Alkan, Emre Karakoc, Joseph H. Nadeau, S. Cenk Sahinalp, Kaizhong Zhang. RNA-RNA interaction prediction and antisense RNA target search. JCB 2006
• define problem RIP (with and without PKs)
• prove NP-completeness even without PKs for the base pair-energy model and more complex models (reduction from “longest common subsequence of multiple binary strings”, mLCP)
[Figure: RNA-RNA interaction diagram, sequence positions 1 to 20]
Relation between PK-Prediction and RIP
• RNAcofold: concatenate RNAs A and B, predict PK-free structure
• specific restrictions on the structure of the interaction complex
• Can we apply pseudoknot prediction to the concatenation? Difference to the Alkan algorithm?
[Figure: interaction structures drawn on the concatenated sequences, positions 1 to 20]
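The concatenation idea can be sketched in a few lines. This is a minimal illustration, not RNAcofold itself: it swaps the thermodynamic energy model for plain base pair maximization (a Nussinov-style recursion), and the function name `fold_concatenation`, the `min_loop` parameter and the pairing rules are assumptions made for this example.

```python
# Assumption: only Watson-Crick and G-U wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_concatenation(a, b, min_loop=3):
    """Maximize the number of nested (PK-free) base pairs on the concatenation a+b.

    Returns (number of pairs, all pairs as 0-based indices into a+b,
    intermolecular pairs as (index in a, index in b)).
    """
    seq, cut = a + b, len(a)
    n = len(seq)

    def can_pair(i, j):
        if (seq[i], seq[j]) not in PAIRS:
            return False
        if i < cut <= j:           # pair between the two molecules: no hairpin loop
            return True
        return j - i > min_loop    # pair within one molecule: enforce a minimal loop

    # best[i][j] = maximal number of pairs in the half-open interval seq[i:j]
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            score = best[i][j - 1]                 # last base stays unpaired
            for k in range(i, j - 1):              # last base pairs with k
                if can_pair(k, j - 1):
                    score = max(score, best[i][k] + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: recover one optimal set of pairs.
    pairs, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - 1):
            if can_pair(k, j - 1) and best[i][j] == best[i][k] + 1 + best[k + 1][j - 1]:
                pairs.append((k, j - 1))
                stack.append((i, k))
                stack.append((k + 1, j - 1))
                break
    pairs.sort()
    inter = [(i, j - cut) for i, j in pairs if i < cut <= j]
    return best[0][n], pairs, inter


score, pairs, inter = fold_concatenation("GGGG", "CCCC")
print(score, inter)
```

Because the recursion only builds nested pairs on the concatenated sequence, interactions that cross each other there (for example kissing hairpins) cannot be represented, which is the restriction the slide refers to.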
Semiautomatic RNA 3D Structure Modeling
Bruce A Shapiro, Yaroslava G Yingling, Wojciech Kasprzak and Eckart Bindewald. Bridging the gap in RNA structure prediction. Current Opinion in Structural Biology. 2007
An automated pipeline: MC-Fold/MC-Sym
Marc Parisien & Francois Major. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008.
Potential obstacles
• Reliability of secondary structure prediction → prediction from alignments, covariance
• Pseudoknots → pseudoknot prediction → covariance analysis of large multiple alignments
• Non-canonical base pairs → experimental loop energies? learn from 3D-structures!
• 3D-motifs (due to non-canonical base pairs) → learn from 3D-structures, isostericity
Non-canonical Base Pairs, 3D-Motifs and Isostericity
Aurélie Lescoute, Neocles B. Leontis, Christian Massire and Eric Westhof. Recurrent structural RNA motifs, Isostericity Matrices and sequence alignments. NAR 2005.
Non-Canonical Base Pairs
Leontis, N.B. and Westhof, E. Geometric nomenclature and classification of RNA base pairs. RNA 2001
Back to MC-Fold/MC-Sym
• NCMs: Nucleotide Cyclic Motifs from PDB (531 structures)
• MC-Fold predicts secondary structure including non-canonical base pairs by merging NCMs
• Probability-based scoring:
  Pr[ structure | seq ] = Pr[ NCMs | seq ] × Pr[ junctions | NCMs ] × Pr[ hinges | junctions ] × Pr[ pairs | hinges ]
• predict sub-optimals
Prediction Performance of MC-Fold

Predicted base pairs (%)   RNAsubopt          CONTRAfold           MC-Fold
                           (Thermodynamics)   (Machine learning)   (NCM)
False positives            6.7                7.5                  17.9
False negatives            25.2               26.9                 10.1
True positives             74.8               73.1                 89.9
Canonicals                 88.4               86.3                 94.7
Non-canonicals             N/A                1.4                  62.1
Matthews                   82.8               81.4                 86.6

$\text{Matthews} = \dfrac{TP}{\sqrt{(TP + FN)\,(TP + FP)}}$

1968 base pairs (1665 Watson-Crick) in 264 hairpins from 182 different PDB structures
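As a quick consistency check, the Matthews row can be recomputed from the true positive, false negative and false positive rows. The sketch below assumes the slide's formula is TP divided by the square root of (TP + FN)(TP + FP); the function name `matthews` is chosen for this example. The three triples are the table columns from left to right.

```python
import math


def matthews(tp, fn, fp):
    """Matthews coefficient as used on the slide: TP / sqrt((TP + FN) * (TP + FP))."""
    return tp / math.sqrt((tp + fn) * (tp + fp))


# (TP, FN, FP) in percent, one triple per table column, left to right.
columns = [(74.8, 25.2, 6.7), (73.1, 26.9, 7.5), (89.9, 10.1, 17.9)]
reported = [82.8, 81.4, 86.6]

for (tp, fn, fp), value in zip(columns, reported):
    computed = 100 * matthews(tp, fn, fp)
    print(f"computed {computed:.2f}  reported {value}")
```

The recomputed values agree with the reported ones to within about 0.1, which is the rounding error expected from one-decimal inputs.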
MC-Sym
• libraries of 3D-fragments for each NCM
• solve combinatorial puzzle, satisfy steric/RMSD constraints
• Las Vegas algorithm (no exhaustive enumeration, could fail to produce a solution)
• run-time in pipeline: 24h
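The Las Vegas property can be illustrated with a toy fragment-assembly search. This is a generic sketch and not MC-Sym's actual procedure: the function `las_vegas_assemble`, the integer "fragments" and the compatibility rule are all hypothetical. The point is only that every returned answer has been checked against the constraints, while the search may give up after its budget without finding one.

```python
import random


def las_vegas_assemble(libraries, compatible, max_tries=1000, seed=0):
    """Pick one fragment per library at random until neighbours are compatible.

    Returns a list of fragments that satisfies all constraints, or None if
    the try budget runs out. A returned answer is never wrong.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        choice = [rng.choice(library) for library in libraries]
        if all(compatible(x, y) for x, y in zip(choice, choice[1:])):
            return choice          # verified solution
    return None                    # failure is allowed; no exhaustive enumeration


# Hypothetical toy data: fragments are integers, and neighbours are
# "sterically compatible" when they differ by at most 1.
libraries = [[0, 5, 9], [1, 4, 8], [2, 3, 7]]
result = las_vegas_assemble(libraries, lambda x, y: abs(x - y) <= 1)
print(result)
```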
Example Predictions of MC-Fold/MC-Sym
[Figure: panels a to e showing predicted secondary structures and 3D models of example RNAs]
[Parisien & Major, Nature 2008]
Rfam / Infernal
• Infernal (Inference of RNA alignments): scan genomic data for RNA family members
• important tool for Rfam: Rfam 10.1 (June 2011, 1973 families), http://rfam.sanger.ac.uk/
• in Rfam: 'hand-curated' seed alignments ⇒ full alignments
• use Stochastic Context Free Grammars to model RNA families
• model of a family: Covariance Model (CM)

input multiple alignment (columns 1 to 28):

[structure] . : : < < < _ _ _ _ > - > > : < < - < . _ _ _ . > > > .
human       . A A G A C U U C G G A U C U G G C G . A C A . C C C .
mouse       a U A C A C U U C G G A U G - C A C C . A A A . G U G a
orc         . A G G U C U U C - G C A C G G G C A g C C A c U U C .

example structure: [Figure: secondary structure drawing of the example sequence with numbered positions]
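The structure line of the example alignment encodes the consensus base pairs with matching `<` and `>` characters. A minimal sketch of reading them off with a stack is shown below. The function name `structure_pairs` is chosen for this example, and the sketch handles only the `<`/`>` bracket characters that occur on the slide, treating every other symbol as unpaired.

```python
def structure_pairs(structure):
    """Return base pairs (1-based alignment columns) from a '<'/'>' structure line."""
    stack, pairs = [], []
    for column, symbol in enumerate(structure, start=1):
        if symbol == "<":
            stack.append(column)
        elif symbol == ">":
            if not stack:
                raise ValueError(f"unmatched '>' at column {column}")
            pairs.append((stack.pop(), column))
    if stack:
        raise ValueError(f"unmatched '<' at column {stack[-1]}")
    return sorted(pairs)


# Structure line from the slide, with the separating spaces removed.
structure = ".::<<<____>->>:<<-<.___.>>>."
print(structure_pairs(structure))
# [(4, 14), (5, 13), (6, 11), (16, 27), (17, 26), (19, 25)]
```

These six column pairs are the ones that reappear as MATP nodes in the guide tree.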
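Infernal
Construct grammatical description: consensus structure → guide tree
[Figure: consensus structure drawn with alignment column numbers 2 to 27]

guide tree (node type, node number, alignment columns):
ROOT 1
MATL 2   (column 2)
MATL 3   (column 3)
BIF 4
  BEGL 5
  MATP 6   (columns 4, 14)
  MATP 7   (columns 5, 13)
  MATR 8   (column 12)
  MATP 9   (columns 6, 11)
  MATL 10  (column 7)
  MATL 11  (column 8)
  MATL 12  (column 9)
  MATL 13  (column 10)
  END 14
  BEGR 15
  MATL 16  (column 15)
  MATP 17  (columns 16, 27)
  MATP 18  (columns 17, 26)
  MATL 19  (column 18)
  MATP 20  (columns 19, 25)
  MATL 21  (column 21)
  MATL 22  (column 22)
  MATL 23  (column 23)
  END 24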
Infernal
• Construct CM from guide tree
• Expand nodes of guide tree: add match, insertion, and deletion states
• learn transition and output probabilities from alignment
• CM comparable to profile HMM for protein families (Pfam)

node      "split set"                   inserts
ROOT 1    S 1                           IL 2, IR 3
MATL 2    ML 4, D 5                     IL 6
MATL 3    ML 7, D 8                     IL 9
BIF 4     B 10
BEGL 5    S 11
MATP 6    MP 12, ML 13, MR 14, D 15     IL 16, IR 17
MATP 7    MP 18, ML 19, MR 20, D 21     IL 22, IR 23
MATR 8    MR 24, D 25                   IR 26
MATP 9    MP 27, ML 28, MR 29, D 30     IL 31, IR 32
MATL 10   ML 33, D 34                   IL 35
MATL 11   ML 36, D 37                   IL 38
MATL 12   ML 39, D 40                   IL 41
MATL 13   ML 42, D 43                   IL 44
END 14    E 45
BEGR 15   S 46                          IL 47
MATL 16   ML 48, D 49                   IL 50
MATP 17   MP 51, ML 52, MR 53, D 54     IL 55, IR 56
MATP 18   MP 57, ML 58, MR 59, D 60     IL 61, IR 62
MATL 19   ML 63, D 64                   IL 65
MATP 20   MP 66, ML 67, MR 68, D 69     IL 70, IR 71
MATL 21   ML 72, D 73                   IL 74
MATL 22   ML 75, D 76                   IL 77
MATL 23   ML 78, D 79                   IL 80
END 24    E 81
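The node expansion on this slide is mechanical: each guide tree node type contributes a fixed list of states, numbered consecutively in guide tree order. The sketch below reproduces the state numbering of the slide's example (S 1 through E 81). The names `STATES_PER_NODE`, `GUIDE_TREE` and `expand` are chosen for this illustration and are not Infernal's internal API.

```python
# States contributed by each node type, in the order used on the slide:
# first the split set (match/delete states), then the insert states.
STATES_PER_NODE = {
    "ROOT": ["S", "IL", "IR"],
    "MATP": ["MP", "ML", "MR", "D", "IL", "IR"],
    "MATL": ["ML", "D", "IL"],
    "MATR": ["MR", "D", "IR"],
    "BIF": ["B"],
    "BEGL": ["S"],
    "BEGR": ["S", "IL"],
    "END": ["E"],
}

# Guide tree of the slide's example, nodes 1 to 24 in order.
GUIDE_TREE = [
    "ROOT", "MATL", "MATL", "BIF",
    "BEGL", "MATP", "MATP", "MATR", "MATP",
    "MATL", "MATL", "MATL", "MATL", "END",
    "BEGR", "MATL", "MATP", "MATP", "MATL", "MATP",
    "MATL", "MATL", "MATL", "END",
]


def expand(guide_tree):
    """Return (state number, state type, node type, node number) for every state."""
    states = []
    for node_number, node_type in enumerate(guide_tree, start=1):
        for state_type in STATES_PER_NODE[node_type]:
            states.append((len(states) + 1, state_type, node_type, node_number))
    return states


states = expand(GUIDE_TREE)
print(len(states))   # 81
print(states[-1])    # (81, 'E', 'END', 24)
```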