
A Method for Aligning RNA Secondary Structures

Jason T. L. Wang, New Jersey Institute of Technology. J. Liu, J. T. L. Wang, J. Hu and B. Tian, BMC Bioinformatics, 2005.


  1. A Method for Aligning RNA Secondary Structures. Jason T. L. Wang, New Jersey Institute of Technology. J. Liu, J. T. L. Wang, J. Hu and B. Tian, BMC Bioinformatics, 2005.

  2. Outline • Introduction • Structural alignment of RNA (preliminaries, RSmatch algorithm, software) • Experiments (RNA motif detection) • Multiple structural alignment (RMulti) • Combining RSmatch with RNAView • Conclusion and future work

  3. Molecule building blocks • Protein building blocks: – 20 types of amino acid • RNA building blocks: – Purine: Adenine, Guanine – Pyrimidine: Cytosine, Uracil

  4. RNA structure elements • An RNA sequence folds to form secondary/tertiary structure • The majority of base connections involve two bases – Watson-Crick: AU or CG – Non-canonical: UG or AG • Basic structure elements of RNA

  5. Definition of structural components • Given an RNA sequence: – 5' → 3': r_1 r_2 r_3 … r_n • Two types of structural components [1]: – Single bases (blue) – Bonded base pairs (red) [Figure: example secondary structure with single bases in blue and base pairs in red] [1] Zuker, M. (1989) Science

  6. Secondary structure constraint (1) • No common base can be shared by any two pairs [2]. – Bad: "G" is shared by two pairs: A-G and G-C [Figure: (a) GOOD structure; (b) BAD structure, prohibited] [2] Hofacker, I.L. (2003) NAR

  7. Secondary structure constraint (2) • A hairpin element must have at least 3 bases on the loop part [3]. – Bad: only two bases (A and U) present in the loop [Figure: (a) GOOD hairpin; (b) BAD hairpin, prohibited] [3] Zuker, M. (1991) NAR

  8. Secondary structure constraint (3) • Pseudoknots are not included [4] [Figure: (a) BAD (pseudoknot, prohibited); (b) GOOD (nested structure); (c) GOOD (branching)] [4] Mathews, D.H. (1999) JMB
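The three constraints above can be checked mechanically once a structure is given as a list of base pairs. The following is a minimal sketch, not taken from the slides: the function name `violated_constraints`, the 0-based `(i, j)` pair encoding with `i < j`, and the violation labels are all assumptions made for illustration.

```python
from itertools import combinations


def violated_constraints(pairs, min_loop=3):
    """Return the set of secondary-structure constraints that `pairs` violates.

    `pairs` is a list of (i, j) base pairs with 0-based positions and i < j
    (an encoding assumed here for illustration).
    """
    violations = set()

    # Constraint (1): no base may take part in more than one pair.
    used = [base for pair in pairs for base in pair]
    if len(used) != len(set(used)):
        violations.add("shared_base")

    # Constraint (2): a pair must enclose at least `min_loop` bases, so every
    # hairpin loop has at least that many bases.
    for i, j in pairs:
        if j - i - 1 < min_loop:
            violations.add("short_hairpin")

    # Constraint (3): two pairs must not cross (i < k < j < l is a pseudoknot).
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            violations.add("pseudoknot")

    return violations


print(violated_constraints([(0, 9), (1, 8), (2, 7)]))  # nested stem, no violation
print(violated_constraints([(0, 6), (3, 9)]))          # crossing pairs
```

Checking every pair against `min_loop` is enough for constraint (2), because any pair that encloses fewer than three bases either closes a too-short hairpin or contains one.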

  9. RNA secondary structure representation schemes a. Bond annotation [5] b. Arc representation [6] c. Tree representation [7] d. Nested parenthesis representation [8] [5] Shapiro, B. (1990) CABIOS [6] Zhang, K. (1999) CPM [7] Ma, B. (2002) TCS [8] Hofacker, I.L. (2002) JMB
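As an illustration of the nested parenthesis representation (scheme d), the sketch below turns a dot-bracket string into the two kinds of structural components, single bases and bonded base pairs. The function names `parse_dot_bracket` and `components` are hypothetical, and positions are 0-based by assumption; the example sequence and structure are the ones that appear later in the deck.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (i, j) encoded by a dot-bracket string, sorted by i."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)                # remember the 5' partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos)) # close the most recent open pair
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def components(sequence, structure):
    """Split a structure into single bases and bonded base pairs."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = parse_dot_bracket(structure)
    paired = {pos for pair in pairs for pos in pair}
    singles = [(i, sequence[i]) for i in range(len(sequence)) if i not in paired]
    bonded = [((i, j), f"{sequence[i]}-{sequence[j]}") for i, j in pairs]
    return singles, bonded


singles, bonded = components("AUACAUGUUC", "..(.....).")
print(bonded)  # the single A-U pair closing the hairpin
```

A stack suffices here because pseudoknots are excluded, so every closing parenthesis matches the most recently opened one.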

  10. Outline • Introduction • Structural alignment of RNA (preliminaries, RSmatch algorithm, software) • Experiments (RNA motif detection) • Multiple structural alignment (RMulti) • Combining RSmatch with RNAView • Conclusion and future work

  11. Extended circle model • Circle model [9]: – circle 0: G, C, A, G, A, A – circle 1: A, A, U, G – circle 7: C, C, G, C, G – circle 8: G, U, A, U, U, U, C • Sequential order between components: G > C > A-U > U > C-G > A-G [Figure: example structure partitioned into circles 0 to 8] [9] Liu, J. (2005) BMC Bioinformatics

  12. Hierarchical organization • Circles are organized in a tree-like hierarchy [Figure: example structure with circles 0 to 8 and the corresponding tree of circles]

  13. Hierarchical relationship between two structural components (1) the same circle: e.g. each pair from G, C, G, A-U, G-C, G, A-U (2) descendant/ancestor circles: e.g. pair (G, A-U) (3) cousin circles: e.g. pairs (U, C), (A-U, G-C) and (U, G-C) [Figure: panels (1), (2) and (3) highlighting the example components on the structure]
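The three relationships can be sketched in code under a simplifying assumption that is not stated on the slides: each component is assigned to the circle closed by the innermost base pair that strictly encloses it, with circle 0 standing for the exterior. The circle numbering produced here is therefore not the numbering used in the figure, and `circle_paths` and `relationship` are hypothetical names.

```python
def circle_paths(structure):
    """Map each component of a dot-bracket string to its chain of circles.

    Keys are positions (single bases) or (i, j) tuples (base pairs). Each
    value is the tuple of circle ids from the exterior (0) down to the circle
    holding the component. Assumption: a base pair sits in the circle that
    encloses it and closes a new circle of its own.
    """
    paths = {}
    open_circles = [0]   # ids of the circles currently enclosing the cursor
    open_pairs = []      # (5' position, path of the pair) for unclosed pairs
    next_id = 1
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_pairs.append((pos, tuple(open_circles)))
            open_circles.append(next_id)
            next_id += 1
        elif symbol == ")":
            open_circles.pop()
            start, path = open_pairs.pop()
            paths[(start, pos)] = path
        else:
            paths[pos] = tuple(open_circles)
    return paths


def relationship(path_a, path_b):
    """Classify two components by the circles they belong to."""
    if path_a == path_b:
        return "same circle"
    shorter, longer = sorted((path_a, path_b), key=len)
    if longer[:len(shorter)] == shorter:
        return "ancestor/descendant"
    return "cousin"


paths = circle_paths("(.(...).(...).)")
print(relationship(paths[1], paths[(2, 6)]))  # base and branch pair in one loop
print(relationship(paths[3], paths[9]))       # bases in two sibling hairpins
```

Storing the full path from the exterior makes the classification a prefix test: equal paths mean the same circle, a proper prefix means ancestor and descendant, and anything else means cousins.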

  14. Partial structure induced by a structural component [Figure: a parent structure and the child structure induced by one of its components]

  15. Structural alignment rules (1) • A1 precedes A2 iff B1 precedes B2, where A1, A2, B1, B2 are structural components, A1 is aligned with B1 and A2 is aligned with B2.
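The precedence rule can be checked on a set of matched components with a short sketch. It assumes, for illustration only, that each component is identified by the position of its 5' base, that a match is a tuple `(a, b)` of such positions in the first and second RNA, and that no component is matched twice; `preserves_order` is a hypothetical name.

```python
def preserves_order(matches):
    """Return True if A1 precedes A2 exactly when B1 precedes B2.

    `matches` is a list of (a, b) tuples, where a and b are the 5' positions
    of aligned components in the first and second RNA (assumed encoding).
    """
    ordered = sorted(matches)  # sort by position in the first RNA
    # With the first coordinates increasing, the rule holds for every pair of
    # matches exactly when the second coordinates are strictly increasing.
    return all(b1 < b2 for (_, b1), (_, b2) in zip(ordered, ordered[1:]))


print(preserves_order([(0, 1), (2, 3), (5, 4)]))  # order kept in both RNAs
print(preserves_order([(0, 3), (2, 1)]))          # order reversed in the second RNA
```

Comparing only neighbouring matches after sorting is sufficient, since a strictly increasing sequence is increasing for every pair of its elements.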

  16. Structural alignment rules (2) (a) Same loop relationship preserved: A1 is in the same loop as A2 iff B1 is in the same loop as B2 (b) Ancestor/descendant relationship preserved: A1 is ancestor of A2 iff B1 is ancestor of B2 (c) Cousin relationship preserved: A1 is cousin of A2 iff B1 is cousin of B2 [Figure: panels (a), (b) and (c), each showing RNA 1 and RNA 2]

  17. Example alignment • All structural alignment rules must be satisfied for a valid alignment • In addition, a single base cannot be aligned with a base pair [Figure: first RNA and second RNA]
      First RNA:
      ..((...(((......)))((.(.....))).))..
      GUACGCAGUAAGUCGAUACGCCGUAUUUCGCGGUAA
      Second RNA:
      ..((..((......))(((.......))).))..
      GUUCGAUUUCUCUAAAGAGUAGCUUUCUCGGAAA
      Alignment result:
      ..((...(((......)))((.(..  ...))).))..
      GUACGCAGUAAGUCGAUACGCCGUA--UUUCGCGGUAA
      || ||   |   || | | |  |||  |||| ||| ||
      GUUCGA-UU-UCUCUA-AAGA-GUAGCUUUCUCGGAAA
      ..((.. (( ...... ))(( (.......))).))..

  18. Dynamic programming algorithm: overview [Figure: first structure, second structure and the DP scoring table indexed by their structural components; annotated cell: "The best alignment between partial structures of U and A-U"]

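  19. Case 1 [Figure: two structures, each drawn 5' to 3']

  20. Case 2 [Figure: two structures, each drawn 5' to 3']

  21. Case 3 [Figure: two structures, each drawn 5' to 3']

  22. Case 4.1 [Figure: two structures, each drawn 5' to 3']

  23. Case 4.2 [Figure: two structures, each drawn 5' to 3']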

  24. Example of matching score function • Score function of matching two equal-length structural components C_a and C_b:
      $$ g(C_a, C_b) = \begin{cases} 1, & \text{if both } C_a \text{ and } C_b \text{ are single bases and } C_a = C_b \\ 2, & \text{if both } C_a \text{ and } C_b \text{ are base pairs and } C_a = C_b \\ 0, & \text{otherwise} \end{cases} $$
      • Gap penalty equals 0 • Extending g to the whole set of matched component pairs, our goal is to maximize f(R_1, R_2):
      $$ f(R_1, R_2) = \sum_i g(C_{a_i}, C_{b_i}) $$
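The example score function translates directly into code. In this minimal sketch the encoding is an assumption: a single base is a one-letter string, a base pair is a tuple of two letters, and `None` stands for a gap. The names `g` and `f` follow the slide.

```python
def g(a, b):
    """Score of matching components a and b (example function from the slide).

    Assumed encoding: a single base is a string such as "A", a base pair is a
    tuple such as ("A", "U"), and None stands for a gap.
    """
    if a is None or b is None:
        return 0                      # gap penalty equals 0
    if a != b:
        return 0                      # mismatch, or single base vs. base pair
    return 2 if isinstance(a, tuple) else 1


def f(matched):
    """Total score of an alignment given as a list of (a, b) component pairs."""
    return sum(g(a, b) for a, b in matched)


alignment = [("A", "A"), (("A", "U"), ("A", "U")), ("C", "G"), ("U", None)]
print(f(alignment))  # 1 + 2 + 0 + 0 = 3
```

Because the gap penalty is 0 and mismatches also score 0, maximizing f amounts to maximizing the number of identical matched components, with base pairs counted twice.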

  25. Cell type 1: single base vs. single base [Figure: panels (A), (B) and (C)] First structure: AUACAUGUUC with ..(.....). Second structure: UCAUACAGGUUA with ....(.....). Alignments shown in the panels: --AUACAUGUU-C over UCAUACAGGUUA-; --AUACAUGUUC over UCAUACAGGUUA; --AUACAUGUUC- over UCAUACAGGUU-A

  26. Cell type 2: base pair vs. single base [Figure: the cell is computed from two candidates, labeled "first score" and "second score"]

  27. Cell type 2: base pair vs. single base (first score) [Figure: two panels] Structures: UCAUACAGGUUA with ....(.....). and ACAUGUU with (.....) Alignments shown in the panels: A-----CAUGU--U over -UCAUACAGGUUA; ----ACAUGUU- over UCAUACAGGUUA

  28. Cell type 2: base pair vs. single base (second score) [Figure: panels (A), (B) and (C)] Structures: UCAUACAGGUUA with ....(.....). and AUACAUGUU with ..(.....) Alignments shown in the panels: AU----ACAUGUU- over --UCAUACAGGUUA; --AU--------ACAUGUU over UCAUACAGGUUA-------; --AUACAUGUU- over UCAUACAGGUUA
