Maximal Exact Matchings
• similar to (local) alignment, but identifies only parts that are exactly identical (no gaps)
• exact matches must be connected at the sequence or structure level
• faster than structure alignment: O(n^2) time and space
S.Will, 18.417, Fall 2011
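To make the idea of ungapped exact matches concrete, here is a minimal sketch restricted to the sequence level. It is an illustration under assumptions, not the method from the slides: real exact matchings may also be connected through base pairs, which this sketch ignores. The function name `maximal_exact_matches` and the parameter `min_size` are hypothetical. The table it fills takes O(n·m) time and space for sequences of length n and m.

```python
def maximal_exact_matches(a, b, min_size=2):
    """Return (i, j, length) for each maximal ungapped exact match
    a[i:i+length] == b[j:j+length] with length >= min_size.

    Sequence level only (assumption): structure connectivity is ignored.
    """
    n, m = len(a), len(b)
    # ext[i][j] = length of the longest common prefix of a[i:] and b[j:]
    ext = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                ext[i][j] = ext[i + 1][j + 1] + 1

    matches = []
    for i in range(n):
        for j in range(m):
            length = ext[i][j]
            # Left-maximal: the match cannot be extended one position to the left.
            left_maximal = i == 0 or j == 0 or a[i - 1] != b[j - 1]
            # Right-maximality holds by construction of ext.
            if length >= min_size and left_maximal:
                matches.append((i, j, length))
    return matches


if __name__ == "__main__":
    print(maximal_exact_matches("GCAUG", "ACAUC"))
```

On the two toy sequences, the only match of size at least 2 is the shared block "CAU" starting at position 1 in both.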
Maximal Exact Matchings
ExpaRNA: compute alignments fast with the help of exact matchings
• step 1: compute matchings
• step 2: chaining of matchings (select a “chain” of compatible matchings, i.e. no overlap, no crossing)
• step 3: compute the alignment using the chain of matchings as anchor constraints in LocARNA → speed-up over LocARNA
Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen, Lightweight Comparison of RNAs Based on Exact Sequence-Structure Matches, Bioinformatics 2009
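The chaining step (step 2) can be sketched as a small dynamic program. This is a simplified illustration, not the ExpaRNA implementation: it assumes every matching is a contiguous block in both sequences with a given score, and it selects the highest-scoring set of blocks that neither overlap nor cross. The names `Match` and `best_chain` are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    a_start: int  # first matched position in sequence A
    a_end: int    # last matched position in sequence A (inclusive)
    b_start: int  # first matched position in sequence B
    b_end: int    # last matched position in sequence B (inclusive)
    score: float


def best_chain(matches: List[Match]) -> List[Match]:
    """Select a maximum-score chain of compatible matches.

    Two matches are compatible if one ends strictly before the other
    starts in BOTH sequences (no overlap, no crossing).
    Runs in O(k^2) time for k matches.
    """
    if not matches:
        return []
    # A valid predecessor always ends earlier in A, so it sorts first.
    ms = sorted(matches, key=lambda m: (m.a_end, m.b_end))
    best = [m.score for m in ms]   # best chain score ending at each match
    prev = [-1] * len(ms)          # back-pointers for the traceback
    for t in range(len(ms)):
        for s in range(t):
            if ms[s].a_end < ms[t].a_start and ms[s].b_end < ms[t].b_start:
                cand = best[s] + ms[t].score
                if cand > best[t]:
                    best[t], prev[t] = cand, s
    # Trace back from the best final match.
    t = max(range(len(ms)), key=best.__getitem__)
    chain = []
    while t != -1:
        chain.append(ms[t])
        t = prev[t]
    return chain[::-1]


if __name__ == "__main__":
    m1 = Match(0, 4, 0, 4, 5.0)
    m2 = Match(6, 9, 6, 9, 4.0)
    m3 = Match(3, 7, 10, 14, 6.0)   # overlaps m1 and m2 in sequence A
    m4 = Match(10, 12, 1, 2, 3.0)   # crosses m2, overlaps m1 in sequence B
    print(best_chain([m1, m2, m3, m4]))
```

In the toy input, the chain of m1 and m2 (total score 9) beats the single higher-scoring m3, which is incompatible with both. The selected chain would then serve as the anchor constraints mentioned in step 3.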
Bralibase 2.1 Benchmark
[Figure: SPS (y-axis, 0.3 to 1.0) versus sequence identity (x-axis, 20 to 80) for reference, LocARNA, ExpLoc-P (heuristic), ExpLoc-P (suboptimal), ExpLoc (minsize10), ExpLoc (minsize9), and RAF]
• ExpLoc: exact matchings as anchors in LocARNA; speed-up: 4.4, 5.4 times
• ExpLoc-P: exact matchings from structure ensembles as anchors in LocARNA (submitted RECOMB’12; speed-up: 4.9, 6.0)
• RAF: Do et al., Bioinformatics 2008; speed-up 15.9
Whole Genome Realignment for ncRNA Prediction
[Figure: pipeline from a whole genome alignment to ncRNA predictions]
• step 1: slice the whole genome alignment and filter by thermodynamic stability of single RNA structures → stable loci
• step 2: realign based on sequence and structure → realigned loci
• step 3: estimate ncRNA likelihood → predictions (q-values)
[Figure example: one locus from six sequences (DroMel_CAF1, DroSim_CAF1, DroYak_CAF1, DroEre_CAF1, DroPse_CAF1, DroPer_CAF1); the original locus alignment shows an unstable conserved structure, the structure-based realignment a stable conserved structure]
RNA Shapes: Idea
• a more coarse-grained look at RNA structure
• intuition: the general shape of an RNA is often more important for its function than the “details”
• example: cloverleaf structure of tRNAs
Shape can be considered at different levels of abstraction
Robert Giegerich, Björn Voss, Marc Rehmsmeier, Abstract shapes of RNA, Nucleic Acids Research, 2004
RNA Shapes: Different Shape Types
• 5 Most abstract - helix nesting pattern and no unpaired regions
• 4 Helix nesting pattern in internal loops and multiloops
• 3 Nesting pattern for all loop types but no unpaired regions
• 2 Nesting pattern for all loop types and unpaired regions in bulges, internal loops, and multiloops
• 1 Most accurate - all loops and all unpaired
RNAshapes: computes shape probabilities for a sequence (+ Shrep = representative structure for each shape)
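As an illustration of the most abstract level (type 5), the following sketch maps a dot-bracket structure to its helix nesting pattern. It is not the RNAshapes implementation; the function name `shape_level5` is hypothetical, and the rule used is an assumption drawn from the description above: a base pair opens a new bracket only if it lies in the exterior loop or is one of several branches of a multiloop, so stacked pairs and helices interrupted by bulges or internal loops collapse into one bracket pair, and all unpaired positions are dropped.

```python
def shape_level5(dot_bracket: str) -> str:
    """Abstract a dot-bracket structure to a level-5-style shape string.

    Assumed rule: a pair emits "[" ... "]" iff its enclosing loop is the
    exterior loop (parent None) or a multiloop (parent encloses >= 2 pairs).
    """
    # Pass 1: find each pair's parent and count directly enclosed pairs.
    parent = {}
    children = {None: 0}   # None stands for the exterior loop
    stack = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            par = stack[-1] if stack else None
            parent[pos] = par
            children[par] += 1
            children[pos] = 0
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")

    # Pass 2: emit brackets only for pairs that start a new helix.
    out = []
    emits = []   # per open pair: did it emit a bracket?
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            par = parent[pos]
            starts_helix = par is None or children[par] >= 2
            emits.append(starts_helix)
            if starts_helix:
                out.append("[")
        elif ch == ")":
            if emits.pop():
                out.append("]")
    return "".join(out)


if __name__ == "__main__":
    cloverleaf = "((((...(((...)))..(((...)))..(((...)))...))))"
    print(shape_level5(cloverleaf))
```

For the cloverleaf-like toy structure, the result is one outer helix enclosing three hairpin helices. A fully unpaired structure yields an empty string in this sketch.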