
Maximal Exact Matchings



  1. Maximal Exact Matchings
     • similar to (local) alignment, but only identifies parts that are exactly identical (no gaps)
     • exact matches must be connected at sequence or structure level
     • faster than structure alignment: O(n²) time and space
     (S.Will, 18.417, Fall 2011)
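
To make the O(n²) bound concrete, here is a minimal sketch that finds maximal exact matches at the sequence level only. It is a simplification under our own assumptions: it ignores the structural connectivity that exact sequence-structure matchings also require, and the function name and the `min_len` parameter are hypothetical, not part of any tool's interface.

```python
def maximal_exact_matches(a: str, b: str, min_len: int = 1) -> list[tuple[int, int, int]]:
    """Return (i, j, length) for every maximal exact match a[i:i+length] == b[j:j+length].

    Sequence-only sketch: structural connectivity is NOT checked here.
    Uses an (n+1) x (m+1) table, hence O(n*m) time and space.
    """
    n, m = len(a), len(b)
    # ext[i][j] = length of the longest common suffix of a[:i] and b[:j];
    # counting back until a mismatch makes every match left-maximal.
    ext = [[0] * (m + 1) for _ in range(n + 1)]
    matches = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                ext[i][j] = ext[i - 1][j - 1] + 1
                k = ext[i][j]
                # right-maximal: at the end of either sequence, or the next characters differ
                if k >= min_len and (i == n or j == m or a[i] != b[j]):
                    matches.append((i - k, j - k, k))
    return matches
```

For example, `maximal_exact_matches("ACGT", "CGTA")` returns `[(0, 3, 1), (1, 0, 3)]`: the single shared "A" and the shared "CGT".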

  2. Maximal Exact Matchings
     ExpaRNA: compute alignments fast with the help of exact matchings
     • step 1: compute matchings
     • step 2: chaining of matchings (select a “chain” of compatible matchings, i.e. no overlap, no crossing)
     • step 3: compute the alignment using the chain of matchings as anchor constraints in LocARNA
     → speed-up over LocARNA
     Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen: Lightweight Comparison of RNAs Based on Exact Sequence-Structure Matches, Bioinformatics, 2009.
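
Step 2 can be sketched as a classic chaining problem. This minimal version rests on assumptions of our own: each matching is treated as a pair of equal-length sequence intervals weighted by its length, and only strictly sequential chains are allowed (no overlap, no crossing). ExpaRNA's chaining of sequence-structure matchings may differ, and `Match` and `best_chain` are hypothetical names.

```python
from typing import NamedTuple


class Match(NamedTuple):
    i: int       # start position in sequence A
    j: int       # start position in sequence B
    length: int  # match covers A[i:i+length] and B[j:j+length]


def best_chain(matches: list[Match]) -> list[Match]:
    """Heaviest chain of matches that neither overlap nor cross (O(k^2) dynamic program)."""
    if not matches:
        return []
    ms = sorted(matches)  # a compatible predecessor always starts earlier in A
    best = [m.length for m in ms]  # best[k]: weight of the heaviest chain ending with ms[k]
    prev = [-1] * len(ms)          # back-pointer to the previous match in that chain
    for k, cur in enumerate(ms):
        for p in range(k):
            before = ms[p]
            # compatible: 'before' ends before 'cur' starts in BOTH sequences
            if before.i + before.length <= cur.i and before.j + before.length <= cur.j:
                if best[p] + cur.length > best[k]:
                    best[k] = best[p] + cur.length
                    prev[k] = p
    k = max(range(len(ms)), key=best.__getitem__)
    chain = []
    while k != -1:  # follow back-pointers from the best chain end
        chain.append(ms[k])
        k = prev[k]
    return chain[::-1]
```

Two crossing matches cannot both be chosen, so `best_chain([Match(0, 3, 1), Match(1, 0, 3)])` keeps only the heavier `Match(1, 0, 3)`. The selected chain would then play the role of the anchor constraints in step 3.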

  3. Bralibase 2.1 Benchmark
     [Figure: SPS (y-axis) vs. sequence identity (x-axis) for reference, LocARNA, ExpLoc-P (heuristic), ExpLoc-P (suboptimal), ExpLoc (minsize10), ExpLoc (minsize9), and RAF]
     • ExpLoc: exact matchings as anchors in LocARNA; speed-up 4.4 and 5.4 times
     • ExpLoc-P: exact matchings from structure ensembles as anchors in LocARNA (submitted to RECOMB’12; speed-up 4.9 and 6.0)
     • RAF: Do et al., Bioinformatics 2008; speed-up 15.9
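
The plot measures accuracy as SPS, which the slide does not define. Assuming the usual sum-of-pairs definition (the fraction of residue pairs aligned in the reference alignment that the test alignment also aligns), a minimal sketch could look as follows; the scoring tool actually used for the benchmark may differ in details such as gap symbols, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment: list[str], gap: str = "-") -> set[tuple[int, int, int, int]]:
    """Set of (seq_a, pos_a, seq_b, pos_b): residue pos_a of row seq_a shares a column
    with residue pos_b of row seq_b (positions count residues, not columns)."""
    pos = [0] * len(alignment)  # next residue index per row
    pairs = set()
    for col in range(len(alignment[0])):
        here = []  # (row, residue index) of all non-gap entries in this column
        for s, row in enumerate(alignment):
            if row[col] != gap:
                here.append((s, pos[s]))
                pos[s] += 1
        for (s, p), (t, q) in combinations(here, 2):
            pairs.add((s, p, t, q))
    return pairs


def sps(test: list[str], reference: list[str], gap: str = "-") -> float:
    """Fraction of reference residue pairs that the test alignment reproduces.
    Assumes both alignments list the same sequences in the same order."""
    ref = aligned_pairs(reference, gap)
    if not ref:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(ref & aligned_pairs(test, gap)) / len(ref)
```

For instance, with reference `["AC-G", "ACTG"]` and test `["A-CG", "ACTG"]`, two of the three reference pairs are reproduced, so `sps` returns 2/3.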

  4. Whole Genome Realignment for ncRNA Prediction
     [Figure: pipeline starting from a whole genome alignment: (1) slice and filter by thermodynamic stability of single RNA structures, keeping stable loci; (2) realign based on sequence and structure; (3) estimate ncRNA likelihood, giving predictions with q-values. Example panels show the original locus alignment (unstable conserved structure) and the structure-based realignment (stable conserved structure) of CAF1 sequences from DroMel, DroSim, DroYak, DroEre, DroPse, and DroPer.]

  5. RNA Shapes: Idea
     • a more coarse-grained look at RNA structure
     • intuition: the general shape of an RNA is often more important for its function than the “details”
     • example: cloverleaf structure of tRNAs
     Shape can be considered at different levels of abstraction.
     Robert Giegerich, Björn Voss, Marc Rehmsmeier: Abstract shapes of RNA, Nucleic Acids Research, 2004.

  6. RNA Shapes: Different Shape Types
     • 5 (most abstract): helix nesting pattern and no unpaired regions
     • 4: helix nesting pattern in internal loops and multiloops
     • 3: nesting pattern for all loop types but no unpaired regions
     • 2: nesting pattern for all loop types and unpaired regions in bulges, internal loops, and multiloops
     • 1 (most accurate): all loops and all unpaired regions
     RNAshapes: computes shape probabilities for a sequence (+ shrep = representative structure for each shape)
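
The two extreme shape types can be sketched directly on a dot-bracket structure. This is our own minimal reading of the definitions above, not the RNAshapes implementation: helices are written as `[ ]`, each run of unpaired bases as `_`, level 1 collapses only directly stacked base pairs, and level 5 additionally merges helices across bulges and interior loops while dropping unpaired bases. All function names are hypothetical.

```python
def parse_pairs(db: str) -> dict[int, int]:
    """Map each '(' position to its matching ')' position in a dot-bracket string."""
    stack, pairs = [], {}
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
        elif c != ".":
            raise ValueError(f"unexpected character {c!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def loop_items(pairs: dict[int, int], start: int, end: int) -> list:
    """Elements of the loop covering positions start..end-1: an int (the '(' position)
    for each directly enclosed pair, and "_" for each run of unpaired bases."""
    items, k = [], start
    while k < end:
        if k in pairs:
            items.append(k)
            k = pairs[k] + 1  # skip over the whole enclosed substructure
        else:
            if not items or items[-1] != "_":
                items.append("_")
            k += 1
    return items


def shape1(db: str) -> str:
    """Level 1 (most accurate): one [ ] per stack of directly stacked pairs, '_' per unpaired run."""
    pairs = parse_pairs(db)

    def helix(i: int) -> str:
        j = pairs[i]
        while i + 1 in pairs and pairs[i + 1] == j - 1:  # follow stacked pairs inward
            i, j = i + 1, j - 1
        return "[" + render(i + 1, j) + "]"

    def render(start: int, end: int) -> str:
        return "".join(helix(x) if isinstance(x, int) else x for x in loop_items(pairs, start, end))

    return render(0, len(db))


def shape5(db: str) -> str:
    """Level 5 (most abstract): helix nesting only; bulges, interior loops and unpaired bases vanish."""
    pairs = parse_pairs(db)

    def node(i: int) -> str:
        inner = [x for x in loop_items(pairs, i + 1, pairs[i]) if isinstance(x, int)]
        if len(inner) == 1:  # stacking pair, bulge or interior loop: still the same helix
            return node(inner[0])
        # hairpin (no enclosed pair) or multiloop (two or more enclosed pairs)
        return "[" + "".join(node(x) for x in inner) + "]"

    return "".join(node(x) for x in loop_items(pairs, 0, len(db)) if isinstance(x, int))
```

For example, the multiloop structure `((..((..))..((..))..))` gives `[_[_]_[_]_]` at level 1 and `[[][]]` at level 5, while `((((...))..))`, a hairpin helix interrupted by a bulge, gives `[[_]_]` at level 1 but collapses to `[]` at level 5.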
