Segment-based Multiple Sequence Alignment



  1. Segment-based Multiple Sequence Alignment (Knut Reinert, May 2009)
     T. Rausch, A.-K. Emde, D. Weese, A. Döring, C. Notredame and K. Reinert
     Knut Reinert, 25.5.2009 (based on slides from Tobias Rausch)
     Algorithmische Bioinformatik, Monday, May 25

  2. Agenda (Tobias Rausch, September 2008)
     • Alignment Graph
     • Multiple Sequence Alignment Algorithm
     • Implementation and Results

  3. Segment-based Alignment Graph
     • Alignment Matrix
     • Alignment Graph

  4. Applications
     • Protein Alignment

  5. Applications
     • Protein Alignment
     • Genome Comparison

  6. Applications
     • Protein Alignment
     • Genome Comparison
     • Multi-Read Alignment

  7. Methods: Alignment Algorithm

  8. Components of the Algorithm

  9. Components of the Algorithm

  10. Segment-Match Generation
      • Global alignments [NW70, Got82]

  11. Segment-Match Generation
      • Global alignments [NW70, Got82]
      • Local alignments [SW81, WE87]

  12. Segment-Match Generation
      • Global alignments [NW70, Got82]
      • Local alignments [SW81, WE87]
      • Variants: Overlap, Banded

  13. Segment-Match Generation
      • Global alignments [NW70, Got82]
      • Local alignments [SW81, WE87]
      • Variants
        – Overlap alignments
        – Banded alignments
      • Others:
        – Longest common subsequence [JV92]
          S0: XMJYAUZ
          S1: MZJAWXUE

  14. Segment-Match Generation
      • Global alignments [NW70, Got82]
      • Local alignments [SW81, WE87]
      • Variants
        – Overlap alignments
        – Banded alignments
      • Others:
        – Longest common subsequence [JV92]
          S0: XMJYAUZ
          S1: MZJAWXUE

  15. Segment-Match Generation
      • Global alignments [NW70, Got82]
      • Local alignments [SW81, WE87]
      • Variants
        – Overlap alignments
        – Banded alignments
      • Others:
        – Longest common subsequence [JV92]
        – External segment-matches
          • MUMmer, BLAST hits
          • External alignments
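
As a hedged illustration of the longest common subsequence option listed above, the following Python sketch runs the textbook quadratic dynamic program on the example strings S0 and S1 from the slides. The function name `lcs_pairs` is an assumption made for this example and does not come from the slides or from SeqAn; [JV92] describes a more efficient algorithm that is not reproduced here. Each returned position pair can be read as a segment match of length 1.

```python
def lcs_pairs(a: str, b: str):
    """Return matched index pairs (i, j) of one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table and collect the matched positions.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


s0, s1 = "XMJYAUZ", "MZJAWXUE"
pairs = lcs_pairs(s0, s1)
print("".join(s0[i] for i, _ in pairs))  # MJAU
```

The quadratic table keeps the sketch short; for long sequences a sparse method such as the one in [JV92] would be the more natural choice.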

  16. Collect all Segment Matches
      Alignment:
        IPPQFDFRDEYPQC--VKP
        IPEYVD----WRQKGAVTP
      Segment matches:
        IPPQFD / IPEYVD
        YPQC / WRQK
        VKP / VTP
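
The step from a gapped pairwise alignment to its segment matches can be sketched as follows. This is a minimal illustration, not the SeqAn implementation: the function name `segment_matches` and the `(start_a, start_b, length)` tuple layout are assumptions for this example. A segment match is taken here to be a maximal run of columns without a gap in either row, with positions counted in the ungapped sequences.

```python
def segment_matches(row_a: str, row_b: str, gap: str = "-"):
    """Return (start_a, start_b, length) for each maximal ungapped block."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = []
    pos_a = pos_b = 0  # positions in the ungapped sequences
    run = 0            # length of the current ungapped block
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            run += 1
        else:
            if run:
                matches.append((pos_a - run, pos_b - run, run))
            run = 0
        if ca != gap:
            pos_a += 1
        if cb != gap:
            pos_b += 1
    if run:
        matches.append((pos_a - run, pos_b - run, run))
    return matches


rows = ("IPPQFDFRDEYPQC--VKP", "IPEYVD----WRQKGAVTP")
print(segment_matches(*rows))  # [(0, 0, 6), (10, 6, 4), (14, 12, 3)]
```

The three tuples correspond to the matches IPPQFD / IPEYVD, YPQC / WRQK and VKP / VTP shown on the slide.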

  17. Segment Match Refinement

  18. Segment Match Refinement

  19. Segment Match Refinement [HHR02, REW+08]
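
The slides give no detail on the refinement step, so the following is only a simplified sketch of the general idea behind segment match refinement [HHR02]: matches are cut until no two refined segments partially overlap. It assumes ungapped matches and uses a plain fixpoint iteration over cut positions. The function name `refine` and the tuple layout `(seq_a, start_a, seq_b, start_b, length)` are hypothetical, and the algorithms in [HHR02, REW+08] are more efficient.

```python
from collections import defaultdict


def refine(matches):
    """Split ungapped matches so that refined segments never partially overlap."""
    # Every match boundary is an initial cut position on its sequence.
    cuts = defaultdict(set)
    for sa, a, sb, b, n in matches:
        cuts[sa].update((a, a + n))
        cuts[sb].update((b, b + n))

    # Project cuts through the matches until nothing new appears.
    changed = True
    while changed:
        changed = False
        for sa, a, sb, b, n in matches:
            for src, s0, dst, d0 in ((sa, a, sb, b), (sb, b, sa, a)):
                for c in list(cuts[src]):
                    if s0 < c < s0 + n:
                        mapped = d0 + (c - s0)
                        if mapped not in cuts[dst]:
                            cuts[dst].add(mapped)
                            changed = True

    # Split every match at the cuts that fall strictly inside it.
    refined = []
    for sa, a, sb, b, n in matches:
        offsets = sorted({0, n} | {c - a for c in cuts[sa] if a < c < a + n})
        for lo, hi in zip(offsets, offsets[1:]):
            refined.append((sa, a + lo, sb, b + lo, hi - lo))
    return refined


# Two matches that overlap partially on sequence 1.
example = [(0, 0, 1, 0, 6), (1, 3, 2, 0, 5)]
for match in refine(example):
    print(match)
```

In the example, the second match starts in the middle of the first one on sequence 1, so both matches are split and the refined pieces can later serve as vertices and edges of the alignment graph.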

  20. Alignment Graph Construction

  21. Alignment Graph Construction
      • Only a subset of all the edges constitutes a valid alignment

  22. Alignment Graph Construction
      • Only a subset of all the edges constitutes a valid alignment
      • Select the alignment edges (trace edges) of maximum weight = Maximal Trace [SK83, Kec93]

  23. Consistency Extension [NHH00]
        GARFIELDTHELASTFA-TCAT
        GARFIELDTHE----FASTCAT
        --------THE----FA-TCAT

  24. Consistency Extension
        GARFIELDTHELASTFATCAT
        GARFIELDTHEFAS---TCAT
        --------THEFA----TCAT

  25. Consistency Extension
        GARFIELDTHE LASTFATCAT
        GARFIELDTHE FAS---TCAT
        --------THE FA----TCAT

  26. Consistency Extension
        GARFIELDTHELASTFA TCAT
        GARFIELDTHEFAS--- TCAT
        --------THEFA---- TCAT

  27. Consistency Extension
        GARFIELDTHE LASTFA TCAT
        GARFIELDTHE FAS--- TCAT
        --------THE FA---- TCAT

  28. Consistency Extension
      • Increase the weight of clique edges [NHH00]

  29. Consistency Extension
        GARFIELDTHELASTFA-TCAT
        GARFIELDTHE----FASTCAT
        --------THE----FA-TCAT

  30. Consistency Extension
        GARFIELDTHE LASTFA- TCAT
        GARFIELDTHE ----FAS TCAT
        --------THE ----FA- TCAT
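
The idea of increasing the weight of clique edges can be sketched with the triplet rule known from T-Coffee [NHH00]: an edge (u, v) gains min(w(u, c), w(c, v)) for every third vertex c adjacent to both. The code below is a minimal sketch under stated assumptions, not the SeqAn implementation: vertices are hypothetical `(sequence_id, segment_index)` tuples, the names `edge` and `triplet_extension` are made up for this example, and the sketch also creates a new edge when two vertices are connected only through a third one.

```python
from collections import defaultdict


def edge(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u <= v else (v, u)


def triplet_extension(weights):
    """Return edge weights after one round of triplet extension."""
    neighbors = defaultdict(dict)
    for (u, v), w in weights.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    extended = {edge(u, v): w for (u, v), w in weights.items()}
    for c, adj in neighbors.items():
        nodes = sorted(adj)
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                if u[0] == v[0]:
                    continue  # never link two segments of the same sequence
                key = edge(u, v)
                # The path u - c - v supports aligning u with v.
                extended[key] = extended.get(key, 0) + min(adj[u], adj[v])
    return extended


a, b, c = (0, 0), (1, 0), (2, 0)  # one segment from each of three sequences
weights = {(a, b): 10, (a, c): 6, (b, c): 4}
print(triplet_extension(weights))
```

For this triangle the edge (a, b) rises from 10 to 14, while (a, c) and (b, c) both end at 10, so edges supported by a third sequence gain weight relative to unsupported ones.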

  31. Distance Matrix

  32. Guide Tree

  33. Graph-based Progressive Alignment
      • Progressive alignment
        – Aligns strings / profiles of vertices
        – Heaviest common subsequence algorithm [JV92]
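
To make the heaviest common subsequence step more concrete, here is a hedged Python sketch that aligns two strings of vertices by choosing a non-crossing set of weighted edges of maximum total weight. It uses a plain quadratic dynamic program rather than the sparse algorithm of [JV92], and the function name and the `weight` dictionary keyed by `(i, j)` vertex indices are assumptions made for this example.

```python
def heaviest_common_subsequence(n, m, weight):
    """Align vertex strings of length n and m.

    weight maps (i, j) to the positive weight of the edge between vertex i
    of the first string and vertex j of the second string.
    Returns (total weight, list of aligned (i, j) pairs).
    """
    # dp[i][j] = best weight using the first i and the first j vertices
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = weight.get((i - 1, j - 1), 0)
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + w)

    # Trace back to recover the selected edges.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        w = weight.get((i - 1, j - 1), 0)
        if w > 0 and dp[i][j] == dp[i - 1][j - 1] + w:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Edges (1, 2) and (2, 1) cross, so at most one of them can be selected.
edges = {(0, 0): 5, (1, 2): 4, (2, 1): 3, (2, 2): 2}
print(heaviest_common_subsequence(3, 3, edges))  # (9, [(0, 0), (1, 2)])
```

In a progressive alignment, each string would hold the vertices (or already merged profiles of vertices) of one subtree of the guide tree, and the weights would come from the alignment graph.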

  34. Graph-based Progressive Alignment

  35. Configurable

  36. Implementation and Results

  37. Results
      • Deep Alignment
      • Protein Alignment
      • DNA and Genome Alignment
      • Multi-Read Alignment

  38. Results
      • BAliBASE 3.0 (Protein Benchmark)

                    RV11    RV12    RV20    RV30    RV40    RV50    CPU time (s)
        M-Coffee    42.74   85.86   44.78   56.1    55.8    54.69   27,730
        Our Tool    46.89   86.16   46.56   58.9    62.39   58.94   12,455

      • Alignment of 6 adenoviruses (DNA)

                    Avg. Identity   CPU time (s)
        DIALIGN-T   48%             1259
        MAFFT       62%             118
        MUSCLE      38%             673
        Our Tool    65%             328

      • More detailed results are in the paper

  39. Part of SeqAn
      • Extendable
        – Add your own algorithm
      www.seqan.de

  40. Thank You for Your Attention!
      Visit: www.seqan.de/projects/msa.html
      Andreas, Anne-Katrin, Cedric, David, Knut, Tobias

  41. References
      [Got82] O. Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705–708, 1982.
      [HHR02] A. L. Halpern, D. H. Huson, and K. Reinert. Segment match refinement and applications. In WABI ’02: Proceedings of the Second International Workshop on Algorithms in Bioinformatics, pages 126–139, London, UK, 2002. Springer-Verlag.
      [JV92] G. Jacobson and K.-P. Vo. Heaviest increasing/common subsequence problems. In CPM ’92: Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching, pages 52–66, London, UK, 1992. Springer-Verlag.
      [Kec93] J. D. Kececioglu. The maximum weight trace problem in multiple sequence alignment. In CPM ’93: Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, pages 106–119, London, UK, 1993. Springer-Verlag.
      [NHH00] C. Notredame, D. G. Higgins, and J. Heringa. T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302:205–217, 2000.
      [NW70] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443–453, 1970.
      [REW+08] T. Rausch, A.-K. Emde, D. Weese, A. Döring, C. Notredame, and K. Reinert. Segment-based multiple sequence alignment. Bioinformatics, 24(16):i187–192, 2008.
      [SK83] D. Sankoff and J. B. Kruskal. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, MA, 1983.
      [SW81] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195–197, 1981.
      [WE87] M. S. Waterman and M. Eggert. A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. Journal of Molecular Biology, 197(4):723–728, 1987.
