Alignment of Trees and Directed Acyclic Graphs

Gabriel Valiente

Algorithms, Bioinformatics, Complexity and Formal Methods Research Group
Technical University of Catalonia
Computational Biology and Bioinformatics Research Group
Research Institute of Health Science, University of the Balearic Islands
Centre for Genomic Regulation
Barcelona Biomedical Research Park

Ben-Gurion University of the Negev, Israel, April 27, 2009

Gabriel Valiente (UPC) Alignment of Directed Acyclic Graphs BGU 2009 1 / 35
Abstract

It is well known that the string edit distance and the alignment of strings coincide, while the alignment of trees differs from the tree edit distance. In this talk, we recall various constraints on directed acyclic graphs that allow for a unique (up to isomorphism) representation, called the path multiplicity representation, and present a new method for the alignment of trees and directed acyclic graphs that exploits the path multiplicity representation to produce a meaningful optimal alignment in polynomial time.
Plan of the Talk

  String edit distance and alignment
  Tree edit distance and alignment
  DAG representation of phylogenetic networks
  Path multiplicity representation
  DAG alignment
  Tree alignment as DAG alignment
  Tool support
  BioPerl module
  Web interface to the BioPerl module
  Conclusion
String edit distance and alignment

Definition. The edit distance between two strings is the smallest number of insertions, deletions, and substitutions needed to transform one string into the other.

Definition. An alignment of two strings arranges them as the two rows of a matrix, with gaps (dashes) inserted so that some or all of the remaining (aligned) columns contain identical elements and no column is gapped in both strings.

Example (Optimal alignment)

    -GCTTCCGGCTCGTATAATGTGTGG
     |||||*|*||   |||||* |
    TGCTTCTGACT---ATAATA-G---
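The two definitions above can be tied together with a small dynamic-programming sketch. It assumes unit costs for insertion, deletion, and substitution (the slide does not state the scoring used for its example), and the function name `edit_alignment` is purely illustrative. The traceback returns one optimal alignment, so the number of columns that differ equals the edit distance.

```python
def edit_alignment(s, t):
    """Return (distance, row_s, row_t) under unit edit costs (an assumption)."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between the prefixes s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (s[i - 1] != t[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_s, row_t = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            row_s.append(s[i - 1])   # match or substitution column
            row_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            row_s.append(s[i - 1])   # gap in the second row
            row_t.append("-")
            i -= 1
        else:
            row_s.append("-")        # gap in the first row
            row_t.append(t[j - 1])
            j -= 1
    return d[m][n], "".join(reversed(row_s)), "".join(reversed(row_t))


if __name__ == "__main__":
    # The two strings of the slide example, with gaps removed.
    dist, top, bottom = edit_alignment("GCTTCCGGCTCGTATAATGTGTGG", "TGCTTCTGACTATAATAG")
    print(dist)
    print(top)
    print(bottom)
```

Because several optimal alignments may exist, the rows printed here need not match the slide's alignment column for column.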
Tree edit distance and alignment

Definition. The edit distance between two trees is the smallest number of insertions, deletions, and substitutions needed to transform one tree into the other.

Example (Edit distance)

[Figure: trees with node labels a, b, c, d, e, f]
Tree edit distance and alignment

Definition. An alignment of two trees is an arrangement of the trees in which nodes labeled with spaces are inserted into each tree so that the resulting structures coincide.

Example (Optimal alignment)

[Figure: aligned trees with node labels a, b, c, d, e, f]
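One way to read this definition is that an alignment is a single tree whose nodes carry a pair of labels, one per input tree, where either label may be a space. Removing the space-labeled nodes on one side, and promoting their children, should give back the corresponding input tree. The sketch below illustrates that reading only; the tuple encoding, the names `project` and `alignment_cost`, the costs (0 for identical labels, 1 for a space, 2 otherwise), and the small example tree are all assumptions made for illustration, not taken from the talk's figure.

```python
GAP = None  # stands for the space label

def project(node, side):
    """Remove space-labeled nodes on one side (0 or 1), promoting their children.

    A node is ((label_1, label_2), [children]). The result is a list of
    (label, [children]) subtrees, since a removed node yields its children.
    """
    labels, children = node
    kids = [sub for child in children for sub in project(child, side)]
    if labels[side] is GAP:
        return kids
    return [(labels[side], kids)]

def alignment_cost(node):
    """Sum of column costs: 0 if labels agree, 1 if one is a space, 2 otherwise."""
    (x, y), children = node
    if x == y:
        here = 0
    elif x is GAP or y is GAP:
        here = 1
    else:
        here = 2
    return here + sum(alignment_cost(child) for child in children)

# Hypothetical alignment of the trees a(b, c) and a(e(b), c).
alignment = (("a", "a"), [
    ((GAP, "e"), [(("b", "b"), [])]),
    (("c", "c"), []),
])

if __name__ == "__main__":
    print(project(alignment, 0))
    print(project(alignment, 1))
    print(alignment_cost(alignment))
```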
Tree edit distance and alignment

Remark. An alignment of trees corresponds to a restricted form of tree edit in which all the insertions precede all the deletions.

Remark. With insertion cost 1, deletion cost 1, identical substitution cost 0, and non-identical substitution cost 2, an optimal tree edit yields a largest common subtree and an optimal alignment yields a smallest common supertree.

T. Jiang, L. Wang, and K. Zhang. Alignment of trees – an alternative to tree edit. Theoretical Computer Science, 143(1):137–148, 1995.
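The second remark can be checked with a short counting argument. The sketch below assumes that $|T|$ denotes the number of nodes of a tree $T$ and that every non-identical substitution (cost 2) has been replaced by a deletion plus an insertion, which has the same total cost. The symbols $k$ and $m$ are introduced here for illustration only.

```latex
% Assumptions: |T| is the node count; no cost-2 substitutions remain.
% k = number of nodes kept by an edit script (the common subtree),
% m = number of nodes of an alignment tree (the common supertree).
\begin{align*}
\text{edit script keeping } k \text{ nodes:}\quad
  & \mathrm{cost} = (|T_1| - k) + (|T_2| - k) = |T_1| + |T_2| - 2k, \\
\text{alignment with } m \text{ nodes:}\quad
  & \mathrm{cost} = (m - |T_1|) + (m - |T_2|) = 2m - |T_1| - |T_2|.
\end{align*}
```

Under these assumptions, minimizing the edit cost amounts to maximizing $k$, and minimizing the alignment cost amounts to minimizing $m$.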
Tree edit distance and alignment

H. Bunke, X. Jiang, and A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1):13–25, 2000.

M.-L. Fernández and G. Valiente. A graph distance measure combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22(6–7):753–758, 2001.

Theorem. The problems of finding a largest common subtree and a smallest common supertree of two trees, in each case together with a pair of witness (minor, topological, homeomorphic, or isomorphic) embeddings, are reducible to each other in time linear in the size of the trees.

F. Rosselló and G. Valiente. An algebraic view of the relation between largest common subtrees and smallest common supertrees. Theoretical Computer Science, 362(1–3):33–53, 2006.
Tree edit distance and alignment

Example

A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment and planar tanglegram layout. In Proc. 7th Workshop on Algorithms in Bioinformatics, volume 4645 of Lecture Notes in Bioinformatics, pages 98–110. Springer, 2007.

A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(4):503–513, 2008.
DAG representation of phylogenetic networks

D. H. Huson and D. Bryant. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23(2):254–267, 2006.
DAG representation of phylogenetic networks

Definition. A phylogenetic network is a directed acyclic graph whose terminal nodes are labeled by taxa names and whose internal nodes are either tree nodes (if they have only one parent) or hybrid nodes (if they have two or more parents).
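As a minimal sketch of this definition, the function below classifies the nodes of a network given as a list of directed (parent, child) edges. The edge-list encoding, the function name `classify_nodes`, the separate "root" category, and the small example network are assumptions for illustration. Acyclicity is assumed rather than checked.

```python
def classify_nodes(edges):
    """Map each node to 'root', 'leaf', 'tree', or 'hybrid'.

    edges is an iterable of (parent, child) pairs of a DAG (assumed acyclic).
    """
    parents, children, nodes = {}, {}, set()
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)
        nodes.update((u, v))

    kinds = {}
    for v in nodes:
        if not children.get(v):
            kinds[v] = "leaf"        # terminal node, labeled by a taxon
        elif not parents.get(v):
            kinds[v] = "root"        # internal node without parents
        elif len(parents[v]) == 1:
            kinds[v] = "tree"        # internal node with exactly one parent
        else:
            kinds[v] = "hybrid"      # internal node with two or more parents
    return kinds

# Hypothetical network: h is a hybrid of x and y; A, B, C are taxa.
example_edges = [("r", "x"), ("r", "y"), ("x", "A"), ("x", "h"),
                 ("y", "h"), ("y", "C"), ("h", "B")]

if __name__ == "__main__":
    for node, kind in sorted(classify_nodes(example_edges).items()):
        print(node, kind)
```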
DAG representation of phylogenetic networks

Example. 44 polymorphic sites in a sample of 11 cloned copies of the single gene encoding alcohol dehydrogenase, from 5 natural populations of D. melanogaster:

    CCGCAATAATGGCGCTACTCTCACAATAACCCACTAGACAGCCT  Wa-S
    CCCCAATATGGGCGCTACTTTCACAATAACCCACTAGACAGCCT  Fl-1S
    CCGCAATATGGGCGCTACCCCCCGGAATCTCCACTAAACAGTCA  Af-S
    CCGCAATATGGGCGCTGTCCCCCGGAATCTCCACTAAACTACCT  Fr-S
    CCGAGATAAGTCCGAGGTCCCCCGGAATCTCCACTAGCCAGCCT  Fl-2S
    CCCCAATATGGGCGCGACCCCCCGGAATCTCTATTCACCAGCTT  Ja-S
    CCCCAATATGGGCGCGACCCCCCGGAATCTGTCTCCGCCAGCCT  Fl-F
    TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT  Fr-F
    TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT  Wa-F
    TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT  Af-F
    TGCAGGGGAGGGCTCGACCCCACGGGATCTGTCTCCGCCAGCCT  Ja-F

M. Kreitman. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature, 304(5925):412–417, 1983.
DAG representation of phylogenetic networks

Example

[Figure: network with leaves Ja-F, Af-F, Fr-F, Wa-F, Fl-2S, Wa-S, Af-S, Fr-S, Fl-1S, Ja-S, Fl-F]

Example

[Figure: network with leaves Fl-F, Ja-F, Fr-F, Wa-F, Af-F, Ja-S, Fl-2S, Fr-S, Wa-S, Af-S, Fl-1S]
DAG representation of phylogenetic networks

Definition. A phylogenetic network is called tree-sibling if every hybrid node has at least one sibling that is a tree node.

Remark. The biological meaning of the tree-sibling condition is that, in each recombination or hybridization process, at least one of the species involved also has some descendant through mutation.
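The tree-sibling condition can be tested directly from the edge list. The sketch below makes two assumptions that the slide does not spell out: two nodes are siblings when they share at least one parent, and any node with exactly one parent (leaves included) counts as a tree node for this check. The function name `is_tree_sibling` and both example networks are hypothetical.

```python
def is_tree_sibling(edges):
    """Check that every hybrid node has a sibling with exactly one parent.

    edges is an iterable of (parent, child) pairs of a DAG (assumed acyclic).
    """
    parents, children = {}, {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)

    for node, node_parents in parents.items():
        if len(node_parents) < 2:
            continue  # not a hybrid node
        # Siblings: all other children of any parent of the hybrid node.
        siblings = set().union(*(children[p] for p in node_parents)) - {node}
        if not any(len(parents[s]) == 1 for s in siblings):
            return False
    return True

# Hypothetical network where hybrid h has tree-node siblings A and C.
good = [("r", "x"), ("r", "y"), ("x", "A"), ("x", "h"),
        ("y", "h"), ("y", "C"), ("h", "B")]

# Hypothetical network where hybrids h1 and h2 are each other's only sibling.
bad = [("r", "x"), ("r", "y"), ("x", "h1"), ("x", "h2"),
       ("y", "h1"), ("y", "h2"), ("h1", "A"), ("h2", "B")]

if __name__ == "__main__":
    print(is_tree_sibling(good))
    print(is_tree_sibling(bad))
```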