The worst case complexity of Maximum Parsimony


  1. The worst case complexity of Maximum Parsimony
     ◮ Amir Carmel
     ◮ Noa Musa-Lempel
     ◮ Dekel Tsur
     ◮ Michal Ziv-Ukelson
     Ben-Gurion University, June 12, 2014

  2. What’s a phylogeny
     Phylogenies:
     ◮ Graph-like structures whose topology describes the inferred evolutionary history among a set of species.
     ◮ Modeled as either rooted or unrooted labeled binary trees, where the input entities are assigned to the leaf vertices.

  3. Character based methods for phylogenetic reconstruction
     ◮ Each species is characterized by a sequence of letters.
     ◮ We are given a substitution scoring matrix over the letters.
     ◮ Position independence is assumed.
     1: A C A G G T T
     2: C C A G A T T
     3: C C G G G T A
     4: T G A G G T A
     5: T G A G G T T

  4. Rooted/unrooted phylogeny
     ◮ The decision whether to model phylogenies as rooted or unrooted depends on the substitution scoring matrix.
     ◮ Modeling phylogenies as unrooted trees requires the assumption of symmetric scoring matrices.
     ◮ Today, many applications apply asymmetric scoring matrices.

  5. Parsimony Maximization
     ◮ A classical approach for phylogenetic reconstruction.
     ◮ The Parsimony Maximization approach seeks the phylogenetic tree that requires the least evolutionary change to explain the observed data.
     ◮ Two classical problems are derived from phylogenetic parsimony maximization: Small Parsimony (SP) and Maximum Parsimony (MP).

  6. Small Parsimony Problem (SP)
     Input: multiple alignment, tree topology on n leaves.
     1: A C A G G T T
     2: C C A G A T T
     3: C C G G G T A
     4: T G A G G T A
     5: T G A G G T T
     [Figure: the input tree topology on leaves 1-5.]
     Goal: Assignment to internal vertices that minimizes the scoring function.
     [Figure: two assignments of C/G to the internal vertices of the topology, with Score = 1 and Score = 2.]

  7. Small Parsimony Problem (SP), continued
     We note that known algorithms for Small Parsimony traverse the tree in a bottom-up manner.
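The bottom-up traversal can be illustrated with a minimal sketch of Fitch's algorithm under unit (Hamming) costs. The nested-tuple tree encoding, the function names, and the topology `TREE` are assumptions made for this illustration; the topology drawn on the slide is not recoverable from the transcript. The alignment is the one shown on the slides.

```python
# Alignment from the slides; keys are the leaf labels.
ALIGNMENT = {
    "1": "ACAGGTT",
    "2": "CCAGATT",
    "3": "CCGGGTA",
    "4": "TGAGGTA",
    "5": "TGAGGTT",
}

# Hypothetical rooted topology: a leaf is a label, an internal vertex is a pair.
TREE = ((("1", "2"), "3"), ("4", "5"))


def fitch(tree, column):
    """Bottom-up pass for one alignment column.

    Returns (candidate state set at this vertex, number of changes below it).
    """
    if not isinstance(tree, tuple):          # leaf: its observed letter
        return {column[tree]}, 0
    left_set, left_score = fitch(tree[0], column)
    right_set, right_score = fitch(tree[1], column)
    common = left_set & right_set
    if common:                               # children agree: no extra change
        return common, left_score + right_score
    return left_set | right_set, left_score + right_score + 1


def small_parsimony_score(tree, alignment):
    """Sum the per-column scores (position independence)."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {leaf: seq[i] for leaf, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


def count_internal(tree):
    """Number of internal vertices, i.e. assignment operations per tree."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_internal(tree[0]) + count_internal(tree[1])


print(small_parsimony_score(TREE, ALIGNMENT), count_internal(TREE))
```

Each internal vertex is visited once after both of its children, so a tree on n leaves costs n − 1 assignment operations.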

  8. Maximum Parsimony Problem (MP)
     Input: multiple alignment
     1: A C A G G T T
     2: C C A G A T T
     3: C C G G G T A
     4: T G A G G T A
     5: T G A G G T T
     Goal: topology and assignments to internal vertices that minimize the SP score.
     [Figure: three candidate topologies on leaves 1-5 with C/G assignments to the internal vertices; Score = 1, Score = 2, Score = 2.]

  9. Maximum Parsimony Problem (MP), continued
     The Maximum Parsimony (MP) problem is NP-hard [L. R. Foulds and R. L. Graham (1982)].

  10. Measuring SP and MP complexity in terms of assignment operations
      ◮ Assignment operation: the time to compute the assignment for a single vertex.
      ◮ This depends on the scoring scheme employed, for example: Fitch’s algorithm (Hamming distance) $O(m)$, Sankoff’s algorithm (weighted edit distance) $O(m\Sigma^2)$.
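To make the cost of one assignment operation concrete, here is a minimal sketch of a Sankoff-style update for a single vertex and a single alignment column. The function names, the cost-vector encoding, and the unit substitution cost are assumptions for illustration, and a plain per-letter substitution cost stands in for the weighted scheme named on the slide. The two nested loops over the alphabet give the quadratic factor per column; repeating the update for each of the m columns gives the bound quoted above.

```python
import math

ALPHABET = "ACGT"


def leaf_cost(observed, alphabet=ALPHABET):
    """Cost vector of a leaf: 0 for the observed letter, infinity otherwise."""
    return {a: 0 if a == observed else math.inf for a in alphabet}


def sankoff_assign(left_cost, right_cost, sub_cost, alphabet=ALPHABET):
    """One assignment operation for one column.

    For every candidate letter a at the vertex, add the cheapest way to
    reach each child: min over b of sub_cost(a, b) + child_cost[b].
    """
    def best(child_cost, a):
        return min(sub_cost(a, b) + child_cost[b] for b in alphabet)

    return {a: best(left_cost, a) + best(right_cost, a) for a in alphabet}


def unit_cost(a, b):
    """Hypothetical scoring matrix: 0 on the diagonal, 1 elsewhere."""
    return 0 if a == b else 1


parent = sankoff_assign(leaf_cost("A"), leaf_cost("C"), unit_cost)
print(parent)
```

Because `sub_cost` is an arbitrary function of the ordered pair (a, b), the same update also accepts an asymmetric scoring matrix.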

  11. Our contribution
      Previous results:
      ◮ Cavalli-Sforza and Edwards (1967): $(n-1) \cdot (2n-3)!!$ assignment operations.
      ◮ Hendy and Penny (1982): branch-and-bound algorithm for MP.
      Where $(2n-3)!! = 1 \times 3 \times 5 \times \ldots \times (2n-3)$.

  12. Our contribution
      New results:
      ◮ Hendy and Penny (1982): worst case running time of $\Theta(\sqrt{n} \cdot (2n-3)!!)$ assignment operations.
      ◮ A new, faster algorithm which executes $\Theta((2n-3)!!)$ assignment operations.
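The three bounds differ only in the factor in front of (2n − 3)!!. The sketch below evaluates the growth terms for a few values of n; the function and key names are hypothetical, and because the Θ bounds hide constant factors, the last two values are orders of growth rather than exact operation counts.

```python
import math


def odd_double_factorial(n):
    """Return (2n-3)!! = 1 * 3 * 5 * ... * (2n-3) for n >= 2."""
    product = 1
    for k in range(1, 2 * n - 2, 2):
        product *= k
    return product


def growth_terms(n):
    """Growth terms of the three bounds for n leaves."""
    trees = odd_double_factorial(n)
    return {
        "cavalli_sforza_edwards": (n - 1) * trees,   # exact count from the slides
        "hendy_penny_order": math.sqrt(n) * trees,   # Theta term, constants omitted
        "new_algorithm_order": trees,                # Theta term, constants omitted
    }


for n in (5, 10, 15):
    print(n, growth_terms(n))
```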

  13. The algorithm of Cavalli-Sforza and Edwards

  14. The algorithm of Cavalli-Sforza and Edwards
      ◮ Cavalli-Sforza and Edwards showed that the number of rooted phylogenies with n leaves is $(2n-3)!!$.

  15. The algorithm of Cavalli-Sforza and Edwards
      ◮ The algorithm enumerates all phylogenies with n leaves, and then solves the Small Parsimony (SP) problem on each tree.

  16. The algorithm of Cavalli-Sforza and Edwards
      ◮ Each phylogeny has exactly $n-1$ internal vertices, so solving SP on every one of the $(2n-3)!!$ phylogenies takes $(n-1) \cdot (2n-3)!!$ assignment operations.
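The exhaustive strategy can be sketched as follows, assuming unit (Hamming) costs and a nested-tuple tree encoding. The enumeration by successive leaf insertion and all function names are choices made for this illustration, not necessarily the procedure used in the original paper. Attaching a new leaf above any of the 2k − 1 vertices of a k-leaf tree yields each (k + 1)-leaf tree exactly once, which reproduces the (2n − 3)!! count.

```python
ALIGNMENT = {
    "1": "ACAGGTT",
    "2": "CCAGATT",
    "3": "CCGGGTA",
    "4": "TGAGGTA",
    "5": "TGAGGTT",
}


def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` above one vertex of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def all_rooted_trees(leaves):
    """Enumerate all rooted binary trees on the given leaf labels."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [new for tree in trees for new in insert_leaf(tree, leaf)]
    return trees


def fitch(tree, column):
    """Return (state set, number of changes) for one column, bottom-up."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_set, left_score = fitch(tree[0], column)
    right_set, right_score = fitch(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_score + right_score
    return left_set | right_set, left_score + right_score + 1


def parsimony(tree, alignment):
    """Small Parsimony score of one tree, summed over columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(length)
    )


def maximum_parsimony(alignment):
    """Score every tree and return (best score, one best tree)."""
    trees = all_rooted_trees(sorted(alignment))
    scored = ((parsimony(tree, alignment), tree) for tree in trees)
    return min(scored, key=lambda pair: pair[0])


print(len(all_rooted_trees(sorted(ALIGNMENT))))
print(maximum_parsimony(ALIGNMENT))
```

Every tree is scored from scratch here, which is where the factor n − 1 per tree comes from.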

  17. The algorithm of Hendy and Penny
      Preliminaries: [Figure: phylogenies built from leaves 1, 2, 3.]

  18. The algorithm of Hendy and Penny
      Enumeration space: [Figure, built up over slides 18-20: the search space of phylogenies on leaves 1, 2, 3, with leaf 4 appearing in the final frame.]

  21. The algorithm of Hendy and Penny
      Assignment operations: [Figure, built up over slides 21-34: the search space extended to phylogenies on leaves 1-4.]

  33. The algorithm of Hendy and Penny
      The search space tree is developed in top-down order, while the recalculation of assignments is done in bottom-up order.

  34. The algorithm of Hendy and Penny
      The complexity of the algorithm equals the number of assignment operations.

  35. The algorithm of Hendy and Penny
      Their algorithm was originally proposed as a branch-and-bound method, and its worst case running time had not previously been properly analyzed. Using combinatorial methods we obtained an exact bound.

  36. The number of assignment operations
      ◮ Let $\mathrm{NumAnc}(v)$ denote the number of ancestors of $x$ in $F_v$.

  37. The number of assignment operations
      [Figure: an example phylogeny $F_v$ with vertices labeled 1-7 and the leaf $x$.]

  38. The number of assignment operations
      ◮ The number of ancestors of $x$ in $F_v$ is equal to the number of assignment operations executed at node $v$.

  39. The number of assignment operations
      ◮ Let $H_i$ be the sum of $\mathrm{NumAnc}(v)$ for all nodes $v$ in level $i+1$.

  40. The number of assignment operations
      ◮ By definition, $\sum_{v} \mathrm{NumAnc}(v) = \sum_{i=1}^{n-1} H_i$.
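The level sums can be computed by brute force for small n. The sketch below rests on assumptions that the transcript does not state explicitly: level i + 1 of the search space holds the phylogenies on leaves 1, ..., i + 1, each obtained from its parent by inserting the newest leaf, and x is that newest leaf. Under these assumptions the number of ancestors of x is one more than the depth of the vertex above which it was attached. All names are hypothetical.

```python
def insertions(tree, leaf, depth=0):
    """Yield (new_tree, ancestors_of_leaf) for every insertion position.

    `depth` is the number of ancestors of the vertex `tree` in the full
    phylogeny; the new leaf gains one more ancestor, its new parent.
    """
    yield (tree, leaf), depth + 1
    if isinstance(tree, tuple):
        left, right = tree
        for new_left, anc in insertions(left, leaf, depth + 1):
            yield (new_left, right), anc
        for new_right, anc in insertions(right, leaf, depth + 1):
            yield (left, new_right), anc


def level_sums(n):
    """Return [H_1, ..., H_{n-1}] under the assumptions stated above."""
    level = [1]                      # level 1: the single-leaf phylogeny
    sums = []
    for leaf in range(2, n + 1):
        next_level, h = [], 0
        for tree in level:
            for new_tree, anc in insertions(tree, leaf):
                next_level.append(new_tree)
                h += anc             # NumAnc(v) for the new search node v
        sums.append(h)
        level = next_level
    return sums


for n in range(2, 8):
    print(n, sum(level_sums(n)))
```

Dividing the printed totals by (2n − 3)!! gives a rough empirical check against the $\Theta(\sqrt{n} \cdot (2n-3)!!)$ bound for small n.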
