
An Approximate Approach for Solving the Balanced Minimum Evolution - PowerPoint PPT Presentation



  1. An Approximate Approach for Solving the Balanced Minimum Evolution Problem A. Aringhieri * , C. Braghin * and D. Catanzaro † *Dipartimento di Tecnologie dell’Informazione - University of Milan - Italy † Service Graphes et Optimisation Mathématique (G.O.M.) - Université Libre de Bruxelles - Belgium 1

  2. From phylogenetics to molecular phylogenetics

  3. From phylogenetics to molecular phylogenetics

  4. HIV-1 phylogeny

  5. HIV-1 phylogeny

  6. Applications: medical research, epidemiology, population dynamics, drug discovery.

  7. Phylogenetics

  8. Phylogenies
     Species (label): molecular sequence
     Macaca (A): AAGCTTCATAGGAGCAACCATTCTAATAATCGCACATGGCCTTACATCATCC
     Homo sapiens (B): AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACATCCTCA
     Pan (C): AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACATCCTCA
     Gorilla (D): AAGCTTCACCGGCGCAGTTGTTCTTATAATTGCCCACGGACTTACATCATCA
     Pongo (E): AAGCTTCACCGGCGCAACCACCCTCATGATTGCCCATGGACTCACATCCTCC
     (Figure: a phylogeny of the five species, with leaves A to E, internal vertices 1, 2, 3 and edge weights w_a, w_b, w_c, w_d, w_e, w_1, w_2.)
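Balanced minimum evolution works on a distance matrix, not on the sequences themselves. As a minimal illustration of that first step, assuming the five 52-site sequences above are already aligned, the sketch below computes uncorrected p-distances (the fraction of differing sites). This is a deliberately simpler stand-in for the GTR-based estimation used in the experiments; `p_distance` and `distance_matrix` are hypothetical names.

```python
# Hypothetical helpers: uncorrected p-distances between aligned sequences.
# This is NOT the GTR-based estimation used in the study, only a simple stand-in.
SEQUENCES = {
    "Macaca": "AAGCTTCATAGGAGCAACCATTCTAATAATCGCACATGGCCTTACATCATCC",
    "Homo sapiens": "AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACATCCTCA",
    "Pan": "AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACATCCTCA",
    "Gorilla": "AAGCTTCACCGGCGCAGTTGTTCTTATAATTGCCCACGGACTTACATCATCA",
    "Pongo": "AAGCTTCACCGGCGCAACCACCCTCATGATTGCCCATGGACTCACATCCTCC",
}


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Symmetric matrix (dict of dicts) of pairwise p-distances."""
    names = list(seqs)
    return {i: {j: p_distance(seqs[i], seqs[j]) for j in names} for i in names}


if __name__ == "__main__":
    d = distance_matrix(SEQUENCES)
    for name, row in d.items():
        print(f"{name:>12}", " ".join(f"{v:.3f}" for v in row.values()))
```

For example, Homo sapiens and Pan differ at 4 of the 52 sites, a p-distance of about 0.077.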

  9. Phylogenetic estimation criteria
     (Figure: timeline of phylogenetic estimation criteria, including Parsimony, Minimum Evolution (ME), Maximum Likelihood, Bayesian estimation and model-based criteria, with the years 1957, 1963, 1967, 1969, 1981 and 2001 marked.)

  10. Phylogenetic estimation criteria
     (Figure: the ME family of criteria: Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Generalized Least Squares (GLS) and Balanced Minimum Evolution (BME), with the year 2004 marked.)

  11. The Minimum Evolution (ME) criterion of phylogenetic estimation
     In the absence of convergent or divergent evolution, the evolution of well-conserved molecular sequences can be approximated over time by local minimum paths. The minimum is local rather than global because:
     - the neighborhood of possible alleles that can be selected at each instant of a species' life is finite;
     - the selective pressure may not be constant over time;
     - the population size may vary from species to species (with a different influence on fitness).

  12. Fundamentals of the balanced minimum evolution criterion
     A minimum-length phylogeny provides a lower bound on the overall number of mutation events that occurred during the evolution of the analyzed set of species. The balanced minimum evolution (BME) criterion is a variation of ME in which the length of a phylogeny T with n leaves is computed as
     $$L(T) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2^{1-\tau_{ij}}\, d_{ij},$$
     where $d_{ij}$ is the estimated distance between species i and j, and $\tau_{ij}$ is the number of edges on the path between leaves i and j in T.
     (Figure: a phylogeny with leaves i and j, internal vertices 1, 2, 3 and edge weights w_b, w_1, w_2, w_e.)
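To make the balanced length concrete, here is a minimal Python sketch of Pauplin's formula $L(T)=\sum_{i<j} 2^{1-\tau_{ij}} d_{ij}$. It assumes a phylogeny is stored as an adjacency dict whose leaves are the integers 0..n-1, and that d is a symmetric list-of-lists distance matrix; `leaf_path_lengths` and `bme_length` are hypothetical helper names, not the authors' code.

```python
from collections import deque


def leaf_path_lengths(tree, n):
    """tau[i][j] = number of edges on the path between leaves i and j (BFS per leaf)."""
    tau = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        tau.append([dist[j] for j in range(n)])
    return tau


def bme_length(tree, d):
    """Balanced length: sum over i < j of 2^(1 - tau_ij) * d_ij."""
    n = len(d)
    tau = leaf_path_lengths(tree, n)
    return sum(2.0 ** (1 - tau[i][j]) * d[i][j]
               for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    # Quartet ((0,1),(2,3)): leaves 0..3, internal vertices 4 and 5.
    quartet = {0: {4}, 1: {4}, 2: {5}, 3: {5}, 4: {0, 1, 5}, 5: {2, 3, 4}}
    # Tree-metric distances generated by that quartet with every edge of weight 1.
    d = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
    rival = {0: {4}, 2: {4}, 1: {5}, 3: {5}, 4: {0, 2, 5}, 5: {1, 3, 4}}
    print(bme_length(quartet, d), bme_length(rival, d))
```

With these distances the generating topology scores 5.0, its total edge weight, while the rival topology ((0,2),(1,3)) scores 5.5, so the criterion prefers the true tree in this example.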

  13. Combinatorial interpretation of BME
     The length of a phylogeny under BME is equivalent to the average length of the circular orders associated with that phylogeny, where each circular order (x_1, ..., x_n) contributes, for every i, the sum of the edge weights on the path from leaf x_i to leaf x_{i+1} (with x_{n+1} = x_1).
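One way to make this statement concrete, assuming each step of a circular order is measured by the input distance d between consecutive leaves: enumerate the 2^(n-2) planar embeddings of a binary tree, take the tour length of each resulting circular order, and average. Two leaves whose path has $\tau_{ij}$ edges are adjacent in a fraction $2^{2-\tau_{ij}}$ of these orders, so the average equals twice Pauplin's length $\sum_{i<j} 2^{1-\tau_{ij}} d_{ij}$. The sketch below checks this numerically; trees are adjacency dicts with leaves 0..n-1, and all function names are hypothetical.

```python
from collections import deque
from itertools import product


def circular_orders(tree):
    """Yield the cyclic leaf order of every planar embedding of a binary tree.

    Walking from leaf 0, every degree-3 vertex has two children; choosing which
    child is visited first at each of them gives the 2^(n-2) embeddings.
    """
    internal = sorted(x for x in tree if len(tree[x]) == 3)
    for flips in product((False, True), repeat=len(internal)):
        flip = dict(zip(internal, flips))
        order = [0]

        def walk(x, parent):
            children = sorted(y for y in tree[x] if y != parent)
            if not children:  # x is a leaf
                order.append(x)
                return
            if flip[x]:
                children.reverse()
            for y in children:
                walk(y, x)

        walk(next(iter(tree[0])), 0)
        yield order


def circular_length(order, d):
    """Tour length of a cyclic leaf order, measured with the distance matrix d."""
    n = len(order)
    return sum(d[order[k]][order[(k + 1) % n]] for k in range(n))


def average_circular_length(tree, d):
    lengths = [circular_length(o, d) for o in circular_orders(tree)]
    return sum(lengths) / len(lengths)


def bme_length(tree, d):
    """Pauplin's balanced length: sum over i < j of 2^(1 - tau_ij) * d_ij."""
    n = len(d)
    total = 0.0
    for i in range(n):
        dist, queue = {i: 0}, deque([i])
        while queue:  # BFS: dist[j] is tau_ij for every leaf j
            x = queue.popleft()
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(2.0 ** (1 - dist[j]) * d[i][j] for j in range(i + 1, n))
    return total


if __name__ == "__main__":
    quartet = {0: {4}, 1: {4}, 2: {5}, 3: {5}, 4: {0, 1, 5}, 5: {2, 3, 4}}
    d = [[0, 2, 3, 4], [2, 0, 5, 3], [3, 5, 0, 2], [4, 3, 2, 0]]  # not a tree metric
    print(average_circular_length(quartet, d), 2 * bme_length(quartet, d))
```

With these quartet distances the four circular orders have lengths 13, 10, 10 and 13, so both printed values are 11.5.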

  14. Complexity of BME
     The problem of finding a phylogeny that satisfies the balanced minimum evolution criterion is known as the Balanced Minimum Evolution Problem (BME). It consists of minimizing
     $$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2^{1-\tau_{ij}}\, d_{ij},$$
     where $d_{ij}$ is the distance between species i and j, subject to the constraint that the path lengths $\{\tau_{ij}\}$ form a phylogeny. Some special cases of BME are in P; however, in the most general case the complexity of BME is unknown.
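For very small n the problem can be solved exactly by brute force, which makes both the objective and the "form a phylogeny" constraint explicit: enumerate every unrooted binary tree on n leaves by stepwise leaf insertion and keep the one with the smallest balanced length. This is only an illustrative sketch, assuming trees are adjacency dicts with leaves 0..n-1 and negative internal labels; the function names are hypothetical and the search is exponential, so it is not the authors' approach.

```python
from collections import deque


def all_phylogenies(n):
    """Every unrooted binary tree on leaves 0..n-1 (n >= 3), as adjacency dicts.

    Trees are grown by inserting leaf k on each edge of every tree on leaves
    0..k-1; internal vertices get negative labels.
    """
    trees = [{0: {-1}, 1: {-1}, 2: {-1}, -1: {0, 1, 2}}]
    for leaf in range(3, n):
        grown = []
        for tree in trees:
            for u, v in [(u, v) for u in tree for v in tree[u] if u < v]:
                t = {x: set(nb) for x, nb in tree.items()}
                w = -(leaf - 1)  # new internal vertex subdividing edge (u, v)
                t[u].remove(v)
                t[v].remove(u)
                t[u].add(w)
                t[v].add(w)
                t[w] = {u, v, leaf}
                t[leaf] = {w}
                grown.append(t)
        trees = grown
    return trees


def bme_length(tree, d):
    """Sum over i < j of 2^(1 - tau_ij) * d_ij, with tau_ij from a BFS per leaf."""
    n = len(d)
    total = 0.0
    for i in range(n):
        dist, queue = {i: 0}, deque([i])
        while queue:
            x = queue.popleft()
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(2.0 ** (1 - dist[j]) * d[i][j] for j in range(i + 1, n))
    return total


def exhaustive_bme(d):
    """Exact BME by brute force over all phylogenies: tiny n only."""
    candidates = ((bme_length(t, d), t) for t in all_phylogenies(len(d)))
    return min(candidates, key=lambda pair: pair[0])


if __name__ == "__main__":
    d = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
    value, tree = exhaustive_bme(d)
    print(value, tree)
```

On the four-species tree-metric example the minimum is 5.0, reached by the topology ((0,1),(2,3)) that generated the distances.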

  15. A possible approach to solution

  16. A possible approach to solution
     The total number of possible phylogenies with n leaves is (2n-5)!!. However, the number of non-isomorphic (NI) phylogenies increases much more slowly. Hence, a possible approach consists of enumerating all the NI phylogenies and then proceeding with the leaf assignments.
     Leaves | NI shapes | Shapes
     3 | 1 | 3
     4 | 1 | 15
     5 | 1 | 105
     6 | 2 | 945
     7 | 2 | 10395
     8 | 3 | 135135
     9 | 4 | 2027025
     10 | 11 | 34459425
     15 | 265 | 1.00E+13
     20 | 11020 | 1.00E+22
     30 | 14502229 | 1.00E+39
     40 | 11077270355 | 1.00E+58
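Both quantities can be computed by brute force for small n. The sketch below generates every labeled phylogeny by stepwise leaf insertion (their number should equal (2n-5)!!) and counts the distinct unlabeled shapes among them, identifying a shape by an AHU canonical string rooted at the tree center. Assumptions: trees are adjacency dicts with leaves 0..n-1 and negative internal labels, and all names are hypothetical; this is a checking tool, not the authors' enumeration of non-isomorphic phylogenies.

```python
def all_phylogenies(n):
    """All unrooted binary trees on leaves 0..n-1 (n >= 3), by stepwise insertion."""
    trees = [{0: {-1}, 1: {-1}, 2: {-1}, -1: {0, 1, 2}}]
    for leaf in range(3, n):
        grown = []
        for tree in trees:
            for u, v in [(u, v) for u in tree for v in tree[u] if u < v]:
                t = {x: set(nb) for x, nb in tree.items()}
                w = -(leaf - 1)  # new internal vertex subdividing edge (u, v)
                t[u].remove(v)
                t[v].remove(u)
                t[u].add(w)
                t[v].add(w)
                t[w] = {u, v, leaf}
                t[leaf] = {w}
                grown.append(t)
        trees = grown
    return trees


def shape_key(tree):
    """Label-free canonical string: AHU encoding rooted at the tree center."""
    rest = set(tree)
    while len(rest) > 2:  # peel off leaves layer by layer until the center remains
        rest -= {x for x in rest if sum(y in rest for y in tree[x]) <= 1}

    def enc(x, parent):
        return "(" + "".join(sorted(enc(y, x) for y in tree[x] if y != parent)) + ")"

    c = sorted(rest)
    if len(c) == 1:
        return enc(c[0], None)
    return "".join(sorted((enc(c[0], c[1]), enc(c[1], c[0]))))


def double_factorial(m):
    result = 1
    for k in range(m, 0, -2):
        result *= k
    return result


def count_phylogenies(n):
    """(labeled phylogenies, non-isomorphic shapes) on n leaves, by brute force."""
    trees = all_phylogenies(n)
    return len(trees), len({shape_key(t) for t in trees})


if __name__ == "__main__":
    for n in range(3, 9):
        labeled, shapes = count_phylogenies(n)
        print(n, shapes, labeled, double_factorial(2 * n - 5))
```

The brute force is itself exponential, which is why it only serves to check small cases.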

  17. A possible approach to solution
     In the most general case, this approach is approximate and can be stated as follows:
     - Given a distance matrix, solve the TSP and save the best circular order CO*.
     - Set T* = NULL.
     - For any non-isomorphic phylogeny T, and for any clockwise rotation of CO* on T:
       - assign CO* to the leaves of the phylogeny;
       - run 2-OPT on T;
       - if the length decreases, update T*.
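The procedure above can be sketched as a toy Python program for very small instances. Several pieces are assumptions rather than the authors' implementation: the TSP is solved by brute force over permutations; non-isomorphic phylogenies are obtained by de-duplicating a full enumeration (which defeats the purpose at scale, but keeps the sketch short); CO* is laid around each shape along one planar leaf order; and the 2-OPT step is replaced by a hypothetical pairwise leaf-swap local search applied on every phylogeny. Trees are adjacency dicts with leaves 0..n-1, and all names are hypothetical.

```python
from collections import deque
from itertools import permutations

def all_phylogenies(n):
    """All unrooted binary trees on leaves 0..n-1 (n >= 3) as adjacency dicts,
    grown by inserting each new leaf on every edge; internal vertices are < 0."""
    trees = [{0: {-1}, 1: {-1}, 2: {-1}, -1: {0, 1, 2}}]
    for leaf in range(3, n):
        grown = []
        for tree in trees:
            for u, v in [(u, v) for u in tree for v in tree[u] if u < v]:
                t = {x: set(nb) for x, nb in tree.items()}
                w = -(leaf - 1)  # new internal vertex subdividing edge (u, v)
                t[u].remove(v); t[v].remove(u); t[u].add(w); t[v].add(w)
                t[w], t[leaf] = {u, v, leaf}, {w}
                grown.append(t)
        trees = grown
    return trees

def shape_key(tree):
    """Label-free canonical string: AHU encoding rooted at the tree center."""
    rest = set(tree)
    while len(rest) > 2:  # peel off leaves layer by layer until the center remains
        rest -= {x for x in rest if sum(y in rest for y in tree[x]) <= 1}
    def enc(x, parent):
        return "(" + "".join(sorted(enc(y, x) for y in tree[x] if y != parent)) + ")"
    c = sorted(rest)
    if len(c) == 1:
        return enc(c[0], None)
    return "".join(sorted((enc(c[0], c[1]), enc(c[1], c[0]))))

def leaf_tau(tree, n):
    """tau[p][q] = number of edges between leaf vertices p and q (BFS per leaf)."""
    tau = []
    for s in range(n):
        dist, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        tau.append([dist[q] for q in range(n)])
    return tau

def planar_leaf_order(tree):
    """Leaf vertices in the cyclic order of one planar embedding, from leaf 0."""
    order = [0]
    def walk(x, parent):
        children = sorted(y for y in tree[x] if y != parent)
        if not children:
            order.append(x)
        for y in children:
            walk(y, x)
    walk(next(iter(tree[0])), 0)
    return order

def bme_length(tau, species, d):
    """Balanced length sum_{p<q} 2^(1 - tau_pq) * d, with species[p] on leaf p."""
    n = len(d)
    return sum(2.0 ** (1 - tau[p][q]) * d[species[p]][species[q]]
               for p in range(n) for q in range(p + 1, n))

def swap_search(tau, species, d):
    """Hypothetical 2-OPT stand-in: swap two leaves while the length decreases."""
    best, improved = bme_length(tau, species, d), True
    while improved:
        improved = False
        for p in range(len(d)):
            for q in range(p + 1, len(d)):
                species[p], species[q] = species[q], species[p]
                value = bme_length(tau, species, d)
                if value < best - 1e-12:
                    best, improved = value, True
                else:  # undo a swap that does not shorten the phylogeny
                    species[p], species[q] = species[q], species[p]
    return best, species

def best_circular_order(d):
    """CO*: an optimal TSP tour over the species, by brute force (tiny n only)."""
    n = len(d)
    tours = ([0, *rest] for rest in permutations(range(1, n)))
    return min(tours, key=lambda t: sum(d[t[k]][t[(k + 1) % n]] for k in range(n)))

def approximate_bme(d):
    n = len(d)
    co = best_circular_order(d)
    shapes = {}
    for t in all_phylogenies(n):  # keep one tree per non-isomorphic shape
        shapes.setdefault(shape_key(t), t)
    best_value, best = float("inf"), None  # T* = NULL
    for tree in shapes.values():
        tau, order = leaf_tau(tree, n), planar_leaf_order(tree)
        for r in range(n):  # each rotation of CO* around the leaves of T
            species = [0] * n
            for k, leaf in enumerate(order):
                species[leaf] = co[(k + r) % n]
            value, species = swap_search(tau, species, d)
            if value < best_value:  # the length decreased: update T*
                best_value, best = value, (tree, species[:])
    return best_value, best
```

For instance, for distances generated by a five-leaf caterpillar with unit edge weights, the sketch returns 7.0, the total edge weight of that tree.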

  18. Preliminary results: Molecular Datasets
     Dataset | Species | Number of species | Characters | Type
     RbcL | Plants | 500 | 1314 | rbcL gene
     Rana | Ranoid frogs | 64 | 1976 | mt. DNA
     M37 | Insects | 37 | 2550 | mt. DNA
     M28 | Mammals | 28 | 2086 | mt. DNA
     M43 | Cetacea | 43 | 8128 | mt. DNA
     M62 | Fungi | 82 | 2062 | mt. DNA
     M82 | Hyracoidae | 62 | 3768 | mt. DNA
     SeedPlant25 | Pinoles | 25 | 19784 | tRNA

  19. Preliminary results: Instances
     From the previous datasets we extracted two sets of 10 instances each, containing 20 and 25 species respectively. The corresponding distance matrices were obtained under the General Time Reversible (GTR) model of DNA sequence evolution, using the estimation procedure described in D. Catanzaro, R. Pesenti, and M. C. Milinkovitch, "A non-linear optimization procedure to estimate distances and instantaneous substitution rate matrices under the GTR model", Bioinformatics 22(6), 708-715, 2006.
     The experiments were run on an Intel(R) Pentium(R) D CPU at 3.20 GHz with 2 GB of RAM, running Linux kernel 2.6.20 and gcc 4.1.2.

  20. Preliminary results: Computational Results
     Strategy A = swap on the best-so-far phylogeny; strategy B = swap on each phylogeny.
     Instance | Dimension (species) | A: time (s) | A: swaps improve? | A: best-so-far value found | B: time (s) | B: swaps improve? | B: best-so-far value found
     Dataset01 | 20 | 6.42 | no | 2858.512695 | 130.03 | yes | 2811.256836
     Dataset02 | 20 | 6.29 | yes | 2942.833984 | 127.63 | yes | 2942.833984
     Dataset03 | 20 | 6.11 | no | 2488.034424 | 127.72 | yes | 2452.971680
     Dataset04 | 20 | 6.43 | no | 2628.945312 | 127.32 | yes | 2611.698242
     Dataset05 | 20 | 5.97 | no | 2330.825684 | 129.58 | no | 2330.825684
     Dataset06 | 20 | 6.34 | no | 2659.358398 | 128.94 | yes | 2614.793457
     Dataset07 | 20 | 6.34 | no | 2775.332031 | 131.13 | yes | 2754.525391
     Dataset08 | 20 | 6.41 | no | 2636.678955 | 130.48 | no | 2636.678955
     Dataset09 | 20 | 6.38 | yes | 2511.175781 | 129.37 | yes | 2511.175781
     Dataset10 | 20 | 6.35 | no | 2567.597656 | 131.95 | yes | 2541.879883
     The non-isomorphic enumeration completes in 0.72 s for instances with 20 species, whereas 84 s are needed to enumerate the non-isomorphic phylogenies for instances with 25 species.
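The trade-off between the two strategies can be quantified directly from the best-so-far values reported on this slide. The sketch below, with a hypothetical `relative_gaps` helper, computes for each instance the relative improvement obtained by swapping on each phylogeny instead of only on the best-so-far one; according to the reported times, the former costs roughly 20 times more.

```python
# Best-so-far values from the results table:
# (swap on best-so-far phylogeny, swap on each phylogeny).
RESULTS = {
    "Dataset01": (2858.512695, 2811.256836),
    "Dataset02": (2942.833984, 2942.833984),
    "Dataset03": (2488.034424, 2452.971680),
    "Dataset04": (2628.945312, 2611.698242),
    "Dataset05": (2330.825684, 2330.825684),
    "Dataset06": (2659.358398, 2614.793457),
    "Dataset07": (2775.332031, 2754.525391),
    "Dataset08": (2636.678955, 2636.678955),
    "Dataset09": (2511.175781, 2511.175781),
    "Dataset10": (2567.597656, 2541.879883),
}


def relative_gaps(results):
    """Relative improvement of 'swap on each' over 'swap on best-so-far'."""
    return {name: (a - b) / a for name, (a, b) in results.items()}


if __name__ == "__main__":
    for name, gap in relative_gaps(RESULTS).items():
        print(f"{name}: {100 * gap:.2f}%")
```

Four instances (Dataset02, 05, 08 and 09) show no gap, and the largest gap, about 1.7%, is on Dataset06.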

  21. Summary and Conclusion
     - Some cases of BME are in P; however, deciding the complexity of BME in general is still an open problem.
     - This is the first attempt at solving BME with approximate algorithms; no exact algorithm is currently known in the literature.
     - Brute-force enumeration for phylogenetic estimation under ME is unable to tackle instances with more than 12 species.
     - The enumerative approach can be applied to any phylogenetic estimation method, and it can be combined with exact approaches (e.g., exact leaf assignment).
     - As a drawback, the enumerative procedure is exponential.
     - Computational results are encouraging; however, tackling larger instances still warrants further analysis.
