PhyloGeoTool Exploring large-scale phylogenies in an epidemiological context


  1. PhyloGeoTool
     Exploring large-scale phylogenies in an epidemiological context
     Ewout Vanden Eynden
     Clinical and Evolutionary Virology, Rega
     Arevir Meeting, April 29th, 2016

  2. Background
     • Large-scale databases of clinical and demographic information
     • Opportunities for surveillance of epidemics and outbreaks of viral pathogens
     • Tracking individual variants with specific characteristics (e.g. risk group, drug resistance, …) can clarify their relation to geographic or phylogenetic spread
     • Inferring large phylogenies is computationally and methodologically feasible
     Fig. 1 Circular tree representation of the dataset

  3. Problems
     • Efficient visual navigation of these phylogenies in current stand-alone tree viewers is challenging
     • Characterizing the complementary virus and patient data associated with sequence clusters requires adaptation of the metadata
     • Novel sequence data must be placed quickly and accurately in an existing phylogeny without reconstructing that phylogeny
     Fig. 2 Radial tree representation of the dataset

  4. Objectives
     • Automatic partitioning of a phylogeny into a defined number of clusters
     • Design of a GUI that gives a concise visualization of the tree of clusters at each level and also shows the position of each cluster within the entire phylogeny
     • A summary of the different attributes at each partitioning step of the phylogenetic tree: the summary is shown in a histogram, and any geographical data is shown on a map
     • Support for placing novel data into the phylogeny without recalculating the whole phylogeny and its cluster calculations

  5. Full view of the tool
     Fig. 3 Full view of the PhyloGeoTool when hovering over a node

  6. PhyloGeoTool
     Fig. 4 Radial colored tree representation of the dataset
     Fig. 5 Circular clustered tree representation of the dataset

  7. Investigate cluster
     Fig. 6 Radial colored tree representation of a specific cluster
     Fig. 7 Circular clustered tree representation of a specific cluster

  8. Investigate cluster
     Fig. 8 Radial colored tree representation of the dataset
     Fig. 9 Circular clustered tree representation of the dataset

  9. Investigate cluster
     Fig. 10 Radial colored tree representation of the dataset
     Fig. 11 Circular clustered tree representation of the dataset

  10. Investigate cluster
      Fig. 12 Radial colored tree representation of the dataset
      Fig. 13 Circular clustered tree representation of the dataset

  11. Extra information on each cluster
      • More detailed information is available for each cluster
      • The tree is linked to a CSV file
      • Each column is read as a different attribute
      • Geographical information (if available) is shown on the world map
      • Users can add extra information to the CSV file themselves

  12. Sample CSV file
      Fig. 14 Sample CSV file with attributes “Year of Birth”, “Gender”, “Country of origin (en)”, “Country of origin (iso)”, “Ethnic Group” and “Risk Group”

  13. Representation in the tool
      Fig. 15 Representation of the sample CSV file as summarized data in a histogram
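
The per-cluster attribute summary described on slides 11 to 13 could be sketched roughly as follows. This is an illustration only, not the tool's actual code: the `id` column that links a row to a tree tip, the function name `summarize_attributes`, and all sample rows are assumptions; only the attribute names come from Fig. 14.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: attribute names follow Fig. 14, the "id" column
# and every row value are invented for illustration.
SAMPLE_CSV = """id,Year of Birth,Gender,Country of origin (iso),Risk Group
seq1,1975,M,BE,MSM
seq2,1982,F,PT,Heterosexual
seq3,1975,M,BE,IDU
seq4,1990,F,BE,Heterosexual
"""


def summarize_attributes(csv_text, cluster_ids, id_column="id"):
    """Count the values of every attribute column for the sequences in one cluster."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {}
    for row in reader:
        if row[id_column] not in cluster_ids:
            continue  # sequence is not a tip of this cluster
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is not an attribute
            summary.setdefault(column, Counter())[value] += 1
    return summary


# Summarize a hypothetical cluster containing three of the four sequences.
summary = summarize_attributes(SAMPLE_CSV, {"seq1", "seq3", "seq4"})
for attribute, counts in summary.items():
    print(attribute, dict(counts))
```

Each `Counter` holds the bar heights of one histogram, and the ISO country counts are the kind of data a map view would need.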

  14. How to cluster (1)?
      • Start from a rooted tree
      • Top-down iterative clustering approach
        1. Take the root node (A) of the biggest cluster (the root node of the tree if no clusters have been defined yet)
        2. Replace the biggest cluster by:
           o Cluster 1 with root node B, which is the first child of A
           o Cluster 2 with root node C, which is the second child of A
        3. If the required number of clusters has not been reached, go to step 1 and repeat
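
The top-down procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's implementation: the `Node` class and `top_down_clusters` function are hypothetical, and "biggest" is taken to mean "most leaves", which the slides do not specify.

```python
class Node:
    """Minimal rooted-tree node (hypothetical helper, not part of PhyloGeoTool)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def leaves(self):
        """Return the names of all tips below (or at) this node."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def top_down_clusters(root, k):
    """Split the biggest cluster into its child subtrees until k clusters exist.

    Assumption: cluster size is the number of leaves under its root node.
    """
    clusters = [root]
    while len(clusters) < k:
        splittable = [c for c in clusters if c.children]
        if not splittable:
            break  # only single tips remain, nothing left to split
        biggest = max(splittable, key=lambda c: len(c.leaves()))
        index = clusters.index(biggest)
        # Replace the biggest cluster by the clusters rooted at its children.
        clusters[index:index + 1] = biggest.children
    return clusters


# Hypothetical five-tip tree: A -> (B, C), B -> (D, s3), D -> (s1, s2), C -> (s4, s5)
tree = Node("A", [
    Node("B", [Node("D", [Node("s1"), Node("s2")]), Node("s3")]),
    Node("C", [Node("s4"), Node("s5")]),
])

for k in (2, 3, 4):
    print(k, [cluster.leaves() for cluster in top_down_clusters(tree, k)])
```

For a strictly binary tree each iteration adds exactly one cluster, so every k between 1 and the number of tips is reachable.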

  15. Starting tree
      Fig. 16 Phylogenetic tree representation of a random sample dataset with 20 sequences

  16. K = 2
      Fig. 17 Visual representation of the sample phylogenetic tree for a clustering with k=2

  17. K = 3
      Fig. 18 Visual representation of the sample phylogenetic tree for a clustering with k=3

  18. K = 4
      Fig. 19 Visual representation of the sample phylogenetic tree for a clustering with k=4

  19. How to cluster (2)?
      • Minimizing intra-cluster distances
      • Maximizing inter-cluster distances
      • Subtype Diversity Ratio, SDR (Archer et al., Bioinformatics, 2007)
      • Ratio of the mean intra-cluster pairwise distance to the mean inter-cluster pairwise distance (Rambaut et al., Nature, 2001)
      • The clustering with the lowest SDR is the best
      • Distances are taken directly from the phylogenetic tree
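
As an illustration of the ratio defined above, the sketch below computes an SDR from a pairwise distance matrix and a cluster label per tip. It assumes both means are pooled over all pairs of tips (rather than averaged per cluster first), which the slide does not specify; the function name `sdr` and the distance values are hypothetical, standing in for distances read from the tree.

```python
from itertools import combinations


def sdr(distances, labels):
    """Mean intra-cluster pairwise distance divided by mean inter-cluster pairwise distance.

    Assumption: both means are pooled over all pairs of tips.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            intra.append(distances[i][j])
        else:
            inter.append(distances[i][j])
    if not intra or not inter:
        raise ValueError("need at least one intra-cluster and one inter-cluster pair")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Hypothetical symmetric distance matrix for four tips.
D = [
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.5, 0.6],
    [0.5, 0.5, 0.0, 0.2],
    [0.6, 0.6, 0.2, 0.0],
]

good = sdr(D, [0, 0, 1, 1])  # close tips grouped together
poor = sdr(D, [0, 1, 0, 1])  # distant tips grouped together
print(round(good, 3), round(poor, 3))
```

The grouping that keeps nearby tips together gives the lower ratio, which matches the "lowest SDR is best" criterion.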

  20. Which clustering is the best?
      • Cluster for k=2 to k=50, where k is the number of clusters
      • For each k, calculate the SDR score
      • The clustering with the lowest SDR value is the best clustering
      • Problem: more clusters usually means a better clustering, because the individual points are grouped more tightly (and the SDR is therefore lower)
      • The aim is to find the balance between the number of clusters and the quality of the clusters
      • The second derivative is used to find the biggest drop in SDR value
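
The slides do not say exactly how the second derivative is applied. One plausible reading, sketched here as an assumption, is to take the discrete second difference of the SDR curve over consecutive k and pick the k where it is largest, i.e. where a steep drop flattens out. The function name and the SDR values are hypothetical.

```python
def best_k_by_second_difference(sdr_by_k):
    """Return the k with the largest discrete second difference of the SDR curve.

    Assumptions: k values are consecutive integers, and the "best" k is where
    the curve bends most sharply from a steep drop to a plateau.
    """
    ks = sorted(sdr_by_k)
    if len(ks) < 3:
        raise ValueError("need SDR scores for at least three values of k")
    best_k, best_curvature = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curvature = sdr_by_k[prev_k] - 2 * sdr_by_k[k] + sdr_by_k[next_k]
        if curvature > best_curvature:
            best_k, best_curvature = k, curvature
    return best_k


# Hypothetical SDR scores: steep improvement up to k=4, marginal gains afterwards.
scores = {2: 0.90, 3: 0.60, 4: 0.30, 5: 0.27, 6: 0.25, 7: 0.24}
chosen_k = best_k_by_second_difference(scores)
print(chosen_k)
```

With this criterion the first and last k can never be selected, since the second difference needs a neighbor on both sides.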

  21. Future perspectives
      • Integration as a web application into the EuResist Integrated Data Base (EIDB)
      • Phylogenetic placement (using the PPlacer software)
      • ….

  22. Acknowledgements
      Clinical and Epidemiological Virology, KU LEUVEN: Pieter Libin*, Ewout Vanden Eynden*, Anne-Mieke Vandamme, Kristof Theys
      Computational Evolutionary Virology, KU LEUVEN: Guy Baele*
      Artificial Intelligence Lab, VUB: Pieter Libin, Ann Nowe
      EuResist network and the European HIV coreceptor study panel (eucohiv)
      VIROGENESIS receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650

  23. Demo
