Family-joining: A method for constructing generally labeled trees


  1. Family-joining: A method for constructing generally labeled trees. Prabhav Kalaghatgi, Max Planck Institute for Informatics, Saarbrücken. AREVIR, Cologne, April 29, 2016.

  2. A phylogenetic tree is a model of evolutionary relationships (Chang et al., Mol Biol Evol, 2002).

  3. Assumptions of current phylogenetic methods. Leaf-labeled trees. [Figure: leaf-labeled tree with vertices O1 to O9 and L1 to L7; annotations: observed species, unobserved ancestors]

  4. Assumptions of current phylogenetic methods. Leaf-labeled trees versus generally labeled trees. [Figure: a leaf-labeled tree next to a generally labeled tree; annotations: observed species, unobserved ancestors, observed ancestor]

  5. Assumptions of current phylogenetic methods. Leaf-labeled trees versus generally labeled trees. [Figure: as on the previous slide, with unobserved ancestors annotated in both trees]

  6. Relationship types: parent-child and siblings. $\Delta_{i,j} = \mathrm{Avg}_k \left( d_{jk} - d_{ik} - d_{ij} \right)$. Select parent-child over sibling if [condition missing from the extracted text]. [Figure: sibling and parent-child configurations]
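
As a minimal sketch of the statistic on this slide, the function below averages d_jk − d_ik − d_ij over every third vertex k. The function name `delta` and the four-vertex path are illustrative assumptions, not part of the talk. With exactly additive distances the average is 0 when vertex i lies on the path from j to every other vertex, which is the parent-child signature. The decision rule itself is not recoverable from the slide and is left out.

```python
import numpy as np

def delta(D, i, j):
    """Average over k (k != i, j) of d_jk - d_ik - d_ij."""
    D = np.asarray(D, dtype=float)
    ks = [k for k in range(D.shape[0]) if k != i and k != j]
    return float(np.mean([D[j, k] - D[i, k] - D[i, j] for k in ks]))

# Hypothetical toy data: a path a - b - c - d with edge lengths 1, 2, 1.
# Vertex b separates a from c and d, so b behaves like a parent of a.
D_path = np.array([
    [0, 1, 3, 4],
    [1, 0, 2, 3],
    [3, 2, 0, 1],
    [4, 3, 1, 0],
])
a, b, c, d = range(4)

print(delta(D_path, b, a))  # 0.0: b lies on every path leaving a
print(delta(D_path, a, b))  # -2.0: a does not lie on the paths leaving b
```

The statistic is asymmetric in i and j, so both orderings have to be evaluated for a candidate pair.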

  7. Family-joining (FJ) method. Tree simulated for illustration; distances based on this tree. Lower-triangular distance matrix over O1 to O9, one row per vertex: O1: 0; O2: 3, 0; O3: 8, 9, 0; O4: 8, 9, 6, 0; O5: 9, 10, 7, 1, 0; O6: 10, 11, 8, 6, 7, 0; O7: 8, 9, 6, 4, 5, 4, 0; O8: 12, 13, 10, 8, 9, 8, 6, 0; O9: 7, 8, 5, 3, 4, 3, 1, 5, 0. [Figure: simulated tree on O1 to O9 with latent vertices L1 to L3 and branch lengths]

  8. Family-joining (FJ) method, step A. Input: tree-additive distances (the same lower-triangular matrix over O1 to O9); the tree topology is unresolved. [Figure: unresolved tree topology on O1 to O9]
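
Tree-additive distances satisfy the four-point condition: for any four vertices, the two largest of the three sums d_ij + d_kl, d_ik + d_jl, d_il + d_jk are equal. The sketch below checks this on the 9 x 9 matrix from the slides. The helper names `full_matrix` and `is_tree_additive` are hypothetical, and the matrix values are my transcription of the slide text.

```python
from itertools import combinations
import numpy as np

# Lower triangle over O1..O9, transcribed from the slides.
LOWER = [
    [0],
    [3, 0],
    [8, 9, 0],
    [8, 9, 6, 0],
    [9, 10, 7, 1, 0],
    [10, 11, 8, 6, 7, 0],
    [8, 9, 6, 4, 5, 4, 0],
    [12, 13, 10, 8, 9, 8, 6, 0],
    [7, 8, 5, 3, 4, 3, 1, 5, 0],
]

def full_matrix(lower):
    """Expand a lower-triangular list of rows into a symmetric array."""
    n = len(lower)
    D = np.zeros((n, n))
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            D[i, j] = D[j, i] = value
    return D

def is_tree_additive(D, tol=1e-9):
    """Four-point condition: the two largest pairwise sums must agree."""
    for i, j, k, l in combinations(range(D.shape[0]), 4):
        sums = sorted([D[i, j] + D[k, l], D[i, k] + D[j, l], D[i, l] + D[j, k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

D = full_matrix(LOWER)
print(is_tree_additive(D))

# Perturbing a single entry (here O1-O3) breaks additivity.
D_bad = D.copy()
D_bad[0, 2] = D_bad[2, 0] = 9
print(is_tree_additive(D_bad))
```

On estimated distances the equality only holds approximately, so `tol` would need to reflect the estimation noise.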

  9. Family-joining (FJ) method, step B. O1, O2: siblings with latent parent; L1 takes their place in the matrix. Lower-triangular distance matrix over O3, O4, O5, O6, O7, O8, O9, L1: O3: 0; O4: 6, 0; O5: 7, 1, 0; O6: 8, 6, 7, 0; O7: 6, 4, 5, 4, 0; O8: 10, 8, 9, 8, 6, 0; O9: 5, 3, 4, 3, 1, 5, 0; L1: 7, 7, 8, 9, 7, 11, 6, 0. [Figure: partially resolved tree]

  10. Family-joining (FJ) method, step C. O4, O5: parent-child. Lower-triangular distance matrix over O3, O4, O6, O7, O8, O9, L1: O3: 0; O4: 6, 0; O6: 8, 6, 0; O7: 6, 4, 4, 0; O8: 10, 8, 8, 6, 0; O9: 5, 3, 3, 1, 5, 0; L1: 7, 7, 9, 7, 11, 6, 0. [Figure: partially resolved tree]

  11. Family-joining (FJ) method, step D. L1, O3: siblings with latent parent; L2 takes their place in the matrix. Lower-triangular distance matrix over O4, O6, O7, O8, O9, L2: O4: 0; O6: 6, 0; O7: 4, 4, 0; O8: 8, 8, 6, 0; O9: 3, 3, 1, 5, 0; L2: 3, 5, 3, 7, 2, 0. [Figure: partially resolved tree]

  12. Family-joining (FJ) method, step E. L2, O4: siblings with latent parent; L3 takes their place in the matrix. Lower-triangular distance matrix over O6, O7, O8, O9, L3: O6: 0; O7: 4, 0; O8: 8, 6, 0; O9: 3, 1, 5, 0; L3: 4, 2, 6, 1, 0. [Figure: partially resolved tree]

  13. Family-joining (FJ) method, step F. L3, O6 (O9): siblings with observed parent. Lower-triangular distance matrix over O7, O8, O9: O7: 0; O8: 6, 0; O9: 1, 5, 0. [Figure: partially resolved tree]

  14. Family-joining (FJ) method, step G. O7, O8 (O9): siblings with observed parent. [Figure: fully resolved tree on O1 to O9 with latent vertices L1 to L3]

  15. Family-joining (FJ) method, step H. OLS branch length estimates. [Figure: resolved tree on O1 to O9 and L1 to L3, annotated with branch lengths]
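
One way to obtain ordinary least squares (OLS) branch lengths for a fixed topology is to write each observed distance as the sum of the edge lengths on the connecting path and solve the resulting linear system. The sketch below does this with `numpy.linalg.lstsq`. The edge list is my reading of the join sequence in steps B to G, not a transcription of the figure, and the helper names are hypothetical; the talk does not say how its OLS step is implemented.

```python
from itertools import combinations
import numpy as np

# Assumed topology (interpretation of steps B to G); L1..L3 are latent.
EDGES = [("O9", "O7"), ("O9", "O8"), ("O9", "O6"), ("O9", "L3"),
         ("L3", "O4"), ("O4", "O5"), ("L3", "L2"), ("L2", "O3"),
         ("L2", "L1"), ("L1", "O1"), ("L1", "O2")]
OBSERVED = ["O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "O9"]
# Lower triangle of the distance matrix over O1..O9, from the slides.
LOWER = [
    [0], [3, 0], [8, 9, 0], [8, 9, 6, 0], [9, 10, 7, 1, 0],
    [10, 11, 8, 6, 7, 0], [8, 9, 6, 4, 5, 4, 0],
    [12, 13, 10, 8, 9, 8, 6, 0], [7, 8, 5, 3, 4, 3, 1, 5, 0],
]

def path_edges(edges, src, dst):
    """Indices of the edges on the unique tree path from src to dst."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    stack = [(src, None, [])]
    while stack:
        node, parent, used = stack.pop()
        if node == dst:
            return used
        for nxt, idx in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, used + [idx]))
    raise ValueError(f"no path between {src} and {dst}")

def ols_branch_lengths(edges, labels, lower):
    """Least-squares edge lengths from pairwise distances."""
    rows, rhs = [], []
    for i, j in combinations(range(len(labels)), 2):
        row = np.zeros(len(edges))
        row[path_edges(edges, labels[i], labels[j])] = 1.0
        rows.append(row)
        rhs.append(float(lower[j][i]))  # j > i, so row j holds d_ij
    lengths, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return lengths

lengths = ols_branch_lengths(EDGES, OBSERVED, LOWER)
for (u, v), w in zip(EDGES, lengths):
    print(f"{u}-{v}: {w:.3f}")
```

Because the distances here are exactly additive, the fit is exact and the estimates form the same collection of values as the branch labels in the slide text (five 1s, two 2s, two 3s, one 4, one 5). With noisy distances the same system gives the least-squares compromise, and negative estimates are possible.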

  16. Related methods: Recursive grouping (RG; Choi et al. 2011, JMLR); Chow-Liu recursive grouping (CLRG; Choi et al. 2011, JMLR); Neighbor-joining with edge contraction (NJc; Choi et al. 2011, JMLR); Sampled ancestors (SA; Gavryushkina et al. 2014, PLoS Comput Biol).

  17. Simulated data: 160 taxa; varying proportion of latent vertices; 1000 nt long sequences; GTR + Γ; 100 replicates; BIC for threshold selection.

  18. Robinson-Foulds distance: $\mathrm{RF} = 1 - \frac{|S \cap \hat{S}|}{|S \cup \hat{S}|}$. [Figure: normalized Robinson-Foulds distance (0.0 to 1.0) against fraction of latent vertices (0.5, 0.37, 0.25(d), 0.12, 0) for FJ, NJc, RG, CLRG, and SA]
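
A minimal sketch of the normalized Robinson-Foulds distance, assuming S and Ŝ are the sets of splits (bipartitions of the labeled vertices induced by removing one edge) of the true and the estimated tree. The function names and the two quartet trees are hypothetical, and whether the talk counts trivial single-vertex splits is not stated; this sketch counts them.

```python
def splits(edges, labeled):
    """Set of bipartitions of the labeled vertices, one per tree edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    labeled = frozenset(labeled)
    result = set()
    for u, v in edges:
        # Vertices reachable from u without crossing the edge (u, v).
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen and not (x == u and y == v):
                    seen.add(y)
                    stack.append(y)
        side = frozenset(seen) & labeled
        other = labeled - side
        if side and other:
            result.add(frozenset((side, other)))
    return result

def normalized_rf(true_splits, est_splits):
    """1 - |S ∩ Ŝ| / |S ∪ Ŝ|."""
    union = true_splits | est_splits
    if not union:
        return 0.0
    return 1.0 - len(true_splits & est_splits) / len(union)

# Hypothetical quartets: the true tree groups ab|cd, the estimate ac|bd.
taxa = ["a", "b", "c", "d"]
true_tree = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("d", "y")]
est_tree = [("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"), ("d", "y")]

S = splits(true_tree, taxa)
S_hat = splits(est_tree, taxa)
print(normalized_rf(S, S))
print(normalized_rf(S, S_hat))
```

The same `splits` function works for generally labeled trees, because a labeled internal vertex simply ends up on one side of each bipartition.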

  19. Precision and recall: $\mathrm{Precision} = \frac{|S \cap \hat{S}|}{|\hat{S}|}$, $\mathrm{Recall} = \frac{|S \cap \hat{S}|}{|S|}$. [Figure: two panels, precision and recall (0.0 to 1.0) against fraction of latent vertices (0.5, 0.37, 0.25(d), 0.12, 0) for FJ, NJc, RG, CLRG, and SA]
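
The two ratios can be computed directly on split sets. The sketch below is an illustration under the assumption that S holds the splits of the true tree and Ŝ those of the estimate; the helper `split` and the five-taxon example are hypothetical. It shows why a partly unresolved estimate can have perfect precision but reduced recall.

```python
def split(side, taxa):
    """Bipartition of taxa into `side` and its complement."""
    side = frozenset(side)
    return frozenset((side, frozenset(taxa) - side))

def precision_recall(true_splits, est_splits):
    """Precision = |S ∩ Ŝ| / |Ŝ|, recall = |S ∩ Ŝ| / |S|."""
    shared = len(true_splits & est_splits)
    precision = shared / len(est_splits) if est_splits else 0.0
    recall = shared / len(true_splits) if true_splits else 0.0
    return precision, recall

taxa = "abcde"
# True tree has two internal splits; the estimate recovers only one.
true_splits = {split("ab", taxa), split("abc", taxa)}
est_splits = {split("ab", taxa)}

precision, recall = precision_recall(true_splits, est_splits)
print(precision, recall)  # 1.0 0.5
```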

  20. Validation using the Belgian HIV-1 C transmission chain data (Vranken et al. 2014, PLoS Comput Biol; publicly available at LANL). [Figure: transmission chain among hosts A, B, C, D, E, F, G, H, I, K, L]

  21. Validation using the Belgian HIV-1 C transmission chain data (Vranken et al. 2014, PLoS Comput Biol; publicly available at LANL): 11 hosts; 181 env sequences; sequences at multiple time points per host. [Figure: transmission chain among hosts A, B, C, D, E, F, G, H, I, K, L]

  22. Unrooted generally labeled tree. [Figure: unrooted tree with vertices colored by host (A, B, C, D, E, F, G, H, I, K, L) or marked as latent]

  23. Inferring the location of the root. [Figure: unrooted tree with vertices colored by sampling year, 1990 to 2006 in two-year steps]

  24. Rooted phylogenetic tree. [Figure: rooted tree colored by host (A, B, C, D, E, F, G, H, I, K, L); scale 0.00 to 0.10 subs/site]

  25. Ancestral state reconstruction. [Figure: rooted tree colored by host (A, B, C, D, E, F, G, H, I, K, L); scale 0.00 to 0.10 subs/site]

  26. Compatibility with transmission events. [Figure: rooted tree colored by host (A, B, C, D, E, F, G, H, I, K, L), annotated with host pairs for the transmission events, including C D, B C, B I, B H, A B, A F, and A G; scale 0.00 to 0.10 subs/site]

  27. Summary and Outlook. FJ has 93 % precision and 90 % recall on simulated data. High precision implies that most branches are reliable. FJ tree is compatible with 9/10 transmission events.

  28. Summary and Outlook. Summary: FJ has 93 % precision and 90 % recall on simulated data. High precision implies that most branches are reliable. FJ tree is compatible with 9/10 transmission events. Outlook: improve reconstruction accuracy and speed.
