
Statistical binning enables an accurate coalescent-based estimation of the avian tree

Siavash Mirarab, Md. Shamsuzzoha Bayzid, Bastien Boussau, and Tandy Warnow. Science (2014)


  1. Statistical binning enables an accurate coalescent-based estimation of the avian tree. Siavash Mirarab, Md. Shamsuzzoha Bayzid, Bastien Boussau, and Tandy Warnow. Science (2014)

  2. Avian whole-genome phylogenies [Jarvis, Mirarab, et al., Science, 2014]: 48 representative birds.
     [Figure: species tree error plotted against amount of data (i.e., # of genes), annotated "Hope!"]

  3. Gene tree discordance
     Gene: recombination-free orthologous regions in genomes.
     [Figure: gene 1, gene 2, gene 999, and gene 1000 along the genomes, each with its own gene tree on Owl, Finch, Falcon, and Eagle.]

  5. Gene tree discordance
     [Figure: the species tree on Owl, Finch, Falcon, and Eagle, shown next to the gene trees for gene 1, gene 2, gene 999, and gene 1000.]
     Causes of gene tree discordance:
     • Incomplete Lineage Sorting (ILS)
       • Modeled by the multi-species coalescent
       • Highly probable for radiations (e.g., short branches) such as the bird radiation, 60 mya
       • The species tree is identifiable from the gene tree distribution [Degnan and Salter, 2005]
     • Duplication and loss
     • Horizontal Gene Transfer (HGT)
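
To make the ILS bullet concrete: for a rooted three-taxon species tree, a standard result under the multi-species coalescent is that a gene tree matches the species tree with probability 1 - (2/3)e^(-T), where T is the internal branch length in coalescent units. The sketch below is not from the talk; the function names are illustrative. It tabulates how discordance grows as the internal branch gets shorter.

```python
import math


def p_concordant(t: float) -> float:
    """Probability that a 3-taxon gene tree matches the species tree.

    t is the internal branch length in coalescent units (an assumed input).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def p_each_discordant(t: float) -> float:
    """Probability of each of the two discordant topologies."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return (1.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    # Shorter internal branches (rapid radiations) push the three
    # topologies toward equal frequency.
    for t in (0.0, 0.1, 0.5, 1.0, 3.0):
        print(f"T={t}: concordant={p_concordant(t):.3f}, "
              f"each discordant={p_each_discordant(t):.3f}")
```

At T = 0 all three topologies are equally likely, which is why short branches in a rapid radiation produce so much discordance.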

  9. Species tree estimation from phylogenomic data (approach 1: concatenation)
     [Figure: the alignments for gene 1, gene 2, gene 999, and gene 1000 are concatenated into a single alignment; ML on the concatenation yields a tree on Owl, Finch, Falcon, and Eagle with a branch labeled 81%. Inset plot: error against data.]
     - Statistically inconsistent and positively misleading [Roch and Steel, Theo. Pop. Gen., 2014]
     - Mixed accuracy in simulations [Kubatko and Degnan, Systematic Biology, 2007] [Mirarab, et al., Systematic Biology, 2014]
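
The concatenation step itself is simple enough to sketch. The block below joins per-gene alignments into one supermatrix and records where each gene starts and ends. The sequences are the ones shown on the slide, but the row labels (`taxon_a` to `taxon_d`) are placeholders, because the transcript does not say which row belongs to which species. The function name and partition format are likewise assumptions, not the authors' pipeline.

```python
# Per-gene alignments transcribed from the slide. Row labels are
# placeholders: the slide text does not map rows to species.
GENES = {
    "gene 1": {"taxon_a": "ACTGCACACCG", "taxon_b": "ACTGC-CCCCG",
               "taxon_c": "AATGC-CCCCG", "taxon_d": "-CTGCACACGG"},
    "gene 2": {"taxon_a": "CTGAGCATCG", "taxon_b": "CTGAGC-TCG",
               "taxon_c": "ATGAGC-TC-", "taxon_d": "CTGA-CAC-G"},
    "gene 999": {"taxon_a": "AGCAGCATCGTG", "taxon_b": "AGCAGC-TCGTG",
                 "taxon_c": "AGCAGC-TC-TG", "taxon_d": "C-TA-CACGGTG"},
    "gene 1000": {"taxon_a": "CAGGCACGCACGAA", "taxon_b": "AGC-CACGC-CATA",
                  "taxon_c": "ATGGCACGC-C-TA", "taxon_d": "AGCTAC-CACGGAT"},
}


def concatenate(genes):
    """Join per-gene alignments into one supermatrix.

    Returns (supermatrix, partitions), where partitions is a list of
    (gene name, first column, last column) with 1-based inclusive columns.
    A taxon missing from a gene is padded with gap characters.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 0
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, "-" * width)
        partitions.append((name, start + 1, start + width))
        start += width
    return supermatrix, partitions


if __name__ == "__main__":
    matrix, parts = concatenate(GENES)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

The resulting supermatrix would then be handed to an ML tree search as a single alignment, which is exactly where the assumption of one shared tree for all genes enters.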

  14. Species tree estimation from phylogenomic data (approach 2: summary methods)
      [Figure: a gene tree on Owl, Finch, Falcon, and Eagle is estimated from each alignment (gene 1, gene 2, gene 999, gene 1000); a summary method combines the gene trees into a species tree with a branch labeled 78%. Inset plot: error against data, annotated "True gene trees".]
      Can be statistically consistent:
      • MP-EST (maximum pseudo-likelihood) [Liu, Yu, Edwards, BMC Evol. Bio., 2010]
      • BUCKy-pop., NJst, STAR, ASTRAL, …
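
As a toy illustration of what "summarizing gene trees" means, the sketch below takes a majority vote over unrooted four-taxon gene tree topologies. This is not MP-EST, ASTRAL, or any method named on the slide; it only relies on the known property that, for four taxa under the multi-species coalescent, the most probable unrooted gene tree topology matches the species tree. The input encoding (one cherry per gene tree) and the example gene trees are hypothetical.

```python
from collections import Counter

TAXA = frozenset({"Owl", "Finch", "Falcon", "Eagle"})


def canonical_split(cherry):
    """Encode an unrooted 4-taxon topology by its two cherries.

    `cherry` is any one pair of sister taxa; the other pair is implied.
    """
    side = frozenset(cherry)
    if len(side) != 2 or not side <= TAXA:
        raise ValueError(f"not a valid cherry: {cherry!r}")
    return frozenset({side, TAXA - side})


def summarize(gene_tree_cherries):
    """Return the most frequent topology and the share of gene trees with it."""
    if not gene_tree_cherries:
        raise ValueError("need at least one gene tree")
    counts = Counter(canonical_split(c) for c in gene_tree_cherries)
    best, n = counts.most_common(1)[0]
    return best, n / len(gene_tree_cherries)


if __name__ == "__main__":
    # Hypothetical gene trees, one cherry each (not taken from the avian data).
    gene_trees = [("Owl", "Finch"), ("Falcon", "Eagle"),
                  ("Owl", "Falcon"), ("Finch", "Owl")]
    topology, share = summarize(gene_trees)
    print(sorted(sorted(side) for side in topology), share)
```

Note that the vote is only as good as its inputs: the consistency guarantee assumes true gene trees, while in practice the gene trees are estimated.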

  16. Gene trees on the avian dataset
      14,000 “genes”: 8,000 exons, 2,500 introns, and 3,500 Ultra-Conserved Elements.
      [Figure: histogram of branch bootstrap support (x-axis, 0% to 100%) against percentage of branches (y-axis, 0% to 20%), with the median and mean marked. Branch bootstrap support is a measure of confidence in estimated gene tree branches.]
      14,000 noisy gene trees
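
The histogram on this slide reduces a large set of branch support values to a mean and a median. A minimal sketch of that summary follows. The support values used here are placeholders chosen for illustration, not numbers from the avian dataset, and the 75% threshold is an assumed parameter rather than one stated in the talk.

```python
from statistics import mean, median


def support_summary(supports, threshold=75.0):
    """Summarize branch bootstrap support values given in percent.

    Returns the mean, the median, and the share of branches whose
    support falls below `threshold` (a hypothetical cutoff).
    """
    if not supports:
        raise ValueError("need at least one support value")
    if any(s < 0 or s > 100 for s in supports):
        raise ValueError("support values must lie in [0, 100]")
    low = sum(1 for s in supports if s < threshold)
    return {
        "mean": mean(supports),
        "median": median(supports),
        "share_below": low / len(supports),
    }


if __name__ == "__main__":
    # Placeholder values only; real input would be the support on every
    # branch of every estimated gene tree.
    print(support_summary([10, 20, 30, 80, 100]))
```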
