CS 581 Paper Presentation (Muhammad Samir Khan): Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer


  1. CS 581 Paper Presentation • Muhammad Samir Khan • Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis, by Sebastien Roch and Sagi Snir

  2. Overview • Introduction (what is LGT?) • Notation • Model: Bounded Rates Model, Yule Process • Quartet Based Approach: Bounded Rates Model, Yule Process, Preferential LGT • Further Results

  3. What is LGT? • Non-vertical transfer of genes • Overall evolution is tree-like • Particularly common in bacteria • Primary reason for the spread of antibiotic resistance [1] • Sources: [1] https://en.wikipedia.org/wiki/Horizontal_gene_transfer [2] http://www.nature.com/nrmicro/journal/v3/n9/images/nrmicro1253-f1.gif

  4. Species Phylogeny • $T_s = (V_s, E_s, L_s : r, \tau)$ • $V_s$: vertices • $E_s$: edges • $L_s$: leaves • $r$: root • $\tau(e)$: interspeciation times • Number of leaves $n = n^+ + n^-$ • $n^+ > 0$ extant species • $n^- \ge 0$ extinct species • (Figure: a species tree with root $r$, edge length $\tau(e)$, and extinct and extant leaves marked)

  5. Extant Phylogeny • Denoted $T_s^+ = (V_s^+, E_s^+, L_s^+ : r^+, \tau^+)$ • Restrict to extant leaves: $T_s | L_s^+$ • Suppress vertices of degree 2 (add up the branch lengths) • Root at the most recent common ancestor of $L_s^+$ • $T_s^+$ is ultrametric • Want to recover the extant phylogeny

  6. Gene Trees • $T_g = (V_g, E_g, L_g : \omega_g)$ for a gene $g$ is an unrooted tree • $V_g$: vertices • $E_g$: edges • $L_g$: leaves, a subset of $L_s$ • $\omega_g(e)$: branch lengths (expected number of substitutions) • Each vertex of degree 2 or 3 • $\mathcal{T}_g = \mathcal{T}[T_g]$ is the topology of $T_g$ with degree-2 vertices suppressed • Not ultrametric

  7. LGT: Subtree Prune and Regraft • LGT events take place at locations along the edges • Recipient location: pruning • Donor location: regrafting • A new node is created at the donor location • Source: Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.

  8. Contemporaneous Locations • Two locations $x, y$ are contemporaneous if their $\tau$-distance to the root is identical: $\tau(x, r) = \tau(y, r)$ • For $R > 0$, $C_x^{(R)}$ is the set of locations contemporaneous to $x$ and with MRCA at $\tau$-distance at most $R$ from $x$: $C_x^{(R)} = \{ y : \tau(x, r) = \tau(y, r),\ \tau(x, y) \le 2R \}$
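The factor 2 in the definition follows from contemporaneity. A short derivation, for two contemporaneous locations $x$ and $y$ with root $r$ and $\tau$-distance as above, writing $m$ for their most recent common ancestor (the symbol $m$ is introduced here only for illustration):

```latex
\begin{align*}
\tau(x, r) &= \tau(x, m) + \tau(m, r), \qquad \tau(y, r) = \tau(y, m) + \tau(m, r), \\
\tau(x, r) = \tau(y, r) &\implies \tau(x, m) = \tau(y, m), \\
\tau(x, y) &= \tau(x, m) + \tau(m, y) = 2\,\tau(x, m), \\
\tau(x, m) \le R &\iff \tau(x, y) \le 2R .
\end{align*}
```

So "MRCA at $\tau$-distance at most $R$ from $x$" and "$\tau(x, y) \le 2R$" describe the same set once the two locations are contemporaneous.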

  9. Random LGT • Species phylogeny fixed: $T_s = (V_s, E_s, L_s : r, \tau)$ • $0 < R \le \infty$ (possibly depending on $n$) • Each edge $e$ has a rate of LGT $\lambda(e)$: $0 < \lambda(e) < +\infty$ • $\Lambda(e) = \lambda(e)\,\tau(e)$ • $\Lambda_{tot} = \sum_{e \in E_s} \Lambda(e)$ • $\Lambda = \sum_{e \in E(T_s | L_s^+)} \Lambda(e)$ • Taxon sampling probability $p$: $0 < p \le 1$

  10. Random LGT • LGT locations: • Start from the root (chronologically) • Along each edge $e \in E_s$, select recipient locations according to a continuous-time Poisson process with rate $\lambda(e)$ • If $x$ is selected as a recipient location, the donor location is selected uniformly at random from $C_x^{(R)}$ • Keep each extant leaf independently with probability $p$, to get $L_g$ • Gene tree $T_g$ is obtained by keeping the subtree restricted to $L_g$
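The two random ingredients on this slide, Poisson recipient locations and independent taxon sampling, can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the edge names, the `(rate, length)` encoding, and both function names are assumptions made for the example, and donor selection and the prune-and-regraft step are left out entirely.

```python
import random

def sample_recipient_locations(edges, rng):
    """Sample LGT recipient locations along each edge.

    edges: dict mapping a (hypothetical) edge name to (lgt_rate, length).
    Returns a dict mapping each edge name to increasing positions in
    [0, length), drawn from a Poisson process with the edge's rate.
    """
    locations = {}
    for name, (rate, length) in edges.items():
        positions = []
        # Gaps between events of a Poisson process are exponential.
        t = rng.expovariate(rate)
        while t < length:
            positions.append(t)
            t += rng.expovariate(rate)
        locations[name] = positions
    return locations

def sample_taxa(extant_leaves, p, rng):
    """Keep each extant leaf independently with probability p."""
    return [leaf for leaf in extant_leaves if rng.random() < p]

rng = random.Random(0)
edges = {"e1": (0.5, 2.0), "e2": (1.0, 1.0), "e3": (0.2, 3.0)}
locations = sample_recipient_locations(edges, rng)
kept = sample_taxa(["A", "B", "C", "D"], 0.75, rng)
```

The expected number of recipient locations on an edge is the product of its rate and its length, which is the per-edge quantity the previous slide sums over edges.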

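  11. Bounded Rates Model • Constants: • $\rho_\lambda$: $0 < \rho_\lambda < 1$ • $\rho_\tau$: $0 < \rho_\tau < 1$ • $\bar{\tau}$: $0 < \bar{\tau} < +\infty$ • $\bar{\lambda}$ (possibly depending on $n^+$): $0 < \bar{\lambda} < +\infty$ • Used to control the amount of LGT • Under the bounded rates model: $\rho_\lambda \bar{\lambda} \le \lambda(e) \le \bar{\lambda}\ \ \forall e \in E_s$ and $\rho_\tau \bar{\tau} \le \tau^+(e^+) \le \bar{\tau}\ \ \forall e^+ \in E_s^+$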
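The two inequalities can be read as a simple check on a candidate set of rates and branch lengths. The sketch below assumes the LGT rates of the species-phylogeny edges and the branch lengths of the extant phylogeny are given as plain lists; the function and parameter names are hypothetical.

```python
def satisfies_bounded_rates(lgt_rates, extant_lengths,
                            rho_lambda, lambda_bar, rho_tau, tau_bar):
    """Check both bounded-rates conditions.

    lgt_rates: LGT rate of every edge of the species phylogeny.
    extant_lengths: branch length of every edge of the extant phylogeny.
    """
    rates_ok = all(rho_lambda * lambda_bar <= rate <= lambda_bar
                   for rate in lgt_rates)
    lengths_ok = all(rho_tau * tau_bar <= length <= tau_bar
                     for length in extant_lengths)
    return rates_ok and lengths_ok

ok = satisfies_bounded_rates([1.0, 1.5, 2.0], [0.5, 1.0],
                             rho_lambda=0.5, lambda_bar=2.0,
                             rho_tau=0.25, tau_bar=1.0)
too_fast = satisfies_bounded_rates([1.0, 2.5], [0.5, 1.0],
                                   rho_lambda=0.5, lambda_bar=2.0,
                                   rho_tau=0.25, tau_bar=1.0)
```

Here `ok` is true because every rate lies in [1.0, 2.0] and every length in [0.25, 1.0], while `too_fast` is false because one rate exceeds the upper bound.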

  12. Yule Process • Branching process that starts with two species • Each species generates a new offspring at rate $\nu$: $0 < \nu < +\infty$ • No extinct species • Stop when the number of species is $n + 1$ (ignore the last species) • $\rho_\lambda \bar{\lambda} \le \lambda(e) \le \bar{\lambda}$ for every edge $e \in E_s$ • $\rho_\lambda$ constant: $0 < \rho_\lambda < 1$ • $\bar{\lambda}$ (possibly depending on $n$): $0 < \bar{\lambda} < +\infty$
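A forward simulation of the branching process described here might look as follows. It is a sketch under stated assumptions: species are integer labels, a speciation replaces the parent label with two fresh child labels, and the function name and return format are invented for the example. With $k$ species alive, each speciating independently at rate $\nu$, the waiting time to the next speciation is exponential with rate $k\nu$.

```python
import random

def simulate_yule(n, nu, rng):
    """Simulate a Yule tree with n extant species.

    Returns (leaves, events, height), where events is a list of
    (time, parent, left_child, right_child) speciation records and
    height is the time at which the (n + 1)-th species would appear.
    """
    if n < 2:
        raise ValueError("the process starts with two species")
    lineages = [0, 1]          # the two initial species
    next_id = 2
    now = 0.0
    events = []
    while len(lineages) < n:
        # Next speciation among len(lineages) independent lineages.
        now += rng.expovariate(len(lineages) * nu)
        parent = lineages.pop(rng.randrange(len(lineages)))
        lineages += [next_id, next_id + 1]
        events.append((now, parent, next_id, next_id + 1))
        next_id += 2
    # Wait for the speciation that would create species n + 1, then stop;
    # that last species is ignored, so only the stopping time is kept.
    now += rng.expovariate(len(lineages) * nu)
    return lineages, events, now

leaves, events, height = simulate_yule(6, 1.0, random.Random(1))
```

All leaves sit at time `height`, so a tree built this way is ultrametric, and no lineage is ever removed, matching the "no extinct species" bullet.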

  13. Quartet Based Approach • Input: gene trees $T_{g_1}, \dots, T_{g_N}$ • Output: estimated extant species phylogeny $\hat{T}$ • Let $X = \{a, b, c, d\}$ be a four-tuple of extant species • Three possible quartets: • $q_1 = ab|cd$ • $q_2 = ac|bd$ • $q_3 = ad|bc$ • Frequency of quartet: $f_X(q_i) = \dfrac{|\{ g_j : X \subseteq L_{g_j},\ \mathcal{T}_{g_j}|X = q_i \}|}{|\{ g_j : X \subseteq L_{g_j} \}|}$
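The quartet frequency is a ratio of two counts over gene trees, which a short sketch can make concrete. The encoding below is an assumption made for illustration: each gene tree is a nested tuple of species names (rooted arbitrarily, since only the induced splits matter), and all function names are hypothetical. A gene tree that does not contain all four species is skipped, as in the denominator of the formula.

```python
from collections import Counter

def leaf_set(tree):
    """Leaves of a tree written as nested tuples of species names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """Yield the leaf set below every internal node."""
    if isinstance(tree, str):
        return
    for child in tree:
        yield from clades(child)
    yield leaf_set(tree)

def induced_quartet(tree, X):
    """Quartet induced on X, None if X is not contained in the tree."""
    X = frozenset(X)
    if not X <= leaf_set(tree):
        return None
    for clade in clades(tree):
        inside = clade & X
        if len(inside) == 2:
            # This clade separates two members of X from the other two.
            return frozenset([inside, X - inside])
    return "unresolved"

def quartet_frequencies(gene_trees, X):
    """Fraction of gene trees containing X that induce each quartet."""
    counts = Counter()
    for tree in gene_trees:
        q = induced_quartet(tree, X)
        if q is not None:
            counts[q] += 1
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()}

gene_trees = [
    (("a", "b"), ("c", "d")),
    (("a", "b"), ("c", ("d", "e"))),
    (("a", "c"), ("b", "d")),
    (("a", "b"), "c"),   # lacks d, so it is skipped for X
]
freqs = quartet_frequencies(gene_trees, "abcd")
ab_cd = frozenset([frozenset("ab"), frozenset("cd")])
ac_bd = frozenset([frozenset("ac"), frozenset("bd")])
```

In this toy input, three gene trees contain all of $a, b, c, d$; two induce $ab|cd$ and one induces $ac|bd$, so `freqs[ab_cd]` is 2/3 and `freqs[ac_bd]` is 1/3.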

  14. Quartet Based Approach (Roch & Snir, 2013)

  15. Bounded Rates Model (Roch & Snir, 2013)

  16. Yule Process (Roch & Snir, 2013)

  17. Preferential LGT (Roch & Snir, 2013)

  18. Further Results • Highways of LGT • The same model as before with additional “highways” • Highways are pairs of edges where LGT occurs deterministically • Highways can be different for different genes • Same result holds under the bounded rates model • Assuming no extinctions • Frequency of genes affected by highways is low • Distance Based Approach under the GTR model • Compute the distance matrix by using the median of distances • Use any statistically consistent distance based method
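The "median of distances" step can be sketched as a per-pair median across genes. This is only an illustration of the idea on the slide, not the paper's estimator: it assumes each gene already comes with pairwise distances (for example, estimated under GTR by some other tool), stored as a dict keyed by species pairs, and the function name is hypothetical.

```python
from statistics import median

def median_distance_matrix(gene_distances):
    """Combine per-gene pairwise distances by taking the median.

    gene_distances: list of dicts mapping a pair of species, e.g.
    ("A", "B"), to that gene's estimated distance. A pair missing
    from a gene is simply not counted for that gene.
    """
    pooled = {}
    for distances in gene_distances:
        for pair, d in distances.items():
            pooled.setdefault(frozenset(pair), []).append(d)
    return {pair: median(values) for pair, values in pooled.items()}

per_gene = [
    {("A", "B"): 1.0, ("A", "C"): 2.0},
    {("A", "B"): 1.2, ("A", "C"): 3.0},
    {("B", "A"): 5.0},   # one gene with an outlying distance, e.g. after LGT
]
combined = median_distance_matrix(per_gene)
```

The median is robust to a minority of genes whose distances were distorted by transfers: here the outlying value 5.0 for the pair A, B does not move the combined distance away from 1.2. The resulting matrix can then be passed to a distance-based reconstruction method.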

  19. Questions?
