CS 581 Paper Presentation
Muhammad Samir Khan
Recovering the Treelike Trend of Evolution Despite Extensive Lateral Genetic Transfer: A Probabilistic Analysis
by Sebastien Roch and Sagi Snir
Overview
• Introduction (what is LGT?)
• Notation
• Model
• Bounded Rates Model
• Yule Process
• Quartet Based Approach
• Bounded Rates Model
• Yule Process
• Preferential LGT
• Further Results
What is LGT?
• Lateral genetic transfer (LGT): non-vertical transfer of genes
• Overall evolution is tree-like
• Particularly common in bacteria
• Primary reason for the spread of antibiotic resistance [1]
1. https://en.wikipedia.org/wiki/Horizontal_gene_transfer
2. http://www.nature.com/nrmicro/journal/v3/n9/images/nrmicro1253-f1.gif
Species Phylogeny
• $T_s = (V_s, E_s, L_s : r, \tau)$
• $V_s$: vertices
• $E_s$: edges
• $L_s$: leaves
• $r$: root
• $\tau(e)$: interspeciation times
• Number of leaves: $n = n^+ + n^-$
• $n^+ > 0$: extant species
• $n^- \ge 0$: extinct species
[Figure labels: $r$, $\tau(e)$, extinct, extant]
Extant Phylogeny
• Denoted $T_s^+ = (V_s^+, E_s^+, L_s^+ : r^+, \tau^+)$
• Restrict to extant leaves: $T_s|_{L_s^+}$
• Suppress vertices of degree 2 (add up the branch lengths)
• Root at the most recent common ancestor of $L_s^+$
• $T_s^+$ is ultrametric
• Want to recover the extant phylogeny
[Figure axis label: time]
Gene Trees
• $T = (V, E, L : \omega)$ for a gene is an unrooted tree
• $V$: vertices
• $E$: edges
• $L$: leaves, a subset of $L_s$
• $\omega(e)$: branch lengths (expected number of substitutions)
• Each internal vertex has degree 2 or 3
• $\mathcal{T} = \mathcal{T}[T]$ is the topology of $T$ with degree 2 vertices suppressed
• Not ultrametric
LGT: Subtree Prune and Regraft
• An LGT event takes place at locations along the edges
• Recipient location: pruning
• Donor location: regrafting
• A new node is created at the donor location
1. Roch, S., & Snir, S. (2013). Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. Journal of Computational Biology, 20(2), 93-112.
Contemporaneous Locations
• Two locations $x, y$ are contemporaneous if their $\tau$-distance to the root is identical: $\tau(x, r) = \tau(y, r)$
• For $R > 0$, $C_x^{(R)}$ is the set of locations contemporaneous to $x$ with MRCA at $\tau$-distance at most $R$ from $x$:
$$C_x^{(R)} = \{\, y : \tau(x, r) = \tau(y, r),\ \tau(x, y) \le 2R \,\}$$
Random LGT
• Species phylogeny fixed: $T_s = (V_s, E_s, L_s : r, \tau)$
• $0 < R \le \infty$ (possibly depending on $n$)
• Each edge has a rate of LGT $\lambda(e)$: $0 < \lambda(e) < +\infty$
• $\Lambda(e) = \lambda(e)\,\tau(e)$
• $\Lambda_{\mathrm{tot}} = \sum_{e \in E_s} \Lambda(e)$
• $\Lambda = \sum_{e \in E(T_s|_{L_s^+})} \Lambda(e)$
• Taxon sampling probability $p$: $0 < p \le 1$
Random LGT
• LGT locations:
  • Start from the root (chronologically)
  • Along each edge $e \in E_s$, select recipient locations according to a continuous-time Poisson process with rate $\lambda(e)$
  • If $x$ is selected as a recipient location, the donor location is selected uniformly at random from $C_x^{(R)}$
• Keep each extant leaf independently with probability $p$, to get $L$
• Gene tree $T$ is obtained by keeping the subtree restricted to $L$
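The two random ingredients on this slide that do not need the tree structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `recipient_locations` and `sample_leaves` are hypothetical, an edge is reduced to its length, and the donor choice from the contemporaneous set is left out because it needs the full species phylogeny.

```python
import random


def recipient_locations(edge_length, rate, rng):
    """Sample the points of a homogeneous Poisson process on [0, edge_length).

    Gaps between consecutive points are exponential with the given rate,
    so the expected number of points is rate * edge_length.
    (Hypothetical helper: an edge is represented only by its length.)
    """
    locations = []
    position = rng.expovariate(rate)
    while position < edge_length:
        locations.append(position)
        position += rng.expovariate(rate)
    return locations


def sample_leaves(extant_leaves, p, rng):
    """Keep each extant leaf independently with probability p."""
    return [leaf for leaf in extant_leaves if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(1)
    print(recipient_locations(edge_length=2.0, rate=3.0, rng=rng))
    print(sample_leaves(["a", "b", "c", "d", "e"], p=0.6, rng=rng))
```

The expected number of recipient locations on an edge is the rate times the edge length, which is the per-edge quantity that the model sums over edges.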
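Bounded Rates Model
• Constants:
  • $\rho_\lambda$: $0 < \rho_\lambda < 1$
  • $\rho_\tau$: $0 < \rho_\tau < 1$
  • $\bar{\tau}$: $0 < \bar{\tau} < +\infty$
• $\bar{\lambda}$, possibly depending on $n^+$: $0 < \bar{\lambda} < +\infty$
  • Used to control the amount of LGT
• Under the bounded rates model:
$$\rho_\lambda \bar{\lambda} \le \lambda(e) \le \bar{\lambda} \quad \forall e \in E_s$$
$$\rho_\tau \bar{\tau} \le \tau^+(e^+) \le \bar{\tau} \quad \forall e^+ \in E_s^+$$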
Yule Process
• Branching process that starts with two species
• Each species generates a new offspring at rate $\nu$: $0 < \nu < +\infty$
• No extinct species
• Stop when the number of species is $n + 1$ (ignore the last species)
• $\rho_\lambda \bar{\lambda} \le \lambda(e) \le \bar{\lambda}$ for every edge $e \in E_s$
• $\rho_\lambda$ constant: $0 < \rho_\lambda < 1$
• $\bar{\lambda}$, possibly depending on $n$: $0 < \bar{\lambda} < +\infty$
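The timing of the branching process described on this slide can be sketched as follows. This is a minimal illustration, not the paper's construction: the function name `yule_speciation_times` is hypothetical, and only the speciation times are produced, not the tree shape. It relies on the standard fact that with k species each splitting at the same rate, the wait until the next split is exponential with k times that rate.

```python
import random


def yule_speciation_times(n, nu, rng):
    """Return the times at which the species count goes 2->3, ..., n->n+1.

    Starts with two species at time 0. With k species, each splitting at
    rate nu, the waiting time to the next split is Exponential(k * nu).
    The last returned time is the stopping time (n + 1 species reached).
    (Hypothetical helper: tree topology is not tracked.)
    """
    if n < 2:
        raise ValueError("the process starts with two species, so n >= 2")
    times = []
    current_time = 0.0
    for k in range(2, n + 1):
        current_time += rng.expovariate(k * nu)
        times.append(current_time)
    return times


if __name__ == "__main__":
    rng = random.Random(7)
    print(yule_speciation_times(n=5, nu=1.0, rng=rng))
```

The expected stopping time is the sum of 1/(k·rate) for k from 2 to n, so it grows only logarithmically in the number of species.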
Quartet Based Approach
• Input: gene trees $T_1, \ldots, T_N$
• Output: estimated extant species phylogeny
• Let $X = \{a, b, c, d\}$ be a four-tuple of extant species
• Three possible quartets:
  • $q_1 = ab|cd$
  • $q_2 = ac|bd$
  • $q_3 = ad|bc$
• Frequency of a quartet:
$$f_X(q_i) = \frac{|\{\, j : X \subseteq L_j,\ \mathcal{T}_j|_X = q_i \,\}|}{|\{\, j : X \subseteq L_j \,\}|}$$
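The quartet frequency on this slide can be computed directly once each gene tree topology is available. The sketch below makes several assumptions that are not in the slides: each gene tree is given as a pair (leaf set, list of splits), where a split is the set of leaves on one side of an internal edge, and the names `induced_quartet` and `quartet_frequencies` are hypothetical. A tree induces the quartet that pairs two of the four species exactly when some split separates that pair from the other two.

```python
def induced_quartet(leaves, splits, X):
    """Return 'q1', 'q2' or 'q3' for the quartet the tree induces on X.

    X = (a, b, c, d); q1 = ab|cd, q2 = ac|bd, q3 = ad|bc.
    Returns None if no split resolves the four species.
    """
    a, b, c, d = X
    candidates = {
        "q1": ({a, b}, {c, d}),
        "q2": ({a, c}, {b, d}),
        "q3": ({a, d}, {b, c}),
    }
    for name, (left, right) in candidates.items():
        for side in splits:
            other = leaves - side
            if (left <= side and right <= other) or (left <= other and right <= side):
                return name
    return None


def quartet_frequencies(gene_trees, X):
    """Fraction of gene trees containing all of X that induce each quartet.

    gene_trees is a list of (leaf_set, splits) pairs (assumed encoding).
    Returns None when no gene tree contains all four species.
    """
    counts = {"q1": 0, "q2": 0, "q3": 0}
    total = 0
    for leaves, splits in gene_trees:
        if not set(X) <= leaves:
            continue  # this gene is missing at least one of the four species
        total += 1
        quartet = induced_quartet(leaves, splits, X)
        if quartet is not None:
            counts[quartet] += 1
    if total == 0:
        return None
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    trees = [
        ({"a", "b", "c", "d", "e"}, [{"a", "b"}, {"c", "d"}]),
        ({"a", "b", "c", "d"}, [{"a", "c"}]),
        ({"a", "b", "c"}, []),
        ({"a", "b", "c", "d"}, [{"a", "b"}]),
    ]
    print(quartet_frequencies(trees, ("a", "b", "c", "d")))
```

In the usage example, the third tree lacks species d and is excluded from the denominator, so the frequencies are taken over three trees.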
Quartet Based Approach (content taken from Roch & Snir, 2013)
Bounded Rates Model (content taken from Roch & Snir, 2013)
Yule Process (content taken from Roch & Snir, 2013)
Preferential LGT (content taken from Roch & Snir, 2013)
Further Results
• Highways of LGT
  • The same model as before with additional "highways"
  • Highways are pairs of edges where LGT occurs deterministically
  • Highways can be different for different genes
  • Same result holds under the bounded rates model
    • Assuming no extinctions
    • Frequency of genes affected by highways is low
• Distance Based Approach under the GTR model
  • Compute the distance matrix by using the median of distances
  • Use any statistically consistent distance based method
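The median step of the distance based approach might look like the sketch below. It assumes, which the slide does not state explicitly, that the median is taken entry by entry across per-gene pairwise distances, and that a gene missing a pair of taxa is simply skipped for that pair. The name `median_distance_matrix` and the dictionary encoding keyed by unordered pairs are hypothetical.

```python
from statistics import median


def median_distance_matrix(taxa, gene_distances):
    """Entrywise median of pairwise distances across genes.

    gene_distances is a list of dicts mapping frozenset({a, b}) to the
    distance between taxa a and b estimated from one gene (assumed encoding).
    Pairs that no gene covers are left out of the result.
    """
    result = {}
    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            pair = frozenset((a, b))
            values = [d[pair] for d in gene_distances if pair in d]
            if values:
                result[pair] = median(values)
    return result


if __name__ == "__main__":
    genes = [
        {frozenset(("a", "b")): 1.0, frozenset(("a", "c")): 2.0},
        {frozenset(("a", "b")): 1.2, frozenset(("a", "c")): 2.0},
        {frozenset(("a", "b")): 9.0},  # an outlier, e.g. a gene affected by LGT
    ]
    print(median_distance_matrix(["a", "b", "c"], genes))
```

The median is insensitive to a minority of outlying genes, which is the intuition for using it here; the resulting matrix would then be passed to whichever distance based method is chosen.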
Questions?