A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition
Niema Moshiri and Siavash Mirarab
Presented by: Surbhi Jain
INTRODUCTION
Background
• Approx. 6 billion base pairs of DNA in each (diploid) human cell
• Only 3–10% actually codes for proteins
• 90–97% is intergenic regions
• Within the intervening regions are repeating elements
– SINEs (short interspersed nuclear elements)
• Alu is the most common SINE
– ~11% of the human genome
– >1 million copies
Chart labels: protein coding; intergenic regions; L1 – 17%; Alu – 11%
Adapted from presentation on Alu elements. PCR Workshop (2005).
Alu elements
• Alu elements probably arose from a gene that encodes the RNA component of the signal recognition particle, which labels proteins for export from the cell.
• Roughly 1 million copies, about 11% of the total genome
• The recognition site for the restriction enzyme Alu I (AG^CT) is found within the Alu region, hence the name.
• Approx. 300 bp in length
• Alu does not encode any functional molecules and depends on the machinery of the active class of repetitive elements in order to be copied and moved about the genome.
Retrotransposon: “Jumping Gene”
• Copy-and-paste model
• Transcribed into mRNA by RNA polymerase
• Converted to double-stranded DNA by reverse transcriptase
• Integrated into a different spot in the genome at the site of a single- or double-stranded break
Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.
Why study Alu elements? For biologists:
Impact on the genome
– insertion mutations
– recombination between elements
– gene conversion
– Implicated in human diseases:
   – Neurofibromatosis
   – Haemophilia
   – Familial hypercholesterolaemia
   – Breast cancer
   – Insulin-resistant diabetes type II
   – Ewing sarcoma
Impact on genome regulation
– distribution of methylation
– transcription of genes throughout the genome
– gene expression
– Transcription of Alu elements changes in response to cellular stress and might be involved in maintaining or regulating the cellular stress response
Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.
Why study Alu elements? For phylogeneticists:
• Alu elements are a primary source for the origin of simple sequence repeats in primate genomes
• Alu-insertion polymorphisms are a boon for the study of human population genetics and primate comparative genomics because they are neutral, identical-by-descent genetic markers with known ancestral states
• Phylogenetic analysis of Alu elements belonging to the Alu Ye5 subfamily has provided the strongest evidence yet that the chimp is humans' closest living relative
Batzer, M. and Deininger, P. (2002). Alu repeats and human genomic diversity. Nature Reviews Genetics.
Dual Birth Model
• $\mu_a$, $\mu_b$ = birth parameters; $r = \mu_a / \mu_b$
• Not exchangeable: the right child of any branch is always active, while the left one is inactive.
• Active entities propagate with rate $\mu_b$ (b for “birth”), and inactive entities activate and simultaneously propagate with rate $\mu_a$ (a for “activation”).
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
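The process described on this slide can be sketched as a small event-driven simulation. This is only an illustration, not the authors' simulator: the function name `simulate_dual_birth`, its parameters, and the choice to start from a single active lineage are assumptions made for the example. It tracks only the counts of active and inactive lineages and the event times, not the tree topology.

```python
import random

def simulate_dual_birth(rate_a, rate_b, n_leaves, seed=None):
    """Hypothetical sketch: simulate lineage counts under a two-rate birth process.

    rate_a: rate at which an inactive lineage activates and splits.
    rate_b: rate at which an active lineage splits.
    Every split yields one inactive (left) and one active (right) child.
    """
    rng = random.Random(seed)
    active, inactive = 1, 0  # assumption: start from one active lineage
    time = 0.0
    events = []
    while active + inactive < n_leaves:
        total_rate = active * rate_b + inactive * rate_a
        # Waiting time to the next event is exponential in the total rate.
        time += rng.expovariate(total_rate)
        if rng.random() < active * rate_b / total_rate:
            # An active lineage splits: it is replaced by one active child
            # and one inactive child, so only the inactive count grows.
            inactive += 1
            events.append((time, "birth"))
        else:
            # An inactive lineage activates and splits: it is replaced by
            # one inactive and one active child, so only the active count grows.
            active += 1
            events.append((time, "activation"))
    return active, inactive, events

if __name__ == "__main__":
    a, i, ev = simulate_dual_birth(rate_a=0.01, rate_b=1.0, n_leaves=50, seed=1)
    print(a, i, len(ev))
```

With a small ratio of activation rate to birth rate, most events in such a run are expected to be splits of already-active lineages, so the inactive count tends to dominate.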
METHODS
Simulations
• Fixed-n sampling procedure to generate 20 replicate “true” trees
• 6 experiments, each varying a single parameter
• Tree inference: FastTreeII & RAxML
• r estimation: cherry-based & length-based estimators
• Error measurement: normalized Robinson–Foulds (RF) distance & Matching Split (MS) metric
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
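For readers unfamiliar with the error measure named on this slide, here is a minimal sketch of a normalized Robinson–Foulds distance. It is an illustration, not the tooling used in the paper. It assumes trees are given as nested Python tuples with string leaf labels, that both trees are binary and share the same leaf set, and it normalizes by 2(n − 3), the maximum RF distance between two unrooted binary trees on n leaves. The function names are hypothetical.

```python
def leaf_set(tree):
    """Collect the leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Return the set of non-trivial splits induced by the tree's edges."""
    all_leaves = leaf_set(tree)
    splits = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        other = all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if len(below) > 1 and len(other) > 1:
            # Store the split as an unordered pair so orientation is ignored.
            splits.add(frozenset([below, other]))
        return below

    walk(tree)
    return splits

def normalized_rf(tree1, tree2):
    """Symmetric-difference size of the split sets, divided by 2(n - 3)."""
    n = len(leaf_set(tree1))
    if leaf_set(tree2) != leaf_set(tree1) or n < 4:
        raise ValueError("trees must share a leaf set of at least 4 leaves")
    s1, s2 = bipartitions(tree1), bipartitions(tree2)
    return len(s1 ^ s2) / (2 * (n - 3))

if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(normalized_rf(t1, t2))
```

In the usage example the two five-leaf trees share the split ABC|DE but disagree on the other one, so half of the possible splits differ.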
Human Alu dataset
• Dataset of 885,011 Alu repeats
– Human Alu profile hidden Markov models (profile HMMs) from the Dfam database
– nhmmer to scan the hg19 reference genome
• Alignment: PASTA
• Tree inference: FastTreeII and RAxML
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
RESULTS
Simulations Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology .
Human Alu dataset
• 7% of Alu repeats have propagated at least once
• $\mu_a$ (activation events per inactive element per year) = $1.426 \times 10^{-8}$
• $\mu_b$ (propagation events per active element per year) = $2.384 \times 10^{-6}$
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
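To put the two reported rates in perspective, the short calculation below derives the ratio r (activation rate over propagation rate, as defined on the Dual Birth Model slide) and the implied mean waiting times. The waiting-time reading assumes exponentially distributed waiting times, i.e. mean = 1 / rate; that interpretation and the variable names are assumptions of this sketch, not figures stated on the slide.

```python
# Rates as reported on the slide (events per element per year).
activation_rate = 1.426e-8   # per inactive element
propagation_rate = 2.384e-6  # per active element

# Ratio of activation rate to propagation rate.
r = activation_rate / propagation_rate

# Assumption: exponential waiting times, so the mean wait is 1 / rate.
mean_wait_activation_years = 1 / activation_rate
mean_wait_propagation_years = 1 / propagation_rate

print(f"r = {r:.5f}")                                                # r ≈ 0.00598
print(f"mean wait to activation:  {mean_wait_activation_years:.3g} years")   # ≈ 7.01e+07
print(f"mean wait to propagation: {mean_wait_propagation_years:.3g} years")  # ≈ 4.19e+05
```

Under these assumptions an inactive element waits on the order of tens of millions of years before activating, while an active element propagates roughly every few hundred thousand years.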
Limitations and Future Directions
• All elements are born into an inactive state, have an identical rate of activation at birth, and have an identical rate of birth
• Modeling deactivation would enable the estimation of the number of elements that are active at any specific point in time
• Allowing deaths in addition to births
• Discussion about the merit of the two estimated rate parameter values (activation and propagation)
Moshiri, N. and Mirarab, S. (2017). A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Systematic Biology.
Thanks for your attention. Questions?
Questions from the class
• How well do ML phylogeny models deal with retrotransposition? Is it the case that retrotransposition causes errors in typical models, or that the birth-death model is just significantly better in these cases?
• How much do birth-death models change the performance on canonical applications of phylogeny?
• What are some technical difficulties in incorporating the deactivation process in the model?
• Is there a possible explanation for why n did not influence the mean tree error much but has a large impact on the variance of the tree error? (Figure 3(a), lower center)
• Is there any way to assess the validity of their results? How accurate are their estimates of the Alu parameters?
• Are there any additions to their model that can capture more of the biological complexity of these sequences?
• Why is dominance of an Alu insertion governed by an element being under “selective pressure”?
• How would the MCMC approach be used to estimate r in this context?
• What is the process of setting a node “active” supposed to represent in biology?
• They assume a molecular clock while trying to find the activation and propagation events per year; is this OK to assume?
• Is it correct to say that translating the standard time-reversible models of evolution to this model would involve using two sets of substitution parameters, one for each child edge of an internal node?
• The paper states that Alu elements have no known biological function of their own, but that studying them can provide insights into their contributions to genetic disease. Have previous studies positively linked their presence in the genome to any specific diseases?
• Does the dual-birth model lose anything by not accounting for death rates, where branches can go extinct at a constant rate?
Supplementary slides
• Price et al. (2004) used whole-genome Alu data to estimate the total number of active elements to have been at least 143 throughout the history of Alu elements
• Wang et al. (2006) used human polymorphism data to estimate the number of currently active Alu elements to be at least 31
• Wacholder and Pollock (2016) introduced a novel Bayesian transposable element ancestral reconstruction method and used it to estimate a lower bound of 1386 Alu elements to have ever been active
• These studies look for strong evidence of transposition capability and do not rule out the possibility that other elements are able to propagate