
Taming the Beast Workshop: Bayesian inference of species tree



  1. Taming the Beast Workshop: Bayesian inference of species tree
     Outline: Species & gene trees; *BEAST; Species tree prior; Multispecies coalescent; Molecular clock model; Felsenstein likelihood; Posterior distribution; starBEAST2; References
     Chi Zhang, June 28, 2016
     1 / 19

  2. Species tree
     ◮ Species tree: the phylogeny representing the relationships among a group of species
     ◮ Gene tree: the phylogeny for sequences at a particular gene locus from those species
     Figure adapted from [Rogers and Gibbs, 2014]

  3. Gene tree discordance
     ◮ Incomplete lineage sorting
     Figure adapted from [Patterson et al., 2006]
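
Consider three species with species tree ((A,B),C) and constant population sizes. Under the multispecies coalescent, incomplete lineage sorting then has a simple closed form. A gene tree sampled with one copy per species matches the species tree with probability $1 - \tfrac{2}{3}e^{-t}$, where t is the internal branch length in coalescent units. The Python sketch below evaluates that formula and checks it by simulation. Its assumptions: time is scaled so each pair of lineages coalesces at rate 1, and the function names are hypothetical.

```python
import math
import random


def prob_concordant(t):
    """P(gene tree == species tree ((A,B),C)) for internal branch length t
    in coalescent units (each pair of lineages coalesces at rate 1)."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_concordance(t, n_reps=100_000, seed=1):
    """Monte Carlo estimate of the same probability under the coalescent."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n_reps):
        # Lineages from A and B may coalesce in the internal branch (rate 1).
        if rng.expovariate(1.0) < t:
            concordant += 1
        # Otherwise three lineages enter the ancestral population, where each
        # of the three pairs is equally likely to coalesce first.
        elif rng.random() < 1.0 / 3.0:
            concordant += 1
    return concordant / n_reps


for t in (0.1, 1.0, 3.0):
    print(t, round(prob_concordant(t), 4), simulate_concordance(t))
```

Short internal branches (small t) push the concordance probability toward 1/3. This is why ILS is most severe for rapid successive speciation events.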

  4. Gene tree discordance
     ◮ Horizontal gene transfer
     ◮ Gene duplication and loss
     Figure adapted from [Degnan and Rosenberg, 2009]

  5. Gene tree discordance
     ◮ Hybridization
     Figure adapted from [Li et al., 2016]

  6. Species tree inference and *BEAST
     ◮ A Bayesian method to infer the species tree from multilocus sequence data [Heled and Drummond, 2010]
     ◮ *BEAST, a functionality of BEAST2
     ◮ Gene trees are embedded in the species tree under the multispecies coalescent model [Rannala and Yang, 2003]
        ◮ incomplete lineage sorting
     ◮ Gene trees are independent among loci
     [Figure: species tree of Human, Gorilla, and Chimp]

  7. Species tree prior
     ◮ The prior for species tree S has two parts: $P(S) = P(S_T)\,P(N)$
        ◮ $S_T$: species time tree
        ◮ $N$: population size functions
     ◮ $P(S_T)$: typically a Yule (pure-birth) or birth-death prior
        ◮ we can assign a hyperprior for the speciation (birth) rate (and for the extinction (death) rate, if birth-death)
     ◮ $P(N)$: constant or continuous-linear
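
This sketch shows how the Yule prior $P(S_T)$ depends on the speciation rate λ, the quantity a birth-rate hyperprior acts on. Take a pure-birth process started from two lineages at the root. The density of its speciation times is proportional to $\lambda^{s-2} e^{-\lambda L}$, where s is the number of species and L is the total branch length of the species time tree. The Python code assumes tips at height 0 and uses hypothetical function names. It drops every λ-independent constant, including the one that depends on how the root age is conditioned on.

```python
import math


def tree_length(node_heights, n_species):
    """Total branch length of a species time tree from its n_species - 1
    internal node heights (time before present; all tips at height 0)."""
    heights = sorted(node_heights, reverse=True) + [0.0]
    if len(heights) != n_species:
        raise ValueError("expected n_species - 1 internal node heights")
    length = 0.0
    for i in range(n_species - 1):
        # Between the (i+1)-th and (i+2)-th oldest events there are i + 2 lineages.
        length += (i + 2) * (heights[i] - heights[i + 1])
    return length


def yule_log_density(birth_rate, node_heights, n_species):
    """log P(S_T | lambda) up to an additive constant independent of lambda,
    for a pure-birth process started from two lineages at the root."""
    return (n_species - 2) * math.log(birth_rate) - birth_rate * tree_length(node_heights, n_species)


def yule_mle_birth_rate(node_heights, n_species):
    """Birth rate maximizing the expression above (needs n_species >= 3)."""
    return (n_species - 2) / tree_length(node_heights, n_species)


print(yule_log_density(1.0, [2.0, 1.0], 3))  # tree length 2*1 + 3*1 = 5
print(yule_mle_birth_rate([2.0, 1.0], 3))
```

In a full analysis, the log of the hyperprior density on λ is added to this term and λ is sampled together with the tree.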

  8. Species tree prior
     ◮ Constant population sizes
     Figure adapted from [Drummond and Bouckaert, 2015]

  9. Species tree prior
     ◮ Continuous-linear population sizes
     Figure adapted from [Drummond and Bouckaert, 2015]

  10. Species tree prior
     ◮ In *BEAST, the prior type for N is fixed to gamma
     ◮ The gamma shape parameter k is fixed to 2, but we can assign a hyperprior for ψ, the scale parameter of the gamma
     ◮ (This ψ parameter is called "population mean" in BEAUti, but the prior mean is actually 2ψ when the population sizes are constant)
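
The factor of 2 in the last bullet is simply the mean of a gamma distribution: shape × scale = 2ψ. Below is a quick Monte Carlo check in Python. The value of ψ is arbitrary and for illustration only, and the function name is hypothetical.

```python
import random
import statistics

SHAPE = 2.0  # *BEAST fixes the gamma shape parameter k to 2


def sample_population_sizes(psi, n_branches, seed=None):
    """Draw constant population sizes N_j ~ gamma(shape=2, scale=psi)."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) treats beta as the scale, so its mean is alpha * beta.
    return [rng.gammavariate(SHAPE, psi) for _ in range(n_branches)]


psi = 0.01  # illustrative value of the parameter BEAUti labels "population mean"
sizes = sample_population_sizes(psi, n_branches=100_000, seed=42)
print(statistics.fmean(sizes), SHAPE * psi)  # the sample mean sits near 2 * psi, not psi
```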

  11. Multispecies coalescent model
     ◮ The prior for gene tree g, given species tree S
     Figure adapted from [Drummond and Bouckaert, 2015]

  12. Multispecies coalescent model
     ◮ The probability distribution of gene time tree g, given species tree S, is:
       $$P(g \mid S) = \prod_{j=1}^{2s-1} P\big(L_j(g) \mid N_j(t)\big)$$
        ◮ $s$: number of extant species ($2s - 1$ branches in total)
        ◮ $N_j(t)$: population size function (linear)
        ◮ $L_j(g)$: coalescent intervals for genealogy g that are contained in the j-th branch of species tree S
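
Each factor $P(L_j(g) \mid N_j(t))$ is a product over the coalescent intervals inside branch j. While k lineages are present, no coalescence occurs with probability $\exp\{-\binom{k}{2}\int dt / N_j(t)\}$. Each coalescence at time t contributes a density of $1/N_j(t)$. The Python sketch below implements this for a linear population size function under three assumptions. Time is measured in units where a pair of lineages coalesces at rate $1/N_j(t)$, with ploidy scaling omitted. The root branch, passed with `length=math.inf`, has a constant size. All names are hypothetical and do not come from *BEAST's code.

```python
import math


def _pop_size(t, n_bottom, n_top, length):
    """Linear population size at time t above the bottom of the branch."""
    if math.isinf(length):
        return n_bottom  # root branch: constant size
    return n_bottom + (n_top - n_bottom) * t / length


def _inverse_size_integral(a, b, n_bottom, n_top, length):
    """Integral of 1 / N(t) dt from a to b for the linear N(t)."""
    if math.isclose(n_top, n_bottom):
        return (b - a) / n_bottom
    slope = (n_top - n_bottom) / length
    return math.log(_pop_size(b, n_bottom, n_top, length) / _pop_size(a, n_bottom, n_top, length)) / slope


def branch_log_prob(n_lineages, coalescence_times, length, n_bottom, n_top=None):
    """log P(L_j(g) | N_j(t)) for one species-tree branch j.

    n_lineages: gene lineages entering the branch at its bottom.
    coalescence_times: times above the bottom of the branch at which
        lineages coalesce inside this branch.
    n_bottom, n_top: population size at the bottom and top of the branch
        (n_top=None means constant size).
    """
    if n_top is None:
        n_top = n_bottom
    if math.isinf(length) and not math.isclose(n_top, n_bottom):
        raise ValueError("the root branch must have a constant size in this sketch")
    log_p, k, previous = 0.0, n_lineages, 0.0
    for t in sorted(coalescence_times):
        if not 0.0 <= t <= length or k < 2:
            raise ValueError("coalescence outside the branch or too few lineages")
        # No coalescence among the k lineages between `previous` and t ...
        log_p -= k * (k - 1) / 2 * _inverse_size_integral(previous, t, n_bottom, n_top, length)
        # ... then one particular pair coalesces at t with rate 1 / N(t).
        log_p -= math.log(_pop_size(t, n_bottom, n_top, length))
        k, previous = k - 1, t
    pairs = k * (k - 1) / 2
    if pairs > 0:
        # Remaining lineages reach the top of the branch without coalescing
        # (on the infinite root branch this makes the probability 0).
        log_p -= pairs * _inverse_size_integral(previous, length, n_bottom, n_top, length)
    return log_p


def gene_tree_log_prior(branches):
    """log P(g | S): sum of the branch terms over all 2s - 1 species-tree branches."""
    return sum(branch_log_prob(**branch) for branch in branches)
```

For example, `branch_log_prob(2, [0.5], 1.0, 1.0)` evaluates to −0.5: two lineages survive half a unit of time at N = 1 and then coalesce.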

  13. Molecular clock model
     ◮ $P(c)$: prior for the molecular clock model of genealogy g
        ◮ strict clock: typically fix the rate to 1.0 for the first locus, and infer the relative clock rates for the remaining loci
        ◮ relaxed clock
     ◮ $P(\theta)$: prior for the substitution model parameters
        ◮ e.g. HKY85:
           ◮ prior for the transition/transversion rate ratio ($\kappa$), e.g. gamma(2,1)
           ◮ prior for the base frequencies ($\pi_T, \pi_C, \pi_A, \pi_G$), e.g. Dirichlet(1,1,1,1)
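
Here is what the HKY85 parameters do. κ scales the transition rates (A↔G, C↔T) relative to transversions, and π sets the equilibrium base frequencies. The Python sketch below builds the HKY85 rate matrix normalized to one expected substitution per unit time. It computes transition probabilities with a plain matrix exponential and evaluates the example priors, gamma(2,1) for κ and Dirichlet(1,1,1,1) for π. The base order A, C, G, T and all function names are assumptions of this sketch.

```python
import math

BASES = "ACGT"  # order assumed here (the slide lists pi_T, pi_C, pi_A, pi_G)
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(kappa, pi):
    """HKY85 rate matrix, scaled to one expected substitution per unit time."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = pi[j] * (kappa if (a, b) in TRANSITIONS else 1.0)
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    mean_rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]


def _mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def _mat_exp(a, squarings=10, terms=20):
    """exp(a) by scaling and squaring with a truncated Taylor series."""
    x = [[v / 2.0 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in _mat_mul(term, x)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = _mat_mul(result, result)
    return result


def hky85_transition_probs(kappa, pi, distance):
    """P(distance) = exp(Q * distance), where distance = clock rate * time."""
    q = hky85_rate_matrix(kappa, pi)
    return _mat_exp([[v * distance for v in row] for row in q])


def log_prior(kappa, pi):
    """log gamma(kappa | 2, 1) + log Dirichlet(pi | 1, 1, 1, 1)."""
    if kappa <= 0 or any(p <= 0 for p in pi) or not math.isclose(sum(pi), 1.0):
        return -math.inf
    # gamma(2, 1) density: kappa * exp(-kappa); Dirichlet(1,1,1,1) density: Gamma(4) = 6.
    return math.log(kappa) - kappa + math.log(6.0)


p = hky85_transition_probs(2.0, [0.1, 0.2, 0.3, 0.4], 0.5)
print([round(sum(row), 12) for row in p])  # each row of P sums to 1
```

With κ = 1 and equal base frequencies this reduces to the JC69 model, which gives a convenient sanity check.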

  14. Felsenstein likelihood
     ◮ The probability (likelihood) of data $d_i$ (the alignment at locus i), given the gene time tree $g_i$, molecular clock $c_i$, and substitution model $\theta_i$, is: $P(d_i \mid g_i, c_i, \theta_i)$
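
The likelihood $P(d_i \mid g_i, c_i, \theta_i)$ is computed with Felsenstein's pruning algorithm. The Python sketch below makes several simplifications. It swaps in the simpler JC69 substitution model to keep the example short. It uses a strict clock, so branch length in substitutions = rate × time. The tree encoding, function names, and alignment are made up for illustration, and ambiguity codes and gaps are not handled.

```python
import math

BASES = "ACGT"


def jc69_prob(same, distance):
    """JC69 transition probability over `distance` expected substitutions per site."""
    e = math.exp(-4.0 * distance / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, site, clock_rate):
    """P(data below node | base at node), one entry per base (pruning step).

    A node is either a taxon name (str) or a list of (child, branch_time) pairs;
    site maps each taxon name to its base in the current alignment column.
    """
    if isinstance(node, str):
        return [1.0 if base == site[node] else 0.0 for base in BASES]
    partial = [1.0] * 4
    for child, branch_time in node:
        child_partial = partial_likelihoods(child, site, clock_rate)
        distance = clock_rate * branch_time  # strict clock
        for i in range(4):
            partial[i] *= sum(jc69_prob(i == j, distance) * child_partial[j] for j in range(4))
    return partial


def log_likelihood(gene_tree, alignment, clock_rate):
    """log P(d_i | g_i, c_i, theta_i) with theta_i = JC69 (equal base frequencies)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        site = {taxon: seq[s] for taxon, seq in alignment.items()}
        root = partial_likelihoods(gene_tree, site, clock_rate)
        total += math.log(sum(0.25 * p for p in root))  # average over root base
    return total


# Hypothetical ultrametric gene tree ((human, chimp), gorilla) and toy alignment.
tree = [([("human", 0.1), ("chimp", 0.1)], 0.2), ("gorilla", 0.3)]
alignment = {"human": "ACGT", "chimp": "ACGA", "gorilla": "ACTT"}
print(log_likelihood(tree, alignment, clock_rate=1.0))
```

Because sites are treated as independent, the per-site log likelihoods add. Loci are independent given their gene trees, so the per-locus terms multiply in the posterior.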

  15. Priors and likelihood
     ◮ $P(S)$: prior for the species tree
     ◮ $P(g_i \mid S)$: prior for gene tree i (multispecies coalescent)
     ◮ $P(c_i)$: prior for the clock rate of locus i
     ◮ $P(\theta_i)$: prior for the substitution parameters of locus i
     ◮ $P(d_i \mid g_i, c_i, \theta_i)$: likelihood of the data at locus i

  16. Posterior
     ◮ The posterior distribution of the species tree S and other parameters, given data D, is:
       $$P(S, g, c, \Theta \mid D) \propto P(S) \prod_{i=1}^{n} P(g_i \mid S)\, P(c_i)\, P(\theta_i)\, P(d_i \mid g_i, c_i, \theta_i)$$
     ◮ The data $D = \{d_1, d_2, \ldots, d_n\}$ is composed of n alignments, one per locus.
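
In log space the posterior above becomes a sum. Because loci are conditionally independent given S, each locus contributes its own block of terms. Here is a minimal Python sketch in which every component density is a hypothetical callable supplied by the caller, and the locus dictionary keys are likewise assumptions.

```python
from typing import Callable, Sequence


def log_posterior(
    species_tree,
    loci: Sequence[dict],
    log_species_tree_prior: Callable,
    log_gene_tree_prior: Callable,
    log_clock_prior: Callable,
    log_subst_prior: Callable,
    log_likelihood: Callable,
) -> float:
    """Unnormalized log P(S, g, c, Theta | D).

    Each locus dict holds 'gene_tree', 'clock', 'subst' and 'alignment'.
    """
    total = log_species_tree_prior(species_tree)  # log P(S)
    for locus in loci:
        total += log_gene_tree_prior(locus["gene_tree"], species_tree)  # log P(g_i | S)
        total += log_clock_prior(locus["clock"])  # log P(c_i)
        total += log_subst_prior(locus["subst"])  # log P(theta_i)
        total += log_likelihood(locus["alignment"], locus["gene_tree"],
                                locus["clock"], locus["subst"])  # log P(d_i | g_i, c_i, theta_i)
    return total
```

An MCMC sampler only needs differences of this quantity between the current and proposed states, so the normalizing constant P(D) never has to be computed.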

  17. Integrating out population sizes
     ◮ Assume constant population sizes
     ◮ Assign an i.i.d. inverse-gamma(α, β) prior for each $N_j$
        ◮ mean = β/(α − 1) (for α > 1)
     ◮ The population sizes N can be integrated out from $P(g \mid S)$ [Jones, 2015]
     ◮ Specify α and β in the inverse-gamma prior (instead of ψ in the gamma prior)
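
Here is why the integration works. With a constant size $N_j$, the coalescent intervals in branch j (pooled over loci) contribute $N_j^{-m_j} \exp(-\gamma_j / N_j)$. Here $m_j$ is the number of coalescences in the branch and $\gamma_j = \sum \binom{k}{2}\,\Delta t$ over its intervals. This is conjugate to the inverse-gamma prior, so $\int N^{-m} e^{-\gamma/N}\,\mathrm{InvGamma}(N \mid \alpha,\beta)\,dN = \frac{\beta^{\alpha}\,\Gamma(\alpha+m)}{\Gamma(\alpha)\,(\beta+\gamma)^{\alpha+m}}$. The Python sketch below assumes a pair coalescence rate of 1/N with ploidy omitted, and its function names are hypothetical. It is not necessarily how starBEAST2 implements [Jones, 2015].

```python
import math


def branch_sufficient_stats(intervals):
    """(m, gamma) for one species-tree branch, pooled over loci.

    intervals: (k, duration, ends_in_coalescence) for every coalescent interval
    of every gene tree that falls inside the branch.
    """
    m = sum(1 for _, _, coalesced in intervals if coalesced)
    gamma = sum(k * (k - 1) / 2 * duration for k, duration, _ in intervals)
    return m, gamma


def log_marginal_branch(m, gamma, alpha, beta):
    """log of the integral over N of N^-m exp(-gamma/N) InvGamma(N | alpha, beta)."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + math.lgamma(alpha + m) - (alpha + m) * math.log(beta + gamma))


def inverse_gamma_mean(alpha, beta):
    """Prior mean beta / (alpha - 1), finite only for alpha > 1."""
    return beta / (alpha - 1) if alpha > 1 else math.inf


m, gamma = branch_sufficient_stats([(3, 0.2, True), (2, 0.5, True)])
print(m, gamma, log_marginal_branch(m, gamma, alpha=3.0, beta=2.0))
```

Summing `log_marginal_branch` over the 2s − 1 branches gives log P(g | S) with the population sizes integrated out. The sampler then no longer has to propose values for each $N_j$.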

  18. starBEAST2
     ◮ A more efficient implementation and an upgrade of *BEAST
     ◮ Population sizes integrated out [Jones, 2015]
     ◮ Relaxed molecular clock per species tree branch (instead of per gene tree branch)
     ◮ More efficient MCMC proposals for the species tree and gene trees (coordinated operators) [Jones, 2015; Rannala and Yang, 2015]
     ◮ Available at github.com/genomescale/starbeast2; will be released soon (as a BEAST2 add-on)
