Nature Precedings : doi:10.1038/npre.2011.5953.1 : Posted 13 May 2011
ABC for Temporally Sampled Genetic Data
Mark A. Beaumont, Schools of Biological Sciences and Mathematics, The University of Bristol, Bristol, UK
05 April 2011

Temporal Change in Gene Frequency with Admixture
Aim is to infer parameters in a state space model of changes in gene frequency in the presence of admixture.
[Figure: graph of the state space model, with nodes $\alpha_i$, $F_i$, $t_i$, $X_i$ for $i = 0, \ldots, 6$ and $\mu_i$, $\beta_i$, $N_i$ for $i = 1, \ldots, 6$.]

Temporal Change in Gene Frequency with Admixture
Temporally sampled genetic data are commonly obtained.
Changes are usually attributed to genetic drift (a function of the population size).
However, admixture and replacement of populations over time may be confounded with drift.
This is a major issue for ancient DNA samples.

Importance Sampling, Particles, and MCMC
Beaumont (Genetics, 2003): GIMH algorithm, using noisy estimates of the likelihood obtained from sequential importance sampling in MCMC.
Becquet and Przeworski (Genome Research, 2007): application of the GIMH idea to the MCMC-ABC algorithm of Marjoram et al. (PNAS, 2003).
Andrieu and Roberts (Annals of Statistics, 2009): pseudo-marginal method, with convergence proofs and a generalization of GIMH.
Andrieu, Doucet, and Holenstein (RSSB, 2010): particle MCMC.
Peters and Cornebise (RSSB, discussion of A, D, & H, 2010): ABC and particle MCMC.

Framework for Temporal Model with Admixture (1)
S temporal samples are taken.
$t_i$: time of the $i$th sample ($i = 0, \ldots, S$).
$\Delta t_j$: difference between the times of the $j$th and $(j-1)$th samples ($j = 1, \ldots, S$).
$N_j$: effective population size for the $j$th interval.
$\mu_j$: admixture proportion for the $j$th interval.
$F_i$: $F_{ST}$ of the $i$th admixing population.

Framework for Temporal Model with Admixture (2)
Use a Dirichlet rather than the coalescent to model variance in allele frequencies:
Laval et al. (Genetics, 2003)
Kitakado et al. (Genetics, 2006)
This does not give the same allele frequency distribution as the coalescent, but for a given $F_{ST}$ the variance is the same (see discussant contributions to Nicholson et al. (RSSB, 2002)).
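
The variance statement can be checked from the standard moments of the Dirichlet distribution. As a sketch, assume frequencies $\alpha$ drawn around a mean vector $p$ with total concentration $\theta$ (the symbols $p$ and $\theta$ are introduced here for illustration):

```latex
\alpha \sim D(\theta p_1, \ldots, \theta p_K)
\quad\Longrightarrow\quad
\mathrm{E}[\alpha_k] = p_k,
\qquad
\mathrm{Var}(\alpha_k) = \frac{p_k (1 - p_k)}{\theta + 1}.
```

Writing $F_{ST} = 1/(\theta + 1)$ gives $\mathrm{Var}(\alpha_k) = F_{ST}\, p_k (1 - p_k)$, so the Dirichlet is matched to a given $F_{ST}$ in its first two moments only, not in the full shape of the distribution.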

Framework for Temporal Model with Admixture (3)
For the frequency vector $\alpha_i$ of $K$ alleles, sampled at time $t_i$, we model the change in frequency due to drift over the interval $\Delta t_i$ with effective size $N_i$ as
$$\alpha_i \sim D\left(\phi_i \alpha'_{(i-1),1}, \ldots, \phi_i \alpha'_{(i-1),K}\right)$$
where
$$\phi_i = \frac{\exp(-\Delta t_i / N_i)}{1 - \exp(-\Delta t_i / N_i)}.$$
The observed frequencies, $X_i$, are assumed to be multinomial samples from $\alpha_i$.
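
The drift and sampling steps on this slide can be simulated in a few lines. The following is a minimal sketch for one locus, using only the Python standard library; the function names, the example frequencies, and the sample size are illustrative assumptions, not part of the talk.

```python
import math
import random


def dirichlet(concentrations, rng):
    """Draw one Dirichlet vector by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def drift_concentration(delta_t, n_eff):
    """phi = exp(-dt/N) / (1 - exp(-dt/N)), the Dirichlet scale for drift."""
    decay = math.exp(-delta_t / n_eff)
    return decay / (1.0 - decay)


def drift_step(alpha_prev, delta_t, n_eff, rng):
    """Sample alpha_i ~ D(phi * alpha'_{i-1}) for one interval."""
    phi = drift_concentration(delta_t, n_eff)
    return dirichlet([phi * a for a in alpha_prev], rng)


def multinomial_counts(alpha, sample_size, rng):
    """Draw observed allele counts X_i as a multinomial sample from alpha."""
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=alpha, k=sample_size):
        counts[k] += 1
    return counts


rng = random.Random(1)
alpha_prev = [0.5, 0.3, 0.2]  # hypothetical post-admixture frequencies
alpha_now = drift_step(alpha_prev, delta_t=1.0, n_eff=20.0, rng=rng)
x_now = multinomial_counts(alpha_now, sample_size=50, rng=rng)
print(alpha_now, x_now)
```

A small $\Delta t_i / N_i$ gives a large $\phi_i$, so the new frequencies stay close to the previous ones; a large ratio gives a small $\phi_i$ and strong drift.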

Framework for Temporal Model with Admixture (4)
Admixture is modelled as
$$\alpha'_{i-1} = (1 - \mu_i)\,\alpha_{i-1} + \mu_i \beta_i.$$
The admixing frequencies $\beta_j$ ($j = 1, \ldots, S$), and the initial $\alpha_0$, are drawn from Dirichlet distributions, parameterized by $F_i$ ($i = 0, \ldots, S$) and the metapopulation frequency $M$. E.g.:
$$\beta_1 \sim D(\theta_1 M_1, \ldots, \theta_1 M_K) \quad \text{with} \quad \theta_1 = \frac{1}{F_1} - 1$$
(Sewall Wright's infinite island model).
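
A short sketch of the admixture step, assuming the Dirichlet scale $\theta = 1/F - 1$ (the value for which the Dirichlet variance is $F M_k (1 - M_k)$). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import random


def dirichlet(concentrations, rng):
    """Draw one Dirichlet vector by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def theta_from_fst(fst):
    """Dirichlet scale theta = 1/F - 1 for an F_ST strictly between 0 and 1."""
    if not 0.0 < fst < 1.0:
        raise ValueError("F_ST must lie strictly between 0 and 1")
    return 1.0 / fst - 1.0


def draw_admixing_frequencies(fst, metapop, rng):
    """Sample beta ~ D(theta * M) around the metapopulation frequencies M."""
    theta = theta_from_fst(fst)
    return dirichlet([theta * m for m in metapop], rng)


def admix(alpha_prev, beta, mu):
    """alpha' = (1 - mu) * alpha_prev + mu * beta, allele by allele."""
    return [(1.0 - mu) * a + mu * b for a, b in zip(alpha_prev, beta)]


rng = random.Random(2)
metapop = [0.4, 0.4, 0.2]  # hypothetical metapopulation frequencies M
beta_1 = draw_admixing_frequencies(fst=0.1, metapop=metapop, rng=rng)
alpha_mixed = admix([0.7, 0.2, 0.1], beta_1, mu=0.3)
print(beta_1, alpha_mixed)
```

Because both inputs to `admix` sum to one, the mixed vector is again a valid frequency vector and can be passed directly to the drift step.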

[Figure: graph of the state space model, with nodes $\alpha_i$, $F_i$, $t_i$, $X_i$ for $i = 0, \ldots, 6$ and $\mu_i$, $\beta_i$, $N_i$ for $i = 1, \ldots, 6$.]

MCMC implementation of TMA
Aim is to infer parameters in this model in a Bayesian framework.
The likelihood is:
$$P(X_0 \mid \alpha_0)\, P(\alpha_0 \mid F_0, M) \times \prod_{i=1}^{S} \left\{ P(X_i \mid \alpha_i)\, P(\alpha_i \mid \alpha_{i-1}, N_i, \Delta t_i, \mu_i, \beta_i)\, P(\beta_i \mid F_i, M) \right\}$$
The $t_i$ are known.
Assume a hierarchical prior on $N_i$ (Gaussian on log-scale).
Assume beta priors on $\mu_i$ and $F_i$.
Assume a Dirichlet prior on $M$.
Update parameters using Metropolis-Hastings.
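
The factorisation above can be evaluated term by term on the log scale. The sketch below does this for a single locus, assuming the Dirichlet scales $\phi_i = e^{-\Delta t_i/N_i}/(1 - e^{-\Delta t_i/N_i})$ for drift and $\theta_i = 1/F_i - 1$ for the admixing populations. The argument layout (lists indexed $0, \ldots, S$, with an unused placeholder at index 0 for quantities defined only from $i = 1$) is an assumption made for this example.

```python
import math


def log_multinomial(counts, probs):
    """log P(X | alpha) for allele counts X and frequencies alpha."""
    out = math.lgamma(sum(counts) + 1)
    for x, p in zip(counts, probs):
        if x:
            out += x * math.log(p)
        out -= math.lgamma(x + 1)
    return out


def log_dirichlet(freqs, concentrations):
    """log density of a Dirichlet with the given concentrations at freqs."""
    out = math.lgamma(sum(concentrations))
    for f, a in zip(freqs, concentrations):
        out += (a - 1.0) * math.log(f) - math.lgamma(a)
    return out


def log_likelihood(counts, alpha, beta, n_eff, delta_t, mu, fst, metapop):
    """Sum of the log terms in the likelihood, for one locus.

    counts, alpha, fst are indexed 0..S; beta, n_eff, delta_t, mu are
    indexed 1..S, with an unused placeholder at index 0.
    """
    theta0 = 1.0 / fst[0] - 1.0
    total = log_multinomial(counts[0], alpha[0])
    total += log_dirichlet(alpha[0], [theta0 * m for m in metapop])
    for i in range(1, len(counts)):
        theta = 1.0 / fst[i] - 1.0
        decay = math.exp(-delta_t[i] / n_eff[i])
        phi = decay / (1.0 - decay)
        mixed = [(1.0 - mu[i]) * a + mu[i] * b
                 for a, b in zip(alpha[i - 1], beta[i])]
        total += log_multinomial(counts[i], alpha[i])
        total += log_dirichlet(alpha[i], [phi * m for m in mixed])
        total += log_dirichlet(beta[i], [theta * m for m in metapop])
    return total


# Hypothetical two-sample, two-allele example.
value = log_likelihood(
    counts=[[5, 5], [7, 3]],
    alpha=[[0.5, 0.5], [0.6, 0.4]],
    beta=[None, [0.3, 0.7]],
    n_eff=[None, 20.0],
    delta_t=[None, 1.0],
    mu=[None, 0.2],
    fst=[0.1, 0.1],
    metapop=[0.5, 0.5],
)
print(value)
```

In a Metropolis-Hastings update only the terms that involve the proposed parameter change, so a practical implementation would recompute those terms rather than the whole sum.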

Application to Bryozoan data
1. Data from a freshwater Bryozoan, Cristatella mucedo, studied by Beth Okamura (NHM, London) and Sophia Ahmed (Roscoff, France).
2. 8 highly polymorphic microsatellite loci genotyped by Sophia Ahmed.
3. Sampled over 7 time periods.
4. Gene frequencies change markedly; unlikely to be due to drift.
5. Aim is to estimate effective population sizes, admixture proportions, and $F_{ST}$ of putative admixing populations.

Bryozoan data: results with MCMC algorithm
[Figure: density curves for N, legend 1 to 6. x-axis: N (0 to 50); y-axis: density (0.00 to 0.08).]

Bryozoan data: results with MCMC algorithm
[Figure: density curves for mu, legend 1 to 6. x-axis: mu (0.0 to 1.0); y-axis: density (0 to 6).]

Bryozoan data: results with MCMC algorithm
[Figure: density curves for FST, legend 0 to 6. x-axis: FST (0.0 to 1.0); y-axis: density (0 to 150).]

Convergence of MCMC
Comparison of runs with likelihood held constant, to check for recovery of priors.
Data sampled at 4 time points, 2 loci, 5 alleles each.
Histogram: $\alpha_i$ held constant
Red line: $\alpha_i$ updated
Black line: prior N(4,1)
[Figure: density of mean log N. x-axis: Mean log N (2 to 8); y-axis: Density (0.0 to 0.4).]
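
The idea behind this check is that a sampler run with a constant likelihood should return draws from the prior. The toy sketch below illustrates it for a single scalar (a mean log N with an N(4,1) prior) using random-walk Metropolis. It is far simpler than the full model in the talk, and the step size and chain length are arbitrary assumptions.

```python
import math
import random


def log_prior(log_n, mean=4.0, sd=1.0):
    """Unnormalised log density of the N(mean, sd^2) prior."""
    z = (log_n - mean) / sd
    return -0.5 * z * z


def metropolis_prior_only(n_steps, step_sd, rng):
    """Random-walk Metropolis with the likelihood held constant.

    A constant likelihood cancels in the acceptance ratio, so the chain
    should target the prior itself.
    """
    current = 4.0
    current_lp = log_prior(current)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_prior(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


chain = metropolis_prior_only(n_steps=20000, step_sd=2.0, rng=random.Random(0))
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print(mean, var)  # should be close to the prior mean 4 and variance 1
```

If the sample mean or variance drifted away from the prior values, that would point to an error in the proposal or acceptance step rather than to anything in the data.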

Particle MCMC Implementation of TMA
Aim is to avoid MCMC updates for $\alpha_1, \ldots, \alpha_S$, but use MCMC for all other parameters (including $\alpha_0$).
At each MCMC step, use importance sampling of the $\alpha_i$ to compute a noisy likelihood estimate, conditioning on all parameter values at that stage in the MCMC.
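
One way to obtain such a noisy likelihood estimate is a bootstrap particle filter: propose each $\alpha_i$ from the drift-plus-admixture transition, weight by the multinomial probability of $X_i$, and resample. The slide does not specify the proposal, so this choice, the multinomial resampling, the function names, and the argument layout (index 0 unused for per-interval quantities) are all assumptions of this sketch. It covers one locus and returns the log of an estimate of $P(X_1, \ldots, X_S \mid \alpha_0, \ldots)$; the term $P(X_0 \mid \alpha_0)$ is left to the MCMC part.

```python
import math
import random


def dirichlet(concentrations, rng):
    """Dirichlet draw via gamma variates, floored so every entry stays > 0."""
    draws = [max(rng.gammavariate(a, 1.0), 1e-300) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def log_multinomial(counts, probs):
    """log P(X | alpha) for allele counts X and frequencies alpha."""
    out = math.lgamma(sum(counts) + 1)
    for x, p in zip(counts, probs):
        if x:
            out += x * math.log(p)
        out -= math.lgamma(x + 1)
    return out


def estimate_log_likelihood(counts, alpha0, beta, n_eff, delta_t, mu,
                            n_particles, rng):
    """Log of a particle estimate of P(X_1..X_S | alpha_0, parameters)."""
    particles = [list(alpha0)] * n_particles
    log_lik = 0.0
    for i in range(1, len(counts)):
        decay = math.exp(-delta_t[i] / n_eff[i])
        phi = decay / (1.0 - decay)
        proposed, log_w = [], []
        for prev in particles:
            mixed = [(1.0 - mu[i]) * a + mu[i] * b
                     for a, b in zip(prev, beta[i])]
            alpha = dirichlet([phi * m for m in mixed], rng)
            proposed.append(alpha)
            log_w.append(log_multinomial(counts[i], alpha))
        # Average the weights on the log scale, shifting by the maximum.
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        log_lik += top + math.log(sum(weights) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return log_lik


# Hypothetical example: three alleles, two intervals, 500 particles.
example = estimate_log_likelihood(
    counts=[None, [12, 5, 3], [9, 8, 3]],
    alpha0=[0.5, 0.3, 0.2],
    beta=[None, [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    n_eff=[None, 20.0, 30.0],
    delta_t=[None, 1.0, 1.0],
    mu=[None, 0.1, 0.2],
    n_particles=500,
    rng=random.Random(3),
)
print(example)
```

In a pseudo-marginal scheme the estimate for the current state is stored and reused, and only the proposed state gets a fresh estimate; the estimate is noisy but, on the likelihood (not log) scale, unbiased for this kind of filter.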