Phylogenomic perspectives on reproductive isolation and introgression
Botany Conference 2019, Tucson
Deren Eaton, Columbia University
The goal of phylogenomics
Characterize evolutionary relationships from a subset of sampled genomes, sampling either few genes across many taxa or many genes across few taxa.
WGS vs. RAD-seq genomic sampling
Characterize whole genomes from a subset of sequenced markers.
[Figure: WGS (full genome → shotgun reads → assembly) vs. RAD-seq (full genome → RADseq reads → assembly)]
Coalescent variation
Different genomic regions have different genealogical histories.
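To make this concrete, the standard three-taxon result under the multispecies coalescent (a textbook fact, not something shown on the slides) gives the probability that a gene tree matches the species tree ((A,B),C) as 1 - (2/3)exp(-t), where t is the internal branch length in coalescent units; each of the two discordant topologies has probability (1/3)exp(-t). A minimal Python sketch:

```python
import math


def gene_tree_probs(t: float) -> dict:
    """Topology probabilities for a three-species gene tree under the
    multispecies coalescent, for species tree ((A,B),C) with internal
    branch length t in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # a discordant tree requires A and B to fail to coalesce on the internal
    # branch (probability exp(-t)); then each of 3 pairings is equally likely
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


# shorter internal branches (e.g. in a rapid radiation) -> more discordance
for t in (0.1, 1.0, 3.0):
    p = gene_tree_probs(t)
    print(f"t={t}: P(concordant)={p['((A,B),C)']:.3f}")
```

As t approaches 0 all three topologies become equally likely (1/3 each), which is why short internal branches produce widespread gene tree conflict across the genome.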
Can sparse SNP sampling reconstruct genome-wide patterns?
Filtering and formatting to deal with missing data...
Viburnum Phylogeny
Species-level phylogenetic sampling
Published: 65 species; Eaton et al. (2015)
Current: 127 species; in prep.
Assembled in ipyrad (Eaton 2014; Eaton & Overcast): 290K RAD loci (75% missing data), 3.1M SNPs across 127 species.
Species tree inferred with tetrad (Eaton et al. 2015), which uses all SNP information for each quartet (on average ~30K SNPs per quartet).
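tetrad is based on the SVDquartets approach of Chifman & Kubatko (2014): for each quartet, SNP site patterns are tabulated into a 16x16 "flattening" matrix for each of the three possible splits, and the split whose matrix lies closest to rank 10 is chosen. The NumPy sketch below illustrates only that per-quartet scoring step; it is not tetrad's code, the function names are hypothetical, and it omits the step that joins quartets into a species tree.

```python
import numpy as np

BASES = "ACGT"


def flattening(seqs, split):
    """16x16 count matrix of site patterns for a quartet split ((i, j), (k, l)).
    Rows index the (i, j) base pair, columns the (k, l) base pair.
    seqs maps taxon name -> aligned string; non-ACGT characters are missing."""
    (i, j), (k, l) = split
    mat = np.zeros((16, 16))
    for a, b, c, d in zip(seqs[i], seqs[j], seqs[k], seqs[l]):
        if all(x in BASES for x in (a, b, c, d)):  # only sites with data for all four
            row = BASES.index(a) * 4 + BASES.index(b)
            col = BASES.index(c) * 4 + BASES.index(d)
            mat[row, col] += 1
    return mat


def svd_score(mat):
    """Frobenius distance from the flattening to its best rank-10 approximation."""
    sv = np.linalg.svd(mat, compute_uv=False)  # singular values, descending
    return float(np.sqrt(np.sum(sv[10:] ** 2)))


def best_split(seqs, quartet):
    """Score the three possible splits of a quartet; the lowest score wins."""
    a, b, c, d = quartet
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    scores = {s: svd_score(flattening(seqs, s)) for s in splits}
    return min(scores, key=scores.get), scores


# toy quartet: A and B share a base at every site, as do C and D
pairs = [(x, y) for x in BASES for y in BASES]
seqs = {
    "A": "".join(x for x, _ in pairs),
    "B": "".join(x for x, _ in pairs),
    "C": "".join(y for _, y in pairs),
    "D": "".join(y for _, y in pairs),
}
split, scores = best_split(seqs, ("A", "B", "C", "D"))
print(split, scores)
```

Because every quartet is scored from its own full set of SNPs, sites missing in other taxa do not reduce the data available to that quartet.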
Viburn'ers
Viburnum global RAD sampling
From global to population-level variation.
[Figure: nested sampling at global, sub-clade, and population levels. Taxa shown include V. caudatum, V. acutifolium, V. microcarpum, V. stenocalyx, V. loeseneri, V. sulcatum, V. lautum, V. jucundum, V. disjunctum, V. stellato-tomentosum, V. hartwegii, V. costaricanum, V. tinoides, V. jamesonii, V. triphyllum, and V. reticulatum, with several hybrids marked.]
Viburnum Oreinotinus rapid radiation
~35 species from Mexico to Bolivia over ~10 Ma
[Figure: schematic trees for taxa A to G, followed by the Oreinotinus species tree with node support values]
Outline: RAD-seq phylogenomics in ipyrad
1. ipyrad-analysis toolkit.
2. Gene tree extraction: concatenation.
3. Gene tree distributions: sliding window consensus.
4. Sticking with SNPs: genome-wide inference.
ipyrad-analysis toolkit (and toytree) and jupyter
ipyrad-analysis toolkit
Filter or impute missing data; easily distribute massively parallel jobs.

    import ipyrad.analysis as ipa

    # initiate an analysis tool with arguments
    tool = ipa.pca(data=data, ...)

    # run job (distribute in parallel)
    tool.run()

    # examine results
    ...
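The same initiate, run, examine pattern can be sketched without ipyrad. Everything below is hypothetical (ToyTool, replicate_job, and nreplicates are illustrative names, not ipyrad API). It uses a standard-library thread pool only to keep the sketch portable; real CPU-bound jobs would go to a process pool or a cluster, which ipyrad manages through ipyparallel.

```python
from concurrent.futures import ThreadPoolExecutor
import random
import statistics


def replicate_job(seed):
    """Hypothetical stand-in for one independent replicate (for example one
    round of random imputation); here it just averages random draws."""
    rng = random.Random(seed)
    return statistics.mean(rng.random() for _ in range(1000))


class ToyTool:
    """Hypothetical tool mimicking the init -> run() -> examine results pattern."""

    def __init__(self, nreplicates=8, workers=4):
        self.nreplicates = nreplicates
        self.workers = workers
        self.results = None

    def run(self):
        # distribute independent replicates across a pool of workers;
        # pool.map returns results in submission order
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            self.results = list(pool.map(replicate_job, range(self.nreplicates)))
        return self.results


tool = ToyTool(nreplicates=8)
tool.run()
print(tool.results)
```

Seeding each replicate by its index keeps results reproducible regardless of how many workers run them.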
PCA: very sensitive to missing data
No imputation (3% missing; 1250 SNPs)
[Figure: pairwise PCA scatterplots of PC0 (14.8%), PC1 (13.0%), and PC2 (5.4%)]
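A minimal NumPy sketch of PCA on a genotype matrix, assuming the simplest no-imputation strategy of dropping every SNP with any missing genotype. This is an illustration of the sensitivity, not how ipa.pca itself handles missing data; the function name and toy data are hypothetical.

```python
import numpy as np


def pca_complete_sites(genos, ncomp=3):
    """PCA of a samples x SNPs matrix of genotype counts (0/1/2; np.nan = missing).
    Without imputation, SNPs with any missing genotype are dropped."""
    genos = np.asarray(genos, dtype=float)
    complete = ~np.isnan(genos).any(axis=0)   # SNPs sampled in every individual
    x = genos[:, complete]
    x = x - x.mean(axis=0)                    # center each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    coords = u[:, :ncomp] * s[:ncomp]         # sample coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)     # fraction of variance per PC
    return coords, explained[:ncomp], int(complete.sum())


# toy data: 20 samples x 500 SNPs with ~3% of genotypes missing at random
rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=(20, 500)).astype(float)
g[rng.random(g.shape) < 0.03] = np.nan
coords, explained, nkept = pca_complete_sites(g)
print(nkept, explained.round(3))
```

With 3% of cells missing at random across 20 samples, only about 0.97^20 ≈ 54% of SNPs are complete, so roughly half the data are discarded before the PCA even runs.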
PCA: missing data imputed
Pop 'Sampled' imputation (3.5% missing; 1207 SNPs)
[Figure: pairwise PCA scatterplots of PC0 (27.4%), PC1 (8.6%), and PC2 (5.2%)]
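The slides name the method only as population 'Sampled' imputation. One plausible reading, sketched here as an assumption rather than ipyrad's actual code, is to replace each missing genotype with a random draw from the allele frequency observed in that sample's own population. The function name, sample names, and population labels are hypothetical.

```python
import random


def impute_sampled(genos, pops, seed=0):
    """Fill missing genotypes (None) by sampling from the allele frequency
    observed in the sample's own population at that site.

    genos: dict sample -> list of alternate-allele counts (0/1/2 or None)
    pops:  dict population -> list of sample names
    """
    rng = random.Random(seed)
    out = {name: list(calls) for name, calls in genos.items()}  # don't mutate input
    nsites = len(next(iter(genos.values())))
    for members in pops.values():
        for site in range(nsites):
            called = [genos[m][site] for m in members if genos[m][site] is not None]
            if not called:
                continue  # no data in this population: leave the site missing
            freq = sum(called) / (2 * len(called))  # alternate-allele frequency
            for m in members:
                if out[m][site] is None:
                    # draw two alleles independently (assumes Hardy-Weinberg)
                    out[m][site] = sum(rng.random() < freq for _ in range(2))
    return out


genos = {
    "s1": [0, 2, None],
    "s2": [None, 2, 1],
    "s3": [2, None, 0],
    "s4": [2, 2, None],
}
pops = {"north": ["s1", "s2"], "south": ["s3", "s4"]}
print(impute_sampled(genos, pops))
```

Sampling within populations keeps imputed genotypes from pulling samples toward the global mean, which would otherwise blur population structure in the PCA.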
PCA: missing data imputed
Pop 'Sampled' imputation (22% missing; 10K SNPs)
[Figure: pairwise PCA scatterplots of PC0 (8.3%), PC1 (4.8%), and PC2 (3.5%)]
PCA + t-SNE: missing data imputed
t-SNE manifold projection method (scikit-learn)
[Figure: samples projected onto t-SNE components 1 and 2, shown unlabeled and then labeled by locality: Veracruz, Oaxaca, Chiapas, Jamaica, Colombia, Bolivia]
Missing data in phylogenetics
[Figure: Locus 1 to Locus 5 across taxa, from complete sampling to species-level sampling with missing data]
1. concatenation
2. two-step inference
3. quartet joining (SNPs + SVD)
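Approach 1 (concatenation) can be sketched in a few lines of Python: loci are joined end to end, and taxa absent from a locus are padded, so missing data become blocks of "N" in the supermatrix. Function names and toy sequences are illustrative.

```python
def concatenate(loci, taxa, missing="N"):
    """Concatenate loci into a supermatrix. Each locus is a dict of
    taxon -> aligned sequence; taxa absent from a locus are padded with
    `missing` characters for that locus's full length."""
    parts = {t: [] for t in taxa}
    for locus in loci:
        length = len(next(iter(locus.values())))
        for t in taxa:
            parts[t].append(locus.get(t, missing * length))
    return {t: "".join(p) for t, p in parts.items()}


def missing_fraction(matrix, missing="N"):
    """Proportion of cells in the supermatrix that are missing."""
    total = sum(len(seq) for seq in matrix.values())
    return sum(seq.count(missing) for seq in matrix.values()) / total


loci = [
    {"A": "ACGT", "B": "ACGA", "C": "ACTT"},
    {"A": "GGC", "C": "GGA", "D": "GTA"},
    {"B": "TTAC", "D": "TTGC"},
]
supermatrix = concatenate(loci, ["A", "B", "C", "D"])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(f"missing: {missing_fraction(supermatrix):.2f}")
```

Every taxon ends up with the same alignment length, at the cost of a supermatrix that can be mostly missing when sampling is sparse.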
Window_extracter: extract, filter, format
Reference-mapped RAD loci can be "spatially binned" to form larger loci.

    import ipyrad.analysis as ipa

    # initiate an analysis tool with arguments
    tool = ipa.window_extracter(
        data=data,
        scaffold_idx=0,
        start=0,
        end=1000000,
    )

    # writes a phylip file
    tool.run()
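The binning idea itself is simple to sketch: assign each reference-mapped locus to a fixed-size window by its start coordinate. This is an illustrative stand-in, not window_extracter's implementation; bin_loci and window_size are hypothetical names, and placing all loci on scaffold 0 is an assumption borrowed from scaffold_idx=0 in the call above. The positions are the example loci from the window_extracter figure.

```python
from collections import defaultdict


def bin_loci(loci, window_size):
    """Group reference-mapped loci, given as (scaffold, start, end) tuples,
    into non-overlapping windows keyed by (scaffold, window_start).
    A locus is assigned to the window containing its start position."""
    bins = defaultdict(list)
    for scaffold, start, end in sorted(loci):
        window_start = (start // window_size) * window_size
        bins[(scaffold, window_start)].append((start, end))
    return dict(bins)


# example RAD locus positions, assumed to lie on scaffold 0
loci = [
    (0, 100, 300), (0, 15500, 15700), (0, 30500, 30700),
    (0, 42500, 42700), (0, 51000, 51200), (0, 62000, 62200),
    (0, 74500, 74700), (0, 89000, 89300), (0, 95000, 95300),
]
for key, members in bin_loci(loci, window_size=50_000).items():
    print(key, len(members), "loci")
```

Each window's loci can then be concatenated into one longer alignment, trading genomic resolution for more informative sites per tree.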
[Figure: RAD loci at positions 100-300, 15500-15700, 30500-30700, 42500-42700, 51000-51200, 62000-62200, 74500-74700, 89000-89300, and 95000-95300 concatenated into a single window alignment, shown with mincov=4 and with mincov=8]
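The two panels differ only in mincov. Assuming mincov is the minimum number of samples that must have data at a site for that site to be kept in the window alignment, a pure-Python filter looks like the sketch below; filter_mincov and the toy alignment are hypothetical, not ipyrad's code.

```python
def filter_mincov(aln, mincov, missing="N-"):
    """Keep alignment columns where at least `mincov` samples have a
    non-missing base. aln maps sample name -> aligned sequence."""
    names = list(aln)
    ncols = len(aln[names[0]])
    keep = [
        col for col in range(ncols)
        if sum(aln[name][col] not in missing for name in names) >= mincov
    ]
    return {name: "".join(aln[name][col] for col in keep) for name in names}


# toy window: 8 samples; column 0 fully sampled, column 1 sampled in 5,
# column 2 sampled in 3
aln = {
    f"s{i}": "A" + ("C" if i < 5 else "N") + ("G" if i < 3 else "-")
    for i in range(8)
}
for mincov in (4, 8):
    kept = filter_mincov(aln, mincov)
    print(f"mincov={mincov}: {len(kept['s0'])} of 3 sites kept")
```

Raising mincov trades alignment length for completeness: fewer sites survive, but each retained site is informative for more samples.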
Herbicide resistance among Amaranthus species
Sandra Hoffberg, Eaton Lab Postdoc
[Figure: concatenation trees of Amaranthus species, including one from a 1 Mb window on chromosome 1 at a known herbicide resistance gene]
Introgression between the two most notorious weeds: A. palmeri (pigweed) and A. tuberculatus (waterhemp).