Hitch-hiking and polygenic adaptation
Kevin Thornton
Ecology and Evolutionary Biology, UC Irvine
Linked selection vs. fates of selected mutations
• Hudson & Kaplan, 1995
• De Vladar & Barton, 2014
Modeling traditions

Population genetics      Evol. quantitative genetics
Fixed effect sizes       Variable effect sizes
Single selected site     Many sites
Directional selection    Stabilizing selection
Partial linkage          LE or QLE
Tree structures
[Figure: example genealogies. Panels: Neutral; Recent hard sweep.]
Patterns reflect the tree structures
Linked selection during polygenic adaptation
• Use forward simulations
• fwdpy11 is a Python package
• Uses a C++ back-end (Thornton, 2014, Genetics)
Simulation scheme
Fitness: $w = e^{-(z - z_o)^2 / (2 V_S)}$
[Figure: schematic of a locus.]
• 10 unlinked loci, $\theta = \rho = 1{,}000$ per locus
• Additive mutations arise at rate $\mu$, $\Theta = 4N\mu$
• Two thetas cannot possibly be confusing.
• $N = 5{,}000$ diploids
• Evolve under Gaussian stabilizing selection (GSS) for $10N$ generations with an optimal trait value of 0
• Shift the optimal trait value to $z_o > 0$ and evolve for $10N$ more generations
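The fitness function and the scaled mutation rate from this slide can be written out as a minimal sketch. The function names are hypothetical, V_S = 1 is taken from the figure captions, and the three mutation rates are the ones shown in the panel labels on later slides. This is plain Python for illustration and does not use the fwdpy11 API.

```python
import math


def gss_fitness(z, z_opt, vs):
    """Gaussian stabilizing selection: w = exp(-(z - z_opt)^2 / (2 * vs))."""
    if vs <= 0:
        raise ValueError("vs must be positive")
    return math.exp(-((z - z_opt) ** 2) / (2.0 * vs))


def scaled_mutation_rate(n_diploids, mu):
    """Theta = 4 * N * mu, the scaled rate of trait-affecting mutations."""
    return 4.0 * n_diploids * mu


N = 5_000                          # diploids, from the slide
VS = 1.0                           # assumed from the figure captions (V_S = 1)
MUTATION_RATES = (2.5e-4, 1e-3, 5e-3)  # from the panel labels on later slides

if __name__ == "__main__":
    for mu in MUTATION_RATES:
        print(f"mu = {mu:g}: Theta = {scaled_mutation_rate(N, mu):g}")
    # Fitness of a few trait values before (optimum 0) and after (optimum 1) the shift.
    for z in (0.0, 0.5, 1.0):
        before = gss_fitness(z, 0.0, VS)
        after = gss_fitness(z, 1.0, VS)
        print(f"z = {z}: w before shift = {before:.3f}, after shift = {after:.3f}")
```

With N = 5,000, these mutation rates correspond to Θ = 5, 20, and 100, and an individual sitting at the old optimum has fitness exp(-1/2) ≈ 0.61 immediately after a shift to z_o = 1.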
Adaptation occurs rapidly and before fixation
[Figure 1. Columns: $\mu = 2.5 \times 10^{-4}$, $\mu = 0.001$, and $\mu = 0.005$, each with $z_o = 1$; a further per-panel value (0.045, 0.089, 0.200) appears without its symbol. Top row: mean trait value and $5 \times V(G)$. Lower rows: mutation frequency trajectories, each labeled with its effect size and origination time. x-axis: time since optimum shift (units of $N$ generations).]
Figure 1: Large optimum shift, $z_o = 1$ with $V_S = 1$.
Contributions of different loci
[Figure 2. Panels: $z_o = 1$ with $\mu = 0.00025$ and $\mu = 0.005$. y-axis: mean genetic value of locus. x-axis: time since optimum shift (units of $N$ generations).]
Figure 2: Mean trait value per locus, colored by rank.
Sweeps from SGV start out rare
[Figure. Panels: $z_o = 1.00$ with $\mu = 0.00025$, $\mu = 0.001$, and $\mu = 0.005$. y-axis: number of haplotypes with mutation. x-axis: effect size ($\gamma$).]
This predicts “hard” sweep signals due to sweeps from large-effect SGV.
Temporal and spatial patterns of “selection signals”
[Figure 3. Columns: $z_o = 1$ with $\mu = 0.00025$, $\mu = 0.001$, and $\mu = 0.005$. Rows: mean H' and mean z-score. Lines: distance from window with causal mutations (0 to 5). x-axis: time since optimum shift (units of $N$ generations).]
Figure 3: Mean statistic per window over time for a large optimum shift. z-scores are for the $nS_L$ statistic (Ferrer-Admetlla et al., 2014, MBE).
Similar patterns for new mutations vs SGV
[Figure 4. Columns: $z_o = 1$ with $\mu = 0.00025$, $\mu = 0.001$, and $\mu = 0.005$. Rows: mean H' and mean z-score, each shown separately for fixations from standing variation and from new mutation. x-axis: time since optimum shift (units of $N$ generations).]
Figure 4: Same data, but conditioning on fixations of large effect.
Mutational variance matters
[Figure 5. Rows: $\Pr(|\gamma| \geq \hat{\gamma}) = 0.75$ and $\Pr(|\gamma| \geq \hat{\gamma}) = 0.05$. Columns: $\mu = 0.00025$, $\mu = 0.001$, and $\mu = 0.005$. y-axis: mean H'. Lines: distance from window with causal mutations (0 to 5). x-axis: time since optimum shift (units of $N$ generations).]
Figure 5: Choose $\sigma_\mu$ so that the probability of a large-effect mutation is constant. Time scale is determined by $\delta q$ of fixations.
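One way to pick σ_µ for a target probability of a large-effect mutation is sketched below. It assumes effect sizes are Gaussian with mean zero, γ ~ N(0, σ_µ²), so that Pr(|γ| ≥ γ̂) = 2(1 − Φ(γ̂/σ_µ)) and hence σ_µ = γ̂ / Φ⁻¹(1 − p/2). The function names and the example threshold γ̂ = 0.2 are hypothetical and chosen only for illustration; the slides do not state how γ̂ is defined.

```python
from statistics import NormalDist


def sigma_for_tail_prob(gamma_hat, p):
    """Return sigma such that Pr(|gamma| >= gamma_hat) = p for gamma ~ N(0, sigma^2)."""
    if gamma_hat <= 0:
        raise ValueError("gamma_hat must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Two-sided tail: p = 2 * (1 - Phi(gamma_hat / sigma)).
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return gamma_hat / z


def tail_prob(gamma_hat, sigma):
    """Pr(|gamma| >= gamma_hat) for gamma ~ N(0, sigma^2)."""
    return 2.0 * (1.0 - NormalDist(0.0, sigma).cdf(gamma_hat))


if __name__ == "__main__":
    gamma_hat = 0.2  # illustrative threshold, not taken from the slide
    for p in (0.75, 0.05):  # the two probabilities shown in Figure 5
        sigma = sigma_for_tail_prob(gamma_hat, p)
        print(f"p = {p}: sigma_mu = {sigma:.4f}, check = {tail_prob(gamma_hat, sigma):.4f}")
```

Holding p fixed while µ changes means σ_µ has to be recomputed whenever the large-effect threshold changes.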
Implications
• Patterns unique to “soft sweeps” are not generated by this model!
• We are using supervised machine learning (Schrider/Kern) to investigate this further.
• Hitch-hiking signals decrease as Θ increases.
• Keep in mind that our “tests” are usually designed to detect hard sweeps.
Data not shown:
• Small optimum shifts leave less dramatic patterns.
Tree sequences: representing genetic data using tables
Kelleher, et al. 2016. PLoS Computational Biology (a.k.a. "The msprime paper")
[Figure: tree topologies and mutations for three samples over genomic positions 0 to 10, with one tree on the interval 0 to 5 and another on 5 to 10. Vertical axis: time ago. Horizontal axis: genomic position.]

Nodes:
ID  time
0   0.0
1   0.0
2   0.0
3   1.0
4   2.0

Edges:
left  right  parent  child
0     10     3       1
0     10     4       3
0     5      3       0
0     5      4       2
5     10     3       2
5     10     4       0

Sites:
ID  position  ancestral state
0   2.5       A
1   7.5       G

Mutations:
ID  site  node  derived state
0   0     2     T
1   1     3     C
2   1     1     G
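To show how the four tables jointly encode both trees and the sequence data, here is a minimal pure-Python sketch that recovers each sample's allele at each site. The rows are as read from this slide (the flattened layout makes that reading an assumption). Treating the three time-zero nodes as samples is also an assumption, and so is the half-open interval convention. The tuple and function names are hypothetical, and this is not the tskit API.

```python
from collections import namedtuple

Node = namedtuple("Node", "time is_sample")
Edge = namedtuple("Edge", "left right parent child")
Site = namedtuple("Site", "position ancestral_state")
Mutation = namedtuple("Mutation", "site node derived_state")

# Rows as read from the slide; nodes at time 0 are assumed to be the samples.
nodes = [Node(0.0, True), Node(0.0, True), Node(0.0, True),
         Node(1.0, False), Node(2.0, False)]
edges = [Edge(0, 10, 3, 1), Edge(0, 10, 4, 3), Edge(0, 5, 3, 0),
         Edge(0, 5, 4, 2), Edge(5, 10, 3, 2), Edge(5, 10, 4, 0)]
sites = [Site(2.5, "A"), Site(7.5, "G")]
mutations = [Mutation(0, 2, "T"), Mutation(1, 3, "C"), Mutation(1, 1, "G")]


def parent_map(edges, position):
    """child -> parent for the tree covering `position` (half-open intervals)."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def genotypes(nodes, edges, sites, mutations):
    """For each site, map every sample node ID to its allele."""
    rows = []
    for site_id, site in enumerate(sites):
        parent = parent_map(edges, site.position)
        state_on_node = {m.node: m.derived_state
                         for m in mutations if m.site == site_id}
        row = {}
        for u, node in enumerate(nodes):
            if not node.is_sample:
                continue
            # Walk rootward; the first mutated node on the path sets the allele.
            v = u
            while v is not None and v not in state_on_node:
                v = parent.get(v)
            row[u] = site.ancestral_state if v is None else state_on_node[v]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for site, row in zip(sites, genotypes(nodes, edges, sites, mutations)):
        print(site.position, row)
    # 2.5 {0: 'A', 1: 'A', 2: 'T'}
    # 7.5 {0: 'G', 1: 'G', 2: 'C'}
```

At position 7.5, sample 1 carries G even though it descends from node 3, where the G to C mutation sits, because the later mutation on node 1 reverts it.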
Tree sequence simplification...
Kelleher, et al. 2018. PLoS Computational Biology
...can be done in FAST linear time...
...and give a huge performance boost...
[Figure. y-axis: speedup due to pedigree recording. x-axis: scaled recombination rate ($\rho = 4Nr$). Series: $N = 10^3$, $N = 10^4$, and $N = 5 \times 10^4$.]
...allowing chromosome-scale simulations in large N
[Figure 6. Panels: $\Theta = 10$ and $\Theta = 100$, with regime labels "Polygenic adaptation" and "Complete and partial sweeps". y-axis: expected proportion of singleton mutations. x-axis: distance from trees with selected mutations (units of $4Nr$). Color: generations since optimum shift (0 to 250).]
Figure 6: $N = 2 \times 10^5$ diploids, $\rho = 10^5$ ($\approx$ 100 Mb in humans), $\gamma \sim N(0, 0.25)$, $V_S = 1$. Analysis based on $n = 3{,}000$ diploids.
Facilitates better testing
• Methods for detecting polygenic adaptation of continuous traits shouldn’t be evaluated with simulations of strong sweeps.
• Methods assuming linkage equilibrium need to be tested using simulations involving partial linkage.
• etc.
Resources
• fwdpy11: https://fwdpy11.readthedocs.org
• msprime: https://msprime.readthedocs.org
• Tree sequence tutorials: https://tskit-dev.github.io/tutorials/
• The tree sequence toolkit: https://github.com/tskit-dev/tskit (“almost ready”)
Thanks
• David Lawrie
• Khoi Hyunh
• Jaleal Sanjak
• Tony Long
• Jerome Kelleher, Jaime Ashander, Peter Ralph
• NIH for funding
• UCI HPCC for computing support