How best to distinguish selection on discrete loci from the infinitesimal model?
Nick Barton
Vienna Feb 2019.nb
Longshanks
Frank Chan, Layla Hiramitsu (Tübingen); Campbell Rolian (Calgary); Stefanie Belohlavy (IST); bioRxiv
Two replicates of ~30 mice: within-family selection for the longest tibia.
Use a composite trait, $\log(T\,M^{-0.57})$, where $T$ is tibia length and $M$ is body mass.
Some 10 kb windows show strong allele-frequency change, measured as $\Delta z^2$ per 10 kb window, where $z = 2\arcsin\sqrt{p}$.
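The angular transform is convenient because, under binomial sampling of $n$ genes, $\mathrm{var}(\Delta z) \approx 1/n$ whatever the starting frequency, so $\Delta z^2$ is comparable across SNPs. A minimal Python sketch of the window statistic, assuming (the slide does not say) that $\Delta z^2$ for a window is the mean over the SNPs it contains; the function names and frequencies are hypothetical.

```python
import math


def angular(p: float) -> float:
    """Angular transform z = 2 arcsin(sqrt(p)); maps [0, 1] onto [0, pi]."""
    return 2.0 * math.asin(math.sqrt(p))


def window_dz2(p_start: list[float], p_end: list[float]) -> float:
    """Mean of (z_end - z_start)^2 over the SNPs in one window.

    Assumption: SNPs in a window are pooled by a simple mean.
    """
    if len(p_start) != len(p_end) or not p_start:
        raise ValueError("need matching, non-empty frequency lists")
    dz2 = [(angular(b) - angular(a)) ** 2 for a, b in zip(p_start, p_end)]
    return sum(dz2) / len(dz2)


# Hypothetical frequencies for three SNPs in one 10 kb window
before = [0.20, 0.35, 0.50]
after = [0.80, 0.60, 0.55]
print(window_dz2(before, after))
```

Ranking windows by this statistic and comparing the top values with simulated windows is one way to flag candidates such as the region on chromosome 5.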
Motivation
There is a rapid, consistent response to selection.
We know the selection, the pedigree, the sequence…
Can we find the causal alleles?
A small experiment, but it represents larger populations selected for a longer time.
Outline
- The infinitesimal as the null model
- Variation in SNP and haplotype frequencies
- Estimating effects of candidate loci on fitness and trait
The infinitesimal with linkage
In this experiment, the pedigree is fixed, and so chromosomes evolve independently.
How much does infinitesimal selection affect allele frequencies?
[Figure: four panels of simulated allele-frequency change under infinitesimal selection]
- Even strong selection has little effect.
- The diffusion approximation works well.
- Infinitesimal selection produces a slight excess of sweeps.
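The simulations on this slide condition on the pedigree and include infinitesimal selection; the sketch below covers only the neutral baseline, for a single locus in an idealised Wright-Fisher population. It compares the simulated mean of $(p_t - p_0)^2$ with the diffusion prediction $p_0 q_0 (1 - e^{-t/2N})$. The function names and parameter values ($2N = 60$ gene copies, $t = 17$ generations) are hypothetical choices for illustration.

```python
import math
import random


def wf_mean_sq_change(p0: float, two_n: int, t: int, reps: int,
                      rng: random.Random) -> float:
    """Average (p_t - p0)^2 over neutral Wright-Fisher replicates of two_n genes."""
    total = 0.0
    for _ in range(reps):
        p = p0
        for _ in range(t):
            # Binomial sampling: each of the two_n gene copies in the next
            # generation independently carries the allele with probability p.
            p = sum(rng.random() < p for _ in range(two_n)) / two_n
        total += (p - p0) ** 2
    return total / reps


def diffusion_mean_sq_change(p0: float, two_n: int, t: int) -> float:
    """Diffusion approximation for neutral drift: p0 q0 (1 - exp(-t / 2N))."""
    return p0 * (1.0 - p0) * (1.0 - math.exp(-t / two_n))


rng = random.Random(2019)
print(wf_mean_sq_change(0.5, 60, 17, 2000, rng))
print(diffusion_mean_sq_change(0.5, 60, 17))
```

For the discrete Wright-Fisher model the exact expectation is $p_0 q_0[1 - (1 - 1/2N)^t]$, which the exponential form approximates closely unless $N$ is very small.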
SNPs are carried on haplotype blocks
Simulate, conditioning on the pedigree and the observed heritability (assuming additivity).
SNPs are thrown down onto the haplotype blocks.
Variance in SNP frequency is inflated (grey/black: old/new data; colours: replicate simulations)
[Figure: one panel per chromosome, LS1 chromosomes 1 to 20]
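Variation in SNP frequency is inflated by LD in the base population
In any window, we have $k_i$ copies of the $i$'th founder haplotype, and variation in SNP frequencies reflects the $k_i$. Averaging over the $n_S$ SNPs in the window:
$$\langle \Delta p^2 \rangle = \frac{1}{n_S}\sum_{i=1}^{n_S} \Delta p_i^2 \qquad (1)$$
$$\langle \Delta p^2 \rangle = \overline{p_j(1-p_j)}\,\frac{(n_0-1)\,\mathrm{var}[k]}{n_T^2} \qquad (2)$$
where $p_j$ is the initial SNP frequency, $n_0$ is the number of founder haplotypes, $n_T = \sum_i k_i$, and $\mathrm{var}[k]$ is taken over the $n_0$ founder haplotypes (divisor $n_0 - 1$).
$$\mathrm{var}\langle \Delta p^2 \rangle = \frac{1}{n_S^2}\left[\sum_{i=1}^{n_S}\mathrm{var}\left(\Delta p_i^2\right) + \sum_{i \neq j}\mathrm{cov}\left(\Delta p_i^2, \Delta p_j^2\right)\right] \qquad (3)$$
$\mathrm{var}\langle \Delta p^2 \rangle$ depends on the moments of $k_i$ and is inflated by LD in the base population. With a large number of SNPs, $\mathrm{var}\langle \Delta p^2 \rangle \approx \mathrm{cov}(\Delta p_i^2, \Delta p_j^2)$ (averaged over pairs $i \neq j$), which increases with $D^2$.
E.g., $n_0 = 32$, $k_i = \{10, 4, 2, 1, 1, 1, 1, 0, 0, \dots\}$, $p_0 = 0.5$:
[Figure: results for this example]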
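Equation (2) can be checked by Monte Carlo. A minimal Python sketch, assuming each SNP is thrown down independently on every founder haplotype with probability $p_0$ (no LD in the base population), so it illustrates eq. (2) only, not the LD inflation of eq. (3). The names `expected_dp2` and `simulate_dp2` are hypothetical, and padding the example's haplotype counts with zeros up to $n_0 = 32$ is an assumption about the "…".

```python
import random


def expected_dp2(k: list[int], p0: float) -> float:
    """Eq. (2): p0 (1 - p0) * sum_i (k_i - kbar)^2 / n_T^2, with n_T = sum(k).

    sum_i (k_i - kbar)^2 equals (n0 - 1) var[k] with divisor n0 - 1.
    """
    n0, n_t = len(k), sum(k)
    kbar = n_t / n0
    ss = sum((ki - kbar) ** 2 for ki in k)
    return p0 * (1.0 - p0) * ss / n_t ** 2


def simulate_dp2(k: list[int], p0: float, n_snps: int,
                 rng: random.Random) -> float:
    """Throw n_snps SNPs onto the founder haplotypes and return <dp^2>."""
    n0, n_t = len(k), sum(k)
    total = 0.0
    for _ in range(n_snps):
        # Assumption: independent placement, i.e. no LD in the base population.
        x = [1 if rng.random() < p0 else 0 for _ in range(n0)]
        p_start = sum(x) / n0                                 # founder frequency
        p_end = sum(ki * xi for ki, xi in zip(k, x)) / n_t    # frequency at time T
        total += (p_end - p_start) ** 2
    return total / n_snps


k = [10, 4, 2, 1, 1, 1, 1] + [0] * 25   # n0 = 32 founder haplotypes (assumed)
rng = random.Random(1)
print(expected_dp2(k, 0.5))
print(simulate_dp2(k, 0.5, 20000, rng))
```

Because every SNP in a window shares the same $k_i$, a few heavily represented founders dominate $\langle \Delta p^2 \rangle$; adding correlated placement across SNPs (base-population LD) would raise the covariance term in eq. (3).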
Is the candidate on chrom. 5 significant?
Pairs of simulations starting from the same founder genomes give outlier $\Delta z^2$ values that overlap the signal from LS1 (red) but not from LS2 (orange).
Based on SNP frequencies, the signal is marginally significant.
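The slide does not say how significance was computed. One generic way to turn simulated outliers into a significance level is a one-sided Monte Carlo p-value. A minimal sketch, assuming the test statistic is the largest $\Delta z^2$ from each simulated pair; the function name and all numbers are hypothetical.

```python
def empirical_p_value(observed: float, simulated: list[float]) -> float:
    """One-sided Monte Carlo p-value, (r + 1) / (n + 1), where r counts the
    simulated statistics at least as large as the observed one."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    r = sum(1 for x in simulated if x >= observed)
    return (r + 1) / (len(simulated) + 1)


# Hypothetical: maximum Delta z^2 from each of ten pairs of simulations
sim_max_dz2 = [0.8, 1.1, 0.9, 1.6, 1.2, 0.7, 1.9, 1.0, 1.3, 0.95]
print(empirical_p_value(1.5, sim_max_dz2))
```

The +1 in numerator and denominator keeps the estimate from ever being zero, which matters when only a modest number of simulations is affordable.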
Three sources of variation in SNP frequencies:
- effects of founders
- evolution of replicates
- random assignment of SNPs to haplotypes
Variation due to LD amongst SNPs can be strong:
- coalescent simulations of a well-mixed population
- Kelly & Hughes: D. simulans
This source of error can be eliminated by working with haplotypes:
- haplotypes can be reconstructed from SNP frequencies (Kessner et al., 2013; Franssen et al., 2016)
How strong is selection?
Alleles in the candidate region on chrom. 5 sweep from $p = 0.178$ to $0.833$ in LS1 and $0.981$ in LS2 over $t = 17$ generations, so
$$s \sim \frac{1}{t}\log\left(\frac{p_{17}\,q_0}{q_{17}\,p_0}\right) \sim 0.25$$
(cf. Taus et al., 2017).
How large an effect on the trait?
Simulate an additive allele with effect $A$; 40 replicates; $s = 0.41\,A/\sqrt{V_e}$ (left).
The mean and sd from infinitesimal simulations (dots) fit a single-locus WF model with $N_e \sim 44$ (red).
[Figure: $s$ and $p_{17}$ plotted against $A/\sqrt{V_e}$]
The locus on chromosome 5 has effect $A = 0.59\sqrt{V_e}$ ($0.32\sqrt{V_e}$ to $0.87\sqrt{V_e}$). This single locus is responsible for ~9.4% (3.6% to 15.5%) of the response.
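The selection estimate is the per-generation change in log-odds, which for genic selection grows by about $s$ each generation. A short Python check of the arithmetic, using only the frequencies and $t = 17$ given above; the helper name is hypothetical, and the last step is a rough inversion of the fitted relation $s = 0.41\,A/\sqrt{V_e}$, not the simulation-based estimate of $A$ reported on the slide.

```python
import math


def selection_coefficient(p0: float, pt: float, t: int) -> float:
    """s ~ (1/t) log[(p_t q_0) / (q_t p_0)]: mean change in log-odds per generation."""
    return math.log((pt * (1.0 - p0)) / ((1.0 - pt) * p0)) / t


T = 17       # generations of selection (p_17)
P0 = 0.178   # starting frequency in the candidate region
s_ls1 = selection_coefficient(P0, 0.833, T)
s_ls2 = selection_coefficient(P0, 0.981, T)
s_mean = (s_ls1 + s_ls2) / 2

# Rough inversion of s = 0.41 A / sqrt(V_e) (slope read from the slide)
a_over_sd = s_mean / 0.41
print(f"LS1 s = {s_ls1:.3f}, LS2 s = {s_ls2:.3f}, mean = {s_mean:.3f}")
print(f"implied A / sqrt(V_e) ~ {a_over_sd:.2f}")
```

The two lines bracket the quoted value (LS1 near 0.18, LS2 near 0.32), and the shortcut inversion lands close to, but not exactly on, $A = 0.59\sqrt{V_e}$.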
Summary
- The infinitesimal should be used as the null model.
- In Longshanks, even strong infinitesimal selection has little effect (but selection was within families, and the map is long).
- Substantial variation is generated by random assignment of SNPs to haplotypes, especially with LD in the base population.
- Even an obvious signal is marginally significant in any one line.
- How many loci contribute to the selection response?