Functional Divergence
Topic 3B: Testing adaptive macroevolution [Part 2]

Why are assumptions worth worrying about?
Because they can lead to qualitatively different biological conclusions!
Can model assumptions affect the results for a particular gene?

Estimation of dS and dN between Drosophila melanogaster and D. simulans GstD1 genes

Method (assumptions)
ts/tv bias   Codon bias   κ      S       N       dS       dN       ω
no           no           1.0    152.9   447.1   0.0776   0.0213   0.274
yes          no           1.88   165.8   434.2   0.0691   0.0221   0.320
no           3 × 4        1.0     70.6   529.4   0.1605   0.0189   0.118
yes          3 × 4        2.71    73.4   526.6   0.1526   0.0193   0.127
no           empirical    1.0     40.5   559.5   0.3198   0.0201   0.063
yes          empirical    2.53    45.2   554.8   0.3041   0.0204   0.067

(Data from: Bielawski and Yang, in Statistical Methods in Molecular Evolution, Springer Verlag Series in Statistics in Health and Medicine. New York, New York. In press.)
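As a quick check on the table, ω is simply the ratio dN/dS, so it can be recomputed from the two rate columns. The sketch below does this in Python. The variable names are illustrative, and the second row is entered as dS = 0.0691, dN = 0.0221, the assignment consistent with the reported ω = 0.320.

```python
# Recompute omega = dN/dS for each model in the GstD1 table.
# Tuples: (ts/tv bias, codon bias, dS, dN, omega as reported).
rows = [
    ("no",  "no",        0.0776, 0.0213, 0.274),
    ("yes", "no",        0.0691, 0.0221, 0.320),
    ("no",  "3x4",       0.1605, 0.0189, 0.118),
    ("yes", "3x4",       0.1526, 0.0193, 0.127),
    ("no",  "empirical", 0.3198, 0.0201, 0.063),
    ("yes", "empirical", 0.3041, 0.0204, 0.067),
]


def omega(d_n, d_s):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    return d_n / d_s


recomputed = [omega(d_n, d_s) for _, _, d_s, d_n, _ in rows]

for (tstv, codon, _, _, reported), w in zip(rows, recomputed):
    print(f"ts/tv={tstv:<3} codon bias={codon:<9} "
          f"omega={w:.3f} (reported {reported:.3f})")

# Ratio between the largest and smallest estimate across the six models.
spread = max(recomputed) / min(recomputed)
print(f"largest / smallest omega = {spread:.1f}")
```

Recomputed values can differ from the reported ones in the last digit because the table entries are already rounded. Across the six models the estimates of ω span roughly a fivefold range.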
OK, that was a quantitative difference, but it did not lead to a qualitative difference in the biological conclusion.

Isochores and the vertebrate genome

[Figure: isochore composition of cold-blooded vs. warm-blooded vertebrate genomes]

Isochore families (>300 kb):
- GC poor: L1 and L2
- GC rich: H1 and H2
- GC very rich: H3
Origins of isochores
1. Natural selection: Bernardi and Bernardi 1986; Galtier and Mouchiroud 1998; Eyre-Walker 1999
2. Mutation pressure: Filipski 1988; Wolfe and Sharp 1993; Francino and Ochman 1999

What is the genomic relationship between dS and GC content?

[Figure: three alternative relationships (1, 2, 3), each plotting dS against GC3. Sources cited under the panels: Miyata et al. 1989; "most studies"; Eyre-Walker 1994; Bernardi et al. 1993; Matassi et al. 1999]
Mammalian nuclear genes: Artiodactyla vs. Primates (82 nuclear genes)

[Figure: two scatter plots of dS (y-axis, 0.0 to 1.8) against GC3 (x-axis, 0.2 to 1.0), one under a simple model and one under a model with ts/tv and codon bias. Regression annotations in the panels: r² = 0.53, P < 0.0001 and r² = 0.0228, P = 0.1759]

(Data from: Bielawski, Dunn, and Yang (2000) Genetics 156: 1299-1308)

Is my favorite gene evolving under positive selection pressure?
Estimation bias for the dN/dS ratio

[Figure: dN/dS (y-axis, 0.0 to 2.5) against sequence divergence t (x-axis, 0 to 4) in a simulation with GC3 = 89.5% (ENC = 28.3), with curves for dN/dS = 0.01, 0.10, and 0.30. The upper part of the plot is labeled "Positive selection" and the lower part "Purifying selection"]

The dN/dS (ω) ratio is a valuable index of selection pressure!
Computing the dN/dS (ω) ratio can be tricky!
Another problem:
In a pairwise analysis we must average the ω ratio over:
1. all sites
2. the entire evolutionary history

[Figure: two codons, CCT and CAG, connected by branches t0 and t1 through node k]

Pairwise analysis does not detect much adaptive evolution
In a large-scale pairwise database search, only 17 out of 3,595 genes (<0.5%) were found to be under positive selection (Endo et al. 1996 MBE 13: 685-690)
The problem of averaging over sites:

[Figure: a codon sequence (ATG CTT GTG CTA ... CGC TAA) with sites numbered 1 to 101, each site classified as one of:
  Purifying: dN/dS < 1
  Neutral: dN/dS = 1
  Adaptive: dN/dS > 1]

Example:
  75% strongly purifying: ω = 0.005
  20% weakly purifying: ω = 0.50
  5% adaptive: ω > 3.5

When we average over all three classes of sites we do NOT detect positive selection.
The average is a weighted sum over all three categories of sites (taking ω = 3.5 for the adaptive class):

(3.5 × 0.05) + (0.5 × 0.20) + (0.005 × 0.75) = 0.279

The average over all sites indicates that purifying selection dominates, with ω = 0.28
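The weighted sum over site classes can be reproduced with a few lines of Python. This is a minimal sketch: the class labels and function name are illustrative, and the adaptive class is given ω = 3.5, the value used in the slide's arithmetic.

```python
# Site classes: (label, proportion of sites, omega for that class).
site_classes = [
    ("strongly purifying", 0.75, 0.005),
    ("weakly purifying",   0.20, 0.50),
    ("adaptive",           0.05, 3.5),
]


def average_omega(classes):
    """Proportion-weighted mean of omega over all site classes."""
    total = sum(p for _, p, _ in classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    return sum(p * w for _, p, w in classes)


mean_omega = average_omega(site_classes)
max_omega = max(w for _, _, w in site_classes)

print(f"average omega over sites: {mean_omega:.4f}")
print(f"largest class omega:      {max_omega}")
print("average suggests positive selection:", mean_omega > 1)
```

The average stays well below 1 even though 5% of sites have ω far above 1, which is the point of the slide: a single gene-wide ω hides the adaptive class.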
The problem of averaging over time:

[Figure: phylogeny of the β globin gene cluster on chromosome 11 (genes ε, γG, γA, δ, β), with node dates of 150–200 mya, 100–140 mya, 40–80 mya, and 35 mya. Total tree length T = 565 my (rescaled to T = 1)]

b.l. (my)   Fraction of t   ω
115         0.203           0.2
 55         0.097           0.2
 60         0.106           0.2
 60         0.106           0.2
 85         0.150           0.5
 35         0.062           0.5
 35         0.062           0.5
120         0.212           1.2

Grey branches: ω = 0.2
Black branches: ω = 0.5
Blue branches: ω = 1.2

Again, if we average over the tree, we do NOT detect positive selection; ω = 0.49.

We have the technology…
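The tree-wide average can be checked the same way, weighting each branch's ω by its share of the total tree length. The sketch below uses the branch lengths and ω values from the slide; the variable names are illustrative.

```python
# Branches of the beta globin tree: (branch length in my, omega).
branches = [
    (115, 0.2), (55, 0.2), (60, 0.2), (60, 0.2),   # grey branches
    (85, 0.5), (35, 0.5), (35, 0.5),               # black branches
    (120, 1.2),                                    # blue branch
]

# Total tree length T in millions of years.
total_time = sum(length for length, _ in branches)

# Fraction of the total time spent on each branch.
fractions = [length / total_time for length, _ in branches]

# Time-weighted average of omega over the whole tree.
tree_average = sum(f * w for f, (_, w) in zip(fractions, branches))

print(f"T = {total_time} my")
print("fractions:", [round(f, 3) for f in fractions])
print(f"time-averaged omega = {tree_average:.2f}")
```

One branch has ω = 1.2, yet the time-weighted average is about 0.49, so a pairwise comparison that spans the whole tree would report purifying selection.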
A real dataset: let’s do it!
What is the DAZ gene family?

Two members:
DAZL1:
- autosome [3p24]
- present in all vertebrates
DAZ:
- Y chromosome [Yq11.23]
- present only in Old World Monkeys

DAZ evolved via a chromosomal translocation event

[Figure: gene duplication via translocation from 3p24 to Yq11.23 on the Y chromosome, about 40 MYA. Old World monkeys (O.W.M.) carry DAZL1 and DAZ; New World monkeys (N.W.M.) and all other vertebrates carry DAZL1 only]
DAZ = Deleted in AZoospermia
• Azoospermia is the most common form of male infertility
• AZF (azoospermic factor)
  - locus on the Y chromosome
  - ~15% of infertile men have a deletion in AZF
  - the deleted AZF regions contain one or more genes crucial for spermatogenesis
  - one of these regions (AZFc) contains the DAZ gene

At first, DAZ was thought to be functional:
• DAZ and DAZL1: expressed only in germ cells
• DAZ: expression highest in spermatogonia
• Elimination of DAZL1 in mice = azoospermia
• Human DAZ rescues azoospermic mice

Evolutionary analysis of the DAZ family offers a surprising conclusion (Agulnick et al. 1998)
• Similar rates among the three codon positions
• Similar rates between introns and exons
• High rates of nonsynonymous substitution (ω about 1)

Surprising conclusions:
1- No functional constraints on primate DAZ (young pseudogene)
2- DAZ plays no role in human spermatogenesis

Method problem?
Pairwise estimation of dN and dS
Simple model [ts = tv; equal frequencies; JC69 correction]
Did selection pressure change following the translocation event?

[Figure: phylogeny of DAZL1 (Mus, Macaca, Human) and DAZ (Macaca, Human) with branches labeled a to g; branch lengths are measured at synonymous sites (scale bar 0.1), and the chromosomal translocation event is marked on the tree]

Probabilistic models can permit different ωs on different branches

[Figure: four-taxon tree with tips x1 to x4 and internal nodes j and k; branches t0, t3, and t4 carry ω0, and branches t1 and t2 carry ω1]
Variable selective pressure among lineages of the DAZ gene family

[Figure: phylogeny of DAZL1 (Mus, Macaca, Human) and DAZ (Macaca, Human) annotated with branch-specific ω estimates: 0.001, 3.47, 1.44, 0.10, 0.35, 0.35, 1.14; scale bar 0.1]

Increasing model complexity (adding parameters to a nested model) can only raise the maximized likelihood score, never lower it

Model         p   ℓ          Parameters for branches
One-ratio     1   -1442.44   ω0 = 0.295 for all branches
Free ratios   7   -1426.40   ω0 = 0.100 for branch a
                             ω1 = 3.474 for branch b
                             ω2 = 0.001 for branch c
                             ω3 = 1.444 for branch d
                             ω4 = 0.350 for branch e
                             ω5 = 0.355 for branch f
                             ω6 = 1.144 for branch g

The free ratios model has a higher likelihood, but it also has more parameters; how do we know if the gain in likelihood is significant?
Likelihood ratio test (LRT)

ℓ0 is the maximum log likelihood under H0 given parameters θ0, and ℓ1 is the maximum log likelihood under H1 given parameters θ1

Test statistic: 2Δℓ = 2(ℓ1(θ1) − ℓ0(θ0))
Degrees of freedom = difference in the number of parameters between the two models

The LRT tests for a significant gain in likelihood score

We test the increase in the likelihood score with an LRT

Model         p   ℓ
One-ratio     1   -1442.44
Free ratios   7   -1426.40

Likelihood ratio test:
One-ratio vs. Free ratios: 2Δℓ = 14.2, df = 6, P = 0.014

Remember: the above estimates are an average over all sites in the gene!
Does selection pressure vary among sites?

[Figure: bar chart of the proportions p0, p1, p2 of three site classes (y-axis 0 to 1)]

Site class 1: ω0 < 1, 75% of codon sites
Site class 2: ω1 = 1, 20% of codon sites
Site class 3: ω2 > 1, 5% of codon sites

We can formulate this in terms of a probability distribution:

P(x_h) = \sum_{i=0}^{K-1} p_i \, P(x_h \mid \omega_i)

Does selection pressure vary among sites?
H0: uniform selective pressure among sites (M0)
H1: variable selective pressure among sites (M3)
Compare 2Δℓ = 2(ℓ1 − ℓ0) with a χ² distribution

[Figure: bar charts of the ω distribution under Model 0 and Model 3 (y-axis 0 to 1). Annotated estimates: ω̂ = 0.59, ω̂ = 0.09, ω̂ = 0.64, ω̂ = 5.64]

Note: the above are plots of the MLEs for DAZ
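The mixture formulation says that the probability of the data at a site is the average of its probability under each ω class, weighted by the class proportions. The sketch below evaluates that sum in Python. The proportions are the 75/20/5% from the slide; the conditional likelihoods P(x_h | ω_i) are hypothetical numbers chosen for illustration (in practice they come from a codon model evaluated on the tree), and the total log likelihood assumes sites are independent.

```python
import math

# Proportions of the three site classes from the slide (p0, p1, p2).
proportions = [0.75, 0.20, 0.05]

# HYPOTHETICAL conditional likelihoods P(x_h | omega_i) for three sites;
# each inner list holds one value per site class.
conditional = [
    [1.0e-6, 4.0e-6, 2.0e-5],
    [3.0e-5, 1.0e-5, 1.0e-6],
    [2.0e-6, 2.0e-6, 2.0e-6],
]


def site_likelihood(p, cond):
    """P(x_h) = sum over classes i of p_i * P(x_h | omega_i)."""
    if len(p) != len(cond):
        raise ValueError("need one conditional likelihood per site class")
    return sum(p_i * c_i for p_i, c_i in zip(p, cond))


site_liks = [site_likelihood(proportions, c) for c in conditional]

# Log likelihood of the alignment, assuming independent sites.
log_likelihood = sum(math.log(v) for v in site_liks)

for h, v in enumerate(site_liks, start=1):
    print(f"site {h}: P(x_h) = {v:.3e}")
print(f"log likelihood = {log_likelihood:.3f}")
```

When the conditional likelihoods are identical across classes (the third site here), the mixture reduces to that common value, because the proportions sum to 1.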
Are some sites subject to positive selection?
H0: Beta distributed variable selective pressure (M7)
H1: Beta plus positive selection (M8)
Compare 2Δℓ = 2(ℓ1 − ℓ0) with a χ² distribution

[Figure: distribution of sites over the ω ratio under M7: beta (ω from 0 to 1) and M8: beta&ω (ω from 0 to 1, plus a class with ω > 1)]

Note: the above are plots of the MLEs for DAZ

We use the LRT to test two hypotheses:
H1: Selection pressure varies among sites in DAZ (M0 vs. M3)
H2: Some sites in DAZ evolved under positive selection (M7 vs. M8)

          M0 vs. M3   M7 vs. M8
Tree A     8.94*       6.82*
Tree B    19.14*      12.16*

Note: * significant at the 5% level (χ²_5% = 5.99, df = 2)
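The comparison of each 2Δℓ value with the χ² critical value can be sketched in Python. The function below is an illustrative helper, not part of any package: it uses the closed-form upper tail of the χ² distribution, which holds only for even degrees of freedom, and takes df = 2 for all four tests as stated in the note.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Closed form valid only for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# LRT statistics (2 * delta log likelihood) from the table.
statistics = {
    ("Tree A", "M0 vs. M3"): 8.94,
    ("Tree A", "M7 vs. M8"): 6.82,
    ("Tree B", "M0 vs. M3"): 19.14,
    ("Tree B", "M7 vs. M8"): 12.16,
}
DF = 2        # degrees of freedom, as given in the note
ALPHA = 0.05  # significance level

p_values = {key: chi2_sf_even_df(stat, DF) for key, stat in statistics.items()}
significant = {key: p < ALPHA for key, p in p_values.items()}

for (tree, test), p in p_values.items():
    flag = "*" if significant[(tree, test)] else ""
    print(f"{tree}  {test}: 2*dl = {statistics[(tree, test)]:6.2f}  P = {p:.4g} {flag}")
```

For odd degrees of freedom, or to avoid hand-rolling the tail probability, a statistics library routine such as `scipy.stats.chi2.sf` could be used instead, assuming SciPy is available.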