The Machinery of Parametric Linkage Analysis
David Duffy
Queensland Institute of Medical Research
Brisbane, Australia
Introduction
• Mendelism
• Linkage
• Statistical distributions
• Maximum likelihood linkage analysis
• The generalized single major locus model
QIMR
Mendel and Mendelism
• Mendel studied binary traits
• Had parental lines that bred true for traits (homozygous)
• F1 hybrid offspring were homogeneous
• F2 generation exhibited Mendelian ratios
  • 3:1
  • 1:2:1
Backcross
• F1 with P1 or P2
• Simpler ratios
• Simpler interpretation in case of linkage

Paternal genotype = Ff (F1), Slightly Frizzled
Maternal genotype = FF (P1), Frizzled

                    Paternal gamete
Maternal gamete     F (50%)                f (50%)
F (50%)             FF (25%) Frizzled      Ff (25%) Slightly Frizzled
F (50%)             FF (25%) Frizzled      Ff (25%) Slightly Frizzled
The Other Backcross
Paternal genotype = Ff (F1), Slightly Frizzled
Maternal genotype = ff (P2), Normal

                    Paternal gamete
Maternal gamete     F (50%)                         f (50%)
f (50%)             Ff (25%) Slightly Frizzled      ff (25%) Normal
f (50%)             Ff (25%) Slightly Frizzled      ff (25%) Normal
Dihybrid testcross
• Backcross involving two traits
• If both are dominant and the loci are unlinked, see a 1:1:1:1 ratio in the (informative) testcross

Two traits in the potato plant: Tall v. Dwarf, and Cut leaf v. Potato cut leaf.
Counts in the backcross generation (MacArthur 1931):
Tall, Cut (F1) x Dwarf, Potato

          Tall   Dwarf   Total
Cut         77      72     149
Potato      62      73     135
Total      139     145     284
Linkage in a dihybrid testcross
• Deviation from a 1:1:1:1 ratio is due to linkage between the trait loci

Two traits in the chicken: Frizzled v. Normal, and White v. Coloured.
Counts in the testcross (Hutt 1931):
White, Frizzled (F1) x Coloured, Normal

            White   Coloured   Total
Frizzled       18         63      81
Normal         63         13      76
Total          81         76     157

The recombination fraction c = (18+13)/157 = 0.197.
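One way to put a number on "deviation from 1:1:1:1" is a Pearson chi-square goodness-of-fit statistic. This is an illustrative sketch, not a method stated on the slides: the function name is hypothetical, and the 5% critical value for 3 degrees of freedom (7.815) is a standard tabulated constant introduced here as an assumption. It is applied to the MacArthur (1931) and Hutt (1931) counts.

```python
def chi_square_equal_classes(counts):
    """Pearson chi-square statistic against equal expected counts in every class."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Assumption: tabulated 5% critical value of chi-square with 3 degrees of freedom.
CRITICAL_5PCT_3DF = 7.815

# The four phenotype classes of each testcross, read from the 2x2 tables.
macarthur_1931 = [77, 72, 62, 73]   # Tall/Cut, Dwarf/Cut, Tall/Potato, Dwarf/Potato
hutt_1931 = [18, 63, 63, 13]        # White/Frizzled, Coloured/Frizzled, White/Normal, Coloured/Normal

for label, counts in (("MacArthur 1931", macarthur_1931), ("Hutt 1931", hutt_1931)):
    statistic = chi_square_equal_classes(counts)
    verdict = "departs from" if statistic > CRITICAL_5PCT_3DF else "is compatible with"
    print(f"{label}: chi-square = {statistic:.2f}, {verdict} 1:1:1:1")
```

The statistic only says that the four classes are unequal; the recombination fraction is what measures how tight the linkage is.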
Phase: Coupling and repulsion
Counts from another mating (Hutt 1933):
White, Frizzled (F1) x Coloured, Normal

            White   Coloured   Total
Frizzled       15          2      17
Normal          4         12      16
Total          19         14      33

The recombination fraction c = (4+2)/33 = 0.182.
In this family, the dominant traits White and Frizzled are in coupling, but in the previous family, they were in repulsion.
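Which cells of the 2x2 table count as recombinants depends on the phase of the doubly heterozygous parent. The sketch below makes that explicit for the two Hutt matings; the function name and the table layout (rows Frizzled/Normal, columns White/Coloured) are assumptions chosen to mirror the tables above.

```python
def recombination_fraction(table, phase):
    """Estimate c from a 2x2 testcross table.

    table[0] = [White & Frizzled, Coloured & Frizzled]
    table[1] = [White & Normal,   Coloured & Normal]
    In coupling, the two dominant traits travel together, so the recombinant
    classes are the off-diagonal cells; in repulsion they are the diagonal cells.
    """
    (dom_dom, rec_dom), (dom_rec, rec_rec) = table
    total = dom_dom + rec_dom + dom_rec + rec_rec
    if phase == "coupling":
        recombinants = rec_dom + dom_rec
    elif phase == "repulsion":
        recombinants = dom_dom + rec_rec
    else:
        raise ValueError("phase must be 'coupling' or 'repulsion'")
    return recombinants / total


hutt_1931 = [[18, 63], [63, 13]]   # repulsion family
hutt_1933 = [[15, 2], [4, 12]]     # coupling family

print(round(recombination_fraction(hutt_1931, "repulsion"), 3))
print(round(recombination_fraction(hutt_1933, "coupling"), 3))
```

In a planned cross the phase is known from the parental lines, so the choice of recombinant cells is not in doubt.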
Phase: Coupling and repulsion of frizzled and coloured
In the backcross, only one parent is doubly heterozygous and contributes to the linkage information. In double heterozygotes, there are two possible arrangements on the chromosomes (the pairs of alleles on each chromosome are haplotypes):
Gametic frequencies

Parental genotype     IF         If         iF         if
IF / if (coupling)    (1-c)/2    c/2        c/2        (1-c)/2
If / iF (repulsion)   c/2        (1-c)/2    (1-c)/2    c/2

[Figure: "Chooks" pedigree diagram showing Coloured and Frizzled status; the individual values did not survive extraction.]
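The gametic frequency table can be written as a small function of phase and c. This is a minimal sketch; the function name and the dictionary keys are illustrative choices, not part of the original slides.

```python
def gamete_frequencies(phase, c):
    """Expected gamete frequencies from a double heterozygote at loci I and F."""
    parental = (1 - c) / 2      # each of the two non-recombinant haplotypes
    recombinant = c / 2         # each of the two recombinant haplotypes
    if phase == "coupling":     # genotype IF / if
        return {"IF": parental, "If": recombinant, "iF": recombinant, "if": parental}
    if phase == "repulsion":    # genotype If / iF
        return {"IF": recombinant, "If": parental, "iF": parental, "if": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")


print(gamete_frequencies("coupling", 0.2))
print(gamete_frequencies("repulsion", 0.2))
```

With c = 0.5 both phases give 1/4 for every gamete, which is why phase carries no information for unlinked loci.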
Mapping and Multipoint Analysis
• The experimental cross can be extended to involve more loci: three-point cross, etc.
• The recombination fractions between pairs of loci can be used to order loci in the same linkage group

The presence of double recombinants and interference means that recombination fractions are only roughly additive. A mapping function adjusts for one or both of these phenomena, allowing us to estimate consistent genetic map distances. Mapping functions therefore address questions like, “if $c_{AB} = 0.4$ and $c_{BC} = 0.4$, what should $c_{AC}$ be?”

One map unit (1 Morgan) is the map distance over which, on average, one crossover is expected per meiosis.
Mapping and Multipoint Analysis
The Morgan mapping function is $x = c$, where $x$ is the distance in map units. This assumes complete interference, and is adequate over small distances.

The Haldane mapping function is:
$x = -\tfrac{1}{2}\ln(1 - 2c)$
$c = \tfrac{1}{2}\left(1 - e^{-2x}\right)$
and adjusts for double recombination only. Trow’s formula assumes the Haldane mapping function:
$c_{AC} = c_{AB} + c_{BC} - 2\,c_{AB}\,c_{BC}$

The Kosambi mapping function also allows for interference, but is not multipoint consistent, so it very occasionally causes problems in multipoint linkage analysis.
$x = \tfrac{1}{4}\ln\left[\frac{1 + 2c}{1 - 2c}\right]$
$c = \tfrac{1}{2}\,\frac{e^{4x} - 1}{e^{4x} + 1}$
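The question "if c_AB = 0.4 and c_BC = 0.4, what should c_AC be?" can be answered by converting each recombination fraction to a map distance, adding the distances, and converting back. The sketch below does this for the Haldane and Kosambi functions, using natural logarithms; the function names are illustrative.

```python
import math


def haldane_distance(c):
    """Map distance in Morgans from a recombination fraction (Haldane)."""
    return -0.5 * math.log(1 - 2 * c)


def haldane_fraction(x):
    """Recombination fraction from a map distance in Morgans (Haldane)."""
    return 0.5 * (1 - math.exp(-2 * x))


def kosambi_distance(c):
    """Map distance in Morgans from a recombination fraction (Kosambi)."""
    return 0.25 * math.log((1 + 2 * c) / (1 - 2 * c))


def kosambi_fraction(x):
    """Recombination fraction from a map distance in Morgans (Kosambi)."""
    return 0.5 * (math.exp(4 * x) - 1) / (math.exp(4 * x) + 1)


def combine(c_ab, c_bc, to_distance, to_fraction):
    """c_AC for adjacent intervals A-B and B-C: add map distances, then convert back."""
    return to_fraction(to_distance(c_ab) + to_distance(c_bc))


c_ab = c_bc = 0.4
print("Haldane:", combine(c_ab, c_bc, haldane_distance, haldane_fraction))
print("Trow   :", c_ab + c_bc - 2 * c_ab * c_bc)
print("Kosambi:", combine(c_ab, c_bc, kosambi_distance, kosambi_fraction))
```

Under the Morgan function the answer would be 0.8, which is not a valid recombination fraction; that is one way to see why it is only adequate over small distances.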
Mapping and Multipoint Analysis
Data from three-point cross of corn (colourless, shrunken, waxy) due to Stadler.

Progeny   Phenotype   Count
1         A B C       17959
2         a b c       17699
3         A b c         509
4         a B C         524
5         A B c        4455
6         a b C        4654
7         A b C          20
8         a B c          12
Total tested          45832
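The pairwise recombination fractions can be read off the Stadler counts directly. The sketch below assumes that the two most frequent classes (A B C and a b c) are the parental types and that B lies between A and C because the rarest classes differ from the parental types only at B; these assumptions, the function name, and the coincidence calculation are additions for illustration and are not stated on the slide.

```python
# Progeny classes keyed by phenotype at loci A, B, C (upper case = dominant).
stadler_counts = {
    "ABC": 17959, "abc": 17699,   # assumed parental classes
    "Abc": 509, "aBC": 524,       # single recombinants between A and B
    "ABc": 4455, "abC": 4654,     # single recombinants between B and C
    "AbC": 20, "aBc": 12,         # double recombinants
}
total = sum(stadler_counts.values())


def pairwise_fraction(counts, i, j):
    """Fraction of progeny recombinant between loci at string positions i and j."""
    recombinant = sum(n for phenotype, n in counts.items()
                      if phenotype[i].isupper() != phenotype[j].isupper())
    return recombinant / sum(counts.values())


c_ab = pairwise_fraction(stadler_counts, 0, 1)
c_bc = pairwise_fraction(stadler_counts, 1, 2)
c_ac = pairwise_fraction(stadler_counts, 0, 2)

# Trow's formula predicts c_AC when crossovers in the two intervals are independent.
trow_prediction = c_ab + c_bc - 2 * c_ab * c_bc

# Coincidence: observed double recombinants over the number expected without interference.
observed_doubles = stadler_counts["AbC"] + stadler_counts["aBc"]
coincidence = observed_doubles / (c_ab * c_bc * total)

print(f"c_AB = {c_ab:.4f}, c_BC = {c_bc:.4f}, c_AC = {c_ac:.4f}")
print(f"Trow prediction for c_AC = {trow_prediction:.4f}")
print(f"coincidence = {coincidence:.3f}")
```

The observed c_AC exceeds the Trow prediction and the coincidence is well below 1, which is the pattern expected when interference suppresses double recombinants.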
Statistical Underpinnings
In these experimental crosses, the number of offspring per mating is large, so we can neglect statistical uncertainty about:
• The accuracy of the genotypes
• The phase of the mating
• The counts of recombinants and nonrecombinants

Recombination is a binary (yes-no, R-NR) phenomenon. For a given parental genotype of known phase, the probability of a recombination event in the production of a gamete is a constant (c). Each meiosis is an independent Bernoulli trial. The count of recombination events arising from a number of meioses therefore comes from the binomial distribution.
The Binomial Distribution
If two loci are unlinked, c = 0.50. For a testcross giving rise to 3 offspring, we expect eight outcomes to be equally likely. If the two loci are linked, with c = 0.10 say, the outcomes with fewer recombinants will be observed more often.

Outcome        c = 1/2   c = 1/10
R, R, R        1/8       1/1000
R, R, NR       1/8       9/1000
R, NR, R       1/8       9/1000
R, NR, NR      1/8       81/1000
NR, R, R       1/8       9/1000
NR, R, NR      1/8       81/1000
NR, NR, R      1/8       81/1000
NR, NR, NR     1/8       729/1000
The Binomial Distribution
If the order of the events making up each outcome is irrelevant (as it is in this case), we say the events are exchangeable, and we can summarize the outcomes as counts:

R   NR   c = 1/2   c = 1/10
3   0    1/8       1/1000
2   1    3/8       27/1000
1   2    3/8       243/1000
0   3    1/8       729/1000

The expected number of recombination events if c = 0.5 is E(R) = cN = 1.5. If c = 0.1, then E(R) = cN = 0.3.
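The count table follows from the binomial probability mass function. A minimal sketch that regenerates it for N = 3 (the function name is illustrative):

```python
from math import comb


def binomial_pmf(r, n, c):
    """Probability of exactly r recombinants in n independent meioses."""
    return comb(n, r) * c ** r * (1 - c) ** (n - r)


n = 3
for r in range(n, -1, -1):
    print(r, n - r, binomial_pmf(r, n, 0.5), round(binomial_pmf(r, n, 0.1), 3))

# Expected number of recombinants, computed from the distribution itself.
for c in (0.5, 0.1):
    expected = sum(r * binomial_pmf(r, n, c) for r in range(n + 1))
    print(f"c = {c}: E(R) = {expected:.1f}")
```

The binomial coefficient counts how many of the ordered outcomes collapse into each count, for example the three orderings with exactly two recombinants.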
The Likelihood Ratio
If we wish to decide whether two loci are linked, we usually evaluate a likelihood ratio comparing two hypotheses about our observed data.

If in our testcross sibship we observed 0 recombinants out of 3, then the likelihood ratio comparing the two hypotheses c = 0.1 and c = 0.5 is the ratio of the probabilities of observing the data under the two hypotheses. Since these probabilities are not “actual” probabilities, but contingent on the underlying hypothesis, Fisher suggested we call them likelihoods.

$L(R = 0, NR = 3 \mid c = 0.1) = 0.9^3 = 0.729$
$L(R = 0, NR = 3 \mid c = 0.5) = 0.5^3 = 0.125$
$LR = 0.729 / 0.125 = 5.832$

We interpret this as saying that the hypothesis that c = 0.1 is about 5.8 times more likely than the hypothesis that the loci are unlinked.
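The likelihood ratio for a count of recombinants can be computed from the binomial probabilities tabulated earlier (729/1000 and 1/8 for 0 recombinants out of 3). In this sketch the function names are illustrative, and the binomial coefficient is omitted because it cancels in the ratio.

```python
def likelihood(r, n, c):
    """Likelihood of r recombinants in n meioses, up to the binomial coefficient."""
    return c ** r * (1 - c) ** (n - r)


def likelihood_ratio(r, n, c_alt, c_null=0.5):
    """Ratio of the likelihood under linkage (c_alt) to that under no linkage (c_null)."""
    return likelihood(r, n, c_alt) / likelihood(r, n, c_null)


print(likelihood(0, 3, 0.1))         # probability of 0 recombinants in 3 when c = 0.1
print(likelihood(0, 3, 0.5))         # the same outcome when the loci are unlinked
print(likelihood_ratio(0, 3, 0.1))   # support for c = 0.1 over c = 0.5
```

Three fully informative meioses with no recombinants give a ratio of a little under 6, which shows how weak the evidence from one small sibship is.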
The Lod Score
Newton Morton suggested in 1955 that a likelihood ratio testing the hypothesis of linkage should be “significant” if it was 1000:1 in favour of a hypothesis where c < 0.5. This was based on a sequential testing argument and the length of the human genetic map. It is thus a genome-wide critical significance level, adjusting for the number of possible tests that could be done.

If the likelihood ratio was 100:1 in favour of the c = 0.5 null hypothesis, then he suggested this be accepted as significant evidence for exclusion of linkage for that value of c (e.g. c = 0.1). Intermediate ratios were regarded as inconclusive.

Following Barnard (1947), he presented the likelihood ratio as the decimal log odds, or lod score. The lod scores from different families testing the same linkage hypothesis can be added together to obtain a total lod score for that hypothesis. Similarly, for large datasets, the likelihoods for particular hypotheses are usually very small, so model log likelihoods are a convenient summary for computations.
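Because lod scores add across families, a total lod curve can be built by summing per-family lods over a grid of c values. The sketch below is illustrative only: it treats the two Hutt chicken matings (31 recombinants of 157, and 6 of 33) as two "families" with known phase, and the function names and the grid step of 0.01 are assumptions.

```python
import math


def lod(r, n, c):
    """Decimal log of the likelihood ratio for c against c = 0.5 (requires 0 < c <= 0.5)."""
    log_alt = r * math.log10(c) + (n - r) * math.log10(1 - c)
    log_null = n * math.log10(0.5)
    return log_alt - log_null


# (recombinants, informative meioses) for each family.
families = [(31, 157), (6, 33)]


def total_lod(c):
    """Sum of the per-family lod scores at recombination fraction c."""
    return sum(lod(r, n, c) for r, n in families)


grid = [i / 100 for i in range(1, 51)]   # c = 0.01, 0.02, ..., 0.50
best_c = max(grid, key=total_lod)

print(f"maximum total lod {total_lod(best_c):.2f} at c = {best_c:.2f}")
print("exceeds Morton's threshold of 3:", total_lod(best_c) > 3)
```

The value of c that maximizes the total lod is the maximum likelihood estimate on the grid; with known phase it sits next to the pooled proportion of recombinants.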
Linkage in outbred human families
Human families are relatively small, so phase is harder to evaluate. Matings are relatively random, so only a proportion of families in the population are informative for linkage analysis at any given marker.
Codominant marker loci and the direct method
One way to work out the phase of a mating is to genotype three generations of a family. Where there are enough doubly heterozygous parents, one can count up the recombination events, as in a planned cross.
Genotypes at D12S379 and D12S95 in an Amish family

D12S379       205 209           193 201           197 209           201 209
D12S95        146 152           146 158           146 158           156 158
                 1                 2                 3                 4
                 |                 |                 |                 |
                 +--------+--------+                 +--------+--------+
                          |                                   |
D12S379                193 205                             197 201
D12S95                 146 158                             146 156
                          5                                   6
                          |                                   |
                          +-----------------+-----------------+
                                            |
             +--------+--------+--------+---+----+--------+--------+--------+
             |        |        |        |        |        |        |        |
D12S379   193 201  197 205  193 201  193 197  193 197  197 205  201 205  201 205
D12S95    156 158  146 146  156 158  146 158  146 158  146 146  146 146  146 146
             7        8        9        10       11       12       13       14
Direct estimation of recombination fraction

D12S379       205 209           193 201           197 209           201 209
D12S95        146 152           158 146           146 158           156 158
                 1                 2                 3                 4
                 |                 |                 |                 |
                 +--------+--------+                 +--------+--------+
                          |                                   |
D12S379                193 205                             197 201
D12S95                 158 146                             146 156
                          5                                   6

The grandparental data allow us to work out that the four gametes that gave rise to the parents 5 and 6 were: {205,146} from individual 1, {193,158} from 2, {197,146} from 3, {201,156} from 4.
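Once the phase of parents 5 and 6 is known, each child's genotype can be split into the haplotype received from each parent and checked against the parental haplotypes. The sketch below is one hypothetical way to do that count; the function names are illustrative, and the child genotypes are taken as transcribed from the pedigree of individuals 7 to 14.

```python
# Phased haplotypes (D12S379 allele, D12S95 allele) of the two parents,
# taken from the four grandparental gametes listed above.
PARENT_HAPLOTYPES = {
    5: [(205, 146), (193, 158)],
    6: [(197, 146), (201, 156)],
}

# Children: (D12S379 genotype, D12S95 genotype), as transcribed from the pedigree.
CHILDREN = {
    7: ((193, 201), (156, 158)), 8: ((197, 205), (146, 146)),
    9: ((193, 201), (156, 158)), 10: ((193, 197), (146, 158)),
    11: ((193, 197), (146, 158)), 12: ((197, 205), (146, 146)),
    13: ((201, 205), (146, 146)), 14: ((201, 205), (146, 146)),
}


def could_transmit(haplotype, parent_haplotypes):
    """True if the parent carries both alleles of the haplotype (in either phase)."""
    return (haplotype[0] in {h[0] for h in parent_haplotypes}
            and haplotype[1] in {h[1] for h in parent_haplotypes})


def transmitted_haplotypes(child, parent_a, parent_b):
    """All (haplotype from a, haplotype from b) pairs consistent with the child."""
    first, second = child
    solutions = set()
    for m in (first, first[::-1]):
        for n in (second, second[::-1]):
            from_a, from_b = (m[0], n[0]), (m[1], n[1])
            if could_transmit(from_a, parent_a) and could_transmit(from_b, parent_b):
                solutions.add((from_a, from_b))
    return solutions


def count_recombinants():
    """Count recombinant gametes over all unambiguously resolved meioses."""
    recombinants = meioses = 0
    for genotype in CHILDREN.values():
        solutions = transmitted_haplotypes(genotype, PARENT_HAPLOTYPES[5], PARENT_HAPLOTYPES[6])
        if len(solutions) != 1:
            continue  # ambiguous or inconsistent child: skip rather than guess
        for haplotype, parent in zip(next(iter(solutions)), (5, 6)):
            meioses += 1
            if haplotype not in PARENT_HAPLOTYPES[parent]:
                recombinants += 1
    return recombinants, meioses


recombinants, meioses = count_recombinants()
print(f"{recombinants} recombinants in {meioses} meioses, c = {recombinants / meioses:.3f}")
```

On the genotypes as transcribed, every child resolves unambiguously, and the only recombinant gametes are the {201,146} haplotypes passed by parent 6 to children 13 and 14.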