



Statistics and analytical issues: SNP and haplotype
Linda Broer (l.broer@erasmusmc.nl)
Genetic Laboratory, Department of Internal Medicine, Erasmus MC, Rotterdam

Overview
- Sampling
- Effect estimation
- Hardy-Weinberg Equilibrium
- Linkage Disequilibrium
- Haplotypes

Sampling
Research question: how much of the variance in my trait is explained by genetic differences between individuals?

Sampling
Question to you: which samples do you select for your study?
- It is important to take samples that are representative of the source population!

Sampling: what about case/control studies?
- Examples: osteoporosis, breast cancer, coronary heart disease, etc.
- Your selected cases must represent the cases from the source population
- Your selected controls must represent the source population

Sampling in practice: the Rotterdam Study
- Objective: to study health and disease in an elderly population
- We want an area in Rotterdam where individuals are mostly of European ancestry and are relatively old
- Look up information about districts in Rotterdam in public registries
- Ommoord:
  - < 10% not of European ancestry
  - > 60% are 45+ years old
- Who to sample?

Sampling in practice: the Rotterdam Study
[Figures]

Sampling in practice: one more example
- You want to study the development of children, starting before birth
- Where would you look for your study population?
  - The gynaecologist
- Whom would you include?
  - All pregnant women entering the clinic between set time points (e.g., over 1 year)

Effect estimation in case/control studies

              Cases     Controls   Total
Allele +      a         b          a + b
Allele -      c         d          c + d
Total         a + c     b + d      a + b + c + d

- Odds of Allele + in cases = a/c
- Odds of Allele + in controls = b/d
- Thus odds ratio (OR) = $\frac{a/c}{b/d} = \frac{a \cdot d}{b \cdot c}$
- No association: OR = 1
- Association: OR ≠ 1
  - OR > 1: increased risk
  - OR < 1: decreased risk (protective)

Effect estimation in case/control studies: example
- 122 cases, 1428 controls
- 73 cases did not have the risk allele, while 49 did
- 1204 controls did not have the risk allele, while 224 did

              Cases     Controls   Total
Allele +      49        224        273
Allele -      73        1204       1277
Total         122       1428       1550

- Let's fill in the formula: OR = $\frac{49/73}{224/1204} = \frac{49 \cdot 1204}{224 \cdot 73} \approx 3.6$
- Carriers of this variant have an increased risk for the disease
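A minimal Python sketch of the odds-ratio calculation above, using the same 2x2 layout (a, b = Allele + in cases and controls; c, d = Allele - in cases and controls). The function name `odds_ratio` is an illustrative choice, not something from the slides.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 case/control table.

    a: cases with Allele +      b: controls with Allele +
    c: cases with Allele -      d: controls with Allele -
    """
    odds_cases = a / c          # odds of Allele + among cases
    odds_controls = b / d       # odds of Allele + among controls
    return odds_cases / odds_controls  # algebraically equal to (a * d) / (b * c)


# Counts from the worked example: 49 vs 73 cases, 224 vs 1204 controls
or_example = odds_ratio(a=49, b=224, c=73, d=1204)
print(f"OR = {or_example:.1f}")  # OR = 3.6
```

Writing it as a ratio of two odds, rather than jumping straight to (a·d)/(b·c), keeps the link to the "odds in cases vs odds in controls" reasoning on the slide.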

Effect estimation in continuous traits
- Compare mean values across genetic groups
[Figure: mean BMD (g/cm²) by SNP genotype (AA, AG, GG); y-axis from 0.74 to 0.82]

Effect estimation and genetic model
- Additive model: every additional risk allele increases your trait by the same amount
  - Most often tested in genetic studies
- Recessive model: your trait is increased only if you have two risk alleles
- Dominant model: your trait increases by the same amount for one or two risk alleles

Hardy-Weinberg Equilibrium (HWE)
- In a population, allele and genotype frequencies will remain constant over generations
- Assumptions:
  - Large population
  - Random mating
  - No new mutations
  - No natural selection
  - No migration
  - Adequate genotyping
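A sketch of how the three genetic models translate into a numeric coding of the genotype, followed by the usual additive test: the least-squares slope of the trait on the number of risk alleles. The helper name `encode`, the choice of G as the risk allele, and the BMD values are all hypothetical, for illustration only.

```python
def encode(genotype: str, risk_allele: str, model: str) -> int:
    """Code a biallelic genotype such as 'AG' under a given genetic model."""
    n_risk = genotype.count(risk_allele)  # number of risk alleles: 0, 1 or 2
    if model == "additive":
        return n_risk                     # 0 / 1 / 2: each extra allele adds the same step
    if model == "dominant":
        return int(n_risk >= 1)           # 0 / 1 / 1: one or two copies have the same effect
    if model == "recessive":
        return int(n_risk == 2)           # 0 / 0 / 1: only two copies change the trait
    raise ValueError(f"unknown model: {model}")


for g in ("AA", "AG", "GG"):
    print(g, [encode(g, "G", m) for m in ("additive", "dominant", "recessive")])
# AA [0, 0, 0]
# AG [1, 1, 0]
# GG [2, 1, 1]

# Additive test: least-squares slope of the trait on the allele count.
# The BMD values below are made up for illustration only.
samples = [("AA", 0.81), ("AA", 0.79), ("AG", 0.78), ("AG", 0.77), ("GG", 0.75), ("GG", 0.76)]
x = [encode(g, "G", "additive") for g, _ in samples]
y = [bmd for _, bmd in samples]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
print(f"change in mean BMD per G allele: {slope:.4f} g/cm^2")
```

Under the additive coding the slope is the estimated change in the trait per extra risk allele; swapping in the dominant or recessive coding tests the other two models with the same regression.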

Hardy-Weinberg Equilibrium (HWE)
- Variant with 2 alleles (A and B)
  - Frequency of allele A = p
  - Frequency of allele B = q
  - Thus: $p + q = 1$
- Punnett square of an AB x AB cross:

          A        B
  A       $p^2$    $pq$
  B       $pq$     $q^2$

- This results in the formula for HWE: $p^2 + 2pq + q^2 = 1$

  Genotype   Formula
  AA         $p^2$
  AB         $2pq$
  BB         $q^2$

Hardy-Weinberg Equilibrium (HWE) in practice
- Controls represent the general population
  - Thus they must be in HWE!
- What if they are not?
  - Most likely a genotyping error occurred in the lab. You can't use the data!
- Cases can be out of HWE. Why?
  - Usually at or near the disease-causing variant
  - The rest of the genome should be in HWE!

Hardy-Weinberg Equilibrium (HWE): example
- $p^2 + 2pq + q^2 = 1$: use it to calculate expected numbers

  Genotype        N
  AA ($p^2$)      30
  AB ($2pq$)      50
  BB ($q^2$)      20

- Let's calculate HWE: $p^2 = 30/100 = 0.30$, so $p = \sqrt{0.30} \approx 0.55$ and $q = 1 - 0.55 = 0.45$ (allele counting, $p = (2 \cdot 30 + 50)/200 = 0.55$, gives the same value)

  Genotype   Observed   Expected                                      $(O-E)^2/E$
  AA         30         $0.55^2 \cdot 100 = 30.25$                    0.002
  AB         50         $2 \cdot 0.45 \cdot 0.55 \cdot 100 = 49.5$    0.005
  BB         20         $0.45^2 \cdot 100 = 20.25$                    0.003

- $\chi^2 = 0.01$ (1 degree of freedom)
- P-value = 0.92
- This population is in HWE!
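The HWE example can be reproduced with a short Python sketch. It estimates p by allele counting, builds the expected counts from $p^2$, $2pq$ and $q^2$, and converts $\chi^2$ to a P-value using the fact that for 1 degree of freedom (3 genotype classes, minus 1, minus 1 estimated allele frequency) $P(\chi^2 > x) = \mathrm{erfc}(\sqrt{x/2})$. The function name `hwe_test` is hypothetical; with small genotype counts an exact test is often preferred over this chi-square approximation.

```python
import math


def hwe_test(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test for Hardy-Weinberg Equilibrium at one biallelic SNP.

    Returns (p, chi2, p_value), where p is the frequency of allele A.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency of A by allele counting
    q = 1 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [p**2 * n, 2 * p * q * n, q**2 * n]  # p^2, 2pq, q^2 times n
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


p, chi2, p_value = hwe_test(30, 50, 20)
print(f"p = {p:.2f}, chi2 = {chi2:.2f}, P = {p_value:.2f}")  # p = 0.55, chi2 = 0.01, P = 0.92
```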

HWE: another example
- Population of cats (n = 100): 16 white and 84 black
- White is a recessive trait (bb); black cats are BB or Bb
- What are the genotype frequencies in this population of cats?
  - White cats: $q^2 = 0.16$
  - Therefore: $q = \sqrt{0.16} = 0.4$
  - Thus: $p = 1 - q = 1 - 0.4 = 0.6$
  - Frequency of BB: $p^2 = 0.6^2 = 0.36$
  - Frequency of Bb: $2pq = 2 \cdot 0.6 \cdot 0.4 = 0.48$

Mendel's law of independent assortment
- Each pair of alleles segregates independently of the other pairs, and all possible combinations of alleles can occur in the resulting gametes
- More simply put: two SNPs (or traits) are inherited independently

What is linkage disequilibrium (LD)?
- Mendel got lucky!
- LD: alleles at adjacent loci co-occur more frequently than expected from the allele frequencies and the recombination rate
[Figure: SNP1, SNP2, SNP3]
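The cat example in a few lines of Python. Note the assumption it relies on: because BB and Bb cats look the same, q can only be recovered from the white cats by assuming the population is in HWE, so this calculation uses HWE rather than testing it.

```python
import math

n_white, n_total = 16, 100

q_sq = n_white / n_total  # frequency of bb (white, recessive) = q^2
q = math.sqrt(q_sq)       # frequency of allele b; valid only if the population is in HWE
p = 1 - q                 # frequency of allele B

genotype_freqs = {"BB": p**2, "Bb": 2 * p * q, "bb": q_sq}
for genotype, freq in genotype_freqs.items():
    print(f"{genotype}: {freq:.2f}")
# BB: 0.36
# Bb: 0.48
# bb: 0.16
```

As a check, BB + Bb = 0.36 + 0.48 = 0.84, matching the 84 black cats.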

What is the use of knowing about LD?
- Currently ~60 million SNPs are known in the human genome
- The latest array 'only' measures 5 million SNPs
- Is your disease-causing SNP one of the 5 million measured?

Where does LD come from?
- Alleles that exist in the population today arose through ancient mutation events
[Figure: sequence before mutation, after mutation 1 (M1, SNP1) and after mutation 2 (M2, SNP2); variant types: SNP, DIP, CNV]

Where does LD come from? Recombination?
- Recombination generates new arrangements of ancestral alleles
[Figure: haplotypes before and after recombination]
- Recombination is NOT random
  - Hot spots
  - In between hot spots, variants are in LD
    - They are correlated
    - Often measured as $r^2$
    - $r^2 = 1$: the two variants provide the same information
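A sketch of how LD is quantified between two biallelic SNPs, assuming phased haplotype counts are available (in practice they are usually estimated from genotypes). D is the observed frequency of haplotype AB minus the frequency expected under independence, $p_A p_B$, and $r^2 = D^2 / (p_A(1-p_A)\,p_B(1-p_B))$. The function `ld_stats` and the counts are hypothetical.

```python
def ld_stats(hap_counts):
    """D and r^2 between two biallelic SNPs from phased haplotype counts.

    Keys are two-letter haplotypes: the first letter is the allele at SNP1
    (A or a), the second the allele at SNP2 (B or b), e.g. {"AB": 60, "ab": 40}.
    """
    n = sum(hap_counts.values())
    f = {h: hap_counts.get(h, 0) / n for h in ("AB", "Ab", "aB", "ab")}
    p_A = f["AB"] + f["Ab"]   # frequency of allele A at SNP1
    p_B = f["AB"] + f["aB"]   # frequency of allele B at SNP2
    d = f["AB"] - p_A * p_B   # observed minus expected frequency of haplotype AB
    r2 = d**2 / (p_A * (1 - p_A) * p_B * (1 - p_B))  # undefined if a SNP is monomorphic
    return d, r2


for counts in ({"AB": 60, "ab": 40}, {"AB": 25, "Ab": 25, "aB": 25, "ab": 25}):
    d, r2 = ld_stats(counts)
    print(f"D = {d:.2f}, r2 = {r2:.2f}")
# D = 0.24, r2 = 1.00   (only AB and ab occur: complete LD)
# D = 0.00, r2 = 0.00   (all four haplotypes equally common: no LD)
```

The first case is the "$r^2 = 1$" situation on the slide: knowing the allele at SNP1 tells you the allele at SNP2, so genotyping one of them is enough.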

What is a haplotype?
- A combination of alleles at multiple loci that are transmitted together on the same chromosome
[Figure: ancestor and present-day chromosomes, with the region in LD]
- Genetic variation is structured in blocks of high LD

Genotype or haplotype?

                SNP1   SNP2   SNP3   SNP4   SNP5   SNP6
Maternal chr    A      C      T      A      C      T
Paternal chr    G      A      C      G      A      C

- Genotype: the pair of alleles at each SNP (e.g. A/G at SNP1)
- Haplotype: the alleles along one chromosome (e.g. ACTACT on the maternal chromosome)
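A small Python sketch of the genotype/haplotype distinction, using the two chromosomes from the slide. A genotype only records which two alleles are present at each SNP, not which chromosome carries which, so different pairs of haplotypes can give identical genotypes. The helper `unphased_genotypes` is hypothetical.

```python
def unphased_genotypes(hap1: str, hap2: str) -> list:
    """Per-SNP genotypes from two haplotypes; the phase (which chromosome
    carries which allele) is lost, so each allele pair is stored sorted."""
    return ["/".join(sorted(pair)) for pair in zip(hap1, hap2)]


maternal, paternal = "ACTACT", "GACGAC"  # haplotypes SNP1..SNP6 from the slide
genotypes = unphased_genotypes(maternal, paternal)
print(genotypes)  # ['A/G', 'A/C', 'C/T', 'A/G', 'A/C', 'C/T']

# Swapping the SNP1 alleles between chromosomes gives different haplotypes...
alt = unphased_genotypes("GCTACT", "AACGAC")
print(alt == genotypes)  # True: ...but exactly the same genotypes
```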

How to use haplotypes
[Figure: haplotypes of six SNPs, after Carlson et al. (2004) AJHG 74:106]
- Test for association with single-SNP tags (3 in total):
  - SNP 1 captures SNPs 1 + 2 (high $r^2$)
  - SNP 3 captures SNPs 3 + 5 (high $r^2$)
  - SNP 6 captures SNPs 4 + 6 (high $r^2$)
- Test for association with haplotype tags (2 in total):
  - SNP 1 and SNP 3, as before
  - The "AG" haplotype captures SNPs 4 + 6

How to use haplotypes: imputation
- The correlation between variants is used to 'guess' the genotypes of untyped variants

So where do these reference haplotypes come from?
- Large sample collections that have been genotyped/sequenced:
  - HapMap
  - 1000 Genomes Project
  - Local sequencing projects: UK10K, GoNL, …
  - Haplotype Reference Consortium
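To make the idea of imputation concrete, here is a deliberately simplified toy in Python: untyped positions ('?') in a target haplotype are filled in by majority vote among reference haplotypes that match it at every typed position. This exact-matching approach is an illustrative assumption, not how production imputation software works (those tools use probabilistic models of the reference haplotypes), and the reference panel and helper `impute` are hypothetical.

```python
from collections import Counter

# Hypothetical reference panel: phased haplotypes over four SNPs
REFERENCE = ["AGTC", "AGTC", "AGCC", "GACT", "GACT", "GACT"]


def impute(target: str, reference: list) -> str:
    """Fill untyped positions ('?') of a target haplotype by majority vote
    among reference haplotypes that match it at every typed position."""
    typed = [i for i, allele in enumerate(target) if allele != "?"]
    matches = [h for h in reference if all(h[i] == target[i] for i in typed)]
    if not matches:
        return target  # no compatible reference haplotype: leave the gaps
    filled = list(target)
    for i, allele in enumerate(target):
        if allele == "?":
            filled[i] = Counter(h[i] for h in matches).most_common(1)[0][0]
    return "".join(filled)


print(impute("A?T?", REFERENCE))  # AGTC
print(impute("G??T", REFERENCE))  # GACT
```

The toy only works because the SNPs are in strong LD within the reference panel; the larger and more representative the reference collection, the more often a typed pattern has informative matches.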

What about haplotypes themselves?
- APOE and Alzheimer's disease

In summary / Take-home messages
- Sampling must be representative of the source population
- The additive model is the primary model for effect estimation
- Hardy-Weinberg Equilibrium mostly warns you about genotyping errors
- Linkage Disequilibrium is the correlation between alleles at adjacent loci
- Haplotypes are combinations of alleles at adjacent loci on the same chromosome
- Haplotypes are used for imputation
- Don't forget that haplotypes themselves can also influence phenotypes

Questions?
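The APOE and Alzheimer's disease slide is a classic case of a haplotype influencing a phenotype: the APOE ε2, ε3 and ε4 alleles are defined by the combination of two SNPs, rs429358 and rs7412 (ε2 = T-T, ε3 = T-C, ε4 = C-C), and ε4 is the major common genetic risk factor for late-onset Alzheimer's disease. The Python sketch below, with a hypothetical helper `apoe_genotype`, shows why phase matters: two people with identical genotypes at both SNPs can carry different APOE alleles. Other allele combinations are simply labelled "rare" here.

```python
# Alleles at (rs429358, rs7412) on one chromosome -> APOE allele
APOE_ALLELES = {("T", "T"): "e2", ("T", "C"): "e3", ("C", "C"): "e4"}


def apoe_genotype(hap1: tuple, hap2: tuple) -> str:
    """APOE genotype from two phased (rs429358, rs7412) haplotypes."""
    alleles = sorted(APOE_ALLELES.get(h, "rare") for h in (hap1, hap2))
    return "/".join(alleles)


print(apoe_genotype(("T", "C"), ("C", "C")))  # e3/e4
# Both people below are T/C at rs429358 and C/T at rs7412 (identical genotypes),
# but their haplotypes, and therefore their APOE genotypes, differ:
print(apoe_genotype(("T", "T"), ("C", "C")))  # e2/e4
print(apoe_genotype(("T", "C"), ("C", "T")))  # e3/rare
```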
