genome wide association studies

Genome-wide association studies, Fernando Rivadeneira MD PhD


  1. Genome-wide association studies • Fernando Rivadeneira MD PhD (1,2) • (1) Department of Internal Medicine; (2) Department of Epidemiology • SNPs and Diseases, Molecular School of Medicine • Monday, November 12th, 2018

  2. Topic outline - Rationale GWAS Approach - Technology and QC - Study design - Study populations - Test for association - Population Stratification - Imputation (next talk) - Power - Phenotype definition - Follow-up GWAS signals

  3. Topic outline - Rationale GWAS Approach - Technology and QC - Study design - Study populations - Test for association - Population Stratification - Imputation (next talk) - Power - Phenotype definition - Follow-up studies and prospects

  4. What is linkage disequilibrium (LD)? • Co-occurrence of alleles at distinct/adjacent loci more frequently than expected by the allele frequencies and recombination rate • Allelic association depends on: 1) physical distance (debate?) 2) population history of sample 3) age of mutation/allele [Figure: SNP1 (G or A, G → A), SNP2 (C → T), SNP3 (G → A)]

  5. Identifying common variants associated with common traits and diseases is often targeted using the principles of: • Linkage disequilibrium mapping (association) • CD/CV: common disease, common variant [Figure: markers M1 (SNP1) and M2 (SNP2) in LD with a common causal variant D (SNP, DIP, CNV) underlying a common, complex trait]

  6. Linkage disequilibrium (LD) is the basis of the haplotype block structure. What is a haplotype? • Linear, ordered arrangement of alleles on a chromosome • Combination of alleles of different polymorphisms on a single chromosome [Figure: ancestor and present-day chromosomes, with the region in LD marked]

  7. Genetic variation is structured into blocks of high LD:

  8. LD Statistics in practice • r² is inversely related to the sample size of genetic association studies (factor 1/r²): 1,000 cases and 1,000 controls at r² = 1.0 correspond to 1,250 cases and 1,250 controls at r² = 0.80 • D′ is related to recombination history: D′ ~ 1 no recombination; D′ < 1 (0.8) historical recombination • D′ and r² are complementary: D′ = 1 when r² is low (i.e. 0.02)
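The statistics on slide 8 can be illustrated with a short calculation. The sketch below uses the standard two-locus definitions of D, D′ and r²; the function names and the example haplotype frequencies are hypothetical and chosen only to reproduce the slide's numbers (D′ = 1 with r² of about 0.02, and the 1/r² sample-size scaling).

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def required_n(n_at_r2_one, r2):
    """Sample size needed at a tag SNP to match the power at r^2 = 1 (scales by 1/r^2)."""
    return n_at_r2_one / r2


# Hypothetical example: a rare allele B (2%) found only on the A background (50%)
d, d_prime, r2 = ld_stats(p_ab=0.02, p_a=0.5, p_b=0.02)
print(round(d_prime, 3), round(r2, 3))   # no recombination seen, yet a weak correlation
print(round(required_n(1000, 0.80)))     # cases needed when the tag has r^2 = 0.80
```

The example shows why the two measures are complementary: D′ only asks whether all four haplotypes have been observed, whereas r² also depends on how well the allele frequencies match.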

  9. Haplotype structure in the absence of recombination • In the absence of recombination, the shape of the tree and where mutations fall on it determine patterns of haplotype structure • Two mutations on the same branch will be in complete association (r² = 1); mutations on different branches will have lower and often low association (e.g. r² = 0.04)

  10. LD information allows picking selected variants that “tag” variation in haplotypes • Six SNPs are tagged by 3 SNPs in total: SNP 1, SNP 3 and SNP 6 • Test for association: SNP 1, SNP 3 and SNP 6, each in high r² with the SNPs it tags [Figure: haplotype table of the six SNPs] After Carlson et al. (2004) AJHG 74:106

  11. LD information allows picking selected variants that “tag” variation in haplotypes • Six SNPs are tagged by 2 SNPs in total: SNP 1 and SNP 3 • Test for association: SNP 1 captures SNPs 1+2; SNP 3 captures SNPs 3+5; the “AG” haplotype captures SNPs 4+6 • Tags in a multi-marker test should be in high LD in order to avoid overfitting [Figure: haplotype table of the six SNPs]
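Pairwise tagging as on slide 10 can be sketched as a greedy search: repeatedly pick the SNP that covers the most still-untagged SNPs at a chosen r² threshold. This is an illustrative sketch in the spirit of that approach, not the published algorithm. The haplotype matrix is invented toy data (the slide's own table did not survive extraction), the threshold of 0.8 is an assumption, and the multi-marker (haplotype) tests of slide 11 are not covered.

```python
def r_squared(x, y):
    """r^2 between two SNPs coded 0/1 across the same set of haplotypes."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(a * b for a, b in zip(x, y)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def greedy_tags(haplotypes, threshold=0.8):
    """Return 0-based indices of tag SNPs covering every SNP at r^2 >= threshold."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    r2 = [[r_squared(columns[i], columns[j]) for j in range(n_snps)]
          for i in range(n_snps)]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs (ties go to the lowest index)
        best = max(range(n_snps),
                   key=lambda i: sum(r2[i][j] >= threshold for j in untagged))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Toy data (hypothetical): SNPs 1+2, 3+5 and 4+6 are perfectly correlated pairs
toy_haplotypes = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
print([i + 1 for i in greedy_tags(toy_haplotypes)])  # tag SNPs, 1-based labels
```

With this toy data three tags are needed, one per correlated pair; which member of each pair is chosen depends only on the tie-breaking rule.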

  12. Properties underlying the haplotype-block structure • Regions of extensive Linkage disequilibrium and reduced haplotype diversity • Within a block SNPs are not independent • Haplotype-tag SNPs (htSNPs) are the subset of SNPs that can capture most of the haplotype diversity

  13. Genetic architecture fully determined by allele frequency and penetrance (effect size) of variants Rivadeneira & Makitie TEM 2016

  14. Genome-wide association (GWA) combines the strongest properties of linkage (hypothesis-free) and association (power) designs • Genetic architecture of traits, plotted as effect size (small to big) against frequency of the genetic variant (rare to common) • Rare variants with big effects: rare, monogenic (linkage) • Common variants with big effects: few examples • Common variants with small effects: common, complex (association) • Rare variants with small effects: probably real (impossible to identify with current methods) • GWA: hypothesis-free approach. Modified from McCarthy et al., Nat Rev Genet 2008

  15. Genome-wide association (GWA) has been facilitated by the advent of: • Of 3,000,000,000 bases in the human genome, ~10,000,000 positions show variation • ~4,000,000 are catalogued as common variation • ~2,200,000 in CEU • ~80-90% are captured by typing 500K markers *from Mark McCarthy

  16. Topic outline - Rationale GWAS Approach - Technology and QC - Study design - Study populations - Test for association - Population Stratification - Imputation (next talk) - Power - Phenotype definition - Follow-up studies and prospects

  17. Microarray technology allows genotyping hundreds of thousands of SNPs per individual in a single effort… [Figure: genotype calls (AA, AB or BB) for SNP 1 to SNP 500,000]

  18. … which in the setting of large epidemiological studies allows the simultaneous testing of 2.5 million (imputed) markers for association with traits [Figure: genotype calls (AA, AB or BB) for SNP 1 to SNP 500,000]

  19. This first step of the GWA approach is merely a hypothesis-generating phase (with very few exceptions) [Figure: genotype calls (AA, AB or BB) for SNP 1 to SNP 500,000, and association results plotted by chromosome (1 to X)]

  20. The crucial step is replication, which allows building up evidence for association (genome-wide significance) • p<0.05 threshold results in ~20,000 hypotheses • Follow-up set: top SNPs • Meta-analysis of full datasets [Figure: genotype calls (AA, AB or BB) for SNP 1 to SNP 500,000, and association results plotted by chromosome (1 to X)]

  21. Only a select number of SNPs are expected to achieve REPLICATION, reaching a genome-wide significant level (i.e. 5 × 10⁻⁸) [Figure label: Population stratification]
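The genome-wide significance level of 5 × 10⁻⁸ is commonly motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. That count is a conventional assumption and is not stated on the slides; the function names below are hypothetical.

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_independent_tests


def expected_false_positives(alpha, n_independent_tests):
    """Expected number of null markers passing an uncorrected threshold alpha."""
    return alpha * n_independent_tests


# Assumed number of independent tests (a convention, not taken from the slides)
N_INDEPENDENT = 1_000_000
print(bonferroni_threshold(0.05, N_INDEPENDENT))
```

Calling `expected_false_positives(0.05, m)` for a panel of `m` independent markers shows why an uncorrected p<0.05 yields tens of thousands of hypotheses rather than findings.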

  22. Quality Control Genotyping

  23. Rotterdam Study datasets and QC methods description • Genotyped (GT) SNPs: RS-I 512,849; RS-II 466,389; RS-III 514,073 • Imputed SNPs: 2,543,887 • SNP filters: MAF > 1%; call rate > 98%; pHWE > 1 × 10⁻⁶ • Sample exclusions: sample call rate < 98%; missing DNA; gender mismatch; excess autosomal heterozygosity; duplicates or family relations (IBS > 97%); ethnic outliers (IBS distances > 4 SD); missing traits
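The SNP-level filters on slide 23 can be sketched for a single marker from its genotype counts. The thresholds are the ones listed on the slide; the function name is hypothetical, and the Hardy-Weinberg test used here is a 1-df chi-square goodness-of-fit test, which is an assumption because the slide does not say which HWE test was applied (exact tests are also common).

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing,
           maf_min=0.01, call_rate_min=0.98, hwe_p_min=1e-6):
    """Apply MAF, call-rate and HWE filters to one SNP given its genotype counts."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)      # frequency of allele A
    maf = min(p, 1 - p)
    # Genotype counts expected under Hardy-Weinberg equilibrium
    expected = (n_called * p * p, 2 * n_called * p * (1 - p), n_called * (1 - p) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    passed = maf > maf_min and call_rate > call_rate_min and hwe_p > hwe_p_min
    return {"maf": maf, "call_rate": call_rate, "hwe_p": hwe_p, "passed": passed}


# Hypothetical counts: a common SNP in HWE with no missing calls
print(snp_qc(n_aa=360, n_ab=480, n_bb=160, n_missing=0)["passed"])
```

Sample-level exclusions (call rate per sample, sex mismatch, heterozygosity, relatedness, ancestry outliers) need genome-wide data per individual and are not covered by this sketch.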

  24. Topic outline - Rationale GWAS Approach - Technology and QC - Study design - Study populations - Test for association - Population Stratification - Imputation (next talk) - Power - Phenotype definition - Follow-up GWAS signals

  25. Types of study designs for common variants • Relatedness base: family (extended pedigrees, pedigrees, trios, sibs); unrelated individuals • Sampling base: population-based; disease oriented (case/control, proband families) • Epidemiological base: case/control; cross-sectional; cohort (follow-up) • Phenotype base: case enrichment; extreme truncates; super/shared controls • Genetics base: genetic load enrichment; isolates (extended LD); ethnicity; admixture • Genotype platform base: staged approach (Gen); joint analysis (Imp)

  26. Example types of GWA studies • Disease-oriented case/control studies: WTCCC, FUSION • Disease-oriented population-based studies: FRAMINGHAM HEART STUDY • Population-based studies: ROTTERDAM STUDY, Generation R STUDY • Mega-GWAS: UKBIOBANK, MVP

  27. Most (if not all) GWA activities occur within CONSORTIA totalling tens to hundreds of thousands of participants [Logos: CHARGE; Rotterdam Study; GEnetic Factors of OSteoporosis; GENETIC INVESTIGATIONS OF ANTHROPOMETRIC TRAITS]

  28. Topic outline - Rationale GWAS Approach - Technology and QC - Study design - Study populations - Test for association - Population Stratification - Imputation (next talk) - Power - Phenotype definition - Follow-up GWAS signals

  29. Different genetic models do influence the power of the analysis but are difficult to determine a priori => To avoid multiple-testing problems, the first genetic analyses are usually run using additive models, which preserve power across different scenarios
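An additive model codes each genotype as the number of copies of the tested allele (0, 1 or 2) and estimates a single per-allele effect. The sketch below assumes a quantitative trait and ordinary least squares; the function name, the toy data and the normal approximation for the p-value are illustrative assumptions, and real analyses would also include covariates.

```python
import math


def additive_test(dosages, trait):
    """Regress a quantitative trait on allele dosage; return (beta, se, p_value)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx                      # per-allele effect
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)   # assumes the fit is not perfect (rss > 0)
    z = beta / se
    # Two-sided p-value from a normal approximation (assumption; small samples
    # would use the t distribution instead)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p_value


# Toy data (hypothetical): trait rises by about one unit per allele copy
beta, se, p = additive_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.9, 2.1, 3.0, 3.2])
print(round(beta, 3))
```

Dominant or recessive models would recode the dosage (for example 0/1/1 or 0/0/1) before the same regression, which is where the extra tests and the multiple-testing cost come from.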

  30. Statistical Methods • Traits: disease state or QT in natural units; QT -> standardized age-adjusted residuals from gender-stratified regression: Trait = α + β₁·Age + β₂·Age² • Imputation: MACH, IMPUTE, BIM-BAM, PLINK; r² > 0.3, ratio Obs/Exp variance > 0.01, MAF > 0.01, HWE?; minor allele from HapMap CEU (+) strand => Reference • Analysis: performed by each cohort: MACH2QTL/BIN, SNPTEST, ProbABEL, PLINK • Adjustment for population stratification => genomic control λ < 1.05, corrected SE = SE × √λ • Meta-analysis: METAL, PLINK, MetABEL; inverse variance weighted; standard: fixed effects • Heterogeneity: random effects for variants with I² > 50% • Significance: GWS α < 5 × 10⁻⁸ after double GC correction
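The meta-analysis steps on slide 30 can be sketched as follows: correct each cohort's standard error by genomic control, pool the per-cohort effects with inverse-variance weights (fixed effects), and compute I² to flag heterogeneous variants. This is a minimal illustration, not the METAL, PLINK or MetABEL implementation. The function names are hypothetical, and applying the correction only when λ exceeds 1 is an assumption about common practice.

```python
import math


def gc_correct_se(se, lam):
    """Genomic-control corrected standard error: SE * sqrt(lambda)."""
    # Assumption: no correction is applied when lambda is below 1
    return se * math.sqrt(max(lam, 1.0))


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of per-cohort effects."""
    weights = [1.0 / (s * s) for s in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q and the I^2 heterogeneity statistic, in percent
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided, normal approx.
    return {"beta": beta, "se": se, "p": p_value, "i2": i2}


# Hypothetical cohort results: (beta, SE, genomic-control lambda)
cohorts = [(0.10, 0.020, 1.02), (0.12, 0.030, 1.00), (0.08, 0.025, 1.04)]
betas = [b for b, _, _ in cohorts]
ses = [gc_correct_se(s, lam) for _, s, lam in cohorts]
result = ivw_fixed_effects(betas, ses)
print(result["i2"] > 50)  # whether the slide's rule would call for random effects
```

The "double GC correction" on the slide refers to correcting once per cohort and once more on the pooled results; the second pass would reuse `gc_correct_se` with the λ of the meta-analysis.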

  31. Topic outline - Rationale GWAS Approach - Technology and QC - Study design - Study populations - Test for association - Population Stratification - Imputation (next talk) - Power - Phenotype definition - Follow-up GWAS signals
