
Multiple Comparisons Methods in Genetic Epidemiology Studies - PowerPoint PPT Presentation



  1. Multiple Comparisons Methods in Genetic Epidemiology Studies
     Yi Ren Wang, MPH, Department of Epidemiology, UCLA School of Public Health

  2. Genetic Epidemiology Today
     • Genetic association studies have become more ambitious:
       - Early studies focused on one or a few candidate SNPs
       - Recent studies target many SNPs and haplotypes using high-throughput platforms

  3. Genome-wide Association Study
     • Large number of genetic variations involved
       - 1 test for each of 500,000 SNPs
       - 25,000 expected to be significant at p<0.05 by chance alone
     • To make things worse
       - Dominance (additive/dominant/recessive)
       - Epistasis (multiple combinations of SNPs)
       - Multiple phenotype definitions
       - Subgroup analyses
       - Multiple analytic methods
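
The 25,000 figure can be checked with a small simulation. This is a minimal sketch, assuming every SNP is truly null and the tests are independent, so each p-value is uniform on [0, 1). The function name `simulate_null_hits` and the seed are illustrative choices, not part of the original slides.

```python
import random

def simulate_null_hits(n_tests=500_000, alpha=0.05, seed=0):
    """Count tests that fall below alpha when every null hypothesis is true.

    Assumption: under the null, each p-value is an independent uniform draw.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)

hits = simulate_null_hits()
print("simulated chance findings:", hits)
print("expected chance findings:", 500_000 * 0.05)
```

The simulated count should land close to 500,000 × 0.05, with run-to-run scatter of a few hundred.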

  4. Motivating Example: DNA-DSBR Pathway and Lung & UADT Cancer Study

  5. Goal of the study
     • This study intends to cover the genetic variations across the whole DNA-DSBR pathway, in order to systematically reveal a full picture of how genetic polymorphisms in the double-strand break repair pathway alter the risks of lung cancer and UADT cancer
     • The potential gene-gene and gene-environment interactions will be explored

  6. Study Design
     • Population-based case-control study in Los Angeles
     • 611 new cases of lung cancer
     • 601 new cases of UADT cancer
     • 1040 cancer-free controls matched to cases by age (within 10-year categories) and gender

  7. Gene Selection
     • 19 genes involved in the DNA-DSBR pathway were selected for evaluation based on evidence for their role in either the homologous recombination repair (HR) or the non-homologous end joining (NHEJ) pathways.

  8. SNPs Selection
     • Known functional SNPs within the DNA double-strand break repair pathway were selected
     • As well as potential functional SNPs such as amino-acid-changing (nonsynonymous) SNPs (nsSNPs)
     • With a minor allele frequency (MAF) greater than 5%

  9. SNPs Selection
     • The 189 SNPs analyzed are in or near one of the 19 DNA-DSBR genes.

  10. Study Design
      • SAS 9.1 software will be used for data analysis.
      • ORs and 95% CLs will be computed using unconditional logistic regression
      • Potential confounding factors adjusted: age, gender, ethnicity, educational level and tobacco smoking for lung cancer; age, gender, ethnicity, educational level, tobacco smoking, alcohol drinking and diet for UADT cancer
      • A χ² test is performed to evaluate Hardy-Weinberg equilibrium.
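
The Hardy-Weinberg check for one biallelic SNP can be sketched as follows. The slides name SAS 9.1 as the analysis software; this Python version is only an illustration. It assumes the standard 1-df goodness-of-fit form of the test and that both alleles are observed. The function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square statistic, p-value) for a 1-df HWE goodness-of-fit test.

    Assumption: both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p                      # estimated frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, for illustration only
print(hwe_chi_square(100, 200, 100))
```

Since this test is itself repeated for every SNP, the multiple comparisons issue discussed later in the deck applies to it as well.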

  11. Stratified Analyses
      Lung Cancer:
      • Non-small cell lung carcinoma (NSCLC)
      • Small cell lung carcinoma (SCLC)
      Head and Neck Cancer:
      • Oral cancer
      • Pharyngeal cancer
      • Laryngeal cancer
      • Esophageal cancer

  12. Stratified and Multivariate Analyses
      • Interaction between DSBR and smoking for lung cancer
      • Interaction between DSBR and smoking for UADT cancer
      • Interaction between DSBR and alcohol drinking for UADT cancer
      • Haplotype analysis

  13. What are the Genetic Epidemiology Issues?
      • Population stratification
        - Variation of SNP frequency by ethnicity
        - Genomic control parameter will be calculated to assess the validity of the results
      • High dimensional data
        - Gene-environment interactions: interaction of host genetics with environment
        - Gene-gene interactions: interaction of different SNPs
      • Multiple comparisons
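
The slides do not say how the genomic control parameter is computed. One widely used estimator divides the median of the observed 1-df chi-square association statistics by the median of the chi-square distribution with 1 df (about 0.455). The sketch below assumes that estimator. The function name and the input statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(chi2_stats):
    """Median-based inflation factor (assumed estimator, not confirmed by the slides)."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

# Hypothetical per-SNP chi-square statistics, for illustration only
example_stats = [0.10, 0.30, 0.46, 0.90, 2.50]
print(genomic_control_lambda(example_stats))
```

A value near 1 suggests little inflation of the test statistics, while values well above 1 point to possible population stratification.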

  14. Multiple comparisons issue

  15. Hypothesis Testing
      • $H_0$: null hypothesis vs. $H_1$: alternative hypothesis
      • $T$: test statistic; $C$: critical value
      • If $|T| > C$, $H_0$ is rejected. Otherwise $H_0$ is retained
      • Ex) $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$, with $T = (\bar{x}_1 - \bar{x}_2) / \text{pooled SE}$. If $|T| > z_{1-\alpha/2}$, $H_0$ is rejected at the significance level $\alpha$
      • The critical value $C$ is set by $\alpha$
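
The decision rule in the example can be sketched as a two-sided z-test. This is a minimal illustration, assuming the pooled standard error is already known and the normal approximation holds. The function name and the input numbers are hypothetical.

```python
from statistics import NormalDist

def two_sided_z_test(mean1, mean2, pooled_se, alpha=0.05):
    """Return (T, critical value, reject H0?, two-sided p-value)."""
    std_normal = NormalDist()
    t = (mean1 - mean2) / pooled_se
    critical = std_normal.inv_cdf(1 - alpha / 2)   # z_(1 - alpha/2)
    p_value = 2 * (1 - std_normal.cdf(abs(t)))
    return t, critical, abs(t) > critical, p_value

# Hypothetical sample means and pooled SE, for illustration only
print(two_sided_z_test(10.0, 8.0, 1.0))
```

Rejecting when |T| exceeds the critical value is equivalent to rejecting when the p-value is below α.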

  16. Hypothesis Testing

      | Truth \ Hypothesis result | Retained      | Rejected     |
      |---------------------------|---------------|--------------|
      | H0                        |               | Type I error |
      | H1                        | Type II error |              |

      • Type I error rate = false positives ($\alpha$: significance level)
      • Type II error rate = false negatives
      • Power = 1 - Type II error rate
      • P-values: $p = \inf\{\alpha \mid H_0 \text{ is rejected at the significance level } \alpha\}$

  17. Issues in Multiple Comparison
      • Q: Given $m$ treatments, which two treatments are significantly different? (simultaneous testing)
        cf) Is treatment A different from treatment B?
      • Ex) $m$ treatment means: $\mu_1, \ldots, \mu_m$
        $H_{ij}: \mu_i = \mu_j$ where $i \neq j$, with $T_{ij} = (\bar{x}_i - \bar{x}_j) / \text{pooled SE}$
        - Type I error when testing $n$ independent comparisons one by one, each at the 0.05 significance level: $1 - (0.95)^n$
        - Inflated Type I error, ex) $\alpha = 1 - (0.95)^{10} = 0.401263$
        - Remedies: Bonferroni method, Type I error rate = $\alpha$ / # of comparisons

  18. Multiple Comparisons
      • Probability of finding at least one false association by chance = $1 - 0.95^n$
        - n = 10, p = 40%
        - n = 100, p = 99.4%
      • Our data:
        - 189 genotypes, 2 cancer sites, 10 subgroup analyses
        - n = 189 × (2 + 10) = 2268, p > 99.99999%
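
The percentages on this slide follow directly from the formula. The sketch below assumes all tests are independent and all null hypotheses are true, which is the setting the formula describes. The function name is illustrative.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive) across n independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

for n in (10, 100, 2268):
    print(n, prob_any_false_positive(n))
```

Tests on correlated SNPs or overlapping subgroups are not independent, so the formula is best read as a rough guide rather than an exact probability for the study data.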

  19. Type I Error Rates

      | Truth \ Hypothesis result | # retained | # rejected | Total |
      |---------------------------|------------|------------|-------|
      | H0                        | U          | V          | m0    |
      | H1                        | T          | S          | m1    |
      | Total                     | m - R      | R          | m     |

      • Per-comparison error rate (PCER) = $E(V) / m$
      • Per-family error rate (PFER) = $E(V)$
      • Family-wise error rate (FWER) = $\Pr(V \geq 1)$
      • False discovery rate (FDR) = $E(Q)$, where $Q = V/R$ if $R > 0$ and $Q = 0$ if $R = 0$
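
To make the table concrete, the sketch below tabulates U, V, T, S, R and Q for a single set of decisions. It assumes the truth of each hypothesis is known, which only happens in simulations. The error rates on the slide are expectations or probabilities over repeated experiments, so one run yields only a single realization of V and Q. The function name and the input vectors are hypothetical.

```python
def error_counts(is_true_null, rejected):
    """Tabulate the cells of the decision table for one set of test results."""
    pairs = list(zip(is_true_null, rejected))
    u = sum(1 for null, rej in pairs if null and not rej)      # true nulls retained
    v = sum(1 for null, rej in pairs if null and rej)          # false positives
    t = sum(1 for null, rej in pairs if not null and not rej)  # false negatives
    s = sum(1 for null, rej in pairs if not null and rej)      # true positives
    r = v + s                                                  # total rejections
    q = v / r if r > 0 else 0.0                                # false discovery proportion
    return {"U": u, "V": v, "T": t, "S": s, "R": r, "Q": q}

# Hypothetical truth and decisions for five hypotheses
print(error_counts([True, True, True, False, False],
                   [False, True, False, True, True]))
```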

  20. False Positives
      In the absence of bias, three factors determine the probability that a statistically significant finding is actually a false-positive finding:
      • the magnitude of the P value
      • statistical power
      • fraction of tested hypotheses that is true
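
These three factors combine through Bayes' rule. The sketch below assumes the form commonly used for the false positive report probability (the FPRP named later in the deck): α(1 − π) / (α(1 − π) + power × π). Here α is the significance threshold and π is the prior probability that a tested association is real. The function name and the input values are hypothetical.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a significant finding is a false positive.

    alpha: significance threshold applied to the P value
    power: probability of detecting a true association (1 - Type II error rate)
    prior: fraction of tested hypotheses that are true associations
    """
    false_positive_mass = alpha * (1 - prior)
    true_positive_mass = power * prior
    return false_positive_mass / (false_positive_mass + true_positive_mass)

# Hypothetical inputs: with a 1% prior, most significant findings are false
print(false_positive_report_probability(alpha=0.05, power=0.8, prior=0.01))
```

Lowering the threshold, raising power, or testing better-supported hypotheses each reduces this probability.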

  21. Multiple Comparisons
      • There is a lack of consensus regarding the optimal approach to address the false-positive probability of single nucleotide polymorphism (SNP) associations.

  22. Methods for Multiple Comparisons
      • Ignore it
      • Adjust p-values
        - Familywise error rate (FWER): chance of any false positives
        - False discovery rate (FDR), Benjamini et al 2001
      • Use Bayesian methods
        - False positive report probability (FPRP), Wacholder et al 2004
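
As an illustration of FDR-based adjustment, the sketch below implements the classic Benjamini-Hochberg step-up adjusted p-values. This is an assumption about which procedure is meant: the variant in the cited 2001 reference may differ, for example in how it handles dependence between tests. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, for illustration only
print(benjamini_hochberg_adjust([0.01, 0.04, 0.03, 0.005]))
```

Rejecting every hypothesis whose adjusted p-value falls below a chosen level q keeps the expected proportion of false discoveries at or below q when the tests are independent.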

  23. FWER controlling procedures
      • Bonferroni
        - adj Pvalue = min(n × Pvalue, 1)
      • Holm (1979)
      • Hochberg (1988)
      • Westfall & Young (1993) maxT and minP
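
The first three procedures can be written as adjusted p-value calculations. This is a minimal sketch using their standard textbook definitions, with hypothetical function names and p-values. The Westfall & Young maxT and minP procedures need permutation resampling of the raw data and are not sketched here.

```python
def bonferroni_adjust(pvalues):
    """Single-step: multiply every p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(n * p, 1.0) for p in pvalues]

def holm_adjust(pvalues):
    """Step-down: start from the smallest p-value, multipliers n, n-1, ..., 1."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, min((n - rank) * pvalues[i], 1.0))
        adjusted[i] = running_max
    return adjusted

def hochberg_adjust(pvalues):
    """Step-up: start from the largest p-value, multipliers 1, 2, ..., n."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank, i in enumerate(order):  # rank 0 is the largest p-value
        running_min = min(running_min, (rank + 1) * pvalues[i])
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, for illustration only
pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni_adjust(pvals))
print(holm_adjust(pvals))
print(hochberg_adjust(pvals))
```

For any input, the Hochberg values are no larger than the Holm values, which in turn are no larger than the Bonferroni values, so the three are ordered from most to least powerful.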

  24. Bonferroni correction
      • For testing 500,000 SNPs
        - 5,000 expected to be significant at p<0.01
        - 500 expected to be significant at p<0.001
        - …
        - 0.05 expected to be significant at p<0.0000001
      • Suggests setting significance level to $\alpha = 10^{-7}$*
      • Bonferroni correction for m tests: set significance level for p-values to $\alpha = 0.05 / m$
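
The arithmetic on this slide can be reproduced in a few lines. The sketch assumes all 500,000 SNPs are truly null, so the expected number of chance findings is the number of tests times the cutoff. The function name is illustrative.

```python
def bonferroni_threshold(m, fwer=0.05):
    """Per-test significance level that keeps the family-wise error rate at `fwer`."""
    return fwer / m

m = 500_000
for cutoff in (1e-2, 1e-3, 1e-7):
    # Expected number of null SNPs passing each cutoff
    print(cutoff, m * cutoff)

print("Bonferroni per-test level:", bonferroni_threshold(m))
```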

  25. Multiple Testing Procedures based on P-values that control the family-wise error rate
      • For a single hypothesis $H_1$: $p_1 = \inf\{\alpha \mid H_1 \text{ is rejected at the significance level } \alpha\}$. If $p_1 < \alpha$, $H_1$ is rejected. Otherwise $H_1$ is retained
      • Adjusted p-values for multiple testing ($p^*$): $p_j^* = \inf\{\alpha \mid H_j \text{ is rejected at FWER} = \alpha\}$. If $p_j^* < \alpha$, $H_j$ is rejected. Otherwise $H_j$ is retained
      • Single-step, step-down and step-up procedures
