Case-control studies and genetic association models



  1. Case-control studies and genetic association models
     www.biostat.ku.dk/~bxc/SDC-courses
     Bendix Carstensen (bxc@steno.dk, www.biostat.ku.dk/~bxc)
     Claus Thorn Ekstrøm (ekstrom@dina.kvl.dk, www.matfys.kvl.dk/~ekstrom)
     Steno Diabetes Center & Inst. f. Matematik og Fysik, KVL & Biostatistisk afdeling, KU
     December 2002

  2. Logarithms and exponentials
     Powers of 10:
     $10^2 = 10 \times 10$ and $10^3 = 10 \times 10 \times 10$
     $10^2 \times 10^3 = 10^5$
     $10^3 / 10^2 = 10^1$
     $(10^3)^2 = 10^6$
     $10^2 / 10^2 = 10^0 = 1$
     $10^2 / 10^3 = 10^{-1} = 1/10$
     $10^{1/2} \times 10^{1/2} = 10^1$, so $10^{1/2} = \sqrt{10}$
     Base-10 logarithms:
     $10^{0.3010} = 2$, so $\log_{10}(2) = 0.3010$
     $10^{0.4771} = 3$, so $\log_{10}(3) = 0.4771$
     $10^1 = 10$, so $\log_{10}(10) = 1$

  3. Multiplication and division
     Example: $2 \times 3 = 6$
     $\log_{10}(2) = 0.3010$ and $\log_{10}(3) = 0.4771$
     $0.3010 + 0.4771 = 0.7781$, and $\log_{10}(6) = 0.7781$
     $10^{0.3010} \times 10^{0.4771} = 10^{0.7781}$ and $10^{0.7781} = 6$
     In general:
     $\log(xy) = \log(x) + \log(y)$
     $\log(x/y) = \log(x) - \log(y)$
     $\log(x^a) = a \log(x)$
     $\log(1/x) = -\log(x)$
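The multiplication rule on this slide can be checked numerically. The following is a minimal sketch using only Python's standard `math` module; the variable names are illustrative and not part of the slides.

```python
import math

# Logarithms turn multiplication into addition:
# log10(2 * 3) = log10(2) + log10(3).
log2 = math.log10(2)   # roughly 0.3010
log3 = math.log10(3)   # roughly 0.4771
log6 = log2 + log3     # roughly 0.778

# Raising 10 to the summed logarithm recovers the product 2 * 3.
product = 10 ** log6

print(f"log10(2) + log10(3) = {log6:.4f}")
print(f"10 ** (log10(2) + log10(3)) = {product:.4f}")
```

The same pattern works for division (subtract the logarithms) and for powers (multiply the logarithm by the exponent).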

  4. Natural logarithms
     $e = 2.7183$, so $\log_e(e) = 1$
     $e^{0.6931} = 2$, so $\log_e(2) = 0.6931$
     $e^{1.0986} = 3$, so $\log_e(3) = 1.0986$
     $2 \times 3 = 6$: $e^{0.6931} \times e^{1.0986} = e^{1.7918}$ and $e^{1.7918} = 6$
     In general:
     $e^x = \exp(x)$
     $e^x \times e^y = e^{x+y}$
     $e^x / e^y = e^{x-y}$
     $(e^x)^y = e^{x \times y}$
     $1/e^x = e^{-x}$

  5. Names for the logarithms
     Engineers and calculators:
     • log is the logarithm to base 10.
     • ln is the logarithm to base $e$, the natural log.
     Mathematicians:
     • log is the logarithm to base $e$, the natural log.
     • $\log_{10}$ is the logarithm to base 10.

  6. Why natural logarithms?
     For small values of $x$ (relative to 1):
     $e^x \approx 1 + x$ and $e^{-x} \approx 1 - x$
     $\Rightarrow \ln(1 + x) \approx x$ and $\ln(1 - x) \approx -x$
     Examples:
     $\ln(1.01) \approx 0.01$
     $\ln(0.99) \approx -0.01$
     $\ln(1.04) \approx 0.04$
     $\ln(1.20) = 0.182 \neq 0.20$
     But with base 10:
     $\log_{10}(1.01) \approx 0.4343 \times 0.01$
     $\log_{10}(0.99) \approx 0.4343 \times (-0.01)$
     $\log_{10}(x) = 0.4343 \times \ln(x)$
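How quickly the approximation $\ln(1 + x) \approx x$ degrades can be seen by tabulating a few values. This is a small sketch with the standard `math` module; the helper name `relative_error` is hypothetical and chosen only for illustration.

```python
import math

def relative_error(x):
    """Relative error made by using x in place of ln(1 + x)."""
    exact = math.log(1 + x)
    return (x - exact) / exact

# The approximation is good for small x and noticeably off at x = 0.20.
for x in (0.01, 0.04, 0.20):
    print(f"x = {x:.2f}  ln(1+x) = {math.log(1 + x):.4f}  "
          f"relative error = {relative_error(x):.1%}")

# Base-10 and natural logarithms differ by the constant factor log10(e).
factor = math.log10(math.e)
print(f"log10(e) = {factor:.4f}")
```

The constant `factor` is the 0.4343 that appears on the slide, which is why small relative changes read off directly on the natural log scale but not on the base-10 scale.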

  7. Hypothesis tests in statistical analysis
     For two populations the hypothesis of equal means is normally formulated as:
     $H_0: \mu_1 = \mu_2 \Leftrightarrow \delta = \mu_1 - \mu_2 = 0$
     Statisticians would consider two models:
     Model 1: $x_{i1} \sim N(\mu_1, \sigma^2)$, $x_{i2} \sim N(\mu_2, \sigma^2)$
     Model 2: $x_{i1} \sim N(\mu, \sigma^2)$, $x_{i2} \sim N(\mu, \sigma^2)$
     In this context $H_0$ becomes: can model 1 be reduced to model 2?
     Hypothesis testing is comparison of models.

  8. Comparing statistical models
     • Can a complicated model be reduced to one describing data in a simpler fashion? This is the kind of model that one would like to see accepted.
     • Can a model be reduced to a model that describes data as not varying with exposure / treatment? This is the kind of model that one would like to see rejected.
     The relevance of $p < 0.05$ depends on context.

  9. Probability
     In all scientific studies the outcome is subject to random variation.
     In case-control studies and association studies, outcomes and exposures are discrete:
     • Case / Control
     • Genotype: aa / aA / AA
     "Measurement" error is described by probabilities for each possible outcome.
     Section: Probability trees

  10. The binary probability model
     Probability tree with two branches:
     • F (failure, case) with probability $\pi$
     • S (survival, control) with probability $1 - \pi$
     The risk parameter: $\pi$ (pi).
     The odds parameter: $\omega$ (omega).
     $\omega = \frac{\pi}{1 - \pi} \Leftrightarrow \pi = \frac{\omega}{1 + \omega}$
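The two parametrizations are interchangeable, which a pair of small functions makes concrete. This is a minimal sketch; the function names and the example risk of 0.2 are illustrative assumptions, not taken from the slides.

```python
def risk_to_odds(pi):
    """Convert a risk (probability) pi into odds omega = pi / (1 - pi)."""
    return pi / (1 - pi)

def odds_to_risk(omega):
    """Convert odds omega back into a risk pi = omega / (1 + omega)."""
    return omega / (1 + omega)

# Hypothetical example: a risk of 0.2 corresponds to odds of 1 to 4.
example_risk = 0.2
example_odds = risk_to_odds(example_risk)
print(f"risk {example_risk} -> odds {example_odds:.4f}")
print(f"odds {example_odds:.4f} -> risk {odds_to_risk(example_odds):.4f}")
```

For small risks the odds and the risk are nearly equal, since $1 - \pi$ is then close to 1.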

  11. Conditional probabilities of failure
     Probability tree for exposed individuals ($E_1$):
     • F with probability 0.015
     • S with probability 0.985
     Probability tree for unexposed individuals ($E_0$):
     • F with probability 0.005
     • S with probability 0.995
     $P\{F \mid E_1\} = 0.015$ and $P\{F \mid E_0\} = 0.005$
     The risk for exposed individuals is increased by a factor of $0.015 / 0.005 = 3.0$ relative to unexposed individuals.

  12. Conditional probabilities of failure
     Probability tree by genotype:
     • aa (probability $p_{aa}$): F with probability $\pi_{aa}$, otherwise S
     • aA (probability $p_{aA}$): F with probability $\pi_{aA}$, otherwise S
     • AA (probability $p_{AA}$): F with probability $\pi_{AA}$, otherwise S
     $p_{aa}$ is the probability that a person has genotype aa.
     $\pi_{aa}$ is the conditional probability of failure given genotype aa.
     $p_{aa} \times \pi_{aa}$ is the probability that a person has genotype aa and fails.

  13. Relationship between follow-up studies and case-control studies
     In a cohort study, the relationship between exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups.
     The follow-up allows the investigator to register those subjects who develop the disease during the study period and to identify those who remain free of the disease.
     Section: Case-control studies

  14. In a case-control study the subjects who develop the disease (the cases) are registered by some other mechanism than follow-up, and a group of healthy subjects (the controls) is used to represent the subjects who do not develop the disease.

  15. Rationale behind case-control studies
     • In a follow-up study, rates among exposed and non-exposed are estimated by:
       $\frac{D_1}{Y_1}$ and $\frac{D_0}{Y_0}$
       where $D$ is the number of events and $Y$ the person-years.
       The rate ratio is estimated by:
       $\frac{D_1}{Y_1} \Big/ \frac{D_0}{Y_0} = \frac{D_1}{D_0} \Big/ \frac{Y_1}{Y_0}$
       It is necessary to classify both cases and person-years by exposure.

  16. • In a case-control study we use the same cases, but select controls to represent the distribution of risk time between exposed and unexposed:
       $\frac{H_1}{H_0} \approx \frac{Y_1}{Y_0}$
       Therefore the rate ratio is estimated by:
       $\frac{D_1}{D_0} \Big/ \frac{H_1}{H_0}$
     • Controls represent risk time, not disease-free persons.

  17. Case-control probability tree
     | Exposure | Failure | Selection | Probability |
     |---|---|---|---|
     | $E_1$ ($p$) | F ($\pi_1$) | Case, selected with probability $S_1$ | $p \pi_1 \times S_1$ ($D_1$) |
     | $E_1$ ($p$) | S ($1 - \pi_1$) | Control, selected with probability $s_1$ | $p (1 - \pi_1) \times s_1$ ($H_1$) |
     | $E_0$ ($1 - p$) | F ($\pi_0$) | Case, selected with probability $S_0$ | $(1 - p) \pi_0 \times S_0$ ($D_0$) |
     | $E_0$ ($1 - p$) | S ($1 - \pi_0$) | Control, selected with probability $s_0$ | $(1 - p)(1 - \pi_0) \times s_0$ ($H_0$) |
     The remaining branches (not selected) have probabilities $1 - S_1$, $1 - s_1$, $1 - S_0$ and $1 - s_0$.

  18. The case-control ratio (disease odds):
     $\frac{D_1}{H_1} = \frac{S_1}{s_1} \times \frac{\pi_1}{1 - \pi_1}$ and $\frac{D_0}{H_0} = \frac{S_0}{s_0} \times \frac{\pi_0}{1 - \pi_0}$
     $\mathrm{OR}_{\text{study}} = \frac{D_1 / H_1}{D_0 / H_0} = \frac{\pi_1 / (1 - \pi_1)}{\pi_0 / (1 - \pi_0)} = \mathrm{OR}_{\text{population}}$
     but only if $S_1 / s_1 = S_0 / s_0$, i.e. if sampling fractions are independent of exposure: $S_1 = S_0$ and $s_1 = s_0$.
     • $S$: sampling fraction for cases (large)
     • $s$: sampling fraction for controls (small)
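The cancellation of sampling fractions can be illustrated with expected counts from the probability tree. This is a sketch under stated assumptions: the risks 0.015 and 0.005 are the ones used in the earlier conditional-probability slide, while the exposure prevalence `p = 0.3`, the sampling fractions and the function names are hypothetical values picked only for illustration.

```python
def expected_counts(p, pi1, pi0, S1, s1, S0, s0, n=1_000_000):
    """Expected numbers of sampled cases (D) and controls (H) by exposure."""
    d1 = n * p * pi1 * S1              # exposed cases
    h1 = n * p * (1 - pi1) * s1        # exposed controls
    d0 = n * (1 - p) * pi0 * S0        # unexposed cases
    h0 = n * (1 - p) * (1 - pi0) * s0  # unexposed controls
    return d1, h1, d0, h0

def odds_ratio(d1, h1, d0, h0):
    """Odds ratio computed from the four case-control counts."""
    return (d1 / h1) / (d0 / h0)

pi1, pi0 = 0.015, 0.005
or_population = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))

# Sampling fractions independent of exposure: the study OR equals the population OR.
or_study = odds_ratio(*expected_counts(0.3, pi1, pi0, S1=0.9, s1=0.01, S0=0.9, s0=0.01))

# Exposed controls sampled twice as often: the study OR is biased by (S1/s1)/(S0/s0).
or_biased = odds_ratio(*expected_counts(0.3, pi1, pi0, S1=0.9, s1=0.02, S0=0.9, s0=0.01))

print(f"population OR = {or_population:.3f}")
print(f"study OR, exposure-independent sampling = {or_study:.3f}")
print(f"study OR, exposure-dependent sampling   = {or_biased:.3f}")
```

Note that the exposure prevalence `p` cancels as well, so the result does not depend on the value assumed for it.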

  19. Estimation from case-control study
     The odds-ratio of disease between exposed and unexposed, given inclusion in the study:
     $\mathrm{OR} = \frac{\omega_1}{\omega_0} = \frac{\pi_1}{1 - \pi_1} \Big/ \frac{\pi_0}{1 - \pi_0}$
     is the same as the odds-ratio of disease between exposed and unexposed in the "study base", provided that the selection mechanism (sampling fractions) depends only on case/control status.

  20. Log-likelihood for case-control studies
     Likelihood: probability of the observed data given the statistical model.
     The log-likelihood (conditional on being included) is a binomial likelihood with odds $\omega_0$ and $\omega_1 = \theta \omega_0$:
     $D_0 \ln(\omega_0) - N_0 \ln(1 + \omega_0) + D_1 \ln(\theta \omega_0) - N_1 \ln(1 + \theta \omega_0)$
     where $N_0 = D_0 + H_0$ and $N_1 = D_1 + H_1$ are the numbers of subjects in each exposure group.
     The odds-ratio ($\theta$) is the ratio of $\omega_1$ to $\omega_0$, so:
     $\ln(\theta) = \ln(\omega_1) - \ln(\omega_0)$
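The log-likelihood can be written out directly and checked to peak at the empirical odds. This is a minimal sketch, assuming that $N_j = D_j + H_j$ (cases plus controls in exposure group $j$), which is the usual binomial convention but is an assumption here. The counts are taken from the Kir 6.2 table later in the slides; the function name and the perturbation grid are illustrative choices.

```python
import math

def log_likelihood(omega0, theta, d0, n0, d1, n1):
    """Binomial log-likelihood with odds omega0 (unexposed) and theta * omega0 (exposed)."""
    omega1 = theta * omega0
    return (d0 * math.log(omega0) - n0 * math.log(1 + omega0)
            + d1 * math.log(omega1) - n1 * math.log(1 + omega1))

# Counts from the Kir 6.2 example: cases (D) and controls (H) by genotype group.
d1, h1 = 134, 124   # KK
d0, h0 = 669, 738   # EE/EK
n1, n0 = d1 + h1, d0 + h0  # assumption: N = D + H

# Candidate maximum: the empirical odds and their ratio.
omega0_hat = d0 / h0
theta_hat = (d1 / h1) / omega0_hat
ll_max = log_likelihood(omega0_hat, theta_hat, d0, n0, d1, n1)

# Log-likelihood at parameter values 10% away from the candidate maximum.
ll_nearby = [
    log_likelihood(omega0_hat * a, theta_hat * b, d0, n0, d1, n1)
    for a in (0.9, 1.0, 1.1)
    for b in (0.9, 1.0, 1.1)
    if (a, b) != (1.0, 1.0)
]

print(f"theta_hat = {theta_hat:.3f}, log-likelihood = {ll_max:.3f}")
print(f"largest nearby log-likelihood = {max(ll_nearby):.3f}")
```

Every perturbed value gives a lower log-likelihood, consistent with $D/H$ being the maximum likelihood estimate of each odds.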

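  21. Estimates of $\ln(\omega_1)$ and $\ln(\omega_0)$ are:
     $\ln\left(\frac{D_1}{H_1}\right)$ and $\ln\left(\frac{D_0}{H_0}\right)$
     with standard errors:
     $\sqrt{\frac{1}{D_1} + \frac{1}{H_1}}$ and $\sqrt{\frac{1}{D_0} + \frac{1}{H_0}}$
     Exposed and unexposed form two independent bodies of data, so the estimate of $\ln(\theta)$ [$= \ln(\mathrm{OR})$] is
     $\ln\left(\frac{D_1}{H_1}\right) - \ln\left(\frac{D_0}{H_0}\right)$, with $\mathrm{s.e.} = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$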

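  22. Computing c.i. for odds-ratios
     $\widehat{\mathrm{OR}} = \frac{D_1 / H_1}{D_0 / H_0}$ and $\mathrm{s.e.}[\ln(\mathrm{OR})] = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$
     95% c.i. for $\ln(\mathrm{OR})$:
     $\ln(\mathrm{OR}) \pm 1.96 \times \mathrm{s.e.}[\ln(\mathrm{OR})]$
     95% c.i. for OR by taking the exponential:
     $\mathrm{OR} \overset{\times}{\div} \exp(1.96 \times \mathrm{s.e.}[\ln(\mathrm{OR})])$
     where $\exp(1.96 \times \mathrm{s.e.}[\ln(\mathrm{OR})])$ is the error factor.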

  23. Kir 6.2 homozygotes and diabetes
     | Genotype | Diabetes cases | Population controls |
     |---|---|---|
     | KK | 134 | 124 |
     | EE/EK | 669 | 738 |
     What is the odds-ratio of diabetes associated with being homozygous for the K-allele? This compares persons with genotype KK against EE and EK taken as one group.
     How precisely is this odds-ratio determined?

  24. $\mathrm{OR} = \frac{D_1 / H_1}{D_0 / H_0} = \frac{134 / 124}{669 / 738} = \frac{1.081}{0.907} = 1.192 = 1.19$
     $\mathrm{s.e.}(\ln[\mathrm{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{134} + \frac{1}{124} + \frac{1}{669} + \frac{1}{738}} = 0.136$
     The 95% limits for the odds-ratio are:
     $\mathrm{OR} \overset{\times}{\div} \exp(1.96 \times 0.136) = 1.192 \overset{\times}{\div} 1.304 = (0.91 - 1.55)$
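The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using the counts from the Kir 6.2 table; the function name `odds_ratio_ci` and its return layout are illustrative assumptions rather than anything prescribed by the slides.

```python
import math

def odds_ratio_ci(d1, h1, d0, h0, z=1.96):
    """Odds ratio, s.e. of ln(OR), error factor and confidence limits."""
    or_hat = (d1 / h1) / (d0 / h0)
    se_log_or = math.sqrt(1 / d1 + 1 / h1 + 1 / d0 + 1 / h0)
    error_factor = math.exp(z * se_log_or)
    # Limits are obtained by dividing and multiplying by the error factor.
    return or_hat, se_log_or, error_factor, or_hat / error_factor, or_hat * error_factor

# Kir 6.2 data: KK cases/controls versus EE/EK cases/controls.
or_hat, se_log_or, error_factor, lower, upper = odds_ratio_ci(134, 124, 669, 738)

print(f"OR = {or_hat:.3f}")
print(f"s.e.(ln OR) = {se_log_or:.3f}")
print(f"error factor = {error_factor:.3f}")
print(f"95% c.i. = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale, it is symmetric around the estimate in ratio terms: the upper limit is as many times above 1.192 as the lower limit is below it.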
