Ph.D. course in epidemiology: Fall 2012.
Analysis of cohort studies.
C & H, Ch. 6, 14-15.
18 September 2012
www.biostat.ku.dk/~nk/epiE12
Per Kragh Andersen

Confounding
• Epidemiology relies on observational studies or experiments of nature.
• Often these are poor experiments: there is no control for confounding by extraneous influences.
• Definition: A confounder is a variable whose influence we would have controlled if we had been able to design the natural experiment.

Example: confounding by age, Fig. 14.1
[Figure: probability trees by age group for unexposed and exposed subjects; F = failure, S = survival]

Age group   Proportion of unexposed   Proportion of exposed   P(F)   P(S)
< 55        0.8                       0.4                     0.1    0.9
55+         0.2                       0.6                     0.3    0.7

• Probability of failure for unexposed: (0.8 × 0.1) + (0.2 × 0.3) = 0.14
• Probability of failure for exposed: (0.4 × 0.1) + (0.6 × 0.3) = 0.22
• Difference entirely due to difference in age structure.
• When there is a true effect, its magnitude can be distorted by such influences.
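The two failure probabilities in the age example are weighted averages of the age-specific probabilities, with the group's own age distribution as weights. A minimal sketch of that arithmetic, where the function and variable names are illustrative and not taken from the course material:

```python
def marginal_failure_probability(age_weights, failure_probs):
    """Weighted average of age-specific failure probabilities."""
    return sum(w * p for w, p in zip(age_weights, failure_probs))


# Age-specific failure probabilities for (< 55, 55+); the same in both groups.
failure_probs = [0.1, 0.3]

# Age distributions (< 55, 55+) of the two groups, from Fig. 14.1.
p_unexposed = marginal_failure_probability([0.8, 0.2], failure_probs)
p_exposed = marginal_failure_probability([0.4, 0.6], failure_probs)

print(round(p_unexposed, 2), round(p_exposed, 2))
```

Because the age-specific probabilities are identical, any difference between the two results comes from the weights alone.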
Confounding when RR = 2
[Figure: probability trees by age group for unexposed and exposed subjects; F = failure, S = survival]

            Unexposed subjects          Exposed subjects
Age group   Proportion   P(F)   P(S)    Proportion   P(F)   P(S)
< 55        0.8          0.1    0.9     0.4          0.2    0.8
55+         0.2          0.2    0.8     0.6          0.4    0.6

Results.
• The true relative risk: $\mathrm{RR}_T = 0.2/0.1 = 0.4/0.2 = 2$
• Probability of failure for unexposed: (0.8 × 0.1) + (0.2 × 0.2) = 0.12
• Probability of failure for exposed: (0.4 × 0.2) + (0.6 × 0.4) = 0.32
• The apparent relative risk: $\mathrm{RR}_O = 0.32/0.12 = 2.67$

Confounding: schematically.
A variable C is a potential confounder for the relation E → O if it is
• 1) related to the exposure: E − C
• 2) an independent risk factor for the outcome: C → O
• 3) not a consequence of the exposure (that is, not E → C → O)
That is:
E − C
 ↘ ↙
  O

Confounding
A confounder is:
• associated with outcome: e.g., older persons have higher disease probability,
• associated with the exposure: e.g., older persons are more / less likely to be exposed,
• not a result of exposure, i.e. not an intermediate variable.
Not a statistical property; cannot be seen from tables; common sense is required!
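The RR = 2 example can be checked numerically: the relative risk is 2 within each age group, yet the crude comparison gives a larger value because the exposed group is older. A minimal sketch, with illustrative names that are not part of the course material:

```python
def marginal_risk(age_weights, risks):
    """Crude risk: age-specific risks averaged over the group's own age mix."""
    return sum(w * r for w, r in zip(age_weights, risks))


# Age-specific failure probabilities for (< 55, 55+).
risk_unexposed = [0.1, 0.2]
risk_exposed = [0.2, 0.4]

# Age-specific (true) relative risks: 2 in both age groups.
true_rr = [e / u for e, u in zip(risk_exposed, risk_unexposed)]

# Crude (apparent) relative risk, using each group's own age distribution.
apparent_rr = marginal_risk([0.4, 0.6], risk_exposed) / marginal_risk(
    [0.8, 0.2], risk_unexposed
)

print([round(rr, 2) for rr in true_rr], round(apparent_rr, 2))
```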
Confounding.
The problem is that we do not always get a fair comparison between exposed and non-exposed.
[Figure: age composition (Young / Old) of the EXPOSED and NON-EXPOSED groups]
A randomly selected exposed person tends to be older than a randomly chosen non-exposed person.

Controlling confounding, Sect. 14.2
In controlled experiments there are two ways of controlling confounding:
1. Randomization of subjects to experimental groups so that the distributions of the confounder are the same.
2. Hold the confounder constant.

Standardization is a classical statistical technique for controlling for extraneous variables (in particular: age) in the analysis of an observational study.
1. Direct standardization simulates randomization by equalizing the distribution of extraneous variables.
2. Indirect standardization simulates the second method: holding extraneous variables constant.
We first discuss direct standardization and then later turn to the main ways of “holding the confounder constant”:
• stratified (“Mantel-Haenszel”) analysis
• or (more importantly) regression analysis: logistic, Poisson, Cox.

Direct standardization, Sect. 14.3
1. Estimate age-specific rates (or risks) in each group,
2. Calculate marginal rates (risks) if the age distribution were fixed to that of some agreed standard population. A standard population is another term for a common age-distribution.
3. Direct standardization is good for illustrative purposes as it provides absolute rates.
[Figure: the probability trees of Fig. 14.1 for unexposed and exposed subjects, with age-specific failure probabilities 0.1 (< 55) and 0.3 (55+) in both groups]
Marginal failure probability (with 50-50 age distribution) is (0.5 × 0.1) + (0.5 × 0.3) = 0.2 for both groups.

The Diet data

Current   Exposed (< 2750 kcal)      Unexposed (≥ 2750 kcal)
age       D     Y        Rate        D     Y        Rate       RR
40–49     2     311.9    6.41        4     607.9    6.58       0.97
50–59     12    878.1    13.67       5     1271.1   3.93       3.48
60–69     14    667.5    20.97       8     888.9    9.00       2.33
Total     28    1857.5   15.07       17    2768.9   6.14       2.46

(D = number of cases, Y = person-years, Rate = 1000 × D / Y.)

Direct standardization in the diet data.
We can standardize the age-specific rates to a population with equal numbers of person-years in each age group.
Exposed: (1/3 × 6.41) + (1/3 × 13.67) + (1/3 × 20.97) = 13.68
Unexposed: (1/3 × 6.58) + (1/3 × 3.93) + (1/3 × 9.00) = 6.50
Estimate of rate ratio is 13.68 / 6.50 = 2.10.

Choice of weights
• Sometimes the overall age structure of the whole study is used.
• Use of a standard age structure can facilitate comparison with other work.
• In cancer epidemiology, standard populations approximating the European, US or World population age-distribution are used.
• Equal weights essentially give a comparison between cumulative rates in the two groups.
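The direct standardization of the diet data can be sketched in a few lines. The counts and person-years are those of the diet data table; the function name, the variable names, and the scaling per 1000 person-years (inferred from the tabulated rates) are assumptions made for this illustration. Rates are not rounded before averaging, so the last digit may differ from a hand calculation.

```python
# (cases D, person-years Y) for age bands 40-49, 50-59, 60-69.
exposed = [(2, 311.9), (12, 878.1), (14, 667.5)]
unexposed = [(4, 607.9), (5, 1271.1), (8, 888.9)]


def standardized_rate(strata, weights, per=1000):
    """Weighted average of age-specific rates (cases per `per` person-years)."""
    return sum(w * per * d / y for w, (d, y) in zip(weights, strata))


# Standard population with equal person-years in each age band.
weights = [1 / 3, 1 / 3, 1 / 3]

rate_exposed = standardized_rate(exposed, weights)
rate_unexposed = standardized_rate(unexposed, weights)
rate_ratio = rate_exposed / rate_unexposed

print(round(rate_exposed, 2), round(rate_unexposed, 2), round(rate_ratio, 2))
```

Other standard populations only change `weights`; the weights should sum to 1 so that the result stays on the scale of a rate.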
Stratified (Mantel-Haenszel) analysis, Ch. 15.
• Aim is to hold age constant.
• Compare exposed and unexposed persons within age strata.
• Compute a combined estimate of effect over all strata.
• This implies a model in which there is no (systematic) variation of effect over strata.
• If estimates are similar we combine them, by a suitable average.

If the effect of exposure is the same in all age-strata, we can re-parameterize rates as:

          Exposed                            Unexposed
Age       Low energy                         High energy      Rate Ratio
40–49     $\lambda^0_1 = \theta\lambda^0_0$  $\lambda^0_0$    $\theta$
50–59     $\lambda^1_1 = \theta\lambda^1_0$  $\lambda^1_0$    $\theta$
60–69     $\lambda^2_1 = \theta\lambda^2_0$  $\lambda^2_0$    $\theta$

This is the proportional hazards model:
For every stratum $a$: $\lambda^a_1 = \theta\lambda^a_0$.
$\theta$ is the effect of exposure “controlled for” age.

Data

                   Exposed               Unexposed
Age (a)            Low energy (1)        High energy (0)
40–49 (a = 0)      $D_{10}, Y_{10}$      $D_{00}, Y_{00}$
50–59 (a = 1)      $D_{11}, Y_{11}$      $D_{01}, Y_{01}$
60–69 (a = 2)      $D_{12}, Y_{12}$      $D_{02}, Y_{02}$

The Mantel-Haenszel estimate
The MH-estimate for $\theta$ is (the weighted average):

$$\theta_{MH} = \frac{\sum_a Q_a}{\sum_a R_a} = \frac{Q}{R}, \qquad Q_a = \frac{D_{1a} Y_{0a}}{Y_{0a} + Y_{1a}}, \qquad R_a = \frac{D_{0a} Y_{1a}}{Y_{0a} + Y_{1a}}.$$

This may be calculated by hand.
Note that only $\theta$ is estimated, not the $\lambda$’s.
Maximum likelihood estimation of all parameters: later.
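The Mantel-Haenszel estimate is easy to compute directly from the stratum counts. A minimal sketch using the diet data from the earlier table; the tuple layout and the function name are choices made for this illustration:

```python
# One tuple per age band: (D1, Y1, D0, Y0) = exposed cases, exposed
# person-years, unexposed cases, unexposed person-years.
strata = [
    (2, 311.9, 4, 607.9),    # 40-49
    (12, 878.1, 5, 1271.1),  # 50-59
    (14, 667.5, 8, 888.9),   # 60-69
]


def mantel_haenszel_rate_ratio(strata):
    """theta_MH = sum_a Q_a / sum_a R_a for stratified rate data."""
    q = sum(d1 * y0 / (y0 + y1) for d1, y1, d0, y0 in strata)
    r = sum(d0 * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
    return q / r


theta_mh = mantel_haenszel_rate_ratio(strata)
print(round(theta_mh, 2))
```

With a single stratum the expression reduces to the ordinary rate ratio (D1/Y1) / (D0/Y0), which is a quick sanity check on the weights.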
An approximate confidence interval for $\theta$ can be obtained by using a standard error for $\log(\hat\theta)$ and then calculating the error factor in the usual way:

$$\mathrm{sd}(\log(\theta_{MH})) = \sqrt{\frac{V}{QR}}$$

where

$$V = \sum_a V_a = \sum_a (D_{0a} + D_{1a}) \frac{Y_{0a} Y_{1a}}{(Y_{0a} + Y_{1a})^2}.$$

(NB: calculations by hand).

The Mantel-Haenszel test
The Mantel-Haenszel test for no exposure effect is:

$$U^2 / V$$

where

$$U = \sum_a U_a$$

and

$$U_a = D_{1a} - (D_{0a} + D_{1a}) \frac{Y_{1a}}{Y_{0a} + Y_{1a}}.$$

This test may also be based on the likelihood principle.
When $\theta = 1$, this is approximately $\chi^2_1$-distributed.

Is it reasonable to assume constant rate ratio?
Estimate $\theta$ and compute the expected number of unexposed cases given the total number of cases and the split of risk time between exposed and unexposed:

$$E_{0a} = (D_{0a} + D_{1a}) \frac{Y_{0a}}{Y_{0a} + \theta_{MH} Y_{1a}}$$

(cases should occur in proportion $Y_{0a} : \theta_{MH} Y_{1a}$).
Then, compute the “Breslow-Day” test statistic for homogeneity over strata:

$$\sum_a \frac{(D_{0a} - E_{0a})^2}{E_{0a}} \sim \chi^2_{A-1},$$

(where $A$ is the number of age strata).
If this is sufficiently small, accept that the rate ratio is constant.

The diet data.
• $\theta_{MH} = 2.40$,
• 90% c.i. from 1.44 to 4.01,
• MH-test statistic: $8.48 \sim \chi^2_1$, $P = 0.004$,
• Breslow-Day test statistic: $1.65 \sim \chi^2_2$, $P = 0.44$.
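The diet-data results can be reproduced, up to rounding, from the stratum counts. The sketch below makes several assumptions that the slides do not state: the error factor is taken as exp(1.645 × sd) for a 90% interval, and the P-values use the closed-form chi-squared tail probabilities for 1 and 2 degrees of freedom. Summing only the unexposed term of the homogeneity statistic gives about 0.95; adding the analogous term for exposed cases gives about 1.65, so the reported value appears to include both terms.

```python
import math

# One tuple per age band: (D1, Y1, D0, Y0) = exposed cases, exposed
# person-years, unexposed cases, unexposed person-years.
strata = [
    (2, 311.9, 4, 607.9),    # 40-49
    (12, 878.1, 5, 1271.1),  # 50-59
    (14, 667.5, 8, 888.9),   # 60-69
]

Q = sum(d1 * y0 / (y0 + y1) for d1, y1, d0, y0 in strata)
R = sum(d0 * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
U = sum(d1 - (d0 + d1) * y1 / (y0 + y1) for d1, y1, d0, y0 in strata)
V = sum((d0 + d1) * y0 * y1 / (y0 + y1) ** 2 for d1, y1, d0, y0 in strata)

theta_mh = Q / R

# 90% interval via the error factor (assumed multiplier 1.645).
sd_log_theta = math.sqrt(V / (Q * R))
error_factor = math.exp(1.645 * sd_log_theta)
ci_low, ci_high = theta_mh / error_factor, theta_mh * error_factor

# Mantel-Haenszel test; chi-squared tail with 1 df is erfc(sqrt(x / 2)).
mh_stat = U ** 2 / V
mh_p = math.erfc(math.sqrt(mh_stat / 2))

# Homogeneity: expected cases under a common rate ratio theta_mh.
bd_unexposed = 0.0  # unexposed term only, as in the displayed formula
bd_both = 0.0       # unexposed plus exposed terms (assumption, see lead-in)
for d1, y1, d0, y0 in strata:
    e0 = (d0 + d1) * y0 / (y0 + theta_mh * y1)
    e1 = (d0 + d1) - e0
    bd_unexposed += (d0 - e0) ** 2 / e0
    bd_both += (d0 - e0) ** 2 / e0 + (d1 - e1) ** 2 / e1

# Chi-squared tail with 2 df (A - 1 = 2 for three strata) is exp(-x / 2).
bd_p = math.exp(-bd_both / 2)

print(round(ci_low, 2), round(ci_high, 2), round(mh_stat, 2), round(bd_both, 2))
```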