Lecture 12: Effect modification and confounding in logistic regression
Ani Manichaikul (amanicha@jhsph.edu)
4 May 2007
Today
- Categorical predictors
  - Create dummy variables, just like for linear regression
- Comparing nested logistic regression models that differ by two or more variables
  - χ² test of deviance
  - Analogous to the F test in linear regression
- Effect modification and confounding
Example
Mean SAT scores were compared across the 50 US states. The goal of the study was to compare overall SAT scores using state-wide predictors such as per-pupil expenditure and average teachers' salary. The investigators also considered the proportion of students eligible to take the SAT who actually took the examination.
Variables
- Outcome
  - Total SAT score [sat_low]: 1 = low, 0 = high
- Primary predictor
  - Average expenditure per pupil [expen], in thousands of dollars
  - Continuous; range 3.65 to 9.77, mean 5.9
Variables
- Secondary predictors
  - Percent of pupils taking the SAT, in quartiles
    - percent1: lowest quartile
    - percent2: 2nd quartile
    - percent3: 3rd quartile
    - percent4: highest quartile
  - Mean teacher salary in thousands, in quartiles
    - salary1: lowest quartile
    - salary2: 2nd quartile
    - salary3: 3rd quartile
    - salary4: highest quartile
Modifications to variables
- Expenditures: continuous, and the observed range does not include 0, so center at $5,000 per pupil (expen − 5)
- Percent: four dummy variables for the four categories; one category must be excluded from the model to serve as the reference group
- Salary: four dummy variables for the four categories; one category must be excluded from the model to serve as the reference group
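The modifications above can be sketched as a small preprocessing step. This is a minimal sketch under assumptions not stated in the lecture: each state is a dictionary, quartile membership is coded 1 to 4 in hypothetical fields `percent_q` and `salary_q`, and the lowest quartile is chosen as the reference group (any one category could serve).

```python
def center_expenditure(expen_thousands, center=5.0):
    """Center per-pupil expenditure at $5,000 (expen is in thousands)."""
    return expen_thousands - center

def quartile_dummies(quartile, prefix, reference=1):
    """0/1 indicators for every quartile except the reference quartile."""
    return {f"{prefix}{q}": int(quartile == q)
            for q in range(1, 5) if q != reference}

def build_row(state):
    """Predictor columns for one state (field names are hypothetical)."""
    row = {"expen_c": center_expenditure(state["expen"])}
    row.update(quartile_dummies(state["percent_q"], "percent"))
    row.update(quartile_dummies(state["salary_q"], "salary"))
    return row

# Hypothetical state, for illustration only
print(build_row({"expen": 6.5, "percent_q": 3, "salary_q": 1}))
```

With the lowest quartile as reference, a state in that quartile gets 0 on all three remaining dummies, so its effect is absorbed into the intercept.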
Plan
- Assess the primary relationship
- Add each secondary predictor separately
- Determine which secondary predictor is more statistically significant
- Add the other secondary predictor to the model containing the "better" secondary predictor
The χ² Test of Deviance
- We would like to consider adding salary quartiles to our model
- We want to compare the parent model to an extended model that differs by the three dummy variables for the four salary quartiles
- The χ² test of deviance compares nested models
- We use it for nested models that differ by two or more variables, because the Wald test for a single coefficient cannot test several coefficients at once
1. Get the log likelihood from both models
- The log likelihood is shown in the upper right corner of the logit or logistic output
- Null (parent) model: LL = -28.94
- Extended model B: LL = -28.25
2. Find the deviance for each model
- Deviance = -2 × (log likelihood)
- Deviance is analogous to the residual sum of squares (RSS) in linear regression: it measures the lack of fit still remaining in the model
- Deviance is measured against a saturated model, one in which every Y is perfectly predicted; for individual 0/1 outcomes the saturated model has log likelihood 0, so the deviance reduces to -2 × LL
- Null model: Deviance = -2(-28.94) = 57.88
- Extended model B: Deviance = -2(-28.25) = 56.50
3. Find the change in deviance between the nested models
- Null model: Deviance = 57.88
- Extended model B: Deviance = 56.50
- Change in deviance = deviance(null) - deviance(extended) = 57.88 - 56.50 = 1.38
4. Evaluate the change in deviance
- The change in deviance from the parent model to the extended model is an observed chi-square statistic
- df = number of variables added
- H0: all of the new β's are 0 in the population
- Equivalently, H0: the parent (smaller) model is sufficient
4. Evaluate the change in deviance
- H0: After adjusting for per-pupil expenditure, teachers' salary is not an important predictor of SAT score
- χ²obs = 1.38
- df = 3
- With 3 df and α = 0.05, χ²cr = 7.81
- Since 1.38 < 7.81, fail to reject H0
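Steps 1 to 4 can be sketched in Python using only the standard library. The chi-square tail probability is computed from the closed-form series for integer degrees of freedom rather than from a statistics package, and the function names are hypothetical; a library routine such as SciPy's `scipy.stats.chi2.sf` would do the same job.

```python
import math

def deviance(log_likelihood):
    """Deviance for individual 0/1 outcomes: -2 x (log likelihood)."""
    return -2.0 * log_likelihood

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from the df = 1 tail, then add one term per 2 extra df
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total

def deviance_test(ll_parent, ll_extended, df_added):
    """Change in deviance and its chi-square p-value."""
    change = deviance(ll_parent) - deviance(ll_extended)
    return change, chi2_sf(change, df_added)

# Log likelihoods from the slides; the extended model adds 3 salary dummies
change, p = deviance_test(-28.94, -28.25, df_added=3)
print(f"change in deviance = {change:.2f}, df = 3, p = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With the slide's numbers the p-value is well above 0.05, which agrees with the critical-value comparison (1.38 < 7.81).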
Notes about deviance test
- The deviance test gives us a framework for adding several predictors to a model simultaneously
- It can only compare nested models
- Analogous to the F test for linear regression
- Also known as a "likelihood ratio test"
Conclusions
- Per-pupil expenditure is associated with SAT score
- After adjusting for per-pupil expenditure:
  - Percent of students taking the SAT is statistically significant
  - Teachers' salary is not statistically significant
- Is salary significant after adjusting for both expenditure and percent?
Possible ways to improve this model:
- Add an interaction term
  - Does the effect of expenditure on the odds of a low mean SAT score differ between states with low and high percentages of students taking the SAT?
- Add a spline
  - Does the effect of expenditure on the odds of a low mean SAT score vary over the range of expenditures?
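Both ideas amount to adding constructed columns to the model. The sketch below assumes a hypothetical binary indicator `high_percent` for states with a high percentage of students taking the SAT, and a hypothetical knot value for a linear spline in centered expenditure; neither choice comes from the lecture.

```python
def design_row(expen_c, high_percent, knot=None):
    """Predictor columns for one state.

    expen_c       centered expenditure (thousands of dollars minus 5)
    high_percent  1 for a high percentage taking the SAT, else 0 (hypothetical split)
    knot          optional spline knot on the centered scale (hypothetical)
    """
    row = {
        "expen_c": expen_c,
        "high_percent": high_percent,
        # Interaction: lets the expenditure slope differ between the two groups
        "expen_x_high": expen_c * high_percent,
    }
    if knot is not None:
        # Linear spline: an extra slope that applies only above the knot
        row["expen_spline"] = max(expen_c - knot, 0.0)
    return row

print(design_row(1.5, 1, knot=1.0))
```

In a model with these columns, the expenditure log-odds slope is the `expen_c` coefficient in the low-percent group and that coefficient plus the `expen_x_high` coefficient in the high-percent group; the spline coefficient is the change in slope above the knot.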
Effect Modification in Logistic Regression: Heart Disease, Smoking and Coffee
Effect modification
- Just as with linear regression, we may want to allow different relationships between the primary predictor and the outcome across levels of another covariate
- We can model such relationships by fitting interaction terms
- Modelling effect modification requires dealing with two or more covariates
Logistic models with two covariates
- $\mathrm{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
- Then:
$$\mathrm{logit}(p \mid X_1 = x_1 + 1, X_2 = x_2) = \beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2$$
$$\mathrm{logit}(p \mid X_1 = x_1, X_2 = x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
$$\Delta \text{ in log-odds} = \beta_1$$
- $\beta_1$ is the change in log-odds for a 1-unit change in $X_1$, provided $X_2$ is held constant
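Interpretation in General
- Also:
$$\log \frac{\mathrm{odds}(Y = 1 \mid X_1 = x_1 + 1, X_2 = x_2)}{\mathrm{odds}(Y = 1 \mid X_1 = x_1, X_2 = x_2)} = \beta_1$$
- And:
$$\mathrm{OR} = \frac{\mathrm{odds}(Y = 1 \mid X_1 = x_1 + 1, X_2 = x_2)}{\mathrm{odds}(Y = 1 \mid X_1 = x_1, X_2 = x_2)} = \exp(\beta_1)$$
- $\exp(\beta_1)$ is the multiplicative change in odds for a 1-unit increase in $X_1$, provided $X_2$ is held constant
- The result is similar for $X_2$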
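A quick numeric check of this result: for any coefficients, the ratio of the odds at $X_1 = x_1 + 1$ to the odds at $X_1 = x_1$ (same $X_2$) equals $\exp(\beta_1)$, whatever the starting values. The coefficients below are hypothetical, chosen only to exercise the algebra.

```python
import math

def log_odds(b0, b1, b2, x1, x2):
    """Linear predictor of logit(p) = b0 + b1*X1 + b2*X2."""
    return b0 + b1 * x1 + b2 * x2

def odds_ratio_one_unit(b0, b1, b2, x1, x2):
    """Odds at (x1 + 1, x2) divided by odds at (x1, x2)."""
    odds_after = math.exp(log_odds(b0, b1, b2, x1 + 1, x2))
    odds_before = math.exp(log_odds(b0, b1, b2, x1, x2))
    return odds_after / odds_before

# Hypothetical coefficients, for illustration only
b0, b1, b2 = -1.0, 0.4, 0.7
for x1, x2 in [(0, 0), (2, 1), (-3, 5)]:
    print(x1, x2, round(odds_ratio_one_unit(b0, b1, b2, x1, x2), 6), round(math.exp(b1), 6))
```

The $\beta_0$ and $\beta_2 x_2$ terms cancel in the ratio, which is why the printed ratio does not depend on the starting point.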
Risk of CHD from Smoking and Coffee n = 151
Study Information
- Study facts:
  - Case-control study
  - 40- to 50-year-old males previously in good health
- Study questions:
  - Are smoking and/or coffee drinking related to increased odds of CHD?
  - Is the association of coffee with CHD stronger among smokers? That is, is smoking an effect modifier of the coffee-CHD association?
Fraction with CHD by smoking and coffee
Pooled data, ignoring smoking
- Odds ratio = (40 × 50) / (26 × 35) = 2.2
- 95% CI = (1.14, 4.24)
Among Non-Smokers
- Odds ratio = (15 × 42) / (15 × 21) = 2.0
- 95% CI = (0.82, 4.9)
Among Smokers
- Odds ratio = (25 × 8) / (11 × 14) = 1.3
- 95% CI = (0.42, 4.0)
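The odds ratios and intervals on these three slides can be reproduced with the usual log-scale (Woolf) interval, $\exp\{\log \mathrm{OR} \pm 1.96\sqrt{1/a + 1/b + 1/c + 1/d}\}$. That the slides used this method is an assumption, although their printed values agree with it to the reported precision. Cells are passed in the order a, b, c, d so that OR = (a × d)/(b × c) matches the products shown above.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio a*d/(b*c) with a Woolf (log-scale) confidence interval."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Cell counts in the order they appear in the slide products (a*d)/(b*c)
tables = {
    "pooled": (40, 26, 35, 50),
    "non-smokers": (15, 15, 21, 42),
    "smokers": (25, 11, 14, 8),
}
for name, cells in tables.items():
    or_hat, lo, hi = odds_ratio_ci(*cells)
    print(f"{name:12s} OR = {or_hat:.2f}  95% CI = ({lo:.2f}, {hi:.2f})")
```

The smaller strata give wider intervals because each small cell count inflates the standard error of the log odds ratio.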
Plot Odds Ratios and 95% CIs
Define Variables
- $Y_i$ = 1 if CHD case, 0 if control
- $COF_i$ = 1 if coffee drinker, 0 if not
- $SMK_i$ = 1 if smoker, 0 if not
- $p_i = \Pr(Y_i = 1)$
- $n_i$ = number observed at pattern $i$ of the Xs
Logistic Regression Model
- $Y_i$ are from a Binomial($n_i$, $p_i$) distribution
- $Y_i$ are independent
- The log odds of $Y_i = 1$ (that is, the logit) is a function of:
  - Coffee
  - Smoking
  - Coffee × smoking interaction
Logistic Regression Model
$$\log \frac{p_i}{1 - p_i} = \beta_0 + \beta_1 COF_i + \beta_2 SMK_i + \beta_3 COF_i \cdot SMK_i$$
- Which implies that $\Pr(Y_i = 1)$ is the logistic function
$$p_i = \frac{e^{\beta_0 + \beta_1 COF_i + \beta_2 SMK_i + \beta_3 COF_i \cdot SMK_i}}{1 + e^{\beta_0 + \beta_1 COF_i + \beta_2 SMK_i + \beta_3 COF_i \cdot SMK_i}}$$
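Probabilities of CHD as a function of coffee and smoking history
- Coffee = No, Smoke = No: $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$
- Coffee = No, Smoke = Yes: $\dfrac{e^{\beta_0 + \beta_2}}{1 + e^{\beta_0 + \beta_2}}$
- Coffee = Yes, Smoke = No: $\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
- Coffee = Yes, Smoke = Yes: $\dfrac{e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}{1 + e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}$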
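The four cell probabilities above all come from one function of the linear predictor. This minimal sketch uses hypothetical coefficients (not estimates from the CHD data) to show how each coffee/smoking pattern picks out its own combination of β's.

```python
import math

def expit(eta):
    """Logistic function: converts log-odds to a probability, e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def chd_probability(b, cof, smk):
    """Pr(Y = 1) under logit(p) = b0 + b1*COF + b2*SMK + b3*COF*SMK."""
    b0, b1, b2, b3 = b
    return expit(b0 + b1 * cof + b2 * smk + b3 * cof * smk)

# Hypothetical coefficients, for illustration only
b = (-0.5, 0.7, 0.9, -0.4)
for cof in (0, 1):
    for smk in (0, 1):
        print(f"COF={cof} SMK={smk}: p = {chd_probability(b, cof, smk):.3f}")
```

The interaction coefficient enters only the COF = 1, SMK = 1 cell, which is what lets that cell depart from a purely additive pattern on the log-odds scale.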
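Among Non-Smokers:
$$\frac{\mathrm{Odds}(\text{Case} \mid \text{Coffee})}{\mathrm{Odds}(\text{Case} \mid \text{No Coffee})} = \frac{\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \Big/ \dfrac{1}{1 + e^{\beta_0 + \beta_1}}}{\dfrac{e^{\beta_0}}{1 + e^{\beta_0}} \Big/ \dfrac{1}{1 + e^{\beta_0}}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} = \text{Odds Ratio}$$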
Interpretations
- $\exp\{\beta_1\}$: odds ratio of being a CHD case for coffee drinkers vs. non-drinkers among non-smokers
- $\exp\{\beta_1 + \beta_3\}$: odds ratio of being a CHD case for coffee drinkers vs. non-drinkers among smokers
Interpretations
- $\exp\{\beta_2\}$: odds ratio of being a CHD case for smokers vs. non-smokers among non-coffee drinkers
- $\exp\{\beta_2 + \beta_3\}$: odds ratio of being a CHD case for smokers vs. non-smokers among coffee drinkers
Interpretations
- $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$: fraction of cases among non-smoking, non-coffee-drinking individuals in the sample (determined by the sampling plan)
- $\exp\{\beta_3\}$: ratio of odds ratios
Interpretations of $\exp\{\beta_3\}$
- $\exp\{\beta_3\}$: factor by which the odds ratio of being a CHD case for coffee drinkers vs. non-drinkers is multiplied for smokers as compared to non-smokers, or
- $\exp\{\beta_3\}$: factor by which the odds ratio of being a CHD case for smokers vs. non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers
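These interpretations can be tied back to the stratified tables. The model with COF, SMK and COF × SMK has one parameter for each of the four coffee/smoking patterns, so it is saturated for these data and its maximum likelihood estimates reproduce the stratum-specific odds ratios: $\exp(\hat\beta_1)$ equals the coffee OR among non-smokers and $\exp(\hat\beta_1 + \hat\beta_3)$ equals the coffee OR among smokers. This sketch recovers only $\beta_1$ and $\beta_3$, using the cell products from the slides.

```python
import math

# Stratum-specific coffee odds ratios, from the 2x2 tables on the slides
or_coffee_nonsmokers = (15 * 42) / (15 * 21)  # estimates exp(beta1)
or_coffee_smokers = (25 * 8) / (11 * 14)      # estimates exp(beta1 + beta3)

beta1 = math.log(or_coffee_nonsmokers)
beta3 = math.log(or_coffee_smokers) - beta1

print(f"exp(beta1)         = {math.exp(beta1):.2f}")
print(f"exp(beta1 + beta3) = {math.exp(beta1 + beta3):.2f}")
print(f"exp(beta3)         = {math.exp(beta3):.2f}  (ratio of odds ratios)")
```

Here $\exp(\hat\beta_3)$ is about 0.65: the coffee odds ratio among smokers is roughly 0.65 times the one among non-smokers, consistent with 1.3 versus 2.0.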
Some Special Cases
- Given
$$\log \frac{\Pr(Y = 1)}{\Pr(Y = 0)} = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 COF \cdot SMK$$
- If $\beta_1 = \beta_2 = \beta_3 = 0$:
  - Neither smoking nor coffee drinking is associated with increased risk of CHD
- If $\beta_1 = \beta_3 = 0$ and $\beta_2 > 0$:
  - Smoking, but not coffee drinking, is associated with increased risk of CHD