
Lecture 12: Effect modification, and confounding in logistic regression



  1. Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007

  2. Today
     - Categorical predictor
       - Create dummy variables
       - Just like for linear regression
     - Comparing nested models that differ by two or more variables for logistic regression
       - $\chi^2$ test of deviance
       - Analogous to the F test in linear regression
     - Effect modification and confounding

  3. Example
     - Mean SAT scores were compared for the 50 US states. The goal of the study was to compare overall SAT scores using state-wide predictors such as per-pupil expenditures and average teachers’ salary. The investigators also considered the proportion of students eligible to take the SAT who actually took the examination.

  4. Variables
     - Outcome
       - Total SAT score [sat_low]
       - 1 = low, 0 = high
     - Primary predictor
       - Average expenditures per pupil [expen], in thousands of dollars
       - Continuous, range: 3.65-9.77, mean: 5.9

  5. Variables
     - Secondary predictors
       - Percent of pupils taking the SAT, in quartiles
         - percent1: lowest quartile
         - percent2: 2nd quartile
         - percent3: 3rd quartile
         - percent4: highest quartile
       - Mean teacher salary in thousands, in quartiles
         - salary1: lowest quartile
         - salary2: 2nd quartile
         - salary3: 3rd quartile
         - salary4: highest quartile

  6. Modifications to variables
     - Expenditures: continuous, doesn’t include 0: center at $5,000 per pupil
     - Percent: four dummy variables for four categories; must exclude one category to create a reference group
     - Salary: four dummy variables for four categories; must exclude one category to create a reference group
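
The variable modifications above can be sketched in a few lines of Python. This is only an illustration: the three state records are made up, and the choice of the lowest quartile as the reference group is an assumption, since the slides do not say which category was excluded.

```python
# Hypothetical per-state records; the values are made up for illustration.
states = [
    {"expen": 4.2, "percent_q": 1},
    {"expen": 5.9, "percent_q": 3},
    {"expen": 9.0, "percent_q": 4},
]

CENTER = 5.0  # $5,000 per pupil, since expen is recorded in thousands


def prepare(record):
    """Return centered expenditure plus three quartile dummies.

    Quartile 1 is omitted and so acts as the reference group
    (an assumption; the slides do not name the reference category).
    """
    row = {"expen_c": record["expen"] - CENTER}
    for q in (2, 3, 4):
        row[f"percent{q}"] = 1 if record["percent_q"] == q else 0
    return row


rows = [prepare(r) for r in states]
for row in rows:
    print(row)
```

With this coding, the intercept refers to a state in the reference quartile that spends $5,000 per pupil, and each dummy coefficient compares its quartile with the reference quartile. The same pattern would apply to the salary quartiles.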

  7. Plan
     - Assess primary relationship
     - Add each secondary predictor separately
     - Determine which secondary predictor is more statistically significant
     - Add other secondary predictor to model with “better” secondary predictor

  8. The $\chi^2$ Test of Deviance
     - We would like to consider adding salary quartiles to our model
     - We want to compare the parent model to an extended model, which differs by the three dummy variables for the four salary quartiles
     - The $\chi^2$ test of deviance compares nested models
     - We use it for nested models that differ by two or more variables because the Wald test cannot be used in that situation

  9. 1. Get the log likelihood from both models
     - The log likelihood is shown in the upper right corner of the logit or logistic output
     - Null model: LL = -28.94
     - Extended model B: LL = -28.25

  10. 2. Find the deviance for each model
     - Deviance = -2 × (log likelihood)
     - Deviance is analogous to the residual sum of squares (RSS) in linear regression; it measures the variation left unexplained by the model
     - A saturated model is one in which every Y is perfectly predicted
     - Null model: Deviance = -2(-28.94) = 57.88
     - Extended model B: Deviance = -2(-28.25) = 56.50

  11. 3. Find the change in deviance between the nested models
     - Null model: Deviance = 57.88
     - Extended model B: Deviance = 56.50
     - Change in deviance = deviance(null) - deviance(extended) = 57.88 - 56.50 = 1.38

  12. 4. Evaluate the change in deviance
     - The change in deviance from the parent model to the extended model is an observed chi-square statistic
     - df = number of variables added
     - $H_0$: all new $\beta$’s are 0 in the population
     - or $H_0$: the parent model is better

  13. 4. Evaluate the change in deviance
     - $H_0$: After adjusting for per-pupil expenditures, teachers’ salary is not an important predictor of SAT score.
     - $\chi^2_{obs} = 1.38$
     - df = 3
     - With 3 df and $\alpha = 0.05$, $\chi^2_{cr}$ is 7.81
     - Fail to reject $H_0$
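
The four steps of the deviance test can be reproduced from the two log likelihoods quoted on the slides. The sketch below uses only the Python standard library; the helper `chi2_sf_3df` is a name introduced here for illustration and relies on a closed-form upper-tail probability that holds only for 3 degrees of freedom.

```python
import math

# Log likelihoods quoted on the slides.
ll_null = -28.94
ll_extended = -28.25
df = 3  # three salary-quartile dummy variables were added


def deviance(log_likelihood):
    """Deviance = -2 x (log likelihood)."""
    return -2.0 * log_likelihood


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 df.

    Closed form valid only for df = 3:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


change = deviance(ll_null) - deviance(ll_extended)  # observed chi-square statistic
p_value = chi2_sf_3df(change)
reject_h0 = p_value < 0.05

print(f"change in deviance = {change:.2f}")
print(f"p-value = {p_value:.2f}, reject H0: {reject_h0}")
```

Comparing the p-value with 0.05 is equivalent to comparing the observed statistic of 1.38 with the critical value of 7.81, so the conclusion matches the slide: fail to reject $H_0$.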

  14. Notes about deviance test
     - The deviance test gives us a framework in which to add several predictors to a model simultaneously
     - Can only handle nested models
     - Analogous to F-test for linear regression
     - Also known as a "likelihood ratio test"

  15. Conclusions
     - Per-pupil expenditure is associated with SAT score
     - After adjusting for per-pupil expenditure
       - Percent of students taking the SAT is statistically significant
       - Teachers’ salary is not statistically significant
     - Is salary significant after adjusting for both expenditure and percent?

  16. Possible ways to improve this model:
     - Add an interaction variable
       - Does the effect of expenditures on odds of low mean SAT score vary between states with low and high percentages of students taking the SAT?
     - Add a spline
       - Does the effect of expenditures on odds of low mean SAT score vary over the level of expenditures?

  17. Effect Modification in Logistic Regression: Heart Disease, Smoking, and Coffee

  18. Effect modification
     - Just like with linear regression, we may want to allow different relationships between the primary predictor and outcome across levels of another covariate
     - Can model such relationships by fitting interaction terms
     - Modelling effect modification will require dealing with two or more covariates

  19. Logistic models with two covariates
     - $\operatorname{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
     - Then:

       $$
       \begin{aligned}
       \operatorname{logit}(p \mid X_1 = x_1 + 1,\ X_2 = x_2) &= \beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2 \\
       \operatorname{logit}(p \mid X_1 = x_1,\ X_2 = x_2) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
       \Delta \text{ in log-odds} &= \beta_1
       \end{aligned}
       $$

     - $\beta_1$ is the change in log-odds for a 1 unit change in $X_1$ provided $X_2$ is held constant.

  20. Interpretation in General
     - Also:

       $$
       \log\left[\frac{\operatorname{odds}(Y = 1 \mid X_1 = x_1 + 1,\ X_2 = x_2)}{\operatorname{odds}(Y = 1 \mid X_1 = x_1,\ X_2 = x_2)}\right] = \beta_1
       $$

     - And: $\text{OR} = \exp(\beta_1)$
     - $\exp(\beta_1)$ is the multiplicative change in odds for a 1 unit increase in $X_1$ provided $X_2$ is held constant.
     - The result is similar for $X_2$

  21. Risk of CHD from Smoking and Coffee (n = 151)

  22. Study Information
     - Study facts:
       - Case-control study
       - 40-50 year-old males previously in good health
     - Study questions:
       - Are smoking and/or coffee drinking related to an increased odds of CHD?
       - Is the association of coffee with CHD stronger among smokers? That is, is smoking an effect modifier of the coffee-CHD association?

  23. Fraction with CHD by smoking and coffee

  24. Pooled data, ignoring smoking
     - Odds ratio = (40 * 50) / (26 * 35) = 2.2
     - 95% CI = (1.14, 4.24)

  25. Among Non-Smokers
     - Odds ratio = (15 * 42) / (15 * 21) = 2.0
     - 95% CI = (0.82, 4.9)

  26. Among Smokers
     - Odds ratio = (25 * 8) / (11 * 14) = 1.3
     - 95% CI = (0.42, 4.0)
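
The three odds ratios and their intervals can be checked with a short calculation. The slides do not say how the confidence intervals were obtained; the sketch below assumes the usual log-based (Woolf) interval, which reproduces the quoted limits to rounding. The function name `odds_ratio_ci` and the cell labels a, b, c, d are introduced here for illustration; the counts are read directly from the cross-products above.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio (a*d)/(b*c) with a log-based (Woolf) CI.

    Assumed method: SE of log(OR) = sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Cell counts in the order (a, b, c, d), taken from the slide cross-products.
tables = {
    "pooled": (40, 26, 35, 50),
    "non-smokers": (15, 15, 21, 42),
    "smokers": (25, 11, 14, 8),
}

results = {name: odds_ratio_ci(*cells) for name, cells in tables.items()}
for name, (or_hat, lower, upper) in results.items():
    print(f"{name}: OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The stratum counts sum to the pooled counts (for example 15 + 25 = 40), and all twelve cells add up to the 151 subjects in the study.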

  27. Plot Odds Ratios and 95% CIs

  28. Define Variables
     - $Y_i$ = 1 if CHD case, 0 if control
     - $COF_i$ = 1 if coffee drinker, 0 if not
     - $SMK_i$ = 1 if smoker, 0 if not
     - $p_i = \Pr(Y_i = 1)$
     - $n_i$ = number observed at pattern $i$ of Xs

  29. Logistic Regression Model
     - $Y_i$ are from a Binomial($n_i$, $p_i$) distribution
     - $Y_i$ are independent
     - log odds($Y_i = 1$) (or, logit($Y_i = 1$)) is a function of
       - Coffee
       - Smoking
       - and coffee × smoking interaction

  30. Logistic Regression Model

       $$
       \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 COF_i + \beta_2 SMK_i + \beta_3 COF_i \, SMK_i
       $$

     - Which implies that $\Pr(Y_i = 1)$ is the logistic function

       $$
       p_i = \frac{e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}}}{1 + e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i}}}
       $$

       where $X_{1i} = COF_i$ and $X_{2i} = SMK_i$.

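  31. Probabilities of CHD as a function of coffee and smoking history

     | Coffee | Smoke: No | Smoke: Yes |
     |---|---|---|
     | No | $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$ | $\dfrac{e^{\beta_0 + \beta_2}}{1 + e^{\beta_0 + \beta_2}}$ |
     | Yes | $\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$ | $\dfrac{e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}{1 + e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}$ |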

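  32. Among Non-Smokers:

       $$
       \frac{\text{Odds}(\text{Case} \mid \text{Coffee})}{\text{Odds}(\text{Case} \mid \text{No Coffee})}
       = \frac{\dfrac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \Big/ \dfrac{1}{1 + e^{\beta_0 + \beta_1}}}{\dfrac{e^{\beta_0}}{1 + e^{\beta_0}} \Big/ \dfrac{1}{1 + e^{\beta_0}}}
       = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}}
       = e^{\beta_1}
       = \text{Odds Ratio}
       $$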
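
The algebra above can be checked numerically. The following sketch evaluates the four cell probabilities of the interaction model and confirms that the coffee odds ratio is $e^{\beta_1}$ among non-smokers and $e^{\beta_1 + \beta_3}$ among smokers. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math

# Hypothetical coefficients, chosen only to illustrate the algebra.
b0, b1, b2, b3 = -1.0, 0.7, 0.9, -0.4


def prob_chd(cof, smk):
    """Pr(Y = 1) under the logistic model with a coffee x smoking interaction."""
    eta = b0 + b1 * cof + b2 * smk + b3 * cof * smk
    return math.exp(eta) / (1.0 + math.exp(eta))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Coffee drinkers vs non-drinkers, within each smoking stratum.
or_coffee_nonsmokers = odds(prob_chd(1, 0)) / odds(prob_chd(0, 0))
or_coffee_smokers = odds(prob_chd(1, 1)) / odds(prob_chd(0, 1))

print(or_coffee_nonsmokers, math.exp(b1))
print(or_coffee_smokers, math.exp(b1 + b3))
```

Each printed pair should agree up to floating-point error, whatever values are chosen for the coefficients.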

  33. Interpretations
     - $\exp\{\beta_1\}$: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among non-smokers
     - $\exp\{\beta_1 + \beta_3\}$: odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers among smokers

  34. Interpretations
     - $\exp\{\beta_2\}$: odds ratio of being a CHD case for smokers -vs- non-smokers among non-coffee drinkers
     - $\exp\{\beta_2 + \beta_3\}$: odds ratio of being a CHD case for smokers -vs- non-smokers among coffee drinkers

  35. Interpretations
     - $\dfrac{e^{\beta_0}}{1 + e^{\beta_0}}$: fraction of cases among non-smoking non-coffee drinking individuals in the sample (determined by sampling plan)
     - $\exp\{\beta_3\}$: ratio of odds ratios

  36. $\exp\{\beta_3\}$ Interpretations
     - $\exp\{\beta_3\}$: factor by which odds ratio of being a CHD case for coffee drinkers -vs- non-drinkers is multiplied for smokers as compared to non-smokers
     - or
     - $\exp\{\beta_3\}$: factor by which odds ratio of being a CHD case for smokers -vs- non-smokers is multiplied for coffee drinkers as compared to non-coffee drinkers
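
As a rough numerical illustration of the ratio of odds ratios, the stratum-specific cross-products quoted earlier in the lecture (2.0 among non-smokers, 1.3 among smokers) can be combined. This sketch assumes the interaction model is fitted to exactly those four-cell counts; with two binary covariates that model is saturated, so its fitted odds ratios would equal the stratum-specific ones. The variable names are introduced here for illustration.

```python
import math

# Stratum-specific odds ratios from the cross-products on the slides.
or_nonsmokers = (15 * 42) / (15 * 21)  # corresponds to exp(beta1)
or_smokers = (25 * 8) / (11 * 14)      # corresponds to exp(beta1 + beta3)

# Ratio of odds ratios, corresponding to exp(beta3) under the assumption above.
ratio_of_ors = or_smokers / or_nonsmokers
beta1 = math.log(or_nonsmokers)
beta3 = math.log(ratio_of_ors)

print(f"exp(beta1) = {or_nonsmokers:.2f}")
print(f"exp(beta3) = {ratio_of_ors:.2f}")
print(f"beta1 = {beta1:.3f}, beta3 = {beta3:.3f}")
```

A ratio below 1 means the estimated coffee odds ratio is smaller among smokers than among non-smokers in this sample. The wide, overlapping stratum-specific confidence intervals are a reminder that this point estimate alone does not establish effect modification.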

  37. Some Special Cases
     - Given

       $$
       \log\left[\frac{\Pr(Y = 1)}{\Pr(Y = 0)}\right] = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 \, COF * SMK
       $$

     - If $\beta_1 = \beta_2 = \beta_3 = 0$
       - Neither smoking nor coffee drinking is associated with increased risk of CHD

  38. Some Special Cases
     - Given

       $$
       \log\left[\frac{\Pr(Y = 1)}{\Pr(Y = 0)}\right] = \beta_0 + \beta_1 COF + \beta_2 SMK + \beta_3 \, COF * SMK
       $$

     - If $\beta_1 = \beta_3 = 0$
       - Smoking, but not coffee drinking, is associated with increased risk of CHD
