
Lecture 8: Model assessment, nested models, and hypothesis testing



  1. Lecture 8: Model assessment, nested models, and hypothesis testing. Ani Manichaikul (amanicha@jhsph.edu), 27 April 2007

  2. Another Example: Mortality
     - British Smoke, Pollution & Mortality Data
     - [Figure: scatterplot matrix of airborne smoke particles, SO₂ concentration, and London mortality]

  3. Mortality Example: Model
     Let:
     - $Y$ = the daily mortality for London (deaths)
     - $X_1$ = airborne smoke particles (mg/m³) (smoke)
     - $X_2$ = SO₂ (ppm) (so2)
     Model:
     1) $Y_i = \beta_0 + \beta_1 (X_1 - 2) + \beta_2 (X_2 - 0.5) + \varepsilon_i$
     2) $\varepsilon_i \sim N(0, \sigma^2)$
     - Mortality is a linear function of the concentration of airborne smoke particles AND the SO₂ level

  4. Mortality Example: Interpretations
     Model:
     - $E(Y \mid X) = \beta_0 + \beta_1 (X_1 - 2) + \beta_2 (X_2 - 0.5)$
     - $\beta_0$: $E(Y \mid X_1 = 2, X_2 = 0.5) = \beta_0 + \beta_1 (0) + \beta_2 (0) = \beta_0$
     - Therefore: $\beta_0$ = the mean number of deaths per day when smoke particle concentrations are 2 mg/m³ and SO₂ concentrations are 0.5 ppm

  5. Mortality Example: Interpretations
     - $\beta_1$:
       $E(Y \mid X_1 = x_1 + 1, X_2 = x_2) = \beta_0 + \beta_1 (x_1 - 1) + \beta_2 (x_2 - 0.5)$
       $E(Y \mid X_1 = x_1, X_2 = x_2) = \beta_0 + \beta_1 (x_1 - 2) + \beta_2 (x_2 - 0.5)$
       $\Delta E(Y \mid X) = \beta_1$
     - Therefore: $\beta_1$ = expected change in mortality on days when particles are 1 mg/m³ higher, if SO₂ is unchanged (so a 0.1 mg/m³ increase corresponds to a change of $\beta_1 / 10$)

  6. Mortality Example: Interpretations
     - $\beta_2$:
       $E(Y \mid X_1 = ?, X_2 = ?) =$
       $E(Y \mid X_1 = ?, X_2 = ?) =$
       $\Delta E(Y \mid X) = \beta_2$
     - Therefore: $\beta_2$ =

  7. Mortality Example: Results
           Source |       SS       df       MS              Number of obs =      15
     -------------+------------------------------           F(  2,    12) =   36.57
            Model |  205097.531     2  102548.765           Prob > F      =  0.0000
         Residual |  33654.2025    12  2804.51687           R-squared     =  0.8590
     -------------+------------------------------           Adj R-squared =  0.8355
            Total |  238751.733    14  17053.6952           Root MSE      =  52.958
     ------------------------------------------------------------------------------
           deaths |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     -------------+----------------------------------------------------------------
      smokecenter |  -220.3244   58.14314    -3.79   0.003    -347.0074   -93.64135
        so2center |   1051.816   212.5959     4.95   0.000     588.6096    1515.023
            _cons |   174.7703   29.16174     5.99   0.000     111.2323    238.3083
     ------------------------------------------------------------------------------

  8. Mortality Example: Inference
     - Overall F-test: are ANY of the covariates significant?
     - $H_0$: $\beta_1 = \beta_2 = 0$
     - $F_{obs}(2, 12) = 36.57$
     - p-val = 0.0000 (i.e., p < 0.0001)
     - Decision: reject $H_0$; at least one of the $\beta$'s is nonzero
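
The summary statistics in the regression output can be recovered from the ANOVA table alone. A minimal sketch in Python, using only the sums of squares and degrees of freedom reported on slide 7 (the variable names are illustrative and not from the lecture):

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table (slide 7).
ss_model, df_model = 205097.531, 2
ss_resid, df_resid = 33654.2025, 12
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid  # MSE, the estimate of sigma^2

f_obs = ms_model / ms_resid                           # overall F statistic
r_squared = ss_model / ss_total                       # SSM / SST
adj_r_squared = 1 - ms_resid / (ss_total / df_total)  # penalizes extra predictors
root_mse = math.sqrt(ms_resid)

print(f"F({df_model}, {df_resid}) = {f_obs:.2f}")
print(f"R-squared = {r_squared:.4f}, adjusted = {adj_r_squared:.4f}")
print(f"Root MSE = {root_mse:.3f}")
```

The denominator degrees of freedom are n - p - 1 = 15 - 2 - 1 = 12, matching the residual row of the table.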

  9. Parameter Estimates (95% C.I.) & individual t-tests: $\beta_0$
     - $b_0 = 174.8$ (111.2, 238.3)
     - $H_0$: $\beta_0 = 0$
     - $t_{obs}(12) = 5.99$
     - p-val = 0.000

  10. Parameter Estimates (95% C.I.) & individual t-tests: $\beta_1$
      - $b_1 = -220.3$ (-347.0, -93.6)
      - $H_0$: $\beta_1 = 0$
      - $t_{obs}(12) = -3.79$
      - p-val = 0.003

  11. Parameter Estimates (95% C.I.) & individual t-tests: $\beta_2$
      - $b_2 = 1051.8$ (588.6, 1515.0)
      - $H_0$: $\beta_2 = 0$
      - $t_{obs}(12) = 4.95$
      - p-val = 0.000 means p-val < 0.001
      - Note: $s^2 = \text{MSE} = 2805$; $s = \sqrt{\text{MSE}}$ = 'Root MSE' = 53
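
Each t statistic is the estimate divided by its standard error, and each 95% interval is the estimate plus or minus a t critical value times the standard error. The sketch below reproduces the three rows of the coefficient table. The critical value 2.1788 for 12 degrees of freedom is hard-coded from a t-table as an assumption, and the function name is hypothetical.

```python
# Two-sided 95% critical value for a t distribution with 12 df.
# Assumption: taken from a t-table rather than computed here.
T_CRIT_12 = 2.1788

# (coefficient, standard error) pairs copied from the regression output.
estimates = {
    "_cons": (174.7703, 29.16174),
    "smokecenter": (-220.3244, 58.14314),
    "so2center": (1051.816, 212.5959),
}

def t_and_ci(coef, se, t_crit=T_CRIT_12):
    """Return the t statistic for H0: beta = 0 and the 95% confidence interval."""
    t_obs = coef / se
    half_width = t_crit * se
    return t_obs, (coef - half_width, coef + half_width)

results = {name: t_and_ci(coef, se) for name, (coef, se) in estimates.items()}
for name, (t_obs, (lower, upper)) in results.items():
    print(f"{name:12s} t = {t_obs:6.2f}   95% CI = ({lower:.1f}, {upper:.1f})")
```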

  12. Parameter Interpretations: with Estimates
      - $b_0$: when smoke particles and SO₂ are around their average levels (2 mg/m³ and 0.5 ppm, respectively), the estimated mean number of deaths is 174.8 per day
      - $b_1$: the estimated mean mortality is 22 deaths/day lower on days when particles are 0.1 mg/m³ higher, if SO₂ is unchanged
      - $b_2$: (You do!)
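
The 22 deaths/day figure for $b_1$ comes from rescaling the fitted slope, which is expressed per 1 mg/m³, to a 0.1 mg/m³ change. As a worked step under the model as stated:

```latex
\Delta \hat{E}(Y \mid X) = b_1 \times \Delta X_1 = (-220.3) \times 0.1 \approx -22.0 \ \text{deaths/day}
```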

  13. Estimating
      - Suppose we were interested in the estimated mean number of deaths when smoke particle concentrations were 3 mg/m³ and SO₂ levels were 0.65 ppm.
        $E(Y \mid X) = \beta_0 + \beta_1 (X_1 - 2) + \beta_2 (X_2 - 0.5)$
        so: $E(\text{Deaths}) = b_0 + b_1 (\text{smoke} - 2) + b_2 (\text{so2} - 0.5)$
        $= 174.8 - 220 (3 - 2) + 1052 (0.65 - 0.5) \approx 113$ deaths
      - How about if smoke particle concentrations were 3 mg/m³ and SO₂ levels were 0.45 ppm?
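
Evaluating the fitted equation at chosen covariate values is plain arithmetic. The sketch below uses the rounded coefficients from the slide; the function name `predicted_deaths` is hypothetical.

```python
# Rounded coefficient estimates as used on the slide.
B0, B1, B2 = 174.8, -220.0, 1052.0
SMOKE_CENTER, SO2_CENTER = 2.0, 0.5  # centering constants from the model

def predicted_deaths(smoke, so2):
    """Estimated mean daily deaths at the given smoke (mg/m^3) and SO2 (ppm) levels."""
    return B0 + B1 * (smoke - SMOKE_CENTER) + B2 * (so2 - SO2_CENTER)

estimate = predicted_deaths(smoke=3.0, so2=0.65)
print(f"{estimate:.1f}")  # 112.6
```

Calling `predicted_deaths(3.0, 0.45)` answers the follow-up question in the same way.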

  14. Association
      - The estimate for airborne smoke particles is $b_1 = -220$, implying that smoke particles and mortality have a negative relationship
      - i.e., an increase in smoke particles is associated with a decrease in mortality, after adjusting for SO₂ levels.

  15. Negative Association??
      - BUT WAIT!
      - Look at the plot of deaths vs smoke presented previously. Shouldn’t the relationship be positive instead?!
      - Let’s run Simple Linear Regressions (SLRs) of mortality on smoke & SO₂ and see what we get.

  16. Simple Linear Regression
      Same notation:
      - $Y$ = the daily mortality for London (deaths)
      - $X_1$ = airborne smoke particles (mg/m³) (smoke)
      - $X_2$ = SO₂ (ppm) (so2)

  17. SLR Models
      - Smoke:
        1) $Y_i = \beta_0 + \beta_1 (X_1 - 2) + \varepsilon_i$
        2) $\varepsilon_i \sim N(0, \sigma^2)$
      - SO₂:
        1) $Y_i = \beta_0^* + \beta_1^* (X_2 - 0.5) + \varepsilon_i^*$
        2) $\varepsilon_i^* \sim N(0, \sigma^{2*})$

  18. SLR: Deaths ~ Smoke
      [Figure: scatterplot of London mortality (y-axis) against airborne smoke particles (x-axis)]

  19. Death ~ Smoke: Results
            Source |       SS       df       MS              Number of obs =      15
      -------------+------------------------------           F(  1,    13) =   17.34
             Model |  136449.517     1  136449.517           Prob > F      =  0.0011
          Residual |  102302.216    13  7869.40127           R-squared     =  0.5715
      -------------+------------------------------           Adj R-squared =  0.5386
             Total |  238751.733    14  17053.6952           Root MSE      =   88.71
      ------------------------------------------------------------------------------
            deaths |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
       smokecenter |   63.76092   15.31226     4.16   0.001     30.68078    96.84105
             _cons |   299.3407   24.64457    12.15   0.000     246.0993     352.582
      ------------------------------------------------------------------------------
      - Parameter estimates: $b_0 = 299.3$, $b_1 = 63.8$ (positive?!!)
      - Amount of variation described: $R^2 = \text{SSM} / \text{SST} = 57\%$
      - Residual variability left over (undescribed by this SLR): SSE = 102302.216

  20. SLR: Death ~ SO₂
      [Figure: scatterplot of London mortality (y-axis) against SO₂ concentration (x-axis)]

  21. Death ~ SO₂: Results
            Source |       SS       df       MS              Number of obs =      15
      -------------+------------------------------           F(  1,    13) =   28.99
             Model |  164827.112     1  164827.112           Prob > F      =  0.0001
          Residual |  73924.6211    13  5686.50932           R-squared     =  0.6904
      -------------+------------------------------           Adj R-squared =  0.6666
             Total |  238751.733    14  17053.6952           Root MSE      =  75.409
      ------------------------------------------------------------------------------
            deaths |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         so2center |   256.2356   47.59353     5.38   0.000      153.416    359.0551
             _cons |   272.2286   19.57285    13.91   0.000      229.944    314.5131
      ------------------------------------------------------------------------------
      - Parameter estimates: $b_0 = 272.2$, $b_1 = 256.2$
      - Amount of variation described: $R^2 = \text{SSM} / \text{SST} = 69\%$
      - Residual variability left over (undescribed by this SLR): SSE = 73924.6211
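
Since all three fits share the same total sum of squares, the "left over" variability can be compared directly. A small sketch that recomputes R² as 1 - SSE/SST from the three outputs (the model labels are informal shorthand, not Stata syntax):

```python
SST = 238751.733  # total sum of squares, identical in all three outputs

# Residual sums of squares (SSE) copied from the three regression outputs.
sse = {
    "deaths ~ smoke": 102302.216,
    "deaths ~ so2": 73924.6211,
    "deaths ~ smoke + so2": 33654.2025,
}

r_squared = {model: 1 - value / SST for model, value in sse.items()}
for model, r2 in r_squared.items():
    print(f"{model:22s} SSE = {sse[model]:10.1f}   R^2 = {r2:.4f}")
```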

  22. Confounding in this Example
      Recall our parameter interpretations:
      - $\beta_1$ = expected change in mortality on days when particles are 1 mg/m³ higher, if SO₂ is unchanged ($\beta_1 / 10$ for a 0.1 mg/m³ increase)
      - Suppose we examine the relationship between smoke particle concentrations and SO₂ levels (SLR):

  23. SLR: Smoke ~ SO₂
      [Figure: scatterplot of airborne smoke particles (y-axis) against SO₂ concentration (x-axis)]

  24. Confounding
      - Smoke particle concentrations and SO₂ levels are highly related! How can we talk about changing smoke particle concentrations while leaving SO₂ levels unchanged??
      - This phenomenon is called ‘confounding’: both covariates are related to the outcome and to each other.
      - Confounding is the reason we found differences between the SLR models and the MLR model.
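
The sign flip between the SLR and MLR slopes can be reproduced with made-up numbers. The sketch below uses synthetic data (not the London data) in which `x1` closely tracks `x2` and the true adjusted effect of `x1` is negative. All names and generating values are assumptions chosen for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def slr_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    return cross(x, y) / cross(x, x)

def mlr_slopes(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 via the normal equations."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic data: x1 is strongly related to x2; y decreases with x1 and
# increases with x2 once both are in the model.
rng = random.Random(0)
x2 = [rng.uniform(0.0, 1.5) for _ in range(200)]
x1 = [3.0 * v + rng.gauss(0.0, 0.3) for v in x2]
y = [100.0 - 200.0 * a + 1000.0 * b + rng.gauss(0.0, 50.0) for a, b in zip(x1, x2)]

unadjusted_b1 = slr_slope(x1, y)
adjusted_b1, adjusted_b2 = mlr_slopes(x1, x2, y)
print(f"SLR slope for x1: {unadjusted_b1:.1f}")
print(f"MLR slope for x1: {adjusted_b1:.1f} (adjusted for x2)")
```

On its own, `x1` picks up the positive effect of `x2` and shows a positive slope; once `x2` is in the model, the slope for `x1` turns negative.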

  25. Residuals: part “left over”

  26. Residuals
      - Residuals are deviations (what’s ‘left over’) in the response, Y, from what was expected given the predictor, X
      - The residuals are the part of Y that can’t be predicted by X!

  27. Adjusted Variable Plots
      Idea:
      - Explain all that we can in London daily mortality using SO₂ levels
      - Explain all that we can in smoke particle concentrations using SO₂ levels
      - Explain everything that’s ‘left over’ in mortality with everything that’s ‘left over’ in smoke particle concentrations. The slope of this line will be the MLR coefficient!

  28. Adjusted Variable Plot
      [Figure: AVP, Deaths vs. Smoke: residuals from DEATHS ~ SO2 (y-axis) against residuals from SMOKE ~ SO2 (x-axis)]

  29. Recipe for AVP
      Recipe for obtaining the MLR slope for $X_1$ from an AVP (adjusted for $X_2$):
      1. Regress $Y$ on $X_2$, save residuals as $R_{Y|X_2}$
      2. Regress $X_1$ on $X_2$, save residuals as $R_{X_1|X_2}$
      3. Plot $R_{Y|X_2}$ vs. $R_{X_1|X_2}$ (Adjusted Variable Plot)
      4. Regress $R_{Y|X_2}$ on $R_{X_1|X_2}$: $R_{Y|X_2} = \beta_0^* + \beta_1^* R_{X_1|X_2} + \varepsilon$

  30. Notes on AVPs
      - $\beta_1^*$ is identical to the coefficient of $X_1$ from an MLR of $Y$ on $X_1$ and $X_2$
      - $\beta_0^*$ is zero (zero intercept)
      - The AVP display may be misleading if $Y$ and/or $X_1$ are not linearly related to the other predictors
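
The AVP recipe and the first two notes can be checked numerically. The sketch below runs the residual-on-residual regression on synthetic data (not the London data) and compares its slope with the MLR slope computed from the normal equations. The helper names and generating values are assumptions for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def fit_line(x, y):
    """Intercept and slope from a simple linear regression of y on x."""
    slope = cross(x, y) / cross(x, x)
    return mean(y) - slope * mean(x), slope

def residuals(x, y):
    """Residuals of y after a simple linear regression on x."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Synthetic data with two correlated predictors.
rng = random.Random(1)
x2 = [rng.uniform(0.0, 1.5) for _ in range(100)]
x1 = [3.0 * v + rng.gauss(0.0, 0.3) for v in x2]
y = [100.0 - 200.0 * a + 1000.0 * b + rng.gauss(0.0, 50.0) for a, b in zip(x1, x2)]

r_y = residuals(x2, y)    # step 1: regress Y on X2, keep residuals
r_x1 = residuals(x2, x1)  # step 2: regress X1 on X2, keep residuals
# Step 3 would plot r_y against r_x1; step 4 fits the line through that plot.
avp_intercept, avp_slope = fit_line(r_x1, r_y)

# MLR slope for x1 from the two-predictor normal equations, for comparison.
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
mlr_b1 = (s22 * cross(x1, y) - s12 * cross(x2, y)) / (s11 * s22 - s12 ** 2)

print(f"AVP slope:     {avp_slope:.6f}")
print(f"MLR slope:     {mlr_b1:.6f}")
print(f"AVP intercept: {avp_intercept:.2e}")
```

Up to floating-point error, the two slopes agree and the AVP intercept is zero, because residuals from a regression with an intercept always average to zero.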
