
Marcel Dettling, Institute for Data Analysis and Process Design: Applied Statistical Regression, HS 2010 – Week 10



  1. Applied Statistical Regression, HS 2010 – Week 10

     Marcel Dettling, Institute for Data Analysis and Process Design, Zurich University of Applied Sciences

     marcel.dettling@zhaw.ch, http://stat.ethz.ch/~dettling

     ETH Zürich, November 29, 2010

  2. Logistic Regression Model

     - $Y_i \in \{0, 1\}$ has a Bernoulli distribution.
     - The parameter of this distribution is $p_i$, the success rate.

     Now please note that:

     $$p_i = P(Y_i = 1) = E[Y_i]$$

     → The most powerful notion of the logistic regression model is to see it as a model where we try to find a relation between the expected value of $Y_i$ and the predictors!

     Important: $p_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}$ is no good here, because the right-hand side can take any real value while $p_i$ must lie in $[0, 1]$.
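
A minimal R sketch of why the linear form is no good, under stated assumptions: the data are simulated, and the seed and the names `fit_lin` and `fit_log` are illustrative, not part of the lecture. A straight-line fit to a 0/1 response should produce fitted values outside $[0, 1]$, while a logistic fit stays inside.

```r
# Simulated binary data (an assumption for illustration, not lecture data).
set.seed(1)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(length(x), size = 1, prob = plogis(1.5 * x))

# Linear model for E[Y]: fitted values are not constrained to [0, 1].
fit_lin <- lm(y ~ x)

# Logistic regression: fitted values are probabilities.
fit_log <- glm(y ~ x, family = binomial)

range_lin <- range(fitted(fit_lin))
range_log <- range(fitted(fit_log))
print(range_lin)
print(range_log)
```

Comparing the two printed ranges shows which model respects the constraint on $p_i$.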

  3. Example: Survival in Premature Birth

     [Figure: scatter plot of age against log10(weight).]

  4. Inference with GLMs

     There are three tests that can be done:

     - Goodness-of-fit test
       - based on comparing against the saturated model
       - not suitable for non-grouped, binary data
     - Comparing two nested models
       - likelihood ratio test leads to deviance differences
       - test statistic has an asymptotic Chi-Square distribution
     - Global test
       - comparing versus an empty model with only an intercept
       - this is a nested model, take the null deviance
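
The comparison of two nested models can be sketched in R as follows. This is only an illustration on simulated data; the variables `x1`, `x2` and the model names are assumptions, not taken from the lecture. The deviance difference is computed by hand and then checked against `anova()`.

```r
# Simulated data (assumption): x2 has no true effect on the response.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x1))

fit_small <- glm(y ~ x1,      family = binomial)  # nested (smaller) model
fit_big   <- glm(y ~ x1 + x2, family = binomial)  # larger model

# Likelihood ratio test: the deviance difference is asymptotically
# Chi-Square distributed, with df equal to the number of extra parameters.
dev_diff <- deviance(fit_small) - deviance(fit_big)
df_diff  <- df.residual(fit_small) - df.residual(fit_big)
p_manual <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

# The same test via the built-in analysis-of-deviance table.
tab <- anova(fit_small, fit_big, test = "Chisq")
print(tab)
print(p_manual)
```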

  5. Null Deviance

     Smallest model:

     - The smallest model is without predictors, only with intercept
     - Fitted values will all be equal to $\hat{\pi}_0$
     - Our best fit (F) and the smallest model (0) are nested

     A global test:

     $$2\left(l^{(F)} - l^{(0)}\right) = D\left(y, \hat{p}^{(0)}\right) - D\left(y, \hat{p}^{(F)}\right)$$

     Example:

         Null deviance: 319.28 on 246 degrees of freedom
         Residual deviance: 235.94 on 244 degrees of freedom
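
As a worked version of the global test, the following R lines take the two deviances from the example and compare their difference with a Chi-Square distribution. The variable names are illustrative; only the four numbers come from the slide.

```r
# Values copied from the example output on the slide.
null_dev  <- 319.28
null_df   <- 246
resid_dev <- 235.94
resid_df  <- 244

# Test statistic: null deviance minus residual deviance.
stat    <- null_dev - resid_dev
df_diff <- null_df - resid_df

# Upper tail probability of the asymptotic Chi-Square distribution.
p_value <- pchisq(stat, df = df_diff, lower.tail = FALSE)
print(c(statistic = stat, df = df_diff, p_value = p_value))
```

The statistic is 83.34 on 2 degrees of freedom, so the p-value is far below any conventional significance level and the predictors jointly matter.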

  6. Model Diagnostics

     Diagnostics are:

     - as important with logistic regression as they are with multiple linear regression models
     - again based on differences between fitted and observed values

     → We now have to take into account that the variances are not equal for the different instances.

     → We have to come up with novel types of residuals: Pearson and Deviance residuals.

  7. Pearson Residuals

     Take the difference between observed and fitted value and divide by an estimate of the standard deviation:

     $$R_i = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}}$$

     → $R_i^2$ is the contribution of the $i$th observation to the Pearson statistic for model comparison.

     → It is important to note that Pearson residuals exceeding a value of two in absolute value warrant a closer look.
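
The Pearson residual formula can be checked in R with a short sketch. The data here are simulated and all variable names are assumptions made for illustration. The hand-computed residuals are compared with `residuals(fit, type = "pearson")`.

```r
# Simulated binary data (assumption, not from the lecture).
set.seed(2)
n <- 150
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + x))
fit <- glm(y ~ x, family = binomial)

# Pearson residuals by hand: (observed - fitted) / estimated standard deviation.
p_hat     <- fitted(fit)
r_manual  <- (y - p_hat) / sqrt(p_hat * (1 - p_hat))
r_builtin <- residuals(fit, type = "pearson")

# Sum of squared residuals gives the Pearson statistic.
pearson_stat <- sum(r_manual^2)

# Observations whose residual exceeds two in absolute value.
flagged <- which(abs(r_manual) > 2)
print(pearson_stat)
print(flagged)
```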

  8. Deviance Residuals

     Take the contribution of the $i$th observation to the log-likelihood, i.e. to the chi-square statistic for model comparison:

     $$d_i = -2 \left( y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right)$$

     For obtaining a well interpretable residual, we take the square root and the sign of the difference between true and fitted value:

     $$D_i = \mathrm{sign}(y_i - \hat{p}_i) \cdot \sqrt{d_i}$$

     → As for Pearson residuals, values exceeding two in absolute value warrant a closer look.
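
A small R sketch of the two formulas, again on simulated data with illustrative variable names (assumptions, not lecture material). It computes $d_i$ and $D_i$ by hand, compares them with `residuals(fit, type = "deviance")`, and confirms that the $d_i$ add up to the residual deviance.

```r
# Simulated binary data (assumption, not from the lecture).
set.seed(3)
n <- 150
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

p_hat <- fitted(fit)

# Contribution of each observation to the deviance.
d <- -2 * (y * log(p_hat) + (1 - y) * log(1 - p_hat))

# Deviance residual: signed square root of that contribution.
D_manual  <- sign(y - p_hat) * sqrt(d)
D_builtin <- residuals(fit, type = "deviance")

print(sum(d))
print(deviance(fit))
```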

  9. Tukey-Anscombe Plot

     Remark: sometimes studentized residuals are used!

     [Figure: two Tukey-Anscombe plots of Pearson residuals. Plot 1 shows them against the fitted probabilities, Plot 2 against the linear predictor.]

  10. Tukey-Anscombe Plot

      The Tukey-Anscombe plots in R are not perfect. Better use:

          # `fit` is a fitted glm object
          xx <- predict(fit, type="response")    # fitted probabilities
          yy <- residuals(fit, type="pearson")   # Pearson residuals
          # family="gaussian" selects the non-robust least-squares smoother
          scatter.smooth(xx, yy, family="gaussian", pch=20)
          abline(h=0, lty=3)                     # dotted reference line at zero

      Reasons:

      - using a non-robust smoother is a must
      - different types of residuals can be used
      - on the x-axis: fitted probabilities or linear predictor

  11. More Diagnostics

      [Figure: Residuals vs Leverage plot for glm(survival ~ I(log10(weight)) + age), showing standardized Pearson residuals against leverage with Cook's distance contours; observations 4, 68 and 165 are labelled.]
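
The quantities behind a Residuals vs Leverage plot can also be extracted numerically. The sketch below uses simulated data, since the premature birth data are not reproduced here; the predictors `x1`, `x2` and all other names are assumptions for illustration.

```r
# Simulated data standing in for the lecture example (assumption).
set.seed(4)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.2 + x1 - 0.5 * x2))
fit <- glm(y ~ x1 + x2, family = binomial)

lev   <- hatvalues(fit)                    # leverage of each observation
cook  <- cooks.distance(fit)               # Cook's distance
r_std <- rstandard(fit, type = "pearson")  # standardized Pearson residuals

# Standardized Pearson residuals by hand (binomial dispersion is 1).
r_std_manual <- residuals(fit, type = "pearson") / sqrt(1 - lev)

# The three observations with the largest Cook's distance.
worst <- order(cook, decreasing = TRUE)[1:3]
print(data.frame(obs = worst, leverage = lev[worst],
                 cook = cook[worst], std_pearson = r_std[worst]))

# plot(fit, which = 5) draws the corresponding diagnostic panel.
```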

  12. Binomial Regression Models

      | Concentration (log of mg/l) | Number of insects $n_i$ | Number of killed insects $y_i$ |
      |---|---|---|
      | 0.96 | 50 | 6 |
      | 1.33 | 48 | 16 |
      | 1.63 | 46 | 24 |
      | 2.04 | 49 | 42 |
      | 2.32 | 50 | 44 |

      → For the number of killed insects, we have $Y_i \sim \mathrm{Bin}(n_i, p_i)$.

      → We are mainly interested in the proportion of insects surviving.

      → These are grouped data: there is more than 1 observation for a given predictor setting.

  13. Model and Estimation

      The goal is to find a relation:

      $$p_i = P(Y_i = 1 \mid x_{i1}, \ldots, x_{ip}) \sim \eta_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}$$

      We will again use the logit link function such that $\eta_i = g(p_i)$:

      $$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}$$

      Here, $p_i$ is the expected value $E[Y_i / n_i]$, and thus this model also fits within the GLM framework. With $y_i$ the number of successes out of $n_i$, the log-likelihood is:

      $$l(\beta) = \sum_{i=1}^{k} \left[ \log\binom{n_i}{y_i} + y_i \log(p_i) + (n_i - y_i) \log(1 - p_i) \right]$$
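
To make the estimation step concrete, here is a hedged R sketch that writes the binomial log-likelihood as a function and maximizes it numerically for the insecticide data from the table. The use of `optim()` with BFGS is an illustrative choice of mine, not the method used by `glm()`, which relies on iteratively reweighted least squares. The names `loglik`, `n_i`, `y_i` and `beta_hat` are assumptions.

```r
# Insecticide data as given in the table of the lecture.
conc <- c(0.96, 1.33, 1.63, 2.04, 2.32)
n_i  <- c(50, 48, 46, 49, 50)   # number of insects
y_i  <- c(6, 16, 24, 42, 44)    # number of killed insects

# Binomial log-likelihood under the logit link.
loglik <- function(beta) {
  eta <- beta[1] + beta[2] * conc
  # log(p) and log(1 - p), computed in a numerically stable way.
  log_p   <- plogis(eta, log.p = TRUE)
  log_1mp <- plogis(eta, lower.tail = FALSE, log.p = TRUE)
  sum(lchoose(n_i, y_i) + y_i * log_p + (n_i - y_i) * log_1mp)
}

# Maximize the log-likelihood by minimizing its negative.
opt <- optim(c(0, 0), function(beta) -loglik(beta), method = "BFGS")
beta_hat <- opt$par
print(round(beta_hat, 3))
print(loglik(beta_hat))
```

The maximizer should be close to the coefficients that `glm()` reports for the same data.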

  14. Fitting with R

      We need to generate a two-column matrix where the first column contains the “successes” and the second contains the “failures”:

          > killsurv
               killed surviv
          [1,]      6     44
          [2,]     16     32
          [3,]     24     22
          [4,]     42      7
          [5,]     44      6
          > fit <- glm(killsurv ~ conc, family="binomial")
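
The slide shows the matrix but not how it is built. One possible construction, sketched here as an assumption, uses `cbind()` on the counts from the insecticide table; `n_i` is an illustrative name, while `killed`, `surviv`, `killsurv` and `conc` follow the slide.

```r
# Counts from the insecticide table.
conc   <- c(0.96, 1.33, 1.63, 2.04, 2.32)
n_i    <- c(50, 48, 46, 49, 50)
killed <- c(6, 16, 24, 42, 44)
surviv <- n_i - killed

# Two-column response: successes first, failures second.
killsurv <- cbind(killed, surviv)
print(killsurv)

# Binomial regression with the default logit link.
fit <- glm(killsurv ~ conc, family = "binomial")
print(round(coef(fit), 4))
print(round(deviance(fit), 4))
```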

  15. Summary Output

      The result for the insecticide example is:

          > summary(glm(killsurv ~ conc, family = "binomial"))
          Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
          (Intercept)  -4.8923     0.6426  -7.613 2.67e-14 ***
          conc          3.1088     0.3879   8.015 1.11e-15 ***
          ---
          Null deviance: 96.6881 on 4 degrees of freedom
          Residual deviance: 1.4542 on 3 degrees of freedom
          AIC: 24.675
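
Because the insecticide data are grouped, the goodness-of-fit test is applicable here. As a sketch, the lines below plug the deviances from the summary output into the Chi-Square distribution; the variable names are illustrative, only the numbers are taken from the output.

```r
# Deviances and degrees of freedom from the summary output.
null_dev  <- 96.6881
null_df   <- 4
resid_dev <- 1.4542
resid_df  <- 3

# Goodness-of-fit test: residual deviance against the saturated model.
p_gof <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

# Global test: fitted model against the intercept-only model.
p_global <- pchisq(null_dev - resid_dev, df = null_df - resid_df,
                   lower.tail = FALSE)

print(c(goodness_of_fit = p_gof, global = p_global))
```

The goodness-of-fit p-value is well above 0.05, so there is no evidence of lack of fit, while the global test p-value is extremely small, so the concentration clearly has an effect.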
