Linear and logistic regression models
Søren Højsgaard
Department of Mathematical Sciences, Aalborg University, Denmark
August 22, 2012
Contents

1 Linear normal models
2 Linear regression
  2.1 Fitting linear regression model with lm()
  2.2 Printing the object: print()
  2.3 Model objects
  2.4 Extractor methods and methods for further computing
  2.5 Plotting the object: plot()
  2.6 Summary information: summary()
  2.7 Confidence interval for model parameters: confint()
  2.8 Predicting new cases: predict()
3 Regression with a factor
  3.1 Transforming data using transform()
4 Model comparison
  4.1 Comparing two models with anova()
  4.2 Three commonly used tables for model comparisons
  4.3 Sequential ANOVA table: anova()
  4.4 On interpreting the anova() output
  4.5 Dropping each term in turn using drop1()
  4.6 On interpreting the drop1() output
  4.7 Investigating parameter estimates using coef()
  4.8 Which table to use?*
5 Residuals and model checking
  5.1 Interpreting diagnostic plots
6 Logistic regression
1 Linear normal models

• Linear normal models (regression models, analysis of variance models, analysis of covariance models, etc.) are fitted using the lm() function.
• The lm() function is typically called as:
  R> lm(y ~ x1 + x2 + x3, data = dataset)
• The result of calling lm() is an object (also called a model object) with a specific class.
• Further analysis of the model typically proceeds by applying additional R functions to the model object, as sketched below.
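A minimal sketch of this fit-then-analyze pattern (the names y, x1, x2 and mydata are illustrative placeholders, not an actual dataset):

R> fit <- lm(y ~ x1 + x2, data = mydata)  # fit the model; returns an object of class "lm"
R> summary(fit)                           # parameter estimates, standard errors, tests
R> coef(fit)                              # extract the estimated coefficients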
2 Linear regression

The hellung dataset has 51 rows and 3 columns: the diameter and concentration of Tetrahymena cells grown with (coded as 1) and without (coded as 2) glucose added to the growth medium. Tetrahymena cells are often used as model organisms in experimental biology.
R> data(hellung, package="ISwR")
R> head(hellung, 6)
  glucose   conc diameter
1       1 631000     21.2
2       1 592000     21.5
3       1 563000     21.3
4       1 475000     21.0
5       1 461000     21.5
6       1 416000     21.3
R> sapply(hellung, class)
  glucose      conc  diameter
"integer" "integer" "numeric"
R> par(mfrow=c(1,2))
R> plot(diameter ~ conc, data=hellung,
+      col=glucose, pch=as.character(glucose))
R> plot(log(diameter) ~ log(conc), data=hellung,
+      col=glucose, pch=as.character(glucose))

[Figure: two scatter plots, diameter versus conc (left) and log(diameter) versus log(conc) (right), with points labelled 1/2 by glucose group]

• On a log scale the curves look linear,
• but are they parallel?
For now we ignore the glucose treatment. In the following let y = log(diameter) and x = log(conc). The plots suggest an approximately linear relationship on the log scale, which we can capture by the linear regression model

  y_i = α + β x_i + e_i,   e_i ∼ N(0, σ²),   i = 1, ..., 51
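To see what the model asserts, we can simulate data from it. The parameter values below (α = 3.75, β = -0.05, σ = 0.04) are illustrative choices, not estimates from the hellung data:

R> set.seed(1)                           # make the simulation reproducible
R> x <- runif(51, min = 10, max = 13)    # covariate values, roughly the range of log(conc)
R> e <- rnorm(51, mean = 0, sd = 0.04)   # errors e_i ~ N(0, sigma^2)
R> y <- 3.75 - 0.05 * x + e              # y_i = alpha + beta * x_i + e_i
R> plot(x, y)                            # should resemble the log-scale plot above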
2.1 Fitting linear regression model with lm()

R> hm <- lm(log(diameter) ~ log(conc), data=hellung)

Now hm is a linear model object.
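Since hm is a simple regression of log(diameter) on log(conc), one quick check of the fit is to overlay the estimated line on the log-scale scatter plot:

R> plot(log(diameter) ~ log(conc), data = hellung)
R> abline(hm)   # abline() accepts a simple linear model object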
2.2 Printing the object: print()

The print() method gives some information about the object:

R> print(hm)

Call:
lm(formula = log(diameter) ~ log(conc), data = hellung)

Coefficients:
(Intercept)    log(conc)
    3.74703     -0.05451

Instead of calling print() we may simply type the object's name:

R> hm

Call:
lm(formula = log(diameter) ~ log(conc), data = hellung)

Coefficients:
(Intercept)    log(conc)
    3.74703     -0.05451
2.3 Model objects

Technically, a model object is a list containing all sorts of information: for example the model specification, the data, the parameter estimates and so on.

R> class(hm)
[1] "lm"
R> names(hm)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

The named elements of the list are the components of the model object.
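An overview of these components can be obtained with str(); restricting the output to the top level keeps it manageable:

R> str(hm, max.level = 1)   # name and type of each component, without recursing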
We may extract values from the model object using the $ operator as follows:

R> hm$coefficients
(Intercept)   log(conc)
 3.74702641 -0.05451464
R> hm$residuals
            1             2             3             4             5
 0.0350211011  0.0455948627  0.0335108932  0.0100606873  0.0319602834
            6             7             8             9            10
 0.0170150708 -0.0352928345  0.0665403222  0.0089021709  0.0182022593
           11            12            13            14            15
 0.0349529425  0.0214114524  0.0462117137  0.0518996737  0.0275888916
           16            17            18            19            20
 0.0191915976  0.0218841508  0.0061947069 -0.0004889143  0.0306645283
           21            22            23            24            25
 0.0059028969  0.0565348153  0.0212271549  0.0298302213  0.0260690999
           26            27            28            29            30
-0.0098984554  0.0471016029  0.0193724964  0.0218186112  0.0256750284
           31            32            33            34            35
 0.0096540212  0.0298367056 -0.0641562641 -0.0611421290 -0.0683058568
           36            37            38            39            40
-0.0177867867 -0.0368224409 -0.0443737505 -0.0468146393 -0.0932894058
           41            42            43            44            45
-0.0149983152 -0.0164825214 -0.0395395538 -0.0230984646  0.0014197167
           46            47            48            49            50
-0.0295345605 -0.0402017510 -0.0534922053 -0.0320022390 -0.0401489903
           51
-0.0533796004
2.4 Extractor methods and methods for further computing

For some of the components there exist extractor functions, for example:
R> coef(hm)
(Intercept)   log(conc)
 3.74702641 -0.05451464
R> residuals(hm)
            1             2             3             4             5
 0.0350211011  0.0455948627  0.0335108932  0.0100606873  0.0319602834
            6             7             8             9            10
 0.0170150708 -0.0352928345  0.0665403222  0.0089021709  0.0182022593
           11            12            13            14            15
 0.0349529425  0.0214114524  0.0462117137  0.0518996737  0.0275888916
           16            17            18            19            20
 0.0191915976  0.0218841508  0.0061947069 -0.0004889143  0.0306645283
           21            22            23            24            25
 0.0059028969  0.0565348153  0.0212271549  0.0298302213  0.0260690999
           26            27            28            29            30
-0.0098984554  0.0471016029  0.0193724964  0.0218186112  0.0256750284
           31            32            33            34            35
 0.0096540212  0.0298367056 -0.0641562641 -0.0611421290 -0.0683058568
           36            37            38            39            40
-0.0177867867 -0.0368224409 -0.0443737505 -0.0468146393 -0.0932894058
           41            42            43            44            45
-0.0149983152 -0.0164825214 -0.0395395538 -0.0230984646  0.0014197167
           46            47            48            49            50
-0.0295345605 -0.0402017510 -0.0534922053 -0.0320022390 -0.0401489903
           51
-0.0533796004
Moreover, various methods are available for model objects, and each method performs a specific task. Some of these methods are print(), summary(), plot(), coef(), fitted(), predict(), ...
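The complete set of methods registered for a class can be listed with methods(); for lm objects:

R> methods(class = "lm")   # lists print.lm, summary.lm, plot.lm, predict.lm, ...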
2.5 Plotting the object: plot()

The plot() method for lm objects produces illustrative diagnostic plots:

R> par(mfrow=c(2,2), mar=c(2,4.5,2,2))
R> plot(hm)

[Figure: four diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance; observations such as 8, 35 and 40 are flagged as extreme]
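The four panels can also be drawn one at a time via the which argument of the plot() method; for example, the residuals-vs-fitted plot alone:

R> plot(hm, which = 1)   # 1 = Residuals vs Fitted; see ?plot.lm for the remaining panels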