Presentation 7.3a: Multiple linear regression
Murray Logan
July 19, 2017

Table of contents
1 Theory
2 Centering data
3 Assumptions
4 Multiple linear models in R
5 Model selection
6 Worked Examples

1. Theory

1.1. Multiple Linear Regression
1.1.1. Additive model

growth = intercept + temperature + nitrogen

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_j x_{ij} + \epsilon_i$$

OR

$$y_i = \beta_0 + \sum_{j=1}^{N} \beta_j x_{ij} + \epsilon_i$$

1.2. Multiple Linear Regression
1.2.1. Additive model

growth = intercept + temperature + nitrogen

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_j x_{ij} + \epsilon_i$$

- each partial slope is the effect of one predictor holding the other(s) constant

1.3. Multiple Linear Regression
1.3.1. Additive model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_j x_{ij} + \epsilon_i$$
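To make the additive model concrete, here is a minimal R sketch (not from the presentation; the coefficient values, sample size and variable names are arbitrary assumptions) that simulates data from the model above and recovers the partial slopes with lm():

set.seed(1)
n  <- 100
x1 <- runif(n, 20, 30)     # temperature-like predictor (hypothetical)
x2 <- runif(n, 0.5, 1.5)   # nitrogen-like predictor (hypothetical)
y  <- 2 + 0.8 * x1 + 3 * x2 + rnorm(n, sd = 1)   # beta0 = 2, beta1 = 0.8, beta2 = 3

fit <- lm(y ~ x1 + x2)
coef(fit)   # estimates should be close to 2, 0.8 and 3

Each estimated coefficient is the expected change in y for a one-unit change in that predictor, holding the other predictor constant.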
Example data:

Y     X1    X2
3     22.7  0.9
2.5   23.7  0.5
6     25.7  0.6
5.5   29.1  0.7
9     22    0.8
8.6   29    1.3
12    29.4  1

1.4. Multiple Linear Regression
1.4.1. Additive model

3    = β0 + (β1 × 22.7) + (β2 × 0.9) + ε1
2.5  = β0 + (β1 × 23.7) + (β2 × 0.5) + ε2
6    = β0 + (β1 × 25.7) + (β2 × 0.6) + ε3
5.5  = β0 + (β1 × 29.1) + (β2 × 0.7) + ε4
9    = β0 + (β1 × 22)   + (β2 × 0.8) + ε5
8.6  = β0 + (β1 × 29)   + (β2 × 1.3) + ε6
12   = β0 + (β1 × 29.4) + (β2 × 1)   + ε7

1.5. Multiple Linear Regression
1.5.1. Multiplicative model

growth = intercept + temp + nitro + temp × nitro

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \ldots + \epsilon_i$$

1.6. Multiple Linear Regression
1.6.1. Multiplicative model

3    = β0 + (β1 × 22.7) + (β2 × 0.9) + (β3 × 22.7 × 0.9) + ε1
2.5  = β0 + (β1 × 23.7) + (β2 × 0.5) + (β3 × 23.7 × 0.5) + ε2
6    = β0 + (β1 × 25.7) + (β2 × 0.6) + (β3 × 25.7 × 0.6) + ε3
5.5  = β0 + (β1 × 29.1) + (β2 × 0.7) + (β3 × 29.1 × 0.7) + ε4
9    = β0 + (β1 × 22)   + (β2 × 0.8) + (β3 × 22 × 0.8)   + ε5
8.6  = β0 + (β1 × 29)   + (β2 × 1.3) + (β3 × 29 × 1.3)   + ε6
12   = β0 + (β1 × 29.4) + (β2 × 1)   + (β3 × 29.4 × 1)   + ε7
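The two systems of equations above are exactly the design matrices R builds internally. As a hedged sketch (the data frame is reconstructed from the table above; the object name dat is my own), model.matrix() shows the predictor columns that lm() would use for each model:

## Reconstruct the example data from the table above.
dat <- data.frame(Y  = c(3, 2.5, 6, 5.5, 9, 8.6, 12),
                  X1 = c(22.7, 23.7, 25.7, 29.1, 22, 29, 29.4),
                  X2 = c(0.9, 0.5, 0.6, 0.7, 0.8, 1.3, 1))

## Additive model: columns for the intercept, X1 and X2.
model.matrix(~ X1 + X2, data = dat)

## Multiplicative model: adds an X1:X2 column holding the products X1 * X2.
model.matrix(~ X1 * X2, data = dat)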
2. Centering data

2.1. Multiple Linear Regression
2.1.1. Centering

[Figure: scatterplot of y against x, with the x axis running from 0 to 60 and y from roughly −20 to 20.]

2.2. Multiple Linear Regression
2.2.1. Centering

[Figure: the observed predictor values, which span only about 47 to 54.]

2.3. Multiple Linear Regression
2.3.1. Centering

[Figure: the same predictor values shown on their raw scale (47 to 54).]

2.4. Multiple Linear Regression
2.4.1. Centering

[Figure: the predictor values on the raw scale (47 to 54) and on the centered scale (−3 to 4), i.e. after subtracting the mean.]
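A minimal sketch of centering in R (the variable names x1 and cx1 are assumptions, chosen to match the cx1/cx2 notation used later): subtracting the mean shifts the predictor so that zero corresponds to an average observation, which changes the intercept but leaves the slope unchanged.

## Hypothetical predictor on its raw scale (values around 47-54).
set.seed(2)
x1 <- runif(20, 47, 54)
y  <- 2 * x1 + rnorm(20, sd = 2)

## Centre by subtracting the mean; scale() with scale = FALSE does the same.
cx1 <- x1 - mean(x1)
# cx1 <- as.numeric(scale(x1, center = TRUE, scale = FALSE))

coef(lm(y ~ x1))    # intercept is the predicted y at x1 = 0 (an extrapolation)
coef(lm(y ~ cx1))   # same slope; intercept is now the predicted y at the mean of x1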
2.5. Multiple Linear Regression
2.5.1. Centering

[Figure: scatterplot of y (roughly 16 to 24) against the centered predictor cx1 (−4 to 4).]

3. Assumptions

3.1. Multiple Linear Regression
3.1.1. Assumptions

The usual linear model assumptions apply: normality, homogeneity of variance and linearity (one common visual screen is sketched below).
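A hedged sketch of that screen (not taken from the presentation; the data frame name data and the variable names are assumed): a scatterplot matrix of the response and predictors gives a quick look at distributions, spread and pairwise linearity.

library(car)

## Scatterplots for every pair of variables, with the response included.
scatterplotMatrix(~ y + cx1 + cx2, data = data)

## Base R alternative:
pairs(data[, c("y", "cx1", "cx2")])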
3.2. Multiple Linear Regression
3.2.1. Assumptions

(multi)collinearity — the predictors should not be strongly correlated with one another.

3.3. Multiple Linear Regression
3.3.1. Variance inflation

The strength of the relationship between one predictor and the other predictor(s) is measured by R². The relationship is considered strong when R² ≥ 0.8.

3.4. Multiple Linear Regression
3.4.1. Variance inflation

$$\mathrm{var.inf} = \frac{1}{1 - R^2}$$

Predictors are considered collinear when var.inf ≥ 5 (some prefer a stricter cut-off of 3).
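To connect this formula to the vif() output below, here is a hedged sketch (object names assumed, matching the centred predictors used later) that computes a variance inflation factor by hand: regress one predictor on the other, take that model's R², and plug it into 1/(1 − R²).

## Manual variance inflation factor for cx1 in the additive model:
## R^2 from regressing cx1 on the other predictor(s) ...
r2 <- summary(lm(cx1 ~ cx2, data))$r.squared

## ... plugged into var.inf = 1 / (1 - R^2); this should match vif()'s value for cx1.
1 / (1 - r2)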
3.5. Multiple Linear Regression
3.5.1. Assumptions

(multi)collinearity

library(car)
# additive model - scaled predictors
vif(lm(y ~ cx1 + cx2, data))

     cx1      cx2
1.743817 1.743817

3.6. Multiple Linear Regression
3.6.1. Assumptions

(multi)collinearity

library(car)
# additive model - scaled predictors
vif(lm(y ~ cx1 + cx2, data))

     cx1      cx2
1.743817 1.743817

# multiplicative model - raw predictors
vif(lm(y ~ x1 * x2, data))

       x1        x2     x1:x2
 7.259729  5.913254 16.949468

3.7. Multiple Linear Regression
3.7.1. Assumptions

# multiplicative model - raw predictors
vif(lm(y ~ x1 * x2, data))

       x1        x2     x1:x2
 7.259729  5.913254 16.949468

# multiplicative model - scaled predictors
vif(lm(y ~ cx1 * cx2, data))

     cx1      cx2  cx1:cx2
1.769411 1.771994 1.018694

4. Multiple linear models in R

4.1. Model fitting

Additive model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

data.add.lm <- lm(y ~ cx1 + cx2, data)

4.2. Model fitting

Additive model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$$

data.add.lm <- lm(y ~ cx1 + cx2, data)

Multiplicative model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \epsilon_i$$

data.mult.lm <- lm(y ~ cx1 + cx2 + cx1:cx2, data)
# OR
data.mult.lm <- lm(y ~ cx1 * cx2, data)
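Before moving to diagnostics, a brief hedged aside (not part of the original slides): the fitted model objects can be inspected with summary(), which reports each partial slope, its standard error, and the model R².

## Inspect the estimated coefficients of the two models.
summary(data.add.lm)    # partial slopes for cx1 and cx2, holding the other constant
summary(data.mult.lm)   # adds the cx1:cx2 interaction term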
4.3. Model evaluation

Additive model

plot(data.add.lm)

[Figure: standard lm diagnostic plots for the additive model — Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage with Cook's distance; observations such as 30, 40 and 74 are flagged.]

4.4. Model evaluation

Multiplicative model

plot(data.mult.lm)

[Figure: standard lm diagnostic plots for the multiplicative model — Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage with Cook's distance; observations such as 30, 59 and 74 are flagged.]
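As a follow-up sketch (not in the original slides), the observations flagged in the Residuals vs Leverage panels can also be checked numerically through their Cook's distances; values much larger than the rest warrant a closer look.

## Cook's distance for every observation in the additive model.
cd <- cooks.distance(data.add.lm)

## Largest values first (e.g. the observations labelled 30, 40 and 74 above).
sort(cd, decreasing = TRUE)[1:5]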