ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Case Study 2: Sale Prices of Residential Properties

Price of Residential Property

How does the sale price of a property relate to the appraised values of the land and of the improvements on the land, and to the neighborhood it is in?

Two questions:
- Do the data indicate that price can be predicted based on these variables?
- Is the relationship the same in different neighborhoods?

Available data for 176 sales between May 2008 and June 2009:
- Sale price, $y$;
- Appraised land value, in thousands of dollars, $x_1$;
- Appraised improvement value, in thousands of dollars, $x_2$;
- Neighborhood, three indicator variables $x_3$, $x_4$, and $x_5$.

Get the data and plot them:

    path <- file.path("Text", "Cases", "TAMSALES4.txt")
    prices <- read.table(path, header = TRUE)
    pairs(prices[, c("SALES", "LAND", "IMP")])

Consider four (nested) models:
- Model 1: First order in $x_1$ and $x_2$, no neighborhood effect;
- Model 2: First order, additive neighborhood effect;
- Model 3: First order, interactive neighborhood effect;
- Model 4: Interaction model in $x_1$ and $x_2$, and interactive neighborhood effect.

Model 1:

    l1 <- lm(SALES ~ LAND + IMP, prices)

Model 2:

    l2 <- lm(SALES ~ LAND + IMP + NBHD, prices)

Model 3:

    l3 <- lm(SALES ~ (LAND + IMP) * NBHD, prices)

Model 4:

    l4 <- lm(SALES ~ (LAND * IMP) * NBHD, prices)
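
The `*` operator in an R formula expands to main effects plus interactions, which is what separates Models 3 and 4. A minimal sketch that lists the terms each formula generates; it needs no data, and the helper name `model_terms` is an illustrative assumption, not part of the case study:

```r
# Hypothetical helper: list the terms that a model formula expands to.
model_terms <- function(f) attr(terms(f), "term.labels")

m3 <- model_terms(SALES ~ (LAND + IMP) * NBHD)
m4 <- model_terms(SALES ~ (LAND * IMP) * NBHD)

print(m3)
print(m4)

# Terms that Model 4 adds to Model 3
print(setdiff(m4, m3))
```

Each term involving `NBHD` stands for three coefficients, one per indicator variable, so the number of terms is smaller than the number of coefficients.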

Summary of the models

    Model      s   R^2_a   s_jackknife   R^2_jackknife        AIC
    1      112.9   .9233         118.3           .9154   2168.194
    2      111.3   .9256         117.9           .9159   2165.956
    3      108.7   .9290         121.3           .9111   2163.340
    4      103.1   .9361         130.7           .8967   2148.515

Notes

- Small $s$ and high $R^2_a$ are desirable. Here, Model 4 is optimal for both. The best model for one is always the best model for the other.
- $s_{\text{jackknife}}$ is the square root of

  $$s^2_{\text{jackknife}} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2,$$

  where $\hat{y}_{(i)}$ is the prediction of $y_i$ from the model fitted without observation $i$. This is not the same as $\text{MSE}_{\text{jackknife}}$ (Chapter 5).
- Small $s_{\text{jackknife}}$ and high $R^2_{\text{jackknife}}$ are also desirable. Here, Model 2 is optimal for both. Again, the best model for one is always the best model for the other.
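
The jackknife quantities can be computed without refitting the model $n$ times, because each deleted residual equals the ordinary residual divided by one minus its leverage. A minimal sketch under stated assumptions: the built-in `cars` data stand in for the case-study data, `jackknife_stats` is a hypothetical helper, and $R^2_{\text{jackknife}}$ is taken to be one minus the sum of squared deleted residuals over the total sum of squares (a definition that reproduces the tabled values from $s_{\text{jackknife}}$, but that the slides do not state explicitly):

```r
# Sketch: jackknife (leave-one-out) summaries for a fitted lm object.
# Assumption: the built-in `cars` data stand in for the case-study data.
jackknife_stats <- function(fit) {
  y <- model.response(model.frame(fit))
  # Deleted residual y_i - yhat_(i), obtained without refitting:
  # e_i / (1 - h_ii), where h_ii is the leverage of observation i.
  d <- residuals(fit) / (1 - hatvalues(fit))
  press <- sum(d^2)
  c(s_jackknife  = sqrt(press / length(y)),
    R2_jackknife = 1 - press / sum((y - mean(y))^2))
}

fit <- lm(dist ~ speed, data = cars)
print(jackknife_stats(fit))
```

With the fitted objects from the case study available, a call such as `jackknife_stats(l2)` would be expected to give the corresponding row of the summary table.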

Information Criteria

AIC is Akaike's Information Criterion:

$$\text{AIC} = n \log \hat{\sigma}^2 + 2(k + 1) \quad (+ \dots)$$

where $\hat{\sigma}^2$ is the biased estimator of $\sigma^2$:

$$\hat{\sigma}^2 = \frac{n - (k + 1)}{n} s^2.$$

BIC (not shown in the table) is the Bayesian Information Criterion:

$$\text{BIC} = n \log \hat{\sigma}^2 + (k + 1) \log n \quad (+ \dots)$$

Small values of both are desirable.
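
The "(+ ...)" terms do not affect comparisons between models fitted to the same data, but they explain why software output differs from the bare formulas. A minimal sketch that checks the formulas against R's `AIC()` and `BIC()`, assuming the built-in `cars` data as a stand-in for the case-study data:

```r
# Sketch: check the AIC and BIC formulas against R's built-in functions.
# Assumption: the built-in `cars` data stand in for the case-study data.
fit <- lm(dist ~ speed, data = cars)
n <- nobs(fit)
k <- length(coef(fit)) - 1             # number of predictors
s2 <- summary(fit)$sigma^2             # unbiased estimate s^2
sigma2_hat <- (n - (k + 1)) / n * s2   # biased estimate, equal to SSE / n

# The "(+ ...)" terms as R computes them for lm objects: a constant that
# depends only on n, plus a penalty for estimating sigma^2 itself.
const <- n * (log(2 * pi) + 1)
aic <- n * log(sigma2_hat) + 2 * (k + 1) + const + 2
bic <- n * log(sigma2_hat) + (k + 1) * log(n) + const + log(n)

print(c(formula = aic, builtin = AIC(fit)))
print(c(formula = bic, builtin = BIC(fit)))
```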

$R^2_a$ and AIC suggest using Model 4, but $R^2_{\text{jackknife}}$ suggests using the simpler Model 2.

The other criterion, BIC, suggests using Model 1!

We can also use the nested model $F$-test approach to decide between any pair of the models.
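
Because AIC and BIC differ only in the penalty term, BIC can be recovered from the tabled AIC values. A minimal sketch, assuming the AIC column was produced by R's `AIC()`, which counts the error variance as one extra parameter:

```r
# Values copied from the summary table: AIC for Models 1 to 4.
aic <- c(2168.194, 2165.956, 2163.340, 2148.515)
n <- 176
# Regression coefficients (k + 1) per model: intercept, LAND, IMP, plus
# the neighborhood and interaction terms that each model adds.
n_coef <- c(3, 6, 12, 16)
# Assumption: R also counts the error variance as a parameter.
p <- n_coef + 1

# Swap the AIC penalty 2p for the BIC penalty p * log(n).
bic <- aic - 2 * p + p * log(n)
print(round(bic, 1))
print(which.min(bic))
```

Under these assumptions the heavier penalty, $\log 176 \approx 5.17$ per parameter instead of 2, is enough to reverse the ranking in favor of the smallest model.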

Model 1 versus Model 2

Model 2 differs from Model 1 only by including NBHD, so the ANOVA table provides the test:

    summary(aov(l2))

                 Df   Sum Sq  Mean Sq  F value Pr(>F)
    LAND          1 16418670 16418670 1326.237 <2e-16 ***
    IMP           1 10475902 10475902  846.203 <2e-16 ***
    NBHD          3   100859    33620    2.716 0.0464 *
    Residuals   170  2104582    12380
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The NBHD line shows that we reject Model 1 in favor of Model 2 at the 5% level, but not at the 1% level.

Model 2 versus Model 3

Model 3 differs from Model 2 by including the interactions LAND:NBHD and IMP:NBHD, so we need to do some arithmetic:

    summary(aov(l3))

                 Df   Sum Sq  Mean Sq  F value Pr(>F)
    LAND          1 16418670 16418670 1390.212 <2e-16 ***
    IMP           1 10475902 10475902  887.022 <2e-16 ***
    NBHD          3   100859    33620    2.847 0.0393 *
    LAND:NBHD     3    65732    21911    1.855 0.1392
    IMP:NBHD      3   101979    33993    2.878 0.0377 *
    Residuals   164  1936871    11810

$F = (3 \times 1.855 + 3 \times 2.878)/6 = 2.366$ with a $P$-value of .0322, so we also reject Model 2 in favor of Model 3 at the 5% level.
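
The weighted average works because every row of a sequential ANOVA table is divided by the same residual mean square. A short derivation, writing $SS_A$, $SS_B$ and $df_A$, $df_B$ for the sums of squares and degrees of freedom of the two added rows, $F_A$, $F_B$ for their tabled $F$ values, and $MSE$ for the residual mean square (symbols introduced here for illustration only):

```latex
\begin{align*}
F &= \frac{(SS_A + SS_B)/(df_A + df_B)}{MSE} \\
  &= \frac{df_A \,\dfrac{SS_A/df_A}{MSE} + df_B \,\dfrac{SS_B/df_B}{MSE}}{df_A + df_B} \\
  &= \frac{df_A F_A + df_B F_B}{df_A + df_B}.
\end{align*}
```

The shortcut relies on the added terms being the last rows before Residuals, so that their sequential sums of squares together equal the drop in residual sum of squares from the smaller model to the larger one.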

Model 3 versus Model 4

Model 4 differs from Model 3 by including the interactions LAND:IMP and LAND:IMP:NBHD. ANOVA table in which these are the last two rows:

    summary(aov(SALES ~ NBHD * LAND * IMP, data = prices))

                   Df   Sum Sq  Mean Sq  F value   Pr(>F)
    NBHD            3  6045891  2015297  189.531  < 2e-16 ***
    LAND            1 11413704 11413704 1073.417  < 2e-16 ***
    IMP             1  9535835  9535835  896.810  < 2e-16 ***
    NBHD:LAND       3    65732    21911    2.061   0.1076
    NBHD:IMP        3   101979    33993    3.197   0.0251 *
    LAND:IMP        1   185809   185809   17.475 4.78e-05 ***
    NBHD:LAND:IMP   3    49773    16591    1.560   0.2011
    Residuals     160  1701289    10633

$F = (1 \times 17.475 + 3 \times 1.560)/4 = 5.539$ with a $P$-value of .0003, so we also reject Model 3 in favor of Model 4 at the 5% level, and at the 1% and 0.1% levels.

But note: each of these tests answers the question:

    Is there enough evidence against the simpler model to reject it?

That is not the same question as:

    Which of these models will give the best predictions?
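
The three comparisons can be reproduced from the tabled sums of squares alone. A minimal sketch, in which `nested_f` is a hypothetical helper (not a base R function) and all inputs are copied from the ANOVA tables above:

```r
# Sketch: nested-model F-test from sequential ANOVA sums of squares.
# `nested_f` is a hypothetical helper, not a base R function.
nested_f <- function(ss_added, df_added, ss_resid, df_resid) {
  # Pooled mean square of the added terms over the residual mean square
  f <- (sum(ss_added) / sum(df_added)) / (ss_resid / df_resid)
  p <- pf(f, sum(df_added), df_resid, lower.tail = FALSE)
  c(F = f, df1 = sum(df_added), df2 = df_resid, p = p)
}

# Sums of squares and degrees of freedom copied from the ANOVA tables.
m1_vs_m2 <- nested_f(100859, 3, 2104582, 170)
m2_vs_m3 <- nested_f(c(65732, 101979), c(3, 3), 1936871, 164)
m3_vs_m4 <- nested_f(c(185809, 49773), c(1, 3), 1701289, 160)

print(round(rbind(m1_vs_m2, m2_vs_m3, m3_vs_m4), 4))
```

If the fitted objects are available, a pairwise call such as `anova(l2, l3)` should report the same test directly, without the arithmetic.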

Interpreting The Model

    summary(l4)

    Call:
    lm(formula = SALES ~ (LAND * IMP) * NBHD, data = prices)

    Residuals:
        Min      1Q  Median      3Q     Max
    -373.04  -46.44   -3.40   34.69  562.02

    Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                1.552e+02  1.089e+02   1.425   0.1560
    LAND                      -8.272e-01  1.227e+00  -0.674   0.5010
    IMP                        9.609e-01  4.959e-01   1.938   0.0544 .
    NBHDDAVISISLES            -6.017e+01  1.200e+02  -0.501   0.6168
    NBHDHUNTERSGREEN          -1.325e+02  1.319e+02  -1.004   0.3168
    NBHDHYDEPARK              -2.659e+02  1.459e+02  -1.823   0.0702 .
    LAND:IMP                   5.176e-03  3.611e-03   1.434   0.1536
    LAND:NBHDDAVISISLES        2.012e+00  1.233e+00   1.631   0.1048
    LAND:NBHDHUNTERSGREEN      9.361e-01  1.841e+00   0.509   0.6118
    LAND:NBHDHYDEPARK          2.534e+00  1.303e+00   1.945   0.0536 .
    IMP:NBHDDAVISISLES        -1.977e-01  5.081e-01  -0.389   0.6978
    IMP:NBHDHUNTERSGREEN       2.525e-01  5.877e-01   0.430   0.6680
    IMP:NBHDHYDEPARK           4.192e-01  5.603e-01   0.748   0.4555
    LAND:IMP:NBHDDAVISISLES   -4.278e-03  3.617e-03  -1.183   0.2386
    LAND:IMP:NBHDHUNTERSGREEN -2.465e-04  5.211e-03  -0.047   0.9623
    LAND:IMP:NBHDHYDEPARK     -5.198e-03  3.662e-03  -1.419   0.1578
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 103.1 on 160 degrees of freedom
    Multiple R-squared: 0.9415, Adjusted R-squared: 0.9361
    F-statistic: 171.8 on 15 and 160 DF, p-value: < 2.2e-16

The base neighborhood is Cheval, so the equation for that neighborhood is

$$E(Y) = 155.2 - 0.83 x_1 + 0.96 x_2 + 0.0052 x_1 x_2,$$

a two-variable interaction model.

For each other neighborhood, the equation is also a two-variable interaction model, but with different coefficients.

For another neighborhood, say Davis Isles, we must add the corresponding neighborhood terms:
- NBHDDAVISISLES = -60.17 to the intercept;
- LAND:NBHDDAVISISLES = 2.01 to the coefficient of $x_1$;
- IMP:NBHDDAVISISLES = -0.20 to the coefficient of $x_2$;
- LAND:IMP:NBHDDAVISISLES = -0.0043 to the coefficient of $x_1 x_2$.

We find

$$E(Y) = 95.03 + 1.18 x_1 + 0.76 x_2 + 0.0009 x_1 x_2.$$
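
The same bookkeeping gives the equation for every neighborhood. A minimal sketch using the estimates as printed in the `summary(l4)` output (four significant digits, so results are approximate); the row and column labels are illustrative choices:

```r
# Coefficient estimates copied from the summary(l4) output.
base <- c(intercept = 155.2, x1 = -0.8272, x2 = 0.9609, x1x2 = 0.005176)

# Neighborhood offsets, in the same column order as `base`.
# Cheval is the base level, so its offsets are all zero.
offsets <- rbind(
  CHEVAL       = c(0, 0, 0, 0),
  DAVISISLES   = c(-60.17, 2.012, -0.1977, -0.004278),
  HUNTERSGREEN = c(-132.5, 0.9361, 0.2525, -0.0002465),
  HYDEPARK     = c(-265.9, 2.534, 0.4192, -0.005198)
)
colnames(offsets) <- names(base)

# Add the base coefficients to every row of offsets.
equations <- sweep(offsets, 2, base, "+")
print(signif(equations, 3))
```

Each row holds the intercept and the coefficients of $x_1$, $x_2$, and $x_1 x_2$ in that neighborhood's two-variable interaction model.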