Chapter 13 Multiple Regression and Model Building
Multiple Regression Models
The General Multiple Regression Model
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
where
$y$ is the dependent variable
$x_1, x_2, \dots, x_k$ are the independent variables
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ is the deterministic portion of the model
$\beta_i$ determines the contribution of the independent variable $x_i$
Multiple Regression Models
Analyzing a Multiple Regression Model
1. Hypothesize the deterministic component of the model
2. Use sample data to estimate $\beta_0, \beta_1, \beta_2, \dots, \beta_k$
3. Specify the probability distribution of ε and estimate σ
4. Check that the assumptions on ε are satisfied
5. Statistically evaluate model usefulness
6. Use a model judged useful for prediction, estimation, and other purposes
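The First-Order Model: Estimating and Interpreting the β-Parameters
For $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$,
the chosen fitted model $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k$
minimizes $SSE = \sum (y - \hat{y})^2$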
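The least-squares criterion on this slide can be sketched in a few lines of pure Python. This is only an illustration: the function names (`fit_least_squares`, `sse`) and the data are hypothetical, and the data are generated without an error term so that the fit should recover the coefficients used to create them. In practice a statistics package would do this step.

```python
# Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.
# Pure-Python sketch; names and data are hypothetical, not from the slides.

def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing SSE = sum((y - y_hat)^2)."""
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    p = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def sse(rows, y, beta):
    """Sum of squared errors for the fitted coefficients."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in rows]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


# Hypothetical data generated exactly from y = 1 + 2*x1 + x2 (no error term).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 + x2 for x1, x2 in rows]

beta_hat = fit_least_squares(rows, y)
print([round(b, 6) for b in beta_hat])   # estimates of b0, b1, b2
print(sse(rows, y, beta_hat))            # SSE at the least-squares solution
```

Because the data contain no random error, the SSE at the fitted coefficients should be essentially zero; with real data it is the smallest value any choice of coefficients can achieve.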
The First-Order Model: Estimating and Interpreting the β-Parameters
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
where
$y$ = Sales price (dollars)
$x_1$ = Appraised land value (dollars)
$x_2$ = Appraised improvements (dollars)
$x_3$ = Area (square feet)
The First-Order Model: Estimating and Interpreting the β-Parameters
Plot of data for sample size n = 20
The First-Order Model: Estimating and Interpreting the β-Parameters
Fit model to data
The First-Order Model: Estimating and Interpreting the β-Parameters
Interpret β estimates
$\hat{\beta}_1 = .8145$: E(y), the mean sale price of the property, is estimated to increase .8145 dollars for every $1 increase in appraised land value, holding other variables constant
$\hat{\beta}_2 = .8204$: E(y), the mean sale price of the property, is estimated to increase .8204 dollars for every $1 increase in appraised improvements, holding other variables constant
$\hat{\beta}_3 = 13.53$: E(y), the mean sale price of the property, is estimated to increase 13.53 dollars for each additional square foot of living area, holding other variables constant
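The "holding other variables constant" reading of the slopes can be sketched as a small calculation. The slope values are the estimates quoted on this slide; the function name and dictionary keys are hypothetical. The intercept is not shown here, so the sketch computes only changes in the estimated mean price, not the price itself.

```python
# Slope estimates quoted on the slide (dollars of sale price per unit of predictor).
SLOPES = {"land_value": 0.8145, "improvements": 0.8204, "area_sqft": 13.53}


def change_in_mean_price(**deltas):
    """Estimated change in E(y) when the named predictors change by the given
    amounts and every predictor not named is held constant."""
    return sum(SLOPES[name] * delta for name, delta in deltas.items())


# $1,000 more appraised land value, everything else unchanged.
print(change_in_mean_price(land_value=1000))
# 100 more square feet of living area, everything else unchanged.
print(change_in_mean_price(area_sqft=100))
```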
The First-Order Model: Estimating and Interpreting the β-Parameters
Given the model $E(y) = 1 + 2x_1 + x_2$, the effect of $x_2$ on $E(y)$, holding $x_1$ constant, is an increase of 1 in $E(y)$ for every 1-unit increase in $x_2$
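A short worked check of this statement, using only the model given on the slide: raising $x_2$ by one unit with $x_1$ fixed changes $E(y)$ by the coefficient of $x_2$, so plotting $E(y)$ against $x_1$ for $x_2 = 0, 1, 2$ gives parallel lines.

```latex
\begin{aligned}
E(y \mid x_1, x_2 + 1) - E(y \mid x_1, x_2)
  &= \bigl[1 + 2x_1 + (x_2 + 1)\bigr] - \bigl[1 + 2x_1 + x_2\bigr] = 1 \\
x_2 = 0:&\quad E(y) = 1 + 2x_1 \\
x_2 = 1:&\quad E(y) = 2 + 2x_1 \\
x_2 = 2:&\quad E(y) = 3 + 2x_1
\end{aligned}
```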
Model Assumptions
Assumptions about Random Error ε
1. For any given set of values of $x_1, x_2, \dots, x_k$, the random error has a normal probability distribution with mean 0 and variance $\sigma^2$
2. The random errors are independent
Estimator of $\sigma^2$ for a Multiple Regression Model with k Independent Variables
$s^2 = \dfrac{SSE}{n - \text{(number of estimated } \beta \text{ parameters)}} = \dfrac{SSE}{n - (k+1)}$
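The estimator of $\sigma^2$ is a one-line calculation once SSE is known. A minimal sketch follows; the function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def estimate_error_variance(sse, n, k):
    """s^2 = SSE / (n - (k + 1)) for a model with k independent variables."""
    df = n - (k + 1)                     # error degrees of freedom
    if df <= 0:
        raise ValueError("need more observations than estimated parameters")
    return sse / df


# Hypothetical numbers: SSE = 320.0 from n = 20 observations and k = 3 predictors.
s2 = estimate_error_variance(320.0, n=20, k=3)
s = s2 ** 0.5                            # estimated standard deviation of the error
print(s2, s)
```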
Inferences about the β-Parameters
Two types of inference can be made, using either confidence intervals or hypothesis tests
For any inference to be valid, the assumptions about the random error term ε (normal distribution with mean 0 and variance $\sigma^2$, independence of errors) must be met
Inferences about the β-Parameters
A 100(1 - α)% Confidence Interval for a β-Parameter
$\hat{\beta}_i \pm t_{\alpha/2}\, s_{\hat{\beta}_i}$
where $t_{\alpha/2}$ is based on n - (k + 1) degrees of freedom and
n = Number of observations
k + 1 = Number of β parameters in the model
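The interval formula can be sketched as follows. The estimate and standard error are hypothetical; the critical value 2.120 is the tabled $t_{.025}$ for 16 degrees of freedom (n = 20, k = 3), supplied by hand because the Python standard library has no t distribution.

```python
def beta_confidence_interval(beta_hat, std_err, t_crit):
    """Return (lower, upper) = beta_hat -/+ t_crit * std_err."""
    margin = t_crit * std_err
    return beta_hat - margin, beta_hat + margin


# Hypothetical estimate and standard error; n = 20, k = 3 gives 16 error df.
T_025_DF16 = 2.120                       # tabled t value for a 95% interval
lower, upper = beta_confidence_interval(beta_hat=2.5, std_err=0.4, t_crit=T_025_DF16)
print(lower, upper)
```

If the resulting interval excludes 0, that agrees with rejecting $H_0: \beta_i = 0$ in a two-tailed test at the same α.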
Inferences about the β-Parameters
A Test of an Individual Parameter Coefficient
One-Tailed Test
$H_0: \beta_i = 0$
$H_a: \beta_i < 0$ (or $H_a: \beta_i > 0$)
Rejection region: $t < -t_\alpha$ (or $t > t_\alpha$ when $H_a: \beta_i > 0$)
Two-Tailed Test
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
Rejection region: $|t| > t_{\alpha/2}$
Test statistic: $t = \dfrac{\hat{\beta}_i}{s_{\hat{\beta}_i}}$
where $t_\alpha$ and $t_{\alpha/2}$ are based on n - (k + 1) degrees of freedom
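A minimal sketch of the test statistic and the three rejection rules. The estimate and standard error are hypothetical, and the critical values are the tabled t values for 16 degrees of freedom, passed in by hand. For an upper-tailed alternative the rule used here is $t > t_\alpha$.

```python
def t_statistic(beta_hat, std_err):
    """t = beta_hat / s_beta_hat for testing H0: beta_i = 0."""
    return beta_hat / std_err


def reject_h0(t, t_crit, alternative="two-sided"):
    """Apply the rejection region for the chosen alternative hypothesis."""
    if alternative == "two-sided":       # Ha: beta_i != 0, t_crit = t_{alpha/2}
        return abs(t) > t_crit
    if alternative == "less":            # Ha: beta_i < 0,  t_crit = t_alpha
        return t < -t_crit
    if alternative == "greater":         # Ha: beta_i > 0,  t_crit = t_alpha
        return t > t_crit
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")


# Hypothetical estimate and standard error; 16 error df, alpha = .05.
t = t_statistic(beta_hat=2.5, std_err=0.4)
print(t)
print(reject_h0(t, 2.120, "two-sided"))  # uses t_.025
print(reject_h0(t, 1.746, "greater"))    # uses t_.05
print(reject_h0(t, 1.746, "less"))       # uses t_.05
```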
Inferences about the β-Parameters
An Excel Analysis
In the regression output, use the t Stat and P-value columns for hypotheses about parameter coefficients
Use the Lower 95% and Upper 95% columns for confidence intervals
Checking the Overall Utility of a Model
Three measures:
1. Multiple coefficient of determination
$R^2 = 1 - \dfrac{SSE}{SS_{yy}} = \dfrac{SS_{yy} - SSE}{SS_{yy}} = \dfrac{\text{Explained variability}}{\text{Total variability}}$
2. Adjusted multiple coefficient of determination
$R_a^2 = 1 - \left[\dfrac{n-1}{n-(k+1)}\right]\left(\dfrac{SSE}{SS_{yy}}\right) = 1 - \left[\dfrac{n-1}{n-(k+1)}\right]\left(1 - R^2\right)$
3. Global F-test
Test statistic: $F = \dfrac{(SS_{yy} - SSE)/k}{SSE/[n-(k+1)]} = \dfrac{R^2/k}{(1 - R^2)/[n-(k+1)]}$
Checking the Overall Utility of a Model
Testing Global Usefulness of the Model: The Analysis of Variance F-test
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_a$: At least one $\beta_i \neq 0$
Test statistic: $F = \dfrac{(SS_{yy} - SSE)/k}{SSE/[n-(k+1)]} = \dfrac{R^2/k}{(1 - R^2)/[n-(k+1)]} = \dfrac{\text{Mean Square (Model)}}{\text{Mean Square (Error)}}$
where n is the sample size and k is the number of terms in the model (not counting the intercept $\beta_0$)
Rejection region: $F > F_\alpha$, with k numerator degrees of freedom and n - (k + 1) denominator degrees of freedom
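The three utility measures all follow from SSE, $SS_{yy}$, n, and k. The sketch below computes them and shows that the sum-of-squares form and the $R^2$ form of the F statistic agree. The function name and the input numbers are hypothetical.

```python
def model_utility(sse, ss_yy, n, k):
    """Return R^2, adjusted R^2, and the global F statistic computed two ways."""
    df_error = n - (k + 1)
    r2 = 1 - sse / ss_yy
    r2_adj = 1 - ((n - 1) / df_error) * (sse / ss_yy)
    f_from_ss = ((ss_yy - sse) / k) / (sse / df_error)   # MS(Model) / MS(Error)
    f_from_r2 = (r2 / k) / ((1 - r2) / df_error)
    return {"r2": r2, "r2_adj": r2_adj,
            "f_from_ss": f_from_ss, "f_from_r2": f_from_r2}


# Hypothetical sums of squares for n = 20 observations and k = 3 predictors.
result = model_utility(sse=320.0, ss_yy=1600.0, n=20, k=3)
for name, value in result.items():
    print(name, round(value, 4))
```

The F value would then be compared with the tabled $F_\alpha$ for k and n - (k + 1) degrees of freedom. Adjusted $R^2$ is never larger than $R^2$ because it penalizes each additional term.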
Checking the Overall Utility of a Model
Checking the Utility of a Multiple Regression Model
1. Conduct a test of overall model adequacy using the F-test. If $H_0$ is rejected, proceed to step 2
2. Conduct t-tests on the β parameters of particular interest
Using the Model for Estimation and Prediction
As in simple linear regression, a prediction interval for an individual value of y will be wider than a confidence interval for the mean E(y) at the same values of the x's
Most statistics packages will print out both confidence and prediction intervals
Model Building: Interaction Models
An Interaction Model Relating E(y) to Two Quantitative Independent Variables
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
where
$(\beta_1 + \beta_3 x_2)$ represents the change in E(y) for every 1-unit increase in $x_1$, holding $x_2$ fixed
$(\beta_2 + \beta_3 x_1)$ represents the change in E(y) for every 1-unit increase in $x_2$, holding $x_1$ fixed
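The statement that the slope for $x_1$ depends on the level of $x_2$ can be checked numerically. In the sketch below the coefficient values and function names are hypothetical. The slope formula is compared with the actual change in E(y) when $x_1$ increases by one unit.

```python
def mean_response(b, x1, x2):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 for b = (b0, b1, b2, b3)."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_x1(b, x2):
    """Change in E(y) per 1-unit increase in x1 with x2 held fixed."""
    return b[1] + b[3] * x2


def slope_x2(b, x1):
    """Change in E(y) per 1-unit increase in x2 with x1 held fixed."""
    return b[2] + b[3] * x1


b = (1.0, 2.0, -1.0, 1.0)                # hypothetical beta_0 .. beta_3

for x2 in (0, 1, 2):
    direct = mean_response(b, 4, x2) - mean_response(b, 3, x2)
    print(x2, slope_x1(b, x2), direct)   # the two slope values agree for each x2
```

With $\beta_3 = 0$ the slope would be the same at every $x_2$, which is the no-interaction case.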
Model Building: Interaction Models
No interaction: the relationship between y and $x_i$ is not impacted by a second x
Interaction: the linear relationship between y and $x_i$ depends on another x
Model Building: Interaction Models
Model Building: Quadratic and other Higher-Order Models
A Quadratic (Second-Order) Model
$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$
where
$\beta_0$ is the y-intercept of the curve
$\beta_1$ is a shift parameter
$\beta_2$ is the rate of curvature
Model Building: Quadratic and other Higher-Order Models
Home Size-Electrical Usage Data
Size of Home, x (sq. ft.)    Monthly Usage, y (kilowatt-hours)
1,290                        1,182
1,350                        1,172
1,470                        1,264
1,600                        1,493
1,710                        1,571
1,840                        1,711
1,980                        1,804
2,230                        1,840
2,400                        1,95
2,930                        1,954
Model Building: Quadratic and other Higher-Order Models
$\hat{y} = -1{,}216.1 + 2.3989x - .00045x^2$
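A small sketch of how the fitted quadratic might be used, taking the fitted curve as $\hat{y} = -1{,}216.1 + 2.3989x - .00045x^2$. The signs are an assumption, chosen because they are the only combination that gives usage values on the scale of the home size-electrical usage data. Function names are hypothetical.

```python
# Coefficients of the fitted curve (signs assumed as stated above).
B0, B1, B2 = -1216.1, 2.3989, -0.00045


def predicted_usage(size_sqft):
    """Predicted monthly usage (kilowatt-hours) for a home of the given size."""
    return B0 + B1 * size_sqft + B2 * size_sqft ** 2


def turning_point():
    """Home size at which the fitted parabola peaks: x = -b1 / (2 * b2)."""
    return -B1 / (2 * B2)


print(round(predicted_usage(1500), 2))   # a size inside the range of the data
print(round(turning_point(), 1))         # size where predicted usage stops rising
```

The negative coefficient on $x^2$ means predicted usage rises at a decreasing rate. Predictions for sizes outside the sampled range should be treated with caution.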
Model Building: Quadratic and other Higher-Order Models
A Complete Second-Order Model with Two Quantitative Independent Variables
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
where
$\beta_0$ is the y-intercept, the value of E(y) when $x_1 = x_2 = 0$
$\beta_1, \beta_2$: changes cause the surface to shift along the $x_1$ and $x_2$ axes
$\beta_3$ controls the rotation of the surface
$\beta_4, \beta_5$ control the type of surface and the rates of curvature
Model Building: Quadratic and other Higher-Order Models
Model Building: Qualitative (Dummy) Variable Models
Dummy variables: coded, qualitative variables
• Codes are in the form of (1, 0), 1 being the presence of a condition, 0 the absence
• Create dummy variables so that there is one less dummy variable than categories of the qualitative variable of interest
Gender dummy variable coded as x = 1 if male, x = 0 if female
If the model is $E(y) = \beta_0 + \beta_1 x$, then $\beta_0$ is the mean of y for females and $\beta_1$ captures the effect of being male: the difference between the male and female means
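The "one less dummy variable than categories" rule can be sketched as a small coding helper. The function name and the three-level example are hypothetical. The baseline category is the one coded 0 on every dummy.

```python
def dummy_code(values, baseline):
    """Code a qualitative variable as (number of categories - 1) 0/1 columns.

    Returns the non-baseline levels (one per dummy column) and the coded rows.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Two categories -> one dummy: x = 1 if male, x = 0 if female.
gender_levels, gender_rows = dummy_code(["male", "female", "male"], baseline="female")
print(gender_levels, gender_rows)

# Hypothetical three-category variable -> two dummies, with "A" as the baseline.
group_levels, group_rows = dummy_code(["A", "B", "C", "A"], baseline="A")
print(group_levels, group_rows)
```

Using a dummy for every category as well as an intercept would make the columns linearly dependent, which is why one category is left as the baseline.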
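Model Building: Models with both Quantitative and Qualitative Variables
Start with a first-order model with one quantitative variable, $E(y) = \beta_0 + \beta_1 x_1$
Adding a qualitative variable with no interaction, $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$, where $x_2$ and $x_3$ are dummy variables for a qualitative variable with three categories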
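Model Building: Models with both Quantitative and Qualitative Variables
Adding an interaction term,
$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$
Main effect, $x_1$: $\beta_1 x_1$
Main effect, $x_2$ and $x_3$: $\beta_2 x_2 + \beta_3 x_3$
Interaction: $\beta_4 x_1 x_2 + \beta_5 x_1 x_3$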
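One way to read the interaction model is to write out the line it implies for each category. The coding is an assumption made for illustration: $x_2 = 1$ only for the second category, $x_3 = 1$ only for the third, and both equal 0 for the baseline category.

```latex
\begin{aligned}
\text{Baseline } (x_2 = 0,\ x_3 = 0):&\quad E(y) = \beta_0 + \beta_1 x_1 \\
\text{Category 2 } (x_2 = 1,\ x_3 = 0):&\quad E(y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\, x_1 \\
\text{Category 3 } (x_2 = 0,\ x_3 = 1):&\quad E(y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\, x_1
\end{aligned}
```

Under this coding each category gets its own intercept and its own slope. Setting $\beta_4 = \beta_5 = 0$ gives the no-interaction model, where the three lines are parallel.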
Model Building: Comparing Nested Models
Models are nested if one model contains all the terms of the other model and at least one additional term.
Complete (full) model: the more complex model
Reduced model: the simpler model