Business Statistics: Multiple regression, dummy regressors


  1. MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES Business Statistics

  2. CONTENTS Multiple regression; Dummy regressors; Assumptions of regression analysis; Predicting with regression analysis; Old exam question; Further study

  3. MULTIPLE REGRESSION The regression model so far has one dependent variable ($Y$) and one independent (explanatory) variable ($X$). ▪ In many cases several explanatory variables might play a role, that is, might "explain" the dependent variable $Y$. ▪ Example: house prices depend on ▪ floor area ▪ ground area (first floor + garden) ▪ number of rooms ▪ age of the house ▪ etc.

  4. MULTIPLE REGRESSION Generalize the simple regression model ▪ from $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ ▪ to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ ▪ or even to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$. (Now you'll understand why we used a subscript 0 for the constant $\beta_0$.) Multiple regression ▪ is a quite obvious extension ▪ lets us reuse much of the theory of simple regression ▪ is still based on OLS, $R^2$, the $F$-test, and the $t$-test.
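The generalized model is still fitted by OLS. A minimal sketch of what that means computationally, assuming the textbook normal equations $(X'X)b = X'y$ solved by Gauss-Jordan elimination; the function name `ols` and the small noise-free data set are hypothetical and only serve to check that the known coefficients are recovered:

```python
def ols(rows, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    # Design matrix with a leading column of ones for the constant b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b = ols(rows, y)
```

In practice a statistics package (SPSS in this course) does this step; the sketch only shows that nothing changes conceptually when going from one regressor to $k$.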

  5. MULTIPLE REGRESSION SPSS output. Estimated model: $\hat{Y} = -217603 + 5347 X_1 + 225 X_2$

  6. MULTIPLE REGRESSION "Step 0" (statistical model): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$. Step 1: ▪ $H_0: \beta_1 = \beta_2 = 0$; $H_1$: at least one of these is not 0 (mind that the null hypothesis does not include the constant (intercept) $\beta_0$). Step 2: ▪ sample statistic: $F = \frac{MSR}{MSE}$; reject for "too large" values. Step 3: ▪ under $H_0$: $F \sim F_{2,\,n-3}$; assumption: see model (step 0). Step 4: ▪ $F_{\text{calc}} = \cdots$; $F_{\text{crit}} = F_{2,\,n-3;\,\alpha}$ (with $k$ regressors: $df_1 = k$, $df_2 = n - k - 1$). Step 5: ▪ reject/do not reject $H_0$.
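Step 4 of the overall test can be sketched as follows. The function name `f_statistic` and the sums of squares are hypothetical illustration values, not taken from the SPSS output; the critical value still has to be looked up in an $F$ table or taken from software.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F-test statistic for a regression with k regressors."""
    df1 = k              # numerator degrees of freedom
    df2 = n - k - 1      # denominator degrees of freedom
    msr = ssr / df1      # mean square regression
    mse = sse / df2      # mean square error
    return msr / mse, df1, df2


# Hypothetical sums of squares for n = 23 observations and k = 2 regressors.
f_calc, df1, df2 = f_statistic(ssr=900.0, sse=100.0, n=23, k=2)
```

With $k = 2$ the degrees of freedom reduce to $2$ and $n - 3$, which matches the $F_{2,\,n-3}$ distribution used in Step 3.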

  7. MULTIPLE REGRESSION Rejecting $H_0$ in the $F$-test in multiple regression means: ▪ at least one of the slope coefficients differs from 0 ▪ "not $\beta_1 = \beta_2 = 0$" ▪ which one differs (or which ones differ) from 0 must be investigated by separate $t$-tests. So, ▪ while in simple regression the overall $F$-test and the $t$-test for $\beta_1$ do exactly the same thing ... ▪ ... the two tests have a complementary role in multiple regression ▪ first look at the overall $F$, then go to the individual $t$s.

  8. MULTIPLE REGRESSION First, the overall model test, using the $F$-test. Next, test each slope coefficient, using $k$ separate $t$-tests. (A callout on the slide marks part of the output as "not interesting".)

  9. EXERCISE 1 What does it mean when, in multiple regression, a. the overall $F$-test yields a significant result? b. a $t$-test of an individual coefficient $\beta_3$ yields a significant result?

  10. MULTIPLE REGRESSION Example: ▪ overall $F$-test: highly significant ▪ both regression slopes: highly significant ▪ coefficient of determination ($R^2$): very high (90%) ▪ a very useful model ▪ in fact: better than the simple regression model with $R^2 = 82\%$

  11. MULTIPLE REGRESSION Observe: ▪ including more explanatory variables will in general improve the fit of the model ▪ $R^2$ will increase (it can never decrease), even if we include "nonsense" variables (e.g., street number of the house) ▪ $R^2_{\text{adj}}$ ("R-square-adjusted") penalizes for including "too many" regressors ▪ $R^2_{\text{adj}} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, while $R^2 = 1 - \frac{SSE}{SST}$
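The penalty can be made concrete with a small sketch. The sums of squares below are hypothetical: a third, "nonsense" regressor is assumed to lower $SSE$ only marginally, from 100 to 99.9.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R-square: divides each sum of squares by its degrees of freedom."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical sample of n = 21 houses with SST = 1000.
n, sst = 21, 1000.0
r2_two = r_squared(100.0, sst)                    # model with k = 2 regressors
r2_three = r_squared(99.9, sst)                   # same model plus a nonsense regressor
adj_two = adjusted_r_squared(100.0, sst, n, k=2)
adj_three = adjusted_r_squared(99.9, sst, n, k=3)
```

Here $R^2$ rises slightly when the extra regressor is added, while $R^2_{\text{adj}}$ falls, which is the behaviour the slide describes.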

  12. DUMMY REGRESSORS House prices (numerical) depend on: ▪ numerical variables (floor area, ground area, etc.) ▪ binary categorical variables (with/without garage, etc.) ▪ other categorical variables (no/free/paid parking, etc.) However: ▪ regression is for numerical $X$ and numerical $Y$ ▪ ANOVA is for categorical $X$ and numerical $Y$. So, how to combine numerical $X_1$ and categorical $X_2$? Solution: dummy variables for the categorical variable ▪ dummy regressors/dummy regression

  13. DUMMY REGRESSORS We can include dummy variables in multiple regression. ▪ Coding a binary variable as one dummy ▪ original variable: garage = no/yes ▪ garage: 0=no; 1=yes ▪ omitted variable: no_garage (redundant, it corresponds to garage=0) ▪ Splitting a non-binary variable into several binary ones ▪ original variable: parking = no/free/paid ▪ free_parking: 0=no; 1=yes ▪ paid_parking: 0=no; 1=yes ▪ omitted variable: no_parking (redundant, it corresponds to free=0, paid=0) ▪ Dummy variables only for independent ($X$) variables ▪ never for the dependent ($Y$) variable ▪ $Y$ must be numerical (think about $\varepsilon \sim N$)
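The coding scheme for the parking variable can be sketched as below. The helper name `dummy_code` is hypothetical; the category labels follow the slide, with "no" as the omitted (reference) category.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 dummies for every category except the omitted reference."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}


PARKING = ["no", "free", "paid"]
# One row of dummies per possible parking value.
coded = {v: dummy_code(v, PARKING, reference="no") for v in PARKING}
```

A variable with three categories yields two dummies; the reference category is identified by both dummies being 0, which is why a separate no_parking dummy would be redundant.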

  14. DUMMY REGRESSORS Example ▪ House price ($Y$) as a function of ▪ floor area ($X_1$) ▪ dummy for garden ($X_2$; 0=No, 1=Yes) ▪ $\widehat{Price} = -261741 + 6040 \cdot FloorArea + 21825 \cdot Garden$, meaning 21825 € extra when there is a garden (whatever the size)
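A short sketch of how the garden dummy acts on the prediction, using the coefficients of the estimated model above; the function name `predicted_price` and the floor area of 85 m² are chosen only for illustration.

```python
def predicted_price(floor_area, garden):
    """Estimated price in euros; garden is the dummy (0 = no, 1 = yes)."""
    return -261741 + 6040 * floor_area + 21825 * garden


without_garden = predicted_price(85, garden=0)
with_garden = predicted_price(85, garden=1)
garden_premium = with_garden - without_garden  # equals the dummy coefficient
```

The difference is 21825 € for any floor area: a dummy shifts the regression line up or down without changing its slope.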

  15. DUMMY REGRESSORS ▪ Use dummy variables only for the independent (explanatory) variables ▪ not for the dependent variable (that would be logistic regression, not in this course!) ▪ It is quite common to indicate dummy explanatory variables with a $D$ instead of an $X$ ▪ for instance: $Y = \beta_0 + \beta_1 X_1 + \beta_2 D_2 + \beta_3 D_3 + \varepsilon$

  16. EXERCISE 2 We want to explain car prices in terms of 1) engine power 2) number of seats 3) gas/diesel/electric. What is the theoretical model?

  17. ASSUMPTIONS OF REGRESSION ANALYSIS The OLS equations always find coefficients $b_0, b_1, \ldots$ that minimize the residual sum of squares ($SSE$) ▪ so no assumptions are required for that part. But when testing the model (and when testing the coefficients $\beta_1, \beta_2, \ldots$) ▪ we need to assume a statistical model with $\varepsilon \sim N(0, \sigma^2)$: ▪ the residual terms should be normally distributed ▪ the residual terms should come from a distribution with constant variance ▪ the residual terms should be independent of each other ▪ there should be a linear relationship between the $X$-variable(s) and $Y$

  18. ASSUMPTIONS OF REGRESSION ANALYSIS A final word on the residual $\varepsilon \sim N(0, \sigma^2)$ ▪ Theoretical regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$ ▪ Estimated regression model: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$ ▪ Observations: $Y_i = b_0 + b_1 X_{1,i} + b_2 X_{2,i} + \cdots + b_k X_{k,i} + e_i$ ▪ The standard deviation of the residual term, $\sigma = \sqrt{\sigma^2}$, ▪ is estimated by $s = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{MSE}$ ▪ and is known as the standard error of the regression or standard error of the estimate
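The estimate $s$ can be computed directly from the residuals $e_i$. A minimal sketch; the function name and the five observed/fitted value pairs are hypothetical and do not come from an actual OLS fit.

```python
import math


def standard_error_of_regression(y, y_hat, k):
    """s = sqrt(SSE / (n - k - 1)) for a model with k regressors."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return math.sqrt(sse / (n - k - 1))


# Hypothetical observed and fitted values for a model with k = 1 regressor.
y_obs = [10, 12, 15, 19, 24]
y_fit = [11, 12, 14, 20, 23]
s = standard_error_of_regression(y_obs, y_fit, k=1)
```

The residuals here are $-1, 0, 1, -1, 1$, so $SSE = 4$ and $s = \sqrt{4/3}$; the divisor $n - k - 1$ is the same $df_2$ that appears in the $F$-test.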

  19. PREDICTION WITH REGRESSION ANALYSIS Given a sample of data $x_{1i}, x_{2i}, \ldots, y_i$ with $i = 1, \ldots, n$ ▪ we can use OLS to estimate the regression model $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots$ ▪ subsequently, given the floor area, we can estimate the price of the house. Now a new "incomplete" observation arrives ▪ for instance, a new house with known floor area ($x_{n+1}$), but with unknown price (no $y_{n+1}$). We can use the regression model to estimate the house price ▪ that is, to predict $\hat{y}_{n+1}$

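  20. PREDICTION WITH REGRESSION ANALYSIS Example: ▪ $\hat{Y} = -264749 + 6152X$ ▪ a house with floor area $x = 85$ m² has an estimated price $\hat{y} = -264749 + 6152 \times 85 \approx 258142$ (€). The slide's value of 258142 presumably comes from unrounded coefficients; the rounded coefficients shown here give 258171.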
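The point prediction is a single substitution into the estimated model. A sketch using the rounded coefficients of the estimated model as printed; the function name `predict_price` is hypothetical.

```python
B0 = -264749   # estimated constant
B1 = 6152      # estimated slope per m2 of floor area


def predict_price(floor_area):
    """Point prediction of the house price in euros."""
    return B0 + B1 * floor_area


y_hat = predict_price(85)
```

With these rounded coefficients the result is 258171, slightly above the 258142 reported on the slide; the gap is presumably a rounding effect, since statistical software predicts with unrounded coefficients.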

  21. PREDICTION WITH REGRESSION ANALYSIS So, we can predict a value of $\hat{y}$ ▪ for a given $x$ (or $x_1, x_2, \ldots$) ▪ and given estimated regression coefficients ($b_0, b_1, \ldots$). The quality of this estimate obviously depends on the quality of the regression model ▪ try to find a confidence interval for the estimated $\hat{y}$-value ▪ two types: ▪ the confidence interval for the average price of a house of 85 m² ▪ the confidence interval for a particular house of 85 m²

  22. PREDICTION WITH REGRESSION ANALYSIS Point prediction: 258142. Case 1: confidence interval (95%) for the prediction of the mean price ▪ (212866, 303419). Case 2: confidence interval (95%) for an individual prediction ▪ (−96372, 612658). Individual predictions are always less accurate → wider confidence interval (this one even includes 0). Data setup: price ($Y$) unknown, area ($X$) known.
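The two interval types can be compared in a small sketch. It assumes the standard simple-regression formulas, $\hat{y} \pm t \cdot s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$ for the mean and $\hat{y} \pm t \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$ for an individual value, which the slides do not spell out. The data set (areas in m², prices in thousands of euros) and the function name are hypothetical; the critical value 2.776 is the two-sided 95% $t$ table value for 4 degrees of freedom.

```python
import math


def regression_intervals(x, y, x0, t_crit):
    """Point prediction plus intervals for the mean and for an individual value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # OLS slope
    b0 = y_bar - b1 * x_bar              # OLS constant
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))         # standard error of the regression (k = 1)
    y_hat = b0 + b1 * x0
    base = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(base)       # interval for the mean of Y
    half_ind = t_crit * s * math.sqrt(1 + base)    # interval for one new Y
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_ind, y_hat + half_ind))


# Hypothetical sample: floor area (m2) and price (thousands of euros).
areas = [60, 70, 80, 90, 100, 110]
prices = [110, 160, 230, 280, 360, 400]
y_hat, mean_ci, pred_int = regression_intervals(areas, prices, x0=85, t_crit=2.776)
```

Both intervals are centred on the same point prediction; the extra 1 under the square root accounts for the variation of an individual house around the mean, which is why the individual interval is always the wider one.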

  23. OLD EXAM QUESTION 26 March 2015, Q3a

  24. FURTHER STUDY Doane & Seward 5/E, 12.7 and 13.1-13.5. Tutorial exercises week 4: multiple regression, dummy regression, prediction interval
