Business Statistics CONTENTS Multiple regression Dummy regressors - PowerPoint PPT Presentation

MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES Business Statistics

CONTENTS
▪ Multiple regression
▪ Dummy regressors
▪ Assumptions of regression analysis
▪ Predicting with regression analysis
▪ Old exam question
▪ Further study

MULTIPLE REGRESSION
The regression model so far has one dependent variable ($Y$) and one independent (explanatory) variable ($X$).
▪ There are many cases where several explanatory variables might play a role
  ▪ ... might "explain" the dependent variable $Y$
▪ Example: house prices depend on
  ▪ floor area
  ▪ ground area (first floor + garden)
  ▪ number of rooms
  ▪ age of the house
  ▪ etc.

MULTIPLE REGRESSION
Generalize the simple regression model
▪ from $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
▪ to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
▪ or even to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
Now you'll understand why we used a subscript 0 for the constant $\beta_0$.
Multiple regression
▪ a quite obvious extension
▪ we can reuse much of the theory of simple regression
▪ still based on OLS, $R^2$, the $F$-test, and the $t$-test
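To make the OLS step concrete, here is a minimal sketch in plain Python that estimates the coefficients of a model with several explanatory variables by solving the normal equations. The helper names (`solve_linear_system`, `ols_fit`) and the data are made up for illustration; the course itself uses SPSS for this step.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols_fit(x_rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # A leading 1 in every row produces the intercept b0.
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from Y = 10 + 2*X1 + 3*X2 (no error term),
# so the fit should recover these coefficients up to rounding error.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [10 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b0, b1, b2 = ols_fit(x_rows, y)
```

With real data the error term is not zero, and the estimates $b_0, b_1, b_2$ will only approximate the unknown $\beta_0, \beta_1, \beta_2$.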

MULTIPLE REGRESSION
SPSS output
Estimated model: $\hat{Y} = -217603 + 5347 X_1 + 225 X_2$

MULTIPLE REGRESSION
"Step 0" (statistical model): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$
Step 1:
▪ $H_0: \beta_1 = \beta_2 = 0$; $H_1$: at least one of these is not 0
▪ mind that the null hypothesis does not include the constant (intercept) $\beta_0$
Step 2:
▪ Sample statistic: $F = \frac{MSR}{MSE}$; reject for "too large" values
Step 3:
▪ Under $H_0$: $F \sim F_{2,\,n-3}$; assumption: see model (step 0)
▪ with $k$ regressors: $df_1 = k$, $df_2 = n - k - 1$
Step 4:
▪ $F_{calc} = \cdots$; $F_{crit} = F_{2,\,n-3;\,\alpha}$
Step 5:
▪ reject/not reject $H_0$
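The test statistic in step 2 can be sketched as a small function. The name `overall_f_statistic` and the sums of squares below are made up for illustration; in practice they come from the ANOVA table of the regression output, and the critical value is looked up in an $F$ table for the stated degrees of freedom.

```python
def overall_f_statistic(ssr, sse, n, k):
    """Return (F, df1, df2) for the overall test of a model with k regressors.

    ssr: regression sum of squares, sse: residual sum of squares,
    n: number of observations.
    """
    df1 = k
    df2 = n - k - 1
    msr = ssr / df1  # mean square regression
    mse = sse / df2  # mean square error
    return msr / mse, df1, df2


# Made-up sums of squares: n = 13 observations, k = 2 regressors.
f_calc, df1, df2 = overall_f_statistic(ssr=80.0, sse=20.0, n=13, k=2)
# MSR = 80/2 = 40 and MSE = 20/10 = 2, so F = 20 with (2, 10) degrees of freedom.
```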

MULTIPLE REGRESSION
Rejecting $H_0$ in the overall $F$-test in multiple regression means:
▪ at least one of the slope coefficients differs from 0
▪ "not $\beta_1 = \beta_2 = 0$"
▪ which one differs (or which ones differ) from 0 must be investigated by separate $t$-tests
So,
▪ while in simple regression the overall $F$-test and the $t$-test for $\beta_1$ do exactly the same thing ...
▪ ... the two tests have a complementary role in multiple regression
▪ first look at the overall $F$, then go to the individual $t$s

MULTIPLE REGRESSION
▪ First, test the overall model, using the $F$-test
▪ Next, test each slope coefficient, using $k$ separate $t$-tests
▪ [output callout: "not interesting"]

EXERCISE 1
What does it mean when in multiple regression
a. the overall $F$-test yields a significant result?
b. a $t$-test of an individual coefficient $\beta_3$ yields a significant result?

MULTIPLE REGRESSION
Example:
▪ overall $F$-test: highly significant
▪ both regression slopes: highly significant
▪ coefficient of determination ($R^2$): very high (90%)
▪ a very useful model
▪ in fact: better than the simple regression model with $R^2 = 82\%$

MULTIPLE REGRESSION
Observe:
▪ including more explanatory variables will in general improve the fit of the model
▪ $R^2$ will increase (it can never decrease), even if we include "nonsense" variables (e.g., street number of the house)
▪ $R^2_{adj}$ ("R-square-adjusted") penalizes for including "too many" regressors
▪ $R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, while $R^2 = 1 - \frac{SSE}{SST}$
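The penalty can be seen in a short numerical sketch of the two formulas. The function names and the sums of squares are made up for illustration: adding a third, nearly useless regressor lowers $SSE$ only slightly, so $R^2$ goes up a little while $R^2_{adj}$ goes down.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """R^2_adj = 1 - (SSE/(n-k-1)) / (SST/(n-1))."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Made-up example with n = 20 observations and SST = 100.
n, sst = 20, 100.0
r2_two = r_squared(sse=20.0, sst=sst)                      # model with k = 2
adj_two = adjusted_r_squared(sse=20.0, sst=sst, n=n, k=2)
r2_three = r_squared(sse=19.9, sst=sst)                    # one "nonsense" variable added
adj_three = adjusted_r_squared(sse=19.9, sst=sst, n=n, k=3)
```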

DUMMY REGRESSORS
House prices (numerical) depend on:
▪ numerical variables (floor area, ground area, etc.)
▪ binary categorical variables (with/without garage, etc.)
▪ other categorical variables (no/free/paid parking, etc.)
However:
▪ regression is for numerical $X$ and numerical $Y$
▪ ANOVA is for categorical $X$ and numerical $Y$
So, how to combine a numerical $X_1$ and a categorical $X_2$?
Solution: dummy variables for the categorical variable
▪ dummy regressors/dummy regression

DUMMY REGRESSORS
We can include dummy variables in multiple regression
▪ Recoding a binary variable as one dummy
  ▪ original variable: garage = no/yes
  ▪ garage: 0=no; 1=yes
  ▪ omitted variable: no_garage (redundant): garage=0
▪ Splitting a non-binary variable into several binary dummies
  ▪ original variable: parking = no/free/paid
  ▪ free_parking: 0=no; 1=yes
  ▪ paid_parking: 0=no; 1=yes
  ▪ omitted variable: no_parking (redundant): free=0, paid=0
▪ Dummy variables only for independent ($X$) variables
  ▪ never for the dependent ($Y$) variable
  ▪ $Y$ must be numerical (think about $\varepsilon \sim N$)
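The recoding can be sketched in a few lines of Python. The function `dummy_code` and its parameters are hypothetical names chosen for this illustration; the point is that a variable with three categories yields two dummies, and the omitted (reference) category is the row where all dummies are 0.

```python
def dummy_code(values, reference, suffix):
    """Turn a categorical column into 0/1 dummy columns, omitting `reference`."""
    levels = [level for level in sorted(set(values)) if level != reference]
    return [
        {f"{level}_{suffix}": int(value == level) for level in levels}
        for value in values
    ]


# parking = no/free/paid, with "no" as the omitted category.
parking = ["no", "free", "paid", "free"]
rows = dummy_code(parking, reference="no", suffix="parking")
# rows[0] is {"free_parking": 0, "paid_parking": 0}: the "no parking" case.
```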

DUMMY REGRESSORS
Example
▪ House price ($Y$) as a function of
  ▪ floor area ($X_1$)
  ▪ dummy for garden ($X_2$; 0=No, 1=Yes)
▪ $\mathit{Price} = -261741 + 6040 \cdot \mathit{FloorArea} + 21825 \cdot \mathit{Garden}$
▪ meaning 21825 € extra when there is a garden (whatever the size)
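A quick sketch shows how the dummy coefficient acts as a fixed shift of the intercept. The coefficients are the ones from the estimated model above; the function name and the floor area of 85 m² are chosen only for illustration.

```python
def predicted_price(floor_area, garden):
    """Estimated price in euros; garden is the dummy (0 = no, 1 = yes)."""
    return -261741 + 6040 * floor_area + 21825 * garden


without_garden = predicted_price(85, 0)  # 251659
with_garden = predicted_price(85, 1)     # 273484
garden_effect = with_garden - without_garden
```

The difference equals the dummy coefficient for any floor area, because the garden term does not interact with the floor area in this model.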

DUMMY REGRESSORS
▪ Use dummy variables only for the independent (explanatory) variables
  ▪ not for the dependent variable (that would require logistic regression, which is not in this course!)
▪ It is quite common to indicate dummy explanatory variables with a $D$ instead of an $X$
  ▪ for instance: $Y = \beta_0 + \beta_1 X_1 + \beta_2 D_2 + \beta_3 D_3 + \varepsilon$

EXERCISE 2 We want to explain car prices in terms of 1) engine power 2) number of seats 3) gas/diesel/electric. What is the theoretical model?

ASSUMPTIONS OF REGRESSION ANALYSIS
The OLS equations always find coefficients $b_0, b_1, \ldots$ that minimize the residual sum of squares ($SSE$)
▪ so no assumptions are required for that part
But when testing the model (and when testing the coefficients $\beta_1, \beta_2, \ldots$)
▪ we need to assume a statistical model with $\varepsilon \sim N(0, \sigma^2)$:
  ▪ the residual terms should be normally distributed
  ▪ the residual terms should come from a distribution with constant variance
  ▪ the residual terms should be independent of each other
  ▪ there should be a linear relationship between the $X$-variable(s) and $Y$

ASSUMPTIONS OF REGRESSION ANALYSIS
A final word on the residual $\varepsilon \sim N(0, \sigma^2)$
▪ Theoretical regression model
  ▪ $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
▪ Estimated regression model
  ▪ $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
▪ Observations
  ▪ $Y_i = b_0 + b_1 X_{1,i} + b_2 X_{2,i} + \cdots + b_k X_{k,i} + e_i$
▪ And the standard deviation of the residual term, $\sigma = \sqrt{\sigma^2}$,
  ▪ is estimated by $s = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{MSE}$
  ▪ is known as the standard error of the regression or standard error of the estimate
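As a small worked sketch of the last formula, the function below computes $s$ from a list of residuals $e_i$. The function name and the residuals are made up for illustration (they sum to zero, as OLS residuals do in a model with an intercept).

```python
import math


def standard_error_of_regression(residuals, k):
    """s = sqrt(SSE / (n - k - 1)) for a model with k regressors."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - k - 1))


# Made-up residuals: n = 5 observations, k = 2 regressors.
residuals = [1.0, -1.0, 2.0, -2.0, 0.0]
s = standard_error_of_regression(residuals, k=2)
# SSE = 10 and n - k - 1 = 2, so s = sqrt(5).
```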

PREDICTION WITH REGRESSION ANALYSIS
Given a sample of data $x_{1i}, x_{2i}, \ldots, y_i$ with $i = 1, \ldots, n$
▪ we can use OLS to estimate the regression model $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots$
▪ subsequently, given the floor area, we can estimate the price of the house
Now, a new "incomplete" observation arrives
▪ for instance, a new house with known floor area ($x_{n+1}$), but with unknown price (no $y_{n+1}$)
We can use the regression model to estimate the house price
▪ so to predict $\hat{y}_{n+1}$

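PREDICTION WITH REGRESSION ANALYSIS
Example:
▪ $\hat{Y} = -264749 + 6152 X$
▪ a house with floor area $x = 85$ m² has an estimated price $\hat{y} = -264749 + 6152 \times 85 \approx 258142$ (€)
▪ note: the rounded coefficients shown here give 258171; the reported 258142 presumably comes from the unrounded coefficients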

PREDICTION WITH REGRESSION ANALYSIS
So, we can predict a value of $\hat{y}$
▪ for a given $x$ (or $x_1, x_2, \ldots$)
▪ and given estimated regression coefficients ($b_0, b_1, \ldots$)
The quality of this estimate obviously depends on the quality of the regression model
▪ try to find a confidence interval for the estimated $\hat{y}$-value
▪ two types:
  ▪ the confidence interval for the average price of a house of 85 m²
  ▪ the confidence interval for a particular house of 85 m²

PREDICTION WITH REGRESSION ANALYSIS
Price ($Y$) unknown, area ($X$) known
Point prediction: 258142
Case 1: confidence interval (95%) for prediction of mean price
▪ (212866, 303419)
Case 2: confidence interval (95%) for individual prediction
▪ (−96372, 612658)
Individual predictions are always less accurate → wider confidence interval (this one even includes 0)
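The slides report the two intervals without the formulas behind them. As a sketch, the standard textbook expressions for simple regression (one $X$) are $\hat{y} \pm t_{\alpha/2,\,n-2} \cdot s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$ for the mean and the same expression with an extra $1 +$ under the square root for an individual observation. The function name, the floor areas, the value of $s$, and the critical $t$-value below are all made up for illustration and are not the data behind the slide.

```python
import math


def interval_half_widths(x_values, x_new, s, t_crit):
    """Half-widths of the interval for the mean and for an individual value.

    x_values: observed X values, x_new: the X value to predict at,
    s: standard error of the regression, t_crit: critical t-value (df = n - 2).
    """
    n = len(x_values)
    x_bar = sum(x_values) / n
    ss_xx = sum((x - x_bar) ** 2 for x in x_values)
    spread = 1 / n + (x_new - x_bar) ** 2 / ss_xx
    mean_half_width = t_crit * s * math.sqrt(spread)
    individual_half_width = t_crit * s * math.sqrt(1 + spread)
    return mean_half_width, individual_half_width


# Made-up floor areas (m2); t_crit = 2.776 is the two-sided 95% value for df = 4.
areas = [60, 70, 80, 90, 100, 110]
mean_hw, individual_hw = interval_half_widths(areas, x_new=85, s=50000.0, t_crit=2.776)
```

The extra 1 under the square root is why the interval for an individual house is always wider than the interval for the mean price, whatever the data.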

OLD EXAM QUESTION 26 March 2015, Q3a

FURTHER STUDY
▪ Doane & Seward 5/E: 12.7, 13.1-13.5
▪ Tutorial exercises week 4: multiple regression, dummy regression, prediction interval
