Multiple Regression Analysis with Qualitative Information
Additional Topics: Dummy Variables, Adjusted R-Squared & Heteroskedasticity

Caio Vigo
The University of Kansas, Department of Economics
Fall 2019

These slides were based on Introductory Econometrics by Jeffrey M. Wooldridge (2015).
Topics
1. Multiple Regression Analysis with Qualitative Information
2. A Single Dummy Independent Variable
   • Dummy Variable Coefficients with log(y) as the Dependent Variable
   • Dummy Variables for Multiple Categories
3. Goodness-of-Fit and Selection of Regressors: the Adjusted R-Squared
4. Heteroskedasticity & Robust Inference
Describing Qualitative Information
• So far we have studied variables (dependent and independent) with quantitative meaning.
• Now we need to incorporate qualitative information into our framework (multiple regression analysis).
• How do we describe binary qualitative information? Examples:
   • A person is either male or female: binary (dummy) variable.
   • A worker belongs to a union or does not: binary (dummy) variable.
   • A firm offers a 401(k) pension plan or it does not: binary (dummy) variable.
   • The race of an individual: multiple-category variable.
   • The region where a firm is located (N, S, W, E): multiple-category variable.
• We will discuss only binary variables.
• A binary variable (or dummy variable) is also called a zero-one variable, to emphasize the two values it takes on.
• Therefore, we must decide which outcome is assigned zero and which is assigned one.
• Good practice: choose a descriptive variable name.
• For example, to indicate gender, female (one if the person is female, zero if the person is male) is a better name than gender or sex, because with those names it is unclear what gender = 1 corresponds to.
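A minimal sketch of building a descriptively named zero-one variable from a text category. It assumes, purely for illustration, that the raw data record gender as the lowercase strings "female" and "male"; the function name make_female_dummy is hypothetical. Raising an error on unexpected values makes the zero/one assignment explicit rather than silently guessed.

```python
def make_female_dummy(genders):
    """Return 1 for 'female' and 0 for 'male'.

    Assumes (hypothetically) that gender is stored as the lowercase
    strings 'female' and 'male'; any other value raises an error so
    that the zero/one coding is never guessed silently.
    """
    coding = {"female": 1, "male": 0}
    dummy = []
    for g in genders:
        if g not in coding:
            raise ValueError(f"unexpected category: {g!r}")
        dummy.append(coding[g])
    return dummy


female = make_female_dummy(["male", "female", "female", "male"])
print(female)  # [0, 1, 1, 0]
```

Because the variable is called female, a value of 1 reads unambiguously as "this person is female".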
• Consider the following dataset: [dataset table not reproduced]
• Any two different values would distinguish the categories (for example, 5 and 6).
• Using 0 and 1, however, makes the interpretation of regression coefficients much easier.
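To see why 0 and 1 are convenient, the sketch below fits the same simple regression under two codings of a binary variable on a small made-up sample (the numbers are illustrative, not real data). Both codings give the same slope, because the two values differ by exactly 1, but only the 0/1 coding makes the intercept equal to the average outcome of the group coded zero.

```python
import numpy as np

# Made-up outcomes: the first three observations are group A, the last two group B.
y = np.array([7.0, 8.0, 9.0, 4.0, 6.0])  # mean of A = 8, mean of B = 5

x_01 = np.array([0, 0, 0, 1, 1])  # A coded 0, B coded 1
x_56 = np.array([5, 5, 5, 6, 6])  # A coded 5, B coded 6

# np.polyfit with deg=1 fits an OLS line and returns [slope, intercept].
slope_01, intercept_01 = np.polyfit(x_01, y, 1)
slope_56, intercept_56 = np.polyfit(x_56, y, 1)

# 0/1 coding: slope ≈ -3 (mean of B minus mean of A), intercept ≈ 8 (mean of A).
print(slope_01, intercept_01)
# 5/6 coding: same slope ≈ -3, but intercept ≈ 23, which is not a group average.
print(slope_56, intercept_56)
```

The fitted values are identical under both codings; only the meaning of the intercept changes.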
A Single Dummy Independent Variable
• What would it mean to specify a simple regression model where the explanatory variable is binary? Consider
$$wage = \beta_0 + \delta_0\, female + u$$
where we assume SLR.4 holds:
$$E(u \mid female) = 0$$
• Therefore,
$$E(wage \mid female) = \beta_0 + \delta_0\, female$$
• There are only two values of female, 0 and 1:
$$E(wage \mid female = 0) = \beta_0 + \delta_0 \cdot 0 = \beta_0$$
$$E(wage \mid female = 1) = \beta_0 + \delta_0 \cdot 1 = \beta_0 + \delta_0$$
In other words, the average wage for men is $\beta_0$ and the average wage for women is $\beta_0 + \delta_0$.
• We can therefore write
$$\delta_0 = E(wage \mid female = 1) - E(wage \mid female = 0)$$
which is the difference in average wage between women and men.
• So $\delta_0$ is not really a slope: it is simply the difference in average outcomes between the two groups.
• The population relationship is mimicked by the simple regression estimates:
$$\hat{\beta}_0 = \overline{wage}_m$$
$$\hat{\beta}_0 + \hat{\delta}_0 = \overline{wage}_f$$
$$\hat{\delta}_0 = \overline{wage}_f - \overline{wage}_m$$
where $\overline{wage}_m$ is the average wage for men in the sample and $\overline{wage}_f$ is the average wage for women in the sample.
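A quick numerical check of these identities: regressing wage on an intercept and female by OLS reproduces the two sample means. The wages below are hypothetical numbers chosen for illustration (they are not the textbook data), and the least-squares fit uses numpy's np.linalg.lstsq.

```python
import numpy as np

# Hypothetical hourly wages (illustrative only, not the textbook sample).
wage = np.array([6.0, 8.0, 10.0, 5.0, 6.5, 4.5])
female = np.array([0, 0, 0, 1, 1, 1])  # 1 = female, 0 = male

# Design matrix: a column of ones (intercept) and the dummy.
X = np.column_stack([np.ones_like(wage), female])
(beta0_hat, delta0_hat), *_ = np.linalg.lstsq(X, wage, rcond=None)

wage_m = wage[female == 0].mean()  # sample average wage for men (8.0 here)
wage_f = wage[female == 1].mean()  # sample average wage for women (16/3 here)

print(beta0_hat, wage_m)               # intercept matches the men's average
print(beta0_hat + delta0_hat, wage_f)  # intercept + dummy coefficient matches the women's average
print(delta0_hat, wage_f - wage_m)     # dummy coefficient is the difference in averages
```

Each printed pair agrees up to floating-point rounding, which is the sample counterpart of the population result.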
• The estimated difference is very large: women earn about $2.51 less per hour than men, on average.
• Of course, some women earn more than some men; this is a difference in averages.
• This simple regression allows us to carry out a simple comparison-of-means test. The null is
$$H_0: \mu_f = \mu_m$$
where $\mu_f$ is the population average wage for women and $\mu_m$ is the population average wage for men. Since $\delta_0 = \mu_f - \mu_m$, this is the same as $H_0: \delta_0 = 0$.
• Under MLR.1 to MLR.5, the usual t statistic is approximately valid in large samples (and exactly valid under MLR.6):
$$t_{female} = -8.28$$
which is a very strong rejection of $H_0$.
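The sketch below, on hypothetical data, checks that the usual (homoskedasticity-only) OLS t statistic for the dummy coefficient is numerically the same as the classic equal-variance two-sample t statistic for $H_0: \mu_f = \mu_m$. The helper names ols_dummy_t and pooled_t are illustrative, and the standard error assumes MLR.5 (homoskedasticity); it is not the heteroskedasticity-robust version.

```python
import numpy as np


def ols_dummy_t(y, d):
    """Usual OLS t statistic for delta0 in y = beta0 + delta0*d + u (homoskedastic SE)."""
    X = np.column_stack([np.ones_like(y), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2_hat = resid @ resid / (n - k)           # SSR / (n - 2)
    cov = sigma2_hat * np.linalg.inv(X.T @ X)      # variance matrix under MLR.5
    return coef[1] / np.sqrt(cov[1, 1])


def pooled_t(y_f, y_m):
    """Equal-variance two-sample t statistic for H0: mu_f = mu_m."""
    n_f, n_m = len(y_f), len(y_m)
    ssr = ((y_f - y_f.mean()) ** 2).sum() + ((y_m - y_m.mean()) ** 2).sum()
    s2 = ssr / (n_f + n_m - 2)                     # pooled variance estimate
    return (y_f.mean() - y_m.mean()) / np.sqrt(s2 * (1 / n_f + 1 / n_m))


# Hypothetical wages (illustrative only).
wage = np.array([6.0, 8.0, 10.0, 5.0, 6.5, 4.5])
female = np.array([0, 0, 0, 1, 1, 1], dtype=float)

t_ols = ols_dummy_t(wage, female)
t_pooled = pooled_t(wage[female == 1], wage[female == 0])
print(t_ols, t_pooled)  # equal up to floating-point rounding
```

The two agree because the OLS residual variance equals the pooled within-group variance, and $[(X'X)^{-1}]_{22} = 1/n_f + 1/n_m$ when the only regressor is a dummy. If SciPy is available, scipy.stats.ttest_ind with equal_var=True computes the same pooled statistic.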