Gov 2000: 9. Regression with Two Independent Variables
Matthew Blackwell
Fall 2016
1. Why Add Variables to a Regression?
2. Adding a Binary Covariate
3. Adding a Continuous Covariate
4. OLS Mechanics with Two Covariates
5. OLS Assumptions with Two Covariates
6. Omitted Variable Bias
7. Goodness of Fit & Multicollinearity
Where are we? Where are we going?
[Figure: bar chart titled "Number of Covariates in Our Regressions" (vertical axis 0 to 10), with bars for Last Week, This Week, and Next Week.]
1/ Why Add Variables to a Regression?
Berkeley gender bias
• Graduate admissions data from Berkeley, 1973
• Acceptance rates:
▶ Men: 8442 applicants, 44% admission rate
▶ Women: 4321 applicants, 35% admission rate
• Evidence of discrimination toward women in admissions?
• This is a marginal relationship.
• What about the conditional relationship within departments?
Berkeley gender bias, II
• Within departments:

            Men                   Women
Dept    Applied   Admitted    Applied   Admitted
A         825       62%         108       82%
B         560       63%          25       68%
C         325       37%         593       34%
D         417       33%         375       35%
E         191       28%         393       24%
F         373        6%         341        7%

• Within departments, women do somewhat better than men in four of the six listed (A, B, D, F)!
• Women apply to more challenging departments.
• Marginal relationships (admissions and gender) ≠ conditional relationship given third variable (department).
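The marginal and conditional comparisons can be reproduced in R. This is a minimal sketch, assuming R's built-in `UCBAdmissions` table rather than the full data behind the slide: it covers only the six largest departments (A to F), so its marginal rates will not match the campus-wide 44% and 35% quoted earlier.

```r
# UCBAdmissions ships with base R as a 3-way table: Admit x Gender x Dept.
data(UCBAdmissions)

# Marginal relationship: collapse over departments.
by_gender <- apply(UCBAdmissions, c(1, 2), sum)      # Admit x Gender counts
marginal_rate <- by_gender["Admitted", ] / colSums(by_gender)

# Conditional relationship: admission rate by gender within each department.
applied <- apply(UCBAdmissions, c(2, 3), sum)        # Gender x Dept applicants
conditional_rate <- UCBAdmissions["Admitted", , ] / applied

print(round(marginal_rate, 2))
print(round(conditional_rate, 2))
```

Comparing the two printed objects shows the reversal: men have the higher pooled rate, while the department-by-department rates mostly favor women.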
Simpson’s paradox
[Figure: scatterplot of Y (vertical axis, -3 to 1) against X (horizontal axis, 0 to 4), with separate clusters of points labeled Z = 0 and Z = 1.]
• Overall a positive relationship between $Y_i$ and $X_i$.
• But within levels of $Z_i$, the opposite.
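A small simulation makes the sign reversal concrete. This is a sketch under assumed values: the group means, slopes, noise levels, and seed below are invented for illustration and are not the data behind the figure.

```r
set.seed(2016)  # arbitrary seed, only for reproducibility
n <- 500
z <- rep(c(0, 1), each = n / 2)                  # binary third variable
x <- rnorm(n, mean = 1 + 2 * z, sd = 0.5)        # the Z = 1 group sits at higher X
y <- -2 + 3 * z - 0.5 * x + rnorm(n, sd = 0.3)   # within-group slope is -0.5

marginal    <- lm(y ~ x)       # ignores z
conditional <- lm(y ~ x + z)   # compares units with the same z

print(coef(marginal)["x"])     # pooled slope
print(coef(conditional)["x"])  # slope within levels of z
```

Because the Z = 1 group is shifted up in both X and Y, the pooled slope comes out positive even though the slope within each group is negative by construction.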
Basic idea
• Old goal: estimate the mean of $Y$ as a function of some independent variable, $X$: $\mathbb{E}[Y_i \mid X_i]$.
• For continuous $X$’s, we modeled the CEF/regression function with a line: $Y_i = \beta_0 + \beta_1 X_i + u_i$
• New goal: estimate the relationship of two variables, $Y_i$ and $X_i$, conditional on a third variable, $Z_i$: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$
• $\beta$’s are the population parameters we want to estimate.
Why control for another variable
• Descriptive
▶ Get a sense for the relationships in the data.
▶ Conditional on the number of steps I’ve taken, do higher activity levels correlate with less weight?
• Predictive
▶ We can usually make better predictions about the dependent variable with more information on independent variables.
• Causal
▶ Block potential confounding, which is when $X$ doesn’t cause $Y$, but only appears to because a third variable $Z$ causally affects both of them.
Plan of attack
1. Interpretation with a binary $Z_i$
2. Interpretation with a continuous $Z_i$
3. Mechanics of OLS with 2 covariates
4. OLS assumptions with 2 covariates:
▶ Omitted variable bias
▶ Multicollinearity
What we won’t cover in lecture
1. The OLS formulas for 2 covariates
2. Proofs
3. The second covariate being a function of the first: $Z_i = X_i^2$
4. Hypothesis tests/confidence intervals (almost exactly the same)
2/ Adding a Binary Covariate
Example
[Figure: scatterplot of log GDP per capita (vertical axis, 4 to 11) against strength of property rights (horizontal axis, 0 to 10), with African and non-African countries labeled separately.]
Basics
• Ye olde model: $\mathbb{E}[Y_i \mid X_i] = \alpha_0 + \alpha_1 X_i$
▶ $(\alpha_0, \alpha_1)$ are the bivariate intercept/slope, $e_i$ is the bivariate error.
• Concern: AJR might be picking up an “African effect”:
▶ African countries might have low incomes and weak property rights.
• Condition on country being in Africa or not to remove this: $\mathbb{E}[Y_i \mid X_i, Z_i] = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
▶ $Z_i = 1$ to indicate that $i$ is an African country
▶ $Z_i = 0$ to indicate that $i$ is a non-African country
▶ Effects are now within Africa or within non-Africa, not between
AJR model
ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
summary(ajr.mod)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5.6556     0.3134   18.04   <2e-16 ***
## avexpr        0.4242     0.0397   10.68   <2e-16 ***
## africa       -0.8784     0.1471   -5.97    3e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.625 on 108 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared: 0.708, Adjusted R-squared: 0.702
## F-statistic: 131 on 2 and 108 DF, p-value: <2e-16
Two lines, one regression
• How can we interpret this model?
• Plug in two possible values for $Z_i$ and rearrange
• When $Z_i = 0$:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 \times 0 = \hat{\beta}_0 + \hat{\beta}_1 X_i$
• When $Z_i = 1$:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 \times 1 = (\hat{\beta}_0 + \hat{\beta}_2) + \hat{\beta}_1 X_i$
• Two different intercepts, same slope
Interpretation of the coefficients

                                  Intercept for $X_i$                 Slope for $X_i$
Non-African country ($Z_i = 0$)   $\hat{\beta}_0$                     $\hat{\beta}_1$
African country ($Z_i = 1$)       $\hat{\beta}_0 + \hat{\beta}_2$     $\hat{\beta}_1$

• In this example, we have: $\hat{Y}_i = 5.656 + 0.424 \times X_i - 0.878 \times Z_i$
• $\hat{\beta}_0$: average log income for non-African country ($Z_i = 0$) with property rights measured at 0 is 5.656
• $\hat{\beta}_1$: A one-unit increase in property rights is associated with a 0.424 increase in average log incomes for two African countries (or for two non-African countries)
• $\hat{\beta}_2$: there is a $-0.878$ average difference in log income per capita between African and non-African countries conditional on property rights
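Plugging the rounded estimates into the fitted equation shows what each coefficient does. This is a sketch only: `fitted_log_gdp` is a hypothetical helper and the property-rights scores in `x_grid` are arbitrary illustrative values.

```r
# Rounded estimates as reported on the slide.
b0 <- 5.656   # intercept
b1 <- 0.424   # slope on property rights (avexpr)
b2 <- -0.878  # shift for African countries (africa = 1)

# Hypothetical helper: fitted log GDP per capita from the two-covariate model.
fitted_log_gdp <- function(avexpr, africa) b0 + b1 * avexpr + b2 * africa

x_grid <- c(4, 5, 6)  # illustrative property-rights scores
non_african <- fitted_log_gdp(x_grid, africa = 0)
african     <- fitted_log_gdp(x_grid, africa = 1)

print(african - non_african)  # the same gap, b2, at every value of avexpr
print(diff(non_african))      # b1 per one-unit step
print(diff(african))          # b1 again: the two lines share a slope
```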
General interpretation of the coefficients
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i$
• $\hat{\beta}_0$: average value of $Y_i$ when both $X_i$ and $Z_i$ are equal to 0
• $\hat{\beta}_1$: A 1-unit increase in $X_i$ is associated with a $\hat{\beta}_1$-unit change in $Y_i$ for units with the same value of $Z_i$
• $\hat{\beta}_2$: average difference in $Y_i$ between $Z_i = 1$ group and $Z_i = 0$ group for units with the same value of $X_i$
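The "same value of the other variable" reading can be checked directly on a fitted `lm` object with `predict()`. This is a sketch on simulated data: the data-generating values, the seed, and the comparison points are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 200
z <- rbinom(n, size = 1, prob = 0.4)   # binary covariate
x <- rnorm(n, mean = 5 + z)            # continuous covariate, correlated with z
y <- 1 + 0.5 * x - 1 * z + rnorm(n)
dat <- data.frame(y = y, x = x, z = z)

fit <- lm(y ~ x + z, data = dat)

# Two units with the same z whose x differs by one unit.
p_x <- predict(fit, newdata = data.frame(x = c(3, 4), z = c(1, 1)))
# Two units with the same x, one in each z group.
p_z <- predict(fit, newdata = data.frame(x = c(3, 3), z = c(0, 1)))

print(unname(diff(p_x)))  # matches coef(fit)["x"]
print(unname(diff(p_z)))  # matches coef(fit)["z"]
```

The differences in predicted values reproduce the coefficients exactly, whatever comparison points are chosen, because the model is linear and additive in the two covariates.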
Adding a binary variable, visually
[Figure: log GDP per capita (vertical axis, 4 to 11) against strength of property rights (horizontal axis, 0 to 10), annotated with β0 = 5.656, β1 = 0.424, and β2 = -0.878; the intercepts are marked β0 and β0 + β2, with the vertical gap between them marked β2.]
Marginal vs conditional
[Figure: log GDP per capita (vertical axis, 4 to 11) against strength of property rights (horizontal axis, 0 to 10).]
3/ Adding a Continuous Covariate
Adding a continuous variable
• Ye olde model: $\mathbb{E}[Y_i \mid X_i] = \alpha_0 + \alpha_1 X_i$
• New concern: geography is confounding the effect
▶ geography might affect political institutions
▶ geography might affect average incomes (through diseases like malaria)
• Condition on $Z_i$: mean temperature in country $i$ (continuous)
$\mathbb{E}[Y_i \mid X_i, Z_i] = \beta_0 + \beta_1 X_i + \beta_2 Z_i$
AJR model, revisited
ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
summary(ajr.mod2)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   6.8063     0.7518    9.05  1.3e-12 ***
## avexpr        0.4057     0.0640    6.34  3.9e-08 ***
## meantemp     -0.0602     0.0194   -3.11    0.003 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.643 on 57 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.615, Adjusted R-squared: 0.602
## F-statistic: 45.6 on 2 and 57 DF, p-value: 1.48e-12
Interpretation with a continuous Z

                        Intercept for $X_i$                          Slope for $X_i$
$Z_i = 0^\circ$C        $\hat{\beta}_0$                              $\hat{\beta}_1$
$Z_i = 21^\circ$C       $\hat{\beta}_0 + \hat{\beta}_2 \times 21$    $\hat{\beta}_1$
$Z_i = 24^\circ$C       $\hat{\beta}_0 + \hat{\beta}_2 \times 24$    $\hat{\beta}_1$
$Z_i = 26^\circ$C       $\hat{\beta}_0 + \hat{\beta}_2 \times 26$    $\hat{\beta}_1$

• In this example we have: $\hat{Y}_i = 6.806 + 0.406 \times X_i - 0.06 \times Z_i$
• $\hat{\beta}_0$: average log income for a country with property rights measured at 0 and a mean temperature of 0 is 6.806
• $\hat{\beta}_1$: A one-unit increase in property rights is associated with a 0.406 change in average log incomes conditional on a country’s mean temperature
• $\hat{\beta}_2$: A one-degree increase in mean temperature is associated with a $-0.06$ change in average log incomes conditional on strength of property rights
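With a continuous covariate, each temperature implies its own intercept for the property-rights line while the slope stays fixed. A short sketch using the rounded estimates from the slide (the R output reports the temperature coefficient as -0.0602, so the intercepts here are approximate):

```r
# Rounded estimates as reported on the slide.
b0 <- 6.806   # intercept
b1 <- 0.406   # slope on property rights (avexpr)
b2 <- -0.06   # slope on mean temperature (meantemp), rounded

temps <- c(0, 21, 24, 26)        # mean temperatures in degrees C from the table
intercepts <- b0 + b2 * temps    # implied intercept of the avexpr line at each one

# One row per temperature: a different intercept, the same slope.
print(data.frame(meantemp = temps, intercept = intercepts, slope = b1))
```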