TESTING AND CORRECTING FOR ENDOGENEITY IN NONLINEAR UNOBSERVED EFFECTS MODELS
IAAE Lecture, 21st International Panel Data Conference
Central European University, Budapest, June 30, 2015
Jeff Wooldridge, Michigan State University
1. Introduction
2. Linear Model
3. Exponential Model
4. Probit Response Function
5. Empirical Example
6. Extensions and Future Directions
1. Introduction
∙ In unobserved effects models we can think of two kinds of endogeneity for an explanatory variable:
1. Correlation with unobserved effect(s) (time constant)
2. Correlation with innovations (time varying)
∙ Application to linear model: Levitt (1996, QJE), effects of prison size on violent crime.
∙ Nonlinear models: Can combine the correlated random effects (CRE) and control function (CF) approaches for certain nonlinear models.
∙ Papke and Wooldridge (2008, J of E) provides one approach. Simple but not ideal for testing purposes: It conflates the two kinds of endogeneity.
∙ Application to effects of spending on school/district test pass rates.
∙ Here focus on continuous endogenous explanatory variables (EEVs), but some suggestions for discrete EEVs.
∙ Other Approaches:
1. "Fixed Effects" (Heterogeneity as parameters to estimate): Incidental parameters problem with small $T$. Bias adjustments available for parameters and average partial effects, but usually stationarity and weak dependence (even independence) are assumed. Difficult to incorporate time effects. Endogenous explanatory variables?
2. Conditional MLE: Only works in special cases. Relies on conditional independence across time. Partial effects in nonlinear models often unidentified. Extensions to EEVs?
3. Finite Number of Types (Bonhomme and Manresa): Conditional independence. Nonlinear models? Extension to EEVs?
| | Ideal | FE | CMLE | CRE |
|---|---|---|---|---|
| Restricts $D(c_i | x_i)$? | No | No | No | Yes |
| Incidental Parameters with Small $T$? | No | Yes | No | No |
| Restricts Time Series Dependence/Heterogeneity? | No | Yes (1) | Yes (2) | No |
| Restricts Amount of Heterogeneity? | No | No (3) | Yes | No |
| APEs Identified? | Yes | Yes (4) | No | Yes |
| Unbalanced Panels? | Yes | Yes | Yes | Yes (5) |
| Can Estimate $D(c_i)$? | Yes | Yes (4) | No | Yes (6) |
| Endogenous Explanatory Variables? | Yes | Yes (4) | No (7) | Yes |
1. The large $T$ approximations assume weak dependence and often stationarity.
2. Usually conditional independence, unless estimator is inherently fully robust (linear, Poisson).
3. Need at least one more time period than sources of heterogeneity.
4. Subject to the incidental parameters problem.
5. Subject to exchangeability restrictions.
6. Under conditional independence or some other restriction.
7. Unless one makes parametric assumptions on the reduced form and imposes conditional independence.
2. Linear Model
∙ Consider a "structural" equation
$$ y_{it1} = x_{it1}\beta_1 + c_{i1} + u_{it1} $$
where $x_{it1} = (z_{it1}, y_{it2})$.
∙ The outside instruments are $z_{it2}$, so that $z_{it} = (z_{it1}, z_{it2})$.
∙ $z_{it1}$ can include time effects, but we suppress them.
∙ Both $z_{it}$ and $y_{it2}$ may be correlated with $c_{i1}$.
∙ Assume $z_{it}$ is strictly exogenous with respect to $u_{it1}$:
$$ \mathrm{Cov}(z_{it}, u_{ir1}) = 0, \quad \text{all } t, r = 1, \ldots, T $$
∙ $y_{it2}$ may be correlated with $u_{it1}$ (across all time periods).
∙ Given a rank condition, $\beta_1$ can be estimated by fixed effects IV (FEIV).
∙ How can we test the null hypothesis that $y_{it2}$ is exogenous with respect to $u_{it1}$?
∙ Hausman test comparing FE and FEIV. Cumbersome due to deficient rank. Original Hausman test not robust to serial correlation or heteroskedasticity in $u_{it1}$.
∙ Variable Addition Test (Control Function):
1. Estimate the reduced form of $y_{it2}$,
$$ y_{it2} = z_{it}\Pi_2 + c_{i2} + u_{it2}, $$
by fixed effects, and obtain the FE residuals,
$$ \hat{\ddot{u}}_{it2} = \ddot{y}_{it2} - \ddot{z}_{it}\hat{\Pi}_2, \qquad \ddot{y}_{it2} = y_{it2} - T^{-1}\sum_{r=1}^{T} y_{ir2} $$
2. Estimate the equation
$$ y_{it1} = x_{it1}\beta_1 + \hat{\ddot{u}}_{it2}\rho_1 + c_{i1} + error_{it1} $$
by usual FE and compute a robust test of $H_0: \rho_1 = 0$.
∙ In step (2), $\hat{\beta}_1$ is the FEIV estimator.
∙ Note that the nature of $y_{it2}$ is unrestricted (discrete, continuous, both features).
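The two-step variable addition test can be sketched numerically as follows. This is a minimal illustration on a simulated balanced panel, not code from the lecture: the data-generating values, the variable names, and the choice of a unit-clustered sandwich variance for the robust test are all assumptions. It also checks the algebraic fact stated above, that the step (2) coefficients on the structural regressors coincide with FEIV.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 5

# Simulated balanced panel (all numbers are illustrative assumptions).
c1 = rng.normal(size=(N, 1))                 # heterogeneity, structural equation
c2 = 0.5 * c1 + rng.normal(size=(N, 1))      # heterogeneity, reduced form
z1 = rng.normal(size=(N, T)) + c1            # included exogenous variable
z2 = rng.normal(size=(N, T)) + c2            # outside instrument
u2 = rng.normal(size=(N, T))
u1 = 0.6 * u2 + rng.normal(size=(N, T))      # makes y2 endogenous w.r.t. u1
y2 = z1 + z2 + c2 + u2
y1 = 0.5 * z1 + 1.0 * y2 + c1 + u1

def within(a):
    """FE transformation: subtract unit time averages, then stack to length NT."""
    return (a - a.mean(axis=1, keepdims=True)).reshape(-1)

def ols(X, y):
    """Least squares coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Step 1: FE reduced form of y2 on all exogenous variables; keep FE residuals.
Zdd = np.column_stack([within(z1), within(z2)])
u2hat = within(y2) - Zdd @ ols(Zdd, within(y2))

# Step 2: FE regression of y1 on (z1, y2) plus the added residual.
Xdd = np.column_stack([within(z1), within(y2)])
W = np.column_stack([Xdd, u2hat])
coef = ols(W, within(y1))
beta_cf, rho_cf = coef[:2], coef[2]

# Robust (clustered by unit) Wald statistic for H0: rho_1 = 0.
resid = (within(y1) - W @ coef).reshape(N, T)
scores = np.einsum('itk,it->ik', W.reshape(N, T, -1), resid)
A_inv = np.linalg.inv(W.T @ W)
V = A_inv @ (scores.T @ scores) @ A_inv
wald = rho_cf**2 / V[2, 2]                   # compare with chi-square(1)

# FEIV benchmark: 2SLS on the within-transformed data.
Xhat = Zdd @ ols(Zdd, Xdd)
beta_feiv = ols(Xhat, within(y1))

print("beta (FE with control function):", beta_cf)
print("beta (FEIV):                    ", beta_feiv)
print("rho:", rho_cf, " robust Wald:", wald)
```

The Wald statistic here ignores first-stage sampling error in the residuals, which is harmless under the null because the added regressor has a zero coefficient there.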
∙ We can also use a correlated random effects approach, but some care is needed to get a proper test.
∙ The Mundlak equation for $y_{it2}$ is
$$ y_{it2} = \psi_2 + z_{it}\Pi_2 + \bar{z}_i\Xi_2 + a_{i2} + u_{it2}, \qquad \bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it} $$
∙ We are operating as if
$$ \mathrm{Cov}(z_{it}, u_{is2}) = 0, \quad \text{all } t, s $$
$$ \mathrm{Cov}(z_{it}, a_{i2}) = 0, \quad \text{all } t $$
∙ Key: How should we apply the Mundlak device to $c_{i1}$ in
$$ y_{it1} = x_{it1}\beta_1 + c_{i1} + u_{it1}? $$
∙ Projecting $c_{i1}$ only onto $\bar{z}_i$ is fine for estimation.
∙ For testing, it does not distinguish between $\mathrm{Cov}(y_{it2}, c_{i1}) \neq 0$ and $\mathrm{Cov}(y_{it2}, u_{is1}) \neq 0$.
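∙ Better is to project $c_{i1}$ onto $(\bar{z}_i, \bar{v}_{i2})$, where
$$ y_{it2} = \psi_2 + z_{it}\Pi_2 + \bar{z}_i\Xi_2 + v_{it2} $$
$$ c_{i1} = \eta_1 + \bar{z}_i\lambda_1 + \bar{v}_{i2}\pi_1 + a_{i1} $$
$$ \mathrm{Cov}(z_i, a_{i1}) = 0, \qquad \mathrm{Cov}(y_{i2}, a_{i1}) = 0 $$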
∙ Plugging in gives the estimating equation
$$ y_{it1} = x_{it1}\beta_1 + \eta_1 + \bar{z}_i\lambda_1 + \bar{v}_{i2}\pi_1 + a_{i1} + u_{it1} $$
$$ \phantom{y_{it1}} = x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + a_{i1} + u_{it1} $$
∙ By the Mundlak device, $a_{i1}$ is uncorrelated with all RHS observables.
∙ By assumption, $z_i$ is uncorrelated with $u_{it1}$.
∙ Now test whether $y_{it2}$, equivalently $v_{it2}$, is uncorrelated with $u_{it1}$.
1. Run a pooled OLS regression (or use random effects),
$$ y_{it2} = \psi_2 + z_{it}\Pi_2 + \bar{z}_i\Xi_2 + v_{it2}, $$
and obtain the residuals, $\hat{v}_{it2}$.
2. Estimate
$$ y_{it1} = x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + \hat{v}_{it2}\rho_1 + error_{it1} $$
by POLS or RE.
3. Use a robust Wald test of $H_0: \rho_1 = 0$.
∙ Algebra:
(i) $\hat{\beta}_1$ in step (2) is the FEIV estimator.
(ii) $\hat{\rho}_1$ is identical to that from estimating
$$ y_{it1} = x_{it1}\beta_1 + \hat{\ddot{u}}_{it2}\rho_1 + c_{i1} + error_{it1} $$
by FE.
∙ Result (i) still holds if $\bar{y}_{i2}$ is dropped from the estimating equation, but (ii) does not.
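The CRE/CF steps and the two algebraic results can be checked numerically. The sketch below uses a simulated balanced panel with pooled OLS in both steps; the data-generating values and all variable names are assumptions made for illustration. It compares the pooled regression that includes the time averages of the exogenous variables and of the endogenous variable against the FE benchmarks, and then repeats the regression with the time average of the endogenous variable dropped.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 400, 4

# Simulated balanced panel (all numbers are illustrative assumptions).
c1 = rng.normal(size=(N, 1))
c2 = 0.5 * c1 + rng.normal(size=(N, 1))
z1 = rng.normal(size=(N, T)) + c1            # included exogenous variable
z2 = rng.normal(size=(N, T)) + c2            # outside instrument
u2 = rng.normal(size=(N, T))
u1 = 0.6 * u2 + rng.normal(size=(N, T))
y2 = z1 + z2 + c2 + u2
y1 = 0.5 * z1 + 1.0 * y2 + c1 + u1

def flat(a):
    return a.reshape(-1)                      # stack unit by unit

def avg(a):
    return np.repeat(a.mean(axis=1), T)       # unit time average on every row

def within(a):
    return flat(a - a.mean(axis=1, keepdims=True))

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(N * T)

# Step 1: pooled OLS Mundlak reduced form for y2, keep residuals.
R = np.column_stack([ones, flat(z1), flat(z2), avg(z1), avg(z2)])
vhat = flat(y2) - R @ ols(R, flat(y2))

# Step 2: pooled OLS including time averages of z and of y2, plus vhat.
X = np.column_stack([flat(z1), flat(y2), vhat, ones, avg(z1), avg(z2), avg(y2)])
b = ols(X, flat(y1))
beta_cre, rho_cre = b[:2], b[2]

# Step 3: Wald statistic for the coefficient on vhat, clustered by unit.
e = (flat(y1) - X @ b).reshape(N, T)
S = np.einsum('itk,it->ik', X.reshape(N, T, -1), e)
A_inv = np.linalg.inv(X.T @ X)
wald = rho_cre**2 / (A_inv @ (S.T @ S) @ A_inv)[2, 2]

# Same regression with the time average of y2 dropped.
X_drop = np.column_stack([flat(z1), flat(y2), vhat, ones, avg(z1), avg(z2)])
b_drop = ols(X_drop, flat(y1))
beta_drop, rho_drop = b_drop[:2], b_drop[2]

# FE benchmarks: FEIV for beta, FE variable addition regression for rho.
Zdd = np.column_stack([within(z1), within(z2)])
Xdd = np.column_stack([within(z1), within(y2)])
u2hat = within(y2) - Zdd @ ols(Zdd, within(y2))
rho_fe = ols(np.column_stack([Xdd, u2hat]), within(y1))[2]
beta_feiv = ols(Zdd @ ols(Zdd, Xdd), within(y1))

print("beta: CRE/CF", beta_cre, " FEIV", beta_feiv, " no ybar", beta_drop)
print("rho:  CRE/CF", rho_cre, " FE", rho_fe, " no ybar", rho_drop)
```

In this simulation the heterogeneity terms in the two equations are correlated, so the coefficient on the residual moves once the time average of the endogenous variable is left out, while the structural coefficients do not.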
∙ Conclusion: In using the CRE/CF approach for testing $H_0: \mathrm{Cov}(y_{it2}, u_{is1}) = 0$, use the equations
$$ \hat{v}_{it2} = y_{it2} - \hat{\psi}_2 - z_{it}\hat{\Pi}_2 - \bar{z}_i\hat{\Xi}_2 $$
$$ y_{it1} = x_{it1}\beta_1 + \psi_1 + \bar{z}_i\xi_1 + \bar{y}_{i2}\pi_1 + \hat{v}_{it2}\rho_1 + error_{it1} $$
∙ Also works in the unbalanced case when the complete cases are used (Joshi and Wooldridge, 2015).
∙ What about using Chamberlain in place of Mundlak?
∙ Reusing notation, with $z_i = (z_{i1}, \ldots, z_{iT})$, $y_{i2} = (y_{i12}, \ldots, y_{iT2})$,
$$ \hat{v}_{it2} = y_{it2} - \hat{\psi}_2 - z_{it}\hat{\Pi}_2 - z_i\hat{\Xi}_2 $$
$$ y_{it1} = x_{it1}\beta_1 + \psi_1 + z_i\xi_1 + y_{i2}\pi_1 + \hat{v}_{it2}\rho_1 + error_{it1} $$
∙ Estimates of $\beta_1$ and $\rho_1$ are identical to Mundlak, as is the robust Wald test (POLS or RE).
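A quick numerical comparison of the Chamberlain and Mundlak versions, using pooled OLS in both steps, might look like the sketch below. The simulated design and every name in it are assumptions for illustration only. The Chamberlain version replaces each time average with the unit's full history of the variable, repeated on every row for that unit.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 300, 3

# Simulated balanced panel (all numbers are illustrative assumptions).
c1 = rng.normal(size=(N, 1))
c2 = 0.5 * c1 + rng.normal(size=(N, 1))
z1 = rng.normal(size=(N, T)) + c1
z2 = rng.normal(size=(N, T)) + c2
u2 = rng.normal(size=(N, T))
u1 = 0.6 * u2 + rng.normal(size=(N, T))
y2 = z1 + z2 + c2 + u2
y1 = 0.5 * z1 + 1.0 * y2 + c1 + u1

def flat(a):
    return a.reshape(-1)                      # stack unit by unit

def avg(a):
    return np.repeat(a.mean(axis=1), T)       # Mundlak: time average per row

def hist(a):
    return np.repeat(a, T, axis=0)            # Chamberlain: full history per row

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(N * T)

def cre_cf(device):
    """Two-step pooled OLS CRE/CF using `device` (avg or hist) for the
    time-constant functions of z and y2; returns (beta_hat, rho_hat)."""
    # Step 1: reduced form for y2 and its residuals.
    R = np.column_stack([ones, flat(z1), flat(z2), device(z1), device(z2)])
    vhat = flat(y2) - R @ ols(R, flat(y2))
    # Step 2: structural equation augmented with the device terms and vhat.
    X = np.column_stack([flat(z1), flat(y2), vhat, ones,
                         device(z1), device(z2), device(y2)])
    b = ols(X, flat(y1))
    return b[:2], b[2]

beta_m, rho_m = cre_cf(avg)
beta_c, rho_c = cre_cf(hist)

print("Mundlak:    ", beta_m, rho_m)
print("Chamberlain:", beta_c, rho_c)
```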
∙ Conclusion: Include time averages of $z_{it}$ and $y_{it2}$ to obtain a clean test of endogeneity of $y_{it2}$ with respect to $u_{it1}$.
∙ If POLS or RE are used, Chamberlain = Mundlak.
∙ All goes through with time effects.
∙ Time constant variables can be included in the CRE/CF approach.
∙ Ignoring the pre-testing problem, a strategy for testing is:
1. If the VAT rejects, use FEIV. Alternatively, then test REIV against FEIV, as the instruments may be "super" exogenous.
2. If the VAT fails to reject, use FE or compare RE and FE.
3. Exponential Model
∙ Fully robust test for exogeneity:
1. Estimate the reduced form for $y_{it2}$ by fixed effects and obtain the FE residuals,
$$ \hat{\ddot{u}}_{it2} = \ddot{y}_{it2} - \ddot{z}_{it}\hat{\Pi}_2 $$
2. Use FE Poisson on the mean function
$$ \text{``}E(y_{it1} | z_{it1}, y_{it2}, \hat{\ddot{u}}_{it2}, c_{i1}) = c_{i1}\exp(x_{it1}\beta_1 + \hat{\ddot{u}}_{it2}\rho_1)\text{''} $$
and use a robust Wald test of $H_0: \rho_1 = 0$.
∙ The null hypothesis is
$$ H_0: E(y_{it1} | z_i, y_{i2}, \ddot{u}_{it2}, c_{i1}) = E(y_{it1} | z_{it1}, y_{it2}, c_{i1}) = c_{i1}\exp(x_{it1}\beta_1) $$
∙ Algebra: The Poisson FE estimates of $(\beta_1, \rho_1)$ are unchanged if we estimate the Mundlak reduced form,
$$ y_{it2} = \psi_2 + z_{it}\Pi_2 + \bar{z}_i\Xi_2 + v_{it2}, $$
and use the residuals $\hat{v}_{it2}$ ($\hat{v}_{it2} \neq \hat{\ddot{u}}_{it2}$ but $\ddot{\hat{v}}_{it2} = \hat{\ddot{u}}_{it2}$).
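The FE Poisson test and the invariance to the choice of first-stage residuals can be illustrated with a small simulation. The sketch below is hand-rolled under stated assumptions: the data-generating values are invented, FE Poisson is fit by Newton's method on the conditional (multinomial) quasi-log-likelihood, and the robust variance is a sandwich based on unit-level scores. The invariance arises because the two sets of residuals differ only by a unit-specific constant, which cancels from the within-unit probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 5

# Simulated balanced panel (all numbers are illustrative assumptions).
c2 = rng.normal(size=(N, 1))
z1 = 0.5 * rng.normal(size=(N, T)) + 0.3 * c2
z2 = rng.normal(size=(N, T)) + 0.3 * c2
u2 = rng.normal(size=(N, T))
y2 = 0.5 * z1 + 0.5 * z2 + c2 + u2
u1 = 0.3 * u2 + 0.3 * rng.normal(size=(N, T))          # endogeneity through u2
c1 = np.exp(0.3 * c2 + 0.3 * rng.normal(size=(N, 1)))  # multiplicative effect
y1 = rng.poisson(c1 * np.exp(0.3 * z1 + 0.2 * y2 + u1)).astype(float)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def demean(a):
    return a - a.mean(axis=1, keepdims=True)

def score_hess(b, X, y):
    """Unit-level scores and negative Hessian of the FE Poisson objective."""
    idx = X @ b
    p = np.exp(idx - idx.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                   # within-unit probabilities
    n = y.sum(axis=1)
    s = np.einsum('it,itk->ik', y - n[:, None] * p, X)
    Xc = X - np.einsum('it,itk->ik', p, X)[:, None, :]
    H = np.einsum('i,it,itk,itl->kl', n, p, Xc, Xc)
    return s, H

def fe_poisson(X, y, iters=100, tol=1e-10):
    """Newton iterations from zero; returns estimates and a robust variance."""
    b = np.zeros(X.shape[2])
    for _ in range(iters):
        s, H = score_hess(b, X, y)
        step = np.linalg.solve(H, s.sum(axis=0))
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    s, H = score_hess(b, X, y)
    H_inv = np.linalg.inv(H)
    return b, H_inv @ (s.T @ s) @ H_inv

# First stage (a): FE reduced form residuals.
Zdd = np.column_stack([demean(z1).reshape(-1), demean(z2).reshape(-1)])
y2dd = demean(y2).reshape(-1)
u2hat = (y2dd - Zdd @ ols(Zdd, y2dd)).reshape(N, T)

# First stage (b): pooled OLS Mundlak reduced form residuals.
zb1, zb2 = np.repeat(z1.mean(axis=1), T), np.repeat(z2.mean(axis=1), T)
R = np.column_stack([np.ones(N * T), z1.reshape(-1), z2.reshape(-1), zb1, zb2])
vhat = (y2.reshape(-1) - R @ ols(R, y2.reshape(-1))).reshape(N, T)

# Second stage: FE Poisson on (z1, y2, residual) with each set of residuals.
b_fe, V_fe = fe_poisson(np.stack([z1, y2, u2hat], axis=2), y1)
b_m, V_m = fe_poisson(np.stack([z1, y2, vhat], axis=2), y1)
wald = b_fe[2] ** 2 / V_fe[2, 2]                        # compare with chi-square(1)

print("FE residuals:     ", b_fe)
print("Mundlak residuals:", b_m)
print("robust Wald for rho_1 = 0:", wald)
```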
∙ For testing, no restrictions are put on the RF of $y_{it2}$.
∙ When would the Mundlak/FE Poisson approach be consistent for $\beta_1$?
∙ Assume that
$$ E(y_{it1} | z_i, y_{i2}, c_{i1}, u_{it1}) = E(y_{it1} | z_{it1}, y_{it2}, c_{i1}, u_{it1}) = c_{i1}\exp(x_{it1}\beta_1 + u_{it1}) $$
where $x_{it1}$ can be any function of $(z_{it1}, y_{it2})$.
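∙ The reduced form is
$$ y_{it2} = \psi_2 + z_{it}\Pi_2 + \bar{z}_i\Xi_2 + a_{i2} + u_{it2}, \qquad v_{it2} = a_{i2} + u_{it2} $$
∙ Sufficient is
$$ u_{it1} = u_{it2}\rho_1 + e_{it1} = v_{it2}\rho_1 - a_{i2}\rho_1 + e_{it1} $$
with $e_{it1}$ independent of $(z_i, c_{i1}, c_{i2}, u_{i2})$.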
∙ This appears to impose a restriction of only contemporaneous correlation between $u_{it1}$ and $u_{it2}$. How important is it?
∙ In the linear case it makes no difference.