Endogeneity and Instrumental Variables

Ping Yu, School of Economics and Finance, The University of Hong Kong



  1. Endogeneity and Instrumental Variables. Ping Yu, School of Economics and Finance, The University of Hong Kong. Ping Yu (HKU), Endogeneity and IV, 1 / 44

  2. Outline: (1) Endogeneity, (2) Instrumental Variables, (3) Reduced Form, (4) Identification, (5) Estimation: Two-Stage Least Squares, (6) Interpretation of the IV Estimator

  3. Endogeneity

  4. Endogeneity
     In the linear regression
     $$y_i = x_i'\beta + u_i, \tag{1}$$
     if $E[x_i u_i] \neq 0$, there is endogeneity. In this case, the LSE will be asymptotically biased. The analysis of data with endogenous regressors is arguably the main contribution of econometrics to statistical science.

  5. Five Sources of Endogeneity
     Simultaneous causality.
     - Example: Does computer usage increase income? Do cigarette taxes reduce smoking? Does putting criminals in jail reduce crime?
     - Solution: use instrumental variables (IVs), or design and implement a randomized controlled experiment in which the reverse causality channel is nullified.
     Omitted variables.
     - Example: in the returns-to-schooling model, ability is an important variable that is correlated with years of education but is not observable, so it is absorbed into the error term.
     - Solution: use IVs, panel data, or randomized controlled experiments.

  6. Five Sources of Endogeneity (continued)
     Errors in variables. This term refers to the phenomenon that an otherwise exogenous regressor becomes endogenous when it is measured with error.
     - Example: in the returns-to-schooling model, the records for years of education are fraught with errors owing to lack of recall, typographical mistakes, or other reasons.
     - Solution: use IVs (e.g., exogenous determinants of the error-ridden explanatory variables, or multiple indicators of the same outcome).
     Sample selection.
     - Example: in the analysis of returns to schooling, only the wages of employed workers are available, but we want to know the effect of education for the general population.
     - Solution: Heckman's control function approach.
     Functional form misspecification. $E[y \mid x]$ may not be linear in $x$.
     - Solution: nonparametric methods.

  7. Simultaneous Causality
     Wright (1928) considered estimating the elasticity of butter demand, which is critical for the policy decision on the butter tariff. Define $p_i = \ln P_i$ and $q_i = \ln Q_i$. The demand equation is
     $$q_i = \alpha_0 + \alpha_1 p_i + u_i, \tag{2}$$
     where $u_i$ represents factors other than price that affect demand, such as income and consumer taste. But the supply equation has the same form as (2):
     $$q_i = \beta_0 + \beta_1 p_i + v_i, \tag{3}$$
     where $v_i$ represents the factors that affect supply, such as weather conditions, factor prices, and union status. So $p_i$ and $q_i$ are both determined "within" the model, and they are endogenous. Rigorously, solving the two simultaneous equations (2) and (3) gives
     $$p_i = \frac{\beta_0 - \alpha_0}{\alpha_1 - \beta_1} + \frac{v_i - u_i}{\alpha_1 - \beta_1}, \qquad q_i = \frac{\alpha_1\beta_0 - \alpha_0\beta_1}{\alpha_1 - \beta_1} + \frac{\alpha_1 v_i - \beta_1 u_i}{\alpha_1 - \beta_1}.$$

  8. Simultaneous Causality (continued)
     Suppose $\mathrm{Cov}(u_i, v_i) = 0$. Then
     $$\mathrm{Cov}(p_i, u_i) = -\frac{\mathrm{Var}(u_i)}{\alpha_1 - \beta_1}, \qquad \mathrm{Cov}(p_i, v_i) = \frac{\mathrm{Var}(v_i)}{\alpha_1 - \beta_1},$$
     which are not zero. If $\alpha_1 < 0$ and $\beta_1 > 0$, then $\mathrm{Cov}(p_i, u_i) > 0$ and $\mathrm{Cov}(p_i, v_i) < 0$, which is intuitively right (why?).
     If we regress $q_i$ on $p_i$, the slope estimator converges to
     $$\frac{\mathrm{Cov}(p_i, q_i)}{\mathrm{Var}(p_i)} = \alpha_1 + \frac{\mathrm{Cov}(p_i, u_i)}{\mathrm{Var}(p_i)} = \beta_1 + \frac{\mathrm{Cov}(p_i, v_i)}{\mathrm{Var}(p_i)} \overset{\text{why?}}{=} \frac{\alpha_1 \mathrm{Var}(v_i) + \beta_1 \mathrm{Var}(u_i)}{\mathrm{Var}(v_i) + \mathrm{Var}(u_i)} \in (\alpha_1, \beta_1).$$
     So the LSE converges to neither $\alpha_1$ nor $\beta_1$, but to a weighted average of them. Such a bias is called the simultaneous equations bias.
     The LSE cannot consistently estimate $\alpha_1$ or $\beta_1$ because both curves are shifted by factors other than price, and we cannot tell from the data whether a change in price and quantity is due to a demand shift or a supply shift.
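The weighted-average limit above can be checked numerically. The following is a minimal sketch, not part of the original slides: the parameter values, the normal distribution of the shocks, and the helper names `simulate_market` and `ols_slope` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
ALPHA0, ALPHA1 = 2.0, -1.0   # demand intercept and slope (alpha_1 < 0)
BETA0, BETA1 = 0.5, 0.5      # supply intercept and slope (beta_1 > 0)
SD_U, SD_V = 1.0, 2.0        # std. dev. of demand shock u and supply shock v


def simulate_market(n, seed=0):
    """Draw uncorrelated shocks and solve (2) and (3) for equilibrium p and q."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, SD_U, n)   # assumption: normal shocks
    v = rng.normal(0.0, SD_V, n)
    p = (BETA0 - ALPHA0 + v - u) / (ALPHA1 - BETA1)
    q = ALPHA0 + ALPHA1 * p + u    # demand equation evaluated at equilibrium
    return p, q


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


p, q = simulate_market(200_000)
slope = ols_slope(p, q)

# Probability limit from the slide; it equals -0.7 for these parameter values.
limit = (ALPHA1 * SD_V**2 + BETA1 * SD_U**2) / (SD_V**2 + SD_U**2)
print(f"OLS slope: {slope:.3f}, weighted-average limit: {limit:.3f}")
```

With these values the OLS slope settles near the weighted average, strictly between $\alpha_1$ and $\beta_1$, rather than at either structural slope.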

  9. Simultaneous Causality (continued)
     If $u_i = 0$, that is, the demand curve stays still, then the equilibrium prices and quantities trace out the demand curve and the LSE is consistent for $\alpha_1$. Figure 1 illustrates the discussion above intuitively.
     [Figure 1: Endogeneity and Identification of Instrument Variables. Left panel: "Demand and Supply in Three Time Periods" (curves $D_1$ to $D_3$ and $S_1$ to $S_3$, with the equilibria of periods 1, 2 and 3). Right panel: "Equilibria when Only the Supply Curve Shifts" (curves $D_1$ and $S_1$ to $S_3$). Axes: Price and Quantity.]

  10. Simultaneous Causality (continued)
     From above, we can see that $p_i$ has one part, $-\dfrac{u_i}{\alpha_1 - \beta_1}$, which is correlated with $u_i$, and one part, $\dfrac{v_i}{\alpha_1 - \beta_1}$, which is not. If we can isolate the second part, then we can focus on those variations in $p_i$ that are uncorrelated with $u_i$ and disregard the variations in $p_i$ that bias the LSE.
     Take one supply shifter $z_i$, e.g., weather, which can be considered uncorrelated with the demand shifter $u_i$ (such as consumers' tastes). Then $\mathrm{Cov}(z_i, u_i) = 0$ and $\mathrm{Cov}(z_i, p_i) \neq 0$. So, taking the covariance of $z_i$ with both sides of (2),
     $$\mathrm{Cov}(z_i, q_i) = \alpha_1 \cdot \mathrm{Cov}(z_i, p_i), \qquad \text{and} \qquad \alpha_1 = \frac{\mathrm{Cov}(z_i, q_i)}{\mathrm{Cov}(z_i, p_i)}.$$
     A natural estimator is
     $$\hat{\alpha}_1 = \frac{\widehat{\mathrm{Cov}}(z_i, q_i)}{\widehat{\mathrm{Cov}}(z_i, p_i)},$$
     which is the IV estimator.
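The covariance-ratio estimator above can be sketched on simulated data. This is an illustrative example only: the data-generating values, the normal shocks, and the way the supply shock is built from the instrument (`v = 1.5 * z + noise`) are assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical structural parameters.
alpha0, alpha1 = 2.0, -1.0   # demand
beta0, beta1 = 0.5, 0.5      # supply

z = rng.normal(size=n)             # supply shifter (e.g. weather), the instrument
u = rng.normal(size=n)             # demand shock, independent of z
v = 1.5 * z + rng.normal(size=n)   # supply shock, partly driven by z (assumption)

# Equilibrium price and quantity from solving (2) and (3).
p = (beta0 - alpha0 + v - u) / (alpha1 - beta1)
q = alpha0 + alpha1 * p + u


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


alpha1_ols = cov(p, q) / cov(p, p)   # biased toward the supply slope
alpha1_iv = cov(z, q) / cov(z, p)    # IV estimator: ratio of sample covariances

print(f"true alpha1 = {alpha1}, OLS = {alpha1_ols:.3f}, IV = {alpha1_iv:.3f}")
```

In this setup the IV estimate lands close to the demand slope while OLS does not, because only the $z$-driven variation in price is used.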

  11. Simultaneous Causality (continued)
     Another method to estimate $\alpha_1$, as suggested above, is to run the regression
     $$q_i = \alpha_0 + \alpha_1 \hat{p}_i + \tilde{u}_i,$$
     where $\hat{p}_i$ is the predicted value from the regression
     $$p_i = \gamma_0 + \gamma_1 z_i + \eta_i,$$
     and $\tilde{u}_i = \alpha_1 (p_i - \hat{p}_i) + u_i$. It is easy to show that $\mathrm{Cov}(\hat{p}_i, \tilde{u}_i) = 0$, so the estimator is consistent. Such a procedure is called two-stage least squares (2SLS) for an obvious reason.
     In this case, the IV estimator and the 2SLS estimator are numerically equivalent.
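The numerical equivalence of IV and 2SLS in this one-instrument case can be verified directly. The sketch below uses a small simulated sample; the parameter values and the helper `ols` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500  # small sample: the equivalence is exact, not asymptotic

# Hypothetical market, same structure as equations (2) and (3).
alpha0, alpha1, beta0, beta1 = 2.0, -1.0, 0.5, 0.5
z = rng.normal(size=n)             # instrument (supply shifter)
u = rng.normal(size=n)             # demand shock
v = 1.5 * z + rng.normal(size=n)   # supply shock (assumed to depend on z)
p = (beta0 - alpha0 + v - u) / (alpha1 - beta1)
q = alpha0 + alpha1 * p + u


def ols(y, x):
    """Return (intercept, slope) from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Stage 1: regress p on z and form fitted values p_hat.
gamma0, gamma1 = ols(p, z)
p_hat = gamma0 + gamma1 * z

# Stage 2: regress q on p_hat.
_, alpha1_2sls = ols(q, p_hat)

# IV estimator as a ratio of sample covariances.
alpha1_iv = np.cov(z, q)[0, 1] / np.cov(z, p)[0, 1]

print(f"2SLS = {alpha1_2sls:.6f}, IV = {alpha1_iv:.6f}")
```

The two numbers agree up to floating-point error because the second-stage slope is $\widehat{\mathrm{Cov}}(\hat{p}_i, q_i)/\widehat{\mathrm{Var}}(\hat{p}_i)$ and $\hat{p}_i$ is a linear function of $z_i$.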

  12. Omitted Variables
     Mundlak (1961) considered production function estimation, where the error term includes factors that are observable to the economic agent under study but unobservable to the econometrician. Endogeneity arises when regressors are decisions made by the agent on the basis of such factors.
     Suppose that a farmer produces a product with a Cobb-Douglas technology:
     $$Q_i = A_i \cdot (L_i)^{\phi_1} \cdot \exp(\nu_i), \qquad 0 < \phi_1 < 1, \tag{4}$$
     where $Q_i$ is the output on the $i$th farm, $L_i$ is a variable input (labor), $A_i$ represents an input that is fixed over time (soil quality), and $\nu_i$ represents a stochastic input (rainfall), which is not under the farmer's control.
     We shall assume that the farmer knows the product price $p$ and the input price $w$, which do not depend on his decisions, and that he knows $A_i$ but econometricians do not. The factor input decision is made before $\nu_i$ is known, so $L_i$ is chosen to maximize expected profits. The factor demand equation is
     $$L_i = \left(\frac{w}{p}\right)^{\frac{1}{\phi_1 - 1}} (A_i B \phi_1)^{\frac{1}{1 - \phi_1}}, \tag{5}$$
     so a better farm (a larger $A_i$) attracts more labor.
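One way to see where the factor demand (5) comes from is to write out the expected-profit problem. The sketch below assumes, as the slides do, that $L_i$ is chosen before $\nu_i$ is realized, that $A_i$ is independent of $\nu_i$, and that $B = E[\exp(\nu_i)]$.

```latex
\begin{align*}
\max_{L_i}\; & E\!\left[\, p\,A_i L_i^{\phi_1}\exp(\nu_i) - w L_i \;\middle|\; A_i \right]
   = p\,A_i B\,L_i^{\phi_1} - w L_i, \\
\text{first-order condition:}\quad & p\,\phi_1 A_i B\,L_i^{\phi_1 - 1} = w, \\
\Rightarrow\quad & L_i^{\phi_1 - 1} = \frac{w}{p}\,(A_i B \phi_1)^{-1}, \\
\Rightarrow\quad & L_i = \left(\frac{w}{p}\right)^{\frac{1}{\phi_1 - 1}} (A_i B \phi_1)^{\frac{1}{1 - \phi_1}}.
\end{align*}
```

Since $0 < \phi_1 < 1$, the objective is concave in $L_i$, so the first-order condition gives the maximum, and the exponent $1/(1 - \phi_1)$ is positive, which is why $L_i$ increases with $A_i$.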

  13. Omitted Variables (continued)
     We assume that $(A_i, \nu_i)$ is i.i.d. over farms and that $A_i$ is independent of $\nu_i$ for each $i$, so $B = E[\exp(\nu_i)]$ is the same for all $i$, and the level of output the farm expects when it chooses $L_i$ is $A_i \cdot (L_i)^{\phi_1} \cdot B$.
     Taking logarithms of both sides of (4) gives a log-linear production function:
     $$\log Q_i = \log A_i + \phi_1 \cdot \log(L_i) + \nu_i.$$
     Here $\log A_i$ is an omitted variable. Equivalently, each farm has a different intercept.
     The LSE of $\phi_1$ converges to
     $$\frac{\mathrm{Cov}(\log Q_i, \log(L_i))}{\mathrm{Var}(\log(L_i))} = \phi_1 + \frac{\mathrm{Cov}(\log A_i, \log(L_i))}{\mathrm{Var}(\log(L_i))},$$
     which is not $\phi_1$, since $\log A_i$ and $\log(L_i)$ are correlated, as shown in (5).
     Figure 2 shows the effect of $\log A_i$ on the LSE of $\phi_1$ by drawing $E[\log Q \mid \log L, \log A]$ for two farms. In Figure 2, the OLS regression line passes through points $A$ and $B$ with slope $\dfrac{\log Q_1 - \log Q_2}{\log L_1 - \log L_2}$, but the true $\phi_1$ is $\dfrac{D - C}{\log L_1 - \log L_2}$. Their difference is
     $$\frac{A - D}{\log L_1 - \log L_2} = \frac{\log A_1 - \log A_2}{\log L_1 - \log L_2},$$
     which is the bias introduced by the endogeneity of $\log A_i$.

  14. [Figure 2: Effect of Soil Quality on Labor Input. Axes: $\log L$ and $\log Q$; marked points $A$, $B$, $C$, $D$; labor levels $\log L_1$, $\log L_2$ and output levels $\log Q_1$, $\log Q_2$.]

  15. Omitted Variables (continued)
     Rigorously, let $u_i = \log(A_i) - E[\log(A_i)]$ and $\phi_0 = E[\log(A_i)]$. Then $E[u_i] = 0$ and $A_i = \exp(\phi_0 + u_i)$. Equations (4) and (5) can be written as
     $$\log Q_i = \phi_0 + \phi_1 \cdot \log(L_i) + \nu_i + u_i, \tag{6}$$
     $$\log L_i = \beta_0 + \frac{1}{1 - \phi_1} u_i, \tag{7}$$
     where
     $$\beta_0 = \frac{1}{1 - \phi_1}\left(\phi_0 + \log(B\phi_1) - \log\left(\frac{w}{p}\right)\right)$$
     is a constant for all farms.
     It is obvious that $\log L_i$ is correlated with $(\nu_i + u_i)$. Thus, the LSE of $\phi_1$ in the log-linear production function confounds the contribution of $u_i$ to output with the contribution of labor.
     Actually, $\hat{\phi}_{1,\mathrm{OLS}} \xrightarrow{p} 1$, because substituting (7) into (6) gives
     $$\log Q_i = \phi_0 - (1 - \phi_1)\beta_0 + 1 \cdot \log(L_i) + \nu_i.$$
     The lesson from this example is that a variable chosen by the agent taking into account some error component unobservable to the econometrician can induce endogeneity.
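The claim that the OLS slope converges to 1 regardless of the true $\phi_1$ can be illustrated by simulating (6) and (7). This is a minimal sketch: the parameter values and the normal distributions for $u_i$ and $\nu_i$ are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical parameter values.
phi0, phi1 = 1.0, 0.6    # E[log A_i] and the true labor elasticity
w_over_p = 0.5           # real input price w / p
sd_u, sd_nu = 0.5, 0.3   # std. dev. of soil-quality and rainfall shocks

u = rng.normal(0.0, sd_u, n)     # u_i = log A_i - E[log A_i] (assumed normal)
nu = rng.normal(0.0, sd_nu, n)   # rainfall shock, independent of u (assumed normal)
B = np.exp(sd_nu**2 / 2)         # E[exp(nu)] when nu is N(0, sd_nu^2)

# Equation (7): labor chosen by the farmer, who observes u but not nu.
beta0 = (phi0 + np.log(B * phi1) - np.log(w_over_p)) / (1 - phi1)
log_L = beta0 + u / (1 - phi1)

# Equation (6): realized log output.
log_Q = phi0 + phi1 * log_L + nu + u

# OLS slope of log Q on log L.
phi1_ols = np.cov(log_L, log_Q)[0, 1] / np.var(log_L, ddof=1)
print(f"true phi1 = {phi1}, OLS estimate = {phi1_ols:.3f}")
```

Changing `phi1` to any other value in $(0, 1)$ leaves the OLS estimate near 1, which matches the substitution of (7) into (6).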
