A Course in Applied Econometrics
Lecture 7: Cluster Sampling
Jeff Wooldridge
IRP Lectures, UW Madison, August 2008

1. The Linear Model with Cluster Effects
2. Estimation with a Small Number of Groups and Large Group Sizes
3. What if G and M_g are Both "Large"?
4. Nonlinear Models

1. The Linear Model with Cluster Effects

• For each group or cluster $g$, let $\{(y_{gm}, x_g, z_{gm}) : m = 1, \ldots, M_g\}$ be the observable data, where $M_g$ is the number of units in cluster $g$, $y_{gm}$ is a scalar response, $x_g$ is a $1 \times K$ vector containing explanatory variables that vary only at the group level, and $z_{gm}$ is a $1 \times L$ vector of covariates that vary within (as well as across) groups.

• The linear model with an additive error is

$$y_{gm} = \alpha + x_g\beta + z_{gm}\gamma + v_{gm} \qquad (1)$$

for $m = 1, \ldots, M_g$, $g = 1, \ldots, G$.

• Key questions:
(1) Are we primarily interested in $\beta$ or $\gamma$?
(2) Does $v_{gm}$ contain a common group effect, as in

$$v_{gm} = c_g + u_{gm}, \quad m = 1, \ldots, M_g, \qquad (2)$$

where $c_g$ is an unobserved group (cluster) effect and $u_{gm}$ is the idiosyncratic component?
(3) Are the regressors $(x_g, z_{gm})$ appropriately exogenous?
(4) How big are the group sizes ($M_g$) and the number of groups ($G$)?

• In the panel data setting, $G$ is the number of cross-sectional units and $M_g$ is the number of time periods for unit $g$.

• Easiest sampling scheme: from a large population of relatively small clusters, we draw a large number of clusters ($G$), where cluster $g$ has $M_g$ members. For example, sampling a large number of families, classrooms, or firms from a large population.

Large Group Asymptotics

• The theory with $G \to \infty$ and the group sizes $M_g$ fixed is well developed [White (1984), Arellano (1987)]. How should one use these methods? If

$$E(v_{gm} \mid x_g, z_{gm}) = 0, \qquad (3)$$

then the pooled OLS estimator from the regression of $y_{gm}$ on $1, x_g, z_{gm}$, $m = 1, \ldots, M_g$; $g = 1, \ldots, G$, is consistent for $\lambda \equiv (\alpha, \beta', \gamma')'$ (as $G \to \infty$ with $M_g$ fixed) and $\sqrt{G}$-asymptotically normal.
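To make the large-$G$ result concrete, here is a minimal simulation sketch, assuming NumPy, $K = L = 1$, equal group sizes, and hypothetical parameter values $(\alpha, \beta, \gamma) = (1, 0.5, 2)$. The cluster effect $c_g$ is drawn independently of $(x_g, z_{gm})$, so (3) holds, and pooled OLS should recover $\lambda$ even though the errors are correlated within clusters. The helpers `simulate_clusters` and `pooled_ols` are hypothetical names, not from the lecture.

```python
import numpy as np

def simulate_clusters(G, M, alpha=1.0, beta=0.5, gamma=2.0,
                      sigma_c=1.0, sigma_u=1.0, seed=0):
    """Hypothetical helper: draw G clusters of size M from (1)-(2), K = L = 1.

    Returns y (G*M,), the regressor matrix W = [1, x_g, z_gm], and cluster labels.
    """
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(G), M)          # cluster label of each row
    x = rng.normal(size=G)                       # group-level covariate x_g
    c = rng.normal(scale=sigma_c, size=G)        # cluster effect c_g, independent of x, z
    z = rng.normal(size=G * M)                   # unit-level covariate z_gm
    u = rng.normal(scale=sigma_u, size=G * M)    # idiosyncratic error u_gm
    y = alpha + x[groups] * beta + z * gamma + c[groups] + u
    W = np.column_stack([np.ones(G * M), x[groups], z])
    return y, W, groups

def pooled_ols(y, W):
    """Hypothetical helper: pooled OLS of y on W, ignoring the cluster structure."""
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    return coef

y, W, groups = simulate_clusters(G=2000, M=5)
print(pooled_ols(y, W))   # should be near (1.0, 0.5, 2.0) when G is large
```

Increasing `G` with `M` held fixed mirrors the asymptotics in the text; the point estimates tighten around the true values even though $v_{gm}$ is correlated within each cluster.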

• A robust variance matrix is needed to account for correlation within clusters or heteroskedasticity in $\mathrm{Var}(v_{gm} \mid x_g, z_{gm})$, or both. Write $W_g$ as the $M_g \times (1 + K + L)$ matrix of all regressors for group $g$. Then the $(1 + K + L) \times (1 + K + L)$ variance matrix estimator is

$$\left(\sum_{g=1}^{G} W_g'W_g\right)^{-1}\left(\sum_{g=1}^{G} W_g'\hat{v}_g\hat{v}_g'W_g\right)\left(\sum_{g=1}^{G} W_g'W_g\right)^{-1}, \qquad (4)$$

where $\hat{v}_g$ is the $M_g \times 1$ vector of pooled OLS residuals for group $g$. This "sandwich" estimator is now computed routinely using "cluster" options.

• Generalized Least Squares: strengthen the exogeneity assumption to

$$E(v_{gm} \mid x_g, Z_g) = 0, \quad m = 1, \ldots, M_g;\ g = 1, \ldots, G, \qquad (5)$$

where $Z_g$ is the $M_g \times L$ matrix of unit-specific covariates.

• Full RE approach: the $M_g \times M_g$ variance-covariance matrix of $v_g = (v_{g1}, v_{g2}, \ldots, v_{g,M_g})'$ has the "random effects" form,

$$\mathrm{Var}(v_g) = \sigma_c^2 j_{M_g} j_{M_g}' + \sigma_u^2 I_{M_g}, \qquad (6)$$

where $j_{M_g}$ is the $M_g \times 1$ vector of ones and $I_{M_g}$ is the $M_g \times M_g$ identity matrix.

• The usual assumptions include the "system homoskedasticity" assumption,

$$\mathrm{Var}(v_g \mid x_g, Z_g) = \mathrm{Var}(v_g). \qquad (7)$$

• The random effects estimator $\hat{\lambda}_{RE}$ is asymptotically more efficient than pooled OLS under (5), (6), and (7) as $G \to \infty$ with the $M_g$ fixed. The RE estimates and test statistics are computed routinely by popular software packages.

• An important point is often overlooked: one can, and in many cases should, make RE inference completely robust to an unknown form of $\mathrm{Var}(v_g \mid x_g, Z_g)$, whether we have a true cluster sample or panel data.

• Cluster sample example: random coefficient model,

$$y_{gm} = \alpha + x_g\beta + z_{gm}\gamma_g + v_{gm}. \qquad (8)$$

By estimating a standard random effects model that assumes common slopes $\gamma$, we effectively include $z_{gm}(\gamma_g - \gamma)$ in the idiosyncratic error.

• If only $\gamma$ is of interest, fixed effects is attractive. Namely, apply pooled OLS to the equation with group means removed:

$$y_{gm} - \bar{y}_g = (z_{gm} - \bar{z}_g)\gamma + u_{gm} - \bar{u}_g. \qquad (9)$$
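The sandwich formula (4) translates almost line for line into NumPy. This is a minimal sketch, assuming rows carry a cluster label `groups`; it omits the finite-sample degrees-of-freedom corrections that software "cluster" options usually apply, and `cluster_robust_ols` is a hypothetical name. The simulated data (group-level $x_g$, $\sigma_c^2 = \sigma_u^2 = 1$, $M_g = 20$) are also hypothetical. With a group-level regressor and a cluster effect in the error, the robust standard error for $\beta$ should be noticeably larger than the conventional OLS one.

```python
import numpy as np

def cluster_robust_ols(y, W, groups):
    """Hypothetical helper: pooled OLS and the cluster-robust sandwich (4).

    No degrees-of-freedom correction is applied (packages usually add one).
    """
    bread = np.linalg.inv(W.T @ W)                    # (sum_g W_g'W_g)^{-1}
    coef = bread @ (W.T @ y)                          # pooled OLS estimate
    resid = y - W @ coef                              # stacked v-hat_g
    k = W.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        s_g = W[idx].T @ resid[idx]                   # W_g' v-hat_g
        meat += np.outer(s_g, s_g)                    # W_g' v-hat_g v-hat_g' W_g
    return coef, bread @ meat @ bread

# Hypothetical data: group-level x_g, unit-level z_gm, v_gm = c_g + u_gm
rng = np.random.default_rng(1)
G, M = 200, 20
groups = np.repeat(np.arange(G), M)
x = rng.normal(size=G)[groups]
z = rng.normal(size=G * M)
v = rng.normal(size=G)[groups] + rng.normal(size=G * M)
y = 1.0 + 0.5 * x + 2.0 * z + v
W = np.column_stack([np.ones(G * M), x, z])

coef, V = cluster_robust_ols(y, W, groups)
resid = y - W @ coef
# Conventional (i.i.d.) OLS variance, for comparison only
V_naive = (resid @ resid / (len(y) - W.shape[1])) * np.linalg.inv(W.T @ W)
print("cluster-robust SE:", np.sqrt(np.diag(V)))
print("conventional SE:  ", np.sqrt(np.diag(V_naive)))
```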

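A sketch of the RE (GLS) estimator under (5) and (6), together with a fully robust sandwich variance of the kind the text recommends for RE inference, follows. It assumes NumPy, balanced clusters stored contiguously, and, to keep it short, treats $\sigma_c^2$ and $\sigma_u^2$ as known; in practice they are estimated (feasible GLS). Both function names and the simulated data are hypothetical.

```python
import numpy as np

def re_covariance(M, sigma2_c, sigma2_u):
    """Hypothetical helper: Var(v_g) in (6), sigma_c^2 j j' + sigma_u^2 I."""
    j = np.ones((M, 1))
    return sigma2_c * (j @ j.T) + sigma2_u * np.eye(M)

def re_gls_robust(y, W, G, M, sigma2_c, sigma2_u):
    """Hypothetical helper: RE (GLS) estimate of lambda and a fully robust variance.

    Assumes balanced clusters of size M with rows stored cluster by cluster,
    and treats the variance components as known (a simplification).
    """
    Wg = W.reshape(G, M, -1)                          # (G, M, k)
    yg = y.reshape(G, M)
    Oinv = np.linalg.inv(re_covariance(M, sigma2_c, sigma2_u))
    A = np.einsum("gmk,mn,gnl->kl", Wg, Oinv, Wg)     # sum_g W_g' Omega^-1 W_g
    b = np.einsum("gmk,mn,gn->k", Wg, Oinv, yg)       # sum_g W_g' Omega^-1 y_g
    coef = np.linalg.solve(A, b)
    resid = yg - Wg @ coef                            # (G, M) GLS residuals
    scores = np.einsum("gmk,mn,gn->gk", Wg, Oinv, resid)  # W_g' Omega^-1 v-hat_g
    A_inv = np.linalg.inv(A)                          # nonrobust variance under (7)
    return coef, A_inv @ (scores.T @ scores) @ A_inv

# Hypothetical balanced cluster sample with sigma_c^2 = sigma_u^2 = 1
rng = np.random.default_rng(2)
G, M = 500, 10
groups = np.repeat(np.arange(G), M)
x = rng.normal(size=G)[groups]
z = rng.normal(size=G * M)
y = 1.0 + 0.5 * x + 2.0 * z + rng.normal(size=G)[groups] + rng.normal(size=G * M)
W = np.column_stack([np.ones(G * M), x, z])

coef, V = re_gls_robust(y, W, G, M, sigma2_c=1.0, sigma2_u=1.0)
print(coef, np.sqrt(np.diag(V)))
```

Under (7) the inverse of `A` alone is the usual RE variance; the sandwich version stays valid if the RE structure (6) is wrong, which is the robustness point made above.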
• It is often important to allow $\mathrm{Var}(u_g \mid Z_g)$ to have an arbitrary form, including within-group correlation and heteroskedasticity. We certainly should for panel data (serial correlation), but also for cluster sampling. From the linear panel data notes, FE can consistently estimate the average effect in the random coefficient case. But $(z_{gm} - \bar{z}_g)(\gamma_g - \gamma)$ appears in the error term.

• A fully robust variance matrix estimator of $\hat{\gamma}_{FE}$ is

$$\left(\sum_{g=1}^{G} \ddot{Z}_g'\ddot{Z}_g\right)^{-1}\left(\sum_{g=1}^{G} \ddot{Z}_g'\ddot{u}_g\ddot{u}_g'\ddot{Z}_g\right)\left(\sum_{g=1}^{G} \ddot{Z}_g'\ddot{Z}_g\right)^{-1}, \qquad (10)$$

where $\ddot{Z}_g$ is the matrix of within-group deviations from means and $\ddot{u}_g$ is the $M_g \times 1$ vector of fixed effects residuals. This estimator is justified with large-$G$ asymptotics.

• The results above are for "one-way clustering." Cameron, Gelbach, and Miller (2006) have shown how to extend the formulas to multi-way clustering. For example, suppose we have individual-level data with industry and occupation representing different clusters. Then we have $y_{ghm}$ for $g = 1, \ldots, G$, $h = 1, \ldots, H$, $m = 1, \ldots, M_{gh}$. An individual belongs to two clusters, implying some correlation across groups: correlation across occupational groups occurs because some individuals in different occupations (indexed by $g$) are in the same industry (indexed by $h$).

• If explanatory variables vary by individual, two-way fixed effects is attractive and often eliminates the need for cluster-robust inference.

Should We Use the "Large" $G$ Formulas with "Large" $M_g$?

• What if one applies robust inference in scenarios where the fixed-$M_g$, $G \to \infty$ asymptotic analysis is not realistic? We can apply recent results of Hansen (2007) to various scenarios.

• Hansen (2007, Theorem 2) shows that, with $G$ and $M_g$ both getting large, the usual inference based on the robust "sandwich" estimator is valid with arbitrary correlation among the errors $v_{gm}$ within each group (but still independence across groups). For example, if we have a sample of $G = 100$ schools and roughly $M_g = 100$ students per school, and we use pooled OLS leaving the school effects in the error term, we should expect the inference to have roughly the correct size.
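The within transformation (9) and the robust variance (10) can be sketched as follows, assuming NumPy; `fe_within_robust` is a hypothetical name. The hypothetical data follow the random coefficient model (8) with $\gamma_g = 2 + d_g$ and no group-level regressor (FE would sweep it out anyway), and $c_g$ is deliberately correlated with $z_{gm}$. FE should still center on the average slope of 2, while a pooled regression of $y_{gm}$ on $z_{gm}$ that leaves $c_g$ in the error should not.

```python
import numpy as np

def fe_within_robust(y, Z, groups):
    """Hypothetical helper: FE (within) estimate of gamma from (9) and variance (10).

    y: (N,) outcomes; Z: (N, L) unit-level covariates; groups: (N,) cluster labels.
    """
    _, inv = np.unique(groups, return_inverse=True)   # map labels to 0..G-1
    counts = np.bincount(inv)
    y_bar = np.bincount(inv, weights=y) / counts
    Z_bar = np.column_stack(
        [np.bincount(inv, weights=Z[:, j]) / counts for j in range(Z.shape[1])]
    )
    y_dd = y - y_bar[inv]                 # y_gm - ybar_g
    Z_dd = Z - Z_bar[inv]                 # stacked rows of Z-double-dot_g
    bread = np.linalg.inv(Z_dd.T @ Z_dd)
    coef = bread @ (Z_dd.T @ y_dd)
    u_dd = y_dd - Z_dd @ coef             # FE residuals u-double-dot
    meat = np.zeros((Z.shape[1], Z.shape[1]))
    for g in range(len(counts)):
        idx = inv == g
        s_g = Z_dd[idx].T @ u_dd[idx]     # Z-double-dot_g' u-double-dot_g
        meat += np.outer(s_g, s_g)
    return coef, bread @ meat @ bread

# Hypothetical data from the random coefficient model (8)
rng = np.random.default_rng(4)
G, M = 400, 25
groups = np.repeat(np.arange(G), M)
a = rng.normal(size=G)
z = a[groups] + rng.normal(size=G * M)            # z_gm has a group component
c = 2.0 * a + rng.normal(size=G)                  # c_g correlated with z_gm
gamma_g = 2.0 + 0.5 * rng.normal(size=G)          # random slopes with mean 2
y = 1.0 + z * gamma_g[groups] + c[groups] + rng.normal(size=G * M)

coef, V = fe_within_robust(y, z[:, None], groups)
pooled_slope = np.polyfit(z, y, 1)[0]             # leaves c_g in the error
print("FE:", coef, "robust SE:", np.sqrt(np.diag(V)), "pooled:", pooled_slope)
```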

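The two-way extension of Cameron, Gelbach, and Miller can be sketched as the sum of the two one-way "meat" matrices minus the meat for clusters defined by the intersection (occupation, industry). This is a minimal illustration assuming NumPy, pooled OLS, nonnegative integer cluster labels, and no small-sample corrections; the function names and data are hypothetical. In finite samples the resulting matrix need not be positive semidefinite.

```python
import numpy as np

def cluster_meat(X, resid, labels):
    """Hypothetical helper: sum over clusters c of (X_c' e_c)(X_c' e_c)'."""
    k = X.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(labels):
        idx = labels == c
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    return meat

def twoway_cluster_vcov(y, X, g, h):
    """Hypothetical helper: pooled OLS with two-way clustering, V_g + V_h - V_gh.

    g and h are nonnegative integer cluster labels (e.g., occupation, industry).
    """
    bread = np.linalg.inv(X.T @ X)
    coef = bread @ (X.T @ y)
    resid = y - X @ coef
    gh = g * (h.max() + 1) + h        # one label per (g, h) intersection cell
    meat = (cluster_meat(X, resid, g) + cluster_meat(X, resid, h)
            - cluster_meat(X, resid, gh))
    return coef, bread @ meat @ bread

# Hypothetical individual-level data with occupation (g) and industry (h) effects
rng = np.random.default_rng(5)
n, G, H = 5000, 40, 30
g = rng.integers(0, G, size=n)
h = rng.integers(0, H, size=n)
x = rng.normal(size=G)[g] + rng.normal(size=H)[h] + rng.normal(size=n)
e = rng.normal(size=G)[g] + rng.normal(size=H)[h] + rng.normal(size=n)
y = 1.0 + 0.5 * x + e
X = np.column_stack([np.ones(n), x])

coef, V = twoway_cluster_vcov(y, X, g, h)
print(coef, np.diag(V))
```

A quick sanity check: passing the same labels for both dimensions collapses the formula to ordinary one-way clustering, since the intersection clusters then coincide with the one-way clusters.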
• Unfortunately, in the presence of cluster effects with a small number of groups ($G$) and large group sizes ($M_g$), cluster-robust inference with pooled OLS falls outside Hansen's theoretical findings. We should not expect good properties of cluster-robust inference with a small number of groups and large group sizes.

• Example: suppose $G = 10$ hospitals have been sampled with several hundred patients per hospital. If the explanatory variable of interest varies only at the hospital level, it is tempting to use pooled OLS with cluster-robust inference. But we have no theoretical justification for doing so, and there are reasons to expect it will not work well. (Section 2 below considers alternatives.)

• If the explanatory variables of interest vary within group, FE is attractive. First, it allows $c_g$ to be arbitrarily correlated with the $z_{gm}$. Second, with large $M_g$, we can treat the $c_g$ as parameters to estimate (because we can estimate them precisely) and then assume that the observations are independent across $m$ (as well as $g$). This means that the usual inference is valid, perhaps with an adjustment for heteroskedasticity. The fixed-$G$, large-$M_g$ results in Hansen (2007, Theorem 4) for cluster-robust inference apply, but are likely to be very costly: the usual variance matrix is multiplied by $G/(G-1)$ and the $t$ statistics are approximately distributed as $t_{G-1}$ (not standard normal).

• For panel data applications, Hansen's (2007) results, particularly Theorem 3, imply that cluster-robust inference for the fixed effects estimator should work well when the cross-section ($N$) and time-series ($T$) dimensions are similar and not too small. If full time effects are allowed in addition to unit-specific fixed effects (as they often should be), then the asymptotics must be with $N$ and $T$ both getting large. In this case, the idiosyncratic errors are assumed to be weakly dependent over time. The simulations in Bertrand, Duflo, and Mullainathan (2004) and Hansen (2007) verify that the fully robust (cluster-robust) variance matrix works well when $N$ and $T$ are about 50 and the idiosyncratic errors follow a stable AR(1) model.

2. Estimation with Few Groups and Large Group Sizes

• When $G$ is small and each $M_g$ is large, we probably have a different sampling scheme: large random samples are drawn from different segments of a population. Except for the relative dimensions of $G$ and $M_g$, the resulting data set is essentially indistinguishable from a data set obtained by sampling entire clusters.

• The problem of proper inference when $M_g$ is large relative to $G$, the "Moulton (1990) problem," has recently been studied by Donald and Lang (2007). DL treat the parameters associated with the different groups as outcomes of random draws.
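The fixed-$G$ adjustment is mechanical. A minimal sketch, assuming NumPy and SciPy (`scipy.stats.t.ppf`), with a hypothetical helper name and the assumption that `V` is already a cluster-robust variance matrix: scale it by $G/(G-1)$ and compare the $t$ statistic with a $t_{G-1}$ critical value. The loop shows why this is costly when $G$ is small; with $G = 10$ the two-sided 5% critical value is about 2.26 rather than 1.96.

```python
import numpy as np
from scipy import stats

def fixed_g_inference(coef, V, j, G, level=0.95):
    """Hypothetical helper: t statistic and critical value for H0: coef[j] = 0.

    V is assumed to be a cluster-robust variance matrix; it is scaled by
    G/(G-1) and the t statistic is referred to t_{G-1}, not the standard normal.
    """
    se = np.sqrt(V[j, j] * G / (G - 1))
    t_stat = coef[j] / se
    crit = stats.t.ppf(0.5 + level / 2.0, df=G - 1)
    return t_stat, crit

# How far the t_{G-1} critical value sits above 1.96 for a few values of G
for G in (5, 10, 50, 1000):
    print(G, round(float(stats.t.ppf(0.975, df=G - 1)), 3))
```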

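The size of the Moulton problem can be gauged with the standard Moulton variance-inflation approximation. For a regressor that is constant within groups, equal group sizes $M$, and intraclass error correlation $\rho = \sigma_c^2/(\sigma_c^2 + \sigma_u^2)$ implied by (2), the true sampling variance of the OLS coefficient is roughly $1 + (M-1)\rho$ times what the conventional OLS formula reports. This minimal sketch uses hypothetical helper names and hypothetical values loosely echoing the hospital example; it shows that even a small $\rho$ matters when $M_g$ is in the hundreds.

```python
import math

def intraclass_corr(sigma2_c, sigma2_u):
    """Hypothetical helper: rho implied by v_gm = c_g + u_gm."""
    return sigma2_c / (sigma2_c + sigma2_u)

def moulton_factor(M, rho):
    """Hypothetical helper: approximate true/conventional OLS variance ratio
    for a group-level regressor with equal group sizes M: 1 + (M - 1) * rho."""
    return 1.0 + (M - 1) * rho

# Hypothetical: 300 patients per hospital, c_g accounts for 5% of Var(v_gm)
rho = intraclass_corr(0.05, 0.95)
factor = moulton_factor(300, rho)
print(f"variance x{factor:.2f}, standard error x{math.sqrt(factor):.2f}")
```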