Clustering as a Design Problem
Alberto Abadie, Susan Athey, Guido Imbens, & Jeffrey Wooldridge
CEMMAP London, April 15, 2016
• Adjusting standard errors for clustering is common in empirical work.
• The motivation is not always clear.
• The implementation is not always clear.
• We present a coherent framework for thinking about clustering that clarifies when and how to adjust for clustering.
• Currently, mostly exact calculations in simple cases.
• Clarifies the role of large-number-of-clusters asymptotics.
NOT about small-sample issues (either a small number of clusters or a small number of units), and NOT about serial correlation issues. (Important, but not key to the issues discussed here.)
Setup
Data on $(Y_i, D_i, G_i)$, $i = 1, \ldots, N$:
• $Y_i$ is the outcome.
• $D_i$ is the regressor; we mainly focus on the special case $D_i \in \{-1, 1\}$ (to allow for exact results).
• $G_i \in \{1, \ldots, G\}$ is the group/cluster indicator.
Estimate the regression function
$$Y_i = \alpha + \tau \cdot D_i + \varepsilon_i = X_i'\beta + \varepsilon_i, \qquad X_i' = (1, D_i), \qquad \beta' = (\alpha, \tau).$$
Least squares estimator (not generalized least squares):
$$\hat\beta = (\hat\alpha, \hat\tau)' = \arg\min_{\alpha, \tau} \sum_{i=1}^{N} \left( Y_i - \alpha - \tau \cdot D_i \right)^2$$
Residuals: $\hat\varepsilon_i = Y_i - \hat\alpha - \hat\tau \cdot D_i$.
The focus of the paper is on the properties of $\hat\tau$:
• What is the variance of $\hat\tau$?
• How do we estimate the variance of $\hat\tau$?
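A minimal plain-Python sketch of this least squares fit, assuming a single binary regressor coded $D_i \in \{-1, 1\}$. It solves the 2×2 normal equations $(X'X)\hat\beta = X'Y$ directly; the function name `ols_binary` is hypothetical, not from the paper.

```python
def ols_binary(y, d):
    """OLS of y on X_i = (1, d_i); returns (alpha_hat, tau_hat, residuals)."""
    n = len(y)
    # Entries of X'X = [[n, sum d], [sum d, sum d^2]] and X'Y = [sum y, sum d*y]
    s_d = sum(d)
    s_dd = sum(di * di for di in d)
    s_y = sum(y)
    s_dy = sum(di * yi for di, yi in zip(d, y))
    det = n * s_dd - s_d * s_d
    # Closed-form solution of the 2x2 normal equations
    alpha = (s_dd * s_y - s_d * s_dy) / det
    tau = (n * s_dy - s_d * s_y) / det
    resid = [yi - alpha - tau * di for yi, di in zip(y, d)]
    return alpha, tau, resid
```

For example, `ols_binary([1.0, 2.0, 3.0, 5.0], [-1, -1, 1, 1])` gives $\hat\alpha = 2.75$ and $\hat\tau = 1.25$, which is half the difference between the group means 4 and 1.5.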
Standard Textbook Approach
View $D$ and $G$ as fixed and assume $\varepsilon \sim N(0, \Omega)$, with $\Omega$ block diagonal, corresponding to clusters:
$$\Omega = \begin{pmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega_G \end{pmatrix}$$
Variance estimators differ in their assumptions on $\Omega_g$:
• diagonal (robust, Eicker-Huber-White),
• unrestricted (cluster, Liang-Zeger/Stata),
• constant off-diagonal (Moulton/Kloek).
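Common Variance Estimators (normalized by sample size)
Eicker-Huber-White, standard robust variance (zero error covariance):
$$\hat V_{\text{robust}} = N \left( \sum_{i=1}^{N} X_i X_i' \right)^{-1} \left( \sum_{i=1}^{N} \hat\varepsilon_i^2 X_i X_i' \right) \left( \sum_{i=1}^{N} X_i X_i' \right)^{-1}$$
Liang-Zeger, Stata, standard clustering adjustment (unrestricted within-cluster covariance matrix):
$$\hat V_{\text{cluster}} = N \left( \sum_{i=1}^{N} X_i X_i' \right)^{-1} \sum_{g=1}^{G} \left( \sum_{i: G_i = g} X_i \hat\varepsilon_i \right) \left( \sum_{i: G_i = g} X_i \hat\varepsilon_i \right)' \left( \sum_{i=1}^{N} X_i X_i' \right)^{-1}$$
Moulton/Kloek (constant covariance within clusters):
$$\hat V_{\text{moulton}} = \hat V_{\text{robust}} \cdot \left( 1 + \rho_\varepsilon \cdot \rho_D \cdot \frac{N}{G} \right)$$
where $\rho_\varepsilon$ and $\rho_D$ are the within-cluster correlations of $\hat\varepsilon$ and $D$.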
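A plain-Python sketch of $\hat V_{\text{robust}}$ and $\hat V_{\text{cluster}}$ for $\hat\tau$ (the lower-right entry of each matrix), following the formulas above literally, with no small-sample scaling factors. Software implementations often apply such factors, so their output may differ slightly. The Moulton helper takes $\rho_\varepsilon$ and $\rho_D$ as inputs rather than estimating them. All function names are hypothetical.

```python
def _inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def _sandwich_22(bread, meat):
    """(2,2) entry of bread @ meat @ bread, i.e. the tau-tau element."""
    return sum(bread[1][a] * meat[a][b] * bread[b][1] for a in range(2) for b in range(2))

def variance_estimators(y, d, g):
    """Return (V_robust, V_cluster) for tau_hat, each multiplied by N."""
    n = len(y)
    x = [(1.0, float(di)) for di in d]
    xx = [[sum(xi[a] * xi[b] for xi in x) for b in range(2)] for a in range(2)]
    bread = _inv2(xx)
    # OLS fit: beta_hat = (X'X)^{-1} X'Y, then residuals
    xy = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(2)]
    beta = [sum(bread[a][b] * xy[b] for b in range(2)) for a in range(2)]
    e = [yi - beta[0] * xi[0] - beta[1] * xi[1] for xi, yi in zip(x, y)]
    # Robust meat: sum_i e_i^2 X_i X_i'
    meat_r = [[sum(ei ** 2 * xi[a] * xi[b] for xi, ei in zip(x, e)) for b in range(2)]
              for a in range(2)]
    # Cluster meat: sum_g s_g s_g', with s_g = sum over i in cluster g of X_i e_i
    scores = {}
    for xi, ei, gi in zip(x, e, g):
        s = scores.setdefault(gi, [0.0, 0.0])
        s[0] += xi[0] * ei
        s[1] += xi[1] * ei
    meat_c = [[sum(s[a] * s[b] for s in scores.values()) for b in range(2)]
              for a in range(2)]
    return n * _sandwich_22(bread, meat_r), n * _sandwich_22(bread, meat_c)

def v_moulton(v_robust, rho_eps, rho_d, n, n_clusters):
    """Moulton/Kloek inflation of the robust variance; the rho's are supplied by the caller."""
    return v_robust * (1 + rho_eps * rho_d * n / n_clusters)
```

With singleton clusters the two meat matrices coincide, so $\hat V_{\text{cluster}} = \hat V_{\text{robust}}$; grouping observations changes only the cluster version.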
Related Literature
• Clustering: Moulton (1986, 1987, 1990), Kloek (1981), Hansen (2007), Cameron & Miller (2015), Angrist & Pischke (2008), Liang and Zeger (1986), Wooldridge (2010), Donald and Lang (2007), Bertrand, Duflo, and Mullainathan (2004)
• Sample Design: Kish (1965)
• Causal Literature: Neyman (1935, 1990), Rubin (1976, 2006), Rosenbaum (2000), Imbens and Rubin (2015)
• Experimental Design: Murray (1998), Donner and Klar (2000)
• Finite Population Issues: Abadie, Athey, Imbens, and Wooldridge (2014)
Views from the Literature
• "The clustering problem is caused by the presence of a common unobserved random shock at the group level that will lead to correlation between all observations within each group" (Hansen, p. 671)
• "The consensus is to be conservative and avoid bias and to use bigger and more aggregate clusters when possible, up to and including the point at which there is concern about having too few clusters." (Cameron and Miller, p. 333)
• Clustering does not matter when the regressors are not correlated within clusters.
• Use $\hat V_{\text{cluster}}$ when in doubt.
Questions
1. Is there any harm in using $\hat V_{\text{cluster}}$ when $\hat V_{\text{robust}}$ is valid?
2. Can we infer from the data whether $\hat V_{\text{cluster}}$ or $\hat V_{\text{robust}}$ is appropriate?
3. When are $\hat V_{\text{cluster}}$, $\hat V_{\text{robust}}$, or $\hat V_{\text{moulton}}$ appropriate?
4. Is $\hat V_{\text{cluster}}$ superior to $\hat V_{\text{robust}}$ in large samples?
5. What is the role of within-cluster correlation of regressors?
We develop a framework within which these questions can be answered. Key features:
• Specify the population and the estimand.
• Specify the data-generating process.
Answers
1. Is there any harm in using $\hat V_{\text{cluster}}$ when $\hat V_{\text{robust}}$ is valid? YES
2. Can we infer from the data whether $\hat V_{\text{cluster}}$ or $\hat V_{\text{robust}}$ is appropriate? NO
3. When are $\hat V_{\text{cluster}}$ or $\hat V_{\text{robust}}$ appropriate? DEPENDS ON DESIGN
4. Is $\hat V_{\text{cluster}}$ superior to $\hat V_{\text{robust}}$ in large samples? DEPENDS ON DESIGN
5. What is the role of within-cluster correlation of regressors? DEPENDS ON DESIGN
First, Define the Population and Estimand
Population of size $M$. The population is partitioned into $G$ groups/clusters. The population size in cluster $g$ is $M_g$; here $M_g = M/G$ for all clusters, for convenience.
• $G_i \in \{1, \ldots, G\}$ is the group/cluster indicator.
• $M$ may be large/infinite, $G$ may be large/infinite, $M_g$ may be large/infinite.
• $R_i \in \{0, 1\}$ is the sampling indicator; $\sum_{i=1}^{M} R_i = N$ is the sample size.
1. Descriptive Setting: outcome $Y_i$
The estimand is the population average
$$\theta^* = \frac{1}{M} \sum_{i=1}^{M} Y_i$$
The estimator is the sample average
$$\hat\theta = \frac{1}{N} \sum_{i=1}^{M} R_i \cdot Y_i$$
2. Causal Setting: potential outcomes $Y_i(-1)$, $Y_i(1)$, treatment $D_i \in \{-1, 1\}$, realized outcome $Y_i = Y_i(D_i)$.
The estimand is 0.5 times the average treatment effect (this makes the estimand equal to the limit of the regression coefficient, which simplifies later calculations, but is not essential):
$$\theta^* = \frac{1}{M} \sum_{i=1}^{M} \frac{Y_i(1) - Y_i(-1)}{2}$$
The estimator is
$$\hat\theta = \frac{\sum_{i=1}^{M} R_i \cdot Y_i \cdot (D_i - \bar D)}{\sum_{i=1}^{M} R_i \cdot (D_i - \bar D)^2}, \qquad \text{where } \bar D = \frac{\sum_{i=1}^{M} R_i \cdot D_i}{\sum_{i=1}^{M} R_i}$$
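With $D_i \in \{-1, 1\}$, this regression coefficient equals half the difference between the sampled treated and control means, for any treated share, which is why the estimand is scaled by 0.5. A small plain-Python check; the function names and the toy numbers are illustrative assumptions.

```python
def theta_hat(y, d, r):
    """Regression coefficient of Y on D among sampled units (R_i = 1)."""
    ys = [yi for yi, ri in zip(y, r) if ri == 1]
    ds = [di for di, ri in zip(d, r) if ri == 1]
    d_bar = sum(ds) / len(ds)
    num = sum(yi * (di - d_bar) for yi, di in zip(ys, ds))
    den = sum((di - d_bar) ** 2 for di in ds)
    return num / den

def half_diff_in_means(y, d, r):
    """(mean of sampled treated units - mean of sampled controls) / 2."""
    treated = [yi for yi, di, ri in zip(y, d, r) if ri == 1 and di == 1]
    control = [yi for yi, di, ri in zip(y, d, r) if ri == 1 and di == -1]
    return (sum(treated) / len(treated) - sum(control) / len(control)) / 2

# Toy data: unit 5 is not sampled, leaving 2 treated and 3 control units.
y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.0]
d = [1, -1, 1, -1, 1, -1]
r = [1, 1, 1, 1, 0, 1]
```

Here the treated mean is 3.5 and the control mean is 1.5, so both functions return 1.0 (up to floating-point rounding), even though the sampled design is unbalanced.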
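Descriptive Setting: Population Definitions
$$\bar Y_M = \frac{1}{M} \sum_{i=1}^{M} Y_i, \qquad \bar Y_{M,g} = \frac{G}{M} \sum_{i: G_i = g} Y_i$$
$$\sigma_g^2 = \frac{1}{M_g - 1} \sum_{i: G_i = g} \left( Y_i - \bar Y_{M,g} \right)^2, \qquad \sigma_{\text{cond}}^2 = \frac{1}{G} \sum_{g=1}^{G} \sigma_g^2$$
$$\sigma_{\text{cluster}}^2 = \frac{1}{G - 1} \sum_{g=1}^{G} \left( \bar Y_{M,g} - \bar Y_M \right)^2$$
$$\sigma^2 = \frac{1}{M - 1} \sum_{i=1}^{M} \left( Y_i - \bar Y_M \right)^2 \approx \sigma_{\text{cluster}}^2 + \sigma_{\text{cond}}^2$$
$$\rho = \frac{G}{M (M - G)} \sum_{i \neq j,\; G_i = G_j} \frac{(Y_i - \bar Y_M)(Y_j - \bar Y_M)}{\sigma^2} \approx \frac{\sigma_{\text{cluster}}^2}{\sigma_{\text{cluster}}^2 + \sigma_{\text{cond}}^2}$$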
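A plain-Python sketch computing these population quantities for a simulated population with equal cluster sizes. The simulation design (a standard normal cluster shock plus standard normal unit noise, 50 clusters of 40 units) is an assumption for illustration. The approximation $\sigma^2 \approx \sigma^2_{\text{cluster}} + \sigma^2_{\text{cond}}$ comes from the exact decomposition $(M-1)\sigma^2 = \frac{M}{G}(G-1)\sigma^2_{\text{cluster}} + (M-G)\sigma^2_{\text{cond}}$; both weights, divided by $M - 1$, approach one when $G$ and $M/G$ are large.

```python
import random

def population_moments(y, g):
    """Population quantities of the descriptive setting; assumes equal cluster sizes."""
    m = len(y)
    groups = {}
    for yi, gi in zip(y, g):
        groups.setdefault(gi, []).append(yi)
    n_clusters = len(groups)
    y_bar = sum(y) / m
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    sigma2_g = [sum((yi - means[k]) ** 2 for yi in v) / (len(v) - 1)
                for k, v in groups.items()]
    sigma2_cluster = sum((mu - y_bar) ** 2 for mu in means.values()) / (n_clusters - 1)
    sigma2_cond = sum(sigma2_g) / n_clusters
    sigma2 = sum((yi - y_bar) ** 2 for yi in y) / (m - 1)
    # Sum over ordered pairs i != j in the same cluster of (Y_i - Ybar)(Y_j - Ybar):
    # per cluster, (sum of deviations)^2 minus the sum of squared deviations.
    pair_sum = 0.0
    for v in groups.values():
        dev = [yi - y_bar for yi in v]
        pair_sum += sum(dev) ** 2 - sum(x * x for x in dev)
    # There are M(M - G)/G such ordered pairs when all clusters have size M/G.
    rho = n_clusters * pair_sum / (m * (m - n_clusters) * sigma2)
    return {"sigma2": sigma2, "sigma2_cluster": sigma2_cluster,
            "sigma2_cond": sigma2_cond, "rho": rho}

# Assumed population: G = 50 clusters of 40 units, Y = cluster shock + unit noise.
rng = random.Random(0)
G, M_G = 50, 40
y, g = [], []
for k in range(G):
    shock = rng.gauss(0.0, 1.0)
    for _ in range(M_G):
        y.append(shock + rng.gauss(0.0, 1.0))
        g.append(k)
moments = population_moments(y, g)
```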
The estimator is
$$\hat\theta = \frac{1}{N} \sum_{i=1}^{M} R_i \cdot Y_i$$
• (Random sampling) Suppose sampling is completely random:
$$\Pr(R = r) = \binom{M}{N}^{-1} \quad \forall r \text{ s.t. } \sum_{i=1}^{M} r_i = N.$$
Exact variance, normalized by sample size:
$$N \cdot V(\hat\theta \mid \text{RS}) = \sigma^2 \cdot \left( 1 - \frac{N}{M} \right) \approx \sigma^2$$
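The exact variance formula can be checked by brute force on a tiny population: enumerate all $\binom{M}{N}$ equally likely samples and compute the variance of $\hat\theta$ with exact rational arithmetic. The six outcome values are arbitrary toy numbers.

```python
from fractions import Fraction
from itertools import combinations

def exact_rs_variance(y, n):
    """Exact V(theta_hat | RS): every size-n subset of the population is equally likely."""
    means = [sum(s) / n for s in combinations(y, n)]
    mu = sum(means) / len(means)
    return sum((t - mu) ** 2 for t in means) / len(means)

y = [Fraction(v) for v in (2, 3, 5, 7, 11, 13)]
M, N = len(y), 3
y_bar = sum(y) / M
sigma2 = sum((v - y_bar) ** 2 for v in y) / (M - 1)

lhs = N * exact_rs_variance(y, N)      # N * V(theta_hat | RS), by enumeration
rhs = sigma2 * (1 - Fraction(N, M))    # sigma^2 * (1 - N/M), from the formula
```

Because `Fraction` arithmetic is exact, `lhs == rhs` holds with equality rather than up to rounding.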
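What do the variance estimators give us here?
$$E\left[ \hat V_{\text{robust}} \mid \text{RS} \right] \approx \sigma^2$$
$$E\left[ \hat V_{\text{cluster}} \mid \text{RS} \right] \approx \sigma^2 \cdot \left( 1 + \rho \cdot \left( \frac{N}{G} - 1 \right) \right) \approx \sigma_{\text{cluster}}^2 \cdot \frac{N}{G} + \sigma_{\text{cond}}^2$$
• Adjusting the standard errors for clustering can make a big difference here: when $N/G$ is large and $\rho > 0$, $\hat V_{\text{cluster}}$ is much larger than $\hat V_{\text{robust}}$.
• Adjusting the standard errors for clustering is wrong here.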
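A small Monte Carlo sketch of this point under an assumed design (20 clusters of 500 units, a standard normal cluster shock plus standard normal unit noise; none of these numbers come from the slides). For the sample mean the regression is on a constant only, so $\hat V_{\text{robust}} = \frac{1}{N}\sum_i \hat\varepsilon_i^2$ and $\hat V_{\text{cluster}} = \frac{1}{N}\sum_g \big(\sum_{i: G_i = g} \hat\varepsilon_i\big)^2$. Averaged over repeated random samples, $\hat V_{\text{robust}}$ should land near $\sigma^2$, close to the actual $N \cdot V(\hat\theta \mid \text{RS}) = \sigma^2(1 - N/M)$, while $\hat V_{\text{cluster}}$ should be many times larger.

```python
import random

def sample_mean_estimates(ys, gs):
    """Sample mean plus N-normalized V_robust and V_cluster for a regression on a constant."""
    n = len(ys)
    y_bar = sum(ys) / n
    e = [yi - y_bar for yi in ys]
    v_robust = sum(ei * ei for ei in e) / n
    cluster_sums = {}
    for ei, gi in zip(e, gs):
        cluster_sums[gi] = cluster_sums.get(gi, 0.0) + ei
    v_cluster = sum(s * s for s in cluster_sums.values()) / n
    return y_bar, v_robust, v_cluster

# Assumed population: 20 clusters of 500 units, Y = cluster shock + unit noise.
rng = random.Random(1)
G, M_G, N, REPS = 20, 500, 1000, 200
pop_y, pop_g = [], []
for k in range(G):
    shock = rng.gauss(0.0, 1.0)
    for _ in range(M_G):
        pop_y.append(shock + rng.gauss(0.0, 1.0))
        pop_g.append(k)
M = len(pop_y)
pop_mean = sum(pop_y) / M
sigma2 = sum((v - pop_mean) ** 2 for v in pop_y) / (M - 1)

# Repeated completely random samples of N units: the design itself involves no clustering.
draws = []
for _ in range(REPS):
    idx = rng.sample(range(M), N)
    draws.append(sample_mean_estimates([pop_y[i] for i in idx], [pop_g[i] for i in idx]))

thetas = [t for t, _, _ in draws]
theta_bar = sum(thetas) / REPS
n_var_actual = N * sum((t - theta_bar) ** 2 for t in thetas) / (REPS - 1)
avg_v_robust = sum(vr for _, vr, _ in draws) / REPS
avg_v_cluster = sum(vc for _, _, vc in draws) / REPS
```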
Why is the cluster variance wrong here?
Implicitly, the cluster variance takes as the estimand the average outcome in a super-population with a large number of clusters. The set of clusters that we see in the sample is viewed as just a small subset of that large population of clusters. In that case we don't have a random sample from the population of interest.
• Be explicit about the population of interest. Do we see all clusters in the population or not?
• This issue is distinct from the use of distributional approximations based on increasing the number of clusters.
Consider a model-based approach:
$$Y_i = X_i'\beta + \varepsilon_i + \eta_{G_i}, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2), \qquad \eta_g \sim N(0, \sigma_\eta^2)$$
The standard OLS variance expression
$$V(\hat\beta) = (X'X)^{-1} (X'\Omega X) (X'X)^{-1}$$
is based on resampling units, or on resampling both $\varepsilon$ and $\eta$. In a random sample we will eventually see units from all clusters, and we do not need to resample the $\eta_g$. The random sampling variance keeps the $\eta_g$ fixed.
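A sketch of the model-based sandwich for $\hat\tau$. Under this model $\Omega_{ij} = \sigma^2_\varepsilon \mathbf{1}\{i = j\} + \sigma^2_\eta \mathbf{1}\{G_i = G_j\}$, so $X'\Omega X = \sigma^2_\varepsilon X'X + \sigma^2_\eta \sum_g s_g s_g'$ with $s_g = \sum_{i: G_i = g} X_i$, and $\Omega$ never has to be formed. Setting $\sigma^2_\eta = 0$ corresponds to holding the $\eta_g$ fixed. The two designs (16 units in 4 clusters, with $D$ either balanced within every cluster or constant within each cluster) and the function name are illustrative assumptions.

```python
def true_var_tau(d, g, sigma2_eps, sigma2_eta):
    """N * V(tau_hat) from (X'X)^{-1} X'ΩX (X'X)^{-1} under the random-effects Ω."""
    n = len(d)
    x = [(1.0, float(di)) for di in d]
    xx = [[sum(xi[a] * xi[b] for xi in x) for b in range(2)] for a in range(2)]
    det = xx[0][0] * xx[1][1] - xx[0][1] * xx[1][0]
    inv = [[xx[1][1] / det, -xx[0][1] / det], [-xx[1][0] / det, xx[0][0] / det]]
    # Cluster sums s_g of X_i: the eta part of X'ΩX is sigma2_eta * sum_g s_g s_g'
    sums = {}
    for xi, gi in zip(x, g):
        s = sums.setdefault(gi, [0.0, 0.0])
        s[0] += xi[0]
        s[1] += xi[1]
    meat = [[sigma2_eps * xx[a][b] + sigma2_eta * sum(s[a] * s[b] for s in sums.values())
             for b in range(2)] for a in range(2)]
    # tau-tau entry of inv @ meat @ inv
    v = sum(inv[1][a] * meat[a][b] * inv[b][1] for a in range(2) for b in range(2))
    return n * v

g = [k for k in range(4) for _ in range(4)]
d_balanced = [1, 1, -1, -1] * 4                     # D sums to zero within each cluster
d_constant = [1] * 4 + [-1] * 4 + [1] * 4 + [-1] * 4  # D constant within each cluster
```

With $\sigma^2_\varepsilon = 1$ and $\sigma^2_\eta = 2$, the balanced design gives $N \cdot V(\hat\tau) = 1$, unaffected by $\eta$, while the cluster-constant design gives $9 = 1 + 2 \cdot 4$.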
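• (Clustered sampling) Suppose we randomly select $H$ clusters out of $G$, and then select $N/H$ units randomly from each of the sampled clusters:
$$\Pr(R = r) = \binom{G}{H}^{-1} \cdot \binom{M/G}{N/H}^{-H} \quad \text{for all } r \text{ s.t. } \forall g: \ \sum_{i: G_i = g} r_i = N/H \ \lor \ \sum_{i: G_i = g} r_i = 0.$$
Now the exact variance is
$$N \cdot V(\hat\theta \mid \text{CS}) = \sigma_{\text{cluster}}^2 \cdot \frac{N}{H} \cdot \left( 1 - \frac{H}{G} \right) + \sigma_{\text{cond}}^2 \cdot \left( 1 - \frac{N \cdot G}{H \cdot M} \right)$$
where $(N \cdot G)/(H \cdot M) = (N/H)/(M/G)$ is the fraction of each sampled cluster that is observed; with $H = G$ the second factor reduces to $1 - N/M$.
Here, adjusting standard errors for clustering can make a difference and is correct. Failing to do so leads to invalid confidence intervals.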
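A brute-force check of the clustered-sampling variance on a tiny population: enumerate every cluster draw and every within-cluster unit draw (all equally likely) and compare the exact variance of $\hat\theta$ with the two-term formula. The within-cluster term uses the finite-population correction for the fraction $(N/H)/(M/G)$ of each sampled cluster that is observed, which reduces to $1 - N/M$ when $H = G$. The toy outcome values and function name are assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

def exact_cs_variance(clusters, h, n_per):
    """Exact V(theta_hat | CS): choose h clusters, then n_per units in each, all equally likely."""
    estimates = []
    for chosen in combinations(range(len(clusters)), h):
        within = [list(combinations(clusters[k], n_per)) for k in chosen]
        for pick in product(*within):
            units = [v for sub in pick for v in sub]
            estimates.append(sum(units) / len(units))
    mu = sum(estimates) / len(estimates)
    return sum((t - mu) ** 2 for t in estimates) / len(estimates)

# Toy population: G = 4 clusters of M/G = 3 units; sample H = 2 clusters, N/H = 2 units each.
clusters = [[Fraction(v) for v in c] for c in ([1, 2, 6], [3, 3, 9], [0, 4, 5], [7, 8, 12])]
G, M_G, H, N_PER = 4, 3, 2, 2
M, N = G * M_G, H * N_PER
means = [sum(c) / M_G for c in clusters]
y_bar = sum(means) / G
sigma2_cluster = sum((mu - y_bar) ** 2 for mu in means) / (G - 1)
sigma2_cond = sum(sum((v - mu) ** 2 for v in c) / (M_G - 1)
                  for c, mu in zip(clusters, means)) / G

lhs = N * exact_cs_variance(clusters, H, N_PER)
rhs = (sigma2_cluster * Fraction(N, H) * (1 - Fraction(H, G))
       + sigma2_cond * (1 - Fraction(N * G, H * M)))
```

The check is exact (`lhs == rhs`) because the formula is the law of total variance applied to the two sampling stages.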
Four Causal Settings
• Random sample, random assignment of units.
• Random sample, random assignment of clusters.
• Clustered sample, random assignment of units.
• Random sample, assignment probability varying across clusters.
Questions
1. Is $\hat V_{\text{robust}}$ valid?
2. Is $\hat V_{\text{cluster}}$ valid?
Answers
• Random sample, random assignment of units: $\hat V_{\text{robust}}$ valid; $\hat V_{\text{cluster}}$ not generally valid.
• Random sample, random assignment of clusters: $\hat V_{\text{robust}}$ not generally valid; $\hat V_{\text{cluster}}$ valid.
• Clustered sample, random assignment of units: depends on the estimand (average effect in the population versus average effect in the sample).
• Random sample, assignment probability varying across clusters: neither generally valid.
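To make the first two assignment mechanisms concrete, a sketch of unit-level versus cluster-level randomization, together with one possible estimate of the within-cluster correlation $\rho_D$ that enters the Moulton factor. The pairwise definition used here (average cross-product over distinct same-cluster pairs, divided by the variance) is an assumption, and the function names are hypothetical. Cluster-level assignment makes $D$ constant within clusters, so $\rho_D = 1$; unit-level assignment gives $\rho_D$ near zero.

```python
import random

def assign_units(g, rng):
    """Complete randomization of units: half of all units get D = 1."""
    n = len(g)
    treated = set(rng.sample(range(n), n // 2))
    return [1 if i in treated else -1 for i in range(n)]

def assign_clusters(g, rng):
    """Cluster randomization: half of the clusters get D = 1 for all of their units."""
    labels = sorted(set(g))
    treated = set(rng.sample(labels, len(labels) // 2))
    return [1 if gi in treated else -1 for gi in g]

def within_corr(v, g):
    """Average cross-product of deviations over distinct same-cluster pairs, over the variance."""
    n = len(v)
    v_bar = sum(v) / n
    var = sum((x - v_bar) ** 2 for x in v) / n
    groups = {}
    for x, gi in zip(v, g):
        groups.setdefault(gi, []).append(x - v_bar)
    cross = sum(sum(dev) ** 2 - sum(e * e for e in dev) for dev in groups.values())
    pairs = sum(len(dev) * (len(dev) - 1) for dev in groups.values())
    return cross / (pairs * var)

# Assumed layout: 20 clusters of 20 units.
rng = random.Random(3)
g = [k for k in range(20) for _ in range(20)]
d_units = assign_units(g, rng)
d_clusters = assign_clusters(g, rng)
```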
Causal Setting: Random Sampling, Random Assignment
Points:
1. Should not cluster.
2. $\hat V_{\text{robust}}$ is valid.
3. $\hat V_{\text{cluster}}$ can be different from $\hat V_{\text{robust}}$ in large samples, with many clusters, even with $\rho_\varepsilon = \rho_D = 0$.
4. $\hat V_{\text{moulton}}$ and $\hat V_{\text{cluster}}$ are conceptually quite different.
Recommendations
More Recommendations