Cluster Robust Inference with Heterogeneous Clusters
joint work with Chang Lee and Drew Carter
Douglas G. Steigerwald
UC Santa Barbara
July 2018
D. Steigerwald (UCSB) Cluster Robust July 2018 1 / 32

Empirical Framework
Kuhn et al. (AER, 2011) measure consumption impact from a shock to neighbor's income
410 postal codes (g): 4 to 105 households (i): g grows with n
  $c_{gi} = \alpha_0 + \alpha_{fe} + \beta_1 \cdot win_g + \beta_2 \cdot income_{gi} + u_{gi}$
$V$: covariance matrix of OLSE for coefficients
$\widehat{V}$: cluster-robust variance estimator
baseline beliefs for this empirical setting
  1. $\widehat{V}$ is known to be consistent
  2. $\widehat{V}$ removes downward bias in OLS estimator of $V$
  3. degrees-of-freedom for hypothesis testing at least 410
     1. 410 for t test of $H_0: \beta_1 = 0$
     2. n for t test of $H_0: \beta_2 = 0$

Research Response: Our findings
  $c_{gi} = \alpha_0 + \alpha_{fe} + \beta_1 \cdot win_g + \beta_2 \cdot income_{gi} + u_{gi}$
$V$: covariance matrix of OLSE for coefficients
$\widehat{V}$: cluster-robust variance estimator
our findings for this empirical setting
  1. "$\widehat{V}$ is known to be consistent": false
     1. previously established when group designs (cluster sizes) are equal
     2. we establish consistency when group designs (cluster sizes) vary
     3. inconsistent for $\alpha_{fe}$
  2. "$\widehat{V}$ removes downward bias in OLS estimator of $V$": false
     1. $\widehat{V}$ may have downward bias
  3. "degrees-of-freedom for hypothesis testing at least 410": false
     1. $\widehat{V}$ is a function only of between cluster variation
     2. d-o-f at most 410 for either t test of $H_0: \beta_1 = 0$ or $H_0: \beta_2 = 0$
     3. variation in designs (cluster sizes) reduces d-o-f below 410

Road Map
Data sets with growing number of clusters
Interest focuses on cluster invariant regressor
  no cluster fixed effects
Consistency with cluster homogeneity: White (1984)
Finite sample behavior: Cameron, Gelbach and Miller (2008)
  1. Consistency with cluster heterogeneity
     1. allow cluster sizes to vary
     2. number of clusters tends to infinity
  2. Guide to finite sample behavior that reflects cluster heterogeneity
     1. Effective number of clusters
     2. smaller than number of clusters
  3. Guidelines for Empirical Research

Cluster Structure
data generating process
  $y_{gi} = \beta_0 + \beta_1 x_g + \beta_2 z_{gi} + u_{gi}$
$y_{gi}$: observation i in cluster g
$n_g$: number of observations in cluster g, with $\sum_g n_g = n$
$G$: number of clusters
Error covariance matrix
  $\Omega = \begin{bmatrix} \Omega_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \Omega_G \end{bmatrix}$
$\Omega_g$ unrestricted (positive definite)

Robust Test Statistic
Shah, Holt and Folsom 1977
$a$ selection vector, $H_0: a^T \beta = 0$
$\hat\beta$ OLSE for $\beta$ with variance
  $V = (X^T X)^{-1} \left[ \sum_{g=1}^{G} \mathrm{Var}(X_g^T u_g) \right] (X^T X)^{-1}$
test statistic
  $Z = \dfrac{a^T \hat\beta}{\sqrt{a^T \widehat{V} a}}$
cluster robust variance estimator
  $\widehat{V} = (X^T X)^{-1} \left[ \sum_{g=1}^{G} X_g^T \hat{u}_g \hat{u}_g^T X_g \right] (X^T X)^{-1}$
robust to arbitrary structure of $\Omega_g$
allows $n_g$ to vary

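The estimator and test statistic on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the simulated data, and the parameter values (50 clusters of 10 observations, coefficients 1.0, 0.0, 0.5) are assumptions made only for the example.

```python
import numpy as np

def cluster_robust_variance(X, y, cluster_ids):
    """Return (beta_hat, V_hat): OLS coefficients and the cluster-robust sandwich."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        rows = cluster_ids == g
        score = X[rows].T @ resid[rows]   # X_g^T u_hat_g, a k-vector
        meat += np.outer(score, score)    # X_g^T u_hat_g u_hat_g^T X_g
    return beta_hat, XtX_inv @ meat @ XtX_inv

def z_statistic(beta_hat, V_hat, a):
    """Z = a^T beta_hat / sqrt(a^T V_hat a) for selection vector a."""
    return (a @ beta_hat) / np.sqrt(a @ V_hat @ a)

# Hypothetical data: 50 clusters of 10 observations (illustrative values only).
rng = np.random.default_rng(0)
G, n_g = 50, 10
cluster_ids = np.repeat(np.arange(G), n_g)
x_g = rng.binomial(1, 0.5, size=G)[cluster_ids]       # cluster-invariant regressor
z_gi = rng.uniform(size=G * n_g)                      # cluster-varying regressor
u = rng.normal(size=G)[cluster_ids] + rng.normal(size=G * n_g)
X = np.column_stack([np.ones(G * n_g), x_g, z_gi])
y = 1.0 + 0.0 * x_g + 0.5 * z_gi + u

beta_hat, V_hat = cluster_robust_variance(X, y, cluster_ids)
a = np.array([0.0, 1.0, 0.0])                         # selects beta_1
Z = z_statistic(beta_hat, V_hat, a)
```

When every observation is its own cluster, the same function reduces to the familiar heteroskedasticity-robust (White) sandwich, which is a convenient sanity check.
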
Consistency
Theorem 1
Assumptions
  $\Omega_g$ not identical over g
  $X_g$ not identical over g
  $n_g$ not constant over g
If, as $n \to \infty$: $G \to \infty$, then
  $\dfrac{a^T \widehat{V} a}{a^T V a} \xrightarrow{MS} 1$
which leads directly to
  $Z \sim N(0, 1)$ under $H_0$

Remark 1: Convergence governed by G not n
  $A_g = (X^T X)^{-1} X_g^T X_g$
  $\hat\beta_g$: OLSE based only on $X_g$
  $\widehat{V} = \sum_g A_g (\hat\beta_g - \hat\beta)(\hat\beta_g - \hat\beta)^T A_g^T$
$\widehat{V}$ is a function only of between cluster variation
consistency requires $G \to \infty$
  $y_{gi} = \beta_0 + \beta_1 x_g + \beta_2 z_{gi} + u_{gi}$
even for test of $\beta_2$, behavior of Z is governed by G
if there is no cluster correlation, each observation is a cluster: $G = n$

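The between-cluster representation of the estimator can be checked numerically. The sketch below builds the usual sandwich form and the form based on cluster-specific OLS estimates, then compares them. The simulated design (30 clusters of 8 observations, one cluster-varying regressor) is an assumption for illustration; `np.linalg.lstsq` is used for the per-cluster fit so that the check would still run if $X_g^T X_g$ were singular.

```python
import numpy as np

rng = np.random.default_rng(1)
G, n_g, k = 30, 8, 2                      # illustrative sizes, not from the slides
ids = np.repeat(np.arange(G), n_g)
X = np.column_stack([np.ones(G * n_g), rng.uniform(size=G * n_g)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=G)[ids] + rng.normal(size=G * n_g)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # full-sample OLSE

V_sandwich = np.zeros((k, k))
V_between = np.zeros((k, k))
for g in range(G):
    Xg, yg = X[ids == g], y[ids == g]
    # Sandwich form: (X'X)^{-1} X_g' u_hat_g u_hat_g' X_g (X'X)^{-1}
    score = Xg.T @ (yg - Xg @ beta_hat)
    V_sandwich += XtX_inv @ np.outer(score, score) @ XtX_inv
    # Between-cluster form: A_g (beta_g - beta_hat)(beta_g - beta_hat)' A_g'
    beta_g = np.linalg.lstsq(Xg, yg, rcond=None)[0]
    A_g = XtX_inv @ Xg.T @ Xg
    d = A_g @ (beta_g - beta_hat)
    V_between += np.outer(d, d)

identity_holds = np.allclose(V_sandwich, V_between)
```

The two forms agree because $X_g^T \hat{u}_g = X_g^T X_g (\hat\beta_g - \hat\beta)$, so each cluster contributes a single term regardless of how many observations it contains.
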
Remark 2: Inconsistent Testing
$\widehat{V}$ is a function only of between cluster variation
consistency of $\widehat{V}$ depends on G growing
inconsistent test for
  coefficient estimator that depends on fixed subset of clusters
leading examples
  controls that correspond to a group of clusters
  cluster specific controls (cluster fixed effects)

Cluster Heterogeneity and Asymptotic Approximation
What gives rise to cluster heterogeneity? For example:
  1. unequal cluster sizes
  2. equal cluster sizes, but variation in $\Omega_g$
  3. equal cluster sizes and constant $\Omega_g$, but variation in $X_g$
the majority of empirical studies have cluster heterogeneity
convergence of Z requires $G \to \infty$
Is G an accurate guide to performance under heterogeneity?

Cluster Heterogeneity Measure
analysis leads to a natural measure of heterogeneity
for each cluster
  $\gamma_g = a^T (X^T X)^{-1} X_g^T \Omega_g X_g (X^T X)^{-1} a$
depends on which coefficients are under test through $a$
measure of heterogeneity for entire sample
  $\Gamma = \dfrac{\frac{1}{G} \sum_{g=1}^{G} (\gamma_g - \bar\gamma)^2}{\bar\gamma^2}$
  (squared) coefficient of variation for $\gamma_g$

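A small sketch of the two measures follows. The function names are hypothetical, and the toy example (an intercept-only regression with clusters of size 1 and 3 and identity error covariance) is an assumption chosen so the result can be verified by hand.

```python
import numpy as np

def cluster_gammas(X, cluster_ids, omegas, a):
    """gamma_g = a'(X'X)^{-1} X_g' Omega_g X_g (X'X)^{-1} a for each cluster.

    `omegas` must be ordered like np.unique(cluster_ids).
    """
    w = np.linalg.solve(X.T @ X, a)       # (X'X)^{-1} a
    gammas = []
    for g, omega in zip(np.unique(cluster_ids), omegas):
        v = X[cluster_ids == g] @ w       # X_g (X'X)^{-1} a
        gammas.append(v @ omega @ v)
    return np.array(gammas)

def heterogeneity(gammas):
    """Gamma: squared coefficient of variation of the gamma_g (1/G normalisation)."""
    gbar = gammas.mean()
    return np.mean((gammas - gbar) ** 2) / gbar ** 2

# Toy check: intercept only, cluster sizes 1 and 3, Omega_g = I.
X = np.ones((4, 1))
ids = np.array([0, 1, 1, 1])
omegas = [np.eye(1), np.eye(3)]
a = np.array([1.0])
gammas = cluster_gammas(X, ids, omegas, a)   # n_g / n^2 for each cluster
Gamma = heterogeneity(gammas)
```

In the toy case $\gamma_g = n_g / n^2$, so the two values are 1/16 and 3/16 and $\Gamma = 0.25$; with equal cluster sizes the same setup would give $\Gamma = 0$.
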
Finite Sample Behavior of Cluster Robust Estimator
leading term in asymptotic behavior of Z is governed by
  $G$ under homogeneity
    number of clusters is a guide to inference
  $\dfrac{G}{1 + \Gamma}$ under heterogeneity
    inference is guided by the effective number of clusters
    $ENC = \dfrac{G}{1 + \Gamma}$

Magnitude of Cluster Correction
example: if $\Gamma = 2$, then $ENC = \dfrac{G}{3}$
different order of magnitude than standard bias correction $G - k$
As $n \to \infty$: ENC governs the mean-squared error of $\widehat{V}$
cluster heterogeneity increases
  variation in $\widehat{V}$
  bias in $\widehat{V}$

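To make the difference in magnitude concrete, the arithmetic can be written out as below. The values G = 100 and k = 3 are assumptions borrowed for illustration (they match the simulation design used later in the talk); only $\Gamma = 2$ comes from this slide.

```python
def effective_number_of_clusters(G, Gamma):
    """ENC = G / (1 + Gamma)."""
    return G / (1.0 + Gamma)

G, k = 100, 3                                   # illustrative values, not from this slide
enc = effective_number_of_clusters(G, 2.0)      # G / 3, about 33.3
standard_correction = G - k                     # 97
```

With these numbers the standard correction removes 3 from 100, while heterogeneity of $\Gamma = 2$ removes roughly two thirds of the clusters.
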
Laboratory Performance Framework
  $y_{gi} = \beta_0 + \beta_1 x_g + \beta_2 z_{gi} + u_{gi}$
error components model
  $u_{gi} = \varepsilon_g + v_{gi}$
  $\varepsilon_g \overset{iid}{\sim} N(0, 1)$ independently of $v_{gi} \mid X \sim N(0, c z_{gi}^2)$
correlation matrix for cluster g
  $\begin{bmatrix} 1 & & \rho_{ij} \\ & \ddots & \\ \rho_{ij} & & 1 \end{bmatrix}$ where $\rho_{ij} = \dfrac{1}{\sqrt{1 + c z_{gi}^2} \sqrt{1 + c z_{gj}^2}}$
c = 500: nearly uncorrelated (heteroskedastic)
c = 0: perfectly correlated (homoskedastic)

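The error components model implies a within-cluster covariance of $\Omega_g = \mathbf{1}\mathbf{1}^T + c \cdot \mathrm{diag}(z_{gi}^2)$, and the correlation matrix follows by scaling. A minimal sketch, with hypothetical function names, is:

```python
import numpy as np

def cluster_covariance(z, c):
    """Omega_g for u_gi = eps_g + v_gi, eps_g ~ N(0,1), v_gi | X ~ N(0, c z_gi^2)."""
    n = len(z)
    return np.ones((n, n)) + np.diag(c * z ** 2)

def cluster_correlation(z, c):
    """Correlation matrix with rho_ij = 1 / (sqrt(1 + c z_i^2) sqrt(1 + c z_j^2))."""
    sd = np.sqrt(1.0 + c * z ** 2)
    return cluster_covariance(z, c) / np.outer(sd, sd)

def draw_errors(z, c, rng):
    """One draw of the error vector for a single cluster."""
    eps = rng.normal()                                  # common cluster shock
    v = np.sqrt(c) * np.abs(z) * rng.normal(size=len(z))  # idiosyncratic part
    return eps + v

z = np.array([0.5, 0.5])
R_perfect = cluster_correlation(z, 0.0)      # all entries equal to 1
R_weak = cluster_correlation(z, 500.0)       # off-diagonal 1 / 126
u = draw_errors(z, 500.0, np.random.default_rng(0))
```

At c = 0 the idiosyncratic component vanishes, so every pair of errors in a cluster is perfectly correlated; at c = 500 the off-diagonal entries are close to zero unless $z_{gi}$ is very small.
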
Design Variation
2500 observations divided into 100 groups
  $x_g \overset{iid}{\sim} \mathrm{Bernoulli}(.5)$, $z_{gi} \overset{iid}{\sim} U(0, 1)$
  1. Cluster Sizes
     1. design 1: $n_1 = 25$, $n_2 = \cdots = n_{100} = 25$
     2. design 2: $n_1 = 124$, $n_2 = \cdots = n_{100} = 24$
     3. ...
     4. design 10: $n_1 = 916$, $n_2 = \cdots = n_{100} = 16$
  2. Error Cluster Correlation
     1. c = 500: correlation ≈ 0, heteroskedastic
     2. ...
     3. c = 0: correlation = 1, homoskedastic

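A sketch of how the listed designs could be generated is shown below. Only designs 1, 2 and 10 appear on the slide, so only those are encoded; the intermediate designs are elided in the source and are deliberately not filled in. The function and variable names are hypothetical.

```python
import numpy as np

# (size of cluster 1, size of each remaining cluster) for the designs on the slide.
LISTED_DESIGNS = {1: (25, 25), 2: (124, 24), 10: (916, 16)}

def make_design(design, rng, G=100):
    """Return cluster sizes, cluster ids, x_g expanded to observations, and z_gi."""
    n_first, n_rest = LISTED_DESIGNS[design]
    sizes = np.r_[n_first, np.full(G - 1, n_rest)]
    ids = np.repeat(np.arange(G), sizes)
    x = rng.binomial(1, 0.5, size=G)[ids]        # x_g ~ Bernoulli(.5), constant in cluster
    z = rng.uniform(0.0, 1.0, size=ids.size)     # z_gi ~ U(0, 1)
    return sizes, ids, x, z

rng = np.random.default_rng(0)
sizes, ids, x, z = make_design(10, rng)
```

Each listed design keeps the total at 2500 observations and 100 clusters while moving observations into the first cluster.
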
Impact of Design on Effective Number of Clusters
Effective Number of Clusters: $\dfrac{G}{1 + \Gamma(\Omega, X)}$
$\Gamma(\Omega, X)$: measure of cluster heterogeneity
  1. cluster size variation
     1. Increasing cluster size variation reduces ENC
  2. realized values for X
     1. Data sets with unequal values for $x_g$ reduce ENC
  3. cluster error correlation
     1. As the cluster error correlation increases, ENC is more sensitive to variation in $x_g$
for each set of cluster sizes and value of c: generate 1000 values of X

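The effect of cluster size variation on ENC can be illustrated for the test of $\beta_1$. The sketch below is an assumption-laden simplification: it draws a single X per design rather than 1000, covers only designs 1 and 10, and uses the closed form $v_g^T \Omega_g v_g = (\sum_i v_i)^2 + c \sum_i z_{gi}^2 v_i^2$ implied by the error components model, where $v = X (X^T X)^{-1} a$.

```python
import numpy as np

def enc_for_design(sizes, c, a, rng):
    """ENC = G / (1 + Gamma) for one simulated X under the error components model."""
    G = len(sizes)
    ids = np.repeat(np.arange(G), sizes)
    n = ids.size
    x = rng.binomial(1, 0.5, size=G)[ids].astype(float)
    z = rng.uniform(size=n)
    X = np.column_stack([np.ones(n), x, z])
    v = X @ np.linalg.solve(X.T @ X, a)          # X (X'X)^{-1} a
    # gamma_g = v_g' Omega_g v_g with Omega_g = 11' + c diag(z_g^2)
    gammas = (np.bincount(ids, weights=v) ** 2
              + c * np.bincount(ids, weights=z ** 2 * v ** 2))
    gbar = gammas.mean()
    Gamma = np.mean((gammas - gbar) ** 2) / gbar ** 2
    return G / (1.0 + Gamma)

a = np.array([0.0, 1.0, 0.0])                    # test of beta_1
designs = {1: np.full(100, 25), 10: np.r_[916, np.full(99, 16)]}
rng = np.random.default_rng(0)
enc = {(d, c): enc_for_design(s, c, a, rng)
       for d, s in designs.items() for c in (0.0, 500.0)}
```

Because $\Gamma \ge 0$, ENC can never exceed G; in design 10 the single large cluster dominates $\sum_g \gamma_g^2$, which pushes ENC far below 100.
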
[Figure: Impact of Design on ENC]

Impact of Effective Number of Clusters on MSE of Cluster-Robust Variance Estimator
Mean-Squared Error:
  $MSE\left( \dfrac{a^T \widehat{V} a}{a^T V a} \,\Big|\, X \right) \approx \dfrac{2}{G} (1 + \Gamma)$
Reducing the ENC increases the MSE for $\widehat{V}$
  1. MSE is conditional on realization of X
     1. 5 values of X are generated for each set of cluster sizes and value of c
     2. for each value of X, 1000 values of u are generated

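A scaled-down version of this Monte Carlo exercise can be sketched as follows. It is a simplification under stated assumptions: one draw of X per design instead of 5, 200 draws of u instead of 1000, c = 0 only, and only designs 1 and 10. Since residuals do not depend on $\beta$, the sketch sets $\beta = 0$. The function name is hypothetical.

```python
import numpy as np

def simulate_mse(sizes, c, a, reps, rng):
    """Monte Carlo MSE of a'V_hat a / a'V a, conditional on one draw of X."""
    G = len(sizes)
    ids = np.repeat(np.arange(G), sizes)
    n = ids.size
    x = rng.binomial(1, 0.5, size=G)[ids].astype(float)
    z = rng.uniform(size=n)
    X = np.column_stack([np.ones(n), x, z])
    XtX_inv = np.linalg.inv(X.T @ X)
    v = X @ (XtX_inv @ a)                        # X (X'X)^{-1} a
    # True a'Va = sum_g v_g' Omega_g v_g with Omega_g = 11' + c diag(z_g^2)
    true_var = np.sum(np.bincount(ids, weights=v) ** 2) + c * np.sum(z ** 2 * v ** 2)
    sq_err = np.empty(reps)
    for r in range(reps):
        u = rng.normal(size=G)[ids] + np.sqrt(c) * z * rng.normal(size=n)
        resid = u - X @ (XtX_inv @ (X.T @ u))    # OLS residuals with beta = 0
        # a'V_hat a = sum_g (sum_{i in g} v_i * resid_i)^2
        est_var = np.sum(np.bincount(ids, weights=v * resid) ** 2)
        sq_err[r] = (est_var / true_var - 1.0) ** 2
    return sq_err.mean()

rng = np.random.default_rng(0)
a = np.array([0.0, 1.0, 0.0])                    # variance of beta_1 estimate
mse_equal = simulate_mse(np.full(100, 25), 0.0, a, 200, rng)
mse_unequal = simulate_mse(np.r_[916, np.full(99, 16)], 0.0, a, 200, rng)
```

With equal cluster sizes the simulated MSE should be small, of the order of 2/G, while the design with one dominant cluster yields a much larger value, in line with a lower ENC.
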
MSE of Cluster-Robust Variance Estimator: Cluster-Invariant Regressor
  $y_{gi} = \beta_0 + \beta_1 x_g + \beta_2 z_{gi} + u_{gi}$
estimator of variance for $\hat\beta_1$
MSE is impacted by bias
bias is driven by variation in cluster size
With variation in cluster sizes, the cluster-robust standard error can be significantly downward biased for the cluster-invariant regressor.

MSE of Cluster-Robust Variance Estimator: Cluster-Varying Regressor
  $y_{gi} = \beta_0 + \beta_1 x_g + \beta_2 z_{gi} + u_{gi}$
estimator of variance for $\hat\beta_2$
MSE impact depends on c
if c = 500 (no error cluster correlation): bias impacts
  bias driven by variation in cluster size
if c < 500 (error cluster correlation): variation dominates
With error cluster correlation, the cluster-robust standard error can be highly variable for the cluster-varying regressor.

Empirical Test Size for Cluster-Robust t Test: Cluster-Invariant Regressor
  $y_{gi} = \beta_0 + \beta_1 x_g + \beta_2 z_{gi} + u_{gi}$
test of $H_0: \beta_1 = 0$
  small ENC → downward bias in cluster-robust s.e. → large empirical test size
test of $H_0: \beta_2 = 0$
  small ENC → greater variation in cluster-robust s.e. → variation in empirical test size
Most pronounced impact for hypothesis test of $\beta_1$