Counterfactual distributions: estimation and inference in Stata Victor Chernozhukov Iván Fernández-Val Blaise Melly MIT Boston University Bern University November 17, 2016 Swiss Stata Users Group Meeting in Bern Chernozhukov, Fernández-Val and Melly Counterfactual distributions in Stata

Questions
- What would have been the wage distribution in 1979 if the workers had the same distribution of characteristics as in 1988?
- What would be the distribution of housing prices resulting from cleaning up a local hazardous-waste site?
- What would be the distribution of wages for female workers if female workers were paid as much as male workers with the same characteristics?
- In general, given an outcome $Y$ and a covariate vector $X$, what is the effect on $F_Y$ of a change in
  1. $F_X$ (holding $F_{Y|X}$ fixed)?
  2. $F_{Y|X}$ (holding $F_X$ fixed)?
- To answer these questions we need to estimate counterfactual distributions.

Counterfactual distributions
- Let 0 denote 1979 and 1 denote 1988.
- $Y$ is wages and $X$ is a vector of worker characteristics (education, experience, ...).
- $F_{X_k}(x)$ is worker composition in $k \in \{0, 1\}$; $F_{Y_j}(y \mid x)$ is wage structure in $j \in \{0, 1\}$.
- Define
  $F_{Y\langle j|k\rangle}(y) := \int F_{Y_j}(y \mid x)\, dF_{X_k}(x)$.
- $F_{Y\langle 0|0\rangle}$ is the observed distribution of wages in 1979; $F_{Y\langle 0|1\rangle}$ is the counterfactual distribution of wages in 1979 if workers have the 1988 composition.
- Common support: $F_{Y\langle 0|1\rangle}$ is well defined if the support of $X_1$ is included in the support of $X_0$.
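The definition of $F_{Y\langle j|k\rangle}$ can be illustrated with a small Python sketch. It assumes a single discrete covariate and uses within-cell empirical CDFs for the conditional distribution; the toy data and all function names are hypothetical and only serve to show the integration over the covariate distribution of the other group.

```python
from bisect import bisect_right


def conditional_ecdf(y_sample, x_sample):
    """Return F(y | x) built from within-cell empirical CDFs (x must be discrete)."""
    cells = {}
    for y, x in zip(y_sample, x_sample):
        cells.setdefault(x, []).append(y)
    for ys in cells.values():
        ys.sort()

    def cdf(y, x):
        ys = cells[x]  # a KeyError here signals a common-support violation
        return bisect_right(ys, y) / len(ys)

    return cdf


def counterfactual_cdf(cond_cdf, x_target, y):
    """Average F_{Y_j}(y | x) over the empirical distribution of X_k."""
    return sum(cond_cdf(y, x) for x in x_target) / len(x_target)


# Hypothetical toy data: x = 1 if the worker has a college degree.
y0 = [1.0, 2.0, 3.0, 4.0, 3.0, 5.0]  # group 0 wages
x0 = [0, 0, 0, 0, 1, 1]              # group 0 composition (1/3 college)
x1 = [0, 1, 1, 1]                    # group 1 composition (3/4 college)

F0 = conditional_ecdf(y0, x0)
observed = counterfactual_cdf(F0, x0, 3.0)        # F_{Y<0|0>}(3)
counterfactual = counterfactual_cdf(F0, x1, 3.0)  # F_{Y<0|1>}(3)
```

Integrating the group 0 conditional CDF over the group 0 covariates recovers the ordinary empirical CDF of group 0 wages, while integrating over the group 1 covariates gives the counterfactual value.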

Effect of changing $F_X$
- We are interested in the effect of shifting the covariate distribution from that of 1979 to that of 1988.
- Distribution effects:
  $\Delta^{DE}(y) = F_{Y\langle 0|1\rangle}(y) - F_{Y\langle 0|0\rangle}(y)$.
- The quantiles are often also of interest:
  $Q_{Y\langle j|k\rangle}(\tau) = \inf\{y : F_{Y\langle j|k\rangle}(y) \geq \tau\}$, $0 < \tau < 1$.
  Quantile effects:
  $\Delta^{QE}(\tau) = Q_{Y\langle 0|1\rangle}(\tau) - Q_{Y\langle 0|0\rangle}(\tau)$.
- In general, for a functional $\phi$, the effect is
  $\Delta(w) := \phi(F_{Y\langle 0|1\rangle})(w) - \phi(F_{Y\langle 0|0\rangle})(w)$.
  Special cases: Lorenz curve, Gini coefficient, interquartile range, and more trivially the mean and the variance.
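As a rough illustration of the distribution and quantile effects, the following Python sketch evaluates two CDFs on a common grid and applies the left-inverse definition of the quantile. The grid and both CDFs are made-up numbers, not estimates from any data set.

```python
def left_inverse(y_grid, cdf_values, tau):
    """Smallest grid point y with F(y) >= tau (y_grid must be sorted)."""
    for y, f in zip(y_grid, cdf_values):
        if f >= tau:
            return y
    raise ValueError("tau exceeds the largest CDF value on the grid")


# Hypothetical CDF values on a common grid of outcomes.
y_grid = [1.0, 2.0, 3.0, 4.0, 5.0]
F_00 = [0.2, 0.5, 0.7, 0.9, 1.0]  # stands in for F_{Y<0|0>}
F_01 = [0.1, 0.3, 0.5, 0.8, 1.0]  # stands in for F_{Y<0|1>}

# Distribution effect at every grid point: F_{Y<0|1>}(y) - F_{Y<0|0>}(y).
distribution_effect = [f1 - f0 for f0, f1 in zip(F_00, F_01)]

# Quantile effect at the median: Q_{Y<0|1>}(0.5) - Q_{Y<0|0>}(0.5).
median_effect = left_inverse(y_grid, F_01, 0.5) - left_inverse(y_grid, F_00, 0.5)
```

In this toy example the counterfactual CDF lies below the observed one, so the distribution effects are negative and the median effect is positive.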

Types of counterfactual changes in $F_X$
1. Groups correspond to different subpopulations (different time periods, male vs. female, black vs. white).
2. Transformations of the population, $X_1 = g(X_0)$:
   - Unit change in location of one covariate: $X_1 = X_0 + 1$, where $X$ is the number of cigarettes smoked by the mother and $Y$ is the birthweight of the newborn.
   - Neutral redistribution of income: $X_1 = \mu_{X_0} + \alpha (X_0 - \mu_{X_0})$, where $Y$ is the food expenditure (Engel curve).
   - Stock (1991): effect on housing prices of removing a hazardous waste disposal site.
   In 1. and 2., $\hat F_{X_1}(x) = n_1^{-1} \sum_{i=1}^{n_1} 1\{X_{1i} \leq x\}$.
3. Change in some variable(s) but not in the other ones: unionization rate in 1988 and other characteristics from 1979.
   In 3., $d\hat F_{X_1}(x) = d\hat F_{U_1|C_1}(u \mid c)\, d\hat F_{C_0}(c)$.
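The transformations in case 2 amount to building a new covariate sample from the observed one and then taking its empirical distribution. A minimal Python sketch, with hypothetical function names and made-up numbers:

```python
from statistics import fmean


def unit_shift(x0):
    """X_1 = X_0 + 1: shift the location of one covariate by one unit."""
    return [x + 1 for x in x0]


def neutral_redistribution(x0, alpha):
    """X_1 = mu + alpha * (X_0 - mu): rescale deviations around the mean."""
    mu = fmean(x0)
    return [mu + alpha * (x - mu) for x in x0]


def ecdf(sample, x):
    """Empirical distribution function of the (transformed) sample at x."""
    return sum(1 for v in sample if v <= x) / len(sample)


# Hypothetical incomes; alpha < 1 compresses the distribution around its mean.
income0 = [10.0, 20.0, 30.0, 60.0]
income1 = neutral_redistribution(income0, alpha=0.5)

# Hypothetical cigarette counts shifted by one unit.
cigarettes1 = unit_shift([0, 5, 10])
share_at_most_5 = ecdf(cigarettes1, 5)
```

The redistribution leaves the mean unchanged and only changes the spread, which is why it is called neutral.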

Effect of changing $F_{Y|X}$
- We are often interested in the effect of changing the conditional distribution of the outcome for a given population.
- Program evaluation: group 1 is treated and group 0 is the control group. The quantile treatment effect on the treated is
  $QTET = Q_{Y\langle 1|1\rangle}(\tau) - Q_{Y\langle 0|1\rangle}(\tau)$.
- The counterfactual distributions are always statistically well-defined objects. The effects are of interest even in a 'non-causal' framework (e.g. gender wage gap).
- Causal interpretation requires additional assumptions that give a structural interpretation to the conditional distribution.
  - Selection on observables: the conditional distribution may be estimated using quantile or distribution regression.
  - Endogenous groups: IV quantile regression (e.g. Chernozhukov and Hansen 2005).

Decompositions
- The counterfactual distributions that we analyze are the key ingredients of the decomposition methods often used in economics.
- Blinder/Oaxaca decomposition (parametric, linear decomposition of the mean difference):
  $\bar Y_0 - \bar Y_1 = (\bar X_0 \beta_0 - \bar X_1 \beta_0) + (\bar X_1 \beta_0 - \bar X_1 \beta_1)$.
  This fits in our framework (even if our machinery is not needed in this simple case) as
  $\bar Y_{\langle 0|0\rangle} - \bar Y_{\langle 1|1\rangle} = [\bar Y_{\langle 0|0\rangle} - \bar Y_{\langle 0|1\rangle}] + [\bar Y_{\langle 0|1\rangle} - \bar Y_{\langle 1|1\rangle}]$.
- Our results allow us to do a similar decomposition of any functional of the distribution, e.g. a quantile decomposition:
  $Q_{Y\langle 0|0\rangle}(\tau) - Q_{Y\langle 1|1\rangle}(\tau) = [Q_{Y\langle 0|0\rangle}(\tau) - Q_{Y\langle 0|1\rangle}(\tau)] + [Q_{Y\langle 0|1\rangle}(\tau) - Q_{Y\langle 1|1\rangle}(\tau)]$.
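The mean decomposition can be checked numerically. The sketch below assumes a single covariate plus an intercept and uses made-up data for both groups; the helper `ols` is a hypothetical name, not part of any package.

```python
from statistics import fmean


def ols(y, x):
    """OLS of y on a constant and a scalar x; returns (intercept, slope)."""
    xm, ym = fmean(x), fmean(y)
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum(
        (a - xm) ** 2 for a in x
    )
    return ym - slope * xm, slope


# Hypothetical data for the two groups.
x0, y0 = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
x1, y1 = [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]

a0, b0 = ols(y0, x0)
a1, b1 = ols(y1, x1)

# Counterfactual mean: group 0 coefficients with group 1 covariates.
mean_01 = a0 + b0 * fmean(x1)

composition_effect = fmean(y0) - mean_01  # change X, hold beta_0 fixed
structure_effect = mean_01 - fmean(y1)    # change beta, hold X_1 fixed
total_difference = fmean(y0) - fmean(y1)
```

Because OLS with an intercept fits the group means exactly, the two components add up to the total difference by construction.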

Estimation: plug-in principle
- We estimate the unknown elements in $\int F_{Y_0}(y \mid x)\, dF_{X_1}(x)$ by analog estimators.
- We estimate the distribution of $X_1$ by the empirical distribution for group 1.
- The conditional distribution can be estimated by:
  1. Location and location-scale shift models (e.g. OLS and independent errors),
  2. Quantile regression,
  3. Duration models (e.g. proportional hazard model),
  4. Distribution regression.
- Our results also cover other methods (e.g. IV quantile regression).

Outline of the algorithm for $F_{Y\langle 0|1\rangle}(y)$
1. Estimation
   1.1 Estimate $F_{X_1}(x)$ by $\hat F_{X_1}(x)$.
   1.2 Estimate $F_{Y_0}(y \mid x)$ by $\hat F_{Y_0|X_0}(y \mid x)$.
   1.3 $\hat F_{Y\langle 0|1\rangle}(y) = \int \hat F_{Y_0|X_0}(y \mid x)\, d\hat F_{X_1}(x)$ (in most cases: $n_1^{-1} \sum_{i=1}^{n_1} \hat F_{Y_0|X_0}(y \mid X_{1i})$).
2. Pointwise inference
   2.1 Bootstrap $\hat F_{Y\langle 0|1\rangle}(y)$ to obtain the pointwise s.e. $\hat\Sigma(y)$.
   2.2 Obtain a 95% CI as $\hat F_{Y\langle 0|1\rangle}(y) \pm 1.96 \times \hat\Sigma(y)$.
3. Uniform inference
   Obtain the 95% confidence bands as $\hat F_{Y\langle 0|1\rangle}(y) \pm \hat t \times \hat\Sigma(y)$, where $\hat t$ is the 95th percentile of the bootstrap draws of the maximal t statistic over $y$.
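Steps 1.1 to 1.3 can be sketched in Python under strong simplifying assumptions: a single covariate, a location shift model for the conditional distribution (OLS plus the empirical CDF of the residuals), and simulated data. All names and the data-generating process are hypothetical; this is an illustration of the plug-in idea, not the authors' Stata implementation.

```python
import random
from statistics import fmean


def fit_location_shift(y, x):
    """OLS of y on a constant and a scalar x; returns (intercept, slope, residuals)."""
    xm, ym = fmean(x), fmean(y)
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum(
        (a - xm) ** 2 for a in x
    )
    intercept = ym - slope * xm
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, residuals


def conditional_cdf(y, x, model):
    """Step 1.2: F(y | x) = F_V(y - intercept - slope * x), with F_V the residual ECDF."""
    intercept, slope, residuals = model
    v = y - intercept - slope * x
    return sum(1 for r in residuals if r <= v) / len(residuals)


def counterfactual_cdf(y, model, x_target):
    """Step 1.3: average the conditional CDF over the target covariate sample."""
    return sum(conditional_cdf(y, x, model) for x in x_target) / len(x_target)


# Simulated, purely illustrative data.
rng = random.Random(0)
x0 = [rng.uniform(0, 1) for _ in range(200)]
y0 = [1 + 2 * x + rng.gauss(0, 0.5) for x in x0]
x1 = [0.5 + 0.5 * x for x in x0]  # step 1.1: group 1 sample, inside the support of x0

model0 = fit_location_shift(y0, x0)
grid = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
F_00 = [counterfactual_cdf(y, model0, x0) for y in grid]
F_01 = [counterfactual_cdf(y, model0, x1) for y in grid]
```

Since the group 1 covariates are larger and the estimated slope is positive, the counterfactual CDF `F_01` lies weakly below `F_00` at every grid point.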

Conditional quantile models
- Location shift model (OLS with independent error term):
  $Y = X'\beta + V$, $V \perp\!\!\!\perp X$
  $Q_Y(u \mid x) = x'\beta + Q_V(u)$.
  Parsimonious but restrictive: $X$ only affects the location of $Y$.
- Quantile regression (Koenker and Bassett 1978):
  $Y = X'\beta(U)$, $U \mid X \sim U(0, 1)$
  $Q_Y(u \mid x) = x'\beta(u)$.
  $X$ can change the shape of the entire conditional distribution.
- Connect the conditional distribution with the conditional quantile:
  $F_{Y_0}(y \mid x) \equiv \int_0^1 1\{Q_{Y_0}(u \mid x) \leq y\}\, du$.
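The link between the conditional quantile function and the conditional distribution can be checked numerically. The Python sketch below assumes a hypothetical linear quantile model with $\beta(u) = (u, 1 + u)$, so that $Y \mid x$ is uniform on $[x, 1 + 2x]$, and approximates the integral over $u$ with a midpoint grid.

```python
def conditional_quantile(u, x):
    """Hypothetical linear quantile model: Q(u | x) = u + (1 + u) * x."""
    return u + (1.0 + u) * x


def cdf_from_quantiles(y, x, quantile, n_grid=1000):
    """Approximate the integral over u in (0, 1) of 1{Q(u | x) <= y} on a midpoint grid."""
    us = [(k + 0.5) / n_grid for k in range(n_grid)]
    return sum(1 for u in us if quantile(u, x) <= y) / n_grid


# Under this model F(y | x) = (y - x) / (1 + x) for y in [x, 1 + 2x].
approx = cdf_from_quantiles(2.0, 1.0, conditional_quantile)
exact = (2.0 - 1.0) / (1.0 + 1.0)
```

In practice the quantile function would be replaced by estimated quantile regression coefficients on a grid of quantile indices.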

Quantile regression
[Figure: plot of y against x; legend: y, First quartile, Median, Third quartile.]

Conditional distribution models
- Distribution regression model (Foresi and Peracchi 1995):
  $F_Y(y \mid x) = \Lambda(x'\beta(y))$,
  where $\Lambda$ is a link function (probit, logit, cauchit). $X$ can have heterogeneous effects across the distribution.
- The Cox (1972) proportional hazard model is a special case with complementary log-log link and constant slope parameter:
  $F_Y(y \mid x) = 1 - \exp(-\exp(\beta_0(y) - x'\beta_1))$.
  In other words, only the intercept $\beta_0(y)$ varies with $y$; the slope is assumed to be constant.
- Estimate the functional parameter vector $y \mapsto \beta(y)$ by MLE:
  1. Create indicators $1\{Y \leq y\}$,
  2. Probit/logit of $1\{Y \leq y\}$ on $X$.
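The two estimation steps can be sketched in Python. This is only an illustration under stated assumptions: a logit link, a single covariate plus an intercept, simulated data, and a hand-written Newton-Raphson routine (`fit_logit` is a hypothetical helper, not a library function).

```python
import math
import random


def fit_logit(d, x, iterations=25):
    """Newton-Raphson MLE for a logit of binary d on a constant and a scalar x."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for di, xi in zip(d, x):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += di - p          # score with respect to the intercept
            g1 += (di - p) * xi   # score with respect to the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def distribution_regression(y, x, thresholds):
    """Step 1: build 1{Y <= t}; step 2: fit one logit per threshold t."""
    return {t: fit_logit([1 if yi <= t else 0 for yi in y], x) for t in thresholds}


def conditional_cdf(t, x_value, coefs):
    """Fitted F(t | x) = Lambda(b0(t) + b1(t) * x) with a logit link."""
    b0, b1 = coefs[t]
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x_value)))


# Simulated, purely illustrative data.
rng = random.Random(1)
x = [rng.uniform(0, 1) for _ in range(500)]
y = [xi + rng.gauss(0, 0.5) for xi in x]

thresholds = [0.0, 0.5, 1.0]
coefs = distribution_regression(y, x, thresholds)
fitted = [conditional_cdf(t, 0.5, coefs) for t in thresholds]
```

Each threshold gets its own coefficient vector, which is what lets the covariate have different effects at different points of the distribution. The sketch uses interior thresholds only; at extreme thresholds the indicator can be almost constant and the logit fit may fail to converge.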

Distribution regression
[Figure: plot of y against x with the conditional distribution on a second axis; legend: y, Prob(Y<1.15|x), Prob(Y<1.5|x), Prob(Y<1.85|x).]

Comparison: QR vs DR
- QR and DR are flexible semiparametric models for the conditional distribution that generalize important classical models.
- They are equivalent if $X$ is saturated, but not nested otherwise. The choice cannot be made on the basis of generality.
- QR requires a smooth conditional density of $Y$.
- QR usually outperforms DR under smoothness, but is less robust when $Y$ has mass points.
- They differ in their ability to deal with data limitations: censoring and rounding.

Pointwise and uniform inference
- The covariance function of $\hat F_{Y\langle 0|1\rangle}(y)$ is cumbersome to estimate
  $\Rightarrow$ the exchangeable bootstrap (covers empirical bootstrap, weighted bootstrap and subsampling) provides the pointwise s.e. $\hat\Sigma(y)$.
- Many policy questions of interest involve functional hypotheses: no effect, constant effect, stochastic dominance
  $\Rightarrow$ uniform confidence bands: $\hat F_{Y\langle 0|1\rangle}(y) \pm \hat t \times \hat\Sigma(y)$.
  The true critical value $t$ corresponds to the 95th percentile of the distribution of the maximum t-statistic
  $\sup_y \hat\Sigma(y)^{-1} |\hat F_{Y\langle 0|1\rangle}(y) - F_{Y\langle 0|1\rangle}(y)|$,
  which is unknown. We use the bootstrap to estimate it.
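The bootstrap construction of pointwise intervals and uniform bands can be sketched in Python. To keep the example short, the estimator is a plain empirical CDF on a grid, used here as a stand-in for $\hat F_{Y\langle 0|1\rangle}$; the empirical bootstrap, the simulated data and all names are assumptions made for illustration.

```python
import math
import random
from statistics import stdev


def ecdf_on_grid(sample, grid):
    """Stand-in estimator: empirical CDF evaluated at each grid point."""
    n = len(sample)
    return [sum(1 for v in sample if v <= y) / n for y in grid]


def bootstrap_bands(sample, grid, estimator, draws=500, level=0.95, seed=0):
    rng = random.Random(seed)
    point = estimator(sample, grid)
    boot = []
    for _ in range(draws):
        resample = [rng.choice(sample) for _ in sample]  # empirical bootstrap
        boot.append(estimator(resample, grid))
    # Pointwise bootstrap standard errors, one per grid point.
    se = [stdev([b[j] for b in boot]) for j in range(len(grid))]
    # Maximal t-statistic of each bootstrap draw, centred at the point estimate.
    max_t = sorted(
        max(abs(b[j] - point[j]) / se[j] for j in range(len(grid)) if se[j] > 0)
        for b in boot
    )
    t_hat = max_t[math.ceil(level * draws) - 1]
    pointwise = [(p - 1.96 * s, p + 1.96 * s) for p, s in zip(point, se)]
    uniform = [(p - t_hat * s, p + t_hat * s) for p, s in zip(point, se)]
    return point, se, t_hat, pointwise, uniform


# Simulated, purely illustrative data.
data_rng = random.Random(42)
sample = [data_rng.gauss(0, 1) for _ in range(200)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

point, se, t_hat, pointwise, uniform = bootstrap_bands(sample, grid, ecdf_on_grid)
```

The uniform band replaces the normal critical value 1.96 by the bootstrap critical value `t_hat`, which is typically larger because it has to cover the whole function rather than a single point.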