“A Course in Applied Econometrics”
Lecture 12: Regression Discontinuity Designs

Guido Imbens
IRP Lectures, UW Madison, August 2008

Outline

1. Introduction
2. Basics
3. Graphical Analyses
4. Local Linear Regression
5. Choosing the Bandwidth
6. Variance Estimation
7. Specification Checks

1. Introduction

A Regression Discontinuity (RD) Design is a powerful and widely applicable identification strategy.

Often access to, or incentives for participation in, a service or program is assigned based on transparent rules with criteria based on clear cutoff values, rather than on discretion of administrators.

Comparisons of individuals that are similar but on different sides of the cutoff point can be credible estimates of causal effects for a specific subpopulation.

Good for internal validity, not much external validity.

Long history in Psychology literature (Thistlethwaite and Campbell, 1960), early work by Goldberger (1972), recent resurgence in economics.

2. Basics

Two potential outcomes $Y_i(0)$ and $Y_i(1)$, causal effect $Y_i(1) - Y_i(0)$, binary treatment indicator $W_i$, covariate $X_i$, and the observed outcome equal to:

$$Y_i = Y_i(W_i) = \begin{cases} Y_i(0) & \text{if } W_i = 0, \\ Y_i(1) & \text{if } W_i = 1. \end{cases} \tag{1}$$

At $X_i = c$ incentives to participate change.

Two cases, Sharp Regression Discontinuity:

$$W_i = 1\{X_i \ge c\}, \tag{SRD}$$

and Fuzzy Regression Discontinuity Design:

$$\lim_{x \downarrow c} \Pr(W_i = 1 \mid X_i = x) \ne \lim_{x \uparrow c} \Pr(W_i = 1 \mid X_i = x). \tag{FRD}$$
Sharp Regression Discontinuity

Example (Lee, 2007)

What is the effect of incumbency on election outcomes? (More specifically, what is the probability of a Democrat winning the next election given that the last election was won by a Democrat?)

Compare election outcomes in cases where the previous election was very close.

SRD

Key assumption:

$E[Y(0) \mid X = x]$ and $E[Y(1) \mid X = x]$ are continuous in $x$.

Under this assumption,

$$\tau_{SRD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]. \tag{SRD estimand}$$

The estimand is the difference of two regression functions at a point.

Extrapolation is unavoidable.

Fuzzy Regression Discontinuity

Example (Van der Klaauw, 2002)

What is the effect of a financial aid offer on acceptance of college admission?

College admissions office puts applicants in a few categories based on numerical score.

Financial aid offer is highly correlated with category.

Compare individuals close to the cutoff score.

FRD

What do we look at in the FRD case: ratio of discontinuities in regression function of outcome and treatment:

$$\tau_{FRD} = \frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[W_i \mid X_i = x] - \lim_{x \uparrow c} E[W_i \mid X_i = x]}. \tag{FRD estimand}$$
Interpretation of FRD (Hahn, Todd, Van der Klaauw)

Let $W_i(x)$ be potential treatment status given cutoff point $x$, for $x$ in some small neighborhood around $c$ (which requires that the cutoff point is at least in principle manipulable).

$W_i(x)$ is non-increasing in $x$ at $x = c$.

A complier is a unit such that

$$\lim_{x \downarrow X_i} W_i(x) = 0, \quad \text{and} \quad \lim_{x \uparrow X_i} W_i(x) = 1.$$

Then

$$\frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[W_i \mid X_i = x] - \lim_{x \uparrow c} E[W_i \mid X_i = x]} = E[Y_i(1) - Y_i(0) \mid \text{unit } i \text{ is a complier and } X_i = c].$$

External Validity

The estimand has little external validity. It is at best valid for a population defined by the cutoff value $c$, and by the subpopulation that is affected at that value.

FRD versus Unconfoundedness

$$\big(Y_i(0), Y_i(1)\big) \perp\!\!\!\perp W_i \mid X_i. \tag{unconfoundedness}$$

Under this assumption:

$$E[Y_i(1) - Y_i(0) \mid X_i = c] = E[Y_i \mid W_i = 1, X_i = c] - E[Y_i \mid W_i = 0, X_i = c].$$

This approach assumes that differences between treated and control units with $X_i = c$ have a causal interpretation, without exploiting the discontinuity.

Unconfoundedness is fundamentally based on units being comparable if their covariates are similar. This is not an attractive assumption in the current setting where the probability of receiving the treatment is discontinuous in the covariate.

Even if unconfoundedness holds, under continuity of the potential outcome regression functions the FRD approach will be consistent for the average effect for compliers at $X_i = c$.

3. Graphical Analyses

A. Plot regression function $E[Y_i \mid X_i = x]$.

B. Plot regression functions $E[Z_i \mid X_i = x]$ for covariates $Z_i$ that do not enter the assignment rule.

C. Plot density $f_X(x)$.

In all cases use estimators that do not smooth around the cutoff value. For example, for binwidth $h$ define bins $[b_{k-1}, b_k]$, where $b_k = c - (K_0 - k + 1) \cdot h$, and average outcomes within bins.
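The binning step for the graphical analyses can be sketched as follows. This is a minimal illustration, not code from the lecture: the function name `binned_means` and the half-open bin convention (so that $X_i = c$ falls in the first bin to the right, matching $W_i = 1\{X_i \ge c\}$) are assumptions. Bin edges are placed at $c + k \cdot h$, so no bin straddles the cutoff.

```python
import numpy as np

def binned_means(x, y, c, h):
    """Average y within bins of width h whose edges are aligned at the cutoff c.

    Hypothetical helper: bins are [c + k*h, c + (k+1)*h), so no bin
    contains observations from both sides of the cutoff.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = np.floor((x - c) / h).astype(int)      # bin index; k >= 0 iff x >= c
    ks, inv = np.unique(k, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)                  # observations per non-empty bin
    sums = np.bincount(inv, weights=y)         # sum of y per non-empty bin
    mids = c + (ks + 0.5) * h                  # bin midpoints, for plotting
    return mids, sums / counts, counts
```

Plotting the returned means against the midpoints gives plot A (or plot B when `y` is a covariate $Z_i$). The counts divided by $N \cdot h$ give a histogram-type estimate of $f_X(x)$ for plot C.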
4. Local Linear Regression

We are interested in the value of a regression function at the boundary of the support. Standard kernel regression

$$\hat\mu_l(c) = \frac{\sum_{i: c-h<X_i<c} Y_i}{\sum_{i: c-h<X_i<c} 1} \tag{2}$$

does not work well for that case (slower convergence rates).

Better rates are obtained by using local linear regression. First

$$\min_{\alpha_l, \beta_l} \sum_{i: c-h<X_i<c} \big(Y_i - \alpha_l - \beta_l \cdot (X_i - c)\big)^2. \tag{3}$$

The value of the left-hand limit $\mu_l(c)$ is then estimated as

$$\hat\mu_l(c) = \hat\alpha_l + \hat\beta_l \cdot (c - c) = \hat\alpha_l. \tag{4}$$

Similarly for the right-hand side. Not much gained by using a non-uniform kernel.

Alternatively one can estimate the average effect directly in a single regression,

$$Y_i = \alpha + \beta \cdot (X_i - c) + \tau \cdot W_i + \gamma \cdot (X_i - c) \cdot W_i + \varepsilon_i,$$

thus solving

$$\min_{\alpha, \beta, \tau, \gamma} \sum_{i=1}^{N} 1\{c - h \le X_i \le c + h\} \times \big(Y_i - \alpha - \beta \cdot (X_i - c) - \tau \cdot W_i - \gamma \cdot (X_i - c) \cdot W_i\big)^2,$$

which will numerically yield the same estimate of $\tau_{SRD}$.

This interpretation extends easily to the inclusion of covariates.

Estimation for the FRD Case

Do local linear regression for both the outcome and the treatment indicator, on both sides,

$$\big(\hat\alpha_{yl}, \hat\beta_{yl}\big) = \arg\min_{\alpha_{yl}, \beta_{yl}} \sum_{i: c-h \le X_i < c} \big(Y_i - \alpha_{yl} - \beta_{yl} \cdot (X_i - c)\big)^2,$$

$$\big(\hat\alpha_{wl}, \hat\beta_{wl}\big) = \arg\min_{\alpha_{wl}, \beta_{wl}} \sum_{i: c-h \le X_i < c} \big(W_i - \alpha_{wl} - \beta_{wl} \cdot (X_i - c)\big)^2,$$

and similarly $(\hat\alpha_{yr}, \hat\beta_{yr})$ and $(\hat\alpha_{wr}, \hat\beta_{wr})$. Then the FRD estimator is

$$\hat\tau_{FRD} = \frac{\hat\tau_y}{\hat\tau_w} = \frac{\hat\alpha_{yr} - \hat\alpha_{yl}}{\hat\alpha_{wr} - \hat\alpha_{wl}}.$$

Alternatively, define the vector of covariates

$$V_i = \begin{pmatrix} 1 \\ 1\{X_i < c\} \cdot (X_i - c) \\ 1\{X_i \ge c\} \cdot (X_i - c) \end{pmatrix}, \quad \text{and} \quad \delta = \begin{pmatrix} \alpha_{yl} \\ \beta_{yl} \\ \beta_{yr} \end{pmatrix}.$$

Then we can write

$$Y_i = \delta' V_i + \tau \cdot W_i + \varepsilon_i. \tag{TSLS}$$

Then estimate $\tau$ based on the regression function (TSLS) by Two-Stage-Least-Squares methods, using

$W_i$ as the endogenous regressor,

the indicator $1\{X_i \ge c\}$ as the excluded instrument,

$V_i$ as the set of exogenous variables.

This is numerically identical to $\hat\tau_{FRD}$ before (because of the uniform kernel).

Can add other covariates in a straightforward manner.
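The local linear estimators above can be sketched in a few lines. This is a minimal illustration under a uniform kernel, not code from the lecture: all function names are hypothetical, and the right-hand window $c \le X_i \le c + h$ is chosen here to match the indicator in the single-regression formulation.

```python
import numpy as np

def _intercept_at_cutoff(x, v, c):
    # OLS of v on (1, x - c); the intercept is the fitted value at x = c
    design = np.column_stack([np.ones_like(x), x - c])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return coef[0]

def jump_at_cutoff(x, v, c, h):
    """Right-minus-left local linear intercepts within bandwidth h (uniform kernel)."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    left = (x >= c - h) & (x < c)
    right = (x >= c) & (x <= c + h)
    return (_intercept_at_cutoff(x[right], v[right], c)
            - _intercept_at_cutoff(x[left], v[left], c))

def srd_estimate(x, y, c, h):
    # Sharp design: jump in the outcome regression at the cutoff
    return jump_at_cutoff(x, y, c, h)

def frd_estimate(x, y, w, c, h):
    # Fuzzy design: ratio of the outcome jump to the treatment jump
    return jump_at_cutoff(x, y, c, h) / jump_at_cutoff(x, w, c, h)

def srd_single_regression(x, y, c, h):
    """Same SRD estimate from one regression on (1, X-c, W, (X-c)*W)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    keep = (x >= c - h) & (x <= c + h)
    xc = x[keep] - c
    w = (x[keep] >= c).astype(float)           # sharp assignment rule
    design = np.column_stack([np.ones_like(xc), xc, w, xc * w])
    coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return coef[2]                             # coefficient on W is tau
```

Because the single regression is fully interacted with $W_i$, it fits separate lines on each side of the cutoff, which is why `srd_single_regression` and `srd_estimate` agree up to floating-point error.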
5. Choosing the Bandwidth (Imbens-Kalyanaraman)

We wish to take into account that (i) we are interested in the regression function at the boundary of the support, and (ii) that we are interested in the regression function at $x = c$.

IK focus on minimizing

$$E\Big[\big(\hat\mu_l(c) - \mu_l(c)\big)^2 + \big(\hat\mu_r(c) - \mu_r(c)\big)^2\Big].$$

Both $\hat\mu_l(c)$ and $\hat\mu_r(c)$ are based on local linear estimators, with the same bandwidth $h$.

Optimal Bandwidth

$$h_{opt} = \left(\frac{C_2}{C_1}\right)^{1/5} \cdot \left(\frac{\dfrac{\sigma_r^2(c)}{p \cdot f_r(c)} + \dfrac{\sigma_l^2(c)}{(1-p) \cdot f_l(c)}}{\left(\dfrac{\partial^2 m_r}{\partial x^2}(c)\right)^2 + \left(\dfrac{\partial^2 m_l}{\partial x^2}(c)\right)^2}\right)^{1/5} \cdot N^{-1/5}$$

$p$ is the share of observations above the threshold.

$$C_1 = \frac{1}{4} \cdot \left(\frac{\nu_2^2 - \nu_1 \nu_3}{\nu_2 \nu_0 - \nu_1^2}\right)^2, \qquad C_2 = \frac{\int_0^\infty (\nu_2 - u \nu_1)^2 K^2(u)\, du}{\big(\nu_2 \nu_0 - \nu_1^2\big)^2}, \qquad \nu_j = \int_0^\infty u^j K(u)\, du.$$

If $K(u) = 1_{|u| < 0.5}$, then $(C_2/C_1) = 5.40$.

Bandwidth for FRD Design

1. Calculate optimal bandwidth separately for both regression functions and choose the smallest.

2. Calculate optimal bandwidth only for the outcome and use that for both regression functions.

Typically the regression function for the treatment indicator is flatter than the regression function for the outcome away from the discontinuity point (completely flat in the SRD case). So using the same criterion would lead to a larger bandwidth for estimation of the regression function for the treatment indicator. In practice it is easier to use the same bandwidth, and so to avoid bias, use the bandwidth from the criterion for the SRD design, or the smallest.

6. Variance Estimation

$$\sigma_{Yl}^2 = \lim_{x \uparrow c} \mathrm{Var}(Y_i \mid X_i = x), \qquad C_{YWl} = \lim_{x \uparrow c} \mathrm{Cov}(Y_i, W_i \mid X_i = x),$$

with $\sigma_{Yr}^2$, $\sigma_{Wl}^2$, $\sigma_{Wr}^2$ and $C_{YWr}$ defined analogously.

$$V_{\tau_y} = \frac{4}{f_X(c)} \cdot \big(\sigma_{Yr}^2 + \sigma_{Yl}^2\big), \qquad V_{\tau_w} = \frac{4}{f_X(c)} \cdot \big(\sigma_{Wr}^2 + \sigma_{Wl}^2\big).$$

The asymptotic covariance of $\sqrt{Nh}\,(\hat\tau_y - \tau_y)$ and $\sqrt{Nh}\,(\hat\tau_w - \tau_w)$ is

$$C_{\tau_y, \tau_w} = \frac{4}{f_X(c)} \cdot \big(C_{YWr} + C_{YWl}\big).$$

Finally, the asymptotic distribution has the form

$$\sqrt{Nh} \cdot (\hat\tau - \tau) \xrightarrow{d} N\left(0,\; \frac{1}{\tau_w^2} \cdot V_{\tau_y} + \frac{\tau_y^2}{\tau_w^4} \cdot V_{\tau_w} - 2 \cdot \frac{\tau_y}{\tau_w^3} \cdot C_{\tau_y, \tau_w}\right).$$

This asymptotic distribution is a special case of that in HTV (Hahn, Todd, Van der Klaauw), using the rectangular kernel, and with $h = N^{-\delta}$, for $1/5 < \delta < 2/5$ (so that the asymptotic bias can be ignored).

Can use plug-in estimators for components of variance.
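The variance formula can be turned into a plug-in calculation along the following lines. This is a sketch under stated assumptions, not the lecture's prescribed procedure: the function names are hypothetical, the conditional variances and covariances are estimated by within-window sample moments, and $f_X(c)$ is estimated by the share of observations within $h$ of the cutoff divided by $2h$.

```python
import numpy as np

def frd_asymptotic_variance(tau_y, tau_w, v_y, v_w, c_yw):
    """Asymptotic variance of sqrt(N*h) * (tau_hat - tau) for the ratio tau_y / tau_w."""
    return (v_y / tau_w**2
            + tau_y**2 / tau_w**4 * v_w
            - 2.0 * tau_y / tau_w**3 * c_yw)

def plug_in_components(x, y, w, c, h):
    """Assumed plug-in estimates of V_tau_y, V_tau_w and C_tau_y_tau_w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    n = x.size
    left = (x >= c - h) & (x < c)
    right = (x >= c) & (x <= c + h)
    # Density at the cutoff: fraction of observations in the window over its width
    f_c = (left.sum() + right.sum()) / (2.0 * n * h)

    def moments(mask):
        # Within-window variance of y, variance of w, and their covariance
        m = np.cov(y[mask], w[mask], bias=True)
        return m[0, 0], m[1, 1], m[0, 1]

    s2_yl, s2_wl, c_l = moments(left)
    s2_yr, s2_wr, c_r = moments(right)
    v_y = 4.0 / f_c * (s2_yr + s2_yl)
    v_w = 4.0 / f_c * (s2_wr + s2_wl)
    c_yw = 4.0 / f_c * (c_r + c_l)
    return v_y, v_w, c_yw

def standard_error(avar, n, h):
    # Convert the asymptotic variance of sqrt(N*h)*(tau_hat - tau) to a standard error
    return np.sqrt(avar / (n * h))
```

In the SRD case $\tau_w = 1$ and the treatment indicator has no variance on either side, so the expression reduces to $V_{\tau_y}$.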