PS 406 Week 7 Section: Instrumental Variables/2SLS and RDD


  1. PS 406 - Week 7 Section: Instrumental Variables/2SLS and RDD
     D.J. Flynn
     May 14, 2014
     Updated 5/16/14 with sample bootstrap code, and 5/20/14 with an example of subsetting on multiple criteria.

  2. The IV approach and its assumptions
     Endogeneity as a threat to causal inference
     Exogeneity assumption: E(u | X) = 0
     When X is endogenous, β̂ will be a mixture of the relationship between X and Y and the relationship between X and u.[2]
     Common violations: omitted variables; measurement error (on X); simultaneity/reciprocal causation; others...
     Instrumental variables is a method for uncovering the relationship X → Y in the presence of endogeneity.
     [2] See Jay's slides 1-15 for more.
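To make the threat concrete, here is a minimal simulation (not from the slides; the data-generating process is invented for illustration). Because x is built to be correlated with the unobserved error u, the OLS slope converges to 2 + cov(x, u)/var(x) = 2.4 rather than the true value of 2.

     # toy example of endogeneity bias (illustrative only)
     set.seed(406)
     n <- 5000
     u <- rnorm(n)               # unobserved disturbance
     x <- 0.5 * u + rnorm(n)     # x is endogenous: cov(x, u) = 0.5, var(x) = 1.25
     y <- 1 + 2 * x + u          # true effect of x on y is 2
     coef(lm(y ~ x))             # OLS slope is biased upward, toward 2 + 0.5/1.25 = 2.4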

  3. The IV approach and its assumptions
     Key challenge: finding credible instruments
     When we want to use IV regression, we first need to identify a credible instrumental variable (or instrument).
     The most common IV set-up is: Z → X → Y
     An instrument (Z) MUST meet three requirements:
     1. It doesn't belong in the model of Y itself (a theoretical claim).
     2. "Relevance criterion": cov(z, x) ≠ 0
     3. "Exclusion restriction": E(z^T u) = 0
     If these assumptions hold, then E(β̂_IV) = β.
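Continuing the toy simulation above (again invented for illustration, not from the slides): when we add an instrument z that satisfies these requirements, the ratio cov(z, y)/cov(z, x) recovers the true coefficient even though OLS does not.

     # toy example with a valid instrument (illustrative only)
     set.seed(406)
     n <- 5000
     u <- rnorm(n)
     z <- rnorm(n)                       # drawn independently of u: exclusion restriction holds
     x <- 1.0 * z + 0.5 * u + rnorm(n)   # z shifts x: relevance criterion holds
     y <- 1 + 2 * x + u                  # true effect of x on y is 2
     cov(z, x)                           # clearly nonzero
     cov(z, y) / cov(z, x)               # IV estimate: close to 2
     coef(lm(y ~ x))["x"]                # OLS: still biased upward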

  4. The IV approach and its assumptions
     The IV estimator
     Recall the OLS estimate of β: β̂_OLS = (X^T X)^(-1) X^T y = cov(x, y) / var(x)
     The IV estimate of β: β̂_IV = (z^T x)^(-1) z^T y = cov(z, y) / cov(z, x)
     Important implications:
     - Weak instrument: any covariance between Z and Y outside the X channel gets divided by a small cov(z, x), producing significant bias.
     - Strong instrument: if X and u are strongly related (which they are; that's why we're doing IV), then a Z that tracks X closely is surely related to u too, violating the exclusion restriction.
     Ideally, you want a "moderately strong" (?) instrument. This is hard to find in practice, so you need to discuss the strength and limitations of any instrument you use (Jay).
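To see the weak-instrument problem, here is one more toy sketch (invented for illustration): z is valid but barely moves x, so the denominator cov(z, x) is close to zero and dominated by sampling noise, making the IV ratio unstable even in a large sample.

     # toy example of a weak instrument (illustrative only)
     set.seed(406)
     n <- 5000
     u <- rnorm(n)
     z <- rnorm(n)
     x <- 0.02 * z + 0.5 * u + rnorm(n)   # tiny first-stage coefficient: weak instrument
     y <- 1 + 2 * x + u                   # true effect is still 2
     cov(z, x)                            # near zero, swamped by sampling noise
     cov(z, y) / cov(z, x)                # IV ratio: can land far from 2; varies wildly across seeds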

  5. The IV approach and its assumptions
     Cool (?) IV examples
     1. Ladd (2012): X = talk radio exposure, Y = media trust, Z = miles of daily commute to work
     2. Gerber (1998): X = campaign spending, Y = election outcomes, Z = challenger wealth
     3. Acemoglu et al. (2001): X = political institutions, Y = economic development, Z = settler mortality rates
     4. Card (1995): X = education, Y = earnings, Z = geographic proximity to a college/university

  6. IV Regression in R
     IV regression with the Card data
     install.packages("AER")
     library(AER)
     data("CollegeDistance")
     names(CollegeDistance)
     clean <- na.omit(data.frame(wage = CollegeDistance$wage,
                                 education = CollegeDistance$education,
                                 distance = CollegeDistance$distance))
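A check that is not on the slide but follows from the relevance criterion: regress the endogenous variable on the instrument (the first stage) and confirm that distance is a meaningful predictor of education before trusting the IV estimate.

     # first-stage regression: does the instrument predict the endogenous X?
     first.stage <- lm(education ~ distance, data = clean)
     summary(first.stage)   # inspect the coefficient on distance and the F-statistic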

  7. IV Regression in R
     # cov(z, x):
     cov(clean$distance, clean$education)
     # model without the IV:
     reg <- lm(wage ~ education, data = clean)
     summary(reg)
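Because there is a single instrument, the IV slope can also be computed by hand as cov(z, y)/cov(z, x); this is a sanity check (not on the original slide) that should match the tsls() coefficient on education reported next.

     # IV slope from the sample covariances:
     cov(clean$distance, clean$wage) / cov(clean$distance, clean$education)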

  8. IV Regression in R
     # incorporating the IV:
     install.packages("sem")
     library(sem)
     # format: DV ~ exogenous and endogenous Xs, ~ exogenous Xs and instruments
     iv.model <- tsls(wage ~ education, ~ distance, data = clean)
     summary(iv.model)
     # Note: you could, of course, look at subgroups of the population (e.g., those who
     # might be on the margin of going to college); this is just an average effect
     # across all cases in the dataset.
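As an alternative to sem::tsls(), the AER package loaded earlier provides ivreg(), which takes a two-part formula (regressors | instruments) and should return the same point estimate here; shown as a cross-check rather than as part of the original slide.

     # same model via AER's ivreg():
     iv.model2 <- ivreg(wage ~ education | distance, data = clean)
     summary(iv.model2)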

  9. IV Regression in R
     2SLS vs. IV
     When you have exactly one instrument per endogenous X, 2SLS collapses to the IV estimator. Here we have one instrument, so we can solve for the coefficients directly with the IV formula:

  10. IV Regression in R
      ymat <- clean$wage
      zmat <- matrix(1, nrow = nrow(clean), ncol = 2)   # column of 1s plus the instrument
      zmat[, 2] <- clean$distance
      xmat <- matrix(1, nrow = nrow(clean), ncol = 2)   # column of 1s plus the endogenous X
      xmat[, 2] <- clean$education
      solve(t(zmat) %*% xmat) %*% t(zmat) %*% ymat
      # notice the coefficients are the same as from tsls()!
      When we have 2+ instruments (2SLS), the formula for β̂ becomes more complicated:
      β̂_2SLS = [X^T Z (Z^T Z)^(-1) Z^T X]^(-1) [X^T Z (Z^T Z)^(-1) Z^T Y]
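A sketch of that general formula in matrix form, reusing xmat, zmat, and ymat from above; with one instrument per endogenous X it reduces to the IV estimate just computed.

      # beta_2SLS = [X^T Z (Z^T Z)^(-1) Z^T X]^(-1) [X^T Z (Z^T Z)^(-1) Z^T Y]
      A <- t(xmat) %*% zmat %*% solve(t(zmat) %*% zmat)   # X^T Z (Z^T Z)^(-1)
      solve(A %*% t(zmat) %*% xmat) %*% (A %*% t(zmat) %*% ymat)
      # in this just-identified case, the result matches the IV coefficients above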

  11. RDD in R
      RDD using the Israeli class size data[3]
      Back to the Maimonides' Rule data: classes with 40-49 students get one teacher and an aide; classes with 50+ students get an additional teacher.
      library(foreign)   # read.dta() comes from the foreign package
      setwd()            # set the working directory containing the data file
      final5 <- read.dta("final5.dta")
      # here we're using a bandwidth of +/- 5 students:
      tempdata <- subset(final5, abs(final5$c_size - 50) <= 5)
      summary(tempdata$c_size)
      [3] Data file available on BB.
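A quick visual check that is not on the slide: plot the outcome against the running variable within the bandwidth and mark the cutoff, to see the data the two local regressions will be fit to.

      # scatterplot of outcomes around the 50-student threshold:
      plot(tempdata$c_size, tempdata$avgmath, xlab = "c_size", ylab = "avgmath")
      abline(v = 50, lty = 2)   # Maimonides' Rule cutoff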

  12. RDD in R
      tempabove <- subset(tempdata, c_size > 50 & c_size <= 55)
      tempbelow <- subset(tempdata, c_size < 50 & c_size >= 45)
      # regression for cases above the threshold:
      above.lm <- lm(avgmath ~ c_size, data = tempabove)
      summary(above.lm)
      # regression for cases below the threshold:
      below.lm <- lm(avgmath ~ c_size, data = tempbelow)
      summary(below.lm)
      Note: This slide was updated 5/20/14 with a more helpful example of subsetting on multiple criteria.
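An alternative specification that is not on the slides: fit one model over the whole bandwidth with a centered running variable, a threshold indicator, and their interaction. The coefficient on the indicator is then the estimated jump at c_size = 50 (observations at exactly c_size = 50 count as below the threshold here).

      # pooled RDD specification with separate slopes on each side of the cutoff:
      pooled.lm <- lm(avgmath ~ I(c_size - 50) * I(c_size > 50), data = tempdata)
      summary(pooled.lm)   # the coefficient on I(c_size > 50)TRUE is the discontinuity estimate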

  13. RDD in R
      # fitted value for tempabove at c_size = 50:
      above.predict <- above.lm$coef[1] + above.lm$coef[2] * 50
      # fitted value for tempbelow at c_size = 50:
      below.predict <- below.lm$coef[1] + below.lm$coef[2] * 50
      # causal effect (the jump at the threshold):
      above.predict - below.predict
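The same difference can be obtained with predict(), which avoids typing out the coefficients by hand:

      # equivalent calculation using predict():
      predict(above.lm, newdata = data.frame(c_size = 50)) -
        predict(below.lm, newdata = data.frame(c_size = 50))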

  14. RDD in R
      Bootstrapping the RDD effect estimate
      # make clean dataset:
      clean <- na.omit(data.frame(avgmath = tempdata$avgmath,
                                  c_size = tempdata$c_size))
      # make results vector:
      results <- vector(mode = "numeric", length = 1000)
      for (i in 1:1000) {
        permdata <- clean
        # sample from c_size and store the values:
        permdata$c_size.temp <- sample(permdata$c_size, nrow(permdata), replace = TRUE)
        # create subsets around the threshold and run the models:
        permabove <- subset(permdata, c_size.temp > 50 & c_size.temp <= 55)
        permbelow <- subset(permdata, c_size.temp < 50 & c_size.temp >= 45)
        # (the loop continues on the next slide)

  15. RDD in R
        above.lm.perm <- lm(avgmath ~ c_size.temp, data = permabove)
        below.lm.perm <- lm(avgmath ~ c_size.temp, data = permbelow)
        # store fitted values for each model:
        fitted.above <- above.lm.perm$coef[1] + above.lm.perm$coef[2] * 50
        fitted.below <- below.lm.perm$coef[1] + below.lm.perm$coef[2] * 50
        # store the difference:
        results[i] <- fitted.above - fitted.below
      }
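The loop above resamples c_size on its own. One common variant, sketched here rather than taken from the slides, resamples whole rows of clean so that each avgmath stays paired with its c_size, and then summarizes the resulting distribution of effect estimates.

      # row-resampling variant: keep (avgmath, c_size) pairs together in every draw
      results.rows <- vector(mode = "numeric", length = 1000)
      for (i in 1:1000) {
        boot <- clean[sample(nrow(clean), replace = TRUE), ]
        bootabove <- subset(boot, c_size > 50 & c_size <= 55)
        bootbelow <- subset(boot, c_size < 50 & c_size >= 45)
        above.fit <- lm(avgmath ~ c_size, data = bootabove)
        below.fit <- lm(avgmath ~ c_size, data = bootbelow)
        results.rows[i] <- (above.fit$coef[1] + above.fit$coef[2] * 50) -
                           (below.fit$coef[1] + below.fit$coef[2] * 50)
      }
      quantile(results.rows, c(0.025, 0.975))   # interval around the RDD effect estimate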

  16. RDD in R
      # look at the distribution of effect estimates:
      summary(results)
      hist(results)
      # 95% CI:
      quantile(results, c(0.025, 0.975))
      # p-value: share of resampled effects larger in magnitude than the observed effect
      # (above.predict - below.predict, from slide 13):
      sum(abs(results) > abs(above.predict - below.predict)) / length(results)
