PS 406 – Week 4 Section: Matching and GLMs for Binary Outcomes
D.J. Flynn
April 23, 2014
Matching

An intuitive solution to the problem of confounding? Kind of. Relative to experiments:
- We lose efficiency because we need to estimate more parameters (e.g., when calculating p-scores).
- SEs aren't straightforward (bootstrapping).
- Regression often does a better job of replicating experimental findings than matching estimators (Peikes et al. 2008).
- (My two cents:) There are a lot of decisions to make. Which covariates to match on? Which matching estimator? Etc.

Really good overview: sekhon.polisci.berkeley.edu/papers/annualreview.pdf
Setting up the data

library(car)  # recode() comes from the car package

framing <- read.csv("~/Downloads/framing-exp-data.csv")

# Running example: effect of PID on support for renewables
framing$support.renew <- recode(framing$renewables, "1:4=0; 5:7=1", as.factor.result=FALSE)
framing$dem <- recode(framing$party, "1=1; else=0", as.factor.result=FALSE)

framing.new <- na.omit(data.frame(TA=framing$TA, condition=framing$condition,
    renewables=framing$renewables, gmf=framing$gmf, sex=framing$sex,
    year=framing$year, party=framing$party, understand=framing$understand,
    interest=framing$interest, dem=framing$dem, support.renew=framing$support.renew))
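A quick cross-tab of the original and recoded variables is a cheap way to confirm the recodes did what you intended; a minimal sketch using the columns kept in framing.new:

# Confirm the support.renew cutpoint (1-4 -> 0, 5-7 -> 1) and the dem recode
with(framing.new, table(renewables, support.renew))
with(framing.new, table(party, dem))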
Estimating GLMs in R

# Let's use logit:
logit <- glm(support.renew ~ as.factor(TA) + understand + interest,
             family=binomial(link="logit"), data=framing.new)
summary(logit)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)       1.4905     2.1122   0.706   0.4804
as.factor(TA)2   -0.5258     0.9769  -0.538   0.5905
as.factor(TA)3   -0.6315     0.9904  -0.638   0.5237
understand        0.9575     0.4500   2.128   0.0334 *
interest         -0.7423     0.7192  -1.032   0.3020
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
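To see what these coefficients imply on the probability scale, you can compare predicted probabilities at different covariate values. A minimal sketch, assuming the logit object above; the covariate values (TA fixed at 1, interest at its sample mean) are chosen purely for illustration:

# Predicted Pr(support.renew = 1) at the lowest vs. highest observed values of
# 'understand', holding TA at 1 and interest at its mean
lo <- data.frame(TA = 1,
                 understand = min(framing.new$understand),
                 interest = mean(framing.new$interest))
hi <- data.frame(TA = 1,
                 understand = max(framing.new$understand),
                 interest = mean(framing.new$interest))
predict(logit, newdata = lo, type = "response")
predict(logit, newdata = hi, type = "response")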
Propensity scores

Recall that a propensity score is the probability that a given unit is assigned to treatment, conditional on covariates: Pr(D_i = T | X_i).

framing.new$pscore <- logit$fitted.values
summary(framing.new$pscore)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6335  0.9214  0.9288  0.9200  0.9619  0.9829

hist(framing.new$pscore)
plot(framing.new$pscore)
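Before matching, it is worth checking how much the p-score distributions overlap across the groups that will serve as treated and control below (dem = 1 vs. dem = 0); a minimal sketch:

# Side-by-side histograms of p-scores for Democrats (treated) and others (control)
par(mfrow = c(2, 1))
hist(framing.new$pscore[framing.new$dem == 1], main = "dem = 1 (treated)", xlab = "p-score")
hist(framing.new$pscore[framing.new$dem == 0], main = "dem = 0 (control)", xlab = "p-score")
par(mfrow = c(1, 1))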
Now we can proceed with matching...

install.packages("Matching")
library(Matching)
Pairwise matching

Pairwise matching matches each treated case to the control case(s) with the closest p-score.

match <- with(framing.new, Match(Y=support.renew, Tr=dem, X=pscore, estimand="ATT"))
summary(match)

Estimate...  0.20402
AI SE......  0.056228
T-stat.....  3.6285
p.val......  0.00028506

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   58
Matched number of observations  (unweighted).  221
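Beyond summary(), the Match object stores the pieces you may want to work with directly; a short sketch (component names follow the Matching package documentation):

# ATT estimate and Abadie-Imbens SE
match$est
match$se

# Row indices (into framing.new) of each matched treated/control pair
head(cbind(treated = match$index.treated, control = match$index.control))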
Let's check the quality of our matches...

MatchBalance(dem ~ as.factor(TA) + understand + interest,
             data=framing.new, match.out=match)

[Long list of balance statistics]

- Successful matching = similar means for treated and control cases.
- Use p-values and the bootstrapped KS test to gauge balance.
- The Kolmogorov–Smirnov statistic is a non-parametric test for the equality of two sample distributions (cf. the t-test, which compares only means).
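For a rough, by-hand version of the same idea, compare covariate means among matched treated and matched control cases using the indices stored in the Match object (this ignores match weights, so treat it as a sanity check rather than a replacement for MatchBalance()):

# Post-matching means of 'understand' for treated vs. control cases
mean(framing.new$understand[match$index.treated])
mean(framing.new$understand[match$index.control])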
Caliper matching

Caliper matching specifies a maximum acceptable distance between propensity scores (e.g., we don't want to match two cases that are very different, even if the match is the "best" available in the data). The caliper is expressed in standard deviations of the matching variable (here, the p-score).

caliper <- with(framing.new, Match(Y=support.renew, Tr=dem, X=pscore,
                                   estimand="ATT", caliper=0.10))
summary(caliper)

Estimate...  0.1891
AI SE......  0.049145
T-stat.....  3.8479
p.val......  0.00011916

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   52
Matched number of observations  (unweighted).  215

Caliper (SDs)................................   0.1
Number of obs dropped by 'exact' or 'caliper'    6
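The caliper width is a judgment call, so it can be worth checking how much the estimate moves as you tighten or loosen it; a minimal sketch (the caliper values here are arbitrary):

# Re-run the match at several caliper widths and compare ATT estimates
for (cal in c(0.05, 0.10, 0.25, 0.50)) {
  m <- with(framing.new, Match(Y = support.renew, Tr = dem, X = pscore,
                               estimand = "ATT", caliper = cal))
  cat("caliper =", cal, " ATT =", round(m$est, 3), "\n")
}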
Common support matching

Common support matching computes the range over which the propensity scores of treated and control cases overlap; cases with p-scores outside this range are dropped. Caliper matching generally deals with poor matches better, because it also throws away badly matched cases inside the overlap region ("inliers"), not just those outside it. The package documentation on CRAN puts it bluntly: "Seriously, don't use it [common support matching]."

common <- with(framing.new, Match(Y=support.renew, Tr=dem, X=pscore,
                                  estimand="ATT", CommonSupport=TRUE))
summary(common)

Estimate...  0.17879
AI SE......  0.052167
T-stat.....  3.4272
p.val......  0.00060973

Original number of observations..............   96
Original number of treated obs...............   55
Matched number of observations...............   55
Matched number of observations  (unweighted).  217
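To see what the common-support restriction is doing, you can compute the overlap region by hand; a minimal sketch (this mirrors the usual definition of common support, though Match()'s internal bookkeeping may differ slightly):

# Overlap region: from the larger of the two group minima to the smaller of the two group maxima
cs.lower <- max(tapply(framing.new$pscore, framing.new$dem, min))
cs.upper <- min(tapply(framing.new$pscore, framing.new$dem, max))
c(lower = cs.lower, upper = cs.upper)

# Cases outside this range are the ones that get dropped
sum(framing.new$pscore < cs.lower | framing.new$pscore > cs.upper)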
Bias adjustment

Bias-adjusted matching uses a regression adjustment to correct for remaining covariate differences between matched treated and control cases, improving the properties of the matching estimator. (Unlike OLS under its usual assumptions, matching estimators are not unbiased in general.)

bias.adj <- with(framing.new, Match(Y=support.renew, Tr=dem, X=pscore,
                                    estimand="ATT", BiasAdjust=TRUE))
summary(bias.adj)

Estimate...  0.21874
AI SE......  0.05897
T-stat.....  3.7094
p.val......  0.00020776

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   58
Matched number of observations  (unweighted).  221
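With several specifications estimated, it helps to line the results up side by side; a short sketch using the objects created above:

# Collect ATT estimates and Abadie-Imbens SEs from each specification
ests <- sapply(list(pairwise = match, caliper = caliper, common = common, bias.adj = bias.adj),
               function(m) c(ATT = m$est, AI.SE = m$se))
round(ests, 3)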
Exact matching

Under exact matching, only cases with identical values of the matching variable(s) (here, the p-score) are matched; unmatched cases are discarded. In practice you can specify which covariates require exact matches (e.g., discrete covariates, not continuous ones); see the sketch after the output on the next slide.

exact <- with(framing.new, Match(Y=support.renew, Tr=dem, X=pscore,
                                 estimand="ATT", exact=TRUE))
summary(exact)

Estimate...  0.16667
AI SE......  0.043309
T-stat.....  3.8483
p.val......  0.00011893

Original number of observations..............  100
Original number of treated obs...............   58
Matched number of observations...............   47
Matched number of observations  (unweighted).  204

Number of obs dropped by 'exact' or 'caliper'   11
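As noted on the previous slide, you would more often require exact matches on particular discrete covariates while matching approximately on the rest. A hedged sketch: the choice of sex as the exact-match covariate is purely illustrative.

# Exact match on sex; nearest-neighbor (approximate) match on the p-score
exact.sex <- with(framing.new,
                  Match(Y = support.renew, Tr = dem, X = cbind(pscore, sex),
                        estimand = "ATT", exact = c(FALSE, TRUE)))
summary(exact.sex)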
Rosenbaum sensitivity analysis

Matching relies on the propensity score, which we estimated using a vector of covariates X specified a priori. We therefore want to check how sensitive our effect estimates are to potential unobserved confounders, that is, variables that affect assignment to treatment but were left out of X. Rosenbaum sensitivity analysis helps us do this: it shows how our results would change under different values of a sensitivity parameter (Gamma, the odds of differential treatment assignment due to unobserved factors).

A short, readable paper on RSA: www.personal.psu.edu/ljk20/rbounds%20vignette.pdf
Sensitivity analysis

For binary outcomes:

library(rbounds)
binarysens()

For continuous outcomes:

library(rbounds)
psens()
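A minimal sketch of how a call might look, assuming (as I recall from the rbounds documentation) that psens() accepts a Match object directly; the Gamma settings are arbitrary, so confirm the interface with ?psens and ?binarysens before relying on this:

library(rbounds)

# Sensitivity of the pairwise matching estimate to hidden bias, testing
# Gamma from 1 to 2 in steps of 0.1 (illustrative values only)
psens(match, Gamma = 2, GammaInc = 0.1)

In our running example the outcome is binary, so binarysens() is the relevant function, as the next slide shows.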
Back to our data: What do we see?

binarysens(bias.adj)

Rosenbaum Sensitivity Test

Unconfounded estimate ....  0

Gamma  Lower bound  Upper bound
    1            0      0.00000
    2            0      0.00000
    3            0      0.00001
    4            0      0.00011
    5            0      0.00057
    6            0      0.00180

Note: Gamma is Odds of Differential Assignment To Treatment Due to Unobserved Factors
Closing thoughts: GLMs for binary outcomes

- We used logit to estimate propensity scores. We also could have used a linear probability model (OLS), probit, complementary log-log, or others.
- The key takeaway is to use one of these models (not OLS) when your DV is binary, so you don't get predicted probabilities outside [0, 1] (e.g., Pr(turnout) = 1.23???).
- When you estimate one of these, you're no longer modeling E[Y] as a linear function of the covariates. Instead you're modeling Pr(Y_i = 1 | X_i).
- I stick with one (usually logit) so that I can use the "divide by 4" rule to get a rough sense of effect sizes. But always present substantive effects for readers; otherwise, who cares about the effect of a given X on the log odds of Y?
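As a quick illustration of the divide-by-4 rule with the logit from earlier: the coefficient on understand was about 0.96, so a one-unit increase in understand shifts Pr(support) by at most roughly 0.96/4 ≈ 0.24, with the approximation best for probabilities near 0.5.

# Divide-by-4 rule: upper bound on the change in Pr(Y = 1) per unit change in X
coef(logit)["understand"] / 4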