ps 406 week 3 section bootstrapping
play

PS 406 Week 3 Section: Bootstrapping D.J. Flynn April 21, 2014 - PowerPoint PPT Presentation

PS 406 Week 3 Section: Bootstrapping D.J. Flynn April 21, 2014 D.J. Flynn PS406 Week 3 Section Spring 2014 1 / 22 Todays plan Logic of the Bootstrap 1 Bootstrapping in R 2 D.J. Flynn PS406 Week 3 Section Spring 2014 2


  1. PS 406 – Week 3 Section: Bootstrapping D.J. Flynn April 21, 2014 D.J. Flynn PS406 – Week 3 Section Spring 2014 1 / 22

  2. Today’s plan Logic of the Bootstrap 1 Bootstrapping in R 2 D.J. Flynn PS406 – Week 3 Section Spring 2014 2 / 22

  3. Logic of the Bootstrap Logic of the Bootstrap We are sometimes confronted with problems that would be very difficult/time-consuming to solve mathematically. Usually, these problems involve calculating some measure of accuracy for a sample statistic: How well does a ML estimator perform in small samples? Confidence intervals for complicated interaction effects? etc... D.J. Flynn PS406 – Week 3 Section Spring 2014 3 / 22

  4. Logic of the Bootstrap Two variants of the bootstrap Non-parametric bootstrapping: Re-sample (with replacement) from original data, calculate the test statistic 1000+ times, use the variance in estimates as an estimate of variance in original estimator. Parametric bootstrapping: Specify a probability model for the data ( Y ), generate 1000+ samples based on that model, use the variance in estimates as an estimate of variance in original estimator. Let’s take a look at the code to perform each of these, paying attention to the key difference between them... D.J. Flynn PS406 – Week 3 Section Spring 2014 4 / 22

  5. Logic of the Bootstrap Non-parametric bootstrap from class coins <- rbinom(10,1,.5) temp <- matrix(0,nrow=1000,ncol=1) for (i in 1:1000){ temp[i,1] <- sum(sample(coins, length(coins), replace=TRUE))/length(coins) } D.J. Flynn PS406 – Week 3 Section Spring 2014 5 / 22

  6. Logic of the Bootstrap Parametric bootstrap from class temp <- matrix(0,nrow=1000,ncol=1) for (i in 1:1000){ temp[i,1] <- sum( rbinom(10,1, .5))/length(coins) } D.J. Flynn PS406 – Week 3 Section Spring 2014 6 / 22

  7. Bootstrapping in R Bootstrapping in R We use “loops” to tell R what process(es) to repeat The basic process is as follows: make vector/matrix to store results 1 create loop 2 tell R what operations to perform inside the loop 3 store the results in original vector/matrix 4 Let’s take a look at a simple example... D.J. Flynn PS406 – Week 3 Section Spring 2014 7 / 22

  8. Bootstrapping in R mean<-10 sd<-2 result.vec<-vector(mode="numeric",length=1000) for (i in 1:1000) { draws<-rnorm(1000,mean=mean,sd=sd) result.vec[i]<-mean(draws) } summary(result.vec) Min. 1st Qu. Median Mean 3rd Qu. Max. 9.823 9.959 10.000 10.000 10.040 10.220 D.J. Flynn PS406 – Week 3 Section Spring 2014 8 / 22

  9. Bootstrapping in R Let’s practice... D.J. Flynn PS406 – Week 3 Section Spring 2014 9 / 22

  10. Bootstrapping in R Non-parametric boostrapping (problem 1a) max<-10 #per the problem orig.sample<-runif(100,min=0,max=max) #create vector for bias results bias.vec<-vector(mode="numeric",length=1000) #sample from original with replacement, take max, store bias for (i in 1:1000){ boot.sample<-sample(orig.sample,100,replace=TRUE) max.boot<-max(boot.sample) bias.vec[i]<-(max.boot - max(orig.sample)) / max(orig.sample) } D.J. Flynn PS406 – Week 3 Section Spring 2014 10 / 22

  11. Bootstrapping in R #let’s take a look at our bootstrapped estimate of bias: mean(bias.vec) #We know from the lab that the actual bias is: -1 / (1000+1) #Q: How well does the non-parametric boostrap do at estimating bias? ... What about with different sample sizes? D.J. Flynn PS406 – Week 3 Section Spring 2014 11 / 22

  12. Bootstrapping in R Parametric bootstrapping (problem 1b) This problem is similar to the previous one. You’ll need to do a few things differently: draw a sample from the uniform distribution using some a > 0 as the maximum store the max from the original sample inside the loop: draw a sample from the uniform using the max from the original sample as the max. Store the max of the bootstrap sample. Store the bias in your results vector: bias = (mean(bootstrap max) - original max) / original max Try this on your own and let me know if you encounter issues. D.J. Flynn PS406 – Week 3 Section Spring 2014 12 / 22

  13. Bootstrapping in R Parametric bootstrapping with the normal distribution (problem 2) Suppose we have a sample of i.i.d. Normal observations with mean 10 and sd 2. We want to know the bias, variance, and MSE for a given estimator (e.g., mean) in samples of varying sizes. D.J. Flynn PS406 – Week 3 Section Spring 2014 13 / 22

  14. Bootstrapping in R sample.mean<-10 sample.sd<-2 results<-vector(mode="numeric",length=1000) for (i in 1:1000) { draws<-rnorm(100,mean=sample.mean,sd=sample.sd) results[i]<-mean(draws) } #calculate bias: mean(results) - sample.mean [1] 0.002322089 #calculate variance: var(results) [1] 0.03979419 D.J. Flynn PS406 – Week 3 Section Spring 2014 14 / 22

  15. Bootstrapping in R #calculate MSE: mean((results - sample.mean)^2) [1] 0.03975979 Note that for the lab you’ll need to draw two samples inside the loop (one for X and one for Y ) – and then store mean ( X ) mean ( Y ) in your results vector D.J. Flynn PS406 – Week 3 Section Spring 2014 15 / 22

  16. Bootstrapping in R Contaminated Data (problem 3) Suppose we accidentally merge two datasets together: the first is a sample from a Normal population with mean 10 and sd 2, and the second (contaminated) dataset is from a Normal population with mean 500 and variance 50. We’ve calculated some statistics (e.g., SD and IQR), and now we want to know how affected those estimators are by the presence of outliers. We know that the sample is N = 100 and 25% of the data are contaminated.. D.J. Flynn PS406 – Week 3 Section Spring 2014 16 / 22

  17. Bootstrapping in R original<-rnorm(100,mean=10,sd=2) contaminated<-rnorm(100,mean=500,sd=50) results <- matrix(NA, nrow=1000, ncol=2) for (i in 1:1000) { prob.contam<-rbinom(100,1,prob=0.25) actual.dist<-ifelse(prob.contam==1, contaminated, original) iqr<-IQR(actual.dist) sd<-sd(actual.dist) results[i,]<-c(iqr,sd) } D.J. Flynn PS406 – Week 3 Section Spring 2014 17 / 22

  18. Bootstrapping in R #to look at IQR: results[,1] #to look at SD: results[,2] #can calculate RMSE as follows (e.g., for SD): sqrt(mean((results[,2] - 1)^2)) / 1 D.J. Flynn PS406 – Week 3 Section Spring 2014 18 / 22

  19. Bootstrapping in R Bootstrapping Mediation Effects (problem 4) Let’s build a really simple mediation model using the Mali data from last week and use bootstrapping to calculate SE for our estimated effect. For this simple example: same name is the treatment global eval is the mediator vote prefer is the DV D.J. Flynn PS406 – Week 3 Section Spring 2014 19 / 22

  20. Bootstrapping in R data <- read.csv ("http://pantheon.yale.edu/~td244/cross_cutting_apsr_ replicationdata.csv") library(car) data$samename <- recode(mali$treat_assign, "1=0; 2=0; 3=0; 4=0; 5=0; 6=1") data.new <- na.omit(data.frame(global_eval=data$global_eval, samename=data$samename, vote_prefer=data$vote_prefer)) model.m<-lm(global_eval~samename,data=data.new) model.y<-lm(vote_prefer~samename+global_eval,data=data.new) library(mediation) med<-mediate(model.m,model.y,sims=1000,boot=TRUE, boot.ci.type="bca", treat="samename",mediator="global_eval") summary(med) D.J. Flynn PS406 – Week 3 Section Spring 2014 20 / 22

  21. Bootstrapping in R Let’s build a function to perform the bootstrap... myfunction<-function(data, i) { boot <- NULL boot$samename<-data.new$samename[i] boot$vote_prefer<-data.new$vote_prefer[i] boot$global_eval<-data.new$global_eval[i] model.m<-lm(global_eval~samename,data=boot) model.y<-lm(vote_prefer~samename+global_eval,data=boot) med<-mediate(model.m,model.y,boot=FALSE, boot.ci.type= "bca", treat="samename",mediator="global_eval") } D.J. Flynn PS406 – Week 3 Section Spring 2014 21 / 22

  22. Bootstrapping in R install.packages("boot") library(boot) #NOTE: number of reps cannot be < N #NOTE: this could take a few minutes. If it takes longer #(hours), let me know -- It’s possible to do the analysis #on a sub-set of cases. boot.result<-boot(data.new,myfunction,R=700) boot.ci(boot.result) D.J. Flynn PS406 – Week 3 Section Spring 2014 22 / 22

Recommend


More recommend