Bootstrapping 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom

Agenda
Empirical bootstrap
Parametric bootstrap
June 9, 2014

Resampling
Sample (size 6): 1 2 1 5 1 12
Resample by choosing k uniformly between 1 and 6 and taking the kth element.
Resample (size 10): 5 1 1 1 12 1 2 1 1 5
A bootstrap (re)sample is always the same size as the original sample:
Bootstrap sample (size 6): 5 1 1 1 12 1

Empirical bootstrap confidence intervals
Use the data to estimate the variation of estimates based on the data!
Data: $x_1, \ldots, x_n$ drawn from a distribution $F$.
Estimate a feature $\theta$ of $F$ by a statistic $\hat{\theta}$.
Generate many bootstrap samples $x_1^*, \ldots, x_n^*$.
Compute the statistic $\theta^*$ for each bootstrap sample.
Compute the bootstrap difference $\delta^* = \theta^* - \hat{\theta}$.
Use the quantiles of $\delta^*$ to approximate quantiles of $\delta = \hat{\theta} - \theta$.
Set a confidence interval $[\hat{\theta} - \delta^*_{1-\alpha/2},\ \hat{\theta} - \delta^*_{\alpha/2}]$ ($\delta_{\alpha/2}$ is the $\alpha/2$ quantile.)
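The form of the interval in the last step follows from a short rearrangement. The sketch below assumes the quantiles of $\delta$ are known and that $\delta$ has a continuous distribution, so the first equality is exact; the bootstrap then substitutes the quantiles of $\delta^*$ for those of $\delta$.

```latex
\begin{align*}
1 - \alpha
  &= P\left(\delta_{\alpha/2} \le \hat{\theta} - \theta \le \delta_{1-\alpha/2}\right) \\
  &= P\left(\hat{\theta} - \delta_{1-\alpha/2} \le \theta \le \hat{\theta} - \delta_{\alpha/2}\right)
\end{align*}
```

Note that subtracting flips the order: the upper quantile of $\delta$ gives the lower endpoint of the interval.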

Concept question
Consider finding bootstrap confidence intervals for
I. the mean  II. the median  III. the 47th percentile.
Which is easiest to find?
A. I  B. II  C. III  D. I and II  E. II and III  F. I and III  G. I and II and III
answer: G. The program is essentially the same for all three statistics. All that needs to change is the code for computing the specific statistic.
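The point of the answer can be sketched in R. The function name bootstrap.ci, its arguments, the seed, and the choice of nboot = 1000 are all illustrative assumptions, not part of the course code; the data are the size-6 sample from the resampling slide. Only the statistic argument changes between the three cases.

```r
# Hypothetical helper: empirical bootstrap CI for any statistic.
bootstrap.ci = function(x, statistic, nboot = 1000, alpha = 0.05) {
  n = length(x)
  thetahat = statistic(x)
  # delta* = theta* - thetahat for each bootstrap sample
  dstar = replicate(nboot, statistic(sample(x, n, replace = TRUE)) - thetahat)
  # quantiles in the order (1 - alpha/2, alpha/2), so the result is (lower, upper)
  q = quantile(dstar, c(1 - alpha/2, alpha/2), names = FALSE)
  thetahat - q
}

set.seed(1)                      # arbitrary seed, only for reproducibility
x = c(1, 2, 1, 5, 1, 12)
ci.mean   = bootstrap.ci(x, mean)
ci.median = bootstrap.ci(x, median)
ci.p47    = bootstrap.ci(x, function(v) quantile(v, 0.47, names = FALSE))
print(rbind(ci.mean, ci.median, ci.p47))
```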

Board question
Data: 3 8 1 8 3 3
Bootstrap samples (each column is one bootstrap trial):
8 3 3 8 1 3 8 3
1 1 8 3 3 3 3 1
3 8 3 8 3 1 3 3
1 3 8 3 8 3 1 3
3 3 3 8 3 3 3 3
3 1 3 3 1 3 3 3
Compute a 75% confidence interval for the mean.
Compute a 75% confidence interval for the median.

Solution
$\bar{x} = 4.33$
$\bar{x}^*$: 3.17 3.17 4.67 5.50 3.17 2.67 3.50 2.67
$\delta^*$: -1.17 -1.17 0.33 1.17 -1.17 -1.67 -0.83 -1.67
Sort: -1.67 -1.67 -1.17 -1.17 -1.17 -0.83 0.33 1.17
So, $\delta^*_{.125} = -1.67$, $\delta^*_{.875} = 0.75$. (For $\delta^*_{.875}$ we took the average of the top two values; there are other reasonable choices.)
75% CI: $[\bar{x} - 0.75,\ \bar{x} + 1.67] = [3.58,\ 6.00]$
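The arithmetic for the mean can be checked in R. This is a sketch, not the course's solution code: the variable names are assumptions, and the quantiles are picked by hand to match the slide's convention. R's quantile() uses a different default interpolation rule and would give a somewhat different upper quantile.

```r
x = c(3, 8, 1, 8, 3, 3)
# Bootstrap samples from the board question; each column is one trial
boot = matrix(c(8, 3, 3, 8, 1, 3, 8, 3,
                1, 1, 8, 3, 3, 3, 3, 1,
                3, 8, 3, 8, 3, 1, 3, 3,
                1, 3, 8, 3, 8, 3, 1, 3,
                3, 3, 3, 8, 3, 3, 3, 3,
                3, 1, 3, 3, 1, 3, 3, 3), nrow = 6, byrow = TRUE)

xbar = mean(x)
xbar.star = colMeans(boot)          # mean of each bootstrap sample
dstar = sort(xbar.star - xbar)      # sorted bootstrap differences

d.lo = dstar[1]                     # slide's choice for the .125 quantile
d.hi = mean(dstar[7:8])             # average of the top two values (.875 quantile)
ci = c(xbar - d.hi, xbar - d.lo)
print(round(ci, 2))
```

Replacing colMeans(boot) with apply(boot, 2, median) and mean(x) with median(x) adapts the same steps to the median.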

Resampling in R

    # This code reminds you how to use the R function sample() to resample data.

    # an arbitrary array
    x = c(3, 5, 7, 9, 11, 13)
    n = length(x)

    # Take a bootstrap sample from x
    resample.bs = sample(x, n, replace=TRUE)
    print(resample.bs)

    # Print the 3rd and 5th elements in resample.bs
    resample.bs[c(3,5)]

Parametric bootstrapping
Use the data to estimate a parameter. Use the parameter to estimate the variation of the parameter estimate.
Data: $x_1, \ldots, x_n$ drawn from a distribution $F(\theta)$.
Estimate $\theta$ by a statistic $\hat{\theta}$.
Generate many bootstrap samples from $F(\hat{\theta})$.
Compute $\theta^*$ for each bootstrap sample.
Compute the difference from the estimate $\delta^* = \theta^* - \hat{\theta}$.
Use quantiles of $\delta^*$ to approximate quantiles of $\delta = \hat{\theta} - \theta$.
Use the quantiles to define a confidence interval.

Parametric sampling in R

    # an arbitrary array from binomial(15, theta) for an unknown theta
    x = c(3, 5, 7, 9, 11, 13)
    binomSize = 15
    n = length(x)
    thetaHat = mean(x)/binomSize
    parametricSample = rbinom(n, binomSize, thetaHat)
    print(parametricSample)

Board question
Data: 6 5 5 5 7 4 ∼ binomial(8, θ)
1. Estimate θ.
2. Write out the R code to generate data of 100 parametric bootstrap samples and compute an 80% confidence interval for θ. (You will want to make use of the R function quantile().)
Solution on next slide

Solution
Data: x = 6 5 5 5 7 4
1. Since θ is the expected fraction of heads for each binomial, we make the estimate $\hat{\theta}$ = mean(x)/8 = average fraction of heads in each binomial trial.
$\hat{\theta} = .667$
Parametric bootstrap sample: One bootstrap sample is 6 draws from a binomial(8, $\hat{\theta}$) distribution.
The R code is on the next slides. We generate bootstrap data and compute $\delta^*$. The quantiles we need are $\delta^*_{.1}$ and $\delta^*_{.9}$.
The bootstrap principle says $\delta_p \approx \delta^*_p$.
The 80% confidence interval is $[\hat{\theta} - \delta^*_{.9},\ \hat{\theta} - \delta^*_{.1}]$.
(Notice we are using quantiles, not critical values, here.)
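Written out, the estimate is just the total number of successes divided by the total number of tosses across the 6 data points:

```latex
\[
\hat{\theta} = \frac{\bar{x}}{8}
  = \frac{6 + 5 + 5 + 5 + 7 + 4}{6 \cdot 8}
  = \frac{32}{48}
  = \frac{2}{3} \approx 0.667
\]
```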

R code for parametric bootstrap

    binomSize = 8                  # number of 'coin tosses' in each binomial trial
    x = c(6, 5, 5, 5, 7, 4)        # given data
    n = length(x)                  # number of data points
    thetahat = mean(x)/binomSize   # estimate of θ

    # Compute δ* for 100 parametric bootstrap samples
    nboot = 100
    dstar.list = rep(0, nboot)
    for (j in 1:nboot) {
      # Generate a parametric bootstrap sample and compute δ*
      xstar = rbinom(n, binomSize, thetahat)
      thetastar = mean(xstar)/binomSize
      dstar.list[j] = thetastar - thetahat
    }

(continued)

R code continued

    # compute the confidence interval
    alpha = .2
    dstar_alpha2 = quantile(dstar.list, alpha/2, names=FALSE)
    dstar_1minusalpha2 = quantile(dstar.list, 1-alpha/2, names=FALSE)
    # lower endpoint uses the upper quantile, upper endpoint uses the lower quantile
    CI = thetahat - c(dstar_1minusalpha2, dstar_alpha2)
    print(CI)

Preview of linear regression
Fit lines or polynomials to bivariate data
Model: $y = f(x) + E$, where $f(x)$ is a function and $E$ is random error.
Example: $y = ax + b + E$
Example: $y = ax^2 + bx + c + E$
Example: $y = e^{ax + b + E}$
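As a rough preview of fitting the first two models in R, here is a minimal sketch using lm(). The data are simulated, and the true values a = 2, b = 1, the noise level, and the seed are arbitrary assumptions made only for illustration.

```r
set.seed(18)                       # arbitrary seed, only for reproducibility
x = 1:20
E = rnorm(length(x))               # simulated random error
y = 2*x + 1 + E                    # simulated data from y = ax + b + E

fit.line = lm(y ~ x)               # fit y = ax + b + E
fit.quad = lm(y ~ x + I(x^2))      # fit y = ax^2 + bx + c + E
print(coef(fit.line))
print(coef(fit.quad))
```

The third example becomes linear after taking logs, since $\ln y = ax + b + E$, so for positive y the same approach applies with lm(log(y) ~ x).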

