Stat 451 Lecture Notes 08: Bootstrap
Ryan Martin, UIC (www.math.uic.edu/~rgmartin)
Based on Chapter 9 in Givens & Hoeting and Chapter 24 in Lange
Updated: April 4, 2016
Outline
1. Introduction
2. Nonparametric bootstrap
3. Parametric bootstrap
4. Bootstrap in regression
5. Better bootstrap CIs
6. Remedies for bootstrap failure
7. Further remarks
Motivation

For hypothesis testing and confidence intervals, one needs the sampling distribution of some statistic. For example, to test H_0: \mu = \mu_0 based on a sample from a N(\mu, \sigma^2) population, we use the t-statistic

    T = \frac{\bar X - \mu_0}{S / \sqrt{n}},

whose null distribution (under the stated conditions) is Student-t. But almost any deviation from this basic setup leads to a tremendously difficult distributional calculation. The goal of the bootstrap is to give a simple approximate solution based on simulation.
Notation

For a distribution with cdf F, suppose we are interested in a parameter \theta = \varphi(F), written as a functional of F. Examples:
- Mean: \varphi(F) = \int x \, dF(x);
- Median: \varphi(F) = \inf\{x : F(x) \geq 0.5\};
- ...

Given data X = \{X_1, \ldots, X_n\} from F, the empirical cdf is

    \hat F(x) = \frac{1}{n} \sum_{i=1}^n I\{X_i \leq x\}, \quad x \in \mathbb{R}.

Then a natural estimate of \theta is \hat\theta = \varphi(\hat F), the same functional applied to the empirical cdf.
Notation (cont.)

For inference, some statistic T(X, F) is used; e.g.,

    T(X, F) = \frac{\bar X - \mu_0}{S / \sqrt{n}}.

Again, the sampling distribution of T(X, F) may be very complicated, unknown, or may depend on the unknown F.

Bootstrap Idea: Replace the unknown cdf F with the empirical cdf \hat F. Produce a numerical approximation of the sampling distribution of T(X, F) by repeated sampling from \hat F.
Notation (cont.)

Let X^\star = \{X^\star_1, \ldots, X^\star_n\} be an iid sample from \hat F, i.e., a sample of size n drawn with replacement from X. Given X^\star, the statistic T^\star = T(X^\star, \hat F) can be evaluated. Repeated sampling of X^\star gives a sequence of T^\star's which can be used to approximate the distribution of T(X, F). For example,

    V\{T(X, F)\} \approx \mathrm{Var}\{T^\star_1, \ldots, T^\star_B\}.

Why should the bootstrap work? The Glivenko-Cantelli theorem says that \hat F \to F as n \to \infty. So, iid sampling from \hat F should be approximately the same as iid sampling from F when n is large.^3

^3 Lots of difficult theoretical work has been done to determine what it means for this approximation to be good and in what kinds of problems it fails.
Basic setup

The procedure above is essentially the nonparametric bootstrap. The sampling distribution of T(X, F) is approximated directly by the empirical distribution of the bootstrap sample T^\star_1, \ldots, T^\star_B. For example:
- Quantiles of T(X, F) are approximated by sample quantiles of T^\star_1, \ldots, T^\star_B.
- The variance of T(X, F) is approximated by the sample variance of T^\star_1, \ldots, T^\star_B.

The bootstrap sample size is usually rather large, e.g., B ~ 1000, but computationally manageable.
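This recipe takes only a few lines. The slides' own computations are in R (see the set.seed(77) footnote later); the following is a hypothetical Python translation, with the helper name, toy data, and B chosen purely for illustration:

```python
import random
import statistics

def bootstrap(data, stat, B=1000, seed=0):
    """Approximate the sampling distribution of stat(data) by
    evaluating stat on B with-replacement resamples of the data."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(B)]

# Toy data; the statistic here is the sample median.
data = [2.1, 3.4, 1.7, 5.0, 2.8, 3.9, 4.2, 2.5, 3.1, 4.8]
t_star = sorted(bootstrap(data, statistics.median, B=2000))

boot_var = statistics.variance(t_star)       # approximates Var of the median
lo, hi = t_star[int(0.025 * 2000)], t_star[int(0.975 * 2000)]
print(boot_var, lo, hi)                      # variance and 2.5%/97.5% quantiles
```

The same `bootstrap` helper works for any statistic computable from a resample; only the `stat` argument changes.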
Example: variance of a sample median

Example 29.4 in DasGupta (2008). X_1, \ldots, X_n iid Cauchy with median \mu. The sample mean \bar X is a bad estimator of \mu (why?), so use the sample median M_n instead. For odd n, say n = 2k + 1, there is an exact formula:

    V(M_n) = \frac{2\, n!}{(k!)^2 \pi^n} \int_0^{\pi/2} x^k (\pi - x)^k (\cot x)^2 \, dx.

A CLT-type asymptotic approximation is \tilde V(M_n) = \pi^2 / (4n). What about a bootstrap approximation?
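As a check on the two displayed formulas, the exact integral can be evaluated numerically. The routine below is a sketch using composite Simpson's rule in pure Python; the function name and grid size are my own choices:

```python
import math

def exact_var_median_cauchy(n, N=20000):
    """Exact V(M_n) for the median of n = 2k+1 iid standard Cauchy
    observations, via composite Simpson's rule on (0, pi/2)."""
    assert n % 2 == 1 and N % 2 == 0
    k = (n - 1) // 2
    const = 2 * math.factorial(n) / (math.factorial(k) ** 2 * math.pi ** n)
    def f(x):
        if x == 0.0 or x == math.pi / 2:
            return 0.0                 # integrand vanishes at both endpoints
        return x ** k * (math.pi - x) ** k / math.tan(x) ** 2
    h = (math.pi / 2) / N
    s = f(0.0) + f(math.pi / 2)
    for i in range(1, N):
        s += (4 if i % 2 else 2) * f(i * h)
    return const * s * h / 3

v_exact = exact_var_median_cauchy(21)      # ≈ 0.1367, matching the next slide
v_clt = math.pi ** 2 / (4 * 21)            # ≈ 0.1175
print(v_exact, v_clt)
```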
Example: variance of sample median (cont.)

With n = 21 we have V(M_n) = 0.1367 and \tilde V(M_n) = 0.1175. A bootstrap^4 estimate of V(M_n) using B = 5000 is

    \hat V(M_n)_{boot} = 0.1102.

A slight under-estimate of the variance... The main point is that we got a pretty good answer with essentially no effort; the computer does all the hard work.

^4 Note that I used set.seed(77) in the code...
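For reference, a nonparametric bootstrap version of this calculation might look as follows in Python. The slides' actual code is R with set.seed(77); the seed and Cauchy sampler below are my own stand-ins, so the numerical result will differ from 0.1102:

```python
import math
import random
import statistics

rng = random.Random(77)    # analogous to the slides' set.seed(77), but a
                           # different generator, so results will not match

# One observed sample of n = 21 standard Cauchy variates (inverse-cdf method)
n = 21
x = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

# Nonparametric bootstrap: resample x with replacement, recompute the median
B = 5000
m_star = [statistics.median(rng.choices(x, k=n)) for _ in range(B)]
v_boot = statistics.variance(m_star)
print(v_boot)    # compare with the exact value V(M_n) = 0.1367
```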
Technical points

What does it mean for the bootstrap to "work"?
- H_n(x) is the true distribution function of \hat\theta_n.
- H^\star_n(x) is the distribution function of the bootstrap estimator \hat\theta^\star_n.
- The bootstrap is "consistent" if the distance between H_n(x) and H^\star_n(x) converges to 0 (in probability) as n \to \infty.

The bootstrap is successful in many problems, but there are known situations where it may fail:
- the support depends on the parameter;
- the true parameter sits on the boundary of the parameter space;
- the estimator's convergence rate differs from n^{-1/2};
- ...

The bootstrap can detect skewness in the distribution of \hat\theta_n while CLT-type approximations cannot; it often has a "second-order accuracy" property. The bootstrap often underestimates variances.
Bootstrap confidence intervals (CIs)

A primary application of the bootstrap is to construct CIs. The simplest approach is the percentile method. Let \hat\theta^\star_1, \ldots, \hat\theta^\star_B be a bootstrap sample of point estimators. A two-sided 100(1 - \alpha)% bootstrap percentile CI is

    [\xi^\star_{\alpha/2}, \xi^\star_{1-\alpha/2}],

where \xi^\star_p is the 100p-th percentile of the bootstrap sample. Simple and intuitive, but there are "better" methods.
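A percentile interval takes only a few lines. Here is a minimal Python sketch; the helper name, toy data, and B are invented for illustration:

```python
import random
import statistics

def percentile_ci(data, stat, alpha=0.05, B=2000, seed=1):
    """Two-sided 100(1-alpha)% bootstrap percentile interval for stat."""
    rng = random.Random(seed)
    n = len(data)
    t = sorted(stat(rng.choices(data, k=n)) for _ in range(B))
    lo = t[int((alpha / 2) * B)]
    hi = t[min(B - 1, int((1 - alpha / 2) * B))]
    return lo, hi

data = [4.1, 5.2, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.8]
ci = percentile_ci(data, statistics.mean)
print(ci)    # 95% percentile interval for the population mean
```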
Definition

The parametric bootstrap is a variation on the standard (nonparametric) bootstrap discussed previously. Let F = F_\theta be a parametric model. The parametric bootstrap replaces sampling iid from \hat F with sampling iid from F_{\hat\theta}, where \hat\theta is some estimator of \theta. It is potentially more complicated than the nonparametric bootstrap because sampling from F_{\hat\theta} might be more difficult than sampling from \hat F.
Example: variance of sample median

X_1, \ldots, X_n iid Cauchy with median \mu. Write M_n for the sample median. The parametric bootstrap samples X^\star_1, \ldots, X^\star_n from a Cauchy distribution with median M_n. Using B = 5000, the parametric bootstrap gives

    \hat V(M_n)_{p\text{-}boot} = 0.1356.

A bit closer to the true variance, V(M_n) = 0.1367, compared to the nonparametric bootstrap.
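A Python sketch of this parametric bootstrap follows. Note one assumption the slides leave implicit: the Cauchy scale is treated as known and equal to 1, so only the location is plugged in:

```python
import math
import random
import statistics

rng = random.Random(77)
n, B = 21, 5000

def rcauchy(loc, rng):
    """Inverse-cdf draw from Cauchy(location=loc, scale=1)."""
    return loc + math.tan(math.pi * (rng.random() - 0.5))

x = [rcauchy(0.0, rng) for _ in range(n)]   # observed sample, true median 0
m = statistics.median(x)

# Parametric bootstrap: draw fresh Cauchy samples centered at the estimate
m_star = [statistics.median([rcauchy(m, rng) for _ in range(n)])
          for _ in range(B)]
v_pboot = statistics.variance(m_star)
print(v_pboot)    # close to the exact V(M_n) = 0.1367, up to Monte Carlo error
```

Because Cauchy(m, 1) is a pure location shift of Cauchy(0, 1), the bootstrap medians here have exactly the distribution of M_n shifted by m, which is why the parametric bootstrap does so well in this example.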
Example: random effect model

Hierarchical model:

    \mu_1, \ldots, \mu_n iid ~ N(\lambda, \psi^2),
    Y_i | \mu_i ~ N(\mu_i, \sigma_i^2), i = 1, \ldots, n.

Parameters (\lambda, \psi) are unknown but the \sigma_i's are known. The parameter of interest is \psi \geq 0, and values \psi \approx 0 are of interest because they suggest homogeneity. Non-hierarchical version: Y_i ind~ N(\lambda, \sigma_i^2 + \psi^2), i = 1, \ldots, n. Can estimate \psi via maximum likelihood. Use the parametric bootstrap to get confidence intervals?
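To make the boundary issue concrete, here is a hypothetical simulation sketch. It uses a simple truncated moment estimator of \psi in place of the MLE, and made-up values for \lambda and the \sigma_i's, so it only illustrates the mechanics, not the slides' exact experiment:

```python
import math
import random
import statistics

rng = random.Random(0)
n = 100
lam, psi = 1.0, n ** -0.5                        # psi near the boundary 0
sigma = [0.5 + rng.random() for _ in range(n)]   # "known" sigma_i's (made up)

# Marginal model: Y_i ~ N(lam, sigma_i^2 + psi^2), independent
y = [rng.gauss(lam, math.sqrt(s * s + psi * psi)) for s in sigma]

def psi_hat(y, sigma):
    """Moment estimator of psi, truncated at the boundary 0
    (a stand-in for the MLE used in the slides)."""
    ybar = statistics.mean(y)
    resid2 = statistics.mean((yi - ybar) ** 2 for yi in y)
    return math.sqrt(max(0.0, resid2 - statistics.mean(s * s for s in sigma)))

# Parametric bootstrap percentile interval for psi
B = 1000
ph, lh = psi_hat(y, sigma), statistics.mean(y)
psi_star = sorted(
    psi_hat([rng.gauss(lh, math.sqrt(s * s + ph * ph)) for s in sigma], sigma)
    for _ in range(B)
)
ci = (psi_star[int(0.025 * B)], psi_star[int(0.975 * B)])
frac_zero = sum(p == 0.0 for p in psi_star) / B
print(ci, frac_zero)    # a chunk of bootstrap mass piles up at psi = 0
```

The pile-up of bootstrap estimates at exactly 0 is the boundary effect behind the low coverages reported on the next slide.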
Example: random effect model (cont.)

Want to see what happens when \psi \approx 0. Take \psi = n^{-1/2}, near the boundary of \psi \geq 0. Two-sided 95% parametric bootstrap percentile intervals have pretty low coverage in this case, even for large n. It is possible to get intervals with exact coverage...

    n     Coverage   Length
    50    0.758      0.183
    100   0.767      0.138
    250   0.795      0.079
    500   0.874      0.039
Setup

Consider an observational study where pairs z_i = (x_i^\top, y_i)^\top are sampled from a joint predictor-response distribution. Let Z = \{z_1, \ldots, z_n\}. Following the basic bootstrap principle above, repeatedly sample Z^\star = \{z^\star_1, \ldots, z^\star_n\} with replacement from Z. Then do the same approximation of sampling distributions based on the empirical distribution from the bootstrap sample. This is called the paired bootstrap.

What about a fixed design? The complication is that the y_i's are not iid. In such cases, first resample the residuals e_i = y_i - \hat y_i from the original LS fit, and then set y^\star_i = x_i^\top \hat\beta + e^\star_i.
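For the fixed-design case, the residual-resampling recipe can be sketched as follows; the closed-form simple regression fit, design points, and error scale are made up for illustration:

```python
import random
import statistics

def ls_fit(x, y):
    """Closed-form simple linear regression (least squares)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    return ybar - b1 * xbar, b1              # (beta0_hat, beta1_hat)

rng = random.Random(0)
x = [i / 10 for i in range(1, 21)]                 # fixed design
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.3) for xi in x]

b0, b1 = ls_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Residual bootstrap: keep x fixed, resample residuals, refit
b1_star = []
for _ in range(1000):
    e_star = rng.choices(resid, k=len(x))
    y_star = [fi + ei for fi, ei in zip(fitted, e_star)]
    b1_star.append(ls_fit(x, y_star)[1])
se_boot = statistics.stdev(b1_star)
print(se_boot)    # bootstrap standard error of the slope
```

Keeping the x_i's fixed and resampling only the residuals respects the fixed-design assumption that the y_i's differ only through the errors.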
Example: ratio of slope coefficients

Consider the simple linear regression model

    y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,

where \varepsilon_1, \ldots, \varepsilon_n are iid with mean zero, not necessarily normal. Assume this is an observational study. Suppose the parameter of interest is \theta = \beta_1/\beta_0. A natural estimate of \theta is \hat\theta = \hat\beta_1/\hat\beta_0. To get the (paired) bootstrap distribution of \hat\theta:
1. Sample Z^\star = \{z^\star_1, \ldots, z^\star_n\} with replacement from Z.
2. Fit the regression model with data Z^\star to obtain \hat\beta^\star_0 and \hat\beta^\star_1.
3. Evaluate \hat\theta^\star = \hat\beta^\star_1/\hat\beta^\star_0.
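The steps above can be sketched directly; the data-generating values, seed, and B below are made up for illustration (true \theta = -1/5 = -0.2 in this toy setup):

```python
import random
import statistics

def ls_fit(x, y):
    """Closed-form simple linear regression (least squares)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    return ybar - b1 * xbar, b1              # (beta0_hat, beta1_hat)

rng = random.Random(0)
n = 50
x = [rng.uniform(0, 10) for _ in range(n)]           # observational: x random
y = [5.0 - 1.0 * xi + rng.gauss(0, 1.0) for xi in x]  # true theta = -0.2
z = list(zip(x, y))

theta_star = []
for _ in range(2000):
    zs = rng.choices(z, k=n)                 # step 1: resample pairs
    xs, ys = zip(*zs)
    b0, b1 = ls_fit(xs, ys)                  # step 2: refit
    theta_star.append(b1 / b0)               # step 3: evaluate the ratio

theta_star.sort()
ci = (theta_star[int(0.025 * 2000)], theta_star[int(0.975 * 2000)])
print(ci)    # percentile CI for theta = beta1/beta0
```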
Example: ratio of slope coefficients (cont.)

[Figure: density histogram of the paired-bootstrap sample theta.paired, i.e., of \hat\theta = \hat\beta_1/\hat\beta_0, supported on roughly (-0.24, -0.16).]

The 95% bootstrap percentile confidence interval for \theta is (-0.205, -0.173).