

  1. Inference Based on the Wild Bootstrap
     James G. MacKinnon
     Department of Economics, Queen's University
     Kingston, Ontario, Canada K7L 3N6
     email: jgm@econ.queensu.ca
     Ottawa, September 14, 2012

  2. The Wild Bootstrap
     Consider the linear regression model
         y_i = X_i β + u_i,   E(u_i²) = σ_i²,   i = 1, ..., n.   (1)
     One natural way to bootstrap this model is to use the residual bootstrap. We condition on X, β̂, and the empirical distribution of the residuals (perhaps transformed). Thus the bootstrap DGP is
         y*_i = X_i β̂ + u*_i,   u*_i ∼ EDF(û_i).   (2)
     Strong assumptions! This assumes that E(y_i | X_i) = X_i β and that the error terms are IID, which implies homoskedasticity.
     At the opposite extreme is the pairs bootstrap, which draws bootstrap samples from the joint EDF of [y_i, X_i]. This assumes that there exists such a joint EDF, but it makes no assumptions about the properties of the u_i or about the functional form of E(y_i | X_i).
     The wild bootstrap is in some ways intermediate between the residual and pairs bootstraps. It assumes that E(y_i | X_i) = X_i β, but it allows for heteroskedasticity by conditioning on the (possibly transformed) residuals.
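To make the two benchmark schemes concrete, here is a minimal numpy sketch of the residual bootstrap DGP (2) and of the pairs bootstrap; the toy data, the OLS fit, and the function names are purely illustrative and not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(42)

def residual_bootstrap_sample(X, beta_hat, resid, rng):
    """Residual bootstrap DGP (2): y* = X beta_hat + u*, with u* drawn from the EDF of the residuals."""
    u_star = rng.choice(resid, size=len(resid), replace=True)
    return X @ beta_hat + u_star

def pairs_bootstrap_sample(y, X, rng):
    """Pairs bootstrap: resample whole observations (y_i, X_i) jointly."""
    idx = rng.integers(0, len(y), size=len(y))
    return y[idx], X[idx]

# Toy data and OLS fit, purely for illustration
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

y_star = residual_bootstrap_sample(X, beta_hat, resid, rng)
y_pairs, X_pairs = pairs_bootstrap_sample(y, X, rng)
```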

  3. If no restrictions are imposed, the wild bootstrap DGP is
         y*_i = X_i β̂ + f(û_i) v*_i,   (3)
     where f(û_i) is a transformation of the i-th residual û_i, and v*_i has mean 0. Thus E[f(û_i) v*_i] = 0 even if E[f(û_i)] ≠ 0.
     Common choices:
         w1: f(û_i) = (n/(n − k))^(1/2) û_i,
         w2: f(û_i) = û_i / (1 − h_i)^(1/2),
         w3: f(û_i) = û_i / (1 − h_i).
     Here h_i is the i-th diagonal element of the "hat matrix" P_X ≡ X(X⊤X)⁻¹X⊤. The w1, w2, and w3 transformations are analogous to the ones used in the HC1, HC2, and HC3 covariance matrix estimators.
     We would like functions of the bootstrap error terms f(û_i) v*_i, such as n^(−1/2) X⊤u*, to have properties similar to those of the same functions of the actual error terms.
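The transformations and DGP (3) are straightforward to code. A sketch in numpy, assuming an auxiliary draw v_star with mean 0 (distributions for it are discussed on the following slides); the function names are illustrative:

```python
import numpy as np

def wild_transform(X, resid, kind="w3"):
    """The transformations f(u_hat) from this slide, analogous to HC1/HC2/HC3."""
    n, k = X.shape
    # h_i: diagonal of the hat matrix P_X = X (X'X)^{-1} X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    if kind == "w1":
        return np.sqrt(n / (n - k)) * resid
    if kind == "w2":
        return resid / np.sqrt(1.0 - h)
    if kind == "w3":
        return resid / (1.0 - h)
    raise ValueError(f"unknown transformation: {kind}")

def wild_bootstrap_sample(X, beta_hat, resid, v_star, kind="w3"):
    """Wild bootstrap DGP (3): y* = X beta_hat + f(u_hat) * v*."""
    return X @ beta_hat + wild_transform(X, resid, kind) * v_star
```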

  4. Ideally, the bootstrap error terms would have the same moments as the transformed residuals. For that to be the case, we need
         E(v*_i) = 0,   E(v*_i²) = 1,   E(v*_i³) = 1,   E(v*_i⁴) = 1.   (4)
     But this is impossible! Consider the outer product of the vector [1  v  v²]⊤ with itself for a random variable v with expectation zero:
         E [ 1    v    v²  ]     [ 1    0    σ²  ]
           [ v    v²   v³  ]  =  [ 0    σ²   μ₃  ].   (5)
           [ v²   v³   v⁴  ]     [ σ²   μ₃   μ₄  ]
     The determinant must be nonnegative since the matrix is positive semidefinite:
         σ²μ₄ − μ₃² − σ⁶ ≥ 0.   (6)
     But 1·1 − 1² − 1⁶ = −1. If σ² = 1 and μ₃ = 1, then μ₄ ≥ 2. So there exists no distribution for the v*_i that satisfies (4). This means that there is no "ideal" distribution for the v*_i. We either need to relax the requirement that μ₃ = 1 or allow μ₄ ≥ 2.
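A quick numerical check of inequality (6), using nothing beyond the moment matrix in (5); the helper name is ours:

```python
import numpy as np

def moment_matrix_det(sigma2, mu3, mu4):
    """Determinant of the moment matrix in (5); it must be >= 0 for a valid distribution."""
    M = np.array([[1.0,    0.0,    sigma2],
                  [0.0,    sigma2, mu3],
                  [sigma2, mu3,    mu4]])
    return np.linalg.det(M)

# The "ideal" moments (4): sigma^2 = 1, mu3 = 1, mu4 = 1 give determinant -1 < 0: impossible.
print(moment_matrix_det(1.0, 1.0, 1.0))   # approximately -1.0
# With mu4 = 2 the determinant is 0: the boundary case attained by Mammen's two-point distribution.
print(moment_matrix_det(1.0, 1.0, 2.0))   # approximately 0.0
```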

  5. The most common choice for v*_i is Mammen's two-point distribution:
         v*_i = −(√5 − 1)/2   with probability (√5 + 1)/(2√5),
                 (√5 + 1)/2   with probability (√5 − 1)/(2√5).   (7)
     It was suggested in Mammen (1993). In this case,
         E(v*_i) = 0,   E(v*_i²) = 1,   E(v*_i³) = 1,   E(v*_i⁴) = 2.   (8)
     Thus (6) is satisfied as an equality. No distribution that has the correct third moment can have a fourth moment smaller than 2.
     Mammen must have obtained his distribution by solving the equations
         p₁v₁ + (1 − p₁)v₂ = 0,
         p₁v₁² + (1 − p₁)v₂² = 1,   (9)
         p₁v₁³ + (1 − p₁)v₂³ = 1.
     The result is p₁ = (√5 + 1)/(2√5), v₁ = −(√5 − 1)/2, and v₂ = (√5 + 1)/2, which leads to (7).
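A sketch of a sampler for (7), with a Monte Carlo check of the moments in (8); the function name and seed are illustrative:

```python
import numpy as np

def mammen_two_point(size, rng):
    """Draw from Mammen's two-point distribution (7)."""
    s5 = np.sqrt(5.0)
    p1 = (s5 + 1.0) / (2.0 * s5)      # probability of the negative value, about 0.72361
    v1 = -(s5 - 1.0) / 2.0            # about -0.61803
    v2 = (s5 + 1.0) / 2.0             # about  1.61803
    return np.where(rng.random(size) < p1, v1, v2)

rng = np.random.default_rng(0)
v = mammen_two_point(1_000_000, rng)
# Sample moments should be close to (8): 0, 1, 1, 2
print(v.mean(), (v**2).mean(), (v**3).mean(), (v**4).mean())
```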

  6. Besides getting the fourth moment wrong, Mammen's distribution involves two very different probabilities (0.72361 and 0.27639). Thus, about 72% of the time, the sign of the bootstrap error term for observation i will be the opposite of the sign of the residual.
     Davidson and Flachaire (2008) proposed the Rademacher distribution:
         v*_i = −1   with probability 1/2,
                 1   with probability 1/2,   (10)
     for which
         E(v*_i) = 0,   E(v*_i²) = 1,   E(v*_i³) = 0,   E(v*_i⁴) = 1.   (11)
     This has the desired fourth moment, and each bootstrap error is positive with probability one-half, which is appealing. But it imposes symmetry.
     If the error terms really are symmetric, it is clearly good to impose symmetry. Even if they are not, getting μ₄ right may well be more important than getting μ₃ wrong. D&F provide evidence, and see below.
     Using the Rademacher distribution means conditioning on X, β̂, and the absolute values of the (transformed) residuals.
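For completeness, the corresponding sketch for the Rademacher distribution (10), again with a Monte Carlo check of the moments (11):

```python
import numpy as np

def rademacher(size, rng):
    """Draw from the Rademacher distribution (10): -1 or +1, each with probability 1/2."""
    return rng.choice([-1.0, 1.0], size=size)

rng = np.random.default_rng(0)
v = rademacher(1_000_000, rng)
# Sample moments should be close to (11): 0, 1, 0, 1
print(v.mean(), (v**2).mean(), (v**3).mean(), (v**4).mean())
```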

  7. Alternatives to Two-Point Distributions
     Two-point distributions seem unnatural, as each observation can only have two bootstrap error terms associated with it. In the usual case, this means that there are only 2ⁿ possible bootstrap samples.
     Since the standard normal distribution has mean 0 and variance 1, it may seem natural to use it for v*. But μ₃ = 0 and μ₄ = 3. So its fourth moment is worse than for Mammen, and it has the same, sometimes undesirable, symmetry property as Rademacher.
     Mammen (1993) also suggested the continuous distribution
         v*_i = u_i/√2 + (w_i² − 1)/2,   (12)
     where u_i and w_i are independent standard normals. [There is a serious typo in the article, which makes it look as if u_i = w_i.] For this distribution,
         E(v*_i) = 0,   E(v*_i²) = 1,   E(v*_i³) = 1,   E(v*_i⁴) = 6.   (13)
     This gets the third moment right, but the fourth moment is extremely large. Mammen also suggests another, similar, distribution that is more complicated than (12) and has a slightly smaller fourth moment.
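A sketch of a sampler for the continuous distribution (12), with a Monte Carlo check of the moments (13); the name and seed are illustrative:

```python
import numpy as np

def mammen_continuous(size, rng):
    """Mammen's continuous distribution (12): v* = u/sqrt(2) + (w**2 - 1)/2,
    with u and w independent standard normals."""
    u = rng.standard_normal(size)
    w = rng.standard_normal(size)
    return u / np.sqrt(2.0) + (w**2 - 1.0) / 2.0

rng = np.random.default_rng(0)
v = mammen_continuous(1_000_000, rng)
# Sample moments should be close to (13): 0, 1, 1, 6
print(v.mean(), (v**2).mean(), (v**3).mean(), (v**4).mean())
```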

  8. Estimating Covariance Matrices
     Bootstrap methods are sometimes used to estimate standard errors and covariance matrices. If β̂*_j is the estimate for the j-th bootstrap sample, and β̄* denotes the average of the β̂*_j, then the usual estimator is
         Var*(β̂) = (1/(B − 1)) Σ_{j=1}^{B} (β̂*_j − β̄*)(β̂*_j − β̄*)⊤.   (14)
     Evidently,
         β̂*_j − β̄* = (X⊤X)⁻¹X⊤(Xβ̂ + u*_j) − β̄*
                    = (X⊤X)⁻¹X⊤u*_j + (β̂ − β̄*).   (15)
     If the OLS estimator is unbiased, then E(β̂*_j) = β̂. Thus we can ignore β̂ − β̄* if B is large enough.
     The first term in the last line of (15) times itself transposed is
         (X⊤X)⁻¹X⊤u*_j u*_j⊤X(X⊤X)⁻¹.   (16)
     This looks like a sandwich covariance matrix, but with u*_j u*_j⊤ instead of a diagonal matrix.
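A minimal sketch of estimator (14) for the OLS coefficients; the function name, the choice of Rademacher draws, and the 1/(B − 1) normalization are ours, not prescribed by the slide:

```python
import numpy as np

def wild_bootstrap_cov(y, X, B, rng, kind="w3"):
    """Wild bootstrap covariance matrix estimator (14) for the OLS coefficients."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages h_i
    if kind == "w1":
        f = np.sqrt(n / (n - k)) * resid
    elif kind == "w2":
        f = resid / np.sqrt(1.0 - h)
    else:                                          # w3
        f = resid / (1.0 - h)
    betas = np.empty((B, k))
    for j in range(B):
        v_star = rng.choice([-1.0, 1.0], size=n)   # Rademacher draws
        y_star = X @ beta_hat + f * v_star         # wild bootstrap DGP (3)
        betas[j] = XtX_inv @ X.T @ y_star
    dev = betas - betas.mean(axis=0)
    return dev.T @ dev / (B - 1)
```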

  9. Because E(v*_i²) = 1, the diagonal elements of u*_j u*_j⊤ have expectation f²(û_i). For Rademacher, these diagonal elements are precisely f²(û_i).
     Off-diagonal elements must have expectation zero because E(v*_i v*_j) = 0 for i ≠ j. For Rademacher, each off-diagonal element is the product of the same two transformed residuals multiplied by +1 or −1.
     Thus, as B becomes large, the average over the bootstrap samples of the matrices X⊤u*_j u*_j⊤X should converge to the matrix X⊤Ω̂X, where Ω̂ is an n × n diagonal matrix with the squares of the f(û_i) on the diagonal. When the transformation f(·) is w1, w2, or w3, the bootstrap covariance matrix estimator (14) converges to HC1, HC2, or HC3 as B → ∞.
     Conclusion: Using the wild bootstrap to estimate covariance matrices is just an expensive way to approximate various HCCMEs, with unnecessary simulation randomness.
     • Pointless for making inferences about the coefficients of linear regression models.
     • Might be useful for obtaining covariance matrices for nonlinear functions of those coefficients.
     • Might be useful for nonlinear regression models.
     Similar arguments apply to using the pairs bootstrap.
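For comparison with the limit claimed above, here is a self-contained sketch of the HC1/HC2/HC3 sandwich estimators; running it alongside the bootstrap estimator sketched after slide 8, with B large, illustrates the convergence. The function name and layout are ours:

```python
import numpy as np

def hccme(y, X, kind="HC3"):
    """HC1/HC2/HC3 sandwich covariance estimator for the OLS coefficients."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages h_i
    if kind == "HC1":
        omega = (n / (n - k)) * resid**2
    elif kind == "HC2":
        omega = resid**2 / (1.0 - h)
    else:                                          # HC3
        omega = resid**2 / (1.0 - h)**2
    middle = (X * omega[:, None]).T @ X            # X' Omega_hat X, with Omega_hat diagonal
    return XtX_inv @ middle @ XtX_inv
```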

  10. Bootstrap Testing
      Consider the heteroskedasticity-robust t statistic
          τ(β̂_l − β⁰_l) = (β̂_l − β⁰_l) / [((X⊤X)⁻¹X⊤Ω̂X(X⊤X)⁻¹)_ll]^(1/2).   (17)
      To calculate a wild bootstrap P value, estimate (1) under the null hypothesis to obtain β̃ and ũ. Then generate B bootstrap samples, using the DGP
          y*_i = X_i β̃ + f(ũ_i) v*_i.   (18)
      As in (3), there are several choices for the transformation f(·). For each bootstrap sample, calculate τ(β̂*_lj), the bootstrap analog of (17):
          τ(β̂*_lj − β⁰_l) = (β̂*_lj − β⁰_l) / [((X⊤X)⁻¹X⊤Ω̂*_jX(X⊤X)⁻¹)_ll]^(1/2),   (19)
      where β̂*_lj is the OLS estimate for the j-th bootstrap sample. X⊤Ω̂*_jX is computed in the same way as X⊤Ω̂X, but uses the residuals from the bootstrap regression.
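Putting the recipe together: a sketch of the wild bootstrap P value for H0: β_l = β0, estimating under the null, generating samples from (18), and recomputing the robust t statistic (19) on each. Several details here are our choices and not prescribed by the slide: f(·) is taken to be the identity, the auxiliary draws are Rademacher, the robust variance uses HC3-style weights, and a symmetric two-sided P value is reported.

```python
import numpy as np

def hc_robust_t(y, X, l, beta0=0.0):
    """Heteroskedasticity-robust t statistic (17) for H0: beta_l = beta0, using HC3 weights."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    omega = resid**2 / (1.0 - h)**2                # HC3-style weights (w3 transformation)
    cov = XtX_inv @ (X * omega[:, None]).T @ X @ XtX_inv
    return (beta_hat[l] - beta0) / np.sqrt(cov[l, l])

def wild_bootstrap_pvalue(y, X, l, beta0=0.0, B=999, rng=None):
    """Wild bootstrap P value for H0: beta_l = beta0, based on restricted residuals as in (18)."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    tau_hat = hc_robust_t(y, X, l, beta0)
    # Restricted OLS: impose beta_l = beta0 by regressing y - beta0 * x_l on the other columns
    X_r = np.delete(X, l, axis=1)
    gamma_tilde = np.linalg.lstsq(X_r, y - beta0 * X[:, l], rcond=None)[0]
    u_tilde = y - beta0 * X[:, l] - X_r @ gamma_tilde
    fitted_null = y - u_tilde                      # X_i beta_tilde, the fitted values under the null
    tau_star = np.empty(B)
    for j in range(B):
        v_star = rng.choice([-1.0, 1.0], size=n)   # Rademacher draws
        y_star = fitted_null + u_tilde * v_star    # DGP (18) with f(.) the identity
        tau_star[j] = hc_robust_t(y_star, X, l, beta0)
    # Symmetric (two-sided) bootstrap P value
    return np.mean(np.abs(tau_star) >= np.abs(tau_hat))
```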
