Variance Estimation in Complex Samples: The Finite Population Bootstrap Using Pseudo-Populations
Andreas Quatember
03-11-2015, NTTS Brussels

The Finite Population Bootstrap

The bootstrap method provides an alternative for variance estimation: “... probably the most flexible and efficient method of analyzing survey data” (Lahiri 2003).

Originally developed for the estimation of sampling distributions of estimators in i.i.d. situations (Efron 1979):
1. Draw an i.i.d. random sample s of size n from a distribution
2. Draw B i.i.d. random resamples of size n from s (Monte Carlo version)
3. In each resample, calculate the estimator under study
4. For large B, the distribution of the B resample estimates approximates the sampling distribution of interest

How can this idea be applied to without-replacement sampling from finite populations?
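
The four i.i.d. steps can be sketched in a few lines of Python. This is only a minimal illustration: the data values, the choice of the mean as the estimator under study, and the function name `iid_bootstrap` are assumptions made for the example, not part of the talk.

```python
import random
import statistics

def iid_bootstrap(sample, estimator, B=1000, seed=0):
    """Monte Carlo bootstrap for an i.i.d. sample (hypothetical helper).

    Draws B resamples of size n with replacement from `sample` and
    returns the B resample estimates.
    """
    rng = random.Random(seed)
    n = len(sample)
    return [estimator(rng.choices(sample, k=n)) for _ in range(B)]

# Illustrative data, not taken from the talk.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

# The empirical distribution of the B estimates approximates the
# sampling distribution of the estimator; its variance is the
# bootstrap variance estimate.
estimates = iid_bootstrap(sample, statistics.mean, B=2000)
var_boot = statistics.variance(estimates)
```

Resampling with replacement is what makes the resamples i.i.d.; this is exactly the step that has to be rethought for without-replacement sampling from a finite population.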

Different approaches are available (cf. Shao and Tu 1995):

“Ad-hoc approach” (cf. Ranalli and Mecatti 2012):
• i.i.d. resampling plus an adequate choice of the resample sizes (cf. McCarthy and Snowden 1985)
• i.i.d. resampling plus rescaling of observations (cf. Rao and Wu 1988)
• Subsampling from the original sample under the original sampling scheme with an adapted sample size (cf. Sitter 1992)
• Combining with- and without-replacement schemes (cf. Antal and Tillé 2011)

“Plug-in approach” (cf. Ranalli and Mecatti 2012):
• Generating a bootstrap population (cf. Gross 1980)

Basic idea for SI samples and integer design weights N / n:
1. Draw an SI sample of size n from U
2. Replicate each sample unit N / n times to generate a pseudo-population $U_p$

   The idea behind the HT approach:

   $t_{HT} = \sum_{s} y_k \cdot \frac{1}{\pi_k} = \sum_{s} y_k \cdot d_k$

   Sample value $y_1$ is replicated $d_1$ times, $y_2$ is “cloned” $d_2$ times, and so on. Hence, $d_1, d_2, \dots, d_n$ are the “replication factors” of the HT approach.
3. Draw B SI resamples of size n from $U_p$
4. Calculate the estimator under study in each resample
5. The MC distribution of these estimates serves as an estimator of the true sampling distribution of the estimator
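
The five steps can be sketched as follows for the integer-weight SI case. This is an illustration under stated assumptions: the data, the population size N = 40, the HT estimator of the total as the estimator under study, and the helper names `pseudo_population` and `si_bootstrap_totals` are all made up for the example.

```python
import random
import statistics

def pseudo_population(sample, N):
    """Replicate every sample value N/n times (integer weights only)."""
    n = len(sample)
    if N % n != 0:
        raise ValueError("this sketch assumes an integer design weight N/n")
    d = N // n  # common replication factor d_k = N/n
    return [y for y in sample for _ in range(d)]

def si_bootstrap_totals(sample, N, B=1000, seed=0):
    """Draw B SI resamples of size n from U_p; return their HT totals."""
    rng = random.Random(seed)
    n = len(sample)
    U_p = pseudo_population(sample, N)
    # random.sample draws without replacement, i.e. an SI resample.
    return [(N / n) * sum(rng.sample(U_p, n)) for _ in range(B)]

# Illustrative data, not taken from the talk.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
N = 40

totals = si_bootstrap_totals(sample, N, B=2000)
var_boot = statistics.variance(totals)  # bootstrap variance estimate
```

By construction, the total of y in this pseudo-population equals the HT estimate computed from the original sample.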

The key to an efficient application of this procedure is the generation of an adequate pseudo-population $U_p$.

For $U_p = U$, this framework would perfectly simulate the SI sampling distribution of interest.

1. Non-integer design weights

Booth et al. (1994): For the generation of $U_p$, replicate each sampling unit k according to the integer part $i_k = i$ of its SI design weight

$\frac{N}{n} = i + r$,

resulting in n · i elements, and add N − n · i elements drawn by SI sampling from s.

Create C such pseudo-populations $U_p^{(i)}$ ($i = 1, \dots, C$) and resample in each of them to account for the random nature of $U_p$.
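
A minimal sketch of this construction, assuming SI sampling and the HT total as the estimator under study. The helper names, the data, N = 43, and the number of resamples per pseudo-population (`B_per_pop`) are assumptions for illustration; the slide does not say how the results from the C pseudo-populations are combined, so the sketch simply returns them separately.

```python
import random

def booth_pseudo_population(sample, N, rng):
    """One pseudo-population: i copies of each unit plus an SI top-up."""
    n = len(sample)
    i = N // n  # integer part of the SI design weight N/n
    U_p = [y for y in sample for _ in range(i)]
    # Add the remaining N - n*i elements by SI sampling from s.
    U_p += rng.sample(sample, N - n * i)
    return U_p

def booth_bootstrap_totals(sample, N, C=20, B_per_pop=100, seed=0):
    """HT totals of SI resamples, one list per pseudo-population."""
    rng = random.Random(seed)
    n = len(sample)
    results = []
    for _ in range(C):
        U_p = booth_pseudo_population(sample, N, rng)
        results.append(
            [(N / n) * sum(rng.sample(U_p, n)) for _ in range(B_per_pop)]
        )
    return results

# Illustrative data, not taken from the talk: N/n = 43/8 = 5 + 0.375.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
totals_per_pop = booth_bootstrap_totals(sample, N=43)
```

Every generated pseudo-population has exactly N elements, but which units receive the extra copy changes from one pseudo-population to the next.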

2. General probability sampling with arbitrary $\pi_k$’s

Holmberg (1998): For the generation of $U_p$, replicate each sampling unit k according to the integer part $i_k$ of its design weight

$d_k = i_k + r_k$

and randomly one more time with probability $r_k$.
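
The generation step can be sketched like this. Only the construction of the pseudo-population is shown; the resampling design is left out. The function name `holmberg_pseudo_population`, the values of y, and the inclusion probabilities are hypothetical.

```python
import math
import random

def holmberg_pseudo_population(y, pi, rng):
    """Replicate unit k i_k times, plus once more with probability r_k."""
    U_p = []
    for y_k, pi_k in zip(y, pi):
        d_k = 1.0 / pi_k        # design weight
        i_k = math.floor(d_k)   # integer part
        r_k = d_k - i_k         # fractional part
        copies = i_k + (1 if rng.random() < r_k else 0)
        # Keep pi_k with each clone; a resampling design would need it.
        U_p.extend([(y_k, pi_k)] * copies)
    return U_p

# Illustrative values, not taken from the talk.
y = [10.0, 20.0, 30.0, 40.0]
pi = [0.4, 0.25, 0.5, 0.8]  # design weights 2.5, 4, 2, 1.25

rng = random.Random(0)
U_p = holmberg_pseudo_population(y, pi, rng)
```

With these values the pseudo-population has between 9 and 11 elements, so both its size and its total of y are random whenever some $r_k > 0$.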

Three properties should apply (Barbiero and Mecatti 2010):
A) The total of the auxiliary variable x in $U_p$ should be equal to t(x) in U
B) The total of y in $U_p$ should be equal to its HT estimator $t_{HT}$
C) $E_{boot}(t_{HT,b}) = t_{HT}$

For Booth et al. (1994) or Holmberg (1998):
• The mimicking principle of the bootstrap approach is violated, because the generated pseudo-population differs from the “nominal” $U_p$ for $r_k > 0$
• Recalculation of sample inclusion probabilities is necessary

The HT based bootstrap (HTB):

A natural development of the generation procedures proposed (Quatember 2014).

Based directly on the HT principle

$t_{HT} = \sum_{s} y_k \cdot d_k$,

allowing not only whole units with certain values from s in $U_p$.

This affects the drawing probabilities of the different units in $U_p$.

Summary of the results of a simulation study on the HTB finite population bootstrap approach:
• Follows the mimicking principle directly (more understandable)
• (Small) positive effect on the efficiency compared to other methods such as Holmberg (1998)
• No recalculation of inclusion probabilities necessary (simpler algorithm)
• Can still be used in situations where other methods fail (when some $d_k$’s are close to one)
• Large pseudo-populations do not have to be generated physically when the probability mechanism can be used for the resampling process (cf. Ranalli and Mecatti 2012)

Thank you very much for your (hopefully non-pseudo) attention!