Constructing better coverage intervals for some estimators computed from a complex probability sample


  1. Constructing better coverage intervals for some estimators computed from a complex probability sample: a round table summary. Phillip S. Kott, RTI International

  2. Suppose $\hat{t}$ is a nearly (i.e., asymptotically) unbiased estimator for a parameter $t$, estimated with data drawn via a probability survey. The one-sided Wald coverage intervals for $t$ are
     $$t \le \hat{t} + \Phi^{-1}(\alpha)\sqrt{v} \quad\text{and}\quad t \ge \hat{t} - \Phi^{-1}(\alpha)\sqrt{v},$$
     where $v$ is an estimator for $V$, the variance of $\hat{t}$, and $\Phi(\cdot)$ is the cumulative distribution function of a standard normal distribution.

  3. It is well known that when the sample size is large enough, each inequality holds for roughly $100\alpha$ percent of samples drawn using the same sampling design as the probability sample. A symmetric two-sided $100\alpha$-percent Wald interval is
     $$\hat{t} - \Phi^{-1}([1+\alpha]/2)\sqrt{v} \;\le\; t \;\le\; \hat{t} + \Phi^{-1}([1+\alpha]/2)\sqrt{v}.$$
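The Wald intervals on slides 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming `alpha` is a coverage probability strictly between 0 and 1 (above 0.5 for a sensible one-sided bound); the function names `wald_one_sided` and `wald_two_sided` are hypothetical, not from the slides.

```python
from math import sqrt
from statistics import NormalDist


def wald_one_sided(t_hat, v, alpha):
    """Return (lower, upper): the bounds of the two one-sided Wald intervals.

    lower is the bound in  t >= t_hat - z*sqrt(v),
    upper is the bound in  t <= t_hat + z*sqrt(v),  with z = Phi^{-1}(alpha).
    """
    z = NormalDist().inv_cdf(alpha)
    half = z * sqrt(v)
    return t_hat - half, t_hat + half


def wald_two_sided(t_hat, v, alpha):
    """Return (lower, upper) of the symmetric two-sided Wald interval."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    half = z * sqrt(v)
    return t_hat - half, t_hat + half
```

For example, `wald_two_sided(0.0, 1.0, 0.95)` uses the familiar multiplier of about 1.96.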

  4. Kott and Liu (2010) proposed using the following skewness-adjusted one-sided intervals in place of the Wald intervals:
     $$t \le \hat{t} + \omega + \sqrt{z^2 v + \omega^2} \quad\text{and}\quad t \ge \hat{t} + \omega - \sqrt{z^2 v + \omega^2},$$
     where
     $$\omega = \frac{1}{6}\,(1 - z^2)\,\frac{m_3}{v} + \frac{z^2}{2}\,b,$$
     $z = \Phi^{-1}(\alpha)$, $m_3$ is a nearly unbiased estimator for the third central moment of $\hat{t}$, $M_3 = E[(\hat{t} - t)^3]$, and

  5. $b$ is a nearly unbiased estimator for $B = E[v(\hat{t} - t)]/V$, the regression of $v$ on $\hat{t} - t$. In
     $$\omega = \frac{1}{6}\,(1 - z^2)\,\frac{m_3}{v} + \frac{z^2}{2}\,b,$$
     the term $\frac{z^2}{2}\,b$ accounts for $v$ varying with $\hat{t} - t$, and the term $\frac{1}{6}(1 - z^2)\frac{m_3}{v}$ accounts for $\hat{t}$ being skewed.
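A minimal Python sketch of the skewness-adjusted bounds, assuming the shift is $\omega = (1 - z^2)\,m_3/(6v) + (z^2/2)\,b$ and the bounds are $\hat{t} + \omega \pm \sqrt{z^2 v + \omega^2}$, with $\alpha > 0.5$ so that $z > 0$. The names `omega` and `skew_adjusted_bounds` are hypothetical, and the estimates `v`, `m3`, and `b` are taken as given inputs.

```python
from math import sqrt
from statistics import NormalDist


def omega(v, m3, b, alpha):
    """Shift term: (1 - z^2) * m3 / (6 v) + (z^2 / 2) * b, with z = Phi^{-1}(alpha)."""
    z = NormalDist().inv_cdf(alpha)
    return (1 - z**2) * m3 / (6 * v) + (z**2 / 2) * b


def skew_adjusted_bounds(t_hat, v, m3, b, alpha):
    """Return (lower, upper) bounds of the two one-sided adjusted intervals.

    Assumes alpha > 0.5; with m3 = b = 0 the bounds reduce to the Wald bounds.
    """
    z = NormalDist().inv_cdf(alpha)
    w = omega(v, m3, b, alpha)
    half = sqrt(z**2 * v + w**2)
    return t_hat + w - half, t_hat + w + half
```

Setting `m3 = 0` and `b = 0` gives `w = 0`, so the sketch returns the ordinary one-sided Wald bounds, which is a quick sanity check on the inputs.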

  6. If $b \approx m_3/v$, which is often true, then
     $$\omega \approx \left(\frac{1}{6} + \frac{z^2}{3}\right)\frac{m_3}{v}.$$
     Let $\hat{\gamma} = m_3/v^{3/2}$ be the estimated skewness of $\hat{t}$, and $\gamma = M_3/V^{3/2}$ be the measure $\hat{\gamma}$ is estimating. When $\hat{\gamma}^2 \approx 0$:
     $$t \le \hat{t} + \left(\frac{1}{6} + \frac{z^2}{3}\right)\hat{\gamma}\sqrt{v} + z\sqrt{v} \quad\text{and}\quad t \ge \hat{t} + \left(\frac{1}{6} + \frac{z^2}{3}\right)\hat{\gamma}\sqrt{v} - z\sqrt{v}.$$
     These are the Wald intervals shifted by $(1/6 + z^2/3)\,\hat{\gamma}\sqrt{v}$.
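The simplification on this slide follows in one step once $b$ is replaced by $m_3/v$. A short worked derivation, assuming the shift has the form $\omega = \frac{1}{6}(1 - z^2)\frac{m_3}{v} + \frac{z^2}{2}b$ and writing $\hat{\gamma} = m_3/v^{3/2}$:

```latex
\begin{align*}
\omega &= \frac{1}{6}\,(1 - z^2)\,\frac{m_3}{v} + \frac{z^2}{2}\,b \\
       &\approx \left[\frac{1 - z^2}{6} + \frac{3z^2}{6}\right]\frac{m_3}{v}
          \qquad \text{(using } b \approx m_3/v\text{)} \\
       &= \left(\frac{1}{6} + \frac{z^2}{3}\right)\frac{m_3}{v}
        = \left(\frac{1}{6} + \frac{z^2}{3}\right)\hat{\gamma}\sqrt{v},
          \qquad \text{since } \frac{m_3}{v} = \frac{m_3}{v^{3/2}}\sqrt{v}.
\end{align*}
```

For a 95 percent one-sided interval, $z \approx 1.645$, so the multiplier $1/6 + z^2/3$ is roughly 1.07: the shift is a little more than one estimated skewness times the standard error.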

  7. A STRATIFIED MULTISTAGE SAMPLE. Consider now constructing a coverage interval for a parameter $t$ based on a stratified multistage sample when a nearly unbiased estimator for that parameter can be put in the form
     $$\hat{t} = \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{n_h} \hat{t}_{hi},$$
     where there are $n_h$ primary sampling units (PSUs) in stratum $h$, and each $\hat{t}_{hi}$ for a PSU $i$ in stratum $h$ is a nearly unbiased estimator for the same value.

  8. We make the common (but often inaccurate) assumption that the PSUs were selected randomly but with replacement. We focus on the difference between two domain means estimated using data from the same sample, $S$. The estimated difference in domain means can be expressed as
     $$\bar{y}^{(1)} - \bar{y}^{(2)} = \frac{\sum_{k \in S} w_k y_k d_k^{(1)}}{\sum_{k \in S} w_k d_k^{(1)}} - \frac{\sum_{k \in S} w_k y_k d_k^{(2)}}{\sum_{k \in S} w_k d_k^{(2)}}.$$

  9. When all $n_h \ge 3$, the following equalities can be used:
     $$v = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \frac{(e_{hi} - \bar{e}_h)^2}{n_h (n_h - 1)}, \qquad m_3 = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \frac{(e_{hi} - \bar{e}_h)^3}{n_h (n_h - 1)(n_h - 2)}, \qquad\text{and}\qquad b = \frac{m_3}{v},$$
     where each $e_{hi}$ has the following linearized expression:
     $$e_{hi} = n_h \sum_{k \in S_{hi}} w_k \left\{ \frac{d_k^{(1)}}{\hat{N}_1}\,[y_k - \bar{y}^{(1)}] - \frac{d_k^{(2)}}{\hat{N}_2}\,[y_k - \bar{y}^{(2)}] \right\}.$$
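A minimal Python sketch of the three estimates on this slide, assuming the linearized PSU values $e_{hi}$ have already been computed and are passed as one list per stratum, and that $\bar{e}_h$ is the within-stratum mean. The function name `stratified_moments` is hypothetical.

```python
def stratified_moments(e_by_stratum):
    """Return (v, m3, b) from per-stratum lists of linearized PSU values e_hi.

    v  sums (e_hi - e_bar_h)^2 / (n_h (n_h - 1)) over strata and PSUs,
    m3 sums (e_hi - e_bar_h)^3 / (n_h (n_h - 1)(n_h - 2)),
    b  is m3 / v.
    """
    v = 0.0
    m3 = 0.0
    for e in e_by_stratum:
        n = len(e)
        if n < 3:
            raise ValueError("every stratum needs at least three PSUs")
        e_bar = sum(e) / n
        v += sum((x - e_bar) ** 2 for x in e) / (n * (n - 1))
        m3 += sum((x - e_bar) ** 3 for x in e) / (n * (n - 1) * (n - 2))
    return v, m3, m3 / v
```

The explicit check on `n < 3` mirrors the slide's condition; a stratum with two PSUs would make the third-moment divisor zero.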

  10. Some Simple Approximations. Even if software were available, or a statistician wanted to program the equations herself, there may not be three PSUs in every stratum. Unlike when strata are collapsed for variance estimation, the direction of the potential bias of $\hat{\gamma}$ can be positive or negative when the population means of the strata differ. Consequently, strata collapsed together should have (nearly) equal expected population means.

  11. A key to skewness-adjusted coverage intervals is the estimate $b = m_3/v = \hat{\gamma}\sqrt{v}$. The value of this term for the difference between proportions estimated for two distinct domains from a simple random sample is approximately
     $$b = \frac{m_3}{v} \approx \frac{p_1(1 - p_1)(1 - 2p_1)/n_1^2 \;-\; p_2(1 - p_2)(1 - 2p_2)/n_2^2}{p_1(1 - p_1)/n_1 \;+\; p_2(1 - p_2)/n_2}.$$

  12. When $p_1 = p_2$, this collapses to
     $$b \approx (1 - 2p_1)\left(\frac{1}{n_1} - \frac{1}{n_2}\right).$$
     That appears to suggest that when assessing the difference between proportions in two distinct domains, one should divide the domain sample sizes by their respective design effects; BUT the design effect captures the impact of clustering, stratification, and unequal weighting on the variance of an estimator, not on its third central moment.
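The collapse when $p_1 = p_2$ is easy to check numerically. A minimal sketch, assuming the simple-random-sample approximation $b \approx [p_1(1-p_1)(1-2p_1)/n_1^2 - p_2(1-p_2)(1-2p_2)/n_2^2] \,/\, [p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2]$; the function name `b_diff_proportions` is hypothetical.

```python
def b_diff_proportions(p1, n1, p2, n2):
    """Approximate b = m3 / v for a difference of two SRS domain proportions."""
    m3 = (p1 * (1 - p1) * (1 - 2 * p1) / n1**2
          - p2 * (1 - p2) * (1 - 2 * p2) / n2**2)
    v = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return m3 / v


# With equal proportions the ratio should match (1 - 2p)(1/n1 - 1/n2).
p, n1, n2 = 0.1, 50, 200
collapsed = (1 - 2 * p) * (1 / n1 - 1 / n2)
assert abs(b_diff_proportions(p, n1, p, n2) - collapsed) < 1e-12
```

Note that equal domain sizes give `b = 0` when the proportions match, so the skewness adjustment matters most when the domains differ sharply in size.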

  13. A wiser procedure for an estimated proportion $p = \sum_{k \in S} w_k y_k / \sum_{k \in S} w_k$, where $y_k$ = 0 or 1, might be to estimate $B = M_3/V$ with
     $$b_{\text{simple}} = \frac{\sum_{k \in S} w_k^3}{\sum_{k \in S} w_k^2 \,\sum_{k \in S} w_k}\,(1 - 2p),$$
     and then insert $\hat{\gamma} = b_{\text{simple}}/\sqrt{v_{\text{simple}}}$ into the skewness-adjusted coverage intervals. This estimate ignores the impact of stratification and clustering on $b$.
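A minimal Python sketch of $b_{\text{simple}}$ for a weighted proportion, assuming the form $b_{\text{simple}} = (1 - 2p)\,\sum w_k^3 \,/\, (\sum w_k^2 \sum w_k)$. The function name `b_simple_proportion` is hypothetical, and $v_{\text{simple}}$ is deliberately left out because the slide does not define it.

```python
def b_simple_proportion(weights, y):
    """b_simple for a weighted proportion; y holds 0/1 values."""
    sw = sum(weights)
    p = sum(w * yk for w, yk in zip(weights, y)) / sw
    sw2 = sum(w**2 for w in weights)
    sw3 = sum(w**3 for w in weights)
    return sw3 / (sw2 * sw) * (1 - 2 * p)
```

With equal weights the weight ratio reduces to $1/n$, so the result is $(1 - 2p)/n$, the simple-random-sample value.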

  14. For the difference between two domain means: $\hat{\gamma} = b_{\text{simple}}/\sqrt{v_{\text{simple}}}$ with
     $$b_{\text{simple}} = \frac{p_1(1 - p_1)(1 - 2p_1)/n_1^2 - p_2(1 - p_2)(1 - 2p_2)/n_2^2}{p_1(1 - p_1)/n_1^* + p_2(1 - p_2)/n_2^*},$$
     where
     $$n_a^2 = \frac{\left(\sum_{k \in S_a} w_k\right)^3}{\sum_{k \in S_a} w_k^3} \quad\text{and}\quad n_a^* = \frac{\left(\sum_{k \in S_a} w_k\right)^2}{\sum_{k \in S_a} w_k^2}.$$
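A minimal Python sketch of the weight-based sample sizes and the resulting $b_{\text{simple}}$ for a difference of two domain proportions, assuming $n_a^2 = (\sum_{S_a} w_k)^3 / \sum_{S_a} w_k^3$ in the numerator terms and $n_a^* = (\sum_{S_a} w_k)^2 / \sum_{S_a} w_k^2$ in the denominator terms. The function names are hypothetical, and each domain's weights are passed as a separate list.

```python
def effective_sizes(weights):
    """Return (n_squared, n_star) for one domain's weights.

    n_squared stands in for n_a^2 (third-moment term),
    n_star for n_a^* (variance term).
    """
    total = sum(weights)
    n_squared = total**3 / sum(w**3 for w in weights)
    n_star = total**2 / sum(w**2 for w in weights)
    return n_squared, n_star


def b_simple_difference(p1, weights1, p2, weights2):
    """b_simple for the difference between two weighted domain proportions."""
    n1_sq, n1_star = effective_sizes(weights1)
    n2_sq, n2_star = effective_sizes(weights2)
    m3 = (p1 * (1 - p1) * (1 - 2 * p1) / n1_sq
          - p2 * (1 - p2) * (1 - 2 * p2) / n2_sq)
    v = p1 * (1 - p1) / n1_star + p2 * (1 - p2) / n2_star
    return m3 / v
```

With equal weights in a domain of size $n$, the two quantities reduce to $n^2$ and $n$, so the sketch reproduces the simple-random-sample expression.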

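  15. For a more general population or domain mean of a y-variable, one can replace $\hat{\gamma}$ with $b_{\text{simple}}/\sqrt{v_{\text{simple}}}$, where
     $$b_{\text{simple}} = \frac{\sum_{k \in S} w_k^3 (y_k - \bar{y})^3}{\sum_{k \in S} w_k \,\sum_{k \in S} w_k^2 (y_k - \bar{y})^2} \quad\text{and}\quad \bar{y} = \sum_{k \in S} w_k y_k \Big/ \sum_{k \in S} w_k.$$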
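A minimal Python sketch of $b_{\text{simple}}$ for a general weighted mean, assuming the form $\sum w_k^3 (y_k - \bar{y})^3 \,/\, [\sum w_k \sum w_k^2 (y_k - \bar{y})^2]$ with $\bar{y}$ the weighted mean. The function name `b_simple_mean` is hypothetical.

```python
def b_simple_mean(weights, y):
    """b_simple for a weighted mean of an arbitrary y-variable."""
    sw = sum(weights)
    y_bar = sum(w * yk for w, yk in zip(weights, y)) / sw
    num = sum(w**3 * (yk - y_bar) ** 3 for w, yk in zip(weights, y))
    den = sw * sum(w**2 * (yk - y_bar) ** 2 for w, yk in zip(weights, y))
    return num / den
```

For a 0/1 variable with equal weights this gives $(1 - 2p)/n$, matching the proportion case; the denominator is zero when all `y` values are identical, so constant data needs separate handling.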

  16. Calibration Weighting and the Jackknife. Calibration weighting often removes much of the impact of stratification and clustering from an estimated mean. As a result, estimating the skewness of an estimated proportion or mean using $b_{\text{simple}}$ may not be unreasonable, although it would often be better to replace the $y_k$ with a calibrated residual.

  17. If calibrated jackknife weights have been constructed to compute $v$ for an estimator $\hat{t}$, then these weights can also be used in estimating the third central moment of $\hat{t}$:
     $$m_{3(J)} = \sum_{h=1}^{H} \frac{(n_h - 1)^2}{n_h (n_h - 2)} \sum_{i=1}^{n_h} \left(\hat{t} - \hat{t}_{(hi)}\right)^3,$$
     where $\hat{t}$ is computed with calibrated weights, and $\hat{t}_{(hi)}$ is computed with the calibrated weights for the stratum-$h$ PSU-$i$ jackknife replicate.
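A minimal Python sketch of the jackknife third-moment estimate, assuming the form $m_{3(J)} = \sum_h \frac{(n_h - 1)^2}{n_h(n_h - 2)} \sum_i (\hat{t} - \hat{t}_{(hi)})^3$ and that the replicate estimates have already been computed with the calibrated replicate weights. The function name `jackknife_third_moment` and the list-per-stratum input layout are hypothetical.

```python
def jackknife_third_moment(t_hat, replicates_by_stratum):
    """Jackknife estimate of the third central moment of t_hat.

    replicates_by_stratum[h][i] is the estimate recomputed with PSU i of
    stratum h deleted (the stratum-h PSU-i jackknife replicate).
    """
    m3 = 0.0
    for reps in replicates_by_stratum:
        n = len(reps)
        if n < 3:
            raise ValueError("every stratum needs at least three PSUs")
        factor = (n - 1) ** 2 / (n * (n - 2))
        m3 += factor * sum((t_hat - r) ** 3 for r in reps)
    return m3
```

As a check on the sign convention: for a single stratum with PSU values 1, 2, 6 and $\hat{t}$ their mean, the delete-one means are 4, 3.5, and 1.5, and the sketch returns the same value as the direct formula $\sum (e_{hi} - \bar{e}_h)^3 / [n_h(n_h - 1)(n_h - 2)]$.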

  18. NISS/SAMSI Reception at the JSM. National Institute of Statistical Sciences; Statistical and Applied Mathematical Sciences Institute. Mon, 7/29, 6:00 PM to 9:00 PM. Room: H-Capitol Ballroom 7.
