Understanding Variance Estimator Bias in Stratified Two-Stage Sampling
Khoa Dong¹, Tim Trudell¹, Yang Cheng¹, Eric Slud¹,²
¹U.S. Census Bureau, ²University of Maryland
Joint Statistical Meetings, Vancouver, CA, July 29, 2018



  2. Disclaimer
     This presentation is intended to inform interested parties of ongoing research and to encourage discussion of work in progress. Any views expressed on statistical, methodological, technical, or operational issues are those of the authors and not necessarily those of the U.S. Census Bureau.

  3. Outline
     1. Motivation
     2. Overview of Current Population Survey (CPS)
     3. Problem description
     4. CPS variance estimation
     5. Simulation results

  4. Motivation
     • When estimating the response rate $p$ and $\mathrm{var}(p)$ for households in CPS non-self-representing (NSR) primary sampling units (PSUs), we observed an unusually high value of $\mathrm{var}(p)$.
     • We wanted to better understand the cause of this result.

  5. Current Population Survey
     • One of the oldest surveys in the U.S. (in operation since 1942)
     • Measuring national unemployment rate
     • Monthly sample of ~72,000 households

  6. Primary Sampling Unit
     • PSU: either a county or a group of contiguous counties
     • Two types of PSU:
       • Self-representing (SR)
       • Non-self-representing (NSR)

  7. CPS Sample Design
     • Two-stage stratified sampling design for NSR PSUs:
       • First stage: select one PSU per stratum with probability proportional to size (size measure: civilian noninstitutionalized population 16+, CNP 16+)
       • Second stage: systematic sampling within the selected PSUs
     • Systematic sampling for SR PSUs
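The two stages described for NSR strata can be illustrated with a minimal sketch. It assumes a single hypothetical stratum with made-up size measures and a fixed sampling interval; the function names `select_psu_pps` and `systematic_sample` are illustrative and are not taken from CPS production code, which involves sorting and other details not covered in the slides.

```python
import random

def select_psu_pps(psu_sizes, rng):
    """Pick one PSU index with probability proportional to its size measure."""
    total = sum(psu_sizes)
    threshold = rng.random() * total
    cumulative = 0.0
    for index, size in enumerate(psu_sizes):
        cumulative += size
        if threshold < cumulative:
            return index
    return len(psu_sizes) - 1  # guard against floating-point round-off

def systematic_sample(n_units, interval, rng):
    """Take every `interval`-th unit after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return list(range(start, n_units, interval))

rng = random.Random(2018)
# Hypothetical stratum with three PSUs and made-up CNP 16+ counts.
stratum_sizes = [12000, 30000, 18000]
chosen_psu = select_psu_pps(stratum_sizes, rng)
# Hypothetical second stage: 100 listed households, sampling interval 10.
households = systematic_sample(n_units=100, interval=10, rng=rng)
```

Because only one PSU is drawn per stratum, the sample carries no direct information about between-PSU variability within a stratum.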

  8. CPS Sample Design
     • Select PSUs once every 10 years
     • 852 PSUs selected in first stage (2010 design): 506 SR and 346 NSR
     • Approximately 80% of CNP 16+ population in SR PSUs

  9. Key Labor Force Estimates
     • Noninstitutionalized civilian labor force statistics:
       • Unemployment/employment levels
       • Unemployment rate
       • Labor force participation rate

  10. Problem
     • Estimate the monthly response rate $p$ and its variance $\mathrm{var}(p)$ for CPS households in NSR PSUs.
     • The sample is at the household level: 1 record for each sampled household in each month.
     • The response $y_i$ is binary: 1 = response, 0 = nonresponse.
     • Time period: March 17 – March 18

  11. NSR Household Response Rates, March 17 – March 18
     [Figure; axis label: Household Response Rate]

  12. Estimated Variance for Response and Nonresponse Rates, Mar 17 – Mar 18
     • We expect to see $\mathrm{var}(p) = \mathrm{var}(1 - p)$, but the two estimates are NOT equal.
     • Our chosen variance estimator introduces bias in some way.

  13. CPS Variance Estimation
     • Due to the CPS sample design, there is no direct variance estimator formula:
       • Only one PSU is selected per NSR stratum
       • Systematic sampling within PSU
     • Currently use the balanced repeated replication (BRR) method for NSR PSUs.

  14. CPS Variance Estimation
     • BRR variance estimator:
       $$\mathrm{var}(\hat{Y}) = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{Y}_r - \hat{Y})^2$$
       where
       $\hat{Y}_r$ = the $r$-th replicate estimate of $Y$
       $\hat{Y}$ = the full sample estimate of $Y$
       $R$ = number of replicates
       $K$ = perturbation factor; $0 \le K < 1$
     • BRR requires selecting two PSUs per stratum, but CPS selects only one PSU per stratum → collapse strata to make pseudo-strata.
     • These pseudo-strata should ideally contain exactly 2 perfectly matched strata.
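As a minimal sketch of the estimator on this slide, the function below divides the sum of squared replicate deviations by the number of replicates times the squared complement of the perturbation factor. The function name `fay_brr_variance` and the input numbers are made up for illustration.

```python
def fay_brr_variance(full_estimate, replicate_estimates, k=0.5):
    """BRR variance with perturbation factor k: sum of squared deviations / (R * (1 - k)^2)."""
    if not 0 <= k < 1:
        raise ValueError("perturbation factor k must satisfy 0 <= k < 1")
    r = len(replicate_estimates)
    squared_deviations = sum((y_r - full_estimate) ** 2 for y_r in replicate_estimates)
    return squared_deviations / (r * (1 - k) ** 2)

# Made-up numbers: four replicate estimates around a full-sample estimate of 100.
replicates = [101.0, 99.0, 102.0, 98.0]
variance = fay_brr_variance(100.0, replicates, k=0.5)
```

With a perturbation factor of 0.5, the divisor is 4 × 0.25 = 1, so the result here is simply the sum of squared deviations.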

  15. BRR with Pseudo-Strata
     • Suppose we want to estimate a population total $Y$ using
       $$\hat{Y} = \sum_{h=1}^{L} \hat{Y}_h$$
       where $L$ denotes the number of strata.
     • In the simple case where $L$ is even, we estimate the variance of $\hat{Y}$ by combining the $L$ strata into $G$ groups of two strata each ($L = 2G$).

  16. BRR with Pseudo-Strata
     • Hence,
       $$\hat{Y} = \sum_{g=1}^{G} \hat{Y}_g = \sum_{g=1}^{G} (\hat{Y}_{g1} + \hat{Y}_{g2})$$
       $$\mathrm{Var}(\hat{Y}) = \sum_{g=1}^{G} \left[\mathrm{Var}(\hat{Y}_{g1}) + \mathrm{Var}(\hat{Y}_{g2})\right] = \sum_{g=1}^{G} (\sigma_{g1}^2 + \sigma_{g2}^2)$$

  17. BRR with Pseudo-Strata
     • The $r$-th replicate estimate of $Y$:
       $$\hat{Y}_r = \sum_{g=1}^{G} \left[ (1 + (1-K)\delta_{gr})\,\hat{Y}_{g1} + (1 - (1-K)\delta_{gr})\,\hat{Y}_{g2} \right]$$
       where $\delta_{gr} = 1$ if the first stratum in the $g$-th group is selected and $\delta_{gr} = -1$ if the second stratum in the $g$-th group is selected.
     • The $\delta_{gr}$ are chosen from the entries of a Hadamard matrix.
     • Rows of a Hadamard matrix are mutually orthogonal:
       $$\sum_{r=1}^{R} \delta_{gr}\,\delta_{kr} = 0 \quad (\forall\, g \ne k)$$
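The replicate construction can be sketched in a few lines. This example assumes a Sylvester-type Hadamard matrix (order a power of two) and skips its all-ones first row, which is a common convention but is not stated in the slides; the stratum-level estimates are made up, and `sylvester_hadamard` and `replicate_estimate` are illustrative names.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix whose order is a power of two (Sylvester construction)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    matrix = [[1]]
    while len(matrix) < order:
        matrix = [row + row for row in matrix] + [row + [-x for x in row] for row in matrix]
    return matrix

def replicate_estimate(pairs, signs, k=0.5):
    """Sum over groups of (1 + (1-k)d) * y1 + (1 - (1-k)d) * y2, with d = +1 or -1."""
    return sum(
        (1 + (1 - k) * d) * y1 + (1 - (1 - k) * d) * y2
        for (y1, y2), d in zip(pairs, signs)
    )

hadamard = sylvester_hadamard(4)
# Hypothetical stratum-level estimates (first stratum, second stratum) for 3 pseudo-strata.
pairs = [(10.0, 12.0), (7.0, 5.0), (20.0, 21.0)]
# Row g + 1 of the matrix supplies the signs for group g; row 0 (all ones) is skipped.
delta = [hadamard[g + 1] for g in range(len(pairs))]
replicates = [
    replicate_estimate(pairs, [delta[g][r] for g in range(len(pairs))])
    for r in range(4)
]
```

With a perturbation factor of 0.5 the replicate factors are 1.5 and 0.5, and because each sign row used here sums to zero, the replicate estimates average back to the full-sample total.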

  18. BRR with Pseudo-Strata
       $$\hat{Y}_r - \hat{Y} = \sum_{g=1}^{G} (1-K)\,\delta_{gr}\,(\hat{Y}_{g1} - \hat{Y}_{g2})$$
       $$(\hat{Y}_r - \hat{Y})^2 = \sum_{g=1}^{G} (1-K)^2\,\delta_{gr}^2\,(\hat{Y}_{g1} - \hat{Y}_{g2})^2 + \sum_{g=1}^{G} \sum_{k \ne g} (1-K)^2\,\delta_{gr}\,\delta_{kr}\,(\hat{Y}_{g1} - \hat{Y}_{g2})(\hat{Y}_{k1} - \hat{Y}_{k2})$$

  19. BRR with Pseudo-Strata
       $$\frac{1}{R(1-K)^2} \sum_{r=1}^{R} (\hat{Y}_r - \hat{Y})^2 = \frac{1}{R(1-K)^2} \sum_{r=1}^{R} \sum_{g=1}^{G} (1-K)^2 (\hat{Y}_{g1} - \hat{Y}_{g2})^2 + \frac{1}{R(1-K)^2} \sum_{g=1}^{G} \sum_{k \ne g} (1-K)^2 (\hat{Y}_{g1} - \hat{Y}_{g2})(\hat{Y}_{k1} - \hat{Y}_{k2}) \sum_{r=1}^{R} \delta_{gr}\,\delta_{kr}$$
     • Therefore,
       $$\mathrm{var}(\hat{Y}) = \sum_{g=1}^{G} (\hat{Y}_{g1} - \hat{Y}_{g2})^2 = \sum_{g=1}^{G} (\hat{Y}_{g1}^2 + \hat{Y}_{g2}^2 - 2\,\hat{Y}_{g1}\hat{Y}_{g2})$$
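A small numerical check can make this result concrete: with orthogonal sign rows, the replicate-based BRR variance reduces to the sum of squared within-group differences, whatever the perturbation factor. The sketch below hard-codes three orthogonal rows of a 4 × 4 Hadamard matrix and uses made-up stratum estimates; `brr_variance` is an illustrative name.

```python
# Three mutually orthogonal sign rows (rows 1-3 of a 4 x 4 Hadamard matrix):
# DELTA[g][r] is the sign for group g in replicate r.
DELTA = [
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def brr_variance(pairs, delta, k=0.5):
    """Replicate-based BRR variance of the total for paired strata."""
    n_replicates = len(delta[0])
    full = sum(y1 + y2 for y1, y2 in pairs)
    squared_deviations = 0.0
    for r in range(n_replicates):
        replicate = sum(
            (1 + (1 - k) * delta[g][r]) * y1 + (1 - (1 - k) * delta[g][r]) * y2
            for g, (y1, y2) in enumerate(pairs)
        )
        squared_deviations += (replicate - full) ** 2
    return squared_deviations / (n_replicates * (1 - k) ** 2)

# Hypothetical stratum-level estimates for three pseudo-strata.
pairs = [(10.0, 12.0), (7.0, 5.0), (20.0, 21.0)]
brr = brr_variance(pairs, DELTA)
closed_form = sum((y1 - y2) ** 2 for y1, y2 in pairs)
```

Both quantities equal 9 for these inputs (4 + 4 + 1), and rerunning with `k=0.0` gives the same value because the factor $(1-K)^2$ cancels.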

  20. Bias in BRR with Pseudo-Strata
     • Taking expectation:
       $$E\left[\sum_{g=1}^{G} (\hat{Y}_{g1}^2 + \hat{Y}_{g2}^2 - 2\,\hat{Y}_{g1}\hat{Y}_{g2})\right] = \sum_{g=1}^{G} \left[\mathrm{Var}(\hat{Y}_{g1}) + \mu_{g1}^2 + \mathrm{Var}(\hat{Y}_{g2}) + \mu_{g2}^2 - 2\,\mu_{g1}\mu_{g2}\right]$$
       $$= \sum_{g=1}^{G} (\sigma_{g1}^2 + \sigma_{g2}^2) + \sum_{g=1}^{G} (\mu_{g1} - \mu_{g2})^2 = \mathrm{Var}(\hat{Y}) + \mathrm{Bias}^2$$
       where $\sigma_{gh}^2 = \mathrm{Var}\{\hat{Y}_{gh}\}$ and $\mu_{gh} = E\{\hat{Y}_{gh}\}$.
     • The bias squared term is nonnegative, so it can only ADD to the variance estimate.
     • The bias squared term would be zero if the pair of PSUs in each group were perfectly matched ($\mu_{g1} = \mu_{g2}$).
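The expectation step for a single group can be written out as follows. This is a sketch that assumes the two strata in a group are sampled independently, so that $E[\hat{Y}_{g1}\hat{Y}_{g2}] = \mu_{g1}\mu_{g2}$; the slides use this implicitly.

```latex
\begin{aligned}
E\big[(\hat{Y}_{g1} - \hat{Y}_{g2})^2\big]
  &= E[\hat{Y}_{g1}^2] + E[\hat{Y}_{g2}^2] - 2\,E[\hat{Y}_{g1}\hat{Y}_{g2}] \\
  &= (\sigma_{g1}^2 + \mu_{g1}^2) + (\sigma_{g2}^2 + \mu_{g2}^2) - 2\,\mu_{g1}\mu_{g2} \\
  &= \sigma_{g1}^2 + \sigma_{g2}^2 + (\mu_{g1} - \mu_{g2})^2 .
\end{aligned}
```

Summing over the $G$ groups gives the true variance plus the sum of squared differences in expected stratum totals.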

  21. How are Strata Collapsed?
     • In CPS, the objective function is a function of:
       • Unemployment
       • Civilian labor force
       • Children 0-17 at or below 200% poverty level

  22. Simulation Overview
     • Use one month of CPS data (Mar 18), which has pseudo-strata information.
     • For each household, generate responses $y_i$ iid from a Bernoulli distribution with various $p = 0.03, 0.06, \ldots, 0.99$.
     • For each $p$:
       • Run 5,000 sims.
       • Compute true variance and BRR variance.
       • Compute bias squared term $\sum_{g=1}^{G} (\mu_{g1} - \mu_{g2})^2$.
       • Compare true variance with BRR variance after adjusting for bias.
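The simulation steps can be sketched on toy data. Everything below is an assumption for illustration: two hypothetical pseudo-strata with unit household weights, 2,000 runs instead of 5,000, the empirical variance across runs standing in for the "true" variance, and the balanced closed form (sum of squared within-group differences) standing in for the replicate-based BRR computation. It is not the authors' code.

```python
import random
import statistics

def simulate(p, groups, n_sims, rng):
    """Monte Carlo comparison of true and BRR variance of the weighted response count.

    groups: one (weights_stratum_1, weights_stratum_2) pair per pseudo-stratum.
    """
    totals, brr_values = [], []
    for _ in range(n_sims):
        total, brr = 0.0, 0.0
        for weights_1, weights_2 in groups:
            # Weighted response counts of the two collapsed strata.
            y1 = sum(w for w in weights_1 if rng.random() < p)
            y2 = sum(w for w in weights_2 if rng.random() < p)
            total += y1 + y2
            brr += (y1 - y2) ** 2  # closed form of the balanced BRR estimator
        totals.append(total)
        brr_values.append(brr)
    true_var = statistics.variance(totals)  # empirical variance over simulations
    mean_brr = statistics.mean(brr_values)  # average BRR variance estimate
    # Bias squared term: squared gap between expected stratum totals, summed over groups.
    bias_sq = sum((p * sum(w1) - p * sum(w2)) ** 2 for w1, w2 in groups)
    return true_var, mean_brr, bias_sq

# Hypothetical pseudo-strata: stratum sizes of 30 vs 20 and 25 vs 15 households.
groups = [([1.0] * 30, [1.0] * 20), ([1.0] * 25, [1.0] * 15)]
true_var, mean_brr, bias_sq = simulate(0.9, groups, n_sims=2000, rng=random.Random(0))
```

In this toy setup the bias squared term is 0.81 × 200 = 162, far larger than the true variance of about 8, and `mean_brr - bias_sq` should land close to `true_var` up to Monte Carlo noise.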

  23. Simulation Computation
     • Total number of households: $\hat{N} = \sum_{i=1}^{n} w_i$
     • Full sample estimated response count: $\hat{Y} = \sum_{i=1}^{n} w_i y_i$
     • Replicate $r$ estimated response count: $\hat{Y}_r = \sum_{i=1}^{n} w_i y_i f_{ir}$, where $f_{ir}$ is either 1.5 or 0.5.

  24. Simulation Computation
     • BRR variance of $\hat{Y}$: $\mathrm{BRRVar}(\hat{Y}) = \frac{4}{160} \sum_{r=1}^{160} (\hat{Y}_r - \hat{Y})^2$
     • BRR variance of response rate $p$: $\mathrm{BRRVar}(p) = \left(\frac{1}{\hat{N}}\right)^2 \mathrm{BRRVar}(\hat{Y})$, assuming $N = \hat{N}$ is fixed from outside knowledge.
     • $\mu_{gh} = p \times N_{gh}$
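Since the expected stratum total is the response rate times the stratum size, the bias squared term scales with the square of the rate, while the true variance of an iid Bernoulli rate is symmetric in the rate and its complement. The sketch below computes both analytically for toy data. The assumptions are mine: fixed household weights, iid responses, the nonresponse rate treated the same way with the complement in place of the rate, and the illustrative name `expected_components`.

```python
def expected_components(p, groups):
    """Return (true variance, bias squared term) for the estimated rate.

    groups: one (weights_stratum_1, weights_stratum_2) pair per pseudo-stratum.
    Assumes iid Bernoulli(p) outcomes and fixed household weights.
    """
    n_hat = sum(sum(w1) + sum(w2) for w1, w2 in groups)
    sum_w_sq = sum(w * w for w1, w2 in groups for w in list(w1) + list(w2))
    true_var = p * (1 - p) * sum_w_sq / n_hat ** 2
    size_gap_sq = sum((sum(w1) - sum(w2)) ** 2 for w1, w2 in groups)
    bias_sq = p ** 2 * size_gap_sq / n_hat ** 2
    return true_var, bias_sq

# Hypothetical pseudo-strata with unit weights and unequal stratum sizes.
groups = [([1.0] * 30, [1.0] * 20), ([1.0] * 25, [1.0] * 15)]
var_response, bias_response = expected_components(0.9, groups)        # response rate 0.9
var_nonresponse, bias_nonresponse = expected_components(0.1, groups)  # nonresponse rate 0.1
```

The two true variances agree, but the bias term for the response rate is 81 times the one for the nonresponse rate (0.81 / 0.01), which is consistent with estimating the variance through the nonresponse rate when the response rate is large.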

  25. BRR Variance vs. True Variance
     As $p$ gets close to 1, the BRR variance estimate is far different from the true variance.
     [Figure; legend: BRRVar, TrueVar]

  26. BRR Variance vs. True Variance
     Not all of the bias can be explained, due to:
     • Strata collapsed based on a different set of covariates
     • Use of AHS MOS instead of CPS
     • AHS MOS not currently updated (from 2010 design)
     [Figure; legend: BRRVar, BRRVar - BiasSq, TrueVar]

  27. Summary
     • For the NSR component:
       • CPS collapses strata to make pseudo-strata.
       • There is no perfect matching of strata → bias in the variance estimator.
       • The bias becomes large as $p$ gets close to 1.
       • A quick fix is to use $\mathrm{var}(1 - p)$ for large $p$.
     • CPS is designed for civilian labor force statistics. Expect more bias when estimating the variance of other statistics.

  28. Questions? Thank You!
     khoa.dong@census.gov

