
Small Area Estimation Applications in the US Census Bureau Annual Survey of Employment and Payroll Evaluation (PowerPoint presentation)



  1. Small Area Estimation Applications in the US Census Bureau Annual Survey of Employment and Payroll Evaluation
     Bac Tran
     Chief, Program Research Branch, Governments Division, U.S. Census Bureau

  2. Outline
     • Target Population
     • Population Parameters
     • Sampling Frame
     • Sample Design
     • Small Area Challenges
     • Estimators
     • Evaluation

  3. Target Population
     • Individual governments: a government is an organized entity which, in addition to having governmental character, has sufficient discretion in the management of its own affairs to distinguish it as separate from the administrative structure of any other governmental unit.
     • Types:
        o Counties
        o Municipalities
        o Townships
        o Special Districts
        o School Districts

  4. Parameters of Interest
     Annual Survey of Employment and Payroll (ASPEP):
     • Full-time Employees
     • Full-time Pay
     • Part-time Employees
     • Part-time Pay
     • Part-time Hours

  5. Parameters of Interest (Cont’d)
     ASPEP publication: statistics on the number of federal, state, and local government employees and their gross payrolls.

  6. Parameters of Interest: Statistical Aggregation
     • Totals by (state, function)
     • Level-of-government totals:
        o Local; state; state and local
        o Nation

  7. Parameters of Interest (Cont’d)
     Some Function Codes of ASPEP:
     • 001 Airport
     • 002 Space Research & Technology (Federal)
     • 005 Correction
     • 006 National Defense and International Relations (Federal)
     • 012 Elementary and Secondary - Instruction
     • 112 Elementary and Secondary - Other Total
     • 014 Postal Service (Federal)
     • 016 Higher Education - Other
     • 018 Higher Education - Instructional
     • 021 Other Education (State)
     • 022 Social Insurance Administration (State)
     • 023 Financial Administration
     • 024 Firefighters
     • 124 Fire - Other
     • 025 Judicial & Legal
     • 029 Other Government Administration
     • 032 Health
     • 040 Hospitals
     • 044 Streets & Highways
     • 050 Housing & Community Development (Local)
     • 052 Local Libraries
     • 059 Natural Resources
     • 061 Parks & Recreation
     • 062 Police Protection - Officers
     • 162 Police - Other
     • 079 Welfare
     • 080 Sewerage
     • 081 Solid Waste Management
     • 087 Water Transport & Terminals
     • 089 Other & Unallocable
     • 090 Liquor Stores (State)
     • 091 Water Supply
     • 092 Electric Power
     • 093 Gas Supply
     • 094 Transit

  8. Sampling Frame
     • Governments Integrated Directory (GID)
     • Created in 2007
     • Unit ID: 14 digits = State (2) + Type (1) + County (3) + Unit (3) + SUP (3) + SUB (2)

  9. Sampling Frame (Cont’d)
     Example of a unit ID:
     • 33 2 031 001 000 00 = New York City
     • 33 2 031 001 301 00 = New York City public school system (dependent on the city government)
     • 33 2 031 001 302 00 = Fashion Institute (dependent postsecondary education agency)
     • 33 2 031 001 303 00 = CUNY, City University of New York (dependent on the city government)
     • 33 2 031 001 303 01 = Manhattan Community College (one campus of CUNY)
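The unit ID layout on slides 8 and 9 can be split mechanically. Below is a minimal Python sketch, assuming IDs arrive as 14 digits with or without the spaces shown on the slide; `GidUnitId`, `parse_unit_id`, and the field name `gov_type` are hypothetical names, not part of any Census Bureau system.

```python
from typing import NamedTuple

# Field widths from slide 8: State (2), Type (1), County (3), Unit (3), SUP (3), SUB (2).
FIELD_WIDTHS = (2, 1, 3, 3, 3, 2)


class GidUnitId(NamedTuple):
    """Hypothetical container for the six fields of a 14-digit GID unit ID."""
    state: str
    gov_type: str
    county: str
    unit: str
    sup: str
    sub: str


def parse_unit_id(raw: str) -> GidUnitId:
    """Split a GID unit ID such as '33 2 031 001 303 01' into its six fields."""
    digits = raw.replace(" ", "")
    if len(digits) != sum(FIELD_WIDTHS) or not digits.isdigit():
        raise ValueError(f"expected a 14-digit unit ID, got {raw!r}")
    fields = []
    start = 0
    for width in FIELD_WIDTHS:
        fields.append(digits[start:start + width])
        start += width
    return GidUnitId(*fields)


if __name__ == "__main__":
    nyc = parse_unit_id("33 2 031 001 000 00")      # New York City
    campus = parse_unit_id("33 2 031 001 303 01")   # Manhattan Community College
    # Both IDs share State, Type, County, and Unit; only SUP and SUB differ.
    print(campus[:4] == nyc[:4], campus.sup, campus.sub)
```

Comparing the first four fields groups New York City with its dependent agencies in the examples above; treating SUP and SUB as dependency and sub-unit codes is an inference from those examples only.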

  10. Sample Design
     • Multistage sample design
     • PPS sample
        o Stratified PPS (state × type) based on Total Pay
     • Cut-off sampling method in sizable (state, type) strata
        o Construct a cut-off point to separate small and large units (two strata)
     • Modified cut-off sampling (a stratified PPS sample method)
        o Subsampling in small strata
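One common way to realize PPS sampling with a cut-off for large units is to compute inclusion probabilities $\pi_i = n x_i / \sum_k x_k$ within a stratum and move every unit with $\pi_i \ge 1$ into a certainty group, repeating until no probability reaches 1. The sketch below assumes the size measure $x_i$ is Total Pay and that it runs separately in each (state, type) stratum; the slides do not state ASPEP's actual cut-off rule, and `pps_with_certainties` is a hypothetical name.

```python
def pps_with_certainties(sizes, n):
    """Inclusion probabilities for a PPS sample of size n, with certainty units.

    sizes: positive size measures (e.g. Total Pay) for the units of one stratum.
    Returns (probabilities, sorted indices of the certainty units).
    """
    if not 0 < n <= len(sizes):
        raise ValueError("need 0 < n <= number of units")
    if any(s <= 0 for s in sizes):
        raise ValueError("size measures must be positive")
    certain = set()
    while True:
        rest = [i for i in range(len(sizes)) if i not in certain]
        n_rest = n - len(certain)
        total = sum(sizes[i] for i in rest)
        # A unit whose PPS probability would reach 1 lies above the cut-off:
        # take it with certainty and recompute for the remaining units.
        newly_certain = {i for i in rest if n_rest * sizes[i] >= total}
        if not newly_certain:
            break
        certain |= newly_certain
    probs = [1.0] * len(sizes)
    for i in rest:
        probs[i] = n_rest * sizes[i] / total
    return probs, sorted(certain)


if __name__ == "__main__":
    probs, certainties = pps_with_certainties([100.0, 1.0, 1.0, 1.0, 1.0], n=2)
    print(certainties, probs)
```

Comparing `n_rest * size >= total` avoids dividing before the certainty check, and the loop terminates because the certainty set only grows and can never exceed n units.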

  11. Sample
     [Diagram labels: Sampling Frame; Sample ($\pi ps$); Certainties; Births; $\hat{y}_{gf}$]

  12. Small Area Challenges
     • Designed at the (state, type) level, but estimated at the state-by-function level
     • Estimate total employees and total payroll at the state-by-function level:
       $$Y_{gf} = \sum_{i \in U_{gf}} Y_{gfi}$$
       where g = state and f = function

  13. Other Challenges
     Skewed data: not transformed

  14. Other Challenges (Cont’d)
     Skewed data: log-transformed
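The point of slides 13 and 14, that a log transform tames the skew of government-level counts, can be checked numerically. This sketch uses simulated lognormal data (not ASPEP data; the seed and parameters are arbitrary assumptions) and the moment-based skewness $g_1 = m_3 / m_2^{3/2}$.

```python
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(2009)
    # Simulated, not ASPEP data: lognormal counts mimic a few very large
    # governments among many small ones.
    employees = [rng.lognormvariate(3.0, 1.2) for _ in range(5000)]
    print(f"raw skewness: {sample_skewness(employees):.2f}")
    print(f"log skewness: {sample_skewness([math.log(e) for e in employees]):.2f}")
```

On the raw scale the long right tail dominates; after the log transform the skewness is close to zero, which is what makes normal random-effect models on log(y) plausible.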

  15. Estimators: ASPEP
     • Direct: Horvitz-Thompson, $\hat{y}^{HT}_{gf} = \sum_{i} w_{gfi}\, y_{gfi}$ (sum over the sampled units i in cell gf)
     • Composite
     • Battese, Harter, Fuller (BHF) Model
     • Our Proposed Model
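A minimal sketch of the direct estimator for every (state, function) cell, assuming each sampled record carries its inclusion probability $\pi_{gfi}$ and that the weight is the usual Horvitz-Thompson $w_{gfi} = 1/\pi_{gfi}$; the record layout and the function name are hypothetical.

```python
from collections import defaultdict


def ht_domain_totals(records):
    """Horvitz-Thompson totals per (state, function) cell.

    records: iterable of (state, function_code, y, pi), where y is the reported
    value (e.g. full-time employees) and pi the unit's inclusion probability.
    The design weight is taken as w = 1 / pi.
    """
    totals = defaultdict(float)
    for state, function_code, y, pi in records:
        if not 0.0 < pi <= 1.0:
            raise ValueError(f"inclusion probability out of range: {pi}")
        totals[(state, function_code)] += y / pi
    # Cells with no sampled unit simply do not appear in the result.
    return dict(totals)


if __name__ == "__main__":
    sample = [("CA", "094", 10, 0.5), ("CA", "094", 4, 1.0), ("CA", "001", 3, 0.25)]
    print(ht_domain_totals(sample))
```

Cells with no sampled unit never appear in the output, which is exactly why model-based estimators are needed at the state-by-function level.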

  16. Composite Estimator
     $$\hat{y}^{\text{composite}}_{gf} = \phi_g\, \hat{y}^{\text{HT}}_{gf} + (1 - \phi_g)\, \hat{y}^{\text{synthetic}}_{gf}$$
     where g = state, f = function code, and
     $$\hat{y}^{\text{synthetic}}_{gf} = K_{gf}\, \hat{Y}_g$$
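The composite estimator is a convex combination of the direct and synthetic estimates. The Python sketch below assumes the state-level weight (the argument `phi`) has already been chosen and lies in [0, 1]; the function names are hypothetical.

```python
def synthetic_estimate(k_gf: float, state_total: float) -> float:
    """Synthetic cell estimate: allocation factor K_gf times the state total Y_g."""
    return k_gf * state_total


def composite_estimate(y_ht: float, y_syn: float, phi: float) -> float:
    """Weighted average phi * direct + (1 - phi) * synthetic."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("composite weight must lie in [0, 1]")
    return phi * y_ht + (1.0 - phi) * y_syn


if __name__ == "__main__":
    y_syn = synthetic_estimate(0.1, 5000.0)
    print(composite_estimate(520.0, y_syn, 0.4))
```

With `phi = 1` the estimator falls back to the direct HT estimate, and with `phi = 0` it is purely synthetic.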

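  17. Estimators: ASPEP Composite Weight (Cont’d)
     • Purcell & Kish (1979):
       $$w = 1 - \frac{\sum_{g \in G,\, f \in F} v\left(\hat{Y}^{D}_{gf}\right)}{\sum_{g \in G,\, f \in F} \left(\hat{Y}^{S}_{gf} - \hat{Y}^{D}_{gf}\right)^2}$$
       where D denotes the direct estimate and S the synthetic estimate
     • Issue: the weight is negative in some i = (state, function code)
     • Fixable (Lahiri & Pramanik, 2010)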
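A sketch of the Purcell and Kish (1979) common weight in the form written on the slide, taking per-cell direct estimates, synthetic estimates, and estimated variances of the direct estimates as inputs (function names hypothetical). The `clip_weight` helper is an ad hoc guard added for illustration only; it is not the Lahiri and Pramanik (2010) correction.

```python
def purcell_kish_weight(direct, synthetic, direct_var):
    """Common composite weight w = 1 - sum v(Y_D) / sum (Y_S - Y_D)^2.

    direct, synthetic, direct_var: sequences over the same (state, function) cells.
    The result is returned unmodified, so it can be negative.
    """
    if not len(direct) == len(synthetic) == len(direct_var):
        raise ValueError("inputs must cover the same cells")
    spread = sum((s - d) ** 2 for s, d in zip(synthetic, direct))
    if spread == 0:
        raise ValueError("direct and synthetic estimates coincide; weight undefined")
    return 1.0 - sum(direct_var) / spread


def clip_weight(w):
    """Ad hoc guard forcing the weight into [0, 1]; NOT the Lahiri & Pramanik (2010) fix."""
    return min(1.0, max(0.0, w))


if __name__ == "__main__":
    w = purcell_kish_weight([100.0, 50.0], [110.0, 40.0], [150.0, 150.0])
    print(w, clip_weight(w))
```

The weight turns negative whenever the summed sampling variance of the direct estimates exceeds the summed squared gap between synthetic and direct estimates, which is the issue the slide flags.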

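  18. Composite Estimators (Cont’d)
     • Direct (HT): $\hat{y}^{HT}_{gf}$
     • Synthetic: $\hat{y}^{syn}_{gf} = K_{gf}\,\hat{Y}_g$, where $K_{gf} = x_{gf} / \sum_{f} x_{gf}$
     • Composite: $\hat{y}^{composite}_{gf}$
     • $\hat{Y}_1, \ldots, \hat{Y}_g, \ldots, \hat{Y}_{51}$ from the 2009 ASPEP, regressed on $Y_1, \ldots, Y_j, \ldots, Y_{51}$ from the 2007 Census (decision-based)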
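A sketch of the synthetic step, assuming $x_{gf}$ are previous-census totals for each (state, function) cell, so that $K_{gf}$ is each function's share of its state total; this reading of $K_{gf}$ and the function names are assumptions for illustration.

```python
from collections import defaultdict


def allocation_factors(census):
    """K_gf = x_gf / sum_f x_gf from census totals keyed by (state, function)."""
    state_sums = defaultdict(float)
    for (state, _), x in census.items():
        state_sums[state] += x
    factors = {}
    for (state, function_code), x in census.items():
        if state_sums[state] <= 0:
            raise ValueError(f"state {state} has no positive census total")
        factors[(state, function_code)] = x / state_sums[state]
    return factors


def synthetic_cells(factors, state_estimates):
    """Spread each current state estimate Y_g over its functions using K_gf."""
    return {(g, f): k * state_estimates[g] for (g, f), k in factors.items()}


if __name__ == "__main__":
    K = allocation_factors({("CA", "001"): 30, ("CA", "094"): 70, ("NV", "001"): 10})
    print(synthetic_cells(K, {"CA": 200.0, "NV": 15.0}))
```

By construction, each state's synthetic cells add up to the state estimate $\hat{Y}_g$, so the synthetic component only redistributes a state total across functions.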

  19. Estimators (Cont’d): Battese, Harter, Fuller (BHF) Model
     $$y_{ij} = \beta_0 + \beta_1 x_i + v_i + \epsilon_{ij}$$
     • $y_{ij}$: the number of full-time employees for the j-th governmental unit within the i-th small area
     • $x_i$: the number of full-time employees for the i-th small area, obtained from the previous census
     • $\beta_0, \beta_1$: unknown intercept and slope, respectively; $v_i$: small area specific random effects
     • $\epsilon_{ij}$: errors in individual observations
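For intuition on how the BHF model borrows strength, here is a sketch of the best linear unbiased predictor of an area's mean $\beta_0 + \beta_1 x_i + v_i$ when $\beta_0$, $\beta_1$, $\sigma_v^2$, and $\sigma_\epsilon^2$ are treated as known. In practice these are estimated (for example by REML) to give the EBLUP; the function name and the known-parameter simplification are assumptions.

```python
def bhf_area_mean(y_area, x_i, beta0, beta1, sigma2_v, sigma2_e):
    """BLUP of the area mean beta0 + beta1 * x_i + v_i under the BHF model.

    Takes beta0, beta1 and the variance components as known; an EBLUP would
    plug in estimates of them instead.
    """
    fixed = beta0 + beta1 * x_i
    n = len(y_area)
    if n == 0:
        return fixed  # no sample in the area: purely synthetic prediction
    ybar = sum(y_area) / n
    # Shrinkage factor gamma in [0, 1): grows toward 1 as n increases.
    gamma = sigma2_v / (sigma2_v + sigma2_e / n)
    return fixed + gamma * (ybar - fixed)


if __name__ == "__main__":
    print(bhf_area_mean([10.0, 14.0], x_i=2.0, beta0=1.0, beta1=4.0,
                        sigma2_v=1.0, sigma2_e=2.0))
```

The predictor leans on the regression line when an area has few sampled units and on the area's own sample mean when it has many.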

  20. Estimators (Cont’d): Our Proposed Model
     $$\log(y_{ij}) = \log(\beta_0) + \beta_1 x_i + v_i + \epsilon_{ij}$$
     where $v_i \overset{iid}{\sim} N(0, \sigma_v^2)$ and $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma_\epsilon^2)$
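Predictions from a model on log(y) must be transformed back to employee counts, and simply exponentiating the log-scale prediction understates the mean. Below is a sketch of one standard back-transformation under the normality assumptions above, treating all parameters as known and using the lognormal mean $\exp(\mu + s^2/2)$, where $s^2$ adds $\operatorname{Var}(v_i \mid \text{sample})$ to $\sigma_\epsilon^2$. Here `fixed_i` stands for the fixed part of the model on the log scale; the slides do not say which back-transformation the authors used, so this is not presented as their estimator, and all names are hypothetical.

```python
import math


def eb_unit_mean(log_y_area, fixed_i, sigma2_v, sigma2_e):
    """E[y_ij | sample] for a non-sampled unit in area i, nested-error model on log(y)."""
    n = len(log_y_area)
    if n == 0:
        gamma, v_hat = 0.0, 0.0
    else:
        mean_log = sum(log_y_area) / n
        gamma = sigma2_v / (sigma2_v + sigma2_e / n)
        v_hat = gamma * (mean_log - fixed_i)
    cond_var_v = sigma2_v * (1.0 - gamma)  # Var(v_i | sample)
    # Lognormal mean: exp(log-scale mean + log-scale variance / 2).
    return math.exp(fixed_i + v_hat + 0.5 * (cond_var_v + sigma2_e))


def eb_area_total(y_sampled, n_units, fixed_i, sigma2_v, sigma2_e):
    """Observed sample total plus predictions for the n_units - n unsampled units."""
    if any(y <= 0 for y in y_sampled):
        raise ValueError("log model needs strictly positive y")
    if n_units < len(y_sampled):
        raise ValueError("area cannot have fewer units than were sampled")
    logs = [math.log(y) for y in y_sampled]
    unit_mean = eb_unit_mean(logs, fixed_i, sigma2_v, sigma2_e)
    return sum(y_sampled) + (n_units - len(y_sampled)) * unit_mean


if __name__ == "__main__":
    print(eb_area_total([12.0, 30.0, 7.0], n_units=10, fixed_i=2.5,
                        sigma2_v=0.3, sigma2_e=0.8))
```

The positivity check mirrors the evaluation data's restriction to units reporting strictly positive full-time employment.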

  21. Data for Evaluation
     Government units present in both the 2002 and 2007 Censuses of Governments that report strictly positive numbers of full-time employees.

  22. Evaluation
     • Performance of the log-transform EB
        o Results
        o Residual diagnostics
        o EB performance in small areas
        o Benchmark Ratio (BR)
           • EB → HT as n becomes larger
     • Smoothing the EB
        o One-way raking: state totals to the direct (HT) totals
        o Two-way raking: state-by-function totals to the HT
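One-way raking can be sketched as a ratio adjustment within each state, assuming the targets are the direct (HT) state totals; the function name and dictionary layout are hypothetical.

```python
from collections import defaultdict


def rake_one_way(cells, state_targets):
    """Scale cell estimates so each state's cells add up to its target total."""
    state_sums = defaultdict(float)
    for (state, _), est in cells.items():
        state_sums[state] += est
    raked = {}
    for (state, function_code), est in cells.items():
        if state_sums[state] <= 0:
            raise ValueError(f"state {state} has no positive cell estimates")
        raked[(state, function_code)] = est * state_targets[state] / state_sums[state]
    return raked


if __name__ == "__main__":
    eb_cells = {("CA", "001"): 40.0, ("CA", "094"): 60.0}
    print(rake_one_way(eb_cells, {"CA": 110.0}))
```

Ratios between functions inside a state are unchanged; only the level of each state's cells moves to the benchmark.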

  23. Evaluation: Results
     • Out of 1,225 (CA, function code) cells:
        o 671 cases: our model (clear winner)
        o 324 cases: HT
        o 230 cases: composite
     • No significant difference:
        o 160 cases between the log-transformed model and the HT
        o 145 cases between the composite and the HT
     • HT won in cells where more than 70% of the units were large certainties
     • Testing for significance, our model can be used in 831 out of 1,225 cells (≈68%)

  24. Evaluation: Results
     Table 1: Percent Relative Error of Different Estimates of Full-Time Employees Relative to the Truth (California)

  25. Evaluation (Cont’d): Results, Diagnostic Analysis
     QQ Plot for BHF Model

  26. Evaluation (Cont’d): Results, Diagnostic Analysis
     QQ Plot for Our Model

  27. Evaluation: Results (Gas Supply, All States, Average n = 4)
     [Figure 4]

  28. Evaluation (Cont’d)
     • Benchmark Ratio (BR):
       $$\mathrm{BR} = \left| \sum (\text{estimate} - \text{HT}) / \text{HT} \right|$$
     • Indicates how close the estimate is to the HT when considering large areas
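The slide's formula leaves the scope of the denominator open. This sketch assumes the aggregate reading, $|\sum_c(\hat{y}_c - \hat{y}^{HT}_c)| / \sum_c \hat{y}^{HT}_c$ over the cells $c$ of a large area, returned as a fraction; the name `benchmark_ratio` is hypothetical, and a cell-by-cell reading would give different numbers.

```python
def benchmark_ratio(estimates, ht):
    """|sum(estimate - HT)| / sum(HT) over the cells of a large area."""
    if len(estimates) != len(ht):
        raise ValueError("estimates and HT must cover the same cells")
    total_ht = sum(ht)
    if total_ht == 0:
        raise ValueError("HT total is zero")
    return abs(sum(e - h for e, h in zip(estimates, ht))) / total_ht


if __name__ == "__main__":
    print(benchmark_ratio([105.0, 95.0, 52.0], [100.0, 100.0, 50.0]))
```

Under this reading, cell-level errors of opposite sign cancel, so BR measures agreement with the HT at the aggregate level rather than cell accuracy.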

  29. Evaluation (Cont’d): Results
     Comparison of Benchmark Ratios (Nation):
     • Size < 50: BR for the EB = 1.5; BR for the BHF = 1.6
     • Size ≥ 50: BR for the EB = 1.1; BR for the BHF = 1.5

  30. Evaluation (Cont’d): Visualization of Table 1
     [Figure 3: Distance of the Estimators to the Truth; y-axis: distance to the truth (relative errors); x-axis: (function, sample size), from small n to big; series: HT, Ours, BHF]

  31. Evaluation (Cont’d): Raking, Log-Transformed to HT Base (CA)
     [Figure 5: Effect of Benchmarking the Log Transformation; y-axis: distance to true; x-axis: function code; series: Log, Log_Benchmarked]

  32. Evaluation (Cont’d): Effect of Raking
     Benchmarking improved the estimates.

  33. Evaluation (Cont’d): Comparison of EB, Raking EB, and HT
     [Figure 7: EB, EB Benchmarked, and HT; y-axis: distance to true; x-axis: function code; series: Log, Log_Benchmarked, HT]

  34. Evaluation (Cont’d): Domain Analysis (Gas Supply, average n = 4)
     • EB = log(full-time employees)
     • Benchmarked EB = EB benchmarked to the HT (one-way raking to the nation total)

  35. Evaluation (Cont’d): Overall Relative Errors
     Table 2: Comparison of Overall Relative Errors (CA)
     • Overall absolute relative errors, Σ |(estimate - True)/True|: HT 5.26%, EB 1.67%, EB benchmarked 1.44%, BHF 14.35%
     • Overall relative errors, Σ (estimate - True)/True: HT 3.05%, EB -1.5%, EB benchmarked -1%, BHF -14.35%
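The two summary measures in Table 2 can be computed as below, assuming the sums run over the function-code cells being compared (the slide does not spell out the index set); the function name is hypothetical.

```python
def overall_relative_errors(estimates, truth):
    """Return (sum |(est - true)/true|, sum (est - true)/true) over cells."""
    if len(estimates) != len(truth):
        raise ValueError("estimates and truth must cover the same cells")
    if any(t == 0 for t in truth):
        raise ValueError("relative error undefined for a zero true value")
    rel = [(e - t) / t for e, t in zip(estimates, truth)]
    return sum(abs(r) for r in rel), sum(rel)


if __name__ == "__main__":
    abs_err, signed_err = overall_relative_errors([110.0, 90.0], [100.0, 100.0])
    print(abs_err, signed_err)
```

When every estimate falls below the truth, the two sums have equal magnitude and opposite sign, which is the pattern in the BHF entries of Table 2.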

  36. Evaluation (Cont’d): Two-Way Raking (States, Functions)
     • Two-way raking:
        o All states to the national total
        o All functions to the national function totals
     • Underestimated cases drop from 255 to 210.
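Two-way raking is commonly carried out by iterative proportional fitting: alternately rescale the cells to match the state totals and the function totals until both margins agree. The sketch assumes strictly positive cell estimates and state and function targets that share one grand total; the slides do not describe the algorithm used, and all names are hypothetical.

```python
from collections import defaultdict


def _scale(cells, key_index, targets):
    """Scale cells so sums over one margin (0 = state, 1 = function) hit the targets."""
    sums = defaultdict(float)
    for key, est in cells.items():
        sums[key[key_index]] += est
    return {key: est * targets[key[key_index]] / sums[key[key_index]]
            for key, est in cells.items()}


def rake_two_way(cells, state_targets, function_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of (state, function) cells to both margins."""
    if any(est <= 0 for est in cells.values()):
        raise ValueError("raking needs strictly positive cell estimates")
    grand = sum(state_targets.values())
    if abs(grand - sum(function_targets.values())) > tol * grand:
        raise ValueError("state and function targets must share one grand total")
    est = dict(cells)
    for _ in range(max_iter):
        est = _scale(est, 0, state_targets)      # match state totals
        est = _scale(est, 1, function_targets)   # match function totals
        state_sums = defaultdict(float)
        for (state, _), value in est.items():
            state_sums[state] += value
        # Function totals match exactly after the last step; stop once states do too.
        if all(abs(state_sums[g] - t) <= tol * grand for g, t in state_targets.items()):
            return est
    raise RuntimeError("raking did not converge")


if __name__ == "__main__":
    seed = {("A", "x"): 5.0, ("A", "y"): 1.0, ("B", "x"): 2.0, ("B", "y"): 7.0}
    print(rake_two_way(seed, {"A": 30.0, "B": 70.0}, {"x": 40.0, "y": 60.0}))
```

Each pass fixes one margin exactly and disturbs the other less and less; with positive cells and consistent targets the two margins converge together.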

  37. Acknowledgements
     • Thanks for strong support of this research:
        o Carma Hogue (Assistant Division Chief)
        o Lisa Blumerman (Division Chief)
     • Technical advice/review:
        o Dr. Partha Lahiri

  38. Contact Information
     Bac Tran
     Bac.Tran@census.gov
     Chief, Program Research Branch, Governments Division, U.S. Census Bureau

  39. Thank you for your time! Questions?
