
Analysing whether sample survey data can be replaced by administrative data



  1. Analysing whether sample survey data can be replaced by administrative data Arnout van Delden (a.vandelden@cbs.nl), Reinder Banning, Arjen de Boer and Jeroen Pannekoek

  2. Outline
     1. Understanding fitness for use
     2. Conceptual differences
     3. Numerical differences
     4. Discussion

  3. 1. Understanding fitness for use
     Concepts in administrative data:
     • Numerous rules
     • Differ by type of industry
     Case study:
     • New production system in 2011
     • Levels and growth rates
     • Can VAT be used for turnover?
     • 324 “base cells” for publication

  4. 1. Fitness for use: group the base cells
     Group     Target vs. administrative variable
     Control   No conceptual differences
     Accept    Conceptual differences and small numerical differences
     Adjust    Conceptual differences and substantial systematic numerical differences
     Reject    Conceptual differences and substantial non-systematic numerical differences
     Taking decisions: how to assign the base cells to the groups?
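The four groups on this slide amount to a small decision rule. A minimal sketch, assuming the three judgements are already available as booleans; how "small", "substantial" and "systematic" are measured is not fixed by the slide, and the function and argument names are hypothetical:

```python
def assign_group(conceptual_difference, small_numerical_difference, systematic):
    """Map the three judgements for one base cell to a group name.

    All three inputs are booleans supplied by the analyst; the way they
    are derived from the data is an assumption outside this sketch.
    """
    if not conceptual_difference:
        return "Control"
    if small_numerical_difference:
        return "Accept"
    # Substantial numerical differences: adjustable only if systematic.
    return "Adjust" if systematic else "Reject"


# Example: conceptual differences with a substantial, systematic deviation.
print(assign_group(True, False, True))
```

The third argument only matters when a conceptual difference exists and the numerical difference is substantial.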

  5. 2. Conceptual differences; find Control
     Base cells   Unique (set of) rules                                    Expected effect
     85           No regulation                                            VAT = T
     64           Foreign services not charged from 2010                   VAT < T
     35           International trade regulations, correctly derived       VAT ≈ T
     18           * Subcontractors shift VAT payment to main contractor    VAT ≈ T
                  * Foreign turnover not charged                           VAT ≪ T
     17           Derogation: certain economic activities not charged
     16           Subcontractors shift VAT payment to main contractor      VAT ≈ T
     89           21 other sets of rules (not specified)
     324          Total

  6. 3. Numerical differences: the data
     Yearly turnover, 2009 and 2010:
     • SBS and VAT
     • Linked at micro level
     • Units exist the whole year
     • Extremely small units excluded
     [Figure: Hotels and similar accommodation]

  7. 3. Numerical data: the model
     Linear regression:
     $y_{ki}^{t} = \alpha_k + d\alpha_k \delta_{ki}^{t} + (\beta_k + d\beta_k \delta_{ki}^{t}) x_{ki}^{t} + \varepsilon_{ki}^{t}$
     with SBS ($y$) and VAT ($x$) for base cell ($k$), unit ($i$), year ($t$) and year dummy ($\delta_{ki}^{t}$).
     Regression weights:
     – calibration weights (sample to population)
     – weighted residuals (heteroscedasticity)
     – M-estimator (Huber weights against outliers)
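To make the combination of the three regression weights concrete, here is a rough sketch in plain Python of a Huber-weighted fit of SBS turnover on VAT turnover for one base cell. It is not the authors' implementation. Assumptions: the residual variance is taken to be proportional to VAT turnover (so the heteroscedasticity weight is 1/VAT and VAT must be positive), the Huber constant is 1.345, the scale is a MAD estimate, and the year-dummy model is fitted as one line per year, which gives the same point estimates as the interaction model when each year has its own scale. All function names are hypothetical.

```python
from statistics import median


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(vat, sbs, d, c=1.345, iters=50):
    """Iteratively reweighted least squares with Huber weights.

    d are calibration weights; 1/vat is the assumed variance weight.
    """
    base = [di / xi for di, xi in zip(d, vat)]
    w = list(base)
    for _ in range(iters):
        a, b = wls_line(vat, sbs, w)
        # Residuals standardised under the assumed variance function.
        r = [(yi - a - b * xi) / xi ** 0.5 for xi, yi in zip(vat, sbs)]
        scale = median(abs(ri) for ri in r) / 0.6745 or 1e-12
        huber = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        w = [bi * hi for bi, hi in zip(base, huber)]
    return a, b


def fit_base_cell(records, base_year=2009):
    """records: (year, vat, sbs, calibration_weight) tuples, two years."""
    fits = {}
    for yr in sorted({rec[0] for rec in records}):
        rows = [rec for rec in records if rec[0] == yr]
        fits[yr] = huber_line([r[1] for r in rows], [r[2] for r in rows],
                              [r[3] for r in rows])
    a0, b0 = fits[base_year]
    a1, b1 = next(v for k, v in fits.items() if k != base_year)
    return {"intercept": a0, "d_intercept": a1 - a0,
            "slope": b0, "d_slope": b1 - b0}


# Made-up data: 2009 follows sbs = 2 * vat with one gross outlier,
# 2010 follows sbs = 1 + 2.2 * vat.
records = [(2009, float(x), 50.0 if x == 5 else 2.0 * x, 1.0) for x in range(1, 11)]
records += [(2010, float(x), 1.0 + 2.2 * x, 1.0) for x in range(1, 11)]
result = fit_base_cell(records)
print(result)
```

In the made-up data the outlier in 2009 is progressively downweighted, so the fitted 2009 slope ends up close to the slope of the remaining units.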

  8. 3. Numerical data: indicators for grouping
     Indicator and description:
     • $R_k^2 = 1 - \frac{SS(w)_{k,res}}{SS(w)_{k,tot}}$: coefficient of determination, with regression weights $w$
     • $M_k^{t,y} = \frac{\sum_i d_{ki}^{t} \, |\hat{y}_{ki}^{t} - y_{ki}^{t}|}{\sum_i d_{ki}^{t} \, (\hat{y}_{ki}^{t} + y_{ki}^{t})}$: MAPE, mean absolute percentage error, with calibration weights $d$
     • $\alpha_k, d\alpha_k, \beta_k, d\beta_k$: size and p-values of regression coefficients
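The first two indicators can be sketched as follows. This is an illustration under assumptions, not the authors' code: the weighted coefficient of determination uses the weighted mean of the observed values in the total sum of squares, and the MAPE is written as a ratio of weighted absolute differences to weighted sums of fitted and observed values, whose exact normalisation is an assumption. Function names are hypothetical.

```python
def weighted_r2(observed, fitted, w):
    """1 - SS_res / SS_tot, both sums of squares weighted by w."""
    sw = sum(w)
    mean_obs = sum(wi * yi for wi, yi in zip(w, observed)) / sw
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, observed, fitted))
    ss_tot = sum(wi * (yi - mean_obs) ** 2 for wi, yi in zip(w, observed))
    return 1.0 - ss_res / ss_tot


def weighted_mape(observed, fitted, d):
    """Weighted absolute differences relative to weighted sums (assumed form)."""
    num = sum(di * abs(fi - yi) for di, yi, fi in zip(d, observed, fitted))
    den = sum(di * (fi + yi) for di, yi, fi in zip(d, observed, fitted))
    return num / den


# Made-up values for three units with equal weights.
observed = [10.0, 20.0, 30.0]
fitted = [12.0, 18.0, 30.0]
weights = [1.0, 1.0, 1.0]
print(weighted_r2(observed, fitted, weights))
print(weighted_mape(observed, fitted, weights))
```

Passing the regression weights to the first function and the calibration weights to the second mirrors the distinction the slide makes between the two indicators.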

  9. Indicators for Reject
     [Figure: $R_k^2$ per base cell, with the 95% range of the Control group marked; labelled base cell: Sea and coastal passenger water transport]
     20 poorest base cells on $R_k^2$:
     • Sales partly not charged (19)
     • International trade (1)

  10. Indicators for group Accept & Adjust
      [Figure: slope 2009 per base cell, with the 95% range of the Control group marked; labelled base cell: Import of new passenger motor vehicles]
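Comparing an indicator with the 95% range of the Control group can be sketched as below. The slides do not say how the range is computed; this sketch assumes the empirical 2.5% and 97.5% quantiles of the Control group's slopes. The control slopes are made-up numbers, and the function names are hypothetical.

```python
from statistics import quantiles


def control_range(control_values):
    """Empirical 2.5% and 97.5% quantiles of the Control group's indicator."""
    cuts = quantiles(control_values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def outside_range(value, bounds):
    """True if a base cell's indicator falls outside the control range."""
    low, high = bounds
    return value < low or value > high


# Made-up control slopes scattered around the 1:1 line.
control_slopes = [0.95, 0.97, 0.98, 0.99, 1.00, 1.00, 1.01, 1.02, 1.03, 1.05]
bounds = control_range(control_slopes)

# A slope of 1.36 deviates from this made-up control group; 1.02 does not.
print(outside_range(1.36, bounds), outside_range(1.02, bounds))
```

Using the Control group as the yardstick avoids relying on significance tests, at the cost of needing enough control cells to estimate the tails of the range.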

  11. Conceptual and numerical result in line? Adjust?
      Expected effect: VAT < T
      Base cell   Number of points   Slope (2009)   Change of slope (2010)   Regulation
      45112       1742               1.36           -0.01                    Margin
      45402       31                 1.34           NA                       Margin
      45194       42                 1.17           0.05                     Margin
      45111       55                 1.16           -0.03                    Margin
      45191X      210                1.08           -0.04                    Margin
      47641       59                 1.02           0.09                     Different moment, Margin
      47790       88                 0.99           1.86                     Margin
      45320       35                 0.94           0.09                     Margin

  12. 4. Discussion
      Main findings
      – Use outlier-robust regression and indicators
      – The control group is not error free either (deviations from 1:1)
      – We could not use the significance of regression coefficients
      – Instead, we used the 95% range from the control group
      – We achieved a rough grouping by re-using existing data
      Discussion points
      – For some base cells no decision: conceptual ≠ numerical results
      – Limitation: requires the presence of a control group
