

  1. Dealing with measurement and integration errors in administrative data: the case of the Italian multi-source system on small and medium enterprises
     Orietta Luzi, Marco Di Zio, Ugo Guarnera, Roberta Varriale
     Italian National Statistical Institute (Istat)
     NTTS2015 Conference - Brussels, 10-12 March, 2015

  2. The «frame SBS»: a multiple-source system for Italian Structural Business Statistics based on administrative and survey data
     Frame SBS: a statistical information system for estimating structural economic variables on business accounts (Turnover, Purchases of goods and services, Production Value, Value Added, …) for small and medium enterprises, based on the primary use of integrated administrative/fiscal data, "complemented" with survey data.
     Until now, SBS for enterprises with fewer than 100 employees (~4.4 million units in 2011) have been estimated from a direct sample survey (~100,000 units); administrative data were used as auxiliary information.
     Variables (slide labels: "Main economic aggregates" and "Components of the main economic aggregates (out of scope of the paper)"):
     - Y1: Income from sales and services (Turnover)
     - Y2: Changes in stock of finished and semi-finished products
     - Y3: Changes in contract work in progress
     - Y4: Changes in internal work capitalized under fixed assets
     - Y5: Other income and earnings
     - Y6: Purchases of goods
     - Y7: Purchases of services
     - Y8: Use of third party assets
     - Y9: Changes in stocks of raw materials and for resale
     - Y10: Other operating charges
     - PC: Personnel costs

  3. The sources of the «frame SBS»
     - Financial Statements (FS) of corporate enterprises required to file a financial statement (about 800,000 enterprises each year)
     - The Sector Studies survey (SS), a Fiscal Authority survey that each year includes about 3.5 million enterprises with a turnover between 30,000 and 7.5 million euros, belonging to many economic activity sectors
     - The Tax Return Data (Unico model), based on a unified model of tax declarations by legal form, and IRAP, the Italian regional tax on productive activities
     - The Business Register (BR), used as population list and auxiliary source of information
     - The Social Security Data (SSD), which include firm-level data and employee data on wages and labor cost; auxiliary source of information

  4. The sources of the «frame SBS» (diagram)
     [Figure: layout of the integrated data for the N = 4.4 million SME units. Columns: BR (ID, Ateco, N Emp, Turn), Social Security Data (SSD: N Emp, PC, WS, WH, SC) and the target variables Y1, …, Yk observed in each source. Row blocks: Financial Statements (~16% of SMEs), Sector Studies Survey (~80% of SMEs), Tax Returns Data (UNICO, IRAP; ~97% of SMEs) and units not covered (~4%). An "SME Survey" label appears in every block.]

  5. Non-sampling errors
     - Harmonization
     - Measurement errors (consistency errors)
     - Coverage problems
     - (no unit identification errors were possible)

  6. Non-sampling errors - Harmonization
     - A system of indicators and quality measures, at both micro and aggregate level, was used to compare and harmonize information on the target variables coming from the different sources
     - Hierarchical approach in the use of the different sources
     - Subject matter experts
     - Permanent activity, dealing with changes in administrative and fiscal sources
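
The hierarchical use of sources mentioned above can be pictured with a minimal sketch. The priority order (FS, then SS, then Unico), the function name `pick_source` and the turnover figures are all assumptions made for illustration; the slides do not state the actual rules.

```python
from typing import Dict, Optional, Tuple

# Assumed priority order, for illustration only: Financial Statements first,
# then Sector Studies, then Tax Returns (Unico).
SOURCE_PRIORITY = ["FS", "SS", "Unico"]


def pick_source(values_by_source: Dict[str, Optional[float]]) -> Tuple[Optional[str], Optional[float]]:
    """Return (source, value) from the highest-priority source that reports a value."""
    for source in SOURCE_PRIORITY:
        value = values_by_source.get(source)
        if value is not None:
            return source, value
    # Unit not covered by any administrative source.
    return None, None


# Hypothetical turnover values (thousand euros) for three units.
units = {
    "A": {"FS": 1200.0, "SS": 1180.0, "Unico": None},
    "B": {"FS": None, "SS": 310.0, "Unico": 305.0},
    "C": {"FS": None, "SS": None, "Unico": None},
}

selected = {unit_id: pick_source(values) for unit_id, values in units.items()}
print(selected)
```

Units such as "C", for which no source reports a value, are the ones that would later need imputation.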

  7. Non-sampling errors - Consistency errors
     A two-phase data editing strategy:
     a. Editing activities on the micro-data observed in each AD source were performed to identify logical/formal data inconsistencies (e.g. balance errors and other kinds of invalid information)
     b. Specific analyses were devoted to assessing and resolving inconsistencies between variables integrated from different sources:
        - Identification of outliers: a trimming approach that analyses the distribution of economic indicators built using information from different sources (such as the per-capita labor cost) and rejects the values exceeding pre-defined thresholds, by domain
        - Influential errors: identified using a model-based robust selective editing approach for continuous variables (the selective editing methodology implemented in the R package SeleMix - Selective Editing via Mixture models)
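
As a rough illustration of the trimming step, the sketch below computes a per-capita labor cost from two integrated variables and flags the units whose indicator falls outside domain-specific bounds. The records, the domain names and the threshold values are hypothetical; the slides do not report the thresholds actually used.

```python
# Hypothetical integrated records: personnel cost (euros) and number of
# employees may come from different sources.
records = [
    {"id": 1, "domain": "retail", "personnel_cost": 90_000.0, "employees": 3},
    {"id": 2, "domain": "retail", "personnel_cost": 900_000.0, "employees": 3},
    {"id": 3, "domain": "construction", "personnel_cost": 12_000.0, "employees": 4},
    {"id": 4, "domain": "construction", "personnel_cost": 200_000.0, "employees": 5},
]

# Hypothetical pre-defined thresholds (euros per employee), by domain.
THRESHOLDS = {
    "retail": (5_000.0, 80_000.0),
    "construction": (8_000.0, 100_000.0),
}


def flag_outliers(records, thresholds):
    """Return the ids of units whose per-capita labor cost is outside the domain bounds."""
    flagged = []
    for record in records:
        if record["employees"] <= 0:
            # The indicator cannot be computed: treat the unit as inconsistent.
            flagged.append(record["id"])
            continue
        per_capita = record["personnel_cost"] / record["employees"]
        low, high = thresholds[record["domain"]]
        if not (low <= per_capita <= high):
            flagged.append(record["id"])
    return flagged


outlier_ids = flag_outliers(records, THRESHOLDS)
print(outlier_ids)
```

The model-based selective editing step is not reproduced here, since it relies on the SeleMix package in R.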

  8. Non-sampling errors - Coverage errors
     - Coverage problems
        - Unit non-response: the integrated AD sources relate to sub-populations which do not cover the overall SME population as defined for SBS purposes
        - Item non-response: for some units, some AD sources are incomplete and do not observe all the target variables required for SBS estimation
     - Predictive approach based on imputation
        - It made it possible to build a complete micro-data file for the variables that are extensively covered by the (integrated) AD sources
        - The unavailable information is predicted (imputed) from the available administrative information using a combination of techniques (including Predictive Mean Matching, Nearest Neighbor Donor, and other approaches based on logistic and linear regression), applied to separate groups of variables taking into account their distributional characteristics and their relationships with other variables
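
One of the techniques named above, Predictive Mean Matching, can be sketched as follows. This is a simplified, deterministic variant written for illustration (one predictor, a single closest donor, no random draw among several candidates); the variable roles and the data are hypothetical and do not reproduce the procedure used for the frame SBS.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(donors, recipients_x):
    """Impute y for each recipient with the observed y of the donor whose
    predicted mean is closest to the recipient's predicted mean.

    donors: list of (x, y) pairs with y observed.
    recipients_x: list of x values for units with y missing.
    """
    intercept, slope = fit_line([x for x, _ in donors], [y for _, y in donors])

    def predict(x):
        return intercept + slope * x

    imputed = []
    for x in recipients_x:
        target = predict(x)
        _, donor_y = min(donors, key=lambda d: abs(predict(d[0]) - target))
        imputed.append(donor_y)
    return imputed


# Hypothetical data: x = turnover known for every unit, y = value added
# observed only for the donors (thousand euros).
donors = [(100.0, 40.0), (200.0, 90.0), (300.0, 120.0), (400.0, 170.0)]
recipients_x = [210.0, 390.0]

imputed_values = pmm_impute(donors, recipients_x)
print(imputed_values)
```

Because the imputed value is always an observed donor value, the method keeps imputations within the range of real data, which is one reason it suits skewed economic variables.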

  9. Coverage rate (%) of the SME population by source and some main economic aggregates (year 2011)

     | Source        | Number of units | Number of employees | Revenues | Value added |
     |---------------|-----------------|---------------------|----------|-------------|
     | FS            | 16.1            | 38.2                | 66.2     | 54.1        |
     | SS            | 64.0            | 49.2                | 24.5     | 36.4        |
     | Unico         | 16.2            | 8.3                 | 5.5      | 6.1         |
     | Total covered | 96.3            | 95.7                | 96.2     | 96.6        |
     | Not covered   | 3.7             | 4.3                 | 3.7      | 3.4         |
     | Total         | 100.0           | 100.0               | 100.0    | 100.0       |

  10. Results
     Relative differences between total estimates based on administrative and sample data are considered. The total difference is the percentage difference between the variable estimates based on the new estimation system ($Y_{Frame}$) and the corresponding estimates based on the SME survey ($Y_{Sample}$):

     $$d_t = \frac{Y_{Sample} - Y_{Frame}}{Y_{Frame}} \times 100 = d_s + d_m$$

     $$d_s = \frac{Y_{Frame,Sample} - Y_{Frame}}{Y_{Frame}} \times 100 \quad \text{(sampling effect)}$$

     $$d_m = \frac{Y_{Sample} - Y_{Frame,Sample}}{Y_{Frame}} \times 100 \quad \text{(measurement effect)}$$

     The largest component in the decomposition of the main economic aggregate estimates is the one associated with the sampling error. This result is encouraging because it implies that the transition from design-based inference to an estimation approach based on administrative sources would result in a significant improvement in the accuracy of the estimates.
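
The decomposition can be checked numerically with a small sketch. It assumes that the total difference is (Y_Sample - Y_Frame) / Y_Frame x 100 and that Y_Frame,Sample denotes an estimate obtained from the administrative values of the sampled units; that reading, the function name `decompose` and the three totals are assumptions for illustration, not results from the presentation.

```python
def decompose(y_sample, y_frame, y_frame_sample):
    """Split the percentage difference between survey and frame estimates
    into a sampling effect and a measurement effect."""
    d_total = (y_sample - y_frame) / y_frame * 100
    d_sampling = (y_frame_sample - y_frame) / y_frame * 100
    d_measurement = (y_sample - y_frame_sample) / y_frame * 100
    return d_total, d_sampling, d_measurement


# Hypothetical totals (million euros).
d_t, d_s, d_m = decompose(y_sample=1040.0, y_frame=1000.0, y_frame_sample=1030.0)
print(f"total={d_t:.2f}%  sampling={d_s:.2f}%  measurement={d_m:.2f}%")

# The two effects add up to the total difference (up to rounding error).
assert abs(d_t - (d_s + d_m)) < 1e-9
```

With these made-up totals the sampling effect is the larger of the two components, which is the pattern the slide reports for the main economic aggregates.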

  11. Concluding remarks…
     - Overcome some limitations of the current statistical production strategy (costs, burden, accuracy)
     - Expected increase of SBS consistency over time
     - Higher levels of consistency between annual statistics on enterprises and National Accounts, starting from the 2011 Benchmark
     … and future work
     - Managing unit identification problems over time (splits, mergers, …)
     - Assessing the accuracy of the estimates for the main economic aggregates
     - Improving inferences for some components of the main economic aggregates in specific economic sectors
     - Consistent estimation w.r.t. the frame information in the different domains of statistics on enterprises (R&D, ICT, etc.)

  12. Thank you for your attention!
