AN ONTOLOGY-BASED APPROACH TO ADMINISTRATIVE DATA SOURCES’ DOCUMENTATION AND QUALITY EVALUATION
G. D’Angiolini, P. De Salvo, A. Passacantilli, E. Patruno, T. Saccoccio
Istituto Nazionale di Statistica - ITALY
New Techniques and Technologies for Statistics – Brussels, 10-12 March 2015
Istat’s strategy for supporting the statistical use of administrative data
Istat is launching a strategy to provide any actual or potential user of administrative data sources with NEW SERVICES:
• SUPPLYING DOCUMENTATION about the available administrative data sources, in particular about their INFORMATION CONTENT and QUALITY
• MAKING the available administrative data sources MORE USABLE for statistical purposes by modifying their content, when possible
Istat’s strategy: ACTIVITIES and TOOLS
ACTIVITIES
• Administrative data sources’ INVESTIGATIONS
• Administrative data sources’ SURVEYS
• SUPERVISION ON CHANGES AND INNOVATION PROJECTS concerning administrative data sources and forms
TOOLS
• Modular Quality Assessment Framework for Administrative Data Sources: organizes the collected information about the QUALITY; steers the quality evaluator in producing SPECIFIC RELIABILITY ESTIMATES
• DARCAP (Documenting Public Administration ARchives) system: disseminates the collected information about administrative data sources’ CONTENT and QUALITY; supports the CHANGE NOTIFICATION activities
Supporting the statistical use of administrative data: the need for STANDARD and MODULAR documentation
• Today Istat, a non-Nordic NSI, aims at a massive use of the available administrative data sources, which differ in information content and quality features
• Hence the need for STANDARD and MODULAR DOCUMENTATION about the administrative data sources’ INFORMATION CONTENT and QUALITY
• In all countries, NEW TRENDS imply that the NSIs assume a role of methodological regulation: non-statistical organizations implement their own Decision Support Systems, produce statistical information, use administrative data and exchange data
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ INFORMATION CONTENT: the goal
• Statistical users need to answer both GENERAL and SPECIFIC questions concerning the INFORMATION CONTENT of the available administrative data sources
• Answering SPECIFIC questions is crucial to effectively support the evaluation and use of each source, as well as the COMPARISON and INTEGRATION of sources
• To answer SPECIFIC questions, statistical users need STANDARD and MODULAR documentation of the INFORMATION CONTENT
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ INFORMATION CONTENT: the goal
Statistical users need to answer both GENERAL and SPECIFIC questions concerning the INFORMATION CONTENT
• A GENERAL QUESTION about the INFORMATION CONTENT: which phenomena does the administrative data source observe?
• A SPECIFIC QUESTION about the INFORMATION CONTENT: does the source observe the set of events “Students’ enrollments”? What is the administrative definition of “Students’ enrollments”? Which characteristics are observed for “Students’ enrollments”?
The answers require the specification of the ADMINISTRATIVE DATA SOURCE’S ONTOLOGY according to a CONCEPTUAL MODEL
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ INFORMATION CONTENT: our approach
• A CONCEPTUAL MODEL for documenting the ADMINISTRATIVE DATA SOURCE’S ONTOLOGY, based on a logical background
• For each administrative data source, by means of a dedicated INVESTIGATION, we document the ADMINISTRATIVE DATA SOURCE’S ONTOLOGY by singling out all the collectives in the source, with their owned relationships and their owned characteristics
• This STANDARD DOCUMENTATION of the analyzed administrative data sources’ INFORMATION CONTENT is available to any user in the DARCAP system
OUR CONCEPTUAL MODEL

OUR CONCEPTS: COLLECTIVES
• POPULATIONS or SETS OF EVENTS
• many linked collectives in an administrative data source (ADS)
• set of single items, elements
COMMON LITERATURE CONCEPTS: OBJECT SETS
• often population and event are not separate concepts
• many linked collectives in an ADS
• their single items are called OBJECTS
SURVEY CONCEPTS: POPULATION
• often population and event are not separate concepts
• one main population, some linked ones
• their single items are called STATISTICAL UNITS

OUR CONCEPTS: CHARACTERISTICS
• quantitative or qualitative
• have observation domains or classifications
• set of pairs: collective element + classification (or domain) item
COMMON LITERATURE CONCEPTS: VARIABLES
• quantitative or qualitative
• often variable and classification are not separate concepts
SURVEY CONCEPTS: VARIABLES
• quantitative or qualitative
• often variable and classification are not separate concepts

OUR CONCEPTS: RELATIONSHIPS
• set of pairs: collective element + linked other collective element
COMMON LITERATURE CONCEPTS
• often relationships are not explicitly taken into account
SURVEY CONCEPTS
• often relationships are not explicitly taken into account
OUR CONCEPTUAL MODEL: Examples

COLLECTIVES (POPULATIONS or SETS OF EVENTS: set of single items, elements)
• POPULATIONS. Example: Students. In logic form: Students(Rossi)
• SETS OF EVENTS. Examples: Exams, Degree_events. In logic form: Exams(Exam i), Degree_events(Degree_event i)

OWNED CHARACTERISTICS (set of pairs: collective element + classification (or domain) item)
• Has residence in + Town list. In logic form: Has residence in(Rossi, Rome)
• Assigned credits + [6,12]. In logic form: Assigned credits(Exam i, 12)

OWNED RELATIONSHIPS (set of pairs: collective element + linked other collective element)
• Passed_by. In logic form: Passed_by(Exam i, Rossi)
• Concerns. In logic form: Concerns(Degree_event i, Rossi)
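The logic-form examples can be read as plain set membership: a collective is a set of elements, and an owned characteristic or owned relationship is a set of pairs. The following is a minimal sketch of that reading, not the DARCAP data model. The `Ontology` class, its method names, the identifier `Has_residence_in` and the concrete element `Exam_1` (standing in for the generic "Exam i") are assumptions made for illustration.

```python
# Illustrative sketch only: the Ontology class and its method names are
# assumptions, not part of DARCAP or of the slides.
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # collective name -> set of its elements
    collectives: dict = field(default_factory=dict)
    # characteristic name -> set of (collective element, domain item) pairs
    characteristics: dict = field(default_factory=dict)
    # relationship name -> set of (collective element, linked element) pairs
    relationships: dict = field(default_factory=dict)

    def observes(self, collective):
        """Specific question: does the source observe this collective?"""
        return collective in self.collectives

    def holds(self, predicate, *args):
        """Check a fact written in logic form, e.g. Passed_by(Exam_1, Rossi)."""
        if len(args) == 1:
            return args[0] in self.collectives.get(predicate, set())
        pairs = (self.characteristics.get(predicate)
                 or self.relationships.get(predicate)
                 or set())
        return tuple(args) in pairs


# Hypothetical instance mirroring the slide's examples.
ads = Ontology(
    collectives={"Students": {"Rossi"}, "Exams": {"Exam_1"},
                 "Degree_events": {"Degree_event_1"}},
    characteristics={"Has_residence_in": {("Rossi", "Rome")},
                     "Assigned_credits": {("Exam_1", 12)}},
    relationships={"Passed_by": {("Exam_1", "Rossi")},
                   "Concerns": {("Degree_event_1", "Rossi")}},
)

print(ads.observes("Students"))                    # True
print(ads.holds("Passed_by", "Exam_1", "Rossi"))   # True
print(ads.holds("Assigned_credits", "Exam_1", 6))  # False
```

Keeping the three kinds of component in separate dictionaries mirrors the slide's distinction between collectives, owned characteristics and owned relationships, so a specific question about one component can be answered without scanning the others.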
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: the goal
Statistical users need to answer both GENERAL and SPECIFIC questions concerning the QUALITY
• A GENERAL QUESTION about the QUALITY: does the administrative data source have an acceptable overall quality?
• A SPECIFIC QUESTION about the QUALITY: does the administrative source collect reliable information about the set of events “Students’ enrollments”? And about the characteristic “Owned degree”?
The answers require a MODULAR QUALITY ASSESSMENT CHECKLIST
IT IS A DIFFICULT TASK: surveys and administrative data sources are very different observation processes
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: the goal
IT IS A DIFFICULT TASK: surveys and administrative data sources are very different observation processes
• Different DATA COLLECTION PROCEDURES: surveys are designed as snapshots of the observed collectives at specified moments, whereas administrative data sources collect new information continuously, at any moment; in particular, they observe sets of events which occur over time
• Different QUALITY DETERMINANTS: administrative sources’ data are often affected by systematic errors, and only the administrative source’s experts can provide data users with proper information about the effects of such errors
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: our approach
• A MODULAR QUALITY ASSESSMENT FRAMEWORK based on a MODULAR QUALITY ASSESSMENT CHECKLIST
• For each administrative data source, by means of a dedicated INVESTIGATION, we interview the data source’s expert about the existing sources of errors and, when possible, perform in-depth quality analyses according to the MODULAR QUALITY ASSESSMENT CHECKLIST
• The aim is to assign specific quality indicators to each collective, owned relationship and owned characteristic in the administrative data source’s ontology
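Assigning one indicator per ontology component is what makes a specific quality question answerable. The sketch below illustrates the idea under loud assumptions: the slides do not define the indicators, so the 0 to 1 reliability scale, the placeholder values, the threshold and the names `Component`, `reliability` and `is_reliable` are all hypothetical.

```python
# Hypothetical sketch: component names follow the slides' examples, but the
# indicator scale (0..1), the values and the threshold are invented.
from typing import NamedTuple, Optional


class Component(NamedTuple):
    kind: str  # "collective", "owned relationship" or "owned characteristic"
    name: str


# One indicator per ontology component (placeholder values, not real results).
reliability = {
    Component("collective", "Students' enrollments"): 0.95,
    Component("owned characteristic", "Owned degree"): 0.60,
}


def is_reliable(component: Component, threshold: float = 0.8) -> Optional[bool]:
    """Answer a specific quality question; None means 'not assessed yet'."""
    score = reliability.get(component)
    return None if score is None else score >= threshold


print(is_reliable(Component("collective", "Students' enrollments")))  # True
print(is_reliable(Component("owned characteristic", "Owned degree")))  # False
print(is_reliable(Component("owned relationship", "Passed_by")))       # None
```

Returning `None` for an unassessed component keeps "no information" distinct from "unreliable", which matters when only part of a source has been investigated.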
DOCUMENTING the ADMINISTRATIVE DATA SOURCES’ QUALITY: our approach
We are now building the MODULAR QUALITY ASSESSMENT CHECKLIST
• Collectives, owned relationships and owned characteristics in the administrative data source are associated with different kinds of minimal information units, which are affected by different kinds of minimal errors
• We are now singling out all such minimal errors and combining them in order to define all the possible errors which may affect each component of the administrative data source’s ontology: each collective, owned relationship and owned characteristic
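One way to picture the combination step is as a lookup from component kind to its minimal information units, and from each unit to its minimal errors. The slides do not list the units or the errors, so every unit name and error name below is a placeholder, and taking the plain union of per-unit errors is only one assumed way of "combining" them.

```python
# Hypothetical sketch: the slides do not list the minimal information units
# or the minimal errors, so every name below is a placeholder.
UNITS_BY_COMPONENT = {
    "collective": ["element"],
    "owned characteristic": ["element", "domain item"],
    "owned relationship": ["element", "linked element"],
}

MINIMAL_ERRORS_BY_UNIT = {
    "element": ["missing", "duplicated"],
    "domain item": ["missing", "wrong value"],
    "linked element": ["missing", "wrong link"],
}


def possible_errors(component_kind):
    """Combine the minimal errors of every unit the component is built from."""
    return [
        (unit, error)
        for unit in UNITS_BY_COMPONENT[component_kind]
        for error in MINIMAL_ERRORS_BY_UNIT[unit]
    ]


# A checklist skeleton: one list of candidate errors per component kind.
checklist = {kind: possible_errors(kind) for kind in UNITS_BY_COMPONENT}

for kind, errors in checklist.items():
    print(kind, errors)
```

Because the checklist is derived from the two tables, adding a new minimal error to a unit updates every component that uses that unit, which is the modularity the checklist is meant to provide.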