The UK Longitudinal Studies (LSs)
- Sensitive microdata: a sample from the Census linked to administrative data (births, deaths, marriages, health and other records)
- Restricted access:
  - Safe settings: ONS LS (England & Wales): London, Titchfield and Newport; SLS (Scotland): Edinburgh; NILS (Northern Ireland): Belfast
  - Remote access: only variable names and labels are provided to the user; a Support Officer runs the analysis script on the real data
Administrative Data Research Centre - Scotland | Beata Nowok | 10 March 2015
Synthetic data for the UK LSs
- Synthetic versions of data extracts, tailored to individual user data requests
- Provided to approved researchers for preliminary analysis and for preparing code; the final analysis is run on the real data in safe settings
Original (input)

Sex     Age  Education             Marital status  Income  Life satisfaction
FEMALE  57   VOCATIONAL/GRAMMAR    MARRIED         800     PLEASED
MALE    41   SECONDARY             UNMARRIED       1500    MIXED
FEMALE  18   VOCATIONAL/GRAMMAR    UNMARRIED       NA      PLEASED
FEMALE  78   PRIMARY/NO EDUCATION  WIDOWED         900     MIXED
FEMALE  54   VOCATIONAL/GRAMMAR    MARRIED         1500    MOSTLY SATISFIED
MALE    20   SECONDARY             UNMARRIED       -8      PLEASED
FEMALE  39   SECONDARY             MARRIED         2000    MOSTLY SATISFIED
MALE    39   SECONDARY             MARRIED         1197    MIXED

Synthetic (output)

Sex     Age  Education             Marital status  Income  Life satisfaction
FEMALE  38   VOCATIONAL/GRAMMAR    MARRIED         NA      MOSTLY DISSATISFIED
FEMALE  73   VOCATIONAL/GRAMMAR    WIDOWED         1700    PLEASED
FEMALE  54   SECONDARY             WIDOWED         2000    MOSTLY SATISFIED
MALE    81   PRIMARY/NO EDUCATION  MARRIED         2100    PLEASED
MALE    30   VOCATIONAL/GRAMMAR    UNMARRIED       900     MOSTLY SATISFIED
MALE    54   VOCATIONAL/GRAMMAR    MARRIED         1700    PLEASED
MALE    68   SECONDARY             MARRIED         -8      DELIGHTED
FEMALE  32   VOCATIONAL/GRAMMAR    DIVORCED        870     MIXED
MALE    61   PRIMARY/NO EDUCATION  MARRIED         -8      MIXED
FEMALE  98   PRIMARY/NO EDUCATION  MARRIED         800     MOSTLY DISSATISFIED
FEMALE  50   PRIMARY/NO EDUCATION  MARRIED         NA      MOSTLY SATISFIED
FEMALE  37   VOCATIONAL/GRAMMAR    MARRIED         158     PLEASED
MALE    28   VOCATIONAL/GRAMMAR    NA              1500    MOSTLY SATISFIED
FEMALE  62   PRIMARY/NO EDUCATION  MARRIED         830     MOSTLY SATISFIED
MALE    78   PRIMARY/NO EDUCATION  MARRIED         NA      PLEASED
FEMALE  29   SECONDARY             MARRIED         580     MOSTLY SATISFIED
MALE    59   PRIMARY/NO EDUCATION  MARRIED         1300    MOSTLY SATISFIED
MALE    41   SECONDARY             UNMARRIED       1500    MIXED
MALE    18   SECONDARY             UNMARRIED       -8      PLEASED
FEMALE  73   PRIMARY/NO EDUCATION  WIDOWED         1350    MOSTLY SATISFIED

Data that look (structurally) like original data but contain artificial units only
Data that behave (statistically) like original data
synthpop: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control
R package: http://cran.r-project.org/package=synthpop
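Generating synthetic data: method
Sequentially replacing original data values with synthetic values generated from conditional probability distributions:
- fit, on the observed data: $Y_j \sim (Y_0, Y_1, \ldots, Y_{j-1})$
- draw synthetic values of $Y_j$ from the fitted model, conditioning on the already-synthesised values of $Y_0, Y_1, \ldots, Y_{j-1}$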
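The fit/draw cycle can be illustrated with a minimal Python sketch. It is not synthpop code (synthpop is an R package and fits CART models by default); as an assumption made only for brevity, each conditional distribution $Y_j \mid Y_0, \ldots, Y_{j-1}$ is modelled with a hypothetical add-one-smoothed naive Bayes classifier, and only the four categorical columns of the original (input) example rows are used. The function names `fit_naive_bayes`, `draw` and `synthesise` are illustrative.

```python
import random
from collections import Counter, defaultdict


def fit_naive_bayes(rows, target, predictors):
    """Fit P(target | predictors) on observed rows (hypothetical stand-in model).

    Naive Bayes with add-one smoothing is used only because it is short; a real
    synthesiser such as synthpop fits CART or other models instead.
    """
    prior = Counter(row[target] for row in rows)
    # counts[p][c][v]: how often predictor p equals v among rows where target == c
    counts = {p: defaultdict(Counter) for p in predictors}
    levels = {p: {row[p] for row in rows} for p in predictors}
    for row in rows:
        for p in predictors:
            counts[p][row[target]][row[p]] += 1
    return prior, counts, levels


def draw(model, record, rng):
    """Draw one target value given the record's (synthetic) predictor values."""
    prior, counts, levels = model
    classes = sorted(prior)
    weights = []
    for c in classes:
        w = prior[c]
        for p, table in counts.items():
            # Smoothing gives every category a non-zero probability, so the
            # synthetic records are not forced to copy observed combinations.
            w *= (table[c][record[p]] + 1) / (prior[c] + len(levels[p]))
        weights.append(w)
    return rng.choices(classes, weights=weights, k=1)[0]


def synthesise(observed, order, seed=None):
    """Sequentially synthesise the variables listed in `order`."""
    rng = random.Random(seed)
    first = [row[order[0]] for row in observed]
    # Y_0 is sampled from its observed marginal distribution.
    synthetic = [{order[0]: rng.choice(first)} for _ in observed]
    for j in range(1, len(order)):
        # Fit Y_j ~ (Y_0, ..., Y_{j-1}) on the observed data ...
        model = fit_naive_bayes(observed, order[j], order[:j])
        # ... then draw Y_j conditioning on each record's synthetic predictors.
        for record in synthetic:
            record[order[j]] = draw(model, record, rng)
    return synthetic


# The four categorical columns of the original (input) example rows.
cols = ["sex", "edu", "marital", "ls"]
rows = [
    ("FEMALE", "VOCATIONAL/GRAMMAR", "MARRIED", "PLEASED"),
    ("MALE", "SECONDARY", "UNMARRIED", "MIXED"),
    ("FEMALE", "VOCATIONAL/GRAMMAR", "UNMARRIED", "PLEASED"),
    ("FEMALE", "PRIMARY/NO EDUCATION", "WIDOWED", "MIXED"),
    ("FEMALE", "VOCATIONAL/GRAMMAR", "MARRIED", "MOSTLY SATISFIED"),
    ("MALE", "SECONDARY", "UNMARRIED", "PLEASED"),
    ("FEMALE", "SECONDARY", "MARRIED", "MOSTLY SATISFIED"),
    ("MALE", "SECONDARY", "MARRIED", "MIXED"),
]
observed = [dict(zip(cols, r)) for r in rows]
synthetic = synthesise(observed, cols, seed=2015)
for record in synthetic:
    print(record)
```

The predictors passed to `draw` come from the synthetic record, not the original one, which is what makes the procedure sequential. Conditioning on exact cells of all previous variables instead would simply resample original records, which is one reason smoothed or pooled models (such as CART leaves) are preferred.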
Generating synthetic data: synthpop
observed → syn() → synthetic
Generating synthetic data: synthpop
- Synthesis can be run with default parameters (classification and regression tree (CART) models): syn(data)
- Methods to summarise synthetic data and to make inferences from them
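As a rough illustration of summarising synthetic data against the original, the Python sketch below compares the marginal distribution of one variable (life satisfaction) in the two example tables and reports the total variation distance. The helpers `proportions` and `compare_marginals` are hypothetical and are not synthpop's own comparison function (synthpop provides `compare()` for this purpose); the example tables are small, so the numbers only show how such a check works.

```python
from collections import Counter


def proportions(values):
    """Relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def compare_marginals(observed, synthetic):
    """Return (category, observed share, synthetic share) rows and the total
    variation distance between the two distributions (0 means identical)."""
    p, q = proportions(observed), proportions(synthetic)
    table = [(c, p.get(c, 0.0), q.get(c, 0.0)) for c in sorted(set(p) | set(q))]
    tvd = 0.5 * sum(abs(po - ps) for _, po, ps in table)
    return table, tvd


# Life satisfaction in the original (input) example table, row by row.
ls_observed = ["PLEASED", "MIXED", "PLEASED", "MIXED", "MOSTLY SATISFIED",
               "PLEASED", "MOSTLY SATISFIED", "MIXED"]
# Category counts taken from the 20 rows of the synthetic (output) table.
ls_synthetic = (["MOSTLY DISSATISFIED"] * 2 + ["PLEASED"] * 6
                + ["MOSTLY SATISFIED"] * 8 + ["DELIGHTED"] + ["MIXED"] * 3)

table, tvd = compare_marginals(ls_observed, ls_synthetic)
for category, share_obs, share_syn in table:
    print(f"{category:<20} {share_obs:6.3f} {share_syn:6.3f}")
print(f"Total variation distance: {tvd:.2f}")  # prints 0.30
```

A distance near 0 suggests the synthetic marginal tracks the original; checks on relationships between variables (for example, fitting the same model to both datasets) are needed as well, since matching marginals alone says little about joint structure.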
syn() and common data problems
- Missing-data patterns
- Semi-continuous variables
- Restricted values (interrelationships between variables)
- Linear constraints
- Non-negativity / non-normality
- Deterministic relations
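Semi-continuous variables and missing-data codes can be seen in the income column of the example data, which mixes amounts with `NA` and the value `-8`. The Python sketch below shows two-stage synthesis: first the kind of value (`NA`, `-8` or an amount) is drawn, then an amount is drawn only where one is needed, from a normal distribution fitted to the genuine amounts and redrawn until non-negative. Treating `None` as NA and `-8` as a special code is an assumption about the data, and the function name `synthesise_semicontinuous` is hypothetical; a real sequential synthesis would also condition both stages on previously synthesised variables. In synthpop itself, such codes are declared through `syn()` arguments such as `cont.na`.

```python
import random
import statistics

# Assumption: None stands for NA and -8 is a special (non-numeric) income code.
SPECIAL_CODES = (None, -8)


def synthesise_semicontinuous(values, n, seed=None):
    """Two-stage synthesis of a variable mixing amounts with special codes."""
    rng = random.Random(seed)
    # Stage 1 input: the "kind" of each observed value (a code, or an amount).
    kinds = [v if v in SPECIAL_CODES else "amount" for v in values]
    amounts = [v for v in values if v not in SPECIAL_CODES]
    # Fit the continuous part on genuine amounts only, so -8 does not
    # distort the mean and spread.
    mu, sd = statistics.mean(amounts), statistics.stdev(amounts)
    synthetic = []
    for _ in range(n):
        kind = rng.choice(kinds)  # Stage 1: draw the kind of value.
        if kind != "amount":
            synthetic.append(kind)  # NA or -8 is kept as a code.
            continue
        # Stage 2: draw an amount, redrawing until it is non-negative.
        x = rng.gauss(mu, sd)
        while x < 0:
            x = rng.gauss(mu, sd)
        synthetic.append(round(x))
    return synthetic


# Income column of the original (input) example rows.
income = [800, 1500, None, 900, 1500, -8, 2000, 1197]
syn_income = synthesise_semicontinuous(income, 1000, seed=1)
print(syn_income[:10])
```

Separating the two stages keeps the codes from being treated as numbers, and the redraw loop enforces non-negativity; a skewed variable like income would usually also call for a transformation or a non-parametric model rather than a plain normal.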
Conclusions
- Synthetic data: expanding the use of confidential microdata
  - UK LSs: access to LS-like data on one's own computer
  - ADRC-S: archiving linked data
  - Teaching
- The synthpop package for R: facilitating the generation and analysis of synthetic data
  - Direction: automation based on best practices and methods