
The UK Longitudinal Studies (LSs)



  1. The UK Longitudinal Studies (LSs)
     - Sensitive microdata: sample from the Census linked to administrative data (births, deaths, marriages, health and other)
     - Restricted access:
       - Safe settings
         ONS LS (England & Wales): London, Titchfield and Newport
         SLS (Scotland): Edinburgh
         NILS (Northern Ireland): Belfast
       - Remote access
         Only variable names and labels are provided to the user
         A Support Officer runs the analysis script on the real data
     Administrative Data Research Centre - Scotland | Beata Nowok | 10 March 2015

  2. Synthetic data for the UK LSs
     - Synthetic versions of data extracts, produced to match individual user data requests
     - Provided to approved researchers for preliminary analysis and for preparing code; the final analysis is run on the real data in safe settings

  3. Data that look (structurally) like original data but contain artificial units only

     Sex     Age  Education             Marital status  Income  Life satisfaction

     Original (input)
     FEMALE  57   VOCATIONAL/GRAMMAR    MARRIED         800     PLEASED
     MALE    41   SECONDARY             UNMARRIED       1500    MIXED
     FEMALE  18   VOCATIONAL/GRAMMAR    UNMARRIED       NA      PLEASED
     FEMALE  78   PRIMARY/NO EDUCATION  WIDOWED         900     MIXED
     FEMALE  54   VOCATIONAL/GRAMMAR    MARRIED         1500    MOSTLY SATISFIED
     MALE    20   SECONDARY             UNMARRIED       -8      PLEASED
     FEMALE  39   SECONDARY             MARRIED         2000    MOSTLY SATISFIED
     MALE    39   SECONDARY             MARRIED         1197    MIXED

     Synthetic (output)
     FEMALE  38   VOCATIONAL/GRAMMAR    MARRIED         NA      MOSTLY DISSATISFIED
     FEMALE  73   VOCATIONAL/GRAMMAR    WIDOWED         1700    PLEASED
     FEMALE  54   SECONDARY             WIDOWED         2000    MOSTLY SATISFIED
     MALE    81   PRIMARY/NO EDUCATION  MARRIED         2100    PLEASED
     MALE    30   VOCATIONAL/GRAMMAR    UNMARRIED       900     MOSTLY SATISFIED
     MALE    54   VOCATIONAL/GRAMMAR    MARRIED         1700    PLEASED
     MALE    68   SECONDARY             MARRIED         -8      DELIGHTED
     FEMALE  32   VOCATIONAL/GRAMMAR    DIVORCED        870     MIXED
     MALE    61   PRIMARY/NO EDUCATION  MARRIED         -8      MIXED
     FEMALE  98   PRIMARY/NO EDUCATION  MARRIED         800     MOSTLY DISSATISFIED
     FEMALE  50   PRIMARY/NO EDUCATION  MARRIED         NA      MOSTLY SATISFIED
     FEMALE  37   VOCATIONAL/GRAMMAR    MARRIED         158     PLEASED
     MALE    28   VOCATIONAL/GRAMMAR    NA              1500    MOSTLY SATISFIED
     FEMALE  62   PRIMARY/NO EDUCATION  MARRIED         830     MOSTLY SATISFIED
     MALE    78   PRIMARY/NO EDUCATION  MARRIED         NA      PLEASED
     FEMALE  29   SECONDARY             MARRIED         580     MOSTLY SATISFIED
     MALE    59   PRIMARY/NO EDUCATION  MARRIED         1300    MOSTLY SATISFIED
     MALE    41   SECONDARY             UNMARRIED       1500    MIXED
     MALE    18   SECONDARY             UNMARRIED       -8      PLEASED
     FEMALE  73   PRIMARY/NO EDUCATION  WIDOWED         1350    MOSTLY SATISFIED

  4. Data that behave (statistically) like original data

  5. The synthpop package: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control
     http://cran.r-project.org/package=synthpop

  6. Generating synthetic data: method
     - Sequentially replacing original data values with synthetic values generated from conditional probability distributions
     - [Diagram: fit $Y_j \sim (Y_0, Y_1, \ldots, Y_{j-1})$ on the observed data, then draw the synthetic values]
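The fit-and-draw scheme on this slide can be illustrated with a minimal Python sketch. It is not how synthpop works internally: the deck says synthpop defaults to CART models, whereas this sketch uses plain empirical frequency tables. The function names `fit_conditionals` and `draw_synthetic` and the parameter `max_predictors` are hypothetical, and, as a simplifying assumption, each variable is conditioned only on the most recent `max_predictors` preceding variables rather than on all of them. The example records reuse category values shown on slide 3.

```python
import random
from collections import defaultdict


def fit_conditionals(rows, max_predictors=1):
    """Fit step: for each column j, tabulate the observed values of Y_j,
    grouped by the values of up to `max_predictors` preceding columns."""
    n_cols = len(rows[0])
    tables = []
    for j in range(n_cols):
        start = max(0, j - max_predictors)  # first predictor column for Y_j
        table = defaultdict(list)
        for row in rows:
            table[tuple(row[start:j])].append(row[j])
        tables.append((start, dict(table)))
    return tables


def draw_synthetic(tables, n, rng):
    """Draw step: build each synthetic record column by column, sampling Y_j
    from the observed values that share the already-synthesised predictors."""
    synthetic = []
    for _ in range(n):
        record = []
        for j, (start, table) in enumerate(tables):
            key = tuple(record[start:j])  # synthetic values, not observed ones
            record.append(rng.choice(table[key]))
        synthetic.append(tuple(record))
    return synthetic


# Illustrative records: (sex, education, marital status).
observed = [
    ("FEMALE", "VOCATIONAL/GRAMMAR", "MARRIED"),
    ("MALE", "SECONDARY", "UNMARRIED"),
    ("FEMALE", "VOCATIONAL/GRAMMAR", "UNMARRIED"),
    ("FEMALE", "PRIMARY/NO EDUCATION", "WIDOWED"),
    ("MALE", "SECONDARY", "MARRIED"),
]

tables = fit_conditionals(observed)
synthetic = draw_synthetic(tables, n=5, rng=random.Random(2015))
for record in synthetic:
    print(record)
```

Because the first column has no predictors, it is drawn from its observed marginal distribution; every later column is drawn conditionally on values that are themselves synthetic, which is the sequential replacement the slide describes.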

  7. Generating synthetic data: synthpop
     [Diagram: observed data, syn(), synthetic data]

  8. Generating synthetic data: synthpop
     - Synthesis can be run with default parameters (classification and regression tree models, CART)
       syn(data)
     - Methods to summarise and to make inferences from synthetic data

  9. syn() & common data problems
     - Missing-data patterns
     - Semi-continuous variables
     - Restricted values (interrelationships between variables)
     - Linear constraints
     - Non-negativity / non-normality
     - Deterministic relations
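As a hedged illustration of the first two items, the sketch below synthesises a numeric variable that mixes valid values with special codes in two stages: first decide whether a synthetic value is a special code, then draw either a code or a valid value. This is a generic approach, not synthpop's own implementation, which is configured through arguments of syn() not reproduced here. The function name `synthesize_with_missing` is hypothetical, `None` stands in for NA, and treating -8 as a missing-data code is an assumption based on the Income column of slide 3.

```python
import random

# Assumption: None represents NA and -8 is a missing-data code.
MISSING_CODES = frozenset({None, -8})


def synthesize_with_missing(values, n, rng, missing_codes=MISSING_CODES):
    """Two-stage draw: first 'special code or valid value?', then the value."""
    special = [v for v in values if v in missing_codes]
    valid = [v for v in values if v not in missing_codes]
    p_special = len(special) / len(values)  # observed share of special codes
    out = []
    for _ in range(n):
        if special and rng.random() < p_special:
            out.append(rng.choice(special))  # keep the code itself, e.g. -8
        else:
            out.append(rng.choice(valid))    # resample among valid values only
    return out


# Income values taken from the first rows of slide 3.
income = [800, 1500, None, 900, 1500, -8, 2000, 1197]
synthetic_income = synthesize_with_missing(income, n=10, rng=random.Random(2015))
print(synthetic_income)
```

Keeping the codes out of the pool of valid values means that a model fitted to the numeric part is never distorted by values such as -8, while the share of missing entries is still carried over to the synthetic data.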

  10. Conclusions
      - Synthetic data: expanding the use of confidential microdata
      - UK LSs: access to LS-like data on own computer
      - ADRC-S: archiving linked data
      - Teaching
      - The synthpop package for R: facilitating generation and analysis of synthetic data
        Direction: automation based on best practices and methods
