The CALIBER Research Platform: Using large-scale linked electronic health records for research Dr Arturo González-Izquierdo University College London Institute of Health Informatics 7th – 12th November 2018 UCL Institute of Health Informatics Big Data Science BAHIA 2018
• Data generation mechanism • Linked electronic health records • EHR phenotyping • Challenges & opportunities
Healthcare system Outpatients Hospitals General practitioners Specialists
Data generation mechanism Admitted patient care Acute events Elective patient care Diagnoses Monitoring Procedures Examination Referrals Initial point of consultation Health history Baseline characteristics Health behaviour Emergency presentations Tests Symptoms Medication Signs Specialised consultative care Advanced medical investigation Specialised treatment & interventions
Healthcare settings and data custodians • CPRD: GP data • NHS Digital: hospital data • Disease registries: tertiary care data • ONS: mortality data
Electronic Health Records Hospital Episode Statistics Clinical Practice Research Datalink Office for National Statistics National Cancer Registration and Analysis Service
Linked Electronic Health Records • Linked sources: primary care, secondary care, cancer registrations, deprivation and mortality • Example patient events: chronic cough, weight loss, recurrent mild chest infection, chronic obstructive airway disease, pneumonia, lung cancer diagnosis
Linked Electronic Health Records
Patient events: chronic cough, weight loss, recurrent mild chest infection, chronic obstructive airway disease, pneumonia hospitalisation, lung cancer diagnosis
• Registration (registration date, date of birth): blood pressure, weight, height, physical activity, health history (heart, diabetes, stroke), smoking, alcohol, contraception, immunisations
• Consultations (consultation date): blood tests (routine), sputum tests, spirometry, chest x-rays, CT chest, treatment (antibiotic, inhalers)
• Hospital admissions (admission date, discharge date): diagnosis, additional diagnosis (deep vein thrombosis, pneumonia), procedures, procedure for biopsy of lesion (bronchoscopy, chest drain), chest imaging (x-ray, CT, PET CT, chest drain insertion)
• Cancer registration: diagnosis (cancer type, stage, metastases), procedures (surgery), treatment (chemotherapy, radiotherapy)
• Death (date of death): underlying cause, subsidiary causes
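As a minimal sketch of what linkage produces, the snippet below merges records from several sources into one chronological timeline per patient. The source labels, field layout, patient identifier and dates are all invented for illustration; they are not the actual CPRD, HES, registry or ONS schemas.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (patient_id, source, event_date, description).
primary_care = [
    ("p1", "primary care", date(2010, 3, 1), "chronic cough"),
    ("p1", "primary care", date(2011, 6, 15), "weight loss"),
]
hospital = [
    ("p1", "hospital", date(2012, 1, 10), "pneumonia admission"),
]
cancer_registry = [
    ("p1", "cancer registry", date(2012, 2, 20), "lung cancer diagnosis"),
]
mortality = [
    ("p1", "mortality", date(2013, 5, 5), "death"),
]


def link_records(*sources):
    """Group records from all sources by patient and sort each group by date."""
    timelines = defaultdict(list)
    for source in sources:
        for patient_id, label, event_date, description in source:
            timelines[patient_id].append((event_date, label, description))
    return {pid: sorted(events) for pid, events in timelines.items()}


# The order in which sources are passed does not matter; events are sorted by date.
timelines = link_records(mortality, hospital, primary_care, cancer_registry)
for event_date, label, description in timelines["p1"]:
    print(event_date.isoformat(), label, description)
```

Real linkage is normally carried out by the data custodians on pseudonymised identifiers; the sketch only illustrates the resulting per-patient view across care settings.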
EHR phenotype Biometrics, test results, time dependent thresholds Diagnoses or procedures Medication Health care utilisation patterns
EHR phenotype • Extraction – Algorithm (generic) Pujades-Rodriguez M. (2016) Heart, 102:383-398
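The phenotype components named on these slides (diagnoses, medication, test results against a threshold) can be combined into a rule-based definition. The following is a minimal sketch only and is not the algorithm from the cited paper: the codes, drug name, test name and threshold are placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified patient record."""
    diagnosis_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    test_results: dict = field(default_factory=dict)  # test name -> latest value


# Placeholder definitions, not a validated code list.
PHENOTYPE_DIAGNOSES = {"DX001", "DX002"}
PHENOTYPE_DRUGS = {"drug_a"}
TEST_NAME = "test_x"
TEST_THRESHOLD = 48.0


def has_phenotype(record: PatientRecord) -> bool:
    """Flag a patient if any one of three kinds of evidence is present."""
    has_diagnosis = bool(record.diagnosis_codes & PHENOTYPE_DIAGNOSES)
    has_drug = bool(record.medications & PHENOTYPE_DRUGS)
    value = record.test_results.get(TEST_NAME)
    above_threshold = value is not None and value >= TEST_THRESHOLD
    return has_diagnosis or has_drug or above_threshold


coded = PatientRecord(diagnosis_codes={"DX001"})
by_test = PatientRecord(test_results={"test_x": 50.0})
no_evidence = PatientRecord(medications={"drug_b"}, test_results={"test_x": 40.0})
print(has_phenotype(coded), has_phenotype(by_test), has_phenotype(no_evidence))
```

Whether the sources of evidence are combined with "or", "and", or a time-dependent rule is a design choice that depends on the condition being phenotyped; the "or" used here is just the simplest option.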
The CALIBER Research Platform • Cohort identification methods • Deep phenotyping algorithms • Patient population • Longitudinal clinical trajectories • Precise temporal allocation of exposures and outcomes
Challenges Day-to-day challenges: 1. Complying with the data custodians' directives on data protection 2. Understanding the data generation mechanisms: (a) clinical practice, (b) recording of information, (c) coding 3. Connecting jargon from multiple disciplines 4. Understanding the associated information governance
Challenges The EHR's observation window (start to end) in relation to: relevant clinical event, exposure to factor of interest, outcome of interest, death
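One way to make the observation-window challenge concrete is to check where each event of interest falls relative to the window. The sketch below assumes a single window per patient; the window dates, event names and category labels are invented for illustration.

```python
from datetime import date
from typing import Optional


def classify_event(event_date: Optional[date], start: date, end: date) -> str:
    """Say where an event falls relative to the observation window [start, end]."""
    if event_date is None:
        return "not recorded"
    if event_date < start:
        return "before window"
    if event_date > end:
        return "after window"
    return "observed"


# Hypothetical window and events for one patient.
window_start = date(2005, 1, 1)
window_end = date(2015, 12, 31)
events = {
    "exposure": date(2003, 7, 1),   # happened before records began
    "outcome": date(2012, 4, 9),    # captured within the window
    "death": None,                  # no death recorded
}

status = {name: classify_event(d, window_start, window_end) for name, d in events.items()}
print(status)
```

An exposure that predates the window, as in this example, illustrates why the start and end of follow-up need to be handled explicitly when allocating exposures and outcomes in time.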
Opportunities • Recent willingness by data custodians to support research on health data using machine-learning-based methodologies • Wide range of exploratory, hypothesis-generating or hypothesis-testing studies – Patient classification (machine learning sub-phenotyping) – Detailed healthcare utilisation patterns (multi-state trajectory flows) – Integration of data models – Sophisticated epidemiological/statistical methods for causal inference that are now computationally feasible – EHR-based decision and early-detection tools (automation)
The Data Lab Natalie Fitzpatrick Data Science Facilitator n.fitzpatrick@ucl.ac.uk CALIBER portal https://www.caliberresearch.org/portal Denaxas Lab http://denaxaslab.org/