Data Quality: finding data and evaluating its quality for your research

Data Quality: finding data and evaluating its quality for your research David Dorr, MD, MS Professor and Vice Chair, Medical Informatics and Clinical Epidemiology Professor, Medicine Oregon Health & Science University AcademyHealth 2017!


  1. Data Quality: finding data and evaluating its quality for your research
     David Dorr, MD, MS
     Professor and Vice Chair, Medical Informatics and Clinical Epidemiology; Professor, Medicine
     Oregon Health & Science University
     AcademyHealth 2017

  2. Overview
     • How can you find data for your problem that is appropriate for your methods?
     • Choices
       • Operational database, e.g., from your Electronic Health Record system
       • Clinical Quality Measure or Registry-based database
       • Database cluster / linkage: e.g., PCORnet, i2b2 SHRINE
       • Standardized database for observational studies: OHDSI OMOP common data model
       • Specific datasets: FigShare, published articles, the LIBRARY (including the National Library of Medicine)

  3. Data quality overview for evaluation
     Example: ADMINISTRATIVE / CLAIMS data
     • Completeness: Can be made 100% complete for concepts provided; MISSING many clinical concepts (e.g., results)
     • Correctness: Variable by use; encounter correctness GOOD; diagnosis correctness MODERATE
     • Currency: POOR – lag
     • Granularity: Fine grained for diagnosis
     • Integration*: Challenging – often de-identified or different identifiers
     • Fitness for use (examples): Utilization rates for an at-risk population
     e.g., ALL PAYER ALL CLAIMS: links beneficiaries across insurances; de-identified; restricted
     http://www.oregon.gov/oha/HPA/ANALYTICS/APAC%20Page%20Docs/APAC-Overview.pdf
     * Or Interoperability: but I mean can it be combined with other data sources.
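
One way to make the completeness dimension above concrete is to compute, per field, the share of records with a non-missing value. The following is a minimal sketch only: the record layout, the field names, and the `completeness` helper are hypothetical and are not taken from the slides.

```python
# Hypothetical claims-like records; None marks a missing value.
records = [
    {"patient_id": 1, "diagnosis": "I10", "lab_result": 7.2},
    {"patient_id": 2, "diagnosis": "E11.9", "lab_result": None},
    {"patient_id": 3, "diagnosis": None, "lab_result": None},
    {"patient_id": 4, "diagnosis": "F03.90", "lab_result": 5.9},
]


def completeness(records, fields):
    """Return the fraction of records with a non-missing value, per field."""
    total = len(records)
    return {
        field: sum(r.get(field) is not None for r in records) / total
        for field in fields
    }


result = completeness(records, ["diagnosis", "lab_result"])
print(result)
```

A per-field figure like this says nothing about correctness or currency; those dimensions need a reference standard or timestamps, respectively.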

  4. Operational databases Adapted from https://medinform.jmir.org/2014/1/e5/

  5. Quality Data Model
     • All clinical quality measures
     • Certain DOMAINS
     • Certain taxonomies (LOINC, ICD-10, RxNorm)
     • ONLY mapped if CQM uses it
     https://ecqi.healthit.gov/qdm

  6. EHR data source dramatically affects data quality and interpretation
     Sensitivity was highest in encounters (0.55), and specificity in the Problem List (0.82). Combining all information led to sensitivity of 0.95 and specificity of 0.19.
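
The trade-off on this slide (combining sources raises sensitivity but lowers specificity) can be reproduced with a small sketch. The toy data below is invented for illustration and does not reproduce the study's numbers; the "combine" rule assumed here is a simple logical OR across sources.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)              # flagged and truly positive
    tn = sum((not p) and (not t) for p, t in pairs)  # not flagged and truly negative
    fp = sum(p and (not t) for p, t in pairs)        # flagged but truly negative
    fn = sum((not p) and t for p, t in pairs)        # missed true positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical gold standard and two EHR sources for eight patients.
truth        = [True, True, True, True, False, False, False, False]
encounters   = [True, True, False, False, True, False, False, False]
problem_list = [False, True, True, False, False, False, True, False]

# Assumed combination rule: a patient is flagged if ANY source flags them.
combined = [e or p for e, p in zip(encounters, problem_list)]

for name, source in [("encounters", encounters),
                     ("problem list", problem_list),
                     ("combined", combined)]:
    sens, spec = sensitivity_specificity(source, truth)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy data each single source has sensitivity 0.50 and specificity 0.75, while the OR-combination reaches sensitivity 0.75 at specificity 0.50: every extra source can only add flags, so false positives accumulate along with true positives.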

  7. Data quality for EHR extracts
     EHR data into standard EHR data warehouse
     • Completeness: Often more complete by DOMAIN; temporal completeness varies
     • Correctness: HIGHLY variable
     • Currency: EXCELLENT (with careful constraints)
     • Granularity: Fine grained for many domains; with narrative notes, can get extensively fine grained
     • Integration: Frequent foreign keys (multiple identifiers), limited by policy
     • Fitness for use (examples): Episode-based care; setting-based care, including specialties (e.g., ambulatory primary care); workflow / operations (time-stamped observations)
     e.g., ‘Clarity’ data warehouse for Epic (or newer Caboodle); analytic data warehouses; ‘back end access’ to EHR data

  8. Definitions of observed conditions / symptom sets – PHENOTYPES – are increasingly required to improve quality but vary across sources
     Sources shown: PhenX, PheKB Phenotype, Human Phenotype Ontology
     PhenX example:
     • PhenX Protocol Name: Global Mental Status Screener - Adult
     • PhenX ID: PX130701
     • LOINC Name: Global mental status adult proto
     • LOINC Code: 62769-5
     • CDE Name: Adult Cognitive Assessment Score
     • CDE ID: 3076130
     … subvariables under this level with logic
     How do I define ‘Dementia’ for my study?

  9. The variation amongst phenotypes extends across domains Shivade et al, JAMIA

  10. Clinical databases for observational studies in use
      • (Mini)Sentinel
        Description: Database for active surveillance of regulated FDA products; maintained by FDA and other network
        Size / Use: 178 million members; search Sentinel notes; claims and pharmacy
      • PCORNet
        Description: Distributed network with common data model
        Size / Use: 122 million; http://www.pcornet.org/
      • I2b2 / SHRINE
        Description: Distributed open source software and common data model with deployed networks
        Size / Use: 23 million; i2b2.org
      • OHDSI OMOP CDM
        Description: Common data model intended to facilitate observational studies; used in All of Us precision medicine
        Size / Use: 600 million; ohdsi.org

  11. Distributed models can facilitate collaboration / spread, but also require external resources

  12. Improving data quality: encouraging better mapping
      • PheKB Phenotype: Dementia (excerpt)
      • PhenX Protocol Name: Global Mental Status Screener - Adult; PhenX ID: PX130701; LOINC Name: Global mental status adult proto; LOINC Code: 62769-5; CDE Name: Adult Cognitive Assessment Score; CDE ID: 3076130; … subvariables under this level with logic
      • Human Phenotype Ontology: Dementia
      • Atlas
      • COMPUTABLE PHENOTYPE ASSESSMENT tool
      • Evaluation: Availability, Feasibility, Accuracy, Currency, Completeness, and Representativeness
      www.ohdsi.org

  13. OHDSI OMOP common data model
      Model Domain: Table Names
      • Standardized Clinical Data Tables: PERSON, OBSERVATION_PERIOD, SPECIMEN, DEATH, VISIT_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE, DEVICE_EXPOSURE, CONDITION_OCCURRENCE, MEASUREMENT, NOTE, OBSERVATION, FACT_RELATIONSHIP
      • Standardized Health System Data Tables: LOCATION, CARE_SITE, PROVIDER
      • Standardized Health Economics Data Tables: PAYER_PLAN_PERIOD, VISIT_COST, PROCEDURE_COST, DRUG_COST, DEVICE_COST
      • Standardized Derived Elements: COHORT, COHORT_ATTRIBUTE, DRUG_ERA, DOSE_ERA, CONDITION_ERA
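
To show how a cohort question maps onto these tables, here is a minimal sketch using an in-memory SQLite database. It creates cut-down versions of PERSON and CONDITION_OCCURRENCE with only a few columns; a real OMOP instance has many more columns and uses standard vocabulary concept IDs. The concept ID below is a made-up placeholder, not a real OMOP concept, and `count_persons_with_condition` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified subsets of two OMOP clinical data tables (assumption: real
# deployments carry many additional columns and constraints).
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
""")

# Placeholder concept ID for illustration only; look up real IDs in the
# OMOP vocabulary before using this pattern.
HYPOTHETICAL_DEMENTIA_CONCEPT = 999001

conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, 1940), (2, 1955), (3, 1938)])
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, 999001, "2016-03-01"),
        (11, 1, 999001, "2016-09-15"),  # same person, second record
        (12, 2, 999002, "2015-01-20"),  # a different (placeholder) condition
        (13, 3, 999001, "2017-02-02"),
    ],
)


def count_persons_with_condition(conn, concept_id):
    """Count distinct persons with at least one record of the given concept."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT p.person_id)
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        """,
        (concept_id,),
    ).fetchone()
    return row[0]


n_dementia = count_persons_with_condition(conn, HYPOTHETICAL_DEMENTIA_CONCEPT)
print(n_dementia)
```

Counting distinct persons rather than rows matters here: person 1 has two condition records but should enter the cohort once.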

  14. OHDSI OMOP related open source software

  15. Achilles Heel for OHDSI can automatically detect data errors

  16. Error rates per patient for OHDSI OMOP
      [Bar chart: errors per patient (using minimum database size); x-axis categories A through M; y-axis from 0 to 0.12]
      Huser et al, GEMS
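
The idea of automated, rule-based error detection and an errors-per-patient rate can be sketched as follows. These rules are illustrative plausibility checks written for this example; they are not the actual Achilles Heel rule set, and the data structures and the `find_errors` helper are hypothetical.

```python
from datetime import date

# Hypothetical extract: person_id -> year_of_birth, plus dated clinical events.
persons = {1: 1950, 2: 2090, 3: 1975}
events = [
    (1, date(2016, 5, 1)),
    (1, date(1940, 1, 1)),  # implausible: before person 1 was born
    (3, date(2015, 7, 4)),
]


def find_errors(persons, events, today):
    """Apply simple plausibility rules and return (person_id, message) pairs."""
    errors = []
    for person_id, year_of_birth in persons.items():
        if year_of_birth > today.year:
            errors.append((person_id, "birth year in the future"))
    for person_id, event_date in events:
        year_of_birth = persons.get(person_id)
        if year_of_birth is None:
            errors.append((person_id, "event for unknown person"))
        elif event_date.year < year_of_birth:
            errors.append((person_id, "event before birth"))
    return errors


errors = find_errors(persons, events, today=date(2017, 6, 25))
errors_per_patient = len(errors) / len(persons)

for person_id, message in errors:
    print(person_id, message)
print(f"errors per patient: {errors_per_patient:.2f}")
```

Dividing by the number of patients makes the rate comparable across databases of very different sizes, which is the normalization the chart on this slide appears to use.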

  17. Data quality for Clinical datasets
      Clinical datasets
      • Completeness: Often incomplete; ETLs should define what data is in there and allow for assessment of completeness for your need
      • Correctness: May be validated and improved
      • Currency: Moderate
      • Granularity: Transformation may reduce some granularity, especially for nuanced concepts
      • Integration: Already integrated; but expanding to new data sources hard
      • Fitness for use (examples): Hypothesis generation / cohort discovery on large scale studies; basic observational studies
      e.g., PCORNet, i2b2, OHDSI
      How to find:
      • Your local Clinical and Translational Science Institute
      • Core websites with forums
      http://www.oregon.gov/oha/HPA/ANALYTICS/APAC%20Page%20Docs/APAC-Overview.pdf

  18. Finding OTHER data sources to test hypotheses
      • The majority of de-identified data is not in any of these standards, but their own.
      • Multiple efforts to make these FAIR
        • Findable
        • Accessible
        • Interoperable
        • Reusable
      WHERE TO LOOK? The LIBRARY! Try the National Library of Medicine:
      https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
      Open source aggregation / metadata

  19. DataCite

  20. FigShare

  21. Data quality for found datasets
      Found datasets
      • Completeness: Complete solely for its purpose
      • Correctness: Oddly poor
      • Currency: Frozen in time
      • Granularity: VARIES
      • Integration: Extremely difficult
      • Fitness for use (examples): Replication or focused exploratory data analysis for pilot data

  22. Thanks!
      • Lots of help from Nicole Weiskopf, PhD, who actually knows something about data quality
      • dorrd@ohsu.edu
