DATA QUALITY ASSESSMENT FRAMEWORK LISA SCHILLING, MD, MSPH ACADEMY HEALTH ANNUAL RESEARCH MEETING JUNE 27, 2017
LOTS OF ACKNOWLEDGMENTS
• Funding
  • AHRQ 1R01HS019908 (SAFTINet, PI Schilling)
  • AHRQ 1R01HS019912 (SPAN, PI Steiner)
  • AHRQ U13 HS19564-01 AcademyHealth / EDM Forum (PI Holve)
  • NCATS UL1 TR000154 (University of Colorado CTSA) (PI Sokel)
  • PCORI CER Methods Award 5581 (PI Kahn)
• Slides
  • Michael Kahn, MD, PhD, University of Colorado
  • Tiffany Callahan, University of Colorado
  • Maggie Massery, Children’s Hospital Colorado
Weber, G. M., Mandl, K. D. & Kohane, I. S. Finding the missing link for big biomedical data. JAMA 311, 2479–2480 (2014).
DATA QUALITY IN EHRs
• Data collection tools optimized for efficiency and billing
  • Text templates
  • Copy/paste
• Minimal data validation checks
  • Min/Max limits
  • Pick lists
  • Required fields
THE SIMPLE STUFF…..
Blood Pressure
BP – ANY TIME

Measure Name                  Patients Used On    Times Used
BLOOD PRESSURE                         538,647    13,869,327
R AN NIBP                               63,576     2,949,877
CARD BP 3                               14,631        26,889
ABP INVASIVE PRESSURE                    9,031     3,382,825
BLOOD PRESSURE (ED SEDATION)             7,402        41,498
EDU STAND BP                             6,950        33,876
EDU LYING BP                             6,941        32,609
CARD BP 2                                6,878         9,934
ED PRE HOSP BP                           6,323         7,117
BP #2                                    5,529        40,592
EDU SIT BP                               4,957         6,152
CARD BP 4                                4,452         6,806
R AN IBP ART                             4,430     1,181,368
BP #3                                    4,330        24,675
BP - STANDING                            4,098         6,120
BP - LYING                               4,068         5,753
BP - SITTING                             3,920         5,292
BP #4                                    3,477        15,898
BP PRE SEDATION                          1,831         2,246
PAP                                      1,793       218,931
BLOOD PRESSURE (CS)                      1,322         8,290
ART PRESSURE #2                            404       136,579
R AN IBP PAP                                71         6,488
R AN IBP P1                                 60         4,562
ECMO BLOOD PRESSURE                         57        85,037
CARD BP 1                                   55            56
R AN IBP AO                                 53         4,129
RV PRESSURE                                 50           124
R AN IBP FAP                                37         3,997
R AN IBP UAP                                27         1,021
R AN IBP P2                                 13           339
R AN IBP P                                  11           282
R AN IBP LAP                                11           634
BP #2                                        8             8
R AN IBP P4                                  2             2
R AN IBP BAP                                 1             1

CHCO Slides from Maggie Massary. Used with permission
EHR WORKFLOWS MEET DATA QUALITY
• Core vital signs: Blood Pressure, Height & Weight
• Blood Pressure: 113 unique BP names
  • 15 have been deleted
  • 45 are hidden
  • 52 are available
    • 37 are in use (have values)
    • 29 have been used more than a thousand times
    • 14 have been used on fewer than 71 patients
    • 23 have been used on more than 371 patients
CHCO Slides from Maggie Massary. Used with permission
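Tallies like the ones on this slide can be produced from a measure-usage table with a few lines of code. The following is a minimal Python sketch, not the method used at CHCO: the rows are a small subset copied from the blood pressure table, and the function name `usage_summary` and its threshold parameters are assumptions made for illustration.

```python
# Hypothetical sketch: tally flowsheet measure names by usage thresholds.
# Rows are (measure name, patients used on, times used), copied from the
# blood pressure table; only a subset is included here.
BP_MEASURES = [
    ("BLOOD PRESSURE", 538_647, 13_869_327),
    ("R AN NIBP", 63_576, 2_949_877),
    ("BP - SITTING", 3_920, 5_292),
    ("CARD BP 1", 55, 56),
    ("R AN IBP BAP", 1, 1),
]


def usage_summary(measures, many_uses=1000, few_patients=71):
    """Return measure names used more than `many_uses` times and names
    used on fewer than `few_patients` patients."""
    heavily_used = [name for name, _, uses in measures if uses > many_uses]
    rarely_used = [name for name, patients, _ in measures if patients < few_patients]
    return {"heavily_used": heavily_used, "rarely_used": rarely_used}


summary = usage_summary(BP_MEASURES)
print(summary["heavily_used"])
print(summary["rarely_used"])
```

The thresholds default to the cut points quoted on the slide (a thousand uses, 71 patients) and would be chosen locally in practice.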
LET’S TALK ABOUT DATA QUALITY (DQ) AND THE WAY WE DESCRIBE IT...
• Six-year-olds who, according to their EMR records...
  • Are married (53)
  • Have significant others (18)
  • Are divorced (2) / legally separated (3)
• What term would you use to describe this issue?
  • Data validity
  • Data accuracy
  • Trueness / Truthiness
  • Believability
  • Consistency (age versus marital status)
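Whatever term is chosen for it, the age-versus-marital-status problem is easy to detect once it is written down as a rule. The sketch below is one hedged way to do that in Python. The field names, the status list, and the age cutoff of 15 are all assumptions for illustration, and the patient records are made up.

```python
from datetime import date

# Assumptions for illustration: which statuses are implausible for young
# children, and the age below which they are flagged.
ADULT_ONLY_STATUSES = {"married", "significant other", "divorced", "legally separated"}
MIN_PLAUSIBLE_AGE = 15  # hypothetical threshold; set from local knowledge


def age_in_years(birth_date, as_of):
    """Whole years elapsed between birth_date and as_of."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not yet occurred in the as_of year
    return years


def implausible_marital_status(patients, as_of):
    """Return IDs of patients whose recorded marital status conflicts with age."""
    flagged = []
    for p in patients:
        age = age_in_years(p["birth_date"], as_of)
        status = p["marital_status"].strip().lower()
        if age < MIN_PLAUSIBLE_AGE and status in ADULT_ONLY_STATUSES:
            flagged.append(p["patient_id"])
    return flagged


# Made-up records, not real data.
patients = [
    {"patient_id": 1, "birth_date": date(2011, 3, 1), "marital_status": "Married"},
    {"patient_id": 2, "birth_date": date(1980, 5, 9), "marital_status": "Married"},
    {"patient_id": 3, "birth_date": date(2011, 1, 15), "marital_status": "Single"},
]
flagged_ids = implausible_marital_status(patients, as_of=date(2017, 6, 27))
print(flagged_ids)
```

Only the six-year-old recorded as married is flagged; the adult with the same status and the child recorded as single pass the rule.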
WHY STANDARDIZE DATA QUALITY TERMINOLOGY?
• Standardizing DQ terminology is a first step in...
• Standardizing DQ assessment methods...
• Supports sharable and reusable DQ methods
• Supports common understanding of DQ issues
• Supports increased transparency and trust in analytic methods & findings
DIVERSITY IN THE USE OF DQ TERMS
COMMUNITY-DRIVEN CONSENSUS RECOMMENDATIONS FOR DQ REPORTING
20 ITEMS IN 5 DOMAINS
• Original Data Source
• Data Steward
• Data Processing/Provenance
• Data Element Characterization
• Analysis-specific Data Quality Specifications
COMMUNITY-DRIVEN CONSENSUS RECOMMENDATIONS FOR DQ REPORTING
THE HARMONIZED DATA QUALITY TERMINOLOGY
• Divides the DQ “world” into two “contexts”
  • Verification: What you can do with just the data (and knowledge) you have on hand.
    • Expectations are derived internally
  • Validation: Brings in external resources: relative gold standards, recognized benchmarks/comparators
    • Expectations are derived externally
THREE DQ CATEGORIES THAT BUILD ON EACH OTHER
• Completeness: Are data values present?
  • Doesn’t evaluate if the values make sense, just “Are values there or not?”
• Fidelity: Are the data dependable?
  • Doesn’t evaluate if the values are believable, just “Do values align together as expected?”
• Plausibility: Are the data believable?
  • Doesn’t depend on the existence of an absolute truth
COMPLETENESS: ARE THE DATA PRESENT?

Density
  Verification, definition:
    a. Atemporal: Measures of data density against a denominator are expected based on internal knowledge.
    b. Temporal: Measures of data density against a time-oriented denominator are expected based on internal knowledge.
    Includes total missingness measures.
  Verification, example:
    a. Similar counts of missing patient observations between ETLs.
    b. Counts of monthly emergency room visits during flu season.
  Validation, definition:
    a. Atemporal: Measures of data density against a denominator are expected based on external knowledge.
    b. Temporal: Measures of data density against a time-oriented denominator are expected based on external knowledge.
    Includes total missingness measures.
  Validation, example:
    a. Similar counts of missing patient observations across network data partners.
    b. Changes in counts of monthly emergency room visits during flu season are similar to health department reports.
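The verification example "similar counts of missing patient observations between ETLs" could be sketched as follows. This is a minimal Python illustration under stated assumptions: each ETL load is a list of dictionaries, a value counts as missing when it is absent, `None`, or an empty string, and the 5-percentage-point tolerance is a hypothetical, locally chosen threshold. The function and field names are not from the framework.

```python
def missing_fraction(rows, field):
    """Fraction of rows where `field` is absent, None, or an empty string."""
    if not rows:
        raise ValueError("cannot compute density on an empty load")
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def density_changed(previous_load, current_load, field, tolerance=0.05):
    """True when missingness for `field` differs between two ETL loads by
    more than `tolerance` (an assumed, locally chosen threshold)."""
    previous = missing_fraction(previous_load, field)
    current = missing_fraction(current_load, field)
    return abs(current - previous) > tolerance


# Made-up loads for illustration.
etl_march = [{"height_cm": 120}, {"height_cm": 131}, {"height_cm": None}, {"height_cm": 98}]
etl_april = [{"height_cm": None}, {"height_cm": ""}, {"height_cm": None}, {"height_cm": 98}]

print(missing_fraction(etl_march, "height_cm"))
print(density_changed(etl_march, etl_april, "height_cm"))
```

The same comparison becomes a validation check when the second input comes from an external source, such as another network data partner, rather than from an earlier load of the same database.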
FIDELITY: ARE THE DATA DEPENDABLE?

Metadata
  Verification, definition:
    a. Data elements conform to internal formatting constraints.
    b. Data elements conform to relational constraints.
  Verification, example:
    a. Sex is only one ASCII character.
    a. Sex only has values ‘M’, ‘F’ or ‘U’.
    a. Patient MRNs link to other tables as required.
  Validation, definition:
    a. Data elements conform to representational constraints based on external standards.
  Validation, example:
    a. Formatting for the primary language variable in the demographics table conforms to ISO standards.

Measure
  Verification, definition:
    a. Repeated measurements of the same fact show expected variability.
  Verification, example:
    a. Patient height measurements are similar when taken by two separate nurses within the same facility.
  Validation, definition:
    a. Two dependent databases (e.g., database 1 abstracted from database 2) yield similar results for identical measurements.
  Validation, example:
    a. Recorded date of birth is consistent between EHR data and registry data for the same facility.

Derivation
  Verification, definition:
    a. Derived values conform to computational or programming specifications.
  Verification, example:
    a. Database- and hand-calculated Body Mass Index values are identical.
  Validation, definition:
    a. Two programmers provided with identical specifications and identical data sets report identical results for derived values.
  Validation, example:
    a. Data transformations implemented in SAS and R yield identical results on the same data set.

Uniqueness
  Verification, definition:
    a. The database is free of duplicate measurements.
    b. Within a database, merged objects are only counted once.
  Verification, example:
    a. Each patient is registered under a single MRN.
    a. Person records obtained via EHR and claims data are only counted once.
  Validation, definition:
    a. An object represented in a source database is uniquely represented in a target database.
    a. An object represented in a source database is represented by its components in a target database.
  Validation, example:
    a. A single charge in a claims database represents a single encounter in the EHR.
    b. A single drug order in an EHR database is represented by its ingredients in a pharmacy database.
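Three of the fidelity verification examples above translate fairly directly into code. The Python sketch below is illustrative only: the allowed sex values come from the slide, while the row layout, field names, BMI tolerance, and the assumption that the demographics table holds one row per patient are hypothetical.

```python
from collections import Counter

ALLOWED_SEX_VALUES = {"M", "F", "U"}  # value set taken from the slide's example


def nonconforming_sex(rows):
    """Metadata check: MRNs whose sex value is outside the allowed set."""
    return [row["mrn"] for row in rows if row.get("sex") not in ALLOWED_SEX_VALUES]


def duplicated_mrns(rows):
    """Uniqueness check: MRNs that appear on more than one row."""
    counts = Counter(row["mrn"] for row in rows)
    return sorted(mrn for mrn, n in counts.items() if n > 1)


def bmi_mismatches(rows, tolerance=0.1):
    """Derivation check: MRNs whose stored BMI differs from weight / height^2
    by more than `tolerance` (an assumed rounding allowance)."""
    mismatched = []
    for row in rows:
        recomputed = row["weight_kg"] / (row["height_m"] ** 2)
        if abs(recomputed - row["bmi"]) > tolerance:
            mismatched.append(row["mrn"])
    return mismatched


# Made-up demographics rows, assumed to hold one row per patient.
rows = [
    {"mrn": "A1", "sex": "F", "weight_kg": 70.0, "height_m": 1.75, "bmi": 22.9},
    {"mrn": "B2", "sex": "Female", "weight_kg": 80.0, "height_m": 1.80, "bmi": 24.7},
    {"mrn": "B2", "sex": "M", "weight_kg": 60.0, "height_m": 1.60, "bmi": 23.4},
    {"mrn": "C3", "sex": "U", "weight_kg": 50.0, "height_m": 1.50, "bmi": 25.0},
]

print(nonconforming_sex(rows))
print(duplicated_mrns(rows))
print(bmi_mismatches(rows))
```

Each function returns the offending identifiers rather than a pass/fail flag, so that the results can be reviewed or counted; that is a design choice of this sketch, not a requirement of the framework.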
PLAUSIBILITY: ARE THE DATA BELIEVABLE?

Measure
  Verification, definition:
    a. Data values and distributions agree with an internal measurement or local knowledge.
    b. Independent measurements of the same fact are in agreement.
    c. Logical constraints between variables and subgroups agree with local or common knowledge (includes "expected" missingness).
  Verification, example:
    a. All patients have positive values for height and weight.
    b. Serum glucose measurement is similar to finger stick glucose measurement.
    b. Oral and axillary temperatures are similar.
    c. Sex agreement with sex-specific contexts (pregnancy, prostate cancer).
    c. Inpatient diagnoses are not associated with outpatient encounters.
  Validation, definition:
    a. Data values and distributions (including subgroup distributions) agree with trusted reference standards or external knowledge.
    b. Similar results for identical measurements are obtained from two independent databases representing the same observations with equal credibility.
  Validation, example:
    a. HbA1c values from hospital and national reference lab are statistically similar under the same conditions.
    b. Diabetes ICD-9 and CPT codes are similar between two independent claims databases serving similar populations.

Time
  Verification, definition:
    a. Observed or derived values conform to expected temporal properties.
    b. Sequences or state transitions conform to expected properties.
  Verification, example:
    a. Length of stay for outpatient procedures per year conforms to expectations.
    a. An initial immunization precedes a booster immunization.
  Validation, definition:
    a. Observed or derived values have similar temporal properties across one or more external comparators or gold standards.
    b. Sequences or state transitions are similar to external comparators or gold standards.
  Validation, example:
    a. Length of stay for outpatient procedures conforms to Medicare data for similar populations.
    b. Immunization sequences match the state immunization registry sequence.
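Two of the plausibility verification examples, positive height and weight values and an initial immunization preceding its booster, can be sketched in Python as below. The record layouts, the field names, and the "initial"/"booster" labels are assumptions for illustration, and all data are made up. Missing values are skipped here on the assumption that a separate completeness check handles them.

```python
from datetime import date


def nonpositive_measurements(rows, fields=("height_cm", "weight_kg")):
    """Measure check: (patient_id, field) pairs whose value is zero or negative."""
    return [
        (row["patient_id"], field)
        for row in rows
        for field in fields
        if row[field] is not None and row[field] <= 0
    ]


def booster_before_initial(immunizations):
    """Time check: patient IDs whose earliest booster dose is dated before
    their earliest initial dose."""
    first_dose = {}
    for patient_id, dose_type, given_on in immunizations:
        key = (patient_id, dose_type)
        if key not in first_dose or given_on < first_dose[key]:
            first_dose[key] = given_on
    flagged = []
    for (patient_id, dose_type), given_on in first_dose.items():
        initial = first_dose.get((patient_id, "initial"))
        if dose_type == "booster" and initial is not None and given_on < initial:
            flagged.append(patient_id)
    return sorted(flagged)


# Made-up records for illustration.
vitals = [
    {"patient_id": 1, "height_cm": 120.0, "weight_kg": 25.0},
    {"patient_id": 2, "height_cm": -5.0, "weight_kg": 0.0},
    {"patient_id": 3, "height_cm": None, "weight_kg": 18.0},
]
immunizations = [
    (1, "initial", date(2016, 1, 10)),
    (1, "booster", date(2016, 7, 10)),
    (2, "initial", date(2016, 5, 1)),
    (2, "booster", date(2016, 2, 1)),
]

print(nonpositive_measurements(vitals))
print(booster_before_initial(immunizations))
```

Both checks rely only on the data at hand, which places them in the verification context; comparing the same sequences against a state immunization registry would move the second check into validation.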
DQ CODE-A-THON
• Four teams
• All workshop artifacts available on public GitHub
  • https://github.com/DQCode-A-Thon/
www.pcori.org http://dododas.github.io/dqa-viz/dashboards.html
https://sigfried.github.io/parcoords/