
Environmental Health Science Data Streams: Health Data

Brian S. Schwartz, MD, MS. January 10, 2013. When is a data stream not a data stream? When it is health data.


  1. Environmental Health Science Data Streams: Health Data. Brian S. Schwartz, MD, MS. January 10, 2013

  2. When is a data stream not a data stream? When it is health data. EHR data = PHI of health system. “Data stream”: IRB approval, data pull (by IT), data transfer (to researchers), data cleaning, variable creation (“phenotyping” of patients), data merging, environmental metrics, data analysis (computationally intensive – person, place, time)

  3. Using EHR Data: An Example • Using longitudinal EHR data, how do we know if a patient has diabetes? • When does observation of the patient begin? – With EHR data, we cannot determine “enrollment” (health plan data) • When did the patient’s diabetes begin? • How do we distinguish type 1 from type 2? • What does it mean if an HbA1c level exists, or diabetes treatment began, before any ICD-9 code for diabetes? • How do we define diabetes severity? • How do we avoid confounding by indication? – In observational studies of drug effects, drugs are not assigned randomly; indication for treatment may be related to risk of future health outcomes
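
One way to make the questions on this slide concrete is a rule-based phenotype that takes the earliest of several kinds of evidence as the onset date. The sketch below is illustrative only and is not the presenter's method: the event layout, the drug list, the function names, and the rule itself are assumptions. The 6.5% HbA1c cutoff is the commonly used diagnostic threshold, and ICD-9 250.xx is the diabetes mellitus code family.

```python
from datetime import date

# Hypothetical, simplified EHR events for one patient: (date, kind, value).
# kind is "icd9" (value = code), "hba1c" (value = percent), or "rx" (value = drug name).
DIABETES_DRUGS = {"metformin", "insulin", "glipizide"}  # illustrative, not exhaustive


def is_diabetes_evidence(kind, value):
    """Return True if a single event counts as evidence of diabetes under this toy rule."""
    if kind == "icd9":
        return str(value).startswith("250")  # ICD-9 250.xx = diabetes mellitus
    if kind == "hba1c":
        return float(value) >= 6.5           # assumed diagnostic cutoff
    if kind == "rx":
        return str(value).lower() in DIABETES_DRUGS
    return False


def first_diabetes_evidence(events):
    """Return (date, kind) of the earliest qualifying event, or None if there is none."""
    qualifying = [(d, k) for d, k, v in events if is_diabetes_evidence(k, v)]
    return min(qualifying) if qualifying else None


# Made-up events for a single hypothetical patient.
events = [
    (date(2008, 3, 1), "hba1c", 5.9),
    (date(2009, 6, 15), "hba1c", 7.2),
    (date(2009, 9, 2), "rx", "Metformin"),
    (date(2010, 1, 20), "icd9", "250.00"),
]
onset = first_diabetes_evidence(events)
print(onset)
```

In this made-up record the elevated HbA1c and the first prescription both precede the first ICD-9 code, which is the situation the slide asks about. The rule says nothing about when observation began, type 1 versus type 2, or severity, and drug evidence alone can be ambiguous.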

  4. The Natural History of Diabetes [Figure: timeline running from Healthy through Pre-diabetes (100 mg/dL ≤ FBS ≤ 125 mg/dL) to Diabetes and Complications, marked by HbA1c (screening), ICD-9 code, HbA1c (monitoring), and Rx. Values shown in the figure: mean duration = 159 d, mean duration = 117 d, mean duration = 1534 d. Further labels: 1st ICD-9 diabetes code, HbA1c (post-ICD-9), HbA1c (pre-therapeutic), HbA1c (last-ever); n = 7337, n = 17,959; mean = 7.51%, mean = 7.64%; mean duration = 1732 days.]

  5. NIH Research Collaboratory. Council of Councils Meeting, July 1, 2010. NIH-HMORN Collaboratory: Common Fund Proposal. Purpose: The NIH-HMORN Collaboratory will enhance and strengthen a research platform to accelerate large epidemiology studies, pragmatic clinical trials, and EHR-enabled health care delivery research by leveraging the HMORN's scientific, data and operational infrastructure. • Limited competition U54 RFP released 2-17-2011, then changed. • Duke Clinical Research Institute awarded $9M from NIH to serve as the Coordinating Center for NIH’s Health Care Systems Research Collaboratory – 9/25/12 press release – “The goal of the Collaboratory is to involve clinicians and patients in the design and interpretation of trials, provide the education needed to enhance the value of their participation, and use the data collected during healthcare delivery as the core data source for the full spectrum of clinical research, from registries to observational studies and pragmatic randomized controlled trials.”

  6. Virtual Data Warehouse • NIH-HMORN Collaboratory included a goal to develop a VDW – “The objectives of this initiative are to improve data quality; enable cross-site and cross project synergies; balance site-, project-, and network-level priorities; and reduce the preparatory work needed to assemble cohorts, count events, and capture exposure and co-morbidity data, all in support of an array of different types of studies.” • HMORN members have been working on a VDW • An internal website provides metadata (years, variable descriptions, labels, formats, definitions, specification, coding) • HMORN has developed guidelines and policies to facilitate research, but control resides at each member site • Efforts to write programs to extract & convert variables stored in legacy information systems to common standards; test standardized data for consistency & accuracy; standardize methods by providing macros & programs that are used across sites; provide instructions on how to use VDW to create analytic files for research

  7. The VDW • Not a centralized data warehouse; it consists of parallel, identical databases at each HMORN site, to facilitate merging across sites • It is not an analytic dataset, but does facilitate creation of such • As of March 2011, VDW data domains include: – Demographics: date of birth, gender, race and ethnicity – Enrollment: health plan membership enrollment, with insurance types, benefits, effective dates of coverage – Encounters: OPT, IPT, with associated diagnosis and procedure codes, type of encounter, provider seen, facility and discharge disposition – Procedures: performed procedures (e.g., surgery, lab, radiology, immunization); various coding systems (CPT, HCPCS, ICD-9, insurance claims Revenue Codes) – Diagnoses: dates, diagnosis codes, provider – Providers: specialty, age, gender, race and year graduated – Cancer/Tumor Registry: Surveillance, Epidemiology and End Results (SEER) program standards – most complex domain of VDW – Pharmacy Dispensing: date, National Drug or GPI code, therapeutic class, days supply, and amount dispensed – Vital Signs: height, weight, blood pressure, tobacco use and type – Laboratory Values: originally HbA1c, S-Cr, INR, FBG, serum K; values are being added through a timed priority list of 57 types of lab tests
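
The idea of parallel, identical databases can be sketched as one query that runs unchanged at every site, with only the per-site results combined afterward. The example below is a minimal sketch under stated assumptions: the table name `diagnoses`, its columns, and the site names are hypothetical and do not reproduce the actual VDW specification, and in-memory SQLite databases stand in for each site's local store.

```python
import sqlite3

# Hypothetical table and column names; the real VDW specification is not reproduced here.
SCHEMA = "CREATE TABLE diagnoses (patient_id TEXT, dx_date TEXT, dx_code TEXT)"
QUERY = "SELECT COUNT(DISTINCT patient_id) FROM diagnoses WHERE dx_code LIKE '250%'"


def make_site(rows):
    """Build an in-memory database standing in for one site's local copy of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO diagnoses VALUES (?, ?, ?)", rows)
    return conn


def run_everywhere(sites, query):
    """Run the same query at every site and return {site_name: count}."""
    return {name: conn.execute(query).fetchone()[0] for name, conn in sites.items()}


# Made-up rows for two hypothetical sites sharing the same schema.
sites = {
    "site_a": make_site([("a1", "2010-01-05", "250.00"),
                         ("a1", "2010-02-01", "250.00"),
                         ("a2", "2010-03-09", "401.9")]),
    "site_b": make_site([("b1", "2011-07-21", "250.02"),
                         ("b2", "2011-08-30", "250.00")]),
}
counts = run_everywhere(sites, QUERY)
print(counts, sum(counts.values()))
```

Because the schema is the same everywhere, the query text never changes between sites, and in this sketch only the counts leave each site, which is consistent with control residing at each member site.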

  8. In multisite studies, site-level differences in disease incidence, predictive variables, and health outcomes can represent: • True “small area” variation in practice patterns & outcomes • Variability in data collection methods across sites. Data quality assessments across sites are a critical first step in multisite studies. Kahn, et al., Medical Care, 2012
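
A first-pass data quality check might compare each site's rate with the pooled rate and flag sites that stand out. The sketch below is an assumption-laden illustration, not the method of the cited paper: the counts are made up, the site names are hypothetical, and the 0.5x and 2x bounds are arbitrary.

```python
def site_rates(site_counts):
    """site_counts maps site -> (cases, patients); returns site -> cases per 1,000 patients."""
    return {s: 1000.0 * cases / patients for s, (cases, patients) in site_counts.items()}


def flag_outlier_sites(site_counts, low=0.5, high=2.0):
    """Flag sites whose rate is below `low` or above `high` times the pooled rate.

    The 0.5x / 2x bounds are arbitrary illustrative choices.
    """
    total_cases = sum(c for c, _ in site_counts.values())
    total_patients = sum(p for _, p in site_counts.values())
    pooled = 1000.0 * total_cases / total_patients
    rates = site_rates(site_counts)
    return sorted(s for s, r in rates.items() if r < low * pooled or r > high * pooled)


# Made-up counts for three hypothetical sites.
site_counts = {"site_a": (80, 10000), "site_b": (90, 10000), "site_c": (10, 10000)}
flagged = flag_outlier_sites(site_counts)
print(site_rates(site_counts), flagged)
```

A flag only says that a site differs. Deciding whether the difference reflects true small-area variation or a difference in data collection still requires looking at how that site records the data.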

  9. 1) Type of data • HEALTH data from electronic health records • At Geisinger, 400K+ primary care patients, hundreds of millions of records; many kinds of health information 2) What is the current status of data collection/archiving? • Most patient health information will be collected electronically in the coming years • There is no single repository for US health data • Health systems most often use programs such as Epic; they then export data from Epic to a data warehouse for easier access; and export from the warehouse for analysis • There is no centralized warehouse; there are mechanisms for gaining access; there is no centralized catalog 3) Non-technical aspects of sharing • These data are not public; they can be accessed after agreements are in place, most often in collaborative research relationships; there is no routine sharing • Creating a single national repository of EHR data would be a daunting task

  10. 4) Standardization in description of the data • Many types of data: dates, encounters, diagnoses, ICD-9 and CPT codes, laboratory test codes with results, procedure test codes sometimes with results, physician orders, medications, imaging • Variation across providers, clinics, health systems • I do not believe there are as yet many ontology or metadata standards; text searching is necessary and natural language processing is in early development 5) Movement and ability to combine with other data • The health data are for INDIVIDUAL patients; individual patients cannot be directly linked to family members • Health data can be linked to other data by location (generally residential address) and date (space and time) • In general, approaches to analysis of EHR data are on a study-by-study basis • Data have to be accessed, exported, used to create analytic variables, merged with other data, analyzed • Epic has some analysis tools; in general we export data and use biostatistical software programs 6) Specific example: scientific question limited by integration challenges • As long as other data have meaning in space and time there should not be obstacles to integration • Have to acknowledge we may not be able to get what we actually want – we use surrogates for exposure
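
Linking by space and time amounts to joining health records to another dataset on a shared (location, date) key. The sketch below assumes a coarse area code stands in for residential address and that the environmental metric is available per area and per day; the record layout, the area codes, the values, and the function name are all hypothetical.

```python
from datetime import date

# Hypothetical encounters keyed by a coarse area code and an encounter date.
encounters = [
    {"patient_id": "p1", "area": "area_A", "date": date(2012, 7, 4)},
    {"patient_id": "p2", "area": "area_B", "date": date(2012, 7, 4)},
    {"patient_id": "p3", "area": "area_A", "date": date(2012, 7, 5)},
]
# Hypothetical environmental metric (arbitrary units) by area and day.
exposure = {
    ("area_A", date(2012, 7, 4)): 41.0,
    ("area_B", date(2012, 7, 4)): 35.5,
}


def link_by_space_time(encounters, exposure):
    """Attach the exposure value matching each encounter's (area, date); None if no match."""
    linked = []
    for enc in encounters:
        value = exposure.get((enc["area"], enc["date"]))
        linked.append({**enc, "exposure": value})
    return linked


linked = link_by_space_time(encounters, exposure)
for row in linked:
    print(row["patient_id"], row["exposure"])
```

The third encounter has no matching measurement, so its exposure stays empty. That is the practical form of the point that we may not get what we actually want and end up using a surrogate, such as a nearby area or an adjacent day.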

  11. Thank you for listening. Second Presentation ENDS HERE
