  1. Publicly Available Large Data Sets for Health Outcomes Research: Pearls, Pitfalls, Prices & More • Lakshika Tennakoon, MD, MSc, MPhil, DTM&H • Research Scientist, Trauma, Acute Care and Critical Care Surgery, Stanford University • October 2018

  2. Aims • To encourage the use of public data for research • To characterize existing large clinical databases

  3. Best Currently Available Databases (database, dates, source)
  • Nationwide Inpatient Sample (NIS), 1988-2016, HCUP
  • Nationwide Emergency Department Sample (NEDS), 2006-2016, HCUP
  • Nationwide Readmissions Database (NRD), 2010-2016, HCUP
  • Kids' Inpatient Database (KID), 1997, 2000, 2003, 2006, 2009, 2012, 2016, HCUP
  • National Trauma Data Bank (NTDB), 2002-2016, ACS
  • National Surgical Quality Improvement Program (NSQIP), 2005-2016, ACS
  • National Ambulatory Medical Care Survey (NAMCS), 1993-2015, CDC

  4. Databases, continued (database, dates, source)
  • National Health and Nutrition Examination Survey (NHANES), 1999-2015, CDC
  • National Hospital Ambulatory Medical Care Survey (NHAMCS), 1992-2015, CDC
  • Medicare/SEER, 1991-2015, Government
  • MarketScan, 2002-2011, Private
  • Hospital-based registry data, Hospital based

  5. Nationwide Inpatient Sample (NIS) • The largest publicly available all-payer inpatient care database in the United States • Includes all discharges from a 20% stratified sample of US hospitals • NIS data can be weighted to generate national estimates • Years available: 1988 to 2016 • About 8 million hospital stays a year • The NIS_2015_CORE data file has 7,153,989 records
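
The weighting step on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HCUP's official procedure: it assumes each record carries a discharge weight in a column named DISCWT and an in-hospital death flag named DIED (neither name appears on the slide), and the three records are invented toy values. It produces point estimates only; standard errors require the survey design (strata and clusters), which this sketch ignores.

```python
# Toy records standing in for NIS core-file rows (all values are invented).
# DISCWT (discharge weight) and DIED (death flag) are assumed column names.
records = [
    {"AGE": 67, "DIED": 0, "DISCWT": 5.0},
    {"AGE": 45, "DIED": 1, "DISCWT": 5.0},
    {"AGE": 80, "DIED": 0, "DISCWT": 4.5},
]


def weighted_total(rows, weight="DISCWT"):
    """Estimated national number of discharges: the sum of the weights."""
    return sum(r[weight] for r in rows)


def weighted_mean(rows, var, weight="DISCWT"):
    """Weighted mean of `var`, e.g. mean age or the in-hospital mortality rate."""
    total_w = weighted_total(rows, weight)
    return sum(r[var] * r[weight] for r in rows) / total_w


national_discharges = weighted_total(records)    # 3 sampled stays represent 14.5
mortality_rate = weighted_mean(records, "DIED")  # weighted share of stays ending in death
```

Each sampled stay counts as DISCWT stays in the national estimate, which is why the totals come from summing weights rather than counting rows.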

  6. Cost & Data Load Software • Cost of 2016 NIS: $625 • Original data come as CSV or ASCII files • Load programs are available in Stata, SAS, and SPSS • Data storage: large databases need a server or Box
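
For readers working outside Stata, SAS, or SPSS, the job of a load program can be sketched in Python. This is a hedged illustration only: it assumes the CSV file has no header row, so column names must be supplied in the order given by the official load program, and the four-column layout and file contents below are hypothetical stand-ins, not the real NIS layout.

```python
import csv
import io

# Hypothetical column order; in practice, copy it from the official load program.
COLUMNS = ["AGE", "FEMALE", "LOS", "TOTCHG"]


def read_core(handle, columns=COLUMNS):
    """Yield one dict per record from a header-less, comma-delimited file.

    Records are streamed one at a time, so a multi-million-row file
    never has to fit in memory at once.
    """
    for row in csv.reader(handle):
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
        yield dict(zip(columns, row))


# Stand-in for an opened data file (contents are invented).
sample = io.StringIO("67,1,4,23500\n45,0,2,9800\n")
rows = list(read_core(sample))
```

Values arrive as strings, so numeric columns still need converting before analysis.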

  8. Citing HCUP Databases • Citing HCUP Databases in Abstract and Manuscript: As specified in the HCUP DUAs, include the database name, HCUP, and AHRQ as demonstrated below for each HCUP database: • HCUP Nationwide Inpatient Sample (NIS). Healthcare Cost and Utilization Project (HCUP). 2007-2009. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-

  9. What Data Elements Are in the NIS? Data Files ▪ Core Data ▪ Hospital Data ▪ Illness Severity Data ▪ Cost-to-Charge Ratio Data ▪ Diagnosis & Procedure Groups Data

  10. Core Data File • Age at admission • Gender of patient • Race of patient • Location of patient's residence • Median household income for patient's ZIP code • ICD-9-CM diagnoses: primary and secondary diagnoses, number of diagnoses, diagnosis coding system • External causes of injury and poisoning: ECODE 1-4, number of external cause of injury codes • ICD-9-CM procedures: primary and secondary procedures, number of procedures, procedure systems, duration of primary and secondary procedures • Total charges • Disposition • Length of stay

  11. Hospital Data File • Hospital bed size • Type of hospital: government or private; government, nonfederal, public; private, non-profit; private, investor-owned • Hospital location: rural or urban • Location/teaching status of hospital: rural, urban non-teaching, urban teaching • Region of hospital: Northeast, Midwest, South, West • Hospital weights: weight to hospitals in AHA universe, weight to hospitals in the State

  12. Severity of Illness Data • Severity of Illness Subclass • Risk of Mortality Subclass • 29 comorbid conditions, including Alcohol Abuse, Depression, Drug Abuse, Liver Disease, Renal Failure, and Obesity • Defined by the Elixhauser comorbidity measures • https://www.hcup-

  13. Cost-to-Charge Ratio Data • Year • Hospital unique identifier • Wage index • CCR_NIS (the hospital's cost-to-charge ratio, for NIS 2012 to current) • Total cost is calculated from CCR_NIS and the total charges variable (TOTCHG), which is available in the NIS core data file • Formula (Stata): gen Total_COSTS = TOTCHG*CCR_NIS
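
The same calculation can be sketched in Python, including the linking step the Stata one-liner takes for granted: the ratio lives in a separate file and must first be matched to each stay by hospital. This sketch treats CCR_NIS as the hospital-level cost-to-charge ratio, as the formula implies. The key name HOSP_NIS is an assumption (it does not appear on the slide), and all values are invented.

```python
# Toy rows; HOSP_NIS as the shared hospital key is an assumption.
core = [
    {"HOSP_NIS": 101, "TOTCHG": 20000.0},
    {"HOSP_NIS": 102, "TOTCHG": 5000.0},
    {"HOSP_NIS": 103, "TOTCHG": 12000.0},  # hospital with no ratio on file
]
ccr = [
    {"HOSP_NIS": 101, "CCR_NIS": 0.25},
    {"HOSP_NIS": 102, "CCR_NIS": 0.5},
]

# Lookup table: hospital identifier -> cost-to-charge ratio.
ratio_by_hospital = {row["HOSP_NIS"]: row["CCR_NIS"] for row in ccr}


def add_total_cost(stays, ratios):
    """Return copies of the stays with TOTAL_COST = TOTCHG * CCR_NIS.

    Stays whose hospital has no ratio get TOTAL_COST = None rather than
    a silently wrong number.
    """
    out = []
    for stay in stays:
        ratio = ratios.get(stay["HOSP_NIS"])
        cost = None if ratio is None else stay["TOTCHG"] * ratio
        out.append({**stay, "TOTAL_COST": cost})
    return out


costed = add_total_cost(core, ratio_by_hospital)
```

Keeping unmatched stays with a missing cost, instead of dropping them, makes it easy to report how many records could not be costed.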

  14. Nationwide Emergency Department Sample (NEDS) • NEDS is the largest all-payer ED database in the United States • A 20% stratified sample of US hospital-based emergency departments • Years available: 2006-2016 • Number of ED visits: between 25 and 30 million (unweighted) records from 950 hospitals • Cost of 2016 NEDS: $1,000

  15. What Data Elements Are in the NEDS? Four Data Files per year ▪ Core data ▪ Emergency department data ▪ Inpatient data ▪ Hospital Weights data

  17. Nationwide Readmissions Database (NRD) • Used to calculate national readmission rates for all payers and the uninsured • Provides nationally representative information on hospital readmissions for all ages • Unweighted NRD data cover approximately 12 million discharges each year • Has Core data, Hospital data, Illness severity data, and Cost-to-Charge Ratio data • Years available: 2010-2016 • Cost of 2016 NRD data: $1,000
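
How a readmission rate falls out of linked discharge records can be sketched as follows. This is a simplified, unweighted illustration, not the method used in any published analysis: it assumes a patient linkage variable named NRD_VISITLINK and a timing variable named NRD_DAYSTOEVENT (names not given on the slide), counts every stay as an index stay, and uses invented records. Real analyses apply weights and exclusions such as deaths, transfers, and discharges too late in the year to observe follow-up.

```python
from collections import defaultdict

# Toy stays (invented). NRD_VISITLINK links a patient's stays; NRD_DAYSTOEVENT
# is a per-patient day counter for admission. Both names are assumptions.
stays = [
    {"NRD_VISITLINK": "a", "NRD_DAYSTOEVENT": 100, "LOS": 5},
    {"NRD_VISITLINK": "a", "NRD_DAYSTOEVENT": 120, "LOS": 3},  # 15 days after discharge
    {"NRD_VISITLINK": "b", "NRD_DAYSTOEVENT": 200, "LOS": 2},
    {"NRD_VISITLINK": "c", "NRD_DAYSTOEVENT": 50, "LOS": 1},
    {"NRD_VISITLINK": "c", "NRD_DAYSTOEVENT": 150, "LOS": 4},  # 99 days after discharge
]


def readmission_rate(rows, window=30):
    """Share of stays followed by another admission within `window` days."""
    by_patient = defaultdict(list)
    for r in rows:
        by_patient[r["NRD_VISITLINK"]].append(r)

    index_stays = 0
    readmitted = 0
    for visits in by_patient.values():
        visits.sort(key=lambda r: r["NRD_DAYSTOEVENT"])
        for i, current in enumerate(visits):
            index_stays += 1
            if i + 1 < len(visits):
                # Discharge day = admission day + length of stay.
                discharge_day = current["NRD_DAYSTOEVENT"] + current["LOS"]
                gap = visits[i + 1]["NRD_DAYSTOEVENT"] - discharge_day
                if 0 <= gap <= window:
                    readmitted += 1
    return readmitted / index_stays if index_stays else 0.0


rate = readmission_rate(stays)
```

The gap is measured from discharge, not admission, which is why length of stay enters the calculation.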

  18. Kids' Inpatient Database (KID) • The only all-payer pediatric inpatient care database in the USA • Contains 2-3 million hospital stays • Helps to develop national and regional estimates for diseases • Data available for demographics, injury characteristics, diagnosis, hospital characteristics, outcomes, and healthcare cost • Need to sign a DUA • Cost of 2016 KID data: $500

  19. National Trauma Data Bank (NTDB) • The largest registry of trauma patients admitted to trauma centers in the United States • Data are not weighted • No DUA (data use agreement) • Samples are obtained from trauma center registries ▪ In 2011, 747 trauma centers were included • Years available: 2002-2016 • Data files are in CSV format • Cost of 2016 NTDB data: $300

  20. NTDB Data • Demographic data • Injury severity data • Emergency department data • Mechanisms of Injury data • ICD9 and ICD10 Procedure data • ICD9 and ICD10 Diagnosis data • Discharge disposition data • Facility data • Vital signs data • Protective devices & transportation data • Comorbid and complications data

  21. National Surgical Quality Improvement Program (NSQIP) • A nationally validated, risk-adjusted, and outcomes-based program • NSQIP has prospective and outcomes data • Years available: 2005-2011 • NSQIP aims to measure and improve the quality of surgical care across surgical specialties • 680 hospitals participated in NSQIP in 2017

  22. What Data Elements Are in the NSQIP? • Preoperative risk factors • Intraoperative variables • 30-day postoperative mortality and morbidity outcomes • Demographic data • Current Procedural Terminology (CPT) data • Health and behavior data • Physical examination data

  23. • Data are free for NSQIP participating hospitals • Data Request Process • Need to sign a DUA (Data Use Agreement) • Download the data • Data files are available in 3 formats: Text, SPSS, SAS

  24. MarketScan Data • Private database • MarketScan is broadly representative of the commercially insured population of the United States • High-quality, longitudinal, patient-level data • Low percentage of missing data • Years available: 2002-2011 • Need to sign a DUA (Data Use Agreement) • Cost: around $50,000/year

  25. What Data Elements Are in the MarketScan? • Patient socio-demographic data • Admission date and type • Diagnosis code (principal and secondary) • Discharge status • Procedure code (principal and secondary) • Length of stay • Place of service • Provider ID • Data on drugs/medications

  26. SEER-Medicare Data • SEER-Medicare Linked Database • Medicare beneficiaries with cancer • Data derived from the Surveillance, Epidemiology and End Results (SEER) program • Diagnosis & Procedure codes: ICD9, ICD10, CPT, HCPCS (Healthcare Common Procedure Coding System) • Patient Demographic and Socioeconomic Characteristics • Comorbidity • Breast, Colorectal, and Prostate Cancer Screening • Radiation Therapy (includes codes to identify radiation therapy) • Chemotherapy Use (includes codes to identify chemotherapy) • Complications of Cancer Treatment • Surveillance After Cancer Treatment • Data sets available from 1991-2015 • Need to sign a DUA (Data Use Agreement)

  27. • Physician Characteristics • Hospital Characteristics • Health Care Costs Related to Cancer Treatment

  28. National Health and Nutrition Examination Survey (NHANES) • High-quality cross-sectional survey data on adults and children in the United States • Data available on a nationally representative sample of about 5,000 persons each year • Years available: 1971-75 (NHANES I); 1976-80 (NHANES II); 1982-84 (Hispanic Health and Nutrition Examination Survey, HHANES); 1988-94 (NHANES III); 1999-present (Continuous NHANES) • Data are free to download from the CDC website


