Using natural language processing to assess documentation of features of critical illness in discharge documents of ARDS survivors
AcademyHealth ARM 2016: Leveraging Data to Improve Quality and Outcomes
Gary E. Weissman, MD 1,2; Michael O. Harhay, MPH 2,3; Ricardo M. Lugo, MD, MA 4; Barry D. Fuchs, MD, MS 1; Scott D. Halpern, MD, PhD 1,2,3; Mark E. Mikkelsen, MD, MSCE 1,3
1 Pulmonary, Allergy, and Critical Care Division, Hospital of the University of Pennsylvania, Philadelphia, PA
2 Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA
3 Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania, Philadelphia, PA
4 Division of Cardiovascular Medicine, Vanderbilt University School of Medicine, Nashville, TN
June 27, 2016
@garyweissman
Disclosures
Funding
• NIH/NHLBI T32-HL098054
Conflicts of interest
• None
Background
Acute respiratory distress syndrome (ARDS)
• 190,600 cases/yr in USA
• Improved mortality → more survivors
Rubenfeld et al. N Engl J Med 2005;353:1685-1693. Abel et al. Thorax 1998;53:292-294.
Post-intensive care syndrome (PICS)
• 80% prevalence among ARDS survivors
• Deficits across 3 domains:
– Cognitive
– Psychiatric
– Functional
Elliott et al. Crit Care Med 2014;42:2518-2526. Needham et al. Crit Care Med 2012;40:502-509.
The handoff
• Often the only chance to communicate between inpatient and outpatient settings
• JCAHO: “reason for hospitalization” and “significant findings”
• Review: primary diagnosis (17.5%), hospital course (14.5%)
Kind et al. AHRQ 2008. Kripalani et al. JAMA 2007;297:831-841.
Questions
Clinical
• Which features of critical illness are documented at hospital discharge?
Methodologic
• Which NLP tasks are important for identification of features of critical illness in hospital discharge documents?
Natural language processing (NLP)
“…a subfield of linguistics and computer science that deals with computer applications whose input is natural language.”
Bretonnel Cohen and Demner-Fushman. Biomedical Natural Language Processing. 2014. Yim et al. JAMA Oncol 2016;797-804.
Methods
• Population
• Keywords
• Natural language processing (NLP)
• Sensitivity analysis
• Manual review
• Multivariable modified Poisson regression
Methods: Population
Prospective, electronic, real-time ARDS identification, 2013-2015
• Sensitivity 97.6%, specificity 97.6%
Exclusions:
• Heart failure
• Neurosurgery
• 48h post-operative
• FiO2 < 0.5 at diagnosis
• Death in hospital
• Discharge to inpatient hospice
Final sample
• 1,797 → 815 eligible discharge documents
Koenig et al. Crit Care Med 2011;39:98-104.
Methods: Population
Methods: Keywords (incomplete)
• ARDS: Acute lung injury, Acute respiratory distress syndrome, ALI, ALI/ARDS, ARDS
• Mechanical ventilation: Extubate, Intubate, Mechanical ventilation, VDRF, Vent, Vent dependent respiratory failure, Ventilator
• ICU admission: CCU, Critical care, Critical illness, Critically ill, CTICU, CTSICU, ICU, Intensive care, Intensive care unit, MICU
• Symptoms of PICS: Anxiety, Brain dysfunction, Cognitive impairment, Confusion, Delirium, Depression, Executive dysfunction, ICU delirium, Immobility, Memory dysfunction
Methods: NLP
R statistical computing language
• packages: tm, RWeka, data.table, stringdist, ggplot2
Tasks
• Standard preprocessing
• Morphologic decomposition
• Tokenization
• Spelling error identification
Not included
• Named entity recognition
• Relation extraction (negation, temporal)
• Word sense disambiguation
• Problem-specific segmentation
Goal: keyword-based document classifier
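To make the goal concrete, here is a minimal sketch of a keyword-based document classifier. The talk's pipeline was written in R (tm, RWeka, and others); this Python version is only an illustration, not the authors' code. The `KEYWORDS` dictionary is a hypothetical, abbreviated subset of the keyword slide, and the function names are assumptions. Stemming and spelling correction are left out for brevity.

```python
import re

# Hypothetical, abbreviated keyword lists (lower-cased) per category
KEYWORDS = {
    "ARDS": {"ards", "acute respiratory distress syndrome", "acute lung injury"},
    "mechanical ventilation": {"mechanical ventilation", "ventilator", "vdrf"},
    "ICU admission": {"icu", "micu", "intensive care unit", "critical care"},
}


def ngrams(tokens, max_n=4):
    """Return every n-gram (space-joined) with 1 <= n <= max_n as a set."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def classify(document):
    """Return the categories that have at least one keyword in the document."""
    tokens = re.findall(r"[a-z]+", document.lower())
    grams = ngrams(tokens)
    return {label for label, words in KEYWORDS.items() if words & grams}


labels = classify("Admitted to the MICU with acute respiratory distress syndrome.")
print(sorted(labels))
```

Matching on whole n-grams rather than substrings keeps a short keyword such as "icu" from firing inside an unrelated longer word. Without named entity recognition or negation handling, a phrase such as "no ARDS" would still be counted as a mention.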
Methods: NLP task detail
Morphologic decomposition – “stemming”
• Acronyms treated separately
patient emergently intubated respiratory failure
→ patient emerg intub respiratori failur
Methods: NLP task detail
Spelling error identification
• Sensitivity analysis
Mr. Smith suspected of having acute respiratroy distress syndrome
Restricted Damerau-Levenshtein Distance:
> stringsim('respiratroy', 'respiratory')
[1] 0.9090909
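The similarity score above can be reproduced by hand. The sketch below is a Python illustration (the talk used the R stringdist package) of the restricted Damerau-Levenshtein distance, also called optimal string alignment. It assumes the similarity is scaled as 1 - distance / (length of the longer string), which is consistent with the 0.9090909 on the slide: one adjacent transposition over 11 characters. The function names are hypothetical.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b):
    """Scale the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


print(osa_distance("respiratroy", "respiratory"))
print(round(similarity("respiratroy", "respiratory"), 7))
```

A sensitivity analysis could then treat any token whose similarity to a keyword exceeds a chosen threshold as a misspelling of that keyword. The threshold is a tuning choice that the slide does not specify.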
Results
Results: Modified Poisson regression
Results: Sensitivity analysis
Results: Manual review
Error analysis
Word sense disambiguation
• “… depression of left ventricular systolic function …”
• “… weak gag ...”
Sentence boundary detection
• “... was admitted to the MICU.MICU course complicated by ...”
Summary points
• ARDS is not often documented at discharge because it is not recognized in the ICU, not because of “forgetting”
• Mechanical ventilation and ICU admission are frequently mentioned in discharge summaries of ARDS survivors
• A keyword-based document classifier has excellent accuracy for identifying ARDS, mechanical ventilation, and ICU admission in hospital discharge documents
• Named entity recognition and disambiguation (NERD) and sentence boundary detection tasks may improve performance of PICS symptom identification
Weissman GE, et al. Annals of the American Thoracic Society. Epub. 2016.
Next steps
EHR redesign
• Easily accessible, searchable queries for researchers
• Robust metadata for structured and unstructured fields
Data challenges
• Location-specific jargon
• Share code
Generalizability
• Need for decision support for ARDS recognition in real time
• Method, not the results
Acknowledgments
• Acute care health services research group, University of Pennsylvania
• Penn Data Analytics Center
• Anonymous reviewers and editorial staff at Annals of the American Thoracic Society
Methods: NLP task detail
Preprocessing
• Remove extra whitespace, numbers, punctuation, and stopwords; convert to lower case
The x-ray demonstrated a 4 cm mass. No effusion or pneumothorax.
→ xray demonstrated cm mass effusion pneumothorax
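The preprocessing steps can be sketched in a few lines. This is a Python illustration rather than the R tm calls used in the talk. The `STOPWORDS` set is a small hypothetical stand-in for a full English stopword list, chosen only so that the slide's example can be reproduced.

```python
import re
import string

# Hypothetical, tiny stopword list; a real pipeline would use a full list
STOPWORDS = {"the", "a", "an", "no", "or", "and", "of"}


def preprocess(text):
    """Lower-case, strip numbers and punctuation, drop stopwords, squeeze whitespace."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)                                   # numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]          # stopwords
    return " ".join(tokens)                                           # single spaces


print(preprocess("The x-ray demonstrated a 4 cm mass. No effusion or pneumothorax."))
```

Dropping "no" as a stopword discards negation, so "No effusion" becomes "effusion". This matches the slide's output and the deck's note that negation handling was not included.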
Methods: NLP task detail
Tokenization
• N-grams (1 ≤ N ≤ 4)
patient suspected acute respiratory distress syndrome intubated critical illness
N = 2: patient suspected, suspected acute, acute respiratory, respiratory distress, distress syndrome, syndrome intubated, intubated critical, critical illness
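N-gram tokenization is a sliding window over the token list. The sketch below is a Python illustration (the talk used RWeka in R), with hypothetical function names, that generates the bigrams from the example and the full set for 1 ≤ N ≤ 4.

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings, in order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def all_ngrams(tokens, min_n=1, max_n=4):
    """Return the n-grams for every n in [min_n, max_n], shortest first."""
    grams = []
    for n in range(min_n, max_n + 1):
        grams.extend(ngrams(tokens, n))
    return grams


tokens = ("patient suspected acute respiratory distress syndrome "
          "intubated critical illness").split()
print(ngrams(tokens, 2))
print(len(all_ngrams(tokens)))
```

With 9 tokens there are 8 bigrams, and 9 + 8 + 7 + 6 = 30 n-grams in total for N from 1 to 4. A maximum of N = 4 is long enough to capture the longest keyword phrases on the keyword slide, such as "acute respiratory distress syndrome".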