Septic Shock Prediction for Patients with Missing Data


  1. SEPTIC SHOCK PREDICTION FOR PATIENTS WITH MISSING DATA
     Joyce C Ho, Cheng Lee, Joydeep Ghosh
     University of Texas at Austin

  2. WHAT IS SEPSIS AND SEPTIC SHOCK?
     • Sepsis is a systemic inflammatory response to infection
     • 11th leading cause of death in 2010
     • Estimated $14.6 billion spent on sepsis in 2008
     • Septic shock (sepsis-induced hypotension) has a mortality rate of 45.7%

  3. IN-HOSPITAL DETECTION
     [Diagram: demographic information, vital signs, labs, and clinical notes feed a patient representation, which feeds a predictive model]

  4. MISSING DATA PROBLEM
     • Clinical studies must deal with large amounts of missing data
     • Measurements are noisy and irregularly sampled
     • Highly accurate measurements require invasive techniques (which may not be medically necessary)

  5. TYPICAL APPROACH
     • Ignore subjects with missing observations
     • Ignore features without complete data
     • Result: highly curated datasets with limited features and small samples

  6. OUR SEPTIC SHOCK MODEL
     Problem: Given that a patient has sepsis, can we predict complications at least one hour before the onset of septic shock?
     • Generalization to patients with partially missing observations
     • Simple and accessible approaches
     • Focus on commonly observed, non-invasive measurements

  7. CLINICAL FEATURES
     • Summary statistics (last measurement, min, mean, and max) in an 8-hour window
       • Cardiac: non-invasive blood pressure, heart rate, pulse pressure
       • Other: respiratory rate, SpO2, temperature
     • Last measurement only (fewer observations)
       • White blood cell count
     • Index scores: SOFA, SAPS-I, shock index
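A minimal sketch of how the window features on this slide might be computed. The function names, the `(time_hours, value)` sample layout, and the example readings are illustrative assumptions, not taken from the slides. The derived quantities use the standard definitions: pulse pressure is systolic minus diastolic pressure, and shock index is heart rate divided by systolic pressure.

```python
from statistics import mean

def window_summary(samples, end_time, window_hours=8.0):
    """Summarize (time_hours, value) samples in (end_time - window_hours, end_time].

    Returns None when no sample falls in the window, i.e. the feature is missing.
    """
    in_window = sorted(
        (t, v) for t, v in samples if end_time - window_hours < t <= end_time
    )
    if not in_window:
        return None
    values = [v for _, v in in_window]  # ordered by time, so values[-1] is the latest
    return {
        "last": values[-1],
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
    }

def pulse_pressure(systolic, diastolic):
    """Pulse pressure: systolic minus diastolic blood pressure."""
    return systolic - diastolic

def shock_index(heart_rate, systolic):
    """Shock index: heart rate divided by systolic blood pressure."""
    return heart_rate / systolic

# Hypothetical heart-rate readings as (hours since admission, beats per minute).
heart_rate = [(1.0, 88.0), (5.0, 102.0), (9.5, 110.0), (11.0, 96.0)]
summary = window_summary(heart_rate, end_time=12.0)
```

Returning `None` for an empty window keeps "not measured" distinct from any numeric value, which is what an imputation step needs as input.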

  8. IMPUTATION APPROACHES
     • Mean / median imputation
     • Matrix factorization techniques
       • Singular value decomposition (SVD) based imputation
       • Probabilistic principal component analysis (PPCA)
     • K-nearest neighbors (KNN)
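To make two of the listed approaches concrete, here is a rough sketch of mean imputation and a simple KNN imputation in plain Python. The distance measure (root mean squared difference over features observed in both rows), the use of `None` for missing entries, and the toy data are assumptions for illustration. The slides do not specify these details.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def _distance(a, b):
    """Root mean squared difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare, so treat the rows as maximally far apart
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k):
    """Fill each None with the mean of that feature over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which feature j was actually observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no row observes this feature
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Toy data: two features per patient, one value missing.
rows = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [10.0, 100.0]]
by_mean = mean_impute(rows)
by_knn = knn_impute(rows, k=2)
```

In the toy data the KNN estimate uses only the two rows closest to the incomplete one, while the column mean is pulled upward by the distant fourth row. This is the locality that `k` controls.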

  9. IMPUTATION SELECTION CRITERIA
     • Matrix factorization and neighborhood techniques have a parameter to control the resolution or locality of imputation
     • The evaluation metric typically involves randomly removing observations and comparing the fit using root mean square error (RMSE) or mean absolute error (MAE)
     • A better RMSE / MAE does not necessarily translate to improved predictive performance
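The conventional evaluation described on this slide can be sketched as follows: hide a random subset of observed entries, impute them, and compare the imputed values with the hidden truth. The helper names, the 20% masking fraction, and the column-mean imputer used as the method under test are assumptions for illustration.

```python
import math
import random

def mask_random_entries(rows, fraction, seed=0):
    """Hide a random fraction of observed entries.

    Returns (masked_rows, hidden), where hidden maps (row, col) to the true value.
    """
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(rows)
                for j, v in enumerate(row) if v is not None]
    n_hide = max(1, int(len(observed) * fraction))
    masked = [list(row) for row in rows]
    hidden = {}
    for i, j in rng.sample(observed, n_hide):
        hidden[(i, j)] = masked[i][j]
        masked[i][j] = None
    return masked, hidden

def rmse(pairs):
    """Root mean square error over (predicted, true) pairs."""
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (predicted, true) pairs."""
    return sum(abs(p - t) for p, t in pairs) / len(pairs)

def column_mean_impute(rows):
    """Stand-in imputer; assumes each column keeps at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def imputation_error(rows, impute, fraction=0.2, seed=0):
    """Mask entries, impute them, and score the imputed values against the truth."""
    masked, hidden = mask_random_entries(rows, fraction, seed)
    filled = impute(masked)
    pairs = [(filled[i][j], truth) for (i, j), truth in hidden.items()]
    return rmse(pairs), mae(pairs)

# Hypothetical (heart rate, systolic BP) rows.
vitals = [[80.0, 118.0], [92.0, 101.0], [110.0, 88.0],
          [75.0, 125.0], [99.0, 95.0], [85.0, 112.0]]
error_rmse, error_mae = imputation_error(vitals, column_mean_impute)
```

Both scores measure how well hidden values are reconstructed, not how well a downstream classifier performs on the imputed data.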

  10. PERFORMANCE-ORIENTED IMPUTATION (POI)
      [Diagram: random splits of the data are imputed and used to build and evaluate models for imputation parameter selection; the resulting optimal k is then used to impute the data and construct the prediction model]
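One way to read the POI diagram is as a parameter search scored by downstream predictive performance instead of reconstruction error. The sketch below shows only that selection loop. The function names, the split scheme, and the idea of passing the imputer, model fitting, and scoring as callables are assumptions. The slides do not describe how each split is imputed or which model is evaluated inside the loop.

```python
import random
from statistics import mean

def random_splits(records, n_splits=5, valid_fraction=0.3, seed=0):
    """Return n_splits random (train, validation) partitions of records."""
    rng = random.Random(seed)
    n_valid = max(1, round(len(records) * valid_fraction))
    splits = []
    for _ in range(n_splits):
        shuffled = list(records)
        rng.shuffle(shuffled)
        splits.append((shuffled[n_valid:], shuffled[:n_valid]))
    return splits

def select_by_performance(candidates, splits, impute, fit, score):
    """Pick the imputation parameter with the best mean validation score.

    impute(data, k) returns imputed data, fit(train) returns a model, and
    score(model, valid) returns a number where higher is better (e.g. AUC).
    """
    best_k, best_score = None, float("-inf")
    for k in candidates:
        fold_scores = []
        for train, valid in splits:
            model = fit(impute(train, k))
            fold_scores.append(score(model, impute(valid, k)))
        avg = mean(fold_scores)
        if avg > best_score:
            best_k, best_score = k, avg
    return best_k, best_score

# Stand-in callables that only show the call shape. A real pipeline would plug in
# an imputer (such as KNN with k neighbors), a classifier, and a held-out metric.
records = list(range(20))
splits = random_splits(records, n_splits=3)
best_k, best_score = select_by_performance(
    candidates=[1, 3, 5, 10],
    splits=splits,
    impute=lambda data, k: (data, k),
    fit=lambda imputed: imputed[1],                # the "model" just remembers k
    score=lambda model, imputed: -abs(model - 5),  # pretend k=5 predicts best
)
```

Once a parameter is chosen this way, the full dataset would be imputed with it before the final prediction model is constructed, as the diagram indicates.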

  11. MIMIC-II DATABASE
      • Extensive, publicly available ICU data resource
      • Data between 2001 and 2007 from Boston’s Beth Israel Deaconess Medical Center ICUs
      • Over 40,000 ICU stays from 30,000+ patients
      • Clinical records with physiological measures, medication records, laboratory tests, free-form text notes, etc.

  12. IMPORTANCE OF IMPUTATION
      • Less than 22% of the 1,353 patients have complete data
      • Non-invasive BP is not always available
      [Histogram: count of patients by number of missing features]

      Feature             30 mins   60 mins
      Respiratory rate    0.67%     0.68%
      Temperature         1.70%     2.05%
      White blood cells   15.30%    14.69%
      Blood pressure      23.28%    23.44%

  13. DIFFERENCES IN POPULATION

                 Missing patients         Complete only
      Time       Sepsis (only)   Shock    Sepsis (only)   Shock    P-value
      30 mins    749             79       199             110      4.56E-26
      60 mins    723             79       196             106      6.99E-24
      90 mins    705             79       196             103      4.63E-22
      120 mins   685             74       193             103      7.06E-23

      • Statistically significantly higher ratio of shock patients if you ignore patients with missing data
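The slide does not name the statistical test behind its p-values. As an illustration only, the sketch below runs a Pearson chi-square test of independence (no continuity correction) on the 30-minute counts, assuming each column pair is ordered sepsis-only then shock. The function name is hypothetical, and the resulting p-value need not match the one reported on the slide.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# 30-minute counts: patients with missing data vs. patients with complete data.
missing_sepsis, missing_shock = 749, 79
complete_sepsis, complete_shock = 199, 110

shock_rate_missing = missing_shock / (missing_sepsis + missing_shock)
shock_rate_complete = complete_shock / (complete_sepsis + complete_shock)
stat, p_value = chi_square_2x2(
    missing_sepsis, missing_shock, complete_sepsis, complete_shock
)
```

Comparing `shock_rate_complete` with `shock_rate_missing` shows the population shift the slide points to: restricting the analysis to complete cases over-represents shock patients.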

  14. PREDICTIVE POWER OF MEAN IMPUTED MODEL

      Train Data   Test Data   30 minutes before (AUC)   60 minutes before (AUC)
      Complete     Complete    0.796±0.065               0.777±0.050
      Complete     Imputed     0.815±0.033               0.800±0.053
      Imputed      Imputed     0.834±0.025               0.829±0.030
      Imputed      Complete    0.839±0.044               0.828±0.047

      • Model generalizes to broader population with slightly better predictive performance
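For reference, AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition. Reading the "±" values as a standard deviation across repeated splits is an assumption, and the fold scores shown are made up for illustration.

```python
from statistics import mean, stdev

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels are 0/1; both classes must be present.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Tiny example: 1 = progressed to shock, 0 = sepsis only (hypothetical scores).
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Hypothetical per-split AUCs summarized as mean and standard deviation.
fold_aucs = [0.80, 0.75, 0.85]
auc_mean, auc_std = mean(fold_aucs), stdev(fold_aucs)
```

The pairwise form costs O(positives × negatives) comparisons, which is fine for cohorts of this size. A rank-based formula gives the same value faster on large datasets.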

  15. COMPARISON OF SELECTION CRITERIA (SVM)
      [Figure: AUC, lift, and F1 values for SVD, PPCA, and KNN imputation under each selection criterion (POI, MAE, RMSE)]
      • POI is generally better for AUC + F-measure
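The figure on this slide reports F1 and lift alongside AUC. A minimal sketch of those two metrics follows. F1 is the harmonic mean of precision and recall. Lift has several definitions in practice, and the one used here (positive rate among the top-scored fraction divided by the overall positive rate) is an assumption, as are the function names and toy data.

```python
def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for 0/1 labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def lift_at(labels, scores, fraction):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Toy data: 3 shock cases among 8 patients.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

f1 = f1_score(labels, predictions)
lift = lift_at(labels, scores, fraction=0.25)
```

Unlike AUC, both metrics depend on an operating point (a decision threshold for F1, a cutoff fraction for lift), so comparisons are only meaningful when that choice is held fixed.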

  16. COMPARING IMPUTATION APPROACHES (SVD + LOGR)
      [Figure: ROC curves (true positive rate vs. false positive rate) in three panels labeled 60, 120, and 180, comparing selection by POI, MAE, and RMSE with mean imputation]
      • POI outperforms RMSE, but mean and MAE are generally the best

  17. COMPARING IMPUTATION APPROACHES (SVD + LOGR)
      [Figure: selected k by selection criterion (POI, MAE, RMSE) in three panels labeled 60, 120, and 180]
      • RMSE favors the simplest model (k=1), MAE favors the most complex (k=25), and POI lies in between the two

  18. COMPARING IMPUTATION APPROACHES (FEATURE RANK)

      Feature        Mean    AUC    F1     RMSE
      Systolic BP    1.50    1.70   1.70   2.40
      SpO2           2.22    3.00   3.22   2.56
      Shock Index    4.40    4.40   4.60   3.30
      Diastolic BP   11.00   8.00   8.25   5.00

      Temp: 5.00, 5.00, 7.50 (only three of the four values survive in the source, so their column assignment is unclear)

      • Selection criteria influence feature ranking within the same imputation method

  19. CONCLUSION
      • Generalizes to all ICU patients
      • Focuses on commonly observed, non-invasive clinical measurements
      • Uses simple and accessible approaches for the missing data problem

  20. REFERENCES
      • Joyce C Ho, Cheng H Lee, and Joydeep Ghosh. Imputation-enhanced prediction of septic shock in ICU patients. In 2012 ACM SIGKDD Workshop on Health Informatics (HI-KDD), 2012.
      • Joyce C Ho, Cheng H Lee, and Joydeep Ghosh. Septic shock prediction for patients with missing data. ACM Transactions on Management Information Systems (TMIS), 5(1):1:1–1:15, 2014.
