Semi-supervised Prediction of Comorbid Rare Conditions
Chirag Nagpal (1), K Miller (1), T Pellathy (2), M Hravnak (2), G Clermont (2), M Pinsky (2), A Dubrawski (1)
(1) Auton Lab, Carnegie Mellon University
(2) University of Pittsburgh
chiragn@cs.cmu.edu
November 18, 2017
Overview
1 Background: Motivation, Prior Work
2 Dataset Description: Sources, Feature Extraction, Ground Truth
3 Approach: Baselines, PreCoRC: Prediction of Comorbid Rare Conditions
4 Results
5 Future Work
Motivation
• Rare conditions are potentially under-reported in EHRs.
• Prevent Failure-to-Rescue (FTR): the death of a hospitalised patient from a treatable condition.
• The ability to predict such conditions would enable proactive healthcare.
• Challenge: FTR conditions are under-reported, so the available data are too sparse for standard machine learning.
Motivation
• Leverage historical EHR data to build an early warning system that identifies patients at risk.
• Augment scarce ground truth to obtain operationally useful models.
• Keep the model interpretable by the end user, the medical practitioner.
Tree Featurization
• Tree Featurization [Singh et al., 2014]: explicitly leverage the ICD hierarchy in the feature representation.
• Example path up the ICD-9 hierarchy: 487 Influenza → 480-488 Pneumonia & Influenza → 460-519 Respiratory System
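The idea can be sketched as follows. Each observed code also switches on the features of all its ancestors, so patients with sibling codes (for example 486 and 487) share the higher-level features. This is a minimal sketch that assumes a hand-written child-to-parent map. `ICD9_PARENT`, `tree_featurize` and `to_binary_vector` are hypothetical names, and this is not necessarily how [Singh et al., 2014] encode the hierarchy.

```python
# Hypothetical fragment of the ICD-9 hierarchy (child -> parent), for illustration only.
ICD9_PARENT = {
    "487": "480-488",      # Influenza -> Pneumonia & Influenza
    "486": "480-488",      # Pneumonia, organism unspecified -> Pneumonia & Influenza
    "480-488": "460-519",  # -> Diseases of the Respiratory System
}


def tree_featurize(codes, parent=ICD9_PARENT):
    """Return the set of observed codes plus all of their ancestors."""
    features = set()
    for code in codes:
        node = code
        # Walk up the hierarchy until reaching a node with no recorded parent.
        while node is not None:
            features.add(node)
            node = parent.get(node)
    return features


def to_binary_vector(features, vocabulary):
    """One binary indicator per vocabulary entry (leaf codes and ancestor ranges)."""
    return [1 if v in features else 0 for v in vocabulary]


vocab = ["486", "487", "480-488", "460-519"]
print(to_binary_vector(tree_featurize(["487"]), vocab))  # [0, 1, 1, 1]
```

With this encoding, a patient coded only with 486 and one coded only with 487 still agree on the 480-488 and 460-519 features.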
OoD Embedding Learning
• Out-of-Domain Embedding Learning [Liu et al., 2016]: learn embeddings from external sources to obtain a dense representation of ICD codes.
• External sources: PubMed, PubMed Central, Open Access
• One-hot encoding → CBOW → dense encoding
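The one-hot → CBOW → dense encoding step can be made concrete with a toy continuous-bag-of-words trainer in NumPy. Each token is predicted from the mean of its context embeddings, and the rows of the input matrix become the dense codes. This is a minimal sketch that assumes a tiny, hypothetical tokenised `corpus` and a full softmax. Training on PubMed-scale text would typically use an optimised word2vec implementation with negative sampling or hierarchical softmax instead.

```python
import numpy as np


def train_cbow(corpus, dim=8, window=1, lr=0.1, epochs=200, seed=0):
    """Tiny full-softmax CBOW: predict each token from the mean of its context embeddings."""
    vocab = sorted({tok for sent in corpus for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # rows become the dense encodings
    w_out = rng.normal(scale=0.1, size=(dim, len(vocab)))  # projection back to the vocabulary
    for _ in range(epochs):
        for sent in corpus:
            for pos, target in enumerate(sent):
                lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
                ctx = [index[sent[j]] for j in range(lo, hi) if j != pos]
                if not ctx:
                    continue
                h = w_in[ctx].mean(axis=0)        # averaged context embedding
                scores = h @ w_out
                p = np.exp(scores - scores.max())
                p /= p.sum()                      # softmax over the vocabulary
                grad = p.copy()
                grad[index[target]] -= 1.0        # d(cross-entropy)/d(scores)
                grad_h = w_out @ grad             # computed before w_out is updated
                w_out -= lr * np.outer(h, grad)
                np.add.at(w_in, ctx, -lr * grad_h / len(ctx))
    return vocab, w_in


# Hypothetical "sentences" in which ICD-9 codes co-occur (stand-ins for literature text).
corpus = [["487", "486", "780.6"], ["486", "487", "786.2"], ["415.1", "453.8", "786.05"]]
vocab, embeddings = train_cbow(corpus)

# A one-hot vector times the embedding matrix is a row lookup: the dense encoding.
one_hot = np.zeros(len(vocab))
one_hot[vocab.index("487")] = 1.0
dense = one_hot @ embeddings
```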
Feature Extraction
Static Data
1 Age
2 Gender
3 Ethnicity
Admission Data
1 ICD-9 Codes
  • Diagnosis Codes
  • Procedure Codes
  • Admission Codes
2 Diagnosis Related Groups
Aggregated Records
1 $X_{T_n} = \{1, 0, \dots, 0, 1\}$
2 $X'_{T_n} = \sum_{k=1}^{n} X_{T_k}$
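The aggregation on this slide can be sketched as a running sum of per-record binary indicator vectors. The row for record $T_n$ then counts every code seen up to and including that record. This is a minimal sketch that assumes each record is a list of code strings and that the vocabulary is fixed. `record_vectors` and `aggregate_records` are hypothetical helper names.

```python
import numpy as np


def record_vectors(records, vocabulary):
    """X_{T_k}: one binary indicator vector per record, rows in time order."""
    index = {code: i for i, code in enumerate(vocabulary)}
    X = np.zeros((len(records), len(vocabulary)), dtype=int)
    for k, codes in enumerate(records):
        for code in codes:
            if code in index:  # codes outside the vocabulary are ignored
                X[k, index[code]] = 1
    return X


def aggregate_records(X):
    """X'_{T_n} = sum_{k=1}^{n} X_{T_k}: running sum over all records up to T_n."""
    return np.cumsum(X, axis=0)


vocab = ["486", "487", "96.04"]
X = record_vectors([["487"], ["486", "487"], ["96.04"]], vocab)
print(aggregate_records(X))
# [[0 1 0]
#  [1 2 0]
#  [1 2 1]]
```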
Clinical Tasks
• Intubation & Mechanical Ventilation (Task-IMV): a treatment scenario occurring in the context of Failure-to-Rescue (FTR) cases.
  ICD Codes: 96.04, 96.71, 96.72, 518.81
• Venous Thromboembolism (Task-VTE): includes patients diagnosed with either pulmonary embolism or deep vein thrombosis, an under-reported, life-threatening condition.
  ICD Codes: 415.1, 451.11, 451.2, 451.81, 453.8
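Ground-truth labels for the two tasks can be derived by checking a record's codes against the lists above. This is a minimal sketch that assumes codes are stored as dotted strings. It also assumes that a more specific code (for example 415.19) should count as a match for its listed parent (415.1), which is implemented here as prefix matching. `TASK_CODES` and `is_positive` are hypothetical names.

```python
# ICD-9 codes listed on the slide for each task.
TASK_CODES = {
    "IMV": ("96.04", "96.71", "96.72", "518.81"),
    "VTE": ("415.1", "451.11", "451.2", "451.81", "453.8"),
}


def is_positive(record_codes, task):
    """True if any code in the record matches one of the task's ICD-9 codes.

    Assumption: codes are dotted strings, and a more specific code (e.g. "415.19")
    matches its listed parent ("415.1") via prefix matching.
    """
    targets = TASK_CODES[task]
    return any(code.startswith(t) for code in record_codes for t in targets)


print(is_positive(["486", "96.71"], "IMV"))  # True
print(is_positive(["415.19"], "VTE"))        # True
print(is_positive(["486"], "VTE"))           # False
```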
Clinical Tasks
Intubation & Mechanical Ventilation (Task-IMV): 1266 positives (≈ 1.173%)
• Task-IMV-10: uses labels for 10% of the data
• Task-IMV-90: uses labels for 90% of the data
Venous Thromboembolism (Task-VTE): 56 positives (≈ 0.0519%)
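One way to realise the 10% and 90% settings is to reveal labels for a random fraction of samples and treat the rest as unlabelled. This is a minimal sketch that assumes stratified sampling, so the rare positives keep their prevalence in the labelled subset. The slides do not say how the split was drawn. `labelled_mask` is a hypothetical helper, and the cohort below is invented for illustration.

```python
import numpy as np


def labelled_mask(y, fraction, seed=0):
    """Return a boolean mask that reveals labels for `fraction` of each class."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(y), dtype=bool)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)          # samples of this class
        k = int(round(fraction * len(idx)))     # how many of them get a visible label
        mask[rng.choice(idx, size=k, replace=False)] = True
    return mask


# Hypothetical cohort: 1000 samples, 20 positives (2% prevalence).
y = np.zeros(1000, dtype=int)
y[:20] = 1
mask_10 = labelled_mask(y, 0.10)  # a "-10" style setting
mask_90 = labelled_mask(y, 0.90)  # a "-90" style setting
```

Stratifying matters most when positives are very scarce: with an unstratified 10% draw, a task with only a few dozen positives could end up with almost none labelled.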
Baselines
• Logistic Regression with ℓ2 penalty (LR)
• Random Forest ensemble (RF)
• Principal Component Analysis followed by LR or RF (PCA-LR, PCA-RF)
• Non-Negative Matrix Factorisation followed by LR or RF (NMF-LR, NMF-RF)
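The six baselines map naturally onto scikit-learn estimators and pipelines. This is a minimal sketch that assumes scikit-learn is available. The number of components, `C` and the forest size are placeholder values, not the settings used in this work. `make_baselines` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def make_baselines(n_components=50, seed=0):
    """The six baselines as scikit-learn estimators (hyperparameters are placeholders)."""
    def lr():
        # LogisticRegression uses an L2 penalty by default; C is the inverse strength.
        return LogisticRegression(C=1.0, max_iter=1000)

    def rf():
        return RandomForestClassifier(n_estimators=100, random_state=seed)

    def pca():
        return PCA(n_components=n_components, random_state=seed)

    def nmf():
        # NMF needs non-negative inputs, which binary/count features satisfy.
        return NMF(n_components=n_components, init="nndsvda", max_iter=500, random_state=seed)

    return {
        "LR": lr(),
        "RF": rf(),
        "PCA-LR": make_pipeline(pca(), lr()),
        "PCA-RF": make_pipeline(pca(), rf()),
        "NMF-LR": make_pipeline(nmf(), lr()),
        "NMF-RF": make_pipeline(nmf(), rf()),
    }


# Usage on hypothetical binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 20)).astype(float)
y = np.arange(60) % 2
scores = {name: m.fit(X, y).predict_proba(X)[:, 1] for name, m in make_baselines(5).items()}
```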
PreCoRC Pipeline
[Figure: PreCoRC pipeline. Components: Historical Data, Test Data, ICD-9 Hierarchy, Graph Structure (P-Edges, T-Edges, I-Edges, O-Edges), Binary Classifier, Prediction Score (Prior), Label Re-Distribution, Final Prediction.]
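In the diagram, the binary classifier's prediction score serves as a prior for the graph step. This is a minimal sketch of that hand-off, assuming logistic regression as the classifier; the slides do not specify which classifier is used here. `classifier_prior` is a hypothetical helper. Its scores for unlabelled rows would play the role of the $\pi_i$ in the Soft Label HEM objective on the Label Propagation slide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def classifier_prior(X, y, labelled):
    """Fit a binary classifier on labelled rows; return its score for every row as a prior."""
    clf = LogisticRegression(C=1.0, max_iter=1000)
    clf.fit(X[labelled], y[labelled])
    return clf.predict_proba(X)[:, 1]  # probability of the positive class


# Usage on hypothetical data: the first 20 rows are labelled, the rest are not.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 6)).astype(float)
y = np.arange(40) % 2
labelled = np.arange(40) < 20
prior = classifier_prior(X, y, labelled)
```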
Graph Construction
[Figure: Graph construction. Node types: Patients (Patient A, Patient B), Records (Record 1 ... Record n), ICD-9 Ontology (487 Influenza; 480-488 Pneumonia & Influenza; 460-519 Respiratory Diseases). Edge types: P-Edges, T-Edges, I-Edges, O-Edges.]
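This is a minimal sketch of building such a graph as a symmetric adjacency matrix. The edge semantics are our reading of the figure and are assumptions:
- P-edges link a patient to each of their records.
- T-edges link consecutive records of the same patient.
- I-edges link a record to each ICD-9 code it contains.
- O-edges link a code to its parent in the ontology.

All edges get weight 1. `build_graph` and its input formats are hypothetical.

```python
import numpy as np


def build_graph(patients, parent):
    """Build a symmetric 0/1 adjacency matrix over patient, record and ICD-9 nodes.

    patients: {patient_id: [codes_of_record_1, ..., codes_of_record_n]} in time order.
    parent:   child -> parent map for the ICD-9 ontology (hypothetical input).
    """
    nodes = {}     # node key -> row/column index
    edges = set()  # undirected edges stored as (i, j) with i < j

    def node_id(key):
        return nodes.setdefault(key, len(nodes))

    def link(a, b):
        i, j = node_id(a), node_id(b)
        if i != j:
            edges.add((min(i, j), max(i, j)))

    for pid, records in patients.items():
        previous = None
        for k, codes in enumerate(records):
            record = ("record", pid, k)
            link(("patient", pid), record)           # assumed P-edge: patient-record
            if previous is not None:
                link(previous, record)               # assumed T-edge: consecutive records
            previous = record
            for code in codes:
                link(record, ("icd", code))          # assumed I-edge: record-ICD code
                node = code
                while parent.get(node) is not None:  # assumed O-edges: up the ontology
                    link(("icd", node), ("icd", parent[node]))
                    node = parent[node]

    A = np.zeros((len(nodes), len(nodes)))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return nodes, A


parent = {"487": "480-488", "486": "480-488", "480-488": "460-519"}
patients = {"A": [["487"], ["487", "486"]], "B": [["486"]]}
nodes, A = build_graph(patients, parent)
```

In this toy example, patients A and B are never linked directly. They become connected through record nodes that share the ICD-9 node 486, which is what lets labels flow between similar patients.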
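Label Propagation
Harmonic Energy Minimization [Zhu et al., 2003]
$E(f) = \sum_{i \in L} (y_i - f_i)^2 D_{ii} + \lambda \sum_{i,j} (f_i - f_j)^2 A_{ij}$
Soft Label HEM [Wang et al., 2013]
$E(f) = w_0 \sum_{i \in L} (y_i - f_i)^2 D_{ii} + \lambda \sum_{i \in U} (f_i - \pi_i)^2 D_{ii} + \sum_{i,j} (f_i - f_j)^2 A_{ij}$
where $L$ and $U$ are the labelled and unlabelled nodes, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix and $\pi_i$ is the prior score of node $i$.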
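Because the Soft Label HEM energy is quadratic in $f$, its minimiser solves a linear system. The derivation below rests on four assumptions:
- $A$ is symmetric with no isolated nodes.
- $D = \mathrm{diag}(\sum_j A_{ij})$.
- The pairwise sum runs over all ordered pairs, so it equals $2 f^\top (D - A) f$.
- $C$ is diagonal with $C_{ii} = w_0 D_{ii}$ and $t_i = y_i$ for $i \in L$, and $C_{ii} = \lambda D_{ii}$ and $t_i = \pi_i$ for $i \in U$.

Under these assumptions, setting $\nabla E = 0$ gives $(C + 2(D - A)) f = C t$. This is a minimal sketch using a dense direct solve; a graph of realistic size would call for sparse or iterative solvers. `soft_label_hem` and `energy` are hypothetical names.

```python
import numpy as np


def soft_label_hem(A, y, labelled, prior, w0=1.0, lam=1.0):
    """Minimise the Soft Label HEM energy by solving (C + 2(D - A)) f = C t."""
    A = np.asarray(A, dtype=float)
    labelled = np.asarray(labelled, dtype=bool)
    d = A.sum(axis=1)                        # degrees D_ii (assumed > 0 for every node)
    laplacian = np.diag(d) - A               # D - A
    c = np.where(labelled, w0 * d, lam * d)  # per-node weight of the fitting/prior term
    t = np.where(labelled, y, prior)         # y_i on labelled nodes, pi_i on unlabelled ones
    return np.linalg.solve(np.diag(c) + 2.0 * laplacian, c * t)


def energy(f, A, y, labelled, prior, w0=1.0, lam=1.0):
    """E(f) as on the slide, with the pairwise sum over all ordered pairs (i, j)."""
    A = np.asarray(A, dtype=float)
    labelled = np.asarray(labelled, dtype=bool)
    d = A.sum(axis=1)
    fit = w0 * np.sum(((y - f) ** 2 * d)[labelled])
    pri = lam * np.sum(((f - prior) ** 2 * d)[~labelled])
    smooth = np.sum(A * (f[:, None] - f[None, :]) ** 2)
    return fit + pri + smooth


# Toy chain 0-1-2-3: node 0 labelled positive, node 3 labelled negative,
# nodes 1 and 2 unlabelled with a neutral prior of 0.5.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
y = np.array([1.0, 0.0, 0.0, 0.0])        # entries for unlabelled nodes are ignored
labelled = np.array([True, False, False, True])
prior = np.array([0.0, 0.5, 0.5, 0.0])    # entries for labelled nodes are ignored
f = soft_label_hem(A, y, labelled, prior)  # f ≈ [0.70, 0.55, 0.45, 0.30]
```

Setting `lam=0` drops the prior term. The remaining objective is the harmonic energy above, with its smoothness weight rescaled to $1/w_0$. Without a prior, though, the solve needs at least one labelled node in every connected component.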