Predicting ED Attendance from GP Records, Jon Patrick, CEO (PowerPoint presentation)



  1. Predicting ED Attendance from GP Records • Jon Patrick, CEO

  2. Statement of Interests
     • Project was funded by HCF Foundation to Outcome Health
     • HLA was contracted by Outcome Health to perform the predictive modeling and NLP work

  3. The beginnings
     • HLA was contracted to provide the SNOMED CT coding of 56,00 GP Reason for Visit notes.
     • A Common Usage Classification System was created by HLA to better represent the classes of information important to the client.
     • This was based on a “dismembering and reassembly” of the SCT hierarchy to fit the client's needs.

  4. Outcome Health Records
     • 35,416 GP visits for 20,971 unique patients with subsequent ED records extracted from the VEMD
     • Data attributes as part of the Patient Record:
       – Care Plan Goal (4% of records have a goal), with major classes:
         – Improve general health 1424
         – Prevent influenza 1348
         – Prevent complications 1166
         – Maintain function 1147
         – Manage pain 1125
         – Reduce rate of progression of disease 1058
         – Improve knowledge of condition 1035
         – Maintain mobility 979

  5. More on Records
     • Data attributes from the Clinical tables include:
       – Smoking status (12% smoker, 23% ex-smoker)
       – Alcohol status (9% drinker, 5% non-drinker)
       – Allergy status (12% with a known allergy)

  6. Descriptive Statistics
     • About 30% (6,478) of patients have reported diagnoses in the List of Diagnoses. How important this attribute is relative to the Reason_for_Visit attribute was an open question.
     • About 24% of visits carry a diagnosis description, drawn from 4,500 unique descriptions.
     • The most frequent diagnoses by visit are:
       – Hypertension 197
       – URTI 174
       – Asthma 174
       – Depression 146
       – Bronchitis 135
       – Tonsillitis 115
       – UTI 114
       – Otitis media 97
       – Gastroenteritis 95
       – Review 87
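The two statistics on this slide, the share of patients with any reported diagnosis and the most frequent diagnoses counted by visit, can be computed in one pass. A minimal sketch, assuming each visit is available as a hypothetical (patient_id, diagnosis_description) pair with an empty string when no diagnosis was recorded; the toy data is illustrative only and not from the study.

```python
from collections import Counter
from typing import List, Tuple


def diagnosis_summary(
    visits: List[Tuple[str, str]], n_patients: int, top: int = 10
) -> Tuple[float, List[Tuple[str, int]]]:
    """Summarise diagnosis coverage.

    visits: (patient_id, diagnosis_description) pairs, with "" when no
    diagnosis was recorded at that visit (an assumed layout).
    Returns the share of all patients with at least one diagnosis and the
    `top` most frequent diagnosis descriptions counted by visit.
    """
    patients_with_dx = {pid for pid, dx in visits if dx}
    by_visit = Counter(dx for _, dx in visits if dx)
    return len(patients_with_dx) / n_patients, by_visit.most_common(top)


# Toy data for illustration only (not the study's records).
toy_visits = [("p1", "Asthma"), ("p1", "URTI"), ("p2", ""), ("p3", "Asthma")]
share, most_frequent = diagnosis_summary(toy_visits, n_patients=3, top=2)
print(f"{share:.0%}", most_frequent)
```

Counting by visit rather than by patient means a patient with repeated asthma visits contributes several times to the "Asthma" count, which matches the slide's "by visit" framing.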

  7. Diagnosis records

  8. Descriptive Statistics
     • 86% of visits have some value for Diagnosis-Status-at-Visit.
     • The list of diseases: COPD, BoneJointDisease, Diabetes, Cancer, CHD, Asthma, Gastroenteritis, Stroke, Influenza, Hypertension, Anxiety, Depression, Hepatitis
     • Frequencies vary from 3% to 12%.

  9. Attributes for the Model – Pt 1
     • There are 14 attribute groups making up 27 attributes. Two attributes, Reaction Types and Pathology Result Types, have a total of 2,341 attributes, making a total of 2,368 attributes.
     • The range of values for each attribute is listed below. Many attribute values are left empty or have content equivalent to “unknown”.
     • BP recorded (6 values)
     • Care goal (925 values)
     • Clinical fields
       – clinical-smoke info (5 values)
       – clinical-alcohol info (4 values)
       – clinical-allergy info (4 values)
     • Diagnosis Details
       – diagnosis-name (4,099 values)
       – diagnosis-SCT category (29 values)
     • Immunisation (403 values)
     • MBS (69 values)
     • Reaction types and values (1,661 types, 4 values)
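The jump from 27 attributes to 2,368 comes from the two "typed" attributes, where each reaction type or pathology result type becomes its own attribute. A minimal sketch of that expansion, assuming one column per distinct type and a record layout of (type, value) pairs; the function name `expand_typed_attribute` and the `prefix:type` column naming are hypothetical. The 680 pathology result types are only the remainder implied by the slide's totals, not a figure stated in the presentation.

```python
from typing import Dict, List, Tuple


def expand_typed_attribute(
    records: List[List[Tuple[str, str]]], prefix: str
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Give every distinct type its own column, e.g. one column per reaction type.

    records: for each patient record, a list of (type, value) pairs (assumed layout).
    Returns the sorted column names and one {column: value} dict per record;
    types a record lacks are left out, i.e. treated as empty.
    """
    all_types = sorted({t for rec in records for t, _ in rec})
    columns = [f"{prefix}:{t}" for t in all_types]
    rows = [{f"{prefix}:{t}": v for t, v in rec} for rec in records]
    return columns, rows


# Attribute arithmetic from the slide: 27 base attributes plus the 2,341
# columns produced by the two typed attributes.
reaction_types = 1661
pathology_result_types = 2341 - reaction_types  # implied remainder, assuming one column per type
print(27 + reaction_types + pathology_result_types)  # 2368

cols, rows = expand_typed_attribute(
    [[("rash", "severe")], [("rash", "mild"), ("nausea", "mild")]], prefix="reaction"
)
print(cols)  # ['reaction:nausea', 'reaction:rash']
```

Because most patients have only a handful of reaction or pathology types, the resulting rows are very sparse, which is consistent with the many empty values noted on the slide.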

  10. Attributes for the Model – Pt 2
     • Script Details
       – script-generic name (982 values)
       – script-drug name (2,340 values)
       – script-product name (2,058 values)
       – script-frequency (21 values)
       – script-repeat (4 values)
       – script-substitutions (2 values)
       – script-reason (956 values)
       – script-medication id (918 values)
     • Tobacco Usage
       – tobacco-risk factor (4 values)
       – tobacco-quit status (2 values)
     • GP Visit Details
       – GP visit-duration (5 values)
       – GP visit-age (6 values)
       – GP visit-type (8 values)
     • Gender (4 values)

  11. Baseline Predictive Model
     • The results also show that 24% of the patients who are admitted to ED within 30 days of a GP visit are not correctly identified by this model; about 77% of these are classified into the over_365_day class.
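Both figures on this slide (the 24% of 30_day patients the model misses, and the 77% of those misses that land in over_365_day) come from reading one row of a confusion matrix. A minimal sketch, assuming a matrix laid out as `cm[true][predicted]` over the five classes used in the later charts; the toy matrix is invented for illustration and does not reproduce the study's results.

```python
from typing import Dict, List

CLASSES = ["30_day", "90_day", "180_day", "365_day", "over_365_day"]


def miss_breakdown(cm: List[List[int]], labels: List[str], target: str) -> Dict[str, float]:
    """From a confusion matrix cm[true][predicted], return the miss rate for
    `target` and the share of its misses assigned to each other class."""
    i = labels.index(target)
    row = cm[i]
    total = sum(row)
    missed = total - row[i]
    result = {"miss_rate": missed / total if total else 0.0}
    for j, label in enumerate(labels):
        if j != i:
            result[label] = row[j] / missed if missed else 0.0
    return result


# Toy confusion matrix (rows = true class, columns = predicted class); illustrative numbers only.
toy_cm = [
    [76, 3, 2, 1, 18],
    [10, 20, 5, 5, 10],
    [5, 5, 20, 10, 10],
    [5, 5, 10, 20, 10],
    [10, 2, 3, 5, 80],
]
report = miss_breakdown(toy_cm, CLASSES, "30_day")
print(report["miss_rate"], report["over_365_day"])  # 0.24 0.75
```

The miss rate is one minus the class recall; the per-class shares show whether errors are spread out or, as the slide reports, concentrated in the opposite end class.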

  12. 2-class Model

  13. Feature Sets
     • Sparse features
     • No useful feature set can be extracted from over 13% of patient records (group 1), and only one useful feature set can be extracted from over 17% of patient records.
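The sparsity figures above amount to counting, for each record, how many feature groups hold any usable value. A minimal sketch, assuming records are dicts of field name to string and that "useful" means present and not a placeholder such as "unknown"; the placeholder set and the grouping of fields are hypothetical choices, not the presentation's definitions.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumed placeholders meaning "no information"; the slides only say many
# values are empty or equivalent to "unknown".
EMPTY_VALUES = {"", "unknown", "not recorded"}


def is_useful(value: Optional[str]) -> bool:
    return value is not None and value.strip().lower() not in EMPTY_VALUES


def useful_group_count(record: Dict[str, str], groups: Dict[str, List[str]]) -> int:
    """Number of feature groups with at least one field holding a real value."""
    return sum(
        any(is_useful(record.get(f)) for f in fields) for fields in groups.values()
    )


def sparsity_profile(records: List[Dict[str, str]], groups: Dict[str, List[str]]) -> Dict[int, float]:
    """Share of records by number of useful feature groups (0, 1, 2, ...)."""
    counts = Counter(useful_group_count(r, groups) for r in records)
    return {k: counts[k] / len(records) for k in sorted(counts)}


# Hypothetical grouping of a few field names from the attribute slides.
groups = {
    "smoking": ["clinical-smoke"],
    "alcohol": ["clinical-alcohol"],
    "diagnosis": ["diagnosis-name", "diagnosis-category"],
}
records = [
    {"clinical-smoke": "Unknown"},
    {"clinical-smoke": "smoker"},
    {"diagnosis-name": "Asthma", "clinical-alcohol": "drinker"},
    {},
]
print(sparsity_profile(records, groups))  # {0: 0.5, 1: 0.25, 2: 0.25}
```

The share at key 0 corresponds to the slide's "group 1" (no useful feature set), and the share at key 1 to records with only one useful feature set.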

  14. All Class Gender Distribution [Chart: TP, FP and FN counts (0 to 8,000) for each class (30_day, 90_day, 180_day, 365_day, over_365_day), broken down by gender (U, M, F).]

  15. Age Distribution for TP [Chart: number of true positives (0 to 400) by patient age (0 to 104) for the 30_day, 90_day, 180_day, 365_day and over_365_day classes.]

  16. Age Distribution for FP [Chart: number of false positives (0 to 400) by patient age (0 to 104) for the 30_day, 90_day, 180_day, 365_day and over_365_day classes.]

  17. Interim Analysis
     • Since performance on the three middle classes (90_day, 180_day, 365_day) is much poorer than on the two end classes (30_day, over_365_day), we need to consider reducing the number of classes.
     • The sparsity of valid data implies that we may need to extract data from previous GP visits as well as from the most recent visit.
     • The gender and age distributions of the 30_day class differ significantly from those of the other classes, but they do not distinguish the other classes from one another.
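Reducing the number of classes, as the first point suggests, is a relabelling step applied before retraining. A minimal sketch, assuming a hypothetical 3-class scheme that keeps the two well-separated end classes and pools the three middle ones; the merged label `31_to_365_day` is invented here and is not one of the configurations reported in the presentation.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical 3-class scheme: keep the two end classes and pool the three
# poorly separated middle classes into one label.
MERGE_MIDDLE: Dict[str, str] = {
    "30_day": "30_day",
    "90_day": "31_to_365_day",
    "180_day": "31_to_365_day",
    "365_day": "31_to_365_day",
    "over_365_day": "over_365_day",
}


def relabel(labels: List[str], mapping: Dict[str, str]) -> List[str]:
    """Map each original class label to its merged label (unknown labels raise KeyError)."""
    return [mapping[y] for y in labels]


y = ["30_day", "90_day", "180_day", "over_365_day", "365_day"]
print(Counter(relabel(y, MERGE_MIDDLE)))
```

Raising on unknown labels, rather than silently passing them through, makes a typo in a class name fail loudly instead of creating an extra class.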

  18. Virtual Visit
     • Compilation of the visits over the past 2 years
     • Selection of admission criteria for many attributes
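A virtual visit compresses a patient's recent history into one record while keeping the most recent visit separate. A minimal sketch, assuming "past 2 years" means 730 days before the latest visit and admitting every non-empty value into the history; the slides do not spell out the per-attribute admission criteria, so the `Visit`/`VirtualVisit` types and the admission rule here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Set


@dataclass
class Visit:
    when: date
    attrs: Dict[str, str]


@dataclass
class VirtualVisit:
    current: Dict[str, str]  # the most recent visit, kept as-is
    history: Dict[str, Set[str]] = field(default_factory=dict)  # compressed earlier visits


def build_virtual_visit(visits: List[Visit], window_days: int = 730) -> VirtualVisit:
    """Compress earlier visits inside the window into per-attribute value sets.

    Assumptions: the window is measured back from the latest visit, and every
    non-empty value is admitted (the real admission criteria are not given).
    """
    if not visits:
        raise ValueError("patient has no visits")
    ordered = sorted(visits, key=lambda v: v.when)
    latest = ordered[-1]
    cutoff = latest.when - timedelta(days=window_days)
    vv = VirtualVisit(current=dict(latest.attrs))
    for v in ordered[:-1]:
        if v.when < cutoff:
            continue  # outside the 2-year window
        for name, value in v.attrs.items():
            if value:
                vv.history.setdefault(name, set()).add(value)
    return vv


visits = [
    Visit(date(2015, 1, 1), {"diagnosis-name": "Asthma"}),
    Visit(date(2016, 6, 1), {"diagnosis-name": "URTI", "clinical-smoke": ""}),
    Visit(date(2017, 3, 1), {"diagnosis-name": "Bronchitis"}),
]
vv = build_virtual_visit(visits)
print(vv.current, vv.history)
```

Storing history as value sets discards visit counts and ordering; that is one possible compression, and a count-based history would be an equally plausible reading of the slide.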

  19. Model Reframing – Add non-ED records
     • Extract 380,000 GP records without a subsequent hospital visit.
     • Extract the patient intrinsic attributes and visit-assigned attributes for those 380,000 GP records.
     • Convert the extracted attributes into learning features.
     • Draw a matching sample to the ED cases.
     • Rebuild the predictive models.
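The "matching sample" step can be sketched as stratified matching: group the non-ED records into strata and draw controls per stratum in proportion to the ED cases. The slide does not say which attributes were matched on, so the 10-year age band plus gender key below, and the names `matched_sample` and `age_band_and_gender`, are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Record = Dict[str, object]


def matched_sample(
    cases: List[Record],
    controls: List[Record],
    key: Callable[[Record], Hashable],
    ratio: int = 1,
    seed: int = 0,
) -> List[Record]:
    """Stratified matching: for each stratum key(case), draw `ratio` controls per
    case without replacement. Strata with too few controls are filled as far as possible."""
    rng = random.Random(seed)
    pool: Dict[Hashable, List[Record]] = defaultdict(list)
    for rec in controls:
        pool[key(rec)].append(rec)
    needed: Dict[Hashable, int] = defaultdict(int)
    for rec in cases:
        needed[key(rec)] += ratio
    sample: List[Record] = []
    for stratum, n in needed.items():
        candidates = pool.get(stratum, [])
        sample.extend(rng.sample(candidates, min(n, len(candidates))))
    return sample


def age_band_and_gender(rec: Record) -> Hashable:
    """Hypothetical matching key: 10-year age band plus gender."""
    return (int(rec["age"]) // 10, rec["gender"])


cases = [{"age": 34, "gender": "F"}, {"age": 36, "gender": "F"}, {"age": 71, "gender": "M"}]
controls = [
    {"age": 30, "gender": "F"}, {"age": 31, "gender": "F"}, {"age": 39, "gender": "F"},
    {"age": 75, "gender": "M"}, {"age": 20, "gender": "M"},
]
print(matched_sample(cases, controls, age_band_and_gender))
```

A fixed seed keeps the drawn sample reproducible between model rebuilds; under-filled strata are reported implicitly by a shorter sample rather than by sampling with replacement.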

  20. 10 Essential Attributes
     – current diagnosis-name (current visit and active only)
     – current diagnosis-sct category (current visit and active only)
     – historical diagnosis-name (up to 10 years)
     – historical diagnosis-sct category (up to 10 years)
     – pt-age
     – pt-type
     – pt-gender
     – pt-atsi
     – pt-pension
     – pt-dva

  21. 41 Optional Attributes
     • immunisation (current visit)
     • historical immunisation (up to 5 years)
     • mbs
     • reaction (current visit)
     • historical reaction (up to 5 years)
     • current pathology test-test name (current visit)
     • current pathology test-radiology test (current visit)
     • current pathology result (current visit)
     • historical pathology test-test name (within 12 months)
     • historical pathology test-radiology test (within 12 months)
     • historical pathology result (within 12 months)
     • current scrip-generic name (within 8 months)
     • current scrip-drug name (within 8 months)
     • current scrip-product name (within 8 months)
     • current scrip-frequency (within 8 months)
     • current scrip-repeat (within 8 months)
     • current scrip-substitutions (within 8 months)
     • current scrip-reason (within 8 months)
     • current scrip-medication id (within 8 months)
     • current scrip-drug-class (within 8 months)
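The essential and optional attribute lists each carry a lookback window (10 years, 5 years, 12 months, 8 months). A minimal sketch of applying those windows, assuming a window is counted in calendar months back from the visit date and is inclusive at both ends; the dictionary below copies a few windows from the slides, while `months_before` and `in_window` are hypothetical helpers.

```python
import calendar
from datetime import date
from typing import Dict

# Lookback windows in months, copied from the attribute lists on the slides.
LOOKBACK_MONTHS: Dict[str, int] = {
    "historical diagnosis-name": 120,          # up to 10 years
    "historical diagnosis-sct category": 120,  # up to 10 years
    "historical immunisation": 60,             # up to 5 years
    "historical reaction": 60,                 # up to 5 years
    "historical pathology result": 12,         # within 12 months
    "current scrip-generic name": 8,           # within 8 months
}


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before d, clamping the day (e.g. 31 Mar -> 28 Feb)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def in_window(attribute: str, event_date: date, visit_date: date) -> bool:
    """True if the event lies within the attribute's lookback window and not after the visit."""
    cutoff = months_before(visit_date, LOOKBACK_MONTHS[attribute])
    return cutoff <= event_date <= visit_date


visit = date(2017, 3, 1)
print(in_window("current scrip-generic name", date(2016, 8, 1), visit))   # True
print(in_window("current scrip-generic name", date(2016, 6, 30), visit))  # False
```

Counting in calendar months rather than fixed day counts keeps "within 8 months" aligned with how a clinician would read the window; a day-based cutoff would be a simpler alternative.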

  22. Revised 90-day class model • 6 Class Model, F = 73.90
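The slide reports a single F value for the 6-class model without saying how it was averaged across classes. A minimal sketch of one common choice, macro-averaged F1 expressed as a percentage, computed from a confusion matrix laid out as `cm[true][predicted]`; whether the presentation used macro, micro or weighted averaging is not stated, so treat this as an assumption.

```python
from typing import List


def macro_f1(cm: List[List[int]]) -> float:
    """Macro-averaged F1 (as a percentage) from a square confusion matrix cm[true][pred].

    Per class, F1 = 2*TP / (2*TP + FP + FN); classes are weighted equally.
    """
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # predicted i but truly another class
        fn = sum(cm[i]) - tp                        # truly i but predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return 100 * sum(scores) / k


# Toy 2-class matrix for illustration only.
print(round(macro_f1([[8, 2], [1, 9]]), 2))
```

Macro averaging gives small classes such as 30_day the same weight as large ones, which matters when class sizes are as uneven as in this data.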

  23. Adopted Model

  24. Concluding Points
     • Built a coherent representation of the patient records suited to computing a predictive model;
     • Tested a variety of combinations of attributes for the best results;
     • Converted the many attributes available into domain ranges that were relevant to the task;
     • Tested many class configurations around 30-day, 90-day, 180-day, 365-day and post-1-year attendances;
     • Devised representations of the various time lapses between the GP visits of patients;
     • Separated out the non-injury cases for analysis;
     • Designed a “virtual visit record” from the historical records, which compressed the historical data yet kept it separate from the data of the most recent visit.
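One way to represent the time lapses between a patient's GP visits, combined with converting raw values into task-relevant ranges, is to compute the gaps between consecutive visits and bin them. A minimal sketch, assuming hypothetical bin edges that echo the class boundaries (30, 90, 180, 365 days) and two summary features, the most recent gap and the mean gap; the presentation does not specify its actual representation.

```python
from datetime import date
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical bins echoing the class boundaries used in the models.
GAP_BINS = [(30, "<=30d"), (90, "31-90d"), (180, "91-180d"), (365, "181-365d")]


def bin_gap(days: int) -> str:
    """Map a gap in days to a range label."""
    for upper, label in GAP_BINS:
        if days <= upper:
            return label
    return ">365d"


def lapse_features(visit_dates: List[date]) -> Dict[str, Optional[object]]:
    """Visit count plus binned most-recent and mean gaps between consecutive visits."""
    ds = sorted(visit_dates)
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    if not gaps:
        return {"n_visits": len(ds), "last_gap": None, "mean_gap": None}
    return {
        "n_visits": len(ds),
        "last_gap": bin_gap(gaps[-1]),
        "mean_gap": bin_gap(round(mean(gaps))),
    }


print(lapse_features([date(2016, 1, 1), date(2016, 1, 21), date(2016, 4, 20)]))
```

Binning makes the gap features categorical, matching the slides' approach of reducing attributes to a small number of domain values rather than feeding raw day counts to the model.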

  25. Feature Importance for Under 6 [Chart: feature importance (0 to 0.15) for patients aged under 6, ranked across the model features from alcohol-days to tobacco-status.]
