Mining Administrative and Clinical Diabetes Data with Temporal Association Rules
Stefano Concaro, Lucia Sacchi, Carlo Cerra, Riccardo Bellazzi
MIE 2009, Sarajevo, August 31st 2009
Summary
• DataWarehouse: Healthcare Agency (ASL) of Pavia
  - Administrative healthcare data
  - Clinical data
• Methods
  - Representation of temporal sequences
  - Integration of data sources
  - Temporal Association Rules (TARs) mining
  - Management of temporal heterogeneity in the data
• Application: Diabetes Mellitus
• Conclusions
DataWarehouse: Local Healthcare Agency (ASL) of Pavia (1)
Administrative healthcare data (since 2002): "process data" collected for reimbursement purposes
Sources feeding the DW: Hospital Admissions, Ambulatory Visits/Lab Tests, Drug Prescriptions, …
Pavia area:
• 530,000 people
• 170,000 admissions/year
• 4,500,000 drug prescriptions/year
• 9,000,000 visits-tests/year
Hospital admissions record: CodPat, CodHosp, AdmDate, DischDate, Diagn(1-6), Proc(1-6), Refund
  e.g. xxx, yyy, 16/06/2008, 24/06/2008, (428.1, 410), 41.00, 2300 €
Drug prescriptions record: CodPat, CodPharm, PurchDate, CodDrug, ATC code, Quantity, Cost
  e.g. xxx, yyy, 22/07/2008, 72364, C01AA02, 2, 60 €
Ambulatory visits/lab tests record: CodPat, CodAmb, ContDate, CodTest, Refund
  e.g. xxx, yyy, 29/03/2008, 347.1, 50 €
Progressive increase of DW dimension due to new data introduction and historical data maintenance
DataWarehouse: Local Healthcare Agency (ASL) of Pavia (2)
Clinical healthcare data (since 2007): outcomes of medical inspections and clinical tests
Sources feeding the DW: Diabetes Mellitus, Cardio-Cerebro-Vascular Disease, Essential Hypertension, …
Diabetes Mellitus:
• Jan 2007 – Oct 2008
• 1,300 diabetic patients
• 5,000 inspections
Variables (Range; IQ Range; Unit):
1. Body Mass Index (BMI): [10-80]; [25.15-31.28]; kg/m²
2. Systolic Blood Pressure (SBP): [60-240]; [130-150]; mmHg
3. Diastolic Blood Pressure (DBP): [30-150]; [75-85]; mmHg
4. Glycaemia: [50-500]; [112-162]; mg/dl
5. Glycated Haemoglobin (HbA1c): [3-20]; [6.3-7.9]; %
6. Total Cholesterol: [80-500]; [175-232]; mg/dl
7. HDL Cholesterol: [10-120]; [43-62]; mg/dl
8. Triglycerides: [10-2000]; [91-177]; mg/dl
9. Cardio-Vascular Risk (CVR): [0-100]; [8.57-30.33]; %
10. Anti-Hypertensive Therapy: {Yes; No}; -; -
11. Care Intervention: {Diet; Health training; None}; -; -
Temporal Representation
Primary role of the temporal dimension
[Figure: timeline for Patient j (time in days, t1 to t7) combining events from four sources. Drug prescriptions: ACE inhibitors. Clinical data: SBP=190 mmHg, Triglycerides=230 mg/dl, Glycaemia=115 mg/dl, … Hospital admissions: DRG 121, Diagn 410. Lab tests: Creatinine test, Blood glucose test, Gamma GT.]
Case history: temporal sequences of healthcare events
Temporal Association Rules (TARs) mining
Integration of Data Sources
• Administrative data: naturally represented as event sequences
• Clinical data: pre-processing to shift from a quantitative representation to a qualitative description
Knowledge-based Temporal Abstractions (TAs)
State TAs:
• Glycaemia <65 (low)
• Glycaemia 65-100 (regular)
• Glycaemia 100-125 (IFG)
• Glycaemia 125-180 (high)
Trend TAs:
• Glycaemia Increasing
• Glycaemia Steady
• Glycaemia Decreasing
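As a rough illustration of how a quantitative glycaemia series could be turned into the qualitative State and Trend TAs listed on this slide, here is a minimal Python sketch. It is not the authors' implementation. The half-open band boundaries, the `None` result for values of 180 mg/dl and above (the slide gives no label there), the `tolerance` parameter, and the function names are all assumptions made for the example.

```python
from typing import List, Optional, Sequence

# Upper bounds (mg/dl) of the glycaemia state bands named on the slide.
# Assumption: bands are half-open, i.e. lower <= value < upper.
GLYCAEMIA_BANDS = [
    (65.0, "low"),
    (100.0, "regular"),
    (125.0, "IFG"),
    (180.0, "high"),
]


def glycaemia_state(value: float) -> Optional[str]:
    """Map a glycaemia value to its State TA label, or None if unlabelled."""
    for upper, label in GLYCAEMIA_BANDS:
        if value < upper:
            return label
    return None  # 180 mg/dl and above: no label is given on the slide


def trend(previous: float, current: float, tolerance: float = 0.0) -> str:
    """Label the change between two consecutive measurements (Trend TA).

    Assumption: a change within +/- tolerance counts as Steady.
    """
    delta = current - previous
    if delta > tolerance:
        return "Increasing"
    if delta < -tolerance:
        return "Decreasing"
    return "Steady"


def trends(values: Sequence[float], tolerance: float = 0.0) -> List[str]:
    """Trend labels for every pair of consecutive measurements."""
    return [trend(a, b, tolerance) for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print(glycaemia_state(115.0))        # state label for a single visit
    print(trends([110.0, 130.0, 130.0])) # trend labels across three visits
```

A knowledge-based system would normally take the bands and the tolerance from clinical guidelines rather than hard-coding them as done here.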
Temporal Heterogeneity
Hybrid events: point-like and interval-like events (granularity)
[Figure: patient timeline (time in days, t1 to t7) showing point-like and interval-like events together: SBP>180 (severe hypertension), Triglycerides>350 (very high), Glycaemia 100-125 (IFG), ACE inhibitors, Creatinine test, Blood glucose test, Gamma GT, DRG 121, Diagn 410, …]
TARs mining on temporal sequences of hybrid events
Temporal Association Rules (TARs)
TAR: relationship defined through a temporal operator (op) which holds between an event A (the antecedent) and an event C (the consequent): A op C
Basic rules: antecedent cardinality K=1
  E.g. ACE inhibitors BEFORE Heart failure diagnosis
Complex rules: K>1, found with an Apriori*-like search strategy
  E.g. {ACE inhibitors AND Beta-blockers AND Diuretics} BEFORE Heart failure diagnosis
* [Agrawal R., Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 487-499 (1994)]
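To make the A op C structure concrete, the following Python sketch represents point-like and interval-like events with one type and checks a BEFORE rule on a single patient history. It is a simplified reading, not the operator definition used in the talk. The `Event` type, the `max_gap` parameter, the condition "antecedent ends no later than the consequent starts, within `max_gap` days", and the example timeline are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Event:
    """A healthcare event; point-like events have start == end (days)."""
    label: str
    start: float
    end: float


def before(a: Event, c: Event, max_gap: float) -> bool:
    """Assumed BEFORE semantics: a ends no later than c starts,
    and the gap between them is at most max_gap days."""
    return a.end <= c.start and (c.start - a.end) <= max_gap


def rule_holds(history: Iterable[Event], antecedent: Set[str],
               consequent: str, max_gap: float) -> bool:
    """True if, for some occurrence of the consequent, every antecedent
    label has at least one occurrence BEFORE it (simplified AND)."""
    events: List[Event] = list(history)
    for c in events:
        if c.label != consequent:
            continue
        if all(any(e.label == a and before(e, c, max_gap) for e in events)
               for a in antecedent):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical timeline (days), invented for this example.
    history = [
        Event("ACE inhibitors", 0, 30),           # interval-like
        Event("Beta-blockers", 10, 40),           # interval-like
        Event("Diuretics", 5, 45),                # interval-like
        Event("Heart failure diagnosis", 50, 50), # point-like
    ]
    rule = {"ACE inhibitors", "Beta-blockers", "Diuretics"}
    print(rule_holds(history, rule, "Heart failure diagnosis", max_gap=365))
```

Treating a point-like event as a zero-length interval lets one operator cover both event types, which is one possible way to handle hybrid events uniformly.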
Support
Support = (# subjects supporting the rule) / (total # of subjects)
[Figure: rule occurrences over time for three subjects. s1: f = 3, span = 0. s2: f = 1, span = 0.7. s3: f = 2, span = 0.2.]
The number of subjects supporting the rule is based on a frequency threshold (f_th) and a duration threshold (span_th)
E.g. f_th = 3, span_th = 0.5: support = 2/3
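The worked example on this slide can be reproduced with a few lines of Python. The sketch below assumes that a subject supports the rule when either threshold is reached (f >= f_th or span >= span_th). That reading is inferred from the example, where s1 qualifies by frequency and s2 by span, and is not stated explicitly on the slide. The function names are hypothetical.

```python
from typing import Dict, Tuple


def supports_rule(f: int, span: float, f_th: int, span_th: float) -> bool:
    """Assumed criterion: enough occurrences OR a long enough span."""
    return f >= f_th or span >= span_th


def support(occurrences: Dict[str, Tuple[int, float]],
            f_th: int, span_th: float) -> float:
    """Fraction of subjects supporting the rule.

    occurrences maps a subject id to (frequency, span) of the rule.
    """
    if not occurrences:
        raise ValueError("at least one subject is required")
    n_supporting = sum(
        1 for f, span in occurrences.values()
        if supports_rule(f, span, f_th, span_th)
    )
    return n_supporting / len(occurrences)


if __name__ == "__main__":
    # Values taken from the slide's three-subject example.
    example = {"s1": (3, 0.0), "s2": (1, 0.7), "s3": (2, 0.2)}
    print(support(example, f_th=3, span_th=0.5))
```

With the slide's thresholds, s1 and s2 count as supporting subjects and s3 does not, which gives the 2/3 reported above.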
Support
Support = (# subjects supporting the rule) / (total # of subjects)
Example: Support = 4/7
Confidence
Confidence = (# subjects supporting the rule, NSR) / (# subjects supporting the antecedent, NSA)
Example: Confidence = 2/3
Confidence
Confidence = (# subjects supporting the rule, NSR) / (# subjects supporting the antecedent, NSA)
Probability that a patient experiences the consequent given that the antecedent occurred for that patient
[Figure: occurrences of events A and C over time for subjects s1, s2 and s3]
NSR = 1, NSA = 2: Confidence = 1/2
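The ratio NSR / NSA can be sketched in Python from two sets of subject identifiers. The function name and the assignment of subjects to sets are hypothetical. The slide only gives the counts (NSR = 1, NSA = 2), so the example merely picks identifiers that reproduce those counts.

```python
from typing import Set


def confidence(rule_subjects: Set[str], antecedent_subjects: Set[str]) -> float:
    """Confidence = NSR / NSA.

    rule_subjects: subjects supporting the whole rule (A op C).
    antecedent_subjects: subjects supporting the antecedent A.
    Only subjects that also support the antecedent are counted in NSR.
    """
    nsa = len(antecedent_subjects)
    if nsa == 0:
        raise ValueError("no subject supports the antecedent")
    nsr = len(rule_subjects & antecedent_subjects)
    return nsr / nsa


if __name__ == "__main__":
    # Hypothetical assignment matching the slide's counts: NSR = 1, NSA = 2.
    antecedent_subjects = {"s1", "s3"}
    rule_subjects = {"s1"}
    print(confidence(rule_subjects, antecedent_subjects))
```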
Application: Diabetes Mellitus
Clinical data:
• Jan 2007 – Oct 2008
• 1,300 diabetic patients
• 5,000 inspections
Administrative data (Access; Code; Description):
• DRG; 134; Hypertension
• Diagnosis; 25000; Type II Diabetes Mellitus
• ATC; C07A; Beta Blocking Agents
• …
Rule template: A BEFORE (with a Gap) C, where both the antecedent A and the consequent C may contain State TAs, Trend TAs and SSN Accesses
Parameter settings:
• minsup = 0.01 (13 patients)
• minconf = 0.3
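The parameter settings translate directly into a filter on mined rules. The short Python sketch below shows the arithmetic behind "13 patients" and a threshold check. The `MinedRule` type and function names are hypothetical, and the second example rule is invented only to show a rejection.

```python
from typing import NamedTuple


class MinedRule(NamedTuple):
    description: str
    support: float
    confidence: float


def min_supporting_patients(minsup: float, n_patients: int) -> int:
    """Number of patients corresponding to the minimum support."""
    return round(minsup * n_patients)


def passes_thresholds(rule: MinedRule,
                      minsup: float = 0.01, minconf: float = 0.3) -> bool:
    """Keep a rule only if it reaches both minimum support and confidence."""
    return rule.support >= minsup and rule.confidence >= minconf


if __name__ == "__main__":
    print(min_supporting_patients(0.01, 1300))  # 1,300 diabetic patients
    rules = [
        MinedRule("support 0.013, confidence 0.56", 0.013, 0.56),
        MinedRule("hypothetical rare rule", 0.005, 0.90),  # invented example
    ]
    for r in rules:
        print(r.description, passes_thresholds(r))
```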
Interesting rules
Data Mining methods often produce a great amount of output information which is irrelevant, uninteresting or redundant
Examples (Support; Confidence):
• HbA1c 7-8 (high) BEFORE ATC A10: Drugs used in Diabetes (0.25; 0.62)
• Total Cholesterol 220-280 (high) BEFORE ATC C10A: Lipid modifying agents (0.18; 0.73)
Post-processing
Target: obtain only a reduced set of "interesting" rules
Pipeline: Sequential data → Mining step → Raw RuleSet → Minimp>1 → Reduced RuleSet → Clinical evaluation → "Interesting" rules
• Minimum improvement: keep only the rules which increase the confidence value with respect to all their subrules
• Clinical evaluation: rule classification based on the evidence of a clinical relationship between the events involved in the rules
  - ClinR=1: quantitative verification of a-priori knowledge
  - ClinR=0: suggestion for the discovery of unknown knowledge
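One possible reading of the minimum improvement step is sketched below in Python. It assumes that improvement is the ratio between a rule's confidence and the highest confidence among its subrules, which would match the "Minimp>1" label but is not confirmed by the slide. It also assumes that a subrule shares the consequent and uses a proper, non-empty subset of the antecedent, and that basic rules with no mined subrule are always kept. All names and confidence values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent items, consequent)


def proper_subrules(rule: Rule) -> Iterator[Rule]:
    """Rules with the same consequent and a proper, non-empty
    subset of the antecedent."""
    antecedent, consequent = rule
    items = sorted(antecedent)
    for k in range(1, len(items)):
        for subset in combinations(items, k):
            yield (frozenset(subset), consequent)


def improvement(rule: Rule, conf: Dict[Rule, float]) -> float:
    """Assumed definition: conf(rule) / max conf over mined subrules."""
    sub_confs = [conf[s] for s in proper_subrules(rule)
                 if s in conf and conf[s] > 0]
    if not sub_confs:
        return float("inf")  # nothing to compare against: keep the rule
    return conf[rule] / max(sub_confs)


def filter_min_improvement(conf: Dict[Rule, float],
                           minimp: float = 1.0) -> Dict[Rule, float]:
    """Keep rules whose improvement exceeds minimp."""
    return {r: c for r, c in conf.items() if improvement(r, conf) > minimp}


if __name__ == "__main__":
    # Hypothetical confidences, invented for this example.
    conf = {
        (frozenset({"A"}), "C"): 0.50,
        (frozenset({"B"}), "C"): 0.40,
        (frozenset({"D"}), "C"): 0.30,
        (frozenset({"A", "B"}), "C"): 0.60,  # improves on both subrules
        (frozenset({"A", "D"}), "C"): 0.45,  # worse than {A} alone
    }
    for antecedent, consequent in filter_min_improvement(conf):
        print(sorted(antecedent), "BEFORE", consequent)
```

Under these assumptions a complex rule survives only if adding items to the antecedent raises confidence beyond what any simpler rule already achieves.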
Results (1)
Antecedent: BMI 25-30 (overweight), Glycaemia 65-110 (regular), HbA1c 7-8 (high), Anti-hypertensive therapy: yes
Temporal operator: BEFORE (1 visit)
Consequent: Glycaemia Increasing
Support: 0.013; Confidence: 0.56
• TAR verified in 1.3% of the diabetic sample (17 patients)
• Given the occurrence of the antecedent, there is a 56% probability of an increase in glycaemia in the following visit
Results (2)
Antecedent: ATC C03C: High ceiling diuretics, ATC M04A: Antigout agents
Temporal operator: BEFORE (365 days)
Consequent: HbA1c Increasing
Support: 0.012; Confidence: 0.57
• TAR verified in 1.2% of the diabetic sample (16 patients)
• Given the occurrence of the antecedent, there is a 57% probability of an increase in HbA1c in the following year
Results (2)
Antecedent: SBP<140 (regular), ATC C03C: High ceiling diuretics, Care Intervention: Diet
Temporal operator: BEFORE (365 days)
Consequent: SBP Increasing
Support: 0.02; Confidence: 0.69
• TAR verified in 2% of the diabetic sample (26 patients)
• Given the occurrence of the antecedent, there is a 69% probability of an increase in systolic blood pressure in the following visit
Results (2)
Antecedent: BMI 25-30 (overweight), Glycaemia 110-180 (high), HbA1c>9 (excessively high)
Temporal operator: BEFORE (365 days)
Consequents (Confidence; Support), all with ClinR = 0:
• ATC B01: Antithrombotic agents (0.537; 0.013)
• ATC B01A: Antithrombotic agents (0.537; 0.013)
• ATC B01AC: Platelet aggregation inhibitors, excluding heparin (0.464; 0.011)
Notes:
• Apparently no clinical relationship between physiological observations and drug effects
• TARs verified in about 1% of the diabetic sample (14-17 patients)
• Anti-platelet agents are the main antithrombotic drug therapy in this subgroup of patients
Conclusions and Future Work
• General method to extract temporal relationships between diagnostic, therapeutic, or clinical patterns
• Explicit handling of temporal heterogeneity (hybrid events)
• Integration of different data sources with a uniform representation
Ongoing Work and Future Developments
• Post-processing strategy
  - Rule set reduction: definition of "interesting" rules
  - Clinical classification of the rules
• Hierarchical mining exploiting the taxonomical information
• Ontology-driven rule classification to perform a totally automated post-processing procedure
Conclusions (2)
Future work
• Hierarchical mining exploiting the taxonomical information
• Ontology-driven rule classification to perform a totally automated post-processing procedure
• Development of a method based on "chained" TARs to detect frequent temporal care-flows
[Figure: timeline (t1 to t7) with ACE inhibitors, Beta-blockers, SBP>180 (severe hypertension) and Heart failure, and the chain of rules built on it:
  {ACE inhibitors} BEFORE Beta-blockers
  {ACE inhibitors BEFORE Beta-blockers} BEFORE SBP>180
  {ACE inhibitors BEFORE Beta-blockers BEFORE SBP>180} BEFORE Heart failure]
Given a temporal case history, which are the most frequently expected healthcare events?
Acknowledgments Riccardo Bellazzi (riccardo.bellazzi@unipv.it) Stefano Concaro (stefano.concaro@unipv.it) Carlo Cerra, Pietro Fratino (carlo_cerra@asl.pavia.it)