
Mining Administrative and Clinical Diabetes Data with Temporal Association Rules. Stefano Concaro, Lucia Sacchi, Carlo Cerra, Riccardo Bellazzi. MIE 2009, Sarajevo.


slide-1
SLIDE 1

Mining Administrative and Clinical Diabetes Data with Temporal Association Rules

Stefano Concaro, Lucia Sacchi, Carlo Cerra, Riccardo Bellazzi

MIE 2009, Sarajevo, August 31st 2009

slide-2
SLIDE 2

Summary

  • DataWarehouse Healthcare Agency (ASL) of Pavia
      • Administrative healthcare data
      • Clinical data
  • Methods
      • Representation of temporal sequences
      • Integration of data sources
      • Temporal Association Rules (TARs) mining
      • Management of temporal heterogeneity in the data
  • Application
      • Diabetes Mellitus
  • Conclusions
slide-3
SLIDE 3

DataWarehouse Local Healthcare Agency (ASL) of Pavia (1)

Administrative healthcare data (since 2002), Pavia area

  • 530,000 people
  • 170,000 admissions/year
  • 4,500,000 drug prescriptions/year
  • 9,000,000 visits-tests/year

“Process data” for reimbursement purposes

The DW grows progressively as new data are introduced and historical data are maintained

DW tables: Hospital Admissions, Ambulatory Visits/Lab Tests, Drug Prescriptions, …

Hospital Admissions

CodPat  CodHosp  AdmDate     DischDate   Diagn(1-6)  Proc(1-6)  Refund
xxx     yyy      16/06/2008  24/06/2008  428.1, 410  41.00      2300€

Ambulatory Visits/Lab Tests

CodPat  CodAmb  ContDate    CodTest  Refund
xxx     yyy     29/03/2008  347.1    50€

Drug Prescriptions

CodPat  CodPharm  PurchDate   CodDrug  ATC code  Quantity  Cost
xxx     yyy       22/07/2008  72364    C01AA02   2         60€

slide-4
SLIDE 4

DataWarehouse Local Healthcare Agency (ASL) of Pavia (2)

Clinical healthcare data (since 2007)

DW: Diabetes Mellitus, Cardio-Cerebro-Vascular Disease, Essential Hypertension, …

Outcomes of medical inspections and clinical tests

  • Jan 2007 – Oct 2008
  • 1,300 diabetic patients
  • 5,000 inspections

Variable                               Range       IQ Range        Unit
 1. Body Mass Index (BMI)              [10-80]     [25.15-31.28]   kg/m²
 2. Systolic Blood Pressure (SBP)      [60-240]    [130-150]       mmHg
 3. Diastolic Blood Pressure (DBP)     [30-150]    [75-85]         mmHg
 4. Glycaemia                          [50-500]    [112-162]       mg/dl
 5. Glycated Haemoglobin (HbA1c)       [3-20]      [6.3-7.9]       %
 6. Total Cholesterol                  [80-500]    [175-232]       mg/dl
 7. HDL Cholesterol                    [10-120]    [43-62]         mg/dl
 8. Triglycerides                      [10-2000]   [91-177]        mg/dl
 9. Cardio-Vascular Risk (CVR)         [0-100]     [8.57-30.33]    %
10. Anti-Hypertensive Therapy          {Yes; No}
11. Care Intervention                  {Diet; Health training; None}

slide-5
SLIDE 5

Temporal Representation

Case history: temporal sequences of healthcare events

[Figure: timeline for Patient j (time in days) with events at t1 to t7, …, drawn from four sources. Drug prescriptions: ACE inhibitors. Lab tests: Creatinine test, Blood glucose test, Gamma GT. Clinical data: SBP=190 mmHg, Triglycerides=230 mg/dl, Glycaemia=115 mg/dl. Hospital admissions: DRG 121, Diagn 410.]

Temporal Association Rules (TARs) mining: primary role of the temporal dimension
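As a rough illustration of the case-history representation, the sketch below merges records from different sources into one time-ordered sequence per patient. The `Event` type, the source labels and the sample rows are hypothetical; only the idea of a single temporal sequence per patient comes from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    patient: str
    when: date
    source: str  # illustrative labels: "drug", "lab", "clinical", "admission"
    label: str


def build_case_histories(events):
    """Group events by patient and sort each group chronologically."""
    histories = defaultdict(list)
    for ev in events:
        histories[ev.patient].append(ev)
    for seq in histories.values():
        seq.sort(key=lambda ev: ev.when)
    return dict(histories)


# Hypothetical records, not taken from the data warehouse
records = [
    Event("j", date(2008, 7, 22), "drug", "ACE inhibitors"),
    Event("j", date(2008, 3, 29), "lab", "Creatinine test"),
    Event("j", date(2008, 6, 16), "admission", "Diagn 410"),
]
histories = build_case_histories(records)
labels = [ev.label for ev in histories["j"]]
```

Once every source is reduced to (patient, time, label) triples, the mining step no longer needs to know which table an event came from.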

slide-6
SLIDE 6

Integration of Data Sources

  • Administrative data: naturally represented as event sequences
  • Clinical data: pre-processing to shift from a quantitative representation to a qualitative description

Knowledge-based Temporal Abstractions (TAs)

State TAs

  • Glycaemia <65 (low)
  • Glycaemia 65-100 (regular)
  • Glycaemia 100-125 (IFG)
  • Glycaemia 125-180 (high)

Trend TAs

  • Glycaemia Increasing
  • Glycaemia Steady
  • Glycaemia Decreasing
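The two kinds of abstraction can be sketched as simple mappings from numbers to labels. The state thresholds are those listed on the slide; how boundary values are assigned, and the `tolerance` used to call a trend "Steady", are assumptions made only for this illustration.

```python
def glycaemia_state(value_mg_dl):
    """Map a glycaemia value (mg/dl) to a qualitative state.

    Assumption: lower bounds are inclusive and upper bounds exclusive,
    except for the last interval, which includes 180.
    """
    if value_mg_dl < 65:
        return "low"
    if value_mg_dl < 100:
        return "regular"
    if value_mg_dl < 125:
        return "IFG"
    if value_mg_dl <= 180:
        return "high"
    return None  # above 180: no state is listed on the slide


def trend(previous, current, tolerance=5.0):
    """Label the change between two consecutive measurements.

    `tolerance` is a hypothetical dead band, not a value from the source.
    """
    delta = current - previous
    if delta > tolerance:
        return "Increasing"
    if delta < -tolerance:
        return "Decreasing"
    return "Steady"


states = [glycaemia_state(v) for v in (60, 90, 115, 150)]
trends = [trend(110, 130), trend(130, 128), trend(128, 100)]
```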

slide-7
SLIDE 7

Temporal Heterogeneity

Hybrid events: point-like and interval-like events

[Figure: timeline (time in days, t1 to t7, …, with the time granularity marked) mixing point-like and interval-like events: Creatinine test, Blood glucose test, Gamma GT, ACE inhibitors, SBP>180 (severe hypertension), Triglycerides>350 (very high), Glycaemia 100-125 (IFG), DRG 121, Diagn 410]

TARs mining on temporal sequences of hybrid events

slide-8
SLIDE 8

Temporal Association Rules (TARs)

TAR: relationship defined through a temporal operator (op) which holds between an event A (the antecedent) and an event C (the consequent)

[Figure: on the time axis, an antecedent A linked to a consequent C by op; for complex rules, antecedents A1, A2, A3 linked to C by op]

Apriori*-like search strategy

* [Agrawal R., Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 487-499 (1994)]

Basic rules: antecedent cardinality K=1

E.g. ACE inhibitors BEFORE Heart failure diagnosis

Complex rules: K>1

E.g. {ACE inhibitors AND Beta-blockers AND Diuretics} BEFORE Heart failure diagnosis
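A minimal sketch of how a TAR and the BEFORE operator could be represented. The `Interval` and `TAR` types, the `max_gap` parameter and the rule that every antecedent item must precede the same consequent occurrence are simplifying assumptions, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    label: str
    start: float  # days
    end: float    # equal to start for point-like events


@dataclass(frozen=True)
class TAR:
    antecedent: frozenset  # event labels; cardinality K
    consequent: str
    operator: str = "BEFORE"  # only BEFORE is sketched here


def before(a, c, max_gap=365.0):
    """Assumed BEFORE: a ends no later than c starts, within max_gap days."""
    return 0 <= c.start - a.end <= max_gap


def holds(rule, events, max_gap=365.0):
    """True if some consequent occurrence is preceded by every antecedent item."""
    for c in (e for e in events if e.label == rule.consequent):
        if all(
            any(before(a, c, max_gap) for a in events if a.label == label)
            for label in rule.antecedent
        ):
            return True
    return False


# Hypothetical case history
history = [
    Interval("ACE inhibitors", 0, 30),
    Interval("Beta-blockers", 10, 40),
    Interval("Diuretics", 20, 50),
    Interval("Heart failure diagnosis", 60, 60),
]
basic = TAR(frozenset({"ACE inhibitors"}), "Heart failure diagnosis")
complex_rule = TAR(
    frozenset({"ACE inhibitors", "Beta-blockers", "Diuretics"}),
    "Heart failure diagnosis",
)
missing = TAR(frozenset({"Statins"}), "Heart failure diagnosis")
results = (
    holds(basic, history),
    holds(complex_rule, history),
    holds(missing, history),
)
```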

slide-9
SLIDE 9

Support

Support = (# subjects supporting the rule) / (total # of subjects)

The number of subjects supporting the rule is based on a frequency threshold (f_th) and a duration threshold (span_th)

E.g. f_th = 3, span_th = 0.5 → support = 2/3

[Figure: rule occurrences over time for three subjects s1, s2, s3, annotated with f = 3, span = 0; f = 1, span = 0.7; f = 2, span = 0.2]
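The 2/3 in the example can be reproduced with a few lines of code. The sketch assumes that a subject supports the rule when either threshold is met, which is the reading consistent with the three (f, span) pairs shown; the slide does not state the combination rule explicitly.

```python
def supports(f, span, f_th, span_th):
    """Assumed criterion: enough occurrences OR a long enough span."""
    return f >= f_th or span >= span_th


def support_counts(subjects, f_th, span_th):
    """Return (number of supporting subjects, total number of subjects)."""
    n = sum(1 for f, span in subjects if supports(f, span, f_th, span_th))
    return n, len(subjects)


# (f, span) pairs from the slide; which subject carries which pair is not recoverable
subjects = [(3, 0.0), (1, 0.7), (2, 0.2)]
n_supporting, n_total = support_counts(subjects, f_th=3, span_th=0.5)
```

Here the first subject qualifies through frequency, the second through span, and the third through neither.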
slide-10
SLIDE 10

Support

Support = (# subjects supporting the rule) / (total # of subjects)

Support = 4/7

slide-11
SLIDE 11

Confidence

Confidence = (# subjects supporting the rule, NSR) / (# subjects supporting the antecedent, NSA)

Confidence = 2/3

slide-12
SLIDE 12

Confidence

Confidence = (# subjects supporting the rule, NSR) / (# subjects supporting the antecedent, NSA)

Probability that a patient experiences the consequent given that the antecedent occurred for that patient

[Figure: occurrences of events A and C over time for three subjects s1, s2, s3]

NSR = 1, NSA = 2, Confidence = 1/2
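The ratio NSR/NSA can be computed from per-subject flags, as in the sketch below. The assignment of flags to s1, s2 and s3 is hypothetical and chosen only so that NSR = 1 and NSA = 2, as in the example.

```python
from fractions import Fraction


def confidence(subjects):
    """subjects maps an id to (supports_antecedent, supports_rule)."""
    nsa = sum(1 for has_a, _ in subjects.values() if has_a)
    nsr = sum(1 for has_a, has_rule in subjects.values() if has_a and has_rule)
    if nsa == 0:
        return None  # confidence is undefined without antecedent support
    return Fraction(nsr, nsa)


# Hypothetical flags matching NSR = 1, NSA = 2
subjects = {
    "s1": (True, True),
    "s2": (True, False),
    "s3": (False, False),
}
conf = confidence(subjects)
```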

slide-13
SLIDE 13

Application: Diabetes Mellitus

  • Jan 2007 – Oct 2008
  • 1,300 diabetic patients
  • 5,000 inspections

Rule template: A BEFORE C, with a Gap between antecedent and consequent. Both A and C can be drawn from clinical data (State TAs, Trend TAs) or administrative data (SSN Accesses).

Access     Code   Description
DRG        134    Hypertension
Diagnosis  25000  Type II Diabetes Mellitus
ATC        C07A   Beta Blocking Agents
…          …

Parameter settings: minsup = 0.01 (13 patients), minconf = 0.3
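The parameter settings translate into patient counts as follows. The sample size and thresholds are those given on the slide; the `passes` helper and the counts passed to it are hypothetical.

```python
import math

N_PATIENTS = 1300
MINSUP = 0.01
MINCONF = 0.3

# Round before taking the ceiling to avoid floating-point noise
min_patients = math.ceil(round(MINSUP * N_PATIENTS, 9))


def passes(n_rule, n_antecedent):
    """Check a rule against minsup and minconf using raw patient counts."""
    if n_antecedent == 0:
        return False
    return n_rule >= min_patients and n_rule / n_antecedent >= MINCONF


# Hypothetical counts: (patients supporting the rule, patients supporting the antecedent)
checks = [passes(17, 30), passes(12, 30), passes(13, 50)]
```

The second rule fails on support (12 patients is below the minimum of 13) and the third on confidence (13/50 = 0.26).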

slide-14
SLIDE 14

Post-processing

Data Mining methods often produce a large amount of output information which is irrelevant, uninteresting or redundant

Target: obtain only a reduced set of “interesting” rules

Sequential data → Mining step → Raw RuleSet → Minimp>1 → Reduced RuleSet → Clinical evaluation → “Interesting” rules

Minimum improvement: keep only the rules which increase the confidence value with respect to all their subrules

Clinical evaluation: rule classification based on the evidence of a clinical relationship between the events involved in the rules

  • ClinR=1: quantitative verification of a-priori knowledge
  • ClinR=0: suggestion for the discovery of unknown knowledge

Rule                                                                      Support  Confidence
HbA1c 7-8 (high) BEFORE ATC A10: Drugs used in Diabetes                   0.25     0.62
Total Cholesterol 220-280 (high) BEFORE ATC C10A: Lipid modifying agents  0.18     0.73
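The minimum improvement filter can be sketched as below. It assumes improvement is measured as the ratio between the confidence of a rule and that of each of its mined subrules, which is one reading of "Minimp>1"; the rule labels and confidence values are hypothetical.

```python
from itertools import combinations


def proper_subrules(antecedent):
    """Yield all non-empty proper subsets of the antecedent."""
    items = sorted(antecedent)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)


def reduce_ruleset(confidences, minimp=1.0):
    """Keep rules whose confidence exceeds minimp times that of every mined subrule.

    confidences maps (antecedent frozenset, consequent) to a confidence value.
    """
    kept = {}
    for (antecedent, consequent), conf in confidences.items():
        subs = [
            confidences[(s, consequent)]
            for s in proper_subrules(antecedent)
            if (s, consequent) in confidences
        ]
        if all(conf > minimp * sub for sub in subs):
            kept[(antecedent, consequent)] = conf
    return kept


# Hypothetical raw rule set
raw = {
    (frozenset({"A"}), "C"): 0.50,
    (frozenset({"B"}), "C"): 0.40,
    (frozenset({"A", "B"}), "C"): 0.45,  # no better than {A} alone
    (frozenset({"A", "D"}), "C"): 0.70,  # improves on {A}; {D} was not mined
}
reduced = reduce_ruleset(raw)
```

Rules with a single antecedent item have no subrules and are always kept by this filter.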

slide-15
SLIDE 15

Results (1)

Antecedent:

  • BMI 25-30 (overweight)
  • Glycaemia 65-110 (regular)
  • HbA1c 7-8 (high)
  • Anti-hypertensive therapy: yes

1 visit BEFORE

Consequent:

  • Glycaemia Increasing

Support 0.013, Confidence 0.56

Given the occurrence of the antecedent, there is a 56% probability of an increase in glycaemia in the following visit. TAR verified in 1.3% of the diabetic sample (17 patients).
slide-16
SLIDE 16

Results (2)

Antecedent:

  • ATC C03C: High ceiling diuretics
  • ATC M04A: Antigout agents

365 days BEFORE

Consequent:

  • HbA1c Increasing

Support 0.012, Confidence 0.57

Given the occurrence of the antecedent, there is a 57% probability of an increase in HbA1c in the following year. TAR verified in 1.2% of the diabetic sample (16 patients).
slide-17
SLIDE 17

Results (2)

Antecedent:

  • SBP<140 (regular)
  • ATC C03C: High ceiling diuretics
  • Care Intervention: Diet

365 days BEFORE

Consequent:

  • SBP Increasing

Support 0.02, Confidence 0.69

Given the occurrence of the antecedent, there is a 69% probability of an increase in systolic blood pressure in the following visit. TAR verified in 2% of the diabetic sample (26 patients).
slide-18
SLIDE 18

Results (2)

Antecedent:

  • BMI 25-30 (overweight)
  • Glycaemia 110-180 (high)
  • HbA1c>9 (excessively high)

365 days BEFORE

Consequent                                                        Support  Confidence
ATC B01: Antithrombotic agents                                    0.013    0.537
ATC B01A: Antithrombotic agents                                   0.013    0.537
ATC B01AC: Platelet aggregation inhibitors, excluding heparin     0.011    0.464

TARs verified in about 1% of the diabetic sample (14-17 patients)

ClinR: apparently no clinical relationship between physiological observations and drug effects

Anti-platelet agents as the main antithrombotic drug therapy in the subgroup of patients

slide-19
SLIDE 19

Conclusions and Future Work

General method to extract temporal relationships between diagnostic, therapeutic, or clinical patterns

  • Explicit handling of temporal heterogeneity (hybrid events)
  • Integration of different data sources with a uniform representation

Post-processing strategy

  • Rule set reduction: definition of “interesting” rules
  • Clinical classification of the rules

Ongoing Work and Future Developments

  • Hierarchical mining exploiting the taxonomical information
  • Ontology-driven rule classification to perform a totally automated post-processing procedure

slide-20
SLIDE 20

Conclusions (2)

Future work

  • Hierarchical mining exploiting the taxonomical information
  • Ontology-driven rule classification to perform a totally automated post-processing procedure
  • Development of a method based on “chained” TARs to detect frequent temporal care-flows

Given a temporal case history, which are the most frequently expected healthcare events?

Chained TARs, e.g.:

  • {ACE inhibitors} BEFORE Beta-blockers
  • {ACE inhibitors BEFORE Beta-blockers} BEFORE SBP>180
  • {ACE inhibitors BEFORE Beta-blockers BEFORE SBP>180} BEFORE Heart failure

[Figure: timeline (t1 to t7) with ACE inhibitors, Beta-blockers, SBP>180 (severe hypertension) and Heart failure]

slide-21
SLIDE 21

Acknowledgments

Riccardo Bellazzi (riccardo.bellazzi@unipv.it)

Stefano Concaro (stefano.concaro@unipv.it)

Carlo Cerra, Pietro Fratino (carlo_cerra@asl.pavia.it)
