A Robust Recursive Partitioning Algorithm for Mining Multiple Populations
Jose Alvir (1), Javier Cabrera (2), Frank Caridi (1), Ha Nguyen (1)
(1) Pfizer Inc & (2) Rutgers University
Rutgers Biostatistics Day, 4/25/2008
The Challenge of Personalized Medicine
• Drugs do not work for everybody
• Some drugs may work better than others for certain individuals
• Some individuals may need more or less of a drug than others
The Challenge of Personalized Medicine
• Shift from individuals to groups of individuals with similar characteristics
• Search for subgroups where response is maximal
• Classification techniques like CART are available
Pima Indians Diabetes Data Set
768 females at least 21 years old, of Pima Indian heritage

Variable                        Mean     SD
Number of times pregnant         3.8    3.4
Plasma glucose concentration   120.9   32
Diastolic blood pressure        69.1   19.4
Triceps skin fold thickness     20.5   16
2-Hour serum insulin            79.8  115.2
Body Mass Index                 32      7.9
Diabetes pedigree function       0.5    0.3
Age                             33.2   11.8
Diabetes (positive/total)      268/768
Classic Example of CART: Pima Indians & Diabetes
• 768 Pima Indian females, 21+ years old; 268 tested positive for diabetes
• 8 predictors: PRG, PLASMA, BP, THICK, INSULIN, BODY, PEDIGREE, AGE
[Figure: CART classification tree with 12 terminal nodes. Root split PLASMA < 127.5; other splits AGE < 28.5, AGE < 30.5, BODY < 26.35, BODY < 29.95, BODY < 30.95, PLASMA < 99.5, PLASMA < 145.5, PLASMA < 157.5, PEDIGREE < 0.561, BP < 61. Terminal nodes show the proportion with diabetes, from 0.01325 to 1.00000.]
ARF – Activity Region Finder
• Identifies high-activity regions
• Finds regions where the concentration of “success” is highest, unlike other classification trees (e.g. CHAID, CART) that aim to predict the response across the entire range
• Splits a node when there is substantial evidence that the response in the child node is higher or lower than in the parent node
• Written in R
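The splitting idea above can be sketched for a single predictor. This is a minimal illustration under assumptions, not the ARF code, which is written in R. We assume a binary response and one numeric predictor. We also assume a one-sample z statistic that compares the child node's success rate with the parent's. The function name `best_interval` and the `min_size` parameter are hypothetical.

```python
import math
from typing import List, Optional, Tuple

def best_interval(x: List[float], y: List[int], min_size: int = 5
                  ) -> Optional[Tuple[float, float, int, float, float]]:
    """Hypothetical ARF-style search over one predictor.

    Scans every interval [lo, hi] of observed x values and scores the child
    node by a z statistic comparing its success rate with the parent rate.
    Returns (lo, hi, n_child, child_rate, z) with the largest |z|, or None.
    """
    n = len(x)
    p_parent = sum(y) / n
    if p_parent in (0.0, 1.0):
        return None  # a pure node cannot be split usefully
    # Collapse tied x values into (value, count, successes) groups.
    values, counts, succ = [], [], []
    for xi, yi in sorted(zip(x, y)):
        if values and values[-1] == xi:
            counts[-1] += 1
            succ[-1] += yi
        else:
            values.append(xi)
            counts.append(1)
            succ.append(yi)
    best = None
    for i in range(len(values)):
        n_c = s_c = 0
        for j in range(i, len(values)):
            n_c += counts[j]
            s_c += succ[j]
            if n_c < min_size or n_c == n:
                continue  # too small, or the whole parent node
            rate = s_c / n_c
            se = math.sqrt(p_parent * (1 - p_parent) / n_c)
            z = (rate - p_parent) / se  # positive: higher than parent
            if best is None or abs(z) > abs(best[4]):
                best = (values[i], values[j], n_c, rate, z)
    return best

# Toy data: successes are concentrated at x = 10..14.
x = list(range(20))
y = [1 if 10 <= v <= 14 else 0 for v in x]
result = best_interval(x, y)
print(result)
```

The search picks out the interval [10, 14], where every observation is a success, against a parent rate of 25%. Scanning all intervals costs O(k²) per predictor for k distinct values. The real ARF may use a different evidence measure, a different minimum node size, and different stopping rules.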
Alvir J, Cabrera J, Caridi F, Nguyen H. Mining Clinical Trial Data. In Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data, edited by Xingquan (Hill) Zhu and Ian Davidson, 2007
ARF applied to Pima Indian data
[Figure: ARF tree. Root: full data set, n=768; p=35%.
  PLASMA [155,199] (n=122; p=80%) → BODY [29.9,45.7] (n=92; p=88%) → PEDIGREE [0.344,1.394] (n=55; p=96%)
  PLASMA [128,152] (n=153; p=49%) → BODY [30.3,67.1] (n=99; p=64%) → PEDIGREE [0.439,1.057] (n=38; p=82%)
  PLASMA [0,127] → AGE [29,56] (n=199; p=35%)]

Subset  Rule                                                                    %Success    n
1       PLASMA in [155,199] & BODY in [29.9,45.7] & PEDIGREE in [0.344,1.394]     96.364    55
2       PLASMA in [128,152] & BODY in [30.3,67.1] & PEDIGREE in [0.439,1.057]     81.579    38
3       PLASMA in [0,127] & AGE in [29,56]                                        35.176   199
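As an arithmetic check on the ARF subset table, the percentages can be turned back into counts and compared with the overall rate of 268/768 (about 34.9%). The sketch below uses our own variable names. It shows that subsets 1 and 2 have roughly 2.8 and 2.3 times the overall diabetes rate, while subset 3 is essentially at the overall rate.

```python
# (rule, % success, n) copied from the ARF subset table
subsets = [
    ("PLASMA in [155,199] & BODY in [29.9,45.7] & PEDIGREE in [0.344,1.394]", 96.364, 55),
    ("PLASMA in [128,152] & BODY in [30.3,67.1] & PEDIGREE in [0.439,1.057]", 81.579, 38),
    ("PLASMA in [0,127] & AGE in [29,56]", 35.176, 199),
]
base_rate = 268 / 768  # overall diabetes rate in the full data set

rows = []
for rule, pct, n in subsets:
    successes = round(pct / 100 * n)    # back out the integer count
    lift = (successes / n) / base_rate  # subset rate relative to overall rate
    rows.append((successes, n, lift))
    print(f"{successes}/{n}  lift = {lift:.2f}  {rule}")
```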
Differences between CART & ARF trees
• The best node in the CART tree has 9 observations, 100% with diabetes
• The ARF tree has a node of 55 observations with a 96% rate of diabetes
• The CART node has a high probability of occurring by chance
• The ARF tree produces sketches that summarize only the important information and downplay less interesting information
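One way to see why node size matters is to compare a 95% Wilson score lower bound on each node's diabetes rate. This is our illustration; the slides do not say which statistic the authors used. The function name `wilson_lower` is hypothetical. The bound also ignores the fact that the best node was picked from many candidates, and that selection makes small nodes look good by chance even more often.

```python
import math

def wilson_lower(successes: int, n: int, z: float = 1.96) -> float:
    """Lower limit of the Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

cart_node = wilson_lower(9, 9)   # best CART node: 9 of 9 with diabetes
arf_node = wilson_lower(53, 55)  # best ARF node: 53 of 55 (96.4%)
print(f"CART 9/9:  lower 95% bound = {cart_node:.3f}")
print(f"ARF 53/55: lower 95% bound = {arf_node:.3f}")
```

With these inputs the lower bounds come out near 0.70 for the 9/9 node and near 0.88 for the 53/55 node. The larger node supports a higher plausible minimum rate even though its observed rate is lower.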
Ziprasidone Placebo-Controlled Trials
4- & 6-week U.S. trials
• Protocol 104 – 4 weeks, N=195
• Protocol 106 – 4 weeks, N=132
• Protocol 114 – 6 weeks, N=299
• Protocol 115 – 6 weeks, N=325
85 subjects on haloperidol excluded
Total N = 951
Ziprasidone Data
N by dose (mg/day) & protocol

Dose     104   106   114   115
PBO       47    47    92    80
10        46     -     -     -
40        55    43     -    86
80        47     -   104     -
120        -    42     -    76
160        -     -   103     -
200        -     -     -    83
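The flattened dose-by-protocol counts can be read with one row per dose (PBO, 10, 40, 80, 120, 160, 200 mg/day), leaving blanks where a protocol did not include that dose. That reading is ours. We chose it because it reproduces the protocol sizes on the trials slide (195, 132, 299, 325; total 951), which the short check below confirms.

```python
# Rows: dose in mg/day (PBO = placebo); columns: protocols 104, 106, 114, 115.
# None marks a dose that does not appear in that protocol (our reading).
protocols = ["104", "106", "114", "115"]
n_by_dose = {
    "PBO": [47, 47, 92, 80],
    "10": [46, None, None, None],
    "40": [55, 43, None, 86],
    "80": [47, None, 104, None],
    "120": [None, 42, None, 76],
    "160": [None, None, 103, None],
    "200": [None, None, None, 83],
}

# Column totals, treating missing cells as zero.
totals = [sum(row[k] or 0 for row in n_by_dose.values()) for k in range(len(protocols))]
for protocol, total in zip(protocols, totals):
    print(f"Protocol {protocol}: N = {total}")
print("Total N =", sum(totals))
```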
Ziprasidone Data Mining Variables
Outcomes: change in BPRS total score
Predictors: age, sex, race, protocol, dose, baseline clinical ratings (positive Sx, CGI-S, anergia, depressive Sx, AIMS), duration of illness in years, current smoking status
Patient Characteristics (Total N = 951)
                N     %
Male          700    74
Race
  White       620    65
  Black       234    25
  Other        97    10
Smoker        716    75
Data Definitions
• AIMS = mean of AIMS total/5 and TD (tardive dyskinesia) severity
• BPRS total & Sx scores (positive, depression, anergia) – absolute minimum is zero (items scored with minimum = 0, not 1)
• Positive Sx score – sum of conceptual disorganization, hallucinatory behavior, unusual thought content, suspiciousness
• Depression – sum of anxiety, guilt feelings, depressive mood
• Anergia – sum of blunted affect, emotional withdrawal, motor retardation
• Residual BPRS change – residual (observed minus predicted) from regressing the LOCF (last observation carried forward) BPRS total on baseline BPRS
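The residual BPRS change definition can be sketched as follows. We assume an ordinary least-squares fit with an intercept, which the slides do not state, and LOCF endpoint values that are already imputed. The data are made up, and the helper `residual_change` is hypothetical. Two facts follow from that assumption. Residuals from a fit with an intercept average zero, which is consistent with the reported mean residual change of 0. Regressing the change (endpoint minus baseline) on baseline gives exactly the same residuals as regressing the endpoint on baseline.

```python
from statistics import fmean

def residual_change(baseline, endpoint):
    """Observed minus predicted values from an OLS fit (with intercept)
    of the endpoint score on the baseline score."""
    mean_b, mean_e = fmean(baseline), fmean(endpoint)
    sxx = sum((b - mean_b) ** 2 for b in baseline)
    sxy = sum((b - mean_b) * (e - mean_e) for b, e in zip(baseline, endpoint))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_b
    return [e - (intercept + slope * b) for b, e in zip(baseline, endpoint)]

# Made-up baseline and LOCF endpoint BPRS totals for five patients.
baseline = [30, 42, 25, 50, 36]
endpoint = [24, 40, 22, 38, 35]
change = [e - b for b, e in zip(baseline, endpoint)]

resid_endpoint = residual_change(baseline, endpoint)
resid_change = residual_change(baseline, change)  # identical residuals
print([round(r, 2) for r in resid_endpoint])
```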
Patient Characteristics
                              Mean    S.D.   Range
BPRS change                   -5.1    13.4   -58, 55
Residual change                0      13.1   -45, 65
Baseline BPRS                 35.9    11.0    14, 86
Age (years)                   38.7    10.1    18, 72
Duration of illness (years)   16.0     9.6     0, 54
Baseline Positive Sx          12.7     3.4     4, 24
Baseline Depression            5.5     3.3     0, 17
Baseline CGI-S                 4.8     0.8     3, 7
Baseline Anergia               6.0     3.4     0, 18
Baseline AIMS                  0.4     0.6     0, 4
The Challenge of Personalized Medicine revisited
• Can we identify subgroups for which the drug is more effective than placebo or other drugs?
• Are there subsets for which a low dose is better than placebo?
• Are there subsets for which a high dose is better than a low dose, or vice versa?
Conventional tree methods can only answer these questions indirectly.
In conventional modelling:
- The X space is defined by one sample.
- We estimate the conditional mean of a response variable given a set of predictors.
Comparative efficacy
Subsets where:
• the drug is more effective than placebo or other drugs
• a low dose is better than placebo
• a high dose is better than a low dose, or vice versa
In these situations:
- The X space is defined by two or more samples.
- We estimate the conditional difference of means or, in general, a function of the conditional means.
- We extend ARF to the differences between two or more means.
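To make the shift from a conditional mean to a conditional difference of means concrete, here is a minimal sketch. For a fixed subset (a region of the X space), it compares the mean response between two samples (arms) inside that region. A single-arm conditional mean would simply be the mean of one list. The function name, the dict-based data layout, the subset rule, and the Welch standard error are all our assumptions for illustration. The slides do not state how the ARF extension scores a split, but a statistic like this one could play the role that the single-sample comparison plays for one population.

```python
import math
from statistics import fmean, variance

def subset_difference(rows, in_subset, arm_a, arm_b):
    """Mean response in arm_a minus mean response in arm_b, restricted to the
    rows where in_subset(row) is True, plus a Welch standard error and t."""
    a = [r["y"] for r in rows if in_subset(r) and r["arm"] == arm_a]
    b = [r["y"] for r in rows if in_subset(r) and r["arm"] == arm_b]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two observations per arm in the subset")
    diff = fmean(a) - fmean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, se, diff / se

# Made-up BPRS changes (negative = improvement) for two arms.
rows = [
    {"arm": "drug", "age": 45, "y": -12},
    {"arm": "drug", "age": 50, "y": -8},
    {"arm": "drug", "age": 30, "y": 2},
    {"arm": "placebo", "age": 42, "y": -2},
    {"arm": "placebo", "age": 55, "y": 0},
    {"arm": "placebo", "age": 28, "y": -6},
]

# Hypothetical region of the X space: patients aged 40 or older.
diff, se, t = subset_difference(rows, lambda r: r["age"] >= 40, "drug", "placebo")
print(f"difference = {diff:.1f}, SE = {se:.2f}, t = {t:.2f}")
```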
47th Interscience Conference on Antimicrobial Agents and Chemotherapy, Chicago, September 17-20, 2007
Symptom Resolution with Azithromycin Extended Release Versus Amoxicillin/Clavulanate in Patients with Acute Sinusitis in a General Practice Physician Environment
J. F. Piccirillo (1), B. F. Marple (2), C. S. Roberts (3), J. R. Frytak (4), V. F. Schabert (5), J. C. Wegner (4), H. Bhattacharyya (3), S. P. Sanchez (3)
(1) Washington University School of Medicine, St Louis, MO; (2) University of Texas Southwestern Medical Center, Dallas, TX; (3) Pfizer Inc, New York, NY; (4) i3 Innovus, Eden Prairie, MN; (5) Integral Health Decisions Inc, Santa Barbara, CA
Sample Characteristics
Ziprasidone: 120 mg/160 mg vs Placebo
MULTIRESPONSE CART
[Figure: multiresponse CART tree with 10 terminal nodes. Root split BASEDEP < 2.5; other splits DURATILL >= 16.5, BASEPOS < 15.5, BCGIS < 5.5, DURATILL < 7.5, DURATILL >= 3.5, RACE = bde, DURATILL >= 13, ANERGIA >= 8.5. Terminal-node values range from -8.5192 to 14.3053, with node sizes from n=26 to n=94.]
Ziprasidone: 120 mg/160 mg vs Placebo, Top Three Splits
[Figure: three panels plotting the response (y) against BASEDEP, DURATILL and BASEPOS.]
Software
• These two ARF applications are being incorporated into PfarMineR, a suite of statistical methods for EDA and data mining
• ARF is available at: http://www.rci.rutgers.edu/~cabrera/dm/DM.html