30.August 2010 Cutpoint Analysis with Bootstrap Validation for the Cox PH Model These slides present an overview of a new method Pro- fessor Kim and I developed and is presented in detail in Chapter 6.3.8 of our textbook. OUTLINE: • A Procedure to Identify the Cutpoint (Threshold) for a Continuous Prognostic Factor • Robustness of this Procedure Validated via Boot- strap • Application of the Cutpoint Analysis to the CNS Lymphoma Data • Results • Discussion • References
CNS (Central Nervous System) Lymphoma Data The following data set was first analyzed by Dahlborg, et al. (1996) 1. GROUP: 0=no prior radiation with respect to 1st blood brain barrier disruption (BBBD) procedure to deliver chemotherapy; 1=prior radiation 2. SEX: 0=male; 1=female 3. AGE: at time of 1st BBBD, recorded in years; instead, AGE60 = 1 if AGE ≤ 60 and = 0 otherwise 4. STATUS: 0=alive; 1=dead 5. Response: B3TODeath, time from 1st BBBD to death in years 6. KPS.PRE.: Karnofsky performance score before 1st BBBD, numerical value 0 − 100 7. LESSING: Lesions: single=0; multiple=1 8. LESDEEP: Lesions: superficial=0; deep=1 9. LESSUP: Lesions: supra=0; infra=1; both=2 10. PROC: Procedure: subtotal resection=1; biopsy=2; other=3 11. CHEMOPRIOR: no=0; yes=1 5
A Procedure to Identify the Cutpoint • Discretizing a continuous variable, initiated by Thomsen (1988), makes the model more in- terpretable. • When the cutpoint is not readily available, statistical methods that determine the cutpoint need to be used. � 0 if KPS.PRE. < θ K = 1 if KPS.PRE. ≥ θ . • Profile likelihood function is defined as θ → sup η L( θ, η | x 1 , . . . , x n ) , (4) where the supremum is taken over all possible values of η , and x 1 , . . . , x n are the realizations of the random variables X 1 , . . . , X n . See van der Vaart (1998). • Treat the values of KPS.PRE. as the values of θ and the β in the Cox PH model as η in the above usual setting. For various values of θ we fit a Cox PH model with the five covariates GROUP, SEX, AGE60, SEX:AGE60, and K . For each θ value, a profile log-likelihood value is obtained. The value of θ which maximizes the profile log-likelihood function is the desired cut point (threshold). The maximum must occur at one of the 58 KPS.PRE. scores. • Dahlborg et al. (1996) had a main effects model GROUP, SEX, KPS.PRE. • cutpt.coxph has three arguments: ( object , data , q ), where object is a coxph object, data is a data frame, and q is a vector of various quantiles. 6
> cns2.temp <- move.col(cns2,1,cns2,9) # Moves KPS.PRE. in col. 9 to col. 1. > cns2.temp$K <- rbinom(nrow(cns2),1,.5) # Creates a place holder for K > temp <- coxph(Surv(B3TODEATH,STATUS)~GROUP+SEX+AGE60+SEX:AGE60+K,data=cns2.temp) # automatically provides the necessary formula within cutpt.coxph > cns2cutpt <- cutpt.coxph(object=temp,data=cns2.temp,q=seq(0.05,0.95,0.05)) # output of cutpt.coxph > plot(cns2cutpt$out,type="l",lwd=2,col=1,xlab="KPS.PRE.", ylab="Profile log-likelihood") # plot of quantiles versus corresponding # profile log-likelihoods -112 -113 Profile log-likelihood -114 -115 50 60 70 80 90 100 KPS.PRE. Figure 1: Profile log-likelihoods at quantiles of KPS.PRE. -114 -116 Profile log-likelihood -118 -120 50 60 70 80 90 100 KPS.PRE. Figure 2: Profile log-likelihoods at quantiles of KPS.PRE. in main effects model 7
Robustness of Identifying the Cutpoint • Efron (1998) cautions that the validity of the selected model through currently available methods may be doubtful in certain situations. Efron and Gong (1983) found that from 500 bootstrap sets of data there was only one match to the originally selected model. Further, only one variable in the originally selected model appeared in more than half (295) of the bootstrap set based models. • The bootstrap method was first introduced by Efron (1979). For censored survival data, see Davison and Hinkley (1999). 1. Select n subjects from the original data with replacement with equal probability. 2. Run your model with this bootstrap sample and keep the statistics of interest. 3. Repeat this process B times. • No need to use all 58 values, but to use the sample quantiles of each bootstrap sample. Here I used .05, .10, . . . , .95 quantiles. > temp1 <- bootstrap(cns2,cutpt.coxph(object=cns2cutpt1,data=cns2, q=seq(.05,.95,.05))$cutpt,B=1000) # Bootstrap output > plot(temp1,xlab="cut points",ylab="density", main="Bootstrap Density Histogram of Cutpoints") # Density histogram # of bootstrap profile log-likelihoods at 19 quantiles 8
• Figure 3, shows a very similar shape to the profile likelihood in Figure 1. Bootstrap Density Histogram of Cut Points 0.06 0.05 0.04 density 0.03 0.02 0.01 0.00 40 50 60 70 80 90 100 cut points Figure 3: Bootstrap density histogram of the maximum profile log-likelihood estimates of θ . • Figure 4 shows a very similar shape to the profile log-likelihood in Figure 2. This again demonstrates the robustness of the profile likelihood approach to identify the cutpoint. Bootstrap Density Histogram of Cut Points 0.4 0.3 density 0.2 0.1 0.0 50 60 70 80 90 100 cut points Figure 4: Bootstrap density histogram of the maximum profile likelihood estimates of θ in main effects model 9
Application of Cutpoint Analysis • Fit the Cox PH model with GROUP, SEX, AGE60, SEX:AGE60, and K where � 0 if KPS.PRE. < 90 K = 1 if KPS.PRE. ≥ 90. > cns2.temp$K <- as.integer(cns2.temp$KPS.PRE.>=90) # creates the indicator variable K > cns2cutpt1 <- coxph(Surv(B3TODEATH,STATUS) ~ GROUP+SEX+AGE60+SEX:AGE60+K, data=cns2.temp) > summary(cns2cutpt1) coef exp(coef) se(coef) z p GROUP 1.285 3.616 0.373 3.45 0.00056 SEX -1.602 0.202 0.676 -2.37 0.01800 AGE60 -0.713 0.490 0.481 -1.48 0.14000 K -1.162 0.313 0.420 -2.77 0.00560 SEX:AGE60 1.582 4.864 0.864 1.83 0.06700 Likelihood ratio test = 27.2 on 5 df, p=0.0000524 10
Results Table 1. Continuous KPS.PRE. Dichotomized KPS.PRE.= K coefficient p -value coefficient p -value GROUP 1.1592 0.0022 GROUP 1.285 0.00056 SEX − 2 . 1113 0.0026 SEX − 1 . 602 0.01800 AGE60 − 1 . 0538 0.0210 AGE60 − 0 . 713 0.14000 SEX:AGE60 2.1400 0.0120 SEX:AGE60 1.582 0.06700 KPS.PRE. − 0 . 0307 0.0028 − 1 . 162 0.00560 K LRT = 27.6 .00004 LRT = 27.2 0.00005 • According to the LRT criterion in Table 1, the overall fit is not improved by using K . • K has about the same p -value as KPS.PRE had. • GROUP becomes stronger, whereas it makes the effects of all others weaker. In particular, AGE60 and SEX:AGE60 have p -values .140 and .067. This is contradictory to the clear interaction effect shown in Figure 7. Therefore, dichotomizing KPS.PRE. is not recommended. Table 2. Continuous KPS.PRE. Dichotomized KPS.PRE.= K coefficient p -value coefficient p -value GROUP .7785 .028 GROUP 1.089 .00031 SEX − . 7968 .052 SEX − . 665 .086 KPS.PRE. − . 0347 .00056 K − 1 . 396 .00024 LRT = 20.3 .000146 LRT = 23.5 .000032 • According to the LRT criterion in Table 2, the overall fit is improved using the variable K . • K has about the same p -value as KPS.PRE had. • GROUP becomes much stronger, whereas SEX becomes a bit weaker. SEX was originally marginally significant ( p -value = .052) in the main effects model and so this does not cause specific concern. 11
Discussion • Bootstrap validation demonstrates the robustness of the profile likelihood approach to identify the cutpoint. See Figures 3 and 4. • Martingale residual plot developed by Therneau, Grambsch, and Fleming (1990): Figure 5 for the two-way interaction model shows only a single line with a little bump, whereas Figure 6 based on the main effects model shows two parallel lines. 1 0 Martingale residual -1 -2 40 50 60 70 80 90 100 KPS.PRE. Figure 5: Martingale residuals to look for best functional form of the continuous covariate KPS.PRE. 1 0 Martingale residual -1 -2 -3 40 50 60 70 80 90 100 KPS.PRE. Figure 6: Martingale residuals to look for best functional form of the continuous covariate KPS.PRE. in main effects model 12
• Cutpoint analysis might be useful when one still tries to improve the current model. See Fig- ure 7. In the main effects model there is room for improvement. By having the dichotomized variable K , the new model improves not only the strength of the variable GROUP but the overall model fit. The interaction model does not. See the results. 1.4 1.4 F 1.5 1.2 1.2 F F 1st quartile survival time 1.0 1.0 median survival time mean survival time 1.0 0.8 0.8 M 0.6 0.6 M 0.5 0.4 0.4 M 0.2 0.2 0.0 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 AGE60 AGE60 AGE60 Figure 7: Interaction between SEX and AGE60 adjusted for KPS.PRE. and GROUP via coxph and then evaluated at GROUP=1 and KPS.PRE.=80. • Altman (1998) notes a possible loss of power to detect actual significance and biased estimates in regression settings. • A dominant peak in the profile log-likelihood seems to suggest the existence of a significant cutpoint. • Is this observation really true? A simulation study is required. 13
Recommend
More recommend