14th EYSM Debrecen, 24 August 2005
SEMIPARAMETRIC MODELS IN SURVIVAL ANALYSIS USING BETA PROCESS: PROPORTIONAL HAZARDS MODEL
Pierpaolo De Blasi
Università Luigi Bocconi di Milano, Italy
Joint work with: Nils Lid Hjort (University of Oslo)
SURVIVAL ANALYSIS
Let T be a survival time, with cumulative df F(t) on [0, ∞). In survival analysis the basic quantity of interest is the hazard rate:
$$\alpha(t) = \lim_{h \downarrow 0} h^{-1} P\{T \le t + h \mid T > t\},$$
so that $F(t) = 1 - \exp\{-\int_0^t \alpha(s)\,ds\}$.
Nonparametric estimation deals with the cumulative hazard:
$$A(t) = \int_0^t \frac{dF(s)}{F[s, \infty)}.$$
Correspondence formula: $F(t) = 1 - \prod_{[0,t]}\{1 - dA(s)\}$, which involves the product integral. With continuous survival times $A(t) = \int_0^t \alpha(s)\,ds$; otherwise A has jumps in (0, 1).
The hazard rate concept allows covariate information to be introduced into the individual survival distribution through hazard regression models:
• Proportional hazards model
• Additive hazards model
see Andersen, Borgan, Gill and Keiding (1993).
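A minimal numerical sketch of the correspondence formula in Python, stdlib only. The function names and the list-of-pairs input are hypothetical. For a purely discrete A the product integral is a finite product over the jumps. For a continuous hazard, refining the grid makes $\prod\{1 - \alpha(s)\,ds\}$ approach $\exp\{-\int_0^t \alpha(s)\,ds\}$.

```python
import math

def survival_from_discrete_hazard(jumps):
    """F(t) = 1 - prod_{[0,t]} {1 - dA(s)} for a purely discrete cumulative hazard.

    `jumps` is a list of (time, dA) pairs with 0 <= dA <= 1 (hypothetical format).
    Returns (time, F(time)) at each jump time, in increasing time order.
    """
    surv = 1.0
    out = []
    for t, da in sorted(jumps):
        surv *= 1.0 - da              # one factor of the product integral
        out.append((t, 1.0 - surv))
    return out

def survival_from_continuous_hazard(alpha, t, steps=10_000):
    """Approximate F(t) by the product integral over a fine grid of (0, t]."""
    ds = t / steps
    surv = 1.0
    for k in range(steps):
        s = (k + 0.5) * ds            # midpoint of the k-th grid cell
        surv *= 1.0 - alpha(s) * ds
    return 1.0 - surv

approx = survival_from_continuous_hazard(lambda s: 0.5, 2.0)
exact = 1.0 - math.exp(-1.0)          # 1 - exp{-int_0^2 0.5 ds}
```

With the constant hazard 0.5 on (0, 2], the grid approximation agrees with $1 - e^{-1} \approx 0.632$ to about four decimals.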
COUNTING PROCESS FORMULATION
Let $(T_1, \delta_1), \dots, (T_n, \delta_n)$ be a sample of right-censored survival times, with failure indicators $\delta_i$ ($\delta_i = 0$ for a censored observation). Define
• the counting process $N_i(t) = I\{T_i \le t, \delta_i = 1\}$;
• the at-risk indicator process $Y_i(t) = I\{T_i \ge t\}$.
Multiplicative Intensity Model (Aalen 1978): $N_i$ can be decomposed into the sum of its cumulative intensity process $\Lambda_i(t) = \int_0^t Y_i(s)\,dA(s)$ and a martingale noise $M_i$:
$$dN_i(t) = Y_i(t)\,dA(t) + dM_i(t).$$
With $N_\cdot = \sum_i N_i$ and $Y_\cdot = \sum_i Y_i$, the Nelson-Aalen estimator of A and the Kaplan-Meier estimator of F are defined as
$$\hat A^*(t) = \sum_{t_i \le t} \frac{\delta_i}{Y_\cdot(t_i)} = \int_0^t \frac{dN_\cdot(s)}{Y_\cdot(s)}, \qquad \hat F^*(t) = 1 - \prod_{[0,t]}\Big\{1 - \frac{dN_\cdot(s)}{Y_\cdot(s)}\Big\}.$$
Probabilistic interpretation: $dN_\cdot(t) \mid \mathcal F_{t-} \sim \text{Binomial}(Y_\cdot(t), dA(t))$.
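Both estimators can be computed in one pass over the distinct failure times. Here is a minimal Python sketch; the function name and the list inputs are hypothetical. Ties are handled by grouping the failures that share a time into $dN_\cdot$.

```python
def nelson_aalen_kaplan_meier(times, deltas):
    """Nelson-Aalen A*(t) and Kaplan-Meier F*(t) at each distinct failure time.

    times:  observed (right-censored) times
    deltas: failure indicators (1 = failure, 0 = censored)
    Returns a list of (t, A_hat, F_hat).
    """
    failure_times = sorted(set(t for t, d in zip(times, deltas) if d == 1))
    a_hat, surv = 0.0, 1.0
    out = []
    for u in failure_times:
        y = sum(1 for t in times if t >= u)                               # Y.(u)
        dn = sum(1 for t, d in zip(times, deltas) if t == u and d == 1)   # dN.(u)
        a_hat += dn / y               # Nelson-Aalen increment dN./Y.
        surv *= 1.0 - dn / y          # Kaplan-Meier product-integral factor
        out.append((u, a_hat, 1.0 - surv))
    return out
```

For times (1, 2, 2, 3, 5) with δ = (1, 1, 0, 1, 0), it gives $\hat A^* = 0.2, 0.45, 0.95$ and $\hat F^* = 0.2, 0.4, 0.7$ at t = 1, 2, 3.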
BETA PROCESS
Nonparametric Bayes analysis for F, or for A, requires a nonparametric prior for F, on the space $\mathcal F$ of all survival distributions, or for A, on the space $\mathcal A$ of all cumulative hazard functions. Hjort (1990) introduced the so-called beta process. Let c(t) be a positive, piecewise continuous function on [0, ∞) and $A_0 \in \mathcal A$. Then $A \sim \text{Beta}\{c(\cdot), A_0(\cdot)\}$ has independent increments with
$$dA(s) \sim \text{beta}\big(c(s)\,dA_0(s),\; c(s)\{1 - dA_0(s)\}\big).$$
It admits the Lévy representation
$$E\exp\{-\theta A(t)\} = \exp\Big\{-\int_0^1\!\!\int_0^t (1 - e^{-\theta s})\,\underbrace{s^{-1}(1-s)^{c(x)-1}\,c(x)\,dA_0(x)\,ds}_{\text{Lévy measure}}\Big\}$$
and has jumps in (0, 1). Also
$$E A(t) = A_0(t) \quad\text{and}\quad \mathrm{Var}\,A(t) = \int_0^t \frac{dA_0(s)}{1 + c(s)} \quad (\text{continuous } A_0).$$
Special case: when $c(s) = c\exp\{-A_0(s)\}$, F is a Dirichlet process.
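Hjort (1990) builds the beta process as a limit of time-discrete processes with independent beta increments. The sketch below applies that idea on a grid of m cells and checks the first two moments by simulation. Constant c and $A_0(t) = t$ are assumptions chosen for illustration, and the function name is hypothetical. On a grid the variance picks up the factor $1 - dA_0$, which vanishes as the grid is refined.

```python
import random
import statistics

def simulate_beta_process(c, a0_increments, rng):
    """Value at the right end of the grid of one time-discretized beta process path.

    Each cell gets an independent increment dA ~ beta(c * dA0, c * (1 - dA0)),
    where dA0 is the increment of the prior guess A0 on that cell (constant c assumed).
    """
    total = 0.0
    for da0 in a0_increments:
        total += rng.betavariate(c * da0, c * (1.0 - da0))
    return total

rng = random.Random(1990)
m, c = 100, 5.0
a0_inc = [1.0 / m] * m                 # A0(t) = t on [0, 1]: unit exponential prior guess
draws = [simulate_beta_process(c, a0_inc, rng) for _ in range(1000)]
mean_a1 = statistics.fmean(draws)      # compare with E A(1) = A0(1) = 1
var_a1 = statistics.variance(draws)    # compare with sum dA0 (1 - dA0) / (c + 1) = 0.165
```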
Posterior distribution: conjugacy to right-censored data,
$$A \mid \text{data} \sim \text{Beta}\{c(\cdot) + Y_\cdot(\cdot),\ \hat A(\cdot)\},$$
with
$$\hat A(t) = E\{A(t) \mid \text{data}\} = \int_0^t \frac{c(s)\,dA_0(s) + dN_\cdot(s)}{c(s) + Y_\cdot(s)}.$$
Also $\hat F(t) = E\{F(t) \mid \text{data}\} = 1 - \prod_{[0,t]}\{1 - d\hat A(s)\}$.
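The posterior mean blends the prior guess with the Nelson-Aalen increments, and c sets the weights. Below is a Python sketch that assumes, for illustration only, a constant c and a linear prior guess $A_0(s) = a_0 s$. The parameter names `c` and `a0_rate` are hypothetical. Because this $A_0$ is continuous, the integral splits into a smooth part $\int c\,dA_0/(c + Y_\cdot)$ and jumps $dN_\cdot/(c + Y_\cdot)$ at the failure times.

```python
def posterior_mean_cumhaz(times, deltas, t, c, a0_rate):
    """A_hat(t) = int_0^t [c dA0(s) + dN.(s)] / [c + Y.(s)] with A0(s) = a0_rate * s.

    Y.(s) = #{T_i >= s} is constant on (u_prev, u] between consecutive distinct times.
    """
    pts = sorted(set(times))
    cont, prev = 0.0, 0.0
    for u in pts + [float("inf")]:
        upper = min(u, t)
        if c > 0 and upper > prev:
            y = sum(1 for T in times if T >= upper)       # Y.(s) on (prev, upper]
            cont += c * a0_rate * (upper - prev) / (c + y)
        if u >= t:
            break
        prev = u
    jumps = 0.0
    for u in pts:
        if u > t:
            break
        dn = sum(1 for T, d in zip(times, deltas) if T == u and d == 1)
        if dn:
            y = sum(1 for T in times if T >= u)
            jumps += dn / (c + y)                          # jump dN./(c + Y.) at u
    return cont + jumps
```

Letting c → 0 (c = 0 in the code) recovers the Nelson-Aalen estimator. For times (1, 2, 2, 3, 5) with δ = (1, 1, 0, 1, 0), that gives 0.95 at t = 4. A very large c returns essentially $A_0(t)$.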
PROPORTIONAL HAZARDS MODEL
Consider lifetimes drawn from a non-homogeneous population: data take the form $(T_1, \delta_1, X_1), \dots, (T_n, \delta_n, X_n)$, where X is a covariate vector. Assume that the measurements $X_i$ influence the individual's hazard rate according to
$$\alpha_i(t) = r(X_i^t \beta)\,\alpha_0(t). \qquad \text{(prop haz)}$$
• β is a vector of coefficients;
• the relative risk function r(·) is a nonnegative function (values in [0, ∞)), often assumed to be 1 at zero;
• $\alpha_0(\cdot)$ is a baseline hazard rate, usually left unspecified ⇒ semiparametric analysis.
Cox regression model (Cox 1972): it postulates the exponential function for r(·):
$$\alpha_i(t) = e^{X_i^t \beta}\,\alpha_0(t). \qquad \text{(oldcox)}$$
The model is presented for absolutely continuous failure times.
# How can it be extended to accommodate discrete components?
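To make (prop haz) concrete, here is a Python simulation sketch that draws right-censored data from the model with a scalar covariate. The constant baseline $\alpha_0(t)$ = `base_rate`, the exponential censoring, and all names are illustrative assumptions, not part of the slides. With a constant baseline, each lifetime is exponential with rate `base_rate * r(x * beta)`, which makes sampling immediate.

```python
import math
import random

def simulate_prop_hazards(xs, beta, base_rate, censor_rate, r, rng):
    """Right-censored data from alpha_i(t) = r(x_i * beta) * alpha_0(t), scalar covariate.

    Assumes alpha_0(t) = base_rate (constant) and exponential(censor_rate) censoring.
    Returns a list of (T_i, delta_i, x_i).
    """
    data = []
    for x in xs:
        life = rng.expovariate(base_rate * r(x * beta))   # exponential lifetime
        cens = rng.expovariate(censor_rate)               # censoring time
        data.append((min(life, cens), int(life <= cens), x))
    return data

cox_r = math.exp                       # oldcox: r(y) = e^y
rng = random.Random(1972)
# censor_rate = 1e-12 makes censoring practically absent in this example
data = simulate_prop_hazards([1.0] * 20000, 0.5, 0.2, 1e-12, cox_r, rng)
mean_t = sum(t for t, _, _ in data) / len(data)
```

With r = exp (the Cox model), β = 0.5, x = 1 and base rate 0.2, the mean lifetime should be close to $1/(0.2\,e^{0.5}) \approx 3.03$.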
BOUNDED RISK FACTOR
Idea 1: $dA_i(s) = dA_0(s)\exp(X_i^t\beta)$. Take a beta process for $A_0$ and a prior for β. This does not work, since we need jumps in (0, 1), and $dA_0(s)\exp(X_i^t\beta)$ may exceed 1.
Idea 2: $dA_i(s) = 1 - \{1 - dA_0(s)\}^{\exp(X_i^t\beta)}$. Take a beta process for $A_0$ and a prior for β. This works well; see Hjort (1990), Laud, Damien and Smith (1998), Kim and Lee (2003, 2004).
"New Cox" model: it postulates the logistic function for r(·):
$$\alpha_i(s) = \frac{e^{X_i^t\gamma}}{1 + e^{X_i^t\gamma}}\,\alpha_0(s). \qquad \text{(newcox)}$$
Since the relative risk function $r(y) = e^y/(1 + e^y)$ is bounded by 1, Idea 1 is fine:
$$dA_i(s) = dA_0(s)\,\frac{e^{X_i^t\gamma}}{1 + e^{X_i^t\gamma}}.$$
Use a beta process for $A_0$ and a prior π(γ) for γ.
# How can we justify the boundedness condition?
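The difference between the two ideas already shows up at a single time point. This Python sketch computes the hazard increment of individual i from a baseline increment $dA_0$. The helper names are hypothetical, and η stands for $X_i^t\beta$ or $X_i^t\gamma$. Idea 1 can leave [0, 1] while Idea 2 cannot, and the New Cox increment under Idea 1 is bounded by $dA_0$.

```python
import math

def idea1_increment(da0, eta):
    """Idea 1: dA_i = dA0 * exp(eta); exceeds 1 when dA0 is large and eta > 0."""
    return da0 * math.exp(eta)

def idea2_increment(da0, eta):
    """Idea 2: dA_i = 1 - (1 - dA0)^exp(eta); stays in [0, 1] for dA0 in [0, 1]."""
    return 1.0 - (1.0 - da0) ** math.exp(eta)

def newcox_increment(da0, eta):
    """New Cox with Idea 1: dA_i = dA0 * e^eta / (1 + e^eta), written to avoid overflow."""
    return da0 / (1.0 + math.exp(-eta))
```

For small $dA_0$ the two ideas agree to first order, since $1 - (1 - dA_0)^{e^\eta} \approx e^\eta\,dA_0$.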
(Aalen and Hjort 2002, Hjort 2003)
Frailty models that yield proportional hazards
Individuals are exposed to an unobservable cumulative damage process of the form
$$Z(t) = \sum_{j \le M(t)} G_j,$$
where $G_1, G_2, \dots$ are iid nonnegative random variables, interpreted as adding over time to the cumulative hazard level of the individual, while M(t) is a Poisson process with cumulative intensity $\Lambda(t) = \int_0^t \lambda(s)\,ds$. Given the damage history,
$$F(t, \infty \mid \mathcal F_{t-}) = \exp\{-Z(t)\}.$$
The unconditional survival function then takes the form
$$F(t, \infty) = E\exp\{-Z(t)\} = \exp\{-(1 - Ee^{-G_1})\,\Lambda(t)\}.$$
The covariate $X_i$ may enter the specification of
• the individual-specific Poisson rate $\lambda_i$;
• the distribution of the $G_{i,j}$.
With a common Poisson rate λ(t), the individual hazard rate is given by
$$\alpha_i(t) = (1 - Ee^{-G_{i,1}})\,\lambda(t).$$
Writing $1 - Ee^{-G_{i,1}} \equiv r(x_i^t\gamma)$ gives $0 \le r(x_i^t\gamma) \le 1$.
The logistic function is a reasonable choice!
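Here is a Monte Carlo check of the unconditional survival formula, written as a Python sketch. For illustration only, it assumes a constant Poisson rate, so $\Lambda(t) = \lambda t$, and exponential damages $G_j$ with mean μ. Under that assumption $Ee^{-G} = 1/(1+\mu)$, hence $1 - Ee^{-G} = \mu/(1+\mu)$. Taking $\mu = e^{x_i^t\gamma}$ is then one illustrative way to arrive at exactly the logistic relative risk. The function names are hypothetical.

```python
import math
import random

def simulate_survival_prob(t, lam, mean_g, n_sim, rng):
    """Monte Carlo estimate of F(t, inf) = E exp{-Z(t)}, Z(t) = sum_{j <= M(t)} G_j.

    Illustrative assumptions: M is a homogeneous Poisson process with rate lam,
    so Lambda(t) = lam * t, and the G_j are iid exponential with mean mean_g.
    """
    total = 0.0
    for _ in range(n_sim):
        # number of Poisson arrivals in [0, t], from exponential inter-arrival times
        m, clock = 0, rng.expovariate(lam)
        while clock <= t:
            m += 1
            clock += rng.expovariate(lam)
        z = sum(rng.expovariate(1.0 / mean_g) for _ in range(m))  # damage Z(t)
        total += math.exp(-z)
    return total / n_sim

def closed_form_survival(t, lam, mean_g):
    """exp{-(1 - E e^{-G}) Lambda(t)} with E e^{-G} = 1 / (1 + mean_g) for exponential G."""
    r = mean_g / (1.0 + mean_g)       # 1 - E e^{-G}: always in [0, 1)
    return math.exp(-r * lam * t)

rng = random.Random(2002)
mc = simulate_survival_prob(2.0, 1.5, 0.8, 20_000, rng)
exact = closed_form_survival(2.0, 1.5, 0.8)   # exp(-4/3), about 0.264
```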
$$\alpha_i(s) = \frac{e^{X_i^t\gamma}}{1 + e^{X_i^t\gamma}}\,\alpha_0(s) \qquad \text{(newcox)}$$
# Why should New Cox be preferred to the Cox model?
• The exponential form for r(·) is mostly a matter of tradition and convenience. The logistic function may be appropriate as well.
• It may achieve a better fit on real data, which leaves room for goodness-of-fit checks:
– Nelson-Aalen plots of $\hat Z_i = \hat A^*_0(t_i)\,r(X_i^t\hat\gamma)$ for the two r functions. When the model is correct, the $\hat Z_i$'s behave almost like right-censored lifetimes from the unit exponential distribution.
– "Very General Cox" model:
$$\alpha_i(s) = \frac{e^{X_i^t\gamma}}{\{1 + e^{X_i^t\gamma}\}^{\kappa}}\,\alpha_0(s) \qquad \text{(very gen)}$$
and check whether $\hat\kappa$ is closer to 0 (Cox) or 1 (New Cox).
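The Very General Cox relative risk interpolates between the two models. The small Python sketch below uses a hypothetical function name and computes it on the log scale with `math.log1p`, which avoids forming $(1+e^y)^\kappa$ directly. Setting κ = 0 recovers oldcox and κ = 1 recovers newcox.

```python
import math

def r_very_general(y, kappa):
    """Relative risk e^y / (1 + e^y)^kappa, computed on the log scale.

    kappa = 0: Cox model, r(y) = e^y (unbounded).
    kappa = 1: New Cox model, r(y) = e^y / (1 + e^y) (bounded by 1).
    """
    return math.exp(y - kappa * math.log1p(math.exp(y)))

# relative risk at a few linear-predictor values for the two special cases
grid = [-2.0, 0.0, 2.0, 4.0]
old = [r_very_general(y, 0.0) for y in grid]
new = [r_very_general(y, 1.0) for y in grid]
```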
REAL DATA EXAMPLE
[Figure 1. Left panel "Thickness: goodness-of-fit of NEW vs OLD": hatZ_i against time, curves for newcox and oldcox. Right panel "Thickness: maximum of logLike(gamma, kappa)": profile log_like against kappa.]
Figure 1: Danish melanoma data, n = 205, covariate "thickness". (Left) Nelson-Aalen plots of $\hat Z_i = \hat A^*_0(t_i)\,r(x_i^t\hat\gamma)$. (Right) Very General Cox: profile log-likelihood for the κ parameter, $\hat\kappa = 1.008$.
[Figure 2. Panel "Thickness: estimated relative risk": r(x' hat_gamma) against x, curves for oldcox, newcox and very general.]
Figure 2: Danish melanoma data, n = 205, covariate "thickness". Estimated relative risk for oldcox and newcox versus very gen.
FREQUENTIST ANALYSIS
The estimator $\hat\gamma$ is the maximizer of the partial likelihood
$$PL_n(\gamma) = \prod_{i=1}^n \Bigg\{\frac{r(x_i^t\gamma)}{\underbrace{n^{-1}\sum_{j=1}^n Y_j(t_i)\,r(x_j^t\gamma)}_{S_n^{(0)}(t_i,\gamma)}}\Bigg\}^{\delta_i},$$
i.e. $\partial \log PL_n(\hat\gamma)/\partial\gamma = 0$, where
$$\log PL_n(\gamma) = \sum_{i=1}^n \int_0^\infty \{\log r(x_i^t\gamma) - \log S_n^{(0)}(s,\gamma)\}\,dN_i(s).$$
The Aalen-Breslow estimator for $A_0$ is given by
$$\hat A^*_0(t) = \int_0^t \frac{\sum_{i=1}^n dN_i(s)}{n S_n^{(0)}(s,\hat\gamma)}.$$
Assume that the regression model is correct. Using martingale theory (Prentice & Self 1983), we may prove the asymptotic normality of the MPL estimator of γ:
$$n^{1/2}(\hat\gamma - \gamma) \xrightarrow{d} N_p(0, \Sigma(\gamma)^{-1}),$$
where
$$\Sigma(\gamma) = \int_0^\infty v(s,\gamma)\,s^{(0)}(s,\gamma)\,\alpha_0(s)\,ds$$
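What follows is a direct transcription of these formulas into Python for a scalar covariate and the logistic r. The function names are hypothetical. The golden-section maximizer is a simple stand-in for a proper optimizer, and it assumes that log PL is unimodal on the search bracket; neither the maximizer nor that assumption comes from the slides. Note that $nS_n^{(0)}(t_i,\gamma) = \sum_j Y_j(t_i)\,r(x_j^t\gamma)$, so each Breslow increment is $1/\sum_j Y_j(t_i)\,r(x_j^t\gamma)$.

```python
import math

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

def log_partial_likelihood(gamma, times, deltas, xs, r=logistic):
    """log PL_n(gamma) = sum_i delta_i [log r(x_i gamma) - log S_n^(0)(t_i, gamma)]."""
    n = len(times)
    total = 0.0
    for ti, di, xi in zip(times, deltas, xs):
        if di:
            s0 = sum(r(xj * gamma) for tj, xj in zip(times, xs) if tj >= ti) / n
            total += math.log(r(xi * gamma)) - math.log(s0)
    return total

def breslow(t, gamma, times, deltas, xs, r=logistic):
    """Aalen-Breslow A*_0(t) = sum_{t_i <= t} delta_i / (n S_n^(0)(t_i, gamma))."""
    total = 0.0
    for ti, di in zip(times, deltas):
        if di and ti <= t:
            total += 1.0 / sum(r(xj * gamma) for tj, xj in zip(times, xs) if tj >= ti)
    return total

def maximize_1d(f, lo, hi, tol=1e-8):
    """Golden-section search; assumes f is unimodal on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):               # maximizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                         # maximizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0
```

A fit would then read `gamma_hat = maximize_1d(lambda g: log_partial_likelihood(g, times, deltas, xs), -5.0, 5.0)`, followed by `breslow(t, gamma_hat, times, deltas, xs)`.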
with $v = s^{(2)}/s^{(0)} - e e^t$ and $e = s^{(1)}/s^{(0)}$. Here $s^{(k)}$ is the limit function of $S_n^{(k)}$, k = 0, 1, 2, where
$$S_n^{(1)}(s,\gamma) = \frac{\partial}{\partial\gamma} S_n^{(0)}(s,\gamma) = n^{-1}\sum_{i=1}^n Y_i(s)\,r(x_i^t\gamma)\,[1 - r(x_i^t\gamma)]\,x_i,$$
$$S_n^{(2)}(s,\gamma) = n^{-1}\sum_{i=1}^n Y_i(s)\,r(x_i^t\gamma)\,[1 - r(x_i^t\gamma)]^2\,x_i x_i^t.$$
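For the logistic r one has $r'(y) = r(y)\{1 - r(y)\}$, which is where the factor $r(1-r)$ in $S_n^{(1)}$ comes from. The Python sketch below evaluates $S_n^{(k)}$ and $v_n = S_n^{(2)}/S_n^{(0)} - (S_n^{(1)}/S_n^{(0)})^2$ for a scalar covariate; the names are hypothetical. A finite-difference check of $\partial S_n^{(0)}/\partial\gamma = S_n^{(1)}$ is a cheap way to catch algebra slips.

```python
import math

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

def s_n(k, s, gamma, times, xs):
    """S_n^(k)(s, gamma), k = 0, 1, 2, for the logistic r and a scalar covariate.

    k = 0: n^-1 sum_i Y_i(s) r_i
    k = 1: n^-1 sum_i Y_i(s) r_i (1 - r_i) x_i
    k = 2: n^-1 sum_i Y_i(s) r_i (1 - r_i)^2 x_i^2
    where r_i = r(x_i gamma) and Y_i(s) = I{T_i >= s}.
    """
    total = 0.0
    for t, x in zip(times, xs):
        if t >= s:                    # individual i is at risk at time s
            r = logistic(x * gamma)
            total += r * (1.0 - r) ** k * x ** k
    return total / len(times)

def v_n(s, gamma, times, xs):
    """S^(2)/S^(0) - (S^(1)/S^(0))^2: a weighted variance, hence nonnegative."""
    s0 = s_n(0, s, gamma, times, xs)
    e = s_n(1, s, gamma, times, xs) / s0
    return s_n(2, s, gamma, times, xs) / s0 - e * e

times = [1, 2, 2, 3, 5]
xs = [0.5, -1.0, 2.0, 0.0, 1.0]
h = 1e-5
# central finite difference of S^(0) in gamma, to be compared with S^(1)
fd = (s_n(0, 2.0, 0.7 + h, times, xs) - s_n(0, 2.0, 0.7 - h, times, xs)) / (2 * h)
```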
BAYES ANALYSIS
$$L(A,\gamma) = \prod_t \prod_{i=1}^n \Big\{\frac{e^{x_i^t\gamma}}{1 + e^{x_i^t\gamma}}\,dA(t)\Big\}^{dN_i(t)} \Big\{1 - \frac{e^{x_i^t\gamma}}{1 + e^{x_i^t\gamma}}\,dA(t)\Big\}^{Y_i(t) - dN_i(t)}$$
• Beta process $A \sim \text{Beta}(c, A_0)$;
• prior π(γ) for γ, for example the Jeffreys prior
$$\pi(\gamma) \propto \Big|\frac{1}{n}\sum_{i=1}^n \frac{x_i x_i^t}{(1 + e^{x_i^t\gamma})^2}\Big|^{1/2},$$
which leads to a proper posterior.
# How to do the updating for (A, γ)?
1. posterior of A given γ;
2. posterior of γ: integrate the likelihood with respect to A | (γ, data).
# How does the posterior distribution behave?
3. Bernstein-von Mises theorem for γ.
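The following Python sketch covers the two ingredients for a scalar covariate: the log-likelihood $\log L(A,\gamma)$ and the log Jeffreys prior. To stay short, it assumes that A is purely discrete, with jump sizes `jumps[u]` at a finite set of times u. A continuous part of A would contribute an extra product-integral factor, which is omitted here. All names are hypothetical, and the updating steps 1 and 2 themselves are not shown.

```python
import math

def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))

def log_likelihood(jumps, gamma, times, deltas, xs):
    """log L(A, gamma) for a purely discrete A with jump sizes jumps[u] (assumption).

    Factor per jump time u and individual at risk:
      (p_i dA(u))^{dN_i(u)} * (1 - p_i dA(u))^{Y_i(u) - dN_i(u)},  p_i = logistic(x_i gamma).
    """
    total = 0.0
    for u, da in jumps.items():
        for t, d, x in zip(times, deltas, xs):
            if t < u:
                continue              # Y_i(u) = 0: no contribution
            p = logistic(x * gamma) * da
            if t == u and d == 1:     # dN_i(u) = 1
                total += math.log(p)
            else:                     # at risk, no event at u
                total += math.log1p(-p)
    return total

def log_jeffreys(gamma, xs):
    """Log of the unnormalized Jeffreys prior for scalar gamma:
    0.5 * log( n^-1 sum_i x_i^2 / (1 + e^{x_i gamma})^2 )."""
    info = sum(x * x / (1.0 + math.exp(x * gamma)) ** 2 for x in xs) / len(xs)
    return 0.5 * math.log(info)
```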