Outline

◮ Counting processes
◮ Martingales
◮ Applications to survival analysis:
  ◮ Nelson-Aalen estimate
  ◮ Cox partial likelihood (including time-varying covariates)
Primer: Stieltjes integral

For real functions $f$ and $g$ and $a < b$, the Stieltjes integral is defined as

$$\int_a^b f(x)\, g(dx) = \int_a^b f(x)\, dg(x) = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i)\,[g(x_i) - g(x_{i-1})]$$

where $a = x_0 < x_1 < \cdots < x_n = b$ and the partition is refined so that its mesh tends to zero.

Sufficient condition for existence: $f$ continuous and $g$ of bounded variation (i.e. $g = g_1 - g_2$ where $g_1$ and $g_2$ are monotone functions).

Example: $g$ continuously differentiable:

$$\int_a^b f(x)\, g(dx) = \int_a^b f(x)\, g'(x)\, dx$$
Example: $g$ right-continuous and piecewise constant with jumps at $t_1, \ldots, t_k$ in $[a, b]$:

$$\int_a^b f(x)\, g(dx) = \sum_{l=1}^{k} f(t_l)\,\bigl(g(t_l) - g(t_{l-1})\bigr)$$

(with $t_0 = a$; since $g$ is constant between jumps, $g(t_{l-1})$ equals the left limit $g(t_l-)$).

Example: $g$ piecewise continuously differentiable with jumps at $t_1, \ldots, t_k$ in $[a, b]$ (right-continuous at the jumps):

$$\int_a^b f(x)\, g(dx) = \int_a^b f(x)\, g'(x)\, dx + \sum_{l=1}^{k} f(t_l)\,\bigl(g(t_l) - g(t_l-)\bigr)$$
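The two examples can be checked numerically with the defining sum. The sketch below is only an illustration: the helper name `stieltjes_sum`, the equally spaced partition, and the particular choices of $f$ and $g$ are assumptions made here, not part of the slides.

```python
def stieltjes_sum(f, g, a, b, n):
    """Defining sum of the Stieltjes integral over an equally spaced partition of [a, b]."""
    total = 0.0
    x_prev = a
    for i in range(1, n + 1):
        x_i = a + (b - a) * i / n
        total += f(x_i) * (g(x_i) - g(x_prev))
        x_prev = x_i
    return total


def identity(x):
    return x


def square(x):
    # continuously differentiable g with g'(x) = 2x
    return x * x


def step(x):
    # right-continuous step function: jump of size 1 at 0.25 and of size 2 at 0.5
    return (1.0 if x >= 0.25 else 0.0) + (2.0 if x >= 0.5 else 0.0)


# Smooth case: the integral of x * 2x over [0, 1] is 2/3.
smooth = stieltjes_sum(identity, square, 0.0, 1.0, 100_000)

# Step case: f(0.25) * 1 + f(0.5) * 2 = 1.25.
jumps = stieltjes_sum(identity, step, 0.0, 1.0, 100_000)

print(smooth, jumps)
```

In the step case only the subintervals containing a jump contribute, which is exactly why the integral collapses to a finite sum.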
Counting process

A continuous-time stochastic process $N = \{N(t)\}_{t \ge 0}$ is a counting process if $N(0) = 0$, $N$ is piecewise constant and right-continuous, and with probability one $N(t) \in \mathbb{N} \cup \{0\}$ with jumps of size 1.

Example: a counting process $N$ is a Poisson process with intensity function $\lambda$ if, for $0 \le s < t$,

$$N(t) - N(s) \sim \text{Poisson}\Bigl(\int_s^t \lambda(u)\, du\Bigr)$$

and increments on disjoint intervals are independent. $N(t) - N(s)$ is interpreted as the number of "events" in $]s, t]$.

Equivalent definition: $N(t) - N(s) \sim \text{Poisson}\bigl(\int_s^t \lambda(u)\, du\bigr)$ and, conditional on $N(t) - N(s) = n$, the $n$ jump positions in $]s, t]$ are independent with density $f(u) \propto \lambda(u)$, $u \in\, ]s, t]$.

Equivalent definition for constant intensity: the waiting times $W_i = T_i - T_{i-1}$ between jump locations $T_i$, $i = 1, 2, \ldots$, are independent Exponential($\lambda$) random variables (here $T_0 = 0$ is not a jump location).
The last two definitions give ways to construct a Poisson process $N$ (let $N$ increase by one at each jump position).

A counting process is also known as a point process; the focus is then on the locations of the jumps, also known as the points.

The concept can be generalized to higher dimensions: spatial point processes.
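Both constructions can be sketched in a few lines of Python. The function names, the seed, and the intensity $\lambda(u) = 2u$ used for the inhomogeneous case are hypothetical choices made for this illustration only. The empirical means of $N(t)$ should be close to $\int_0^t \lambda(u)\, du$.

```python
import math
import random


def jumps_from_waiting_times(rate, t_max, rng):
    """Jump times in ]0, t_max] built from independent Exponential(rate) waiting times."""
    times = []
    t = rng.expovariate(rate)
    while t <= t_max:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def jumps_from_conditional_density(t_max, rng):
    """Jump times in ]0, t_max] for the assumed intensity lambda(u) = 2u."""
    big_lambda = t_max ** 2  # integral of 2u over ]0, t_max]
    # Poisson(big_lambda) count, drawn as the number of unit-rate arrivals before big_lambda.
    n = len(jumps_from_waiting_times(1.0, big_lambda, rng))
    # Given n, positions are i.i.d. with density 2u / t_max**2 (inverse CDF: t_max * sqrt(U)).
    return sorted(t_max * math.sqrt(rng.random()) for _ in range(n))


def count(times, t):
    """N(t): number of jumps in ]0, t]."""
    return sum(1 for s in times if s <= t)


rng = random.Random(1)
reps = 20_000

# Constant intensity 3 on ]0, 2]: E N(2) = 6.
mean_const = sum(count(jumps_from_waiting_times(3.0, 2.0, rng), 2.0) for _ in range(reps)) / reps

# Intensity 2u: E N(1) = 1.
mean_inhom = sum(count(jumps_from_conditional_density(2.0, rng), 1.0) for _ in range(reps)) / reps

print(mean_const, mean_inhom)
```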
Martingale

For each $t \ge 0$, let $\mathcal{F}_t$ denote the set of 'information' available up to time $t$ (technically, $\mathcal{F}_t$ is a $\sigma$-algebra), such that $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $0 \le s \le t$ (information increases over time).

For a stochastic process $M$, $\mathcal{F}_t$ could e.g. represent the history of the process itself up to time $t$. $\mathcal{F}_t$ could also contain information about other stochastic processes evolving in parallel to $M$.

Definition: $M = \{M(t)\}_{t \ge 0}$ is a martingale with respect to $\mathcal{F} = \{\mathcal{F}_t\}_{t \ge 0}$ if

◮ $E[M(t) \mid \mathcal{F}_s] = M(s)$, $0 \le s \le t$.
◮ $M(t)$ is determined by $\mathcal{F}_t$: knowledge of $\mathcal{F}_t$ means we know $M(t)$ (technically speaking, $M(t)$ is $\mathcal{F}_t$-measurable). We say $M$ is adapted to $\mathcal{F}$.
Examples

Suppose $N$ is a Poisson process with intensity $\lambda(\cdot)$. Let

$$\Lambda(t) = E N(t) = \int_0^t \lambda(u)\, du.$$

Then $M(t) = N(t) - \Lambda(t)$ is a martingale with respect to its own past $\mathcal{F}_t = \sigma\bigl((N(u))_{0 \le u \le t}\bigr)$.

A Brownian motion is a martingale with respect to its own past.
Properties:

◮ If $M(0) = 0$ then $E M(t) = 0$ for all $t \ge 0$.
◮ Uncorrelated increments over disjoint intervals: $E\,[M(t) - M(s)][M(u) - M(v)] = 0$ for $0 \le v \le u \le s \le t$.

Martingale central limit theorem: a theorem stating that a sequence of martingales $M_n = \{M_n(t)\}_{t \ge 0}$, $n = 1, 2, \ldots$, converges to a Gaussian process (typically closely related to Brownian motion). We shall consider survival analysis examples of such sequences.

Definition: a process $X$ is predictable with respect to $\mathcal{F}$ if $X(t)$ is determined by $\mathcal{F}_{t-}$, i.e. information up to but not including $t$. In other words, $X(t)$ is known given $\mathcal{F}_{t - dt}$.

Example: a left-continuous process is predictable given its own past: $X(t) = \lim_{h \downarrow 0} X(t - h)$.
Infinitesimal characterization of martingale

Let $dM(t) = M(dt) = M((t + dt)-) - M(t-)$ be the increment over the infinitesimal interval $[t, t + dt[$. Then $M$ is a martingale if

$$E[dM(t) \mid \mathcal{F}_{t-}] = 0.$$

Heuristically, for $s < t$:

$$
\begin{aligned}
E[M(t) \mid \mathcal{F}_s] &= M(s) + E\Bigl[\int_{]s,t]} dM(u) \,\Big|\, \mathcal{F}_s\Bigr] \\
&= M(s) + \int_s^t E[dM(u) \mid \mathcal{F}_s] \\
&= M(s) + \int_s^t E\bigl[\,E[dM(u) \mid \mathcal{F}_{u-}] \,\big|\, \mathcal{F}_s\bigr] = M(s)
\end{aligned}
$$

(here we used $\mathcal{F}_s \subseteq \mathcal{F}_{u-}$, $s < u$, for the third equality).

For a counting process $N$, $dN(t)$ is zero or one.
NB: we assume that $M$ is right-continuous and that left limits exist. Then

◮ $M(t + dt) - M(t)$ is the increment over $]t, t + dt]$.
◮ $M(t-)$ is the value of $M$ just prior to $t$ (the limit of $M(u)$ as $u$ tends to $t$ from the left). Hence $M((t + dt)-) - M(t-)$ becomes the increment over $[t, t + dt[$.
◮ $\mathcal{F}_{t-}$ represents all information up to but not including $t$.

To be honest, I'm not completely sure why the literature does not just define $dM(t) = M(t + dt) - M(t)$ and consider $E[dM(t) \mid \mathcal{F}_t]$. I here follow Klein and Moeschberger as well as Gill.
Application in survival analysis

Procedure:

1. Express the data as a counting process $N$.
2. Construct the martingale $M(t) = N(t) - \Lambda(t)$, $t \ge 0$.
3. Express Nelson-Aalen/Kaplan-Meier/Cox partial likelihood as a stochastic integral
   $$\tilde{M}(t) = \int_0^t K(u)\, dM(u)$$
   for some predictable process $K$. Note that $\tilde{M}$ is also a martingale (exercise).
4. Apply the martingale central limit theorem to $\frac{1}{\sqrt{n}} \tilde{M}_n(t)$ (introducing $n$, the number of subjects, in the notation) to get asymptotic normality.
Independent and identically distributed survival times

Given survival data $(T_i, \Delta_i)$, $i = 1, \ldots, n$ (with $T_i = \min(X_i, C_i)$ and $\Delta_i = 1[X_i \le C_i]$ for survival time $X_i$ and censoring time $C_i$), define the one-step counting processes

$$N_i(t) = 1[T_i \le t,\ \Delta_i = 1] = 1[X_i \le t,\ X_i \le C_i]$$

and the accumulated process

$$N(t) = \sum_{i=1}^{n} N_i(t)$$

(note: $X_i$ independent continuous random variables implies that $N$ has jumps of size 1).

$\mathcal{F}_t$: history of $N_i$, $i = 1, \ldots, n$, up to time $t$.

Define $Y_i(t) = 1[T_i \ge t]$, i.e. $Y_i(t)$ is one if the $i$th individual is at risk at time $t$ and zero otherwise. $Y_i$ is left-continuous and hence predictable.

$Y(t) = \sum_{i=1}^{n} Y_i(t)$ is the number at risk at time $t$.
Compensator

Define

$$\Lambda_i(t) = \int_0^t Y_i(u)\, h(u)\, du$$

where $h$ is the hazard rate of the $X_i$. Then $\Lambda_i(t)$ is a continuous and hence predictable stochastic process.

Moreover, $M_i = N_i - \Lambda_i$ is a martingale: we show that $E[dN_i(t) \mid \mathcal{F}_{t-}]$ and $E[d\Lambda_i(t) \mid \mathcal{F}_{t-}]$ are equal.

Based on $\mathcal{F}_{t-}$ we can decide whether $T_i < t$ or $T_i \ge t$.
Case $T_i \ge t$:

$$
\begin{aligned}
E[dN_i(t) \mid \mathcal{F}_{t-}] &= E\bigl[\,1[T_i \in [t, t + dt[,\ C_i \ge X_i] \,\big|\, T_i \ge t\bigr] \\
&\text{‘=’}\ P[X_i \in [t, t + dt[,\ C_i \ge t \mid X_i \ge t,\ C_i \ge t] \\
&= P[X_i \in [t, t + dt[ \,\mid X_i \ge t,\ C_i \ge t]
\end{aligned}
$$

Under independent censoring, the last probability is $h(t)\, dt = Y_i(t)\, h(t)\, dt$ (the ‘=’ is because we replace $C_i \ge X_i$ by $C_i \ge t$).

Case $T_i < t$:

$$E[dN_i(t) \mid \mathcal{F}_{t-}] = E[dN_i(t) \mid T_i < t] = 0 = Y_i(t)\, h(t)\, dt$$

(the only possible jump occurred prior to $t$).

Regarding $d\Lambda_i(t)$:

$$E[d\Lambda_i(t) \mid \mathcal{F}_{t-}] = E[Y_i(t)\, h(t)\, dt \mid \mathcal{F}_{t-}] = Y_i(t)\, h(t)\, dt$$

(where we used that $Y_i(t)\, h(t)\, dt$ is a predictable process, hence given $\mathcal{F}_{t-}$ we know $Y_i(t)$).
Conclusion:

$$E[dN_i(t) \mid \mathcal{F}_{t-}] = E[d\Lambda_i(t) \mid \mathcal{F}_{t-}] \iff E[dM_i(t) \mid \mathcal{F}_{t-}] = 0$$

It follows that $M(t) = N(t) - \Lambda(t)$ is a martingale too, where

$$\Lambda(t) = \sum_{i=1}^{n} \Lambda_i(t) = \int_0^t Y(u)\, h(u)\, du.$$
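A quick simulation can make the zero-mean property of $M_i = N_i - \Lambda_i$ concrete. The sketch below assumes a constant hazard $h$ and exponentially distributed censoring times; the rates, the seed, and the function name `martingale_value` are hypothetical choices for illustration. With constant hazard, $\Lambda_i(t) = h \cdot \min(T_i, t)$.

```python
import random


def martingale_value(t, hazard, censor_rate, rng):
    """One draw of M_i(t) = N_i(t) - Lambda_i(t) under constant hazard and exponential censoring."""
    x = rng.expovariate(hazard)       # survival time X_i
    c = rng.expovariate(censor_rate)  # censoring time C_i, independent of X_i
    observed = min(x, c)              # T_i
    n_t = 1.0 if (x <= c and observed <= t) else 0.0  # N_i(t)
    lambda_t = hazard * min(observed, t)              # integral of Y_i(u) * h over [0, t]
    return n_t - lambda_t


rng = random.Random(2)
reps = 100_000
mean_m = sum(martingale_value(1.5, 0.8, 0.5, rng) for _ in range(reps)) / reps

# The empirical mean of M_i(1.5) should be close to zero.
print(mean_m)
```

This only checks $E M_i(t) = 0$ at a single time point, which is a consequence of the martingale property rather than a proof of it.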
Nelson-Aalen estimator

Define $0/0 = 0$. Then

$$dN(u) = d\Lambda(u) + dM(u) \iff \frac{dN(u)}{Y(u)} = 1[Y(u) > 0]\, h(u)\, du + \frac{dM(u)}{Y(u)}$$

Integrating we obtain

$$\int_0^t \frac{dN(u)}{Y(u)} = \int_0^t 1[Y(u) > 0]\, h(u)\, du + \int_0^t \frac{dM(u)}{Y(u)}$$

Here:

◮ $H^*(t) = \int_0^t 1[Y(u) > 0]\, h(u)\, du$ is equal to $H(t)$ for $t \le \max\{T_1, \ldots, T_n\}$.
◮ $W(t) = \int_0^t \frac{dM(u)}{Y(u)}$ is a zero-mean martingale 'noise' process.
◮ $\hat{H}(t) = \int_0^t \frac{dN(u)}{Y(u)}$ is an unbiased estimator of $H^*(t)$, in the sense that $E[\hat{H}(t) - H^*(t)] = E\,W(t) = 0$.
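Observe:

$$\hat{H}(t) = \sum_{t^* \in D:\ t^* \le t} \frac{1}{Y(t^*)}$$

is precisely the Nelson-Aalen estimator.

The martingale central limit theorem for $\sqrt{n}\, W$ can be used to show asymptotic normality of $\hat{H}$ (here $\sqrt{n}$ is the right scaling because the integrand $1/Y(u)$ is of order $1/n$).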
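The sum over event times is straightforward to compute directly. The following is a minimal sketch, assuming the data are given as two lists (observed times $T_i$ and indicators $\Delta_i$); the function name `nelson_aalen` and the toy data are made up for illustration.

```python
def nelson_aalen(times, events):
    """Return [(t*, H_hat(t*))] at the observed event times.

    times[i] is the observed time T_i, events[i] is the indicator Delta_i.
    """
    event_times = sorted(t for t, d in zip(times, events) if d == 1)
    estimate = []
    h_hat = 0.0
    for t_star in event_times:
        at_risk = sum(1 for t in times if t >= t_star)  # Y(t*)
        h_hat += 1.0 / at_risk                          # jump of size 1 / Y(t*)
        estimate.append((t_star, h_hat))
    return estimate


# Toy data: three events (at 1.0, 2.0, 4.2) and two censored observations.
times = [2.0, 3.5, 1.0, 4.2, 5.0]
events = [1, 0, 1, 1, 0]
est = nelson_aalen(times, events)
print(est)
```

For the toy data the numbers at risk at the three event times are 5, 4 and 2, so the estimate steps through 1/5, 1/5 + 1/4 and 1/5 + 1/4 + 1/2. The quadratic-time loop over `times` is chosen for readability, not efficiency.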
Score process for Cox regression

We still assume that the counting processes $N_i$ are independent, but now with different hazard rates

$$h_i(t) = h_0(t) \exp[\beta^T Z_i(t)]$$

Note: we immediately seize the opportunity to generalize the Cox regression model by allowing the covariates $Z_i(t) = (Z_{i1}(t), \ldots, Z_{ip}(t))$ to be a time-varying predictable random process.

Compensators:

$$\Lambda_i(t) = \int_0^t \lambda_i(u)\, du, \qquad \lambda_i(u) = Y_i(u)\, h_i(u), \qquad \Lambda(t) = \sum_{i=1}^{n} \Lambda_i(t)$$

Partial log likelihood process:

$$l(\beta, t) = \sum_{i \in D:\ t_i \le t} \Bigl( \beta^T Z_i(t_i) - \log \Bigl( \sum_{l=1}^{n} Y_l(t_i) \exp(\beta^T Z_l(t_i)) \Bigr) \Bigr)$$

Note: the partial log likelihood is $l(\beta) = l(\beta, \infty)$. We here used the risk process $Y(t)$ notation instead of the risk set $R(t)$.
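The partial log likelihood process can be evaluated directly from its definition. The sketch below is an illustration under assumptions: covariates are passed as functions of time (so that time-varying covariates are allowed), and the function name `partial_loglik` and the three-subject data set are hypothetical.

```python
import math


def partial_loglik(beta, times, events, covariates, t=math.inf):
    """l(beta, t): sum over observed events with t_i <= t.

    covariates[i] is a function u -> list of the p covariate values Z_i(u).
    """
    def linear_predictor(i, u):
        return sum(b * z for b, z in zip(beta, covariates[i](u)))

    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1 and t_i <= t:
            # Sum over subjects with Y_l(t_i) = 1, i.e. T_l >= t_i.
            risk = sum(
                math.exp(linear_predictor(l, t_i))
                for l, t_l in enumerate(times)
                if t_l >= t_i
            )
            total += linear_predictor(i, t_i) - math.log(risk)
    return total


def z_first(u):
    return [1.0]


def z_second(u):
    return [0.0]


def z_third(u):
    # left-continuous, hence predictable: switches from 0 to 1 just after time 1.5
    return [1.0 if u > 1.5 else 0.0]


times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
covariates = [z_first, z_second, z_third]

full = partial_loglik([math.log(3.0)], times, events, covariates)
early = partial_loglik([math.log(3.0)], times, events, covariates, t=1.5)
print(full, early)
```

With $\beta = \log 3$ the first event contributes $\log 3 - \log(3 + 1 + 1)$ and the second contributes $-\log(1 + 3)$, so `full` is $\log(3/20)$ and `early`, which stops at $t = 1.5$, is $\log(3/5)$.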