Lecture 16: Survival Analysis I – Kaplan-Meier and Log-rank Test
Ani Manichaikul
amanicha@jhsph.edu
11 May 2007

Survival Analysis
- Statistical methods for the study of time to an event
- Accounts for:
  - Time that events occur
  - Different follow-up times

Survival Analysis
- Survival analysis methods allow us to incorporate information about both the frequency of event occurrence and the time to event
- Subjects are followed until they have an “event,” or the study ends

Endpoint
- The endpoint doesn’t have to be ‘death’; it can be any well-defined event
  - Death
  - Disease onset
  - Menopause
  - Pregnancy
  - Relapse

Time Scale
- When do you start the clock?
  - Time from diagnosis of disease to death
  - Time from HIV infection to AIDS
  - Time from birth (chronological age)
  - Time from randomization in clinical trial

Why Is Survival Analysis Tricky?
- We need a method which can incorporate information about censored data into an analysis

The Survival Curve
- S(t) is an estimate of the proportion of individuals still alive (have not had the event) at time t
[Figure: survival curve, S(t) plotted against time]

The Survival Curve
- The survival curve as an important and complete summary
  $S(t) = \frac{\#\text{ alive at follow-up time } t}{\#\text{ alive at time } 0}$
- Time 0: “start of clock”

Survival Curve Facts:
- The curve starts at 1 and decreases
- Estimating these curves and comparing them among groups constitutes a “survival analysis”
- Need to decide on what summary is important
  - Mean survival time
  - Median survival time
  - Height at a specific time: one- and two-year survival rates
  - Difference of curves: $S_1(12) - S_2(12)$

Estimating Median Survival
[Figure: S(t) against time; the median m is the time at which the curve reaches .50]

Caveat: Medians Do Not Describe Whole Curve
[Figure: S(t) against time, with the median m marked at S(t) = .50]
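
Reading a median off an estimated step curve can be sketched in a few lines. This is a minimal illustration, assuming one common convention (the first time at which the curve is at or below .50); the function name `median_survival` and the curve values are hypothetical and are not from the lecture.

```python
from typing import Optional, Sequence


def median_survival(times: Sequence[float], surv: Sequence[float]) -> Optional[float]:
    """Return the first time at which the step curve is at or below 0.5.

    `times` are the jump times of the curve in increasing order and `surv`
    holds the curve height just after each jump.
    """
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5, so the median is not estimable


# Hypothetical curve values, for illustration only.
example_times = [2, 4, 7, 9]
example_surv = [0.9, 0.7, 0.45, 0.3]
print(median_survival(example_times, example_surv))
```

Two curves with very different shapes can return the same value here, which is the point of the caveat: the median is one height-crossing, not a description of the whole curve.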

Survival Function
- The survival function, denoted S(t), is a better way to represent the probability distribution of the survival time T when some of the observed times are censored
  - For a censored observation we only know that T > t, rather than T = t
- S(t) = Pr(T > t) = Pr(No event by time t)
- S(t) is the probability of surviving beyond t

- Uncensored data: The event has occurred
- Censored data: The event has yet to occur
  - Event-free at the current follow-up time
  - A competing event that is not an endpoint stops follow-up
    - Death (if not part of the endpoint)
    - Clinical event that requires treatment, etc.

- Important issue: If no events are reported in the interval from last follow-up to “now”, need to choose between:
  - No news is good news?
  - No news is no news

- Ignore the incomplete cases; drop them
  - Produces bias in the estimated curve
  - Unbalanced censoring produces biased comparisons
- Impute an event time
  - Depends on a model
- Use the available information on each participant

$\text{Event Rate} = \frac{\#\text{ events}}{\text{total observation time}}$
- Example: 5 events in 600 person-months
  - 5/600 = 1/120 events per person-month = 0.1 events per person-year = 10 events per 100 person-years
- Gives an average event rate over the follow-up period
- For a finer time resolution, do the above for small intervals
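
The unit conversions in the event-rate example can be checked directly. The sketch below only reproduces the arithmetic on the slide; the helper name `event_rate` is an illustrative choice.

```python
def event_rate(n_events: int, person_time: float) -> float:
    """Average event rate: number of events divided by total observation time."""
    return n_events / person_time


# Numbers from the slide: 5 events observed over 600 person-months.
rate_per_person_month = event_rate(5, 600)
rate_per_person_year = rate_per_person_month * 12    # 12 months per year
rate_per_100_person_years = rate_per_person_year * 100

print(f"{rate_per_person_month:.5f} events per person-month")
print(f"{rate_per_person_year:.3f} events per person-year")
print(f"{rate_per_100_person_years:.1f} events per 100 person-years")
```

Applying the same function to the events and person-time within short intervals gives the finer time resolution mentioned in the last bullet.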

Quantities of Interest
- The survivor function S(t)
  S(t) = P(T > t) = P(No event by time t)
- Hazard function λ(t)
  λ(t) “=” P(T = t) / P(T ≥ t) = risk of the event occurring at time t among those still at risk at t
- The above form holds for discrete time; continuous time requires more complicated calculus-based notation.
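
In discrete time the hazard and the survivor function determine each other. A short derivation, assuming possible event times $t_1 < t_2 < \dots$ with $S(t_0) = 1$ (this indexing is introduced here for illustration and is not on the slide), and taking the hazard as the conditional probability of an event at $t_j$ given survival up to $t_j$:

```latex
\begin{align*}
\lambda(t_j) &= P(T = t_j \mid T \ge t_j), \\
S(t_j) &= P(T > t_j \mid T \ge t_j)\, P(T \ge t_j)
        = \bigl(1 - \lambda(t_j)\bigr)\, S(t_{j-1}), \\
S(t) &= \prod_{j:\, t_j \le t} \bigl(1 - \lambda(t_j)\bigr).
\end{align*}
```

The second line uses $P(T \ge t_j) = P(T > t_{j-1}) = S(t_{j-1})$, which holds because no events can occur between consecutive event times.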

Quantities of Interest
- Often, we are interested in comparing the hazard between groups, for example, the relative hazard of relapse comparing those on chemo to those not on chemo
  - Relative Risk
  - Hazard Ratio
  - Risk Ratio

Estimation
- Kaplan-Meier survivor function estimator
- Cox proportional hazards model (PHM) for hazard ratio
- We’ll start with Kaplan-Meier (K-M)

Central Problem
- Estimation of the survival curve
- S(t) = proportion surviving beyond time t

The Survival Curve
- S(0) always equals 1: all subjects are alive at the beginning of the study
[Figure: S(t) against time, starting at 1.0 at time 0]

The Survival Curve
- Curve can only remain at the same value or decrease as time progresses
[Figure: S(t) against time, starting at 1.0 at time 0]

The Survival Curve
- If not every subject experiences the event by the end of the study window, the curve may never reach zero
[Figure: S(t) against time, starting at 1.0 at time 0]

Example
- Consider a clinical trial in patients with acute myelogenous leukemia (AML) comparing two groups of patients: no maintenance treatment with chemotherapy (X = 0) versus maintenance chemotherapy treatment (X = 1)

Example: Data
[Table: AML trial data; not recovered from the slide]

Why Survival Methods?
- We are interested in estimating the relationship between chemotherapy and the time to AML relapse in weeks.
- We need some tools because:
  - Data are censored, so linear regression is not appropriate
  - We are interested in time to relapse, not just relapse (yes/no), so logistic regression is not appropriate

Kaplan-Meier Estimate
- Curve can be estimated at each event, but not at censoring times
- S(t) = proportion of individuals surviving beyond time t

Kaplan-Meier Estimate
- Curve can be estimated at each event, but not at censoring times
  $S(t) = S(\text{previous event time}) \times \frac{n(t) - y(t)}{n(t)}$
- y(t) = # events at time t
- n(t) = # subjects at risk for event at time t
- $S(\text{previous event time})$: proportion of original sample making it to time t
- $\frac{n(t) - y(t)}{n(t)}$: proportion of those surviving to time t who survive beyond time t

Kaplan-Meier Estimate
- Start estimate at first event time
- No Chemotherapy group: Time = 5
  $S(5) = \frac{n(5) - y(5)}{n(5)} = \frac{12 - 2}{12} = \frac{10}{12} = .833$

Kaplan-Meier Estimate
- No Chemotherapy group: Time = 8
- 2nd event time
  $S(8) = S(5) \times \frac{n(8) - y(8)}{n(8)} = (.833) \times \frac{10 - 2}{10} = .833 \times \frac{8}{10} = .667$

Kaplan-Meier Estimate
- Skip over censoring times: Remove from number at risk for next event time
- Continue through final event time

Alternative Notation
  $\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - y_i}{n_i}$
  $\hat{S}(0) = 1$ (by convention)
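
The product formula translates almost line for line into code. The following is a minimal sketch, not a replacement for a statistics package: the function name `kaplan_meier` is illustrative, and the times for the no-chemotherapy group are an assumption. They are taken from the AML maintenance data as commonly distributed (for example as the `aml` dataset in R's survival package), which agrees with the worked values at times 5 and 8 but is not reproduced in this text.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, int, int, float]]:
    """Product-limit estimate.

    `events[i]` is 1 if subject i had the event at `times[i]`, 0 if censored.
    Returns one row (t, n_at_risk, n_events, S_hat) per distinct event time.
    """
    rows = []
    s_hat = 1.0  # S(0) = 1 by convention
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # At risk: everyone whose event or censoring time is t or later.
        n_at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        s_hat *= (n_at_risk - n_events) / n_at_risk
        rows.append((t, n_at_risk, n_events, s_hat))
    return rows


# ASSUMPTION: no-chemotherapy (X = 0) group, weeks to relapse; 16 is censored.
no_chemo_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
no_chemo_events = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

for t, n, y, s in kaplan_meier(no_chemo_times, no_chemo_events):
    print(f"t={t:>3}  at risk={n:>2}  events={y}  S(t)={s:.3f}")
```

Censored subjects never produce a row of their own; they simply stop being counted in `n_at_risk` at later event times, which is how the estimate "skips over" censoring times.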

Notice
- Time 16 was not included in the table, yet 2 people were subtracted from the risk set at time 23
- The estimated survivor function does not change at censoring times when no event occurs
- Censored individuals are subtracted from the risk set at subsequent times because they are “lost to follow-up”

Kaplan-Meier Estimate
- Graph is a step function
- “Jumps” at each observed event time
- Nothing is assumed about the shape of the curve between observed event times
[Figure: Kaplan-Meier estimate]

Kaplan-Meier Estimate
- Product limit estimate
  - Order survival times
  - Computed at observed events
  - Multiplying conditional probabilities
- Next time we’ll discuss Confidence Intervals for S(t)!

Big Assumption
- Independence of censoring and survival
- Those censored at time t have the same prognosis as those not censored at t

Comparing Survival Curves
- Common statistical tests:
  - Generalized Wilcoxon (Breslow, Gehan)
  - Logrank

Comparing Survival Curves
- Both compare survival curves across multiple time points to answer the question: “Is overall survival different between any of the groups?”
  - $H_0$: No difference in S(t)
  - $H_a$: Difference in S(t)

Comparing Survival Curves
- Wilcoxon (Breslow, Gehan) more sensitive to early survival differences
[Figure: Kaplan-Meier curves by group (Group 1, Group 2); survival from 0.00 to 1.00 against analysis time from 0 to 400]

Comparing Survival Curves
- Logrank more sensitive to later survival differences
[Figure: Kaplan-Meier curves by group (Group 1, Group 2); survival from 0.00 to 1.00 against analysis time from 0 to 400]

Comparing Survival Curves
- Neither test very good if curves “cross over”
[Figure: Kaplan-Meier curves by group (Group 1, Group 2); survival from 0.00 to 1.00 against analysis time from 0 to 400]

Logrank Test
- Answers the question: Are two survivor curves the same?
- Use the times of events: $t_1, t_2, \dots$ (do not include censoring times)
- Treat each event and its “set of persons still at risk” (i.e., risk set) at each time $t_j$ as an independent table

Logrank Test: Recipe
- Make a 2 × 2 table at each $t_j$

Logrank Test
- At each event time $t_j$, under the assumption of equal survival ($S_A(t) = S_B(t)$), the expected number of events in Group A out of the total events ($d_j = a_j + c_j$) is in proportion to the number at risk in Group A relative to the total at risk at time $t_j$:
  $E(a_j) = d_j \times \frac{n_{jA}}{n_j}$
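
The recipe can be sketched as a loop over event times that accumulates observed and expected events in Group A. Several pieces here are assumptions rather than content from these slides: the function name `logrank`, the hypergeometric variance for each 2 × 2 table and the 1-degree-of-freedom chi-square reference distribution (the standard way to finish the test, but not shown in this part of the lecture), and the data, which are the AML maintenance times as commonly distributed (for example the `aml` dataset in R's survival package).

```python
import math
from typing import Sequence, Tuple


def logrank(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float, float, float]:
    """Two-group log-rank test.

    Returns (observed_A, expected_A, chi_square, p_value). Event indicators
    are 1 for an event and 0 for a censored time.
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in A at t_j
        n_b = sum(1 for u in times_b if u >= t)  # at risk in B at t_j
        a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        c = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, a + c
        observed_a += a
        expected_a += d * n_a / n  # E(a_j) = d_j * n_jA / n_j
        if n > 1:
            # Assumed hypergeometric variance of a_j given the table margins.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return observed_a, expected_a, chi_square, p_value


# ASSUMPTION: weeks to relapse; event indicator 0 marks a censored time.
chemo_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
chemo_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
no_chemo_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
no_chemo_events = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

obs, exp, chi2, p = logrank(chemo_times, chemo_events, no_chemo_times, no_chemo_events)
print(f"observed={obs:.0f}  expected={exp:.2f}  chi-square={chi2:.2f}  p={p:.3f}")
```

Only Group A's counts are accumulated because, with two groups, the deviation in Group B is the negative of the deviation in Group A, so it carries no extra information.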