
Who needs the Cox model anyway
Bendix Carstensen
Steno Diabetes Center Copenhagen, Gentofte, Denmark
http://BendixCarstensen.com
SDC Epi and Biostat Network, 11 March 2020
Thursday 12th March, 2020, 10:38
From /home/bendix/teach/AdvCoh/talks/Aarhus2020/slides.tex

The dogma [1]
◮ do not condition on the future: indisputable
◮ do not count people after they are dead: disputable
◮ stick to this world: expandable
[1] P. K. Andersen and N. Keiding: Interpretability and importance of functionals in competing risks and multistate models. Stat Med, 31:1074–1088, 2012

(further) dogma for “sticking to this world”
◮ rates are continuous in time (and “smooth”)
◮ rates may depend on more than one time scale
◮ . . . which timescales is an empirical question
◮ But first we look at the machinery for modeling simple occurrence rates from follow-up studies (mortality, incidence, . . . )

◮ In follow-up studies we estimate rates from:
  ◮ $D$: events, deaths
  ◮ $Y$: person-years
  ◮ $\hat\lambda = D/Y$: rates
  ◮ . . . empirical counterpart of intensity, an estimate
◮ Rates differ between persons.
◮ Rates differ within persons:
  ◮ by age
  ◮ by calendar time
  ◮ by disease duration
  ◮ . . .
◮ Multiple timescales: later

Representation of follow-up data
A cohort or follow-up study records events and risk time.
The outcome (response) is thus bivariate: $(d, y)$.
Follow-up data for each individual must therefore have (at least) three pieces of information recorded:

  Date of entry     date variable             entry
  Date of exit      date variable             exit
  Status at exit    indicator (mostly 0/1)    event

From representation to likelihood
◮ Target is estimates of occurrence rates (mortality rates, incidence rates)
◮ . . . and how these depend on covariates
◮ If we assume that mortality, $\lambda$, is constant over time, then the log-likelihood from one person is based on $(d, y)$:
  ◮ $d$: event, 0 or 1 (event)
  ◮ $y$: risk time (exit − entry)

$$\ell(\lambda) = d \log(\lambda) - \lambda y$$

◮ This formula is not derived here; see note on website
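
As a minimal sketch of the two ideas above (the empirical rate $D/Y$ and the constant-rate log-likelihood), the following Python snippet uses a small made-up cohort. The data and the names `cohort`, `rate_loglik` and `total_loglik` are illustrative assumptions, not part of the slides.

```python
import math

def rate_loglik(lam, d, y):
    """Constant-rate log-likelihood d*log(lam) - lam*y for one (d, y) pair."""
    return d * math.log(lam) - lam * y

# Hypothetical cohort: (entry, exit, event) per person, times in years.
cohort = [(0.0, 2.5, 1), (1.0, 4.0, 0), (0.5, 3.0, 1), (2.0, 6.0, 0)]

D = sum(event for _, _, event in cohort)               # total events
Y = sum(exit_ - entry for entry, exit_, _ in cohort)   # total person-years
rate_hat = D / Y                                       # empirical rate

def total_loglik(lam):
    """Sum of the individual log-likelihood contributions."""
    return sum(rate_loglik(lam, ev, ex - en) for en, ex, ev in cohort)

print(D, Y, rate_hat)
# The summed log-likelihood is D*log(lam) - lam*Y, which peaks at lam = D/Y.
print(total_loglik(0.9 * rate_hat), total_loglik(rate_hat), total_loglik(1.1 * rate_hat))
```

Because the contributions add up to $D \log(\lambda) - \lambda Y$, the maximum is at $\hat\lambda = D/Y$, which is why the empirical rate is also the maximum likelihood estimate under a constant rate.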

[Timeline: follow-up from $t_0$ to $t_x$ with outcome $(d, y)$, split at $t_1$ and $t_2$ into risk times $y_1$, $y_2$, $y_3$]

  Probability                              log-Likelihood
  P($d$ at $t_x$ | entry $t_0$)            $d \log(\lambda) - \lambda y$
  = P(surv $t_0 \to t_1$ | entry $t_0$)    $= 0 \log(\lambda) - \lambda y_1$
  × P(surv $t_1 \to t_2$ | entry $t_1$)    $+ 0 \log(\lambda) - \lambda y_2$
  × P($d$ at $t_x$ | entry $t_2$)          $+ d \log(\lambda) - \lambda y_3$

[Timeline as above, with $d = 0$: follow-up ends at $t_x$ without an event]

  Probability                              log-Likelihood
  P(surv $t_0 \to t_x$ | entry $t_0$)      $0 \log(\lambda) - \lambda y$
  = P(surv $t_0 \to t_1$ | entry $t_0$)    $= 0 \log(\lambda) - \lambda y_1$
  × P(surv $t_1 \to t_2$ | entry $t_1$)    $+ 0 \log(\lambda) - \lambda y_2$
  × P(surv $t_2 \to t_x$ | entry $t_2$)    $+ 0 \log(\lambda) - \lambda y_3$

[Timeline as above, with $d = 1$: event at $t_x$]

  Probability                              log-Likelihood
  P(event at $t_x$ | entry $t_0$)          $1 \log(\lambda) - \lambda y$
  = P(surv $t_0 \to t_1$ | entry $t_0$)    $= 0 \log(\lambda) - \lambda y_1$
  × P(surv $t_1 \to t_2$ | entry $t_1$)    $+ 0 \log(\lambda) - \lambda y_2$
  × P(event at $t_x$ | entry $t_2$)        $+ 1 \log(\lambda) - \lambda y_3$
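
The decomposition in these tables can be checked numerically. The sketch below splits one person's follow-up at two break points and compares the summed contributions with the unsplit log-likelihood. The helper `split_followup`, the break points and the rate value are assumptions made for illustration.

```python
import math

def rate_loglik(lam, d, y):
    """Constant-rate log-likelihood d*log(lam) - lam*y."""
    return d * math.log(lam) - lam * y

def split_followup(entry, exit_, event, breaks):
    """Split one person's follow-up at the break points inside (entry, exit_).

    Returns a list of (d, y) pairs; the event indicator goes to the last piece,
    all earlier pieces are survived intervals with d = 0.
    """
    cuts = [entry] + [b for b in sorted(breaks) if entry < b < exit_] + [exit_]
    n_pieces = len(cuts) - 1
    pieces = []
    for k in range(n_pieces):
        d = event if k == n_pieces - 1 else 0
        pieces.append((d, cuts[k + 1] - cuts[k]))
    return pieces

lam = 0.2                                  # hypothetical constant rate
entry, exit_, event = 0.0, 7.5, 1          # hypothetical person: t_0 = 0, t_x = 7.5, d = 1
pieces = split_followup(entry, exit_, event, breaks=[3.0, 5.0])  # t_1 = 3, t_2 = 5

whole = rate_loglik(lam, event, exit_ - entry)
summed = sum(rate_loglik(lam, d, y) for d, y in pieces)
print(pieces)
print(whole, summed)
```

The two printed log-likelihood values agree up to floating-point rounding, because the risk times add up to $y$ and only the last piece carries the event.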

[Timeline: follow-up from $t_0$ to $t_x$ with outcome $(d, y)$, split at $t_1$ and $t_2$ into risk times $y_1$, $y_2$, $y_3$]

  Probability                              log-Likelihood
  P($d$ at $t_x$ | entry $t_0$)            $d \log(\lambda) - \lambda y$
  = P(surv $t_0 \to t_1$ | entry $t_0$)    $= 0 \log(\lambda_1) - \lambda_1 y_1$
  × P(surv $t_1 \to t_2$ | entry $t_1$)    $+ 0 \log(\lambda_2) - \lambda_2 y_2$
  × P($d$ at $t_x$ | entry $t_2$)          $+ d \log(\lambda_3) - \lambda_3 y_3$

This allows different rates ($\lambda_i$) in each interval.

Likelihood for time-split data
◮ The setup is for a situation where it is assumed that rates are constant in each of the intervals
◮ Each record in the data set represents follow-up for one person in one (small) interval: many records for each person
◮ Each record in the data set contributes a term to the likelihood
◮ Each term looks like a contribution from a Poisson variate (albeit with values only 0 or 1), with mean $\lambda y$
◮ ⇒ Likelihood for one person’s FU (rate likelihood) is the same as the likelihood for several independent Poisson variates
◮ Two models, one likelihood.
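
The “two models, one likelihood” statement can be spelled out in one line. This short check is added here for illustration and is not taken from the slides: it writes the log-likelihood of a Poisson variate $d$ with mean $\lambda y$ and separates the part that involves $\lambda$.

```latex
\begin{aligned}
\log \mathrm{P}(d \mid \lambda y)
  &= d \log(\lambda y) - \lambda y - \log(d!) \\
  &= \underbrace{d \log(\lambda) - \lambda y}_{\text{rate log-likelihood}}
     \; + \; d \log(y) - \log(d!)
\end{aligned}
```

Since $d$ is 0 or 1, $\log(d!) = 0$, and the remaining term $d \log(y)$ does not involve $\lambda$. The Poisson and rate log-likelihoods therefore differ only by a constant and give the same estimates, even though $d$ is not Poisson distributed.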

Analysis of time-split data
Observations classified by $p$ (person) and $i$ (interval)
◮ $d_{pi}$: in the model as response
◮ $y_{pi}$: risk time, in the model as offset $\log(y)$ . . . or as part of the response
◮ Covariates are:
  ◮ timescales (age, period, time in study)
  ◮ other variables for this person (constant in each interval).
◮ Model rates using the covariates in glm: no difference in how time-scales and other covariates are modeled

A look at the Cox model

$$\lambda(t, x) = \lambda_0(t) \times \exp(x'\beta)$$

A model for the rate as a function of $t$ and $x$.
Covariates:
◮ $x$
◮ $t$
◮ . . . often the effect of $t$ is ignored (forgotten?)
◮ i.e. left unreported

Cox-likelihood
The (partial) log-likelihood for the regression parameters:

$$\ell(\beta) = \sum_{\text{death times}} \log\left( \frac{e^{\eta_{\text{death}}}}{\sum_{i \in \mathcal{R}_t} e^{\eta_i}} \right)$$

is also a profile likelihood in the model where observation time has been subdivided in small pieces (empirical rates) and each small piece provided with its own parameter:

$$\log\bigl(\lambda(t, x)\bigr) = \log\bigl(\lambda_0(t)\bigr) + x'\beta = \alpha_t + \eta$$
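
To make the partial log-likelihood concrete, here is a minimal Python sketch for a single covariate. It assumes everyone enters at time 0 and that there are no tied death times. The toy data and the function name `cox_partial_loglik` are made up for illustration.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (eta_i = beta * x_i).

    times  : exit times on the chosen time scale (entry assumed at 0)
    events : 1 if the person died at the exit time, 0 if censored
    x      : covariate values
    Assumes no tied death times.
    """
    eta = [beta * xi for xi in x]
    loglik = 0.0
    for t_death, died, eta_death in zip(times, events, eta):
        if not died:
            continue
        # Risk set R_t: everyone still under follow-up at the death time.
        denom = sum(math.exp(e) for t, e in zip(times, eta) if t >= t_death)
        loglik += eta_death - math.log(denom)
    return loglik

# Hypothetical toy data: six persons, four deaths, a binary covariate.
times = [2, 3, 5, 7, 8, 10]
events = [1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 0, 1, 0]

for beta in (0.0, 0.5, 1.0, 2.0):
    print(beta, cox_partial_loglik(beta, times, events, x))
```

At $\beta = 0$ every term reduces to minus the log of the risk-set size, so with risk sets of size 6, 4, 3 and 1 the value is $-\log(72)$.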

The Cox-likelihood as profile likelihood
◮ One parameter per death time to describe the effect of time (i.e. the chosen timescale).

$$\log\bigl(\lambda(t, x_i)\bigr) = \log\bigl(\lambda_0(t)\bigr) + \underbrace{\beta_1 x_{1i} + \cdots + \beta_p x_{pi}}_{\eta_i} = \alpha_t + \eta_i$$

◮ Profile likelihood:
  ◮ Derive estimates of $\alpha_t$ as function of data and $\beta$s, assuming constant rate between death/censoring times
  ◮ Insert in likelihood, now only a function of data and $\beta$s
  ◮ This turns out to be Cox’s partial likelihood
◮ Cumulative intensity ($\Lambda_0(t)$) obtained via the Breslow-estimator

[Figure: Mayo Clinic lung cancer data, 60 year old woman. Survival (0.0 to 1.0) against days since diagnosis (0 to 800).]

The Cox-likelihood: mechanics of computing
◮ The likelihood is computed by summing over risk-sets:

$$\ell(\eta) = \sum_{t} \log\left( \frac{e^{\eta_{\text{death}}}}{\sum_{i \in \mathcal{R}_t} e^{\eta_i}} \right)$$

◮ this is essentially splitting follow-up time at event- (and censoring) times
◮ . . . repeatedly in every cycle of the iteration
◮ . . . simplified by not keeping track of risk time
◮ . . . but only works along one time scale
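
The Breslow estimator mentioned above can be sketched in a few lines: each death time $t$ adds the increment $1 / \sum_{i \in \mathcal{R}_t} e^{\eta_i}$ to the cumulative baseline intensity. The sketch assumes one covariate, entry at time 0 and no tied death times. The toy data and the function name are illustrative assumptions.

```python
import math

def breslow_cumulative_hazard(beta, times, events, x):
    """Breslow estimate of the cumulative baseline intensity Lambda_0.

    Returns a list of (death time, Lambda_0 just after that time) pairs.
    Assumes a single covariate, entry at time 0 and no tied death times.
    """
    eta = [beta * xi for xi in x]
    cumulative = 0.0
    steps = []
    for t_death in sorted(t for t, d in zip(times, events) if d):
        # Sum of exp(eta_i) over the risk set at this death time.
        denom = sum(math.exp(e) for t, e in zip(times, eta) if t >= t_death)
        cumulative += 1.0 / denom
        steps.append((t_death, cumulative))
    return steps

# Hypothetical toy data: six persons, four deaths, a binary covariate.
times = [2, 3, 5, 7, 8, 10]
events = [1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 0, 1, 0]

steps = breslow_cumulative_hazard(0.0, times, events, x)
for t, cum in steps:
    print(t, cum)
```

With $\beta = 0$ the increments are one over the risk-set size, which is the Nelson-Aalen estimator. For a covariate value $x$, a survival curve like the one in the figure would follow as $\exp(-\Lambda_0(t) e^{\beta x})$.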

$$\log\bigl(\lambda(t, x_i)\bigr) = \log\bigl(\lambda_0(t)\bigr) + \underbrace{\beta_1 x_{1i} + \cdots + \beta_p x_{pi}}_{\eta_i} = \alpha_t + \eta_i$$

◮ Suppose the time scale has been divided into small intervals with at most one death in each:
◮ Empirical rates: $(d_{it}, y_{it})$, where each $t$ has at most one $d_{it} = 1$.
◮ Assume w.l.o.g. the $y$s in the empirical rates all are 1.
◮ Log-likelihood contributions that contain information on a specific time-scale parameter $\alpha_t$ will be from:
  ◮ the (only) empirical rate $(1, 1)$ with the death at time $t$.
  ◮ all other empirical rates $(0, 1)$ from those who were at risk at time $t$.

Note: There is one contribution from each person at risk to the part of the log-likelihood at $t$:

$$
\begin{aligned}
\ell_t(\alpha_t, \beta)
  &= \sum_{i \in \mathcal{R}_t} \bigl\{ d_i \log(\lambda_i(t)) - \lambda_i(t) y_i \bigr\} \\
  &= \sum_{i \in \mathcal{R}_t} \bigl\{ d_i (\alpha_t + \eta_i) - e^{\alpha_t + \eta_i} \bigr\} \\
  &= \alpha_t + \eta_{\text{death}} - e^{\alpha_t} \sum_{i \in \mathcal{R}_t} e^{\eta_i}
\end{aligned}
$$

where $\eta_{\text{death}}$ is the linear predictor for the person that died at $t$.

The derivative w.r.t. $\alpha_t$ is:

$$D_{\alpha_t} \ell_t(\alpha_t, \beta) = 1 - e^{\alpha_t} \sum_{i \in \mathcal{R}_t} e^{\eta_i} = 0 \quad \Leftrightarrow \quad e^{\alpha_t} = \frac{1}{\sum_{i \in \mathcal{R}_t} e^{\eta_i}}$$

If this estimate is fed back into the log-likelihood for $\alpha_t$, we get the profile likelihood (with $\alpha_t$ “profiled out”):

$$\log\left( \frac{1}{\sum_{i \in \mathcal{R}_t} e^{\eta_i}} \right) + \eta_{\text{death}} - 1 = \log\left( \frac{e^{\eta_{\text{death}}}}{\sum_{i \in \mathcal{R}_t} e^{\eta_i}} \right) - 1$$

which, apart from the constant $-1$, is the same as the contribution from time $t$ to Cox’s partial likelihood.
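
The profiling step can be verified numerically for a single death time. In the sketch below the linear predictors of the risk set are made-up numbers. It compares the closed-form maximiser of $\alpha_t$ with a brute-force grid search and checks that the profiled value equals the Cox term minus 1.

```python
import math

def loglik_t(alpha_t, eta_risk, eta_death):
    """Log-likelihood part at one death time, with all y equal to 1:
    alpha_t + eta_death - exp(alpha_t) * sum_i exp(eta_i)."""
    return alpha_t + eta_death - math.exp(alpha_t) * sum(math.exp(e) for e in eta_risk)

# Hypothetical linear predictors for a risk set of four; the first person dies.
eta_risk = [0.3, -0.2, 0.0, 0.8]
eta_death = eta_risk[0]

s = sum(math.exp(e) for e in eta_risk)
alpha_hat = -math.log(s)                       # closed-form maximiser: exp(alpha) = 1 / s
profile = loglik_t(alpha_hat, eta_risk, eta_death)
cox_term = math.log(math.exp(eta_death) / s)   # contribution to Cox's partial likelihood

# Brute-force check: search a grid of alpha values centred on alpha_hat.
grid = [alpha_hat + k / 1000 for k in range(-2000, 2001)]
alpha_grid = max(grid, key=lambda a: loglik_t(a, eta_risk, eta_death))

print(alpha_hat, alpha_grid)
print(profile, cox_term - 1)
```

The constant $-1$ per death time does not depend on $\beta$, so maximising the profile likelihood over $\beta$ gives the same estimates as maximising the partial likelihood.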

Splitting the dataset a priori
◮ The Poisson approach needs a dataset of empirical rates $(d, y)$ with suitably small values of $y$.
◮ Each individual contributes many empirical rates
◮ (one per risk-set contribution in Cox-modelling)
◮ From each empirical rate we get:
  ◮ Poisson-response $d$
  ◮ Risk time $y$ → $\log(y)$ as offset
  ◮ time scale covariates: current age, current date, . . .
  ◮ other covariates
◮ Contributions not independent, but likelihood is a product
◮ Same likelihood as for independent Poisson variates
◮ Poisson glm with spline/factor effect of time

History
This is not new: the profile likelihood was pointed out by Holford [2] in 1976, and the practical implementation was demonstrated by Whitehead in 1980 [3], using GLIM.
. . . so I am telling an old story here.

Example: Mayo Clinic lung cancer
◮ Survival after lung cancer
◮ Covariates:
  ◮ Age at diagnosis
  ◮ Sex
  ◮ Time since diagnosis
◮ Cox model
◮ Split data:
  ◮ Poisson model, time as factor
  ◮ Poisson model, time as spline
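
The comparison between the Cox model and the Poisson model with time as factor can be tried on a toy dataset. The sketch below does not use the Mayo Clinic data or a glm routine. It assumes six made-up persons, one binary covariate, no tied death times and one empirical rate $(d, y = 1)$ per person per risk set. The Poisson likelihood is maximised by a simple alternating scheme chosen here for brevity: a closed-form update of each $\alpha_t$, followed by one Newton step for $\beta$.

```python
import math

# Hypothetical toy data: exit time, death indicator, binary covariate; no ties.
times = [2, 3, 5, 7, 8, 10]
events = [1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 0, 1, 0]

# Time-split data: risk_sets[k] holds the covariates of everyone at risk
# at the k-th death time, x_death[k] the covariate of the person who died.
death_idx = sorted((i for i, d in enumerate(events) if d), key=lambda i: times[i])
risk_sets = [[x[j] for j in range(len(x)) if times[j] >= times[i]] for i in death_idx]
x_death = [x[i] for i in death_idx]

def fit_cox(n_iter=50):
    """Newton-Raphson on the Cox partial likelihood (single covariate)."""
    beta = 0.0
    for _ in range(n_iter):
        score = info = 0.0
        for xd, rs in zip(x_death, risk_sets):
            w = [math.exp(beta * xi) for xi in rs]
            mean = sum(wi * xi for wi, xi in zip(w, rs)) / sum(w)
            mean_sq = sum(wi * xi * xi for wi, xi in zip(w, rs)) / sum(w)
            score += xd - mean
            info += mean_sq - mean ** 2
        beta += score / info
    return beta

def fit_poisson(n_iter=500):
    """Poisson likelihood with one parameter alpha_k per death time (time as factor).

    Alternates a closed-form update of every alpha_k with one Newton step for beta.
    """
    beta = 0.0
    for _ in range(n_iter):
        alpha = [-math.log(sum(math.exp(beta * xi) for xi in rs)) for rs in risk_sets]
        score = info = 0.0
        for a, xd, rs in zip(alpha, x_death, risk_sets):
            score += xd - math.exp(a) * sum(xi * math.exp(beta * xi) for xi in rs)
            info += math.exp(a) * sum(xi * xi * math.exp(beta * xi) for xi in rs)
        beta += score / info
    return beta, alpha

beta_cox = fit_cox()
beta_pois, alpha = fit_poisson()
print(beta_cox, beta_pois)
print([math.exp(a) for a in alpha])   # estimated rate in each death-time interval
```

The two estimates of $\beta$ agree, and the fitted $e^{\alpha_t}$ are the increments of the Breslow estimator. The number of records in the split data equals the sum of the risk-set sizes, which is what the Cox machinery builds implicitly in every iteration.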
