Dealing with Missing Values in Multivariate Joint Models for Longitudinal and Survival Data
Nicole Erler
Department of Biostatistics, Erasmus Medical Center
n.erler@erasmusmc.nl · N_Erler · www.nerler.com · NErler
ISCB 2020
Chronic Hepatitis C
[Image source: https://www.hepatitisc.uw.edu/go/evaluation-staging-monitoring/natural-history/core-concept/all]
Longitudinal Covariates
[Figure: observed values of log(AST), log(ALT), platelets, log(bilirubin), albumin and GGT against years since diagnosis]
Baseline Covariates
[Figure: distributions of the baseline covariates, with the percentage of missing values (NA) per variable]
◮ alcohol: 4.2% NA
◮ anti-HBc: 10.7% NA
◮ BMI: 19.1% NA
◮ diabetes: 0.1% NA
◮ smoking: 15.1% NA
◮ sex, age, year (of diagnosis): no missing values reported
Missing Values in Longitudinal Covariates
[Figure: scaled biomarker values of log(bilirubin), log(AST), log(ALT), platelets, albumin and GGT against follow-up (years) for patients 1 to 7]
Multivariate Joint Model
Proportional hazards model for the time until the event:
$$h_i(t) = h_0(t)\exp\Big\{\underbrace{\mathbf{x}_i^\top\boldsymbol\beta^{(tc)}}_{\text{time constant}} + \underbrace{\sum_{k=1}^{K}\beta_k^{(tv)}\,\eta_{ki}(t)}_{\text{time varying}}\Big\}$$
Longitudinal (mixed) model for each biomarker $k = 1, \ldots, K$:
$$E\{y_{ki}(t)\mid\mathbf{b}_{ki}\} = \eta_{ki}(t) = \underbrace{\mathbf{x}_{ki}(t)^\top\boldsymbol\beta^{(k)}}_{\text{fixed effects}} + \underbrace{\mathbf{z}_{ki}(t)^\top\mathbf{b}_{ki}}_{\text{random effects}}$$
Missing values in (baseline) covariates.
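A minimal R sketch of how the two parts connect: the subject-specific linear predictors $\eta_{ki}(t)$ of the longitudinal submodels enter the hazard as time-varying covariates. Everything here is an illustrative assumption, not part of the HCV analysis: the Weibull baseline hazard, the random-intercept/random-slope form of $\eta_{ki}(t)$, the helper names `h0()`, `eta()` and `hazard()`, and all parameter values.

```r
# Weibull baseline hazard h_0(t) (assumed for illustration)
h0 <- function(t, shape = 1.5, scale = 10) (shape / scale) * (t / scale)^(shape - 1)

# Linear predictor of one longitudinal submodel with random intercept and slope:
# eta_ki(t) = (beta_0 + b_0i) + (beta_1 + b_1i) * t, i.e. x_ki(t) = z_ki(t) = (1, t)
eta <- function(t, beta_k, b_ki) (beta_k[1] + b_ki[1]) + (beta_k[2] + b_ki[2]) * t

# h_i(t) = h_0(t) * exp(x_i' beta_tc + sum_k beta_tv[k] * eta_ki(t))
hazard <- function(t, x_i, beta_tc, beta_tv, eta_fns) {
  lin_tc <- sum(x_i * beta_tc)              # time-constant part
  lin_tv <- 0
  for (k in seq_along(eta_fns)) {           # time-varying part, one term per biomarker
    lin_tv <- lin_tv + beta_tv[k] * eta_fns[[k]](t)
  }
  h0(t) * exp(lin_tc + lin_tv)
}

# Made-up values for one subject i with K = 2 biomarkers
x_i     <- c(age_scaled = 0.5, male = 1)
beta_tc <- c(0.3, -0.2)
beta_tv <- c(0.8, -0.4)
eta_fns <- list(
  function(t) eta(t, beta_k = c(1.0, 0.05), b_ki = c(0.2, 0.01)),
  function(t) eta(t, beta_k = c(3.0, -0.02), b_ki = c(-0.1, 0.00))
)

hazard(c(1, 5, 10), x_i, beta_tc, beta_tv, eta_fns)
```

Setting all `beta_tv` to zero removes the link to the biomarkers and leaves an ordinary proportional hazards model with only time-constant covariates.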
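Imputation of Missing Covariates
Imputation of a (baseline) variable $x_i$:
➡ sample from the predictive distribution of the missing values given the observed values,
$$p(x_i \mid \text{everything else}),$$
where "everything else" comprises
◮ other baseline variables
◮ repeatedly measured variables (incl. outcomes)
◮ the survival outcome
➡ We cannot directly specify the (correct) imputation model: conditional on a survival outcome and on repeatedly measured variables, the distribution of $x_i$ generally has no known standard form.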
Imputation of Missing Covariates
Idea:
◮ specify the joint distribution $p(\text{everything})$
◮ derive $p(x_i \mid \text{everything else})$ from $p(\text{everything})$
But:
$$p(\text{everything}) = p(\text{survival outcome}, \text{longitudinal outcomes}, \text{longitudinal covariates}, \text{baseline covariates}, \text{random effects}, \text{parameters}) = p(T, D, \mathbf{y}, \mathbf{X}, \mathbf{b}, \boldsymbol\theta)$$
Does this really solve anything? Yes, it does!
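The step that makes this work is Bayes' rule: the normalising constant does not depend on $x_i$, so the conditional needed for imputation is proportional to the joint distribution, and a joint distribution known up to a constant is all an MCMC sampler requires (for a categorical $x_i$ the integral becomes a sum):

```latex
p(x_i \mid T, D, \mathbf{y}, \mathbf{X}_{-i}, \mathbf{b}, \boldsymbol\theta)
  = \frac{p(T, D, \mathbf{y}, \mathbf{X}, \mathbf{b}, \boldsymbol\theta)}
         {\int p(T, D, \mathbf{y}, \mathbf{X}, \mathbf{b}, \boldsymbol\theta)\, dx_i}
  \;\propto\; p(T, D, \mathbf{y}, \mathbf{X}, \mathbf{b}, \boldsymbol\theta)
```

Here $\mathbf{X}_{-i}$ is shorthand (introduced only for this derivation) for all covariate values other than $x_i$.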
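Fully Bayesian Analysis & Imputation
From probability theory: $p(A, B) = p(A \mid B)\,p(B)$
Joint distribution:
$$p(T, D, \mathbf{y}, \mathbf{X}, \mathbf{b}, \boldsymbol\theta) = \underbrace{\underbrace{p(T, D \mid \mathbf{X}, \mathbf{b}, \boldsymbol\theta)}_{\text{survival model}}\;\underbrace{p(\mathbf{y} \mid \mathbf{X}, \mathbf{b}, \boldsymbol\theta)}_{\text{multivariate longitudinal model}}}_{\text{analysis model}}\;\underbrace{p(\mathbf{X} \mid \boldsymbol\theta)}_{\text{imputation part}}\;\underbrace{p(\mathbf{b} \mid \boldsymbol\theta)}_{\text{random effects}}\;\underbrace{p(\boldsymbol\theta)}_{\text{priors}}$$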
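A toy illustration (deliberately much simpler than the HCV model) of why the analysis model takes part in the imputation: the full conditional of a missing covariate is proportional to the product of the analysis-model likelihood and the imputation part. It assumes a single normal outcome $y$ with linear predictor $\beta_0 + \beta_1 x$, a normal covariate model $x \sim N(\mu, \tau^2)$, and made-up parameter values; the grid approach is only a convenient way to sample in one dimension. Because this toy case is conjugate, the result can be checked against the closed-form normal full conditional.

```r
# Toy parameter values (assumptions for illustration only)
theta <- list(beta0 = 1, beta1 = 2, sigma = 1, mu = 0, tau = 1.5)
y_obs <- 4  # observed outcome of a subject whose covariate x is missing

# Unnormalised log full conditional: log p(y | x, theta) + log p(x | theta)
log_fc <- function(x, y, theta) {
  dnorm(y, mean = theta$beta0 + theta$beta1 * x, sd = theta$sigma, log = TRUE) +
    dnorm(x, mean = theta$mu, sd = theta$tau, log = TRUE)
}

grid <- seq(-5, 5, length.out = 2001)
lw <- log_fc(grid, y_obs, theta)
w <- exp(lw - max(lw))   # subtract the maximum for numerical stability
w <- w / sum(w)          # normalise over the grid

set.seed(2020)
x_imp <- sample(grid, size = 5000, replace = TRUE, prob = w)  # imputed values

# Closed form for this conjugate toy case
prec <- 1 / theta$tau^2 + theta$beta1^2 / theta$sigma^2
mean_exact <- (theta$mu / theta$tau^2 +
                 theta$beta1 * (y_obs - theta$beta0) / theta$sigma^2) / prec
mean_grid <- sum(grid * w)

c(grid = mean_grid, exact = mean_exact, sampled = mean(x_imp))
```

The imputations are pulled away from the covariate-model mean $\mu = 0$ towards values consistent with the observed outcome, which is exactly the information an imputation model ignoring $y$ would lose.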
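Fully Bayesian Analysis & Imputation
Imputation part:
$$p(\mathbf{X} \mid \boldsymbol\theta) = p(\mathbf{x}_1, \ldots, \mathbf{x}_p, \mathbf{X}_{\text{compl.}} \mid \boldsymbol\theta) = p(\mathbf{x}_1 \mid \mathbf{X}_{\text{compl.}}, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_p, \boldsymbol\theta)\; p(\mathbf{x}_2 \mid \mathbf{X}_{\text{compl.}}, \mathbf{x}_3, \ldots, \mathbf{x}_p, \boldsymbol\theta) \cdots p(\mathbf{x}_p \mid \mathbf{X}_{\text{compl.}}, \boldsymbol\theta)\; p(\mathbf{X}_{\text{compl.}} \mid \boldsymbol\theta)$$
The last factor involves only the completely observed covariates and does not need to be modelled.
Estimation: via MCMC ➡ Gibbs sampling (using Metropolis-Hastings, ... steps where full conditionals are non-standard)
Software: implemented in the R package JointAI (using JAGS)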
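JointAI leaves the sampling to JAGS. To make the idea concrete, here is a hand-coded Gibbs sampler for a much simpler, hypothetical setting: linear regression of $y$ on a single incompletely observed covariate $x$, with $x \sim N(\mu, \tau^2)$ as imputation model and improper priors (flat on $\beta$ and $\mu$, $\propto 1/\sigma^2$ and $\propto 1/\tau^2$ on the variances). Each iteration draws the missing $x$ from their full conditional and then the parameters given the completed data. All full conditionals are standard here, so no Metropolis-Hastings steps are needed; the data, the function name `gibbs_impute()` and the burn-in length are assumptions for illustration.

```r
set.seed(1)
n <- 200
x_true <- rnorm(n, mean = 1, sd = 2)
y <- 0.5 + 1.5 * x_true + rnorm(n, sd = 1)
x <- x_true
x[sample(n, 60)] <- NA                    # 30% missing covariate values

gibbs_impute <- function(y, x, n_iter = 3000) {
  n <- length(y)
  mis <- is.na(x)
  x[mis] <- mean(x, na.rm = TRUE)         # starting values
  beta <- c(0, 0); sigma2 <- 1; mu <- 0; tau2 <- 1
  draws <- matrix(NA_real_, n_iter, 5,
                  dimnames = list(NULL, c("beta0", "beta1", "sigma2", "mu", "tau2")))
  for (it in seq_len(n_iter)) {
    # 1) missing x from p(x | y, theta), proportional to p(y | x, theta) p(x | theta)
    prec <- 1 / tau2 + beta[2]^2 / sigma2
    m <- (mu / tau2 + beta[2] * (y[mis] - beta[1]) / sigma2) / prec
    x[mis] <- rnorm(sum(mis), mean = m, sd = sqrt(1 / prec))
    # 2) regression coefficients given the completed data (flat prior)
    X <- cbind(1, x)
    XtX_inv <- solve(crossprod(X))
    beta_hat <- drop(XtX_inv %*% crossprod(X, y))
    R <- chol(sigma2 * XtX_inv)           # upper triangular, t(R) %*% R = covariance
    beta <- beta_hat + drop(t(R) %*% rnorm(2))
    # 3) residual variance: inverse gamma(n/2, SSR/2)
    ssr <- sum((y - X %*% beta)^2)
    sigma2 <- 1 / rgamma(1, shape = n / 2, rate = ssr / 2)
    # 4) covariate (imputation) model: mean and variance
    mu <- rnorm(1, mean = mean(x), sd = sqrt(tau2 / n))
    tau2 <- 1 / rgamma(1, shape = n / 2, rate = sum((x - mu)^2) / 2)
    draws[it, ] <- c(beta, sigma2, mu, tau2)
  }
  draws
}

draws <- gibbs_impute(y, x)
post <- colMeans(draws[-(1:1000), ])      # discard the first 1000 iterations as burn-in
post
```

In a real analysis, convergence should be checked (e.g. with trace plots and multiple chains) rather than relying on a fixed burn-in.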
In Practice: Analysis of the HCV Data

```r
library("JointAI")
library("splines")   # ns(): natural cubic splines for nonlinear time trends
library("survival")  # Surv(): survival outcome

fmla <- list(
  # formula for the survival model
  Surv(etime, event) ~ age + sex + alc + smoke + BMI + DM + year +
    logBili + logALT + logAST + Plt,
  # formulas for the longitudinal outcomes (random effects per id)
  logBili ~ age + sex + time + (time | id),
  logAST ~ age + sex + ns(time, df = 5) + (ns(time, df = 5) | id),
  logALT ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | id),
  Plt ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | id)
)

# joint model with imputation of the missing covariates
mod <- JM_imp(fmla, data = HCVdata, timevar = "time", n.iter = 2000)
```
In Practice: Analysis of the HCV Data
Additional options:
◮ covariate model types
◮ hyper-parameters
◮ number of chains & thinning interval
◮ ...
Additional features:
◮ use of auxiliary variables
◮ use of ridge shrinkage priors
◮ multi-level settings (e.g., multi-center)
◮ ...
For more info, see https://nerler.github.io/JointAI
Connecting Models
Longitudinal ➡ Survival
Longitudinal ➡ Longitudinal
◮ independent
Type of association:
◮ underlying value $\eta_{ki}(t)$
◮ slope
◮ cumulative effect
◮ time-lag
◮ ...
◮ combination of the above
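The association types listed above are different functionals of the same subject-specific trajectory $\eta_{ki}(t)$. A sketch with a hypothetical quadratic trajectory; `coef_i`, the helper names, the one-year lag and the association parameters `alpha` are assumptions for illustration. In the hazard, each of these quantities would be multiplied by its own association parameter.

```r
# Hypothetical subject-specific trajectory eta_ki(t) = a + b t + c t^2
# (fixed and random effects already combined; values made up)
coef_i <- c(a = 2, b = 0.3, c = -0.01)

eta   <- function(t) coef_i[["a"]] + coef_i[["b"]] * t + coef_i[["c"]] * t^2
d_eta <- function(t) coef_i[["b"]] + 2 * coef_i[["c"]] * t   # analytic derivative

# Association types evaluated at a single time point t
assoc <- function(t, lag = 1) {
  c(value      = eta(t),                                      # underlying value
    slope      = d_eta(t),                                    # rate of change
    cumulative = integrate(eta, lower = 0, upper = t)$value,  # area under eta on [0, t]
    lagged     = eta(max(t - lag, 0)))                        # value 'lag' years earlier
}

assoc(10)

# Combination of association types, e.g. value and slope together
alpha <- c(value = 0.8, slope = 2)
sum(alpha * assoc(10)[c("value", "slope")])
```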