Dealing with Missing Values in Multivariate Joint Models for Longitudinal and Survival Data


  1. Dealing with Missing Values in Multivariate Joint Models for Longitudinal and Survival Data
     Nicole Erler, Department of Biostatistics, Erasmus Medical Center
     n.erler@erasmusmc.nl, N_Erler, www.nerler.com, NErler
     ISCB 2020

  2. Chronic Hepatitis C
     Image: https://www.hepatitisc.uw.edu/go/evaluation-staging-monitoring/natural-history/core-concept/all

  3. Longitudinal Covariates
     [Figure: six panels showing log(AST), log(ALT), platelets, log(bilirubin), albumin and GGT against years since diagnosis (0 to 30).]

  4. Baseline Covariates
     [Figure: frequency plots of the baseline covariates, with the proportion of missing values (NA) per variable.]
     ◮ alcohol (4.2% NA)
     ◮ anti-HBc (10.7% NA)
     ◮ BMI (19.1% NA)
     ◮ diabetes (0.1% NA)
     ◮ smoking (15.1% NA)
     ◮ sex
     ◮ age
     ◮ year

  5. Missing Values in Longitudinal Covariates
     [Figure: one panel per patient (patients 1 to 7) showing the scaled biomarker values against follow-up (years); biomarkers: log(bilirubin), log(AST), log(ALT), platelets, albumin, GGT.]

  8. Multivariate Joint Model
     Proportional hazards model for time until event:
     $$h_i(t) = h_0(t) \exp\Big( \underbrace{x_i^\top \beta^{(tc)}}_{\text{time constant}} + \sum_{k=1}^{K} \underbrace{\eta_{ki}(t)^\top \beta_k^{(tv)}}_{\text{time varying}} \Big)$$
     Longitudinal (mixed) model for each biomarker $k = 1, \ldots, K$:
     $$E\big(y_{ki}(t) \mid b_{ki}\big) = \eta_{ki}(t) = \underbrace{x_{ki}(t)^\top \beta^{(k)}}_{\text{fixed effects}} + \underbrace{z_{ki}(t)^\top b_{ki}}_{\text{random effects}}$$
     Missing values in (baseline) covariates.
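
As a rough numerical illustration of the hazard on this slide, the following base-R sketch evaluates $h_i(t)$ for a single subject with $K = 2$ biomarkers. The constant baseline hazard, all coefficient values and the linear trajectories are made-up assumptions for illustration only; they are not estimates from the HCV data.

```r
# Hypothetical constant baseline hazard h_0(t) (assumption, not from the talk)
h0 <- function(t) rep(0.05, length(t))

# Time-constant part x_i' beta_tc (all values are assumptions)
x_i     <- c(age = 0.5, sex = 1)
beta_tc <- c(age = 0.3, sex = -0.2)

# Time-varying part: two biomarkers with linear subject-specific
# trajectories eta_ki(t) = intercept_k + slope_k * t
eta <- function(t) cbind(eta1 = 1.0 + 0.10 * t,
                         eta2 = 2.0 - 0.05 * t)
beta_tv <- c(eta1 = 0.4, eta2 = -0.3)

hazard <- function(t) {
  lp_tc <- sum(x_i * beta_tc)             # x_i' beta_tc
  lp_tv <- as.vector(eta(t) %*% beta_tv)  # sum over k of eta_ki(t) * beta_k_tv
  h0(t) * exp(lp_tc + lp_tv)
}

hazard(c(0, 5, 10))
```

Because the linear predictor depends on $t$ through $\eta_{ki}(t)$, the hazard changes over time even though the baseline hazard in this sketch is constant.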

  11. Imputation of Missing Covariates
      Imputation of a (baseline) variable $x_i$:
      ➡ sample from the predictive distribution of the missing values given the observed values, $p(x_i \mid \text{everything else})$, where "everything else" comprises
        ◮ other baseline variables
        ◮ repeatedly measured variables (incl. outcomes)
        ◮ survival outcome
      ➡ We cannot directly specify the (correct) imputation model!

  14. Imputation of Missing Covariates
      Idea:
      ◮ specify the joint distribution $p(\text{everything})$
      ◮ derive $p(x_i \mid \text{everything else})$ from $p(\text{everything})$
      But:
      $$p(\text{everything}) = p(\text{survival outcome}, \text{longitudinal outcomes}, \text{longitudinal covariates}, \text{baseline covariates}, \text{random effects}, \text{parameters}) = p(T, D, y, X, b, \theta)$$
      Does this really solve anything? Yes, it does!

  20. Fully Bayesian Analysis & Imputation
      From probability theory: $p(A, B) = p(A \mid B)\, p(B)$
      Joint distribution:
      $$p(T, D, y, X, b, \theta) = \underbrace{\underbrace{p(T, D \mid X, b, \theta)}_{\text{survival model}}\; \underbrace{p(y \mid X, b, \theta)}_{\text{multivariate longitudinal model}}}_{\text{analysis model}}\; \underbrace{p(X \mid \theta)}_{\text{imputation part}}\; \underbrace{p(b \mid \theta)}_{\text{random effects}}\; \underbrace{p(\theta)}_{\text{priors}}$$
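
To spell out why this factorization answers the question from the earlier slides: the full conditional of an incomplete covariate $x_i$ is proportional to the product of those factors of the joint distribution that involve $x_i$. A sketch in the notation of the slides, where $X_{-i}$ (our notation, not from the slides) denotes all covariates except $x_i$:

```latex
\begin{align*}
p(x_i \mid T, D, y, X_{-i}, b, \theta)
  &\propto p(T, D, y, X, b, \theta) \\
  &\propto p(T, D \mid X, b, \theta)\; p(y \mid X, b, \theta)\; p(X \mid \theta)
\end{align*}
```

The factors $p(b \mid \theta)$ and $p(\theta)$ do not depend on $x_i$ and therefore drop out of the proportionality. The imputation distribution thus follows from the survival model, the longitudinal models and the covariate model, and does not have to be specified directly.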

  23. Fully Bayesian Analysis & Imputation
      Imputation part:
      $$p(X \mid \theta) = p(x_1, \ldots, x_p, X_{compl.} \mid \theta) = p(x_1 \mid X_{compl.}, x_2, x_3, \ldots, x_p, \theta)\; p(x_2 \mid X_{compl.}, x_3, \ldots, x_p, \theta)\; \cdots\; p(x_p \mid X_{compl.}, \theta)$$
      Estimation: via MCMC
      ➡ Gibbs sampling (using Metropolis-Hastings, ...)
      Software: implemented in the R package JointAI (using JAGS)
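
To make the sampling idea concrete, here is a minimal base-R sketch of a single imputation step for one incomplete binary covariate. It is a toy illustration under strong assumptions (simulated data, one continuous outcome instead of a joint model, parameters held fixed at known values) and is not how JointAI or JAGS implement the sampler. All names and values are hypothetical.

```r
set.seed(2020)

# Simulated toy data (assumption): binary covariate x, continuous outcome y
n <- 200
x <- rbinom(n, 1, 0.4)
y <- 1 + 2 * x + rnorm(n, sd = 1)
x[sample(n, 40)] <- NA              # set 40 covariate values to missing
mis <- is.na(x)

# "Current" parameter values; a real Gibbs sampler would update these
# in every iteration as well
b0 <- 1; b1 <- 2; sigma <- 1; p <- 0.4

# Full conditional of x: p(x = 1 | y, theta) is proportional to
# p(y | x = 1, theta) * p(x = 1 | theta)
cond_prob <- function(y) {
  w1 <- dnorm(y, b0 + b1, sigma) * p
  w0 <- dnorm(y, b0,      sigma) * (1 - p)
  w1 / (w1 + w0)
}

# One imputation step: draw the missing values from their full conditional
x_imp <- x
x_imp[mis] <- rbinom(sum(mis), 1, cond_prob(y[mis]))

table(imputed = x_imp[mis])
```

In a full sampler this step would alternate with updates of the parameters and, in the joint-model setting, of the random effects, so that the imputed values reflect the uncertainty in all of them.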

  25. In Practice: Analysis of the HCV Data

      library("JointAI")
      library("splines")

      fmla <- list(
        # formula for survival model
        Surv(etime, event) ~ age + sex + alc + smoke + BMI + DM + year +
          logBili + logALT + logAST + Plt,

        # formulas for the longitudinal outcomes
        logBili ~ age + sex + time + (time | id),
        logAST ~ age + sex + ns(time, df = 5) + (ns(time, df = 5) | id),
        logALT ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | id),
        Plt ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | id)
      )

      mod <- JM_imp(fmla, data = HCVdata, timevar = "time", n.iter = 2000)

  26. In Practice: Analysis of the HCV Data
      Additional options:
      ◮ covariate model types
      ◮ hyper-parameters
      ◮ number of chains & thinning interval
      ◮ ...
      Additional features:
      ◮ use of auxiliary variables
      ◮ use of ridge shrinkage priors
      ◮ multi-level settings (e.g., multi-center)
      ◮ ...
      For more info, see https://nerler.github.io/JointAI

  29. Connecting Models
      Longitudinal ➡ Survival, type of association:
      ◮ underlying value $\eta_{ki}(t)$
      ◮ slope
      ◮ cumulative effect
      ◮ time-lag
      ◮ ...
      ◮ combination of the above
      Longitudinal ➡ Longitudinal:
      ◮ independent
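
The association types listed on this slide can be illustrated numerically. The base-R sketch below uses one common reading of these terms (current value, time derivative, area under the trajectory, and value at an earlier time point) for a hypothetical linear trajectory. The trajectory, the time point and the lag of one time unit are assumptions chosen for illustration.

```r
# Hypothetical subject-specific trajectory eta(t) = 1 + 0.5 * t
eta <- function(t) 1 + 0.5 * t

t0 <- 4  # time point at which the hazard is evaluated

value      <- eta(t0)                                      # underlying value
slope      <- (eta(t0 + 1e-4) - eta(t0 - 1e-4)) / 2e-4     # numerical derivative
cumulative <- integrate(eta, lower = 0, upper = t0)$value  # area under eta on [0, t0]
lagged     <- eta(t0 - 1)                                  # value one time unit earlier

c(value = value, slope = slope, cumulative = cumulative, lagged = lagged)
```

Each of these quantities, or a combination of them, could take the place of $\eta_{ki}(t)$ in the time-varying part of the linear predictor of the hazard.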
