Dealing with Missing Values in Multivariate Joint Models for - PowerPoint PPT Presentation

Dealing with Missing Values in Multivariate Joint Models for Longitudinal and Survival Data
Nicole Erler, Department of Biostatistics, Erasmus Medical Center
n.erler@erasmusmc.nl, N_Erler, www.nerler.com, NErler
ISCB 2020

Chronic Hepatitis C
Image: https://www.hepatitisc.uw.edu/go/evaluation-staging-monitoring/natural-history/core-concept/all

Longitudinal Covariates
[Figure: six panels, log(AST), log(ALT), platelets, log(bilirubin), albumin and GGT, each plotted against years since diagnosis]

Baseline Covariates
[Figure: frequency plots of the baseline covariates: alcohol (4.2% NA), anti-HBc (10.7% NA), BMI (19.1% NA), diabetes (0.1% NA), smoking (15.1% NA), sex, age and year]

Missing Values in Longitudinal Covariates
[Figure: scaled biomarker values for patients 1 to 7 against follow-up (years); biomarkers shown: log(bilirubin), log(AST), log(ALT), platelets, albumin and GGT]

Multivariate Joint Model
Proportional hazards model for time until event:
$$h_i(t) = h_0(t) \exp\Big( \underbrace{x_i^\top \beta^{(tc)}}_{\text{time constant}} + \underbrace{\sum_{k=1}^{K} \eta_{ki}(t)^\top \beta_k^{(tv)}}_{\text{time varying}} \Big)$$
Longitudinal (mixed) model for each biomarker $k = 1, \ldots, K$:
$$E\big(y_{ki}(t) \mid b_{ki}\big) = \eta_{ki}(t) = \underbrace{x_{ki}(t)^\top \beta^{(k)}}_{\text{fixed effects}} + \underbrace{z_{ki}(t)^\top b_{ki}}_{\text{random effects}}$$
Missing values in (baseline) covariates.
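To make the two model parts concrete, the following is a minimal base R sketch that evaluates the hazard of one subject from the proportional hazards formula on this slide. Every number in it, the baseline hazard `h0`, the covariate values and the coefficients are hypothetical values chosen for illustration; none of them come from the HCV analysis.

```r
# Hypothetical values for one subject and K = 2 biomarkers (illustration only)
h0 <- function(t) 0.05 * t^0.5          # assumed baseline hazard h_0(t)
x_i <- c(age = 0.5, male = 1)           # time-constant covariates x_i
beta_tc <- c(age = 0.3, male = -0.2)    # coefficients beta^(tc)
beta_tv <- c(0.8, -0.4)                 # association parameters beta_k^(tv)

# Subject-specific linear predictors eta_ki(t) = fixed effects + random effects
eta <- function(t) {
  c(
    (1.0 + 0.10 * t) + (0.2 + 0.05 * t),  # biomarker 1: random intercept and slope
    (2.0 - 0.05 * t) + (-0.1)             # biomarker 2: random intercept only
  )
}

# h_i(t) = h_0(t) * exp(time-constant part + time-varying part)
hazard <- function(t) {
  h0(t) * exp(sum(x_i * beta_tc) + sum(eta(t) * beta_tv))
}

hazard(2)
```

The time-varying part uses the model-based value of each biomarker at time t, not the observed measurements, which is why the longitudinal sub-models have to be estimated jointly with the survival model.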

Imputation of Missing Covariates
Imputation of a (baseline) variable $x_i$:
➡ sample from the predictive distribution of the missing values given the observed values
$$p(x_i \mid \text{everything else})$$
where "everything else" comprises:
◮ other baseline variables
◮ repeatedly measured variables (incl. outcomes)
◮ survival outcome
➡ We cannot directly specify the (correct) imputation model!

Imputation of Missing Covariates
Idea:
◮ specify the joint distribution $p(\text{everything})$
◮ derive $p(x_i \mid \text{everything else})$ from $p(\text{everything})$
But:
$$p(\text{everything}) = p(\text{survival outcome}, \text{longitudinal outcomes}, \text{longitudinal covariates}, \text{baseline covariates}, \text{random effects}, \text{parameters}) = p(T, D, y, X, b, \theta)$$
Does this really solve anything?
Yes, it does!

Fully Bayesian Analysis & Imputation
From probability theory: $p(A, B) = p(A \mid B)\, p(B)$
Joint distribution:
$$p(T, D, y, X, b, \theta) = \underbrace{\underbrace{p(T, D \mid X, b, \theta)}_{\text{survival model}} \; \underbrace{p(y \mid X, b, \theta)}_{\text{multivariate longitudinal model}}}_{\text{analysis model}} \; \underbrace{p(X \mid \theta)}_{\text{imputation part}} \; \underbrace{p(b \mid \theta)}_{\text{random effects}} \; \underbrace{p(\theta)}_{\text{priors}}$$
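The factorisation also shows why specifying the joint distribution settles the imputation model. As a worked step (with $X_{-i}$ as an assumed shorthand for all covariates other than $x_i$, a symbol not used on the slides), the full conditional of a missing covariate follows by dropping every factor that does not involve $x_i$:

```latex
\begin{align*}
p(x_i \mid T, D, y, X_{-i}, b, \theta)
  &= \frac{p(T, D, y, X, b, \theta)}{\int p(T, D, y, X, b, \theta)\, dx_i} \\
  &\propto p(T, D \mid X, b, \theta)\; p(y \mid X, b, \theta)\; p(X \mid \theta)
\end{align*}
```

The factors $p(b \mid \theta)$ and $p(\theta)$ cancel because they do not depend on $x_i$. For a categorical $x_i$ the integral becomes a sum. The survival and longitudinal models thus enter the imputation of $x_i$ automatically, without a separately specified imputation model that would have to be compatible with them.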

Fully Bayesian Analysis & Imputation
Imputation part:
$$p(X \mid \theta) = p(x_1, \ldots, x_p, X_{compl.} \mid \theta) = p(x_1 \mid X_{compl.}, x_2, x_3, \ldots, x_p, \theta)\; p(x_2 \mid X_{compl.}, x_3, \ldots, x_p, \theta) \cdots p(x_p \mid X_{compl.}, \theta)$$
Estimation: via MCMC ➡ Gibbs sampling (using Metropolis-Hastings, ...)
Software: Implemented in the R package JointAI (using JAGS)
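As a rough illustration of how a sampler can draw a missing covariate value from its full conditional, here is a toy base R sketch. It is not the algorithm JAGS runs for JointAI. It assumes a single normal outcome and a single normal covariate, holds all parameters fixed at made-up values, and uses a random-walk Metropolis-Hastings update. In this conjugate toy case the full conditional is known in closed form, which allows a sanity check.

```r
# Toy setting (all values are illustrative assumptions, not from the HCV data):
#   analysis model    y | x ~ N(b0 + b1 * x, sigma^2)
#   imputation model  x     ~ N(mu, tau^2)
set.seed(2020)
b0 <- 1; b1 <- 2; sigma <- 1   # analysis model parameters, held fixed here
mu <- 0; tau <- 1              # covariate model parameters, held fixed here
y_obs <- 3                     # observed outcome of a subject whose x is missing

# Log of the unnormalised full conditional: log p(y | x) + log p(x)
log_target <- function(x) {
  dnorm(y_obs, mean = b0 + b1 * x, sd = sigma, log = TRUE) +
    dnorm(x, mean = mu, sd = tau, log = TRUE)
}

# Random-walk Metropolis-Hastings updates for the missing x
n_iter <- 20000
x_draws <- numeric(n_iter)
x_cur <- mu
for (s in seq_len(n_iter)) {
  x_prop <- x_cur + rnorm(1, sd = 0.8)
  # accept the proposal with probability min(1, target ratio)
  if (log(runif(1)) < log_target(x_prop) - log_target(x_cur)) {
    x_cur <- x_prop
  }
  x_draws[s] <- x_cur
}

# Closed-form mean of the full conditional (normal-normal conjugacy)
post_prec <- 1 / tau^2 + b1^2 / sigma^2
post_mean <- (mu / tau^2 + b1 * (y_obs - b0) / sigma^2) / post_prec

# Compare the MCMC estimate (after discarding burn-in) with the exact mean
c(mcmc = mean(x_draws[-(1:2000)]), exact = post_mean)
```

In a full Gibbs sampler this update would alternate with updates of the parameters, the random effects and the other missing values, each drawn from its own full conditional.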

In Practice: Analysis of the HCV Data

    library("JointAI")
    library("splines")

    fmla <- list(
      # formula for survival model
      Surv(etime, event) ~ age + sex + alc + smoke + BMI + DM + year +
        logBili + logALT + logAST + Plt,
      # formulas for the longitudinal outcomes
      logBili ~ age + sex + time + (time | id),
      logAST ~ age + sex + ns(time, df = 5) + (ns(time, df = 5) | id),
      logALT ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | id),
      Plt ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | id)
    )

    mod <- JM_imp(fmla, data = HCVdata, timevar = "time", n.iter = 2000)

In Practice: Analysis of the HCV Data
Additional options:
◮ covariate model types
◮ hyper-parameters
◮ number of chains & thinning interval
◮ ...
Additional features:
◮ use of auxiliary variables
◮ use of ridge shrinkage priors
◮ multi-level settings (e.g., multi-center)
◮ ...
For more info, see https://nerler.github.io/JointAI

Connecting Models
Longitudinal ➡ Survival
type of association:
◮ underlying value $\eta_{ki}(t)$
◮ slope
◮ cumulative effect
◮ time-lag
◮ ...
◮ combination of the above
Longitudinal ➡ Longitudinal
◮ independent
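The slide names the association types without formulas. One common way to write them, given here as an assumed parameterisation from the general joint modelling literature and not necessarily the one used in this analysis, replaces $\eta_{ki}(t)$ in the hazard by a function of the subject-specific trajectory (the lag $c$ is a hypothetical constant):

```latex
\begin{align*}
\text{underlying value:} \quad & \eta_{ki}(t) \\
\text{slope:} \quad & \frac{d}{dt}\, \eta_{ki}(t) \\
\text{cumulative effect:} \quad & \int_0^t \eta_{ki}(s)\, ds \\
\text{time-lag:} \quad & \eta_{ki}(t - c), \quad c > 0
\end{align*}
```

A combination would include several of these terms for the same biomarker, each with its own association parameter.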
