Imputation of Incomplete Covariates in Longitudinal Data
Can Bayesian non-parametric methods prevent model-misspecification?
Nicole Erler and Dimitris Rizopoulos
Erasmus Medical Center, Rotterdam
15 July 2019
Nicole Erler and Dimitris Rizopoulos, ISCB 2019, Leuven

Motivation
What are risk factors for diabetic retinopathy?
  important predictors: blood pressure, haemoglobin A1c (HbA1c)
  other covariates: age at baseline, gender, diabetes duration, smoking history & status

Motivation
Challenge: Missing values
  retinopathy grade: 43%
  blood pressure: 20%
  HbA1c: 20%
  diabetes duration: 11%
  smoking history: 33%
  smoking status: 28%
Solution?
  Multiple Imputation
    MICE / FCS
    Joint Model (e.g. multivariate normal)
  Fully Bayesian
  ...

Fully Bayesian Analysis & Imputation
Joint distribution
\[ \underbrace{p(y \mid X, b, \theta)}_{\text{analysis model}} \; \underbrace{p(X \mid \theta)}_{\text{imputation part}} \; \underbrace{p(b \mid \theta)}_{\text{random effects}} \; \underbrace{p(\theta)}_{\text{priors}} \]
Imputation part
\[ p(\underbrace{x_1, \ldots, x_p, X_{\text{compl.}}}_{X} \mid \theta) = p(x_1 \mid X_{\text{compl.}}, \theta) \, p(x_2 \mid x_1, X_{\text{compl.}}, \theta) \, p(x_3 \mid x_1, x_2, X_{\text{compl.}}, \theta) \ldots \]
Software
Implemented in the R package JointAI

Handling Missing Values
Assumptions about
  association structure ➡ linear, additive
  conditional distribution ➡ normal (for continuous)
  missingness process ➡ ignorable
Violation of the implied assumptions may result in bias!

Real Data
Non-linear evolutions over time
[Figure: fitted value (lin. predictor) against follow-up time (years), 0 to 20; panels: HbA1c, retinopathy, SBP]

Real Data
Non-linear associations among variables
[Figure: fitted value (lin. predictor) against HbA1c; panels: retinopathy, SBP]

Bayesian P-Splines
Instead of
\[ y \sim \beta_0 + \beta_1 x_1 + \ldots \]
we assume
\[ y \sim \beta_0 + \sum_{\ell=1}^{d} \beta_\ell B_\ell(x_1) + \ldots \]
[Figure: y against x_1]
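
One common way to build basis functions B_ℓ is the Cox-de Boor recursion on equidistant knots. The sketch below is a minimal plain-Python illustration, not the JointAI implementation; the function names, the cubic degree and the knot layout are assumptions made for this example.

```python
def equidistant_knots(lower, upper, d, degree=3):
    """Equidistant knots giving d basis functions on [lower, upper).

    The knots extend `degree` steps beyond each boundary (assumed layout).
    """
    h = (upper - lower) / (d - degree)
    return [lower + (j - degree) * h for j in range(d + degree + 1)]


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions B_l(x) at a single point x."""
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[j] <= x < knots[j + 1] else 0.0
             for j in range(len(knots) - 1)]
    # Cox-de Boor recursion: raise the degree one step at a time.
    for p in range(1, degree + 1):
        new = []
        for j in range(len(knots) - p - 1):
            left_den = knots[j + p] - knots[j]
            right_den = knots[j + p + 1] - knots[j + 1]
            left = (x - knots[j]) / left_den * basis[j] if left_den > 0 else 0.0
            right = ((knots[j + p + 1] - x) / right_den * basis[j + 1]
                     if right_den > 0 else 0.0)
            new.append(left + right)
        basis = new
    return basis


def spline_value(x, beta, knots, degree=3):
    """Spline part of the linear predictor: sum over l of beta_l * B_l(x)."""
    return sum(b * B for b, B in zip(beta, bspline_basis(x, knots, degree)))
```

With `d = 4` the curve can bend only a little, while `d = 30` allows a very wiggly fit; in both cases the basis functions at any point inside the range are non-negative and sum to one.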

Bayesian P-Splines
How many B_ℓ's do we need?
\[ y \sim \beta_0 + \sum_{\ell=1}^{d} \beta_\ell B_\ell(x_1) + \ldots \]
[Figure: y against x_1, fitted with d = 4 and with d = 30 basis functions]

Bayesian P-Splines
Idea: Use many functions but restrict neighboring β's to be similar:
\[ (\beta_1, \ldots, \beta_d) \sim \mathrm{MVN}\!\left(0, \tfrac{1}{\sigma^2} D^\top D\right), \]
with penalty matrix D, for example:
\[ D = \begin{pmatrix}
1 & -2 & 1 & 0 & 0 & 0 & \cdots \\
0 & 1 & -2 & 1 & 0 & 0 & \cdots \\
 & \ddots & \ddots & \ddots & \ddots & \ddots & \\
\cdots & 0 & 0 & 1 & -2 & 1 & 0 \\
\cdots & 0 & 0 & 0 & 1 & -2 & 1
\end{pmatrix} \]
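
The example matrix D takes second-order differences of neighboring coefficients, so β'D'Dβ is the sum of squared second differences. The sketch below builds such a matrix and evaluates that quadratic form; the function names are hypothetical. If the second argument of the MVN is read as a precision matrix (an assumption, the slide does not say), the log prior equals -β'D'Dβ / (2σ²) up to a constant.

```python
def difference_matrix(d, order=2):
    """Return the (d - order) x d matrix that takes order-th differences."""
    # Start from the d x d identity and difference neighboring rows repeatedly.
    rows = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(order):
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def roughness(beta, D):
    """Quadratic form beta' D'D beta, i.e. the sum of squared differences."""
    return sum(sum(dij * bj for dij, bj in zip(row, beta)) ** 2 for row in D)


D = difference_matrix(5)
smooth = roughness([1.0, 2.0, 3.0, 4.0, 5.0], D)   # coefficients on a line
spiky = roughness([0.0, 0.0, 1.0, 0.0, 0.0], D)    # one isolated jump
```

Coefficients that lie on a straight line are not penalized at all, whereas an isolated jump is penalized heavily, which is what pulls neighboring β's towards each other.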

Simulation
Analysis model:
\[ y \sim \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots \]
Quadratic association between covariates:
\[ x_1 \sim \alpha_0 + \alpha_1 x_2 + \alpha_2 x_2^2 + \ldots \]
[Figure: x_1 (incompl. covariate) against x_2 (complete covariate), three panels]
[Figure: relative bias (axis from 1.00 to 1.75) for default and p-spline, three panels]
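
To see why a linear imputation model struggles in this setting, the following sketch simulates a quadratic association between x_1 and x_2, fits a straight line, and checks how much of the curvature is left in the residuals. It is not the simulation study from the talk: the coefficients, sample size, noise level and seed are all assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def curvature_left_in_residuals(alpha, n=2000, sd=1.0, seed=1):
    """Correlation between x2**2 and the residuals of a linear fit of x1 on x2."""
    rng = random.Random(seed)
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [alpha[0] + alpha[1] * v + alpha[2] * v * v + rng.gauss(0.0, sd)
          for v in x2]
    # Ordinary least squares for the (misspecified) linear model x1 ~ x2.
    m1, m2 = mean(x1), mean(x2)
    slope = (sum((a - m2) * (b - m1) for a, b in zip(x2, x1))
             / sum((a - m2) ** 2 for a in x2))
    intercept = m1 - slope * m2
    residuals = [b - (intercept + slope * a) for a, b in zip(x2, x1)]
    return correlation(residuals, [a * a for a in x2])


quadratic = curvature_left_in_residuals((0.0, 1.0, 2.0))  # alpha_2 = 2
linear = curvature_left_in_residuals((0.0, 1.0, 0.0))     # alpha_2 = 0
```

When α_2 is non-zero the residuals remain strongly related to x_2², so values imputed from the linear model would miss that structure; with α_2 = 0 the relation vanishes.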

Real Data
Non-normal continuous distributions
[Figure: histograms (count against value); panels: diabetes duration, HbA1c]

Mixture of normal distributions
[Figure: density against value; panels: diabetes duration, HbA1c]

Dirichlet Process Mixture Model
\[ x_{1i} \mid \theta_i \sim F(\theta_i) \]
\[ \theta_i \mid G \sim G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k^*} \]
\[ G \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0) \]
DP ➡ stick-breaking construction
e.g.
\[ x_{1i} \sim N(\mu_k, \sigma_k^2) \]
or
\[ x_{1i} \sim N(\eta_i + \mu_k, \sigma_k^2), \quad \text{with } \eta_i = \alpha_1 x_{2i} + \alpha_2 x_{3i} + \ldots \]
with priors p(µ_k) and p(σ_k²)
➡ very flexible
➡ little contribution
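
The stick-breaking construction generates the weights π_k by repeatedly breaking off a Beta(1, α) fraction of what remains of a unit-length stick. The sketch below uses a truncated version with a finite number of components; the truncation, the base measure G_0 for (µ_k, σ_k) and all function names are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking: pi_k = v_k * prod_{j<k} (1 - v_j)."""
    weights = []
    remaining = 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)   # v_k ~ Beta(1, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes the rest of the stick
    return weights


def sample_dp_normal_mixture(n, alpha, n_components, rng):
    """Draw n values from a truncated DP mixture of normals."""
    weights = stick_breaking_weights(alpha, n_components, rng)
    # Assumed base measure G_0: mu_k ~ N(0, 3^2), sigma_k ~ Uniform(0.5, 1.5).
    atoms = [(rng.gauss(0.0, 3.0), rng.uniform(0.5, 1.5))
             for _ in range(n_components)]
    draws = []
    for _ in range(n):
        k = rng.choices(range(n_components), weights=weights)[0]
        mu, sigma = atoms[k]
        draws.append(rng.gauss(mu, sigma))
    return draws, weights


rng = random.Random(2019)
draws, weights = sample_dp_normal_mixture(500, alpha=1.0, n_components=20, rng=rng)
```

A small α puts most of the mass on the first few components, while a large α spreads it over many.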

Mixture of Polya Trees
[Figure: density against diabetes duration (0 to 50), partitioned level by level; the splits at successive levels are labelled Beta(a_0, a_0), Beta(a_1, a_1), Beta(a_2, a_2), Beta(a_3, a_3)]
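
A Polya tree splits the sample space into two halves, then each half into two again, and so on; at level m the probability of going left at each split is drawn from Beta(a_m, a_m). The sketch below draws the bin probabilities of a single finite tree. It does not implement the mixture over centering distributions, and the choice a_m = (m + 1)², the depth and the function name are assumptions made for illustration.

```python
import random


def polya_tree_bin_probs(a, rng):
    """Bin probabilities of a finite Polya tree with len(a) levels.

    a[m] is the Beta parameter shared by all splits at level m.
    Returns 2 ** len(a) probabilities, ordered from left to right.
    """
    probs = [1.0]
    for a_m in a:
        new = []
        for p in probs:
            left = rng.betavariate(a_m, a_m)   # share of mass sent to the left half
            new.extend([p * left, p * (1.0 - left)])
        probs = new
    return probs


rng = random.Random(15)
a = [float((m + 1) ** 2) for m in range(4)]   # assumed: a_m = (m + 1)^2
bin_probs = polya_tree_bin_probs(a, rng)
```

Because every split is symmetric, Beta(a_m, a_m) has mean 1/2, so the tree is centered on equal bin probabilities; larger a_m at deeper levels keeps the fine-scale splits close to 1/2 and the resulting density smoother.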

Practical Issues & Ideas
flexible fit needs observed data everywhere
[Figure: incomplete covariate against covariate; points marked as missing or observed]