Imputation of Incomplete Covariates in Longitudinal Data: Can Bayesian non-parametric methods prevent model-misspecification?

  1. Imputation of Incomplete Covariates in Longitudinal Data: Can Bayesian non-parametric methods prevent model-misspecification? Nicole Erler and Dimitris Rizopoulos, Erasmus Medical Center, Rotterdam. 15 July 2019. ISCB 2019, Leuven.

  2. Motivation. What are risk factors for diabetic retinopathy? Important predictors: blood pressure, haemoglobin A1c (HbA1c). Other covariates: age at baseline, gender, diabetes duration, smoking history & status.

  3. Motivation. Challenge: missing values. Retinopathy grade: 43%; blood pressure: 20%; HbA1c: 20%; diabetes duration: 11%; smoking history: 33%; smoking status: 28%.

  4. Motivation. Solution? Multiple Imputation; MICE / FCS; Joint Model (e.g. multivariate normal); Fully Bayesian; ...

  5. Fully Bayesian Analysis & Imputation. Joint distribution: $p(y \mid X, b, \theta)\, p(X \mid \theta)\, p(b \mid \theta)\, p(\theta)$, where $p(y \mid X, b, \theta)$ is the analysis model, $p(X \mid \theta)$ the imputation part, $p(b \mid \theta)$ the random effects, and $p(\theta)$ the priors.

  6. Fully Bayesian Analysis & Imputation. Imputation part, with $X = (x_1, \ldots, x_p, X_{\text{compl.}})$: $p(x_1, \ldots, x_p, X_{\text{compl.}} \mid \theta) = p(x_1 \mid X_{\text{compl.}}, \theta)\, p(x_2 \mid x_1, X_{\text{compl.}}, \theta)\, p(x_3 \mid x_1, x_2, X_{\text{compl.}}, \theta) \ldots$

  7. Fully Bayesian Analysis & Imputation. Software: implemented in the R package JointAI.
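The sequential factorization of the imputation part can be illustrated with a minimal Python sketch. It is not the JointAI implementation: the function name, the linear conditional means, and all parameter values are hypothetical, and the sketch draws only from the covariate models. In the fully Bayesian approach the draws would also reflect the analysis model $p(y \mid X, b, \theta)$, and $\theta$ would be sampled rather than fixed.

```python
import random


def impute_sequentially(x_compl, x1=None, x2=None, rng=random):
    """Fill in missing x1 and x2 following p(x1 | x_compl) p(x2 | x1, x_compl).

    All coefficients below are hypothetical, fixed values chosen for
    illustration; they stand in for the parameters theta.
    """
    if x1 is None:
        # p(x1 | X_compl, theta): normal with a linear mean (assumption)
        x1 = rng.gauss(0.5 + 1.0 * x_compl, 1.0)
    if x2 is None:
        # p(x2 | x1, X_compl, theta): conditions on the (possibly imputed) x1
        x2 = rng.gauss(-1.0 + 0.3 * x1 + 0.7 * x_compl, 1.0)
    return x1, x2
```

Observed values are passed through unchanged, so only the missing entries are drawn, and each later covariate conditions on the earlier ones in the sequence.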

  8. Handling Missing Values. Assumptions about: association structure (linear, additive); conditional distribution (normal, for continuous variables); missingness process (ignorable).

  9. Handling Missing Values. Violation of the implied assumptions may result in bias!

  10. Real Data. Non-linear evolutions over time. [Figure: three panels (HbA1c, retinopathy, SBP) showing the fitted value (linear predictor) against follow-up time (years), 0 to 20.]

  11. Real Data. Non-linear associations among variables. [Figure: two panels (retinopathy, SBP) showing the fitted value (linear predictor) against HbA1c.]

  12. Bayesian P-Splines. Instead of $y \sim \beta_0 + \beta_1 x_1 + \ldots$ we assume $y \sim \beta_0 + \sum_{\ell=1}^{d} \beta_\ell B_\ell(x_1) + \ldots$ [Figure: $y$ against $x_1$.]

  15. Bayesian P-Splines. How many $B_\ell$'s do we need? With $d = 4$: $y \sim \beta_0 + \sum_{\ell=1}^{4} \beta_\ell B_\ell(x_1) + \ldots$ [Figure: $y$ against $x_1$.]

  16. Bayesian P-Splines. How many $B_\ell$'s do we need? With $d = 30$: $y \sim \beta_0 + \sum_{\ell=1}^{30} \beta_\ell B_\ell(x_1) + \ldots$ [Figure: $y$ against $x_1$.]
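To make the basis functions $B_\ell$ concrete, here is a minimal sketch that evaluates a B-spline basis with the Cox-de Boor recursion. The slides do not state the degree or knot placement; cubic splines on equidistant knots are assumed here, and both function names are hypothetical.

```python
def equidistant_knots(lo, hi, n_segments, degree):
    """Equidistant knots on [lo, hi], extended by `degree` knots on each side."""
    h = (hi - lo) / n_segments
    return [lo + h * (j - degree) for j in range(n_segments + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x."""
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # Cox-de Boor recursion: raise the degree one step at a time.
    for p in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - p - 1):
            left_den = knots[i + p] - knots[i]
            right_den = knots[i + p + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + p + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            raised.append(left + right)
        basis = raised
    return basis
```

With this knot layout the number of basis functions is `n_segments + degree`, so a cubic basis with 1 segment gives $d = 4$ and one with 27 segments gives $d = 30$. For any `x` in `[lo, hi)` the basis values are non-negative and sum to one.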

  17. Bayesian P-Splines. Idea: use many functions but restrict neighboring $\beta$'s to be similar: $(\beta_1, \ldots, \beta_d) \sim \text{MVN}(0, \tfrac{1}{\sigma^2} D^\top D)$, with penalty matrix $D$, for example:

      $$D = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 & 0 & \cdots \\ 0 & 1 & -2 & 1 & 0 & 0 & \cdots \\ & \ddots & \ddots & \ddots & \ddots & \ddots & \\ \cdots & 0 & 0 & 1 & -2 & 1 & 0 \\ \cdots & 0 & 0 & 0 & 1 & -2 & 1 \end{pmatrix}$$
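The effect of the penalty matrix can be checked with a small sketch that builds the second-order difference matrix $D$ shown above and evaluates $\beta^\top D^\top D \beta$, the sum of squared second differences of the coefficients. The function names are hypothetical.

```python
def second_difference_matrix(d):
    """(d - 2) x d matrix whose rows are (1, -2, 1), shifted along the diagonal."""
    D = [[0.0] * d for _ in range(d - 2)]
    for r in range(d - 2):
        D[r][r], D[r][r + 1], D[r][r + 2] = 1.0, -2.0, 1.0
    return D


def roughness(beta):
    """Compute beta^T D^T D beta as the squared norm of D beta."""
    D = second_difference_matrix(len(beta))
    return sum(
        sum(row[j] * beta[j] for j in range(len(beta))) ** 2
        for row in D
    )
```

Coefficients on a straight line, such as `[0, 1, 2, 3, 4]`, have zero roughness, while a zigzag such as `[0, 1, 0, 1, 0]` is penalized. Because linear sequences are not penalized, $D^\top D$ has rank $d - 2$ and is not invertible, which suggests (as an assumption about the slide's notation) that the second argument of the MVN is a precision rather than a covariance matrix.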

  18. Simulation. Analysis model: $y \sim \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots$ Quadratic association between covariates: $x_1 \sim \alpha_0 + \alpha_1 x_2 + \alpha_2 x_2^2 + \ldots$ [Figure: three panels of $x_1$ (incomplete covariate) against $x_2$ (complete covariate).]

  19. Simulation. [Figure: relative bias of the default and P-spline approaches for each of the three panels.]
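Why a linear imputation model struggles with a quadratic association can be seen in a toy sketch. It does not reproduce the simulation study: the coefficient values, sample size, seed, and cut-offs are hypothetical, and it only fits a straight line for $x_1$ given $x_2$ by least squares and inspects the residuals.

```python
import random


def simulate_covariates(n, a0=1.0, a1=1.0, a2=2.0, seed=2019):
    """Draw x2 ~ N(0, 1) and x1 = a0 + a1*x2 + a2*x2^2 + noise (toy values)."""
    rng = random.Random(seed)
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [a0 + a1 * v + a2 * v * v + rng.gauss(0.0, 1.0) for v in x2]
    return x1, x2


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean(values):
    return sum(values) / len(values)


x1, x2 = simulate_covariates(2000)
intercept, slope = fit_line(x2, x1)
residuals = [y - (intercept + slope * x) for x, y in zip(x2, x1)]

# Mean residual of the linear fit near the centre and in the tails of x2.
centre = mean([r for r, v in zip(residuals, x2) if abs(v) < 0.5])
tails = mean([r for r, v in zip(residuals, x2) if abs(v) > 1.5])
print(f"mean residual, centre: {centre:.2f}; tails: {tails:.2f}")
```

The linear fit overshoots in the centre of $x_2$ and undershoots in the tails, so values imputed from it would be systematically misplaced, which is the kind of misspecification the P-spline imputation model is meant to avoid.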

  20. Real Data. Non-normal continuous distributions. [Figure: histograms (count against value) of diabetes duration and HbA1c.]

  21. Mixture of normal distributions. [Figure: density against value for diabetes duration and HbA1c.]

  23. Dirichlet Process Mixture Model. $x_{1i} \mid \theta_i \sim F(\theta_i)$; $\theta_i \mid G \sim G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k^*}$; $G \mid \alpha, G_0 \sim \text{DP}(\alpha, G_0)$ (stick-breaking construction); e.g. $x_{1i} \sim N(\mu_k, \sigma_k^2)$.

  24. Dirichlet Process Mixture Model. E.g. $x_{1i} \sim N(\mu_k, \sigma_k^2)$, or $x_{1i} \sim N(\eta_i + \mu_k, \sigma_k^2)$ with $\eta_i = \alpha_1 x_{2i} + \alpha_2 x_{3i} + \ldots$, and priors $p(\mu_k)$ and $p(\sigma_k^2)$.

  25. Dirichlet Process Mixture Model. Remarks on the slide: very flexible; little contribution.
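The stick-breaking construction can be sketched in a few lines of Python. This is a truncated approximation with a finite number of components, not the infinite sum from the slide. The base measure $G_0$ used for $(\mu_k, \sigma_k)$ and the function names are hypothetical.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking: pi_k = v_k * prod_{l<k} (1 - v_l), v_k ~ Beta(1, alpha)."""
    weights, remaining = [], 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)   # break off a fraction v of what is left
        remaining *= 1.0 - v
    weights.append(remaining)           # last component takes the leftover stick
    return weights


def sample_dp_mixture(n, alpha, n_components, rng):
    """Draw n values from a truncated DP mixture of normals."""
    weights = stick_breaking_weights(alpha, n_components, rng)
    # Hypothetical base measure G0: mu_k ~ N(0, 3^2), sigma_k ~ Uniform(0.3, 1).
    atoms = [(rng.gauss(0.0, 3.0), rng.uniform(0.3, 1.0)) for _ in weights]
    draws = []
    for _ in range(n):
        mu, sigma = rng.choices(atoms, weights=weights, k=1)[0]
        draws.append(rng.gauss(mu, sigma))
    return weights, draws
```

A small `alpha` concentrates most of the weight on a few components, while a larger `alpha` spreads it over many, which is what lets the mixture adapt to skewed or multimodal covariates.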

  26. Mixture of Polya Trees. [Figure: density (0.0 to 0.4) of diabetes duration (0 to 50), with a partition level labelled Beta(a_0, a_0).]

  27. Mixture of Polya Trees. [Same figure with a further partition level labelled Beta(a_1, a_1).]

  28. Mixture of Polya Trees. [Same figure with a further partition level labelled Beta(a_2, a_2).]

  29. Mixture of Polya Trees. [Same figure with a further partition level labelled Beta(a_3, a_3).]
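The nested Beta splits in the figure can be sketched as follows. This is a minimal illustration of a single Polya tree on a dyadic partition, assuming each set is split in two with a symmetric Beta(a_j, a_j) share at level j. The function name is hypothetical, and the mixing over a centering distribution (the "mixture" part) is not implemented.

```python
import random


def polya_tree_bin_probs(depth, a, rng):
    """Random probabilities of the 2**depth bins of a Polya tree.

    At level j every set is split in two: the left child receives a share
    Y ~ Beta(a(j), a(j)) of its parent's probability, the right child 1 - Y.
    """
    probs = [1.0]
    for level in range(depth):
        a_j = a(level)
        split = []
        for p in probs:
            y = rng.betavariate(a_j, a_j)
            split.extend([p * y, p * (1.0 - y)])
        probs = split
    return probs
```

For example, `polya_tree_bin_probs(4, lambda j: (j + 1) ** 2, random.Random(0))` returns 16 bin probabilities that sum to one. The choice `a(j) = (j + 1) ** 2` is only an illustrative assumption: parameters that grow with the level make deeper splits closer to one half, which gives smoother random densities.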

  33. Practical Issues & Ideas. Flexible fit needs observed data everywhere.

  34. Practical Issues & Ideas. Flexible fit needs observed data everywhere. [Figure: incomplete covariate against covariate, distinguishing missing and observed values.]
