Imputation of Incomplete Covariates in Longitudinal Data



SLIDE 1

Imputation of Incomplete Covariates in Longitudinal Data

Can Bayesian non-parametric methods prevent model-misspecification? Nicole Erler and Dimitris Rizopoulos

Erasmus Medical Center, Rotterdam

15 July 2019

Nicole Erler and Dimitris Rizopoulos, ISCB 2019, Leuven 1

SLIDE 2

Motivation

What are risk factors for diabetic retinopathy?

Important predictors:

  • blood pressure
  • haemoglobin A1c (HbA1c)

Other covariates:

  • age at baseline
  • gender
  • diabetes duration
  • smoking history & status


SLIDE 4

Motivation

Challenge: missing values

  • retinopathy grade: 43%
  • blood pressure: 20%
  • HbA1c: 20%
  • diabetes duration: 11%
  • smoking history: 33%
  • smoking status: 28%

Solution?

  • Multiple Imputation
      • MICE / FCS
      • Joint Model (e.g. multivariate normal)
  • Fully Bayesian ...


SLIDE 7

Fully Bayesian Analysis & Imputation

Joint distribution:

$$
\underbrace{p(y \mid X, b, \theta)}_{\text{analysis model}} \;
\underbrace{p(X \mid \theta)}_{\text{imputation part}} \;
\underbrace{p(b \mid \theta)}_{\text{random effects}} \;
\underbrace{p(\theta)}_{\text{priors}}
$$

Imputation part:

$$
p(\underbrace{x_1, \ldots, x_p, X_{\text{compl.}}}_{X} \mid \theta)
= p(x_1 \mid X_{\text{compl.}}, \theta)\,
  p(x_2 \mid x_1, X_{\text{compl.}}, \theta)\,
  p(x_3 \mid x_1, x_2, X_{\text{compl.}}, \theta) \ldots
$$

Software: implemented in the R package JointAI
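
The sequential factorisation of the imputation part can be illustrated with a small forward-sampling sketch. It assumes normal linear conditionals, and all coefficients, standard deviations and the function name `draw_sequential` are hypothetical. It is not how JointAI is implemented: in the fully Bayesian model the missing values are sampled jointly with the analysis model, whereas this only shows the order in which the conditionals are chained.

```python
import random

def draw_sequential(x_compl, coefs, sds, rng):
    """Draw x1, ..., xp one after another, each from a normal linear model
    conditional on the complete covariate and all previously drawn x's.

    coefs[j] = [intercept, slope for x_compl, slopes for x1, ..., xj],
    sds[j] is the residual standard deviation (illustrative values only)."""
    drawn = []
    for beta, sd in zip(coefs, sds):
        predictors = [1.0, x_compl] + drawn
        mean = sum(b * v for b, v in zip(beta, predictors))
        drawn.append(rng.gauss(mean, sd))
    return drawn

rng = random.Random(2019)
coefs = [
    [0.0, 1.0],             # x1 | X_compl
    [1.0, 0.5, -0.3],       # x2 | x1, X_compl
    [-1.0, 0.2, 0.4, 0.1],  # x3 | x1, x2, X_compl
]
sds = [1.0, 0.5, 0.8]
sample = draw_sequential(x_compl=0.7, coefs=coefs, sds=sds, rng=rng)
```

Each new covariate only conditions on those earlier in the sequence, which is why the order of the incomplete covariates matters for how the models are specified.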


SLIDE 9

Handling Missing Values

Assumptions about

  • association structure ➡ linear, additive
  • conditional distribution ➡ normal (for continuous)
  • missingness process ➡ ignorable

Violation of the implied assumptions may result in bias!

SLIDE 10

Real Data

Non-linear evolutions over time

[Figure: fitted value (linear predictor) against follow-up time (years); panels: HbA1c, retinopathy, SBP]

SLIDE 11

Real Data

Non-linear associations among variables

[Figure: fitted value (linear predictor) against HbA1c; panels: retinopathy, SBP]

SLIDE 12

Bayesian P-Splines

Instead of

$$ y \sim \beta_0 + \beta_1 x_1 + \ldots $$

we assume

$$ y \sim \beta_0 + \sum_{\ell=1}^{d} \beta_\ell B_\ell(x_1) + \ldots $$

[Figure: y against x1]
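
A minimal sketch of how the basis functions $B_\ell(x_1)$ could be evaluated, using the Cox-de Boor recursion on equally spaced knots. The slides do not say which degree or knots are used; the cubic degree, the knot sequence and the function names below are assumptions for illustration.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x
    with the Cox-de Boor recursion. `knots` must be non-decreasing.
    Returns len(knots) - degree - 1 values."""
    # Degree 0: indicator of the knot interval that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for k in range(1, degree + 1):
        new = []
        for i in range(len(knots) - k - 1):
            left = right = 0.0
            if knots[i + k] > knots[i]:
                left = (x - knots[i]) / (knots[i + k] - knots[i]) * basis[i]
            if knots[i + k + 1] > knots[i + 1]:
                right = ((knots[i + k + 1] - x)
                         / (knots[i + k + 1] - knots[i + 1]) * basis[i + 1])
            new.append(left + right)
        basis = new
    return basis

def spline_predictor(x, beta0, betas, knots, degree=3):
    """beta0 + sum_l beta_l * B_l(x)."""
    return beta0 + sum(b * B for b, B in
                       zip(betas, bspline_basis(x, knots, degree)))

knots = list(range(-3, 8))           # equally spaced; d = 7 cubic basis functions
values = bspline_basis(1.5, knots, 3)
```

With these knots the basis functions sum to one for any x in [0, 4), so setting all $\beta_\ell$ to the same constant gives a flat curve; the shape comes entirely from differences between neighbouring coefficients.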


SLIDE 15

Bayesian P-Splines

How many $B_\ell$'s do we need?

$$ y \sim \beta_0 + \sum_{\ell=1}^{d = 4} \beta_\ell B_\ell(x_1) + \ldots $$

[Figure: y against x1]

SLIDE 16

Bayesian P-Splines

How many $B_\ell$'s do we need?

$$ y \sim \beta_0 + \sum_{\ell=1}^{d = 30} \beta_\ell B_\ell(x_1) + \ldots $$

[Figure: y against x1]

SLIDE 17

Bayesian P-Splines

Idea: Use many functions but restrict neighboring β's to be similar:

$$ (\beta_1, \ldots, \beta_d) \sim \text{MVN}\left(0, \tfrac{1}{\sigma^2} D^\top D\right), $$

with penalty matrix $D$, for example:

$$
D = \begin{pmatrix}
1 & -2 & 1 & 0 & \cdots & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & -2 & 1 & 0 \\
0 & \cdots & 0 & 1 & -2 & 1
\end{pmatrix}
$$
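
To make the penalty concrete, the sketch below builds the second-order difference matrix $D$ and evaluates $\beta^\top D^\top D \beta$, the sum of squared second differences of the coefficients. The function names are hypothetical. Reading $\tfrac{1}{\sigma^2} D^\top D$ as a precision matrix (as in the JAGS parametrisation of the multivariate normal) is an assumption; the slide does not state it.

```python
def difference_matrix(d, order=2):
    """Return the (d - order) x d matrix that takes order-th differences."""
    D = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(order):
        # Subtract each row from the next one: one differencing step.
        D = [[D[i + 1][j] - D[i][j] for j in range(d)]
             for i in range(len(D) - 1)]
    return D

def penalty(beta, order=2):
    """beta' D'D beta, i.e. the sum of squared order-th differences."""
    D = difference_matrix(len(beta), order)
    diffs = [sum(dij * bj for dij, bj in zip(row, beta)) for row in D]
    return sum(v * v for v in diffs)

D = difference_matrix(5)
smooth = penalty([0.0, 1.0, 2.0, 3.0, 4.0])   # coefficients on a straight line
wiggly = penalty([0.0, 1.0, 0.0, 1.0, 0.0])   # alternating coefficients
```

Coefficients that lie on a straight line are not penalised at all, while alternating coefficients are penalised heavily, which is what allows a large number of basis functions without overfitting.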


SLIDE 19

Simulation

Analysis model:

$$ y \sim \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots $$

Quadratic association between covariates:

$$ x_1 \sim \alpha_0 + \alpha_1 x_2 + \alpha_2 x_2^2 + \ldots $$

[Figure: x1 (incomplete covariate) against x2 (complete covariate), three panels]

[Figure: relative bias for the default and the P-spline imputation model, three panels]
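
Why a linear imputation model struggles in this setting can be seen in a toy version of the scenario. The sample size, coefficients and helper names below are assumptions and not the authors' simulation settings; the sketch only shows that a straight-line fit of x1 on x2 leaves the curvature in the residuals.

```python
import random
from statistics import mean

def ols_line(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u)
           * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

rng = random.Random(1)
x2 = [rng.gauss(0, 1) for _ in range(2000)]
# Quadratic association: x1 = alpha0 + alpha1 * x2 + alpha2 * x2^2 + noise
x1 = [1.0 + 0.5 * v + 2.0 * v ** 2 + rng.gauss(0, 1) for v in x2]

a, b = ols_line(x2, x1)                       # misspecified linear model
resid = [y - (a + b * v) for v, y in zip(x2, x1)]
r_lin = correlation(resid, x2)                # zero by construction of OLS
r_quad = correlation(resid, [v ** 2 for v in x2])  # curvature left over
```

Values imputed from the linear fit would be centred on the wrong mean wherever x2 is far from zero, which is the kind of misspecification that can bias the analysis model.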

SLIDE 20

Real Data

Non-normal continuous distributions

[Figure: count against value; panels: diabetes duration, HbA1c]

SLIDE 21

Mixture of normal distributions

[Figure: density against value; panels: diabetes duration, HbA1c]



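SLIDE 25

Dirichlet Process Mixture Model

$$
\begin{aligned}
x_{1i} \mid \theta_i &\sim F(\theta_i) \\
\theta_i \mid G &\sim G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k^*} \quad \text{(stick-breaking construction)} \\
G \mid \alpha, G_0 &\sim \text{DP}(\alpha, G_0)
\end{aligned}
$$

e.g. $x_{1i} \sim N(\mu_k, \sigma_k^2)$, or, including other covariates,

$$
x_{1i} \sim N(\eta_i + \mu_k, \sigma_k^2), \qquad \eta_i = \alpha_1 x_{2i} + \alpha_2 x_{3i} + \ldots,
$$

with priors $p(\mu_k)$ and $p(\sigma_k^2)$.

  • very flexible
  • little contribution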
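
The stick-breaking construction can be sketched as follows, truncated at a finite number of components so that it can be simulated. The truncation level, the concentration parameter, the base-measure draws for $\mu_k$ and $\sigma_k$, and the function names are all assumptions for illustration.

```python
import random

def stick_breaking(alpha, n_components, rng):
    """Truncated stick-breaking weights: v_k ~ Beta(1, alpha) and
    pi_k = v_k * prod_{j<k} (1 - v_j); the last weight takes the remainder
    so that the weights sum to one."""
    weights, remaining = [], 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def draw_x(eta, weights, mus, sigmas, rng):
    """Pick a component k with probability pi_k, then draw from
    N(eta + mu_k, sigma_k^2)."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(eta + mus[k], sigmas[k])

rng = random.Random(14)
K = 20                                              # truncation level (assumed)
pi = stick_breaking(alpha=1.0, n_components=K, rng=rng)
mus = [rng.gauss(0.0, 3.0) for _ in range(K)]       # stand-in for G0 (assumed)
sigmas = [rng.uniform(0.5, 1.5) for _ in range(K)]  # stand-in for G0 (assumed)
x = [draw_x(0.0, pi, mus, sigmas, rng) for _ in range(5)]
```

A small `alpha` puts most of the mass on the first few components, while a large `alpha` spreads it over many, so the effective number of mixture components is learned rather than fixed in advance.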


SLIDE 29

Mixture of Polya Trees

[Figure: density against diabetes duration; successive levels of the tree are labelled Beta(a0, a0), Beta(a1, a1), Beta(a2, a2), Beta(a3, a3)]
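
A rough sketch of a single finite Polya tree, assuming the usual binary construction: at every level each interval is split in two, and the share of probability going to the left half is drawn from a symmetric Beta distribution. The choice $a_j = c\,(j+1)^2$, the depth and the function name are assumptions, and the mixing over centring distributions that makes it a mixture of Polya trees is left out.

```python
import random

def polya_tree_bin_probs(depth, c, rng):
    """Draw the probabilities of the 2**depth bins of a finite Polya tree.
    At level j = 0, ..., depth - 1 every interval is split in two and the
    share going left is Beta(a_j, a_j), with a_j = c * (j + 1)**2
    (an assumed but common choice)."""
    probs = [1.0]
    for j in range(depth):
        a_j = c * (j + 1) ** 2
        nxt = []
        for p in probs:
            left = rng.betavariate(a_j, a_j)
            nxt.extend([p * left, p * (1.0 - left)])
        probs = nxt
    return probs

rng = random.Random(15)
probs = polya_tree_bin_probs(depth=4, c=1.0, rng=rng)   # uses a_0, ..., a_3
```

Because $a_j$ grows with the level, splits deep in the tree stay close to one half, so the random density is allowed to deviate from the centring distribution mainly at coarse scales.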



SLIDE 34

Practical Issues & Ideas

  • flexible fit needs observed data everywhere

[Figure: incomplete covariate against covariate, with missing and observed values marked]


SLIDE 37

Practical Issues & Ideas

  • flexible fit needs observed data everywhere
  • computational time

Ideas:

  • posterior predictive checks
      • χ² type of tests
      • Kolmogorov-Smirnov test?
      • discordance tests?
  • feasibility checks before running the model?
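
One way the Kolmogorov-Smirnov idea could be used in a posterior predictive check is to compare the observed values of a covariate with values replicated from the fitted imputation model. The sketch below computes the two-sample statistic only; the data are simulated stand-ins and the function name is hypothetical.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:   # the maximum is attained at one of the data points
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d

rng = random.Random(16)
observed = [rng.expovariate(1.0) for _ in range(500)]    # skewed "data"
replicated = [rng.gauss(1.0, 1.0) for _ in range(500)]   # draws from a normal model
d_obs = ks_statistic(observed, replicated)
```

Repeating the comparison for many replicated data sets would give a reference distribution for the statistic; a consistently large value suggests that the assumed conditional distribution does not match the observed one.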

SLIDE 38

Preliminary Conclusion

Can Bayesian non-parametric methods reduce (rather than prevent) model-misspecification?

SLIDE 39

Thank you for your attention.

  • n.erler@erasmusmc.nl
  • N_Erler
  • NErler
  • www.nerler.com