overview multiple imputation for multilevel data
play

Overview Multiple Imputation for Multilevel Data Bayesian - PowerPoint PPT Presentation

Overview Multiple Imputation for Multilevel Data Bayesian estimation for MLMs Univariate multiple imputation Session 1 Craig K. Enders Brian T. Keller Joint model imputation University of California - Los Angeles Fully conditional


  1. Overview Multiple Imputation for Multilevel Data Bayesian estimation for MLMs Univariate multiple imputation Session 1 Craig K. Enders Brian T. Keller Joint model imputation University of California - Los Angeles Fully conditional specification Department of Psychology Incomplete categorical variables Session 2 Work supported by IES award R305D150056 Software examples Why Imputation? Model Notation Dedicated multilevel programs restricts maximum Two-level model with observation i nested in likelihood estimation to incomplete outcomes cluster j (e.g., student i in school j ) Multilevel SEM software is more flexible but typically imposes normality on incomplete predictors and may perform poorly in some cases Imputation is flexible (e.g., mixtures of categorical and continuous variables are no problem)

  2. Bayesian Estimation And Imputation Bayesian estimation (e.g., Gibbs sampler) is the mathematical machinery for imputation Bayesian Estimation For Multilevel Models Each algorithmic cycle is a complete-data Bayes analysis followed by an imputation step A multilevel model generates imputations Analysis Example Bayesian Paradigm The Bayesian framework views parameters and Random intercept model with a level-1 predictor level-2 residuals as random variables that follow a probability distribution (a posterior) Assume complete data, estimation steps do not change with missing values Posterior Prior Likelihood

  3. Gibbs Sampler Gibbs Sampler Steps For One Iteration An iterative Gibbs sampler algorithm estimates quantities in one at a time, treating all other Estimate regression coefficients variables as known Estimate level-2 random effects Monte Carlo simulation “samples” parameter Estimate within-cluster residual variance values from their conditional distributions Estimate level-2 covariance matrix Repeating the sampling steps many times yields a distribution of each estimate Estimating Regression Coefficients Conditional Distribution Regression coefficients are drawn from a multivariate normal distribution that conditions on random effects, variances, and the data Current iteration Previous iteration

  4. Estimating Level-2 Random Effects Conditional Distribution Level-2 random effects are drawn from a multivariate normal distribution that conditions on the coefficients, variances, and the data Updated estimates Previous iteration Estimating The Residual Variance Conditional Distribution The within-cluster residual variance is drawn from an inverse Wishart distribution that conditions on the previous coefficients, random effects, level-2 covariance matrix, and the data

  5. Estimating Level-2 Covariance Matrix Conditional Distribution The level-2 covariance matrix is sampled from an inverse Wishart distribution that conditions on the previous coefficients, random effects, residual variance, and the data Iteration t is complete, start anew at iteration t + 1 Multilevel Imputation Imputation uses a model with an incomplete variable regressed on complete variables Univariate Multiple Imputation Bayesian estimation steps are applied to the filled-in data from the previous iteration Model parameters and level-2 residuals define a distribution from which imputations are sampled

  6. Analysis And Imputation Models Gibbs Sampler Steps Random intercept analysis model with an Estimate coefficients incomplete predictor Estimate random effects Complete-data Bayes estimation Estimate residual variance Random intercept imputation model with the Estimate covariance matrix incomplete predictor as the outcome Update imputations Imputation step Random Intercept Imputation Model Distribution Of Missing Values Cluster 1 A normal distribution generates imputations, Cluster 2 with center equal to the predicted value for Cluster 3 observation i in cluster j and spread equal to the X (Incomplete) within-cluster residual variance Y (Complete)

  7. Random Intercept Imputation Model Random Intercept Imputation Model Cluster 1 Cluster 1 Imputation = Cluster 2 Cluster 2 Cluster 3 Cluster 3 X (Incomplete) X (Incomplete) Y (Complete) Y (Complete) Burn-In Period Thinning Interval Estimate parameters Estimate parameters Update imputations Update imputations Estimate parameters Update imputations Estimate parameters Update imputations Burn-in interval Thinning interval (e.g., 2000) (e.g., 2000) Estimate parameters Estimate parameters Update imputations Update imputations Iterate . . . . . Iterate . . . . . Save data set 1 Save data set 2 Estimate parameters Update imputations Estimate parameters Update imputations

  8. Repeat Until Finished … Analysis And Pooling The analysis model is fit to each data set, and Estimate parameters Update imputations the arithmetic average of the M estimates is the multiple imputation point estimate Estimate parameters Update imputations Thinning interval (e.g., 2000) Estimate parameters Update imputations Iterate . . . . . Save data set 20 Estimate parameters Update imputations Pooling assumes a normal sampling distribution Pooling Standard Errors Multivariate Missing Data Average sampling Joint model imputation uses multivariate variance regression to impute the set of missing variables Fully conditional specification imputes variables Variance across one at a time in a sequence imputations Both are multilevel extensions of major single- Standard error level imputation frameworks

  9. Joint Model Imputation Two forms: Multivariate Imputation With 1) Multivariate regression model with incomplete variables regressed on complete variables The Joint Modeling Framework 2) Empty model treating all variables as outcomes Available in Mplus, MLwiN, and R packages (e.g., jomo, pan, mlmmm) Imputation Model Random Intercept Analysis Model Two-level random intercept analysis with continuous level-1 and level-2 predictors All variables have missing data

  10. Covariance Structure Imputation Step ? Level-2 Level-1 Compatible Analysis Models Compatibility Of Imputation And Analysis Contextual effects analyses The imputation model is more flexible than the analysis model because it allows level-1 and level-2 covariance matrices to freely vary Multilevel SEM The analysis model assumes a common slope Imputations are appropriate for random intercept analyses that partition relations into within- and between-cluster parts

  11. R Package jomo Mplus # load packages data: library (jomo) file = ridata.csv; variable: # read raw data names = cluster av1 av2 y x w; dat <- read.table("~/desktop/examples/ridata.csv", sep = ",") usevariables = av1 av2 y x w; names(dat) = c("cluster", "av1", "av2", "y", "x","w") missing = all(999); dat[dat == 999] <- NA analysis: type = basic; # jomo imputation set.seed(90291) bseed = 90291; dat$icept <- 1 data imputation: l1miss <- c("y", "x") impute = y x w; l2miss <- c("w") ndatasets = 20; l1complete <- c("icept") save = imp*.dat; l2complete <- c("icept") thin = 1000; impdata <- jomo(dat[l1miss], Y2 = dat[l2miss], X = dat[l1complete], output: X2 = dat[l2complete], clus = dat$cluster, tech8; nburn = 2000, nbetween = 2000, nimp = 20, meth = "common") Complete Data Joint Model Imputation Simulation Study J = 30, n j = 5 J = 30, n j = 30 Random intercept model with 1000 replications Intercept L1 Slope ICC = .25, medium effect sizes L2 Slope 30 clusters with 5 or 30 observations per cluster Intercept Var. (i.e., N = 150 and 900) Residual Var. 15% MAR missing data on all analysis variables -40 -20 0 20 -40 -20 0 20 20 imputations with R package jomo Percentage Bias Percentage Bias

  12. Random Slope Analysis Model Joint Model Limitations Two-level random slope analysis with continuous Within-cluster covariances must preserve level-1 level-1 and level-2 predictors relations, including the random coefficients The classic formulation of the joint model assumes a common covariance matrix at level-1 Imputation ignores random slope variation All variables have missing data Covariance Structure Revisited Simulation Study Random slope model with 1000 replications ICC = .25, medium effect sizes ? 30 clusters with 5 or 30 observations per cluster Level-2 (i.e., N = 150 and 900) Level-1 15% MAR missing data on all analysis variables 20 imputations with R package jomo

  13. Complete Data Joint Model Imputation Brief Maximum Likelihood Detour J = 30, n j = 5 J = 30, n j = 30 J = 30, n j = 30 Intercept Mplus allows incomplete Intercept L1 Slope random slope predictors L2 Slope L1 Slope L2 Slope Intercept Var. Requires numerical Intercept Var. Covariance integration and many Covariance Slope Var. latent variable products Slope Var. Residual Var. Residual Var. Often yields severe bias -40 -20 0 20 -40 -20 0 20 -40 -20 0 20 Percentage Bias Percentage Bias Percentage Bias Joint Modeling With Covariance Structure Random Level-1 Covariance Matrices Yucel (2011) extended the joint model to ? incorporate random level-1 covariance matrices Level-2 Available in the R package jomo Level-1 Currently limited to 2-level models

Recommend


More recommend