The Stochastic EM Algorithm for Censored Mixed Models Ian Marschner Macquarie University, Sydney 10th International Conference on MCQMC Methods in Scientific Computing, Sydney, 2012 1 / 18
INTRODUCTION Stochastic EM algorithm is a Monte Carlo iterative technique for computing an (approximate) MLE subject to missing data Fitting mixed models for repeated measures data with censoring can be a computationally challenging task due to the need for multidimensional integration in the likelihood function arising from the censoring (missingness) This talk presents a case study of the application of the stochastic EM algorithm to a censored mixed model analysis in the context of repeated HIV viral load measurements 2 / 18
EM ALGORITHM Maximisation of a log-likelihood 퐿 ( 휃 ; 푌 ) in the presence of missing data 휃 old of 휃 , an iteration of the algorithm Given a current estimate ˆ proceeds in two steps [ � ] E-step: 푄 ( 휃 ∣ ˆ � 푌 ; ˆ 휃 old ) = 퐸 � 휃 old 퐿 푐 ( 휃 ; 푋 ) where 퐿 푐 is the � log-likelihood corresponding to the complete data 푋 M-step: maximise 푄 ( 휃 ∣ ˆ 휃 old ) over the parameter space 휃 ∈ Θ to produce the updated estimate ˆ 휃 new 휃 new = ˆ Iterate until convergence (ˆ 휃 old ) 3 / 18
INTRACTABLE E-STEP In some cases the conditional expectation required for the E-step may be intractable Monte-Carlo EM (MCEM): involves calculation of conditional expectation via simulation MCEM requires a large number of simulations of missing data, conditional on observed data, at each iteration Stochastic EM (StEM): like MCEM but only one simulation at each iteration MCEM is equivalent to EM (MLE) up to the simulation approximation StEM is not equivalent to EM. It produces a Markov chain iterative sequence that can be averaged to produce an estimate of 휃 4 / 18
STOCHASTIC EM ALGORITHM Replace the E step with a single simulation 푋 ∗ of the complete data 푋 , conditional on the observed data 푌 and the current estimate ˆ 휃 old an StEM iteration consists of maximising 퐿 푐 ( 휃 ; 푋 ∗ ) The sequence of iterates arising from the StEM algorithm is a Markov chain and converges to a stationary distribution After a sufficiently long burn-in period a point estimate of 휃 can be obtained by averaging the sequence of StEM iterates Results exist drawing an asymptotic equivalence between the StEM and EM algorithms 5 / 18
CASE STUDY HIV viral load: blood concentration of HIV (log transformed) Antiviral therapy suppresses viral load Data: Repeated measures of viral load on the same individual subsequent to (partial) cessation of antiviral therapy Data: Subjects measured an average of 6.2 times each (4-8 times) Rate of increase of viral load (viral rebound) and its relationship to certain covariates gives information about HIV viral dynamics Viral load left censored (40%) due to limitations of laboratory assay 6 / 18
CENSORED REPEATED MEASURES EXAMPLES 6 6 Crude Slope = 0.026 Crude Slope = 0.092 5 Adjusted Slope = 0.053 5 Adjusted Slope = 0.121 VIRAL LOAD (log10) VIRAL LOAD (log10) 4 4 3 3 2 2 ^ ^ ^ ^ ^ ^ ^ 1 1 0 0 0 20 40 60 80 0 20 40 60 80 TIME (DAYS) TIME (DAYS) 6 6 Crude Slope = 0.044 Crude Slope = 0.079 5 Adjusted Slope = 0.061 5 Adjusted Slope = 0.097 VIRAL LOAD (log10) VIRAL LOAD (log10) 4 4 3 3 2 2 ^ ^ ^ ^ ^ ^ 1 1 0 0 0 20 40 60 80 0 20 40 60 80 TIME (DAYS) TIME (DAYS) 7 / 18
EXPLORATORY ANALYSES Separate censored linear regressions for each subject yield subject-specific estimates of rate of viral rebound Ignoring censoring leads to underestimation of rates of viral rebound Substantial variation exists between subjects in rates of viral rebound Various subject-specific viral parameters (covariates) may explain this variation viral load on starting therapy immune level when therapy was ceased (CD4 count) immune response when while on therapy (change in CD4 count) Exploratory analyses suggest a linear mixed effects model with uncorrelated slope and intercept 8 / 18
RATES OF VIRAL REBOUND std dev = 0.023 0.12 0.10 0.08 0.06 0.04 Subject-specific covariates may explain this variation 9 / 18
RATES OF VIRAL REBOUND (versus other quantities) 10 / 18
REPEATED MEASURES MIXED MODEL Let 푡 푖푗 be the 푗 th time at which subject 푖 is observed Let 푉 푖푗 be the value of subject 푖 ’s viral load at time 푡 푖푗 퐴 푖 , 퐵 푖 and 퐶 푖 are the values of the covariates for individual 푖 Mixed effects model: 푉 푖푗 = 휇 푖 + 푔 푖 ( 퐴 푖 , 퐵 푖 , 퐶 푖 ) 푡 푖푗 + 휖 푖푗 where 푔 푖 ( 퐴, 퐵, 퐶 ) = 훼 푖 + 훽퐴 + 훾퐵 + 훿퐶 { 휇 푖 } ∼ 푁 ( 휇, 휎 2 휇 ) ; { 훼 푖 } ∼ 푁 ( 훼, 휎 2 훼 ) ; { 휖 푖푗 } ∼ 푁 (0 , 휎 2 휖 ) (i.i.d.) Censoring: Observed data are 푉 ∗ 푖푗 = max ( 푉 푖푗 , 퐿 푖푗 ) for some lower assay limit 퐿 푖푗 that is independent of 푉 푖푗 In the absence of censoring, estimation is standard 11 / 18
STOCHASTIC EM ALGORITHM EM algorithm can be developed using the uncensored data as the complete data set Problem: intractable E-step due to multi-dimensional expectations over the censoring ranges up to 8-dimensional integration at each iteration for each individual Stochastic EM algorithm: simulate the uncensored data (conditional on the censoring limits) once at each iteration using truncated multivariate normal distribution at each iteration apply standard mixed model analysis to estimate updated parameter values Simple to implement: − − → truncated multinormal simulation ← − mixed model analysis 12 / 18
COMPUTATIONAL DETAILS Truncated multivariate normal simulation: Simple acceptance/rejection sampling (easy, inefficient) off-the-shelf routine using Gibbs sampling (more efficient, still easy) Initial estimates: Start with subject-specific censored linear regressions For each individual, impute uncensored observations by simulating from subject-specific fitted model Calculate initial estimates using multiple imputation Burn-in period and MCMC length: Experience suggests very quick stationarity beginning with multiple imputation estimate Analysis used burn-in period of 50 iterations followed by averaging of the next 500 iterations Information matrix is computable directly 13 / 18
PRELIMINARY SIMULATION RESULTS 100 simulations of random intercept and slope model (no covariates) 10 patients with 5 observations per patient and 50% censoring Unit slope StEM slope estimates: 20 15 10 5 0 0.0 0.5 1.0 1.5 2.0 2.5 slope estimate 14 / 18
ANALYSIS RESULTS Preliminary analysis with all covariates (8 parameters) Tests of significance of covariates based on information matrix standard errors Baseline viral load: P = 0.22 Initial immune level: P = 0.32 Immune response on therapy: P < 0.0001 Consider model with immune response as the sole covariate Consider whether observed variation in subject-specific slopes (viral rebound) is substantially explained by immune response covariate 15 / 18
ANALYSIS RESULTS Model with immune response only in the model: immune response explains viral rebound differences ( > 80%) crude analysis ignoring censoring is biased MI analysis is similar to StEM with some differences ˆ 휎 2 휎 2 휎 2 Method ˆ 휇 훼 ˆ 훿 ˆ ˆ ˆ 휇 훼 휖 uncensored 0.96 –0.026 0.054 0.046 0.00015 0.12 9 . 4 × 10 − 5 imputed 0.36 –0.024 0.061 0.39 0.13 8 . 3 × 10 − 5 StEM 0.46 –0.025 0.061 0.32 0.16 Interpretation: On cessation of therapy, the rate of viral rebound is greatest for individuals who had a better immune response to therapy A better immune response creates more “fuel” (i.e. more targets) for viral rebound 16 / 18
MARKOV CHAIN ITERATIVE BEHAVIOUR (6 parameters) 17 / 18
CONCLUSIONS Ignoring censoring in repeated measures data can lead to gross biases Stochastic EM algorithm allows repeated measures mixed models to be fitted to censored data using a standard mixed model routine for uncensored data All that is required is a truncated multivariate normal simulator, for simulation of uncensored data conditional on the censored data Assuming the availability of a NLME routine, non-linear models can be handled just as easily as linear models In the case study stationarity was achieved quickly and the fitted censored mixed effects models allowed insights into the sources of variation in changes over time 18 / 18
Recommend
More recommend