Exploring and Modeling Dichotomous Outcomes, by Brandon LeBeau


  1. DataCamp: Longitudinal Analysis in R

     Exploring and Modeling Dichotomous Outcomes
     Brandon LeBeau, Assistant Professor

  2. Dichotomous outcomes

     Dichotomous or binary outcomes take two values. Examples:

     - 0 = No, 1 = Yes
     - 0 = Not Present, 1 = Present
     - 0 = Not Proficient, 1 = Proficient
     - 0 = No symptoms, 1 = Symptoms

  3. Exploring data with dichotomous outcomes

         library(HSAUR2)
         head(toenail, n = 10)

            patientID            outcome    treatment       time visit
         1          1 moderate or severe  terbinafine  0.0000000     1
         2          1 moderate or severe  terbinafine  0.8571429     2
         3          1 moderate or severe  terbinafine  3.5357140     3
         4          1       none or mild  terbinafine  4.5357140     4
         5          1       none or mild  terbinafine  7.5357140     5
         6          1       none or mild  terbinafine 10.0357100     6
         7          1       none or mild  terbinafine 13.0714300     7
         8          2       none or mild itraconazole  0.0000000     1
         9          2       none or mild itraconazole  0.9642857     2
         10         2 moderate or severe itraconazole  2.0000000     3

  4. Generalized linear mixed model (GLMM)

     - Models the log-odds of success
     - Success refers to the outcome coded as 1
     - Models for continuous outcomes are not appropriate here because:
       - predictions often fall out of bounds (below 0 or above 1)
       - the mean and variance of a binary outcome are related
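The log-odds model behind a random-intercept GLMM can be sketched as follows. The notation is an assumption made here for illustration and does not come from the slides: $y_{ij}$ is the 0/1 outcome for patient $i$ at occasion $j$, $t_{ij}$ is a time variable such as the visit number, $\beta_0$ and $\beta_1$ are fixed effects, and $b_i$ is a patient-specific random intercept with variance $\sigma_b^2$.

```latex
\operatorname{logit}\Pr(y_{ij} = 1 \mid b_i)
  = \log\frac{\Pr(y_{ij} = 1 \mid b_i)}{1 - \Pr(y_{ij} = 1 \mid b_i)}
  = \beta_0 + \beta_1 t_{ij} + b_i,
\qquad b_i \sim N(0, \sigma_b^2)
```

Because the linear predictor $\eta$ lives on the log-odds scale, the implied probability $1 / (1 + e^{-\eta})$ always stays between 0 and 1, which is the bound a model for a continuous outcome cannot guarantee.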

  5. Changes in the outcome variable over time

         library(dplyr)

         # Code "none or mild" as 1 (success) and start the visit counter at 0
         toenail <- toenail %>%
           mutate(outcome_dich = ifelse(outcome == "none or mild", 1, 0),
                  visit_0 = visit - 1)

         # Proportion of successes and number of observations at each visit
         toenail %>%
           group_by(visit_0) %>%
           summarise(prop_outcome = mean(outcome_dich),
                     num = n())

           visit_0 prop_outcome   num
             <dbl>        <dbl> <int>
         1       0        0.629   294
         2       1        0.663   288
         3       2        0.703   283
         4       3        0.787   272
         5       4        0.916   263
         6       5        0.926   244
         7       6        0.924   264
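Since the GLMM works on the log-odds scale, it can help to see the same visit-level proportions on that scale. The sketch below uses only base R and the proportions printed in the summary table; the variable names `log_odds` and `back_to_prop` are introduced here for illustration.

```r
# Proportions of "none or mild" at visits 0 to 6, copied from the summary table
prop_outcome <- c(0.629, 0.663, 0.703, 0.787, 0.916, 0.926, 0.924)

# qlogis() is base R's logit function: log(p / (1 - p))
log_odds <- qlogis(prop_outcome)

# plogis() is the inverse logit and recovers the original proportions
back_to_prop <- plogis(log_odds)

print(round(data.frame(visit_0 = 0:6, prop_outcome, log_odds), 3))
```

Every proportion is above 0.5, so every log-odds value is positive; a proportion of exactly 0.5 would map to a log-odds of 0.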

  6. Fitting GLMM with lme4

     Fitting a GLMM with lme4 is similar to the models in previous chapters, with two additions:

     - use glmer() instead of lmer()
     - specify the family = binomial argument

         library(lme4)

         # Random-intercept logistic model: visit and treatment as fixed effects
         toe_output <- glmer(outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID),
                             data = toenail, family = binomial)
         summary(toe_output)

  7. GLMM output

         Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
          Family: binomial  ( logit )
         Formula: outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID)
            Data: toenail

              AIC      BIC   logLik deviance df.resid
           1260.3   1282.6   -626.2   1252.3     1904

         Random effects:
          Groups    Name        Variance Std.Dev.
          patientID (Intercept) 21.97    4.687
         Number of obs: 1908, groups:  patientID, 294

         Fixed effects:
                              Estimate Std. Error z value Pr(>|z|)
         (Intercept)           1.96335    0.81901   2.397   0.0165 *
         visit_0               0.91153    0.07433  12.263   <2e-16 ***
         treatmentterbinafine  0.69688    0.68696   1.014   0.3104
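The fixed effects in this output are on the log-odds scale, so exponentiating them gives odds ratios. The sketch below does that with base R, using the estimates and standard errors copied from the output; the Wald-type intervals are an approximation assumed here for illustration, not something the slides report.

```r
# Fixed-effect estimates and standard errors copied from the glmer summary
est <- c("(Intercept)" = 1.96335, visit_0 = 0.91153, treatmentterbinafine = 0.69688)
se  <- c("(Intercept)" = 0.81901, visit_0 = 0.07433, treatmentterbinafine = 0.68696)

# Exponentiate to move from log-odds to odds ratios
odds_ratio <- exp(est)

# Approximate 95% Wald intervals: built on the log-odds scale, then exponentiated
ci_lower <- exp(est - qnorm(0.975) * se)
ci_upper <- exp(est + qnorm(0.975) * se)

print(round(cbind(odds_ratio, ci_lower, ci_upper), 2))
```

The odds ratio for `visit_0` is about 2.49: for a given patient, the odds of "none or mild" are multiplied by roughly 2.5 with each additional visit. The interval for the treatment odds ratio includes 1, which agrees with the non-significant p-value in the output.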

  8. Time to practice!

  9. Generalized Estimating Equations (GEE)

     Brandon LeBeau, Assistant Professor

  10. Introduction to geepack

     - Let's fit a first GEE model using the geepack package
     - geeglm() is the model fitting function

         library(geepack)

         toenail <- toenail %>%
           mutate(outcome_dich = ifelse(outcome == "none or mild", 1, 0),
                  visit_0 = visit - 1)

         # Fit GEE model
         gee_toe <- geeglm(outcome_dich ~ 1 + visit_0,
                           data = toenail, id = patientID,
                           family = binomial, scale.fix = TRUE)

         # Extract model summary
         summary(gee_toe)

  11. geeglm output

         Call:
         geeglm(formula = outcome_dich ~ 1 + visit_0, family = binomial,
             data = toenail, id = patientID, scale.fix = TRUE)

          Coefficients:
                     Estimate Std.err    Wald Pr(>|W|)
         (Intercept)  0.35522 0.13122   7.328  0.00679 **
         visit_0      0.38319 0.03728 105.673  < 2e-16 ***
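The Wald column in this output is the squared ratio of the estimate to its standard error, compared against a chi-square distribution with 1 degree of freedom. The base R sketch below reproduces it from the rounded values printed in the output, so small differences in the last digits are expected.

```r
# Estimates and robust standard errors copied from the geeglm summary
est <- c("(Intercept)" = 0.35522, visit_0 = 0.38319)
se  <- c("(Intercept)" = 0.13122, visit_0 = 0.03728)

# Wald statistic: (estimate / standard error)^2
wald <- (est / se)^2

# Upper-tail chi-square probability with 1 degree of freedom
p_value <- pchisq(wald, df = 1, lower.tail = FALSE)

print(cbind(wald, p_value))
```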

  12. Specifying working correlations

     - An optional argument, corstr, controls the working correlation matrix
     - The working correlation accounts for the dependency due to repeated measures
     - The default is independence

         # Fit GEE model
         gee_toe <- geeglm(outcome_dich ~ 1 + visit_0,
                           data = toenail, id = patientID,
                           family = binomial, corstr = 'exchangeable',
                           scale.fix = TRUE)

         # Extract model summary
         summary(gee_toe)

  13. GEE exchangeable output

     Here is the exchangeable output:

         Call:
         geeglm(formula = outcome_dich ~ 1 + visit_0, family = binomial,
             data = toenail, id = patientID, corstr = "exchangeable",
             scale.fix = TRUE)

          Coefficients:
                     Estimate Std.err   Wald Pr(>|W|)
         (Intercept)   0.3332  0.1345   6.14    0.013 *
         visit_0       0.3797  0.0363 109.29   <2e-16 ***
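GEE coefficients describe the population-averaged log-odds, so plugging them into the inverse logit gives an average probability at each visit. The sketch below does this in base R with the exchangeable estimates and sets the result beside the observed proportions from the earlier summary by visit; `pred_prob` and `observed` are names chosen here for illustration.

```r
# Coefficients copied from the exchangeable geeglm output
b0 <- 0.3332
b1 <- 0.3797

visit_0 <- 0:6

# Population-averaged probability of "none or mild" at each visit
pred_prob <- plogis(b0 + b1 * visit_0)

# Observed proportions by visit, copied from the earlier group_by() summary
observed <- c(0.629, 0.663, 0.703, 0.787, 0.916, 0.926, 0.924)

print(round(data.frame(visit_0, observed, pred_prob), 3))
```

A model that is linear in `visit_0` on the log-odds scale can only produce a smooth increasing curve, so it will not match every observed proportion exactly.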

  14. Other working correlation structures

     corstr = "ar1" (example: correlation = 0.5)

                [,1]  [,2] [,3]  [,4]   [,5]
         [1,] 1.0000 0.500 0.25 0.125 0.0625
         [2,] 0.5000 1.000 0.50 0.250 0.1250
         [3,] 0.2500 0.500 1.00 0.500 0.2500
         [4,] 0.1250 0.250 0.50 1.000 0.5000
         [5,] 0.0625 0.125 0.25 0.500 1.0000

     corstr = "unstructured"

               [,1]  [,2]  [,3]  [,4]  [,5]
         [1,] 1.000 0.559 0.492 0.363 0.082
         [2,] 0.559 1.000 0.398 0.250 0.139
         [3,] 0.492 0.590 1.000 0.071 0.209
         [4,] 0.398 0.493 0.629 1.000 0.166
         [5,] 0.363 0.313 0.426 0.604 1.000
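The AR(1) pattern can be generated directly: the correlation between two occasions is the base correlation raised to the number of steps between them. The base R sketch below rebuilds the 5 by 5 AR(1) example with a correlation of 0.5 and, for comparison, an exchangeable matrix with the same value; `rho`, `ar1`, and `exch` are names assumed here.

```r
n_visits <- 5
rho <- 0.5

# AR(1): correlation between occasions j and k is rho^|j - k|
lag <- abs(outer(seq_len(n_visits), seq_len(n_visits), "-"))
ar1 <- rho^lag

# Exchangeable: the same correlation rho for every pair of occasions
exch <- matrix(rho, nrow = n_visits, ncol = n_visits)
diag(exch) <- 1

print(ar1)
print(exch)
```

Under AR(1) the correlation decays with distance in time, while under the exchangeable structure it stays constant. An unstructured matrix estimates every pairwise correlation separately, which costs more parameters.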

  15. Try GEE models!

  16. Model Selection

     Brandon LeBeau, Assistant Professor

  17. QIC

     - QIC = quasi-likelihood under the independence model criterion
     - GEE does not use maximum likelihood estimation like GLMM, so QIC is needed for GEE
     - The MuMIn package calculates this statistic

         library(MuMIn)

         toenail <- toenail %>%
           mutate(outcome_dich = ifelse(outcome == "none or mild", 1, 0),
                  visit_0 = visit - 1)

         # Fit GEE model
         gee_toe <- geeglm(outcome_dich ~ 1 + visit_0,
                           data = toenail, id = patientID,
                           family = binomial, scale.fix = TRUE)

         QIC(gee_toe)

              QIC
         1828.552

  18. Evaluating working correlation

     QIC can help select the working correlation matrix.

         # Fit GEE models with three working correlation structures
         gee_ind <- geeglm(outcome_dich ~ 1 + visit_0,
                           data = toenail, id = patientID,
                           family = binomial, scale.fix = TRUE)
         gee_exch <- geeglm(outcome_dich ~ 1 + visit_0,
                            data = toenail, id = patientID,
                            family = binomial, scale.fix = TRUE,
                            corstr = 'exchangeable')
         gee_ar1 <- geeglm(outcome_dich ~ 1 + visit_0,
                           data = toenail, id = patientID,
                           family = binomial, scale.fix = TRUE,
                           corstr = 'ar1')

         QIC(gee_ind, gee_exch, gee_ar1)

                       QIC
         gee_ind  1828.552
         gee_exch 1828.564
         gee_ar1  1827.805

  19. Model selection GLMM

     The aictab() function from the AICcmodavg package can be used for GLMMs.

         library(AICcmodavg)

         # Baseline model without treatment
         toe_baseline <- glmer(outcome_dich ~ 1 + visit_0 + (1 | patientID),
                               data = toenail, family = binomial)
         # Model that adds treatment
         toe_output <- glmer(outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID),
                             data = toenail, family = binomial)

         aictab(cand.set = list(toe_baseline, toe_output),
                modnames = c("no treatment", "treatment"))

         Model selection based on AICc:

                      K    AICc Delta_AICc AICcWt Cum.Wt      LL
         no treatment 3 1259.40       0.00   0.62   0.62 -626.69
         treatment    4 1260.36       0.97   0.38   1.00 -626.17
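The AICcWt column can be reproduced from the Delta_AICc column: each model's relative likelihood is exp(-0.5 * delta), and the weights are those values rescaled to sum to 1. The base R sketch below uses the two deltas printed in the table; `rel_lik` and `aicc_wt` are names chosen here.

```r
# Delta_AICc values copied from the aictab() output
delta_aicc <- c("no treatment" = 0.00, "treatment" = 0.97)

# Relative likelihood of each model given the data
rel_lik <- exp(-0.5 * delta_aicc)

# Akaike weights: relative likelihoods rescaled to sum to 1
aicc_wt <- rel_lik / sum(rel_lik)

print(round(aicc_wt, 2))
```

A difference of less than 1 AICc unit leaves a sizable weight on both models, so the table gives only weak grounds for preferring the model without treatment.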

  20. Time to practice model selection!

  21. Interpreting and Visualizing Model Results

     Brandon LeBeau, Assistant Professor

  22. Visualize GLMM

     Generate predicted values with the predict() function.

         library(ggplot2)

         toe_output <- glmer(outcome_dich ~ 1 + visit_0 + treatment + (1 | patientID),
                             data = toenail, family = binomial)

         # By default predict() returns values on the link (log-odds) scale
         toenail <- toenail %>%
           mutate(pred_values = predict(toe_output))

         # One dashed line per patient
         ggplot(toenail, aes(x = visit_0, y = pred_values)) +
           geom_line(aes(group = patientID), linetype = 2) +
           theme_bw(base_size = 16) +
           xlab("Visit Number") +
           ylab("Predicted Values")

  23. [Plot: predicted values against visit number, one dashed line per patient]

  24. Visualize GLMM - probabilities

     - Often the probability metric is more intuitive
     - The predict() function with the argument type = "response" gives probabilities

         toenail <- toenail %>%
           mutate(pred_values = predict(toe_output, type = "response"))

         ggplot(toenail, aes(x = visit_0, y = pred_values)) +
           geom_line(aes(group = patientID), linetype = 2) +
           theme_bw(base_size = 16) +
           xlab("Visit Number") +
           ylab("Prob of none or mild separation")
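Each patient's line differs because the predicted probability depends on that patient's random intercept. The base R sketch below illustrates this with the estimates from the earlier glmer summary for three hypothetical patients in the reference treatment group: one with a random intercept one standard deviation below average, one at the average, and one a standard deviation above. The patients and the names `b_i`, `eta`, and `prob` are assumptions made for illustration.

```r
# Values copied from the glmer summary earlier in the slides
intercept    <- 1.96335   # fixed intercept (log-odds at visit_0 = 0)
slope_visit  <- 0.91153   # fixed effect of visit_0
sd_intercept <- 4.687     # standard deviation of the random intercepts

visit_0 <- 0:6

# Hypothetical random intercepts: one SD below average, average, one SD above
b_i <- c(low = -sd_intercept, average = 0, high = sd_intercept)

# Linear predictor (log-odds) for each hypothetical patient at each visit
eta <- outer(b_i, intercept + slope_visit * visit_0, "+")

# Inverse logit converts log-odds to probabilities
prob <- plogis(eta)
rownames(prob) <- names(b_i)
colnames(prob) <- paste0("visit_", visit_0)

print(round(prob, 3))
```

With a random-intercept standard deviation this large, patients start at very different probabilities, which is why the per-patient curves in a plot of this kind fan out at early visits and converge toward 1 later.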

  25. [Plot: predicted probability of none or mild separation against visit number, one dashed line per patient]

  26. Visualize GEE

     predict() can again be used here, as with GLMMs.

         gee_toe <- geeglm(outcome_dich ~ 1 + visit_0 + treatment,
                           data = toenail, id = patientID,
                           family = binomial, corstr = 'exchangeable',
                           scale.fix = TRUE)

         # Population-averaged predicted probabilities
         toenail_gee <- toenail %>%
           mutate(pred_gee = predict(gee_toe, type = "response"))

         # One line per treatment group
         ggplot(toenail_gee, aes(x = visit_0, y = pred_gee)) +
           geom_line(aes(color = treatment)) +
           theme_bw(base_size = 16) +
           xlab("Visit Number") +
           ylab("Probability of none or mild separation")

  27. [Plot: GEE predicted probability of none or mild separation against visit number, one line per treatment]
