custom functions for specifying nonlinear terms to gnm
play

Custom Functions for Specifying Nonlinear Terms to gnm Heather - PowerPoint PPT Presentation

Custom Functions for Specifying Nonlinear Terms to gnm Heather Turner, David Firth and Andy Batchelor Department of Statistics University of Warwick, UK H. Turner, D. Firth and A. Batchelor () Custom nonlin Functions UseR August 2008


  1. Custom Functions for Specifying Nonlinear Terms to gnm Heather Turner, David Firth and Andy Batchelor Department of Statistics University of Warwick, UK H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 1 / 17

  2. Motivating Application: Marriage Data We wish to investigate the propensity to marry for women living in Ireland based on the Living in Ireland Surveys (1994-2001) For women born between 1950 and 1975 we have ◮ year of (first) marriage ◮ year and month of birth ◮ social class ◮ highest level of education attained ◮ year highest level of education was attained H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 2 / 17

  3. Proposed Model We wish to model the hazard of marriage occuring at time t h ( t ) = P ( T = t | T ≥ t ) using the discrete-time model logit( h ( t )) = c + β l log( age it − α l ) + β r log( α r − age it ) + x ′ it β = η The parameters can be estimated by logistic regression of a marriage indicator on η — a generalized nonlinear model ◮ need custom "nonlin" function to specify “log-excess” terms in the formula argument to gnm H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 3 / 17

  4. Variables and Predictors A "nonlin" function creates a list of arguments for the internal function nonlinTerms First define the variables and predictors in the term β l log( age it − α l ) Start to build "nonlin" function as follows LogExcess <- function(age) { list(predictors = list(beta = ~1, alpha = ~1), variables = list(substitute(age))) } class(LogExcess) <- "nonlin" H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 4 / 17

  5. Term-specific Issues Want to use same function for both log-excess terms, so add argument side = "left" To avoid taking logs of negative values, we let α l = age [ min ] − 10 − 5 − exp( α ∗ l ) α r = age [ max ] + 10 − 5 + exp( α ∗ r ) and estimate α ∗ l and α ∗ r instead. In LogExcess define constraint <- ifelse(side == "left", min(age) - 1e-5, max(age) + 1e-5) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 5 / 17

  6. Term Definition The term argument of nonlinTerms takes labels for the predictors and variables and returns a deparsed expression of the term: term = function(predLabels, varLabels) { paste(predLabels[1], " * log(", " -"[side == "right"], varLabels[1], " + ", " -"[side == "left"], constraint, " + exp(", predLabels[2], "))") } So e.g. > term(c("beta", "alpha*"), "age") [1] "beta * log( age + - 14.99999 + exp( alpha* ))" H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 6 / 17

  7. Parameter Labels Default parameter labels are taken from the predictor names, here beta and alpha To make parameter labels unique, save call to LogExcess : call <- sys.call() and specify call argument to nonlinTerms call = as.expression(call) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 7 / 17

  8. Complete Function LogExcess <- function(age, side = "left") { call <- sys.call() constraint <- ifelse(side == "left", min(age) - 1e-5, max(age) + 1e-5) list(predictors = list(beta = ~1, alpha = ~1), variables = list(substitute(age)), term = function(predLabels, varLabels) { paste(predLabels[1], " * log(", " -"[side == "right"], varLabels[1], " + ", " -"[side == "left"], constraint, " + exp(", predLabels[2], "))") } , call = as.expression(call)) } class(LogExcess) <- "nonlin" H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 8 / 17

  9. Summary of Baseline Model Call: gnm(formula = marriages/lives ~ LogExcess(age, side = "left") + LogExcess(age, side = "right"), family = binomial, data = fulldata, weights = lives, start = c(-20, 3, 0, 3, 0)) Deviance Residuals: Min 1Q Median 3Q Max -0.8098 -0.4441 -0.3224 -0.1528 4.0483 Estimate Std. Error z value Pr(>|z|) (Intercept) -118.5395 NA NA NA LogExcess(age, side = "left")beta 3.6928 NA NA NA LogExcess(age, side = "left")alpha -0.1432 NA NA NA LogExcess(age, side = "right")beta 24.8623 NA NA NA LogExcess(age, side = "right")alpha 4.0247 NA NA NA Std. Error is NA where coefficient has been constrained or is unidentified Residual deviance: 12553 on 31004 degrees of freedom AIC: 12748 Number of iterations: 76 H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 9 / 17

  10. Example ‘Recoil’ Plot 0.12 Probability of Marriage 0.08 0.04 0.00 10 20 30 40 50 Age H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 10 / 17

  11. Example ‘Recoil’ Plot 0.12 0.12 Probability of Marriage Probability of Marriage 0.08 0.08 0.04 0.04 0.00 0.00 10 10 20 20 30 30 40 40 50 50 Age Age H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 10 / 17

  12. Example ‘Recoil’ Plot 0.12 0.12 0.12 ●●●● Probability of Marriage Probability of Marriage Probability of Marriage ● ● ● ● 0.08 0.08 0.08 ● ● ● ● ● ● ● 0.04 0.04 0.04 ● ● ● ● ● ●●●●●●● ● 0.00 0.00 0.00 ● 10 10 10 20 20 20 30 30 30 40 40 40 50 50 50 Age Age Age H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 10 / 17

  13. Re-parameterization The problem with aliasing can be overcome by re-parameterizing the baseline model: �� � � ν − α l γ − δ ( ν − α l ) log age it − α l � � α r − ν �� + δ ( α r − ν ) log α r − age it A new nonlin function, Bell , is needed to specify this term H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 11 / 17

  14. Interpretation of Parameters The parameters of the new parameterization have a more useful interpretation than before: expit ( γ ) Probability of Marriage α L α R ν Age (years) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 12 / 17

  15. Recoil Plots for Reparameterised Model 0.15 ●●●● ● ●●●●●●●●● 0.10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● ● ● γ ν ● ● ● ● ●●●●●●●●●●●●●●● ● −2.09 → −1.95 ● 25.39 → → 28 ● ● ● ●●●● 0.00 ●●●●● ● Probability of Marriage ●●●●●● ●●●●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● α L δ ● ● ● ●●●●●●●●●●●●●●● ● ● 0.34 → 0.15 ● −0.14 → → 0.035 ●●●●●●● ● ● ● ●● 0.15 10 20 30 40 50 0.10 ●●●●●● ● ● ● ● ● ● Original Model 0.05 ● ● Perturbed Model ● α R ● ●●●●●●●●●●●●●●●● Re−fitted Model ● 4.02 → 20 ● 0.00 ● ●●● 10 20 30 40 50 Age (years) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 13 / 17

  16. Summary of Re-parameterized Model Call: gnm(formula = marriages/lives ~ Bell(age), family = binomial, data = fulldata, weights = lives, start = c(NA, 26, 0.2, NA, NA)) Deviance Residuals: Min 1Q Median 3Q Max -0.8098 -0.4441 -0.3224 -0.1528 4.0483 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.09446 0.03313 -63.225 < 2e-16 Bell(age)peak 25.39354 0.30405 83.517 < 2e-16 Bell(age)fallOff 0.32917 0.09065 3.631 0.000282 Bell(age)leftAdj -0.14321 0.89350 -0.160 0.872663 Bell(age)rightAdj 4.02470 1.73753 2.316 0.020540 (Dispersion parameter for binomial family taken to be 1) Residual deviance: 12553 on 31004 degrees of freedom AIC: 12748 Number of iterations: 14 H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 14 / 17

  17. Further Analysis We can write a simple function to compute the endpoints and their standard errors > BellEndpoints(bell.mod) [,1] [,2] left 14.17508 0.7742838 right 100.92183 97.2381849 Adding theoretically important covariates produces even higher estimate of right endpoint ◮ use simpler model with infinite right endpoint Residual analysis suggests location of hazard depends on education level ◮ extend model to allow ν to depend on covariates H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 15 / 17

  18. Final Model For women born in 1950 0.20 1.0 Education level (years) Proportion never married Probability of marriage 0.8 0.15 8 11 0.6 14 0.10 15 16 0.4 17 0.05 0.2 0.00 0.0 15 20 25 30 35 40 45 10 20 30 40 50 Age (years) Age (years) H. Turner, D. Firth and A. Batchelor () Custom “nonlin” Functions UseR August 2008 16 / 17

Recommend


More recommend