
Statistical Modelling with Stata: Binary Outcomes

Mark Lunt, Centre for Epidemiology Versus Arthritis, University of Manchester, 01/12/2020. Sections: Cross-tabulation, Regression, Diagnostics.


  1. Statistical Modelling with Stata: Binary Outcomes. Mark Lunt, Centre for Epidemiology Versus Arthritis, University of Manchester, 01/12/2020. Outline: Cross-tabulation; Regression; Diagnostics.

  2. Cross-tabulation

                    Exposed    Unexposed    Total
         Cases      a          b            a + b
         Controls   c          d            c + d
         Total      a + c      b + d        a + b + c + d

     - Simple random sample: fix a + b + c + d
     - Exposure-based sampling: fix a + c and b + d
     - Outcome-based sampling: fix a + b and c + d

  3. The χ² Test
     - Compares observed to expected numbers in each cell
     - Expected under null hypothesis: no association
     - Works for any of the sampling schemes

  4. Measures of Association
     - Relative Risk = $\dfrac{a/(a+c)}{b/(b+d)} = \dfrac{a(b+d)}{b(a+c)}$
     - Risk Difference = $\dfrac{a}{a+c} - \dfrac{b}{b+d}$
     - Odds Ratio = $\dfrac{a/c}{b/d} = \dfrac{ad}{cb}$
     - All obtained with cs disease exposure[, or]
     - Only the odds ratio is valid with outcome-based sampling

  5. Cross-tabulation in Stata

         . cs back_p sex, or

                          | sex                    |
                          |   Exposed   Unexposed  |      Total
         -----------------+------------------------+------------
                    Cases |       637         445  |       1082
                 Noncases |      1694        1739  |       3433
         -----------------+------------------------+------------
                    Total |      2331        2184  |       4515
                          |                        |
                     Risk |  .2732733    .2037546  |   .2396456
                          |                        |
                          |      Point estimate    |    [95% Conf. Interval]
                          |------------------------+------------------------
          Risk difference |         .0695187       |     .044767    .0942704
               Risk ratio |         1.341188       |    1.206183    1.491304
          Attr. frac. ex. |         .2543926       |    .1709386     .329446
          Attr. frac. pop |         .1497672       |
               Odds ratio |         1.469486       |     1.27969     1.68743 (Cornfield)
                          +-------------------------------------------------
                                        chi2(1) =    29.91  Pr>chi2 = 0.0000
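
As a cross-check on the measures of association, the point estimates in the `cs back_p sex, or` output can be recomputed by hand from the four cell counts. The sketch below is plain Python rather than Stata and is only an arithmetic illustration; the variable names are chosen here, and the χ² line assumes the uncorrected Pearson statistic for a 2×2 table.

```python
# Cell counts from the cs back_p sex, or table.
a, b = 637, 445      # cases: exposed, unexposed
c, d = 1694, 1739    # noncases: exposed, unexposed
n = a + b + c + d

risk_exposed = a / (a + c)
risk_unexposed = b / (b + d)

risk_difference = risk_exposed - risk_unexposed
risk_ratio = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (c * b)

# Pearson chi-squared for a 2x2 table, without continuity correction (assumed).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(f"risk difference = {risk_difference:.7f}")
print(f"risk ratio      = {risk_ratio:.6f}")
print(f"odds ratio      = {odds_ratio:.6f}")
print(f"chi2(1)         = {chi2:.2f}")
```

The printed values should agree with the Stata point estimates to the precision shown there; the confidence intervals are not reproduced in this sketch.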

  6. Limitations of Tabulation
     - No continuous predictors
     - Limited numbers of categorical predictors

  7. Linear Regression and Binary Outcomes (section: Regression; subsections: Introduction, Generalized Linear Models, Logistic Regression, Other GLMs for Binary Outcomes)
     - Can't use linear regression with binary outcomes:
       - Distribution is not normal
       - Limited range of sensible predicted values
     - Changing parameter estimation to allow for a non-normal distribution is straightforward
     - Need to limit the range of predicted values

  8. Example: CHD and Age. [Figure: chd (0 to 1) plotted against age (20 to 70).]

  9. Example: CHD by Age group. [Figure: proportion of subjects with CHD (0 to .8) plotted against mean age (20 to 60).]

  10. Example: CHD by Age - Linear Fit. [Figure: proportion of subjects with CHD and fitted values (0 to 1), horizontal axis from 20 to 70.]

  11. Generalized Linear Models
      - Linear Model: $Y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \varepsilon$, where $\varepsilon$ is normally distributed
      - Generalized Linear Model: $g(Y) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \varepsilon$, where $\varepsilon$ has a known distribution

  12. Probabilities and Odds

          Probability p     Odds Ω = p / (1 − p)
          0.1 = 1/10        0.1/0.9 = 1:9 = 0.111
          0.5 = 1/2         0.5/0.5 = 1:1 = 1
          0.9 = 9/10        0.9/0.1 = 9:1 = 9

  13. Probabilities and Odds. [Figure: proportion (0 to 1) plotted against log odds (−5 to 5).]

  14. Advantage of the Odds Scale
      - Just a different scale for measuring probabilities
      - Any odds from 0 to ∞ corresponds to a probability
      - Any log odds from −∞ to ∞ corresponds to a probability
      - Shape of curve commonly fits data
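
The mapping between probability, odds and log odds can be sketched in a few lines. This is plain Python for illustration only; the function names are invented here and are not Stata commands.

```python
import math

def odds(p):
    """Odds corresponding to a probability p (0 <= p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Log odds (logit) corresponding to a probability p (0 < p < 1)."""
    return math.log(odds(p))

def prob_from_log_odds(x):
    """Probability corresponding to any real-valued log odds x."""
    return 1 / (1 + math.exp(-x))

# Reproduce the rows of the probabilities-and-odds table.
for p in (0.1, 0.5, 0.9):
    print(p, round(odds(p), 3), round(log_odds(p), 3))
```

Because `prob_from_log_odds` returns a value strictly between 0 and 1 for any finite input, a linear predictor on the log-odds scale can never produce an impossible probability.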

  15. The binomial distribution
      - Outcome can be either 0 or 1
      - Has one parameter: the probability that the outcome is 1
      - Assumes observations are independent

  16. The Logistic Regression Equation
      $$\log\left(\frac{\hat\pi}{1-\hat\pi}\right) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$$
      $$Y \sim \text{Binomial}(\hat\pi)$$
      - Y has a binomial distribution with parameter $\pi$
      - $\hat\pi$ is the predicted probability that Y = 1

  17. Parameter Interpretation
      - When $x_i$ increases by 1, $\log(\hat\pi/(1-\hat\pi))$ increases by $\beta_i$
      - Therefore $\hat\pi/(1-\hat\pi)$ increases by a factor $e^{\beta_i}$
      - For a dichotomous predictor, this is exactly the odds ratio we met earlier
      - For a continuous predictor, the odds increase by a factor of $e^{\beta_i}$ for each unit increase in the predictor
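
The step from "log odds increase by $\beta_i$" to "odds increase by a factor $e^{\beta_i}$" can be written out explicitly. In the sketch below, $\hat\pi(x_i)$ is shorthand introduced here for the predicted probability at a given value of $x_i$ with all other predictors held fixed.

```latex
\log\frac{\hat\pi(x_i+1)}{1-\hat\pi(x_i+1)} - \log\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)} = \beta_i
\quad\Longrightarrow\quad
\frac{\hat\pi(x_i+1)\,/\,\bigl(1-\hat\pi(x_i+1)\bigr)}{\hat\pi(x_i)\,/\,\bigl(1-\hat\pi(x_i)\bigr)} = e^{\beta_i}
```

The left-hand side of the second expression is a ratio of two odds, which is why $e^{\beta_i}$ is read as an odds ratio.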

  18. Odds Ratios and Relative Risks. [Figure: odds and proportion (vertical axis 0 to 5) plotted against proportion (0 to 1).]

  19. Logistic Regression in Stata

          . logistic chd age

          Logistic regression                             Number of obs   =        100
                                                          LR chi2(1)      =      29.31
                                                          Prob > chi2     =     0.0000
          Log likelihood = -53.676546                     Pseudo R2       =     0.2145

          ------------------------------------------------------------------------------
                   chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |   1.117307   .0268822     4.61   0.000     1.065842    1.171257
          ------------------------------------------------------------------------------
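
The columns of the `logistic chd age` output are linked through the log-odds scale. The sketch below, in plain Python, recovers the coefficient from the reported odds ratio and then rebuilds the z statistic and confidence interval. It assumes the reported standard error is the delta-method standard error of the odds ratio (odds ratio × standard error of the coefficient) and that the interval is the exponentiated interval for the coefficient; the 10-year odds ratio at the end is an extra illustration, not a figure from the slides.

```python
import math
from statistics import NormalDist

odds_ratio = 1.117307   # reported odds ratio for age
se_or = 0.0268822       # reported standard error of the odds ratio

beta = math.log(odds_ratio)     # coefficient on the log-odds scale
se_beta = se_or / odds_ratio    # assumed delta-method relationship

z = beta / se_beta
z_crit = NormalDist().inv_cdf(0.975)          # about 1.96
ci_low = math.exp(beta - z_crit * se_beta)
ci_high = math.exp(beta + z_crit * se_beta)

# Odds ratio for a 10-year difference in age: the per-year ratio compounds.
or_per_decade = odds_ratio ** 10

print(f"z = {z:.2f}, 95% CI = ({ci_low:.6f}, {ci_high:.6f})")
print(f"odds ratio per 10 years = {or_per_decade:.2f}")
```

Under those assumptions the rebuilt z and interval should match the Stata output closely, which shows that the test and interval are computed for the coefficient and only then transformed to the odds-ratio scale.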

  20. Predict
      - Lots of options for the predict command
      - p gives the predicted probability for each subject
      - xb gives the linear predictor (i.e. the log of the odds) for each subject
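
The two predictions are the same quantity on different scales: the probability is the inverse logit of the linear predictor. A minimal Python sketch follows. The slope is the log of the odds ratio for age reported by `logistic chd age`, but the intercept is a hypothetical value chosen purely for illustration, since the output shown does not report it.

```python
import math

slope = math.log(1.117307)   # log odds ratio for age from the logistic output
intercept = -5.0             # HYPOTHETICAL: the intercept is not shown in the source

def linear_predictor(age):
    """Log odds of CHD at a given age (what the xb option would return)."""
    return intercept + slope * age

def predicted_probability(age):
    """Probability of CHD at a given age (what the p option would return)."""
    return 1 / (1 + math.exp(-linear_predictor(age)))

for age in (20, 45, 70):
    print(age, round(linear_predictor(age), 3), round(predicted_probability(age), 3))
```

The linear predictor is a straight line in age, while the probability follows an S-shaped curve bounded by 0 and 1.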

  21. Plot of probability against age. [Figure: Pr(chd) and proportion of subjects in each age band with CHD (0 to 1) plotted against age (20 to 70).]

  22. Plot of log-odds against age. [Figure: linear prediction (−3 to 2) plotted against age (20 to 70).]

  23. Other Models for Binary Outcomes
      - Can use any function that maps (−∞, ∞) to (0, 1)
      - Probit Model
      - Complementary log-log
      - Parameters lack interpretation
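
Each of these models corresponds to a different function taking the linear predictor to a probability. The Python sketch below writes out the three inverse link functions using their standard definitions; the function names are chosen here for illustration.

```python
import math
from statistics import NormalDist

def inv_logit(x):
    """Logistic model: inverse of log(p / (1 - p))."""
    return 1 / (1 + math.exp(-x))

def inv_probit(x):
    """Probit model: standard normal cumulative distribution function."""
    return NormalDist().cdf(x)

def inv_cloglog(x):
    """Complementary log-log model: inverse of log(-log(1 - p))."""
    return 1 - math.exp(-math.exp(x))

for x in (-3, 0, 3):
    print(x, round(inv_logit(x), 3), round(inv_probit(x), 3), round(inv_cloglog(x), 3))
```

All three stay inside (0, 1), but only the logit has coefficients that exponentiate to odds ratios. The complementary log-log is also asymmetric: at a linear predictor of 0 it gives a probability above 0.5.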

  24. The Log-Binomial Model
      - Models $\log(\pi)$ rather than $\log(\pi/(1-\pi))$
      - Gives relative risk rather than odds ratio
      - Can produce predicted values greater than 1
      - May not fit the data as well
      - Stata command: glm varlist, family(binomial) link(log)
      - If the association between $\log(\pi)$ and the predictor is non-linear, the simple interpretation is lost
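
Two of these properties follow directly from the log link: the ratio of predicted risks for a one-unit increase is constant, and nothing stops the predicted risk from exceeding 1. The Python sketch below illustrates both with hypothetical coefficients, which are not estimates from the CHD data.

```python
import math

# HYPOTHETICAL coefficients for log(pi) = b0 + b1 * age; not fitted values.
b0, b1 = -3.0, 0.05

def predicted_risk(age):
    """Predicted risk under a log link: exp of the linear predictor."""
    return math.exp(b0 + b1 * age)

# The relative risk per year of age is exp(b1) wherever it is evaluated.
relative_risk_per_year = predicted_risk(41) / predicted_risk(40)

print(round(relative_risk_per_year, 4), round(math.exp(b1), 4))
print(round(predicted_risk(40), 3), round(predicted_risk(70), 3))
```

With these made-up values the predicted risk at age 70 is above 1, the kind of out-of-range prediction the logit link rules out.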

  25. Log-binomial model example. [Figure: logistic predictions, log-binomial predictions and proportion of subjects with CHD (vertical axis 0 to 1.5), horizontal axis from 20 to 70.]

  26. Logistic Regression Diagnostics (section: Diagnostics; subsections: Goodness of Fit, Influential Observations, Poorly fitted observations, Separation)
      - Goodness of Fit
      - Influential Observations
      - Poorly fitted Observations

  27. Problems with R²
      - Multiple definitions
      - Lack of interpretability
      - Low values
      - Can predict P(Y = 1) perfectly, yet not predict Y well at all if P(Y = 1) ≈ 0.5
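
The last point can be made concrete with a small calculation. The Python sketch below uses a hypothetical population in which the model knows every subject's true probability exactly, and takes R² to mean the proportion of outcome variance explained, which is only one of the several definitions in use.

```python
# HYPOTHETICAL population: half the subjects have P(Y = 1) = 0.45, half 0.55,
# and the model predicts these probabilities exactly.
probs = [0.45, 0.55]

mean_p = sum(probs) / len(probs)

# Variance of the predicted probabilities (the "explained" part).
var_explained = sum((p - mean_p) ** 2 for p in probs) / len(probs)

# Total variance of a 0/1 outcome whose overall mean is mean_p.
var_total = mean_p * (1 - mean_p)

r_squared = var_explained / var_total

# Best achievable classification accuracy: guess the likelier outcome each time.
best_accuracy = sum(max(p, 1 - p) for p in probs) / len(probs)

print(round(r_squared, 4), round(best_accuracy, 4))
```

Even with perfectly correct probabilities, R² on this definition is about 0.01 and the best possible classification is right only 55% of the time, so a low value need not indicate a poor model.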

  28. Hosmer-Lemeshow test
      - Very like χ² test
      - Divide subjects into groups
      - Compare observed and expected numbers in each group
      - Want to see a non-significant result
      - Command used is estat gof
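
The grouping logic can be sketched as follows. This is a plain-Python illustration of the usual form of the Hosmer-Lemeshow statistic, not Stata's implementation; the function name and the grouped data are hypothetical, and the groups are assumed to have been formed already (typically by ranking subjects on predicted probability).

```python
def hosmer_lemeshow(groups):
    """Return (statistic, degrees of freedom) for grouped data.

    groups: list of (n, observed_events, mean_predicted_probability).
    """
    stat = 0.0
    for n, observed, mean_p in groups:
        expected = n * mean_p
        # Squared difference scaled by the binomial variance within the group.
        stat += (observed - expected) ** 2 / (n * mean_p * (1 - mean_p))
    return stat, len(groups) - 2   # usual reference distribution: chi2 on G - 2 df

# HYPOTHETICAL groups: (subjects, observed events, mean predicted probability).
groups = [(10, 1, 0.1), (10, 3, 0.3), (10, 7, 0.5), (10, 8, 0.8)]
stat, df = hosmer_lemeshow(groups)
print(round(stat, 3), df)
```

Only the third group contributes here, because its observed count (7) differs from its expected count (5). A small statistic relative to the χ² distribution gives the non-significant result that indicates acceptable fit.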
