Statistical Modelling with Stata: Binary Outcomes
Mark Lunt
Centre for Epidemiology Versus Arthritis
University of Manchester
01/12/2020

Outline: Cross-tabulation, Regression, Diagnostics

Cross-tabulation

              Exposed    Unexposed    Total
  Cases       a          b            a + b
  Controls    c          d            c + d
  Total       a + c      b + d        a + b + c + d

  Simple random sample: fix a + b + c + d
  Exposure-based sampling: fix a + c and b + d
  Outcome-based sampling: fix a + b and c + d

The $\chi^2$ Test
  Compares observed to expected numbers in each cell
  Expected under null hypothesis: no association
  Works for any of the sampling schemes
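As a hedged illustration of the observed-versus-expected comparison, the sketch below types a 2x2 table directly into Stata's immediate command tabi. The counts are the back pain by sex figures used in this deck's cs example (rows are cases and noncases, columns are exposed and unexposed). The expected option prints the expected count under no association beneath each observed count.

```stata
* Pearson chi-squared test on a 2x2 table typed in directly.
* Rows: cases, noncases. Columns: exposed, unexposed.
tabi 637 445 \ 1694 1739, chi2 expected

* Expected count for the top-left cell under the null hypothesis:
* (row total * column total) / grand total
display (637 + 445) * (637 + 1694) / (637 + 445 + 1694 + 1739)
```

With a dataset in memory, the equivalent command would be tabulate with the same chi2 and expected options.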

Measures of Association
  Relative Risk: $\dfrac{a/(a+c)}{b/(b+d)} = \dfrac{a(b+d)}{b(a+c)}$
  Risk Difference: $\dfrac{a}{a+c} - \dfrac{b}{b+d}$
  Odds Ratio: $\dfrac{a/c}{b/d} = \dfrac{ad}{cb}$
  All obtained with cs disease exposure[, or]
  Only the odds ratio is valid with outcome-based sampling

Cross-tabulation in Stata

  . cs back_p sex, or

                   | sex                    |
                   |   Exposed   Unexposed  |      Total
  -----------------+------------------------+------------
             Cases |       637         445  |       1082
          Noncases |      1694        1739  |       3433
  -----------------+------------------------+------------
             Total |      2331        2184  |       4515
                   |                        |
              Risk |  .2732733    .2037546  |   .2396456
                   |                        |
                   |      Point estimate    |    [95% Conf. Interval]
                   |------------------------+------------------------
   Risk difference |         .0695187       |     .044767    .0942704
        Risk ratio |         1.341188       |    1.206183    1.491304
   Attr. frac. ex. |         .2543926       |    .1709386     .329446
   Attr. frac. pop |         .1497672       |
        Odds ratio |         1.469486       |     1.27969     1.68743 (Cornfield)
                   +-------------------------------------------------
                                 chi2(1) =    29.91  Pr>chi2 = 0.0000
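The point estimates in this output can be checked from the four cell counts alone. The sketch below is a minimal illustration, not part of the original slides: it uses csi, the immediate form of cs, and then repeats the arithmetic by hand. The scalar names risk1 and risk0 are chosen here for illustration.

```stata
* Immediate form of cs: csi a b c d, where
*   a = exposed cases,    b = unexposed cases,
*   c = exposed noncases, d = unexposed noncases.
csi 637 445 1694 1739, or

* The same point estimates by hand from the cell counts.
scalar risk1 = 637 / (637 + 1694)    // risk in the exposed
scalar risk0 = 445 / (445 + 1739)    // risk in the unexposed
display "Risk difference = " risk1 - risk0
display "Risk ratio      = " risk1 / risk0
display "Odds ratio      = " (637 * 1739) / (445 * 1694)
```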

Limitations of Tabulation
  No continuous predictors
  Limited numbers of categorical predictors

Regression: Introduction, Generalized Linear Models, Logistic Regression, Other GLMs for Binary Outcomes

Linear Regression and Binary Outcomes
  Can't use linear regression with binary outcomes
    Distribution is not normal
    Limited range of sensible predicted values
  Changing parameter estimation to allow for non-normal distribution is straightforward
  Need to limit range of predicted values

Example: CHD and Age
  [Figure: scatter plot of chd (0 or 1) against age (20 to 70)]

Example: CHD by Age group
  [Figure: proportion of subjects with CHD against mean age of each age group]

Example: CHD by Age - Linear Fit
  [Figure: proportion of subjects with CHD against age, with fitted values from a linear regression]

Generalized Linear Models
  Linear Model
    $Y = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \varepsilon$
    $\varepsilon$ is normally distributed
  Generalized Linear Model
    $g(Y) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p + \varepsilon$
    $\varepsilon$ has a known distribution

Probabilities and Odds

  Probability $p$    Odds $\Omega = p/(1-p)$
  0.1 = 1/10         0.1/0.9 = 1:9 = 0.111
  0.5 = 1/2          0.5/0.5 = 1:1 = 1
  0.9 = 9/10         0.9/0.1 = 9:1 = 9

Probabilities and Odds
  [Figure: proportion (0 to 1) plotted against log odds (-5 to 5)]

Advantage of the Odds Scale
  Just a different scale for measuring probabilities
  Any odds from 0 to $\infty$ corresponds to a probability
  Any log odds from $-\infty$ to $\infty$ corresponds to a probability
  Shape of curve commonly fits data
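The conversions between the three scales can be tried directly in Stata. The sketch below is an illustration added here, using the built-in functions logit() and invlogit() on the probabilities 0.1, 0.5 and 0.9 from the earlier table.

```stata
* Probability -> odds -> log odds.
foreach p of numlist 0.1 0.5 0.9 {
    display "p = " `p' "   odds = " %5.3f `p'/(1 - `p') "   log odds = " %6.3f logit(`p')
}

* Any log odds maps back to a probability strictly between 0 and 1.
display invlogit(-5)
display invlogit(0)
display invlogit(5)
```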

The binomial distribution
  Outcome can be either 0 or 1
  Has one parameter: the probability that the outcome is 1
  Assumes observations are independent

The Logistic Regression Equation
  $\log\left(\dfrac{\hat{\pi}}{1 - \hat{\pi}}\right) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$
  $Y \sim \text{Binomial}(\hat{\pi})$
  $Y$ has a binomial distribution with parameter $\pi$
  $\hat{\pi}$ is the predicted probability that $Y = 1$

Parameter Interpretation
  When $x_i$ increases by 1, $\log(\hat{\pi}/(1 - \hat{\pi}))$ increases by $\beta_i$
  Therefore $\hat{\pi}/(1 - \hat{\pi})$ increases by a factor $e^{\beta_i}$
  For a dichotomous predictor, this is exactly the odds ratio we met earlier.
  For a continuous predictor, the odds increase by a factor of $e^{\beta_i}$ for each unit increase in the predictor

Odds Ratios and Relative Risks
  [Figure: odds and proportion (vertical axis, 0 to 5) plotted against proportion (0 to 1)]

Logistic Regression in Stata

  . logistic chd age

  Logistic regression                               Number of obs   =        100
                                                    LR chi2(1)      =      29.31
                                                    Prob > chi2     =     0.0000
  Log likelihood = -53.676546                       Pseudo R2       =     0.2145

  ------------------------------------------------------------------------------
           chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           age |   1.117307   .0268822     4.61   0.000     1.065842    1.171257
  ------------------------------------------------------------------------------
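The odds ratio reported for age applies to a one-year increase. The sketch below is an added illustration that works only from the printed estimate (no dataset is needed) and shows how it rescales to a ten-year increase. The scalar names or_age and b_age are chosen here for illustration.

```stata
* Odds ratio per one-year increase in age, copied from the output above.
scalar or_age = 1.117307

* Coefficient on the log-odds scale.
scalar b_age = ln(or_age)
display "beta_age        = " b_age

* Odds ratio for a ten-year increase in age: exp(10 * beta) = OR^10.
display "OR per 10 years = " exp(10 * b_age)
```

On these figures a ten-year difference in age multiplies the odds of CHD by roughly 3. With the fitted model still in memory, lincom 10*age, or should give the same estimate together with a confidence interval.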

Predict
  Lots of options for the predict command
  p gives the predicted probability for each subject
  xb gives the linear predictor (i.e. the log of the odds) for each subject
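A minimal sketch of the two predict options follows. The CHD dataset from the slides is not available here, so the example simulates stand-in data. The seed, sample size, coefficients and the variable names phat, lodds and check are all assumptions made for illustration.

```stata
* Simulated stand-in data (assumption: NOT the CHD data from the slides).
clear
set seed 12345
set obs 100
generate age = 20 + 50 * runiform()
generate chd = runiform() < invlogit(-5.3 + 0.11 * age)

logistic chd age

* Predicted probability that chd == 1 for each subject (the default).
predict phat, pr

* Linear predictor, i.e. the predicted log odds.
predict lodds, xb

* The two are linked by the inverse logit transformation.
generate check = invlogit(lodds)
summarize phat check
```

Plotting phat or lodds against age, for example with twoway line after sorting on age, gives curves of the kind shown on the probability and log-odds slides.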

Plot of probability against age
  [Figure: Pr(chd) from the logistic model and the proportion of subjects in each age band with CHD, against age (20 to 70)]

Plot of log-odds against age
  [Figure: linear prediction (-3 to 2) against age (20 to 70)]

Other Models for Binary Outcomes
  Can use any function that maps $(-\infty, \infty)$ to $(0, 1)$
    Probit model
    Complementary log-log
  Parameters lack interpretation

The Log-Binomial Model
  Models $\log(\pi)$ rather than $\log(\pi/(1 - \pi))$
  Gives relative risk rather than odds ratio
  Can produce predicted values greater than 1
  May not fit the data as well
  Stata command: glm varlist, family(binomial) link(log)
  If the association between $\log(\pi)$ and the predictor is non-linear, lose simple interpretation.
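The glm command above can be tried on simulated data. This is a sketch under stated assumptions: the data are generated from a log-linear risk model chosen for illustration (not the CHD data), and the seed, sample size, coefficients and the name rhat are hypothetical. The eform option asks glm to report exponentiated coefficients, which under the log link are relative risks.

```stata
* Simulated stand-in data (assumption: NOT the CHD data from the slides).
clear
set seed 2020
set obs 500
generate age = 20 + 50 * runiform()
* True risk is exp(-3 + 0.03*age), which stays well below 1 for ages 20 to 70.
generate chd = runiform() < exp(-3 + 0.03 * age)

* Log-binomial model; eform reports exp(b), i.e. the relative risk per year.
glm chd age, family(binomial) link(log) eform

* Predicted risks; worth checking that none exceeds 1.
predict rhat, mu
summarize rhat
```

On real data this model can fail to converge when fitted risks approach 1, which is one practical face of the "predicted values greater than 1" problem noted above.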

Log-binomial model example
  [Figure: logistic predictions, log-binomial predictions and proportion of subjects with CHD, against age (20 to 70); vertical axis 0 to 1.5]

Diagnostics: Goodness of Fit, Influential Observations, Poorly fitted observations, Separation

Logistic Regression Diagnostics
  Goodness of Fit
  Influential Observations
  Poorly fitted Observations

Problems with $R^2$
  Multiple definitions
  Lack of interpretability
  Low values
  Can predict $P(Y = 1)$ perfectly and still not predict $Y$ well at all if $P(Y = 1) \approx 0.5$.

Hosmer-Lemeshow test
  Very like the $\chi^2$ test
  Divide subjects into groups
  Compare observed and expected numbers in each group
  Want to see a non-significant result
  Command used is estat gof
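A hedged sketch of the test in use follows. The data are simulated stand-ins (the seed, sample size and coefficients are assumptions, not the CHD data), and the choice of 10 groups is the conventional one rather than something stated on the slide.

```stata
* Simulated stand-in data (assumption: NOT the CHD data from the slides).
clear
set seed 4321
set obs 500
generate age = 20 + 50 * runiform()
generate chd = runiform() < invlogit(-5.3 + 0.11 * age)

logistic chd age

* Hosmer-Lemeshow test: subjects are split into 10 groups by predicted
* probability; the table option lists observed and expected counts per group.
estat gof, group(10) table
```

The statistic is compared with a chi-squared distribution on (number of groups - 2) degrees of freedom. A large p-value means the test found no evidence of poor fit, which is not the same as showing that the model fits well.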
