Lecture 11: Interpreting logistic regression models - Ani Manichaikul


  1. Lecture 11: Interpreting logistic regression models Ani Manichaikul amanicha@jhsph.edu 3 May 2007

  2. Logistic regression
  - Framework and ideas of linear modelling are similar to linear regression
  - Still have a systematic and probabilistic part to any model
  - Coefficients have a new interpretation, based on log(odds) and log(odds ratios)

  3. The logit function
  - In logistic regression, we are always modelling the outcome log(p/(1-p))
  - We define the function: logit(p) = log(p/(1-p))
  - We often use the name logit for convenience
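The logit and its inverse can be sketched in a few lines of plain Python (the function names here are our own, not part of any particular library):

```python
import math

def logit(p):
    """Log odds: logit(p) = log(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: maps a log odds back to a probability."""
    return math.exp(x) / (1 + math.exp(x))

# logit and inv_logit undo each other:
p = 0.2
assert abs(inv_logit(logit(p)) - p) < 1e-12
```

Because inv_logit undoes logit, any log odds produced by the model can be mapped back to a probability, which is exactly what the later slides do.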

  4. Example: Public health graduate students
  - 323 graduate students in introductory biostatistics took a health survey. Current smoking status was gathered, which we will predict with gender.
  - Associating demographics with smoking is vital to planning public health programs.
  - Information was also collected on age, exercise, and history of smoking: potential confounders of the association between gender and current smoking.
  - Today, we will focus only on the association between gender and current smoking status.

  5. Coding
  - Outcome: smoking = 1 for current smokers, 0 for current nonsmokers
  - Primary predictor: gender = 1 for men, 0 for women

  6. Recall
  - In linear regression, if we had only one binary X like gender, we would be predicting two means:
  - β0: the mean outcome when X = 0
  - β0 + β1: the mean outcome when X = 1
  - β1: the difference in mean outcome when X = 1 vs. when X = 0

  7. Output

Logit estimates                              Number of obs =    323
                                             LR chi2(1)    =   4.46
                                             Prob > chi2   = 0.0348
Log likelihood = -75.469757                  Pseudo R2     = 0.0287
------------------------------------------------------------------------------
       smoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |    .967966   .4547931     2.13   0.033     .0765879    1.859344
       _cons |  -3.058707   .3235656    -9.45   0.000    -3.692884    -2.42453
------------------------------------------------------------------------------

  The fitted model is ln(p/(1-p)) = β0 + β1·Gender; plugging in the (rounded) estimates gives ln(p/(1-p)) = -3.1 + 1.0·Gender.
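As a check on the output above, exponentiating the gender coefficient and its confidence limits gives the odds ratio and its 95% CI; a minimal sketch in plain Python, using the unrounded estimates printed by the software:

```python
import math

beta1 = 0.967966                        # gender coefficient from the output
ci_low, ci_high = 0.0765879, 1.859344   # 95% CI for beta1

odds_ratio = math.exp(beta1)                     # about 2.63
or_ci = (math.exp(ci_low), math.exp(ci_high))    # roughly (1.08, 6.42)
```

Later slides report an odds ratio of 2.4 because they divide odds that were first rounded to 0.12 and 0.05; exponentiating the unrounded coefficient gives about 2.63.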

  8. Predictions by gender
  - For women, gender = 0: ln(p/(1-p)) = -3.1 + 1.0(0) = -3.1
  - For men, gender = 1: ln(p/(1-p)) = -3.1 + 1.0(1) = -2.1
  - β1 is the difference: β1 is the change in log odds
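The two predictions can be reproduced directly with the rounded estimates (a sketch in plain Python; β0 and β1 are written b0 and b1):

```python
b0, b1 = -3.1, 1.0               # rounded estimates from the fitted model

log_odds_women = b0 + b1 * 0     # gender = 0: -3.1
log_odds_men   = b0 + b1 * 1     # gender = 1: -2.1

# b1 is the difference in log odds between men and women
diff = log_odds_men - log_odds_women
```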

  9. Interpretation 1: log(odds)
  - β0: the log odds of smoking for women
  - β0 + β1: the log odds of smoking for men
  - β1: the difference in the log odds of smoking for men as compared to women

  10. But, we really wanted to predict P(Y=1), not the log odds…
  - We can start to “untransform” the equation: if ln(a) = b, then e^b = a
  - For women, X = 0: ln(odds) = β0 + β1(0) = β0, so the odds of smoking for women = e^β0 = e^-3.1 = 0.05
  - For men, X = 1: ln(odds) = β0 + β1(1) = β0 + β1, so the odds of smoking for men = e^(β0+β1) = e^(-3.1+1.0) = e^-2.1 = 0.12

  11. Interpretation 2: odds
  - e^β0: the odds of smoking for women (when X = 0)
  - e^(β0+β1): the odds of smoking for men (when X = 1)
  - In the past, we’ve compared two sets of odds by dividing to find the odds ratio (OR)

  12. Comparing odds
  - If we subtract the log odds, mathematically that’s equivalent to dividing inside the log: ln(a) - ln(b) = ln(a/b)
  - So, if e^(β0+β1) = e^(-3.1+1.0) = e^-2.1 = 0.12 is the odds when X = 1, and
  - e^β0 = e^-3.1 = 0.05 is the odds when X = 0, then
  - we want to divide them in order to compare: Odds Ratio = (odds for men)/(odds for women) = e^(β0+β1)/e^β0 = 0.12/0.05 = 2.4
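Numerically, dividing the two odds and exponentiating the coefficient difference agree, as the algebra says they must (plain Python with the rounded coefficients; the slides’ 2.4 comes from dividing odds that were themselves rounded to 0.12 and 0.05):

```python
import math

b0, b1 = -3.1, 1.0
odds_women = math.exp(b0)        # e^-3.1, about 0.045
odds_men   = math.exp(b0 + b1)   # e^-2.1, about 0.122

# The ratio of the odds equals e^b1, since e^(b0+b1) / e^b0 = e^b1
odds_ratio = odds_men / odds_women
```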

  13. Interpreting the odds ratio
  - The odds of smoking is about 2½ times greater for men than for women.
  - Based on this study, smoking cessation programs should be targeted toward men, while perhaps smoking prevention programs should be targeted toward women.

  14. Useful math
  - We can usually simplify an equation like this:
    Odds Ratio = e^(β0+β1)/e^β0 = e^((β0+β1)-β0) = e^β1
  - because e^a/e^b = e^(a-b)

  15. Odds and odds ratio
  - e^β0: the odds when X = 0
  - e^(β0+β1): the odds when X = 1
  - e^(β0+β1)/e^β0 = e^β1: the odds ratio comparing the odds when X = 1 vs. X = 0

  16. Note on the computer output
  - R does not give e^β0 in the output
  - This is because logistic regression is so often used for case-control studies
  - The odds aren’t appropriate for a case-control study, because the investigators determine the ratio of cases to controls
  - The odds ratio is appropriate regardless of whether exposure or outcome was gathered first (by invariance of the odds ratio)

  17. Types of interpretation
  - β0 + β1 = ln(odds) (for X = 1)
  - β1 = difference in log odds
  - e^(β0+β1) = odds (for X = 1)
  - e^β1 = odds ratio
  - But we started with P(Y=1)
  - Can we find that?

  18. More useful math
  - odds = probability/(1 - probability)
  - probability = odds/(1 + odds)
  - so probability for X = 1 is e^(β0+β1)/(1 + e^(β0+β1))

  19. Finding the probability
  - Find the log odds:
    For X = 0: ln(odds) = β0
    For X = 1: ln(odds) = β0 + β1
  - Find odds:
    For X = 0: odds = e^β0
    For X = 1: odds = e^(β0+β1)

  20. Finding the probability
  - Transform odds into probability: p = odds/(1 + odds)
    For X = 0: probability = e^β0/(1 + e^β0)
    For X = 1: probability = e^(β0+β1)/(1 + e^(β0+β1))
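The two transformations chain together; a sketch in plain Python using the rounded coefficients (prob_from_log_odds is our own helper name):

```python
import math

b0, b1 = -3.1, 1.0

def prob_from_log_odds(log_odds):
    """p = odds / (1 + odds), with odds = e^(log odds)."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)

p_women = prob_from_log_odds(b0)        # X = 0: about 0.043
p_men   = prob_from_log_odds(b0 + b1)   # X = 1: about 0.109
```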

  21. We could even go one step further
  - Relative Risk (RR) = p1/p2
  - For X = 1: P(smoke | male) = e^(β0+β1)/(1 + e^(β0+β1))
  - For X = 0: P(smoke | female) = e^β0/(1 + e^β0)
  - Relative Risk for men vs. women:
    p1/p2 = [e^(β0+β1)/(1 + e^(β0+β1))] / [e^β0/(1 + e^β0)]
  - no way to simplify
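Computing both quantities side by side makes the distinction concrete (plain Python, rounded coefficients; prob is a hypothetical helper, not part of the original slides):

```python
import math

b0, b1 = -3.1, 1.0

def prob(x):
    """P(Y = 1 | X = x) under the fitted model (hypothetical helper)."""
    odds = math.exp(b0 + b1 * x)
    return odds / (1 + odds)

rr  = prob(1) / prob(0)   # relative risk, men vs. women (about 2.5)
or_ = (prob(1) / (1 - prob(1))) / (prob(0) / (1 - prob(0)))   # equals e^b1
```

Because smoking is fairly rare here, the RR (about 2.5) is close to, but not equal to, the OR of e^1.0 ≈ 2.72; for a common outcome the two can differ substantially.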

  22. Remember to consider study design
  - We can always calculate the relative risk
  - The relative risk is not appropriate for case-control studies
  - Again, because the investigators decide the number of cases and controls to study
  - The odds ratio is appropriate for all study designs

  23. Types of interpretation
  - β0 + β1 = ln(odds) (for X = 1)
  - β1 = difference in log odds
  - e^(β0+β1) = odds (for X = 1)
  - e^β1 = odds ratio
  - probability for X = 1 = e^(β0+β1)/(1 + e^(β0+β1))
  - Relative Risk = [e^(β0+β1)/(1 + e^(β0+β1))] / [e^β0/(1 + e^β0)]

  24. Interpretation Tips
  - If the equation includes β0, then it is usually for a particular set of people:
    - log odds
    - odds
    - probability
    - exception: the equation for RR will include β0, because that equation cannot be simplified
  - If the equation does not include β0, then it must compare two groups:
    - difference of log odds = log odds ratio
    - odds ratio

  25. In General
  - Logistic regression is for a binary outcome
  - Left side of equation is log odds
  - Can transform the equation to find:
    - odds
    - probability
  - Can compare two groups:
    - difference of log odds = log odds ratio
    - odds ratio
    - relative risk
  - Everything we learned before applies

  26. Useful math for logistic regression
  - If ln(a) = b, then e^b = a
  - X = 1: ln(odds) = β0 + β1(1), so odds for X = 1 is e^(β0+β1)
  - ln(a) - ln(b) = ln(a/b), so ln(odds|X=1) - ln(odds|X=0) = ln(OR for X=1 vs. X=0)
  - Also: e^(a+b) = e^a × e^b and e^a/e^b = e^(a-b), so e^(β0+β1)/e^β0 = e^((β0+β1)-β0) = e^β1
  - probability = odds/(1 + odds), so probability for X = 1 = e^(β0+β1)/(1 + e^(β0+β1))

  27. Another Example
  - Regular physical examination is an important preventative public health measure
  - We’ll study this outcome using the public health graduate student dataset.
  - Outcome: no physical exam in the past two years
  - Primary predictor: age
  - Secondary predictor and potential confounder: regularly taking a multivitamin

  28. Problem
  - The original “phys” variable (time since last physician visit) was meant to be continuous, but it was collected categorically.
  - Since it is now categorical and we wish to use it as the outcome for a regression model, we have to make it binary and use logistic regression.
