Lecture 11: Interpreting logistic regression models
Ani Manichaikul, amanicha@jhsph.edu
3 May 2007
Logistic regression
- The framework and ideas of linear modelling are similar to those of linear regression
- Every model still has a systematic part and a probabilistic part
- Coefficients have a new interpretation, based on log odds and log odds ratios
The logit function
- In logistic regression, we always model the log odds of the outcome, $\log\left(\frac{p}{1-p}\right)$
- We define the function $\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right)$
- We often use the name "logit" for convenience
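To make the definition concrete, here is a minimal Python sketch of the logit and its inverse. The inverse is called `expit` here, a name chosen for illustration; it is the function that turns a log odds back into a probability, which the later slides do by hand.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def expit(x: float) -> float:
    """Inverse of logit: maps any real-valued log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

print(logit(0.5))                     # 0.0: even odds
print(round(expit(logit(0.2)), 10))   # 0.2: expit undoes logit
```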
Example: Public health graduate students
- 323 graduate students in introductory biostatistics took a health survey. Current smoking status was recorded, and we will predict it from gender.
- Associating demographics with smoking is vital to planning public health programs.
- Information was also collected on age, exercise, and smoking history, which are potential confounders of the association between gender and current smoking.
- Today, we will focus only on the association between gender and current smoking status.
Coding
- Outcome: smoking = 1 for current smokers, 0 for current nonsmokers
- Primary predictor: gender = 1 for men, 0 for women
Recall
- In linear regression, if we had only one binary $X$ such as gender, we would be predicting two means:
  - $\beta_0$: the mean outcome when $X = 0$
  - $\beta_0 + \beta_1$: the mean outcome when $X = 1$
  - $\beta_1$: the difference in mean outcome when $X = 1$ vs. when $X = 0$
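As a quick check of this recall, the sketch below fits least squares with one 0/1 predictor and confirms that the intercept equals the mean of the $X = 0$ group and the slope equals the difference in group means. The data are hypothetical toy values, not from the survey, and `ols_one_predictor` is an illustrative helper written from the closed-form least-squares formulas.

```python
def ols_one_predictor(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical toy data (not from the survey): X is a 0/1 group indicator.
x = [0, 0, 0, 1, 1, 1, 1]
y = [2.0, 4.0, 3.0, 6.0, 5.0, 7.0, 6.0]
b0, b1 = ols_one_predictor(x, y)

# Group means are 3.0 (X = 0) and 6.0 (X = 1).
print(round(b0, 6), round(b0 + b1, 6), round(b1, 6))  # 3.0 6.0 3.0
```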
Output

Logit estimates                                   Number of obs   =        323
                                                  LR chi2(1)      =       4.46
                                                  Prob > chi2     =     0.0348
Log likelihood = -75.469757                       Pseudo R2       =     0.0287
------------------------------------------------------------------------------
       smoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |    .967966    .4547931     2.13   0.033     .0765879    1.859344
       _cons |  -3.058707    .3235656    -9.45   0.000    -3.692884    -2.42453
------------------------------------------------------------------------------

$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\text{Gender} \;\Rightarrow\; \ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0\,\text{Gender}$
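The z statistic and confidence interval in the gender row can be reproduced from the coefficient and its standard error. This Python sketch assumes the usual Wald form, $z = \text{Coef.}/\text{SE}$ and $\text{Coef.} \pm 1.959964 \times \text{SE}$, which matches the printed interval.

```python
# Coefficient and standard error for gender, copied from the output above.
coef, se = 0.967966, 0.4547931
z_crit = 1.959964  # 97.5th percentile of the standard normal distribution

z = coef / se
lower, upper = coef - z_crit * se, coef + z_crit * se
print(f"z = {z:.2f}")                          # z = 2.13
print(f"95% CI: ({lower:.4f}, {upper:.4f})")   # 95% CI: (0.0766, 1.8593)
```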
Predictions by gender
- For women, gender = 0: $\ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(0) = -3.1$
- For men, gender = 1: $\ln\left(\frac{p}{1-p}\right) = -3.1 + 1.0(1) = -2.1$
- $\beta_1$ is the difference, $(-2.1) - (-3.1) = 1.0$: $\beta_1$ is the change in log odds
Interpretation 1: log(odds)
- $\beta_0$: the log odds of smoking for women
- $\beta_0 + \beta_1$: the log odds of smoking for men
- $\beta_1$: the difference in the log odds of smoking for men compared with women
But, we really wanted to predict $P(Y = 1)$, not the log odds…
- We can start to "untransform" the equation: if $\ln(a) = b$, then $e^b = a$
- For women, $X = 0$: $\ln(\text{odds}) = \beta_0 + \beta_1(0) = \beta_0$, so the odds of smoking for women $= e^{\beta_0} = e^{-3.1} \approx 0.05$
- For men, $X = 1$: $\ln(\text{odds}) = \beta_0 + \beta_1(1)$, so the odds of smoking for men $= e^{\beta_0 + \beta_1} = e^{-3.1 + 1.0} = e^{-2.1} \approx 0.12$
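The untransform step can be sketched in a few lines of Python, assuming the rounded coefficients -3.1 and 1.0 used on the slides. The helper `log_odds` is a hypothetical name for the linear predictor, not part of any library.

```python
import math

# Rounded coefficients from the slides (unrounded: -3.058707 and 0.967966).
b0, b1 = -3.1, 1.0

def log_odds(gender: int) -> float:
    """Linear predictor b0 + b1*gender (gender: 1 = men, 0 = women)."""
    return b0 + b1 * gender

odds_women = math.exp(log_odds(0))  # e^-3.1
odds_men = math.exp(log_odds(1))    # e^-2.1
print(round(odds_women, 3), round(odds_men, 3))  # 0.045 0.122
```

Rounded to two decimals these are the 0.05 and 0.12 quoted on the slide.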
Interpretation 2: odds
- $e^{\beta_0}$: the odds of smoking for women (when $X = 0$)
- $e^{\beta_0 + \beta_1}$: the odds of smoking for men (when $X = 1$)
- In the past, we've compared two sets of odds by dividing them to find the odds ratio (OR)
Comparing odds
- Subtracting log odds is mathematically equivalent to dividing inside the log: $\ln(a) - \ln(b) = \ln(a/b)$
- So, if $e^{\beta_0 + \beta_1} = e^{-3.1 + 1.0} = e^{-2.1} \approx 0.12$ is the odds when $X = 1$, and $e^{\beta_0} = e^{-3.1} \approx 0.05$ is the odds when $X = 0$,
- then we divide them in order to compare: $\text{Odds Ratio} = \frac{\text{odds for men}}{\text{odds for women}} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = \frac{0.12}{0.05} = 2.4$ (using the rounded odds; without rounding, $e^{0.968} \approx 2.6$)
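A short Python check that dividing the two odds gives the same number as exponentiating $\beta_1$, using the unrounded estimates from the output. The slide's 2.4 comes from dividing the two-decimal odds; without rounding the ratio is closer to 2.63, which is still "about 2½".

```python
import math

b0, b1 = -3.058707, 0.967966  # unrounded estimates from the output

odds_women = math.exp(b0)
odds_men = math.exp(b0 + b1)
odds_ratio = odds_men / odds_women

# Dividing the odds is the same as exponentiating the difference in log odds.
print(round(odds_ratio, 2), round(math.exp(b1), 2))  # 2.63 2.63
```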
Interpreting the odds ratio
- The odds of smoking are about 2½ times greater for men than for women.
- Based on this study, smoking cessation programs should be targeted toward men, while smoking prevention programs should perhaps be targeted toward women.
Useful math
- We can usually simplify an equation like this: $\text{Odds Ratio} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{(\beta_0 + \beta_1) - \beta_0} = e^{\beta_1}$
- because $\frac{e^a}{e^b} = e^{a - b}$
odds and odds ratio
- $e^{\beta_0}$: the odds when $X = 0$
- $e^{\beta_0 + \beta_1}$: the odds when $X = 1$
- $\frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$: the odds ratio comparing the odds when $X = 1$ vs. $X = 0$
Note on the computer output
- R does not give $e^{\beta_0}$ in the output
- This is because logistic regression is so often used for case-control studies
- The odds aren't appropriate for a case-control study, because the investigators determine the ratio of cases to controls
- The odds ratio is appropriate regardless of whether exposure or outcome was gathered first (by the invariance of the odds ratio)
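The invariance mentioned in the last bullet can be seen on a 2×2 table: the odds ratio for the outcome comparing exposed with unexposed equals the odds ratio for exposure comparing cases with non-cases, since both reduce to $ad/bc$. The counts below are hypothetical, not from the smoking survey, and the two helper functions are illustrative.

```python
# Hypothetical 2x2 table layout:
#   a = exposed cases,   b = exposed non-cases,
#   c = unexposed cases, d = unexposed non-cases.

def odds_ratio_outcome(a, b, c, d):
    """Odds of the outcome in exposed vs. unexposed (cohort direction)."""
    return (a / b) / (c / d)

def odds_ratio_exposure(a, b, c, d):
    """Odds of exposure in cases vs. non-cases (case-control direction)."""
    return (a / c) / (b / d)

# Hypothetical counts, not from the smoking survey.
a, b, c, d = 20, 80, 10, 190
print(round(odds_ratio_outcome(a, b, c, d), 4),
      round(odds_ratio_exposure(a, b, c, d), 4))  # 4.75 4.75
```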
Types of interpretation
- $\beta_0 + \beta_1 = \ln(\text{odds})$ (for $X = 1$)
- $\beta_1$ = difference in log odds
- $e^{\beta_0 + \beta_1}$ = odds (for $X = 1$)
- $e^{\beta_1}$ = odds ratio
- But we started with $P(Y = 1)$. Can we find that?
More useful math
- $\text{odds} = \frac{\text{probability}}{1 - \text{probability}}$
- $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$
- So the probability for $X = 1$ is $\frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
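The two conversions on this slide are inverses of each other. A minimal Python sketch, with illustrative function names:

```python
def odds_from_prob(p: float) -> float:
    """odds = probability / (1 - probability), for 0 <= p < 1."""
    return p / (1.0 - p)

def prob_from_odds(odds: float) -> float:
    """probability = odds / (1 + odds), for odds >= 0."""
    return odds / (1.0 + odds)

print(odds_from_prob(0.2))   # 0.25
print(prob_from_odds(0.25))  # 0.2
```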
Finding the probability
- Find the log odds:
  - For $X = 0$: $\ln(\text{odds}) = \beta_0$
  - For $X = 1$: $\ln(\text{odds}) = \beta_0 + \beta_1$
- Find the odds:
  - For $X = 0$: odds $= e^{\beta_0}$
  - For $X = 1$: odds $= e^{\beta_0 + \beta_1}$
Finding the probability
- Transform odds into probability: $p = \frac{\text{odds}}{1 + \text{odds}}$
  - For $X = 0$: probability $= \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
  - For $X = 1$: probability $= \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
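Putting the steps together (log odds, then odds, then probability), this Python sketch computes the fitted probability of smoking for each gender from the unrounded estimates in the output. The function name `prob_smoke` is illustrative.

```python
import math

b0, b1 = -3.058707, 0.967966  # unrounded estimates from the output

def prob_smoke(gender: int) -> float:
    """P(smoke = 1 | gender) = e^(b0 + b1*gender) / (1 + e^(b0 + b1*gender))."""
    odds = math.exp(b0 + b1 * gender)  # step 2: odds from log odds
    return odds / (1.0 + odds)         # step 3: probability from odds

print(round(prob_smoke(0), 3), round(prob_smoke(1), 3))  # 0.045 0.11
```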
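We could even go one step further
- Relative risk (RR) $= \frac{p_1}{p_2}$
- For $X = 1$: $P(\text{smoke} \mid \text{male}) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$
- For $X = 0$: $P(\text{smoke} \mid \text{female}) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
- Relative risk for men vs. women: $\frac{p_1}{p_2} = \dfrac{e^{\beta_0 + \beta_1} / (1 + e^{\beta_0 + \beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$
- There is no way to simplify this so that $\beta_0$ drops out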
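Computing the relative risk from the fitted model, and setting it beside the odds ratio, in a short Python sketch using the unrounded estimates. Unlike the OR, the RR needs $\beta_0$, so it is built from the two fitted probabilities.

```python
import math

b0, b1 = -3.058707, 0.967966  # unrounded estimates from the output

def prob(eta: float) -> float:
    """Probability from a log odds eta."""
    return math.exp(eta) / (1.0 + math.exp(eta))

p_men = prob(b0 + b1)
p_women = prob(b0)
rr = p_men / p_women  # depends on b0 as well as b1
or_ = math.exp(b1)    # depends on b1 only
print(round(rr, 2), round(or_, 2))  # 2.45 2.63
```

Because smoking is fairly uncommon in this sample, the RR and OR come out close; they diverge more when the outcome is common.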
Remember to consider study design
- We can always calculate the relative risk from the fitted model
- But the relative risk is not appropriate for case-control studies
- Again, this is because the investigators decide the number of cases and controls to study
- The odds ratio is appropriate for all study designs
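To see why the investigators' choice of how many controls to study matters, this Python sketch starts from a hypothetical full population, then keeps every case but only 10% of non-cases, as a case-control design might. The RR changes with the sampling fraction while the OR does not. All counts and the helper `rr_and_or` are hypothetical.

```python
def rr_and_or(a, b, c, d):
    """a, b = exposed cases, non-cases; c, d = unexposed cases, non-cases."""
    rr = (a / (a + b)) / (c / (c + d))
    or_ = (a * d) / (b * c)
    return rr, or_

# Hypothetical full population (not from the survey).
full = (30, 270, 10, 690)
# Case-control style sample: every case, but only 10% of non-cases.
sampled = (30, 27, 10, 69)

print(tuple(round(v, 2) for v in rr_and_or(*full)))     # (7.0, 7.67)
print(tuple(round(v, 2) for v in rr_and_or(*sampled)))  # (4.16, 7.67)
```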
Types of interpretation
- $\beta_0 + \beta_1 = \ln(\text{odds})$ (for $X = 1$)
- $\beta_1$ = difference in log odds
- $e^{\beta_0 + \beta_1}$ = odds (for $X = 1$)
- $e^{\beta_1}$ = odds ratio
- $\frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$ = probability (for $X = 1$)
- $\dfrac{e^{\beta_0 + \beta_1} / (1 + e^{\beta_0 + \beta_1})}{e^{\beta_0} / (1 + e^{\beta_0})}$ = relative risk
Interpretation Tips
- If the equation includes $\beta_0$, then it usually describes a particular set of people:
  - log odds
  - odds
  - probability
  - exception: the equation for RR includes $\beta_0$, even though RR compares two groups, because that equation cannot be simplified
- If the equation does not include $\beta_0$, then it must compare two groups:
  - difference of log odds = log odds ratio
  - odds ratio
In General
- Logistic regression is used for a binary outcome
- The left side of the equation is the log odds
- We can transform the equation to find:
  - odds
  - probability
- We can compare two groups using:
  - difference of log odds = log odds ratio
  - odds ratio
  - relative risk
- Everything we learned before applies
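Useful math for logistic regression
- If $\ln(a) = b$, then $e^b = a$
  - $X = 1$: $\ln(\text{odds}) = \beta_0 + \beta_1(1)$, so the odds for $X = 1$ are $e^{\beta_0 + \beta_1}$
- $\ln(a) - \ln(b) = \ln(a/b)$
  - so $\ln(\text{odds} \mid X = 1) - \ln(\text{odds} \mid X = 0) = \ln(\text{OR for } X = 1 \text{ vs. } X = 0)$
- $e^{a + b} = e^a \times e^b$. Also, $\frac{e^a}{e^b} = e^{a - b}$, so $\frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$
- $e^{2\beta_1} = e^{\beta_1} \times e^{\beta_1} = \left(e^{\beta_1}\right)^2$
- $\text{probability} = \frac{\text{odds}}{1 + \text{odds}}$, so the probability for $X = 1$ is $\frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}$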
Another Example
- Regular physical examination is an important preventive public health measure
- We'll study this outcome using the public health graduate student dataset
- Outcome: no physical exam in the past two years
- Primary predictor: age
- Secondary predictor and potential confounder: regularly taking a multivitamin
Problem
- The original "phys" variable (time since last physician visit) was meant to be continuous, but it was collected categorically.
- Since it is categorical and we wish to use it as the outcome of a regression model, we have to make it binary and use logistic regression.
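One way to collapse a categorical "time since last visit" variable into the binary outcome "no physical exam in the past two years" is sketched below in Python. The category labels are hypothetical, since the survey's actual categories are not listed here; unrecognized labels raise an error rather than being silently coded 0.

```python
# Hypothetical labels for "time since last physician visit"; the survey's real
# categories are not shown on the slide, so adjust these sets to the actual codes.
RECENT = {"less than 1 year", "1-2 years"}
NOT_RECENT = {"2-5 years", "more than 5 years", "never"}

def no_exam_past_two_years(category: str) -> int:
    """Binary outcome: 1 = no physical exam in the past two years, 0 = had one."""
    if category in NOT_RECENT:
        return 1
    if category in RECENT:
        return 0
    # Refuse to guess for labels that are in neither set.
    raise ValueError(f"unrecognized category: {category!r}")

responses = ["less than 1 year", "1-2 years", "2-5 years", "never"]
print([no_exam_past_two_years(r) for r in responses])  # [0, 0, 1, 1]
```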