University of Copenhagen, Department of Biostatistics
Faculty of Health Sciences

Day 2: Logistic regression

Susanne Rosthøj
Biostatistisk Afdeling, Institut for Folkesundhedsvidenskab, Københavns Universitet
sr@biostat.ku.dk

Calculation of odds ratio in 2 × 2 tables

The Framingham study: Is there an association between sex and the risk of Coronary Heart Disease (CHD)?

Sex        No CHD    CHD    Total
Females       616    104      720
Males         479    164      643
Total        1095    268     1363

Odds ratio:

$$
\mathrm{OR} = \frac{164 / 479}{104 / 616} = \frac{164 \times 616}{104 \times 479} = 2.03
$$

The odds of CHD for males are about twice the odds of CHD for females.

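The calculation above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `odds_ratio` and its argument names are not from the slides, and the counts are copied from the 2 × 2 table.

```python
def odds_ratio(cases_a, noncases_a, cases_b, noncases_b):
    """Odds ratio of group A versus group B from the four cell counts."""
    odds_a = cases_a / noncases_a  # odds of CHD in group A
    odds_b = cases_b / noncases_b  # odds of CHD in group B
    return odds_a / odds_b


# Males (164 CHD, 479 no CHD) versus females (104 CHD, 616 no CHD).
or_males_vs_females = odds_ratio(164, 479, 104, 616)
print(round(or_males_vs_females, 2))  # 2.03
```

Writing the odds for each group first, rather than the cross-product directly, keeps the direction of the comparison (A versus B) explicit.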
The purpose of a logistic regression analysis

Relate a binary outcome variable, e.g.

$$
Y_i =
\begin{cases}
1 & \text{if } i \text{ has CHD} \\
0 & \text{if } i \text{ does not have CHD}
\end{cases}
$$

to explanatory variables for individual $i$.

In logistic regression we formulate models for the log-odds of the probability $p_i = P(Y_i = 1)$:

$$
\log\left(\frac{p_i}{1 - p_i}\right)
$$

NB: log = natural logarithm (the ln button on a calculator).

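As a small illustration of the log-odds transformation, the sketch below converts a probability to log-odds and back. The function names `log_odds` and `probability` are chosen here for illustration, and the example probability is the proportion of females with CHD from the Framingham 2 × 2 table (104 out of 720).

```python
import math


def log_odds(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def probability(value):
    """Inverse transformation: the probability that has the given log-odds."""
    return 1.0 / (1.0 + math.exp(-value))


p_females = 104 / 720  # observed proportion of females with CHD
print(round(log_odds(p_females), 2))               # -1.78
print(round(probability(log_odds(p_females)), 3))  # 0.144
```

The log-odds can take any real value, while the probability stays between 0 and 1.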
The logistic regression model

Model:

$$
\log\left(\frac{p_i}{1 - p_i}\right) =
\begin{cases}
a_1 & \text{if } i \text{ is female} \\
a_2 & \text{if } i \text{ is male}
\end{cases}
=
\begin{cases}
a & \text{if } i \text{ is female} \\
a + b & \text{if } i \text{ is male}
\end{cases}
$$

With the Framingham data:

$$
\begin{cases}
\log(104/616) & \text{(females)} \\
\log(164/479) & \text{(males)}
\end{cases}
=
\begin{cases}
-1.78 \\
-1.07
\end{cases}
=
\begin{cases}
-1.78 \\
-1.78 + 0.71
\end{cases}
$$

There is a difference of $b = 0.71$ in log-odds between the males and females(?).

Calculating OR using logistic regression

$$
\log\left(\frac{p_i}{1 - p_i}\right) =
\begin{cases}
a & \text{if } i \text{ is female} \\
a + b & \text{if } i \text{ is male}
\end{cases}
$$

$$
\begin{aligned}
b &= (a + b) - a \\
  &= \log(\text{odds for males}) - \log(\text{odds for females}) \\
  &= \log(\text{OR for males vs. females})
\end{aligned}
$$

i.e.

$$
\exp(b) = \text{OR for males vs. females} = \exp(0.71) = 2.03.
$$

Now determine the OR of CHD for females vs. males.

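The relationship between $a$, $b$ and the odds ratio can be reproduced numerically. The sketch below computes the two parameters directly from the counts in the 2 × 2 table rather than by fitting a model. The variable names are illustrative.

```python
import math

# Log-odds of CHD in each group, from the Framingham 2 x 2 table.
a = math.log(104 / 616)         # females: the intercept a
a_plus_b = math.log(164 / 479)  # males: a + b

b = a_plus_b - a                # difference in log-odds, males minus females
or_males_vs_females = math.exp(b)

print(round(a, 2), round(b, 2), round(or_males_vs_females, 2))  # -1.78 0.71 2.03
```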
Logistic regression in SAS

We use proc logistic:

    proc logistic data=framing descending;
      class sex / param=glm;
      model chd01 = sex;
    run;

Note the options
• descending, which forces SAS to model the probability of outcome = 1 (instead of 0)
• param=glm in the class statement, which asks SAS to use the parameterization of the mean structure from the previous slide (it is a technical detail)

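To give an idea of what a maximum likelihood fit of this model involves, here is a minimal Python sketch using Newton-Raphson iterations on the grouped Framingham counts. It is not a description of SAS's internal implementation: the function `fit_logistic`, the 0/1 coding of sex (0 = female, 1 = male) and the fixed number of iterations are assumptions made for illustration.

```python
import math

# Grouped data (x, n, y): x = 0 for females, 1 for males,
# n = group size, y = number with CHD (from the 2 x 2 table).
groups = [(0, 720, 104), (1, 643, 164)]


def fit_logistic(groups, iterations=25):
    """Maximum likelihood estimates (a, b) for log-odds = a + b * x."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        u_a = u_b = 0.0          # score vector
        i_aa = i_ab = i_bb = 0.0  # information matrix
        for x, n, y in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = y - n * p      # observed minus expected cases
            w = n * p * (1.0 - p)  # binomial variance weight
            u_a += resid
            u_b += x * resid
            i_aa += w
            i_ab += x * w
            i_bb += x * x * w
        # Newton step: solve the 2 x 2 system (information) * step = score.
        det = i_aa * i_bb - i_ab * i_ab
        a += (i_bb * u_a - i_ab * u_b) / det
        b += (-i_ab * u_a + i_aa * u_b) / det
    return a, b


a, b = fit_logistic(groups)
print(round(a, 2), round(b, 2), round(math.exp(b), 2))  # -1.78 0.71 2.03
```

With a single binary explanatory variable the fitted values coincide with the log-odds computed directly from the table, which makes this a convenient check.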
Explanatory variable with several levels

Divide age into 4 groups:

Age      No CHD    CHD    Total    Odds    log-odds
45-48       308     51      359    0.17       -1.80
49-52       298     61      359    0.20       -1.59
53-56       254     64      318    0.25       -1.38
57-62       235     92      327    0.39       -0.94
Total      1095    268     1363

Comparison          OR                            log(OR)
49-52 vs 45-48      (61 · 308)/(51 · 298) = 1.24     0.21
53-56 vs 45-48      1.52                             0.42
57-62 vs 45-48      2.36                             0.86

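The odds, log-odds and odds ratios in the two tables can be recomputed from the counts. The sketch below does this with 45-48 as the reference group. The dictionary layout and variable names are illustrative and not part of the slides.

```python
import math

# (no CHD, CHD) counts per age group, copied from the table.
age_groups = {
    "45-48": (308, 51),
    "49-52": (298, 61),
    "53-56": (254, 64),
    "57-62": (235, 92),
}
reference = "45-48"

ref_no_chd, ref_chd = age_groups[reference]
ref_odds = ref_chd / ref_no_chd

results = {}
for label, (no_chd, chd) in age_groups.items():
    odds = chd / no_chd
    odds_ratio = odds / ref_odds  # equals 1 for the reference group itself
    results[label] = (odds, math.log(odds), odds_ratio, math.log(odds_ratio))

for label, (odds, log_odds, odds_ratio, log_or) in results.items():
    print(f"{label}  odds={odds:.2f}  log-odds={log_odds:.2f}  "
          f"OR={odds_ratio:.2f}  log(OR)={log_or:.2f}")
```

Changing `reference` to another age group gives the odds ratios relative to that group instead.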
Logistic regression with one categorical variable

The model:

$$
\log\left(\frac{p_i}{1 - p_i}\right) =
\begin{cases}
a & \text{if } i \text{ is 45-48 years} \\
a + b_1 & \text{if } i \text{ is 49-52 years} \\
a + b_2 & \text{if } i \text{ is 53-56 years} \\
a + b_3 & \text{if } i \text{ is 57-62 years}
\end{cases}
=
\begin{cases}
-1.80 & \text{if } i \text{ is 45-48 years} \\
-1.80 + 0.21 & \text{if } i \text{ is 49-52 years} \\
-1.80 + 0.42 & \text{if } i \text{ is 53-56 years} \\
-1.80 + 0.86 & \text{if } i \text{ is 57-62 years}
\end{cases}
$$

What is the OR of CHD comparing 49-52 year olds to 57-62 year olds?

Suggest a test of whether there is an (overall) association between age group and CHD.