Lecture 10: Introduction to Logistic Regression
Ani Manichaikul
amanicha@jhsph.edu
2 May 2007
Logistic Regression
- Regression for a response variable that follows a binomial distribution
- Recall the "binomial model"
- And the binomial distribution
Binomial Model
- n independent trials (e.g., coin tosses)
- p = probability of success on each trial (e.g., p = 1/2 = Pr of heads)
- Y = number of successes out of n trials (e.g., Y = number of heads)
Binomial Distribution
- $P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}$
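A minimal sketch (not from the lecture) of how this binomial probability could be evaluated in Python; scipy is an assumption here, and n = 10, p = 0.5 are arbitrary illustrative values:

```python
# Evaluate P(Y = y) = C(n, y) * p^y * (1 - p)^(n - y) for each y from 0 to n
from scipy.stats import binom

n, p = 10, 0.5
for y in range(n + 1):
    print(f"P(Y = {y:2d}) = {binom.pmf(y, n, p):.4f}")
```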
Why can’t we use regular regression (SLR or MLR)?
Cannot Use Linear Regression
- The response, Y, is NOT normally distributed
- The variability of Y is NOT constant, since the variance, Var(Y) = pq, depends on the expected response, E(Y) = p
- The predicted/fitted values must be such that the corresponding probabilities are between 0 and 1
Example
- Consider a phase I clinical trial in which 35 independent patients are given a new medication for pain relief. Of the 35 patients, 22 report "significant" relief one hour after medication
- Question: How effective is the drug?
Model
- Y = # patients who get relief
- n = 35 patients (trials)
- p = probability of relief for any patient (the truth we seek in the population)
- How effective is the drug? What is p?
- Get the best estimate of p given the data
- Determine the margin of error: a range of plausible values for p
Maximum Likelihood Method
- The method of maximum likelihood estimation chooses values for parameter estimates which make the observed data "maximally likely" under the specified model
Maximum Likelihood
- For the binomial model, we have observed Y = y and $P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}$
- So for this example, $P(Y = 22) = \binom{35}{22} p^{22} (1-p)^{13}$
Maximum Likelihood
- So, estimate p by choosing the value of p which makes the observed data "maximally likely"
- i.e., choose the p that makes the value of Pr(Y = 22) maximal
- The ML estimate is y/n = 22/35 = 0.63, the estimated proportion of patients who will experience relief
Maximum Likelihood
[Figure: likelihood function Pr(Y = 22 of 35) plotted against p = Prob(event); the curve peaks at the MLE, p = 0.63]
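An illustrative sketch (not from the lecture) of the same idea: trace the binomial likelihood L(p) = P(Y = 22 | n = 35, p) over a grid of candidate p values and locate its maximum numerically.

```python
import numpy as np
from scipy.stats import binom

p_grid = np.linspace(0.001, 0.999, 999)     # candidate values of p
likelihood = binom.pmf(22, 35, p_grid)      # L(p) for each candidate
p_mle = p_grid[np.argmax(likelihood)]       # grid value maximizing L(p)

print(f"MLE of p is approximately {p_mle:.3f}")   # close to 22/35 = 0.63
```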
Confidence Interval for p
- Variance of $\hat{p}$: $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n} = \frac{pq}{n}$
- "Standard error" of $\hat{p}$: $\sqrt{\frac{pq}{n}}$
- Estimate of the "standard error" of $\hat{p}$: $\sqrt{\frac{\hat{p}\hat{q}}{n}}$
Confidence Interval for p
- 95% confidence interval for the 'true' proportion, p:
  $\hat{p} \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.63 \pm 1.96\sqrt{\frac{(0.63)(0.37)}{35}}$
  $= 0.63 - 1.96(0.082),\ 0.63 + 1.96(0.082) = (0.47,\ 0.79)$
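A short sketch (not from the lecture) of this interval computed in Python, using the same numbers as the example (22 of 35 patients):

```python
import math

y, n = 22, 35
p_hat = y / n                                  # 0.63
se = math.sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error, about 0.082
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)    # (0.47, 0.79)

print(f"p_hat = {p_hat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```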
Conclusion
- Based upon our clinical trial, in which 22 of 35 patients experience relief, we estimate that 63% of persons who receive the new drug experience relief within 1 hour (95% CI: 47% to 79%)
Conclusion
- Whether 63% (47% to 79%) represents an 'effective' drug will depend on many things, especially on the science of the problem
- Sore throat pain?
- Arthritis pain?
- Accidentally-cut-your-leg-off pain?
Aside: Probabilities and Odds
- The odds of an event are defined as:
  $\mathrm{odds}(Y = 1) = \frac{P(Y = 1)}{P(Y = 0)} = \frac{P(Y = 1)}{1 - P(Y = 1)} = \frac{p}{1 - p}$
Probabilities and Odds
- We can go back and forth between odds and probabilities:
  $\mathrm{odds} = \frac{p}{1 - p}$
  $p = \frac{\mathrm{odds}}{\mathrm{odds} + 1}$
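A minimal sketch (not part of the lecture) of these two conversions as Python helper functions; the function names are illustrative:

```python
def prob_to_odds(p: float) -> float:
    """odds = p / (1 - p)"""
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    """p = odds / (odds + 1)"""
    return odds / (odds + 1)

print(prob_to_odds(0.63))                  # about 1.70
print(odds_to_prob(prob_to_odds(0.63)))    # recovers 0.63
```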
Odds Ratio
- We saw that an odds ratio (OR) can be helpful for comparisons. Recall the Vitamin A trial:
  $OR = \frac{\mathrm{odds}(\mathrm{Death} \mid \mathrm{Vit.\ A})}{\mathrm{odds}(\mathrm{Death} \mid \mathrm{No\ Vit.\ A})}$
Odds Ratio
- The OR here describes the benefits of Vitamin A therapy. We saw for this example that OR = 0.59
- An estimated 40% reduction in mortality
- The OR is a building block for logistic regression
Logistic Regression
- Suppose we want to ask whether the new drug is better than a placebo, and have the following observed data:

  Relief?   Drug   Placebo
  No          13        20
  Yes         22        15
  Total       35        35
Confidence Intervals for p
[Figure: 95% confidence intervals for p in the Drug and Placebo groups, plotted on the probability scale from 0 to 1]
Odds Ratio
$OR = \frac{\mathrm{odds}(\mathrm{Relief} \mid \mathrm{Drug})}{\mathrm{odds}(\mathrm{Relief} \mid \mathrm{Placebo})} = \frac{P(\mathrm{Relief} \mid \mathrm{Drug}) / [1 - P(\mathrm{Relief} \mid \mathrm{Drug})]}{P(\mathrm{Relief} \mid \mathrm{Placebo}) / [1 - P(\mathrm{Relief} \mid \mathrm{Placebo})]} = \frac{0.63/(1 - 0.63)}{0.43/(1 - 0.43)} = 2.26$
Confidence Interval for OR
- The CI uses Woolf's method for the standard error of $\log(\hat{OR})$:
  $se(\log(\hat{OR})) = \sqrt{\frac{1}{22} + \frac{1}{13} + \frac{1}{15} + \frac{1}{20}} = 0.489$
- Find $(L, U) = \log(\hat{OR}) \pm 1.96\, se(\log(\hat{OR}))$
- Then the CI for the OR is $(e^L, e^U)$
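An illustrative sketch (not from the lecture) of the OR and its Woolf-method 95% CI computed from the 2x2 table above:

```python
import math

a, b = 22, 13   # Drug:    relief, no relief
c, d = 15, 20   # Placebo: relief, no relief

or_hat = (a / b) / (c / d)                           # about 2.26
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)         # about 0.489
lo = math.exp(math.log(or_hat) - 1.96 * se_log_or)   # about 0.86
hi = math.exp(math.log(or_hat) + 1.96 * se_log_or)   # about 5.9

print(f"OR = {or_hat:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```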
Interpretation
- OR = 2.26; 95% CI: (0.86, 5.9)
- The drug is an estimated 2 ¼ times better than the placebo
- But could the difference be due to chance alone?
Logistic Regression
- Can we set up a model for this similar to what we've done in ANOVA and regression?
- Idea: model the log odds of the event (in this example, relief) as a function of predictor variables
Model
$\log[\mathrm{odds}(\mathrm{Relief} \mid Tx)] = \log\left[\frac{P(\mathrm{relief} \mid Tx)}{P(\mathrm{no\ relief} \mid Tx)}\right] = \beta_0 + \beta_1 Tx$
where: Tx = 0 if Placebo, 1 if Drug
Then…
- log( odds(Relief|Drug) ) = β0 + β1
- log( odds(Relief|Placebo) ) = β0
- log( odds(R|D) ) - log( odds(R|P) ) = β1
And…
- Thus: $\log\left[\frac{\mathrm{odds}(R \mid D)}{\mathrm{odds}(R \mid P)}\right] = \beta_1$
- And: $OR = \exp(\beta_1) = e^{\beta_1}$!
- So: exp(β1) = the odds ratio of relief for patients taking the Drug vs. patients taking the Placebo
Logistic Regression

Logit estimates                                   Number of obs   =         70
                                                  LR chi2(1)      =       2.83
                                                  Prob > chi2     =     0.0926
Log likelihood = -46.99169                        Pseudo R2       =     0.0292
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .8137752   .4889211     1.66   0.096    -.1444926    1.772043
       _cons |  -.2876821    .341565    -0.84   0.400    -.9571372    .3817731
------------------------------------------------------------------------------

Estimates: log( odds(relief) ) = β̂0 + β̂1·Drug = -0.288 + 0.814(Drug)
Therefore: OR = exp(0.814) = 2.26 !
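The lecture's output is from Stata; a sketch of the same fit in Python with statsmodels (an assumption, not the lecture's tool) should give a drug coefficient close to 0.814, so exponentiating it reproduces the OR of about 2.26:

```python
import numpy as np
import statsmodels.api as sm

# Rebuild the 70 individual observations from the 2x2 table
drug   = np.array([1] * 35 + [0] * 35)                     # 1 = Drug, 0 = Placebo
relief = np.array([1] * 22 + [0] * 13 + [1] * 15 + [0] * 20)

X = sm.add_constant(drug)           # intercept column + drug indicator
fit = sm.Logit(relief, X).fit(disp=0)

print(fit.params)                   # approximately [-0.288, 0.814]
print(np.exp(fit.params[1]))        # approximately 2.26, the odds ratio
```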
It's the same!
- So, why go to all the trouble of setting up a linear model?
- What if there is a biologic reason to expect that the rate of relief (and perhaps drug efficacy) is age dependent?
Adding Other Variables
- What if Pr(relief) is a function of Drug or Placebo AND Age?
- We could easily include age in a model such as:
  log( odds(relief) ) = β0 + β1·Drug + β2·Age
Logistic Regression
- As in MLR, we can include many additional covariates
- For a logistic regression model with p predictors:
  $\log(\mathrm{odds}(Y = 1)) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$
  where: $\mathrm{odds}(Y = 1) = \frac{\Pr(Y = 1)}{1 - \Pr(Y = 1)} = \frac{\Pr(Y = 1)}{\Pr(Y = 0)}$
Logistic Regression
- Thus: $\log\left[\frac{\Pr(Y = 1)}{\Pr(Y = 0)}\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$
- But, why use log(odds)?
Logistic Regression
- Linear regression might estimate anything in (-∞, +∞), not just a proportion in the range of 0 to 1
- Logistic regression is a way to estimate a proportion (between 0 and 1), as well as some related items
Linear Models for Binary Outcomes
- We would like to use something like what we know from linear regression:
  Continuous outcome = β0 + β1X1 + β2X2 + …
- How can we turn a proportion into a continuous outcome?
Transforming a Proportion
- The odds are always positive: $\mathrm{odds} = \frac{p}{1 - p} \in [0, +\infty)$
- The log odds is continuous: $\log \mathrm{odds} = \ln\left(\frac{p}{1 - p}\right) \in (-\infty, +\infty)$
Logit Transformation

  Measure                               Min    Max    Name
  Pr(Y = 1)                              0      1     "probability"
  Pr(Y = 1) / [1 - Pr(Y = 1)]            0      ∞     "odds"
  log{ Pr(Y = 1) / [1 - Pr(Y = 1)] }    -∞      ∞     "log-odds" or "logit"
Logit Function
- Relates log-odds (logit) to p = Pr(Y = 1)
[Figure: the logit function, with log-odds (from -10 to 10) plotted against probability of success (from 0 to 1)]
Key Relationships
- Relating log-odds, probabilities, and parameters in logistic regression
- Suppose the model: $\mathrm{logit}(p) = \beta_0 + \beta_1 X$, i.e. $\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X$
- Take "anti-logs": $\frac{p}{1 - p} = \exp(\beta_0 + \beta_1 X)$
Solve for p
- $p = (1 - p) \cdot \exp(\beta_0 + \beta_1 X)$
- $p = \exp(\beta_0 + \beta_1 X) - p \cdot \exp(\beta_0 + \beta_1 X)$
- $p + p \cdot \exp(\beta_0 + \beta_1 X) = \exp(\beta_0 + \beta_1 X)$
- $p \cdot \{1 + \exp(\beta_0 + \beta_1 X)\} = \exp(\beta_0 + \beta_1 X)$
- $p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}$
What's the Point?
- We can determine the probability of success for a specific set of covariates, X, after running a logistic regression model
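A brief sketch (not from the lecture) of this inverse-logit step in Python; the coefficients are the fitted values from the drug example, b0 = -0.288 and b1 = 0.814:

```python
import math

def predicted_prob(b0: float, b1: float, x: float) -> float:
    """Inverse-logit of b0 + b1*x: p = exp(Xb) / (1 + exp(Xb))."""
    xb = b0 + b1 * x
    return math.exp(xb) / (1 + math.exp(xb))

print(predicted_prob(-0.288, 0.814, 1))   # Drug group:    about 0.63
print(predicted_prob(-0.288, 0.814, 0))   # Placebo group: about 0.43
```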