STAT 213 Logistic Regression II
Colin Reimer Dawson, Oberlin College


  1. STAT 213 Logistic Regression II
     Colin Reimer Dawson, Oberlin College
     28 April 2016

  2. Outline
     • Logistic Regression
     • Fitting the Model
     • Assessment and Testing

  3. Reading Quiz (Multiple Choice)
     Two logistic models have the same β0 but different β1. For each of the following, state whether the statement must be true, might be true, or cannot be true:
     b. The graphs of log(odds) versus X cross the Y-axis at the same value of y.
     d. The graphs of P(Y = 1) versus X cross the line P(Y = 1) = 0.5 at the same value of X.
     e. The graphs of P(Y = 1) versus X cross the line X = 0.5 at the same value of Y.

  4. For Tuesday
     • Write up: 9.14, 9.22, 9.26
     • Read: 10.1-10.2
     • Answer: 9.12, 10.2, 10.4
     • Soon: Project 3 (due on the last day of classes)

  5. Quantitative vs. Categorical Predictor and Response

                                  Response: Quantitative    Response: Categorical
     Predictor: Quantitative      Linear Reg.               Logistic Reg.
     Predictor: Categorical       ANOVA

  6. Binary Logistic Regression
     The response variable (Y) is categorical with two categories (i.e., binary).
     • Code Y as an indicator variable: 0 or 1
     • Assume (for now) a single quantitative predictor, X

  7. Two Equivalent Forms of Logistic Regression

     Probability form:
     \[ \pi = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \]

     Logit form:
     \[ \log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X \]

     π: probability that Y = 1
     π / (1 − π): odds that Y = 1
     log(π / (1 − π)): log odds, or logit, that Y = 1
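
     To see the equivalence numerically, here is a minimal base-R sketch; the coefficient values are invented purely for illustration.

     ## coefficients invented for illustration
     beta0 <- 1
     beta1 <- -0.5
     x <- seq(-4, 4, by = 0.5)

     ## probability form
     pi_hat <- exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))

     ## feeding those probabilities back through the logit recovers the linear form
     all.equal(log(pi_hat / (1 - pi_hat)), beta0 + beta1 * x)  # TRUE (up to rounding)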

  8. Example: Golf Putts

     Distance (ft)     3      4      5      6      7
     # Made           84     88     61     61     44
     # Missed         17     31     47     64     90
     Total           101    119    108    125    134

     1. Estimate the probability of success at each length
     2. Estimate the odds of success at each length
     3. Estimate the log odds of success at each length
     4. Plot each of these against distance (see the sketch below)
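
     A rough base-R sketch of tasks 1-4, using the counts from the table; the plotting choices here are mine, not from the slides.

     Distance <- 3:7
     Made     <- c(84, 88, 61, 61, 44)
     Total    <- c(101, 119, 108, 125, 134)

     prop     <- Made / Total        # 1. estimated probability of success
     odds     <- prop / (1 - prop)   # 2. estimated odds of success
     log_odds <- log(odds)           # 3. estimated log odds of success

     ## 4. plot each against distance
     op <- par(mfrow = c(1, 3))
     plot(Distance, prop,     type = "b", ylab = "Proportion made")
     plot(Distance, odds,     type = "b", ylab = "Odds")
     plot(Distance, log_odds, type = "b", ylab = "Log odds")
     par(op)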

  9. Odds Ratios

     Logit and odds:
     \[ \log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X \]
     \[ \frac{\pi}{1 - \pi} = e^{\beta_0 + \beta_1 X} \]

     • In the model, for each 1-unit increase in X, the logit increases by β1.
     • Equivalently, for each 1-unit increase in X, the odds are multiplied by e^{β1}.
     • In other words, e^{β1} is the odds ratio resulting from a one-unit change in X, and β1 is the log odds ratio.
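
     A quick numerical illustration, with a slope invented just for concreteness: if β1 = 0.7, a 1-unit increase in X adds 0.7 to the log odds, so the odds are multiplied by e^0.7 ≈ 2.01 (roughly doubled); if instead β1 = −0.7, the odds are multiplied by e^−0.7 ≈ 0.50 (roughly halved).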

  10. Odds Ratios

      The odds ratio associated with a binary response Y at two different predictor values, X = x_2 vs. X = x_1, is the ratio of the odds; that is,
      \[ \text{Odds Ratio}(x_2 \text{ vs. } x_1) = \frac{\pi(x_2) / (1 - \pi(x_2))}{\pi(x_1) / (1 - \pi(x_1))} \]

      We can estimate this from a sample using
      \[ \widehat{\text{Odds Ratio}}(x_2 \text{ vs. } x_1) = \frac{\hat\pi(x_2) / (1 - \hat\pi(x_2))}{\hat\pi(x_1) / (1 - \hat\pi(x_1))} \]

  11. Example: Golf Putts

      Distance (ft)     3      4      5      6      7
      # Made           84     88     61     61     44
      # Missed         17     31     47     64     90
      Total           101    119    108    125    134
      π̂              0.832  0.739  0.565  0.488  0.328
      Odds            4.94   2.84   1.30   0.95   0.49
      Log Odds        1.60   1.04   0.26  -0.05  -0.71

      5. Find the sample odds ratio for success for 4 ft vs. 3 ft; 5 vs. 4; 6 vs. 5; 7 vs. 6
      6. Take the log of each of these to get the (additive) change in the logit. These should be the slopes of the lines "connecting the dots" (since ΔX = 1). A sketch follows below.
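
      A quick base-R sketch of tasks 5 and 6, using the counts above:

      Made   <- c(84, 88, 61, 61, 44)
      Missed <- c(17, 31, 47, 64, 90)
      odds   <- Made / Missed

      odds[-1] / head(odds, -1)   # sample ORs: 4 vs 3 ft, 5 vs 4, 6 vs 5, 7 vs 6
      diff(log(odds))             # the corresponding changes in the log odds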

  12. Example: Golf Putts

      Distance (ft)     3      4      5      6      7
      # Made           84     88     61     61     44
      # Missed         17     31     47     64     90
      Odds            4.94   2.84   1.30   0.95   0.49
      Log Odds        1.60   1.04   0.26  -0.05  -0.71

      Comparison      4 vs 3   5 vs 4   6 vs 5   7 vs 6
      OR               0.575    0.457    0.734    0.513
      Δ Log Odds      -0.56    -0.78    -0.31    -0.66

      • In the data, the successive ORs (changes in log odds) are different.
      • The model fits a constant ratio (a constant slope for the log odds).

      7. Draw a single line through your logit plot and get an estimated slope and intercept. These are your β̂0 and β̂1 (see the sketch below).
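
      One way to sketch task 7 is to run a least-squares line through the binned logits; this is only an informal stand-in for "drawing a line by eye", not the maximum-likelihood fit that glm() produces.

      Distance <- 3:7
      log_odds <- log(c(84/17, 88/31, 61/47, 61/64, 44/90))

      coef(lm(log_odds ~ Distance))   # intercept about 3.29, slope about -0.57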

  13. Example: Golf Putts

      library("mosaic")
      Putts <- data.frame(Distance = 3:7,
                          Made  = c(84, 88, 61, 61, 44),
                          Total = c(101, 119, 108, 125, 134))
      Putts <- mutate(Putts, PropMade = Made / Total)
      (model <- glm(PropMade ~ Distance, weights = Total,
                    data = Putts, family = "binomial"))

      Call:  glm(formula = PropMade ~ Distance, family = "binomial",
                 data = Putts, weights = Total)

      Coefficients:
      (Intercept)     Distance
           3.2568      -0.5661

      Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
      Null Deviance:      81.39
      Residual Deviance:  1.069        AIC: 30.18
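
      A small follow-up, assuming the `model` and `Putts` objects created above: the model's fitted success probabilities at each distance sit close to, but smooth over, the raw proportions.

      ## assumes `model` and `Putts` from the code above
      round(fitted(model), 3)      # model-based P(make) at each distance
      round(Putts$PropMade, 3)     # raw sample proportions, for comparison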

  14. Example: Golf Putts (Probabilities)

      xyplot(PropMade ~ Distance, data = Putts)
      f.hat <- makeFun(model)
      plotFun(f.hat(Distance) ~ Distance, add = TRUE)

      [Plot: PropMade versus Distance (3-7 ft), with the fitted logistic probability curve overlaid.]

  15. Example: Golf Putts (Odds)

      f.hat <- makeFun(model, transform = function(p) { p / (1 - p) })
      xyplot(PropMade / (1 - PropMade) ~ Distance, data = Putts)
      plotFun(f.hat(Distance) ~ Distance, add = TRUE)

      [Plot: observed odds, PropMade / (1 - PropMade), versus Distance, with the fitted odds curve overlaid.]

      exp(-0.5661)   ## Odds ratio for a one foot increase in Distance
      [1] 0.5677353

  16. Example: Golf Putts (Log Odds)

      f.hat <- makeFun(model, transform = logit)
      xyplot(logit(PropMade) ~ Distance, data = Putts)
      plotFun(f.hat(Distance) ~ Distance, add = TRUE)

      [Plot: observed logits, logit(PropMade), versus Distance, with the fitted (straight) logit line overlaid.]

      -0.5661   ## Log (odds ratio) / rate of change in log odds / slope of logit

  17. Reconstructing the Odds Ratio

      • The logistic regression output from R gives us β̂0 and β̂1. But unlike in linear regression, these are not very interpretable on their own.
      • We have seen that β1 corresponds to the rate of change in the log odds. It is better to convert it to the odds ratio per unit change in X.
      • What do we do to β1 to get this? (A sketch follows below.)
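
      For the putting model this is a one-liner, assuming the `model` object fit earlier:

      ## assumes `model` from the earlier glm() fit
      exp(coef(model))   # exponentiate: the intercept becomes the baseline odds at
                         # Distance = 0; the Distance entry is the odds ratio per
                         # extra foot (about 0.568)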

  18. Choosing β̂0 and β̂1

      Recall that in linear regression, we choose β̂0 and β̂1 to minimize
      \[ \mathrm{RSS} = \sum_i \left( Y_i - f(X_i) \right)^2 = \sum_i \left( Y_i - \hat\beta_0 - \hat\beta_1 X_i \right)^2 \]

      For a logistic model, we choose β̂0 and β̂1 to maximize the probability of the data under the model:
      \[ \Pr(\text{Data} \mid \text{Model}) = \prod_{i=1}^{n} \hat\pi_i^{\,Y_i} (1 - \hat\pi_i)^{1 - Y_i}
         = \prod_{i=1}^{n} \left( \frac{e^{\hat\beta_0 + \hat\beta_1 X_i}}{1 + e^{\hat\beta_0 + \hat\beta_1 X_i}} \right)^{Y_i}
           \left( \frac{1}{1 + e^{\hat\beta_0 + \hat\beta_1 X_i}} \right)^{1 - Y_i} \]
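
      Here is a small base-R sketch that evaluates this likelihood (on the log scale, for numerical stability) at the fitted coefficients, using the binned putting counts; each made putt contributes log π̂ and each miss contributes log(1 − π̂).

      Distance <- 3:7
      Made     <- c(84, 88, 61, 61, 44)
      Total    <- c(101, 119, 108, 125, 134)

      b0 <- 3.2568; b1 <- -0.5661        # estimates from the glm() output
      p  <- plogis(b0 + b1 * Distance)   # pi-hat at each distance

      sum(Made * log(p) + (Total - Made) * log(1 - p))   # log of Pr(Data | Model)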

  19. Maximum Likelihood

      • Pr(Data | Model) is called the likelihood of the model.
      • In fact, when we assume independent, homoskedastic Normal residuals, minimizing the RSS is equivalent to maximizing the likelihood (the RSS is, up to constants, the negative log likelihood).
      • So we've secretly been doing maximum likelihood this whole time.
      • But whereas MLE for the Normal linear model was a calculus problem, MLE for the logistic model requires an iterative algorithm (a sketch follows below).
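
      A minimal illustration of the iterative idea, using R's general-purpose optimizer rather than the algorithm glm() actually uses (iteratively reweighted least squares):

      Distance <- 3:7
      Made     <- c(84, 88, 61, 61, 44)
      Total    <- c(101, 119, 108, 125, 134)

      ## negative log likelihood as a function of (beta0, beta1)
      nll <- function(beta) {
        p <- plogis(beta[1] + beta[2] * Distance)
        -sum(Made * log(p) + (Total - Made) * log(1 - p))
      }

      optim(c(0, 0), nll)$par   # converges to roughly (3.26, -0.57), matching glm()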

  20. Conditions for Logistic Regression

      1. Linearity (the log odds depends linearly on X)
      2. Independence (no clustering or time/space dependence)
      3. Randomness (the data come from a random sample or random assignment)
      4. Normality no longer applies! (The response is binary, so it can't be Normal.)
      5. Homoskedasticity is no longer required! (In fact, there is more variance when π̂ is near 0.5.)

  21. Checking Linearity

      • We can't just transform the response via the logit to check linearity (the logit of a 0 or 1 response is infinite)...
      • ...unless the data are binned: then we can take the logit of the proportion in each bin.

  22. Binned Data

      xyplot(logit(PropMade) ~ Distance, data = Putts,
             type = c("p", "r"))

      [Plot: logit(PropMade) versus Distance with points and a least-squares reference line.]

      The logits are fairly linear.

  23. Equivalent Model Code for Binned Data

      Putts <- mutate(Putts, Missed = Total - Made)
      (m2 <- glm(cbind(Made, Missed) ~ Distance,
                 data = Putts, family = "binomial"))

      Call:  glm(formula = cbind(Made, Missed) ~ Distance,
                 family = "binomial", data = Putts)

      Coefficients:
      (Intercept)     Distance
           3.2568      -0.5661

      Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
      Null Deviance:      81.39
      Residual Deviance:  1.069        AIC: 30.18

  24. Hypothesis Test for β1

      In linear regression, we computed
      \[ t_{\mathrm{obs}} = \frac{\hat\beta_1 - 0}{\widehat{\mathrm{se}}(\hat\beta_1)} \]
      and found
      \[ P\text{-value} = \Pr\left( |T_{n-2}| \ge |t_{\mathrm{obs}}| \right). \]

      In logistic regression we can use a Normal approximation:
      \[ z_{\mathrm{obs}} = \frac{\hat\beta_1 - 0}{\widehat{\mathrm{se}}(\hat\beta_1)} \]
      and get
      \[ P\text{-value} = \Pr\left( |Z| \ge |z_{\mathrm{obs}}| \right). \]
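
      In R, summary() reports z_obs and the corresponding P-value directly; a short sketch, assuming the `model` object fit earlier, with the by-hand version alongside:

      ## assumes `model` from the earlier glm() fit
      summary(model)$coefficients   # columns: Estimate, Std. Error, z value, Pr(>|z|)

      ## the same test done by hand
      est <- coef(summary(model))["Distance", ]
      z   <- est["Estimate"] / est["Std. Error"]
      2 * pnorm(-abs(z))            # two-sided P-value from the Normal approximation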
