



  1. Lecture 14: Introduction to Poisson Regression. Ani Manichaikul, amanicha@jhsph.edu, 8 May 2007. 1 / 52

  2. Overview: Modelling counts; Contingency tables; Poisson regression models.

  3. Modelling counts I: Why count data? Examples include the number of traffic accidents per day; mortality counts in a given neighborhood, per week; the number of coughs in a concert hall, per minute; and the number of customers arriving at a shop, daily.

  4. Modelling counts II: Counts show up in all kinds of scenarios in our everyday life. In public health, count data are particularly important in measuring disease rates: we often count the number of people acquiring a particular disease in a given year. Modelling counts gives us a framework in which to estimate disease incidence, and Poisson regression will allow us to measure the association between incidence and risk factors of interest, adjusting for other related covariates.

  5. A whole new model to deal with counts? I: So far we have talked about linear regression, for normally distributed errors, and logistic regression, for binomially distributed outcomes. Typically, our count data do not follow either of these distributions: counts are not binary (0/1); counts are discrete, not continuous; and counts typically have a right-skewed distribution.

  6. A whole new model to deal with counts? II: So far, the regression strategies we've discussed allow us to model expected values and expected increases (linear regression), and log odds or log odds ratios (logistic regression). In modelling counts, we are typically more interested in incidence rates and incidence rate ratios (when comparing across levels of a risk factor). Poisson regression will provide us with a framework to handle counts properly!

  7. Poisson Assumptions: Before going on to talk about Poisson regression, let's revisit some key assumptions behind the Poisson distribution. The occurrences of a random event in an interval of time are independent. In theory, any number of occurrences of the event is possible within the interval (though large counts may be rare). In any extremely small portion of the interval, the probability of more than one occurrence of the event is approximately zero.

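  8. Poisson Probability I: The probability of $x$ occurrences of an event in an interval is
      $P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
      where $\lambda$ is the expected number of occurrences in the interval and $e \approx 2.718$ is the base of the natural logarithm. For the Poisson distribution, mean = variance = $\lambda$. The mode is $\lfloor \lambda \rfloor$, the largest integer less than or equal to $\lambda$; when $\lambda$ is itself an integer, $\lambda - 1$ and $\lambda$ are both modes. We can also think of $\lambda$ as the rate parameter.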
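A quick numerical check of the formula and properties above, as a sketch in R (the language used later in this lecture). It writes the pmf out by hand, compares it with R's built-in `dpois()`, and confirms mean = variance = λ and the tied mode for the illustrative choice λ = 5. Truncating the support at x = 100 is an assumption made for the check; it captures essentially all of the probability.

```r
lambda <- 5                 # illustrative rate (an assumption, matching the next plot)
x <- 0:100                  # truncated support, wide enough to hold essentially all the mass

# Poisson pmf written out from the formula, then R's built-in version
pmf_manual <- exp(-lambda) * lambda^x / factorial(x)
pmf_builtin <- dpois(x, lambda = lambda)
max(abs(pmf_manual - pmf_builtin))     # numerically zero

# Mean and variance computed from the pmf: both come out at lambda
mean_x <- sum(x * pmf_builtin)
var_x <- sum((x - mean_x)^2 * pmf_builtin)
c(mean = mean_x, variance = var_x)

# With an integer lambda, x = lambda - 1 and x = lambda tie for the mode
dpois(c(lambda - 1, lambda), lambda = lambda)
```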

  9. Poisson Probability II: [Figure: Poisson probability mass function for λ = 5; x-axis: x, y-axis: Pr(X = x)]

  10. Poisson Probability III: [Figure: Poisson probability mass function for λ = 30; x-axis: x, y-axis: Pr(X = x)]
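The two plotted distributions can be tabulated with `dpois()` over the x ranges shown in the figures. This is a sketch in R; the plotting call is left as a comment, and the skewness formula 1/√λ is a standard Poisson property rather than something stated on the slides.

```r
# Probabilities over the x ranges shown in the two figures
p5 <- dpois(0:20, lambda = 5)
p30 <- dpois(0:70, lambda = 30)

# Peak heights: about 0.175 for lambda = 5 and about 0.073 for lambda = 30
c(peak_lambda5 = max(p5), peak_lambda30 = max(p30))

# Nearly all of the probability falls inside each plotted range
c(sum(p5), sum(p30))

# Skewness of Poisson(lambda) is 1 / sqrt(lambda): the right skew fades as lambda grows
1 / sqrt(c(5, 30))

# To redraw the first figure:
# barplot(p5, names.arg = 0:20, xlab = "x", ylab = "Pr(X=x)", main = "Poisson probability, lambda = 5")
```

The larger rate gives a lower, wider, and more symmetric distribution, which is what the two figures show.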

  11. Poisson and Binomial: The Poisson distribution can be used to approximate a binomial distribution when $n$ is large and $p$ is very small; more precisely, the binomial approaches the Poisson when $np = \lambda$ is held fixed and $n$ becomes infinitely large.
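The limit described above can be checked numerically: hold np = 5 fixed, let n grow, and compare binomial and Poisson probabilities. A sketch in R; the values of n and the range of x are arbitrary illustrative choices.

```r
x <- 0:15
n_values <- c(20, 100, 10000)   # illustrative sample sizes

# For each n, set p = 5 / n so that np = lambda = 5 stays fixed,
# then record the largest gap between the binomial and Poisson pmfs
gaps <- sapply(n_values, function(n) {
  p <- 5 / n
  max(abs(dbinom(x, size = n, prob = p) - dpois(x, lambda = 5)))
})
names(gaps) <- paste0("n = ", n_values)
gaps   # shrinks toward zero as n grows and p shrinks
```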

  12. Example: Cancer in a large population: Consider yearly cases of esophageal cancer in a large city, with 30 cases observed in 1990. Then
      $P(X = 30) = \frac{e^{-\lambda} \lambda^{30}}{30!}$
      where $\lambda$ is the yearly average number of cases of esophageal cancer.
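Read as a function of the unknown rate λ, the expression above can be evaluated for any candidate λ. A sketch in R: it checks the hand-written formula against `dpois()` at λ = 25 (an arbitrary value chosen only for illustration, not from the lecture) and shows that the λ making 30 cases most probable is λ = 30, the observed count.

```r
observed <- 30   # cases observed in 1990

# The slide's formula, written out, and R's built-in Poisson pmf
prob_manual <- function(lambda) exp(-lambda) * lambda^observed / factorial(observed)
prob_builtin <- function(lambda) dpois(observed, lambda = lambda)
c(manual = prob_manual(25), builtin = prob_builtin(25))   # 25 is illustrative only

# Search for the lambda that maximizes P(X = 30)
best <- optimize(prob_builtin, interval = c(1, 100), maximum = TRUE)
best$maximum     # close to 30
```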

  13. Example: Belief in Afterlife I: Men and women were asked whether or not they believed in an afterlife (General Social Survey 1991). Possible responses were yes, no, or unsure.

      |       | Yes | No or Unsure | Total |
      |-------|-----|--------------|-------|
      | M     | 435 | 147          | 582   |
      | F     | 375 | 134          | 509   |
      | Total | 810 | 281          | 1091  |

  14. Example: Belief in Afterlife II: Question: is belief in the afterlife independent of gender? We could address this question using a χ² test. To perform a χ² test, we begin by stating our model (independence), and then we calculate the expected cell count for each entry in the table.

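  15. Example: Belief in Afterlife III: Under an independence model, the probability of belief in the afterlife would be estimated, for both genders, as
      $\hat{p} = P(\text{belief in afterlife}) = \frac{\text{number who believe}}{\text{total number asked}} = \frac{810}{1091} \approx 0.742$
      Then we calculate the expected counts as:
      $E(\text{males answering yes}) = (\#\text{ men asked}) \cdot \hat{p} = 582 \cdot 0.742 \approx 432$
      $E(\text{females answering yes}) = (\#\text{ women asked}) \cdot \hat{p} = 509 \cdot 0.742 \approx 378$
      The expected "no or unsure" counts follow by subtraction from the row totals: $582 - 432 = 150$ for men and $509 - 378 = 131$ for women.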
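The expected counts can be computed for all four cells at once: under independence, each expected count is (row total × column total) / grand total, which for the "yes" column is the same as the row total times p̂. A sketch in R; the object names are illustrative.

```r
# Observed table from the General Social Survey example
obs <- matrix(c(435, 147,
                375, 134),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("M", "F"),
                              response = c("Yes", "No or Unsure")))

p_hat <- sum(obs[, "Yes"]) / sum(obs)   # 810 / 1091, about 0.742

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
round(expected, 1)

# The "Yes" column again, as row total * p_hat
rowSums(obs) * p_hat
```

Unrounded, the expected counts are about 432.1, 149.9, 377.9 and 131.1; the slides round them to whole numbers.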

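  16. Example: Belief in Afterlife IV: The observed and expected (in parentheses) cell counts are as follows:

      |       | Yes       | No or Unsure | Total |
      |-------|-----------|--------------|-------|
      | M     | 435 (432) | 147 (150)    | 582   |
      | F     | 375 (378) | 134 (131)    | 509   |
      | Total | 810       | 281          | 1091  |

      Then, using the expected counts rounded to whole numbers, the χ² statistic is calculated as:
      $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(435 - 432)^2}{432} + \frac{(147 - 150)^2}{150} + \frac{(375 - 378)^2}{378} + \frac{(134 - 131)^2}{131} \approx 0.173$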
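The χ² arithmetic above, as a sketch in R. The slide rounds the expected counts to whole numbers, which gives 0.173; with unrounded expected counts the statistic comes out slightly smaller, at about 0.162. Either value is far below 3.84, the 5% critical value for 1 degree of freedom, so the conclusion does not change.

```r
obs <- matrix(c(435, 147, 375, 134), nrow = 2, byrow = TRUE)

# Expected counts as rounded on the slide
expected_rounded <- matrix(c(432, 150, 378, 131), nrow = 2, byrow = TRUE)

# Unrounded expected counts: row total * column total / grand total
expected_exact <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi_sq <- function(o, e) sum((o - e)^2 / e)
chi_sq_rounded <- chi_sq(obs, expected_rounded)   # about 0.173, as on the slide
chi_sq_exact <- chi_sq(obs, expected_exact)       # about 0.162
c(rounded = chi_sq_rounded, exact = chi_sq_exact)
```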

  17. Example: Belief in Afterlife V: Remember, we are testing the null hypothesis of independence versus the alternative that the proportion believing in an afterlife differs across genders:
      $H_0: p_{\text{male}} = p_{\text{female}} = p_{\text{overall}}$ versus $H_a: p_{\text{male}} \neq p_{\text{female}}$
      We decided to perform the hypothesis test using a χ² test. Here, the appropriate degrees of freedom are (number of rows − 1)(number of columns − 1) = (2 − 1)(2 − 1) = 1.

  18. Example: Belief in Afterlife VI: Let's test the hypothesis at level α = 0.05. Look up the appropriate critical value in R:

      ```r
      > qchisq(1 - 0.05, df = 1)
      [1] 3.841459
      ```

      Our observed χ² statistic of 0.173 is much smaller than the critical value. We can also look up the corresponding p-value for our test:

      ```r
      > pchisq(0.173, df = 1, lower.tail = FALSE)
      [1] 0.6774593
      ```

      Conclusion: fail to reject the null hypothesis of independence.
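R can also run the whole test in one call with `chisq.test()`. For a 2×2 table it applies Yates' continuity correction by default, so `correct = FALSE` is needed to reproduce the uncorrected statistic computed by hand. Because `chisq.test()` uses unrounded expected counts, it reports about 0.162 rather than 0.173, with a p-value of about 0.687 instead of 0.677; the conclusion is the same. A sketch in R:

```r
obs <- matrix(c(435, 147, 375, 134), nrow = 2, byrow = TRUE,
              dimnames = list(c("M", "F"), c("Yes", "No or Unsure")))

# Uncorrected Pearson chi-square test of independence
test <- chisq.test(obs, correct = FALSE)
test$statistic   # X-squared, about 0.162
test$parameter   # degrees of freedom: 1
test$p.value     # about 0.687, well above 0.05

# Critical-value comparison at alpha = 0.05
test$statistic < qchisq(0.95, df = 1)
```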

  19. Example: Belief in Afterlife – Poisson Model I: Just now, we had to calculate the expected counts by hand to perform the χ² test. Instead, we could have written down a model that expresses the expected counts systematically, making use of the Poisson approximation to the binomial. Model:
      $Y_{ij} \sim \text{Poisson}(\lambda_{ij}), \quad \lambda_{ij} = \lambda \cdot \alpha_{\text{male}} \cdot \gamma_{\text{response}}$
      Here we interpret $\lambda_{ij}$ as the Poisson rate for the cell in the $i$th row and $j$th column; $\lambda$ is the baseline rate, $\alpha$ is the male effect, and $\gamma$ is the response effect.

  20. Example: Belief in Afterlife – Poisson Model II: Recall that the log of a product equals the sum of the logs: $\log(a \cdot b) = \log(a) + \log(b)$. So let's log-transform our model. Originally we had
      $\lambda_{ij} = \lambda \cdot \alpha_{\text{male}} \cdot \gamma_{\text{response}}$
      Taking the log of both sides, we have
      $\log(\lambda_{ij}) = \log(\lambda) + \log(\alpha_{\text{male}}) + \log(\gamma_{\text{response}})$
      This is starting to look like a linear model.
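The multiplicative model can be checked against the expected counts from the χ² test. A sketch in R: it takes the unrounded independence expected counts, reads off a baseline (females, no or unsure), a male effect, and a "yes" effect, and confirms that their product rebuilds every cell. The names `lambda_0`, `alpha_male` and `gamma_yes` are illustrative stand-ins for λ, α and γ. The logs of these three pieces, about 4.876, 0.134 and 1.059, match the estimates in the `glm()` output later in the lecture.

```r
tab <- matrix(c(435, 147, 375, 134), nrow = 2, byrow = TRUE,
              dimnames = list(c("M", "F"), c("Y", "NU")))

# Unrounded expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Multiplicative pieces (illustrative names)
lambda_0 <- expected["F", "NU"]                          # baseline: females, no or unsure
alpha_male <- expected["M", "NU"] / expected["F", "NU"]  # male effect, equals 582 / 509
gamma_yes <- expected["F", "Y"] / expected["F", "NU"]    # "yes" effect, equals 810 / 281

# Rebuild each cell as lambda_0 * alpha^I(male) * gamma^I(yes)
rebuilt <- lambda_0 * outer(c(M = alpha_male, F = 1), c(Y = gamma_yes, NU = 1))
all.equal(rebuilt, expected, check.attributes = FALSE)

# On the log scale the product becomes a sum of these three terms
log(c(lambda_0, alpha_male, gamma_yes))
```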

  21. Example: Belief in Afterlife – Poisson Model III: We stated the systematic portion of this model as
      $\log(\lambda_{ij}) = \log(\lambda) + \log(\alpha_{\text{male}}) + \log(\gamma_{\text{response}})$
      which we can also write using $\beta$'s to look like other linear models:
      $\log(\lambda_{ij}) = \beta_0 + \beta_1 \cdot I(\text{male}) + \beta_2 \cdot I(\text{yes})$
      where $I(\cdot)$ is an indicator equal to 1 when its condition holds and 0 otherwise. The probabilistic portion of this model enters as $Y_{ij} \sim \text{Poisson}(\lambda_{ij})$.

  22. Example: Belief in Afterlife – Poisson Model IV: In this Poisson regression model, the outcome is the cell count, and the model describes the log of its expected value. The baseline $\beta_0$ is the log expected cell count for females responding "no" (no or unsure); $\beta_1$ is the increase in log expected cell count for males compared to females; and $\beta_2$ is the increase in log expected cell count for the response "yes" compared to "no".

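  23. Fitting the afterlife model in R I: Data format. We began with a view of our data as a table (expected counts in parentheses):

      |       | Yes       | No or Unsure | Total |
      |-------|-----------|--------------|-------|
      | M     | 435 (432) | 147 (150)    | 582   |
      | F     | 375 (378) | 134 (131)    | 509   |
      | Total | 810       | 281          | 1091  |

      However, R expects to see unique observations listed, together with the relevant covariates (indicators here), as follows:

      |   | count | male | yes |
      |---|-------|------|-----|
      | 1 | 435   | 1    | 1   |
      | 2 | 147   | 1    | 0   |
      | 3 | 375   | 0    | 1   |
      | 4 | 134   | 0    | 0   |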
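Rather than typing the long format by hand, it can be generated from the 2×2 table. A sketch in R using `as.data.frame()` on a table object; the indicator columns `male` and `yes` follow the slide, while the object names `tab` and `afterlife` are illustrative. The rows come out in a different order from the slide, which does not affect the model fit.

```r
tab <- as.table(matrix(c(435, 147, 375, 134), nrow = 2, byrow = TRUE,
                       dimnames = list(gender = c("M", "F"),
                                       response = c("Y", "NU"))))

# One row per cell, with the cell count in a column named "count"
afterlife <- as.data.frame(tab, responseName = "count")

# 0/1 indicators matching the slide's covariates
afterlife$male <- as.integer(afterlife$gender == "M")
afterlife$yes <- as.integer(afterlife$response == "Y")
afterlife
```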

  24. Fitting the afterlife model in R II: Once the data are entered in R, we can analyze them using the glm() function, specifying the family as poisson:

      ```r
      > count <- c(435, 147, 375, 134)   # cell counts, in the row order of the previous slide
      > male <- c(1, 1, 0, 0)            # 1 = male, 0 = female
      > yes <- c(1, 0, 1, 0)             # 1 = "yes", 0 = "no or unsure"
      > summary(out <- glm(count ~ male + yes, family = poisson))
      Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
      (Intercept)  4.87595    0.06787  71.839   <2e-16 ***
      male         0.13402    0.06069   2.208   0.0272 *
      yes          1.05868    0.06923  15.291   <2e-16 ***
      ---
      Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          Null deviance: 272.68544  on 3  degrees of freedom
      Residual deviance:   0.16200  on 1  degrees of freedom
      AIC: 35.407
      ```
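One way to see that this Poisson model reproduces the χ² analysis: its fitted values equal the independence expected counts, and the sum of squared Pearson residuals equals the unrounded χ² statistic. A sketch in R, re-entering the data in the same row order; the row totals, column totals, and 1091 come from the contingency table.

```r
count <- c(435, 147, 375, 134)
male <- c(1, 1, 0, 0)
yes <- c(1, 0, 1, 0)
out <- glm(count ~ male + yes, family = poisson)

# Independence expected counts, cell by cell in the same row order
row_total <- c(582, 582, 509, 509)
col_total <- c(810, 281, 810, 281)
expected <- row_total * col_total / 1091
cbind(fitted = fitted(out), expected = expected)

# Sum of squared Pearson residuals: the chi-square statistic (about 0.162)
sum(residuals(out, type = "pearson")^2)
deviance(out)   # residual deviance, also about 0.162
```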

  25. Fitting the afterlife model in R III: So we fit the model
      $\log(\lambda_{ij}) = \beta_0 + \beta_1 \cdot I(\text{male}) + \beta_2 \cdot I(\text{yes})$
      and our fitted model is
      $\log(\hat{\lambda}_{ij}) = 4.88 + 0.134 \cdot I(\text{male}) + 1.06 \cdot I(\text{yes})$
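Exponentiating the coefficients moves from the log scale back to expected counts and ratios of expected counts. A sketch in R re-fitting the same model: exp(β̂₀) ≈ 131.1 is the expected count for females answering no or unsure, exp(β̂₁) ≈ 1.14 is the ratio of expected counts for males versus females (582/509), and exp(β̂₂) ≈ 2.88 is the ratio for "yes" versus "no or unsure" (810/281).

```r
count <- c(435, 147, 375, 134)
male <- c(1, 1, 0, 0)
yes <- c(1, 0, 1, 0)
out <- glm(count ~ male + yes, family = poisson)

# Back-transform: baseline expected count and multiplicative effects
effects <- exp(coef(out))
effects

# The same ratios taken straight from the table margins
c(male_vs_female = 582 / 509, yes_vs_no = 810 / 281)
```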

  26. Predicting log expected cell counts I: Using the fitted model
      $\log(\hat{\lambda}_{ij}) = \hat{\beta}_0 + \hat{\beta}_1 \cdot I(\text{male}) + \hat{\beta}_2 \cdot I(\text{yes})$
      we can get predicted values for the log counts in each of the four cells:
      For females responding "no": $\log E(\text{count} \mid \text{female, no}) = 4.88 + 0.134 \cdot 0 + 1.06 \cdot 0 = 4.88$
      For males responding "no": $\log E(\text{count} \mid \text{male, no}) = 4.88 + 0.134 \cdot 1 + 1.06 \cdot 0 = 5.01$

  27. Predicting log expected cell counts II: Using the fitted model
      $\log(\hat{\lambda}_{ij}) = \hat{\beta}_0 + \hat{\beta}_1 \cdot I(\text{male}) + \hat{\beta}_2 \cdot I(\text{yes})$
      we can get predicted values for the log counts in each of the four cells:
      For females responding "yes": $\log E(\text{count} \mid \text{female, yes}) = 4.88 + 0.134 \cdot 0 + 1.06 \cdot 1 = 5.94$
      For males responding "yes": $\log E(\text{count} \mid \text{male, yes}) = 4.88 + 0.134 \cdot 1 + 1.06 \cdot 1 = 6.07$
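The four hand calculations on the last two slides can be reproduced with `predict()`. A sketch in R re-fitting the same model; `type = "link"` returns the log expected counts and `type = "response"` returns the expected counts themselves. The data frame `cells` is an illustrative name. Because the slides use rounded coefficients, the last digit can differ slightly (for example, 5.935 rather than 5.94).

```r
count <- c(435, 147, 375, 134)
male <- c(1, 1, 0, 0)
yes <- c(1, 0, 1, 0)
out <- glm(count ~ male + yes, family = poisson)

# The four cells, in the order used on the slides
cells <- data.frame(male = c(0, 1, 0, 1), yes = c(0, 0, 1, 1),
                    row.names = c("female, no", "male, no", "female, yes", "male, yes"))

log_expected <- predict(out, newdata = cells, type = "link")       # about 4.876, 5.010, 5.935, 6.069
expected <- predict(out, newdata = cells, type = "response")       # about 131.1, 149.9, 377.9, 432.1
round(cbind(log_expected, expected), 3)
```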
