
Introducing a zero-modified negative binomial regression for - PowerPoint PPT Presentation



  1. Introducing a zero-modified negative binomial regression for estimating the effect of chilling on Escherichia coli plate counts from Irish beef carcasses
     Dr. Ursula Gonzales Barron and Prof. Francis Butler
     Biosystems Engineering, UCD School of Agriculture, Food Science and Vet. Med., University College Dublin, Ireland

  2. Introduction
     - Traditionally, inferential statistical analysis of bacterial counts is conducted on "log10 cfu" values.
     - The underlying assumption is that the logarithmic transformation induces normality in the data, which conventional ANOVA requires.

  3. Histogram of frequencies for total viable counts (TVC) on beef carcasses pre-chill (n=690)
     - TVC can be approximated by a normal distribution.
     - [Figure: probability density against total viable counts on beef carcasses (log cfu/cm²); the 2.5th and 97.5th percentiles are marked at 0.051 and 4.846.]

  4. However…
     - Depending on the detection frequency of the bacteria, normality of the data cannot always be achieved.
     - When a considerable proportion of samples (>15%) yields no detected bacteria, normality cannot be assumed.

  5. Histogram of frequencies for coliforms on beef carcasses pre-chill (n=690)
     - Can a normal distribution be assumed?
     - [Figure: frequency against coliforms on beef carcasses (log cfu/cm²).]

  6. Histogram of frequencies for Escherichia coli on beef carcasses pre-chill (n=690)
     - The histogram for E. coli has an even more dramatic shape because the detection frequency is lower.
     - [Figure: frequency against log cfu/cm²; the x-axis on the slide is labelled "Coliforms on beef carcasses (log cfu/cm2)".]

  7. How to do inferential stats on this type of data?
     - Can log-normality be assumed in these cases?

  8. Ways to approach it?
     - Transformations to induce normality: the Box-Cox transformation did not work for coliforms or E. coli (too skewed!)
     - Categorisation of the outcome: loss of information
     - Rank statistics: loss of information

  9. Generalised Poisson models?
     - Work with discrete data: log cfu/cm² → CFU.
     - The baseline Poisson model is modified to relax its restrictive equi-dispersion assumption (mean = variance):

       $$f(Y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{Y_i}}{Y_i!}$$
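
The equi-dispersion restriction can be checked numerically. The sketch below evaluates the Poisson mass function from the slide and confirms that its mean and variance coincide; the value of `lam` is a hypothetical mean count chosen for illustration, not a figure from the study.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of observing y colonies when the mean count is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


lam = 4.0  # hypothetical mean plate count (assumption, not from the slides)
support = range(0, 60)  # the tail beyond 60 is negligible for this mean
probs = [poisson_pmf(y, lam) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Plate counts with many zeros and a long right tail have a variance far above the mean, which is why the baseline model needs modifying.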

  10. Histogram of frequencies of plate counts for E. coli on beef carcasses pre-chill (CFU)
      - Certainly does not look like a Poisson.
      - [Figure: frequency against E. coli plate counts; the 2.5th and 97.5th percentiles are marked at 0.0 and 31.0.]

  11. Generalised Poisson model
      - In practice, heterogeneity causes over-dispersion (variance >> mean) → clustering.
      - The Poisson can be generalised with a dispersion term ε that accommodates the unobserved heterogeneity in the count data:

        $$\lambda_{GP} = \lambda_{P}\,\exp(\varepsilon)$$

      - If exp(ε_i) follows a gamma distribution Γ(1/α, α), then the generalised Poisson becomes the negative binomial.
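
A gamma term with shape 1/α and scale α has mean 1 and variance α, so mixing it into the Poisson mean leaves the mean at λ and inflates the variance to λ + αλ². The sketch below checks this on the resulting negative binomial mass function; the values of `lam` and `alpha` are hypothetical.

```python
import math


def nb_pmf(y, lam, alpha):
    """Negative binomial pmf with mean lam and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        + r * math.log(1.0 / (1.0 + alpha * lam))
        + y * math.log(alpha * lam / (1.0 + alpha * lam))
    )
    return math.exp(log_p)


lam, alpha = 5.0, 0.5  # hypothetical values (assumptions, not from the slides)
support = range(0, 2000)  # long enough that the truncated tail is negligible
probs = [nb_pmf(y, lam, alpha) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"total = {total:.6f}, mean = {mean:.4f}, variance = {variance:.4f}")
print(f"lam + alpha * lam**2 = {lam + alpha * lam ** 2:.4f}")
```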

  12. Histogram of frequencies for plate counts of Escherichia coli
      - ε may then account for over-dispersion.
      - [Figure: probability against E. coli plate counts.]

  13. Histogram of frequencies for plate counts of Escherichia coli
      - ε may then account for over-dispersion.
      - And the extra zero counts?
      - [Figure: probability against E. coli plate counts.]

  14. Zero-modified generalised Poisson
      - We can hypothesise that each observation has two possible data generation processes. The result of a Bernoulli trial determines which process is used:
        - Process 1 occurs with probability ω0 and generates only zero counts.
        - Process 2 occurs with probability 1 − ω0 and generates positive counts from a negative binomial.
      - This is the hurdle negative binomial (HNB) regression model.
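
The two-process description can be simulated directly. This is a minimal sketch, assuming process 2 draws from a negative binomial conditioned on being positive (done here by rejecting zeros); the parameter values, seed and helper names are hypothetical and the Poisson sampler is a simple textbook method rather than anything used in the study.

```python
import math
import random


def draw_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_nb(rng, lam, alpha):
    """Negative binomial draw: a Poisson whose mean is scaled by gamma heterogeneity."""
    heterogeneity = rng.gammavariate(1.0 / alpha, alpha)  # shape 1/alpha, scale alpha
    return draw_poisson(rng, lam * heterogeneity)


def draw_hurdle_nb(rng, omega0, lam, alpha):
    """One hurdle draw: a Bernoulli trial picks the process, then a count is generated."""
    if rng.random() < omega0:  # process 1: zero count
        return 0
    while True:  # process 2: positive count, zeros rejected
        y = draw_nb(rng, lam, alpha)
        if y > 0:
            return y


rng = random.Random(0)
omega0, lam, alpha = 0.4, 5.0, 0.5  # hypothetical values (assumptions)
sample = [draw_hurdle_nb(rng, omega0, lam, alpha) for _ in range(20000)]
zero_fraction = sample.count(0) / len(sample)

print(f"fraction of zero counts: {zero_fraction:.3f} (omega0 = {omega0})")
```

Every zero in the sample comes from process 1, so the zero fraction estimates ω0 directly.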

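  15. Hurdle negative binomial model

      $$\Pr(Y_i) = \begin{cases} \omega_0 & \text{for } Y_i = 0 \\[1ex] (1-\omega_0)\,\dfrac{\Gamma(Y_i + 1/\alpha)}{\Gamma(Y_i + 1)\,\Gamma(1/\alpha)}\,\dfrac{(1+\alpha\lambda_i)^{-(Y_i + 1/\alpha)}\,(\alpha\lambda_i)^{Y_i}}{1 - (1+\alpha\lambda_i)^{-1/\alpha}} & \text{for } Y_i > 0 \end{cases}$$

      - Notice the two components.
      - ω0 is the probability of a zero count and is determined by a logistic model.
      - ω0 = f(covariates), λ_i = f(covariates)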
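
The hurdle mass function can be written as a short function to confirm that the two components together form a proper distribution. This is a sketch under stated assumptions: `omega0` and `lam` reuse the group-level pre-chill estimates from the results slide, while `alpha` is a hypothetical dispersion value because the slides do not report one.

```python
import math


def nb_pmf(y, lam, alpha):
    """Negative binomial pmf with mean lam and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        - (y + r) * math.log(1.0 + alpha * lam)
        + y * math.log(alpha * lam)
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y, omega0, lam, alpha):
    """Hurdle NB: omega0 at zero, rescaled zero-truncated NB for positive counts."""
    if y == 0:
        return omega0
    nb_zero = (1.0 + alpha * lam) ** (-1.0 / alpha)  # NB probability of a zero
    return (1.0 - omega0) * nb_pmf(y, lam, alpha) / (1.0 - nb_zero)


omega0, lam = 0.417, 4.965  # group-level pre-chill estimates from the slides
alpha = 2.0                 # hypothetical dispersion (assumption)

total = sum(hurdle_nb_pmf(y, omega0, lam, alpha) for y in range(0, 5000))
print(f"P(Y = 0) = {hurdle_nb_pmf(0, omega0, lam, alpha):.3f}")
print(f"sum over support = {total:.6f}")
```

Dividing by 1 minus the NB zero probability is what makes the positive part sum to 1 − ω0, so the zero probability is governed by the logit component alone.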

  16. Methodology
      - Two HNB regression models were fitted to E. coli plate counts from beef carcasses pre-chill and post-chill.
      - Group level: Y = CFU pre-chill and post-chill; X = coded variable, pre-chill (1) or post-chill (2).
      - Carcass level: Y = CFU post-chill; X = CFU pre-chill.
      - Both models use the same pair of equations:

        $$\lambda_i = \exp(\beta_0 + \beta_1 X_i)\,\exp(\varepsilon_i)$$

        $$\log\!\left(\frac{\omega_0}{1-\omega_0}\right) = b_0 + b_1 X_i$$

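  17. Results: group-level regression model
      - Significant logit and NB.
      - Probability of a zero count: chilling increases the odds.
      - For positive (+) counts: pre-chill < post-chill (??)

      | Component | Parameter | Estimate | St. error | Pr > \|t\| |
      |---|---|---|---|---|
      | Neg Bin | β0 (int) | 1.151 | 0.381 | ** |
      | Neg Bin | β1 (covariate) | 0.451 | 0.160 | ** |
      | Logit | b0 (int) | -2.131 | 0.131 | *** |
      | Logit | b1 (covariate) | 1.797 | 0.089 | *** |
      | Other estimates | OR (int) | 0.118 | 0.015 | *** |
      | Other estimates | OR (treat) | 6.034 | 0.542 | *** |
      | Other estimates | λ1 (pre-chill) | 4.965 | 1.606 | ** |
      | Other estimates | λ2 (post-chill) | 7.795 | 2.648 | ** |
      | Other estimates | ω0 (pre-chill) | 0.417 | 0.013 | *** |
      | Other estimates | ω0 (post-chill) | 0.812 | 0.011 | *** |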
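
The "other estimates" follow from the regression coefficients through the log and logit links. The sketch below recomputes them from the group-level coefficients on the slide, with X = 1 for pre-chill and X = 2 for post-chill; the function names are illustrative, and small differences from the slide values are expected because the coefficients are rounded to three decimals.

```python
import math

beta0, beta1 = 1.151, 0.451  # negative binomial component (from the slide)
b0, b1 = -2.131, 1.797       # logit component (from the slide)


def nb_mean(x):
    """Mean of the negative binomial component: exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


def zero_probability(x):
    """Inverse logit of b0 + b1 * x, the probability of a zero count."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


odds_ratio_int = math.exp(b0)
odds_ratio_treat = math.exp(b1)

for label, x in (("pre-chill", 1), ("post-chill", 2)):
    print(f"{label}: lambda = {nb_mean(x):.3f}, omega0 = {zero_probability(x):.3f}")
print(f"OR (int) = {odds_ratio_int:.3f}, OR (treat) = {odds_ratio_treat:.3f}")
```

The odds ratio for treatment is exp(b1), which is how the slide arrives at roughly sixfold higher odds of a zero count after chilling.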

  18. Group-level and carcass-level regression models
      - Pre-chill counts can predict ω0 in the post-chill group (actual ω0 = 0.81).
      - The probability of a zero count post-chill decreases for a 1 colony increase in the pre-chill (+) count.
      - Carcass level: non-significant NB and significant logit.

      | Component | Parameter | Group-level estimate | St. error | Pr > \|t\| | Carcass-level estimate | St. error | Pr > \|t\| |
      |---|---|---|---|---|---|---|---|
      | Neg Bin | β0 (int) | 1.151 | 0.381 | ** | 2.042 | 0.701 | ** |
      | Neg Bin | β1 (covariate) | 0.451 | 0.160 | ** | -0.004 | 0.004 | ns |
      | Logit | b0 (int) | -2.131 | 0.131 | *** | 1.564 | 0.078 | *** |
      | Logit | b1 (covariate) | 1.797 | 0.089 | *** | -0.006 | 0.002 | ** |
      | Other estimates | OR (int) | 0.118 | 0.015 | *** | 4.777 | 0.373 | *** |
      | Other estimates | OR (treat) | 6.034 | 0.542 | *** | 0.993 | 0.002 | *** |
      | Other estimates | λ1 (pre-chill) | 4.965 | 1.606 | ** | - | - | - |
      | Other estimates | λ2 (post-chill) | 7.795 | 2.648 | ** | 6.916 | 4.854 | ns |
      | Other estimates | ω0 (pre-chill) | 0.417 | 0.013 | *** | - | - | - |
      | Other estimates | ω0 (post-chill) | 0.812 | 0.011 | *** | 0.798 | 0.013 | *** |

  19. Escherichia coli plate counts as modelled by the group-level HNB regression
      - Pre-chill: E(Y) = 10.85 CFU
      - [Figure: probability against Escherichia coli plate counts from beef carcasses (CFU).]

  20. Escherichia coli plate counts as modelled by the group-level HNB regression
      - Pre-chill: E(Y) = 10.85 CFU
      - Post-chill: E(Y) = 5.05 CFU
      - As the treatment covariate was a coded variable, the probabilities take the shape of a PMF.
      - [Figure: probability against Escherichia coli plate counts from beef carcasses (CFU).]
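
For a hurdle negative binomial the expected count is E(Y) = (1 − ω0) λ / (1 − (1 + αλ)^(−1/α)), which needs the dispersion α. The slides do not report α, so the sketch below is only a consistency check under that gap: it backs out the α that would reproduce the pre-chill E(Y) from the reported λ and ω0, then uses the same α to predict the post-chill E(Y). The solver and its bounds are illustrative choices, and the resulting α is not an estimate from the authors.

```python
import math


def hurdle_nb_mean(omega0, lam, alpha):
    """Expected count of a hurdle NB: positive-part mean weighted by 1 - omega0."""
    nb_zero = (1.0 + alpha * lam) ** (-1.0 / alpha)  # NB probability of a zero
    return (1.0 - omega0) * lam / (1.0 - nb_zero)


def solve_alpha(target_mean, omega0, lam, lo=1e-6, hi=1e3):
    """Bisection on alpha; the hurdle mean increases monotonically with alpha."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if hurdle_nb_mean(omega0, lam, mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Reported group-level values: pre-chill E(Y), omega0 and lambda.
alpha = solve_alpha(10.85, 0.417, 4.965)
# Same alpha applied to the post-chill omega0 and lambda.
post_mean = hurdle_nb_mean(0.812, 7.795, alpha)

print(f"implied alpha = {alpha:.2f}")
print(f"implied post-chill E(Y) = {post_mean:.2f} CFU")
```

With the post-chill λ higher than the pre-chill λ, the lower post-chill mean can only come from the factor 1 − ω0, that is, from the larger share of zero counts.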

  21. Conclusions
      - In this application, the proportion of zero counts post-chill could be predicted from the positive counts pre-chill.
      - It was the larger number of zero counts post-chill (significant logit), and not a potentially lower positive count (non-significant negative binomial), that explained the decrease in E(Y) from 10.85 CFU pre-chill to 5.05 CFU post-chill.
      - This zero-modified heterogeneous Poisson model proved very flexible and is promising for inferential statistics on plate counts of infrequently recovered microorganisms.

  22. Additional notes

  23. Significance of the generalised Poisson models: where to go from here?
      - Distribution fitting → stochastic modelling
      - Another variant: ZINB could separate true zero counts (absence of microorganisms) from "false" zeros (microorganisms present at low concentration but not detected because of dilutions)
      - Effect on sampling criteria performance
      - Exposure assessment modelling
      - Mixed models

  24. Distribution fitting
      - [Figure: probability against coliform plate counts from pre-chill Irish beef carcasses; series: Observed data.]

  25. Distribution fitting
      - [Figure: probability mass function against coliform plate counts from pre-chill Irish beef carcasses; series: Observed data, Neg Bin.]

  26. Distribution fitting
      - [Figure: probability mass function against coliform plate counts from pre-chill Irish beef carcasses; series: Observed data, Neg Bin, ZINB.]

  27. Distribution fitting
      - [Figure: probability mass function against coliform plate counts from pre-chill Irish beef carcasses; series: Observed data, Neg Bin, ZINB, Hurdle NB.]

  28. Exposure assessment simulation data
      - [Figure: cumulative probability against exposure to Salmonella Typhimurium (CFU) per serving of a cooked meat product; series: Grilled, Fried.]

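  29. Zero-inflated negative binomial model
      - Real absence of bacteria? Prevalence = (1 − p0).
      - A zero count from the NB component (λ > 0, with Y_i taking 0 through the random process) is not a real absence.

        $$\Pr(Y_i) = \begin{cases} p_0 + (1-p_0)\,(1+\alpha\lambda_i)^{-1/\alpha} & \text{for } Y_i = 0 \\[1ex] (1-p_0)\,\dfrac{\Gamma(Y_i + 1/\alpha)}{\Gamma(Y_i + 1)\,\Gamma(1/\alpha)}\,(1+\alpha\lambda_i)^{-(Y_i + 1/\alpha)}\,(\alpha\lambda_i)^{Y_i} & \text{for } Y_i > 0 \end{cases}$$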
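
Unlike the hurdle model, the zero-inflated form lets zeros arise from both sources. The sketch below splits the zero probability into a structural part (real absence, probability p0) and a sampling part (bacteria present but a zero count drawn from the NB component); all parameter values are hypothetical.

```python
import math


def nb_pmf(y, lam, alpha):
    """Negative binomial pmf with mean lam and dispersion alpha."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        - (y + r) * math.log(1.0 + alpha * lam)
        + y * math.log(alpha * lam)
    )
    return math.exp(log_p)


def zinb_pmf(y, p0, lam, alpha):
    """Zero-inflated NB: structural zeros with probability p0, NB counts otherwise."""
    base = (1.0 - p0) * nb_pmf(y, lam, alpha)
    return p0 + base if y == 0 else base


p0, lam, alpha = 0.3, 5.0, 2.0  # hypothetical values (assumptions)

structural_zero = p0
sampling_zero = (1.0 - p0) * nb_pmf(0, lam, alpha)
total = sum(zinb_pmf(y, p0, lam, alpha) for y in range(0, 5000))

print(f"structural zeros: {structural_zero:.3f}")
print(f"sampling zeros:   {sampling_zero:.3f}")
print(f"sum over support: {total:.6f}")
```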

  30. Acknowledgments
      - Irish Department of Agriculture, Fisheries and Food
