Size Estimation - Statistical Models for Underreporting Gerhard Neubauer, Gordana Djuraš & Herwig Friedl JOANNEUM RESEARCH and Technical University, Graz useR! 2009, Rennes 8.-10. July 2009 – p. 1
1 Introduction useR! 2009, Rennes 8.-10. July 2009 – p. 2
Underreporting Any sample of count data may be incomplete ■ criminology: crimes with an aspect of shame (sexuality, domestic violence) or theft of low values goods ■ public health: infectious (HIV) or chronic (diabetes) disease data ■ production: error counts in a production process ■ traffic accidents with minor damage Estimation of total number of cases useR! 2009, Rennes 8.-10. July 2009 – p. 3
Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) useR! 2009, Rennes 8.-10. July 2009 – p. 4
Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) iid sample of size λ ⇒ Y = � i R i ∼ Binomial ( λ, π ) var( Y ) = σ 2 = λπ (1 − π ) E( Y ) = µ = λπ, useR! 2009, Rennes 8.-10. July 2009 – p. 4
Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) iid sample of size λ ⇒ Y = � i R i ∼ Binomial ( λ, π ) var( Y ) = σ 2 = λπ (1 − π ) E( Y ) = µ = λπ, Y the number of reported events π the reporting probability the total number of events - size parameter λ U = λ − Y the number of unreported events useR! 2009, Rennes 8.-10. July 2009 – p. 4
Binomial Model Event reported Yes No R 1 − R R ∼ Bernoulli ( π ) iid sample of size λ ⇒ Y = � i R i ∼ Binomial ( λ, π ) var( Y ) = σ 2 = λπ (1 − π ) E( Y ) = µ = λπ, Both λ and π have to be estimated No longer member of Exponential Family useR! 2009, Rennes 8.-10. July 2009 – p. 5
Estimation For T iid samples Y t Method of Moments For the binomial we have var( Y ) = µ − µ 2 /λ ≤ µ Limitation to data with s 2 < ¯ y For s 2 > ¯ y 1. Regression approach using ML ind Y t ∼ Binomial ( λ t , π ) with λ t = f ( x t , β ) Neubauer & Friedl (2006) 2. Mixed model approaches useR! 2009, Rennes 8.-10. July 2009 – p. 6
2 Alternative Models useR! 2009, Rennes 8.-10. July 2009 – p. 7
Beta-Binomial Random reporting probability P Y t | P ∼ Binomial ( λ, p ) P ∼ Beta ( γ, δ ) Y t ∼ Beta-Binomial ( λ, γ, δ ) Mean-variance relation µ − µ 2 � � var( Y t ) = φ λ φ = λ + γ + δ 1 + γ + δ ≥ 1 useR! 2009, Rennes 8.-10. July 2009 – p. 8
Poisson Random total number of cases L Y t | L ∼ Binomial ( l, π ) L ∼ Poisson ( λ ) Y t ∼ Poisson ( λπ ) Parameters not identified useR! 2009, Rennes 8.-10. July 2009 – p. 9
Negative Binomial Additional randomness in E( L ) L | K ∼ Poisson ( kλ ) Y t | K ∼ Poisson ( kλπ ) K ∼ Gamma ( ω, ω ) Y t ∼ Negative Binomial ( ω, 1 − π ) ω the number of unreported cases π the reporting probability Mean-variance relation var( Y t ) = µ + µ 2 ω ≥ µ useR! 2009, Rennes 8.-10. July 2009 – p. 10
Beta-Poisson Consider both binomial parameters as random Y | L, P ∼ Binomial ( L, P ) L ∼ Poisson ( λ ) P ∼ Beta ( γ, δ ) Y ∼ Beta-Poisson ( λ, γ, δ ) γ E ( Y ) = λπ = µ where π = γ + δ φ = 1 + λ (1 − π ) var( Y ) = µφ with 1 + γ + δ ≥ 1 useR! 2009, Rennes 8.-10. July 2009 – p. 11
Generalized Poisson Distribution Consul(1989) Moments θ θ E( Y ) = var( Y ) = (1 − τ ) (1 − τ ) 3 ■ τ = 0 : E( Y ) = var( Y ) ⇒ Poisson ( θ ) ■ 0 < τ < 1 : E( Y ) < var( Y ) ⇒ Neg. Binomial ■ τ < 0 : E( Y ) > var( Y ) ⇒ Binomial useR! 2009, Rennes 8.-10. July 2009 – p. 12
Conditional Poisson Models Motivation: π → 1 leads to Y → λ in the binomial approach Assume Y | L ∼ Poisson ( L ) Choose p ( L ) such that E( Y ) = λπ and var( Y ) = λπφ For example: L ∼ Binomial ( λ, π ) 1 < φ = 2 − π < 2 2 < φ = 2 − π L ∼ Negative Binomial ( λ, π ) 1 − π < ∞ useR! 2009, Rennes 8.-10. July 2009 – p. 13
Regression Modelling For all models - except the GP - we use exp( α ) λ t,β = exp( x ′ t β ) and π α = 1 + exp( α ) x t , d -vector of known regressors β , d -vector of unknown parameters For the GP model we use θ t,β = exp( x ′ t β ) τ α = 1 − exp( − α ) and Test α = 0 ⇒ Identify Poisson misdispersion useR! 2009, Rennes 8.-10. July 2009 – p. 14
3 Implementation useR! 2009, Rennes 8.-10. July 2009 – p. 15
R package sizEst Full ML estimation of all models: done Conditional Poisson in work Testing competing models: in development Testing parameters within models: done Model diagnostics: done Main functions: arrayEst, sizEst Implemented methods: sizEst: plot, predict, residuals, summary summary.arrayEst useR! 2009, Rennes 8.-10. July 2009 – p. 16
4 Real Data Application useR! 2009, Rennes 8.-10. July 2009 – p. 17
Stroke Data Hypothesis: Slight strokes are not seen in hospitals Data: Hospital discharges Output from function arrayEst() iterations loglik chi.sq gradient p1 GP 0.5 3 -405.256 94.602 0.000 0.4594 NegBin 0.5 3 -405.195 94.411 0.000 0.4604 BetaBin 0.5 19 -402.135 80.779 0.000 0.8230 BetaPois 0.5 30 -408.170 80.811 0.000 0.9097 useR! 2009, Rennes 8.-10. July 2009 – p. 18
Stroke Data Output from function sizEst() Distribution: BetaPois Formula: y ˜ beta01 + T.cos1 + T.sin1 - 1 Estimate Std. Error t value Pr(>|t|) alpha1 2.309 0.289 8.002 0 beta01 5.071 0.025 204.435 0 T.cos1 0.060 0.016 3.673 0 T.sin1 -0.083 0.016 -5.249 0 Theta 11.231 --- --- - Performance measures: measures loglik -408.170 chi.sq 80.811 df.residual 92.000 aic 824.341 bic 834.599 Reporting Probabilities: lower estimate upper alpha1 0.8622 0.9097 0.9571 useR! 2009, Rennes 8.-10. July 2009 – p. 19
Stroke Data Distribution=BetaPois, p0=0.3 Distribution=BetaPois, p0=0.3 200 150 150 Frequency Frequency 100 100 50 50 0 0 0 20 40 60 80 0 20 40 60 80 Mean function from Quasi−Poisson−Model (blue) and BetaPois model (red) Lambda function from BetaPois model (red) Pearson residuals Pearson residuals 1.0 0.4 0.8 0.6 0.3 Density 0.4 ACF 0.2 0.2 0.1 −0.2 0.0 −2 −1 0 1 2 0 5 10 15 Histogram of residuals and N(0,1) density Lag π = 0 . 91 ˆ useR! 2009, Rennes 8.-10. July 2009 – p. 20
Crime Register Data SIMO : Austrian online crime register ■ Time range: 2004 - 2007 ■ weekly counts ■ 132 regions ■ different crime categories In most cases Poisson overdispersion useR! 2009, Rennes 8. -10. July 2009 – p. 21
GP model estimates Shop Lifting Bicycle Theft 120 100 100 80 80 Frequency Frequency 60 60 40 40 20 20 0 0 0 50 100 150 200 0 50 100 150 200 Week Week π = 0 . 69 ˆ π = 0 . 67 ˆ useR! 2009, Rennes 8. -10. July 2009 – p. 22
Beta-Poisson model estimates Shop Lifting Bicycle Theft 120 150 100 80 Frequency Frequency 100 60 40 50 20 0 0 0 50 100 150 200 0 50 100 150 200 Week Week π = 0 . 65 ˆ π = 0 . 52 ˆ useR! 2009, Rennes 8. -10. July 2009 – p. 23
Summary ■ Great variety of models ■ MLE based implementation in R ■ Good performance for simulated data ■ Reasonable estimates for real data useR! 2009, Rennes 8. -10. July 2009 – p. 24
Summary ■ Great variety of models ■ MLE based implementation in R ■ Good performance for simulated data ■ Reasonable estimates for real data Future work ■ Implement Conditional Poisson models ■ Non-nested Testing for more than two models useR! 2009, Rennes 8. -10. July 2009 – p. 24
Recommend
More recommend