Jul 10, 2009 - Conditional Poisson in work. Testing competing models: in development. Testing parameters within models: done. Model diagnostics: done.
Size Estimation - Statistical Models for Underreporting Gerhard Neubauer, Gordana Djuraš & Herwig Friedl JOANNEUM RESEARCH and Technical University, Graz
useR! 2009, Rennes
8.-10. July 2009 – p. 1
1 Introduction
useR! 2009, Rennes
8.-10. July 2009 – p. 2
Underreporting Any sample of count data may be incomplete ■
■
■ ■
criminology: crimes with an aspect of shame (sexuality, domestic violence) or theft of low values goods public health: infectious (HIV) or chronic (diabetes) disease data production: error counts in a production process traffic accidents with minor damage
Estimation of total number of cases
useR! 2009, Rennes
8.-10. July 2009 – p. 3
Binomial Model Event reported Yes
No
R
1−R
R ∼ Bernoulli(π)
useR! 2009, Rennes
8.-10. July 2009 – p. 4
Binomial Model Event reported Yes
No
R
1−R
R ∼ Bernoulli(π) iid sample of size λ ⇒ Y = E(Y ) = µ = λπ,
useR! 2009, Rennes
P
i
Ri ∼ Binomial(λ, π)
var(Y ) = σ 2 = λπ(1 − π)
8.-10. July 2009 – p. 4
Binomial Model Event reported Yes
No
R
1−R
R ∼ Bernoulli(π) iid sample of size λ ⇒ Y = E(Y ) = µ = λπ, Y π λ U =λ−Y useR! 2009, Rennes
P
i
Ri ∼ Binomial(λ, π)
var(Y ) = σ 2 = λπ(1 − π)
the number of reported events the reporting probability the total number of events - size parameter the number of unreported events 8.-10. July 2009 – p. 4
Binomial Model Event reported Yes
No
R
1−R
R ∼ Bernoulli(π) iid sample of size λ ⇒ Y = E(Y ) = µ = λπ,
P
i
Ri ∼ Binomial(λ, π)
var(Y ) = σ 2 = λπ(1 − π)
Both λ and π have to be estimated No longer member of Exponential Family useR! 2009, Rennes
8.-10. July 2009 – p. 5
Estimation For T iid samples Yt
Method of Moments
For the binomial we have var(Y ) = µ − µ2 /λ ≤ µ Limitation to data with s2 < y¯ For s2 > y¯ 1. Regression approach using ML ind
Yt ∼ Binomial(λt , π) with λt = f (xt , β) Neubauer & Friedl (2006) 2. Mixed model approaches
useR! 2009, Rennes
8.-10. July 2009 – p. 6
2 Alternative Models
useR! 2009, Rennes
8.-10. July 2009 – p. 7
Beta-Binomial Random reporting probability P Yt |P ∼ Binomial(λ, p) P ∼ Beta(γ, δ) Yt ∼ Beta-Binomial(λ, γ, δ) Mean-variance relation
2
µ φ var(Yt ) = µ − λ λ+γ+δ φ= ≥1 1+γ+δ
useR! 2009, Rennes
8.-10. July 2009 – p. 8
Poisson Random total number of cases L Yt |L ∼ Binomial(l, π) L ∼ Poisson(λ) Yt ∼ Poisson(λπ) Parameters not identified
useR! 2009, Rennes
8.-10. July 2009 – p. 9
Negative Binomial Additional randomness in E(L) L|K Yt |K K Yt
∼ ∼ ∼ ∼
Poisson(kλ) Poisson(kλπ) Gamma(ω, ω) Negative Binomial(ω, 1 − π)
ω the number of unreported cases π the reporting probability Mean-variance relation µ2 ≥µ var(Yt ) = µ + ω useR! 2009, Rennes
8.-10. July 2009 – p. 10
Beta-Poisson Consider both binomial parameters as random Y |L, P ∼ Binomial(L, P ) L ∼ Poisson(λ) P ∼ Beta(γ, δ) Y
∼ Beta-Poisson(λ, γ, δ)
E(Y ) = λπ = µ var(Y ) = µφ with
useR! 2009, Rennes
where
γ π= γ+δ
λ(1 − π) φ=1+ ≥1 1+γ+δ
8.-10. July 2009 – p. 11
Generalized Poisson Distribution Consul(1989) Moments θ E(Y ) = (1 − τ )
useR! 2009, Rennes
θ var(Y ) = (1 − τ )3
■
τ = 0:
E(Y ) = var(Y ) ⇒ Poisson (θ)
■
0 < τ < 1: E(Y ) < var(Y ) ⇒ Neg. Binomial
■
τ < 0:
E(Y ) > var(Y ) ⇒ Binomial
8.-10. July 2009 – p. 12
Conditional Poisson Models Motivation: π → 1 leads to Y → λ in the binomial approach Assume Y |L ∼ P oisson(L) Choose p(L) such that E(Y ) = λπ
and
var(Y ) = λπφ
For example: L ∼ Binomial(λ, π) L ∼ N egative Binomial(λ, π)
useR! 2009, Rennes
1