Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science 86 (2016) 240 – 243
2016 International Electrical Engineering Congress, iEECON2016, 2-4 March 2016, Chiang Mai, Thailand
EM Algorithm for Truncated and Censored Poisson Likelihoods
Chukiat Viwatwongkasem *
Department of Biostatistics, Faculty of Public Health, Mahidol University, Bangkok 10400, Thailand.
Abstract
The aim of this study is to find the maximum likelihood estimate (MLE) for frequency count data by using the expectation-maximization (EM) algorithm, which is useful for imputing missing or hidden values. Two forms of missing count data, zero truncation and right censoring, are illustrated by estimating the size of drug-using populations. The results show that the truncated and censored Poisson likelihood yields good estimates under the EM algorithm, which converges in a numerically stable way with a monotonically increasing likelihood. Because the algorithm may converge to a local maximum, reaching the global maximum of the likelihood depends on the initial value.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the Organizing Committee of iEECON2016.
Keywords: EM algorithm; Truncated and Censored Count; Imputation; Population Size Estimation
1. Introduction
The expectation-maximization (EM) algorithm is an efficient iterative procedure for computing the maximum likelihood estimate (MLE); it can be applied not only when data are actually missing or hidden, but also in a wide variety of complete-data situations. Dempster, Laird, and Rubin (1977) [1] published the original formulation in the Journal of the Royal Statistical Society. The motivation for this study is that the EM algorithm is a useful method for handling incomplete data, whereas other algorithms such as Newton-Raphson and Fisher scoring do not impute the unobserved (missing) data. Two applications to missing count data, zero truncation and right censoring, are illustrated in estimating the size of drug-using populations.
2. Methods
* Corresponding author. Tel.: +66-2-354-8530; fax: +66-2-354-8534. E-mail address:
[email protected]
1877-0509 © 2016 The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2016.05.109
Chukiat Viwatwongkasem / Procedia Computer Science 86 (2016) 240 – 243
Let $y = (y_1, \dots, y_n)'$ be an incompletely observed data vector of size $n$ from the population function $f(y; \theta)$, where $\theta = (\theta_1, \dots, \theta_p)'$ is a vector of $p$ unknown parameters. When some parts of the data are missing, with unknown vector $z$, it is necessary to "fill in" the missing data $z$, leading to the complete data vector $x = (y', z')'$. The MLE via the EM algorithm is then determined under the complete-data log-likelihood $l(\theta; x)$ after imputation. In the E-step (expectation step), we calculate the expected value of the complete log-likelihood $l(\theta; x)$ with respect to the conditional distribution of $z$ given the observed data vector $y$ and the current estimate $\theta^{(k-1)}$ of the parameter vector at the $(k-1)$th iteration:

$$E\left[\, l(\theta; x) \mid y, \theta^{(k-1)} \right] \equiv Q(\theta \mid \theta^{(k-1)}).$$

In the M-step (maximization step), the EM algorithm maximizes $Q(\theta \mid \theta^{(k-1)})$ with respect to $\theta$ to give an updated value $\theta^{(k)}$, and the two steps are repeated until convergence within an acceptable error. Accordingly, the MLE $\hat{\theta}$ is obtained by choosing $\theta^{(k)}$ to be any value of $\theta \in \Theta$ that maximizes $Q(\theta \mid \theta^{(k-1)})$:

$$\theta^{(k)} = \arg\max_{\theta} Q(\theta \mid \theta^{(k-1)}).$$
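The alternation of the two steps can be sketched as a small driver loop. This is only an illustrative skeleton for a scalar parameter, assuming (as in both examples of this paper) that the complete-data log-likelihood is linear in the missing data, so that the E-step reduces to imputing expected values. The names `run_em`, `e_step`, `m_step`, `tol`, and `max_iter` are hypothetical and do not come from the paper.

```python
from typing import Callable, Tuple


def run_em(
    e_step: Callable[[float], float],
    m_step: Callable[[float], float],
    theta0: float,
    tol: float = 1e-10,        # assumed stopping tolerance, not from the paper
    max_iter: int = 10_000,    # assumed iteration cap, not from the paper
) -> Tuple[float, int]:
    """Alternate E- and M-steps until successive estimates differ by less than tol.

    e_step(theta) returns the imputed missing data given the current estimate;
    m_step(imputed) returns the maximizer of Q given that imputation.
    Returns the final estimate and the number of iterations used.
    """
    theta = theta0
    for k in range(1, max_iter + 1):
        imputed = e_step(theta)        # E-step: expected missing data under theta^(k-1)
        theta_new = m_step(imputed)    # M-step: theta^(k) = argmax Q(theta | theta^(k-1))
        if abs(theta_new - theta) < tol:
            return theta_new, k
        theta = theta_new
    raise RuntimeError("EM did not converge within max_iter iterations")
```

Stopping on the change in the parameter is one common choice; stopping on the change in the log-likelihood would serve equally well here.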
3. Motivational applications
Viwatwongkasem et al. (2013) [2] projected the number of drug users in Thailand for 2005-2007 from surveillance data on drug addicts undergoing treatment at over 1,140 health treatment centers in the country, where $i$ is the count of treatment episodes for $i = 1, 2, \dots, m$, $n_i$ is the number of persons with $i$ treatment episodes, and the sample size is $n = n_1 + n_2 + \dots + n_m$. Example 1 is a typical form of zero truncation, while Example 2 also includes right censoring.

Example 1. Observed counts of treatment episodes on marijuana users in Thailand 2006

Treatment episodes in a case (i)       0    1      2      3    4    5
Number of cases (frequencies n_i)      -    5,445  1,025  158  21   1

Example 2. Observed counts of treatment episodes on heroin users in Thailand 2005

Treatment episodes in a case (i)       0    1      2    3    4    5    6    7+
Number of cases (frequencies n_i)      -    3,057  791  351  107  80   59   22
4. Results for example 1 (zero truncation situation)
The incompletely observed likelihood relative to the zero-truncated count frequencies $(n_1, n_2, \dots, n_m)' = y$ is

$$L(\lambda; y) = \prod_{i=1}^{m} (p_i')^{n_i}, \quad \text{where } p_i' = p_i'(\lambda) = \frac{Po(i \mid \lambda)}{1 - Po(0 \mid \lambda)} = \frac{\exp(-\lambda)\,\lambda^{i} / i!}{1 - \exp(-\lambda)},$$

where the density $p_i'$ is assumed to be a zero-truncated Poisson. Suppose that the unknown missing frequency vector $z = (n_0)'$ is replaced by its conditional expectation $e = (e_0)'$ given the observed frequencies $n_1, n_2, \dots, n_m$ and the current value of $\lambda$: $e_0 = E(n_0 \mid \lambda, n_1, \dots, n_m)$. To proceed in the EM context, the complete data vector $x = (e_0, n_1, \dots, n_m)'$ is needed. After adding $e_0$, the zero-truncated Poisson likelihood with density $p_i'$ changes to the simple complete Poisson likelihood with density $p_i$, leading to the likelihood and log-likelihood

$$L_{cd}(\lambda; x) = \prod_{i=0}^{m} p_i^{n_i} = p_0^{e_0}\, p_1^{n_1} \cdots p_m^{n_m}, \quad \text{where } p_i = Po(i \mid \lambda) = \frac{\exp(-\lambda)\,\lambda^{i}}{i!},$$

$$l_{cd}(\lambda; x) = e_0 \log p_0 + \sum_{i=1}^{m} n_i \log p_i = e_0 \log Po(0 \mid \lambda) + \sum_{i=1}^{m} n_i \log Po(i \mid \lambda).$$

In the E-step, the expected value $e_0 = E(n_0 \mid \lambda, n)$ under the Poisson density $p_0$ can be obtained as

$$e_0 = E(n_0 \mid n_1, n_2, \dots, n_m, \lambda) = p_0 N = Po(0 \mid \lambda)\, N, \quad \text{where } N = e_0 + n,$$

$$e_0 = Po(0 \mid \lambda)\,[e_0] + Po(0 \mid \lambda)\,[n_1 + n_2 + \dots + n_m],$$

$$\therefore\ e_0 = \frac{Po(0 \mid \lambda)}{1 - Po(0 \mid \lambda)}\,[n_1 + n_2 + \dots + n_m] = \left( \frac{\exp(-\lambda)}{1 - \exp(-\lambda)} \right) n.$$
The complete log-likelihood $l_{cd}(\lambda; x)$ with the expectation and the observed data can be rewritten as

$$Q(\lambda) = l_{cd}(\lambda; x) = -e_0 \lambda + \sum_{i=1}^{m} n_i \left[ -\lambda + i \log \lambda - \log(i!) \right].$$

In the M-step, taking the derivative of $Q(\lambda)$ and setting the result to 0 yields

$$\hat{\lambda} = \frac{1}{e_0 + n} \sum_{i=1}^{m} i\, n_i.$$

The Horvitz-Thompson estimator for estimating a population size is

$$\hat{N} = n + e_0, \quad \text{where } e_0 = \left( \frac{\exp(-\lambda)}{1 - \exp(-\lambda)} \right) n.$$
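The M-step update follows from a single differentiation of $Q(\lambda)$. Written out, using only the quantities defined above:

```latex
\begin{align*}
\frac{dQ}{d\lambda}
  &= -e_0 - \sum_{i=1}^{m} n_i + \frac{1}{\lambda} \sum_{i=1}^{m} i\, n_i
   = -(e_0 + n) + \frac{1}{\lambda} \sum_{i=1}^{m} i\, n_i = 0
  \;\Longrightarrow\;
  \hat{\lambda} = \frac{1}{e_0 + n} \sum_{i=1}^{m} i\, n_i, \\
\frac{d^{2}Q}{d\lambda^{2}}
  &= -\frac{1}{\lambda^{2}} \sum_{i=1}^{m} i\, n_i < 0 .
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum of $Q$. For the Example 1 frequencies, $\sum_i i\, n_i = 1(5445) + 2(1025) + 3(158) + 4(21) + 5(1) = 8058$ and $n = 6650$, so each M-step reduces to $\hat{\lambda} = 8058 / (e_0 + 6650)$.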
EM algorithm
Step 0  Choose an initial value $\hat{\lambda}^{(0)}$, and set $k = 0$.
Step 1  Compute

$$e_0^{(k+1)} = n \left( \frac{Po(0 \mid \hat{\lambda}^{(k)})}{1 - Po(0 \mid \hat{\lambda}^{(k)})} \right) = n \left( \frac{\exp(-\hat{\lambda}^{(k)})}{1 - \exp(-\hat{\lambda}^{(k)})} \right) \quad \text{and} \quad \hat{N}^{(k+1)} = e_0^{(k+1)} + n.$$

Step 2  Use the complete data $e_0^{(k+1)}, n_1, \dots, n_m$ to compute the new MLE

$$\hat{\lambda}^{(k+1)} = \frac{1}{e_0^{(k+1)} + n} \sum_{i=1}^{m} i\, n_i.$$

Step 3  Set $k = k + 1$ and go back to Step 1. Steps 1 and 2 are repeated until convergence.
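Steps 0 to 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function name `em_zero_truncated_poisson`, the starting value `lam0 = 1.0`, the tolerance, and the iteration cap are all assumptions, since the paper does not report them.

```python
import math


def em_zero_truncated_poisson(freq, lam0=1.0, tol=1e-10, max_iter=10_000):
    """EM for a zero-truncated Poisson, with freq[i - 1] = n_i for i = 1..m.

    lam0, tol and max_iter are illustrative choices, not values from the paper.
    Returns (lambda_hat, e0, N_hat, iterations).
    """
    n = sum(freq)                                                    # observed cases
    total = sum(i * n_i for i, n_i in enumerate(freq, start=1))      # sum of i * n_i
    lam = lam0                                                       # Step 0
    for k in range(1, max_iter + 1):
        p0 = math.exp(-lam)
        e0 = n * p0 / (1.0 - p0)       # Step 1 (E-step): impute the zero class
        lam_new = total / (e0 + n)     # Step 2 (M-step): complete-data Poisson MLE
        if abs(lam_new - lam) < tol:
            return lam_new, e0, e0 + n, k
        lam = lam_new                  # Step 3: iterate
    raise RuntimeError("EM did not converge within max_iter iterations")


# Example 1 frequencies (marijuana users, Thailand 2006): n_1, ..., n_5
marijuana_2006 = [5445, 1025, 158, 21, 1]
lam_hat, e0, N_hat, iters = em_zero_truncated_poisson(marijuana_2006)
print(f"lambda = {lam_hat:.3f}, e0 = {e0:.0f}, N = {N_hat:.0f}, iterations = {iters}")
```

Under these assumptions the estimates should land close to the EM row of Table 1. The iteration count depends on the starting value and stopping rule, so it need not match the figure reported there.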
Table 1 Estimates of population size on marijuana users in Thailand 2006 ($n = 6650$)

Methods                          Number of iterations   $\hat{\lambda}$   $\hat{n}_0 = e_0$   $\hat{N}$
Taylor's series approximation    Closed form            0.349             15897               22547
Newton-Raphson algorithm         9                      0.397             13635               20285
Fisher scoring algorithm         7                      0.397             13635               20285
EM algorithm                     97                     0.397             13635               20285
Chao's method                    Closed form            0.376             14462               21112
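The paper does not state the formula behind the "Chao's method" row. Assuming it refers to Chao's lower-bound estimator, $\hat{n}_0 = n_1^2 / (2 n_2)$ with $\hat{\lambda} = 2 n_2 / n_1$, the closed-form values can be checked directly. The function name `chao_estimate` is hypothetical.

```python
def chao_estimate(f1, f2, n):
    """Closed-form estimate, assuming Chao's lower-bound estimator is meant.

    f1, f2: numbers of cases with exactly one and exactly two episodes.
    n: total number of observed cases.
    Returns (lambda_hat, n0_hat, N_hat).
    """
    lam_hat = 2.0 * f2 / f1           # Poisson rate implied by the ratio f2 / f1
    n0_hat = f1 ** 2 / (2.0 * f2)     # estimated number of unobserved (zero) cases
    return lam_hat, n0_hat, n + n0_hat


# Example 1 frequencies: n_1 = 5445, n_2 = 1025, n = 6650
lam_c, n0_c, N_c = chao_estimate(f1=5445, f2=1025, n=6650)
print(f"lambda = {lam_c:.3f}, n0 = {n0_c:.0f}, N = {N_c:.0f}")
```

Only the first two frequencies enter this estimator, which is why it needs no iteration.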
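5. Results for example 2 (including right censoring situation)
Let the original observed count frequencies $n_1, n_2, \dots, n_m$ without $n_0$ have the multinomial density

$$\binom{n}{n_1, n_2, \dots, n_m} (p_1')^{n_1} (p_2')^{n_2} \cdots (p_m')^{n_m}, \quad \text{where } p_j' = \frac{\exp(-\lambda)\,\lambda^{j} / j!}{\sum_{x=1}^{m} \exp(-\lambda)\,\lambda^{x} / x!}.$$

Consider an arbitrary right-censoring count $J$, with observed counts $x = 1, 2, \dots, J$ where $2 \le J \le m$. The incompletely observed likelihood relative to the frequencies $(n_1, n_2, \dots, n_J)' = y$ is obtained as

$$L(\lambda; y) = \prod_{j=1}^{J} (p_j')^{n_j}, \quad \text{where } p_j' = p_j'(\lambda) = \frac{\exp(-\lambda)\,\lambda^{j} / j!}{\sum_{x=1}^{J} \exp(-\lambda)\,\lambda^{x} / x!}.$$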
Suppose the unknown missing frequency vector $z = (n_0, n_{J+1}, \dots, n_m)'$ is replaced by its conditional expectation $e = (e_0, e_{J+1}, \dots, e_m)'$ given the observed frequencies $n_1, n_2, \dots, n_J$ and the current value of $\lambda$, where $e_x = E(n_x \mid \lambda, n_1, n_2, \dots, n_J)$. The complete data set is $x = (e_0, n_1, \dots, n_J, e_{J+1}, \dots, e_m)'$. The multinomial likelihood with success probabilities $p_j'$ changes to the complete Poisson likelihood with density $p_j$:

$$L_{cd}(\lambda; x) = \prod_{i=0}^{m} p_i^{n_i} = p_0^{e_0}\, p_1^{n_1} \cdots p_J^{n_J}\, p_{J+1}^{e_{J+1}} \cdots p_m^{e_m}, \quad \text{where } p_i = Po(i \mid \lambda) = \frac{\exp(-\lambda)\,\lambda^{i}}{i!},$$

$$l_{cd}(\lambda; x) = e_0 \log p_0 + \sum_{i=1}^{J} n_i \log p_i + \sum_{i=J+1}^{m} e_i \log p_i.$$
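In the E-step, the expected vector $e = (e_0, e_{J+1}, \dots, e_m)'$, where $e_x = E(n_x \mid \lambda, n_1, n_2, \dots, n_J)$ for $x = 0$ or $x > J$ under the Poisson density $p_x$, can be obtained as

$$e_x = E(n_x \mid n_1, n_2, \dots, n_J, \lambda) = p_x N = Po(x \mid \lambda)\, N, \quad \text{where } N = e_0 + n_1 + \dots + n_J + e_{J+1} + \dots + e_m,$$

$$\therefore\ e_x = Po(x \mid \lambda) \left[ e_0 + n_1 + \dots + n_J + e_{J+1} + \dots + e_m \right]. \tag{1}$$

$$\therefore\ e_0 + \sum_{x=J+1}^{m} e_x = \left[ Po(0 \mid \lambda) + Po(J+1 \mid \lambda) + \dots + Po(m \mid \lambda) \right] \left[ e_0 + n_1 + \dots + n_J + e_{J+1} + \dots + e_m \right]$$

$$= \left[ 1 - \sum_{x=1}^{J} Po(x \mid \lambda) \right] \left[ e_0 + n_1 + \dots + n_J + e_{J+1} + \dots + e_m \right],$$

$$\therefore\ e_0 + \sum_{x=J+1}^{m} e_x = \frac{1 - \sum_{x=1}^{J} Po(x \mid \lambda)}{\sum_{x=1}^{J} Po(x \mid \lambda)} \left[ n_1 + \dots + n_J \right]. \tag{2}$$

Finally, replace (2) into (1),

$$\therefore\ e_x = \left( \frac{Po(x \mid \lambda)}{\sum_{x'=1}^{J} Po(x' \mid \lambda)} \right) \left[ n_1 + \dots + n_J \right] \quad \text{for } x = 0 \text{ or } x > J.$$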
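The substitution of (2) into (1) can be spelled out in two lines. The shorthand $P_J = \sum_{x'=1}^{J} Po(x' \mid \lambda)$ and $n_{\mathrm{obs}} = n_1 + \dots + n_J$ is introduced here for brevity and is not the paper's notation.

```latex
\begin{align*}
N &= \Bigl( e_0 + \sum_{x=J+1}^{m} e_x \Bigr) + n_{\mathrm{obs}}
   = \frac{1 - P_J}{P_J}\, n_{\mathrm{obs}} + n_{\mathrm{obs}}
   = \frac{n_{\mathrm{obs}}}{P_J}, \\
e_x &= Po(x \mid \lambda)\, N
     = \frac{Po(x \mid \lambda)}{P_J}\, n_{\mathrm{obs}}
     \qquad (x = 0 \text{ or } x > J).
\end{align*}
```

Note that the step replacing $Po(0 \mid \lambda) + Po(J+1 \mid \lambda) + \dots + Po(m \mid \lambda)$ by $1 - P_J$ is exact only when the Poisson probability above $m$ is negligible, which holds to good accuracy once $m$ is moderately larger than $\lambda$.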
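The complete log-likelihood $l_{cd}(\lambda; x)$ with the expectation and the observed data can be rewritten as

$$Q(\lambda) = l_{cd}(\lambda; x) = -e_0 \lambda + \sum_{j=1}^{J} n_j \left[ -\lambda + j \log \lambda - \log(j!) \right] + \sum_{x=J+1}^{m} e_x \left[ -\lambda + x \log \lambda - \log(x!) \right].$$

In the M-step, taking the derivative of $Q(\lambda)$ and setting the result to 0 yields

$$\hat{\lambda} = \frac{1}{e_0 + \sum_{j=1}^{J} n_j + \sum_{x=J+1}^{m} e_x} \left( \sum_{j=1}^{J} j\, n_j + \sum_{x=J+1}^{m} x\, e_x \right).$$

The population size estimator is

$$\hat{N} = e_0 + \sum_{i=1}^{m} n_i, \quad \text{where } e_0 = \left( \frac{Po(0 \mid \lambda)}{\sum_{x'=1}^{J} Po(x' \mid \lambda)} \right) \left[ n_1 + \dots + n_J \right].$$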
EM algorithm
Step 0  Choose an initial value $\hat{\lambda}^{(0)}$, and set $k = 0$.
Step 1  Compute

$$e_x^{(k+1)} = \left( \frac{Po(x \mid \hat{\lambda}^{(k)})}{\sum_{x'=1}^{J} Po(x' \mid \hat{\lambda}^{(k)})} \right) \left[ n_1 + \dots + n_J \right] \ \text{for } x = 0,\ x > J \quad \text{and} \quad \hat{N}^{(k+1)} = e_0^{(k+1)} + \sum_{i=1}^{m} n_i.$$

Step 2  Use the complete data $e_0^{(k+1)}, n_1, \dots, n_J, e_{J+1}^{(k+1)}, \dots, e_m^{(k+1)}$ to compute the new MLE

$$\hat{\lambda}^{(k+1)} = \frac{1}{e_0^{(k+1)} + \sum_{j=1}^{J} n_j + \sum_{x=J+1}^{m} e_x^{(k+1)}} \left( \sum_{j=1}^{J} j\, n_j + \sum_{x=J+1}^{m} x\, e_x^{(k+1)} \right).$$

Step 3  Set $k = k + 1$ and go back to Step 1. Steps 1 and 2 are repeated until convergence.
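The right-censored variant can be sketched the same way. This is an illustration under stated assumptions, not the author's code: the paper does not report the censoring point or the upper index used for Example 2, so the sketch assumes that the "7+" class is the censored one ($J = 6$) and imputes the tail up to an arbitrary $m = 30$. The names `poisson_pmf` and `em_right_censored_poisson`, the starting value, and the tolerance are likewise hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability Po(x | lam), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def em_right_censored_poisson(freq, J, m, lam0=1.0, tol=1e-10, max_iter=10_000):
    """EM with zero truncation and right censoring; freq[j - 1] = n_j.

    Only n_1..n_J enter the fit; counts 0 and J+1..m are imputed.
    J, m, lam0, tol and max_iter are assumptions of this sketch.
    Returns (lambda_hat, e0, N_hat, iterations).
    """
    n_obs = sum(freq[:J])                                             # n_1 + ... + n_J
    s_obs = sum(j * n_j for j, n_j in enumerate(freq[:J], start=1))   # sum of j * n_j
    n_all = sum(freq)                                                 # all observed cases
    lam = lam0                                                        # Step 0
    for k in range(1, max_iter + 1):
        # Step 1 (E-step): e_x = Po(x) / sum_{x'=1..J} Po(x') * n_obs
        scale = n_obs / sum(poisson_pmf(x, lam) for x in range(1, J + 1))
        e0 = scale * poisson_pmf(0, lam)
        e_tail = {x: scale * poisson_pmf(x, lam) for x in range(J + 1, m + 1)}
        # Step 2 (M-step): complete-data Poisson MLE
        numerator = s_obs + sum(x * e_x for x, e_x in e_tail.items())
        denominator = e0 + n_obs + sum(e_tail.values())
        lam_new = numerator / denominator
        if abs(lam_new - lam) < tol:
            return lam_new, e0, e0 + n_all, k
        lam = lam_new                                                 # Step 3
    raise RuntimeError("EM did not converge within max_iter iterations")


# Example 2 frequencies (heroin users, Thailand 2005): n_1, ..., n_6 and the 7+ class
heroin_2005 = [3057, 791, 351, 107, 80, 59, 22]
lam_hat, e0, N_hat, iters = em_right_censored_poisson(heroin_2005, J=6, m=30)
print(f"lambda = {lam_hat:.4f}, e0 = {e0:.0f}, N = {N_hat:.0f}, iterations = {iters}")
```

With $J = 6$ the estimates should come out near the EM row of Table 2, which is the reason for that assumption. Following the estimator stated above, $\hat{N}$ adds $e_0$ to all observed cases, including the censored class.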
Table 2 Estimates of population sizes on heroin users in Thailand 2005 ($n = 4467$)

Methods          $\hat{\lambda}$   $\hat{n}_0 = e_0$   $\hat{N}$
EM algorithm     0.9468            2818                7285
Chao's method    0.5175            5907                10374
6. Discussion
The EM algorithm provides a monotonically increasing likelihood and numerically stable convergence. However, it may converge slowly. Like other algorithms such as Newton-Raphson, it may also converge to a local maximum, so whether the final estimate is the global maximum may depend on the initial value.

Acknowledgements
This study was partially supported for publication by the China Medical Board (CMB), Faculty of Public Health, Mahidol University, Bangkok, Thailand.

References
1. Dempster AP, Laird NM, Rubin DB. Maximum likelihood estimation from incomplete-data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 1977; 39: 1-38.
2. Viwatwongkasem C, Satitvipawee P, Jareinpituk S, Soontornpipit P. Mixture models for estimating the number of drug users in Thailand 2005-2007. Applied Mathematics 2013; 4: 1242-1250.