POISSON REGRESSION WITH A PERIODIC FUNCTION


Naveen K. Bansal and Debasis Kundu

Department of Mathematics, Statistics and Computer Science, Marquette University, P.O. Box 1881, Milwaukee, WI 53201-1881, U.S.A. and Department of Mathematics, Indian Institute of Technology Kanpur, Kanpur, Pin 208 016, India

Let $\{y_t\}$ be a Poisson-like process with mean $\mu_t$, which is a periodic function of time $t$. We discuss how to fit this type of data set using the quasi-likelihood method. Our method provides a new avenue for fitting time series data when the usual assumptions of stationarity and homogeneous residual variances are invalid. We show that the estimators obtained are strongly consistent and asymptotically normal.

Key Words: Consistent estimators; Asymptotic normality; Poisson-like process; Non-stationary time series

1. INTRODUCTION

Non-linear regression methods can be used to fit time series data $\{y_t\}$ when the mean $\mu_t$ is a periodic function of time $t$, e.g., $\mu_t = \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t)$, $t = 1, \ldots, n$, where $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ ($0 \le \omega \le \pi$) are unknown parameters. This model has been considered by Hannan,[1] Walker,[2] Kundu[3] and others. Hannan[1] established the strong consistency and the asymptotic normality of the least squares estimates. However, these results were obtained under the assumption that $\{y_t - \mu_t\}$ is stationary. Thus the results of Hannan cannot be applied to inhomogeneous and non-stationary time series.

In this paper, we are concerned mainly with count data, in particular when the counts of an event follow a Poisson-like process. As an example, consider the data "UK death from Bronchitis, Emphysema and Asthma" from Diggle.[4] The data set contains the monthly deaths in the UK from Bronchitis, Emphysema and Asthma during 1974 to 1979. As a second example, consider the "Telecommunication Traffic Data" from Duffy, McIntosh, Rosenstein and Willinger.[5] The data set contains the number of calls received every ten seconds over four days. In such cases of count data, it is reasonable to assume that the outcome $\{y_t\}$ follows a Poisson-like distribution with

$$E(y_t) = \mu_t, \qquad \operatorname{Var}(y_t) = \sigma^2 \mu_t, \qquad (1.1)$$



where $\sigma$ is known or unknown. The main point is that $\operatorname{Var}(y_t) \propto E(y_t)$, and thus the stationarity assumption of Hannan[1] is not valid here. Note that when $\{y_t\}$ is a Poisson process, (1.1) holds with $\sigma^2 = 1$. However, we avoid the assumption of a Poisson distribution and only assume (1.1), with the additional assumption that $E|y_t - \mu_t|^{2+\delta}$ is a continuous (or bounded) function of $t$ for some $\delta > 0$. We shall also assume that $\mu_t$ is a periodic function with a log link function (see, for example, McCullagh and Nelder[6]), i.e.,

$$\log \mu_t = \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t), \quad 0 \le \omega \le \pi, \qquad (1.2)$$

where $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ are unknown parameters.

In Section 2, we discuss the quasi-likelihood function (McCullagh and Nelder[6]) and obtain estimating equations. We also fit the female UK death data using quasi-likelihood estimates. In Section 3, we discuss the strong consistency of our estimates, and in Section 4, we discuss the asymptotic normality of these estimates. Some concluding remarks are presented in Section 5.
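As a rough illustration of the model in (1.1) and (1.2), the sketch below simulates counts whose log-mean is a single sinusoid. It assumes exact Poisson counts (the special case in which the dispersion equals 1). The function names `mean_curve`, `poisson_draw` and `simulate`, and the parameter values in `theta0`, are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random

def mean_curve(theta, n):
    """Return mu_t = exp(b0 + b1*cos(w*t) + b2*sin(w*t)) for t = 1..n."""
    b0, b1, b2, w = theta
    return [math.exp(b0 + b1 * math.cos(w * t) + b2 * math.sin(w * t))
            for t in range(1, n + 1)]

def poisson_draw(rng, mu):
    """Draw one Poisson(mu) count with Knuth's multiplication method.

    Adequate for moderate means; exp(-mu) underflows for very large mu.
    """
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(theta, n, seed=0):
    """Return (mu, y): the mean curve and one simulated series of counts."""
    rng = random.Random(seed)
    mu = mean_curve(theta, n)
    return mu, [poisson_draw(rng, m) for m in mu]

# Illustrative values only: an annual cycle in monthly data (period 12).
theta0 = (3.0, 0.3, 0.3, 2 * math.pi / 12)
mu, y = simulate(theta0, n=72)
```

Because the variance of each count equals its mean here, the simulated series is more variable near the peaks of the cycle than near the troughs, which is the kind of non-stationarity the model is meant to capture.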

The quasi-likelihood function under assumption (1.1) can be described (McCullagh and Nelder[6]) as follows:

$$Q(\boldsymbol{\mu}; \mathbf{y}) = \frac{1}{\sigma^2}\sum_{t=1}^{n}\left[y_t\log(\mu_t/y_t) - (\mu_t - y_t)\right].$$

Thus the quasi-likelihood estimators of the parameter vector $\boldsymbol{\theta} = (\beta_0, \beta_1, \beta_2, \omega)^T$ can be obtained by minimizing

$$Q_n(\boldsymbol{\theta}) = -\sum_{t=1}^{n}(y_t\log\mu_t - \mu_t). \qquad (2.1)$$



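Differentiating $Q_n$ with respect to $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ and using (1.2), we get the following estimating equations:

$$
\begin{aligned}
S_{0n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n}(y_t - \mu_t) = 0,\\
S_{1n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n}(y_t - \mu_t)\cos(\omega t) = 0,\\
S_{2n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n}(y_t - \mu_t)\sin(\omega t) = 0,\\
S_{3n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n} t\,(y_t - \mu_t)\left(-\beta_1\sin(\omega t) + \beta_2\cos(\omega t)\right) = 0.
\end{aligned}
$$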
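The four estimating equations can be checked numerically against the objective they come from. The sketch below is a minimal illustration, assuming the log link (1.2): it evaluates the objective and the four score components, then compares them with a central-difference gradient. The names `eta`, `q_n`, `score` and `numeric_gradient`, the counts in `y` and the trial value `theta` are all made up for the example.

```python
import math

def eta(theta, t):
    """Linear predictor log(mu_t) = b0 + b1*cos(w*t) + b2*sin(w*t)."""
    b0, b1, b2, w = theta
    return b0 + b1 * math.cos(w * t) + b2 * math.sin(w * t)

def q_n(theta, y):
    """Objective Q_n(theta) = -sum_t (y_t * log(mu_t) - mu_t)."""
    return -sum(yt * eta(theta, t) - math.exp(eta(theta, t))
                for t, yt in enumerate(y, start=1))

def score(theta, y):
    """Return [S0n, S1n, S2n, S3n], the left-hand sides of the estimating equations."""
    b0, b1, b2, w = theta
    s = [0.0, 0.0, 0.0, 0.0]
    for t, yt in enumerate(y, start=1):
        resid = yt - math.exp(eta(theta, t))
        c, sn = math.cos(w * t), math.sin(w * t)
        s[0] += resid
        s[1] += resid * c
        s[2] += resid * sn
        s[3] += t * resid * (-b1 * sn + b2 * c)
    return s

def numeric_gradient(theta, y, h=1e-6):
    """Central-difference gradient of Q_n, one coordinate at a time."""
    grad = []
    for j in range(4):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        grad.append((q_n(up, y) - q_n(dn, y)) / (2 * h))
    return grad

y = [18, 25, 31, 27, 22, 15, 11, 9, 10, 13, 17, 21]   # made-up counts
theta = (2.8, 0.4, 0.3, 0.5)                          # arbitrary trial value
S = score(theta, y)
G = numeric_gradient(theta, y)   # the gradient of Q_n should equal -S
```

Since the gradient of the objective is the negative of the score vector, a root of the four equations is a stationary point of the objective.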


Estimates obtained as a solution to the above equations will be called quasi-likelihood estimates.

We now fit the data on female UK deaths due to Bronchitis, Emphysema and Asthma from Diggle.[4] From the plot of the data and the plot of the periodogram, it appears that there is only one harmonic component. The data also indicate that the variance is proportional to the mean. The quasi-likelihood estimates of $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ are obtained as

$$\hat\beta_0 = 6.289, \quad \hat\beta_1 = 0.2785, \quad \hat\beta_2 = 0.3074, \quad \hat\omega = 0.5198.$$

The corresponding asymptotic standard errors (see Section 4) are 0.051, 0.013, 0.012 and 0.0009, respectively. We also obtained the least squares estimates for the transformed data $\log y_t$, i.e., the estimates obtained by minimizing $\sum (\log y_t - \beta_0 - \beta_1\cos(\omega t) - \beta_2\sin(\omega t))^2$. The estimates are given by

$$\hat\beta_0 = 6.2797, \quad \hat\beta_1 = 0.00245, \quad \hat\beta_2 = 0.05514, \quad \hat\omega = 0.7389.$$
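One quick way to read the two frequency estimates is to convert each to a period, $2\pi/\hat\omega$, measured in months because the data are monthly. The short calculation below does only that arithmetic with the two reported values; the variable names are hypothetical, and the interpretation in the note after the block is a reading of the numbers rather than a statement from the paper.

```python
import math

omega_ql = 0.5198   # reported quasi-likelihood estimate of the frequency
omega_ls = 0.7389   # reported least squares estimate on log-transformed data

period_ql = 2 * math.pi / omega_ql   # about 12.09 months
period_ls = 2 * math.pi / omega_ls   # about 8.50 months

print(f"quasi-likelihood period: {period_ql:.2f} months")
print(f"least squares period:    {period_ls:.2f} months")
```

An exactly annual cycle would correspond to $\omega = 2\pi/12 \approx 0.5236$, so the quasi-likelihood estimate implies a period close to one year, while the least squares estimate implies a period of about eight and a half months.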

Figures 1 and 2 show the fit of $\hat\mu_t$ along with the data values. Figure 1 shows the fit using the quasi-likelihood estimates, and Figure 2 shows the fit using the least squares estimates on the log-transformed data.

Figure 1. Plot using the quasi-likelihood estimates. Here the x-axis denotes months and the y-axis denotes the number of deaths.

We also tried the least squares fit without transforming the data, and the fit was even worse than the fit shown in Figure 2. It is clear that the least squares fit is not good because the least squares estimates are influenced by the higher concentration, i.e., the low variability, of the lower data values and by the high variability of the higher values.
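The quasi-likelihood fit discussed in this section can be sketched in a few lines. The paper does not say which numerical method was used, so the approach below is an assumption: for each frequency on a grid, the three remaining parameters are found by Newton steps (the model is then an ordinary log-linear Poisson fit), and the frequency with the smallest objective is refined on successively finer grids. The data are simulated, not the UK death series, and all function names and the values in `theta0` are hypothetical.

```python
import math
import random

def poisson_draw(rng, mu):
    """Knuth's multiplication method (illustrative; fine for moderate means)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(theta, n, seed=0):
    """Poisson counts with log-mean b0 + b1*cos(w*t) + b2*sin(w*t), t = 1..n."""
    b0, b1, b2, w = theta
    rng = random.Random(seed)
    return [poisson_draw(rng, math.exp(b0 + b1 * math.cos(w * t) + b2 * math.sin(w * t)))
            for t in range(1, n + 1)]

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_betas(w, y, iters=25):
    """Newton steps for (b0, b1, b2) at a fixed frequency w; returns (beta, Q_n)."""
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in range(1, len(y) + 1)]
    beta = [math.log(sum(y) / len(y)), 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x, yt in zip(rows, y):
            mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(3):
                grad[i] += (yt - mu) * x[i]
                for j in range(3):
                    hess[i][j] += mu * x[i] * x[j]
        # Cap each step component at 1 in absolute value to guard against overshoot.
        step = [max(-1.0, min(1.0, s)) for s in solve(hess, grad)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    etas = [sum(b * xi for b, xi in zip(beta, x)) for x in rows]
    return beta, -sum(yt * e - math.exp(e) for yt, e in zip(y, etas))

def fit(y, grid=300):
    """Profile Q_n over a frequency grid inside (0, pi), then refine locally."""
    lo, hi = 0.05, math.pi - 0.05
    step = (hi - lo) / grid
    w = min((fit_betas(lo + k * step, y)[1], lo + k * step) for k in range(grid + 1))[1]
    for _ in range(3):   # three rounds, each ten times finer than the last
        cands = [w + (k - 10) * step / 10 for k in range(21)]
        w = min((fit_betas(c, y)[1], c) for c in cands)[1]
        step /= 10
    beta, q = fit_betas(w, y)
    return beta + [w], q

theta0 = (4.0, 0.3, 0.3, 2 * math.pi / 12)   # hypothetical true values
y = simulate(theta0, n=72, seed=1)
theta_hat, q_min = fit(y)
print([round(v, 3) for v in theta_hat])
```

The grid search is used because the objective has many local minima as a function of the frequency, so a purely local solver started far from the truth can settle on the wrong cycle. A periodogram peak, as mentioned in the text, would be another reasonable starting point.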

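Here we show that the quasi-likelihood estimates as defined in Section 2 are strongly consistent. Let $\boldsymbol{\theta}_0 = (\beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, \omega_0)$ be the true parameter vector. By Lemma 1 of the Appendix, it is enough to show that

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}}\ \frac{1}{n}\left[Q_n(\boldsymbol{\theta}) - Q_n(\boldsymbol{\theta}_0)\right] > 0 \quad \text{a.s.} \qquad (3.1)$$

for any $\delta > 0$ and $M > 0$, where $S_{\delta,M} = \{\boldsymbol{\theta} : |\boldsymbol{\theta} - \boldsymbol{\theta}_0| \ge \delta,\ |\boldsymbol{\theta}| \le M\}$.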

Figure 2. Plot using the least squares estimates on log-transformed data. Here the x-axis denotes months and the y-axis denotes the number of deaths.

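Denoting $\mu_t^{(0)} = \exp(\beta_0^{(0)} + \beta_1^{(0)}\cos(\omega_0 t) + \beta_2^{(0)}\sin(\omega_0 t))$ and $\varepsilon_t = y_t - \mu_t^{(0)}$, we get, from (1.2) and (2.1),

$$
\begin{aligned}
\frac{1}{n}\left[Q_n(\boldsymbol{\theta}) - Q_n(\boldsymbol{\theta}_0)\right]
={}& -\frac{1}{n}\sum_{t=1}^{n}\varepsilon_t\left(\beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t)\right)\\
& + \frac{1}{n}\sum_{t=1}^{n}\varepsilon_t\left(\beta_0^{(0)} + \beta_1^{(0)}\cos(\omega_0 t) + \beta_2^{(0)}\sin(\omega_0 t)\right)\\
& + \frac{1}{n}\sum_{t=1}^{n}\mu_t^{(0)}\left[\exp\{\Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0) - 1\right], \qquad (3.2)
\end{aligned}
$$

where

$$\Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0) = \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t) - \beta_0^{(0)} - \beta_1^{(0)}\cos(\omega_0 t) - \beta_2^{(0)}\sin(\omega_0 t).$$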

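Now, since

$$\sum_{t=1}^{\infty}\frac{\operatorname{Var}(\varepsilon_t)}{t^2} \le \sigma^2\exp\left(\beta_0^{(0)} + |\beta_1^{(0)}| + |\beta_2^{(0)}|\right)\sum_{t=1}^{\infty}\frac{1}{t^2} < \infty,$$

it follows from Kolmogorov's strong law of large numbers that $\frac{1}{n}\sum_{t=1}^{n}\varepsilon_t \to 0$ a.s. From Lemma 2 of the Appendix,

$$\sup_{0\le\omega\le\pi}\left|\frac{1}{n}\sum_{t=1}^{n}\cos(\omega t)\,\varepsilon_t\right| \to 0 \quad\text{and}\quad \sup_{0\le\omega\le\pi}\left|\frac{1}{n}\sum_{t=1}^{n}\sin(\omega t)\,\varepsilon_t\right| \to 0 \quad \text{a.s.}$$

Thus, from (3.2) and these limits, (3.1) holds if we prove the following:

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}}\ \frac{1}{n}\sum_{t=1}^{n}\mu_t^{(0)}\left[\exp\{\Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0) - 1\right] > 0. \qquad (3.3)$$

To prove (3.3), note that there exists $\gamma > 0$ such that

$$e^x - x - 1 \ge \frac{x^2}{6} \ \text{ for } |x| < \gamma \qquad\text{and}\qquad e^x - x - 1 > \eta \ \text{ for } |x| \ge \gamma,$$

where $\eta > 0$ is some constant. Thus

$$\exp\{\Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0) - 1 \ge \min\left(\eta,\ \frac{\Delta_t^2(\boldsymbol{\theta},\boldsymbol{\theta}_0)}{6}\right).$$

Therefore,

$$
\begin{aligned}
&\liminf_{n\to\infty}\ \inf_{S_{\delta,M}}\ \frac{1}{n}\sum_{t=1}^{n}\mu_t^{(0)}\left[\exp\{\Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta},\boldsymbol{\theta}_0) - 1\right]\\
&\qquad \ge \exp\left(\beta_0^{(0)} - |\beta_1^{(0)}| - |\beta_2^{(0)}|\right)\min\left(\eta,\ \frac{1}{6}\liminf_{n\to\infty}\ \inf_{S_{\delta,M}}\ \frac{1}{n}\sum_{t=1}^{n}\Delta_t^2(\boldsymbol{\theta},\boldsymbol{\theta}_0)\right). \qquad (3.4)
\end{aligned}
$$

Now using the same argument as Kundu,[8] it follows that

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}}\ \frac{1}{n}\sum_{t=1}^{n}\Delta_t^2(\boldsymbol{\theta},\boldsymbol{\theta}_0) > 0. \qquad (3.5)$$

Thus (3.3) follows from (3.4) and (3.5).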
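The elementary inequality behind (3.3) and (3.4) can be checked numerically. The sketch below is only a sanity check on a grid, not a proof. The threshold `a = 1.0` and the constant `c` are illustrative choices (the text only asserts that some such constants exist), and `f` is a hypothetical helper name.

```python
import math

def f(x):
    """f(x) = exp(x) - x - 1, written with expm1 for accuracy near zero."""
    return math.expm1(x) - x

a = 1.0      # illustrative threshold
c = f(-a)    # f decreases on x < 0 and increases on x > 0, and f(-a) < f(a),
             # so f(-a) is the smallest value of f on |x| >= a

grid = [k / 1000 for k in range(-20000, 20001)]   # x from -20 to 20

# Quadratic lower bound inside the threshold.
ok_quadratic = all(f(x) >= x * x / 6 for x in grid if abs(x) < a)
# Constant lower bound outside the threshold.
ok_constant = all(f(x) >= c for x in grid if abs(x) >= a)
# Combined bound, as used for each term of the sum.
ok_combined = all(f(x) >= min(c, x * x / 6) for x in grid)

# The quadratic bound alone fails far to the left, which is why a threshold is needed.
fails_far_left = f(-6.0) < 36 / 6
```

With these choices `c` equals $e^{-1}$, and the last line shows that $x^2/6$ overtakes $e^x - x - 1$ for sufficiently negative $x$.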

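Here, we establish the asymptotic normality of the estimator $\hat{\boldsymbol{\theta}}_n$. Writing $S_n(\boldsymbol{\theta}) = (S_{0n}(\boldsymbol{\theta}), S_{1n}(\boldsymbol{\theta}), S_{2n}(\boldsymbol{\theta}), S_{3n}(\boldsymbol{\theta}))^T$, we get

$$0 = S_n(\hat{\boldsymbol{\theta}}_n) = S_n(\boldsymbol{\theta}_0) + S_n'(\bar{\boldsymbol{\theta}}_n)\left(\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta}_0\right),$$

where $\bar{\boldsymbol{\theta}}_n = h\boldsymbol{\theta}_0 + (1 - h)\hat{\boldsymbol{\theta}}_n$ for some $0 < h < 1$.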

(2) This paper presents Poisson regression for fitting time series data, which is a departure from the usual practice of assuming i.i.d. or stationary residuals. In a similar manner, the results can be obtained for regressions that are in the form of the generalized linear model.

(3) When the dispersion parameter $\sigma^2$ is unknown, it can be estimated by

$$\hat\sigma^2 = \frac{1}{n-4}\sum_{t=1}^{n}\frac{(y_t - \hat\mu_t)^2}{\hat\mu_t}.$$

It can also be seen that $\hat\sigma^2$ is a strongly consistent estimator under some additional assumption on the moments of $y_t - \mu_t$. When $\sigma^2$ is known, the above quantity can be used as a measure of goodness of fit.
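The dispersion estimate in remark (3) is a scaled sum of squared Pearson residuals and is simple to compute once fitted means are available. The sketch below assumes the four-parameter model, so the divisor is $n - 4$. The function name `dispersion_estimate` and the numbers in `y` and `mu_hat` are made up for illustration.

```python
def dispersion_estimate(y, mu_hat, n_params=4):
    """Return (1 / (n - n_params)) * sum_t (y_t - mu_t)^2 / mu_t."""
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    pearson = sum((yt - mt) ** 2 / mt for yt, mt in zip(y, mu_hat))
    return pearson / (n - n_params)

y = [12, 9, 15, 20, 18, 11]                      # made-up counts
mu_hat = [10.0, 10.0, 16.0, 16.0, 20.0, 10.0]    # made-up fitted means
sigma2_hat = dispersion_estimate(y, mu_hat)      # 1.8625 / 2 = 0.93125
```

A value near 1 is what exact Poisson counts would give on average, while values well above 1 point to overdispersion relative to the Poisson case.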


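Lemma 1. Let $\hat{\boldsymbol{\theta}}_n$ be an estimator of $\boldsymbol{\theta}$ obtained by minimizing a measurable function $Q_n(\boldsymbol{\theta})$ which converges to infinity a.s. as $|\boldsymbol{\theta}| \to \infty$. Let, for any $\delta > 0$ and $M > 0$, $S_{\delta,M} = \{\boldsymbol{\theta} : |\boldsymbol{\theta} - \boldsymbol{\theta}_0| \ge \delta,\ |\boldsymbol{\theta}| \le M\}$ for some fixed $\boldsymbol{\theta}_0$. If, for any $\delta > 0$,

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}}\ \frac{1}{n}\left[Q_n(\boldsymbol{\theta}) - Q_n(\boldsymbol{\theta}_0)\right] > 0,$$

then $\hat{\boldsymbol{\theta}}_n \to \boldsymbol{\theta}_0$ a.s. as $n \to \infty$.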

Proof. Suppose $\hat{\boldsymbol{\theta}}_n$ does not converge to $\boldsymbol{\theta}_0$ a.s. as $n \to \infty$; then either

Case 1. There exist > 0 and 0 0

with positive probability,

for all $k \ge M_0$. This implies that $\hat{\boldsymbol{\theta}}_{n_k}$ does not minimize $Q_{n_k}(\boldsymbol{\theta})$. Thus the proof follows by contradiction.


Lemma 2. Let $X_1, X_2, \ldots$ be a sequence of independent random variables with mean zero and $E|X_t|^2 = \sigma_t^2$