COMMUN. STATIST.—THEORY METH., 31(7), 1123–1136 (2002)
REGRESSION AND DESIGN OF EXPERIMENTS
POISSON REGRESSION WITH A PERIODIC FUNCTION
Naveen K. Bansal and Debasis Kundu
Department of Mathematics, Statistics and Computer Science, Marquette University, P.O. Box 1881, Milwaukee, WI 53201-1881, U.S.A. and Department of Mathematics, Indian Institute of Technology Kanpur, Kanpur, Pin 208 016, India
ABSTRACT
Let $\{y_t\}$ be a Poisson-like process with mean $\mu_t$, which is a periodic function of time $t$. We discuss how to fit this type of data set using the quasi-likelihood method. Our method provides a new avenue for fitting time series data when the usual assumptions of stationarity and homogeneous residual variances are invalid. We show that the estimators obtained are strongly consistent and asymptotically normal.
Key Words: Consistent estimators; Asymptotic normality; Poisson-like process; Non-stationary time series
1. INTRODUCTION

Non-linear regression methods can be used to fit time series data $\{y_t\}$ when the mean $\mu_t$ is a periodic function of time $t$, e.g., $\mu_t = \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t)$, $t = 1, \ldots, n$, where $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ $(0 \le \omega \le \pi)$ are unknown
Copyright © 2002 by Marcel Dekker, Inc. www.dekker.com
parameters. This model has been considered by Hannan,[1] Walker,[2] Kundu[3] and others. Hannan[1] established the strong consistency and the asymptotic normality of the least squares estimates. However, these results were obtained under the assumption that $\{y_t - \mu_t\}$ is stationary. Thus the results of Hannan cannot be applied to inhomogeneous and non-stationary time series. In this paper, we are concerned mainly with count data, in particular when the counts of an event follow a Poisson-like process. As an example, consider the data "UK deaths from Bronchitis, Emphysema and Asthma" from Diggle.[4] The data set contains the monthly deaths in the UK from bronchitis, emphysema and asthma during 1974 to 1979. As a second example, consider the "Telecommunication Traffic Data" from Duffy, McIntosh, Rosenstein and Willinger.[5] The data set contains the number of calls received every ten seconds over four days. In such cases of count data, it is reasonable to assume that the outcome $\{y_t\}$ follows a Poisson-like distribution with $E(y_t) = \mu_t$
and

$$\operatorname{Var}(y_t) = \sigma^2 \mu_t, \tag{1.1}$$
where $\sigma^2$ is known or unknown. The main point is that $\operatorname{Var}(y_t) \propto E(y_t)$, and thus the stationarity assumption of Hannan[1] is not valid here. Note that when $\{y_t\}$ is a Poisson process, (1.1) holds with $\sigma^2 = 1$. However, we avoid the assumption of a Poisson distribution and only assume (1.1), together with the additional assumption that $E|y_t - \mu_t|^{2+\delta}$ is a continuous (or bounded) function of $t$ for some $\delta > 0$. We shall also assume that $\mu_t$ is a periodic function with a log link function (see, for example, McCullagh and Nelder[6]), i.e.,

$$\log \mu_t = \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t), \qquad 0 \le \omega \le \pi, \tag{1.2}$$

where $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ are unknown parameters.

In Section 2, we discuss the quasi-likelihood function (McCullagh and Nelder[6]) and obtain estimating equations. We also fit the female UK death data using quasi-likelihood estimates. In Section 3, we discuss the strong consistency of our estimates, and in Section 4 we discuss the asymptotic normality of these estimates. Some concluding remarks are presented in Section 5.
2. QUASI-LIKELIHOOD AND PARAMETERS ESTIMATION
The quasi-likelihood function under assumption (1.1) can be described (McCullagh and Nelder[6]) as follows:

$$Q(\boldsymbol{\mu}; \mathbf{y}) = \frac{1}{\sigma^2}\sum_{t=1}^{n}\left[\, y_t \log(\mu_t / y_t) - (\mu_t - y_t) \right].$$
Thus the quasi-likelihood estimators of the parameter vector $\boldsymbol{\theta} = (\beta_0, \beta_1, \beta_2, \omega)^T$ can be obtained by minimizing

$$Q_n = -\sum_{t=1}^{n} \left( y_t \log \mu_t - \mu_t \right). \tag{2.1}$$
By differentiating $Q_n$ with respect to $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$, from (1.2), we get the following estimating equations:

$$\begin{aligned}
S_{0n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n} (y_t - \mu_t) = 0, \\
S_{1n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n} (y_t - \mu_t)\cos(\omega t) = 0, \\
S_{2n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n} (y_t - \mu_t)\sin(\omega t) = 0, \\
S_{3n}(\boldsymbol{\theta}) &= \sum_{t=1}^{n} t\,(y_t - \mu_t)\left(-\beta_1 \sin(\omega t) + \beta_2 \cos(\omega t)\right) = 0.
\end{aligned}$$
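The objective (2.1) and the four estimating functions can be sketched in a few lines of Python. This is only an illustration: the function names (`mean_mu`, `q_n`, `scores`, `numeric_gradient`), the coefficient names `b0`, `b1`, `b2`, `w`, and the toy counts are assumptions made here, not part of the paper. The sketch takes $Q_n$ with a leading minus sign, so that it is minimized, and checks numerically that the estimating functions equal the negative gradient of $Q_n$.

```python
import math

def mean_mu(theta, t):
    """Log-link periodic mean (1.2): exp(b0 + b1*cos(w*t) + b2*sin(w*t))."""
    b0, b1, b2, w = theta
    return math.exp(b0 + b1 * math.cos(w * t) + b2 * math.sin(w * t))

def q_n(theta, y):
    """Objective (2.1): Q_n = -sum_t (y_t*log(mu_t) - mu_t), with t = 1..n."""
    total = 0.0
    for t, yt in enumerate(y, start=1):
        mu = mean_mu(theta, t)
        total -= yt * math.log(mu) - mu
    return total

def scores(theta, y):
    """Estimating functions (S_0n, S_1n, S_2n, S_3n) evaluated at theta."""
    b0, b1, b2, w = theta
    s = [0.0, 0.0, 0.0, 0.0]
    for t, yt in enumerate(y, start=1):
        resid = yt - mean_mu(theta, t)
        c, sn = math.cos(w * t), math.sin(w * t)
        s[0] += resid
        s[1] += resid * c
        s[2] += resid * sn
        s[3] += t * resid * (-b1 * sn + b2 * c)
    return s

def numeric_gradient(theta, y, h=1e-6):
    """Central-difference gradient of q_n, used only as a sanity check."""
    grad = []
    for j in range(4):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        grad.append((q_n(up, y) - q_n(dn, y)) / (2 * h))
    return grad

# Made-up counts and parameter values, for illustration only.
y_toy = [3, 5, 2, 4, 1, 0, 2, 3, 6, 4, 2, 1]
theta_toy = (1.0, 0.3, -0.2, 0.5)

s_toy = scores(theta_toy, y_toy)
g_toy = numeric_gradient(theta_toy, y_toy)
# Each estimating function should match minus the corresponding derivative of Q_n.
max_gap = max(abs(si + gi) for si, gi in zip(s_toy, g_toy))
```

A root of all four estimating functions is therefore a stationary point of $Q_n$; whether it is the global minimizer still has to be checked, since $Q_n$ is not convex in the frequency.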
Estimates obtained as a solution to the above equations will be called quasi-likelihood estimates. We now fit the data on female UK deaths due to bronchitis, emphysema and asthma from Diggle.[4] The plot of the data and the plot of the periodogram suggest that there is only one harmonic component. The data also indicate that the variance is proportional to the mean. The quasi-likelihood estimates of $\beta_0$, $\beta_1$, $\beta_2$ and $\omega$ are obtained as

$$\hat{\beta}_0 = 6.289, \quad \hat{\beta}_1 = 0.2785, \quad \hat{\beta}_2 = 0.3074, \quad \hat{\omega} = 0.5198.$$
The corresponding asymptotic standard errors (see Section 4) are 0.051, 0.013, 0.012 and 0.0009, respectively. We also obtained the least squares estimates for the transformed data, $\log y_t$, i.e., the estimates obtained by minimizing $\sum_t (\log y_t - \beta_0 - \beta_1\cos(\omega t) - \beta_2\sin(\omega t))^2$. The estimates are given by

$$\hat{\beta}_0 = 6.2797, \quad \hat{\beta}_1 = 0.00245, \quad \hat{\beta}_2 = 0.05514, \quad \hat{\omega} = 0.7389.$$
Figures 1 and 2 show the fit of $\hat{\mu}_t$ along with the data values. Figure 1 shows the fit using the quasi-likelihood estimates and Figure 2 shows the fit using the least squares estimates on the log-transformed data.
Figure 1. Plot using the quasi-likelihood estimates. Here x-axis denotes months and y-axis denotes the number of deaths.
We also tried the least squares fit without transforming the data, and the fit was even worse than the one shown in Figure 2. The least squares fit is poor because the least squares estimates are influenced by the higher concentration, i.e., the low variability, of the lower data values and by the high variability of the higher values.
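One simple way to compute quasi-likelihood estimates numerically is sketched below. It is not necessarily the procedure used for the fit reported in this section: the sketch swaps in a grid search over the frequency, and for each candidate frequency it solves the first three estimating equations by Fisher scoring, keeping the frequency with the smallest value of (2.1). All function names, the grid, and the noise-free synthetic series are assumptions made for illustration.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for c in range(i, 4):
                m[r][c] -= f * m[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][c] * x[c] for c in range(i + 1, 3))) / m[i][i]
    return x

def design_row(w, t):
    """Regressors (1, cos(w*t), sin(w*t)) of the log-linear predictor."""
    return (1.0, math.cos(w * t), math.sin(w * t))

def fit_coefficients(y, w, iters=50, tol=1e-12):
    """Fisher scoring for (b0, b1, b2) with the frequency w held fixed."""
    n = len(y)
    rows = [design_row(w, t) for t in range(1, n + 1)]
    beta = [math.log(sum(y) / n), 0.0, 0.0]  # start from a constant mean
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, r))) for r in rows]
        score = [sum((yt - m) * r[j] for yt, m, r in zip(y, mu, rows))
                 for j in range(3)]
        info = [[sum(m * r[j] * r[k] for m, r in zip(mu, rows)) for k in range(3)]
                for j in range(3)]
        step = solve3(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def q_n(y, beta, w):
    """Objective (2.1) with a leading minus sign, so smaller is better."""
    total = 0.0
    for t, yt in enumerate(y, start=1):
        eta = sum(b * x for b, x in zip(beta, design_row(w, t)))
        total -= yt * eta - math.exp(eta)
    return total

def fit_by_grid(y, w_grid):
    """Return [b0, b1, b2, w] minimizing q_n over the supplied frequency grid."""
    best = None
    for w in w_grid:
        beta = fit_coefficients(y, w)
        q = q_n(y, beta, w)
        if best is None or q < best[0]:
            best = (q, beta, w)
    return best[1] + [best[2]]

# Noise-free synthetic series with made-up parameter values.
true_theta = (2.0, 0.3, -0.2, 0.5)
y_demo = [math.exp(2.0 + 0.3 * math.cos(0.5 * t) - 0.2 * math.sin(0.5 * t))
          for t in range(1, 61)]
w_grid = [k / 20 for k in range(1, 61)]  # 0.05, 0.10, ..., 3.00 (avoids 0 and pi)
theta_hat = fit_by_grid(y_demo, w_grid)
```

The grid deliberately excludes the endpoints 0 and $\pi$, where the sine regressor vanishes and the information matrix becomes singular. In practice the grid estimate of the frequency would be refined, for example by a Newton step on all four estimating equations.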
3. STRONG CONSISTENCY OF THE ESTIMATES
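Here we show that the quasi-likelihood estimates as defined in Section 2 are strongly consistent. Let $\boldsymbol{\theta}_0 = (\beta_0^{(0)}, \beta_1^{(0)}, \beta_2^{(0)}, \omega_0)$ be the true parameter vector. By Lemma 1 of the Appendix, it is enough to show that

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}} \frac{1}{n}\left[ Q_n(\boldsymbol{\theta}) - Q_n(\boldsymbol{\theta}_0) \right] > 0 \quad \text{a.s.} \tag{3.1}$$

for any $\delta > 0$ and $M > 0$, where $S_{\delta,M} = \{\boldsymbol{\theta} : |\boldsymbol{\theta} - \boldsymbol{\theta}_0| \ge \delta,\ |\boldsymbol{\theta}| \le M\}$.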
Figure 2. Plot using the least squares estimates on log transformed data. Here x-axis denotes months and y-axis denotes the number of deaths.
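Denoting $\mu_t^{(0)} = \exp\left(\beta_0^{(0)} + \beta_1^{(0)}\cos(\omega_0 t) + \beta_2^{(0)}\sin(\omega_0 t)\right)$ and $\varepsilon_t = y_t - \mu_t^{(0)}$, we get, from (1.2) and (2.1),

$$\begin{aligned}
\frac{1}{n}\left[ Q_n(\boldsymbol{\theta}) - Q_n(\boldsymbol{\theta}_0) \right]
= {} & -\frac{1}{n}\sum_{t=1}^{n} \varepsilon_t \left( \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t) \right) \\
& + \frac{1}{n}\sum_{t=1}^{n} \varepsilon_t \left( \beta_0^{(0)} + \beta_1^{(0)}\cos(\omega_0 t) + \beta_2^{(0)}\sin(\omega_0 t) \right) \\
& + \frac{1}{n}\sum_{t=1}^{n} \mu_t^{(0)} \left[ \exp\{\Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0) - 1 \right],
\end{aligned} \tag{3.2}$$

where

$$\Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0) = \beta_0 + \beta_1\cos(\omega t) + \beta_2\sin(\omega t) - \beta_0^{(0)} - \beta_1^{(0)}\cos(\omega_0 t) - \beta_2^{(0)}\sin(\omega_0 t).$$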
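The step from (2.1) to (3.2) can be spelled out as follows. This is a sketch under stated assumptions: $\eta_t(\theta)$ is a shorthand introduced only here for the linear predictor $\log\mu_t$ in (1.2), $\Delta_t = \eta_t(\theta) - \eta_t(\theta_0)$ is the difference of the two predictors, $\mu_t^{(0)} = e^{\eta_t(\theta_0)}$ is the true mean, $\varepsilon_t = y_t - \mu_t^{(0)}$, and $Q_n$ is taken with a leading minus sign so that it is minimized.

```latex
\begin{align*}
Q_n(\theta) - Q_n(\theta_0)
  &= \sum_{t=1}^{n} \Bigl[ -y_t\,\eta_t(\theta) + e^{\eta_t(\theta)}
       + y_t\,\eta_t(\theta_0) - e^{\eta_t(\theta_0)} \Bigr] \\
  &= \sum_{t=1}^{n} \Bigl[ -y_t\,\Delta_t + \mu_t^{(0)}\bigl(e^{\Delta_t} - 1\bigr) \Bigr]
     && \text{since } e^{\eta_t(\theta)} = \mu_t^{(0)} e^{\Delta_t} \\
  &= \sum_{t=1}^{n} \Bigl[ -\varepsilon_t\,\Delta_t
       + \mu_t^{(0)}\bigl(e^{\Delta_t} - \Delta_t - 1\bigr) \Bigr]
     && \text{since } y_t = \varepsilon_t + \mu_t^{(0)} .
\end{align*}
```

Dividing by $n$ and writing $-\varepsilon_t \Delta_t = -\varepsilon_t\,\eta_t(\theta) + \varepsilon_t\,\eta_t(\theta_0)$ gives the three sums in (3.2). The first two are noise terms, and the third is non-negative because $e^x - x - 1 \ge 0$ for all real $x$.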
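Now, since

$$\sum_{t=1}^{\infty} \frac{\operatorname{Var}(\varepsilon_t)}{t^2} \le \sigma^2 \exp\left( \beta_0^{(0)} + |\beta_1^{(0)}| + |\beta_2^{(0)}| \right) \sum_{t=1}^{\infty} \frac{1}{t^2} < \infty,$$

from Lemma 2 of the Appendix,

$$\sup_{0 \le \omega \le \pi} \left| \frac{1}{n}\sum_{t=1}^{n} \cos(\omega t)\,\varepsilon_t \right| \to 0 \quad \text{a.s.},$$

and

$$\sup_{0 \le \omega \le \pi} \left| \frac{1}{n}\sum_{t=1}^{n} \sin(\omega t)\,\varepsilon_t \right| \to 0 \quad \text{a.s.}$$

Thus, from (3.2) and the two limits above, (3.1) holds if we prove the following:

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}} \frac{1}{n}\sum_{t=1}^{n} \mu_t^{(0)} \left[ \exp\{\Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0) - 1 \right] > 0. \tag{3.3}$$

To prove (3.3), note that there exists $\eta > 0$ such that

$$e^x - x - 1 \ge \frac{x^2}{6} \quad \text{for } |x| < \eta$$

and

$$e^x - x - 1 > \gamma \quad \text{for } |x| \ge \eta,$$

where $\gamma > 0$ is some constant. Thus

$$\exp\{\Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0) - 1 \ge \min\left( \gamma,\ \frac{\Delta_t^2(\boldsymbol{\theta}, \boldsymbol{\theta}_0)}{6} \right).$$

Therefore,

$$\begin{aligned}
& \liminf_{n\to\infty}\ \inf_{S_{\delta,M}} \frac{1}{n}\sum_{t=1}^{n} \mu_t^{(0)} \left[ \exp\{\Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0)\} - \Delta_t(\boldsymbol{\theta}, \boldsymbol{\theta}_0) - 1 \right] \\
& \qquad \ge \exp\left( \beta_0^{(0)} - |\beta_1^{(0)}| - |\beta_2^{(0)}| \right) \min\left( \gamma,\ \liminf_{n\to\infty}\ \inf_{S_{\delta,M}} \frac{1}{6n}\sum_{t=1}^{n} \Delta_t^2(\boldsymbol{\theta}, \boldsymbol{\theta}_0) \right).
\end{aligned} \tag{3.4}$$

Now using the same argument as Kundu,[8] it follows that

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}} \frac{1}{n}\sum_{t=1}^{n} \Delta_t^2(\boldsymbol{\theta}, \boldsymbol{\theta}_0) > 0. \tag{3.5}$$

Thus (3.3) follows from (3.4) and (3.5).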
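The two elementary bounds on $e^x - x - 1$ used in this argument (a quadratic lower bound near zero and a positive lower bound away from zero) are easy to check numerically. The sketch below fixes the threshold at 1, which is a choice made here for illustration and not a value taken from the paper; the names `g`, `grid` and `gamma` are likewise hypothetical.

```python
import math

def g(x):
    """Return e^x - x - 1, computed via expm1 for accuracy near zero."""
    return math.expm1(x) - x

# Grid on [-1, 1]; the threshold 1 is an illustrative choice, not from the paper.
grid = [k / 1000 for k in range(-1000, 1001)]
quadratic_bound_holds = all(g(x) >= x * x / 6 for x in grid)

# g decreases on x < 0 and increases on x > 0, so for |x| >= 1 it is at least
# min(g(-1), g(1)); that minimum can play the role of the positive constant.
gamma = min(g(-1.0), g(1.0))

# The quadratic bound is only local: it already fails at x = -5.
fails_far_out = g(-5.0) < 25 / 6

print(quadratic_bound_holds, round(gamma, 4), fails_far_out)
```

The failure at $x = -5$ shows why the argument needs both bounds: for large negative $x$ the function grows only linearly, so no single quadratic bound can hold on the whole real line.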
4. ASYMPTOTIC NORMALITY OF THE ESTIMATES
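Here, we establish the asymptotic normality of the estimator $\hat{\boldsymbol{\theta}}_n$. Writing $S_n(\boldsymbol{\theta}) = (S_{0n}(\boldsymbol{\theta}), S_{1n}(\boldsymbol{\theta}), S_{2n}(\boldsymbol{\theta}), S_{3n}(\boldsymbol{\theta}))^T$, we get

$$0 = S_n(\hat{\boldsymbol{\theta}}_n) = S_n(\boldsymbol{\theta}_0) + S_n'(\bar{\boldsymbol{\theta}}_n)\left( \hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta}_0 \right),$$

where $\bar{\boldsymbol{\theta}}_n = h\boldsymbol{\theta}_0 + (1 - h)\hat{\boldsymbol{\theta}}_n$ for some 0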
1. (2) This paper presents Poisson regression for fitting time series data, which is a departure from the usual practice of assuming i.i.d. or stationary
residuals. In a similar manner, the results can be obtained for regressions that are in the form of the generalized linear model.

(3) When the dispersion parameter $\sigma^2$ is unknown, it can be estimated by

$$\hat{\sigma}^2 = \frac{1}{n - 4}\sum_{t=1}^{n} \frac{(y_t - \hat{\mu}_t)^2}{\hat{\mu}_t}.$$
It can also be seen that $\hat{\sigma}^2$ is a strongly consistent estimator under some additional assumptions on the moments of $y_t - \mu_t$. When $\sigma^2$ is known, the above quantity can be used as a measure of goodness of fit.
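The dispersion estimate in remark (3) is a Pearson-type statistic divided by $n - 4$, where 4 is the number of fitted parameters. A minimal sketch follows; the function name `dispersion_estimate`, its `n_params` argument, and the small example numbers are assumptions made here for illustration.

```python
def dispersion_estimate(y, mu_hat, n_params=4):
    """Pearson-type estimate: sum((y - mu_hat)^2 / mu_hat) / (n - n_params)."""
    n = len(y)
    if len(mu_hat) != n:
        raise ValueError("y and mu_hat must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than fitted parameters")
    pearson = sum((yt - m) ** 2 / m for yt, m in zip(y, mu_hat))
    return pearson / (n - n_params)

# Made-up observations and fitted means, for illustration only.
y_example = [3, 5, 4, 6, 8, 7]
mu_example = [4.0, 4.0, 4.0, 8.0, 8.0, 8.0]
sigma2_hat = dispersion_estimate(y_example, mu_example)
```

Values of the estimate well above 1 suggest overdispersion relative to a Poisson process, for which (1.1) holds with $\sigma^2 = 1$.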
APPENDIX
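Lemma 1. Let $\hat{\boldsymbol{\theta}}_n$ be an estimator of $\boldsymbol{\theta}$ obtained by minimizing a measurable function $Q_n(\boldsymbol{\theta})$ which converges to infinity a.s. as $|\boldsymbol{\theta}| \to \infty$. Let, for any $\delta > 0$ and $M > 0$, $S_{\delta,M} = \{\boldsymbol{\theta} : |\boldsymbol{\theta} - \boldsymbol{\theta}_0| \ge \delta,\ |\boldsymbol{\theta}| \le M\}$ for some fixed $\boldsymbol{\theta}_0$. If for any $\delta > 0$,

$$\liminf_{n\to\infty}\ \inf_{S_{\delta,M}} \frac{1}{n}\left[ Q_n(\boldsymbol{\theta}) - Q_n(\boldsymbol{\theta}_0) \right] > 0 \quad \text{a.s.},$$

then $\hat{\boldsymbol{\theta}}_n \to \boldsymbol{\theta}_0$ a.s. as $n \to \infty$.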
Proof. Suppose $\hat{\boldsymbol{\theta}}_n$ does not converge to $\boldsymbol{\theta}_0$ a.s. as $n \to \infty$; then either
ðA:1Þ
Case 1. There exist > 0 and 0 0
with positive probability,
for all $k \ge M_0$. This implies that $\hat{\boldsymbol{\theta}}_{n_k}$ does not minimize $Q_{n_k}(\boldsymbol{\theta})$. Thus the proof follows by contradiction.
Lemma 2. Let $X_1, X_2, \ldots$ be a sequence of independent random variables with mean zero and $E|X_t|^2 = \sigma_t^2$