Journal of Applied Statistics
ISSN: 0266-4763 (Print) 1360-0532 (Online) Journal homepage: http://www.tandfonline.com/loi/cjas20
A new modified Jackknifed estimator for the Poisson regression model Semra Türkan & Gamze Özel To cite this article: Semra Türkan & Gamze Özel (2016) A new modified Jackknifed estimator for the Poisson regression model, Journal of Applied Statistics, 43:10, 1892-1905, DOI: 10.1080/02664763.2015.1125861 To link to this article: http://dx.doi.org/10.1080/02664763.2015.1125861
Published online: 30 Dec 2015.
Submit your article to this journal
Article views: 98
View related articles
View Crossmark data
Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=cjas20 Download by: [Hacettepe University]
Date: 15 June 2016, At: 02:30
JOURNAL OF APPLIED STATISTICS, 2016 VOL. 43, NO. 10, 1892–1905 http://dx.doi.org/10.1080/02664763.2015.1125861
A new modified Jackknifed estimator for the Poisson regression model Semra Türkan and Gamze Özel
Downloaded by [Hacettepe University] at 02:30 15 June 2016
Department of Statistics, Hacettepe University, Ankara, Turkey ABSTRACT
ARTICLE HISTORY
The Poisson regression is very popular in applied researches when analyzing the count data. However, multicollinearity problem arises for the Poisson regression model when the independent variables are highly intercorrelated. Shrinkage estimator is a commonly applied solution to the general problem caused by multicollinearity. Recently, the ridge regression (RR) estimators and some methods for estimating the ridge parameter k in the Poisson regression have been proposed. It has been found that some estimators are better than the commonly used maximum-likelihood (ML) estimator and some other RR estimators. In this study, the modified Jackknifed Poisson ridge regression (MJPR) estimator is proposed to remedy the multicollinearity. A simulation study and a real data example are provided to evaluate the performance of estimators. Both mean-squared error and the percentage relative error are considered as the performance criteria. The simulation study and the real data example results show that the proposed MJPR method outperforms the Poisson ridge regression, Jackknifed Poisson ridge regression and the ML in all of the different situations evaluated in this paper.
Received 13 June 2014 Accepted 25 November 2015 KEYWORDS
Poisson regression; Jackknifed estimators; ridge regression; maximum likelihood; simulation; MSE AMS SUBJECT CLASSIFICATIONS
62J07; 62F10
1. Introduction Modeling of count variables is a common task in economics and the social sciences. The dependent variable often comes in the form of non-negative integers or counts in these fields. In that situation, one often applies the Poisson regression model which is usually estimated by the maximum likelihood (ML) method. The Poisson regression is not only the most widely used model for count data [4,27], but also it is very popular to estimate the parameters of multiplicative models [19,26]. However, the presence of multicollinearity among independent variables is a common problem in the Poisson regression model. It may lead to some of the regression coefficients being statistically insignificant and also to difficulties in interpreting the estimates of an individual coefficient [21]. For instance, Gråsjö [5] faced this problem since both government R&D and university R&D were used as independent variables when the author analyzed the number of patents for Swedish firms. Another example can be found in [6], where age, job experience and income were included in the regression model explaining how frequently credit-card holders failed CONTACT Semra Türkan © 2015 Taylor & Francis
[email protected]
JOURNAL OF APPLIED STATISTICS
1893
to meet their financial obligations. The problem of multicollinearity is that it leads to a high variance of the estimated coefficient vectors which makes difficult to interpret the estimates of the parameters. Hence, multicollinearity causes invalid statistical inferences and therefore, it is not possible to investigate the impact of different economic factors on the dependent variable. The most popular method to deal with this problem is the ridge regression (RR) analysis proposed by Hoerl and Kennard [9]. The RR estimation procedure is based on adding small positive number (k ≥ 0) to the diagonal of X X matrix which makes the RR estimator biased but ensures a smaller meansquared error (MSE) than the ordinary least square (OLS). Then, the RR estimator in the linear regression (LR) analysis is given by
Downloaded by [Hacettepe University] at 02:30 15 June 2016
βˆ RR = (XT X + kI)−1 XT y,
k ≥ 0,
(1)
where X is an n × p matrix of n observations on p independent variables, βˆ RR is an n × p vector of estimated parameters. Some different techniques for estimating k in Equation (1) are proposed by Alkhamisi et al. [1], Alkhamisi and Shukur [2], Khalaf and Shukur [13], Kibria [15], Manson and Shukur [20], and Muniz and Kibria [24]. In these studies, the performance of the RR estimator is compared based on simulations and the RR analysis is found as an effective analysis. Mansson and Shukur [20] proposed the Poisson ridge regression (PRR) method for the multicollinearity problem and showed that the PRR estimator outperformed the ML for the Poisson regression analysis. On the other hand, Liu [18] points out that it has the disadvantage when the estimated parameters are complicated nonlinear functions of the ridge parameter, k. Furthermore, Singh et al. [28] note that the RR estimators may carry a substantial amount of bias which has been ignored in the literature. They obtain an almost unbiased RR estimator using the Jackknifed procedure of bias reduction to solve this problem in the LR analysis. They demonstrate that the Jackknifed estimator has smaller bias as well as MSE than the RR under some conditions in the LR analysis. For this reason, Jackknifed estimator has become more popular in recent years. Hence, many papers have proposed the Jackknifed versions of the RR estimator in the LR analysis. Nyquist [25] showed the applications of the Jackknifed procedure in the RR. Gruber [7] compared the efficiency of Jackknifed and ridge-type estimators. Batah et al. [3] studied the effect of Jackknifing on various ridge-type estimators. Khurana et al. [14] proposed the second-order Jackknifed ridge that may reduce the bias further. They also showed that the transformation was not required for the Jackknifed procedure. The purpose of this paper is to solve the problem of an inflated MSE of the RR estimator by applying the Jackknifed estimator in the Poisson regression analysis. The rest of the paper is structured as follows: Section 2 describes the shrinkage estimators in the LR and the Poisson regression models. In Section 3, the modified Jackknifed Poisson ridge (MJPR) estimator for the Poisson regression model is proposed. The design of experiment and the results are discussed in Section 4. The MJPR estimator is assumed to perform better than the PRR and Jackknifed Poisson ridge regression (JPR) estimators. Then, the performances of the MJPR, PRR, and JPR are studied using Monte Carlo simulations. The MSE is used in order to judge the performance of the estimator. Some concluding remarks are given in Section 5.
1894
S. TÜRKAN AND G. ÖZEL
2. Methodology This section describes the shrinkage estimators in the LR and the Poisson regression models. 2.1. The linear regression estimators Consider the LR model given by
Downloaded by [Hacettepe University] at 02:30 15 June 2016
y = Xβ + u,
(2)
where y is an n × 1 response vector, X is an n × p matrix of coefficients, β is a p × 1 vector of unknown coefficients, and u is an n × 1 error vector with mean 0 and variance σ 2 I. Let Q = (q1 , q2 , . . . , qp ) be a p × p matrix whose columns are normalized eigen vectors of XT X and = diag(λ1 , λ2 , . . . , λp ), such that XT X = QQT . The LR model in Equation (2) can be written as y = Zγ + u,
(3)
where Z = XQ and γ = QT β. The OLS estimators of γ and β in Equation (3) are given by, respectively γˆ OLS = −1 ZT y,
(4)
βˆ OLS = Qγˆ OLS .
(5)
Note that, due to the relation γ = QT β, any estimator of γ has a corresponding βˆ = Qγˆ . Hence, it is sufficient to consider only the canonical form. The RR estimator is proposed by Hoerl and Kennard [10] as γˆ RR = ( + kI)−1 ZT y = (I − kA−1 )γˆ OLS ,
(6)
where k > 0 is the shrinkage parameter and A = ( + kI). Here, it is indicated that Jackknifed procedure is applied on a transformed set of regressors. Recently, Khurana et al. [14] have shown that the transformation is not required and it is easy to get the estimator for the original regression parameter explicitly. As they stated that the MJRR was introduced for the transformed parameter, we have used the transformed model in Equation (3). Then, the MSE of the RR estimator is obtained by Jadhav and Kashid [11] as MSE(γˆ RR ) = E(γˆ RR − γ )T (γˆ RR − γ ) = σ 2 (I − kA−1 )−1 (I − kA−1 ) + k2 A−1 γ γ T A−1 .
(7)
Singh et al. [28] proposed the Jackknifed ridge regression (JRR) estimator as γˆ JRR = (I + kA−1 )−1 γˆ RR = (I − k2 A−2 )γˆ OLS .
(8)
Then, Jadhav and Kashid [11] obtained the MSE of the JRR estimator as MSE(γˆ JRR ) = E(γˆ JRR − γ )T (γˆ JRR − γ ) = σ 2 (I − k2 A−2 )−1 (I − k2 A−2 ) + k4 A−2 γ γ T A−2
(9)
JOURNAL OF APPLIED STATISTICS
1895
The modified Jackknifed ridge regression (MJRR) estimator is proposed by Batah et al. [3] as γˆ MJRR = (I − k2 A−2 )γˆ RR = (I − k2 A−2 )(I − kA−1 )γˆ OLS .
(10)
The MSE of the MJRR estimator in Equation (10) is given by MSE(γˆ MJRR ) = E(γˆ MJRR − γ )T (γˆ MJRR − γ ) = σ 2 (I − k2 A−2 )(I − kA−1 )−1 (I − kA−1 )(I − k2 A−2 ) + k2 wA−1 γ γ T A−1 w,
(11)
Downloaded by [Hacettepe University] at 02:30 15 June 2016
where w = (I + kA−1 − k2 A−2 ). 2.2. The Poisson regression estimators The Poisson regression model is a common method for analyzing count data in applied researches. There is a widespread usage in microeconometrics when the dependent variable yi is Poisson distributed, where μi = exp (xi β). Here, xi is the ith row of X which is an n × p data matrix with p independent variables and β is an n × p vector of coefficients. β can be estimated by maximizing the log-likelihood given by (β) =
n
[− exp(xi β) + (xi β)yi − log(yi !)].
i=1
The vector of coefficients using the ML is then estimated by solving the following equation: ∂(β) = [yi − exp(xi β)]xi = 0. ∂β n
S(β) =
(12)
i=1
Since Equation (12) is nonlinear in β, the solution of the score vector S(β) is found by the iterative weighted least-square algorithm ˆ −1 XT Wˆ ˆ s, βˆ ML = (XT WX)
(13)
ˆ = diag [μˆ i ] and sˆ is a vector where the ith element equals to sˆi = log(μˆ i ) + ((yi − where W μˆ i )/μˆ i ). The ML estimator is asymptotically normally distributed with a covariance matrix that corresponds to the inverse of the matrix of the second derivative ˆ −1 . Cov(βˆ ML ) = (XT WX)
(14)
The MSE of Equation (13) can be written as ˆ −1 ] = E(βˆ ML − β)T (βˆ ML − β) = tr [(XT WX)
p
1
j=1
λj ML
,
(15)
ˆ matrix. Note that the weighted matrix of where λjML is the jth eigenvalue of the XT WX T ˆ is ill-conditioned which leads to instability and high variance of cross products, X WX,
1896
S. TÜRKAN AND G. ÖZEL
the ML estimator when the independent variables are highly correlated. In that situation, it is difficult to interpret the estimated parameters since the vector of estimated coefficients becomes too long [20]. Due to the presence of multicollinearity in the Poisson regression analysis, the PRR is proposed by Månsson and Shukur [20] as follows: ˆ + kI)−1 XT WX ˆ βˆ ML βˆ PRR = (XT WX ˆ + kI)−1 XT Wˆ ˆ s. = (XT WX
(16)
Downloaded by [Hacettepe University] at 02:30 15 June 2016
Note that this type of shrinkage estimator minimizes the increase in the weighted sum of squared error. Hence, the shrinkage parameter, k, may take on values between zero and infinity. βˆ RR < βˆ ML , for k > 0 and βˆ RR = βˆ ML , for k = 0.
3. Proposed modified Jackknifed Poisson ridge regression estimator In this section, the MJPR estimator is derived by following the study of Singh et al. [28] in proposing a Jackknifed form of the ridge estimator in the LR model. Then, the MSE property of the proposed estimator is obtained and several different methods of estimating the shrinkage parameter, k, are described. Let G = (g1 , g2 , . . . , gp ) is a p × p matrix whose columns are normalized eigen vectors ˆ and PR = diag(λ1PR , λ2PR , . . . , λpPR ), such that GT XT WXG ˆ ˆ = PR of XT WX = ZT WZ and Z = XG. The Poisson estimators of βˆ ML in Equation (13) can be written as T ˆ γˆ ML = −1 PR Z Wˆs,
βˆ ML = Gγˆ ML . The PRR is rewritten as ˆ s = (I − kB−1 )γˆ ML , γˆ PRR = (PR + kI)−1 ZT Wˆ
(17)
where B = (PR + kI) and k > 0. Let s−i , z−i , and W[−i] denote, respectively, the vector s with its the ith row deleted, the matrix Z with the ith row deleted, and the matrix W with the ith both row and column deleted. The PRR estimator with the ith observation deleted is given by T ˆ −1 T ˆ γ PRR −i = (Z−i W[−i] Z−i + kI) Z−i W[−i] sˆ −i ,
(18)
T ˆ ˆ [−i] sˆ−i equals to ZT Wˆ ˆ s − zT µ where ZT−i W i ˆ i sˆi and the inverse matrix (Z−i W[−i] Z−i + −1 kI) is obtained from the Sherman–Morrison Woodbury theorem as
ˆ [−i] Z−i + kI)−1 = (ZT WZ ˆ + kI)−1 + (ZT−i W
ˆ + kI)−1 zi zT (ZT WZ ˆ + kI)−1 μˆ i (ZT WZ i . ˆ + kI)−1 zi μˆ i 1 − zT (ZT WZ i
Then, we have γˆ JPR −i
= γˆ
PRR
ˆ + kI)−1 zT μˆ i (ˆsi − zT γˆ PRR ) (ZT WZ i i − , T T −1 ˆ 1 − z (Z WZ + kI) zi μˆ i i
where γˆ
PRR
=
ˆ + kI)−1 ZT Wˆ ˆ s. (ZT WZ
JOURNAL OF APPLIED STATISTICS
1897
Motivated by Hinkley [8], we obtain theJPR estimator as ˆ + kI)−1 γˆ JPR = γˆ PRR + (ZT WZ
n
z Ti μˆ i (ˆsi − z Ti γˆ PRR ),
(19)
i=1
which can be simplified to yield γˆ JPR = (I + B−1 k)γˆ PRR = (I − (B−1 k)2 )γˆ ML ,
(20)
ˆ + kI). where B = (ZT WZ From Equation (20), we obtain the MJPR as
Downloaded by [Hacettepe University] at 02:30 15 June 2016
γˆ MJPR = (I − B−2 k2 )(I − B−1 k)γˆ ML .
(21)
Then, the MSEs of γˆ PRR , γˆ JPR , and γˆ MJPR are obtained as, respectively MSE(γˆ PRR ) = Var(γˆ PRR ) + [bias(γˆ PRR )][bias(γˆ PRR )]T −1 2 −1 T −1 = (I − kB−1 )−1 PR (I − kB ) + k B γ γ B ,
MSE(γˆ
JPR
) = Var(γˆ =
JPR
) + [bias(γˆ
JPR
)][bias(γˆ
JPR
)]
2 −2 4 −2 T −2 (I − k B )−1 PR (I − k B ) + k B γ γ B , MJPR MJPR MJPR T 2 −2
MSE(γˆ MJPR ) = Var(γˆ
) + [bias(γˆ
)][bias(γˆ
(22)
T
(23)
)]
−1 2 −2 = (I − k2 B−2 )(I − kB−1 )−1 PR (I − kB )(I − k B )
+ k2 wP B−1 γ γ T B−1 wP ,
(24)
where wP = (I + kB−1 − k2 B−2 ). 3.1. Ridge parameter estimators There is no definite rule for estimating the ridge parameter, k. However, several methods have been proposed for the RR model, and these are generalized in this paper to be applicable for the MJPR estimator. Hence, following the previous studies on estimation of the ridge parameter, the five most commonly used estimators are considered. The first estimator proposed by Hoerl and Kennard [9,10] as σˆ 2 MJPR1 = kˆ 1 = 2 , αˆ max
(25)
ML 2 ˆ is the maximum element of δ T βˆ . Here, δ is the eigen vector of XT WX where αˆ max n 2 2 and σˆ = i=1 (yi − μˆ i ) /(n − p − 1).The second estimator of k introduced by Schaeffer et al. [27] as
MJPR2 = kˆ 2 =
1 . 2 αˆ max
1898
S. TÜRKAN AND G. ÖZEL
Then, the following estimators are proposed by Kibria et al. [16,17] as MJPR3 = kˆ 3 = median(m2i ), MJPR4 = kˆ 4 = max(mi ), 1 , MJPR5 = kˆ 5 = max mi where mi =
ML σˆ 2 /αˆ i2 and αˆ i is the ith element of δ T βˆ .
Downloaded by [Hacettepe University] at 02:30 15 June 2016
4. Application In this section, both the simulated data and the real data are used to evaluate the performance of the proposed MJPR estimator over the PRR, JPR, and ML estimators. We start by describing how we generate the data and which factors have been varied in the design of the experiment. Then, a discussion of the results obtained from the Monte Carlo simulations is provided. 4.1. The Monte Carlo simulation The dependent variable of the Poisson regression model is generated from P0 (μi ) where μi = exp(xi β),
i = 1, 2, . . . , n.
(26)
The starting values of the parameter in Equation (26) are selected as normalized eigen vectors corresponding to the largest eigen values of XT X matrix so that β T β = 1. The MSE is minimized when β is normalized eigen vectors corresponding to the largest eigen value of XT X matrix subject to constraint that β T β = 1 [15]. Several values for the coefficient of correlation and the sample sizes are considered in the simulation study. The following formula is used to generate the data with several degrees of the coefficient of correlation: xij = (1 − ρ 2 )1/2 mij + ρmip ,
i = 1, 2, . . . , n, j = 1, 2, . . . , p,
(27)
where mij are random numbers generated from the standard normal distribution and p is the number of parameters [15]. Then, based on the independent variables, the dependent variable of the Poisson regression model is generated. Three different values of ρ 2 corresponding to 0.9, 0.95, and 0.99 and four different sample sizes corresponding to 15, 30, 50, and 100 are considered in the simulation study. For given values of n, p and ρ 2 , the set of independent variables are generated. Then, the experiment was repeated 1000 times. For each replicate, the values of k for the different proposed estimators and corresponding the ridge estimators are calculated. To illustrate the theoretical results and to evaluate the performance of various estimators, the estimated MSE (EMSE) is used. The EMSE of each estimator is obtained by replacing all unknown parameters in the corresponding theoretical MSE expression of that estimator [12]. The average of the EMSE (AEMSE) for the PRR is obtained by using the average of the sum of estimated variances (ASEVAR) for the
JOURNAL OF APPLIED STATISTICS
1899
PRR and the average of the sum of estimated squared bias (ASESB) for the PRR given in Equation (22) as AEMSE = ASEVAR + ASESB 1000 p 1000 p 1 λPR 1 k2 γˆ PR = + , 1000 i=1 j=1 (λPR + k)2 1000 i=1 j=1 (λPR + k)2
(28)
Downloaded by [Hacettepe University] at 02:30 15 June 2016
where γˆ PR is the estimate of γ obtained from the PRR. Using the same technique, the AEMSEs of the ML, JPR, MJPR are calculated and the relative percentage errors obtained by using the AEMSE of the ML, PRR, JPR, MJPR are given in Table 1. The ASEVAR and ASESB of the ML, PRR, JPR, MJPR are given in Table 3. The estimated MSE of marginal effects is also calculated using following formula: 1000 1 EMSEM = 1000 s=1
n
n
ˆ i γˆj i=1 μ n
−
i=1 μi γj
n
2 ,
(29)
n μ ˆ γ ˆ /n and each true slope coefficient is where each slope coefficient is i j i=1
n i=1 μi γj /n [20]. In Table 2, the AEMSEs of the slope coefficients and the marginal effects are given for the Poisson regression model including four independent variables and ρ = 0.99. Table 1 presents the percentage relative error (PRE) values of the MJPR, JPR, PRR, and ML. The proportions of replications for which the MJPR estimators produced a smaller ratio of the AEMSE than the JPR, PRR, and the ML estimators are calculated and given in the parenthesis. As seen from Table 1, the AEMSE ratios of the ML, PRR, and JPR estimators over the MJPR estimators are greater than one. It means that the MJPR estimators are more efficient than the ML, PRR, and JPR estimators in all situations. In addition, the smaller AEMSE frequency of the MJPR estimators is greater than that of the ML, PRR, and JPR estimators. As seen from Table 2, all of the proposed MJPR estimators outperform the JPR, PRR, and the ML estimators in the sense that they have smaller AEMSEs for both slope parameters and marginal effects. Table 3 shows the ASEVAR and ASESB of the MJPR, JPR, PRR, and ML. As shown in Table 3, the ASEVARs of each estimator are increasing with the increasing of the degree of multicollinearity (ρ) and decreasing with the increasing value of n. The MJPR consistently shows smaller ASEVAR than that of the other estimators in all cases. Notice that the JPR has also served its purpose by reducing the bias in almost all of the cases. However, the MJPR3 performs better than the JPR3 in sense of lower bias. 4.2. Real data application In this study, the data of 18 football teams for the 2012–2013 Super League Season in Turkey is analyzed using the Poisson regression model [29,30]. The dependent variable is the number of won matches (NWM). The model deviance test is used to decide if the Poisson regression model is appropriate to the data set [23]. The value of the residual deviance statistic is 5.629 with 12 df and the p-value is .9335. It shows that the
1900
Downloaded by [Hacettepe University] at 02:30 15 June 2016
n 15
ρ
ML MJPR1
PRR1 MJPR1
JPR1 MJPR1
ML MJPR2
PRR2 MJPR2
JPR2 MJPR2
ML MJPR3
PRR3 MJPR3
JPR3 MJPR3
ML MJPR4
PRR4 MJPR4
JPR4 MJPR4
ML MJPR5
PRR5 MJPR5
JPR5 MJPR5
0.90
2.30 (99) 4.49 (99) 15.24 (99)
1.16 (98) 1.38 (98) 1.78 (99)
1.72 (98) 2.39 (98) 3.52 (99)
2.06 (99) 5.32 (99) 12.62 (99)
1.18 (99) 1.46 (99) 1.77 (99)
1.84 (99) 2.57 (99) 3.52 (99)
2.51 (100) 5.65 (100) 12.94 (100)
1.18 (99) 2.52 (99) 6.59 (99)
4.09 (99) 5.36 (99) 3.45 (99)
5.43 (100) 14.79 (100) 84.10 (100)
1.51 (99) 2.28 (99) 3.72 (99)
2.77 (99) 5.06 (99) 9.61 (99)
1.62 (100) 4.87 (100) 35.07 (100)
1.09 (100) 1.64 (100) 2.13 (100)
1.48 (100) 3.20 (100) 4.82 (100)
3.41 (99) 9.68 (99) 39.79 (99)
1.28 (99) 1.22 (99) 2.27 (99)
2.10 (99) 1.95 (99) 5.22 (99)
2.23 (99) 6.58 (99) 13.95 (99)
1.05 (99) 1.27 (99) 1.41 (98)
1.35 (99) 2.06 (99) 2.90 (99)
3.66 (100) 15.28 (100) 33.84 (100)
2.06 (99) 7.84 (100) 16.22 (100)
2.53 (99) 1.08 (99) 3.96 (100)
4.99 (100) 19.81 (100) 85.10 (100)
1.23 (100) 1.85 (100) 2.66 (99)
1.94 (100) 3.94 (100) 6.47 (99)
1.46 (100) 4.46 (100) 16.47 (100)
1.19 (100) 1.08 (100) 1.66 (100)
1.27 (100) 1.47 (100) 3.30 (100)
61.13 (99) 7.3 (99) 15.24 (99)
2.00 (99) 1.23 (99) 1.88 (98)
2.21 (99) 1.97 (99) 3.89 (98)
20.47 (99) 2.82 (99) 5.58 (99)
1.13 (99) 1.10 (99) 1.49 (99)
1.66 (99) 1.56 (99) 2.73 (99)
62.90 (100) 7.17 (100) 10.68 (100)
41.73 (99) 3.73 (100) 5.05 (100)
1.73 (99) 3.13 (99) 4.62 (100)
53.58 (100) 6.99 (100) 20.32 (100)
1.21 (99) 1.24 (100) 2.18 (99)
1.96 (99) 2.01 (100) 4.74 (99)
14.97 (100) 1.71 (100) 2.72 (100)
1.02 (100) 1.09 (100) 1.25 (100)
1.22 (100) 1.51 (100) 1.98 (100)
3.00 (100) 3.13 (100) 7.82 (99)
1.05 (100) 1.06 (98) 1.46 (98)
1.35 (99) 1.38 (96) 2.68 (98)
1.26 (99) 3.17 (99) 3.18 (100)
1.03 (99) 1.01 (99) 1.23 (99)
1.23 (99) 1.13 (99) 1.94 (99)
3.53 (100) 3.26 (100) 12.54 (100)
2.14 (100) 1.93 (100) 5.93 (99)
2.24 (99) 2.25 (100) 4.23 (100)
2.50 (100) 2.92 (100) 9.68 (100)
1.13 (100) 1.16 (99) 1.82 (100)
1.56 (100) 1.71 (99) 3.69 (100)
1.26 (100) 1.13 (100) 1.71 (100)
1.00 (100) 1.01 (100) 1.09 (100)
1.06 (100) 1.13 (100) 1.52 (100)
0.95 0.99 30
0.90 0.95 0.99
50
0.90 0.95 0.99
100
0.90 0.95 0.99
S. TÜRKAN AND G. ÖZEL
Table 1. The PRE of the ML, PRR, JPR, and MJPR estimators.
Estimated MSE for slope parameters
ML
15 30 50 100
9.083 8.595 1.890 0.978
15 30 50 100
6570.7 3068.0 1842.6 1141.5
PRR1 1.058 0.491 0.233 0.183 514.3 355.2 328.4 249.3
JPR1 2.099 1.128 0.482 0.335 909.6 743.9 580.9 415.7
MJPR1 0.596 0.216 0.124 0.125 454.2 163.5 282.7 140.0
PRR2 1.271 0.869 0.504 0.378
JPR2 2.532 1.789 0.927 0.598
MJPR2 0.720 0.616 0.339 0.308
PRR3 0.702 0.254 0.177 0.078
Estimated MSE for marginal effects 471.4 899.9 414.7 432.6 444.3 850.4 279.2 159.3 449.3 784.5 367.7 119.4 407.1 617.9 364.9 132.6
JPR3 1.379 0.530 0.374 0.165 897.6 339.7 246.4 275.2
MJPR3 0.40 0.134 0.081 0.039 288.8 108.9 41.31 107.9
PRR4 0.402 0.269 0.203 0.184 123.8 107.6 107.7 158.7
JPR4 1.038 0.653 0.441 0.373 260.6 223.4 222.6 303.1
MJPR4 0.108 0.101 0.093 0.101 14.03 55.28 61.5 125.6
PRR5 0.552 0.869 0.872 0.625 72.6 212.5 341.8 721.4
JPR5 1.248 1.723 1.379 0.867 170.4 385.9 554.4 965.6
MJPR5 0.259 0.522 0.696 0.572 21.91 155.8 292.7 508.8
JOURNAL OF APPLIED STATISTICS
Downloaded by [Hacettepe University] at 02:30 15 June 2016
Table 2. The estimated MSE for the slope coefficients and the marginal effects for p = 4 and ρ = 0.99.
1901
1902
Table 3. Variance and Bias2 of the PRR, JPR and MJPR estimators.
Downloaded by [Hacettepe University] at 02:30 15 June 2016
ρ
n = 15
0.90 0.95 0.99
n = 30
JPR1 Var
Bias2
0.027 0.106 0.275
0.353 0.616 0.783
0.90 0.95 0.99
0.021 0.026 0.159
n = 50
0.90 0.95 0.99
n = 100
0.90 0.95 0.99
n
Var
0.012 0.071 0.336
0.551 0.183 1.763
0.165 0.243 0.332
0.013 0.012 0.270
0.060 0.014 0.061
0.082 0.128 0.172
0.001 0.002 0.028
0.041 0.073 0.155
n = 15 n = 30 n = 50 n = 100
PRR2
Var
Bias2
0.027 0.087 0.156
0.301 0.437 0.440
0.292 0.420 0.858
0.019 0.027 0.067
0.008 0.007 0.083
0.149 0.219 0.399
0.000 0.000 0.023
0.054 0.098 0.312
PRR4 n
MJPR1 Bias2
JPR4
JPR2 Var
Bias2
0.030 0.113 0.320
0.403 0.531 0.951
0.126 0.194 0.149
0.005 0.046 0.196
0.011 0.014 0.032
0.060 0.101 0.092
0.001 0.002 0.022
0.039 0.069 0.103
MJPR4
MJPR2 Var
Bias2
0.010 0.080 0.343
0.664 1.054 2.189
0.227 0.368 0.673
0.001 0.022 0.359
0.013 0.017 0.100
0.226 0.312 0.404
0.001 0.000 0.039
0.097 0.071 0.339
PRR3
Var
Bias2
0.034 0.086 0.204
0.332 0.356 0.516
0.298 0.646 1.430
0.006 0.043 0.167
0.002 0.003 0.118
0.348 0.463 0.809
0.000 0.000 0.013
0.117 0.079 0.585
PRR5
JPR5
JPR3 Var
Bias2
0.098 0.137 0.178
0.203 0.279 0.524
0.216 0.282 0.449
0.021 0.032 0.074
0.015 0.019 0.071
0.197 0.279 0.268
0.001 0.000 0.041
0.094 0.070 0.267
MJPR5
ρ
Bias2
Var
Bias2
Var
Bias2
Var
Bias2
Var
Bias2
Var
Bias2
Var
0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99 0.90 0.95 0.99
0.037 0.112 0.149 0.013 0.046 0.091 0.009 0.015 0.059 0.003 0.006 0.044
0.173 0.251 0.253 0.109 0.154 0.178 0.089 0.134 0.144 0.051 0.082 0.140
0.031 0.186 0.301 0.005 0.074 0.186 0.004 0.006 0.082 0.001 0.001 0.044
0.354 0.619 0.737 0.187 0.352 0.467 0.155 0.235 0.359 0.074 0.129 0.329
0.027 0.043 0.038 0.013 0.022 0.027 0.009 0.014 0.027 0.003 0.006 0.028
0.112 0.116 0.070 0.086 0.086 0.074 0.072 0.106 0.066 0.045 0.070 0.073
0.021 0.131 0.131 0.061 0.025 0.151 0.003 0.029 0.101 0.000 0.001 0.034
0.484 0.659 0.421 0.344 0.494 0.718 0.293 0.509 0.771 0.095 0.197 0.591
0.003 0.140 0.237 0.000 0.005 0.187 0.000 0.005 0.056 0.000 0.000 0.006
0.685 1.405 1.011 0.429 0.702 1.536 0.353 0.739 01.323 0.101 0.221 0.861
0.025 0.082 0.049 0.007 0.028 0.084 0.004 0.033 0.089 0.000 0.001 0.040
0.44 0.401 0.210 0.332 0.452 0.438 0.286 0.459 0.607 0.095 0.195 0.532
MJPR3 Var
Bias2
Var
0.238 0.208 0.216
0.400 0.725 1.163
0.018 0.060 0.100
0.138 0.114 0.300
0.114 0.108 0.180
0.015 0.036 0.121
0.225 0.237 0.409
0.017 0.191 0.032
0.078 0.062 0.102
0.005 0.026 0.056
0.064 0.091 0.121
0.002 0.028 0.072
0.102 0.197 0.302
0.006 0.016 0.027
0.054 0.056 0.054
0.005 0.009 0.021
0.029 0.059 0.057
0.003 0.005 0.027
0.053 0.110 0.138
0.004 0.008 0.011
0.021 0.043 0.028
S. TÜRKAN AND G. ÖZEL
PRR1 Bias2
JOURNAL OF APPLIED STATISTICS
1903
Table 4. The correlation matrix of the independent variables.
Downloaded by [Hacettepe University] at 02:30 15 June 2016
NRC NS NOG NCG NGR1 NGR2
NRC
NS
NOG
NCG
NGR1
NGR2
1.00 −0.025 0.23 −0.12 0.15 0.45
1.00 −0.19 0.03 0.10 0.28
1.00 0.64 0.74 0.57
1.00 0.85 0.85
1.00 0.92
1.00
Poisson regression model fits well to the data. The independent variables that explain the NWM are determined as the number of red cards (NRC), the number of substitutions (NS), the number of matches ending over 2.5 goals (NOG), the number of matches completed with goals (NCG), the ratio of the goals scores in number of matches (NGR1 = NGS/NM), and the ratio of goals scores in the sum of goals conceded and goal scores [NGR2 = NGS/(NGC + NGS)]. Firstly, the bivariate correlations among independent variables are presented in Table 4. From Table 4, it is seen that the bivariate correlations between the NGR1, NGR2, and the NCG are high (greater than 0.80). It means that these variables are correlated with each other and there could be multicollinearity in the data. To investigate this, the variance inflation factor (VIF) values, which are the eigen values of the correlation matrix of independent variables, are calculated. Marquardt and Snee [22] suggest that the VIF values greater than 10 indicates the multicollinearity. The VIF values are obtained as 1.40, 1.53, 3.13, 5.79, 14.17, and 12.15. Furthermore, the condition number, which is the ratio of the largest to the smallest eigenvalue, is 16381.95 > 1000. It shows that there is a strong multicollinearity in the model [23]. Therefore, it can be concluded that there is a multicollinearity in the data. Table 5 presents the parameter estimates and standard errors obtained from the PRR, JPR, and MJPR. The PRR, JPR, and MJPR estimators are applied with the ridge parameters PRR3, PRR4, MJPR3, and MJPR4 since these have the best overall performance. It can be seen that the NRC, the NS, the number of matches ending over 2.5 goals (NOG), the number of matches completed with goals (NCG), the ratio of the goals scores in number of matches (NGR1 = NGS/NM) have negative impacts, while the ratio of goals scores in the sum of goals conceded and goal scores [NGR2 = NGS/(NGC + NGS)] has positive impact on the NWM for the ridge parameter, k3 . Similarly, the NRC, the NS, the Table 5. The parameter estimates and standard errors obtained from the PRR, JPR, and MJPR estimators. γˆ1 γˆ2 γˆ3 γˆ4 γˆ5 γˆ6
γˆML
PRR3
JPR3
MJPR3
PRR4
JPR4
MJPR4
−0.02 (4.27E-07) −0.05 (0.0002) −0.02 (0.001) 0.05 (0.002) −0.006 (0.004) 3.98 (10.06)
−0.02 (4.27E-07) −0.05 (0.0002) −0.02 (0.0007) −0.04 (0.001) −0.004 (0.001) 0.003 (3.68E-09)
−0.02 (4.27E-07) −0.05 (0.0002) −0.02 (0.0009) −0.04 (0.002) −0.005 (0.002) 0.006 (1.47E-08)
−0.02 (4.27E-07) −0.05 (0.0002) −0.01 (0.0007) −0.03 (0.0009) −0.003 (0.0008) 0.000004 (7.31E-15)
−0.02 (4.27E-07) −0.05 (0.0009) −0.02 (0.002) 0.04 (0.003) −0.005 (0.004) 0.01 (3.99E-07)
−0.02 (4.27E-07) −0.05 (0.0002) −0.02 (0.001) 0.04 (0.002) −0.006 (0.004) 0.03 (1.59E-06)
−0.02 (4.27E-07) −0.05 (0.0002) −0.02 (0.0009) 0.04 (0.002) −0.006 (0.003) 0.00009 (1.79E-11)
1904
S. TÜRKAN AND G. ÖZEL
number of matches ending over 2.5 goals (NOG), the ratio of the goals scores in number of matches (NGR1 = NGS/NM) have negative impacts, while the number of matches completed with goals (NCG), the ratio of goals scores in the sum of goals conceded and goal scores [NGR2 = NGS/(NGC + NGS)] have positive impacts on the NWM for the ridge parameter k4 . Table 5 shows that the values of the standard errors become smaller for some coefficients when the MJPR is applied. The most substantial decrease in the standard error can be obtained for γˆ6 when applying MJPR3.
Downloaded by [Hacettepe University] at 02:30 15 June 2016
5. Summary and conclusions Multicollinearity is a common problem in applied researches and it leads to some of the regression coefficients being statistically insignificant. Hence, multicollinearity makes difficult to have valid statistical inference. In this paper, a new modified Jackknifed estimator for the Poisson regression model is proposed in order to solve the problem of multicollinearity. Then, some methods of estimating the shrinkage parameter that were developed for the LR model are generalized. A Monte Carlo simulation study was conducted to compare the performance of the estimators using the AEMSE, ASEVAR, and ASESB. The simulation results show that the performance of the proposed MJPR estimator is better than ML, PRR, and JPR estimators in the presence of multicollinearity with respect to both AEMSE ratios and frequencies.
Acknowledgements The authors thank editor and the referees for their constructive comments and suggestions which lead to significant improvements in this paper.
References [1] M. Alkhamisi, G. Khalaf, and G. Shukur, Some modifications for choosing ridge parameters, Commun. Stat. Theory Math. 35 (2006), pp. 2005–2020. [2] M. Alkhamisi and G. Shukur, Developing ridge parameters for SUR model, Commun. Stat. Theory Math. 37 (2008), pp. 544–564. [3] F.S.M. Batah, T.V. Ramanathan, and S.D. Gore, The efficiency of modified Jackknife and ridge type regression estimators: A comparison, Surv. Math. Appl. 3 (2008), pp. 111–122. [4] C. Cameron and P.K. Trivedi, Regression Analysis of Count Data, Cambridge University Press, Cambridge, 1998. [5] U. Gråsjö, Accessibility to R&D and Patent Production, Royal Institute of Technology, Centre of Excellence for Science and Innovation Studies, Stockholm, 2005. [6] W.H. Greene, Accounting for Excess of Zeros and Sample Selection in Poisson and Negative Binomial Regression Model, Discussion Paper EC-94-10, Department of Economics, New York University, New York, 1994. [7] M.H.J. Gruber, The efficiency of Jackknife and usual ridge type estimators: A comparison, Stat. Probab. Lett. 11 (1991), pp. 49–51. [8] D.V. Hinkley, Jackknifing in unbalanced situations, Technometrics 19 (1977), pp. 285–292. [9] A.E. Hoerl and R.W. Kennard, Ridge regression: Biased estimation for non-orthogonal problems, Technometrics 12 (1970), pp. 55–67. [10] A.E. Hoerl and R.W. Kennard, Ridge regression: Application to non-orthogonal problems, Technometrics 12 (1970), pp. 69–82. [11] N.H. Jadhav and D.N. Kashid, A Jackknifed ridge M-estimator for regression model with multicollinearity and outliers, J. Stat. Theory Pract. 5 (2011), pp. 659–673.
Downloaded by [Hacettepe University] at 02:30 15 June 2016
JOURNAL OF APPLIED STATISTICS
1905
[12] N.H. Jadhav and D.N. Kashid, Robust linearized ridge M-estimator for inean regression model, Commun. Stat. Simul. Comput. 45 (2016), pp. 1–24. [13] G. Khalaf and G. Shukur, Choosing ridge parameters for regression problems, Commun. Stat. Theory Math. 34 (2005), pp. 1177–1182. [14] M. Khurana, Y.P. Chaubey, and S. Chandra, Jacknifing the ridge regression estimator: A revisit, Commun. Stat. Theor. M. 43 (2014), pp. 5249–5262. [15] B.M.G. Kibria, Performance of some new ridge regression estimators, Commun. Stat. Theor. Methods. 32 (2003), pp. 419–435. [16] B.M.G. Kibria, K. Månsson, and G. Shukur, Performance of some logistic ridge regression estimators, Comput. Econ. 40 (2011), pp. 401–414. [17] B.M.G. Kibria, K. Månsson, and G. Shukur, A simulation study of some biasing parameters for the ridge type estimation of Poisson regression, Commun. Stat. Simul. Comput. 44 (2015), pp. 943–957. [18] K. Liu, A new class of biased estimate in linear regression, Commun. Stat. Theor. Methods. 22 (1993), pp. 393–402. [19] W.G. Manning and J. Mullahy, Estimating log models: To transform or not to transform? J. Health Eco. 20 (2001), pp. 461–494. [20] K. Månsson and G. Shukur, A Poisson ridge regression estimator, Econ. Model. 28 (2011), pp. 1475–1481. [21] K. Månsson, G. Shukur, and P. Sjölander, A new asymmetric interaction ridge (AIR) regression method, HUI Working Papers 54 (2012), pp. 1–19. [22] D.W. Marquardt and R.D. Snee, Generalized in inverses, ridge regression, biased linear estimation, Technometrics 12 (1970), pp. 591–612. [23] D.C. Montgomery, E.A. Peck, and G.G. Vining, Introduction to Linear Regression Analysis, Wiley, New York, 2003. [24] G. Muniz and B.M.G. Kibria, On some ridge regression estimators: An empirical comparisons, Commun. Stat. Simul. Comput. 38 (2009), pp. 621–630. [25] H. Nyquist, Applications of the Jackknifed procedure in ridge regression, Comput. Stat. Data Anal. 6 (1988), pp. 177–183. [26] J.M.C. Santos Silva and S. Tenreyro, The log of gravity, Rev. Econ. Stat. 88 (2006), pp. 641–658. [27] R.L. Schaefer, L.D. Roi, and R.A. Wolfe, A ridge logistic estimator, Commun. Stat. Theory Math. 13 (1984), pp. 99–113. [28] B. Singh, Y.P. Chaubey, and T.D. Dwivedi, An almost unbiased ridge estimators, Sankhya 48 (1986), pp. 342–346. [29] Available at http://www.tff.org/default.aspx?pageID = 164. [30] Available at http://www.sahadan.com/takim_istatistikleri/Turkiye_Spor_Toto.