Journal of Statistical Research 2004, Vol. 38, No. 1, pp. 13-31 Bangladesh
ISSN 0256 - 422 X
ESTIMATION STRATEGIES FOR PARAMETERS OF THE LINEAR REGRESSION MODELS WITH SPHERICALLY SYMMETRIC DISTRIBUTIONS S. M. M. Tabatabaey Department of Statistics, School of Mathematical Sciences, Ferdowsi University of Mashhad, P. O. Box 1159-9177948953, Mashhad, Iran. Email:
[email protected] A. K. Md. E. Saleh Department of Mathematics and Statistics, Carleton University, Ottawa, Canada K1S 5B6. Email:
[email protected] B. M. Golam Kibria Department of Statistics, Florida International University, Miami, FL 33199, USA. Email:
[email protected]
Summary This paper deals with pre-test and shrinkage estimation of the parameters of the linear regression model with spherically symmetric error distributions when sub-hypothesis restrictions are suspected. Accordingly, we consider five well-known estimators: the unrestricted estimator (UE), the restricted estimator (RE), the preliminary test (PT) estimator, the shrinkage estimator (SE) and the positive rule (PR) shrinkage estimator. The bias and risk functions of these estimators are analyzed under both the null and alternative hypotheses. Under the null hypothesis, the RE has the smallest risk, followed by the pre-test or shrinkage estimators. However, when the parameter moves away from the subspace of the restrictions, the pre-test or shrinkage estimators perform best, followed by the UE and the RE. Conditions for the superiority of each estimator in terms of the departure parameter are provided. It is demonstrated that the positive rule shrinkage estimator utilizes both sample and non-sample information and performs uniformly better than the UE and the ordinary shrinkage estimator.
Keywords and phrases: Bias; Dominance; James and Stein Estimator; Preliminary Test; Quadratic Risk; Spherical Distribution; Student's t; Superiority.
AMS Classification: 62J07, 62F03
© Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka 1000, Bangladesh.
1 Introduction
This paper discusses the estimation of the regression parameters when the errors of the model have a spherically symmetric distribution. To describe the problem, we consider the following linear regression model
$$y = X\beta + e, \tag{1.1}$$
where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of observations on the dependent variable, $X$ is an $n \times p$ matrix of full rank $p$, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of parameters and $e = (e_1, e_2, \ldots, e_n)'$ is an $n \times 1$ vector of errors, which is distributed according to a law belonging to the class of spherical compound normal distributions with $E(e) = 0$ and $E(ee') = \sigma_e^2 I_n$, where $I_n$ is the $n$-dimensional identity matrix and $\sigma_e^2$ is the common variance of $e_i$ $(i = 1, 2, \ldots, n)$. This class is a subclass of the family of spherically symmetric distributions (SSDs) which can be expressed as a variance mixture of normal distributions, that is,
$$f(e) = \int_0^\infty f(e \mid \theta)\, g(\theta)\, d\theta, \tag{1.2}$$
where $f(e)$ is the probability density function (pdf) of $e$, $f(e \mid \theta)$ is the pdf of the normal distribution with mean vector $0$ and variance-covariance matrix $\theta^2 I_n$, and $g(\theta)$ is the pdf of $\theta$ with support $[0, \infty)$. In this case, $E(\theta^2) = \sigma_e^2$ and we write $e \sim SSD(\{(0, E(\theta^2) I_n)\})$. The well-known members of the SSD class are the normal, the multivariate Student's t and the Cauchy, the last being the special case of the multivariate Student's t with 1 degree of freedom.
In most applied as well as theoretical research, the error terms in linear models are assumed to be normally and independently distributed. However, such assumptions may not be appropriate in many practical situations (for example, see Gnanadesikan 1977 and Zellner 1976), particularly if the error distribution has heavier tails. For instance, some economic data may be generated by processes whose distributions have more kurtosis than the normal distribution. One can tackle such situations by using the well-known t distribution, as it has heavier tails than the normal distribution, especially for smaller degrees of freedom (e.g., Fama 1965, Blattberg and Gonedes 1974, Sutradhar and Ali 1986, Ullah and Zinde-Walsh 1984). The multivariate Student's t distribution is obtained if $g(\theta)$ is assumed to be an inverted gamma (IG) density with scale parameter $\sigma^2$ and degrees of freedom $\nu$, denoted by $IG(\nu, \sigma^2)$ and given by
$$g(\theta) = \frac{2}{\Gamma(\nu/2)} \left(\frac{\nu\sigma^2}{2}\right)^{\nu/2} \theta^{-(\nu+1)}\, e^{-\frac{\nu\sigma^2}{2\theta^2}}, \qquad 0 < \nu, \sigma, \theta < \infty.$$
Then the multivariate t-distribution is obtained from (1.2) as
$$f(e) = \frac{\Gamma\!\left(\frac{n+\nu}{2}\right)}{(\pi\nu)^{n/2}\, \Gamma(\nu/2)\, \sigma^{n}} \left(1 + \frac{e'e}{\nu\sigma^2}\right)^{-\frac{n+\nu}{2}}, \qquad 0 < \nu, \sigma < \infty,\; -\infty < e_i < \infty. \tag{1.3}$$
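The mixture representation (1.2) also gives a direct way to simulate errors from (1.3). The following is a minimal sketch, not part of the original paper: it assumes NumPy, and the function name `sample_mvt_errors` and all numerical settings are hypothetical choices made for illustration. It uses the fact that the inverted gamma density above is the density of $\theta$ when $\nu\sigma^2/\theta^2$ is chi-square with $\nu$ degrees of freedom.

```python
import numpy as np

def sample_mvt_errors(n, nu, sigma, size, rng):
    """Draw `size` error vectors of length n from the multivariate t law (1.3)
    through the variance-mixture representation (1.2)."""
    # nu * sigma^2 / theta^2 ~ chi-square(nu) is equivalent to theta ~ IG(nu, sigma^2)
    chi2 = rng.chisquare(nu, size=size)
    theta = sigma * np.sqrt(nu / chi2)      # one mixing scale per error vector
    z = rng.standard_normal((size, n))      # e | theta ~ N(0, theta^2 I_n)
    return theta[:, None] * z

rng = np.random.default_rng(0)              # illustrative seed
nu, sigma = 10.0, 2.0                       # illustrative parameter values
e = sample_mvt_errors(n=4, nu=nu, sigma=sigma, size=200_000, rng=rng)
sigma_e2 = nu / (nu - 2.0) * sigma**2       # common variance, valid for nu > 2
```

All components of one vector share the same scale $\theta$, so they are uncorrelated but not independent, which is what distinguishes this model from independent univariate t errors.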
The mean vector and variance-covariance matrix of $e$ are, respectively,
$$E(e) = 0 \quad \text{and} \quad E(ee') = \frac{\nu}{\nu-2}\,\sigma^2 I_n = \sigma_e^2 I_n, \qquad \nu > 2.$$
The marginal distributions are univariate Student t-distributions. For $\nu = 1$, the pdf in (1.3) becomes Cauchy, and as $\nu \to \infty$ the pdf approaches the normal. For the full model, the unrestricted estimator (UE) of $\beta$ is given by
$$\hat\beta^{UE} = C^{-1} X' y, \tag{1.4}$$
where $C = X'X$ is the information matrix. The corresponding unbiased estimator of $\sigma_e^2$ is given by
$$\tilde\sigma_e^2 = \frac{(y - X\hat\beta^{UE})'(y - X\hat\beta^{UE})}{m}, \qquad m = n - p.$$
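As a small illustration of (1.4) and the variance estimator, the sketch below computes both with NumPy. The function name `unrestricted_fit` and the toy data are hypothetical; the response is generated without noise so that the fit must reproduce the coefficient vector exactly.

```python
import numpy as np

def unrestricted_fit(X, y):
    """Return the UE of beta from (1.4), the unbiased variance estimate and C = X'X."""
    n, p = X.shape
    C = X.T @ X                                # information matrix
    beta_ue = np.linalg.solve(C, X.T @ y)      # C^{-1} X'y without forming the inverse
    resid = y - X @ beta_ue
    sigma2_tilde = resid @ resid / (n - p)     # divisor m = n - p
    return beta_ue, sigma2_tilde, C

rng = np.random.default_rng(1)                 # illustrative seed
X = rng.standard_normal((30, 3))               # hypothetical full-rank design
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta                                   # noise-free toy response
beta_ue, sigma2_tilde, C = unrestricted_fit(X, y)
```

Solving the normal equations with `np.linalg.solve` is used here instead of an explicit inverse purely for numerical convenience; it returns the same estimator.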
Our primary interest is to estimate the regression coefficients $\beta$ when it is suspected a priori, but not certain, that $\beta$ may be restricted to the subspace
$$H_0 : H\beta = h, \tag{1.5}$$
where $H$ is a $q \times p$ known matrix of full rank $q\,(< p)$ and $h$ is a $q \times 1$ vector of known constants. The restricted estimator (RE) of $\beta$ is given by
$$\hat\beta^{RE} = \hat\beta^{UE} - C^{-1}H'(HC^{-1}H')^{-1}(H\hat\beta^{UE} - h) \tag{1.6}$$
and the corresponding estimator of $\sigma_e^2$ is given by
$$\hat\sigma_e^2 = \frac{(y - X\hat\beta^{RE})'(y - X\hat\beta^{RE})}{m + q}, \qquad m = n - p,$$
which is unbiased under the null hypothesis. Note that the restricted least squares estimator satisfies the condition $H\hat\beta^{RE} = h$. The estimator of $\beta$ in (1.4) is usually used when no hypothesis information is available on the parameter vector $\beta$. On the other hand, the estimator of $\beta$ in (1.6) is useful in the presence of hypothesis (1.5). Furthermore, it is well known that the RE performs better than the UE when the restrictions hold, but as $\beta$ moves away from the subspace $H\beta = h$, the RE becomes biased and inefficient while the performance of the UE remains stable. As a result, one may combine the UE and the RE to obtain better performance in the presence of the uncertain prior information (UPI) $H\beta = h$. This leads to the preliminary test estimator (PT), defined as
$$\hat\beta^{PT} = \hat\beta^{RE}\, I(L_n \le L_{n,\alpha}) + \hat\beta^{UE}\, I(L_n > L_{n,\alpha}), \tag{1.7}$$
where
$$L_n = \frac{(H\hat\beta^{UE} - h)'(HC^{-1}H')^{-1}(H\hat\beta^{UE} - h)}{q s^2}$$
is the test statistic for testing the null hypothesis in (1.5), $L_{n,\alpha}$ is the upper $\alpha$-level critical value of $L_n$ and $I(A)$ is the indicator function of the set $A$. Under the null hypothesis and normal theory, $L_n$ follows a central $F$-distribution with $(q, m)$ degrees of freedom, while under the alternative it follows the non-central $F$-distribution with $(q, m)$ degrees of freedom and non-centrality parameter $\frac{1}{2}\Delta$, where
$$\Delta = \frac{(H\beta - h)'(HC^{-1}H')^{-1}(H\beta - h)}{\sigma_e^2}$$
is the departure parameter from the null hypothesis. It is important to remark that $\hat\beta^{PT}$ is bounded and performs better than $\hat\beta^{RE}$ in some part of the parameter space. The preliminary test approach to estimation was pioneered by Bancroft (1944), followed by Bancroft (1964), Mosteller (1948), Han and Bancroft (1968), Saleh and Sen (1978), Giles (1991), Kibria and Saleh (1993), Benda (1996) and recently Kibria and Saleh (2003a,b), among others. Most of these authors assumed that the disturbances of the model are normally distributed. Note that the preliminary test estimator (PT) has two characteristics: (1) it produces only two values, the unrestricted estimator and the restricted estimator, and (2) it depends heavily on the level of significance of the preliminary test. What about the intermediate values between $\hat\beta^{UE}$ and $\hat\beta^{RE}$? To overcome this shortcoming, we consider the Stein-type estimator. The Stein-type shrinkage estimator (SE) of $\beta$ is defined as
$$\hat\beta^{SE} = \hat\beta^{UE} - d L_n^{-1}(\hat\beta^{UE} - \hat\beta^{RE}), \tag{1.8}$$
where
$$d = \frac{(q-2)(n-p)}{q(n-p+2)} \quad \text{and} \quad q \ge 3.$$
The SE in (1.8) provides uniform improvement over $\hat\beta^{UE}$; however, it is not a convex combination of $\hat\beta^{UE}$ and $\hat\beta^{RE}$. Both (1.7) and (1.8) involve the statistic $L_n$, which adjusts the estimator for departure from $H_0$. For large values of $L_n$ both (1.7) and (1.8) yield $\hat\beta^{UE}$, while for small values of $L_n$ their performance is different. The SE has the disadvantage that it behaves strangely for small $L_n$: the shrinkage factor $(1 - d L_n^{-1})$ becomes negative for $L_n < d$. This encourages one to find an alternative estimator. Hence, we define a better estimator, namely the positive-rule shrinkage estimator (PR) of $\beta$, as follows:
$$\hat\beta^{PR} = \hat\beta^{SE} - (1 - d L_n^{-1})\, I(L_n \le d)\,(\hat\beta^{UE} - \hat\beta^{RE}). \tag{1.9}$$
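The definitions (1.4) and (1.6) to (1.9) can be collected in a few lines of NumPy. This is a sketch under stated assumptions rather than the authors' code: the function name `shrinkage_family` and the toy data are hypothetical, $s^2$ in $L_n$ is taken to be the unbiased estimator $\tilde\sigma_e^2$ (the text does not define $s^2$ explicitly), the critical value `crit` is a placeholder the caller must supply, and normal noise is used only to generate an example response.

```python
import numpy as np

def shrinkage_family(X, y, H, h, crit):
    """Compute UE, RE, PT, SE and PR for H0: H beta = h.
    crit plays the role of L_{n,alpha} and must be supplied by the caller."""
    n, p = X.shape
    q = H.shape[0]
    C_inv = np.linalg.inv(X.T @ X)
    ue = C_inv @ X.T @ y                               # (1.4)
    s2 = np.sum((y - X @ ue) ** 2) / (n - p)           # assumed meaning of s^2
    gap = H @ ue - h
    K = np.linalg.inv(H @ C_inv @ H.T)
    re = ue - C_inv @ H.T @ K @ gap                    # (1.6)
    L_n = gap @ K @ gap / (q * s2)                     # test statistic
    d = (q - 2) * (n - p) / (q * (n - p + 2))          # requires q >= 3
    pt = re if L_n <= crit else ue                     # (1.7)
    se = ue - (d / L_n) * (ue - re)                    # (1.8)
    pr = se - (1 - d / L_n) * (L_n <= d) * (ue - re)   # (1.9)
    return {"UE": ue, "RE": re, "PT": pt, "SE": se, "PR": pr, "L_n": L_n, "d": d}

rng = np.random.default_rng(4)                         # illustrative seed
n, p, q = 25, 5, 3
X = rng.standard_normal((n, p))                        # hypothetical design
H = np.hstack([np.eye(q), np.zeros((q, p - q))])       # restrict the first q coefficients
h = np.zeros(q)
beta = np.array([0.2, -0.1, 0.3, 1.0, 2.0])
y = X @ beta + rng.standard_normal(n)
out = shrinkage_family(X, y, H, h, crit=3.0)           # 3.0 is an arbitrary placeholder
```

Written this way, (1.9) is the same as $\hat\beta^{RE} + \max(0,\, 1 - d L_n^{-1})(\hat\beta^{UE} - \hat\beta^{RE})$, which makes the convex-combination property of the PR estimator visible.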
The PR estimator in (1.9) provides uniform improvement over $\hat\beta^{UE}$ and is a convex combination of $\hat\beta^{UE}$ and $\hat\beta^{RE}$. The properties of Stein-type estimators have been analyzed under the normality assumption by various researchers, for example James and Stein (1961), Judge and Bock (1978) and Shalabh (1995). The positive-part shrinkage estimator has been considered under the normality assumption by Ohtani (1993) and Adkins and Hill (1990), among others. Giles (1991) considered the pre-test estimator for the restricted linear model with spherically symmetric disturbances and investigated some finite sample properties of the estimators numerically for the case of multivariate t errors. Giles (1992) also considered the PT estimator for the two-sample linear regression model under spherically symmetric disturbances. Kibria (1996) considered the SE for multicollinear data and for the restricted linear model with Student's t errors. Using the MSE criterion, he discussed the performance of the estimators with respect to both the non-centrality and ridge parameters. Judge et al. (1985) discussed the finite sample properties of the James-Stein type estimator and its positive part for the location parameter vector under squared error loss and multivariate t errors. They compared the risk functions of the estimators via Monte Carlo experiments. Singh (1991) discussed the properties of James-Stein rule estimators in a regression model with multivariate Student's t errors. In general, the risk characteristics are found to be the same under normal and non-normal errors. However, there is very limited literature on analytical results for the finite sample properties of the positive rule shrinkage estimator for the linear model with a non-normal error distribution. Since the PR estimator has some advantages over the SE or PT estimators, the main objective of this paper is to study the proposed estimators under Student's t disturbances for the estimation of regression coefficients in the model. The plan of the paper is as follows. In Section 2 we provide the expressions for the biases and risks. Section 3 discusses the relative performance of the estimators. Finally, a summary and conclusions are given in Section 4.
2 The bias and risk of the estimators
In this section we give the expressions for the bias and quadratic risk of the estimators βˆU E , βˆRE , βˆP T , βˆSE , βˆP R .
2.1 Biases of the estimators
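In this subsection we discuss the biases of the estimators. The biases of the proposed estimators follow routinely from Judge and Bock (1978, Chapter 10) and Saleh (2002). Therefore, we omit all derivations and instead present the expressions for the biases of the estimators in the following theorem.
Theorem 2.1 The biases of the unrestricted estimator (UE), restricted estimator (RE), preliminary test estimator (PT), shrinkage estimator (SE) and the positive-rule shrinkage estimator (PR) are given respectively by
$$
\begin{aligned}
B(\hat\beta^{UE}) &= E(\hat\beta^{UE} - \beta) = 0 \\
B(\hat\beta^{RE}) &= E(\hat\beta^{RE} - \beta) = -\eta \\
B(\hat\beta^{PT}) &= E(\hat\beta^{PT} - \beta) = -\eta\, G^{(2)}_{q+2,n-p}(x, \Delta) \\
B(\hat\beta^{SE}) &= E(\hat\beta^{SE} - \beta) = -q d\, \eta\, E^{(2)}[\chi^{-2}_{q+2}(\Delta)] \\
B(\hat\beta^{PR}) &= E(\hat\beta^{PR} - \beta) = \eta \left\{ \frac{qd}{q+2}\, E^{(2)}\!\left[ F^{-1}_{q+2,n-p}(\Delta)\, I\!\left( F_{q+2,n-p}(\Delta) \le \frac{qd}{q+2} \right) \right] \right. \\
&\qquad \left. - \frac{qd}{q+2}\, E^{(2)}[F^{-1}_{q+2,n-p}(\Delta)] - G^{(2)}_{q+2,n-p}(x, \Delta) \right\},
\end{aligned}
\tag{2.1}
$$
where $\eta = C^{-1}H'(HC^{-1}H')^{-1}(H\beta - h)$,
$$G^{(j)}_{q+2i,n-p}(l^*; \Delta) = \sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{\nu}{2} + r + j - 2\right) \left(\frac{\Delta}{\nu-2}\right)^{r}}{\Gamma(r+1)\, \Gamma\!\left(\frac{\nu}{2} + j - 2\right) \left(1 + \frac{\Delta}{\nu-2}\right)^{\nu/2 + r + j - 2}} \; I_x\!\left\{ \tfrac{1}{2}(q + 2i) + r,\; \tfrac{n-p}{2} \right\},$$
where $I\{\cdot\}$ is the incomplete beta function and
$$x = \frac{q F_\alpha}{n - p + q F_\alpha}.$$
Also
$$
\begin{aligned}
E^{(j)}[\chi^{-2}_{q+s}(\Delta)] &= \sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{\nu}{2} + r + j - 2\right) \left(\frac{\Delta}{\nu-2}\right)^{r}}{\Gamma(r+1)\, \Gamma\!\left(\frac{\nu}{2} + j - 2\right) \left(1 + \frac{\Delta}{\nu-2}\right)^{\nu/2 + r + j - 2}} \,(q + s - 2 + 2r)^{-1} \\
E^{(j)}[\chi^{-2}_{q+s}(0)] &= (q + s - 2)^{-1}, \qquad j = 1, 2
\end{aligned}
$$
and
$$
\begin{aligned}
E^{(j)}\!\left[ F^{-1}_{q+s,n-p}(\Delta)\, I\!\left( F_{q+s,n-p}(\Delta) \le \frac{qd}{q+s} \right) \right]
&= \sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{\nu}{2} + r + j - 2\right) \left(\frac{\Delta}{\nu-2}\right)^{r}}{\Gamma(r+1)\, \Gamma\!\left(\frac{\nu}{2} + j - 2\right) \left(1 + \frac{\Delta}{\nu-2}\right)^{\nu/2 + r + j - 2}} \\
&\qquad \times \frac{(q+s)\, I_x\!\left[ \tfrac{1}{2}(q + s - 2 + 2r),\; \tfrac{1}{2}(n - p + 2) \right]}{q + s - 2 + 2r} \\
E^{(j)}\!\left[ F^{-1}_{q+s,n-p}(0)\, I\!\left( F_{q+s,n-p}(0) \le \frac{qd}{q+s} \right) \right]
&= (q+s)(q+s-2)^{-1}\, I_x\!\left[ \tfrac{1}{2}(q + s - 2),\; \tfrac{1}{2}(n - p + 2) \right], \qquad j = 1, 2,
\end{aligned}
$$
where
$$x = \frac{qd}{n - p + qd}.$$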
For $\alpha = 0$, the bias of $\hat\beta^{PT}$ coincides with that of the restricted estimator $\hat\beta^{RE}$, while for $\alpha = 1$ it coincides with that of the unrestricted estimator $\hat\beta^{UE}$. Also, as the non-centrality parameter $\Delta \to \infty$, $B(\hat\beta^{UE}) = B(\hat\beta^{PT}) = B(\hat\beta^{SE}) = B(\hat\beta^{PR}) = 0$, while $B(\hat\beta^{RE})$ becomes unbounded. However, under $H_0 : H\beta = h$ we have $\Delta = 0$, hence all the estimators are unbiased. Note that $\Delta$ can be expressed in terms of $\eta$ as $\Delta = \eta' C \eta / \sigma_e^2$.
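The expression of $\Delta$ in terms of $\eta$ can be verified directly from the two definitions. The short derivation below uses only symbols already introduced and is included as a worked check:

```latex
\begin{aligned}
\eta' C \eta
 &= (H\beta-h)'(HC^{-1}H')^{-1} H C^{-1}\, C\, C^{-1} H' (HC^{-1}H')^{-1}(H\beta-h) \\
 &= (H\beta-h)'(HC^{-1}H')^{-1}\,(HC^{-1}H')\,(HC^{-1}H')^{-1}(H\beta-h) \\
 &= (H\beta-h)'(HC^{-1}H')^{-1}(H\beta-h) \;=\; \sigma_e^2\,\Delta .
\end{aligned}
```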
Now we compare the biases under the alternative hypothesis. In order to present a clear-cut picture of the biases, we transform them into quadratic (scalar) form by defining
$$QB(\hat\beta^{*}) = B(\hat\beta^{*})'\, C\, B(\hat\beta^{*})$$
as the quadratic bias function of an estimator $\hat\beta^{*}$ of the parameter vector $\beta$. The quadratic bias functions of the estimators are given in the following theorem.
Theorem 2.2 The quadratic biases of the unrestricted estimator (UE), restricted estimator (RE), preliminary test estimator (PT), shrinkage estimator (SE) and the positive-rule shrinkage estimator (PR) are given respectively by
$$
\begin{aligned}
QB(\hat\beta^{UE}) &= 0 \\
QB(\hat\beta^{RE}) &= \sigma_e^2 \Delta \\
QB(\hat\beta^{PT}) &= \sigma_e^2 \Delta \left[ G^{(2)}_{q+2,n-p}(x, \Delta) \right]^2 \\
QB(\hat\beta^{SE}) &= \sigma_e^2 \Delta \left[ q d\, E^{(2)}[\chi^{-2}_{q+2}(\Delta)] \right]^2 \\
QB(\hat\beta^{PR}) &= \sigma_e^2 \Delta \left\{ \frac{qd}{q+2}\, E^{(2)}\!\left[ F^{-1}_{q+2,n-p}(\Delta)\, I\!\left( F_{q+2,n-p}(\Delta) \le \frac{qd}{q+2} \right) \right] \right. \\
&\qquad \left. - \frac{qd}{q+2}\, E^{(2)}[F^{-1}_{q+2,n-p}(\Delta)] - G^{(2)}_{q+2,n-p}(x, \Delta) \right\}^2 .
\end{aligned}
\tag{2.2}
$$
The quadratic bias functions of all the estimators except $\hat\beta^{UE}$ depend on the parameters only through $\Delta$; thus the bias is a function of $\Delta$. The bias of the PT also depends on $\alpha$, the size of the test. The quadratic bias of the RE increases without bound as $\Delta \to \infty$. Since both $E[\chi^{-2}_{q+2}(\Delta)]$ and $G^{(2)}_{q+2,n-p}(x, \Delta)$ are decreasing functions of $\Delta$, the quadratic biases of the PT, SE and PR estimators start from 0, increase to a maximum and then decrease gradually to 0 as $\Delta \to \infty$. However, the bias of the PR estimator remains below the curves of the SE and PT estimators. Figures 1 and 2 display various bias functions for $\sigma_e^2 = 1$, which support the above analysis to some extent. Based on the above analysis we may establish the following inequality:
$$QB(\hat\beta^{UE}) \le QB(\hat\beta^{PR}) \le QB(\hat\beta^{SE}) \le QB(\hat\beta^{PT}) \le QB(\hat\beta^{RE}).$$
[Figure 1: four panels for n = 10 with alpha = 0.05, 0.10, 0.15 and 0.20; horizontal axis Delta (0 to 20), vertical axis Bias (0 to 1); curves for UE, RE, PT, SR and PR.]
Figure 1. Quadratic bias function of various estimators for p = 5, q = 3 and ν = 5.
[Figure 2: four panels for n = 20 with alpha = 0.05, 0.10, 0.15 and 0.20; horizontal axis Delta (0 to 20), vertical axis Bias (0 to 0.6); curves for UE, RE, PT, SR and PR.]
Figure 2. Quadratic bias function of various estimators for p = 5, q = 4 and ν = 10.
2.2 Risk functions of the estimators
In this subsection we present the quadratic risk functions. Suppose $\beta^{*}$ denotes an estimator of $\beta$; then, for a given non-singular matrix $W$, the loss function is defined as
$$L(\beta^{*}; W) = (\beta^{*} - \beta)'\, W\, (\beta^{*} - \beta).$$
The corresponding risk function of the estimator $\beta^{*}$ is defined as
$$R(\beta^{*}; W) = E\left[ (\beta^{*} - \beta)'\, W\, (\beta^{*} - \beta) \right] = \operatorname{tr}(W M), \tag{2.3}$$
where $M$ is the mean-squared error matrix of $\beta^{*}$. The quadratic risk functions of the proposed estimators follow routinely from Judge and Bock (1978, Chapter 10) and Saleh (2002) and lead to the following theorem.
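Theorem 2.3: The risks of UE, RE, PT, SE and PR are given respectively by
$$
\begin{aligned}
R(\hat\beta^{UE}; W) &= \sigma_e^2 \operatorname{tr}(C^{-1}W) \\
R(\hat\beta^{RE}; W) &= \sigma_e^2 \left[ \operatorname{tr}(C^{-1}W) - \operatorname{tr}(A) \right] + \eta' A \eta \\
R(\hat\beta^{PT}; W) &= \sigma_e^2 \left[ \operatorname{tr}(C^{-1}W) - \operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta) \right] + \eta' A \eta \left[ 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right] \\
R(\hat\beta^{SE}; W) &= \sigma_e^2 \operatorname{tr}(C^{-1}W) - q d \operatorname{tr}(A)\, \sigma_e^2 \left\{ 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right\} \\
&\qquad + q d\, \eta' A \eta \left\{ (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 \left( E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right) \right\} \\
&= \sigma_e^2 \operatorname{tr}(C^{-1}W) - d q\, \sigma_e^2 \operatorname{tr}(A) \left\{ (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] + \left[ 1 - \frac{(q+2)\, \eta' A \eta}{2 \sigma_e^2 \Delta \operatorname{tr}(A)} \right] (2\Delta)\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)] \right\} \\
R(\hat\beta^{PR}; W) &= R(\hat\beta^{SE}; W) - \sigma_e^2 \left\{ \operatorname{tr}(A)\, E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) \right)^{2} I\!\left( F_{q+2,n-p}(\Delta) < \frac{qd}{q+2} \right) \right] \right. \\
&\qquad \left. + \frac{\eta' A \eta}{\sigma_e^2}\, E^{(2)}\!\left[ \left( 1 - \frac{qd}{q+4} F^{-1}_{q+4,n-p}(\Delta) \right)^{2} I\!\left( F_{q+4,n-p}(\Delta) < \frac{qd}{q+4} \right) \right] \right\} \\
&\qquad - 2 \eta' A \eta\, E^{(2)}\!\left[ \left( \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) - 1 \right) I\!\left( F_{q+2,n-p}(\Delta) < \frac{qd}{q+2} \right) \right],
\end{aligned}
\tag{2.4}
$$
where
$$\operatorname{tr}(A) = \operatorname{tr}\!\left( W C^{-1}H'(HC^{-1}H')^{-1}HC^{-1} \right)$$
and
$$\eta' A \eta = (H\beta - h)'(HC^{-1}H')^{-1}HC^{-1}\, W\, C^{-1}H'(HC^{-1}H')^{-1}(H\beta - h).$$
Also
$$
\begin{aligned}
E^{(j)}[\chi^{-4}_{q+s}(\Delta)] &= \sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{\nu}{2} + r + j - 2\right) \left(\frac{\Delta}{\nu-2}\right)^{r}}{\Gamma(r+1)\, \Gamma\!\left(\frac{\nu}{2} + j - 2\right) \left(1 + \frac{\Delta}{\nu-2}\right)^{\nu/2 + r + j - 2}} \,(q + s - 2 + 2r)^{-1} (q + s - 4 + 2r)^{-1} \\
E^{(j)}[\chi^{-4}_{q+s}(0)] &= (q + s - 2)^{-1} (q + s - 4)^{-1}, \qquad j = 1, 2
\end{aligned}
$$
and
$$
\begin{aligned}
E^{(j)}\!\left[ F^{-2}_{q+s,n-p}(\Delta)\, I\!\left( F_{q+s,n-p}(\Delta) \le \frac{qd}{q+s} \right) \right]
&= \sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{\nu}{2} + r + j - 2\right) \left(\frac{\Delta}{\nu-2}\right)^{r}}{\Gamma(r+1)\, \Gamma\!\left(\frac{\nu}{2} + j - 2\right) \left(1 + \frac{\Delta}{\nu-2}\right)^{\nu/2 + r + j - 2}} \\
&\qquad \times \frac{(q+s)^2\, I_x\!\left[ \tfrac{1}{2}(q + s - 4 + 2r),\; \tfrac{1}{2}(n - p + 4) \right]}{(n-p)(q + s - 2 + 2r)(q + s - 4 + 2r)} \\
E^{(j)}\!\left[ F^{-2}_{q+s,n-p}(0)\, I\!\left( F_{q+s,n-p}(0) \le \frac{qd}{q+s} \right) \right]
&= \frac{(q+s)^2}{(n-p)(q + s - 2)(q + s - 4)}\, I_x\!\left[ \tfrac{1}{2}(q + s - 4),\; \tfrac{1}{2}(n - p + 4) \right], \qquad j = 1, 2.
\end{aligned}
$$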
Based on the above information, we consider the performance of the estimators in the following section.
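The first two risk expressions of Theorem 2.3 are easy to check by simulation. The sketch below is illustrative only and makes several assumptions that are not in the paper: the design matrix, $H$, $h$, $\beta$, $\nu$ and the seed are hypothetical, $W = C$ is chosen for convenience, and the errors are drawn through the scale mixture (1.2). It uses the fact that the quantity written $\eta' A \eta$ after (2.4) equals $\eta' W \eta$ with $\eta$ as in Theorem 2.1.

```python
import numpy as np

rng = np.random.default_rng(2)                      # illustrative seed
n, p, q, nu, sigma = 20, 4, 2, 8.0, 1.0             # hypothetical settings
X = rng.standard_normal((n, p))
C = X.T @ X
C_inv = np.linalg.inv(C)
H = np.hstack([np.eye(q), np.zeros((q, p - q))])
h = np.zeros(q)
beta = np.array([0.3, -0.2, 1.0, 2.0])              # mildly violates H beta = h
W = C                                               # weight matrix of the loss
sigma_e2 = nu / (nu - 2.0) * sigma**2

P = C_inv @ H.T @ np.linalg.inv(H @ C_inv @ H.T)    # C^{-1}H'(HC^{-1}H')^{-1}
eta = P @ (H @ beta - h)

reps = 100_000
theta = sigma * np.sqrt(nu / rng.chisquare(nu, reps))
Y = X @ beta + theta[:, None] * rng.standard_normal((reps, n))
UE = Y @ X @ C_inv                                  # each row is one UE draw
RE = UE - (UE @ H.T - h) @ P.T                      # each row is one RE draw

def mc_risk(B):
    """Monte Carlo estimate of E[(b - beta)' W (b - beta)] over the rows of B."""
    D = B - beta
    return float(np.mean(np.einsum("ij,jk,ik->i", D, W, D)))

tr_A = np.trace(W @ P @ H @ C_inv)
risk_ue_formula = sigma_e2 * np.trace(C_inv @ W)
risk_re_formula = sigma_e2 * (np.trace(C_inv @ W) - tr_A) + eta @ W @ eta
risk_ue_mc, risk_re_mc = mc_risk(UE), mc_risk(RE)
```

With $W = C$ the two formula values reduce to $\sigma_e^2 p$ and $\sigma_e^2 (p - q) + \eta' C \eta$, which the simulated averages should approach as the number of replications grows.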
3 Risk analysis of the estimators
In this section we compare the performance of the proposed estimators in the light of the quadratic risk function. For convenience we assume that $\nu$ is known. We obtain from Anderson (1984, Theorem A.2.4, p. 590) that
$$\gamma_p \le \frac{\eta' A \eta}{\eta' C \eta} \le \gamma_1, \quad \text{or} \quad \sigma_e^2 \Delta \gamma_p \le \eta' A \eta \le \sigma_e^2 \Delta \gamma_1, \tag{3.1}$$
where $\gamma_1$ and $\gamma_p$ are the largest and the smallest characteristic roots of the matrix $AC^{-1}$ and $\Delta = \eta' C \eta / \sigma_e^2$.
3.1 Comparison of $\hat\beta^{UE}$ and $\hat\beta^{RE}$
First, we compare $\hat\beta^{UE}$ and $\hat\beta^{RE}$. Using (2.4), the risk difference is
$$R(\hat\beta^{UE}; W) - R(\hat\beta^{RE}; W) = \sigma_e^2 \operatorname{tr}(A) - \eta' A \eta. \tag{3.2}$$
The difference in (3.2) will be non-negative whenever $\Delta \le \operatorname{tr}(A)/\gamma_1$. That is, RE will dominate UE when $\Delta \le \operatorname{tr}(A)/\gamma_1$; otherwise UE will dominate RE when $\Delta \ge \operatorname{tr}(A)/\gamma_p$. For $W = C$, we note that $\hat\beta^{RE}$ performs better than $\hat\beta^{UE}$ in the interval $[0, q\sigma_e^2]$ and worse outside this interval.
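For the special case $W = C$ the comparison can be worked out explicitly from the definitions of $\operatorname{tr}(A)$ and $\eta' A \eta$ given after (2.4). The following short calculation is a sketch using only those definitions:

```latex
\begin{aligned}
\operatorname{tr}(A) &= \operatorname{tr}\!\big(C\,C^{-1}H'(HC^{-1}H')^{-1}HC^{-1}\big)
 = \operatorname{tr}\!\big((HC^{-1}H')^{-1}HC^{-1}H'\big) = \operatorname{tr}(I_q) = q, \\
\eta' A \eta &= \eta' C \eta = \sigma_e^2 \Delta, \\
R(\hat\beta^{UE}; C) - R(\hat\beta^{RE}; C) &= \sigma_e^2\,(q - \Delta).
\end{aligned}
```

Hence the difference is non-negative exactly when $\Delta \le q$, or equivalently when $\eta' C \eta \le q\sigma_e^2$, which is the interval quoted above expressed on the scale of $\eta' C \eta$.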
3.2 Comparison of $\hat\beta^{UE}$, $\hat\beta^{RE}$ and $\hat\beta^{PT}$
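First we compare $\hat\beta^{PT}$ with $\hat\beta^{UE}$. The risk difference is
$$R(\hat\beta^{UE}; W) - R(\hat\beta^{PT}; W) = \sigma_e^2 \operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta) - \eta' A \eta \left[ 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right]. \tag{3.3}$$
The difference in (3.3) will be non-negative whenever
$$\Delta \le \frac{\operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)}{\gamma_1 \left[ 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right]}.$$
Thus $\hat\beta^{PT}$ is superior to $\hat\beta^{UE}$ if
$$\Delta \in \left[ 0,\; \frac{\operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)}{\gamma_1 \left[ 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right]} \right)$$
and $\hat\beta^{UE}$ performs better than $\hat\beta^{PT}$ if
$$\Delta \in \left[ \frac{\operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)}{\gamma_p \left[ 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right]},\; \infty \right).$$
It follows from (3.3) that under $H_0$, $\hat\beta^{PT}$ is superior to $\hat\beta^{UE}$ for all $\alpha \in (0, 1)$. We can describe the graph of $R(\hat\beta^{PT}; W)$ as follows. It takes the value $\sigma_e^2 \operatorname{tr}(C^{-1}W) - \sigma_e^2 \operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, 0)$ at $\Delta = 0$, then increases, crossing the risk of $\hat\beta^{UE}$, to a maximum, and then drops gradually towards $\sigma_e^2 \operatorname{tr}(C^{-1}W)$ as $\Delta \to \infty$.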
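Now we compare the risks of $\hat\beta^{RE}$ and $\hat\beta^{PT}$. Both are superior to $\hat\beta^{UE}$ under the null hypothesis. We note that
$$R(\hat\beta^{RE}; W) - R(\hat\beta^{PT}; W) = -\sigma_e^2 \operatorname{tr}(A) \left[ 1 - G^{(1)}_{q+2,n-p}(x, \Delta) \right] + \eta' A \eta \left[ 1 - 2 G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta) \right]. \tag{3.4}$$
The difference in (3.4) will be non-positive whenever
$$\Delta \le \frac{\operatorname{tr}(A) \left[ 1 - G^{(1)}_{q+2,n-p}(x, \Delta) \right]}{\gamma_1 \left[ 1 - 2 G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta) \right]}.$$
Since a non-positive difference in (3.4) means that the RE has the smaller risk, $\hat\beta^{RE}$ is superior to $\hat\beta^{PT}$ if
$$\Delta \in \left[ 0,\; \frac{\operatorname{tr}(A) \left[ 1 - G^{(1)}_{q+2,n-p}(x, \Delta) \right]}{\gamma_1 \left[ 1 - 2 G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta) \right]} \right)$$
and $\hat\beta^{PT}$ performs better than $\hat\beta^{RE}$ if
$$\Delta \in \left[ \frac{\operatorname{tr}(A) \left[ 1 - G^{(1)}_{q+2,n-p}(x, \Delta) \right]}{\gamma_p \left[ 1 - 2 G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta) \right]},\; \infty \right).$$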
3.3 Comparison of $\hat\beta^{UE}$, $\hat\beta^{RE}$, $\hat\beta^{PT}$ and $\hat\beta^{SE}$
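Now we investigate the comparative statistical properties of the Stein-type estimator. First we compare UE and SE. The risk difference is
$$R(\hat\beta^{UE}; W) - R(\hat\beta^{SE}; W) = d q\, \sigma_e^2 \operatorname{tr}(A) \left\{ (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] + \left[ 1 - \frac{(q+2)\, \eta' A \eta}{2 \sigma_e^2 \Delta \operatorname{tr}(A)} \right] (2\Delta)\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)] \right\}. \tag{3.5}$$
Using (3.1), the risk difference in (3.5) is positive for all $A$ such that
$$\left\{ A : \frac{\operatorname{tr}(A)}{\gamma_1} \ge \frac{q+2}{2} \right\}. \tag{3.6}$$
Thus $\hat\beta^{SE}$ uniformly dominates $\hat\beta^{UE}$. Further, as $\Delta \to \infty$, the risk difference tends to 0 from below. Now we wish to compare $\hat\beta^{RE}$ and $\hat\beta^{SE}$. We have
$$R(\hat\beta^{SE}; W) - R(\hat\beta^{RE}; W) = \sigma_e^2 \operatorname{tr}(A) - \eta' A \eta - d q\, \sigma_e^2 \operatorname{tr}(A) \left\{ (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] + \left[ 1 - \frac{(q+2)\, \eta' A \eta}{2 \sigma_e^2 \Delta \operatorname{tr}(A)} \right] (2\Delta)\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)] \right\}. \tag{3.7}$$
From (3.7) we note that under $H_0$, $R(\hat\beta^{SE}; W) \ge R(\hat\beta^{RE}; W)$. Thus $\hat\beta^{RE}$ performs better than $\hat\beta^{SE}$ under $H_0$. However, as $\eta$ moves away from 0, $\eta' A \eta$ increases and the risk of $\hat\beta^{RE}$ becomes unbounded, while the risk of $\hat\beta^{SE}$ remains below the risk of $\hat\beta^{UE}$ and merges with it as $\Delta \to \infty$. Thus $\hat\beta^{SE}$ dominates $\hat\beta^{RE}$ outside an interval around the origin. Now we compare $\hat\beta^{PT}$ and $\hat\beta^{SE}$ under $H_0$. We have
$$R(\hat\beta^{SE}; W) - R(\hat\beta^{PT}; W) = \sigma_e^2 \operatorname{tr}(A) \left[ G^{(1)}_{q+2,n-p}(x, 0) - d \right].$$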
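The above difference is positive for all $\alpha \in (0, 1)$ such that $F_\alpha$ satisfies the inequality
$$\left\{ \alpha : F_\alpha > \frac{q+2}{q}\, F^{-1}_{q+2,n-p}(d, 0) \right\}. \tag{3.8}$$
Thus PT dominates SE when (3.8) is satisfied, while SE dominates PT when $F_\alpha$ satisfies the inequality
$$\left\{ \alpha : F_\alpha < \frac{q+2}{q}\, F^{-1}_{q+2,n-p}(d, 0) \right\}. \tag{3.9}$$
Thus it is clear that the Stein-type shrinkage estimator $\hat\beta^{SE}$ does not always dominate PT under $H_0$. The dominance depends on the size of the test. Under the alternative hypothesis the risk difference, obtained by subtracting the PT and SE risks in (2.4), is
$$
\begin{aligned}
R(\hat\beta^{SE}; W) - R(\hat\beta^{PT}; W) &= -\sigma_e^2 \operatorname{tr}(A) \left\{ q d \left[ 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right] - G^{(1)}_{q+2,n-p}(x; \Delta) \right\} \\
&\quad + \eta' A \eta \left\{ d q (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 q d \left[ E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right] \right. \\
&\qquad \left. - \left[ 2 G^{(2)}_{q+2,n-p}(x; \Delta) - G^{(2)}_{q+4,n-p}(x; \Delta) \right] \right\}.
\end{aligned}
\tag{3.10}
$$
The risk difference in (3.10) is positive, and therefore PT will dominate SE, if
$$\Delta \ge \frac{\operatorname{tr}(A) \left\{ d q \left( 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right) - G^{(1)}_{q+2,n-p}(x; \Delta) \right\}}{\gamma_p \times f_1(\Delta, \alpha)},$$
while SE will dominate PT whenever
$$\Delta \le \frac{\operatorname{tr}(A) \left\{ d q \left( 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right) - G^{(1)}_{q+2,n-p}(x; \Delta) \right\}}{\gamma_1 \times f_1(\Delta, \alpha)},$$
where
$$f_1(\Delta, \alpha) = d q (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 q d \left[ E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right] - \left[ 2 G^{(2)}_{q+2,n-p}(x; \Delta) - G^{(2)}_{q+4,n-p}(x; \Delta) \right].$$
Thus under the alternative hypothesis, SE will dominate RE if
$$\Delta \le \frac{\operatorname{tr}(A) \left\{ d q \left( 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right) \right\}}{\gamma_1 \times \left\{ d q (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 q d \left[ E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right] \right\}},$$
while RE will dominate SE if
$$\Delta \ge \frac{\operatorname{tr}(A) \left\{ d q \left( 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right) \right\}}{\gamma_p \times \left\{ d q (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 q d \left[ E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right] \right\}}.$$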
3.4 Comparison of $\hat\beta^{UE}$, $\hat\beta^{RE}$, $\hat\beta^{PT}$, $\hat\beta^{SE}$ and $\hat\beta^{PR}$
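First we compare $\hat\beta^{UE}$ and $\hat\beta^{PR}$. From (2.4) and (3.1) it is observed that
$$R(\hat\beta^{PR}; W) \le R(\hat\beta^{UE}; W), \qquad \forall \Delta,\; q \ge 3.$$
Thus $\hat\beta^{PR}$ uniformly dominates $\hat\beta^{UE}$. Further, the risk of $\hat\beta^{PR}$ remains below the risk of $\hat\beta^{UE}$ and merges with it when $\Delta \to \infty$. To compare $\hat\beta^{RE}$ and $\hat\beta^{PR}$ under the null hypothesis, we have
$$R(\hat\beta^{PR}; W) - R(\hat\beta^{RE}; W) = \sigma_e^2 \operatorname{tr}(A) \left\{ (1 - d) - E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(0) \right)^{2} I\!\left( F_{q+2,n-p}(0) \le \frac{dq}{q+2} \right) \right] \right\}. \tag{3.11}$$
Since
$$E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(0) \right)^{2} I\!\left( F_{q+2,n-p}(0) \le \frac{dq}{q+2} \right) \right] \le E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(0) \right)^{2} \right] = 1 - d,$$
the difference in (3.11) is always positive. Thus $\hat\beta^{RE}$ performs better than $\hat\beta^{PR}$ under $H_0$. However, as $\eta$ moves away from 0, $\eta' A \eta$ increases and the risk of $\hat\beta^{RE}$ becomes unbounded, while the risk of $\hat\beta^{PR}$ remains below the risk of $\hat\beta^{UE}$ and merges with it as $\Delta \to \infty$. Thus $\hat\beta^{PR}$ dominates $\hat\beta^{RE}$ outside an interval around the origin. Now we compare $\hat\beta^{PT}$ and $\hat\beta^{PR}$. Under $H_0$, the risk difference is
$$R(\hat\beta^{PR}; W) - R(\hat\beta^{PT}; W) = \sigma_e^2 \operatorname{tr}(A) \left\{ \left( G^{(1)}_{q+2,n-p}(x, 0) - d \right) - E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(0) \right)^{2} I\!\left( F_{q+2,n-p}(0) \le \frac{dq}{q+2} \right) \right] \right\}. \tag{3.12}$$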
The difference in (3.12) is always positive for all $\alpha$ satisfying the condition
$$\left\{ \alpha : F_\alpha > \frac{q+2}{q}\, F^{-1}_{q+2,n-p}(d^{*}, 0) \right\}, \tag{3.13}$$
where
$$d^{*} = d + E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(0) \right)^{2} I\!\left( F_{q+2,n-p}(0) \le \frac{dq}{q+2} \right) \right].$$
The risk of $\hat\beta^{PR}$ is smaller than the risk of $\hat\beta^{PT}$ when the critical value $F_\alpha$ satisfies the condition
$$\left\{ \alpha : F_\alpha < \frac{q+2}{q}\, F^{-1}_{q+2,n-p}(d^{*}, 0) \right\}. \tag{3.14}$$
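This leads to the conclusion that neither $\hat\beta^{PR}$ nor $\hat\beta^{PT}$ uniformly dominates the other under $H_0$. This is because, under $H_0$, the PT reduces to RE. Now we compare $\hat\beta^{PR}$ and $\hat\beta^{PT}$ under the alternative hypothesis. Subtracting the PT and PR risks in (2.4), the risk difference is
$$
\begin{aligned}
R(\hat\beta^{PR}; W) - R(\hat\beta^{PT}; W) &= -\sigma_e^2 \operatorname{tr}(A) \left\{ d q \left( 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right) \right. \\
&\qquad + E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) \right)^{2} I\!\left( F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2} \right) \right] \\
&\qquad \left. - G^{(1)}_{q+2,n-p}(x, \Delta) \right\} \\
&\quad + \eta' A \eta \left\{ q d (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 q d \left( E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right) \right. \\
&\qquad - \left( 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right) \\
&\qquad - E^{(2)}\!\left[ \left( 1 - \frac{qd}{q+4} F^{-1}_{q+4,n-p}(\Delta) \right)^{2} I\!\left( F_{q+4,n-p}(\Delta) \le \frac{dq}{q+4} \right) \right] \\
&\qquad \left. - 2 E^{(2)}\!\left[ \left( \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) - 1 \right) I\!\left( F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2} \right) \right] \right\}.
\end{aligned}
$$
The right-hand side of the above equation will be non-negative if
$$\Delta \ge \frac{f_2(\Delta, \alpha)}{\gamma_p \times f_3(\Delta, \alpha)}, \tag{3.15}$$
where
$$
\begin{aligned}
f_2(\Delta, \alpha) &= \operatorname{tr}(A) \left\{ d q \left( 2 E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2) E^{(1)}[\chi^{-4}_{q+2}(\Delta)] \right) \right. \\
&\qquad + E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) \right)^{2} I\!\left( F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2} \right) \right] \\
&\qquad \left. - G^{(1)}_{q+2,n-p}(x, \Delta) \right\}
\end{aligned}
\tag{3.16}
$$
and
$$
\begin{aligned}
f_3(\Delta, \alpha) &= q d (q-2) E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2 q d \left( E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] \right) \\
&\quad - \left( 2 G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta) \right) \\
&\quad - E^{(2)}\!\left[ \left( 1 - \frac{qd}{q+4} F^{-1}_{q+4,n-p}(\Delta) \right)^{2} I\!\left( F_{q+4,n-p}(\Delta) \le \frac{dq}{q+4} \right) \right] \\
&\quad - 2 E^{(2)}\!\left[ \left( \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) - 1 \right) I\!\left( F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2} \right) \right].
\end{aligned}
$$
Thus the PR estimator will dominate the PT estimator when (3.15) holds, while PT will dominate PR when
$$\Delta \le \frac{f_2(\Delta, \alpha)}{\gamma_1 \times f_3(\Delta, \alpha)}. \tag{3.17}$$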
Finally we compare the risks of $\hat\beta^{PR}$ and $\hat\beta^{SE}$. From (2.4), the risk difference is given by
$$
\begin{aligned}
R(\hat\beta^{PR}; W) - R(\hat\beta^{SE}; W) &= -\sigma_e^2 \operatorname{tr}(A)\, E^{(1)}\!\left[ \left( 1 - \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) \right)^{2} I\!\left( F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2} \right) \right] \\
&\quad - \eta' A \eta\, E^{(2)}\!\left[ \left( 1 - \frac{qd}{q+4} F^{-1}_{q+4,n-p}(\Delta) \right)^{2} I\!\left( F_{q+4,n-p}(\Delta) \le \frac{dq}{q+4} \right) \right] \\
&\quad - 2 \eta' A \eta\, E^{(2)}\!\left[ \left( \frac{qd}{q+2} F^{-1}_{q+2,n-p}(\Delta) - 1 \right) I\!\left( F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2} \right) \right].
\end{aligned}
\tag{3.18}
$$
The right-hand side of (3.18) is always negative, since the expectation of a positive random variable is positive. Thus for all $\beta$, the risk of $\hat\beta^{PR}$ is smaller than the risk of $\hat\beta^{SE}$. Therefore, the positive rule shrinkage estimator (PR) not only confirms the inadmissibility of the shrinkage estimator (SE), but also provides a simple superior estimator. Based on the above discussion we may state the following theorem.
Theorem 3.1: Under the null hypothesis and the inequalities (3.8), (3.9), (3.13) and (3.14), the dominance picture of the estimators is as follows:
$$\hat\beta^{RE} \ge \hat\beta^{PT} \ge \hat\beta^{PR} \ge \hat\beta^{SE} \ge \hat\beta^{UE}, \tag{3.19}$$
where the notation $\ge$ means "dominates" in the sense of smaller risk. The position of the preliminary test estimator may shift from "in between" $R(\hat\beta^{RE}; W)$ and $R(\hat\beta^{PR}; W)$ to "in between" $R(\hat\beta^{SE}; W)$ and $R(\hat\beta^{UE}; W)$. Thus the dominance picture under $H_0$ may change as follows:
$$\hat\beta^{RE} \ge \hat\beta^{PR} \ge \hat\beta^{SE} \ge \hat\beta^{PT} \ge \hat\beta^{UE}. \tag{3.20}$$
The dominance pictures in (3.19) and (3.20) change as $\eta$ moves away from 0. We note that $\hat\beta^{UE}$ has constant risk $\sigma_e^2 \operatorname{tr}(C^{-1}W)$, while the risk of $\hat\beta^{RE}$ depends on $\eta$ and therefore becomes unbounded as $\eta$ moves away from 0. Also, for $\Delta \to \infty$, the risks of $\hat\beta^{PT}$ and $\hat\beta^{PR}$ converge to the risk of $\hat\beta^{UE}$. For reasonable $\eta$ near 0, the risk of $\hat\beta^{PT}$ is smaller than that of $\hat\beta^{PR}$ for $q \ge 3$. Thus neither $\hat\beta^{PT}$ nor $\hat\beta^{PR}$ dominates the other, except that they share the common property that as $\Delta \to \infty$ the risk of both becomes $\sigma_e^2 \operatorname{tr}(C^{-1}W)$. However, the risk of $\hat\beta^{PR}$ stays below the risk of $\hat\beta^{UE}$, while the risk of $\hat\beta^{PT}$ exceeds the risk of $\hat\beta^{UE}$ at some intermediate values of $\Delta$ that depend on $\alpha$.
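The ordering under $H_0$ can be illustrated by a small Monte Carlo experiment. The sketch below is not from the paper and rests on several assumptions: the design matrix, $H$, $\beta$, $\nu$, $\alpha$ and the seed are hypothetical, $W = C$, $s^2$ is taken to be $\tilde\sigma_e^2$, and the critical value $L_{n,\alpha}$ is replaced by the empirical upper-$\alpha$ quantile of the simulated $L_n$ values. The PR estimator is coded in the equivalent form $\hat\beta^{RE} + \max(0, 1 - dL_n^{-1})(\hat\beta^{UE} - \hat\beta^{RE})$.

```python
import numpy as np

rng = np.random.default_rng(3)                          # illustrative seed
n, p, q, nu, sigma, alpha = 30, 8, 5, 10.0, 1.0, 0.05   # hypothetical settings
X = rng.standard_normal((n, p))
C = X.T @ X
C_inv = np.linalg.inv(C)
H = np.hstack([np.eye(q), np.zeros((q, p - q))])
h = np.zeros(q)
K = np.linalg.inv(H @ C_inv @ H.T)
P = C_inv @ H.T @ K
beta = np.concatenate([np.zeros(q), np.ones(p - q)])    # satisfies H beta = h
d = (q - 2) * (n - p) / (q * (n - p + 2))

reps = 100_000
theta = sigma * np.sqrt(nu / rng.chisquare(nu, reps))   # mixing scales, as in (1.2)
Y = X @ beta + theta[:, None] * rng.standard_normal((reps, n))

UE = Y @ X @ C_inv                                      # one estimate per row
gap = UE @ H.T - h
RE = UE - gap @ P.T
s2 = np.sum((Y - UE @ X.T) ** 2, axis=1) / (n - p)
L = np.einsum("ij,jk,ik->i", gap, K, gap) / (q * s2)    # test statistic L_n
crit = np.quantile(L, 1 - alpha)                        # empirical stand-in for L_{n,alpha}

diff = UE - RE
PT = np.where((L <= crit)[:, None], RE, UE)
SE = UE - (d / L)[:, None] * diff
PR = RE + np.maximum(0.0, 1 - d / L)[:, None] * diff

def risk(B):
    """Monte Carlo quadratic risk with W = C."""
    D = B - beta
    return float(np.mean(np.einsum("ij,jk,ik->i", D, C, D)))

risks = {name: risk(B) for name, B in
         [("UE", UE), ("RE", RE), ("PT", PT), ("SE", SE), ("PR", PR)]}
```

Moving `beta` away from the restricted subspace and rerunning shows the RE risk growing without bound while the SE and PR risks stay below the UE risk, in line with the discussion above. The value of $q$ is taken larger than 4 here only so that the simulated SE loss has finite variance and the averages stabilise quickly.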
4 Summary and Conclusion
In this paper we discussed some finite sample theory for five well-known estimators of $\beta$ that combine sample and non-sample information. The RE performs best compared to the other estimators in the neighborhood of the null hypothesis; however, it performs worst when $\Delta$ moves away from the origin. We have established the superiority conditions of the estimators based on the quadratic risk function. We find that $\hat\beta^{SE}$ and $\hat\beta^{PR}$ are more efficient than $\hat\beta^{UE}$ in the whole parameter space. The performance of the estimators is robust within the class of t distributions, which is indexed by the degrees of freedom $\nu$. Note that the application of $\hat\beta^{PR}$ and $\hat\beta^{SE}$ is constrained by the requirement $q \ge 3$, while $\hat\beta^{PT}$ does not need such a constraint. However, the choice of the level of significance of the test has a dramatic impact on the nature of the risk function of the PT estimator. Thus when $q \ge 3$ one would use $\hat\beta^{PR}$; otherwise one would use $\hat\beta^{PT}$ with some optimum size $\alpha$.
References
[1] Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis. Second Edition. John Wiley, NY.
[2] Adkins, L. C. and Hill, R. C. (1990). The RLS positive part Stein estimator. American Journal of Agricultural Economics, 72, 727-730.
[3] Bancroft, T. A. (1944). On biases in estimation due to use of preliminary tests of significance. Annals of Mathematical Statistics, 15, 190-204.
[4] Bancroft, T. A. (1964). Analysis and inference for incompletely specified models involving the use of preliminary test(s) of significance. Biometrics, 20, 427-442.
[5] Benda, N. (1996). Pre-test estimation and design in the linear model. J. Statistical Planning and Inference, 52, 225-240.
[6] Blattberg, R. C. and Gonedes, N. J. (1974). A comparison of the stable and Student t distributions as statistical models for stock prices. Journal of Business, 47, 224-280.
[7] Fama, E. F. (1965). The behavior of stock market prices. Journal of Business, 38, 34-105.
[8] Giles, A. J. (1991). Pretesting for linear restrictions in a regression model with spherically symmetric distributions. J. Econometrics, 50, 377-398.
[9] Giles, A. J. (1992). Estimation of the error variance after a preliminary test of homogeneity in a regression model with spherically symmetric disturbances. J. Econometrics, 53, 345-361.
[10] Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York.
[11] Han, C-P. and Bancroft, T. A. (1968). On pooling means when variance is unknown. Journal of the American Statistical Association, 63, 1333-1342.
[12] James, W. and Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, CA, 361-379.
[13] Judge, G. G. and Bock, M. E. (1978). The Statistical Implications of Pre-test and Stein-rule Estimators in Econometrics. North-Holland Publishing Company, Amsterdam.
[14] Judge, G. C., Miyazaki, S. and Yancey, T. (1985). Minimax estimators for the location vectors of spherically symmetric densities. Econometric Theory, 1, 409-417.
[15] Kibria, B. M. G. (1996). On shrinkage ridge regression estimators for restricted linear models with multivariate t disturbances. Students, 1 (3), 177-188.
[16] Kibria, B. M. G. and Saleh, A. K. Md. E. (1993). Performance of shrinkage preliminary test estimator in regression analysis. Jahangirnagar Review, A 17, 133-148.
[17] Kibria, B. M. G. and Saleh, A. K. Md. E. (2003a). Effect of W, LR, and LM tests on the performance of preliminary test ridge regression estimators. Journal of the Japan Statistical Society, 33(1), 119-136.
[18] Kibria, B. M. G. and Saleh, A. K. Md. E. (2003b). Preliminary test ridge regression estimators with Student's t errors and conflicting test-statistics. Metrika, 59, 105-124.
[19] Mosteller, F. (1948). On pooling data. Journal of the American Statistical Association, 43, 231-242.
[20] Ohtani, K. (1993). A comparison of the Stein-rule and positive part Stein-rule estimators in a misspecified linear regression model. Econometric Theory, 9, 668-679.
[21] Saleh, A. K. Md. E. (2002). Theory of Preliminary Test and Stein-Type Estimation with Application. Unpublished manuscript, School of Mathematics and Statistics, Carleton University, Ottawa, Ontario, K1S 5B6, Canada.
[22] Saleh, A. K. Md. E. and Kibria, B. M. G. (1993). Performances of some new preliminary test ridge regression estimators and their properties. Communications in Statistics - Theory and Methods, 22, 2747-2764.
[23] Saleh, A. K. Md. E. and Sen, P. K. (1978). Non-parametric estimation of location parameter after a preliminary test on regression. Annals of Statistics, 6, 154-168.
[24] Shalabh (1995). Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression models. Bull. Internat. Statist. Inst., 56, 1375-1390.
[25] Singh, R. S. (1991). James-Stein rule estimators in linear regression models with multivariate t distributed error. Australian J. of Statistics, 33, 145-158.
[26] Sutradhar, B. C. and Ali, M. M. (1986). Estimation of the parameter of a regression model with a multivariate t error. Communications in Statistics, A 15, 429-450.
[27] Ullah, A. and Zinde-Walsh, V. (1984). On the robustness of LM, LR and W tests in regression models. Econometrica, 52, 1055-1066.
[28] Zellner, A. (1976). Bayesian and non-Bayesian analysis of the regression model with multivariate Student t error terms. Journal of the American Statistical Association, 71, 400-405.