
Journal of Statistical Research 2004, Vol. 38, No. 1, pp. 13-31 Bangladesh

ISSN 0256 - 422 X

ESTIMATION STRATEGIES FOR PARAMETERS OF THE LINEAR REGRESSION MODELS WITH SPHERICALLY SYMMETRIC DISTRIBUTIONS

S. M. M. Tabatabaey
Department of Statistics, School of Mathematical Sciences, Ferdowsi University of Mashhad, P. O. Box 1159-9177948953, Mashhad, Iran.

A. K. Md. E. Saleh
Department of Mathematics and Statistics, Carleton University, Ottawa, Canada K1S 5B6.

B. M. Golam Kibria
Department of Statistics, Florida International University, Miami, FL 33199, USA.

Summary

This paper deals with pre-test and shrinkage estimation of the parameters of the linear regression model with spherically symmetric error distributions when sub-hypothesis restrictions on the parameters are suspected. Accordingly, we consider five well-known estimators: the unrestricted estimator (UE), the restricted estimator (RE), the preliminary test (PT) estimator, the shrinkage estimator (SE) and the positive-rule (PR) shrinkage estimator. The bias and risk functions of these estimators are analyzed under both the null and the alternative hypotheses.

Under the null hypothesis, the RE has the smallest risk, followed by the pre-test and shrinkage estimators. However, when the parameter moves away from the subspace of the restrictions, the pre-test and shrinkage estimators perform best, followed by the UE and then the RE. Conditions on the departure parameter under which each estimator is superior are provided. It is demonstrated that the positive-rule shrinkage estimator uses both sample and non-sample information and performs uniformly better than the UE and the ordinary shrinkage estimator.

Keywords and phrases: Bias; Dominance; James and Stein Estimator; Preliminary Test; Quadratic Risk; Spherical Distribution; Student's t; Superiority.

AMS Classification: 62J07, 62F03

© Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka 1000, Bangladesh.

TABATABAEY, SALEH & KIBRIA

1 Introduction

This paper discusses the estimation of regression parameters when the errors of the model have a spherically symmetric distribution. To describe the problem, we consider the linear regression model

$$y = X\beta + e, \tag{1.1}$$

where

- $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of observations on the dependent variable,
- $X$ is an $n \times p$ matrix of full rank $p$,
- $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of parameters, and
- $e = (e_1, e_2, \ldots, e_n)'$ is an $n \times 1$ vector of errors.

The distribution of $e$ belongs to the class of spherical compound normal distributions with $E(e) = 0$ and $E(ee') = \sigma_e^2 I_n$. Here $I_n$ is the $n$-dimensional identity matrix and $\sigma_e^2$ is the common variance of the $e_i$ $(i = 1, 2, \ldots, n)$. This class is a subclass of the family of spherically symmetric distributions (SSDs). It can be expressed as a variance mixture of normal distributions, that is,

$$f(e) = \int_0^\infty f(e \mid \theta)\, g(\theta)\, d\theta, \tag{1.2}$$

where $f(e)$ is the probability density function (pdf) of $e$, $f(e \mid \theta)$ is the pdf of the normal distribution with mean vector $0$ and variance-covariance matrix $\theta^2 I_n$, and $g(\theta)$ is the pdf of $\theta$ with support $[0, \infty)$. In this case $E(\theta^2) = \sigma_e^2$, and we write $e \sim \mathrm{SSD}(0, E(\theta^2) I_n)$. Well-known members of the SSD class are the normal, the multivariate Student's t and the Cauchy distribution. The Cauchy is the special case of the multivariate Student's t with 1 degree of freedom.

In most applied and theoretical work, the error terms in linear models are assumed to be normally and independently distributed. However, this assumption may not be appropriate in many practical situations (see, for example, Gnanadesikan (1977) and Zellner (1976)), particularly when the error distribution has heavier tails than the normal. For instance, some economic data may be generated by processes whose distributions have more kurtosis than the normal distribution. Such situations can be handled with the t distribution, which has heavier tails than the normal, especially for small degrees of freedom (e.g. Fama (1965), Blattberg and Gonedes (1974), Sutradhar and Ali (1986), Ullah and Zinde-Walsh (1984)).

The multivariate Student's t distribution is obtained when $g(\theta)$ is the inverted gamma (IG) density with scale parameter $\sigma^2$ and $\nu$ degrees of freedom, denoted IG$(\nu, \sigma^2)$:

$$g(\theta) = \frac{2}{\Gamma(\nu/2)} \left(\frac{\nu\sigma^2}{2}\right)^{\nu/2} \theta^{-(\nu+1)} \exp\left(-\frac{\nu\sigma^2}{2\theta^2}\right), \qquad 0 < \nu, \sigma, \theta < \infty.$$

The multivariate t density then follows from (1.2) as

$$f(e) = \frac{\Gamma\left(\frac{n+\nu}{2}\right)}{(\pi\nu)^{n/2}\,\Gamma(\nu/2)\,\sigma^n} \left(1 + \frac{e'e}{\nu\sigma^2}\right)^{-\frac{n+\nu}{2}}, \qquad 0 < \nu, \sigma < \infty,\ -\infty < e_i < \infty. \tag{1.3}$$


Estimation Strategies for...

The mean vector and variance-covariance matrix of $e$ are, respectively,

$$E(e) = 0 \quad \text{and} \quad E(ee') = \frac{\nu}{\nu - 2}\,\sigma^2 I_n = \sigma_e^2 I_n, \qquad \nu > 2.$$
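The mixture representation (1.2) also gives a simple way to simulate these errors. The sketch below is our illustration, not part of the paper, and assumes NumPy. It relies on the fact that, under the IG$(\nu, \sigma^2)$ density above, $1/\theta^2$ is gamma distributed with shape $\nu/2$ and rate $\nu\sigma^2/2$, so that $\theta^2 = \nu\sigma^2/\chi^2_\nu$. The function name `sample_mvt_errors` and the values $n = 3$, $\nu = 5$, $\sigma = 1$ are illustrative. The sample covariance matrix should be close to $(\nu/(\nu-2))\sigma^2 I_n$.

```python
import numpy as np


def sample_mvt_errors(n, nu, sigma, size, rng):
    """Draw `size` error vectors of length n from the multivariate t density (1.3)
    through the normal variance mixture (1.2): e = theta * z, z ~ N(0, I_n)."""
    # Under IG(nu, sigma^2), 1/theta^2 ~ Gamma(shape nu/2, rate nu*sigma^2/2),
    # i.e. theta^2 = nu * sigma^2 / chi^2_nu.
    theta = np.sqrt(nu * sigma**2 / rng.chisquare(nu, size=size))
    z = rng.standard_normal((size, n))
    # Each error vector gets a single theta shared by all of its n components.
    return theta[:, None] * z


rng = np.random.default_rng(12345)
n, nu, sigma = 3, 5.0, 1.0
e = sample_mvt_errors(n, nu, sigma, 200_000, rng)
emp_cov = e.T @ e / e.shape[0]        # E(e) = 0, so no centring is needed
sigma_e2 = nu / (nu - 2) * sigma**2   # theoretical E(e e') = sigma_e^2 I_n
print(np.round(emp_cov, 2))
print(round(sigma_e2, 4))
```

A single $\theta$ is shared by all $n$ coordinates of $e$. As a result, the components are uncorrelated but not independent, which distinguishes the multivariate t from $n$ independent univariate t errors.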

The marginal distributions are univariate Student's t distributions. For $\nu = 1$ the pdf (1.3) becomes the Cauchy density, and as $\nu \to \infty$ it approaches the normal density. For the full model, the unrestricted estimator (UE) of $\beta$ is

$$\hat\beta^{UE} = C^{-1} X' y, \tag{1.4}$$

where $C = X'X$ is the information matrix. The corresponding unbiased estimator of $\sigma_e^2$ is

$$\tilde\sigma_e^2 = \frac{(y - X\hat\beta^{UE})'(y - X\hat\beta^{UE})}{m}, \qquad m = n - p.$$

Our primary interest is to estimate the regression coefficients $\beta$ when it is suspected a priori, but not certain, that $\beta$ may be restricted to the subspace

$$H_0: H\beta = h, \tag{1.5}$$

where $H$ is a $q \times p$ known matrix of full rank $q\ (< p)$ and $h$ is a $q \times 1$ vector of known constants. The restricted estimator (RE) of $\beta$ is

$$\hat\beta^{RE} = \hat\beta^{UE} - C^{-1}H'(HC^{-1}H')^{-1}(H\hat\beta^{UE} - h), \tag{1.6}$$

and the corresponding estimator of $\sigma_e^2$ is

$$\hat\sigma_e^2 = \frac{(y - X\hat\beta^{RE})'(y - X\hat\beta^{RE})}{m + q}, \qquad m = n - p,$$

which is unbiased under the null hypothesis. Note that the restricted least squares estimator satisfies $H\hat\beta^{RE} = h$.

The estimator (1.4) is usually used when no hypothesis information on the parameter vector $\beta$ is available. The estimator (1.6), by contrast, is useful in the presence of the hypothesis (1.5). It is well known that the RE performs better than the UE when the restrictions hold. As $\beta$ moves away from the subspace $H\beta = h$, however, the RE becomes biased and inefficient, while the performance of the UE remains stable.

One may therefore combine the UE and RE to obtain a better estimator in the presence of the uncertain prior information (UPI) $H\beta = h$. This leads to the preliminary test (PT) estimator, defined as

$$\hat\beta^{PT} = \hat\beta^{RE}\, I(L_n \le L_{n,\alpha}) + \hat\beta^{UE}\, I(L_n > L_{n,\alpha}), \tag{1.7}$$

where

$$L_n = \frac{(H\hat\beta^{UE} - h)'(HC^{-1}H')^{-1}(H\hat\beta^{UE} - h)}{q\, s^2}, \qquad s^2 = \tilde\sigma_e^2,$$

is the test statistic for testing the null hypothesis (1.5), $L_{n,\alpha}$ is the upper $\alpha$-level critical value of $L_n$, and $I(A)$ is the indicator function of the set $A$. Under the null hypothesis and normal theory, $L_n$ follows a central $F$-distribution with $(q, m)$ degrees of freedom. Under the alternative, it follows a non-central $F$-distribution with $(q, m)$ degrees of freedom and non-centrality parameter $\frac{1}{2}\Delta$, where

$$\Delta = \frac{(H\beta - h)'(HC^{-1}H')^{-1}(H\beta - h)}{\sigma_e^2}$$

is the departure parameter from the null hypothesis. It is important to remark that $\hat\beta^{PT}$ is bounded and performs better than $\hat\beta^{RE}$ in some part of the parameter space.

The preliminary test approach to estimation was pioneered by Bancroft (1944). It was followed by Bancroft (1964), Mosteller (1948), Han and Bancroft (1968), Saleh and Sen (1978), Giles (1991), Kibria and Saleh (1993), Benda (1996) and, more recently, Kibria and Saleh (2003a, b), among others. These authors assumed that the disturbances of the model are normally distributed.

Note that the PT estimator has two characteristics: (1) it takes only two values, the unrestricted estimator and the restricted estimator, and (2) it depends heavily on the level of significance of the preliminary test. In particular, it never takes values intermediate between $\hat\beta^{UE}$ and $\hat\beta^{RE}$. To overcome this shortcoming, we consider the Stein-type estimator. The Stein-type shrinkage estimator (SE) of $\beta$ is defined as

$$\hat\beta^{SE} = \hat\beta^{UE} - d L_n^{-1} (\hat\beta^{UE} - \hat\beta^{RE}), \tag{1.8}$$

where

$$d = \frac{(q - 2)(n - p)}{q(n - p + 2)}$$

and $q \ge 3$.

The SE in (1.8) provides a uniform improvement over $\hat\beta^{UE}$; however, it is not a convex combination of $\hat\beta^{UE}$ and $\hat\beta^{RE}$. Both (1.7) and (1.8) involve the statistic $L_n$, which adjusts the estimator for departures from $H_0$. For large values of $L_n$, (1.7) yields $\hat\beta^{UE}$ and (1.8) approaches it, while for small values of $L_n$ the two behave differently.

A drawback of the SE is its behaviour for small $L_n$. The shrinkage factor $(1 - dL_n^{-1})$ becomes negative for $L_n < d$, so the SE then moves beyond $\hat\beta^{RE}$, on the opposite side from $\hat\beta^{UE}$. This motivates an alternative estimator, the positive-rule shrinkage estimator (PR) of $\beta$:

$$\hat\beta^{PR} = \hat\beta^{SE} - (1 - dL_n^{-1})\, I(L_n \le d)\, (\hat\beta^{UE} - \hat\beta^{RE}). \tag{1.9}$$
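A minimal NumPy sketch of the five estimators (1.4) and (1.6)-(1.9) for a single data set follows. It is an illustration under stated assumptions, not the authors' code. The following are all hypothetical choices:

- the function name `pretest_shrinkage_estimators`,
- the simulated design,
- the restriction matrix $H$,
- the critical value `F_alpha = 3.29` (approximately the upper 5% point of $F(3, 15)$), and
- $s^2$ in $L_n$, taken to be the unbiased estimator $\tilde\sigma_e^2$.

```python
import numpy as np


def pretest_shrinkage_estimators(X, y, H, h, F_alpha):
    """UE (1.4), RE (1.6), PT (1.7), SE (1.8) and PR (1.9) for one data set.

    F_alpha is the upper-alpha critical value of F(q, n - p); it is passed in by
    the caller (for instance from an F table) rather than computed here."""
    n, p = X.shape
    q = H.shape[0]
    m = n - p
    C_inv = np.linalg.inv(X.T @ X)                # C = X'X
    b_ue = C_inv @ X.T @ y                        # (1.4)
    resid = y - X @ b_ue
    s2 = resid @ resid / m                        # unbiased estimator of sigma_e^2
    D = H @ C_inv @ H.T                           # H C^{-1} H'
    gap = H @ b_ue - h
    b_re = b_ue - C_inv @ H.T @ np.linalg.solve(D, gap)   # (1.6)
    L_n = gap @ np.linalg.solve(D, gap) / (q * s2)        # test statistic L_n
    d = (q - 2) * m / (q * (m + 2))               # shrinkage constant (needs q >= 3)
    b_pt = b_re if L_n <= F_alpha else b_ue       # (1.7)
    b_se = b_ue - (d / L_n) * (b_ue - b_re)       # (1.8)
    # (1.9): remove the shrinkage whenever the factor (1 - d / L_n) is negative
    b_pr = b_se - (1 - d / L_n) * float(L_n <= d) * (b_ue - b_re)
    return {"UE": b_ue, "RE": b_re, "PT": b_pt, "SE": b_se, "PR": b_pr,
            "L_n": L_n, "d": d}


rng = np.random.default_rng(0)
n, p = 20, 5
X = rng.standard_normal((n, p))
H = np.array([[1.0, -1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0]])    # H0: beta1 = beta2 = beta3, beta4 = -beta5
h = np.zeros(3)
beta = np.array([1.0, 1.0, 1.0, 0.5, -0.5])  # satisfies H beta = h
y = X @ beta + rng.standard_normal(n)
est = pretest_shrinkage_estimators(X, y, H, h, F_alpha=3.29)
for name in ("UE", "RE", "PT", "SE", "PR"):
    print(name, np.round(est[name], 3))
```

Written this way, $\hat\beta^{PR} = \hat\beta^{RE} + \max(0, 1 - d/L_n)(\hat\beta^{UE} - \hat\beta^{RE})$. This makes the convex-combination property of the PR estimator explicit.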

The PR estimator in (1.9) provides a uniform improvement over $\hat\beta^{UE}$ and, unlike the SE, it is a convex combination of $\hat\beta^{UE}$ and $\hat\beta^{RE}$.

The properties of Stein-type estimators under the normality assumption have been analyzed by various researchers, including James and Stein (1961), Judge and Bock (1978) and Shalabh (1995). The positive-part shrinkage estimator has been considered under the normality assumption by Ohtani (1993) and Adkins and Hill (1990), among others.

Giles (1991) considered the pre-test estimator for the restricted linear model with spherically symmetric disturbances, and investigated some finite-sample properties of the estimators numerically for the case of multivariate t errors. Giles (1992) also considered a PT estimator for the two-sample linear regression model under spherically symmetric disturbances. Kibria (1996) considered the SE for multicollinear data and for the restricted linear model with Student's t errors. Using the MSE criterion, he discussed the performance of the estimators with respect to both the non-centrality and the ridge parameters. Judge et al. (1985) discussed the finite-sample properties of the James-Stein-type estimator of the location parameter vector, and of its positive part, under squared error loss and multivariate t errors; they compared the risk functions of the estimators via Monte Carlo experiments. Singh (1991) discussed the properties of James-Stein rule estimators in a regression model with multivariate Student's t errors.

In general, the risk characteristics are found to be the same under normal and non-normal errors. However, there is very limited literature on analytical results for the finite-sample properties of the positive-rule shrinkage estimator in the linear model with non-normal errors. Since the PR estimator has some advantages over the SE and PT estimators, the main objective of this paper is to study the proposed estimators of the regression coefficients under Student's t disturbances.

The plan of the paper is as follows. Section 2 gives the expressions for the biases and risks. Section 3 discusses the relative performance of the estimators. Finally, Section 4 contains a summary and conclusions.

2 The bias and risk of the estimators

In this section we give the expressions for the bias and quadratic risk of the estimators $\hat\beta^{UE}$, $\hat\beta^{RE}$, $\hat\beta^{PT}$, $\hat\beta^{SE}$ and $\hat\beta^{PR}$.

2.1 Biases of the estimators

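In this subsection we discuss the biases of the estimators. The biases follow routinely from Judge and Bock (1978, Chapter 10) and Saleh (2002). We therefore omit the derivations and present the expressions in the following theorem.

Theorem 2.1 The biases of the unrestricted estimator (UE), restricted estimator (RE), preliminary test estimator (PT), shrinkage estimator (SE) and positive-rule shrinkage estimator (PR) are, respectively,

$$
\begin{aligned}
B(\hat\beta^{UE}) &= E(\hat\beta^{UE} - \beta) = 0,\\
B(\hat\beta^{RE}) &= E(\hat\beta^{RE} - \beta) = -\eta,\\
B(\hat\beta^{PT}) &= E(\hat\beta^{PT} - \beta) = -\eta\, G^{(2)}_{q+2,n-p}(x, \Delta),\\
B(\hat\beta^{SE}) &= E(\hat\beta^{SE} - \beta) = -qd\,\eta\, E^{(2)}[\chi^{-2}_{q+2}(\Delta)],\\
B(\hat\beta^{PR}) &= E(\hat\beta^{PR} - \beta) = \eta \Big\{ \frac{qd}{q+2} E^{(2)}\Big[F^{-1}_{q+2,n-p}(\Delta)\, I\Big(F_{q+2,n-p}(\Delta) \le \frac{qd}{q+2}\Big)\Big] - \frac{qd}{q+2} E^{(2)}[F^{-1}_{q+2,n-p}(\Delta)] - G^{(2)}_{q+2,n-p}(x, \Delta) \Big\},
\end{aligned}
\tag{2.1}
$$

where $\eta = C^{-1}H'(HC^{-1}H')^{-1}(H\beta - h)$. For $i = 1, 2$ and $j = 1, 2$,

$$G^{(j)}_{q+2i,n-p}(x; \Delta) = \sum_{r=0}^{\infty} \frac{\Gamma\left(\frac{\nu}{2} + r + j - 2\right)}{\Gamma(r+1)\,\Gamma\left(\frac{\nu}{2} + j - 2\right)} \frac{\left(\frac{\Delta}{\nu - 2}\right)^{r}}{\left(1 + \frac{\Delta}{\nu - 2}\right)^{\frac{\nu}{2} + r + j - 2}}\; I_x\left(\tfrac{1}{2}(q + 2i) + r,\ \tfrac{1}{2}(n - p)\right).$$

Here $I_x(\cdot, \cdot)$ is the incomplete beta function and

$$x = \frac{qF_\alpha}{n - p + qF_\alpha}.$$

Also,

$$E^{(j)}[\chi^{-2}_{q+s}(\Delta)] = \sum_{r=0}^{\infty} \frac{\Gamma\left(\frac{\nu}{2} + r + j - 2\right)}{\Gamma(r+1)\,\Gamma\left(\frac{\nu}{2} + j - 2\right)} \frac{\left(\frac{\Delta}{\nu - 2}\right)^{r}}{\left(1 + \frac{\Delta}{\nu - 2}\right)^{\frac{\nu}{2} + r + j - 2}}\, (q + s - 2 + 2r)^{-1},$$

$$E^{(j)}[\chi^{-2}_{q+s}(0)] = (q + s - 2)^{-1}, \qquad j = 1, 2,$$

and

$$E^{(j)}\Big[F^{-1}_{q+s,n-p}(\Delta)\, I\Big(F_{q+s,n-p}(\Delta) \le \frac{qd}{q+s}\Big)\Big] = \sum_{r=0}^{\infty} \frac{\Gamma\left(\frac{\nu}{2} + r + j - 2\right)}{\Gamma(r+1)\,\Gamma\left(\frac{\nu}{2} + j - 2\right)} \frac{\left(\frac{\Delta}{\nu - 2}\right)^{r}}{\left(1 + \frac{\Delta}{\nu - 2}\right)^{\frac{\nu}{2} + r + j - 2}}\, \frac{(q + s)\, I_x\left(\frac{1}{2}(q + s - 2 + 2r),\ \frac{1}{2}(n - p + 2)\right)}{q + s - 2 + 2r},$$

$$E^{(j)}\Big[F^{-1}_{q+s,n-p}(0)\, I\Big(F_{q+s,n-p}(0) \le \frac{qd}{q+s}\Big)\Big] = (q + s)(q + s - 2)^{-1}\, I_x\left(\tfrac{1}{2}(q + s - 2),\ \tfrac{1}{2}(n - p + 2)\right), \qquad j = 1, 2.$$

In these last two expressions,

$$x = \frac{qd}{n - p + qd}.$$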
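The series weights in these expressions are easy to evaluate numerically. Let $t = \Delta/(\nu - 2)$ and $a = \nu/2 + j - 2$. Then the $r$-th weight is $\Gamma(a + r)\,t^r / \{\Gamma(r+1)\Gamma(a)(1 + t)^{a + r}\}$. This is the negative binomial probability of $r$ with shape $a$ and success probability $t/(1+t)$, so the weights sum to one. (This reading is our observation, not a statement from the paper.) The standard-library sketch below truncates each series after `r_max` terms, an assumption that is adequate for moderate $\Delta$. The function names are hypothetical.

```python
import math


def mixture_weights(delta, nu, j, r_max=2000):
    """Weights of the series defining G^{(j)} and E^{(j)} (j = 1, 2; nu > 2):
    Gamma(nu/2 + r + j - 2) / (Gamma(r + 1) Gamma(nu/2 + j - 2))
      * t^r / (1 + t)^(nu/2 + r + j - 2),  with t = delta / (nu - 2)."""
    a = nu / 2 + j - 2
    t = delta / (nu - 2)
    if t == 0.0:
        return [1.0] + [0.0] * (r_max - 1)    # only the r = 0 term survives
    return [math.exp(math.lgamma(a + r) - math.lgamma(r + 1) - math.lgamma(a)
                     + r * math.log(t) - (a + r) * math.log1p(t))
            for r in range(r_max)]


def E_inv_chi2(q, s, delta, nu, j):
    """Truncated series for E^{(j)}[chi^{-2}_{q+s}(Delta)]."""
    w = mixture_weights(delta, nu, j)
    return sum(w_r / (q + s - 2 + 2 * r) for r, w_r in enumerate(w))


total = sum(mixture_weights(4.0, 5.0, 2))
print(total)                                  # 1 up to rounding and truncation
print(E_inv_chi2(3, 2, 0.0, 5.0, 2))          # equals 1 / (q + s - 2) = 1/3
print(E_inv_chi2(3, 2, 5.0, 5.0, 2))
```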



For $\alpha = 0$ the bias of $\hat\beta^{PT}$ coincides with that of the restricted estimator $\hat\beta^{RE}$. For $\alpha = 1$ it coincides with that of the unrestricted estimator $\hat\beta^{UE}$. As the non-centrality parameter $\Delta \to \infty$, $B(\hat\beta^{UE}) = 0$ and $B(\hat\beta^{PT})$, $B(\hat\beta^{SE})$ and $B(\hat\beta^{PR})$ tend to $0$, while $B(\hat\beta^{RE})$ becomes unbounded. Under $H_0: H\beta = h$ we have $\eta = 0$ and $\Delta = 0$, hence all the estimators are unbiased. Note that $\Delta$ can be expressed in terms of $\eta$ as $\Delta = \eta' C \eta / \sigma_e^2$.
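The identity $\Delta = \eta' C \eta / \sigma_e^2$ and the bias $B(\hat\beta^{RE}) = -\eta$ of Theorem 2.1 can be checked numerically. In this NumPy sketch the design, $H$, $h$, $\beta$ and $\sigma_e^2 = 2$ are arbitrary illustrative values. Because $\hat\beta^{RE}$ is linear in $y$, its expectation is obtained by evaluating it at $E(y) = X\beta$, whatever the zero-mean error distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, sigma_e2 = 30, 5, 3, 2.0
X = rng.standard_normal((n, p))
C = X.T @ X
C_inv = np.linalg.inv(C)
H = rng.standard_normal((q, p))        # full row rank with probability one
h = rng.standard_normal(q)
beta = rng.standard_normal(p)          # an arbitrary point, generally off H beta = h
D = H @ C_inv @ H.T

# Departure parameter from its definition after (1.7).
gap = H @ beta - h
delta_def = gap @ np.linalg.solve(D, gap) / sigma_e2

# The same quantity expressed through eta.
eta = C_inv @ H.T @ np.linalg.solve(D, gap)
delta_eta = eta @ C @ eta / sigma_e2

# beta_hat_RE is linear in y, so E(beta_hat_RE) is beta_hat_RE evaluated at E(y) = X beta.
y_mean = X @ beta
b_ue = C_inv @ X.T @ y_mean
b_re = b_ue - C_inv @ H.T @ np.linalg.solve(D, H @ b_ue - h)
bias_re = b_re - beta
print(delta_def, delta_eta)
print(np.allclose(bias_re, -eta))
```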

We now compare the biases under the alternative hypothesis. To obtain a clear picture, we convert them to quadratic (scalar) form by defining

$$QB(\hat\beta^*) = B(\hat\beta^*)' C B(\hat\beta^*)$$

as the quadratic bias function of an estimator $\hat\beta^*$ of the parameter vector $\beta$. The quadratic bias functions of the estimators are given in the following theorem.

Theorem 2.2 The quadratic biases of the unrestricted estimator (UE), restricted estimator (RE), preliminary test estimator (PT), shrinkage estimator (SE) and positive-rule shrinkage estimator (PR) are, respectively,

$$
\begin{aligned}
QB(\hat\beta^{UE}) &= 0,\\
QB(\hat\beta^{RE}) &= \sigma_e^2 \Delta,\\
QB(\hat\beta^{PT}) &= \sigma_e^2 \Delta \left[G^{(2)}_{q+2,n-p}(x, \Delta)\right]^2,\\
QB(\hat\beta^{SE}) &= \sigma_e^2 \Delta \left[qd\, E^{(2)}[\chi^{-2}_{q+2}(\Delta)]\right]^2,\\
QB(\hat\beta^{PR}) &= \sigma_e^2 \Delta \Big\{ \frac{qd}{q+2} E^{(2)}\Big[F^{-1}_{q+2,n-p}(\Delta)\, I\Big(F_{q+2,n-p}(\Delta) \le \frac{qd}{q+2}\Big)\Big] - \frac{qd}{q+2} E^{(2)}[F^{-1}_{q+2,n-p}(\Delta)] - G^{(2)}_{q+2,n-p}(x, \Delta) \Big\}^2.
\end{aligned}
\tag{2.2}
$$

The quadratic bias of every estimator except $\hat\beta^{UE}$ depends on the parameters only through $\Delta$. The quadratic bias of the PT estimator also depends on $\alpha$, the size of the test. The quadratic bias of the RE increases without bound as $\Delta \to \infty$. Both $E^{(2)}[\chi^{-2}_{q+2}(\Delta)]$ and $G^{(2)}_{q+2,n-p}(x, \Delta)$ are decreasing functions of $\Delta$. Hence the quadratic biases of the PT, SE and PR estimators start from $0$ at $\Delta = 0$, increase to a maximum and then decrease gradually to $0$ as $\Delta \to \infty$. The quadratic bias of the PR estimator remains below the curves of the SE and PT estimators. Figures 1 and 2 display the quadratic bias functions for $\sigma_e^2 = 1$, and they support this analysis to some extent. Based on the above analysis we may state the following ordering:

$$QB(\hat\beta^{UE}) \le QB(\hat\beta^{PR}) \le QB(\hat\beta^{SE}) \le QB(\hat\beta^{PT}) \le QB(\hat\beta^{RE}).$$
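To see the shape described above, the quadratic biases of the RE and SE from (2.2) can be evaluated on a grid of $\Delta$ values. This sketch (NumPy plus the standard library) uses the setting of Figure 1: $p = 5$, $q = 3$, $\nu = 5$, $n = 10$, $\sigma_e^2 = 1$. It truncates the series for $E^{(2)}[\chi^{-2}_{q+2}(\Delta)]$ at 2000 terms, which we assume is adequate for $\Delta \le 40$. The PT and PR curves are omitted because they also need the incomplete beta function.

```python
import math
import numpy as np


def E_inv_chi2(q, s, delta, nu, j, r_max=2000):
    """Truncated series for E^{(j)}[chi^{-2}_{q+s}(Delta)] from Theorem 2.1."""
    a = nu / 2 + j - 2
    t = delta / (nu - 2)
    if t == 0.0:
        return 1.0 / (q + s - 2)
    total = 0.0
    for r in range(r_max):
        log_w = (math.lgamma(a + r) - math.lgamma(r + 1) - math.lgamma(a)
                 + r * math.log(t) - (a + r) * math.log1p(t))
        total += math.exp(log_w) / (q + s - 2 + 2 * r)
    return total


# Setting of Figure 1: p = 5, q = 3, nu = 5, n = 10, sigma_e^2 = 1.
p, q, n, nu, sigma_e2 = 5, 3, 10, 5.0, 1.0
d = (q - 2) * (n - p) / (q * (n - p + 2))
deltas = np.linspace(0.0, 40.0, 81)
qb_re = sigma_e2 * deltas                                        # QB of RE
qb_se = np.array([sigma_e2 * dl * (q * d * E_inv_chi2(q, 2, dl, nu, 2)) ** 2
                  for dl in deltas])                             # QB of SE, (2.2)
peak = int(np.argmax(qb_se))
print("QB(SE) peaks at Delta =", deltas[peak], "with value", round(qb_se[peak], 4))
```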

[Figure 1: four panels, n = 10 with alpha = 0.05, 0.10, 0.15 and 0.20; each plots Bias (vertical axis) against Delta (horizontal axis, 0 to 20) for UE, RE, PT, SR and PR.]

Figure 1. Quadratic bias function of various estimators for p = 5, q = 3 and ν = 5.

[Figure 2: four panels, n = 20 with alpha = 0.05, 0.10, 0.15 and 0.20; each plots Bias (vertical axis) against Delta (horizontal axis, 0 to 20) for UE, RE, PT, SR and PR.]

Figure 2. Quadratic bias function of various estimators for p = 5, q = 4 and ν = 10.

2.2 Risk functions of the estimators

In this subsection we present the quadratic risk functions. Let $\beta^*$ denote an estimator of $\beta$. For a given non-singular matrix $W$, the loss function is defined as $L(\beta^*; W) = (\beta^* - \beta)' W (\beta^* - \beta)$. The corresponding risk function of $\beta^*$ is

$$R(\beta^*; W) = E(\beta^* - \beta)' W (\beta^* - \beta) = \operatorname{tr}(W M), \tag{2.3}$$

where $M$ is the mean squared error matrix of $\beta^*$. The quadratic risk functions of the proposed estimators follow routinely from Judge and Bock (1978, Chapter 10) and Saleh (2002), and lead to the following theorem.

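Theorem 2.3 The risks of the UE, RE, PT, SE and PR are, respectively,

$$
\begin{aligned}
R(\hat\beta^{UE}; W) &= \sigma_e^2 \operatorname{tr}(C^{-1}W),\\
R(\hat\beta^{RE}; W) &= \sigma_e^2\left[\operatorname{tr}(C^{-1}W) - \operatorname{tr}(A)\right] + \eta' A \eta,\\
R(\hat\beta^{PT}; W) &= \sigma_e^2\left[\operatorname{tr}(C^{-1}W) - \operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)\right] + \eta' A \eta \left[2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right],\\
R(\hat\beta^{SE}; W) &= \sigma_e^2 \operatorname{tr}(C^{-1}W) - qd \operatorname{tr}(A)\, \sigma_e^2 \left\{2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right\}\\
&\quad + qd\, \eta' A \eta \left\{(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2\left(E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right)\right\}\\
&= \sigma_e^2 \operatorname{tr}(C^{-1}W) - dq\, \sigma_e^2 \operatorname{tr}(A) \left\{(q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)] + \left[1 - \frac{(q+2)\,\eta' A \eta}{2\sigma_e^2 \Delta \operatorname{tr}(A)}\right](2\Delta)\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)]\right\},\\
R(\hat\beta^{PR}; W) &= R(\hat\beta^{SE}; W) - \sigma_e^2 \Big\{\operatorname{tr}(A)\, E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta)\Big)^2 I\Big(F_{q+2,n-p}(\Delta) < \frac{qd}{q+2}\Big)\Big]\\
&\qquad + \frac{\eta' A \eta}{\sigma_e^2}\, E^{(2)}\Big[\Big(1 - \frac{qd}{q+4}F^{-1}_{q+4,n-p}(\Delta)\Big)^2 I\Big(F_{q+4,n-p}(\Delta) < \frac{qd}{q+4}\Big)\Big]\Big\}\\
&\quad - 2\eta' A \eta\, E^{(2)}\Big[\Big(\frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta) - 1\Big) I\Big(F_{q+2,n-p}(\Delta) < \frac{qd}{q+2}\Big)\Big],
\end{aligned}
\tag{2.4}
$$

where

$$\operatorname{tr}(A) = \operatorname{tr}\left(W C^{-1} H' (HC^{-1}H')^{-1} H C^{-1}\right),$$

$$\eta' A \eta = (H\beta - h)'(HC^{-1}H')^{-1} H C^{-1} W C^{-1} H' (HC^{-1}H')^{-1} (H\beta - h).$$

Also,

$$E^{(j)}[\chi^{-4}_{q+s}(\Delta)] = \sum_{r=0}^{\infty} \frac{\Gamma\left(\frac{\nu}{2} + r + j - 2\right)}{\Gamma(r+1)\,\Gamma\left(\frac{\nu}{2} + j - 2\right)} \frac{\left(\frac{\Delta}{\nu - 2}\right)^{r}}{\left(1 + \frac{\Delta}{\nu - 2}\right)^{\frac{\nu}{2} + r + j - 2}}\, (q + s - 2 + 2r)^{-1}(q + s - 4 + 2r)^{-1},$$

$$E^{(j)}[\chi^{-4}_{q+s}(0)] = (q + s - 2)^{-1}(q + s - 4)^{-1}, \qquad j = 1, 2,$$

and

$$E^{(j)}\Big[F^{-2}_{q+s,n-p}(\Delta)\, I\Big(F_{q+s,n-p}(\Delta) \le \frac{qd}{q+s}\Big)\Big] = \sum_{r=0}^{\infty} \frac{\Gamma\left(\frac{\nu}{2} + r + j - 2\right)}{\Gamma(r+1)\,\Gamma\left(\frac{\nu}{2} + j - 2\right)} \frac{\left(\frac{\Delta}{\nu - 2}\right)^{r}}{\left(1 + \frac{\Delta}{\nu - 2}\right)^{\frac{\nu}{2} + r + j - 2}}\, \frac{(q + s)^2\, I_x\left(\frac{1}{2}(q + s - 4 + 2r),\ \frac{1}{2}(n - p + 4)\right)}{(n - p)(q + s - 2 + 2r)(q + s - 4 + 2r)},$$

$$E^{(j)}\Big[F^{-2}_{q+s,n-p}(0)\, I\Big(F_{q+s,n-p}(0) \le \frac{qd}{q+s}\Big)\Big] = \frac{(q + s)^2}{(n - p)(q + s - 2)(q + s - 4)}\, I_x\left(\tfrac{1}{2}(q + s - 4),\ \tfrac{1}{2}(n - p + 4)\right), \qquad j = 1, 2.$$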
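Theorem 2.3 gives $R(\hat\beta^{SE}; W)$ in two algebraically different forms. They agree because of two exact series identities:

- $E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)] = 2E^{(2)}[\chi^{-4}_{q+4}(\Delta)]$, which holds term by term;
- $E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)] = \Delta\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)]$, which follows from shifting the summation index by one.

These identities are our own derivation from the series above, not statements made in the paper. The standard-library sketch below evaluates both forms and confirms that they coincide up to truncation error. The inputs $q, n, p, \nu, \Delta$, $\sigma_e^2$, $\operatorname{tr}(C^{-1}W)$, $\operatorname{tr}(A)$ and $\eta' A \eta$ are all hypothetical illustrative values.

```python
import math


def weights(delta, nu, j, r_max=3000):
    """Series weights of Theorems 2.1 and 2.3 (requires nu > 2 and delta > 0)."""
    a = nu / 2 + j - 2
    t = delta / (nu - 2)
    return [math.exp(math.lgamma(a + r) - math.lgamma(r + 1) - math.lgamma(a)
                     + r * math.log(t) - (a + r) * math.log1p(t))
            for r in range(r_max)]


def E_inv_chi2(q, s, delta, nu, j):
    """Truncated series for E^{(j)}[chi^{-2}_{q+s}(Delta)]."""
    return sum(w / (q + s - 2 + 2 * r) for r, w in enumerate(weights(delta, nu, j)))


def E_inv_chi4(q, s, delta, nu, j):
    """Truncated series for E^{(j)}[chi^{-4}_{q+s}(Delta)]."""
    return sum(w / ((q + s - 2 + 2 * r) * (q + s - 4 + 2 * r))
               for r, w in enumerate(weights(delta, nu, j)))


# Hypothetical inputs: tr_CW, tr_A and eta_A_eta are treated as given numbers.
q, n, p, nu, delta = 3, 20, 5, 5.0, 4.0
sigma_e2, tr_CW, tr_A, eta_A_eta = 1.5, 4.0, 2.0, 1.3
d = (q - 2) * (n - p) / (q * (n - p + 2))

# The two identities that link the two forms of R(SE).
identity_1_gap = (E_inv_chi2(q, 2, delta, nu, 1) - (q - 2) * E_inv_chi4(q, 2, delta, nu, 1)
                  - delta * E_inv_chi4(q, 4, delta, nu, 2))
identity_2_gap = (E_inv_chi2(q, 2, delta, nu, 2) - E_inv_chi2(q, 4, delta, nu, 2)
                  - 2 * E_inv_chi4(q, 4, delta, nu, 2))

first_form = (sigma_e2 * tr_CW
              - q * d * tr_A * sigma_e2 * (2 * E_inv_chi2(q, 2, delta, nu, 1)
                                           - (q - 2) * E_inv_chi4(q, 2, delta, nu, 1))
              + q * d * eta_A_eta * ((q - 2) * E_inv_chi4(q, 4, delta, nu, 2)
                                     + 2 * (E_inv_chi2(q, 2, delta, nu, 2)
                                            - E_inv_chi2(q, 4, delta, nu, 2))))
second_form = (sigma_e2 * tr_CW
               - d * q * sigma_e2 * tr_A
               * ((q - 2) * E_inv_chi4(q, 2, delta, nu, 1)
                  + (1 - (q + 2) * eta_A_eta / (2 * sigma_e2 * delta * tr_A))
                  * 2 * delta * E_inv_chi4(q, 4, delta, nu, 2)))
print(identity_1_gap, identity_2_gap)
print(first_form, second_form)
```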

Based on the above information, we consider the performance of the estimators in the following section.
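A Monte Carlo check of the first two risks in Theorem 2.3 under multivariate t errors is sketched below (NumPy). All concrete values are illustrative assumptions:

- $\nu = 10$, $\sigma = 1$, $n = 20$, $p = 5$;
- the restriction $H = [I_3\ 0]$ with $h = 0$, together with a $\beta$ that violates $H_0$;
- $W = I_p$ and the random seed.

The simulated risks should agree with $\sigma_e^2 \operatorname{tr}(C^{-1}W)$ and $\sigma_e^2[\operatorname{tr}(C^{-1}W) - \operatorname{tr}(A)] + \eta' A \eta$ up to Monte Carlo error. Here $\operatorname{tr}(A)$ and $\eta' A \eta$ are computed from the formulas following (2.4).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, q, nu, sigma, reps = 20, 5, 3, 10.0, 1.0, 40_000
sigma_e2 = nu / (nu - 2) * sigma**2
X = rng.standard_normal((n, p))
C_inv = np.linalg.inv(X.T @ X)
H = np.hstack([np.eye(q), np.zeros((q, p - q))])
beta = np.array([0.5, -0.5, 0.3, 1.0, 2.0])
h = np.zeros(q)                          # H beta - h = (0.5, -0.5, 0.3): H0 is false
W = np.eye(p)                            # loss weight matrix (illustrative choice)
D = H @ C_inv @ H.T
K = np.linalg.solve(D, H @ C_inv)        # D^{-1} H C^{-1}

# Multivariate t errors via the scale mixture (1.2), one theta per replicate.
theta = np.sqrt(nu * sigma**2 / rng.chisquare(nu, size=reps))
E = theta[:, None] * rng.standard_normal((reps, n))
Y = X @ beta + E                         # each row is one simulated y
B_ue = Y @ X @ C_inv                     # rows are (C^{-1} X' y)'
B_re = B_ue - (B_ue @ H.T - h) @ K       # rows are beta_hat_RE from (1.6)


def mc_risk(B):
    """Average quadratic loss (b - beta)' W (b - beta) over the replicates."""
    err = B - beta
    return np.mean(np.einsum("ij,jk,ik->i", err, W, err))


eta = C_inv @ H.T @ np.linalg.solve(D, H @ beta - h)
tr_A = np.trace(W @ C_inv @ H.T @ K)
risk_ue_theory = sigma_e2 * np.trace(C_inv @ W)
risk_re_theory = sigma_e2 * (np.trace(C_inv @ W) - tr_A) + eta @ W @ eta
print(mc_risk(B_ue), risk_ue_theory)
print(mc_risk(B_re), risk_re_theory)
```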

3 Risk analysis of the estimators

In this section we compare the performance of the proposed estimators in the light of the quadratic risk function. For convenience, we assume that $\nu$ is known. From Anderson (1984, Theorem A.2.4, p. 590) we obtain

$$\gamma_p \le \frac{\eta' A \eta}{\eta' C \eta} \le \gamma_1, \quad \text{or} \quad \sigma_e^2 \Delta \gamma_p \le \eta' A \eta \le \sigma_e^2 \Delta \gamma_1, \tag{3.1}$$

where $\gamma_1$ and $\gamma_p$ are the largest and the smallest characteristic roots of the matrix $AC^{-1}$, and $\Delta = \eta' C \eta / \sigma_e^2$.
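The bounds (3.1) are the usual extreme-value bounds for the ratio of two quadratic forms. A NumPy sketch follows. The matrices are random and illustrative, and we assume $A$ is symmetric positive semi-definite and $C$ is positive definite. Writing $C = LL'$, the characteristic roots of $AC^{-1}$ coincide with the eigenvalues of the symmetric matrix $L^{-1}A(L')^{-1}$, which `numpy.linalg.eigvalsh` returns in ascending order.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
G = rng.standard_normal((p, p))
C = G @ G.T + p * np.eye(p)            # a positive definite C
B = rng.standard_normal((p, 3))
A = B @ B.T                            # a symmetric positive semi-definite A (illustrative)

# Roots of A C^{-1}: with C = L L', they equal the eigenvalues of L^{-1} A L^{-T}.
L_inv = np.linalg.inv(np.linalg.cholesky(C))
gammas = np.linalg.eigvalsh(L_inv @ A @ L_inv.T)   # ascending order
gamma_p, gamma_1 = gammas[0], gammas[-1]

etas = rng.standard_normal((10_000, p))
ratios = (np.einsum("ij,jk,ik->i", etas, A, etas)
          / np.einsum("ij,jk,ik->i", etas, C, etas))
print(gamma_p, ratios.min(), ratios.max(), gamma_1)
```

The same $\gamma_1$ and $\gamma_p$ enter the thresholds $\operatorname{tr}(A)/\gamma_1$ and $\operatorname{tr}(A)/\gamma_p$ used in the comparisons of this section.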

3.1 Comparison of $\hat\beta^{UE}$ and $\hat\beta^{RE}$

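First, we compare $\hat\beta^{UE}$ and $\hat\beta^{RE}$. Using (2.4), the risk difference is

$$R(\hat\beta^{UE}; W) - R(\hat\beta^{RE}; W) = \sigma_e^2 \operatorname{tr}(A) - \eta' A \eta. \tag{3.2}$$

By (3.1), the difference in (3.2) is non-negative whenever $\Delta \le \operatorname{tr}(A)/\gamma_1$ and non-positive whenever $\Delta \ge \operatorname{tr}(A)/\gamma_p$. That is, the RE dominates the UE when $\Delta \le \operatorname{tr}(A)/\gamma_1$, while the UE dominates the RE when $\Delta \ge \operatorname{tr}(A)/\gamma_p$.

For $W = C$ we have $\operatorname{tr}(A) = q$ and $\eta' A \eta = \sigma_e^2 \Delta$, so (3.2) equals $\sigma_e^2 (q - \Delta)$. Thus $\hat\beta^{RE}$ performs better than $\hat\beta^{UE}$ for $\Delta \in [0, q]$ (equivalently $\eta' C \eta \le q\sigma_e^2$) and worse outside this interval.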
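For $W = C$ the quantities in (3.2) simplify. This NumPy sketch uses an arbitrary design, an arbitrary $H$ and an arbitrary vector standing in for $H\beta - h$, all illustrative. It confirms numerically that $\operatorname{tr}(A) = q$ and $\eta' A \eta = \sigma_e^2 \Delta$, so that the risk difference in (3.2) is $\sigma_e^2(q - \Delta)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q, sigma_e2 = 30, 5, 3, 1.25
X = rng.standard_normal((n, p))
C = X.T @ X
C_inv = np.linalg.inv(C)
H = rng.standard_normal((q, p))
D = H @ C_inv @ H.T
W = C                                               # the special choice W = C

tr_A = np.trace(W @ C_inv @ H.T @ np.linalg.solve(D, H @ C_inv))
gap = rng.standard_normal(q)                        # stands in for H beta - h
eta = C_inv @ H.T @ np.linalg.solve(D, gap)
eta_A_eta = eta @ W @ eta
delta = gap @ np.linalg.solve(D, gap) / sigma_e2
risk_gap = sigma_e2 * tr_A - eta_A_eta              # right-hand side of (3.2)
print(tr_A, eta_A_eta / sigma_e2, delta, risk_gap)
```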

3.2 Comparison of $\hat\beta^{UE}$, $\hat\beta^{RE}$ and $\hat\beta^{PT}$

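First we compare $\hat\beta^{PT}$ with $\hat\beta^{UE}$. The risk difference is

$$R(\hat\beta^{UE}; W) - R(\hat\beta^{PT}; W) = \sigma_e^2 \operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta) - \eta' A \eta \left[2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right]. \tag{3.3}$$

The difference in (3.3) is non-negative whenever

$$\Delta \le \frac{\operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)}{\gamma_1 \left[2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right]}.$$

Thus $\hat\beta^{PT}$ is superior to $\hat\beta^{UE}$ if

$$\Delta \in \left[0,\ \frac{\operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)}{\gamma_1 \left[2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right]}\right),$$

and $\hat\beta^{UE}$ performs better than $\hat\beta^{PT}$ if

$$\Delta \in \left[\frac{\operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, \Delta)}{\gamma_p \left[2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right]},\ \infty\right).$$

It follows from (3.3) that under $H_0$, $\hat\beta^{PT}$ is superior to $\hat\beta^{UE}$ for all $\alpha \in (0, 1)$. The graph of $R(\hat\beta^{PT}; W)$ can be described as follows. It takes the value $\sigma_e^2 \operatorname{tr}(C^{-1}W) - \sigma_e^2 \operatorname{tr}(A)\, G^{(1)}_{q+2,n-p}(x, 0)$ at $\Delta = 0$. It then increases, crossing the risk of $\hat\beta^{UE}$, to a maximum, and then drops gradually towards $\sigma_e^2 \operatorname{tr}(C^{-1}W)$ as $\Delta \to \infty$.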

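Now we compare the risks of $\hat\beta^{RE}$ and $\hat\beta^{PT}$; both are superior to $\hat\beta^{UE}$ under the null hypothesis. We have

$$R(\hat\beta^{RE}; W) - R(\hat\beta^{PT}; W) = -\sigma_e^2 \operatorname{tr}(A)\left[1 - G^{(1)}_{q+2,n-p}(x, \Delta)\right] + \eta' A \eta \left[1 - 2G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta)\right]. \tag{3.4}$$

The difference in (3.4) is non-positive whenever

$$\Delta \le \frac{\operatorname{tr}(A)\left[1 - G^{(1)}_{q+2,n-p}(x, \Delta)\right]}{\gamma_1 \left[1 - 2G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta)\right]}.$$

A non-positive difference means that $\hat\beta^{RE}$ has the smaller risk. Thus $\hat\beta^{RE}$ is superior to $\hat\beta^{PT}$ if

$$\Delta \in \left[0,\ \frac{\operatorname{tr}(A)\left[1 - G^{(1)}_{q+2,n-p}(x, \Delta)\right]}{\gamma_1 \left[1 - 2G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta)\right]}\right),$$

and $\hat\beta^{PT}$ performs better than $\hat\beta^{RE}$ if

$$\Delta \in \left[\frac{\operatorname{tr}(A)\left[1 - G^{(1)}_{q+2,n-p}(x, \Delta)\right]}{\gamma_p \left[1 - 2G^{(2)}_{q+2,n-p}(x, \Delta) + G^{(2)}_{q+4,n-p}(x, \Delta)\right]},\ \infty\right).$$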

3.3 Comparison of $\hat\beta^{UE}$, $\hat\beta^{RE}$, $\hat\beta^{PT}$ and $\hat\beta^{SE}$

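We now investigate the comparative statistical properties of the Stein-type estimator. First we compare the UE and the SE. The risk difference is

$$R(\hat\beta^{UE}; W) - R(\hat\beta^{SE}; W) = dq\, \sigma_e^2 \operatorname{tr}(A) \left\{(q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)] + \left[1 - \frac{(q+2)\,\eta' A \eta}{2\sigma_e^2 \Delta \operatorname{tr}(A)}\right](2\Delta)\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)]\right\}. \tag{3.5}$$

Using (3.1), the risk difference in (3.5) is positive for all $A$ such that

$$\left\{A : \frac{\operatorname{tr}(A)}{\gamma_1} \ge \frac{q+2}{2}\right\}. \tag{3.6}$$

Indeed, by (3.1) the factor in square brackets in (3.5) is at least $1 - (q+2)\gamma_1/(2\operatorname{tr}(A))$, which is non-negative under (3.6). Thus, for such $A$, $\hat\beta^{SE}$ uniformly dominates $\hat\beta^{UE}$. Further, as $\Delta \to \infty$, the risk of $\hat\beta^{SE}$ approaches that of $\hat\beta^{UE}$ from below.

Next we compare $\hat\beta^{RE}$ and $\hat\beta^{SE}$. We have

$$R(\hat\beta^{SE}; W) - R(\hat\beta^{RE}; W) = \sigma_e^2 \operatorname{tr}(A) - \eta' A \eta - dq\, \sigma_e^2 \operatorname{tr}(A) \left\{(q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)] + \left[1 - \frac{(q+2)\,\eta' A \eta}{2\sigma_e^2 \Delta \operatorname{tr}(A)}\right](2\Delta)\, E^{(2)}[\chi^{-4}_{q+4}(\Delta)]\right\}. \tag{3.7}$$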

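From (3.7) we note that under $H_0$, $R(\hat\beta^{SE}; W) \ge R(\hat\beta^{RE}; W)$, so $\hat\beta^{RE}$ performs better than $\hat\beta^{SE}$ under $H_0$. However, as $\eta$ moves away from $0$, $\eta' A \eta$ increases and the risk of $\hat\beta^{RE}$ becomes unbounded. The risk of $\hat\beta^{SE}$, in contrast, remains below the risk of $\hat\beta^{UE}$ and merges with it as $\Delta \to \infty$. Thus $\hat\beta^{SE}$ dominates $\hat\beta^{RE}$ outside an interval around the origin.

Now we compare $\hat\beta^{PT}$ and $\hat\beta^{SE}$ under $H_0$. We have

$$R(\hat\beta^{SE}; W) - R(\hat\beta^{PT}; W) = \sigma_e^2 \operatorname{tr}(A)\left[G^{(1)}_{q+2,n-p}(x, 0) - d\right].$$

This difference is positive for all $\alpha \in (0, 1)$ such that $F_\alpha$ satisfies

$$\left\{\alpha : F_\alpha > \frac{q+2}{q} F^{-1}_{q+2,n-p}(d, 0)\right\}, \tag{3.8}$$

where $F^{-1}_{q+2,n-p}(d, 0)$ denotes the $d$-quantile of the central $F_{q+2,n-p}$ distribution.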

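Thus the PT estimator dominates the SE when (3.8) holds, while the SE dominates the PT estimator when $F_\alpha$ satisfies

$$\left\{\alpha : F_\alpha < \frac{q+2}{q} F^{-1}_{q+2,n-p}(d, 0)\right\}. \tag{3.9}$$

Hence the Stein-type shrinkage estimator $\hat\beta^{SE}$ does not always dominate the PT estimator under $H_0$; which of the two dominates depends on the size $\alpha$ of the preliminary test. Under the alternative hypothesis the risk difference is

$$
\begin{aligned}
R(\hat\beta^{SE}; W) - R(\hat\beta^{PT}; W) &= -\sigma_e^2 \operatorname{tr}(A)\left\{qd\left[2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right] - G^{(1)}_{q+2,n-p}(x; \Delta)\right\}\\
&\quad + \eta' A \eta \Big\{dq(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2qd\left[E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right]\\
&\qquad - \left[2G^{(2)}_{q+2,n-p}(x; \Delta) - G^{(2)}_{q+4,n-p}(x; \Delta)\right]\Big\}.
\end{aligned}
\tag{3.10}
$$

The risk difference in (3.10) is positive, and therefore the PT estimator dominates the SE, if

$$\Delta \ge \frac{\operatorname{tr}(A)\left\{dq\left(2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right) - G^{(1)}_{q+2,n-p}(x; \Delta)\right\}}{\gamma_p\, f_1(\Delta, \alpha)},$$

while the SE dominates the PT estimator whenever

$$\Delta \le \frac{\operatorname{tr}(A)\left\{dq\left(2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right) - G^{(1)}_{q+2,n-p}(x; \Delta)\right\}}{\gamma_1\, f_1(\Delta, \alpha)},$$

where

$$f_1(\Delta, \alpha) = dq(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2qd\left[E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right] - \left[2G^{(2)}_{q+2,n-p}(x; \Delta) - G^{(2)}_{q+4,n-p}(x; \Delta)\right].$$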
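Under $H_0$ the choice between the PT estimator and the SE reduces to comparing $G^{(1)}_{q+2,n-p}(x, 0)$ with $d$. At $\Delta = 0$ only the $r = 0$ term of the series survives. By the standard relation between the incomplete beta function and the F distribution function, $G^{(1)}_{q+2,n-p}(x, 0) = P\big(F_{q+2,n-p} \le qF_\alpha/(q+2)\big)$ for a central F variable.

The NumPy sketch below estimates this probability by simulation for $q = 3$ and $n - p = 15$, at two illustrative test sizes. It then checks that the verdict agrees with the critical-value form (3.8)-(3.9). The seed, the sample size and the sizes $\alpha = 0.05$ and $0.90$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
q, m, N = 3, 15, 400_000                    # m = n - p
d = (q - 2) * m / (q * (m + 2))
F_q = rng.f(q, m, N)                        # central F(q, m): null distribution of L_n
F_q2 = rng.f(q + 2, m, N)                   # central F(q + 2, m)
threshold = (q + 2) / q * np.quantile(F_q2, d)       # right-hand side of (3.8)/(3.9)

results = {}
for alpha in (0.05, 0.90):
    F_alpha = np.quantile(F_q, 1 - alpha)             # simulated upper-alpha point
    G1_at_0 = np.mean(F_q2 <= q * F_alpha / (q + 2))  # G^{(1)}_{q+2,m}(x, 0)
    results[alpha] = {
        "F_alpha": F_alpha,
        "PT_beats_SE": G1_at_0 > d,                   # sign of R(SE) - R(PT) at Delta = 0
        "condition_3_8": F_alpha > threshold,
    }
    print(alpha, results[alpha])
```

For $\alpha = 0.05$ the PT estimator should come out ahead of the SE under $H_0$. For the very large size $\alpha = 0.90$ the SE should come out ahead, in line with (3.8) and (3.9).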

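Thus, under the alternative hypothesis, the SE dominates the RE if

$$\Delta \le \frac{\operatorname{tr}(A)\, dq\left(2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right)}{\gamma_1 \left\{dq(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2qd\left[E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right]\right\}},$$

while the RE dominates the SE if

$$\Delta \ge \frac{\operatorname{tr}(A)\, dq\left(2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right)}{\gamma_p \left\{dq(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2qd\left[E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right]\right\}}.$$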

3.4 Comparison of $\hat\beta^{UE}$, $\hat\beta^{RE}$, $\hat\beta^{PT}$, $\hat\beta^{SE}$ and $\hat\beta^{PR}$

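First we compare $\hat\beta^{UE}$ and $\hat\beta^{PR}$. From (2.4) and (3.1), it is observed that

$$R(\hat\beta^{PR}; W) \le R(\hat\beta^{UE}; W) \quad \text{for all } \Delta \text{ and } q \ge 3.$$

This follows because $R(\hat\beta^{PR}; W) \le R(\hat\beta^{SE}; W)$ (see (3.18)) and, for $A$ satisfying (3.6), $R(\hat\beta^{SE}; W) \le R(\hat\beta^{UE}; W)$. Thus $\hat\beta^{PR}$ uniformly dominates $\hat\beta^{UE}$. Further, the risk of $\hat\beta^{PR}$ remains below the risk of $\hat\beta^{UE}$ and merges with it as $\Delta \to \infty$.

To compare $\hat\beta^{RE}$ and $\hat\beta^{PR}$ under the null hypothesis, we have

$$R(\hat\beta^{PR}; W) - R(\hat\beta^{RE}; W) = \sigma_e^2 \operatorname{tr}(A)\left\{(1 - d) - E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(0)\Big)^2 I\Big(F_{q+2,n-p}(0) \le \frac{dq}{q+2}\Big)\Big]\right\}. \tag{3.11}$$

Since

$$E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(0)\Big)^2 I\Big(F_{q+2,n-p}(0) \le \frac{dq}{q+2}\Big)\Big] \le E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(0)\Big)^2\Big] = 1 - d,$$

the difference in (3.11) is always positive. Thus $\hat\beta^{RE}$ performs better than $\hat\beta^{PR}$ under $H_0$. However, as $\eta$ moves away from $0$, $\eta' A \eta$ increases and the risk of $\hat\beta^{RE}$ becomes unbounded. The risk of $\hat\beta^{PR}$, in contrast, remains below the risk of $\hat\beta^{UE}$ and merges with it as $\Delta \to \infty$. Thus $\hat\beta^{PR}$ dominates $\hat\beta^{RE}$ outside an interval around the origin.

Now we compare $\hat\beta^{PT}$ and $\hat\beta^{PR}$. Under $H_0$, the risk difference is

$$R(\hat\beta^{PR}; W) - R(\hat\beta^{PT}; W) = \sigma_e^2 \operatorname{tr}(A)\left\{G^{(1)}_{q+2,n-p}(x, 0) - d - E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(0)\Big)^2 I\Big(F_{q+2,n-p}(0) \le \frac{dq}{q+2}\Big)\Big]\right\}. \tag{3.12}$$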
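The sign of (3.11) rests on the identity $E^{(1)}[(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(0))^2] = 1 - d$. Treat $F_{q+2,n-p}(0)$ as a central F variable with $k = q + 2$ and $m = n - p$ degrees of freedom. Then $E(F^{-1}) = k/(k-2)$ and $E(F^{-2}) = k^2(m+2)/\{m(k-2)(k-4)\}$, and the identity follows from the definition of $d$. The standard-library sketch below verifies it in exact rational arithmetic for a few illustrative pairs $(q, m)$. The function name `check_identity` is hypothetical.

```python
from fractions import Fraction


def check_identity(q, m):
    """Exact E[(1 - qd/(q+2) * F^{-1})^2] for F ~ central F(q+2, m), returned with 1 - d."""
    d = Fraction((q - 2) * m, q * (m + 2))
    k = q + 2
    c = q * d / k
    mean_inv = Fraction(k, k - 2)                                   # E[F^{-1}], k > 2
    mean_inv_sq = Fraction(k * k * (m + 2), m * (k - 2) * (k - 4))  # E[F^{-2}], k > 4
    lhs = 1 - 2 * c * mean_inv + c * c * mean_inv_sq
    return lhs, 1 - d


for q, m in [(3, 5), (3, 15), (4, 10), (6, 27)]:
    lhs, rhs = check_identity(q, m)
    print(q, m, lhs, rhs, lhs == rhs)
```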

The difference in (3.12) is positive for all $\alpha$ satisfying

$$\left\{\alpha : F_\alpha > \frac{q+2}{q} F^{-1}_{q+2,n-p}(d^*, 0)\right\}, \tag{3.13}$$

where

$$d^* = d + E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(0)\Big)^2 I\Big(F_{q+2,n-p}(0) \le \frac{dq}{q+2}\Big)\Big].$$

The risk of $\hat\beta^{PR}$ is smaller than that of $\hat\beta^{PT}$ when the critical value $F_\alpha$ satisfies

$$\left\{\alpha : F_\alpha < \frac{q+2}{q} F^{-1}_{q+2,n-p}(d^*, 0)\right\}. \tag{3.14}$$

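This leads to the conclusion that neither $\hat\beta^{PR}$ nor $\hat\beta^{PT}$ uniformly dominates the other under $H_0$. The reason is that, under $H_0$, $\hat\beta^{PT}$ coincides with $\hat\beta^{RE}$ whenever the preliminary test accepts $H_0$, so its risk depends on $\alpha$.

Now we compare $\hat\beta^{PR}$ and $\hat\beta^{PT}$ under the alternative hypothesis. The risk difference is

$$
\begin{aligned}
R(\hat\beta^{PR}; W) - R(\hat\beta^{PT}; W) &= -\sigma_e^2 \operatorname{tr}(A)\Big\{dq\left(2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right)\\
&\qquad + E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta)\Big)^2 I\Big(F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2}\Big)\Big] - G^{(1)}_{q+2,n-p}(x, \Delta)\Big\}\\
&\quad + \eta' A \eta \Big\{qd(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2qd\left(E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right)\\
&\qquad - \left(2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right)\\
&\qquad - E^{(2)}\Big[\Big(1 - \frac{qd}{q+4}F^{-1}_{q+4,n-p}(\Delta)\Big)^2 I\Big(F_{q+4,n-p}(\Delta) \le \frac{dq}{q+4}\Big)\Big]\\
&\qquad - 2E^{(2)}\Big[\Big(\frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta) - 1\Big) I\Big(F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2}\Big)\Big]\Big\}.
\end{aligned}
$$

The right-hand side of this equation is non-negative if

$$\Delta \ge \frac{f_2(\Delta, \alpha)}{\gamma_p\, f_3(\Delta, \alpha)}, \tag{3.15}$$

where

$$f_2(\Delta, \alpha) = \operatorname{tr}(A)\Big\{dq\left(2E^{(1)}[\chi^{-2}_{q+2}(\Delta)] - (q-2)E^{(1)}[\chi^{-4}_{q+2}(\Delta)]\right) + E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta)\Big)^2 I\Big(F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2}\Big)\Big] - G^{(1)}_{q+2,n-p}(x, \Delta)\Big\} \tag{3.16}$$

and

$$
\begin{aligned}
f_3(\Delta, \alpha) &= qd(q-2)E^{(2)}[\chi^{-4}_{q+4}(\Delta)] + 2qd\left(E^{(2)}[\chi^{-2}_{q+2}(\Delta)] - E^{(2)}[\chi^{-2}_{q+4}(\Delta)]\right) - \left(2G^{(2)}_{q+2,n-p}(x, \Delta) - G^{(2)}_{q+4,n-p}(x, \Delta)\right)\\
&\quad - E^{(2)}\Big[\Big(1 - \frac{qd}{q+4}F^{-1}_{q+4,n-p}(\Delta)\Big)^2 I\Big(F_{q+4,n-p}(\Delta) \le \frac{dq}{q+4}\Big)\Big] - 2E^{(2)}\Big[\Big(\frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta) - 1\Big) I\Big(F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2}\Big)\Big].
\end{aligned}
$$

A non-negative difference means that $\hat\beta^{PT}$ has the smaller risk. Thus the PT estimator dominates the PR estimator when (3.15) holds, while the PR estimator dominates the PT estimator when

$$\Delta \le \frac{f_2(\Delta, \alpha)}{\gamma_1\, f_3(\Delta, \alpha)}. \tag{3.17}$$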

Finally, we compare the risks of $\hat\beta^{PR}$ and $\hat\beta^{SE}$. The risk difference is

$$
\begin{aligned}
R(\hat\beta^{PR}; W) - R(\hat\beta^{SE}; W) &= -\sigma_e^2 \operatorname{tr}(A)\, E^{(1)}\Big[\Big(1 - \frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta)\Big)^2 I\Big(F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2}\Big)\Big]\\
&\quad - \eta' A \eta\, E^{(2)}\Big[\Big(1 - \frac{qd}{q+4}F^{-1}_{q+4,n-p}(\Delta)\Big)^2 I\Big(F_{q+4,n-p}(\Delta) \le \frac{dq}{q+4}\Big)\Big]\\
&\quad - 2\eta' A \eta\, E^{(2)}\Big[\Big(\frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta) - 1\Big) I\Big(F_{q+2,n-p}(\Delta) \le \frac{dq}{q+2}\Big)\Big].
\end{aligned}
\tag{3.18}
$$

The right-hand side of (3.18) is always negative. Each expectation is that of a non-negative random variable, and each is preceded by a minus sign. (On the event $F_{q+2,n-p}(\Delta) \le dq/(q+2)$ we have $\frac{qd}{q+2}F^{-1}_{q+2,n-p}(\Delta) \ge 1$.) Thus, for all $\beta$, the risk of $\hat\beta^{PR}$ is smaller than that of $\hat\beta^{SE}$. Therefore the positive-rule shrinkage estimator not only confirms the inadmissibility of the shrinkage estimator but also provides a simple superior estimator. Based on the above discussion we may state the following theorem.

Theorem 3.1 Under the null hypothesis and the inequalities (3.8), (3.9), (3.13) and (3.14), the dominance picture of the estimators is as follows:

$$\hat\beta^{RE} \ge \hat\beta^{PT} \ge \hat\beta^{PR} \ge \hat\beta^{SE} \ge \hat\beta^{UE}, \tag{3.19}$$

where $\ge$ means "dominates" in the sense of smaller risk. The position of the preliminary test estimator may shift from between $R(\hat\beta^{RE}; W)$ and $R(\hat\beta^{PR}; W)$ to between $R(\hat\beta^{SE}; W)$ and $R(\hat\beta^{UE}; W)$. Thus the dominance picture under $H_0$ may change to

$$\hat\beta^{RE} \ge \hat\beta^{PR} \ge \hat\beta^{SE} \ge \hat\beta^{PT} \ge \hat\beta^{UE}. \tag{3.20}$$

The dominance pictures in (3.19) and (3.20) change as $\eta$ moves away from $0$. We note that $\hat\beta^{UE}$ has constant risk $\sigma_e^2 \operatorname{tr}(C^{-1}W)$. The risk of $\hat\beta^{RE}$, on the other hand, depends on $\eta$ and becomes unbounded as $\eta$ moves away from $0$. As $\Delta \to \infty$, the risks of $\hat\beta^{PT}$ and $\hat\beta^{PR}$ converge to the risk of $\hat\beta^{UE}$. For $\eta$ near $0$, the risk of $\hat\beta^{PT}$ is smaller than that of $\hat\beta^{PR}$ for $q \ge 3$ (for suitably small $\alpha$; see (3.13)). Thus neither $\hat\beta^{PT}$ nor $\hat\beta^{PR}$ dominates the other, although they share the property that, as $\Delta \to \infty$, the risk of both tends to $\sigma_e^2 \operatorname{tr}(C^{-1}W)$. However, the risk of $\hat\beta^{PR}$ stays below the risk of $\hat\beta^{UE}$, while the risk of $\hat\beta^{PT}$ exceeds the risk of $\hat\beta^{UE}$ at some intermediate values of $\Delta$, depending on $\alpha$.

4 Summary and Conclusion

In this paper we discussed some finite-sample properties of five well-known estimators of $\beta$ that combine sample and non-sample information. The RE performs best among the estimators in the neighbourhood of the null hypothesis; however, it performs worse as $\Delta$ moves away from the origin. We have derived conditions for the superiority of the estimators based on the quadratic risk function. We find that $\hat\beta^{SE}$ and $\hat\beta^{PR}$ are more efficient than $\hat\beta^{UE}$ over the whole parameter space. The performance properties of the estimators are robust within the class of t distributions indexed by the degrees of freedom $\nu$.

Note that the application of $\hat\beta^{PR}$ and $\hat\beta^{SE}$ requires $q \ge 3$, while $\hat\beta^{PT}$ has no such constraint. However, the choice of the level of significance of the preliminary test has a dramatic impact on the risk function of the PT estimator. Thus, when $q \ge 3$ one would use $\hat\beta^{PR}$; otherwise one would use $\hat\beta^{PT}$ with some optimum size $\alpha$.

References [1] Anderson, T. W. (1984). An introduction to multivariate statistical analysis. Second Edition. John Wiley, NY. [2] Adkins, L. C. and Hill, R. C. (1990). The RLS positive part Stein estimator. American Journal of Agricultural Economics, 72, 727-730. [3] Bancroft, T. A. (1944). On biases in estimation due to use of preliminary tests of significance. Annals of Mathematics and Statistics, 15, 190-204. [4] Bancroft, T. A. (1964). Analysis and inference for incompletely specified models involving the use of preliminary test(s) of significance. Biometrics, 20, 427-442. [5] Benda, N. (1996). Pre-test estimation and design in the linear model. J. Statistical Planning and Inference, 52, 225 -240. [6] Blattberg, R. C. and Gonedes, N. J. (1974). A comparison of the stable and Student t distributions as statistical models for stock prices. Journal of Business, 47, 224-280. [7] Fama, E. F. (1965). The behavior of stock market prices. Journal of Business, 38, 34-105.

30


[8] Giles, J. A. (1991). Pretesting for linear restrictions in a regression model with spherically symmetric disturbances. Journal of Econometrics, 50, 377-398.
[9] Giles, J. A. (1992). Estimation of the error variance after a preliminary test of homogeneity in a regression model with spherically symmetric disturbances. Journal of Econometrics, 53, 345-361.
[10] Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley.
[11] Han, C.-P. and Bancroft, T. A. (1968). On pooling means when variance is unknown. Journal of the American Statistical Association, 63, 1333-1342.
[12] James, W. and Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 361-379. Berkeley, CA: University of California Press.
[13] Judge, G. G. and Bock, M. E. (1978). The Statistical Implications of Pre-test and Stein-rule Estimators in Econometrics. Amsterdam: North-Holland Publishing Company.
[14] Judge, G. G., Miyazaki, S. and Yancey, T. (1985). Minimax estimators for the location vectors of spherically symmetric densities. Econometric Theory, 1, 409-417.
[15] Kibria, B. M. G. (1996). On shrinkage ridge regression estimators for restricted linear models with multivariate t disturbances. Student, 1(3), 177-188.
[16] Kibria, B. M. G. and Saleh, A. K. Md. E. (1993). Performance of shrinkage preliminary test estimator in regression analysis. Jahangirnagar Review, A 17, 133-148.
[17] Kibria, B. M. G. and Saleh, A. K. Md. E. (2003a). Effect of W, LR, and LM tests on the performance of preliminary test ridge regression estimators. Journal of the Japan Statistical Society, 33(1), 119-136.
[18] Kibria, B. M. G. and Saleh, A. K. Md. E. (2003b). Preliminary test ridge regression estimators with Student's t errors and conflicting test-statistics. Metrika, 59, 105-124.
[19] Mosteller, F. (1948). On pooling data. Journal of the American Statistical Association, 43, 231-242.
[20] Ohtani, K. (1993). A comparison of the Stein-rule and positive-part Stein-rule estimators in a misspecified linear regression model. Econometric Theory, 9, 668-679.
[21] Saleh, A. K. Md. E. (2002). Theory of Preliminary Test and Stein-Type Estimation with Application. Unpublished manuscript, School of Mathematics and Statistics, Carleton University, Ottawa, Ontario, K1S 5B6, Canada.



[22] Saleh, A. K. Md. E. and Kibria, B. M. G. (1993). Performances of some new preliminary test ridge regression estimators and their properties. Communications in Statistics - Theory and Methods, 22, 2747-2764.
[23] Saleh, A. K. Md. E. and Sen, P. K. (1978). Nonparametric estimation of location parameter after a preliminary test on regression. Annals of Statistics, 6, 154-168.
[24] Shalabh (1995). Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression models. Bulletin of the International Statistical Institute, 56, 1375-1390.
[25] Singh, R. S. (1991). James-Stein rule estimators in linear regression models with multivariate t distributed error. Australian Journal of Statistics, 33, 145-158.
[26] Sutradhar, B. C. and Ali, M. M. (1986). Estimation of the parameter of a regression model with a multivariate t error. Communications in Statistics, A 15, 429-450.
[27] Ullah, A. and Zinde-Walsh, V. (1984). On the robustness of LM, LR and W tests in regression models. Econometrica, 52, 1055-1066.
[28] Zellner, A. (1976). Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. Journal of the American Statistical Association, 71, 400-405.