Sankhyā : The Indian Journal of Statistics 1995, Volume 57, Series A, Pt. 2, pp. 267 - 286

IMPROVED ESTIMATION OF A MULTIVARIATE NORMAL MEAN VECTOR AND THE DISPERSION MATRIX : HOW ONE AFFECTS THE OTHER∗

By NABENDU PAL and ABDULAZIZ ELFESSI∗∗
University of Southwestern Louisiana

SUMMARY. In this article we consider the problem of estimating the mean vector and the dispersion matrix of a multivariate normal distribution based on a sample of size n. Using simple techniques we show that the improved estimators of the mean vector can be used to construct improved estimators of the dispersion matrix and vice-versa.

1. Introduction

Let X₁, X₂, . . . , Xₙ be i.i.d. following a multivariate normal distribution (N_p(θ, Σ)) with mean vector θ ∈ ℝ^p and dispersion matrix Σ (p.d.). We assume that both θ and Σ are unknown and our goal here is to estimate both the parameters in a decision-theoretic setup. We reduce the data set by sufficiency and concentrate only on (X̄, S), where

X̄ = (1/n) ∑_{i=1}^{n} X_i ∼ N_p(θ, Σ/n)  and  S = ∑_{i=1}^{n} (X_i − X̄)(X_i − X̄)' ∼ W_p(Σ | n − 1)   . . . (1.1)

Paper received July 1993; revised February 1994.
AMS (1991) subject classification. Primary 62C15, secondary 62H12.
Key words and phrases. Inadmissibility, loss function, James-Stein estimator, Wishart identity.
∗ Research supported by the Faculty Summer Research Grant (1993), University of Southwestern Louisiana.
∗∗ Present address : Department of Mathematics, University of Wisconsin at La Crosse, USA.


(W_p(· | ·) denotes a Wishart distribution) and these are independent. The maximum likelihood estimators of θ and Σ are respectively

θ̂_ML = X̄  and  Σ̂_ML = (1/n) S.   . . . (1.2)

Though θ̂_ML is also unbiased for θ, the unbiased estimator of Σ is Σ̂_U = S/(n − 1). However, in this paper we evaluate the estimators by their risk functions, i.e., the average loss functions. The loss function for estimating θ is taken as

L₀(θ̂, θ) = (θ̂ − θ)' Σ^{-1} (θ̂ − θ),   . . . (1.3)

whereas the loss functions for estimating Σ are

L₁(Σ̂, Σ) = tr(Σ̂Σ^{-1} − I_p)²,
L₂(Σ̂, Σ) = tr(Σ̂Σ^{-1}) − ln |Σ̂Σ^{-1}| − p.   . . . (1.4)
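To make the setup concrete, the following short Python sketch (our own illustration, not part of the original paper; it assumes only NumPy) simulates a sample, forms X̄ and S as in (1.1), computes the estimators in (1.2) together with the unbiased estimator, and evaluates the losses (1.3)-(1.4) for the maximum likelihood estimators against a known Σ.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 15
theta = np.zeros(p)                       # true mean vector
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)               # an arbitrary p.d. "true" dispersion matrix

X = rng.multivariate_normal(theta, Sigma, size=n)        # X_1, ..., X_n
Xbar = X.mean(axis=0)                                    # sample mean
S = (X - Xbar).T @ (X - Xbar)                            # S of (1.1)

theta_ml = Xbar                                          # (1.2)
Sigma_ml = S / n                                         # (1.2)
Sigma_u = S / (n - 1)                                    # unbiased estimator of Sigma

Sigma_inv = np.linalg.inv(Sigma)
L0 = (theta_ml - theta) @ Sigma_inv @ (theta_ml - theta)      # loss (1.3)
M = Sigma_ml @ Sigma_inv
L1 = np.trace((M - np.eye(p)) @ (M - np.eye(p)))              # first loss in (1.4)
L2 = np.trace(M) - np.log(np.linalg.det(M)) - p               # second loss in (1.4)
print(L0, L1, L2)
```

Averaging such losses over many simulated samples gives Monte Carlo approximations of the risk functions used throughout the paper.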

If we consider the group of affine transformations (X̄, S) → (AX̄ + b, ASA'), for nonsingular A_{p×p} and b ∈ ℝ^p, then the best affine equivariant estimator of θ under L₀ is

θ̂₀ = X̄,   . . . (1.5)

and similarly the best affine equivariant estimators of Σ under L₁ and L₂ are respectively

Σ̂₁ = S/(n + p)  and  Σ̂₂ = S/(n − 1).   . . . (1.6)

For estimating θ, the estimator θ̂₀ is minimax for any p ≥ 1 but admissible only for p = 1, 2. For p ≥ 3, an improved estimator of θ is the famous James-Stein (1961) estimator θ̂_JS given as

θ̂_JS = (1 − c₀*/T) X̄,   . . . (1.7)

where T = X̄'S^{-1}X̄ and c₀* = (p − 2)/(n(n − p + 2)). Since this pioneering work, many shrinkage estimators which dominate θ̂₀ have been proposed. Lin and Tsai (1973) generalized (1.7) and obtained a class of minimax estimators of the form

θ̂_r = (1 − r(T)/T) X̄,   . . . (1.8)

where 0 < r(·) ≤ 2(p − 2)/(n(n − p + 2)) and r(·) is nondecreasing. Konno (1991) extended the above result while estimating a p × m dimensional matrix Θ of normal means under the loss function tr(Θ̂ − Θ)'Σ^{-1}(Θ̂ − Θ), and the


improved estimators of Θ reduce to (1.8) as a special case when m = 1. Further generalization has been done by Bilodeau and Kariya (1989) in connection with a normal MANOVA model.

For estimating Σ, a good amount of work has been done in the last three decades. If we consider the class of estimators depending only on S, then the estimators in (1.6) are admissible only for p = 1. In the univariate case (p = 1) Stein (1964) considered a larger class of estimators depending on both X̄ and S and proved that the affine equivariant estimators are inadmissible in this larger class. When p ≥ 2, the estimators in (1.6) are inadmissible even in the class of estimators depending on S alone, and most of the time the improved estimators have a simple structure and can give substantial risk improvements over the best affine equivariant estimators. For details on such estimators, one can see Pal (1993). Motivated by Stein (1964), when p ≥ 2 one can also use X̄ to get improvements, but such improved estimators typically have one undesirable property: they are nonanalytic and hence again inadmissible (see Sinha (1987), Sinha and Ghosh (1987)). But one thing is clear: the use of X̄ is always helpful in estimating Σ since it also carries some information about Σ. Therefore, we should try to derive improved estimators of Σ which are smooth (analytic) functions of both X̄ and S. For p = 1, Strawderman (1974) derived such estimators only under the loss L₁, and no multivariate generalization of that result is available yet. Very recently Kubokawa, Honda, Morita and Saleh (1993) have derived smooth (analytic) estimators of Σ using both X̄ and S, and though the structure of their estimators is similar to ours, the motivations are completely different (in fact, we came to know about the above paper after completing our initial manuscript). A related but slightly different work in this direction is Kubokawa, Morita, Makita and Nagakura (1993). It has been shown in that paper that when Σ = σ²I_p, a Stein type nonsmooth estimator of σ² (Stein (1964)) can produce an improved estimator of the mean vector. Numerical results indicate that such improvements can be substantial near the origin.

In this paper we first consider the estimation of θ under the loss L₀. In Section 2 we obtain a wider class of improved (minimax) estimators which includes the ones in (1.8). Our motivation for this class of estimators is quite simple. Note that, in the class of estimators (1.8), the shrinkage factor depends on T = X̄'S^{-1}X̄. When Σ is completely known, this T is then replaced by X̄'Σ^{-1}X̄, the scaled length of the sample mean. So the natural question is: why not use a better estimator of Σ^{-1}, rather than a constant multiple of S^{-1}, when Σ is unknown? The affine equivariant estimators of Σ^{-1} have the structure

Σ̂₀^{-1} = (constant) S^{-1}.   . . . (1.9)

Haff (1977, 1979a, 1979b, 1980) derived better estimators of Σ^{-1} of the form

Σ̂*^{-1} = Σ̂₀^{-1} + u(S) I_p,   . . . (1.10)

where u(S) is a suitable scalar valued function of S. Using (1.10), one can propose an improved estimator of θ with the following structure (a generalization of (1.7)):

θ̂ = (1 − (constant)/(X̄'S^{-1}X̄ + u(S)X̄'X̄)) X̄,   . . . (1.11)

where u(S) ≡ 0 gives the aforementioned James-Stein structure.

In Section 3, the estimation of the dispersion matrix Σ has been considered. We start with the equivariant estimators which are constant multiples of S. Notice that in the expression of S (see (1.1)), X̄ should be replaced by θ when θ is completely known. Therefore, when θ is unknown, why don't we use the James-Stein structure instead of X̄? At least for p ≥ 3 this may work since the usual estimator X̄ of θ is inadmissible. To our surprise, the above technique not only works for p ≥ 3 but also for p = 2, though the James-Stein structure is not available in the bivariate case. Using θ̂_c = (1 − c/T)X̄ instead of X̄ in S, we get

S* = ∑_{i=1}^{n} (X_i − θ̂_c)(X_i − θ̂_c)',

and after some simplification one gets

S* = S + ((constant)/T²) (X̄X̄').   . . . (1.12)

Hence, to start with, one may propose a constant multiple of S* as a new estimator of Σ. Note that the expression (1.12) is scale equivariant and uses both X̄ and S.
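The estimators discussed above are easy to compute. The sketch below is our own illustration (not from the paper, NumPy only); it computes the James-Stein estimator (1.7), a shrinkage estimator with the structure (1.11) using one possible Haff-type choice u(S) = b·tr(S^{-1}) (the constants b and c are arbitrary illustrative values), and the matrix S* of (1.12).

```python
import numpy as np

def shrinkage_estimators(X, b=0.01, c=1.0):
    """X is an (n, p) data matrix; returns theta_JS of (1.7),
    a shrinkage estimator with the structure (1.11), and S* of (1.12)."""
    n, p = X.shape
    Xbar = X.mean(axis=0)
    S = (X - Xbar).T @ (X - Xbar)
    S_inv = np.linalg.inv(S)
    T = Xbar @ S_inv @ Xbar                       # T = Xbar' S^{-1} Xbar

    c0 = (p - 2) / (n * (n - p + 2))              # c_0^* of (1.7)
    theta_js = (1 - c0 / T) * Xbar                # James-Stein estimator (1.7)

    u = b * np.trace(S_inv)                       # one Haff-type choice of u(S)
    g = T + u * (Xbar @ Xbar)
    theta_gen = (1 - c0 / g) * Xbar               # structure (1.11); u = 0 recovers (1.7)

    S_star = S + (c / T**2) * np.outer(Xbar, Xbar)    # structure (1.12)
    return theta_js, theta_gen, S_star

rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(4), np.eye(4), size=20)
print(shrinkage_estimators(X))
```

The choice of b and of the constant multiplying X̄X̄'/T² is exactly what Sections 2 and 3 study.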

In the next section we assume n > (p + 2), whereas in Section 3 we only need n > p. Though Efron and Morris (1976) have shown the use of improved estimators of the precision matrix while estimating normal mean vectors in an empirical Bayes setup, to the best of our knowledge the forms of the proposed estimators given in (1.11) and (1.12) have not been reported in the existing literature. Of course there are some questions left unanswered in this article (see Remark 3.3), but the main purpose of this work is to stimulate further research in this direction.

2. Estimation of θ

To estimate θ we start with a more general (than (1.11)) structure given as

θ̂_g^r = (1 − r(T)/g) X̄,   . . . (2.1)

where r(·) is a suitable nonnegative function of T = X̄'S^{-1}X̄ and g = T + u(S)X̄'X̄. We will choose the scalar function u(S) suitably later. The following transformations will be used quite frequently to derive the risks of various estimators. Let

Y = Σ^{-1/2}X̄,  θ_Y = Σ^{-1/2}θ  and  S_Y = Σ^{-1/2}SΣ^{-1/2}.

Obviously, Y ∼ N_p(θ_Y, n^{-1}I_p) and S_Y ∼ W_p(I_p | n − 1). The risk function of θ̂_g^r is

R(θ̂_g^r, θ) = E(θ̂_g^r − θ)'Σ^{-1}(θ̂_g^r − θ) = E‖Y − (r(T)/g)Y − θ_Y‖².   . . . (2.2)

Note that g can be rewritten as g = Y'S_Y^{-1}Y + u(S)(Y'ΣY). Using (2.2), we can get the risk difference (RD) = R(θ̂_g^r, θ) − R(θ̂₀, θ) as

RD = E[(r²(T)/g²)‖Y‖²] − 2E[(r(T)/g) ∑_{i=1}^{p} Y_i(Y_i − θ_{iY})]
   = E[(r²(T)/g²)‖Y‖²] − 2 ∑_{i=1}^{p} E[(r(T)/g) Y_i(Y_i − θ_{iY})],   . . . (2.3)

where Y_i and θ_{iY} are the i-th elements of Y and θ_Y respectively. Using Stein's normal identity, the second term of (2.3) can be simplified as

E[(r(T)/g) ∑_{i=1}^{p} Y_i(Y_i − θ_{iY})] = ∑_{i=1}^{p} (1/n) E E_i{ (∂/∂Y_i)(Y_i r(T)/g) }
                                          = (1/n) ∑_{i=1}^{p} E[ (∂/∂Y_i)(Y_i r(T)/g) ],   . . . (2.4)

where E_i denotes expectation w.r.t. Y_i | (Y_j, j ≠ i). It is easy to see that

E_i[ (∂/∂Y_i)(Y_i r(T)/g) ] = E_i[ (1/g){r + 2r'Y_i(S_Y^{-1}Y)_i} − rY_i g^{(i)}/g² ],   . . . (2.5)


where r' = r'(T), (S_Y^{-1}Y)_i is the i-th element of S_Y^{-1}Y and g^{(i)} = ∂g/∂Y_i. Let ∇g = (. . . , g^{(i)}, . . .)'. Then combining (2.3), (2.4) and (2.5) we get

RD = E[(r²(T)/g²)‖Y‖²] − (2/n) E[ (1/g){pr + 2(Y'S_Y^{-1}Y)r'} − (1/g²) r Y'(∇g) ].

The last term of the above expression can be simplified further as follows:

∇g = (∂/∂Y){Y'S_Y^{-1}Y + u(S)(Y'ΣY)} = 2{S_Y^{-1}Y + u(S)ΣY},  and  Y'(∇g) = 2{Y'S_Y^{-1}Y + u(S)Y'ΣY} = 2g.

Therefore,

RD = E[(r²(T)/g²)‖Y‖²] − (2/n) E[ {(p − 2) + 2(Y'S_Y^{-1}Y) r'/r} (r(T)/g) ].

In terms of X̄ and S, the last expression can be rewritten as

RD = E[(r²(T)/g²)(X̄'Σ^{-1}X̄)] − (2/n) E[(1/g){(p − 2)r + 2(X̄'S^{-1}X̄)r'}].   . . . (2.6)

Assume r' ≥ 0, 0 < r(T) ≤ c₀ and u(S) ≥ 0. Then

RD ≤ E[ {r(T)/(X̄'S^{-1}X̄ + u(S)X̄'X̄)} { (X̄'Σ^{-1}X̄/X̄'S^{-1}X̄) c₀ − 2(p − 2)/n } ].

Let B* be an orthogonal matrix such that B*X̄ = (‖X̄‖, 0, . . . , 0)' and let W = B*SB*'. Then X̄'S^{-1}X̄ = ‖X̄‖² W^{11} = ‖X̄‖²/W_{11.2}. Here we assume that

u(S) = u(B*'WB*) = u(W_{11.2}, W_{12}, W_{22}),   . . . (A1)

and that

W_{11.2} u(W_{11.2}, W_{12}, W_{22}) is nondecreasing in W_{11.2}.   . . . (A2)

The notations are obvious: W^{11} is the (1,1) element of W^{-1}; W_{11.2} = W_{11} − W_{12}W_{22}^{-1}W_{21}, where W_{11} is the (1,1) element of W, W_{22} is the (p − 1) × (p − 1) submatrix of W after deleting the first row and first column, and W_{12} = W_{21}', where W_{21} is the first column of W without W_{11}. Hence,

RD ≤ E[ E[ {r(‖X̄‖²/W_{11.2})/‖X̄‖²} / {1 + W_{11.2}u(W_{11.2}, W_{12}, W_{22})} · { (X̄'Σ^{-1}X̄/‖X̄‖²) W²_{11.2} c₀ − (2(p − 2)/n) W_{11.2} } | X̄ ] ].   . . . (2.7)

Let

h(W_{11.2}) = {r(‖X̄‖²/W_{11.2})/‖X̄‖²} / {1 + W_{11.2}u(W_{11.2}, W_{12}, W_{22})}

and

g(W_{11.2}) = (X̄'Σ^{-1}X̄/‖X̄‖²) W²_{11.2} c₀ − (2(p − 2)/n) W_{11.2}.

To show that the RHS expression in (2.7) is ≤ 0, it is enough to have

E_{·|X̄}{ h(W_{11.2}) g(W_{11.2}) } ≤ 0.   . . . (2.8)

Note that h(W_{11.2}) is nonincreasing (since r' ≥ 0 and by (A2)) and g(W_{11.2}) has only one sign change (from negative to positive). Hence by A.1 (Appendix) a sufficient condition for (2.8) is E_{·|X̄}{g(W_{11.2})} ≤ 0. Observe that

(X̄'Σ^{-1}X̄/‖X̄‖²) E_{·|X̄}{g(W_{11.2})}
   = E_{·|X̄}[ {(X̄'Σ^{-1}X̄)²/‖X̄‖⁴} W²_{11.2} c₀ − (2(p − 2)/n)(X̄'Σ^{-1}X̄/‖X̄‖²) W_{11.2} | X̄ ]
   = E[ {(X̄'Σ^{-1}X̄)²/(X̄'S^{-1}X̄)²} c₀ − (2(p − 2)/n)(X̄'Σ^{-1}X̄/X̄'S^{-1}X̄) | X̄ ]
   = E[ (χ²_{n−p})² c₀ − (2(p − 2)/n) χ²_{n−p} ] ≤ 0,  for c₀ = 2(p − 2)/{n(n − p + 2)}.

Hence we get the following theorem.

Theorem 2.1. Assume (A1), (A2) and (a) r(T) is nondecreasing in T, (b) 0 < r(T) ≤ c₀ = 2(p − 2)/{n(n − p + 2)}, p ≥ 3. Then θ̂_g^r dominates θ̂₀.

In the following we consider some specific choices of the function u(·) and show how the above theorem works.

(1) u(S) = vt(v), where v = tr S^{-1}. Note that

tr S^{-1} = tr(B*'WB*)^{-1} = tr W^{-1} = W^{11} + tr W^{22}
          = 1/W_{11.2} + tr(W_{22} − W_{21}W_{12}/W_{11})^{-1}
          = (1 + W_{12}W_{22}^{-2}W_{21})/W_{11.2} + tr W_{22}^{-1},

so that (A1) and (A2) are satisfied if t(v) is nonincreasing.

(2) u(S) = vt(v), where v = (tr S)^{-1}. Since

tr S = tr W = W_{11.2} + W_{12}W_{22}^{-1}W_{21} + tr W_{22},

it is easily seen that (A1), (A2) hold if t(v) is nonincreasing.

(3) u(S) = vt(v), where v = |S|^{-1}. Since |S| = |W| = W_{11.2}|W_{22}|, (A1) and (A2) are satisfied if t(v) is nonincreasing. When we consider |S|^{1/p} = |S|^{-1}|S|^{1+1/p} = |S|^{-1}(|S|^{-1})^{-(1+1/p)}, the function t(v) = (1/v)^{1+1/p} is decreasing, and this implies that Theorem 2.1 holds.
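The three choices of u(S) above are straightforward to compute. The following sketch is our own illustration (NumPy only); the nonincreasing function t(v) = 1/(1 + v) and the constant choice r(T) ≡ c₀/2 are arbitrary selections satisfying the conditions of Theorem 2.1.

```python
import numpy as np

def t(v):
    # any nonnegative, nonincreasing t works; this is one arbitrary choice
    return 1.0 / (1.0 + v)

def u_choices(S):
    """The three choices u(S) = v t(v) discussed above."""
    v1 = np.trace(np.linalg.inv(S))      # (1) v = tr(S^{-1})
    v2 = 1.0 / np.trace(S)               # (2) v = (tr S)^{-1}
    v3 = 1.0 / np.linalg.det(S)          # (3) v = |S|^{-1}
    return [v * t(v) for v in (v1, v2, v3)]

def theta_gr(Xbar, S, n, u):
    p = len(Xbar)
    c0 = 2 * (p - 2) / (n * (n - p + 2))          # upper bound in Theorem 2.1
    T = Xbar @ np.linalg.solve(S, Xbar)
    g = T + u * (Xbar @ Xbar)
    r = c0 / 2                                     # constant r(T) in (0, c0]
    return (1 - r / g) * Xbar                      # estimator (2.1)

rng = np.random.default_rng(2)
n, p = 15, 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
Xbar = X.mean(axis=0)
S = (X - Xbar).T @ (X - Xbar)
for u in u_choices(S):
    print(theta_gr(Xbar, S, n, u))
```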

Remark 2.1. A smaller class (but still wider than Lin and Tsai (1973)) of simpler estimators is the one with estimators (derived from the above special case (1))

θ̂_b^r = {1 − r(T)/(T + b tr(S^{-1}) X̄'X̄)} X̄,   . . . (2.9)

where 0 < r(·) ≤ 2(p − 2)/(n(n − p + 2)), r is nondecreasing and b ≥ 0. Note that b = 0 and r(·) = (p − 2)/(n(n − p + 2)) gives θ̂_JS. The risk of the usual estimator θ̂₀ is (p/n), and the risk improvement (RI) obtained by θ̂_JS over θ̂₀ is

RI = Risk of θ̂₀ − Risk of θ̂_JS = {(n − p)(p − 2)²/(n(n − p + 2))} E((χ²_p(λ))^{-1}).

Note that the maximum RI obtained by θ̂_JS is attained at λ = nθ'Σ^{-1}θ = 0. We define the relative risk improvement (RRI) as : RRI = [RI/R(θ̂₀, θ)] × 100%. In the following table we provide the maximum relative risk improvements of θ̂_JS and θ̂_b^a over θ̂₀, where θ̂_b^a has the structure (2.9) with a = (p − 2)/(n(n − p + 2)) and b = .01. Similar to θ̂_JS, θ̂_b^a gives its maximum improvement at λ = 0.

Table 2.1. RRIs of θ̂_JS and θ̂_b^a at λ = 0 and Σ = I.

  p     n          RRI of θ̂_JS (%)    RRI of θ̂_b^a (%)
  3     (p + 7)        25.926              27.110
        (p + 12)       28.571              29.579
  4     (p + 7)        38.889              40.175
        (p + 12)       42.857              42.858
  5     (p + 7)        46.667              47.494
        (p + 12)       51.429              51.532
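The entries of Table 2.1 can be reproduced, up to simulation error, by a short Monte Carlo experiment. The sketch below is our own illustration (NumPy only); it estimates the risks of θ̂₀, θ̂_JS and θ̂_b^a at θ = 0, Σ = I under L₀ and prints the corresponding RRIs.

```python
import numpy as np

def rri_at_origin(p, n, b=0.01, reps=20_000, seed=0):
    """Monte Carlo RRIs (in %) of theta_JS and theta_b^a over theta_0
    at theta = 0, Sigma = I, under the loss L_0."""
    rng = np.random.default_rng(seed)
    a = (p - 2) / (n * (n - p + 2))
    loss_js = loss_ba = 0.0
    for _ in range(reps):
        X = rng.standard_normal((n, p))              # N_p(0, I) sample
        Xbar = X.mean(axis=0)
        S = (X - Xbar).T @ (X - Xbar)
        S_inv = np.linalg.inv(S)
        T = Xbar @ S_inv @ Xbar
        theta_js = (1 - a / T) * Xbar
        theta_ba = (1 - a / (T + b * np.trace(S_inv) * (Xbar @ Xbar))) * Xbar
        loss_js += theta_js @ theta_js               # L_0 with theta = 0, Sigma = I
        loss_ba += theta_ba @ theta_ba
    risk0 = p / n                                    # exact risk of theta_0
    rri_js = 100 * (risk0 - loss_js / reps) / risk0
    rri_ba = 100 * (risk0 - loss_ba / reps) / risk0
    return rri_js, rri_ba

for p in (3, 4, 5):
    for n in (p + 7, p + 12):
        print(p, n, rri_at_origin(p, n))
```

Increasing the number of replications brings the simulated RRIs closer to the tabulated values.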

The problem of selecting an optimum ‘b’ still remains open since it is difficult to find one analytically. We tried several values of b in numerical computations and observed that “small” values of b give better results than “larger” ones.

An alternative proof. The above result (Theorem 2.1) can be proved in a totally different way. Here we give an alternative proof with u(S) = vt(v), v = tr(S^{-1}), t ≥ 0 and nonincreasing.

Note that the expression inside [ ] in the second term of (2.6) is free from the parameters, whereas Σ^{-1} is present in the first term. We will find an unbiased estimator of the first term of (2.6) using the following lemma, which is a simpler version of Haff's (1979a) Wishart identity.

Lemma 2.1. (Haff's Wishart identity). Let B = ((b_{ij}(S))) be a p × p matrix. For a nonnegative real value c, the matrix B_{(c)} = ((b̃_{ij})) is such that b̃_{ij} = b_{ij} if i = j, and b̃_{ij} = c b_{ij} if i ≠ j. Also B^{-1}_{(c)} denotes (B^{-1})_{(c)}, and note that for any two p × p matrices A and B, tr(A_{(c)}B_{(1/c)}) = tr(AB). Let D*B = ∑_i ∑_j ∂b_{ij}/∂s_{ij}. Then we have the following identity:

E[h tr(BΣ^{-1})] = 2E[h D*B_{(1/2)}] + 2E tr[(∂h/∂S) B_{(1/2)}] + (n − p − 2)E[h tr(BS^{-1})].   . . . (2.10)

To apply the above lemma we call B = X̄X̄' and h = r²/g². Then the first term of (2.6) can be reexpressed (using (2.10), and noting that D*B = 0 since B does not depend on S) as

E[(r²/g²)(X̄'Σ^{-1}X̄)] = E[h tr(BΣ^{-1})]
                       = 2E tr[(∂h/∂S) B_{(1/2)}] + (n − p − 2)E[h tr(BS^{-1})].   . . . (2.11)


We use the following results to simplify the first term of (2.11):

∂(tr BS^{-1})/∂S = −(S^{-1}BS^{-1})_{(2)}  (see A.2 in the Appendix),

∂r²/∂S = −2(S^{-1}BS^{-1})_{(2)} r r',

and

∂g^{-2}/∂S = −2g^{-3}{ −(S^{-1}BS^{-1})_{(2)} + (X̄'X̄)(∂u(S)/∂S) }.

Hence, the first term of (2.11) is

2E tr[(∂h/∂S) B_{(1/2)}] = 2E tr[ −(2rr'/g²)(S^{-1}BS^{-1})_{(2)} B_{(1/2)} − 2(r²/g³){ −(S^{-1}BS^{-1})_{(2)} B_{(1/2)} + (X̄'X̄)(∂u/∂S) B_{(1/2)} } ].   . . . (2.12)

Combining (2.6), (2.11) and (2.12) we get

RD = 2E tr[ −(2rr'/g²)(S^{-1}BS^{-1})_{(2)} B_{(1/2)} − 2(r²/g³){ −(S^{-1}BS^{-1})_{(2)} B_{(1/2)} + (X̄'X̄)(∂u/∂S) B_{(1/2)} } ]
     + (n − p − 2)E[(r²/g²) tr BS^{-1}] − (2/n)E[(1/g){(p − 2)r + 2(X̄'S^{-1}X̄)r'}].   . . . (2.13)

Note that tr{(S^{-1}BS^{-1})_{(2)} B_{(1/2)}} = (X̄'S^{-1}X̄)² = T². So (2.13) reduces to

RD = −4E[(rr'/g²)T²] + 4E[(r²/g³)T²] − 4E[(r²/g³)(X̄'X̄) tr{(∂u/∂S)B_{(1/2)}}]
     + (n − p − 2)E[(r²/g²)T] − (2(p − 2)/n)E[r/g] − (4/n)E[(r'/g)T].   . . . (2.14)

Since r(·) ≥ 0 and nondecreasing (i.e., r' ≥ 0), from (2.14) we have

RD ≤ 4E[(r²/g³)T²] + (n − p − 2)E[(r²/g²)T] − (2(p − 2)/n)E[r/g] − 4E[(r²/g³)(X̄'X̄) tr{(∂u/∂S)B_{(1/2)}}].   . . . (2.15)

Note that when v = tr(S^{-1}), (∂v/∂S) = −S^{-2}_{(2)} (see Haff (1979a, 1979b)). This implies

(∂u/∂S) = −(t + vt') S^{-2}_{(2)},

i.e., tr{(∂u/∂S)B_{(1/2)}} = −(t + vt') tr(S^{-2}X̄X̄') ≥ −(t + vt') vT. So, from (2.15),

RD ≤ 4E[(r²/g³)T²] + (n − p − 2)E[(r²/g²)T] − (2(p − 2)/n)E[r/g] + 4E[(r²/g³)(X̄'X̄)vT(t + vt')]
   ≤ 4E[(r²/g³)T(T + vt(X̄'X̄))] + (n − p − 2)E[(r²/g²)T] − (2(p − 2)/n)E[r/g]
   = (n − p + 2)E[(r²/g²)T] − (2(p − 2)/n)E[r/g]
   ≤ {(n − p + 2)c₀ − 2(p − 2)/n} E[r/g] ≤ 0,

since 0 < r(T) ≤ c₀, T ≤ g and c₀ = 2(p − 2)/{n(n − p + 2)}.
0 if α ≥ 1. So we choose α ≥ 1 and then the minimum is attained at J = 0, i.e.,

(p, α) = Γ(p/2 + (1 − α)) / Γ(p/2 + 2(1 − α)),  α ≥ 1.   . . . (3.11)

The following theorem is immediate from the above derivation.

Theorem 3.1. The estimator Σ̂¹_{c,α} (of the form (3.1)) is uniformly better than the best affine equivariant estimator Σ̂₁ under the loss L₁ provided 1 ≤ α < 1 + p/4 and c satisfies the condition (3.10).

Remark 3.1. Note that the expression (3.11) exists only if α < 1 + p/4. To see the existence of the improved estimators, we take the special case α = 1. Note that α = 1 implies (p, 1) = 1, d₁ = (n − p), d₂ = (n − p)(n − p + 2) and d₃ = (n + 1)(n − p). The range for c in this case is 0 < c ≤ … . For α > 1, the above risk improvements will depend on λ = nθ'Σ^{-1}θ, and the maximum RRI will be obtained at λ = 0. Although the RRIs given in Table 3.1 are at most 2.5%, they are more than the ones (close to 1%) obtained by the nonsmooth estimators (using both S and X̄) derived by Kubokawa (1989) and Perron (1990).

(ii) Estimation under L₂. Though we can derive similar results under L₂, the techniques are somewhat different here. For the loss function L₂,

L₂(Σ̂²_{c,α}, Σ) = tr[{Σ̂₂ + (c/T^α)(X̄X̄')}Σ^{-1}] − ln | {Σ̂₂ + (c/T^α)(X̄X̄')}Σ^{-1} | − p.

Therefore, the risk difference (RD) = R(Σ̂²_{c,α}, Σ) − R(Σ̂₂, Σ) is

RD = E[ c tr(Σ^{-1}(X̄X̄')/T^α) ] − E[ ln | I_p + (c/T^α)(X̄X̄')Σ̂₂^{-1} | ]
   = c E[ (X̄'Σ^{-1}X̄)/(X̄'S^{-1}X̄)^α ] − E ln | I_p + {c/(a₂(X̄'S^{-1}X̄)^α)}(X̄X̄')S^{-1} |.   . . . (3.12)

It is well known that for (p × 1) vectors a and b, |I + ab'| = 1 + b'a. Using this we can simplify the second term of (3.12) further and we get

RD = E_X̄ E_S[ c(X̄'Σ^{-1}X̄)/(X̄'S^{-1}X̄)^α − ln{1 + c/(a₂(X̄'S^{-1}X̄)^{α−1})} ].   . . . (3.13)

Note that ln(1 + 1/W) (W ≥ 0) is a convex function of W, and using Jensen's inequality,

E ln(1 + 1/W) ≥ ln(1 + 1/E(W)).   . . . (3.14)

. . . (3.14)

Using the above inequality in the second term of (3.13) we get (assuming again α ≥ 1) E ln{1 + S

a2 (X



c S −1 X)

α−1

} ≥

c

α−1 } a2 E (X S −1 X) S c = ln{1 + α−1 },  −1 a2 d4 (X X)

ln{1 +



. . . (3.15)

where d4 = Γ((n − p)/2 − (α − 1))/(2α−1 Γ((n − p)/2)). The last expression is  −1  X/X S −1 X) ∼ χ2n−p and is independent obtained by using the fact that (X  −1 of (X X). Also

282

nabendu pal and abdulaziz elfessi

 E S

(X (X

 −1



X)

S −1 X)α

 =

(X

d1 −1

X)α−1

.

. . . (3.16)

Using (3.15) and (3.16) in (3.13) we have

RD



E [c



E [cd1 n

=

E [cd1 n

=

X

(X



J

d1 −1

α−1

X)α−1

E{

(nX

− ln{1 + 1

 −1

c

(X

X)α−1

 −1

X)α−1

} − ln{1 +

}]

a2 d4 E(nX

c

 −1

X)α−1

}]

p α−1 Γ( 2

+ J − (α − 1)) cnα−1 Γ( p2 + J) − ln{1 + }] p 2α−1 Γ( 2 + J) a2 d4 Γ( p2 + J + (α − 1)) J E [G(c, J)], (say) . J

. . . (3.17) The above function G(c, J) has the structure G(c, J) = ck1 − ln(1 + ck2 ),

. . . (3.18)

for suitable positive constants ki = ki (p, α, J)(i = 1, 2). It is easy to see that the expression in (3.18) is a convex function of c (since d2 G(c, J)/dc2 = k22 /(1 + ck2 )2 > 0). For a fixed J, the function G(c, J) is minimized at c = cJα , where 2α−1 Γ( p2 + J) a2 d1 d4 Γ( p2 + J − (α − 1))Γ( p2 + J + (α − 1)) }. {1− d1 nα−1 Γ( p2 + J − (α − 1)) (Γ( p2 + J))2 . . . (3.19) The minimum of cJα w.r.t. J is attained at J = 0 (see result A.4, Appendix); i.e.,

cJα =

2α−1 Γ( p2 ) a2 d1 d4 Γ( p2 − (α − 1))Γ( p2 + (α − 1)) }. {1− p J d1 nα−1 Γ( 2 − (α − 1)) (Γ( p2 ))2 . . . (3.20) Also, the expression G(c, J) takes the value 0 at c = 0. Therefore, G(c, J) ≤ 0∀J as long as 0 < c ≤ c0α (because G(c, J) is convex in c). This implies (3.17) ≤ 0. The following result is now obvious. Note that, for the existence of (3.20), we need α < 1 + p/2. mincJα = c0α =

2 Theorem 3.2. The estimator ˆ c,α (of the form (3.1)) is uniformly better 2 than the best affine equivariant estimator ˆ under the loss L2 provided 1 ≤ α < 1 + p/2 and 0 < c ≤ c0α (given in (3.20)).

normal mean vector and dispersion matrix

283

Remark 3.2. Again as a special case we take α = 1 to establish the existence of the improved estimators. In this case c01 = 1/d1 − a2 d4 = 1/(n − p) − 1/(n − 1) = (p − 1)/((n − 1)(n − p)) > 0 provided p ≥ 2. Therefore, for p ≥ 2 2 2  = ˆ + (XX )c/T is uniformly better than ˆ under L2 provided 2, ˆ c,α

0 < c ≤ (p − 1)/((n − 1)(n − p)). We can get the similar results for α = 2 but then p must be ≥ 3.

 Also, notice that the improved estimators of developed here are not location invariant. This undesirable property is a result of the use of James–Stein structure in the estimators. Remark 3.3. (Scope of further work). The results developed in this article clearly show some interdependence between improved estimation of the mean vector and improved estimation of the dispersion matrix (or the precision matrix). This also raises  many new questions. For example, can we use the improved estimator of (using both X and S) to get another improved estimator of θ (may be better than θˆJS )? And  use the improved estimator of θ to get further improvement for estimating , and so on. Hopefully, this cyclic process of successive improvements will stabilize eventually. This is under study now.   Note that our improved estimators of have the structure ˆ = (constant)  S + c(XX )/T α . We are now looking into a more general structure by replacing the constant multiple of S by Φ(S), where Φ is a p × p matrix depending on S alone and uniformly better than the best affine equivariant estimator. Hopefully this will give further improvements than those presented in this article. A somewhat different −1 but related problem is the improved estimation of the precision matrix . Note that, for an invertable matrix V and vector 1, (V + 11 )−1 = V −1 − (1 + 1 V −1 1)−1 V −1 11 V −1 (see Seber, 1984, Page 520). i Applying this result on ˆ c,α we can get a structure which may be a better −1 . Currently this is also under investigation. estimator of

Appendix

A.1. Lemma. Let the random variable Z ∼ f (z) on (0, ∞). The real valued functions h(Z) and g(Z) are such that h(Z) ≥ 0 nonincreasing and g(Z) has only one sign change (from negative to positive) on (0, ∞). Then E(h(Z)g(Z)) ≤ KE(g(Z)) for some suitable positive constant K. Proof. Let z∗ > 0 such that g(z) ≤ 0 for z(0, z∗ ]; and > 0 for z(z∗ , ∞). We write

284

nabendu pal and abdulaziz elfessi



z∗

E(h(Z)g(Z)) =



h(z)g(z)f (z)dz +

h(z)g(z)f (z)dz z∗

0

= D1 + D2 (say).

Since h(.) is nonincreasing, on (0, z∗ ) h(z)g(z) ≤ h(z∗ )g(z)

z∗ g(z)f (z)dz. i.e., D1 ≤ h(z∗ ) 0

Similarly, on (z∗ , ∞) h(z)g(z) ≤ h(z∗ )g(z) which gives

∞ g(z)f (z)dz. D2 ≤ h(z∗ ) z∗

Therefore, D1 + D2 ≤ KE(g(Z)), where K = h(z∗ ) ≥ 0. A.2. Simplification of (∂(trBS −1 )/∂S). Note that the (k, l) element of p

p

p

 ∂  ∂ ∂ ∂ (trBS −1 ) = (BS −1 )ii = bij sji tr(BS −1 ) = ∂S ∂skl ∂s ∂s kl kl i=1 i=1 j=1 (where bij and sji are the (i, j) and (j, i) elements of B and S −1 respectively) =

p p   i=1 j=1

bij

∂ ji s ∂skl

ik jk

if k = l −s s }. (see Haff (1979b)). −sik sjl −sil sjk if k = l, ∂ tr(BS −1 ) = −[2M − (1 − 2)diagM ] = −M (2) , Now it is easy to see that ∂S −1 −1 where M = S BS . Note that,

∂ ji ∂skl s

= {

A.3. Derivation of d3 . d3 = E[(

Y  SY Y Y Y )α ], where SY ∼ Wp (Ip | n − 1). )( Y Y Y  SY−1 Y

It is well known that SY can be decomposed as SY = KK  , where K = ((kij )) 2 ∼ χ2n−i . Also, is a lower triangular matrix with kij (i = j) ∼ N (0, 1) and kii note that the above expectation doesn’t depend on Y , and so with out loss of generality, we assume Y = (0, 0, . . . , 0, 1) . This gives d3

α

2 2 2 = E[kp1 + . . . + Kp−1,p + (kpp ) ] = (p − 1)E((χ2n−p )α ) + E((χ2n−p )α+1 ) Γ((n − p)/2 + α) Γ((n − p)/2 + (α + 1)) = (p − 1)2α + 2α+1 Γ((n − p)/2) Γ((n − p)/2) = (n − 1 + 2α)d1 (see expression (3.4) for d1 ).

285

normal mean vector and dispersion matrix A.4. Simplification of c0α in (3.20). From (3.19), we write cJα as cJα = d5 D1 (p, α, J){1 − d6 D2 (p, αJ)}, where d5 = 2α−1 /(d1 nα−1 ), d6 = a2 d1 d4 , D1 (p, α, J) = and D2 (p, α, J) =

Γ( p2

Γ( p2 + J) + J − (α − 1))

Γ( p2 + J − (α − 1))Γ( p2 + J + (α − 1)) 2

(Γ( p2 + J))

.

Since α ≥ 1, D1 (p, α, J) is an increasing function of J (see Lemma 1, Guo and Pal (1992), p. 315)). Let RJ,J+1 = D2 (p, α, J)/D2 (p, α, J + 1). After simplification we get RJ,J+1 = 1 +

(α − 1) 2

2 2

( p2 + J) − (α − 1)

> 1.

Therefore, D2 (p, α, J) is decreasing in J, which implies {1 − d6 D2 (p, α, J)} is increasing in J. Therefore cJα is increasing in J, and its minimum is attained at J = 0 which gives c0α . Acknowledgments. The authors are indebted to the referee for his help regarding the proof of Theorem 2.1. He also pointed out a few references which helped the authors greatly. Sincere thanks are due to a Co-Editor for many constructive suggestions. References

Bilodeau, M. and Kariya, T. (1989). Minimax estimators in the normal MANOVA model. Jour. Multivariate Analysis, 28, 260 - 270.
Efron, B. and Morris, C. (1976). Multivariate empirical Bayes and estimation of covariance matrices. Ann. Statist., 4, 22 - 32.
Guo, Y. Y. and Pal, N. (1992). A sequence of improvements over the James-Stein estimator. Jour. Multivariate Analysis, 42, 302 - 317.
Haff, L. R. (1977). Minimax estimators for a multivariate precision matrix. Jour. Multivariate Analysis, 7, 374 - 385.
− − −− (1979a). Estimation of the inverse covariance matrix : random mixtures of the inverse Wishart matrix and the identity. Ann. Statist., 7, 1264 - 1276.
− − −− (1979b). An identity for the Wishart distribution with application. Jour. Multivariate Analysis, 9, 531 - 544.
− − −− (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Statist., 8, 586 - 597.
James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 361 - 380. University of California Press, Berkeley.
Konno, Y. (1991). On estimation of a matrix of normal means with unknown covariance matrix. Jour. Multivariate Analysis, 36, 44 - 55.
Kubokawa, T. (1989). Improved estimation of a covariance matrix under quadratic loss. Statistics and Probability Letters, 8, 69 - 71.
Kubokawa, T., Honda, T., Morita, K. and Saleh, A. K. Md. E. (1993). Estimating a covariance matrix of a normal distribution with unknown mean. Journal of Japan Statistical Society, 23, No. 2, 131 - 144.
Kubokawa, T., Morita, K., Makita, S. and Nagakura, K. (1993). Estimation of the variance and its applications. Jour. Statistical Planning and Inference, 35, 319 - 333.
Lin, P. E. and Tsai, H. L. (1973). Generalized Bayes minimax estimators of the multivariate normal mean with unknown covariance matrix. Ann. Statist., 1, 142 - 145.
Pal, N. (1993). Estimating the normal dispersion matrix and the precision matrix from a decision-theoretic point of view : a review. Statistical Papers/Statistische Hefte, 34, 1 - 26.
Perron, F. (1990). Equivariant estimators of the covariance matrix. Canadian Journal of Statistics, 18, 179 - 182.
Seber, G. A. F. (1984). Multivariate Observations. John Wiley and Sons, New York.
Sinha, B. K. (1987). Inadmissibility of the best equivariant estimators of the variance-covariance matrix, the precision matrix and the generalized variance : a survey. Proceedings of the International Symposium on Advances in Multivariate Statistical Analysis, Indian Statistical Institute, Calcutta, India.
Sinha, B. K. and Ghosh, M. (1987). Inadmissibility of the best equivariant estimators of the variance-covariance matrix, the precision matrix and the generalized variance under entropy loss. Statistics and Decisions, 5, 201 - 227.
Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Stat. Math., 16, 155 - 160.
Strawderman, W. E. (1974). Minimax estimation of powers of the variance of a normal population under squared error loss. Ann. Statist., 2, 190 - 198.

Department of Statistics
University of Southwestern Louisiana
Lafayette, Louisiana 70504
USA
