
TESTING FOR PROPORTIONALITY OF MULTIVARIATE DISPERSION STRUCTURES USING INTERDIRECTIONS

Sujit Kumar Ghosh and Debapriya Sengupta

University of Connecticut and Indian Statistical Institute

The author would like to acknowledge the facilities provided by the Department of Statistics at the University of Connecticut, Storrs.



Part of the work was done while the author was visiting the Centre for Mathematics and its Applications, The Australian National University, Canberra. The author would also like to acknowledge the facilities provided by the Indian Statistical Institute, New Delhi, for a part of this research.

Proposed running head: Testing for Dispersion Structures

Correspondence to: Sujit K. Ghosh, U-120, UConn, 196 Auditorium Road, Storrs, CT 06269.

e-mail: [email protected]
FAX: (860)-486-4113.


Abstract.

Knowing whether the dispersion structures of two elliptically symmetric populations are proportional is an important problem in multivariate data analysis. Since the problem is invariant under nonsingular transformations, it can be reduced to the situation where one population is spherically symmetric while the other has a diagonal dispersion structure. In this article we show that the problem is actually equivalent to testing uniformity of a distribution on the sphere in an appropriate Euclidean space. The main purpose is to demonstrate how the idea of interdirections, introduced by Randles (1989) in the context of multivariate sign tests, can be adapted to handle this situation. The findings of this article also enhance the possibility of developing a technology for robust multivariate data analysis using such fundamentally geometric concepts.

AMS 1991 Subject Classification. Primary 62H15, 62G10; secondary 62F05, 62G20, 62F35.

Key Words & Phrases. Affine Invariant Multivariate Sign Tests, Sphericity Tests, Elliptically Symmetric Distributions, Interdirections.


1 Introduction

The intrinsic symmetry of many multivariate nonparametric testing problems permits us to develop tests which are distribution free over a large class of null hypotheses. The main advantage of such test procedures is that the level of the test can be controlled over a large class of null hypotheses, which is one of the primary concerns in hypothesis testing. In many problems of interest, distribution free test procedures also have good power properties. Another appealing feature of these procedures is their inherent simplicity. The standard procedures in composite hypothesis testing problems typically consist of likelihood ratio (or uniformly most powerful invariant) tests for a suitable parametric subhypothesis of the original nonparametric hypothesis.

In this paper we consider the following multivariate two sample dispersion structure problem. Suppose $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ are two independent sets of random samples from two $p$-dimensional elliptically symmetric populations having densities $|\Sigma_1|^{-1/2} f(x^T \Sigma_1^{-1} x)$ and $|\Sigma_2|^{-1/2} g(y^T \Sigma_2^{-1} y)$ respectively. We wish to test $H_0 : \Sigma_1 \propto \Sigma_2$ (i.e. $\Sigma_1 = \lambda \Sigma_2$ for some $\lambda > 0$), treating $f$ and $g$ as unknown nuisance parameters. Notice that we use the term `dispersion structure' in order to include cases where the second moments may not exist. Even in that case the parameters $\Sigma_1$ and $\Sigma_2$ are well defined for elliptically symmetric densities up to a constant multiple. When the second moments are finite, they are proportional to the dispersion matrices. In addition, we assume the following restriction on the sample sizes: $n_1 = k_1 n$ and $n_2 = k_2 n$ with $0 < k_1, k_2 < 1$ and $k_1 + k_2 = 1$. Moreover, the same sampling fractions are maintained as $n$ tends to infinity.

The situation where the populations have unknown centers, say $\mu_1$ and $\mu_2$ respectively, is of more practical relevance. If we have a test procedure which is based on the assumption that $\mu_1 = \mu_2 = 0$, we can make an appropriate adjustment when the centers are unknown by first subtracting suitable estimates of the respective centers from the observations and then applying the same procedure to the centered data. By doing so we may alter the original distribution even asymptotically. It is not clear whether that is the case for the distribution free procedures we would like to consider in this article. We are able to present only some simulation evidence that the distributions are not altered much by such precentering. The problem is worth pursuing at a theoretical level, and the approach of Randles (1982) and de Wet & Randles (1987) will probably give some clue in this direction.

When both $f$ and $g$ are standard normal densities, one can derive the likelihood ratio test (LRT) for the case when $\Sigma_1 = \Sigma_2$ under $H_0$. There is a large volume of literature studying various properties of the LRT in the multivariate normal situation. We refer to Anderson (1958) and Muirhead (1982) for this. The (modified) LRT is
given by

$$ C_n = \frac{\det(A_{n_1})^{q_1/2} \, \det(A_{n_2})^{q_2/2}}{\det(A_{n_1} + A_{n_2})^{(q_1 + q_2)/2}} $$

where $A_{n_1} = \sum_{i=1}^{n_1} (X_i - \bar{X})(X_i - \bar{X})^T$ and $A_{n_2} = \sum_{i=1}^{n_2} (Y_i - \bar{Y})(Y_i - \bar{Y})^T$, with $q_1 = n_1 - 1$ and $q_2 = n_2 - 1$. We shall treat the test statistic $C_n$ as the classical procedure for simulation purposes.

To motivate our technique we describe a closely related problem which has been studied in great detail and from which we borrow some key ideas. Let $X_1, \ldots, X_n$ be iid samples from a $p$-dimensional density which is elliptically symmetric about a point $\mu$. We wish to test $H_0 : \mu = 0$. The classical test procedure is Hotelling's $T^2$ statistic (cf. Anderson 1958), which is the LRT assuming the population is $N_p(\mu, \Sigma)$. When $p = 1$, the so-called `sign test' is a well accepted distribution free procedure for this problem. In higher dimensions, several extensions of the concept of the univariate sign test exist in the literature. Since the testing problem is invariant under nonsingular transformations, it is worthwhile to consider affine invariant extensions of sign tests to higher dimensions. Various interesting procedures (for example, Hodges 1955, Blumen 1958, Oja & Nyblom 1989) are available. Randles (1989) and Peters & Randles (1990) introduced the concept of interdirections, which leads to a comprehensive study of a large class of multivariate sign tests. See also Chaudhuri & Sengupta (1993).

The main findings of the above studies can be summarized as follows. After a reduction by affine invariance, the original problem reduces to testing whether the center of symmetry of a spherically symmetric distribution is zero. If one considers only distribution free procedures, the problem reduces further to testing uniformity on the unit sphere $S^{(p-1)}$ in $\mathbb{R}^p$. This is a very well studied problem and a large number of tests have been proposed, especially when $p = 2$. Most of these procedures can be expressed as functions of angles between observations. As shown by Randles (1989) and Chaudhuri & Sengupta (1993), a large class of affine invariant multivariate sign tests can be thought of as approximations to various well-known tests of uniformity on $S^{(p-1)}$ in large samples.
From this point of view the interdirections are actually affine invariant estimates of the angle between two observations, with the special property that they are distribution free under $H_0$.

For the dispersion structure problem we shall follow the same route. By invariance we can assume without loss of generality that one of the populations is spherically symmetric and the other has a diagonal dispersion structure. Thus we get a reduced problem where we have, say, $Z_1, \ldots, Z_n$ iid samples from the density $|\Sigma|^{-1/2} g(z^T \Sigma^{-1} z)$, where $\Sigma$ is diagonal, and we want to test $H_0^S : \Sigma \propto I_p$, where $I_p$ is the $p \times p$ identity matrix. For this problem any distribution free procedure should be based on $Z_1/\|Z_1\|, \ldots, Z_n/\|Z_n\|$ due to the rotational symmetry in the problem. Therefore the problem (quite surprisingly!) reduces to testing uniformity on $S^{(p-1)}$. However, one should note that the nature of the alternative is different in this case, so that the test statistics turn out to be different from those for the location problem. Now we can consider any reasonable test for uniformity on $S^{(p-1)}$ and try to approximate it by affine invariant procedures based on the original observations, analogous to the location case.

The paper is organized as follows. In section 2 we describe the proposed test statistic. First the associated sphericity problem is considered. The actual test statistic proposed is an affine invariant approximation to its twin in the sphericity case. In section 3 we develop a Wald type nonparametric test statistic along with other concluding remarks. Finally, in section 4 all the tests are compared on the basis of simulations for various values of the nuisance parameters (i.e., $f$ and $g$). The technical details are provided in the appendix.
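For the simulation comparisons mentioned above, the classical statistic $C_n$ from the Introduction is easiest to evaluate on the log scale, since the determinants can overflow for moderate sample sizes. The following is a minimal sketch, assuming NumPy and samples stored as arrays with one observation per row; the function name `log_modified_lrt` is hypothetical and not taken from the paper.

```python
import numpy as np

def log_modified_lrt(x, y):
    """Return log C_n for samples x (n1 x p) and y (n2 x p).

    Hypothetical helper: evaluates
    (q1/2) log det(A1) + (q2/2) log det(A2) - ((q1+q2)/2) log det(A1 + A2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    q1, q2 = x.shape[0] - 1, y.shape[0] - 1
    # Centered sums of squares and products matrices A_{n1}, A_{n2}.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    a1 = xc.T @ xc
    a2 = yc.T @ yc
    # slogdet returns (sign, log|det|); the matrices are positive definite
    # when n1, n2 > p and the data are in general position.
    _, ld1 = np.linalg.slogdet(a1)
    _, ld2 = np.linalg.slogdet(a2)
    _, ld12 = np.linalg.slogdet(a1 + a2)
    return 0.5 * (q1 * ld1 + q2 * ld2 - (q1 + q2) * ld12)
```

Applying a common nonsingular linear map and arbitrary shifts to both samples leaves the returned value unchanged, which is the invariance the Introduction relies on. The value is never positive, because $\det(A_{n_1} + A_{n_2})$ dominates both $\det(A_{n_1})$ and $\det(A_{n_2})$.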

2 Construction of the test statistic

The construction of the test statistic will be described in two parts. As already mentioned, by virtue of invariance we can reduce the original problem to a problem of testing uniformity on $S^{(p-1)}$. Next, one can approximate an appropriate test statistic for the reduced problem by an (asymptotically equivalent) affine invariant version which will work for the actual problem.
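The reduction by invariance can be made concrete at the population level. The sketch below assumes the two dispersion matrices are known and positive definite, which is not the setting of the paper (there they are unknown and the reduction is only a theoretical device). It builds a nonsingular matrix $B$ with $B \Sigma_1 B^T = I_p$ and $B \Sigma_2 B^T$ diagonal, using a Cholesky factor followed by an eigendecomposition. The function name `reduce_dispersions` is hypothetical.

```python
import numpy as np

def reduce_dispersions(sigma1, sigma2):
    """Return (B, d) with B @ sigma1 @ B.T = I and B @ sigma2 @ B.T = diag(d).

    Illustration only: assumes sigma1 and sigma2 are known, symmetric and
    positive definite.
    """
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    # sigma1 = L L^T, so L^{-1} whitens the first population.
    l_inv = np.linalg.inv(np.linalg.cholesky(sigma1))
    m = l_inv @ sigma2 @ l_inv.T
    m = 0.5 * (m + m.T)  # symmetrize against rounding error
    # m = Q diag(d) Q^T; rotating by Q^T keeps the first population spherical.
    d, q = np.linalg.eigh(m)
    return q.T @ l_inv, d
```

Under this construction the two dispersion structures are proportional exactly when all entries of `d` are equal, which is the sphericity hypothesis for the transformed second population.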

2.1 Sphericity tests

Let $Z_1, \ldots, Z_n$ be an iid sample from a $p$-dimensional elliptically symmetric density given by $|\Sigma|^{-1/2} g(z^T \Sigma^{-1} z)$ with $g$ and $\Sigma$ unknown. We want to test $H_0^S : \Sigma \propto I_p$. Since the problem is invariant under orthogonal transformations, one can assume that $\Sigma$ is diagonal without loss of generality. Also, the elementary invariant quantities for constructing distribution free procedures are given by $U_i = Z_i/\|Z_i\|$, $i = 1, \ldots, n$, which are iid uniform on $S^{(p-1)}$ under $H_0^S$. The normal theory likelihood ratio statistic is given by (Anderson 1958),

$$ L_n = \frac{|A_n|^{n/2}}{\left( \mathrm{tr}(A_n)/p \right)^{pn/2}} \qquad (2.1) $$

where $A_n = \sum_{i=1}^{n} Z_i Z_i^T$ is the uncorrected sums of squares and products matrix. The limiting behavior of $W_n = -2 \log L_n$ can be worked out (see Nagao & Srivastava 1973). It turns out that under elliptic symmetry

Fact 2.1 (i) Under $H_0^S$, $W_n$ has a limiting $(1 + \kappa)\,\chi^2_{p(p+1)/2 - 1}$ distribution, where $\kappa$ is the kurtosis of the density $g$. (ii) Under the sequence of local alternatives $\Sigma_n = I_p + \frac{1}{\sqrt{n}}\,\mathrm{diag}(c_1, \ldots, c_p)$, $W_n$ has a limiting $(1 + \kappa)\,\chi^2_{p(p+1)/2 - 1}(\delta^2)$ distribution with $\delta^2 = \frac{1}{1 + \kappa} \sum_{i=1}^{p} (c_i - \bar{c})^2$.
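As a computational companion to (2.1), the statistic $W_n = -2 \log L_n$ can be evaluated directly from the uncorrected sums of squares and products matrix. This is a minimal sketch, assuming NumPy and observations stored one per row; the function name `sphericity_w` is hypothetical.

```python
import numpy as np

def sphericity_w(z):
    """Return W_n = -2 log L_n for an (n x p) sample z, with L_n as in (2.1)."""
    z = np.asarray(z, dtype=float)
    n, p = z.shape
    a = z.T @ z  # uncorrected sums of squares and products matrix A_n
    # log L_n = (n/2) log|A_n| - (pn/2) log(tr(A_n)/p), computed on the log scale.
    _, logdet = np.linalg.slogdet(a)
    log_l = 0.5 * n * logdet - 0.5 * p * n * np.log(np.trace(a) / p)
    return -2.0 * log_l
```

The value is nonnegative by the arithmetic-geometric mean inequality applied to the eigenvalues of $A_n$, and it is unchanged when the sample is rescaled or rotated. Under Fact 2.1(i), the limiting null distribution of this value is a chi-square with $p(p+1)/2 - 1$ degrees of freedom scaled by $1 + \kappa$, so a chi-square critical value gives the nominal asymptotic level only when $\kappa = 0$, as for normal data.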


The above fact establishes that the level of the test based on $W_n$ cannot be controlled even asymptotically under $H_0^S$ unless we make some extra assumption regarding $\kappa$. In order to construct distribution free test procedures for $H_0^S$ we look at the class of $U$-statistics of the form
Sn (h) =

X i
