On Hotelling's Approach to Hypothesis Testing when a Nuisance Parameter is Present only under the Alternative

Peter Kim, Department of Mathematics and Statistics, University of Guelph, Guelph, Ontario N1G 2W1, Canada
Qi Li, Department of Economics, University of Guelph, Guelph, Ontario N1G 2W1, Canada
Daniel Naiman, Department of Mathematical Sciences, Johns Hopkins University, Baltimore, Maryland 21218, USA
Thanasis Stengos, Department of Economics, University of Guelph, Guelph, Ontario N1G 2W1, Canada

Abstract

In this paper we develop an alternative procedure for testing the significance of nonlinear terms in regression models based on the method of tubes. This statistical method was first introduced by Hotelling (1939) and has recently been extended by others in the statistics literature. The proposed procedure makes use of concepts from differential geometry and can lead to an exact test with good power characteristics in certain circumstances.

* An earlier draft of this paper was presented at the CESG 1994 meeting in Windsor, Canada. The authors thank the participants for their helpful comments. Comments from G. Fisher, T. Kariya and J. MacKinnon on the present version are also gratefully acknowledged. The research of the first, second and fourth authors has been financially supported in part by SSHRC (Canada). The first author acknowledges the support of NSERC (Canada), while the third author acknowledges the support of NSF (USA). All authors gratefully acknowledge the financial support from the Office of Research, University of Guelph.

1 Introduction

There has been extensive research in econometrics over the years to develop methods that allow nonlinear effects to enter the regression function. In fact, much recent research concentrates on allowing general nonlinearities that are handled as undesirable or nuisance components in the context of partially linear regression models. It is well known that these nonlinear components are essential for consistent estimation of the parameters of interest; otherwise, the regression function would be misspecified.

In this paper we consider a class of nonlinear models for which inference does not fit in the standard classical testing environment of the trinity of tests: likelihood ratio (LR), Lagrange multiplier (LM) and Wald (W) tests. In the present type of testing problem, nuisance parameters are present under the alternative hypothesis but not under the null. Hence, the null asymptotic distribution of the classical tests is nonstandard and the usual optimality properties of the LR, LM and W tests may no longer apply. Davies (1977, 1987) proposes a method to handle such a problem using the LR test, for which he obtains the asymptotic distribution under the null. Hansen (1991) extends Davies' results to LM type tests. Recently, Andrews and Ploberger (1994) suggest alternative asymptotic tests that retain the standard asymptotic optimality properties of the classical tests with well defined asymptotic distributions. In all cases, however, the above methods use asymptotic theory to calculate the required probabilities and therefore are not exact in finite samples.

In this paper we present an alternative approach, due to Hotelling (1939), to obtain an exact test of the null hypothesis described above.
The statistical methodology uses the LR test; the novelty, however, comes from the fact that results from differential geometry are used to calculate the volumes of certain tubular neighborhoods, and this in turn allows us to obtain exact size calculations even in the presence of nuisance parameters under the alternative hypothesis. The contribution to econometrics therefore is that we can make finite sample statements on whether a nonlinear component is significant irrespective of the (nuisance) parameters that make up part of this nonlinear component. Indeed, the method is general enough to be applied in many different econometric contexts. Kim et al (1995) use this method to test two linear nonnested regression models.

We now provide a summary of the paper. In the next section we set up the problem in a general context. We then define the test statistic based on Hotelling's (1939) procedure, which is just the LR statistic. It turns out that the rejection region can be described geometrically as a tubular neighborhood of some subset of the unit sphere in some dimension. Consequently, the size of the test is the normalized volume content of this tubular neighborhood. Section 3 focuses on calculating the volume of these tubular neighborhoods. We review the available techniques which have appeared in the mathematical and statistical literature. We start with tubular neighborhoods of curves, which Hotelling (1939) considers, followed by the extension to tubular neighborhoods of submanifolds without boundary analyzed by Weyl (1939). Knowles and Siegmund (1989) calculate volumes of tubular neighborhoods of submanifolds with boundary. Inequalities of various sorts are available and are also discussed; see Naiman (1986), Johnstone and Siegmund (1989), Naiman (1990), Knowles, Siegmund and Zhang (1991) and Siegmund and Zhang (1993). Section 4 discusses the relevance to econometrics by examining different environments where the above methodology applies. Some concluding remarks as well as further extensions are discussed in Section 5. We also include some technical details in an Appendix.

2 Nonlinear Models and the Likelihood Ratio Test

Consider the following nonlinear regression model:

\[ y = X\beta + \lambda f(Z;\gamma) + v, \tag{1} \]

where $y$ is the $n \times 1$ vector of observations on the dependent variable; $X$ and $Z$ are the $n \times k_0$ and $n \times k_1$ observation matrices of exogenous variables, respectively. Furthermore, $\beta$ and $\gamma$ are $k_0 \times 1$ and $k_1 \times 1$ vectors of unknown parameters respectively, $\lambda$ is an unknown scalar, and $f : R^{n \times k_1} \times R^{k_1} \to R^n$ is assumed known. We will assume that $v \sim N(0, \sigma^2 I_n)$, hence the unknown parameters of (1) are $(\beta, \gamma, \lambda, \sigma^2)$. Our interest is in testing the hypothesis

\[ H_0 : \lambda = 0 \quad \text{versus} \quad H_1 : \lambda \neq 0. \tag{2} \]

In general, because $\gamma$ is not identifiable when $\lambda = 0$, the Wald statistic does not provide an adequate approximation to the log likelihood ratio statistic, hence it will not necessarily have an asymptotic chi-square distribution. Our approach, or to be more precise Hotelling's approach, attempts to deal with the LR statistic for (2) directly.

2.1 The Canonical Form

In order to facilitate the forthcoming discussion, let us now put the model (1) into canonical form in two steps. Let $\mathcal{X}$ denote the linear subspace of $R^n$ spanned by the columns of $X$. The first step is to remove the leading term on the right hand side of (1) by projecting onto the orthogonal complement of $\mathcal{X}$. Thus define $\tilde y = y - P_X y$, $\tilde f = f - P_X f$ and $\tilde v = v - P_X v$, where $P_X = X(X'X)^{-1}X'$. With this modification, (1) becomes

\[ \tilde y = \lambda \tilde f(Z;\gamma) + \tilde v, \tag{3} \]

where $\tilde v$ is distributed as $N(0, \sigma^2[I_n - P_X])$.

At this point we have to make some assumptions on the allowable functions $f$. Because $\lambda = 0$ means that $E(y) \in \mathcal{X}$, while under the alternative an additional term $\lambda f(Z;\gamma)$ is present, it is necessary to assume that $f(Z;\gamma) \notin \mathcal{X}$ for some $Z, \gamma$. This amounts to having $\tilde f(Z;\gamma) \neq 0$ for some $Z, \gamma$, and in general $\tilde f(Z;\gamma) \in \mathcal{X}^{\perp}$ for all $Z, \gamma$, where for a vector subspace $W \subset R^d$, for some $d > 0$, $W^{\perp}$ denotes the orthogonal complement of $W$ inside $R^d$. Furthermore, for parameter identifiability, it will be assumed that the class of allowable functions $f$ separates parameter points in that, for each fixed $Z$, $f(Z;\gamma) \neq f(Z;\gamma')$ whenever $\gamma \neq \gamma'$ for $\gamma, \gamma' \in R^{k_1}$. Note that these assumptions are satisfied in the linear case when $f(Z;\gamma) = Z\gamma$ and $X$ and $Z$ are nonnested.

The second and final step in putting (1) into canonical form is to note that because $[I_n - P_X]$ is idempotent, nonnegative definite and symmetric, we can write

\[ I_n - P_X = O D O', \]

where $O$ is an $n \times n$ orthogonal matrix and $D$ is a diagonal matrix with $n - k_0$ ones down the main diagonal and zeros elsewhere, i.e., $D = \mathrm{diag}\{1, \ldots, 1, 0, \ldots, 0\}$. Thus define $y^* = DO'\tilde y$, $f^* = DO'\tilde f$ and $v^* = DO'\tilde v$. Then canonically, (3) is equivalent to

\[ y^* = \lambda f^*(Z;\gamma) + v^*, \tag{4} \]

where, after dropping the $k_0$ zero coordinates, $y^*$ is an $m \times 1$ vector, $f^* : R^{n \times k_1} \times R^{k_1} \to R^m$ and $v^*$ is distributed $N(0, \sigma^2 I_m)$ with $m = n - k_0$. The hypothesis remains as (2), and when $f(Z;\gamma) = Z\gamma$, $f^*(Z;\gamma) = Z^*\gamma$ with $Z^* = DO'\tilde Z$ and $\tilde Z = Z - P_X Z$. We note that we do not assume $f$ to be differentiable.

2.2 The Likelihood Ratio Statistic

Define

\[ H(\gamma) = \frac{\|y^* - \hat\lambda f^*(Z;\gamma)\|^2}{\|y^*\|^2} = 1 - \frac{|\langle y^*, f^*(Z;\gamma)\rangle|^2}{\|f^*(Z;\gamma)\|^2 \, \|y^*\|^2}, \]

where $\hat\lambda = \langle y^*, f^*(Z;\gamma)\rangle \, \|f^*(Z;\gamma)\|^{-2}$ is the OLS estimator of $\lambda$ (for a given value of $\gamma$ and known $f^*$ in (4)). Here, $\langle \cdot, \cdot \rangle$ denotes the usual inner product with the norm $\|\cdot\|$ defined in the usual way. Define the test statistic for this problem to be

\[ H^* = \inf_{\gamma} H(\gamma), \tag{5} \]

and reject $H_0$ for small values of $H^*$. We note that $H^*$ is nothing more than (a monotone function of) the likelihood ratio statistic for the test (2). The size $\alpha > 0$ of the test can be determined by choosing $0 < c < 1$ so that

\[ \alpha = P\{H^* < 1 - c^2\} = P\Big\{\sup_{\gamma} |\langle u, \omega \rangle|^2 > c^2\Big\}, \tag{6} \]

where $u = y^*/\|y^*\|$ and $\omega = f^*(Z;\gamma)/\|f^*(Z;\gamma)\|$. The rejection region is

\[ D(\Omega; c) = \Big\{ u \in S^{m-1} : \sup_{\omega \in \Omega} |\langle \omega, u \rangle| > c \Big\} \subset S^{m-1}, \tag{7} \]

where $\Omega \subset S^{m-1}$ is the range of values that $\omega$ can take and $S^{m-1} \subset R^m$ is the $(m-1)$-dimensional unit sphere embedded in $m$-dimensional Euclidean space. The size $\alpha > 0$ for testing $H_0$ using (5) is therefore determined by the probability under $H_0$ of the set $\{u \in D(\Omega; c)\}$. Now under the normality assumption and $H_0$, $u = y^*/\|y^*\|$ is uniformly distributed over $S^{m-1}$. Consequently, under $H_0$,

\[ \alpha = P\{u \in D(\Omega; c)\} = \frac{1}{|S^{m-1}|} \int_{D(\Omega; c)} du, \]

where $|S^{m-1}|$ denotes the (surface) volume of $S^{m-1}$ (see (8)) and $du$ denotes the uniform (surface) measure on $S^{m-1}$.
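As a rough illustration of (5), the following sketch approximates $H^*$ by minimizing $H(\gamma)$ over a finite grid. It works with the residuals $\tilde y$ and $\tilde f$ rather than $y^*$ and $f^*$; this relies on the fact that $DO'$ preserves inner products on the orthogonal complement of the columns of $X$. The function names, the grid, and the simulated data (a model with regression function $x_1 + x_2 + 0.7 z^{\gamma}$) are illustrative assumptions, not part of the original procedure.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(v, basis):
    """Return v minus its projection onto span(basis); basis must be orthonormal."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def orthonormal_basis(columns):
    """Gram-Schmidt on a list of column vectors (assumed linearly independent)."""
    basis = []
    for col in columns:
        r = residualize(col, basis)
        norm = math.sqrt(dot(r, r))
        basis.append([ri / norm for ri in r])
    return basis

def h_gamma(y_t, f_t):
    """H(gamma) = 1 - <y~, f~>^2 / (|f~|^2 |y~|^2) for residualized y~ and f~."""
    return 1.0 - dot(y_t, f_t) ** 2 / (dot(f_t, f_t) * dot(y_t, y_t))

def h_star(y, x_columns, f, gamma_grid):
    """Approximate H* = inf_gamma H(gamma) by a minimum over a finite grid."""
    basis = orthonormal_basis(x_columns)
    y_t = residualize(y, basis)
    return min(h_gamma(y_t, residualize(f(g), basis)) for g in gamma_grid)

# Hypothetical data: y = x1 + x2 + 0.7 * z**2 + noise (all values are assumptions).
random.seed(0)
n = 50
x1 = [random.uniform(1, 2) for _ in range(n)]
x2 = [random.uniform(1, 2) for _ in range(n)]
z = [random.uniform(1, 2) for _ in range(n)]
y = [a + b + 0.7 * c ** 2 + random.gauss(0, 1) for a, b, c in zip(x1, x2, z)]

def f_power(g):
    """Nonlinear component f(Z; gamma) = z**gamma, evaluated observation by observation."""
    return [zi ** g for zi in z]

grid = [0.5 + 0.05 * i for i in range(71)]  # gamma in [0.5, 4.0], an arbitrary range
stat = h_star(y, [x1, x2], f_power, grid)
```

Small values of `stat` speak against $H_0$; the critical value still has to come from a tube calculation for the curve traced out by $\omega(\gamma)$ over the chosen range.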


2.3 Some Comments

At this point, we would like to make two comments. First, the notion of finding the relative volume of a region within a unit sphere for a similar statistical testing situation was first developed by Hotelling (1939). Consequently, we will refer to (5) as the H-test. [1]

The second comment concerns the uniformity of $u = y^*/\|y^*\|$ on $S^{m-1}$ under the null hypothesis. Although the normality assumption in (1) is sufficient, it can be weakened. Indeed, for $d > 0$, a $d$-dimensional random vector $w$ is said to be spherically symmetric if $w$ and $Ow$ have the same distribution for every $d \times d$ orthogonal matrix $O$. It follows from Lemma 1.6 of Kariya and Sinha (1989), page 8, that $w/\|w\|$ is uniformly distributed on $S^{d-1}$, provided that $P\{w = 0\} = 0$. Therefore we only require $v$ to be spherically symmetric with $P\{v = 0\} = 0$ in order to have uniformity of $u = y^*/\|y^*\|$ on $S^{m-1}$ under the null hypothesis. Consequently, robustness properties in the sense of Kariya and Sinha (1989) or Kariya and Kim (1995) should prevail.

[1] As an aside, it is interesting to note that differential geometers have been studying extensions of Hotelling's volume of tubes calculation for years, starting with Weyl (1939). The statistical implications of Hotelling's tubes calculation, however, have only recently caught the attention of statisticians; see Naiman (1986), Johnstone and Siegmund (1989), Knowles and Siegmund (1989), Johansen and Johnstone (1990), Naiman (1990), Knowles, Siegmund and Zhang (1991) and Siegmund and Zhang (1993). To the best of our knowledge, the above methodology, despite Hotelling's involvement, has not hitherto appeared in the econometrics literature.

3 Volume of Tube Calculations

As can be imagined, the relative volume depends on the region $\Omega \subset S^{m-1}$. Clearly, the simpler $\Omega$ is, the easier the volume calculation of (7) will be. Hotelling (1939) calculates volumes for tubular neighborhoods of curves. Weyl (1939) gives calculations for tubular neighborhoods of submanifolds without boundary. More recently in the statistical literature, Knowles and Siegmund (1989) calculate volumes for tubular neighborhoods of 2-dimensional submanifolds with boundary. Inequalities of various sorts have been obtained by Naiman (1986), Johnstone and Siegmund (1989), Naiman (1990), Knowles, Siegmund and Zhang (1991) and Siegmund and Zhang (1993). We will give a brief description of what is known, but before doing so we define two formulas that will be frequently used.

Let $M$ be a $d$-dimensional compact manifold, $d > 0$. We will denote by $|M|$ the $d$-dimensional volume of $M$. If $M$ has a (topological) boundary, we will denote it by $\partial M$. Therefore, $|\partial M|$ will denote the $(d-1)$-dimensional volume of $\partial M$. Let $B^d$ be the $d$-dimensional unit ball in $R^d$, hence $S^{d-1} = \partial B^d$. We note that

\[ |B^d| = \frac{\pi^{d/2}}{\Gamma((d+2)/2)} \quad \text{and} \quad |S^{d-1}| = \frac{2\pi^{d/2}}{\Gamma(d/2)}, \tag{8} \]

where $d > 0$ and $\Gamma(\cdot)$ denotes the gamma function.
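The two volume formulas in (8) are easy to evaluate numerically. The sketch below is only an illustration (the function names are our own); it uses the standard expressions $|B^d| = \pi^{d/2}/\Gamma((d+2)/2)$ and $|S^{d-1}| = 2\pi^{d/2}/\Gamma(d/2)$, which satisfy $|S^{d-1}| = d\,|B^d|$.

```python
import math

def ball_volume(d):
    """|B^d| = pi^(d/2) / Gamma((d + 2) / 2), the volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma((d + 2) / 2)

def sphere_area(d):
    """|S^(d-1)| = 2 pi^(d/2) / Gamma(d / 2), the surface volume of the boundary of B^d."""
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)

# The ratio |B^(m-2)| / |S^(m-1)| equals 1 / (2 pi) for every m >= 3; this is the
# normalizing constant that turns a tube volume for a curve into a probability.
ratio = ball_volume(3) / sphere_area(5)
```

For instance, `ball_volume(2)` gives the area of the unit disc and `sphere_area(3)` the surface area of the ordinary unit sphere.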

3.1 Curves

Denote an interval by $I = [a, b] \subset R$. Let $\omega : I \to S^{m-1}$ be a continuous map, otherwise known as a curve. If a curve is continuously differentiable with nowhere vanishing derivative, it is called regular. We will say that a curve is closed if $\omega(a) = \omega(b)$, and with the above notation $|\omega|$ will denote its length. In this case, $\Omega = \{\omega(t) : t \in I\}$ and

\[ D_0(\Omega; \theta) = \Big\{ u \in S^{m-1} : \sup_{t} \langle u, \omega(t) \rangle > \cos\theta \Big\} \tag{9} \]

for $0 < \theta < \pi/2$. Hotelling (1939) showed that if $\omega$ is a regular closed curve in $S^{m-1}$ with no self intersection except at the end points of $I$, and if $\theta$ is sufficiently small, then

\[ \int_{D_0(\Omega; \theta)} du = |B^{m-2}| \, |\omega| \sin^{m-2}\theta. \tag{10} \]

In the case where $\omega$ is not closed and has no self intersection, a minor modification is needed; indeed, (10) becomes

\[ \int_{D_0(\Omega; \theta)} du = |B^{m-2}| \, |\omega| \sin^{m-2}\theta + |S^{m-2}| \int_{\cos\theta}^{1} (1 - z^2)^{(m-3)/2} \, dz. \tag{11} \]

Consequently, (11) leads to the probability statement

\[ P\Big\{ \sup_{\omega} \langle \omega, u \rangle \geq \cos\theta \Big\} = \frac{|\omega|}{2\pi} \sin^{m-2}\theta + \frac{1}{2} P\big\{ B(1/2, (m-1)/2) > \cos^2\theta \big\}, \tag{12} \]

where $B(q_1, q_2)$ denotes a Beta random variable with parameters $q_1$ and $q_2$. In the case where $\omega$ is a closed curve, the second term in (12) disappears.

We will now clarify what is meant by sufficiently small $\theta$. Although we assume that the curve $\omega$ has no self intersection, two interior points on the curve can come very close. If the size of the tubular neighborhood is taken too large, it is possible that the tubular neighborhood around $\omega$ could turn inside itself. However, by continuity and no self intersection, the self intersection of the tubular neighborhood can be removed by sufficiently shrinking the size of the tubular neighborhood. It is precisely at this point that (12) becomes exact. This is what is meant by sufficiently small $\theta$ here and below.
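The right-hand side of (12) can be evaluated with elementary tools. The sketch below is a minimal illustration under our own naming; it replaces the Beta tail by the equivalent spherical cap probability, $\frac{1}{2}P\{B(1/2,(m-1)/2) > \cos^2\theta\} = P\{u_1 > \cos\theta\}$ for $0 < \theta < \pi/2$, and integrates the density of a single coordinate of $u$ by Simpson's rule. It assumes $m \geq 3$.

```python
import math

def cap_probability(m, theta, steps=2000):
    """P{<w, u> > cos(theta)} for fixed unit w and u uniform on S^(m-1), m >= 3.

    Integrates the density of one coordinate of u, proportional to
    (1 - z^2)^((m - 3) / 2), over [cos(theta), 1] by Simpson's rule.
    """
    const = math.gamma(m / 2) / (math.sqrt(math.pi) * math.gamma((m - 1) / 2))
    a, b = math.cos(theta), 1.0
    h = (b - a) / steps  # steps must be even for Simpson's rule

    def f(z):
        return max(0.0, 1.0 - z * z) ** ((m - 3) / 2)

    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return const * total * h / 3

def tube_probability(length, m, theta, closed=False):
    """Right-hand side of (12) for a curve of the given length on S^(m-1)."""
    body = length * math.sin(theta) ** (m - 2) / (2 * math.pi)
    ends = 0.0 if closed else cap_probability(m, theta)
    return body + ends
```

For a great circle (length $2\pi$, closed) the value reduces to $\sin^{m-2}\theta$. For curves that are only piecewise regular, or for $\theta$ that is not small, the same number should be read as an upper bound rather than an exact probability.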

3.2 Submanifolds without Boundary

The above calculations give us volumes of tubular neighborhoods around curves on $S^{m-1}$. Weyl (1939) obtains calculations for volumes of tubular neighborhoods around manifolds (without boundary) embedded in $S^{m-1}$, or submanifolds of $S^{m-1}$. Indeed, Weyl generalizes Hotelling's result in the sense that the formula (10) for closed curves reduces to a special case. If $\Omega$ is an $r$-dimensional submanifold (without boundary) embedded in $S^{m-1}$, then Weyl (1939) shows, for sufficiently small $\theta$, in the same sense as Hotelling, that

\[ \int_{D_0(\Omega; \theta)} du = |S^{m-r-2}| \sum_{\substack{0 \leq j \leq r \\ j \ \mathrm{even}}} k_j \, \frac{\int_0^{\theta} (\sin t)^{m+j-r-2} (\cos t)^{r-j} \, dt}{(m-r-1)(m-r+1) \cdots (m-r+j-3)}, \tag{13} \]

where the $k_j$'s are certain integral invariants of $\Omega$ independent of the embedding, with $k_0 = |\Omega|$, and the empty product for $j = 0$ is taken to be one. [2]

3.3 Submanifolds with Boundary

If $\Omega$ is a submanifold of $S^{m-1}$ with boundary, then certain technical problems arise. Nevertheless, for certain statistical applications such regions are precisely what is needed. Knowles and Siegmund (1989) provide a tubes formula when $\Omega$ is a 2-dimensional submanifold with boundary $\partial\Omega$. Indeed, as long as $\partial\Omega$ is well behaved (see Theorem 1 of Knowles and Siegmund (1989) for the precise meaning), then for sufficiently small $\theta$, in the sense of Hotelling,

\[ \int_{D_0(\Omega; \theta)} du = \frac{|S^{m-4}|}{m-3} \Big\{ |\Omega| \cos\theta \, (\sin\theta)^{m-3} + 2\pi \chi(\Omega) \int_0^{\theta} (\sin t)^{m-2} \, dt \Big\} + \frac{|S^{m-3}|}{2(m-2)} \, |\partial\Omega| \, (\sin\theta)^{m-2}, \tag{14} \]

where $\chi(\Omega)$ refers to the Euler characteristic of $\Omega$. [3]

[2] It is of interest to note that Hotelling initiated this research to address a statistical problem. Weyl made the above generalization purely for mathematical interest and in so doing initiated a line of research in differential geometry; see for example Gray (1982).

[3] The latter is a topological invariant used to classify manifolds; see for example Guillemin and Pollack (1974). The method by which (14) is calculated is through the use of the Gauss-Bonnet Theorem; see also Guillemin and Pollack (1974).

3.4 Inequalities

The above represent exact calculations. In most instances, however, submanifolds can be complicated. Nevertheless, some inequalities are available which are useful in obtaining conservative probability calculations in finite samples.

In the case of curves, Naiman (1986) shows that when $\omega$ is a piecewise regular curve of finite length in $S^{m-1}$, then (12) holds as an inequality,

\[ P\Big\{ \sup_{\omega} \langle \omega, u \rangle > \cos\theta \Big\} \leq \frac{|\omega|}{2\pi} \sin^{m-2}\theta + \frac{1}{2} P\big\{ B(1/2, (m-1)/2) > \cos^2\theta \big\}. \tag{15} \]

Thus the above represents an upper bound on the volume of the tubular neighborhood. In addition, Naiman (1990) obtains upper bound calculations for spherical polyhedra embedded in $S^{m-1}$.

Another technique for obtaining upper bounds is to notice that

\[ \zeta(\gamma) = \langle \omega(\gamma), u \rangle \tag{16} \]

is a random process if $\gamma$ is real valued and a random field if $\gamma$ is vector valued. One can then use the upcrossings of $\zeta(\gamma)$ to get an upper bound on the volume content $P(\sup_{\gamma} \zeta(\gamma) > c)$ for some $c > 0$. This is in fact the technique used in Johnstone and Siegmund (1989), Knowles, Siegmund and Zhang (1991) and Siegmund and Zhang (1993). We note that Davies (1977, 1987) uses upcrossing theory in his work, although he does not formulate the problem as a volume of tubes calculation.

4 Econometric Applications

It should now be clear that volume of tube calculations can provide finite sample probability statements for testing (2) using the LR statistic with respect to the model (1). We now provide some examples of how they can be used in an econometric environment.

4.1 A Textbook Case

As a warmup let us provide a small numerical example. A version of the following model is the "textbook case" used in econometrics to illustrate issues of estimation and inference that arise in nonlinear regression; see Davidson and MacKinnon (1993). Indeed, let

\[ y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \lambda z_t^{\gamma} + u_t, \tag{17} \]

where $x_1$, $x_2$ and $z$ are independently and identically distributed as uniform variates on the interval $[1, 2]$ and $u_t$ is standard normal. For testing $\lambda = 0$, we choose the parameter values to be $\beta_1 = \beta_2 = 1$ and $\gamma = 2$. The parameter $\lambda$ is set to take the values 0.3, 0.5 and 0.7.

In order to test the null hypothesis, one typically uses a Wald type asymptotic t-ratio. We estimate the model by a nonlinear least squares (NLS) algorithm based on the Gauss-Newton method. The t-statistic is formed as the ratio of the estimate of $\lambda$ to its estimated standard error. Table 1 presents Monte Carlo evidence on the proportion of rejections of $H_0$ when in fact the data generating process is given by (17) with the above chosen parameter values.

Table 1: Asymptotic t-ratio from NLS

$\lambda$    n = 50    n = 100    n = 200    n = 400
0.3          0.02      0.06       0.13       0.37
0.5          0.07      0.12       0.27       0.73
0.7          0.09      0.20       0.36       0.89

We also present evidence on the performance of the t-statistic of a linear version of (17) where $\gamma$ is set to its true value of 2; see Table 2. In this benchmark case, the t-statistic is simply the ordinary least squares (OLS) t-ratio obtained from a linear regression of $y$ on $x_1$, $x_2$ and $z^2$.

Table 2: t-ratio for OLS

$\lambda$    n = 50    n = 100    n = 200    n = 400
0.3          0.51      0.89       1.00       1.00
0.5          0.91      1.00       1.00       1.00
0.7          1.00      1.00       1.00       1.00

As can be seen from Table 1, for small and moderate sample sizes the power of the asymptotic t-statistic is quite low. In the case where $\lambda = 0.3$, at the 5% level, the proportion of rejections is 0.37 even for a sample size of 400. For the case of $\lambda = 0.7$, the proportion of rejections increases to 0.89 for the same sample size of 400. The results suggest that the asymptotic t-test cannot easily discriminate between models that are close to the null hypothesis, even in moderate to large samples.

On the other hand, the results based on the method of tubes are remarkably better; see Table 3. Even with a sample size of 25, for the case where $\lambda = 0.3$, the proportion of rejections is 0.43, whereas for the case of $\lambda = 0.7$ this proportion becomes 0.97. In fact, as is to be expected, the results from the method of tubes are very similar to the results from the benchmark where $\gamma$ is assumed to be known. The number of replications in the experiments is 5000.

Table 3: Method of Tubes

n     $\lambda$ = 0.3    $\lambda$ = 0.5    $\lambda$ = 0.7
25    0.43               0.65               0.97
50    0.56               0.94               1.00
75    0.73               0.99               1.00

4.2 Linearity with a Change Point

In some econometric data, one notices patterns only slightly different from ellipsoidal ones. Although a linear model may be a simple strategy, one may consider adding a change point parameterization. In the simplest of terms, we could think of $X$ and $Z$ as real variables and of (1) as

\[ y_i = \beta_0 + \beta_1 x_i + \lambda (x_i - \gamma)_+ + v_i, \]

where $v_i \sim N(0, \sigma^2)$ for $i = 1, \ldots, n$. From this information, we can compute the $H^*$ statistic. For simplicity, assume $x \in [0, 1]$ and let $x_j = j/n$ for $j = 1, \ldots, n$. If we restrict $\gamma$ to a range $A \leq \gamma \leq B$, then $\omega(\gamma) = f^*(\gamma)/\|f^*(\gamma)\|$ is a curve on $S^{n-3}$. Because $f(x; \gamma) = (x - \gamma)_+$ is nondifferentiable at $x = \gamma$, the curve is not regular; it is, however, piecewise regular and therefore we can use the inequality (15). In order to use this inequality, one needs to know $|\omega|$. It is shown in Knowles and Siegmund (1989) that

\[ |\omega| \to (\sqrt{3}/2) \log[B(1 - A)/A(1 - B)] \]

as $n \to \infty$. [4] Alternatively, we can numerically calculate $|\omega|$ for small samples. We note that because the numerical calculation is done over a compact set, the estimates can be quite reliable. Consequently, one could use the asymptotic value in large samples, or the numerical value in small samples, to get a bound on the size. [5]

[4] This is a short summary of Knowles and Siegmund (1989). Details as to how the length is calculated are contained in their Section 4. We should point out, though, that some of their numerical findings indicate that the upper bound is very close to the exact value for certain parameter values even in small samples.

[5] A numerical program has been written to calculate the tubes formula and is available from the authors upon request.
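To make the numerical calculation of $|\omega|$ for the change point model concrete, the following sketch approximates the length of $\omega(\gamma)$ by summing chord lengths over a fine grid of $\gamma$ values and compares it with the asymptotic value $(\sqrt{3}/2)\log[B(1-A)/A(1-B)]$. It is not the authors' program; the function names, grid size and sample size are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit_residual(f, basis):
    """Project f off the orthonormal vectors in basis, then scale to unit length."""
    r = list(f)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    norm = math.sqrt(dot(r, r))
    return [ri / norm for ri in r]

def change_point_curve_length(n, A, B, grid_points=401):
    """Chord-sum approximation to |omega| for f(x; gamma) = (x - gamma)_+ on x_j = j/n."""
    x = [j / n for j in range(1, n + 1)]
    # Orthonormal basis for span{1, x}: the constant, then the centred and scaled x.
    q0 = [1 / math.sqrt(n)] * n
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]
    sx = math.sqrt(dot(xc, xc))
    q1 = [v / sx for v in xc]
    length, prev = 0.0, None
    for i in range(grid_points):
        g = A + (B - A) * i / (grid_points - 1)
        w = unit_residual([max(xi - g, 0.0) for xi in x], [q0, q1])
        if prev is not None:
            length += math.sqrt(sum((a - b) ** 2 for a, b in zip(w, prev)))
        prev = w
    return length

def asymptotic_length(A, B):
    """Limiting length (sqrt(3)/2) log[B(1 - A) / (A(1 - B))] as n grows."""
    return math.sqrt(3) / 2 * math.log(B * (1 - A) / (A * (1 - B)))

numeric = change_point_curve_length(400, 0.2, 0.8)
limit = asymptotic_length(0.2, 0.8)
```

Either `numeric` or `limit` can then be used as the length $|\omega|$ in the bound (15), depending on the sample size.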

4.3 The Linear Case

In the linear case, $\omega = Z^*\gamma \, \|Z^*\gamma\|^{-1}$. Geometrically, the latter is the radial projection of $Z^*\gamma$ onto $S^{m-1}$. Indeed, if $\mathcal{Z}$ denotes the column space of $Z^*$, whose rank is $k$ with $0 < k \leq k_1$, then the set of values of $\omega$ is the intersection of $S^{m-1}$ with $\mathcal{Z}$, the $k$-dimensional subspace of $R^m$. Therefore, it is the embedding of $S^{k-1}$ into $S^{m-1}$ for $k < m$. This framework has been used by Kim et al (1995) to test two linear nonnested regression models.

There are many ways to perform the embedding, which clearly depends on $\mathcal{Z}$; hence the parameterization of $\omega$ may not appear obvious. However, a very useful geometric property of spheres in various dimensions is their symmetry. We will exploit these symmetries to help us with the parameterization. Let $w_1, \ldots, w_k \in S^{m-1}$ denote an orthonormal basis for $\mathcal{Z}$. This can be obtained by performing the Gram-Schmidt orthogonalization process on the columns of $Z^*$. Thus with this basis,

\[ \omega = \frac{Z^*\gamma}{\|Z^*\gamma\|} = \xi_1 w_1 + \cdots + \xi_k w_k, \]

where $\xi = (\xi_1, \ldots, \xi_k)' \in S^{k-1}$. This leads to the following calculation:

\[ \langle \omega, u \rangle = \sum_{j=1}^{k} \xi_j \langle w_j, u \rangle. \tag{18} \]

For the case $k = 1$, $S^0 = \{-1, 1\}$; therefore, taking the maximum of the inner product involves looking at the region

\[ \max\{\langle w, u \rangle, -\langle w, u \rangle\} = |\langle w, u \rangle| \geq c. \]

One can apply (12) directly using the second term and multiplying by 2. Thus in this case, the size is

\[ P\big\{ B(1/2, (m-1)/2) \geq c^2 \big\}. \]

For the case $k = 2$, one has

\[ \sup_{t} |\langle w_1, u \rangle \sin(2\pi t) + \langle w_2, u \rangle \cos(2\pi t)|^2 \geq c^2 \]

for $t \in [0, 1]$. We note that the curve is a great circle, hence is closed, so that a direct calculation of the tubular neighborhood using (12) gives

\[ P\big\{ B(1, (m-2)/2) \geq c^2 \big\}. \]

For general $k \geq 3$, one can use Weyl's formula. It turns out, however, that because the embedding is geometrically simple, one can do the calculation directly; it is presented in the Appendix. Indeed, for $k \geq 1$, the size is

\[ P\big\{ B(k/2, (m-k)/2) \geq c^2 \big\}. \tag{19} \]

Using the relationship between the Beta and F distributions, one can also determine the critical values of the $H^*$ statistic using the critical values of the F statistic. Indeed, one has

\[ \frac{(p/q) F_{p,q}}{1 + (p/q) F_{p,q}} \sim B(p/2, q/2), \tag{20} \]

where $F_{p,q}$ denotes a random variable whose distribution is an F distribution with $p$ and $q$ degrees of freedom. Thus an equivalent way of determining $c^2$ is

\[ c^2 = \frac{1}{1 + (m-k) \big/ \big(k F_{k,m-k;1-\alpha}\big)}, \tag{21} \]

where for a given $0 < \alpha < 1$, $P\{F_{k,m-k} \leq F_{k,m-k;\alpha}\} = \alpha$.

The power of the H-test in the linear case can be calculated. Indeed, from (20) we have that under the alternative with $\lambda \neq 0$ and $\gamma \neq 0$,

\[ P\Big\{ \sup_{\omega} |\langle \omega, u \rangle|^2 \geq c^2 \Big\} = P\Big\{ F_{k,m-k}(\delta) \geq \frac{c^2}{1 - c^2} \, \frac{m-k}{k} \Big\}, \tag{22} \]

where $F_{k,m-k}(\delta)$ denotes a noncentral F distribution with $k$ and $m - k$ degrees of freedom and noncentrality parameter $\delta$, which is determined by

\[ \delta^2 = \lambda^2 \|Z^*\gamma\|^2 / \sigma^2. \tag{23} \]
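The correspondence between $c^2$ and the F critical value in (21) and (22) can be checked in a case where everything is available in closed form. The sketch below is only an illustration with hypothetical function names; it treats $k = 2$, where $P\{B(1,(m-2)/2) \geq c^2\} = (1-c^2)^{(m-2)/2}$ and $P\{F_{2,q} > x\} = (1 + 2x/q)^{-q/2}$. For general $k$ the F critical value would have to come from tables or a statistical library.

```python
def c2_from_f_critical(f_crit, k, m):
    """Equation (21): c^2 = 1 / (1 + (m - k) / (k * F_{k, m-k; 1-alpha}))."""
    return 1.0 / (1.0 + (m - k) / (k * f_crit))

def f_threshold_from_c2(c2, k, m):
    """Threshold in (22): the F statistic is compared with c^2 (m - k) / ((1 - c^2) k)."""
    return c2 * (m - k) / ((1.0 - c2) * k)

def f_upper_quantile_k2(alpha, q):
    """Upper-alpha point of F_{2, q}, from P{F_{2,q} > x} = (1 + 2x/q)^(-q/2)."""
    return (q / 2.0) * (alpha ** (-2.0 / q) - 1.0)

def c2_k2(alpha, m):
    """Closed form for k = 2: solve (1 - c^2)^((m - 2)/2) = alpha for c^2."""
    return 1.0 - alpha ** (2.0 / (m - 2))

# Example with m = 12 and alpha = 0.05 (values chosen only for illustration).
m, k, alpha = 12, 2, 0.05
f_crit = f_upper_quantile_k2(alpha, m - k)
c2 = c2_from_f_critical(f_crit, k, m)
```

Here `c2` agrees with `c2_k2(alpha, m)`, and mapping it back through `f_threshold_from_c2` recovers `f_crit`, so rejecting for $H^* < 1 - c^2$ is the same as rejecting for a large F statistic.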

4.4 Nonnested Testing via Artificial Nesting

Consider the following two nonnested regression models:

\[ y = g(X; \beta) + v_0, \tag{24} \]

and

\[ y = f(Z; \gamma) + v_1, \tag{25} \]

where $y$ is the $n \times 1$ vector of observations on the dependent variable; $X$ and $Z$ are the $n \times k_0$ and $n \times k_1$ observation matrices of exogenous variables, respectively. Furthermore, $\beta$ and $\gamma$ are $k_0 \times 1$ and $k_1 \times 1$ vectors of unknown parameters respectively, and $g : R^{n \times k_0} \times R^{k_0} \to R^n$, $f : R^{n \times k_1} \times R^{k_1} \to R^n$ are assumed to be known functions. Note that when $g(X; \beta) = X\beta$ and $f(Z; \gamma) = Z\gamma$, (24) and (25) are just linear models; see Kim et al (1995). Assume that $X$ and $Z$ are nonrandom, for otherwise the ensuing discussion will be conditional on these exogenous variables. Let $v_0$ and $v_1$ be the disturbances, assumed to be distributed independently according to multivariate normal distributions $N(0, \sigma_0^2 I_n)$ and $N(0, \sigma_1^2 I_n)$ respectively. We note that $\sigma_0^2, \sigma_1^2 > 0$ are unknown and $I_p$ denotes the $p \times p$ identity matrix.

If we exponentially weight the likelihoods, as suggested by Cox (1961, 1962), then one can show, as in Fisher and McAleer (1981), that the joint likelihood can be written as a combination of the two underlying likelihoods. At this point, if no assumption is made with regard to $\sigma_0^2$ and $\sigma_1^2$, then the issue of distinguishing between the testing and nesting parameter comes to the fore. [6] Consequently, we will assume $\sigma_0^2 = \sigma_1^2 = \sigma^2$ so that our interest is only in the conditional means. We will also assume for the time being that $g(X; \beta) = X\beta$, although we will later make some comments about the general case. Under these assumptions, we can now write (24) and (25) as a compound model

\[ y = (1 - \lambda) X\beta + \lambda f(Z; \gamma) + v, \tag{26} \]

where $v \sim N(0, \sigma^2 I_n)$. The statistical test is therefore (2).

The upshot of (20), for the case where $f(Z; \gamma) = Z\gamma$, is that the $H^*$ statistic is distributed as an F distribution under the null hypothesis; see Kim et al (1995). Consequently, in terms of size, the H-test is comparable to the JA-test. We do note, however, that the H-test has the added advantage that orthogonality between $X$ and $Z$ is not of any concern. In fact the H-test is the encompassing F-test; see Mizon and Richard (1986). We note that even though the JA-test is distributed exactly as a $t_{n-k_0-1}$ distribution under the null when $X$ and $Z$ are not orthogonal, under the alternative hypothesis it is not distributed as a noncentral F distribution as is the H-test. In fact, little is known about its distribution; see Milliken and Graybill (1970).

In the linear case, because of the nonidentifiability of $\lambda$ from $\gamma$, testing $\lambda = 0$ and testing $\gamma = 0$ are indistinguishable. Note that in the former we are testing a scalar parameter, while in the latter we are testing a vector parameter. If identifiability between $\lambda$ and $\gamma$ is a non issue, and there seems to be no reason why it should be an issue, then the above discussion provides a justification for the use of the encompassing F-test, or equivalently the H-test, to test for nonnested regressions. Indeed, if admissibility is at stake, then the usual F-statistic is known to be admissible. This can be deduced from Stein (1956). Furthermore, one can generalize this procedure to incorporate a larger class of distributions on the errors; see Kelker (1970) and Kariya and Sinha (1989). [7] Consequently, under the usual loss for tests and at a fixed significance level, no other test can dominate the H-test in terms of power in all parts of the alternative parameter space, although another test may have higher power in particular regions of the alternative space. We also note that the H-test is the uniformly most powerful invariant test; see Kariya and Sinha (1989) and Kariya and Kim (1995). We expect that these results would carry over to the nonlinear case because the H-test is the likelihood ratio test and the latter is known to have good power properties in general; see Birnbaum (1955) and Matthes and Truax (1967).

The above discussion regarding the size and power properties of the H-test lends support to the use of the F-statistic to test two linear nonnested models; see Kim et al (1995).

[6] See Fisher and McAleer (1981), page 106, for a detailed account of the difference when $\sigma_0^2 \neq \sigma_1^2$.

5 Conclusions A good expository paper illustrating the use of tubes for data analysis as well as discussing other statistical applications for example in bootstrapping, gait analysis, projection pursuit regression and more, can be found in Johansen and Johnstone (1990). Hotelling (1939) originally derived the tubes test for testing the presence of a harmonic component. We see however, that his ideas can be adapted for testing between two nonnested models. In testing between two econometric models, we show how the method of tubes can play a vital role. Clearly, this methodology is very versatile with respect to modelling in general and can therefore be applied to other areas of econometric research. In particular, the tubes method can be used to develop a more accurate test of the signi cance of nonlinear components in mixed linear and nonlinear models. In fact, it seems possible that we can substitute nonparametric for the nonlinear component in such a model. Thus the tubes method would allow for testing the added bene t of the nonparametric component in a semiparametric speci cation. Alternatively, this can be addressed in the context 7 Theorem 11 of Kelker (1970) can be restated in the following way. Let X = (X10 ; X20 )0 be an (l1 + l2 )  1 spherically symmetric random vector where Xj is an lj  1 random vector for j2 = 1; 2. Suppose the mean of X is (; 0) 2 Rl1  Rl2 while the covariance matrix is 2 Il1 +l2 . Then kXkX1 ?2 k2k=l=l2 1  Fl1 ;l2 . We emphasize

that this is an exact calculation.

15

of this paper as the situation in which the null speci cation remains linear, while the alternative is nonparametric. Indeed, these methods are currently under investigation by the authors. We would also like to point out that although v is assumed to be normally distributed, such is not necessary for the results of the paper to be true. In particular, all that is necessary is for v to be spherically symmetric. Clearly the normal density satis es this condition. As pointed out in the paper, this property would be enough to have u = y=kyk uniformly distributed on S m?1 . Consequently, in the case of a linear alternative, the H-test would still be computable from the Beta distribution. As a nal comment, we would like to address the situation of (1). We have assumed that g(X; ) = X . We can allow for nonlinear functions in X , but in order for the results of this paper to hold, we would need linearity in , ie, for each xed X , g(X; + c) = g(X; ) + cg(X; ) for ;  2 Rk0 and some scalar c. Indeed, this assumption allows one to project o the inital term in (1) to get the model in the form of (4) which then allows us to use the tube calculations. We do note however, that under sucient smoothness on g, we can apply a linear approximation to the problem and use the tube calculations as a limiting case. How well this strategy compares with other available strategies asymptotically, is also under investigation.


A Appendix: Size calculation

We are including this appendix for the benefit of the reader who may not be familiar with integration on manifolds. Such calculations are very rare in the econometric literature; the only other econometric paper that we are aware of is Hillier (1990).

Let $\omega_1, \ldots, \omega_k \in S^{m-1}$ be an orthonormal basis for $Z$. Denote by $O(m)$ the space of $m \times m$ orthogonal matrices, so that $OO' = I_m$ for $O \in O(m)$. Now $O(m)$ is made up of two connected components. Denote by $SO(m)$ the component of $O(m)$ whose elements have determinant 1. Mathematically, $SO(m)$ has a group structure; hence it is known as the special orthogonal group. Multiplication of elements of $S^{m-1}$ by elements of $SO(m)$ is referred to as the group $SO(m)$ acting on $S^{m-1}$ and is denoted by $SO(m) \times S^{m-1} \to S^{m-1}$. We note that this group action is transitive, meaning that for any $x, y \in S^{m-1}$ there exists a $g \in SO(m)$ such that $x = gy$. Let $e_1, \ldots, e_m \in R^m$ denote the canonical basis of $R^m$, so that $e_j$ has entry 1 in the $j$th component and zeros elsewhere for $j = 1, \ldots, m$. We state the following lemma, whose proof is given below.

Lemma A.1 For the orthonormal set $\omega_1, \ldots, \omega_k \in S^{m-1}$, there exists a $g \in SO(m)$ such that $g\omega_1 = e_1, g\omega_2 = e_2, \ldots, g\omega_k = e_k$ for $k = 1, \ldots, m$.

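This lemma can be used to simplify (18). Indeed, let $g \in SO(m)$ be as in Lemma A.1. Then,

$$\langle \omega, u \rangle = \langle g\omega, gu \rangle = \Big\langle \sum_{j=1}^{k} \alpha_j\, g\omega_j,\; gu \Big\rangle = \langle \tilde\alpha, gu \rangle, \qquad (27)$$

where $\tilde\alpha = (\alpha_1, \ldots, \alpha_k, 0, \ldots, 0)' \in S^{m-1}$; in particular, we embed $\alpha \in S^{k-1}$ into the first $k$ coordinates of $S^{m-1}$. Let $v = gu$. By the Cauchy-Schwarz inequality, we have that

$$|\langle \omega, u \rangle|^2 \le \sum_{j=1}^{k} v_j^2 \sum_{j=1}^{k} \alpha_j^2 = \sum_{j=1}^{k} v_j^2.$$

Furthermore, because this bound is attainable, we have

$$\sup_{\omega} |\langle \omega, u \rangle|^2 = \sum_{j=1}^{k} v_j^2. \qquad (28)$$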
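The bound in (28) can be checked numerically. The sketch below works directly in the rotated coordinates, so the orthonormal set is assumed to be $e_1, \ldots, e_k$ already; the coefficient vector of $\omega$ is called `alpha` here, and all function names are hypothetical choices made for this illustration. It uses only the Python standard library.

```python
import math
import random


def inner(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector(dim, rng):
    """Draw a point uniformly on the unit sphere by normalising Gaussians."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(inner(z, z))
    return [x / norm for x in z]


def embed(alpha, m):
    """Embed alpha in S^{k-1} into the first k coordinates of S^{m-1}."""
    return list(alpha) + [0.0] * (m - len(alpha))


def bound(v, k):
    """Right-hand side of (28): sum of the first k squared coordinates of v."""
    return sum(x * x for x in v[:k])


def maximising_alpha(v, k):
    """Embedded unit vector proportional to the first k coordinates of v."""
    scale = math.sqrt(bound(v, k))
    return embed([x / scale for x in v[:k]], len(v))


if __name__ == "__main__":
    rng = random.Random(1)
    m, k = 6, 3
    v = random_unit_vector(m, rng)
    # Largest squared inner product found over many random directions alpha.
    best = max(inner(embed(random_unit_vector(k, rng), m), v) ** 2
               for _ in range(2000))
    print("random search:", best, "bound:", bound(v, k))
    print("at maximiser:", inner(maximising_alpha(v, k), v) ** 2)
```

A random search never exceeds the bound, while the normalised first $k$ coordinates of $v$ attain it, which is the equality case of Cauchy-Schwarz used in the text.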

Let $du$ be the normalized uniform measure on $S^{m-1}$, so that $\int_{S^{m-1}} du = 1$.

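Now it is well known that $du$ on $S^{m-1}$ is invariant with respect to the $SO(m)$ action, so that $d(gu) = du$ for all $g \in SO(m)$. Therefore, we have

$$P\Big( \sup_{\omega} |\langle \omega, u \rangle|^2 \ge c^2 \Big) = \int_{\{\sup_{\omega} |\langle \omega, u \rangle|^2 \ge c^2\}} du = \int_{\{\sup_{g\omega} |\langle g\omega, gu \rangle|^2 \ge c^2\}} du = \int_{\{\sum_{j=1}^{k} v_j^2 \ge c^2\}} dv,$$

where the last equality is obtained by using (28) as well as the invariance of the uniform measure on $S^{m-1}$ with respect to the $SO(m)$ action. Therefore, we have

$$P\Big( \sup_{\omega} |\langle \omega, u \rangle|^2 \ge c^2 \Big) = P\big( u_1^2 + \cdots + u_k^2 \ge c^2 \big).$$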

The exact size (18) can now be obtained by an application of the following.

Lemma A.2 If $u = (u_1, \ldots, u_m)'$ is uniformly distributed on $S^{m-1}$, then $u_1^2 + \cdots + u_k^2$ is distributed $B(k/2, (m-k)/2)$ for $k = 1, \ldots, m$.

Proof. This lemma follows from essentially the same argument as that used in the proof of Lemma 3.1 in Johnstone and Siegmund (1989). Interestingly, it also follows from Basu's Theorem. We also note that this is Lemma 1.6 of Kariya and Sinha (1989). □
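Lemma A.2 can be checked by simulation. The sketch below is an illustration under stated assumptions, not the authors' procedure: it draws uniform points on $S^{m-1}$ by normalising Gaussian vectors and compares the simulated size with the Beta tail. To stay within the standard library it uses only the special case $k = 2$, where the $B(1, (m-2)/2)$ tail has the closed form $(1 - c^2)^{(m-2)/2}$; for general $k$ one would need a regularised incomplete beta function from a numerical library. Function names are hypothetical.

```python
import math
import random


def uniform_on_sphere(m, rng):
    """Draw u uniformly on S^{m-1} by normalising a standard Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z]


def simulated_size(m, k, c, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(u_1^2 + ... + u_k^2 >= c^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        u = uniform_on_sphere(m, rng)
        if sum(x * x for x in u[:k]) >= c * c:
            hits += 1
    return hits / n_draws


def exact_size_k2(m, c):
    """Upper tail of Beta(1, (m-2)/2) at c^2; valid only for k = 2."""
    return (1.0 - c * c) ** ((m - 2) / 2.0)


if __name__ == "__main__":
    m, c = 10, math.sqrt(0.5)
    print("simulated:", simulated_size(m, 2, c))
    print("exact    :", exact_size_k2(m, c))
```

The two numbers should agree up to Monte Carlo error, which shrinks at the usual rate of one over the square root of `n_draws`.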

Proof of Lemma A.1. By the transitive group action of $SO(m)$ on $S^{m-1}$, we can find a $g_1 \in SO(m)$ such that $g_1\omega_1 = e_1$. Now note that $0 = \langle \omega_1, \omega_2 \rangle = \langle g_1\omega_1, g_1\omega_2 \rangle$. Since $e_1$ has 1 in the first component and zeros elsewhere, it follows by orthogonality that $g_1\omega_2$ must have zero in the first coordinate. Consequently, the remaining coordinates of $g_1\omega_2$ are in $S^{m-2}$, so let $g_2 \in SO(m-1)$ be such that, when applied to the second through last coordinates of $g_1\omega_2$, it yields 1 in the first coordinate followed by zeros. We can now embed $g_2$ into $SO(m)$ by placing it in the lower right block of an $m \times m$ matrix with 1 as the first diagonal term and zeros in the remaining entries of the first row and column. One can now show that $g_2 g_1 \omega_1 = e_1$ and $g_2 g_1 \omega_2 = e_2$. We can then proceed in the same way to obtain $g_j \in SO(m)$ for the remaining $\omega_j$, $j = 3, \ldots, k$. The lemma now follows by defining $g = g_k \cdots g_1 \in SO(m)$. □
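For readers who prefer a computation, a rotation as in Lemma A.1 can be built explicitly. The sketch below does not follow the inductive construction of the proof; it swaps in a Gram-Schmidt completion of $\omega_1, \ldots, \omega_k$ to an orthonormal basis of $R^m$, stacks the basis vectors as rows, and flips the sign of the last row if the determinant is $-1$. It assumes $k < m$ and exactly orthonormal inputs; all names are hypothetical and only the standard library is used.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for t in range(c, n):
                a[r][t] -= f * a[c][t]
    return d


def rotation_to_canonical(omegas):
    """Rows of a matrix g with det(g) = 1 and g * omega_j = e_j (needs k < m)."""
    m = len(omegas[0])
    if len(omegas) >= m:
        raise ValueError("this sketch assumes k < m")
    rows = [list(w) for w in omegas]
    for i in range(m):
        if len(rows) == m:
            break
        # Remove from e_i its components along the rows collected so far.
        e = [1.0 if t == i else 0.0 for t in range(m)]
        for r in rows:
            coef = dot(e, r)
            e = [x - coef * y for x, y in zip(e, r)]
        norm = math.sqrt(dot(e, e))
        if norm > 1e-8:
            rows.append([x / norm for x in e])
    if det(rows) < 0:
        # Flipping a completion row changes the sign of det, not g * omega_j.
        rows[-1] = [-x for x in rows[-1]]
    return rows


def apply(g, x):
    """Matrix-vector product g x."""
    return [dot(row, x) for row in g]
```

Because row $i$ of `g` is the $i$th basis vector, the $i$th coordinate of `apply(g, omega_j)` is the inner product of that basis vector with $\omega_j$, which is 1 when $i = j$ and 0 otherwise.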

References

[1] Andrews, D.W.K. (1994). Empirical process methods in econometrics. In R.F. Engle and D.L. McFadden, eds., Handbook of Econometrics: Volume IV, Elsevier: New York.
[2] Andrews, D.W.K., Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62, forthcoming.
[3] Birnbaum, A. (1955). Characteristics of complete classes of tests of some multivariate parametric hypothesis, with application to likelihood ratio tests. Annals of Mathematical Statistics 26, 21-36.
[4] Cox, D.R. (1961). Tests of separate families of hypotheses. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1. Berkeley: University of California Press.
[5] Cox, D.R. (1962). Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society Series B 24, 406-424.
[6] Davidson, R., MacKinnon, J.G. (1981). Several tests for model specification in the presence of alternative hypotheses. Econometrica 49, 781-793.
[7] Davidson, R., MacKinnon, J.G. (1982). Some non-nested hypothesis tests and the relations among them. Review of Economic Studies 49, 551-565.
[8] Davidson, R., MacKinnon, J.G. (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.
[9] Davies, R.B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247-254.
[10] Davies, R.B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33-43.
[11] Fisher, G.R., McAleer, M. (1981). Alternative procedures and associated tests of significance for non-nested hypotheses. Journal of Econometrics 16, 103-119.
[12] Gray, A. (1982). Comparison theorems for the volumes of tubes as generalizations of the Weyl tube formula. Topology 21, 201-228.
[13] Guillemin, V., Pollack, A. (1974). Differential Topology. Prentice-Hall: Englewood Cliffs.
[14] Hansen, B. (1991). Inference when a nuisance parameter is not identified under the null hypothesis. Working Paper No. 296, Rochester Center for Economic Research, University of Rochester.
[15] Hillier, G. (1990). On the normalization of structural equations: Properties of direction estimators. Econometrica 58, 1181-1194.
[16] Hotelling, H. (1939). Tubes and spheres in n-spaces and a class of statistical problems. American Journal of Mathematics 61, 440-460.
[17] Johnstone, I.M., Siegmund, D. (1989). On Hotelling's formula for the volume of tubes and Naiman's inequality. Annals of Statistics 17, 184-194.
[18] Johansen, S., Johnstone, I.M. (1990). Hotelling's theorem on the volume of tubes: Some illustrations in simultaneous inference and data analysis. Annals of Statistics 18, 652-684.
[19] Kariya, T., Sinha, B.K. (1989). Robustness of Statistical Tests. Academic Press: Boston.
[20] Kariya, T., Kim, P.T. (1997). Finite sample robustness of tests: An overview. Handbook of Statistics 15: Robust Inference, C.R. Rao and G.S. Maddala, editors. Chapter 22, 645-660.
[21] Kelker, D. (1970). Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya A 32, 419-430.
[22] Kim, P., Li, Q., Naiman, D., Stengos, T. (1995). On Hotelling's formula for the volume of tubes: The case of linear nonnested regression models. Manuscript, Department of Economics, University of Guelph.
[23] Knowles, M., Siegmund, D. (1989). On Hotelling's approach to testing for a nonlinear parameter in regression. International Statistical Review 57, 205-220.
[24] Knowles, M., Siegmund, D., Zhang, H. (1991). Confidence regions in semilinear regression. Biometrika 78, 15-31.
[25] Matthes, T.K., Truax, D.R. (1967). Tests of composite hypotheses for the multivariate exponential family. Annals of Mathematical Statistics 38, 681-697.
[26] Milliken, G.A., Graybill, F.A. (1970). Extensions of the general linear hypothesis model. Journal of the American Statistical Association 65, 797-807.
[27] Mizon, G.E. (1984). The encompassing approach in econometrics. Quantitative Economics and Econometrics Analysis, eds K.F. Wallis and D.F. Hendry. Oxford: Basil Blackwell.
[28] Mizon, G.E., Richard, J.E. (1986). The encompassing principle and its application to testing non-nested hypothesis. Econometrica 54, 657-678.
[29] Naiman, D.Q. (1986). Conservative confidence bands in curvilinear regression. Annals of Statistics 14, 896-906.
[30] Naiman, D.Q. (1990). Volumes of tubular neighborhoods of spherical polyhedra and statistical inference. Annals of Statistics 18, 685-716.
[31] Siegmund, D., Zhang, H. (1993). The expected number of local maxima of a random field and the volume of tubes. Annals of Statistics 21, 1948-1966.
[32] Stein, C. (1956). The admissibility of Hotelling's $T^2$-test. Annals of Mathematical Statistics 27, 616-623.
[33] Weyl, H. (1939). On the volume of tubes. American Journal of Mathematics 61, 461-472.
