role as the empirical processes in classical Kolmogorov–Smirnov tests. Each of them will be called a transformed empirical process (in short: TEP). For technical ...
The Annals of Statistics 1997, Vol. 25, No. 6, 2388–2409
TRANSFORMED EMPIRICAL PROCESSES AND MODIFIED KOLMOGOROV–SMIRNOV TESTS FOR MULTIVARIATE DISTRIBUTIONS1 ˜ and E. M. Cabana ˜ By A. Cabana Instituto Venezolano de Investigaciones Cient´ıficas and Universidad de la Republica, ´ Uruguay A general way of constructing classes of goodness-of-fit tests for multivariate samples is presented. These tests are based on a random signed measure that plays the same role as the empirical process in the construction of the classical Kolmogorov–Smirnov tests. The resulting tests are consistent against any fixed alternative, and, for each sequence of contiguous alternatives, a test in each class can be chosen so as to optimize the discrimination of those alternatives.
1. Introduction. The design of tests specially adapted to detect some specific alternatives is a common procedure in nonparametric statistics. All three well-known tests for the two-sample problem, Wilcoxon, Fisher and Yates or median are all three distribution free, but each one is better than the others in detecting changes in position for specific sample distributions: the logistic, the ´ ´ (1967), normal or the double exponential, respectively [see Hajek and Sidak for instance]. As far as we know, the design of goodness-of-fit tests, consistent against all fixed alternatives and specially fitted to detect a specific family of them, has not been as well developed. That problem was considered and solved by one ˜ (1996)], by using a martingale transformation of the of the authors [Cabana empirical process originally used in statistical inference by Khmaladze (1981, 1993), who introduces a goodness-of-fit test but does not focus his attention on the improvement of the behavior under specific alternatives. In the present article, we pose again the problem of designing consistent goodness-of-fit tests, fitted to an arbitrarily given family of alternatives (such as changes in position for normal samples, changes in dispersion for double ˜ (1996) in two exponential ones, etc.). Our results extend the ones in Cabana senses: (1) goodness-of-fit tests for multivariate samples are discussed, and (2) the basic transformations to be applied to the empirical process for the construction of the tests are obtained by means of L2 arguments that disregard the martingale property characteristic of the particular transformation ˜ (1996). introduced in Khmaladze (1981) and used in Cabana
Received September 1996; revised June 1997. work supported by Project 94/007, CONICYT (National Council for Scientific and Technological Research), Montevideo, Uruguay. AMS 1991 subject classifications. 62G10, 62G20, 62G30, 60G15. Key words and phrases. Goodness-of-fit, power improvement, TEPs. 1 Joint
2388
TRANSFORMED EMPIRICAL PROCESSES
2389
In fact, that martingale transformation is also an isometry in the L2 space related to the probability to be fitted, and we call it below a Laguerre isometry or L-isometry because it is obtained by changing variables in the Laguerre shift that carries each Laguerre polynomial into the next one, with degree one unit larger. Our procedure is carried on starting from any arbitrarily given isometry, not necessarily the Laguerre one, although that one is the simplest we can choose, and all the resulting tests have the same distribution free behavior under the null hypothesis of fit and also under the alternatives for which the test is designed. This shows in particular that the martingale property of Laguerre isometry is not at all essential for testing purposes. 2. Preliminaries. The empirical process associated with the sample X of size 1 with respect to the probability distribution F0 on E = Rd , d ≥ 1, is the signed measure bx x A 7→ bx A = 1x∈A −F0 A, evaluated at the random point x = X. It is well known that when X has distribution F0 , bX satisfies the properties: EbX A = 0 for every measurable set A, and EbX AbX B = F0 A ∩ B − F0 AF0 B for every measurable A; B, which means that bX has the same first- and second-order moments as an F0 -Brownian bridge. As n goes to infinity, the empirical process n 1 X bn = √ bX n i=1 i
(1)
of the sample X1 ; : : : ; Xn of i.i.d. variables with distribution F converges in distribution to an F0 -Brownian bridge bF0 when F = F0 . When F 6= F0 instead, bn behaves√asymptotically as the sum of an F-bridge bF and the deterministic term nF − F0 that tends to infinity in the supremum norm as n → ∞. This gives a well-known justification for the classical Kolmogorov– Smirnov test of H0 x F = F0 [see Donsker (1952)]. Let us introduce a family of alternatives Fτ (τ in a neighborhood of 0) contiguous to F0 = F0 , with density fτ with respect to F0 , and such that there exists an L2 E ; dF0 function k satisfying
q k
1 τ
→ 0 as τ → 0+ ; τ 6= 0:
(2) f − 1 −
τ 2 L2 When (2) holds, the function k necessarily satisfies
(3)
Z
kx dF0 x = 0;
and can be obtained as the L1 E ; dF0 -limit of fτ − 1/τ as τ → 0. We shall assume that we are especially interested in detecting the sequence √ of alternatives Hn x F = Fn = Fδ/ n , where n is the sample size, and δ is a fixed parameter introduced for further convenience of notation.
2390
˜ AND E. M. CABANA ˜ A. CABANA
Under Hn , bn A converges in distribution to the sum of bF0 plus the deterministic drift Z ∂EbXτ A √ lim n EbXδ/√n A = δ = δ kx dF0 x n→∞ ∂τ A τ=0
[c.f. Oosterhoff and van Zwet (1979)]. That drift measures the sensitivity of the test based on bn with respect to Hn . The aim of this paper is the construction of goodness-of-fit tests based on signed measures wˆ x x ∈ E such that the following hold. (a) When x is replaced by a random variable X, the resulting measure wˆ X A evaluated on any measurable set A is a random variable. P (b) The measure wˆ n = n−1/2 ni=1 wˆ Xi associated with the sample has some normalized limit distribution under H0 . (c) The asymptotic distributions of wˆ n under H0 and under Hn differ as much as possible. These random measures will play, in the construction of our tests, the same role as the empirical processes in classical Kolmogorov–Smirnov tests. Each of them will be called a transformed empirical process (in short: TEP). For technical convenience, we shall require that the second-order moments of wˆ X coincide with those of a Wiener process, normalized by the requirement that its total variance be one. Thus, under a central limit property to be esP tablished, the limit distribution of wˆ n = n−1/2 ni=1 wXi will be a normalized Gaussian process with independent increments. As a consequence, we pose ourselves the problem of finding signed measures wˆ x satisfying the following: (A) Ewˆ X A = 0, for all measurable sets A and the random variable X distributed as F0 ; (B) Ewˆ X Awˆ X B = VA ∩ B, for all measurable sets A; B, the random variable X distributed as F0 , and some probability V on E ; (C) ∂Ewˆ Xτ E /∂ττ=0 ≥ ∂Ewˆ 0Xτ E /∂ττ=0 for any wˆ 0x satisfying the analogue of (1) and (2), and the random variable Xτ distributed as Fτ . Conditions (A) and (B) do not require further explanation. As for condition (C), notice that it requires the drift of our TEP to be as large as possible on the set E of maximum variance. The adequacy of such heuristic requirement will be verified a posteriori, from the properties of the resulting tests, since ˜ (1996)]. they will have optimum efficiency [see Section 4 and Cabana Let us point out that the problem of finding processes wˆ X satisfying (A), ˜ (1993, 1996), where goodness-of-fit tests (B), (C) is implicitly solved in Cabana asymptotically efficient are obtained for univariate samples. In this article, we solve it explicitly, extend the solution to a multivariate context, and, for each family of alternatives as in (2), provide consistent multivariate tests, efficient for that family.
TRANSFORMED EMPIRICAL PROCESSES
2391
In the sequel, E is always set equal to Rd , and the examples in Section 9 are developed for d = 1 and d = 2, but most results apply to more general spaces, particularly the extension of isometries described in Section 8. 3. A formal L2 construction of the TEP. 3.1. Isometries and associated TEPs. Let us assume for simplicity that the probability V appearing in condition (B) is absolutely continuous with respect to F0 , and call a2 its density; that is, Z Z VA = a2 x dF0 x; a2 x dF0 x = 1: A
We shall assume further that a is F0 -a.e. different from zero; that is, F0 and V are absolutely continuous with respect to each other. We shall denote by •; • and •; •V the inner products in L2 E ; dF0 and 2 L E ; dV, respectively. From an orthonormal basis 9 = ψ0 = 1; ψ1 ; ψ2 ; : : : of L2 E ; dF0 , we may construct the new sequence of functions 9V = ψV i i=0;1;::: ;
ψV i x = ψi x/ax;
which is an orthonormal basis for L2 E ; dV. Conditions (A) and (B) can be replaced by the requirement that the Fourier coefficients Z ci x = ψV ˆ x y i y dw
of dwˆ x /dV with respect to 9V satisfy
Eci X = ci ; 1 = E and
Z
ψV ˆ X x = 0 i x dw
Z Z Eci Xcj X = ci ; cj = E ψV y d w ˆ y ψV ˆ X z X i j z dw Z V V V = ψV i yψj y dVy = ψi ; ψj V = δi; j :
Consequently, C = ci i=0;1;::: is required to be an orthonormal system in L E ; dF0 with all its elements orthogonal to the constant 1, in order that (B) and (C) hold. P Expand now wˆ Xτ E = i ci Xτ ψV i ; 1, and compute X Eci Xτ − ci X V ∂Ewˆ Xτ E = lim ψi ; 1V τ→0 ∂τ τ τ=0 i XZ (4) = ci xkx dF0 xψi ; a 2
i
X C X ki ai ; = ci ; kψi ; a = i
i
˜ AND E. M. CABANA ˜ A. CABANA
2392
where kC i , ai are the ith Fourier coefficients of k and a with respect to the orthonormal systems C and 9, respectively. Let us introduce now the isometry T that maps 9 onto C; that is, (5)
T x L2 E ; dF0 → L2 E ; dF0 ;
ψi 7→ T ψi = ci ; i = 1; 2; : : : :
The last term in (4) equals k; T a, and therefore condition (C) holds when, for a given T , one chooses a to minimize the angle between T a and k. This is accomplished by selecting T a = k/k, provided the span of C contains k. This will be ensured by imposing that the span of C, that is, the range of T , be the orthogonal complement 1⊥ of 1 in L2 E ; dF0 . The conclusions obtained so far can be summarized in the following statement. Proposition 1. The measure wˆ x with formal Fourier expansion (6)
∞ X
wˆ x A =
i=0
ci xa1 ˆ A ; ψi
satisfies (A), (B), (C), when 9 = ψi i=1;2;::: is an orthonormal basis of L2 E ; dF0 , T is an isometry on L2 E ; dF0 with range 1⊥ , C = ci i=1;2;::: is the image of 9 by T and the function aˆ has the property T aˆ = k/k. In that case, the objective of our optimization has the value ∂Ewˆ Xτ E (7) = k; T a ˆ = k: ∂τ τ=0
Remark 1. The preceding proposition gives a formal solution to our problem for each orthonormal basis of L2 E ; dF0 and each isometry T with range 1⊥ . Remark 2. T we obtain
(8)
By replacing ci x by T ψi x in (6), and using the linearity of wˆ • A = T
∞ X
a1 ˆ A ; ψi ψi = T a1 ˆ A ;
i=0
and this implies in particular that wˆ x depends only on the isometry T , but not on the orthonormal basis 9. Let us finally introduce the notation (9)
w•a; T A = T a1A
for the measure associated with the isometry T and the score function a ∈ L2 E ; dF0 . After Remark 2, we may reformulate our previous statement in the following.
TRANSFORMED EMPIRICAL PROCESSES
2393
Proposition 2. The measure (9) satisfies conditions (A), (B), (C) and equation (7), when T is an isometry on L2 E ; dF0 with range 1⊥ containing k, and the score function is chosen as aˆ = T −1 k/k. 3.2. Constructing the TEP as a stochastic integral with respect to the empirical process. From (9) and the orthogonality of the range of T with respect to 1, the expression Z a; T wX A = T a1A X = T a1A d1•≤X (10) Z Z = T a1A d1•≤X − F0 = T a1A dbX follows. From (10) we derive an expression for (11)
a; T
wn
in terms of bn , namely (12)
a; T
wn
n 1 X a; T =√ w n i=1 Xi
x =
Z
T a1x dbn :
3.3. Transformed empirical process in E = Rd . We adopt the expressions (10) and (12) as the definition of the transformed empirical process. Definition 1. The transformed empirical process of the sample X1 ; X2 ; : : : ; Xn of E -valued random variables, associated with the probability distribution F0 on E , the isometry T on L2 = L2 E ; dF0 with range equal to the orthogonal complement 1⊥ of the constant function 1 and the L2 -function a with a = 1 is Z a; T wn A = T a1A dbn : (13) No attention has been paid to the convergence of the Fourier expansions involved in the arguments that motivated Definition 1, but straightforward computation of moments shows that properties (A), (B) hold when wˆ is rea; T a; T placed by wX or wn . Moreover a;T Z ∂EwXτ E = kT a dF0 ; ∂τ τ=0 a; ˆ T
substituted for wˆ X , where aˆ = T −1 k/k. so that (C) holds with wX We show in Section 4 that, under suitable assumptions on T , a, a central limit theorem holds for the TEP so defined.
4. Asymptotic properties of the TEP. Let us consider the TEPs a; T wn n=1;2;::: with respect to F0 constructed over triangular arrays of i.i.d.
˜ AND E. M. CABANA ˜ A. CABANA
2394
variables Xn; 1 ; Xn; 2 ; : : : ; Xn; n with distribution function Fn . We describe separately the limit properties of the TEPs for n → ∞ in two cases: when Fn = F 6= F0 is the same fixed distribution for all samples and when Fn is a sequence contiguous to F0 . a; T
4.1. Unboundedness under fixed alternatives. The expectation of wX A when X has distribution F is Z Z T a1A xdFx − F0 x = T a1A xdFx = T a1A ; dF/dF0 :
Let us call T g the projection of dF/dF0 on the range of T . Then R a; T EwX A = a1A ; g = A agdF0 . From this expression we are led to the following. (14)
R
Corollary 1. (14) holds.
When J = b; cx b; c ∈ Rd or J = −∞; cx c ∈ Rd , then
Theorem 1.
If
A
agdF0 6= 0 for some A in a family of sets J , then lim sup wn A = +∞ a.s.
n→∞ A∈J
The conclusion follows from the assumptions that T has range 1⊥ and a is a.e. nonvanishing. 4.2. Gaussian limit under the null hypothesis and contiguous alternatives. Replacing the empirical process bn in (13) R by the F0 -Brownian bridge b, one obtains a Wiener process wa; T A = T a1A db, indexed on A (A in a given family of sets J ). Following Ossiander [Ossiander (1987)], the convera; T gence in distribution of the transformed empirical process wn to wa; T is guaranteed, under the null hypothesis H0 , by the assumptions (15)
T a1A ≤ G for some G ∈ L2 E ; dF0
and all A ∈ J
of uniform boundedness of the family (16)
G = T a1A x A ∈ J ;
and
Z 1q
(17)
0
2
log N ε; G ; F0 dε < ∞ 2
about the boundedness of the L2 F0 -metric entropy with bracketing N , which is defined by 2
(18)
N ε; G ; F0 = minκ: there exist sets U and L with cardinal κ of L2 F0 -functions such that for each f ∈ G there exist u ∈ U and l ∈ L such that l < f < u and u − l2 ≤ ε2 .
TRANSFORMED EMPIRICAL PROCESSES
2395
a; T
As for the asymptotic distribution of wn A under Hn , it follows from the Le Cam third lemma and assumption (2) [see Le Cam and Yang (1990), Oosterhoff and van Zwet (1979)] that it is the same as under H0 plus the bias R δ kT a1A dF0 . The following statement summarizes the asymptotic behavior indicated above. The assumptions in (i) must be verified for each particular isometry, as will be done below for the examples in Section 9. Theorem 2. Let X1 ; : : : ; Xn be a sample of E -valued i.i.d. random varia; T ables with distribution F and wn A the transformed empirical process of that sample associated to the probability distribution F0 on E , the isometry T on L2 = L2 E ; dF0 with range orthogonal to the constant function 1 and the L2 -function a (a = 1), as introduced in Definition 1. a; T
(i) When assumptions (15) and (17) hold, wn Ax A ∈ J converges in distribution to the Wiener process wa; T Ax A ∈ J with covariance function Z Ewa; T Awa; T B = a2 dF0 ; A∩B
under the null hypothesis H0 x “F = F0 :” (ii) When, in addition, the family of probabilities Fτ on E with density a; T τ f with respect to F0 satisfies Ax A ∈ J converges in R (2), then wn a; T distribution to w A√+ δ kT 1A a dF0 x A ∈ J , under the sequence of alternatives Hn x “F = Fδ/ n :” These results justify the test procedure described in next section. 5. The goodness-of-fit tests. We propose to test H0 x “F = F0 ” by means of the critical region a; ˆ T sup wn A > cα A∈J
with T aˆ = k/k and cα such that n o ˆ A > cα ≤ α; P sup waT A∈J
where α is an upper bound of the asymptotic level desired for the test. The family J of measurable subsets of E is chosen rich enough to generate ˆ the σ-field of measurable sets in E , but not so large that supA∈J waT A be unbounded. It is assumed that J contains E . Moreover, the compromise in choosing J is that a CLT (19)
lim sup wˆ n A − wV A = 0
n→∞ A∈J a; ˆ T
holds for copies wˆ n of wn (20)
and a V-Wiener process wV , and also
for F 6= F0 ; there is A ∈ J such that FA 6= F0 A:
˜ AND E. M. CABANA ˜ A. CABANA
2396
We shall choose for J either the family of all generalized intervals o n d J1 = × pi ; qi x − ∞ ≤ pi < qi ≤ ∞; i = 1; 2; : : : ; d i=1
or the family of translations of the negative orthant n d o J0 = × −∞; qi x qi ≤ ∞; i = 1; 2; : : : ; d : i=1
Since E ∈ J , the asymptotic power of the test for Hn is greater than ˆ T P wa; E + δk > cα = 8−cα + δk:
˜ (1996) or Cabana ˜ and Cabana ˜ (1996) An argument like the one used in Cabana shows that when the level α and the probability β of type II error are sufficiently small, the asymptotic relative efficiency of our test with respect to the maximum likelihood ratio test can be chosen as close to one as desired. In this sense, the test has optimum efficiency. For other values of α and β there is a reduction in the efficiency, due to a; ˆ T the use as the test variable of the supremum of wn over a family of sets instead of the value of the same process on E . This is the price to be paid for the consistency against all alternatives. Notice that the TEP could be defined on families of functions different from indicators of sets in J , but we have chosen this particular index set because the reflection principle for Brownian motion provides the exact distribution of the supremum of the process over the family. 6. On the construction of isometries and their associated transformed empirical processes. Let us notice that any isometry T on L2 E ; dF0 with range contained in 1⊥ = f ∈ L2 E ; dF0 x f; 1 = 0 induces an orthonormal system 9T = 1; T 1; T 2 1; : : : ; T i 1; : : :. Conversely, given any orthonormal system 9 = 1, ψ1 , ψ2 , : : :, the linear transformation T 9 on L2 E ; dF0 that maps each ψi onto ψi+1 , i = 0; 1; : : : ; is an isometry with range in 1⊥ . In addition, when 9 is a basis, the range is 1⊥ and therefore it contains any function k with the property (3). These observations imply that the isometries needed for the construction of TEPs may possibly be obtained from known orthonormal systems in L2 E ; dF0 . In Section 6.1 we indicate the analytical form of the shift TL induced by the Laguerre polynomials on L2 R+ ; e−x dx. A simple analytical property of TL reflected in (23) gives an alternative way to obtain this isometry. For a given orthonormal system 9, it is not easy in general to find the explicit analytical expression for the shift operator that maps each ψn onto ψn+1 . We describe it for a particular example in Section 6.2: the √ orthonormal system of Tchebyshev polynomials on L2T = L2 −1; 1; dx/π 1 − x2 . √ The normalized hn x = Hn x/ n!, n = 0; 1; : : : ; P Hermite polynomials 2 n where etx−t /2 = ∞ /n!√[Sansone (1959)], constitute an orthonorn=0 Hn xt 2 mal basis of L2H = L2 R; e−x /2 dx/ 2π. An explicit writing of the shift that
TRANSFORMED EMPIRICAL PROCESSES
2397
´ and maps hn onto hn+1 in terms of integrals of the Poisson kernel [see Moran Urbina (1996) for the form of the inverse shift], leads to cumbersome computations so that the alternative approach, namely, the generalization of (23), is adopted in Section 6.3 to construct an isometry TH on that space. By means of changes of variables, the isometries in Sections 6.1, 6.2 and 6.3 induce others on different L2 spaces on R, as described in Section 7, and the latter ones can be used in the construction of new isometries on L2 spaces on Rd as shown in Section 8. The Laguerre shift (29) has been used many times in probability theory and statistical inference: for instance, let us mention that Brownian bridge is constructed from Wiener process by means of its inverse in Karatzas and Shreve (1991), that Efron and Johnstone (1990) and Ritov and Wellner (1988) use (29) and its inverse in connection with hazard rates [some properties of both isometries are described in Groeneboom and Wellner (1992)], and that Khmaladze (1981) introduced (29) in statistical inference, specially emphasizing some associated martingale properties. We use it as our main example because of its very simple analytical expression, but do not apply any martingale approach. 6.1. The Laguerre shift. The well-known Laguerre polynomials Ln x n = 0; 1; : : : are an orthonormal basis of L2 R; dF0 , with F0 x = F0 0; x = 1 − e−x , and they are obtained by means of the iterated application of the mapping Zx hx = TL gx = gx − (21) gt dt 0
to the first element of the basis, the polynomial of degree zero L0 = 1: (22)
Ln x = −1n TLn L0 x
[see, for instance, Sansone (1959), as a general reference]. This implies that TL is an isometry and, for each g ∈ L2 R; dF0 , TL g is orthogonal to the constant 1. Both properties of TL follow immediately after plain calculations, and, conversely, imply (22). The clue to show that TL is an isometry is the equation Zx Zx d f x gt dt (23) TL gxTL hxf0 x = gxhxf0 x − ht dt dx 0 0 0 Rx Rx and the fact that limx→∞ f0 x 0 gt dt 0 ht dt = 0.
6.2. The √Tchebycheff shift. The Tchebycheff polynomials T0 x = 1, 2 cosn arccos√x, n = 1; 2; : : : are an orthonormal basis of Tn x = L2T = L2 −1; 1; dx/π 1 − x2 . The isometry TT that maps each Tn onto Tn+1 can be described as follows. √ R1 Given ux ∈ L2T , let us assume first that u; 1 = −1 ux dx/π 1 − x2 = 0. Perform the change of variable ut ˜ = ucos t, −π < t ≤ π, obtain an analytic function hu on D = ζ = reit x ζ < 1 such that u˜ r t =
˜ AND E. M. CABANA ˜ A. CABANA
2398
< hu reit converges in L2 −π; π; dt-norm to u ˜ t as r → 1 and take the L2 −π; π; dt-limit vcos t = lim < ζhu ζ: r=ζ→1
This limit is the image v = TT u of u. The function hu is determined up to an imaginary additive constant, and the one satisfying =hu 0 = 0 is obtained by integrating the Poisson kernel: z + ζ dz 1 Z u< z hu ζ = 2πi C z−ζ z Z 1 π 1 − r2 + 2ir sins − t = ds ucos s 2π −π 1 − 2r coss − t + r2 1 − r2 + 2ir sins − t 1 Zπ ucos s − ucos t ds: = ucos t + 2π −π 1 − 2r coss − t + r2 Taking the limit in