Nonparametric Tests based on Area-Statistics

Stefan Kraft and Friedrich Schmid
Seminar für Wirtschafts- und Sozialstatistik, Universität zu Köln
Albertus-Magnus-Platz, 50923 Köln, Germany

1. Introduction
Let $X$ and $Y$ denote two random variables with continuous distribution functions $F$ and $G$, respectively. The theoretical probability plot of $F$ and $G$ is defined by $p \mapsto G(F^{-1}(p))$ for $p \in (0,1)$, where $F^{-1}(p) = \inf\{x \mid F(x) \ge p\}$ denotes the inverse distribution function of $F$. The basic quantities of interest in this paper are

$$A^{+} = \int_{0}^{1} \left(G(F^{-1}(p)) - p\right)^{+} dp = \int_{-\infty}^{\infty} \left(G(x) - F(x)\right)^{+} dF(x) \tag{1}$$

and

$$A^{-} = \int_{0}^{1} \left(G(F^{-1}(p)) - p\right)^{-} dp = \int_{-\infty}^{\infty} \left(G(x) - F(x)\right)^{-} dF(x) \tag{2}$$

where $z^{+}$ denotes the nonnegative part of a real number $z$, i.e., $z^{+} = \max\{z, 0\}$, and $z^{-} = \max\{0, -z\}$. $A^{+}$ can be interpreted as the area below $G(F^{-1}(p))$ and above $p$. The interpretation of $A^{-}$ is analogous.
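As an illustration of definitions (1) and (2), the two areas can be approximated numerically once the probability plot $p \mapsto G(F^{-1}(p))$ is available as a function. The sketch below is only a rough aid: the function name `area_statistics`, the midpoint rule, and the example pair ($F$ uniform on $(0,1)$, $G(x) = x^2$ on $[0,1]$) are assumptions chosen for illustration and do not come from the paper.

```python
def area_statistics(pp_curve, grid_size=10_000):
    """Approximate (A+, A-) from definitions (1) and (2).

    pp_curve is the theoretical probability plot p -> G(F^{-1}(p)).
    The integrals over (0, 1) are approximated by the midpoint rule.
    """
    a_plus = 0.0
    a_minus = 0.0
    for k in range(grid_size):
        p = (k + 0.5) / grid_size      # midpoint of the k-th subinterval
        d = pp_curve(p) - p            # signed distance to the diagonal
        a_plus += max(d, 0.0)          # nonnegative part  z+ = max{z, 0}
        a_minus += max(-d, 0.0)        # negative part     z- = max{0, -z}
    return a_plus / grid_size, a_minus / grid_size


# Illustrative example: F uniform on (0, 1), G(x) = x^2 on [0, 1],
# so G(F^{-1}(p)) = p^2 lies below the diagonal everywhere.
a_plus, a_minus = area_statistics(lambda p: p * p)
print(a_plus, a_minus)
```

In this example the plot never rises above the diagonal, so $A^{+} = 0$, while $A^{-} = \int_0^1 (p - p^2)\,dp = 1/6$.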
2. Test statistics and their distributions
Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ denote two independent samples from $X$ and $Y$. The sample version of $A^{+}$ is

$$A^{+}_{m,n} = \int_{-\infty}^{\infty} \left(\hat{G}_n(x) - \hat{F}_m(x)\right)^{+} d\hat{F}_m(x) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{G}_n(X_{(i)}) - \frac{i}{m}\right)^{+} = \frac{1}{m} \sum_{i=1}^{m} \left(\frac{R(X_{(i)}) - i}{n} - \frac{i}{m}\right)^{+} \tag{3}$$

where $\hat{F}_m$ and $\hat{G}_n$ are the empirical distribution functions of the two samples, $X_{(1)} \le \ldots \le X_{(m)}$ denote the order statistics of $X_1, X_2, \ldots, X_m$, and $R(X_{(i)})$ is the rank of $X_{(i)}$ in the combined sample. In the same way we obtain

$$A^{-}_{m,n} = \frac{1}{m} \sum_{i=1}^{m} \left(\frac{R(X_{(i)}) - i}{n} - \frac{i}{m}\right)^{-}. \tag{4}$$

$A^{+}_{m,n}$ and $A^{-}_{m,n}$ and some functionals thereof can be used for various testing problems.
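The rank formulas (3) and (4) translate directly into code. The following is a minimal sketch, assuming no ties within or between the samples (which holds with probability one for continuous $F$ and $G$); the function name `sample_area_statistics` is hypothetical. It uses the identity $\hat{G}_n(X_{(i)}) = (R(X_{(i)}) - i)/n$, i.e., the number of $Y$ observations not exceeding $X_{(i)}$ divided by $n$, and exact fractions to avoid rounding.

```python
from bisect import bisect_right
from fractions import Fraction


def sample_area_statistics(xs, ys):
    """Return (A+_{m,n}, A-_{m,n}) as exact fractions, following (3) and (4).

    Assumes there are no ties within or between the two samples.
    """
    m, n = len(xs), len(ys)
    xs_sorted = sorted(xs)             # order statistics X_(1) <= ... <= X_(m)
    ys_sorted = sorted(ys)
    a_plus = Fraction(0)
    a_minus = Fraction(0)
    for i, x in enumerate(xs_sorted, start=1):
        # Number of Y's <= X_(i); equals R(X_(i)) - i when there are no ties.
        y_count = bisect_right(ys_sorted, x)
        d = Fraction(y_count, n) - Fraction(i, m)
        if d > 0:
            a_plus += d
        else:
            a_minus -= d
    return a_plus / m, a_minus / m


# Small made-up example with m = 2 and n = 3.
print(sample_area_statistics([1, 4], [2, 3, 5]))
```

For the made-up data above, the two summands are $0 - 1/2$ and $2/3 - 1$, so $A^{+}_{2,3} = 0$ and $A^{-}_{2,3} = \frac{1}{2}\left(\frac{1}{2} + \frac{1}{3}\right) = \frac{5}{12}$.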
Table 1. Testing problems and suitable test statistics

| Null and alternative hypotheses | Suitable area statistics |
|---|---|
| (1) Equality: $H_0: F(x) = G(x)\ \forall x \in \mathbb{R}$ vs. $H_1$: not $H_0$ | $A^{+}_{m,n} + A^{-}_{m,n}$ ($L_1$ version of Cramér-von Mises); $A^{+}_{m,n} - A^{-}_{m,n}$ (Wilcoxon); $\max\{A^{+}_{m,n}, A^{-}_{m,n}\}$ |
| (2) One-sided stochastic dominance: $H_0: F(x) \ge G(x)\ \forall x \in \mathbb{R}$ vs. $H_1$: not $H_0$ | $A^{+}_{m,n}$ |
| (3) Stochastic dominance in either direction: $H_0: F(x) \ge G(x)\ \forall x \in \mathbb{R}$ or $G(x) \ge F(x)\ \forall x \in \mathbb{R}$ vs. $H_1$: not $H_0$ | $\min\{A^{+}_{m,n}, A^{-}_{m,n}\}$ |
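The functionals listed in Table 1 are simple combinations of the two sample areas. The sketch below collects them in one place; the function name and the dictionary keys are hypothetical labels introduced here for illustration, and the inputs are assumed to be values of $A^{+}_{m,n}$ and $A^{-}_{m,n}$ that have already been computed.

```python
def area_test_statistics(a_plus, a_minus):
    """Combine A+_{m,n} and A-_{m,n} into the statistics of Table 1."""
    return {
        # (1) Equality of F and G
        "equality_l1": a_plus + a_minus,        # L1 version of Cramér-von Mises
        "equality_wilcoxon": a_plus - a_minus,  # Wilcoxon-type statistic
        "equality_max": max(a_plus, a_minus),
        # (2) One-sided stochastic dominance
        "one_sided_dominance": a_plus,
        # (3) Stochastic dominance in either direction
        "dominance_either_direction": min(a_plus, a_minus),
    }


stats = area_test_statistics(0.25, 0.125)
print(stats["equality_l1"], stats["dominance_either_direction"])
```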
Kraft and Schmid (1999) developed a recursive scheme for the computation of the joint probabilities

$$P\left(A^{+}_{m,n} = \frac{c^{(+)}}{m^{2} n},\ A^{-}_{m,n} = \frac{c^{(-)}}{m^{2} n}\right) \tag{5}$$

under $F = G$, where $c^{(+)} = 0, 1, \ldots, \frac{m(m-1)n}{2}$ and $c^{(-)} = 0, 1, \ldots, \frac{m(m+1)n}{2}$. The finite sample distributions of $A^{+}_{m,n}$, $A^{+}_{m,n} + A^{-}_{m,n}$, $\max\{A^{+}_{m,n}, A^{-}_{m,n}\}$, $\min\{A^{+}_{m,n}, A^{-}_{m,n}\}$ and the corresponding quantiles (i.e., critical values) can now be derived.

The joint asymptotic distribution of a properly normalized version of $(A^{+}_{m,n}, A^{-}_{m,n})$ is $\left(\int_{0}^{1} B^{+}(p)\,dp,\ \int_{0}^{1} B^{-}(p)\,dp\right)$, where $(B(p))_{p \in [0,1]}$ is a Brownian bridge (see Shorack and Wellner (1986)), and the asymptotic distributions of the above stated test statistics are functionals thereof. Quantiles of these distributions can be obtained by simulation. The power of tests based on the area statistics has been investigated by Schmid and Trede (1995, 1996) and Kraft and Schmid (1999).
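For small $m$ and $n$, the joint probabilities in (5) can also be obtained by brute force, which may help to check an implementation. The sketch below is not the recursive scheme of Kraft and Schmid (1999); it simply enumerates all $\binom{m+n}{m}$ rank configurations of the $X$ sample, which are equally likely under $F = G$ with continuous distributions. It works on the integer scale $c = m^{2} n \cdot A$, using $m^{2} n \cdot \frac{1}{m}\left(\frac{R - i}{n} - \frac{i}{m}\right) = m(R - i) - n i$. The function names and the rule used for the critical value (smallest $t$ with $P(T \ge t) \le \alpha$) are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def null_joint_distribution(m, n):
    """Return {(c_plus, c_minus): probability} under F = G by enumeration.

    c_plus = m^2 * n * A+_{m,n} and c_minus = m^2 * n * A-_{m,n}.
    """
    counts = Counter()
    # Each increasing tuple is one possible rank vector R(X_(1)) < ... < R(X_(m)).
    for ranks in combinations(range(1, m + n + 1), m):
        c_plus = c_minus = 0
        for i, r in enumerate(ranks, start=1):
            d = m * (r - i) - n * i    # integer version of the i-th summand
            if d > 0:
                c_plus += d
            else:
                c_minus -= d
        counts[(c_plus, c_minus)] += 1
    total = comb(m + n, m)
    return {key: Fraction(cnt, total) for key, cnt in counts.items()}


def critical_value(m, n, statistic, alpha):
    """Smallest t with P(T >= t) <= alpha, where T = statistic(c_plus, c_minus).

    Returns None if no value of T attains the level alpha.
    """
    dist = {}
    for (c_plus, c_minus), prob in null_joint_distribution(m, n).items():
        t = statistic(c_plus, c_minus)
        dist[t] = dist.get(t, Fraction(0)) + prob
    tail = Fraction(0)
    crit = None
    for t in sorted(dist, reverse=True):
        tail += dist[t]
        if tail > alpha:
            break
        crit = t
    return crit


# Example for m = n = 2 with the statistic A+ + A- on the integer scale.
print(null_joint_distribution(2, 2))
print(critical_value(2, 2, lambda cp, cm: cp + cm, Fraction(1, 6)))
```

Critical values returned on the integer scale are converted back to the scale of the area statistics by dividing by $m^{2} n$. The enumeration grows like $\binom{m+n}{m}$, so it is only practical for small samples.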
References

Kraft, S. and F. Schmid (1999). Nonparametric tests based on area-statistics. Discussion papers in statistics and econometrics, Seminar of economic and social statistics, University of Cologne.

Schmid, F. and M. Trede (1995). A distribution free test for the two sample problem for general alternatives. Computational Statistics & Data Analysis 20, 416-419.

Schmid, F. and M. Trede (1996). Testing for first order stochastic dominance: a new distribution-free test. The Statistician 45, No. 3, 377-380.

Shorack, G.R. and J.A. Wellner (1986). Empirical Processes with Applications to Statistics. New York.
Summary

We present several nonparametric hypothesis tests for the two-sample problem with independent samples. The tests are based on nonlinear rank statistics that can be interpreted as areas in a p-p plot.