Journal of Statistical Planning and Inference 122 (2004) 161 – 173

www.elsevier.com/locate/jspi

Characterizing angular symmetry and regression symmetry

Peter J. Rousseeuw, Anja Struyf∗,1

Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, Antwerp B-2020, Belgium

Accepted 15 June 2003

Abstract

Let P be a general probability distribution on R^p, which need not have a density or moments. We investigate the relation between angular symmetry of P (a.k.a. directional symmetry) and the halfspace (Tukey) depth. When P is angularly symmetric about some θ_0 we derive the expression of the maximal Tukey depth. Surprisingly, the converse also holds, hence angular symmetry is completely characterized by Tukey depth. This fact puts some existing tests for centrosymmetry and for uniformity of a directional distribution in a new perspective. In the multiple regression framework, we assume that X is a (p − 1)-variate r.v. and Y is a univariate r.v. such that the joint distribution of (X, Y) is again a totally general probability distribution on R^p. The concept of regression symmetry (RS) about a potential fit θ_0 means that in each x the conditional probability of a positive error equals that of a negative error. If a distribution is regression symmetric about some θ_0 then the maximal regression depth has a certain expression. It turns out that the converse holds as well. Therefore, regression depth characterizes the linearity of the conditional median of Y on X, which we use to construct a statistical test for linearity.
© 2003 Elsevier B.V. All rights reserved.

MSC: primary 62H05; secondary 62G05; 62J05

Keywords: Angular symmetry; Location depth; Regression depth; Regression symmetry; Testing linearity



∗ Corresponding author. E-mail address: [email protected] (A. Struyf). URL: http://win-www.uia.ac.be/u/statis/index.html
1 Postdoctoral Fellow of the Fund for Scientific Research - Flanders, Belgium.

0378-3758/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2003.06.015


1. Introduction

It is natural to expect of a location estimator that, in the case of a symmetric distribution, the population estimate corresponds to the center of symmetry. In particular, for any angularly symmetric multivariate distribution (Liu, 1990) the location depth median (Tukey, 1975) corresponds to the center of angular symmetry. In regression one can define a similar notion of regression symmetry. If the data distribution is regression symmetric, the deepest regression fit (Rousseeuw and Hubert, 1999) yields the corresponding ‘center of symmetry’. Surprisingly, the location depth and the regression depth in turn characterize the respective notions of symmetry.

In Section 2, we show that a center of angular symmetry θ_0 of a general distribution P is also a point in which the Tukey depth reaches its largest possible value and vice versa, and we give an expression for this maximal depth, thereby extending previous work by Zuo and Serfling (2000a). This characterization gives us more insight into some existing tests for centrosymmetry and uniformity of a spherical distribution.

In Section 3, we consider a general probability distribution P on (X, Y) ∈ R^p. We say that P is regression symmetric if and only if there exists a parameter vector θ_0 ∈ R^p such that in each x the conditional probability of a positive error E = Y − (X′, 1)θ_0 equals that of a negative error. It turns out that the regression depth of θ_0 is the highest possible and attains a particular expression, and that any fit θ whose regression depth attains this expression must be a center of regression symmetry. The expression becomes simpler for distributions with a density.

The analogy between the results of Sections 2 and 3 reinforces the similarity between location depth and regression depth. Both depth notions are natural extensions of the univariate concept of rank to the more general settings of multivariate location and multiple regression. Their structural details are very similar, and hence it is not surprising that they share several important properties, computationally as well as statistically.

2. Location depth and angular symmetry

The halfspace location depth was introduced by Tukey (1975) as a tool for analyzing finite data sets. The location depth of any point θ ∈ R^p relative to the data set X_n = {x_1, ..., x_n} ⊂ R^p is defined as the smallest fraction of data points in any closed halfspace with boundary through θ, i.e.

ldepth(θ; X_n) = min_{‖u‖=1} #(H_{θ,u} ∩ X_n)/n,   (2.1)

where H_{θ,u} = {x ∈ R^p; u′(x − θ) ≥ 0}. This definition can easily be generalized to any probability distribution P on R^p with its Borel sets. The location depth of a point θ relative to P then becomes

ldepth(θ) = inf_{‖u‖=1} P(H_{θ,u}).

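For p = 2 the infimum in (2.1) can be computed exactly, because the halfplane count is piecewise constant in the angle of the normal u and can only jump when u is orthogonal to some x_i − θ. The following minimal sketch (ours, not from the paper; Python with NumPy, and the name ldepth is our own) exploits this:

import numpy as np

def ldepth(theta, X):
    # Location depth (2.1) of theta in R^2: the smallest fraction of data
    # points in a closed halfplane whose boundary passes through theta.
    X = np.asarray(X, float)
    d = X - np.asarray(theta, float)
    on_theta = np.all(np.abs(d) < 1e-12, axis=1)   # points equal to theta lie
    r = d[~on_theta]                               # in every such halfplane
    if len(r) == 0:
        return 1.0
    # The count #(H_{theta,u} ∩ X_n) can only jump where u is orthogonal to
    # some x_i - theta, so it suffices to evaluate it at those critical
    # angles and at the midpoints between consecutive critical angles.
    phi = np.arctan2(r[:, 1], r[:, 0])
    crit = np.unique(np.mod(np.concatenate([phi + np.pi/2, phi - np.pi/2]), 2*np.pi))
    gaps = np.diff(np.append(crit, crit[0] + 2*np.pi))
    angles = np.concatenate([crit, np.mod(crit + gaps/2, 2*np.pi)])
    u = np.column_stack([np.cos(angles), np.sin(angles)])
    counts = (r @ u.T >= -1e-12).sum(axis=0) + on_theta.sum()
    return counts.min() / len(X)

# e.g. ldepth([0.0, 0.0], np.random.default_rng(0).normal(size=(100, 2)))
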
Since (2.1) equals zero for θ lying outside the convex hull of the data, and increases when θ moves closer to the center of the data, it is often referred to as multivariate ranking (Eddy, 1985; Green, 1981). This can be visualized by means of the ldepth regions D_α given by

D_α = {θ ∈ R^p; ldepth(θ; X_n) ≥ α}.

These regions are convex sets, with D_α ⊆ D_{α′} for each α′ < α. The center of gravity of the innermost ldepth region is a point with maximal ldepth, called the deepest location or the Tukey median of the data set. This multivariate location T_l^∗(P) is a robust generalization of the univariate median. Donoho and Gasko (1992) explored the properties of the location depth and of the deepest location for finite data sets. Massé and Theodorescu (1994) and Rousseeuw and Ruts (1999) gave several properties of the location depth for general probability distributions P, that need not have a density. The asymptotic behavior of the depth function was studied by He and Wang (1997) and Massé (1999), and that of the deepest location by Bai and He (1999). Many statistical applications of location depth have been developed. A survey is given in Liu et al. (1999).

Some distributions P are angularly symmetric (Liu, 1990). We say that P is angularly symmetric about a point θ_0 if for any Borel cone A in R^p (i.e., a Borel set A such that sA = A for any 0 < s < ∞) it holds that

P(θ_0 + A) = P(θ_0 − A).

This property of P is also called directional symmetry. It can easily be seen that P is angularly symmetric about θ_0 if and only if

P′ := P|_{R^p \ {θ_0}}   (2.2)

is angularly symmetric about θ_0. When P({θ_0}) < 1 we can condition on R^p \ {θ_0}, yielding P^∗(B) := P(B \ {θ_0})/(1 − P({θ_0})), and then the angular symmetry of P is equivalent to that of P^∗. Let us now define the mapping h : R^p \ {θ_0} → S = S(0, 1) as the radial projection onto the unit sphere, i.e. h(x) = (x − θ_0)/‖x − θ_0‖. Moreover, let

P_h := (P^∗)^h   (2.3)

be the law of h. Then P is angularly symmetric about θ_0 if and only if P_h(B) = P_h(−B) for any Borel set B ⊂ S, i.e. if P_h is centrosymmetric about 0. We will now discuss the relation between angular symmetry and location depth.

Remark. The location depth is always maximal at the center of angular symmetry, as proved by Zuo and Serfling (2000b). We will derive an explicit formula for the maximal depth in case of angular symmetry.

The following upper bound on the location depth holds for any distribution P:

Lemma 1. For any distribution P on R^p and any point θ, it holds that

ldepth(θ) ≤ ½ + ½P({θ}).


Proof. For any ‖u‖ = 1, let B be the orthogonal hyperplane through θ and denote the corresponding open halfspaces by A = int(H_{θ,u}) and C = int(H_{θ,−u}). Assume wlog that P(A) ≤ P(C). Since 2P(A) ≤ P(A) + P(C) = 1 − P(B), we have P(A) + P(B) ≤ ½ + ½P(B). By definition, ldepth(θ) = inf_B (P(A) + P(B)) ≤ inf_B (½ + ½P(B)) = ½ + ½P({θ}).

We will now prove that the upper bound of Lemma 1 is attained in the case of an angularly symmetric distribution for θ equal to the point of symmetry. To prove this, we need the following lemma of Zuo and Serfling (2000a).

Lemma 2. A general distribution P on R^p is angularly symmetric about θ_0 if and only if each hyperplane passing through θ_0 divides R^p in two open halfspaces with equal probability.

Theorem 1. When P is angularly symmetric about some θ_0 then

ldepth(θ_0) = ½ + ½P({θ_0}).

Proof. If P is angularly symmetric about θ_0 then we can repeat the proof of Lemma 1 with θ = θ_0 and P(A) = P(C) because of Lemma 2. This yields the required equality ldepth(θ_0) = ½ + ½P({θ_0}).

We are grateful to an anonymous referee who pointed us to Lemma 1 and a more elegant proof of Theorem 1.

From Theorem 1 it follows that any P which is angularly symmetric about some θ_0 with P({θ_0}) > 0 has a unique center of angular symmetry. Otherwise, there can only be two different centers θ_1 ≠ θ_2 of angular symmetry if P has all its mass on the straight line through θ_1 and θ_2. These corollaries have been proved in another way by Liu (1990) and Zuo and Serfling (2000a).

To show that angular symmetry is actually characterized by location depth, we will now prove the reverse of Theorem 1.

Theorem 2. When there is a point θ_0 ∈ R^p with

ldepth(θ_0) = ½ + ½P({θ_0})

then P is angularly symmetric about θ_0.

Proof. Throughout we will assume that P({θ_0}) < 1 and that θ_0 = 0 wlog. Define P′ as in (2.2). Consider a great circle C on S and the collection of hyperplanes {v⊥; v ∈ S}. Since the number of hyperplanes v⊥ with P′(v⊥) > 1/n is at most 2n, it follows that the collection {v⊥; v ∈ C and P′(v⊥) > 0} is countable. Thus, there exists v ∈ C for which P′(v⊥) = 0. By the affine invariance of ldepth, we may assume wlog that v = (0, ..., 0, 1). Therefore, the horizontal hyperplane G^0 ≡ (x_p = 0) satisfies P′(G^0) = 0 and P′(x_p > 0) = P′(H_{0,v}) = P′(H_{0,−v}) = P′(x_p < 0), because otherwise ldepth(θ_0; P′) < ½P′(R^p). Therefore, P′(x_p > 0) = ½P′(R^p) = ½[1 − P({θ_0})] = P′(x_p < 0).

By definition, angular symmetry of P is equivalent to centrosymmetry of P_h (as in (2.3)) on the unit sphere S. Let us define the equator as S^0 = {x ∈ S; x_p = 0}, the northern hemisphere as S^+ = {x ∈ S; x_p > 0} and the southern hemisphere as S^− = {x ∈ S; x_p < 0}. Since P_h(S^0) = 0, all the mass of S is in S^+ ∪ S^−. Centrosymmetry of P_h is equivalent to requiring that P_h(B) = P_h(−B) for any Borel subset B of S, which in turn is equivalent to P_h(A) = P_h(−A) for any Borel subset A of S^+. (Indeed, for any Borel subset B of S we can write B^0 = B ∩ S^0, B^+ = B ∩ S^+ and B^− = B ∩ S^−. Then P_h(B^0) = 0, P_h(B^+) = P_h(−B^+) and P_h(−B^−) = P_h(B^−). Therefore, P_h(B) = P_h(B^+) + P_h(B^−) = P_h(−B^+) + P_h(−B^−) = P_h(−B).)

Consider the map h_2 on S^+ ∪ S^− defined by

h_2(x_1, ..., x_p) = (x_1/|x_p|, ..., x_{p−1}/|x_p|, x_p/|x_p|).

For any x ∈ S^+ it holds that h_2(x) ∈ G^+ ≡ (x_p = 1). Analogously, for any x ∈ S^− we find h_2(x) ∈ G^− ≡ (x_p = −1). Define the measure P̃ on G^+ ∪ G^− as the law of h_2. Note that the total mass of P̃ is P̃(G^+ ∪ G^−) = P_h(S) = P′(R^p) = 1 − P({θ_0}) and that P̃(G^+) = P̃(G^−) = ½[1 − P({θ_0})]. It remains to prove that for any Borel set B ⊂ G^+ it holds that P̃(B) = P̃(−B), where of course −B ⊂ G^−.

For starters, let us take any closed halfspace D of G^+. Then D is the intersection of G^+ with a closed halfspace H_{0,u} of R^p where u ≠ e_p = (0, ..., 0, 1). Since the arc ]e_p, u[ of the great circle through u and e_p contains at most countably many v with P′(∂H_{0,v}) ≠ 0, there exists a sequence of v_n ∈ ]e_p, u[ with P′(∂H_{0,v_n}) = 0 such that v_n → u in a monotone way on the arc. For each n consider the halfspace D_n = G^+ ∩ H_{0,v_n} of G^+. By construction, P̃(∂D_n) = 0 and ⋂_n D_n = D. For each n it holds that P′(H_{0,v_n}) = ½P′(R^p) = ½[1 − P({θ_0})] by the assumption on ldepth(θ_0). Therefore P̃(H_{0,v_n} ∩ G^−) = P′(H_{0,v_n}) − P̃(H_{0,v_n} ∩ G^+) = ½[1 − P({θ_0})] − P̃(H_{0,v_n} ∩ G^+), and making use of P̃(G^−) = ½[1 − P({θ_0})] we find P̃(G^− \ H_{0,v_n}) = P̃(H_{0,v_n} ∩ G^+) = P̃(D_n). Now G^− \ H_{0,v_n} is the interior of −D_n. Since P′(∂H_{0,v_n}) = 0 we have P̃(∂(−D_n)) = 0, hence P̃(−D_n) = P̃(D_n). Because P̃ is a finite measure and the sets −D_n are decreasing in n, we find P̃(−D) = lim ↓ P̃(−D_n) = lim ↓ P̃(D_n) = P̃(D).

To go from closed halfspaces D of G^+ to arbitrary Borel subsets B of G^+ we make the following construction. Take the one-to-one map h_3 : G^+ → R^{p−1} : (x_1, ..., x_{p−1}, 1) → (x_1, ..., x_{p−1}). For any Borel subset A of R^{p−1} we define Q_1(A) = P̃(h_3^{-1}(A))/P̃(h_3^{-1}(R^{p−1})) = 2P̃(h_3^{-1}(A))/[1 − P({θ_0})], so Q_1 is a probability measure on R^{p−1}. For G^− we take a map which reflects points relative to the origin in G^−, namely h_4 : G^− → R^{p−1} : (x_1, ..., x_{p−1}, −1) → (−x_1, ..., −x_{p−1}). The corresponding probability measure Q_2 on R^{p−1} is given by Q_2(A) = 2P̃(h_4^{-1}(A))/[1 − P({θ_0})]. We have already proved that Q_1(D) = Q_2(D) for any closed halfspace D of R^{p−1}. But by the theorem of Cramér and Wold (1936) this implies that Q_1 ≡ Q_2. Therefore, for any Borel subset B of G^+ we have P̃(B) = ½[1 − P({θ_0})]Q_1(h_3(B)) = ½[1 − P({θ_0})]Q_2(h_3(B)) = ½[1 − P({θ_0})]Q_2(h_4(−B)) = P̃(−B).


The proof technique used here is similar, but not equivalent, to techniques used by Zuo and Serfling (2000a).

For the special case of probability measures with a density it always holds that max_θ ldepth(θ) ≤ ½. Combining this with Theorems 1 and 2, we obtain:

Corollary 1. When P has a density, then P is angularly symmetric about some θ_0 if and only if

max_θ ldepth(θ) = ½.
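
As a numerical illustration of Theorem 1 and Corollary 1 (a sketch of ours, not from the paper; it reuses the ldepth function sketched after (2.1)): the distribution sampled below has uniform directions but direction-dependent radii, hence it is angularly symmetric about 0 without being centrosymmetric.

import numpy as np
# assumes the ldepth() sketch given after (2.1)
rng = np.random.default_rng(1)
n = 2000
phi = rng.uniform(0.0, 2.0 * np.pi, n)        # uniform directions: P_h centrosymmetric
rad = rng.exponential(1.5 + np.cos(phi))      # the radius may depend on the direction
X = np.column_stack([rad * np.cos(phi), rad * np.sin(phi)])
print(ldepth([0.0, 0.0], X))   # close to 1/2, as Theorem 1 predicts
print(ldepth([0.5, 0.0], X))   # typically well below 1/2 (Theorem 2)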

This property also follows from Zuo and Serfling (2000a). The only-if part of this property has already been proved by Rousseeuw and Ruts (1999) in a different way.

In two dimensions, the location depth ldepth(θ_0; X_n) reduces to the bivariate sign test statistic of Hodges (1955), where the null hypothesis H_0 was that P is centrosymmetric about θ_0. By Theorem 2 we can now see that the real null hypothesis of this test is larger than the original H_0. It actually tests for angular symmetry, of which centrosymmetry is a special case. Ajne (1968) uses essentially the same test statistic to test for another null hypothesis, that a distribution on the circle is uniform. By the construction in (2.3) and Theorem 2 it follows that Ajne's test has a much larger null hypothesis, namely centrosymmetry of the circular distribution.

The latter is an illustration of the fact that the masses of all hemispheres of a sphere S in R^p are not enough to characterize the distribution P on S. Indeed, for any centrosymmetric distribution P on S it is true that the mass of each hemisphere equals ½, and hence we cannot distinguish between those distributions solely on the basis of the masses of hemispheres. On the other hand, the masses of all caps of S would be sufficient to characterize P on S by the theorem of Cramér and Wold (1936), since any nontrivial intersection of a halfspace H ⊂ R^p and S determines a cap of S and vice versa.

The Tukey depth may also be defined for directional data.

Small (1987) defines the angular Tukey depth (ATD) for a given spherical distribution P as ATD(θ) = inf_C P(C) over the set of all closed hemispheres C. Liu and Singh (1992) prove that for absolutely continuous distributions on the sphere the ATD is constant with value ½ if and only if any hemisphere has probability ½. This is a special case of Theorem 2, which holds for any distribution on the sphere.

3. Regression depth and regression symmetry

Let (X, Y) be a p-dimensional random variable with distribution P on R^p, where the component Y is univariate. We want to obtain a fit θ ∈ R^p such that (X′, 1)θ approximates Y. Rousseeuw and Hubert (1999) developed the regression depth to measure the quality of such a fit θ. A good fit, with the mass of P well-balanced about the hyperplane y = (x′, 1)θ, will have a high regression depth, whereas a fit with lower depth does not represent P well.


In words, the regression depth is defined as the smallest amount of probability mass that needs to be passed when tilting θ in any way until it is vertical. Formally, when we define the error E := Y − (X′, 1)θ, the regression depth of θ relative to P is given by

rdepth(θ) = inf_D {P((E ≥ 0) ∩ D) + P((E ≤ 0) ∩ D^c)}   (3.1)

over all closed vertical halfspaces D (i.e., such that the boundary ∂D is parallel to the vertical direction). The collection of halfspaces over which the infimum is taken may be reduced by the following lemma.

Lemma 3. Definition (3.1) is equivalent to

rdepth(θ) = inf_D {P((E ≥ 0) ∩ D) + P((E ≤ 0) ∩ D^c)}

for all vertical closed halfspaces D with P(∂D) = 0.

Proof. Let us denote A := {P((E ≥ 0) ∩ D) + P((E ≤ 0) ∩ D^c); P(∂D) = 0}. Note that always rdepth(θ) ≤ inf A because not all halfspaces are allowed in A. It remains to prove that rdepth(θ) ≥ inf A. We need to show that for any D̃ with P(∂D̃) > 0 we have

P((E ≥ 0) ∩ D̃) + P((E ≤ 0) ∩ D̃^c) ≥ inf A.

Denote D̃ as H_{x,u} where x, u ∈ (y = 0). Consider all H_{x−εu,u} for 0 < ε < 1. Since at most a countable number of them can have P(∂H_{x−εu,u}) > 0, there exists a decreasing sequence ε_n → 0 such that D_n = H_{x−ε_n u,u} satisfies P(∂D_n) = 0, while ⋂_n D_n = D̃. Now

P((E ≥ 0) ∩ D_n) + P((E ≤ 0) ∩ D_n^c)
= P((E ≥ 0) ∩ D_n) + P(E ≤ 0) − P((E ≤ 0) ∩ D_n)
→ P((E ≥ 0) ∩ D̃) + P(E ≤ 0) − P((E ≤ 0) ∩ D̃).

Since P((E ≥ 0) ∩ D̃) + P((E ≤ 0) ∩ D̃^c) is the limit of a sequence in A, it is ≥ inf A.

Rousseeuw and Struyf (1998) implicitly used this definition of the regression depth, applied to empirical distributions, to construct an exact algorithm for the regression depth relative to three- and four-dimensional data sets.

Note that the regression depth may also be written as

rdepth(θ) = inf_D P(sign(E) sign_D(X) ≥ 0) = P(E = 0) + inf_D P(sign(E) sign_D(X) > 0),

where sign_D(X) := 1 when (X, 0) ∈ D and sign_D(X) := −1 otherwise. Also note that the regression depth is regression invariant. That is, for any function g : R^p → R^p : (x, y) → (x, y + (x′, 1)δ) it holds that rdepth(θ + δ; P^g) = rdepth(θ; P).

The fit θ with maximal regression depth is a robust regression estimator, called the deepest fit T_r^∗(P). For a univariate distribution P the deepest fit equals the median, hence it is a generalization of the median to linear regression.
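
For simple regression (p = 2) the closed vertical halfspaces are just {x ≤ t} and {x ≥ t}, so the infimum in (3.1) can be evaluated exactly by scanning finitely many cutpoints t. A minimal sketch (our own, not the algorithm of Rousseeuw and Struyf (1998); Python with NumPy, the name rdepth_line is ours, and the depth is returned as a fraction of n):

import numpy as np

def rdepth_line(theta, x, y):
    # Regression depth (3.1) of the fit y = theta[0]*x + theta[1] relative to
    # the sample (x_i, y_i). The halfplane counts can only change at data
    # x-values, so checking those values, the midpoints between them and one
    # point beyond each end evaluates the infimum exactly.
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    r = y - (theta[0] * x + theta[1])          # residuals E_i
    xs = np.unique(x)
    cand = np.concatenate([xs, (xs[:-1] + xs[1:]) / 2.0, [xs[0] - 1.0, xs[-1] + 1.0]])
    best = len(x)
    for t in cand:
        c1 = np.sum((r >= 0) & (x <= t)) + np.sum((r <= 0) & (x > t))   # D = {x <= t}
        c2 = np.sum((r >= 0) & (x >= t)) + np.sum((r <= 0) & (x < t))   # D = {x >= t}
        best = min(best, c1, c2)
    return best / len(x)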


The asymptotic distribution of the deepest regression was obtained by He and Portnoy (1998) in simple regression and by Bai and He (1999) in multiple regression. Various robustness properties of the deepest fit are studied by Van Aelst and Rousseeuw (2000).

The natural setting for regression depth is a semiparametric model H in which the functional form is parametric and the error distribution is nonparametric. Rousseeuw and Hubert (1999) define H as the set of all distributions P on R^p with a strictly positive density such that there exists a θ^∗ ∈ R^p with

med(E|X = x_0) = 0 for all x_0 ∈ R^{p−1}.   (3.2)

This is equivalent to med(Y|X = x_0) = (x_0′, 1)θ^∗, which says that the conditional median of Y given x_0 is linear in x_0. This is an essential requirement for the applicability of methods like least absolute deviations regression and regression quantiles (Koenker and Bassett, 1978). An interesting property of model (3.2) is that it allows for skewed error distributions and heteroscedasticity. Since the median of Y|X = x_0 need not be unique, we could expand definition (3.2) to

P ∈ H ⇔ P(E > 0|X = x_0) = P(E < 0|X = x_0) for all x_0 ∈ R^{p−1}.

However, we need to go further. First of all, the conditional distribution (and hence the conditional median and the conditional expectation) may not be uniquely defined when conditioning on an event with probability zero (Rao, 1988; Proschan and Presnell, 1998), and often P(X = x_0) = 0. And secondly, we would like to remove the restriction that P has a density. Therefore, we will define H as the set of all distributions P on R^p for which there exists a θ_0 such that for any Borel set B ⊂ R^{p−1} it holds that

P(X ∈ B and E > 0) = P(X ∈ B and E < 0).   (3.3)

A distribution P which fulfills (3.3) is called regression symmetric (RS) about θ_0. Note that regression symmetry is slightly more general than linearity of the conditional median med(Y|X), since the latter need not be uniquely defined.

Lemma 4. The following are equivalent:
(1) P is regression symmetric about θ_0.
(2) P(X ∈ B|E > 0) = P(X ∈ B|E < 0) for any Borel set B in R^{p−1}.
(3) P(E > 0|X ∈ B) = P(E < 0|X ∈ B) for any Borel set B with P(B) ≠ 0.

Proof. Let us prove (1) ⇒ (2). By taking B = R^{p−1} we see that P(E > 0) = P(E < 0), so they are both zero or both nonzero. If P(E > 0) = 0 then P(X ∈ B|E > 0) = 0 = P(X ∈ B|E < 0). If P(E > 0) ≠ 0 then

P(X ∈ B|E > 0) = P(X ∈ B and E > 0)/P(E > 0) = P(X ∈ B and E < 0)/P(E < 0) = P(X ∈ B|E < 0).

The other implications can be proved analogously.
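
As a concrete instance of (3.3), the following sketch (ours, not from the paper; all variable names hypothetical) simulates errors that are skewed and heteroscedastic yet have conditional median zero, and checks empirically that both sides of (3.3) agree on several sets B:

import numpy as np
rng = np.random.default_rng(2)
n = 100_000
x = rng.uniform(-2.0, 2.0, n)
# skewed, heteroscedastic error with conditional median 0:
# E | X=x  ~  (1 + x^2) * (Z - 1)  with Z lognormal(0,1), since med(Z) = 1
e = (1.0 + x**2) * (rng.lognormal(0.0, 1.0, n) - 1.0)
y = 2.0 * x + 1.0 + e               # regression symmetric about theta_0 = (2, 1)
for a, b in [(-2, 0), (0, 1), (1, 2)]:
    B = (x > a) & (x <= b)
    # the two sides of (3.3), estimated empirically; they should match closely
    print(np.mean(B & (e > 0)), np.mean(B & (e < 0)))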


Using the notation with the sign function, we find that H is the set of all distributions P for which there exists a θ_0 such that for any Borel set B in R^{p−1} with P(B) ≠ 0 it holds that

∫_{B×R} sign(E) dP = 0,

or equivalently E[sign(E)|X] = 0.

We will now prove that any distribution P which is regression symmetric about some θ_0 reaches its maximal regression depth in θ_0.

Theorem 3. When P is regression symmetric about some θ_0 then

rdepth(θ_0) = ½ + ½P(E = 0).

In that case, rdepth(θ_0) is the maximal rdepth over all θ ∈ R^p.

Proof. If P(E = 0) = 1 then rdepth(θ_0) = 1, equal to the right-hand side, so from here on we will continue with the case P(E ≠ 0) > 0. By regression invariance we may assume that θ_0 = 0, so E = Y. Put G^0 ≡ (y = 0). Applying regression symmetry with B = R^{p−1} yields P(E > 0) = P(X ∈ B and E > 0) = P(X ∈ B and E < 0) = P(E < 0), so P(E > 0) = P(E < 0) = ½P(E ≠ 0). Now consider any vertical halfspace D with P(∂D) = 0. Then P((E < 0) ∩ D^c) = P((E > 0) ∩ D^c) by applying regression symmetry to B = D^c ∩ G^0. Therefore, P((E > 0) ∩ D) + P((E < 0) ∩ D^c) = P((E > 0) ∩ D) + P((E > 0) ∩ D^c) = P(E > 0) = ½P(E ≠ 0). Finally, rdepth(0) = P(E = 0) + inf_D {P((E > 0) ∩ D) + P((E < 0) ∩ D^c)} = P(E = 0) + ½P(E ≠ 0) = ½ + ½P(E = 0).

Suppose there exists some θ^∗ ≠ 0 with rdepth(θ^∗) > rdepth(0). Take the intersection L of the hyperplanes (y = (x′, 1)θ_0) and (y = (x′, 1)θ^∗). First suppose L = ∅, i.e. these hyperplanes are parallel. Assume wlog that θ^∗_p > 0. Then take u = e_1 = (1, 0, ..., 0) and consider the vertical halfspaces D_n = H_{nu,−u} for n = 1, 2, 3, ... For each n we have rdepth(θ^∗) ≤ P((Y ≥ θ^∗_p) ∩ D_n) + P((Y ≤ θ^∗_p) ∩ D_n^c) → P(Y ≥ θ^∗_p) since ⋃_n D_n = R^p and ⋂_n D_n^c = ∅. But then ½P(Y ≠ 0) = P(Y > 0) ≥ P(Y ≥ θ^∗_p) ≥ rdepth(θ^∗), hence rdepth(0) = P(Y = 0) + ½P(Y ≠ 0) ≥ rdepth(θ^∗), a contradiction.

If L ≠ ∅ then L is an affine hyperplane of G^0 with dimension p − 2. Now take a vertical halfspace D of R^p with ∂D ∩ G^0 = L and such that (x′, 1)θ^∗ ≥ 0 for x ∈ D and (x′, 1)θ^∗ ≤ 0 for x ∈ D^c. Then rdepth(θ^∗) ≤ P((Y ≥ (X′, 1)θ^∗) ∩ D) + P((Y ≤ (X′, 1)θ^∗) ∩ D^c) ≤ P((Y ≥ 0) ∩ D) + P((Y ≤ 0) ∩ D^c) = P(Y = 0) + P((Y > 0) ∩ D) + P((Y < 0) ∩ D^c) = P(Y = 0) + P(Y > 0) = rdepth(0), which again contradicts rdepth(θ^∗) > rdepth(0). Thus no such θ^∗ exists, hence rdepth(0) = max_θ rdepth(θ).

Rousseeuw and Hubert (1999) have proved that max_θ rdepth(θ) ≤ ½ for distributions with a density. Using this property and Theorem 3 we find:


Corollary 2. If P has a density and P is regression symmetric about some θ_0 then

max_θ rdepth(θ) = ½ = rdepth(θ_0).

We will now also prove the reverse of Theorem 3, which shows that regression symmetry is characterized by regression depth.

Theorem 4. When there exists θ_0 ∈ R^p with

rdepth(θ_0) = ½ + ½P(E = 0)

then P is regression symmetric about θ_0. In that case, rdepth(θ_0) is the maximal rdepth over all θ ∈ R^p.

Proof. By regression equivariance we can put θ_0 = 0 wlog, hence E = Y. If P(Y = 0) = 1 there is nothing to prove, so we will assume that P(Y ≠ 0) > 0. Now take u = e_1 = (1, 0, ..., 0) and put D_n = H_{nu,−u} ⊂ R^p. For each n we have rdepth(0) ≤ P((Y ≥ 0) ∩ D_n) + P((Y ≤ 0) ∩ D_n^c), hence (letting n → ∞) rdepth(0) ≤ P(Y ≥ 0) = P(Y = 0) + P(Y > 0). Analogously, rdepth(0) ≤ P(Y = 0) + P(Y < 0). Since rdepth(0) = ½ + ½P(Y = 0) = P(Y = 0) + ½P(Y ≠ 0) we find ½P(Y ≠ 0) ≤ P(Y < 0) and ½P(Y ≠ 0) ≤ P(Y > 0). Adding these inequalities yields P(Y ≠ 0) ≤ P(Y ≠ 0), so both inequalities must be equalities, hence P(Y > 0) = P(Y < 0) = ½P(Y ≠ 0).

Define P_1 as the conditional distribution of X given that Y > 0, i.e. for any Borel set B ⊂ R^{p−1} we put P_1(B) = P(X ∈ B and Y > 0)/P(Y > 0) = 2P(X ∈ B and Y > 0)/P(Y ≠ 0). Analogously, P_2(B) = 2P(X ∈ B and Y < 0)/P(Y ≠ 0). To prove regression symmetry it suffices to prove that P_1(B) = P_2(B) for any B. We first prove this for any closed halfspace H of R^{p−1}. Note that H × R is a vertical closed halfspace of R^p. Since rdepth(0) = P(Y = 0) + ½P(Y ≠ 0) ≤ P(Y = 0) + P(X ∈ H and Y > 0) + P(X ∈ H^c and Y < 0), and also rdepth(0) ≤ P(Y = 0) + P(X ∈ H and Y < 0) + P(X ∈ H^c and Y > 0), we must have equalities in both cases, since adding the inequalities yields 2P(Y = 0) + P(Y ≠ 0) ≤ 2P(Y = 0) + P(Y ≠ 0). This yields P_1(X ∈ H) = 1 − P_2(X ∈ H^c) = P_2(X ∈ H) by the complement rule for P_2. Because P_1(H) = P_2(H) for any closed halfspace H in R^{p−1}, the theorem of Cramér and Wold (1936) implies that P_1 ≡ P_2, i.e. P_1(B) = P_2(B) for any Borel set in R^{p−1}. Since we have proved that P is RS about the fit θ_0, Theorem 3 implies that rdepth(θ_0) = max_θ rdepth(θ).

We are now able to prove some uniqueness results about the hyperplane of regression symmetry. In words, when there are two different hyperplanes θ_1 ≠ θ_2 of regression symmetry there must be a 'vacuum wedge' between θ_1 and θ_2, with half of the total probability mass lying above the wedge and the other half below it.

Corollary 3. (1) When P is regression symmetric about some fit θ_0 and P(E = 0) > 0 then the hyperplane of regression symmetry is unique.
(2) When there exist two regression symmetry hyperplanes θ_1 ≠ θ_2 then P(W_{θ_1,θ_2}) = 0 for the wedge W_{θ_1,θ_2} = {(x, y); y ∈ [(x′, 1)θ_1, (x′, 1)θ_2]}.

(3) When P is regression symmetric and P has a strictly positive density, then the hyperplane of regression symmetry is unique.

Proof. (1) Put θ_0 = 0 so Y = E. If P were also regression symmetric about some θ^∗ ≠ 0 then we can repeat the last part of the proof of Theorem 3 with the following modifications. If L = ∅ we find P(Y < θ^∗_p) ≥ P(Y < 0) + P(Y = 0) > ½ and P(Y ≥ θ^∗_p) = P(Y = θ^∗_p) + P(Y > θ^∗_p) ≥ ½. Adding these inequalities gives P(R^p) > 1, a contradiction. If L ≠ ∅ we follow the analogous reasoning on D and on D^c, yielding P(D ∪ D^c) > 1.

(2) By Theorem 3 we know that rdepth(θ_1) ≥ ½ and rdepth(θ_2) ≥ ½. Put θ_1 = 0 wlog. Write the intersection of both fits as L. If L = ∅ we have W_{θ_1,θ_2} = R^{p−1} × [0, (θ_2)_p] = R^p \ ((y < 0) ∪ (y > (θ_2)_p)). Therefore, P(W_{θ_1,θ_2}) = 1 − P(Y < 0) − P(Y > (θ_2)_p) ≤ 1 − ½ − ½ = 0, where P(Y < 0) = P(Y > (θ_2)_p) = ½ because part (1) rules out P(E = 0) > 0 under either fit. If L ≠ ∅ we find that W_{θ_1,θ_2} = (D ∩ {(x, y); 0 ≤ y ≤ (x′, 1)θ_2}) ∪ (D^c ∩ {(x, y); (x′, 1)θ_2 ≤ y ≤ 0}). Both parts have probability zero by applying the above reasoning on D and D^c separately.

(3) Suppose we have two symmetry hyperplanes θ_1 ≠ θ_2. Then the wedge W_{θ_1,θ_2} has infinite volume, i.e. λ^p(W_{θ_1,θ_2}) = ∞. Denoting the density of P by f(x, y) > 0 we obtain P(W_{θ_1,θ_2}) = ∫ f(x, y) I((x, y) ∈ W_{θ_1,θ_2}) dλ^p(x, y) > 0, which contradicts part (2).

From Theorem 4 we conclude that the maximal regression depth can be seen as a measure of the regression symmetry of the data set. Hence, we can construct a test for linearity using the maximal regression depth as test statistic. Since linearity of the functional form is a prerequisite for many regression methods, it is important to be able to test for it. In simple regression, the null hypothesis assumes that the data set Z_n = {(x_i, y_i); i = 1, ..., n} follows the linear model

H_0 : y_i = θ̃_1 x_i + θ̃_2 + e_i,   i = 1, ..., n

for some θ̃ = (θ̃_1, θ̃_2)′ and with independent errors e_i, each having a distribution with zero median. First, we calculate the deepest fit of the data set Z_n, which has regression depth k and corresponding residuals r_1, ..., r_n. The test statistic equals k. The p-value of the test can then be approximated by simulation. We generate m = 10 000 samples Z^(j) = {(x_i, r_{π_j(i)}); i = 1, ..., n} with π_j a random permutation of {1, ..., n}. The p-value for depth k is then computed as

p(k) = P(max rdepth ≤ k | H_0) ≈ #{j; max rdepth(Z^(j)) ≤ k}/m.

Example. The data set in Fig. 1 shows the price of 124 Mazda cars versus the year of purchase and is available at the Australasian Data and Story Library at http://www.maths.uq.oz.au/~gks/data/. The regression depth of the deepest fit (indicated in Fig. 1) is 52. Simulation gives us p(52) ≈ 0.0000, so we clearly reject linearity. After logarithmically transforming the response variable we obtain the deepest fit in Fig. 2. Now the maximal regression depth is 57, with corresponding p-value p(57) ≈ 0.4276. We thus accept the linearity of the transformed data set.
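
A sketch of this permutation test (our own illustration, not the authors' implementation; it reuses the hypothetical rdepth_line from Section 3, finds the deepest fit by brute force over lines through pairs of points, works with depth as a fraction rather than a count, and uses a small m, whereas the paper takes m = 10 000):

import numpy as np
# assumes rdepth_line() from the sketch in Section 3

def deepest_fit(x, y):
    # brute-force deepest line, searching over lines through pairs of points
    # (a simple candidate set; the paper relies on exact, faster algorithms)
    best, theta = -1.0, (0.0, float(np.median(y)))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue
            b1 = (y[j] - y[i]) / (x[j] - x[i])
            cand = (b1, y[i] - b1 * x[i])
            d = rdepth_line(cand, x, y)
            if d > best:
                best, theta = d, cand
    return theta, best

def linearity_pvalue(x, y, m=200, seed=0):
    # permutation test for linearity: compare the maximal regression depth k
    # of the data with the maximal depths of samples (x_i, r_pi(i)) built
    # from randomly permuted residuals of the deepest fit
    theta, k = deepest_fit(x, y)
    r = y - (theta[0] * x + theta[1])
    rng = np.random.default_rng(seed)
    hits = sum(deepest_fit(x, rng.permutation(r))[1] <= k for _ in range(m))
    return hits / m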

Fig. 1. Price versus year of purchase for 124 Mazda cars with the deepest regression line.

Fig. 2. Log(price) versus year of purchase for 124 Mazda cars with the deepest regression line.

Acknowledgements

We wish to thank Bob Serfling and Yijun Zuo for interesting discussions.


References

Ajne, B., 1968. A simple test for uniformity of a circular distribution. Biometrika 55, 343–354.
Bai, Z., He, X., 1999. Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist. 27, 1616–1637.
Cramér, H., Wold, H., 1936. Some theorems on distribution functions. J. London Math. Soc. 11, 290–294.
Donoho, D.L., Gasko, M., 1992. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist. 20, 1803–1827.
Eddy, W.F., 1985. Ordering of multivariate data. In: Billard, L. (Ed.), Computer Science and Statistics, Proceedings of the 16th Symposium on the Interface. North-Holland, Amsterdam, pp. 25–30.
Green, P.J., 1981. Peeling bivariate data. In: Barnett, V. (Ed.), Interpreting Multivariate Data. Wiley, New York, pp. 3–19.
He, X., Portnoy, S., 1998. Asymptotics of the deepest line. In: Ahmed, S.E., Ahsanullah, M., Sinha, B.K. (Eds.), Applied Statistical Science III: Nonparametric Statistics and Related Topics. Nova Science Publishers Inc., New York, pp. 71–81.
He, X., Wang, G., 1997. Convergence of depth contours for multivariate datasets. Ann. Statist. 25, 495–504.
Hodges, J.L., 1955. A bivariate sign test. Ann. Math. Statist. 26, 523–527.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Liu, R.Y., 1990. On a notion of data depth based on random simplices. Ann. Statist. 18, 405–414.
Liu, R.Y., Singh, K., 1992. Ordering directional data: concepts of data depth on circles and spheres. Ann. Statist. 20, 1468–1484.
Liu, R.Y., Parelius, J., Singh, K., 1999. Multivariate analysis by data depth: descriptive statistics, graphics and inference. Ann. Statist. 27, 783–840.
Massé, J.-C., 1999. Asymptotics for the Tukey depth. Technical Report. Université Laval, Québec, Canada.
Massé, J.-C., Theodorescu, R., 1994. Halfplane trimming for bivariate distributions. J. Multivariate Anal. 48, 188–202.
Proschan, M.A., Presnell, B., 1998. Expect the unexpected from conditional expectation. Amer. Statist. 52, 248–252.
Rao, M.M., 1988. Paradoxes in conditional probability. J. Multivariate Anal. 27, 434–446.
Rousseeuw, P.J., Hubert, M., 1999. Regression depth. J. Amer. Statist. Assoc. 94, 388–402.
Rousseeuw, P.J., Ruts, I., 1999. The depth function of a population distribution. Metrika 49, 213–244.
Rousseeuw, P.J., Struyf, A., 1998. Computing location depth and regression depth in higher dimensions. Statist. Comput. 8, 193–203.
Small, C.G., 1987. Measures of centrality for multivariate and directional distributions. Canad. J. Statist. 5, 31–39.
Tukey, J.W., 1975. Mathematics and the picturing of data. Proceedings of the International Congress of Mathematicians, Vancouver, Vol. 2, pp. 523–531.
Van Aelst, S., Rousseeuw, P.J., 2000. Robustness of deepest regression. J. Multivariate Anal. 73, 82–106.
Zuo, Y., Serfling, R., 2000a. On the performance of some robust nonparametric location measures relative to a general notion of multivariate symmetry. J. Statist. Plann. Inference 84, 55–79.
Zuo, Y., Serfling, R., 2000b. General notions of statistical depth function. Ann. Statist. 28, 461–482.