Journal of Multivariate Analysis 73, 82-106 (2000)
doi:10.1006/jmva.1999.1870, available online at http://www.idealibrary.com
Robustness of Deepest Regression

Stefan Van Aelst and Peter J. Rousseeuw

Department of Mathematics and Computer Science, U.I.A., Universiteitsplein 1, B-2610 Antwerp, Belgium
Received October 12, 1998
In this paper we investigate the robustness properties of the deepest regression, a method for linear regression introduced by Rousseeuw and Hubert [6]. We show that the deepest regression functional is Fisher-consistent for the conditional median, and has a breakdown value of 1/3 in all dimensions. We also derive its influence function, and compare it with sensitivity functions. © 2000 Academic Press

AMS 1991 subject classifications: 62F35, 62J05.
Key words and phrases: breakdown value, influence function, regression depth.
1. INTRODUCTION

Let (x, y) be a random p-dimensional column vector, with distribution H on ℝ^p. We would like to regress the univariate y on the (p−1)-dimensional x. For any (potential) fit θ = (θ₁, ..., θ_p)′ we want to verify how well (x′, 1)θ approximates y. To measure the quality of a fit, Rousseeuw and Hubert [6] introduced the notion of regression depth, which is a counterpart to Tukey's location depth [9].

DEFINITION 1. The regression depth of a fit θ ∈ ℝ^p relative to a given distribution H on ℝ^p, where H is the distribution of the random variable (x, y), is given by
rdepth(θ, H) = min over u ∈ ℝ^{p−1} and v ∈ ℝ of [H(y − (x′, 1)θ > 0 and x′u < v) + H(y − (x′, 1)θ < 0 and x′u > v)],

which is the "rank" of θ when we rank from the outside inwards. For any p ≥ 1, the regression depth of θ measures how balanced the mass of H is about the linear fit determined by θ.

DEFINITION 2. The deepest regression estimator T*(H) is defined as the fit θ with maximal rdepth(θ, H), that is

T*(H) = argmax_θ rdepth(θ, H).    (1)
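For a finite sample in simple regression (p = 2), Definition 1 can be evaluated by brute force: for each candidate tilt point v one counts the observations a tilting line would have to pass, in either rotation direction. The sketch below is our own illustration (function names are ours, not the paper's), under the common finite-sample convention that residual signs are compared non-strictly and tilt points are taken strictly between observations.

```python
import numpy as np

def rdepth(theta, x, y):
    """Empirical regression depth of the line y = theta[0]*x + theta[1]
    for simple regression (p = 2), by brute force over tilt points v."""
    r = y - (theta[0] * x + theta[1])              # residuals
    xs = np.sort(np.unique(x))
    # candidate tilt points: outside the data and between consecutive x's
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    depth = len(x)
    for v in cands:
        left, right = x < v, x > v
        # mass passed when tilting to vertical, one count per rotation direction
        d1 = np.sum((r >= 0) & left) + np.sum((r <= 0) & right)
        d2 = np.sum((r <= 0) & left) + np.sum((r >= 0) & right)
        depth = min(depth, d1, d2)
    return depth

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(rdepth((1.0, 0.0), x, y))   # exact fit: maximal depth 4
print(rdepth((-1.0, 5.0), x, y))  # a "nonfit" crossing between the points: depth 0
```

A fit that interpolates all observations attains depth n, while a line that can be tilted to vertical without passing any point is a "nonfit" of depth 0.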
(See Rousseeuw and Hubert [6].) We call T* a functional because its argument is a distribution H on ℝ^p. (For a finite dataset, we apply Definition 2 to the empirical distribution Hₙ.) For a distribution H on ℝ¹ the deepest regression (DR) is its median. For H on ℝ^p with p > 1, the DR thus generalizes the univariate median to linear regression. The DR is the "most balanced" fit for H. It is a "median-type" regression method, unlike earlier robust methods such as least trimmed squares (Rousseeuw [4]) and S-estimators (Rousseeuw and Yohai [8]) that are "mode-seeking" because they search for a concentrated linear cloud with the majority of the probability mass. Figure 1a shows a dataset consisting of n = 50 points generated from a bivariate gaussian distribution H with mean μ = (4, 2)′, standard deviations σ₁ = 4 and σ₂ = 3, and correlation ρ = 0.8. Denote the empirical distribution of this dataset by Hₙ. The fits θ₁ = (0.6, 4.6)′ and θ₂ = (−2, 6)′ both have regression depth 1/50 according to Definition 1, and the deepest regression T*(Hₙ) = (0.615, −0.067)′ has depth 23/50, which is almost 1/2. Figure 1a illustrates that lines with high regression depth provide a more
FIG. 1. (a) Dataset consisting of n = 50 points generated from a bivariate gaussian distribution H on ℝ². The lines θ₁ and θ₂ have regression depth 1/50, and the deepest regression T*(Hₙ) has regression depth 23/50. (b) Contours of H. The line θ₁ now has depth 0.0027 and θ₂ has depth 0.05. The deepest regression T*(H) has regression depth 1/2.
balanced fit to the data than lines with low depth. This motivates our interest in the properties of the fit T* with maximal regression depth. Figure 1b shows contours of the corresponding population distribution H, where T*(H) = (0.6, −0.4)′ has depth exactly 1/2 while θ₁ has depth 0.0027 and θ₂ has depth 0.05.

The natural setting of deepest regression is a large semiparametric model ℋ in which the functional form is parametric and the error distribution is nonparametric. Formally, ℋ consists of all distributions H on ℝ^p with a strictly positive density such that there exists a θ ∈ ℝ^p with med_H(y | x) = (x′, 1)θ. Note that this model allows for skewed error distributions and heteroscedasticity. The asymptotic distribution of the deepest regression was obtained by He and Portnoy [3] in simple regression, and by Bai and He [1] in multiple regression.

In this paper we study the robustness properties of the deepest regression functional T*. Section 2 shows that T* is Fisher-consistent, and in Section 3 it is shown that T* has a breakdown value of 1/3. In Section 4 we derive the influence function of the deepest regression slope and intercept, and compare them with sensitivity functions. The conclusions are formulated in Section 5.
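To see the deepest fit in action on data like that of Figure 1a, one can use the fact that in simple regression a deepest line may be sought among lines through two sample points, and maximize the empirical depth by enumeration. The following is a brute-force sketch with our own helper names; the dataset is freshly simulated, so the numbers will not reproduce Figure 1 exactly, and the O(n³) cost is only acceptable for small n.

```python
import numpy as np
from itertools import combinations

def rdepth(theta, x, y):
    # empirical regression depth of y = theta[0]*x + theta[1] (p = 2)
    r = y - (theta[0] * x + theta[1])
    xs = np.sort(np.unique(x))
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    depth = len(x)
    for v in cands:
        left, right = x < v, x > v
        depth = min(depth,
                    np.sum((r >= 0) & left) + np.sum((r <= 0) & right),
                    np.sum((r <= 0) & left) + np.sum((r >= 0) & right))
    return depth

def deepest_line(x, y):
    # maximize the empirical depth over all lines through two observations
    best_depth, best_theta = -1, None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        cand = (slope, y[i] - slope * x[i])
        d = rdepth(cand, x, y)
        if d > best_depth:
            best_depth, best_theta = d, cand
    return best_theta, best_depth

# data in the spirit of Figure 1a: mean (4, 2)', sigma = (4, 3), rho = 0.8
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([4.0, 2.0], [[16.0, 9.6], [9.6, 9.0]], size=50)
theta, depth = deepest_line(sample[:, 0], sample[:, 1])
```

The maximal depth found is at least ⌈n/3⌉ = 17 (a known finite-sample guarantee from Rousseeuw and Hubert [6]), and for strongly correlated data it comes close to n/2, with the slope near the population value ρσ₂/σ₁ = 0.6.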
2. FISHER CONSISTENCY
We first define a probability-based distance between fits.

DEFINITION 3. For every H on ℝ^p and any hyperplanes θ₁ and θ₂ we define d_H(θ₁, θ₂) = H(A(θ₁, θ₂)), where A(θ₁, θ₂) = {(x, y); x ∈ ℝ^{p−1} and y ∈ [(x′, 1)θ₁, (x′, 1)θ₂]} is the double wedge formed by the hyperplanes θ₁ and θ₂.
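For p = 2 the double-wedge mass d_H can be estimated by Monte Carlo, and the metric properties proved in Lemma 1 below can be checked directly on a sample: the estimate is exactly symmetric, and it satisfies the triangle inequality on any fixed sample because the interval [(x′, 1)θ₁, (x′, 1)θ₃] is contained in the union of the other two intervals pointwise in x. A sketch in our own notation:

```python
import numpy as np

def dH(theta1, theta2, sample):
    """Monte Carlo estimate of d_H(theta1, theta2): the fraction of the
    sample inside the double wedge between two lines (p = 2)."""
    x, y = sample[:, 0], sample[:, 1]
    f1 = theta1[0] * x + theta1[1]
    f2 = theta2[0] * x + theta2[1]
    return np.mean((np.minimum(f1, f2) <= y) & (y <= np.maximum(f1, f2)))

rng = np.random.default_rng(0)
# bivariate gaussian as in Figure 1: mean (4, 2)', sigma = (4, 3), rho = 0.8
sample = rng.multivariate_normal([4.0, 2.0], [[16.0, 9.6], [9.6, 9.0]], size=5000)

# the three lines discussed around Figure 1
t1, t2, t3 = (0.6, -0.4), (0.6, 4.6), (-2.0, 6.0)
```

On this sample, dH(t1, t1, sample) is 0, dH is symmetric in its two line arguments, and dH(t1, t3, sample) never exceeds dH(t1, t2, sample) + dH(t2, t3, sample).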
LEMMA 1. For every H on ℝ^p with density h > 0, the function d_H is a metric on ℝ^p.

Proof. For every θ ∈ ℝ^p it clearly holds that d_H(θ, θ) = 0. For every θ₁ and θ₂ we see that d_H(θ₁, θ₂) = d_H(θ₂, θ₁), and that d_H(θ₁, θ₂) = 0 implies θ₁ = θ₂ since h > 0. Also the triangle inequality d_H(θ₁, θ₃) ≤ d_H(θ₁, θ₂) + d_H(θ₂, θ₃) holds, because for every x ∈ ℝ^{p−1} we have [(x′, 1)θ₁, (x′, 1)θ₃] ⊂ [(x′, 1)θ₁, (x′, 1)θ₂] ∪ [(x′, 1)θ₂, (x′, 1)θ₃], hence A(θ₁, θ₃) ⊂ A(θ₁, θ₂) ∪ A(θ₂, θ₃). ∎

LEMMA 2. For every H ∈ ℋ and any θ it holds that rdepth(θ, H) = 1/2 − d_H(T*(H), θ).
Proof. First note that for every θ it holds that d_H(θ, T*(H)) ≤ 1/2. This can be seen as follows. Since rdepth(T*(H), H) = 1/2, the probability mass passed by T*(H) when tilting it until it is vertical is always exactly 1/2. If we tilt T*(H) around the intersection of T*(H) and θ so that it passes θ until it is vertical, then A(θ, T*(H)) is part of the region passed by T*(H). Therefore H(A(θ, T*(H))) ≤ 1/2, hence d_H(θ, T*(H)) ≤ 1/2. Moreover, if we tilt θ around this intersection so that it does not pass T*(H), then the amount of probability mass passed by θ is exactly 1/2 − d_H(T*(H), θ), hence rdepth(θ, H) ≤ 1/2 − d_H(T*(H), θ).

For p = 2 dimensions, take a base point u different from the base point v corresponding to the intersection of θ and T*(H), as in Fig. 2. If we tilt T*(H) at u then we pass exactly 1/2 of the probability mass. If we tilt θ at u so that it does not pass T*(H) then we pass probability mass 1/2 + H(C) − H(D), where C = {(x, y); x ∈ [min(u, v), max(u, v)] and y ∈ [(x, 1)θ, (x, 1)T*(H)]} and D = {(x, y); x ∉ [min(u, v), max(u, v)] and y ∈ [(x, 1)θ, (x, 1)T*(H)]}. If we tilt θ at u so that it passes T*(H) then we pass probability mass 1/2 − H(C) + H(D). Since A(θ, T*(H)) = C ∪ D, the minimal amount of probability mass passed by θ when u ≠ v is higher than 1/2 − d_H(T*(H), θ), hence rdepth(θ, H) = 1/2 − d_H(T*(H), θ).

For p = 3 dimensions, take a base line U = (u₁, u₂) different from the base line V = (v₁, v₂) corresponding to the intersection of θ and T*(H). If we tilt θ at U so that it does not pass T*(H) then we pass probability mass
FIG. 2. Example of base points v and u ≠ v for p = 2 with the corresponding regions C and D = D₁ ∪ D₂.
1/2 + H(C) − H(D), with C = {(x, y, z); x ∈ ℝ, y ∉ [min(v₁x + v₂, u₁x + u₂), max(v₁x + v₂, u₁x + u₂)] and z ∈ [(x, y, 1)θ, (x, y, 1)T*(H)]} and D = {(x, y, z); x ∈ ℝ, y ∈ [min(v₁x + v₂, u₁x + u₂), max(v₁x + v₂, u₁x + u₂)] and z ∈ [(x, y, 1)θ, (x, y, 1)T*(H)]}. If we tilt θ at U so that it passes T*(H), then we pass probability mass 1/2 − H(C) + H(D). Since A(θ, T*(H)) = C ∪ D, the minimal amount of probability mass passed by θ when U ≠ V is higher than 1/2 − d_H(T*(H), θ), hence rdepth(θ, H) = 1/2 − d_H(T*(H), θ). This construction can be generalized for p > 3. ∎
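Lemma 2 can be illustrated numerically for a model where T*(H) is known. Below we simulate y = 0.6x + gaussian noise, so med_H(y | x) = 0.6x and the deepest fit is the true line (0.6, 0)′ (this is the Fisher consistency established in Theorem 1); both sides of Lemma 2 are then estimated from a large sample. The helper names and tolerance are our own choices, and the equality only holds up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(0.0, 2.0, n)
y = 0.6 * x + rng.normal(0.0, 1.0, n)   # med(y | x) = 0.6 x, so T*(H) = (0.6, 0)'

def rdepth_frac(theta, x, y):
    # empirical regression depth as a fraction of the sample (p = 2)
    r = y - (theta[0] * x + theta[1])
    xs = np.sort(x)
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    best = len(x)
    for v in cands:
        left, right = x < v, x > v
        best = min(best,
                   np.sum((r >= 0) & left) + np.sum((r <= 0) & right),
                   np.sum((r <= 0) & left) + np.sum((r >= 0) & right))
    return best / len(x)

def wedge_frac(t1, t2, x, y):
    # empirical double-wedge mass d_H(t1, t2)
    f1, f2 = t1[0] * x + t1[1], t2[0] * x + t2[1]
    return np.mean((np.minimum(f1, f2) <= y) & (y <= np.maximum(f1, f2)))

tstar, theta = (0.6, 0.0), (0.2, 0.5)
lhs = rdepth_frac(theta, x, y)
rhs = 0.5 - wedge_frac(tstar, theta, x, y)
# Lemma 2 predicts lhs ~ rhs, up to sampling error
```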
From this proof it follows that the best base point (in general, the best base hyperplane in ℝ^{p−1}) for tilting a fit θ is the x-projection of the intersection of θ with T*(H), and the direction in which to tilt θ is such that θ does not pass T*(H).

The next theorem shows that the deepest regression T*(H) is a Fisher-consistent estimator of the conditional median med_H(y | x) when H belongs to the large semiparametric model ℋ in which the error distribution is nonparametric.

THEOREM 1 (Fisher-consistency). For every H ∈ ℋ it holds that T*(H) = θ₀, where θ₀ is the parameter for which med_H(y | x) = (x′, 1)θ₀.
Proof. The condition med_H(y | x) = (x′, 1)θ₀ implies rdepth(θ₀, H) = 1/2. Since H has a strictly positive density h, for every θ ≠ θ₀ it holds that d_H(θ, θ₀) = H(A(θ, θ₀)) > 0. Note that Lemma 2 still holds if we replace T*(H) by θ₀, hence we obtain rdepth(θ, H) = 1/2 − d_H(θ, θ₀) < 1/2 = rdepth(θ₀, H), so T*(H) = θ₀. ∎
From the Fisher consistency in Theorem 1 together with the consistency of the deepest regression for θ₀ shown by Bai and He [1], it follows that T*(Hₙ) = Tₙ*(z₁, ..., zₙ) converges to T*(H) in probability when z₁, ..., zₙ are i.i.d. according to H ∈ ℋ. This confirms that T*(H) is the asymptotic value of Tₙ*.
3. BREAKDOWN VALUE

The breakdown value ε*(T, H) of any functional T at H is the smallest fraction of the probability mass of H that needs to be replaced to carry T beyond all bounds (see Hampel et al. [2]). It is defined by

ε*(T, H) = inf{ε; sup_G ‖T((1 − ε)H + εG) − T(H)‖ = ∞}    (2)

where G is an arbitrary distribution on ℝ^p.
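Definition (2) can be probed empirically: contaminate a clean sample with a fraction ε < 1/3 of far-away points and check that the deepest fit stays with the majority. The construction below (a collinear bulk plus 30% remote outliers) is our own illustration, not the contamination scheme used in the proof of Theorem 2.

```python
import numpy as np
from itertools import combinations

def rdepth(theta, x, y):
    # empirical regression depth of y = theta[0]*x + theta[1] (p = 2)
    r = y - (theta[0] * x + theta[1])
    xs = np.sort(np.unique(x))
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    depth = len(x)
    for v in cands:
        left, right = x < v, x > v
        depth = min(depth,
                    np.sum((r >= 0) & left) + np.sum((r <= 0) & right),
                    np.sum((r <= 0) & left) + np.sum((r >= 0) & right))
    return depth

def deepest_line(x, y):
    # maximize the empirical depth over all lines through two observations
    best_depth, best_theta = -1, None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        cand = (slope, y[i] - slope * x[i])
        d = rdepth(cand, x, y)
        if d > best_depth:
            best_depth, best_theta = d, cand
    return best_theta, best_depth

# 35 "good" points exactly on y = x, plus 15 remote outliers (30% < 1/3)
x = np.concatenate([np.arange(35.0), 100.0 + np.arange(15.0)])
y = np.concatenate([np.arange(35.0), np.full(15, -1000.0)])
theta, depth = deepest_line(x, y)
# the deepest fit stays with the majority: theta = (1.0, 0.0), depth = 35
```

With 30% contamination the majority line y = x keeps depth 35, while the line through the outliers only reaches depth 15, so the deepest fit does not break down; Theorem 2 below shows that (asymptotically) a fraction of 1/3 is the exact breakdown point.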
LEMMA 3. If H ∈ ℋ is a distribution on ℝ^p and there exists a value 0 < η < 1/2 and a compact set K ⊂ ℝ^p such that rdepth(θ, H) < η for all θ ∉ K, then ε*(T*, H) ≥ 1/3 − η.

Proof. We will consider contaminated distributions H_ε = (1 − ε)H + εG, where G is any distribution on ℝ^p. The fraction ε is sufficient to cause breakdown only if rdepth(T*(H_ε), H_ε) ≤ ε + η for some G. Indeed, suppose that rdepth(T*(H_ε), H_ε) > ε + η for all G; then we find rdepth(T*(H_ε), H) > η for all G. Therefore T*(H_ε) belongs to K for all G. Since K is compact we have sup_G ‖T*(H_ε) − T*(H)‖ < ∞, which means that ε is not sufficient to cause breakdown. It follows that

ε + η ≥ rdepth(T*(H_ε), H_ε) ≥ rdepth(T*(H), H_ε) ≥ (1 − ε) rdepth(T*(H), H),

and because rdepth(T*(H), H) = 1/2 we obtain ε + η ≥ (1 − ε)/2, hence ε ≥ (1 − 2η)/3 > 1/3 − η. ∎
DEFINITION 4. Let H be a distribution on ℝ^p. For every 0 < k ≤ 1/2 the depth region of depth k is defined by D_k(H) = {θ; rdepth(θ, H) ≥ k} ⊂ ℝ^p.
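Empirical depth regions can be visualized on a grid of candidate fits (slope, intercept). By construction they are nested, D_{k₂}(H) ⊂ D_{k₁}(H) for k₁ < k₂, and Lemmas 4 and 5 below show the population regions are compact. A small sketch of the nesting check (grid ranges and names are our choices):

```python
import numpy as np

def rdepth(theta, x, y):
    # empirical regression depth of y = theta[0]*x + theta[1] (p = 2)
    r = y - (theta[0] * x + theta[1])
    xs = np.sort(x)
    cands = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    best = len(x)
    for v in cands:
        left, right = x < v, x > v
        best = min(best,
                   np.sum((r >= 0) & left) + np.sum((r <= 0) & right),
                   np.sum((r <= 0) & left) + np.sum((r >= 0) & right))
    return best

rng = np.random.default_rng(2)
n = 100
x = rng.normal(0.0, 2.0, n)
y = 0.6 * x + rng.normal(0.0, 1.0, n)

# depth of every fit on a (slope, intercept) grid
slopes = np.linspace(-2.0, 3.0, 21)
intercepts = np.linspace(-5.0, 5.0, 21)
depths = np.array([[rdepth((b, a), x, y) for a in intercepts] for b in slopes])

# grid-cell counts of the empirical depth regions D_k for increasing k
sizes = [np.sum(depths >= k * n) for k in (0.05, 0.15, 0.25, 0.35)]
```

The counts in `sizes` are nonincreasing, reflecting the nesting of the depth regions, and fits far from the data rapidly fall out of the higher-depth regions.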
LEMMA 4. For every H ∈ ℋ and 0 < k ≤ 1/2 the depth region D_k(H) is bounded.

Proof. For any θ₁ and θ₂ in ℝ^p denote the euclidean distance between θ₁ and θ₂ by d_E(θ₁, θ₂). Suppose that sup{d_E(θ, T*(H)); θ ∈ D_k(H)} = ∞. Then there exists a sequence (θⱼ)ⱼ in D_k(H) with d_E(θⱼ, T*(H)) → ∞. This implies d_H(θⱼ, T*(H)) → 1/2, since we have to pass half of the probability mass to take ‖θⱼ‖ to infinity. But because d_H(θ, T*(H)) ≤ 1/2 − k for every fit θ in D_k(H), the sequence (θⱼ)ⱼ cannot stay in D_k(H). ∎

LEMMA 5. For every H ∈ ℋ and 0 < k ≤ 1/2 the depth region D_k(H) is closed.

Proof. Suppose that θ belongs to the closure of D_k(H) ⊂ ℝ^p. Then d_E(θ, D_k(H)) = 0, so there exists a sequence (θⱼ)ⱼ in D_k(H) with d_E(θ, θⱼ) → 0. Because the metrics d_E and d_H induce the same topology on ℝ^p, this implies d_H(θ, θⱼ) → 0. Now
rdepth(θ, H) = 1/2 − d_H(T*(H), θ) ≥ 1/2 − d_H(T*(H), θⱼ) − d_H(θⱼ, θ),

and since d_H(θⱼ, θ) → 0 while 1/2 − d_H(T*(H), θⱼ) = rdepth(θⱼ, H) ≥ k for every j, it follows that rdepth(θ, H) ≥ k. Hence θ ∈ D_k(H), so D_k(H) is closed. ∎
THEOREM 2. For any dimension p ≥ 2 and any distribution H in ℋ it holds that ε*(T*, H) = 1/3.
Proof. Lemmas 4 and 5 show that for every 0 < k < 1/2 there exists a compact set D_k(H) in ℝ^p with rdepth(θ, H) < k for all θ ∉ D_k(H). Therefore it follows from Lemma 3 that ε*(T*, H) ≥ 1/3 − k for every k > 0, so ε*(T*, H) ≥ 1/3.
To prove that ε*(T*, H) ≤ 1/3 we show that T* can be made to break down by moving 1/3 of the probability mass arbitrarily far away. Let us first consider the case p = 2. Because of invariance, we may assume w.l.o.g. that med_H(x) = 0. Take H_ε = (1 − ε)H + (ε/2)Δ_z + (ε/2)Δ_{−z}, where z = (x, y)′. Denote A_{v,θ} = (y < (x, 1)θ and x < v), B_{v,θ} = (y > (x, 1)θ and x > v), C_{v,θ} = (y > (x, 1)θ and x < v), and D_{v,θ} = (y < (x, 1)θ and x > v).

(ii) Next, consider a fit θ = (b, 0)′ through the origin with slope b > y/x, as in Figure 6b. For 0 ≤ v < x we now obtain, in the same way as for (i), that

H_ε(A_{v,θ}) + H_ε(B_{v,θ}) = (1 − ε) H(A_{v,θ}) + (1 − ε) H(B_{v,θ})
= (1 − ε)/2 + (1 − ε) k(v, b)

and

H_ε(C_{v,θ}) + H_ε(D_{v,θ}) = ε/2 + (1 − ε) H(C_{v,θ}) + ε/2 + (1 − ε) H(D_{v,θ})
                            = (1 + ε)/2 − (1 − ε) k(v, b),

but now k(v, b) is negative and increasing in v. Therefore

min_v {H_ε(A_{v,θ}) + H_ε(B_{v,θ}), H_ε(C_{v,θ}) + H_ε(D_{v,θ})} ≤ (1 − ε)[1/2 + k(0, b)] ≤ (1 − ε)/2.
Hence also any fit θ = (b, 0)′ with b > y/x has depth at most (1 − ε)/2.

(iii)
If b = y/x then for 0 ≤ v we find

H_ε(A_{v,θ}) + H_ε(B_{v,θ}) = ε/2 + (1 − ε)[H(A_{v,θ}) + H(B_{v,θ})] = 1/2 + (1 − ε) k(v, b)

and

H_ε(C_{v,θ}) + H_ε(D_{v,θ}) = ε/2 + (1 − ε)[H(C_{v,θ}) + H(D_{v,θ})] = 1/2 − (1 − ε) k(v, b)
where k(v, b) is negative and increasing in v. Therefore

min_v {H_ε(A_{v,θ}) + H_ε(B_{v,θ}), H_ε(C_{v,θ}) + H_ε(D_{v,θ})} ≤ 1/2 + (1 − ε) k(0, y/x) < 1/2

since k(0, y/x) < 0.

(iv) Finally, consider a line θ through the origin with slope 0 < b < y/x. The function k(v, b) is now negative and increasing in v. For v > x we obtain

H_ε(A_{v,θ}) + H_ε(B_{v,θ}) = ε/2 + (1 − ε) H(A_{v,θ}) + (1 − ε) H(B_{v,θ}) = 1/2 + (1 − ε) k(v, b)

and

H_ε(C_{v,θ}) + H_ε(D_{v,θ}) = ε/2 + (1 − ε) H(C_{v,θ}) + (1 − ε) H(D_{v,θ}) = 1/2 − (1 − ε) k(v, b),

so min_{v > x} {H_ε(A_{v,θ}) + H_ε(B_{v,θ}), H_ε(C_{v,θ}) + H_ε(D_{v,θ})} = lim_{v→x} (H_ε(A_{v,θ}) + H_ε(B_{v,θ})) = 1/2 + (1 − ε) k(x, b). For 0 ≤ v