The Deepest Regression Method

Stefan Van Aelst¹, Peter J. Rousseeuw, Mia Hubert², and Anja Struyf¹

January 10, 2000

Department of Mathematics and Computer Science, U.I.A., Universiteitsplein 1, B-2610 Antwerp, Belgium

http://win-www.uia.ac.be/u/statis/

Abstract

Deepest regression (DR) is a method for linear regression introduced by Rousseeuw and Hubert [16]. The DR method is defined as the fit with largest regression depth relative to the data. In this paper we show that DR is a robust method, with a breakdown value that converges almost surely to 1/3 in any dimension. We construct an approximate algorithm for fast computation of DR in more than two dimensions. From the distribution of the regression depth we derive tests for the true unknown parameters in the linear regression model. Moreover, we construct simultaneous confidence regions based on bootstrapped estimates. We also use the maximal regression depth to construct a test for linearity versus convexity/concavity. We apply DR to polynomial regression, and show that the deepest polynomial regression has breakdown value 1/3. Finally, DR is applied to the Michaelis-Menten model of enzyme kinetics, where it resolves a long-standing ambiguity.

1 Introduction

Consider a dataset $Z_n = \{z_i = (x_{i1}, \ldots, x_{i,p-1}, y_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^p$. In linear regression we want to fit a hyperplane of the form $y = \theta_1 x_1 + \ldots + \theta_{p-1} x_{p-1} + \theta_p$ with $\theta = (\theta_1, \ldots, \theta_p)^t \in \mathbb{R}^p$. We denote the $x$-part of each data point $z_i$ by $x_i = (x_{i1}, \ldots, x_{i,p-1})^t \in \mathbb{R}^{p-1}$. The residuals of $Z_n$ relative to the fit $\theta$ are denoted as $r_i = r_i(\theta) = y_i - \theta_1 x_{i1} - \cdots - \theta_{p-1} x_{i,p-1} - \theta_p$. To measure the quality of a fit, Rousseeuw and Hubert [16] introduced the notion of regression depth.

¹ Research Assistant with the FWO, Belgium.
² Postdoctoral Fellow at the FWO, Belgium.


Definition 1. The regression depth of a candidate fit $\theta \in \mathbb{R}^p$ relative to a dataset $Z_n \subset \mathbb{R}^p$ is given by

$$\mathrm{rdepth}(\theta, Z_n) = \min_{u, v}\ \big\{\#(r_i(\theta) \geq 0 \text{ and } x_i^t u < v) + \#(r_i(\theta) \leq 0 \text{ and } x_i^t u > v)\big\} \qquad (1)$$

where the minimum is over all unit vectors $u = (u_1, \ldots, u_{p-1})^t \in \mathbb{R}^{p-1}$ and all $v \in \mathbb{R}$ with $x_i^t u \neq v$ for all $(x_i^t, y_i) \in Z_n$.

The regression depth of a fit $\theta \in \mathbb{R}^p$ relative to the dataset $Z_n \subset \mathbb{R}^p$ is thus the smallest number of observations that need to be passed when tilting $\theta$ until it becomes vertical. Therefore, we always have $0 \leq \mathrm{rdepth}(\theta, Z_n) \leq n$. In the special case of $p = 1$ there are no $x$-values, and $Z_n$ is a univariate dataset. For any $\theta \in \mathbb{R}$ we then have $\mathrm{rdepth}(\theta, Z_n) = \min(\#\{y_i \geq \theta\}, \#\{y_i \leq \theta\})$, which is the 'rank' of $\theta$ when we rank from the outside inwards. For any $p \geq 1$, the regression depth of $\theta$ measures how balanced the dataset $Z_n$ is about the linear fit determined by $\theta$. It can easily be verified that regression depth is scale invariant, regression invariant and affine invariant according to the definitions in Rousseeuw and Leroy ([17, page 116]). Based on the notion of regression depth, Rousseeuw and Hubert [16] introduced the deepest regression estimator (DR) for robust linear regression. In Section 2 we give the definition of DR and its basic properties. We show that DR is a robust method with breakdown value that converges almost surely to 1/3 in any dimension, when the good data come from a large semiparametric model. Section 3 proposes the fast approximate algorithm MEDSWEEP to compute DR in higher dimensions ($p \geq 3$). Based on the distribution of the regression depth function, inference for the parameters is derived in Section 4. Tests and confidence regions for the true unknown parameters $\tilde\theta_1, \ldots, \tilde\theta_p$ are constructed. We also propose a test for linearity versus convexity of the dataset $Z_n$ based on the maximal depth of $Z_n$. Applications of deepest regression to specific models are given in Section 5. First we consider polynomial regression, for which we update the definition of regression depth and then compute the deepest regression accordingly. We show that the deepest polynomial regression always has breakdown value at least 1/3. We also apply the deepest regression to the Michaelis-Menten model, where it provides a solution to the problem of ambiguous results obtained from the two commonly used parametrizations.
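To make Definition 1 concrete, the following minimal Python sketch (ours; not the $O(n \log n)$ algorithm of Rousseeuw and Hubert [16]) evaluates the regression depth of a line in simple regression ($p = 2$) by letting the split $v$ run over values strictly between the observed $x$-values:

```python
import numpy as np

def rdepth_line(theta, x, y):
    """Naive regression depth of the line y = theta[0]*x + theta[1],
    following Definition 1 with p = 2 (illustrative sketch, O(n^2))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = y - theta[0] * x - theta[1]            # residuals r_i(theta)
    depth = len(x)
    xs = np.unique(x)
    # candidate splits v: between consecutive x-values and outside their range
    cuts = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    for v in cuts:
        for u in (+1.0, -1.0):                 # the only "unit vectors" when p = 2
            count = np.sum((r >= 0) & (u * x < u * v)) + np.sum((r <= 0) & (u * x > u * v))
            depth = min(depth, count)
    return int(depth)
```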


2 Definition and properties of deepest regression

Definition 2. In $p$ dimensions the deepest regression estimator $DR(Z_n)$ is defined as the fit $\theta$ with maximal $\mathrm{rdepth}(\theta, Z_n)$, that is

$$DR(Z_n) = \mathop{\mathrm{argmax}}_{\theta}\ \mathrm{rdepth}(\theta, Z_n). \qquad (2)$$

(See Rousseeuw and Hubert [16].) Since the regression depth of a fit $\theta$ can only increase if we slightly tilt the fit until it passes through $p$ observations (while not passing any other observations), it suffices to consider all fits through $p$ data points in Definition 2. If several of these fits have the same (maximal) regression depth, we take their average. Note that no distributional assumptions are made to define the deepest regression estimator of a dataset. The DR is a regression, scale, and affine equivariant estimator. For a univariate dataset, the deepest regression is its median. The DR thus generalizes the univariate median to linear regression. In the population case, let $(x^t, y)$ be a random $p$-dimensional variable, with distribution $H$ on $\mathbb{R}^p$. Then $\mathrm{rdepth}(\theta, H)$ is defined as the smallest amount of probability mass that needs to be passed when tilting $\theta$ in any way until it is vertical. The deepest regression $DR(H)$ is the fit $\theta$ with maximal depth. The natural setting of deepest regression is a large semiparametric model $\mathcal{H}$ in which the functional form is parametric and the error distribution is nonparametric. Formally, $\mathcal{H}$ consists of all distributions $H$ on $\mathbb{R}^p$ that satisfy the following conditions: $H$ has a strictly positive density and there exists a $\tilde\theta \in \mathbb{R}^p$ with

$$\mathrm{med}_H(y \mid x) = (x^t, 1)\tilde\theta. \qquad (H)$$

Note that this model allows for skewed error distributions and heteroscedasticity. Van Aelst and Rousseeuw [22] have shown that the DR is a Fisher-consistent estimator of $\mathrm{med}(y \mid x)$ when the good data come from the natural semiparametric model $\mathcal{H}$. The asymptotic distribution of the deepest regression was obtained by He and Portnoy [9] in simple regression, and by Bai and He [2] in multiple regression. Figure 1 shows the Educational Spending data, obtained from the DASL library at http://lib.stat.cmu.edu/DASL. This dataset lists the expenditures per pupil versus the average salary paid to teachers for $n = 51$ regions in the US. The fits $\theta_1 = (0.17, 0.6)^t$ and $\theta_2 = (-0.3, 12)^t$ both have regression depth 2, and the deepest regression $DR(Z_n) = (0.17, -0.51)^t$ is the average of fits with depth 23.
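Since it suffices to consider fits through $p$ observations, a brute-force deepest regression for simple regression can be sketched by enumerating all lines through two data points and averaging those of maximal depth. This is only an illustration of Definition 2 (the routine below is ours and far slower than the exact $O(n \log^2 n)$ algorithm of van Kreveld et al. [24]); it reuses the rdepth_line sketch from Section 1:

```python
from itertools import combinations
import numpy as np

def deepest_line(x, y):
    """Brute-force DR(Z_n) for p = 2: enumerate lines through two observations
    and average all fits attaining the maximal regression depth."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    best_depth, best_fits = -1, []
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                              # vertical line, not a valid fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        d = rdepth_line((slope, intercept), x, y)
        if d > best_depth:
            best_depth, best_fits = d, [(slope, intercept)]
        elif d == best_depth:
            best_fits.append((slope, intercept))
    return np.mean(best_fits, axis=0), best_depth
```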


Figure 1: Educational spending data, with $n = 51$ observations in $p = 2$ dimensions. The lines $\theta_1$ and $\theta_2$ both have depth 2, and the deepest regression $DR(Z_n)$ is the average of fits with depth 23.

Figure 1 illustrates that lines with high regression depth fit the data better than lines with low depth. The regression depth thus measures the quality of a fit, which motivates our interest in the deepest regression $DR(Z_n)$. We define the finite-sample breakdown value $\varepsilon_n$ of an estimator $T_n$ as the smallest fraction of contamination that can be added to any dataset $Z_n$ such that $T_n$ explodes (see also Donoho and Gasko [6]). Let us consider an actual dataset $Z_n$. Denote by $Z_{n+m}$ the dataset formed by adding $m$ observations to $Z_n$. Then the breakdown value is defined as

$$\varepsilon_n(T_n, Z_n) = \min\Big\{\frac{m}{n+m};\ \sup_{Z_{n+m}} \|T_{n+m}(Z_{n+m}) - T_n(Z_n)\| = \infty\Big\}.$$

The breakdown value of the deepest regression is always positive, but it can be as low as $1/(p+1)$ when the original data are themselves peculiar (Rousseeuw and Hubert [16]). Fortunately, it turns out that if the original data are drawn from the model, then the breakdown value converges almost surely to $1/3$ in any dimension $p$.

Theorem 1. Let $Z_n = \{(x_1^t, y_1), \ldots, (x_n^t, y_n)\}$ be a sample from a distribution $H$ on $\mathbb{R}^p$ ($p \geq 3$) with $H \in \mathcal{H}$. Then

$$\varepsilon_n(DR, Z_n) \xrightarrow[n \to \infty]{a.s.} \frac{1}{3}. \qquad (3)$$

(All proofs are given in the Appendix.) Theorem 1 says that the deepest regression does not break down when at least 67% of the data are generated from the semiparametric model $\mathcal{H}$ while the remaining data (i.e., up to 33% of the points) may be anything. This result holds in any dimension. The DR is thus robust to leverage points as well as to vertical outliers. Moreover, Theorem 1 illustrates that the deepest regression is different from $L_1$ regression, which is defined as $L_1(Z_n) = \mathrm{argmin}_\theta \sum_{i=1}^n |r_i(\theta)|$. Note that $L_1$ is another generalization of the univariate median to regression, but with zero breakdown value due to its vulnerability to leverage points. In simple regression, Van Aelst and Rousseeuw [22] derived the influence function of the DR for elliptical distributions and computed the corresponding sensitivity functions. The influence functions of the DR slope and intercept are piecewise smooth and bounded, meaning that an outlier cannot affect DR too much, and the corresponding sensitivity functions show that this already holds for small sample sizes. The deepest regression also inherits a monotone equivariance property from the univariate median, which does not hold for $L_1$ or for other estimators such as least squares, least trimmed squares (Rousseeuw [14]) or S-estimators (Rousseeuw and Yohai [20]). By definition, the regression depth only depends on the $x_i$ and the signs of the residuals. This allows for monotone transformations of the response $y_i$: Assume the functional model is

$$y = g(\theta_1 x_1 + \cdots + \theta_{p-1} x_{p-1} + \theta_p) \qquad (4)$$

with $g$ a strictly monotone link function. Typical examples of $g$ include the logarithmic, the exponential, the square root, the square and the reciprocal transformation. The regression depth of a nonlinear fit (4) is defined as in (1) but with $r_i(\theta) = y_i - g(\theta_1 x_{i1} + \cdots + \theta_{p-1} x_{i,p-1} + \theta_p)$. Due to the monotone equivariance, the deepest regression fit can be obtained as follows. First we put $\tilde y_i = g^{-1}(y_i)$ and determine the deepest linear regression $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_p)^t$ of the transformed data $(x_i^t, \tilde y_i)$. Then we can backtransform the deepest linear regression $\hat\theta$, yielding the deepest nonlinear regression fit $y = g(\hat\theta_1 x_1 + \cdots + \hat\theta_{p-1} x_{p-1} + \hat\theta_p)$ to the original data.
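As a small illustration of this transform-and-backtransform recipe (assuming the illustrative deepest_line sketch above; the helper name and the exponential link in the usage comment are our own choices):

```python
import numpy as np

def deepest_fit_with_link(x, y, g_inverse):
    """Deepest fit of y = g(theta_1*x + theta_2) in simple regression via the
    monotone equivariance: transform the response, fit the linear DR, backtransform.
    Relies on the illustrative deepest_line sketch above."""
    y_tilde = g_inverse(np.asarray(y, float))        # y~_i = g^{-1}(y_i)
    (slope, intercept), depth = deepest_line(x, y_tilde)
    return (slope, intercept), depth                 # fitted curve: y = g(slope*x + intercept)

# Example with an exponential link g(t) = exp(t), so g^{-1} = log:
# (slope, intercept), depth = deepest_fit_with_link(x, y, np.log)
```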


3 Computation

In $p = 2$ dimensions the regression depth can be computed in $O(n \log n)$ time with the algorithm described in Rousseeuw and Hubert [16]. To compute the regression depth of a fit in $p = 3$ or $p = 4$ dimensions, Rousseeuw and Struyf [18] constructed exact algorithms with time complexity $O(n^{p-1} \log n)$. For datasets with large $n$ and/or $p$ they also give an approximate algorithm that computes the regression depth of a fit in $O(mp^3 + mpn + mn \log n)$ time. Here $m$ is the number of $(p-1)$-subsets in $x$-space used in the algorithm. The algorithm is exact when all $m = \binom{n}{p-1}$ such subsets are considered. A naive exact algorithm for the deepest regression computes the regression depth of all $O(n^p)$ fits through $p$ observations and keeps the one(s) with maximal depth. This yields a total time complexity of $O(n^{2p-1} \log n)$, which is very slow for large $n$ and/or high $p$. Even if we use the approximate algorithm of Rousseeuw and Struyf [18] to compute the depth of each fit, the time complexity remains very high. In simple regression, collaborative work with several specialists of computational geometry yielded an exact algorithm of complexity $O(n \log^2 n)$, i.e. little more than linear time (van Kreveld et al. [24]). To speed up the computation in higher dimensions, we will now construct the fast algorithm MEDSWEEP to approximate the deepest regression.

The MEDSWEEP algorithm is based on regression through the origin. For regression through the origin, Rousseeuw and Hubert [16] defined the regression depth (denoted as $\mathrm{rdepth}_0$) by requiring that $v = 0$ in Definition 1. Therefore, the $\mathrm{rdepth}_0(\theta)$ of a fit $\theta \in \mathbb{R}^p$ relative to a dataset $Z_n \subset \mathbb{R}^{p+1}$ is again the smallest number of observations that needs to be passed when tilting $\theta$ in any way until it becomes vertical. Rousseeuw and Hubert [16] have shown that in the special case of a regression line through the origin ($p = 1$), the deepest regression ($DR_o$) of the dataset $Z_n = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ is given by the slope

$$DR_o(Z_n) = \mathop{\mathrm{med}}_{i=1}^{n} \frac{y_i}{x_i} \qquad (5)$$

where observations with $x_i = 0$ are not used. This estimator has minimax bias (Martin, Yohai and Zamar [11]) and can be computed in $O(n)$ time. We propose a sweeping method based on the estimator (5) to approximate the deepest regression in higher dimensions. Suppose we have a dataset $Z_n = \{(x_{i1}, \ldots, x_{i,p-1}, y_i);\ i = 1, \ldots, n\}$. We arrange the $n$ observations as rows in an $n \times p$ matrix $X = [X_1, \ldots, X_{p-1}, Y]$ where the $X_i$ and $Y$ are $n$-dimensional column vectors.

Step 1: In the first step we construct the sweeping variables $X_1^S, \ldots, X_{p-1}^S$. We start with $X_1^S = X_1$. To obtain $X_j^S$ ($j > 1$) we successively sweep $X_1^S, \ldots, X_{j-1}^S$ out of the original variable $X_j$. In general, to sweep $X_k^S$ out of $X_l$ ($k < l$) we compute

$$\hat\theta_{lk} = \mathop{\mathrm{med}}_{j \in J} \frac{x_{jl} - \mathop{\mathrm{med}}_{i=1}^{n} x_{il}}{x_{jk}^S - \mathop{\mathrm{med}}_{i=1}^{n} x_{ik}^S} \qquad (6)$$

where $J$ is the collection of indices for which the denominator is different from zero, and then we replace $X_l$ by $X_l - \hat\theta_{lk} X_k^S$. If $k < l-1$ we can now sweep the next variable $X_{k+1}^S$ out of this new $X_l$. If $k = l-1$ then $X_l^S = X_l$. Thus we obtain the sweeping variables

$$X_1^S = X_1$$
$$X_2^S = X_2 - \hat\theta_{21} X_1^S$$
$$\vdots$$
$$X_{p-1}^S = X_{p-1} - \hat\theta_{p-1,1} X_1^S - \cdots - \hat\theta_{p-1,p-2} X_{p-2}^S.$$

Step 2: In the second step we successively sweep $X_1^S, \ldots, X_{p-1}^S$ out of $Y$. Put $Y^0 = Y$. For $k = 1, \ldots, p-1$ we now compute

$$\hat\theta_k = \mathop{\mathrm{med}}_{j \in J} \frac{y_j^{k-1} - \mathop{\mathrm{med}}_{i=1}^{n} y_i^{k-1}}{x_{jk}^S - \mathop{\mathrm{med}}_{i=1}^{n} x_{ik}^S} \qquad (7)$$

with $J$ as before, and we replace the original $Y^{k-1}$ by $Y^k = Y^{k-1} - \hat\theta_k X_k^S$. Thus we obtain

$$Y^S = Y - \hat\theta_1 X_1^S - \cdots - \hat\theta_{p-1} X_{p-1}^S. \qquad (8)$$

The process (7)-(8) is iterated until convergence is reached. In each iteration step all the coefficients $\hat\theta_1, \ldots, \hat\theta_{p-1}$ are updated. The maximal number of iterations has been set to 100 because experiments have shown that even when convergence is slow, after 100 iteration steps we are already close to the optimum. After the iteration process, we take the median of $Y^S$ to be the intercept $\hat\theta_p$.

Step 3: By backtransforming $\hat\theta_1, \ldots, \hat\theta_p$ we obtain the regression coefficients $(\hat\theta_1^S, \ldots, \hat\theta_p^S)^t$ corresponding to the original variables $X_1, \ldots, X_{p-1}, Y$. The obtained fit $\hat\theta^S$ is then slightly adjusted until it passes through $p$ observations, because we know that this can only improve the depth of the fit. We start by making the smallest absolute residual zero. Then for each of the directions $X_1, \ldots, X_{p-1}$ we tilt the fit in that direction until it passes an observation while not changing the sign of any other residual. This yields the fit $\hat\theta$.

Step 4: In the last step we approximate the depth of the final fit $\hat\theta$. Let $u_1^S, \ldots, u_{p-1}^S$ be the directions corresponding to the variables $X_1^S, \ldots, X_{p-1}^S$; then we compute the minimum over $u \in \{e_1, -e_1, \ldots, e_{p-1}, -e_{p-1}, u_1^S, -u_1^S, \ldots, u_{p-1}^S, -u_{p-1}^S\}$ instead of over all unit vectors $u \in \mathbb{R}^{p-1}$ in the right hand side of expression (1).

Since computing the median takes $O(n)$ time, the first step of the algorithm needs $O(p^2 n)$ time and the second step takes $O(hpn)$ time where $h$ is the number of iterations. The adjustments in step 3 also take $O(p^2 n)$ time, and computing the approximate depth in the last step can be done in $O(pn \log n)$ time. The time complexity of the MEDSWEEP algorithm thus becomes $O(p^2 n + hpn + pn \log n)$, which is very low.

To measure the performance of our algorithm we carried out the following simulation. For different values of $p$ and $n$ we generated $m = 10{,}000$ samples $Z^{(j)} = \{(x_{i1}, \ldots, x_{i,p-1}, y_i);\ i = 1, \ldots, n\}$ from the standard gaussian distribution. For each of these samples we computed the deepest regression $(\hat\theta_1^{(j)}, \ldots, \hat\theta_p^{(j)})^t$ with the MEDSWEEP algorithm and measured the total time needed for these 10,000 estimates. For each $n$ and $p$ we also computed the bias of the intercept, which is the average of the 10,000 intercepts, and the bias of the vector of the slopes, which we measure by

$$b(\hat\theta_1, \ldots, \hat\theta_{p-1}) = \sqrt{\frac{1}{p-1}\Big( (\mathop{\mathrm{ave}}_j \hat\theta_1^{(j)})^2 + \cdots + (\mathop{\mathrm{ave}}_j \hat\theta_{p-1}^{(j)})^2 \Big)}. \qquad (9)$$

We also give the mean squared error of the vector of the slopes, given by

$$\mathrm{MSE}(\hat\theta_1, \ldots, \hat\theta_{p-1}) = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{p-1} \sum_{i=1}^{p-1} (\hat\theta_i^{(j)} - \theta_i)^2 \qquad (10)$$

where the true values $\theta_i$, $i = 1, \ldots, p-1$, equal zero, and the mean squared error of the intercept, given by $\frac{1}{m}\sum_{j=1}^{m} (\hat\theta_p^{(j)})^2$. Table 1 lists the bias and mean squared error of the vector of the slopes, while the bias and mean squared error of the intercept are given in Table 2. Note that the bias and mean squared error of the slope vector and the intercept are low, and that they decrease with increasing $n$. From Tables 1 and 2 we also see that the mean squared error does not seem to increase with $p$.

Table 1: Bias (9) and mean squared error (10) of the DR slope vector, obtained by generating m = 10,000 standard gaussian samples for each n and p. The DR fits were obtained with the MEDSWEEP algorithm. The results for the bias have to be multiplied by 10^{-4} and the results for the MSE by 10^{-3}.

                p = 2    p = 3    p = 4    p = 5    p = 10
n = 50    bias  18.01    18.27    28.53    22.47    21.42
          MSE   54.55    52.32    52.23    52.54    58.65
n = 100   bias   7.56     8.41    11.16     8.98    12.68
          MSE   24.59    25.32    25.99    26.15    27.17
n = 300   bias   8.11     5.98     6.29     7.39    10.89
          MSE    8.11     8.27     8.26     8.41     8.42
n = 500   bias   0.44     1.68     2.91     9.10     5.49
          MSE    4.93     4.89     5.02     4.93     4.97
n = 1000  bias   6.34     3.58     6.77     3.54     3.98
          MSE    2.47     2.51     2.46     2.46     2.50

Table 3 lists the average time the MEDSWEEP algorithm needed for the computation of the DR, on a Sun SparcStation 20/514. We see that the algorithm is very fast. To illustrate the MEDSWEEP algorithm we generated 50 points in 5 dimensions, according to the model $y = x_1 - x_2 + x_3 - x_4 + 1 + e$ with $x_1, x_2, x_3, x_4$ and $e$ coming from the standard gaussian distribution. The DR fit obtained with MEDSWEEP is $y = 0.98 x_1 - 1.01 x_2 + 0.97 x_3 - 1.00 x_4 + 1.14$ with approximate depth 21. The algorithm needed 14 iterations until convergence. In a second example, we generated 50 points according to the model $y = 100 x_1 + x_2 - 2 x_3 + 3 x_4 - 4 + e$ with standard gaussian $x_1, x_2, x_3, x_4$ and $e$. After 25 iterations the MEDSWEEP algorithm yielded the fit $y = 99.99 x_1 + 0.99 x_2 - 2.03 x_3 + 3.00 x_4 - 3.86$ with approximate depth 21. Note that in both cases the coefficients obtained by the algorithm approximate the true parameters in the model very well. The MEDSWEEP algorithm is available from our website http://win-www.uia.ac.be/u/statis/ where its use is explained.
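As a usage illustration, the snippet below repeats the first generated example with the medsweep sketch given earlier; since that sketch omits Steps 3-4 and uses its own random data, the coefficients will only be approximately those of the published run:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 4))                                   # x1, ..., x4
y = X @ np.array([1.0, -1.0, 1.0, -1.0]) + 1.0 + rng.standard_normal(n)

theta_hat = medsweep(X, y)                                        # sketch from above
print("slopes:", np.round(theta_hat[:-1], 2), "intercept:", round(theta_hat[-1], 2))
```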


Table 2: Bias and mean squared error of the DR intercepts, obtained by generating m = 10,000 standard gaussian samples for each n and p. The DR fits were obtained with the MEDSWEEP algorithm. The results for the bias have to be multiplied by 10^{-4} and the results for the MSE by 10^{-3}.

                p = 2    p = 3    p = 4    p = 5    p = 10
n = 50    bias  -3.04   -48.70    14.38    13.23   -19.64
          MSE   32.89    32.62    35.23    36.72    44.35
n = 100   bias   5.75    -3.32    12.21     9.92    -1.70
          MSE   15.78    16.01    16.92    16.77    18.14
n = 300   bias  -2.71    -7.99    -2.37     3.49     4.54
          MSE    5.25     5.31     5.22     5.23     5.47
n = 500   bias   1.35    -5.53    -4.33    -4.26     2.39
          MSE    3.09     3.15     3.21     3.21     3.18
n = 1000  bias  -3.76    -3.40    -5.10     0.75    -0.01
          MSE    1.56     1.56     1.55     1.58     1.62

4 Inference

4.1 Tests for parameters

In simple regression, the semiparametric model assumptions of condition (H) state that $\mathrm{med}(y \mid x) = \tilde\theta_1 x + \tilde\theta_2$ and that the errors $e_i = y_i - \tilde\theta_1 x_i - \tilde\theta_2$ are independent with $P(e_i > 0) = 1/2 = P(e_i < 0)$, hence $P(e_i = 0) = 0$. Then it is possible to compute $F_n(k) := P(\mathrm{rdepth}(\tilde\theta, Z_n^0) \leq k)$ where $Z_n^0$ has the same $\{x_1, \ldots, x_n\}$ as the actual dataset $Z_n$. By invariance properties,

$$F_n(k) = P(\mathrm{rdepth}(0, \{(x_i, e_i);\ i = 1, \ldots, n\}) \leq k) \qquad (11)$$

where the $e_i$ are i.i.d. from (say) the standard gaussian. Thus we can compute $F_n(k)$ by simulating (11). When there are no ties among the $x_i$ we can even compute $F_n(k)$ explicitly from formula (4.4) in Daniels (1954), yielding

$$F_n(k) = 2(n - 2k) \sum_{j=0}^{j_0} B\big(n, \tfrac{1}{2}\big)\big(n - k + j(n - 2k)\big) \qquad (12)$$

Table 3: Computation time (in seconds) of the MEDSWEEP algorithm for a sample of size n with p dimensions. Each time is an average over 10,000 samples.

            p = 2   p = 3   p = 4   p = 5   p = 10
n = 50      0.017   0.023   0.040   0.057   0.20
n = 100     0.027   0.071   0.15    0.25    0.51
n = 300     0.056   0.21    0.42    0.67    1.38
n = 500     0.093   0.39    0.73    1.12    2.24
n = 1000    0.20    0.66    1.43    2.18    4.42

for $k \leq [(n-1)/2]$, and $F_n(k) = 1$ otherwise. Here $j_0 = [k/(n-2k)]$ and each term is a probability of the binomial distribution $B(n, 1/2)$, which stems from the number of $e_i$ in $\{e_1, \ldots, e_n\}$ with a particular sign. For increasing $n$ we can approximate $B(n, 1/2)$ by a gaussian distribution due to the central limit theorem, so (12) can easily be extended to large $n$. The distribution of the regression depth allows us to test one or several regression coefficients. To test the combined null hypothesis $(\tilde\theta_1, \tilde\theta_2)^t = (0, 0)^t$ we compute $k = \mathrm{rdepth}((0,0)^t, Z_n)$ and the corresponding $p$-value equals $F_n(k)$ and can be computed from (12). Consider the dataset in Figure 2 about $n = 41$ species of animals (Van den Bergh [23]). The plot shows the logarithm of the weight of a newborn versus the logarithm of the weight of its mother. The deepest regression line $DR = (0.86, -2.12)^t$ has depth 19. For this dataset $\mathrm{rdepth}((0,0)^t, Z_n) = 1$, yielding the $p$-value $F_{41}(1) = 0.00000$ which is highly significant. To test the significance of the slope ($H_0: \tilde\theta_1 = 0$) we compute $\max_{\theta_2} \mathrm{rdepth}((0, \theta_2)^t, Z_n)$. This is easy, because we only have to compute the rdepth of all horizontal lines passing through an observation. For the animal dataset, the maximal $\mathrm{rdepth}((0, \theta_2)^t, Z_n)$ equals 5. Therefore the corresponding $p$-value is $P(\mathrm{rdepth}(\tilde\theta, Z_n^0) \leq 5) = F_{41}(5) = 0.00002$, so $H_0$ is rejected. This $p$-value 0.00002 should be interpreted in the same way as the $p$-value associated with $R^2$ or the F-test in LS regression. Analogously, to test $\tilde\theta_2 = 0$ we compute $\max_{\theta_1} \mathrm{rdepth}((\theta_1, 0)^t, Z_n) = 6$ by considering all lines through the origin and an observation, yielding the $p$-value $F_{41}(6) = 0.0001$ which is also highly significant. More generally, we can test the hypothesis $H_0: \tilde\theta_i = 0$ for $i = 1, 2$ by computing the maximal regression depth of all lines with $\theta_i = 0$ that pass through an observation.
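Formula (12) can be evaluated directly with exact binomial probabilities. The following transcription (ours) reproduces the $p$-values used in this section; for large $n$ one would switch to the gaussian approximation mentioned above:

```python
from math import comb

def F_n(k, n):
    """Right-hand side of (12): P(rdepth(theta_tilde, Z_n) <= k) in simple
    regression without ties among the x_i; F_n(k) = 1 for k > (n-1)//2."""
    if k > (n - 1) // 2:
        return 1.0
    j0 = k // (n - 2 * k)
    total = sum(comb(n, n - k + j * (n - 2 * k)) for j in range(j0 + 1))
    return 2 * (n - 2 * k) * total / 2.0 ** n

# p-values for the animal data examples of this section:
# F_n(5, 41) ~ 0.00002, F_n(6, 41) ~ 0.0001, F_n(10, 41) ~ 0.021
```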


Figure 2: Logarithm of the weight of a newborn versus the logarithm of the weight of its mother for $n = 41$ species of animals, with the DR line which has depth 19.

For example, to test the hypothesis $H_0: \tilde\theta_1 = 1$ for the animal data (i.e., the hypothesis that the weight of a newborn is proportional to the weight of its mother) we compute $\max_{\theta_2} \mathrm{rdepth}((1, \theta_2)^t, Z_n) = 10$ and the corresponding $p$-value $F_{41}(10) = 0.021$, which is significant at the 5% level but not at the 1% level. These tests generalize easily to higher dimensions and to situations with ties among the $x_i$, but then we can no longer use the exact formula (12), which is restricted to the bivariate case without ties in $\{x_1, \ldots, x_n\}$. Therefore, in these cases we compute $F_n(k)$ by simulating (11). Let us consider the stock return dataset (Benderly and Zwick [3]) shown in Figure 3, with $n = 28$ observations in $p = 3$ dimensions. The regressors are output growth and inflation (both as percentages), and the response is the real return on stocks. The deepest regression obtained with the MEDSWEEP algorithm equals $DR = (3.41, -2.22, 6.66)^t$ with approximate depth 11. To test the null hypothesis $(\tilde\theta_1, \tilde\theta_2)^t = (0, 0)^t$ that both slopes are zero (this would be done with the $R^2$ in LS regression) we compute the maximal $\mathrm{rdepth}((0, 0, \theta_3)^t, Z_n)$ over all $\theta_3$ (i.e. over all $y_i$ in the dataset). By computing the exact rdepth of these 28 horizontal planes

(by the fast algorithm of Rousseeuw and Struyf [18]) we obtain the value 6. Simulation yields the corresponding $p$-value $F_{28}(6) = 0.22$, which is not significant. To test the significance of the intercept ($H_0: \tilde\theta_3 = 0$) we compute $\mathrm{rdepth}((\theta_1, \theta_2, 0)^t, Z_n)$ over all $(\theta_1, \theta_2)^t$. That is, we compute the depth of all planes through two observations and the origin. For the example this yields $\max \mathrm{rdepth}((\theta_1, \theta_2, 0)^t, Z_n) = 11$ with corresponding $p$-value $F_{28}(11) \approx 1$, which is not at all significant.

4.2 Confidence regions

In order to construct a confidence region for the unknown true parameter vector $\tilde\theta = (\tilde\theta_1, \ldots, \tilde\theta_p)^t$ we use a bootstrap method. Starting from the dataset $Z_n = \{(x_i^t, y_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^p$, we construct a bootstrap sample by randomly drawing $n$ observations, with replacement. For each bootstrap sample $Z_n^{(j)}$, $j = 1, \ldots, m$, we compute its deepest regression fit $\hat\theta^{(j)}$. Note that there will usually be a few outlying estimates $\hat\theta^{(j)}$ in the set of bootstrap fits $\{\hat\theta^{(j)};\ j = 1, \ldots, m\}$, which is natural since some bootstrap samples contain disproportionally many outliers. Therefore we don't construct a confidence ellipsoid based on the classical mean and covariance matrix of the $\{\hat\theta^{(j)};\ j = 1, \ldots, m\}$, but we use the robust minimum covariance determinant estimator (MCD) proposed by Rousseeuw [14, 15]. The MCD looks for the $h \geq n/2$ observations of which the empirical covariance matrix has the smallest possible determinant. Then the center $\hat\mu = (\hat\mu_1, \ldots, \hat\mu_p)^t$ of the dataset is defined as the average of these $h$ points, and the covariance matrix $\hat\Sigma$ of the dataset is a certain multiple of their covariance matrix. To obtain a confidence ellipsoid of level $1-\alpha$ we compute the MCD of the set of bootstrap estimates with $h = \lceil (1-\alpha) m \rceil$. The $(1-\alpha)$ confidence ellipsoid $E_{1-\alpha}$ is then given by

$$E_{1-\alpha} = \{\theta \in \mathbb{R}^p;\ (\theta - \hat\mu)^t \hat\Sigma^{-1} (\theta - \hat\mu) \leq RD^2(\hat\theta^{(j)})_{\lceil (1-\alpha) m \rceil : m}\}. \qquad (13)$$

Here $RD(\hat\theta^{(j)})_{\lceil (1-\alpha) m \rceil : m}$ is the $\lceil (1-\alpha) m \rceil$ order statistic of the robust distances of the bootstrap estimates $\{\hat\theta^{(j)};\ j = 1, \ldots, m\}$, where the robust distance (Rousseeuw and Leroy [17]) of a bootstrap estimate $\hat\theta^{(j)}$ is given by

$$RD(\hat\theta^{(j)}) = \sqrt{(\hat\theta^{(j)} - \hat\mu)^t \hat\Sigma^{-1} (\hat\theta^{(j)} - \hat\mu)}. \qquad (14)$$

From this confidence ellipsoid $E_{1-\alpha}$ in fit space we can also derive the corresponding regression confidence region for $\hat y = \tilde\theta_1 x_1 + \cdots + \tilde\theta_{p-1} x_{p-1} + \tilde\theta_p$, defined as

$$R_{1-\alpha} = \{(x^t, y) \in \mathbb{R}^p;\ \min_\theta ((x^t, 1)\theta) \leq y \leq \max_\theta ((x^t, 1)\theta) \text{ where } \theta \in E_{1-\alpha}\}. \qquad (15)$$

Theorem 2. The region $R_{1-\alpha}$ equals the set

$$\Big\{(x^t, y) \in \mathbb{R}^p;\ (x^t, 1)\hat\mu - c\sqrt{(x^t, 1)\hat\Sigma(x^t, 1)^t} \leq y \leq (x^t, 1)\hat\mu + c\sqrt{(x^t, 1)\hat\Sigma(x^t, 1)^t}\Big\} \qquad (16)$$

where $c = RD(\hat\theta^{(j)})_{\lceil (1-\alpha) m \rceil : m}$.

Let us consider the Educational Spending data of Figure 1. Figure 4a shows the deepest regression estimates of 1,000 bootstrap samples, drawn with replacement from the original data. Using the fast MCD algorithm of Rousseeuw and Van Driessen [19] we find the center in Figure 4a to be $(0.19, -0.95)^t$ which corresponds well to the DR fit $y = 0.19x - 1.05$ of the original data. As a confidence region for $(\tilde\theta_1, \tilde\theta_2)^t$ we take the 95% tolerance ellipse $E_{0.95}$ based on the MCD center and scatter matrix, which yields the corresponding confidence region $R_{0.95}$ shown in Figure 4b. Note that the intersection of this confidence region with a vertical line $x = x_0$ is not a 95% probability interval for an observation $y$ at $x_0$. It is the interval spanned by the fitted values $\hat y = \theta_1 x_0 + \theta_2$ for $(\theta_1, \theta_2)^t$ in a 95% confidence region for $(\tilde\theta_1, \tilde\theta_2)^t$. An example of a confidence region in higher dimensions is shown in Figure 3. It shows the 3-dimensional stock return dataset with its deepest regression plane, obtained with the MEDSWEEP algorithm. The 95% confidence region shown in Figure 3 was based on $m = 1{,}000$ bootstrap samples.
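A rough Python sketch of the bootstrap procedure (13)-(16) for simple regression is given below; it reuses the naive deepest_line sketch, borrows scikit-learn's MinCovDet as a stand-in for the FAST-MCD algorithm of Rousseeuw and Van Driessen [19], and uses fewer bootstrap samples than the 1,000 of the example simply to keep the naive code cheap:

```python
import numpy as np
from sklearn.covariance import MinCovDet        # stand-in for the FAST-MCD of [19]

def dr_bootstrap_band(x, y, m=200, alpha=0.05, seed=0):
    """Sketch of (13)-(16) for p = 2: bootstrap DR fits, robust MCD center and
    scatter of the fits, and the cutoff c of Theorem 2."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    fits = np.empty((m, 2))
    for j in range(m):                          # bootstrap DR fits theta_hat^(j)
        idx = rng.integers(0, n, n)
        fits[j], _ = deepest_line(x[idx], y[idx])
    h = int(np.ceil((1 - alpha) * m))
    mcd = MinCovDet(support_fraction=h / m, random_state=0).fit(fits)
    center, scatter = mcd.location_, mcd.covariance_
    # robust distances (14) and the order statistic c used in (13) and (16)
    diffs = fits - center
    rd = np.sqrt(np.einsum("ij,jk,ik->i", diffs, np.linalg.inv(scatter), diffs))
    c = np.sort(rd)[h - 1]
    return center, scatter, c

def band(x0, center, scatter, c):
    """Lower and upper bound of R_{1-alpha} at x = x0, following (16)."""
    a = np.array([x0, 1.0])
    half = c * np.sqrt(a @ scatter @ a)
    return a @ center - half, a @ center + half
```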

4.3 Test for linearity in simple regression

If the observations of the bivariate dataset $Z_n$ lie exactly on a straight line, then $\max_\theta \mathrm{rdepth}(\theta, Z_n) = n$ is the highest possible value. On the other hand, if $Z_n$ lies exactly on a strictly convex or strictly concave curve, then $\max_\theta \mathrm{rdepth}(\theta, Z_n) \approx n/3$ is at its lowest (Rousseeuw and Hubert [16, Theorem 2]). Therefore, $\max_\theta \mathrm{rdepth}(\theta, Z_n)$ can be seen as a measure of linearity for the dataset $Z_n$. Note that this lower bound does not depend on the amount of curvature when the $(x_i, y_i)$ lie exactly on the curve. However, as soon as there is noise (i.e. nearly always), the relative sizes of the error scale and the curvature come into play. The null hypothesis assumes that the dataset $Z_n$ follows the linear model

$$H_0: y_i = \tilde\theta_1 x_i + \tilde\theta_2 + e_i \qquad i = 1, \ldots, n$$

for some $\tilde\theta = (\tilde\theta_1, \tilde\theta_2)^t$ and with independent errors $e_i$ each having a distribution with zero median. To determine the corresponding $p$-value we generate $m = 10{,}000$ samples $Z^{(j)} = \{(x_i, e_i);\ i = 1, \ldots, n\}$ with the same $x$-values as in the dataset $Z_n$ and with standard gaussian $e_i$.


Figure 3: Stock return dataset with its deepest regression plane, and the upper and lower surface of the 95% confidence region $R_{0.95}$ based on $m = 1{,}000$ bootstrap samples.

Table 4: Windmill data: maximal rdepth k with corresponding p-value p(k) obtained by simulation.

k      8    9        10       11       12
p(k)   0    0.0001   0.0481   0.7334   1

For each $j$ we compute the maximal regression depth of the dataset $Z^{(j)}$. Then for each value $k$ we approximate the $p$-value $P(\max \mathrm{rdepth} \leq k \mid H_0)$ by

$$p(k) = \#\{j;\ \max \mathrm{rdepth}(Z^{(j)}) \leq k\}/m. \qquad (17)$$

For example, let us consider the windmill dataset (Hand et al. [8]), which consists of 25 measurements of wind velocity with the corresponding direct current output, as shown in Figure 5. The $p$-values in Table 4 were obtained from (17). For the actual $\max \mathrm{rdepth}(Z_n) = 10$ we obtain $p(10) = 0.0481$, so we reject linearity at the 5% level.
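The simulation behind (17) can be sketched directly; the routine below (ours) reuses the naive deepest_line sketch for the maximal depth and uses fewer replications than the 10,000 of the paper, purely to keep the illustration cheap:

```python
import numpy as np

def linearity_pvalue(x, y, m=500, seed=0):
    """Monte-Carlo p-value (17) for the linearity test: P(max rdepth <= k | H0),
    where k is the observed maximal regression depth of the data."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    _, k_observed = deepest_line(x, y)               # max_theta rdepth(theta, Z_n)
    max_depths = np.empty(m, dtype=int)
    for j in range(m):
        e = rng.standard_normal(len(x))              # same x-values, gaussian errors
        _, max_depths[j] = deepest_line(x, e)
    return float(np.mean(max_depths <= k_observed))
```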



Figure 4: (a) DR estimates of 1,000 bootstrap samples from the Educational Spending data with the 95% confidence ellipse $E_{0.95}$; (b) Plot of the data with the DR line and the 95% confidence region $R_{0.95}$ for the fitted value.

5 Specific models

5.1 Polynomial regression

Consider a dataset $Z_n = \{(x_i, y_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^2$. The polynomial regression model wants to fit the data by $y = \theta_1 x + \theta_2 x^2 + \cdots + \theta_k x^k + \theta_{k+1}$ where $k$ is called the degree of the polynomial. The residuals of $Z_n$ relative to the fit $\theta = (\theta_1, \ldots, \theta_{k+1})^t$ are denoted as $r_i = r_i(\theta) = y_i - \theta_1 x_i - \cdots - \theta_k x_i^k - \theta_{k+1}$. We could consider this to be a multiple regression problem with regressors $x = (x, x^2, \ldots, x^k)^t$ and determine the depth of a fit $\theta$ as in Definition 1. But we know that the joint distribution of $(x^t, y)$ is degenerate (i.e. it does not have a density), so many properties of the deepest regression, such as Theorem 1 about the breakdown value, would not hold in this case. In fact, the set of possible $x$ forms a so-called moment curve in $\mathbb{R}^k$, so it is inherently one-dimensional. A better way to define the depth of a polynomial fit (denoted by $\mathrm{rdepth}_k$) is to update the definition of regression depth in simple regression, where $x$ was also univariate. For each polynomial fit $\theta$ we can compute its residuals $r_i(\theta)$ and define its regression depth as

$$\mathrm{rdepth}_k(\theta, Z_n) = \min_{u, v}\ \big\{\#(r_i(\theta) \geq 0 \text{ and } u x_i < v) + \#(r_i(\theta) \leq 0 \text{ and } u x_i > v)\big\} \qquad (18)$$

where $u \in \{-1, 1\}$ and $v \in \mathbb{R}$ with $u x_i \neq v$ for all $(x_i, y_i) \in Z_n$. Note that this definition only depends on the $x_i$-values and the signs of the residuals. With this definition of depth, we define the deepest polynomial regression of degree $k$ as in (2) and denote it by $DR_k(Z_n)$.
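Definition (18) differs from the depth of Definition 1 only in that the splits remain one-dimensional in $x$, whatever the degree $k$. A minimal sketch (ours), parallel to the rdepth_line sketch of Section 1:

```python
import numpy as np

def rdepth_poly(theta, x, y):
    """Regression depth (18) of the polynomial fit
    y = theta[0]*x + ... + theta[k-1]*x**k + theta[k]  (illustrative sketch)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = len(theta) - 1
    powers = np.vstack([x ** j for j in range(1, k + 1)]).T   # (x, x^2, ..., x^k)
    r = y - powers @ np.asarray(theta[:-1], float) - theta[-1]  # residuals r_i(theta)
    depth = len(x)
    xs = np.unique(x)
    cuts = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    for v in cuts:
        for u in (+1.0, -1.0):
            count = np.sum((r >= 0) & (u * x < u * v)) + np.sum((r <= 0) & (u * x > u * v))
            depth = min(depth, count)
    return int(depth)
```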

Theorem 3. For any dataset $Z_n = \{(x_i, y_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^2$ with distinct $x_i$, the deepest polynomial regression of degree $k$ has breakdown value

$$\varepsilon_n(DR_k, Z_n) \geq \frac{n - 3k}{3n - 3k} \approx \frac{1}{3}. \qquad (19)$$

Theorem 3 shows that with the definition of depth of a polynomial fit given in (18), the deepest polynomial regression has a positive breakdown value of approximately 1/3, so it is robust to vertical outliers as well as to leverage points. In Section 4.3 we rejected the linearity of the windmill data. Therefore we now consider a model with a quadratic component for this data, i.e. $y = \theta_1 x + \theta_2 x^2 + \theta_3$. Figure 5 shows the windmill data with the deepest quadratic fit, which has depth 12.


Figure 5: Windmill data with the deepest quadratic fit, which has regression depth 12.

5.2 Michaelis-Menten model

In the field of enzyme kinetics, the steady-state kinetics of the great majority of the enzyme-catalyzed reactions that have been studied are adequately described by a hyperbolic relationship between the concentration $s$ of a substrate and the steady-state velocity $v$. This relationship is expressed by the Michaelis-Menten equation

$$v = \frac{v_{\max}\, s}{K_m + s} \qquad (20)$$

where the constant $v_{\max}$ is the maximum velocity and $K_m$ is the Michaelis constant. The Michaelis-Menten equation has been linearized by rewriting it in the following three ways:

$$\frac{v}{s} = \frac{v_{\max}}{K_m} - \frac{1}{K_m} v \qquad (21)$$

$$\frac{1}{v} = \frac{1}{v_{\max}} + \frac{K_m}{v_{\max}} \frac{1}{s} \qquad (22)$$

$$\frac{s}{v} = \frac{K_m}{v_{\max}} + \frac{1}{v_{\max}} s \qquad (23)$$

which are known as the Scatchard equation (Scatchard [21]), the double reciprocal equation (Lineweaver and Burk [10]) and the Woolf equation (Haldane [7]).

Each of the three relations (21), (22), (23) can be used to estimate the constants $v_{\max}$ and $K_m$. In general the three relations yield different estimates for the constants $v_{\max}$ and $K_m$, because the error terms are also transformed in a nonlinear way. Cressie and Keightley [5] compared these three linearizations of the Michaelis-Menten relation in the context of hormone-receptor assays, and concluded that for well-behaved data the Woolf equation (23) works best, but for data containing outliers the double reciprocal equation (22) with robust regression gives better results. Theorem 4 shows that applying the deepest regression to the Woolf equation (23) yields the same estimates for $v_{\max}$ and $K_m$ as the deepest regression applied to the double reciprocal equation (22). This resolves the ambiguity.

Theorem 4. Let $Z_n = \{(s_i, v_i);\ i = 1, \ldots, n\} \subset \mathbb{R}^2$ with $s_i > 0$ for all $i = 1, \ldots, n$, and denote the DR fit of the double reciprocal equation as $DR(\{(\frac{1}{s_i}, \frac{1}{v_i});\ i = 1, \ldots, n\}) = (\hat\theta_1, \hat\theta_2)^t$. Then the DR fit of the Woolf equation satisfies

$$DR\Big(\Big\{\Big(s_i, \frac{s_i}{v_i}\Big);\ i = 1, \ldots, n\Big\}\Big) = (\hat\theta_2, \hat\theta_1)^t.$$

In both cases we obtain the same $\hat v_{\max} = 1/\hat\theta_2$ and $\hat K_m = \hat\theta_1/\hat\theta_2$.

Example: In assays for hormone receptors the Michaelis-Menten equation describes the relationship between the amount $B$ of hormone bound to receptor and the amount $F$ of hormone not bound to receptor. These assays are used e.g. to determine the cancer treatment method (see Cressie and Keightley [5]). Equation (20) now becomes

$$B = \frac{B_{\max} F}{K_D + F}.$$

The parameters of interest are the concentration $B_{\max}$ of binding sites and the dissociation constant $K_D$ for the hormone-receptor interaction. Figure 6a shows the Woolf plot of data from an estrogen receptor assay obtained by Cressie and Keightley [4]. Note that this dataset clearly contains one outlier. To this plot we added the deepest regression line $DR(\{(F_i, \frac{F_i}{B_i});\ i = 1, \ldots, n\}) = (0.0215, 0.567)^t$. In Figure 6b we also show the double reciprocal plot with its deepest regression line $DR(\{(\frac{1}{s_i}, \frac{1}{v_i});\ i = 1, \ldots, n\}) = (0.567, 0.0215)^t$. Thus in both cases we obtain the same estimated values $\hat B_{\max} = 46.556$ and $\hat K_D = 26.398$, which are comparable to the least squares estimates $\hat B_{\max} = 45.610$ and $\hat K_D = 24.079$ obtained from the Woolf equation based on all data except the outlier. On the other hand, least squares applied to the full data gives $\hat B_{\max} = 52.290$ and $\hat K_D = 38.048$ based on the Woolf equation, and $\hat B_{\max} = 49.354$ and $\hat K_D = 29.436$ based on the double reciprocal equation.


Figure 6: (a) Woolf plot of the Cressie and Keightley data with the deepest regression line $DR = (0.0215, 0.567)^t$ and the least squares fit $LS = (0.0191, 0.728)^t$; (b) double reciprocal plot of the data with the deepest regression line $DR = (0.567, 0.0215)^t$ and the least squares fit $LS = (0.596, 0.0203)^t$.

These estimates are quite different. With least squares, in both cases the estimates $\hat B_{\max}$ and $\hat K_D$ are highly influenced by the outlying observation (both $\hat B_{\max}$ and $\hat K_D$ come out too high), which may lead to wrong conclusions, e.g. when determining a cancer treatment method.
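Theorem 4 is easy to verify numerically: fitting the deepest line to both parametrizations and recovering $(v_{\max}, K_m)$ from each gives identical values. The snippet below (ours) does this on simulated data with the naive deepest_line sketch; the Cressie and Keightley measurements themselves are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
v_max, K_m = 45.0, 25.0
s = rng.uniform(5.0, 300.0, 20)
v = v_max * s / (K_m + s) * np.exp(0.05 * rng.standard_normal(20))  # keeps v > 0

# double reciprocal (22): regress 1/v on 1/s
(b1, b2), _ = deepest_line(1.0 / s, 1.0 / v)
# Woolf (23): regress s/v on s -- Theorem 4 says the coefficients swap
(w1, w2), _ = deepest_line(s, s / v)

print("double reciprocal:", 1.0 / b2, b1 / b2)   # v_max_hat, K_m_hat
print("Woolf            :", 1.0 / w1, w2 / w1)   # identical by Theorem 4
```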

Appendix

Proof of Theorem 1. To prove Theorem 1 we need the following lemmas.

Lemma 1. If the $x_i$ are in general position, then $\{\theta;\ \mathrm{rdepth}(\theta, Z_n) \geq p\}$ is bounded.

Proof: For $J \subseteq \{1, \ldots, n\}$ with $\#J = p$ we denote the fit that passes through the $p$ observations $\{(x_{i_1}^t, y_{i_1}), \ldots, (x_{i_p}^t, y_{i_p});\ i_j \in J\}$ by $\theta_J$. Since the $x_i$ are in general position, such a fit $\theta_J$ will be non-vertical (for any $J$). Therefore, $\mathrm{conv}\{\theta_J;\ J \subseteq \{1, \ldots, n\},\ \#J = p\} = \mathrm{conv}\{\theta_J;\ \mathrm{rdepth}(\theta_J, Z_n) \geq p\} = \mathrm{conv}\{\theta;\ \mathrm{rdepth}(\theta, Z_n) \geq p\}$ is bounded, hence also $\{\theta;\ \mathrm{rdepth}(\theta, Z_n) \geq p\}$ is bounded. (Here, `conv' stands for the convex hull.) □

Lemma 2. For any dataset $Z_n \subset \mathbb{R}^p$ with the $x_i$ in general position (i.e. no more than $p-1$ of the $x_i$ lie in any $(p-2)$-dimensional affine subspace of $\mathbb{R}^{p-1}$) the deepest regression has

breakdown value

$$\varepsilon_n(DR, Z_n) \geq \frac{n - p^2 + 1}{n(p+1) - p^2 + 1} \approx \frac{1}{p+1}. \qquad (24)$$

Proof: By Lemma 1 we know that $\{\theta;\ \mathrm{rdepth}(\theta, Z_n) \geq p\}$ is bounded. Therefore, to break down the estimator we must add at least $m$ observations such that $\mathrm{rdepth}(DR(Z_{n+m}), Z_n) \leq p - 1$. Since for all $Z_n$ it holds that $\mathrm{rdepth}(DR(Z_n), Z_n) \geq \lceil n/(p+1) \rceil$ (see Mizera [12], Amenta et al. [1]), we obtain

$$\Big\lceil \frac{n+m}{p+1} \Big\rceil \leq \mathrm{rdepth}(DR(Z_{n+m}), Z_{n+m}) \leq p - 1 + m,$$

from which it follows that $m \geq \frac{n - p^2 + 1}{p}$. Finally, it holds that

$$\varepsilon_n(DR, Z_n) \geq \frac{m}{n+m} \geq \frac{n - p^2 + 1}{n(p+1) - p^2 + 1}. \qquad \square$$

Lemma 3. If $Z_n \subseteq Z_{n+m}$ then $\mathrm{rdepth}(DR(Z_n), Z_n) \leq \mathrm{rdepth}(DR(Z_{n+m}), Z_{n+m})$.

Proof. Since $Z_n \subseteq Z_{n+m}$ it holds that $\mathrm{rdepth}(\theta, Z_n) \leq \mathrm{rdepth}(\theta, Z_{n+m})$ for all $\theta$. Thus for all $\theta$ it holds that $\mathrm{rdepth}(\theta, Z_n) \leq \max_\theta \mathrm{rdepth}(\theta, Z_{n+m}) = \mathrm{rdepth}(DR(Z_{n+m}), Z_{n+m})$, hence $\max_\theta \mathrm{rdepth}(\theta, Z_n) = \mathrm{rdepth}(DR(Z_n), Z_n) \leq \mathrm{rdepth}(DR(Z_{n+m}), Z_{n+m})$. □

Lemma 4. Under the conditions of Theorem 1 we have that

$$\frac{\mathrm{rdepth}(DR, Z_n)}{n} \xrightarrow[n \to \infty]{a.s.} \frac{1}{2}. \qquad (25)$$

Proof. Let us consider the dual space, i.e. the $p$-dimensional space of all possible fits $\theta$. Dualization transforms a hyperplane $H: y = \theta_1 x_1 + \ldots + \theta_{p-1} x_{p-1} + \theta_p$ to the point $\theta$. An observation $z_i = (x_{i,1}, \ldots, x_{i,p-1}, y_i)$ is mapped to the set $D(z_i)$ of all $\theta$ that pass through $z_i$, so $D(z_i)$ is the hyperplane $H_i$ given by $\theta_p = -x_{i,1}\theta_1 - \ldots - x_{i,p-1}\theta_{p-1} + y_i$. In dual space, the regression depth of a fit $\theta$ corresponds to the minimal number of hyperplanes $H_i$ intersected by any halfline $[\theta, \theta + u\rangle$ where $\|u\| = 1$. A unit vector $u$ in dual space thus corresponds to an affine hyperplane $V$ in $x$-space and a direction in which to tilt $\theta$ until it is vertical. For each unit vector $u$, we therefore define the wedge-shaped region $A_{\theta,u}$ in primal space, formed by tilting $\theta$ around $V$ (in the direction of $u$) until the fit becomes vertical. Further denote $H_n$ the empirical distribution of the observed data $Z_n$. Define the metric

$$d(H_n, H) = \sup_{\theta, \|u\|=1} |H_n(A_{\theta,u}) - H(A_{\theta,u})|.$$

If $Z_n$ is sampled from $H$, then it holds that

$$d(H_n, H) \xrightarrow[n \to \infty]{a.s.} 0.$$

This follows from the generalization of the Glivenko-Cantelli theorem formulated in (Pollard [13, Theorem 14]) and proved by (Pollard [13, Lemma 18]), and the fact that $A_{\theta,u}$ can be constructed by taking a finite number of unions and intersections of half-spaces. Now define $\psi(\theta) = \inf_u H(A_{\theta,u})$ and its empirical version $\psi_n(\theta) = \inf_u H_n(A_{\theta,u})$. It is clear that $\psi_n(\theta) = \mathrm{rdepth}(\theta, Z_n)/n$ and $\psi(\tilde\theta) = \frac{1}{2}$. Moreover we have that $|\psi_n(\tilde\theta) - \psi(\tilde\theta)| \leq d(H_n, H)$, hence

$$\psi_n(\tilde\theta) = \mathrm{rdepth}(\tilde\theta, Z_n)/n \xrightarrow[n \to \infty]{a.s.} \frac{1}{2}. \qquad (26)$$

Finally it holds almost surely that

$$\frac{\mathrm{rdepth}(\tilde\theta, Z_n)}{n} \leq \frac{\mathrm{rdepth}(DR, Z_n)}{n} \leq \frac{n + p}{2}\,\frac{1}{n}. \qquad (27)$$

The latter inequality uses Theorem 7 in (Rousseeuw and Hubert [16]), which is valid since $Z_n$ is almost surely in general position. Taking limits in (27) and using (26) then finishes the proof. □

Proof of Theorem 1: We will first show that $\liminf_n \varepsilon_n \geq \frac{1}{3}$ almost surely. From Lemma 1 we know that $\{\theta;\ \mathrm{rdepth}(\theta, Z_n) \geq p\}$ is bounded a.s. if $Z_n \subset \mathbb{R}^p$ is sampled from $H$. Therefore, to break down the estimator we must add at least $m$ observations such that $\mathrm{rdepth}(DR(Z_{n+m}), Z_n) \leq p - 1$. This implies that

$$r := \mathrm{rdepth}(DR(Z_n), Z_n) \leq \mathrm{rdepth}(DR(Z_{n+m}), Z_{n+m}) \leq p - 1 + m,$$

where the first inequality is due to Lemma 3, and thus

$$\varepsilon_n \geq \frac{m}{n+m} \geq \frac{r - p + 1}{n + r - p + 1} \quad a.s.$$

Finally we apply Lemma 4 to conclude that

$$\liminf_n \varepsilon_n \geq \frac{1}{3} \quad a.s. \qquad (28)$$

Next, we prove that $\limsup_n \varepsilon_n \leq \frac{1}{3}$ almost surely. Since regression depth is affine invariant, we may assume that none of the $\|x_i\|$ in the dataset are zero. We will denote a hyperplane $\theta$ with equation $y = x^t \beta + \gamma$ by its two components $\beta$ and $\gamma$ corresponding to the slopes and the intercept. Fix two strictly positive real numbers $\beta_0$ and $\gamma_0$. We now consider a point $(x^t, y)$ such that for any hyperplane $\theta$ passing through $(x^t, y)$ and $p-1$ data points $(x_i^t, y_i)$ from $Z_n$ it holds that $\beta_0 < \|\beta\| < \infty$ and $\gamma_0 < |\gamma| < \infty$. Because we assumed that none of the $\|x_i\|$ equals 0, we can always find such a point $(x^t, y)$. Note that $x \neq x_i$ for any $i$, otherwise $\beta$ would be unbounded. The dataset $Z_{n+m}$ is then obtained by enlarging the dataset $Z_n$ with $m = [\frac{n+1+p}{2}] - p + 2$ points equal to $(x^t, y)$. We know that the deepest fit $DR(Z_{n+m})$ must pass through at least $p$ different observations of $Z_{n+m}$. Denote by $\theta_J$ any candidate deepest fit. If $\theta_J$ passes through $(x^t, y)$ it is clear that $\mathrm{rdepth}(\theta_J, Z_{n+m}) \geq m + p - 1 = [\frac{n+1+p}{2}] + 1$. On the other hand, any $\theta_J$ which passes through $p$ data points of $Z_n$ has $\mathrm{rdepth}(\theta_J, Z_{n+m}) \leq [\frac{n+1+p}{2}]$. This can be seen as follows. First consider the dataset $Z_{n+1}$ which consists of the $n$ original observations and one copy of the point $(x^t, y)$. From Theorem 7 in (Rousseeuw and Hubert [16]) it follows that $\mathrm{rdepth}(\theta_J, Z_{n+1}) \leq [\frac{n+1+p}{2}]$. Now there always exists a unit vector $u \in \mathbb{R}^{p-1}$ and $v \in \mathbb{R}$ such that $x^t u = v$ and $x_i^t u \neq v$ for all $i = 1, \ldots, n$. Then the number of observations passed when tilting $\theta_J$ around $(u, v)$ as in Definition 1 plus the number of observations passed when tilting $\theta_J$ around $(-u, -v)$ equals $n + 1 + p$, because the fit $\theta_J$ passes through exactly $p$ observations. Therefore, we can suppose that the number of observations passed when tilting $\theta_J$ around $(u, v)$ is at most $[\frac{n+1+p}{2}]$. (If not, we replace $(u, v)$ by $(-u, -v)$.) Note that the residual $r_{(x^t,y)}(\theta_J) \neq 0$ because $\theta_J$ does not pass through $(x^t, y)$. First suppose the residual $r_{(x^t,y)}(\theta_J)$ is strictly positive. The data are in general position, therefore we can always find a value $\varepsilon > 0$ such that for all $i = 1, \ldots, n$ it holds that $x_i^t u < v - \varepsilon$ if $x_i^t u < v$ and always $x_i^t u > v - \varepsilon$ if $x_i^t u > v$. Since $x^t u > v - \varepsilon$ and $r_{(x^t,y)}(\theta_J) > 0$, the number of observations passed when tilting $\theta_J$ around $(u, v - \varepsilon)$ is the same as when tilting around $(u, v)$. Finally, adding the other $m - 1$ replications of $(x^t, y)$ does not change this value. Therefore $\mathrm{rdepth}(\theta_J, Z_{n+m}) \leq [\frac{n+1+p}{2}]$. If the residual $r_{(x^t,y)}(\theta_J)$ is strictly negative, we replace $v$ by $v + \varepsilon$ in a similar way and obtain the same result. The above reasoning shows that the deepest fit must pass through $(x^t, y)$ and $p - 1$ original data points. Since we have shown that all these fits have an arbitrarily large slope and intercept, it holds that

$$\varepsilon_n \leq \frac{m}{n+m} = \frac{[\frac{n+1+p}{2}] - p + 2}{n + [\frac{n+1+p}{2}] - p + 2} \xrightarrow[n \to \infty]{a.s.} \frac{1}{3}$$

and thus

$$\limsup_n \varepsilon_n \leq \frac{1}{3} \quad a.s. \qquad (29)$$

From (28) and (29) we finally conclude (3). □

Proof of Theorem 2. Consider $x \in \mathbb{R}^{p-1}$. We will prove that the upper and lower bounds in expression (16) are the values $y$ such that in the dual plot the hyperplane $(x^t, 1)\theta - y = 0$ is tangent to the ellipsoid $E_{1-\alpha}: (\theta - \hat\mu)^t \hat\Sigma^{-1}(\theta - \hat\mu) = c^2$ where $c = RD(\hat\theta^{(j)})_{\lceil (1-\alpha) m \rceil : m}$. First consider the special case $\hat\mu = 0$ and $\hat\Sigma = I_p$, yielding the unit sphere $\theta^t \theta = 1$. The tangent hyperplane in an arbitrary point $\tilde\theta = (\tilde\theta_1, \ldots, \tilde\theta_p)^t$ on the sphere is given by $\tilde\theta^t(\theta - \tilde\theta) = 0$. Therefore the hyperplane $(x^t, 1)\theta - y = 0$ becomes a tangent hyperplane iff $(x^t, 1) = (\tilde\theta^t/\tilde\theta_p)$ and $y = \tilde\theta^t\tilde\theta/\tilde\theta_p$ for some $\tilde\theta$ on the sphere. Together with $\tilde\theta^t\tilde\theta = 1$ this yields $y^2 = (x^t, 1)(x^t, 1)^t$, giving the lower bound $y = -\sqrt{(x^t, 1)(x^t, 1)^t}$ and the upper bound $y = \sqrt{(x^t, 1)(x^t, 1)^t}$ corresponding to expression (16) for this case.

Consider the general case of an ellipsoid $(\theta - \hat\mu)^t \hat\Sigma^{-1}(\theta - \hat\mu) = c^2$ and denote $\hat\Sigma = P\Lambda P^t$ where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ is the diagonal matrix of eigenvalues of $\hat\Sigma$ and $P = (e_1, \ldots, e_p)$ is the matrix of eigenvectors of $\hat\Sigma$. We can transform this to the previous case by the transformation $\tilde\theta = c^{-1}\Lambda^{-1/2} P^t(\theta - \hat\mu)$. The hyperplane $(x^t, 1)\theta - y = 0$ then transforms to $c(x^t, 1)P\Lambda^{1/2}\tilde\theta - (y - (x^t, 1)\hat\mu) = 0$, which becomes a tangent hyperplane iff $(y - (x^t, 1)\hat\mu)^2 = (c(x^t, 1)P\Lambda^{1/2})(c(x^t, 1)P\Lambda^{1/2})^t = c^2(x^t, 1)P\Lambda P^t(x^t, 1)^t = c^2(x^t, 1)\hat\Sigma(x^t, 1)^t$. This yields the lower bound $y = (x^t, 1)\hat\mu - c\sqrt{(x^t, 1)\hat\Sigma(x^t, 1)^t}$ and the upper bound $y = (x^t, 1)\hat\mu + c\sqrt{(x^t, 1)\hat\Sigma(x^t, 1)^t}$ of expression (16). □

Proof of Theorem 3. From the definition of $\mathrm{rdepth}_k$ by (18) it follows for any dataset $Z_n = \{(x_i, y_i);\ i = 1, \ldots, n\}$ with distinct $x_i$, as in Lemma 1, that $\{\theta;\ \mathrm{rdepth}_k(\theta, Z_n) \geq k + 1\}$ is bounded. Now any bivariate linear fit $y = \tilde\theta_1 x_1 + \tilde\theta_2$ corresponds to a polynomial fit $\theta = (\theta_1, \ldots, \theta_{k+1})$ with $\theta_1 = \tilde\theta_1$, $\theta_2 = \cdots = \theta_k = 0$ and $\theta_{k+1} = \tilde\theta_2$ for $k \geq 1$. Since it holds for the deepest regression line $DR_1$ that $\mathrm{rdepth}(DR_1, Z_n) = \mathrm{rdepth}_k(DR_1, Z_n) \geq \lceil n/3 \rceil$ (Rousseeuw and Hubert [16]), it follows that the deepest polynomial regression $DR_k$ of degree $k$ has $\mathrm{rdepth}_k(DR_k, Z_n) \geq \mathrm{rdepth}_k(DR_1, Z_n) \geq \lceil n/3 \rceil$. Because $\{\theta;\ \mathrm{rdepth}_k(\theta, Z_n) \geq k + 1\}$ is bounded, we must add at least $m$ observations such that $\mathrm{rdepth}_k(DR_k(Z_{n+m}), Z_n) \leq k$ to break down the estimator. We obtain

$$\Big\lceil \frac{n+m}{3} \Big\rceil \leq \mathrm{rdepth}_k(DR_k(Z_{n+m}), Z_{n+m}) \leq k + m,$$

from which it follows that $m \geq (n - 3k)/2$. This yields

$$\varepsilon_n(DR_k, Z_n) \geq \frac{m}{n+m} \geq \frac{n - 3k}{3n - 3k}. \qquad \square$$

Proof of Theorem 4. We will show that when $s_i > 0$ for all $i = 1, \ldots, n$ it holds for every $\theta = (\theta_1, \theta_2)^t$ that $\mathrm{rdepth}((\theta_1, \theta_2)^t, \{(\frac{1}{s_i}, \frac{1}{v_i});\ i = 1, \ldots, n\}) = \mathrm{rdepth}((\theta_2, \theta_1)^t, \{(s_i, \frac{s_i}{v_i});\ i = 1, \ldots, n\})$. This follows from

$$r_i\Big((\theta_1, \theta_2)^t, \Big\{\Big(\tfrac{1}{s_i}, \tfrac{1}{v_i}\Big);\ i = 1, \ldots, n\Big\}\Big) = \frac{1}{v_i} - \theta_1 \frac{1}{s_i} - \theta_2 = \frac{1}{s_i}\Big(\frac{s_i}{v_i} - \theta_1 - \theta_2 s_i\Big) = \frac{1}{s_i}\, r_i\Big((\theta_2, \theta_1)^t, \Big\{\Big(s_i, \tfrac{s_i}{v_i}\Big);\ i = 1, \ldots, n\Big\}\Big).$$

Since $s_i > 0$ for all $i = 1, \ldots, n$ we have

$$\mathrm{sign}\Big(r_j\Big((\theta_1, \theta_2)^t, \Big\{\Big(\tfrac{1}{s_i}, \tfrac{1}{v_i}\Big);\ i = 1, \ldots, n\Big\}\Big)\Big) = \mathrm{sign}\Big(r_j\Big((\theta_2, \theta_1)^t, \Big\{\Big(s_i, \tfrac{s_i}{v_i}\Big);\ i = 1, \ldots, n\Big\}\Big)\Big)$$

for all $j = 1, \ldots, n$, and switching the $x$-values from $\frac{1}{s_i}$ to $s_i$ reverses their order. Therefore, according to Definition 1 both depths are the same. □

References

1. N. Amenta, M. Bern, D. Eppstein, and S. Teng, "Regression Depth and Center Points," submitted.
2. Z. Bai and X. He, Asymptotic distributions of the maximal depth estimators for regression and multivariate location, Ann. Statist., to appear.
3. J. Benderly and B. Zwick, Inflation, real balances, output, and real stock returns, American Economic Review (1985), p. 1117.
4. N. A. C. Cressie and D. D. Keightley, The underlying structure of the direct linear plot with application to the analysis of hormone-receptor interactions, Journal of Steroid Biochemistry, 11 (1979), 1173-1180.
5. N. A. C. Cressie and D. D. Keightley, Analysing data from hormone-receptor assays, Biometrics, 37 (1981), 235-249.
6. D. L. Donoho and M. Gasko, Breakdown properties of location estimates based on halfspace depth and projected outlyingness, Ann. Statist., 20 (1992), 1803-1827.
7. J. B. Haldane, Graphical methods in enzyme chemistry, Nature, 179 (1957), 832.
8. D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway, and E. Ostrowski, A Handbook of Small Data Sets, New York: Chapman and Hall, 1994.
9. X. He and S. Portnoy, Asymptotics of the deepest line, in "Applied Statistical Science III: Nonparametric Statistics and Related Topics" (S. E. Ahmed, M. Ahsanullah, and B. K. Sinha, Eds.), pp. 71-81, Nova Science Publishers Inc, New York, 1998.
10. H. Lineweaver and D. Burk, The determination of enzyme dissociation constants, Journal of the American Chemical Society, 56 (1934), 658-666.
11. R. D. Martin, V. J. Yohai and R. H. Zamar, Min-max bias robust regression, Ann. Statist., 17 (1989), 1608-1630.
12. I. Mizera, "On Depth and Deep Points: a Calculus," submitted.
13. D. Pollard, Convergence of Stochastic Processes, New York: Springer-Verlag, 1984.
14. P. J. Rousseeuw, Least median of squares regression, J. Amer. Statist. Assoc., 79 (1984), 871-880.
15. P. J. Rousseeuw, Multivariate estimation with high breakdown point, in "Mathematical Statistics and Applications, Vol. B" (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, Eds.), pp. 283-297, Reidel, Dordrecht, 1985.
16. P. J. Rousseeuw and M. Hubert, Regression depth, J. Amer. Statist. Assoc., 94 (1999), 388-402.
17. P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, New York: Wiley-Interscience, 1987.
18. P. J. Rousseeuw and A. Struyf, Computing location depth and regression depth in higher dimensions, Statistics and Computing, 8 (1998), 193-203.
19. P. J. Rousseeuw and K. Van Driessen, A fast algorithm for the minimum covariance determinant estimator, Technometrics, 41 (1999), 212-223.
20. P. J. Rousseeuw and V. J. Yohai, Robust regression by means of S-estimators, in "Robust and Nonlinear Time Series Analysis," Lecture Notes in Statistics No. 26 (J. Franke, W. Härdle and R. D. Martin, Eds.), pp. 256-272, Springer, New York, 1984.
21. G. Scatchard, The attractions of proteins for small molecules and ions, Annals of the New York Academy of Sciences, 51 (1949), 660-672.
22. S. Van Aelst and P. J. Rousseeuw, Robustness of deepest regression, J. Multivariate Anal., to appear.
23. H. Van den Bergh, Zie de zoo, Hamster, 8 (1968), De vrienden van het Schoolmuseum, M. Thiery.
24. M. van Kreveld, J. S. Mitchell, P. J. Rousseeuw, M. Sharir, J. Snoeyink, and B. Speckmann, Efficient algorithms for maximum regression depth, in "Proceedings of the 15th Symposium on Computational Geometry," Association for Computing Machinery, pp. 31-40, 1999.