Robust Regression by Trimmed Least-Squares Estimation

by David Ruppert and Raymond J. Carroll

Koenker and Bassett (1978) introduced regression quantiles and suggested a method of trimmed least-squares estimation based upon them. We develop asymptotic representations of regression quantiles and trimmed least-squares estimators. The latter are robust, easily computed estimators for the linear model. Moreover, robust confidence ellipsoids and tests of general linear hypotheses based on trimmed least squares are available and are computationally similar to those based on ordinary least squares.


David Ruppert is Assistant Professor and Raymond J. Carroll is Associate Professor, both at the Department of Statistics, the University of North Carolina at Chapel Hill. This research was supported by National Science Foundation Grant NSF MCS78-01240 and the Air Force Office of Scientific Research under contract AFOSR-75-2796. The authors wish to thank Shiv K. Aggarwal for his programming assistance.


1. INTRODUCTION

We will consider the linear model

$Y = X\beta + e$    (1.1)

where $Y = (Y_1,\ldots,Y_n)'$, $X$ is an $n \times p$ matrix of known constants, $\beta = (\beta_1,\ldots,\beta_p)'$ is a vector of unknown parameters, and $e = (e_1,\ldots,e_n)'$ where $e_1,\ldots,e_n$ are i.i.d. with distribution $F$.

Recently there has been much interest in estimators of $\beta$ which do not have two serious drawbacks of the least-squares estimator: inefficiency when the errors have a distribution with heavier tails than the Gaussian, and great sensitivity to a few outlying observations. In general, such methods, which fall under the rubric of robust regression, are extensions of techniques originally introduced for estimating location parameters. In the location model three broad classes of estimators, M, L, and R estimators, are available. See Huber (1977) for an introduction to these classes.

For the linear model, M and R estimators have been studied extensively. Until recently only Bickel (1973) had studied regression analogs of L-estimators. His estimates have attractive asymptotic efficiencies, but they are computationally complex. Moreover, they are not equivariant to reparametrization (see the remarks after his Theorem 3.1). There are compelling reasons for extending L-estimators to the linear model. For the location problem, L-estimators, particularly trimmed means, are attractive to those working with real data. Stigler (1977) recently applied robust estimators to historical data and concluded that "the 10% trimmed mean (the smallest nonzero trimming percentage included in the study) emerges as the recommended estimator." Jaeckel (1971) has shown that if $F$ is symmetric then for each L-estimator of location there are asymptotically equivalent M and R estimators.

However, without knowledge of $F$ it is not possible to match up an L-estimator with its corresponding M and R estimators. For example, trimmed means are asymptotically equivalent to Huber's M-estimate, which is the solution $b$ to

$\sum_{i=1}^{n} \rho((X_i - b)/s_n) = \min!$

where

$\rho(x) = x^2/2$ if $|x| \le k$, and $\rho(x) = k|x| - k^2/2$ if $|x| > k$.

The value of $k$ is determined by the trimming proportion $\alpha$ of the trimmed mean, by $F$, and by the choice of $s_n$. In the scale non-invariant case ($s_n = 1$), $k = F^{-1}(1-\alpha)$.
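For orientation, the correspondence $k = F^{-1}(1-\alpha)$ is easy to evaluate numerically; the following minimal sketch (our illustration, assuming standard Gaussian $F$ and $s_n = 1$) maps a trimming proportion to the matching Huber cutoff.

```python
# Sketch: map a trimming proportion alpha to the Huber cutoff k = F^{-1}(1 - alpha)
# in the scale non-invariant case (s_n = 1), assuming F is the standard normal.
from scipy.stats import norm

def huber_cutoff(alpha):
    """Cutoff k for which Huber's M-estimate matches an alpha-trimmed mean."""
    return norm.ppf(1.0 - alpha)

print(huber_cutoff(0.10))  # about 1.2816 for 10% trimming
```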

The practicing statistician, who knows only his data, may find his intuition of more assistance when choosing $\alpha$ than when choosing $k$. Recently Koenker and Bassett (1978) have extended the concept of quantiles to the linear model.

Let $0 < \theta < 1$. Define

$\psi_\theta(x) = \theta$ if $x \ge 0$, $\psi_\theta(x) = \theta - 1$ if $x < 0$, and $\rho_\theta(x) = x\,\psi_\theta(x)$.    (1.2)

Then a $\theta$th regression quantile, $\hat\beta(\theta)$, is any value of $b$ which solves

$\sum_{i=1}^{n} \rho_\theta(Y_i - x_i b) = \min!$    (1.3)

Their Theorem 4.2 shows that regression quantiles have asymptotic behavior similar to sample quantiles in the location problem. Therefore, L-estimates consisting of linear combinations of a fixed number of order statistics, for example the median, trimean, and Gastwirth's estimator, are easily extended to the linear model and have the same asymptotic efficiencies as in the location model. Moreover, as they point out, regression quantiles can be easily computed by linear programming techniques.
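To make the linear-programming route concrete, here is a minimal sketch (the function name `rq` and the use of `scipy.optimize.linprog` are our illustrative choices, not the authors' software): the $\theta$th regression quantile minimizes $\theta\,1'u + (1-\theta)\,1'v$ subject to $Xb + u - v = Y$ with $u, v \ge 0$ and $b$ free.

```python
# Sketch: compute a theta-th regression quantile by linear programming,
# minimizing sum_i rho_theta(y_i - x_i b) via the standard LP formulation
#   min  theta * 1'u + (1 - theta) * 1'v
#   s.t. X b + u - v = y,  u >= 0,  v >= 0,  b free.
import numpy as np
from scipy.optimize import linprog

def rq(X, y, theta):
    n, p = X.shape
    # Decision vector is (b, u, v): p free coefficients, then 2n slack variables.
    c = np.concatenate([np.zeros(p), theta * np.ones(n), (1.0 - theta) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]
```

With a leading column of ones in $X$, `rq(X, y, 0.5)` reproduces the $\ell_1$ (median-regression) fit.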

They also suggest the following

trimmed least-squares estimators, call it ~T(a): remove from the sample any observations whose residual from Sea) is negative or whose residual from A

S(l-a) is positive and calculate the least-squares estimator using the remaining observations.
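Translated directly into code, the definition reads as follows (a sketch that reuses the hypothetical `rq` above; it is our transcription of the definition, not the authors' program):

```python
# Sketch of the trimmed least-squares estimator beta_T(alpha): drop observations
# lying below the alpha-th or above the (1-alpha)-th regression-quantile plane,
# then run ordinary least squares on the remaining observations.
# Assumes the rq() sketch above is in scope.
import numpy as np

def trimmed_ls(X, y, alpha):
    res_lo = y - X @ rq(X, y, alpha)        # residuals from beta_hat(alpha)
    res_hi = y - X @ rq(X, y, 1 - alpha)    # residuals from beta_hat(1 - alpha)
    keep = (res_lo >= 0) & (res_hi <= 0)    # keep points between the two planes
    beta_T, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta_T, keep
```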

They conjecture that if $\lim_{n\to\infty} n^{-1}(X'X) = Q$ (positive definite), then the variance of $\hat\beta_T(\alpha)$ is $n^{-1}\sigma^2(\alpha,F)\,Q^{-1}$, where $n^{-1}\sigma^2(\alpha,F)$ is the variance of an $\alpha$-trimmed mean from a population with distribution $F$.
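For orientation, $\sigma^2(\alpha,F)$ is the familiar asymptotic variance of an $\alpha$-trimmed mean; one standard way of writing it (our restatement from the location-model literature, not a display taken from this paper) is in terms of the $\alpha$-Winsorized variable:

```latex
% Asymptotic variance of an alpha-trimmed mean, written via Winsorization:
% W replaces X by the nearer of xi_alpha, xi_{1-alpha} when X falls outside.
\[
  W \;=\; \max\!\bigl(\xi_{\alpha},\,\min(X,\;\xi_{1-\alpha})\bigr),
  \qquad
  \sigma^{2}(\alpha,F) \;=\; \frac{\operatorname{Var}_{F}(W)}{(1-2\alpha)^{2}} .
\]
```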

In this paper we develop asymptotic expansions for $\hat\beta(\theta)$ and $\hat\beta_T(\alpha)$ which provide simple proofs of Koenker and Bassett's Theorem 4.2 and of their conjecture about the asymptotic covariance of $\hat\beta_T(\alpha)$. In the location model, if $F$ is asymmetric then there is no natural parameter to estimate. In the linear model, if the design matrix is chosen so that one column, say the first, consists entirely of ones and the remaining columns each sum to zero, then our expansions show that for each $0 < \alpha < \tfrac{1}{2}$

$n^{1/2}(\hat\beta_T(\alpha) - \beta - \delta(\alpha)) \xrightarrow{L} N(0,\; Q^{-1}\sigma^2(\alpha,F))$

where $\delta(\alpha)$ is a vector whose components are all zero except for the first. Therefore, the ambiguity about the parameter being estimated involves only the intercept and none of the slope parameters.

Additionally, we present a large-sample theory of confidence ellipsoids and general linear hypothesis testing which is quite similar to that of least-squares estimation with Gaussian errors. The close analogy between the asymptotic distributions of trimmed means and of our trimmed least-squares estimator, $\hat\beta_T(\alpha)$, is remarkable. Other reasonable definitions of a trimmed least-squares estimator do not have this property.
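To indicate the parallel with least squares, here is a hedged sketch of a Wald-type test of $H_0\colon H\beta = h$ based on $\hat\beta_T(\alpha)$; the consistent estimate `s2` of $\sigma^2(\alpha,F)$ is taken as a given input, since its construction belongs to the large-sample theory developed below, and the chi-square calibration mirrors the ordinary least-squares case.

```python
# Sketch of a Wald-type test of H0: H beta = h based on the trimmed LS fit.
# Mirrors the least-squares chi-square test, with sigma^2(alpha, F) replaced
# by a consistent estimate s2 supplied by the caller (not constructed here).
import numpy as np
from scipy.stats import chi2

def wald_test(X, beta_T, s2, H, h):
    H = np.atleast_2d(H)
    diff = H @ beta_T - h
    cov = s2 * np.linalg.inv(X.T @ X)     # estimated covariance of beta_T
    W = diff @ np.linalg.solve(H @ cov @ H.T, diff)
    pval = chi2.sf(W, df=H.shape[0])      # asymptotic chi-square reference law
    return W, pval
```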


For example, define an estimator of $\beta$, call it K50, as follows: compute the residuals from $\hat\beta(.5)$ and, after removing the observations with the $k = [\alpha n]$ smallest and $k$ largest residuals, compute the least-squares estimator. The asymptotic behavior of K50, which is complicated and is not identical to that of $\hat\beta_T(\alpha)$, will be reported elsewhere.

Section 2 presents the formal model being considered, the main results are in Section 3, and several examples are found in Section 4. Proofs are in the appendix.

2. NOTATION AND ASSUMPTIONS

Recall the form (1.1) of our model. Although $Y$, $X$, and $e$ will depend on $n$ we will not make that explicit in the notation. Let $x_i = (x_{i1},\ldots,x_{ip})$ be the $i$th row of $X$. Assume $x_{i1} = 1$, $i = 1,\ldots,n$,

$\lim_{n\to\infty} \max_{j \le p,\, i \le n} \left( n^{-1/2} |x_{ij}| \right) = 0,$    (2.1)

and there exists positive definite $Q$ such that

$\lim_{n\to\infty} n^{-1}(X'X) = Q.$    (2.2)

Without loss of generality, we assume $F^{-1}(\tfrac{1}{2}) = 0$. Let $N_p(\mu,\Sigma)$ denote the $p$-variate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$. For $0 < p < 1$ define $\xi_p = F^{-1}(p)$. Fix $0 < \theta_1 < \cdots < \theta_m < 1$ and trimming proportions $\alpha_1$ and $\alpha_2$ with $0 < \alpha_1 < \alpha_2 < 1$, and define $\xi_1 = \xi_{\alpha_1}$ and $\xi_2 = \xi_{\alpha_2}$.


0, there exists K > 0,

n>

Her C.. is our x .. n -~ J1

1J

0, and an integer nO such

that if n > no' then p{

inf

IIM(~) II < n}
K ~

Proof:

This is shown in exactly the same manner as Lemma 5.2 of Jureckova (1977).

Proof of Theorem 2: In $M(\Delta)$ replace $\Delta$ by $n^{1/2}(\hat\beta(\theta) - \beta(\theta))$. By Theorem 1,

$M(n^{1/2}(\hat\beta(\theta) - \beta(\theta))) = n^{-1/2} \sum_{i=1}^{n} x_i'\, \psi_\theta(Y_i - x_i \hat\beta(\theta)) = o_p(1).$    (A.4)

From Lemma A.3, $n^{1/2}(\hat\beta(\theta) - \beta(\theta)) = O_p(1)$. Theorem 2 follows from (A.3) and (A.4).

Proof of Corollary 1: Let $\gamma(\theta_i) = \hat\beta(\theta_i) - \beta(\theta_i)$ and $\gamma' = (\gamma(\theta_1)',\ldots,\gamma(\theta_m)')$. We need only show that if $c_j \in R^p$ for $j = 1,\ldots,m$ and $c' = (c_1',\ldots,c_m')$, then

$n^{1/2}\, c'\gamma \xrightarrow{L} N(0,\; c'(\Omega \otimes Q^{-1})c).$    (A.5)

Define

$\eta_{ij} = c_j'\, Q^{-1} x_i'\, \psi_{\theta_j}(Y_i - x_i \beta(\theta_j)) / f(\xi_{\theta_j})$

and

$\eta_{i\cdot} = \sum_{j=1}^{m} \eta_{ij}.$

By Theorem 2,

$n^{1/2}\, c'\gamma = n^{-1/2} \sum_{i=1}^{n} \eta_{i\cdot} + o_p(1).$

Then (A.5) follows since routine calculations show that $\eta_{1\cdot}, \eta_{2\cdot}, \ldots$ satisfy the conditions of Lindeberg's Central Limit Theorem. (See, for example, Loeve (1963, Section 20.2, Theorem B).)
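For completeness, the condition being verified is the Lindeberg condition for a triangular array of independent, mean-zero summands (a generic restatement; the normalization $s_n^2$ is our notation):

```latex
% Lindeberg condition for independent, mean-zero summands eta_1, ..., eta_n
% with s_n^2 = sum_i Var(eta_i):
\[
  \frac{1}{s_n^{2}} \sum_{i=1}^{n}
  E\Bigl[\eta_i^{2}\; I\bigl(|\eta_i| > \varepsilon s_n\bigr)\Bigr]
  \;\longrightarrow\; 0
  \qquad \text{for every } \varepsilon > 0 .
\]
```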

For $A$ any matrix, let $|A| = \max_{i,j} |A_{ij}|$.

Lemma A.4: Let $D_{in}$ ($= D_i$) be an $r \times c$ matrix. Suppose

$\sup_n \left( n^{-1} \sum_{i=1}^{n} |D_i|^2 \right) < \infty.$    (A.6)

Let $I$ be an open interval containing $\xi_1$ and $\xi_2$, and let the function $g(x)$ be defined for all $x$ and Lipschitz continuous on $I$. For $\Delta_1$, $\Delta_2$, and $\Delta_3$ in $R^p$ and $\Delta = (\Delta_1', \Delta_2', \Delta_3')'$ define

$T^*(\Delta) = n^{-1/2} \sum_{i=1}^{n} D_i\, g(e_i + n^{-1/2} x_i \Delta_3)\; I(\xi_1 + n^{-1/2} x_i \Delta_1 < e_i \le \xi_2 + n^{-1/2} x_i \Delta_2),$    (A.7)

$T(\Delta) = T^*(\Delta) - E\,T^*(\Delta),$
$S^*(\Delta) = T^*(\Delta) - T^*(0),$
$S(\Delta) = S^*(\Delta) - E\,S^*(\Delta).$

Then for all $M > 0$, $\sup_{|\Delta| \le M} |S(\Delta)| = o_p(1)$.

Proof: Here we follow Bickel (1975, Lemma 4.1) closely. For convenience assume $M = 1$. Define for $\ell = 1,2$

$I_{i\ell}(\Delta) = I(e_i \le \xi_\ell + n^{-1/2} x_i \Delta_\ell),$

and let $b_i(\Delta) = g(e_i + n^{-1/2} x_i \Delta_3)$.

Then for all $n$ large enough,

$\mathrm{Var}\, S(\Delta) \le n^{-1} \sum_{i=1}^{n} |D_i|^2\, \mathrm{Var}\{[b_i(\Delta) - b_i(0)]\, I(e_i \in I)\} + n^{-1} \sum_{i=1}^{n} |D_i|^2\, \mathrm{Var}\{b_i(0)\,[I_{i1}(\Delta) - I_{i1}(0)]\} + n^{-1} \sum_{i=1}^{n} |D_i|^2\, \mathrm{Var}\{b_i(0)\,[I_{i2}(\Delta) - I_{i2}(0)]\}.$

Since $g$ is Lipschitz on $I$ and $e_i + n^{-1/2} x_i \Delta_3 \in I$ on the event in question, the first sum is

$O\!\left( n^{-1} \sum_{i=1}^{n} |D_i|^2\, |x_i|^2\, n^{-1} \right) = o(1)$

by (A.6) and (2.1). Since $F$ is continuously differentiable in neighborhoods of $\xi_1$ and $\xi_2$, each of the remaining sums is

$O\!\left( n^{-1} \sum_{i=1}^{n} |D_i|^2\, |x_i|\, n^{-1/2} \right) = o(1).$



Therefore, for any fixed $\Delta$,

$S(\Delta) \xrightarrow{P} 0.$    (A.8)

Choose $\delta > 0$. Now cover the $p$-dimensional cube $[-1,1]^p$ with a union of cubes having vertices on the grid

$J(\delta) = \{(j_1\delta, j_2\delta, \ldots, j_p\delta)\colon\; j_i = 0, \pm 1, \pm 2, \ldots, \text{or } \pm([1/\delta] + 1)\}.$

If $|\Delta| \le 1$, then for $\ell = 1,2,3$ let $V_\ell(\Delta)$ be the lowest vertex of the cube containing $\Delta_\ell$, and let $V(\Delta) = (V_1(\Delta)', V_2(\Delta)', V_3(\Delta)')'$. Then straightforward calculations show that for some constant $C$,

$|S^*(\Delta) - S^*(V(\Delta))| \le n^{-1/2} \sum_{i=1}^{n} |D_i|\, |b_i(\Delta) - b_i(V(\Delta))|\, I(e_i \in I) + n^{-1/2} \sum_{i=1}^{n} |D_i|\, |b_i(V(\Delta))|\, \bigl[ I\{-a_i \le \rho_{1i} \le a_i\} + I\{-a_i \le \rho_{2i} \le a_i\} \bigr],$

where $a_i = C\,\delta\, n^{-1/2} |x_i|$ and $\rho_{\ell i} = e_i - \xi_\ell - n^{-1/2} x_i V_\ell(\Delta)$ for $\ell = 1,2$.

Now since $g$ is Lipschitz on $I$ and $F$ has a continuous derivative in neighborhoods of $\xi_1$ and $\xi_2$, there is a constant $K_1$ such that, for $m = 1,2$ and all $\Delta \in (J(\delta))^3$,

$E\, W_m(\Delta) \le K_1 \delta,$

where $W_m(\Delta)$ denotes the $m$th sum on the right-hand side above; this follows by (2.2), (A.6), and the Cauchy-Schwarz inequality. Exactly as in the argument leading to (A.8) we have

$W_m(\Delta) - E\, W_m(\Delta) = o_p(1).$

Thus, for all $\epsilon > 0$ and $\delta > 0$, there exists $n_0$ such that if $n \ge n_0$, then for $m = 1,2$

$P\{\max_{\Delta \in J(\delta)^3} W_m(\Delta) > \epsilon + K_1 \delta\} < \epsilon,$

whence

$P\{\sup_{|\Delta| \le 1} |S^*(\Delta) - S^*(V(\Delta))| > \epsilon + K_1 \delta\} < \epsilon.$

Note that there exists a constant $K_3$ such that

$\sup_{|\Delta| \le 1} |E(S^*(\Delta) - S^*(V(\Delta)))| \le K_3 \delta.$

Choosing $\delta = \epsilon / \max(K_1, K_3)$, we have that for all $\epsilon > 0$ there exists $n_0$ such that

$P\{\sup_{|\Delta| \le 1} |S(\Delta) - S(V(\Delta))| > 3\epsilon\} < \epsilon.$    (A.9)

By (A.8), for this $\delta$ and $\epsilon$ there exists $n_1$ such that if $n > n_1$,

$P\{\max_{\Delta \in J(\delta)^3} |S(\Delta)| > \epsilon\} < \epsilon.$    (A.10)

Inequalities (A.9) and (A.10) prove the lemma.


REFERENCES

" South African Statistics Journal, 8, 127-134.

Daniel, Cuthbert, and Wood, Fred S. (1971), Fitting Equations to Data, New York: John Wiley.

Huber, Peter J. (1977), Robust Statistical Procedures, Philadelphia: SIAM.

Jaeckel, Louis A. (1971), "Robust Estimates of Location: Symmetry and Asymmetric Contamination," The Annals of Mathematical Statistics, 42, 1020-1034.

Jureckova, Jana (1977), "Asymptotic Relations of M-estimates and R-estimates in Linear Regression Model," The Annals of Statistics, 5, 464-472.

Koenker, Roger, and Bassett, Gilbert, Jr. (1978), "Regression Quantiles," Econometrica, 46, 33-50.

Lenth, Russell V. (1976), "A Computational Procedure for Robust Multiple Regression," Tech. Report 53, University of Iowa.

Loeve, Michel (1963), Probability Theory, New York: Van Nostrand.

McKeown, P.G., and Rubin, D.S. (1977), "A Student Oriented Preprocessor for MPS/360," Computers and Operations Research, 4, 227-229.

Scheffe, Henry (1959), The Analysis of Variance, New York: John Wiley and Sons.

Searle, Shayle R. (1971), Linear Models, New York: John Wiley and Sons.

Stigler, Stephen M. (1977), "Do Robust Estimators Work with Real Data?" The Annals of Statistics, 5, 1055-1098.