Computational Statistics & Data Analysis 19 (1995) 129-134
North-Holland

Linear programming approach to LMS-estimation

P. Bocek and P. Lachout
Institute of Information Theory and Automation, Praha, Czech Republic

Received February 1992
Revised January 1993

Abstract: In the paper a probabilistic algorithm, based on the Simplex Method, is suggested for minimization of the k-th smallest value of absolute residuals in linear regression. In particular it is suitable for computation of the Least Median of Squares (LMS) estimator of P.J. Rousseeuw and A.M. Leroy [4]. Numerical results indicate that the algorithm represents an improvement in comparison with those put forward earlier.

Keywords: Order statistics; Regression residuals; LMS estimator; The Simplex Method

1. Introduction

Estimators based on minimization of the k-th smallest value of absolute residuals are of great importance in robust regression because of their high breakdown point. They are useful in diagnostic processing or as a starting point for some more efficient estimators (cf. [6]). The LMS estimator of [4] belongs to this class; its breakdown point is the highest possible. Theoretical properties of these estimators are desirable. Their actual computation, on the other hand, brings grave difficulties in consequence of its combinatorial complexity. Hence an effort has been made in the literature to try and get some suboptimal solutions. So far two algorithms have been suggested: "Progress" in [4] and "Optimal shift" in [5]. The former makes use of random search over certain data-determined hyperplanes, while the latter represents an improvement based on shifts of such hyperplanes; cf. [4], [5] for details.

The present paper introduces a new probabilistic algorithm whose idea comes from the Simplex Method. The second part of the paper describes the algorithm; in the third one a proof of convergence is outlined. A numerical example, comparing performance of the three algorithms, is presented in the fourth part of the paper.

Correspondence to: P. Lachout, Institute of Information Theory and Automation, AV CR, Pod Vodarenskou Vezi 4, 18208 Praha 8, Czech Republic.

0167-9473/95/$09.50 © 1995 - Elsevier Science B.V. All rights reserved
SSDI 0167-9473(93)E0051-5

2. Algorithm

Let us consider a linear regression model

  y = Xβ + ε                                                    (1)

where y = (y_1, ..., y_n)' is an n-dimensional vector of observations, X is a given n × p matrix whose rank is p, β is an unknown p-dimensional vector of regression parameters, and ε = (ε_1, ..., ε_n)' is an unknown n-dimensional vector of errors. We introduce a numerical algorithm; therefore statistical assumptions are omitted.

We denote by X_i the i-th row of X. For a p-dimensional vector b we define the vector e(b) of absolute residuals, e(b) = (e_1(b), ..., e_n(b))', where e_i(b) = |y_i − X_i'b|, i = 1, ..., n. The corresponding order statistics will be denoted e_[1](b) ≤ ... ≤ e_[n](b). Minimization, for a given k = 1, ..., n, of the order statistic e_[k](b) yields an estimate of the regression parameters, say,

  M_k ∈ argmin_b e_[k](b),   k = 1, ..., n.                     (2)
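To make the notation concrete, here is a small numerical sketch (with made-up data; this is not the authors' algorithm) that evaluates e_[k](b) and approximates M_k by brute force over exact fits through p-point subsets, which suffices for tiny examples:

```python
import itertools
import numpy as np

def kth_abs_residual(y, X, b, k):
    """e_[k](b): the k-th smallest absolute residual |y_i - X_i'b| (k is 1-based)."""
    return np.sort(np.abs(y - X @ b))[k - 1]

# Illustrative data: a line through 7 points, with one gross outlier.
X = np.column_stack([np.ones(7), np.arange(7.0)])
y = 2.0 + 3.0 * np.arange(7.0)
y[6] = 100.0  # outlier

n, p = X.shape
k = n // 2  # k = [n/2], the LMS index used in the paper

# Brute force: exact fit through every p-subset of points, keep the best e_[k].
best_b, best_val = None, np.inf
for idx in itertools.combinations(range(n), p):
    Xi = X[list(idx)]
    if abs(np.linalg.det(Xi)) < 1e-12:
        continue  # skip degenerate subsets
    b = np.linalg.solve(Xi, y[list(idx)])
    val = kth_abs_residual(y, X, b, k)
    if val < best_val:
        best_b, best_val = b, val

print(best_b, best_val)  # the clean line [2, 3] with e_[3] = 0, ignoring the outlier
```

The outlier leaves the fit untouched, illustrating the high breakdown point mentioned in the introduction.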

Recall that for k = [n/2] (integer part) the estimator M_k is nothing else but the LMS-estimator, and for k = n it is the L_∞ estimator. Further note that the estimator M_n can be easily obtained from a solution of the following linear programming problem (see [1]):

  min r  under restrictions
    Xb + r1 ≥ y,
    −Xb + r1 ≥ −y,                                              (3)

where 1 = (1, ..., 1)' ∈ R^n.

For the other values of k the estimators cannot be expressed that easily. Still there is a possibility to derive the M_k's from a solution of the dual problem corresponding to (3), namely

  max y'λ − y'φ  under restrictions
    X'λ − X'φ = 0,
    λ_1 + ... + λ_n + φ_1 + ... + φ_n = 1,
    λ_i, φ_i ≥ 0  for i = 1, ..., n,                            (4)

where λ = (λ_1, ..., λ_n)' and φ = (φ_1, ..., φ_n)'.
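Problem (3) can be handed to any LP solver. A minimal sketch using scipy.optimize.linprog (not the authors' PASCAL implementation; data and names below are invented for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Solve problem (3): the Chebyshev (L-infinity) fit M_n as a linear programme.
rng = np.random.default_rng(0)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
y = X @ np.array([1.0, 2.0]) + rng.uniform(-0.1, 0.1, n)

# Decision vector z = (b_1, ..., b_p, r); minimize r.
c = np.zeros(p + 1)
c[-1] = 1.0
ones = np.ones((n, 1))
A_ub = np.vstack([np.hstack([-X, -ones]),    #  Xb + r1 >= y
                  np.hstack([ X, -ones])])   # -Xb + r1 >= -y
b_ub = np.concatenate([-y, y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * p + [(0, None)])
b_hat, r_hat = res.x[:p], res.x[-1]
print(b_hat, r_hat)  # r_hat equals the largest absolute residual of the fit
```

At the optimum r equals max_i |y_i − X_i'b|, i.e. e_[n](b) is minimized, which is exactly the M_n of the text.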

Let us briefly recall some terminology of the Simplex Method; for further reference on the Simplex Method, cf. [3]. Consider a problem

  max c'x  under restrictions  Ax = b,  x ≥ 0,  x ∈ R^d,        (5)

where c ∈ R^d, b ∈ R^q are (column) vectors and A is a q × d matrix with the full rank q ≤ d. Any nonsingular q × q submatrix of A is called a basis of (5); if B is a basis of (5) consisting of the columns a_{j_1}, ..., a_{j_q} of A, denote ν(B) = {j_1, ..., j_q}. For a basis B, the corresponding basic solution x ∈ R^d is such a vector that Ax = b and x_j = 0 for j ∉ ν(B). A basis B is feasible if the corresponding basic solution is non-negative. If B is a basis, we restrict the vector c onto a column vector c_B = (c_j : j ∈ ν(B)) and define the corresponding criterial row c_B'B^{-1}A − c'. If, for i = 1, ..., d, the i-th component of the criterial row is non-negative, we say that the criterion is fulfilled for i. The meaning is the following: if the criterion is fulfilled for i, then the i-th inequality of the dual problem of (5), min b'y under restrictions A'y ≥ c, is fulfilled for y = (B^{-1})'c_B.

According to our problem setting, to get the estimator M_k means to find a feasible basis of (4) for which the simplex criterion is fulfilled for n + k columns. The suggested algorithm is based on a search among the feasible bases of (4); the Simplex Method as well as random selection are employed. Notice that, as the matrix X has the full rank p, the matrix

  ( X'  −X' )
  ( 1'   1' )                                                   (6)

has the full rank p + 1. Thus the Simplex Method may be applied to the problem (4).

The actual algorithm looking for M_k takes the following steps:

Step 0. Put r̂ = +∞, λ̂ = φ̂ = −1.

Step 1. Choose a feasible basis B of (4) randomly. The choice is made in such a way that each feasible basis has a positive probability of appearance.

Step 2. Let (λ, φ) be the basic solution corresponding to the basis B and r = y'λ − y'φ. If r ≥ r̂ then go to Step 1, else continue.

Step 3. If the number of indices for which the criterion is fulfilled is at least n + k, then put r̂ = r, λ̂ = λ, φ̂ = φ and go to Step 1; else continue.

Step 4. Make one change of the basis B in the Simplex procedure for the problem (4) and go to Step 2.
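The criterion count used in Step 3 can be illustrated numerically. The toy LP below is not problem (4); its numbers are invented purely to show how the criterial row c_B'B^{-1}A − c' is formed and how fulfilled indices are counted:

```python
import numpy as np

# Toy problem in the form (5): max c'x s.t. Ax = b, x >= 0,
# with two slack columns already appended.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 5.0, 0.0, 0.0])

nu = [0, 1]                      # nu(B): columns forming the basis B
B = A[:, nu]
x_B = np.linalg.solve(B, b)      # basic solution; feasible since >= 0
crit = c[nu] @ np.linalg.solve(B, A) - c   # criterial row c_B' B^{-1} A - c'
fulfilled = np.flatnonzero(crit >= -1e-12) # indices with criterion fulfilled
print(x_B, crit, len(fulfilled))
```

Here the criterion is fulfilled for all four columns, so this basis is optimal; in the algorithm, a basis of (4) qualifies in Step 3 as soon as the count reaches n + k.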


Stopping rule. If r̂ remains unchanged during Q consecutive visits in Step 2, finish the computation. (In applications, Q = 50 or Q = 100 proved to be useful.)

After stopping the algorithm we have the resulting values r̂, λ̂, φ̂. An approximation b̂ of the estimator M_k is determined by the following equations and inequalities:

  X_i'b̂ + r̂ = y_i      for λ̂_i > 0,
  −X_i'b̂ + r̂ = −y_i    for φ̂_i > 0;                            (7)

  X_i'b̂ + r̂ ≥ y_i      if the criterion is fulfilled for i,
  −X_i'b̂ + r̂ ≥ −y_i    if the criterion is fulfilled for (n + i).   (8)

Such a solution always exists and, as a rule, only the equations (7) have to be solved. If the solution of (7) is determined uniquely, the inequalities (8) are fulfilled automatically.

Let us briefly discuss the random choice of basis at Step 1. The structural matrix of (4) has the special structure (6). Therefore the set ν(B) of a basis B may contain both i and i + n for at most one index i; the other choices of columns of A produce singular matrices. Since each basis B is completely determined by the set ν(B), it is enough to generate this set. Let j_1, ..., j_p be randomly chosen distinct numbers of {1, ..., n} and let j_{p+1} be another randomly chosen number of {1, ..., n}. A set ν(B) is then determined as follows. Take q = 1, ..., p. If j_q = j_{p+1}, then both j_q and j_q + n belong to ν(B); else put j_q or j_q + n into ν(B) randomly. At the end, if j_{p+1} does not belong to {j_1, ..., j_p}, then put j_{p+1} or j_{p+1} + n into ν(B) randomly.
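This random generation of ν(B) can be sketched as follows (pure Python; feasibility and nonsingularity of the resulting basis would still have to be checked, as required in Step 1):

```python
import random

def random_basis_columns(n, p, rng=random):
    """Sketch of the random choice of nu(B) described above.

    Columns 1..n correspond to lambda_1..lambda_n, columns n+1..2n to
    phi_1..phi_n; at most one index i may contribute both i and i+n.
    Indices are 1-based to match the paper's notation.
    """
    j = rng.sample(range(1, n + 1), p)   # distinct j_1, ..., j_p
    j_extra = rng.randint(1, n)          # j_{p+1}, possibly repeating one j_q
    nu = set()
    for jq in j:
        if jq == j_extra:
            nu.update({jq, jq + n})      # the single lambda/phi pair
        else:
            nu.add(rng.choice([jq, jq + n]))
    if j_extra not in j:
        nu.add(rng.choice([j_extra, j_extra + n]))
    return nu

print(random_basis_columns(n=10, p=3))  # a set of p + 1 = 4 column indices
```

Every draw yields exactly p + 1 columns, with at most one λ/φ pair, so each candidate ν(B) of the structure allowed by (6) has positive probability of appearance.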

3. Proof of convergence

The proof makes use of the Linear Programming Theory, cf. e.g. [3]. Consider the following pair of linear programming problems (I, J are arbitrary non-empty subsets of {1, ..., n}):

(P_{I,J})  min r  under restrictions
    X_i'b + r ≥ y_i,     i ∈ I,
    −X_j'b + r ≥ −y_j,   j ∈ J;

(D_{I,J})  max y'λ − y'φ  under restrictions
    X'λ − X'φ = 0,
    λ_1 + ... + λ_n + φ_1 + ... + φ_n = 1,
    λ_i, φ_i ≥ 0  for i = 1, ..., n,
    λ_i = 0  for i ∉ I,
    φ_j = 0  for j ∉ J.

The problem (D_{I,J}) is equivalent to the dual problem of (P_{I,J}). Therefore the problems (D_{I,J}) and (P_{I,J}) reach the same optimal value.

(a) Let us prove r̂ ≥ e_[k](M_k). The inequality is nontrivial only for r̂ < ∞; hence there are the current feasible basis B and the corresponding basic solution (λ, φ) in Step 3. That is, (λ, φ) is an optimal solution of (D_{I,J}) where

  I = {i: the criterion is fulfilled for i},
  J = {j: the criterion is fulfilled for n + j}.

By duality, r̂ is an optimal value for (P_{I,J}). In Step 3 it has to hold card I + card J ≥ n + k, which implies card(I ∩ J) ≥ k, since card(I ∩ J) = card I + card J − card(I ∪ J) ≥ (n + k) − n. Consequently r̂ ≥ e_[k](M_k), as |y_i − X_i'b̂| ≤ r̂ whenever i ∈ I ∩ J.

(b) We are going to prove that the value of e_[k](M_k) is found in finite time with probability one. Denote r_0 = e_[k](M_k) and

  I = {i: X_i'M_k + r_0 ≥ y_i},   J = {j: −X_j'M_k + r_0 ≥ −y_j}.

Then r_0 is optimal in (P_{I,J}) and card I + card J ≥ n + k, because card(I ∩ J) ≥ k and each i = 1, ..., n has to belong to I or J (in fact, y_i − X_i'M_k or X_i'M_k − y_i is non-positive). By duality, (D_{I,J}) has an optimal solution (λ, φ) which is the basic solution corresponding to a certain feasible basis B_0. The algorithm has to choose B_0 in finite time with probability one because of the finite number of bases and the positive probabilities of their appearances. Suppose B_0 has been chosen. If r̂ ≤ r_0 then by (a) it is r̂ = r_0; otherwise r̂ > r_0 and we proceed to Step 3, where r̂ takes on the value r_0.

4. Numerical example

The presented algorithm has been implemented on a PC; the source programme is written in PASCAL. We applied it to various data sets and its results have been compared with those of the other discussed algorithms. Only one typical example is presented here: a known regression problem of determining the influence of anatomical factors on wood specific gravity (five independent variables and an intercept) - data of Draper and Smith [2], p. 227. The LMS-estimation consists in minimization of the 13-th absolute residual. The results of the three algorithms are summed up in the following table. "Progress" denotes the computational procedure introduced in [4], "Optimal Shift" is developed in [5] and "Simplex" symbolically indicates our algorithm.

                    Progress    Optimal Shift    Simplex
  X1                 0.2687        0.2452         0.2354
  X2                 0.2381       -0.4554         0.0450
  X3                -0.5357       -0.5214        -0.5746
  X4                -0.2937       -0.4509        -0.3667
  X5                 0.4510        0.6714         0.6263
  Intercept          0.4347        0.3566         0.3185
  e[13]              0.0073        0.0055         0.0041
  computation time   14 min        15 min         30 sec

The table shows the advantages of the suggested algorithm: the computation time is shorter and the achieved value of e[13] is substantially smaller. Similar results were achieved for various data.

References

[1] Arthanari, T.S. and Y. Dodge, Mathematical Programming in Statistics (John Wiley & Sons, New York, 1981).
[2] Draper, N.R. and H. Smith, Applied Regression Analysis (John Wiley & Sons, New York, 1966).
[3] Hadley, G., Linear Programming (Addison-Wesley, London, 1962).
[4] Leroy, A.M. and P.J. Rousseeuw, Robust Regression and Outlier Detection (John Wiley & Sons, New York, 1987).
[5] Tichavsky, P., Algorithms for and geometrical characterizations of solutions in the LMS and LTS linear regression, Computational Statistics Quarterly 2 (1981) 139-151.
[6] Visek, J.A., What is adaptivity of regression analysis intended for?, in: Transactions of ROBUST'90 (Union of Czechoslovak Mathematicians and Physicists, 1990) 160-181.
