Linear programming approach to LMS-estimation

P. Boček and P. Lachout
Institute of Information Theory and Automation, Praha, Czech Republic

Computational Statistics & Data Analysis 19 (1995) 129-134, North-Holland

Received February 1992; revised January 1993

Abstract: In the paper a probabilistic algorithm, based on the Simplex Method, is suggested for minimization of the k-th smallest value of absolute residuals in linear regression. In particular it is suitable for computation of the Least Median of Squares (LMS) estimator of P.J. Rousseeuw and A.M. Leroy [4]. Numerical results indicate that the algorithm represents an improvement in comparison with those put forward earlier.

Keywords: Order statistics; Regression residuals; LMS estimator; The Simplex Method
1. Introduction
Estimators based on minimization of the k-th smallest value of absolute residuals are of great importance in robust regression because of their high breakdown point. They are useful in diagnostic processing or as a starting point for some more efficient estimators (cf. [6]). The LMS estimator of [4] belongs to this class; its breakdown point is the highest possible. Theoretical properties of these estimators are desirable. Their actual computation, on the other hand, brings grave difficulties in consequence of its combinatorial complexity. Hence an effort has been made in the literature to try and get some suboptimal solutions. So far two algorithms have been suggested: "Progress" in [4] and "Optimal shift" in [5]. The former makes use of random search over certain data-determined hyperplanes, while the latter represents an improvement based on shifts of such hyperplanes. Cf. [4], [5] for details. The present paper introduces a new probabilistic algorithm whose idea comes from the Simplex Method. The second part of the paper describes the algorithm; in the third one a proof of convergence is outlined. A numerical example, comparing the performance of the three algorithms, is presented in the fourth part of the paper.

Correspondence to: P. Lachout, Institute of Information Theory and Automation, AV ČR, Pod Vodárenskou věží 4, 18208 Praha 8, Czech Republic.

0167-9473/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0167-9473(93)E0051-5
2. Algorithm

Let us consider a linear regression model

    y = Xβ + ε,                                                    (1)

where y = (y_1, ..., y_n)' is an n-dimensional vector of observations, X is a given n × p matrix whose rank is p, β is an unknown p-dimensional vector of regression parameters, and ε = (ε_1, ..., ε_n)' is an unknown n-dimensional vector of errors. We introduce a numerical algorithm; therefore statistical assumptions are omitted. We denote by X_i the i-th row of X. For a p-dimensional vector b we define the vector e(b) of absolute residuals, e(b) = (e_1(b), ..., e_n(b))', where e_i(b) = |y_i - X_i'b|, i = 1, ..., n. The corresponding order statistics will be denoted e_[1](b) ≤ ... ≤ e_[n](b). Minimization, for a given k = 1, ..., n, of the order statistic e_[k](b) yields an estimate of the regression parameters, say,

    M_k ∈ argmin_b e_[k](b),   k = 1, ..., n.                      (2)
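As a concrete illustration of definition (2), the following minimal Python sketch (not part of the original paper; the data, the candidate fit b, and the helper name are invented for illustration) computes the absolute residuals e_i(b) and the k-th order statistic e_[k](b) for a fixed b:

```python
# Sketch: absolute residuals e_i(b) = |y_i - X_i' b| and their k-th
# order statistic e_[k](b), for invented toy data (n = 5, p = 2).

def kth_absolute_residual(X, y, b, k):
    """Return e_[k](b), the k-th smallest absolute residual (1-based k)."""
    residuals = [abs(yi - sum(xij * bj for xij, bj in zip(xi, b)))
                 for xi, yi in zip(X, y)]
    return sorted(residuals)[k - 1]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0.1, 1.0, 2.2, 2.9, 10.0]           # last observation is an outlier
b = [0.0, 1.0]                            # candidate fit: intercept 0, slope 1

# e(b) is roughly (0.1, 0.0, 0.2, 0.1, 6.0); the median-type choice
# k = 3 ignores the outlier, while k = n = 5 is dominated by it.
assert abs(kth_absolute_residual(X, y, b, 3) - 0.1) < 1e-9
assert kth_absolute_residual(X, y, b, 5) == 6.0
```

The contrast between k = 3 and k = 5 on the same data shows why the choice k = [n/2] is robust to outlying observations.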
Recall that for k = [n/2] (integer part) the estimator M_k is nothing else but the LMS-estimator, and for k = n it is the L∞ estimator. Further note that the estimator M_n can be easily obtained from a solution of the following linear programming problem (see [1]):

    min r   under restrictions
    Xb + r·1 ≥ y,
    -Xb + r·1 ≥ -y,                                                (3)

where 1 = (1, ..., 1)' ∈ R^n.

For the other values of k the estimators cannot be expressed that easily. Still there is a possibility to derive M_k from a solution of the dual problem corresponding to (3), namely

    max y'λ - y'φ   under restrictions
    X'λ - X'φ = 0,
    λ_1 + ··· + λ_n + φ_1 + ··· + φ_n = 1,                         (4)
    λ_i, φ_i ≥ 0   for i = 1, ..., n,

where λ = (λ_1, ..., λ_n)' and φ = (φ_1, ..., φ_n)'.
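A small sanity check on problem (3), not from the paper: for the location model (p = 1, all X_i = 1), the minimax fit that (3) computes has the closed form b = (min y + max y)/2, the midrange, with optimal value r = (max y - min y)/2. The data below are invented.

```python
# Sketch (assumption: location model X_i = 1, so problem (3) reduces to
# minimizing max_i |y_i - b|). The optimum is the midrange of y.

def max_abs_residual(y, b):
    return max(abs(yi - b) for yi in y)

y = [1.0, 2.0, 3.5, 9.0]
midrange = (min(y) + max(y)) / 2.0        # = 5.0
r_opt = (max(y) - min(y)) / 2.0           # = 4.0

assert max_abs_residual(y, midrange) == r_opt
# Moving b in either direction can only increase the maximum residual:
for delta in (-1.0, -0.1, 0.1, 1.0):
    assert max_abs_residual(y, midrange + delta) >= r_opt
```

This makes the k = n case concrete; the point of the paper is that the intermediate cases 1 < k < n require the dual formulation (4) instead of a single LP.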
Let us briefly recall some terminology of the Simplex Method; for further reference on the Simplex Method, cf. [3]. Consider a problem

    max c'x   under restrictions
    Ax = b,
    x ≥ 0,   x ∈ R^d,                                              (5)

where c ∈ R^d, b ∈ R^q are (column) vectors and A is a q × d matrix with the full rank q ≤ d. Any nonsingular q × q submatrix of A is called a basis of (5); if B is a basis of (5) consisting of the columns a_{j_1}, ..., a_{j_q} of A, denote ν(B) = {j_1, ..., j_q}. For a basis B, the corresponding basic solution x ∈ R^d is such a vector that Ax = b and x_j = 0 for j ∉ ν(B). A basis B is feasible if the corresponding basic solution is non-negative. If B is a basis, we restrict the vector c onto a column vector c_B = (c_j : j ∈ ν(B)) and define the corresponding criterial row c_B'B^{-1}A - c'. If, for i = 1, ..., d, the i-th component of the criterial row is non-negative, we say that the criterion is fulfilled for i. The meaning is the following: if the criterion is fulfilled for i, then the i-th inequality of the dual problem of (5),

    min b'y   under restrictions   A'y ≥ c,

is fulfilled for y = (B^{-1})'c_B.

According to our problem setting, to get the estimator M_k means to find a feasible basis of (4) for which the simplex criterion is fulfilled for n + k columns. The suggested algorithm is based on a search among the feasible bases of (4). The Simplex Method as well as random selection are employed. Notice that, as the matrix X has the full rank p, the matrix

    [ X'   -X' ]
    [ 1'    1' ]                                                   (6)

has the full rank p + 1. Thus the Simplex Method may be applied to the problem (4).

The actual algorithm looking for M_k takes the following steps:

Step 0. Put r̂ = +∞, λ̂ = φ̂ = -1.

Step 1. Choose a feasible basis B of (4) randomly. The choice is made in such a way that each feasible basis has a positive probability of appearance.

Step 2. Let (λ, φ) be the basic solution corresponding to the basis B and r = y'λ - y'φ. If r ≥ r̂ then go to Step 1, else continue.

Step 3. If the number of indices for which the criterion is fulfilled is at least n + k, then put r̂ = r, λ̂ = λ, φ̂ = φ and go to Step 1; else continue.

Step 4. Make one change of the basis B in the Simplex procedure for the problem (4) and go to Step 2.
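The control flow of Steps 0-4 and of the stopping rule below can be sketched in Python. This is NOT the authors' simplex-based procedure: for brevity it replaces Steps 2-4 by direct evaluation of e_[k] at randomly drawn candidate fits (here, location-model candidates b = y_i), but it reproduces the roles of r̂ (best value so far), the random restarts of Step 1, and the stopping parameter Q. All names and the toy data are invented.

```python
import random

# Simplified sketch of the paper's search loop (not the simplex-based
# Steps 2-4): candidates are drawn at random, the best e_[k] seen so far
# plays the role of r-hat, and the search stops after Q consecutive
# non-improving trials, mirroring the stopping rule.

def lms_location_search(y, k, Q=50, seed=0):
    rng = random.Random(seed)
    r_hat, b_hat = float("inf"), None      # Step 0: initialize
    stale = 0
    while stale < Q:                       # stopping rule with parameter Q
        b = rng.choice(y)                  # Step 1 analogue: random candidate
        r = sorted(abs(yi - b) for yi in y)[k - 1]
        if r < r_hat:                      # Step 3 analogue: record improvement
            r_hat, b_hat, stale = r, b, 0
        else:
            stale += 1
    return b_hat, r_hat

y = [0.0, 0.1, 0.2, 0.3, 10.0]             # invented data with one outlier
b_hat, r_hat = lms_location_search(y, k=3, Q=50)
assert r_hat <= 0.2 + 1e-12                 # a central candidate ignores 10.0
```

In the actual algorithm each candidate is a feasible basis of (4) and improvement is detected via the simplex criterion, but the bookkeeping around r̂ and Q is the same.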
Stopping rule. If r̂ remains unchanged during Q consecutive visits in Step 2, finish the computation. (In applications, Q = 50 or Q = 100 proved to be useful.)

After stopping the algorithm we have the resulting values r̂, λ̂, φ̂. An approximation b̂ of the estimator M_k is determined by the following equations and inequalities:

    X_i'b̂ + r̂ = y_i      for λ̂_i > 0,
    -X_i'b̂ + r̂ = -y_i    for φ̂_i > 0;                             (7)

    X_i'b̂ + r̂ ≥ y_i      if the criterion is fulfilled for i,
    -X_i'b̂ + r̂ ≥ -y_i    if the criterion is fulfilled for (n + i).  (8)

Such a solution always exists and, as a rule, only the equations (7) have to be solved. If the solution of (7) is determined uniquely, the inequalities (8) are fulfilled automatically.

Let us briefly discuss the random choice of a basis in Step 1. The structural matrix of (4) has the special structure (6). Therefore, the set ν(B) of a basis B may contain both i and i + n for at most one index i; the other choices of columns produce singular matrices. Since each basis B is completely determined by the set ν(B), it is enough to generate this set. Let j_1, ..., j_p be randomly chosen distinct numbers from {1, ..., n} and let j_{p+1} be another randomly chosen number from {1, ..., n}. A set ν(B) is then determined as follows. Take q = 1, ..., p. If j_q = j_{p+1}, then both j_q and j_q + n belong to ν(B); else put j_q or j_q + n into ν(B) at random. At the end, if j_{p+1} does not belong to {j_1, ..., j_p}, then put j_{p+1} or j_{p+1} + n into ν(B) at random.
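The random generation of ν(B) described above can be sketched directly. In the sketch below (helper name invented), columns 1, ..., n stand for the λ-variables and columns n + 1, ..., 2n for the φ-variables of (4):

```python
import random

# Sketch of the random choice of v(B) described in the text: the set has
# p + 1 column indices and may contain both i and i + n for at most one
# index i (namely when j_q == j_{p+1}).

def random_basis_index_set(n, p, rng=random):
    js = rng.sample(range(1, n + 1), p)        # distinct j_1, ..., j_p
    j_extra = rng.randint(1, n)                # j_{p+1}
    v = set()
    for j in js:
        if j == j_extra:
            v.update((j, j + n))               # take both paired columns
        else:
            v.add(rng.choice((j, j + n)))      # take one of the pair at random
    if j_extra not in js:
        v.add(rng.choice((j_extra, j_extra + n)))
    return v

v = random_basis_index_set(n=10, p=4, rng=random.Random(0))
assert len(v) == 5                             # always p + 1 columns
paired = [i for i in range(1, 11) if i in v and i + 10 in v]
assert len(paired) <= 1                        # at most one paired index
```

Note that the construction always yields exactly p + 1 indices: either j_{p+1} coincides with some j_q (contributing two columns) or it contributes one column of its own.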
3. Proof of convergence

The proof makes use of the Linear Programming Theory, cf. e.g. [3]. Consider the following pair of linear programming problems (I, J are arbitrary non-empty subsets of {1, ..., n}):

    (P_{I,J})   min r   under restrictions
                X_i'b + r ≥ y_i,    i ∈ I,
                -X_j'b + r ≥ -y_j,  j ∈ J;

    (D_{I,J})   max y'λ - y'φ   under restrictions
                X'λ - X'φ = 0,
                λ_1 + ··· + λ_n + φ_1 + ··· + φ_n = 1,
                λ_i, φ_i ≥ 0   for i = 1, ..., n,
                λ_i = 0   for i ∉ I,
                φ_j = 0   for j ∉ J.
The problem (D_{I,J}) is equivalent to the dual problem of (P_{I,J}). Therefore, the problems (D_{I,J}) and (P_{I,J}) reach the same optimal value.

(a) Let us prove r̂ ≥ e_[k](M_k). The inequality is nontrivial only for r̂ < ∞; hence there are the current feasible basis B and the corresponding basic solution (λ, φ) in Step 3. That is, (λ, φ) is an optimal solution of (D_{I,J}) where

    I = {i : the criterion is fulfilled for i},
    J = {j : the criterion is fulfilled for n + j}.

By duality, r̂ is an optimal value for (P_{I,J}). In Step 3 it has to hold card I + card J ≥ n + k, which implies card(I ∩ J) ≥ k (indeed, card(I ∩ J) = card I + card J - card(I ∪ J) ≥ (n + k) - n = k). Consequently r̂ ≥ e_[k](M_k), as |y_i - X_i'b̂| ≤ r̂ whenever i ∈ I ∩ J.

(b) We are going to prove that the value of e_[k](M_k) is found in finite time with probability one. Denote r_0 = e_[k](M_k) and

    I = {i : X_i'M_k + r_0 ≥ y_i},
    J = {j : -X_j'M_k + r_0 ≥ -y_j}.

Then r_0 is optimal in (P_{I,J}) and card I + card J ≥ n + k, because card(I ∩ J) ≥ k and each i = 1, ..., n has to belong to I or J (in fact, y_i - X_i'M_k or X_i'M_k - y_i is non-positive). By duality, (D_{I,J}) has an optimal solution (λ, φ) which is the basic solution corresponding to a certain feasible basis B_0. The algorithm has to choose B_0 in finite time with probability one because of the finite number of bases and the positive probabilities of their appearances. Suppose B_0 has been chosen. If r̂ ≤ r_0 then by (a) it is r̂ = r_0; otherwise r̂ > r_0 and we proceed to Step 3, where r̂ takes on the value r_0.
4. Numerical example

The presented algorithm has been implemented on a PC; the source program is written in PASCAL. We applied it to various data sets and its results have been compared with those of the other discussed algorithms. Only one typical example is presented here: a known regression problem of determining the influence of anatomical factors on wood specific gravity (five independent variables and an intercept); data of Draper and Smith [2], p. 227. The LMS-estimation consists in minimization of the 13-th absolute residual. The results of the three algorithms are summed up in the following table. "Progress" denotes the computational procedure introduced in [4], "Optimal Shift" is developed in [5], and "Simplex" symbolically indicates our algorithm.
                 Progress    Optimal Shift    Simplex
    x1            0.2687       0.2452          0.2354
    x2            0.2381      -0.4554         -0.0450
    x3           -0.5357      -0.5214         -0.5746
    x4           -0.2937      -0.4509         -0.3667
    x5            0.4510       0.6714          0.6263
    Intercept     0.4347       0.3566          0.3185
    e_[13]        0.0073       0.0055          0.0041
    computation
    time         14 min       15 min          30 sec

The table shows the advantages of the suggested algorithm: the computation time is shorter and the achieved value of e_[13] is substantially smaller. Similar results were achieved for various data sets.
References

[1] Arthanari, T.S. and Y. Dodge, Mathematical Programming in Statistics (John Wiley & Sons, New York, 1981).
[2] Draper, N.R. and H. Smith, Applied Regression Analysis (John Wiley & Sons, New York, 1966).
[3] Hadley, G., Linear Programming (Addison-Wesley, London, 1962).
[4] Leroy, A.M. and P.J. Rousseeuw, Robust Regression and Outlier Detection (John Wiley & Sons, New York, 1987).
[5] Tichavský, P., Algorithms for and geometrical characterizations of solutions in the LMS and LTS linear regression, Computational Statistics Quarterly 2 (1991) 139-151.
[6] Víšek, J.Á., What is adaptivity of regression analysis intended for?, in: Transactions of ROBUST'90 (Union of Czechoslovak Mathematicians and Physicists, 1990) 160-181.