Journal of Statistical Computation and Simulation, Vol. 75, No. 4, April 2005, 263-286
Least absolute value regression: recent contributions

TERRY E. DIELMAN*

M. J. Neeley School of Business, TCU, P.O. Box 298530, Fort Worth, TX 76129, USA

(Revised 7 October 2003; in final form 21 March 2004)

This article provides a review of research involving least absolute value (LAV) regression. The review concentrates primarily on research published since the survey article by Dielman (Dielman, T. E. (1984). Least absolute value estimation in regression models: An annotated bibliography. Communications in Statistics - Theory and Methods, 4, 513-541.) and includes articles on LAV estimation as applied to linear and non-linear regression models and in systems of equations. Some topics included are computation of LAV estimates, properties of LAV estimators and inferences in LAV regression. In addition, recent work in some areas related to LAV regression will be discussed.

Keywords: Linear regression models; Nonlinear regression models; Systems of equations; L1-norm regression; Minimum absolute deviation regression; Least absolute deviation regression; Minimum sum of absolute errors regression
1. Introduction
This article provides a review of research on least absolute value (LAV) regression. It includes articles on LAV estimation as applied to linear and non-linear regression models and in systems of equations. Some references to the LAV method as applied in approximation theory are also included. In addition, recent work in areas related to LAV regression will be discussed. I have attempted to include major contributions to LAV regression not included in Dielman [1]. My apologies in advance for any omissions. Additional survey articles on LAV estimation include the annotated bibliography of Dielman [1] as well as survey articles by Dodge [2], Narula [3] and Pynnonen and Salmi [4]. The paper by Dodge [2] served as an introduction to three special issues of Computational Statistics and Data Analysis (CSDA) entitled 'Statistical Data Analysis Procedures Based on the L1-Norm and Related Methods' (Computational Statistics and Data Analysis, Volume 5, Number 4, 1987; Volume 6, Numbers 3 and 4, 1988). An earlier version of the paper appears in Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge, editor, Amsterdam: North-Holland, 1987. This is a collection of articles from the First International Conference on the L1-Norm and Related Methods held in Neuchatel, Switzerland. I reference a number of the articles in the CSDA collection, but

*Fax: 817-257-7227; Email:
[email protected]
Journal of Statistical Computation and Simulation ISSN 0094-9655 print/ISSN 1563-5163 online © 2005 Taylor & Francis Ltd, http://www.tandf.co.uk/journals, DOI: 10.1080/0094965042000223680
not their earlier versions in the conference proceedings since these are essentially repeats. There are three other collections of articles worth mentioning. These collections contain selected papers from the Second, Third and Fourth International Conferences on the L1-Norm and Related Methods, held in Neuchatel, in 1992, 1997 and 2002, respectively. These collections are published as L1-Statistical Analysis and Related Methods, Y. Dodge, editor, Amsterdam: North-Holland, 1992; L1-Statistical Procedures and Related Topics, Y. Dodge, editor, Institute of Mathematical Statistics Lecture Notes - Monograph Series, Volume 31, 1997; and Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge, editor, Birkhauser, 2002. Selected papers from these collections will be referenced in this article.

In addition to survey articles, there are books or chapters in books that provide information on LAV regression. Birkes and Dodge [ref. 5, see Chap. 4], Bloomfield and Steiger [6] and Sposito [ref. 7, see Chap. 5] provide technical detail about LAV regression, whereas Farebrother [8] presents a discussion of the historical development of LAV and least squares (LS) methods.

The primary emphasis of this article is LAV linear regression. To motivate the discussion, consider the multiple linear regression model

    y_i = β_0 + ∑_{k=1}^{K} β_k x_ik + ε_i,   for i = 1, 2, ..., n,    (1)
where y_i is the ith value of the response variable; x_ik is the ith observation on the kth explanatory variable; β_0 is the constant in the equation; β_k is the coefficient of the kth explanatory variable; and ε_i is the ith value of the disturbance. Additional assumptions about the model are reserved until later sections. LAV regression involves finding estimates of β_0, β_1, β_2, ..., β_K, denoted b_0, b_1, b_2, ..., b_K, that minimize the sum of the absolute values of the residuals, ∑_{i=1}^{n} |y_i − ŷ_i|, where ŷ_i = b_0 + ∑_{k=1}^{K} b_k x_ik represent the predicted values. This problem can be restated as a linear programming problem:
    minimize   ∑_{i=1}^{n} (d_i⁺ + d_i⁻)    (2)

subject to

    y_i − (b_0 + ∑_{k=1}^{K} b_k x_ik + d_i⁺ − d_i⁻) = 0   for i = 1, 2, ..., n,    (3)
where d_i⁺, d_i⁻ ≥ 0, and the b_k, k = 0, 1, 2, ..., K, are unrestricted in sign. The d_i⁺ and d_i⁻ are, respectively, the positive and negative deviations (residuals) associated with the ith observation. The dual problem is stated most conveniently as

    maximize
^ ( / ? , y , - v,)
subject to
\ \ /-''•^'* ~ -"•"'
(4) for /: = 1, 2
K,
(5)
(=1
where 0 ≤ P_i ≤ 2, i = 1, 2, ..., n. In the dual formulation, the dual variables are the P_i, i = 1, 2, ..., n. See Wagner [9] for a discussion of this form of the dual problem. LAV regression is also known by several other names, including L1-norm regression, minimum absolute deviation regression, least absolute deviation regression and minimum sum of absolute errors regression.
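The primal formulation in equations (2) and (3) can be handed directly to a general-purpose linear programming solver. The sketch below is illustrative only (it is not one of the specialized algorithms reviewed in the next section); it sets the problem up for scipy.optimize.linprog, splitting the coefficient vector into positive and negative parts so that all LP variables are non-negative.

```python
import numpy as np
from scipy.optimize import linprog

def lav_regression_lp(X, y):
    """Least absolute value regression via the primal LP (2)-(3).

    X : (n, K) matrix of explanatory variables (no constant column).
    y : (n,) response vector.
    Returns the coefficient vector (b0, b1, ..., bK).
    Illustrative sketch only; the specialized simplex and interior point
    codes discussed in the text are far more efficient.
    """
    n, K = X.shape
    A = np.column_stack([np.ones(n), X])          # include the constant term
    p = K + 1
    # Variables: b+ (p), b- (p), d+ (n), d- (n); all non-negative.
    # Objective: minimize sum(d+ + d-), as in equation (2).
    c = np.concatenate([np.zeros(2 * p), np.ones(2 * n)])
    # Equality constraints from equation (3): A(b+ - b-) + d+ - d- = y.
    A_eq = np.hstack([A, -A, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

# Small usage example with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.laplace(size=50)
print(lav_regression_lp(X, y))
```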
2. History and computation of least absolute value regression
Boscovich [10,11] explicitly discussed minimizing the sum of the absolute errors as a criterion for fitting a line to observational data. This is the first recognized use of the LAV criterion for a regression application and predates Legendre's announcement of the principle of LS in 1805. After that announcement, LAV estimation took a secondary role in the solution of regression problems, likely due to the uniqueness of LS solutions, the relative computational simplicity of LS and the thorough reformulation and development of the method of LS by Gauss [12-14] and Laplace [15] in terms of the theory of probability. Farebrother [8] provides a history of the development of LAV and LS procedures for fitting linear relationships. I highly recommend this book for the reader interested in obtaining a clear perspective of the historical development of these methods. The other published work of Farebrother [16-19] also provides a variety of historical information. The 1993 article discusses the work of Boscovich. Koenker [20] provides an interesting review of historical connections to state-of-the-art aspects of LAV and quantile regression. He discusses Edgeworth's work on computation of LAV regression and its relationship to the simplex approach as well as Edgeworth's comment that LAV computations could be made as simple as those of LS. He also relates the work of Bowley [21] to that of Rousseeuw and Hubert [22] on regression depth and that of Frisch [23] to interior point methods. Stigler [24] includes LAV regression as a part of the discussion in his book on statistics prior to 1900. In addition, Stigler [25] discusses a manuscript fragment showing that Thomas Simpson and Roger Boscovich met in 1760, and that Boscovich posed a LAV regression problem to Simpson.

Charnes et al. [26] are credited with first using the simplex method to solve a LAV regression problem. They used the simplex method to solve the primal linear programming problem directly. It was quickly recognized, however, that computational efficiencies could be gained by taking account of the special structure of the type of problem being solved. Until the 1990s, most research on algorithms and programs to solve the LAV regression problem involved variations of the simplex method. Barrodale and Roberts [27,28] (BR) provided a very efficient algorithm based on the primal formulation of the problem. This algorithm is the one used in the IMSL package of subroutines. It was considered to be the fastest algorithm available at the time it was published, and is still often used as a benchmark today. Armstrong and Kung [29] (AK) specialized the BR algorithm for simple regression. Bloomfield and Steiger [30] (BS) modified the BR algorithm by employing a steepest edge criterion to determine a pivot. Armstrong et al. [31] (AFK) used the revised simplex method with LU decomposition of the basis matrix to develop a very fast algorithm for multiple regression. The algorithm is similar to that of BR but is more efficient due to its use of the LU decomposition for maintaining the current basis and its reduced sorting requirements. Josvanger and Sposito [32] (JS) presented an algorithm for LAV simple regression that used a descent approach rather than linear programming. Many early timing comparisons of the algorithms mentioned so far are summarized in Dielman and Pfaffenberger [34].

Gentle et al. [33] examined the performance of LAV algorithms for simple and multiple regression. The study used openly available codes. For simple regression, the codes of AK, JS, AFK,
BS and Abdelmalek [35] (A) were compared. The JS program performed well. In multiple regression, the AFK program performed well. The BS program performed well in both cases for smaller sample sizes, but failed to produce correct answers when the sample size was large (1000 or more in multiple regression).
Gentle et al. [36] examined the performance of LAV algorithms for simple regression. Again, the study used openly available codes. For simple regression, the codes of AK, JS, AFK and BS were compared. The JS program performed well. When there was a perfect fit, the AFK program outperformed the JS program. Narula et al. [37] performed a timing comparison for the codes of JS, AFK, BS and BR for simple regression. The JS algorithm performed best when sample size was 300 or less, the BS algorithm when sample size was 750 or more, and the two performed similarly for intermediate sample sizes. The previous four algorithms and the algorithm of AK were compared when LS estimates were used as starting values in the LAV algorithms. The AFK and BS algorithms performed best overall. Sunwoo and Kim [38] (SK) developed an algorithm that used a direct descent method with the LS estimate as a starting point. SK showed their algorithm to be faster than both AK and JS. (It should be noted that the timing results of SK for the JS algorithm differed considerably from other published results.) Although the timing comparisons favor the SK algorithm, it is unclear whether the computational time involved in finding the LS estimate is included. The JS and AK algorithms employ advanced starts but not LS, so it is unclear whether the timing comparisons include LS starts for all three procedures.

Soliman et al. [39] proposed an algorithm that used LS residuals to identify the observations whose LAV residuals are equal to zero. In this way, they claimed that the LAV regression could be determined and the resulting computational time would be faster than algorithms utilizing simplex solutions. Herce [40] showed that the algorithm does not necessarily produce LAV estimates. Christensen et al. [41] proposed a modification of the original algorithm in which the original method is implemented only after first discarding observations with large LS residuals. Herce [42] responded, but did not point out the remaining problems with the revised algorithm. Bassett and Koenker [43] showed that the modified algorithm would not produce estimates that are necessarily identical or even close to the LAV estimates, and recommended that the algorithm not be used for LAV estimation. Dielman [44] summarized the computational algorithms and timing comparisons for LAV regression, including many of those presented so far.

There have been a number of other algorithms or modifications to algorithms suggested in the literature. Seneta and Steiger [45] suggested a LAV algorithm that is faster than the BS algorithm when the number of parameters is large relative to the number of observations. Farebrother [46] presented a version of the algorithm of Sadovski [47] for LAV simple regression that incorporated several improvements over the original. Farebrother [48] proposed three variants of this procedure, along with timing comparisons, and suggestions for improvements to the code of JS. Rech et al. [49] described an algorithm for fitting the LAV simple regression line that is based on a labeling technique derived from linear programming. No timing comparisons were given. Madsen and Nielsen [50] described an algorithm for solving the linear LAV problem based on smoothing the non-differentiable LAV function. Numerical tests suggested that the algorithm might be superior to the BR code.
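To illustrate the general idea behind smoothing approaches of this kind (a generic sketch, not the Madsen-Nielsen algorithm itself), the non-differentiable objective ∑|r_i| can be replaced by a smooth approximation such as ∑ √(r_i² + γ²), which is then minimized for a decreasing sequence of γ values, with each solution warm-starting the next.

```python
import numpy as np
from scipy.optimize import minimize

def lav_by_smoothing(X, y, gammas=(1.0, 0.1, 0.01, 0.001)):
    """Approximate LAV estimates by minimizing a smoothed objective.

    Generic smoothing sketch (not the Madsen-Nielsen algorithm):
    sum(|r_i|) is replaced by sum(sqrt(r_i**2 + gamma**2)) and gamma is
    driven toward zero, warm-starting each stage at the previous solution.
    """
    A = np.column_stack([np.ones(len(y)), X])   # add constant column
    b = np.linalg.lstsq(A, y, rcond=None)[0]    # LS starting values

    for gamma in gammas:
        def obj(beta):
            r = y - A @ beta
            return np.sum(np.sqrt(r ** 2 + gamma ** 2))
        b = minimize(obj, b, method="BFGS").x   # objective is smooth, so BFGS applies
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 0.5 + X @ np.array([1.0, -2.0, 0.0]) + rng.laplace(size=200)
print(lav_by_smoothing(X, y))
```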
Hong and Choi [51] proposed a method of finding the LAV regression coefficient estimates by defining the estimates in terms of the convergent weighted medians of the slopes from each data point to a point that is assumed to be on a predicted regression line. The method is similar to that of JS. Sklar [52] provided extensions to available LAV best subset algorithms. Narula and Wellington [53] provided a single efficient algorithm to solve both the LAV and the Chebychev regression problems, rather than using separate algorithms for each. Planitz and Gates [54] used a quadratic programming method to select the unique best LS solution from the convex set of all best LAV solutions. They suggested this approach as a solution for cases in which a unique LAV solution does not exist. Adcock and Meade [55] compared three methods for computing LAV estimates in the linear model: the BR algorithm, the modification of the BR algorithm due to Bloomfield and
Steiger [30] and an iteratively reweighted least squares (IRLS) algorithm. They found the IRLS algorithm to be faster when the number of observations was large relative to the number of parameters (for example, in a simple regression with more than 1500 observations and in a five-variable multiple regression with more than 5000 observations). This is in contrast to previous comparisons involving IRLS algorithms.

Portnoy and Koenker [56] surveyed recent developments on the computation of LAV estimates, primarily the interior point algorithms for solving linear programs. A simple preprocessing approach for the data is described that, together with the use of interior point algorithms, provides dramatic time improvements in computing LAV estimates. The authors note that simplex-based algorithms will produce LAV regression solutions in less time than LS for problems with a few hundred observations, but can be much slower for very large numbers of observations. The pre-processing of the data involves choosing two subsets of the data such that the observations in one subset are known to fall above the optimal LAV plane and the observations in the other will fall below the plane. Using these subsets as observations effectively reduces the number of observations in the LAV regression problem and therefore the time required to produce a solution. The authors obtain a 10- to 100-fold increase in computational speed over current simplex-based algorithms in large problems (10,000-200,000 observations). The authors note a number of avenues for future research that may refine such an approach. Hopefully, such algorithms might soon be available in commercial software. See also Coleman and Li [57], Koenker [58] and Portnoy [59].
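Returning to the IRLS approach compared by Adcock and Meade, the following is a minimal sketch of the general idea only (not their specific implementation): each pass solves a weighted LS problem with weights 1/|r_i| computed from the previous iteration's residuals, with the weights bounded for numerical stability.

```python
import numpy as np

def lav_irls(X, y, n_iter=50, eps=1e-8):
    """Approximate LAV coefficients by iteratively reweighted least squares.

    Minimal sketch: each pass solves a weighted LS problem with weights
    w_i = 1 / max(|r_i|, eps), which drives the weighted LS criterion
    toward the sum of absolute residuals.
    """
    A = np.column_stack([np.ones(len(y)), X])   # constant plus regressors
    b = np.linalg.lstsq(A, y, rcond=None)[0]    # ordinary LS start
    for _ in range(n_iter):
        r = y - A @ b
        w = 1.0 / np.maximum(np.abs(r), eps)    # bound weights away from infinity
        sw = np.sqrt(w)
        b = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
y = 2.0 + X @ np.array([1.0, 0.0, -1.0, 0.5, 3.0]) + rng.standard_cauchy(1000)
print(lav_irls(X, y))
```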
3. Properties of least absolute value regression estimators
Rewrite the model in equation (1) in matrix form as

    Y = Xβ + ε,    (6)

where Y is the n × 1 vector of observations on the dependent variable; X is the n × (K + 1) matrix of observations on the independent variables; β is the (K + 1) × 1 vector of regression coefficients to be estimated; and ε is the n × 1 vector of disturbances. Assume that the distribution function, F, of the disturbances has median zero, and that F is continuous with continuous and positive density f at the median. Also assume that (1/n)X'X → Q, a positive definite matrix, as n → ∞. Under these assumptions, Bassett and Koenker [60] are recognized as the first to provide a proof that √n(β̂ − β) converges in distribution to a (K + 1)-dimensional Gaussian random vector with mean 0 and covariance matrix λ²Q⁻¹, where λ²/n is the asymptotic variance of the sample median from random samples from distribution F. Koenker and Bassett [61] also proved asymptotic normality of Boscovich's estimator (LAV subject to the constraint that the mean residual is zero).

Phillips [62] used generalized functions of random variables and generalized Taylor series expansions to provide quick demonstrations of the asymptotic theory for the LAV estimator. Assuming the errors are independent and identically distributed (iid) with zero median and a probability density that is positive and analytic at zero, and that (1/n)X'X → Q, a positive definite limit, as n → ∞, Phillips proceeds as if the objective function were differentiable. Justification is provided for proceeding with a Taylor series expansion of the first-order conditions to arrive at the asymptotic theory.

Pollard [63] provided a direct proof of the asymptotic normality for the LAV estimator. The author points out that previous proofs depended on some sort of stochastic equicontinuity argument: they required uniform smallness for the changes in some sequence of stochastic processes due to small perturbations of the parameters. The technique in this article depends on
the convexity property of the criterion function and results in a simpler proof. Pollard proves convergence in distribution under a variety of assumptions. He assumes that the disturbances are iid with median zero and a continuous, positive density in a neighborhood of zero and proves convergence when: (1) the independent variables are deterministic; (2) the independent variables are random; and (3) the data generation process is autoregressive (AR) with either finite or infinite variance disturbance distributions.

Wu [64] provides conditions under which LAV estimates are strongly consistent. There are a number of different assumptions and conditions under which weak consistency of the LAV coefficient estimator has been proved. Bai and Wu [65] provide a summary of a variety of cases. Here is a list of possible assumptions used in various combinations to prove weak consistency for the regression model:

(A1) The disturbances are independent and come from distribution functions F_i, each with median zero.
(A2) The disturbances are independent and come from a common distribution function F with median zero.
(B1) There exist positive constants θ ∈ (0, 1/2) and δ > 0 such that for each i = 1, 2, ..., min[P(ε_i > δ), P(ε_i < −δ)] ≥ θ.
(B2) There exist positive constants θ and Δ such that for each i = 1, 2, ..., max[P(−u < ε_i < 0), P(0 < ε_i < u)] ≤ θu for 0 < u < Δ.
(B3) There exist positive constants θ_1, θ_2 and Δ such that for each i = 1, 2, ..., θ_2|u| ≤ P_i(u) ≤ θ_1|u| for |u| < Δ, where P_i(u) = P(0 < ε_i < u) if u > 0 and P_i(u) = P(u < ε_i < 0) if u < 0.
(B4) There exist positive constants θ and Δ such that for each i = 1, 2, ..., ε_i has a density f_i with f_i(u) ≤ θ for −Δ < u < Δ.

There are also various sets of conditions on the explanatory variables, where x_i denotes the vector of explanatory variable values for observation i:

(C1) S_n⁻¹ → 0, where S_n = ∑_{i=1}^{n} x_i x_i'.
(C2) inf_{|β|=1} ∑_{i=1}^{∞} |β'x_i| = ∞.
(C3) ∑_{i=1}^{∞} |x_i|² = ∞.
(C4) ∑_{i=1}^{∞} |x_i| = ∞.

Chen et al. [66] show that C1 is a necessary condition under assumptions A2 and B3. Chen and Wu [67] assume A1 and B4 and show that C4 is a necessary condition for consistency. Chen et al. [68] show that C3 is a necessary condition for consistency under A1 and B2. Bai and Wu [65] assume A1 and B1 and show that C2 is a necessary condition for consistency.

Andrews [69] showed that LAV estimators are unbiased if the conditional distribution of the vector of errors is symmetric given the matrix of regressors. In certain cases of LAV estimation, there may not be a unique solution, so a tie-breaking rule might be needed to ensure unbiasedness. This rule may take the form of a computational algorithm as discussed in Farebrother [70]. When disturbance distributions are not symmetric, Withers [71] provides approximations for the bias and skewness of the coefficient estimator.
Bassett [72] notes that a well-known property of the LAV estimator is that, for a p-variable linear model, p observations will be fitted exactly. He shows that certain subsets of p observations will not be fit by the LAV estimate for any realization of the dependent variables. This identifies subsets of the data that seem to be unimportant. The author considers this property of LAV estimation mainly because it seems so strange.

Bai [73] developed the asymptotic theory for LAV estimation of a shift in a linear regression. Caner [74] developed the asymptotic theory for the more general threshold model. He et al. [75] introduced a finite-sample measure of performance of regression estimators based on tail behavior. For heavy-tailed error distributions, the measure introduced is essentially the same as the finite-sample concept of breakdown point. The LS, LAV and least median-of-squares estimators are examined using the new measure, with results mirroring those that would be obtained using breakdown point. Ellis and Morgenthaler [76] introduced a leverage indicator that is appropriate for LAV regression. For the LAV case, the leverage indicator tells us about the breakdown and/or exactness of fit. Ellis [77] developed a measure of instability that shows that LAV estimators are frequently unstable. In a comment, Portnoy and Mizera (ref. 78, pp. 344-347) suggest that LAV estimators do not exhibit extreme forms of sensitivity or instability and find fault with the measure developed by Ellis. (Ellis responds on ref. 79, pp. 347-350.)

Dielman [1] summarized early small-sample comparisons of efficiency of LAV and LS estimators. The results of these early studies confirmed that the LAV regression estimator is more efficient than LS when disturbances are heavy-tailed. This fact was later confirmed analytically as well. The analytic results show that LAV will be preferred to LS whenever the median is more efficient than the mean as a measure of location. Pfaffenberger and Dielman [80] used Monte Carlo simulation to compare LS, LAV and ridge regression along with a weighted ridge regression estimator (WRID) and an estimator that combines ridge and LAV regression (RLAV). The simulation used normal, contaminated normal, Laplace and Cauchy disturbances. The RLAV estimator performed well for the outlier-producing distributions and high multicollinearity. Lind et al. [81] examined the performance of several estimators when regression disturbances are asymmetric. Estimators included LS, LAV and an asymptotically optimal M-estimator. The LAV estimator performs well when the percentage of observations in one tail is not too large and also provides a good starting point for the optimal M-estimator. McDonald and White [82] used a Monte Carlo simulation to compare LS, LAV and several other robust and partially adaptive estimators. Disturbances used were normal, contaminated normal, a bimodal mixture of normals and lognormal. Sample size was 50. Adaptive procedures appeared to be superior to other methods for most non-normal error distributions. They found the bimodal error distribution to be difficult for any method.
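The analytic link to the location problem can be illustrated with a quick simulation (a sketch of my own, not a reproduction of any of the cited studies): the relative efficiency of the sample median versus the sample mean reverses as the disturbance distribution moves from normal to heavy-tailed, which is exactly the situation in which LAV is preferred to LS.

```python
import numpy as np

# Monte Carlo comparison of the sample mean and sample median as location
# estimators; LAV is preferred to LS precisely when the median wins here.
rng = np.random.default_rng(3)
n, reps = 50, 10_000

samplers = {
    "normal":  lambda: rng.standard_normal(n),
    "Laplace": lambda: rng.laplace(size=n),
    "Cauchy":  lambda: rng.standard_cauchy(n),
}

for name, draw in samplers.items():
    means = np.array([np.mean(draw()) for _ in range(reps)])
    medians = np.array([np.median(draw()) for _ in range(reps)])
    # True location is 0, so the mean squared error is just the second moment.
    print(f"{name:8s}  MSE(mean) = {np.mean(means**2):.4f}"
          f"  MSE(median) = {np.mean(medians**2):.4f}")
```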
4. Least absolute value regression with dependent errors
When time-series data are used in a regression, it is not unusual to find that the errors in the model are correlated. There is a long history of what to do in this case when the estimation method is LS. Several articles deal with this problem when LAV estimation is used. For the following discussion, the regression model in equation (1) is considered. The disturbances are generated by a first-order AR process:

    ε_t = ρ ε_{t−1} + η_t,    (7)
where ρ is the first-order autocorrelation coefficient (|ρ| < 1) and the η_t are iid disturbances, but not necessarily normally distributed. Two procedures, both two-stage and based on a generalized LS approach, are typically employed to correct for autocorrelation in the least squares regression context. These are the Prais-Winsten (PW) and Cochrane-Orcutt (CO) procedures. Both procedures transform the data using the autocorrelation coefficient, ρ, after which the transformed data are used in estimation. The procedures differ in their treatment of the first observation, (x_1, y_1). Using the model of equation (6), the PW transformation matrix can be written:
    M = [ √(1 − ρ²)    0     0   ...    0    0
             −ρ        1     0   ...    0    0
              0       −ρ     1   ...    0    0
              .        .     .          .    .
              0        0     0   ...   −ρ    1 ]    (8)
Pre-multiplying the model in equation (6) by M yields

    MY = MXβ + Mε    (9)

or

    Y* = X*β + ε*,    (10)
where Y* contains the transformed dependent variable values and X* is the matrix of transformed independent variable values, so Y*' = (√(1 − ρ²) y_1, y_2 − ρ y_1, ..., y_n − ρ y_{n−1}) and similarly for the independent variables.
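As a rough, hedged sketch of this two-stage idea (an assumed illustrative implementation, not any specific published procedure), ρ can be estimated from first-stage LAV residuals, the PW transformation of equation (8) applied, and LAV rerun on the transformed data. The helper lav_fit below simply reuses the primal LP of equations (2) and (3).

```python
import numpy as np
from scipy.optimize import linprog

def lav_fit(A, y):
    """LAV coefficients via the primal LP in equations (2)-(3) (sketch only)."""
    n, p = A.shape
    c = np.concatenate([np.zeros(2 * p), np.ones(2 * n)])
    A_eq = np.hstack([A, -A, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, method="highs")
    return res.x[:p] - res.x[p:2 * p]

def lav_prais_winsten(X, y):
    """Two-stage LAV with an AR(1) correction, following equations (7)-(10).

    Assumed illustrative procedure: rho is estimated from first-stage LAV
    residuals, the PW transformation matrix M of equation (8) is applied,
    and LAV is re-estimated on the transformed data.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    b0 = lav_fit(A, y)                                  # first-stage LAV estimates
    e = y - A @ b0
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)  # AR(1) coefficient estimate

    M = np.eye(n) - np.diag(np.full(n - 1, rho), k=-1)  # rows t = 2, ..., n
    M[0, 0] = np.sqrt(1.0 - rho ** 2)                   # PW treatment of observation 1
    return rho, lav_fit(M @ A, M @ y)

# Usage example with simulated AR(1) disturbances (rho = 0.7)
rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
eta = rng.laplace(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.7 * eps[t - 1] + eta[t]
y = 1.0 + 2.0 * x + eps
print(lav_prais_winsten(x, y))
```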
Summary of Additional Papers on Least Absolute Value Estimation Not Cited in Text

Armstrong, R.D. and Kung, M.T., 1984, A linked list data structure for a simple linear regression algorithm. Computational and Operational Research, 11, 295-305.
Presents a special purpose algorithm to solve simple LAV regression problems. The algorithm is a specialization of the linear programming approach developed by Barrodale and Roberts, but requires considerably less storage.

Bassett, G., 1992, The Gauss Markov property for the median. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 23-31.
A Gauss Markov type theorem for the median is proved. The author shows that such a result implies more about restrictions on the class of estimators considered than optimality of the estimator.

Brennan, J.J. and Seiford, L.M., 1987, Linear programming and l1 regression: A geometric interpretation. Computational Statistics and Data Analysis, 5, 263-276.
Provides a geometric interpretation of the solution process of the LAV regression problem.

Danao, R.A., 1983, Regression by minimum sum of absolute errors: A note on perfect multicollinearity. Philippine Review of Economics and Business, 20, 125-133.
When there is perfect multicollinearity among regressors in a LAV regression, the simplex algorithm will choose one maximal set of linearly independent regressors from the equation by setting the coefficients of the other variables equal to zero. In essence, the variables with zero coefficients are dropped from the equation. There will be multiple optimal solutions possible in such a case.

Dodge, Y. and Roenko, N., 1992, Stability of L1-norm regression under additional observations. Computational Statistics and Data Analysis, 14, 385-390.
A test is provided to determine whether the introduction of an additional observation will lead to a new set of LAV regression estimates or whether the original solution remains optimal.

Dupacova, J., 1992, Robustness of L1 regression in the light of linear programming. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 47-61.
Uses linear programming results to examine the behavior of LAV estimates in the linear regression model. Properties of LAV estimates are explained through the use of LP.

Farebrother, R.W., 1987b, Mechanical representations of the L1 and L2 estimation problems. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-norm and Related Methods (Amsterdam: North-Holland), pp. 455-464.

Ha, C.D. and Narula, S.C., 1989, Perturbation analysis for the minimum sum of absolute errors regression. Communications in Statistics - Simulation and Computation, 18, 957-970.
Used sensitivity analysis to investigate the amount by which the values of the response variable for the non-defining observations (those with nonzero residuals) would change without changing the parameter estimate in a LAV regression.

Harris, T., 1950, Regression using minimum absolute deviations. American Statistician, 4, 14-15.
Brief discussion of LAV regression as an answer to a contributed question.

Huber, P., 1987, The place of the L1-norm in robust estimation. Computational Statistics and Data Analysis, 5, 255-262.
Discussed the place of the LAV estimator in robust estimation. Huber states the two main purposes of LAV estimation as (1) providing estimates with minimal bias if the observations are asymmetrically contaminated and (2) furnishing convenient starting values for estimates based on iterative procedures.
Koenker, R. and Bassett, G.W., 1984, Four (pathological) examples in asymptotic statistics. The American Statistician, 38, 209-212.
The authors present four examples illustrating varieties of pathological asymptotic behavior. The examples are presented in the context of LAV regression. The article provides insight into results of the failure of standard conditions.

McConnell, C.R., 1987, On computing a best discrete L1 approximation using the method of vanishing Jacobians. Computational Statistics and Data Analysis, 5, 277-288.
Shows that the method of vanishing Jacobians can be used to solve the L1 linear programming problem.

McKean, J.W. and Sievers, G.L., 1987, Coefficients of determination for least absolute deviation analysis. Statistics