
Journal of Statistical Computation and Simulation

Vol. 75, No. 4, April 2005, 263-286

Least absolute value regression: recent contributions


TERRY E. DIELMAN*

M. J. Neeley School of Business, TCU, P.O. Box 298530, Fort Worth, TX 76129, USA

(Revised 7 October 2003; in final form 21 March 2004)

This article provides a review of research involving least absolute value (LAV) regression. The review is concentrated primarily on research published since the survey article by Dielman (Dielman, T. E. (1984). Least absolute value estimation in regression models: An annotated bibliography. Communications in Statistics - Theory and Methods, 4, 513-541.) and includes articles on LAV estimation as applied to linear and non-linear regression models and in systems of equations. Some topics included are computation of LAV estimates, properties of LAV estimators and inferences in LAV regression. In addition, recent work in some areas related to LAV regression will be discussed.

Keywords: Linear regression models; Nonlinear regression models; Systems of equations; L1-norm regression; Minimum absolute deviation regression; Least absolute deviation regression; Minimum sum of absolute errors regression

1. Introduction

This article provides a review of research on least absolute value (LAV) regression. It includes articles on LAV estimation as applied to linear and non-linear regression models and in systems of equations. Some references to the LAV method as applied in approximation theory are also included. In addition, recent work in areas related to LAV regression will be discussed. I have attempted to include major contributions to LAV regression not included in Dielman [1]. My apologies in advance for any omissions.
Additional survey articles on LAV estimation include the annotated bibliography of Dielman [1] as well as survey articles by Dodge [2], Narula [3] and Pynnonen and Salmi [4]. The paper by Dodge [2] served as an introduction to three special issues of Computational Statistics and Data Analysis (CSDA) entitled 'Statistical Data Analysis Procedures Based on the L1-Norm and Related Methods' (Computational Statistics and Data Analysis, Volume 5, Number 4, 1987; Volume 6, Numbers 3 and 4, 1988). An earlier version of the paper appears in Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge, editor, Amsterdam: North-Holland, 1987. This is a collection of articles from the First International Conference on the L1-Norm and Related Methods held in Neuchatel, Switzerland. I reference a number of the articles in the CSDA collection, but

*Fax: 817-257-7227; Email: t.dielman@tcu.edu

Journal of Statistical Computation and Simulation

ISSN 0094-9655 print/ISSN 1563-5163 online © 2005 Taylor & Francis Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/0094965042000223680

not their earlier versions in the conference proceedings since these are essentially repeats. There are three other collections of articles worth mentioning. These collections contain selected papers from the Second, Third and Fourth International Conferences on the L1-Norm and Related Methods, held in Neuchatel in 1992, 1997 and 2002, respectively. These collections are published as L1-Statistical Analysis and Related Methods, Y. Dodge, editor, Amsterdam: North-Holland, 1992; L1-Statistical Procedures and Related Topics, Y. Dodge, editor, Institute of Mathematical Statistics Lecture Notes - Monograph Series, Volume 31, 1997; and Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge, editor, Birkhauser, 2002. Selected papers from these collections will be referenced in this article.
In addition to survey articles, there are books or chapters in books that provide information on LAV regression. Birkes and Dodge [ref. 5, see Chap. 4], Bloomfield and Steiger [6] and Sposito [ref. 7, see Chap. 5] provide technical detail about LAV regression, whereas Farebrother [8] presents a discussion of the historical development of LAV and least squares (LS) methods.
The primary emphasis of this article is LAV linear regression. To motivate the discussion,
consider the multiple linear regression model

y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i \quad \text{for } i = 1, 2, \ldots, n, \qquad (1)

where y_i is the ith value of the response variable; x_{ik}, the ith observation on the kth explanatory variable; \beta_0, the constant in the equation; \beta_k, the coefficient of the kth explanatory variable; and \varepsilon_i is the ith value of the disturbance. Additional assumptions about the model are reserved until later sections. The LAV regression involves finding estimates of \beta_0, \beta_1, \beta_2, \ldots, \beta_K, denoted b_0, b_1, b_2, \ldots, b_K, that minimize the sum of the absolute values of the residuals, \sum_{i=1}^{n} |y_i - \hat{y}_i|, where \hat{y}_i = b_0 + \sum_{k=1}^{K} b_k x_{ik} represent predicted values.
This problem can be restated as a linear programming problem:

\text{minimize} \sum_{i=1}^{n} (d_i^+ + d_i^-) \qquad (2)

\text{subject to } y_i - \left( b_0 + \sum_{k=1}^{K} b_k x_{ik} + d_i^+ - d_i^- \right) = 0 \quad \text{for } i = 1, 2, \ldots, n, \qquad (3)

where d_i^+, d_i^- \geq 0, and the b_k, k = 0, 1, 2, \ldots, K, are unrestricted in sign. The d_i^+ and d_i^- are, respectively, the positive and negative deviations (residuals) associated with the ith observation.
The dual problem is stated most conveniently as

\text{maximize} \sum_{i=1}^{n} (P_i y_i - y_i) \qquad (4)

\text{subject to } \sum_{i=1}^{n} P_i x_{ik} = \sum_{i=1}^{n} x_{ik} \quad \text{for } k = 1, 2, \ldots, K, \qquad (5)

where 0 \leq P_i \leq 2, i = 1, 2, \ldots, n. In the dual formulation, the dual variables are the P_i, i = 1, 2, \ldots, n. See Wagner [9] for a discussion of this form of the dual problem.
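The primal formulation in equations (2) and (3) can be handed directly to a general-purpose linear-programming solver. A minimal sketch, assuming SciPy is available (the function name `lav_fit` and the choice of the HiGHS method are illustrative, not part of the original article):

```python
import numpy as np
from scipy.optimize import linprog

def lav_fit(X, y):
    """Fit a LAV regression by solving the primal LP of equations (2)-(3).

    Decision vector: [b_0, ..., b_K, d_1^+, ..., d_n^+, d_1^-, ..., d_n^-].
    """
    n, K = X.shape
    A = np.hstack([np.ones((n, 1)), X])            # design matrix with intercept
    p = K + 1
    # Objective: the coefficients are free (zero cost); minimize sum of deviations.
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    # Equality constraints from (3): A b + d^+ - d^- = y.
    A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]
```

Because exactly p = K + 1 deviations are zero at an optimal vertex, the fitted LAV line passes through K + 1 observations, as discussed in section 3.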
The LAV regression is also known by several other names, including L1-norm regression, minimum absolute deviation regression, least absolute deviation regression and minimum sum of absolute errors regression.

2. History and computation of least absolute value regression

Boscovich [10, 11] explicitly discussed minimizing the sum of the absolute errors as a criterion for fitting a line to observational data. This is the first recognized use of the LAV criterion for a regression application and is prior to Legendre's announcement of the principle of LS in 1805. After that announcement, LAV estimation took a secondary role in the solution of regression problems, likely due to the uniqueness of LS solutions, the relative computational simplicity of LS and the thorough reformulation and development of the method of LS by Gauss [12-14] and Laplace [15] in terms of the theory of probability.
Farebrother [8] provides a history of the development of LAV and LS procedures for fitting linear relationships. I highly recommend this book for the reader interested in obtaining a clear perspective of the historical development of these methods. The other published work of Farebrother [16-19] also provides a variety of historical information. The 1993 article discusses the work of Boscovich.
Koenker [20] provides an interesting review of historical connections to state-of-the-art
aspects of LAV and quantile regression. He discusses Edgeworth's work on computation of
LAV regression and its relationship to the simplex approach as well as Edgeworth's comment
that LAV computations could be made as simple as those of LS. He also relates the work of
Bowley [21] to that of Rousseeuw and Hubert [22] on regression depth and that of Frisch [23]
to interior point methods.
Stigler [24] includes LAV regression as a part of the discussion in his book on statistics prior to 1900. In addition, Stigler [25] discusses a manuscript fragment that shows that Thomas Simpson and Roger Boscovich met in 1760, and that Boscovich posed a LAV regression problem to Simpson.
Charnes et al. [26] are credited with first using the simplex method to solve a LAV regression problem. They used the simplex method to solve the primal linear programming problem directly. It was quickly recognized, however, that computational efficiencies could be gained by taking account of the special structure of the type of problem being solved. Until the 1990s, most research on algorithms/programs to solve the LAV regression problem involved variations of the simplex method.
Barrodale and Roberts [27, 28] (BR) provided a very efficient algorithm based on the primal formulation of the problem. This algorithm is the one used in the IMSL package of subroutines. It was considered to be the fastest algorithm available at the time it was published, and is still often used as a benchmark today. Armstrong and Kung [29] (AK) specialized the BR algorithm for simple regression. Bloomfield and Steiger [30] (BS) modified the BR algorithm by employing a steepest edge criterion to determine a pivot.
Armstrong et al. [31] (AFK) used the revised simplex method with LU decomposition of the basis matrix to develop a very fast algorithm for multiple regression. The algorithm is similar to that of BR but is more efficient due to its use of the LU decomposition for maintaining the current basis and requiring less sorting.
Josvanger and Sposito [32] (JS) presented an algorithm for LAV simple regression that used a descent approach rather than linear programming. Many early timing comparisons of the algorithms mentioned so far are summarized in Dielman and Pfaffenberger [34].
Gentle et al. [33] examined the performance of LAV algorithms for simple and multiple regression. The study used openly available codes. For simple regression, the codes of AK, JS, AFK, BS and Abdelmalek [35] (A) were compared. The JS program performed well. In multiple regression, the AFK program performed well. The BS program performed well in both cases for smaller sample sizes, but failed to produce correct answers when sample size was large (1000 or more in multiple regression).

Gentle et al. [36] examined the performance of LAV algorithms for simple regression. Again, the study used openly available codes. For simple regression, the codes of AK, JS, AFK and BS were compared. The JS program performed well. When there was a perfect fit, the AFK program outperformed the JS program.
Narula et al. [37] performed a timing comparison for the codes of JS, AFK, BS and BR for simple regression. The JS algorithm performed best when sample size was 300 or less, the BS algorithm when sample size was 750 or more, and the two performed similarly for intermediate sample sizes. The previous four algorithms and the algorithm of AK were compared when LS estimates were used as starting values in the LAV algorithms. The AFK and BS algorithms performed best overall.
Sunwoo and Kim [38] (SK) developed an algorithm that used a direct descent method with the LS estimate as a starting point. SK showed their algorithm to be faster than both AK and JS. (It should be noted that the timing results of SK for the JS algorithm differed considerably from other published results.) Although the timing comparisons favor the SK algorithm, it is unclear whether the computational time involved in finding the LS estimate is included. The JS and AK algorithms employ advanced starts but not LS, so it is unclear whether the timing comparisons include LS starts for all three procedures.
Soliman et al. [39] proposed an algorithm that used LS residuals to identify the observations whose LAV residuals are equal to zero. In this way, they claimed that the LAV regression could be determined and the resulting computational time would be faster than algorithms utilizing simplex solutions. Herce [40] showed that the algorithm does not necessarily produce LAV estimates. Christensen et al. [41] proposed a modification of the original algorithm in which the original method is implemented only after first discarding observations with large LS residuals. Herce [42] responded, but did not point out the remaining problems with the revised algorithm. Bassett and Koenker [43] showed that the modified algorithm would not produce estimates that are necessarily identical or even close to the LAV estimates, and recommended that the algorithm not be used for LAV estimation.
Dielman [44] summarized the computational algorithms and timing comparisons for LAV regression, including many of those presented so far.
There have been a number of other algorithms or modifications to algorithms suggested in the literature. Seneta and Steiger [45] suggested a LAV algorithm that is faster than the BS algorithm when the number of parameters is large relative to the number of observations. Farebrother [46] presented a version of the algorithm of Sadovski [47] for LAV simple regression that incorporated several improvements over the original. Farebrother [48] proposed three variants of this procedure, along with timing comparisons, and suggestions for improvements to the code of JS. Rech et al. [49] described an algorithm for fitting the LAV simple regression line that is based on a labeling technique derived from linear programming. No timing comparisons were given. Madsen and Nielsen [50] described an algorithm for solving the linear LAV problem based on smoothing the non-differentiable LAV function. Numerical tests suggested that the algorithm might be superior to the BR code. Hong and Choi [51] proposed a method of finding the LAV regression coefficient estimates by defining the estimates in terms of the convergent weighted medians of the slopes from each data point to the point that is assumed to be on a predicted regression line. The method is similar to that of JS. Sklar [52] provided extensions to available LAV best subset algorithms. Narula and Wellington [53] provided a single efficient algorithm to solve both the LAV and the Chebychev regression problems, rather than using separate algorithms for each. Planitz and Gates [54] used a quadratic programming method to select the unique best LS solution from the convex set of all best LAV solutions. They suggested this approach as a solution to cases when a unique LAV solution does not exist.
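Several of the descent-type algorithms above (JS; Hong and Choi) rely on the weighted median as a basic building block. A minimal sketch of that primitive, under the usual definition (the value at which the cumulative weight first reaches half the total weight); the function name is illustrative:

```python
import numpy as np

def weighted_median(values, weights):
    """Return the weighted median: sort the values, accumulate weights,
    and pick the first value whose cumulative weight reaches half the total."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]
```

With equal weights this reduces to the ordinary median, which is the LAV estimator of a location parameter.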
Adcock and Meade [55] compared three methods for computing LAV estimates in the linear model: the BR algorithm, the modification of the BR algorithm due to Bloomfield and Steiger [30] and an iteratively reweighted least squares (IRLS) algorithm. They found the IRLS algorithm to be faster when the number of observations was large relative to the number of parameters (for example, in a simple regression with more than 1500 observations and in a five-variable multiple regression with more than 5000 observations). This is in contrast to previous comparisons involving IRLS algorithms.
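The IRLS idea can be sketched as follows: each pass solves a weighted LS problem whose weights are the reciprocals of the current absolute residuals, so that the weighted squared-error criterion mimics the sum of absolute errors. This is a minimal illustrative sketch, not the specific algorithm compared by Adcock and Meade; the function name and the damping constant `delta` (which guards against division by zero residuals) are assumptions:

```python
import numpy as np

def lav_irls(X, y, n_iter=100, delta=1e-8):
    """Approximate the LAV fit by iteratively reweighted least squares."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])                 # add intercept column
    b = np.linalg.lstsq(A, y, rcond=None)[0]             # LS starting values
    for _ in range(n_iter):
        r = y - A @ b
        w = 1.0 / np.maximum(np.abs(r), delta)           # weight = 1/|residual|
        sw = np.sqrt(w)                                  # solve weighted LS
        b = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return b
```

Observations with small residuals receive large weights, which is why the iteration is eventually pinned to the observations the LAV fit interpolates.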
Portnoy and Koenker [56] surveyed recent developments on the computation of LAV estimates, primarily the interior point algorithms for solving linear programs. A simple pre-processing approach for data is described that, together with the use of interior point algorithms, provides dramatic time improvements in computing LAV estimates. The authors note that simplex-based algorithms will produce LAV regression solutions in less time than LS for problems with a few hundred observations, but for very large numbers of observations can be much slower. The pre-processing of the data involves choosing two subsets of the data such that the observations in one subset are known to fall above the optimal LAV plane and the observations in the other will fall below the plane. Using these subsets as observations effectively reduces the number of observations in the LAV regression problem and therefore the time required to produce a solution. The authors obtain a 10- to 100-fold increase in computational speed over current simplex-based algorithms in large problems (10,000-200,000 observations). The authors note a number of avenues for future research that may refine such an approach. Hopefully, such algorithms might soon be available in commercial software. See also Coleman and Li [57], Koenker [58] and Portnoy [59].

3. Properties of least absolute value regression estimators

Rewrite the model in equation (1) in matrix form as

Y = X\beta + \varepsilon, \qquad (6)

where Y is the n \times 1 vector of observations on the dependent variable; X, the n \times (K + 1) matrix of observations on the independent variables; \beta, the (K + 1) \times 1 vector of regression coefficients to be estimated; and \varepsilon is the n \times 1 vector of disturbances. Assume that the distribution function, F, of the disturbances has median zero, that F is continuous and has continuous and positive density f at the median. Also assume that (1/n)X'X \to Q, a positive definite matrix, as n \to \infty. Under these assumptions, Bassett and Koenker [60] are recognized as the first to provide a proof that \sqrt{n}(\hat{\beta} - \beta) converges in distribution to a (K + 1)-dimensional Gaussian random vector with mean 0 and covariance matrix \lambda^2 Q^{-1}, where \lambda^2/n is the asymptotic variance of the sample median from random samples from distribution F. Koenker and Bassett [61] also proved asymptotic normality of Boscovich's estimator (LAV subject to the constraint that the mean residual is zero).
Phillips [62] used generalized functions of random variables and generalized Taylor series expansions to provide quick demonstrations of the asymptotic theory for the LAV estimator. Assuming the errors are independent and identically distributed (iid) with zero median and probability density that is positive and analytic at zero, and that (1/n)X'X \to Q, a positive definite limit, as n \to \infty, Phillips proceeds as if the objective function were differentiable. Justification is provided for proceeding with Taylor series expansion of the first-order conditions to arrive at the asymptotic theory.
Pollard [63] provided a direct proof of the asymptotic normality for the LAV estimator. The author points out that previous proofs depended on some sort of stochastic equicontinuity argument: they required uniform smallness for the changes in some sequence of stochastic processes due to small perturbations of the parameters. The technique in this article depends on the convexity property of the criterion function and results in a simpler proof. Pollard proves convergence in distribution under a variety of assumptions. He assumes that the disturbances are iid with median zero and a continuous, positive density in a neighborhood of zero and proves convergence when: (1) the independent variables are deterministic; (2) the independent variables are random and (3) the data generation process is autoregressive (AR) with either finite or infinite variance disturbance distributions.
Wu [64] provides conditions under which LAV estimates are strongly consistent. There are a number of different assumptions and conditions under which weak consistency of the LAV coefficient estimator has been proved. Bai and Wu [65] provide a summary of a variety of cases. Here is a list of possible assumptions used in various combinations to prove weak consistency for the regression model:
(A1) The disturbances are independent and come from distribution functions F_i, each with median zero.
(A2) The disturbances are independent and come from a common distribution function F with median zero.
(B1) There exist constants \theta \in (0, 1/2) and \delta > 0 such that for each i = 1, 2, \ldots,

\min[P(-\delta < \varepsilon_i < 0), P(0 < \varepsilon_i < \delta)] \geq \theta.

(B2) There exist positive constants \theta and \Delta such that for each i = 1, 2, \ldots,

\max[P(-u < \varepsilon_i < 0), P(0 < \varepsilon_i < u)] \leq \theta u \quad \text{for } 0 < u < \Delta.

(B3) There exist positive constants \theta_1 and \theta_2, and \Delta such that for each i = 1, 2, \ldots,

\theta_2 |u| \leq P_i(u) \leq \theta_1 |u| \quad \text{for } |u| \leq \Delta,

where

P_i(u) = P(0 < \varepsilon_i < u) \quad \text{if } u > 0,

P_i(u) = P(u < \varepsilon_i < 0) \quad \text{if } u < 0.

(B4) There exist positive constants B and \Delta such that for each i = 1, 2, \ldots, \varepsilon_i has a density f_i with f_i(u) \leq B for -\Delta < u < \Delta.
There are also various sets of conditions on the explanatory variables, where x_i denotes the vector of explanatory values for the ith observation:
(C1) S_n^{-1} \to 0, where S_n = \sum_{i=1}^{n} x_i x_i'
(C2) \inf_{|\beta|=1} \sum_{i=1}^{\infty} |\beta' x_i| = \infty
(C3) \sum_{i=1}^{\infty} |x_i|^2 = \infty
(C4) \sum_{i=1}^{\infty} |x_i| = \infty
Chen et al. [66] show that C1 is a necessary condition under assumptions A2 and B3. Chen and Wu [67] assume A1 and B4 and show that C4 is a necessary condition for consistency. Chen et al. [68] show that C3 is a necessary condition for consistency under A1 and B2. Bai and Wu [65] assume A1 and B1 and show that C2 is a necessary condition for consistency.
Andrews [69] showed that LAV estimators are unbiased if the conditional distribution of the vector of errors is symmetric given the matrix of regressors. In certain cases of LAV estimation, there may not be a unique solution, so a tie-breaking rule might be needed to ensure unbiasedness. This rule may take the form of a computational algorithm as discussed in Farebrother [70]. When disturbance distributions are not symmetric, Withers [71] provides approximations for the bias and skewness of the coefficient estimator.

Bassett [72] notes that a well-known property of the LAV estimator is that for a p-variable linear model, p observations will be fitted exactly. He shows that certain subsets of p observations will not be fit by the LAV estimate for any realization of the dependent variables. This identifies subsets of the data that seem to be unimportant. The author considers this property of LAV estimation mainly because it seems so strange.
Bai [73] developed the asymptotic theory for LAV estimation of a shift in a linear regression. Caner [74] developed the asymptotic theory for the more general threshold model.
He et al. [75] introduced a finite-sample measure of performance of regression estimators based on tail behavior. For heavy-tailed error distributions, the measure introduced is essentially the same as the finite-sample concept of breakdown point. The LS, LAV and least median-of-squares estimators are examined using the new measure, with results mirroring those that would be obtained using breakdown point.
Ellis and Morgenthaler [76] introduced a leverage indicator that is appropriate for LAV regression. For the LAV case, the leverage indicator tells us about the breakdown and/or exactness of fit.
Ellis [77] developed a measure of instability that shows that LAV estimators are frequently unstable. In a comment, Portnoy and Mizera [ref. 78, pp. 344-347] suggest that LAV estimators do not exhibit extreme forms of sensitivity or instability and find fault with the measure developed by Ellis. (Ellis responds in ref. 79, pp. 347-350.)
Dielman [1] summarized early small-sample comparisons of efficiency of LAV and LS estimators. The results of these early studies confirmed that the LAV regression estimator is more efficient than LS when disturbances are heavy-tailed. This fact was later confirmed analytically as well. The analytic results show that LAV will be preferred to LS whenever the median is more efficient than the mean as a measure of location.
Pfaffenberger and Dielman [80] used Monte Carlo simulation to compare LS, LAV and ridge regression along with a weighted ridge regression estimator (WRID) and an estimator that combines ridge and LAV regression (RLAV). The simulation used normal, contaminated normal, Laplace and Cauchy disturbances. The RLAV estimator performed well for the outlier-producing distributions and high multicollinearity.
Lind et al. [81] examined the performance of several estimators when regression disturbances are asymmetric. Estimators included LS, LAV and an asymptotically optimal M-estimator. The LAV estimator performs well when the percentage of observations in one tail is not too large and also provides a good starting point for the optimal M-estimator.
McDonald and White [82] used a Monte Carlo simulation to compare LS, LAV and several other robust and partially adaptive estimators. Disturbances used were normal, contaminated normal, a bimodal mixture of normals and lognormal. Sample size was 50. Adaptive procedures appeared to be superior to other methods for most non-normal error distributions. They found the bimodal error distribution to be difficult for any method.

4. Least absolute value regression with dependent errors

When time-series data are used in a regression, it is not unusual to find that the errors in the model are correlated. There is a long history of what to do in this case when the estimation method is LS. Several articles deal with this problem when LAV estimation is used.
For the following discussion, the regression model in equation (1) is considered. The disturbances are generated by a first-order AR process:

\varepsilon_t = \rho \varepsilon_{t-1} + \eta_t, \qquad (7)

where \rho is the first-order autocorrelation coefficient (|\rho| < 1) and the \eta_t are iid disturbances, but not necessarily normally distributed.
Two procedures, both two-stage and based on a generalized LS approach, are typically employed to correct for autocorrelation in the least squares regression context. These are the Prais-Winsten (PW) and Cochrane-Orcutt (CO) procedures. Both procedures transform the data using the autocorrelation coefficient, \rho, after which the transformed data are used in estimation. The procedures differ in their treatment of the first observation, (x_1, y_1). Using the model of equation (6), the PW transformation matrix can be written:

M = \begin{bmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{bmatrix} \qquad (8)

Pre-multiplying the model in equation (6) by M yields

MY = MX\beta + M\varepsilon \qquad (9)

or

Y^* = X^*\beta + \eta, \qquad (10)

where Y^* contains the transformed dependent variable values and X^* is the matrix of transformed independent variable values, so

Y^* = \left[ \sqrt{1-\rho^2}\, y_1, \; y_2 - \rho y_1, \; \ldots, \; y_n - \rho y_{n-1} \right]' \qquad (11)

and

X^* = \begin{bmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\, x_{11} & \cdots & \sqrt{1-\rho^2}\, x_{1K} \\ 1-\rho & x_{21} - \rho x_{11} & \cdots & x_{2K} - \rho x_{1K} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{n1} - \rho x_{n-1,1} & \cdots & x_{nK} - \rho x_{n-1,K} \end{bmatrix} \qquad (12)

In equation (10), \eta is the vector of serially uncorrelated \eta_t errors.
The CO transformation matrix is the (T - 1) \times T matrix obtained by removing the first row of the M transformation matrix. The use of the CO transformation means that T - 1 observations, rather than T, are used to estimate the model. The CO transformation omits the first observation, whereas the PW transformation includes the transformed first observation. Asymptotically, the loss of this single observation is probably of minimal concern. However, for small samples, omitting the first observation has been shown to result in a LS estimator inferior to that obtained when the first observation is retained and transformed. LSCO and LAVCO will be used to indicate the method in which a CO transformation is used with LS or LAV, respectively. Similarly, LSPW and LAVPW will be used to indicate the method in which a PW transformation is used with LS or LAV.
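The PW transformation of equations (8)-(12) amounts to scaling the first observation by \sqrt{1-\rho^2} and quasi-differencing the rest. A minimal sketch (the function name is illustrative; note that the intercept column of X, if present, is transformed along with the regressors, as in equation (12)):

```python
import numpy as np

def prais_winsten_transform(X, y, rho):
    """Apply the PW transformation: first row scaled by sqrt(1 - rho^2),
    remaining rows quasi-differenced as z_t - rho * z_{t-1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.sqrt(1.0 - rho ** 2)
    Xs = np.empty_like(X)
    ys = np.empty_like(y)
    Xs[0], ys[0] = s * X[0], s * y[0]          # transformed first observation
    Xs[1:] = X[1:] - rho * X[:-1]              # quasi-differences
    ys[1:] = y[1:] - rho * y[:-1]
    return Xs, ys
```

Dropping the first row of the returned arrays gives the CO transformation, which discards rather than rescales the first observation.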
Coursey and Nyquist [83] used a Monte Carlo simulation to compare four types of estimators when disturbances are subject to first-order autocorrelation: LAV, LS, LSCO and LAVCO. Sample sizes of 15 and 30 were used with disturbances generated from the class of symmetric stable distributions. They find that the LAVCO estimator can perform worse than LAV in certain cases. Prior research had shown that LS can outperform LSCO as well. The omission of the first observation may result in an inferior estimator in small samples.
Weiss [84] examined LAV estimation in a regression with first-order serially correlated errors. He considered a LAVCO estimator to correct for first-order serial correlation. He shows that the LAVCO coefficient estimator is asymptotically normal, but this result assumes existence of at least second moments of the error distribution, a stronger assumption than that required for independent disturbances. A Monte Carlo simulation using normal, lognormal and contaminated disturbances, with sample sizes of 25, 49 and 81, was performed. Results suggested the LAVCO estimator would perform better than the LSCO estimator for heavy-tailed distributions, but is negatively affected by trending regressors (as in the LS case). Tests for first-order serial correlation were also examined. This examination found that using the Durbin-Watson test with LAV residuals substituted for LS residuals was a reasonable procedure. See Davis and Dunsmuir [85] for asymptotic results when regression disturbances follow ARMA errors.
Nyquist [86] showed that the LAVCO procedure is unreliable due to the search procedure and properties of LAV estimation; the LAVCO procedure may fail to converge to the proper value. An alternative non-linear procedure is suggested.
Dielman and Rose [87] used a Monte Carlo simulation to compare LAV, LS, LSPW, LAVPW, LSCO and LAVCO. Disturbances used were normal, contaminated normal, Laplace and Cauchy with a sample size of 20. The results suggest that: (1) LSCO and LAVCO should be avoided; (2) correction for autocorrelation using the PW transformation improves LAV and LS estimates for moderate-to-high levels of autocorrelation; (3) LAVPW appears to be the recommended approach when error distributions are fat-tailed and autocorrelation is present and (4) the estimates are not appreciably worse after autocorrelation correction, regardless of the degree of autocorrelation. Dielman and Rose [88] used an identical simulation design to compare LAV, LAVCO, LAVPW and two pre-test estimators that transform with either PW or CO when a pre-test suggested that autocorrelation was present. Again, the PW transformation was found preferable to the CO. There was little difference between always correcting and correcting only when suggested by a pre-test.
Nyquist [89] proposed a LAV-based Lagrange multiplier (LM) test for first-order autocorrelation. As the error variance increases, the asymptotic relative efficiency of the LAV-based test becomes more favorable relative to the LS-based test.

5. Forecasting using least absolute value regression

Dielman [90] used Monte Carlo simulation to compare forecasts from LAV and LS estimated regression equations with 30 observations. For error distributions that are prone to outliers (Cauchy, Laplace, contaminated normal), the LAV forecasts were shown to be superior to LS. Use of LAV (or some other robust technique) was suggested as an adjunct to LS. The comparison of forecasts from the two methods would provide a way of assessing whether outliers have adversely affected the LS forecasts. See also Dielman [91] for a correction to the original article.
Dielman and Rose [92] investigated the forecasting performance of LAV and LS estimated regressions using Monte Carlo simulation when disturbances are subject to first-order autocorrelation. Four estimators were compared: LAV, LS, LSPW and LAVPW (see the previous section for definitions of these estimators). Out-of-sample root mean square forecast errors were the basis for comparison. Disturbance distributions used were normal, contaminated normal, Laplace and Cauchy, and the sample size was 20. The results suggested that: (1) correction for autocorrelation improves forecasts for moderate-to-high levels of autocorrelation; (2) LAVPW appears to be the recommended approach when error distributions are fat-tailed and autocorrelation is present and (3) the forecasts are not appreciably worse after autocorrelation correction, regardless of the degree of autocorrelation.

6. Inferences in least absolute value estimated regressions

For purposes of this section, we re-express equation (6) in the following partitioned matrix form:

Y = X_1\beta_1 + X_2\beta_2 + \varepsilon. \qquad (13)

In equation (13), the coefficient vector \beta and the data matrix X have been partitioned: \beta_1 is a k_1 \times 1 vector of coefficients to remain in the model and X_1 is the associated part of the original data matrix, X; \beta_2 represents the k_2 \times 1 vector of coefficients to be included in a hypothesis test and X_2 is the associated part of the original data matrix, X. The test we will consider is the basic test for coefficient significance, i.e. H_0: \beta_2 = 0. The covariance matrix of the LAV coefficient estimator can be written as \lambda^2 (X'X)^{-1} with the scale parameter, \lambda, defined as \lambda = 1/[2 f(m)], where f(m) is the p.d.f. of the disturbance distribution evaluated at the median.
The LM test statistic for the test of the null hypothesis H0: β2 = 0 is given by

LM = g2′ D g2,    (14)

where g2 is the appropriate portion of the normalized gradient of the unrestricted LAV objective
function, evaluated at the restricted estimate, and D is the appropriate block of the (X′X)⁻¹
matrix to be used in the test.
The WALD test statistic is given by

WALD = b2′ D⁻¹ b2 / λ²,    (15)

where D is as previously defined and b2 is the LAV estimate of β2.


The likelihood ratio (LR) test statistic (assuming the disturbances follow a Laplace
distribution) is

LR = 2(SAD1 − SAD2)/λ,    (16)

where SAD1 is the sum of the absolute deviations of the residuals in the restricted or reduced
model (i.e. β2 = 0) and SAD2 is the sum of the absolute deviations of the residuals in the
unrestricted model.
The WALD, LR and LM statistics each have, asymptotically, a chi-square distribution with
k2 degrees of freedom. See Koenker and Bassett [93] and Bai et al. [94] for further details on
these test statistics.
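As a concrete illustration of equations (14)-(16), the following sketch (ours, not from the article) computes the three statistics for a simulated test of H0: β2 = 0. It assumes NumPy/SciPy, fits the LAV regressions by linear programming, and uses the known value λ = 1 for standard Laplace disturbances (f(m) = 1/2, so λ = 1/[2f(m)] = 1) instead of estimating the nuisance parameter.

```python
import numpy as np
from scipy.optimize import linprog

def lav_fit(X, y):
    """LAV regression via the standard LP formulation."""
    n, k = X.shape
    c = np.concatenate([np.zeros(2 * k), np.ones(2 * n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * k + 2 * n))
    return res.x[:k] - res.x[k:2 * k]

rng = np.random.default_rng(1)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # retained regressors
X2 = rng.normal(size=(n, 1))                             # tested regressor
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, 2.0]) + rng.laplace(size=n)      # true beta2 = 0

lam = 1.0                              # lambda = 1/(2 f(m)) for Laplace(0, 1)
b_full = lav_fit(X, y)                 # unrestricted fit
b_restr = lav_fit(X1, y)               # restricted fit (beta2 = 0)
D = np.linalg.inv(X.T @ X)[2:, 2:]     # block of (X'X)^{-1} for beta2

b2 = b_full[2:]
wald = b2 @ np.linalg.inv(D) @ b2 / lam**2       # eq. (15)

sad1 = np.abs(y - X1 @ b_restr).sum()            # restricted SAD
sad2 = np.abs(y - X @ b_full).sum()              # unrestricted SAD
lr = 2 * (sad1 - sad2) / lam                     # eq. (16)

g2 = X2.T @ np.sign(y - X1 @ b_restr)  # gradient portion at restricted fit
lm = g2 @ D @ g2                                 # eq. (14)
print(float(wald), float(lr), float(lm))
```

Each statistic would be referred to a chi-square distribution with k2 = 1 degree of freedom.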
Note that both the WALD and LR test statistics require the estimation of the scale parameter
λ, whereas the LM test statistic does not. One estimator often suggested can be computed as
follows:

λ̂ = √n′ [e_(n′−m+1) − e_(m)] / (2 z_{α/2}),  where  m = (n′ + 1)/2 − z_{α/2} √(n′/4),    (17)

where the e_(i) are the ordered residuals from the LAV-fitted model, and n′ = n − r where r is the
number of zero residuals. A value of α = 0.05 is usually suggested. This estimator will be
referred to as the SECI estimator.
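Equation (17) translates almost directly into code. The sketch below is our transcription (NumPy/SciPy assumed); the tolerance used to identify the r zero residuals is an implementation choice, and the Laplace sample stands in for residuals from a fitted LAV regression.

```python
import numpy as np
from scipy.stats import norm

def seci_lambda(resid, alpha=0.05):
    """SECI estimate of lambda = 1/(2 f(m)) from LAV residuals, eq. (17)."""
    e = np.sort(resid[np.abs(resid) > 1e-9])   # drop the r zero residuals
    n_prime = len(e)                           # n' = n - r
    z = norm.ppf(1 - alpha / 2)                # z_{alpha/2}
    m = max(int(round((n_prime + 1) / 2 - z * np.sqrt(n_prime / 4))), 1)
    # e[m - 1] is the m-th order statistic e_(m)
    return np.sqrt(n_prime) * (e[n_prime - m] - e[m - 1]) / (2 * z)

rng = np.random.default_rng(2)
resid = rng.laplace(size=200)   # stand-in residuals; true lambda here is 1
print(seci_lambda(resid))
```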
McKean and Schrader [95] used Monte Carlo simulation to compare several methods of
studentizing the sample median. The methods included the SECI estimator and several others
that could be extended for use in LAV regression hypothesis tests. SECI performed well, and
the value of α = 0.05 seemed to produce the best results. McKean and Schrader [96] again
suggest this estimator and provide an example of its use.

Sposito and Tveite [97] used Monte Carlo simulation to study the SECI estimator. Good
estimates were obtained for the finite-range error distributions considered (triangle and uniform)
and for the normal distribution. For the Laplace and Cauchy error distributions, larger sample
sizes were needed: n = 100 for Laplace, n = 300 for Cauchy.
Sheather [98] summarizes the results of a Monte Carlo simulation to compare the SECI
estimator and several other estimators for λ, including some that do not extend easily to the
regression application. The conclusion was that the SECI estimator provides a good, quick
point estimate of the standard error. Dielman and Pfaffenberger [99, 100] and Dielman and
Rose [101, 102] also noted that this estimator performs well when used to compute the LR
test statistic.
Liu [103] proposed several non-parametric estimators of λ and proved strong consistency.
Niemiro [104] suggested kernel-smoothing methods for estimation of the nuisance parameter
in LAV regression. Consistency of the suggested estimator is shown and bounds are obtained
for the rate of convergence. Rao [105] and Bai et al. [94] suggest additional alternatives.
Small-sample comparisons with the estimator in equation (17) would help to determine the
efficacy of these estimators.
Bootstrap methodology provides an alternative to the WALD, LR and LM tests. The boot-
strap approach was developed by Efron [106] and has been shown to be useful in many
instances where traditional approaches for testing and estimation are either undeveloped or
suspect. When k2 = 1, a bootstrap test statistic for H0: β2 = 0 in the LAV regression context
can be computed as follows. The model shown as equation (13) is estimated using LAV esti-
mation procedures, and residuals are obtained. The test statistic, |b2 − 0|/se(b2), is computed
from the regression on the original data, where se(b2) represents the standard error of the
coefficient estimate b2, computed as

se(b2) = λ̂ D^(1/2),    (18)

where λ̂ is defined in equation (17) and D is defined in equation (14). The residuals, e_i (i =
1, 2, …, n), from this regression are saved, centered and resampled (with replacement) to
obtain a new sample of pseudo-disturbances, e*. The e* values are used to create pseudo-data:

Y* = X1b1 + X2b2 + e*,    (19)

where b1 and b2 are the initial LAV estimates of the two vectors of regression coefficients and
e* is the n × 1 vector of e* values. The coefficients in equation (19) are re-estimated
to obtain new parameter estimates, b1* and b2*. The bootstrap test statistic |b2* − b2|/se(b2*)
is computed and saved, and the process is repeated a large number of times. For a test to be
performed at a particular level of significance, α, the critical value is the (1 − α)th percentile
from the ordered values of |b2* − b2|/se(b2*). If the original test statistic, |b2 − 0|/se(b2), is
larger than this critical value, then the null hypothesis that β2 = 0 is rejected. This procedure
follows the guidelines suggested by Hall and Wilson [107], including the use of bootstrap
pivoting, which results in increased power for the test and a more accurate level of significance.
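The whole procedure can be sketched in a few dozen lines. The code below is our illustration only (NumPy/SciPy assumed): the LAV fits use the linear-programming formulation, the standard error follows equation (18) with the SECI estimate of λ, and the choices B = 199, n = 25 and a Laplace null model are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

def lav_fit(X, y):
    """LAV regression via the standard LP formulation."""
    n, k = X.shape
    c = np.concatenate([np.zeros(2 * k), np.ones(2 * n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * k + 2 * n))
    return res.x[:k] - res.x[k:2 * k]

def seci_lambda(resid, alpha=0.05):
    """SECI estimate of lambda, eq. (17)."""
    e = np.sort(resid[np.abs(resid) > 1e-9])
    n_prime = len(e)
    z = norm.ppf(1 - alpha / 2)
    m = max(int(round((n_prime + 1) / 2 - z * np.sqrt(n_prime / 4))), 1)
    return np.sqrt(n_prime) * (e[n_prime - m] - e[m - 1]) / (2 * z)

def se_coef(X, resid, j):
    """Standard error of coefficient j, eq. (18) with k2 = 1."""
    return seci_lambda(resid) * np.sqrt(np.linalg.inv(X.T @ X)[j, j])

rng = np.random.default_rng(3)
n, B = 25, 199
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.0]) + rng.laplace(size=n)   # H0: beta2 = 0 is true

b = lav_fit(X, y)
e = y - X @ b
t_obs = abs(b[1] - 0) / se_coef(X, e, 1)

e_centered = e - e.mean()
t_star = []
for _ in range(B):
    y_star = X @ b + rng.choice(e_centered, size=n, replace=True)
    b_star = lav_fit(X, y_star)
    e_star = y_star - X @ b_star
    t_star.append(abs(b_star[1] - b[1]) / se_coef(X, e_star, 1))  # pivoting

crit = np.quantile(t_star, 0.95)     # 5% bootstrap critical value
print("reject H0:", bool(t_obs > crit))
```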
When k2 > 1, a modified approach is necessary to produce a statistic similar to the LS
F-statistic. De Angelis et al. [108] described analytical and bootstrap approximations to the
estimator distributions in LAV regression. The consistency of the bootstrap in the LAV regres-
sion setting is established. The rate of convergence is slow in the case of the unsmoothed
bootstrap. The authors show how the rate of convergence can be improved by using either a
smoothed bootstrap approach or a normal approximation based on kernel estimation of the
error density. Suggestions are given for the choice of the smoothing bandwidth necessary in the
latter two cases.

The LR, LM and WALD tests are distributed asymptotically as chi-square with degrees of
freedom equal to the number of coefficients included in the test (denoted k2). However, this
fact does not indicate how the test statistics will perform in small samples. Several Monte
Carlo studies have been performed to try to shed light on the small-sample performance.
These studies will now be summarized.
Koenker [109] used Monte Carlo simulation to examine the performance of the WALD,
LM and LR tests for LAV regression coefficients. The true value of the nuisance parameter
was used in computing the WALD and LR statistics. Sample sizes were 30, 60 and 120,
and normal, Laplace and Cauchy error distributions were used. Comparisons were based on
levels of significance and power, but power comparisons were not adjusted for differences in
levels of significance. The LM test performed comparatively well, although room for
improvement was noted.
Schrader and McKean [110] examined the LR, WALD and bootstrap tests. A Monte Carlo
simulation was used to study test performance. Error distributions were normal, contaminated
normal and slash. Sample sizes ranged from 15 to 300. Comparisons were based on levels of
significance and power, but power comparisons were not adjusted for differences in levels of
significance. The study found the WALD test inadequate. The LR test performed reasonably
well. The best performance was that of the bootstrap test.
Dielman and Pfaffenberger [99, 100, 111] compared the WALD, LR and LM test statistics
under a variety of conditions using Monte Carlo simulation. Their results suggest that the LR
test using the SECI estimator and the LM test are both superior to the WALD test. Comparisons
were based on levels of significance and power, but power comparisons were not adjusted for
differences in levels of significance. One version of the WALD and LR tests used the SECI
estimator of the nuisance parameter; another used a bootstrap estimate of this parameter (not a
true bootstrap test, but a bootstrapping procedure was used to estimate the nuisance parameter).
Bootstrap estimation of the nuisance parameter did not improve any of the test results.
Dielman and Rose [101] compared the LR test using the SECI estimator of the nuisance
parameter, the LM test and the bootstrap test. They used normal, contaminated normal, Laplace
and Cauchy disturbances with sample sizes of 10, 15, 20 and 25. The observed levels of
significance were closer to nominal for the LM and bootstrap tests. The power of the bootstrap
test was generally better than that of the LM test, although somewhat lower than that of the
LR test. Results for power were not adjusted for differences in levels of significance.
Dielman and Rose [102] compared the WALD, LR and LM tests in LAV multiple regression.
They used normal, contaminated normal, Laplace and Cauchy disturbances with sample sizes
of 14, 20 and 30. Empirical levels of significance and power of the test procedures were com-
pared. Power results were adjusted using the procedure suggested by Zhang and Boos [112].
The performance of individual coefficient tests and an overall-fit test was examined. Results
suggest that empirical levels of significance are closer to nominal for the LM test but that the
LR test is preferred on the basis of power. Both are preferable to the WALD test.
Dielman and Rose [113] used Monte Carlo simulation to compare levels of significance and
power of tests in LAV regression when disturbances are subject to first-order serial correlation.
The test procedures considered are the WALD, LR and LM tests. The LAV regressions are
estimated both with and without correction for autocorrelation. Two corrections are applied:
the CO (omit first observation) transformation and the PW (retain first observation) transfor-
mation. Results indicate that correction for autocorrelation is important for large values of
the autocorrelation coefficient. The CO transformation and the WALD test seem to be the
preferred pair when level of significance is considered; when power is considered, the CO
and LR combination is preferred. The preference for the CO transformation for testing is in
contrast to the results for estimation. When estimator efficiency is of interest, the PW trans-
formation produces superior results. This result does not suggest that we should disregard the
PW transformation. Further examination suggests that test procedures do not perform partic-
ularly well with either the PW or the CO transformation when the level of autocorrelation
is high. Perhaps alternative approaches such as a bootstrap test would serve better in this
situation.
Stangenhaus [114], Stangenhaus and Narula [115] and Stangenhaus et al. [116] used Monte
Carlo simulation to study the performance of confidence intervals for coefficients in a LAV
regression. Their findings include:

Fairly accurate results are obtained with small samples (sample size 10-15) for normal and contaminated normal
error distributions, but much larger samples are needed (sample size 100 or more) for Cauchy and Laplace error
distributions. However, the difference between nominal and actual coverage rates was small in all cases. See
also Dielman and Pfaffenberger [117].

Intervals computed using the bootstrap sampling distribution (percentile bootstrap, PB) were superior to those
constructed using the bootstrap standard deviation (standard bootstrap, SB) in samples of size 50 or less. (This
is consistent with the hypothesis testing results of Dielman and Pfaffenberger discussed earlier.) Little difference
was found with sample sizes greater than 50. The SB intervals were constructed using 200 bootstrap repetitions;
the PB intervals used 1000.

Gutenbrunner et al. [118] considered tests of a general linear hypothesis for linear regression
based on regression rank scores. The tests are robust to outliers and there is no need to estimate
a nuisance parameter. The regression rank scores arise as solutions to the dual form of the
linear program required to compute regression quantiles. When sign scores are used, the test
statistic coincides with the LAV LM test.
Cade and Richards [119] suggested permutation procedures (resampling without replace-
ment) for hypothesis tests about the parameters in LAV linear regression. A Monte Carlo
study showed that the permutation test performs better than LS-based tests in cases where the
disturbances are fat-tailed or asymmetric.
Horowitz [120] noted that the LAV estimator does not satisfy the standard conditions for
obtaining asymptotic refinements through use of the bootstrap because the LAV objective func-
tion is not smooth. He proposed a smoothed LAV estimator that is asymptotically equivalent
to the standard LAV estimator. For the smoothed estimator, refinements for tests of hypotheses
about the parameters are possible. The results extend to censored LAV and to models with or
without heteroskedasticity.
Weiss [121] developed a generalized method of moments (GMM) test for comparing LAV
and LS regressions. The GMM test is equivalent to the Hausman test.
Furno [122] considered different versions of the LM test for autocorrelation and/or condi-
tional heteroskedasticity. The use of LS versus LAV residuals, as well as squared residuals
versus their absolute values, was compared. Furno showed that LM tests based on LAV
residuals were distributed asymptotically as chi-square and were robust to non-normality.
Monte Carlo simulation suggested using the absolute value of LAV residuals for the tests
discussed.

7. Time-series models

An and Chen [123] considered an AR model with stable disturbances and proved convergence
in probability of the LAV estimates of the AR parameters.
Dunsmuir and Spencer [124] proved strong consistency of the LAV estimator in ARMA
models under general conditions. They also proved asymptotic normality. Dunsmuir [125]
used Monte Carlo simulation to study LAV estimation applied to a seasonal moving average
model. The model examined is essentially the airline model of Box and Jenkins. Results sug-
gest that the normal distribution is a poor approximation to the small-sample distributions
of the coefficients, LAV estimation does provide some benefits over LS when disturbance
distributions are heavy-tailed, and no clear preference can be given for backcasting over zero
pre-period values when LAV estimation is used. Dunsmuir and Murtagh [126] formulated
LAV models for ARMA processes as non-linear programming problems and suggested tech-
niques for estimation. Examples of applications of LAV versus LS estimation using real data
are shown, with support for the use of LAV.
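For a flavour of this optimization view, consider conditional LAV estimation of a simple AR(1) with intercept. The sketch below is ours, not the algorithm of Dunsmuir and Murtagh: it simply minimizes the conditional sum of absolute one-step errors with a derivative-free optimizer (adequate in two dimensions despite the non-smooth objective), and assumes NumPy/SciPy; the series length, AR coefficient and seed are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Simulate an AR(1) series with fat-tailed (Laplace) disturbances.
rng = np.random.default_rng(4)
n, phi = 300, 0.6
eps = rng.laplace(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

def sad(theta):
    """Conditional sum of absolute one-step errors for an AR(1) with intercept."""
    c, p = theta
    return np.sum(np.abs(y[1:] - c - p * y[:-1]))

res = minimize(sad, x0=[0.0, 0.0], method="Nelder-Mead")
c_hat, phi_hat = res.x
print(phi_hat)
```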
Olsen and Ruzinsky [127] studied the convergence of the LAV estimator for an AR(p) pro-
cess when the process generating the data has been incorrectly identified as an ARMA(p, q),
MA(q) or higher-order AR process. Ruzinsky and Olsen [128] proved strong consistency of
the LAV AR parameter estimator when the process disturbance has zero mean.
Pino and Morettin [129] proved the LAV estimates of ARMA model parameters to be
strongly consistent. They provided a stationarity and invertibility condition, avoiding the usual
assumption of finite variance.
Knight [130, 131] shows that LAV estimators of autoregressive parameters are asymptoti-
cally normal if the distribution function of the errors has a positive first derivative evaluated at
zero. Knight derives limiting distributions of LAV estimators in AR models under more gen-
eral assumptions on the distribution function of the errors. Herce [132] derives the asymptotic
distribution of the LAV estimator of the AR parameter under the unit root hypothesis when
errors have finite variances. Rogers [133] also derives asymptotic results for LAV regression
models with certain assumptions relaxed. Specifically, his work deals with time-series models
with deterministic trends and random walks and is therefore related to the work of Knight and
Herce.

8. Non-linear least absolute value regression

Gonin and Money [134] presented a brief discussion of algorithms for solving the non-linear
LAV regression problem. The algorithms are grouped into three categories: those using only
first-derivative information, those using second-derivative information and those which lin-
earize the model functions but incorporate quadratic approximations to take curvature effects
into account. They present a much more detailed discussion in Chapter 2 of Gonin and
Money [135].
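For small problems, none of this specialized machinery is needed to see what non-linear LAV estimation does: a generic derivative-free minimizer applied to the sum of absolute errors suffices. The sketch below is ours (NumPy/SciPy assumed); the exponential-decay model and its parameter values are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = np.linspace(0, 4, 60)
y = 2.0 * np.exp(-1.3 * x) + 0.1 * rng.laplace(size=x.size)

def sad(theta):
    """Sum of absolute errors for the model y = a * exp(-b * x)."""
    a, b = theta
    return np.sum(np.abs(y - a * np.exp(-b * x)))

res = minimize(sad, x0=[1.0, 1.0], method="Nelder-Mead")
a_hat, b_hat = res.x
print(a_hat, b_hat)
```

The specialized algorithms matter for larger problems, where the kinks in the objective can slow down or defeat a naive optimizer.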
Oberhofer [136] shows conditions for consistency of the LAV estimator in non-linear regres-
sion. Richardson and Bhattacharyya [137] extended Oberhofer's result. Their results apply to
a more general parameter space.
Soliman et al. [138] proposed a modification of the algorithm discussed in Christensen
et al. [41] to produce non-linear LAV estimates with non-linear equality constraints. Bassett
and Koenker [43] show that this algorithm will not produce estimates that are necessarily
identical, or even close, to the LAV estimates.
Weiss [139] examined an estimator designed for non-linear estimation of the regression
coefficients and the autocorrelation coefficient simultaneously in time-series regression mod-
els with autocorrelated disturbances. The basic requirement for consistency is that the errors
have conditional median zero. Asymptotic normality is shown. Estimation of the asymptotic
covariance matrix is also considered, and LM, Wald and LR tests are developed.
Shi and Li [140, 141] discussed a LAV estimator for a model with linear and non-linear
components. A piecewise polynomial is used to approximate the non-linear component. The
convergence rate of the estimator was studied. They find convergence rates for estimators of
the regression coefficients under mild conditions.
Kim and Choi [142] provided conditions for strong consistency and asymptotic normality
for the non-linear LAV regression estimator. As with linear regression results, the non-linear
LAV estimator is shown to be more efficient than non-linear LS for any error distribution
for which the sample median is more efficient than the sample mean as an estimator of
location.
André et al. [143] proposed using a non-linear LAV regression to estimate a multistage dose-
response model. They used simulation to examine the performance of the model.
Zwanzig [144] shows conditions for strong consistency of the LAV estimator of the
parameters in the non-linear regression model and in the non-linear errors-in-variables model.

9. Applications

In this section, a few of the applications of LAV regression that have appeared in the literature
will be discussed. Coleman and Larsen [145] used equations estimated by LAV, LS and
Chebychev criteria to predict housing prices. The results showed little difference between forecasts
from the three methods. A more structured experiment might clarify the results further.
Corrado and Schatzberg [146] used LAV, LS and a non-parametric rank regression estimator
to estimate the systematic risk (slope parameter) in a regression of daily stock returns on an
index. Their results provide some evidence that both the LS and the non-parametric estimator
will be superior to LAV under the conditions examined. They used only large firms in the
analysis, so the regression disturbances are likely to be close to normal. In another financial
application, Chan and Lakonishok [147] used LS, LAV, trimmed regression quantile estimators (TRQ) and
two estimators that are linear combinations of regression quantiles to estimate market model
parameters. The study used simulated return data and actual returns. When a t-distribution with
3 degrees of freedom is used as the disturbance distribution, the robust methods were more
efficient than LS. The TRQ estimators appeared more efficient than LAV. In a similar analysis,
Butler et al. [148] applied LS and LAV estimation, as well as a partially adaptive estimator,
to market model data and compared the results for two examples. Draper and Paudyal [149]
applied LS and LAV to actual returns and suggest that robust methods may be superior when
daily data are used. Mills et al. [150] apply LS and various robust estimators (including LAV)
to an event study. Their findings suggest that results based on cumulative average residuals
from the regression can vary depending on the estimator used.
Bassett [151] examined the rating of sports teams based on scores of previous games by
estimating the parameters of the standard linear model. He applied both LAV and LS to data
from the 1993 NFL season. In terms of correct predictions, LAV outperformed LS.
Kaergard [152] used LAV regression to estimate a Danish investment growth model, while
Sengupta and Okamura [153] applied LAV regression to a cost frontier model for analyzing
the impact of scale economies in the South Korean manufacturing sector. Westland [154]
applied LAV, LS and two-stage LAV and LS estimation techniques to systems of equations
with partial structural variability. Preference among estimators and predictions depended to
some extent on the criteria used to judge performance, but, in general, little difference was
found for the example in this article.
Gonin and Money [155] used non-linear examples from pharmacokinetics to illustrate an
adaptive algorithm for the choice of p in non-linear Lp-estimation. They compared estimates
from the adaptive algorithm to LAV estimates. See also Roy [156].
Another area that might be thought of as an application of LAV regression is to provide
initial estimates for other robust algorithms. This was done, for example, in Rousseeuw and van
Zomeren [157]. They used LAV regression as part of a robust regression algorithm. Leverage
points are first eliminated from the data set by various methods, and then LAV regression is
applied to the remaining data.

10. Further uses of least absolute value estimation methods

Powell [158] examined two-stage LAV estimators for the parameters of a structural equation in
a simultaneous equation model. He demonstrated the asymptotic normality of the estimators for
very general disturbance distributions. Powell [159] proposed an estimator for the parameters
of the censored regression model that is a generalization of LAV estimation for the standard
linear model. The estimator is consistent and asymptotically normal for a wide class of error
distributions. Rao and Zhao [160] also studied the asymptotic behavior of the LAV estimator in
censored regression models. Using weaker conditions than those of Powell [159], Pollard [63]
and Chen and Wu [67], they prove consistency and asymptotic normality of the estimator.
Koenker and Portnoy [161] proposed robust alternatives to the seemingly unrelated regres-
sion estimator of Zellner. The LAV estimator is considered as a special case. Asymptotic normality of
the coefficient estimators is shown.
Honore [162] proposed trimmed LS and trimmed LAV estimators of truncated and censored
regression models with fixed effects and panel data, where the panel was of length two. The
estimators were shown to be consistent and asymptotically normal.
Wang and Scott [163] proposed a hybrid method that combines non-parametric regression
and LAV estimation. LAV regressions are fitted over local neighborhoods. The new method
generalizes to several dimensions and is proved to be consistent. Thus, a procedure
with relative computational ease and asymptotic theoretical results is obtained.
Dodge [164] introduced an estimation procedure which is a convex combination of LAV and
LS. The method performs well in comparison to LS, LAV and other robust methods. Choosing
the parameters that define the convex combination is the trick that must be mastered in practice.
Dodge and Jureckova [165] show that the estimation of regression parameters by a convex
combination of LS and LAV estimators can be adapted so that the resulting estimator achieves the
minimum asymptotic variance in the model under consideration. Dodge et al. [166] discuss
computation of this estimator. Dodge and Jureckova [167] introduce an estimator that is a
convex combination of an M-estimator and the LAV estimator. Dodge and Jureckova [168]
summarize results based on estimation procedures that involve convex combinations of LAV
and some other estimation procedure.
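Schematically, a convex-combination criterion of this kind can be written as a weighted sum of the LAV and LS objective functions. The sketch below is our reading for illustration only, not Dodge's precise formulation: the weight δ = 0.5 is arbitrary (choosing it well is exactly the "trick" referred to above), the t3 disturbances are illustrative, and NumPy/SciPy are assumed.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)  # heavy tails

def crit(b, delta=0.5):
    """Convex combination of the LAV and LS criteria (delta is illustrative)."""
    e = y - X @ b
    return delta * np.sum(np.abs(e)) + (1 - delta) * np.sum(e**2)

b0 = np.linalg.lstsq(X, y, rcond=None)[0]   # LS estimate as starting value
res = minimize(crit, x0=b0, method="Nelder-Mead")
print(res.x)
```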
Rao [105] examined extensions of LAV estimation to a multivariate linear model. Mathew
and Nordstrom [169] examined procedures for a linear model with an additive bias vector
that is bounded. They called this model an approximately linear model. When the regression
coefficients are estimated by minimizing the maximum of a weighted sum of squared devia-
tions, the criterion to be minimized is a linear combination of LS and LAV criteria for the ideal
linear model. When the regression coefficients are estimated by minimizing the maximum of a
weighted sum of absolute deviations, the estimate turns out to be independent of the assumed
bounds.
Narula and Korhonen [170] suggest the LAV criterion for estimating parameters in mul-
tivariate regression models. The problem can be formulated as a multiple objective LP
problem.
Morgenthaler [171] examined the consequences of replacing LS by LAV in the derivation of
quasi-likelihoods. The LAV-type criterion was found to be applicable and to lead to alternatives
to maximum likelihood fits.
Puig and Stephens [172] studied goodness-of-fit tests for the Laplace distribution. Asymp-
totic theory is given, as well as critical values derived from Monte Carlo simulations for
finite samples. Power studies suggest the Watson U² statistic is the most powerful. Examples
using LAV regression are provided. They suggest the importance of such tests in conjunction
with LAV estimation, since the LAV estimator is maximum likelihood when disturbances
follow a Laplace distribution.

Bradu [173] provides a sufficient condition for the exact fit property (EFP) which is easier to
check than the condition discussed in Ellis and Morgenthaler [76], and discusses the use of the
EFP for outlier identification in specific circumstances. Dodge [174] proposed algorithms for
detecting outliers in both the y- and x-variables. The algorithms utilize regressions involving
the y-variable and each x-variable individually as the dependent variable.
Morgenthaler [175] discusses the behavior of residuals from LAV linear models from
designed experiments. Shortcomings of the use of LAV residuals are noted. Sheather and
McKean [176] examined the usefulness of residual plots from LAV regression. Since the LAV
residuals and the fitted values can be negatively correlated, cases exist where examination of
residual plots may lead to judgments of model inadequacy for properly chosen models.
Hurvich and Tsai [177] developed a criterion for the selection of LAV regression models. A
new measure specifically oriented toward LAV regression is developed and compared to the
corrected Akaike Information Criterion (cAIC) measure of Hurvich and Tsai. Both measures
perform well, although the cAIC is computationally less intensive.
Huskova [178] proposed L1-type test procedures for the detection of a change in linear models.
The model examined is one where the regression values take on specific values up to a certain
point in time, then change from that time point on. The problem is to test whether the change
point occurs at a certain time period.

References
[1] Dielman, T.E., 1984, Least absolute value estimation in regression models: An annotated bibliography.
Communications in Statistics - Theory and Methods, 4, 513-541.
[2] Dodge, Y., 1987, An introduction to L1-norm based statistical data analysis. Computational Statistics and Data
Analysis, 5, 239-253.
[3] Narula, S.C., 1987, The minimum sum of absolute errors regression. Journal of Quality Technology, 19, 37-45.
[4] Pynnönen, S. and Salmi, T., 1994, A report on least absolute deviation regression with ordinary linear
programming. Liiketaloudellinen Aikakauskirja, 43, 36-49.
[5] Birkes, D. and Dodge, Y., 1993, Alternative Methods of Regression (New York: John Wiley).
[6] Bloomfield, P. and Steiger, W., 1983, Least Absolute Deviations: Theory, Applications, and Algorithms (Boston:
Birkhäuser).
[7] Sposito, V.A., 1989, Linear Programming with Statistical Applications (Ames, Iowa: Iowa State University
Press).
[8] Farebrother, R.W., 1999, Fitting Linear Relationships: A History of the Calculus of Observations 1750-1900
(New York: Springer) (Springer Series in Statistics).
[9] Wagner, H.M., 1959, Linear programming techniques for regression analysis. Journal of the American Statistical
Association, 54, 206-212.
[10] Boscovich, R.J., 1757, De litteraria expeditione per pontificiam ditionem, et synopsis amplioris operis, ac
habentur plura ejus ex exemplaria etiam sensorum impressa. Bononiensi Scientiarum et Artum Instituto Atque
Academia Commentarii, 4, 353-396.
[11] Boscovich, R.J., 1760, De recentissimis graduum dimensionibus, et figura, ac magnitudine terrae inde
derivanda. Philosophiae Recentioris, a Benedicto Stay in Romano Archigymnasio Publico Eloquentiae Pro-
fessore, versibus traditae, Libri X, cum adnotationibus et Supplementis P. Rogerii Joseph Boscovich, S.J., 2,
406-426.
[12] Gauss, C.F., 1809, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. Hamburg:
Perthes et Besser. Translated, 1857, as Theory of Motion of the Heavenly Bodies Moving about the Sun in
Conic Sections, trans. Davis, C.H. Little, Brown, Boston. Reprinted, 1963 (New York: Dover).
[13] Gauss, C.F., 1823, Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Commentationes
Societatis Regiae Scientiarum Gottingensis Recentiores, 5; German summary, Göttingische Gelehrte Anzeigen
(1820), 321-327 and (1823), 313-318.
[14] Gauss, C.F., 1828, Supplementum Theoriae Combinationis Observationum Erroribus Minimis Obnoxiae.
Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores, 6; German summary, Göttingische
Gelehrte Anzeigen (1826), 1521-1527.
[15] Laplace, P.S., 1812, Théorie Analytique des Probabilités, Mme Courcier, Paris, 1812. (Reprinted as Oeuvres
Complètes de Laplace, 7, Gauthier-Villars, Paris, 1847).
[16] Farebrother, R.W., 1987, The historical development of the L1 and L∞ estimation procedures: 1793-1930.
In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-norm and Related Methods (Amsterdam:
North-Holland), pp. 37-63.
[17] Farebrother, R.W., 1990, Studies in the history of probability and statistics XLII: Further details of contacts
between Boscovich and Simpson in June 1760. Biometrika, 77, 397-400.
[181 Farebrolber, R.W.. 199.3, Boscovicb's method for correcting discordant observations. In: Bursill-Hall, P. (Ed.).
R. J. Boscovich: His Life and .Scientific Work (Rome: Insliluto della Encyclopedia Italiana), pp. 255-261.
[19] Farebrotbcr, R.W.. 1997. Notes on the early hislor>' of elemental set melhinls, Li-Stati.stical Procedures and
Related Topici. tMS Lecture Notes-Monograph Series. 31, 161-170.
[20] Koenker, R., 2(KX), Gallon, Edgcwonh, Frisch. and prospects for quantile regression in econometrics. Joumal
of Econometrics. 95, 347-374.
[21] Bowley, A.L.. 1902. Applications to wage statistics and olher groups. Journal ofthe Rovat Statistical Society.
6S.,'?31-.342.
[22] Rousseeuw, P. and Hubert, M., 1999, Regression depth, Journal of the American Statistical Association, 94, 388-402.
[23] Frisch, R., 1956, La résolution des problèmes de programme linéaire par la méthode du potentiel logarithmique, Cahiers du Séminaire d'Économétrie, 4, 7-20.
[24] Stigler, S.M., 1986, The History of Statistics: The Measurement of Uncertainty before 1900 (Cambridge, MA: The Belknap Press of Harvard University Press).
[25] Stigler, S.M., 1984, Studies in the history of probability and statistics XL: Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation, Biometrika, 71, 615-620.
[26] Charnes, A., Cooper, W.W. and Ferguson, R., 1955, Optimal estimation of executive compensation by linear programming, Management Science, 1, 138-151.
[27] Barrodale, I. and Roberts, F.D.K., 1973, An improved algorithm for discrete L1 linear approximation, SIAM Journal on Numerical Analysis, 10, 839-848.
[28] Barrodale, I. and Roberts, F.D.K., 1974, Algorithm 478: Solution of an overdetermined system of equations in the L1 norm, Communications of the ACM, 17, 319-320.
[29] Armstrong, R.D. and Kung, M.T., 1978, Algorithm AS 132: Least absolute values estimates for a simple linear regression problem, Applied Statistics, 27, 363-366.
[30] Bloomfield, P. and Steiger, W., 1980, Least absolute deviations curve-fitting, SIAM Journal on Scientific and Statistical Computing, 1, 290-301.
[31] Armstrong, R.D., Frome, E.L. and Kung, D.S., 1979, A revised simplex algorithm for the absolute deviation curve-fitting problem, Communications in Statistics - Simulation and Computation, B8, 175-190.
[32] Josvanger, L.A. and Sposito, V.A., 1983, L1-norm estimates for the simple regression problem, Communications in Statistics - Simulation and Computation, 12, 215-221.
[33] Gentle, J.E., Narula, S.C. and Sposito, V.A., 1987, Algorithms for unconstrained L1 linear regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 83-94.
[34] Dielman, T.E. and Pfaffenberger, R., 1984, Computational algorithms for calculating least absolute value and Chebyshev estimates for multiple regression, American Journal of Mathematical and Management Sciences, 4, 169-197.
[35] Abdelmalek, N.N., 1980, L1 solution of overdetermined systems of linear equations, ACM Transactions on Mathematical Software, 6, 220-227.
[36] Gentle, J.E., Sposito, V.A. and Narula, S.C., 1988, Algorithms for unconstrained L1 simple linear regression, Computational Statistics and Data Analysis, 6, 335-339.
[37] Narula, S.C., Sposito, V.A. and Gentle, J.E., 1991, Comparison of computer programs for simple linear L1 regression, Journal of Statistical Computation and Simulation, 39, 63-68.
[38] Sunwoo, H.S. and Kim, B.C., 1987, L1 estimation for the simple linear regression model, Communications in Statistics - Theory and Methods, 16, 1703-1715.
[39] Soliman, S.A., Christensen, G.S. and Rouhi, A.H., 1988, A new technique for curve fitting based on minimum absolute deviations, Computational Statistics and Data Analysis, 6, 341-351.
[40] Herce, M.A., 1990, An example showing that a new technique for LAV estimation breaks down in certain cases, Computational Statistics and Data Analysis, 9, 197-202.
[41] Christensen, G.S., Soliman, S.A. and Rouhi, A., 1990, Discussion of an example showing that a new technique for LAV estimation breaks down in certain cases, Computational Statistics and Data Analysis, 9, 203-213.
[42] Herce, M.A., 1990, Some comments on Christensen, Soliman and Rouhi's Discussion, Computational Statistics and Data Analysis, 9, 215-216.
[43] Bassett, G.W. and Koenker, R.W., 1992, A note on recent proposals for computing L1 estimates, Computational Statistics and Data Analysis, 14, 207-211.
[44] Dielman, T.E., 1992, Computational algorithms for least absolute value regression. In: Dodge, Y. (Ed.), L1-Statistical Analyses and Related Methods (Amsterdam: North-Holland), pp. 311-326.
[45] Seneta, E. and Steiger, W.L., 1984, A new LAD curve fitting algorithm: Slightly over-determined equation systems in L1, Discrete Applied Mathematics, 7, 79-91.
[46] Farebrother, R.W., 1988, Algorithm AS 238: A simple recursive procedure for the L1 norm fitting of a straight line, Applied Statistics, 37, 457-465.
[47] Sadovski, A.N., 1974, Algorithm AS 74: L1-norm fit of a straight line, Applied Statistics, 23, 244-248.
[48] Farebrother, R.W., 1992, Least squares initial values for the L1-norm fitting of a straight line: A remark on algorithm AS 238, Applied Statistics.
[49] Rech, P., Schmidbauer, P. and Eng, J., 1989, Least absolute regression revisited: A simple labeling method for finding a LAR line, Communications in Statistics - Simulation and Computation, 18, 943-955.
[50] Madsen, K. and Nielsen, H.B., 1993, A finite smoothing algorithm for linear L1 estimation, SIAM Journal on Optimization, 3, 223-235.
[51] Hong, C.S. and Choi, H.J., 1997, On L1 regression coefficients, Communications in Statistics - Simulation and Computation, 26, 531-537.
[52] Sklar, M.G., 1988, Extensions to a best subset algorithm for least absolute value estimation, American Journal of Mathematical and Management Sciences, 8, 1-58.
[53] Narula, S.C. and Wellington, J.F., 1988, An efficient algorithm for the MSAE and MMAE regression problems, SIAM Journal on Scientific and Statistical Computing, 9, 717-727.
[54] Planitz, M. and Gates, J., 1991, Strict discrete approximation in the L1 and L∞ norms, Applied Statistics, 40, 113-122.
[55] Adcock, C.J. and Meade, N., 1997, A comparison of two LP solvers and a new IRLS algorithm for L1 estimation, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 119-132.
[56] Portnoy, S. and Koenker, R., 1997, The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators, Statistical Science, 12, 279-296. (Comments and rejoinder, 296-300).
[57] Coleman, T.F. and Li, Y., 1992, A globally and quadratically convergent affine scaling method for linear L1 problems, Mathematical Programming, 56, 189-222.
[58] Koenker, R., 1997, L1 computation: An interior monologue, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 15-32.
[59] Portnoy, S., 1997, On computation of regression quantiles: Making the Laplacian tortoise faster, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 187-200.
[60] Bassett, G.W. and Koenker, R.W., 1978, Asymptotic theory of least absolute error regression, Journal of the American Statistical Association, 73, 618-622.
[61] Koenker, R. and Bassett, G., 1985, On Boscovich's estimator, The Annals of Statistics, 13, 1625-1628.
[62] Phillips, P.C.B., 1991, A shortcut to LAD estimator asymptotics, Econometric Theory, 7, 450-463.
[63] Pollard, D., 1991, Asymptotics for least absolute deviation regression estimators, Econometric Theory, 7, 186-199.
[64] Wu, Y., 1988, Strong consistency and exponential rate of the minimum L1-norm estimates in linear regression models, Computational Statistics and Data Analysis, 6, 285-295.
[65] Bai, Z.D. and Wu, Y., 1997, On necessary conditions for the weak consistency of minimum L1-norm estimates in linear models, Statistics & Probability Letters, 34, 193-199.
[66] Chen, X.R., Zhao, L. and Wu, Y., 1993, On conditions of consistency of ML1N estimates, Statistica Sinica, 3, 9-18.
[67] Chen, X.R. and Wu, Y., 1993, On a necessary condition for the consistency of the L1 estimates in linear regression models, Communications in Statistics - Theory and Methods, 22, 631-639.
[68] Chen, X.R., Wu, Y. and Zhao, L., 1995, A necessary condition for the consistency of L1 estimates in linear models, Sankhya: The Indian Journal of Statistics, Series A, 57, 384-392.
[69] Andrews, D.W.K., 1986, A note on the unbiasedness of feasible GLS, quasi-maximum likelihood, robust, adaptive, and spectral estimators of the linear model, Econometrica, 54, 687-698.
[70] Farebrother, R.W., 1985, Unbiased L1 and L∞ estimation, Communications in Statistics - Theory and Methods, 14, 1941-1962.
[71] Withers, C.S., 1987, The bias and skewness of L1-estimates in regression, Computational Statistics and Data Analysis, 5, 301-303.
[72] Bassett, G.W., 1988, A p-subset property of L1 and regression quantile estimates, Computational Statistics and Data Analysis, 6, 297-304.
[73] Bai, J., 1995, Least absolute deviation estimation of a shift, Econometric Theory, 11, 403-436.
[74] Caner, M., 2002, A note on least absolute deviation estimation of a threshold model, Econometric Theory, 18, 800-814.
[75] He, X., Jureckova, J., Koenker, R. and Portnoy, S., 1990, Tail behavior of regression estimators and their breakdown points, Econometrica, 58, 1195-1214.
[76] Ellis, S.P. and Morgenthaler, S., 1992, Leverage and breakdown in L1 regression, Journal of the American Statistical Association, 87, 143-148.
[77] Ellis, S.P., 1998, Instability of least squares, least absolute deviation and least median of squares linear regression, Statistical Science, 13, 337-344.
[78] Portnoy, S. and Mizera, I., 1998, Comment, Statistical Science, 13, 344-347.
[79] Ellis, S.P., 1998, Rejoinder, Statistical Science, 13, 347-350.
[80] Pfaffenberger, R.C. and Dielman, T.E., 1989, A comparison of regression estimators when both multicollinearity and outliers are present. In: Lawrence, K.D. and Arthur, J.L. (Eds.), Robust Regression: Analysis and Applications (New York: Marcel Dekker, Inc.), pp. 243-270.
[81] Lind, J.C., Mehra, K.L. and Sheahan, J.N., 1992, Asymmetric errors in linear models: Estimation-theory and Monte Carlo, Statistics, 23, 305-320.
[82] McDonald, J.B. and White, S.B., 1993, A comparison of some robust, adaptive, and partially adaptive estimators of regression models, Econometric Reviews, 12, 103-124.
[83] Coursey, D. and Nyquist, H., 1983, On least absolute error estimation of linear regression models with dependent stable residuals, The Review of Economics and Statistics, 65, 687-692.
[84] Weiss, A.A., 1990, Least absolute error estimation in the presence of serial correlation, Journal of Econometrics, 44, 127-158.
[85] Davis, R.A. and Dunsmuir, W.T.M., 1997, Least absolute deviation estimation for regression with ARMA errors, Journal of Theoretical Probability, 10, 481-497.
[86] Nyquist, H., 1992, L1-norm estimation of regression models with serially dependent error terms. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 253-264.
[87] Dielman, T.E. and Rose, E.L., 1994, Estimation in least absolute value regression with autocorrelated errors, Journal of Statistical Computation and Simulation, 50, 29-43.
[88] Dielman, T.E. and Rose, E.L., 1995, Estimation after pre-testing in least absolute value regression with autocorrelated errors, Journal of Business and Management, 2, 74-95.
[89] Nyquist, H., 1997, A Lagrange multiplier approach to testing for serially dependent error terms, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 329-336.
[90] Dielman, T.E., 1986, A comparison of forecasts from least absolute value and least squares regression, Journal of Forecasting, 5, 189-195.
[91] Dielman, T.E., 1989, Corrections to: A comparison of forecasts from least absolute value and least squares regression, Journal of Forecasting, 8, 419-420.
[92] Dielman, T.E. and Rose, E.L., 1994, Forecasting in least absolute value regression with autocorrelated errors: A small-sample study, International Journal of Forecasting, 10, 539-547.
[93] Koenker, R. and Bassett, G., 1982, Tests of linear hypotheses and L1 estimation, Econometrica, 50, 1577-1583.
[94] Bai, Z.D., Rao, C.R. and Yin, Y.Q., 1990, Least absolute deviations analysis of variance, Sankhya: The Indian Journal of Statistics, Series A, 52, 166-177.
[95] McKean, J. and Schrader, R., 1984, A comparison of methods for studentizing the sample median, Communications in Statistics - Simulation and Computation, 13, 751-773.
[96] McKean, J. and Schrader, R., 1987, Least absolute errors analysis of variance. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 297-305.
[97] Sposito, V.A. and Tveite, M.D., 1986, On the estimation of the variance of the median used in L1 linear inference procedures, Communications in Statistics - Theory and Methods, 15, 1367-1375.
[98] Sheather, S.J., 1987, Assessing the accuracy of the sample median: Estimated standard errors versus interpolated confidence intervals. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 203-215.
[99] Dielman, T.E. and Pfaffenberger, R., 1990, Tests of linear hypotheses in LAV regression, Communications in Statistics - Simulation and Computation, 19, 1179-1199.
[100] Dielman, T.E. and Pfaffenberger, R., 1992, A further comparison of tests of hypotheses in LAV regression, Computational Statistics and Data Analysis, 14, 375-384.
[101] Dielman, T.E. and Rose, E.L., 1995, A bootstrap approach to hypothesis testing in least absolute value regression, Computational Statistics and Data Analysis, 20, 119-130.
[102] Dielman, T.E. and Rose, E.L., 1996, A note on hypothesis testing in LAV multiple regression: A small-sample comparison, Computational Statistics and Data Analysis, 21, 463-470.
[103] Liu, Z.J., 1992, Non-parametric estimates of the nuisance parameter in the LAD tests, Communications in Statistics - Theory and Methods, 21, 861-881.
[104] Niemiro, W., 1995, Estimation of nuisance parameters for inference based on least absolute deviations, Applicationes Mathematicae, 22, 515-529.
[105] Rao, C.R., 1988, Methodology based on the L1-norm in statistical inference, Sankhya: The Indian Journal of Statistics, Series A, 50, 289-313.
[106] Efron, B., 1979, Bootstrap methods: Another look at the jackknife, Annals of Statistics, 7, 1-26.
[107] Hall, P. and Wilson, S.R., 1991, Two guidelines for bootstrap hypothesis testing, Biometrics, 47, 757-762.
[108] De Angelis, D., Hall, P. and Young, G.A., 1993, Analytical and bootstrap approximations to estimator distributions in L1 regression, Journal of the American Statistical Association, 88, 1310-1316.
[109] Koenker, R., 1987, A comparison of asymptotic testing methods for L1 regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 287-295.
[110] Schrader, R.M. and McKean, J.W., 1987, Small-sample properties of least absolute errors analysis of variance. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 307-321.
[111] Dielman, T.E. and Pfaffenberger, R., 1988, Bootstrapping in least absolute value regression: An application to hypothesis testing, Communications in Statistics - Simulation and Computation, 17, 843-856.
[112] Zhang, J. and Boos, D.D., 1994, Adjusted power estimates in Monte Carlo experiments, Communications in Statistics - Simulation and Computation, 23, 165-173.
[113] Dielman, T.E. and Rose, E.L., 1997, Estimation and testing in least absolute value regression with serially correlated disturbances, Annals of Operations Research, 74, 239-257.
[114] Stangenhaus, G., 1987, Bootstrap and inference procedures for L1 regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 323-332.
[115] Stangenhaus, G. and Narula, S.C., 1991, Inference procedures for the L1 regression, Computational Statistics and Data Analysis, 12, 79-85.
[116] Stangenhaus, G., Narula, S.C. and Ferreira, P., 1991, Bootstrap confidence intervals for the minimum sum of absolute errors regression, Journal of Statistical Computation and Simulation, 46, 127-133.
[117] Dielman, T.E. and Pfaffenberger, R., 1988b, Least absolute value regression: Necessary sample sizes to use normal theory inference procedures, Decision Sciences, 19, 734-743.
[118] Gutenbrunner, C., Jureckova, J., Koenker, R. and Portnoy, S., 1993, Tests of linear hypotheses based on regression rank scores, Nonparametric Statistics, 2, 307-331.
[119] Cade, B.S. and Richards, J.D., 1996, Permutation tests for least absolute deviation regression, Biometrics, 52, 886-902.
[120] Horowitz, J.L., 1998, Bootstrap methods for median regression models, Econometrica, 66, 1327-1351.
[121] Weiss, A.A., 1988, A comparison of ordinary least squares and least absolute error estimation, Econometric Theory, 4, 517-527.
[122] Furno, M., 2000, LM tests in the presence of non-normal error distributions, Econometric Theory, 16, 249-261.
[123] An, H. and Chen, Z., 1982, On convergence of LAD estimates in autoregression with infinite variance, Journal of Multivariate Analysis, 12, 335-345.
[124] Dunsmuir, W.T.M. and Spencer, N.M., 1991, Strong consistency and asymptotic normality of L1 estimates of the autoregressive moving-average model, Journal of Time Series Analysis, 12, 95-104.
[125] Dunsmuir, W.T.M., 1992, A simulation study of L1 estimation of a seasonal moving average time-series model, Communications in Statistics - Simulation and Computation, 21, 519-531.
[126] Dunsmuir, W.T.M. and Murtagh, B.A., 1993, Least absolute deviation estimation of stationary time-series models, European Journal of Operational Research, 67, 272-277.
[127] Olsen, E.T. and Ruzinsky, S.A., 1989, Characterization of the LAD (L1) AR parameter estimator when applied to stationary ARMA, MA, and higher order AR processes, IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 1451-1454.
[128] Ruzinsky, S.A. and Olsen, E.T., 1989, Strong consistency of the LAD (L1) estimator of parameters of stationary autoregressive processes with zero mean, IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 597-600.
[129] Pino, F.A. and Morettin, P.A., 1993, The consistency of the L1-norm estimates in ARMA models, Communications in Statistics - Theory and Methods, 22, 2185-2206.
[130] Knight, K., 1997, Some limit theory for L1-estimators in autoregressive models under general conditions, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 315-328.
[131] Knight, K., 1998, Limiting distributions for L1 regression estimators under general conditions, Annals of Statistics, 26, 755-770.
[132] Herce, M.A., 1996, Asymptotic theory of LAD estimation in a unit root process with finite variance errors, Econometric Theory, 12, 129-153.
[133] Rogers, A.J., 2001, Least absolute deviations regression under nonstandard conditions, Econometric Theory, 17, 820-852.
[134] Gonin, R. and Money, A.H., 1987, A review of computational methods for solving the non-linear L1-norm estimation problem. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 117-129.
[135] Gonin, R. and Money, A.H., 1989, Nonlinear Lp-Norm Estimation (New York: Marcel Dekker Inc.).
[136] Oberhofer, W., 1982, The consistency of non-linear regression minimizing the L1-norm, Annals of Statistics, 10, 316-319.
[137] Richardson, G.D. and Bhattacharyya, B.B., 1987, Consistent L1-estimators in non-linear regression for a noncompact parameter space, Sankhya: The Indian Journal of Statistics, Series A, 49, 377-387.
[138] Soliman, S.A., Christensen, G.S. and Rouhi, A., 1991, A new algorithm for nonlinear L1-norm minimization with nonlinear equality constraints, Computational Statistics and Data Analysis, 11, 97-109.
[139] Weiss, A.A., 1991, Estimating non-linear dynamic models using least absolute error estimation, Econometric Theory, 7, 46-68.
[140] Shi, P. and Li, G., 1994, Asymptotics of the minimum L1-norm estimates in a partly linear model, Systems Science and Mathematical Sciences, 7, 67-77.
[141] Shi, P. and Li, G., 1994, On the rates of convergence of minimum L1-norm estimates in a partly linear model, Communications in Statistics - Theory and Methods, 23, 175-196.
[142] Kim, H.K. and Choi, S.H., 1995, Asymptotic properties of non-linear least absolute deviation estimators, Journal of the Korean Statistical Society, 24, 127-139.
[143] Andre, C.D.S., Narula, S.C., Peres, C.A. and Ventura, G.A., 1997, Asymptotic properties of the minimum sum of absolute errors estimators in a dose-response model, Journal of Statistical Computation and Simulation, 58, 361-379.
[144] Zwanzig, S., 1997, On L1-norm estimators in nonlinear regression and in nonlinear error-in-variables models, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 101-118.
[145] Coleman, J.W. and Larsen, J.E., 1991, Alternative estimation techniques for linear appraisal models, The Appraisal Journal, 59, 526-532.
[146] Corrado, C.J. and Schatzberg, J.D., 1991, Estimating systematic risk with daily security returns: A note on the relative efficiency of selected estimators, The Financial Review, 26, 587-599.
[147] Chan, L.K.C. and Lakonishok, J., 1992, Robust measurement of beta risk, Journal of Financial and Quantitative Analysis, 27, 265-282.
[148] Butler, R.J., McDonald, J.B., Nelson, R.D. and White, S.B., 1990, Robust and partially adaptive estimation of regression models, The Review of Economics and Statistics, 72, 321-327.
[149] Draper, P. and Paudyal, K., 1995, Empirical irregularities in the estimation of beta: The impact of alternative estimation assumptions and procedures, Journal of Business Finance & Accounting, 22, 157-177.
[150] Mills, T.C., Coutts, J.A. and Roberts, J., 1996, Misspecification testing and robust estimation of the market model and their implications for event studies, Applied Economics, 28, 559-566.
[151] Bassett, G., 1997, Robust sports ratings based on least absolute errors, The American Statistician, 51, 99-105.
[152] Kærgård, N., 1987, Estimation criterion, residuals and prediction evaluation, Computational Statistics and Data Analysis, 5, 443-450.
[153] Sengupta, J.K. and Okamura, K., 1993, Scale economies in manufacturing: Problems of robust estimation, Empirical Economics, 18, 469-480.
[154] Westlund, A.H., 1989, Robust estimation and prediction of economic systems: The case of partial structural variability, Quality & Quantity, 23, 61-73.
[155] Gonin, R. and Money, A.H., 1987, Outliers in physical processes: L1- or adaptive Lp-norm estimation? In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 447-454.
[156] Roy, T., 1993, Estimating shelf-life using L1 regression methods, Journal of Pharmaceutical & Biomedical Analysis, 11, 843-845.
[157] Rousseeuw, P.J. and van Zomeren, B.C., 1992, A comparison of some quick algorithms for robust regression, Computational Statistics and Data Analysis, 14, 107-116.
[158] Powell, J.L., 1983, The asymptotic normality of two-stage least absolute deviations estimators, Econometrica, 51, 1569-1575.
[159] Powell, J.L., 1984, Least absolute deviations estimation for the censored regression model, Journal of Econometrics, 25, 303-325.
[160] Rao, C.R. and Zhao, L.C., 1993, Asymptotic normality of LAD estimator in censored regression models, Mathematical Methods of Statistics, 2, 228-239.
[161] Koenker, R. and Portnoy, S., 1990, M estimation of multivariate regressions, Journal of the American Statistical Association, 85, 1060-1068.
[162] Honore, B.E., 1992, Trimmed LAD and least squares estimation of truncated and censored regression models with fixed effects, Econometrica, 60, 533-565.
[163] Wang, F.T. and Scott, D.W., 1994, The L1 method for robust non-parametric regression, Journal of the American Statistical Association, 89, 65-76.
[164] Dodge, Y., 1984, Robust estimation of regression coefficients by minimizing a convex combination of least squares and least absolute deviations, Computational Statistics Quarterly, 1, 139-153.
[165] Dodge, Y. and Jureckova, J., 1987, Adaptive combination of least squares and least absolute deviations estimators. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 275-284.
[166] Dodge, Y., Antoch, J. and Jureckova, J., 1991, Computational aspects of adaptive combination of least squares and least absolute deviations estimators, Computational Statistics and Data Analysis, 12, 87-99.
[167] Dodge, Y. and Jureckova, J., 1988, Adaptive combination of M-estimator and L1-estimator in the linear model. In: Dodge, Y., Fedorov, V.V. and Wynn, H.P. (Eds.), Optimal Design and Analysis of Experiments (Amsterdam: Elsevier Science Publishers), pp. 167-176.
[168] Dodge, Y. and Jureckova, J., 1992, A class of estimators based on adaptive convex combinations of two estimation procedures. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 31-45.
[169] Mathew, T. and Nordstrom, K., 1993, Least squares and least absolute deviation procedures in approximately linear models, Statistics & Probability Letters, 16, 153-158.
[170] Narula, S.C. and Korhonen, P.J., 1994, Multivariate multiple linear regression based on the minimum sum of absolute errors criterion, European Journal of Operational Research, 73, 70-75.
[171] Morgenthaler, S., 1992, Least-absolute-deviations fits for generalized linear models, Biometrika, 79, 747-754.
[172] Puig, P. and Stephens, M.A., 2000, Tests of fit for the Laplace distribution, with applications, Technometrics, 42, 417-424.
[173] Bradu, D., 1997, Identification of outliers by means of L1 regression: Safe and unsafe configurations, Computational Statistics and Data Analysis, 24, 271-281.
[174] Dodge, Y., 1997, LAD regression for detecting outliers in response and explanatory variables, Journal of Multivariate Analysis, 61, 144-158.
[175] Morgenthaler, S., 1997, Properties of L1 residuals, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 79-90.
[176] Sheather, S.J. and McKean, J.W., 1992, The interpretation of residuals based on L1 estimation. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 145-155.
[177] Hurvich, C.M. and Tsai, C.L., 1990, Model selection for least absolute deviations regression in small samples, Statistics & Probability Letters, 9, 259-265.
[178] Huskova, M., 1997, L1-test procedures for detection of change, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 57-70.
Summary of Additional Papers on Least Absolute Value Estimation Not Cited in Text
Armstrong, R.D. and Kung, M.T., 1984, A linked list data structure for a simple linear regression algorithm, Computers and Operations Research, 11, 295-305. Presents a special-purpose algorithm to solve simple LAV regression problems. The algorithm is a specialization of the linear programming approach developed by Barrodale and Roberts, but requires considerably less storage.
Bassett, G., 1992, The Gauss-Markov property for the median. In: Dodge, Y. (Ed.), L1-Statistical Analyses and Related Methods (Amsterdam: North-Holland), pp. 23-31. A Gauss-Markov type theorem for the median is proved. The author shows that such a result implies more about restrictions on the class of estimators considered than optimality of the estimator.
Brennan, J.J. and Seiford, L.M., 1987, Linear programming and L1 regression: A geometric interpretation, Computational Statistics and Data Analysis, 5, 263-276. Provides a geometric interpretation of the solution process of the LAV regression problem.
Danao. R.A,. 1983, Regression by minimum sum of absolute errors: A note on perlect muiticollinearity. Philippine
Review of Economics and Business. 20. 125-133, When there is perfect tiiulticollinearity among regressors in
a LAV regression, the simplex algorithm wil! chtwse one maximal set of lineariy independent regressors from
the equation by setting the coefficients of the other variables equal to zero, ln essence, the variables witb zero
coefficients are dn)pped fmm tbe equation. There will be multiple optimal solutions pos,sible in such a ease.
Dodge, Y. and Roenko, N., 1992, Stability of L1-norm regression under additional observations, Computational Statistics and Data Analysis, 14, 385-390. A test is provided to determine whether the introduction of an additional observation will lead to a new set of LAV regression estimates or whether the original solution remains optimal.
Dupacova, J., 1992, Robustness of L1 regression in the light of linear programming. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 47-61. Uses linear programming results to examine the behavior of LAV estimates in the linear regression model. Properties of LAV estimates are explained through the use of LP.
Farebrother, R.W., 1987b, Mechanical representations of the L1 and L2 estimation problems. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 455-464.
Ha. CD. and Narula, S.C. 1989, Perturbation analysis for ihe minimum sum of absolute errors regression. Communi-
cations in Statistics-Simulation and Computation, 18,957-970, Used sensitivity analysis to investigate the amount
by which the values ofthe response variable forthe non-defining observations (those with non/cro residuals) would
change without changing the parameter estimate in a LAV regression,
Harris, T., 1950, Regression using minimum absolute deviations, The American Statistician, 4, 14-15. Brief discussion of LAV regression as an answer to a contributed question.
Huber, P., 1987, The place of the L1-norm in robust estimation, Computational Statistics and Data Analysis, 5, 255-262. Discusses the place of the LAV estimator in robust estimation. Huber states the two main purposes of LAV estimation as (1) providing estimates with minimal bias if the observations are asymmetrically contaminated and (2) furnishing convenient starting values for estimates based on iterative procedures.
Koenker, R. and Bassett, G.W., 1984, Four (pathological) examples in asymptotic statistics, The American Statistician, 38, 209-212. The authors present four examples illustrating varieties of pathological asymptotic behavior. The examples are presented in the context of LAV regression. The article provides insight into the consequences of the failure of standard conditions.
McConnell, C.R., 1987, On computing a best discrete L1 approximation using the method of vanishing Jacobians, Computational Statistics and Data Analysis, 5, 277-288. Shows that the method of vanishing Jacobians can be used to solve the L1 linear programming problem.
McKean, J.W. and Sievers, G.L., 1987, Coefficients of determination for least absolute deviation analysis, Statistics & Probability Letters, 5, 49-54. Desirable properties for a coefficient of determination are developed and then possible choices in the case of LAV regression are discussed. A measure linked to the test statistic for significance of LAV regression coefficients seems to be a good choice.
Muller, C., 1997, L1-tests in linear models: Tests with maximum relative power, L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 91-99. Showed that Wald-type tests on coefficients in LAV linear regression maximize the relative power (power relative to the bias).
Muller, C., 1992, L1-estimation and testing in conditionally contaminated linear models. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 69-76. Considered regression models with disturbances that may be from different contaminated normal distributions and examined situations where LAV estimators and tests are not always the most robust.
Narula, S.C. and Wellington, J.F., 1985, Interior analysis for the minimum sum of absolute errors regression, Technometrics, 27, 181-188. Shows that there exists an interval for the value of the response variable of a nondefining observation (observation with nonzero residual) such that, if the value of the observation is in this interval, the LAV parameter estimates will not change. Also develops a procedure to determine an interval for the value of the response variable for a defining observation (observation with zero residual) such that, if the value of the response variable for the observation is in this interval, the set of defining observations does not change.
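The defining/nondefining distinction rests on a basic property of simple LAV regression: an optimal line can always be chosen to pass through at least two of the observations, and those are the defining observations with zero residuals. A minimal brute-force sketch of that property follows; the function name and the O(n^3) pair-enumeration approach are illustrative only, not an algorithm from any of the papers listed here.

```python
from itertools import combinations

def lav_simple_fit(x, y):
    """Brute-force LAV (L1) simple regression.

    Exploits the property that an optimal least-absolute-value line
    passes through at least two data points, so it suffices to check
    every line defined by a pair of observations and keep the one
    with the smallest sum of absolute errors (SAE).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: cannot be written as y = b0 + b1*x
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        sae = sum(abs(yk - (b0 + b1 * xk)) for xk, yk in zip(x, y))
        if best is None or sae < best[0]:
            best = (sae, b0, b1)
    sae, b0, b1 = best
    return b0, b1, sae
```

For example, with four of five points lying exactly on y = 2 + 3x and one gross outlier, `lav_simple_fit([0, 1, 2, 3, 4], [2, 5, 8, 11, 100])` returns the line through the four collinear (defining) points, intercept 2 and slope 3, illustrating the robustness of LAV relative to least squares.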
Saleh, A.K.M.E. and Sen, P.K., 1987, On the asymptotic distributional risk properties of pre-test and shrinkage L1-estimators, Computational Statistics and Data Analysis, 5, 289-299. This paper considered LAV estimation of a subset of coefficients when another subset of coefficients is included in the model but may be unnecessary. The LAV estimator, a shrinkage estimator and a pre-test estimator are examined. On the basis of asymptotic distributional risk, the shrinkage estimator may dominate the LAV estimator, but not the preliminary test estimator.
