
Robust Model Selection with LARS Based on S-estimators

Claudio Agostinelli¹ and Matias Salibian-Barrera²

¹ Dipartimento di Statistica, Ca' Foscari University, Venice, Italy, claudio@unive.it
² Department of Statistics, The University of British Columbia, Vancouver, BC, Canada, matias@stat.ubc.ca

Abstract. We consider the problem of selecting a parsimonious subset of explanatory variables from a potentially large collection of covariates. We are concerned
with the case when data quality may be unreliable (e.g. there might be outliers
among the observations). When the number of available covariates is moderately
large, fitting all possible subsets is not a feasible option. Sequential methods like
forward or backward selection are generally greedy and may fail to include important predictors when these are correlated. To avoid this problem Efron et al.
(2004) proposed the Least Angle Regression algorithm to produce an ordered list
of the available covariates (sequencing) according to their relevance. We introduce
outlier robust versions of the LARS algorithm based on S-estimators for regression
(Rousseeuw and Yohai (1984)). This algorithm is computationally efficient and suitable even when the number of variables exceeds the sample size. Simulation studies
show that it is also robust to the presence of outliers in the data and compares
favourably to previous proposals in the literature.

Keywords: robustness, model selection, LARS, S-estimators, robust regression

1 Introduction

As a result of the recent dramatic increase in the ability to collect data, researchers sometimes have a very large number of potentially relevant explanatory variables available to them. Typically, some of these covariates
are correlated among themselves and hence not all of them need to be included in a statistical model with good prediction performance. In addition,
models with few variables are generally easier to interpret than models with
many. Model selection refers to the process of finding a parsimonious model with good prediction properties. Many model selection methods consist of sequentially fitting models from a pre-specified list and comparing
their goodness-of-fit, prediction properties, or a combination of both. In this
paper we consider the case where a proportion of the data may not satisfy the
model assumptions and we are interested in predicting the non-outlying observations. Therefore, we consider model selection methods for linear models
based on robust methods.
As is the case with point estimation and other inference procedures,
likelihood-type model selection methods (e.g. AIC (Akaike (1970)), Mallows
Cp (Mallows (1973)), and BIC (Schwarz (1978))) may be severely affected
by a small proportion of atypical observations in the data. These outliers
may not necessarily consist of large values, but might not follow the model
that applies to the majority of the data. Model selection procedures that are
resistant to the presence of outliers in the sample have only recently started
to receive some attention in the literature. Seminal papers include Hampel
(1983), Ronchetti (1985, 1997) and Ronchetti and Staudte (1994). Other proposals include Sommer and Staudte (1995), Ronchetti, Field and Blanchard (1997), Qian and Künsch (1998), Agostinelli (2002a, 2002b), Agostinelli and Markatou (2005), and Morgenthaler, Welsch and Zenide (2003). See also the recent book by Maronna, Martin and Yohai (2006). These proposals are based on robustified versions of classical selection criteria (e.g. robust Cp, robust final prediction error, etc.). More recently, Müller and Welsh (2005) proposed a model selection criterion that combines a measure of goodness-of-fit, a penalty term to avoid over-fitting, and the expected prediction error conditional on the data. Salibian-Barrera and Van Aelst (2008) use the fast and robust bootstrap of Salibian-Barrera and Zamar (2002) to obtain a faster bootstrap-based model selection method that is feasible to compute for larger numbers of covariates. Although less expensive from a computational point of view than the stratified bootstrap of Müller and Welsh (2005), this method, like the previous ones, needs to compute the estimator on the full model.
A different approach to variable selection that is attractive when the number of explanatory variables is large is based on ordering the covariates according to their estimated importance in the full model. Forward stepwise
and backward elimination procedures are examples of this approach, whereby
in each step of the procedure a variable may enter or leave the linear model
(see, e.g. Weisberg (1985) or Miller (2002)). With backward elimination one
starts with the full model and then finds the best possible submodel with
one less covariate in it. This procedure is repeated until we fit a model with
a single covariate or a criterion is reached. A similar procedure is forward
stepwise, where we first select the covariate (say x1 ) with the highest absolute correlation with the response variable y. We take the residuals of the
regression of y on x1 as our new response, project all covariates orthogonally to x1 and add the variable with the highest absolute correlation to the
model. At the same step, variables in the model may be deleted according to
a criterion. These steps are repeated until no variables are added or deleted.
Unfortunately, when p is large (p = 100, for example), these procedures become infeasible for highly robust estimators; furthermore, these algorithms
are known to be greedy and may relegate important covariates if they are
correlated with those selected earlier in the sequence.
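As a minimal illustration of the forward stepwise idea described above (omitting the optional deletion step), the following Python sketch orders covariates by repeatedly picking the one most correlated with the current residual and then projecting both the residual and the remaining covariates orthogonally to the selected variable; the function name and the stopping rule are ours, not part of the paper.

import numpy as np

def forward_stepwise_order(X, y, n_select=None, tol=1e-12):
    """Order covariates by forward stepwise selection on a centered design.

    At each step, pick the covariate most correlated with the current residual,
    then orthogonalize the residual and the remaining covariates with respect
    to it before the next step.
    """
    n, p = X.shape
    n_select = p if n_select is None else n_select
    Z = X.astype(float).copy()     # covariates, progressively orthogonalized
    r = y.astype(float).copy()     # current residual (working response)
    selected = []
    for _ in range(n_select):
        norms = np.linalg.norm(Z, axis=0)
        norms[norms < tol] = np.inf          # skip covariates already explained
        cors = np.abs(Z.T @ r) / norms
        cors[selected] = -np.inf             # exclude already-selected covariates
        j = int(np.argmax(cors))
        selected.append(j)
        zj = Z[:, j] / np.linalg.norm(Z[:, j])
        r = r - (zj @ r) * zj                # residualize the response
        Z = Z - np.outer(zj, zj @ Z)         # project covariates orthogonally to z_j
    return selected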
The Least Angle Regression (LARS) of Efron et al. (2004) is a generalization of stepwise methods, where the length of the step is selected so as
to strike a balance between fast-but-greedy and slow-but-conservative
alternatives, such as those in stagewise selection (see, e.g. Hastie, Tibshirani and
Friedman (2001)). It is easy to verify that this method is not robust to the
presence of a small amount of atypical observations. McCann and Welsch
(2007) proposed to add an indicator variable for each observation and then
run the usual LARS on the extended set of covariates. When high-leverage
outliers are possible, they suggest building models from randomly drawn
subsamples of the data, and then selecting the best of them based on their
(robustly estimated) prediction error. Khan, Van Aelst and Zamar (2007b)
showed that the LARS algorithm can be expressed in terms of the pairwise
sample correlations between covariates and the response variable, and proposed to apply this algorithm using robust correlation estimates. This is a
plug-in proposal in the sense that it takes a method derived using least
squares or L2 estimators and replaces the required point estimates by robust
counterparts.
In this paper we derive an algorithm based on LARS, but using an S-regression estimator (Rousseeuw and Yohai (1984)). Section 2 contains a brief
description of the LARS algorithm, while Section 3 describes our proposal.
Simulation results are discussed in Section 4 and concluding remarks can be
found in Section 5.

2 Review of Least Angle Regression

Let $(y_1, x_1), \ldots, (y_n, x_n)$ be $n$ independent observations, where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^p$, $i = 1, \ldots, n$. We are interested in fitting a linear model of the form
$$ y_j = \alpha + \beta' x_j + \epsilon_j, \qquad j = 1, \ldots, n, $$
where $\beta \in \mathbb{R}^p$ and the errors $\epsilon_j$ are assumed to be independent with zero mean and constant variance $\sigma^2$. In what follows, we will assume, without loss of generality, that the variables have been centered and standardized to satisfy
$$ \sum_{i=1}^n y_i = 0, \qquad \sum_{i=1}^n x_{i,j} = 0, \qquad \sum_{i=1}^n x_{i,j}^2 = 1 \quad \text{for } 1 \le j \le p, $$
so that the linear model above does not contain the intercept term.
The Least Angle Regression algorithm (LARS) is a generalization of the Forward Stagewise procedure. The latter is an iterative technique that starts with the predictor vector $\hat\mu = 0 \in \mathbb{R}^n$, and at each step sets
$$ \hat\mu \leftarrow \hat\mu + \epsilon \, \mathrm{sign}(c_j) \, x_{(j)}, $$
where $j = \arg\max_{1 \le i \le p} |\mathrm{cor}(y - \hat\mu, x_{(i)})|$, $x_{(i)} \in \mathbb{R}^n$ denotes the $i$-th column of the design matrix, $c_j = \mathrm{cor}(y - \hat\mu, x_{(j)})$, and $\epsilon > 0$ is a small constant. Typically the parameter $\epsilon$ controls the speed and greediness of the method: small values produce better results at a large computational cost, while large values result in a faster algorithm that may relegate an important covariate if it happens to be correlated with one that has entered the model earlier.
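To make the Forward Stagewise iteration concrete, here is a minimal Python sketch for a centered design with unit-norm columns; the step size eps and the fixed number of iterations are illustrative choices, not values prescribed in the text.

import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Forward Stagewise on a centered design with unit-norm columns.

    At each step the fit mu moves a small amount eps in the direction of the
    covariate most correlated (in absolute value) with the current residual.
    """
    n, p = X.shape
    mu = np.zeros(n)        # current predictor vector
    beta = np.zeros(p)      # accumulated coefficients
    for _ in range(n_steps):
        c = X.T @ (y - mu)  # current correlations (columns have unit norm)
        j = int(np.argmax(np.abs(c)))
        step = eps * np.sign(c[j])
        mu += step * X[:, j]
        beta[j] += step
    return beta

With eps small the procedure moves very slowly and is closely related to the LARS path described next.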
The LARS iterations can be described as follows. Start with the predictor $\hat\mu_A = 0$. Let $\hat\mu_A$ be the current predictor and let
$$ c = X'(y - \hat\mu_A), $$
where $X \in \mathbb{R}^{n \times p}$ denotes the design matrix. In other words, $c$ is the vector of current correlations $c_j$, $j = 1, \ldots, p$. Let $A$ denote the active set, which corresponds to those covariates with largest absolute correlations: $C = \max_j \{|c_j|\}$ and $A = \{j : |c_j| = C\}$. Assume, without loss of generality, that $A = \{1, \ldots, m\}$. Let $s_j = \mathrm{sign}(c_j)$ for $j \in A$, and let $X_A \in \mathbb{R}^{n \times m}$ be the matrix formed by the corresponding signed columns of the design matrix $X$, $s_j x_{(j)}$. Note that the vector $u_A = v_A / \|v_A\|$, where
$$ v_A = X_A (X_A' X_A)^{-1} 1_A, $$
satisfies
$$ X_A' u_A = A_A 1_A, \qquad (1) $$
where $A_A = 1/\|v_A\| \in \mathbb{R}$. In other words, the unit vector $u_A$ makes equal angles with the columns of $X_A$. LARS updates $\hat\mu_A$ to
$$ \hat\mu_A \leftarrow \hat\mu_A + \gamma\, u_A, $$
where $\gamma$ is taken to be the smallest positive value such that a new covariate joins the active set $A$ of explanatory variables with largest absolute correlation. More specifically, note that, if for each $\gamma$ we let $\mu(\gamma) = \hat\mu_A + \gamma\, u_A$, then for each $j = 1, \ldots, p$ we have
$$ c_j(\gamma) = \mathrm{cor}\big(y - \mu(\gamma), x_{(j)}\big) = x_{(j)}'\big(y - \mu(\gamma)\big) = c_j - \gamma\, a_j, $$
where $a_j = x_{(j)}' u_A$. For $j \in A$, equation (1) implies that
$$ |c_j(\gamma)| = C - \gamma\, A_A, $$
so all maximal current correlations decrease at a constant rate along this direction. We then determine the smallest positive value of $\gamma$ that makes the correlation between the current active covariates and the residuals equal to that of another covariate $x_{(k)}$ not in the active set $A$. This variable enters the model, the active set becomes
$$ A \leftarrow A \cup \{k\}, $$
and the correlations are updated to $C \leftarrow C - \gamma\, A_A$. We refer the interested reader to Efron et al. (2004) for more details.
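Written as linear algebra, one LARS step amounts to the sketch below: compute the current correlations, build the equiangular direction $u_A$, and take the smallest positive step $\gamma$ at which an inactive covariate ties the maximal absolute correlation. The closed-form expression for $\gamma$ follows Efron et al. (2004); the function name and the tolerance are illustrative.

import numpy as np

def lars_step(X, y, mu, active, signs, tol=1e-12):
    """One LARS step on a centered design with unit-norm columns.

    active: indices currently in the model; signs: their signs s_j.
    Returns the updated fit, the step length gamma and the entering index.
    """
    c = X.T @ (y - mu)                        # current correlations
    C = np.max(np.abs(c))
    XA = X[:, active] * signs                 # signed active columns
    g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
    AA = 1.0 / np.sqrt(g.sum())               # A_A = 1 / ||v_A||
    uA = AA * (XA @ g)                        # unit equiangular vector
    a = X.T @ uA
    gamma, new_j = np.inf, None
    for j in range(X.shape[1]):
        if j in active:
            continue
        # candidate steps at which |c_j(gamma)| catches up with C - gamma * A_A
        for cand in ((C - c[j]) / (AA - a[j]), (C + c[j]) / (AA + a[j])):
            if tol < cand < gamma:
                gamma, new_j = cand, j
    return mu + gamma * uA, gamma, new_j

Starting from mu = 0 with the single most correlated covariate active, and appending new_j (with its sign) to the active set after each call, reproduces the sequencing of covariates described above.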


3 LARS based on S-estimators

S-regression estimators (Rousseeuw and Yohai (1984)) are defined as the vector of coefficients that produce the smallest residuals in the sense of minimizing a robust M-scale estimator. Formally we have
$$ \hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \hat\sigma(\beta), $$
where $\hat\sigma(\beta)$ satisfies
$$ \frac{1}{n} \sum_{i=1}^n \rho\!\left( \frac{r_i(\beta)}{\hat\sigma(\beta)} \right) = b, $$
$\rho : \mathbb{R} \to \mathbb{R}_+$ is a symmetric, bounded, non-decreasing and continuous function, and $b \in (0, 1)$ is a fixed constant. The choice $b = E_{F_0}[\rho(\epsilon)]$ ensures that the resulting estimator is consistent when the errors have distribution function $F_0$.
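For illustration, the M-scale in the display above can be computed for a given residual vector by a standard fixed-point iteration. The sketch below uses Tukey's bisquare $\rho$ with tuning constant c = 1.547 and b = 0.5 (a common choice giving 50% breakdown and consistency at the normal model); these constants are conventional defaults, not values fixed by the paper.

import numpy as np

def rho_bisquare(u, c=1.547):
    """Tukey's bisquare rho, scaled so that rho tends to 1 for large |u|."""
    v = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, b=0.5, c=1.547, n_iter=100, tol=1e-10):
    """Solve (1/n) sum_i rho(r_i / s) = b for the scale s by fixed-point iteration."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745        # robust starting value (normalized MAD)
    if s == 0:
        return 0.0
    for _ in range(n_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(r / s, c)) / b)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s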
For a given active set $A$ of $k$ covariates let $\hat\beta_A$, $\hat\beta_{0A}$, $\hat\sigma_A$ be the S-estimators of regressing the current residuals on the $k$ active variables with indices in $A$. Consider the parameter vector $\theta = (\beta, \beta_0, \sigma)$ that satisfies
$$ \frac{1}{n - k - 1} \sum_{i=1}^n \rho\!\left( \frac{r_i - x_{i,k}'(\beta - \hat\beta_A) - \beta_0}{\sigma} \right) = b. $$
A robust measure of covariance between the residuals associated with $\theta$ and the $j$-th covariate is given by
$$ \mathrm{cov}_j(\theta) = \sum_{i=1}^n \rho'\!\left( \frac{r_i - x_{i,k}'(\beta - \hat\beta_A) - \beta_0}{\sigma} \right) x_{ij}, $$
and the corresponding correlation is
$$ \mathrm{cor}_j(\theta) = \mathrm{cov}_j(\theta) \Bigg/ \sum_{i=1}^n \rho'\!\left( \frac{r_i - x_{i,k}'(\beta - \hat\beta_A) - \beta_0}{\sigma} \right)^{\!2}. $$
Our algorithm can be described as follows:

1. Set $k = 0$ and compute the S-estimators $\hat\theta_0 = (0, \hat\beta_{00}, \hat\sigma_0)$ by regressing $y$ against the intercept. The first variable to enter is that associated with the largest robust correlation:
$$ \hat\gamma_1 = \max_{1 \le j \le p} |\mathrm{cor}_j(\hat\theta_0)| . $$
Without loss of generality, assume that it corresponds to the first covariate.


2. Set $k = k + 1$ and compute the current residuals
$$ r_{i,k} = r_{i,k-1} - x_{i,k-1}'(\beta_{k-1} - \hat\beta_{k-1}) - \beta_{0,k-1} . $$
3. Let $\hat\beta_k$, $\hat\beta_{0k}$, $\hat\sigma_k$ be the S-estimators of regressing $r_k$ against $x_k$.
4. For each $j$ in the inactive set find $\theta_j = (\beta_j, \beta_{0j}, \sigma_j)$ such that
$\gamma_j = |\mathrm{cor}_j(\theta_j)| = |\mathrm{cor}_m(\theta_j)|$ for all $1 \le m \le k$,
$\sum_{i=1}^n \rho'\big( (r_{i,k} - x_{i,k}'(\beta_j - \hat\beta_k) - \beta_{0j}) / \sigma_j \big) = 0$, and
$\sum_{i=1}^n \rho\big( (r_{i,k} - x_{i,k}'(\beta_j - \hat\beta_k) - \beta_{0j}) / \sigma_j \big) = b\,(n - k - 1)$.
5. Let $\hat\gamma_{k+1} = \max_{j > k} \gamma_j$; the associated index, say $v$, corresponds to the next variable to enter the active set. Let $\hat\theta_k = \theta_v$.
6. Repeat until $k = d$.
Given an active set A, the above algorithm finds the step length such that
the robust correlation between the current residuals and the active covariates
matches that of an explanatory variable yet to enter the model. The variable
that achieves this with the smallest step is included in the model, and the
procedure is then iterated. It is in this sense that our proposal is based on
LARS.

4 Simulation results

To study the performance of our proposal we conducted a simulation study using a similar design to that reported by Khan et al. (2007b). We generated the response variable $y$ according to the following model:
$$ y = L_1 + L_2 + \cdots + L_k + \sigma\,\epsilon, $$
where $L_j$, $j = 1, \ldots, k$ and $\epsilon$ are independent random variables with a standard normal distribution. The value of $\sigma$ is chosen to obtain a signal to noise ratio of 3. We then generate $d$ candidate covariates as follows:
$$ X_i = L_i + \lambda\,\epsilon_i, \qquad i = 1, \ldots, k, $$
$$ X_{k+1} = L_1 + \delta\,\epsilon_{k+1}, \qquad X_{k+2} = L_1 + \delta\,\epsilon_{k+2}, $$
$$ X_{k+3} = L_2 + \delta\,\epsilon_{k+3}, \qquad X_{k+4} = L_2 + \delta\,\epsilon_{k+4}, $$
$$ \vdots $$
$$ X_{3k-1} = L_k + \delta\,\epsilon_{3k-1}, \qquad X_{3k} = L_k + \delta\,\epsilon_{3k}, $$
and $X_j = \epsilon_j$ for $j = 3k+1, \ldots, d$. The choices $\lambda = 5$ and $\delta = 0.3$ result in $\mathrm{cor}(X_1, X_{k+1}) = \mathrm{cor}(X_1, X_{k+2}) = \mathrm{cor}(X_2, X_{k+3}) = \mathrm{cor}(X_2, X_{k+4}) = \cdots = \mathrm{cor}(X_k, X_{3k}) = 0.5$. We consider the following contamination cases:


a. $\epsilon \sim N(0, 1)$, no contamination;
b. $\epsilon \sim 0.90\, N(0, 1) + 0.10\, N(0, 1)/U(0, 1)$, 10% of symmetric outliers with the Slash distribution;
c. $\epsilon \sim 0.90\, N(0, 1) + 0.10\, N(20, 1)$, 10% of asymmetric Normal outliers;
d. 10% of high-leverage asymmetric Normal outliers (the corresponding covariates were sampled from a $N(50, 1)$ distribution).
For each case we generated 500 independent samples with n = 150, k = 6
and d = 50. In each of these datasets we sorted the 50 covariates in the
order in which they were listed to enter the model. We used the usual LARS
algorithm as implemented in the R package lars, our proposal (LARSROB)
and the robust plug-in algorithm of Khan et al. (2007b) (RLARS).
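For reference, here is a sketch of how one clean dataset (case (a)) from this design could be generated with n = 150, k = 6 and d = 50. The coefficient names lam and delta, and the way sigma is derived from a signal-to-noise ratio of 3, are assumptions made for this illustration only.

import numpy as np

def simulate_case_a(n=150, k=6, d=50, lam=5.0, delta=0.3, seed=0):
    """One uncontaminated sample from the latent-variable design above."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((n, k))                # latent variables L_1, ..., L_k
    sigma = np.sqrt(k) / 3.0                       # sd(L_1+...+L_k)/sigma = 3 (one reading of SNR = 3)
    y = L.sum(axis=1) + sigma * rng.standard_normal(n)

    X = np.empty((n, d))
    X[:, :k] = L + lam * rng.standard_normal((n, k))        # X_1, ..., X_k
    for j in range(k):                                       # X_{k+1}, ..., X_{3k} in pairs
        X[:, k + 2 * j] = L[:, j] + delta * rng.standard_normal(n)
        X[:, k + 2 * j + 1] = L[:, j] + delta * rng.standard_normal(n)
    X[:, 3 * k:] = rng.standard_normal((n, d - 3 * k))       # pure noise covariates
    return X, y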
For case (a) where no outliers were present in the data, all methods performed very close to each other. The results of our simulation for cases (b),
(c) and (d) above are summarized in Figures 1 to 3. For each sequence of
covariates consider the number $t_m$ of target explanatory variables included in the first $m$ covariates entering the model, $m = 1, \ldots, d$. Ideally we would like a sequence that satisfies $t_m = k$ for $m \ge k$. In Figures 1 to 3 we plot the average $t_m$ over the 500 samples, as a function of the model size $m$,
for each of the three methods. We see that for symmetric low-leverage outliers LARSROB and RLARS are very close to each other, with both giving
better results than the classical LARS. For asymmetric outliers LARSROB
performed marginally better than RLARS, while for high-leverage outliers
the performance of LARSROB deteriorates noticeably.

5 Conclusion

We have proposed a new robust algorithm to select covariates for a linear model. Our method is based on the LARS procedure of Efron et al. (2004). Rather than replacing classical correlation estimates by robust ones and applying the same LARS algorithm, we derived our method directly following the intuition behind LARS but starting from robust S-regression estimates. Simulation studies suggest that our method is robust to the presence of low-leverage outliers in the data, and that in this case it compares well with
the plug-in approach of Khan et al. (2007b). A possible way to make our
proposal more resistant to high-leverage outliers is to downweight extreme
values of the covariates in the robust correlation measure we utilize. Further
research along these lines is ongoing.
An important feature of our approach is that it naturally extends the
relationship between the LARS algorithm and the sequence of LASSO solutions (Tibshirani (1996)). Hence, with our approach we can obtain a resistant
algorithm to calculate the LASSO based on S-estimators. Details of the algorithm discussed here, and its connection with a robust LASSO method will
be published separately.

[Figures 1-3: line plots of the average number of correct covariates (vertical axis, 0-5) against the model size (horizontal axis, 10-50).]

Fig. 1. Case (b) - Average number of correctly selected covariates as a function of the model size. The solid line corresponds to LARS, the dashed line to our proposal (LARSROB) and the dotted line to the RLARS algorithm of Khan et al. (2007b).

Fig. 2. Case (c) - Average number of correctly selected covariates as a function of the model size. The solid line corresponds to LARS, the dashed line to our proposal (LARSROB) and the dotted line to the RLARS algorithm of Khan et al. (2007b).

Fig. 3. Case (d) - Average number of correctly selected covariates as a function of the model size. The solid line corresponds to LARS, the dashed line to our proposal (LARSROB) and the dotted line to the RLARS algorithm of Khan et al. (2007b).

References
AGOSTINELLI, C. (2002a): Robust model selection in regression via weighted
likelihood methodology. Statistics and Probability Letters, 56 289-300.
AGOSTINELLI, C. (2002b): Robust stepwise regression. Journal of Applied Statistics, 29(6) 825-840.
AGOSTINELLI, C. and MARKATOU, M. (2005): Robust model selection by cross-validation via weighted likelihood. Unpublished manuscript.
AKAIKE, H. (1970): Statistical predictor identification. Annals of the Institute of
Statistical Mathematics, 22 203-217.
EFRON, B., HASTIE, T., JOHNSTONE, I. and TIBSHIRANI, R. (2004): Least
angle regression. The Annals of Statistics 32(2), 407-499.
HAMPEL, F.R. (1983): Some aspects of model choice in robust statistics. In: Proceedings of the 44th Session of the ISI, volume 2, 767-771. Madrid.
HASTIE, T., TIBSHIRANI, R. and FRIEDMAN, J. (2001): The Elements of Statistical Learning. Springer-Verlag, New York.
KHAN, J.A., VAN AELST, S., and ZAMAR, R.H. (2007a): Building a robust
linear model with forward selection and stepwise procedures. Computational
Statistics and Data Analysis 52, 239-248.
KHAN, J.A., VAN AELST, S., and ZAMAR, R.H. (2007b): Robust Linear Model
Selection Based on Least Angle Regression. Journal of the American Statistical
Association 102, 1289-1299.
MALLOWS, C.L. (1973): Some comments on Cp . Technometrics 15, 661-675.
MARONNA, R.A., MARTIN, D.R. and YOHAI, V.J. (2006): Robust Statistics:
Theory and Methods. Wiley, New York.


McCANN, L. and WELSCH, R.E. (2007): Robust variable selection using least
angle regression and elemental set sampling. Computational Statistics and
Data Analysis 52, 249-257.
MILLER, A.J. (2002): Subset selection in regression. Chapman-Hall, New York.
MORGENTHALER, S., WELSCH, R.E. and ZENIDE, A. (2003): Algorithms for
robust model selection in linear regression. In: M. Hubert, G. Pison, A. Struyf
and S. Van Aelst (Eds.): Theory and Applications of Recent Robust Methods.
Birkhäuser-Verlag, Basel, 195-206.

MÜLLER, S. and WELSH, A.H. (2005): Outlier robust model selection in linear
regression. Journal of the American Statistical Association 100, 1297-1310.

QIAN, G. and KÜNSCH, H.R. (1998): On model selection via stochastic complexity
in robust linear regression. Journal of Statistical Planning and Inference 75,
91-116.
RONCHETTI, E. (1985): Robust model selection in regression. Statistics and Probability Letters 3, 21-23.
RONCHETTI, E. (1997): Robustness aspects of model choice. Statistica Sinica 7,
327-338.
RONCHETTI, E. and STAUDTE, R.G. (1994): A robust version of Mallows Cp .
Journal of the American Statistical Association 89, 550-559.
RONCHETTI, E., FIELD, C. and BLANCHARD, W. (1997): Robust linear model
selection by cross-validation. Journal of the American Statistical Association
92, 1017-1023.
ROUSSEEUW, P.J. and YOHAI, V.J. (1984): Robust regression by means of S-estimators. In: J. Franke, W. Härdle and D. Martin (Eds.): Robust and Nonlinear Time Series, Lecture Notes in Statistics 26. Springer-Verlag, Berlin, 256-272.
SALIBIAN-BARRERA, M. and VAN AELST, S. (2008): Robust model selection
using fast and robust bootstrap. Computational Statistics and Data Analysis
52 5121-5135.
SALIBIAN-BARRERA, M. and ZAMAR, R.H. (2002): Bootstrapping robust estimates of regression. The Annals of Statistics 30, 556-582.
SCHWARZ, G. (1978): Estimating the dimension of a model. The Annals of
Statistics 6, 461-464.
SOMMER, S. and STAUDTE, R.G. (1995): Robust variable selection in regression
in the presence of outliers and leverage points. Australian Journal of Statistics
37, 323-336.
TIBSHIRANI, R. (1996): Regression shrinkage and selection via the lasso. Journal
of the Royal Statistical Society, Series B: Methodological 58, 267-288.
WEISBERG, S. (1985): Applied linear regression. Wiley, New York.
