Abstract

The question of variable selection in a multiple linear regression model is a major open research topic in statistics. The subset selection problem in multiple linear regression deals with the selection of a minimal subset of input variables without loss of explanatory power. In this paper, we adapt the genetic and simulated annealing algorithms for variable selection in multiple linear regression. The performance of this hybrid heuristic method is compared to those obtained by forward selection, backward elimination and classical genetic algorithm search. A comparative analysis on the literature data sets and simulation data shows that our hybrid heuristic method may suggest an efficient alternative to traditional subset selection methods for the variable selection problem in multiple linear regression models.

Keywords: Regression analysis; Subset selection problem; Genetic algorithm; Simulated annealing algorithm; Hybrid heuristic optimization

© 2013 Elsevier Inc. All rights reserved.
1. Introduction
When dealing with regression models, it is often difficult to decide how many variables to measure in order to build
an adequate model. Often too many variables are measured. This usually happens when the modeller is unsure as to
which are important or because the experimental procedure provides measurement of a large number of variables auto-
matically [6].
It is very tempting for the modeller to generate a model using all available variables. This, however, creates several data
analysis problems [5]: (1) some of the variables may be completely irrelevant to the objectives of the model, and cloud any
meaningful relationships that exist between other variables; (2) in order to obtain reliable parameter estimates the number
of observations made on each variable should be significantly greater than the number of variables; (3) variables may be
correlated, in which case replicated information is redundant; (4) the signal to noise ratio of certain variables may be so
low that their inclusion in the model may be questioned (and will certainly lead to a poorer model), especially if other ‘‘clea-
ner’’ correlated variables are available; (5) when the model parameters are optimised using iterative methods, a greater
number of parameters can result in a more complex error surface to be optimised. This complexity may affect the overall convergence time of the model.
For these reasons it is advantageous to select the ‘‘best’’ variables prior to the modelling process [30] (best is generally
used to refer to the optimum set of variables with respect to statistical significance or a statistical criterion). When a data set
including many explanatory variables and a response variable is given, the choice of best model which predicts the response
variable is known as ‘‘variable selection’’ or ‘‘the selection of the best subset model’’.
There are numerous situations where researchers are faced with a large pool of candidate variables for possible inclusion
in a multivariate statistical analysis. In most cases, the inclusion of all variables in the statistical analysis is, at best, unnec-
essary and, at worst, a serious impediment to the correct interpretation of the data. Not surprisingly, the general problem of
variable selection in multiple regression analysis has been acknowledged for more than 40 years [4,14,11,30,12,13,7,15,
20,1,33,16,28,38,21,31,32].
The general approach to variable selection is to minimise a cost function, where the cost function calculates some metric
to decide which subset of the available variables produces the ‘‘best’’ model. An optimization algorithm is then needed to select the subsets to be tested by the cost function.
The simplest method of selection would be to examine all possible combinations of the variables exhaustively. If there are m initial variables then this would result in 2^m possible subsets [24]. If m is large then this is computationally expensive
(and in most situations practically impossible). Disqualifying this search method means that there is no guaranteed way
of finding the optimal variable subset for a given model.
The most popular statistical strategies are forward selection procedure (FS) and the backward elimination procedure (BE)
[30].
The forward selection procedure starts with an equation containing no predictor variables, only a constant term. The first
variable included in the equation is the one which has the highest simple correlation with the response variable Y. If the
regression coefficient of this variable is significantly different from zero it is retained in the equation, and a search for a sec-
ond variable is made. The variable that enters the equation as the second variable is one which has the highest correlation
with Y, after Y has been adjusted for the effect of the first variable. The significance of the regression coefficient of the second
variable is then tested. If the regression coefficient is significant, a search for a third variable is made in the same way. The
procedure is terminated when the last variable entering the equation has an insignificant regression coefficient or all the
variables are included in the equation. The significance of the regression coefficient of the last variable introduced in the
equation is judged by the standard t-test computed from the latest equation [8].
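The forward selection loop just described can be summarized in code. The following is a minimal sketch, not the paper's implementation: it assumes a NumPy design matrix X (n rows, m candidate variables), a response vector y, and uses statsmodels (an assumed dependency) to obtain the t-test p-value of the most recently entered coefficient. Selecting the candidate with the smallest p-value is equivalent to selecting the one with the largest (partial) correlation with Y given the variables already in the model.

```python
import numpy as np
import statsmodels.api as sm

def forward_selection(X, y, alpha=0.05):
    """Greedy forward selection: repeatedly add the candidate whose coefficient is
    most significant in the current model; stop when no addition is significant."""
    n, m = X.shape
    selected, remaining = [], list(range(m))
    while remaining:
        best_p, best_j = None, None
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
            p_val = fit.pvalues[-1]              # p-value of the newly entered variable
            if best_p is None or p_val < best_p:
                best_p, best_j = p_val, j
        if best_p > alpha:                       # last entered coefficient is insignificant: stop
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```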
The backward elimination procedure starts with the full equation and successively drops one variable at a time. The vari-
ables are dropped on the basis of their contribution to the reduction of error sum of squares. The first variable deleted is the
one with the smallest contribution to the reduction of error sum of squares. This is equivalent to deleting the variable which
has the smallest t-test in the equation. If all the t-tests are significant, the full set of variables is retained in the equation.
Assuming that there are one or more variables that have insignificant t-tests, the procedure operates by dropping the var-
iable with the smallest insignificant t-test.
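Analogously, backward elimination can be sketched as below (same assumptions as the forward selection sketch above): the variable with the smallest absolute t-statistic is dropped while its test remains insignificant.

```python
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, alpha=0.05):
    """Start from the full model; while the least significant variable has an
    insignificant t-test, drop it and refit."""
    selected = list(range(X.shape[1]))
    while selected:
        fit = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
        t_abs = np.abs(fit.tvalues[1:])          # skip the intercept
        worst = int(np.argmin(t_abs))
        if fit.pvalues[1:][worst] <= alpha:      # every remaining variable is significant
            break
        selected.pop(worst)
    return selected
```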
Although FS and BE are widely used, there are several criticisms to them. FS and BE imply an order of importance to the
variables. This can be misleading since, for example, it is not uncommon to find that the first variable included in FS is quite
unnecessary in the presence of other variables. Similarly, it is easily demonstrated that the first variable deleted in BE can be
the first variable included in FS [30].
The purpose of this paper is to introduce an alternative variable selection method for use in multiple linear regression analysis. This method is based on a hybridization of the genetic algorithm (GA) and simulated annealing (SA) methods. GA, developed by Holland [19], is based on the Darwinian theory of biological evolution and is an important stochastic search algorithm for solving optimization problems. SA, which originates in the work of Metropolis et al. [29], is another important stochastic adaptive method based on the physical process of annealing. The hybrid procedure combines the powerful features of both methods.
The rest of this paper is organized as follows. Section 2 presents the problem of selecting regression variables and model
selection criteria. The genetic algorithm, simulated annealing algorithm and hybrid of these heuristic methods are intro-
duced in Section 3. Section 4 demonstrates the approaches with literature and simulated data sets. Conclusions are given
in Section 5.
2. Selecting regression variables and model selection criteria

In building a multiple regression model, a crucial problem is the selection of the regressors to be included. If too few regressors are selected, the parameter estimates will not be consistent, and if too many are selected, their variance will increase.
Let $\{X_1, \ldots, X_m\}$ be the set of m independent variables (each with n observations) and Y the dependent or response variable in a multiple regression model. Suppose that the model

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon,$$

explains the relationship between the dependent and independent variables, where $X \subseteq \{X_1, \ldots, X_m\}$ is the set of the $p \leq m$ independent variables chosen as regressors and $\beta_1, \ldots, \beta_p$ is the parameter set. There are 2^m possible submodels. When m is high, the computational requirements of an exhaustive search can be prohibitive because the number of models becomes infeasible. In order to resolve this intractable problem, several heuristic methods that restrict attention to a smaller number of potential subsets of regressors are usually employed by practitioners. Such heuristic procedures, rather than searching through all possible models, seek a good path through them. Some of the most popular are the stepwise procedures, such as forward selection or backward elimination, which sequentially include or exclude variables based on t-statistic considerations [30].
There are several criteria for deciding on an appropriate subset. Some of the more common ones are:

(1) The residual mean square (RMS),

$$\mathrm{RMS}_p = \frac{\mathrm{RSS}_p}{n - p}. \qquad (2.1)$$

(2) The squared multiple correlation coefficient (R^2),

$$R^2_p = 1 - \frac{\mathrm{RSS}_p}{\mathrm{TSS}}. \qquad (2.2)$$

(3) The adjusted R^2,

$$\bar{R}^2_p = 1 - (n - 1)\,\frac{1 - R^2_p}{n - p}. \qquad (2.3)$$

(4) Mallows' $C_p$,

$$C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} + 2p - n. \qquad (2.4)$$

(5) The Akaike information criterion (AIC),

$$\mathrm{AIC}(p) = n \log S^2_p + 2p, \qquad (2.5)$$

where p is the number of input or independent variables in the model, $\mathrm{RSS}_p$ is the residual sum of squares for the p-variable model, TSS is the total sum of squares, $S^2_p$ is the variance of the residuals, $\hat{\sigma}^2$ is an estimate of the error variance (usually taken from the full model), and n is the number of observations.
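For concreteness, the criteria (2.1)–(2.5) can be evaluated for a candidate subset as in the sketch below. This is an illustrative NumPy implementation, not the paper's code; it takes the error variance estimate from the full model and uses $S^2_p = \mathrm{RSS}_p / n$, two conventions the text does not fix.

```python
import numpy as np

def subset_criteria(X, y, subset, sigma2_full):
    """Selection criteria (2.1)-(2.5) for the p-variable model indexed by `subset`."""
    n, p = len(y), len(subset)
    Z = np.column_stack([np.ones(n), X[:, subset]])        # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(((y - Z @ beta) ** 2).sum())               # RSS_p
    tss = float(((y - y.mean()) ** 2).sum())               # TSS
    r2 = 1.0 - rss / tss                                    # (2.2)
    return {
        "RMS":   rss / (n - p),                             # (2.1)
        "R2":    r2,
        "adjR2": 1.0 - (n - 1) * (1.0 - r2) / (n - p),      # (2.3)
        "Cp":    rss / sigma2_full + 2 * p - n,             # (2.4)
        "AIC":   n * np.log(rss / n) + 2 * p,               # (2.5), with S2_p = RSS_p / n
    }
```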
3. Genetic, simulated annealing and hybrid heuristic algorithms

The new approach for selecting regressors proposed in this paper is based on a hybrid of the genetic and simulated annealing heuristic optimization procedures, called hybridGSA. First, the genetic and simulated annealing algorithms are summarized in the following subsections.
3.1. Genetic algorithms

Genetic algorithms (GAs) are adaptive methods which may be used to solve search and optimization problems [17,19]. GAs perform search in complex, large and multimodal landscapes, and provide near-optimal solutions for the objective or fitness function of an optimization problem.
In GAs, the parameters of the search space are generally encoded in the form of binary strings (called chromosomes),
where each binary digit within the chromosome represents a gene. A collection of such strings is called a population. Ini-
tially, a random population is created, which represents different points in the search space. An objective or fitness function
is associated with each string that represents the degree of goodness of the string. Based on the principle of survival of the
fittest, a few of the strings are selected and each is assigned a number of copies that go into the mating pool. Biologically
inspired operators like crossover and mutation are applied on these strings to yield a new generation of strings. The process
of selection, crossover and mutation continues for a fixed number of generations or until a termination condition is satisfied.
Although many variants of the original operators exist, their original purpose remains intact for most implementations,
and can be described as follows:
1. Selection. Selection is the process that mimics the ‘‘survival of the fittest’’ principle in the biological theory of evolution.
Selection is implemented in a GA as a policy for determining which chromosomes in the population will survive and be
carried over into the next generation.
2. Crossover. The crossover process is implemented in a GA by exchanging chromosome segments between two or more par-
ent chromosomes to form a child chromosome. The crossover operator typically serves a dual purpose. First, to effectively
reduce the search space to regions of greater promise. Second, to provide a mechanism for allowing offspring to inherit
the properties of their parents. The crossover process is also commonly referred to as ‘‘recombination’’.
3. Mutation. Mutation is the process that mimics the unpredictable and unexpected developments that occur in biological
reproduction. In a GA, mutation is a random perturbation to one or more genes that occurs infrequently during the evo-
lutionary process. The purpose of the mutation operator is to provide a mechanism to escape local optima. Local optima
are manifested by stalled evolutionary progress when one or more dimensions in the search space have lost genetic diver-
sity. This anomaly is often referred to as premature convergence or misconvergence and is a common demise for many
GAs.
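Putting these operators together for subset selection, a chromosome is a 0/1 mask over the m candidate variables and the fitness can be, for example, the negative AIC of the encoded subset. The sketch below is illustrative only: binary tournament selection, single-point crossover and bit-flip mutation are assumed operator choices, not necessarily those used in the paper. In practice the fitness function should assign a very poor value to the all-zero mask (the empty subset).

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_search(fitness, m, pop_size=100, p_cross=0.8, p_mut=0.1, generations=1000):
    """Binary-encoded GA: each chromosome is a 0/1 mask over the m candidate variables;
    `fitness` is maximized (e.g. the negative AIC of the encoded subset)."""
    pop = rng.integers(0, 2, size=(pop_size, m))
    fit = np.array([fitness(c) for c in pop])
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # binary tournament selection of two parents
            i, j = rng.integers(0, pop_size, 2), rng.integers(0, pop_size, 2)
            p1 = pop[i[np.argmax(fit[i])]].copy()
            p2 = pop[j[np.argmax(fit[j])]].copy()
            if rng.random() < p_cross:                     # single-point crossover
                cut = rng.integers(1, m)
                p1[cut:], p2[cut:] = p2[cut:].copy(), p1[cut:].copy()
            for child in (p1, p2):                         # bit-flip mutation
                flip = rng.random(m) < p_mut
                child[flip] = 1 - child[flip]
                children.append(child)
        pop = np.array(children[:pop_size])
        fit = np.array([fitness(c) for c in pop])
    return pop[np.argmax(fit)]                             # best subset mask found
```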
3.2. Simulated annealing

As its name implies, simulated annealing (SA) exploits an analogy between the way in which a metal cools and freezes into a minimum-energy crystalline structure (the annealing process) and the search for a minimum in a more general system. There is a close analogy between this approach and the thermodynamic process of annealing (the cooling of a solid), hence the name. It was Metropolis and coworkers [29] who first proposed this idea, and 30 years later Kirkpatrick et al. [22,23] observed that this approach could be used to search for feasible solutions of quite general optimization problems. The main idea is that the SA strategy may help prevent the search from being trapped at poor solutions associated with local optima of the fitness function.
In essence, it is a method that uses the objective function to create a nonhomogeneous Markov chain that asymptotically
converges to the minimum (maximum) of the objective function. The concept is originally based on the manner in which
liquids freeze or metals recrystallise in the process of annealing. In an annealing process a melt, initially at high temperature
and disordered, is slowly cooled so that the system at any time is approximately in thermodynamic equilibrium. As cooling
proceeds, the system becomes more ordered and approaches a ‘‘frozen’’ ground state. The analogy to an optimisation prob-
lem is as follows: The current state of the thermodynamic system is analogous to the current solution to the optimisation
problem, the energy equation for the thermodynamic system is analogous to the objective function, and the ground state
is analogous to the global optimum.
SA proceeds through a sequence of decreasing temperatures, with a number of iterations performed at each temperature. First, the initial temperature is selected and an initial solution is randomly chosen. The value of the cost function for the current solution (i.e., the initial solution in this case) is then calculated; the goal is to minimize the cost function. Afterwards, a new solution is generated from the neighborhood of the current solution. The cost function value of the new solution is calculated and compared to the current cost function value. If the new cost function value is less than the current value, the new solution is accepted. Otherwise, it is accepted only when the Metropolis criterion [29], which is based on the Boltzmann probability, is met. According to the Metropolis criterion, if the difference between the cost function values of the newly generated and the current solutions ($\Delta E$) is equal to or larger than zero, a random number d in $[0, 1]$ is generated from a uniform distribution. If
$$\exp(-\Delta E / T) \geq d, \qquad (3.1)$$

is met, the newly generated solution is accepted as the current solution. The exponential function given by (3.1) is occasionally called a Boltzmann function, and the operator is therefore also called a Boltzmann-type operator.
The number of new solutions generated at each temperature is the same as the iteration number at the temperature
which is constrained by the termination condition. The termination condition could be as simple as a certain number of iter-
ations. After all the iterations at a temperature are completed, the temperature is lowered according to the temperature updating rule. At the updated (and lowered) temperature, all required iterations have to be completed before moving to the next temperature. This process repeats until the halting criterion is met. The result of simulated annealing (SA) depends on the number of iterations at each temperature and on the speed of reducing the temperature. The temperature
updating rule can be chosen as
$$T_1 = \alpha^N T_0, \qquad (3.2)$$

where N is the number of iterations (generations), $T_0$ is the initial temperature, $T_1$ ($T_0 > T_1$) is the final temperature, and $\alpha$ is the cooling ratio. The cooling ratio controls the speed of cooling: the smaller the cooling ratio, the faster the temperature cools down.
The implementation of the SA algorithm is remarkably easy. The following elements must be provided: a representation of candidate solutions, a neighborhood (move) structure, a cost (objective) function, a cooling schedule, and an acceptance rule.
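A minimal sketch of SA for subset selection along these lines is given below (again illustrative, not the paper's code): solutions are 0/1 masks, the neighborhood move flips one variable in or out, acceptance follows the Metropolis criterion (3.1), and the temperature is cooled geometrically as in (3.2).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(cost, m, T0=100.0, alpha=0.9, iters_per_temp=50, n_temps=100):
    """SA over 0/1 variable masks; `cost` (e.g. AIC of the encoded subset) is minimized."""
    current = rng.integers(0, 2, m)
    current_cost = cost(current)
    best, best_cost = current.copy(), current_cost
    T = T0
    for _ in range(n_temps):
        for _ in range(iters_per_temp):
            cand = current.copy()
            cand[rng.integers(m)] ^= 1                       # neighbor: flip one variable in/out
            dE = cost(cand) - current_cost
            if dE < 0 or np.exp(-dE / T) >= rng.random():    # Metropolis acceptance, cf. (3.1)
                current, current_cost = cand, current_cost + dE
                if current_cost < best_cost:
                    best, best_cost = current.copy(), current_cost
        T *= alpha                                           # geometric cooling, cf. (3.2)
    return best, best_cost
```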
3.3. The hybrid genetic-simulated annealing algorithm (hybridGSA)

GA has been successfully used in a wide range of differentiable, nondifferentiable and discontinuous optimization problems encountered in statistics, engineering and economics applications [3,37]. However, GA has two major limitations. First,
the performance might deteriorate as the problem size grows. In fact, with a growing size of the problem, GA requires a lar-
ger population to obtain a satisfactory solution. Second, premature convergence might occur when the GA cannot find the
optimal solution due to loss of some important characters (genes) in candidate solutions [18,34,2,35,27,9,10].
The performance of GA can be improved by introducing more diversity among the chromosomes in the early stage of the
solution process so that premature convergence can be eliminated. A hybrid algorithm, which combines aspects of GA and SA, is proposed to overcome the limitations of GA. To implement this idea, the Metropolis acceptance test from SA is adopted into the GA. The new hybrid algorithm, referred to as hybridGSA, has been shown to overcome the poor convergence properties of GA and to outperform GA or SA alone.
Some authors [25] have suggested inserting a Boltzmann-type or SA operator after the crossover and mutation opera-
tions. This operator will induce competition between parents and children. This helps to prevent the population from becom-
ing homogeneous too soon.
By adding this Boltzmann-type operator (SA operator), we obtain a hybrid simulated annealing-genetic algorithm (hybridGSA). In each generation, selection, crossover and mutation produce offspring as in the standard GA; each offspring is then compared with the worst solution in the current population and, if it is not better, it is accepted only when the Metropolis criterion (3.1) is satisfied. The temperature is lowered and the process is repeated until the number of generations exceeds the pre-set maximum.
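The following sketch shows one way this generation loop with the Metropolis-based replacement could look in code. It reuses the illustrative GA operators from the sketch in Section 3.1 and is not the paper's implementation; the per-generation offspring count and the point at which the temperature is lowered are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hybrid_gsa(fitness, m, pop_size=100, p_cross=0.8, p_mut=0.1,
               generations=1000, T0=100.0, alpha=0.9):
    """GA with an SA (Boltzmann-type) operator: each offspring competes with the worst
    member of the population under the Metropolis acceptance rule; `fitness` is maximized."""
    pop = rng.integers(0, 2, size=(pop_size, m))
    fit = np.array([fitness(c) for c in pop])
    T = T0
    for _ in range(generations):
        # one offspring per generation step (illustrative; batch variants are possible)
        i, j = rng.integers(0, pop_size, 2), rng.integers(0, pop_size, 2)
        p1, p2 = pop[i[np.argmax(fit[i])]].copy(), pop[j[np.argmax(fit[j])]].copy()
        child = p1.copy()
        if rng.random() < p_cross:                          # single-point crossover
            cut = rng.integers(1, m)
            child[cut:] = p2[cut:]
        flip = rng.random(m) < p_mut                        # bit-flip mutation
        child[flip] = 1 - child[flip]
        child_fit = fitness(child)
        worst = int(np.argmin(fit))
        dE = fit[worst] - child_fit                         # positive when the child is worse
        if dE <= 0 or np.exp(-dE / T) >= rng.random():      # SA operator: Metropolis test (3.1)
            pop[worst], fit[worst] = child, child_fit
        T *= alpha                                          # cool down each generation, cf. (3.2)
    return pop[np.argmax(fit)]
```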
4. Computational results
In this section, we provide numerical examples that have frequently appeared in the related literature to illustrate the performance of the proposed hybridGSA method. These data sets were provided by Longley [26], Waugh [36] and Eksioglu et al. [15]. All computations are performed in MATLAB 9. In order to compare the performance of the models, the AIC criterion given by (2.5) is used for the Longley [26] and Waugh [36] data sets, and the adjusted R^2 given by (2.3) is used for the data sets taken from Eksioglu et al. [15]. In regression analysis, a small AIC value is preferable, whereas a large adjusted R^2 (or R^2) value is preferable.
Table 1 presents, for the Longley [26] and Waugh [36] data sets, the number of candidate independent variables, the sample sizes, the best model variables, the best model AIC value and the working time (in seconds) of an all-possible-regressions search.
For the GA search, the number of individuals in the population, the crossover probability, the mutation probability and the maximum number of iterations are chosen as 100, 0.8, 0.1 and 1000, respectively. Similarly, for the hybridGSA search, the number of individuals in the population, the crossover probability, the mutation probability, the maximum number of iterations, the initial temperature and the cooling rate are chosen as 100, 0.8, 0.1, 1000, 100 and 0.9, respectively. The statistical significance level is chosen as 0.05 for the FS and BE methods.
Tables 2–5 present, for the Longley [26] and Waugh [36] data sets, the detailed results of FS, BE, GA and hybridGSA, respectively. As can be seen in Table 5, the proposed hybridGSA method found the best subsets for both literature data sets. In contrast, FS, BE and GA did not find the best subsets, especially for the Waugh data set. For example, for the Waugh data set, X3, X4, X6, X7, X8, X9 is the set found by the hybridGSA method (which is the optimal set), X1, X4, X5, X7, X8, X9 is the set found by the FS and BE methods, and X1, X4, X6, X7, X8, X9 is the set found by the GA method. It is important to note that the variable X3, which is in the optimal group, was not identified by any of the other procedures, whereas the variable X1, which is not in the optimal group, was included by the other procedures.
[Flowchart of the hybridGSA procedure: the produced offspring is compared with the worst solution in the population; if it is not better, it is accepted only if the Metropolis condition is met; the loop stops when the number of generations exceeds the pre-set maximum.]
In addition to the Longley [26] and Waugh [36] data sets, a set of experiments taken from Eksioglu et al. [15] has been conducted to compare the performance of the hybridGSA model, and the results are reported in Table 6. A total of twelve data sets have been used for the experiments. Eksioglu et al. [15] used the adjusted R2 criterion for the comparisons. Therefore, we have used the adjusted R2 criterion for hybridGSA, FS, BE, GA and also for the GRASP and Lagrangian relaxation approaches proposed by Eksioglu et al. [15]. Detailed information about the data sets and the GRASP and Lagrangian relaxation models can be found in Eksioglu et al. [15].
As can be seen from the results (Table 6), hybridGSA found the best subset in eight out of twelve instances, whereas GRASP and Lagrangian relaxation found the best subset in five out of twelve instances (best values are shown in bold).
The hybridGSA procedure was also compared with FS, BE and GA by the AIC criterion, against the all-possible-regressions benchmark, using simulated data sets. The linear regression problems had 100 and 200 observations and k variables with 11 ≤ k ≤ 25. A total of 30 (15 × 2) different data sets were generated for this study. Each problem data set was generated independently of the others. For the entire set of problems, complete enumerations were performed over all possible subsets in order to find the best one in a rigorous way (i.e. the k = 25 problem has 2^25 = 33,554,432 possible subsets).
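The paper does not describe how the simulated regression problems were generated, so the sketch below is purely illustrative: standard normal predictors, a sparse true coefficient vector and Gaussian noise are assumptions. The exhaustive enumeration mirrors the all-possible-regressions benchmark using criterion (2.5) and is feasible only for moderate k.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def make_problem(n, k, n_active=5, noise_sd=1.0):
    """One simulated regression problem: only `n_active` of the k candidate variables
    actually enter the true model (all distributional choices here are assumptions)."""
    X = rng.standard_normal((n, k))
    beta = np.zeros(k)
    beta[rng.choice(k, n_active, replace=False)] = rng.uniform(1.0, 3.0, n_active)
    y = X @ beta + noise_sd * rng.standard_normal(n)
    return X, y

def best_subset_by_aic(X, y):
    """Complete enumeration of all non-empty subsets (the all-possible-regressions benchmark)."""
    n, k = X.shape
    best_aic, best_sub = np.inf, None
    for p in range(1, k + 1):
        for sub in itertools.combinations(range(k), p):
            Z = np.column_stack([np.ones(n), X[:, sub]])
            rss = float(((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2).sum())
            aic = n * np.log(rss / n) + 2 * p              # criterion (2.5)
            if aic < best_aic:
                best_aic, best_sub = aic, sub
    return best_sub, best_aic
```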
Table 1
All possible regression results.

                                  Longley            Waugh
Number of variables               6                  9
Sample size                       16                 17
Best model variables              X2, X3, X4, X6     X3, X4, X6, X7, X8, X9
Best AIC value                    227.66             93.528
Working time (in seconds)         0.4477             3.6103
Table 2
Forward selection results.

                                  Longley            Waugh
Predicted model variables         X2, X3, X4, X5     X1, X4, X5, X7, X8, X9
Predicted AIC value               252.45             118.96
Working time (in seconds)         0.2145             1.4587
Table 3
Backward elimination results.

                                  Longley            Waugh
Predicted model variables         X2, X3, X4, X5     X1, X4, X5, X7, X8, X9
Predicted AIC value               252.45             118.96
Working time (in seconds)         0.2346             1.5258
Table 4
Genetic algorithm results.

                                        Longley            Waugh
Predicted model variables               X2, X3, X4, X6     X1, X4, X6, X7, X8, X9
Predicted AIC value                     227.66             97.45
Working time (in seconds)               0.0941             0.5841
Number of individuals in the population 100                100
Crossover probability                   0.8                0.8
Mutation probability                    0.1                0.1
Maximum number of generations           1000               1000
Table 5
HybridGSA results.

                                        Longley            Waugh
Predicted model variables               X2, X3, X4, X6     X3, X4, X6, X7, X8, X9
Predicted AIC value                     227.66             93.528
Working time (in seconds)               0.0914             0.5087
Number of individuals in the population 100                100
Crossover probability                   0.8                0.8
Mutation probability                    0.1                0.1
Maximum number of generations           1000               100
Initial temperature                     100                100
Cooling rate                            0.9                0.9
Tables 7 and 8 present the optimal solutions determined by all possible regressions for the 100- and 200-observation problems, respectively. The multiple linear regression problems were then analyzed using the FS, BE, GA and hybridGSA procedures.
Tables 9 and 10 provide the results of the FS and BE procedures for n = 100 and n = 200, respectively. The statistical significance level is chosen as 0.05 for these statistical methods.
Table 6
Adjusted R2 values for the best subsets.

Table 7
All possible regression results (n = 100).

Table 8
All possible regression results (n = 200).
Tables 11 and 12 provide the results of the hybridGSA and GA procedures for n = 100 and n = 200, respectively. Control parameters are as in the literature examples for the GA and hybridGSA methods (see Table 12).
For each procedure, the number of variables in the solution and the predicted AIC values are provided. The number of variables in the solution represents the number of variables predicted by each procedure.
As can be seen in Tables 10 and 11, the proposed hybridGSA search procedure found the best subset for all of the problem sizes, except the n = 200, k = 24 case. In fact, the FS and BE methods provided quite different results even amongst themselves.
Table 9
Forward selection and backward elimination results (n = 100).

Table 10
Forward selection and backward elimination results (n = 200).

Table 11
HybridGSA and genetic algorithm results (n = 100).
Also, GA did not find the best subsets in the n = 100, k = 15, 18, 21, 24, 25 and n = 200, k = 17, 19, 21, 23, 24, 25 cases.
Table 12
HybridGSA and genetic algorithm results (n = 200).
The results obtained from the literature and simulated data sets indicate the superiority of the hybridGSA procedure over the existing subset selection procedures.
5. Conclusion
This paper introduced an alternative variable selection method for use in regression analysis that is based on a hybrid of the genetic and simulated annealing algorithms (hybridGSA). Using literature and simulated data sets, the hybridGSA procedure was compared with forward selection, backward elimination and the classical genetic algorithm, with the all-possible-regressions procedure as the benchmark. The results indicate the superiority of the hybridGSA procedure over existing subset selection procedures in multiple regression analysis.
Acknowledgments
We would like to thank the anonymous referees for their helpful and constructive comments on the previous version of
the manuscript which improved the presentation of this article.
References
[1] A. Al-Ani, A. Alsukker, R.N. Khushaba, Feature subset selection using differential evolution and a wheel based search strategy, Swarm Evol. Comput. 9
(2013) 15–26.
[2] H. Aytug, G.J. Koehler, New stopping criterion for genetic algorithms, Eur. J. Oper. Res. 26 (2000) 662–674.
[3] R. Baragona, F. Battaglia, C. Calzini, Genetic algorithms for the identification of additive and innovation outliers in time series, Comput. Stat. Data Anal.
37 (2001) 1–12.
[4] E.M.L. Beale, M.G. Kendall, D.W. Mann, The discarding of variables in multivariate analysis, Biometrika 54 (1967) 357–366.
[5] D. Broadhurst, R. Goodacre, A. Jones, J.J. Rowland, D.B. Kell, Genetic algorithms as a method for variable selection in multiple linear regression and
partial least squares regression, with applications to pyrolysis mass spectrometry, Anal. Chim. Acta 348 (1997) 71–86.
[6] P.J. Brown, Measurement Regression and Calibration, Clarendon Press, Oxford, 1993.
[7] M.J. Brusco, D. Steinley, J.D. Cradit, An exact algorithm for hierarchically well-formulated subsets in second-order polynomial regression,
Technometrics 51 (3) (2009) 306–315.
[8] S. Chatterjee, A.S. Hadi, Regression Analysis by Example, Wiley, 2006.
[9] K. Deep, M. Thakur, A new mutation operator for real coded genetic algorithms, Appl. Math. Comput. 193 (2007) 229–247.
[10] K. Deep, M. Thakur, A new crossover operator for real coded genetic algorithms, Appl. Math. Comput. 188 (2007) 895–911.
[11] Z. Drezner, G.A. Marcoulides, S. Salhi, Tabu search model selection in multiple regression analysis, Commun. Stat. Simul. 28 (2) (1999) 349–367.
[12] A.P. Duarte Silva, Efficient variable screening for multivariate analysis, J. Multivariate Anal. 76 (2001) 35–62.
[13] K. Fueda, M. Iizuka, Y. Mori, Variable selection in multivariate methods using global score estimation, Comput. Stat. 24 (2009) 127–144.
[14] G.M. Furnival, R.W. Wilson, Regression by leaps and bounds, Technometrics 16 (1974) 499–512.
[15] B. Eksioglu, R. Demirer, I. Capar, Subset selection in multiple linear regression: a new mathematical programming approach, Comput. Ind. Eng. 49
(2005) 155–167.
[16] C. Gatu, P.I. Yanev, E.J. Kontoghiorghes, A graph approach to generate all possible regression submodels, Comput. Stat. Data Anal. 52 (2007) 799–815.
[17] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[18] J.J. Grefenstette, Optimization of control parameters for genetic algorithms, IEEE Trans. Syst. Man Cybern. 16 (1986) 122–128.
[19] J. Holland, Adaptation in Natural and Artificial Systems, Michigan Press, Michigan, 1975.
[20] C.L. Huang, ACO-based hybrid classification system with feature subset selection and model parameters optimization, Neurocomputing 73 (2009)
438–448.
[21] C. Jin, S.W. Jin, L.N. Qin, Attribute selection method based on a hybrid BPNN and PSO algorithms, Appl. Soft Comput. 12 (2012) 2147–2155.
[22] S. Kirkpatrick, C.D. Gerlatt, M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671–680.
[23] S. Kirkpatrick, Optimization by simulated annealing-quantitative studies, J. Stat. Phys. 34 (1984) 975–986.
[24] W.J. Krzanowski, Principles of Multivariate Analysis-A User’s Perspective, Oxford University Press, 1988.
[25] F.T. Lin, C.Y. Kao, C.C. Hsu, Applying the genetic approach to simulated annealing in solving some NP-hard problems, IEEE Trans. Syst. Man Cybern. 23
(6) (1993) 1752–1767.
[26] J. Longley, An appraisal of least squares programs for the electronic computer from the point of view of the user, J. Am. Stat. Assoc. 62 (1967) 819–841.
[27] H. Maaranen, K. Miettinen, M.M. Mäkelä, Quasi-random initial population for genetic algorithms, Comput. Math. Appl. 47 (2004) 1885–1895.
[28] R. Meiri, J. Zahavi, Using simulated annealing to optimize the feature selection problem in marketing applications, Eur. J. Oper. Res. 171 (2006) 842–
858.
[29] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (6)
(1953) 1087–1091.
[30] A.J. Miller, Subset Selection in Regression, second ed., Chapman and Hall, London, 2002.
[31] M. Monirul Kabir, M. Shahjahan, K. Murase, A new local search based hybrid genetic algorithm for feature selection, Neurocomputing 74 (2011) 2914–
2928.
[32] M. Monirul Kabir, M. Shahjahan, K. Murase, A new hybrid ant colony optimization algorithm for feature selection, Expert Syst. Appl. 39 (2012) 3747–
3763.
[33] X. Peng, D. Xu, A local information-based feature-selection algorithm for data regression, Pattern Recognition, in press.
[34] M. Srinivas, L.M. Patnaik, Adaptive probabilities of crossover and mutation in genetic algorithms, IEEE Trans. Syst. Man Cybern. 24 (1994) 656–667.
[35] S. Tsutsui, D.E. Goldberg, Search space boundary extension method in real-coded genetic algorithms, Inf. Sci. 133 (2001) 229–247.
[36] F.B. Waugh, Graphic Analysis in Agricultural Economics, Agricultural Handbook No. 128, U.S. Department of Agriculture, 1957.
[37] P. Winker, M. Gilli, Applications of optimization heuristics to estimation and modelling problems, Comput. Stat. Data Anal. 47 (2) (2004) 211–223.
[38] W. Zhao, R. Zhang, Y. Lv, J. Liu, Variable selection of the quantile varying coefficient regression models, J. Korean Stat. Soc., in press.