A Review of Feature Selection Methods With Applications
Abstract - Feature selection (FS) methods can be used in data pre-processing to achieve efficient data reduction. This is useful for finding accurate data models. Since exhaustive search for an optimal feature subset is infeasible in most cases, many search strategies have been proposed in the literature. The usual applications of FS are in classification, clustering, and regression tasks. This review considers most of the commonly used FS techniques, with particular emphasis on their application aspects. In addition to the standard filter, wrapper, and embedded methods, we also provide insight into recent hybrid approaches and other advanced FS topics.

I. INTRODUCTION

The abundance of data in contemporary datasets demands the development of clever algorithms for discovering important information. Data models are constructed depending on the data mining task, usually in the areas of classification, regression and clustering. Often, pre-processing of the datasets takes place for two main reasons: 1) reduction of the size of the dataset in order to achieve more efficient analysis, and 2) adaptation of the dataset to best suit the selected analysis method. The former reason is more important nowadays because of the plethora of analysis methods at the researcher's disposal, while the size of an average dataset keeps growing both in the number of features and in the number of samples.

Dataset size reduction can be performed in one of two ways: feature set reduction or sample set reduction. In this paper, the focus is on feature set reduction. The problem is important because a number of features comparable to or higher than the number of samples leads to model overfitting, which in turn leads to poor results on validation datasets. Additionally, constructing models from datasets with many features is more computationally demanding [1]. All of this has led researchers to propose many methods for feature set reduction. The reduction is performed through the processes of feature extraction (transformation) and feature selection. Feature extraction methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Multidimensional Scaling transform the original features into a new feature set constructed from combinations of the original ones, with the aim of discovering more meaningful information in the new set [2]. The new feature set can then be easily reduced by taking into consideration characteristics such as dataset variance coverage. Feature selection, on the other hand, is a process of taking a small subset of features from the original feature set without transformation (thus preserving the interpretation) and validating it with respect to the analysis goal. The selection process can be carried out in a number of ways depending on the goal, the resources at hand, and the desired level of optimization.

In this paper, we focus on feature selection and provide an overview of the existing methods that are available for handling several different classes of problems. Additionally, we consider the most important application domains and review comparative studies on feature selection therein, in order to investigate which methods perform best for specific tasks. This research is motivated by the fact that there is an abundance of work in this field and insufficient systematization, particularly with respect to various application domains and novel research topics.

Feature set reduction is based on the notions of feature relevance and redundancy with respect to the goal. More specifically, a feature is usually categorized as: 1) strongly relevant, 2) weakly relevant but not redundant, 3) irrelevant, or 4) redundant [3,4]. A strongly relevant feature is always necessary for an optimal feature subset; it cannot be removed without affecting the original conditional target distribution [3]. A weakly relevant feature may not always be necessary for an optimal subset; this depends on certain conditions. Irrelevant features need not be included at all. Redundant features are those that are weakly relevant but can be completely replaced with a set of other features such that the target distribution is not disturbed (the set of other features is called the Markov blanket of the feature). Redundancy is thus always inspected in the multivariate case (when examining a feature subset), whereas relevance is established for individual features. The aim of feature selection is to maximize relevance and minimize redundancy, and it usually amounts to finding a feature subset consisting of only relevant features.

In order to ensure that the optimal feature subset with respect to the goal concept has been found, a feature selection method would have to evaluate a total of 2^m − 1 subsets, where m is the total number of features in the dataset (the empty feature subset is excluded). This is computationally infeasible even for a moderately large m.
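The relevance categories introduced above are usually formalized following [3]; a compact restatement (our paraphrase, with F = {X_1, ..., X_m} the full feature set, Y the target, and S_i = F \ {X_i}) is:

```latex
% Paraphrase of the relevance definitions used in [3].
\begin{align*}
\text{strongly relevant: } & P(Y \mid X_i, S_i) \neq P(Y \mid S_i)\\
\text{weakly relevant: }   & P(Y \mid X_i, S_i) = P(Y \mid S_i) \ \text{and}\ \exists\, S_i' \subset S_i : P(Y \mid X_i, S_i') \neq P(Y \mid S_i')\\
\text{irrelevant: }        & \forall\, S_i' \subseteq S_i : P(Y \mid X_i, S_i') = P(Y \mid S_i')
\end{align*}
```

A weakly relevant feature is then redundant if some subset of the remaining features forms a Markov blanket for it, i.e. makes it superfluous with respect to the target distribution.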
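To make the 2^m − 1 figure concrete, the minimal Python sketch below (an illustration only, not from the reviewed work) counts the non-empty subsets by brute-force enumeration; already at m = 20 there are more than a million candidate subsets, and at m = 100 roughly 1.3 × 10^30.

```python
from itertools import combinations

def count_candidate_subsets(m):
    """Brute-force count of all non-empty subsets of an m-feature set."""
    return sum(1 for k in range(1, m + 1) for _ in combinations(range(m), k))

for m in (5, 10, 20):
    n = count_candidate_subsets(m)
    assert n == 2 ** m - 1
    print(m, n)   # 5 -> 31, 10 -> 1023, 20 -> 1048575
```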
Therefore, putting completeness of the search aside, many heuristic methods have been proposed to find a sufficiently good (but not necessarily optimal) subset. The whole process of finding the feature subset typically consists of four basic steps: 1) subset generation, 2) subset evaluation, 3) a stopping criterion, and 4) validation of the results [5]. Feature subset generation depends on the state space search strategy. After a strategy selects a candidate subset, the subset is evaluated using an evaluation criterion in step 2. After steps 1 and 2 have been repeated a number of times, as governed by the stopping criterion, the best candidate feature subset is selected. This subset is then validated on an independent dataset or using domain knowledge, while considering the type of task at hand.

II. CLASSIFICATION OF FEATURE SELECTION METHODS

Feature selection methods can be classified in a number of ways. The most common one is the classification into filter, wrapper, embedded, and hybrid methods [6]. This classification assumes feature independency or near-independency. Additional methods have been devised for datasets with structured features, where dependencies exist, and for streaming features [2].

A. Filter methods

Filter methods select features based on a performance measure regardless of the employed data modeling algorithm. Only after the best features are found can the modeling algorithms use them. Filter methods can rank individual features or evaluate entire feature subsets. We can roughly classify the developed measures for feature filtering into information, distance, consistency, similarity, and statistical measures. While there are many filter methods described in the literature, a list of common methods is given in Table I, along with references that provide details. Not all filters can be used for all classes of data mining tasks; therefore, the filters are also classified by task: classification, regression or clustering. Due to lack of space, we do not consider semi-supervised feature selection methods in this work; an interested reader is referred to [16] for more information.

Univariate feature filters evaluate (and usually rank) a single feature, while multivariate filters evaluate an entire feature subset. Feature subset generation for multivariate filters depends on the search strategy. While there are many search strategies, there are four usual starting points for feature subset generation: 1) forward selection, 2) backward elimination, 3) bidirectional selection, and 4) heuristic feature subset selection. Forward selection typically starts with an empty feature set and then considers adding one or more features to the set. Backward elimination typically starts with the whole feature set and considers removing one or more features from the set. Bidirectional search starts from both sides, from an empty set and from the whole set, simultaneously considering larger and smaller feature subsets. Heuristic selection generates a starting subset based on a heuristic (e.g. a genetic algorithm) and then explores it further.

The most common search strategies that can be used with multivariate filters can be categorized into exponential, sequential and randomized algorithms. Exponential algorithms evaluate a number of subsets that grows exponentially with the feature space size. Sequential algorithms add or remove features sequentially (one or a few at a time), which may lead to local minima. Randomized algorithms incorporate randomness into their search procedure, which helps them avoid local minima [17]. Common search strategies are shown in Table II.

TABLE I. COMMON FILTER METHODS FOR FEATURE SELECTION

Name | Filter class | Applicable to task | Study
Information gain | univariate, information | classification | [6]
Gain ratio | univariate, information | classification | [7]
Symmetrical uncertainty | univariate, information | classification | [8]
Correlation | univariate, statistical | regression | [8]
Chi-square | univariate, statistical | classification | [7]
Inconsistency criterion | multivariate, consistency | classification | [9]
Minimum redundancy, maximum relevance (mRmR) | multivariate, information | classification, regression | [2]
Correlation-based feature selection (CFS) | multivariate, statistical | classification, regression | [7]
Fast correlation-based filter (FCBF) | multivariate, information | classification | [8]
Fisher score | univariate, statistical | classification | [10]
Relief and ReliefF | univariate, distance | classification, regression | [11]
Spectral feature selection (SPEC) and Laplacian Score (LS) | univariate, similarity | classification, clustering | [4]
Feature selection for sparse clustering | multivariate, similarity | clustering | [12]
Localized Feature Selection Based on Scatter Separability (LFSBSS) | multivariate, statistical | clustering | [13]
Multi-Cluster Feature Selection (MCFS) | multivariate, similarity | clustering | [4]
Feature weighting K-means | multivariate, statistical | clustering | [14]
ReliefC | univariate, distance | clustering | [15]

TABLE II. SEARCH STRATEGIES FOR FEATURE SELECTION

Algorithm group | Algorithm name
Exponential | Exhaustive search; Branch-and-bound
Sequential | Greedy forward selection or backward elimination; Best-first; Linear forward selection; Floating forward or backward selection; Beam search (and beam stack search)
Randomized | Race search; Random generation; Simulated annealing; Evolutionary computation algorithms (e.g. genetic, ant colony optimization); Scatter search
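As a concrete illustration of the univariate, information-based filters listed in Table I, the sketch below ranks features by their estimated mutual information with the class label using scikit-learn; the benchmark dataset and the cut-off of ten features are arbitrary choices made for this example only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Small benchmark dataset: 569 samples, 30 numeric features, binary target.
data = load_breast_cancer()
X, y = data.data, data.target

# Univariate filter: score each feature independently against the class
# label with an information-based measure, then keep the k highest scores.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)

ranking = np.argsort(selector.scores_)[::-1]
print("Top-ranked features:", [data.feature_names[i] for i in ranking[:10]])
print("Reduced data shape:", X_reduced.shape)   # (569, 10)
```

Because each feature is scored in isolation, redundancy among the selected features is ignored; multivariate filters such as mRmR, CFS or FCBF from Table I were designed to address exactly that.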
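The next sketch illustrates greedy sequential forward selection, the simplest of the sequential strategies in Table II, written around a pluggable subset-evaluation function; the function names and the choice of a cross-validated Naïve Bayes score as the evaluator are illustrative assumptions. With such a classifier-based evaluator the loop effectively behaves as a wrapper in the sense of the next subsection; substituting a multivariate filter measure would keep it a pure filter.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def forward_selection(n_features, evaluate_subset, max_size=None):
    """Greedy sequential forward selection.

    Starts from the empty set and repeatedly adds the single feature whose
    inclusion most improves evaluate_subset(subset); stops when no candidate
    improves the score, which may of course be only a local optimum.
    """
    selected, best_score = [], float("-inf")
    max_size = max_size or n_features
    while len(selected) < max_size:
        remaining = [f for f in range(n_features) if f not in selected]
        score, best_f = max((evaluate_subset(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break                       # no further improvement
        selected.append(best_f)
        best_score = score
    return selected, best_score

# Wrapper-style evaluator: 5-fold cross-validated accuracy of a fast classifier.
X, y = load_breast_cancer(return_X_y=True)
evaluate = lambda subset: cross_val_score(GaussianNB(), X[:, subset], y, cv=5).mean()
subset, score = forward_selection(X.shape[1], evaluate, max_size=5)
print("Selected feature indices:", subset, "CV accuracy: %.3f" % score)
```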
B. Wrapper methods

Wrappers assess feature subsets by the performance they yield with a modelling algorithm, which is taken as a black-box evaluator. Thus, for classification tasks, a wrapper evaluates subsets based on classifier performance (e.g. Naïve Bayes or SVM) [18,19], while for clustering, a wrapper evaluates subsets based on the performance of a clustering algorithm (e.g. K-means) [20]. The evaluation is repeated for each subset, and subset generation depends on the search strategy, in the same way as with filters. Wrappers are much slower than filters in finding sufficiently good subsets because they inherit the resource demands of the modelling algorithm. The resulting subsets are also biased towards the modelling algorithm on which they were evaluated (even when cross-validation is used). Therefore, for a reliable generalization error estimate, both an independent validation sample and another modelling algorithm should be used after the final subset is found. On the other hand, it has been shown empirically that wrappers obtain subsets with better performance than filters because the subsets are evaluated using a real modelling algorithm. Practically any combination of search strategy and modelling algorithm can be used as a wrapper, but wrappers are only feasible with greedy search strategies and fast modelling algorithms such as Naïve Bayes [21], linear SVM [22], and Extreme Learning Machines [23].

C. Embedded and hybrid methods

Embedded methods perform feature selection during the execution of the modelling algorithm. These methods are thus embedded in the algorithm either as its normal or as extended functionality. Common embedded methods include various types of decision tree algorithms (CART, C4.5, random forest [24]), but also other algorithms (e.g. multinomial logistic regression and its variants [25]). Some embedded methods perform feature weighting based on regularization models whose objective functions minimize fitting errors while forcing the feature coefficients to be small or exactly zero. Such methods, based on the Lasso [26] or the Elastic Net [27], usually work with linear classifiers (SVM or others) and penalize features that do not contribute to the model.

Hybrid methods were proposed to combine the best properties of filters and wrappers. First, a filter method is used to reduce the feature space dimension, possibly obtaining several candidate subsets [28]. Then, a wrapper is employed to find the best candidate subset. Hybrid methods usually achieve the high accuracy characteristic of wrappers and the high efficiency characteristic of filters. While practically any combination of filter and wrapper can be used to construct a hybrid method, several interesting methodologies have recently been proposed, such as fuzzy random forest based feature selection [29], hybrid genetic algorithms [30], hybrid ant colony optimization [31], and a mixed gravitational search algorithm [32].

D. Structured and streaming features

In some datasets, features may exhibit certain internal structures such as spatial or temporal smoothness, disjoint/overlapping groups, or tree- or graph-like structures. In these datasets, features are not independent, so it is advisable to employ algorithms that deal with the dependencies explicitly in order to increase the performance of the selected feature subsets. Most of the algorithms dealing with feature structures are recent and are based on adaptations of Lasso regularization to accommodate different structures. Good overviews of these methods can be found in [2,33].

Streaming (or dynamic) features are features whose number is not known in advance; they are generated dynamically and arrive as streamed data, and the modelling algorithm has to decide whether to keep them as useful for model construction or not. Moreover, some features may become irrelevant over time and should be discarded. This scenario is common in social networks such as Twitter, where new words are generated that are not all relevant for a given subject [2]. The most important feature selection methods in this category are the Grafting algorithm [34], the Alpha-Investing algorithm [35], the OSFS algorithm [36], and a dynamic feature selection approach based on fuzzy-rough sets [37].

III. FEATURE SELECTION APPLICATION DOMAINS

The choice of feature selection methods differs among various application areas. In the following subsections, we review comparative studies on feature selection pertaining to several well-known application domains. Table III summarizes the findings from the reviewed studies.

A. Text mining

In text mining, the standard way of representing a document is the bag-of-words model. The idea is to model each document with the counts of the words occurring in that document. Feature vectors are typically formed so that each feature (i.e. each element of the feature vector) represents the count of a specific word, an alternative being to just indicate the presence/absence of a word without specifying the count. The set of words whose occurrences are counted is called a vocabulary. Given a dataset that needs to be represented, one can use all the words from all the documents in the dataset to build the vocabulary and then prune the vocabulary using feature selection.

It is common to apply a degree of preprocessing prior to feature selection, typically including the removal of rare words with only a few occurrences, the removal of overly common words (e.g. "a", "the", "and" and similar) and the grouping of differently inflected forms of a word (lemmatization, stemming) [38].

Forman [38] performed a detailed experimental study of filter feature selection methods for text classification. Twelve feature selection metrics were evaluated on 229 text classification problem instances. Feature vectors were formed not as word counts, but as Boolean indicators of whether a certain word occurred or not. A linear SVM classifier with untuned parameters was used to evaluate performance. The results were analyzed with respect to precision, recall, F-measure and accuracy. Information gain was shown to perform best with respect to precision, while the author-introduced bi-normal separation measure performed best for recall, F-measure and accuracy.
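To connect this setup to a concrete pipeline, the sketch below builds a pruned bag-of-words representation and applies a chi-square filter before a linear SVM; it is a generic scikit-learn illustration on a stand-in corpus (two classes of 20 Newsgroups), not a reconstruction of Forman's experimental protocol, and the vocabulary pruning thresholds and k = 1000 are arbitrary.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stand-in two-class text classification task (downloads the corpus on first use).
docs = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "sci.space"],
                          remove=("headers", "footers", "quotes"))

pipeline = make_pipeline(
    CountVectorizer(stop_words="english", min_df=3),  # prune rare and overly common words
    SelectKBest(chi2, k=1000),                        # chi-square filter on word counts
    LinearSVC(),                                      # untuned linear SVM as the evaluator
)
scores = cross_val_score(pipeline, docs.data, docs.target, cv=5)
print("Mean CV accuracy with 1000 selected words: %.3f" % scores.mean())
```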
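Embedded selection in the sense of Section II-C is also common on such sparse text features: an L1 penalty added to a linear model drives most word weights to exactly zero during training, so selection falls out of model fitting itself. A hedged sketch on the same stand-in corpus follows (the penalty strength C = 0.5 is an arbitrary illustrative value, and a recent scikit-learn with get_feature_names_out is assumed):

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "sci.space"],
                          remove=("headers", "footers", "quotes"))
vectorizer = TfidfVectorizer(stop_words="english", min_df=3)
X = vectorizer.fit_transform(docs.data)

# L1-regularized linear SVM: the sparsity-inducing penalty is part of the
# training objective, so features are discarded as a by-product of fitting.
clf = LinearSVC(penalty="l1", dual=False, C=0.5).fit(X, docs.target)

kept = np.flatnonzero(clf.coef_.ravel())
words = np.asarray(vectorizer.get_feature_names_out())
print("Words with non-zero weight: %d of %d" % (kept.size, X.shape[1]))
print("Examples:", words[kept[:10]].tolist())
```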
TABLE III. SUMMARIZED FINDINGS OF RELEVANT FEATURE SELECTION METHODS IN VARIOUS APPLICATION AREAS

Application area | Subfield | Datasets | Feature selection methods | Best performing | Evaluation metrics | Study
Text mining | Text classification | 229 text classification problem instances gathered from Reuters, TREC, OHSUMED, etc. | Accuracy, accuracy balanced, bi-normal separation, chi-square, document frequency, F1-measure, information gain, odds ratio, odds ratio numerator, power, probability ratio, random | Information gain (precision), bi-normal separation (accuracy, F-measure, recall) | Accuracy, F-measure, precision, recall | [38]
Text mining | Text clustering | Reuters-21578, 20 Newsgroups, Web Directory | Information gain, chi-square, document frequency, term strength, entropy-based ranking, term contribution, iterative feature selection | Iterative feature selection | Entropy, precision | [39]
Image processing / computer vision | Image classification | Aerial Images, The Digits Data, Cats and Dogs | Relief (R), K-means (K), sequential floating forward selection (F), sequential floating backward selection (B), various combinations R + K + F/B | R+K+B / R+K+F / R+K, depending on the size of the feature subset | Average MSE of 100 neural networks | [40]
Image processing / computer vision | Breast density classification from mammographic images | Mini-MIAS, KBD-FER | Best-first with forward, backward and bi-directional search, genetic search and random search (k-NN and Naïve Bayesian classifiers) | Best-first forward, best-first backward | Accuracy | [41]
Bioinformatics | Biomarker discovery | Three benchmark datasets deriving from DNA microarray experiments | Chi-square, information gain, symmetrical uncertainty, gain ratio, OneR, ReliefF, SVM-embedded | Chi-square, symmetrical uncertainty, information gain, ReliefF | Stability, AUC | [42]
Bioinformatics | Microarray gene expression data classification | Two gene expression datasets (Freije, Phillips) | Information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, t-statistics, one-dimensional SVM | Consensus of all methods | Accuracy | [43]
Industrial applications | Fault diagnosis | Wind turbine test rig dataset | Distance, entropy, SVM wrapper, neural network wrapper, global geometric similarity scheme | Global geometric similarity scheme with wrapper | Accuracy | [22]
Liu et al. [39] investigated the use of feature selection in the problem of text clustering, showing that feature selection can improve its performance and efficiency. Five filter feature selection methods were tested on three document datasets. Unsupervised feature selection methods were shown to improve clustering performance, achieving about 2% entropy reduction and 1% precision improvement on average, while removing 90% of the features. The authors also proposed an iterative feature selection method inspired by expectation maximization that combines supervised feature selection methods with clustering in a bootstrap setting. The proposed method reduces the entropy by 13.5% and increases precision by 14.6%, hence coming closest to the established baseline obtained by using a supervised approach.

B. Image processing and computer vision

Representing images is not a straightforward task, as the number of possible image features is practically unlimited [40]. The choice of features typically depends on the target application. Examples of features include histograms of oriented gradients, edge orientation histograms, Haar wavelets, raw pixels, gradient values, edges, color channels, etc. [44].

Bins and Draper [40] studied the use of filter feature selection methods in the general problem of image classification. Three different image datasets were used. They proposed a three-step method for feature selection that combines Relief, K-means clustering and sequential floating forward/backward feature selection (SFFS/SFBS). The idea is to: 1) use the Relief algorithm to remove irrelevant features, 2) use K-means clustering to cluster similar features and remove redundancy, and 3) run SFFS or SFBS to obtain the final set of features. The authors found that the proposed hybrid combination of algorithms yields better performance than using Relief or SFFS/SFBS alone. In cases when there are no irrelevant or redundant features in the dataset, the proposed algorithm does not degrade performance. When the goal is to select a specific number of features, it is suggested to use the R+K+B variant of the algorithm if the number of relevant and non-redundant features is less than 110, and otherwise R+K+F. If the number of selected features is allowed to vary, the authors suggest using R+K. The authors also note that Relief is good at removing irrelevant features, but not adequate for selecting the best among the relevant ones.
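The filter-then-wrapper pattern behind such hybrid schemes can be sketched generically: a cheap filter first shrinks the candidate pool, and a wrapper then searches only within that pool. The sketch below uses a univariate ANOVA F-score filter as a stand-in for the Relief stage and scikit-learn's sequential forward selector with a k-NN classifier as the wrapper stage; these substitutions, the pool size of 15 and the target of 5 features are illustrative assumptions, not the procedure of [40] or [41].

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Stage 1 (filter): cheap univariate ranking shrinks the pool of candidates.
pool = SelectKBest(f_classif, k=15).fit(X, y).get_support(indices=True)

# Stage 2 (wrapper): greedy forward search inside the pool, scored by
# cross-validated accuracy of the final classifier.
sfs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=5,
                                direction="forward", cv=5).fit(X[:, pool], y)
chosen = pool[sfs.get_support(indices=True)]

print("Selected feature indices:", chosen.tolist())
print("CV accuracy on the selection: %.3f"
      % cross_val_score(KNeighborsClassifier(), X[:, chosen], y, cv=5).mean())
```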
Muštra et al. [41] investigated the use of wrapper feature selection methods for breast density classification in mammographic images. Five wrapper feature selection methods were evaluated in conjunction with three different classifiers on two datasets of mammographic images. The best-performing methods were best-first search with forward selection and best-first search with backward selection. Overall, the results over different classifiers and datasets improved by between 3% and 12% when feature selection was used.

C. Industrial applications

Feature selection is important in fault diagnosis in industrial applications, where numerous redundant sensors monitor the performance of a machine. Liu et al. [22] have shown that the accuracy of detecting a fault (i.e. solving a binary classification problem of machine state as faulty vs. normal) can be improved by using feature selection. They proposed a global geometric model and a similarity metric for feature selection in fault diagnostics. The idea is to find feature subsets that are geometrically similar to the original feature set. The authors experimented with three different similarity measures: angular similarity, mutual information and the structural similarity index. The proposed approach was compared with distance-based and entropy-based feature selection, and with SVM and neural network wrappers. The best performance was obtained by combining the proposed geometric similarity approach with a wrapper, so that the top 10% of feature subsets were preselected by geometric similarity, followed by an exhaustive search-based wrapper to find the best subset.

D. Bioinformatics

An interesting application of feature selection is biomarker discovery from genomics data. In genomics data, individual features correspond to genes, so by selecting the most relevant features one gains important knowledge about the genes that are the most discriminative for a particular problem. Dessì et al. [42] proposed a framework for comparing different biomarker selection methods, taking into account the predictive performance and stability of the selected gene sets. They compared eight selection methods on three benchmark datasets derived from DNA microarray experiments. Additionally, they analyzed how similar the outputs of different selection methods are, and found that the outputs of univariate methods seem to be more similar to each other than to those of the multivariate methods. In particular, the SVM-embedded selection seems to select features quite distinct from the ones selected by other methods. When jointly optimizing stability and predictive performance, the best results were obtained using chi-square, symmetrical uncertainty, information gain and ReliefF.

Abusamra [43] analyzed the performance of eight different filter-based feature selection methods and three classification methods on two datasets of microarray gene expression data. The best individually performing feature selection methods varied depending on the dataset and the classifier used. Notably, using the Gini index for feature selection improved the performance of an SVM classifier on both datasets. Some feature selection methods were shown to degrade classification performance. However, Abusamra demonstrated that classification accuracy can be consistently improved on both datasets by using a consensus of all feature selection methods to find the top 20 features, counting the number of feature selection methods that selected each feature. Seven features were selected by all the methods, and an additional 13 features were randomly selected from a pool of features selected by seven out of eight methods.

IV. CONCLUSION

Current research advances in this field are concentrated in the area of hybrid feature selection methods, particularly methodologies based on evolutionary computation heuristics such as swarm intelligence and various genetic algorithms. Additionally, application areas such as bioinformatics, image processing, industrial applications and text mining deal with high-dimensional feature spaces where a clever hybrid methodology design is of utmost importance if any success is to be obtained. Therein, features may exhibit complex internal structures or may even be unknown in advance.

While there is no silver-bullet method, filters based on information theory and wrappers based on greedy stepwise approaches seem to offer the best results. Future research should focus on optimizing the efficiency and accuracy of the feature subset search strategy by combining the best existing filter and wrapper approaches. Most research tends to focus on a small number of datasets on which the proposed methodology works well; larger comparative studies should be pursued in order to obtain more reliable results.

ACKNOWLEDGEMENTS

This work has been supported in part by the Croatian Science Foundation, within the project "De-identification Methods for Soft and Non-Biometric Identifiers" (DeMSI, UIP-11-2013-1544). This support is gratefully acknowledged.

REFERENCES

[1] F. Korn, B. Pagel, and C. Faloutsos, "On the 'dimensionality curse' and the 'self-similarity blessing'," IEEE Trans. Knowl. Data Eng., vol. 13, no. 1, pp. 96–111, 2001.
[2] J. Tang, S. Alelyani, and H. Liu, "Feature Selection for Classification: A Review," in: C. Aggarwal (ed.), Data Classification: Algorithms and Applications, CRC Press, 2014.
[3] L. Yu and H. Liu, "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Mach. Learn. Res., vol. 5, pp. 1205–1224, 2004.
[4] S. Alelyani, J. Tang, and H. Liu, "Feature Selection for Clustering: A Review," in: C. Aggarwal and C. Reddy (eds.), Data Clustering: Algorithms and Applications, CRC Press, 2013.
[5] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, 2005.
[6] N. Hoque, D. K. Bhattacharyya, and J. K. Kalita, "MIFS-ND: A mutual information-based feature selection method," Expert Systems with Applications, vol. 41, no. 14, pp. 6371–6385, 2014.
[7] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, San Francisco, CA, USA: Morgan Kaufmann, 2011.
[8] L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," in: Proc. 20th International Conference on Machine Learning (ICML-2003), Washington DC, USA, AAAI Press, pp. 856–863, 2003.
[9] H. Liu and R. Setiono, "A Probabilistic Approach to Feature Selection - A Filter Solution," in: Proc. 13th International Conference on Machine Learning (ICML-1996), Bari, Italy, Morgan Kaufmann, pp. 319–327, 1996.
[10] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2012.
[11] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of Relief and ReliefF," Mach. Learn., vol. 53, pp. 23–69, 2003.
[12] D. M. Witten and R. Tibshirani, "A framework for feature selection in clustering," Journal of the American Statistical Association, vol. 105, no. 490, pp. 713–726, 2010.
[13] Y. Li, M. Dong, and J. Hua, "Localized feature selection for clustering," Pattern Recognition Letters, vol. 29, no. 1, pp. 10–18, 2008.
[14] D. S. Modha and W. S. Spangler, "Feature weighting in k-means clustering," Mach. Learn., vol. 52, no. 3, pp. 217–237, 2003.
[15] M. Dash and Y.-S. Ong, "RELIEF-C: Efficient Feature Selection for Clustering over Noisy Data," in: Proc. 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Boca Raton, Florida, USA, pp. 869–872, 2011.
[16] Z. Xu, I. King, and M. R.-T. Lyu, "Discriminative Semi-Supervised Feature Selection Via Manifold Regularization," IEEE Trans. Neural Networks, vol. 21, no. 7, pp. 1033–1047, 2010.
[17] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, London: Kluwer Academic Publishers, 1998.
[18] P. S. Bradley and O. L. Mangasarian, "Feature selection via concave minimization and support vector machines," in: Proc. 15th International Conference on Machine Learning (ICML-1998), Madison, Wisconsin, USA, Morgan Kaufmann, pp. 82–90, 1998.
[19] S. Maldonado, R. Weber, and F. Famili, "Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines," Information Sciences, vol. 286, pp. 228–246, 2014.
[20] Y. S. Kim, W. N. Street, and F. Menczer, "Evolutionary model selection in unsupervised learning," Intelligent Data Analysis, vol. 6, no. 6, pp. 531–556, 2002.
[21] J. C. Cortizo and I. Giraldez, "Multi Criteria Wrapper Improvements to Naive Bayes Learning," LNCS, vol. 4224, pp. 419–427, 2006.
[22] C. Liu, D. Jiang, and W. Yang, "Global geometric similarity scheme for feature selection in fault diagnosis," Expert Systems with Applications, vol. 41, no. 8, pp. 3585–3595, 2014.
[23] F. Benoît, M. van Heeswijk, Y. Miche, M. Verleysen, and A. Lendasse, "Feature selection for nonlinear models with extreme learning machines," Neurocomputing, vol. 102, pp. 111–124, 2013.
[24] M. Sandri and P. Zuccolotto, "Variable Selection Using Random Forests," in: S. Zani, A. Cerioli, M. Riani, and M. Vichi (eds.), Data Analysis, Classification and the Forward Search, Studies in Classification, Data Analysis, and Knowledge Organization, Springer, pp. 263–270, 2006.
[25] G. C. Cawley, N. L. C. Talbot, and M. Girolami, "Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation," in: B. Schölkopf, J. C. Platt, and T. Hoffmann (eds.), Advances in Neural Information Processing Systems, MIT Press, pp. 209–216, 2007.
[26] S. Ma and J. Huang, "Penalized feature selection and classification in bioinformatics," Briefings in Bioinformatics, vol. 9, no. 5, pp. 392–403, 2008.
[27] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.
[28] S. Das, "Filters, wrappers and a boosting-based hybrid for feature selection," in: Proc. 18th International Conference on Machine Learning (ICML-2001), San Francisco, CA, USA, Morgan Kaufmann, pp. 74–81, 2001.
[29] J. M. Cadenas, M. C. Garrido, and R. Martínez, "Feature subset selection Filter–Wrapper based on low quality data," Expert Systems with Applications, vol. 40, pp. 6241–6252, 2013.
[30] I. S. Oh, J. S. Lee, and B. R. Moon, "Hybrid genetic algorithms for feature selection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1424–1437, 2004.
[31] S. I. Ali and W. Shahzad, "A Feature Subset Selection Method based on Conditional Mutual Information and Ant Colony Optimization," International Journal of Computer Applications, vol. 60, no. 11, pp. 5–10, 2012.
[32] S. Sarafrazi and H. Nezamabadi-pour, "Facing the classification of binary problems with a GSA-SVM hybrid system," Mathematical and Computer Modelling, vol. 57, no. 1-2, pp. 270–278, 2013.
[33] J. Zhou, J. Liu, V. Narayan, and J. Ye, "Modeling disease progression via fused sparse group lasso," in: Proc. 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, ACM, pp. 1095–1103, 2012.
[34] S. Perkins and J. Theiler, "Online feature selection using grafting," in: Proc. 20th International Conference on Machine Learning (ICML-2003), Washington DC, USA, AAAI Press, pp. 592–599, 2003.
[35] D. Zhou, J. Huang, and B. Schölkopf, "Learning from labeled and unlabeled data on a directed graph," in: Proc. 22nd International Conference on Machine Learning (ICML-2005), Bonn, Germany, ACM, pp. 1041–1048, 2005.
[36] X. Wu, K. Yu, H. Wang, and W. Ding, "Online streaming feature selection," in: Proc. 27th International Conference on Machine Learning (ICML-2010), Haifa, Israel, Omnipress, pp. 1159–1166, 2010.
[37] R. Diao, M. N. Parthalain, and Q. Shen, "Dynamic feature selection with fuzzy-rough sets," in: Proc. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013), Hyderabad, India, IEEE Press, pp. 1–7, 2013.
[38] G. Forman, "An extensive empirical study of feature selection metrics for text classification," J. Mach. Learn. Res., vol. 3, pp. 1289–1305, 2003.
[39] T. Liu, S. Liu, and Z. Chen, "An evaluation on feature selection for text clustering," in: Proc. 20th International Conference on Machine Learning (ICML-2003), Washington DC, USA, AAAI Press, pp. 488–495, 2003.
[40] J. Bins and B. A. Draper, "Feature selection from huge feature sets," in: Proc. 8th International Conference on Computer Vision (ICCV-01), Vancouver, British Columbia, Canada, IEEE Computer Society, pp. 159–165, 2001.
[41] M. Muštra, M. Grgić, and K. Delač, "Breast density classification using multiple feature selection," Automatika, vol. 53, pp. 1289–1305, 2012.
[42] N. Dessì, E. Pascariello, and B. Pes, "A Comparative Analysis of Biomarker Selection Techniques," BioMed Research International, vol. 2013, article ID 387673, DOI: 10.1155/2013/387673.
[43] H. Abusamra, "A comparative study of feature selection and classification methods for gene expression data of glioma," Procedia Computer Science, vol. 23, pp. 5–14, 2013.
[44] K. Brkić, "Structural analysis of video by histogram-based description of local space-time appearance," Ph.D. dissertation, University of Zagreb, Faculty of Electrical Engineering and Computing, 2013.