Icsp2016 RM

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/315512448
ICA and PCA integrated feature extraction for classiﬁcation
Conference Paper · November 2016

DOI: 10.1109/ICSP.2016.7877996
CITATIONS READS
17 344
2 authors:
Md Shamim Reza Jinwen Ma

Peking University Peking University
6 PUBLICATIONS 64 CITATIONS 237 PUBLICATIONS 3,337 CITATIONS
SEE PROFILE SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Convergence propertes of the EM algorithms for finite mixtures View project
Content-based Image Retrieval and Classification View project
All content following this page was uploaded by Jinwen Ma on 20 June 2022.
The user has requested enhancement of the downloaded file.

ICA and PCA Integrated Feature Extraction for
Classification
Md Shamim Reza, Jinwen Ma
Department of Information Science, School of Mathematical Sciences & LMAM
Peking University, Beijing, 100871, P.R. China
Email: shamim@pku.edu.cn, jwma@math.pku.edu.cn
Abstract—Accurate feature extraction plays a vital role in drawbacks of PCA is that these extracted components are not
the fields of machine learning, pattern recognition and image always independent and invariant under transformation, which
processing. Feature extraction methods based on principal com- may contradict to many supervised classification assumptions
ponent analysis (PCA), independent component analysis (ICA),
and linear discriminant analysis (LDA) are capable of improving [24].
the performances of classifiers. In this paper, we propose two fea- Another linear transformation method that commonly used
tures extraction approaches, which integrate with the extracted in classification system is LDA, proposed by Fisher. It uses
features of PCA and ICA through some statistical criterion. The class label to compute the between class and within class
performances of the proposed feature extraction approaches are matrix, seeks the directions along which the classes are best
evaluated on simulated data and three public data sets by using
cross-validation accuracy of different classifiers that found in separated [8]. However, LDA is a very powerful and useful
statistics and machine learning literature. Our experiment result method for feature extraction, it requires enough training
shows that integrated with ICA and PCA feature is more effective sample in each class and the application of LDA is limited
than others in classification analysis. when classes have significant difference between means [23],
[25].
I. I NTRODUCTION
Recently, ICA is found to be very useful and effective tech-
With the rapid growth of modern technology and the wide nique helps to extract representative features in pattern classi-
application of internet, the amount of data information has fication. It was originally proposed by Jutton and Herault [9]
increased greatly and almost doubles approximately in two for solving blind source separation (BSS) problem. Although
years [1]. Big data analysis has become a popular issue nowa- ICA was initially developed to solve the BSS problem, past
days and arousing interest of the researchers in this research studies have shown that ICA can serve as an effective feature
area witnessed over the last few decades [2]-[8]. Specially in extraction method of improving the classification performance
pattern classification, the existence of too much information in both supervised classification [10]-[14] and unsupervised
may often reduce the performances of the classifier. It may also classification [15]-[17]. It has also been found that ICA may
cause a classification algorithm to overfit to training samples help to improve the performance of various classifiers, such
and generalize poorly to new samples [3]. as SVM, artificial neural networks, decisions trees, hidden
Intrinsically, good classification results may be acquired Markov models, and the naı̈ve Bayes classifier [10]-[17]. In
from a set of representative features constructed from the pattern classification, a large number of paper have been pro-
knowledge of domain experts. When such expert knowledge is posed using PCA, LDA and ICA, as direct feature extraction in
not available, general feature extraction and feature selection the field of face recognition, signal analysis and UCI machine
techniques seem to be very beneficial to remove redundant or learning database [18]-[23].
irrelevant features. Although ICA and PCA can directly be used for feature
On the feature extraction for a classification problem, there extraction, they don’t guarantee to generate useful information
may exist irrelevant or redundant features that interfere the individually [13], [18]. This paper integrated with PCA and
learning process and thus lead to a wrong result [5]. Moreover, ICA features to generate a more representative feature in
when the dimension of feature space is so large that it requires improving classification performances. Among the numerous
many instances to find out the association among features, application of ICA, it has a limitation to sort their compo-
causing slow training and testing in learning algorithm. Some nents. Principal components (PC’s) are sorted according to the
of the classifiers like support vector machine (SVM) can eigenvalues, but independent components (IC’s) has no certain
tolerant extra redundant features, but performance of other rules to order these components. Past studies have shown that
classifiers are naturally resistant to non-informative predictors. non-gaussian IC’s are sometime interesting in classification
Tree and rule based models, naı̈ve Bayes, k-nearest neighbor- problem [34], [36]. The standardized fourth central moment
hood etc. deteriorates when extra irrelevant features are added kurtosis is earlier used as a measure of non-gaussanity as
[6], [16]-[18]. well as to sort IC’s [36]. Since the conventional measures of
The most common and useful feature transformation is kurtosis are sensitive to outliers, we consider quantile measure
PCA proposed by Pearson in the early 20th century [7]. The of kurtosis instead of classical kurtosis in this paper.
978-1-5090-1345-6/16/$31.00 ©2016 IEEE 1083 ICSP2016

The remaining of this paper is organized as follows. Section where γk is called the percentage retained in the data
II gives a background review of PCA, LDA, and ICA. Section representation. In feature extraction by using PCA, we should
III describes our proposed methods. Section IV illustrates the retain the extracted principal components that can explain at
data sets and experimental results and final section draws the least 80% of the total variation.
conclusion.
II. F EATURE E XTRACTION : PCA, LDA, AND ICA B. Linear Discriminant Analysis (LDA)
An array of attributes to classify the output class are LDA is simple and powerful feature extraction method for
called features. To select subset of features referred as feature classification problem while preserving as much of the class
selection, whereby feature extraction refers to a process, to discriminatory information as possible. It requires the class
transformed data space into a feature space, in which the label of each observation and takes into consideration the
original data is represented by a reduced number of effective scatter within-classes but also the scatter between-classes.
features, retaining most of the intrinsic information as much Consider a set of observations xi within C classes, LDA can
as possible. Some common feature extraction methods are generalize C-class problem gracefully and it is capable pro-
describe below: jection is from a N-dimensional space onto (C-1) dimensions.
Let an observation may come from the i-th class of C classes
A. Principal Component Analysis (PCA) with each class containing ni observations, the generalization
The central idea of PCA is to transforms data linearly of the within-class scatter matrix is
into a low-dimensional subspace by obtaining the maximized
C X
variance of the data. The resulting vectors are an uncorrelated X t
orthogonal basis set. The principal components are orthogonal SW = (x − µi ) (x − µi )
i=1 x∈wi
because they are the eigenvectors of the covariance matrix,
which is symmetric. The generalization of between-class scatter matrix is
Mathematically, considering k observations in a data set, C
each observation is n-dimensional by ignoring the class label.
X t
SB = Ni (µi − µ)(µi − µ)
Let x1 , x2 , ..., xk ∈ <n The following steps of computing i=1
PCA.
Where µ is the mean vector of all observation and ST = SW +
• Calculate the m-dimensional mean vector µ by
SB is the total scatter matrix. For the (C-1) class problem we
1X
k will seek (C-1) projection vectors Wi which can be arranged
µ= xi by columns into a projection matrix W = [w1 |w2 |...|wC−1 ],
k i=1
so that
• Compute the estimated covariance matrix S for the ob-
served data by yi = wiT x ⇒ y = W T x
k Since the projection is not scalar (it has C-1 dimensions),
1X t
S= (xi − µ)(xi − µ) we use the determinant of the scatter matrices into the criterion
k i=1
function, which then becomes
• Compute eigenvalues and corresponding eigenvectors of
S, where λ1 ≥ λ2 ≥ ... ≥ λk ≥ 0 W T SB W
|J (W )| =
• From k original variables, calculate k principal compo- W T SW W
nents by Find W that maximize J(W ) to solving the eigenval-
y1 = a11 x1 + a12 x2 + ... + a1k xk ue/eigenvector system as given below
SB W = λSW W
y2 = a21 x1 + a22 x2 + ... + a2k xk
LDA is supervised can extract (C-1) features while PCA
...
is unsupervised can extract r (rank of data) principles
yk = ak1 x1 + ak2 x2 + ... + akk xk features. In supervised learning, LDA is more efficient
feature extraction method than PCA because its extracted
yk ’s are uncorrelated (orthogonal). y1 explains as much as features use the class information. However, it is assumed
possible of original variance in data set, y2 explains as much that the distributions of samples in each class are normal and
as possible of remaining variance etc. homoscedastic. Therefore, it may be difficult to find a good
In general, a few larger eigenvalues dominate the others in and representative feature space if this assumption is violated
the most practical data sets, that is [25]. Furthermore, LDA may fail not only in heteroscedastic
λ1 + λ2 + ... + λm cases and sometimes even in homoscedastic cases [23], [33].
γk = ≥ 80%
λ1 + λ2 + ... + λm + ... + λk
1084
C. Independent Component Analysis (ICA) In past, PCA, ICA and LDA has been studied individually
ICA is a relatively new statistical and computational tech- in various supervised and unsupervised classification problem.
nique used to discover hidden factors (sources or features) In the next section, we will discuss about our newly propose
from a set of measurements or observed data such that the feature extraction methods which derive from PCA and ICA.
sources are maximally independent.
Mathematically, given the observed variables x(t) = III. P ROPOSED APPROACH
x1 (t), x2 (t), ..., xn (t) is composed of linear combination
of original and mutually independent source s(t) = Although, the performances of PCA and ICA are powerful
s1 (t), s2 (t), ..., sn (t) at time point t such that it expressed as in the field of data visualization and blind source separation
[28], [29]. For classification problem, feature extraction meth-
x(t) = As(t) (1) ods PCA and ICA are not as good as expected [8], [18].
To overcome the problem, we propose two feature extraction
where A is a mixing matrix with full rank. In principles of methods, which integrate with ICA and PCA to represent
ICA, Eq. (1) is often written as significant feature sets for classification problem.
The idea of the proposed feature extraction is very simple.
y = Wx (2) In the first approach, PCA apply to the original data, we then
where W = A−1 is the demixing matrix and y = retain those PC’s that can explain at least 80% of the total
y1 , y2 , ..., yn denotes the independent component. The task is variation, then standard ICA algorithm applying on extracted
to estimate the demixing matrix and independent components PC’s. IC’s are ordered using quantile measure of kurtosis. This
only based on the mixed observations, which can be done by method named as ICA on PCA (IPCA). In the propose method,
various ICA algorithms like fastICA, JADE, Infomax etc. the component containing most negative kurtosis (kurtosis <
In the principles of ICA estimation, extracted components 0) considered as IC1, second most negative kurtosis treated as
are non-gaussian and independent. Kurtosis (β1 ) is one of IC2 and so on.
the ways to measure non-gaussianity. The gaussian IC’s has In classification analysis, sub-gaussian distributions are
kurtosis value equal to 0, sub-gaussian, β1 ≤ 0, and for super more interesting, can indicate a cluster structure or at least
gaussian β1 ≥ 0. Classical measure of kurtosis is defined as a uniformly distributed factor. Thus the components with
the most negative kurtosis can give us the most relevant
4
E(x − µ) µ4 information for classification [34], [36].
β1 = 2 − 3 = −3 (3) In our second approach, ICA and PCA have applied to
2 σ4
E(x − µ) the original data individually. These extracted IC’s, and
Since the conventional measures of kurtosis are essentially PC’s are ordered by using quantile measure of kurtosis
based on sample averages, they are also sensitive to outliers. and eigenvalues, respectively. Both of the ordered extracted
Moreover, the impact of outliers is greatly amplified in the features of ICA and PCA are then integrated in such a way,
conventional measures of kurtosis due to the fact that they are that they contains most sub-gaussian IC’s and those PC’s that
raised to the third and fourth powers [32]. To overcome the can explain at least 80% variability of the original data. This
illustrated problem, we take an attempt to use robust measure proposed approach is named as IC-PC feature extraction.
of kurtosis in ICA. Moors (1988) proposed a quantile kurtosis Fig.1 shows the flow chart of implementing IPCA and IC-PC
alternative to β1 . The quantity of moors kurtosis is on the four database.
(E7 − E5 ) + (E3 − E1 )
Kurtosis = (4) IV. R ESULTS FROM E XPERIMENT
(E6 − E2 )
In this work, we evaluate performance of our proposed
where Ei is i-th octile; that is Ei = F −1 ( 8i ) . For gaussian feature extraction approaches on a simulated dataset, two sets
independent components Moor’s quantile kurtosis is equal to of UCI [30] database, namely Wisconsin breast cancer, and
1.23. One advantage of the quantile measures of kurtosis is wine data, and one set of data collected from Australian crabs
that it doesn’t depend on first moment, and second moment. [35].
So it is more robust than classical measure of kurtosis. In order to test the effectiveness of the proposed feature
In pattern classification, ICA is useful as a dimension extraction methods , we select the most influential number of
preserving transformation because it produces statistically original features by using random forest algorithm (FS-RFA),
independent components, and ICA has directly been used for which is available in R cran package, FSelector [31]. In FS-
feature extraction [26]-[29]. In earlier studies, Kwak et al. [17] RFA, first employs a weight function to generate weights for
also showed that ICA outperforms as feature extraction method each feature. To select importance of weight, the algorithm use
for face recognition than PCA and LDA. More recently, Fan et mean decrease accuracy, besides it selects optimum number
al. [13], [14] presented sequential feature extraction by using of subset feature through ranking approach of chi-square and
class-conditional independent component analysis for naı̈ve information gain. Finally, the process sorts top most influential
Bayes classification of microarray data. original subset of features.
1085
Fig. 1. Flow chart for implementing IPCA and IC-PC feature extraction.
As classifier system, we have used support vector machine TABLE I

(SVM), naı̈ve Bayes, decision Tree (C5.0), and neural network R ESULTS FOR S IMULATED DATA (PARENTHESES ARE THE NUMBER OF
PC ’ S & IC ’ S RESPECTIVELY )
(MLP) classifier. In our experiment, we have conducted 10-
fold cross validation in getting the performance as follows: Classification Accuracy (%)
The observations have been divided randomly into 10 disjoint Features
SVM Naı̈ve Bayes C5.0 MLP
sets (fold). For each experiment, 9 of these sets are used as Original 62.5 66.5 69.0 51.5
training data, while 10th set observation is reserved for testing. FS-RFA 64.0 69.0 65.0 56.0
The experiment is repeated 10 times in such a way that every PCA 65.0(3) 65.5(3) 68.5(3) 56.5(3)
fold appears once as a part of a test set. LDA 61.5 66.5 66.0 51.5
To show the effectiveness of our method, we have compared ICA 62.0 67.5 65.0 52.0
the performances of the proposed methods IPCA and IC-PC IPCA 66.0(3) 69.0(3) 65.5(3) 56.0(3)
with PCA, LDA, ICA, and same number of original feature IC-PC 66.0(3,1) 72.0(2,2) 74.0(3,4) 59.5(3,4)
selected by random forest algorithm (FS-RFA).
All the experiments conduct here are implemented in
R-studio software with different R-cran packages of machine and IPCA feature extraction methods improvement in all the
learning [31], and run on an Intel (R) core (TM) i7 with 3.40 classification performances in a certain degree.
GHz CPU and 16 GB RAM. In naı̈ve Bayes classification, fuse of first 2 PC’s and two
sub-gaussian IC’s performs well. Similarly in decision tree,
first 3 PC’s combined with 4 IC’s give maximum accuracy of
A. Simulation Study
74%. It is likely due to the fact that these features extraction
In the simulation study, we have generated 200 observations methods could extracted a set of significant features which are
from each of the four well-known probability distributions, independent as well as can explain the total variation of the
normal (n=200, µ = 0, σ 2 = 1), student’s t (n=200, degrees original data. Our proposed methods are easy to implement
of freedom, v=1), chi-square (n=200, d.f=1), and standard and able to reduce significant number of features as well as
uniform distribution (n=200). computational complexity.
To analyze the simulated data for classification problem, we The performance of LDA is not so well in simulated
have divided each 200 observations into four column in such data, because it assumes the distribution of samples in each
a way that each column has 50 observations. An additional class as normal and homoscedastic. Our simulated data
column has also been constructed to insert class label. includes normal and three other classes (student’s t, chi-
As for example of normal distribution, the generated 200 square and uniform), which may influence LDA performances.
observations have been divided into four columns, where each
column contains 50 observations, then each of the first 50
observations have been labeled by 1 in the additional column. B. Wisconsin Breast Cancer Data
Similarly for student’s t, 200 generated observations were This breast cancer database is obtained from the University
divided into four columns and inserted the class label 2 and of Wisconsin Hospitals, Madison from Dr. William H. Wol-
so on. Finally, we have combined the observations to obtain berg, assessed biopsies of breast tumours for 699 patients up
a data frame that contains 5 features including one class label to 15 July 1992 [30]. The data consists of nine attributes that
attribute each with 200 observations. has been scored on a scale of 1 to 10, and the two classes of
In simulated data, first 3 PC’s can explain 84.23% of the outcomes: benign and malignant. There are 16 missing value
total variation, then we have applied ICA algorithm on 3 PC’s in the attribute number six. In our experiment we replace these
to construct IPCA feature. To make IC-PC feature, we have missing values with the mode of corresponding attributes.
integrated 3 PC’s with non-gaussian independent components In this data, first 3 and 4 PC’s can explain 80.02% and
measured by quantile kurtosis. Table I shows that IC-PC 85.21% of the total variation, respectively. The performance
1086
of IPCA is approximately same while comparing with other TABLE II
methods for the first four PC’s, where IC-PC (2,1), that is R ESULTS FOR B REAST C ANCER DATA (PARENTHESES ARE THE NUMBER
OF PC ’ S & IC ’ S RESPECTIVELY )
fuse of first two PC’s and one most negative IC’s gives better
results for SVM and naı̈ve Bayes classification. Surprisingly, Classification Accuracy (%)
only one PC and one IC performs well in decision tree Features
SVM Naı̈ve Bayes C5.0 MLP
classification. The performances of LDA almost same as Original 96.56 95.99 94.27 96.85
IC-PC for this problem. Table II shows that IPCA and IC-PC FS-RFA 96.28 96.28 94.13 96.56
feature extraction methods can significantly improvement in PCA 96.85(3) 96.56(3) 96.28(2) 96.85(4)
the classifier performances of SVM, decision tree and MLP LDA 96.99 97.13 96.42 96.85
based on 10-fold cross validation. ICA 95.85 94.57 91.41 96.71
IPCA 96.99(4) 95.99(3) 94.99(4) 96.99(4)
IC-PC 96.85(2,1) 96.71(2,1) 97.28(1,1) 96.85(4,1)
C. Wine Data
Data from the machine learning repository [30]. A chemical
analysis of 178 Italian wines from three different cultivars TABLE III
R ESULTS FOR W INE DATA (PARENTHESES ARE THE NUMBER OF PC ’ S &
yielded 13 measurements. The dataset consists of 13 numerical IC ’ S RESPECTIVELY )
variables and three classes, where number of instances are 59
in class-1, 71 in class-2, and 48 in class-3. This dataset is Classification Accuracy (%)
Features
often used to test and compare the performances of various SVM Naı̈ve Bayes C5.0 MLP
classification algorithms. Original 98.33 97.22 90.52 97.75
For this data, first 5 PC’s and 7 PC’s can explain 80% FS-RFA 97.78 97.78 91.66 98.33
and 90.06% of the total variation, respectively, while original PCA 97.78(5) 97.78(7) 97.07(5) 97.22(5)
attributes number is 13. The classification accuracy rates for LDA 98.90 98.99 96.63 97.75
the four classifier are displayed in table III. It can be seen ICA 97.78 89.38 76.96 98.30
that both IPCA and IC-PC perform better than others, which IPCA 97.78(5) 89.38(9) 96.07(4) 97.75(5)
demonstrates the effectiveness of the proposed approach. IC-PC 98.90(5,9) 98.90(6,9) 94.97(5,9) 97.75(1,9)
D. Crabs Data TABLE IV

R ESULTS FOR C RABS DATA (PARENTHESES ARE THE NUMBER OF PC ’ S &
The crabs data frame has 8 attributes and 200 observations, IC ’ S RESPECTIVELY )
describing 5 morphological measurements on 50 crabs each of
two colour forms and both sexes, of the species Leptograpsus Classification Accuracy (%)
Features
variegatus collected at Fremantle, W. Australia [35]. It has one SVM Naı̈ve Bayes C5.0 MLP
index variable, which has been excluded from our analysis. Original 94.50 63.00 89.50 99.00
For crabs data, first three PC’s can explain 95.77% of the FS-RFA 96.50 63.50 89.50 99.50
total variation. We have then compared our feature extrac- PCA 99.00(3) 97.00(3) 97.00(3) 100.00(3)
tion approaches with PCA, LDA, ICA, and same number LDA 99.50 99.00 99.50 100.00
of original feature selected by FS-RFA. Table IV shows ICA 99.50 99.00 98.50 100.00
the comparative performances of classifiers. It can be seen IPCA 99.50(3) 99.50(3) 99.00(3) 100.00(3)
that in most of the cases, first three PC’s lead significant IC-PC 99.50(3,1) 99.50(3,1) 99.50(3,1) 100.00(3,1)
role to IPCA and IC-PC in terms of classification accuracy.
The performances of all classifiers have increased and give
maximum accuracy when we fuse feature of first three PC’s classification problem. We have empirically compared our
and one most negative IC. proposed methods with others on simulated and real datasets.
In naı̈ve Bayes classification of crabs data, we found Our analysis indicates that while extracting features, PCA and
that PCA, ICA, IPCA and IC-PC features substantially ICA do not perform well individually. When we fuse the
change the accuracy rate than original features. One of the features obtained from PCA and ICA, could able to generate a
possible reasons in improving classification accuracy by our representative feature. In most of the cases, our experimental
proposed methods is that, they could extracted features that results have shown that IPCA and IC-PC improve the classi-
are independent, which coincide the principles of naı̈ve Bayes fication performances substantially. The performances of the
classification assumptions. proposed approaches clearly show the necessity of dimension-
ality reduction and the integrated features by using PCA and
ICA in the field of data mining and pattern classification. Since
V. C ONCLUSION our proposed feature extraction methods have no relation with
In this paper, we have proposed two new approaches of the supervised information, in future, we can investigate the
feature extraction, derived from ICA and PCA, for supervised usefulness of these methods on clustering analysis.
1087
ACKNOWLEDGMENT [18] N. Kwak, C. Choi, Feature extraction based on ICA for binary classifi-
cation problems. IEEE Transactions on Knowledge and Data Engineering
This work was supported by the Natural Science Foundation 15, 1374- 1388, 2003.
of China for Grant 61171138. [19] X. Chen, Z. Jing, G. Xiao, Nonlinear fusion for face recognition using
fuzzy integral. Communications in Nonlinear Science and Numerical
Simulation 12, 823-831, 2007.
R EFERENCES [20] M.R. Boutell, J. Luo, Beyond pixels: Exploiting camera metadata for
photo classification. Pattern Recognition 38, 935-946, 2005.
[1] W. J. Frawley, G. Piatetsky-Shapiro, and C. J. Matheus, Knowledge [21] V. Sanchez-Poblador, E. Monte-Moreno, J. Sol-Casals, ICA as a prepro-
discovery in databases: An overview, AI Magazine, no. 3, pp. 5770, 1992. cessing technique for classification. Lecture Notes in Computer Science
[2] U. M Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Ad- (LNCS) 3195, 1165-1172, 2004.
vances in Knowledge Discovery and Data Mining, American Association [22] J. Fortuna, D. Capson, Improved support vector classification using PCA
for Artificial Intelligence and The MIT Press, 1996. and ICA feature space modification. Pattern Recognition 37, 1117 1129,
[3] V. S. Cherkassky and I. F. Mulier, Learning from Data, chapter 5. John 2004.
Wiley & Sons, 1998. [23] J. Oh, N. kwak, M. Lee, C.H. Choi, Generalized mean for feature
[4] I. T. Joliffe, Principal Component Analysis, Springer-Verlag, 1986. extraction in one-class classification problems, Pattern Classification, 46,
[5] K.J. Cios, W. Pedrycz, and R.W. Swiniarski, Data mining methods for 3328-3340, 2013.
knowledge discovery, chapter 9, Kluwer Academic Publishers, 1998. [24] A .R. Webb, Statistical Pattern Recognition, 2nd ed, John Wiley and
[6] G. H. John, Enhancements to the data mining process, Ph.D. thesis, Sons, 2002.
Computer Science Dept., Stanford University, 1997. [25] M. Zhu, A. Martinz, Subclass discriminant analysis, IEEE Transections
[7] K. Pearson, LIII. On lines and planes of closest fit to systems of points in on pattern Analysis and machine Intellegence, 28(8), 1274-1286, 2006.
space, The London, Edinburgh, and Dublin Philosophical Magazine and [26] A. Hyvarinen, E. Oja, P. Hoyer, and J. Hurri, Image Feature Extraction
Journal of Science, vol. 2, no. 11, 559-572, 1901. by Sparse Coding and Independent Component Analysis, Proc. 14th Intl
[8] A. M. Martinez and A. C. Kak, PCA Vesus LDA, IEEE Trans. on Pattern Conf. Pattern Recognition, Aug. 1998.
Analysis and Machine Intelligence, vol. 23, no. 2, 228-233, 2001. [27] A.D. Back and T.P. Trappenberg, Input Variable Selection Using Inde-
[9] C. Jutten, J. Herault, Blind separation of sources, part 1: an adaptive pendent Component Analysis, Proc. Intl Joint Conf. Neural Networks,
algorithm based on neuromimetic architecture. Signal Processing 24, 1- July 1999.
10, 1991. [28] H.H. Yang and J. Moody, Data Visualization and Feature Selection:
[10] X. Zhang, V. Ramani, Z. Long, Y. Zeng, A. Ganapathiraju, J. Picone, New Algorithms for Nongaussian Data, Advances in Neural Information
Scenic beauty estimation using independent component analysis and Processing Systems, vol. 12, 2000.
support vector machines. In: Proceedings of IEEE Southeastcon, pp. 274- [29] T-Y. Yang and C-C. Chen, Data Visualization by PCA, LDA, and ICA,
277, 1999. ACEAT-493, 2015.
[11] N. Kwak, C.H. Choi, J.Y. Choi, Feature extraction using ICA. Lecture [30] University of California, Irvine (UCI) Machine Learning Repository.
Notes in Computer Science, 2130, 568-573, 2001. http : //www.ics.uci.edu/ mlearn/.
[12] S. N. Yu, K.T. Chou, Integration of independent component analysis [31] The Comprehensive R Archive Network. https : //cran.r −
and neural networks for ECG beat classification. Expert Systems with project.org/
Applications 34, 2841-2846, 2008. [32] T.H. Kim, H. White, On more robust estimation of skewness and
[13] L. Fan, K.L. Poh, P. Zhou, A sequential feature extraction approach kurtosis: simulation and application to the S&P500 index, Department
for naive bayes classification of microarray data, Export System with of Economics, UCSD , 2003.
Application, 36, 9919-9923, 2009. [33] D. Tao, X. Li, X. Wu, S. Maybank, General averaged divergence
[14] L. Fan, K.L. Poh, P. Zhou, Partition-conditional ICA for bayesian analysis, in: Proceedings of IEEE International Conference on Data
classification of microarray data, Export System with Application, 37, Mining, 2007.
8188-8192, 2010. [34] M.S. Reza, M. Nasser, and M. Shahjaman, An Improved Version of
[15] S.I. Lee, S. Batzoglou, Application of independent component analysis Kurtosis Measure and Their Application in ICA, International Journal of
to microarrays. Genome Biology 4, R76, 2003. Wireless Communication and Information Systems, Vol 1, No 1, 2011.
[16] A. Kapoor, T. Bowles, J. Chambers, A novel combined ICA and [35] N.A. Campbell, and R.J. Mahon, A multivariate study of variation in
clustering technique for the classification of gene expression data. In: two species of rock crab of genus Leptograpsus. Australian Journal of
Proceedings of IEEE International Conference on Acoustics, Speech, and Zoology 22, 417-425, 1974.
Signal Processing, Vol. 5, pp. 621 624, 2005. [36] M. Scholz, Y. Gibon, M. Stitt and J. Selbig, Independent component
[17] N. Kwak, Feature extraction for classification problems and its appli- analysis of starch deficient pgm mutants, Proceedings of the German
cation to face recognition. Pattern Recognition 41, 1701-1717, 2008. conference on Bioinformatics. pp.95-104, 2004.
1088
View publication stats

Icsp2016 RM

Uploaded by

Copyright:

Available Formats

Icsp2016 RM

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Icsp2016 RM

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

ICA and PCA integrated feature extraction for classiﬁcation

Conference Paper · November 2016

Md Shamim Reza Jinwen Ma

SEE PROFILE SEE PROFILE

Convergence propertes of the EM algorithms for finite mixtures View project

Content-based Image Retrieval and Classification View project

The user has requested enhancement of the downloaded file.

978-1-5090-1345-6/16/$31.00 ©2016 IEEE 1083 ICSP2016

As classifier system, we have used support vector machine TABLE I

D. Crabs Data TABLE IV

View publication stats

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.