
Comparative Analysis of Dimensionality Reduction Techniques

Harshada Khandelwal, Ali Akbar Khan, Amit Kolekar
Instrumentation & Control Engineering
Vishwakarma Institute of Technology (VIT), Pune, India
harshada.khandelwal20@vit.edu, ali.khan20@vit.edu, amit.kolekar20@vit.edu

Abstract— Recent developments in computation and storage devices allow large amounts of data to be stored and processed. The accumulated data are so large and diverse that nothing can be derived from them by inspection alone. Visualizing such high-dimensional data is very difficult, so reducing the space to 2D/3D allows us to plot the data clearly and observe patterns. This paper therefore presents a visual comparison of different types of dimensionality reduction techniques, ranging from some of the classical methods to some of the latest.

Keywords: Dimensionality reduction, feature selection, feature extraction, ML, linear & non-linear dimensionality reduction

I. INTRODUCTION
Dimensionality reduction means mapping data from a high-dimensional to a low-dimensional space while preserving its characteristics. Over the past few years, an enormous amount of digital data has been continuously generated in a variety of application areas, and the amount, heterogeneity, complexity, and dimensionality of the data are all increasing exponentially. High Dimensional Data (HDD) applications [2, 3] have appeared in a variety of fields, including biomedicine, the web, education, medicine, commerce, and social media. Text, digital photos, speech signals, and videos are only a few of the many formats in which fresh HDD is constantly emerging [1]. The variables measured for each observation are referred to as dimensions. Because of their sparseness and redundancy, high-dimensional data present a significant difficulty for data mining and pattern identification. Therefore, a variety of dimensionality reduction techniques are used to study such data.
In practice, handling high-dimensional data is quite challenging because the number of features grows with the dimensionality of the dataset. When an ML model is trained on such high-dimensional data, it overfits and performs poorly. Dimensionality reduction therefore becomes crucial, because real-world data such as speech signals, digital photographs, or fMRI scans have a high number of dimensions and must be shrunk [5].

Figure 1: Block diagram for dimension reduction

2. TYPES OF DIMENSIONALITY REDUCTION TECHNIQUES
Existing dimensionality reduction approaches can be categorized into two main groups:
1. Feature Selection
2. Feature Extraction
Feature selection picks useful features out of the large number of original features, whereas feature extraction constructs new features from the original ones by applying a function. In both cases the task is to find a set of effective features for classification and to optimize classification performance with the reduced feature dimension.
In feature selection some information can be lost, because features are excluded when a subset is chosen. In feature extraction, by contrast, little of the original information in the dataset is lost. The choice between the two families of techniques is influenced by the particular dataset and project [2]. Figure 2 gives a quick look at how the different algorithms are divided between feature selection and feature extraction [4].

Figure 2: Taxonomy of dimensionality reduction


2.1 Feature Selection
A dataset with high dimensionality has a large volume of features that are misleading, duplicated, or irrelevant, which increases the size of the search space and the time complexity. Feature selection is a process that creates a subset of relevant features, leaving out the irrelevant ones, to increase the model's accuracy and to get rid of the noise in the data. To explain this in a more descriptive way, we perform the computations of the different feature selection algorithms on the Big Mart Sales dataset. The dataset covers 1559 products across 10 stores in different cities and has 12 features with 8523 values each. The task is to predict the product sales.

2.1.1 Missing Value Ratio
When working with a large dataset, one's first task is to compute the percentage of missing values per feature and to decide on a threshold. If the missing percentage of a feature is too high, we can consider dropping that feature [6].

Table 1: Missing value ratio
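A minimal sketch of this step, assuming the Big Mart Sales training set is available as a local CSV (the file name bigmart_train.csv is a placeholder, not part of the original experiments):

```python
import pandas as pd

# Hypothetical file name; the Big Mart Sales training set is assumed to be a local CSV.
df = pd.read_csv("bigmart_train.csv")

# Missing value ratio: percentage of missing entries per feature.
missing_ratio = df.isnull().mean() * 100
print(missing_ratio.sort_values(ascending=False))

# Drop every feature whose missing percentage exceeds the chosen threshold
# (20% here, matching the threshold used in the discussion section).
threshold = 20
to_drop = missing_ratio[missing_ratio > threshold].index
df_reduced = df.drop(columns=to_drop)
```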

2.1.2 Low Variance Filter
Variance, in simple terms, measures how much the values of a feature differ from one another. As the name suggests, if a feature carries very little information, i.e. has low variance, it should be discarded: a feature whose values are nearly constant is not going to help improve the model, and discarding it will not affect the target variable [7].

Table 2: Low variance filter
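A minimal sketch of the filter, again using the hypothetical bigmart_train.csv; the variance threshold is illustrative, not a value from the paper:

```python
import pandas as pd

df = pd.read_csv("bigmart_train.csv")  # hypothetical file name, as before

# Variance of each numeric feature; very low variance means very little information.
variances = df.select_dtypes(include="number").var()
print(variances.sort_values())

# Drop features whose variance falls below a chosen threshold (illustrative value).
threshold = 0.01
low_variance_cols = variances[variances < threshold].index
df_reduced = df.drop(columns=low_variance_cols)
```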

2.1.3 High Correlation Filter
Correlation, in simple terms, measures the similarity between two variables. If two variables are highly correlated with each other, keeping both is of no use to the development of the model: the features become redundant and thus reduce the performance of the model [8]. As a general guideline, it is advised to preserve the variables that show a high correlation with the target variable.

Table 3: Correlation of Item_Weight w.r.t. the other features
Table 4: Correlation of Item_Visibility w.r.t. the other features
Table 5: Correlation of Item_MRP w.r.t. the other features
Table 6: Correlation of Outlet_Establishment_Year w.r.t. the other features
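A minimal sketch of the filter, under the same data assumptions; the 0.6 cut-off mirrors the 0.5-0.6 guideline mentioned in the discussion section:

```python
import pandas as pd

df = pd.read_csv("bigmart_train.csv")  # hypothetical file name, as before

# Pairwise correlation of the numeric features (the dependent variable is excluded).
numeric = df.select_dtypes(include="number").drop(columns=["Item_Outlet_Sales"])
corr = numeric.corr().abs()

# Flag one feature out of every pair whose correlation exceeds the threshold.
threshold = 0.6
to_drop = set()
for i, col_a in enumerate(corr.columns):
    for col_b in corr.columns[i + 1:]:
        if corr.loc[col_a, col_b] > threshold:
            to_drop.add(col_b)
df_reduced = df.drop(columns=list(to_drop))
```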
2.1.4 Random Forest
The idea here is to combine multiple decision trees rather than relying on an individual decision tree to determine the final output. In regression models the mean of all the decision tree outputs is taken as the final result [9]. The importance the forest assigns to each feature can then be used to rank the features and keep only the most influential ones.

Figure 3: Histogram representation of feature importance
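A minimal sketch of the feature-importance ranking behind Figure 3, under the same data assumptions; the preprocessing (dropping the identifiers, one-hot encoding) follows the description in the discussion section:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("bigmart_train.csv")  # hypothetical file name, as before

# Drop the non-numeric identifiers and one-hot encode the categorical features.
df = df.drop(columns=["Item_Identifier", "Outlet_Identifier"])
df = pd.get_dummies(df).dropna()

X = df.drop(columns=["Item_Outlet_Sales"])
y = df["Item_Outlet_Sales"]

# Fit a forest and read off the impurity-based feature importances (Figure 3).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```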

2.1.5 Backward Feature Elimination
Backward feature elimination is an iterative method used to find the best subset of features while building Linear Regression or Logistic Regression models. The algorithm works on the principle of elimination: first it selects all 'n' variables in the dataset, trains the model, and computes its performance. Next it eliminates one input feature, trains the model on the remaining 'n-1' features, and calculates the performance again. This process is repeated until no feature can be dropped or there is no change in performance [10][11].

Figure 4: Feature rank after backward feature elimination
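One common way to approximate this procedure is recursive feature elimination (RFE) around a linear model; a minimal sketch, under the same data assumptions and with the 10-feature target used in the discussion section:

```python
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

df = pd.read_csv("bigmart_train.csv")  # hypothetical file name, as before
df = pd.get_dummies(df.drop(columns=["Item_Identifier", "Outlet_Identifier"])).dropna()
X = df.drop(columns=["Item_Outlet_Sales"])
y = df["Item_Outlet_Sales"]

# Repeatedly drop the weakest feature until only 10 remain; retained features get rank 1.
selector = RFE(estimator=LinearRegression(), n_features_to_select=10)
selector.fit(X, y)
ranking = pd.Series(selector.ranking_, index=X.columns)
print(ranking.sort_values())
```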
2.1.6 Forward Feature Selection
Forward feature selection is the exact opposite of backward feature elimination: instead of eliminating features, it tries to find the best feature that will improve the performance of the model. The algorithm starts by training with one feature and calculating its performance, and then keeps adding one more feature at a time. If the performance drops, the new feature is discarded; if the performance increases, the feature is kept. This is repeated until there is no further improvement in performance [12].

Figure 5: F-value of each feature
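Figure 5 ranks the features by their F-values. A minimal sketch of that ranking using scikit-learn's univariate F-test, which is one possible way to drive the selection, under the same data assumptions as above:

```python
import pandas as pd
from sklearn.feature_selection import f_regression

df = pd.read_csv("bigmart_train.csv")  # hypothetical file name, as before
df = pd.get_dummies(df.drop(columns=["Item_Identifier", "Outlet_Identifier"])).dropna()
X = df.drop(columns=["Item_Outlet_Sales"])
y = df["Item_Outlet_Sales"]

# F-value of each feature with respect to the target (the quantity plotted in Figure 5).
f_values, p_values = f_regression(X, y)
f_scores = pd.Series(f_values, index=X.columns).sort_values(ascending=False)
print(f_scores)

# Keep only the features whose F-value exceeds the cut-off used in the discussion (10).
selected = f_scores[f_scores > 10].index
X_selected = X[selected]
```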

2.2 Feature Extraction
Feature extraction derives a new set of features from the original dataset while retaining as much of its information as possible. It thus reduces the number of features for a particular dataset while giving a reliable representation in the new feature space.
To understand this better, we implement some of the common feature extraction algorithms on a bigger dataset and compare their results.
Dataset: The Fashion MNIST dataset contains 10 classes and a total of 70,000 grayscale images of 28x28 pixels showing fashion items (clothing, footwear, and bags). We train our models on 60,000 of these images.
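A minimal loading sketch used as the starting point for the extraction examples below; it assumes the OpenML copy of the dataset (named "Fashion-MNIST" there) is acceptable and flattens each image into a 784-dimensional vector:

```python
from sklearn.datasets import fetch_openml

# Download Fashion MNIST from OpenML; each 28x28 image arrives flattened to 784 values.
X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)

# Keep the 60,000 training images and scale pixel intensities to [0, 1].
X_train = X[:60000] / 255.0
y_train = y[:60000].astype(int)
print(X_train.shape)  # (60000, 784)
```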
2.2.1.1 Factor Analysis (FA)
Factor analysis is an unsupervised ML algorithm used to reduce a huge number of variables to a small number of factors, a process also called factoring the data. Like PCA, FA is a linear method [2]. The method groups variables based on their covariance: variables with high correlation are grouped into one factor and kept apart from the variables of other groups. The goal of FA is to uncover such relations and thus reduce the dimensionality of the dataset.

Figure 6: Correlation of the data decomposed into 3 factors
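A minimal sketch of the decomposition behind Figure 6 with scikit-learn's FactorAnalysis; the subsample size is an assumption to keep the run short, not a setting from the paper:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import FactorAnalysis

X, _ = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X[:10000] / 255.0  # subsample for speed (assumption, not from the paper)

# Project the 784-dimensional images onto 3 latent factors.
fa = FactorAnalysis(n_components=3, random_state=0)
X_fa = fa.fit_transform(X)
print(X_fa.shape)  # (10000, 3)
```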

2.2.1.2 Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a linear method whose dimensionality reduction process embeds the data into a linear subspace of reduced dimension. Although there are other methods for doing this, PCA is by far the most often used (unsupervised) linear method, so we include only PCA in our comparison [3]. It is the oldest and most widely used unsupervised technique; it chooses the most crucial features and seeks to retain as much information as possible from the dataset. The direction that has the maximum variance is called the first principal component, the direction that explains the second largest share of the variance is the second principal component, and so on.

Figure 7: Explained variance ratio of 4 components
Figure 8: Variance explained by 4 components
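A minimal sketch reproducing the explained-variance numbers behind Figures 7 and 8, under the same data assumptions as above:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

X, _ = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X[:60000] / 255.0

# Project onto the first 4 principal components and inspect their explained variance.
pca = PCA(n_components=4)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)        # share of variance per component
print(pca.explained_variance_ratio_.sum())  # total variance captured by 4 components
```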


2.2.1.3 Independent Component Analysis (ICA)
ICA is one of the most widely used algorithms based on information theory. PCA looks for principal components based on their variance, whereas ICA focuses on finding independent components, i.e. components with no dependency between them. The algorithm rests on two main assumptions: the hidden components must be statistically independent, and they must have a non-Gaussian distribution. If the dataset satisfies these two criteria then it is possible to estimate the independent components.

Figure 9: Decomposed ICA components
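A minimal sketch of the decomposition behind Figure 9 with scikit-learn's FastICA, under the same data assumptions as above:

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import FastICA

X, _ = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X[:10000] / 255.0  # subsample for speed (assumption, not from the paper)

# Estimate 3 statistically independent components from the image data.
ica = FastICA(n_components=3, random_state=0, max_iter=500)
X_ica = ica.fit_transform(X)
print(X_ica.shape)  # (10000, 3)
```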

2.2.2.1 ISOMAP
Compared to the previous techniques, isometric mapping (ISOMAP) takes a non-linear approach rather than a linear one like the one employed in PCA. The goal of ISOMAP is to obtain a lower-dimensional representation of the data while maintaining the geodesic distances between points. It is based on spectral theory and is one of the most frequently used nonlinear manifold learning methods for identifying the inherent structure of data. Preserving the geodesic distance while reducing the dimension is Isomap's distinctive feature [1].

Figure 10: Decomposition using 3 components & 5 neighbors
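A minimal sketch of the embedding behind Figure 10, using the 3 components and 5 neighbors quoted in the caption; the subsample is an assumption, since Isomap's neighbourhood-graph construction is expensive on all 60,000 images:

```python
from sklearn.datasets import fetch_openml
from sklearn.manifold import Isomap

X, _ = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X[:5000] / 255.0  # subsample: Isomap builds a neighbourhood graph over all points

# Non-linear embedding that preserves geodesic distances along the manifold.
isomap = Isomap(n_neighbors=5, n_components=3)
X_iso = isomap.fit_transform(X)
print(X_iso.shape)  # (5000, 3)
```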
2.2.2.2 t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-SNE is an unsupervised non-linear dimensionality reduction technique used to map HDD into a low-dimensional space. It works by comparing the separations between distributions, and it works well for datasets with non-linear structure and for visualizing that structure. It is a non-linear algorithm that uses a probabilistic approach, rather than the mathematical projection used in PCA, to visualize high-dimensional datasets. It maps data points using two approaches: a) a local approach, which maps nearby points in the high-dimensional space to nearby points in the embedding, and b) a global approach, which preserves the geometry at all scales. It calculates the conditional probability of similarity between a pair of points in the high- and low-dimensional spaces and tries to minimize the difference between these conditional probabilities.

Figure 11: Components transformed using 3 components & 300 iterations
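A minimal sketch of the embedding behind Figure 11 (3 components, 300 iterations); the subsample is again an assumption made for runtime, and in newer scikit-learn releases the iteration argument is spelled max_iter instead of n_iter:

```python
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE

X, _ = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X[:5000] / 255.0  # subsample for runtime (assumption, not from the paper)

# Probabilistic non-linear embedding into 3 dimensions, run for 300 iterations.
tsne = TSNE(n_components=3, n_iter=300, random_state=0)
X_tsne = tsne.fit_transform(X)
print(X_tsne.shape)  # (5000, 3)
```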
2.2.2.3 Uniform Manifold Approximation and Projection (UMAP)
UMAP is a technique similar to t-SNE that can preserve as much of the local structure, and more of the global structure, of the data, but with a shorter runtime. Some of the key limitations of t-SNE, namely slow computation, loss of large-scale information, and the inability to meaningfully represent very large datasets, are overcome by UMAP. It constructs a high-dimensional graph representation of the data and then optimizes a low-dimensional graph to be as structurally similar as possible, whereas t-SNE moves the embedding point by point from the high- to the low-dimensional space.

Figure 12: Component decomposition using 3 components, 5 neighbors & a minimum distance of 0.3
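A minimal sketch of the embedding behind Figure 12 using the third-party umap-learn package (an assumption about the library used; the parameters follow the caption):

```python
import umap  # pip install umap-learn
from sklearn.datasets import fetch_openml

X, _ = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X[:5000] / 255.0  # subsample for runtime (assumption, not from the paper)

# Graph-based non-linear embedding with 5 neighbours and a minimum distance of 0.3.
reducer = umap.UMAP(n_components=3, n_neighbors=5, min_dist=0.3, random_state=0)
X_umap = reducer.fit_transform(X)
print(X_umap.shape)  # (5000, 3)
```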
2.3 Observation and Discussion
From the missing value ratio (Table 1) we can see that there are not many missing values in our dataset; in fact only two features have any. Setting a threshold of 20% and dropping every variable whose missing percentage exceeds it leaves only Outlet_Size to be dropped.
In the low variance filter (Table 2) we can observe that Item_Visibility has the lowest variance of all the features, so that column is the one to be discarded. We can also set a threshold and drop every variable whose variance falls below it.
In the high correlation filter (Tables 3-6) we can observe that most of the pairwise correlations are negative or very low, so there is no high correlation in our dataset; generally a correlation of about 0.5-0.6 or more is considered grounds to drop one of the variables. We dropped Item_Outlet_Sales from these tables because it is our dependent variable.
For Random Forest we first convert the data into numerical form by one-hot encoding, so the non-numeric identifiers Item_Identifier and Outlet_Identifier have to be dropped. From Figure 3 we can clearly observe that Item_MRP is the factor that affects the outlet sales the most.
In backward feature elimination we set the number of features to select to 10 for our dataset, which means the model keeps eliminating features until 10 are left. Figure 4 shows the features ranked accordingly; the best features are given rank 1.
In forward feature selection we select the variables with an F-value greater than 10; Figure 5 shows the F-value of each feature. Most of the features have an F-value of about 1, from which we conclude that the two variances are equal; the top features from Figure 3 also have F-values higher than 1.
Both forward and backward feature selection are time consuming and computationally expensive, so they are usually applied to datasets with a small number of variables.
Hence, from the feature selection algorithms we can identify the attributes that can be discarded on various grounds and those that have a strong relation with the target parameter, in our case the outlet sales.
In factor analysis (Figure 6) we can observe that the data is decomposed into 3 factors, but it is very hard to observe these factors individually and they are clustered into very large subgroups.
In PCA (Figure 7) we can observe that we are able to explain about 60% of the variance in our dataset using just 4 components. From Figure 8 we can observe that the variance captured by the first principal component is the highest, followed by the others.
Both FA and PCA try to reduce the number of parameters to fewer variables; however, PCA explains the maximum variance, while FA explains the covariance.
ICA (Figure 9) shows the different independent components separated in our dataset.
With ISOMAP (Figure 10) we can observe that the data has been decomposed much better than with the methods before it.
Comparing UMAP and t-SNE, we can observe that the correlation between the components in Figure 12 is lower than in Figure 11; hence UMAP performs much better than t-SNE and separates the groups of similar categories from each other.
Hence, from the performance analysis of the different feature extraction algorithms, we find that UMAP performs better than the others.
We can also observe that in feature extraction the non-linear dimensionality reduction algorithms perform better than the linear ones. The reason is that linear techniques place data points based on their dissimilarity, whereas non-linear techniques use manifolds and place similar data points close together.

2.4 Conclusion
In this paper we reviewed current and state-of-the-art dimension reduction methods by performing an analysis of a dataset. We saw how dimensionality reduction techniques are classified into two categories: feature selection methods select a subset of the existing features without transformation, whereas feature extraction methods reduce the dimensions by creating new features from the input dataset. We see that the newer non-linear dimensionality reduction algorithms give better results than traditional PCA on real datasets. This comparative review tries to run through all the important dimensionality reduction techniques.
References
[1] S. Ayesha, M. K. Hanif and R. Talib, "Overview and Comparative Study of Dimensionality Reduction," Information Fusion, ScienceDirect, 2020.
[2] I. K. Fodor, "A Survey of Dimension Reduction Techniques," California, May 9, 2002.
[3] L. van der Maaten, E. Postma and J. van den Herik, "Dimensionality Reduction: A Comparative Review," Tilburg centre for Creative Computing, TiCC TR, 2009.
[4] M. A. Carreira-Perpinan, "A Review of Dimension Reduction Techniques," Spain, January 27, 1997.
[5] L. van der Maaten, "An Introduction to Dimensionality Reduction," Maastricht, July 2007.
[6] H. Singh, "Missing Value Ratio," 2021. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/04/beginners-guide-to-missing-value-ratio-and-its-implementation/.
[7] H. Singh, "Low Variance Filter," 2021. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/04/beginners-guide-to-low-variance-filter-and-its-implementation/.
[8] "High Correlation Filter," [Online]. Available: https://solegaonkar.github.io/index.html.
[9] W. Koehrsen, "Random Forest," Dec 27, 2017. [Online]. Available: https://towardsdatascience.com/random-forest-in-python-24d0893d51c0.
[10] H. Singh, "Backward Feature Elimination," 2021. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/04/backward-feature-elimination-and-its-implementation/.
[11] J. Grover, Mar 28, 2021. [Online]. Available: https://medium.com/mlearning-ai/short-python-code-for-backward-elimination-with-detailed-explanation-52894a9a7880.
[12] h. singh, "forward feature selection," 2021.
[Online]. Available:
https://www.analyticsvidhya.com/blog/2021/04/forward-
feature-selection-and-its-implementation/.
[13] K. Matthew Mayo, "step forward selection," June
18, 2018[Online].Available:
https://www.kdnuggets.com/2018/06/step-forward-
feature-selection-python.html.
[14] Archdeacon, " Correlation and Regression
Analysis: A Historian’s Guide. Univ of Wisconsin Press.,"
[Online]. Available:
https://www.statisticshowto.com/probability-and-
statistics/f-statistic-value-test/.
[15] "Factor analysis," 2019 . [Online]. Available:
https://www.datacamp.com/tutorial/introduction-factor-
analysis.
[16] avcontentteam, "PCA," 2016. [Online]. Available:
https://www.analyticsvidhya.com/blog/2016/03/pca-
practical-guide-principal-component-analysis-python/.
[17] S. Talebi, "ICA," 2021. [Online]. Available:
https://towardsdatascience.com/independent-component-
analysis-ica-a3eba0ccec35.
[18] F. Hashmi, "unsupervised ML," [Online].
Available: https://thinkingneuron.com/data-science-
interview-questions-for-it-industry-part-4-unsupervised-
ml/#ICA.
[19] S. Dobilas, "ISOMAP," 2021. [Online]. Available:
https://towardsdatascience.com/isomap-embedding-an-
awesome-approach-to-non-linear-dimensionality-
reduction-fc7efbca47a0.
[20] Saurabh.jaju, "t-SNE," 2017. [Online]. Available:
https://builtin.com/data-science/tsne-python.
[21] Y. VERMA, "how to use t sne for dimensionality
reduction," 2022. [Online]. Available:
https://analyticsindiamag.com/how-to-use-t-sne-for-
dimensionality-reduction/.
[22] s. dobilas, "UMAP," 2021. [Online]. Available:
https://analyticsindiamag.com/how-to-use-t-sne-for-
dimensionality-reduction/.
[23] F. Hashmi, "How to implement independant
component analysis in python," [Online]. Available:
https://thinkingneuron.com/how-to-perform-independent-
component-analysisica-in-python/.
