Comparartive
Comparartive
Harshada Khandelwal
Ali Akbar Khan Amit Kolekar
Instrumentation & Control
Instrumentation & Control Instrumentation & Control
Engineering
Engineering Engineering
Vishwakarma Institute of
Vishwakarma Institute of Vishwakarma Institute of
Technology(VIT)
Technology(VIT) Technology(VIT)
Pune, India
Pune, India Pune, India
harshada.khandelwal20@vit.edu
ali.khan20@vit.edu amit.kolekar20@vit.edu
2.2.1.3 ICA
2.2.1.2 Principal Component Analysis (PCA)
ICA is one of the most widely used algorithms based on
The dimensionality reduction process used by Principal information- theory. PCA looks for principal components
Components Analysis (PCA), which is a linear method, based on their variance whereas ICA focuses on finding
involves embedding the data into a linear subspace with the independent components i.e no dependency between
reduced dimensions. Although there are other methods for two components. The algorithm works on the basis of two
doing this, PCA is by far the most often used main assumptions: the hidden components must be
(unsupervised) linear method. Therefore, we just include statistically independent and second they should have
PCA in our comparison. [3] The oldest and most used non-gaussian distribution. If the dataset passes these two
unsupervised technique, it chooses the most crucial criteria then it is possible to estimate the independent
features and seeks to extract as much data as possible from components.
the dataset. The feature that has the maximum variance is
called the first principal component. The feature that
explains the second maximum variance is considered the
second principal component and it goes on.
2.2.2.2 t- Distributed(t-SNE)
Reduce HDD in low dimensional space by using the Figure 12: component decomposition using 3 factors,5
unsupervised NLDRT technique known as t-SNE. It works neighbors & 0.3 as min distance
by comparing the separations between distributions. It
works well for datasets with non-linear structure and for
2.3 Observation and discussion
visualizing that structure. It is a non-linear algorithm that
uses a probabilistic approach rather than a mathematical In missing value ratio refer (Table 1) we can incur that
one like the ones used in PCA to visualize high there aren’t too many missing values in our rather than just
dimensional datasets. It maps data points using two two in our dataset. We can set a threshold of 20% to drop
approaches a) local approach which maps nearby points to down any variable which has missing values less than that
nearby points & b) global approach which preserves the after that the Outlet_Size will be dropped down.
geometry at all scales and calculates the conditional In the Low Varian filter as we can observe, refer (Table 2)
probability of similarity between a pair of points in High that Item_visibility has the lowest variance compared to
and low dimensional space and tries to minimize this others hence that column needs to be neglected. We can
difference of conditional probability. also set a threshold value for neglecting variables having
low variance than the threshold.
In High correlation filter observing refer (Table (3,4,5,6))
we can observe that mostly the correlation with one
another is negative or very low hence we don’t have any
high correlation in our dataset, generally if the correlation
is between 0.5-0.6 it is considered to drop those variables,
we have dropped Item_Outlet_Sales as it is our dependant
variable.
In Random Forest, we first convert it into numerical data
by one- hot., hence we need to drop the non-numeric data
like Item_Identifier and Outlet_Identifier. Refer (figure 3)
we can clearly observe that the Item MRP is the factor that
Figure 11: components transformed using 3 factors & affects the post to the Outlet Sales
300 iterations In Backward Feature Elimination we are going to select
the number of features to select as 10 for our dataset, it
2.2.2.3 Uniform- MAP means that the model is going to train itself until 10
features has left. Refer (figure 4) corresponds to the
features based on their ranking. And the best features are close together.
termed as rank 1.
In Forward feature selection For our dataset we are going 2.4 Conclusion
to select variables having F-value greater than 10, refer
In this paper. We reviewed current and state of the art
(figure 5) describes the F-value of each feature. As we can
dimension reduction methods by performing analysis of a
observe that most of the features have an F value of 1 so
dataset. We learned about how dimensionality reduction
we conclude that the two variances are equal. We can also
techniques are classified into two categories: Feature
observe that the top features refer (figure 3) also have a F
selections select a subset of features irrespective of
value higher than 1.
transformation. Whereas in extraction of features,
Both forward and backward feature selection methods and dimensions are reduced which is obtained by creating new
time consuming and computationally expensive hence sets(features) from the input datasets. We see the new non-
they are performed on datasets having small number of linear dimensionality reduction algorithms give better
variables results than traditional PCA on real datasets. This
Hence from the Feature Selection algorithms we can comparative review tries to run through all important
observe the attributes which can be discarded based on reduction techniques of dimensionality.
various factors and those which have a high relation with
the target parameter in our case Outlet sales.
References
[1] S. Ayeshaa, . K. H. Muhammad and . R. Taliba,
In factor analysis refer (Figure 6) we can observe that the "Overview and Comparative Study of Dimensionality
data is decomposed into 3 factors but it is very hard to Reduction," Information Fusion,Sciencedirect, 2020.
observe these factors individually and they are clustered in
very large sub groups. [2] I. K. Fodor, "A Survey of Dimension," California, May
9,2002.
In PCA refer (Figure 7) we can observe that we are able to
explain 60% of variance in our dataset using just 4 [3] L. v. d. Maaten, E. Postma and Jaap van den Herik,
components. Refer (Figure 8) we can observe that the "Dimensionality Reduction: A Comparative Review,"
variance captured by the first principal component is Tilburg centre for Creative ComputingTiCC TR , 2009.
highest followed by others. [4] M. A. Carreira-Perpinan, A Review of Dimension
Both FA and PCA both try to reduce the number of Reduction Techniques, Spain, January 27, 1997.
parameters to fewer variables considering their variance. [5] L. v. d. Maaten, An Introduction to
However, in PCA explains the maximum variance, while Dimensionality, Maastricht, July 2007.
in FA explains the covariance. [6] h. singh, "Missing value ratio," 2021. [Online].
In ICA refer (Figure 9) shows the different independent Available:
components separated in our dataset. https://www.analyticsvidhya.com/blog/2021/04/beginners
In ISOMAP refer (Figure 10) we can observe that the data -guide-to-missing-value-ratio-and-its-implementation/.
has been decomposed much better while comparing to the [7] h. singh, "low variance filter," 2021. [Online].
ones before. Available:
https://www.analyticsvidhya.com/blog/2021/04/beginners
-guide-to-low-variance-filter-and-its-implementation/.
Comparing the tables of UMAP and t-SNE we can observe
that the correlation between the components refer (Figure [8] "high correlation filter," [Online]. Available:
12) is low while compared to refer (Figure 11), hence https://solegaonkar.github.io/index.html.
UMAP performs much better than t-SNE and separates [9] w. koehrsen, "random forest," Dec 27, 2017.
these groups of similar categories from each other. [Online]. Available:
Hence by performance analysis of different algorithms in https://towardsdatascience.com/random-forest-in-python-
feature extraction we come to know that UMAP performs 24d0893d51c0.
better than others. [10] h. singh, "Backward feature elimination," 2021.
[Online]. Available:
https://www.analyticsvidhya.com/blog/2021/04/backward
As we can observe that in feature extraction a non-linear -feature-elimination-and-its-implementation/.
dimensionality reduction algorithm performs better than
linear, the reason being that linear techniques place data [11] J. Grover, - Mar 28, 2021. [Online]. Available:
points based on their dissimilarity, whereas non-linear https://medium.com/mlearning-ai/short-python-code-for-
techniques use manifolds and place similar data points backward-elimination-with-detailed-explanation-
52894a9a7880.
[12] h. singh, "forward feature selection," 2021.
[Online]. Available:
https://www.analyticsvidhya.com/blog/2021/04/forward-
feature-selection-and-its-implementation/.
[13] K. Matthew Mayo, "step forward selection," June
18, 2018[Online].Available:
https://www.kdnuggets.com/2018/06/step-forward-
feature-selection-python.html.
[14] Archdeacon, " Correlation and Regression
Analysis: A Historian’s Guide. Univ of Wisconsin Press.,"
[Online]. Available:
https://www.statisticshowto.com/probability-and-
statistics/f-statistic-value-test/.
[15] "Factor analysis," 2019 . [Online]. Available:
https://www.datacamp.com/tutorial/introduction-factor-
analysis.
[16] avcontentteam, "PCA," 2016. [Online]. Available:
https://www.analyticsvidhya.com/blog/2016/03/pca-
practical-guide-principal-component-analysis-python/.
[17] S. Talebi, "ICA," 2021. [Online]. Available:
https://towardsdatascience.com/independent-component-
analysis-ica-a3eba0ccec35.
[18] F. Hashmi, "unsupervised ML," [Online].
Available: https://thinkingneuron.com/data-science-
interview-questions-for-it-industry-part-4-unsupervised-
ml/#ICA.
[19] S. Dobilas, "ISOMAP," 2021. [Online]. Available:
https://towardsdatascience.com/isomap-embedding-an-
awesome-approach-to-non-linear-dimensionality-
reduction-fc7efbca47a0.
[20] Saurabh.jaju, "t-SNE," 2017. [Online]. Available:
https://builtin.com/data-science/tsne-python.
[21] Y. VERMA, "how to use t sne for dimensionality
reduction," 2022. [Online]. Available:
https://analyticsindiamag.com/how-to-use-t-sne-for-
dimensionality-reduction/.
[22] s. dobilas, "UMAP," 2021. [Online]. Available:
https://analyticsindiamag.com/how-to-use-t-sne-for-
dimensionality-reduction/.
[23] F. Hashmi, "How to implement independant
component analysis in python," [Online]. Available:
https://thinkingneuron.com/how-to-perform-independent-
component-analysisica-in-python/.