MADHU IEEE Updated 28 07 24
3.2.2 NAIVE BAYES CLASSIFIER (NB)
This method uses Bayes' theorem to calculate the most probable class for a new instance. It can handle several features or classes [5][6][16][22]. Bayes' theorem is a method for calculating the likelihood P(a | b) from P(a), P(b) and P(b | a). The formula to calculate the probability is as follows:

P(a | b) = P(b | a) P(a) / P(b)

predicts the result on the data set. The working principle of RF is described below:
Step-1: Choose T random input points.
Step-2: Build a decision tree for the input points.
Step-3: Forecast the decision, which comprises two or more decision trees.
Step-4: Repeat Steps 1 and 2.
Step-5: The prediction of each decision tree is obtained on the testing data; RF chooses the final decision by majority voting.
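The RF steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: each "tree" is simplified to a single-split decision stump, the random input points are drawn as a bootstrap sample, and all function names (`bootstrap_sample`, `fit_stump`, `fit_forest`, `predict_forest`) are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) training points at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Step 2 (simplified): fit a one-split tree on (features, label) rows.

    Tries every feature/threshold pair and keeps the split whose two sides,
    each labelled by its own majority class, classify the most rows correctly.
    """
    best_score, best_stump = -1, None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not right:  # threshold is the maximum value: no real split
                continue
            l_lab, r_lab = majority(left), majority(right)
            score = left.count(l_lab) + right.count(r_lab)
            if score > best_score:
                best_score, best_stump = score, (f, threshold, l_lab, r_lab)
    if best_stump is None:  # all points identical: always predict the majority
        label = majority([y for _, y in rows])
        return (0, float("inf"), label, label)
    return best_stump


def predict_stump(stump, x):
    f, threshold, l_lab, r_lab = stump
    return l_lab if x[f] <= threshold else r_lab


def fit_forest(rows, n_trees, seed=0):
    """Steps 3-4: repeat sampling and tree building n_trees times."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Step 5: every tree votes; the majority label is the final decision."""
    return majority([predict_stump(stump, x) for stump in forest])


# Toy one-feature data set (assumed values, for illustration only).
rows = [((1.0,), "neg"), ((2.0,), "neg"), ((3.0,), "neg"),
        ((8.0,), "pos"), ((9.0,), "pos"), ((10.0,), "pos")]
forest = fit_forest(rows, n_trees=25)
print(predict_forest(forest, (0.5,)), predict_forest(forest, (10.5,)))
```

An odd number of trees is used here so that a two-class vote cannot end in a tie.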
1
http://www.cs.cornell.edu/People/pabo/movie-review-data
3.2.6 K NEAREST NEIGHBORS (KNN)
K-Nearest-Neighbors (KNN) is a straightforward and powerful classification method. In this method, a data point is classified by a majority vote of its neighbors. It measures the similarity of the new data to the stored data and assigns it to the most similar available category. To classify test data, it retrieves the stored information and assigns the class on that basis [3]. It is also known as a lazy learner, because it stores the data and retrieves it only at classification time [20].

4 EXPERIMENTS
In this work, datasets are collected from the repository [3][4]. Python is used for the implementation. Confusion matrices are generated to find the correctly classified instances of each class. The same environment settings are used for all algorithms.

4.1 DATASET-I: Clothing, Shoes, and Jewelry [14]. Ratings 5 and 4 are considered positive reviews and ratings 1 and 2 are considered negative reviews. Rating 3 is neglected. 25000 positive and negative reviews are selected randomly so that the dataset is balanced. The dataset is split 80:20 for training and testing. A similar procedure is followed for all datasets. Table 1 shows the dataset statistics. The performance of various classifiers is

Algorithm             Classification    TP Rate   FP Rate   Precision   Recall
                      Accuracy (in %)   (in %)    (in %)
Random Forest         82                40.25     41.84     0.81        0.84
Naive Bayes           74                29.35     43.20     0.84        0.58
Decision Tree         72.43             30.94     41.49     0.79        0.62
KNN                   50.54             18.97     31.57     0.51        0.38

In Table 2 above, the Logistic Regression classifier achieves the highest accuracy.

4.2 DATASET-II: Amazon Electronics Product Review dataset [14]. The performance of various classifiers is shown in Table 3.

TABLE 3: EXPERIMENT RESULT OF ELECTRONICS
Algorithm             Classification    TP Rate   FP Rate   Precision   Recall
                      Accuracy (in %)   (in %)    (in %)
Logistic Regression   94                47.58     46.74     0.96        0.93
SVM                   96                47.89     47.72     0.96        0.95
Random Forest         92                47.80     43.86     0.89        0.95
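The data preparation described in Section 4.1 and the confusion-matrix evaluation mentioned in Section 4 can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names, the `per_class` parameter, and the fixed seed are assumptions, and whether the 25000 reviews are counted per class or in total is left to the caller.

```python
import random
from collections import Counter


def label_from_rating(rating):
    """Ratings 5 and 4 are positive, 1 and 2 negative; rating 3 is dropped."""
    if rating >= 4:
        return "pos"
    if rating <= 2:
        return "neg"
    return None


def balanced_split(reviews, per_class, train_frac=0.8, seed=0):
    """Sample per_class reviews of each label at random, then split 80:20.

    reviews is a list of (text, rating) pairs; per_class is an assumed knob.
    """
    rng = random.Random(seed)
    by_label = {"pos": [], "neg": []}
    for text, rating in reviews:
        label = label_from_rating(rating)
        if label is not None:
            by_label[label].append((text, label))
    sample = []
    for label in ("pos", "neg"):
        sample.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(sample)
    cut = int(len(sample) * train_frac)
    return sample[:cut], sample[cut:]


def confusion_counts(y_true, y_pred, positive="pos"):
    """Count the four cells of a binary confusion matrix."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def metrics(c):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Toy usage with made-up ratings and predictions (illustration only).
toy = [("review %d" % i, r)
       for i, r in enumerate([5, 4, 5, 4, 5, 1, 2, 1, 2, 1, 3, 3])]
train, test = balanced_split(toy, per_class=5)
print(len(train), len(test))
print(metrics(confusion_counts(["pos", "pos", "pos", "neg"],
                               ["pos", "pos", "neg", "neg"])))
```

Precision and recall here are reported for the positive class only; a per-class or averaged variant would need the same counts computed with `positive="neg"` as well.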
Fig. 1: Various classifiers with different datasets
Intelligent Systems, vol. 28, no. 2, pp. 47-54, March-April (2013),
doi: 10.1109/MIS.2013.1.
[10] F. Alattar, K. Shaalan, “Survey on Opinion Reason Mining and
Interpreting Sentiment Variations”, IEEE Access, Volume 9, (2021).
[11] Karthikeyan, C., Sahaya, A.N.A., Anandan, P., Prabha, R., Mohan,
D., Vijendra, B.D., “Predicting Stock Prices Using Machine Learning
Techniques”, Proceedings of the 6th International Conference on
Inventive Computation Technologies, ICICT 2021, (2021).
[12] Koyel Chakraborty; Siddhartha Bhattacharyya; Rajib Bag, “A Survey
of Sentiment Analysis from Social Media Data”, IEEE Transactions
on Computational Social Systems, Volume: 7, Issue: 2, (2020).
[13] Morinaga, S., Yamanishi, K., Tateishi, K. and Fukushima, T.,
(2002), “Mining product reputations on the web”, in Proceedings of the
eighth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 341-349). ACM.
[14] Ni, J., Li, J. & McAuley, J., “Justifying recommendations using
distantly-labeled reviews and fine-grained aspects”, in Proc. 2019
Conference on Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), 188–197, (2019).
[15] Qixuan Hou; Meng Han; Zhipeng Cai, “Survey on data analysis in
social media: A practical application aspect”, Big Data Mining and
Analytics, Volume: 3, Issue: 4, (2020).
[16] Sajana, T., Narasingarao, M. R., “Classification of Imbalanced
Malaria Disease Using Naïve Bayesian Algorithm”, International
Journal of Engineering & Technology, 7(2.7), 786-790, (2018).
[17] S Sakhare, N.N., Sagar Imambi, “Performance analysis of regression-
based machine learning techniques for prediction of stock market
movement”, International Journal of Recent Technology and
Engineering, 7(6), 655-662, (2019).
[18] Sanjay Bhargav, P., Nagarjuna Reddy, G., Ravi Chand, R.V., Pujitha,
K., Mathur, A., “Sentiment analysis for hotel rating using machine
learning algorithms”, International Journal of Innovative Technology
and Exploring Engineering, Vol. 8, Issue 6, pp. 1225-1228.
[19] Shaozhong Zhang; Haidong Zhong, “Mining Users Trust from
E-Commerce Reviews Based on Sentiment Similarity Analysis”, IEEE
Access, Volume: 7, Page(s): 13523–13535, (2019).
[20] Surbhi Bhatia, “A Comparative Study of Opinion Summarization
Techniques”, IEEE Transactions on Computational Social Systems,
Vol. 8, No. 1, (2021).
[21] Shyamasundar L B., Jhansi Rani P, “A Multiple-Layer Machine
Learning Architecture for Improved Accuracy in Sentiment Analysis”,
The Computer Journal, Volume: 63, Issue: 1, Jan. 2020, pp. 395–409,
(2020).
[23] Vavilapalli, S.S., Reddykorepu, P., Saggam, S., Pentyala, M., Devi,
S.A., “Summarizing Sentiment Analysis on Movie Critics Data”,
Proceedings of the 6th International Conference on Inventive
Computation Technologies, ICICT 2021, (2021).
[24] Yassine Al-Amrani, Mohamed Lazaar, Kamal Eddine
Elkadiri, “Sentiment Analysis using supervised classification
algorithms”, Proceedings of the 2nd International Conference on Big
Data, Cloud and Applications, Association for Computing
Machinery, Article No.: 61, Pages 1–8, (2017).
[25] You Li, Yuming Lin, Jingwei Zhang and Guoyong Cai, “Constructing
Domain-Dependent Sentiment Lexicons Automatically for Sentiment
Analysis”, Information Technology Journal, 12: 990-996, (2013).