Peerj Cs 08 914
ABSTRACT
The Internet Movie Database (IMDb), one of the most popular online databases for movies and personalities, provides a wide range of movie reviews from millions of users. This provides a large and diverse dataset for analyzing users' sentiments about various personalities and movies. Although IMDb reviews are helpful critiques of movies, they are too numerous to be read as a whole, so automated tools are required to provide insights into the sentiments they express. This study implements various machine learning models to measure the polarity of the sentiments presented in user reviews on the IMDb website. For this purpose, the reviews are first preprocessed to remove redundant information and noise, and then various classification models such as support vector machines (SVM), the Naïve Bayes classifier, random forest, and gradient boosting classifiers are used to predict the sentiment of these reviews. The objective is to find the optimal process and approach to attain the highest accuracy with the best generalization. Various feature engineering approaches such as term frequency-inverse document frequency (TF-IDF), bag of words, global vectors for word representation, and Word2Vec are applied along with hyperparameter tuning of the classification models to enhance the classification accuracy. Experimental results indicate that SVM obtains the highest accuracy, 89.55%, when used with TF-IDF features. The sentiment classification accuracy of the models is affected by contradictions between the user sentiments expressed in the reviews and the assigned labels. To tackle this issue, TextBlob is used to assign a sentiment to each review in the dataset before it is used for training. Experimental results on the TextBlob-assigned sentiments indicate that an accuracy of 92% can be obtained using the proposed model.
Submitted 4 June 2021
Accepted 12 February 2022
Published 15 March 2022
Corresponding authors
Imran Ashraf, imranashraf@ynu.ac.kr
Gyu Sang Choi, castchoi@ynu.ac.kr
Academic editor
Sebastian Ventura
Additional Information and Declarations can be found on page 24
DOI 10.7717/peerj-cs.914
Subjects Data Mining and Machine Learning, Data Science, Natural Language and Speech
Keywords Sentiment classification, Movies reviews, Bag of words, Text analysis, Supervised machine learning
Copyright 2022 Naeem et al.
Distributed under Creative Commons CC-BY 4.0
How to cite this article Naeem MZ, Rustam F, Mehmood A, M-z-d, Ashraf I, Choi GS. 2022. Classification of movie reviews using term
frequency-inverse document frequency and optimized machine learning algorithms. PeerJ Comput. Sci. 8:e914 DOI 10.7717/peerj-cs.914
INTRODUCTION
Social media has become an integral part of human life in recent times. People want to share their opinions, ideas, comments, and daily life events on social media. In modern times, social media is used to showcase one's esteem and prestige by posting photos, text, video clips, etc. The rise and wide usage of social media platforms and microblogging websites give people the opportunity to share freely, and they post their opinions on trending topics, politics, movie reviews, etc. Owing to their length, the opinions shared on social networking sites are generally known as short texts (ST) (Sahu & Ahuja, 2016). ST have gained significant importance over traditional blogging because of their simplicity and their effectiveness in influencing the crowd. ST often take the form of jargon and are even used by search engines as queries. Beyond being inspiring, ST contain users' sentiments about a specific personality, topic, or movie and can be leveraged to identify the popularity of the discussed item. The process of mining sentiment from text is called sentiment analysis (SA), and it has been regarded as a significant research area during the last few years (Hearst, 2003). Sentiments expressed on social media platforms such as Twitter and Facebook can be used to analyze people's perception of a personality, service, or product, as well as to predict the outcome of various social and political campaigns. Thus, SA can help increase the popularity and following of political leaders and other important personalities. Many large companies such as Amazon, Apple, and Google use the reviews of their employees to analyze the response to various services and policies. In the business sector, companies use SA to derive new strategies based on customer feedback and reviews (Hand & Adams, 2014; Alpaydin, 2020).
Besides social media platforms, several websites serve as common platforms for discussing social events, sports, movies, etc., and the Internet Movie Database (IMDb) is one of the websites that offer a common interface to discuss and review movies. Reviews are short texts that generally express an opinion about movies or products. These reviews play a vital role in the success of movies or the sales of products (Agarwal & Mittal, 2016). People generally consult blogs and review sites such as IMDb to learn about a movie's cast, crew, reviews, and ratings from other people. Hence, it is not only word of mouth that brings the audience to theaters; reviews also play a prominent role in promoting movies. SA on movie reviews thus helps to perform opinion summarization by extracting and analyzing the sentiments expressed by the reviewers (Ikonomakis, Kotsiantis & Tampakas, 2005). Although reviews contain valuable and useful content, a new user cannot read all of them and perceive whether their overall sentiment is positive or negative. Machine learning approaches ease this difficult task by automatically classifying the sentiments of these reviews. Sentiment classification approaches fall into three types: supervised machine learning, approaches based on the semantic orientation of the text, and SentiWordNet-based libraries (Singh et al., 2013a).
Despite the many approaches presented, several challenges remain unresolved in achieving the best possible accuracy for sentiment analysis. For example, a standard sequence
This study proposes a methodology to perform sentiment analysis on movie reviews taken from the IMDb website. The proposed methodology involves preprocessing steps and various machine learning classifiers along with several feature extraction approaches.
Both simple and ensemble classifiers are tested with the methodology, including decision trees (DT), random forest (RF), gradient boosting classifier (GBC), and support vector machines (SVM). In addition, a deep learning model is evaluated in comparison with the traditional machine learning classifiers.
Four feature extraction techniques are tested for their efficacy in sentiment classification: term frequency-inverse document frequency (TF-IDF), bag of words (BoW), global vectors (GloVe) for word representation, and Word2Vec.
Because contradictions between the sentiments expressed in the reviews and their assigned labels affect sentiment classification accuracy, a TextBlob-annotated dataset is used for experiments in addition to the standard dataset.
The performance of the selected classifiers is analyzed using accuracy, precision, recall, and F1 score. Additionally, the results are compared with several state-of-the-art approaches to sentiment analysis.
The rest of this paper is organized as follows. “Related Work” discusses research works closely related to the current study. The selected dataset, machine learning classifiers, preprocessing procedure, and proposed methodology are described in “Materials and Methods”. Results are discussed in “Results and Discussion”, and finally, “Conclusion” concludes the paper with possible directions for future research.
RELATED WORK
The large amount of data generated on social media platforms such as Facebook and Twitter creates new opportunities and challenges for researchers to extract useful and meaningful information that helps business communities thrive and serves the public. As a result, multidimensional research efforts have been made in sentiment classification and analysis, and various machine learning and deep learning approaches have been presented in the literature. A few research works related to the current study are discussed here, divided into two categories: machine learning approaches and deep learning approaches.
Data description
This study uses the ‘IMDb Reviews’ dataset from Kaggle, which contains users’ reviews about movies (DAT, 2018). The dataset has been widely used for text mining and consists of 50,000 movie reviews, of which approximately 25,000 belong to each of the positive and negative classes. Table 2 shows sample reviews from both the negative and positive classes.
TextBlob
TextBlob is a Python library that we used to annotate the dataset with new sentiments (Tex, 2020; Loria, 2018). TextBlob is used for labeling because the possibility of contradictions between the review text and its label cannot be ignored. TextBlob finds the polarity score of each word and then combines these scores into an overall polarity score for the text, which lies between −1 and 1. A polarity score greater than 0 indicates a positive sentiment, a score less than 0 indicates a negative sentiment, and a score of 0 indicates a neutral sentiment. In the dataset used in this study, 23 neutral sentiments are found after applying TextBlob. Because such a small number of neutral reviews would cause class imbalance, only negative and positive sentiments are used for experiments. Contradictions between the TextBlob-annotated labels and the original dataset labels are shown in Table 3.
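The labeling rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `polarity_to_label` and `relabel` are hypothetical, and the polarity function is passed in so the sketch runs without TextBlob installed. With TextBlob available, it would be called as `relabel(pairs, lambda t: TextBlob(t).sentiment.polarity)`. The contradiction count corresponds to the kind of disagreement summarized in Table 3.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def polarity_to_label(score: float) -> Optional[str]:
    """Map a polarity score in [-1, 1] to a label; None means neutral (score == 0)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def relabel(
    reviews: Iterable[Tuple[str, str]],
    polarity_fn: Callable[[str], float],
) -> Tuple[List[Tuple[str, str]], int]:
    """Relabel (text, original_label) pairs by polarity and drop neutral reviews.

    Returns the relabelled pairs and how many kept reviews received a label
    that contradicts their original one.
    """
    relabelled = []
    contradictions = 0
    for text, original in reviews:
        label = polarity_to_label(polarity_fn(text))
        if label is None:
            continue  # neutral: discarded rather than kept as a tiny third class
        if label != original:
            contradictions += 1
        relabelled.append((text, label))
    return relabelled, contradictions
```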
Bag of words
BoW is simple to use and easy to implement for finding features in raw text data (Rustam et al., 2021; Rupapara et al., 2021a). Many language modeling and text classification problems can be solved using BoW features. In Python, BoW is implemented using scikit-learn's CountVectorizer. BoW counts the occurrences of each word in the given text and forms a feature vector for the whole text comprising the counts of each unique word. Each unique word is called a ‘token’, and the feature vectors of all texts form a matrix of token counts (Liu et al., 2008). Despite its simplicity, BoW often outperforms many complicated feature engineering approaches.
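A from-scratch sketch of the count matrix that BoW produces is shown below. It assumes simple whitespace tokenization; CountVectorizer's default tokenizer differs (for example, it ignores one-character tokens), so this illustrates the idea rather than reproducing the library's output. As in CountVectorizer, vocabulary columns are assigned in alphabetical order.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocabulary(docs: List[str]) -> Dict[str, int]:
    """Assign each unique whitespace-separated token a column index (alphabetical)."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(docs: List[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Return the vocabulary and a document-term matrix of raw token counts."""
    vocab = build_vocabulary(docs)
    matrix = []
    for doc in docs:
        counts = Counter(doc.split())
        row = [0] * len(vocab)
        for tok, n in counts.items():
            row[vocab[tok]] = n
        matrix.append(row)
    return vocab, matrix
```

For the two toy reviews "great movie great cast" and "dull movie", the columns are cast, dull, great, movie, and the rows are [1, 0, 2, 1] and [0, 1, 0, 1].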
TF-IDF
TF-IDF is applied to calculate the weights of important terms, and its final output is a weight matrix. A term's TF-IDF value increases with its count in a document but is offset by how frequently the term appears across the dataset (Zhang et al., 2008).
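To make the weighting concrete, here is a minimal pure-Python TF-IDF sketch. It assumes the smoothed variant used by default in scikit-learn's TfidfVectorizer (raw counts as TF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, then L2 normalization of each row); the paper does not state which TF-IDF variant was used, so treat this as one common choice rather than the authors' exact configuration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Smoothed TF-IDF with L2-normalized rows (whitespace tokenization assumed)."""
    tokenised = [doc.split() for doc in docs]
    n = len(tokenised)
    # document frequency: in how many documents each term occurs
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)  # raw term counts
        weights = {tok: count * idf[tok] for tok, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({tok: w / norm for tok, w in weights.items()} if norm else weights)
    return rows
```

A term that occurs in every document (such as "movie" in a movie-review corpus) gets the minimum idf of 1, so a rarer term in the same review ends up with a larger weight.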
Word2Vec
Word2Vec is one of the most widely used NLP techniques for feature extraction in text mining; it transforms words into vectors (Wang, Ma & Zhang, 2016). Given a corpus of text, Word2Vec uses a neural network model to learn word associations. Each unique word is associated with a list of numbers called a ‘vector’. The cosine similarity between two vectors represents the semantic similarity between the words they represent.
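The cosine similarity mentioned above can be sketched as follows. The three-dimensional vectors in the usage note are made-up values for illustration; real Word2Vec vectors are learned from a corpus and typically have hundreds of dimensions (gensim, for instance, exposes the same quantity as `KeyedVectors.similarity`).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (||u|| ||v||): 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)
```

With hypothetical vectors good = [0.9, 0.1, 0.0], great = [0.8, 0.2, 0.1], and boring = [-0.7, 0.1, 0.6], the similarity of "good" to "great" is positive and far higher than its (negative) similarity to "boring".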
GloVe
GloVe (Global Vectors) is an unsupervised model used to obtain vector representations of words (Bhoir, Ghorpade & Mane, 2017). The representation is obtained by mapping words into a space in which the distance between words reflects their semantic similarity. Developed at Stanford, GloVe captures the similarity between words and encodes it in their vectors. The output of GloVe is a word vector space with linear substructure.
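GloVe vectors are distributed as plain text, one word per line followed by its vector components. To feed word vectors to a classifier, each review needs a single fixed-length vector; the sketch below assumes averaging the vectors of the review's known words, a common choice that the paper does not confirm. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's text format: a word followed by space-separated floats."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def document_vector(tokens: List[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

Out-of-vocabulary tokens are simply skipped, which is one reason averaged embeddings can lose information compared with count-based features on noisy review text.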
Random forest
RF combines multiple decision trees trained on various subsamples of the dataset to improve classification accuracy. Each subsample is a bootstrap dataset of the same size as the original, formed by randomly sampling records with replacement, and the trees are grown on randomly selected features. The predictions of these trees are averaged to obtain a model with low variance. Information gain ratio and Gini index are the most frequently used feature selection criteria for measuring the impurity of a feature (Agarwal et al., 2011).
$$\mathrm{Gini}(T) = \sum_{i} \sum_{j \neq i} \frac{f(C_i, T)}{|T|} \, \frac{f(C_j, T)}{|T|} \tag{4}$$
where $\frac{f(C_i, T)}{|T|}$ indicates the probability of being a member of class $C_i$.
The decision trees are not pruned as each new training dataset is traversed. The user can define the number of features considered at each node and the number of trees, and can set the values of other hyperparameters to increase the classification accuracy (Biau & Scornet, 2016).
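Equation (4) and the bootstrap sampling step can be sketched as follows. This is an illustrative sketch with hypothetical function names, not the implementation used in the experiments. The double sum over distinct class pairs equals the more familiar form $1 - \sum_i p_i^2$.

```python
import random
from collections import Counter
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def gini_index(labels: Sequence[str]) -> float:
    """Equation (4): sum over ordered pairs of distinct classes of p_i * p_j,
    where p_i = f(C_i, T) / |T| is the share of samples in class C_i."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return sum(p_i * p_j
               for i, p_i in enumerate(probs)
               for j, p_j in enumerate(probs)
               if i != j)


def bootstrap_sample(rows: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(rows) records with replacement, as done for each tree in RF."""
    return [rng.choice(rows) for _ in range(len(rows))]
```

A node containing only one class has a Gini index of 0, while a node split evenly between positive and negative reviews has the maximum two-class value of 0.5.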
Decision tree
DT is one of the most commonly used models for classification and prediction problems.
DT is a simple and powerful tool to understand data features and infer decisions. The
$$\text{Split Info} = -\sum_{i=1}^{k} P(v_i) \log_2 P(v_i) \tag{6}$$
where k indicates the total number of splits for DT, which is tuned as a hyperparameter for different datasets to improve performance, and P(v_i) is the proportion of samples falling into the i-th split. DT is non-parametric and computationally inexpensive, and it performs well even when the data have redundant attributes.
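Equation (6) is the denominator of the information gain ratio mentioned in the random forest discussion. The sketch below computes split information and the gain ratio for a candidate split, using the standard entropy-based information gain (an assumption here, since the corresponding equation is not part of this section). Partitions are given as lists of class labels; the names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_info(partitions: List[Sequence[str]]) -> float:
    """Equation (6): -sum_i P(v_i) log2 P(v_i), with P(v_i) = |v_i| / |T|."""
    total = sum(len(p) for p in partitions)
    return -sum((len(p) / total) * math.log2(len(p) / total) for p in partitions if p)


def gain_ratio(parent: Sequence[str], partitions: List[Sequence[str]]) -> float:
    """Information gain of the split divided by its split information."""
    total = len(parent)
    gain = entropy(parent) - sum(len(p) / total * entropy(p) for p in partitions if p)
    si = split_info(partitions)
    return gain / si if si > 0 else 0.0
```

Dividing by split information penalizes splits into many small partitions, which plain information gain would otherwise favor.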
Proposed methodology
With the growing production of movies over the last two decades, a large number of opinions and reviews are posted on various social media platforms and websites. Such reviews are texts that express explicit opinions about a film or product. These opinions play an important part in the success of films or the sales of products (Agarwal & Mittal, 2016). People search blogs and review sites such as IMDb to learn what other people like and dislike about films, their cast, and their crew, but it is very difficult to read every review and comment. Evaluating these sentiments automatically helps people with this task. The sentiments expressed in such reviews are important for evaluating movies and their crews, so automatic sentiment analysis with high accuracy is extremely important. This study follows this direction and proposes an approach to perform sentiment analysis of movie reviews. In addition, since contradictions between the sentiments expressed in movie reviews and their assigned labels cannot be ignored, this study also uses TextBlob to determine the sentiments. Two sets of experiments are performed, using the standard dataset and the TextBlob-annotated dataset, to fill a research gap, as previous studies do not consider the contradictions between sentiments and assigned labels. Figure 1 shows the flow of the steps carried out for sentiment classification.
Figure 1 The workflow of the proposed methodology for movie review classification. DOI: 10.7717/peerj-cs.904/fig-1
Figure 2 Preprocessing steps for the movie review dataset. DOI: 10.7717/peerj-cs.904/fig-2
As a first step, the reviews are preprocessed using a sequence of operations. Preprocessing is critical for boosting the training of the classifiers and enhancing their performance. Its purpose is to clean the data by removing unnecessary, meaningless, and redundant text from the reviews. For this purpose, six steps are carried out sequentially, as shown in Fig. 2.
Punctuation is removed from the IMDb text reviews because it does not add value to text analysis (Guzman & Maalej, 2014). Punctuation makes sentences more readable for humans; however, it is difficult for a machine to distinguish punctuation from other characters. Punctuation distorts a model's ability to distinguish between entropy, punctuation, and other characters (Rupapara et al., 2021b; Liu et al., 2008). Punctuation is therefore removed from the text during preprocessing to reduce the complexity of the feature space. Table 5 shows the text of a sample review before and after the punctuation has been removed.
Table 6 Sample text from movie reviews after removing numeric values.
Input data | After numeric removal
Gwyneth Paltrow is absolutely great in this movie. | Gwyneth Paltrow is absolutely great in this movie
I own this movie This is number 1 movie I didnt like by choice I do. | I own this movie This is number movie I didnt like by choice I do
I wish that 70s show would come back on tel. | I wish that s show would come back on tel
Table 7 Sample output of the review text after changing the case of review text.
Input data | After case lowering
Gwyneth Paltrow is absolutely great in this movie. | gwyneth paltrow is absolutely great in this movie
I own this movie This is number movie I didnt like by choice I do. | i own this movie this is number movie i didnt like by choice i do
I wish that s show would come back on tel. | i wish that s show would come back on tel
Once the punctuation is removed, the next step is to find numerical values and remove them, as they are not valuable for text analysis. Reviewers use numerals in place of various English words to shorten their reviews and make them easier to write. For example, 2 is used for ‘to’, and numerals such as 1 are used instead of number words such as ‘one’. Such numerals are convenient for humans to interpret, yet offer no help in training machine learning classifiers. Table 6 shows text from sample reviews after the numeric values are removed.
After numbers are removed, all capital letters are converted to lower case. Machine learning classifiers cannot recognize that upper and lower case forms represent the same letters and consider them different. For example, ‘Health’ and ‘health’ are considered two separate words if uppercase letters are not converted to lowercase. This may reduce the significance of the most frequently occurring terms and degrade performance (Liu & et, 2010). Keeping both forms increases the complexity of the feature space and reduces the performance of classifiers; therefore, converting upper case letters to lower case helps increase the training efficiency of the classifiers. Table 7 shows the text of the reviews after the case is changed.
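The three character-level steps described above (punctuation removal, numeral removal, and lowercasing, in that order) can be sketched with the Python standard library. This is a minimal sketch under the assumption that "punctuation" means ASCII punctuation as listed in `string.punctuation`; the function names are hypothetical, and collapsing the whitespace left behind by deleted digits is an added assumption that matches the spacing shown in Table 6.

```python
import re
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation characters (so "didn't" becomes "didnt")."""
    return text.translate(_PUNCT_TABLE)


def remove_numbers(text: str) -> str:
    """Delete digit runs, then collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", re.sub(r"\d+", "", text)).strip()


def normalise(text: str) -> str:
    """Punctuation removal, numeral removal, then lowercasing."""
    return remove_numbers(remove_punctuation(text)).lower()
```

Applied to "I wish that 70s show would come back on tel.", the pipeline yields "i wish that s show would come back on tel", the same text as the last row of Table 7.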
Stemming is an important preprocessing step because removing affixes from words and reducing them to their root form helps enhance the efficiency of a model (Goel, Gautam & Kumar, 2016). For example, ‘helped’ and ‘helping’ are altered forms of ‘help’; however, machine learning classifiers consider them different words (Singh et al., 2013b). Stemming changes these different forms of words into their root form. Stemming is implemented using the PorterStemmer in Python (Pang, Lee & Vaithyanathan, 2002). Table 8 shows the sample text of a review before and after stemming.
Table 9 Sample reviews before and after the stop words removal.
Input data | After stop words removal
gwyneth Paltrow is absolutely great in this movie. | gwyneth paltrow absolute great movie
i own this movie this is number movie i didnt like by choice I do. | own movie number movie didnt like choice do
i wish that s show would come back on tel. | wish show would come back tel
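The effect of stemming can be illustrated with a deliberately tiny suffix-stripping stemmer. This is NOT the Porter algorithm the study uses (in NLTK that would be `PorterStemmer().stem(word)` from `nltk.stem`); the suffix list and minimum stem length below are hypothetical choices made only to show how ‘helped’, ‘helping’, and ‘helps’ collapse onto ‘help’.

```python
from typing import List

# Hypothetical, deliberately small suffix list; the real Porter algorithm
# applies several ordered rule phases with extra conditions.
_SUFFIXES = ("ing", "ed", "s")


def light_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_tokens(tokens: List[str]) -> List[str]:
    """Apply `light_stem` to every token of an already tokenized review."""
    return [light_stem(tok) for tok in tokens]
```

The minimum stem length keeps very short words such as ‘is’ intact; real stemmers use more careful conditions for the same purpose.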
The last step in the preprocessing phase is the removal of stop words. Stop words contribute nothing to the training of the classifiers; instead, they increase the feature vector size and reduce performance. They are therefore removed to decrease the complexity of the feature space and boost the training of the classifiers. Table 9 shows the text of the sample reviews after the stop words have been removed.
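Stop word removal amounts to filtering tokens against a fixed list. The short list below is a hypothetical illustration chosen to reproduce the last two rows of Table 9; the paper does not say which list was used, and practical pipelines usually rely on a fuller one such as NLTK's `stopwords.words("english")`. The sketch assumes the text has already been lowercased, since the membership test is case-sensitive.

```python
from typing import FrozenSet

# Hypothetical illustrative stop list; not the list used in the study.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"i", "is", "in", "this", "that", "s", "on", "by", "a", "an", "the"}
)


def remove_stop_words(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> str:
    """Drop every whitespace-separated token that appears in `stop_words`."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)
```

For example, "i wish that s show would come back on tel" becomes "wish show would come back tel".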
After preprocessing is complete, feature extraction takes place, where BoW, TF-IDF, and GloVe are used. The feature space for the sample reviews is given in Tables 10 and 11 for BoW and TF-IDF features, respectively. Experiments are performed with the standard dataset as well as the TextBlob-annotated dataset to analyze the performance of the machine learning and proposed models.
The data are split into training and testing sets in a 75:25 ratio. Machine learning classifiers are trained on the training set, while the test set is used to evaluate the performance of the trained models using standard, well-known metrics: accuracy, precision, recall, and F1 score.
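A seeded 75:25 split can be sketched with the standard library as shown below. The seed value and function name are hypothetical; scikit-learn's `train_test_split(..., test_size=0.25)` provides the same kind of split (with its own shuffling). Rounding the test size up means the 50,000-review dataset yields 37,500 training and 12,500 test reviews.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_75_25(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data and return (train, test) with 25% held out."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = math.ceil(len(shuffled) * 0.25)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed makes every classifier see exactly the same training and test reviews, so differences in their scores are not an artifact of different splits.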
Evaluation parameters
Performance evaluation of the classifiers requires evaluation metrics; accuracy, precision, recall, and F1 score are selected because of their wide use. The confusion matrix must be introduced to define the mathematical formulas for these metrics. The confusion matrix, shown in Fig. 3, can be considered an error matrix that indicates four quantities: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). Each row of the matrix represents the actual labels, while each column represents the predicted labels (Landy & Szalay, 1993).
TP indicates that the classifier predicted the review as positive and the original label is also positive. A review is TN if the classifier predicted it as negative and its original label is also negative. In the FP case, the review is predicted as positive, but the original label is negative. Similarly, a review is called FN if it belongs to the positive class but the classifier predicted it as negative (Rokach & Maimon, 2005).
Accuracy is a widely used evaluation metric and indicates the ratio of correct predictions to the total number of predictions. It has a maximum value of 1 when all predictions are correct and a minimum value of 0 when none are. Accuracy can be defined as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$
Precision focuses on the accuracy of predicting positive cases. It shows what proportion of the cases predicted as positive are actually positive. It is defined as
$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$
Recall calculates the ratio of correctly predicted positive cases to the total number of positive cases. It is obtained by dividing the number of TP by the sum of TP and FN as follows:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{9}$$
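Equations (7) to (9) translate directly into code. The sketch below also computes the F1 score using its standard definition, the harmonic mean of precision and recall, 2PR/(P + R); that formula is assumed here because it does not appear in this excerpt. The confusion-matrix counts in the usage note are hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Equations (7)-(9) plus the standard F1 score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = harmonic mean of precision and recall (0 when both are 0)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With hypothetical counts TP = 40, TN = 45, FP = 5, and FN = 10, accuracy is 0.85, precision is 40/45 ≈ 0.889, recall is 0.80, and F1 is 80/95 ≈ 0.842.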
of 0.87 with BoW features. Overall, the performance of all the classifiers is good except for DT, whose accuracy is 0.72.
The performance of the classifiers in terms of precision, recall, and F1 score is given in Table 13. The F1 score is the same for the positive and negative classes for all classifiers except GBC, which has F1 scores of 0.86 and 0.85 for the positive and negative classes, respectively. Precision values differ slightly between the positive and negative classes; for example, SVM has a precision of 0.88 for the positive class and 0.90 for the negative class. Similarly, although the precision, recall, and F1 score of DT are the lowest, its values for the positive and negative classes are almost the same. The equal number of training samples per class in the dataset yields a good fit for the classifiers, and their accuracy and F1 scores are in agreement.
SVM tends to perform better for text classification than other supervised learning models, especially on large datasets, because the algorithm is derived from the theory of structural risk minimization (Mouthami, Devi & Bhaskaran, 2013).
Figure 4 Performance comparison between machine learning models using the original dataset and BoW, TF-IDF, GloVe, Word2Vec features. DOI: 10.7717/peerj-cs.904/fig-4
with all features and achieved the best score with BoW, TF-IDF, and Word2Vec. This strong performance of SVM is due to its linear architecture and the binary nature of the classification problem: with its linear kernel, SVM is well suited to binary classification of linearly separable data, as shown in this study.
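To show what "linear" means here, the sketch below trains a soft-margin linear SVM by full-batch subgradient descent on the regularized hinge loss. It is an illustration of the technique under assumed hyperparameters (`lam`, `lr`, `epochs` are arbitrary), not the library implementation used in the experiments. Each feature vector `x_i` could be a TF-IDF row, and the labels are -1 (negative) or +1 (positive).

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w.x_i + b)))
    by full-batch subgradient descent. Labels must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

The decision rule is a single hyperplane in feature space, which is why high-dimensional, sparse TF-IDF vectors for a two-class problem fit this model well.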
Table 19 Performance evaluation of classifiers using TF-IDF features on the TextBlob annotated dataset.
Model | Accuracy | Precision | Recall | F1 score
Table 20 Performance evaluation of classifiers using GloVe features on the TextBlob annotated dataset.
Model | Accuracy | Precision | Recall | F1 score
Figure 5 Performance comparison between machine learning models using the TextBlob dataset and BoW, TF-IDF, GloVe, Word2Vec features. DOI: 10.7717/peerj-cs.904/fig-5
on the original sentiments. However, the performance of the machine learning models with these features is inferior to their performance with BoW and TF-IDF features.
The performance of the machine learning models is good when used with TF-IDF features extracted from the original dataset, and SVM outperforms the others with an accuracy of 0.89. TF-IDF generates a weighted feature set, unlike BoW, GloVe, and Word2Vec features, which helps to improve the accuracy of the learning models. On the other hand, the accuracy of DT drops by one percentage point, from 72% to 71%, because DT is a rule-based model that performs better on simple term frequencies than on weighted features; weighted features introduce complexity into the DT learning process. SVM performs well because TF-IDF provides a linear feature set for a binary classification problem, which suits SVM as a linear model. The performance of the machine learning models improves with TextBlob data annotation: the models perform well with TF-IDF and BoW features, and SVM obtains the highest accuracy score of 0.92 using TextBlob labels.
deep learning model CNN-LSTM achieves an accuracy of 0.90. The strong performance of the machine learning models is attributable to the handcrafted TF-IDF weighted features.
Statistical T-test
A T-test is performed in this study to show the statistical significance of the proposed approach (Fatima et al., 2021). The T-test is applied to the SVM results obtained with the proposed approach and with the original dataset. The output of the T-test favors either the null hypothesis or the alternative hypothesis.
Accept null hypothesis: the compared results are statistically equal.
Reject null hypothesis: the compared results are not statistically equal.
The output values of the T-test in terms of the T-statistic and critical value are shown in Table 24. The T-test indicates that the null hypothesis can be rejected in favor of the alternative hypothesis because the T-statistic value is less than the critical value, indicating that the compared values are statistically different from each other.
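The section does not state which T-test variant is used. The sketch below assumes an independent two-sample Student's t-test with pooled variance, and applies the conventional two-sided decision rule in which the null hypothesis is rejected when |t| exceeds the critical value. The critical value itself would come from a t table or, for example, `scipy.stats.t.ppf(1 - alpha / 2, df)`; the sample values in the usage note are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance, and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    # pooled estimate of the common variance (sample variances use n - 1)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, n_a + n_b - 2


def reject_null(t_stat: float, critical: float) -> bool:
    """Conventional two-sided rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > critical
```

For the hypothetical samples [1, 2, 3, 4, 5] and [2, 3, 4, 5, 6], t = -1.0 with 8 degrees of freedom.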
CONCLUSION
With the ever-growing production of cinema movies, web series, and television dramas, a large number of reviews can be found on social platforms and movie websites such as IMDb. Sentiment analysis of such reviews can provide millions of viewers with insights about movies, their teams, and their casts. This study proposes a methodology to perform sentiment analysis on movie reviews using supervised machine learning classifiers to help people select movies based on their popularity and on the interest expressed in reviews. Four machine learning algorithms, DT, RF, GBC, and SVM, are used for sentiment analysis and trained on a dataset preprocessed through a series of steps. Moreover, four feature extraction approaches, BoW, TF-IDF, GloVe, and Word2Vec, are investigated for their efficacy in extracting meaningful and effective features from the reviews. Results indicate that SVM achieves the highest accuracy among all the classifiers, 89.55%, when trained and tested using TF-IDF features. Performance with BoW features is also good, with an accuracy of 87.25%. Whereas BoW only counts the occurrences of unique tokens, TF-IDF also captures the importance of rare terms by assigning them higher weights, and it performs better than BoW. However, the performance of the classifiers degrades considerably with GloVe and Word2Vec features, which suggests that word embeddings do not work well with the movie review dataset. To improve the performance of the models and reduce the influence of contradictions between the expressed sentiments and the assigned labels, TextBlob is used for data annotation. Experimental results on the TextBlob-annotated dataset indicate that SVM achieves the highest accuracy of 92% with TF-IDF features. Compared to the standard dataset, the TextBlob-assigned labels lead to better performance from the models. The performance of the deep learning models is slightly lower than that of the machine learning models, with the highest accuracy of 0.90 achieved by CNN-LSTM. Despite the equal number of positive and negative reviews used for training, the prediction accuracy differs between the positive and negative classes. Precision, recall, and F1 score indicate that the models have a good fit, and
Funding
This work was supported in part by the Basic Science Research Program through the
National Research Foundation of Korea (NRF) funded by the Ministry of Education
(NRF-2019R1A2C1006159 and NRF-2021R1A6A1A03039493), and in part by the 2021
Yeungnam University Research Grant. The funders had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript.
Grant Disclosures
The following grant information was disclosed by the authors:
Ministry of Education: NRF-2019R1A2C1006159 and NRF-2021R1A6A1A03039493.
2021 Yeungnam University.
Competing Interests
Imran Ashraf is an Academic Editor for PeerJ Computer Science.
Author Contributions
Muhammad Zaid Naeem conceived and designed the experiments, prepared figures
and/or tables, and approved the final draft.
Furqan Rustam conceived and designed the experiments, performed the computation
work, authored or reviewed drafts of the paper, and approved the final draft.
Arif Mehmood analyzed the data, authored or reviewed drafts of the paper, supervision,
and approved the final draft.
Mui-zzud-din performed the experiments, authored or reviewed drafts of the paper, and
approved the final draft.
Imran Ashraf analyzed the data, authored or reviewed drafts of the paper, and approved
the final draft.
Gyu Sang Choi analyzed the data, authored or reviewed drafts of the paper, funding, and
approved the final draft.
Data Availability
The following information was supplied regarding data availability:
The dataset is available at Kaggle: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews.
Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.914#supplemental-information.
REFERENCES
Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau RJ. 2011. Sentiment analysis of twitter data.
In: Proceedings of the Workshop on Language in Social Media (LSM 2011). 30–38.
Agarwal B, Mittal N. 2016. Machine learning approach for sentiment analysis. In: Prominent
feature extraction for sentiment analysis. Berlin: Springer, 21–45.
Ali NM, Abd El Hamid MM, Youssif A. 2019. Sentiment analysis for movies reviews dataset using
deep learning models. International Journal of Data Mining & Knowledge Management Process
(IJDKP) 9(3):19–27 DOI 10.5121/ijdkp.2019.9302.
Alpaydin E. 2020. Introduction to machine learning. Cambridge: MIT Press.
Ashraf I, Hur S, Shafiq M, Kumari S, Park Y. 2019a. Guide: smartphone sensors-based pedestrian
indoor localization with heterogeneous devices. International Journal of Communication
Systems 32(15):e4062 DOI 10.1002/dac.4062.
Ashraf I, Hur S, Shafiq M, Park Y. 2019b. Floor identification using magnetic field data with
smartphone sensors. Sensors 19(11):2538 DOI 10.3390/s19112538.
Ayyadevara VK. 2018. Gradient boosting machine. In: Pro Machine Learning Algorithms. Berlin:
Springer, 117–134.
Bakshi RK, Kaur N, Kaur R, Kaur G. 2016. Opinion mining and sentiment analysis. In: 2016 3rd
International Conference on Computing for Sustainable Global Development (INDIACom).
Piscataway: IEEE, 452–455.
Bennett KP, Campbell C. 2000. Support vector machines: hype or hallelujah? ACM SIGKDD
Explorations Newsletter 2(2):1–13 DOI 10.1145/380995.380999.
Bhoir S, Ghorpade T, Mane V. 2017. Comparative analysis of different word embedding models.
In: 2017 International Conference on Advances in Computing, Communication and Control
(ICAC3). Piscataway: IEEE, 1–4.
Biau G, Scornet E. 2016. A random forest guided tour. Test 25(2):197–227
DOI 10.1007/s11749-016-0481-7.
Bodapati JD, Veeranjaneyulu N, Shaik S. 2019. Sentiment analysis from movie reviews using
LSTMs. Ingénierie des Systèmes d'Information 24(1):125–129 DOI 10.18280/isi.240119.
Bruce LM, Koger CH, Li J. 2002. Dimensionality reduction of hyperspectral data using discrete
wavelet transform feature extraction. IEEE Transactions on Geoscience and Remote Sensing
40(10):2331–2338 DOI 10.1109/TGRS.2002.804721.
Cortes C, Vapnik V. 1995. Support-vector networks. Machine Learning 20(3):273–297
DOI 10.1007/BF00994018.
DAT. 2018. IMDb dataset. Available at https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews (accessed 30 October 2021).
Dessi D, Helaoui R, Kumar V, Reforgiato Recupero D, Riboni D. 2020. TF-IDF vs word
embeddings for morbidity identification in clinical notes: an initial study. In: 1st Workshop on
Smart Personal Health Interfaces, SmartPhil 2020 (CEUR-WS). Vol. 2596, 1–12.
Fatima EB, Omar B, Abdelmajid EM, Rustam F, Mehmood A, Choi GS. 2021. Minimizing the
overlapping degree to improve class-imbalanced learning under sparse feature selection: