Modelling 05 00076 v2
1 Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American
University, Ramallah P.O. Box 240, Palestine; h.alawneh3@student.aaup.edu
2 Department of Information Technology, Arab American University, Ramallah P.O. Box 240, Palestine
* Correspondence: ahmad.hasasneh@aaup.edu (A.H.); mohammed.maree@aaup.edu (M.M.)
Abstract: Social media users often express their emotions through text in posts and tweets, and these texts can be used for sentiment analysis, which identifies text as positive or negative. Sentiment analysis is critical for fields such as politics, tourism, e-commerce, education, and health. However, sentiment analysis approaches that perform well on English text encounter challenges with Arabic text due to its morphological complexity. Effective data preprocessing and machine learning techniques are essential to overcome these challenges and provide insightful sentiment predictions for Arabic text. This paper evaluates a combined CNN-LSTM framework with emoji encoding for Arabic Sentiment Analysis, using the Arabic Sentiment Twitter Corpus (ASTC) dataset. Three experiments were conducted with eight-parameter fusion approaches to evaluate the effect of data preprocessing, namely the effect of encoding emojis by their real and emotional meanings. Emoji meanings were collected from four websites specialized in finding the meaning of emojis in social media. Furthermore, the Keras tuner optimized the CNN-LSTM parameters during the 5-fold cross-validation process. The highest accuracy rate (91.85%) was achieved by keeping non-Arabic words and removing punctuation, using the Snowball stemmer after encoding emojis into Arabic text, and applying Keras embedding. This approach is competitive with other state-of-the-art approaches, showing that emoji encoding enriches text by accurately reflecting emotions and enables investigation of the effect of data preprocessing, allowing the hybrid model to achieve results comparable to the study using the same ASTC dataset, thereby improving sentiment analysis accuracy.

Keywords: sentiment analysis; emoji encoding; CNN-LSTM; hyperparameters optimization; NLP; data preprocessing

Citation: Alawneh, H.; Hasasneh, A.; Maree, M. On the Utilization of Emoji Encoding and Data Preprocessing with a Combined CNN-LSTM Framework for Arabic Sentiment Analysis. Modelling 2024, 5, 1469–1489. https://doi.org/10.3390/modelling5040076
more than 500 million speakers worldwide [7], and about 185 million Arabic speakers use
the Web [6]. Thus, ASA has recently emerged as an active research area, particularly in the
field of Machine Learning (ML) applications [8]. Emojis, which are becoming increasingly
popular on social media, offer one way to strengthen the ASA domain, as they provide
helpful features that enrich the textual features used for sentiment analysis [9].
Emojis provide a rich source of semantic dimensions that can help convey users’
opinions. Here, we considered not only emojis that reflect facial expressions but also
those used to enrich the text with concepts and ideas, such as celebrations, weather,
vehicles and buildings, food and drink, and animals and plants, as well as the feelings
and emotions intended by their use [10]. For example, the “ ” emoji means “loves someone,
romance, and affection”, and “ ” means “happiness and excitement in general”, while “ ”
is rich in meanings and intentions, such as “physical mountains or the idea of hiking and
adventure; admiration for nature, strength, or travel; overcoming challenges; or a sense of
peace and contemplation”. Thus, eliminating such emojis could omit the valuable information
and feelings they convey and change the overall meaning and emotional tone of the user’s
tweet. On the other hand, including the intended meaning and emotion of each emoji helps
ML models extract the right insights and supports decision-makers and managers in their
decision-making.
This research presents an emoji-encoding approach that replaces each emoji with its
emotional and real meaning as used on social media. Furthermore, a hybrid deep
learning model is proposed to evaluate the impact of this preprocessing step on the quality
of ASA and to build robust prediction models. These techniques address specific challenges
in Arabic sentiment analysis, such as the complexity of Arabic dialects, the lack of sentiment
lexicons, and the intricacies of Arabic morphology. Our approach advances the state of the
art by offering a more nuanced understanding of how these techniques can be effectively
employed to overcome these challenges. To the best of our knowledge, this is the first work
that combines a hybrid deep learning model with emoji encoding for Arabic Sentiment
Analysis.
Accordingly, we can summarize the main contribution of our proposed approach as
follows:
1. Combination of emoji encoding with the hybrid CNN-LSTM model: Our method inte-
grates an emoji encoding that captures both the emotional and real meanings of emojis,
specifically tailored to enhance the understanding of sentiment in Arabic text.
2. Impact of preprocessing steps: We explored the effects of various preprocessing
techniques, such as keeping non-Arabic words, retaining punctuation, and using
different stemmers and embedding transformers, on the performance of our sentiment
analyzer. This exploration provides deeper insight into how specific transformation
or stemming strategies can leverage punctuation and non-Arabic words to
enhance sentiment extraction from Arabic text.
The rest of this paper is organized as follows. The literature review on sentiment
analysis and text data preprocessing is discussed in Section 2. Then, the proposed method-
ology, including data collection, preprocessing, and hybrid model prediction and tuning
processes, is presented in Section 3. Section 4 shows the results obtained from the different
experiments, which are discussed in Section 5. Finally, Section 6 presents the conclusions
and suggests future work in this area.
Modelling 2024, 5 1471
2. Literature Review
Sentiment analysis is the understanding of people’s opinions, emotions, and attitudes
toward any topic or person expressed in textual data [11]. In the field of Natural Language
Processing, ASA has recently received increasing attention [12]. Our review of the ASA
field found research on hybrid models, deep learning models, and classical machine
learning models for classifying Arabic sentiment.
Hybrid models contribute to our understanding of the complexity of Arabic sentiment,
and they have been trained on a variety of datasets to build predictive models. The study
in [13] applied a combination of Convolutional Neural Network (CNN) and Long Short-
Term Memory (LSTM) on three datasets, the Arabic Health Services Dataset (Main-AHS
and Sub-AHS) [14], Ar-Twitter [15], and the Arabic Sentiment Tweets Dataset (ASTD) [16]
datasets. The max-pooling layer was excluded from the CNN to maintain the same fea-
ture vector length after convolving the filters on the input data. In addition, several
dataset-preparation techniques such as MADAMIRA, Farasa, and Stanford for Arabic text
preprocessing, and several pre-trained word-embedding techniques for providing vector
representations of the text features, such as Word2Vec, GloVe, and fastText, were
investigated to improve the accuracy of Arabic sentiment classification. The best accuracies
were 94.83% for the Main-AHS dataset using Farasa lemmatization normalization,
88.86% for the Ar-Twitter dataset using MADAMIRA stem normalization, and 81.62%
for the ASTD using Word2VecSG word embedding. Subsequently, a more complex
approach was proposed in [17] by implementing a hybrid model to combine contextualized
sentence representations generated by the AraBERT model with static word embedding
using pre-trained Mazajak. In addition, CNN-Bidirectional Long Short-Term Memory
(CNN-BiLSTM) was used to obtain sentence representations from the static word vectors
in order to concatenate the two types of embeddings. The hybrid model outperformed
the standalone AraBERT model on the ArSarcasm-v2 dataset for both the sarcasm
and sentiment classification tasks, with best results of a 0.62 F1 score and a 0.715 F-PN
score (the macro average of the positive- and negative-class F scores) for sarcasm and
sentiment classification, respectively. Another hybrid CNN-BiLSTM model was used in [18]
for different tasks, including a topic classifier, a sentiment analyzer, a sarcasm detector,
and an emotion classifier. This model was trained on different datasets for each task, four
of which were used for the sentiment analysis task: SS2030, ArSAS, the Twitter dataset for
Arabic Sentiment Analysis, and ArSarcasm-v2, consisting of 4214, 21,000, 348,797, and
15,548 tweets, respectively. The proposed model achieved accuracies of 97.58%, 86%, 97%,
and 81.6% for topic, sentiment, sarcasm, and emotion classification, respectively.
On the other hand, deep learning has also been used for ASA. For example, the study
in [19] used deep learning to evaluate GloVe, Word2Vec, and FastText as classical word
embedding techniques and ARBERT as contextualized word embedding for sentiment
analysis with a comparative analysis. The word embedding techniques were evaluated
in trained and pre-trained versions by applying two deep learning models of BiLSTM
and CNN on five datasets, including HARD, Khooli, Arabic Jordanian General Tweets
(AJGT), ArSAS, and ASTD for sentiment classification. The BiLSTM model outperformed
CNN on three datasets, while CNN performed better on the smaller datasets. In addition,
the generated (trained) embeddings outperformed their pre-trained versions by about
0.28% to 1.8% in accuracy. The contextualized transformer-based BERT embedding (ARBERT)
achieved the highest performance in both its trained and pre-trained versions. Another
study [20] employed Deep Neural Networks (DNNs) and also investigated Support Vector
Machines (SVM), Naive Bayes (NB), and Random Forest (RF) as classical ML models, tuned
using Differential Evolution (DE) algorithms, for classifying the sentiment of Arabic texts
related to monkeypox. The dataset was collected from Twitter over eight months,
resulting in 4763 tweets. The best result was obtained using the DNN based on Leaky ReLU,
with an accuracy of 92%.
Classical ML has also been used for ASA. For example, several supervised ML models were
applied in [21], including SVM, Linear Regression, NB, Complementary Naive Bayes
(CNB), and Stochastic Gradient Descent (SGD), for both sentiment and sarcasm classification.
These models were trained and tested with 5-fold cross-validation on the ArSarcasm-v2
dataset. The best accuracy was achieved using SVM with 59.8% and 74.6% for sentiment
and sarcasm, respectively. Based on the same dataset, an improvement was presented
in [22] by applying different versions of two transformer-based models, AraELECTRA and
AraBERT, for sarcasm and sentiment detection. The best results for sarcasm were achieved
by the AraBERTv2-base model with an accuracy of 78.3%, while AraBERTv0.2-large was
the best for the sentiment task, with an accuracy of 65.15%. In [3], by contrast, the
pre-trained model was not used to generate embeddings; instead, the study presented a
three-stage fine-tuning approach for a pre-trained model called Arabic BERT, which was
developed for Arabic sentiment analysis. These stages consist of text preprocessing and
data cleaning, transfer learning of the weights of pre-trained models, and a classification
layer. The model was evaluated on five different Arabic review datasets, and its results
were compared with those of 11 state-of-the-art models, which it outperformed in
prediction accuracy.
Researchers in SA follow different strategies to deal with emojis: some simply eliminate
them, while others have considered the significance of emojis in their work [23]. Including
emojis can help express writers’ feelings, which helps improve classification
performance [24].
One strategy exploits emojis in SA by replacing them with textual data. For example,
the study in [25] translated emojis into text through emoji Unicode translation. It also
investigated the effect of combining Recurrent Neural Network
(RNN), LSTM, and Gated Recurrent Unit (GRU) in conjunction with Logistic Regression
(LR), RF, and SVM and grid search to improve the prediction performance for Arabic
sentiment analysis. The model performance is compared with three deep learning models,
which are RNN, LSTM, and GRU, implemented with CBOW word embedding and tuned
using Keras-tuner, and with five ML models, which are Decision Tree (DT), LR, K-Nearest
Neighbor (KNN), RF, and NB, implemented with the Term Frequency–Inverse Document
Frequency (TF-IDF) feature extraction model and grid-search cross-validation for model
tuning. Different datasets are used for training and testing the models: ASTC, ArTwitter,
and AJGT. Stacking with LR achieved the highest testing accuracy, 92.22%, outperforming
the ML and DL models on the ASTC dataset. In addition, the study in [26] used a Russian
dataset of 6957 posts, each containing at least one emotional indicator (emojis, emoticons,
or punctuation marks that express emotions), and replaced each emotional indicator with
its meaning to improve the model. The best model was an ensemble of a word2vec model
and an emotional-indicator embedding model, which achieved an accuracy of 91% on a test
set of 524 posts.
Another strategy to improve SA is to use emojis as non-verbal features. The study
in [23] adapted non-verbal features for the task of Arabic sentiment analysis, evaluating
several ML models, including NB, multinomial naive Bayes (MNB), SGD, sequential minimal
optimization-based support vector machines (SMO-SVM), DT, and RF, on emoji-based
features with a feature vector of length 429 over 2091 instances. The
MNB achieved the best Area Under the Curve (AUC) of 87.30% when applied to the top
250 most relevant emojis selected using ReliefF and Correlation-Attribute Evaluator feature
selection techniques. In [27], several ML models were also investigated, including SGD,
SVM, Gaussian NB, KNN, DT, LSTM, GRU, Bi-LSTM, and bidirectional-GRU, to evaluate
non-verbal features. A dataset of 2091 microblogs containing emojis was assembled from
ASTD, ArTwitter, QCRI, Syria, and Semeval-2017 Task4 Subtask#A (after excluding tweets
without emojis), together with 843 Arabic microblogs with emojis collected from Twitter and
YouTube. Then, the Emoji Sentiment Ranking (ESR) lexicon, an emoji lexicon containing 969
used emojis after excluding unused ones, and Principal Component Analysis (PCA) were
applied, with PCA reducing the dimensionality of the features from 430 to 100. The best
accuracy of 71.71% was achieved by the bidirectional-GRU model. In addition to non-verbal
features, textual features were also used in [9]. There, five datasets (Syria, ASTD,
ArTwitter, QCRI, and Semeval-2017) were used after removing instances that did not contain
emojis. After merging all the datasets, each tweet was divided into textual and emoji
features. For feature extraction, TF-IDF, Latent Semantic Analysis (LSA), and two
word-embedding methods were used to extract textual features, while the occurrence of
each emoji in a set of 120 emojis was counted to obtain the nonverbal features. The SVM
achieved the best accuracy, 83.02%, by merging skip-gram features with emojis and using
correlation-based feature selection.
In [28], another approach was applied: an attention-based long short-term memory
network was trained on bi-sense emoji embeddings inspired by word-sense embedding.
To obtain sentiment-aware embeddings of emojis, the bi-sense emojis were learned
separately under positive and negative sentiment tweets. The best accuracy of 90%
was achieved on the AA-sentiment dataset using Multi-Level Attention-based LSTM with
bi-sense emoji embedding (MATTBiE-LSTM) and 83.4% on the HA-sentiment dataset using
word-guide attention-based LSTM with bi-sense emoji embedding.
The previous studies used different hybrid models, transformers, and emoji-handling
strategies for ASA. However, the morphological complexity of the Arabic language and
the effect of several factors that change the meaning of the text, such as punctuation, non-
Arabic words, and emojis, mean that the ASA field needs further investigation. The studies
in [13,17–19] applied the hybrid and deep learning models on prepared datasets without
exploring the effect of emoji meaning, punctuation, or sentences in other languages on
the final classification results. Other studies applied classical machine learning models,
such as [21,22], but these models could not overcome the complexity of Arabic and therefore
did not reach high accuracy scores. On the other hand, the studies in [20,25,26] treated
emojis by replacing them with textual data. In contrast, the studies in [23,27] treated
emojis as non-verbal features and removed the text, which may be rich in sentiment and
could improve model results; [9] therefore used both the non-verbal features and the
original text. Although these studies examined emojis, they did not investigate the effect
of keeping non-Arabic words or punctuation, the most suitable transformers when the Arabic
text contains words written in other languages or retains punctuation, or the encoding of
emojis by their emotional and real meanings. In this study, we propose a combination of CNN
and LSTM models trained and tested on the ASTC dataset to improve ASA. The study also
investigates the proposed hybrid model under different experiments and conditions to
understand the importance of each data preprocessing step, including examining the Keras
and AraVec transformers and their suitability when punctuation and non-Arabic words are
kept, emoji handling, and the effect of keeping non-Arabic words and punctuation on the
model results.
Figure 1. The workflow of the proposed model for Arabic sentiment analysis.
3.1. Dataset Descriptions
In this research, the performance of the proposed model and the importance of data
preprocessing were evaluated using two different datasets, namely the Arabic Sentiment
Twitter Corpus (ASTC) and Emoji Meaning. The ASTC dataset contains Arabic tweets labeled
with their corresponding sentiment polarity for training the model, and the Emoji Meaning
dataset forms a dictionary of emoji meanings, which is used to give each emoji in the ASTC
dataset its meaning in order to find the effect of emoji encoding on ASA.
The ASTC [29] is a publicly available dataset on Kaggle, collected in April 2019 using
a positive and negative emoji lexicon. It is a balanced dataset consisting of 56 K labeled
Arabic tweets, as shown in Figure 2, which presents the number of tweets in each class.
The dataset is divided into 45 K tweets for model training, with 22,760 positive and
22,513 negative tweets, and 11 K for model testing, with 5751 positive and 5767 negative
tweets. The target variable is labeled as positive or negative to describe the emotion
of each tweet.
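The split sizes above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in this section; it recomputes the training, test, and overall totals and the share of positive tweets in each split, which is what supports calling the corpus balanced. The helper name `split_summary` is illustrative.

```python
# (positive, negative) counts per split, as stated in the text.
SPLITS = {"train": (22760, 22513), "test": (5751, 5767)}


def split_summary(splits):
    """Return {split: (total, positive_share)} plus an overall 'all' entry."""
    summary = {}
    all_pos = all_neg = 0
    for name, (pos, neg) in splits.items():
        summary[name] = (pos + neg, pos / (pos + neg))
        all_pos += pos
        all_neg += neg
    summary["all"] = (all_pos + all_neg, all_pos / (all_pos + all_neg))
    return summary


for name, (total, share) in split_summary(SPLITS).items():
    print(f"{name}: {total} tweets, {share:.1%} positive")
```

This prints 45273 training tweets (50.3% positive), 11518 test tweets (49.9% positive), and 56791 tweets overall (50.2% positive), consistent with the rounded 45 K, 11 K, and 56 K figures.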
In addition, the Emoji Meaning dataset consists of 912 emojis collected from the ASTC
dataset, each of which was then mapped to its meaning. Each emoji has an emotional meaning
based on how Twitter users use it in their tweets, such as “ ”, which means “السعادة بشكل
عام أو الود او عندما يكون الشخص ساخر أو سلبي بشكل عدواني”, in English “general happiness
or friendliness, or when someone is being sarcastic or passive-aggressive”. Other emojis
have an emotional meaning and can also be used to indicate their real meaning, such as
“ ”, which means “حيوانك الأليف او الكلب او الولاء والصداقة والرفقة والثقة”, in English
“your pet or dog, or loyalty, friendship, companionship, and trust”. To obtain all the
emoji meanings, four websites [30–33] specialized in collecting emoji meanings were used
to map each emoji to its meaning. These websites were also used to validate the
consistency of the emoji meanings by cross-referencing several emoji interpretation
databases or dictionaries. Because most emojis can have multiple meanings, the strategy
used for handling ambiguity and validating the emojis’ meanings was to include all the
commonly shared emotional and real word meanings among Arabic-speaking users, provided
by [30,31], and English-speaking users, provided by [32,33], supported by human judgment.
Finally, the performance comparison with and without emoji encoding, which shows that
including emoji meanings improves the robustness of the model, serves as a final step in
validating the emoji encoding process. The Emoji Meaning dataset thus provides a rich
source of emotion and context that improves model performance when each emoji in the
ASTC dataset is replaced with its emotional and real meaning. It is important to note that
emoji interpretation may differ across cultures; to address this issue, this study utilizes
and interprets emojis according to their common norms across cultures globally.
Nevertheless, we acknowledge that further investigation is still required to identify both
common and uncommon emojis that may have various meanings depending on culture.
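The two emoji-handling options (removing emojis versus replacing each one with its meaning) can be sketched as a dictionary-driven substitution. This is a minimal sketch that assumes the Emoji Meaning dataset is loaded as a Python dict from emoji string to meaning. The three entries and their English meanings are hypothetical placeholders, since the paper's 912 entries are in Arabic. The function names `encode_emojis` and `remove_emojis` are illustrative and not taken from the authors' code.

```python
import re

# Hypothetical excerpt of an emoji-to-meaning dictionary (the real Emoji
# Meaning dataset has 912 entries with Arabic meanings from [30-33]).
EMOJI_MEANINGS = {
    "\U0001F602": "laughter and joy",                # face with tears of joy
    "\u2764\ufe0f": "love, romance, and affection",  # red heart + variation selector
    "\U0001F494": "sadness and heartbreak",          # broken heart
}


def _emoji_pattern(meanings):
    # Try longer keys first so multi-code-point emojis such as
    # U+2764 U+FE0F are matched as a whole, not by a shorter prefix.
    keys = sorted(meanings, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def encode_emojis(text, meanings=EMOJI_MEANINGS):
    """Encoding approach: replace every known emoji with its textual meaning."""
    pattern = _emoji_pattern(meanings)
    out = pattern.sub(lambda m: " " + meanings[m.group(0)] + " ", text)
    return " ".join(out.split())  # collapse the padding spaces


def remove_emojis(text, meanings=EMOJI_MEANINGS):
    """Removal approach: delete every known emoji."""
    out = _emoji_pattern(meanings).sub(" ", text)
    return " ".join(out.split())


print(encode_emojis("great match \U0001F602\U0001F602"))
print(remove_emojis("sad day \U0001F494"))
```

Because the dictionary was built from the emojis that occur in ASTC, matching only its keys is enough for both approaches in this sketch. A repeated emoji is encoded once per occurrence, so the first call prints its meaning twice.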
Figure 2. The number of positive and negative tweets.
2. Remove hashtags: Twitter users widely adopt the hashtag character “#” to bookmark
their tweet content or join a topic or trend community [34,35]. Therefore, the hashtags
were removed during this phase.
3. Remove diacritics: All diacritics were removed from the data because they do not affect
the SA measurements [36]. For example, in a tweet that means in English “I heard him crying
on the day of the funeral, then they stood up to offer their condolences to me”, followed
by a broken heart emoji, diacritics such as the Fatha are used to give the sentence an
aesthetic shape without following the rules of the Arabic language, for instance by placing
a Fatha after the broken heart emoji.
4. Remove numbers: All numbers were removed from the tweets because they do not reflect
the sentiment contained in the text and are therefore uninformative for SA [37].
5. Removing stop words: Stop words occur frequently in both Arabic and English [36]
and carry little semantic value [38]. Therefore, Arabic stop words were removed when
only Arabic words were kept in the dataset, and both Arabic and English stop words
were removed when both languages were kept.
6. Tokenization: The TweetTokenizer from the NLTK library was used to break the text
into tokens [39]. It is a simple and fast tokenizer that focuses on data from Twitter and
works based on regular expressions [40]. Also, TweetTokenizer preserves the emojis
and emoticons as tokens, which allows them to be handled appropriately, and it deals
with the repeated characters by reducing them to a length of three [41,42]. This makes
it suitable for the dataset and preprocessing experiments used.
7. Preprocessing is divided into three phases, corresponding to three experiments:
removing non-Arabic words and punctuation in Experiment 1, keeping the non-Arabic words
and removing the punctuation in Experiment 2, and keeping both the non-Arabic words and
the punctuation in Experiment 3. Each experiment is then tested under eight preprocessing
conditions, denoted R1–R8 as shown in Figure 1, based on the conditions described in
points 8, 9, and 10.
8. Handling Emojis: To study the effect of emojis, two approaches were followed; the
first one involved removing the emojis from each tweet, while the second approach
treated emojis by collecting the emotional and real meaning of emojis based on their
usage on social media platforms and replacing each emoji with its textual meaning.
9. Stemming: This is a common morphological analysis that aims to reduce inflectional
forms and achieve a common base form for words in sentences [43]. For the Ara-
bic natural language, different stemmers can indicate the lexical root of the words.
Therefore, the effect of using different stemmers in Arabic sentiment analysis was
investigated by applying both the Information Science Research Institute’s (ISRI) and
Snowball stemmers.
10. Embedding: Embedding provides a numerical representation for words and sentences
by transforming each word into a numerical vector representation that captures the
syntactic and semantic meaning based on its contextual usage in the dataset [44]. The
effect of the transformation model was investigated by evaluating two transformation
methods, Keras embedding and AraVec 3.0 embedding. Keras embedding is trainable
and is not a pre-trained model. This means that the embedding vector for each word was
initialized randomly with small weights, and during back-propagation, the embedding
vectors were updated to minimize the loss function [45]. On the other hand, AraVec
3.0 [46] is an open-source project that provides pre-trained models for
Arabic word embedding. The latest version, AraVec 3.0, has been
trained on two Arabic content domains, namely tweets and Arabic Wikipedia articles,
and provides 16 different word-embedding models. This version also
provides two types of models, unigrams and n-grams, and the most commonly used
n-gram models are trained on a total of more than 1,169,075,128 tokens. In this
research, the n-gram model was used to generate embeddings with a vector size of 100.
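The cleaning steps listed above can be sketched as a single function whose two flags reproduce the Experiment 1–3 settings. This is a minimal, standard-library sketch built on several assumptions. Emojis are assumed to be already encoded or removed. The whole hashtag token is dropped, although the text does not say whether only the "#" is removed. The stop-word lists are tiny hypothetical stand-ins. A simple regular-expression tokenizer replaces NLTK's `TweetTokenizer`, and it mimics that tokenizer's `reduce_len` behaviour by collapsing runs of three or more identical characters to three.

```python
import re
import string

# Arabic diacritics: U+064B-U+0652 (tanween, fatha, damma, kasra, shadda,
# sukun) and U+0670 (superscript alef).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
HASHTAG = re.compile(r"#\S+")       # assumption: the whole hashtag is dropped
DIGITS = re.compile(r"\d+")         # for str patterns \d also covers Arabic-Indic digits
REPEATS = re.compile(r"(.)\1{2,}")  # runs of 3+ identical chars -> 3 (like reduce_len)
ARABIC_WORD = re.compile(r"[\u0600-\u06FF]+")
PUNCT = set(string.punctuation) | {"\u060C", "\u061B", "\u061F"}  # plus Arabic comma, semicolon, question mark

# Hypothetical, deliberately tiny stop-word lists (not the paper's lists).
AR_STOP = {"\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649", "\u0639\u0646"}  # fi, min, ala, an
EN_STOP = {"the", "a", "is", "of"}


def preprocess(tweet, keep_non_arabic=False, keep_punct=False):
    """Clean and tokenize one tweet whose emojis were already handled.

    Experiment 1: keep_non_arabic=False, keep_punct=False
    Experiment 2: keep_non_arabic=True,  keep_punct=False
    Experiment 3: keep_non_arabic=True,  keep_punct=True
    """
    text = HASHTAG.sub(" ", tweet)
    text = DIACRITICS.sub("", text)
    text = DIGITS.sub(" ", text)
    text = REPEATS.sub(r"\1\1\1", text)
    # Simplified stand-in for NLTK's TweetTokenizer: runs of word
    # characters, or any single non-space symbol.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    stop = AR_STOP | (EN_STOP if keep_non_arabic else set())
    kept = []
    for tok in tokens:
        if tok in PUNCT:
            if keep_punct:
                kept.append(tok)
        elif not keep_non_arabic and not ARABIC_WORD.fullmatch(tok):
            continue  # Experiment 1 drops non-Arabic tokens
        elif tok.lower() not in stop:
            kept.append(tok)
    return kept


print(preprocess("sooooo good!!! 123 #win", keep_non_arabic=True, keep_punct=True))
```

The usage line prints `['sooo', 'good', '!', '!', '!']`. The stemming step would follow on the returned tokens. NLTK provides both stemmers named in the text, as `nltk.stem.isri.ISRIStemmer` and `nltk.stem.snowball.SnowballStemmer("arabic")`, but they are not called here so that the sketch needs no third-party packages.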
3.3.3. CNN-LSTM
In this study, we developed a combined deep learning architecture specifically for Arabic
sentiment analysis and classification. The workflow of the proposed combined CNN-LSTM
model includes five stages, which can be summarized as follows: the embedding layer, CNN
layer, max pooling layer, LSTM layer, and output layer (as shown in Figure 3). The CNN-LSTM
model is a hybrid model that combines the advantages of both CNNs and RNNs, specifically
LSTM networks. This combination results in an effective model that captures both local
and global dependencies in the text data, making it well-suited for sentiment analysis tasks.
dim to specify the vocabulary size of the dataset, output-dim to set the dimensionality of the vector space in which the words are embedded, and input-length to specify the length of the input sequences [50]. These parameters determine the shape of the embedding layer output, (batch-size, input-length, output-dim), where a batch-size of "None" denotes a dynamic batch size, which is common in Keras implementations [47]. The value of input-length depends on the preprocessing steps, which affect the sequence length of the input data. The output dimension was explored between 100 and 400 in steps of 50 for the Keras embedding, and was 100 for AraVec, since it is a pre-trained model with a static vector size of 100. Thus, the shape of the embedding layer output differs between experiments and runs, depending on the input sequence length, which varies with whether non-Arabic words, punctuation, or emojis are retained or removed, and on the output dimension, which is determined during the hyperparameter tuning phase. The final shapes of the generated embeddings for all experiments are summarized in Table 1. In this research work, two embedding transformation methods were used to study their effect on model performance: AraVec 3.0 and Keras embedding.
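To make the shape (batch-size, input-length, output-dim) concrete, the following is a minimal pure-Python sketch of an embedding lookup. It assumes a small hypothetical vocabulary of 20 ids, randomly initialised vectors, and padding id 0; the names `embed`, `shape`, and `OUTPUT_DIMS` are illustrative only. A Keras Embedding layer performs the same row lookup with a trainable weight matrix, whereas AraVec 3.0 supplies pre-trained vectors of size 100 instead.

```python
import random

def embed(batch, vocab_size, output_dim, seed=0):
    """Map each token id to a vector; the result is a nested list of shape
    (batch_size, input_length, output_dim)."""
    rng = random.Random(seed)
    # Hypothetical randomly initialised table with one row per vocabulary id,
    # analogous to the weight matrix of an embedding layer.
    table = [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
             for _ in range(vocab_size)]
    return [[table[token] for token in sequence] for sequence in batch]

def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)

# Output dimensions searched for the Keras embedding (100 to 400, step 50).
OUTPUT_DIMS = list(range(100, 401, 50))

# Two sequences padded to length 6 (0 = padding id), vocabulary of 20 ids.
batch = [[3, 7, 1, 0, 0, 0], [5, 2, 9, 11, 4, 0]]
out = embed(batch, vocab_size=20, output_dim=150)
print(shape(out))   # (2, 6, 150)
print(OUTPUT_DIMS)  # [100, 150, 200, 250, 300, 350, 400]
```

The batch dimension here is fixed at 2 only because two sequences were passed; in the model it remains dynamic (None) until a batch size is chosen.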
Table 1. The output shape of each layer for the 1D CNN-LSTM model.
Exp | Run | Embedding Shape | Convolutional Layer | Max Pooling | LSTM Layer | Flatten
Exp 1 | R1 | (None, 1189, 150) | (None, 1187, 400) | (None, 593, 400) | (None, 593, 250) | (None, 148250)
Exp 1 | R2 | (None, 1956, 300) | (None, 1954, 100) | (None, 977, 100) | (None, 977, 80) | (None, 78160)
Exp 1 | R3 | (None, 1955, 150) | (None, 1953, 400) | (None, 976, 400) | (None, 976, 170) | (None, 165920)
Exp 1 | R4 | (None, 1956, 100) | (None, 1955, 200) | (None, 977, 200) | (None, 977, 130) | (None, 127010)
Exp 1 | R5 | (None, 968, 200) | (None, 967, 300) | (None, 483, 300) | (None, 483, 100) | (None, 48300)
Exp 1 | R6 | (None, 1955, 100) | (None, 1953, 400) | (None, 976, 400) | (None, 976, 200) | (None, 195200)
Exp 1 | R7 | (None, 968, 100) | (None, 963, 300) | (None, 481, 300) | (None, 481, 190) | (None, 91390)
Exp 1 | R8 | (None, 969, 100) | (None, 964, 100) | (None, 482, 100) | (None, 482, 120) | (None, 57840)
Exp 2 | R1 | (None, 1328, 400) | (None, 1327, 200) | (None, 663, 200) | (None, 663, 300) | (None, 198900)
Exp 2 | R2 | (None, 2129, 150) | (None, 2128, 400) | (None, 1064, 400) | (None, 1064, 270) | (None, 287280)
Exp 2 | R3 | (None, 2128, 350) | (None, 2126, 400) | (None, 1063, 400) | (None, 1063, 230) | (None, 244490)
Exp 2 | R4 | (None, 2129, 100) | (None, 2128, 100) | (None, 1064, 100) | (None, 1064, 110) | (None, 117040)
Exp 2 | R5 | (None, 1141, 150) | (None, 1139, 400) | (None, 569, 400) | (None, 569, 160) | (None, 91040)
Exp 2 | R6 | (None, 2128, 100) | (None, 2127, 200) | (None, 1063, 200) | (None, 1063, 150) | (None, 159450)
Exp 2 | R7 | (None, 1141, 100) | (None, 1140, 400) | (None, 570, 400) | (None, 570, 190) | (None, 108300)
Exp 2 | R8 | (None, 1142, 100) | (None, 1141, 200) | (None, 570, 200) | (None, 570, 280) | (None, 159600)
Exp 3 | R1 | (None, 1167, 300) | (None, 1166, 100) | (None, 583, 100) | (None, 583, 220) | (None, 128260)
Exp 3 | R2 | (None, 2048, 350) | (None, 2046, 400) | (None, 1023, 400) | (None, 1023, 70) | (None, 71610)
Exp 3 | R3 | (None, 2128, 400) | (None, 2127, 300) | (None, 1063, 300) | (None, 1063, 70) | (None, 74410)
Exp 3 | R4 | (None, 2221, 100) | (None, 2219, 400) | (None, 1109, 400) | (None, 1109, 130) | (None, 144170)
Exp 3 | R5 | (None, 1141, 400) | (None, 1139, 100) | (None, 569, 100) | (None, 569, 90) | (None, 51210)
Exp 3 | R6 | (None, 2128, 100) | (None, 2126, 200) | (None, 1063, 200) | (None, 1063, 230) | (None, 244490)
Exp 3 | R7 | (None, 1141, 100) | (None, 1136, 100) | (None, 568, 100) | (None, 568, 80) | (None, 45440)
Exp 3 | R8 | (None, 1167, 100) | (None, 1166, 200) | (None, 583, 200) | (None, 583, 230) | (None, 134090)
Next, the CNN layer extracts local features [51] from the output of the embedding layer and passes them to the max pooling layer. The features are extracted by sliding the convolutional filter (kernel) along the input matrix [50]. This yields an output of shape (None, Conv-input-length, Conv-filters), where None again denotes the dynamic batch size. The shapes in Table 1 are consistent with an unpadded convolution with stride 1, for which Conv-input-length = input-length − kernel-size + 1 (e.g., 1189 − 3 + 1 = 1187 for Exp 1 R1). The values of Conv-input-length and the number of convolutional filters for all experiments are summarized in Table 1. The number of filters in the Conv1D layer varied between 100 and 400, in increments of
100. Here, a convolutional filter is applied to a window of $h$ words, $X_{i:i+h-1}$, where $h$ is the window size, $X_i$ is the $K$-dimensional vector of the $i$th word, and $X_{i:i+j}$ denotes the input feature matrix spanning words $i$ to $i+j$ of the sentence [52]. The window size, also called the kernel size, was tested with values of 2, 3, and 6 to capture different n-gram features. Each window yields a feature $C_i$, as given in Equation (1).
Modelling 2024, 5 1479
$$C_i = f\left(W \cdot X_{i:i+h-1} + b\right) \tag{1}$$
where $W$ is the convolutional filter, $b \in \mathbb{R}$ is the bias, and $f$ is the activation function [52]; here, the output of each filter in the CNN layer is passed through the ReLU activation function, which allows the layer to learn complex patterns. A feature map is then generated by convolving a single filter over all windows of words in a sentence of $n$ words:
$$C = \left[C_1, C_2, C_3, \ldots, C_{n-h+1}\right] \tag{2}$$
so the $m$ filters in the convolutional layer generate $m(n - h + 1)$ features [52]. L2 activity regularization with a factor of 0.01 is then applied to prevent overfitting.
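Equations (1) and (2) can be checked with a minimal pure-Python sketch of one convolutional filter sliding over a sentence matrix, with ReLU as $f$. The toy word vectors, filter weights, and bias below are hypothetical; the sketch omits the L2 activity regularization and uses a single filter with no padding, so a sentence of $n$ words yields $n - h + 1$ features.

```python
def relu(x):
    return max(0.0, x)

def conv_feature_map(X, W, b, f=relu):
    """Slide one filter W (h rows of K weights) over the sentence matrix X
    (n rows of K-dimensional word vectors) and return the feature map C of
    length n - h + 1, where C_i = f(W . X_{i:i+h-1} + b)."""
    n, h = len(X), len(W)
    C = []
    for i in range(n - h + 1):
        window = X[i:i + h]  # words i .. i+h-1
        # Sum of element-wise products, i.e. the dot product W . X_{i:i+h-1}.
        s = sum(w * x for w_row, x_row in zip(W, window)
                for w, x in zip(w_row, x_row))
        C.append(f(s + b))
    return C

# Toy sentence of n = 5 words with K = 2 dimensions; window size h = 3.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.5, -0.5]]
W = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
b = -0.5
C = conv_feature_map(X, W, b)
print(len(C), C)  # 3 [0.0, 0.5, 0.0]
```

With $m$ such filters, stacking the $m$ feature maps gives the $m(n - h + 1)$ features mentioned above.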
The max function of the max pooling layer, with a pool size of 2, is applied to the output of each CNN filter to select the maximum feature value in each pooling window while iterating across the matrix [53]. This reduces the output complexity while preserving the important features [51] that are fed to the LSTM layer. The output shape of the max pooling layer is (None, ⌊Conv-input-length/2⌋, Conv-filters), with the division rounded down (e.g., 1187 → 593); Table 1 shows the output shapes of this layer for all experiments.
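Max pooling with a pool size of 2 can be sketched in a few lines of pure Python. The sketch assumes non-overlapping windows (stride equal to the pool size) and no padding, which matches the halved, rounded-down lengths in Table 1; the feature values are toy numbers, and `max_pool_1d` is an illustrative name.

```python
def max_pool_1d(feature_map, pool_size=2):
    """Non-overlapping 1D max pooling without padding: trailing values that
    do not fill a whole window are dropped, so the output length is
    len(feature_map) // pool_size."""
    n_windows = len(feature_map) // pool_size
    return [max(feature_map[k * pool_size:(k + 1) * pool_size])
            for k in range(n_windows)]

# A feature map of odd length 7 produced by one filter.
print(max_pool_1d([0.1, 0.9, 0.4, 0.3, 0.0, 0.7, 0.2]))  # [0.9, 0.4, 0.7]

# Output length for the Exp 1 R1 convolutional output (1187 positions).
print(len(max_pool_1d([0.0] * 1187)))  # 593
```

In the model, this operation is applied independently to each of the Conv-filters channels, so the channel count is unchanged.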
Next, the LSTM layer is used to handle long-term dependencies and capture the context that reflects the emotions in the text. In this research, the dropout rate of the LSTM layer was set by the Keras tuner, which searched values between 0.2 and 0.5 in steps of 0.1 to prevent overfitting, and the number of LSTM units was searched between 30 and 300 in steps of 10; this range provides a trade-off between model performance and computational efficiency. As the shapes in Table 1 show, the LSTM layer returns its full output sequence, and it is followed by a Flatten layer that reshapes this (None, timesteps, units) output into a single vector of length timesteps × units for the dense layer. The output shapes of both the LSTM and Flatten layers are presented in Table 1.
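The layer shapes in Table 1 follow from a few integer rules. The pure-Python sketch below traces them under assumptions inferred from the table rather than stated in the text: unpadded convolution with stride 1, non-overlapping pooling of size 2, an LSTM that returns its full output sequence, and a Flatten layer that multiplies the remaining dimensions. Table 1 does not list kernel sizes; the values used below (3 for Exp 1 R1, 2 for Exp 2 R1) are inferred from the difference between the embedding and convolutional lengths, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_length, output_dim, kernel_size, filters, lstm_units,
                 pool_size=2):
    """Return the batch-free output shape of each layer in the
    Embedding -> Conv1D -> MaxPooling1D -> LSTM -> Flatten stack."""
    conv_len = input_length - kernel_size + 1   # unpadded convolution, stride 1
    pool_len = conv_len // pool_size            # non-overlapping pooling
    return {
        "embedding": (input_length, output_dim),
        "conv": (conv_len, filters),
        "pool": (pool_len, filters),
        "lstm": (pool_len, lstm_units),        # full output sequence returned
        "flatten": (pool_len * lstm_units,),
    }

# Exp 1 R1: input length 1189, output dim 150, 400 filters, 250 LSTM units.
for layer, shp in trace_shapes(1189, 150, kernel_size=3, filters=400,
                               lstm_units=250).items():
    print(layer, (None,) + shp)
```

Running it prints the five shapes of the Exp 1 R1 row of Table 1, from (None, 1189, 150) to (None, 148250); prepending None mirrors Keras's dynamic batch dimension.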
The last layer, also called the fully connected layer [49], is a dense layer with one neuron and a sigmoid activation; it produces an output of shape (None, 1) and classifies the flattened LSTM features as positive or negative sentiment. The hybrid model was trained with the Adam optimizer, with a learning rate sampled on a logarithmic scale between 1 × 10^−5 and 1 × 10^−3 to promote stable training convergence.
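The search ranges stated in this section can be gathered into one search-space definition. The following pure-Python sketch samples a configuration from it, drawing the learning rate log-uniformly between 1e-5 and 1e-3. It uses plain random sampling for illustration only (the study itself used the Keras Tuner with Bayesian Optimization), and the dictionary keys and `sample_config` are hypothetical names.

```python
import random

# Search ranges described in the text (key names are illustrative).
SEARCH_SPACE = {
    "output_dim": list(range(100, 401, 50)),   # Keras embedding only; AraVec uses 100
    "filters": list(range(100, 401, 100)),
    "kernel_size": [2, 3, 6],
    "lstm_units": list(range(30, 301, 10)),
    "dropout": [0.2, 0.3, 0.4, 0.5],           # written out to avoid float steps
}

def sample_config(rng):
    """Draw one configuration. The learning rate exponent is uniform in
    [-5, -3], so the rate itself is log-uniform between 1e-5 and 1e-3."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config["learning_rate"] = 10 ** rng.uniform(-5, -3)
    return config

rng = random.Random(42)
config = sample_config(rng)
print(config)
```

Sampling the exponent rather than the rate itself gives each decade (1e-5 to 1e-4 and 1e-4 to 1e-3) equal probability, which is the usual motivation for logarithmic sampling of learning rates.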
Moreover, each layer of the CNN-LSTM model contributes trainable parameters that are updated during training. Table 2 presents the number of trainable parameters for each layer of the CNN-LSTM model for all experiments and runs, giving a clear picture of the model complexity and its capacity to learn from the data. These numbers can be obtained automatically for each layer using the summary() method of a Keras model.
Table 2. The number of trainable parameters for each layer of the 1D CNN-LSTM model.
Exp | Run | Embedding Param | Conv Layer Param | Max Pooling Param | LSTM Layer Param | Flatten Param | Dense Layer Param
Exp 1 | R1 | 10,774,200 | 180,400 | 0 | 651,000 | 0 | 148,251
Exp 1 | R2 | 5,438,400 | 90,100 | 0 | 57,920 | 0 | 78,161
Exp 1 | R3 | 4,934,700 | 180,400 | 0 | 388,280 | 0 | 165,921
Exp 1 | R4 | 147,671,600 | 40,200 | 0 | 172,120 | 0 | 127,011
Exp 1 | R5 | 6,217,000 | 120,300 | 0 | 160,400 | 0 | 48,301
Exp 1 | R6 | 147,671,600 | 120,400 | 0 | 480,800 | 0 | 195,201
Exp 1 | R7 | 147,671,600 | 180,300 | 0 | 373,160 | 0 | 91,391
Exp 1 | R8 | 147,671,600 | 60,100 | 0 | 106,080 | 0 | 57,841
Exp 2 | R1 | 30,356,400 | 160,200 | 0 | 601,200 | 0 | 198,901
Exp 2 | R2 | 2,938,350 | 120,400 | 0 | 724,680 | 0 | 287,281
Exp 2 | R3 | 12,019,700 | 420,400 | 0 | 580,520 | 0 | 244,491
Exp 2 | R4 | 147,671,600 | 20,100 | 0 | 92,840 | 0 | 117,041
Exp 2 | R5 | 4,897,950 | 180,400 | 0 | 359,040 | 0 | 91,041
Exp 2 | R6 | 147,671,600 | 40,200 | 0 | 210,600 | 0 | 159,451
Exp 2 | R7 | 147,671,600 | 80,400 | 0 | 449,160 | 0 | 108,301
Exp 2 | R8 | 147,671,600 | 40,200 | 0 | 538,720 | 0 | 159,601
Exp 3 | R1 | 5,450,400 | 60,100 | 0 | 282,480 | 0 | 128,261
Exp 3 | R2 | 6,902,000 | 420,400 | 0 | 131,880 | 0 | 71,611
Exp 3 | R3 | 13,765,200 | 240,300 | 0 | 103,880 | 0 | 74,411
Exp 3 | R4 | 147,671,600 | 120,400 | 0 | 276,120 | 0 | 144,171
Exp 3 | R5 | 13,086,800 | 120,100 | 0 | 68,760 | 0 | 51,211
Exp 3 | R6 | 147,671,600 | 60,200 | 0 | 396,520 | 0 | 244,491
Exp 3 | R7 | 147,671,600 | 60,100 | 0 | 57,920 | 0 | 45,441
Exp 3 | R8 | 147,671,600 | 40,200 | 0 | 396,520 | 0 | 134,091
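Each column of Table 2 can be reproduced from standard parameter-count formulas for these layer types: vocabulary × output-dim for the embedding; filters × (kernel-size × input-channels + 1) for Conv1D; 4 × units × (input-features + units + 1) for a standard LSTM (four gates, each with input weights, recurrent weights, and a bias); and inputs + 1 for a one-neuron dense layer, while pooling and Flatten layers have no parameters. The sketch below checks these formulas against Exp 1 R1; the vocabulary size of 71,828 and the kernel size of 3 are inferred from Tables 1 and 2 (10,774,200 / 150 and 1189 − 1187 + 1), not stated in the text.

```python
def embedding_params(vocab_size, output_dim):
    # One output_dim-dimensional vector per vocabulary entry.
    return vocab_size * output_dim

def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel, filter) plus one bias per filter.
    return filters * (kernel_size * in_channels + 1)

def lstm_params(input_features, units):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * units * (input_features + units + 1)

def dense_params(inputs, units=1):
    return units * (inputs + 1)

# Exp 1 R1: vocabulary 71,828 (inferred), dim 150, kernel 3 (inferred),
# 400 filters, 250 LSTM units, flattened length 593 * 250.
params = {
    "embedding": embedding_params(71_828, 150),
    "conv": conv1d_params(3, 150, 400),
    "lstm": lstm_params(400, 250),
    "dense": dense_params(593 * 250),
}
print(params)
# {'embedding': 10774200, 'conv': 180400, 'lstm': 651000, 'dense': 148251}
```

The same formulas reproduce, for example, Exp 1 R2 (90,100 convolutional and 57,920 LSTM parameters with a kernel size of 3, 300-dimensional embeddings, 100 filters, and 80 units).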
The final phase is model tuning using the Keras tuner [54], an easy-to-use framework that provides scalable hyperparameter optimization for deep learning models. The Keras tuner simplifies hyperparameter search by offering built-in search algorithms (Random Search, Bayesian Optimization, and Hyperband) and a define-by-run syntax for configuring the search space to find the best set of hyperparameter values for the model. In this research, the Keras tuner was used with Bayesian Optimization, which explores the hyperparameter space by focusing on promising regions, thereby reducing the number of trials required. The final hyperparameters were selected based on their improvement of model performance in 5-fold cross-validation. Thus, the best model was selected not only for its validation accuracy but also for its consistency across the folds.
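The selection criterion above, favouring both high validation accuracy and consistency across folds, could be sketched as ranking candidates by mean fold accuracy and breaking near-ties by the spread across folds. This is an illustrative assumption about how the two criteria might be combined, not the Keras tuner's internal logic; the candidate names, fold accuracies, tie tolerance, and `select_best` are hypothetical.

```python
from statistics import mean, pstdev

def select_best(fold_scores, tie_tolerance=0.005):
    """fold_scores maps a candidate name to its validation accuracy on each
    fold. Candidates whose mean is within tie_tolerance of the best mean are
    treated as tied, and the one with the smallest standard deviation wins."""
    means = {name: mean(scores) for name, scores in fold_scores.items()}
    best_mean = max(means.values())
    tied = [name for name, m in means.items() if best_mean - m <= tie_tolerance]
    return min(tied, key=lambda name: pstdev(fold_scores[name]))

# Hypothetical 5-fold validation accuracies for three configurations.
fold_scores = {
    "config_a": [0.93, 0.88, 0.94, 0.87, 0.93],   # mean 0.910, unstable
    "config_b": [0.91, 0.90, 0.91, 0.90, 0.91],   # mean 0.906, stable
    "config_c": [0.85, 0.86, 0.84, 0.85, 0.86],   # clearly worse
}
print(select_best(fold_scores))  # config_b
```

With a tolerance of zero, the rule reduces to picking the highest mean accuracy (config_a here), so the tolerance controls how much mean accuracy is traded for stability.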
Five-fold cross-validation with the Keras tuner was used to validate the model and optimize the hyperparameters over five different validation folds, which allowed the tuner to explore different hyperparameter combinations and select those that make the model perform best for ASA. Moreover, during model training and validation, the dynamic batch size previously defined as None was set to 50, which gave the best results among the values tried manually. Finally, the model with the best validation performance was evaluated on previously unseen test data, using the same number of epochs: 10.
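A five-fold split of the training indices, in which each sample appears in exactly one validation fold, can be sketched in pure Python as follows. The shuffling seed, the contiguous fold layout, and the name `k_fold_indices` are assumptions for illustration; scikit-learn's `KFold` provides equivalent functionality.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs. Indices are shuffled once,
    then split into k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0)
                  for f in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_indices(23, k=5))
print([len(val) for _, val in folds])  # [5, 5, 5, 4, 4]
```

Each hyperparameter candidate would be trained on the train indices and scored on the validation indices of every fold, and the five scores would then feed the selection step.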
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{4}$$
Recall (sensitivity) is a popular metric that measures the proportion of actual positive observations that the model correctly predicts as positive, as shown in Equation (5).
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{5}$$
The F1-score is the harmonic mean of precision and recall and is an important complement to accuracy when assessing test performance [56]. The calculation of the F1-score is shown in Equation (6).
$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$
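Equations (4)–(6) translate directly into code. The following pure-Python sketch computes precision, recall, and F1-score from confusion-matrix counts; the counts in the usage line are hypothetical toy values, and returning 0.0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
def precision(tp, fp):
    """Equation (4): share of positive predictions that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Equation (5): share of actual positives that are predicted positive."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(tp, fp, fn):
    """Equation (6): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy counts: 80 true positives, 20 false positives, 10 false negatives.
print(round(precision(80, 20), 3), round(recall(80, 10), 3),
      round(f1_score(80, 20, 10), 3))
# 0.8 0.889 0.842
```

Algebraically, Equation (6) simplifies to 2TP / (2TP + FP + FN), here 160 / 190 ≈ 0.842, which is a convenient cross-check.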
4. Results
This section reports the performance of the combined deep learning model and the effect of each preprocessing step on it, including the effect of removing or keeping non-Arabic words and punctuation, across eight combinations of preprocessing conditions. These combinations extend the scope of the research to the role of replacing emojis with their meanings and to the choice of an appropriate stemmer and embedding transformer for each group of preprocessing steps. The eight combinations, labelled R1 to R8, are as follows.
• R1: ISRI stemmer, emoji removal, Keras embedding.
• R2: ISRI stemmer, emoji encoding, Keras embedding.
• R3: Snowball stemmer, emoji encoding, Keras embedding.
• R4: ISRI stemmer, emoji encoding, AraVec 3.0 embedding.
• R5: Snowball stemmer, emoji removal, Keras embedding.
• R6: Snowball stemmer, emoji encoding, AraVec 3.0 embedding.
• R7: Snowball stemmer, emoji removal, AraVec 3.0 embedding.
• R8: ISRI stemmer, emoji removal, AraVec 3.0 embedding.
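The eight runs form a full 2 × 2 × 2 factorial design over stemmer, emoji handling, and embedding. The short Python sketch below encodes the R1–R8 list as data, checks that every combination appears exactly once, and lists the pairs of runs that differ only in the embedding; the tuple layout and variable names are illustrative choices.

```python
from itertools import product

# R1-R8 as listed above: (stemmer, emoji handling, embedding).
RUNS = {
    "R1": ("ISRI", "removal", "Keras"),
    "R2": ("ISRI", "encoding", "Keras"),
    "R3": ("Snowball", "encoding", "Keras"),
    "R4": ("ISRI", "encoding", "AraVec 3.0"),
    "R5": ("Snowball", "removal", "Keras"),
    "R6": ("Snowball", "encoding", "AraVec 3.0"),
    "R7": ("Snowball", "removal", "AraVec 3.0"),
    "R8": ("ISRI", "removal", "AraVec 3.0"),
}

FULL_DESIGN = set(product(["ISRI", "Snowball"], ["removal", "encoding"],
                          ["Keras", "AraVec 3.0"]))

# Every factor combination occurs exactly once across the eight runs.
assert set(RUNS.values()) == FULL_DESIGN and len(RUNS) == len(FULL_DESIGN)

# Pairs of runs that share stemmer and emoji handling (Keras vs. AraVec).
pairs = [(a, b) for a, b in product(RUNS, RUNS)
         if a < b and RUNS[a][:2] == RUNS[b][:2]]
print(pairs)  # [('R1', 'R8'), ('R2', 'R4'), ('R3', 'R6'), ('R5', 'R7')]
```

These four pairs are the comparisons used in the experiments below when contrasting Keras embedding with AraVec.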
After preprocessing the data, during model training and cross-validation, the Keras tuner used Bayesian Optimization to sample a set of hyperparameters and then trained the model with these parameters. The model performance was then evaluated on the validation data. These operations were repeated for a predefined number of iterations. Once all iterations had been completed, this process was repeated for each of the five cross-validation folds, and the model with the best accuracy was selected. Table 3 summarizes the best set of hyperparameters.
The best set of hyperparameters differs between experiments, and no single set recurs more often than the others. Therefore, no single set of values could be generalized to all experiments; instead, each experiment used the best set of values found by the Keras tuner for it. The results of Experiments 1–3 are summarized in Table 4, Table 5, and Table 6, respectively.
All experiments were conducted on the Google Colab-L4 platform using Python version
3.10.12. The deep learning models were implemented using Keras version 3.4.1, running
on TensorFlow version 2.17.0.
4.1. Experiment 1
In this experiment, the effect of removing non-Arabic words and punctuation was
tested over eight conditions, resulting in eight experimental runs.
The results in Table 4 show that removing the emojis had a negative impact on the model performance, yielding the lowest accuracies of 70.15%, 69.92%, 53.87%, and 53.87% in R1, R5, R7, and R8, respectively, whereas translating the emojis into their textual meaning improved the classification performance in R2, R3, R4, and R6, with accuracies of 90.23%, 91.69%, 87.32%, and 76.09%, respectively. Comparing R2 and R3 with R4 and R6 also indicates that Keras embedding outperforms AraVec for both stemmers. Moreover, Keras embedding gives a better representation than AraVec when the emojis are removed (R1 and R5 versus R8 and R7), which may be explained by the Keras embedding being trained specifically on the same dataset.
4.2. Experiment 2
In this experiment, the impact of keeping the non-Arabic tokens and removing the punctuation was tested using eight combinations of parameters, resulting in eight experimental runs.
The results in Table 5 show that removing the emojis had a negative impact on the model performance, yielding the lowest accuracies of 70.11%, 69.94%, 56.48%, and 56.07% in R1, R5, R7, and R8, respectively, whereas translating the emojis into their textual meaning improved the classification performance in R2, R3, R4, and R6, with accuracies of 89.81%, 91.85%, 78.08%, and 75.59%, respectively. The results in R2 and R3, compared with those in R4 and R6, again indicate that Keras embedding outperforms AraVec for both stemmers, and Keras embedding also gives a better representation when the emojis are removed (R1 and R5). In addition, keeping the non-Arabic words in this experiment showed the superior ability of the Snowball stemmer and Keras embedding to handle words from other languages compared with the ISRI stemmer and AraVec embedding: relative to Experiment 1, accuracy increased in R3 but decreased in R2 and R4.
4.3. Experiment 3
In this experiment, the effect of keeping the non-Arabic words and punctuation was
tested over eight conditions, resulting in eight experimental runs.
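The three experiments differ only in whether non-Arabic tokens and punctuation are kept. A minimal Python sketch of these two switches is shown below. The regular expression (Arabic letters U+0621–U+064A plus diacritics U+064B–U+0652), the punctuation set, and the function name `preprocess` are assumptions for illustration; the study's full pipeline (emoji handling, stemming, and other cleaning steps) is not reproduced.

```python
import re
import string

# Assumed definitions: an "Arabic word" consists only of Arabic letters and
# diacritics; punctuation is ASCII punctuation plus the Arabic comma,
# semicolon and question mark.
ARABIC_WORD = re.compile(r"^[\u0621-\u064A\u064B-\u0652]+$")
PUNCTUATION = string.punctuation + "\u060C\u061B\u061F"
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in PUNCTUATION})

def preprocess(text, keep_non_arabic, keep_punctuation):
    if not keep_punctuation:
        text = text.translate(PUNCT_TO_SPACE)   # punctuation becomes a separator
    tokens = text.split()
    if not keep_non_arabic:
        tokens = [t for t in tokens if ARABIC_WORD.match(t)]
    return " ".join(tokens)

tweet = "الفيلم جميل\u060C I love it!"
print(preprocess(tweet, keep_non_arabic=False, keep_punctuation=False))  # Exp 1
print(preprocess(tweet, keep_non_arabic=True, keep_punctuation=False))   # Exp 2
print(preprocess(tweet, keep_non_arabic=True, keep_punctuation=True))    # Exp 3
```

Replacing punctuation with a space rather than deleting it avoids gluing together words that were separated only by a punctuation mark.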
Table 6 again shows the importance of translating the emojis, which clearly improves the model results, and shows that Keras embedding outperforms the AraVec transformer. However, keeping the punctuation did not improve the results; instead, the accuracies in Table 6, where the punctuation was retained, decreased compared with those in Table 5 for all runs except R2 and R6. These results suggest that Twitter users often use punctuation to decorate text rather than following the rules of the Arabic language, and they highlight the importance of removing punctuation from tweets to obtain reliable SA results. In this experiment, the best result of 90.43% was achieved in R3 using Keras embedding, the Snowball stemmer, and emoji encoding, which reflects their ability to deal with punctuation and to extract the emotions expressed in place of the emojis.
Figure 4 shows the confusion matrix of Experiment 2 R3, which achieves the best accuracy of all experiments. True positives (TP) and true negatives (TN) together account for 91.85% of the test samples, and false positives and false negatives for 8.15%, indicating strong performance with high TP and TN rates. Specifically, the model correctly predicted 4199 of the 4454 negative samples (TN) and 3921 of the 4386 positive samples (TP). These results suggest that our hybrid CNN-LSTM model effectively captures the nuances of the data, resulting in few misclassifications.
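The confusion-matrix figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below derives the false positives and false negatives from the reported counts (4454 actual negatives, of which 4199 are TN; 4386 actual positives, of which 3921 are TP) and recomputes accuracy, error rate, and the per-class rates. It is a consistency check on the reported numbers, not part of the original pipeline.

```python
# Counts reported for Exp. 2 R3 (Figure 4).
actual_neg, tn = 4454, 4199
actual_pos, tp = 4386, 3921

fp = actual_neg - tn          # negatives predicted as positive
fn = actual_pos - tp          # positives predicted as negative
total = actual_neg + actual_pos

accuracy = (tp + tn) / total
error_rate = (fp + fn) / total
tnr = tn / actual_neg         # specificity (true negative rate)
tpr = tp / actual_pos         # recall / sensitivity (true positive rate)

print(fp, fn, total)                          # 255 465 8840
print(f"{accuracy:.4%} {error_rate:.4%}")     # 91.8552% 8.1448%
print(f"TNR={tnr:.4f} TPR={tpr:.4f}")
```

The recomputed accuracy and error rate agree with the reported 91.85% and 8.15% to within about 0.01 percentage points.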
Figure 5 shows the Receiver Operating Characteristic Curve (ROC-Curve) for the same experiment. The ROC-Curve is consistent with the confusion matrix in Figure 4: it rises steeply toward the upper-left corner near the Y-axis, reflecting high true positive and true negative rates. This highlights the robustness of the model in distinguishing between positive and negative sentiments and contributes to its overall strong performance, with an area under the ROC-Curve (AUC) of 91.84%.
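When a classifier outputs hard 0/1 labels rather than scores, its ROC curve consists of straight segments from (0, 0) to (FPR, TPR) to (1, 1), and the trapezoidal area under it equals (TPR + TNR)/2, the balanced accuracy. The sketch below applies this to the Figure 4 counts. The text does not state whether the ROC-Curve was computed from predicted probabilities or from hard labels, so the agreement with the reported 91.84% is offered only as a consistency check; `trapezoid_auc` is an illustrative helper.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear curve given (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Figure 4 counts for Exp. 2 R3.
tn, fp = 4199, 4454 - 4199
tp, fn = 3921, 4386 - 3921
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)

# ROC of a hard-label classifier: (0, 0) -> (FPR, TPR) -> (1, 1).
auc = trapezoid_auc([(0.0, 0.0), (fpr, tpr), (1.0, 1.0)])
balanced_accuracy = (tpr + (1 - fpr)) / 2
print(f"AUC={auc:.4f} balanced accuracy={balanced_accuracy:.4f}")
```

Both quantities come out at about 0.9184; with probability scores instead of hard labels, the ROC-Curve would have many points and its area could differ from this value.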
of Arabic sentiment analysis was investigated by checking the model performance with
different preprocessing groups to find the most suitable set of preprocessing steps for
the tweet dataset. We also investigated the translation of emojis into their meanings to
understand their importance in data preparation.
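Emoji encoding replaces each emoji with its meaning before tokenization, whereas emoji removal simply deletes it. The following Python sketch contrasts the two. The three-entry dictionary of Arabic meanings is a hypothetical stand-in for the study's emoji meaning dataset, and the emoji-detection character ranges are a simplification; neither is taken from the paper.

```python
import re

# Hypothetical miniature emoji-meaning dictionary (emoji -> Arabic meaning).
EMOJI_MEANINGS = {
    "\U0001F602": "ضحك",   # face with tears of joy -> "laughter"
    "\u2764": "حب",        # red heart -> "love"
    "\U0001F622": "حزن",   # crying face -> "sadness"
}

# Simplified emoji pattern: common emoji blocks, miscellaneous symbols and
# dingbats, and the emoji variation selector.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")

def encode_emojis(text):
    """Replace known emojis with their meaning (padded with spaces), then
    drop any remaining emoji characters."""
    for emoji, meaning in EMOJI_MEANINGS.items():
        text = text.replace(emoji, f" {meaning} ")
    return " ".join(EMOJI_PATTERN.sub(" ", text).split())

def remove_emojis(text):
    """Delete all emoji characters matched by the simplified pattern."""
    return " ".join(EMOJI_PATTERN.sub(" ", text).split())

tweet = "الفيلم رائع\U0001F602\u2764\uFE0F"
print(encode_emojis(tweet))  # الفيلم رائع ضحك حب
print(remove_emojis(tweet))  # الفيلم رائع
```

The encoded tweet keeps the sentiment carried by the emojis as ordinary Arabic tokens, which the stemmer and embedding can then process like any other word.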
In the first experiment, all non-Arabic words and punctuation were removed. Then,
the model was used to evaluate the different techniques for handling emojis, stemming,
and embedding. The results in Table 4 show that removing emojis from the data resulted in
poor classification accuracy in R1, R5, R7, and R8, whereas translating emojis into real and
emotional meanings improved the model accuracy in R2, R3, R4, and R6, reaching 91.69%
in R3 when using Snowball stemmer and Keras embedding. Also, using the ISRI stemmer
in R2 gave a close result of 90.23%. In R4 and R6, the pre-trained AraVec 3.0 yielded smaller improvements, with accuracies of 87.32% and 76.09%, respectively.
In the second experiment, the results in Table 5 suggest that, relative to Table 4, keeping the non-Arabic words had no positive effect when emoji encoding was combined with the ISRI stemmer or with AraVec 3.0 embedding (R2, R4, and R6), whereas it improved the results in R3 and R5, which use the Snowball stemmer and Keras embedding. This indicates that the combination of the Snowball stemmer and Keras embedding can handle both the emotions carried by the emojis and the words written in other languages, and can use them to produce a richer vector representation, while the ISRI stemmer and AraVec transformer could not exploit the non-Arabic words to improve the classification results, especially with emoji encoding. A likely reason is that AraVec is a pre-trained model trained on Arabic tweets and Wikipedia texts, so the presence of non-Arabic words degrades its transformation performance, whereas the Keras transformer is trained on the same dataset, which helps it better represent the emotions from the emojis and the non-Arabic words. Accordingly, the best result over all experiments, 91.85%, was achieved in Exp. 2 R3 by keeping the non-Arabic words, which often act as strong sentiment indicators that contribute to the overall meaning of a post; keeping them led to an improvement in sentiment classification accuracy. For example, a tweet containing the phrase “I love” would likely indicate a positive sentiment, and removing such tokens would remove important context from the post, potentially leading to misclassification. AraVec 3.0 embedding was also slightly positively affected by keeping the non-Arabic words when the emojis were removed, compared with Experiment 1 R7 and R8. A possible explanation is that, once the emojis are removed, AraVec can provide vector representations for the tweets with its fixed output dimension of 100, whereas the output dimension of the Keras embedding is selected by the Keras tuner, yielding more meaningful embeddings.
In the third experiment, non-Arabic words and punctuation were retained, and the effect of emoji removal and emoji encoding on the model was the same as in Experiments 1 and 2; that is, the accuracy obtained after removing the emojis was improved by replacing each emoji with its meaning. The effect of punctuation was also tested in this experiment by keeping both the punctuation and the non-Arabic words and comparing the model performance with that obtained when keeping the non-Arabic words and removing the punctuation. These results show that keeping the punctuation had a negative effect on model accuracy in all runs except R2 and R6, whose results improved; the effect was notable in R3, which provided the best accuracy in Exp 2. These results suggest an indiscriminate use of punctuation by Twitter users. Thus, removing the punctuation is expected to yield a more reliable and consistent model.
The results obtained by the proposed approach on the ASTC dataset were compared with those of another approach applied to the same dataset, the study in [25]. Table 7 presents the differences between the two studies in aim, preprocessing steps, and classification model.
Table 7. Comparison.
The proposed model achieves results comparable to those of the Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis [25], which used emoji Unicode translation and CBOW word embedding to generate a numerical representation of the text for RNN, LSTM, and GRU models combined with three meta-learners (LR, RF, and SVM) to classify the tweets. The model investigated in [25] aimed to improve the performance of Arabic sentiment prediction. Its preprocessing began by cleaning the data through the removal of non-Arabic letters, digits, single Arabic letters, symbols, URLs, emails, and hashtags, followed by tokenization by splitting the text on spaces, stop-word removal, stemming with the ISRI stemmer, and emoji Unicode translation. In contrast, the proposed approach investigated a hybrid CNN-LSTM model under different data preprocessing steps, achieving comparable accuracy, and highlighted the effect of encoding emojis into their emotional and real meanings, as well as the effects of non-Arabic words, punctuation, Arabic stemmers, and trainable versus pre-trained embedding transformers. This research demonstrates the compatibility of the Snowball stemmer, Keras embedding, and the CNN-LSTM model, and shows how keeping the non-Arabic words improved the model while keeping the punctuation degraded it. Moreover, both the study in [25] and our approach showed that using emoji meanings enriches the sentiment of the text, achieving accuracies of 92.22% and 91.85%, respectively; in addition, our approach compared the results of removing the emojis with those of transforming them, which supports the validity of the meanings in the generated emoji meaning dataset.
Author Contributions: Conceptualization, H.A.; methodology, H.A.; software, H.A.; validation, H.A.,
A.H. and M.M.; formal analysis, H.A.; investigation, H.A.; data curation, H.A.; writing—original draft
preparation, H.A.; writing—review and editing, A.H. and M.M.; visualization, H.A.; supervision,
A.H. and M.M. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The ASTC dataset supporting the findings in this research is available from
the link in the dataset citation. The Emoji Meaning dataset is available upon request.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Diwali, A.; Saeedi, K.; Dashtipour, K.; Gogate, M.; Cambria, E.; Hussain, A. Sentiment Analysis Meets Explainable Artificial
Intelligence: A Survey on Explainable Sentiment Analysis. IEEE Trans. Affect. Comput. 2023, 15, 837–846. [CrossRef]
2. Saberi, B.; Saad, S. Sentiment analysis or opinion mining: A review. Int. J. Adv. Sci. Eng. Inf. Technol. 2017, 7, 1660–1666.
3. Abdelfattah, M.F.; Fakhr, M.W.; Rizka, M.A. ArSentBERT: Fine-tuned bidirectional encoder representations from transformers
model for Arabic sentiment classification. Bull. Electr. Eng. Inform. 2023, 12, 1196–1202. [CrossRef]
4. Mohammed, A.; Kora, R. Deep learning approaches for Arabic sentiment analysis. Soc. Netw. Anal. Min. 2019, 9, 52. [CrossRef]
5. Abdelwahab, Y.; Kholief, M.; Sedky, A.A.H. Justifying Arabic Text Sentiment Analysis Using Explainable AI (XAI): LASIK
Surgeries Case Study. Information 2022, 13, 536. [CrossRef]
6. Oueslati, O.; Cambria, E.; Ben HajHmida, M.; Ounelli, H. A review of sentiment analysis research in Arabic language. Future
Gener. Comput. Syst. 2020, 112, 408–430. [CrossRef]
7. Al Shamsi, A.A.; Abdallah, S. Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects. J. King Saud.
Univ.-Comput. Inf. Sci. 2023, 35, 101691. [CrossRef]
8. Elnagar, A.; Einea, O.; Lulu, L. Comparative study of sentiment classification for automated translated Latin reviews into Arabic.
In Proceedings of the IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), Hammamet,
Tunisia, 30 October–3 November 2017; pp. 443–448. [CrossRef]
9. Al-Azani, S.; El-Alfy, E.S.M. Combining emojis with Arabic textual features for sentiment classification. In Proceedings of the 2018 9th
International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 3–5 April 2018; pp. 139–144. [CrossRef]
10. Novak, P.K.; Smailović, J.; Sluban, B.; Mozetič, I. Sentiment of Emojis. PLoS ONE 2015, 10, e0144296. [CrossRef]
11. Soleymani, M.; Garcia, D.; Jou, B.; Schuller, B.; Chang, S.F.; Pantic, M. A survey of multimodal sentiment analysis. Image Vis.
Comput. 2017, 65, 3–14. [CrossRef]
12. Li, W.; Zhu, L.; Shi, Y.; Guo, K.; Cambria, E. User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM
family models. Appl. Soft Comput. 2020, 94, 106435. [CrossRef]
13. Alayba, A.M.; Palade, V. Leveraging Arabic sentiment classification using an enhanced CNN-LSTM approach and effective
Arabic text preparation. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 9710–9722. [CrossRef]
14. Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. Arabic language sentiment analysis on health services. In Proceedings of the 2017 1st
International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 114–118. [CrossRef]
15. Abdulla, N.A.; Ahmed, N.A.; Shehab, M.A.; Al-Ayyoub, M. Arabic sentiment analysis: Lexicon-based and corpus-based. In
Proceedings of the 2013 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT),
Amman, Jordan, 3–5 December 2013; pp. 1–6. [CrossRef]
16. Nabil, M.; Aly, M.; Atiya, A. ASTD: Arabic sentiment tweets dataset. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2515–2519. Available online: https://aclanthology.org/D15-1299.pdf (accessed on 29 April 2024).
17. Hengle, A.; Kshirsagar, A.; Desai, S.; Marathe, M. Combining Context-Free and Contextualized Representations for Arabic Sarcasm
Detection and Sentiment Identification. arXiv 2021, arXiv:2103.05683. Available online: https://arxiv.org/abs/2103.05683v1
(accessed on 7 September 2023).
18. Jalil, A.A.; Aliwy, A.H. Classification of Arabic Social Media Texts Based on a Deep Learning Multi-Tasks Model. Al-Bahir J. Eng.
Pure Sci. 2023, 2, 12. [CrossRef]
19. Sabbeh, S.F.; Fasihuddin, H.A. A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification. Electronics 2023, 12, 1425. [CrossRef]
20. Gharaibeh, H.; Al Mamlook, R.E.; Samara, G.; Nasayreh, A.; Smadi, S.; Nahar, K.M.; Aljaidi, M.; Al-Daoud, E.; Gharaibeh, M.;
Abualigah, L. Arabic sentiment analysis of Monkeypox using deep neural network and optimized hyperparameters of machine
learning algorithms. Soc. Netw. Anal. Min. 2024, 14, 30. [CrossRef]
21. Nayel, H.; Amer, E.; Allam, A.; Abdallah, H. Machine Learning-Based Model for Sentiment and Sarcasm Detection. In
Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kiev, Ukraine, 19 April 2021; pp. 386–389. Available
online: https://aclanthology.org/2021.wanlp-1.51 (accessed on 7 September 2023).
22. Wadhawan, A. AraBERT and Farasa Segmentation Based Approach for Sarcasm and Sentiment Detection in Arabic Tweets. arXiv
2021, arXiv:2103.01679. Available online: https://arxiv.org/abs/2103.01679v1 (accessed on 7 September 2023).
23. Al-Azani, S.; El-Alfy, E.S.M. Emoji-Based Sentiment Analysis of Arabic Microblogs Using Machine Learning. In Proceedings of the 21st
Saudi Computer Society National Computer Conference (NCC), Riyadh, Saudi Arabia, 25–26 April 2018; pp. 1–6. [CrossRef]
24. Arifiyanti, A.A.; Wahyuni, E.D. Emoji and emoticon in tweet sentiment classification. In Proceedings of the 6th Information
Technology International Seminar (IT IS), Surabaya, Indonesia, 14–16 October 2020; pp. 145–150. [CrossRef]
25. Saleh, H.; Mostafa, S.; Alharbi, A.; El-Sappagh, S.; Alkhalifah, T. Heterogeneous Ensemble Deep Learning Model for Enhanced
Arabic Sentiment Analysis. Sensors 2022, 22, 3707. [CrossRef]
26. Surikov, A.; Egorova, E. Alternative method sentiment analysis using emojis and emoticons. Procedia Comput. Sci. 2020,
178, 182–193. [CrossRef]
27. Al-Azani, S.; El-Alfy, E.S. Emojis-based sentiment classification of Arabic microblogs using deep recurrent neural networks.
In Proceedings of the 2018 International Conference on Computing Sciences and Engineering (ICCSE), Kuwait City, Kuwait,
11–13 March 2018; pp. 1–6. [CrossRef]
28. Chen, Y.; You, Q.; Yuan, J.; Luo, J. Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM.
In Proceedings of the 2018 ACM Multimedia Conference (MM 2018), Seoul, Republic of Korea,
22–26 October 2018; pp. 117–125. [CrossRef]
29. Arabic Sentiment Twitter Corpus. Available online: https://www.kaggle.com/datasets/mksaad/arabic-sentiment-twitter-corpus/data?select=arabic_tweets (accessed on 31 March 2024).
30. EmojiGuide. Available online: https://ar.emojiguide.com/ (accessed on 9 April 2024).
31. EmojiAll. Available online: https://www.emojiall.com/ar (accessed on 9 April 2024).
32. Symbol Planet. Available online: https://symbolplanet.com/smileys-emotion-emoji-meanings/ (accessed on 9 April 2024).
33. wikiHow. Available online: https://www.wikihow.com/Category:Emoticons-and-Emojis (accessed on 9 April 2024).
34. Ma, Z.; Sun, A.; Yuan, Q.; Cong, G. Tagging your tweets: A probabilistic modeling of hashtag annotation in Twitter. In Proceedings
of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai,
China, 3 November 2014; pp. 999–1008. [CrossRef]
35. Yang, L.; Sun, T.; Zhang, M.; Mei, Q. We know what @you #tag: Does the dual role affect hashtag adoption? In Proceedings of the
21st Annual Conference on World Wide Web (WWW), Lyon, France, 16–20 April 2012; pp. 261–270. [CrossRef]
36. Khalid Bolbol, N.; Maghari, A.Y. Sentiment analysis of Arabic tweets using supervised machine learning. In Proceedings
of the 2020 International Conference on Promising Electronic Technologies (ICPET), Jerusalem, Palestine,
16–17 December 2020; pp. 89–93. [CrossRef]
37. Khamphakdee, N.; Seresangtakul, P. An Efficient Deep Learning for Thai Sentiment Analysis. Data 2023, 8, 90. [CrossRef]
38. Al-Helalat, M. Enhanced Arabic information retrieval for informed decision-making: Empowering political search. Int. J. Progress.
Res. Eng. Manag. Sci. (IJPREMS) 2023, 3, 232–240. Available online: https://www.ijprems.com/uploadedfiles/paper/issue_7_july_2023/31816/final/fin_ijprems1689480149.pdf (accessed on 10 May 2024).
39. Gurusamy, V.; Professor, A. Preprocessing Techniques for Text Mining. Int. J. Comput. Sci. Commun. Netw. 2014, 5, 7–16.
40. Van Der Goot, R. Where are we Still Split on Tokenization? In Findings of the Association for Computational Linguistics: EACL;
Association for Computational Linguistics: St. Julian’s, Malta, 2024; pp. 118–137. Available online: https://aclanthology.org/2024.findings-eacl.9 (accessed on 27 April 2024).
41. Bird, S. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions; Association
for Computational Linguistics: Sydney, Australia, 2006; pp. 69–72. Available online: https://aclanthology.org/P06-4018.pdf
(accessed on 27 April 2024).
42. Islam, J.; Mercer, R.E.; Xiao, L. Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition. In
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies (NAACL HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 1355–1365. [CrossRef]
43. Maree, M.; Eleyat, M.; Rabayah, S.; Belkhatir, M. A hybrid composite features based sentence level sentiment analyzer. IAES Int. J.
Artif. Intell. 2023, 12, 284–294. [CrossRef]
44. Radwan, A.; Amarneh, M.; Alawneh, H.; Ashqar, H.I.; AlSobeh, A.; Magableh, A.A.A.R. Predictive Analytics in Mental Health
Leveraging LLM Embeddings and Machine Learning Models for Social Media Analysis. Int. J. Web Serv. Res. 2024, 21, 1–22. [CrossRef]
45. Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing Ltd.: Birmingham, UK, 2017. Available online: https://scholar.google.com/scholar_lookup?title=Deep+Learning+with+KERAS&author=Gulli,+A.&author=Pal,+S.&publication_year=2017 (accessed on 9 May 2024).
46. Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP. Procedia Comput.
Sci. 2017, 117, 256–265. [CrossRef]
47. Bin Syed, M.A.; Ahmed, I. A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System
(AIS) Data. Sensors 2023, 23, 6400. [CrossRef]
48. Hu, F.; Yang, Q.; Yang, J.; Luo, Z.; Shao, J.; Wang, G. Incorporating multiple grid-based data in CNN-LSTM hybrid model for
daily runoff prediction in the source region of the Yellow River Basin. J. Hydrol. Reg. Stud. 2024, 51, 101652. [CrossRef]
49. Ghourabi, A.; Mahmood, M.A.; Alzubi, Q.M. A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English
Messages. Future Internet 2020, 12, 156. [CrossRef]
50. Saleh, H.; Mostafa, S.; Gabralla, L.A.; Aseeri, A.O.; El-Sappagh, S. Enhanced Arabic Sentiment Analysis Using a Novel Stacking
Ensemble of Hybrid and Deep Learning Models. Appl. Sci. 2022, 12, 8967. [CrossRef]
51. Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A Hybrid CNN-LSTM Model for Improving Accuracy of Movie Reviews Sentiment
Analysis. Multimed. Tools Appl. 2019, 78, 26597–26613. [CrossRef]
Modelling 2024, 5 1489
52. Khan, L.; Amjad, A.; Afaq, K.M.; Chang, H.T. Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman
Urdu Text Shared in Social Media. Appl. Sci. 2022, 12, 2694. [CrossRef]
53. Behera, R.K.; Jena, M.; Rath, S.K.; Misra, S. Co-LSTM: Convolutional LSTM model for sentiment analysis in social big data. Inf.
Process. Manag. 2021, 58, 102435. [CrossRef]
54. KerasTuner. Available online: https://keras.io/keras_tuner/ (accessed on 12 April 2024).
55. Alawneh, H.; Hasasneh, A. Survival Prediction of Children after Bone Marrow Transplant Using Machine Learning Algorithms.
Int. Arab. J. Inf. Technol. 2024, 21, 394–407. [CrossRef]
56. Islam, M.A.; Iacob, I.E. Manuscripts Character Recognition Using Machine Learning and Deep Learning. Modelling 2023,
4, 168–188. [CrossRef]
57. Al-Radhi, M.S.; Abdo, O.; Csapó, T.G.; Abdou, S.; Németh, G.; Fashal, M. A continuous vocoder for statistical parametric speech synthesis
and its evaluation using an audio-visual phonetically annotated Arabic corpus. Comput. Speech Lang. 2020, 60, 101025. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.