Article
On the Utilization of Emoji Encoding and Data Preprocessing with a Combined CNN-LSTM Framework for Arabic Sentiment Analysis
Hussam Alawneh 1 , Ahmad Hasasneh 1, * and Mohammed Maree 2, *
1 Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American
University, Ramallah P.O. Box 240, Palestine; h.alawneh3@student.aaup.edu
2 Department of Information Technology, Arab American University, Ramallah P.O. Box 240, Palestine
* Correspondence: ahmad.hasasneh@aaup.edu (A.H.); mohammed.maree@aaup.edu (M.M.)
Abstract: Social media users often express their emotions through text in posts and tweets, and these
can be used for sentiment analysis, identifying text as positive or negative. Sentiment analysis is
critical for different fields such as politics, tourism, e-commerce, education, and health. However,
sentiment analysis approaches that perform well on English text encounter challenges with Arabic text
due to its morphological complexity. Effective data preprocessing and machine learning techniques
are essential to overcome these challenges and provide insightful sentiment predictions for Arabic
text. This paper evaluates a combined CNN-LSTM framework with emoji encoding for Arabic
Sentiment Analysis, using the Arabic Sentiment Twitter Corpus (ASTC) dataset. Three experiments
were conducted with eight-parameter fusion approaches to evaluate the effect of data preprocessing,
namely the effect of emoji encoding on their real and emotional meaning. Emoji meanings were
collected from four websites specialized in finding the meaning of emojis in social media. Furthermore,
the Keras tuner optimized the CNN-LSTM parameters during the 5-fold cross-validation process. The
highest accuracy rate (91.85%) was achieved by keeping non-Arabic words and removing punctuation,
using the Snowball stemmer after encoding emojis into Arabic text, and applying Keras embedding.
This approach is competitive with other state-of-the-art approaches, showing that emoji encoding
enriches text by accurately reflecting emotions, and enabling investigation of the effect of data
preprocessing, allowing the hybrid model to achieve comparable results to the study using the same
ASTC dataset, thereby improving sentiment analysis accuracy.
Keywords: sentiment analysis; emoji encoding; CNN-LSTM; hyperparameters optimization; NLP;
data preprocessing
Citation: Alawneh, H.; Hasasneh, A.; Maree, M. On the Utilization of Emoji Encoding and Data
Preprocessing with a Combined CNN-LSTM Framework for Arabic Sentiment Analysis. Modelling 2024, 5,
1469–1489. https://doi.org/10.3390/modelling5040076
Academic Editor: Alfredo Cuzzocrea

Received: 2 July 2024
Revised: 20 September 2024
Accepted: 30 September 2024
Published: 7 October 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Due to the growth and proliferation of social media platforms, the huge amount of
textual data available on the Internet is prompting more attention to be given to sentiment
analysis [1]. Sentiment analysis (SA), often referred to as opinion mining, is a type of
Natural Language Processing (NLP) that aims to extract sentiments by analyzing textual
data and classifying it based on text polarity [2]. It plays an important role in analyzing
thoughts, opinions, and emotions in texts written about healthcare systems, e-commerce,
and social networks [3]. Although Arabic is one of the most widely used languages in
the world, research in Arabic sentiment analysis is still growing slowly compared to other
languages such as English [4]. Therefore, extending the same success in SA to the Arabic
language is still a challenge.
In the field of SA, most research is focused on the English language, with little attention
paid to the Arabic language [5]. This is because Arabic Sentiment Analysis (ASA) is still
challenging due to Arabic varieties, orthography, morphology, lack of corpora, lack of
sentiment lexicons, and the use of dialectal Arabic [6]. Arabic is a global language with
more than 500 million speakers worldwide [7], and about 185 million Arabic speakers use
the Web [6]. Thus, ASA has recently emerged as an active research area, particularly in the
field of Machine Learning (ML) applications [8]. One way to strengthen the ASA
domain is to use emojis, which are becoming more popular on social media and
provide helpful features that enrich the textual features for sentiment
analysis [9].
They provide a rich source of semantic dimensions that can assist in conveying
users’ opinions. Here, we did not just consider emoticons that reflect facial expres-
sions, but also those that are used to enrich the text with concepts and ideas, such
as celebrations, weather status, vehicles and buildings, food and drink, animals and
plants, and the intended feelings and emotions from their use [10]. For example, one
emoji has an Arabic meaning that translates into English as “loves someone,
romance, and affection”, and another means “happiness and excitement in general”,
while a third is rich in meanings and intentions: “Physical mountains or the
idea of hiking and adventure. Admiration for nature, strength, or travel. Or overcoming
challenges, or a sense of peace and contemplation”. Thus, eliminating such emojis could
omit valuable information and feelings that they reflect, and change the overall meaning
of the user’s tweet and its emotional tone. On the other hand, including the intended
meaning and emotion of the emoji will help ML extract the right insights and support
decision-makers and managers in their decision-making.
This research presents an emoji encoding approach that replaces each emoji with its
emotional and real social media meaning. Furthermore, a hybrid deep
learning model is proposed to evaluate the impact of this preprocessing step on the quality
of ASA and to build robust prediction models. These techniques address specific challenges
in Arabic sentiment analysis, such as the complexity of Arabic dialects, the lack of sentiment
lexicons, and the intricacies of Arabic morphology. Our approach advances the state of the
art by offering a more nuanced understanding of how these techniques can be effectively
employed to overcome these challenges. To the best of our knowledge, this is the first work
that utilizes a combined deep learning approach with emoji encoding for Arabic Sentiment
Analysis.
Accordingly, we can summarize the main contributions of our proposed approach as
follows:
1. Combination of emoji encoding with the hybrid CNN-LSTM model: Our method integrates
emoji encoding that captures all the emotional and real meanings, specifically
tailored to enhance the understanding of sentiment in Arabic text.
2. Impact of preprocessing steps: We explored the effects of various preprocessing
techniques, such as keeping non-Arabic words, retaining punctuation, and using
different stemmers and embedding transformers, on the performance of our sentiment
analyzer. This exploration provides deeper insights into how specific transformation
or stemming strategies can effectively leverage punctuation and non-Arabic words to
enhance sentiment extraction in Arabic text.
The rest of this paper is organized as follows. The literature review on sentiment
analysis and text data preprocessing is discussed in Section 2. Then, the proposed method-
ology, including data collection, preprocessing, and hybrid model prediction and tuning
processes, is presented in Section 3. Section 4 shows the results obtained from the different
experiments, which are discussed in Section 5. Finally, Section 6 presents the conclusions
and suggests future work in this area.
2. Literature Review
Sentiment analysis is the understanding of people’s opinions, emotions, and attitudes
toward any topic or person expressed in textual data [11]. In the field of Natural Language
Processing, ASA has recently received increasing attention [12]. Existing ASA research
covers hybrid models, deep learning models, and classical machine learning models for
classifying Arabic sentiments.
Hybrid models play a role in our understanding of the complexity of Arabic sentiment
as these models are trained on different datasets to build predictive models. The study
in [13] applied a combination of Convolutional Neural Network (CNN) and Long Short-Term
Memory (LSTM) on three datasets: the Arabic Health Services Dataset (Main-AHS and
Sub-AHS) [14], Ar-Twitter [15], and the Arabic Sentiment Tweets Dataset (ASTD) [16].
The max-pooling layer was excluded from the CNN to maintain the same feature vector
length after convolving the filters on the input data. In addition, several
dataset-preparation techniques such as MADAMIRA, Farasa, and Stanford for Arabic text
preprocessing and several pre-trained word-embedding techniques for providing vector
representation for the text features, such as Word2Vec, GloVe, and fastText, were
investigated to improve the accuracy of Arabic sentiment classification. The best
accuracies were 94.83% for the Main-AHS dataset using Farasa lemmatization normalization,
88.86% for the Ar-Twitter dataset using MADAMIRA stem normalization, and 81.62%
for the ASTD using Word2VecSG word embedding. Subsequently, a more complex approach
was proposed in [17] by implementing a hybrid model to combine contextualized
sentence representations generated by the AraBERT model with static word embedding
using pre-trained Mazajak. In addition, CNN-Bidirectional Long Short-Term Memory
(CNN-BiLSTM) was used to obtain sentence representations from the static word vectors
in order to be able to concatenate the two types of embeddings. The hybrid model outper-
forms the standalone AraBERT model tested on the ArSarcasm-v2 dataset for both sarcasm
and sentiment classification tasks. The best results are a 0.62 F1 score and 0.715 F-PN
score (macro average of positive and negative class F scores) for sarcasm and sentiment
classification, respectively. Another hybrid model of CNN-BiLSTM was used in [18] for
different tasks, including a topic classifier, a sentiment analyzer, a sarcasm detector, and an
emotion classifier. This model was trained on different datasets for each task, four of
which were used for the sentiment analysis task: SS2030, ArSAS, the Twitter dataset for
Arabic Sentiment Analysis, and ArSarcasm-v2, consisting of 4214, 21,000, 348,797, and 15,548 tweets,
respectively. The proposed model achieves an accuracy of 97.58%, 86%, 97%, and 81.6% for
topic, sentiment, sarcasm, and emotion classification, respectively.
On the other hand, deep learning has also been used for ASA. For example, the study
in [19] used deep learning to evaluate GloVe, Word2Vec, and FastText as classical word
embedding techniques and ARBERT as contextualized word embedding for sentiment
analysis with a comparative analysis. The word embedding techniques were evaluated
in trained and pre-trained versions by applying two deep learning models of BiLSTM
and CNN on five datasets, including HARD, Khooli, Arabic Jordanian General Tweets
(AJGT), ArSAS, and ASTD for sentiment classification. The BiLSTM model outperforms
CNN on three datasets, while CNN performs better on smaller datasets. In addition,
the generated embeddings outperform their pre-trained versions by about 0.28% to 1.8%
accuracy. The contextualized transformer-based embedding BERT model achieves the
highest performance in both trained and pre-trained versions. Another study in [20]
employed Deep Neural Networks along with investigating Support Vector Machines
(SVM), Naive Bayes (NB), and Random Forest (RF) as classical ML models that were tuned
using Differential Evolution (DE) algorithms for classifying the sentiment of Arabic texts
related to monkeypox. The dataset used was collected from Twitter over eight months,
resulting in 4763 tweets. The best result was obtained using the DNN based on Leaky ReLU
with an accuracy of 92%.
Classical ML has also been used for ASA. Thus, several supervised ML models have
been applied in [21], including SVM, Linear Regression, NB, Complementary Naive Bayes
(CNB), and Stochastic Gradient Descent (SGD) for both sentiment and sarcasm classification.
These models were trained and tested with 5-fold cross-validation on the ArSarcasm-v2
dataset. The best accuracy was achieved using SVM with 59.8% and 74.6% for sentiment
and sarcasm, respectively. Based on the same dataset, an improvement was presented
in [22] by applying different versions of two transformer-based models, AraELECTRA and
AraBERT, for sarcasm and sentiment detection. The best results for sarcasm were achieved
by the AraBERTv2-base model with an accuracy of 78.3%, while AraBERTv0.2-large was
the best for the sentiment task, with an accuracy of 65.15%. By contrast, the study
in [3] did not use a pre-trained model to generate embeddings. Instead, it presents a
three-stage fine-tuning approach for a pre-trained model called Arabic BERT, which was
developed for Arabic sentiment analysis. These stages consist of text pre-processing and
data cleaning, transfer learning of the weights of pre-trained models, and a classification
layer. The model was evaluated on five different Arabic review datasets and compared with
11 state-of-the-art models, which it outperformed in prediction accuracy.
Researchers in SA follow different strategies to deal with emojis; some researchers
just eliminate the emojis, while others have considered the significance of emojis in their
work [23]. Including the emojis can help in expressing writers’ feelings, which helps in
improving the classification performance [24].
One strategy exploits the emojis in SA by replacing the emojis with textual data, such
as the study in [25], which is directed towards translating emojis by conducting emoji
Unicode translation. Also, it investigates the effect of combining Recurrent Neural Network
(RNN), LSTM, and Gated Recurrent Unit (GRU) in conjunction with Logistic Regression
(LR), RF, and SVM and grid search to improve the prediction performance for Arabic
sentiment analysis. The model performance is compared with three deep learning models,
which are RNN, LSTM, and GRU, implemented with CBOW word embedding and tuned
using Keras-tuner, and with five ML models, which are Decision Tree (DT), LR, K-Nearest
Neighbor (KNN), RF, and NB, implemented with the Term Frequency–Inverse Document
Frequency (TF-IDF) feature extraction model and grid-search cross-validation for model
tuning. Different datasets are used for training and testing the models: ASTC, ArTwitter,
and AJGT. Stacking LR achieved the highest testing accuracy of 92.22% compared to ML
models and DL models when using the ASTC dataset. Also, the study in [26] used a Russian
dataset of 6957 posts and each post has at least one emotional indicator (emojis, emoticons,
punctuation marks that express emotions); each emotional indicator was replaced with its
meaning to improve the model. The best model was an ensemble model of word2vector
model and a model of emotional indicator embedding tested on a dataset of 524 posts with
an accuracy of 91%.
Another strategy to improve SA is to use emojis as non-verbal features. The study
in [23] adapted non-verbal features for the task of Arabic sentiment analysis. Thus, several
ML models including NB, multinomial naive Bayes (MNB), SGD, sequential minimal
optimization-based support vector machines (SMO-SVM), DT, and RF were evaluated
on emoji-based features with a feature vector of length 429 and for 2091 instances. The
MNB achieved the best Area Under the Curve (AUC) of 87.30% when applied to the top
250 most relevant emojis selected using ReliefF and Correlation-Attribute Evaluator feature
selection techniques. In [27], several ML models were also investigated, including SGD,
SVM, Gaussian NB, KNN, DT, LSTM, GRU, Bi-LSTM, and bidirectional-GRU, to evaluate
non-verbal features. A dataset of 2091 microblogs after excluding tweets without emojis
was collected from ASTD, ArTwitter, QCRI, Syria, Semeval-2017 Task4 Subtask#A, and
843 Arabic microblogs with emojis from Twitter and YouTube. Then, the Emoji Sentiment
Ranking (ESR) lexicon, which is an emoji lexicon containing 969 used emojis after excluding
the unused emojis, and Principal Component Analysis (PCA) were applied to reduce
the dimensionality of the features from 430 to 100 features. The best accuracy of 71.71%
was achieved by the bidirectional-GRU model. In addition to non-verbal features, textual
features were also used in the study of [9]. Thus, five datasets were used after removing
instances that did not contain emojis, including Syria, ASTD, ArTwitter, QCRI, and Semeval-
2017. After merging all the datasets, each tweet was divided into textual and emoji features,
and then for the feature extraction step, the TF-IDF, Latent Semantic Analysis (LSA), and
two methods of word embedding were used to extract textual features, while a set of
120 emojis was used to calculate the occurrence of each emoji to obtain nonverbal features.
The SVM achieved the best results by merging skip-gram features with emojis and using
correlation-based feature selection with an accuracy of 83.02%.
In [28], another approach was applied by training an attention-based long short-term
memory network on the embeddings generated by bi-sense emojis and inspired by word
sense embedding. To obtain sentiment-aware embeddings of emojis, the bi-sense emojis
were learned under positive and negative sentimental tweets. The best accuracy of 90%
was achieved on the AA-sentiment dataset using Multi-Level Attention-based LSTM with
bi-sense emoji embedding (MATTBiE-LSTM) and 83.4% on the HA-sentiment dataset using
word-guide attention-based LSTM with bi-sense emoji embedding.
The previous studies used different hybrid models, transformers, and emoji-handling
strategies for ASA. However, the morphological complexity of the Arabic language and
the effect of several factors that change the meaning of the text, such as punctuation, non-
Arabic words, and emojis, mean that the ASA field needs further investigation. The studies
in [13,17–19] applied the hybrid and deep learning models on prepared datasets without
exploring the effect of emoji meaning, punctuation, or sentences in other languages on
the final classification results. Other studies applied classical machine learning models,
such as [21,22], but these models could not overcome the complexity of Arabic, so they
did not reach high accuracy scores. On the other hand, the studies in [20,25,26] treated
the emojis by replacing them with textual data. In contrast, the studies in [23,27] treated
the emojis as non-verbal features and removed text which may be rich in sentiment that
can improve the model results, so in [9] both the non-verbal features and the original text
were used. Although these studies examined emojis, they did not investigate how their
results are affected by keeping non-Arabic words and punctuation, by the choice of
transformer when the Arabic text contains punctuation or words written in other languages,
or by encoding emojis with their emotional and real meaning. In this study, we
propose a combination of CNN and LSTM models trained and tested on the ASTC dataset
to improve ASA. The study also evaluates the proposed hybrid model
under different experiments and conditions to understand the importance of each step
in data preprocessing, including examining Keras and AraVec transformers and their
suitability when keeping punctuation and non-Arabic words, emoji handling, and the effect
of keeping non-Arabic words and punctuation on the model results.
3. Materials and Methods

Data preparation is an important part of developing accurate and realistic predictive
models. It reduces the dimensionality of the data by removing unnecessary text and
characters. At the same time, the text can be enriched with the intended feelings of the
users by replacing the emoji with its meaning, which leads to improved accuracy. Therefore,
the main goal of this research is to investigate the effect of data preprocessing on ASA tasks.
Figure 1 shows the workflow used to achieve this goal.
Figure 1. The workflow of the proposed model for Arabic sentiment analysis.
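The workflow in Figure 1 crosses three experiments with eight preprocessing conditions (R1–R8), which Section 3.2 builds from two emoji-handling options, two stemmers, and two embedding methods. The sketch below only enumerates that grid to make the count explicit (3 × 2 × 2 × 2 = 24 configurations). The option names, the dictionary layout, and the order in which the labels R1–R8 are assigned are hypothetical and are not taken from Figure 1.

```python
from itertools import product

# Experiment-level switches as described in Section 3.2 (key names are hypothetical).
EXPERIMENTS = {
    1: {"keep_non_arabic": False, "keep_punctuation": False},
    2: {"keep_non_arabic": True, "keep_punctuation": False},
    3: {"keep_non_arabic": True, "keep_punctuation": True},
}

# Options combined inside every experiment.
EMOJI_HANDLING = ("remove", "encode")
STEMMERS = ("isri", "snowball")
EMBEDDINGS = ("keras", "aravec")


def build_grid():
    """Return one configuration dictionary per (experiment, condition) pair."""
    # 2 x 2 x 2 = 8 conditions; the R1..R8 numbering here is an assumption.
    conditions = list(product(EMOJI_HANDLING, STEMMERS, EMBEDDINGS))
    grid = []
    for experiment_id, switches in EXPERIMENTS.items():
        for index, (emoji, stemmer, embedding) in enumerate(conditions, start=1):
            grid.append({
                "experiment": experiment_id,
                "condition": f"R{index}",
                "emoji": emoji,
                "stemmer": stemmer,
                "embedding": embedding,
                **switches,
            })
    return grid


if __name__ == "__main__":
    for config in build_grid():
        print(config)
```

Under this layout, the best configuration reported in the abstract corresponds to Experiment 2 with emoji encoding, the Snowball stemmer, and Keras embedding.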
3.1. Dataset Descriptions
In this research, the performance of the proposed model and the importance of data
preprocessing were evaluated using two different datasets, namely Arabic Sentiment
Twitter Corpus (ASTC) and Emoji Meaning. The ASTC dataset contains Arabic tweets
labeled with their corresponding sentiment polarity for training the model, and the Emoji
Meaning dataset forms a dictionary of emoji meanings, which is used to give each emoji in
the ASTC dataset its meaning to find the effect of emoji encoding on ASA.
The ASTC [29] is a publicly available dataset on Kaggle, collected in April 2019 using
a positive and negative emoji lexicon. It is a balanced dataset consisting of 56 K labeled
Arabic tweets, as shown in Figure 2, which represents the number of tweets in each class.
The dataset is divided into 45 K for model training, with 22,760 positive and 22,513 negative
tweets, and 11 K for model testing, with 5751 positive and 5767 negative tweets. The
target variable is also labeled as positive or negative to describe the emotions of the tweets.
In addition, the Emoji Meaning dataset consists of 912 emojis collected from the ASTC
dataset, and then each emoji was mapped to its meaning. Each emoji has an emotional
meaning based on Twitter users’ use in their tweets; for example, one emoji means
“السعادة بشكل عام أو الود او عندما يكون الشخص ساخر أو عدواني بشكل سلبي”, which in English means
“general happiness or friendliness or when someone is being sarcastic or passive-aggressive”.
Other emojis can have an emotional meaning and can also be used to indicate their real
meaning; for example, another emoji means
“حيوانك الأليف او الكلب او الولاء والصداقة والرفقة و الثقة”, which in English means
“your pet or dog or loyalty, friendship, companionship, trust”. So, to obtain all
the emoji meanings, four websites [30–33] specialized in collecting emoji meanings were used
to map each emoji to its meaning. These websites were also used to validate the consistency of
the emoji meanings, providing cross-referencing with several emoji interpretation databases or
dictionaries. Also, most emojis can have multiple meanings, so the strategy that was used for
handling ambiguity and validating the emojis’ meaning included the addition of all the
commonly shared emotional and real word meanings between Arabic-speaking users provided
by [30,31] and English-speaking users provided by [32,33], supported by human judgment.
In addition to all of this, the performance comparison with and without emoji encoding
demonstrates that the inclusion of emoji meaning improves the robustness of the model as a
final step in validating the emoji encoding process. The Emoji Meaning dataset provides a
rich source of emotion and context that improves model performance by replacing each emoji
in the ASTC dataset with its emotional and real meaning from the Emoji Meaning dataset. It
is important to point to the fact that emoji interpretation may differ across different cultures;
however, to address this issue, we focused in this study on utilizing and interpreting emojis
according to their common norms globally among cultures. Nevertheless, we acknowledge
that further investigation is still required to identify both common and uncommon emojis that
may have various meanings depending on culture.
Figure 2. The number of positive and negative tweets.
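The role of the Emoji Meaning dataset can be illustrated as a dictionary lookup in which each emoji found in a tweet is replaced with its textual meaning. The following is a minimal sketch and not the implementation used in this study: the two dictionary entries and their short glosses are hypothetical and are not taken from the Emoji Meaning dataset, and the function names `encode_emojis` and `remove_emojis` are introduced only for illustration.

```python
import re

# Hypothetical two-entry excerpt. The glosses are illustrative only and are
# not taken from the Emoji Meaning dataset.
EMOJI_MEANINGS = {
    # U+2764 (red heart) mapped to the Arabic word for "love"
    "\u2764": "حب",
    # U+1F602 (face with tears of joy) mapped to the Arabic word for "laughter"
    "\U0001F602": "ضحك",
}


def _emoji_pattern(meanings):
    # Try longer keys first so multi-code-point emojis match before their prefixes.
    keys = sorted(meanings, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))


def _squeeze(text):
    # Collapse the extra whitespace left behind by a substitution.
    return re.sub(r"\s+", " ", text).strip()


def encode_emojis(text, meanings):
    """Replace every emoji listed in `meanings` with its textual meaning."""
    if not meanings:
        return text
    pattern = _emoji_pattern(meanings)
    return _squeeze(pattern.sub(lambda m: f" {meanings[m.group(0)]} ", text))


def remove_emojis(text, meanings):
    """Baseline: drop every emoji listed in `meanings` instead of encoding it."""
    if not meanings:
        return text
    return _squeeze(_emoji_pattern(meanings).sub(" ", text))
```

Padding each meaning with spaces keeps it from fusing with neighbouring words, so a downstream tokenizer sees the meaning as separate tokens.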
3.2. Data Pre-Processing
The ASTC dataset contains many duplicate rows, hashtags, and diacritics. Therefore,
several preprocessing steps were required to clean the text and remove all tokens that do
not contribute to the actual meaning of the text. In particular, the following steps were used
in all the experiments we conducted to evaluate our proposed model:
1. Drop duplication: The ASTC consists of 56 K rows divided into training and testing
parts. The training dataset contains 15,721 duplicate rows, while in the testing part
there are 2678 duplicate rows, resulting in 18,399 rows being dropped.
2. Remove hashtags: Twitter users widely adopt the hashtag character “#” to bookmark
their tweet content or join a topic or trend community [34,35]. Therefore, the hashtags
were removed during this phase.
3. Remove diacritics: All diacritics were removed from the data because they did not affect
the SA measurements [36]. For example, in a tweet that means in English “I heard him
crying on the day of the funeral, then they stood up to offer their condolences to me”,
a diacritic such as the Fatha is used to give an aesthetic shape to the sentence without
following the rules of the Arabic language, for instance by placing the Fatha after the
broken heart emoji.
4. Remove numbers: All numbers are removed from tweets because they do not reflect
the sentiments contained in the text and are useless [37].
5. Removing stop words: stop words are frequently used in Arabic and English lan-
guages [36] and they have little semantic value [38]. Therefore, it is necessary to
remove the Arabic stop words if only Arabic words are kept in the dataset and both
Arabic and English stop words if both languages are used.
6. Tokenization: The TweetTokenizer from the NLTK library was used to break the text
into tokens [39]. It is a simple and fast tokenizer that focuses on data from Twitter and
works based on regular expressions [40]. Also, TweetTokenizer preserves the emojis
and emoticons as tokens, which allows them to be handled appropriately, and it deals
with the repeated characters by reducing them to a length of three [41,42]. This makes
it suitable for the dataset and preprocessing experiments used.
7. Preprocessing is divided into three phases: removing non-Arabic words
and punctuation in Experiment 1, keeping the non-Arabic words and removing the
punctuation in Experiment 2, and keeping both the non-Arabic words and the punctuation
in Experiment 3. Each experiment is then tested over eight preprocessing conditions,
denoted by R1–R8 as shown in Figure 1, based on the options described in
points 8, 9, and 10.
8. Handling Emojis: To study the effect of emojis, two approaches were followed; the
first one involved removing the emojis from each tweet, while the second approach
treated emojis by collecting the emotional and real meaning of emojis based on their
usage on social media platforms and replacing each emoji with its textual meaning.
9. Stemming: This is a common morphological analysis that aims to reduce inflectional
forms and achieve a common base form for words in sentences [43]. For the Ara-
bic natural language, different stemmers can indicate the lexical root of the words.
Therefore, the effect of using different stemmers in Arabic sentiment analysis was
investigated by applying both the Information Science Research Institute’s (ISRI) and
Snowball stemmers.
10. Embedding: Embedding provides a numerical representation for words and sentences
by transforming each word into a numerical vector representation that captures the
syntactic and semantic meaning based on its contextual usage in the dataset [44]. The
effect of the transformation model was investigated by evaluating two transformation
methods, Keras embedding and AraVec 3.0 embedding. Keras embedding is trainable
and not a pre-trained model. This means that the embedding vector for each word was
initialized randomly with small weights, and during back-propagation, the embedding
vectors were updated to minimize the loss function [45]. On the other hand, AraVec
3.0 [46] is an open-source project that provides a powerful pre-trained model for
Arabic word embedding transformation. The latest version of AraVec 3.0 has been
trained on two Arabic content domains, namely tweets and Arabic Wikipedia articles,
resulting in the provision of 16 different word-embedding models. This version also
provides two types of models, unigrams and n-grams, and the most commonly used
n-gram models are trained with a total of more than 1,169,075,128 tokens. In this
research, the n-gram model was used to generate embedding with a vector size of 100.
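The emoji handling in point 8 and the repeated-character reduction mentioned in point 6 can be illustrated with a minimal sketch. It is not the authors' implementation: the `EMOJI_MEANINGS` dictionary and its two entries are hypothetical placeholders for the emoji lexicon collected in the paper, and `reduce_repeats` only approximates the length-reduction behaviour of NLTK's TweetTokenizer with a plain regular expression.

```python
import re

# Hypothetical emoji-to-Arabic lexicon (illustrative entries only; the paper's
# actual lexicon is not reproduced here).
EMOJI_MEANINGS = {
    "😍": "حب",    # "love"
    "😢": "حزين",  # "sad"
}


def handle_emojis(text, mode):
    """Remove known emojis (mode='remove') or replace them with their meaning (mode='encode')."""
    for emoji, meaning in EMOJI_MEANINGS.items():
        replacement = f" {meaning} " if mode == "encode" else " "
        text = text.replace(emoji, replacement)
    # Collapse the extra whitespace introduced by the replacements.
    return " ".join(text.split())


def reduce_repeats(text):
    """Shorten any run of three or more identical characters to exactly three."""
    return re.sub(r"(.)\1{2,}", r"\1\1\1", text)


print(handle_emojis("I love it 😍", "encode"))
print(handle_emojis("I love it 😍", "remove"))
print(reduce_repeats("sooooo good"))
```

Switching `mode` between "remove" and "encode" corresponds to the two emoji conditions compared across the runs; multi-code-point emojis would need a fuller lexicon than this sketch assumes.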
3.3. A Combined Deep Learning Model
3.3.1. Convolutional Neural Network (CNN)
CNNs are a neural network type with a design which gives them the ability to process
and analyze data with a special representation [47]. CNNs are excellent at capturing
spatial dependencies of targets and their environment, which makes them well-suited for
tasks such as time-series prediction, image recognition, natural language processing, and
audio signal pattern recognition [47,48]. In the proposed model, the CNN part is used to
extract the informative features from the input textual data, such as word combinations
and patterns, since the convolutional layer uses the learnable filters to extract the features
from the input data at different spatial locations [47].
3.3.2. Long Short-Term Memory (LSTM)
LSTMs are a type of RNN designed to deal with temporal dependencies, including
text sequences and time series [47]. Plain RNNs face a problem during
back-propagation, where the error gradients can explode when there are many time
steps [47]. To address this, a memory cell has been added to the LSTM design, which
solves the vanishing or exploding gradient problem faced in RNNs by regulating the flow
of information through the network [47]. Thus, LSTMs effectively handle sequential data
with long-term dependencies, making them suitable for problems strongly related to time
series analysis or natural language processing [48].
The basic structure for each LSTM unit consists of a memory cell and three gates,
which are the input gate, which updates the memory cell with the fresh data, the forget
gate, which takes the role of determining whether to keep the data or discard the data
from the memory, and the output gate, which generates the next hidden state from the
current memory cell [47,49]. Thus, these gates play a role in updating the current memory
cell and the current hidden state [49].
3.3.3. CNN-LSTM
In this study, we developed a combined deep learning architecture specifically for
Arabic sentiment analysis and classification. The workflow of the proposed combined
CNN-LSTM model includes five stages which can be summarized as follows: the embedding
layer, CNN layer, max pooling layer, LSTM layer, and output layer (as shown in
Figure 3). The CNN-LSTM model is a hybrid model that combines the advantages of both
CNNs and RNNs, specifically LSTM networks. This combination results in an effective
model to capture both local and global dependencies in the text data, making it well-suited
for sentiment analysis tasks.
Figure 3. The proposed CNN-LSTM model architecture for Arabic sentiment analysis.
The embedding layer takes the preprocessed text and transforms it into a vector
representation so that the CNN-LSTM model can process it effectively. The embedding
depends on three parameters: input-dim, the vocabulary size of the dataset; output-dim,
the dimension of the vector space in which the words are embedded; and input-length, the
input sequence length [50]. These determine the shape of the embedding layer output,
(batch-size, input-length, output-dim), where a batch-size value of “None” denotes a
dynamic batch size, which is common in the Keras implementation [47]. The value of
input-length varies with the preprocessing steps that affect the sequence length of the
input data. The output dimension was searched between 100 and 400 in steps of 50 for
Keras embedding and was fixed at 100 for AraVec, since it is a pre-trained model with a
static vector size of 100. Thus, the shape of the embedding layer output differs between
experiments and runs, depending on the input sequence length, which changes according
to whether non-Arabic words, punctuation, or emojis are retained or removed, and on the
output dimension, which is determined during the hyperparameter tuning phase. The final
shape of the generated embeddings is summarized in Table 1. In this research work, two
embedding methods were used to study their effect on the model performance: AraVec 3.0
and Keras embedding.
Table 1. The output shape of each layer for the 1D CNN-LSTM model.
| Exp Num | Run Num | Embedding Shape | Convolutional Layer | Max Pooling | LSTM Layer | Flatten |
| --- | --- | --- | --- | --- | --- | --- |
| Exp 1 | R1 | (None, 1189, 150) | (None, 1187, 400) | (None, 593, 400) | (None, 593, 250) | (None, 148250) |
| Exp 2 | R2 | (None, 2129, 150) | (None, 2128, 400) | (None, 1064, 400) | (None, 1064, 270) | (None, 287280) |
| Exp 3 | | | | | | |
Then, the CNN layer extracts the local features [51] from the output of
the embedding layer and feeds them to the max-pooling layer. Feature extraction is
performed by applying the convolutional filter (kernel) to the input matrix by shifting the
kernel along the matrix [50]. This results in an output shape of (None, Conv-input-length,
Conv-filters) with a dynamic batch size indicated by a None value; the values of
Conv-input-length and the number of convolutional filters are summarized in Table 1 for
all experiments. The number of filters in the Conv1D layer varied between 100 and 400,
with increments of 100. Here, a convolutional filter was applied to a window of words
$X_{i:i+h-1}$, where $h$ is the window size, $X_i$ is a $K$-dimensional vector, and
$X_{i:i+j}$ represents the input feature matrix that extends from the $i$th to the
$(i+j)$th word of the sentence [52]. The window size, which is also called kernel size,
was tested with values of 2, 3, and 6 to capture different n-gram features. This results in
a feature $C_i$, as given in Equation (1).
$$C_i = f(W \cdot X_{i:i+h-1} + b) \qquad (1)$$
where $W$ represents the convolutional filter, $b$ represents the bias, which is a real
number, and $f$ is the activation function [52]; here, the output of each filter in the CNN
layer is passed through the ReLU activation function, which allows the layer to learn
complex patterns. Then, a feature map is generated by convolving through all the windows
of words for a single convolutional filter, as in Equation (2).
$$C = [C_1, C_2, C_3, \ldots, C_{n-h+1}] \qquad (2)$$
Here $n$ is the number of words in the sentence, so the $m$ filters in the convolutional
layer generate $m(n-h+1)$ features [52]. L2 activity regularization with a factor of 0.01
is then used to prevent overfitting.
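Equations (1) and (2) can be traced by hand on a toy input. The sketch below is a plain-Python illustration under stated assumptions, not the Keras Conv1D layer itself: the sentence matrix `X`, the filter `W`, and the bias `b` are made-up values, and ReLU is used for the activation f as described in the text.

```python
def relu(value):
    """ReLU activation, used here as the function f of Equation (1)."""
    return max(0.0, value)


def conv1d_feature_map(X, W, b):
    """Slide one filter over a sentence and return [C_1, ..., C_{n-h+1}].

    X: list of n word vectors, each of length K.
    W: filter given as h rows of length K (h is the window/kernel size).
    b: scalar bias.
    """
    n, h = len(X), len(W)
    feature_map = []
    for i in range(n - h + 1):
        window = X[i:i + h]  # the words X_{i:i+h-1}
        weighted_sum = sum(
            w * x
            for w_row, x_row in zip(W, window)
            for w, x in zip(w_row, x_row)
        )
        feature_map.append(relu(weighted_sum + b))
    return feature_map


# Toy example (illustrative values): n = 4 words, K = 2 dimensions, h = 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W = [[1.0, -1.0], [0.5, 0.5]]
C = conv1d_feature_map(X, W, b=0.0)
print(C, len(C))
```

With n = 4 and h = 2, the feature map has n − h + 1 = 3 entries, so m such filters would yield 3m features in this toy case.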
The max function of the max pooling layer, with a pool size of 2, is applied to each
CNN filter output to select the maximum feature value from each pooling window while
sliding across the matrix [53], which reduces the output complexity while preserving
the important features [51] to be fed to the LSTM layer. The output shape generated by
the max pooling layer is (None, Conv-input-length/2, Conv-filters), with the length
rounded down; Table 1 shows the output shapes for this layer for all experiments.
Then, the LSTM layer is used to handle long-term dependencies for understanding
the context that reflects the emotions in the text. In this research, the dropout rate of
the LSTM layer was tuned by the Keras tuner between 0.2 and 0.5, with a step of 0.1, to
prevent overfitting. The number of LSTM units was searched between 30 and 300, with a
step of 10; this tuning range provides a trade-off between the model performance and
computational efficiency. This layer was then followed by the Flatten layer, which
reshapes the LSTM output into a single feature vector. The output shapes of both the
LSTM and Flatten layers are presented in Table 1.
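The shape bookkeeping described above can be reproduced with simple arithmetic. The following sketch assumes a stride of 1 with no padding in the convolution, a pool size of 2, and an LSTM that returns its full output sequence; these settings are inferred from the shapes in Table 1 rather than stated explicitly in the text, and the function names are illustrative.

```python
def conv1d_length(seq_len, kernel_size):
    """Output length of a convolution with stride 1 and no padding (assumed)."""
    return seq_len - kernel_size + 1


def pooled_length(seq_len, pool_size=2):
    """Output length of max pooling; the length is rounded down."""
    return seq_len // pool_size


def layer_shapes(seq_len, embed_dim, filters, kernel_size, lstm_units):
    """Return the per-layer output shapes, with None as the dynamic batch size."""
    conv_len = conv1d_length(seq_len, kernel_size)
    pool_len = pooled_length(conv_len)
    return {
        "embedding": (None, seq_len, embed_dim),
        "conv": (None, conv_len, filters),
        "max_pool": (None, pool_len, filters),
        "lstm": (None, pool_len, lstm_units),  # assumes the full sequence is returned
        "flatten": (None, pool_len * lstm_units),
    }


# Values of the first row of Table 1 (sequence length 1189, kernel size 3).
shapes = layer_shapes(seq_len=1189, embed_dim=150, filters=400,
                      kernel_size=3, lstm_units=250)
for name, shape in shapes.items():
    print(name, shape)
```

Under these assumptions the computed shapes agree with the first row of Table 1, including a flattened length of 593 × 250 = 148,250.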
The last layer, also called the fully connected layer [49], is a dense layer with one neuron
that generates a (None, 1) output shape and uses a sigmoid function to classify the output
of the LSTM layer as positive or negative sentiment. The Adam optimizer was used
to train our hybrid model with a learning rate that was sampled
logarithmically between $1 \times 10^{-5}$ and $1 \times 10^{-3}$ to support stable training convergence.
Moreover, each layer in the CNN-LSTM model contributes a total number of trainable
parameters that are updated during model training. Table 2 presents the number of
trainable parameters for each layer of the CNN-LSTM model for all experiments and runs,
providing a clear understanding of the model complexity and ability to learn from the data.
The number of trainable parameters can be calculated automatically for each layer using
the summary() method from Keras.
Table 2. The number of trainable parameters for each CNN-LSTM layer.
| Exp Num | Run Num | Embedding Param | Conv Layer Param | Max Pooling Param | LSTM Layer Param | Flatten Param | Dense Layer Param |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Exp 1 | R1 | 10,774,200 | 180,400 | 0 | 651,000 | 0 | 148,251 |
| Exp 1 | R2 | 5,438,400 | 90,100 | 0 | 57,920 | 0 | 78,161 |
| Exp 1 | R3 | 4,934,700 | 180,400 | 0 | 388,280 | 0 | 165,921 |
| Exp 1 | R4 | 147,671,600 | 40,200 | 0 | 172,120 | 0 | 127,011 |
| Exp 1 | R5 | 6,217,000 | 120,300 | 0 | 160,400 | 0 | 48,301 |
| Exp 1 | R6 | 147,671,600 | 120,400 | 0 | 480,800 | 0 | 195,201 |
| Exp 1 | R7 | 147,671,600 | 180,300 | 0 | 373,160 | 0 | 91,391 |
| Exp 1 | R8 | 147,671,600 | 60,100 | 0 | 106,080 | 0 | 57,841 |
| Exp 2 | R1 | 30,356,400 | 160,200 | 0 | 601,200 | 0 | 198,901 |
| Exp 2 | R2 | 2,938,350 | 120,400 | 0 | 724,680 | 0 | 287,281 |
| Exp 2 | R3 | 12,019,700 | 420,400 | 0 | 580,520 | 0 | 244,491 |
| Exp 2 | R4 | 147,671,600 | 20,100 | 0 | 92,840 | 0 | 117,041 |
| Exp 2 | R5 | 4,897,950 | 180,400 | 0 | 359,040 | 0 | 91,041 |
| Exp 2 | R6 | 147,671,600 | 40,200 | 0 | 210,600 | 0 | 159,451 |
| Exp 2 | R7 | 147,671,600 | 80,400 | 0 | 449,160 | 0 | 108,301 |
| Exp 2 | R8 | 147,671,600 | 40,200 | 0 | 538,720 | 0 | 159,601 |
| Exp 3 | R1 | 5,450,400 | 60,100 | 0 | 282,480 | 0 | 128,261 |
| Exp 3 | R2 | 6,902,000 | 420,400 | 0 | 131,880 | 0 | 71,611 |
| Exp 3 | R3 | 13,765,200 | 240,300 | 0 | 103,880 | 0 | 74,411 |
| Exp 3 | R4 | 147,671,600 | 120,400 | 0 | 276,120 | 0 | 144,171 |
| Exp 3 | R5 | 13,086,800 | 120,100 | 0 | 68,760 | 0 | 51,211 |
| Exp 3 | R6 | 147,671,600 | 60,200 | 0 | 396,520 | 0 | 244,491 |
| Exp 3 | R7 | 147,671,600 | 60,100 | 0 | 57,920 | 0 | 45,441 |
| Exp 3 | R8 | 147,671,600 | 40,200 | 0 | 396,520 | 0 | 134,091 |
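The parameter counts in Table 2 follow from the standard formulas for each layer type. The sketch below applies them to Experiment 1 R1, using the tuned values reported in Table 3 (output dimension 150, 400 filters, kernel size 3, 250 LSTM units) and the pooled sequence length of 593 from Table 1. The function names are illustrative, and the vocabulary size of 71,828 is not given in the text; it is inferred here by dividing the reported embedding count by the output dimension.

```python
def embedding_params(vocab_size, output_dim):
    """One trainable vector of length output_dim per vocabulary entry."""
    return vocab_size * output_dim


def conv1d_params(kernel_size, input_dim, filters):
    """Weights of each filter (kernel_size x input_dim) plus one bias per filter."""
    return kernel_size * input_dim * filters + filters


def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights, and biases."""
    return 4 * units * (input_dim + units + 1)


def dense_params(input_features, outputs=1):
    """Fully connected weights plus one bias per output neuron."""
    return input_features * outputs + outputs


# Experiment 1 R1 (vocabulary size inferred, see the note above).
print(embedding_params(71_828, 150))   # embedding layer
print(conv1d_params(3, 150, 400))      # convolutional layer
print(lstm_params(400, 250))           # LSTM layer
print(dense_params(593 * 250))         # dense layer after Flatten
```

The pooling and Flatten layers have no trainable weights, which is why their columns are zero throughout the table.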
The final phase is model tuning using the Keras tuner [54], an easy-to-use framework
that provides scalable hyperparameter optimization for deep learning models. The Keras
tuner solves the pain points of hyperparameter search by using one of the built-in search
algorithms (Random Search, Bayesian Optimization, and Hyperband) and configuring its
search space using define-by-run syntax to find the best set of hyperparameter values for the
model. In this research, the Keras Tuner was utilized with Bayesian Optimization to explore
the hyperparameter space by focusing on promising regions, thereby reducing the number of
trials required. The final selection of the hyperparameters is based on their improvement of
the model performance in a 5-fold cross-validation. Thus, the best model was selected not
only for its validation accuracy but also for its consistency across different folds.
The five-fold cross-validation with the Keras tuner was used to validate the model
and optimize the hyperparameters using five different validation folds, which allowed
the tuner to explore different hyperparameter combinations and select those that make
the model perform best for ASA. Moreover, during the model training and validation, the
batch size, previously left dynamic (None), was set to 50, since this value
achieved the best result after manually trying different values. Then, the model with
the best validation performance was evaluated on previously unseen test data using the
same number of epochs: 10.
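The search space and the five-fold split described in this section can be written down compactly. The sketch below is a standard-library stand-in and does not use the Keras tuner: it draws configurations by plain random sampling rather than Bayesian Optimization, and the names `SEARCH_SPACE`, `sample_config`, and `k_fold_indices` are illustrative. The ranges are the ones stated in the text for the Keras-embedding runs (AraVec runs fix the output dimension at 100).

```python
import math
import random

# Discrete ranges as stated in the text.
SEARCH_SPACE = {
    "output_dim": list(range(100, 401, 50)),
    "conv_filters": list(range(100, 401, 100)),
    "kernel_size": [2, 3, 6],
    "lstm_units": list(range(30, 301, 10)),
    "lstm_dropout": [0.2, 0.3, 0.4, 0.5],
}
LR_MIN, LR_MAX = 1e-5, 1e-3


def sample_config(rng):
    """Draw one configuration; the learning rate is sampled on a log scale."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    exponent = rng.uniform(math.log10(LR_MIN), math.log10(LR_MAX))
    config["learning_rate"] = 10 ** exponent
    return config


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k (train, validation) index pairs."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


rng = random.Random(0)
print(sample_config(rng))
print([len(val) for _, val in k_fold_indices(12)])
```

In a tuner-driven search, each sampled configuration would be trained and scored on every fold, and the configuration with the best and most consistent validation accuracy would be kept.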
3.4. Model Evaluation

Several evaluation measures were used to evaluate and check the model performance
in the Arabic sentiment classification task. The CNN-LSTM consistency was checked by
evaluating the results on the test data after training the hybrid model. Although accuracy is
the most popular performance measure, it may not give the full picture [55]. Therefore,
the precision, recall, and F1-score were also used to ensure a comprehensive evaluation.
Accuracy is a metric that provides an overall measure of how often the model correctly
classifies sentiment. It represents the ratio of correctly predicted observations to the total
number of observations and is calculated using Equation (3):
$$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \qquad (3)$$
Precision is an evaluation metric to determine the model performance by finding the
ratio of observations correctly predicted as positive to the total predicted positives, as
shown in Equation (4).
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \qquad (4)$$
Recall is a popular metric that measures how completely the model identifies the
positive class by finding the ratio of observations correctly predicted as positive to all
actual positives, as shown in Equation (5).
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \qquad (5)$$
The F1-score is the harmonic mean of precision and recall, and is also an important metric
to verify the test accuracy [56]. The calculation of the F1-score is shown in Equation (6).
$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$
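As a worked check of Equations (3)–(6), the sketch below plugs in the confusion-matrix counts reported in the Results for Experiment 2 R3 (4199 of 4454 negative tweets and 3921 of 4386 positive tweets classified correctly). The function name is illustrative, and the false-positive and false-negative counts are derived from those totals by subtraction.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


tp, tn = 3921, 4199
fn = 4386 - tp  # positive tweets predicted as negative
fp = 4454 - tn  # negative tweets predicted as positive

metrics = classification_metrics(tp, tn, fp, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The resulting accuracy of about 0.9186 agrees with the reported 91.85% up to rounding. Precision and recall here refer to the positive class only, so they need not coincide with averaged figures reported in a results table.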
4. Results
The results demonstrate the performance when using a combined deep learning
model in this context and the effect of each preprocessing step on the model performance,
including the effect of removing and keeping the non-Arabic words and punctuation over
eight combinations of preprocessing conditions. These groups of conditions expand the
scope of the research to find the role of replacing the emojis with their meanings and the
appropriate stemmer and embedding transformer for each group of preprocessing steps.
The preprocessing conditions were grouped into eight groups from R1 to R8, as follows.
• R1: ISRI stemmer, emoji removal, Keras embedding.
• R2: ISRI stemmer, emoji encoding, Keras embedding.
• R3: Snowball stemmer, emoji encoding, Keras embedding.
• R4: ISRI stemmer, emoji encoding, AraVec 3.0 embedding.
• R5: Snowball stemmer, emoji removal, Keras embedding.
• R6: Snowball stemmer, emoji encoding, AraVec 3.0 embedding.
• R7: Snowball stemmer, emoji removal, AraVec 3.0 embedding.
• R8: ISRI stemmer, emoji removal, AraVec 3.0 embedding.
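The eight runs form a full 2 × 2 × 2 grid over stemmer, emoji handling, and embedding, and each run is repeated in the three experiments. A small sketch makes this explicit; the dictionary layout and key names are illustrative choices, while the run definitions themselves are taken from the list above.

```python
from itertools import product

# (stemmer, emoji handling, embedding) for each run, as listed above.
RUNS = {
    "R1": ("ISRI", "removal", "Keras"),
    "R2": ("ISRI", "encoding", "Keras"),
    "R3": ("Snowball", "encoding", "Keras"),
    "R4": ("ISRI", "encoding", "AraVec 3.0"),
    "R5": ("Snowball", "removal", "Keras"),
    "R6": ("Snowball", "encoding", "AraVec 3.0"),
    "R7": ("Snowball", "removal", "AraVec 3.0"),
    "R8": ("ISRI", "removal", "AraVec 3.0"),
}

# Cleaning choices that distinguish the three experiments.
EXPERIMENTS = {
    "Exp 1": {"keep_non_arabic": False, "keep_punctuation": False},
    "Exp 2": {"keep_non_arabic": True, "keep_punctuation": False},
    "Exp 3": {"keep_non_arabic": True, "keep_punctuation": True},
}

full_grid = set(product(("ISRI", "Snowball"),
                        ("removal", "encoding"),
                        ("Keras", "AraVec 3.0")))
all_settings = [(exp, run) for exp in EXPERIMENTS for run in RUNS]

print(set(RUNS.values()) == full_grid)  # every combination appears exactly once
print(len(all_settings))                # total number of trained configurations
```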
After preprocessing the data and during model training and cross-validation, the
Keras tuner used Bayesian Optimization to sample a set of hyperparameters and then
trained the model based on these parameters. After that, the model performance was
evaluated using the validation data. These operations were repeated for a predefined
number of iterations. Once all iterations had been completed, the model with the best
accuracy was selected after repeating this process for each of the five cross-validation
folds. Table 3 summarizes the best set of hyperparameters.
The best set of hyperparameters differs from experiment to experiment, with no single
set of parameters repeated more than the other. Therefore, we could not generalize a
set of these values to all experiments. Instead, we adjusted the parameters with the best
set of values extracted with the Keras tuner for each experiment, as appropriate. The
results of Experiments 1–3 are summarized in Table 4, Table 5 and Table 6, respectively.
All experiments were conducted on the Google Colab-L4 platform using Python version
3.10.12. The deep learning models were implemented using Keras version 3.4.1, running
on TensorFlow version 2.17.0.
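Table 3. Best hyperparameter values determined by the Keras tuner.
| Exp Num | Run | Output Dim | Convolutional Filters | Convolutional Kernel Size | LSTM Units | LSTM Dropout | Learning Rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Exp 1 | R1 | 150 | 400 | 3 | 250 | 0.3 | 0.00049751 |
| Exp 1 | R2 | 300 | 100 | 3 | 80 | 0.3 | 0.00017099 |
| Exp 1 | R3 | 150 | 400 | 3 | 170 | 0.2 | 0.00022555 |
| Exp 1 | R4 | 100 | 200 | 2 | 130 | 0.4 | 0.00023515 |
| Exp 1 | R5 | 200 | 300 | 2 | 100 | 0.4 | 0.00014636 |
| Exp 1 | R6 | 100 | 400 | 3 | 200 | 0.3 | 0.00001149 |
| Exp 1 | R7 | 100 | 300 | 6 | 190 | 0.3 | 0.00002469 |
| Exp 1 | R8 | 100 | 100 | 6 | 120 | 0.4 | 0.00002816 |
| Exp 2 | R1 | 400 | 200 | 2 | 30 | 0.2 | 0.00016625 |
| Exp 2 | R2 | 150 | 400 | 2 | 270 | 0.2 | 0.00007771 |
| Exp 2 | R3 | 350 | 400 | 3 | 230 | 0.4 | 0.00053589 |
| Exp 2 | R4 | 100 | 100 | 2 | 110 | 0.2 | 0.00007517 |
| Exp 2 | R5 | 150 | 400 | 3 | 160 | 0.3 | 0.00019196 |
| Exp 2 | R6 | 100 | 200 | 2 | 150 | 0.3 | 0.00043968 |
| Exp 2 | R7 | 100 | 400 | 2 | 190 | 0.2 | 0.00007383 |
| Exp 2 | R8 | 100 | 200 | 2 | 280 | 0.2 | 0.000144206 |
| Exp 3 | R1 | 300 | 100 | 2 | 220 | 0.2 | 0.00028067 |
| Exp 3 | R2 | 350 | 400 | 3 | 70 | 0.4 | 0.00027495 |
| Exp 3 | R3 | 400 | 300 | 2 | 70 | 0.2 | 0.00022570 |
| Exp 3 | R4 | 100 | 400 | 3 | 130 | 0.3 | 0.00003997 |
| Exp 3 | R5 | 400 | 100 | 3 | 90 | 0.2 | 0.00009749 |
| Exp 3 | R6 | 100 | 200 | 3 | 230 | 0.2 | 0.00001491 |
| Exp 3 | R7 | 100 | 100 | 6 | 80 | 0.4 | 0.00007121 |
| Exp 3 | R8 | 100 | 200 | 2 | 230 | 0.2 | 0.00011200 |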
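Table 4. Experiment 1 results.
| Run | Stemmer | Emoji | Embedding | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R1 | ISRI | Remove Emoji | Keras Embedding | 72% | 70% | 70% | 70.15% |
| R2 | ISRI | Encoding to Arabic | Keras Embedding | 90% | 90% | 90% | 90.23% |
| R3 | Snowball | Encoding to Arabic | Keras Embedding | 91% | 91% | 91% | 91.69% |
| R4 | ISRI | Encoding to Arabic | AraVec 3.0 | 87% | 87% | 87% | 87.32% |
| R5 | Snowball | Remove Emoji | Keras Embedding | 70% | 70% | 70% | 69.92% |
| R6 | Snowball | Encoding to Arabic | AraVec 3.0 | 76% | 76% | 76% | 76.09% |
| R7 | Snowball | Remove Emoji | AraVec 3.0 | 54% | 54% | 53% | 53.87% |
| R8 | ISRI | Remove Emoji | AraVec 3.0 | 54% | 53% | 53% | 53.55% |

Table 5. Experiment 2 results.
| Run | Stemmer | Emoji | Embedding | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R1 | ISRI | Remove Emoji | Keras Embedding | 70% | 70% | 70% | 70.11% |
| R2 | ISRI | Encoding to Arabic | Keras Embedding | 90% | 90% | 90% | 89.81% |

Table 6. Experiment 3 results.
| Run | Stemmer | Emoji | Embedding | Precision | Recall | F1-Score | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R1 | ISRI | Remove Emoji | Keras Embedding | 71% | 70% | 69% | 69.79% |
| R2 | ISRI | Encoding to Arabic | Keras Embedding | 91% | 91% | 91% | 90.28% |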
4.1. Experiment 1
In this experiment, the effect of removing non-Arabic words and punctuation was
tested over eight conditions, resulting in eight experimental runs.
The results in Table 4 show that removing the emojis had a negative impact on
the model performance, giving the lowest accuracies of 70.15%, 69.92%, 53.87%, and
53.55% in R1, R5, R7, and R8, respectively, while translating the emojis to their textual
meaning improved the model classification performance in R2, R3, R4, and R6, achieving
accuracies of 90.23%, 91.69%, 87.32%, and 76.09%, respectively. Also, the results in R2 and
R3 indicate that Keras embedding outperforms AraVec (R4 and R6) for both stemmers.
Moreover, Keras embedding gives a better representation than AraVec when removing the
emojis, as seen in R1 and R5 compared with R8 and R7, which can be explained by the fact
that the Keras embedding is trained on the same dataset.
4.2. Experiment 2
In this experiment, the impact of keeping the non-Arabic tokens and removing the
punctuation was tested using eight combinations of parameters, resulting in eight
experimental runs.
The results in Table 5 show that removing the emojis had a negative impact on the
model performance, giving the lowest accuracies of 70.11%, 69.94%, 56.48%, and 56.07%
in R1, R5, R7, and R8, respectively, while translating the emojis to their textual meaning
improved the model classification performance in R2, R3, R4, and R6, achieving accuracies
of 89.81%, 91.85%, 78.08%, and 75.59%, respectively. Also, the results in R2 and R3 indicate
that Keras embedding outperforms AraVec (R4 and R6) for both stemmers. Keras
embedding also gives a better representation when removing the emojis in R1 and R5. In
addition, keeping the non-Arabic words in this experiment showed the superior ability of
the Snowball stemmer and Keras embedding in dealing with other languages over the ISRI
stemmer and AraVec embedding, as shown in the results in R2, R3, and R4.
4.3. Experiment 3
In this experiment, the effect of keeping the non-Arabic words and punctuation was
tested over eight conditions, resulting in eight experimental runs.
Table 6 also shows the importance of translating the emojis, which provides a
real improvement to the model results, and shows that Keras embedding outperforms
the AraVec transformer. However, keeping the punctuation did not improve the results;
instead, the results in Table 6, where the punctuation was not removed, decreased
compared to the results in Table 5 for all runs except R2 and R6.
These results reflect the reality that Twitter users use punctuation to decorate text and
do not follow the rules of the Arabic language. They direct attention to the
importance of removing the punctuation from tweets to obtain reliable results for SA. In this
experiment, the best accuracy of 90.43% was achieved in R3 when using Keras embedding,
Snowball stemmer, and emoji encoding, which reflects their ability to deal with punctuation
and extract the emotions expressed in the place of the emojis.
Figure 4 shows the confusion matrix of Experiment 2 R3 because it achieves the best
accuracy score of all experiments, showing a percentage of true positives (TP) and true
negatives (TN) of 91.85%, and a percentage of false positives and false negatives of 8.15%,
indicating a strong performance with high TP and TN rates. Specifically, the model
correctly predicted 4199 out of 4454 negative samples and 3921 out of 4386 positive
samples. These results suggest that our hybrid CNN-LSTM model effectively captures the
nuances of the data, resulting in fewer misclassifications.
Figure 4. Confusion matrix of experiment 2 R3.
Figure 5 shows a Receiver Operating Characteristic Curve (ROC-Curve) for the same
experiment. The ROC-Curve shows the same results as the confusion matrix in Figure 4, as
the curve rises dramatically to the upper left near the Y-axis to show the high true positive
and true negative rates, highlighting the robustness of the model in distinguishing between
positive and negative sentiments and contributing to its overall superior performance,
resulting in an area under the ROC-Curve of 91.84%.
Figure 5. ROC-Curve of experiment 2 R3.
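The relationship between the confusion matrix and the area under the ROC-Curve can be checked with a short calculation. The sketch below assumes that the curve is drawn from hard class predictions, so that it has a single operating point between (0, 0) and (1, 1); the text does not state how the curve was computed, so this is only one possible reading. Under that assumption, the trapezoidal area reduces to the mean of the true positive rate and the true negative rate.

```python
def single_point_auc(tpr, fpr):
    """Trapezoidal area under the polyline (0, 0) -> (fpr, tpr) -> (1, 1)."""
    left_triangle = 0.5 * fpr * tpr
    right_trapezoid = 0.5 * (1.0 - fpr) * (tpr + 1.0)
    return left_triangle + right_trapezoid


# Counts reported for Experiment 2 R3.
tpr = 3921 / 4386            # true positive rate
fpr = (4454 - 4199) / 4454   # false positive rate
tnr = 1.0 - fpr              # true negative rate

auc = single_point_auc(tpr, fpr)
print(f"TPR={tpr:.4f}  TNR={tnr:.4f}  AUC={auc:.4f}")
```

With the reported counts this gives an area of about 0.9184, which is consistent with the 91.84% stated for the ROC-Curve; a curve built from predicted probabilities could yield a different value.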
5. Discussion
In this research study, we present a combined deep-learning approach for the analysis
and classification of Arabic tweets. Also, the role of preprocessing in improving the field
of Arabic sentiment analysis was investigated by checking the model performance with
different preprocessing groups to find the most suitable set of preprocessing steps for
the tweet dataset. We also investigated the translation of emojis into their meanings to
understand their importance in data preparation.
In the first experiment, all non-Arabic words and punctuation were removed. Then,
the model was used to evaluate the different techniques for handling emojis, stemming,
and embedding. The results in Table 4 show that removing emojis from the data resulted in
poor classification accuracy in R1, R5, R7, and R8, whereas translating emojis into real and
emotional meanings improved the model accuracy in R2, R3, R4, and R6, reaching 91.69%
in R3 when using Snowball stemmer and Keras embedding. Also, using the ISRI stemmer
in R2 gave a close result of 90.23%. In R4 and R6, the pre-trained AraVec 3.0 had less effect
on improving the model results, with an accuracy of 87.32% and 76.09%, respectively.
In the second experiment, the results in Table 5 suggest that keeping the non-Arabic
words had no positive effect on the results when using ISRI stemmer and emoji encoding
or AraVec 3.0 embedding and emoji encoding in R2, R4, and R6 over the results in Table 4,
while keeping the non-Arabic words improved the results in R3 and R5 when using
Snowball stemmer and Keras embedding. This indicated that the combination of Snowball
stemmer and Keras embedding can deal with both the emotions stored inside the emojis
and the words written in other languages and can employ them to provide insight into
full vector representation, while ISRI stemmer and AraVec transformers could not employ
the non-Arabic words to improve the classification results, especially when using emoji
encoding. This is because AraVec is a pre-trained model trained on Arabic tweets and
texts from Wikipedia, and the existence of non-Arabic words affects its transformation
performance, while the Keras transformer is trained on the same dataset, which helps
it to provide better representation of the emotions from the emojis and the non-Arabic
words. So, the best result of 91.85% was achieved in Exp. 2 R3 over all experiments by
keeping the non-Arabic words, which often carry significant sentiment information that
contributes to the overall meaning of a post, and this led to a noticeable improvement in
the sentiment classification accuracy. This is because non-Arabic words often act as strong
sentiment indicators. For example, a tweet containing the phrase “I love” would likely
indicate a positive sentiment. Removing these tokens would remove important context
from the post, potentially leading to misclassification. Also, AraVec 3.0 embedding was
slightly positively affected by keeping the non-Arabic words when the emojis were removed
(R7 and R8), compared to Experiment 1 R7 and R8. This can be explained by the fact that
removing the emojis helps AraVec to provide vector representation for the tweets with an
output dimension of 100, while Keras embedding uses the output dimension selected by the
Keras tuner and generates more meaningful embeddings.
In the third experiment, non-Arabic words and punctuation were retained, and the effect of
emoji removal and emoji encoding on the model was the same as in Experiments 1 and 2: the
accuracy achieved by removing the emojis was improved by replacing each emoji with its
meaning. This experiment also tested the effect of punctuation by comparing runs that kept both
the punctuation and the non-Arabic words with runs that kept the non-Arabic words and removed
the punctuation. The results show that keeping the punctuation had a negative effect on the
model accuracy in most runs, especially R3, which provided the best accuracy in Exp. 2, while
the R2 and R6 results improved. This pattern suggests that Twitter users apply punctuation
indiscriminately, so it carries little consistent sentiment signal. Thus, removing the
punctuation yields a more reliable and consistent model.
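The punctuation-removal step discussed here can be sketched as follows. This is an assumption about one reasonable way to do it, using Unicode character categories so that Arabic marks such as “،” and “؟” are covered along with ASCII punctuation; the function name `strip_punctuation` is hypothetical and the authors' exact rule may differ.

```python
import unicodedata


def strip_punctuation(text):
    """Replace every Unicode punctuation character (category P*) with a space."""
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    # Normalize whitespace so removed marks do not leave double spaces.
    return " ".join(cleaned.split())


# Arabic and non-Arabic words are kept; only punctuation marks are dropped.
sample = "رائع!!! I love it، حقا؟؟"
print(strip_punctuation(sample))
```

Note that this rule also strips `#`, `@`, and `_`, so hashtags and mentions would need to be handled before this step if they are to be preserved.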
The results obtained by the proposed approach on the ASTC dataset were
compared with the results of a different approach applied to the
same dataset. The comparison was made with the study in [25] and covers the differences
in study aim, preprocessing steps, and classification model, as shown in Table 7.
Table 7. Comparison.

| Article | Dataset | Model | Accuracy |
|---|---|---|---|
| Our approach | ASTC | CNN-LSTM | 91.85% |
| Heterogeneous Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis [25] | ASTC | Stacking LR | 92.22% |

The proposed model shows results comparable to those obtained in Heterogeneous
Ensemble Deep Learning Model for Enhanced Arabic Sentiment Analysis [25], which
used emoji Unicode translation and CBOW word embedding to generate a numerical
representation of the text as input to RNN, LSTM, and GRU models combined with three
meta-learners, LR, RF, and SVM, to classify the tweets. The model investigated in [25]
aimed to improve the performance of Arabic sentiment prediction.
Its authors started with data preprocessing, cleaning the data by removing non-Arabic
letters, digits, single Arabic letters, symbols, URLs, emails, and hashtags. Then,
tokenization was carried out by splitting the text on spaces, followed by the removal of
stop words, stemming with the ISRI stemmer, and emoji Unicode translation. In contrast,
the proposed model investigated the hybrid CNN-LSTM model with different data preprocessing
steps to achieve comparable accuracy and highlighted the effect of emoji
encoding with emotional and real meanings, as well as non-Arabic words, punctuation, Arabic
stemmers, and trainable and pre-trained transformers. This research demonstrates the
compatibility between the Snowball stemmer, Keras embedding, and the CNN-LSTM model and shows
how keeping the non-Arabic words improved the model, while keeping the punctuation
had a negative effect on it. Moreover, both the study in [25] and our approach showed the
role of emoji meanings in enriching the sentiment of the text, achieving accuracies
of 92.22% and 91.85%, respectively. Our approach additionally compared
the results obtained when removing the emojis with those obtained when transforming them,
which validates the meanings of the emojis in the generated emoji meaning dataset.
6. Conclusions and Future Work

Data preprocessing is among the most important steps in applying any machine learning or
deep learning model, as it plays a key role in building reliable and accurate models. Therefore,
the main goal of this research was to investigate the importance of providing the real and
emotional meaning of emojis in sentiment analysis and to find the best combination of
preprocessing steps to enhance the model. It also aimed to show the effect of the presence
of punctuation and non-Arabic words in ASA.
This research contributes to the improvement of ASA and shows
that emoji encoding had the most important effect on the results, since social media users
enrich their tweets and posts with emotional cues by using emojis. Also, including
the non-Arabic words when using the Keras embedding and Snowball stemmer produced
the best combination of preprocessing steps and achieved the highest accuracy of
91.85% using the CNN-LSTM model.
Given the promising improvements in Arabic Sentiment Analysis (ASA) achieved
through advanced data preprocessing and emoji encoding techniques, future research
will focus on integrating these steps with other pre-trained transformers such as AraBERT,
GloVe, and MARBERT. This investigation aims to evaluate and compare their performance
against the currently utilized transformers, potentially uncovering more efficient models
for enhanced sentiment analysis outcomes. We would also like to explore the effect of
integrating insights from speech synthesis, as in [57], into text-based sentiment analysis
models, which could lead to the development of hybrid models capable of understanding
both text and speech data.
Author Contributions: Conceptualization, H.A.; methodology, H.A.; software, H.A.; validation, H.A.,
A.H. and M.M.; formal analysis, H.A.; investigation, H.A.; data curation, H.A.; writing—original draft
preparation, H.A.; writing—review and editing, A.H. and M.M.; visualization, H.A.; supervision,
A.H. and M.M. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The ASTC dataset supporting the findings in this research is available from
the link in the dataset citation. The Emoji Meaning dataset is available upon request.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Diwali, A.; Saeedi, K.; Dashtipour, K.; Gogate, M.; Cambria, E.; Hussain, A. Sentiment Analysis Meets Explainable Artificial
Intelligence: A Survey on Explainable Sentiment Analysis. IEEE Trans. Affect. Comput. 2023, 15, 837–846. [CrossRef]
2. Saberi, B.; Saad, S. Sentiment analysis or opinion mining: A review. Int. J. Adv. Sci. Eng. Inf. Technol. 2017, 7, 1660–1666.
3. Abdelfattah, M.F.; Fakhr, M.W.; Rizka, M.A. ArSentBERT: Fine-tuned bidirectional encoder representations from transformers
model for Arabic sentiment classification. Bull. Electr. Eng. Inform. 2023, 12, 1196–1202. [CrossRef]
4. Mohammed, A.; Kora, R. Deep learning approaches for Arabic sentiment analysis. Soc. Netw. Anal. Min. 2019, 9, 52. [CrossRef]
5. Abdelwahab, Y.; Kholief, M.; Sedky, A.A.H. Justifying Arabic Text Sentiment Analysis Using Explainable AI (XAI): LASIK
Surgeries Case Study. Information 2022, 13, 536. [CrossRef]
6. Oueslati, O.; Cambria, E.; Ben HajHmida, M.; Ounelli, H. A review of sentiment analysis research in Arabic language. Future
Gener. Comput. Syst. 2020, 112, 408–430. [CrossRef]
7. Al Shamsi, A.A.; Abdallah, S. Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects. J. King Saud.
Univ.-Comput. Inf. Sci. 2023, 35, 101691. [CrossRef]
8. Elnagar, A.; Einea, O.; Lulu, L. Comparative study of sentiment classification for automated translated Latin reviews into Arabic.
In Proceedings of the IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), Hammamet,
Tunisia, 30 October–3 November 2017; pp. 443–448. [CrossRef]
9. Al-Azani, S.; El-Alfy, E.S.M. Combining emojis with Arabic textual features for sentiment classification. In Proceedings of the 2018 9th
International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 3–5 April 2018; pp. 139–144. [CrossRef]
10. Novak, P.K.; Smailović, J.; Sluban, B.; Mozetič, I. Sentiment of Emojis. PLoS ONE 2015, 10, e0144296. [CrossRef]
11. Soleymani, M.; Garcia, D.; Jou, B.; Schuller, B.; Chang, S.F.; Pantic, M. A survey of multimodal sentiment analysis. Image Vis.
Comput. 2017, 65, 3–14. [CrossRef]
12. Li, W.; Zhu, L.; Shi, Y.; Guo, K.; Cambria, E. User reviews: Sentiment analysis using lexicon integrated two-channel CNN–LSTM
family models. Appl. Soft Comput. 2020, 94, 106435. [CrossRef]
13. Alayba, A.M.; Palade, V. Leveraging Arabic sentiment classification using an enhanced CNN-LSTM approach and effective
Arabic text preparation. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 9710–9722. [CrossRef]
14. Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. Arabic language sentiment analysis on health services. In Proceedings of the 2017 1st
International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 114–118. [CrossRef]
15. Abdulla, N.A.; Ahmed, N.A.; Shehab, M.A.; Al-Ayyoub, M. Arabic sentiment analysis: Lexicon-based and corpus-based. In
Proceedings of the 2013 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT),
Amman, Jordan, 3–5 December 2013; pp. 1–6. [CrossRef]
16. Nabil, M.; Aly, M.; Atiya, A. ASTD: Arabic sentiment tweets dataset. In Proceedings of the 2015 Conference on Empirical Methods
in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2515–2519. Available online:
https://aclanthology.org/D15-1299.pdf (accessed on 29 April 2024).
17. Hengle, A.; Kshirsagar, A.; Desai, S.; Marathe, M. Combining Context-Free and Contextualized Representations for Arabic Sarcasm
Detection and Sentiment Identification. arXiv 2021, arXiv:2103.05683. Available online: https://arxiv.org/abs/2103.05683v1
(accessed on 7 September 2023).
18. Jalil, A.A.; Aliwy, A.H. Classification of Arabic Social Media Texts Based on a Deep Learning Multi-Tasks Model. Al-Bahir J. Eng.
Pure Sci. 2023, 2, 12. [CrossRef]
19. Sabbeh, S.F.; Fasihuddin, H.A. A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment
Classification. Electronics 2023, 12, 1425. [CrossRef]
20. Gharaibeh, H.; Al Mamlook, R.E.; Samara, G.; Nasayreh, A.; Smadi, S.; Nahar, K.M.; Aljaidi, M.; Al-Daoud, E.; Gharaibeh, M.;
Abualigah, L. Arabic sentiment analysis of Monkeypox using deep neural network and optimized hyperparameters of machine
learning algorithms. Soc. Netw. Anal. Min. 2024, 14, 30. [CrossRef]
21. Nayel, H.; Amer, E.; Allam, A.; Abdallah, H. Machine Learning-Based Model for Sentiment and Sarcasm Detection. In
Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kiev, Ukraine, 19 April 2021; pp. 386–389. Available
online: https://aclanthology.org/2021.wanlp-1.51 (accessed on 7 September 2023).
22. Wadhawan, A. AraBERT and Farasa Segmentation Based Approach for Sarcasm and Sentiment Detection in Arabic Tweets. arXiv
2021, arXiv:2103.01679. Available online: https://arxiv.org/abs/2103.01679v1 (accessed on 7 September 2023).
23. Al-Azani, S.; El-Alfy, E.S.M. Emoji-Based Sentiment Analysis of Arabic Microblogs Using Machine Learning. In Proceedings of the 21st
Saudi Computer Society National Computer Conference (NCC), Riyadh, Saudi Arabia, 25–26 April 2018; pp. 1–6. [CrossRef]
24. Arifiyanti, A.A.; Wahyuni, E.D. Emoji and emoticon in tweet sentiment classification. In Proceedings of the 6th Information
Technology International Seminar (IT IS), Surabaya, Indonesia, 14–16 October 2020; pp. 145–150. [CrossRef]
25. Saleh, H.; Mostafa, S.; Alharbi, A.; El-Sappagh, S.; Alkhalifah, T. Heterogeneous Ensemble Deep Learning Model for Enhanced
Arabic Sentiment Analysis. Sensors 2022, 22, 3707. [CrossRef]
26. Surikov, A.; Egorova, E. Alternative method sentiment analysis using emojis and emoticons. Procedia Comput. Sci. 2020,
178, 182–193. [CrossRef]
27. Al-Azani, S.; El-Alfy, E.S. Emojis-based sentiment classification of Arabic microblogs using deep recurrent neural networks.
In Proceedings of the 2018 International Conference on Computing Sciences and Engineering (ICCSE), Kuwait City, Kuwait,
11–13 March 2018; pp. 1–6. [CrossRef]
28. Chen, Y.; You, Q.; Yuan, J.; Luo, J. Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM.
In Proceedings of the MM 2018—Proceedings of the 2018 ACM Multimedia Conference, Seoul, Republic of Korea,
22–26 October 2018; pp. 117–125. [CrossRef]
29. Arabic Sentiment Twitter Corpus. Available online:
https://www.kaggle.com/datasets/mksaad/arabic-sentiment-twitter-corpus/data?select=arabic_tweets (accessed on 31 March 2024).
30. EmojiGuide. Available online: https://ar.emojiguide.com/ (accessed on 9 April 2024).
31. EmojiAll. Available online: https://www.emojiall.com/ar (accessed on 9 April 2024).
32. Symbol Planet. Available online: https://symbolplanet.com/smileys-emotion-emoji-meanings/ (accessed on 9 April 2024).
33. wikiHow. Available online: https://www.wikihow.com/Category:Emoticons-and-Emojis (accessed on 9 April 2024).
34. Ma, Z.; Sun, A.; Yuan, Q.; Cong, G. Tagging your tweets: A probabilistic modeling of hashtag annotation in twitter. In Proceedings
of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai,
China, 3 November 2014; pp. 999–1008. [CrossRef]
35. Yang, L.; Sun, T.; Zhang, M.; Mei, Q. We know what @you #tag: Does the dual role affect hashtag adoption? In Proceedings of the
21st Annual Conference on World Wide Web (WWW), Lyon, France, 16–20 April 2012; pp. 261–270. [CrossRef]
36. Khalid Bolbol, N.; Maghari, A.Y. Sentiment analysis of arabic tweets using supervised machine learning. In Proceedings
of the 2020 International Conference on Promising Electronic Technologies (ICPET), Jerusalem, Palestine,
16–17 December 2020; pp. 89–93. [CrossRef]
37. Khamphakdee, N.; Seresangtakul, P. An Efficient Deep Learning for Thai Sentiment Analysis. Data 2023, 8, 90. [CrossRef]
38. Al-Helalat, M. Enhanced arabic information retrieval for informed decision-making: Empowering political search. Int. J. Progress.
Res. Eng. Manag. Sci. (IJPREMS) 2023, 3, 232–240. Available online:
https://www.ijprems.com/uploadedfiles/paper/issue_7_july_2023/31816/final/fin_ijprems1689480149.pdf (accessed on 10 May 2024).
39. Gurusamy, V.; Professor, A. Preprocessing Techniques for Text Mining. Int. J. Comput. Sci. Commun. Netw. 2014, 5, 7–16.
40. Van Der Goot, R. Where are we Still Split on Tokenization? In Findings of the Association for Computational Linguistics: EACL;
Association for Computational Linguistics: St. Julian’s, Malta, 2024; pp. 118–137. Available online:
https://aclanthology.org/2024.findings-eacl.9 (accessed on 27 April 2024).
41. Bird, S. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions; Association
for Computational Linguistics: Sydney, Australia, 2006; pp. 69–72. Available online: https://aclanthology.org/P06-4018.pdf
(accessed on 27 April 2024).
42. Islam, J.; Mercer, R.E.; Xiao, L. Multi-Channel Convolutional Neural Network for Twitter Emotion and Sentiment Recognition. In
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human
Language Technologies (NAACL HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 1355–1365. [CrossRef]
43. Maree, M.; Eleyat, M.; Rabayah, S.; Belkhatir, M. A hybrid composite features based sentence level sentiment analyzer. IAES Int. J.
Artif. Intell. 2023, 12, 284–294. [CrossRef]
44. Radwan, A.; Amarneh, M.; Alawneh, H.; Ashqar, H.I.; AlSobeh, A.; Magableh, A.A.A.R. Predictive Analytics in Mental Health
Leveraging LLM Embeddings and Machine Learning Models for Social Media Analysis. Int. J. Web Serv. Res. 2024, 21, 1–22. [CrossRef]
45. Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing Ltd.: Birmingham, UK, 2017; Available online:
https://scholar.google.com/scholar_lookup?title=Deep+Learning+with+KERAS&author=Gulli,+A.&author=Pal,+S.&publication_year=2017
(accessed on 9 May 2024).
46. Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP. Procedia Comput.
Sci. 2017, 117, 256–265. [CrossRef]
47. Bin Syed, M.A.; Ahmed, I. A CNN-LSTM Architecture for Marine Vessel Track Association Using Automatic Identification System
(AIS) Data. Sensors 2023, 23, 6400. [CrossRef]
48. Hu, F.; Yang, Q.; Yang, J.; Luo, Z.; Shao, J.; Wang, G. Incorporating multiple grid-based data in CNN-LSTM hybrid model for
daily runoff prediction in the source region of the Yellow River Basin. J. Hydrol. Reg. Stud. 2024, 51, 101652. [CrossRef]
49. Ghourabi, A.; Mahmood, M.A.; Alzubi, Q.M. A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English
Messages. Future Internet 2020, 12, 156. [CrossRef]
50. Saleh, H.; Mostafa, S.; Gabralla, L.A.; Aseeri, A.O.; El-Sappagh, S. Enhanced Arabic Sentiment Analysis Using a Novel Stacking
Ensemble of Hybrid and Deep Learning Models. Appl. Sci. 2022, 12, 8967. [CrossRef]
51. Rehman, A.U.; Malik, A.K.; Raza, B.; Ali, W. A Hybrid CNN-LSTM Model for Improving Accuracy of Movie Reviews Sentiment
Analysis. Multimed. Tools Appl. 2019, 78, 26597–26613. [CrossRef]
52. Khan, L.; Amjad, A.; Afaq, K.M.; Chang, H.T. Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman
Urdu Text Shared in Social Media. Appl. Sci. 2022, 12, 2694. [CrossRef]
53. Behera, R.K.; Jena, M.; Rath, S.K.; Misra, S. Co-LSTM: Convolutional LSTM model for sentiment analysis in social big data. Inf.
Process. Manag. 2021, 58, 102435. [CrossRef]
54. KerasTuner. Available online: https://keras.io/keras_tuner/ (accessed on 12 April 2024).
55. Alawneh, H.; Hasasneh, A. Survival Prediction of Children after Bone Marrow Transplant Using Machine Learning Algorithms.
Int. Arab. J. Inf. Technol. 2024, 21, 394–407. [CrossRef]
56. Islam, M.A.; Iacob, I.E. Manuscripts Character Recognition Using Machine Learning and Deep Learning. Modelling 2023,
4, 168–188. [CrossRef]
57. Al-Radhi, M.S.; Abdo, O.; Csapó, T.G.; Abdou, S.; Németh, G.; Fashal, M. A continuous vocoder for statistical parametric speech synthesis
and its evaluation using an audio-visual phonetically annotated Arabic corpus. Comput. Speech Lang. 2020, 60, 101025. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Academic Editor: Alfredo Cuzzocrea

Modelling 2024, 5, 1469–1489. https://doi.org/10.3390/modelling5040076 https://www.mdpi.com/journal/modelling
