Article
Short Text Sentiment Classification Using Bayesian and Deep
Neural Networks
Zhan Shi and Chongjun Fan *
Business School, University of Shanghai for Science & Technology, Shanghai 200093, China;
211420079@st.usst.edu.cn
* Correspondence: fan_chongjun@163.com
Abstract: Earlier multi-layer learning networks easily fall into local extreme points during supervised learning. If the training samples sufficiently cover future samples, the learned multi-layer weights can be used to predict new test samples well. This paper studies machine short text sentiment classification based on Bayesian network and deep neural network algorithms. It first introduces Bayesian network and deep neural network algorithms, and analyzes comments from popular emotional communication platforms such as Twitter and Weibo. Popular reviews are then modeled to conduct classification research on unigrams, bigrams, parts of speech, dependency labels, and triplet dependencies. The results show that the classification accuracy ranges from a minimum of 0.8116 to a maximum of 0.87, obtained when the triplet dependency feature has 12,000 input nodes; the reconstruction error of the restricted Boltzmann machine lies between 7.3175 and 26.5429, and the average classification accuracy is 0.8301. This illustrates the advantages of triplet dependency features for text representation in text sentiment classification tasks, and shows that Bayesian and deep neural networks have clear advantages in short text emotion classification.
Keywords: Bayesian network; deep neural network algorithms; text sentiment analysis; machine learning
1. Introduction
Sentiment analysis has a long research history in the field of natural language processing. In the past, most methods were at least partially based on domain knowledge. Since then, machine learning based methods have become the mainstream approach to sentiment analysis.

Within sentiment analysis, sentiment classification is the most important task. Based on the emotional information expressed in a text, it divides the text into two or more categories, that is, it classifies the attitudes, views, and tendencies of the text's authors. Sentiment classification is a comparatively new research direction with important application value in opinion mining, information prediction, comment classification, spam filtering, part-of-speech tagging, public opinion monitoring, and so on.

On blog and Weibo data, support vector machines and multinomial naive Bayes models have been tested respectively. It was found that SVMs perform better on long texts (blogs), while multinomial naive Bayes models outperform on short texts (microblogs and Twitter).

Building on the analysis of sentiment data streams with association rules, prior work studied the major events of 2010, found that new training data are continuously obtained in the data stream, and studied how to automatically analyze users' opinions and emotions in a real-time environment.

This paper mainly introduces the deep neural network algorithm, the Bayesian-regularized deep belief network, and machine learning text sentiment classification; tests the role of a meta-learning method based on deep belief networks in text sentiment classification; and carries out experimental research and analysis to reach the corresponding conclusions.
The innovation of this paper is to use the deep neural network algorithm to establish the BR-DBN model and test its performance. The results show that the model is suitable for discriminative classification problems; the experimental section then presents the closely linked research and analysis.
2. Related Work
Text sentiment analysis plays an important role in social network information mining.
It is also the theoretical foundation for personalized recommendation, interest circle classification, and public opinion analysis. Therefore, Chang G proposed a fine-grained
short text sentiment analysis method based on machine learning. In order to improve the
calculation methods of feature selection and weighting, he proposed the sentiment analysis feature selection algorithm N-CHI and the weight calculation method W-TF-IDF, which are more suitable for feature extraction, and improved the proportion and weight of sentiment words among the feature words through experiments [1]. In addition to the traditional document classification feature
set, it is also possible to extract the comments of certain posts as part of the microblog
features based on the relationship between the commenter and the poster by constructing a
microblog social network as input information. Sun X proposed a Deep Belief Network
(DBN) model and a multimodal feature extraction method to extend the features and
dimensions of short texts for Chinese microblog sentiment classification [2]. Emotions can
be expressed in many ways, such as facial expressions and gestures, speech, and written text.
Sentiment analysis in text documents is essentially a content-based classification problem
involving concepts from the fields of natural language processing and machine learning.
Joshi S discussed techniques used in sentiment recognition and sentiment analysis based
on textual data [3]. In recent years, sentiment analysis research has gained a huge impetus
on English text data; however, few studies have focused on Nepali text data, which is the focus of this work. Piryani R explored machine learning methods and proposed a dictionary-based approach to sentiment analysis of tweets written in Nepali using linguistic features and lexical resources [4]. Text classification is a central task in natural language
processing, aiming to classify text documents into predefined classes or categories. It
needs appropriate functions to describe the content and meaning of text documents and
map them to target categories. The existing text feature representation depends on the
weighted representation of document terms. Therefore, it is very important to choose
an appropriate term weighting method, which will help to improve the effectiveness of
classification tasks. Attieh J provides a new text classification framework for category-based
feature engineering [5]. Naive Bayesian learning algorithm is widely used in many fields,
especially in text classification. However, when it is used in fields that violate its naive
assumptions, or when the training set is too small to find an accurate probability estimate,
its performance will decline. El Hindi K M proposed a naive Bayesian method of inertia
fine tuning to solve these two problems [6]. In recent years, the deep learning model has
been successfully applied to text emotion analysis. However, category imbalance and
unmarked corpus still limit the accuracy of text emotion classification. To overcome these
two problems, Jiang W proposed a new text sentiment analysis classification model [7].
Sentiment analysis of online content related to electronic news, products, services, etc.,
has become very important in this digital age to improve the quality of the services provided.
Machine learning-based, knowledge-based, and hybrid methods are three approaches to sentiment analysis of text, audio, and other modalities. The system proposed by Divate MS is a polarity-
based sentiment analysis of Marathi electronic news [8]. Twitter is an online blogging
site on the Internet that provides a platform for people to experience and talk about their
thoughts on issues, events, merchandise, and other ideas. Bhagat C proposed that the most important goal is to gain a comprehensive understanding of how machine learning strategies are used in sentiment analysis in order to obtain better results on short texts [9]. Sentiment analysis is one of the main fields of natural language processing, and its main task is to extract sentiments, opinions, attitudes, and emotions from subjective texts. Due to its importance in decision-making and people's trust in website reviews,
there are many academic studies addressing the SA problem. Albayati A Q therefore proposed deep learning to explore powerful machine learning techniques, emerging with its feature representation and ability to discriminate data, resulting in state-of-the-art prediction results [10]. Social network data are unstructured and unpredictable, and contain idioms, jargon, and dynamic topics. Machine learning algorithms for traffic event detection may not be able to extract valuable information from social network data. Farman Ali proposed a real-time monitoring framework based on social networks for traffic accident detection and condition analysis, using ontology and latent Dirichlet allocation together with bidirectional long short-term memory [11]. In the sentiment attitude extraction task, the goal is to identify the "attitude", that is, the emotional relationship between the entities mentioned in the text. Rusnachenko N studied attention-based context encoders for this task [12]. The views put forward by these scholars are all in line with the current situation of emotional texts, and this research has great significance. However, they all overlooked a very important point: they did not clarify their research objects. Therefore, this paper focuses on an investigation and analysis combining algorithm experiments and actual research objects.
3. Bayesian Network and Deep Neural Network Algorithm

3.1. Deep Neural Network Algorithm

Deep learning is a research field of machine learning. It studies the distribution rules of data so that machines can acquire a learning ability similar to that of humans, together with a certain capacity to recognize images and sounds. In recent years, deep learning has achieved great success in computer vision, speech recognition, data mining, and many other fields, handling difficulties that traditional methods cannot overcome; it has therefore become a new research hotspot. Deep learning uses more complex neural networks to solve problems. Face recognition technology, for example, is ubiquitous in daily life, and face recognition is a relatively important deep learning direction. For a neural network, a face is like a data matrix: the top layer is used to extract facial features, and the bottom layer is used to recognize facial features.
(1) Deep self-encoding network

An encoder, generally speaking, is a device that encodes a signal or data into a form that can be communicated, transmitted, and stored, for example converting angular or linear displacement into electrical signals; by reading mode such devices are divided into contact and non-contact types, and by working principle into incremental and absolute-value types. The encoder used in deep learning appeared in the 1980s and is an unsupervised learning algorithm. Its basic idea is to reconstruct the network input as faithfully as possible: the encoding process maps the input layer to a hidden layer, and the decoding process maps the hidden layer to the output layer. Figure 1 shows the overall framework of this paper.
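To make the encode-decode process concrete, the following is a minimal sketch of a single-hidden-layer autoencoder with tied weights in Python/NumPy. The layer shapes, learning rate, and training loop are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class AutoEncoder:
    """Minimal autoencoder: input -> hidden (encode) -> output (decode)."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, (n_visible, n_hidden))  # tied weights
        self.b_h = np.zeros(n_hidden)    # hidden-layer bias
        self.b_v = np.zeros(n_visible)   # output-layer bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, x, lr=0.1):
        h = self.encode(x)
        x_hat = self.decode(h)
        # Gradients of the squared reconstruction error (tied weights).
        err = x_hat - x
        d_v = err * x_hat * (1 - x_hat)        # delta at the output layer
        d_h = (d_v @ self.W) * h * (1 - h)     # delta at the hidden layer
        self.W -= lr * (np.outer(x, d_h) + np.outer(d_v, h))
        self.b_v -= lr * d_v
        self.b_h -= lr * d_h
        return 0.5 * np.sum(err ** 2)          # reconstruction error
```

Repeated training steps over the corpus's feature vectors drive the reconstruction error down, the same quantity that is tracked for the restricted Boltzmann machines in the experiments below.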
When the network parameters are fixed, the energy function can be used to evaluate whether new data conform to the network distribution. Suppose layer d has m visible units and layer g has n hidden units. Then the energy function between the visible layer nodes and the hidden layer nodes (d, g) is:
$$P(d, g \mid \alpha) = -\sum_{j=1}^{m} x_j d_j - \sum_{i=1}^{n} y_i g_i - \sum_{j=1}^{m} \sum_{i=1}^{n} d_j w_{ji} g_i \tag{1}$$

$$E(d, g \mid \alpha) = \frac{e^{-P(d, g \mid \alpha)}}{z(\alpha)} \tag{2}$$

in which

$$z(\alpha) = \sum_{d, g} e^{-P(d, g \mid \alpha)} \tag{3}$$

Then the likelihood functions $E(d \mid \alpha)$ and $E(g \mid \alpha)$ can be expressed as:

$$E(d \mid \alpha) = \frac{1}{z(\alpha)} \sum_{g} e^{-P(d, g \mid \alpha)} \tag{4}$$

$$E(g \mid \alpha) = \frac{1}{z(\alpha)} \sum_{d} e^{-P(d, g \mid \alpha)} \tag{5}$$

In addition, the conditional probabilities $E(d \mid g; \alpha)$ and $E(g \mid d; \alpha)$ of the visible and hidden layers can also be obtained:

$$E(d \mid g; \alpha) = \frac{E(d, g \mid \alpha)}{E(g \mid \alpha)} = \frac{\frac{1}{z(\alpha)} e^{-P(d, g \mid \alpha)}}{\frac{1}{z(\alpha)} \sum_{d} e^{-P(d, g \mid \alpha)}} = \frac{e^{-P(d, g \mid \alpha)}}{\sum_{d} e^{-P(d, g \mid \alpha)}} \tag{6}$$

$$E(g \mid d; \alpha) = \frac{E(d, g \mid \alpha)}{E(d \mid \alpha)} = \frac{\frac{1}{z(\alpha)} e^{-P(d, g \mid \alpha)}}{\frac{1}{z(\alpha)} \sum_{g} e^{-P(d, g \mid \alpha)}} = \frac{e^{-P(d, g \mid \alpha)}}{\sum_{g} e^{-P(d, g \mid \alpha)}} \tag{7}$$
Since there are no connections within the hidden layer or within the visible layer (units are connected only between layers), the activation probabilities can be derived from Equations (6) and (7), respectively:

$$E(d_j = 1 \mid g; \alpha) = \frac{1}{1 + e^{-x_j - \sum_i w_{ji} g_i}} \tag{8}$$

$$E(g_i = 1 \mid d; \alpha) = \frac{1}{1 + e^{-y_i - \sum_j w_{ji} d_j}} \tag{9}$$
Learning an RBM amounts to determining the parameter values that best fit the training data. These values can be obtained by gradient-based maximization of the likelihood function. To simplify the calculation, the logarithm of the likelihood is taken, and the key step is to find its partial derivative with respect to $\alpha$, namely:
" #
η InE(d|α) η ( p(d, g|α)) η ( p(d, g|α))
=∑ − (10)
ηα ηα E( g|d,α) ηα E(d| g,α)
Since $\alpha = \{w_{ji}, x_j, y_i\}$, the partial derivative with respect to $w_{ji}$ can be obtained as:

$$\frac{\partial \ln E(d \mid \alpha)}{\partial w_{ji}} = \langle d_j g_i \rangle_{E(g \mid d, \alpha)} - \langle d_j g_i \rangle_{E(d, g \mid \alpha)} \tag{11}$$
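Equations (8)-(11) translate directly into the contrastive divergence (CD-1) update commonly used to train an RBM. Below is a minimal NumPy sketch under that standard scheme; variable names follow the paper's notation (d for visible units, g for hidden units, w for weights, x/y for biases), while the learning rate and sampling details are conventional assumptions rather than values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(d0, w, x, y, lr=0.01, rng=None):
    """One CD-1 update. d0: (m,) visible sample; w: (m, n) weights;
    x: (m,) visible biases; y: (n,) hidden biases."""
    rng = rng or np.random.default_rng(0)
    # Positive phase: Eq. (9), E(g_i = 1 | d) = sigmoid(y_i + sum_j w_ji d_j).
    pg0 = sigmoid(y + d0 @ w)
    g0 = (rng.random(pg0.shape) < pg0).astype(float)  # sample hidden states
    # Negative phase: Eq. (8) reconstructs the visible units, then resample.
    pd1 = sigmoid(x + w @ g0)
    pg1 = sigmoid(y + pd1 @ w)
    # Eq. (11): <d_j g_i>_data - <d_j g_i>_model, approximated by CD-1.
    w += lr * (np.outer(d0, pg0) - np.outer(pd1, pg1))
    x += lr * (d0 - pd1)
    y += lr * (pg0 - pg1)
    return np.sum((d0 - pd1) ** 2)  # reconstruction error, as in Section 4
```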
Figure 2. Bayesian network structure diagram.
3.3. Bayesian Regularized Deep Belief Network Model

The purpose of this paper is to apply the Bayesian regularization algorithm to the RBM algorithm to improve the generalization ability of the DBN.

(1) Model construction

This paper constructs a BR-DBN model whose lower part is a stack of multiple Bayesian regularization RBM (BR-RBM) layers; the framework is shown in Figure 3. Back-propagation calculates the partial derivatives of each layer in the reverse direction according to the loss function, so as to update the parameters. Suppose the loss function is:

$$Q = \frac{1}{2} \sum_{v=1}^{V} \sum_{m=1}^{M} (b_{mv} - c_{mv})^2 \tag{12}$$
where V represents the number of output nodes, M represents the number of training samples, $b_{mv}$ represents the expected output value, and $c_{mv}$ represents the actual output value. Then, using the regularization method, the training function becomes:

$$P = \beta Q + \phi E_w \tag{13}$$

In the formula, P is the new learning function, $E_w$ is the sum of squares of the network weights and thresholds, and $\beta$ and $\phi$ are the hyperparameters that determine the distribution of parameters such as weights and thresholds.
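To make Equations (12) and (13) concrete, the sketch below computes the squared-error term Q and the regularized objective. Note that the squared-weight penalty E_w is the standard Bayesian regularization form and is assumed here, since the extracted text does not reproduce Equation (13) itself; the hyperparameter values are also placeholders.

```python
import numpy as np

def regularized_objective(expected, actual, weights, beta=1.0, phi=0.1):
    """Eq. (12): Q = 1/2 * sum_v sum_m (b_mv - c_mv)^2 over V output nodes
    and M samples. Eq. (13), assumed standard form: P = beta*Q + phi*E_w,
    where E_w is the sum of squared weights/thresholds."""
    Q = 0.5 * np.sum((expected - actual) ** 2)
    E_w = sum(np.sum(w ** 2) for w in weights)  # over all weight matrices
    return beta * Q + phi * E_w

# Toy example: M=8 samples, V=3 output nodes, two random weight matrices.
rng = np.random.default_rng(0)
b = rng.random((8, 3))   # expected outputs
c = rng.random((8, 3))   # actual outputs
ws = [rng.normal(size=(5, 4)), rng.normal(size=(4, 3))]
print(regularized_objective(b, c, ws))
```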
Hyperparameter optimization refers to the process of finding the optimal combination of hyperparameters in machine learning to improve the performance and effectiveness of the models. Common hyperparameter optimization methods include grid search, random search, Bayesian optimization, and automatic machine learning. Among them, grid search is a brute-force method that trains all possible combinations of hyperparameters; random search randomly samples hyperparameter combinations, which can balance computational cost against effectiveness; Bayesian optimization is an optimization method based on Bayes' theorem, which gradually adjusts the value ranges of hyperparameters according to the performance of known hyperparameter combinations to find the optimal combination; automatic machine learning automatically selects the optimal model and hyperparameter combination, and can transfer learning across multiple tasks, thus improving the generalization ability of the model. A two-stage search combining random search and a local grid search is sketched below.
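The following is a hedged illustration of that random-search-then-local-grid strategy using scikit-learn; the SVC estimator, parameter ranges, and synthetic data are placeholders, not the paper's actual experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.svm import SVC
from scipy.stats import loguniform

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stage 1: random search over wide ranges (cheap exploration).
rand = RandomizedSearchCV(
    SVC(), {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e0)},
    n_iter=20, cv=3, random_state=0).fit(X, y)
c, g = rand.best_params_["C"], rand.best_params_["gamma"]

# Stage 2: local grid search around the best random-search point.
grid = GridSearchCV(
    SVC(), {"C": [c / 2, c, c * 2], "gamma": [g / 2, g, g * 2]}, cv=3).fit(X, y)
print(grid.best_params_, grid.best_score_)
```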
Figure 3. BR-DBN model structure.
(2) Model training

The completed state of each BR-RBM layer is used as the input of the next BR-RBM layer, and the process is repeated until the pre-training of all BR-RBM layers is completed [13], as sketched below.
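A minimal sketch of this greedy layer-wise pre-training loop follows, reusing the cd1_step function from the earlier sketch; the RBM wrapper class, epoch count, and learning rate are illustrative assumptions, and the Bayesian regularization term of BR-RBM is omitted for brevity.

```python
import numpy as np

class RBM:
    """Thin wrapper that reuses the cd1_step update sketched after Eq. (11)."""
    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.x = np.zeros(n_visible)   # visible biases
        self.y = np.zeros(n_hidden)    # hidden biases

    def train(self, batch, epochs=5, lr=0.01):
        for _ in range(epochs):
            for v in batch:
                cd1_step(v, self.w, self.x, self.y, lr)  # earlier sketch

    def hidden_probs(self, data):
        return 1.0 / (1.0 + np.exp(-(self.y + data @ self.w)))

def pretrain_stack(data, layer_sizes):
    """Greedy layer-wise pre-training: the completed (hidden) state of each
    trained layer becomes the input of the next layer."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(inputs.shape[1], n_hidden)
        rbm.train(inputs)
        layers.append(rbm)
        inputs = rbm.hidden_probs(inputs)   # feed activations upward
    return layers
```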
Assuming that the BR-DBN network consists of m layers of BR-RBM, since the tuning phase starts from the last layer of BR-DBN, set the output vector of the last layer to be $f^m(a)$, where a is the initial sample; then $f^m(a)$ is:

$$f^m(a) = \frac{1}{1 + e^{-(y^m + w^m f^{m-1}(a))}} \tag{14}$$
In the formula, $y^m$ and $w^m$ are the bias value and weight of the mth layer of BR-RBM, respectively, and $f^{m-1}$ is the output vector of the (m - 1)th layer. After forward learning through the l BR-RBM layers, the probability that the jth sample belongs to category $b_j \in \{1, 2, \ldots, k\}$ is:
$$q(b_j = r \mid f^m(a_j), D^m, k^m) = \frac{e^{D_r^m f^m(a_j) + k^m}}{\sum_{r=1}^{k} e^{D_r^m f^m(a_j) + k^m}} \tag{15}$$
where $1\{\cdot\}$ is the indicator function: when $b_j = r$ its value equals 1, and when $b_j \neq r$ it equals 0. To optimize the error, gradient ascent is used to find the partial derivatives of the parameters as follows:

$$\nabla_{\varepsilon^m} S(\varepsilon^m) = \frac{1}{n} \sum_{j=1}^{n} \left[ f^m(a_j) \left( 1\{b_j = r\} - \hat{g}(a_j) \right) \right] \tag{17}$$
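Equations (15) and (17) correspond to a softmax output layer over the top-level features and its log-likelihood gradient. The following is a small NumPy sketch of that step, with names mirroring the equations (D for class weights, k for biases); the learning rate is an assumption.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                # numerical stability
    e = np.exp(z)
    return e / e.sum()

def finetune_step(f_m, label, D, k, lr=0.1):
    """f_m: (h,) top-layer features; label: class index r;
    D: (K, h) class weights; k: (K,) class biases.
    Eq. (15): q(b = r | f_m) = softmax(D @ f_m + k)."""
    q = softmax(D @ f_m + k)
    one_hot = np.zeros_like(q)
    one_hot[label] = 1.0           # indicator 1{b_j = r}
    # Eq. (17): per-sample gradient of the log-likelihood, then ascent.
    D += lr * np.outer(one_hot - q, f_m)
    k += lr * (one_hot - q)
    return q
```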
If the number of hyperparameters is very large, random search is first used to find promising hyperparameter combinations, and a local grid search is then used to select the optimal one; finally, the hyperparameters are fine-tuned.
Table 1. Average classification error rate of the BR-DBN model on different datasets.

Data Set | Training Set | Test Set | Average Classification Error Rate (%)
Iris | 100 | 50 | 1.97
Seeds | 150 | 60 | 3.46
Perfume Data | 320 | 150 | 2.87
Four class | 500 | 200 | 2.59
As can be seen from Table 1, for different datasets, the BR-DBN model has a lower
average error rate, and the results show that the model is suitable for discriminative
classification problems. Error rate refers to the proportion of the number of samples with
incorrect classification to the total number of samples.
Making use of noisily labeled short texts is the challenge addressed in this section. This paper presents a brand-new model based on smoothing a language model with emoticon-annotated data, namely the emoticon-smoothed language model (ESLAM). The main contributions of ESLAM are as follows:
After training the language model through manually annotated data, ESLAM smoothed
the language model using training data annotated with emoticons. Thus, ESLAM seamlessly
integrates manual and noisy annotated data to form a unified probabilistic model framework.
The large amount of noisy annotation data allows the ESLAM language model to handle misspelled words, slang, tone words, abbreviations, and various other out-of-vocabulary words. This ability is not found in a common supervised learning model based only on manual annotation.
In addition to discriminating between positive and negative polarity classification,
ESLAM can also be used for subjective classification. The previous noise annotation-based
algorithm cannot be used for subjective classification.
Most noise-annotation-based learning algorithms need to crawl a large number of Twitter short texts and store them locally; but considering that Twitter crawling is limited in access frequency, this is also a time-consuming, storage-consuming, and inefficient approach [19]. The ESLAM in this paper proposes an innovative and simple method to directly estimate the probability of each word in the language model by using Twitter's open API, without the need to download any original text from Twitter.

Experiments on real data from Twitter show that ESLAM can effectively integrate manual and noisy annotation information, and it works better than other algorithmic models that use only one of these sources of information.

To test the role of meta-learning methods based on deep belief networks in text emotion classification, two sets of contrast experiments were used. The first group compares the results of the deep belief network meta-learning method with those of the deep belief network acting directly on the text feature vector in text emotion classification, and the second compares the results of meta-learning with those of fixed rules in text emotion classification.

The deep belief network acting directly on the text feature vector performs the emotional classification of the text. The work process consists of three parts: text pre-processing, text feature selection, and learning in the deep neural network. The process is shown in Figure 4.
The experimental data are tweets about several companies, including Microsoft and Twitter. After removing non-English tweets and junk tweets, 3723 tweets remain. The larger the index value, the more accurate the text emotion classification results. Six different feature sets were selected: unigrams, bigrams, parts of speech, dependency labels, combined features with emotion scores, and triplet dependency features. For each feature set, the 1000, 2000, 4000, 6000, 8000, 12,000, and 14,000 items with the highest information gain scores were used as network input dimensions; a sketch of this selection step follows. The numbers of network layers and their corresponding hidden layer nodes are shown in Table 2. In Table 2, with X representing the input nodes, the 2-layer network structure is X-600-300, indicating a first hidden layer of 600 nodes and a second hidden layer of 300 nodes.
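The dimension selection just described (keeping the items with the highest information gain scores) can be sketched with scikit-learn, using its mutual information scorer as a stand-in for information gain; the vectorizer settings and toy corpus are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

docs = ["great phone, love it", "terrible battery", "works fine", "awful screen"]
labels = [1, 0, 1, 0]   # toy sentiment labels

# Unigram + bigram counts, then keep the k highest-scoring features.
X = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)
k = min(1000, X.shape[1])   # the paper uses 1000 ... 14,000 input dimensions
selector = SelectKBest(mutual_info_classif, k=k).fit(X, labels)
X_reduced = selector.transform(X)
print(X_reduced.shape)
```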
Experimental results record the classification accuracy and reconstruction error of
different feature dimensions under different network structures. It records the accuracy
of the network DBN: X-2000-1000-500-200-100, DBN: X-600-300-100, DBN: X-600-300 and
BP: X-600 in different dimensions of six sets of feature sets. The reconstruction errors are
numbered according to the corresponding number of hidden layers.
Table 4. DBN: X-600-300-100 training statistical results.

Statistic | Exact Value | Reconstruction Error 1 | Reconstruction Error 2 | Reconstruction Error 3 | Time (s)
minimum | 0.8116 | 7.3175 | 2.7168 | 1.3974 | 166.3
maximum | 0.8700 | 26.5429 | 6.2811 | 4.4129 | 1503.6
average | 0.8301 | 15.9288 | 4.9615 | 2.9398 | 763.4
Figure 5. DBN: X-2000-1000-500-200-100 classification accuracy.
The minimum, maximum, and mean values of the source data results were calculated; the statistical results are presented in Tables 3-6.

Table 3 shows the results of the network DBN: X-2000-1000-500-200-100. Among them, the minimum classification accuracy was 0.8058 and the maximum was 0.8692, obtained when the triplet dependency feature dimension was 14,000; the average classification accuracy of the 5-layer deep belief network was 0.8303. However, the first-layer restricted Boltzmann machine reconstruction error for a single training set is 9.4408 to 22.5903, and the average reconstruction error is 16.0944; the reconstruction error increases with the number of input nodes. The reconstruction error in the second layer ranged from 6.0798 to 10.3566, with an average value of 8.7713, greatly reduced from that of the first layer. The reconstruction error of the third layer ranges from 2.4241 to 5.2308, with an average of 3.9798, again reduced from the second layer. The fourth-layer reconstruction error ranged from 2.4355 to 5.2445, with an average of 4.1796, showing little change from the third layer. The reconstruction error in the fifth layer ranges from 1.4961 to 4.3040, with an average of 3.0649, decreasing compared with the previous layer. The network running time ranged from 166.3 to 1503.6 s, increasing with the number of input nodes.

Table 3. DBN: X-2000-1000-500-200-100 training statistical results.

Statistic | Exact Value | Reconstruction Error 1 | Error 2 | Error 3 | Error 4 | Error 5 | Time (s)
minimum | 0.8058 | 9.4408 | 6.0798 | 2.4241 | 2.4355 | 1.4961 | 166.3
maximum | 0.8692 | 22.5905 | 10.3566 | 5.2308 | 5.2445 | 4.3040 | 1503.6
Table 4 shows the results of the network DBN: X-600-300-100. Among them, the minimum classification accuracy is 0.8116 and the maximum is 0.87, obtained when the triplet dependency feature input node count is 12,000; the average classification accuracy is 0.8301. The reconstruction error of the first-layer restricted Boltzmann machine ranges between 7.3175 and 26.5429, increasing with the number of input nodes. The reconstruction error of the second layer ranges from 2.7168 to 6.2811, markedly lower than that of the previous layer. The reconstruction error of the third layer ranges from 1.3974 to 4.4129, again reduced compared with the previous layer. The running time ranged from 166.3 s to 1503.6 s.
Table 5 shows the results of the network DBN: X-600-300. Among them, the minimum
classification accuracy was at 0.8116 and the maximum was 0.87, obtained when the triplet
dependency feature input node was 14,000, and the average classification accuracy was
0.8326. However, the reconstruction error range of the training set is 7.3296 to 26.5921,
increasing with more input nodes. The reconstruction error in the second layer ranges from
2.7044 to 9.5712, which is lower than that of the previous layer. The time period ranged
from 142.2 s to 1409.4 s.
Table 6 shows the results of the network BP: X-600. Among them, the minimum classification accuracy was 0.8133 and the maximum was 0.8641, obtained when the triplet dependency feature input dimension was 14,000; the average classification accuracy was 0.8333. The running time ranged from 45.35 s to 1117.5 s.
4.3. Experimental Analysis
(1) Effect of different feature sets on the classification accuracy
As can also be seen from Figure 5, calculating the average classification accuracy of the different feature sets gives 0.81, 0.8381, 0.8195, 0.8152, 0.8220, and 0.8620, and the highest classification accuracies were 0.8142, 0.8433, 0.8308, 0.825, 0.8342, and 0.8692, respectively. This shows that triplet dependency is the feature representation that achieves the highest classification accuracy, followed by the combined unigram and bigram features.
As can be seen from Figure 6, the average classification accuracy for the different feature sets is 0.8202, 0.8379, 0.8214, 0.8175, 0.8215, and 0.8585, and the highest classification accuracies obtained were 0.8291, 0.845, 0.8283, 0.8325, 0.8375, and 0.87.
Figure 6. DBN: X-600-300-100 classification accuracy.
As can be seen from Figure 7, when the triplet dependency feature dimension is taken
at 4000 or above, the classification accuracy exceeds other features and feature combinations.
Second, the combination with unigram word features also achieves good classification results.

Figure 7. DBN: X-600-300 classification accuracy.
As can be seen from Figure 8, when the triplet dependency feature dimension is taken at 4000 or above, the classification accuracy exceeds the other features and feature combinations. The average classification accuracy of the bigram combination is higher than those of the added part-of-speech, dependency label, and emotion score features; the lowest average classification accuracy was obtained with unigram features alone. According to the calculation, the average classification accuracies of the different feature sets are 0.8196, 0.8349, 0.8307, 0.8276, 0.8306, and 0.8479, and the highest classification accuracies obtained were 0.8275, 0.8416, 0.8425, 0.8375, 0.8391, and 0.8641.
(2) Analysis and comparison of deep belief network and BP network
The deep belief network is composed of multiple restricted Boltzmann machines stacked in layers; the initial weights of the network are learned by the restricted Boltzmann machine algorithm and then adjusted by the BP algorithm according to the label data. In a plain BP network, by contrast, the initial values are randomly assigned and adjusted only by the BP algorithm, which can cause the BP network to fail to converge as the error declines. This section analyzes the classification accuracy and convergence of deep belief networks and BP networks in the experiments.
The deep belief network structures with different numbers of layers were compared with the classification accuracy of the BP network over the following feature configurations: unigram 4000 and 6000; +bigram 4000, 6000, 8000, 10,000, 12,000, and 14,000; +part of speech 4000, 6000, 8000, 10,000, 12,000, and 14,000; +dependency label 4000, 6000, 8000, 10,000, 12,000, and 14,000; +emotion score 4000, 6000, 8000, 10,000, 12,000, and 14,000; and triplet dependency 4000, 6000, 8000, 10,000, 12,000, and 14,000. The comparison of the obtained results is shown in Figure 9.
Figure 8. BP: X-600 classification accuracy.

Figure 9. Emotional analysis line chart (x-axis: feature sets).

The structural classification accuracies of the three different deep belief networks in Figure 9 have almost the same trend at each input node. In the comparison between the deep belief networks and BP from the 11th to the 26th feature configuration, BP: X-600 performs better than the other networks apart from DBN: X-2000-1000-500-200-100, while from the 28th to the 32nd configuration BP: X-600 has the lowest classification accuracy, indicating that BP learns less well than the deep belief networks under complex features [22].

The BP network algorithm is essentially a gradient descent method. The high-dimensional characteristics of the network input and the nature of text emotion classification itself make the optimization objective function very complex, so the optimization proceeds in a "zigzag" fashion. When a neuron's output is close to 0 or 1, the weight changes caused by the error become small and error propagation stalls, preventing network convergence. In the deep belief network, however, the weights are initialized by the restricted Boltzmann machine, which avoids the non-convergence caused by vanishingly small errors in the gradient descent algorithm [23].
5. Conclusions
The main work and conclusions are as follows:
(1) According to the characteristics of Chinese text, the theory, characteristics, and generation methods of dependency syntactic relations were analyzed to derive a process for constructing Chinese triplet dependency features. The dependency syntactic relations of many Chinese sentences were analyzed and summarized, rules were formulated for Chinese sentences that do not affect the structure of the dependency tree, and merge and delete algorithms for redundant and useless nodes are presented. The above method was applied to Chinese hotel review data, book review data, and laptop review data, effectively realizing the conversion of texts into triplet dependency features [24,25].
(2) The accuracy of text emotion classification was compared between the proposed triplet dependency features and common text representation features, including unigrams, bigrams, parts of speech, dependency labels, and emotion scores [26,27]. To this end, two sets of experiments were designed for comparative analysis: one calculates the emotion score of each comment sentence on three datasets based on semantic methods, and one uses the features extracted from the three data instances for machine learning with the k-nearest neighbor classification algorithm [28]. Meanwhile, the text feature representations of the different feature sets were dimension-reduced, and feature vector spaces of different dimensions were used in traditional machine learning algorithms. Experimental results show that the triplet dependency feature representation method is effective in text emotion classification, with results much higher than the emotion dictionary scores based on semantic methods; the classification accuracy reaches 84-86% on large-scale data for the SVM classification algorithm, an increase of 2-3% over existing features. However, it was also found that the triplet dependency feature leads to growth of the feature dimension, and determining the appropriate dimension for dimension reduction is a difficult problem.
However, due to the limitations of time and technology, this paper has not carried out a
detailed analysis of the problems encountered in the emotional classification of short text,
which will be further discussed in the future.
Author Contributions: Formal analysis, C.F.; Writing—original draft, Z.S. All authors have read and
agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chang, G.; Huo, H. A method of fine-grained short text sentiment analysis based on machine learning. Neural Netw. World 2018,
28, 325–344. [CrossRef]
2. Sun, X.; Peng, X.; Hu, M. Extended Multi-modality Features and Deep Learning Based Microblog Short Text Sentiment Analysis.
Dianzi Yu Xinxi Xuebao/J. Electron. Inf. Technol. 2017, 39, 2048–2055. [CrossRef]
3. Joshi, S.; Deshpande, D. Twitter Sentiment Analysis System. Int. J. Comput. Appl. 2018, 180, 35–39. [CrossRef]
4. Piryani, R.; Piryani, B.; Singh, V.K.; Pinto, D. Sentiment analysis in Nepali: Exploring machine learning and lexicon-based
approaches. J. Intell. Fuzzy Syst. 2020, 39, 2201–2212. [CrossRef]
5. Attieh, J.; Tekli, J. Supervised term-category feature weighting for improved text classification. Knowl. Based Syst. 2023, 261, 110215.
[CrossRef]
6. El Hindi, K.M.; Aljulaidan, R.R.; AlSalman, H. Lazy fine-tuning algorithms for naïve Bayesian text classification. Appl. Soft
Comput. 2020, 96, 106652. [CrossRef]
7. Jiang, W.; Zhou, K.; Xiong, C.; Guodong, D.; Chubin, O.; Zhang, J. KSCB: A novel unsupervised method for text sentiment
analysis. Appl. Intell. 2023, 53, 301–311. [CrossRef]
8. Divate, M.S. Sentiment analysis of Marathi news using LSTM. Int. J. Inf. Technol. 2021, 13, 2069–2074. [CrossRef]
9. Bhagat, C.; Mane, D. Survey On Text Categorization Using Sentiment Analysis. Int. J. Sci. Technol. Res. 2019, 8, 1189–1195.
10. Albayati, A.Q.; Al_Araji, A. Arabic Sentiment Analysis (ASA) Using Deep Learning Approach. Univ. Baghdad Eng. J. 2020, 26, 85–93.
[CrossRef]
11. Ali, F.; Ali, A.; Imran, M.; Naqvi, R.A.; Siddiqi, M.H.; Kwak, K.-S. Traffic accident detection and condition analysis based on social
networking data. Accid. Anal. Prev. 2021, 151, 105973. [CrossRef] [PubMed]
12. Rusnachenko, N.; Loukachevitch, N. Attention-Based Neural Networks for Sentiment Attitude Extraction using Distant Su-
pervision. In Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics, Biarritz, France,
30 June–3 July 2020; pp. 159–168. [CrossRef]
13. Gallego, F.O.; Corchuelo, R. Torii: An aspect-based sentiment analysis system that can mine conditions. Software 2020, 50, 47–64.
[CrossRef]
14. Chen, J.; Yan, S.; Wong, K.C. Verbal aggression detection on Twitter comments: Convolutional neural network for short-text
sentiment analysis. Neural Comput. Appl. 2018, 3, 10809–10818. [CrossRef]
15. Rehman, A.U.; Malik, A.K.; Raza, B. A Hybrid CNN-LSTM Model for Improving Accuracy of Movie Reviews Sentiment Analysis.
Multimed. Tools Appl. 2019, 78, 26597–26613. [CrossRef]
16. Karthik, E.; Sethukarasi, T. Sarcastic user behavior classification and prediction from social media data using firebug swarm
optimization-based long short-term memory. J. Supercomput. 2021, 78, 5333–5357. [CrossRef]
17. Wang, X.; Zhang, H.; Xu, Z. Public Sentiments Analysis Based on Fuzzy Logic for Text. Int. J. Softw. Eng. Knowl. Eng. 2016, 26,
1341–1360. [CrossRef]
18. Ashok, K.J.; Trueman, T.E.; Cambria, E. A Convolutional Stacked Bidirectional LSTM with a Multiplicative Attention Mechanism
for Aspect Category and Sentiment Detection. Cogn. Comput. 2021, 13, 1423–1432.
19. Roseline, V.; Chellam, G.H. Sentiment Classification Using PS-POS Embedding with Bilstm-CRF and Attention. Int. J. Future
Gener. Commun. Netw. 2020, 13, 3520–3526.
20. Han, H.; Bai, X.; Ping, L. Augmented sentiment representation by learning context information. Neural Comput. Appl. 2019, 31,
8475–8482. [CrossRef]
21. Sengan, S.P.; Sagar, V.; Khalaf, O.I.; Dhanapal, R. The optimization of reconfigured real-time datasets for improving classification
performance of machine learning algorithms. Math. Eng. Sci. Aerosp. 2021, 12, 43–54.
22. Roseline, V.; Herenchellam, D. PS-POS Embedding Target Extraction Using CRF and BiLSTM. Int. J. Adv. Sci. Technol. 2020, 29,
10984–10995.
23. Bashar, M.A.; Nayak, R.; Luong, K. Progressive domain adaptation for detecting hate speech on social media with small training
set and its application to COVID-19 concerned posts. Soc. Netw. Anal. Min. 2021, 11, 69. [CrossRef] [PubMed]
24. Huan, J.L.; Sekh, A.A.; Quek, C.; Prasad, D.K. Emotionally charged text classification with deep learning and sentiment semantic.
Neural Comput. Appl. 2021, 34, 2341–2351. [CrossRef]
25. Yan, Z.; Cao, W.; Ji, J. Social behavior prediction with graph U-Net+. Discov. Internet Things 2021, 1, 18. [CrossRef]
26. Brooke, J.; Hammond, A.; Hirst, G. Using models of lexical style to quantify free indirect discourse in modernist fiction. Lit.
Linguist. Comput. 2017, 32, 234–250. [CrossRef]
27. Kumar, M.; Aggarwal, J.; Rani, A.; Stephan, T.; Shankar, A.; Mirjalili, S. Secure video communication using firefly optimization
and visual cryptography. Artif. Intell. Rev. 2021, 55, 2997–3017. [CrossRef]
28. Lu, H.; Wang, S.S.; Zhou, Q.W.; Zhao, Y.N.; Zhao, B.Y. Damage and control of major poisonous plants in the western grasslands of
China? a review. Rangel. J. 2012, 34, 329. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.