A REVIEW ON RECENT ADVANCES IN DEEP LEARNING FOR
COMPUSOFT, An international journal of advanced computer technology, 9(7), July-2020 (Volume-IX, Issue-VII)
ISSN:2320-0790
Abstract: As social media continues to expand, its impact on people is huge. For example, many businesses
use input from social media to advertise to specific target markets. This is done by detecting and analyzing
the sentiment (emotions, feelings, opinions) expressed in social media texts about a topic or product.
Numerous machine learning and natural language processing methods are used to examine public opinion with
low time complexity. Deep learning techniques, however, have recently become widely popular because of their
high efficiency and accuracy. This paper provides an overview of the common deep learning frameworks used in
sentiment analysis in recent years. We offer a taxonomical study of text representations, learning models,
evaluation metrics and implications of recent advances in deep learning architectures. We place special
emphasis on deep learning methods and discuss the key findings and limitations reported by different authors.
We hope this will help other researchers to further develop deep learning methods for text processing,
especially for sentiment analysis. The paper also presents brief summaries of the most popular datasets and
lexicons, with their related research, performance and main features. The aim of this survey is to highlight
the ability of deep learning architectures to solve text-based sentiment analysis challenges with good
accuracy and speed while capturing context and syntactic and semantic meaning. The review analyzes the
progress and recent advances in sentiment analysis based on existing deep learning methods and approaches,
together with their findings, performance comparisons and limitations.
Keywords: Sentiment Analysis (SA); Text; Deep learning; Emotion Recognition (ER); Classifiers; Neural Network (NN).
Embedding: Word embeddings with pre-trained word2vec and Skip-gram
Description: Uses the distributed representation of each word; in Skip-gram (SG), words are skipped with length l.
Used in: [24], 2018
Usage: Attention-based hierarchical LSTM model that incorporates common-sense knowledge of sentiment concepts.

Embedding: FastText [20]
Description: The word representation is learned effectively from character-level details, so fastText also works for rare words.
Used in: [25], 2018
Usage: CNN model that uses a domain-specific double embedding mechanism.

Embedding: GloVe [26]
Description: GloVe vectors are unsupervised, and each word is represented by a vector. Terms are identified by word similarity distance as well as semantic space.
Used in: [16], 2020
Usage: MAN model that uses two-level embedding with GloVe (local and word-level interaction).

Embedding: BERT [27]
Description: BERT pretrains bidirectional representations of unlabelled data in each of its layers.
Used in: (Gao, Feng, Song, & Wu, 2019)
Usage: BERT used for target-dependent sentiment classification.

Table-2: Summary of text embedding in deep learning based SA.

D. Deep Learning Model
A deep learning-based model involves three basic steps: the input text is prepared using one of the
embedding approaches, the embedded text is fed forward to a deep learning architecture such as a CNN or an
RNN-based model, and the model finally makes a prediction for the NLP application. This section gives an
overview of the tools and methods used in deep learning. Figure-2 shows the overall deep learning model for
sentiment analysis.

... structure of a CNN model with its related layers. Convolutional neural networks (CNNs) are generally
flexible with larger inputs and scale well. Like conventional ANNs, CNNs include input, fully connected and
output layers, but they also have convolutional and pooling layers, along with additional layers that are
essential to CNN effectiveness. These layers process filtered feature maps; a feature map is a
representation of the source data. During training, the convolution filters are learned by backpropagation.
Figure-3 shows the basic structure of a CNN model.

Figure-3: CNN model [29].

F. Long Short-Term Memory (LSTM)
LSTM is an extension of the RNN designed to store inputs over long periods. In contrast to the basic
internal memory of an RNN, an LSTM has an advanced memory whose content can be read, written and erased.
Figure-4 shows the basic architecture of LSTM. LSTM therefore addresses the vanishing gradient problem from
which RNNs suffer. The LSTM decides which information to remember and which to forget. Its memory is gated,
with three gates: the input, forget and output gates.
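The three gates described above can be illustrated for a single time step. The following is a minimal Python sketch, assuming the common formulation in which each gate applies a sigmoid to an affine map of the concatenated previous hidden state and current input. The helper name lstm_gates, the layer sizes and the random weights are hypothetical and chosen only for illustration; they are not taken from any of the surveyed papers.

```python
import math
import random


def sigmoid(a):
    """Logistic sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-a))


def lstm_gates(h_prev, x_t, params):
    """Return the input, forget and output gate activations for one step.

    Each gate is sigmoid(W [h_prev, x_t] + b); `params` maps a gate name
    to its (W, b) pair, with W stored as a list of rows.
    """
    hx = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    gates = {}
    for name, (W, b) in params.items():
        gates[name] = [
            sigmoid(sum(w * v for w, v in zip(row, hx)) + b_k)
            for row, b_k in zip(W, b)
        ]
    return gates


# Hypothetical sizes and random weights, for illustration only.
hidden_size, input_size = 4, 3
rng = random.Random(0)
params = {
    name: (
        [[rng.gauss(0.0, 0.5) for _ in range(hidden_size + input_size)]
         for _ in range(hidden_size)],
        [0.0] * hidden_size,
    )
    for name in ("input", "forget", "output")
}

gates = lstm_gates([0.0] * hidden_size, [1.0, -1.0, 0.5], params)
for name, values in gates.items():
    print(name, [round(v, 3) for v in values])
```

Every gate value lies strictly between 0 and 1, so each gate acts as a soft switch on how much information is written to, kept in, or read from the memory.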
In equation form, the three LSTM gates are computed as:

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \quad f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \quad o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$

Here $i_t$ denotes the input gate, $f_t$ the forget gate and $o_t$ the output gate; $\sigma$ is the sigmoid
activation function, $W_x$ denotes the weights of gate $x$, $h_{t-1}$ is the output of the previous LSTM step
at time stamp $t-1$, $x_t$ is the input at the current time stamp, and $b_x$ is the bias of gate $x$.

G. Attention mechanism with RNN
Attention mechanisms became popular in NLP following an influential paper on machine translation [31]. To
build intuition for the attention process, consider an RNN with a single hidden layer. The goal of the
attention network is to extract the meaning of each hidden state and to compute a weighted sum of features.
Figure-5 is the schematic of Bahdanau's attention method. A bidirectional LSTM generates a sequence of
annotations $(h_1, h_2, \ldots, h_{T_x})$ for each input sentence. Each vector $h_j$ is simply the
concatenation of the forward and backward hidden states of the encoder. In basic terms, the $T_x$ words of
the input sentence are represented by the vectors $h_1, h_2, h_3, \ldots, h_{T_x}$. In the basic
encoder-decoder method, only the last state of the encoder LSTM ($h_{T_x}$ in this case) serves as the
context vector.

Figure-5: Attention based model

In the attention model, the weights are learned by a feed-forward NN. The context vector $c_i$ for the
output word $y_i$ is generated as the weighted sum of the annotations, and the weights $\alpha_{ij}$ are
computed with a softmax function:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j), \qquad c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$$

Here $e_{ij}$ is the output of a feed-forward neural network, represented by the function $a$, that scores
the alignment between the input at position $j$ and the output at position $i$.

H. Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) was developed by Kyunghyun Cho in 2014 as a gating mechanism for the
Recurrent Neural Network (RNN) [32]. GRU and LSTM are very similar, but a GRU has fewer parameters than an
LSTM. In general, GRUs perform comparably to LSTMs. A GRU uses two gates to achieve high performance: the
reset gate and the update gate. The reset gate defines how the new input is merged with the prior
information, while the update gate defines how much of the prior memory is maintained. Unlike LSTMs, GRUs
have no separate cell state ($c_t$). The update and reset gates together play the role of the LSTM forget
gate and act on the previous hidden state. In a GRU, that role is therefore split between the reset and
update gates, as shown in Figure-6.

Figure-6: Gated Recurrent Unit [30]

Here $x$ denotes the input, $r$ the reset gate, $z$ the update gate and $h$ the hidden state:

$$r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$$
$$z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$$
$$\hat{h}_t = \tanh(W_{xh} x_t + W_{hh} (r_t \odot h_{t-1}) + b_h)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \hat{h}_t$$

Here the $W$ are weight matrices and the $b$ are bias parameters, $\sigma$ is the sigmoid function, and
$\odot$ denotes element-wise multiplication.

I. Capsule Network of Recurrent Neural Network (RNN)
A CNN cannot model the hierarchical relationships among local features, which can lead to concepts being
misclassified. With max pooling in a CNN, some important information is lost, because only the most active
neurons are passed on to the next stage. Capsule networks have been proposed to overcome these constraints.
They capture the spatial relations among entities through dynamic routing between capsules, which works much
better than CNN max pooling. Dynamic routing is used to train the neuron vectors of a capsule network, and
these vectors replace the scalar nodes of a conventional neural network. Capsule networks can work with
relatively less information than most other neural network models. The main role of the capsule network is
to establish spatial relations and orientation information on top of a conventional neural network,
combining invariance with equivariance to recognize objects. A lower-level capsule prefers to send its
output to the next-level capsules whose vectors have a large scalar product with the prediction made by the
lower-level capsule. Figure 7 shows the routing process in a capsule network.

... its function and spatial position details. The above limitations in handling long-range dependencies
have largely been addressed by the Recurrent Neural Network (RNN) [32] [37] [38] and the capsule neural
network [8] [39]. RNNs have been widely used for classification in different fields with good results.
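As a concrete illustration of the recurrent units discussed above, the GRU update equations from Section H can be written out directly. The following is a minimal pure-Python sketch of a single GRU step, not a production implementation. The function names (gru_step, random_params), the layer sizes, the random initialisation and the toy inputs are all hypothetical and serve only to make the example runnable.

```python
import math
import random


def sigmoid(a):
    """Logistic sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    """Multiply a list-of-rows matrix W by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x_t, h_prev, p):
    """One GRU update: returns h_t given input x_t and previous state h_prev."""
    # Reset gate r_t and update gate z_t.
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_xr"], x_t), matvec(p["W_hr"], h_prev), p["b_r"])]
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_xz"], x_t), matvec(p["W_hz"], h_prev), p["b_z"])]
    # Candidate state: the reset gate scales the previous state element-wise.
    rh = [r_k * h_k for r_k, h_k in zip(r, h_prev)]
    h_cand = [math.tanh(a + b + c) for a, b, c in
              zip(matvec(p["W_xh"], x_t), matvec(p["W_hh"], rh), p["b_h"])]
    # The update gate interpolates between the old state and the candidate.
    return [z_k * h_k + (1.0 - z_k) * c_k
            for z_k, h_k, c_k in zip(z, h_prev, h_cand)]


def random_params(input_size, hidden_size, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for gate in ("r", "z", "h"):
        p[f"W_x{gate}"] = mat(hidden_size, input_size)
        p[f"W_h{gate}"] = mat(hidden_size, hidden_size)
        p[f"b_{gate}"] = [0.0] * hidden_size
    return p


params = random_params(input_size=3, hidden_size=4)
h = [0.0] * 4
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):  # toy inputs
    h = gru_step(x, h, params)
print([round(v, 3) for v in h])
```

With this convention, an update gate close to 1 keeps the previous hidden state almost unchanged, while a value close to 0 replaces it with the candidate state.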
Author and year: [16], 2020
Type: Hybrid
Journal: Neurocomputing, Elsevier
Task: Opinion mining
Lexicon or dataset: Works with five datasets: laptop2014, restaurant2014, restaurant2015, restaurant2016 and twitter
Approach: GloVe vector, encoder, cosine similarity
Sentiment analysis features: Attention mechanism; attention weights computed using the mean value. Cosine similarity is used for position analysis of context and aspect.
Text: English
Performance: Accuracy and Macro-F1 (%): Laptop2014 (78.13, 73.20), Restaurant2014 (84.38, 71.31), Restaurant2015 (82.65, 69.10), Restaurant2016 (85.87, 73.28), Twitter (76.56, 72.19)
Advantages: Uses a multi-headed attention mechanism that increases the performance of sentiment classification.
Disadvantages: Its mean value-based tasks sometimes miss context; uses cosine similarity, which cannot handle the irrelevant portion of the text.

Author and year: [13], 2020
Type: Hybrid
Journal: Neurocomputing, Elsevier
Task: Opinion mining
Lexicon or dataset: Persian product corpora from www.digikala.com and hotel reviews from http://www.hellokish.com
Approach: Uses SVM, LR and DNN classifiers (LSTM-CNN)
Sentiment analysis features: Uses author-developed dependency rules to extract sentiment features using the syntactic relations among the words.
Text: Persian
Performance: On product review data: Precision 87%, Recall 92%, F1 score 89%, Accuracy 86.29%
Advantages: Uses a knowledge-based approach built on dependency rules for parsing to extract opinions; context is handled better than in other approaches.
Disadvantages: More dependent on the DNN classifier; works only for Persian; depends on Google Translate; unable to handle multi-word expressions, informal words, idioms, sarcasm and complex sentences.

Author and year: [45], 2019
Type: Hybrid
Journal: IEEE Access
Task: Opinion mining
Lexicon or dataset: Laptop review, Restaurant review and Twitter dataset
Approach: BERT with target dependency approach; GloVe vector, Word2Vec
Sentiment analysis features: Uses WordPiece tokenizer, segment and position embeddings, and an encoder layer for classification with BERT.
Text: English
Performance: Laptop data: Accuracy 78.87 ± 1.13, Micro-F1 74.38 ± 1.39. Restaurant data: Accuracy 83.87 ± .27, Micro-F1 79.61 ± 0.79. Twitter data: Accuracy 77.31 ± .79, Micro-F1 75.56 ± 0.93.
Advantages: Does not consider the whole sentence, but focuses on target terms instead.
Disadvantages: Did not perform well with mixed sentiment polarities towards different aspects.

Author and year: [46], 2019
Type: Deep learning
Journal: Computation and Language, Computer Science, Cornell University
Task: Multilevel opinion mining
Lexicon or dataset: TREC QA dataset
Approach: Capsule neural network
Sentiment analysis features: CNN and capsule with routing, with compressions and adaptive KDE routing.
Text: English
Performance: NLP-Capsule achieved MAP 77.73% and MRR 74.16%.
Advantages: Performs well, with a sufficient accuracy margin, for multi-label text classification and question answering.
Disadvantages: Datasets need to be large and more realistic.

Author and year: [33], 2019
Type: Hybrid
Journal: IEEE Access
Task: Opinion mining
Lexicon or dataset: Movie Review data, NLPCC2014 dataset
Approach: BiGRU and CNN model
Sentiment analysis features: CNN is used for feature extraction, and the BiGRU and capsule network can extract independent text features.
Text: English
Performance: 82.55% accuracy for Movie Review data; 87.84% accuracy for NLPCC data.
Advantages: The RNN routing process of the capsule network can extract independent text features, i.e. word position and semantic and syntactic structure.
Disadvantages: The method performs well, but the attention-CNN layer is not rich enough in its operations for better performance.

Author and year: [47], 2019
Type: Semi-supervised
Journal: IEEE Access
Task: Emotion recognition
Lexicon or dataset: ISEAR and 9 other datasets
Approach: Deep learning
Sentiment analysis features: Uses CNN, Bi-LSTM and the word embeddings Word2Vec, GloVe and FastText.
Text: English
Performance: Accuracy 74.6% for the ISEAR dataset; good performance for the other 9 datasets.
Advantages: Extracts the 6 basic emotions of Ekman's model; can handle the semantics of the sentence.
Disadvantages: Performance is limited by the data of the word embedding.

Author and year: [7], 2019
Type: Supervised
Journal: Computation and Language, Cornell University
Task: Emotion recognition
Lexicon or dataset: SemEval-2019 Task-3 data, collections of labeled conversations
Approach: Deep learning: Bi-LSTM
Sentiment analysis features: Uses ASGD (Average Stochastic Gradient Descent) for model training; uses attention-based AWD-Bi-LSTM for classification.
Text: English
Performance: Accuracy 75.82% for SemEval-2019 data and also shown its ...
Advantages: Extracts sad, happy and angry emotions in different conversation data.
Disadvantages: The attention-based BiLSTM has some shortcomings in handling context and the semantic relations among words in the sentence.

Author and year: [8], 2018
Type: Hybrid
Journal: Conference paper, ACL Anthology, ACL web
Task: Emotion recognition
Lexicon or dataset: Kaggle toxicity detection dataset
Approach: Capsule network with dynamic routing, LSTM
Sentiment analysis features: Word embedding; focal loss for better performance in toxic comment classification.
Text: English
Performance: 98.46 accuracy on Kaggle data for toxic comment classification.
Advantages: Performs well on a domain-dependent dataset (the Kaggle toxicity dataset).
Disadvantages: It is domain dependent and not suitable for other domains; cannot handle misspelled words.

Author and year: [48], 2018
Type: Deep learning
Journal: Knowledge-Based Systems, Elsevier
Task: Opinion classification
Lexicon or dataset: Amazon multi-domain sentiment dataset, Sanders ...
Approach: Attention mechanism
Sentiment analysis features: Uses two-level modules named the domain module and the sentiment ...
Text: English
Performance: Accuracy on Amazon: Books 87.75%, DVD 86.58%, Electronics 87.50%, Kitchen ...
Advantages: Domain representation allows the attention process to select the most ...
Disadvantages: Domain dependent; does not perform well for multi-domain and multilingual text.
DRAWBACKS OF DEEP LEARNING METHODS IN SENTIMENT ANALYSIS
From the review in this article, it can be concluded that although deep learning architectures have shown
outstanding results and important advances in sentiment analysis, there are still some disadvantages in
using these algorithms:
1. To ensure that a model achieves the required output, most deep learning strategies require a large
   amount of labelled data for training. Thus, for sentiment analysis research, a large dataset is
   necessary to train the deep learning architecture so that it predicts the class labels correctly. Huge
   quantities of data can be exceedingly complicated and cumbersome to collect and label.
2. Unlike conventional machine learning or lexical approaches, which show which features were chosen to
   predict a certain sentiment, it is difficult to find the real explanation for a neural network's
   prediction of the multilevel sentiment of a text by looking at the weights in its various stages. This
   makes it challenging to analyse the predictions of neural network models, as they work like a
   "black box."
3. Deep learning approaches such as CNN need their initial parameters to be tuned, as seen in Stoyanovsky
   et al. [49]. The network's efficiency therefore relies on the values of its hyperparameters, and
   determining the optimum hyperparameter values is itself a difficult job.
4. Training also takes a very long time because deep learning models have a huge number of parameters. In
   addition, to increase performance [50], they need high-performance hardware such as GPUs and large RAM.

III. CHALLENGES, LIMITATIONS AND FUTURE WORK IN SENTIMENT ANALYSIS WITH THE MODEL OF DEEP LEARNING
It is not possible for a machine to recognize sentiment the way a human does. Nevertheless, recent work on
sentiment analysis from text has generally achieved good accuracy. It is still lacking, however, in
handling coherence, context, semantic meaning, negation, modifiers and intensifiers in the sentence.
Context-based approaches partly address this problem. Lexicon or dictionary-based approaches can handle
grammatical syntax but have limitations such as low accuracy, higher time complexity, and dictionary and
domain dependency. Unsupervised approaches offer adaptability, simplicity and lower complexity, but they
also come with limitations in time complexity and accuracy. Machine learning approaches such as TF-IDF,
Naïve Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Bayesian,
k-means, Maximum entropy classification and Conditional Random Field (CRF) classifiers are faster but are
limited in handling the semantics and dependencies of the words in a sentence. Recent deep learning-based
approaches such as CNN, LSTM, GRU, BERT and capsule neural networks give higher accuracy by handling
independent text features. Deep learning also has limitations, such as handling context and syntax
properly. Deep learning and quantum deep learning are the current trends in the area of sentiment
analysis. Live sentiment analysis, for a game, a product or other subjects, is also a trending task.

IV. CONCLUSION
This article gives a systematic review and analysis of deep learning methods for text sentiment analysis.
It introduces several deep learning methods for textual data in different categories, and summarizes and
analyses their benefits, disadvantages, limitations and applicability. Sentiment analysis from text, image,
speech and video is very important in Human Computer Interaction. For social media, text-based sentiment
analysis plays a vital role. From this review we can conclude that deep learning methods give higher
accuracy than the other methods. For handling coherence and semantics in a sentence, however, the
knowledge-based approach is better, but it has limitations in accuracy and in time and space complexity. On
the other hand, ontology-based sentiment analysis handles text properly but is time consuming compared with
the other approaches. Although supervised machine learning techniques are faster and more accurate, this
type of method cannot handle negation, intensifier or modifier clauses in the sentence. In this case the
unsupervised knowledge-based approach and deep learning are better than the other methods.

REFERENCES
[1] Kolkur, S., Dantal, G. and Mahe, R. 2015. Study of different levels for sentiment analysis.
International Journal of Current Engineering and Technology. 5(2), 768-770.
[2] Liu, B. 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies.
5(1), 1-167.
[3] Hoogervorst, R., et al., 2016. Aspect-based sentiment analysis on the web using rhetorical structure
theory. in International Conference on Web Engineering. Springer.
[4] Bibi, M., et al., 2020. A Cooperative Binary-Clustering Framework Based on Majority Voting for Twitter
Sentiment Analysis. IEEE Access.
[5] Sailunaz, K. and Alhajj, R. 2019. Emotion and sentiment analysis from Twitter text. Journal of
Computational Science. 36, 101003.
[6] Seal, D., Roy, U.K. and Basak, R. 2020. Sentence-Level Emotion Detection from Text Based on Semantic
Rules, in Information and Communication Technology for Sustainable Development. Springer. 423-430.
[7] Ragheb, W., et al., 2019. Attention-based Modeling for Emotion Detection and Classification in Textual
Conversations. arXiv preprint arXiv:1906.07020.
[8] Srivastava, S., Khurana, P. and Tewari, V. 2018. Identifying aggression and toxicity in comments using
capsule network. in Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018).
[9] Alomari, E., Mehmood, R. and Katib, I. 2020. Sentiment Analysis of Arabic Tweets for Road Traffic
Congestion and Event Detection, in Smart Infrastructure and Applications. Springer. 37-54.
[10] Aloufi, S. and El Saddik, A. 2018. Sentiment identification in football-specific tweets. IEEE Access.
6, 78609-78621.
[11] Ruz, G.A., Henríquez, P.A. and Mascareño, A. 2020. Sentiment analysis of Twitter data during critical
events through Bayesian networks classifiers. Future Generation Computer Systems. 106, 92-104.
[12] Wu, D. and Cui, Y. 2018. Disaster early warning and damage assessment analysis using social media data
and geo-location information. Decision Support Systems. 111, 48-59.
[13] Dashtipour, K., et al., 2020. A hybrid Persian sentiment analysis framework: Integrating dependency
grammar based rules and deep neural networks. Neurocomputing. 380, 1-10.
[14] Kausar, S., et al., 2019. A Sentiment Polarity Categorization Technique for Online Product Reviews.
IEEE Access.
[15] Trinh, S., et al., 2016. Lexicon-based sentiment analysis of Facebook comments in Vietnamese language,
in Recent developments in intelligent information and database systems. Springer. 263-276.
[16] Xu, Q., et al., 2020. Aspect-based sentiment classification with multi-attention network.
Neurocomputing. 388, 135-143.
[17] Ren, R., Wu, D.D. and Liu, T. 2018. Forecasting stock market movement direction using sentiment
analysis and support vector machine. IEEE Systems Journal. 13(1), 760-770.
[18] Kušen, E. and Strembeck, M. 2018. Politics, sentiments, and misinformation: An analysis of the Twitter
discussion on the 2016 Austrian Presidential Elections. Online Social Networks and Media. 5, 37-50.
[19] Haselmayer, M. and Jenny, M. 2017. Sentiment analysis of political communication: combining a
dictionary approach with crowdcoding. Quality & quantity. 51(6), 2623-2646.
[20] Mikolov, T., et al., 2013. Efficient estimation of word representations in vector space. arXiv
preprint arXiv:1301.3781.
[21] Zhai, S. and Zhang, Z.M. 2016. Semisupervised autoencoder for sentiment analysis. in Thirtieth AAAI
Conference on Artificial Intelligence.
[22] Zhang, Y., Jin, R. and Zhou, Z.-H. 2010. Understanding bag-of-words model: a statistical framework.
International Journal of Machine Learning and Cybernetics. 1(1-4), 43-52.
[23] Zhang, M., Zhang, Y. and Vo, D.-T. 2016. Gated neural networks for targeted sentiment analysis. in
Thirtieth AAAI Conference on Artificial Intelligence.
[24] Ma, Y., Peng, H. and Cambria, E. 2018. Targeted aspect-based sentiment analysis via embedding
commonsense knowledge into an attentive LSTM. in Thirty-second AAAI conference on artificial intelligence.
[25] Xu, H., et al., 2018. Double embeddings and cnn-based sequence labeling for aspect extraction. arXiv
preprint arXiv:1805.04601.
[26] Pennington, J., Socher, R. and Manning, C.D. 2014. Glove: Global vectors for word representation. in
Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP).
[27] Devlin, J., et al., 2018. Bert: Pre-training of deep bidirectional transformers for language
understanding. arXiv preprint arXiv:1810.04805.
[28] Kim, Y. 2014. Convolutional neural networks for sentence classification. arXiv preprint
arXiv:1408.5882.
[29] Bahdanau, D., Cho, K. and Bengio, Y. 2014. Neural machine translation by jointly learning to align and
translate. arXiv preprint arXiv:1409.0473.
[30] Cho, K., et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical
machine translation. arXiv preprint arXiv:1406.1078.
[31] Zhang, X., Zhao, J. and LeCun, Y. 2015. Character-level convolutional networks for text
classification. in Advances in neural information processing systems.
[32] Conneau, A., et al., 2016. Very deep convolutional networks for text classification. arXiv preprint
arXiv:1606.01781.
[33] Park, J.H. and Fung, P. 2017. One-step and two-step classification for abusive language detection on
twitter. arXiv preprint arXiv:1706.01206.
[34] Lai, S., et al., 2015. Recurrent convolutional neural networks for text classification. in
Twenty-ninth AAAI conference on artificial intelligence.
[41] Du, Y., et al., 2019. A novel capsule based hybrid neural network for sentiment classification. IEEE
Access. 7, 39321-39328.
[44] Zhao, W., et al., 2019. Towards scalable and reliable capsule networks for challenging NLP
applications. arXiv preprint arXiv:1906.02829.
[46] Yuan, Z., et al., 2018. Domain attention model for multi-domain sentiment classification.
Knowledge-Based Systems. 155, 1-10.