Deep Learning
https://doi.org/10.1007/s42979-021-00815-1
REVIEW ARTICLE
Received: 29 May 2021 / Accepted: 7 August 2021 / Published online: 18 August 2021
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021
Abstract
Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is nowadays considered
a core technology of today’s Fourth Industrial Revolution (4IR or Industry 4.0). Due to its ability to learn
from data, DL technology, which originated from the artificial neural network (ANN), has become a hot topic in
computing and is widely applied in areas such as healthcare, visual recognition, text analytics,
cybersecurity, and many more. However, building an appropriate DL model is a challenging task because of the
dynamic nature of, and variations in, real-world problems and data. Moreover, the lack of core understanding
turns DL methods into black-box machines that hamper development at the standard level. This article presents
a structured and comprehensive view of DL techniques, including a taxonomy that considers various types of
real-world tasks, such as supervised or unsupervised. In our taxonomy, we take into account deep networks for
supervised or discriminative learning, unsupervised or generative learning, as well as hybrid learning and
relevant others. We also summarize real-world application areas where deep learning techniques can be used.
Finally, we point out ten potential aspects for future generation DL modeling with research directions.
Overall, this article aims to draw a big picture of DL modeling that can be used as a reference guide for both
academia and industry professionals.
Keywords Deep learning · Artificial neural network · Artificial intelligence · Discriminative learning ·
Generative learning · Hybrid learning · Intelligent systems
SN Computer Science
SN Computer Science (2021) 2:420 Page 3 of 20 420
– Finally, we point out and discuss ten potential aspects with research directions for future generation DL
modeling in terms of conducting future research and system development.

This paper is organized as follows. Section “Why Deep Learning in Today's Research and Applications?”
motivates why deep learning is important to build data-driven intelligent systems. In Section “Deep Learning
Techniques and Applications”, we present our DL taxonomy by taking into account the variations of deep
learning tasks and how they are used in solving real-world issues, and briefly discuss the techniques,
summarizing the potential application areas. In Section “Research Directions and Future Aspects”, we discuss
various research issues of deep learning-based modeling and highlight the promising topics for future research
within the scope of our study. Finally, Section “Concluding Remarks” concludes this paper.

is therefore relevant to artificial intelligence [103], machine learning [97] and data science with advanced
analytics [95], which are well-known areas in computer science, particularly in today’s intelligent computing.
In the following, we first discuss the position of deep learning in AI, or how DL technology is related to
these areas of computing.

The Position of Deep Learning in AI

Nowadays, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are three popular terms
that are sometimes used interchangeably to describe systems or software that behaves intelligently. In Fig. 2,
we illustrate the position of deep learning, compared with machine learning and artificial intelligence.
According to Fig. 2, DL is a part
ments, video clips, time-series data, are some examples of sequential data.
– Image or 2D Data A digital image is made up of a matrix, which is a rectangular array of numbers, symbols,
or expressions arranged in rows and columns in a 2D array of numbers. Matrix, pixels, voxels, and bit depth
are the four essential characteristics or fundamental parameters of a digital image.
– Tabular Data A tabular dataset consists primarily of rows and columns. Thus tabular datasets contain data
in a columnar format as in a database table. Each column (field) must have a name and each column may only
contain data of the defined type. Overall, it is a logical and systematic arrangement of data in the form of
rows and columns that are based on data properties or features. Deep learning models can learn efficiently
on tabular data and allow us to build data-driven intelligent systems.

The above-discussed data forms are common in the real-world application areas of deep learning. Different
categories of DL techniques perform differently depending on the nature and characteristics of data,
discussed briefly in Section “Deep Learning Techniques and Applications” with a taxonomy presentation.
However, in many real-world application areas, the standard machine learning techniques, particularly
logic-rule or tree-based techniques [93, 101], perform significantly well, depending on the nature of the
application. Figure 3 also shows the performance comparison of DL and ML modeling considering the amount of
data. In the following, we highlight several cases where deep learning is useful to solve real-world
problems, according to our main focus in this paper.

Fig. 3 An illustration of the performance comparison between deep learning (DL) and other machine learning
(ML) algorithms, where DL modeling from large amounts of data can increase the performance

Fig. 4 A typical DL workflow to solve real-world problems, which consists of three sequential stages (i) data
understanding and preprocessing (ii) DL model building and training (iii) validation and interpretation

DL Properties and Dependencies

A DL model typically follows the same processing stages as machine learning modeling. In Fig. 4, we have
shown a deep learning workflow to solve real-world problems, which consists of three processing steps: data
understanding and preprocessing, DL model building and training, and validation and interpretation. However,
unlike the ML modeling [98, 108], feature extraction in the DL model is automated rather than manual.
K-nearest neighbor, support vector machines, decision tree, random forest, naive Bayes, linear regression,
association rules, and k-means clustering are some examples of machine learning techniques that are commonly
used in various application areas [97]. On the other hand, the DL model includes convolutional neural
network, recurrent neural network, autoencoder, deep belief network, and many more, discussed briefly with
their potential application areas in Section 3. In the following, we discuss the key properties and
dependencies of DL techniques that are
necessary to take into account before starting work on DL modeling for real-world applications.

– Data Dependencies Deep learning is typically dependent on a large amount of data to build a data-driven
model for a particular problem domain. The reason is that when the data volume is small, deep learning
algorithms often perform poorly [64]. In such circumstances, however, the performance of the standard
machine-learning algorithms will be improved if the specified rules are used [64, 107].
– Hardware Dependencies The DL algorithms require large computational operations while training a model with
large datasets. Because the advantage of a GPU over a CPU grows with the size of the computations, the GPU is
mostly used to optimize the operations efficiently. Thus, to work properly with the deep learning training,
GPU hardware is necessary. Therefore, DL relies more on high-performance machines with GPUs than standard
machine learning methods [19, 127].
– Feature Engineering Process Feature engineering is the process of extracting features (characteristics,
properties, and attributes) from raw data using domain knowledge. A fundamental distinction between DL and
other machine learning techniques is the attempt to extract high-level characteristics directly from data
[22, 97]. Thus, DL decreases the time and effort required to construct a feature extractor for each problem.
– Model Training and Execution time In general, training a deep learning algorithm takes a long time due to
a large number of parameters in the DL algorithm; thus, the model training process takes longer. For
instance, the DL models can take more than one week to complete a training session, whereas training with ML
algorithms takes relatively little time, only seconds to hours [107, 127]. During testing, deep learning
algorithms take extremely little time to run [127], when compared to certain machine learning methods.
– Black-box Perception and Interpretability Interpretability is an important factor when comparing DL with
ML. It’s difficult to explain how a deep learning result was obtained, i.e., “black-box”. On the other hand,
the machine-learning algorithms, particularly rule-based machine learning techniques [97], provide explicit
logic rules (IF-THEN) for making decisions that are easily interpretable for humans. For instance, in our
earlier works, we have presented several rule-based machine learning techniques [100, 102, 105], where the
extracted rules are human-understandable and easier to interpret, update or delete according to the target
applications.

The most significant distinction between deep learning and regular machine learning is how well it performs
when data grows exponentially. An illustration of the performance comparison between DL and standard ML
algorithms has been shown in Fig. 3, where DL modeling can increase the performance with the amount of data.
Thus, DL modeling is extremely useful when dealing with a large amount of data because of its capacity to
process vast amounts of features to build an effective data-driven model. In terms of developing and training
DL models, it relies on parallelized matrix and tensor operations as well as computing gradients and
optimization. Several DL libraries and resources [30] such as PyTorch [82] (with a high-level API called
Lightning) and TensorFlow [1] (which also offers Keras as a high-level API) offer these core utilities,
including many pre-trained models, as well as many other necessary functions for implementation and DL model
building.

Deep Learning Techniques and Applications

In this section, we go through the various types of deep neural network techniques, which typically consider
several layers of information-processing stages in hierarchical structures to learn. A typical deep neural
network contains multiple hidden layers including input and output layers. Figure 5 shows a general structure
of a deep neural network (hidden layer = N and N ≥ 2) compared with a shallow network (hidden layer = 1). We
also present our taxonomy on DL techniques based on how they are used to solve various problems, in this
section. However, before exploring the details of the DL techniques, it’s useful to review various types of
learning tasks such as (i) Supervised: a task-driven approach that uses labeled training data, (ii)
Unsupervised: a data-driven process that analyzes unlabeled datasets, (iii) Semi-supervised: a hybridization
of both the supervised and unsupervised methods, and (iv) Reinforcement: an environment-driven approach,
discussed briefly in our earlier paper [97]. Thus, to present our taxonomy, we divide DL techniques broadly
into three major categories: (i) deep networks for supervised or discriminative learning; (ii) deep networks
for unsupervised or generative learning; and (iii) deep networks for hybrid learning combining both and
relevant others, as shown in Fig. 6. In the following, we briefly discuss each of these techniques that can
be used to solve real-world problems in various application areas according to their learning capabilities.

Fig. 5 A general architecture of (a) a shallow network with one hidden layer and (b) a deep neural network
with multiple hidden layers

Fig. 6 A taxonomy of DL techniques, broadly divided into three major categories (i) deep networks for
supervised or discriminative learning, (ii) deep networks for unsupervised or generative learning, and (iii)
deep networks for hybrid learning and relevant others

Deep Networks for Supervised or Discriminative Learning

This category of DL techniques is utilized to provide a discriminative function in supervised or
classification applications. Discriminative deep architectures are typically designed to give discriminative
power for pattern
classification by describing the posterior distributions of classes conditioned on visible data [21].
Discriminative architectures mainly include Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN
or ConvNet), Recurrent Neural Networks (RNN), along with their variants. In the following, we briefly discuss
these techniques.

Multi-layer Perceptron (MLP)

Multi-layer Perceptron (MLP), a supervised learning approach [83], is a type of feedforward artificial neural
network (ANN). It is also known as the foundation architecture of deep neural networks (DNN) or deep
learning. A typical MLP is a fully connected network that consists of an input layer that receives input
data, an output layer that makes a decision or prediction about the input signal, and one or more hidden
layers between these two that are considered as the network’s computational engine [36, 103]. The output of
an MLP network is determined using a variety of activation functions, also known as transfer functions, such
as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [83, 96]. For training, MLP employs the most
extensively
used algorithm “Backpropagation” [36], a supervised learning technique, which is also known as the most basic
building block of a neural network. During the training process, various optimization approaches such as
Stochastic Gradient Descent (SGD), Limited Memory BFGS (L-BFGS), and Adaptive Moment Estimation (Adam) are
applied. MLP requires tuning of several hyperparameters such as the number of hidden layers, neurons, and
iterations, which could make solving a complicated model computationally expensive. However, through partial
fit, MLP offers the advantage of learning non-linear models in real-time or online [83].

Convolutional Neural Network (CNN or ConvNet)

The Convolutional Neural Network (CNN or ConvNet) [65] is a popular discriminative deep learning architecture
that learns directly from the input without the need for human feature extraction. Figure 7 shows an example
of a CNN including multiple convolutions and pooling layers. As a result, the CNN enhances the design of
traditional ANN like regularized MLP networks. Each layer in CNN takes into account optimum parameters for a
meaningful output as well as reduces model complexity. CNN also uses a ‘dropout’ [30] that can deal with the
problem of over-fitting, which may occur in a traditional network.

Fig. 7 An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and
pooling layers

CNNs are specifically intended to deal with a variety of 2D shapes and are thus widely employed in visual
recognition, medical image analysis, image segmentation, natural language processing, and many more [65, 96].
The capability of automatically discovering essential features from the input without the need for human
intervention makes it more powerful than a traditional network. Several variants of CNN exist in the area,
including visual geometry group (VGG) [38], AlexNet [62], Xception [17], Inception [116], ResNet [39], etc.,
that can be used in various application domains according to their learning capabilities.

Recurrent Neural Network (RNN) and its Variants

A Recurrent Neural Network (RNN) is another popular neural network, which employs sequential or time-series
data and feeds the output from the previous step as input to the current stage [27, 74]. Like feedforward
networks and CNNs, recurrent networks learn from training input; however, they are distinguished by their
“memory”, which allows them to impact current input and output through using information from previous
inputs. Unlike typical DNN, which assumes that inputs and outputs are independent of one another, the output
of RNN is reliant on prior elements within the sequence. However, standard recurrent networks have the issue
of vanishing gradients, which makes learning long data sequences challenging. In the following, we discuss
several popular variants of the recurrent network that minimize the issues and perform well in many
real-world application domains.

tion into and out of the cell is managed by three gates. For instance, the ‘Forget Gate’ determines what
information from the previous state cell will be memorized and what information will be removed that is no
longer useful, while the ‘Input Gate’ determines which information should enter the cell state and the
‘Output Gate’ determines and controls the outputs. As it solves the issues of training a recurrent network,
the LSTM network is considered one of the most successful RNNs.
– Bidirectional RNN/LSTM Bidirectional RNNs connect two hidden layers that run in opposite directions to a
single output, allowing them to accept data from both the past and future. Bidirectional RNNs, unlike
traditional recurrent networks, are trained to predict both positive and negative time directions at the same
time. A Bidirectional LSTM, often known as a BiLSTM, is an extension of the standard LSTM that can increase
model performance on sequence classification issues [113]. It is a sequence processing model comprising two
LSTMs: one takes the input forward and the other takes it backward. Bidirectional LSTM in particular is a
popular choice in natural language processing tasks.
– Gated recurrent units (GRUs) A Gated Recurrent Unit (GRU) is another popular variant of the recurrent
network that uses gating methods to control and manage information flow between cells in the neural network,
introduced by Cho et al. [16]. The GRU is like an LSTM but has fewer parameters, as it has a reset gate and
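
The three LSTM gates described above can be made concrete with a minimal sketch of a single LSTM time step. It is an illustration only, assuming the standard LSTM formulation; the function names `lstm_step` and `affine`, the layout of the `params` dictionary and the plain-list representation are hypothetical choices made here, and a practical model would normally rely on a tensor library such as PyTorch or TensorFlow.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affine(weights, bias, vec):
    """Return weights @ vec + bias for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step on plain Python lists (illustrative layout).

    params maps "forget", "input", "candidate" and "output" to a
    (weights, bias) pair acting on the concatenation [h_prev, x].
    """
    z = h_prev + x  # list concatenation of previous hidden state and input
    f = [sigmoid(a) for a in affine(*params["forget"], z)]       # what to keep
    i = [sigmoid(a) for a in affine(*params["input"], z)]        # what to write
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]  # new content
    o = [sigmoid(a) for a in affine(*params["output"], z)]       # what to expose
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With all weights set to zero, a large positive forget bias and a large negative input bias, the cell state is carried over unchanged, which is the "memory" effect that the gates are meant to provide.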
alignment of the latent feature space [66]. Inverse models, such as Bidirectional GAN (BiGAN) [25] can also
learn a mapping from data to the latent space, similar to how the standard GAN model learns a mapping from a
latent space to the data distribution. The potential application areas of GAN networks are healthcare, image
analysis, data augmentation, video generation, voice generation, pandemics, traffic control, cybersecurity,
and many more, which are increasing rapidly. Overall, GANs have established themselves as a comprehensive
domain of independent data expansion and as a solution to problems requiring a generative solution.

Auto-Encoder (AE) and Its Variants

An auto-encoder (AE) [31] is a popular unsupervised learning technique in which neural networks are used to
learn representations. Typically, auto-encoders are used to work with high-dimensional data, and
dimensionality reduction explains how a set of data is represented. Encoder, code, and decoder are the three
parts of an autoencoder. The encoder compresses the input and generates the code, which the decoder
subsequently uses to reconstruct the input. The AEs have recently been used to learn generative data models
[69]. The auto-encoder is widely used in many unsupervised learning tasks, e.g., dimensionality reduction,
feature extraction, efficient coding, generative modeling, denoising, anomaly or outlier detection, etc.
[31, 132]. Principal component analysis (PCA) [99], which is also used to reduce the dimensionality of huge
data sets, is essentially similar to a single-layered AE with a linear activation function. Regularized
autoencoders such as sparse, denoising, and contractive are useful for learning representations for later
classification tasks [119], while variational autoencoders can be used as generative models [56], discussed
below.
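
The encoder, code and decoder roles described above can be sketched with a very small linear autoencoder. The example is only an illustration under simplifying assumptions: a 2-D input, a single code unit, linear activations, plain gradient descent, and hypothetical names (`reconstruct`, `mse`, `train`) and hyperparameters chosen here rather than taken from the article. Because the activations are linear, it also reflects the PCA analogy: on data lying along one direction, the decoder weights settle on that direction.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reconstruct(x, w_enc, w_dec):
    code = dot(w_enc, x)              # encoder: 2-D input -> 1-D code
    return [w * code for w in w_dec]  # decoder: 1-D code -> 2-D reconstruction

def mse(data, w_enc, w_dec):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = reconstruct(x, w_enc, w_dec)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train(data, steps=200, lr=0.05):
    """Fit encoder and decoder weights by gradient descent on the MSE."""
    w_enc, w_dec = [0.1, 0.1], [0.1, 0.1]  # small, assumed initial weights
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            code = dot(w_enc, x)
            err = [w * code - xi for w, xi in zip(w_dec, x)]
            back = dot(w_dec, err)  # error signal reaching the code unit
            for k in range(2):
                g_dec[k] += 2 * err[k] * code / n
                g_enc[k] += 2 * back * x[k] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Toy data lying on the line x2 = 2 * x1, so one code unit is enough.
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_enc, w_dec = train(data)
```

Comparing `mse(data, [0.1, 0.1], [0.1, 0.1])` with `mse(data, w_enc, w_dec)` shows how much of the input the one-dimensional code preserves after training.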
coders which map the input onto a latent vector, map the input data into the parameters of a probability
distribution, such as the mean and variance of a Gaussian distribution. A VAE assumes that the source data
has an underlying probability distribution and then tries to discover the distribution’s parameters. Although
this approach was initially designed for unsupervised learning, its use has been demonstrated in other
domains such as semi-supervised learning [128] and supervised learning [51].

Although the earlier concept of AE was typically for dimensionality reduction or feature learning, as
mentioned above, AEs have recently been brought to the forefront of generative modeling, even though the
generative adversarial network is one of the popular methods in the area. The AEs have been effectively
employed in a variety of domains, including healthcare, computer vision, speech recognition, cybersecurity,
natural language processing, and many more. Overall, we can conclude that auto-encoder and its variants can
play a significant role in unsupervised feature learning with neural network architecture.

A Restricted Boltzmann Machine (RBM) [75] is also a generative stochastic neural network capable of learning
a probability distribution across its inputs. Boltzmann machines typically consist of visible and hidden
nodes and each node is connected to every other node, which helps us understand irregularities by learning
how the system works in normal circumstances. RBMs are a subset of Boltzmann machines that have a limit on
the number of connections between the visible and hidden layers [77]. This restriction permits training
algorithms like the gradient-based contrastive divergence algorithm to be more efficient than those for
Boltzmann machines in general [41]. RBMs have found applications in dimensionality reduction,
classification, regression, collaborative filtering, feature learning, topic modeling, and many others. In
the area of deep learning modeling, they can be trained either supervised or unsupervised, depending on the
task. Overall, the RBMs can recognize patterns in data automatically and develop probabilistic or stochastic
models, which are utilized for feature selection or extraction, as well as forming a deep belief network.

Deep Belief Network (DBN)
of data through exploratory analysis. As a result, these deep generative networks can be utilized as
preprocessing for supervised or discriminative learning tasks, as well as ensuring model accuracy, where
unsupervised representation learning can allow for improved classifier generalization.

Deep Networks for Hybrid Learning and Other Approaches

In addition to the above-discussed deep learning categories, hybrid deep networks and several other
approaches such as deep transfer learning (DTL) and deep reinforcement learning (DRL) are popular, which are
discussed in the following.

Hybrid Deep Neural Networks

Generative models are adaptable, with the capacity to learn from both labeled and unlabeled data.
Discriminative models, on the other hand, are unable to learn from unlabeled data yet outperform their
generative counterparts in supervised tasks. A framework for training both deep generative and discriminative
models simultaneously can enjoy the benefits of both models, which motivates hybrid networks. Hybrid deep
learning models are typically composed of multiple (two or more) deep basic learning models, where the basic
model is a discriminative or generative deep learning model discussed earlier. Based on the integration of
different basic generative or discriminative models, the below three categories of hybrid deep learning
models might be useful for solving real-world problems. These are as follows:

– Hybrid Model_1: An integration of different generative or discriminative models to extract more meaningful
and robust features. Examples could be CNN+LSTM, AE+GAN, and so on.
– Hybrid Model_2: An integration of generative model followed by a discriminative model. Examples could be
DBN+MLP, GAN+CNN, AE+CNN, and so on.
– Hybrid Model_3: An integration of generative or discriminative model followed by a non-deep learning
classifier. Examples could be AE+SVM, CNN+SVM, and so on.

Thus, in a broad sense, we can conclude that hybrid models can be either classification-focused or
non-classification depending on the target use. However, most of the hybrid learning-related studies in the
area of deep learning are classification-focused or supervised learning tasks, summarized in Table 1. The
unsupervised generative models with meaningful representations are employed to enhance the discriminative
models. The generative models with useful representation can provide more informative and low-dimensional
features for discrimination, and they can also help enhance the training data quality and quantity,
providing additional information for classification.

Deep Transfer Learning (DTL)

Transfer Learning is a technique for effectively using previously learned model knowledge to solve a new task
with minimum training or fine-tuning. In comparison to typical machine learning techniques [97], DL takes a
large amount of training data. As a result, the need for a substantial volume of labeled data is a
significant barrier to addressing some essential domain-specific tasks, particularly in the medical sector,
where creating large-scale, high-quality annotated medical or health datasets is both difficult and costly.
Furthermore, the standard DL model demands a lot of computational resources, such as a GPU-enabled server,
even though researchers are working hard to improve it. As a result, Deep Transfer Learning (DTL), a DL-based
transfer learning method, might be helpful to address this issue. Figure 11 shows a general structure of the
transfer learning process, where knowledge from the pre-trained model is transferred into a new DL model.
It’s especially popular in deep learning right now since it allows deep neural networks to be trained with
very little data [126].

Transfer learning is a two-stage approach for training a DL model that consists of a pre-training step and a
fine-tuning step in which the model is trained on the target task. Since deep neural networks have gained
popularity in a variety of fields, a large number of DTL methods have been presented, making it crucial to
categorize and summarize them. Based on the techniques used in the literature, DTL can be classified into
four categories [117]. These are (i) instances-based deep transfer learning that utilizes instances in the
source domain with appropriate weights, (ii) mapping-based deep transfer learning that maps instances from
two domains into a new data space with better similarity, (iii) network-based deep transfer learning that
reuses part of the network pre-trained in the source domain, and (iv) adversarial-based deep transfer
learning that uses adversarial technology to find transferable features that are suitable for both domains.
Due to its high effectiveness and practicality, adversarial-based deep transfer learning has exploded in
popularity in recent years. Transfer learning can also be classified into inductive, transductive, and
unsupervised transfer learning depending on the circumstances between the source and target domains and
activities [81]. While most current research focuses on supervised learning, how deep neural networks can
transfer knowledge in unsupervised or semi-supervised learning may gain further interest in the future. DTL
techniques are useful in a variety of fields including natural language processing, sentiment
classification, visual recognition, speech recognition, spam filtering, and relevant others.
Table 1 A summary of deep learning tasks and methods in several popular real-world application areas

Application area | Task | Method | Reference
Healthcare and Medical applications | Regular health factors analysis | CNN-based | Ismail et al. [48]
 | Identifying malicious behaviors | RNN-based | Xue et al. [129]
 | Coronary heart disease risk prediction | Autoencoder based | Amarbayasgalan et al. [6]
 | Cancer classification | Transfer learning based | Sevakula et al. [110]
 | Diagnosis of COVID-19 | CNN and BiLSTM based | Aslan et al. [10]
 | Detection of COVID-19 | CNN-LSTM based | Islam et al. [47]
Natural Language Processing | Text summarization | Auto-encoder based | Yousef et al. [130]
 | Sentiment analysis | CNN-LSTM based | Wang et al. [120]
 | Sentiment analysis | CNN and Bi-LSTM based | Minaee et al. [78]
 | Aspect-level sentiment classification | Attention-based LSTM | Wang et al. [124]
Speech recognition | Distant speech recognition | Attention-based LSTM | Zhang et al. [135]
 | Speech emotion classification | Transfer learning based | Latif et al. [63]
 | Emotion recognition from speech | CNN and LSTM based | Satt et al. [109]
Cybersecurity | Zero-day malware detection | Autoencoders and GAN based | Kim et al. [54]
 | Security incidents and fraud analysis | SOM-based | Lopez et al. [70]
 | Android malware detection | Autoencoder and CNN based | Wang et al. [122]
 | Intrusion detection classification | DBN-based | Wei et al. [125]
 | DoS attack detection | RBM-based | Imamverdiyev et al. [46]
 | Suspicious flow detection | Hybrid deep-learning-based | Garg et al. [29]
 | Network intrusion detection | AE and SVM based | Al et al. [4]
IoT and Smart cities | Smart energy management | CNN and Attention mechanism | Abdel et al. [2]
 | Particulate matter forecasting | CNN-LSTM based | Huang et al. [43]
 | Smart parking system | CNN-LSTM based | Piccialli et al. [85]
 | Disaster management | DNN-based | Aqib et al. [8]
 | Air quality prediction | LSTM-RNN based | Kok et al. [61]
 | Cybersecurity in smart cities | RBM, DBN, RNN, CNN, GAN | Chen et al. [15]
Smart Agriculture | A smart agriculture IoT system | RL-based | Bu et al. [11]
 | Plant disease detection | CNN-based | Ale et al. [5]
 | Automated soil quality evaluation | DNN-based | Sumathi et al. [115]
Business and Financial Services | Predicting customers’ purchase behavior | DNN based | Chaudhuri [14]
 | Stock trend prediction | CNN and LSTM based | Anuradha et al. [7]
 | Financial loan default prediction | CNN-based | Deng et al. [23]
 | Power consumption forecasting | LSTM-based | Shao et al. [112]
Virtual Assistant and Chatbot Services | An intelligent chatbot | Bi-RNN and Attention model | Dhyani et al. [24]
 | Virtual listener agent | GRU and LSTM based | Huang et al. [44]
 | Smart blind assistant | CNN-based | Rahman et al. [88]
Object Detection and Recognition | Object detection in X-ray images | CNN-based | Gu et al. [35]
 | Object detection for disaster response | CNN-based | Pi et al. [84]
 | Medicine recognition system | CNN-based | Chang et al. [12]
 | Face recognition in IoT-cloud environment | CNN-based | Masud et al. [76]
Fig. 13 Several potential real-world application areas of deep learning
various deep learning tasks and techniques that are used to solve the relevant tasks in several real-world application areas. Overall, from Fig. 13 and Table 1, we can conclude that the future prospects of deep learning modeling in real-world application areas are huge and there is considerable scope for further work. In the next section, we also summarize the research issues in deep learning modeling and point out the potential aspects for future generation DL modeling.
Research Directions and Future Aspects
While existing methods have established a solid foundation for deep learning systems and research, this section outlines the following ten potential future research directions based on our study.
– Automation in Data Annotation: According to the existing literature, discussed in Section 3, most deep learning models are trained on publicly available datasets that are annotated. However, to build a system for a new problem domain or a recent data-driven system, raw data must be collected from relevant sources. Thus, data annotation, e.g., categorization, tagging, or labeling of a large amount of raw data, is important for building discriminative deep learning models or supervised tasks, which is challenging. A technique capable of automatic and dynamic data annotation, rather than manual annotation or hiring annotators, particularly for large datasets, could be more effective for supervised learning as well as minimizing human effort. Therefore, a more in-depth investigation of data collection and annotation methods, or designing an unsupervised learning-based solution, could be one of the primary research directions in the area of deep learning modeling.
– Data Preparation for Ensuring Data Quality: As discussed earlier throughout the paper, deep learning algorithms are highly affected by the quality and availability of the training data, and so is the resultant model for a particular problem domain. Thus, deep learning models may become worthless or yield decreased accuracy if the training data is bad, e.g., sparse, non-representative, poor-quality, ambiguous, noisy, imbalanced, inconsistent, insufficient in quantity, or containing irrelevant features. Consequently, such issues in data can lead to poor processing and inaccurate findings, which is a major problem when discovering insights from data. Thus, deep learning models also need to adapt to such rising issues in data, to capture approximated information from observations. Therefore, effective data pre-processing techniques need to be designed according to the nature of the data problem and its characteristics to handle such emerging challenges, which could be another research direction in the area.
– Black-box Perception and Proper DL/ML Algorithm Selection: In general, it is difficult to explain how a deep learning result is obtained or how a particular model reaches its final decisions. Although DL models achieve significant performance while learning from large datasets, as discussed in Section 2, this “black-box” perception of DL modeling typically represents weak statistical interpretability that could be a major issue in the area. On the other hand, ML algorithms, particularly rule-based machine learning techniques, provide explicit logic rules (IF-THEN) for making decisions that are easier to interpret, update, or delete according to the target applications [97, 100, 105]. If the wrong learning algorithm is chosen, unanticipated results may occur, resulting in a loss of effort as well as of the model’s efficacy and accuracy. Thus, taking into account the performance, complexity, model accuracy, and applicability, selecting an appropriate model for the target application is challenging, and in-depth analysis is needed for better understanding and decision making.
– Deep Networks for Supervised or Discriminative Learning: According to our designed taxonomy of deep learning techniques, as shown in Fig. 6, discriminative architectures mainly include MLP, CNN, and RNN, along with their variants that are applied widely in various application domains. However, designing new techniques or variants of such discriminative techniques by taking into account model optimization, accuracy, and applicability, according to the target real-world application and the nature of the data, could be a novel contribution, which can also be considered as a major future aspect in the area of supervised or discriminative learning.
– Deep Networks for Unsupervised or Generative Learning: As discussed in Section 3, unsupervised learning or generative deep learning modeling is one of the major tasks in the area, as it allows us to characterize the high-order correlation properties or features in data, or to generate a new representation of data through exploratory analysis. Moreover, unlike supervised learning [97], it does not require labeled data due to its capability to derive insights directly from the data as well as data-driven decision making. Consequently, it can be used as preprocessing for supervised learning or discriminative modeling as well as semi-supervised learning tasks, which ensure learning accuracy and model efficiency. According to our designed taxonomy of deep learning techniques, as shown in Fig. 6, generative techniques mainly include GAN, AE, SOM, RBM, DBN, and their variants. Thus, designing new techniques or their variants for effective data modeling or representation according to the target real-world application could be a novel contribution, which can also be considered as a major future aspect in the area of unsupervised or generative learning.
– Hybrid/Ensemble Modeling and Uncertainty Handling: According to our designed taxonomy of DL techniques, as shown in Fig. 6, this is considered as another major category in deep learning tasks. As hybrid modeling enjoys the benefits of both generative and discriminative learning, an effective hybridization can outperform others in terms of performance as well as uncertainty handling in high-risk applications. In Section 3, we have summarized various types of hybridization, e.g., AE+CNN/SVM. Since a group of neural networks is trained with distinct parameters or with separate sub-sampling training datasets, hybridization or ensembles of such techniques, i.e., DL with DL/ML, can play a key role in the area. Thus, designing effective blended discriminative and generative models accordingly, rather than a naive method, could be an important research opportunity to solve various real-world
issues including semi-supervised learning tasks and model uncertainty.
– Dynamism in Selecting Threshold/Hyper-parameter Values, and Network Structures with Computational Efficiency: In general, the relationship among performance, model complexity, and computational requirements is a key issue in deep learning modeling and applications. A combination of algorithmic advancements with improved accuracy as well as maintaining computational efficiency, i.e., achieving the maximum throughput while consuming the least amount of resources, without significant information loss, can lead to a breakthrough in the effectiveness of deep learning modeling in future real-world applications. The concept of incremental approaches or recency-based learning [100] might be effective in several cases depending on the nature of target applications. Moreover, assuming network structures with a static number of nodes and layers, fixed hyper-parameter values or threshold settings, or selecting them by a trial-and-error process may not be effective in many cases, as the appropriate settings can change with changes in the data. Thus, a data-driven approach to select them dynamically could be more effective while building a deep learning model in terms of both performance and real-world applicability. Such data-driven automation can lead to future generation deep learning modeling with additional intelligence, which could be a significant future aspect in the
area as well as an important research direction to contribute.
– Lightweight Deep Learning Modeling for Next-Generation Smart Devices and Applications: In recent years, the Internet of Things (IoT), consisting of billions of intelligent and communicating things, and mobile communications technologies have become popular to detect and gather human and environmental information (e.g., geo-information, weather data, bio-data, human behaviors, and so on) for a variety of intelligent services and applications. Every day, these ubiquitous smart things or devices generate large amounts of data, requiring rapid data processing on a variety of smart mobile devices [72]. Deep learning technologies can be incorporated to discover underlying properties and to effectively handle such large amounts of sensor data for a variety of IoT applications including health monitoring and disease analysis, smart cities, traffic flow prediction and monitoring, smart transportation, manufacture inspection, fault assessment, smart industry or Industry 4.0, and many more. Although the deep learning techniques discussed in Section 3 are considered powerful tools for processing big data, lightweight modeling is important for resource-constrained devices, because of the high computational cost and considerable memory overhead of these techniques. Thus several techniques such as optimization, simplification, compression, pruning, generalization, important feature extraction, etc. might be helpful in several cases. Therefore, constructing lightweight deep learning techniques based on a baseline network architecture to adapt the DL model for next-generation mobile, IoT, or resource-constrained devices and applications could be considered as a significant future aspect in the area.
– Incorporating Domain Knowledge into Deep Learning Modeling: Domain knowledge, as opposed to general knowledge or domain-independent knowledge, is knowledge of a specific, specialized topic or field. For instance, in terms of natural language processing, the properties of the English language typically differ from other languages like Bengali, Arabic, French, etc. Thus integrating domain-based constraints into the deep learning model could produce better results for such a particular purpose. For instance, a task-specific feature extractor considering domain knowledge in smart manufacturing for fault diagnosis can resolve the issues in traditional deep-learning-based methods [28]. Similarly, domain knowledge in medical image analysis [58], financial sentiment analysis [49], cybersecurity analytics [94, 103], as well as a conceptual data model in which semantic information (i.e., meaningful for a system, rather than merely correlational) [45, 121, 131] is included, can play a vital role in the area. Transfer learning could be an effective way to get started on a new challenge with domain knowledge. Moreover, contextual information such as spatial, temporal, social, and environmental contexts [92, 104, 108] can also play an important role in incorporating context-aware computing with domain knowledge for smart decision making as well as building adaptive and intelligent context-aware systems. Therefore understanding domain knowledge and effectively incorporating it into the deep learning model could be another research direction.
– Designing General Deep Learning Framework for Target Application Domains: One promising research direction for deep learning-based solutions is to develop a general framework that can handle data diversity, dimensions, stimulation types, etc. The general framework would require two key capabilities: the attention mechanism that focuses on the most valuable parts of input signals, and the ability to capture latent features that enables the framework to capture the distinctive and informative features. Attention models have been a popular research topic because of their intuition, versatility, and interpretability, and are employed in various
application areas like computer vision, natural language processing, text or image classification, sentiment analysis, recommender systems, user profiling, etc. [13, 80]. An attention mechanism can be implemented based on learning algorithms such as reinforcement learning that is capable of finding the most useful part through a policy search [133, 134]. Similarly, CNN can be integrated with suitable attention mechanisms to form a general classification framework, where CNN can be used as a feature learning tool for capturing features in various levels and ranges. Thus, designing a general deep learning framework considering attention as well as latent features for target application domains could be another area to contribute.
To summarize, deep learning is a fairly open topic to which academics can contribute by developing new methods or improving existing methods to handle the above-mentioned concerns and tackle real-world problems in a variety of application areas. This can also help researchers conduct a thorough analysis of the application’s hidden and unexpected challenges to produce more reliable and realistic outcomes. Overall, we can conclude that addressing the above-mentioned issues and contributing to proposing effective and efficient techniques could lead to “Future Generation DL” modeling as well as more intelligent and automated applications.
Concluding Remarks
In this article, we have presented a structured and comprehensive view of deep learning technology, which is considered a core part of artificial intelligence as well as data science. It starts with a history of artificial neural networks and
moves to recent deep learning techniques and breakthroughs in different applications. Then, the key algorithms in this area, as well as deep neural network modeling in various dimensions, are explored. For this, we have also presented a taxonomy considering the variations of deep learning tasks and how they are used for different purposes. In our comprehensive study, we have taken into account not only the deep networks for supervised or discriminative learning but also the deep networks for unsupervised or generative learning, and hybrid learning that can be used to solve a variety of real-world issues according to the nature of problems.
Deep learning, unlike traditional machine learning and data mining algorithms, can produce extremely high-level data representations from enormous amounts of raw data. As a result, it has provided an excellent solution to a variety of real-world problems. A successful deep learning technique must possess the relevant data-driven modeling depending on the characteristics of raw data. The sophisticated learning algorithms then need to be trained through the collected data and knowledge related to the target application before the system can assist with intelligent decision-making. Deep learning has been shown to be useful in a wide range of applications and research areas such as healthcare, sentiment analysis, visual recognition, business intelligence, cybersecurity, and many more that are summarized in the paper.
Finally, we have summarized and discussed the challenges faced and the potential research directions and future aspects in the area. Although deep learning is considered a black-box solution for many applications due to its poor reasoning and interpretability, addressing the challenges or future aspects that are identified could lead to future generation deep learning modeling and smarter systems. This can also help researchers with in-depth analysis to produce more reliable and realistic outcomes. Overall, we believe that our study on neural networks and deep learning-based advanced analytics points in a promising path and can be utilized as a reference guide for future research and implementations in relevant application domains by both academic and industry professionals.
Declarations
Conflict of interest The author declares no conflict of interest.
References
1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on operating systems design and implementation (OSDI 16), 2016; p. 265–283.
2. Abdel-Basset M, Hawash H, Chakrabortty RK, Ryan M. Energy-net: a deep learning approach for smart energy management in iot-based smart cities. IEEE Internet of Things J. 2021.
3. Aggarwal A, Mittal M, Battineni G. Generative adversarial network: an overview of theory and applications. Int J Inf Manag Data Insights. 2021; p. 100004.
4. Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K. Deep learning approach combining sparse autoencoder with svm for network intrusion detection. IEEE Access. 2018;6:52843–56.
5. Ale L, Sheta A, Li L, Wang Y, Zhang N. Deep learning based plant disease detection for smart agriculture. In: 2019 IEEE Globecom Workshops (GC Wkshps), 2019; p.
1–6. IEEE.
6. Amarbayasgalan T, Lee JY, Kim KR, Ryu KH. Deep autoencoder based neural networks for coronary heart disease risk prediction. In: Heterogeneous data management, polystores, and analytics for healthcare. Springer; 2019. p. 237–48.
7. Anuradha J, et al. Big data based stock trend prediction using deep cnn with reinforcement-lstm model. Int J Syst Assur Eng Manag. 2021; p. 1–11.
8. Aqib M, Mehmood R, Albeshri A, Alzahrani A. Disaster management in smart cities by forecasting traffic plan using deep learning and gpus. In: International Conference on smart cities, infrastructure, technologies and applications. Springer; 2017. p. 139–54.
9. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.
10. Aslan MF, Unlersen MF, Sabanci K, Durdu A. Cnn-based transfer learning-bilstm network: a novel approach for covid-19 infection detection. Appl Soft Comput. 2021;98:106912.
11. Bu F, Wang X. A smart agriculture iot system based on deep reinforcement learning. Futur Gener Comput Syst. 2019;99:500–7.
12. Chang W-J, Chen L-B, Hsu C-H, Lin C-P, Yang T-C. A deep learning-based intelligent medicine recognition system for chronic patients. IEEE Access. 2019;7:44441–58.
13. Chaudhari S, Mithal V, Polatkan Gu, Ramanath R. An attentive survey of attention models. arXiv preprint arXiv:1904.02874, 2019.
14. Chaudhuri N, Gupta G, Vamsi V, Bose I. On the platform but will they buy? predicting customers’ purchase behavior using deep learning. Decis Support Syst. 2021; p. 113622.
15. Chen D, Wawrzynski P, Lv Z. Cyber security in smart cities: a review of deep learning-based applications and case studies. Sustain Cities Soc. 2020; p. 102655.
16. Cho K, Van MB, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using rnn encoder decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
17. Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; p. 1251–258.
18. Chung J, Gulcehre C, Cho KH, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
19. Coelho IM, Coelho VN, da Eduardo J, Luz S, Ochi LS, Guimarães FG, Rios E. A gpu deep learning metaheuristic based model for time series forecasting. Appl Energy. 2017;201:412–8.
20. Da'u A, Salim N. Recommendation system based on deep learning methods: a systematic review and new directions. Artif Intel Rev. 2020;53(4):2709–48.
21. Deng L. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process. 2014; p. 3.
22. Deng L, Dong Yu. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.
23. Deng S, Li R, Jin Y, He H. Cnn-based feature cross and classifier for loan default prediction. In: 2020 International Conference on image, video processing and artificial intelligence, volume 11584, page 115841K. International Society for Optics and Photonics, 2020.
24. Dhyani M, Kumar R. An intelligent chatbot using deep learning with bidirectional rnn and attention model. Mater Today Proc. 2021;34:817–24.
25. Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
26. Du K-L, Swamy MNS. Neural networks and statistical learning. Berlin: Springer Science & Business Media; 2013.
27. Dupond S. A thorough review on the current advance of neural network structures. Annu Rev Control. 2019;14:200–30.
28. Feng J, Yao Y, Lu S, Liu Y. Domain knowledge-based deep broad learning framework for fault diagnosis. IEEE Trans Ind Electron. 2020;68(4):3454–64.
29. Garg S, Kaur K, Kumar N, Rodrigues JJPC. Hybrid deep learning-based anomaly detection scheme for suspicious flow detection in sdn: a social multimedia perspective. IEEE Trans Multimed. 2019;21(3):566–78.
30. Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media; 2019.
31. Goodfellow I, Bengio Y, Courville A. Deep learning, vol. 1. Cambridge: MIT Press; 2016.
32. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014; p. 2672–680.
33. Google trends. 2021. https://trends.google.com/trends/.
34. Gruber N, Jockisch A. Are gru cells more specific and lstm cells more sensitive in motive classification of text? Front Artif Intell. 2020;3:40.
35. Gu B, Ge R, Chen Y, Luo L, Coatrieux G. Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks. IEEE Trans Ind Electron. 2020.
36. Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
37. Haykin S. Neural networks and learning machines, 3/E. London: Pearson Education; 2010.
38. He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.
39. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2016; p. 770–78.
40. Hinton GE. Deep belief networks. Scholarpedia. 2009;4(5):5947.
41. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
42. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
43. Huang C-J, Kuo P-H. A deep cnn-lstm model for particulate matter (pm2.5) forecasting in smart cities. Sensors. 2018;18(7):2220.
44. Huang H-H, Fukuda M, Nishida T. Toward rnn based micro non-verbal behavior generation for virtual listener agents. In: International Conference on human-computer interaction, 2019; p. 53–63. Springer.
45. Hulsebos M, Hu K, Bakker M, Zgraggen E, Satyanarayan A, Kraska T, Demiralp Ça, Hidalgo C. Sherlock: a deep learning approach to semantic data type detection. In: Proceedings of the 25th ACM SIGKDD International Conference on knowledge discovery & data mining, 2019; p. 1500–508.
46. Imamverdiyev Y, Abdullayeva F. Deep learning method for denial of service attack detection based on restricted Boltzmann machine. Big Data. 2018;6(2):159–69.
47. Islam MZ, Islam MM, Asraf A. A combined deep cnn-lstm network for the detection of novel coronavirus (covid-19) using x-ray images. Inf Med Unlock. 2020;20:100412.
48. Ismail WN, Hassan MM, Alsalamah HA, Fortino G. Cnn-based health model for regular health factors analysis in internet-of-medical things environment. IEEE Access. 2020;8:52541–9.
49. Jangid H, Singhal S, Shah RR, Zimmermann R. Aspect-based financial sentiment analysis using deep learning. In: Companion Proceedings of the The Web Conference 2018, 2018; p. 1961–966.
50. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
51. Kameoka H, Li L, Inoue S, Makino S. Supervised determined source separation with multichannel variational autoencoder. Neural Comput. 2019;31(9):1891–914.
52. Karhunen J, Raiko T, Cho KH. Unsupervised deep learning: a short review. In: Advances in independent component analysis and learning machines. 2015; p. 125–42.
53. Kawde P, Verma GK. Deep belief network based affect recognition from physiological signals. In: 2017 4th IEEE Uttar Pradesh Section International Conference on electrical, computer and electronics (UPCON), 2017; p. 587–92. IEEE.
54. Kim J-Y, Seok-Jun B, Cho S-B. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Inf Sci. 2018;460:83–102.
55. Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
56. Kingma DP, Welling M. An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691, 2019.
57. Kiran PKR, Bhasker B. Dnnrec: a novel deep learning based hybrid recommender system. Expert Syst Appl. 2020.
58. Kloenne M, Niehaus S, Lampe L, Merola A, Reinelt J, Roeder I, Scherf N. Domain-specific cues improve robustness of deep learning-based segmentation of ct volumes. Sci Rep. 2020;10(1):1–9.
59. Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
60. Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65.
61. Kök İ, Şimşek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: 2017 IEEE International Conference on Big Data (Big Data), 2017; p. 1983–990. IEEE.
62. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012; p. 1097–105.
63. Latif S, Rana R, Younis S, Qadir J, Epps J. Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353, 2018.
64. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
65. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
66. Li B, François-Lavet V, Doan T, Pineau J. Domain adversarial reinforcement learning. arXiv preprint arXiv:2102.07097, 2021.
67. Li T-HS, Kuo P-H, Tsai T-N, Luan P-C. Cnn and lstm based facial expression analysis model for a humanoid robot. IEEE Access. 2019;7:93998–4011.
68. Liu C, Cao Y, Luo Y, Chen G, Vokkarane V, Yunsheng M, Chen S, Hou P. A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure. IEEE Trans Serv Comput. 2017;11(2):249–61.
69. Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11–26.
70. López AU, Mateo F, Navío-Marco J, Martínez-Martínez JM, Gómez-Sanchís J, Vila-Francés J, Serrano-López AJ. Analysis of computer user behavior, security incidents and fraud using self-organizing maps. Comput Secur. 2019;83:38–51.
71. Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst Appl. 2020;141:112963.
72. Ma X, Yao T, Menglan H, Dong Y, Liu W, Wang F, Liu J. A survey on deep learning empowered iot applications. IEEE Access. 2019;7:181721–32.
73. Makhzani A, Frey B. K-sparse autoencoders. arXiv preprint arXiv:1312.5663, 2013.
74. Mandic D, Chambers J. Recurrent neural networks for prediction: learning algorithms, architectures and stability. Hoboken: Wiley; 2001.
75. Marlin B, Swersky K, Chen B, Freitas N. Inductive principles for restricted boltzmann machine learning. In: Proceedings of the Thirteenth International Conference on artificial intelligence and statistics, p. 509–16. JMLR Workshop and Conference Proceedings, 2010.
76. Masud M, Muhammad G, Alhumyani H, Alshamrani SS, Cheikhrouhou O, Ibrahim S, Hossain MS. Deep learning-based intelligent face recognition in iot-cloud environment. Comput Commun. 2020;152:215–22.
77. Memisevic R, Hinton GE. Learning to represent spatial transformations with factored higher-order boltzmann machines. Neural Comput. 2010;22(6):1473–92.
78. Minaee S, Azimi E, Abdolrashidi AA. Deep-sentiment: sentiment analysis using ensemble of cnn and bi-lstm models. arXiv preprint arXiv:1904.04206, 2019.
79. Naeem M, Paragliola G, Coronato A. A reinforcement learning and deep learning based intelligent system for the support of impaired patients in home treatment. Expert Syst Appl. 2021;168:114285.
80. Niu Z, Zhong G, Hui Yu. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62.
81. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.
82. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. Pytorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019;32:8026–37.
83. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
84. Pi Y, Nath ND, Behzadan AH. Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv Eng Inf. 2020;43:101009.
85. Piccialli F, Giampaolo F, Prezioso E, Crisci D, Cuomo S. Predictive analytics for smart parking: A deep learning approach in forecasting of iot data. ACM Trans Internet Technol (TOIT). 2021;21(3):1–21.
86. Puterman ML. Markov decision processes: discrete stochastic dynamic programming. Hoboken: Wiley; 2014.
87. Qu X, Lin Y, Kai G, Linru M, Meng S, Mingxing K, Mu L, editors. A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob Netw Appl. 2019; p. 1–22.
88. Rahman MW, Tashfia SS, Islam R, Hasan MM, Sultan SI,
Mia S, Rahman MM. The architectural design of smart blind assistant using iot with deep learning paradigm. Internet of Things. 2021;13:100344.
89. Ren J, Green M, Huang X. From traditional to deep learning: fault diagnosis for autonomous vehicles. In: Learning control. Elsevier. 2021; p. 205–19.
90. Rifai S, Vincent P, Muller X, Glorot X, Bengio Y. Contractive auto-encoders: Explicit invariance during feature extraction. In: Icml, 2011.
91. Rosa RL, Schwartz GM, Ruggiero WV, Rodríguez DZ. A knowledge-based recommendation system that includes sentiment analysis and deep learning. IEEE Trans Ind Inf. 2018;15(4):2124–35.
92. Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.
93. Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet of Things. 2019;5:180–93.
94. Sarker IH. Cyberlearning: effectiveness analysis of machine learning security modeling to detect cyber-anomalies and multi-attacks. Internet of Things. 2021;14:100393.
95. Sarker IH. Data science and analytics: an overview from data-driven smart computing, decision-making and applications perspective. SN Comput Sci. 2021.
96. Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021;2(3):1–16.
97. Sarker IH. Machine learning: Algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):1–21.
98. Sarker IH, Abushark YB, Alsolami F, Khan AI. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.
99. Sarker IH, Abushark YB, Khan AI. Contextpca: Predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
100. Sarker IH, Colman A, Han J. Recencyminer: mining recency-based personalized behavior from contextual smartphone data. J Big Data. 2019;6(1):1–21.
101. Sarker IH, Colman A, Han J, Khan AI, Abushark YB, Salah K. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2020;25(3):1151–61.
102. Sarker IH, Colman A, Kabir MA, Han J. Individualized time series segmentation for mining mobile phone user behavior. Comput J. 2018;61(3):349–68.
103. Sarker IH, Furhad MH, Nowrozy R. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021;2(3):1–18.
104. Sarker IH, Hoque MM, Uddin MK. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl. 2021;26(1):285–303.
105. Sarker IH, Kayes ASM. Abc-ruleminer: User behavioral rule based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020;168:102762.
106. Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.
107. Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.
108. Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet of Things. 2019;8:100106.
109. Satt A, Rozenberg S, Hoory R. Efficient emotion recognition from speech using deep learning on spectrograms. In: Interspeech, 2017; p. 1089–1093.
110. Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y. Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinf. 2018;16(6):2089–100.
111. Sujay Narumanchi H, Ananya Pramod Kompalli Shankar A, Devashish CK. Deep learning based large scale visual recommendation and search for e-commerce. arXiv preprint arXiv:1703.02344, 2017.
112. Shao X, Kim CS. Multi-step short-term power consumption forecasting using multi-channel LSTM with time location considering customer behavior. IEEE Access. 2020;8:125263–73.
113. Siami-Namini S, Tavakoli N, Namin AS. The performance of LSTM and BiLSTM in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data), 2019; p. 3285–292. IEEE.
114. Ślusarczyk B. Industry 4.0: are we ready? Pol J Manag Stud. 2018; p. 17.
115. Sumathi P, Subramanian R, Karthikeyan VV, Karthik S. Soil monitoring and evaluation system using EDL-ASQE: enhanced deep learning model for IoT smart agriculture network. Int J Commun Syst. 2021; p. e4859.
116. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2015; p. 1–9.
117. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International Conference on artificial neural networks, 2018; p. 270–279. Springer.
118. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Netw. 2000;11(3):586–600.
119. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A, Bottou L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010;11(12).
120. Wang J, Liang-Chih Yu, Robert Lai K, Zhang X. Tree-structured regional CNN-LSTM model for dimensional sentiment analysis. IEEE/ACM Trans Audio Speech Lang Process. 2019;28:581–91.
121. Wang S, Wan J, Li D, Liu C. Knowledge reasoning with semantic data for real-time data processing in smart factory. Sensors. 2018;18(2):471.
122. Wang W, Zhao M, Wang J. Effective Android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Humaniz Comput. 2019;10(8):3035–43.
123. Wang X, Liu J, Qiu T, Chaoxu M, Chen C, Zhou P. A real-time collision prediction mechanism with deep learning for intelligent transportation system. IEEE Trans Veh Technol. 2020;69(9):9497–508.
124. Wang Y, Huang M, Zhu X, Zhao L. Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on empirical methods in natural language processing, 2016; p. 606–615.
125. Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.
126. Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big Data. 2016;3(1):9.
127. Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cyber security. IEEE Access. 2018;6:35365–81.
128. Xu W, Sun H, Deng C, Tan Y. Variational autoencoder for semi-supervised text classification. In: Thirty-First AAAI Conference on artificial intelligence, 2017.
129. Xue Q, Chuah MC. New attacks on RNN based healthcare learning system and their detections. Smart Health. 2018;9:144–57.
130. Yousefi-Azar M, Hamey L. Text summarization using unsupervised deep learning. Expert Syst Appl. 2017;68:93–105.
131. Yuan X, Shi J, Gu L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst Appl. 2020; p. 114417.
132. Zhang G, Liu Y, Jin X. A survey of autoencoder-based recommender systems. Front Comput Sci. 2020;14(2):430–50.
133. Zhang X, Yao L, Huang C, Wang S, Tan M, Long Gu, Wang C. Multi-modality sensor data classification with selective attention. arXiv preprint arXiv:1804.05493, 2018.
134. Zhang X, Yao L, Wang X, Monaghan J, McAlpine D, Zhang Y. A survey on deep learning based brain-computer interface: recent advances and new frontiers. arXiv preprint arXiv:1905.04149, 2019; p. 66.
135. Zhang Y, Zhang P, Yan Y. Attention-based LSTM with multi-task learning for distant speech recognition. In: Interspeech, 2017; p. 3857–861.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.