Deep Learning

SN Computer Science (2021) 2:420
https://doi.org/10.1007/s42979-021-00815-1
REVIEW ARTICLE
Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions
Iqbal H. Sarker 1,2
Received: 29 May 2021 / Accepted: 7 August 2021 / Published online: 18 August 2021
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021
Abstract
Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is nowadays
considered a core technology of today’s Fourth Industrial Revolution (4IR or Industry 4.0). Due to its
ability to learn from data, DL technology, which originated from the artificial neural network (ANN), has
become a hot topic in computing and is widely applied in areas such as healthcare, visual recognition,
text analytics, cybersecurity, and many more. However, building an appropriate DL model is a challenging
task, due to the dynamic nature and variations in real-world problems and data. Moreover, the lack of core
understanding turns DL methods into black-box machines that hamper development at the standard level.
This article presents a structured and comprehensive view on DL techniques, including a taxonomy that
considers various types of real-world tasks such as supervised or unsupervised. In our taxonomy, we take
into account deep networks for supervised or discriminative learning, unsupervised or generative learning,
as well as hybrid learning and relevant others. We also summarize real-world application areas where deep
learning techniques can be used. Finally, we point out ten potential aspects for future generation DL
modeling with research directions. Overall, this article aims to draw a big picture on DL modeling that
can be used as a reference guide for both academia and industry professionals.
Keywords Deep learning · Artificial neural network · Artificial intelligence · Discriminative learning ·
Generative learning · Hybrid learning · Intelligent systems
This article is part of the topical collection “Advances in Computational Approaches for Artificial
Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K. N. and
M. Shivakumar.

🖂 Iqbal H. Sarker
msarker@swin.edu.au
1 Swinburne University of Technology, Melbourne, VIC 3122, Australia
2 Chittagong University of Engineering & Technology, Chittagong 4349, Bangladesh

Introduction
In the late 1980s, neural networks became a prevalent topic in the area of Machine Learning (ML) as well
as Artificial Intelligence (AI), due to the invention of various efficient learning methods and network
structures [52]. Multilayer perceptron networks trained by “Backpropagation” type algorithms,
self-organizing maps, and radial basis function networks were such innovative methods [26, 36, 37]. While
neural networks are successfully used in many applications, the interest in researching this topic
decreased later on. After that, in 2006, “Deep Learning” (DL) was introduced by Hinton et al. [41], which
was based on the concept of artificial neural network (ANN). Deep learning became a prominent topic after
that, resulting in a rebirth in neural network research, hence, sometimes referred to as “new generation
neural networks”. This is because deep networks, when properly trained, have produced significant success
in a variety of classification and regression challenges [52].
Nowadays, DL technology is considered as one of the hot topics within the area of machine learning,
artificial intelligence as well as data science and analytics, due to its learning capabilities from the
given data. Many corporations including Google, Microsoft, Nokia, etc., study it actively as it can
provide significant results in different classification and regression problems and datasets [52]. In
terms of working domain, DL is considered as a subset of ML and AI, and thus DL can be seen as an AI
function that mimics the human brain’s processing of data. The worldwide popularity of “Deep learning” is
increasing day by day, which is shown in our earlier paper [96] based on the historical data
collected from Google trends [33]. Deep learning differs from standard machine learning in terms of
efficiency as the volume of data increases, discussed briefly in Section “Why Deep Learning in Today's
Research and Applications?”. DL technology uses multiple layers to represent the abstractions of data to
build computational models. While deep learning takes a long time to train a model due to a large number
of parameters, it takes a short amount of time to run during testing as compared to other machine
learning algorithms [127].
While today’s Fourth Industrial Revolution (4IR or Industry 4.0) is typically focusing on
technology-driven “automation, smart and intelligent systems”, DL technology, which originated from ANN,
has become one of the core technologies to achieve the goal [103, 114]. A typical neural network is
mainly composed of many simple, connected processing elements or processors called neurons, each of which
generates a series of real-valued activations for the target outcome. Figure 1 shows a schematic
representation of the mathematical model of an artificial neuron, i.e., processing element, highlighting
input (Xi), weight (w), bias (b), summation function (∑), activation function (f) and corresponding
output signal (y). Neural network-based DL technology is now widely applied in many fields and research
areas such as healthcare, sentiment analysis, natural language processing, visual recognition, business
intelligence, cybersecurity, and many more that have been summarized in the latter part of this paper.

Fig. 1 Schematic representation of the mathematical model of an artificial neuron (processing element),
highlighting input (Xi), weight (w), bias (b), summation function (∑), activation function (f) and
output signal (y)

Although DL models are successfully applied in various application areas, mentioned above, building an
appropriate model of deep learning is a challenging task, due to the dynamic nature and variations of
real-world problems and data. Moreover, DL models are typically considered as “black-box” machines that
hamper the standard development of deep learning research and applications. Thus for clear understanding,
in this paper, we present a structured and comprehensive view on DL techniques considering the variations
in real-world problems and tasks. To achieve our goal, we briefly discuss various DL techniques and
present a taxonomy by taking into account three major categories: (i) deep networks for supervised or
discriminative learning that is utilized to provide a discriminative function in supervised deep learning
or classification applications; (ii) deep networks for unsupervised or generative learning that are used
to characterize the high-order correlation properties or features for pattern analysis or synthesis, thus
can be used as preprocessing for the supervised algorithm; and (iii) deep networks for hybrid learning
that is an integration of both supervised and unsupervised model and relevant others. We take into
account such categories based on the nature and learning capabilities of different DL techniques and how
they are used to solve problems in real-world applications [97]. Moreover, identifying key research
issues and prospects including effective data representation, new algorithm design, data-driven
hyper-parameter learning, and model optimization, integrating domain knowledge, adapting
resource-constrained devices, etc. is one of the key targets of this study, which can lead to “Future
Generation DL Modeling”. Thus the goal of this paper is set to assist those in academia and industry as a
reference guide, who want to research and develop data-driven smart and intelligent systems based on DL
techniques.
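
As an illustration of the artificial neuron model of Fig. 1, the output signal can be read as
y = f(∑ w_i · x_i + b). The following is a minimal sketch of that computation, not code from the paper:
the function names `neuron`, `sigmoid` and `relu` are our own illustrative choices, and the two
activation functions are only examples of possible choices for f.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def neuron(x, w, b, f=sigmoid):
    """Single artificial neuron (hypothetical helper): y = f(sum_i w_i * x_i + b).

    x: list of inputs (Xi), w: list of weights, b: bias, f: activation function.
    """
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # summation function
    return f(z)                                   # activation function


if __name__ == "__main__":
    # Weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5.
    print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
    # Weighted sum is 1.0 + 2.0 - 5.0 = -2.0, which ReLU clips to 0.0.
    print(neuron([1.0, 2.0], [1.0, 1.0], -5.0, f=relu))
```

A layer of a neural network can be thought of as many such neurons applied to the same input vector,
each with its own weights and bias.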
The overall contribution of this paper is summarized as follows:

– This article focuses on different aspects of deep learning modeling, i.e., the learning capabilities
of DL techniques in different dimensions such as supervised or unsupervised tasks, to function in an
automated and intelligent manner, which can play as a core technology of today’s Fourth Industrial
Revolution (Industry 4.0).
– We explore a variety of prominent DL techniques and present a taxonomy by taking into account the
variations in deep learning tasks and how they are used for different purposes. In our taxonomy, we
divide the techniques into three major categories such as deep networks for supervised or
discriminative learning, unsupervised or generative learning, as well as deep networks for hybrid
learning, and relevant others.
– We have summarized several potential real-world application areas of deep learning, to assist
developers as well as researchers in broadening their perspectives on DL techniques. Different
categories of DL techniques highlighted in our taxonomy can be used to solve various issues
accordingly.
– Finally, we point out and discuss ten potential aspects with research directions for future
generation DL modeling in terms of conducting future research and system development.

This paper is organized as follows. Section “Why Deep Learning in Today's Research and Applications?”
motivates why deep learning is important to build data-driven intelligent systems. In Section “Deep
Learning Techniques and Applications”, we present our DL taxonomy by taking into account the variations
of deep learning tasks and how they are used in solving real-world issues and briefly discuss the
techniques with summarizing the potential application areas. In Section “Research Directions and Future
Aspects”, we discuss various research issues of deep learning-based modeling and highlight the promising
topics for future research within the scope of our study. Finally, Section “Concluding Remarks” concludes
this paper.

Why Deep Learning in Today’s Research and Applications?
The main focus of today’s Fourth Industrial Revolution (Industry 4.0) is typically technology-driven
automation, smart and intelligent systems, in various application areas including smart healthcare,
business intelligence, smart cities, cybersecurity intelligence, and many more [95]. Deep learning
approaches have grown dramatically in terms of performance in a wide range of applications considering
security technologies, particularly, as an excellent solution for uncovering complex architecture in
high-dimensional data. Thus, DL techniques can play a key role in building intelligent data-driven
systems according to today’s needs, because of their excellent learning capabilities from historical
data. Consequently, DL can change the world as well as humans’ everyday life through its automation power
and learning from experience. DL technology is therefore relevant to artificial intelligence [103],
machine learning [97] and data science with advanced analytics [95] that are well known areas in computer
science, particularly, today’s intelligent computing. In the following, we first discuss the position of
deep learning in AI, or how DL technology is related to these areas of computing.

The Position of Deep Learning in AI
Nowadays, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are three popular
terms that are sometimes used interchangeably to describe systems or software that behaves intelligently.
In Fig. 2, we illustrate the position of deep learning, comparing with machine learning and artificial
intelligence. According to Fig. 2, DL is a part of ML as well as a part of the broad area AI. In general,
AI incorporates human behavior and intelligence to machines or systems [103], while ML is the method to
learn from data or experience [97], which automates analytical model building. DL also represents
learning methods from data where the computation is done through multi-layer neural networks and
processing. The term “Deep” in the deep learning methodology refers to the concept of multiple levels or
stages through which data is processed for building a data-driven model.

Fig. 2 An illustration of the position of deep learning (DL), comparing with machine learning (ML) and
artificial intelligence (AI)

Thus, DL can be considered as one of the core technologies of AI, a frontier for artificial intelligence,
which can be used for building intelligent systems and automation. More importantly, it pushes AI to a
new level, termed “Smarter AI”. As DL models are capable of learning from data, there is a strong
relation of deep learning with “Data Science” [95] as well. Typically, data science represents the entire
process of finding meaning or insights in data in a particular problem domain, where DL methods can play
a key role for advanced analytics and intelligent decision-making [104, 106]. Overall, we can conclude
that DL technology is capable of changing the current world, particularly, in terms of a powerful
computational engine, and contributes to technology-driven automation, smart and intelligent systems
accordingly, and meets the goal of Industry 4.0.

Understanding Various Forms of Data
As DL models learn from data, an in-depth understanding and representation of data are important to
build a data-driven intelligent system in a particular application area. In the real world, data can be
in various forms, which typically can be represented as below for deep learning modeling:

– Sequential Data Sequential data is any kind of data where the order matters, i.e., a set of
sequences. It needs to explicitly account for the sequential nature of input data while building the
model. Text streams, audio fragments, video clips, time-series data, are some examples of sequential
data.
– Image or 2D Data A digital image is made up of a matrix, which is a rectangular array of numbers,
symbols, or expressions arranged in rows and columns in a 2D array of numbers. Matrix, pixels, voxels,
and bit depth are the four essential characteristics or fundamental parameters of a digital image.
– Tabular Data A tabular dataset consists primarily of rows and columns. Thus tabular datasets contain
data in a columnar format as in a database table. Each column (field) must have a name and each column
may only contain data of the defined type. Overall, it is a logical and systematic arrangement of data
in the form of rows and columns that are based on data properties or features. Deep learning models
can learn efficiently on tabular data and allow us to build data-driven intelligent systems.

The above-discussed data forms are common in the real world application areas of deep learning. Different
categories of DL techniques perform differently depending on the nature and characteristics of data,
discussed briefly in Section “Deep Learning Techniques and Applications” with a taxonomy presentation.
However, in many real-world application areas, the standard machine learning techniques, particularly,
logic-rule or tree-based techniques [93, 101] perform significantly depending on the application nature.
Figure 3 also shows the performance comparison of DL and ML modeling considering the amount of data. In
the following, we highlight several cases, where deep learning is useful to solve real-world problems,
according to our main focus in this paper.

Fig. 3 An illustration of the performance comparison between deep learning (DL) and other machine
learning (ML) algorithms, where DL modeling from large amounts of data can increase the performance

DL Properties and Dependencies
Fig. 4 A typical DL workflow to solve real-world problems, which consists of three sequential stages
(i) data understanding and preprocessing (ii) DL model building and training (iii) validation and
interpretation

A DL model typically follows the same processing stages as machine learning modeling. In Fig. 4, we have
shown a deep learning workflow to solve real-world problems, which consists of three processing steps,
such as data understanding and preprocessing, DL model building and training, and validation and
interpretation. However, unlike the ML modeling [98, 108], feature extraction in the DL model is
automated rather than manual. K-nearest neighbor, support vector machines, decision tree, random forest,
naive Bayes, linear regression, association rules, k-means clustering, are some examples of machine
learning techniques that are commonly used in various application areas [97]. On the other hand, the DL
model includes convolution neural network, recurrent neural network, autoencoder, deep belief network,
and many more, discussed briefly with their potential application areas in Section 3. In the following,
we discuss the key properties and dependencies of DL techniques, that are
needed to be taken into account before starting work on DL modeling for real-world applications.

– Data Dependencies Deep learning is typically dependent on a large amount of data to build a
data-driven model for a particular problem domain. The reason is that when the data volume is small,
deep learning algorithms often perform poorly [64]. In such circumstances, however, the performance of
the standard machine-learning algorithms will be improved if the specified rules are used [64, 107].
– Hardware Dependencies The DL algorithms require large computational operations while training a model
with large datasets. As the larger the computations, the more the advantage of a GPU over a CPU, the
GPU is mostly used to optimize the operations efficiently. Thus, to work properly with the deep
learning training, GPU hardware is necessary. Therefore, DL relies more on high-performance machines
with GPUs than standard machine learning methods [19, 127].
– Feature Engineering Process Feature engineering is the process of extracting features
(characteristics, properties, and attributes) from raw data using domain knowledge. A fundamental
distinction between DL and other machine learning techniques is the attempt to extract high-level
characteristics directly from data [22, 97]. Thus, DL decreases the time and effort required to
construct a feature extractor for each problem.
– Model Training and Execution time In general, training a deep learning algorithm takes a long time
due to a large number of parameters in the DL algorithm; thus, the model training process takes
longer. For instance, the DL models can take more than one week to complete a training session,
whereas training with ML algorithms takes relatively little time, only seconds to hours [107, 127].
During testing, deep learning algorithms take extremely little time to run [127], when compared to
certain machine learning methods.
– Black-box Perception and Interpretability Interpretability is an important factor when comparing DL
with ML. It’s difficult to explain how a deep learning result was obtained, i.e., “black-box”. On the
other hand, the machine-learning algorithms, particularly, rule-based machine learning techniques [97]
provide explicit logic rules (IF-THEN) for making decisions that are easily interpretable for humans.
For instance, in our earlier works, we have presented several rule-based machine learning techniques
[100, 102, 105], where the extracted rules are human-understandable and easier to interpret, update or
delete according to the target applications.

The most significant distinction between deep learning and regular machine learning is how well it
performs when data grows exponentially. An illustration of the performance comparison between DL and
standard ML algorithms has been shown in Fig. 3, where DL modeling can increase the performance with the
amount of data. Thus, DL modeling is extremely useful when dealing with a large amount of data because of
its capacity to process vast amounts of features to build an effective data-driven model. In terms of
developing and training DL models, it relies on parallelized matrix and tensor operations as well as
computing gradients and optimization. Several DL libraries and resources [30] such as PyTorch [82] (with
a high-level API called Lightning) and TensorFlow [1] (which also offers Keras as a high-level API) offer
these core utilities including many pre-trained models, as well as many other necessary functions for
implementation and DL model building.

Deep Learning Techniques and Applications
In this section, we go through the various types of deep neural network techniques, which typically
consider several layers of information-processing stages in hierarchical structures to learn. A typical
deep neural network contains multiple hidden layers including input and output layers. Figure 5 shows a
general structure of a deep neural network (hidden layer = N and N ≥ 2) comparing with a shallow network
(hidden layer = 1). We also present our taxonomy on DL techniques based on how they are used to solve
various problems, in this section. However, before exploring the details of the DL techniques, it’s
useful to review various types of learning tasks such as (i) Supervised: a task-driven approach that uses
labeled training data, (ii) Unsupervised: a data-driven process that analyzes unlabeled datasets, (iii)
Semi-supervised: a hybridization of both the supervised and unsupervised methods, and (iv) Reinforcement:
an environment driven approach, discussed briefly in our earlier paper [97]. Thus, to present our
taxonomy, we divide DL techniques broadly into three major categories: (i) deep networks for supervised
or discriminative learning; (ii) deep networks for unsupervised or generative learning; and (iii) deep
networks for hybrid learning combining both and relevant others, as shown in Fig. 6. In the following, we
briefly discuss each of these techniques that can be used to solve real-world problems in various
application areas according to their learning capabilities.

Deep Networks for Supervised or Discriminative Learning
This category of DL techniques is utilized to provide a discriminative function in supervised or
classification applications. Discriminative deep architectures are typically designed to give
discriminative power for pattern
classification by describing the posterior distributions of classes conditioned on visible data [21].
Discriminative architectures mainly include Multi-Layer Perceptron (MLP), Convolutional Neural Networks
(CNN or ConvNet), Recurrent Neural Networks (RNN), along with their variants. In the following, we
briefly discuss these techniques.

Fig. 5 A general architecture of (a) a shallow network with one hidden layer and (b) a deep neural
network with multiple hidden layers
Fig. 6 A taxonomy of DL techniques, broadly divided into three major categories (i) deep networks for
supervised or discriminative learning, (ii) deep networks for unsupervised or generative learning, and
(iii) deep networks for hybrid learning and relevant others

Multi-layer Perceptron (MLP)
Multi-layer Perceptron (MLP), a supervised learning approach [83], is a type of feedforward artificial
neural network (ANN). It is also known as the foundation architecture of deep neural networks (DNN) or
deep learning. A typical MLP is a fully connected network that consists of an input layer that receives
input data, an output layer that makes a decision or prediction about the input signal, and one or more
hidden layers between these two that are considered as the network’s computational engine [36, 103]. The
output of an MLP network is determined using a variety of activation functions, also known as transfer
functions, such as ReLU (Rectified Linear Unit), Tanh, Sigmoid, and Softmax [83, 96]. To train, MLP
employs the most extensively
used algorithm “Backpropagation” [36], a supervised learning technique, which is also known as the most
basic building block of a neural network. During the training process, various optimization approaches
such as Stochastic Gradient Descent (SGD), Limited Memory BFGS (L-BFGS), and Adaptive Moment Estimation
(Adam) are applied. MLP requires tuning of several hyperparameters such as the number of hidden layers,
neurons, and iterations, which could make solving a complicated model computationally expensive. However,
through partial fit, MLP offers the advantage of learning non-linear models in real-time or online [83].

Convolutional Neural Network (CNN or ConvNet)
The Convolutional Neural Network (CNN or ConvNet) [65] is a popular discriminative deep learning
architecture that learns directly from the input without the need for human feature extraction. Figure 7
shows an example of a CNN including multiple convolutions and pooling layers. As a result, the CNN
enhances the design of traditional ANN like regularized MLP networks. Each layer in CNN takes into
account optimum parameters for a meaningful output as well as reduces model complexity. CNN also uses a
‘dropout’ [30] that can deal with the problem of over-fitting, which may occur in a traditional network.
CNNs are specifically intended to deal with a variety of 2D shapes and are thus widely employed in visual
recognition, medical image analysis, image segmentation, natural language processing, and many more [65,
96]. The capability of automatically discovering essential features from the input without the need for
human intervention makes it more powerful than a traditional network. Several variants of CNN exist in
the area, including visual geometry group (VGG) [38], AlexNet [62], Xception [17], Inception [116],
ResNet [39], etc., that can be used in various application domains according to their learning
capabilities.

Fig. 7 An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and
pooling layers

Recurrent Neural Network (RNN) and its Variants
A Recurrent Neural Network (RNN) is another popular neural network, which employs sequential or
time-series data and feeds the output from the previous step as input to the current stage [27, 74]. Like
feedforward and CNN, recurrent networks learn from training input, however, they are distinguished by
their “memory”, which allows them to impact current input and output through using information from
previous inputs. Unlike typical DNN, which assumes that inputs and outputs are independent of one
another, the output of RNN is reliant on prior elements within the sequence. However, standard recurrent
networks have the issue of vanishing gradients, which makes learning long data sequences challenging. In
the following, we discuss several popular variants of the recurrent network that minimize the issues and
perform well in many real-world application domains.

– Long short-term memory (LSTM) This is a popular form of RNN architecture that uses special units to
deal with the vanishing gradient problem, which was introduced by Hochreiter et al. [42]. A memory cell
in an LSTM unit can store data for long periods and the flow of information into and out of the cell
is managed by three gates. For instance, the ‘Forget Gate’ determines what information from the
previous state cell will be memorized and what information will be removed that is no longer useful,
while the ‘Input Gate’ determines which information should enter the cell state and the ‘Output Gate’
determines and controls the outputs. As it solves the issues of training a recurrent network, the LSTM
network is considered one of the most successful RNN.
– Bidirectional RNN/LSTM Bidirectional RNNs connect two hidden layers that run in opposite directions
to a single output, allowing them to accept data from both the past and future. Bidirectional RNNs,
unlike traditional recurrent networks, are trained to predict both positive and negative time
directions at the same time. A Bidirectional LSTM, often known as a BiLSTM, is an extension of the
standard LSTM that can increase model performance on sequence classification issues [113]. It is a
sequence processing model comprising two LSTMs: one takes the input forward and the other takes it
backward. Bidirectional LSTM in particular is a popular choice in natural language processing tasks.
– Gated recurrent units (GRUs) A Gated Recurrent Unit (GRU) is another popular variant of the recurrent
network that uses gating methods to control and manage information flow between cells in the neural
network, introduced by Cho et al. [16]. The GRU is like an LSTM, however, has fewer parameters, as it
has a reset gate and
Fig. 8 Basic structure of a gated recurrent unit (GRU) cell

consisting of reset and update gates
an update gate but lacks the output gate, as shown

in Fig. 8. Thus, the key diference between a GRU
and an LSTM is that a GRU has two gates (reset
and update gates) whereas an LSTM has three
gates (namely input, output and forget gates). The
GRU’s structure enables it to capture
dependencies from large sequences of data in an
adaptive manner, without discarding information
from earlier parts of the sequence. Thus GRU is a
slightly more streamlined variant that often ofers
comparable performance and is signifcantly faster Generative Adversarial Network (GAN)
to compute [18]. Although GRUs have been
shown to exhibit better per formance on certain A Generative Adversarial Network (GAN), designed
smaller and less frequent datasets [18, 34], both by Ian Goodfellow [32], is a type of neural network
variants of RNN have proven their efec architecture for generative modeling to create new
tiveness while producing the outcome. plausible samples on demand. It involves
automatically discovering and learning regularities or
Overall, the basic property of a recurrent network is patterns in input data so that the model may be used
that it has at least one feedback connection, which to generate or output new examples from the origi
enables activations to loop. This allows the networks to do temporal processing and sequence learning, such as sequence recognition or reproduction, temporal association or prediction, etc. Popular application areas of recurrent networks include prediction problems, machine translation, natural language processing, text summarization, speech recognition, and many more.

Deep Networks for Generative or Unsupervised Learning

This category of DL techniques is typically used to characterize the high-order correlation properties or features for pattern analysis or synthesis, as well as the joint statistical distributions of the visible data and their associated classes [21]. The key idea of generative deep architectures is that during the learning process, precise supervisory information such as target class labels is not of concern. As a result, the methods under this category are essentially applied for unsupervised learning, as they are typically used for feature learning or data generation and representation [20, 21]. Thus generative modeling can also be used as preprocessing for supervised learning tasks, which helps improve the accuracy of the discriminative model. Commonly used deep neural network techniques for unsupervised or generative learning are Generative Adversarial Network (GAN), Autoencoder (AE), Restricted Boltzmann Machine (RBM), Self-Organizing Map (SOM), and Deep Belief Network (DBN), along with their variants.

Fig. 9 Schematic structure of a standard generative adversarial network (GAN)

nal dataset. As shown in Fig. 9, GANs are composed of two neural networks, a generator G that creates new data having properties similar to the original data, and a discriminator D that predicts the likelihood of a subsequent sample being drawn from actual data rather than data provided by the generator. Thus in GAN modeling, both the generator and discriminator are trained to compete with each other. While the generator tries to fool and confuse the discriminator by creating more realistic data, the discriminator tries to distinguish the genuine data from the fake data generated by G.

Generally, GAN network deployment is designed for unsupervised learning tasks, but it has also proven useful for semi-supervised and reinforcement learning, depending on the task [3]. GANs are also used in state-of-the-art transfer learning research to enforce the
alignment of the latent feature space [66]. Inverse models, such as Bidirectional GAN (BiGAN) [25], can also learn a mapping from data to the latent space, similar to how the standard GAN model learns a mapping from a latent space to the data distribution. The potential application areas of GAN networks are healthcare, image analysis, data augmentation, video generation, voice generation, pandemics, traffic control, cybersecurity, and many more, which are increasing rapidly. Overall, GANs have established themselves as a comprehensive domain of independent data expansion and as a solution to problems requiring a generative solution.

Auto-Encoder (AE) and Its Variants

An auto-encoder (AE) [31] is a popular unsupervised learning technique in which neural networks are used to learn representations. Typically, auto-encoders are used to work with high-dimensional data, and dimensionality reduction explains how a set of data is represented. Encoder, code, and decoder are the three parts of an autoencoder. The encoder compresses the input and generates the code, which the decoder subsequently uses to reconstruct the input. The AEs have recently been used to learn generative data models [69]. The auto-encoder is widely used in many unsupervised learning tasks, e.g., dimensionality reduction, feature extraction, efficient coding, generative modeling, denoising, anomaly or outlier detection, etc. [31, 132]. Principal component analysis (PCA) [99], which is also used to reduce the dimensionality of huge data sets, is essentially similar to a single-layered AE with a linear activation function. Regularized autoencoders such as sparse, denoising, and contractive are useful for learning representations for later classification tasks [119], while variational autoencoders can be used as generative models [56], discussed below.
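
As a rough illustration of the encoder, code, and decoder structure described above, the following Python/NumPy sketch trains a small linear autoencoder by gradient descent and compares it with a PCA reconstruction of the same data. The toy data, the function names (make_low_rank_data, train_linear_autoencoder, pca_reconstruct), the learning rate, and the epoch count are assumptions chosen for illustration and do not come from the cited works.

```python
import numpy as np


def make_low_rank_data(n=200, d=6, k=2, seed=0):
    """Toy data (an assumption for illustration): d features driven by k latent factors."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, k))          # hidden factors
    A = rng.normal(size=(k, d))          # mixing matrix
    X = Z @ A
    X = X - X.mean(axis=0)               # centre each feature
    return X / X.std(axis=0)             # scale each feature to unit variance


def train_linear_autoencoder(X, code_dim=2, lr=0.01, epochs=2000, seed=1):
    """Single-layer linear autoencoder trained with plain gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.normal(size=(d, code_dim))   # encoder weights
    W_dec = 0.1 * rng.normal(size=(code_dim, d))   # decoder weights
    losses = []
    for _ in range(epochs):
        code = X @ W_enc                 # encoder: input -> code
        X_hat = code @ W_dec             # decoder: code -> reconstruction
        err = X_hat - X
        losses.append(float((err ** 2).sum() / n))
        grad_out = 2.0 * err / n         # gradient of the loss w.r.t. X_hat
        grad_W_dec = code.T @ grad_out
        grad_W_enc = X.T @ (grad_out @ W_dec.T)
        W_enc -= lr * grad_W_enc
        W_dec -= lr * grad_W_dec
    return W_enc, W_dec, losses


def pca_reconstruct(X, k):
    """Project centred data onto its top-k principal directions and back."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T
    return (X @ V_k) @ V_k.T


X = make_low_rank_data()
W_enc, W_dec, losses = train_linear_autoencoder(X)
print("AE reconstruction loss: first epoch %.4f, last epoch %.4f" % (losses[0], losses[-1]))
pca_err = float(((pca_reconstruct(X, 2) - X) ** 2).sum() / len(X))
print("PCA reconstruction loss with 2 components: %.2e" % pca_err)
```

With a linear activation, the autoencoder can at best match the rank-k PCA reconstruction, which is one way to read the PCA remark above. Practical autoencoders add nonlinear activations and the regularizers used by the variants that follow.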
– Sparse Autoencoder (SAE) A sparse autoencoder [73] has a sparsity penalty on the coding layer as a part of its training requirement. SAEs may have more hidden units than inputs, but only a small number of hidden units are permitted to be active at the same time, resulting in a sparse model. Figure 10 shows a schematic structure of a sparse autoencoder with several active units in the hidden layer. This model is thus obliged to respond to the unique statistical features of the training data following its constraints.

Fig. 10 Schematic structure of a sparse autoencoder (SAE) with several active units (filled circle) in the hidden layer

– Denoising Autoencoder (DAE) A denoising autoencoder is a variant on the basic autoencoder that attempts to improve representation (to extract useful features) by altering the reconstruction criterion, and thus reduces the risk of learning the identity function [31, 119]. In other words, it receives a corrupted data point as input and is trained to recover the original undistorted input as its output through minimizing the average reconstruction error over the training data, i.e., cleaning the corrupted input, or denoising. Thus, in the context of computing, DAEs can be considered as very powerful filters that can be utilized for automatic pre-processing. A denoising autoencoder, for example, could be used to automatically pre-process an image, thereby boosting its quality for recognition accuracy.
– Contractive Autoencoder (CAE) The idea behind a contractive autoencoder, proposed by Rifai et al. [90], is to make the autoencoders robust to small changes in the training dataset. In its objective function, a CAE includes an explicit regularizer that forces the model to learn an encoding that is robust to small changes in input values. As a result, the learned representation’s sensitivity to the training input is reduced. While DAEs encourage the robustness of reconstruction as discussed above, CAEs encourage the robustness of representation.
– Variational Autoencoder (VAE) A variational autoencoder [55] has a fundamentally unique property that distinguishes it from the classical autoencoder discussed above, which makes it so effective for generative modeling. VAEs, unlike the traditional autoencoders which map the input onto a latent vector, map the input data into the parameters of a probability distribution, such as the mean and variance of a Gaussian distribution. A VAE assumes that the source data has an underlying probability distribution and then tries to discover the distribution’s parameters. Although this approach was initially designed for unsupervised learning, its use has been demonstrated in other domains such as semi-supervised learning [128] and supervised learning [51].

Although the earlier concept of AE was typically for dimensionality reduction or feature learning, as mentioned above, AEs have recently been brought to the forefront of generative modeling; even the generative adversarial network is one of the popular methods in the area. The AEs have been effectively employed in a variety of domains, including healthcare, computer vision, speech recognition, cybersecurity, natural language processing, and many more. Overall, we can conclude that auto-encoder and its variants can play a significant role as unsupervised feature learning with neural network architecture.

Kohonen Map or Self-Organizing Map (SOM)

A Self-Organizing Map (SOM) or Kohonen Map [59] is another form of unsupervised learning technique for creating a low-dimensional (usually two-dimensional) representation of a higher-dimensional data set while maintaining the topological structure of the data. SOM is also known as a neural network-based dimensionality reduction algorithm that is commonly used for clustering [118]. A SOM adapts to the topological form of a dataset by repeatedly moving its neurons closer to the data points, allowing us to visualize enormous datasets and find probable clusters. The first layer of a SOM is the input layer, and the second layer is the output layer or feature map. Unlike other neural networks that use error-correction learning, such as backpropagation with gradient descent [36], SOMs employ competitive learning, which uses a neighborhood function to retain the input space’s topological features. SOM is widely utilized in a variety of applications, including pattern identification, health or medical diagnosis, anomaly detection, and virus or worm attack detection [60, 87]. The primary benefit of employing a SOM is that it can make high-dimensional data easier to visualize and analyze to understand the patterns. The reduction of dimensionality and grid clustering makes it easy to observe similarities in the data. As a result, SOMs can play a vital role in developing a data-driven effective model for a particular problem domain, depending on the data characteristics.

Restricted Boltzmann Machine (RBM)

A Restricted Boltzmann Machine (RBM) [75] is also a generative stochastic neural network capable of learning a probability distribution across its inputs. Boltzmann machines typically consist of visible and hidden nodes and each node is connected to every other node, which helps us understand irregularities by learning how the system works in normal circumstances. RBMs are a subset of Boltzmann machines in which the connections are restricted: units are connected only between the visible and hidden layers, with no connections between units within the same layer [77]. This restriction permits training algorithms like the gradient-based contrastive divergence algorithm to be more efficient than those for Boltzmann machines in general [41]. RBMs have found applications in dimensionality reduction, classification, regression, collaborative filtering, feature learning, topic modeling, and many others. In the area of deep learning modeling, they can be trained either supervised or unsupervised, depending on the task. Overall, the RBMs can recognize patterns in data automatically and develop probabilistic or stochastic models, which are utilized for feature selection or extraction, as well as forming a deep belief network.

Deep Belief Network (DBN)

A Deep Belief Network (DBN) [40] is a multi-layer generative graphical model built by stacking several individual unsupervised networks such as AEs or RBMs, which use each network’s hidden layer as the input for the next layer, i.e., connected sequentially. Thus, we can divide a DBN into (i) AE-DBN, which is known as stacked AE, and (ii) RBM-DBN, which is known as stacked RBM, where AE-DBN is composed of autoencoders and RBM-DBN is composed of restricted Boltzmann machines, discussed earlier. The ultimate goal is to develop a faster unsupervised training technique for each sub-network that depends on contrastive divergence [41]. DBN can capture a hierarchical representation of input data based on its deep structure. The primary idea behind DBN is to train unsupervised feed-forward neural networks with unlabeled data before fine-tuning the network with labeled input. One of the most important advantages of DBN, as opposed to typical shallow learning networks, is that it permits the detection of deep patterns, which allows for reasoning abilities and the capture of the deep difference between normal and erroneous data [89]. A continuous DBN is simply an extension of a standard DBN that allows a continuous range of decimals instead of binary data. Overall, the DBN model can play a key role in a wide range of high-dimensional data applications due to its strong feature extraction and classification capabilities, and it has become one of the significant topics in the field of neural networks.

In summary, the generative learning techniques discussed above typically allow us to generate a new representation
of data through exploratory analysis. As a result, these deep generative networks can be utilized as preprocessing for supervised or discriminative learning tasks, as well as ensuring model accuracy, where unsupervised representation learning can allow for improved classifier generalization.

Deep Networks for Hybrid Learning and Other Approaches

In addition to the above-discussed deep learning categories, hybrid deep networks and several other approaches such as deep transfer learning (DTL) and deep reinforcement learning (DRL) are popular, which are discussed in the following.

Hybrid Deep Neural Networks

Generative models are adaptable, with the capacity to learn from both labeled and unlabeled data. Discriminative models, on the other hand, are unable to learn from unlabeled data yet outperform their generative counterparts in supervised tasks. A framework for training both deep generative and discriminative models simultaneously can enjoy the benefits of both models, which motivates hybrid networks. Hybrid deep learning models are typically composed of multiple (two or more) deep basic learning models, where the basic model is a discriminative or generative deep learning model discussed earlier. Based on the integration of different basic generative or discriminative models, the below three categories of hybrid deep learning models might be useful for solving real-world problems. These are as follows:

– Hybrid Model_1: An integration of different generative or discriminative models to extract more meaningful and robust features. Examples could be CNN+LSTM, AE+GAN, and so on.
– Hybrid Model_2: An integration of generative model followed by a discriminative model. Examples could be DBN+MLP, GAN+CNN, AE+CNN, and so on.
– Hybrid Model_3: An integration of generative or discriminative model followed by a non-deep learning classifier. Examples could be AE+SVM, CNN+SVM, and so on.

Thus, in a broad sense, we can conclude that hybrid models can be either classification-focused or non-classification depending on the target use. However, most of the hybrid learning-related studies in the area of deep learning are classification-focused or supervised learning tasks, summarized in Table 1. The unsupervised generative models with meaningful representations are employed to enhance the discriminative models. The generative models with useful representation can provide more informative and low dimensional features for discrimination, and they can also help enhance the training data quality and quantity, providing additional information for classification.

Deep Transfer Learning (DTL)

Transfer Learning is a technique for effectively using previously learned model knowledge to solve a new task with minimum training or fine-tuning. In comparison to typical machine learning techniques [97], DL takes a large amount of training data. As a result, the need for a substantial volume of labeled data is a significant barrier to addressing some essential domain-specific tasks, particularly in the medical sector, where creating large-scale, high-quality annotated medical or health datasets is both difficult and costly. Furthermore, the standard DL model demands a lot of computational resources, such as a GPU-enabled server, even though researchers are working hard to improve it. As a result, Deep Transfer Learning (DTL), a DL-based transfer learning method, might be helpful to address this issue. Figure 11 shows a general structure of the transfer learning process, where knowledge from the pre-trained model is transferred into a new DL model. It’s especially popular in deep learning right now since it allows deep neural networks to be trained with very little data [126].

Transfer learning is a two-stage approach for training a DL model that consists of a pre-training step and a fine-tuning step in which the model is trained on the target task. Since deep neural networks have gained popularity in a variety of fields, a large number of DTL methods have been presented, making it crucial to categorize and summarize them. Based on the techniques used in the literature, DTL can be classified into four categories [117]. These are (i) instances-based deep transfer learning that utilizes instances in the source domain by appropriate weight, (ii) mapping-based deep transfer learning that maps instances from two domains into a new data space with better similarity, (iii) network-based deep transfer learning that reuses part of the network pre-trained in the source domain, and (iv) adversarial-based deep transfer learning that uses adversarial technology to find transferable features that are suitable for both domains. Due to its high effectiveness and practicality, adversarial-based deep transfer learning has exploded in popularity in recent years. Transfer learning can also be classified into inductive, transductive, and unsupervised transfer learning depending on the circumstances between the source and target domains and activities [81].
While most current research focuses on supervised learning, how deep neural networks can transfer knowledge in unsupervised or semi-supervised learning may gain further interest in the future. DTL techniques are useful in a variety of fields including natural language processing, sentiment classification, visual recognition, speech recognition, spam filtering, and relevant others.
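
The pre-training and fine-tuning stages, in the network-based flavour where part of a pre-trained network is reused, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not a method from the cited literature: the one-hidden-layer network, the synthetic source and target tasks, and the helper name train are all hypothetical, and bias terms are omitted for brevity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train(X, y, W1, w2, lr=0.1, epochs=300, freeze_W1=False):
    """Gradient descent on binary cross-entropy for a one-hidden-layer network.

    W1 is the feature-extractor layer and w2 the task-specific output head.
    With freeze_W1=True only the head is updated (the fine-tuning stage here).
    """
    W1, w2 = W1.copy(), w2.copy()
    n = len(y)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1)                          # hidden features
        p = np.clip(sigmoid(H @ w2), 1e-9, 1 - 1e-9)  # probability of class 1
        losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))
        d_out = (p - y) / n                          # gradient at the output pre-activation
        grad_w2 = H.T @ d_out
        if not freeze_W1:
            d_hidden = np.outer(d_out, w2) * (1.0 - H ** 2)   # back through tanh
            W1 -= lr * (X.T @ d_hidden)
        w2 -= lr * grad_w2
    return W1, w2, losses


rng = np.random.default_rng(0)

# Stage 1 (pre-training): a large labeled source task (synthetic, hypothetical).
X_src = rng.normal(size=(500, 4))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(float)
W1_pre, _, _ = train(X_src, y_src, 0.5 * rng.normal(size=(4, 8)), np.zeros(8))

# Stage 2 (fine-tuning): a small, related target task; reuse W1, train a new head.
X_tgt = rng.normal(size=(40, 4))
y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] > 0.3).astype(float)
W1_ft, w2_ft, losses_ft = train(X_tgt, y_tgt, W1_pre, np.zeros(8), freeze_W1=True)

print("target loss: first epoch %.4f, last epoch %.4f" % (losses_ft[0], losses_ft[-1]))
print("feature extractor unchanged:", bool(np.array_equal(W1_ft, W1_pre)))
```

Freezing the reused layer keeps the number of trainable parameters small, which is one reason fine-tuning can work with few target samples. Unfreezing some layers with a lower learning rate is a common alternative when more target data is available.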
Table 1 A summary of deep learning tasks and methods in several popular real-world application areas

Application areas | Tasks | Methods | References
Healthcare and Medical applications | Regular health factors analysis | CNN-based | Ismail et al. [48]
Healthcare and Medical applications | Identifying malicious behaviors | RNN-based | Xue et al. [129]
Healthcare and Medical applications | Coronary heart disease risk prediction | Autoencoder based | Amarbayasgalan et al. [6]
Healthcare and Medical applications | Cancer classification | Transfer learning based | Sevakula et al. [110]
Healthcare and Medical applications | Diagnosis of COVID-19 | CNN and BiLSTM based | Aslan et al. [10]
Healthcare and Medical applications | Detection of COVID-19 | CNN-LSTM based | Islam et al. [47]
Natural Language Processing | Text summarization | Auto-encoder based | Yousefi et al. [130]
Natural Language Processing | Sentiment analysis | CNN-LSTM based | Wang et al. [120]
Natural Language Processing | Sentiment analysis | CNN and Bi-LSTM based | Minaee et al. [78]
Natural Language Processing | Aspect-level sentiment classification | Attention-based LSTM | Wang et al. [124]
Speech recognition | Distant speech recognition | Attention-based LSTM | Zhang et al. [135]
Speech recognition | Speech emotion classification | Transfer learning based | Latif et al. [63]
Speech recognition | Emotion recognition from speech | CNN and LSTM based | Satt et al. [109]
Cybersecurity | Zero-day malware detection | Autoencoders and GAN based | Kim et al. [54]
Cybersecurity | Security incidents and fraud analysis | SOM-based | Lopez et al. [70]
Cybersecurity | Android malware detection | Autoencoder and CNN based | Wang et al. [122]
Cybersecurity | Intrusion detection classification | DBN-based | Wei et al. [125]
Cybersecurity | DoS attack detection | RBM-based | Imamverdiyev et al. [46]
Cybersecurity | Suspicious flow detection | Hybrid deep-learning-based | Garg et al. [29]
Cybersecurity | Network intrusion detection | AE and SVM based | Al et al. [4]
IoT and Smart cities | Smart energy management | CNN and Attention mechanism | Abdel et al. [2]
IoT and Smart cities | Particulate matter forecasting | CNN-LSTM based | Huang et al. [43]
IoT and Smart cities | Smart parking system | CNN-LSTM based | Piccialli et al. [85]
IoT and Smart cities | Disaster management | DNN-based | Aqib et al. [8]
IoT and Smart cities | Air quality prediction | LSTM-RNN based | Kok et al. [61]
IoT and Smart cities | Cybersecurity in smart cities | RBM, DBN, RNN, CNN, GAN | Chen et al. [15]
Smart Agriculture | A smart agriculture IoT system | RL-based | Bu et al. [11]
Smart Agriculture | Plant disease detection | CNN-based | Ale et al. [5]
Smart Agriculture | Automated soil quality evaluation | DNN-based | Sumathi et al. [115]
Business and Financial Services | Predicting customers’ purchase behavior | DNN based | Chaudhuri [14]
Business and Financial Services | Stock trend prediction | CNN and LSTM based | Anuradha et al. [7]
Business and Financial Services | Financial loan default prediction | CNN-based | Deng et al. [23]
Business and Financial Services | Power consumption forecasting | LSTM-based | Shao et al. [112]
Virtual Assistant and Chatbot Services | An intelligent chatbot | Bi-RNN and Attention model | Dhyani et al. [24]
Virtual Assistant and Chatbot Services | Virtual listener agent | GRU and LSTM based | Huang et al. [44]
Virtual Assistant and Chatbot Services | Smart blind assistant | CNN-based | Rahman et al. [88]
Object Detection and Recognition | Object detection in X-ray images | CNN-based | Gu et al. [35]
Object Detection and Recognition | Object detection for disaster response | CNN-based | Pi et al. [84]
Object Detection and Recognition | Medicine recognition system | CNN-based | Chang et al. [12]
Object Detection and Recognition | Face recognition in IoT-cloud environment | CNN-based | Masud et al. [76]
Object Detection and Recognition | Food recognition system | CNN-based | Liu et al. [68]
Object Detection and Recognition | Affect recognition system | DBN-based | Kawde et al. [53]
Object Detection and Recognition | Facial expression analysis | CNN and LSTM based | Li et al. [67]
Recommendation and Intelligent system | Hybrid recommender system | DNN-based | Kiran et al. [57]
Recommendation and Intelligent system | Visual recommendation and search | CNN-based | Shankar et al. [111]
Recommendation and Intelligent system | Recommendation system | CNN and Bi-LSTM based | Rosa et al. [91]
Recommendation and Intelligent system | Intelligent system for impaired patients | RL-based | Naeem et al. [79]
Recommendation and Intelligent system | Intelligent transportation system | CNN-based | Wang et al. [123]
Fig. 11 A general structure of transfer learning process, where knowledge from pre-trained model is transferred into new DL model

Deep Reinforcement Learning (DRL)

Reinforcement learning takes a different approach to solving the sequential decision-making problem than other approaches we have discussed so far. The concepts of an environment and an agent are often introduced first in reinforcement learning. The agent can perform a series of actions in the environment, each of which has an impact on the environment’s state and can result in possible rewards (feedback): “positive” for good sequences of actions that result in a “good” state, and “negative” for bad sequences of actions that result in a “bad” state. The purpose of reinforcement learning is to learn good action sequences through interaction with the environment, typically referred to as a policy.

Deep reinforcement learning (DRL or deep RL) [9] integrates neural networks with a reinforcement learning architecture to allow the agents to learn the appropriate actions in a virtual environment, as shown in Fig. 12. In the area of reinforcement learning, model-based RL is based on learning a transition model that enables modeling of the environment without interacting with it directly, whereas model-free RL methods learn directly from interactions with the environment. Q-learning is a popular model-free RL technique for determining the best action-selection policy for any (finite) Markov Decision Process (MDP) [86, 97]. MDP is a mathematical framework for modeling decisions based on state, action, and rewards [86]. In addition, Deep Q-Networks, Double DQN, Bi-directional Learning, Monte Carlo Control, etc. are used in the area [50, 97]. DRL methods incorporate DL models, e.g., Deep Neural Networks (DNN), based on the MDP principle [71], as policy and/or value function approximators. CNN, for example, can be used as a component of RL agents to learn directly from raw, high-dimensional visual inputs. In the real world, DRL-based solutions can be used in several application areas including robotics, video games, natural language processing, computer vision, and relevant others.

Fig. 12 Schematic structure of deep reinforcement learning (DRL) highlighting a deep neural network

Deep Learning Application Summary

During the past few years, deep learning has been successfully applied to numerous problems in many application areas. These include natural language processing, sentiment analysis, cybersecurity,
business, virtual assistants, visual recognition, healthcare, robotics, and many more. In Fig. 13, we have summarized several potential real-world application areas of deep learning. Various deep learning techniques according to our presented taxonomy in Fig. 6 that includes discriminative learning, generative learning, as well as hybrid models, discussed earlier, are employed in these application areas. In Table 1, we have also summarized
various deep learning tasks and techniques that are used to solve the relevant tasks in several real-world application areas. Overall, from Fig. 13 and Table 1, we can conclude that the future prospects of deep learning modeling in real-world application areas are huge and there is ample scope for further work. In the next section, we also summarize the research issues in deep learning modeling and point out the potential aspects for future generation DL modeling.

Fig. 13 Several potential real-world application areas of deep learning

Research Directions and Future Aspects

While existing methods have established a solid foundation for deep learning systems and research, this section outlines the below ten potential future research directions based on our study.

– Automation in Data Annotation According to the existing literature, discussed in Section 3, most of the deep learning models are trained through publicly available datasets that are annotated. However, to build a system for a new problem domain or recent data-driven system, raw data from relevant sources need to be collected. Thus, data annotation, e.g., categorization, tagging, or labeling of a large amount of raw data, is important for building discriminative deep learning models or supervised tasks, which is challenging. A technique with the capability of automatic and dynamic data annotation, rather than manual annotation or hiring annotators, particularly for large datasets, could be more effective for supervised learning as well as minimizing human
effort. Therefore, a more in-depth investigation of data collection and annotation methods, or designing an unsupervised learning-based solution could be one of the primary research directions in the area of deep learning modeling.
– Data Preparation for Ensuring Data Quality As discussed earlier throughout the paper, deep learning algorithms are highly affected by data quality and availability for training, and consequently so is the resultant model for a particular problem domain. Thus, deep learning models may become worthless or yield decreased accuracy if the data is bad, such as data sparsity, non-representative, poor-quality, ambiguous values, noise, data imbalance, irrelevant features, data inconsistency, insufficient quantity, and so on for training. Consequently, such issues in data can lead to poor processing and inaccurate findings, which is a major problem while discovering insights from data. Thus deep learning models also need to adapt to such rising issues in data, to capture approximated information from observations. Therefore, effective data pre-processing techniques need to be designed according to the nature of the data problem and characteristics, to handle such emerging challenges, which could be another research direction in the area.
– Black-box Perception and Proper DL/ML Algorithm Selection In general, it’s difficult to explain how a deep learning result is obtained or how a particular model reaches its ultimate decisions. Although DL models achieve significant performance while learning from large datasets, as discussed in Section 2, this “black-box” perception of DL modeling typically represents weak statistical interpretability that could be a major issue in the area. On the other hand, ML algorithms, particularly rule-based machine learning techniques, provide explicit logic rules (IF-THEN) for making decisions that are easier to interpret, update or delete according to the target applications [97, 100, 105]. If the wrong learning algorithm is chosen, unanticipated results may occur, resulting in a loss of effort as well as the model’s efficacy and accuracy. Thus by taking into account the performance, complexity, model accuracy, and applicability, selecting an appropriate model for the target application is challenging, and in-depth analysis is needed for better understanding and decision making.
– Deep Networks for Supervised or Discriminative Learning According to our designed taxonomy of deep learning techniques, as shown in Fig. 6, discriminative architectures mainly include MLP, CNN, and RNN, along with their variants that are applied widely in various application domains. However, designing new techniques or their variants of such discriminative techniques by taking into account model optimization, accuracy, and applicability, according to the target real-world application and the nature of the data, could be a novel contribution, which can also be considered as a major future aspect in the area of supervised or discriminative learning.
– Deep Networks for Unsupervised or Generative Learning As discussed in Section 3, unsupervised learning or generative deep learning modeling is one of the major tasks in the area, as it allows us to characterize the high-order correlation properties or features in data, or generate a new representation of data through exploratory analysis. Moreover, unlike supervised learning [97], it does not require labeled data due to its capability to derive insights directly from the data as well as data-driven decision making. Consequently, it can be used as preprocessing for supervised learning or discriminative modeling as well as semi-supervised learning tasks, which helps ensure learning accuracy and model efficiency. According to our designed taxonomy of deep learning techniques, as shown in Fig. 6, generative techniques mainly include GAN, AE, SOM, RBM, DBN, and their variants. Thus, designing new techniques or their variants for an effective data modeling or representation according to the target real-world application could be a novel contribution, which can also be considered as a major future aspect in the area of unsupervised or generative learning.
– Hybrid/Ensemble Modeling and Uncertainty Handling According to our designed taxonomy of DL techniques, as shown in Fig. 6, this is considered as another major category in deep learning tasks. As hybrid modeling enjoys the benefits of both generative and discriminative learning, an effective hybridization can outperform others in terms of performance as well as uncertainty handling in high-risk applications. In Section 3, we have summarized various types of hybridization, e.g., AE+CNN/SVM. Since a group of neural networks is trained with distinct parameters or with separate sub-sampling training datasets, hybridization or ensembles of such techniques, i.e., DL with DL/ML, can play a key role in the area. Thus designing effective blended discriminative and generative models accordingly, rather than a naive method, could be an important research opportunity to solve various real-world
issues including semi-supervised learning tasks and model uncertainty.
– Dynamism in Selecting Threshold/Hyper-parameters Values, and Network Structures with Computational Efficiency In general, the relationship among performance, model complexity, and computational requirements is a key issue in deep learning modeling and applications. A combination of algorithmic advancements with improved accuracy as well as maintaining computational efficiency, i.e., achieving the maximum throughput while consuming the least amount of resources, without significant information loss, can lead to a breakthrough in the effectiveness of deep learning modeling in future real-world applications. The concept of incremental approaches or recency-based learning [100] might be effective in several cases depending on the nature of target applications. Moreover, assuming the network structures with a static number of nodes and layers, hyper-parameters values or threshold settings, or selecting them by the trial-and-error process may not be effective in many cases, as it can be changed due to the changes in data. Thus, a data-driven approach to select them dynamically could be more effective while building a deep learning model in terms of both performance and real-world applicability. Such type of data-driven automation can lead to future generation deep learning modeling with additional intelligence, which could be a significant future aspect in the
area as well as an important research direction to contribute.
– Lightweight Deep Learning Modeling for Next-Generation Smart Devices and Applications In recent years, the Internet of Things (IoT) consisting of billions of intelligent and communicating things and mobile communications technologies have become popular to detect and gather human and environmental information (e.g., geo-information, weather data, bio-data, human behaviors, and so on) for a variety of intelligent services and applications. Every day, these ubiquitous smart things or devices generate large amounts of data, requiring rapid data processing on a variety of smart mobile devices [72]. Deep learning technologies can be incorporated to discover underlying properties and to effectively handle such large amounts of sensor data for a variety of IoT applications including health monitoring and disease analysis, smart cities, traffic flow prediction and monitoring, smart transportation, manufacture inspection, fault assessment, smart industry or Industry 4.0, and many more. Although deep learning techniques discussed in Section 3 are considered as powerful tools for processing big data, lightweight modeling is important for resource-constrained devices, due to their high computational cost and considerable memory overhead. Thus several techniques such as optimization, simplification, compression, pruning, generalization, important feature extraction, etc. might be helpful in several cases. Therefore, constructing the lightweight deep learning techniques based on a baseline network architecture to adapt the DL model for next-generation mobile, IoT, or resource-constrained devices and applications, could be considered as a significant future aspect in the area.
– Incorporating Domain Knowledge into Deep Learning Modeling Domain knowledge, as opposed to general knowledge or domain-independent knowledge, is knowledge of a specific, specialized topic or field. For instance, in terms of natural language processing, the properties of the English language typically differ from other languages like Bengali, Arabic, French, etc. Thus integrating domain-based constraints into the deep learning model could produce better results for such a particular purpose. For instance, a task-specific feature extractor considering domain knowledge in smart manufacturing for fault diagnosis can resolve the issues in traditional deep-learning-based methods [28]. Similarly, domain knowledge in medical image analysis [58], financial sentiment analysis [49], cybersecurity analytics [94, 103] as well as conceptual data model in which semantic information (i.e., meaningful for a system, rather than merely correlational) [45, 121, 131] is included, can play a vital role in the area. Transfer learning could be an effective way to get started on a new challenge with domain knowledge. Moreover, contextual information such as spatial, temporal, social, environmental contexts [92, 104, 108] can also play an important role to incorporate context-aware computing with domain knowledge for smart decision making as well as building adaptive and intelligent context-aware systems. Therefore understanding domain knowledge and effectively incorporating it into the deep learning model could be another research direction.
– Designing General Deep Learning Framework for Target Application Domains One promising research direction for deep learning-based solutions is to develop a general framework that can handle data diversity, dimensions, stimulation types, etc. The general framework would require two key capabilities: the attention mechanism that focuses on the most valuable parts of input signals, and the ability to capture latent features that enables the framework to capture the distinctive and informative features. Attention models have been a popular research topic because of their intuition, versatility, and interpretability, and employed in various
application areas like computer vision, natural lan above-mentioned concerns and tackle real-world
guage processing, text or image classifcation, problems in a variety of application areas. This can
sentiment analysis, recommender systems, user also help the researchers con
profling, etc [13, 80]. Attention mechanism can be duct a thorough analysis of the application’s hidden
implemented based on learning algorithms such and unexpected challenges to produce more reliable
as reinforcement learning that is capable of fnding and realis tic outcomes. Overall, we can conclude that
the most useful part through a policy search [133, addressing the above-mentioned issues and
134]. Similarly, CNN can be integrated with contributing to proposing efec tive and efcient
suitable attention mechanisms to form a general techniques could lead to “Future Genera tion DL”
classif cation framework, where CNN can be used modeling as well as more intelligent and automated
as a feature learning tool for capturing features in applications.
various levels and ranges. Thus, designing a
general deep learning framework considering
attention as well as a latent feature for target Concluding Remarks
application domains could be another area to
contribute. In this article, we have presented a structured and
compre hensive view of deep learning technology,
To summarize, deep learning is a fairly open topic to which is consid ered a core part of artifcial intelligence
which academics can contribute by developing new as well as data sci ence. It starts with a history of
methods or improving existing methods to handle the artifcial neural networks and
moves to recent deep learning techniques and breakthroughs in different applications. Then, the
key algorithms in this area, as well as deep neural network modeling in various dimensions, are
explored. For this, we have also presented a taxonomy considering the variations of deep learning
tasks and how they are used for different purposes. In our comprehensive study, we have taken
into account not only the deep networks for supervised or discriminative learning but also the
deep networks for unsupervised or generative learning, and hybrid learning that can be used to
solve a variety of real-world issues according to the nature of problems.

Deep learning, unlike traditional machine learning and data mining algorithms, can produce
extremely high-level data representations from enormous amounts of raw data. As a result, it has
provided an excellent solution to a variety of real-world problems. A successful deep learning
technique must possess the relevant data-driven modeling depending on the characteristics of raw
data. The sophisticated learning algorithms then need to be trained through the collected data
and knowledge related to the target application before the system can assist with intelligent
decision-making. Deep learning has been shown to be useful in a wide range of applications and
research areas such as healthcare, sentiment analysis, visual recognition, business intelligence,
cybersecurity, and many more that are summarized in the paper.

Finally, we have summarized and discussed the challenges faced, the potential research
directions, and future aspects in the area. Although deep learning is considered a black-box
solution for many applications due to its poor reasoning and interpretability, addressing the
challenges or future aspects that are identified could lead to future generation deep learning
modeling and smarter systems. This can also help researchers conduct in-depth analysis to produce
more reliable and realistic outcomes. Overall, we believe that our study on neural networks and
deep learning-based advanced analytics points in a promising direction and can be utilized as a
reference guide for future research and implementations in relevant application domains by both
academic and industry professionals.

Declarations

Conflict of interest The author declares no conflict of interest.

References

1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on operating systems design and implementation (OSDI 16), 2016; p. 265–283.
2. Abdel-Basset M, Hawash H, Chakrabortty RK, Ryan M. Energy-net: a deep learning approach for smart energy management in iot-based smart cities. IEEE Internet of Things J. 2021.
3. Aggarwal A, Mittal M, Battineni G. Generative adversarial network: an overview of theory and applications. Int J Inf Manag Data Insights. 2021; p. 100004.
4. Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K. Deep learning approach combining sparse autoencoder with svm for network intrusion detection. IEEE Access. 2018;6:52843–56.
5. Ale L, Sheta A, Li L, Wang Y, Zhang N. Deep learning based plant disease detection for smart agriculture. In: 2019 IEEE Globecom Workshops (GC Wkshps), 2019; p. 1–6. IEEE.
6. Amarbayasgalan T, Lee JY, Kim KR, Ryu KH. Deep autoencoder based neural networks for coronary heart disease risk prediction. In: Heterogeneous data management, polystores, and analytics for healthcare. Springer; 2019. p. 237–48.
7. Anuradha J, et al. Big data based stock trend prediction using deep cnn with reinforcement-lstm model. Int J Syst Assur Eng Manag. 2021; p. 1–11.
8. Aqib M, Mehmood R, Albeshri A, Alzahrani A. Disaster management in smart cities by forecasting traffic plan using deep learning and gpus. In: International Conference on smart cities, infrastructure, technologies and applications. Springer; 2017. p. 139–54.
9. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.
10. Aslan MF, Unlersen MF, Sabanci K, Durdu A. Cnn-based transfer learning-bilstm network: a novel approach for covid-19 infection detection. Appl Soft Comput. 2021;98:106912.
11. Bu F, Wang X. A smart agriculture iot system based on deep reinforcement learning. Futur Gener Comput Syst. 2019;99:500–7.
12. Chang W-J, Chen L-B, Hsu C-H, Lin C-P, Yang T-C. A deep learning-based intelligent medicine recognition system for chronic patients. IEEE Access. 2019;7:44441–58.
13. Chaudhari S, Mithal V, Polatkan Gu, Ramanath R. An attentive survey of attention models. arXiv preprint arXiv:1904.02874, 2019.
14. Chaudhuri N, Gupta G, Vamsi V, Bose I. On the platform but will they buy? predicting customers' purchase behavior using deep learning. Decis Support Syst. 2021; p. 113622.
15. Chen D, Wawrzynski P, Lv Z. Cyber security in smart cities: a review of deep learning-based applications and case studies. Sustain Cities Soc. 2020; p. 102655.
16. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using rnn encoder decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
17. Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2017; p. 1251–258.
18. Chung J, Gulcehre C, Cho KH, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
19. Coelho IM, Coelho VN, da Eduardo J, Luz S, Ochi LS, Guimarães FG, Rios E. A gpu deep learning metaheuristic based model for time series forecasting. Appl Energy. 2017;201:412–8.
20. Da'u A, Salim N. Recommendation system based on deep learning methods: a systematic review and new directions. Artif Intel Rev. 2020;53(4):2709–48.
21. Deng L. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process. 2014; p. 3.
22. Deng L, Dong Yu. Deep learning: methods and applications. Found Trends Signal Process. 2014;7(3–4):197–387.
23. Deng S, Li R, Jin Y, He H. Cnn-based feature cross and classifier for loan default prediction. In: 2020 International Conference on image, video processing and artificial intelligence, volume 11584, page 115841K. International Society for Optics and Photonics, 2020.
24. Dhyani M, Kumar R. An intelligent chatbot using deep learning with bidirectional rnn and attention model. Mater Today Proc. 2021;34:817–24.
25. Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.
26. Du K-L, Swamy MNS. Neural networks and statistical learning. Berlin: Springer Science & Business Media; 2013.
27. Dupond S. A thorough review on the current advance of neural network structures. Annu Rev Control. 2019;14:200–30.
28. Feng J, Yao Y, Lu S, Liu Y. Domain knowledge-based deep broad learning framework for fault diagnosis. IEEE Trans Ind Electron. 2020;68(4):3454–64.
29. Garg S, Kaur K, Kumar N, Rodrigues JJPC. Hybrid deep learning-based anomaly detection scheme for suspicious flow detection in sdn: a social multimedia perspective. IEEE Trans Multimed. 2019;21(3):566–78.
30. Géron A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media; 2019.
31. Goodfellow I, Bengio Y, Courville A. Deep learning, vol. 1. Cambridge: MIT Press; 2016.
32. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014; p. 2672–680.
33. Google trends. 2021. https://trends.google.com/trends/.
34. Gruber N, Jockisch A. Are gru cells more specific and lstm cells more sensitive in motive classification of text? Front Artif Intell. 2020;3:40.
35. Gu B, Ge R, Chen Y, Luo L, Coatrieux G. Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks. IEEE Trans Ind Electron. 2020.
36. Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.
37. Haykin S. Neural networks and learning machines, 3/E. London: Pearson Education; 2010.
38. He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.
39. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2016; p. 770–78.
40. Hinton GE. Deep belief networks. Scholarpedia. 2009;4(5):5947.
41. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
42. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
43. Huang C-J, Kuo P-H. A deep cnn-lstm model for particulate matter (pm2.5) forecasting in smart cities. Sensors. 2018;18(7):2220.
44. Huang H-H, Fukuda M, Nishida T. Toward rnn based micro non-verbal behavior generation for virtual listener agents. In: International Conference on human-computer interaction, 2019; p. 53–63. Springer.
45. Hulsebos M, Hu K, Bakker M, Zgraggen E, Satyanarayan A, Kraska T, Demiralp Ça, Hidalgo C. Sherlock: a deep learning approach to semantic data type detection. In: Proceedings of the 25th ACM SIGKDD International Conference on knowledge discovery & data mining, 2019; p. 1500–508.
46. Imamverdiyev Y, Abdullayeva F. Deep learning method for denial of service attack detection based on restricted Boltzmann machine. Big Data. 2018;6(2):159–69.
47. Islam MZ, Islam MM, Asraf A. A combined deep cnn-lstm network for the detection of novel coronavirus (covid-19) using x-ray images. Inf Med Unlock. 2020;20:100412.
48. Ismail WN, Hassan MM, Alsalamah HA, Fortino G. Cnn-based health model for regular health factors analysis in internet-of medical things environment. IEEE Access. 2020;8:52541–9.
49. Jangid H, Singhal S, Shah RR, Zimmermann R. Aspect-based financial sentiment analysis using deep learning. In: Companion Proceedings of the The Web Conference 2018, 2018; p. 1961–966.
50. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
51. Kameoka H, Li L, Inoue S, Makino S. Supervised determined source separation with multichannel variational autoencoder. Neural Comput. 2019;31(9):1891–914.
52. Karhunen J, Raiko T, Cho KH. Unsupervised deep learning: a short review. In: Advances in independent component analysis and learning machines. 2015; p. 125–42.
53. Kawde P, Verma GK. Deep belief network based affect recognition from physiological signals. In: 2017 4th IEEE Uttar Pradesh Section International Conference on electrical, computer and electronics (UPCON), 2017; p. 587–92. IEEE.
54. Kim J-Y, Seok-Jun B, Cho S-B. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders. Inf Sci. 2018;460:83–102.
55. Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
56. Kingma DP, Welling M. An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691, 2019.
57. Kiran PKR, Bhasker B. Dnnrec: a novel deep learning based hybrid recommender system. Expert Syst Appl. 2020.
58. Kloenne M, Niehaus S, Lampe L, Merola A, Reinelt J, Roeder I, Scherf N. Domain-specific cues improve robustness of deep learning-based segmentation of ct volumes. Sci Rep. 2020;10(1):1–9.
59. Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
60. Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65.
61. Kök İ, Şimşek MU, Özdemir S. A deep learning model for air quality prediction in smart cities. In: 2017 IEEE International Conference on Big Data (Big Data), 2017; p. 1983–990. IEEE.
62. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012; p. 1097–105.
63. Latif S, Rana R, Younis S, Qadir J, Epps J. Transfer learning for improving speech emotion classification accuracy. arXiv preprint arXiv:1801.06353, 2018.
64. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
65. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
66. Li B, François-Lavet V, Doan T, Pineau J. Domain adversarial reinforcement learning. arXiv preprint arXiv:2102.07097, 2021.
67. Li T-HS, Kuo P-H, Tsai T-N, Luan P-C. Cnn and lstm based facial expression analysis model for a humanoid robot. IEEE Access. 2019;7:93998–4011.
68. Liu C, Cao Y, Luo Y, Chen G, Vokkarane V, Yunsheng M, Chen S, Hou P. A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure. IEEE Trans Serv Comput. 2017;11(2):249–61.
69. Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017;234:11–26.
70. López AU, Mateo F, Navío-Marco J, Martínez-Martínez JM, Gómez-Sanchís J, Vila-Francés J, Serrano-López AJ. Analysis of computer user behavior, security incidents and fraud using self-organizing maps. Comput Secur. 2019;83:38–51.
71. Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Application of deep reinforcement learning to intrusion detection for supervised problems. Expert Syst Appl. 2020;141:112963.
72. Ma X, Yao T, Menglan H, Dong Y, Liu W, Wang F, Liu J. A survey on deep learning empowered iot applications. IEEE Access. 2019;7:181721–32.
73. Makhzani A, Frey B. K-sparse autoencoders. arXiv preprint arXiv:1312.5663, 2013.
74. Mandic D, Chambers J. Recurrent neural networks for prediction: learning algorithms, architectures and stability. Hoboken: Wiley; 2001.
75. Marlin B, Swersky K, Chen B, Freitas N. Inductive principles for restricted boltzmann machine learning. In: Proceedings of the Thirteenth International Conference on artificial intelligence and statistics, p. 509–16. JMLR Workshop and Conference Proceedings, 2010.
76. Masud M, Muhammad G, Alhumyani H, Alshamrani SS, Cheikhrouhou O, Ibrahim S, Hossain MS. Deep learning-based intelligent face recognition in iot-cloud environment. Comput Commun. 2020;152:215–22.
77. Memisevic R, Hinton GE. Learning to represent spatial transformations with factored higher-order boltzmann machines. Neural Comput. 2010;22(6):1473–92.
78. Minaee S, Azimi E, Abdolrashidi AA. Deep-sentiment: sentiment analysis using ensemble of cnn and bi-lstm models. arXiv preprint arXiv:1904.04206, 2019.
79. Naeem M, Paragliola G, Coronato A. A reinforcement learning and deep learning based intelligent system for the support of impaired patients in home treatment. Expert Syst Appl. 2021;168:114285.
80. Niu Z, Zhong G, Hui Yu. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62.
81. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2009;22(10):1345–59.
82. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. Pytorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019;32:8026–37.
83. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.
84. Pi Y, Nath ND, Behzadan AH. Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv Eng Inf. 2020;43:101009.
85. Piccialli F, Giampaolo F, Prezioso E, Crisci D, Cuomo S. Predictive analytics for smart parking: A deep learning approach in forecasting of iot data. ACM Trans Internet Technol (TOIT). 2021;21(3):1–21.
86. Puterman ML. Markov decision processes: discrete stochastic dynamic programming. Hoboken: Wiley; 2014.
87. Qu X, Lin Y, Kai G, Linru M, Meng S, Mingxing K, Mu L, editors. A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob Netw Appl. 2019; p. 1–22.
88. Rahman MW, Tashfia SS, Islam R, Hasan MM, Sultan SI, Mia S, Rahman MM. The architectural design of smart blind assistant using iot with deep learning paradigm. Internet of Things. 2021;13:100344.
89. Ren J, Green M, Huang X. From traditional to deep learning: fault diagnosis for autonomous vehicles. In: Learning control. Elsevier. 2021; p. 205–19.
90. Rifai S, Vincent P, Muller X, Glorot X, Bengio Y. Contractive auto-encoders: Explicit invariance during feature extraction. In: Icml, 2011.
91. Rosa RL, Schwartz GM, Ruggiero WV, Rodríguez DZ. A knowledge-based recommendation system that includes sentiment analysis and deep learning. IEEE Trans Ind Inf. 2018;15(4):2124–35.
92. Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.
93. Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet of Things. 2019;5:180–93.
94. Sarker IH. Cyberlearning: effectiveness analysis of machine learning security modeling to detect cyber-anomalies and multi attacks. Internet of Things. 2021;14:100393.
95. Sarker IH. Data science and analytics: an overview from data driven smart computing, decision-making and applications perspective. SN Comput Sci. 2021.
96. Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021;2(3):1–16.
97. Sarker IH. Machine learning: Algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):1–21.
98. Sarker IH, Abushark YB, Alsolami F, Khan AI. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.
99. Sarker IH, Abushark YB, Khan AI. Contextpca: Predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.
100. Sarker IH, Colman A, Han J. Recencyminer: mining recency based personalized behavior from contextual smartphone data. J Big Data. 2019;6(1):1–21.
101. Sarker IH, Colman A, Han J, Khan AI, Abushark YB, Salah K. Behavdt: a behavioral decision tree learning to build user centric context-aware predictive model. Mob Netw Appl. 2020;25(3):1151–61.
102. Sarker IH, Colman A, Kabir MA, Han J. Individualized time series segmentation for mining mobile phone user behavior. Comput J. 2018;61(3):349–68.
103. Sarker IH, Furhad MH, Nowrozy R. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021;2(3):1–18.
104. Sarker IH, Hoque MM, Uddin MK. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl. 2021;26(1):285–303.
105. Sarker IH, Kayes ASM. Abc-ruleminer: User behavioral rule based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020;168:102762.
106. Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.
107. Sarker IH, Kayes ASM, Watters P. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.
108. Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet of Things. 2019;8:100106.
109. Satt A, Rozenberg S, Hoory R. Efficient emotion recognition from speech using deep learning on spectrograms. In: Interspeech, 2017; p. 1089–1093.
110. Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y. Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinf. 2018;16(6):2089–100.
111. Sujay Narumanchi H, Ananya Pramod Kompalli Shankar A, Devashish CK. Deep learning based large scale visual recommendation and search for e-commerce. arXiv preprint arXiv:1703.02344, 2017.
112. Shao X, Kim CS. Multi-step short-term power consumption forecasting using multi-channel lstm with time location considering customer behavior. IEEE Access. 2020;8:125263–73.
113. Siami-Namini S, Tavakoli N, Namin AS. The performance of lstm and bilstm in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data), 2019; p. 3285–292. IEEE.
114. Ślusarczyk B. Industry 4.0: are we ready? Pol J Manag Stud. 2018; p. 17.
115. Sumathi P, Subramanian R, Karthikeyan VV, Karthik S. Soil monitoring and evaluation system using edl-asqe: enhanced deep learning model for iot smart agriculture network. Int J Commun Syst. 2021; p. e4859.
116. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, 2015; p. 1–9.
117. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C. A survey on deep transfer learning. In: International Conference on artificial neural networks, 2018; p. 270–279. Springer.
118. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Netw. 2000;11(3):586–600.
119. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A, Bottou L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010;11(12).
120. Wang J, Liang-Chih Yu, Robert Lai K, Zhang X. Tree-structured regional cnn-lstm model for dimensional sentiment analysis. IEEE/ACM Trans Audio Speech Lang Process. 2019;28:581–91.
121. Wang S, Wan J, Li D, Liu C. Knowledge reasoning with semantic data for real-time data processing in smart factory. Sensors. 2018;18(2):471.
122. Wang W, Zhao M, Wang J. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J Ambient Intell Humaniz Comput. 2019;10(8):3035–43.
123. Wang X, Liu J, Qiu T, Chaoxu M, Chen C, Zhou P. A real time collision prediction mechanism with deep learning for intelligent transportation system. IEEE Trans Veh Technol. 2020;69(9):9497–508.
124. Wang Y, Huang M, Zhu X, Zhao L. Attention-based lstm for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on empirical methods in natural language processing, 2016; p. 606–615.
125. Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.
126. Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big Data. 2016;3(1):9.
127. Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cyber security. IEEE Access. 2018;6:35365–81.
128. Xu W, Sun H, Deng C, Tan Y. Variational autoencoder for semi supervised text classification. In: Thirty-First AAAI Conference on artificial intelligence, 2017.
129. Xue Q, Chuah MC. New attacks on rnn based healthcare learning system and their detections. Smart Health. 2018;9:144–57.
130. Yousefi-Azar M, Hamey L. Text summarization using unsupervised deep learning. Expert Syst Appl. 2017;68:93–105.
131. Yuan X, Shi J, Gu L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst Appl. 2020; p. 114417.
132. Zhang G, Liu Y, Jin X. A survey of autoencoder-based recommender systems. Front Comput Sci. 2020;14(2):430–50.
133. Zhang X, Yao L, Huang C, Wang S, Tan M, Long Gu, Wang C. Multi-modality sensor data classification with selective attention. arXiv preprint arXiv:1804.05493, 2018.
134. Zhang X, Yao L, Wang X, Monaghan J, Mcalpine D, Zhang Y. A survey on deep learning based brain computer interface: recent advances and new frontiers. arXiv preprint arXiv:1905.04149, 2019; p. 66.
135. Zhang Y, Zhang P, Yan Y. Attention-based lstm with multi-task learning for distant speech recognition. In: Interspeech, 2017; p. 3857–861.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.