Artificial Neural Networks-Based Machine Learning For Wireless Networks: A Tutorial

Mingzhe Chen∗,†,‡, Ursula Challita§, Walid Saad¶, Changchuan Yin∗, and Mérouane Debbah‖
∗ Beijing Laboratory of Advanced Information Network, Beijing University of Posts and Telecommunications, Beijing, China 100876,
Email: chenmingzhe@bupt.edu.cn and ccyin@bupt.edu.cn.
† The Future Network of Intelligence Institute, The Chinese University of Hong Kong, Shenzhen, China.
‡ Department of Electrical Engineering, Princeton University, Princeton, NJ, USA.
§ School of Informatics, The University of Edinburgh, Edinburgh, UK. Email: ursula.challita@ed.ac.uk.
¶ Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA, Email: walids@vt.edu.
‖ Mathematical and Algorithmic Sciences Lab, Huawei France R&D, Paris, France, Email: merouane.debbah@huawei.com.
Abstract—In order to effectively provide ultra-reliable low-latency communications and pervasive connectivity for Internet of Things (IoT) devices, next-generation wireless networks can leverage intelligent, data-driven functions enabled by the integration of machine learning notions across the wireless core and edge infrastructure. In this context, this paper provides a comprehensive tutorial that overviews how artificial neural network (ANN)-based machine learning algorithms can be employed for solving various wireless networking problems. For this purpose, we first present a detailed overview of a number of key types of ANNs that are pertinent to wireless networking applications, including recurrent, spiking, and deep neural networks. For each type of ANN, we present the basic architecture as well as specific examples that are particularly important for wireless network design. Such examples include echo state networks, liquid state machines, and long short-term memory. Then, we provide an in-depth overview of the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality applications over wireless networks as well as edge computing and caching. For each individual application, we present the main motivation for using ANNs along with the associated challenges, provide a detailed example for a use case scenario, and outline future works that can be addressed using ANNs. In a nutshell, this article constitutes the first holistic tutorial on the development of ANN-based machine learning techniques tailored to the needs of future wireless networks.
This work was supported in part by the National Natural Science Foundation of China under Grant 61629101, Grant 61871041, and Grant 61671086, in part by Beijing Natural Science Foundation and Municipal Education Committee Joint Funding Project under Grant KZ201911232046, in part by the 111 Project under Grant B17007, in part by grants No. ZDSYS201707251409055, No. 2017ZT07X152, No. 2018B030338001, and No. 2018YFB1800800, in part by the U.S. National Science Foundation under Grants CNS-1460316, CNS-1836802, and IIS-1633363.
Digital Object Identifier: 10.1109/COMST.2019.2926625
1553-877X © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
I. INTRODUCTION
The wireless networking landscape is undergoing a major revolution. The smartphone-centric networks of yesteryear are gradually morphing into an Internet of Things (IoT) ecosystem [1]–[3] that integrates a heterogeneous mix of wireless-enabled devices ranging from smartphones to drones, connected vehicles, wearables, sensors, and virtual reality devices. This unprecedented transformation will not only drive an exponential growth in wireless traffic in the foreseeable future, but it will also lead to the emergence of new and untested wireless service use cases that substantially differ from conventional multimedia or voice-based services [4]. For instance, beyond the need for high data rates – which has been the main driver of the wireless network evolution in the past decade – next-generation wireless networks will also have to deliver ultra-reliable, low-latency communication [4], [5] that adapts, in real time, to the dynamics of the IoT users and the IoT's physical environment. For example, drones and connected vehicles [6] will place autonomy at the heart of the IoT. This, in turn, will necessitate the deployment of ultra-reliable wireless links that can provide real-time, low-latency control for such autonomous systems [7]–[9]. Meanwhile, in tomorrow's wireless networks, large volumes of data will be collected, periodically and in real time, across a massive number of sensing and wearable devices that monitor physical environments. Such massive short-packet transmissions will lead to substantial traffic over the wireless uplink, which has traditionally been much less congested than the downlink [10]. This same wireless network must also support cloud-based gaming [11], immersive virtual reality services [12], real-time HD streaming, and conventional multimedia services. This ultimately creates a radically different networking environment whose novel applications and their diverse quality-of-service (QoS) and reliability requirements mandate a fundamental change in the way in which wireless networks are modeled, analyzed, designed, and optimized.
The need to cope with this ongoing and rapid evolution of wireless services has led to a considerable body of research that investigates what the optimal cellular network architecture will be within the context of the emerging fifth generation (5G) wireless networks (e.g., see [13] and the references therein). While the main ingredients for 5G – such as dense small cell deployments, millimeter wave (mmWave) communications, and device-to-device (D2D) communications – have been identified, integrating them into a truly harmonious wireless system that can meet the IoT challenges requires instilling intelligent functions across both the edge and the core of the network. These intelligent functions must be able to adaptively exploit the wireless system resources and the generated data, in order to optimize the network operations and guarantee, in real time, the QoS needs of emerging wireless and IoT services. Such mobile edge and core intelligence can potentially be realized by integrating fundamental notions of machine learning (ML) [14], in particular, artificial neural
network (ANN)-based ML approaches, across the wireless infrastructure and the end-user devices. ANNs [15] are a computational, nonlinear machine learning framework that can be used for supervised learning, unsupervised learning [16], semi-supervised learning [17], and reinforcement learning [18] in various wireless networking scenarios. Hereinafter, ML is used to refer to ANN-based ML.
A. Role of ANNs in Wireless Networks
ML tools are among the most important tools for endowing wireless networks with intelligent functions, as evidenced by the wide adoption of ML in a myriad of application domains [19]–[24]. In the context of wireless networks, ML will enable any wireless device to actively and intelligently monitor its environment by learning and predicting the evolution of the various environmental features (e.g., wireless channel dynamics, traffic patterns, network composition, content requests, user context, etc.) and to proactively take actions that maximize the chances of success for some predefined goal, which, in a wireless system, pertains to some sought-after quality-of-service. ML enables the network infrastructure to learn from the wireless networking environment and take adaptive network optimization actions. Consequently, ML is expected to play several roles in the next generation of wireless networks [25]–[29].
First, the most natural application of ML in a wireless system is to exploit intelligent and predictive data analytics to enhance situational awareness and the overall network operations [25]. In this context, ML will provide the wireless network with the ability to parse through massive amounts of data, generated from multiple sources that range from wireless channel measurements and sensor readings to drone and surveillance images, in order to create a comprehensive operational map of the massive number of devices within the network [30]. This map can, in turn, be exploited to optimize various functions, such as fault monitoring and user tracking, across the wireless network.
Second, beyond its powerful intelligent and predictive data analytics functions, ML will be a major driver of intelligent and data-driven wireless network optimization [30]. For instance, ML tools will enable the introduction of intelligent resource management tools that can be used to address a variety of problems ranging from cell association and radio access technology selection to frequency allocation, spectrum management, power control, and intelligent beamforming. In contrast to conventional distributed optimization techniques, which are often run iteratively in an offline or semi-offline manner [31], ML-guided resource management mechanisms will be able to operate in a fully online manner by learning, in real time, the states of the wireless environment and the network's users. Such mechanisms will therefore be able to continuously improve their own performance over time which, in turn, will enable more intelligent and dynamic network decision making. Such ML-driven decision making is essential for much of the envisioned IoT and 5G services, particularly those that require real-time, low-latency operation, such as autonomous driving, drone guidance, and industrial control. In fact, if properly designed, ML optimization algorithms will provide inherently self-organizing, self-healing, and self-optimizing solutions for a broad range of problems within the context of network optimization and resource management. Such ML-driven self-organizing solutions are particularly apropos for ultra-dense wireless networks in which classical centralized and distributed optimization approaches can no longer cope with the scale and the heterogeneity of the network.
Third, beyond its system-level functions, ML can play a key role at the physical layer of a wireless network [32]. As shown in [32]–[37], ML tools can be used to redefine the way in which physical layer functions, such as coding and modulation, are designed, at both the transmitter and the receiver, within a generic communication system. Such an ML-driven approach has been shown [32]–[37] to be promising for delivering lower bit error rates and better robustness to wireless channel impairments.
Last, but not least, the rapid deployment of highly user-centric wireless services, such as virtual reality [38], in which the gap between the end-user and the network functions is almost minimal, strongly motivates the need for wireless networks that can track and adapt to human user behavior. In this regard, ML is perhaps the only tool that is capable of learning and mimicking human behavior, which will help the wireless network adapt its functions to its human users, thus creating a truly immersive environment and maximizing the overall quality-of-experience (QoE) of the users.
From the above discussion, we can narrow the role of ML in wireless networks down to two key functions: 1) intelligent and predictive data analytics, i.e., the ability of the wireless network to intelligently process large volumes of data, gathered from its devices, in order to analyze and predict the context of the wireless users and the wireless network's environmental states, thus enabling data-driven network-wide operational decisions, and 2) intelligent/self-organizing network control and optimization, i.e., the ability of the wireless network to dynamically learn the wireless environment, intelligently control the wireless network, and optimize its resources according to information smartly learned about the wireless environment and users' states.
Clearly, ML-based system operation is no longer a privilege, but rather a necessity for future wireless networks. ML-driven wireless network designs will pave the way towards an unimaginably rich set of new network functions and wireless services. For instance, even though 5G networks may not be fully ML capable, we envision that the subsequent, sixth generation (6G) [39] of wireless cellular networks will surely integrate important tools from ML, as evidenced by the recent development of intelligent mobile networks proposed by Huawei [40] and the "big innovation house" proposed by Qualcomm [41]. As such, the question is no longer if ML tools are going to be integrated into wireless networks but rather when such an integration will happen. In fact, the importance of an ML-enabled wireless network has already been motivated by a number of recent wireless networking paradigms, such as mobile edge caching, context-aware networking, and
mobile edge computing [42]–[49], the majority of which use ML techniques for various tasks, such as user behavior analysis and prediction, so as to determine which contents to cache and how to proactively allocate computing resources. However, despite their importance, these works have a narrow focus and do not provide any broad, tutorial-like material that can shed light on the challenges and opportunities associated with the use of ML for designing intelligent wireless networks.
B. Previous Works
A number of surveys and tutorials on ML applications in wireless networking have been published, for example, [3], [32], and [50]–[62]. Nevertheless, these works are limited in a number of ways. First, a majority of the existing works focus on a single ML technique (often the basics of deep learning [32], [50], and [56]–[58] or reinforcement learning [61]) and, as such, they do not capture the rich spectrum of available ML frameworks. Second, they mostly restrict their scope to a single wireless application such as sensor networks [53], cognitive radio networks [52], machine-to-machine (M2M) communication [3], physical layer design [32], software defined networking [55], Internet of Things [57], or self-organizing networks (SONs) [59], and, hence, they do not comprehensively cover the broad range of applications that can adopt ML in future networks. Third, a large number of the existing surveys and tutorials, such as [3], [51]–[53], [60], and [62]¹, are highly qualitative and do not provide an in-depth technical and quantitative description of the variety of existing ML tools that are suitable for wireless communications. Last, but not least, some surveys discuss the basics of neural networks with applications outside of wireless communications. However, these surveys are largely inaccessible to the wireless community, due to their reliance on examples from rather orthogonal disciplines such as computer vision. Moreover, most of the existing tutorials or surveys do not provide concrete guidelines on how, when, and where to use different artificial neural network (ANN) tools in the context of wireless networks. Finally, the introductory literature on ML for wireless networks, such as [3], [32], and [50]–[62], is largely sparse and fragmented and provides very few details on the role of ANNs, hence making it difficult to understand the intrinsic details of this broad and far-reaching area. Table I summarizes the differences between this tutorial and existing magazine, tutorial, and survey papers. From Table I, we can see that, compared to existing works such as [3], [32], [50]–[62], our tutorial provides a more detailed exposition of several types of ANNs that are particularly useful for wireless applications and explains, pedagogically and in detail, how to develop ANN-based ML solutions that endow wireless networks with intelligence and realize the full potential of 5G systems and beyond.
¹ The main difference between our tutorial and [62] is that the authors in [62] do not provide a comprehensive tutorial on how a broad range of ANNs can be used for solving the wireless communication problems related to drone-based communications, spectrum management with multiple radio access technologies, wireless virtual reality, mobile edge caching and computing, and the IoT.
C. Contributions
The main contribution of this paper is, thus, to provide a tutorial on the topic of ANN-based ML for wireless network design. The overarching goal is to give a tutorial on the emerging research contributions, from ANNs and wireless communications, that address the major opportunities and challenges in developing ANN-based ML frameworks for understanding and designing intelligent wireless systems. To the best of our knowledge, this is the first tutorial that gathers the state-of-the-art and emerging research contributions related to the use of ANNs for addressing a set of communication problems in beyond-5G wireless networks. Our main contributions include:
• We provide a comprehensive treatment of artificial neural networks, with an emphasis on how such tools can be used to create a new breed of ML-enabled wireless networks.
• After providing a brief introduction to the basics of ML, we provide a more detailed exposition of ANNs that are particularly useful for wireless applications, such as recurrent, spiking, and deep neural networks. For each type, we provide an introduction to its basic architecture and a specific use-case example. Other ANNs that can be used for wireless applications are also briefly mentioned where appropriate.
• Then, we discuss a broad range of wireless applications that can make use of ANNs. These applications include drone-based communications, spectrum management with multiple radio access technologies, wireless virtual reality, mobile edge caching and computing, and the IoT system, among others. For each application, we first outline the main rationale for applying ANNs while pinpointing illustrative scenarios. Then, we expose the challenges and opportunities brought forward by the use of ANNs in the specific wireless application. We complement this discussion with a detailed example drawn from the state of the art and, then, we conclude by shedding light on the potential future works within each specific area.
The rest of this tutorial is organized as follows (Fig. 1). In Section II, we introduce the basics of ANNs. Section III presents several key types of ANNs such as recurrent neural networks (RNNs), spiking neural networks (SNNs), and deep neural networks (DNNs). In Section IV, we discuss the use of ANNs for wireless communication and the corresponding challenges and opportunities. Finally, conclusions are drawn in Section V.
II. ARTIFICIAL NEURAL NETWORKS: PRELIMINARIES
ML was born from pattern recognition and it is essentially based on the premise that intelligent machines should be able to learn from and adapt to their environment through experience [19]–[24]. Due to the ever-growing volumes of generated data – across critical infrastructures, communication networks, and smart cities – and the need for intelligent data analytics, the use of ML algorithms has become ubiquitous [64] across many sectors, such as financial services, health care, technology, and entertainment. Using ML algorithms to
TABLE I
COMPARISON OF THIS WORK WITH EXISTING SURVEY AND TUTORIAL PAPERS. HERE, "CC", "CR", "DT", "PL", AND "DA" REFER TO CACHING AND COMPUTING, COGNITIVE RADIO NETWORK, DATA TRAFFIC DOMAIN, PHYSICAL LAYER DOMAIN, AND DATA ANALYTICS, RESPECTIVELY.
Existing Key Machine Learning Tools Key Applications

Works FNN RNN DNN ESN SNN DA √ RL
√ UAV VR CC SON Multi-RAT IoT √ PL CR DT
[3] √ √
[32] √
[50] √ √ √ √
[51] √ √ √
[52] √ √ √ √
[53] √ √ √
[54] √
[63] √ √
[55] √ √ √ √
[56] √ √ √ √ √ √
[57] √ √ √ √ √ √
[58] √ √ √ √ √
[59] √ √ √ √ √ √ √ √
[60] √ √ √ √ √ √ √ √ √
[61] √ √ √ √ √ √ √
[62] √ √ √ √ √ √ √ √ √ √ √ √
Our tutorial
Fig. 1. Organization of the tutorial. Section I: Introduction (A. Role of Machine Learning in Wireless Networks; B. Previous Works; C. Contributions). Section II: Artificial Neural Networks: Preliminaries (Brief Introduction to Machine Learning and Motivation Behind Artificial Neural Networks; Introduction to the Architecture of Artificial Neural Networks). Section III: Types of Artificial Neural Networks (A. Recurrent Neural Networks; B. Spiking Neural Networks; C. Deep Neural Networks). Section IV: Application of ANNs for Wireless Communications (A. An Overview of Using ANNs for Wireless Networks; B. Unmanned Aerial Vehicles-Based Wireless Networks; C. Wireless Virtual Reality; D. Mobile Edge Caching and Computing; E. Co-existence of Multiple Radio Access Technologies; F. Internet of Things; G. Summary). Section V: Conclusion.
build models that uncover connections and predict dynamic system or human behavior, system operators can make intelligent decisions without any human intervention. For example, in a wireless system such as the IoT, ML tools can be used for intelligent data analytics and edge intelligence. ML tasks often depend on the nature of their training data. In ML, training is the process that teaches the machine learning framework to achieve a specific goal, such as speech recognition. In other words, training enables the ML framework to discover potential relationships between its input data and its output data. There exist, in general, four key classes of learning approaches [65]: a) supervised learning, b) unsupervised learning, c) semi-supervised learning, and d) reinforcement learning.
Supervised learning algorithms are trained using labeled data [65]. When dealing with labeled data, both the input data and its desired output data are known to the system. Supervised learning is commonly used in applications that have enough historical data. In contrast, unsupervised learning tasks are trained without labeled data [65]. The goal of unsupervised learning is to explore the data and infer some structure directly from the unlabeled data. Semi-supervised learning is used for the same applications as supervised learning, but it uses both labeled and unlabeled data for training [65]. This type of learning can be used with methods such as classification, regression, and prediction. Semi-supervised learning is useful when the cost of a fully labeled training process is relatively high. In contrast to the
previously discussed learning methods that need to be trained with historical data, RL is trained on the data that an agent collects while interacting with its environment [65]. The goal of RL is to enable an agent to learn its environment and find the best strategy for acting in it. RL algorithms are particularly interesting in the context of wireless network optimization [66]. To perform supervised, unsupervised, semi-supervised, or RL tasks, several frameworks have been developed. Among those frameworks, ANNs [54] are arguably the most important, as they are able to mimic aspects of human intelligence. ANNs are inspired by the structure and functional aspects of biological neural networks, and they can learn from complicated or imprecise data [54]. Within the context of wireless communications, as will become clearer in later sections, ANNs can be used to investigate and predict network and user behavior so as to provide user information for solving diverse wireless networking problems such as cell association, spectrum management, computational resource allocation, and cached content replacement. Moreover, recent developments of smart devices and mobile applications have significantly increased the level at which human users interact with mobile systems. A trained ANN can be thought of as an "expert" in dealing with human-related data. Therefore, using ANNs to extract information from the user environment can provide a wireless network with the ability to predict the users' future behaviors and, hence, to design an optimal strategy to improve the resulting QoS and reliability.
There are various types of ANNs (see Fig. 2):
• Modular neural networks: A modular neural network (MNN) is composed of several independent ANNs and an intermediary. In an MNN, each ANN is used to complete one subtask of the entire task that the MNN performs. The intermediary processes the output of each independent ANN and generates the output of the MNN.
• Recurrent neural networks: RNNs are ANN architectures that allow neuron connections from a neuron in one layer to neurons in previous layers. Depending on the activation functions and connection methods used for their neurons, RNNs can be used to define several different architectures: a) stochastic neural networks, b) bidirectional neural networks (BNNs), c) fully recurrent neural networks (FRNNs), d) neural Turing machines (NTMs), e) long short-term memories (LSTMs), f) echo state networks (ESNs), g) simple recurrent neural networks (SRNNs), and h) gated recurrent units (GRUs).
• Generative adversarial networks: Generative adversarial networks (GANs) consist of two neural networks. One neural network is used to learn a map from a latent space to a particular data distribution, while the other neural network is used to discriminate between the true data distribution and the distribution mapped by the first network.
• Deep neural networks: All ANNs that have multiple hidden layers are known as DNNs.
• Spiking neural networks: SNNs consist of spiking neurons that closely mimic biological neural networks.
• Feedforward neural networks: In a feedforward neural network (FNN), each neuron has incoming connections only from the previous layer and outgoing connections only to the next layer. FNNs can be used to define more advanced architectures such as: a) extreme learning machines (ELMs), b) convolutional neural networks (CNNs), c) time delay neural networks (TDNNs), d) autoencoders, e) probabilistic neural networks, and f) radial basis functions (RBFs).
• Physical neural networks: In a physical neural network (PNN), an electrically adjustable resistance material is used to emulate the function of a neural activation.
Each type of ANN is suitable for a particular learning task. For instance, RNNs are effective in dealing with time-dependent data while SNNs are effective in dealing with continuous data. It should be noted that most of the data collected by wireless networks is time-dependent and continuous. In particular, in wireless networks, the user context and behavior, the wireless signals, and the wireless channel conditions are all time-dependent and continuous. RNNs and SNNs are effective in dealing with such data and can exploit it for various purposes, such as network control and user behavior prediction. However, since RNNs and SNNs can record only a limited amount of historical data, they may not be able to solve all wireless communication problems. To solve complex wireless problems that cannot be solved by shallow RNNs and SNNs, one can use DNNs, which have a high memory capacity for data analytics and can separate a complex learning problem into a composition of several simpler problems, thus making the learning process effective. Consequently, in Section III, we specifically introduce RNNs, SNNs, and DNNs, which are the most suited for wireless network use cases.
III. TYPES OF ARTIFICIAL NEURAL NETWORKS
In this section, we specifically discuss three types of ANNs, namely RNNs, SNNs, and DNNs, that have a promising potential for wireless network design, as will become clear in Section IV. For each kind of ANN, we briefly introduce its architecture, advantages, and properties. Then, we present specific example architectures.
A. Recurrent Neural Networks
1) Architecture of Recurrent Neural Networks: In a traditional ANN, it is assumed that all the inputs (or all the outputs) are independent of each other. However, for many tasks, the inputs (outputs) are related. For example, when predicting the mobility patterns of wireless devices, the input data, i.e., the users' locations, are clearly correlated over time. To this end, recurrent neural networks [67], which are ANN architectures that allow neuron connections from a neuron in one layer to neurons in previous layers [67], as shown in Fig. 3, have been introduced. This seemingly simple change enables the output of a neural network to depend not only on the current input, but also on the historical input, as shown in Fig. 4. This allows RNNs to make use of sequential information and exploit dynamic temporal behaviors such as those faced in mobility prediction
Fig. 2. Summary of artificial neural networks.
or speech recognition. For example, an RNN can be used to predict the mobility patterns of mobile devices and wireless users. These patterns are related to the historical locations that the wireless users have visited. This task cannot be done in one step without combining historical locations from previous steps. Therefore, ANNs whose output depends only on the current input, such as FNNs, cannot perform such highly time-dependent tasks. A summary of the key advantages and disadvantages of RNNs for wireless applications is presented in Table II. Note that, in theory, RNNs can make use of historical information in arbitrarily long sequences, but in practice they are limited to only a subset of the historical information [68]. For training RNNs, the most commonly used algorithms include the backpropagation through time (BPTT) algorithm [69]. However, RNNs require more time to train compared to traditional ANNs (e.g., FNNs) since each activation value depends on the series of data recorded in the RNN. To reduce the training complexity of RNNs, one promising solution is to develop an RNN that only needs to train the output weight matrix. Next, we specifically introduce this type of RNN, named echo state networks (ESNs) [70].
Fig. 3. Recurrent neural network architecture.
2) Example RNN – Echo State Networks: ESNs are known to be a highly practical type of RNN due to their efficient training approach [71]. In fact, ESNs reinvigorated interest in RNNs [72] by making them accessible to wider audiences due to their apparent simplicity. In an ESN, the input weight matrix and the hidden weight matrix are randomly generated without any specific training. Therefore, an ESN only needs to train the output weight matrix. ESNs can, in theory, approximate any arbitrary nonlinear dynamical system with arbitrary precision; they also have an inherent temporal processing capability and are therefore a very powerful enhancement of linear black-box modeling techniques in the nonlinear domain. Due to the ESN's appealing properties, such as training simplicity and the ability to record historical information, it has been widely applied for supervised learning tasks, RL tasks, classification, and regression. In wireless networks, ESNs have been applied to various applications, such as content prediction, resource management, and mobility pattern estimation, as will become clear in Section IV. Next, the specific architecture and training methods for ESNs are introduced.
• Architecture of an Echo State Network: ESNs use an RNN architecture with only one hidden layer². We define that the
² Deep generalizations of ESNs also exist [73].
7
Input Hidden Output
...
...
... W W
Fig. 4. Architecture of an unfolded recurrent neural network.
TABLE II
S UMMARY OF THE A DVANTAGES AND D ISADVANTAGES OF ANN S FOR W IRELESS A PPLICATIONS
Typical type of input data Advantages Drawbacks

• Effectiveness in processing time-related data • Training complexity due to the loop connections
such as wireless traffic between neurons
• Ability to capture dynamic temporal behaviors • Limited memory to record historical data
RNNs Time-dependent data
such as content requests or device mobility
• Ability to make use of sequential information
such as sequential symbols received by a user
• Effectiveness in processing continuous data • Training complexity due to dynamic neurons
such as amplitudes of wireless signals • Specific training method is needed for each type of SNN
• Large memory available for data collection • Need to sample the states of neurons
SNNs Continuous data
• Ability to cope with rapidly changing, dynamic
network behavior (e.g., dynamic traffic)
• Ability to perform multiple learning tasks
• Inherent ability to find low-dimensional • Hard to tune for practical applications
representations (features) of high-dimensional data • Large training dataset is required
DNNs High-dimensional data such as images and wireless traffic pattern • Computationally intensive to train
• Better learning capability compared to shallow ANNs
• Effective in learning very complex functions
T
input vector of an ESN as xt = [xt,1 , . . . , xt,Nin ] and the describe the activation value of each neuron. Even though the
T
output vector of an ESN as y t = [yt,1 , . . . , yt,Nout ] . An ESN input and the hidden weight matrices are fixed (randomly), all
model consists of the input weight matrix W in ∈ RN ×Nin , the neurons of an ESN will have their own activation values
the recurrent weight matrix W ∈ RN ×(N +1) , the leaking rate (hidden state). As opposed to the classical RNNs in which the
α, and the output weight matrix W out ∈ RNout ×(1+N +Nin ) , hidden state depends only on the current input, in ESNs, the
where N is the number of neurons in the hidden layer. The hidden state will be given by:
leaking rate α must be chosen to match the speed of the
T
dynamics of the hidden states st = [st,1 , . . . , st,N ] , where s̃t = f (W [1; st−1 ] + W in xt ) , (1)
st,i represents the state of neuron i at time t, and output y t . To
allow ESNs to store historical information, the hidden state st st = (1 − α) st−1 + αs̃t , (2)
should satisfy the so-called echo state property, which means
that the hidden state st should be uniquely defined by the x
where f (x) = eex −e +e−x and [·; ·] represents a vertical vector
−x
fading history of the input x0 , x1 , . . . , xt . This is in contrast (or matrix) concatenation. The model is also sometimes used
to traditional ANNs, such as FNNs, that need to adjust the without the leaky integration, which is a special case for α = 1
weight values of the neurons in the hidden layers, ESNs only yielding s̃t = st . From (1), we can see that the scaling of W in
need to guarantee the echo state property. Typically, in order and W determines the proportion of how much the current
to guarantee the echo state property of an ESN, the spectral state st depends on the current input xt and how much on
radius of W should be smaller than 1. The setting of other the previous state st−1 . Here, a feedback connection from
ESN components to guarantee the echo state property and to y t−1 to st can be applied to the ESNs, defined as a weight
optimize ESN performance can be found in [70]. matrix W fb ∈ RN ×Nout . Hence, (1) can be rewritten as s̃t =
Having described the main components of ESNs, we now f W [1; st−1 ] + W in xt + W fb y t−1 .
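To make (1) and (2) concrete, the following Python/NumPy sketch generates a random reservoir and runs the leaky-integrator state update over an input sequence. It is a minimal illustration under stated assumptions, not the implementation used in [70]: the uniform weight range of [-0.5, 0.5], the target spectral radius of 0.9, the leaking rate α = 0.3, the toy sine input, and the helper names `make_reservoir`, `esn_step`, and `run_reservoir` are all hypothetical choices. As in (1), the bias is handled by prepending a constant 1 to the previous state, so `w` has N + 1 columns and only its square recurrent part is rescaled.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Randomly generate W_in (N x N_in) and W (N x (N + 1)); neither is trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res + 1))  # column 0 multiplies the bias 1
    # Rescale the square recurrent part so its spectral radius equals the target
    # (the common "< 1" rule of thumb for the echo state property, not a guarantee)
    rho = np.max(np.abs(np.linalg.eigvals(w[:, 1:])))
    w[:, 1:] *= spectral_radius / rho
    return w_in, w

def esn_step(s_prev, x_t, w_in, w, alpha):
    """One leaky-integrator update of the hidden state."""
    s_tilde = np.tanh(w @ np.concatenate(([1.0], s_prev)) + w_in @ x_t)  # (1)
    return (1.0 - alpha) * s_prev + alpha * s_tilde                        # (2)

def run_reservoir(inputs, w_in, w, alpha=0.3):
    """Return the hidden states s_1, ..., s_T (shape T x N) for inputs of shape T x N_in."""
    s = np.zeros(w.shape[0])  # assumed zero initial state
    states = []
    for x_t in inputs:
        s = esn_step(s, x_t, w_in, w, alpha)
        states.append(s)
    return np.array(states)

# Usage on a hypothetical scalar input sequence
w_in, w = make_reservoir(n_in=1, n_res=50)
u = np.sin(0.1 * np.arange(200)).reshape(-1, 1)
states = run_reservoir(u, w_in, w)
print(states.shape)  # (200, 50)
```

With `alpha=1.0`, `esn_step` reduces to the non-leaky update $\mathbf{s}_t = \tilde{\mathbf{s}}_t$; the optional feedback term $\mathbf{W}^{\text{fb}} \mathbf{y}_{t-1}$ is omitted here for brevity.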
Based on the hidden state $\mathbf{s}_t$, the output signal of the ESN can be given by:

$$\mathbf{y}_t = \mathbf{W}^{\text{out}} [1; \mathbf{s}_t; \mathbf{x}_t]. \tag{3}$$

Here, an additional nonlinearity can be applied to (3), i.e., $\mathbf{y}_t = \tanh\left(\mathbf{W}^{\text{out}} [1; \mathbf{s}_t; \mathbf{x}_t]\right)$.
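Because the readout in (3) is linear in $\mathbf{W}^{\text{out}}$, it can be fit in closed form by the ridge regression (4) given in the training discussion that follows. The sketch below is a hedged illustration, not the procedure of [74]: it assumes a toy one-step-ahead prediction task on a sampled sine wave (hypothetical data), a 100-neuron reservoir built as in (1) and (2), a washout of 100 initial time steps whose transient states are discarded, and θ = 10⁻⁴; the helper names `collect_states` and `ridge_readout` are hypothetical. The vectors $[1; \mathbf{s}_t; \mathbf{x}_t]$ and targets are stacked column-wise over all training steps, and a linear solve replaces the explicit matrix inverse for numerical stability.

```python
import numpy as np

def collect_states(u, n_res=100, alpha=0.3, rho=0.9, seed=0):
    """Drive a random leaky ESN with inputs u (T x N_in); return Z whose columns are [1; s_t; x_t]."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, u.shape[1]))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res + 1))            # column 0 multiplies the bias 1
    w[:, 1:] *= rho / np.max(np.abs(np.linalg.eigvals(w[:, 1:])))   # rescale recurrent part
    s = np.zeros(n_res)
    cols = []
    for x_t in u:
        s_tilde = np.tanh(w @ np.concatenate(([1.0], s)) + w_in @ x_t)  # (1)
        s = (1.0 - alpha) * s + alpha * s_tilde                          # (2)
        cols.append(np.concatenate(([1.0], s, x_t)))                     # [1; s_t; x_t]
    return np.array(cols).T                                             # shape (1 + N + N_in, T)

def ridge_readout(z, y_target, theta=1e-4):
    """Ridge regression (4): W_out = Y Z^T (Z Z^T + theta I)^{-1}."""
    a = z @ z.T + theta * np.eye(z.shape[0])
    # a is symmetric, so W_out^T = a^{-1} Z Y^T; solve the system instead of inverting a
    return np.linalg.solve(a, z @ y_target.T).T

# Toy task (hypothetical data): predict u_{t+1} from u_t for a sampled sine wave
u = np.sin(0.1 * np.arange(601)).reshape(-1, 1)
x, y = u[:-1], u[1:]
z = collect_states(x)
washout = 100                                       # discard initial transient states
w_out = ridge_readout(z[:, washout:], y[washout:].T)
y_hat = w_out @ z[:, washout:]                      # readout (3) without extra nonlinearity
mse = np.mean((y_hat - y[washout:].T) ** 2)
print(w_out.shape, mse)
```

The resulting `w_out` has shape $N_{\text{out}} \times (1 + N + N_{\text{in}})$, matching the definition of $\mathbf{W}^{\text{out}}$; an online alternative would replace `ridge_readout` with an LMS or RLS update.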

• Training in Echo State Networks: The training process in ESNs seeks to minimize the mean square error (MSE) between the targeted output and the actual output. Ideally, once this MSE is minimized, the actual output matches the targeted output, i.e., $\mathbf{y}^D_t = \mathbf{W}^{\text{out}} [1; \mathbf{s}_t; \mathbf{x}_t]$, where $\mathbf{y}^D_t$ is the targeted output. Therefore, the training goal is to find an optimal $\mathbf{W}^{\text{out}}$ such that $\mathbf{W}^{\text{out}} [1; \mathbf{s}_t; \mathbf{x}_t] = \mathbf{y}^D_t$. In contrast to conventional RNNs, which require gradient-based learning algorithms to adjust the input, hidden, and output weight matrices, ESNs only need to train the output weight matrix, which can be done with simple training methods. The most universal and stable solution to this problem is the so-called ridge regression approach, also known as regression with Tikhonov regularization [74], which is given by:

$$\mathbf{W}^{\text{out}} = \mathbf{y}^D_t [1; \mathbf{s}_t; \mathbf{x}_t]^T \left( [1; \mathbf{s}_t; \mathbf{x}_t] [1; \mathbf{s}_t; \mathbf{x}_t]^T + \theta \mathbf{I} \right)^{-1}, \tag{4}$$

where $\mathbf{I}$ is an identity matrix and $\theta$ is a regularization coefficient, which should be selected individually for a concrete reservoir based on validation data. In practice, the vectors $\mathbf{y}^D_t$ and $[1; \mathbf{s}_t; \mathbf{x}_t]$ collected over all training time steps are stacked column-wise into matrices before (4) is evaluated. When $\theta = 0$, ridge regression reduces to regular linear regression (ordinary least squares). Ridge regression, however, is an offline training method; ESNs can also be trained online using methods such as the least mean squares (LMS) algorithm [75] or the recursive least squares (RLS) algorithm [76].

B. Spiking Neural Networks
Another important type of ANN is the so-called spiking neural network (SNN). In contrast to FNNs and RNNs, which simply use a single value to denote the activation of a neuron, SNNs use a more accurate model of biological neural networks to represent neuron activations. In the following, we first briefly introduce the architecture of SNNs. Then, we give an example of SNNs, the so-called liquid state machine.

1) Architecture of a Spiking Neural Network: The architecture of SNNs mimics the neurons in biological neural networks. Therefore, we first discuss how neurons operate in a real-world biological neural network. Then, we discuss the model of neurons in SNNs.

In biological neural networks, neurons use spikes to communicate with each other. The incoming signals alter the voltage of a neuron, and when the voltage exceeds a threshold value, the neuron sends out an action potential, which is a short (1 ms) and sudden increase in voltage that is created in the cell body, or soma. Due to the form and nature of this process, we refer to it as a spike or a pulse. For SNNs, the use of such spikes can significantly improve the dynamics of the network. Therefore, SNNs can model a central nervous system and be used to study the operation of biological neural circuits. Since the neurons in SNNs are modeled based on spikes, SNNs have two major advantages over traditional ANNs: fast real-time decoding of signals and a high information carriage capacity obtained by adding a temporal dimension. Therefore, an SNN can use fewer neurons than a traditional ANN to accomplish the same task, and it can also be used for real-time computations on continuous streams of data, which means that both the inputs and outputs of an SNN are streams of data in continuous time. However, training SNNs is more challenging (and potentially more time-consuming) than training traditional ANNs due to their complex spiking neuron models. A summary of the key advantages and disadvantages of SNNs for wireless applications is presented in Table II. To reduce the training complexity of SNNs while keeping the dynamics of spiking neurons, one promising solution is to develop a spiking neural network that only needs to train the output weight matrix, like ESNs among RNNs. Next, we present this type of SNN, named the liquid state machine (LSM).

2) Example SNN – Liquid State Machine: The architecture of an LSM consists of only two components, the liquid and the readout function, as shown in Fig. 5. Here, the liquid is a spiking neural network with leaky-integrate-and-fire (LIF) model neurons, and the readout function is a number of FNNs. In an LSM, the connections between the neurons in the liquid are randomly generated, giving the LSM a recurrent nature that turns the time-varying input into a spatio-temporal pattern. In contrast to general SNNs, which need to adjust the weights of all neurons, LSMs only need to train the comparatively simple FNN of the readout function. In particular, standard training methods for FNNs, such as backpropagation, can be used to train the readout so as to minimize the error between the desired output signal and the actual output signal, which has enabled LSMs to be applied in practical applications such as [77] and [78]. Owing to its spiking neurons, an LSM can perform ML tasks on continuous data like general SNNs, yet it can be trained using effective and simple algorithms. In wireless networks, this can be suitable for signal detection and nonlinear audio prediction. Next, we introduce the LSM architecture in detail.

Fig. 5. Architecture of an LSM (input, liquid, readout function, and output).

• Liquid Model: In an LSM, the liquid is made up of a large number of spiking LIF model neurons, located in a virtual three-dimensional column. The liquid has two important functions in the classification of time-series data. First, its fading memory is responsible for collecting and integrating the input signal over time. Each one of the neurons in the liquid keeps
its own state, which gives the liquid a strong fading memory. The activity in the network and the actual firing of the neurons can also last for a while after the signal has ended, which can be viewed as another form of memory. Second, in the liquid of an LSM, different input signals are separated, allowing the readout to classify them. This separation is hypothesized to happen by increasing the dimensionality of the signal. For example, an input signal with 20 input channels is transformed into 135 (3 × 3 × 15) signals and neuron states in the liquid. For every pair of input signal and liquid neuron, there is a certain probability of being connected, e.g., 30% in [79]. The connections between the neurons are allocated in a stochastic manner (e.g., see [79, Appendix B]). All neurons in a liquid are connected to the readout functions.

• Readout Model: The readout of an LSM consists of one or more FNNs that use the activity state of the liquid to approximate a specific function. The purpose of the readout is to build the relationship between the dynamics of the spiking neurons and the desired output signals. The inputs of the readout networks are called readout-moments, which are snapshots of the liquid activity taken at regular intervals. Whatever measure is used, the readout represents the state of the liquid at some point in time. In general, FNNs are used as the readout function in an LSM. The FNNs use the liquid dynamics (i.e., spikes) as their input and the desired output signals as their output. The readout function can then be trained using traditional FNN training methods, mainly backpropagation. Once the readout function has been trained, the LSM can be used to perform the corresponding tasks.

C. Deep Neural Networks
Thus far, all of the discussed ANNs, including ESNs and LSMs, have assumed a single hidden layer. Such an architecture is typically referred to as a shallow ANN. In contrast, a deep neural network (DNN) is an ANN with multiple hidden layers between the input and the output layers [80], as shown in Fig. 6. Therefore, a DNN models high-level abstractions in data through multiple nonlinear transformations to learn multiple levels of representation and abstraction [80]. Several types of DNNs exist, such as deep CNNs, deep ESNs, deep LSMs, and LSTM [80]. The main reasons that have enabled a paradigm shift from conventional, shallow ANNs towards DNNs include recent advances in computing capacity due to the emergence of capable processing units, the wide availability of data for DNN training, and the emergence of effective DNN training algorithms [81]. As opposed to shallow ANNs, which have only one hidden layer, a DNN with multiple layers is more beneficial for the following reasons:
• Number of neurons: Generally, a shallow ANN requires many more neurons than a DNN to achieve the same level of performance. In fact, the number of units in a shallow ANN can grow exponentially with the complexity of the task.
• Task learning: While shallow ANNs can be effective for small-scale problems, they can be ineffective when dealing with more complex problems, such as wireless environment mapping. The main issue is that shallow ANNs are very good at memorization but not so good at generalization. As such, DNNs are more suitable for many real-world tasks, which often involve complex problems that can be solved by decomposing the function to be learned into several simpler functions, thus improving the efficiency of the learning process.

It is worth noting that, although DNNs have a large capacity to model a high degree of nonlinearity in the input data, a central challenge is overfitting. In DNNs, overfitting becomes particularly acute due to the very large number of parameters. To overcome this issue, several advanced regularization approaches, such as dataset augmentation and weight decay [82], have been proposed. These methods modify the learning algorithm so that the test error is reduced, possibly at the expense of an increased training error. A summary of the key advantages and disadvantages of DNNs for wireless applications is presented in Table II.

Fig. 6. Architecture of a DNN (input, hidden, and output layers).

Next, we elaborate more on LSTM, a special kind of DNN that is capable of storing information for long periods of time by using an identity activation function for the memory cell. This, in turn, makes LSTM suitable for various wireless communication problems such as channel selection.

1) Example DNN – Long Short-Term Memory: LSTMs, which typically consist of three hidden layers, are a special kind of "deep learning" RNN that is capable of storing information for either long or short periods of time. In particular, the activations of an LSTM network correspond to short-term memory, while the weights correspond to long-term memory. Therefore, if the activations can preserve information over long periods of time, they become a long short-term memory. Although both ESNs and LSTMs are good at modeling time-series data, LSTM cells are particularly capable of dealing with long-term dependencies. An LSTM contains LSTM units, each of which has a cell with state $\mathbf{c}_t$ at time $t$. Access to this memory cell for reading or modifying information, as shown in Fig. 7, is controlled via three gates:
• Input gate ($\mathbf{i}_t$): controls whether the input is passed on to the memory cell or ignored.
• Output gate ($\mathbf{o}_t$): controls whether the current activation vector of the memory cell is passed on to the output layer
or not.
• Forget gate ($\mathbf{g}_t$): controls whether the activation vector of the memory cell is reset to zero or maintained.

Fig. 7. Architecture of an LSTM as shown in [83].

TABLE III
VARIOUS BEHAVIORS OF AN LSTM CELL
Input gate = 0, forget gate = 1: remember the previous value
Input gate = 1, forget gate = 1: add to the previous value
Input gate = 0, forget gate = 0: erase the value
Input gate = 1, forget gate = 0: overwrite the value

Therefore, an LSTM cell decides what to store, and when to allow reads, writes, and erasures, via gates that open and close. At each time step $t$, an LSTM receives inputs from two external sources at each of its four terminals (the three gates and the input): the current frame $\mathbf{x}_t$ and the previous hidden states $\mathbf{s}_{t-1}$ of all LSTM units in the same layer. These inputs are summed up, along with the bias terms $\mathbf{b}_f$, $\mathbf{b}_i$, $\mathbf{b}_o$, and $\mathbf{b}_c$. The gates are activated by passing their total input through logistic (sigmoid) functions. Table III summarizes the various behaviors an LSTM cell can achieve depending on the values of the input and forget gates. The update steps of a layer of LSTM units are summarized in the following equations:

$$\mathbf{g}_t = f_g\left(\mathbf{W}^f \mathbf{x}_t + \mathbf{U}^f \mathbf{s}_{t-1} + \mathbf{b}_f\right), \tag{5}$$

$$\mathbf{i}_t = f_g\left(\mathbf{W}^i \mathbf{x}_t + \mathbf{U}^i \mathbf{s}_{t-1} + \mathbf{b}_i\right), \tag{6}$$

$$\mathbf{o}_t = f_g\left(\mathbf{W}^o \mathbf{x}_t + \mathbf{U}^o \mathbf{s}_{t-1} + \mathbf{b}_o\right), \tag{7}$$

$$\mathbf{c}_t = \mathbf{g}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot f_c\left(\mathbf{W}^c \mathbf{x}_t + \mathbf{U}^c \mathbf{s}_{t-1} + \mathbf{b}_c\right), \tag{8}$$

$$\mathbf{s}_t = \mathbf{o}_t \odot f_h(\mathbf{c}_t), \tag{9}$$

where $\mathbf{g}_t$, $\mathbf{i}_t$, and $\mathbf{o}_t$ are the forget, input, and output gate vectors at time $t$, respectively; $\mathbf{x}_t$ is the input vector, $\mathbf{s}_t$ is the hidden/output vector, and $\mathbf{c}_t$ is the cell state vector (i.e., internal memory) at time $t$. $\mathbf{W}^f$ and $\mathbf{U}^f$ are the weight and transition matrices of the forget gate, respectively. $\mathbf{W}^i$ and $\mathbf{U}^i$ are the weight and transition matrices of the input gate, respectively. $\mathbf{W}^o$ and $\mathbf{U}^o$ are the weight and transition matrices of the output gate, respectively. $\mathbf{W}^c$ and $\mathbf{U}^c$ are the weight and transition matrices of the cell state, respectively. $f_g$ is the sigmoid activation function, while $f_c$ and $f_h$ are tanh activation functions. $\odot$ denotes the Hadamard product. Compared to a standard RNN, an LSTM uses additive memory updates and separates the memory $\mathbf{c}$ from the hidden state $\mathbf{s}$, which interacts with the environment when making predictions. To train an LSTM network, the stochastic gradient descent algorithm can be used.

Finally, another important type of DNN is the convolutional neural network (CNN), which has been widely used for analyzing visual imagery [84]. CNNs are essentially a class of deep FNNs. In CNNs, the hidden layers have neurons arranged in three dimensions: width, height, and depth. These hidden layers are either convolutional, pooling, or fully connected layers. The convolutional layers apply a convolution operation to the input, passing the result to the next layer. The pooling layers are mainly used to simplify the information from the convolutional layer, while fully connected layers connect every neuron in one layer to every neuron in another layer. As opposed to LSTMs, which are good at temporal modeling, CNNs are suited to reducing frequency variations, which makes them suitable for applications that deal with spatial data, such as interference identification in wireless networks [85]. Moreover, CNNs can be combined with LSTMs, resulting in a CNN LSTM architecture that can be used for sequence prediction problems with spatial inputs, like images or videos [86].

In summary, different types of ANNs have different architectures, activation functions, connection methods, and data storage capacities. Each specific type of ANN is suitable for dealing with a particular type of data. For example, RNNs are good at dealing with time-related data, while SNNs are good at dealing with continuous data. Moreover, each type of ANN has its own advantages and disadvantages in terms
of learning tasks, specific tasks such as time-related or space-related tasks, training data size, training time, and data storage space. Given all of their advantages, ANNs are ripe to be exploited in a diverse spectrum of applications in wireless networking, as discussed in the following section.

IV. APPLICATIONS OF NEURAL NETWORKS IN WIRELESS COMMUNICATIONS
In this section, we first overview the motivation behind developing ANN solutions for wireless communications and networking problems. Then, we introduce the use of ANNs for various wireless applications. In particular, we discuss how to use ANNs for unmanned aerial vehicles (UAVs), wireless virtual reality (VR), mobile edge caching and computing, multiple radio access technologies, and the IoT.

A. Artificially Intelligent Wireless Networks using ANNs: An Overview
Recently, ANNs have started to attract significant attention in the context of wireless communications and networking [4], [25], and [32], since the development of smart devices and mobile applications has significantly increased the autonomy of wireless networks, as well as the level at which human users interact with the wireless communication system. Moreover, the development of mobile edge computing and caching technologies makes it possible for base stations (BSs) to store and analyze the behavior of the users of a wireless network. In addition, the emergence of the Internet of Things motivates the use of ANNs to improve the way in which wireless data is processed, collected, and used for various sensing and autonomy purposes.

In essence, within the wireless communication domain, ANNs have been proposed for two major applications. First, they can be used for prediction, inference, and intelligent and predictive data analytics. Within this application domain, ANN-based ML algorithms enable the wireless network to learn from the datasets generated by its users, environment, and network devices. For instance, ANNs can be used to analyze and predict the wireless users' mobility patterns and content requests, thereby allowing the BSs to optimize the use of their resources, such as frequency, time, or the files that will be cached across the network. Moreover, prediction and inference will be primary enablers of the emerging IoT and smart city paradigms. Within an IoT or smart city ecosystem, sensors will generate massive volumes of data that can be used by the wireless network to optimize its resource usage, understand its network operation, monitor failures, or simply deliver smart services, such as intelligent transportation. In this regard, the use of ANNs for optimized predictions is imperative. In fact, ANNs will equip the network with the capability to process massive volumes of data and to parse useful information out of these data, as a precursor to delivering smart city services. For example, road traffic data gathered from IoT sensors can be processed using ANN tools to predict the road traffic status at various locations in the city. This can then be used by the wireless network that connects road traffic signals, apparatus, and autonomous/connected vehicles to inform the vehicles of the traffic state and to potentially re-route some traffic in response to the current state of the system. Furthermore, ANNs can be beneficial for integrating different data from multiple sensors, thus facilitating more interesting and complex wireless communication applications. In particular, ANNs can identify nonintuitive features largely from cross-sensor correlations, which can result in a more accurate estimation of a wireless network's conditions and a more efficient allocation of the available resources. Finally, the wireless network can use ANNs to learn about faults, infrastructure failures, and other disruptive events, so as to improve its resilience to such events.

Second, a key application of ANNs in wireless networks is enabling self-organizing network operation by instilling ANN-based ML at the edge of the network, as well as across its various components (e.g., base stations and end-user devices). Such edge intelligence is a key enabler of self-organizing solutions for resource management, user association, and data offloading. In this context, ANNs can serve as RL tools [87] that can be used by a wireless network's devices to learn the wireless environment and to make intelligent decisions. For example, an ANN-based RL algorithm can be used to learn the users' information, such as their locations and data rates, and to determine a UAV's path based on the learned information. Traditional learning algorithms, such as tabular Q-learning, which use tables or matrices to record historical data, do not scale well to dense wireless networks. In contrast, ANNs use nonlinear function approximation to capture the relationships in historical information. Therefore, ANN-based RL algorithms can learn complex relationships between wireless users and their networking environments to provide solutions for the notoriously challenging problems of network performance optimization and resource management.

ANNs can also be employed simultaneously for both prediction and intelligent/self-organizing operation in scenarios in which the two functions are largely interdependent. For instance, data can help in decision making, while decision making can generate new data. For example, when considering virtual reality applications over wireless networks, one can use ANNs to predict the behavior of users, such as head movement and content requests. These predictions can help an ANN-based RL algorithm allocate computational and spectral resources to the users, hence improving their QoS. Next, we discuss specific applications that use ANNs for wireless communications.

B. Wireless Communications and Networking with Unmanned Aerial Vehicles
1) UAVs for Wireless Communications: Providing connectivity from the sky to ground wireless users is an emerging trend in wireless networking [88] (Fig. 8). Compared to terrestrial communications, a wireless system with low-altitude UAVs is faster to deploy, more flexibly reconfigured, and likely to experience better communication channels due to the presence of short-range, line-of-sight (LoS) links. The use of highly mobile and energy-constrained UAVs for wireless communications also introduces many new challenges [88], such as
the need for network modeling, backhaul (fronthaul) limitations for UAV-to-UAV communication when the UAVs act as flying BSs, optimal deployment, air-to-ground channel modeling, energy efficiency, path planning, and security. In particular, compared to the deployment of terrestrial BSs, which is static, mostly long-term, and two-dimensional, the deployment of UAVs is flexible, short-term, and three-dimensional. Therefore, there is a need to investigate the optimal deployment of UAVs for coverage extension and capacity improvement. Moreover, UAVs can be used for data collection, delivery, and transmitting telematics. Hence, there is a need to develop intelligent self-organizing control algorithms to optimize the flying paths of UAVs. In addition, the scarcity of the wireless spectrum, which is already heavily used by terrestrial networks, is also a big challenge for UAV-based wireless communication. Due to the UAVs' channel characteristics (less blockage and a high probability of LoS links), the use of mmWave spectrum bands and visible light [89] is a promising solution for UAV-based communication. Therefore, one can consider resource management problems in the context of mmWave-equipped UAVs, given their potential benefits for air-to-ground communications. Finally, one must consider the problems of resource allocation, interference management, and routing when the UAVs act as users.

Fig. 8. UAV-enabled wireless networks. In this figure, UAVs can be used as BSs to serve users in hotspot areas due to special events such as a sports game or a disaster scenario.

2) Neural Networks for UAV-Based Wireless Communication: Due to the flying nature of UAVs, they can track the users' behavior and collect information related to the users and the UAVs within any distance, at any time or any place, which provides an ideal setting for implementing ANN techniques. ANNs have two major use cases for UAV-based wireless communication. First, using ANN-centric RL algorithms, UAVs can be operated in a self-organizing manner. For instance, using ANNs within RL, UAVs can dynamically adjust their locations, flying directions, resource allocation decisions, and path planning to serve their ground users and adapt to the users' dynamic environment. Second, UAVs can be used to map the ground environment as well as the wireless environment itself to collect data, and then take advantage of ANN algorithms to exploit the collected data and perform data analytics to predict the ground users' behavior. For example, ANNs can exploit the collected mobility data to predict the users' mobility patterns. Based on the behavioral patterns of the users, battery-limited UAVs can determine their optimal locations and design an optimal flying path to serve ground users. Meanwhile, using ANNs enables more advanced UAV applications such as environment identification. Clearly, within a wireless environment, most of the data of interest, such as that pertaining to human behavior, UAV movement, and data collected from wireless devices, will be time-related. For instance, certain users will often go to the same office for work at the same time during weekdays. ANNs can effectively deal with time-dependent data, which makes them a natural choice for UAV-based wireless communication applications.

Using ANNs for UAVs faces many challenges, such as the limited flight time to collect data, the limited power and computational resources for training ANNs, and data errors due to the air-to-ground channel. First, the limited battery life and computational power of UAVs can significantly constrain the use of ANNs. This stems from the fact that ANNs require a non-negligible amount of time and computational resources for training. For instance, UAVs must consider a tradeoff between the energy used for training ANNs and that used for other applications such as serving users. Moreover, due to their flight time constraints [90], UAVs can only collect data within a limited time period. Consequently, UAVs may not collect enough data for training ANNs. In addition, the air-to-ground channels of UAVs will be significantly affected by the weather, the environment, and the UAVs' movement. Therefore, the collected data can include errors that may affect the accuracy of the outcomes of the ANNs.

The existing literature has studied a number of problems related to using ANNs for UAVs [91]–[98]. In [91], the authors used a deep RL algorithm to efficiently control the coverage and connectivity of UAVs. The authors in [92] studied the use of ANNs for UAV assignment to meet the high traffic demands of ground users. The work in [93] investigated the use of ANNs for UAV detection. In [94], the authors studied the use of ANNs for trajectory tracking of UAVs. The work in [95] proposed a multilayer perceptron based learning algorithm that uses aerial images and aerial geo-referenced images to estimate the positions of UAVs. In [96], an ESN-based RL algorithm is proposed for resource allocation in UAV-based networks. In [98], we proposed an RL algorithm that uses an LSM for resource allocation in a UAV-based LTE over unlicensed band (LTE-U) network. For UAV-based wireless communications, ANNs can also be used for many other applications, such as path planning [99], as mentioned previously. Next, we explain a specific ANN application for UAV-based wireless communication.

3) Example: An elegant and interesting use of ANNs for UAV-based communication systems is presented in [97] for the study of the proactive deployment of cache-enabled UAVs. The model in [97] considers the downlink of a wireless cloud radio access network (CRAN) serving a set of mobile users via terrestrial remote radio heads (RRHs) and flying cache-enabled UAVs. The terrestrial RRHs transmit over the cellular band and are connected to the cloud's pool of baseband units (BBUs) via capacity-constrained fronthaul links. Since each user has its own QoE requirement, the
capacity-constrained fronthaul links will directly limit the data rate of the users that request content from the cloud. Therefore, cache-enabled UAVs are introduced to serve the mobile users along with the terrestrial RRHs. Each cache-enabled UAV can store a limited number of popular contents that the users request. By caching the predicted content, the transmission delay from the content server to the UAVs can be significantly reduced, as each UAV can directly transmit its stored content to the users.

A realistic model for periodic, daily, and pedestrian mobility patterns is considered, according to which each user regularly visits a certain location of interest. The QoE of each user is formally defined as a function of the user’s data rate, delay, and device type. The impact of the device type on the QoE is captured by the screen size of each device. The screen size also affects the user’s QoE perception, especially for video-oriented applications. The goal of [97] is to find an effective deployment of cache-enabled UAVs that satisfies the QoE requirements of each user while minimizing the transmit powers of the UAVs. This problem involves predicting, for each user, the content request distribution and the periodic locations, finding the optimal contents to cache at the UAVs, determining the users’ associations, and adjusting the locations and transmit powers of the UAVs. ANNs can be used to solve the prediction tasks due to their effectiveness in dealing with time-varying data (e.g., mobility data). Moreover, ANNs can extract the relationships between the user locations and the users’ context information such as gender, occupation, and age. In addition, ANN-based RL algorithms can find the relationship between the UAVs’ locations and the data rate of each user, enabling UAVs to find the locations that maximize the users’ data rates.

A prediction algorithm based on the framework of ESNs with conceptors is developed to find the users’ content request distributions and their mobility patterns. The predictions of the users’ content request distributions and mobility patterns are then used to find the user-UAV association, the optimal locations of the UAVs, and the content caching at the UAVs. Since the data of the users’ behaviors, such as mobility and content requests, are time-related, an ESN-based approach, as previously discussed in Subsection III-A2, can quickly learn the mobility patterns and content request distributions without requiring significant training data. Conceptors, defined in [100], enable an ESN to perform a large number of predictions of mobility and content request patterns. Moreover, new patterns can be added to the reservoir of the ESN without interfering with the previously acquired ones. The architecture of the conceptor ESN-based prediction approach is based on the ESN model specified in Subsection III-A2. For the content request distribution prediction, the cloud’s BBUs must implement one conceptor ESN algorithm for each user. The input is defined as each user’s context, which includes gender, occupation, age, and device type. The output is the prediction of a user’s content request distribution. The reservoir is generated as explained in Subsection III-A2. The conceptor is defined as a matrix that is used to control the learning of an ESN. For predicting mobility patterns, the input of the ESN-based algorithm is defined as the user’s context and current location. The output is the prediction of a user’s location in the next time slots. Ridge regression is used to train the ESNs. During the learning stage, the conceptor records the learned mobility patterns and content request distribution patterns. When the conceptor ESN-based algorithm encounters a new input pattern, it first determines whether this pattern has already been learned. If it has, the conceptor instructs the ESN to ignore it, which allows the ESN to reserve its memory for patterns that have not yet been learned.

Based on the users’ mobility pattern predictions, the BBUs can determine the user association using a K-means clustering approach, which groups users that are close to each other into one cluster. Consequently, each UAV serves one cluster, and the user-UAV association is determined. Then, based on the UAV association and each user’s content request distribution, the optimal contents to cache at each UAV and the optimal UAV locations can be found. When the altitude of a UAV is much higher (lower) than the size of its corresponding coverage area, the optimal location of the UAV can be found as in [97, Theorems 2 and 3]. For more general cases, it can be found by the ESN-based RL algorithm [101].

In Fig. 9, based on [97], we show how the memory of the conceptor ESN reservoir changes as the number of learned mobility patterns varies. The mobility data used were gathered at Beijing University of Posts and Telecommunications by recording the students’ locations during each day. In Fig. 9, one mobility pattern represents the users’ trajectory in one day, and the colored region is the memory used by the ESN. Fig. 9 shows that memory usage increases as the number of learned mobility patterns increases. Fig. 9 also shows that the conceptor ESN uses less memory to learn mobility pattern 2 than pattern 6. In fact, compared to pattern 6, mobility pattern 2 has more similarities to mobility pattern 1 and, hence, the conceptor ESN requires less memory to learn pattern 2. This is because the proposed approach learns only the difference between the already learned mobility patterns and the new ones, rather than the entirety of every new pattern.

Fig. 10 shows how the total transmit power of the UAVs changes as the number of users varies. From Fig. 10, we can observe that the total UAV transmit power resulting from all the algorithms increases with the number of users. This is because the number of users associated with the RRHs and the capacity of the wireless fronthaul links are limited; therefore, the UAVs must increase their transmit power to satisfy the QoE requirement of each user. From Fig. 10, we can also see that, for a network with 70 users, the conceptor-based ESN approach can reduce the total transmit power of the UAVs by about 16.7% compared to the ESN algorithm used to predict the content request and the mobility. This is because the conceptor ESN, which separates the users’ behavior into multiple patterns and uses the conceptor to learn these patterns, can predict the users’ behavior more accurately than the ESN algorithm.

Fig. 9. Mobility pattern predictions of the conceptor ESN algorithm [97]. In this figure, the green curve represents the conceptor ESN prediction, the black curve represents the real positions, the index j in the top rectangle denotes the mobility pattern learned by the ESN, the legend on the bottom left shows the total reservoir memory used by the ESN, and the legend on the bottom right shows the normalized root mean square error of each mobility pattern prediction.

[Plot: total transmit power of UAVs (W) versus the number of users (40 to 90) for the optimal algorithm with complete information, the proposed conceptor ESN algorithm, the ESN algorithm that predicts content request and mobility, and the ESN algorithm that predicts mobility with random caching.]
Fig. 10. Simulation result showing the transmit power as the number of users varies [97].

Resource allocation problems in UAV-based wireless networks can also be addressed using LSMs, as explained in [98]. In particular, in [98], an LSM-based RL algorithm is used for resource and cache management in LTE over unlicensed (LTE-U) UAV networks. The LSM-based RL algorithm in [98] can find the appropriate policies for user association and resource allocation, as well as the contents to cache at the UAVs, as the users’ content requests change dynamically. This is because an LSM, owing to its large memory (compared to an ESN), can record the dynamic user content requests as well as the user association, resource allocation, and content caching policies. Based on the recorded information, the LSM algorithm can build a relationship between content requests, user association, resource allocation, and cached content.

4) Lessons learned: From this example, we have demonstrated that the conceptor ESN can be used for effective data analytics in wireless networks that integrate UAV base stations, particularly for mobility pattern and content request distribution predictions (and UAV-level caching). The ML angle in this application stems from the fact that predictions are used for intelligently determining the user association, optimal caching, and optimal UAV locations. The key lessons learned here include:

• The advantage of the conceptor ESN for UAV-based networks is that it provides the network with the ability to proactively determine the deployment of UAVs and the optimal content stored at the UAVs. Since UAVs are flexible in their deployment (unlike terrestrial base stations), such a proactive approach is desirable. The analysis in [97] also revealed that the use of a conceptor in the ESN scheme allows it to separate a user’s weekly mobility into several patterns and use various nonlinear systems for predictions, thus improving accuracy. Moreover, conceptor ESNs enable the cloud to add new patterns to the ESN without interfering with previously acquired ones and, hence, they can improve the usage of an ESN’s memory (i.e., its capacity to store past data).
• The conceptor ESN algorithm that we presented in this section is able to perform its predictions over a long period of time. In this case, the conceptor ESN can be trained in a completely offline manner, and its training process can be implemented at the cloud, thus leveraging the cloud’s computational power. Once the conceptor ESNs are trained at the cloud, the UAVs can directly use them for predictions and deployment. This results in energy savings, which is particularly important for resource-limited UAVs. Another reason to train conceptor ESNs at the cloud is that the cloud is better positioned in the network to collect mobility information. With this implementation, the overhead of training the conceptor ESNs can be neglected.
• From this work, we have observed that, for mobility prediction, a shallow conceptor ESN learning algorithm can achieve the same prediction accuracy as a deep learning algorithm (e.g., one similar to the algorithm that will be introduced in the multi-RAT application of Subsection
IV-E). This is mainly because the future locations of each user depend only on a small number of the locations that the user has previously visited. Consequently, a shallow conceptor ESN is sufficient to record these visited locations and perform reasonable predictions.
• One disadvantage of using a conceptor ESN learning algorithm for intelligent and predictive data analytics is that the conceptor increases the training complexity of each ESN. This is because, during the training process, the conceptor needs to identify the input data of a given ESN and also needs to find appropriate memory space in the ESN for data recording. This further motivates training the conceptor ESNs at the cloud so as to save UAV energy.

Note that the observations in the third and fourth bullets above can be generalized to other shallow RNNs.

5) Future Works: Clearly, ANNs are an important tool for addressing key challenges in UAV-based communication networks. In fact, different types of ANNs can be suitable for various UAV applications. For instance, given their effectiveness in dealing with time-dependent data, RNNs can be used for predicting user locations and traffic demands. This allows UAVs to optimize their locations based on the dynamics of the network. DNN-based RL algorithms can be used to determine how long the UAVs need to serve the ground users and how to serve them (e.g., stop or fly to serve the users). Since DNNs have the ability to store large amounts of data, DNN-based RL algorithms can also be used to store the data related to the users’ historical context and, then, predict each ground user’s locations, content requests, and latency requirements. Based on these predictions, the UAVs can find their optimal trajectories and, as a result, determine which area to serve at any given time. In addition, SNNs can be used for modeling the air-to-ground channel, in general, and over mmWave frequencies, in particular. This is because SNNs are good at dealing with continuous data, and the wireless channel is time-varying and continuous [102]. For instance, UAVs can use SNNs to analyze the data that they collect from the radio environment, such as the received signal strength, the UAVs’ positions, and the users’ positions, and then generate an air-to-ground channel model that fits the collected data. Finally, SNNs are a good choice for predicting the trajectories of UE UAVs, so that the network can select the appropriate BSs to serve them. A summary of key problems that can be solved by using ANNs for UAV-based communications is presented in Table VI, along with the challenges and future works.

C. Wireless Virtual Reality

1) Virtual Reality over Wireless Networks: Recently, wireless industry players such as Qualcomm [130] and Nokia [131] have rated VR as one of the most important applications in 5G and beyond networks. Moreover, 3GPP is standardizing wireless VR, called extended reality (XR) [12]. In addition, several industrial players such as HTC Vive [132], Oculus [133], and Intel [134] are all developing wireless VR devices that can operate over wireless cellular networks. These recent developments motivate us to analyze wireless VR as a key use case of ANNs in future wireless networks.

Fig. 11. Wireless VR networks. In this figure, BSs that act as VR controllers generate and transmit VR videos to VR users according to the tracking information collected from the VR users.

When a VR device is operated over a wireless link, the users must send tracking information, which includes the users’ locations and orientations, to the BSs; then, the BSs use the tracking information to construct 360° images and send these images to the users. Therefore, for wireless VR applications, the uplink and downlink transmissions must be jointly considered. Moreover, in contrast to traditional video that consists of 120° images, a VR video consists of high-resolution 360° vision with three-dimensional surround stereo. This new type of VR video requires a much higher data rate than traditional mobile video. In addition, as the VR images are constructed according to the users’ movements, such as head or eye movements, the tracking accuracy of the VR system directly affects the user experience. In summary, the challenges of operating VR devices over wireless networks [38] include tracking accuracy, low delay, high data rate, user experience modeling, and effective image compression, as well as the transmission of VR content and tracking information over wireless links.

2) Neural Networks for Wireless Virtual Reality: The use of ANNs is a promising solution for a number of problems related to wireless VR. This is because, compared to other applications such as UAVs or caching, VR applications depend more on the users’ environment and their behavior vis-à-vis the VR environment. In a wireless VR network, head and eye movements significantly affect resource management and network control, which is a very new challenge for wireless networks. ANNs are effective at identifying and predicting the users’ movements and actions. Based on the predictions of the users’ environment, actions, and movements, the BSs can improve the generation of the VR images and optimize the resource management for wireless VR users. ANNs have two major applications for wireless VR. First, ANNs can be used to predict the users’ movements as well as their future interactions with the VR environment. For example, a user displays only the visible portion of a 360° video and, hence, transmitting the entire 360° video frame can waste the capacity-limited bandwidth. Since all the images are constructed based on the users’ movements, ANNs can be used to predict these movements and, hence, enable the wireless BSs to generate only the portion of the VR image that a user wants to display. Moreover, the predictions of the users’ movements can also improve the
TABLE IV
SUMMARY OF THE USE OF ANNS FOR SPECIFIC APPLICATION
Columns: Applications | Existing Works | Challenges | Future Works and Suggested Solutions

UAV
• Existing works: UAV control [91], [94]; position estimation [95]; UAV detection [93]; deployment and caching [92], [96], and [97]
• Challenges: limited power and computation for training ANNs; limited time for data collection; errors in training data
• Future works and suggested solutions: UAV path planning ⇒ RNN-based RL algorithm; resource management ⇒ DNN-based RL algorithm; channel modeling for air-to-ground ⇒ SNN-based algorithm; handover for UE UAVs ⇒ RNN-based algorithm; design of multi-hop aerial networks ⇒ CNN-based algorithm; UE UAV trajectory prediction ⇒ SNN-based algorithm

VR
• Existing works: resource allocation [103], [104]; head movement prediction [105]; gaze prediction [106]; content caching and transmission [107]
• Challenges: errors in collected data; limited computational resources; limited time for training ANNs
• Future works and suggested solutions: VR users’ movement ⇒ RNN prediction algorithm; content correlation ⇒ CNN-based algorithm; VR video coding and decoding ⇒ CNN-based algorithm; correction of inaccurate VR images ⇒ CNN-based algorithm; viewing video prediction ⇒ SNN-based algorithm; joint wireless and VR user environment prediction ⇒ RNN prediction algorithm; management of computational resources and video formats ⇒ DNN-based RL algorithm

Caching and Computing
• Existing works: architecture for caching [108]; cache replacement [109]–[111]; content popularity prediction [112] and [113]; content request distribution prediction [97] and [114]
• Challenges: data cleaning; content classification; limited storage of ANNs for recording all types of contents
• Future works and suggested solutions: analysis of content correlation ⇒ CNN-based RL algorithm; content transmission methods ⇒ RNN-based RL algorithm; clustering of users and tasks ⇒ CNN-based algorithm; computational demand prediction ⇒ SNN-based algorithm; computing time prediction ⇒ SNN-based algorithm; computational resource allocation ⇒ RNN-based RL approach; computational caching ⇒ RNN-based RL algorithm

Multi-RAT
• Existing works: resource management [115], [101]; RAT selection [116]; transmission technology classification [117]; multi-radio packet scheduling [118]; mode selection [119], [120]
• Challenges: channel selection; mobility predictions; channel load estimation; load balancing
• Future works and suggested solutions: detection of LoS links ⇒ CNN-based algorithm; antenna tilting ⇒ DNN-based RL algorithm; channel estimation ⇒ SNN-based algorithm; handover among multi-RAT BSs ⇒ RNN-based algorithm; mmWave links for multi-RAT ⇒ DNN-based algorithm; mmWave channel modeling ⇒ SNN-based algorithm

IoT
• Existing works: modeling IoT as ANNs [121]; failure detection [122], [123]; user activity classification [124]; tracking accuracy improvement [125]; image detection [126]; data sampling [127]; entity state prediction [128]; target surveillance [129]
• Challenges: massive amounts of data and large number of devices; limited computation and energy resources; errors in collected data; real-time training for ANNs
• Future works and suggested solutions: data compression and recovery ⇒ CNN-based algorithm; resource management ⇒ RNN-based RL algorithm; user identification ⇒ DNN-based algorithm; IoT device management ⇒ SNN-based algorithm; data relationship extraction ⇒ RNN-based RL algorithm; modeling autonomous M2M communication ⇒ FNN- and SNN-based algorithm
tracking accuracy of the VR sensors. In particular, the BSs will jointly consider the users’ movements predicted by ANNs and the users’ movements collected by VR sensors to determine the users’ movements.

Second, ANNs can be used to develop self-organizing algorithms that dynamically control and manage the wireless VR network, thus addressing problems such as dynamic resource management. In particular, ANNs can be used for adaptively optimizing the resource allocation and adjusting the quality and format of the VR images according to the cellular network environment.

Using ANNs for VR faces many challenges. First, in wireless VR networks, the data collected from the users may contain errors that are unknown to the BSs. Consequently, the BSs may need to use erroneous data to train the ANNs and, hence, the prediction accuracy of the ANNs will be significantly affected. Second, due to the large data size of each 360° VR image, the BSs must spend a large amount of computational resources to process VR images. Meanwhile, the training of ANNs also requires a large amount of computational resources. Thus, how to effectively allocate the computational resources for processing VR images and training ANNs is an important challenge. In addition, VR applications require ultra-low latency, while the training of ANNs can be time-consuming. Hence, how to effectively train ANNs in a limited time is an important question for wireless VR. In this regard, training ANNs offline or using ANNs that converge quickly are two promising solutions for speeding up the training process.

The existing literature has studied a number of problems related to using ANNs for VR, such as in [103]–[107]. The work in [105] proposed an ESN-based distributed learning algorithm to predict the users’ head movements in VR applications. In [106], a decision forest learning algorithm is proposed for gaze prediction. The work in [103] developed a neural network-based transfer learning algorithm for data correlation-aware resource allocation. The caching and transmission of 360° content are optimized in [107] using an ESN and SSN based deep RL algorithm. Table V summarizes the type of ANNs and learning algorithms used in each existing work on virtual reality networks. In essence, the existing VR literature such as [103]–[107] has used ANNs to solve a number of VR problems such as hand gesture recognition, interactive shape changes, video conversion, head movement prediction, and resource allocation. However, with the exception of our works in [104] and [105], all of the other works that use ANNs for VR applications focus on wired VR. Therefore, they do not consider the challenges of wireless VR, such as scarce spectrum resources, limited data rates, and how to transmit the tracking data accurately and reliably. In fact, ANNs can
be used for wireless VR to solve problems such as users’ movement prediction, spectrum management, and VR image generation. Next, a specific ANN application for VR over wireless networks is introduced.

[Plot for Fig. 12: average delay of each served user (ms) for the ESN-based learning algorithm, optimal heuristic search, Q-learning algorithm, and proportional fair algorithm.]
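Users’ movement prediction is listed above as a natural ANN task for wireless VR, and [105] uses an ESN for head movement prediction. The sketch below illustrates, under stated assumptions, how a generic echo state network of this kind is usually built: a fixed random reservoir produces states x(t) = tanh(W_in u(t) + W x(t-1)), and only the linear readout is fitted by ridge regression (the same training method named earlier for the conceptor ESNs of [97]). It is not the model of [105] and does not reproduce the ESN of Subsection III-A2; the reservoir size, the spectral radius of 0.9, the input scaling, the ridge parameter, and the synthetic sinusoidal head-yaw trace are all hypothetical choices made for illustration.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Random input and recurrent weights (the 'reservoir'); the recurrent
    matrix is rescaled to the given spectral radius, a common echo-state heuristic."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def collect_states(w_in, w, inputs):
    """Drive the fixed reservoir: x(t) = tanh(W_in u(t) + W x(t-1)), x(-1) = 0."""
    x = np.zeros(w.shape[0])
    states = np.empty((len(inputs), w.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(w_in @ u + w @ x)
        states[t] = x
    return states


def ridge_readout(states, targets, ridge=1e-4):
    """Only the readout is trained: W_out = (X^T X + ridge * I)^-1 X^T Y."""
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n), states.T @ targets)


def predict_next_yaw(n_res=100, washout=100, n_train=800):
    """One-step-ahead prediction of a synthetic, normalized head-yaw trace.
    Returns the normalized RMSE on the held-out samples."""
    t = np.arange(1200)
    yaw = np.sin(2 * np.pi * t / 50)          # hypothetical periodic head motion
    inputs, targets = yaw[:-1, None], yaw[1:, None]
    w_in, w = make_reservoir(1, n_res)
    states = collect_states(w_in, w, inputs)
    train = slice(washout, n_train)           # discard the initial transient
    test = slice(n_train, len(inputs))
    w_out = ridge_readout(states[train], targets[train])
    err = states[test] @ w_out - targets[test]
    return float(np.sqrt(np.mean(err ** 2)) / np.std(targets[test]))


if __name__ == "__main__":
    print(f"test NRMSE: {predict_next_yaw():.4f}")
```

Calling predict_next_yaw() returns the normalized root mean square error (NRMSE) of the one-step-ahead predictions on the held-out part of the trace. Because only the readout is trained, fitting reduces to a single regularized linear solve, which is why ESNs are typically much cheaper to train than back-propagated recurrent networks such as LSTMs.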
3) Example: One key application of ANNs for wireless VR systems is presented in [104] for the study of resource allocation in cellular networks that support VR users. In this model, the BSs act as the VR control centers that collect the tracking information from the VR users over the cellular uplink and then send the generated images (based on the tracking information) and accompanying surround stereo audio to the VR users over the downlink. Therefore, this resource allocation problem in wireless VR must jointly consider both the uplink and the downlink transmissions. To capture the VR users’ QoS in a cellular network, the model in [104] jointly accounts for VR tracking accuracy, processing delay, and transmission delay. The tracking accuracy is defined as the difference between the tracking vector transmitted wirelessly from the VR headset to the BS and the accurate tracking vector obtained from the users’ force feedback. The tracking vector represents the users’ positions and orientations. The transmission delay consists of the uplink transmission delay and the downlink transmission delay. The uplink transmission delay represents the time that a BS uses to receive the tracking information, while the downlink transmission delay is the time that a BS uses to transmit the VR contents. The processing delay is defined as the time that a BS spends correcting the VR image, i.e., converting the image constructed from the inaccurate tracking vector into the image constructed from the accurate tracking vector. In [104], the relationship between the delay and the tracking accuracy is not necessarily linear or independent and, thus, multi-attribute utility theory [135] is used to construct a utility function that assigns a unique value to each of the tracking and delay components of the VR QoS.

The goal of [104] is to develop an effective resource block allocation scheme that maximizes the users’ utility function capturing the VR QoS. This maximization jointly considers the coupled problems of user association, uplink resource allocation, and downlink resource allocation. Moreover, the VR QoS of each BS depends not only on its own resource allocation scheme but also on the resource allocation decisions of other BSs. Consequently, centralized optimization for such a complex problem is largely intractable and incurs significant overhead. In addition, for VR resource allocation problems, both uplink and downlink resource allocation must be considered jointly and, thus, the number of actions is much larger than in conventional scenarios that consider only uplink or downlink resource allocation. As the number of actions increases significantly, each BS may not be able to collect all the information needed to calculate the utility function.

To overcome these challenges, an ANN-based RL algorithm can be used for self-organizing VR resource allocation. In particular, an ANN-based RL algorithm can find the relationship between the user association, resource allocation, and user data rates, and can then directly select the optimal resource allocation scheme after the training process. For the downlink and uplink resource allocation problem in [104], an ANN-based RL algorithm can use less exploration time to build the relationship between the actions and their corresponding utilities and then optimize resource allocation.

To simplify the generation and training process of an ANN-based RL algorithm, an ESN-based RL algorithm is selected for VR resource allocation. The ESN-based learning algorithm enables each BS to predict the value of VR QoS resulting from each resource allocation scheme without having to traverse all the resource allocation schemes. The architecture of the ESN-based self-organizing approach is based on the ESN model specified in Subsection III-A2. To use ESNs for RL, each row of the ESN’s output weight matrix is defined as one action. Here, one action represents one type of resource allocation. The input of each ESN is the current action selection strategies of all BSs. The generation of the ESN model follows Subsection III-A2. The output is the estimated utility value. In the learning process, at each time slot, each BS implements one action according to the current action selection strategy. After the BSs perform their selected actions, they observe the actual utility values. Based on the actual utility values and the utility values estimated by the ESN, each BS adjusts the values of the output weight matrix of its ESN according to (4). As time elapses, the ESN can accurately estimate the utility values for each BS and can find the relationship between the resource allocation schemes and the utility values. Based on this relationship, each BS can find the optimal action selection strategy that maximizes the average VR QoS for its users.

Fig. 12. Delay for each served user vs. the number of BSs [104].

Fig. 12 shows how the average delay of each user varies as the number of BSs changes. From Fig. 12, we can see that, as the number of BSs increases, the transmission delay for each served user increases. This is due to the fact that, as the number of BSs increases, the number of users located in each BS’s coverage decreases and, hence, the average delay increases. However, as the number of BSs increases, the delay increase becomes slower due to the additional interference. This stems from the fact that, as the number of BSs continues to increase, the number of users associated with each BS decreases and more spectrum will be allocated to each user. Hence, the delay of each user will continue to decrease. However, as the number of BSs increases, the increasing interference will limit the reduction in the delay. Fig. 12 also shows that the
ESN-based algorithm achieves up to a 19.6% gain in terms of average delay compared to the Q-learning algorithm for the case with 6 BSs. Fig. 12 also shows that the ESN-based approach allows the wireless VR transmission to meet the VR delay requirement that includes both the transmission and processing delay (typically 20 ms [136]). These gains stem from the adaptive nature of ESNs.

From this example, we illustrated the use of an ESN as an RL algorithm for self-organizing resource allocation in wireless VR. An ESN-based RL algorithm enables each BS to allocate downlink and uplink spectrum resources in a self-organizing manner that adjusts the resource allocation according to the dynamic environment. Moreover, an ESN-based RL algorithm can use an approximation method to find the relationship between each BS’s actions and its corresponding utility values and, hence, it can speed up the training process. Simulation results show that an ESN-based RL algorithm enables each BS to achieve the delay requirement of VR transmission.

4) Lessons learned: Clearly, we have demonstrated that ESNs can be an effective tool for resource management in a wireless VR network that needs to jointly consider the uplink and downlink resource block allocation. Some key outcomes learned from this application include the following:

• In non-wireless applications such as speech recognition, ESNs are used for data analytics. In this VR application, ESNs are used as a reinforcement learning algorithm for downlink and uplink resource block management. The advantage of the ESN-based RL algorithm is that it provides the network with the ability to predict the value of the VR QoS that results from each action (instead of relying on a Q-table to record the observed utility values, as done in Q-learning) and, hence, it can find the optimal action selection strategy that maximizes the individual (per SBS) VR QoS utilities without having to traverse all actions. As a result, ESN-based RL is suitable for wireless VR resource management problems in which both uplink and downlink resources must be managed jointly, which enlarges the search space of the wireless VR QoS optimization problem compared to standard wireless resource management problems. This is a novel use case of ESNs that is motivated by the underlying wireless system, rather than by the need to process data as done in computer vision.
• Compared to most of the existing DNN-based RL algo-

formance for resource block allocation as DNN-based RL algorithms. However, the time needed for training DNNs such as LSTMs will be much higher than the time needed for training ESNs. Consequently, one must choose an appropriate ANN architecture for RL depending on the complexity of the wireless optimization problem. In the wireless VR application, it could be more suitable to use a shallow ANN in the RL algorithm for problems such as channel selection and user association, while DNN-based RL algorithms are more suitable for power allocation. This is because, in power allocation problems, the optimized variables are continuous and, thus, the number of actions needed for RL will be much larger than in other problems (e.g., user association). Note that the above lesson can be generalized to other shallow ANNs.

5) Future Works: Clearly, ANNs are a promising tool to address challenges in wireless VR applications. In fact, the above application of ANNs for spectrum resource allocation can be easily extended to manage other types of resources such as computational resources and video formats. Moreover, SNNs can be used to predict the viewing VR video, i.e., the VR video displayed at the headset of a given user. Then, the network can reduce the data size of each transmitted VR video and pre-transmit each viewing VR video to the users. This is because SNNs are good at processing rapidly changing, dynamic VR videos. Furthermore, RNNs can be used to predict and detect the VR users’ movements, such as eye and head movements, and their interactions with the environment. Then, the network can pre-construct VR images based on these predictions, which can reduce the time spent constructing the VR images. The user-VR system interactions are all time-dependent and, hence, RNNs are a good choice for performing such tasks. Note that the prediction of the users’ movements directly affects the VR images that are sent to the users at each time slot and, hence, the learning algorithm must complete the training process within a short time period. Consequently, RNNs that are easy to train should be used for predicting the users’ movements. Finally, CNNs can be used for VR video compression and recovery so as to reduce the data size of each transmitted VR video and improve the QoS for each VR user. This is because CNNs are good at storing large amounts of data in the spatial domain and learning the features of VR images. A summary of key problems that can be solved by using ANNs in wireless VR systems is presented
rithms that cannot analytically guarantee convergence to in Table VI along with the challenges and future works.
a final equilibrium or optimization solution, in this ap-
plication, we have proved that ESN-based RL algorithms
will finally converge to the expected VR QoS utilities if D. Mobile Edge Caching and Computing
the learning parameters are appropriately set. 1) Mobile Edge Caching and Computing: Caching at the
• Due to the limited memory capacity of each ESN, the edge of the wireless networks, as shown in Fig. 13, enables
application of an ESN-based RL algorithm depends on the network devices (BSs and end-user devices) to store the
the complexity of the underlying wireless problems. ESN- most popular content to reduce the data traffic (content trans-
based RL algorithms can be used to solve the opti- mission), delay, and bandwidth usage, as well as to improve
mization problem with a moderate number of optimized the energy efficiency and the utilization of the users’ context
variables while DNN-based algorithms can be used to and social information [137]. Recently, it has become possible
solve more complex optimization problems. In this work, to jointly consider cache placement and content delivery, using
the ESN-based RL algorithms can achieve the same per- coded caching [138]. Coded caching enables network devices
to create multicasting opportunities for specific content via coded multicast transmissions, thus significantly improving the bandwidth efficiency [139]. However, designing effective caching strategies for wireless networks faces many challenges [137], such as solving the optimized cache placement, cache update, and content popularity analytics problems.

Fig. 13. Mobile edge caching and computing wireless networks. (Figure labels: Cloud; Local distributed computing; Access point; Cache; Computing at AP; IoT home devices.)

In addition to caching, the wireless network's edge devices can be used for performing effective and low-latency computations using the emerging paradigm of mobile edge computing [140]. The basic premise of mobile edge computing is to exploit local resources for computational purposes (e.g., for VR image generation or for sensor data processing) in order to avoid high-latency transmission to remote cloud servers. Mobile edge computing, which includes related concepts such as fog computing [141], can decrease the overall computational latency by reducing the reliance on the remote cloud while effectively offloading computational resources across multiple local and remote devices. The key challenge in mobile edge computing is to optimally allocate computational tasks across both the edge devices (e.g., fog nodes) and the remote data servers so as to optimize latency. Finally, it is worth noting that some recent works [142] have jointly combined caching and computing. In this case, caching is used to store the most popular and basic computational tasks. Based on the caching results, the network will have to determine the optimal computational resource allocation to globally minimize latency. However, optimizing mobile edge computing faces many challenges such as computing placement, computational resource allocation, computing task assignment, end-to-end latency minimization, and minimization of the devices' energy consumption.

2) Neural Networks for Mobile Edge Caching and Computing: ANNs can play a central role in the design of new mobile edge caching and computing mechanisms. For instance, the problems of optimal cache placement and cache update all depend on predictions of the users' behaviors, such as their content requests. For example, the cache placement depends on the users' locations, while the cache update depends on the frequency with which a user requests a certain content. Since human behavior can be predicted by ANNs, ANNs are a promising solution for effective mobile edge caching and computing.

In essence, ANNs can play a vital role in three major applications for mobile edge caching and computing. First, ANNs can be used for prediction and inference purposes. For example, ANNs can be used to predict the users' content request distributions and content request frequency, which can then be used to determine which content to store at the end-user devices or BSs. Furthermore, ANNs can also be used to extract social information from the collected data. In particular, ANNs can learn the users' interests, activities, and interactions. By exploiting the correlation between the users' data, their social interests, and their common interests, the accuracy of predicting future events such as the users' geographic locations, the next visited cells, and the requested contents can be dramatically improved [143]. For example, ANNs can be used to predict the users' interests. Users that have the same interests are highly likely to request the same content. Therefore, the system operator can cluster the users that have the same interests and store the popular contents they may request. Similarly, ANNs can be used to predict the computational requirements of tasks, which in turn enables the network devices to schedule the computational resources in advance, thus minimizing latency.

Second, ANNs can be used as an effective clustering algorithm to classify the users based on their activities, such as content requests, which enables the system operator to determine which contents to store at a storage unit and, thus, improve the usage of cached contents. For instance, the content requests of users can change over time, while the cached content is only updated over long intervals (e.g., once a day) and, hence, the system operator must determine which content to cache by reviewing all the collected content requests. ANNs, such as CNNs, can be used to store the content request information and classify the large number of content requests for cache update. In fact, prediction and clustering are interrelated and, therefore, ANNs can be used for both applications simultaneously. For example, ANNs can first be used to predict the users' content request distributions and, then, to classify the users that have similar content request distributions. Meanwhile, ANN-based clustering algorithms can be used to classify the computing tasks. Then, the computing tasks that are clustered into a group can be assigned to a certain computing center. In this case, each computing center will process one type of computing task, thus reducing the computational time. Finally, ANNs can also be used for intelligently scheduling the computing tasks to different computing centers. In particular, ANNs can be used as an RL algorithm to learn each computing center's state, such as its computational load, and then allocate computing tasks based on the learned information to reduce the computational time.

Using ANNs for mobile edge caching and computing faces many challenges. Data cleaning is an essential part of the data analysis process for mobile edge processing. For example, to predict the users' content requests, the data processing system should be capable of reading and extracting useful data
from huge and disparate data sources. For example, one user's content request depends on this user's age, job, and locations. In fact, the data cleaning process usually takes more time than the learning process. For instance, the types and volume of content that users may request can be on the order of millions and, hence, the data processing system should select appropriate content to analyze and predict the users' content request behaviors. For caching, the most important use of ANNs is to predict the users' content requests, which directly determine the cache update. However, each user may request a large volume of content types such as video, music, and news, each of which has different formats and resolutions. Hence, for each user, the total number of requested content items will be very large. However, the memory of an ANN is limited and, hence, each ANN can record only a limited number of requested contents. Consequently, an ANN must be able to select the most important content for content request prediction so as to help the network operator determine which content to store at the mobile edge cache. Similarly, for computing task prediction, the limited-memory ANNs can only store a finite number of computing tasks and, hence, they must select suitable computing tasks to store and predict. Moreover, as opposed to mobile edge caching, which updates the cached contents over long periods of time, mobile edge computing needs to process the tasks as soon as possible. Therefore, the ANNs used for mobile edge computing must complete their training process in a short period of time.

The existing literature has studied a number of problems related to the use of ANNs for caching [97], [108], [109], [114], and [110]–[113]. The authors in [108] proposed a big data-enabled architecture to investigate proactive content caching in 5G wireless networks. In [109]–[111], ANNs are used to determine the cache replacement and content delivery. The authors in [112] developed a data extraction method using the Hadoop platform to predict content popularity. In [113], an extreme learning machine neural network is used to predict content popularity. The works in [97] and [114] developed an ESN-based learning algorithm to predict the users' mobility patterns and content request distributions. In general, existing works such as [97], [108], [109], [114], and [110]–[113] have used ANNs to solve caching problems such as cache replacement, content popularity prediction, and content request distribution prediction. For mobile edge computing, in general, there is no existing work that uses ANNs to solve these problems. Next, we explain a specific ANN application for mobile edge caching.

3) Example: One illustrative application of ANNs for mobile edge caching is presented in [114], which studies the problem of proactive caching in CRANs. In this model, the users are served by the RRHs, which are connected to the cloud pool of the BBUs via capacity-constrained wired fronthaul links. The RRHs and the users are all equipped with storage units that can be used to store the most popular content that the users request. The RRHs that have the same content request distributions are grouped into a virtual cluster and serve their users using a zero-forcing method. The content request distribution for a particular user represents the probabilities with which the user requests different content. Virtual clusters are connected to the content servers via capacity-constrained wired backhaul links. Since the backhaul (fronthaul) links are wired, we assume that the total transmission rate of the backhaul (fronthaul) links is equally allocated to the content that must be transmitted over the backhaul (fronthaul) links. Each user has a periodic mobility pattern and regularly visits a certain location. Since cache-enabled RRHs and BBUs can store the requested content, this content can be transmitted over four possible links: a) content server-BBUs-RRH-user, b) cloud cache-BBUs-RRH-user, c) RRH cache-RRH-user, and d) remote RRH cache-remote RRH-BBUs-RRH-user. The notion of effective capacity³ [144] was used to capture the maximum content transmission rate of a channel under a certain QoS requirement. The effective capacity of each content transmission depends on the link that is used to transmit the content and the actual link capacity between the user and the associated RRHs.

³ The effective capacity is a link-layer channel model that can be used to measure a content transmission over multiple hops. In particular, the effective capacity can be used to measure a content transmission from the BBUs to the RRHs, then from the RRHs to the users.

The goal of [114] is to develop an effective framework for content caching and RRH clustering in an effort to reduce the network's interference and to offload the traffic of the backhaul and of the fronthaul based on the predictions of the users' content request distributions and mobility patterns. To achieve this goal, a QoS and delay optimization problem is formulated whose objective is to maximize the long-term sum effective capacity of all users. This optimization problem involves predicting the content request distribution and the periodic location of each user, and finding the optimal content to cache at the BBUs and at the RRHs. To predict the content request distribution and mobility patterns for each user, an ESN-based learning algorithm is used, similar to the one described in Subsection III-A2. For each user, the BBUs must implement one ESN algorithm for content request distribution prediction and another ESN algorithm for mobility pattern prediction.

For the content request distribution prediction, the input of the developed ESN is a user's context, which includes content request time, week, gender, occupation, age, and device type. The output is the predicted content request distribution. The ESN model consists of the input weight matrix, the output weight matrix, and the recurrent weight matrix (see Subsection III-A2). A linear gradient descent approach is used to train the output weight matrix. For mobility pattern prediction, the input of the developed ESN is the current location of each user, and the output is the vector of locations that the user is predicted to visit in the next steps. In contrast to the conventional ESN recurrent matrix, which is sparse and randomly generated, the recurrent matrix of the ESN used for mobility prediction contains only W nonzero elements, where W is the dimension of the recurrent matrix. This simplified recurrent matrix can speed up the training process of the ESNs. The output weight matrix is trained offline using ridge regression.

Based on the users' content request distributions and locations, the cloud can estimate the users' RRH association, determine each RRH's content request distribution, and, then,
cluster the RRHs into several groups. Finally, the content that must be cached at the cloud and at the RRHs can be determined. The analysis proved that the ESN-based algorithm will reach an optimal solution to the content caching problem.

Fig. 14. Sum effective capacity as a function of the number of RRHs [114]. (Plot: sum effective capacity (bits/s/Hz, ×10^4) versus number of RRHs, from 512 to 1152; curves: ESNs and sublinear algorithm, optimal caching with complete information, random caching with clustering, random caching without clustering.)

Fig. 14 shows how the sum of the effective capacities of all the users in a period changes with the number of RRHs. As the number of RRHs increases, the effective capacities of all the algorithms increase because the users become closer to their RRHs. The ESN approach can yield up to 21.6% and 24.4% improvements in the effective capacity compared to random caching with clustering and random caching without clustering, respectively, for a network with 512 RRHs. This stems from the fact that the ESN-based algorithm can effectively use the predictions of the ESNs to determine which content to cache.

4) Lessons learned: The presented example of the mobile edge caching and computing application demonstrated that ESNs are effective for predicting the users' mobility patterns and content request distributions, based on which the cloud can determine the content stored at the cloud and at the RRHs. Some key outcomes learned from this application include:
• Even though analyzing the memory capacity of an ESN is generally challenging, in this application we were able to derive the memory capacity of an ESN that uses a linear activation function. Based on this analysis, we can accurately set the size of the matrices and the memory capacity of each ESN so that it can precisely predict the users' mobility and content request distributions. Here, we note that, as the memory capacity increases, the training complexity of an ESN increases significantly. In this context, for mobility prediction in this application, we build an ESN model with the minimum memory capacity that can accurately predict the users' mobility patterns and quickly converge. In fact, for different prediction tasks, one can adjust the memory capacity of each ESN using the obtained results to enable the ESNs to record all of the information needed for the predictions.
• This example also showed that ESN-based learning algorithms can be trained to predict only one mobility pattern for each user. For example, to predict the weekly mobility pattern of each user, an ESN-based learning algorithm cannot separate the mobility pattern in a week into several days and use a specific nonlinear system to predict the users' mobility on each day. In fact, as we discussed in the UAV application in Subsection IV-B, using a distinct nonlinear system to predict the mobility of each user on each day can significantly improve the accuracy of weekly mobility pattern prediction. ESN-based learning is better suited to a single prediction task than to multiple prediction tasks. To overcome this challenge, one can use the conceptor notion that was discussed in Subsection IV-B. Note that this observation can be generalized to other shallow RNNs and SNNs.
• Compared to conceptor ESNs, ESN-based learning algorithms have a lower training complexity and a faster convergence speed. However, as already mentioned, ESNs cannot separate the users' contexts for multiple mobility pattern predictions, which affects the prediction accuracy. Consequently, one must choose between a standard ESN and a conceptor ESN depending on the number of prediction tasks needed and their complexity. In Subsection IV-D, the predictions are used to determine the cached content, whose prediction is somewhat less challenging than that of other metrics that require more precise predictions, such as the UAV locations in Subsection IV-B. Therefore, we choose the ESN-based prediction algorithms for mobility and content request distribution predictions.

5) Future Works: Clearly, ANNs will be an important tool for solving challenges in mobile edge caching and computing applications, especially for content request prediction and computing task prediction. In fact, CNNs, which are good at storing voluminous data in spatial domains, can be used to investigate the content correlation in the spatial domain. Based on the content correlation, each BS can store the contents that are most related to other contents to improve the caching efficiency and hit ratio. Moreover, RNNs can be used as self-organizing RL algorithms to allocate computational resources. RNNs are suitable here because they can record the utility values resulting from different computational resource allocation schemes as time elapses. Then, the RNN-based RL algorithms can find the optimal computational resource allocation after several iterations. Meanwhile, in contrast to user association in cellular networks, where each user can only associate with one BS, one computing task can be assigned to several computing centers and one computing center can process different computing tasks. Therefore, the problem of computing task assignment is a many-to-many matching problem [145]. RNN-based RL algorithms can also be used to solve the computing task assignment problem due to their advantages in analyzing historical data pertaining to past assignments of computing tasks. In addition, DNN-based RL algorithms can be used to jointly optimize the cache replacement and the content delivery. To achieve this purpose, each action of the DNN-based RL algorithm must contain one content delivery method as well as one cache update scheme.
This is because DNNs are good at storing large amounts of utility values resulting from different content delivery and cache update schemes. Last but not least, SNNs can be used to predict the dynamic computational resource demands of each user due to their advantages in dealing with highly dynamic data. A summary of the key problems of using ANNs for mobile edge caching and computing is presented in Table VI along with the challenges and future works.

E. Co-existence of Multiple Radio Access Technologies

1) Co-existence of Multiple Radio Access Technologies: To cope with the unprecedented increase in mobile data traffic and realize the envisioned 5G services, a significant enhancement of per-user throughput and overall system capacity is required [146]. Such an enhancement can be achieved through advanced PHY/MAC/network technologies and efficient methods of spectrum management. In fact, one of the main advancements in the network design for 5G networks relies on the integration of multiple different radio access technologies (RATs) [147]. Multi-RAT based networks encompass several technologies in which spectrum sharing is important. These include cognitive radio networks, LTE-U networks, as well as heterogeneous networks that include both mmWave and sub-6 GHz frequencies. With multi-RAT integration, a mobile device can potentially transmit data over multiple radio interfaces, such as LTE and WiFi [148], at the same time, thus improving its performance [149]. Moreover, a multi-RAT network allows fast handover between different RATs and, thus, provides a seamless mobility experience for users. Therefore, the integration of different RATs improves the utilization of the available radio resources and, thus, increases the system's capacity. It also guarantees a consistent service experience for different users irrespective of the serving RAT, and it facilitates network management.

Spectrum management is also regarded as another key component of multi-RAT based networks [150]. Unlike early generations of cellular networks that operate exclusively on the sub-6 GHz (microwave) licensed band, multi-RAT based networks are expected to transmit over the conventional sub-6 GHz band, the unlicensed spectrum, and the 60 GHz mmWave frequency band [151], [152]. We note that the classical LTE microwave licensed band is reliable but limited and, hence, a scarce resource. In contrast, the unlicensed bands can be used to serve best-effort traffic only, since operation over this spectrum must account for the presence of other coexisting technologies. Therefore, a multi-mode BS operating over the licensed, unlicensed, and mmWave frequency bands can exploit the different characteristics and availability of the frequency bands, thus providing robust and reliable communication links for the end users [152]. However, to reap the benefits of multi-mode BSs, spectrum sharing is crucial.

2) Neural Networks for Spectrum Management and Multi-RAT: ANNs are an attractive solution approach for tackling various challenges that arise in multi-RAT scenarios. To leverage the advantages of such multi-RAT networks, ANNs can allow the smart use of different RATs, wherein a BS can learn when to transmit on each type of frequency band based on the underlying network conditions. For instance, ANNs may allow multi-mode BSs to steer their traffic flows between the mmWave, the microwave, and the unlicensed band based on the availability of a LoS link, the congestion on the licensed band, and the availability of the unlicensed band. Moreover, in LTE-WiFi link aggregation (LWA) scenarios, ANNs allow cellular devices to learn when to operate on each band or utilize both links simultaneously.

Moreover, ANNs can provide multi-mode BSs with the ability to learn the appropriate resource management procedure over different RATs or spectrum bands in an online manner and, thus, to offer an autonomous and self-organizing operation with no explicit communication among different BSs, once deployed. For instance, ANNs can be trained over large datasets that take into account the variations of the traffic load over several days for scenarios in which the traffic load of WiFi access points (WAPs) can be characterized based on a particular traffic model [153]. It should be noted that cellular data traffic networks exhibit statistically fluctuating and periodic demand patterns, especially for applications such as file transfer, video streaming, and browsing [153]. ANNs can also accommodate the users' mobility patterns to predict the availability of a LoS link, thus allowing transmission over the mmWave band. In particular, they can be trained to learn the antenna tilting angle based on environment changes in order to guarantee a LoS communication link with the users and, thus, to enable efficient communication over the mmWave spectrum. Moreover, ANNs may enable multiple BSs to learn how to form multi-hop mmWave links over backhaul infrastructure, while properly allocating resources across those links in an autonomous manner [154], [155].

To cope with changes in the traffic model and/or the users' mobility pattern, ANNs can be combined with online ML [156] by properly re-training the weights of the developed learning mechanisms. Multi-mode BSs can thus learn the traffic patterns over time and, hence, predict the future channel availability status. With proper network design, ANNs can allow operators to improve their network's performance by reducing the probability of congestion while ensuring a degree of fairness to the other coexisting technologies in the network.

Proactive resource management of the radio spectrum for multi-mode BSs can also be achieved using ANNs. In a proactive approach, rather than reactively responding to incoming demands and serving them when requested, multi-mode BSs can predict traffic patterns and determine future off-peak times on different spectrum bands so that the incoming traffic demand can be properly allocated over a given time window. In an LTE-U system, for instance, a proactive coexistence mechanism may enable future delay-intolerant data demands to be served within a given prediction window ahead of their actual arrival time, thus avoiding the underutilization of the unlicensed spectrum during off-peak hours [157]. This will also lead to an increase in the LTE-U transmission opportunities as well as to a decrease in the collision probability with WAPs and other BSs in the network.
Several existing works have adopted various learning techniques to tackle a variety of challenges that arise in multi-RAT networks [62], [101], [115]–[120]. The problem of resource allocation with uplink-downlink decoupling in an LTE-U system has been investigated in [101], in which the authors propose a decentralized scheme based on ESNs. The authors in [115] propose a fuzzy-neural system for resource management among different access networks. The work in [116] used an ANN-based learning algorithm for channel estimation and channel selection. The authors in [117] propose a supervised ANN approach, based on FNNs, for classifying the users' transmission technology in a multi-RAT system. In [118], the authors propose a Hopfield neural network scheme for multi-radio packet scheduling. In [119], the authors propose a cross-system learning framework to optimize the long-term performance of multi-mode BSs by steering delay-tolerant traffic towards WiFi. The work in [120] used a deep RL algorithm for mode selection and resource management in a fog radio access network. Other important problems in this domain include root cause analysis issues, such as those studied in [62]. Nevertheless, these prior works [62], [101], [115]–[120] consider a reactive approach in which the data requests are first initiated and, then, the resources are allocated based on their corresponding delay tolerance values. In particular, existing works do not consider the predictable behavior of the traffic and, thus, they do not account for future off-peak times during which data traffic could be distributed among different RATs.

Here, note that ANNs are suitable for learning the data traffic variations over time and, thus, for predicting the future traffic load. In particular, since LSTM cells are capable of storing information for long periods of time, they can learn the long-term dependency within a given sequence. Predictions at a given time step are influenced by the network activations at previous time steps, thus making LSTMs an attractive solution for proactively allocating the available resources in multi-RAT systems. In what follows, we summarize our work in [158], in which we developed a deep RL scheme, based on LSTM memory cells, for allocating the resources in an LTE-U network over a fixed time window T.

3) Example: An interesting application of DNNs in the context of LTE-U and WiFi coexistence is presented in [158]. The work in [158] considers a network composed of several LTE-U BSs belonging to different LTE operators, several WAPs, and a set of unlicensed channels on which LTE-U BSs and WAPs can operate. The LTE carrier aggregation feature, using which the BSs can aggregate up to five component
The exponential backoff scheme is adopted for WiFi, while the BSs adjust their contention window size (and, thus, the channel access probability) on each of the selected channels based on the network traffic conditions while also guaranteeing long-term equal weighted fairness with the WLAN and other BSs.

The proactive resource allocation scheme in [158] is formulated as a noncooperative game in which the players are the BSs. At t = 0, each BS must choose, for each t of the next time window T, which channels to transmit on along with the corresponding channel access probabilities. This, in turn, allows the BSs to determine future off-peak hours of the WLAN on each of the unlicensed channels and, thus, to transmit on the less congested channels. Each BS can therefore maximize its total throughput over the set of selected channels over T while guaranteeing long-term equal weighted fairness with the WLAN and the other BSs. To solve the formulated game (and find the so-called Nash equilibrium solution), a DNN framework based on LSTM cells was used. To allow a sequence-to-sequence mapping, we considered an encoder-decoder model as described in Section III-C. In this model, the encoder network maps an input sequence to a vector of fixed dimensionality, and then the decoder network decodes the target sequence from the vector. In this scheme, the input of the encoder is a time series representation of the historical traffic load of the BSs and WAPs on all the unlicensed channels. The learned vector representation is then fed into a multi-layer perceptron (MLP) that summarizes the input vectors into one vector, thus accounting for the dependency among all the input time series vectors. The output of the MLP is then fed into separate decoders, allowing each BS to reconstruct its predicted action sequence.

To train the proposed network, the REINFORCE algorithm [159] is used to compute the gradient of the expected reward with respect to the policy parameters, and the standard gradient descent optimization algorithm [160] is adopted to allow the model to generate optimal action sequences for the input history traffic values. In particular, we considered the RMSprop gradient descent optimization algorithm [161], an adaptive learning rate approach wherein the learning rate of a particular weight is divided by a running average of the magnitudes of the recent gradients for that weight.

The proposed proactive resource allocation scheme was compared with a reactive approach for three different network scenarios. Fig. 15 shows that, for very small values of T, the proposed scheme does not yield any significant gains. However, as T increases, the BSs have additional opportunities for shifting part of the traffic into the future and, thus, the gains
carriers belonging to the same or different operating frequency start to become more pronounced. For example, we can see
bands, is adopted. We consider a time domain divided into that, for 4 BSs and 4 channels, the proposed proactive scheme
multiple time windows of duration T , each of which consisting achieves an increase of 17% and 20% in terms of the average
of multiple time epochs t. Our objective is to proactively airtime allocation for LTE-U as compared to the reactive
determine the spectrum allocation vector for each BS at t = 0 approach. Here, note that the gain of the proposed scheme,
over T while guaranteeing long-term equal weighted airtime with respect to the reactive approach, keeps on increasing until
share with WLAN. In particular, each BS learns its channel it reaches a maximum achievable value, after which it remains
selection, carrier aggregation, and fractional spectrum access almost constant.
over T while ensuring long-term airtime fairness with the 4) Lessons learned: In the aforementioned application,
WLAN and the other LTE-U operators. A contention-based we have demonstrated that LSTM can be an effective tool
protocol is used for channel access over the unlicensed band. for resource management in an LTE-U system that needs to
24
0.7 though that is not ascertained analytically. An interesting

Average airtime allocation for LTE-U
future research to address in this context is to analyze the
0.65 convergence for LSTM-based RL in an LTE-U context,
or more generally, in a multi-RAT resource management
0.6
context. This difficulty in analyzing the convergence of
0.55 LSTM can also be encountered when dealing with other
types of ANN-based RL schemes.
0.5 • In this LTE-U scenario, the network operator can train
the LSTM in a completely offline manner since all that
0.45 is needed for this training is to use past observations
0.4
of WiFi traffic and, it is generally known that, within a
geographic area, over long periods of time, the wireless
0.35 data traffic parameters are more or less consistent. This
is a key motivation for using a deep architecture here.
0.3 • This work has demonstrated that, even though deep learn-
2 4 6 8 10 12
ing based on LSTM can provide significant improvements
Time window T
in the predictions of time-stamped sequences of data
Fig. 15. The average throughput gain for LTE-U upon applying a proactive (here being the time-varying WiFi traffic), in a practical
approach (with varying T ) as compared to a reactive approach [158]. wireless application, one does need to use many layers. In
fact, through our simulations, we observed that increasing
the number of hidden layers has a very small impact
on the achieved performance. This is mainly due to
maintain a fair co-existence between WiFi and LTE. The key
the fact that the WiFi traffic that is used as input to
benefit brought forward by LSTM in this application is that
LSTM in this work, is much less time-varying than the
it enabled the cellular system to accurately predict future off-
datasets that are used in other, non-wireless fields, such
peak hours of WiFi, so as to seize the channels on which to
as in natural language processing, where multiple layers
transmit. This, in turn, led to a better co-existence between
provide more gains. However, we do note that, in this
the two systems, owing to the predictive ability of LSTM
work, we wanted to predict a future sequence of WiFi
that provided the system with the ability to use historical
traffic data based on a significant history of data and,
WiFi traffic data to determine future traffic and, thus, make
therefore, using shallow networks like ESN (e.g., as done
anticipatory resource management decisions. The main lessons
in the UAV and VR applications) would not have been
learned here include:
as effective as using LSTM that has both short and long
• LSTM has mostly been used for data analytics. In the term memory (as explained in Subsection III-C) and can
aforementioned application, the network needed LSTM as more effectively handle predictions of future sequences
a part of a RL algorithm that can determine the solution that require significant historical data, as is the case for
of a game-theoretic setting, which can be thought of as WiFi traffic. That said, in our simulations, we only needed
the solution of a series of optimization problems that are three hidden layers to reap the benefits of LSTM.
solved at the level of each BS. In this context, LSTM • As it is evident from the previous point, whether or
enabled the RL algorithm to estimate future utilities not one adopts a deep architecture or a very advanced
(rather than just observe them from the environment as type of ANN depends on the type of application that
done in Q-learning) and, hence, be able to seek bet- is being addressed. For WiFi traffic prediction, a deep
ter optimization problem solutions (equivalent to game- architecture was appropriate. Meanwhile, for prediction
theoretic equilibria). This was a novel use case of LSTM of mobility data and user-based content in the VR and
that is motivated by the underlying wireless system, rather UAV applications that were previously discussed, the use
than by the need to process some data. of a shallow RNN by itself provided significant gains,
• Even though proving the optimality properties of the even without using a deep architecture. That said, as we
LSTM output itself is difficult, in this application, we will see later in Section IV, in some applications like IoT,
have shown that by combining LSTM with a game- one can solve meaningful wireless problems by resorting
theoretic framework, we can ensure that, whenever the to very simple ANNs, such as FNNs, without the need for
RL algorithm converges, it is guaranteed to be at a Nash deep architectures or more advanced structures. This is a
equilibrium (i.e., as a point at which none of the RL major contrast to other ML application domains such as
algorithms can find a better outcome). However, guaran- computer vision, where oftentimes a complex, deep ANN
teeing convergence analytically is much more challenging is needed to obtain meaningful results.
than, for example, the ESN-based approaches we used in • One disadvantage of using an ANN within a RL al-
the VR and the UAV problems, due to the deep nature gorithm is that the prediction errors may affect the
of LSTM. We do note that our thorough simulations performance of the outcome. In some sense, within the
(for many simulation parameters and settings), showed aforementioned game-theoretic context, the efficiency of
that the algorithm will actually always converge, even the reached equilibrium can be impacted by the prediction
errors. While this is true for all of the applications in which we used ANNs as part of an RL algorithm, the effect of prediction errors may be more pronounced for the LTE-U application. There, prediction errors may lead the LTE system to seize more or fewer WiFi slots than needed, which can directly impact the operation of WiFi users. Naturally, this is a more serious drawback than in scenarios where the network simply uses ANNs to cache data (e.g., as in the previously discussed UAV application) or to perform cell association (in which case, if a prediction error occurs, the network can simply fall back to known cell association algorithms).
5) Future Works: The above application of ANNs to LTE-U systems can be easily extended to a multi-mode network in which the BSs transmit on the licensed, the unlicensed, and the mmWave spectrum. In fact, given their capability of dealing with time series data, RNNs can enhance mobility management and handover in highly mobile wireless environments by learning the mobility patterns of users, thus decreasing the ping-pong effect among different RATs. For instance, a predictive mobility management framework can address critical handover issues in emerging dense multi-RAT wireless cellular networks. These issues include frequent handovers, handover failures, and excessive energy consumption for seamless handovers. ANNs can also predict the QoS requirements, in terms of delay and rate, of future offered traffic. Moreover, they can predict the conditions of the transmission links and, thus, schedule users based on the link conditions and QoS requirements. Therefore, given the mobility patterns, transmission link conditions, and QoS requirements of each user, BSs can learn how to allocate different users on different bands such that the total network performance, in terms of delay and throughput, is optimized. An interesting direction for future work on DNNs in mmWave communication is antenna tilting. In particular, DNNs are capable of learning several features of the network environment and, thus, of predicting the optimal tilt angle based on the availability of a LoS link and the data rate requirements. This, in turn, improves the users' throughput and helps achieve high data rates. Moreover, LSTMs are capable of learning long time series and, thus, can allow BSs to predict the link formation for the mmWave backhaul network. In fact, the formation of this backhaul network is highly dependent on the network topology and the traffic conditions. Therefore, given the dynamics of the network, LSTMs enable BSs to dynamically update the formation of the links among each other based on the changes in the network. Moreover, SNNs can be used for mmWave channel modeling since they can effectively process and predict continuous-time data. A summary of key problems that can be solved by using ANNs in multi-RAT systems is presented in Table VI, along with the challenges and future works.

F. Internet of Things
1) The Internet of Things: In the foreseeable future, it is envisioned that trillions of machine-type devices such as wearables, sensors, connected vehicles, or mundane objects will be connected to the Internet, forming a massive IoT ecosystem [162]. The IoT will enable machine-type devices to connect with each other over wireless links and operate in a self-organizing manner [163]. Therefore, IoT devices will be able to collect and exchange real-time information to provide smart services. In this respect, the IoT will enable innovative services and solutions in the realms of smart cities, smart grids, smart homes, and connected vehicles that could significantly improve people's lives. However, the practical deployment of an IoT system still faces many challenges [163], such as data analytics, computation, transmission capabilities, connectivity, end-to-end latency, security [164], and privacy. In particular, providing massive device connectivity with stringent latency requirements will be one of the most important challenges. The current centralized communication models and the corresponding technologies may not be able to provide such massive connectivity. Therefore, there is a need for new communication architectures, such as fog computing models, for IoT device connectivity. Moreover, the energy and computational resources of each IoT device are limited. Hence, another challenge is how to allocate computational resources and power to all the IoT devices so as to meet their data rate and latency requirements.
2) Neural Networks for the Internet of Things: ANNs can be used to address some of the key challenges within the context of the IoT. So far, ANNs have seen four major applications in the IoT. First, ANNs enable the IoT system to leverage intelligent data analytics to extract important patterns and relationships from the data sent by the IoT devices. For example, ANNs can be used to discover important correlations among data to improve data compression and data recovery. Second, using ANN-based RL algorithms, IoT devices can operate in a self-organizing manner and adapt their strategies (e.g., channel selection) to the wireless and user environments. For instance, an IoT device that uses an ANN-based RL algorithm can dynamically select the most suitable frequency band for communication according to the network state. Third, IoT devices that use ANN-based algorithms can identify and classify the data collected from IoT sensors. Finally, one of the main goals of the IoT is to improve the quality of human life and reduce the interaction between humans and IoT devices. Thus, ANNs can be used to predict users' behavior so as to provide information to the IoT devices in advance. For example, ANNs can be used to predict the time at which an individual will come home and, hence, adjust the control strategy of the IoT devices at home.
Using ANNs for IoT faces many challenges. First, in the IoT, both energy and computational resources are limited. Therefore, one should consider the tradeoff between the energy and computational needs of training ANNs and the accuracy requirement of a given ANN-based learning algorithm. In particular, the higher the required accuracy, the higher the computational and energy requirements. Second, within an IoT ecosystem, the collected data may have different structures and may even contain errors. Therefore, when data are used to train ANNs, one should consider how to classify the data and deal with flaws in the data. In other words, the ANNs in the IoT must tolerate erroneous data. Third, in the IoT system, ANNs can exploit thousands of types of data for prediction
and self-organizing control. For a given task, the data collected from the IoT devices may not all be related to the task. Hence, ANNs must select suitable data for the task.
The existing literature [121]–[129] has studied a number of problems related to using ANNs for the IoT. In [121], the authors use a framework that treats an IoT network as an ANN to reduce delivery latency. The authors in [122] and [123] used a backpropagation neural network for sensor failure detection in an IoT network. In [124], eight ML algorithms, including DNNs and FNNs, are tested for human activity classification and robot navigation, as well as for body postures and movements. In [125], the authors used a Laguerre neural network-based approximate dynamic programming scheme to improve the tracking efficiency in an IoT network. The authors in [126] developed a streaming hardware accelerator for CNNs to improve the accuracy of image detection in an IoT network. The work in [127] used a denoising autoencoder neural network for data sampling in an IoT network. In [128], a deep belief network is used for entity state prediction. The authors in [129] used ANNs for target surveillance. In summary, prior works used ANNs to solve a number of IoT problems such as IoT network modeling, failure detection, human activity classification, and tracking accuracy improvement. However, ANNs can also be used to analyze data correlation for data compression and data recovery, to identify humans, to predict human activities, and to manage the resources of devices. Next, we explain a specific application of ANNs to the IoT.
3) Example: One illustrative application of ANNs within the context of the IoT is presented in [121], which studies how to improve the communication quality by mapping IoT networks to ANNs. The considered IoT network is primarily a wireless sensor network. Two objective functions are considered. The first, a), minimizes the overall cost of communication between the devices mapped to the neurons in the input layer and the devices mapped to the neurons in the output layer. Here, the overall cost represents the total transmit power that all devices use to transmit the information signals. The second, b), minimizes the expected transmission time needed to deliver the information signals.
To minimize the total transmit power and the expected transmit time for the IoT, the basic idea of [121] is to train an ANN so as to approximate the objective functions discussed above and, then, map the IoT network to the ANN. FNNs are used for this mapping since they transmit information in only one direction, forward, from the input nodes, through the hidden nodes, to the output nodes. First, one must identify the devices that want to send signals as well as the devices that will receive signals. The IoT devices that want to send signals are mapped to the neurons in the input layer. The IoT devices that want to receive signals are mapped to the neurons in the output layer. The other IoT devices are mapped to the neurons in the hidden layers, and some of these devices will be used to forward the signals. Then, the FNN is trained offline to approximate the objective functions. Because IoT devices are mapped into neurons and wireless links into connections between neurons, a method is needed to map the trained FNN to the IoT network. Since the computational resources of each IoT device are limited, IoT devices with different computational resources will map to different numbers of neurons. For example, an IoT device that has more computational resources can map to a larger number of neurons. Moreover, to ensure the integrity of the mapping model, each neuron can only map to one IoT device. Given that there are several ways to map the IoT network to the trained FNN, the optimal mapping is formulated as an integer linear program, which is then solved using CPLEX. Once the optimal mapping between the IoT network and the trained FNN is found, the optimal connections between the IoT devices are built. Hence, if the IoT network can find the optimal connections for all devices based on the objective functions, the transmit power and expected transmit time can be reduced. Simulation results show that the mapping algorithm can achieve significant gains in terms of total transmit power and expected transmit time compared to a centralized algorithm. This is because the IoT network uses FNNs to approximate the objective functions and find the optimal device connections.
4) Lessons learned: This IoT application has shown that FNNs are an effective tool for network mapping in the IoT. They can find the optimal transmission links from the transmitters to the receivers through the relays. We can summarize the main lessons learned here as follows:
• The advantage of FNNs for the studied IoT application is that they enabled the IoT devices to optimally build the transmission links between the transmitters and the receivers. This reduces the transmission delay without requiring any communication among the IoT devices. In this application, the wireless network only consists of the receivers, the transmitters, and the relays. Data in this network is only transmitted from the transmitters to the relays, and then from the relays to the receivers. The use of FNNs to map this network is appropriate as it allows one to find the optimal transmission links between the transmitters and the receivers, through the relays. This was a novel use case of FNNs that is motivated by the underlying wireless system.
• FNNs are very simple neural networks with little training overhead, which makes them suitable for implementation in IoT systems in which the devices are resource-constrained.
• One disadvantage of using FNNs for mapping wireless networks is that they can only be used for a network with a small number of transmitters and receivers. This is because, as the number of transmitters and receivers increases, the number of neurons in the input, output, and hidden layers increases. FNNs need to calculate the gradients for all of the neurons (in contrast to ESNs, which only need to update the output weight matrix), so the training complexity will significantly increase.
• The presented IoT application is restricted to a very simple mapping of IoT devices via an FNN. However, the IoT domain is much richer than this application, and one can envision a plethora of resource management, physical layer enhancement, and network optimization problems that can be addressed using more elaborate ANNs such as those presented in Section III (and in the previous
applications).
Note that the observations in the first, second, and third bullets above can be generalized to other works that rely on FNNs for solving wireless communication problems.
5) Future Works: ANNs are undoubtedly an important tool for solving a variety of problems in the IoT, particularly in terms of intelligent data analytics and smart operation. In fact, FNNs can do more than map the IoT devices and optimize the connections between them as discussed above: they can also be used to map other systems. For example, one can map the input layer of an FNN to the IoT devices and the output layer to the computing centers. Then, one can find an optimal allocation of computational tasks via FNN mapping. Moreover, ANNs can be used for data compression and recovery so as to reduce both the size of the transmitted data and the end-to-end device latency. To compress the data, an ANN needs to extract the most important features from the data; these features can then be used to represent the compressed data. In particular, CNNs can be used for data compression and recovery in the spatial domain, while RNNs can be used for data compression and recovery in the time domain. This is because CNNs are effective at extracting patterns and features from large amounts of data, while RNNs are suitable for extracting relationships from time-dependent series data. In addition, DNNs can be used for human identification. An IoT ecosystem that can identify different individuals can pre-allocate spectral or computational resources to the IoT devices that a certain individual often uses. DNNs are suitable here because their multiple hidden layers can store more information related to a user than other ANNs. Hence, DNNs can use a user's information, such as hairstyle, clothes, and oral patterns, to identify that individual and provide services tailored to this user. A summary of key problems that can be solved by using ANNs in IoT systems is shown in Table VI, along with the challenges and future works.

G. Summary
In summary, for wireless communications, ANNs have two important use cases. The first is ANN-based RL algorithms for network control, resource management, user association, and interference alignment. The second is intelligent data analytics for signal detection, spectrum sensing, channel state detection, and energy prediction, as well as user behavior prediction and classification. In this subsection, we first summarize the advantages, challenges, and limitations of ANN-based RL algorithms for wireless communication applications. Then, we introduce the advantages, challenges, and limitations of using ANNs for data analytics in wireless networks.
1) Advantages of ANN-based RL Algorithms: In general, RL algorithms based on ANNs can be used for wireless network control and resource management when the wireless network states and conditions are unknown, as shown in the example of the coexistence of multiple radio access technologies. Moreover, RL algorithms can be used to solve non-convex optimization problems or problems in which the optimization variables are coupled, as shown in the example of wireless virtual reality.
2) Challenges and Limitations of ANN-based RL Algorithms: Implementing ANN-based RL algorithms in wireless networks also faces many challenges. First, for RL algorithms, the training complexity increases quickly as the number of BSs or users that implement RL algorithms increases. Consequently, one needs to find a smart training method to decrease the training complexity. Moreover, the complexity and convergence of RL algorithms that rely on ANNs can be challenging to characterize analytically. Currently, most existing works use models based on Markov decision processes (MDPs) and game theory to analyze the convergence of RL algorithms. In fact, RL algorithms can also be used for problems that cannot be modeled by MDPs or game-theoretic models. However, the convergence of RL algorithms for such problems is often challenging to ascertain analytically and, thus, one has to rely on simulations. In addition, one must reduce the computational resources and power needed for the ANN-based RL algorithms that must be implemented at wireless devices. In fact, many ANN-based RL algorithms require the number of actions and states to be finite. In this case, ANN-based RL algorithms need to be carefully designed if they are to be used to solve problems that have continuous states and actions.
3) Advantages of ANN-based Data Analytics Algorithms: The second important use case of ANNs in wireless networks is data analytics. In wireless networks, most of the collected data will be time-dependent. For example, mobile user behaviors, wireless signals, and renewable energy are all time-dependent. Consequently, wireless operators can use RNNs for user behavior prediction, signal detection, channel modeling, and energy prediction. In particular, due to the unique neuron connection method of RNNs (each neuron in one layer can connect to the neurons in previous layers), they are effective in dealing with time-dependent data. Moreover, one can use CNNs, a type of DNN, for modulation classification, as done in [32]. CNNs can also be used to analyze the images captured by mobile devices such as VR devices and UAVs so as to extract the features of the captured images. The features extracted by CNNs can be used for user movement identification, environment identification, and data compression and recovery, which can in turn be used for wireless network control and data traffic offloading. For example, one can use CNNs for data compression at the transmitters and data recovery at the receivers so as to reduce the traffic load over the transmission links between transmitters and receivers. Meanwhile, since SNNs consist of spiking neurons, they are effective in dealing with continuous data. Consequently, one can use SNNs for signal detection, channel modeling, channel state detection, and wireless device (aerial or ground) identification. For example, one can use both continuous flying trajectories and radio frequency signals as the input of SNNs to identify UAVs and then tweak their transmission parameters.
4) Challenges and Limitations of ANN-based Data Analytics Algorithms: Implementing ANNs for data analytics in wireless networks also faces many challenges. First, the data related to the behavior of mobile users is not easy to collect due to privacy concerns. For instance, a network operator such as Verizon can collect only partial datasets related to the mobile users. Due to this partial availability of datasets, the
TABLE V
SUMMARY OF THE USE OF ANN-BASED LEARNING ALGORITHMS FOR EXISTING WORKS IN SPECIFIC APPLICATION
Application | Problem | Reference | ANN tool
UAV | UAV control | [91], [94] | FNNs
UAV | Position estimation | [95] | FNNs
UAV | UAV detection | [93] | RNNs
UAV | Resource allocation | [96] | RNNs
UAV | Resource allocation | [98] | SNNs
UAV | UAV deployment | [92] | FNNs
UAV | UAV deployment | [97] | RNNs
VR | Head movement prediction | [105] | RNNs
VR | Resource allocation | [103], [104] | RNNs
VR | VR content caching and transmission | [107] | DNNs
Caching and Computing | Cache replacement | [109] | FNNs
Caching and Computing | Cache replacement | [110], [111] | DNNs
Caching and Computing | Content popularity prediction | [113] | FNNs
Caching and Computing | Content request distribution prediction | [97], [114] | RNNs
Multi-RAT | Resource management | [115] | DNNs
Multi-RAT | Resource management | [101] | RNNs
Multi-RAT | RAT selection | [116] | CNNs
Multi-RAT | Transmission technology classification | [117] | FNNs
Multi-RAT | Multi-radio packet scheduling | [118] | RNNs
Multi-RAT | Mode selection | [120] | FNNs
Multi-RAT | Automatic root cause analysis | [62] | RNNs
IoT | Model IoT as ANNs | [121], [123] | FNNs
IoT | Failure detection | [122] | FNNs
IoT | User activities classification | [124] | DNNs
IoT | Tracking accuracy improvement | [125] | DNNs
IoT | Image detection | [126] | CNNs
IoT | Data sampling | [127] | PNNs
IoT | Entity state prediction | [128] | DNNs
IoT | Target surveillance | [129] | FNNs
prediction accuracy of ANNs can be compromised. Second, for data analytics, existing ANN-based learning algorithms cannot be readily implemented on mobile devices such as smartphones due to their high training complexity and energy consumption. However, small IoT or wearable devices such as watches and IoT sensors, or even smartphones, can record more data related to the users' environment than BSs, which are located far away from the users. Consequently, if an ANN learning algorithm can be implemented on wearable and portable devices, it can use more data related to the users' behaviors for training and, hence, the prediction accuracy can be improved, while also alleviating privacy concerns. One possibility to overcome this challenge is to train at a BS or in the cloud and then deploy the trained ANNs on the users' devices. Third, distributed ANN learning algorithms are needed for wireless networks. In particular, mobile users will connect to different BSs as they move from one cell to another. In this case, the data related to such a mobile user may be located at different BSs. The BSs may not be able to exchange the collected data due to the limited capacity of backhaul links. Consequently, a distributed ANN learning algorithm is needed for data analytics, as the users' data is located at different BSs. One possibility to overcome this challenge is to leverage the emerging idea of federated learning [165], which enables distributed learning. Moreover, the training complexity of ANN-based data analytics algorithms can be higher than that of other ML tools such as ridge regression. Consequently, one must balance the tradeoff between prediction accuracy and training complexity. Finally, training ANNs may require a large amount of training data (depending on the application), and such data may not always be readily available in a wireless network.
Table V summarizes the type of ANNs and learning algorithms used for each existing work in each application. Based on this table, one can identify the advantages, disadvantages, and limitations of each learning algorithm for all types of problems encountered in the literature. Table VI provides a summary of the key wireless networking problems that can be solved by using ANNs, along with the challenges and relevant applications.

V. CONCLUSION
In this paper, we have provided one of the first comprehensive tutorials on the use of artificial neural networks-based machine learning for enabling a variety of applications in tomorrow's wireless networks. In particular, we have presented an overview of a number of key types of neural networks, such as recurrent, spiking, and deep neural networks. For each type, we have overviewed the basic architecture as well as the associated challenges and opportunities. Then, we have provided a panoramic overview of the variety of wireless communication problems that can be addressed using ANNs. In particular, we have investigated many emerging applications, including unmanned aerial vehicles, wireless virtual reality, mobile edge caching and computing, Internet of Things, and multi-RAT wireless networks. For each application, we have
TABLE VI
SUMMARY OF THE USE OF ANNS FOR SPECIFIC WIRELESS PROBLEMS
Columns: Wireless networking related problems | ANN tools | Use case (DA, RL) and relevant applications (UAV, VR, MECC, IoT, Multi-RAT, Physical Layer), followed by the challenges

Resource allocation | RNNs, DNNs | √ √ √ √ √ √ √
• Large networks and action spaces.
• Need for self-organizing solutions.
• Resource allocation variables are coupled.
• Non-convex optimization problems.

Wireless-aware path planning for autonomous systems (e.g., UAVs) | RNNs, DNNs | √ √
• Involves time-dependent locations.
• Driven by environmental data.
• Need for adaptation to dynamic settings.
• Requires distributed solutions.

Channel modeling and estimation | SNNs | √ √ √ √ √ √
• Unknown relationship between the received and transmitted signals.
• Need for estimation of wireless channels.
• Need for modeling solutions that can adapt to time-varying channels.

Handover | RNNs | √ √ √ √ √
• Handover often involves dynamic mobility, thus requiring adaptive solutions.
• Need for on-the-fly decisions.
• Optimized variables are binary.

Wireless user behavior estimation | RNNs, DNNs | √ √ √ √ √ √ √
• User behavior is correlated in time.
• User behavior involves underlying factors that must be characterized.
• User behaviors vary across time scales.

Wireless content prediction | SNNs, DNNs | √ √ √ √ √
• Content is time and user dependent.
• Content requests are often arbitrary.
• Predictions are focused on data analytics.

Content delivery format and method (e.g., 360° or 120° contents) | RNNs, DNNs | √ √ √ √
• Need to consider users' requirements.
• Optimized variables are discrete.
• Content requests are time varying.

Users and computational tasks clustering | CNNs | √ √ √ √ √
• High complexity to scan all of the users and computational tasks.

Computational time and demand predictions of each task requested by each user | SNNs | √ √
• Computational time and demands are time-dependent and continuous.
• Predictions are driven by users' other behaviors and information.

Detection of LoS links | SNNs, DNNs | √ √ √ √ √ √
• LoS links are dynamic and time-varying.
• Need to observe the physical channel.
• Need to track users' mobility.

Antenna tilting | SNNs | √ √ √
• Must estimate the angle of the receiver's antennas.
• Requires intelligent tracking of transmitter-receiver coupling.
• Must be executed in a short time.

Data compression and recovery for data transmission and caching | CNNs | √ √ √ √ √
• High complexity of data scanning.
• Correlation among user data.
• Lack of prior models on user identities.
• A large amount of input data.

User and device identifications | DNNs | √ √
• Large-scale nature of the network.
• Presence of large volumes of data.
• High churn and dynamics.

IoT device management | DNNs | √ √ √ √
• Diversity of IoT devices.
• Large-scale nature of the IoT system.
• High churn and dynamics in IoT.

Wireless network modeling | DNNs | √ √ √ √
• Diversity of mobile devices.
• Need to identify various mobile devices.
• Need to adapt to dynamic environments.

Autonomous vehicle (e.g., UAV) trajectory prediction | SNNs | √ √
• A UAV's trajectory is continuous.
• Trajectory is time-dependent.
• Trajectory depends on wireless parameters (e.g., interference).

Wireless data correlation | RNNs, CNNs | √ √ √ √ √
• Data is correlated in the time and space domains.
• Need to process large-sized data.
provided the main motivation for using ANNs along with their associated challenges, while also providing a detailed example for a use case scenario. Last, but not least, for each application, we have provided a broad overview of future works that can be addressed using ANNs. Clearly, the future of wireless networks will inevitably rely on artificial intelligence and, thus, this paper provides a stepping stone towards understanding the analytical machinery needed to develop such a new breed of wireless networks.

REFERENCES

[1] N. C. Luong, D. T. Hoang, P. Wang, D. Niyato, D. I. Kim, and Z. Han, “Data collection and wireless communication in internet of things (IoT) using economic analysis and pricing models: a survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 4, pp. 2546–2590, June 2016.
[2] Z. Dawy, W. Saad, A. Ghosh, J. G. Andrews, and E. Yaacoub, “Toward massive machine type cellular communications,” IEEE Wireless Communications, vol. 24, no. 1, pp. 120–128, Nov. 2017.
[3] T. Park, N. Abuzainab, and W. Saad, “Learning how to communicate in the internet of things: Finite resources and heterogeneity,” IEEE Access, vol. 4, pp. 7063–7073, Nov. 2016.
[4] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 74–80, Feb. 2014.
[5] 3GPP, “Study on latency reduction techniques for LTE,” Technical Report (TR) 36.881, 3rd Generation Partnership Project (3GPP), 2016.
[6] Y. Zeng, R. Zhang, and T. J. Lim, “Wireless communications with unmanned aerial vehicles: opportunities and challenges,” IEEE Communications Magazine, vol. 54, no. 5, pp. 36–42, May 2016.
[7] T. Zeng, O. Semiari, W. Saad, and M. Bennis, “Joint communication and control for wireless autonomous vehicular platoon systems,” arXiv preprint arXiv:1804.05290, 2018.
[8] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Communications and control for wireless drone-based antenna array,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 820–834, Jan. 2019.
[9] F. Javed, M. K. Afzal, M. Sharif, and B. Kim, “Internet of things (IoT) operating systems support, networking technologies, applications, and challenges: A comparative review,” IEEE Communications Surveys & Tutorials, vol. 20, no. 3, pp. 2062–2100, Third Quarter 2018.
[10] G. Durisi, T. Koch, and P. Popovski, “Toward massive, ultrareliable, and low-latency wireless communication with short packets,” Proceedings of the IEEE, vol. 104, no. 9, pp. 1711–1726, Sept. 2016.
[11] D. G. Gopal and S. Kaushik, “Emerging technologies and applications for cloud-based gaming: Review on cloud gaming,” Emerging Technologies and Applications for Cloud-Based Gaming, vol. 41, no. 07, pp. 79–89, March 2016.
[12] 3GPP, “Extended Reality (XR) in 5G,” Technical Report (TR) 26.928, 3rd Generation Partnership Project (3GPP), 03 2018, Version 14.0.0.
[13] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, June 2014.
[14] T. Segaran, Programming collective intelligence: Building smart web 2.0 applications, O’Reilly Media, Inc., 2007.
[15] B. Yegnanarayana, Artificial neural networks, PHI Learning Pvt. Ltd., 2009.
[16] G. Chakraborty and B. Chakraborty, “A novel normalization technique for unsupervised learning in ANN,” IEEE Transactions on Neural Networks, vol. 11, no. 1, pp. 253–257, Jan. 2000.
[17] K. P. Bennett and A. Demiriz, “Semi-supervised support vector machines,” in Proc. of Advances in Neural Information Processing Systems, 1999, pp. 368–374.
[18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529, 2015.
[19] C. Andrieu, N. De Freitas, A. Doucet, and M. I. Jordan, “An introduction to MCMC for machine learning,” Machine Learning, vol. 50, no. 1-2, pp. 5–43, Jan. 2003.
[20] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” International Journal of Computer Vision, vol. 40, no. 1, pp. 25–47, Oct. 2000.
[21] F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys (CSUR), vol. 34, no. 1, pp. 1–47, March 2002.
[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. of the International Conference on Machine Learning, New York, NY, USA, July 2008, pp. 160–167.
[23] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: Sentiment classification using machine learning techniques,” in Proc. of the Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA, July 2002.
[24] C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.
[25] S. Bi, R. Zhang, Z. Ding, and S. Cui, “Wireless communications in the era of big data,” IEEE Communications Magazine, vol. 53, no. 10, pp. 190–199, Oct. 2015.
[26] “The Amazing Ways Verizon Uses AI And Machine Learning To Improve Performance,” https://www.forbes.com/sites/bernardmarr/2018/06/22/the-amazing-ways-verizon-uses-ai-and-machine-learning-to-improve-performance/#3695e5e17638, 2018.
[27] “Making waves with AI,” https://www.ericsson.com/en/mobility-report/reports/june-2018/applying-machine-intelligence-to-network-management, 2018.
[28] “Qualcomm AI Research,” https://www.qualcomm.com/invention/artificial-intelligence/ai-research, 2018.
[29] “Focus group on machine learning for future networks including 5G,” https://www.itu.int/en/ITU-T/focusgroups/ml5g/Pages/default.aspx, 2018.
[30] J. Ferber, Multi-agent systems: An introduction to distributed artificial intelligence, vol. 1, Addison-Wesley Reading, 1999.
[31] S. Bubeck, Convex Optimization: Algorithms and Complexity, Now Foundations and Trends, 2015.
[32] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, Dec. 2017.
[33] T. O’Shea, K. Karra, and T. C. Clancy, “Learning approximate neural estimators for wireless channel state information,” in Proc. of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, Sep. 2017.
[34] T. J. O’Shea, T. Erpek, and T. C. Clancy, “Deep learning based MIMO communications,” available online arXiv:1707.07980, July 2017.
[35] F. Liang, C. Shen, and F. Wu, “An iterative BP-CNN architecture for channel decoding,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 144–159, Feb. 2018.
[36] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 119–131, Feb. 2018.
[37] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in Proc. of IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Sapporo, Japan, July 2017.
[38] E. Baştuğ, M. Bennis, M. Médard, and M. Debbah, “Towards interconnected virtual reality: Opportunities, challenges and enablers,” IEEE Communications Magazine, vol. 55, no. 6, pp. 110–117, June 2017.
[39] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” arXiv preprint arXiv:1902.10265, 2019.
[40] R. Yu, “Huawei reveals the future of mobile AI at IFA 2017,” http://www.businesswire.com/news/home/20170902005020/en/Huawei-Reveals-Future-Mobile-AI-IFA-2017, 2017.
[41] S. Kovach, “Qualcomm CEO Steve Mollenkopf: What the big innovation house that powered the mobile boom is betting on next,” http://www.businessinsider.com/qualcomm-ceo-steve-mollenkopf-interview-2017-7, 2017.
[42] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing-A key technology towards 5G,” ETSI White Paper, vol. 11, no. 11, pp. 1–16, Sept. 2015.
[43] A. Ahmed and E. Ahmed, “A survey on mobile edge computing,” in Proc. of International Conference on Intelligent Systems and Control, Coimbatore, India, Jan. 2016.
[44] S. Sardellitti, G. Scutari, and S. Barbarossa, “Joint optimization of radio and computational resources for multicell mobile-edge computing,” IEEE Transactions on Signal and Information Processing over Networks, vol. 1, no. 2, pp. 89–103, June 2015.
[45] S. Nunna, A. Kousaridas, M. Ibrahim, M. Dillinger, C. Thuemmler, H. Feussner, and A. Schneider, “Enabling real-time context-aware collaboration through 5G and mobile edge computing,” in Proc. of International Conference on Information Technology-New Generations, Las Vegas, NV, USA, June 2015.
[46] G. Lee, W. Saad, and M. Bennis, “Decentralized cross-tier interference mitigation in cognitive femtocell networks,” in Proc. of IEEE International Conference on Communications (ICC), Paris, France, May 2017.
[47] Y. Mao, J. Zhang, and K. B. Letaief, “Dynamic computation offloading for mobile-edge computing with energy harvesting devices,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3590–3605, Sep. 2016.
[48] X. Chen, L. Jiao, W. Li, and X. Fu, “Efficient multi-user computation offloading for mobile-edge cloud computing,” IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, Oct. 2016.
[49] O. Semiari, W. Saad, S. Valentin, M. Bennis, and H. V. Poor, “Context-aware small cell networks: How social metrics improve wireless resource allocation,” IEEE Transactions on Wireless Communications, vol. 14, no. 11, pp. 5927–5940, July 2015.
[50] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for wireless resource management,” available online arXiv:1705.09412, May 2017.
[51] C. Jiang, H. Zhang, Y. Ren, Z. Han, K. C. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Communications, vol. 24, no. 2, pp. 98–105, April 2017.
[52] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-learning techniques in cognitive radios,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1136–1159, Oct. 2013.
[53] M. A. Alsheikh, S. Lin, D. Niyato, and H. P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 1996–2018, April 2014.
[54] H. B. Demuth, M. H. Beale, O. De Jess, and M. T. Hagan, Neural network design, Martin Hagan, 2014.
[55] J. Xie, F. R. Yu, T. Huang, R. Xie, J. Liu, C. Wang, and Y. Liu, “A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 1, pp. 393–430, First Quarter 2019.
[56] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 20, no. 4, pp. 2595–2621, Fourth Quarter 2018.
[57] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep learning for IoT big data and streaming analytics: A survey,” IEEE Communications Surveys & Tutorials, vol. 20, no. 4, pp. 2923–2960, Fourth Quarter 2018.
[58] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systems,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2432–2455, Fourth Quarter 2017.
[59] P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza, “A survey of machine learning techniques applied to self-organizing cellular networks,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2392–2431, Fourth Quarter 2017.
[60] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, “Application of machine learning in wireless networks: Key techniques and open issues,” IEEE Communications Surveys & Tutorials, to appear, 2019.
[61] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Communications Surveys & Tutorials, to appear, 2019.
[62] X. You, C. Zhang, X. Tan, S. Jin, and H. Wu, “AI for 5G: Research directions and paradigms,” Science China Information Sciences, vol. 62, no. 2, pp. 1589–1602, Nov. 2019.
[63] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, Jan. 2015.
[64] “Machine learning: What it is and why it matters,” https://www.sas.com/en_us/insights/analytics/machine-learning.html.
[65] E. Alpaydin, Introduction to machine learning, MIT Press, 2014.
[66] L. Rose, S. Lasaulce, S. M. Perlaza, and M. Debbah, “Learning equilibria with partial information in decentralized wireless networks,” IEEE Communications Magazine, vol. 49, no. 8, pp. 136–142, August 2011.
[67] D. P. Mandic and J. A. Chambers, Recurrent neural networks for prediction: learning algorithms, architectures and stability, Wiley Online Library, 2001.
[68] M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Computer Science Review, vol. 3, no. 3, pp. 127–149, Aug. 2009.
[69] P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990.
[70] M. Lukoševičius, A Practical Guide to Applying Echo State Networks, Springer Berlin Heidelberg, 2012.
[71] H. Jaeger, “Short term memory in echo state networks,” in GMD Report, 2001.
[72] R. Ali and T. Peter, “Minimum complexity echo state network,” IEEE Transactions on Neural Networks, vol. 22, no. 1, pp. 131–144, November 2011.
[73] C. Gallicchio and A. Micheli, “Deep reservoir computing: A critical analysis,” in Proc. of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, April 2016.
[74] C. M. Bishop, “Training with noise is equivalent to Tikhonov regularization,” Training, vol. 7, no. 1, pp. 108–116, 2008.
[75] B. Farhang-Boroujeny, Adaptive filters: theory and applications, John Wiley & Sons, 2013.
[76] H. Jaeger and H. Haas, “Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication,” Science, vol. 304, no. 5667, pp. 78–80, April 2004.
[77] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout, “Isolated word recognition with the liquid state machine: A case study,” Information Processing Letters, vol. 95, no. 6, pp. 521–528, Sept. 2005.
[78] W. Maass, “Liquid state machines: motivation, theory, and applications,” Computability in Context: Computation and Logic in the Real World, pp. 275–296, 2010.
[79] W. Maass, T. Natschläger, and H. Markram, “Real-time computing without stable states: A new framework for neural computation based on perturbations,” Neural Computation, vol. 14, no. 11, pp. 2531–2560, Nov. 2002.
[80] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press, 2016.
[81] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier networks,” in Proc. of Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, June 2011.
[82] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Chapter 7: Regularization for Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org.
[83] “A beginner guide to recurrent networks and LSTMs,” https://deeplearning4j.org/lstm.html.
[84] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. of Advances in Neural Information Processing Systems, 2012.
[85] M. Schmidt, D. Block, and U. Meier, “Wireless interference identification with convolutional neural networks,” in Proc. of IEEE International Conference on Industrial Informatics (INDIN), Emden, Germany, July 2017.
[86] M. Soh, “Learning CNN-LSTM architectures for image caption generation,” http://cs224d.stanford.edu/reports/msoh.pdf, 2016.
[87] L. J. Lin, “Reinforcement learning for robots using neural networks,” Tech. Rep., Carnegie-Mellon Univ Pittsburgh PA School of Computer Science, 1993.
[88] Y. Zeng, R. Zhang, and T. J. Lim, “Wireless communications with unmanned aerial vehicles: opportunities and challenges,” IEEE Communications Magazine, vol. 54, no. 5, pp. 36–42, May 2016.
[89] Y. Yang, M. Chen, C. Guo, C. Feng, and W. Saad, “Power efficient visible light communication (VLC) with unmanned aerial vehicles (UAVs),” IEEE Communications Letters, to appear, 2019.
[90] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Wireless communication using unmanned aerial vehicles (UAVs): Optimal transport theory for hover time optimization,” IEEE Transactions on Wireless Communications, vol. 16, no. 12, pp. 8052–8066, Dec. 2017.
[91] C. H. Liu, Z. Chen, J. Tang, J. Xu, and C. Piao, “Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 2059–2070, Sep. 2018.
[92] V. Sharma, M. Bennis, and R. Kumar, “UAV-assisted heterogeneous networks for capacity enhancement,” IEEE Communications Letters, vol. 20, no. 6, pp. 1207–1210, June 2016.
[93] H. Zhang, C. Cao, L. Xu, and T. A. Gulliver, “A UAV detection algorithm based on an artificial neural network,” IEEE Access, vol. 6, pp. 24720–24728, May 2018.
[94] D. Nodland, H. Zargarzadeh, and S. Jagannathan, “Neural network-based optimal adaptive output feedback control of a helicopter UAV,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 7, pp. 1061–1073, July 2013.
[95] J. R. G. Braga, H. F. C. Velho, G. Conte, P. Doherty, and Élcio H. S., “An image matching system for autonomous UAV navigation based on neural network,” in Proc. of International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand, Nov. 2016.
[96] J. Cui, Y. Liu, and A. Nallanathan, “Multi-agent reinforcement learning based resource allocation for UAV networks,” arXiv preprint arXiv:1810.10408, 2018.
[97] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 5, pp. 1046–1061, May 2017.
[98] M. Chen, W. Saad, and C. Yin, “Liquid state machine learning for resource and cache management in LTE-U unmanned aerial vehicle (UAV) networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 3, pp. 1504–1517, March 2019.
[99] X. Liu, M. Chen, and C. Yin, “Optimized trajectory design in UAV based cellular networks for 3D users: A double Q-learning approach,” arXiv preprint arXiv:1902.06610, 2019.
[100] H. Jaeger, “Controlling recurrent neural networks by conceptors,” available online: arxiv.org/abs/1403.3369, 2014.
[101] M. Chen, W. Saad, and C. Yin, “Echo state networks for self-organizing resource allocation in LTE-U with uplink-downlink decoupling,” IEEE Transactions on Wireless Communications, vol. 16, no. 1, pp. 3–16, Jan. 2017.
[102] U. Challita, Z. Dawy, G. Turkiyyah, and J. Naoum-Sawaya, “A chance constrained approach for LTE cellular network planning under uncertainty,” Computer Communications, vol. 73, pp. 34–45, Jan. 2016.
[103] M. Chen, W. Saad, C. Yin, and M. Debbah, “Data correlation-aware resource management in wireless virtual reality (VR): An echo state transfer learning approach,” IEEE Transactions on Communications, vol. 67, no. 6, pp. 4267–4280, June 2019.
[104] M. Chen, W. Saad, and C. Yin, “Virtual reality over wireless networks: Quality-of-service model and learning-based resource management,” IEEE Transactions on Communications, vol. 66, no. 11, pp. 5621–5635, Nov. 2018.
[105] M. Chen, O. Semiari, W. Saad, X. Liu, and C. Yin, “Federated echo state learning for minimizing breaks in presence in wireless virtual reality networks,” arXiv preprint arXiv:1812.01202, 2018.
[106] G. A. Koulieris, G. Drettakis, D. Cunningham, and K. Mania, “Gaze prediction using machine learning for dynamic stereo manipulation in games,” in Proc. of IEEE Virtual Reality (VR), Greenville, SC, USA, March 2016.
[107] M. Chen, W. Saad, and C. Yin, “Echo-liquid state deep learning for 360° content transmission and caching in wireless VR networks with cellular-connected UAVs,” IEEE Transactions on Communications, to appear, 2019.
[108] E. Zeydan, E. Bastug, M. Bennis, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, “Big data caching for networking: Moving from cloud to edge,” IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, Sept. 2016.
[109] J. Cobb and H. ElAarag, “Web proxy cache replacement scheme based on back-propagation neural network,” Journal of Systems and Software, vol. 81, no. 9, pp. 1539–1558, Sept. 2008.
[110] Z. Zhang, M. Hua, C. Li, Y. Huang, and L. Yang, “Placement delivery array design via attention-based sequence-to-sequence model with deep neural network,” IEEE Wireless Communications Letters, vol. 8, no. 2, pp. 372–375, April 2019.
[111] Y. Wei, F. R. Yu, M. Song, and Z. Han, “Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor-critic deep reinforcement learning,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2061–2073, April 2019.
[112] E. Bastug, M. Bennis, E. Zeydan, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, “Big data meets telcos: A proactive caching perspective,” Journal of Communications and Networks, vol. 17, no. 6, pp. 549–557, December 2015.
[113] S. M. S. Tanzil, W. Hoiles, and V. Krishnamurthy, “Adaptive scheme for caching YouTube content in a cellular network: Machine learning approach,” IEEE Access, vol. 5, pp. 5870–5881, March 2017.
[114] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for proactive caching in cloud-based radio access networks with mobile users,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3520–3535, June 2017.
[115] L. Giupponi, R. Agusti, J. Perez-Romero, and O. Sallent, “Joint radio resource management algorithm for multi-RAT networks,” in Proc. of IEEE Global Telecommunications Conference (GLOBECOM), St. Louis, MO, Nov. 2005.
[116] H. He, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 852–855, Oct. 2018.
[117] S. Baban, D. Denkoviski, O. Holland, L. Gavrilovska, and H. Aghvami, “Radio access technology classification for cognitive radio networks,” in Proc. of IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), London, UK, Sept. 2013.
[118] Y. Cui, Y. Xu, R. Xu, and X. Sha, “A multi-radio packet scheduling algorithm for real-time traffic in a heterogeneous wireless network environment,” Information Technology Journal, vol. 10, pp. 182–188, Oct. 2010.
[119] M. Bennis, M. Simsek, A. Czylwik, W. Saad, S. Valentin, and M. Debbah, “When cellular meets WiFi in wireless small cell networks,” IEEE Communications Magazine, vol. 51, no. 6, pp. 44–50, June 2013.
[120] Y. Sun, M. Peng, and S. Mao, “Deep reinforcement learning-based mode selection and resource management for green fog radio access networks,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1960–1971, April 2019.
[121] N. Kaminski, I. Macaluso, E. Di Pascale, A. Nag, J. Brady, M. Kelly, K. Nolan, W. Guibene, and L. Doyle, “A neural-network-based realization of in-network computation for the Internet of Things,” in Proc. of IEEE International Conference on Communications, Paris, France, May 2017.
[122] S. R. Naidu, E. Zafiriou, and T. J. McAvoy, “Use of neural networks for sensor failure detection in a control system,” IEEE Control Systems Magazine, vol. 10, no. 3, pp. 49–55, April 1990.
[123] H. Ning and Z. Wang, “Future internet of things architecture: Like mankind neural system or social organization framework?,” IEEE Communications Letters, vol. 15, no. 4, pp. 461–463, March 2011.
[124] F. Alam, R. Mehmood, I. Katib, and A. Albeshri, “Analysis of eight data mining algorithms for smarter internet of things (IoT),” Procedia Computer Science, vol. 98, pp. 437–442, Dec. 2016.
[125] X. Luo, Y. Lv, M. Zhou, W. Wang, and W. Zhao, “A Laguerre neural network-based ADP learning scheme with its application to tracking control in the internet of things,” Personal and Ubiquitous Computing, vol. 20, no. 3, pp. 361–372, June 2016.
[126] L. Du, Y. Du, Y. Li, J. Su, Y. Kuan, C. Liu, and M. F. Chang, “A reconfigurable streaming deep convolutional neural network accelerator for internet of things,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 1, pp. 198–208, Jan. 2018.
[127] T. Yu, X. Wang, and A. Shami, “UAV-enabled spatial data sampling in large-scale IoT systems using denoising autoencoder neural network,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1856–1865, April 2019.
[128] P. Zhang, X. Kang, D. Wu, and R. Wang, “High-accuracy entity state prediction method based on deep belief network towards IoT search,” IEEE Wireless Communications Letters, to appear, 2018.
[129] J. Liang, X. Yu, and H. Li, “Collaborative energy-efficient moving in internet of things: Genetic fuzzy tree vs. neural networks,” IEEE Internet of Things Journal, to appear, 2018.
[130] “Qualcomm announces support for next-generation VR experiences with new Snapdragon 845 virtual reality development kit,” https://www.qualcomm.com/news/releases/2018/03/21/qualcomm-announces-support-next-generation-vr-experiences-new-snapdragon, 2018.
[131] M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Towards low-latency and ultra-reliable virtual reality,” available online: arxiv.org/abs/1801.07587, Jan. 2018.
[132] “Vive wireless adapter,” https://www.vive.com/us/wireless-adapter/, 2018.
[133] “TPCAST wireless adapter for Oculus Rift,” https://www.tpcastvr.com/product-rift, 2018.
[134] “Because your senses do not have wires,” https://www.intel.com/content/www/us/en/wireless-products/wigig-overview.html, 2018.
[135] A. E. Abbas, “Constructing multiattribute utility functions for decision analysis,” INFORMS Tutorials in Operations Research, pp. 62–98, Oct. 2010.
[136] M. Abrash, “What VR could, should, and almost certainly will be within two years,” https://media.steampowered.com/apps/abrashblog/Abrash%20Dev%20Days%202014.pdf.
[137] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, Aug. 2014.
[138] M. A. Maddah-Ali and U. Niesen, “Coding for caching: Fundamental limits and practical challenges,” IEEE Communications Magazine, vol. 54, no. 8, pp. 23–29, August 2016.
[139] Y. Fadlallah, A. M. Tulino, D. Barone, G. Vettigli, J. Llorca, and J. M. Gorce, “Coding for caching in 5G networks,” IEEE Communications Magazine, vol. 55, no. 2, pp. 106–113, Feb. 2017.
[140] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Communications Surveys & Tutorials, to appear, 2017.
[141] M. Peng, Y. Sun, X. Li, Z. Mao, and C. Wang, “Recent advances in cloud radio access networks: System architectures, key techniques, and open issues,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 2282–2308, Mar. 2016.
[142] M. S. Elbamby, M. Bennis, and W. Saad, “Proactive edge computing in latency-constrained fog networks,” in Proc. of European Conference on Networks and Communications (EuCNC), Oulu, Finland, June 2017.
[143] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, Aug. 2014.
[144] D. Wu and R. Negi, “Effective capacity-based quality of service measures for wireless networks,” Mobile Networks and Applications, vol. 11, no. 1, pp. 91–99, February 2006.
[159] R. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229–256, May 1992.
[160] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” Advances in Neural Information Processing Systems, vol. 12, pp. 1057–1063, 2000.
[161] T. Tieleman and G. Hinton, “Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude,” Technical report, 2012.
[162] M. Agiwal, A. Roy, and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1617–1655, Feb. 2016.
[163] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, Fourth Quarter 2015.
[164] Y. Hu, A. Sanjab, and W. Saad, “Dynamic psychological game theory for secure internet of battlefield things (IoBT) systems,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 3712–3726, April 2019.
[165] V. Smith, C. K. Chiang, M. Sanjabi, and A. Talwalkar, “Federated multi-task learning,” available online: arxiv.org/abs/1705.10467, May 2017.
[145] Y. Gu, W. Saad, M. Bennis, M. Debbah, and Z. Han, “Matching theory
for future wireless networks: fundamentals and applications,” IEEE
Communications Magazine, Special Issue on Emerging Applications,
Services, and Engineering for Cellular Cognitive Systems, vol. 53, no.
15, pp. 52–59, May 2015.
[146] 5GPPP, “The 5G infrastructure public private partnership: the next
generation of communication networks and services.,” Feb. 2015.
[147] C. Sexton, N. J. Kaminski, J. M. Marquez-Barja, N. Marchetti, and
L. A. DaSilva, “5G: Adaptable networks enabled by versatile radio
access technologies,” IEEE Communications Surveys Tutorials, vol.
19, no. 2, pp. 688–720, Secondquarter 2017.
[148] Y. Hu, R. MacKenzie, and M. Hao, “Expected Q-learning for self-
organizing resource allocation in LTE-U with downlink-uplink decou-
pling,” in Prof. of European Wireless Conference, Dresden, Germany,
May 2017.
[149] J. Andrews, “Seven ways that HetNets are a cellular paradigm shift,”
IEEE Communications Magazine, vol. 51, no. 3, pp. 136–144, Mar.
2013.
[150] G. Salami, O. Durowoju, A. Attar, O. Holland, R. Tafazolli, and
H. Aghvami, “A comparison between the centralized and distributed
approaches for spectrum management,” IEEE Communications Surveys
Tutorials, vol. 13, no. 2, pp. 274–290, Second 2011.
[151] Q. Li, H. Niu, A. Papathanassiou, and G. Wu, “5G network capacity:
Key elements and technologies,” IEEE Vehicular Technology Magazine,
vol. 9, no. 1, pp. 71–78, Mar. 2014.
[152] O. Semiari, W. Saad, M. Bennis, and M. Debbah, “Joint millimeter
wave and microwave resources allocation in cellular networks with
dual-mode base stations,” IEEE Transactions on Wireless Communi-
cations, vol. 16, no. 7, pp. 4802–4816, July 2017.
[153] S. Ha, S. Sen, C. Joe-Wong, Y. Im, and M. Chiang, “TUBE: time
dependent pricing for mobile data,” in Proc. of Special Interest Group
on Data Communication (ACM SIGCOMM). Helsinki, Finland, Aug.
2012.
[154] U. Challita and W. Saad, “Network formation in the sky: Unmanned
aerial vehicles for multi-hop wireless backhauling,” in Proc. of the
IEEE Global Communications Conference (GLOBECOM), Singapore,
Dec. 2017.
[155] O. Semiari, W. Saad, M. Bennis, and Z. Dawy, “Inter-operator resource
management for millimeter wave, multi-hop backhaul networks,” IEEE
Transactions on Wireless Communications, vol. 16, no. 8, pp. 5258–
5272, Aug. 2017.
[156] N. Burlutskiy, M. Petridis, A. Fish, A. Chernov, and N. Ali, An
Investigation on Online Versus Batch Learning in Predicting User
Behaviour, Research and Development in Intelligent Systems XXXIII.
Springer, 2016.
[157] U. Challita, L. Dong, and W. Saad, “Deep learning for proactive
resource allocation in LTE-U networks,” in Proc. of European Wireless
Conference. Dresden, Germany, May 2017.
[158] U. Challita, L. Dong, and W. Saad, “Proactive resource management
for LTE in unlicensed spectrum: A deep learning perspective,” IEEE
Transactions on Wireless Communications, vol. 17, no. 7, pp. 4674–
4689, July 2018.
