Artificial Neural Networks-Based Machine Learning For Wireless Networks: A Tutorial
Abstract: In order to effectively provide ultra-reliable low-latency communications and pervasive connectivity for Internet of Things (IoT) devices, next-generation wireless networks can leverage intelligent, data-driven functions enabled by the integration of machine learning notions across the wireless core and edge infrastructure. In this context, this paper provides a comprehensive tutorial that overviews how artificial neural network (ANN)-based machine learning algorithms can be employed for solving various wireless networking problems. For this purpose, we first present a detailed overview of a number of key types of ANNs, including recurrent, spiking, and deep neural networks, that are pertinent to wireless networking applications. For each type of ANN, we present the basic architecture as well as specific examples that are particularly important for wireless network design. Such examples include echo state networks, liquid state machines, and long short-term memory. Then, we provide an in-depth overview of the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality applications over wireless networks, as well as edge computing and caching. For each individual application, we present the main motivation for using ANNs along with the associated challenges, provide a detailed example for a use case scenario, and outline future works that can be addressed using ANNs. In a nutshell, this article constitutes the first holistic tutorial on the development of ANN-based machine learning techniques tailored to the needs of future wireless networks.

This work was supported in part by the National Natural Science Foundation of China under Grant 61629101, Grant 61871041, and Grant 61671086, in part by Beijing Natural Science Foundation and Municipal Education Committee Joint Funding Project under Grant KZ201911232046, in part by the 111 Project under Grant B17007, in part by grants No. ZDSYS201707251409055, No. 2017ZT07X152, No. 2018B030338001, and No. 2018YFB1800800, in part by the U.S. National Science Foundation under Grants CNS-1460316, CNS-1836802, and IIS-1633363.
Digital Object Identifier: 10.1109/COMST.2019.2926625
1553-877X © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

The wireless networking landscape is undergoing a major revolution. The smartphone-centric networks of yesteryear are gradually morphing into an Internet of Things (IoT) ecosystem [1]–[3] that integrates a heterogeneous mix of wireless-enabled devices ranging from smartphones to drones, connected vehicles, wearables, sensors, and virtual reality devices. This unprecedented transformation will not only drive an exponential growth in wireless traffic in the foreseeable future, but it will also lead to the emergence of new and untested wireless service use cases that substantially differ from conventional multimedia or voice-based services [4]. For instance, beyond the need for high data rates, which has been the main driver of the wireless network evolution in the past decade, next-generation wireless networks will also have to deliver ultra-reliable, low-latency communication [4], [5] that adapts in real time to the dynamics of the IoT users and the IoT's physical environment. For example, drones and connected vehicles [6] will place autonomy at the heart of the IoT. This, in turn, will necessitate the deployment of ultra-reliable wireless links that can provide real-time, low-latency control for such autonomous systems [7]–[9]. Meanwhile, in tomorrow's wireless networks, large volumes of data will be collected, periodically and in real time, across a massive number of sensing and wearable devices that monitor physical environments. Such massive short-packet transmissions will lead to substantial traffic over the wireless uplink, which has traditionally been much less congested than the downlink [10]. This same wireless network must also support cloud-based gaming [11], immersive virtual reality services [12], real-time HD streaming, and conventional multimedia services. This ultimately creates a radically different networking environment whose novel applications and their diverse quality-of-service (QoS) and reliability requirements mandate a fundamental change in the way in which wireless networks are modeled, analyzed, designed, and optimized.

The need to cope with this ongoing and rapid evolution of wireless services has led to a considerable body of research that investigates what the optimal cellular network architecture will be within the context of the emerging fifth generation (5G) wireless networks (e.g., see [13] and the references therein). While the main ingredients for 5G, such as dense small cell deployments, millimeter wave (mmWave) communications, and device-to-device (D2D) communications, have been identified, integrating them into a truly harmonious wireless system that can meet the IoT challenges requires instilling intelligent functions across both the edge and the core of the network. These intelligent functions must be able to adaptively exploit the wireless system resources and the generated data, in order to optimize the network operations and guarantee, in real time, the QoS needs of emerging wireless and IoT services. Such mobile edge and core intelligence can potentially be realized by integrating fundamental notions of machine learning (ML) [14], in particular, artificial neural
network (ANN)-based ML approaches, across the wireless infrastructure and the end-user devices. ANNs [15] are a computational nonlinear machine learning framework that can be used for supervised learning, unsupervised learning [16], semi-supervised learning [17], and reinforcement learning [18], in various wireless networking scenarios. Hereinafter, ML is used to refer to ANN-based ML.

A. Role of ANNs in Wireless Networks

ML tools are undoubtedly among the most important means for endowing wireless networks with intelligent functions, as evidenced by the wide adoption of ML in a myriad of application domains [19]–[24]. In the context of wireless networks, ML will enable any wireless device to actively and intelligently monitor its environment by learning and predicting the evolution of the various environmental features (e.g., wireless channel dynamics, traffic patterns, network composition, content requests, user context, etc.) and proactively taking actions that maximize the chances of success for some predefined goal, which, in a wireless system, pertains to some sought-after quality-of-service. ML enables the network infrastructure to learn from the wireless networking environment and take adaptive network optimization actions. In consequence, ML is expected to play several roles in the next generation of wireless networks [25]–[29].

First, the most natural application of ML in a wireless system is to exploit intelligent and predictive data analytics to enhance situational awareness and the overall network operations [25]. In this context, ML will provide the wireless network with the ability to parse through massive amounts of data, generated from multiple sources that range from wireless channel measurements and sensor readings to drones and surveillance images, in order to create a comprehensive operational map of the massive number of devices within the network [30]. This map can, in turn, be exploited to optimize various functions, such as fault monitoring and user tracking, across the wireless network.

Second, beyond its powerful intelligent and predictive data analytics functions, ML will be a major driver of intelligent and data-driven wireless network optimization [30]. For instance, ML tools will enable the introduction of intelligent resource management tools that can be used to address a variety of problems ranging from cell association and radio access technology selection to frequency allocation, spectrum management, power control, and intelligent beamforming. In contrast to the conventional distributed optimization techniques, which are often run iteratively in an offline or semi-offline manner [31], ML-guided resource management mechanisms will be able to operate in a fully online manner by learning, in real time, the states of the wireless environment and the network's users. Such mechanisms will therefore be able to continuously improve their own performance over time which, in turn, will enable more intelligent and dynamic network decision making. Such ML-driven decision making is essential for much of the envisioned IoT and 5G services, particularly those that require real-time, low-latency operation, such as autonomous driving, drone guidance, and industrial control. In fact, if properly designed, ML optimization algorithms will provide inherently self-organizing, self-healing, and self-optimizing solutions for a broad range of problems within the context of network optimization and resource management. Such ML-driven self-organizing solutions are particularly apropos for ultra-dense wireless networks in which classical centralized and distributed optimization approaches can no longer cope with the scale and the heterogeneity of the network.

Third, beyond its system-level functions, ML can play a key role at the physical layer of a wireless network [32]. As shown in [32]–[37], ML tools can be used to redefine the way in which physical layer functions, such as coding and modulation, are designed, at both transmitter and receiver levels, within a generic communication system. Such an ML-driven approach has been shown [32]–[37] to hold considerable promise in delivering lower bit error rates and better robustness to wireless channel impediments.

Last, but not least, the rapid deployment of highly user-centric wireless services, such as virtual reality [38], in which the gap between the end-user and the network functions is minimal, strongly motivates the need for wireless networks that can track and adapt to human user behavior. In this regard, ML is perhaps the only tool that is capable of learning and mimicking human behavior, which will help the wireless network adapt its functions to its human users, thus creating a truly immersive environment and maximizing the overall quality-of-experience (QoE) of the users.

From the above discussion, we can further narrow down the introduction of ML in wireless networks to two key functions: 1) intelligent and predictive data analytics, that is, the ability of the wireless network to intelligently process large volumes of data, gathered from its devices, in order to analyze and predict the context of the wireless users and the wireless network's environmental states, thus enabling data-driven network-wide operational decisions, and 2) intelligent/self-organizing network control and optimization, that is, the ability of the wireless network to dynamically learn the wireless environment and intelligently control the wireless network and optimize its resources according to the information learned about the wireless environment and users' states.

Clearly, ML-based system operation is no longer a privilege, but rather a necessity for future wireless networks. ML-driven wireless network designs will pave the way towards an unimaginably rich set of new network functions and wireless services. For instance, even though 5G networks may not be fully ML capable, we envision that the subsequent, sixth generation (6G) [39] of wireless cellular networks will surely integrate important tools from ML, as evidenced by the recent development of intelligent mobile networks proposed by Huawei [40] and the "big innovation house" proposed by Qualcomm [41]. As such, the question is no longer if ML tools are going to be integrated into wireless networks but rather when such an integration will happen. In fact, the importance of an ML-enabled wireless network has already been motivated by a number of recent wireless networking paradigms, such as mobile edge caching, context-aware networking, and
mobile edge computing [42]–[49], the majority of which use ML techniques for various tasks such as user behavior analysis and predictions so as to determine which contents to cache and how to proactively allocate computing resources. However, despite their importance, these works have a narrow focus and do not provide any broad, tutorial-like material that can shed light on the challenges and opportunities associated with the use of ML for designing intelligent wireless networks.

B. Previous Works

A number of surveys and tutorials on ML applications in wireless networking have been published, for example, [3], [32], and [50]–[62]. Nevertheless, these works are limited in a number of ways. First, a majority of the existing works focus on a single ML technique (often the basics of deep learning [32], [50], and [56]–[58] or reinforcement learning [61]) and, as such, they do not capture the rich spectrum of available ML frameworks. Second, they mostly restrict their scope to a single wireless application such as sensor networks [53], cognitive radio networks [52], machine-to-machine (M2M) communication [3], physical layer design [32], software defined networking [55], Internet of Things [57], or self-organizing networks (SONs) [59], and, hence, they do not comprehensively cover the broad range of applications that can adopt ML in future networks. Third, a large number of the existing surveys and tutorials, such as [3], [51]–[53], [60], and [62]¹, are highly qualitative and do not provide an in-depth technical and quantitative description of the variety of existing ML tools that are suitable for wireless communications. Last, but not least, some surveys discuss the basics of neural networks with applications outside of wireless communications. However, these surveys are largely inaccessible to the wireless community, due to their reliance on examples from rather orthogonal disciplines such as computer vision. Moreover, most of the existing tutorials or surveys do not provide concrete guidelines on how, when, and where to use different artificial neural network (ANN) tools in the context of wireless networks. Finally, the introductory literature on ML for wireless networks, such as [3], [32], and [50]–[62], is largely sparse and fragmented and provides very scarce details on the role of ANNs, hence making it difficult to understand the intrinsic details of this broad and far-reaching area. Table I summarizes the difference between this tutorial and the magazine, tutorial, and survey papers. From Table I, we can see that, compared to the existing works such as [3], [32], [50]–[62], our tutorial provides a more detailed exposition of several types of ANNs that are particularly useful for wireless applications and explains, pedagogically and in detail, how to develop ANN-based ML solutions to endow wireless networks with intelligence and realize the full potential of 5G systems, and beyond.

¹ The main difference between our tutorial and [62] is that the authors in [62] do not provide a comprehensive tutorial on how a broad range of ANNs can be used for solving the wireless communication problems related to drone-based communications, spectrum management with multiple radio access technologies, wireless virtual reality, mobile edge caching and computing, and the IoT.

C. Contributions

The main contribution of this paper is, thus, to provide a tutorial on the topic of ANN-based ML for wireless network design. The overarching goal is to give a tutorial on the emerging research contributions, from ANNs and wireless communications, that address the major opportunities and challenges in developing ANN-based ML frameworks for understanding and designing intelligent wireless systems. To the best of our knowledge, this is the first tutorial that gathers the state-of-the-art and emerging research contributions related to the use of ANNs for addressing a set of communication problems in beyond 5G wireless networks. Our main contributions include:
• We provide a comprehensive treatment of artificial neural networks, with an emphasis on how such tools can be used to create a new breed of ML-enabled wireless networks.
• After providing a brief introduction to the basics of ML, we provide a more detailed exposition of ANNs that are particularly useful for wireless applications, such as recurrent, spiking, and deep neural networks. For each type, we provide an introduction to its basic architecture and a specific use-case example. Other ANNs that can be used for wireless applications are also briefly mentioned where appropriate.
• Then, we discuss a broad range of wireless applications that can make use of ANNs. These applications include drone-based communications, spectrum management with multiple radio access technologies, wireless virtual reality, mobile edge caching and computing, and the IoT system, among others. For each application, we first outline the main rationale for applying ANNs while pinpointing illustrative scenarios. Then, we expose the challenges and opportunities brought forward by the use of ANNs in the specific wireless application. We complement this discussion with a detailed example drawn from the state-of-the-art and, then, we conclude by shedding light on the potential future works within each specific area.

The rest of this tutorial is organized as follows (Fig. 1). In Section II, we introduce the basics of ANNs. Section III presents several key types of ANNs such as recurrent neural networks (RNNs), spiking neural networks (SNNs), and deep neural networks (DNNs). In Section IV, we discuss the use of ANNs for wireless communication and the corresponding challenges and opportunities. Finally, conclusions are drawn in Section V.

II. ARTIFICIAL NEURAL NETWORKS: PRELIMINARIES

ML was born from pattern recognition and it is essentially based on the premise that intelligent machines should be able to learn from and adapt to their environment through experience [19]–[24]. Due to the ever-growing volumes of generated data (across critical infrastructures, communication networks, and smart cities) and the need for intelligent data analytics, the use of ML algorithms has become ubiquitous [64] across many sectors, such as financial services, health care, technology, and entertainment. Using ML algorithms to
build models that uncover connections and predict dynamic system or human behavior, system operators can make intelligent decisions without any human intervention. For example, in a wireless system such as the IoT, ML tools can be used for intelligent data analytics and edge intelligence. ML tasks often depend on the nature of their training data. In ML, training is the process that teaches the machine learning framework to achieve a specific goal, such as speech recognition. In other words, training enables the ML framework to discover potential relationships between the input data and the output data of this machine learning framework. There exist, in general, four key classes of learning approaches [65]: a) supervised learning, b) unsupervised learning, c) semi-supervised learning, and d) reinforcement learning (RL).

TABLE I
COMPARISON OF THIS WORK WITH EXISTING SURVEY AND TUTORIAL PAPERS. HERE, "CC", "CR", "DT", "PL", AND "DA" REFER TO CACHING AND COMPUTING, COGNITIVE RADIO NETWORK, DATA TRAFFIC DOMAIN, PHYSICAL LAYER DOMAIN, AND DATA ANALYTICS.

Fig. 1 (organization of the tutorial). Section I: Introduction (A. Role of Machine Learning in Wireless Networks; B. Previous Works; C. Contributions). Section II: Artificial Neural Networks: Preliminaries (Brief Introduction to Machine Learning and Motivation Behind Artificial Neural Networks; Introduction to the Architecture of Artificial Neural Networks). Section III: Types of Artificial Neural Networks (A. Recurrent Neural Networks; B. Spiking Neural Networks; C. Deep Neural Networks). Section V: Conclusion.

Supervised learning algorithms are trained using labeled data [65]. When dealing with labeled data, both the input data and its desired output data are known to the system. Supervised learning is commonly used in applications that have enough historical data. In contrast, the training of unsupervised learning tasks is done without labeled data [65]. The goal of unsupervised learning is to explore the data and infer some structure directly from the unlabeled data. Semi-supervised learning is used for the same applications as supervised learning but it uses both labeled and unlabeled data for training [65]. This type of learning can be used with methods such as classification, regression, and prediction. Semi-supervised learning is useful when the cost of a fully labeled training process is relatively high. In contrast to the
previously discussed learning methods that need to be trained with historical data, RL is trained with data collected from the implementation of the RL algorithm itself [65]. The goal of RL is to learn an environment and find the best strategies for a given agent, in different environments. RL algorithms are particularly interesting in the context of wireless network optimization [66]. To perform supervised, unsupervised, semi-supervised, or RL tasks, several frameworks have been developed. Among those frameworks, ANNs [54] are arguably the most important, as they are able to mimic human intelligence.

ANNs are inspired by the structure and functional aspects of biological neural networks, which can learn from complicated or imprecise data [54]. Within the context of wireless communications, as will become clear in later sections, ANNs can be used to investigate and predict network and user behavior so as to provide user information for solving diverse wireless networking problems such as cell association, spectrum management, computational resource allocation, and cached content replacement. Moreover, recent developments of smart devices and mobile applications have significantly increased the level at which human users interact with mobile systems. A trained ANN can be thought of as an "expert" in dealing with human-related data. Therefore, using ANNs to extract information from the user environment can provide a wireless network with the ability to predict the users' future behaviors and, hence, to design an optimal strategy to improve the resulting QoS and reliability.

There are various types of ANNs (see Fig. 2):
• Modular neural networks: A modular neural network (MNN) is composed of several independent ANNs and an intermediary. In an MNN, each ANN is used to complete one subtask of the entire task that the MNN is meant to perform. The intermediary processes the output of each independent ANN and generates the output of the MNN.
• Recurrent neural networks: RNNs are ANN architectures that allow neuron connections from a neuron in one layer to neurons in previous layers. Depending on the activation functions and connection methods used for the neurons, RNNs can be used to define several different architectures: a) stochastic neural networks, b) bidirectional neural networks (BNNs), c) fully recurrent neural networks (FRNNs), d) neural Turing machines (NTMs), e) long short-term memories (LSTMs), f) echo state networks (ESNs), g) simple recurrent neural networks (SRNNs), and h) gated recurrent units (GRUs).
• Generative adversarial networks: Generative adversarial networks (GANs) consist of two neural networks. One neural network is used to learn a map from a latent space to a particular data distribution, while the other neural network is used to discriminate between the true data distribution and the distribution produced by the first neural network.
• Deep neural networks: All the ANNs that have multiple hidden layers are known as DNNs.
• Spiking neural networks: Spiking neural networks consist of spiking neurons that accurately mimic biological neural networks.
• Feedforward neural networks: In a feedforward neural network (FNN), each neuron has incoming connections only from the previous layer and outgoing connections only to the next layer. FNNs can be used to define more advanced architectures such as: a) extreme learning machines (ELMs), b) convolutional neural networks (CNNs), c) time delay neural networks (TDNNs), d) autoencoders, e) probabilistic neural networks (PNUs), and f) radial basis functions (RBFs).
• Physical neural networks: In a physical neural network (PNN), an electrically adjustable resistance material is used to emulate the function of a neural activation.

Each type of ANN is suitable for a particular learning task. For instance, RNNs are effective in dealing with time-dependent data while SNNs are effective in dealing with continuous data. It should be noted that most of the data collected by wireless networks is time-dependent and continuous. In particular, in wireless networks, the user context and behavior, the wireless signals, and the wireless channel conditions are all time-dependent and continuous. RNNs and SNNs are effective in dealing with such collected data. They can exploit this data for various purposes, such as network control and user behavior predictions. However, since RNNs or SNNs can record only a limited amount of historical data, they may not be able to solve all of the wireless communication problems. To solve complex wireless problems that cannot be solved by shallow RNNs and SNNs, one can use DNNs, which have a high memory capacity for data analytics and can separate the complex problem that needs to be learned into a composition of several simpler problems, thus making the learning process effective. In consequence, in Section III, we specifically introduce RNNs, SNNs, and DNNs, which are most suited for wireless network use cases.

III. TYPES OF ARTIFICIAL NEURAL NETWORKS

In this section, we specifically discuss three types of ANNs, namely RNNs, SNNs, and DNNs, that have a promising potential for wireless network design, as will become clear in Section IV. For each kind of ANN, we briefly introduce its architecture, advantages, and properties. Then, we present specific example architectures.

A. Recurrent Neural Networks

1) Architecture of Recurrent Neural Networks: In a traditional ANN, it is assumed that all the inputs or all the outputs are independent from each other. However, for many tasks, the inputs (outputs) are related. For example, for predicting the mobility patterns of wireless devices, the input data, that is, the users' locations, are certainly related. To this end, recurrent neural networks [67], which are ANN architectures that allow neuron connections from a neuron in one layer to neurons in previous layers, as shown in Fig. 3, have been introduced. This seemingly simple change enables the output of a neural network to depend not only on the current input, but also on the historical input, as shown in Fig. 4. This allows RNNs to make use of sequential information and exploit dynamic temporal behaviors such as those faced in mobility prediction
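As a rough illustration of the recurrence described in this subsection, the following minimal sketch shows how a hidden state lets the output at a given time depend on earlier inputs. It assumes a simple recurrent layer with a tanh activation and random, untrained weights; the function name `rnn_forward`, the layer sizes, and the two toy location traces are hypothetical and are not taken from the tutorial.

```python
import numpy as np


def rnn_forward(inputs, w_x, w_h, w_y):
    """Run a simple recurrent layer over a sequence and return all outputs."""
    h = np.zeros(w_h.shape[0])  # hidden state, carries the input history
    outputs = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(w_x @ x_t + w_h @ h)
        outputs.append(w_y @ h)
    return np.array(outputs)


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 8, 2  # assumed sizes: 2-D location in, 2-D output
w_x = rng.normal(scale=0.3, size=(n_hidden, n_in))
w_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
w_y = rng.normal(scale=0.5, size=(n_out, n_hidden))

# Two toy location traces that differ only in their first sample.
trace_a = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
trace_b = np.array([[5.0, -3.0], [1.0, 1.0], [2.0, 1.0]])

y_a = rnn_forward(trace_a, w_x, w_h, w_y)
y_b = rnn_forward(trace_b, w_x, w_h, w_y)

# The final inputs are identical; any difference in the final outputs comes
# from the history stored in the hidden state.
print(np.abs(y_a[-1] - y_b[-1]))
```

A feedforward network applied sample by sample would map the identical final inputs to identical outputs, which is the distinction the text draws between traditional ANNs and RNNs.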
TABLE II
SUMMARY OF THE ADVANTAGES AND DISADVANTAGES OF ANNS FOR WIRELESS APPLICATIONS

input vector of an ESN as $\mathbf{x}_t = [x_{t,1}, \ldots, x_{t,N_{\mathrm{in}}}]^{\mathrm{T}}$ and the output vector of an ESN as $\mathbf{y}_t = [y_{t,1}, \ldots, y_{t,N_{\mathrm{out}}}]^{\mathrm{T}}$. An ESN model consists of the input weight matrix $\mathbf{W}^{\mathrm{in}} \in \mathbb{R}^{N \times N_{\mathrm{in}}}$, the recurrent weight matrix $\mathbf{W} \in \mathbb{R}^{N \times (N+1)}$, the leaking rate $\alpha$, and the output weight matrix $\mathbf{W}^{\mathrm{out}} \in \mathbb{R}^{N_{\mathrm{out}} \times (1+N+N_{\mathrm{in}})}$, where $N$ is the number of neurons in the hidden layer. The leaking rate $\alpha$ must be chosen to match the speed of the dynamics of the hidden states $\mathbf{s}_t = [s_{t,1}, \ldots, s_{t,N}]^{\mathrm{T}}$, where $s_{t,i}$ represents the state of neuron $i$ at time $t$, and output $\mathbf{y}_t$. To allow ESNs to store historical information, the hidden state $\mathbf{s}_t$ should satisfy the so-called echo state property, which means that the hidden state $\mathbf{s}_t$ should be uniquely defined by the fading history of the input $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_t$. In contrast to traditional ANNs, such as FNNs, that need to adjust the weight values of the neurons in the hidden layers, ESNs only need to guarantee the echo state property. Typically, in order to guarantee the echo state property of an ESN, the spectral radius of $\mathbf{W}$ should be smaller than 1. The setting of other ESN components to guarantee the echo state property and to optimize ESN performance can be found in [70].

Having described the main components of ESNs, we now describe the activation value of each neuron. Even though the input and the hidden weight matrices are fixed (randomly), all the neurons of an ESN will have their own activation values (hidden state). As opposed to the classical RNNs in which the hidden state depends only on the current input, in ESNs, the hidden state will be given by:

$$\tilde{\mathbf{s}}_t = f\left(\mathbf{W}[1; \mathbf{s}_{t-1}] + \mathbf{W}^{\mathrm{in}} \mathbf{x}_t\right), \tag{1}$$

$$\mathbf{s}_t = (1 - \alpha)\,\mathbf{s}_{t-1} + \alpha\,\tilde{\mathbf{s}}_t, \tag{2}$$

where $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ and $[\cdot\,; \cdot]$ represents a vertical vector (or matrix) concatenation. The model is also sometimes used without the leaky integration, which is a special case for $\alpha = 1$ yielding $\tilde{\mathbf{s}}_t = \mathbf{s}_t$. From (1), we can see that the scaling of $\mathbf{W}^{\mathrm{in}}$ and $\mathbf{W}$ determines the proportion of how much the current state $\mathbf{s}_t$ depends on the current input $\mathbf{x}_t$ and how much on the previous state $\mathbf{s}_{t-1}$. Here, a feedback connection from $\mathbf{y}_{t-1}$ to $\mathbf{s}_t$ can be applied to the ESNs, defined as a weight matrix $\mathbf{W}^{\mathrm{fb}} \in \mathbb{R}^{N \times N_{\mathrm{out}}}$. Hence, (1) can be rewritten as $\tilde{\mathbf{s}}_t = f\left(\mathbf{W}[1; \mathbf{s}_{t-1}] + \mathbf{W}^{\mathrm{in}} \mathbf{x}_t + \mathbf{W}^{\mathrm{fb}} \mathbf{y}_{t-1}\right)$.
Based on the hidden state $\mathbf{s}_t$, the output signal of the ESN

Fig. 6. Architecture of a DNN (input, hidden, and output layers).

its own state, which gives the liquid a strong fading memory. The activity in the network and the actual firing of the neurons can also last for a while after the signal has ended, which can be viewed as another form of memory. Second, in the liquid of an LSM, the different input signals are separated, allowing for the readout to classify them. This separation is hypothesized to happen by increasing the dimensionality of the signal. For example, if the input signal has 20 input channels, this is transformed into 135 (3 × 3 × 15) signals and states of neurons in the liquid. For every pair of input signal and liquid neuron, there is a certain chance of being connected, e.g., 30% in [79]. The connections between the neurons are allocated in a stochastic manner (e.g., see [79, Appendix B]). All neurons in a liquid will connect to the readout functions.
• Readout Model: The readout of an LSM consists of one or more FNNs that use the activity state of the liquid to approximate a specific function. The purpose of the readout is to build the relationship between the dynamics of the spiking neurons and the desired output signals. The inputs of the readout networks are called readout-moments. These are snapshots of the liquid activity taken at a regular interval. Whatever measure is used, the readout represents the state of the liquid at some point in time. In general, in an LSM, FNNs are used as the readout function. FNNs will use the liquid dynamics (i.e., spikes) as their input and the desired output signals as their output. Then, the readout function can be trained using traditional training methods used for FNNs, mainly backpropagation. Once the readout function has been trained, the LSM can be used to perform the corresponding

shift from conventional, shallow ANNs, towards DNNs, include recent advances in computing capacity due to the emergence of capable processing units, the wide availability of data for DNN training, and the emergence of effective DNN training algorithms [81]. As opposed to shallow ANNs that have only one hidden layer, a DNN having multiple layers is more beneficial for the following reasons:
• Number of neurons: Generally, a shallow ANN would require a lot more neurons than a DNN for the same level of performance. In fact, the number of units in a shallow ANN grows exponentially with the complexity of the task.
• Task learning: While shallow ANNs can be effective in solving small-scale problems, they can be ineffective when dealing with more complex problems, such as wireless environment mapping. In fact, the main issue is that shallow ANNs are very good at memorization, but not so good at generalization. As such, DNNs are more suitable for many real-world tasks, which often involve complex problems that are solved by decomposing the function that needs to be learned into several simpler functions so as to improve the efficiency of the learning process.

It is worth noting that, although DNNs have a large capacity to model a high degree of nonlinearity in the input data, a central challenge is that of overfitting. In DNNs, overfitting becomes particularly acute due to the presence of a very large number of parameters. To overcome this issue, several advanced regularization approaches, such as dataset augmentation and weight decay [82], have been proposed. These methods modify the learning algorithm so that the test error is reduced at the expense of increased training errors. A summary of key advantages and disadvantages of DNNs for wireless applications is presented in Table II.

Next, we elaborate more on LSTM, a special kind of DNN that is capable of storing information for long periods of time by using an identity activation function for the memory cell. This, in turn, makes LSTM suitable for various wireless communication problems such as channel selection.

1) Example DNN - Long Short Term Memory: LSTMs, which typically consist of three hidden layers, are a special kind of "deep learning" RNNs that are capable of storing information for either long or short periods of time. In particular, the activations of an LSTM network correspond to short-term
tasks. memory, while the weights correspond to long-term memory.
Therefore, if the activations can preserve information over
C. Deep Neural Networks long periods of time, then this makes them long-term short-
Thus far, all of the discussed ANNs, including ESNs and term memory. Although both ESNs and LSTMs are good at
LSMs, have assumed a single hidden layer. Such an architec- modeling time series data, LSTM cells have the capability
ture is typically referred to as a shallow ANN. In contrast, a of dealing with long term dependencies. An LSTM contains
deep neural network is an ANN with multiple hidden layers LSTM units each of which having a cell with a state ct at time
between the input and the output layers [80], as shown in Fig. t. Access to this memory unit, as shown in Fig.7, for reading
6. Therefore, a DNN models high-level abstractions in data or modifying information is controlled via three gates:
through multiple nonlinear transformations to learn multiple • Input gate (it ): controls whether the input is passed on
levels of representation and abstraction [80]. Several types of to the memory cell or ignored.
DNNs exist such as deep CNNs, deep ESNs, deep LSMs, and • Output gate (ot ): controls whether the current activation
LSTM [80]. The main reasons that have enabled a paradigm vector of the memory cell is passed on to the output layer
or not.
• Forget gate ($g_t$): controls whether the activation vector of the
memory cell is reset to zero or maintained.

Therefore, an LSTM cell makes decisions about what to store, and when
to allow reads, writes, and erasures, via gates that open and close. At
each time step $t$, an LSTM receives inputs from two external sources,
the current frame $x_t$ and the previous hidden states of all LSTM
units in the same layer $s_{t-1}$, at each of the four terminals (the
three gates and the input). These inputs get summed up, along with the
bias factors $b_f$, $b_i$, $b_o$, and $b_c$. The gates are activated by
passing their total input through the logistic functions. Table III
summarizes the various behaviors an LSTM cell can achieve depending on
the values of the input and the forget gates. Moreover, the update
steps of a layer of LSTM units are summarized in the following
equations:

$$g_t = f_g(W^f x_t + U^f s_{t-1} + b_f), \tag{5}$$
$$i_t = f_g(W^i x_t + U^i s_{t-1} + b_i), \tag{6}$$
$$o_t = f_g(W^o x_t + U^o s_{t-1} + b_o), \tag{7}$$
$$c_t = g_t \odot c_{t-1} + i_t \odot f_c(W^c x_t + U^c s_{t-1} + b_c), \tag{8}$$
$$s_t = o_t \odot f_h(c_t), \tag{9}$$

where $g_t$, $i_t$, and $o_t$ are the forget, the input, and the output
gate vectors at time $t$, respectively. $x_t$ is the input vector,
$s_t$ is the hidden/output vector, and $c_t$ is the cell state vector
(i.e., internal memory) at time $t$. $W^f$ and $U^f$ are the weight and
transition matrices of the forget gate, respectively. $W^i$ and $U^i$
are the weight and transition matrices of the input gate, respectively.
$W^o$ and $U^o$ are the weight and transition matrices of the output
gate, respectively. $W^c$ and $U^c$ are the weight and transition
matrices of the cell state, respectively. $f_g$ is the sigmoid
activation function, while $f_c$ and $f_h$ are tanh activation
functions. $\odot$ denotes the Hadamard product. Compared to a standard
RNN, LSTM uses additive memory updates and separates the memory $c$
from the hidden state $s$, which interacts with the environment when
making predictions. To train an LSTM network, the stochastic gradient
descent algorithm can be used.

TABLE III
VARIOUS BEHAVIORS OF AN LSTM CELL
Input gate   Forget gate   Behavior
0            1             remember the previous value
1            1             add to the previous value
0            0             erase the value
1            0             overwrite the value

Finally, another important type of DNNs is the so-called convolutional
neural networks that were recently proposed for analyzing visual
imagery [84]. CNNs are essentially a class of deep FNNs. In CNNs, the
hidden layers have neurons arranged in three dimensions: width, height,
and depth. These hidden layers are either convolutional, pooling, or
fully connected, and, hence, if one hidden layer is convolutional
(pooling/fully connected), then it is called a convolutional
(pooling/fully connected) layer. The convolutional layers apply a
convolution operation to the input, passing the result to the next
layer. The pooling layers are mainly used to simplify the information
from the convolutional layer while fully connected layers connect every
neuron in one layer to every neuron in another layer. As opposed to
LSTMs, which are good at temporal modeling, CNNs are appropriate for
reducing frequency variations, which makes them suitable for
applications that deal with spatial data such as interference
identification in wireless networks [85]. Moreover, CNNs can be
combined with LSTM, resulting in a CNN LSTM architecture that can be
used for sequence prediction problems with spatial inputs, like images
or videos [86].
In summary, different types of ANNs will have different architectures,
activation functions, connection methods, and data storage capacities.
Each specific type of ANNs is suitable for dealing with a particular
type of data. For example, RNNs are good at dealing with time-related
data while SNNs are good at dealing with continuous data. Moreover,
each type of ANNs has its own advantages and disadvantages in terms
of learning tasks, specific tasks such as time-related tasks or
space-related tasks, training data size, training time, and data
storage space. Given all of their advantages, ANNs are ripe to be
exploited in a diverse spectrum of applications in wireless networking,
as discussed in the following section.

IV. APPLICATIONS OF NEURAL NETWORKS IN WIRELESS COMMUNICATIONS
In this section, we first overview the motivation behind developing ANN
solutions for wireless communications and networking problems. Then, we
introduce the use of ANNs for various wireless applications. In
particular, we discuss how to use ANNs for unmanned aerial vehicles
(UAVs), wireless virtual reality (VR), mobile edge caching and
computing, multiple radio access technologies, and the IoT.

A. Artificially Intelligent Wireless Networks using ANNs: An Overview
Recently, ANNs have started to attract significant attention in the
context of wireless communications and networking [4], [25], and [32],
since the development of smart devices and mobile applications has
significantly increased the autonomy of a wireless network, as well as
the level at which human users interact with the wireless communication
system. Moreover, the development of mobile edge computing and caching
technologies makes it possible for base stations to store and analyze
the behavior of the users of a wireless network. In addition, the
emergence of the Internet of Things motivates the use of ANNs to
improve the way in which wireless data is processed, collected, and
used for various sensing and autonomy purposes.
In essence, within the wireless communication domains, ANNs have been
proposed for two major applications. First, they can be used for
prediction, inference, and intelligent and predictive data analytics
purposes. Within this application domain, the ANN-based ML algorithms
enable the wireless network to learn from the datasets generated by its
users, environment, and network devices. For instance, ANNs can be used
to analyze and predict the wireless users’ mobility patterns and
content requests, therefore allowing the BSs to optimize the use of
their resources, such as frequency, time, or the files that will be
cached across the network. Moreover, predictions and inference will be
a primary enabler of the emerging IoT and smart cities paradigms.
Within an IoT or within a smart city ecosystem, sensors will generate
massive volumes of data that can be used by the wireless network to
optimize its resources usage, understand its network operation, monitor
failures, or simply deliver smart services, such as intelligent
transportation. In this regard, the use of ANNs for optimized
predictions is imperative. In fact, ANNs will equip the network with
the capability to process massive volumes of data and to parse useful
information out of this data, as a precursor to delivering smart city
services. For example, road traffic data gathered from IoT sensors can
be processed using ANN tools to predict the road traffic status at
various locations in the city. This can then be used by the wireless
network that connects road traffic signals, apparatus, and
autonomous/connected vehicles to inform the vehicles of the traffic
state and to potentially re-route some traffic to respond to the
current state of the system. Furthermore, ANNs can be beneficial for
integrating different data from multiple sensors, thus facilitating
more interesting and complex wireless communication applications. In
particular, ANNs can identify nonintuitive features largely from
cross-sensor correlations which can result in a more accurate
estimation of a wireless network’s conditions and an efficient
allocation of the available resources. Finally, the wireless network
can use ANNs to learn about faults, infrastructure failure, and other
disruptive events, so as to improve its resilience to such events.
Second, a key application of ANNs in wireless networks is for enabling
self-organizing network operation by instilling ANN-based ML at the
edge of the network, as well as across its various components (e.g.,
base stations and end-user devices). Such edge intelligence is a key
enabler of self-organizing solutions for resource management, user
association, and data offloading. In this context, ANNs can serve as RL
tools [87] that can be used by a wireless network’s devices to learn
the wireless environment and to make intelligent decisions. An
ANN-based RL algorithm can also be used to learn the users’ information
such as their locations and data rate, and determine the UAV’s path
based on the learned information. Traditional learning algorithms, such
as Q-learning, that use tables or matrices to record historical data,
do not scale well for dense wireless networks. On the other hand, ANNs
use a nonlinear function approximation method to find the relationship
using historical information. Therefore, ANN-based RL algorithms can
learn complex relationships between wireless users and their networking
environments to provide solutions for the notoriously challenging
problems of network performance optimization and resource management.
ANNs can be simultaneously employed for both prediction and
intelligent/self-organizing operation, for scenarios in which the two
functions are largely interdependent. For instance, data can help in
decision making, while decision making can generate new data. For
example, when considering virtual reality applications over wireless
networks, one can use ANNs to predict the behavior of users, such as
head movement and content requests. These predictions can help an
ANN-based RL algorithm to allocate computational and spectral resources
to the users, hence improving their QoS. Next, we discuss specific
applications that use ANNs for wireless communications.

B. Wireless Communications and Networking with Unmanned Aerial Vehicles
1) UAVs for Wireless Communications: Providing connectivity from the
sky to ground wireless users is an emerging trend in wireless
networking [88] (Fig. 8). Compared to terrestrial communications, a
wireless system with low-altitude UAVs is faster to deploy, more
flexibly reconfigured, and likely to experience better communication
channels due to the presence of short-range, line-of-sight (LoS) links.
The use of highly mobile and energy-constrained UAVs for wireless
communications also introduces many new challenges [88], such as
capacity-constrained fronthaul links will directly limit the data rate
of the users that request content from the cloud. Therefore, the
cache-enabled UAVs are introduced to service the mobile users along
with terrestrial RRHs. Each cache-enabled UAV can store a limited
number of the popular content items that the users request. By caching
the predicted content, the transmission delay from the content server
to the UAVs can be significantly reduced as each UAV can directly
transmit its stored content to the users.
A realistic model for periodic, daily, and pedestrian mobility patterns
is considered according to which each user will regularly visit a
certain location of interest. The QoE of each user is formally defined
as a function of each user’s data rate, delay, and device type. The
impact of the device type on the QoE is captured by the screen size of
each device. The screen size will also affect the QoE perception of the
user, especially for video-oriented applications. The goal of [97] is
to find an effective deployment of cache-enabled UAVs to satisfy the
QoE requirements of each user while minimizing the transmit powers of
the UAVs. This problem involves predicting, for each user, the content
request distribution and the periodic locations, finding the optimal
contents to cache at the UAVs, determining the users’ associations, as
well as adjusting the locations and transmit power of the UAVs. ANNs
can be used to solve the prediction tasks due to their effectiveness in
dealing with time-varying data (e.g., mobility data). Moreover, ANNs
can extract the relationships between the user locations and the users’
context information such as gender, occupation, and age. In addition,
ANN-based RL algorithms can find the relationship between the UAVs’
location and the data rate of each user, enabling UAVs to find the
locations that maximize the users’ data rates.
A prediction algorithm using the framework of ESN with conceptors is
developed to find the users’ content request distributions and their
mobility patterns. The predictions of the users’ content request
distribution and their mobility patterns are then used to find the
user-UAV association, optimal locations of the UAVs and content caching
at the UAVs. Since the data of the users’ behaviors such as mobility
and content request are time-related, an ESN-based approach, as
previously discussed in Subsection III-A2, can quickly learn the
mobility pattern and content request distributions without requiring
significant training data. Conceptors, defined in [100], enable an ESN
to perform a large number of predictions of mobility and content
request patterns. Moreover, new patterns can be added to the reservoir
of the ESN without interfering with the previously acquired ones. The
architecture of the conceptor ESN-based prediction approach is based on
the ESN model specified in Subsection III-A2. For the content request
distribution prediction, the cloud’s BBUs must implement one conceptor
ESN algorithm for each user. The input is defined as each user’s
context that includes gender, occupation, age, and device type. The
output is the prediction of a user’s content request distribution. The
generation of the reservoir is done as explained in Subsection III-A2.
The conceptor is defined as a matrix that is used to control the
learning of an ESN. For predicting mobility patterns, the input of the
ESN-based algorithm is defined as the user’s context and current
location. The output is the prediction of a user’s location in the next
time slots. Ridge regression is used to train the ESNs. The conceptor
is also defined as a matrix used to control the learning of an ESN.
During the learning stage, the conceptor will record the learned
mobility patterns and content request distribution patterns. When the
conceptor ESN-based algorithm encounters a new input pattern, it will
first determine whether this pattern has been learned. If this new
pattern has been previously learned, the conceptor will instruct the
ESN to directly ignore it. This can allow the ESN to save some of its
memory only for the unlearned patterns.
Based on the users’ mobility pattern prediction, the BBUs can determine
the user association using a K-means clustering approach. By
implementing a K-means clustering approach, the users that are close to
each other are grouped into one cluster. In consequence, each UAV
services one cluster and the user-UAV association is determined. Then,
based on the UAV association and each user’s content request
distribution, the optimal contents to cache at each UAV and the optimal
UAVs’ locations can be found. When the altitude of a UAV is much higher
(lower) than the size of its corresponding coverage, the optimal
location of the UAV can be found [97, Theorems 2 and 3]. For more
generic cases, it can be found by the ESN-based RL algorithm [101].
In Fig. 9, based on [97], we show how the memory of the conceptor ESN
reservoir changes as the number of mobility patterns that were learned
varies. The mobility data used is gathered from Beijing University of
Posts and Telecommunications by recording the students’ locations
during each day. In Fig. 9, one mobility pattern represents the users’
trajectory in one day and the colored region is the memory used by the
ESN. Fig. 9 shows that the usage of the memory increases as the number
of the learned mobility patterns increases. Fig. 9 also shows that the
conceptor ESN uses less memory for learning mobility pattern 2 compared
to pattern 6. In fact, compared to pattern 6, mobility pattern 2 has
more similarities to mobility pattern 1, and, hence, the conceptor ESN
requires less memory to learn pattern 2. This is because the proposed
approach can be used to only learn the difference between the learned
mobility patterns and the new ones rather than to learn the entirety of
every new pattern.
Fig. 10 shows how the total transmit power of the UAVs changes as the
number of users varies. From Fig. 10, we can observe that the total UAV
transmit power resulting from all the algorithms increases with the
number of users. This is due to the fact that the number of users
associated with the RRHs and the capacity of the wireless fronthaul
links are limited. Therefore, the UAVs must increase their transmit
power to satisfy the QoE requirement of each user. From Fig. 10, we can
also see that the conceptor based ESN approach can reduce the total
transmit power of the UAVs by about 16.7% compared to the ESN algorithm
used to predict the content request and the mobility for a network with
70 users. This is because the conceptor ESN, which separates the users’
behavior into multiple patterns and uses the conceptor to learn these
patterns, can predict the users’ behavior more accurately compared to
the ESN algorithm.
Resource allocation problems in UAV-based wireless net-
Fig. 9. Mobility patterns predictions of conceptor ESN algorithm [97]. In this figure, the green curve represents the conceptor ESN prediction, the black
curve is the real positions, top rectangle j is the index of the mobility pattern learned by ESN, the legend on the bottom left shows the total reservoir memory
used by ESN and the legend on the bottom right shows the normalized root mean square error of each mobility pattern prediction.
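
As a rough illustration of the ESN-based mobility prediction and of the normalized root mean square error (NRMSE) reported in Fig. 9, the sketch below trains the linear readout of a plain ESN with ridge regression and evaluates its one-step-ahead prediction. It is not the conceptor ESN of [97]: there is no conceptor matrix, and the reservoir size, the spectral radius of 0.9, the ridge constant, the synthetic periodic trajectory (standing in for real mobility traces), and the choice to normalize the RMSE by the range of the target are all assumptions made for this example.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Draw random input and recurrent weights (illustrative choices)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Collect the reservoir state after each input sample."""
    state = np.zeros(w.shape[0])
    states = np.zeros((len(inputs), w.shape[0]))
    for t, u in enumerate(inputs):
        state = np.tanh(w_in @ u + w @ state)
        states[t] = state
    return states


def train_readout(states, targets, ridge=1e-6):
    """Ridge regression: solve (X^T X + ridge * I) W^T = X^T Y."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets).T


def nrmse(pred, target):
    """RMSE divided by the range of the target (one common convention)."""
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    return rmse / (np.max(target) - np.min(target))


# Synthetic periodic 2-D trajectory (hypothetical stand-in for mobility data).
t = np.arange(600)
traj = np.stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)], axis=1)
inputs, targets = traj[:-1], traj[1:]  # predict the next location

w_in, w = make_reservoir(n_in=2, n_res=50)
states = run_reservoir(w_in, w, inputs)

washout, train_end = 50, 400  # discard the initial transient, then train
w_out = train_readout(states[washout:train_end], targets[washout:train_end])
pred = states[train_end:] @ w_out.T
error = nrmse(pred, targets[train_end:])
print(f"test NRMSE: {error:.4f}")
```

Only the readout matrix `w_out` is learned here; the input and recurrent weights stay fixed after initialization, which is what keeps ESN training inexpensive compared to training a full RNN.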
[Plot residue. Y-axis: Total transmit power of UAVs (W). Legend: Optimal algorithm with complete information; Proposed conceptor ESN algorithm; ESN algorithm that predicts content request and mobility; ESN algorithm that predicts mobility with random caching.]
are used for intelligently determining the user association, optimal
caching, and optimal UAV locations. The key lessons learned here
include:
TABLE IV
SUMMARY OF THE USE OF ANNS FOR SPECIFIC APPLICATION
tracking accuracy of the VR sensors. In particular, the BSs will
jointly consider the users’ movement predicted by ANNs and the users’
movements collected by VR sensors to determine the users’ movements.
Second, ANNs can be used to develop self-organizing algorithms to
dynamically control and manage the wireless VR network, thus addressing
problems such as dynamic resource management. In particular, ANNs can
be used for adaptively optimizing the resource allocation and adjusting
the quality and format of the VR images according to the cellular
network environment.
Using ANNs for VR faces many challenges. First, in wireless VR
networks, the data collected from the users may contain errors that are
unknown to the BSs. In consequence, the BSs may need to use erroneous
data to train the ANNs and, hence, the prediction accuracy of the ANN
will be significantly affected. Second, due to the large data size of
each 360° VR image, the BSs must spend a large amount of computational
resources to process VR images. Meanwhile, the training of ANNs will
also require a large amount of computational resources. Thus, how to
effectively allocate the computational resources for processing VR
images and training ANNs is an important challenge. In addition, the VR
applications require ultra-low latency while the training of ANNs can
be time-consuming. Hence, how to effectively train ANNs in a limited
time is an important question for wireless VR. In this regard, training
ANNs in an offline manner or using ANNs that converge quickly can be
two promising solutions for speeding up the training process of ANNs.
The existing literature has studied a number of problems related to
using ANNs for VR such as in [103]–[107]. The work in [105] proposed an
ESN based distributed learning algorithm to predict the users’ head
movement in VR applications. In [106], a decision forest learning
algorithm is proposed for gaze prediction. The work in [103] developed
a neural network based transfer learning algorithm for data correlation
aware resource allocation. 360° content caching and transmission is
optimized in [107] using an ESN and SSN based deep RL algorithm. Table
V summarizes the type of ANNs and learning algorithms used for each
existing work in virtual reality networks. In essence, the existing VR
literature such as [103]–[107] has used ANNs to solve a number of VR
problems such as hand gesture recognition, interactive shape changes,
video conversion, head movement prediction, and resource allocation.
However, with the exception of our works in [104] and [105], all of the
other works that use ANNs for VR applications are focused on wired VR.
Therefore, they do not consider the challenges of wireless VR such as
scarce spectrum resources, limited data rates, and how to transmit the
tracking data accurately and reliably. In fact, ANNs can
In this model, the BSs act as the VR control centers that collect
ESN-based algorithm achieves up to a 19.6% gain in terms of average
delay compared to the Q-learning algorithm for the case with 6 BSs.
Fig. 12 also shows that the ESN-based approach allows the wireless VR
transmission to meet the VR delay requirement that includes both the
transmission and processing delay (typically 20 ms [136]). These gains
stem from the adaptive nature of ESNs.
From this example, we illustrated the use of ESN as an RL algorithm for
self-organizing resource allocation in wireless VR. An ESN-based RL
algorithm enables each BS to allocate downlink and uplink spectrum
resource in a self-organizing manner that adjusts the resource
allocation according to the dynamical environment. Moreover, an
ESN-based RL algorithm can use an approximation method to find the
relationship between each BS’s actions and its corresponding utility
values, and, hence, an ESN-based RL algorithm can speed up the training
process. Simulation results show that an ESN-based RL algorithm enables
each BS to achieve the delay requirement of VR transmission.
4) Lessons learned: Clearly, we have demonstrated that ESNs can be an
effective tool for resource management in a wireless VR network that
needs to jointly consider the uplink and downlink resource block
allocation. Some key outcomes learned from this application include the
following:
• In non-wireless applications such as speech recognition, ESNs are
used for data analytics. In this VR application, ESNs are used as a
reinforcement learning algorithm for downlink and uplink resource block
management. The advantage of the ESN based RL algorithm is that it
provides the network with an ability to predict the value of the VR QoS
that results from each action (instead of relying on a Q-table to
record the observed utility values as done in Q-learning) and, hence,
it can find the optimal action selection strategy that can maximize the
individual (per SBS) VR QoS utilities without having to traverse all
actions. As a result, ESN-based RL is suitable for wireless VR resource
management problems in which both uplink and downlink resources must be
managed jointly, thus increasing the search space for the wireless VR
QoS optimization problem, compared to standard wireless resource
management problems. This was a novel use case of ESNs that is
motivated by the underlying wireless system, rather than by the need to
process some data as done in computer vision.
• Compared to most of the existing DNN-based RL algorithms that cannot
analytically guarantee convergence to a final equilibrium or
optimization solution, in this application, we have proved that
ESN-based RL algorithms will finally converge to the expected VR QoS
utilities if the learning parameters are appropriately set.
• Due to the limited memory capacity of each ESN, the application of an
ESN-based RL algorithm depends on the complexity of the underlying
wireless problems. ESN-based RL algorithms can be used to solve the
optimization problem with a moderate number of optimized variables
while DNN-based algorithms can be used to solve more complex
optimization problems. In this work, the ESN-based RL algorithms can
achieve the same performance for resource block allocation as DNN-based
RL algorithms. However, the time needed for training DNNs such as LSTMs
will be much higher than the time needed for training ESNs. In
consequence, one must choose an appropriate ANN architecture for RL
depending on the complexity of the wireless optimization problems. In
the wireless VR application, it could be more suitable to use a shallow
ANN in the RL algorithm for problems such as channel selection and user
association, while DNN-based RL algorithms are more suitable for power
allocation. This is due to the fact that, in power allocation problems,
the optimized variables are continuous and, thus, the number of actions
needed for RL will be much larger than those used in other problems
(e.g., user association).
Here, we note that the above lesson learned can be generalized to other
shallow ANNs.
5) Future Works: Clearly, ANNs are a promising tool to address
challenges in wireless VR applications. In fact, the above application
of ANNs for spectrum resource allocation can be easily extended to
manage other types of resources such as computational resources and
video formats. Moreover, SNNs can be used for the prediction of the
viewing VR video, which is the VR video displayed at the headset of one
user. Then, the network can reduce the data size of each transmitted VR
video and pre-transmit each viewing VR video to the users. This is
because SNNs are good at processing the rapidly changing, dynamic VR
videos. Furthermore, RNNs can be used to predict and detect the VR
users’ movement such as eye movement and head movement and their
interactions with the environment. Then, the network can pre-construct
VR images based on these predictions which can reduce the time spent to
construct the VR images. The user-VR system interactions are all
time-dependent and, hence, RNNs are a good choice for performing such
tasks. Note that the prediction of the users’ movement will directly
affect the VR images that are sent to the users at each time slot and,
hence, the learning algorithm must complete the training process during
a short time period. In consequence, we should use RNNs that are easy
to train for the prediction of the users’ movement. Finally, CNNs can
be used for VR video compression and recovery so as to reduce the data
size of each transmitted VR video and improve the QoS for each VR user.
This is because CNNs are good at storing large amounts of data in the
spatial domain and learning the features of VR images. A summary of key
problems that can be solved by using ANNs in wireless VR system is
presented in Table VI along with the challenges and future works.

D. Mobile Edge Caching and Computing
1) Mobile Edge Caching and Computing: Caching at the edge of the
wireless networks, as shown in Fig. 13, enables the network devices
(BSs and end-user devices) to store the most popular content to reduce
the data traffic (content transmission), delay, and bandwidth usage, as
well as to improve the energy efficiency and the utilization of the
users’ context and social information [137]. Recently, it has become
possible to jointly consider cache placement and content delivery,
using coded caching [138]. Coded caching enables network devices
from huge and disparate data sources. For example, one user’s Virtual clusters are connected to the content servers via
content request depends on this user’s age, job, and locations. capacity-constrained wired backhaul links. Since the backhaul
In fact, the data cleaning process usually takes more time than (fronthaul) links are wired, we assume that the total transmis-
the learning process. For instance, the type and volume of sion rate of the backhaul (fronthaul) links is equally allocated
content that users may request can be in the order of millions to the content that must be transmitted over the backhaul
and, hence, the data processing system should select appropri- (fronthaul) links. Each user has a periodic mobility pattern and
ate content to analyze and predict the users’ content request regularly visits a certain location. Since cache-enabled RRHs
behaviors. For caching, the most important use of ANNs is to and BBUs can store the requested content, this content can be
predict the users’ content requests which directly determines transmitted over four possible links: a) content server-BBUs-
the caching update. However, each user may request a large RRH-user, b) cloud cache-BBUs-RRH-user, c) RRH cache-
volume of content types such as video, music, and news, each RRH-user, and d) remote RRH cache-remote RRH-BBUs-
of which has different formats and resolutions. Hence, for RRH-user. The notion of effective capacity3 [144] was used to
each user, the total number of the requested content items capture the maximum content transmission rate of a channel
will be significantly large. However, the memory of an ANN under a certain QoS requirement. The effective capacity of
is limited and, hence, each ANN can record only a limited each content transmission depends on the link that is used to
number of requested contents. In consequence, an ANN must transmit the content and the actual link capacity between the
be able to select the most important content for content request user and the associated RRHs.
prediction so as to help the network operator determine which The goal of [114] is to develop an effective framework for
content to store at mobile edge cache. Similarly, for computing content caching and RRH clustering in an effort to reduce the
tasks predictions, the limited-memory ANNs can only store a network’s interference and to offload the traffic of the backhaul
finite number of the computing tasks and, hence, they must and of the fronthaul based on the predictions of the users’
select suitable computing tasks to store and predict. Moreover, content request distributions and mobility patterns. To achieve
as opposed to mobile edge caching that requires a long period this goal, a QoS and delay optimization problem is formulated,
of time to update the cached contents, mobile edge computing whose objective is to maximize the long-term sum effective
needs to process the tasks as soon as possible. Therefore, the capacity of all users. This optimization problem involves
ANNs used for mobile edge computing must complete their the prediction of the content request distribution and of the
training process in a short period of time. periodic location for each user, and the finding of the optimal
The existing literature has studied a number of problems content to cache at the BBUs and at the RRHs. To predict
related to the use of ANNs for caching [97], [108], [109], the content request distribution and mobility patterns for each
[114], and [110]–[113]. The authors in [108] proposed a user, an ESN-based learning algorithm is used, similarly to
big data-enabled architecture to investigate proactive content the one described in Subsection III-A2. For each user, the
caching in 5G wireless networks. In [109]–[111], ANNs are BBUs must implement one ESN algorithm for content request
used to determine the cache replacement and content delivery. distribution prediction and another ESN algorithm for mobility
The authors in [112] developed a data extraction method using pattern prediction.
the Hadoop platform to predict content popularity. In [113], For the content request distribution prediction, the input of
an extreme-learning machine neural network is used to predict the developed ESN is a user’s context which includes content
content popularity. The works in [97] and [114] developed an request time, week, gender, occupation, age, and device type.
ESN-based learning algorithm to predict the users’ mobility The output is the predicted content request distribution. The
patterns and content request distributions. In general, existing ESN model consists of the input weight matrix, the output
works such as in [97], [108], [109], [114], and [110]–[113] weight matrix, and the recurrent weight matrix (see Subsection
have used ANNs to solve the caching problems such as III-A2). A linear gradient descent approach is used to train the
cache replacement, content popularity prediction, and content output weight matrix. For mobility pattern prediction, the input
request distribution prediction. For mobile edge computing, in of the developed ESN is the current location of each user and
general, there is no existing work that uses ANNs to solve the output is the vector of locations that a user is predicted to
these relevant problems. Next, we explain a specific application visit for the next steps. In contrast to a conventional recurrent matrix, which
of ANNs for mobile edge caching. is sparse and randomly generated, the recurrent matrix
3) Example: One illustrative application for the use of of the ESN used for mobility prediction contains only W non-
ANNs for mobile edge caching is presented in [114] which zero elements, where W is the dimension of the recurrent
studies the problem of proactive caching in CRANs. In this matrix. This simplified recurrent matrix can speed up the
model, the users are served by the RRHs which are connected training process of the ESNs. Ridge regression is used to
to the cloud pool of the BBUs via capacity-constrained wired train the output weight matrix in an offline manner.
fronthaul links. The RRHs and the users are all equipped Based on the users’ content request distribution and lo-
with storage units that can be used to store the most popular cations, the cloud can estimate the users’ RRH association,
content that the users request. The RRHs which have the determine each RRH’s content request distribution, and, then,
same content request distributions are grouped into a virtual
cluster and serve their users using zero-forcing method. The
content request distribution for a particular user represents the
probabilities with which the user requests different content.
3 The effective capacity is a link-layer channel model that can be used to
measure a content transmission over multiple hops. In particular, the effective
capacity can be used to measure a content transmission from the BBUs to the
RRHs, then from RRHs to the users.
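As a rough illustration of the ESN-based mobility prediction described in this example, the following Python sketch builds a reservoir whose recurrent matrix has exactly W non-zero entries and trains only the output weights with ridge regression. The ring-shaped reservoir, the function names (`cycle_reservoir`, `run_reservoir`, `ridge_fit`), the sinusoidal stand-in for a weekly mobility signal, and all numerical values are assumptions made for illustration; they are not taken from [114].

```python
import math
import random


def cycle_reservoir(n, weight):
    """Recurrent matrix with exactly n non-zero entries (a ring; assumed layout)."""
    w_rec = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w_rec[i][(i - 1) % n] = weight
    return w_rec


def run_reservoir(inputs, w_in, w_rec):
    """Drive the reservoir; return one state vector (plus a bias term) per input."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [
            math.tanh(w_in[i] * u + sum(w_rec[i][j] * state[j] for j in range(n)))
            for i in range(n)
        ]
        states.append(state + [1.0])
    return states


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def ridge_fit(states, targets, lam):
    """Output weights minimizing squared error plus lam * ||w||^2."""
    d = len(states[0])
    gram = [
        [sum(s[i] * s[j] for s in states) + (lam if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(d)]
    return solve(gram, rhs)


random.seed(0)
N = 10  # reservoir dimension (hypothetical value)
w_in = [random.uniform(-0.5, 0.5) for _ in range(N)]
w_rec = cycle_reservoir(N, 0.8)

# Toy periodic signal with a period of 7 steps, standing in for a weekly pattern.
signal = [math.sin(2 * math.pi * t / 7) for t in range(211)]
states = run_reservoir(signal[:-1], w_in, w_rec)
targets = signal[1:]  # one-step-ahead prediction

w_out = ridge_fit(states, targets, lam=1e-4)
preds = [sum(w * s for w, s in zip(w_out, st)) for st in states]
mse = sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets)
mean = sum(targets) / len(targets)
var = sum((y - mean) ** 2 for y in targets) / len(targets)
```

Only `w_out` is learned here; the input and recurrent weights stay fixed after initialization, which is what keeps the training step a single linear solve.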
[Figure: sum effective capacity (bits/s/Hz, ×10^4) for four schemes: ESNs and sublinear algorithm; optimal caching with complete information; random caching with clustering; random caching without clustering.]
for each user. For example, to predict the weekly mobility
pattern of each user, an ESN-based learning algorithm
cannot separate the mobility pattern in a week into several
days and use a specific non-linear system to predict the
users’ mobility in each day. In fact, as we discussed in the
UAV application in Subsection IV-B, using a unique
non-linear system to predict the mobility of each user each
This is because DNNs are good at storing large amounts of ANNs can allow the smart use of different RATs wherein a
utility values resulting from different content delivery and BS can learn when to transmit on each type of frequency band
cache update schemes. Last but not least, SNNs can be based on the underlying network conditions. For instance,
used to predict the dynamic computational resource demands ANNs may allow multi-mode BSs to steer their traffic flows
for each user due to their advantages in dealing with highly between the mmWave, the microwave, and the unlicensed band
dynamic data. A summary of the key problems of using ANNs based on the availability of a LoS link, the congestion on
for mobile edge caching and computing is presented in Table the licensed band and the availability of the unlicensed band.
VI along with the challenges and future works. Moreover, in LTE-WiFi link aggregation (LWA) scenarios,
ANNs allow cellular devices to learn when to operate on each
band or utilize both links simultaneously.
E. Co-existence of Multiple Radio Access Technologies
Moreover, ANNs can provide multi-mode BSs with the
1) Co-existence of Multiple Radio Access Technologies: ability to learn the appropriate resource management proce-
To cope with the unprecedented increase in mobile data traffic dure over different RATs or spectrum bands in an online
and realize the envisioned 5G services, a significant enhance- manner and, thus, to offer an autonomous and self-organizing
ment of per-user throughput and overall system capacity is operation with no explicit communication among different
required [146]. Such an enhancement can be achieved through BSs, once deployed. For instance, ANNs can be trained over
advanced PHY/MAC/network technologies and efficient meth- large datasets which take into account the variations of the
ods of spectrum management. In fact, one of the main ad- traffic load over several days for scenarios in which the traffic
vancements in the network design for 5G networks relies on load of WiFi access points (WAPs) can be characterized based
the integration of multiple different radio access technologies on a particular traffic model [153]. It should be noted that
(RATs) [147]. Multi-RAT based networks encompass several cellular data traffic networks exhibit statistically fluctuating
technologies in which spectrum sharing is important. These and periodic demand patterns, especially for applications such
include cognitive radio networks, LTE-U networks, as well as file transfer, video streaming, and browsing [153]. ANNs
as heterogeneous networks that include both mmWave and can also accommodate the users’ mobility patterns to predict
sub-6 GHz frequencies. With the multi-RAT integration, a the availability of a LoS link, thus, allowing the transmission
mobile device can potentially transmit data over multiple radio over the mmWave band. In particular, they can be trained
interfaces such as LTE and WiFi [148], at the same time, to learn the antenna tilting angle based on the environment
thus improving its performance [149]. Moreover, a multi- changes in order to guarantee a LoS communication link with
RAT network allows fast handover between different RATs the users and, thus, to enable an efficient communication over
and, thus, it provides seamless mobility experience for users. the mmWave spectrum. Moreover, ANNs may enable multiple
Therefore, the integration of different RATs results in an BSs to learn how to form multi-hop, mmWave links over
improvement in the utilization of the available radio resources backhaul infrastructure, while properly allocating resources
and, thus, in an increase in the system’s capacity. It also across those links in an autonomous manner [154], [155].
guarantees a consistent service experience for different users To cope with the changes in the traffic model and/or the
irrespective of the served RAT and it facilitates the network users’ mobility pattern, ANNs can be combined with online
management. ML [156] by properly re-training the weights of the developed
Spectrum management is also regarded as another key learning mechanisms. Multi-mode BSs can then learn the
component of Multi-RAT based networks [150]. Unlike early traffic patterns over time and, thus, predict the future channel
generations of cellular networks that operate exclusively on availability status. With proper network design, ANNs can
the sub-6 GHz (microwave) licensed band, Multi-RAT based allow operators to improve their network’s performance by re-
networks are expected to transmit over the conventional sub-6 ducing the probability of congestion occurrence while ensuring
GHz band, the unlicensed spectrum and the 60 GHz mmWave a degree of fairness to the other corresponding technologies
frequency band [151], [152]. We note that, on the one in the network.
hand, the classical LTE microwave licensed band is reliable, A proactive resource management of the radio spectrum
however, limited and hence is a scarce resource. On the other for multi-mode BSs can also be achieved using ANNs. In
hand, the unlicensed bands can be used to serve best effort a proactive approach, rather than reactively responding to
traffic only since the operation over this spectrum should incoming demands and serving them when requested, multi-
account for the presence of other coexisting technologies. mode BSs can predict traffic patterns and determine future off-
Therefore, a multi-mode BS operating over the licensed, peak times on different spectrum bands so that the incoming
unlicensed, and mmWave frequency bands can exploit the traffic demand can be properly allocated over a given time
different characteristics and availability of the frequency bands window. In an LTE-U system, for instance, a proactive co-
thus providing robust and reliable communication links for the existence mechanism may enable future delay-intolerant data
end users [152]. However, to reap the benefits of multi-mode demands to be served within a given prediction window ahead
BSs, spectrum sharing is crucial. of their actual arrival time thus avoiding the underutilization of
2) Neural Networks for Spectrum Management and the unlicensed spectrum during off-peak hours [157]. This will
Multi-RAT: ANNs are an attractive solution approach for also lead to an increase in the LTE-U transmission opportunity
tackling various challenges that arise in multi-RAT scenarios. as well as to a decrease in the collision probability with WAPs
To leverage the advantages of such multi-RAT networks, and other BSs in the network.
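The proactive approach described above can be sketched in a few lines of Python: given a predicted load on an unlicensed channel for each epoch of the next window, the incoming traffic demand is placed into the epochs that are predicted to be least busy. The function name `proactive_allocation`, the percentage-based capacity model, and the numbers are hypothetical; in practice the forecast would come from a trained ANN rather than a hard-coded list.

```python
def proactive_allocation(predicted_load, capacity, demand):
    """Greedily place `demand` units into the least-loaded epochs of a window.

    predicted_load[t] is the forecast occupancy of epoch t, in the same units
    as `capacity`. Returns the amount scheduled in each epoch.
    """
    schedule = [0] * len(predicted_load)
    remaining = demand
    # Visit epochs from the least to the most congested (off-peak first).
    for t in sorted(range(len(predicted_load)), key=lambda i: predicted_load[i]):
        if remaining <= 0:
            break
        free = max(capacity - predicted_load[t], 0)
        schedule[t] = min(free, remaining)
        remaining -= schedule[t]
    return schedule


# Forecast occupancy (in percent) of one unlicensed channel over five epochs.
forecast = [90, 20, 70, 10, 50]
plan = proactive_allocation(forecast, capacity=100, demand=200)
print(plan)
```

A reactive scheme would instead serve each demand in the epoch in which it arrives, regardless of how congested that epoch is.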
Several existing works have adopted various learning tech- The exponential backoff scheme is adopted for WiFi while the
niques in order to tackle a variety of challenges that arise in BSs adjust their contention window size (and, thus, the channel
multi-RAT networks [62], [101], [115]–[120]. The problem access probability) on each of the selected channels based on
of resource allocation with uplink-downlink decoupling in an the network traffic conditions while also guaranteeing a long-
LTE-U system has been investigated in [101] in which the term equal weighted fairness with WLAN and other BSs.
authors propose a decentralized scheme based on ESNs. The The proactive resource allocation scheme in [158] is for-
authors in [115] propose a fuzzy-neural system for resource mulated as a noncooperative game in which the players are
management among different access networks. The work in the BSs. Each BS must choose which channels to transmit
[116] used an ANN-based learning algorithm for channel on along with the corresponding channel access probabilities
estimation and channel selection. The authors in [117] pro- at t = 0 for each t of the next time window T . This, in
pose a supervised ANN approach, based on FNNs, for the turn, allows the BSs to determine future off-peak hours of the
classification of the users’ transmission technology in a multi- WLAN on each of the unlicensed channels thus transmitting
RAT system. In [118], the authors propose a Hopfield neural on the less congested channels. Each BS can therefore max-
network scheme for multi-radio packet scheduling. In [119], imize its total throughput over the set of selected channels
the authors propose a cross-system learning framework in over T while guaranteeing long-term equal weighted fairness
order to optimize the long-term performance of multi-mode with the WLAN and the other BSs. To solve the formulated
BSs, by steering delay-tolerant traffic towards WiFi. The work game (and find the so-called Nash equilibrium solution), a
in [120] used a deep RL algorithm for mode selection and DNN framework based on LSTM cells was used. To allow
resource management in a fog radio access network. Other a sequence-to-sequence mapping, we considered an encoder-
important problems in this domain include root cause analysis decoder model as described in Section III-C. In this model,
issues such as the ones studied in [62]. Nevertheless, these prior the encoder network maps an input sequence to a vector of a
works [62], [101], [115]–[120] consider a reactive approach fixed dimensionality and then the decoder network decodes the
in which the data requests are first initiated and, then, the target sequence from the vector. In this scheme, the input of the
resources are allocated based on their corresponding delay encoder is a time series representation of the historical traffic
tolerance value. In particular, existing works do not consider load of the BSs and WAPs on all the unlicensed channels.
the predictable behavior of the traffic and, thus, they do not The learned vector representation is then fed into a multi-layer
account for future off-peak times during which data traffic perceptron (MLP) that summarizes the input vectors into one
could be distributed among different RATs. vector, thus accounting for the dependency among all the input
Here, note that, ANNs are suitable for learning the data time series vectors. The output of the MLP is then fed into
traffic variations over time and, thus, for predicting the future different separate decoders, allowing each BS to reconstruct
traffic load. In particular, since LSTM cells are capable of its predicted action sequence.
storing information for long periods of time, they can learn the To train the proposed network, the REINFORCE algorithm
long-term dependency within a given sequence. Predictions at [159] is used to compute the gradient of the expected reward
a given time step are influenced by the network activations with respect to the policy parameters, and the standard gradient
at previous time steps, thus, making LSTMs an attractive descent optimization algorithm [160] is adopted to allow the
solution for proactively allocating the available resources in model to generate optimal action sequences for input history
multi-RAT systems. In what follows, we summarize our work traffic values. In particular, we considered the RMSprop gradi-
in [158], in which we developed a deep RL scheme, based on ent descent optimization algorithm [161], an adaptive learning
LSTM memory cells, for allocating the resources in an LTE-U rate approach, wherein the learning rate of a particular weight
network over a fixed time window T . is divided by a running average of the magnitudes of the recent
3) Example: An interesting application of DNNs in the gradients for that weight.
context of LTE-U and WiFi coexistence is presented in [158]. The proposed proactive resource allocation scheme was
The work in [158] considers a network composed of several compared with a reactive approach for three different network
LTE-U BSs belonging to different LTE operators, several scenarios. Fig. 15 shows that for very small values of T ,
WAPs and a set of unlicensed channels on which LTE-U BSs the proposed scheme does not yield any significant gains.
and WAPs can operate. The LTE carrier aggregation fea- However, as T increases, the BSs have additional opportunities
ture, using which the BSs can aggregate up to five component for shifting part of the traffic into the future and, thus, the gains
carriers belonging to the same or different operating frequency start to become more pronounced. For example, we can see
bands, is adopted. We consider a time domain divided into that, for 4 BSs and 4 channels, the proposed proactive scheme
multiple time windows of duration T, each of which consists achieves an increase of 17% and 20% in terms of the average
of multiple time epochs t. Our objective is to proactively airtime allocation for LTE-U as compared to the reactive
determine the spectrum allocation vector for each BS at t = 0 approach. Here, note that the gain of the proposed scheme,
over T while guaranteeing long-term equal weighted airtime with respect to the reactive approach, keeps on increasing until
share with WLAN. In particular, each BS learns its channel it reaches a maximum achievable value, after which it remains
selection, carrier aggregation, and fractional spectrum access almost constant.
over T while ensuring long-term airtime fairness with the 4) Lessons learned: In the aforementioned application,
WLAN and the other LTE-U operators. A contention-based we have demonstrated that LSTM can be an effective tool
protocol is used for channel access over the unlicensed band. for resource management in an LTE-U system that needs to
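The RMSprop rule mentioned in this example divides the learning rate of each weight by a running average of the magnitudes of its recent gradients. A minimal sketch of that update in plain Python follows; the function name `rmsprop_step`, the decay factor, the step size, and the toy quadratic objective are illustrative assumptions and are not the settings used in [158].

```python
import math


def rmsprop_step(weights, grads, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update; returns the new weights and the new running averages."""
    new_w, new_avg = [], []
    for w, g, a in zip(weights, grads, avg_sq):
        # Running average of the squared gradient for this weight.
        a = decay * a + (1.0 - decay) * g * g
        new_avg.append(a)
        # Per-weight step: the learning rate is scaled by 1 / sqrt(running average).
        new_w.append(w - lr * g / (math.sqrt(a) + eps))
    return new_w, new_avg


def grad(w):
    """Gradient of the toy objective f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w, avg = [0.0, 0.0], [0.0, 0.0]
for _ in range(2000):
    w, avg = rmsprop_step(w, grad(w), avg)
```

Because each step is normalized by the recent gradient magnitude, both coordinates move at a similar pace even though the second one has a ten times steeper curvature.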
errors. While this is true for all of the applications in ecosystem [162]. The IoT will enable machine-type devices
which we used ANNs as part of a RL algorithm, the effect to connect with each other over wireless links and operate
of the prediction errors may be more pronounced for the in a self-organizing manner [163]. Therefore, IoT devices
LTE-U application because it may lead to the LTE seizing will be able to collect and exchange real-time information
more or less WiFi slots than needed, which can directly to provide smart services. In this respect, the IoT will allow
impact the operation of the WiFi user. Naturally, this is delivering innovative services and solutions in the realms of
a more serious drawback than in scenarios where the smart cities, smart grids, smart homes, and connected vehicles
network is simply using ANNs to cache data (e.g., as in that could provide a significant improvement in people’s lives.
the previously discussed UAV application) or perform cell However, the practical deployment of an IoT system still faces
association (in which case, if a prediction error occurs, the many challenges [163] such as data analytics, computation,
network can simply resort back to known cell association transmission capabilities, connectivity, end-to-end latency, se-
algorithms). curity [164], and privacy. In particular, how to provide massive
5) Future Works: The above application of ANNs to LTE- device connectivity with stringent latency requirement will be
U systems can be easily extended to a multi-mode network one of the most important challenges. The current centralized
in which the BSs transmit on the licensed, the unlicensed, communication models and the corresponding technologies
and the mmWave spectrum. In fact, given their capability of may not be able to provide such massive connectivity. There-
dealing with time series data, RNNs can enhance mobility and fore, there is a need for a new communication architecture,
handover in highly mobile wireless environments by learning such as fog computing models for IoT devices connectivity.
the mobility patterns of users thus decreasing the ping-pong ef- Moreover, for each IoT device, energy and computational
fect among different RATs. For instance, a predictive mobility resources are limited. Hence, how to allocate computational
management framework can address critical handover issues, resources and power for all the IoT devices to achieve the
including frequent handovers, handover failures, and excessive data rate and latency requirements is another challenge.
energy consumption for seamless handovers in emerging dense 2) Neural Networks for the Internet of Things: ANNs
multi-RAT wireless cellular networks. ANNs can also predict can be used to address some of the key challenges within
the QoS requirements, in terms of delay and rate, for the the context of the IoT. So far, ANNs have seen four major
future offered traffic. Moreover, they can predict the trans- applications for the IoT. First, ANNs enable the IoT system
mission links’ conditions and, thus, schedule users based on to leverage intelligent data analytics to extract important
the links’ conditions and QoS requirements. Therefore, given patterns and relationships from the data sent by the IoT
the mobility patterns, transmission links’ conditions and QoS devices. For example, ANNs can be used to discover important
requirements for each user, BSs can learn how to allocate correlations among data to improve the data compression and
different users on different bands such that the total network data recovery. Second, using ANN-based RL algorithms, IoT
performance, in terms of delay and throughput, is optimized. devices can operate in a self-organizing manner and adapt their
An interesting direction for future work on DNNs for mmWave strategies (e.g., channel selection) based on the wireless and
communication is antenna tilting. In particular, DNNs are users’ environments. For instance, an IoT device that uses an
capable of learning several features of the network environ- ANN-based RL algorithm can dynamically select the most
ment and thus predicting the optimal tilt angle based on the suitable frequency band for communication according to the
availability of a LoS link and data rate requirements. This network state. Third, the IoT devices that use ANN-based
in turn improves the users’ throughput thus achieving high algorithms can identify and classify the data collected from
data rate. Moreover, LSTMs are capable of learning long time the IoT sensors. Finally, one of the main goals of the IoT is to
series and thus can allow BSs to predict the link formation for improve the life quality of humans and reduce the interaction
the mmWave backhaul network. In fact, the formation of this between human and IoT devices. Thus, ANNs can be used to
backhaul network is highly dependent on the network topology predict the users’ behavior to provide advanced information for
and the traffic conditions. Therefore, given the dynamics the IoT devices. For example, ANNs can be used to predict
of the network, LSTMs enable BSs to dynamically update the time that an individual will come home, and, hence, adjust
the formation of the links among each other based on the the control strategy for the IoT devices at home.
changes in the network. Moreover, SNNs can be used for Using ANNs for IoT faces many challenges. First, in
mmWave channel modeling since they can process and predict IoT, both energy and computational resources are limited.
continuous-time data effectively. A summary of key problems Therefore, one should consider the tradeoff between the energy
that can be solved by using ANNs in multi-RAT system is and computational needs of training ANNs and the accuracy
presented in Table VI along with the challenges and future requirement of a given ANN-based learning algorithm. In
works. particular, the higher the required accuracy, the higher the
computational and energy requirements. Second, within an IoT
ecosystem, the collected data may have different structure and
F. Internet of Things even contain several errors. Therefore, when data are used to
1) The Internet of Things: In the foreseeable future, it train ANNs, one should consider how to classify the data and
is envisioned that trillions of machine-type devices such as deal with the flaws in the data. In other words, the ANNs in
wearables, sensors, connected vehicles, or mundane objects IoT must tolerate erroneous data. Third, in the IoT system,
will be connected to the Internet, forming a massive IoT ANNs can exploit thousands of types of data for prediction
and self-organizing control. For a given task, the data collected IoT device is limited, IoT devices with different computational
from the IoT devices may not all be related to the task. Hence, resources will map to a different number of neurons. For
ANNs must select suitable data for the task. example, an IoT device that has more computational resources
The existing literature [121]–[129] has studied a number can map to a larger number of neurons. Moreover, to ensure
of problems related to using ANNs for IoT. In [121], the the integrity of the mapping model, each neuron can only
authors use a framework to treat an IoT network as an ANN to map to one of the IoT devices. Given that there are several
reduce delivery latency. The authors in [122] and [123] used ways to map the IoT network to the trained FNN, the optimal
a backpropagation neural network for sensor failure detection mapping is formulated as an integer linear program which
in an IoT network. In [124], eight ML algorithms, including is then solved using CPLEX. When the optimal mapping
DNNs and FNNs, are tested for human activities classification between the IoT network and the trained FNN is found, the
and robot navigation as well as body postures and movements. optimal connections between the IoT devices are built. Hence,
In [125], the authors used the Laguerre neural network-based if the IoT network can find the optimal connections for all
approximate dynamic programming scheme to improve the devices based on the objective functions, the transmit power
tracking efficiency in an IoT network. The authors in [126] and expected transmit time can be reduced. Simulation results
developed a streaming hardware accelerator for CNNs to im- show that the mapping algorithm can achieve significant gains
prove the accuracy of image detection in an IoT network. The in terms of total transmit power and expected transmit time
work in [127] used a denoising autoencoder neural network compared to a centralized algorithm. This is because the IoT
for data sampling in an IoT network. In [128], a deep belief network uses FNNs to approximate the objective functions and
network is used for entity state prediction. The authors in find the optimal device connections.
[129] used ANNs for target surveillance. In summary, the prior 4) Lessons learned: This IoT application has shown that
works used ANNs to solve a number of IoT problems such FNNs are an effective tool for network mapping in IoTs so as
as IoT network modeling, failure detection, human activities to find the optimal transmission links from the transmitters to
classification, and tracking accuracy improvement. However, the receivers through the relays. We can summarize the main
ANNs can also be used to analyze the data correlation for data lessons learned here as follows:
compression and data recovery, to identify humans, to predict • The advantage of FNNs for the studied IoT application
human activities, and to manage the resources of devices. Next, is that they enable the IoT devices to optimally build the
we explain a specific application of ANNs for the IoT. transmission links between the receivers and the trans-
3) Example: One illustrative application for the use of mitters so as to reduce the transmission delay without
ANNs within the context of the IoT is presented in [121] any communications among the IoT devices. In this
which studies how to improve the communication quality application, the wireless network only consists of the
by mapping IoT networks to ANNs. The considered IoT receivers, the transmitters, and the relays, and, the data
network is primarily a wireless sensor network. Two objective in this wireless network will only be transmitted from
functions are considered: a) minimizing the overall cost of the transmitters to the relays, then from the relays to the
communication between the devices mapped to the neurons receivers. The use of FNNs to map this network is ap-
in the input layer and the devices mapped to the neurons in propriate as it allows one to find the optimal transmission
the output layers. Here, the overall cost represents the total links between the transmitters and the receivers, through
transmit power of all devices used to transmit the information the relays. This was a novel use case of FNNs that is
signals, and b) minimizing the expected transmission time to motivated by the underlying wireless system.
deliver the information signals. • FNNs are very simple neural networks with little training
To minimize the total transmit power and the expected overhead, which makes them suitable for implementa-
transmit time for the IoT, the basic idea of [121] is to train an tion in IoT systems in which the devices are resource-
ANN so as to approximate the objective functions discussed constrained.
above and, then, map the IoT network to the ANN. FNNs are • One disadvantage of using FNNs for mapping wireless
used for this mapping since they transmit the information in networks is that they can be only used for a network
only one direction, forward, from the input nodes, through the with a small number of transmitters and receivers. This
hidden nodes, and to the output nodes. First, one must identify is due to the fact that, as the number of transmitters and
the devices that want to send signals as well as the devices that receivers increases, the number of neurons in the input,
will receive signals. The IoT devices that want to send signals output, and hidden layers increases. Since FNNs need to
are mapped to the neurons in the input layers. The IoT devices calculate the gradients of all of the neurons (in contrast to
that want to receive signals are mapped to the neurons in the ESNs that only need to update the output weight matrix),
output layers. The other IoT devices are mapped to the neurons the training complexity will significantly increase.
in the hidden layers. Some of the devices that are mapped to • The presented IoT application is restricted to a very
the hidden layers will be used to forward the signals. Then, simple mapping of IoT devices via an FNN. However, the
the FNN is trained in an offline manner to approximate the IoT domain is much richer than this application and one
objective functions. The IoT network devices are mapped into can envision a plethora of resource management, physical
neurons and wireless links into connections between neurons, layer enhancement, and network optimization problems
and, hence, a method is needed to map the trained FNN to that can be addressed using more elaborate ANNs such
the IoT network. Since the computational resources of each as those presented in Section III (and in the previous
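The device-to-neuron mapping described for [121] can be illustrated with a minimal Python sketch. It is not the implementation of [121]: the device names, the `tanh` activation, the random (untrained) weights, and the `strongest_links` rule for reading links off the output weights are all assumptions introduced here purely for illustration.

```python
import math
import random


def assign_layers(devices, senders, receivers):
    """Split device IDs into FNN layers: senders -> input, receivers -> output."""
    senders, receivers = set(senders), set(receivers)
    return {
        "input": [d for d in devices if d in senders],
        "output": [d for d in devices if d in receivers],
        # Every remaining device becomes a hidden neuron (a candidate relay).
        "hidden": [d for d in devices if d not in senders and d not in receivers],
    }


def init_weights(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (untrained)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(x, w_hidden, w_out):
    """One forward pass: input -> hidden -> output, with no feedback links."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def strongest_links(layers, w_out):
    """Hypothetical rule: pair each receiver with its largest-|weight| relay."""
    links = {}
    for receiver, row in zip(layers["output"], w_out):
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        links[receiver] = layers["hidden"][best]
    return links


# Hypothetical six-device network: two senders, three relays, one receiver.
devices = ["s1", "s2", "r1", "r2", "r3", "d1"]
layers = assign_layers(devices, senders=["s1", "s2"], receivers=["d1"])

rng = random.Random(0)
w_hidden = init_weights(len(layers["input"]), len(layers["hidden"]), rng)
w_out = init_weights(len(layers["hidden"]), len(layers["output"]), rng)

y = forward([1.0, 0.5], w_hidden, w_out)
links = strongest_links(layers, w_out)
```

In this sketch each layer size equals the number of devices in the corresponding role, which makes the scalability concern visible: adding transmitters, relays, or receivers directly enlarges the weight matrices that training must update.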
TABLE V
SUMMARY OF THE USE OF ANN-BASED LEARNING ALGORITHMS FOR EXISTING WORKS IN SPECIFIC APPLICATION
prediction accuracy of ANNs can be compromised. Second, for data analytics, existing ANN-based learning algorithms cannot be readily implemented on mobile devices such as smartphones due to their high training complexity and energy consumption. In fact, small IoT or wearable devices such as watches and IoT sensors, or even smartphones, can record more data related to the users’ environment than BSs that are located far away from the users. Consequently, if an ANN learning algorithm can be implemented on wearable and portable devices, it can use more data related to the users’ behaviors for training and, hence, the prediction accuracy can be improved, while also alleviating privacy concerns. One possibility to overcome this challenge is to train the ANNs at a BS or in the cloud and then deploy the trained ANNs on the users’ devices. Third, distributed ANN learning algorithms are needed for wireless networks. In particular, mobile users will connect to different BSs as they move from one cell to another. In this case, the data related to such a mobile user may be located at different BSs, and the BSs may not be able to exchange the collected data due to the limited capacity of the backhaul links. Consequently, a distributed ANN learning algorithm is needed for data analytics, as the users’ data is located at different BSs. One possibility to overcome this challenge is to leverage the emerging idea of federated learning [165], which enables distributed learning. Moreover, the training complexity of ANN-based data analytics algorithms can be higher than that of other ML tools such as ridge regression. Consequently, one must balance the tradeoff between prediction accuracy and training complexity. Finally, training ANNs may require a large amount of training data (depending on the application), and such data may not always be readily available in a wireless network.
Table V summarizes the type of ANNs and learning algorithms used for each existing work in each application. Based on this table, one can identify the advantages, disadvantages, and limitations of each learning algorithm for all types of problems encountered in the literature. Table VI provides a summary of the key wireless networking problems that can be solved by using ANNs along with the challenges and relevant applications.
TABLE VI
SUMMARY OF THE USE OF ANNS FOR SPECIFIC WIRELESS PROBLEMS
V. CONCLUSION
In this paper, we have provided one of the first comprehensive tutorials on the use of artificial neural networks-based machine learning for enabling a variety of applications in tomorrow’s wireless networks. In particular, we have presented an overview of a number of key types of neural networks such as recurrent, spiking, and deep neural networks. For each type, we have overviewed the basic architecture as well as the associated challenges and opportunities. Then, we have provided a panoramic overview of the variety of wireless communication problems that can be addressed using ANNs. In particular, we have investigated many emerging applications including unmanned aerial vehicles, wireless virtual reality, mobile edge caching and computing, Internet of Things, and multi-RAT wireless networks. For each application, we have
provided the main motivation for using ANNs along with their associated challenges while also providing a detailed example for a use case scenario. Last, but not least, for each application, we have provided a broad overview on future works that can be addressed using ANNs. Clearly, the future of wireless networks will inevitably rely on artificial intelligence and, thus, this paper provides a stepping stone towards understanding the analytical machinery needed to develop such a new breed of wireless networks.
REFERENCES
[1] N. C. Luong, D. T. Hoang, P. Wang, D. Niyato, D. I. Kim, and Z. Han, “Data collection and wireless communication in internet of things (IoT) using economic analysis and pricing models: a survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 4, pp. 2546–2590, June 2016.
[2] Z. Dawy, W. Saad, A. Ghosh, J. G. Andrews, and E. Yaacoub, “Toward massive machine type cellular communications,” IEEE Wireless Communications, vol. 24, no. 1, pp. 120–128, Nov. 2017.
[3] T. Park, N. Abuzainab, and W. Saad, “Learning how to communicate in the internet of things: Finite resources and heterogeneity,” IEEE Access, vol. 4, pp. 7063–7073, Nov. 2016.
[4] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 74–80, Feb. 2014.
[5] 3GPP, “Study on latency reduction techniques for LTE,” Technical Report (TR) 36.881, 3rd Generation Partnership Project (3GPP), 2016.
[6] Y. Zeng, R. Zhang, and T. J. Lim, “Wireless communications with unmanned aerial vehicles: opportunities and challenges,” IEEE Communications Magazine, vol. 54, no. 5, pp. 36–42, May 2016.
[7] T. Zeng, O. Semiari, W. Saad, and M. Bennis, “Joint communication and control for wireless autonomous vehicular platoon systems,” arXiv preprint arXiv:1804.05290, 2018.
[8] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Communications and control for wireless drone-based antenna array,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 820–834, Jan 2019.
[9] F. Javed, M. K. Afzal, M. Sharif, and B. Kim, “Internet of things (iot) operating systems support, networking technologies, applications, and challenges: A comparative review,” IEEE Communications Surveys Tutorials, vol. 20, no. 3, pp. 2062–2100, thirdquarter 2018.
[10] G. Durisi, T. Koch, and P. Popovski, “Toward massive, ultrareliable, and low-latency wireless communication with short packets,” Proceedings of the IEEE, vol. 104, no. 9, pp. 1711–1726, Sept 2016.
[11] D. G. Gopal and S. Kaushik, “Emerging technologies and applications for cloud-based gaming: Review on cloud gaming,” Emerging Technologies and Applications for Cloud-Based Gaming, vol. 41, no. 07, pp. 79–89, March 2016.
[12] 3GPP, “Extended Reality (XR) in 5G,” Technical Report (TR) 26.928, 3rd Generation Partnership Project (3GPP), 03 2018, Version 14.0.0.
[13] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on selected areas in communications, vol. 32, no. 6, pp. 1065–1082, June 2014.
[14] T. Segaran, Programming collective intelligence: Building smart web 2.0 applications, O’Reilly Media, Inc., 2007.
[15] B. Yegnanarayana, Artificial neural networks, PHI Learning Pvt. Ltd., 2009.
[16] G. Chakraborty and B. Chakraborty, “A novel normalization technique for unsupervised learning in ann,” IEEE Transactions on Neural Networks, vol. 11, no. 1, pp. 253–257, Jan 2000.
[17] K. P. Bennett and A. Demiriz, “Semi-supervised support vector machines,” in Proc. of Advances in Neural Information processing systems, 1999, pp. 368–374.
[18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, and G. Ostrovski, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529, 2015.
[19] C. Andrieu, N. De Freitas, A. Doucet, and M. I. Jordan, “An introduction to MCMC for machine learning,” Machine Learning, vol. 50, no. 1-2, pp. 5–43, Jan. 2003.
[20] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, “Learning low-level vision,” International Journal of Computer Vision, vol. 40, no. 1, pp. 25–47, Oct. 2000.
[21] F. Sebastiani, “Machine learning in automated text categorization,” ACM Computing Surveys (CSUR), vol. 34, no. 1, pp. 1–47, March 2002.
[22] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proc. of the International Conference on Machine Learning, New York, NY, USA, July 2008, pp. 160–167.
[23] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: Sentiment classification using machine learning techniques,” in Proc. of the Conference on Empirical Methods in Natural Language processing, Stroudsburg, PA, USA, July 2002.
[24] C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.
[25] S. Bi, R. Zhang, Z. Ding, and S. Cui, “Wireless communications in the era of big data,” IEEE Communications Magazine, vol. 53, no. 10, pp. 190–199, Oct. 2015.
[26] “The Amazing Ways Verizon Uses AI And Machine Learning To Improve Performance,” https://www.forbes.com/sites/bernardmarr/2018/06/22/the-amazing-ways-verizon-uses-ai-and-machine-learning-to-improve-performance/#3695e5e17638, 2018.
[27] “Making waves with AI,” https://www.ericsson.com/en/mobility-report/reports/june-2018/applying-machine-intelligence-to-network-management, 2018.
[28] “Qualcomm AI Research,” https://www.qualcomm.com/invention/artificial-intelligence/ai-research, 2018.
[29] “Focus group on machine learning for future networks including 5G,” https://www.itu.int/en/ITU-T/focusgroups/ml5g/Pages/default.aspx, 2018.
[30] J. Ferber, Multi-agent systems: An introduction to distributed artificial intelligence, vol. 1, Addison-Wesley Reading, 1999.
[31] S. Bubeck, Convex Optimization: Algorithms and Complexity, Now Foundations and Trends, 2015.
[32] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563–575, Dec 2017.
[33] T. O’Shea, K. Karra, and T. C. Clancy, “Learning approximate neural estimators for wireless channel state information,” in Proc. of IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, Sep. 2017.
[34] T. J. O’Shea, T. Erpek, and T. C. Clancy, “Deep learning based mimo communications,” available online arXiv:1707.07980, July 2017.
[35] F. Liang, C. Shen, and F. Wu, “An iterative BP-CNN architecture for channel decoding,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 144–159, Feb 2018.
[36] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 119–131, Feb 2018.
[37] N. Samuel, T. Diskin, and A. Wiesel, “Deep mimo detection,” in Proc. of IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Sapporo, Japan, July 2017.
[38] E. Baştuğ, M. Bennis, M. Médard, and M. Debbah, “Towards interconnected virtual reality: Opportunities, challenges and enablers,” IEEE Communications Magazine, vol. 55, no. 6, pp. 110–117, June 2017.
[39] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” arXiv preprint arXiv:1902.10265, 2019.
[40] R. Yu, “Huawei reveals the future of mobile ai at IFA 2017,” http://www.businesswire.com/news/home/20170902005020/en/Huawei-Reveals-Future-Mobile-AI-IFA-2017, 2017.
[41] S. Kovach, “Qualcomm CEO Steve Mollenkopf: What the big innovation house that powered the mobile boom is betting on next,” http://www.businessinsider.com/qualcomm-ceo-steve-mollenkopf-interview-2017-7, 2017.
[42] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing-A key technology towards 5G,” ETSI White Paper, vol. 11, no. 11, pp. 1–16, Sept. 2015.
[43] A. Ahmed and E. Ahmed, “A survey on mobile edge computing,” in Proc. of International Conference on Intelligent Systems and Control, Coimbatore, India, Jan. 2016.
[44] S. Sardellitti, G. Scutari, and S. Barbarossa, “Joint optimization of radio and computational resources for multicell mobile-edge computing,” IEEE Transactions on Signal and Information Processing over Networks, vol. 1, no. 2, pp. 89–103, June 2015.
[45] S. Nunna, A. Kousaridas, M. Ibrahim, M. Dillinger, C. Thuemmler, H. Feussner, and A. Schneider, “Enabling real-time context-aware collaboration through 5G and mobile edge computing,” in Proc. of International Conference on Information Technology-New Generations, Las Vegas, NV, USA, June 2015.
[46] G. Lee, W. Saad, and M. Bennis, “Decentralized cross-tier interference mitigation in cognitive femtocell networks,” in Proc. of IEEE International Conference on Communications (ICC), Paris, France, May 2017.
[47] Y. Mao, J. Zhang, and K. B. Letaief, “Dynamic computation offloading for mobile-edge computing with energy harvesting devices,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3590–3605, Sep. 2016.
[48] X. Chen, L. Jiao, W. Li, and X. Fu, “Efficient multi-user computation offloading for mobile-edge cloud computing,” IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, Oct. 2016.
[49] O. Semiari, W. Saad, S. Valentin, M. Bennis, and H. V. Poor, “Context-aware small cell networks: How social metrics improve wireless resource allocation,” IEEE Transactions on Wireless Communications, vol. 14, no. 11, pp. 5927–5940, July 2015.
[50] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for wireless resource management,” available online arXiv:1705.09412, May 2017.
[51] C. Jiang, H. Zhang, Y. Ren, Z. Han, K. C. Chen, and L. Hanzo, “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Communications, vol. 24, no. 2, pp. 98–105, April 2017.
[52] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-learning techniques in cognitive radios,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1136–1159, Oct. 2013.
[53] M. A. Alsheikh, S. Lin, D. Niyato, and H. P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Communications Surveys & Tutorials, vol. 16, no. 4, pp. 1996–2018, April 2014.
[54] H. B. Demuth, M. H. Beale, O. De Jess, and M. T. Hagan, Neural network design, Martin Hagan, 2014.
[55] J. Xie, F. R. Yu, T. Huang, R. Xie, J. Liu, C. Wang, and Y. Liu, “A survey of machine learning techniques applied to software defined networking (SDN): Research issues and challenges,” IEEE Communications Surveys Tutorials, vol. 21, no. 1, pp. 393–430, Firstquarter 2019.
[56] Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Communications Surveys Tutorials, vol. 20, no. 4, pp. 2595–2621, Fourthquarter 2018.
[57] M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani, “Deep learning for IoT big data and streaming analytics: A survey,” IEEE Communications Surveys Tutorials, vol. 20, no. 4, pp. 2923–2960, Fourthquarter 2018.
[58] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-art deep learning: Evolving machine intelligence toward tomorrows intelligent network traffic control systems,” IEEE Communications Surveys Tutorials, vol. 19, no. 4, pp. 2432–2455, Fourthquarter 2017.
[59] P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza, “A survey of machine learning techniques applied to self-organizing cellular networks,” IEEE Communications Surveys Tutorials, vol. 19, no. 4, pp. 2392–2431, Fourthquarter 2017.
[60] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, “Application of machine learning in wireless networks: Key techniques and open issues,” IEEE Communications Surveys Tutorials, to appear, 2019.
[61] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Communications Surveys Tutorials, to appear, 2019.
[62] X. You, C. Zhang, X. Tan, S. Jin, and H. Wu, “AI for 5G: Research directions and paradigms,” Science China Information Sciences, vol. 62, no. 2, pp. 1589–1602, Nov. 2019.
[63] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural networks, vol. 61, pp. 85–117, Jan. 2015.
[64] “Machine learning: What it is and why it matters,” https://www.sas.com/en_us/insights/analytics/machine-learning.html.
[65] E. Alpaydin, Introduction to machine learning, MIT press, 2014.
[66] L. Rose, S. Lasaulce, S. M. Perlaza, and M. Debbah, “Learning equilibria with partial information in decentralized wireless networks,” IEEE Communications Magazine, vol. 49, no. 8, pp. 136–142, August 2011.
[67] D. P. Mandic and J. A. Chambers, Recurrent neural networks for prediction: learning algorithms, architectures and stability, Wiley Online Library, 2001.
[68] M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Computer Science Review, vol. 3, no. 3, pp. 127–149, Aug. 2009.
[69] P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990.
[70] M. Lukoševičius, A Practical Guide to Applying Echo State Networks, Springer Berlin Heidelberg, 2012.
[71] H. Jaeger, “Short term memory in echo state networks,” in GMD Report, 2001.
[72] R. Ali and T. Peter, “Minimum complexity echo state network,” IEEE Transactions on Neural Networks, vol. 22, no. 1, pp. 131–144, November 2011.
[73] C. Gallicchio and A. Micheli, “Deep reservoir computing: A critical analysis,” in Proc. of European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, April 2016.
[74] C. M. Bishop, “Training with noise is equivalent to tikhonov regularization,” Training, vol. 7, no. 1, pp. 108–116, 2008.
[75] B. Farhang-Boroujeny, Adaptive filters: theory and applications, John Wiley & Sons, 2013.
[76] H. Jaeger and H. Haas, “Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication,” Science, vol. 304, no. 5667, pp. 78–80, April 2004.
[77] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout, “Isolated word recognition with the liquid state machine: A case study,” Information Processing Letters, vol. 95, no. 6, pp. 521–528, Sept. 2005.
[78] W. Maass, “Liquid state machines: motivation, theory, and applications,” Computability in Context: Computation and Logic in the Real World, pp. 275–296, 2010.
[79] W. Maass, T. Natschläger, and H. Markram, “Real-time computing without stable states: A new framework for neural computation based on perturbations,” Neural Computation, vol. 14, no. 11, pp. 2531–2560, Nov. 2002.
[80] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT press, 2016.
[81] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier networks,” in Proc. of Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, June 2011.
[82] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Chapter 7: Regularization for Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org.
[83] “A beginner guide to recurrent networks and LSTMs,” https://deeplearning4j.org/lstm.html.
[84] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. of Advances in neural information processing systems, 2012.
[85] M. Schmidt, D. Block, and U. Meier, “Wireless interference identification with convolutional neural networks,” in Proc. of IEEE International Conference on Industrial Informatics (INDIN), Emden, Germany, July 2017.
[86] M. Soh, “Learning CNN-LSTM architectures for image caption generation,” http://cs224d.stanford.edu/reports/msoh.pdf, 2016.
[87] L. J. Lin, “Reinforcement learning for robots using neural networks,” Tech. Rep., Carnegie-Mellon Univ Pittsburgh PA School of Computer Science, 1993.
[88] Y. Zeng, R. Zhang, and T. J. Lim, “Wireless communications with unmanned aerial vehicles: opportunities and challenges,” IEEE Communications Magazine, vol. 54, no. 5, pp. 36–42, May 2016.
[89] Y. Yang, M. Chen, C. Guo, C. Feng, and W. Saad, “Power efficient visible light communication (VLC) with unmanned aerial vehicles (UAVs),” IEEE Communications Letters, to appear, 2019.
[90] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Wireless communication using unmanned aerial vehicles (UAVs): Optimal transport theory for hover time optimization,” IEEE Transactions on Wireless Communications, vol. 16, no. 12, pp. 8052–8066, Dec 2017.
[91] C. H. Liu, Z. Chen, J. Tang, J. Xu, and C. Piao, “Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 2059–2070, Sep. 2018.
[92] V. Sharma, M. Bennis, and R. Kumar, “UAV-assisted heterogeneous networks for capacity enhancement,” IEEE Communications Letters, vol. 20, no. 6, pp. 1207–1210, June 2016.
[93] H. Zhang, C. Cao, L. Xu, and T. A. Gulliver, “A UAV detection algorithm based on an artificial neural network,” IEEE Access, vol. 6, pp. 24720–24728, May 2018.
[94] D. Nodland, H. Zargarzadeh, and S. Jagannathan, “Neural network-based optimal adaptive output feedback control of a helicopter UAV,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 7, pp. 1061–1073, July 2013.
[95] J. R. G. Braga, H. F. C. Velho, G. Conte, P. Doherty, and Élcio H. S., “An image matching system for autonomous UAV navigation based on neural network,” in Proc. of International Conference on Control, Automation, Robotics and Vision (ICARCV), Phuket, Thailand, Nov. 2016.
[96] J. Cui, Y. Liu, and A. Nallanathan, “Multi-agent reinforcement learning based resource allocation for UAV networks,” arXiv preprint arXiv:1810.10408, 2018.
[97] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 5, pp. 1046–1061, May 2017.
[98] M. Chen, W. Saad, and C. Yin, “Liquid state machine learning for resource and cache management in LTE-U unmanned aerial vehicle (UAV) networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 3, pp. 1504–1517, March 2019.
[99] X. Liu, M. Chen, and C. Yin, “Optimized trajectory design in UAV based cellular networks for 3D users: A double Q-learning approach,” arXiv preprint arXiv:1902.06610, 2019.
[100] H. Jaeger, “Controlling recurrent neural networks by conceptors,” available online: arxiv.org/abs/1403.3369, 2014.
[101] M. Chen, W. Saad, and C. Yin, “Echo state networks for self-organizing resource allocation in LTE-U with uplink-downlink decoupling,” IEEE Transactions on Wireless Communications, vol. 16, no. 1, pp. 3–16, Jan. 2017.
[102] U. Challita, Z. Dawy, G. Turkiyyah, and J. Naoum-Sawaya, “A chance constrained approach for LTE cellular network planning under uncertainty,” Computer Communications, vol. 73, pp. 34–45, Jan. 2016.
[103] M. Chen, W. Saad, C. Yin, and M. Debbah, “Data correlation-aware resource management in wireless virtual reality (VR): An echo state transfer learning approach,” IEEE Transactions on Communications, vol. 67, no. 6, pp. 4267–4280, June 2019.
[104] M. Chen, W. Saad, and C. Yin, “Virtual reality over wireless networks: Quality-of-service model and learning-based resource management,” IEEE Transactions on Communications, vol. 66, no. 11, pp. 5621–5635, Nov 2018.
[105] M. Chen, O. Semiari, W. Saad, X. Liu, and C. Yin, “Federated echo state learning for minimizing breaks in presence in wireless virtual reality networks,” arXiv preprint arXiv:1812.01202, 2018.
[106] G. A. Koulieris, G. Drettakis, D. Cunningham, and K. Mania, “Gaze prediction using machine learning for dynamic stereo manipulation in games,” in Proc. of IEEE Virtual Reality (VR), Greenville, SC, USA, March 2016.
[107] M. Chen, W. Saad, and C. Yin, “Echo-liquid state deep learning for 360° content transmission and caching in wireless VR networks with cellular-connected UAVs,” IEEE Transactions on Communications, to appear, 2019.
[108] E. Zeydan, E. Bastug, M. Bennis, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, “Big data caching for networking: Moving from cloud to edge,” IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, Sept. 2016.
[109] J. Cobb and H. ElAarag, “Web proxy cache replacement scheme based on back-propagation neural network,” Journal of Systems and Software, vol. 81, no. 9, pp. 1539–1558, Sept. 2008.
[110] Z. Zhang, M. Hua, C. Li, Y. Huang, and L. Yang, “Placement delivery array design via attention-based sequence-to-sequence model with deep neural network,” IEEE Wireless Communications Letters, vol. 8, no. 2, pp. 372–375, April 2019.
[111] Y. Wei, F. R. Yu, M. Song, and Z. Han, “Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor-critic deep reinforcement learning,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2061–2073, April 2019.
[112] E. Bastug, M. Bennis, E. Zeydan, M. A. Kader, I. A. Karatepe, A. S. Er, and M. Debbah, “Big data meets telcos: A proactive caching perspective,” Journal of Communications and Networks, vol. 17, no. 6, pp. 549–557, December 2015.
[113] S. M. S. Tanzil, W. Hoiles, and V. Krishnamurthy, “Adaptive scheme for caching YouTube content in a cellular network: Machine learning approach,” IEEE Access, vol. 5, pp. 5870–5881, March 2017.
[114] M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for proactive caching in cloud-based radio access networks with mobile users,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3520–3535, June 2017.
[115] L. Giupponi, R. Agusti, J. Perez-Romero, and O. Sallent, “Joint radio resource management algorithm for multi-RAT networks,” in Proc. of IEEE Global Telecommunications Conference (GLOBECOM), St. Louis, MO, Nov. 2005.
[116] H. He, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmwave massive MIMO systems,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 852–855, Oct 2018.
[117] S. Baban, D. Denkoviski, O. Holland, L. Gavrilovska, and H. Aghvami, “Radio access technology classification for cognitive radio networks,” in Proc. of IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), London, UK, Sept. 2013.
[118] Y. Cui, Y. Xu, R. Xu, and X. Sha, “A multi-radio packet scheduling algorithm for real-time traffic in a heterogeneous wireless network environment,” Information Technology Journal, vol. 10, pp. 182–188, Oct. 2010.
[119] M. Bennis, M. Simsek, A. Czylwik, W. Saad, S. Valentin, and M. Debbah, “When cellular meets WiFi in wireless small cell networks,” IEEE Communication Magazine, vol. 51, no. 6, pp. 44–50, June 2013.
[120] Y. Sun, M. Peng, and S. Mao, “Deep reinforcement learning-based mode selection and resource management for green fog radio access networks,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1960–1971, April 2019.
[121] N. Kaminski, I. Macaluso, E. Di Pascale, A. Nag, J. Brady, M. Kelly, K. Nolan, W. Guibene, and L. Doyle, “A neural-network-based realization of in-network computation for the Internet of Things,” in Proc. of IEEE International Conference on Communications, Paris, France, May 2017.
[122] S. R. Naidu, E. Zafiriou, and T. J. McAvoy, “Use of neural networks for sensor failure detection in a control system,” IEEE Control Systems Magazine, vol. 10, no. 3, pp. 49–55, April 1990.
[123] H. Ning and Z. Wang, “Future internet of things architecture: Like mankind neural system or social organization framework?,” IEEE Communications Letters, vol. 15, no. 4, pp. 461–463, March 2011.
[124] F. Alam, R. Mehmood, I. Katib, and A. Albeshri, “Analysis of eight data mining algorithms for smarter internet of things (IoT),” Procedia Computer Science, vol. 98, pp. 437–442, Dec. 2016.
[125] X. Luo, Y. Lv, M. Zhou, W. Wang, and W. Zhao, “A laguerre neural network-based ADP learning scheme with its application to tracking control in the internet of things,” Personal and Ubiquitous Computing, vol. 20, no. 3, pp. 361–372, June 2016.
[126] L. Du, Y. Du, Y. Li, J. Su, Y. Kuan, C. Liu, and M. F. Chang, “A reconfigurable streaming deep convolutional neural network accelerator for internet of things,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 1, pp. 198–208, Jan 2018.
[127] T. Yu, X. Wang, and A. Shami, “UAV-enabled spatial data sampling in large-scale IoT systems using denoising autoencoder neural network,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 1856–1865, April 2019.
[128] P. Zhang, X. Kang, D. Wu, and R. Wang, “High-accuracy entity state prediction method based on deep belief network towards iot search,” IEEE Wireless Communications Letters, to appear, 2018.
[129] J. Liang, X. Yu, and H. Li, “Collaborative energy-efficient moving in internet of things: Genetic fuzzy tree vs. neural networks,” IEEE Internet of Things Journal, to appear, 2018.
[130] “Qualcomm announces support for next-generation vr experiences with new snapdragon 845 virtual reality development kit,” https://www.qualcomm.com/news/releases/2018/03/21/qualcomm-announces-support-next-generation-vr-experiences-new-snapdragon, 2018.
[131] M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Towards low-latency and ultra-reliable virtual reality,” available online: arxiv.org/abs/1801.07587, Jan. 2018.
[132] “Vive wireless adapter,” https://www.vive.com/us/wireless-adapter/, 2018.
[133] “TPCAST wireless adapter for oculus rift,” https://www.tpcastvr.com/product-rift, 2018.
[134] “Because your senses do not have wires,” https://www.intel.com/content/www/us/en/wireless-products/wigig-overview.html, 2018.
[135] A. E. Abbas, “Constructing multiattribute utility functions for decision analysis,” INFORMS Tutorials in Operations Research, pp. 62–98, Oct. 2010.
[136] M. Abrash, “What VR could, should, and almost certainly will be within two years,” https://media.steampowered.com/apps/abrashblog/Abrash%20Dev%20Days%202014.pdf.
[159] R. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229–256, May 1992.
[160] R. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” Advances in Neural Information Processing Systems, vol. 12, pp. 1057–1063, 2000.
[161] T. Tieleman and G. Hinton, “Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude,” Technical report, 2012.
[162] M. Agiwal, A. Roy, and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1617–1655, Feb. 2016.
[163] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of things: A survey on enabling technologies, protocols, and applications,” IEEE Communications Surveys Tutorials, vol. 17, no. 4, pp. 2347–2376, Fourthquarter 2015.
[164] Y. Hu, A. Sanjab, and W. Saad, “Dynamic psychological game theory for secure internet of battlefield things (IoBT) systems,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 3712–3726, April 2019.
[165] V. Smith, C. K. Chiang, M. Sanjabi, and A. Talwalkar, “Federated multi-task learning,” available online: arxiv.org/abs/1705.10467, May 2017.
[137] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5g wireless networks,” IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, Aug 2014.
[138] M. A. Maddah-Ali and U. Niesen, “Coding for caching: Fundamental limits and practical challenges,” IEEE Communications Magazine, vol. 54, no. 8, pp. 23–29, August 2016.
[139] Y. Fadlallah, A. M. Tulino, D. Barone, G. Vettigli, J. Llorca, and J. M. Gorce, “Coding for caching in 5G networks,” IEEE Communications Magazine, vol. 55, no. 2, pp. 106–113, Feb. 2017.
[140] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Communications Surveys & Tutorials, to appear, 2017.
[141] M. Peng, Y. Sun, X. Li, Z. Mao, and C. Wang, “Recent advances in cloud radio access networks: System architectures, key techniques, and open issues,” IEEE Communications Surveys and Tutorials, vol. 18, no. 3, pp. 2282–2308, Mar. 2016.
[142] M. S. Elbamby, M. Bennis, and W. Saad, “Proactive edge computing in latency-constrained fog networks,” in 2017 European Conference on Networks and Communications (EuCNC), Oulu, Finland, June 2017.
[143] E. Bastug, M. Bennis, and M. Debbah, “Living on the edge: The role of proactive caching in 5G wireless networks,” IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, Aug. 2014.
measures for wireless networks,” Mobile Networks and Applications,
vol. 11, no. 1, pp. 91–99, February 2006.
[145] Y. Gu, W. Saad, M. Bennis, M. Debbah, and Z. Han, “Matching theory
for future wireless networks: fundamentals and applications,” IEEE
Communications Magazine, Special Issue on Emerging Applications,
Services, and Engineering for Cellular Cognitive Systems, vol. 53, no.
15, pp. 52–59, May 2015.
[146] 5GPPP, “The 5G infrastructure public private partnership: the next
generation of communication networks and services.,” Feb. 2015.
[147] C. Sexton, N. J. Kaminski, J. M. Marquez-Barja, N. Marchetti, and
L. A. DaSilva, “5G: Adaptable networks enabled by versatile radio
access technologies,” IEEE Communications Surveys Tutorials, vol.
19, no. 2, pp. 688–720, Secondquarter 2017.
[148] Y. Hu, R. MacKenzie, and M. Hao, “Expected Q-learning for self-
organizing resource allocation in LTE-U with downlink-uplink decou-
pling,” in Prof. of European Wireless Conference, Dresden, Germany,
May 2017.
[149] J. Andrews, “Seven ways that HetNets are a cellular paradigm shift,”
IEEE Communications Magazine, vol. 51, no. 3, pp. 136–144, Mar.
2013.
[150] G. Salami, O. Durowoju, A. Attar, O. Holland, R. Tafazolli, and
H. Aghvami, “A comparison between the centralized and distributed
approaches for spectrum management,” IEEE Communications Surveys
Tutorials, vol. 13, no. 2, pp. 274–290, Second 2011.
[151] Q. Li, H. Niu, A. Papathanassiou, and G. Wu, “5G network capacity:
Key elements and technologies,” IEEE Vehicular Technology Magazine,
vol. 9, no. 1, pp. 71–78, Mar. 2014.
[152] O. Semiari, W. Saad, M. Bennis, and M. Debbah, “Joint millimeter
wave and microwave resources allocation in cellular networks with
dual-mode base stations,” IEEE Transactions on Wireless Communi-
cations, vol. 16, no. 7, pp. 4802–4816, July 2017.
[153] S. Ha, S. Sen, C. Joe-Wong, Y. Im, and M. Chiang, “TUBE: time
dependent pricing for mobile data,” in Proc. of Special Interest Group
on Data Communication (ACM SIGCOMM). Helsinki, Finland, Aug.
2012.
[154] U. Challita and W. Saad, “Network formation in the sky: Unmanned
aerial vehicles for multi-hop wireless backhauling,” in Proc. of the
IEEE Global Communications Conference (GLOBECOM), Singapore,
Dec. 2017.
[155] O. Semiari, W. Saad, M. Bennis, and Z. Dawy, “Inter-operator resource
management for millimeter wave, multi-hop backhaul networks,” IEEE
Transactions on Wireless Communications, vol. 16, no. 8, pp. 5258–
5272, Aug. 2017.
[156] N. Burlutskiy, M. Petridis, A. Fish, A. Chernov, and N. Ali, An
Investigation on Online Versus Batch Learning in Predicting User
Behaviour, Research and Development in Intelligent Systems XXXIII.
Springer, 2016.
[157] U. Challita, L. Dong, and W. Saad, “Deep learning for proactive
resource allocation in LTE-U networks,” in Proc. of European Wireless
Conference. Dresden, Germany, May 2017.
[158] U. Challita, L. Dong, and W. Saad, “Proactive resource management
for LTE in unlicensed spectrum: A deep learning perspective,” IEEE
Transactions on Wireless Communications, vol. 17, no. 7, pp. 4674–
4689, July 2018.