MASTER’S THESIS
POTENTIAL DEEP LEARNING
APPROACHES FOR THE PHYSICAL
LAYER
July 2019
Rajapaksha R. (2019) Potential Deep Learning Approaches for the Physical
Layer. University of Oulu, Faculty of Information Technology and Electrical
Engineering, Degree Programme in Wireless Communications Engineering, 59 p.
ABSTRACT
TABLE OF CONTENTS
FOREWORD
LIST OF ABBREVIATIONS AND SYMBOLS
1 INTRODUCTION
   1.1 Motivation and Thesis Objectives
   1.2 Thesis Structure
2 BACKGROUND AND LITERATURE
   2.1 Potential of Deep Learning for the Physical Layer
   2.2 Deep Learning Basics
   2.3 Deep Learning Libraries
   2.4 Literature Review
       2.4.1 Deep Learning based Block Structured Communications
       2.4.2 Deep Learning based End-to-End Communications
3 END-TO-END LEARNING OF UNCODED SYSTEMS
   3.1 End-to-End Learning of a Communications System
       3.1.1 System Model
       3.1.2 Autoencoder for End-to-End Learning
   3.2 Autoencoder Implementation for Uncoded Communications Systems
   3.3 Results and Analysis
4 END-TO-END LEARNING OF CODED SYSTEMS
   4.1 System Model
       4.1.1 Baseline for Comparison
   4.2 Autoencoder Implementation for Coded Communications Systems with BPSK Modulation for AWGN Channel
       4.2.1 Implementation
       4.2.2 Results and Analysis
   4.3 Autoencoder Implementation for Coded Communications Systems with Higher Order Modulations
       4.3.1 Implementation
       4.3.2 Effect of Model Layout and Hyperparameter Tuning for the Performance
       4.3.3 Results and Analysis
       4.3.4 Processing Complexity
       4.3.5 Comparison with 5G Channel Coding and Modulation Schemes
5 CONCLUSION AND FUTURE WORK
   5.1 Summary and Conclusion
   5.2 Future Work
6 REFERENCES
7 APPENDICES
FOREWORD
This thesis is focused on potential deep learning approaches for the physical layer as
a part of the High5 and MOSSAF projects at the Center for Wireless Communications
(CWC) of University of Oulu, Finland. I would like to express my sincere gratitude to my
supervisor and mentor Prof. Nandana Rajatheva for the guidance, support, inspiration
and encouragement given throughout the period of my master studies. I am also
grateful to Academy Prof. Matti Latva-aho for providing me the opportunity to join
and contribute to the High5 and MOSSAF projects. Also I would like to thank project
manager Dr. Pekka Pirinen and the other colleagues at CWC for their support in my
research. I would also like to express my gratitude to Matti Isohookana, the coordinator
of Double Degree Master’s Program, for his support and guidance throughout the past
year.
I am also thankful to Dr. Janaka V. Wijayakulasooriya, my supervisor from University
of Peradeniya, Sri Lanka, for the support given. Also I would like to thank all the
lecturers from University of Peradeniya for their contribution in making the inaugural
Double Degree Master’s Programme a success.
Finally I would like to express my sincere gratitude to my mother, father and brother
for their immense love, support and encouragement provided throughout my life.
Acronyms
1D One-Dimensional
2D Two-Dimensional
3GPP 3rd Generation Partnership Project
4G Fourth Generation
5G Fifth Generation
ANN Artificial Neural Network
ASIC Application-Specific Integrated Circuit
AWGN Additive White Gaussian Noise
BAIR Berkeley Artificial Intelligence Research
BER Bit Error Rate
BLER Block Error Rate
BPSK Binary Phase-Shift Keying
CC Convolutional Coding
CNN Convolutional Neural Network
CP Cyclic Prefix
CPU Central Processing Unit
CSI Channel State Information
DL Deep Learning
DNN Deep Neural Network
GAN Generative Adversarial Network
GD Gradient Descent
GPU Graphics Processing Unit
LDPC Low Density Parity Check
LTE Long-Term Evolution
MASK M-ary Amplitude Shift-Keying
MER Message Error Rate
MFSK M-ary Frequency Shift-Keying
ML Machine Learning
MLP Multilayer Perceptron
MMSE Minimum Mean Square Error
MSE Mean Squared Error
NLP Natural Language Processing
NN Neural Network
NR New Radio
OFDM Orthogonal Frequency-Division Multiplexing
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase-Shift Keying
ReLU Rectified Linear Unit
RTN Radio Transformer Network
SDR Software-Defined Radio
SGD Stochastic Gradient Descent
SNR Signal-to-Noise Ratio
SVM Support Vector Machine
TBCC Tail Biting Convolutional Codes
TPU Tensor Processing Unit
URLLC Ultra-Reliable Low-Latency Communication
Symbols
Wireless networks and related services have become critical and fundamental building blocks of the modern digitized world, changing the way we live, work and communicate with each other. The emergence of many unprecedented services and applications, such as autonomous vehicles, remote medical diagnostics and surgeries, smart cities and factories, and artificial intelligence based personal assistants, is challenging the traditional communication mechanisms and approaches in terms of latency, reliability, energy efficiency, flexibility and connection density. Catering to the stringent requirements arising from these different verticals calls for wireless system research on novel architectures, approaches and algorithms in almost all the layers of a communications system. The newly initiated fifth generation (5G) of mobile communication technology is expected to cater for these requirements, revolutionizing everything achieved so far in wireless-enabled applications [1].
As noted, 5G brings the most stringent requirements so far in catering to the advanced applications and services it will support. For example, ultra-reliable low-latency communication (URLLC), one of the three service categories defined in 5G, is perhaps the most challenging, as it must meet two contradicting requirements at once: low latency and ultra-high reliability. It requires end-to-end latency in the range of 10 ms and very high reliability with a 10^-5 bit error rate (BER) within a 1 ms period [2]. High reliability means that the channel estimation accuracy should be improved, since the channel coding gain is small for short packet lengths, so that any loss caused by channel estimation is prevented as much as possible. This is to be achieved by advanced channel estimation techniques and by allocating more resources to pilots, which again raises concerns with the latency requirements, as more pilots increase the control overhead, affecting throughput and hence the latency of the communications. Another concern is faster signal processing at the transmitter and receiver to meet the low latency requirements of URLLC. Therefore, for successful implementation of URLLC systems, all these factors need to be taken into consideration, which requires novel architectures, approaches and algorithms in almost all the layers of the communication system.
The communications field is rich in expert knowledge based on statistics, information theory and solid mathematical modelling, capable of modelling channels [3] and of designing optimal signalling and detection schemes for reliable data transfer that compensate for various hardware imperfections, especially in the physical layer [4]. However, existing conventional communication theories exhibit several inherent limitations in fulfilling the large-data and ultra-high-rate communication requirements of complex situations, such as channel modelling in complex scenarios, fast and effective signal processing in low latency systems such as URLLC, and the limited, sub-optimal processing imposed by the fixed block structure of communication systems. In recent history, there has been an increasing interest in deep learning approaches for physical layer implementations due to certain advantages they possess which could be useful in overcoming the above challenges.
1.1 Motivation and Thesis Objectives
• Conduct a thorough literature review on deep learning concepts for the physical
layer and identify appropriate deep learning approaches as alternatives for the
conventional communications systems.
• Study the autoencoder based end-to-end learning of communications systems and implement the basic system proposed in [5] for a simple transmitter-receiver system.
The thesis is structured into five chapters. In this chapter we have given an overview of the requirements and challenges that need to be addressed by future communications systems and have discussed our motivation to look into deep learning based approaches for the physical layer communication blocks. In the second chapter we present the background and literature related to the thesis work. There we discuss the potential of deep learning for the physical layer in detail, along with the deep learning basics that are relevant in the context of the studied literature and the thesis work. Then we present the literature review, explaining the existing work on deep learning based block structured communications and deep learning based end-to-end communications.
In the third chapter, we discuss the autoencoder concept for end-to-end learning of communications systems and analyse the performance of the autoencoder based end-to-end system proposed in [5] in comparison to conventional uncoded systems with different modulation schemes. In the fourth chapter we analyse the performance of the autoencoder based end-to-end learning system in comparison to conventional coded systems with different modulation schemes. There, we also present a new autoencoder model that implements the equivalent autoencoder counterpart of coded systems with higher order modulation schemes, with BER performance comparable to the baseline systems. In the last chapter we present the conclusions of our research findings along with future directions for improvements.
There have been attempts to apply machine learning (ML) to the physical layer for a few decades, with researchers proposing ML based algorithms for different subtasks in the physical layer such as modulation recognition [7], [8], encoding and decoding [9], [10], channel modelling and identification [11], and channel estimation and equalization [12], [13], [14]. However, ML has not been used commercially, largely because ML algorithms do not have enough learning capability to cater for the complex task of handling physical channels.
It is believed that introducing deep learning (DL) to the physical layer could bring further performance improvements over existing ML approaches and eliminate the limitations faced by conventional ML algorithms, owing to characteristics such as deep modularization, which greatly enhances feature extraction and structural flexibility compared to ML algorithms [15]. Specifically, DL based systems can automatically learn features from raw data instead of relying on manual feature extraction, and their model structures can be flexibly adjusted via hyperparameter tuning in order to optimize the end-to-end performance of the system.
In this chapter we discuss the potential of applying DL to the physical layer, which has created great interest among the research community in studying DL based approaches for the physical layer. We then present an overview of the basic DL concepts; a detailed description of the different DL concepts is available in Appendix 1. An overview of different DL libraries is also presented. Finally, we present a detailed overview of selected literature that has proposed new DL based approaches in the physical layer of communication systems.
observed that these “learned” algorithms could be executed faster and at lower
energy cost than their manually “programmed” counterparts [5]. Parallel processing architectures with distributed memory, such as graphics processing units (GPUs) and specialized chips for NN inference, have proved to be very energy efficient and capable of providing considerable computational throughput when fully utilized by parallel implementations [5].
Building a DL model from scratch is a complex task and requires great effort, as it involves defining the forward-pass behaviour and gradient propagation operations at each layer and implementing efficient and fast optimization algorithms for model training, in addition to CUDA coding for GPU parallelization. In recent years, DL has gained great momentum and popularity, being used in many application areas such as image and video recognition, speech recognition and natural language processing (NLP). This continuously growing usage and popularity of DL has resulted in the development of numerous tools, algorithms and dedicated libraries which make it easy to build and train large NNs. Most of these tools allow high-level algorithm definition in various programming languages or configuration files, automatic differentiation of training loss functions through arbitrarily large networks, and compilation of the network's forward and backward passes into hardware-optimized concurrent dense matrix algebra kernels [5]. They are built for massively parallel GPU architectures, enabling GPU acceleration which speeds up the training routines of large networks with huge amounts of data. A brief summary of some of the widely used libraries is given below.
• TensorFlow
Created by the Google Brain team, TensorFlow is an open source library for numerical computation and large-scale machine learning that operates at large scale and in heterogeneous environments [17]. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore central processing units (CPUs), general-purpose GPUs, and custom-designed application-specific integrated circuits (ASICs) known as tensor processing units (TPUs) [17]. TensorFlow supports multiple languages for creating DL models, including Python, C++, Java, Go and R. Currently, the best-supported client language is Python, with detailed documentation and tutorials. Keras [23], Luminoth and TensorLayer are some of the dedicated DL toolboxes built upon TensorFlow which provide higher-level programming interfaces. Keras is the main tool we have used to implement the DL models proposed in this thesis, as it has a very user friendly and highly customizable interface which enables quick and easy prototyping for experimentation.
• PyTorch
PyTorch [18] is a Python based open source DL library inspired by Torch. It is a framework built to be flexible and modular for research, with the stability and support needed for production deployment. PyTorch has been primarily developed by Facebook's artificial intelligence research group. It is one of the preferred DL research platforms, built to provide maximum flexibility and speed, and is known for two high-level features: tensor computation with strong GPU acceleration support, and deep neural network construction on a tape-based autograd system designed for immediate, Python-like execution. PyTorch has a growing popularity among the research community since building NNs in PyTorch is straightforward.
• Caffe
Caffe is a dedicated DL framework made with expression, speed, and modularity in mind. It is developed by Berkeley Artificial Intelligence Research (BAIR) [19] and community contributors. It allows training NNs on multiple GPUs within distributed systems, and supports DL implementations on mobile operating systems such as iOS and Android [20].
[Figure: Block structure of a conventional communications system — source coding, channel coding and modulation at the transmitter, the RF transmitter and channel, and the RF receiver with channel estimation, detection, demodulation, channel decoding and source decoding at the destination.]
In the previous section, we discussed several DL based approaches which are used as alternatives for one or two processing blocks of the conventional block structured communications system. However, looking back at the original requirement of a communications system, namely transmitting a message from a source to a destination over a channel, the block structure enables individual analysis and control of each block, but it cannot be guaranteed that optimizing each block separately will result in a globally optimal solution to the communication problem, because end-to-end performance improvements can be achieved by jointly optimizing two or more blocks.
Based on this line of thought, a novel DL based concept has been introduced in recent history, which reformulates the communication task as an end-to-end reconstruction optimization task in which the artificial block structure of the conventional communications system is no longer required. This concept implements the end-to-end communications system as an autoencoder, and initial studies have shown that it achieves performance comparable to conventional systems and that the end-to-end method has great potential to be a universal solution for different channel models. In this section we discuss this newly introduced concept of autoencoder based end-to-end communications and present details of some very recent studies based on it.
libraries. The limitation of short block lengths faced by the autoencoder models has been overcome by implementing mechanisms for continuous data transmission and receiver synchronization, where a frame synchronization module based on another NN is implemented at the receiver. A two-step training procedure based on transfer learning is used to avoid training the full model over the actual channel: the receiver part of the autoencoder is fine-tuned to capture the effects of the actual channel, including hardware imperfections which are not initially included in the model. Comparison of the BLER performance of the “learned” system with that of a practical baseline has shown comparable performance to within about 1 dB. The study has thus validated the potential of practical implementation of autoencoder based communication systems.
[Figure: Autoencoder implementation of the end-to-end communications system — the one-hot message vector passes through multiple dense layers and a normalization layer forming the transmitter, a noise layer representing the AWGN channel, and further dense layers with a softmax activation forming the receiver, which outputs a probability for each possible message.]
Layers 1-5 compose the transmitter side of the system, where the energy constraint of the transmit signals is guaranteed by the normalization layer at the end. Layers 7-9 compose the receiver side of the system, where the estimated message is obtained from the output of the softmax layer. The noise layer between the transmitter and receiver sides of the system acts as the AWGN channel.
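To make the layer structure concrete, the following is a minimal Keras sketch of such an autoencoder. The layer widths, the use of Lambda layers for normalization and noise, and the example values of M and n are our own illustrative assumptions rather than the exact configuration used in the simulations.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

M, n = 16, 4                                    # message alphabet size and channel uses (assumed example)
R = np.log2(M) / n                              # communication rate in bits/channel use
EbNo_dB = 5.0                                   # training Eb/N0
noise_std = np.sqrt(1.0 / (2 * R * 10 ** (EbNo_dB / 10)))   # per-dimension AWGN standard deviation

inp = layers.Input(shape=(M,))                  # one-hot encoded message
x = layers.Dense(M, activation='relu')(inp)     # transmitter dense layer
x = layers.Dense(2 * n, activation='linear')(x) # 2n real values = n complex channel uses
# Normalization layer enforcing the energy constraint on the transmit signal
x = layers.Lambda(lambda v: np.sqrt(n) * tf.math.l2_normalize(v, axis=1))(x)
# Noise layer acting as the AWGN channel between transmitter and receiver
y = layers.Lambda(lambda v: v + tf.random.normal(tf.shape(v), stddev=noise_std))(x)
r = layers.Dense(M, activation='relu')(y)       # receiver dense layer
out = layers.Dense(M, activation='softmax')(r)  # probability for each possible message

autoencoder = Model(inp, out)
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                    loss='categorical_crossentropy')
```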
The autoencoder is trained end-to-end over the stochastic channel model using SGD with the Adam optimizer and a learning rate of 0.001. The following approaches were taken to select the Eb /N0 values for the AWGN channel during training:
• Training at a fixed Eb /N0 value (e.g., 5 dB or 8 dB)
• Picking Eb /N0 values randomly from a predefined Eb /N0 range for each training epoch
• Starting from a high Eb /N0 value and gradually decreasing it along the training epochs (e.g., starting from 8 dB and reducing by 2 dB after every 10 epochs)
Autoencoder model training and testing were implemented in Keras [23] with TensorFlow [17] as its backend. We trained the models over 50 epochs using 1,000,000 randomly generated messages, with the Eb /N0 values of the AWGN channel chosen during training according to the three settings mentioned earlier. Testing of the trained models was done with 1,000,000 different random messages over the 0 dB to 8 dB Eb /N0 range, and their BER performance has been compared with the corresponding baseline systems.
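As an illustration of this procedure, the sketch below generates one-hot training data, fits the model defined in the earlier sketch, and estimates the message error rate on a separate test set; the helper variable names and the batch size are our own assumptions.

```python
num_msgs = 1_000_000
train_idx = np.random.randint(0, M, size=num_msgs)     # random message indices
train_data = np.eye(M)[train_idx]                      # one-hot encoding

# The autoencoder reconstructs its own input, so input and target are identical
autoencoder.fit(train_data, train_data, epochs=50, batch_size=2000, verbose=2)

test_idx = np.random.randint(0, M, size=num_msgs)
test_data = np.eye(M)[test_idx]
probs = autoencoder.predict(test_data, batch_size=10000)
est_idx = np.argmax(probs, axis=1)                     # most likely message
message_error_rate = np.mean(est_idx != test_idx)      # BER would require mapping messages to bits
```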
We have tried out several autoencoder configurations which result in BPSK equivalent
systems where communication rate R = 1 bit/channel use and QPSK equivalent systems
where R = 2 bits/channel use. The message alphabet size M and number of channel
uses n have been set accordingly in order to achieve the desired communication rate.
Table 3.2 shows the autoencoder configuration parameters and the baseline systems against which their performance is compared. The total energy per message is kept the same in the autoencoder system and the baseline system in each scenario.
Table 3.2. Different autoencoder configuration parameters and their baseline systems
used for performance comparison
BER performance
Figure 3.3 shows the BER performance of the R = 1 systems compared with the theoretical AWGN BER performance of their baseline BPSK scheme. The autoencoder configurations have equal or better BER performance across almost the full Eb /N0 range, except from 0 dB to 2 dB where some autoencoder systems have slightly higher BER than the BPSK system. It is interesting to see that the BER performance improves as the message alphabet size and the number of channel uses per message increase, even though the communication rate is the same in all models. That is probably because the transmitted messages acquire a form of temporal encoding, since multiple channel uses are used to transmit a single message. When the number of channel uses per message increases, there is more flexibility and more degrees of freedom to formulate the transmit symbols to suit the channel distortions; hence the transmit symbols are more tolerant to distortion and can be recovered at the receiver with fewer errors.
[Plot legend: M = 2 (n = 1), M = 4 (n = 2), M = 8 (n = 3), M = 16 (n = 4) and M = 256 (n = 8) autoencoders, and BPSK.]
Figure 3.3. BER performance of the R = 1 bit/channel use systems compared with
theoretical AWGN BPSK performance.
However, it should be noted that this kind of system incurs a certain delay when detecting and decoding the received symbols at the receiver. That is, for a system with M = 16 and n = 4, even though the communication rate of 1 bit/channel use suggests that we can decode 1 bit per transmission at the receiver, we have to wait for four signalling instances to receive the complete message consisting of four symbols, and only then decode it to obtain the 4-bit message. While increasing M and n enables a lower BER for a system with a given communication rate, this detection delay also needs to be taken into consideration when choosing the M and n parameters.
Figure 3.4 shows the BER performance of the R = 2 systems compared with the theoretical AWGN BER performance of their baseline QPSK scheme. There, too, we can observe that the autoencoder has better BER performance than QPSK at higher Eb /N0 values. However, QPSK is better in the low Eb /N0 range between 0 and 5 dB.
[Plot legend: M = 4 (n = 1), M = 16 (n = 2), M = 64 (n = 3) and M = 256 (n = 4) autoencoders, and QPSK.]
Figure 3.4. BER performance of the R = 2 bits/channel use systems compared with
theoretical AWGN QPSK performance.
Learned Constellations
After training the model in an end-to-end manner, the autoencoder can be split into two parts: an encoder and a decoder. The encoder part is then implemented at the transmitter side, generating the encoded symbols for each message to be sent over the channel, and the decoder part is implemented at the receiver, regenerating the messages from the received symbols. After completion of model training, the encoder can generate all possible output signals for each message in the message alphabet. Figure 3.5 and Figure 3.6 show the learned constellations for the different systems we tested. When mapping the 2n-dimensional output of the encoder model to the n-dimensional complex-valued vector x, the odd-indexed elements of the output are taken as the in-phase (I) components and the even-indexed elements as the quadrature (Q) components of x. In the scatter plots, the I and Q values are plotted on the x- and y-axes respectively.
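A minimal sketch of this splitting and mapping, assuming the layer ordering of the Keras sketch given earlier in this chapter, could look as follows; the layer index and the one-based odd/even indexing convention are our own interpretation.

```python
# Encoder: from the one-hot input up to and including the normalization layer
encoder = Model(autoencoder.input, autoencoder.layers[3].output)

onehot_msgs = np.eye(M)                       # every possible message
tx = encoder.predict(onehot_msgs)             # shape (M, 2n) real-valued encoder outputs
# Elements 1, 3, 5, ... (one-based) -> I components; elements 2, 4, 6, ... -> Q components
symbols = tx[:, 0::2] + 1j * tx[:, 1::2]      # shape (M, n) complex constellation points
```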
From the scatter plots in Figure 3.5, it can be seen that for the systems with M = 2, 4 and 16 where n = 1, the learned constellations are similar to BPSK, QPSK and 16-PSK constellations respectively, with some arbitrary rotations. For the M = 4, n = 2 system shown in Figure 3.6, we can observe that the model has learned unique constellation points for the four messages over two symbols in order to minimize the symbol estimation error at the receiver. In the first symbol, the points marked with “∗” have maximum contrast in their in-phase amplitude values, taking high positive and negative values, while in the second symbol their signal points are located close to each other near zero. On the other hand, the points marked with “△” have low amplitude values in the first symbol and high amplitude values in the second symbol. This arrangement has given the system a better tolerance to channel distortions and has resulted in fewer symbol estimation errors at the receiver.
[Figure 3.5: Scatter plots of the learned constellations for the n = 1 systems.]
[Figure 3.6: Scatter plots of the learned constellations (Symbol 1 and Symbol 2) for the M = 4, n = 2 system.]
The BER performance comparison for the different training approaches is shown in Figure 3.7. For this, we used the M = 16, n = 4, R = 1 system configuration, which is equivalent to BPSK.
Figure 3.7. BER performance for different training Eb /N0 values and different batch
sizes. M = 16, n = 4 (R = 1 bit/channel use) system used.
From the results, we could see that training at a fixed Eb /N0 value of 5 dB gave the best BER performance. Increasing the training Eb /N0 was not optimal, as the trained model was unable to perform well at low Eb /N0 values. When the training Eb /N0 was selected to be too low, the BER performance again degraded, as the model seemed unable to capture the actual underlying patterns between inputs and outputs during training. Different batch sizes were tried when training the models, and it was observed that when training at a fixed Eb /N0 = 5 dB, a larger batch size of 2000 resulted in improved BER performance compared to smaller batch sizes, while when training with Eb /N0 values decreasing along the training epochs, a smaller batch size of around 50 or 100 gave better BER performance than larger batch sizes. Overall, training at a fixed Eb /N0 = 5 dB with batch size = 2000 gave the best BER performance among the different configurations we tried.
[Figure: System model, consisting of the transmitter, the channel and the receiver.]
Then the data block is fed to the modulator of order Mmod, where the data bits are divided into codewords of size kmod = log2 (Mmod) and each codeword is mapped to a point in the signal constellation with given amplitudes for the I and Q signals. At the receiver, the reverse process takes place: incoming symbols are mapped to codewords and the codewords are grouped serially to reproduce the block. It is then fed to the channel decoder, where the N-bit block is converted into a K-bit block after channel decoding, which is the estimate of the transmitted information block.
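As a concrete illustration of this baseline mapping step, the sketch below splits a coded block into kmod-bit codewords and maps each one to a 16-QAM point; the plain (non-Gray) rectangular mapping and the block sizes are assumptions made purely for illustration.

```python
import numpy as np

M_mod = 16
k_mod = int(np.log2(M_mod))                         # bits per modulation symbol
coded_block = np.random.randint(0, 2, size=1600)    # e.g. K = 800 information bits at R = 1/2

codewords = coded_block.reshape(-1, k_mod)          # group bits into k_mod-bit codewords
indices = codewords @ (2 ** np.arange(k_mod)[::-1]) # binary codeword -> integer index

# Rectangular 16-QAM grid (unnormalized amplitudes), indexed row by row
levels = np.array([-3, -1, 1, 3])
constellation = (levels[:, None] + 1j * levels[None, :]).ravel()
tx_symbols = constellation[indices]                 # one complex symbol per codeword
```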
In Sections 4.2 and 4.3, the autoencoder model dimensions are selected in such a way that the above described network structure is preserved, so that the baseline system and the autoencoder system can be compared.
The same autoencoder model developed in Section 3.2 is used to compare the autoencoder BER performance with that of the conventional coded system. Different rates R were achieved by changing the ratio of the number of input bits k to the number of channel uses n in the autoencoder models, so that the models resulted in systems equivalent to conventional systems with code rates R = {1/2, 1/3, 1/4} and BPSK modulation (Mmod = 2). The AWGN channel noise variance is given by β = (2R Eb /N0)^-1. This section presents the implementation and the results obtained for the autoencoder based transmitter-receiver system in comparison with the standard communications system with different code rates.
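For a given message size and target code rate, the autoencoder dimensions and the channel noise variance follow directly from these relations; the small sketch below, with assumed example values, makes the calculation explicit.

```python
import numpy as np

EbNo_dB = 5.0
for M in (2, 4, 16, 256):                   # message sizes considered
    k = int(np.log2(M))                     # information bits per message
    for R in (1/2, 1/3, 1/4):               # target code rates
        n = int(round(k / R))               # channel uses needed to reach rate R
        beta = 1.0 / (2 * R * 10 ** (EbNo_dB / 10))   # AWGN noise variance per dimension
        print(f"M={M:3d}  R={R:.2f}  k={k}  n={n:2d}  beta={beta:.3f}")
```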
4.2.1 Implementation
The autoencoder model was implemented, trained and tested in Keras with TensorFlow as the backend, similarly to Section 3.2, and the model was trained end-to-end over the stochastic channel model using SGD with the Adam optimizer and a learning rate of 0.001. The same code rate values were achieved by implementing autoencoder models for different message sizes M = {2, 4, 16, 256} and setting the number of channel uses (n) accordingly. Table 4.1 shows the model parameters for the different simulations that were performed. For each model, the energy for transmitting a message was kept equal in the autoencoder model and in the baseline system. Each model was trained over 50 epochs with a mini-batch size of 2000 using a training set of 1,000,000 randomly generated messages. For model training, Eb /N0 = 5 dB was used. Testing of the trained models was performed with 1,000,000 different messages over the 0 dB to 10 dB Eb /N0 range, comparing the BER performance with the corresponding baseline systems.
Table 4.1. System parameters for autoencoder models and baseline systems
BER Performance
Figures 4.2 - 4.4 show simulated BER performances of different autoencoder models with
R = {1/2, 1/3, 1/4} and their baseline systems with convolutional coding with respective
code rates and BPSK modulation scheme. Selected block length for baseline system is
K = 800 and the constraint length of the convolutional encoder/decoder is taken as 7.
It can be observed that the BER performance of the autoencoder improves as the message size increases. For a given code rate, the M = 2 model has almost the same performance as uncoded BPSK, while the M = 256 model results in much improved BER performance, closer to the baseline. This improvement is achieved because the model has more degrees of freedom and more flexibility for better end-to-end optimization when the message size is large.
[Figure 4.2: BER performance of the R = 1/2 autoencoder models (M = 2, 4, 16, 256) compared with the rate-1/2 convolutional-coded BPSK baseline (soft and hard decision decoding) and uncoded BPSK.]
[Figure 4.3: BER performance of the R = 1/3 autoencoder models (M = 2, 4, 16, 256) compared with the rate-1/3 convolutional-coded BPSK baseline (soft and hard decision decoding) and uncoded BPSK.]
For the M = 2, rate R = 1/2 system, the number of bits per message is k = log2 (2) = 1 and n = 2 channel uses are available to transmit the 1-bit message. For this setup, the best possible signal formulation in each channel use is to maximize the distance between the two message constellation points, which is similar to BPSK modulation. Since the autoencoder model dimensions are determined by the parameters M and R, low M values do not result in much coding gain, as the non-linearities added by the model during the learning process are limited by the layer dimensions. Increasing the message size increases the layer dimensions and, for the same rate R, the model has more degrees of freedom in terms of learnable parameters which can be optimized to minimize the end-to-end message transmission error. For example, for the M = 256, R = 1/2 system, there are 16 channel uses to transmit 256 different messages, which gives more flexibility than the earlier scenario where 2 messages are transmitted in 2 channel uses. Thus, when M = 256 and R = 1/2, the model has been able to learn transmit symbols with a channel coding gain, as expected, which can be observed from the BER plots. Table 4.2 compares the number of learnable parameters in each layer in the above two scenarios, which helps us understand how the model learning capacity increases with increasing message size.
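A quick calculation illustrates this growth in learnable parameters; the layer widths below follow the dense-layer sketch of Section 3.2 and are assumptions rather than the exact dimensions listed in Table 4.2.

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out              # weights plus biases of a dense layer

for M, n in ((2, 2), (256, 16)):             # the two R = 1/2 scenarios discussed above
    tx = dense_params(M, M) + dense_params(M, 2 * n)    # transmitter dense layers
    rx = dense_params(2 * n, M) + dense_params(M, M)    # receiver dense layers
    print(f"M = {M:3d}, n = {n:2d}: {tx + rx} learnable parameters")
```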
Even though the autoencoder BER performance is always worse than that of soft decision CC, it can be observed that the autoencoder has performance comparable to hard decision CC, especially when the code rate is high. For R = 1/2, the autoencoder with M = 256 is better than hard decision CC in the low Eb /N0 range from 0 dB to 5 dB, and it is only around 1 dB worse than hard decision CC at a BER of 10^-5.
[Figure 4.4: BER performance of the R = 1/4 autoencoder models (M = 2, 4, 16, 256) compared with the rate-1/4 convolutional-coded BPSK baseline (soft and hard decision decoding) and uncoded BPSK.]
Figure 4.5 shows the message error rate performance of the different autoencoder models with R = {1/2, 1/3, 1/4} and M = 256. We can observe that, for the same message size, the three models with different rates result in almost the same MER, and the models have acceptable MER performance, with a 10^-5 error rate at 7 dB.
Figure 4.5. MER performance of different autoencoder models with R = {1/2, 1/3, 1/4}
and M = 256.
Figures 4.6 and 4.7 compare the autoencoder BER performance for R = {1/2, 1/4} when different block lengths are used in the baseline system with hard decision CC. It can be observed that for the R = 1/2 system, the baseline BER performance is best for mid-range block lengths, where K is 400 - 800 bits. The performance degrades when the block size is as low as 200 bits or as high as 2000 - 4000 bits, reducing the gap between the autoencoder and baseline BER performance. For R = 1/4 hard decision CC, the block size has little effect on the BER. On the other hand, the autoencoder performance is independent of the block length K, since for a given model with message size M its input size is k = log2 (M) bits, and the K-bit block is divided into k-bit sub-blocks and fed to the system.
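A small sketch of this block handling, with assumed example sizes (and K assumed divisible by k for simplicity), is given below.

```python
import numpy as np

K, k = 800, 8                                  # information block length and bits per message
block = np.random.randint(0, 2, size=K)        # K-bit information block
sub_blocks = block.reshape(-1, k)              # each row is one k-bit autoencoder input
```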
Thus, from the results we obtained, autoencoder models would be most effective in very high or very low input block length scenarios for systems with higher code rates, where they have performance comparable to the baseline.
Figure 4.6. R = 1/2 system BER performance comparison for different block lengths:
K = {200, 400, 800, 1600, 2000, 4000}.
Figure 4.7. R = 1/4 system BER performance comparison for different block lengths:
K = {200, 400, 800, 1600, 2000, 4000}
Learned Constellations
Figures 4.8 and 4.9 show the learned constellations for different systems we tested for the same code rate of R = 1/2. As in Chapter 3, when mapping the 2n-dimensional output of the encoder model to the n-dimensional complex-valued vector x, the odd-indexed and even-indexed elements of the output are taken as the I and Q components respectively. In the scatter plots, the I and Q values are plotted on the x- and y-axes respectively. The M = 4, R = 1/2 system uses 4 symbols to transmit a single message, and Figure 4.8 shows the signal points in all 4 symbols. It can be observed that the model has learned unique constellation points for the 4 messages over the four symbols in order to minimize the symbol estimation error at the receiver. The M = 16, R = 1/2 system uses 8 symbols to transmit a single message, and the learned signal points for each of the 8 symbols are shown in Figure 4.9.
From the constellation diagrams we can observe that the autoencoder system does not
have a fixed constellation as in the equivalent BPSK modulation scheme. In a conventional
communications system, output from the channel coding block is binary valued and each
bit in the coded block is mapped to the BPSK constellation accordingly. Thus, the
signal transmission in each channel use is independent of others and each signal carries
independent bit information. In contrast, in the autoencoder, the transmit signals are
temporally correlated with each other, as the n signals in n channel uses transmit the message as a whole. In the autoencoder implementation, the model uses the available number of channel uses (or symbols) per transmit message and learns the optimum I and Q signal values for each channel use, so that the message is transmitted with minimum reconstruction error at the receiver. This approach results in learning a joint coding and modulation scheme that utilizes the available channel uses for a given message, depending on the system parameters (the number of input bits per message and the number of channel uses per message), in order to achieve the maximum possible tolerance to the distortions caused by the noise added in the channel. Thus, even for the same rate R (R = 1/2 in this case), changing the message size results in different constellations, since the model dimensions and the learned parameters differ for different message sizes.
Figure 4.8. Scatter plots of learned constellations for M = 4, R = 1/2 system. 4 messages
are shown using 4 different markers in the plot.
Figure 4.9. Scatter plots of learned constellations for M = 16, R = 1/2 system. 3
different messages are shown using 3 different markers in the plot.
4.3.1 Implementation
Table 4.3. Layout of the autoencoder model equivalent to coded systems with higher
order modulations
[Figure: the model takes a k-bit binary input vector through an input layer, dense layers and a normalization layer at the transmitter, a noise layer representing the channel, and dense layers with a sigmoid output at the receiver, followed by a comparator (1 if the output exceeds the threshold, 0 otherwise) producing the k output bits.]
Figure 4.10. Autoencoder model implementation equivalent to coded systems with higher
order modulations.
Since the input to the model is a binary vector and we expect a reconstruction of the input vector at the output of the autoencoder, it is essential to have an output layer which produces values of 0 and 1. Thus, we have implemented a fully connected output layer with the sigmoid activation function (whose outputs lie in the range (0, 1)), together with the binary cross-entropy loss function for model training; end-to-end optimization of the model to minimize this loss drives each of the k bits of the output vector closer to either 1 or 0. After model training, the autoencoder output can be fed to a simple comparator module to produce the binary outputs, as shown in Figure 4.10.
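The following is a minimal Keras sketch of such a bit-vector autoencoder with a sigmoid output, binary cross-entropy loss and a comparator; the layer widths and the example parameter values (M = 256, R = 1/2, 16-QAM) are our own illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

k = 8                                     # information bits per message (M = 256)
R, k_mod = 0.5, 4                         # code rate and bits per modulation symbol (16-QAM)
n = int(k / (R * k_mod))                  # channel uses per message (4 here)
EbNo_dB = 5.0
noise_std = np.sqrt(1.0 / (2 * R * k_mod * 10 ** (EbNo_dB / 10)))

bits_in = layers.Input(shape=(k,))                        # k-bit binary input vector
x = layers.Dense(2 ** k, activation='relu')(bits_in)      # transmitter dense layers
x = layers.Dense(2 * n, activation='linear')(x)           # 2n real values = n complex symbols
x = layers.Lambda(lambda v: np.sqrt(n) * tf.math.l2_normalize(v, axis=1))(x)        # normalization layer
y = layers.Lambda(lambda v: v + tf.random.normal(tf.shape(v), stddev=noise_std))(x) # noise layer
r = layers.Dense(2 ** k, activation='relu')(y)            # receiver dense layers
bits_out = layers.Dense(k, activation='sigmoid')(r)       # per-bit probabilities in (0, 1)

coded_ae = Model(bits_in, bits_out)
coded_ae.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss='binary_crossentropy')

def comparator(probs, threshold=0.5):
    """Hard decisions on the sigmoid outputs: 1 if above the threshold, 0 otherwise."""
    return (probs > threshold).astype(int)
```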
The autoencoder is trained end-to-end over the stochastic channel model using SGD with the Adam optimizer and a learning rate of 0.001. As in the earlier simulations, model training and testing were implemented in Keras with TensorFlow. Different models were trained for different message sizes (M = 16, 64, 256, 4096 etc.), code rates and modulation schemes such as QPSK and 16-QAM. The AWGN channel noise variance is given by β = (2R kmod Eb /N0)^-1, and for model training the channel is represented by an additive noise layer with fixed variance β. Each model was trained over 100 epochs with batch size = 1000 on a training set of 1,000,000 randomly generated messages. Eb /N0 = 5 dB was used for model training. Testing of the trained models was performed with 1,000,000 different random messages over the 0 dB to 10 dB Eb /N0 range, and their BER performance has been compared with the corresponding
baseline systems. Table 4.4 below summarises the simulation parameters on which we
have tested the models.
Table 4.4. System parameters for autoencoder models and baseline systems
Even though an extensive search for the optimum batch size and number of epochs was not carried out, batch size = 1000 and epochs = 100 were observed to give better results in some initial simulations with different configuration settings, and hence those values were used when training the models. As the results from Chapter 3 showed better BER performance when model training was done at Eb /N0 = 5 dB, the same Eb /N0 value was used for training the new autoencoder models as well.
BER Performance
Figure 4.11 shows the BER performance comparison between the autoencoder and the baseline system for R = 1/2 with 16-QAM modulation. The selected block length for the baseline system is K = 800 and the constraint length of the convolutional encoder/decoder is 7.
Figure 4.11. BER performance for baseline and autoencoder for R = 1/2, 16-QAM
system.
The autoencoder BER performance is better than that of the baseline convolutional coded system with hard decision decoding over the full Eb /N0 range considered. However, it can be noticed that the difference in BER decreases with increasing Eb /N0, with almost equal performance at Eb /N0 = 10 dB. The baseline CC implementation with soft decision decoding is better than the autoencoder at higher Eb /N0 values. However, while the soft decision CC BER performance is worse than uncoded 16-QAM at lower Eb /N0 values, the autoencoder has better BER performance than both soft decision CC and uncoded 16-QAM in the 0 dB to 4 dB Eb /N0 range.
The BLER comparison between the autoencoder and baseline systems is shown in Figure 4.12. It can be observed that the autoencoder BLER performance is worse than that of the baseline. This can be explained by the fact that the optimization criterion for the autoencoder was not the BLER but the message error rate (MER), or the BER of each transmitted message. Figure 4.13 shows the MER performance over the considered Eb /N0 range, and we can observe that it is acceptable, with less than 10^-4 error at 10 dB. The autoencoder based system does not require large input block sizes to operate, as the input to the model is k bits at a time. Thus, it can achieve acceptable BER and MER performance, as shown, with only k bits (k = 8 in this case), which is a very small block size compared to conventional systems that typically operate with blocks of hundreds or thousands of bits. Such a system would be advantageous for low latency and low throughput communications, as short message transmission can be achieved with acceptable error performance and with less processing complexity and processing delay than in conventional systems.
Figure 4.12. BLER performance for baseline and autoencoder for R = 1/2, 16-QAM
system.
Figure 4.13. MER performance for M = 256, R = 1/2, Mmod = 16 autoencoder model.
Figures 4.14 and 4.15 compare the hard decision CC and soft decision CC performances, respectively, for different block lengths, along with the autoencoder performance. The autoencoder performance is independent of the block length K, as its input size is k bits and the K-bit block is divided into k-bit sub-blocks and fed to the system. The autoencoder is better than hard decision CC over the full Eb /N0 range for all the block sizes checked, and is around 3 dB worse than soft decision CC at a BER of 10^-5.
Figure 4.14. BER performance for the baseline (hard decision CC) and autoencoder for the R = 1/2, 16-QAM system with different block lengths: K = {200, 400, 800, 1600, 2000, 4000}.
Figure 4.15. BER performance for the baseline (soft decision CC) and autoencoder for the R = 1/2, 16-QAM system with different block lengths: K = {200, 400, 800, 1600, 2000, 4000}.
Figure 4.16. BER performance for baseline and autoencoder for R = 1/2, QPSK system.
Autoencoder models are implemented with different message sizes M = {16, 64, 256}.
We tried training the M = 256, R = 1/2, Mmod = 4 model at different Eb /N0 values, and Figure 4.17 illustrates the BER performance of the models trained at Eb /N0 = {0, 5, 8} dB. It was observed that the model trained at Eb /N0 = 2 dB has much improved BER performance in the low Eb /N0 range. It is around 1.5 dB better than hard decision CC in the 0 to 4 dB Eb /N0 range, and only about 1 dB worse than soft decision CC in the same range. It is interesting to see that training the model at a very low Eb /N0 value has resulted in learning an optimum transmission mechanism to overcome the high distortions caused by the channel in the low Eb /N0 range; that is, the learned signalling strategy is more robust in the low Eb /N0 range. However, it can be seen that this learned transmission mechanism is not suitable for higher Eb /N0 values, as its performance there is worse even than the uncoded case. This result shows the possibility of training a deep learning model to be optimal for a specific Eb /N0 range. Thus, instead of having a single model with a fixed transmitter-receiver mechanism to suit the full Eb /N0 range, it might be possible to develop multiple models with different transmitter-receiver arrangements to suit different operating environments according to the Eb /N0. Having the flexibility to design multiple such systems which operate under the same system parameters (i.e. the same R and the same modulation order Mmod) can be noted as an advantage of deep learning based communications systems, since conventional systems generally have fixed setups.
Figure 4.17. BER performance for the baseline and for autoencoder models trained at different Eb /N0 values, for the R = 1/2, QPSK system.
In [38], the authors have compared the decoder complexities of different candidate channel coding algorithms for URLLC, and the Viterbi algorithm used in the decoder of convolutional codes has a computational complexity of 4·R·N·2^m, where R, N and m denote the code rate, code block length and memory order respectively.
For a neural network, if there are M neurons in a hidden layer and N inputs to that layer, there are N·M multiplications, M separate additions over N + 1 terms, and M applications of the transfer function f(·). Thus, the number of mathematical operations depends on the number of layers in the neural network and the dimensions of each layer. Table 4.5 below summarises the number of additions, multiplications and transfer function applications in each layer of the autoencoder model we implemented in Section 4.3. In practice, however, neural networks are generally implemented on parallel processing architectures, since each neuron in a layer depends only on the inputs from the previous layer, the learned parameters (or weights), and the application of the transfer function. Thus, when considering a parallel processing implementation, the processing complexity in each layer for a single parallel path is just N multiplications, N + 1 additions and a single application of the transfer function. The Viterbi decoder and the autoencoder have the same order of processing complexity when the parallel implementation of the autoencoder is not considered. Therefore, with a parallel processing implementation, the autoencoder has a very low processing complexity compared to the conventional system, particularly given that the autoencoder caters for end-to-end processing including both transmitter- and receiver-side processing. Thus, autoencoder based systems can be implemented with very low processing complexity, and hence lower processing delays than in conventional systems can be achieved, which is another advantage when considering the implementation of low latency systems.
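As a rough back-of-the-envelope comparison of these operation counts, the sketch below evaluates the Viterbi figure 4·R·N·2^m and the sequential per-layer counts for a small feed-forward network; the parameter values and layer widths are illustrative assumptions only.

```python
# Viterbi decoder complexity for an assumed rate-1/2 code, block length 800, memory order 6
R, N_block, m = 0.5, 800, 6
viterbi_ops = 4 * R * N_block * 2 ** m

def layer_ops(n_in, n_out):
    mults = n_in * n_out              # N·M multiplications
    adds = n_out * (n_in + 1)         # M additions over N + 1 terms
    activations = n_out               # M transfer-function applications
    return mults + adds + activations

# Assumed autoencoder layer widths: 8 -> 256 -> 8 (transmitter) -> 256 -> 8 (receiver)
widths = [8, 256, 8, 256, 8]
sequential_nn_ops = sum(layer_ops(a, b) for a, b in zip(widths[:-1], widths[1:]))
print(viterbi_ops, sequential_nn_ops)
```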
The latest release (release 15) of the cellular standard in the 3rd Generation Partnership
Project (3GPP) has announced the specifications for the 5G new radio (NR) air interface
[39]. Compared to fourth generation (4G) long-term evolution (LTE), in 5G NR, two
new channel coding techniques have been adopted, for data channels and control channels
respectively. Specifically, low density parity check (LDPC) codes are to replace turbo
codes used in 4G LTE for data channels and polar codes are to replace tail biting
convolutional codes (TBCCs) for control channels [39]. When considering the modulation
schemes, BPSK, QPSK, 16-QAM, 64-QAM and 256-QAM are adopted for 5G NR [40].
The performance of different channel coding schemes for 5G with modulation schemes such as QPSK and 16-QAM has been investigated in [41] and [42]. Comparing their results with the results we have obtained, we can observe that LDPC and polar codes have better BER and BLER performance than the autoencoder based systems we have implemented, and the autoencoder models need improvements if they are to be considered suitable alternatives to the proposed 5G implementations. When selecting physical layer implementations, processing complexity also needs to be considered in order to understand the advantages and disadvantages of autoencoder based and conventional systems.
models were observed to have performance comparable to hard decision decoding with BPSK, with less than 1 dB difference. Autoencoder models were implemented with different message sizes, and it was observed that increasing the message size resulted in better BER performance, due to the increased degrees of freedom and flexibility for learning introduced by the larger message size. In contrast to the conventional system with separate coding and modulation blocks, the autoencoder system learns a joint coding and modulation scheme that best fits the channel. Here too, the learning capacity of the model is notable, as a single model trained at Eb /N0 = 5 dB learned transmission mechanisms suited to the full 0-10 dB Eb /N0 range.
To design an equivalent system to conventional coded systems with higher order
modulation schemes, a new autoencoder model was proposed which incorporated
the conventional system parameters such as coding rate and modulation order etc.
Simulations showed that the proposed autoencoder model is capable of achieving
comparable performance to the baseline system in several instances. For the R = 1/2, 16-QAM scenario, the equivalent autoencoder model resulted in better BER than hard decision CC over the full 0-10 dB Eb /N0 range, while it was better than soft decision CC in the low Eb /N0 range between 0 and 4 dB. For the R = 1/2, QPSK scenario, training the model at 5 dB did not give the expected performance. However, it was observed that training the model at 2 dB resulted in a model achieving BER performance even better than soft decision CC in the low Eb /N0 range between 0 and 4 dB. This shows the possibility of implementing different, flexible transmission strategies based on the operating conditions (channel condition, Eb /N0 etc.) instead of a single fixed model. This flexibility of the DL based approach, achieved by exploiting its learning capability, can be stated as an advantage over existing conventional systems, which use a fixed communication mechanism in all instances.
The processing complexity of the autoencoder based systems was also analysed in comparison to the decoder complexity of the conventional systems, which is considered a computationally intensive task among the processing blocks. The parallel architecture of DL models enables fast processing of information compared to conventional systems. Also, achieving short block length transmission with acceptable error performance shows the potential of DL based systems especially for low latency and low throughput applications.
In conclusion, comparable BER performance, lower processing complexity and low latency processing due to the inherent parallel processing architecture, a flexible structure and high learning capacity are identified as advantages of the autoencoder based systems, showing their potential and feasibility as an alternative to conventional communications systems.
Only AWGN channel performance is compared within the current scope of the thesis, and the autoencoder model should be extended to other fading channels to analyse its performance in fading scenarios as well. In this study we have assumed an ideal communications system with perfect timing and both carrier-phase and frequency synchronization. Conventional systems are well proven to have acceptable performance in such ideal settings, since the underlying mathematical models can be well explained there compared to practical, non-ideal settings. Being able to implement DL based systems with performance comparable to conventional approaches in such perfect settings shows the potential of DL, whereas the real strength of DL could be exploited more in non-ideal scenarios, since DL is generally known to perform well in situations where it is difficult to capture the underlying structures and patterns of the input-output data using exact mathematical models. Thus, it is expected that DL based approaches will give better results when considering non-linear, non-ideal channel conditions and practical systems with imperfections such as timing offsets and carrier-phase and frequency synchronization issues. Therefore, further research can be carried out evaluating the performance of DL based systems in such scenarios.
It would also be important to investigate how the autoencoder model can be extended to implement DL based systems equivalent to conventional coded systems with large block sizes and very high order modulations, with comparable BLER and BER performance.
In the models which we have studied, block size is not taken into consideration as
the autoencoder model is formulated based on short length messages. We can find
mechanisms to incorporate the block structure into the autoencoder model. We can
also consider comparing the performance of the autoencoder based systems with 5G
channel coding implementations like polar and LDPC codes.
6 REFERENCES
[1] Ericsson (2017) 5G systems - Enabling the transformation of industry and society.
White Paper UEN 284 23-3251 rev B, Ericsson.
[2] 3GPP (2018) Study on scenarios and requirements for next generation access
technologies. TR 38.913, v15.0.0, 3rd Generation Partnership Project (3GPP).
[3] Rappaport T.S. (2002) Wireless Communications: Principles and Practice. USA:
Prentice-Hall; 2nd edition.
[5] O’Shea T. & Hoydis J. (2017) An introduction to deep learning for the physical layer.
IEEE Transactions on Cognitive Communications and Networking 3, pp. 563–575.
[6] O’Shea T.J., Karra K. & Clancy T.C. (2016) Learning to communicate:
Channel auto-encoders, domain specific regularizers, and attention. In: 2016
IEEE International Symposium on Signal Processing and Information Technology
(ISSPIT), pp. 223–228.
[7] Fehske A., Gaeddert J. & Reed J.H. (2005) A new approach to signal classification
using spectral correlation and neural networks. In: First IEEE International
Symposium on New Frontiers in Dynamic Spectrum Access Networks, 2005.
DySPAN 2005., pp. 144–150.
[8] Nandi A. & Azzouz E. (1997) Modulation recognition using artificial neural
networks. Signal Processing 56, pp. 165 – 175.
[9] Bruck J. & Blaum M. (1989) Neural networks, error-correcting codes, and
polynomials over the binary n-cube. IEEE Transactions on Information Theory 35,
pp. 976–987.
[10] Ortuno I., Ortuno M. & Delgado J.A. (1992) Error correcting neural networks for
channels with Gaussian noise. In: [Proceedings 1992] IJCNN International Joint
Conference on Neural Networks, vol. 4, pp. 295–300.
[11] Ibukahla M., Sombria J., Castanie F. & Bershad N.J. (1997) Neural networks for
modeling nonlinear memoryless communication channels. IEEE Transactions on
Communications 45, pp. 768–771.
[12] Wen C., Jin S., Wong K., Chen J. & Ting P. (2015) Channel estimation for massive
MIMO using Gaussian-mixture Bayesian learning. IEEE Transactions on Wireless
Communications 14, pp. 1356–1368.
[13] Chen S., Gibson G., Cowan C. & Grant P. (1990) Adaptive equalization of finite
non-linear channels using multilayer perceptrons. Signal Processing 20, pp. 107 –
119.
[15] Wang T., Wen C., Wang H., Gao F., Jiang T. & Jin S. (2017) Deep learning for
wireless physical layer: Opportunities and challenges. China Communications 14,
pp. 92–111.
[16] Hornik K., Stinchcombe M. & White H. (1989) Multilayer feedforward networks are
universal approximators. Neural Networks 2, pp. 359 – 366.
[18] Paszke A., Gross S., Chintala S., Chanan G., Yang E., DeVito Z., Lin Z., Desmaison
A., Antiga L. & Lerer A. (2017) Automatic differentiation in PyTorch. In: NIPS
Autodiff Workshop.
[19] Jia Y., Shelhamer E., Donahue J., Karayev S., Long J., Girshick R.B., Guadarrama
S. & Darrell T. (2014) Caffe: Convolutional architecture for fast feature embedding.
In: ACM Multimedia.
[20] Zhang C., Patras P. & Haddadi H. (2019) Deep learning in mobile and wireless
networking: A survey. IEEE Communications Surveys Tutorials , pp. 1–1.
[21] Goodfellow I., Bengio Y. & Courville A. (2016) Deep Learning. The MIT Press.
[22] Rumelhart D., Hinton G. & Williams R. (1986) Learning representations by back-
propagating errors. Nature 323, pp. 533–536.
[24] Nachmani E., Be’ery Y. & Burshtein D. (2016) Learning to decode linear codes
using deep learning. In: 2016 54th Annual Allerton Conference on Communication,
Control, and Computing (Allerton), pp. 341–346.
[25] Nachmani E., Marciano E., Burshtein D. & Be’ery Y. (2017) RNN decoding of linear
block codes. CoRR abs/1702.07560. URL: http://arxiv.org/abs/1702.07560.
[26] Gruber T., Cammerer S., Hoydis J. & ten Brink S. (2017) On deep learning-based
channel decoding. In: 2017 51st Annual Conference on Information Sciences and
Systems (CISS), pp. 1–6.
[27] Cammerer S., Gruber T., Hoydis J. & ten Brink S. (2017) Scaling deep learning-
based decoding of polar codes via partitioning. CoRR abs/1702.06901. URL:
http://arxiv.org/abs/1702.06901.
[28] Nachmani E., Marciano E., Lugosch L., Gross W.J., Burshtein D. & Be’ery Y.
(2018) Deep learning methods for improved decoding of linear codes. IEEE Journal
of Selected Topics in Signal Processing 12, pp. 119–131.
[29] Liang F., Shen C. & Wu F. (2018) An iterative BP-CNN architecture for channel
decoding. IEEE Journal of Selected Topics in Signal Processing 12, pp. 144–159.
[30] Samuel N., Diskin T. & Wiesel A. (2017) Deep MIMO detection. In: 2017 IEEE 18th
International Workshop on Signal Processing Advances in Wireless Communications
(SPAWC), pp. 1–5.
[31] Farsad N. & Goldsmith A.J. (2017) Detection algorithms for communication systems
using deep learning. CoRR abs/1705.08044. URL: http://arxiv.org/abs/1705.08044.
[32] Ye H., Li G.Y. & Juang B. (2018) Power of deep learning for channel estimation
and signal detection in OFDM systems. IEEE Wireless Communications Letters 7, pp.
114–117.
[33] Neumann D., Wiese T. & Utschick W. (2018) Learning the MMSE channel estimator.
IEEE Transactions on Signal Processing 66, pp. 2905–2917.
[34] Nandi A.K. & Azzouz E.E. (1998) Algorithms for automatic modulation recognition
of communication signals. IEEE Transactions on Communications 46, pp. 431–436.
[35] Dörner S., Cammerer S., Hoydis J. & ten Brink S. (2018) Deep learning based
communication over the air. IEEE Journal of Selected Topics in Signal Processing
12, pp. 132–143.
[36] O’Shea T.J., Roy T., West N. & Hilburn B.C. (2018) Physical layer communications
system design over-the-air using adversarial networks. In: 2018 26th European Signal
Processing Conference (EUSIPCO), pp. 529–532.
[37] MATLAB (2019) version 9.6.0 (R2019a). The MathWorks Inc., Natick,
Massachusetts.
[38] Sybis M., Wesolowski K., Jayasinghe K., Venkatasubramanian V. & Vukadinovic V.
(2016) Channel coding for ultra-reliable low-latency communication in 5G systems.
In: 2016 IEEE 84th Vehicular Technology Conference (VTC-Fall), pp. 1–5.
[39] 3GPP (2018) Multiplexing and channel coding. TS 38.212, v15.2.0, 3rd Generation
Partnership Project (3GPP).
[40] 3GPP (2018) Physical channels and modulation. TS 38.211, v15.2.0, 3rd Generation
Partnership Project (3GPP).
[41] Hui D., Sandberg S., Blankenship Y., Andersson M. & Grosjean L. (2018) Channel
coding in 5G New Radio: A tutorial overview and performance comparison with 4G
LTE. IEEE Vehicular Technology Magazine 13, pp. 60–69.
[42] Gamage H., Rajatheva N. & Latva-aho M. (2017) Channel coding for enhanced
mobile broadband communication in 5G systems. In: 2017 European Conference on
Networks and Communications (EuCNC), pp. 1–6.
7 APPENDICES
Deep feedforward networks, also called feedforward neural networks (feedforward NNs)
or multilayer perceptrons (MLPs), are the most basic type of DL model. The objective of a
feedforward network is to approximate some function f ∗ . For example, for a classifier,
y = f ∗ (x) maps an input x to a category y. A feedforward network defines a mapping
y = f (x; θ) and learns the value of the parameters θ that result in the best function
approximation [21].
These are called feedforward networks since the information flows from the input to the
output through the intermediate computations used to define f . In feedforward NNs
there are no feedback connections in which outputs of the model are fed back into the
model itself. When feedforward NNs are extended to include feedback connections, they
are called recurrent neural networks.
Feedforward NNs are called networks because they are typically formed by composing
many different functions together. That is, the function f which maps the input x to
the output y may consist of several functions connected in a chain, forming the chained
network structure that realizes the input-output mapping. For example, there may be
three functions f1, f2 and f3 connected in a chain to form f(x) = f3(f2(f1(x))). Here, f1
is called the first layer, f2 is called the second layer, and so on.
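To make the chain structure concrete, the following minimal NumPy sketch (illustrative
only; the weights, layer sizes and activations are arbitrary and not those used in this
work) implements a three-layer mapping as a composition of functions:

# Illustrative sketch: a three-layer feedforward mapping f(x) = f3(f2(f1(x)))
# built as a chain of functions, each acting on the previous layer's output.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # first layer weights
W2 = rng.normal(size=(8, 8))   # second layer weights
W3 = rng.normal(size=(2, 8))   # output layer weights

f1 = lambda x: np.maximum(W1 @ x, 0.0)   # first layer (ReLU)
f2 = lambda x: np.maximum(W2 @ x, 0.0)   # second layer (ReLU)
f3 = lambda x: W3 @ x                    # output layer (linear)

x = rng.normal(size=4)
y = f3(f2(f1(x)))                        # the chained mapping f(x)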
The overall length of the chain gives the depth of the model, which has given rise
to the term “deep learning”. The first layer of the feedforward network is called
the input layer, and the final layer is called the output layer. The layers in between
the input and the output layers are called hidden layers, since their behaviour is not
directly visible from the outside and the training data does not specify the desired
output for these layers. The structure of a typical fully connected feedforward NN is shown
in Figure Appendix 1.1.
During the network training process, the learning algorithm decides the best
implementation of these hidden layers in order to approximate the function f ∗ in an
optimum manner. An MLP consists of at least three layers: an input layer, a hidden
layer and an output layer. Usually, MLPs with more than one hidden layer are regarded
as DL structures. Thus, following the above definition, a feedforward NN with L layers
describes a mapping f(x_0; θ): R^{N_0} → R^{N_L} of an input vector x_0 ∈ R^{N_0} to
an output vector x_L ∈ R^{N_L} through L iterative processing steps as follows:

x_l = f_l(x_{l−1}; θ_l),   l = 1, ..., L,
where f_l(x_{l−1}; θ_l): R^{N_{l−1}} → R^{N_l} is the mapping performed by the lth layer. This
mapping depends on the output vector x_{l−1} of the previous layer, which is fed as
the input to the lth layer, and on the set of parameters θ_l. Furthermore, f_l can also
be a function of some random variables, which makes the mapping stochastic. θ =
{θ_1, θ_2, ..., θ_L} denotes the set of all parameters of the network. The lth layer of the
network is called dense or fully-connected if f_l(x_{l−1}; θ_l) has the form

f_l(x_{l−1}; θ_l) = σ(W_l x_{l−1} + b_l),   (3)

where W_l ∈ R^{N_l×N_{l−1}} is a weight matrix, b_l ∈ R^{N_l} is a bias vector and θ_l = {W_l, b_l}.
The activation function σ(·) in (3) is applied individually to each element of its input
vector, i.e., [σ(u)]_i = σ(u_i). The activation function adds non-linearity to the network,
which makes the network more powerful and enables it to learn complex structures from
data and to represent arbitrary non-linear functional mappings between inputs and
outputs. Without this non-linearity, there would not be much of an advantage in having
a stacked multiple layer structure as explained earlier. Hence, using a non-linear
activation it is possible to generate non-linear mappings from inputs to outputs.
Commonly used activation functions are listed in Table Appendix 1.2 [5]. Typically, in
classification problems, the SoftMax layer is used as the output layer of the network.
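As an illustration of stacking such dense layers, the following minimal sketch, assuming
the Keras API (the layer sizes and activations are arbitrary examples, not the
configurations used in this thesis), builds a fully connected feedforward NN in which
every layer implements σ(W_l x_{l−1} + b_l):

# Illustrative fully connected feedforward NN with the Keras API; each Dense
# layer implements sigma(W x + b) for a chosen activation sigma.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(16,)),  # hidden layer 1
    Dense(64, activation="relu"),                     # hidden layer 2
    Dense(10, activation="softmax"),                  # SoftMax output layer
])
model.summary()   # prints layer shapes and parameter counts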
Table Appendix 1.2. Different types of activation functions used in neural networks
In the model training process, labelled data, i.e., a set of input and output vector
pairs (x_{0,i}, x⋆_{L,i}), i = 1, ..., S, is used to minimize the loss by adjusting the parameter
set θ. Here, x⋆_{L,i} is the expected output of the NN when x_{0,i} is used as the input.
The following equation gives the loss of the network, which is minimized during the
training process:
L(θ) = (1/S) Σ_{i=1}^{S} l(x⋆_{L,i}, x_{L,i}).   (4)
Here, l(u, v): R^{N_L} × R^{N_L} → R is the loss function and x_{L,i} is the actual output of
the NN when x_{0,i} is given as the input. Commonly used loss functions include the mean
squared error (MSE), binary cross-entropy and categorical cross-entropy; different loss
functions are listed in Table Appendix 1.3. Furthermore, to train a model for
a specific scenario, the loss function can be revised by adding different norms (e.g. L1,
L2) of the parameters or activations to the loss function. The stochastic gradient descent
(SGD) algorithm is one of the most widely applied algorithms for obtaining the optimum
parameter set θ. Creating and training deep NNs with any of the above described settings
can be done using the currently available DL libraries discussed in Section 2.3.
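For illustration only, the following short NumPy sketch evaluates the loss in (4) for a
batch of S labelled samples, with the MSE chosen as the per-sample loss l(·, ·); the data
here are random placeholders rather than outputs of an actual NN.

# Illustrative computation of the loss L(theta) in (4): the average of a
# per-sample loss l(., .) over S labelled examples, here using the MSE.
import numpy as np

def mse(u, v):
    return np.mean((u - v) ** 2)          # per-sample loss l(u, v)

S = 100
x_expected = np.random.randn(S, 10)       # expected outputs x*_{L,i}
x_actual = np.random.randn(S, 10)         # actual NN outputs x_{L,i}

loss = np.mean([mse(e, a) for e, a in zip(x_expected, x_actual)])
print(loss)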
Table Appendix 1.3. Different types of loss functions used in neural networks
An autoencoder consists of an encoder function f, which maps the input x to a code,
and a decoder function g, which produces a reconstruction g(f(x)) of the input. The
network is trained to minimise a loss L(x, g(f(x))), where L is a loss function such as
the MSE which penalizes g(f(x)) for being different from the input x.
Traditionally, autoencoders were used for dimensionality reduction or feature learning.
Autoencoders are closely related to principal component analysis (PCA), since PCA also
tries to reduce the dimensionality of the input data in an unsupervised manner by
minimizing the reconstruction error. However, autoencoders can represent both linear
and non-linear transformations in the encoding and decoding, and thus have more
flexibility, while PCA generally performs a linear transformation. Owing to their network
representation, autoencoders can be layered with multiple hidden layers to form a deep
NN. The layout of an autoencoder NN is shown in Figure Appendix 1.3. Autoencoders
can thus be thought of as a special case of feedforward networks, and may be trained
with all of the same techniques, typically mini-batch gradient descent following gradients
computed by back-propagation.
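As a simple illustration, the following sketch, assuming the Keras functional API (the
dimensions are arbitrary and unrelated to the models used in this thesis), builds a small
undercomplete autoencoder whose encoder compresses the input into a code and whose
decoder reconstructs the input from that code:

# Illustrative undercomplete autoencoder: encoder f maps the input to a
# lower-dimensional code, decoder g reconstructs the input from the code.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(32,))
code = Dense(8, activation="relu")(inp)        # encoder f(x): 32 -> 8
out = Dense(32, activation="linear")(code)     # decoder g(code): 8 -> 32
autoencoder = Model(inp, out)

# Trained to reproduce its input, i.e. to minimise L(x, g(f(x))) with the MSE.
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=64)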
Depending on the application in which the autoencoder is used, there are different types
of autoencoders, such as undercomplete autoencoders, sparse autoencoders, denoising
autoencoders and contractive autoencoders.
Figure: Structure of a generative adversarial network (GAN), in which a generator
produces fake samples from input noise and a discriminator predicts whether a given
sample comes from the real dataset or is a generated sample, yielding the discriminator
loss.
The generator directly produces samples x = g(z; θ^{(g)}), and the discriminator outputs
a probability value d(x; θ^{(d)}), which indicates the probability that x is a real training
sample rather than a fake sample generated by the generator model. Therefore, a
two-player min-max game is introduced between the generator g and the discriminator
d, and the min-max optimization objective can be given as
arg min_g max_d ν(θ^{(g)}, θ^{(d)}) = E_{x∼p_data}[log d(x)] + E_{x∼p_model}[log(1 − d(x))].   (7)

For a conditional GAN, where the generator and the discriminator are conditioned on
some additional information m, the objective becomes

arg min_g max_d ν(θ^{(g)}, θ^{(d)}) = E_{x∼p_data}[log d(x|m)] + E_{x∼p_model}[log(1 − d(x|m))].   (8)
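Purely as a numerical illustration of the objective in (7), the following NumPy sketch
evaluates ν(θ^{(g)}, θ^{(d)}) from hypothetical discriminator outputs for a batch of real
samples and a batch of generated samples (the values are made up, not produced by any
trained model):

# Illustrative evaluation of the min-max objective nu in (7) from example
# discriminator outputs; d_real holds d(x) for real samples, d_fake holds
# d(g(z)) for generated samples.
import numpy as np

d_real = np.array([0.90, 0.80, 0.95])   # d(x) for x ~ p_data
d_fake = np.array([0.10, 0.20, 0.05])   # d(g(z)) for generated samples

value = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
# The discriminator is trained to maximise this value, the generator to minimise it.
print(value)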
Training of NNs involves finding the optimum parameters for each of the layers in the
network which minimize a desired loss function, such as the loss function given in (4) for
a simple feedforward NN. The differentiable architecture of DNNs allows learning the
optimum model parameters which minimize the loss function using gradient descent
(GD) methods through back-propagation, following the fundamental chain rule [22].
Having a large number of hidden layers and neurons results in several other parameters
to be determined and thereby makes the network implementation difficult. Vanishing
gradients, slow convergence of the network and getting stuck in a local minimum are
some of the problems faced during the network training process [15]. The vanishing
gradient problem, where the gradients of the loss function approach zero and make the
network training difficult, is solved by introducing new activation functions such as the
rectified linear unit (ReLU) [15].
A modified version of the classic GD algorithm, known as the stochastic gradient
descent (SGD) algorithm, is widely used in network training to achieve faster convergence
and to reduce the computational complexity. SGD starts with a random initialization of
the parameters θ = θ_0 and then updates them iteratively as

θ_{t+1} = θ_t − η ∇L̃(θ_t),   (9)

where η > 0 is the learning rate and L̃(θ_t) is an approximation of the loss function
computed for a randomly selected mini-batch of training samples S_t ⊂ {1, 2, ..., S} of
size S_t at each iteration, given as
L̃(θ_t) = (1/S_t) Σ_{i∈S_t} l(x⋆_{L,i}, x_{L,i}).   (10)
It is noted that the gradient computation complexity can be significantly reduced by
selecting S_t small compared to S, while the mini-batch gradient still remains a good
approximation of the full gradient [5]. In order to avoid converging to poor local optima
and to further increase the training speed, different adaptive learning rate algorithms
such as Adagrad, RMSProp, Momentum and Adam have been proposed [15].
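For illustration, the following NumPy sketch performs the mini-batch SGD updates of
(9)-(10) for a simple linear model with an MSE loss; the synthetic data, learning rate
and mini-batch size are arbitrary choices made only for this example.

# Illustrative mini-batch SGD: at each iteration a random mini-batch S_t of
# the S training samples approximates the loss gradient, and the parameters
# are updated as theta_{t+1} = theta_t - eta * grad (equation (9)).
import numpy as np

rng = np.random.default_rng(0)
S, N = 1000, 16
X = rng.normal(size=(S, N))
y = X @ rng.normal(size=N) + 0.1 * rng.normal(size=S)   # synthetic labels

theta = np.zeros(N)          # initial parameters theta_0
eta, batch_size = 0.01, 32   # learning rate eta and mini-batch size S_t

for t in range(500):
    St = rng.choice(S, size=batch_size, replace=False)  # mini-batch indices
    err = X[St] @ theta - y[St]
    grad = 2.0 * X[St].T @ err / batch_size             # gradient of the mini-batch MSE loss
    theta = theta - eta * grad                          # parameter update (9)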
Another challenge of network training is that even if the network is trained well and
performs well on the training data, it may give poor performance on the test data due
to overfitting to the training data. To avoid overfitting, different approaches such as
early stopping, regularization and dropout schemes have been proposed, which help to
achieve acceptable performance on both training and test data [15].
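As an example of two of these remedies, the following minimal sketch, assuming the Keras
API (layer sizes and parameter values are arbitrary), adds a Dropout layer to a small
network and uses an EarlyStopping callback that halts training when the validation loss
stops improving:

# Illustrative use of dropout and early stopping to reduce overfitting.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    Dense(64, activation="relu", input_shape=(16,)),
    Dropout(0.5),                      # randomly drops units during training
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[stop])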