Machine Learning in 6G Wireless Communications
IEICE TRANS. COMMUN., VOL.E106–B, NO.2 FEBRUARY 2023
INVITED PAPER Special Section on Emerging Communication Technologies in Conjunction with Main Topics of ICETC 2021
SUMMARY Mobile communication systems are the core not only of the Information and Communication Technology (ICT) infrastructure but also of our social infrastructure. The 5th generation mobile communication system (5G) is already in service and is expected to support various use cases in industry and society. Many companies and research institutes are therefore working on improving the performance of 5G, that is, 5G Enhancement, and on the next generation of mobile communication systems (Beyond 5G (6G)). 6G is expected to meet requirements that are highly demanding even compared with 5G, such as extremely high data rate, extremely large coverage, extremely low latency, extremely low energy consumption, extremely high reliability, and extremely massive connectivity. Artificial intelligence (AI) and machine learning (ML), AI/ML, will play more important roles than ever in 6G wireless communications, given these extreme requirements for a diversity of applications, including new combinations of requirements for new use cases. AI/ML can thus be regarded as essential for 6G wireless communications. This paper introduces some ML techniques and applications in 6G wireless communications, mainly focusing on the physical layer.
key words: artificial intelligence (AI), machine learning (ML), deep learning (DL), neural network (NN), deep neural network (DNN), 6G, deep transfer learning (DTL)

1. Introduction

Digital Transformation (DX), which transforms society, the economy, and industry using digital technology represented by rapidly developing AI, is attracting much attention. Information and Communication Technology (ICT) infrastructure plays an important role in DX. It is no exaggeration to say that mobile communication systems, represented by the 5th generation mobile communication system (5G), are the core of the ICT infrastructure. 5G is expected to support various use cases in industry and society, and it has three functional requirements: enhanced Mobile Broadband (eMBB), Ultra-Reliable and Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). In 5G (New Radio (NR) Release 15), which is currently in service, standardization in 3GPP has focused on eMBB and some URLLC, so mainly best-effort services that emphasize downlink speed have been realized [1]. In the future, services that take advantage of large data uploads and services that guarantee communication quality, particularly for industrial applications, are expected to be required. Therefore, 5G Enhancement is expected to improve uplink performance and to realize communication quality assurance.

Mobile communication systems evolve roughly every 10 years. By 2030, the next generation of mobile communication systems (Beyond 5G (6G)) is expected to be in use and to address various social issues and use cases. As described above, 6G will need to support data traffic that is expected to keep increasing as mobile communication services become more sophisticated and diverse. 6G will also need to meet extremely high performance requirements that support the resolution of social issues and new use cases in the 2030s. In “Beyond 5G Promotion Strategy — Roadmap towards 6G —” [2], the Ministry of Internal Affairs and Communications (MIC) presented three social images for the 2030s, when 6G is expected to be in use: “Inclusive society,” “Sustainable society,” and “Dependable society.”

The year 2030 is also the target year for achieving the Sustainable Development Goals (SDGs) adopted at the United Nations Summit in 2015, and 6G is expected to support the realization of these goals as a social infrastructure. According to the “Beyond 5G Promotion Strategy,” 6G must further upgrade the characteristic functions of 5G, such as eMBB, URLLC, and mMTC. It must also be equipped with four new functions: “ultra low power consumption,” “autonomy,” “scalability,” and “ultra security and resiliency.” In addition to these functions, [1] also lists lower cost (lower cost per bit) and sensing as requirements.

To meet these high requirements, various techniques need to be developed and used in 6G. Several companies and research institutes have issued white papers about B5G and 6G [1]–[8]. These white papers share many common requirements, and those listed in [1] include the following.
• Extreme high data rate/capacity
– Peak data rate > 100 Gbps exploiting new spectrum bands
– > 100× capacity
– Extreme-high uplink capacity
• Extreme low latency
– E2E very low latency < 1 ms
– Always low latency
• Extreme coverage extension
– Gbps coverage everywhere
– New coverage areas, e.g., sky (10000 m), sea

Manuscript received July 8, 2022.
Manuscript revised July 20, 2022.
Manuscript publicized August 10, 2022.
† The author is with the Faculty of Science and Technology, Keio University, Yokohama-shi, 223-8522 Japan.
a) E-mail: ohtsuki@ics.keio.ac.jp
DOI: 10.1587/transcom.2022CEI0002
Copyright © 2023 The Institute of Electronics, Information and Communication Engineers
massive MIMO BP detection using DIP with a DNN-trained scaling factor. In BP detection, at each iteration we create a heatmap of the received signals after interference removal, arranged so that its entries are correlated. Applying DIP to this heatmap reduces residual interference and noise. However, applying DIP changes the variance of the interference and noise components, so we scale the variance to bring it closer to its true value. Because the variance after applying DIP is difficult to calculate theoretically, we train the scaling factors offline using DNN-BP. Scaling the variance improves the reliability of the messages. Figure 4 shows the BER performance versus SNR (dB) in the correlated channel (correlation factor ρ = 0.3), where the modulation scheme is QPSK, the MIMO configuration is 16×16, and the number of BP iterations is 7. Applying DIP improves the BER performance. The proposed method with the trained scaling factor also achieves better BER performance than the method without it.

2.2.2 Pilot Contamination

In massive MIMO, a large number of channels need to be estimated. Because the number of orthogonal pilot signals is limited when the pilot length is limited, the same pilot signals need to be reused in neighboring cells. The degradation of channel estimation performance caused by reusing the same pilot signals is referred to as pilot contamination. In [27], a covariance-aided channel estimation scheme is proposed, in which the MMSE channel estimate is derived. This scheme can remove pilot contamination completely when the covariance matrices satisfy a certain non-overlapping condition. However, this assumption is often impractical.
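To make the non-overlapping condition concrete, the following Python sketch uses a deliberately simplified toy model. It is not the system model of [27]. After the BS correlates the received pilot block with the shared pilot, it observes the sum of the target-cell channel `h1`, the neighbor-cell channel `h2`, and noise. We assume that both channels are described in a beam (angular) domain with diagonal covariances whose supports do not overlap. Under that assumption the MMSE estimate $\mathbf{R}_1(\mathbf{R}_1 + \mathbf{R}_2 + \sigma^2\mathbf{I})^{-1}\mathbf{y}$ reduces to a per-beam gain. The function name, the number of beams, and the noise variance are illustrative assumptions.

```python
import random


def pilot_contamination_nmse(num_beams=8, noise_var=1e-4, seed=0):
    """Toy model (assumed): return (LS NMSE, covariance-aided MMSE NMSE) for the target user."""
    rng = random.Random(seed)
    half = num_beams // 2
    # Assumed diagonal beam-domain covariances with non-overlapping supports:
    # target-cell user on beams [0, half), pilot-sharing neighbor on [half, num_beams).
    r1 = [1.0 if k < half else 0.0 for k in range(num_beams)]
    r2 = [1.0 - r for r in r1]

    def cgauss(var):
        # Circularly symmetric complex Gaussian sample with variance `var`.
        s = (var / 2) ** 0.5
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    h1 = [cgauss(v) for v in r1]
    h2 = [cgauss(v) for v in r2]
    # Same pilot for both users: after correlating with it, the channels add up.
    y = [a + b + cgauss(noise_var) for a, b in zip(h1, h2)]

    h_ls = y  # LS cannot separate h1 from h2
    # MMSE R1 (R1 + R2 + noise_var I)^-1 y; diagonal covariances give a per-beam gain.
    h_mmse = [g1 / (g1 + g2 + noise_var) * yk for g1, g2, yk in zip(r1, r2, y)]

    def nmse(est):
        err = sum(abs(e - h) ** 2 for e, h in zip(est, h1))
        return err / sum(abs(h) ** 2 for h in h1)

    return nmse(h_ls), nmse(h_mmse)


ls_nmse, mmse_nmse = pilot_contamination_nmse()
print(f"LS NMSE: {ls_nmse:.3f}  MMSE NMSE: {mmse_nmse:.2e}")
```

With these disjoint supports, the MMSE error shrinks to zero as the noise variance goes to zero. The LS error stays near ‖h2‖²/‖h1‖² no matter how high the SNR is. Suppose instead that the supports overlapped, for example with `r2` also nonzero on one of the first beams. The gain on that beam would then fall below 1, and part of `h2` would leak into the estimate. This is why the condition in [27] is restrictive in practice.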
DL has recently been expected to improve channel estimation performance in massive MIMO. In [28], DL is integrated into direction-of-arrival (DoA) estimation and channel estimation in massive MIMO systems. In [16], we proposed two DL-aided channel estimation methods to reduce the effects of pilot contamination. One uses a neural network consisting of fully connected layers, and the other uses a CNN. Figure 5 shows the frameworks of the proposed methods. The upper part shows the structure of the neural network-based estimation using fully connected layers, and the lower part shows the CNN-based estimation using convolutional layers. Neural networks, particularly CNNs, can extract features of spatial information from the contaminated signals. It is shown that the former method

Fig. 4 BER performance vs SNR (dB) per receive antenna in the correlated channel (ρ = 0.3) where the modulation scheme is QPSK, 16×16 MIMO, and the number of BP iterations is 7.

Fig. 5 A framework for the proposed methods [16]. The upper part and the lower part show the structure in the neural network-based estimation using the fully connected layers and the CNN-based estimation using the convolutional layers, respectively.
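The data-driven idea behind DL-aided estimators can be illustrated, in a heavily simplified form, by learning the estimator from examples instead of deriving it from known covariances. The Python sketch below assumes a toy beam-domain model of pilot contamination: the target-cell user occupies the first half of the beams, and the pilot-sharing neighbor-cell user occupies the second half. It trains a single complex linear layer `W` so that `W y` approximates the target channel `h1`. The layer has no hidden layers or activations and is trained with the LMS rule (stochastic gradient descent on the squared error). It is not the fully connected or CNN architecture of [16], and all names and parameter values are hypothetical.

```python
import random


def train_and_test(num_beams=8, noise_var=0.01, steps=3000, num_test=500, mu=0.02, seed=1):
    """Return (LS NMSE, learned-layer NMSE) on fresh test samples of the assumed toy model."""
    rng = random.Random(seed)
    half = num_beams // 2

    def cgauss(var):
        # Circularly symmetric complex Gaussian sample with variance `var`.
        s = (var / 2) ** 0.5
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    def sample():
        # Target user on beams [0, half), pilot-sharing neighbor on [half, num_beams).
        h1 = [cgauss(1.0) if k < half else 0j for k in range(num_beams)]
        h2 = [0j if k < half else cgauss(1.0) for k in range(num_beams)]
        y = [a + b + cgauss(noise_var) for a, b in zip(h1, h2)]
        return y, h1

    def apply(w, y):
        return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]

    # One complex linear layer (no bias, no activation), initialized to zero.
    w = [[0j] * num_beams for _ in range(num_beams)]
    for _ in range(steps):
        y, h1 = sample()
        for row, est_i, h_i in zip(w, apply(w, y), h1):
            e = est_i - h_i
            for j, yj in enumerate(y):
                # LMS / SGD step: gradient of |e|^2 w.r.t. conj(W[i][j]) is e * conj(y[j]).
                row[j] -= mu * e * yj.conjugate()

    err_ls = err_learned = power = 0.0
    for _ in range(num_test):
        y, h1 = sample()
        est = apply(w, y)
        err_ls += sum(abs(a - b) ** 2 for a, b in zip(y, h1))
        err_learned += sum(abs(a - b) ** 2 for a, b in zip(est, h1))
        power += sum(abs(b) ** 2 for b in h1)
    return err_ls / power, err_learned / power


ls_nmse, learned_nmse = train_and_test()
print(f"LS NMSE: {ls_nmse:.3f}  learned-layer NMSE: {learned_nmse:.3f}")
```

The layer is never given the channel covariances. It learns from (y, h1) training pairs to pass the beams occupied by the target user and suppress those occupied by the interfering user. Its test NMSE should therefore approach that of the linear MMSE estimator and fall far below that of the LS estimate. The networks in [16] add hidden layers with nonlinear activations, which let them represent estimators beyond a single linear map.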
Transfer learning (TL) is a machine learning method in which a model trained for one task is used as the starting point for a model on a different but related task. TL is a popular technique in DL, for example in computer vision and natural language processing, where training a model from scratch requires large amounts of computation and time. TL is defined in terms of a domain and a task. A domain $\mathcal{D}$ is a pair $\mathcal{D} = \{\mathcal{X}, P(X)\}$, which consists of a feature space $\mathcal{X}$ and a marginal distribution $P(X)$ over the feature space, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$. A task is a pair $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, where $\mathcal{Y}$ is the label space and $f(\cdot)$ is a function that predicts the label $y_i \in \mathcal{Y}$ corresponding to $x_i$. Using these definitions of a domain and a task, TL can be defined as follows [29]:

Transfer learning: Given a source domain $\mathcal{D}_S$, a source task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$, and a target task $\mathcal{T}_T$, the aim of TL is to improve the learning of the target prediction function $f_T(\cdot)$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.

Fig. 6 The system model of the CSI feedback scheme based on DTL [17].

Fig. 7 The NMSE performance versus the number of target data samples where the target channel is CDL-A.
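The following Python sketch is a toy illustration of this definition. It does not reproduce the CSI feedback model of [17]. The two tasks share a three-dimensional feature space but have different prediction functions, so $\mathcal{T}_S \neq \mathcal{T}_T$. As an assumption, a linear model stands in for the DNN. It is first trained on full-rank source data, and its weights then serve as the starting point for fine-tuning on only two target samples. The baseline is a model trained from scratch on the same two samples with the same step budget. The weight values and helper names are hypothetical.

```python
from itertools import product


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def gd_fit(w, data, steps, lr=0.1):
    """Full-batch gradient descent on the mean squared error, starting from weights w."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def nmse(w, data):
    err = sum((predict(w, x) - y) ** 2 for x, y in data)
    return err / sum(y ** 2 for _, y in data)


def labeled(w_true, xs):
    return [(x, predict(w_true, x)) for x in xs]


# Hypothetical related tasks: same features, slightly different prediction functions.
w_source_true = [1.0, 2.0, -1.0]
w_target_true = [1.2, 1.8, -0.9]
grid = list(product([-1.0, 1.0], repeat=3))

source_data = labeled(w_source_true, grid)  # full-rank source data
target_data = labeled(w_target_true, [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])  # two samples only
target_test = labeled(w_target_true, grid)

source_model = gd_fit([0.0, 0.0, 0.0], source_data, steps=200)
fine_tuned = gd_fit(source_model, target_data, steps=200)  # transfer: start from the source model
scratch = gd_fit([0.0, 0.0, 0.0], target_data, steps=200)  # baseline: no transfer

print(f"fine-tuned NMSE: {nmse(fine_tuned, target_test):.4f}")
print(f"from-scratch NMSE: {nmse(scratch, target_test):.4f}")
```

The two target samples constrain only the first two weights, so gradient descent never updates the third one. Fine-tuning therefore keeps the source model's value there (−1.0, close to the target's −0.9). Training from scratch leaves it at 0. As a result, the transferred model reaches a target-test NMSE of about 0.0018, versus about 0.15 without transfer. In miniature, this also shows why source model selection matters: a source task far from the target would provide a poor starting point.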
Deep transfer learning: DTL is a method that combines deep learning with TL. Consider a TL task defined by $\langle \mathcal{D}_S, \mathcal{T}_S, \mathcal{D}_T, \mathcal{T}_T \rangle$. It is a DTL task when the target prediction function $f_T(\cdot)$ for $\mathcal{T}_T$ is a non-linear function approximated by a DNN.

DTL has also been applied to wireless communications, for example to CSI feedback, beamforming, signal detection, and physical layer security. In [17], DTL is used to generate a CSI feedback deep learning model for each target channel model, and the Clustered Delay Line (CDL) channel model [30] is used to simulate the wireless environments. Specifically, the DNN is first trained as the source model using a large number of CDL-A samples as source data. The source model is then fine-tuned separately with a small number of CDL-B, CDL-C, CDL-D, and CDL-E samples, i.e., target data. This procedure yields a target model for each target channel with a small number of samples and a short training time. Figure 6 shows the system model of the CSI feedback scheme based on DTL [17].

Figure 7 shows the NMSE performance of the DTL scheme [17] in FDD massive MIMO systems where the target channel is CDL-A. The uplink and downlink frequencies are set to 2.0 GHz and 2.1 GHz, respectively. The numbers of antennas at the UE and BS are 2 and 32, respectively. The number of subcarriers is set to 72 with a spacing of 15 kHz, and the number of OFDM symbols to 14. The CSI estimated at the UE and the CSI fed back to the BS are assumed to be error-free. The compression ratio is set to 1/8. The number of source data samples used to train the source model is set to 50,000. The number of target data samples used for fine-tuning is varied among 200, 500, 1000, 2000, and 4000. The red dotted line labeled “CDL-A (NLOS)” represents the NMSE performance of the source model trained using CDL-A as the source data. Relative to this line, the DTL scheme shows some performance degradation, but it achieves good NMSE performance with a small number of target data samples. In the DTL scheme, different source models also provide different NMSE performance. In this environment, where the target channel is CDL-A (NLOS), the DTL scheme provides the best NMSE performance when the source model is CDL-B (NLOS) or CDL-C (NLOS). As mentioned before, source model selection is important for the DTL scheme. Source model selection criteria for DTL are discussed in [31].

2.4 mmWave Communications

In wireless communications, there have been continuous and tremendous efforts to increase capacity by expanding spectrum and improving spectral efficiency and spatial reuse. It is very important to utilize new spectrum bands such as mmWave and terahertz bands, and a significant amount of research is ongoing to improve and realize mmWave systems. However, mmWave systems suffer from severe path loss, so beamforming with large antenna array gains is essential in mmWave communications. The power consumption and cost of RF chains are both high in mmWave communications. Hybrid beamforming is therefore a promising technique for balancing the tradeoff between cost and performance. Since mmWave communications need a large number of antennas, channel estimation is also a challenging task. To address this challenge, a switched beamforming scheme has been proposed [32], in which the best beams to steer are found within a codebook. Among beam selection schemes, an exhaustive search scheme achieves the best performance but requires a
OHTSUKI: MACHINE LEARNING IN 6G WIRELESS COMMUNICATIONS
large overhead, particularly when a large number of beams are employed [33]. The hierarchical beam search proposed in [34] reduces the beam training overhead through two-stage beam training. In the hierarchical beam search scheme, the BS and UE are equipped with multi-tier codebooks. They first sweep wider beams and then iteratively narrow the search space to find the best narrow beam. The hierarchical beam search scheme provides a good trade-off among performance, search time, and overhead. In [35], a DL-based beam selection scheme is proposed to reduce the overhead. The DL model estimates the qualities (received powers) of all the beams from a few beam measurements. The authors introduce a DL-based image reconstruction approach to beam selection. In this approach, the received power matrix is transformed into a power map by assigning each received power value a corresponding color. However, since the beams used for measurements are selected randomly, the performance of the scheme can be strongly affected by the beam searching area [35].

In [18], we proposed a DL-based low-overhead analog beam selection scheme in which beams of two different widths are steered: wide beams for pilot signals and narrow beams for data signals. To change the beam widths without losing beamforming gain, our proposed scheme implements a balance beam, which concentrates the radiation pattern over the target area. Based on the wide-beam measurements, the proposed super-resolution-inspired DL predicts the beam qualities (received powers) of the narrow beams. A CNN exploits the spatial correlation in the beam qualities to improve the estimation accuracy. Moreover, the proposed scheme predicts beam qualities to reduce the frequency of beam training. It transmits the pilot signal only every other channel coherence time and predicts the current received powers of the narrow beams from past pilot signals, which halves the training time. To capture spatiotemporal correlations, the proposed model is designed with a convolutional long short-term memory (LSTM) network. Figure 8 shows the idea of the proposed super-resolution-inspired DL scheme. Here, the received power matrix obtained with 4 × 4 DFT beams is transformed into a power map by assigning each received power value a corresponding color. This low-resolution beam domain image is input to the super-resolution-inspired DL network. The network outputs a high-resolution beam domain image corresponding to the power map obtained with, for example, 8 × 8 DFT beams. It is shown in [18] that the proposed beam selection achieves performance comparable to that of the exhaustive search scheme. Note that the number of beam measurements per coherence time is 8 for the proposed scheme and 64 for the exhaustive search scheme. The proposed scheme measures the 16 wide beams of the 4 × 4 grid only every other coherence time, whereas the exhaustive search measures all 8 × 8 narrow beams.

Fig. 8 Estimation of received power of narrow beam from that of wide beam based on super resolution [18].

3. Conclusions

In this paper, I presented an overview of some ML techniques and applications in 6G wireless communications, mainly focusing on the physical layer. One of the challenges in applying ML to real systems is that wireless environments change dynamically. Because ML makes inferences and predictions from data, the performance of an ML-based system may degrade if the statistical properties of the data change over time. DTL, whose applications in wireless communications were introduced in this paper, is one promising solution to this problem. Another solution is meta-learning, which learns how to learn [17]. Another challenge is that ML solutions, particularly DL-based ones, usually require a large amount of training data and computational resources. These requirements must be carefully considered when applying DL-based solutions.

A common problem with AI is that the parameters obtained through training are difficult to interpret. That is, it is difficult to explain why the characteristics obtained by AI are the way they are. This is called the interpretability problem. However, to use AI in a real system, it is necessary to be able to understand and explain why such characteristics are obtained. Explainable AI (XAI), which refers to ML models whose results, and the processes leading to them, are interpretable by humans, has been actively studied in recent years. A typical technique for realizing XAI is LIME [36], a local approximation approach that represents an AI model's decision logic for specific input data in an interpretable form. XAI is also an important technology for 6G.

References

[1] NTT DOCOMO, “White Paper: 5G Evolution and 6G (Version 4.0),” Jan. 2022. https://www.docomo.ne.jp/english/binary/pdf/corporate/technology/whitepaper_6g/DOCOMO_6G_White_PaperEN_v4.0.pdf
[2] The Ministry of Internal Affairs and Communications (MIC), “Beyond 5G Promotion Strategy — Roadmap towards 6G —,” June 2020. https://www.soumu.go.jp/main_sosiki/joho_tsusin/eng/presentation/pdf/Beyond_5G_Promotion_Strategy-Roadmap_towards_6G-.pdf
[3] National Institute of Information and Communications Technology (NICT), “Beyond 5G/6G White Paper (English Version 2.0),” June 2022. https://beyond5g.nict.go.jp/images/download/NICT_B5G6G_WhitePaperEN_v2_0.pdf
[4] KDDI Corporation, KDDI Research, Inc., “Beyond 5G/6G White Paper (Version 2.0.1),” Oct. 2021. https://www.kddi-research.jp/sites/default/files/kddi_whitepaper_en/pdf/KDDI_B5G6G_WhitePaperEN_2.0.1.pdf
[5] NOKIA Bell Labs, “Communications in the 6G Era,” Sept. 2020. https://d1p0gxnqcu0lvz.cloudfront.net/documents/Asset_20200909