Abstract— The rapid growth of data traffic has pushed the mobile telecommunication industry towards the adoption of fifth generation (5G) communications. Cloud radio access network (CRAN), one of the key 5G enablers, facilitates fine-grained management of network resources by separating the remote radio head (RRH) from the baseband unit (BBU) via a high-speed front-haul link. Classical resource allocation (RA) schemes rely on numerical techniques to optimize various performance metrics. Most of these works can be regarded as instantaneous, since the optimization decisions are derived from the current network state without considering past network states. While utility theory can incorporate long-term optimization effects into these decisions, the growing heterogeneity and complexity of network environments have rendered the RA problem intractable. One prospective candidate is reinforcement learning (RL), a dynamic programming framework which solves RA problems optimally over varying network states. Still, such methods cannot handle the high-dimensional state-action spaces that arise in CRAN problems. Driven by the success of machine learning, researchers have begun to explore the potential of deep reinforcement learning (DRL) for addressing RA problems. In this work, an overview of the major existing DRL approaches in CRAN is presented. We conclude this article by identifying current technical hurdles and potential future research directions.

Keywords—Deep Reinforcement Learning, 5G, Resource Allocation, Cloud RAN

I. INTRODUCTION

Recent years have witnessed a great evolution in mobile communications, which began in the 1980s with the first generation (1G), followed by 2G (1990), 3G (2002), 4G (2010) and the upcoming 5G [1]. The International Telecommunication Union Radiocommunication Sector (ITU-R) has standardized the ambitious 5G requirements, referred to as International Mobile Telecommunications 2020 (IMT-2020) [2], which encompass a 100 Mb/s user-experienced data rate, 1 ms latency, mobility up to 500 km/h, and backward compatibility with long term evolution (LTE)/LTE-A. Such design goals stem from the fact that total mobile data traffic is expected to increase significantly, to 69 exabytes per month in 2022 [3], due to the unprecedented growth of Internet of Things (IoT) devices. Under this premise, telecommunication operators clearly need to take into account both the costs of commoditization and the quality of service (QoS) for mobile users during the initial phase of 5G deployment.

In a traditional radio access network (RAN) deployment, each base station (BS) is physically attached to a fixed number of antennas, which handle baseband processing and radio functions within a small coverage area. Accommodating higher transmission rates means that a massive number of physical BSs must be installed. This, however, incurs high initial investment, site support, system management and setup costs, as well as wireless channel interference among users [4]. Cloud RAN (CRAN), a new paradigm for 5G communications, distributes a set of low-power antennas known as remote radio heads (RRHs) geographically at distinct locations within the coverage area [5], [6]. All RRHs are then connected to a centralized control and processing station known as the baseband unit (BBU) via a high-speed front-haul link. Consequently, RRHs are able to coordinate with each other and expand the cellular network coverage. This translates into favorable channel conditions and ultimately excellent QoS for all user equipment.

Inspired by these CRAN advantages, resource allocation (RA) for CRAN has been extensively investigated. Data rate-oriented optimization problems for multi-user CRAN were treated in [7] and [8]. The work in [9] studied the energy efficiency (EE) maximization problem of CRAN subject to individual antenna power constraints. It does not, however, consider any data service rate requirement, which is important for provisioning heterogeneous multimedia services. In [10], an EE maximization problem was formulated under constraints on per-antenna transmission power and proportional data rates among user equipment. The aforementioned RA schemes rely on numerical methods to optimize various performance metrics. Specifically, techniques such as the Charnes-Cooper transformation (CCT), Lagrange dual decomposition, parameterized convex programming, and bi-level optimization are utilized to reach optimality in every single time slot.

Most of the abovementioned RA works can be regarded as instantaneous, since the optimization decisions are derived from the current network state without considering past network states. This may lead to suboptimal results from the perspective of long-term network performance. For instance, pursuing instantaneous energy efficiency may lead to unnecessary switching of RRHs ON and OFF, which is associated with enormous power and timing overheads [11]. Such an issue exhibits the same flavor as the well-known ping-pong effect in
This work is supported by the Universiti Tunku Abdul Rahman under
UTARRF (IPSR/RMC/UTARRF/2017-C2/T08).
and ℵ ⊆ ℛ stand for the sets of active and inactive RRHs, respectively. Besides that, we take into account the transition power (from active/sleep to sleep/active status). Υ denotes the set of mode-transition RRHs in the current time slot, which is controlled by the BBU. Armed with the above framework, we can define the state and action spaces in the subsequent section.

III. DRL-BASED RA IN CRAN

Generally speaking, DRL consists of two phases, namely an offline DNN construction phase and an online deep Q-learning phase [16]. The DNN is adopted to estimate the correlation between each state-action pair (s, a) and its value function Q(s, a), which is the expected cumulative reward when the environment commences at state s and pursues action a. Q(s, a) can be formulated as:

Q(s, a) = E[ Σ_{t=0}^{∞} γ^t r_t | s_0 = s, a_0 = a ]        (4)

where r_t represents the reward achieved in time slot t, and γ ∈ (0, 1] is the discount factor which balances the immediate and future rewards.
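For concreteness, the following is a minimal illustrative sketch of this two-phase idea: a small DNN approximates Q(s, a) in (4) and is refined online with experience replay and a bootstrapped one-step target. The PyTorch realization, network sizes and hyper-parameters are our own assumptions and do not correspond to any specific surveyed work.

```python
# Minimal DQN sketch of (4): a DNN maps a state s to Q(s, a) for every discrete
# action a, trained on the bootstrapped target r + gamma * max_a' Q(s', a').
# Dimensions and hyper-parameters are placeholders, not taken from [19]-[24].
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 8, 4, 0.9        # assumed sizes for illustration

class QNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS))          # one Q-value per discrete action

    def forward(self, s):
        return self.net(s)

q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())   # periodically refreshed copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
# replay entries are (state, action, reward, next_state) tensors
replay = deque(maxlen=10_000)

def train_step(batch_size=32):
    """One online deep Q-learning update on a random minibatch of past transitions."""
    if len(replay) < batch_size:
        return
    s, a, r, s_next = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q_sa = q_net(s).gather(1, a.long().view(-1, 1)).squeeze(1)     # Q(s, a) taken
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max(dim=1).values  # one-step target
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this reading, the offline phase corresponds to constructing and pre-training such a network, while the online phase repeatedly selects actions (e.g., ε-greedily), stores the observed transitions in the replay buffer and invokes the update step above.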
Equation (4) lies at the heart of most DRL-based RA schemes, where different system assumptions, objective functions and optimization variables dictate the specific definitions of s, a and r. Table I summarizes the existing related works.

In [19], an RL-based offloading strategy has been proposed to choose the RRH and the offloading rate based on the current battery level, the past data rate to each RRH and the estimated amount of harvested energy. The authors further accelerate the learning speed by using a convolutional neural network (CNN) to compress the state space. In [20], a double-DQN-based strategic computation offloading scheme has been designed for an ultra-dense sliced RAN. Furthermore, the double DQN is coupled with a Q-function decomposition approach. In [21], a DQN method which uses a DNN to predict the action-value function of Q-learning has been devised to manage the computational resource allocation and offloading decision.

Similar work can be found in [22], where the authors considered both the decoding error probability and the delay violation probability in order to support low-latency communications. In [23], a stepwise RA algorithm that minimizes the total power consumption of CRAN has been proposed. It relies on the combination of DQN and convex optimization to select which RRHs to turn ON and to allocate transmission power among these active RRHs. Such a low-complexity algorithm, however, may yield an infeasible solution if the number of active RRHs is too low. Furthermore, similar to [19]-[22], the training process does not operate in a self-supervised learning mode. The authors in [24] have addressed this issue by proposing a Monte Carlo Tree Search (MCTS) algorithm. In MCTS, beginning from a root state, the agent simulates trajectories into the future and evaluates the resulting rewards in order to select a favorable action. Besides that, the work in [24] has improved the traditional DNN by separating the last DNN layers into sub neural networks so as to accommodate a higher action dimension.

IV. RESEARCH DIRECTIONS AND OPEN ISSUES

We pinpoint issues that remain worthy of further investigation as well as promising future research directions.

• The prime challenge across all R&D efforts is the difficulty of reaching optimality for the DRL-based problem. A significant portion of this difficulty stems from the training process over a large state-action space. Therefore, an effective DRL-based RA scheme should be able to shrink the state-action space by using transfer learning. In this way, it can constantly absorb the features of newcomers and lessen random exploration at the early stage [17]. Stepwise design could be another efficient way to scale down complexity while approaching the optimal system performance. As demonstrated in [23], the continuous action space of dynamic power allocation has been effectively shifted from the MDP to convex optimization; a sketch of this decomposition is given below.
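As a concrete illustration of such a stepwise split (in the spirit of [23], but not its exact formulation), the discrete RRH on/off decision can come from a DQN while the remaining continuous power allocation is delegated to a convex solver. The helper name, rate model, channel gains, rate target and power caps below are illustrative assumptions only.

```python
# Stepwise decomposition sketch: the DQN outputs a binary on/off vector over RRHs,
# and a convex program then allocates transmit power among the active RRHs.
# The rate model and all numbers are placeholders, not the formulation of [23].
import numpy as np
import cvxpy as cp

def allocate_power(active_mask, gains, rate_target=4.0, p_max=1.0):
    """Minimize total transmit power of active RRHs subject to a sum-rate floor."""
    idx = np.flatnonzero(active_mask)            # RRHs switched ON by the DQN action
    if idx.size == 0:
        return None                              # no active RRH: clearly infeasible
    p = cp.Variable(idx.size, nonneg=True)
    rate = cp.sum(cp.log1p(cp.multiply(gains[idx], p)))   # concave sum-rate proxy
    problem = cp.Problem(cp.Minimize(cp.sum(p)),
                         [rate >= rate_target, p <= p_max])
    problem.solve()
    return p.value if problem.status == cp.OPTIMAL else None

# Example: a hypothetical DQN action for four RRHs, followed by convex power allocation.
action = np.array([1, 0, 1, 1])
powers = allocate_power(action, gains=np.array([0.9, 0.2, 0.6, 0.4]))
```

When the solver reports infeasibility, the returned None mirrors the infeasible-solution risk noted above for cases where too few RRHs are active.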
Table I. Comparison of Existing DRL-Based RA Algorithms

| Work | Learning Algorithm | Objective Function | Action Space | State Space |
|------|--------------------|--------------------|--------------|-------------|
| [19] | CNN + Q-learning | Latency | Offloading Rate & Computation Resource | Battery Level, Renewable Energy Generated in a Time Slot & Number of Potential Transmission Rates corresponding to Each Edge Device |
| [20] | Q-function decomposition + double DQN | Energy | Offloading Rate & Computation Resource | Task Queue State, Energy Queue State & Channel Qualities between UEs and RRHs |
| [21] | DNN + Q-learning | Sum Cost of Delay and Energy | Binary Offloading & Computation Resource | Computing Capability |
| [22] | DNN + Q-learning | Task Success Rate | Computation Resource | Waiting Time of the Tasks to be Processed at the Head of Buffers, Queue Length of the Buffers & CSI |
| [23] | DNN + Q-learning | Power | Binary On/Off & Power Adaptation | User Demand Rate & On/OFF of RRHs |
| [24] | MCTS + MLT | Latency & Energy | Communication Resource, Offloading Rate & Computation Resource | Computing Capability, Radio Bandwidth Resource State & Task Request State |
• Another issue seldom discussed in most existing DRL-based RA works is signaling overhead. From the implementation viewpoint, incorporating the signaling overhead into the RA problem would be beneficial. The signaling overhead is tightly connected with the accuracy of channel estimation. That is, when fast fading happens, more signaling must be exchanged so that the DRL agent can keep up with the CSI. It is still unclear how much performance degradation must be tolerated when imperfect CSI occurs. Therefore, an effective DRL-based RA scheme should be able to record and preserve historical observations, enabling the DRL agent to execute accurate CSI prediction given only partial observations. A recurrent neural network (RNN) such as the long short-term memory (LSTM) network could be one of the promising solutions, as sketched below.
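One possible realization of this idea (our own sketch, with illustrative feature sizes) is an LSTM that digests a window of past partial CSI observations and predicts the channel state of the next slot, which can then be appended to the DRL agent's state.

```python
# LSTM-based CSI predictor sketch: a history of noisy/partial CSI observations is
# mapped to an estimate of the next slot's CSI.  num_links, hidden size and the
# sequence length are illustrative assumptions only.
import torch
import torch.nn as nn

class CSIPredictor(nn.Module):
    def __init__(self, num_links=8, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_links, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_links)

    def forward(self, csi_history):                # shape: (batch, time, num_links)
        out, _ = self.lstm(csi_history)
        return self.head(out[:, -1, :])            # predicted CSI for the next slot

# Usage: train with an MSE loss against the CSI actually observed in the next slot.
model = CSIPredictor()
history = torch.randn(16, 10, 8)                   # 16 samples, 10 past slots, 8 links
predicted_csi = model(history)                     # shape: (16, 8)
```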
REFERENCES

[1] S. Lien, S. Shieh, Y. Huang, B. Su, Y. Hsu and H. Wei, "5G New Radio: Waveform, Frame Structure, Multiple Access, and Initial Access," IEEE Commun. Mag., vol. 55, no. 6, pp. 64-71, June 2017.
[2] J.-C. Guey et al., "On 5G Radio Access Architecture and Technology," IEEE Wireless Commun., vol. 22, no. 5, pp. 2-5, Oct. 2015.
[3] J. Wu, "Green wireless communications: from concept to reality [industry perspectives]," IEEE Wireless Commun., vol. 19, pp. 4-5, Aug. 2012.
[4] X. Wang, "C-RAN: The Road Towards Green RAN," China Commun. J., Jun. 2010.
[5] NGMN Alliance, 5G White Paper, Mar. 2015, [online] Available: https://www.ngmn.org/5g-white-paper/5g-white-paper.html.
[6] P. Rost et al., "Cloud technologies for flexible 5G radio access networks," IEEE Commun. Mag., vol. 52, no. 5, pp. 68-76, May 2014.
[7] V. D. Papoutsis and S. A. Kotsopoulos, "Chunk-based resource allocation in distributed MISO-OFDMA systems with fairness guarantee," IEEE Commun. Lett., vol. 15, no. 4, pp. 377-379, Apr. 2011.
[8] C. He, B. Sheng, P. Zhu, and X. You, "Energy efficiency and spectral efficiency tradeoff in downlink distributed antenna systems," IEEE Wireless Commun. Lett., vol. 1, no. 3, pp. 153-156, Jun. 2012.
[9] C. He, B. Sheng, P. Zhu, X. You, and G. Y. Li, "Energy- and spectral-efficiency tradeoff for distributed antenna systems with proportional fairness," IEEE J. Sel. Areas Commun., vol. 31, no. 5, pp. 894-902, May 2013.
[10] M.-L. Tham, S. F. Chien, D. W. Holtby, and S. Alimov, "Energy-efficient power allocation for distributed antenna systems with proportional fairness," IEEE Trans. Green Commun. Netw., vol. 1, no. 2, pp. 145-157, Jun. 2017.
[11] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, "A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs," in Proc. IEEE ICC, pp. 1-6, May 2017.
[12] M. Tayyab, X. Gelabert and R. Jäntti, "A Survey on Handover Management: From LTE to NR," IEEE Access, vol. 7, pp. 118907-118930, 2019.
[13] C. M. Yen, C. J. Chang, and L. C. Wang, "A Utility-Based TMCR Scheduling Scheme for Downlink Multiuser MIMO-OFDMA Systems," IEEE Trans. Veh. Technol., vol. 59, no. 8, pp. 4105-4115, 2010.
[14] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998.
[15] M. Miozzo, L. Giupponi, M. Rossi, and P. Dini, "Distributed Q-Learning for Energy Harvesting Heterogeneous Networks," in IEEE ICC 2015 Workshop on Green Communications and Networks with Energy Harvesting, Smart Grids and Renewable Energies, 2015.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[17] C. Zhang, P. Patras, and H. Haddadi, "Deep learning in mobile and wireless networking: A survey," 2018, [online] Available: https://arxiv.org/abs/1803.04311.
[18] B. Dai and W. Yu, "Energy efficiency of downlink transmission strategies for cloud radio access networks," IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 1037-1050, 2016.
[19] M. Min, L. Xiao, Y. Chen et al., "Learning-based computation offloading for IoT devices with energy harvesting," IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1930-1941, Feb. 2019.
[20] X. Chen, H. Zhang, C. Wu et al., "Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning," IEEE Internet Things J., vol. 6, no. 3, pp. 4005-4018, June 2019.
[21] J. Li, H. Gao, T. Lv et al., "Deep reinforcement learning based computation offloading and resource allocation for MEC," in Proc. IEEE WCNC, pp. 1-6, April 2018.
[22] T. Yang, Y. Hu, M. C. Gursoy et al., "Deep reinforcement learning based resource allocation in low latency edge computing networks," in Proc. ISWCS, pp. 1-5, Aug. 2018.
[23] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, "A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs," in Proc. IEEE ICC, pp. 1-6, May 2017.
[24] J. Chen, S. Chen, Q. Wang, B. Cao, G. Feng and J. Hu, "iRAF: A Deep Reinforcement Learning Approach for Collaborative Mobile Edge Computing IoT Networks," IEEE Internet Things J., vol. 6, no. 4, pp. 7011-7024, Aug. 2019.