Energy-Efficient Joint Task Offloading and Resource Allocation in OFDMA-Based Collaborative Edge Computing
3, MARCH 2022
Abstract: Mobile edge computing (MEC) is an emerging architecture that brings computation and storage resources to the edge of the mobile network and provides rich services and applications near the end users. The joint problem of task offloading and resource allocation in the multi-user collaborative mobile edge computing network (C-MEC) based on Orthogonal Frequency-Division Multiple Access (OFDMA) is a challenging issue. In this paper, we investigate the offloading decision, collaboration decision, computing resource allocation and communication resource allocation problem in C-MEC. The delay-sensitive tasks of users can be computed locally, offloaded to collaborative devices, or offloaded to MEC servers. The goal is to minimize the total energy consumption of all mobile users under the delay constraint. The problem is formulated as a mixed-integer nonlinear programming (MINLP) problem, which involves the joint optimization of the task offloading decision, collaboration decision, subcarrier and power allocation, and computing resource allocation. A two-level alternation method framework is proposed to solve the formulated MINLP problem. In the upper level, a heuristic algorithm is used to handle the collaboration decision and offloading decisions under the initial setting; in the lower level, the allocation of power, subcarriers, and computing resources is updated through deep reinforcement learning based on the current offloading decision. Simulation results show that the proposed algorithm achieves excellent performance in energy efficiency and task completion rate (CR) for different network parameter settings.

Index Terms: Mobile edge computing, computing offloading, resource allocation, power and subcarrier allocation, deep Q-network.

Manuscript received April 19, 2021; revised July 11, 2021; accepted August 24, 2021. Date of publication September 6, 2021; date of current version March 10, 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 62072477, Grant 61309027, Grant 61702562, and Grant 62071398; in part by Hunan Provincial Natural Science Foundation of China under Grant 2018JJ3888; in part by the Scientific Research Fund of Hunan Provincial Education Department under Grant 18B197; in part by the National Key Research and Development Program of China under Grant 2018YFB1700200; and in part by Hunan Key Laboratory of Intelligent Logistics Technology 2019TP1015. The associate editor coordinating the review of this article and approving it for publication was K. Xue. (Corresponding author: Zhufang Kuang.)
Lin Tan is with the School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China (e-mail: 7624151@qq.com).
Zhufang Kuang is with the School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China, and also with the Key Laboratory of Intelligent Logistics Technology (Hunan Province), Changsha 410004, China (e-mail: zfkuangcn@163.com).
Lian Zhao is with the Department of Electrical, Computer and Biomedical Engineering, Ryerson University, Toronto, ON M5B 2K3, Canada (e-mail: l5zhao@ryerson.ca).
Anfeng Liu is with the School of Computer Science and Engineering, Central South University, Changsha 410010, China (e-mail: afengliu@mail.csu.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2021.3108641.
Digital Object Identifier 10.1109/TWC.2021.3108641
1536-1276 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

WITH the development of Internet of Things (IoT) and 5G technology, numerous mobile devices are deployed into the network, such as smart phones, smart sensors, wearable smart devices and other mobile devices. These new devices require real-time and fast processing of large amounts of data generated by applications, and have relatively strict requirements on delay performance, which brings great challenges to the existing mobile communication network. In order to effectively meet the high bandwidth and low latency required by the rapid development of mobile Internet and IoT, mobile computing has changed from centralized mobile cloud computing to Mobile Edge Computing (MEC) [1]–[3]. Mobile devices with limited battery capacity and computing power can offload computing-intensive and delay-critical tasks to MEC servers to reduce energy consumption and delay [4], [5]. Nevertheless, offloading incurs extra overhead due to the communication required between the mobile device and the MEC server. The additional communication requirement affects both energy consumption and latency, and consequently, the offloading strategy is particularly important. Shu et al. [6] investigate the problem of access control and task offloading in multi-user edge systems and propose an efficient offloading scheme. According to the personal preference of individual tasks and edge nodes, Wang et al. [7] propose a stable offloading algorithm. However, the performance of the offloading strategy is affected by both communication and computing resources, because the communication resources affect the data transmission rate and the energy consumption of mobile devices, while computing resources affect the task computation delay.

Currently, some research works on MEC focus on the joint communication and computing resources optimization problem or the offloading decision. These problems can be solved by deep reinforcement learning technology [8]–[11]. Recently, the joint problem of offloading decision and resource allocation has been considered [12], [13]. Qian et al. [14] investigate the nonorthogonal multiple access (NOMA)-enabled multi-access MEC and optimize the offloading strategy and
communication resource allocation. Kuang et al. [4] propose a bi-level joint optimization algorithm to optimize the offloading strategy and resource allocation respectively [15], [16]. In [17]–[19], the problem is addressed by the variable characteristics of the task. Yan et al. [17] consider the task dependency between mobile devices, and Chen et al. [18] consider different task types. In [20], [21], the optimization of channel assignment, power allocation, spectrum and data caching is considered to reduce the task delay and overall energy consumption. The works in [22]–[25] focus on optimizing and improving the overall energy efficiency of the system. The above studies mainly consider the offloading decision and resource allocation problem in MEC networks. However, the collaboration among mobile users has not been considered. Essentially, the joint problem of offloading decision, collaboration decision, and resource allocation in the multi-user collaborative mobile edge computing network (C-MEC) is a challenging issue.

Moreover, a growing number of mobile devices distributed around the edge of the network are often idle. Collaborative computing can leverage the resources of these idle devices to help other devices perform their tasks [26]. Collaborative computing can enhance the functions of MEC, which can effectively alleviate network congestion and save the computing resources of the MEC server. In [27]–[29], the collaborative computing between the cloud server and the MEC server is considered. Wang et al. [30] propose a flexible tripartite collaborative scheme based on user equipment, MEC server, and cloud server in a mobile network. Feng et al. [31] consider the collaborative computing among MEC servers, model the joint optimization as a Markov decision process (MDP), and propose a collaborative computing offloading and resource allocation algorithm. In [32]–[35], the works focus on the collaborative computing among mobile devices, which can offload their computation tasks to nearby mobile devices in addition to the MEC server. In particular, Saleem et al. [35] propose an Orthogonal Frequency-Division Multiple Access (OFDMA) scheme to minimize the total delay by considering the expected energy consumption, partial offloading, and resource allocation constraints. Furthermore, multiple access is achieved in OFDMA by assigning subsets of subcarriers to individual users, while the interference is ignored due to the exclusive subcarrier allocation. The network communication rate and bandwidth utilization of the whole system can be improved by using OFDMA for transmission [36]–[39]. Wu et al. [36] propose the weighting and computing efficiency maximization problem in OFDMA mobile edge computing networks.

However, the aforementioned studies in collaborative mobile edge computing networks only consider the offloading decision and (or) power allocation. The joint problem of offloading decision, collaboration decision, subcarrier and power allocation, and computing resource allocation has not been investigated. In order to fill this gap, in this paper, we consider an OFDMA-based collaborative mobile edge computing system and propose an energy-efficient strategy, which reduces the total energy consumption of the whole MEC system by jointly optimizing the offloading decision, collaboration decision, and the allocation of computing resources and communication resources in the OFDMA-based C-MEC networks. Due to the NP-hard property of the joint optimization problem, the original problem is transformed into a bi-level optimization problem. The goal of the upper level is to optimize the collaboration decision and offloading decision; the goal of the lower level is to find the optimal allocation of power, subcarriers, and computing resources under the given collaboration decision and offloading decision. In this way, we can fully consider the dependence between the offloading decision and the allocation of computing resources and communication resources. The main contributions of this work are summarized as follows:

• We investigate a scenario of Orthogonal Frequency-Division Multiple Access (OFDMA) based multi-user collaborative mobile edge computing (C-MEC). Then, the joint optimization problem of collaboration decision, computation offloading and resource allocation is formulated as a Mixed-Integer NonLinear Programming (MINLP) problem. The objective of this problem is to minimize the total energy consumption of mobile devices by optimizing system variables such as the offloading decisions, collaboration decision, power allocation, subcarrier allocation, and computing resource allocation, while the delay limit of each task is satisfied.
• We propose a two-level alternation method framework to solve the formulated MINLP problem. The joint optimization problem of collaboration decision, computation offloading and resource allocation is decomposed into two levels. In the upper level, we propose the Generation of Offloading Decision (GOD) algorithm based on Ant Colony System (ACS) to handle the collaboration decision and offloading decisions under the initial setting; in the lower level, we propose the Allocation of Wireless (communication) resources and Computing Resources (AWCR) algorithm based on Deep Q-Network (DQN) to obtain the optimal allocation of power, subcarrier and computing resources for the current offloading decision and collaboration decision.

The rest of this paper is organized as follows. Section II introduces the system model and presents the optimization problem. Section III describes the details of our proposed approach. The simulation studies are shown in Section IV. Finally, Section V concludes this paper.

II. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we analyze the energy required for task offloading, transmission and computation, and formulate the problem of system energy minimization. In C-MEC, the energy consumption refers to the energy consumed by all mobile users. Since the MEC server is powered by the base station, the MEC server energy consumption is not taken into account.

As shown in Fig. 1, we consider a base station with a MEC server and multiple users. $\mathcal{N} = \{1, 2, \ldots, n, \ldots, |\mathcal{N}|\}$ indicates that there are $|\mathcal{N}|$ mobile users, each user has one
1962 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 21, NO. 3, MARCH 2022
For the MEC computing pattern, the transmission energy consumption of $M_i$ includes the energy consumption of mobile user $i$ to transmit input data and receive computing results. Assuming the uplink and downlink use the same subcarrier for transmission, the total energy of the mobile device for transmission to and receiving from the MEC server is given by

$$E^{trans}_{i,j}(\mathbf{D}) = \sum_{s \in \mathcal{S}} w_{i,s,j}\, p^{Up}_{i,s,j} \left( \frac{I_i}{R^{Up}_{i,j}(\mathbf{D})} + \frac{O_i}{R^{Down}_{i,j}(\mathbf{D})} \right), \quad \forall i \in \mathcal{N},\ j = 0 \tag{4}$$

For the collaborative computing pattern, the energy consumption of $M_i$ is composed of the energy for mobile user $i$ to transmit input data and receive calculation results from the collaborative mobile device $j$. In addition, the collaborative mobile device $j$ also consumes energy to receive the input data and send the computing results back to mobile device $i$. Therefore, the total transmission energy consumption is given by

$$E^{trans}_{i,j}(\mathbf{D}) = \sum_{s \in \mathcal{S}} w_{i,s,j} \left[ p^{Up}_{i,s,j} \left( \frac{I_i}{R^{Up}_{i,j}(\mathbf{D})} + \frac{O_i}{R^{Down}_{i,j}(\mathbf{D})} \right) + p^{Down}_{i,s,j} \left( \frac{I_i}{R^{Up}_{i,j}(\mathbf{D})} + \frac{O_i}{R^{Down}_{i,j}(\mathbf{D})} \right) \right], \quad \forall i \in \mathcal{N},\ j \in \mathcal{F} - \{0, i\} \tag{5}$$

B. Computation Model

The MEC server receives the offloaded tasks and then executes these tasks in parallel. Because we consider the upper limit of the processing capacity of the MEC server and the upper limit of the tasks that can be processed under the delay constraint, we do not consider the task serial processing and task waiting queue. According to the given computing resource $r_{i,j}$, the computing time of $M_i$ in the pattern $j$ can be written as

$$T^{comp}_{i,j}(r_{i,j}) = \frac{C_i}{r_{i,j}}, \quad \forall i \in \mathcal{N},\ j \in \mathcal{F} \tag{6}$$

Since the task computing at MEC server does not consume any energy of mobile devices, and the MEC server is powered at the base station, the MEC server energy consumption is not

D. Problem Formulation

Considering that the offloading decision, collaboration decision, computing resource allocation, and communication resource allocation are all related to energy consumption, we minimize the total energy consumption of all mobile devices by jointly optimizing them, including total computing and transmission energy consumption. The joint optimization problem can be written as:

$$\mathbf{P}: \min_{\mathbf{D},\, \{r_{i,j}\},\, \mathbf{W},\, \mathbf{P}^{Up},\, \mathbf{P}^{Down}} E^{sum} = \sum_{i=1}^{|\mathcal{N}|} \left( \sum_{j=1}^{|\mathcal{N}|} d_{i,j} E^{comp}_{i,j}(r_{i,j}) + \sum_{j=0,\, j \neq i}^{|\mathcal{N}|} d_{i,j} E^{trans}_{i,j}(\mathbf{D}) \right)$$

s.t.
C1: $d_{i,j} \in \{0, 1\}, \ \forall i \in \mathcal{N}, \forall j \in \mathcal{F}$.
C2: $\sum_{j=0}^{|\mathcal{N}|} d_{i,j} = 1, \ \forall i \in \mathcal{N}$,
C3: $\sum_{i=1}^{|\mathcal{N}|} d_{i,j} \leq 1, \ \forall j \in \mathcal{F} - \{0\}$,
C4: $\sum_{i=1}^{|\mathcal{N}|} d_{i,j} r_{i,j} \leq r^{max}_{j}, \ \forall j \in \mathcal{F}$,
C5: $r_{i,j} > 0, \ \forall d_{i,j} = 1$ ($i \in \mathcal{N}$ and $j \in \mathcal{F}$),
C6: $r_{i,j} = 0, \ \forall d_{i,j} = 0$ ($i \in \mathcal{N}$ and $j \in \mathcal{F}$),
C7: $w_{i,s,j} \in \{0, 1\}, \ \forall i \in \mathcal{N}, \forall s \in \mathcal{S}, \forall j \in \mathcal{F} - \{i\}$,
C8: $\sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{F} \setminus \{i\}} w_{i,s,j} = 1, \ \forall s \in \mathcal{S}$.
C9: $p^{Up}_{i,s,j} \geq 0, \ p^{Down}_{i,s,j} \geq 0, \ \forall i \in \mathcal{N}, \forall s \in \mathcal{S}, \forall j \in \mathcal{F} - \{i\}$
C10: $\sum_{j \in \mathcal{F},\, j \neq i} \sum_{s \in \mathcal{S}} w_{i,s,j}\, p^{Up}_{i,s,j} \leq P^{Up}_{max}$,
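As a rough numerical illustration of (4), (5) and (6), the sketch below evaluates the transmission energy and the computing time for one task. The `Task` class, the function names and all numeric values are assumptions made for this example only, and the grouping of terms follows one reading of (4) and (5) in which the transmit power is multiplied by the total uplink plus downlink airtime.

```python
from dataclasses import dataclass


@dataclass
class Task:
    input_bits: float   # I_i, size of the input data
    output_bits: float  # O_i, size of the computing result
    cycles: float       # C_i, CPU cycles needed by the task


def airtime(task, rate_up, rate_down):
    """Total time on air: I_i / R_up + O_i / R_down."""
    return task.input_bits / rate_up + task.output_bits / rate_down


def transmission_energy_mec(task, w, p_up, rate_up, rate_down):
    """Eq. (4): sum over subcarriers s of w_s * p_up_s * airtime (pattern j = 0)."""
    t = airtime(task, rate_up, rate_down)
    return sum(w_s * p_s * t for w_s, p_s in zip(w, p_up))


def transmission_energy_collab(task, w, p_up, p_down, rate_up, rate_down):
    """Eq. (5): the collaborator's power p_down is added to the user's p_up."""
    t = airtime(task, rate_up, rate_down)
    return sum(w_s * (pu + pd) * t for w_s, pu, pd in zip(w, p_up, p_down))


def computing_time(task, r):
    """Eq. (6): T_comp = C_i / r_ij for an allocated computing resource r_ij > 0."""
    if r <= 0:
        raise ValueError("computing resource must be positive")
    return task.cycles / r


# Hypothetical numbers: 1 Mbit in, 0.1 Mbit out, 1 Gcycle of work.
task = Task(input_bits=1e6, output_bits=1e5, cycles=1e9)
w = [1, 0, 1]               # subcarriers 0 and 2 are assigned to this link
p_up = [0.1, 0.3, 0.2]      # watts per subcarrier
p_down = [0.05, 0.0, 0.05]  # watts per subcarrier
print(transmission_energy_mec(task, w, p_up, 2e6, 4e6))
print(transmission_energy_collab(task, w, p_up, p_down, 2e6, 4e6))
print(computing_time(task, 2e9))
```

Unassigned subcarriers contribute nothing because their indicator `w_s` is zero, which mirrors the role of $w_{i,s,j}$ in the sums.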
that each subcarrier is allocated to at most one user; C9 gives the value range of the transmission power; C10 is the maximum power for user $i$ to transmit $M_i$ or receive calculation results, and C11 is the maximum power for $j$ to receive $M_i$ or transmit calculation results; C12 ensures that each task must be completed within the deadline.

III. PROPOSED APPROACH

A. Problem Transformation

We can observe that the energy consumption optimization $\mathbf{P}$ is a MINLP problem, which is NP-hard in general. It is difficult to find the solution of problem $\mathbf{P}$ with traditional optimization methods. We found the following two characteristics through further analysis of $\mathbf{P}$. First, according to the result of the offloading decision, the number of available computing resources and the selection of communication resource channel may vary with different patterns. Secondly, we cannot evaluate the performance of the offloading decision before the computing and communication resource allocation is generated, since its performance is affected by the allocation results of computing resources and communication resources. Since bi-level optimization can handle such mutually constrained problems, $\mathbf{P}$ is transformed into a bi-level optimization problem, that is, the upper level problem is solved under the prerequisite that the lower level problem is optimized. In this paper, the upper level optimization problem $\mathbf{P1}$ will solve the offloading decision problem, and the lower level optimization problem $\mathbf{P2}$ will solve the computing and communication resource allocation problem. The goal is to minimize the total energy consumption of all mobile devices.

In the upper level, the generation of the offloading decision is a typical NP-hard optimization problem. Finding the best offloading decision is often considered a challenging task due to its NP-hard nature. Although the optimal offloading decision can be found through deterministic algorithms, in large-scale problems, their computational complexity is too high to provide the optimal solution in a reasonable time. In addition, there is no logical ordering relationship between feasible patterns. Therefore, we propose a Generation of Offloading Decision (GOD) algorithm based on Ant Colony System (ACS). According to ACS, the upper level optimization problem $\mathbf{P1}$ minimizes energy consumption by generating offloading decisions $\mathcal{D} = \{\mathbf{D}_1, \ldots, \mathbf{D}_{Pop}\}$ of population size $Pop$ with the assistance of pheromones, which is formulated as:

$$\mathbf{P1}: \min_{\mathcal{D}} E^{sum} = \sum_{i=1}^{|\mathcal{N}|} \left( \sum_{j=1}^{|\mathcal{N}|} d_{i,j} E^{comp}_{i,j}(r_{i,j}) + \sum_{j=0,\, j \neq i}^{|\mathcal{N}|} d_{i,j} E^{trans}_{i,j}(\mathbf{D}) \right) \quad \text{s.t. C1, C2, C3, C12} \tag{10}$$

In the lower level, with the goal of minimizing the total energy consumption of the system, the power allocation, subcarrier allocation, and computing resource allocation are obtained according to the offloading decision population $\mathcal{D}$ of the upper level. Deep reinforcement learning aims to find a balance between exploration (of unknown territory) and exploitation. In deep reinforcement learning, agents interact with the unknown environment through repeated observations, actions and rewards, to construct optimal strategies. It is a promising method to deal with high-complexity tasks in the real world. Because the changing channel and power affect the allocation of computational resources and the corresponding energy consumption results, we propose the Allocation of Wireless (communication) Resources and Computing Resources (AWCR) algorithm, based on the deep reinforcement learning method DQN, in the lower level. Given the offloading decision population $\mathcal{D}$, the lower optimization problem $\mathbf{P2}$ selects the ant (i.e., $\mathbf{D}^*$) that returns the best performance in the offloading decision population $\mathcal{D}$ and solves the computing resource allocation $\{r^*_{i,j}\}$, wireless channel resource allocation $\mathbf{W}^*$, and transmission power allocation $\mathbf{P}^{Up*}$ and $\mathbf{P}^{Down*}$, which is formulated as:

$$\mathbf{P2}: \min_{\mathbf{D}^*,\, \{r^*_{i,j}\},\, \mathbf{W}^*,\, \mathbf{P}^{Up*},\, \mathbf{P}^{Down*}} E^{sum}(\mathcal{D}) = \sum_{i=1}^{|\mathcal{N}|} \left( \sum_{j=1}^{|\mathcal{N}|} d_{i,j} E^{comp}_{i,j}(r_{i,j}) + \sum_{j=0,\, j \neq i}^{|\mathcal{N}|} d_{i,j} E^{trans}_{i,j}(\mathbf{D}) \right) \quad \text{s.t. C4, C5, C6, C7, C8, C9, C10, C11, C12} \tag{11}$$

In $\mathbf{P1}$ and $\mathbf{P2}$, the dependence of the offloading decision on computing and communication resource allocation can be fully considered. Specifically, since the computing and communication resource allocation is generated based on the offloading decision, the limitations on available computing and communication resources caused by the offloading decision are considered. On the one hand, the energy consumption of the whole system can be calculated only when the offloading strategy is known. On the other hand, the resource allocation also affects the generation of the offloading strategy. In addition, because the optimal allocation of computing and communication resources for each offloading decision can be obtained at the lower level, the quality of each offloading decision can be accurately evaluated.

B. JGA Algorithm

The framework of the proposed JGA algorithm (Joint GOD and AWCR) is presented in Algorithm 1. During the initialization phase, feasible patterns are pruned to generate a set of available feasible patterns for each task (Line 2), which will be introduced in Section III-C. In the main loop, both the upper level optimization and the lower level optimization are achieved in each iteration. The latter is nested in the former. Specifically, the GOD algorithm is used to generate $Pop$ offloading decisions (Line 4), which will be introduced in Section III-D.1. In the lower level optimization, according to the given offloading decision population, the AWCR algorithm
is used to select the best offloading decision in the population and optimize the corresponding computing resources, subcarrier allocation, and power allocation (Line 5), which will be introduced in Section III-E. Then, the local search operator accelerates the convergence of the iterative optimal solution (Line 6), which will be introduced in Section III-D.3. Finally, the global pheromone is updated (Line 7), which will be introduced in Section III-D.2. The above process continues until the stop condition is met. We will introduce JGA in detail in the following subsections.

Algorithm 1 JGA Algorithm
1: iteration = 0;
2: Generate a set of all available feasible patterns $\mathcal{F}_i$;
3: while iteration < iteration_max do
4:   Generate $Pop$ offloading decisions based on GOD Algorithm 2: $\mathcal{D} = \{\mathbf{D}_1, \ldots, \mathbf{D}_{Pop}\}$;
5:   The optimal subcarrier allocation, power allocation and computing resource allocation under the selected best offloading decision in the population are calculated based on AWCR Algorithm 3: $\{\mathbf{D}^*, \{r^*_{i,j}\}, \mathbf{W}^*, \mathbf{P}^{Up*}, \mathbf{P}^{Down*}\}$;
6:   Accelerate convergence by performing local search;
7:   Update the global pheromone based on (18);
8:   iteration = iteration + 1;
9: end while
10: Output: the best offloading decision and corresponding optimal subcarrier allocation, power allocation and computing resource allocation

C. Feasible Pattern Pruning

Because the solution space of feasible patterns is very large, many feasible patterns are not actually available. The unavailable feasible patterns for each task are removed during the initialization to prune the number of feasible offloading decisions (Line 2).

Since the available computing resources and communication resources of mobile users are different in each pattern, if the requirements of the minimum computing resources and communication resources to complete the task in this pattern are not satisfied, the pattern of the task is not feasible. Then, the lower bound of computing resources $\hat{r}_{i,j}$ of task $M_i$ under pattern $j$ is calculated based on the time constraint of task completion C12 (all tasks must be completed within the time constraint), which is given by:

$$\hat{r}_{i,j} = \begin{cases} \dfrac{C_i}{T^{max}_i} & \text{if } j = i \\[2ex] \dfrac{C_i}{T^{max}_i - \dfrac{I_i}{R^{Up}_{i,j}} - \dfrac{O_i}{R^{Down}_{i,j}}} & \text{otherwise,} \end{cases} \quad \forall i \in \mathcal{N} \tag{12}$$

where $\hat{r}_{i,j}$ represents the lower bound of the required computing resource under the pattern $j$ for task $M_i$ to meet its maximum delay constraint, and $R^{Up}_{i,j}$ and $R^{Down}_{i,j}$ represent the transmission rate under the condition of allocating the maximum transmission power, namely, the maximum values of uplink and downlink transmission rates.

Then we give the necessary conditions for the successful execution of the feasible pattern based on C4 − C6, as shown below:

$$\sum_{i=1}^{|\mathcal{N}|} d_{i,j}\, \hat{r}_{i,j} \leq r^{max}_{j} \ \text{ and } \ \hat{r}_{i,j} \geq 0, \quad \forall i \in \mathcal{N},\ j \in \mathcal{F} \tag{13}$$

According to this condition, if the pattern cannot be satisfied, it will be pruned, and a set of all available feasible patterns $\mathcal{F}_i$ can be generated for the offloading decision of mobile user $i \in \mathcal{N}$.

D. Upper Level Optimization

The goal of the upper level optimization is to minimize the total energy consumption of all mobile users by optimizing the generation of offloading decisions. As mentioned before, using deterministic algorithms to find the optimal solution for the problem is difficult, if not impossible. Evolutionary algorithms, as population-based stochastic algorithms, are widely used to solve NP-hard problems; examples include Particle Swarm Optimization (PSO), Differential Evolution (DE) and Ant Colony System (ACS). The upper level optimization consists of three parts: a Generation of Offloading Decision (GOD) algorithm based on ACS, pheromone management, and local search. Each part is discussed below.

1) GOD Algorithm: In the ACS algorithm, ants travel in order to reach the destination and communicate indirectly through pheromones. The travel distance is used as heuristic information and works with pheromones to influence the selection of the ant colony. In our algorithm, the location selection of ants is transformed into the offloading decision pattern selection of the task, and the energy consumption brought by the selected pattern is taken as heuristic information. For each task, the ant selects a pattern from the set of available feasible patterns, and then the offloading decisions of all tasks are obtained.

It should be noted that if each ant chooses the pattern with the highest pheromone concentration and heuristic information, stagnation will occur. In other words, the algorithm will converge to a local optimal solution prematurely and cannot find the global optimal solution. Hence, randomness is added into the allocation rule as a remedy. Some ants follow the allocation strategy of the highest pheromone concentration and heuristic information, and some ants follow a random allocation strategy based on a probability distribution. The selection rule of pattern $j^*$ from $\mathcal{F}_i$ is given by

$$j^* = \begin{cases} \arg\max_{k \in \mathcal{F}_i} \tau_{i,k} (\eta_{i,k})^{\beta}, & \text{if } q \leq q_0 \\ J, & \text{otherwise} \end{cases} \tag{14}$$

where $q$ is a random number uniformly distributed on $[0, 1]$, $\tau$ is the pheromone, $\eta$ is the heuristic information, and $\beta$ is the parameter that determines the relative importance between the pheromone and heuristic information. $J$ is a random variable
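The pruning bound in (12) and the pattern selection rule in (14) can be sketched in a few lines. This is a minimal illustration rather than the GOD algorithm itself: the function names and numbers are hypothetical, the pruning test checks each task on its own against $r^{max}_j$ (a simplification of the summed condition (13)), and, because the definition of $J$ is cut off in this text, the exploration branch assumes the standard ACS random-proportional rule in which a pattern is drawn with probability proportional to $\tau (\eta)^{\beta}$.

```python
import random


def min_compute_resource(cycles, t_max, input_bits, output_bits,
                         rate_up, rate_down, local):
    """Eq. (12): smallest computing resource that still meets the deadline."""
    if local:  # j = i, no transmission delay
        return cycles / t_max
    slack = t_max - input_bits / rate_up - output_bits / rate_down
    if slack <= 0:
        return float("inf")  # transfer alone misses the deadline
    return cycles / slack


def prune_patterns(required, capacity):
    """Keep pattern j only if its lower bound fits within capacity r_j^max."""
    return [j for j, need in required.items() if need <= capacity[j]]


def select_pattern(feasible, tau, eta, beta, q0, rng):
    """Eq. (14): exploit the best score with probability q0, else explore."""
    scores = {k: tau[k] * eta[k] ** beta for k in feasible}
    if rng.random() <= q0:
        return max(scores, key=scores.get)
    # Assumed definition of J: roulette-wheel draw proportional to the score.
    patterns = list(scores)
    return rng.choices(patterns, weights=[scores[k] for k in patterns], k=1)[0]


# Hypothetical task: 1 Gcycle, 1 s deadline, 1 Mbit in, 0.1 Mbit out.
required = {
    0: min_compute_resource(1e9, 1.0, 1e6, 1e5, 2e6, 4e6, local=False),
    1: min_compute_resource(1e9, 1.0, 1e6, 1e5, 1e6, 1e6, local=False),
}
capacity = {0: 1e10, 1: 1e9}
feasible = prune_patterns(required, capacity)  # pattern 1 cannot meet the deadline
rng = random.Random(0)
choice = select_pattern(feasible, {0: 1.0}, {0: 0.5}, beta=2.0, q0=0.9, rng=rng)
print(feasible, choice)
```

A larger `q0` makes the ants greedier, while a smaller value spreads the population over more patterns, which is the stagnation remedy the text describes.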
in other ants' tours. In this way ants will make better use of pheromone information: without local pheromone updating, all ants would search in a narrow neighborhood of the best previous tour.

After a task has been assigned a pattern, the local pheromones are updated to reduce the probability of the same pattern being assigned in different conditions. The local pheromone update rule is formulated as

accelerate the convergence. In the local search, the feasible patterns for each task are checked by priority. For each available feasible pattern $j$ of $M_i$ (expressed as $j_c$), if the total energy consumption is reduced through switching the assigned pattern to $j_c$, then this adjustment is successful and accepted. The above steps are repeated until every available feasible pattern of every task has been checked.
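The local search step described above can be sketched as a greedy pass over the tasks. This is only an illustration under simplifying assumptions: `energy` is a hypothetical callable standing in for the lower-level evaluation of $E^{sum}$, the priority order is taken to be the order of each task's pattern list, and coupling constraints such as C3 are not checked.

```python
def local_search(decision, feasible, energy):
    """Try every feasible pattern j_c of every task, keep switches that save energy.

    decision: list where decision[i] is the pattern currently assigned to task i
    feasible: list where feasible[i] lists the available patterns of task i
    energy:   callable mapping a decision list to its total energy consumption
    """
    best = list(decision)
    best_energy = energy(best)
    for i, patterns in enumerate(feasible):
        for jc in patterns:
            if jc == best[i]:
                continue  # already assigned, nothing to switch
            trial = best.copy()
            trial[i] = jc
            trial_energy = energy(trial)
            if trial_energy < best_energy:  # accept only strict improvements
                best, best_energy = trial, trial_energy
    return best, best_energy


# Hypothetical per-task energy table: cost[i][j] is the energy of task i in pattern j.
cost = [{0: 3.0, 1: 1.0}, {0: 2.0, 2: 5.0}]


def toy_energy(d):
    return sum(cost[i][j] for i, j in enumerate(d))


print(local_search([0, 2], [[0, 1], [0, 2]], toy_energy))
```

Because only strict improvements are accepted, the returned energy can never exceed that of the starting decision.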
1) Using DQN to Optimize the Lower Level: In order to solve the lower optimization problem and get the optimal $E^{sum}$ and $\mathbf{D}^*$ in the population, the DQN algorithm is used to find the optimal channel allocation $\mathbf{W}^*$, power allocation $\mathbf{P}^{Up*}$, $\mathbf{P}^{Down*}$ and computing resource allocation $\{r^*_{i,j}\}$ of each user $i$ according to the task decision provided by the upper level. In the following, the states, actions, and rewards for the DQN algorithm are specified.

• State ($st$): The system state consists of two components, $st = \{i, j\}$, where $i$ is the current user and $j$ is the execution pattern of task $M_i$ of current user $i$.

• Action ($a$): In our system, the action is represented by $a$, which consists of three parts: channel selection, uplink power selection, and downlink power selection. The channel selection is a vector $\Pi = [\xi_1, \ldots, \xi_i, \ldots, \xi_{|\mathcal{N}|}]$, where $\xi_i = \{\xi_{i,1}, \ldots, \xi_{i,s}, \ldots, \xi_{i,|\mathcal{S}|}\}$ represents the channel selection of user $i$, and $\xi_{i,s} = 1$ means that user $i$ occupies channel $s$. The uplink power selection is a vector $\Lambda^{Up} = [p^{Up}_1, \ldots, p^{Up}_i, \ldots, p^{Up}_{|\mathcal{N}|}]$, where $p^{Up}_i = \{p^{Up}_{i,1}, \ldots, p^{Up}_{i,k}, \ldots, p^{Up}_{i,\max-1}\}$ represents the uplink power selection of user $i$, and $p^{Up}_{i,k} = 1$ means that user $i$ is assigned the $k$-th discretized level of the maximum uplink power $P^{Up}_{max}$. The downlink power selection is a vector $\Lambda^{Down} = [p^{Down}_1, \ldots, p^{Down}_j, \ldots, p^{Down}_{|\mathcal{N}|}]$, where $p^{Down}_j = \{p^{Down}_{j,1}, \ldots, p^{Down}_{j,k}, \ldots, p^{Down}_{j,\max-2}\}$ represents the downlink power selection of selection pattern $j$, and $p^{Down}_{j,k} = 1$ means that the MEC server ($j = 0$) or mobile user ($j > 0$, $j \neq i$) is assigned the $k$-th discretized level of the maximum downlink power $P^{Down}_{max}$. Combining the vectors $\Pi$, $\Lambda^{Up}$ and $\Lambda^{Down}$, the system action is as follows:

$$a = \begin{bmatrix} \xi_1, \ldots, \xi_i, \ldots, \xi_{|\mathcal{N}|}, \\ p^{Up}_1, \ldots, p^{Up}_i, \ldots, p^{Up}_{|\mathcal{N}|}, \\ p^{Down}_1, \ldots, p^{Down}_j, \ldots, p^{Down}_{|\mathcal{N}|} \end{bmatrix}$$

• Reward ($R$): For each step, after performing a possible action $a$, a reward $R$ at a certain state $st$ is obtained. In general, the reward function should be related to the objective function. The computing resource allocation $r_{i,j}(\mathbf{D}_k)$ can be derived, which is given by:

$$r_{i,j}(\mathbf{D}_k) = \begin{cases} \dfrac{C_i}{T^{max}_i} & \text{if } j = i \\[2ex] \dfrac{C_i}{T^{max}_i - \dfrac{I_i}{R^{Up}_{i,j}(\mathbf{D}_k)} - \dfrac{O_i}{R^{Down}_{i,j}(\mathbf{D}_k)}} & \text{otherwise,} \end{cases} \quad \forall i \in \mathcal{N} \tag{20}$$

where $r_{i,j}(\mathbf{D}_k)$ denotes the lower bound of the computing resource in pattern $j$ for task $M_i$ to meet the delay constraint, for a given offloading decision matrix $\mathbf{D}_k$; and $R^{Up}_{i,j}(\mathbf{D}_k)$ and $R^{Down}_{i,j}(\mathbf{D}_k)$ denote the transmission rate under the given subcarrier allocation and transmission power allocation with action $a$. Through the analysis of (7), it can be seen that the computing energy consumption is proportional to the square of the computing resources. Therefore, the lower bound of the computing resources is the optimal value. The corresponding computing resource allocation $r_{i,j}(\mathbf{D}_k)$ and the energy consumption $E$ can be obtained. Since the reward is negatively correlated with the total energy consumption, $-E$ is used as the reward. The minimum energy consumption corresponds to the maximum return.

2) AWCR Algorithm: In the AWCR algorithm, there are an experience replay memory $Pool$ and two neural networks. The replay memory pool has capacity ο and stores the observed experience $(st^l, a^l, R^l, st^{l+1})$. The neural networks are trained by the experience in the replay memory. The two neural networks include an eval_net and a target_net, and $Q_{eval}$ and $Q_{tar}$ represent their Q-values under observed state $st^l$. These two Q-values are used to update the parameters of the neural network by gradient descent. In particular, the eval_net and the target_net have the same neural network structure, with parameter vectors $\omega$ and $\omega'$, respectively.

The eval_net is used to indicate $Q_{eval}$ based on the current network parameter $\omega$, which is used to approximate the Q-value of action $a^l$ in the current state $st^l$. The calculation formula of $Q_{eval}$ is given as follows:

$$Q_{eval}(st^l, a^l, \omega) = Q(st^l, a^l, \omega) \tag{21}$$

The target_net is used to indicate $Q_{tar}$ based on the current network parameter $\omega'$, which is given by:

$$Q_{tar}(st^{l+1}, a^{l+1}, \omega') = R + \gamma \max_{a^{l+1}} Q(st^{l+1}, a^{l+1}, \omega') \tag{22}$$

where $R$ is the reward for selecting action $a^l$ in the current state $st^l$, $Q(st^{l+1}, a^{l+1}, \omega')$ is used to approximate the Q-value of action $a^{l+1}$ in the future state $st^{l+1}$, and $\gamma$ is the discount factor that determines the relative importance between the current reward and the future reward.

Then the loss function is calculated according to the mean square error (MSE) between $Q_{eval}$ under the eval_net and $Q_{tar}$ under the target_net, which is given by

$$L(\omega) = \mathbb{E}\left[ \left( Q_{tar}(st^{l+1}, a^{l+1}, \omega') - Q_{eval}(st^l, a^l, \omega) \right)^2 \right] \tag{23}$$

In order to minimize the difference between $Q_{eval}$ and $Q_{tar}$, the gradient descent method is used to update the network weight parameter $\omega$, which is given by

$$\Delta\omega = \frac{\partial L(\omega)}{\partial \omega} = \mathbb{E}\left[ \left( Q_{tar}(st^{l+1}, a^{l+1}, \omega') - Q_{eval}(st^l, a^l, \omega) \right) \frac{\partial Q_{eval}(st^l, a^l, \omega)}{\partial \omega} \right] \tag{24}$$

Algorithm 3 shows the details of the AWCR algorithm. After the training loop is completed, the given offloading decision population $\mathcal{D}$ of the upper layer is fed into the trained neural network, and the selected offloading decision, channel allocation, power allocation and computing resource allocation can be obtained. Thus, the current local optimal solution $\{\mathbf{D}_k, \{r_{i,j}(\mathbf{D}_k)\}, \mathbf{W}_k, \mathbf{P}^{Up}_k, \mathbf{P}^{Down}_k\}$ and the corresponding minimum energy consumption $E^{sum}$ can be obtained. Then these results
are fed back to the upper level, which updates the pheromone
accordingly.
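The eval_net / target_net update in Eqs. (21)-(24) can be sketched in a few lines. This is a minimal illustration, not the AWCR implementation: the neural network is replaced by a linear model (one weight vector per action), the function names, the learning rate `lr`, and the toy transition are all hypothetical, and the sign and constant factor that come from differentiating the squared error are assumed to be absorbed into `lr`.

```python
GAMMA = 0.9  # discount factor gamma from the simulation settings


def q_value(weights, state, action):
    """Linear stand-in for the network: Q(st, a, w) = w[a] . st."""
    return sum(w * x for w, x in zip(weights[action], state))


def td_target(target_w, reward, next_state, gamma=GAMMA):
    """Eq. (22): R + gamma * max over next actions of the target_net value."""
    return reward + gamma * max(
        q_value(target_w, next_state, a) for a in range(len(target_w)))


def squared_error(eval_w, target_w, transition, gamma=GAMMA):
    """Single-sample version of the loss in Eq. (23)."""
    state, action, reward, next_state = transition
    diff = (td_target(target_w, reward, next_state, gamma)
            - q_value(eval_w, state, action))
    return diff * diff


def update_eval(eval_w, target_w, transition, lr=0.1, gamma=GAMMA):
    """One step along Eq. (24): w += lr * (Q_tar - Q_eval) * dQ_eval/dw.

    For the linear model, dQ_eval/dw[action] is simply the state vector.
    """
    state, action, reward, next_state = transition
    diff = (td_target(target_w, reward, next_state, gamma)
            - q_value(eval_w, state, action))
    eval_w[action] = [w + lr * diff * x for w, x in zip(eval_w[action], state)]


def sync_target(eval_w):
    """Copy the eval_net parameters into the target_net."""
    return [row[:] for row in eval_w]


# Toy problem (hypothetical): 2 state features, 2 actions, one fixed
# transition (state, action, reward, next_state).
eval_w = [[0.0, 0.0], [0.0, 0.0]]
target_w = sync_target(eval_w)
transition = ([1.0, 0.0], 0, 1.0, [0.0, 1.0])

loss_before = squared_error(eval_w, target_w, transition)
for step in range(1, 51):
    update_eval(eval_w, target_w, transition)
    if step % 10 == 0:  # sync period chosen for the toy run only
        target_w = sync_target(eval_w)
loss_after = squared_error(eval_w, target_w, transition)
```

Only the eval_net weights move at every step; the target_net is refreshed periodically, which keeps the regression target in Eq. (22) fixed between refreshes.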
A. Simulation Settings
Suppose that, in a 2000 m × 2000 m area, the base station is located at the center of the area and all mobile users are randomly distributed in the area. In addition, $C_i$ ($i \in N$) is randomly distributed in [100, 2500] Mega cycles, $I_i$ ($i \in N$) is randomly distributed in [1, 1200] bytes, and $O_i$ ($i \in N$) is randomly distributed in [1, $I_i$] bytes. The computing power $r_j^{max}$ of each mobile device is one of four specifications: 0.5 GHz, 0.8 GHz, 1.0 GHz, or 1.5 GHz. The default setting for $T_i^{max}$ ($i \in N$) is 1.0 s, and $B_N$ is 12.5 kHz. According to the physical interference model [43], we set the channel gain $g = d^{-4}$, where $d$ is the propagation distance. Similar to [15], the transmission powers $p^{max}_{up}$ and $p^{max}_{down}$ are 1.3 W and 0.8 W, respectively. The background noise $\delta^2$ is −113 dBm, the CPU frequency of the MEC server is 20 GHz, and $k$ is $10^{-27}$.

The population size and maximum iteration number of GOD are set as $Pop = 50$ and $iteration_{max} = 300$, respectively. Other parameter settings of the ant colony in GOD are similar to [42]: $q_0 = 0.9$, $\beta = 2$, $\alpha = \rho = 0.1$. The discount factor and replay memory capacity of AWCR are set as $\gamma = 0.9$ and ο = 2000, respectively. The parameter updating frequency for target_net, $\vartheta$, is set to 100. Other parameter settings of the neural network are consistent with [44].

In this paper, two performance indicators are adopted, namely, task Completion Rate (CR) and total Energy Consumption (EC). CR records the completion rate of tasks under the delay constraint. EC represents the total energy consumption of the mobile users in the entire system.

Fig. 3. Average EC(J) value vs. different β for N = 100.

B. Comparison Parameter Settings
We compare different parameter settings on average EC to show the effect of each parameter on the optimal solution. In Fig. 2, we compare the impact of different discount factors γ on average energy consumption. It can be seen that a small γ value underestimates the future reward, while a large γ value overemphasizes it. Fig. 2 shows that γ = 0.9 is the best setting for balancing the current reward against the future reward. In Fig. 3, we compare different β on average energy consumption. The parameter β determines the relative importance of heuristic information and pheromone. When β = 1, pheromone and heuristic information are equally important, and the convergence performance is poor. When β = 3, the importance of heuristic information is overestimated. Fig. 3 shows that β = 2 is the best setting, giving the lowest EC and the fastest convergence.

C. Comparison With Exhaustive Search
In order to verify the effectiveness of JGA, it is compared with Exhaustive Search (ES). Exhaustive search is a brute-force method, which obtains the global optimal solution of the optimization problem by traversing and comparing all feasible solutions. The problem with exhaustive
search is that the complexity of the algorithm increases exponentially as the number of users increases. The energy consumption of JGA and exhaustive search on a small user scale is shown in Fig. 4.

Fig. 4. Comparison of average EC(J) value performance of $T_i^{max}$ = 1s between exhaustive search and JGA.

It can be seen that the performance of the proposed JGA matches well with the global optimal solution obtained by the exhaustive search method. Therefore, in today's large-scale networks, the proposed algorithm, which significantly reduces the execution time compared with ES, provides a practical and efficient solution.

D. Comparison With BiJOR
Fig. 5 presents the average energy consumption comparison with BiJOR [15] for the same number of users. BiJOR is a bi-level optimization algorithm. The upper level uses ACS to generate the offloading decision. The difference is that the lower level uses a simple monotone optimization method to optimize the allocation of computing resources. The energy consumption performance of the two systems is compared under two different delay constraints. In order to ensure the validity of the simulation, the same data as BiJOR is used for the comparative simulation. It can be seen that the proposed JGA achieved energy saving over BiJOR, more

Fig. 5. Comparison of average EC(J) value performance between BiJOR and JGA.

E. Effectiveness of JGA
In this section, we study the effectiveness of our algorithm, and compare JGA with the following five schemes:
• Local Execution (LE): All tasks are executed locally.
• MEC Execution (ME): All tasks are offloaded to MEC.
• Without Collaborative (WC): Tasks can be performed on local devices or MEC servers.
• Without MEC (WM): Tasks can be performed on local or collaborative devices.
• Random: All tasks select execution pattern randomly.

Firstly, we compare the CR values of JGA and these five schemes. It can be seen that LE, ME, WC, WM and Random cannot guarantee that all tasks are completed, while JGA's completion rate (CR) is 1, which ensures that all tasks are successfully executed over the whole range of the simulation. In the JGA algorithm, the collaboration decision, computation offloading and resource allocation are considered jointly, so the optimal resource allocation scheme is obtained. Therefore, the final result achieves a 100% task completion rate. Because the computing capacity of mobile devices and the MEC server is limited, and the conditions required to complete the tasks are harsh, it is difficult to achieve a 100% task completion rate with the other schemes. It can be seen that even on resource-poor mobile devices, JGA can guarantee a 100% completion rate of all user tasks under dense network conditions.

As shown in Fig. 7, the CR value of ME decreases as the number of users increases, because the MEC has limited computing power. The CR value of WC decreases with the increasing number of users as well. For LE, WM and Random, the completion rate changes only slightly as the number of users increases, because each task is executed locally or on a collaborative device with certain computing power. Therefore, their performance is not significantly affected by the number of users. The collaborative mobile edge computing system combines the advantages of MEC servers and mobile devices, and can take advantage of idle computing resources on mobile devices. JGA alleviates the above problems effectively and achieves the best performance. In addition, as shown in Fig. 8, we compare the CR value under different delay constraints and MEC server processing capacities. It can be seen that the JGA algorithm achieves a 100% CR in all the simulation settings.

Secondly, we compare the average EC under the condition that the CR is not 100%, that is, when we cannot guarantee that all tasks are completed before the deadline. Because we ignore the energy consumption of the MEC server, the EC value of ME is not involved in this comparison. Therefore, we compare the EC values of JGA and the other four schemes.
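The completion-rate argument for LE and ME can be made concrete with the lower bound of Eq. (20). The sketch below is an illustration under stated assumptions, not the simulation used in the paper: the function names are hypothetical, data sizes are taken in bits and rates in bit/s, the 1 Mbit/s link rates and the sample tasks are invented for the example, and the MEC server is assumed to admit the cheapest tasks first until its 20 GHz budget is exhausted.

```python
import math


def min_compute_rate(c_cycles, t_max, in_bits=0.0, out_bits=0.0,
                     r_up=math.inf, r_down=math.inf):
    """Lower bound on the CPU rate (cycles/s) from Eq. (20).

    With the default arguments there is no transmission, which is the
    local case j = i: C_i / T_i^max.
    """
    slack = t_max - in_bits / r_up - out_bits / r_down
    if slack <= 0:
        return math.inf  # transmission alone already misses the deadline
    return c_cycles / slack


def local_completion_rate(cycles, freqs, t_max):
    """Fraction of tasks whose own device meets the Eq. (20) bound."""
    done = sum(1 for c, f in zip(cycles, freqs)
               if min_compute_rate(c, t_max) <= f)
    return done / len(cycles)


def mec_completion_rate(requirements, capacity):
    """Greedy admission (an assumption): serve the cheapest tasks first
    until the MEC server's total CPU rate is used up."""
    used, done = 0.0, 0
    for need in sorted(requirements):
        if used + need > capacity:
            break
        used += need
        done += 1
    return done / len(requirements)


# Hypothetical tasks: workloads in cycles, device speeds in Hz.
cycles = [400e6, 900e6, 1500e6, 2500e6]
freqs = [0.5e9, 0.8e9, 1.0e9, 1.5e9]
cr_local = local_completion_rate(cycles, freqs, t_max=1.0)

# Every user offloads a 2 Gcycle task: 1200 bytes up and 600 bytes down,
# both at an assumed 1 Mbit/s, against a 20 GHz MEC server.
need = min_compute_rate(2e9, 1.0, 1200 * 8, 600 * 8, 1e6, 1e6)
cr_mec = {n: mec_completion_rate([need] * n, 20e9) for n in (5, 10, 20)}
```

In this toy setting only the lightest task finishes locally within 1 s, and the MEC-only completion rate falls once the summed lower bounds exceed the server capacity, which is the qualitative trend described for LE and ME.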
Fig. 7. Comparison of CR with different number of users, $T_i^{max}$ = 1s and $r_0^{max}$ = 20GHz.

Fig. 9. Comparison of EC with different number of users, $T_i^{max}$ = 1s and $r_0^{max}$ = 20GHz.
[23] M. Li, N. Cheng, J. Gao, Y. Wang, L. Zhao, and X. Shen, “Energy-efficient UAV-assisted mobile edge computing: Resource allocation and trajectory optimization,” IEEE Trans. Veh. Technol., vol. 69, no. 3, pp. 3424–3438, Mar. 2020.
[24] Z. Kuang, G. Liu, G. Li, and X. Deng, “Energy efficient resource allocation algorithm in energy harvesting-based D2D heterogeneous networks,” IEEE Internet Things J., vol. 6, no. 1, pp. 557–567, Feb. 2019.
[25] Z. Kuang, L. Zhang, and L. Zhao, “Energy- and spectral-efficiency tradeoff with α-fairness in energy harvesting D2D communication,” IEEE Trans. Veh. Technol., vol. 69, no. 9, pp. 9972–9983, Jun. 2020.
[26] X. Chen, Z. Zhou, W. Wu, D. Wu, and J. Zhang, “Socially-motivated cooperative mobile edge computing,” IEEE Netw., vol. 32, no. 6, pp. 177–183, Nov./Dec. 2018.
[27] Y. Li, X. Wang, X. Gan, H. Jin, L. Fu, and X. Wang, “Learning-aided computation offloading for trusted collaborative mobile edge computing,” IEEE Trans. Mobile Comput., vol. 19, no. 12, pp. 2833–2849, Dec. 2020.
[28] L. Ruan, Y. Yan, S. Guo, F. Wen, and X. Qiu, “Priority-based residential energy management with collaborative edge and cloud computing,” IEEE Trans. Ind. Informat., vol. 16, no. 3, pp. 1848–1857, Mar. 2020.
[29] X. Wang, C. Wang, X. Li, V. C. M. Leung, and T. Taleb, “Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching,” IEEE Internet Things J., vol. 7, no. 10, pp. 9441–9455, Oct. 2020.
[30] X. Wang, R. Li, C. Wang, X. Li, T. Taleb, and V. C. M. Leung, “Attention-weighted federated deep reinforcement learning for device-to-device assisted heterogeneous collaborative edge caching,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 154–169, Jan. 2021.
[31] J. Feng, F. R. Yu, Q. Pei, X. Chu, J. Du, and L. Zhu, “Cooperative computation offloading and resource allocation for blockchain-enabled mobile-edge computing: A deep reinforcement learning approach,” IEEE Internet Things J., vol. 7, no. 7, pp. 6214–6228, Jul. 2020.
[32] Y. Huang, Y. Liu, and F. Chen, “NOMA-aided mobile edge computing via user cooperation,” IEEE Trans. Commun., vol. 68, no. 4, pp. 2221–2235, Apr. 2020.
[33] U. Saleem, Y. Liu, S. Jangsher, Y. Li, and T. Jiang, “Mobility-aware joint task scheduling and resource allocation for cooperative mobile edge computing,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 360–374, Jan. 2021.
[34] M. Li, J. Gao, L. Zhao, and X. Shen, “Deep reinforcement learning for collaborative edge computing in vehicular networks,” IEEE Trans. Cognit. Commun. Netw., vol. 6, no. 4, pp. 1122–1135, Dec. 2020.
[35] U. Saleem, Y. Liu, S. Jangsher, X. Tao, and Y. Li, “Latency minimization for D2D-enabled partial computation offloading in mobile edge computing,” IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4472–4486, Apr. 2020.
[36] Y. Wu, Y. Wang, F. Zhou, and R. Q. Hu, “Computation efficiency maximization in OFDMA-based mobile edge computing networks,” IEEE Commun. Lett., vol. 24, no. 1, pp. 159–163, Jan. 2020.
[37] M. Masoudi and C. Cavdar, “Device vs edge computing for mobile services: Delay-aware decision making to minimize power consumption,” IEEE Trans. Mobile Comput., early access, Jun. 3, 2020, doi: 10.1109/TMC.2020.2999784.
[38] S. Rezvani, N. Mokari, M. R. Javan, and E. A. Jorswieck, “Fairness and transmission-aware caching and delivery policies in OFDMA-based HetNets,” IEEE Trans. Mobile Comput., vol. 19, no. 2, pp. 331–346, Feb. 2020.
[39] P. Zhao, H. Tian, K.-C. Chen, S. Fan, and G. Nie, “Context-aware TDD configuration and resource allocation for mobile edge computing,” IEEE Trans. Commun., vol. 68, no. 2, pp. 1118–1131, Feb. 2020.
[40] X. Chen, L. Pu, L. Gao, W. Wu, and D. Wu, “Exploiting massive D2D collaboration for energy-efficient mobile edge computing,” IEEE Wireless Commun., vol. 24, no. 4, pp. 64–71, Aug. 2017.
[41] W. Zhang, Y. Wen, K. Guan, D. Kilper, H. Luo, and D. O. Wu, “Energy-optimal mobile cloud computing under stochastic wireless channel,” IEEE Trans. Wireless Commun., vol. 12, no. 9, pp. 4569–4581, Sep. 2013.
[42] M. Dorigo and L. M. Gambardella, “Ant colony system: A cooperative learning approach to the traveling salesman problem,” IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 53–66, Apr. 1997.
[43] T. Rappaport, Wireless Communications: Principle and Practice, vol. 2. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[44] Q. Luo, C. Li, T. H. Luan, and W. Shi, “Collaborative data scheduling for vehicular edge computing via deep reinforcement learning,” IEEE Internet Things J., vol. 7, no. 10, pp. 9637–9650, Oct. 2020.

Lin Tan received the B.Eng. degree in network engineering from Qingdao University of Technology, Qingdao, China, in 2019. He is currently pursuing the M.Sc. degree in computer technology with the Central South University of Forestry and Technology. His current research interests include mobile edge computing, and optimization algorithm and its application.

Zhufang Kuang (Member, IEEE) received the M.Sc. degree from the National University of Defense Technology, Changsha, China, in 2006, and the Ph.D. degree in computer science from Central South University, Changsha, in 2012. He was a Post-Doctoral Researcher with the School of Software, Central South University. From 2015 to 2016, he was a Visiting Scholar/a Professor with the University of Victoria, Victoria, BC, Canada. He is currently a Full Professor with the Department of Computer Science, Central South University of Forestry and Technology. His current research interests include wireless communications and networking, mobile edge computing, and optimization algorithm and its application. He is a Senior Member of CCF, and a member of the CCF Network and Data Communications Council and ACM. He is the Vice Chair of CCF YOCSEF CHANGSHA from 2018 to 2019.

Lian Zhao (Senior Member, IEEE) received the Ph.D. degree from the University of Waterloo, Canada, in 2002. She joined the Department of Electrical, Computer and Biomedical Engineering at Ryerson University, Toronto, Canada, in 2003, where she was a Professor in 2014. Her research interests are in the areas of wireless communications, radio resource management, mobile edge computing, caching and communications, and the Internet-of-Vehicles networks. She served as a Committee Member for the Natural Science and Engineering Research Council of Canada (NSERC) Discovery Grants Evaluation Group for Electrical and Computer Engineering from 2015 to 2018. She is a Senior Member of the IEEE Communication and Vehicular Society and a Licensed Professional Engineer in the Province of Ontario. She received the Best Land Transportation Paper Award from the IEEE Vehicular Technology Society in 2016, the Best Paper Award from the 2013 International Conference on Wireless Communications and Signal Processing (WCSP), and Canada Foundation for Innovation (CFI) New Opportunity Research Award in 2005. She served as the Co-Chair of Wireless Communication Symposium for IEEE Globecom 2020 and IEEE ICC 2018, and Communication Theory Symposium for IEEE Globecom 2013. She has been serving as an Editor for IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, and IEEE INTERNET OF THINGS JOURNAL. She has been selected as an IEEE Communication Society (ComSoc) Distinguished Lecturer (DL) in 2020 and 2021.

Anfeng Liu received the M.Sc. and Ph.D. degrees in computer science from Central South University, China, in 2002 and 2005, respectively. He was a Post-Doctoral Researcher with the School of Electronic Science and Technology, National University of Defense Technology, Changsha, China. From 2009 to 2012, he was a Visiting Scholar/a Professor with the Department of Electrical and Computer Engineering (ELCE), University of Waterloo, Canada. He is currently a Full Professor with the School of Computer Science and Engineering, Central South University. His current research interests include wireless sensor networks and mobile edge computing. He is also a member (E200012141M) of China Computer Federation (CCF).