Paper 3

Engineering Applications of Artificial Intelligence 116 (2022) 105439
Machine learning approach for truck-drones based last-mile delivery in the era of industry 4.0
Ali Arishi a,b,∗, Krishna Krishnan a, Majed Arishi c
a Department of Industrial, Systems, and Manufacturing Engineering, Wichita State University, Wichita, KS 67220, USA
b Department of Industrial Engineering, King Khalid University, Abha, 62529, Saudi Arabia
c Department of Civil and Environmental Engineering, University of South Florida, Tampa, FL 33620, USA
ARTICLE INFO

Keywords: Last-mile delivery; Truck-drone system; Machine learning; Constrained clustering; Deep reinforcement learning; Operational cost

ABSTRACT

Under the vision of industry 4.0, the integration of drones in last-mile delivery can transform traditional delivery practices and provide competitive advantages. However, the combinatorial nature of the routing problem and the technical limitations of the drones present a real challenge for adopting a stand-alone drone delivery as an alternative to truck delivery. This study introduces the Parking Location and Traveling Salesman Problem with Homogeneous Drones (PLTSPHD). This problem considers a scenario in which a single truck carries identical drones along with parcels from the depot to preassigned launching/parking sites, from where the drones complete the last-mile deliveries. In contrast to previous studies that tackle truck-drone delivery using conventional optimization approaches, this paper proposes a two-phase machine learning (ML) approach for the PLTSPHD, which minimizes the total operational cost of the last-mile problem. The proposed ML approach for PLTSPHD consists of clustering and routing phases. In the first phase, a constrained k-means clustering algorithm is proposed to cluster delivery locations based on the maximum flight range and number of available drones per truck. A deep reinforcement learning (DRL) model is then developed in the second stage to find an optimal route among all constrained clusters. Experimental results show that solving the presented truck-drone problem using the ML framework can significantly reduce the operational cost compared to standard truck delivery. The constrained clustering reduces the complexity of the routing problem while adhering to the constraints. In addition, the trained DRL model outperforms the state-of-art Google’s OR-tools solver and other types of well-known heuristics in terms of both solution quality and computation time. Moreover, a sensitivity analysis of different key parameters is conducted to highlight some key trade-offs in using multiple drones and their dependence on operating costs and problem sizes.
1. Introduction

The rise in e-commerce has attracted great attention to many delivery issues in the supply chain (SC). Most of the delivery issues are related to last-mile deliveries. As last-mile delivery logistics require a lot of different elements to ensure faster and more efficient delivery conditions, last-mile delivery problems have been considered as more expensive, more polluting, and less efficient than any other problems in SC (Ranieri et al., 2018).

Today, with advancements in technologies and the affordability of sensors, unmanned aerial vehicles (UAV) or drones have the potential to transform traditional delivery practices in SCs. The deployment of drones in the last-mile delivery can benefit SC in many ways, including reducing traveling costs, providing contact-free delivery, avoiding traffic congestion, and facilitating shorter delivery times (Rejeb et al., 2021). Several leaders in logistics and transportation investigate the use of drones for last-mile delivery, including UPS, DHL and Amazon (Jeong et al., 2020; Rejeb et al., 2021). Drones can provide fast and timely deliveries of medical supplies to remote or hard-to-access areas. They are much lighter and more energy efficient for transporting small packages. In case of disease outbreaks like COVID-19, drone delivery can eliminate human contact and provide a timely supply of vaccines to health facilities. Drone-based delivery can increase customer satisfaction and accelerate revolutions in logistics. The fact that 86% of parcels shipped by Amazon are light, weighing less than 5 lb (Guglielmo, 2013), suggests that in the near future, drones will play an important role in logistics, especially in last-mile delivery.

Despite their advantages, drones have two main technical limitations: limited payload and limited flight range. These constraints present a challenge to adopting stand-alone drone delivery as an alternative to truck delivery in the last mile. Therefore, researchers have proposed several types of truck-drone systems to overcome these constraints. The truck-drone delivery problem is NP-hard in nature; previous studies have focused on exact and heuristic methods to solve
∗ Corresponding author at: Department of Industrial, Systems, and Manufacturing Engineering, Wichita State University, Wichita, KS 67220, USA.
E-mail address: aaarishi@shockers.wichita.edu (A. Arishi).
https://doi.org/10.1016/j.engappai.2022.105439
Received 11 April 2022; Received in revised form 6 August 2022; Accepted 5 September 2022
Available online 20 September 2022
0952-1976/© 2022 Elsevier Ltd. All rights reserved.
non-practical truck-drone delivery problems. Exact approaches such as mixed integer programming (MIP) and dynamic programming exhaustively explore the search space to find the optimal solution. However, they require long computation time and are hardly applicable to problems with more than 50 customers. Classical heuristic approaches such as savings, path cheapest arc, and nearest neighbor (NN) have a polynomial running time but without a guarantee for optimality since they stop at local optima. Metaheuristic approaches such as guided local search, OR-tools solver, and genetic algorithms (GA) perform more exhaustive exploration of the search space to escape local optima. However, their run time can be intractable when the search space is too large for the algorithm to explore. Due to the combinatorial nature of the truck-drone delivery problem, the performance of the conventional approaches can be greatly affected by the size of the problem. Therefore, the current challenge is to develop a scalable and efficient approach for truck-drone delivery.

Recently, the application of machine learning (ML), especially deep reinforcement learning (DRL), has achieved great success in many fields (Luong et al., 2019; Silver et al., 2016). ML deals with developing algorithms that can learn from data and use the knowledge gained to solve problems. ML uses the power of computers to learn how to approximate a function instead of explicitly writing a program. ML can be broadly divided into supervised, unsupervised, and reinforcement learning (RL). Supervised learning trains algorithms with labeled datasets to predict future outcomes. On the other hand, the unsupervised learning approach learns to discover hidden patterns in data without training or pre-labeled datasets. RL is a special type of ML that focuses on sequential decision-making problems. In RL, an agent interacts with the environment in order to improve its performance through a series of trials and errors. The agent receives feedback from the environment after completing the task; therefore, it is not unsupervised. Also, it does not have a set of labeled data for training, so it cannot be considered supervised. The three learning paradigms can be used for solving combinatorial problems. Supervised ML models can be trained on a dataset consisting of inputs and corresponding optimal solutions. However, this approach is not practical because optimal solutions are expensive to obtain in practice, especially for large-scale problems. Unsupervised ML models can decompose large-scale problems and reduce their dimensionality. RL models can learn heuristics for directly solving combinatorial problems. In the case of small state and action spaces, RL can be used to find the optimal routing strategy. However, RL cannot learn a good strategy in a complex and large-scale problem due to the explosion in state and action spaces. Nowadays, with the rise of deep learning (DL), neural networks have proven their ability to work as high-quality function approximators for ML algorithms (Bengio et al., 2021; Ostad-Ali-Askari et al., 2017; Serrano, 2022; Wang et al., 2021). The word “deep” in DL refers to the depth of the neural network. In combinatorial optimization, DRL can effectively replace heavy computations with a fast approximation in a generic way without the need to design a new explicit algorithm for NP-hard problems (Wang and Tang, 2021). The combination of the two learning frameworks overcomes the issues of designing handcrafted heuristics and the need for domain knowledge required in conventional optimization methods.

In the era of industry 4.0, the application of artificial intelligence and smart logistics can transform traditional practices in the SC. However, in the review of truck-drone routing and ML approaches, it is found that researchers have focused on using exact and heuristic methods. On the other hand, DRL studies focused only on traditional truck delivery. In this study, we propose a hybrid ML approach for solving a new variant of the truck-drone delivery problem in the SC, namely the parking location and traveling salesman problem with homogeneous drones (PLTSPHD). This problem is designed to consider the limitations of routing a truck with multiple drones for last-mile deliveries. In PLTSPHD, a modern truck equipped with a fleet of homogeneous drones is used to deliver lightweight parcels to customer locations. The truck starts from the depot and visits a set of launching sites where the truck parks and drones can be concurrently deployed to make all final deliveries. The objective of PLTSPHD is to reduce the total operational cost of the truck-drone delivery. Our proposed ML approach for solving PLTSPHD consists of clustering and routing phases. In the first phase, an unsupervised constrained k-means clustering algorithm is proposed to cluster delivery locations based on the maximum flight range and the number of available drones per truck. A DRL model is then developed in the second phase to find an optimal route among all constrained clusters. Experimental results show that the last-mile delivery under the PLTSPHD formulation has the potential to provide significant cost savings, especially in large-scale problems. The proposed constrained k-means clustering approach effectively reduces the scale of the large routing problem while adhering to the operational constraints of the PLTSPHD. In addition, the trained DRL scales well and performs better than other non-learning algorithms. Finally, a sensitivity analysis of different key parameters is conducted to highlight some key trade-offs in using multiple drones and their dependence on operating costs and problem sizes. The main contributions of this research can be summarized as follows:

• We introduce a novel PLTSPHD which is a new variant of the truck-drone problem. PLTSPHD focuses on finding an optimal routing strategy that minimizes the total operational cost for the two-echelon last-mile delivery problem.
• To the best of our knowledge, this is the first study to develop a complete ML framework for solving large-scale PLTSPHD last-mile delivery.
• Our proposed constrained k-means algorithm divides the locations of customers in the delivery area based on user-specified constraints.
• We propose a DRL model to learn heuristics for PLTSPHD. In contrast with conventional methods that suffer from scalability issues and long computation time, the proposed DRL scales and performs well on problems not seen during the training. Once trained, the DRL model can be used to solve a collection of unseen PLTSPHD instances of different characteristics.
• We compare the performance of a number of different solution methods: DRL, OR-tools, GA, and NN in terms of both solution quality and computational time. OR-tools is a flexible and well-known solver that can solve different types of routing problems, while GA is one of the most applied metaheuristics for solving routing problems. We added NN heuristics to the comparison so that we have a combination of old and modern algorithms.
• We varied the number of customers, demand distribution, number of drones, size of service regions, and drone cost in our experiment to evaluate the effectiveness of the proposed approach.

The rest of this paper is organized as follows. Section 2 briefly reviews related research on the studied topic. Section 3 describes the PLTSPHD. Section 4 introduces the proposed two-phase ML approach. Experiments, statistical comparison, and sensitivity analysis are conducted and discussed in Section 5. Finally, Section 6 concludes the paper and highlights future works.

2. Literature review

2.1. Truck-drone routing problems

Over the last few years, studies regarding the use of truck-drone combinations in logistics have been rapidly growing with various configurations of the relationship between the two types of transportation. The truck-drone problem was first investigated in 2015 when Murray & Chu introduced drones to the traveling salesman problem (Murray and Chu, 2015). They presented an exact MILP model and heuristic algorithm to solve two variants of truck-drone problems. The first problem is the flying sidekick traveling salesman problem
(FSTSP), where a drone works in tandem with a truck to deliver parcels to a set of customers. The second problem is the parallel drone scheduling traveling salesman problem (PDSTSP). In PDSTSP, a single truck serves customers located far from the depot in a TSP route, while multiple drones serve the nearby customers directly from the depot. The objective of both variants is to minimize the completion time, that is, the time needed to serve all customers and return both vehicles (truck and drone) to the depot. Using conventional methods, the authors solved small-sized problems with up to 10 customers.

In addition to FSTSP and PDSTSP, there are other emerging studies on the truck-drone routing problems. Agatz et al. (2018) proposed a combination of exact and heuristic methods to minimize the operational cost of the traveling salesman problem with drone (TSPD). The exact method solved small problems with up to 12 customers to optimality, while the heuristic approach tackled instances with up to 100 customer nodes in less than 30 min. Ha et al. (2018) studied TSP-D with 100 customers using heuristics but with a different objective function of minimizing operational costs, including costs incurred by waiting time and transportation costs. Bouman et al. (2018) developed a dynamic programming model for solving the TSPD with up to 20 customer nodes. Cavani et al. (2021) proposed a MILP model to formulate the TSPD with multiple drones. Their exact approach solved small instances with up to 24 customers. Recently, Dell’Amico et al. (2021) developed an improved MILP for the FSTSP. When tested on instances with 10 customers, they showed that their MILP model provides a better solution than the original FSTSP model.

Other attempts addressed other variants of the truck-drone problems where multiple trucks can be paired with multiple drones. This type of problem is known as a vehicle routing problem with drones (VRPD). Routing multiple trucks and multiple drones in a tandem fashion makes the problem more challenging and different from the classical VRP literature. Because trucks and drones in the VRPD can be routed in many ways to serve all customer points, there is a huge increase in the search space, which greatly affects the running time. Kitjacharoenchai et al. (2019) proposed a MILP formulation for the VRPD. Similar to Murray and Chu (2015), they tested their approach on small-size problems with 10 nodes. Wang and Sheu (2019) presented a MILP for solving a VRPD model with a multi-drop condition where drones can be launched and then picked up by a different truck. Their analysis showed how the running times of MILP increase exponentially with problem size. The authors reported that more than four and a half hours were needed to solve instances with 15 nodes. Schermer et al. (2019) developed a model for VRPD with the possibility of en route operations (VRPDERO). In their problem, they relaxed the assumption of dispatching and retrieving a drone only at the customer node, hence allowing a drone to be picked up along the truck route arc (en route). They developed an exact method using MILP for small instances with 10 customers and proposed a hybrid variable neighborhood search and tabu search to address large instances with up to 50 customers. Very recently, Kuo et al. (2022) also proposed a MILP and a variable neighborhood search (VNS) heuristic algorithm to solve a vehicle routing problem with drones considering time windows (VRPDTW). They considered problems with up to 50 customers due to the complexity of the problem.

Despite the research attempts on the truck-drone routing problem, solving large-scale problems remains challenging. Most studies focused on traditional methods to solve non-practical truck-drone routing problems. Exact methods for routing trucks with drones suffer from high computational time and can solve only small-sized problems. On the other hand, heuristics can solve larger problems but may take several minutes to a few hours to reach good solutions. Regardless of the popularity of heuristics, their implementation requires many parameters that need to be configured for the algorithm to perform efficiently (Labadie et al., 2016).

2.2. Machine learning for routing problems

In the last few years, researchers have leveraged ML algorithms to either decompose or learn heuristics for routing problems. Since our paper is the first to study an integrated constrained location-clustering and truck-drone routing problem using ML, we review the relevant studies on other routing problems such as VRP and TSP.

Clustering can accelerate the running time of other exact or heuristic algorithms when solving routing problems. Previous studies that use ML algorithms to decompose routing problems typically employ the classical k-means clustering method. For instance, the work in Abdirad et al. (2020) and Xu et al. (2018) used k-means clustering to reduce the complexity of the dynamic VRP before running metaheuristics. Singanamala et al. (2018) utilized k-means to determine the location of new depots before solving the multi-depot VRP (MDVRP) with a heuristic algorithm. Similarly, Mostafa and Eltawil (2017) applied k-means to assign customers to clusters before solving VRP with a heterogeneous fleet. Geetha et al. (2013) proposed an improved k-means to decompose the MDVRP into multiple VRPs before applying particle swarm optimization. Very recently, Le et al. (2022) applied k-means and a branch-and-cut algorithm to deal with large VRPTW.

Even though the k-means algorithm is simple and efficient for decomposing large problems, it cannot impose constraints on the clusters. In many real-world applications, prior information about the underlying size and diameter of the clusters generally arises from operational limitations, expert opinion, or the needs of the problem considered. Such information is also useful for location clustering because it can be used to increase the feasibility and accuracy of clustering tasks. This brings about the concept of constrained clustering. Constrained clustering can incorporate such prior knowledge into the clustering process to gain a better partitioning of the data. Constraints like forcing the sizes of clusters to be meaningful and putting lower or upper limits on the diameter of the clusters are common issues in location problems. The studies in Chen et al. (2005), Kawahara et al. (2011) and Zhu et al. (2010) show that imposing size constraints on the cluster can improve the clustering performance. Zhu et al. (2010) proposed a heuristic algorithm to transform a size constrained clustering problem into integer linear programming. Chen et al. (2005) presented a size regularized cut algorithm to guide the clustering process. Other studies focused on modifying the k-means algorithm to enforce user-specified constraints. Wagstaff et al. (2001) modified the search strategy in k-means to include pairwise constraints as background knowledge. Similarly, Ganganath et al. (2014) modified k-means to obtain clusters with selected sizes. Recently, Baumann (2020) presented a binary linear programming approach for the constrained k-means that is better at clustering than the iterative search algorithm. However, it is well known that clustering approaches based on linear programming cannot be applied to solve large-scale problems. Therefore, we modified the k-means algorithm by adding a constraint checking process to the search strategy.

Although unsupervised clustering can accelerate the running time of other exact or heuristic algorithms when solving routing problems, its benefit is inversely proportional to the number of clusters obtained. As the number of clusters increases, the quality of existing non-learning approaches suffers, and hence there is an urgent need for the SC’s scheduler to solve routing problems with hundreds of customers quickly. In recent years, some researchers extended DL to solving routing problems. The developed approaches for routing problems are usually based on the sequence-to-sequence (encoder–decoder) DL framework. In this framework, the encoder maps the information of nodes in the routing problems into feature embeddings (representations), and the decoder is responsible for sequentially generating the solutions. The first DL model to tackle routing problems was proposed by Vinyals et al. (2015). The authors modified the sequence-to-sequence model and proposed a pointer network to solve TSP. The model was trained by using supervised learning, which generally needs
Fig. 1. PLTSPHD for last-mile delivery.
optimal solutions to be provided as labels during the training. However, optimal solutions are expensive to obtain in practice; therefore, the quality of their pointer network becomes sensitive to the quality of the provided labels. Later, Bello et al. (2016) extended the DL of the pointer network to the framework of RL to solve TSP. Their RL significantly outperformed the supervised learning approach and opened the door for tackling routing problems with a novel DRL approach. Nazari et al. (2018) further extended the model of Bello et al. (2016) by replacing the recurrent neural network encoder of the pointer model with one-dimensional convolution. Then the authors trained their model with an RL algorithm to solve TSP and VRP. The results showed that the developed model provides better results than traditional heuristics and OR-Tools. Inspired by the deep transformer model proposed by Vaswani et al. (2017), Kool et al. (2018) proposed a graph attention model to solve TSP and VRP. The authors trained their model using an RL algorithm with a simple greedy rollout baseline. Their results demonstrated that multi-attention layers could outperform the pointer model as well as OR-tools in terms of both solution quality and computation time. Zhao et al. (2020) presented a hybrid DRL model where the routing problem was first solved by a pointer model trained with RL, and then the resulting routes were further processed by local search heuristics. Li et al. (2021) proposed a DRL based on the graph attention model and RL to solve the covering salesman problem (CSP), which is a generalization of the TSP. The study (Zhang et al., 2021) presented a DRL model for the dynamic TSP (DTSP). This model was built on the graph attention model and RL with a greedy rollout baseline. Their experiment verified the effectiveness of DRL for tackling routing problems in dynamic and uncertain environments. Recently, Xu et al. (2021) extended the attention model by using an enhanced node embedding. Their experiments demonstrated the efficiency and generalizability of the attention-based models when solving large TSP.

When examining the literature, it is evident that researchers have focused on the traditional k-means for decomposing large routing problems, whereas DRL approaches mainly focused on classical TSP and VRP. Therefore, we propose a two-phase ML framework to solve a new variant of the truck-drone routing problem, where the constrained k-means alleviates the complexity of the problem and the DRL model optimizes the routing policy.

3. Problem description

The proposed PLTSPHD represents a different scheme of truck-drone for last-mile delivery. In PLTSPHD, a single truck carrying a fleet of homogeneous drones is used to perform deliveries of parcels to customers. The truck begins its route from the depot moving all the parcels to be delivered by drones. The truck visits a set of preassigned launching/parking sites, from which the drones start their trips to make all final deliveries. After serving the customers, each drone must return to the truck for a battery swap at the same launching/parking spot. Once all the drones are collected, the truck moves to the following parking location and repeats the process until all the customers are served. The objective of the PLTSPHD is to minimize the total operational costs for last-mile deliveries while considering the limitations of drones. Fig. 1 illustrates the PLTSPHD operation for last-mile deliveries.

To describe the proposed truck-drone system more accurately, operating assumptions for PLTSPHD are summarized below:

• Customer locations are widespread and far from the depot, so drones cannot reach them directly from the depot.
• The truck carries m fully autonomous and identical drones and has unlimited capacity to carry all parcels.
• The truck launches the drones to k service regions to make all final deliveries.
• The weight of a parcel cannot exceed the drone’s carrying capacity Q.
• The sum of customer demands 𝑞𝑖 in each k serviced zone does not exceed the number of drones.
• A drone can only deliver a single parcel per trip and may not be repeatedly deployed from the truck at the same launching site.
• A customer with demand 𝑞𝑖 > 1 can be served by more than one drone.
• Drones travel directly from the center of the serviced zone 𝐶𝑐𝑘 to the customer points.
• A drone cannot exceed the maximum flight distance R in each trip.
• Drones must return to the truck at the launching site for a battery swap after each trip so that they can fly with a full battery capacity in the next service zone.
Table 1
Notations and their descriptions.
Notations Descriptions
𝑆 Undirected graph where problem is defined.
𝑈 = {1, 2,. . . , n}. The set of customer points.
𝐷 = {1, 2,. . . , m}. The set of drones.
𝐾 = {1, 2,. . . , K}. The set of service zones (constrained clusters).
𝐶c = {𝐶𝑐1 , 𝐶𝑐2 ,. . . , 𝐶𝑐𝐾 }. The set of parking spots (clusters centroids).
𝐴 = {𝐴𝑡 , 𝐴𝑑 } The set of arcs.
𝐴𝑡 = {(k,l) | k,l ∈ 𝐶𝑐 , k ≠ l} the set of truck’s edges.
𝐴𝑑 = {(𝑖, 𝑗) | 𝑖, 𝑗 ∈ 𝑈 , 𝑖 ≠ j} the set of drone’s edges.
𝑖, 𝑗 The index of customer points.
𝑘, 𝑙 The index of serviced zones (clusters).
𝑐𝑖 The (𝑥𝑖 , 𝑦𝑖 ) coordinates of 𝑖th customer.
𝑞𝑖 The demand of the 𝑖th customer.
𝑤𝑖 The weight of a parcel for the 𝑖th customer.
𝑛𝑘 The number of customer orders in 𝑘th service zone.
𝑃𝑘 The number of 𝑘th service zone.
𝑞𝑘 The demand of the 𝑘th service zone in the truck route.
𝐶𝑐𝑘 The 𝑘th parking spot.
𝑑𝑘𝑙 The distance between the 𝑘th parking spot and the 𝑙th parking spot.
𝑑𝑖𝑗𝑘 The distance between the 𝑖th customer point and 𝑗th customer point in the 𝑘th service zone.
Q The maximum carrying capacity of a drone.
R The maximum flight range of a drone.
𝑓𝑡 The fixed cost for using a truck.
𝑓𝑑 The fixed cost for a drone.
𝑐𝑡 The travel cost of truck ($/mile).
𝑐𝑑 The travel cost of a drone ($/mile).
𝑐𝑠 The stop cost of truck in 𝑘th service zone.
𝑥𝑘𝑙 Binary variable = 1, if the truck travels from the 𝑘th service zone to the 𝑙th service zone , 0 otherwise.
𝑦𝑖𝑗𝑘𝑚 Binary variable = 1, if the 𝑚th drone travels from the 𝑖th point to the 𝑗th point in the 𝑘th service zone, 0 otherwise.
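To make the cost parameters in Table 1 concrete, the following minimal sketch evaluates the total operational cost of one candidate solution. It is an illustration under stated assumptions rather than the authors' implementation: the function name and data layout are hypothetical, the drone fixed cost is charged once per drone carried, the stop cost once per parking spot, and each unit of demand is served by one out-and-back flight from the parking spot.

```python
import math

def operational_cost(depot, stops, customers_by_stop, m,
                     f_t, f_d, c_t, c_d, c_s):
    """Total cost of one truck tour with drone deliveries (illustrative).

    depot: (x, y); stops: ordered list of parking spots (x, y);
    customers_by_stop: for each stop, a list of ((x, y), demand) pairs.
    Assumptions: f_d is paid per drone, c_s per stop, and every unit of
    demand needs one round trip between the parking spot and the customer.
    """
    # Truck leg: depot -> stops in the given order -> depot.
    tour = [depot] + list(stops) + [depot]
    truck_dist = sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))

    # Drone legs: one round trip per delivered parcel.
    drone_dist = sum(
        2 * math.dist(stop, xy) * demand
        for stop, customers in zip(stops, customers_by_stop)
        for xy, demand in customers
    )

    return (f_t + f_d * m + c_s * len(stops)
            + c_d * drone_dist + c_t * truck_dist)
```

For example, a single stop at (3, 4) serving one customer at (3, 5) with demand 2 gives a truck distance of 10 and a drone distance of 4, so the cost is the fixed terms plus `c_d * 4 + c_t * 10`.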
• The truck may not leave the serviced zone till all drones are collected.

3.1. Mathematical formulation

Formally, the PLTSPHD can be represented on a graph 𝑆 = {𝑈, 𝐴}, where 𝑈 = 𝑈0 ∪ 𝑈 is the set of vertices, where 𝑈0 = {0} represents the depot and 𝑈 = {1, 2, ..., 𝑛} is the set of all customer points. Set 𝐴 = {𝐴𝑡, 𝐴𝑑} is the set of arcs of the truck and drones. Each customer node in two-dimensional space (𝑥𝑖, 𝑦𝑖) is associated with a non-negative demand 𝑞𝑖. The truck with a fleet of m drones starts from the depot, deploys drones in each service zone k, and then returns to the depot. Launching sites 𝐶𝑐𝐾 are the centroids of the clusters, which can be considered as dummy customers with a demand equal to 𝑞𝑘.

To consider the detailed description and assumptions of the proposed truck-drones delivery system, several notations are defined to model the PLTSPHD mathematically. These notations and their descriptions are presented in Table 1.

$$\min Z = f_t + f_d \sum_{m=1}^{D} m + c_s \sum_{k=1}^{K} k + c_d \sum_{k=1}^{K} \left( \sum_{i=1}^{n_k} \sum_{j=1}^{n_k} d_{ijk}\, y_{ijkm} \right) + c_t \sum_{k=1}^{K} \sum_{l=1}^{K} d_{kl}\, x_{kl} \tag{1}$$

$$\sum_{k=1}^{K} x_{kl} = 1, \quad l \in \{1, 2, \ldots, K\} \tag{2}$$

$$\sum_{k=1}^{K} \sum_{l=1}^{K} x_{kl} \le K - 1, \quad k, l \in \{1, 2, \ldots, K\} \tag{3}$$

$$\sum_{i=1,\, i \ne j}^{n_k} y_{ijkm} = 1, \quad j \in \{1, 2, \ldots, n_k\},\ k \in \{1, 2, \ldots, K\},\ m \in \{1, 2, \ldots, D\} \tag{4}$$

$$\sum_{i=1}^{n_k} q_i \le m, \quad i \in \{1, 2, \ldots, n_k\} \tag{5}$$

$$\sum_{i=1}^{n_k} \sum_{j=1}^{n_k} d_{ijk}\, y_{ijkm} \le R, \quad i, j \in \{1, 2, \ldots, n_k\},\ k \in \{1, 2, \ldots, K\},\ m \in \{1, 2, \ldots, D\} \tag{6}$$

$$w_i \cdot y_{ijkm} \le Q \tag{7}$$

The objective function in Eq. (1) searches for the minimum total operational cost. Constraint (2) ensures that each cluster is visited exactly once. Constraint (3) indicates that there is no loop in the path of the truck. Constraint (4) enforces that every single order in the 𝑘th cluster is delivered by only one drone, and 𝑚th drones can serve customers with multiple orders. Constraint (5) indicates that the demand of the 𝑘th cluster cannot exceed the total number of drones used. Constraint (6) ensures that the total flight distance of the 𝑚th drone cannot exceed its maximum flight range. Constraint (7) ensures that the weight of a parcel delivered by the 𝑚th drone to a customer i in the 𝑘th service zone does not exceed its maximum capacity.

4. The proposed ML approach

Since PLTSPHD is an extension of TSPD, which is an NP-hard problem, the computational complexity of PLTSPHD increases dramatically as the number of customers increases. In addition, adding a few lines of constraints can sharply slow down the running time of classical optimization methods. Inspired by the ideas of constrained clustering and DRL for combinatorial optimization, this study proposes a two-phase ML approach to deal with the mentioned challenges. In the first phase, an unsupervised constrained 𝑘-means algorithm is proposed to reduce the complexity and scale of the problem while adhering to the limitations of drones. In the second stage, a DRL model is developed to optimize the routing policy in real-time.

4.1. Constrained k-means algorithm

Clustering is an unsupervised ML method that splits data into clusters or groups. Given a distribution network on graph S, the objective of the proposed constrained k-means is to find several feasible parking sites 𝑃𝑘 in the network. The constrained k-means is iteratively applied on the network. The process outputs 𝐾 service zones with 𝐶𝑐𝑘, 𝑛𝑘 and 𝑞𝑘 in each iteration. The constrained clustering process finishes when all demands 𝑞𝑖 are aggregated into feasible clusters. This process can maximize the use of drones at each stop. The concept of aggregating the
5
drone-fulfilled deliveries into fewer K service zones will transform the difficult problem of truck-drone delivery into a simpler truck routing problem. The variable cost of drone delivery for each service zone can be calculated as follows:

$$z_k = 2\, c_d \sum_{i=1}^{n_k} \sum_{j=1}^{n_k} d_{ijk}\, y_{ijkm} \tag{8}$$

Since flying drones is cheaper than driving a truck, reducing the number of clusters will reduce the number of stops on the truck route and make truck-drone delivery more efficient.

Clustering problems without constraints are often considered as a search for a good partition. The nodes inside the same cluster are similar while they differ from the nodes belonging to other clusters. The similarity criteria can differ depending on the clustering algorithm used. The objective of the traditional k-means algorithm is to find K clusters such that the similarity among the nodes in each cluster is maximized; to achieve this, the optimization criterion in k-means minimizes the total within-cluster sum of squares (WCSS). In Euclidean space, the WCSS is the sum of the squared distances between each node and its cluster centroid.

Considering the operating assumptions of the PLTSPHD, clustering should be performed to allow flexibility in clustering while satisfying user constraints. This can be achieved by modifying the search strategy, objective function, or similarity measure in traditional k-means (Dinler and Tural, 2016). Similar to the studies in Ganganath et al. (2014) and Wagstaff et al. (2001), we modified the search strategy in the traditional k-means to impose user-specified constraints on the clusters. The demand constraint requires that each potential cluster has a total demand (quantity of parcels) less than the given threshold (number of drones in the truck), whereas the flight range constraint requires that the distance between the parking spot and each customer cannot exceed the maximum range of the drone. The pseudocode of the constrained k-means is presented in Algorithm 1.

The constrained k-means algorithm works similarly to the classical k-means algorithm; the key difference is that a customer node is not always assigned to the nearest centroid in the assignment step because of the flight distance and the demand size constraints. Unlike random node assignment in traditional k-means, constrained k-means assigns a new point to the nearest valid centroid only if the current cluster does not violate the predefined constraints. Otherwise, the search for the next best potential cluster continues until a feasible cluster that has not yet met the constraints is found.

Although the modified algorithm enforces constraints that exist in real-world applications, it may converge to local optima. Therefore, it is recommended to run the algorithm for several iterations and then choose the best outcome. For example, if there are empty clusters or not all customers are assigned to clusters, we can incrementally reduce or increase the number of k clusters in the first step. There are many approaches to selecting the correct number of k clusters in the literature (Kodinariya and Makwana, 2013). We use the elbow method heuristic to determine the optimal number of clusters (Kodinariya and Makwana, 2013). However, the elbow method has a shortcoming, which is the choice of the initial number of clusters. Initializing the algorithm with an infeasible k value can significantly increase the runtime and the risk of converging to a poor local optimum. Therefore, careful attention must be given to the choice of the initial number of k clusters. To solve this problem, we use prior knowledge regarding the size of the cluster to set a lower bound $k_{min}$ on the initial number of clusters and then calculate the performance of the algorithm for different numbers of k clusters. As the number of possible clusters increases, the total WCSS decreases. Using the elbow method, a meaningful number of constrained clusters can be obtained while reducing the total WCSS. This will optimize the trade-off between the number of parking stops and the distance each drone covers.

Note that since k < n in many practical applications, this does not have a significant impact on the computation time of the modified algorithm. The time complexity of the developed constrained k-means algorithm is $O(n \log(n) \times k \log(k))$ per iteration. Let I be the number of iterations required for convergence in each clustering run, and V be the number of different k values used in the elbow method. Therefore, the total time complexity can be expressed as $O(I \times V \times n \log(n) \times k \log(k))$.

4.2. DRL model

In order to solve PLTSPHD in real-time, we propose a DRL model to establish a routing policy that minimizes the total operational cost. Referring to the deep transformer network in Vaswani et al. (2017), we adopt an encoder–decoder with multi-head attention layers. This DL model is trained using an RL algorithm on unlabeled training sets of PLTSPHD graphs to generate real-time solutions. Fig. 2 shows an abstraction of the encoder–decoder architecture for PLTSPHD.

In the proposed DRL approach, the encoder extracts original features from a PLTSPHD graph S to produce the embeddings (representations) of all nodes. The node embeddings are processed and updated using multiple sequential layers, each consisting of two sublayers: a multi-head attention (MHA) sublayer and a node-wise fully connected feed-forward (FF) sublayer. Then, the decoder integrates the produced node embeddings with a masking scheme and context vector in another MHA layer to produce a probability distribution over clusters, one cluster at each time step t (t < K). More formally, the model defines a stochastic policy $p(\pi \mid S)$ for predicting the solution sequence $\pi = (\pi_1, \ldots, \pi_t)$. According to the chain rule, this stochastic policy can be parameterized using $\theta$ as follows:

$$p_\theta(\pi \mid S) = \prod_{t=1}^{K} p_\theta(\pi_t \mid S, \pi_{1:t-1}) \tag{9}$$

We use an RL algorithm to train the model parameters $\theta$ of the DL neural network. In RL, the agent only needs a reward signal to learn a routing policy $\pi_\theta$. The main components of RL are defined as follows:

• State: The state in the decoder contains static features and dynamic features. Specifically, the agent will obtain static information about each service zone (cluster), such as location, demand,
and delivery cost, which remains constant all the time. In addition, the agent can observe dynamic information regarding route history at each time step t.
• Action: At each time step t, the agent uses the produced probability distribution over the unvisited clusters to select the next cluster to serve. Previously served clusters are masked by setting their probabilities to 0.
• Reward: The negative of the objective function in Eq. (1) is received as a reward signal L and the model parameters $\theta$ of policy $\pi_\theta$ are updated according to the obtained reward signal when all clusters are served. As the cumulative reward increases, the total operational cost of the PLTSPHD is minimized as follows:

$$L(\pi \mid S) = -\,\text{Cost of PLTSPHD} \tag{10}$$

Fig. 2. General frame structure of the encoder–decoder for PLTSPHD.

4.2.1. Encoder framework

The purpose of the encoder is to learn the representation of the nodes in the problem. It maps the node features to a vector of higher dimensions. In our PLTSPHD, each cluster has several characteristics including location, demand, and drone delivery cost, so that it can be represented as a tuple $s_k = (c_k, q_k, z_k)$, $k = (1, 2, \ldots, K)$. In the encoder, the MHA with N layers is used to help the DL model capture richer node information. The FF sub-layer is composed of two fully connected layers. It utilizes the activation function to learn more information about the processed vector.

Given a graph S, the node features of PLTSPHD are linearly projected with trainable parameters W and b to map them into a higher dimension as the initial embeddings:

$$h_k^{(0)} = W[s_k] + b \tag{11}$$

Then the embedding is updated using MHA layers. Each layer carries an MHA and an FF operation. For each sub-layer, batch normalization (BN) (Ioffe and Szegedy, 2015) and skip connections (He et al., 2016) are added. The self-attention mechanism is the basic operator of MHA. It can be interpreted as weighted message passing between nodes in a graph (Vaswani et al., 2017). It encodes each node’s relationship with every other node in the input sequence, paying more attention to the most relevant ones. Therefore, the computed feature vector of each cluster gathers not only its own information, but also its relationship with other clusters. The weight of the message that the cluster receives from other clusters depends on the compatibility of its query with the key–value pair of other clusters. Formally, each MHA layer is computed as:

$$\mathrm{head}_i^{\ell} = \mathrm{softmax}\left(\frac{O_i^{\ell}\, {Y_i^{\ell}}^{T}}{\sqrt{d_h}}\right) V_i^{\ell}, \quad i \in (1, 2, \ldots, N_h) \tag{12}$$

$$\hat{h}^{(\ell)} = BN\left(\left[\mathrm{head}_0^{\ell}, \mathrm{head}_1^{\ell}, \ldots, \mathrm{head}_{N_h}^{\ell}\right] W^{\ell} + h^{(\ell-1)}\right), \tag{13}$$

where $1/\sqrt{d_h}$ is the scaling factor, $N_h$ is the number of heads, and $h^{(\ell-1)} = [h_0^{(\ell-1)}; h_1^{(\ell-1)}; \ldots; h_K^{(\ell-1)}]$ is the embedding of each cluster node from the last MHA layer. In layer $\ell \in \{1, 2, \ldots, N\}$, $O_{k,i}^{\ell} = h_k^{(\ell-1)} W_i^{O}$, $Y_{k,i}^{\ell} = h_k^{(\ell-1)} W_i^{Y}$, and $V_{k,i}^{\ell} = h_k^{(\ell-1)} W_i^{V}$ are the query, key, and value for each cluster, obtained by projecting the embedding $h^{(\ell-1)}$. The FF sublayer introduces nonlinearity between attention passes to fuse all the features of the input sequence. Given the attention value (feature vector) of the cluster node k in the $\ell$th layer, the calculation formulas in the FF are as follows:

$$h_k^{(\ell)} = BN\left(FF(\hat{h}_k^{(\ell)}) + \hat{h}_k^{(\ell)}\right), \tag{14}$$

$$FF(\hat{h}_k^{(\ell)}) = w_2^{(\ell)}\, \mathrm{ReLU}\left(w_1^{(\ell)} \hat{h}_k^{(\ell)} + b_1^{(\ell)}\right) + b_2^{(\ell)}, \tag{15}$$

Finally, after N attention layers, for each cluster k, the encoder outputs the graph embedding $h_{graph}$ of the entire problem context as the mean of the final node embeddings $h_k^{(N)}$:

$$h_{graph}^{(N)} = \bar{h}^{(N)} = \frac{1}{K+1} \sum_{k=1}^{K} h_k^{(N)} \tag{16}$$
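The attention computation in Eq. (12) can be illustrated with a minimal single-head sketch in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`attention_head`, `matvec`, `softmax`) and the toy identity projections are hypothetical, and batch normalization, skip connections, and the multi-head concatenation of Eq. (13) are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [sum(v * mat[r][c] for r, v in enumerate(vec))
            for c in range(len(mat[0]))]


def attention_head(h, w_o, w_y, w_v):
    """One scaled dot-product attention head in the spirit of Eq. (12).

    h   : list of node embeddings, one list of floats per cluster
    w_o : query projection matrix (hypothetical toy values in the demo)
    w_y : key projection matrix
    w_v : value projection matrix
    """
    queries = [matvec(x, w_o) for x in h]
    keys = [matvec(x, w_y) for x in h]
    values = [matvec(x, w_v) for x in h]
    d_h = len(keys[0])
    updated = []
    for q in queries:
        # Compatibility of this node's query with every node's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_h)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors (the "messages").
        updated.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return updated


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    embeddings = [[1.0, 0.0], [0.0, 1.0]]  # two toy cluster embeddings
    for row in attention_head(embeddings, identity, identity, identity):
        print(row)
```

Each output row is a convex combination of the value vectors, so a node whose query matches its own key most strongly keeps most of its own information while still mixing in its neighbors.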
4.2.2. Decoder framework

The purpose of the decoder is to generate the probability distribution over clusters at each time step t. It takes a context vector $h_c^{(N)} = [h_{graph}^{(N)}; h_{last}^{(N)}]$ as input, where $[\cdot\,;\cdot]$ refers to the concatenation of vectors, and $h_{last}^{(N)}$ is the node embedding of the last visited cluster. This context vector combines the static and dynamic features of the environment at each time step. The context vector $h_c^{(N)}$ is processed with another MHA layer in the decoder, where it only attends to clusters that are feasible in time step t. In contrast to the encoder, the decoder only computes the attention value between the context vector and the node embeddings to produce a transformed context vector $h_c^{\prime (N)}$.

Given problem instance S, the decoder outputs a cluster node with the highest probability according to the probability distribution $p_\theta(\pi \mid S) = \prod_{t=1}^{K} p_\theta(\pi_t \mid S, \pi_{1:t-1})$, where $\pi_{1:t-1}$ represents the route of previously visited clusters. For t = 1 and t = K, no information is required because the agent starts and finishes at the depot. Using the observation of the agent as the query $o$ and the embedding of each cluster node as the key $y_l$ in the MHA layer, the compatibility score of the agent choosing cluster l at each time step t is computed as:

$$o = W_o h_c^{\prime (N)}, \quad y_l = W_y h_l^{(N)} \tag{17}$$

$$u_l = \frac{o^T y_l}{\sqrt{d_h}} \tag{18}$$

Then we apply the softmax function to compute the probability of selecting each cluster as follows:

$$p_\theta(\pi_t \mid S) = \mathrm{softmax}(u_l) = \frac{e^{u_l}}{\sum_{l=1}^{K} e^{u_l}} \tag{19}$$

In order to ensure that each cluster is covered exactly once, we set $u_l = -\infty$ to prevent visiting the clusters that have been covered. This ensures that the probability of a prohibited action is 0 and that the probabilities of all feasible actions sum to 1.

In practice, we hope to find a set of optimized parameters $\theta^*$ that approximates the optimal routing policy $\pi_{\theta^*}$. To get the best approximation, we need to find the gradient with respect to $\theta$. We adopt a policy-based RL algorithm with rollout baseline $b(S)$ to optimize the parameters $\theta$. Policy gradient using the REINFORCE algorithm (Williams, 1992) has proven to be very effective for training DL models (Kool et al., 2018; Li et al., 2021; Lin et al., 2021; Zhang et al., 2021). According to the policy gradient theorem, the gradient of the policy is the expectation of the product of the expected reward and the gradient of the log of the policy (Sutton et al., 1999). In this method, the loss function is based on the difference between the total operation cost $L(\pi \mid S)$ and the baseline $b(S)$. The baseline is implemented to stabilize the training and reduce the variance of the gradient.

$$\nabla_\theta J(\theta \mid S) = E_{\pi \sim P_\theta(\cdot \mid S)}\left[\left(L(\pi \mid S) - b(S)\right) \nabla_\theta \log P_\theta(\pi \mid S)\right], \tag{20}$$

During the training, the model parameters $\theta$ can be learned starting from a random policy and iteratively updated with an RL algorithm. In each training epoch, a batch of instances $s_1, s_2, \ldots, s_B$ is drawn from a training set S to train the DRL model. The baseline constructs routes by greedy decoding. Before training starts, the parameters $\theta$ and $\theta^{bl}$ are set to be the same. These parameters are updated periodically based on a statistical test. The model is trained for several epochs. At the first epoch, an exponential moving average is used as a warm-up baseline:

$$b = \begin{cases} \dfrac{1}{B} \sum_{i=1}^{B} L(\pi \mid s_i), & \text{training on first batch} \\[2ex] \beta b' + (1 - \beta) \dfrac{1}{B} \sum_{i=1}^{B} L(\pi \mid s_i), & \text{otherwise} \end{cases} \tag{21}$$

At the end of each training epoch, the baseline parameters $\theta^{bl}$ are updated with the current parameters $\theta$ only if the operational cost for the candidate parameters is significantly reduced according to a t-test with $\alpha = 0.05$. Starting from a random policy and iteratively optimizing with RL and Adam, the DRL can effectively learn a set of model parameters $\theta^*$ that can minimize the operational cost for PLTSPHD. The training algorithm is outlined in Algorithm 2.

Unlike conventional algorithms that only consider the running time complexity, DL models must consider both the training time complexity and running time complexity. The training time complexity of the model is $O(n_e \times I \times \theta \times G)$, where $n_e$ is the number of epochs when the proposed DRL model converged, $I = S/B$ is the number of iterations (number of batches) per epoch and $G$ is the time complexity of the convergence of each element in the model parameters $\theta$. The running time of the trained DRL has linear complexity. In our experiment, the trained model takes a negligible run time of under 1 s. The encoder of the graph attention network quickly processes the entire input all at once for each new graph. The decoder is quite fast because it considers only unmasked clusters at each decoding step.
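Two of the steps described above lend themselves to a short sketch: the masked softmax of Eqs. (18) and (19), where visited clusters receive a score of minus infinity, and the exponential moving average warm-up baseline of Eq. (21). The code below is a hedged illustration in plain Python, not the authors' training code. The function names and the value `beta=0.8` are assumptions; the source does not report the value of $\beta$.

```python
import math


def masked_softmax(scores, visited):
    """Softmax over compatibility scores u_l with visited clusters masked.

    scores  : list of floats, one score per cluster
    visited : list of bools, True where the cluster was already served
    """
    masked = [-math.inf if seen else s for s, seen in zip(scores, visited)]
    top = max(masked)
    if top == -math.inf:
        raise ValueError("all clusters are masked; nothing left to select")
    # math.exp(-inf) is 0.0, so masked clusters get probability 0.
    exps = [math.exp(s - top) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def ema_baseline(batch_costs, previous=None, beta=0.8):
    """Warm-up baseline in the spirit of Eq. (21).

    batch_costs : costs L(pi | s_i) of the current batch
    previous    : baseline from the preceding batch, None on the first batch
    beta        : smoothing factor (hypothetical value, not from the source)
    """
    mean_cost = sum(batch_costs) / len(batch_costs)
    if previous is None:
        return mean_cost
    return beta * previous + (1 - beta) * mean_cost


if __name__ == "__main__":
    probs = masked_softmax([1.0, 2.0, 3.0], [False, True, False])
    print(probs)  # the second cluster has probability 0.0

    baseline = ema_baseline([310.0, 330.0])
    baseline = ema_baseline([300.0, 320.0], previous=baseline)
    # Advantage term (L - b) that scales the log-probability gradient in Eq. (20).
    print([cost - baseline for cost in [300.0, 320.0]])
```

Subtracting the baseline leaves the expected gradient unchanged but centers the cost signal, which is why the text describes it as a variance-reduction device.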
5. Case studies

In this section, we conduct a series of experiments to evaluate the performance of the proposed ML approach for solving the PLTSPHD last-mile delivery. The experiments include problems with different sizes, numbers of customers, and drones. We first solve the truck-only last-mile delivery. We then examine the percentage of savings and the solution quality of the routing policies obtained by running the proposed ML and compare them with popular optimization methods. In addition, we statistically test the DRL against other algorithms to determine superiority and verify the overall performance across all test cases.

5.1. Experimental design and settings

5.1.1. Instance and hyperparameters settings

The experiments consist of randomly generated instances of three scale types. Since the introduced PLTSPHD has no publicly accessible benchmarks, we synthesize several instances for training and testing. A few truck-drone instances for FSTSP, TSPD and VRPD have been reported in the literature; however, these instances are not suited for PLTSPHD because of the limited number of drones and multi-parcel demand per customer.

In all synthetic instances, the location of the depot remains the same, whereas the locations of last-mile deliveries are randomly generated considering real locations in Wichita KS. The Euclidean distance between a pair of delivery locations is considered based on their longitude and latitude. The quantity of orders varies from customer to customer, so each customer’s demand is set to be a discrete number randomly chosen from (1, 2, 3, 4). Table 2 provides information on the area sizes and the number of customers at different scales. Additionally, for all instances, the weights of the parcels that customers order are randomly generated between (0.22–2.26) lb according to the Gaussian distribution. The truck has an infinite capacity to carry all parcels but only carries a limited number of small drones that fly at a constant speed. For the sake of convenience, we consider three fleets of drones with m = (5, 10, 15). The drone variable cost is $0.05 per mile (Iranmanesh and Raad, 2019; Kim, 2016; Wang and Sheu, 2019) while the fixed cost is $12 per drone used. The truck stop cost is $1 per stop. The variable cost of the truck is $1.38 per mile (Barradas, 2013) while the fixed cost of the truck is $100.

Table 2
Attributes of the generated instances.
| Scale | Area | Number of customers |
|---|---|---|
| Small | 100 sq mile | 70 |
| Medium | 150 sq mile | 110 |
| Large | 200 sq mile | 250 |

To train the DRL model, we use 2500 batches of 256 instances in each epoch. Due to the GPU’s limited memory, a medium-scale dataset is used to train the parameters of the DRL, which generates one parameter set of the DRL. Validation and test instances are produced separately in advance using the same settings described above. During the validation process, we generate 1000 instances of the same scale used in training. For testing the trained DRL model, unseen instances for each scale of problem size are used to evaluate learning ability and generalizability.

In the DRL model, we set the number of layers in the encoder to three layers, each with eight heads in MHA. Dimensions of initial embeddings, query, key and value are set to 128. The model is trained for 100 epochs with a learning rate η = 0.0001.

5.1.2. Comparison with conventional algorithms

Since exact algorithms are too computationally expensive to solve large-scale instances, they are not suitable to be used as comparative algorithms. For that reason, we compare the proposed ML approach with three well-known heuristic and metaheuristic algorithms, namely, the NN-based PLTSPHD, the GA-based PLTSPHD, and the OR-tools-based PLTSPHD. The main characteristics of these conventional methods are presented in Table 3.

The NN algorithm has a limited search space and usually stops at local optima. On the other hand, GA and OR-tools perform a more exhaustive exploration of the search space to escape local optima, but they require a stop-search criterion. Additional searches can significantly affect the execution time and make it more computationally expensive than expected. In this context, we set the time and value obtained by our model as benchmark solutions and limited the run time of OR-tools to 10 × DRL time and GA to 100 × DRL time. We present the average total cost for each comparative algorithm and its running time for each instance scale. For PLTSPHD, the proposed constrained k-means is applied on each instance before running each algorithm. To measure the disparity between the trained DRL and the comparison algorithms, we calculate the gap indicator as follows.

$$Gap = \frac{ATC_i - ATC_{DRL}}{ATC_{DRL}} \times 100 \tag{22}$$

where $ATC_i$ represents the average total cost of the $i$th conventional algorithm and $ATC_{DRL}$ represents the average total cost of the DRL. All experiments are implemented in Python to provide a fair comparison. They are evaluated on a workstation configured with an AMD Ryzen 5 processor and a GeForce RTX2060 GPU.

5.2. Experimental results

Extensive experiments are conducted to evaluate the performance of the proposed two-phase ML for last-mile delivery. With 30 runs for each of 3 instance scales, 5 algorithms (constrained k-means, conventional algorithms, and DRL algorithm), and 3 drone settings, we have in total 30 × 3 × 5 × 3 = 1350 tests. The numerical results of the comparative experiments for each scale type for the truck-only system and the proposed truck-drone system are shown in Tables 4 and 5, respectively. Each algorithm is run 30 times on each scale type. For both systems, we present the average total operational cost and the running time in seconds of each algorithm. In both systems, the best result serves as the reference for the gap. Column “% Savings” in Table 5 shows the amount of average savings from PLTSPHD (DRL) last-mile delivery to truck-only (DRL) last-mile delivery.

As can be seen from Table 5, the proposed two-phase ML approach for PLTSPHD consistently produced better results than other algorithms, with cost reduction up to 46.24% in the large-scale region. The calculated savings show the potential benefits of PLTSPHD and the impact of the number of drones on the operational costs. In the truck-only system, the average operational cost increases with an increase in area and customers. However, in PLTSPHD, the average operational cost is more sensitive to the number of drones. Given a certain number of drones per truck, the constrained k-means aggregates customer locations into fewer service zones while adhering to constraints (5) and (6). In every service zone, the distance between the launching site and each assigned customer is less than the maximum flying range. Also, the total customer demand is less than the number of drones transported by the truck. Fig. 3.a visualizes customer locations and demands for a small-scale scenario. The corresponding optimal number of constrained clusters using the elbow method for a case with 15 drones is displayed in Fig. 3.b. Fig. 3.c displays the routing plan obtained by the proposed two-phase ML approach, while Fig. 3.d shows the DRL routing plan for the truck-only configuration.

It is observed that the proposed constrained k-means algorithm can effectively reduce the complexity and scale of the problem with a slight increase in computational cost. Since we first clustered customers to
obtain a set of service zones where drones can be directly launched from the truck to make final deliveries, the quality of the truck route in the second phase is highly dependent on the number of stops. Specifically, the number of obtained constrained clusters, which determines the number of stop/launch sites for the truck-drone system, results in substantially different travel distances for both truck and drones. In our constrained k-means clustering, the number of clusters is greatly affected by the number of drones. As the number of drones increases, the truck needs fewer stops, which in turn can affect the total flight distance of the drones. However, fewer stops do not necessarily translate into better solutions. With the increase in the number of parking stops, it can be observed that DRL performs better than the other algorithms. The trained DRL model can provide real-time and quality solutions with an extremely low computation time of less than one second in all scenarios. In contrast, solutions obtained by the NN, GA, and OR-tools degrade with the increase in the number of clusters or size of the problem. The NN performs the worst, followed by the GA in terms of solution quality in most cases. When routing among a few clusters, OR-tools and GA provide comparable outcomes and are capable of performing almost as well as our method. However, as the number of clusters increases, the gap between these algorithms and our trained DRL widens. Therefore, it is fair to conclude that the combination of constrained $k$-means and DRL can provide the right balance between solution quality and running time.

Fig. 3.a. Customer locations and demands.

Table 3
The characteristics of the comparative methods.
| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| NN | A constructive heuristic that selects the nearest available neighbor of the last visited node (Kizilateş and Nuriyeva, 2013). | Fast and simple classical heuristic. | Limited search space and no guarantee for optimality. |
| GA | A population-based metaheuristic for solving optimization problems based on a natural selection process (Potvin, 1996). | Most used algorithm for routing problems (Konstantakopoulos et al., 2022). The non-deterministic evolution enables GA to escape local optima. | The search process suffers from substantial execution times and requires several parameter settings to perform well. |
| OR-tools | Google’s operations research toolbox (Perron and Furnon, 2019). | Fast and highly tuned software for combinatorial optimization. | It cannot be easily customized to solve problems with complex constraints. |

Table 4
The comparative results of solving truck-only last-mile delivery. Entries are the average truck-only operational cost ($), with the running time in parentheses.
| Region | Total demand | NN | GA | OR-tools | DRL |
|---|---|---|---|---|---|
| Small | 127 orders | 368.17 (0.59 s) | 362.65 (35 s) | 355.75 (3.5 s) | 353.61 (0.35 s) |
| Medium | 205 orders | 430.81 (1.2 s) | 418.4 (43 s) | 412.91 (4.3 s) | 408.77 (0.43 s) |
| Large | 350 orders | 746.64 (2.2 s) | 726.74 (85 s) | 723.74 (8.5 s) | 721.22 (0.85 s) |

Table 5
The comparative results of solving truck-drones (PLTSPHD) last-mile delivery. Cost columns give the average truck-drone operational cost ($), with the running time in parentheses; gap columns follow Eq. (22).
| Region | Number of drones | Average number of constrained clusters | PLTSPHD (NN) | PLTSPHD (GA) | PLTSPHD (OR-tools) | PLTSPHD (DRL) | Gap: DRL vs. NN | Gap: DRL vs. GA | Gap: DRL vs. OR-tools | % Savings |
|---|---|---|---|---|---|---|---|---|---|---|
| Small | 5 | 28 (18.6 s) | 300.82 (0.34 s) | 298.07 (19 s) | 294.47 (1.9 s) | 292.27 (0.19 s) | 2.93 | 1.98 | 0.76 | 17.20% |
| Small | 10 | 15 (11.2 s) | 300.85 (0.11 s) | 299.47 (14 s) | 297.97 (1.4 s) | 296.43 (0.14 s) | 1.49 | 1.02 | 0.51 | 16.02% |
| Small | 15 | 11 (9.3 s) | 337.67 (0.1 s) | 336.29 (10 s) | 336.29 (1 s) | 336.29 (0.1 s) | 0.41 | 0.00 | 0.00 | 4.73% |
| Medium | 5 | 47 (24.8 s) | 350.6 (0.44 s) | 345.08 (28 s) | 340.94 (2.8 s) | 336.8 (0.28 s) | 4.10 | 2.46 | 1.23 | 17.60% |
| Medium | 10 | 24 (16.4 s) | 329 (0.21 s) | 324.86 (12 s) | 322.10 (1.2 s) | 320.72 (0.12 s) | 2.58 | 1.29 | 0.43 | 21.53% |
| Medium | 15 | 19 (13.1 s) | 355.72 (0.13 s) | 352.82 (17 s) | 349.79 (1.7 s) | 347.99 (0.17 s) | 2.22 | 1.39 | 0.52 | 14.86% |
| Large | 5 | 79 (42.4 s) | 488.58 (0.63 s) | 477.54 (37 s) | 465.12 (3.7 s) | 462.36 (0.37 s) | 5.67 | 3.28 | 0.60 | 35.89% |
| Large | 10 | 36 (27.7 s) | 398.8 (0.46 s) | 393.78 (21 s) | 390.52 (2.1 s) | 387.76 (0.21 s) | 2.85 | 1.55 | 0.71 | 46.24% |
| Large | 15 | 25 (17.1 s) | 443.6 (0.21 s) | 436.7 (15 s) | 432.56 (1.5 s) | 431.18 (0.15 s) | 2.88 | 1.28 | 0.32 | 40.22% |
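As a worked check of the gap indicator in Eq. (22), the short sketch below recomputes the three gap values for the medium region with 5 drones from the average costs reported in Table 5. The function name `gap_percent` is a hypothetical helper introduced only for this illustration.

```python
def gap_percent(atc_other, atc_drl):
    """Gap indicator of Eq. (22): relative excess cost over the DRL, in percent."""
    return (atc_other - atc_drl) / atc_drl * 100


if __name__ == "__main__":
    # Average costs for the medium region with 5 drones, taken from Table 5.
    atc_drl = 336.8
    for name, atc in [("NN", 350.6), ("GA", 345.08), ("OR-tools", 340.94)]:
        print(name, round(gap_percent(atc, atc_drl), 2))
    # prints: NN 4.1, GA 2.46, OR-tools 1.23
```

A positive gap means the conventional algorithm is more expensive than the DRL on average, and a gap of zero means both reached the same average cost.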
5.3. Statistical comparison
To further verify the overall performance of the DRL across all test cases, we statistically test the DRL against other algorithms to determine superiority. Since the parametric 𝑡-test assumes that the data follow a normal distribution and have equal variance, we perform the nonparametric Wilcoxon signed-rank test to test the significance of the difference in results between the DRL and other algorithms. Table 6 shows the results of the Wilcoxon test. The Wilcoxon test computes the difference between the two procedures on a set of examples and then sorts the absolute values of the differences to determine the associated ranks. The sum of the ranks of the positive differences is reported in column “R+”, while the sum of the ranks of the negative differences is reported in column “R−”. The gap between R+ and R− represents the difference between the DRL and the compared algorithm. The p-value column is used to quantitatively assess the difference between the two algorithms. A p-value larger than 0.05 at the significance level 𝛼 = 0.05 indicates that there is no significant difference.

Based on the results of the Wilcoxon test, we can observe that in most cases, the performance of DRL is superior to that of NN and GA. We observe that the performance of the DRL is superior to OR-tools in three cases with a large number of clusters, which confirms our observations in the experimental results subsection. The cases without a statistically significant difference between DRL and the other algorithms indicate the effect of cluster size on the performance of the algorithm. Overall, and especially in large clusters, the DRL is the best choice for solving the PLTSPHD.

Fig. 3.b. Elbow method for optimal K.
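The rank sums R+ and R− described above can be computed with a few lines of plain Python. The sketch below is an illustration only: the function name `signed_rank_sums` and the sample costs are hypothetical, the sign convention (first argument minus second) is an assumption, and the p-value computation is left out. With 30 paired runs and no zero differences, the two sums always add up to 30 × 31 / 2 = 465, which matches most rows of Table 6.

```python
def signed_rank_sums(costs_a, costs_b):
    """Return (R+, R-) for paired samples, Wilcoxon signed-rank style.

    Zero differences are dropped and tied absolute differences share
    their average rank.
    """
    diffs = [a - b for a, b in zip(costs_a, costs_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current one.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = average_rank
        pos = end + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus


if __name__ == "__main__":
    # Hypothetical costs: the other algorithm is worse than the DRL in all 30 runs.
    other = [300.0 + run for run in range(1, 31)]
    drl = [300.0] * 30
    print(signed_rank_sums(other, drl))  # (465.0, 0)
```

When one algorithm wins every paired run, all ranks fall on one side, which is the pattern shown in the NN rows of Table 6.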
5.4. Sensitivity analysis
In this section, we conduct a sensitivity analysis by varying the travel cost
of drones. The drone travel cost, denoted as 𝑐𝑑 , represents the trans-
portation cost per unit of travel distance. Given a set of service zones,
the changes in the drone travel cost can affect the amount of savings
obtained by each configuration of the truck-drones system. Therefore,
we experiment on this parameter by incrementally increasing the most
cited drone travel cost of $0.05/mile to $0.5/mile. To illustrate the
impact of the changes in drone travel cost on the total operational costs,
Fig. 4 shows the percentage of savings with 𝑚 = 5, 10, and 15 drones
as a function of the drone travel cost per mile 𝑐𝑑 .
The results in Fig. 4 show how savings from implementing PLTSPHD
decrease as 𝑐𝑑 increases. While the PLTSPHD still generates savings
for medium and small scenarios, these savings are relatively low and
more sensitive to the increase in the number of drones and drone
variable costs compared to the large scenarios. This behavior is ex-
pected because the difference between the small and medium cases is
only 40 customers. However, this slight difference affects the optimal
configurations of the drones in both cases. When using 𝑚 = 15 drones in small and medium regions, the savings are observed to decline sharply with an increase in 𝑐𝑑 . On the other hand, a steady decline is observed in the large region. Another interesting phenomenon is that when 𝑐𝑑 increases, using two different sizes of drone fleets will generate almost the same amount of savings. This phenomenon is observed in the large region when 𝑐𝑑 = $0.3/mile.

Fig. 3.c. PLTSPHD routing plan.
Even though having more drones reduces the number of stops, it
increases the fixed cost. In addition, fewer stops mean higher flight
distance by drones, indicating that the marginal benefits generated by
having more drones can be reduced; that is, after m reaches a certain
number, continuing to increase m along with 𝑐𝑑 will have a negative
impact on the total operational costs. This observation is clear in
Fig. 4, where the small and medium business cases are no longer
viable, as delivery with more drones would be more expensive than the
last-mile delivery cost by the truck-only system. This insight provides
SCs more freedom to choose the number of drones that best suit their
operations with regard to other factors besides the total operational cost
of deliveries.
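The trade-off discussed in this section can be made concrete with a small sketch that evaluates the cost structure of Eq. (1) for a range of drone travel costs. The cost parameters (truck fixed cost $100, $12 per drone, $1 per stop, $1.38 per truck mile) come from Section 5.1.1. Everything else is an assumption made for illustration: the function names, the distances, the number of stops, and the truck-only cost of $400 are hypothetical values, not results from the paper.

```python
def pltsphd_cost(c_d, drone_miles, truck_miles, n_drones, n_stops,
                 f_t=100.0, f_d=12.0, c_s=1.0, c_t=1.38):
    """Total operational cost following the structure of Eq. (1).

    drone_miles is the total distance flown by all drones (round trips
    included); truck_miles is the length of the truck route.
    """
    return (f_t + f_d * n_drones + c_s * n_stops
            + c_d * drone_miles + c_t * truck_miles)


def savings_percent(truck_only_cost, truck_drone_cost):
    """Savings of the truck-drone system relative to truck-only delivery."""
    return (truck_only_cost - truck_drone_cost) / truck_only_cost * 100


def break_even_drone_cost(truck_only_cost, drone_miles, truck_miles,
                          n_drones, n_stops):
    """Drone cost per mile at which both systems cost the same."""
    fixed_part = pltsphd_cost(0.0, drone_miles, truck_miles, n_drones, n_stops)
    return (truck_only_cost - fixed_part) / drone_miles


if __name__ == "__main__":
    # Hypothetical scenario, not taken from the paper's instances.
    scenario = dict(drone_miles=400.0, truck_miles=80.0, n_drones=10, n_stops=24)
    truck_only = 400.0
    for c_d in (0.05, 0.1, 0.2, 0.3, 0.5):
        cost = pltsphd_cost(c_d, **scenario)
        print(c_d, round(cost, 2), round(savings_percent(truck_only, cost), 2))
    print(round(break_even_drone_cost(truck_only, **scenario), 3))
```

In this made-up scenario the savings turn negative once the drone cost per mile passes the break-even value, which mirrors the qualitative behavior described for the small and medium regions.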
Fig. 3.d. Truck-only routing plan.

6. Conclusion
Under the vision of Industry 4.0, the application of artificial intelligence and smart logistics has captured the attention of many SCs and logistics giants. In this paper, we introduce the Parking Location and
Table 6
Wilcoxon signed-rank test results to compare DRL with other algorithms.
| Number of drones | Algorithm | Small R+ | Small R− | Small p-value | Medium R+ | Medium R− | Medium p-value | Large R+ | Large R− | Large p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | PLTSPHD (NN) | 465 | 0 | 0.00001 | 465 | 0 | 0.00001 | 465 | 0 | 0.00001 |
| 5 | PLTSPHD (GA) | 429 | 36 | 0.00002 | 450 | 15 | 0.00001 | 459 | 6 | 0.00001 |
| 5 | PLTSPHD (OR-tools) | 324 | 141 | 0.05344 | 408 | 57 | 0.00354 | 391 | 74 | 0.00112 |
| 10 | PLTSPHD (NN) | 465 | 0 | 0.00001 | 450 | 15 | 0.00001 | 465 | 0 | 0.00001 |
| 10 | PLTSPHD (GA) | 326 | 139 | 0.05045 | 456 | 9 | 0.00001 | 445 | 20 | 0.00001 |
| 10 | PLTSPHD (OR-tools) | 278 | 187 | 0.34389 | 319 | 146 | 0.75081 | 310 | 125 | 0.04556 |
| 15 | PLTSPHD (NN) | 465 | 0 | 0.00001 | 465 | 0 | 0.00001 | 465 | 0 | 0.00001 |
| 15 | PLTSPHD (GA) | 0 | 0 | 1.00000 | 444 | 21 | 0.00001 | 440 | 25 | 0.00001 |
| 15 | PLTSPHD (OR-tools) | 0 | 0 | 1.00000 | 290 | 175 | 0.23812 | 261 | 204 | 0.55588 |
Fig. 4. The sensitivity of the savings to the modeling parameters 𝑐𝑑 and m.
Traveling Salesman Problem with Homogeneous Drones (PLTSPHD).
This problem is designed to consider the limitations of routing a truck
with multiple drones for last-mile deliveries in the SC. The PLTSPHD
considers a scenario in which a single truck carries identical drones
along with parcels from the depot to preassigned launching/parking
sites, from where the drones complete the last-mile deliveries.

Large-scale delivery problems frequently arise in practice, and SC
system response time is crucial. Conventional methods suffer from
scalability issues and long computation times. In contrast to previous
studies that tackle truck-drone delivery using conventional optimization
approaches, this paper proposes a two-phase ML approach for
the PLTSPHD, with the goal of minimizing the operational cost. The
proposed ML approach combines constrained clustering and DRL. The purpose
of constrained clustering is to find a set of feasible parking sites that
minimize drone flight range and increase service area coverage. Once
feasible parking sites are obtained, the developed DRL determines the
optimal routing plan that minimizes the total operational cost for the
truck-drone system. We mathematically formulated the constraints and
objective function of the PLTSPHD and then used them to design our
ML approach. The flight range constraint and the maximum number
of orders constraint are used in the constrained clustering, while the
objective function is utilized as a reward signal to train the agent in
DRL.

Experimental results show that last-mile delivery under the PLTSPHD
formulation has the potential to provide significant cost savings,
especially in large-scale problems. The proposed constrained k-means
clustering approach effectively reduces the scale and complexity of the
large routing problem while adhering to the operational constraints of
the PLTSPHD. The reduction in complexity enhances the performance
of all comparative algorithms; however, DRL remains superior to them,
especially in large cases. Sensitivity analysis shows the impact of
changes in the drone travel cost on the amount of savings obtained by
each configuration of the truck-drones system in all problem sizes. The
results show that savings remain significant for serving more customers
over a larger region. The savings are observed to be highly dependent
on the problem scale. While the amount of savings decreases with an
increase in the drone travel cost per mile, the business case remains
interesting, given that the most cited value of drone travel cost is $0.05
per mile.

From the perspective of real application, the methods and models
developed in this study could enable SCs to improve their last-mile
delivery practices, support real-time decision-making, provide contactless
delivery methods during pandemics, and meet the rapidly growing
demand for fast delivery. While PLTSPHD considers a single truck that
deploys a fleet of drones and then waits at parking stops to retrieve
all drones, future work can consider more complex scenarios such as
dynamic customers, time windows, drone speeds, and truck speeds. In
addition, instead of training the DRL on artificial data, a routing simulator
can be used to reflect real roads and capture the traffic conditions
of a dense city. Moreover, the two-phase ML approach for PLTSPHD
can be extended to include multiple trucks with many drones. Another
interesting direction for future research is the feasibility of drones
carrying multiple packages and serving multiple customers simultaneously.
Once this becomes feasible, the utilization of drones in last-mile delivery
will reshape the traditional logistics and transportation practices in
the SC. We hope that our study will inspire further research to learn
more about how different ML techniques can be used to address large
combinatorial optimization problems.

CRediT authorship contribution statement
Ali Arishi: Conceptualization, Methodology, Software, Formal analysis,
Investigation, Visualization, Writing – original draft, Writing –
review & editing, Resources. Krishna Krishnan: Conceptualization,
Supervision, Investigation, Writing – review & editing. Majed Arishi:
Statistical Experiment, Writing – review & editing.

Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.
