Sustainability 15 08262 v3
Article
A Multi-Agent Reinforcement Learning Approach to the
Dynamic Job Shop Scheduling Problem
Ali Fırat İnal 1,*, Çağrı Sel 2, Adnan Aktepe 1, Ahmet Kürşad Türker 1 and Süleyman Ersöz 1
Abstract: In a production environment, scheduling determines job and machine allocations and the
operation sequence. In a job shop production system, the wide variety of jobs, complex routes, and
real-life events make scheduling challenging. New, unexpected events disrupt the production
schedule and require it to be updated dynamically on an event basis. To solve the dynamic
scheduling problem, we propose a multi-agent system with reinforcement learning aimed at minimizing
tardiness and flow time to improve dynamic scheduling techniques. The performance of the proposed
multi-agent system is compared with the first-in–first-out, shortest processing time, and earliest
due date dispatching rules in terms of the proportion of tardy jobs, mean tardiness, maximum
tardiness, mean earliness, maximum earliness, mean flow time, maximum flow time, work in process,
and makespan. Five scenarios are generated with different arrival intervals of the jobs to the job
shop production system. The results of the experiments, performed for the 3 × 3, 5 × 5, and
10 × 10 problem sizes, show that our multi-agent system outperforms the dispatching rules as the
workload of the job shop increases. Under a heavy workload, the proposed multi-agent system gives
the best results for five performance criteria: the proportion of tardy jobs, mean tardiness,
maximum tardiness, mean flow time, and maximum flow time.
Keywords: dynamic job shop scheduling problem; multi-agent system; reinforcement learning; Industry 4.0; dispatching rules
Citation: İnal, A.F.; Sel, Ç.; Aktepe, A.; Türker, A.K.; Ersöz, S. A Multi-Agent Reinforcement Learning Approach to the Dynamic Job Shop Scheduling Problem. Sustainability 2023, 15, 8262. https://doi.org/10.3390/su15108262
Academic Editors: Samir Lamouri, Robert Pellerin and Pascal Forget
Received: 19 March 2023
Revised: 10 May 2023
Accepted: 16 May 2023
Published: 18 May 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Scheduling is one of the critical activities in production management for enhancing a production
system's performance. Scheduling determines which jobs are produced on each machine and in what
sequence [1–6]. If the arrival times of the jobs are known in advance, all of the jobs in a process
can be organized at once by static scheduling. However, the arrival time of each job can rarely be
foreseen in practice, so the production schedule must be updated dynamically while the system is
running.
In practice, many dynamic events can occur, such as variations in arrival and processing times,
machine breakdowns, order cancellations, and due date changes, and their exact timing is uncertain.
Random events continuously disrupt the current schedule, so a revised schedule is needed every time
a new event occurs. In this study, we propose a dynamic scheduling method based on an event-based
simulation to model the rescheduling issue.
In dynamic scheduling problems, production systems are classified as job shop, flow shop, mixed
shop, open shop, and group shop [7–14]. In a job shop production system, the variety of products is
high and the batch volume is low because customers' orders vary. In a dynamic job shop, new orders
constantly arrive at the system to be produced, and completed orders leave the system. The
continuous arrivals of jobs that require different
2. Literature Summary
In this section, we review the relevant studies published during 2010–2022. For the literature
review, we use "job shop scheduling", "dynamic job shop scheduling", "agent", "multi-agent system",
and "reinforcement learning" as the keywords. We include research papers indexed in the Science
Citation Index (SCI) and Science Citation Index Expanded (SCIE). We examine the DJSP characteristics
and solution approaches in the field. Interested readers are referred to recent review papers
[26–35] in the field. The relevant studies in the literature are summarized under three main
categories: the static problem, the dynamic problem, and the DJSP.
Komma et al. (2011) [36] is a pioneering study in the field. They prepared a guide on designing
agent architectures for different production systems using the Java Agent Development Framework,
and they considered a discrete event simulation that models the components of a production system.
Owliya et al. (2012) [37] designed a MAS for general use and tested it on a single machine
scheduling problem, using cost and resource utilization rates as the performance criteria. Leitao
et al. (2015) [38] designed a MAS and represented the agents' communication with each other as
block diagrams that show the general behavior of the agents. Yu et al. (2018) [39] developed
MAS-based scheduling on two identical parallel machines. They defined the operations and machines
as agents and took the makespan and total tardiness as the criteria. Wong et al. (2012) [40]
designed a MAS using ACO for the integrated process planning and scheduling problem. They took the
makespan, average flow time, and resource utilization rates as the criteria.
Wang et al. (2019) [41] developed a MAS in which the agents communicate with each other using game
theory. They tested this MAS architecture on a simulation model of a smart workshop, taking the
makespan, machine workloads, and energy consumption as the criteria, and showed that it yielded
better results than the FIFO-based and SPT-based approaches. Kim et al. (2020) [42] designed a MAS
for personalized manufacturing, using the makespan and maximum tardiness as the performance
criteria. They used an RL algorithm to develop the decision mechanism and compared the designed MAS
with dispatching rules frequently used in the literature.
While static scheduling problems are ideal for testing new solution methods, real-world
scheduling problems are dynamic. For this reason, a technique that offers feasible solutions to
the static problem may not provide a feasible solution to dynamic real-world problems.
Lee et al. (2017) [43] designed a structure with two rival agents in a system with two parallel
machines. The agents' goal was to minimize the makespan, and the structure was compared with a GA.
Ahmadi et al. (2016) [44] used NSGA-II and NRGA in the DFJSP, considering machine breakdowns. Shiue
et al. (2018) [45] designed a structure that switches dispatching rules by using RL for the DFJSP.
They chose the average flow time and number of tardy jobs as the criteria. Sahin et al. (2017) [46]
designed a MAS for the DFJSP in which each agent tried to achieve its own goal. They performed both
dynamic and static scheduling and achieved satisfactory results. Maoudj et al. (2019) [47] designed
a MAS architecture for the robotic flexible assembly cell, which is considered a DFJSP. The MAS
architecture created the schedule by switching between dispatching rules. They used the makespan as
a criterion and showed that their agent architecture could yield better results than the
metaheuristics they compared it with. Huang and Liao (2012) [48] designed a MAS architecture for
the dynamic parallel machine scheduling problem. In the MAS structure, which consists of work,
machine, and management agents, the communication between agents is examined in detail. As the
criteria, they considered total tardiness, flow time, resource utilization rates, and revenue value.
Y. Liu et al. (2018) [49] studied cloud manufacturing. They created a MAS-based scheduling
mechanism, tested it in a sample study using simulation, and explained the communication between
the agents in detail. Jiang et al. (2017) [50] worked on dynamic scheduling in CPPS. They
established a double-layered decision-making mechanism that performed the rescheduling activity
with a GA. The agents both collected information and took actions from the decision-making
mechanism. S. Zhang and Wong (2017) [51] simulated different dynamic factors in different scenarios
in the DFJSP. They hybridized their MAS-based approach with ACO and considered the makespan as a
criterion. Barenji et al. (2017) [52] worked on a MAS-based DSS for solving the D-Flow Shop problem.
They tested the MAS-based DSS by modeling a small- and medium-sized real-life system in a simulation
environment. They proposed a system that can perform both static and dynamic scheduling and used
the makespan as a criterion. Shi et al. (2021) [53] designed a MAS that updates the priorities of
the jobs with different types of GA. They tested the MAS structure in sustainable hybrid-flow-type
production, taking into account the makespan, energy consumption, and carbon emissions as the
criteria. The computation time of the proposed MAS structure increased with the problem size, but
it gave better results than the compared algorithms. Luo (2020) [54] designed a MAS with the RL
approach for the DFJSP. The makespan was used as a criterion, and the designed MAS was compared with
dispatching rules frequently used in the literature.
Since dynamic scheduling problems reflect real production systems, they are divided into many
categories. A method that gives feasible solutions for one category does not necessarily give
feasible solutions in other categories.
In this study, we propose a MAS-RL for the DJSP. The studies conducted on the DJSP
are summarized in Table 3 according to the problem/environment, the solution method,
and dynamic factors.
Sustainability 2023, 15, 8262 5 of 24
We examined both the methods and the problem characteristics of the studies in the literature,
and we describe how our study is similar to and differs from them. As a result of the literature
review, we identified the following contributions to the literature.
1. In our study, each job type can have a different priority value on each machine, which
constitutes a unique scheduling method. We found no other study in the literature that uses
this scheduling method exactly as it is used in this study. With this method, we aim to give
flexibility to the production schedule. The method is explained in detail in the following
sections. Researchers may adapt this scheduling method to their own studies and possibly
improve it with some modifications.
2. We found no other studies using MAS with RL for the DJSP in the literature. However, there is
one study using a single agent with RL, Kardos et al. (2020) [60]. Since few studies address this
specific combination of problem and solution method, our study can be considered novel in this
area. The aspects of our study that differ from Kardos et al. (2020) [60], and how they are
extended, are listed below.
i. While a single agent with RL was proposed for the DJSP in Kardos et al. (2020) [60], we extend
this approach by proposing a MAS with RL. In other words, we design a structure in which
multiple agents have different purposes. In this way, instead of optimizing the entire system
with a single agent, the system is optimized in parts by multiple agents.
ii. While Kardos et al. (2020) [60] considered only OA as a dynamic factor, we extend the dynamic
factors to OA, PT, and DD. Since combining more dynamic events makes the problem more difficult
to solve, this extends the literature in this respect.
iii. While Kardos et al. (2020) [60] used the average lead time as the performance criterion, we
extend the number of performance criteria to nine: the proportion of tardy jobs, mean tardiness,
maximum tardiness, mean earliness, maximum earliness, mean flow time, maximum flow time, work in
process, and makespan. With the expanded set of performance criteria, the results in this
specific area can be examined more broadly, making it possible to draw conclusions from various
aspects.
In addition to these contributions, we aimed to make it easier for researchers who are not experts
in MAS to understand MAS and develop their own studies on this subject. To this end, the MAS-RL in
this study was designed to be as understandable as possible, and each agent's working principle is
explained in detail. In this way, we hope to encourage future MAS studies.
3. Problem Statement
The basic framework of the DJSP is to use a limited number of machines (or service providers) to
process a specified number of jobs (or tasks) while trying to optimize specified objectives such as
the makespan or tardiness. Each job has a specified operation sequence, or route, through the
machines, with a specified processing time on each corresponding machine. When a job completes the
last operation in its sequence, it is considered finished.
The DJSP also has other constraints that need to be taken into account. Some studies in the
literature attempt to solve the problem by mathematical programming, while others use simulation
programs specially designed for scheduling problems. The advantages of using a simulation program
are that the production schedule can be stopped and examined at any time, the workflows can be
followed visually, and there is no need for a mathematical model. In our study, the Arena® package
program was used as the simulation program. Since the constraints of the DJSP are already built into
the program's modules, they are given as linguistic expressions as follows.
1. Different operations are performed on different machines;
2. A machine processes only one job at a time;
3. A job is processed on only one machine at a time;
4. Operations that have started cannot be interrupted or paused;
5. The jobs must follow their routes in the specified order;
6. The queue capacity is unlimited for every machine.
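Because these constraints are enforced implicitly by the Arena® modules, it may help to see them stated as explicit checks. The following Python sketch is a hypothetical validator, not part of the authors' model: it assumes a finished schedule is recorded as a list of operations (the illustrative `Op` record with job, machine, start, and end times, treated as half-open intervals) and that each job's route is known, and it reports violations of constraints 1 to 5. Constraint 6 needs no check, since queues are unbounded.

```python
from collections import defaultdict
from typing import NamedTuple


class Op(NamedTuple):
    """One executed operation (hypothetical record format)."""
    job: int
    machine: str
    start: float
    end: float


def _overlap(a: Op, b: Op) -> bool:
    # Half-open intervals [start, end) overlap if each starts before the other ends.
    return a.start < b.end and b.start < a.end


def check_schedule(ops: list, routes: dict) -> list:
    """Return messages for violated constraints 1-5 (empty list = feasible)."""
    errors = []
    for job, route in routes.items():
        if len(set(route)) != len(route):                      # constraint 1
            errors.append(f"job {job}: route repeats a machine")
    by_machine, by_job = defaultdict(list), defaultdict(list)
    for op in ops:
        by_machine[op.machine].append(op)
        by_job[op.job].append(op)
    for m, lst in by_machine.items():                          # constraint 2
        lst.sort(key=lambda o: o.start)
        # After sorting by start, any overlap shows up between neighbours.
        for a, b in zip(lst, lst[1:]):
            if _overlap(a, b):
                errors.append(f"machine {m}: jobs {a.job} and {b.job} overlap")
    for job, lst in by_job.items():
        lst.sort(key=lambda o: o.start)
        for a, b in zip(lst, lst[1:]):                         # constraint 3
            if _overlap(a, b):
                errors.append(f"job {job}: on two machines at once")
        visited = [o.machine for o in lst]
        if len(set(visited)) != len(visited):                  # constraint 4
            errors.append(f"job {job}: an operation was split (preempted)")
        if visited != routes[job]:                             # constraint 5
            errors.append(f"job {job}: visits {visited}, route is {routes[job]}")
    return errors
```

With the Figure 1 routes, a schedule in which two jobs occupy M1 over intersecting intervals yields a machine-overlap message, while a feasible schedule yields an empty list.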
Job shop scheduling problems can be of different sizes. The problem size is expressed as the number
of job types and the number of machines (j × m). The machine and job-type thresholds that determine
the complexity of a job-shop instance are not generally agreed upon in the literature [65]. In
addition, some studies in the literature report that the problem size does not make a difference
to the performance of the dispatching rules [66,67].
In this study, we performed experiments for the 3 × 3, 5 × 5, and 10 × 10 problem sizes to show
that our proposed approach works well for different problem sizes. The routes that jobs follow in a
job-shop environment are complex and difficult to track. To illustrate this, a visual
representation of the dynamic job-shop environment for the 3 × 3 problem size is given in Figure 1.
Figure 1. Visual representation of the dynamic job-shop environment for the 3 × 3 problem size.
"M" means machine, and "jt" means job type. Job type 1 is marked in red, job type 2 in green, and
job type 3 in blue. Each job type visits each machine according to its route. For example, jt1's
route (red) is M1–M2–M3, jt2's route (green) is M2–M3–M1, and jt3's route (blue) is M3–M1–M2. A
machine can have more than one job of the same job type in its queue.
The DJSP is an NP-hard problem. As the diversity of machines and job types and the complexity of
the jobs' routes increase, it becomes almost impossible to reach the optimal solution in polynomial
time. Moreover, the stochastic and dynamic nature of job arrivals to the system can make producing
reliable solutions computationally burdensome even for the 3 × 3 problem size.
In Table 4, the model notations are described.
Table 4. Notations.
Notation    Description
j           jth job or job index
m           mth machine or machine index
t           Current time
A_j         Arrival time of jth job to the system
A_j,m       Arrival time of jth job to mth machine
C_j         Completion time of jth job
D_j         Due date of jth job
P_j         Total processing time of jth job
P_j,m       Processing time of jth job on mth machine
Z_m         Priority index of mth machine
In the job-shop-type production system, jobs (j) arrive randomly at the shop floor according to an
exponential distribution. The arrival time of each job is recorded as (A_j). The processing times of
the job on each separate machine (m) are determined according to the normal distribution and are
recorded as (P_j,m). The due dates of the jobs are assigned as (D_j). After these assignments, the
jobs are directed to the first machines on their routes. If a machine is idle and its queue is
empty, the job's processing starts immediately; otherwise, the job is directed to the machine's
queue and waits for the machine to become idle. When the machine becomes idle and the job is chosen
next, the job enters the machine and is processed for time (P_j,m). The job is then routed to the
next machine on its route, and this sequence repeats until the job's route is complete. Here, we
assume that the transportation times between the machines are negligible. The completion time of
each job is recorded as (C_j) and is used to calculate the flow time (F_j) and the deviation from
the due date (Dev_j). The flow time is calculated by Equation (1), and Equation (2) calculates the
deviation from the due date. A positive deviation indicates that the job is tardy (or late), and the
outcome of the equation is considered tardiness (T_j). If the deviation is negative, it corresponds
to earliness (E_j). It is undesirable for a job leaving the system to be either early or tardy. A job
that is early indirectly causes other jobs in the system to be tardy. Ideally, every job is
completed exactly on its due date [68].
$$F_j = C_j - A_j, \qquad (1)$$
$$\mathrm{Dev}_j = C_j - D_j, \qquad (2)$$
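The job flow described above, together with Equations (1) and (2), can be sketched as a small discrete-event simulation. This is a hypothetical Python illustration rather than the authors' Arena® model: it assumes FIFO queues at every machine, zero transport time (as stated above), and a fixed, hand-written job list instead of randomly generated inputs, and it reports tardiness and earliness as non-negative magnitudes.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    jid: int
    route: List[str]            # machines in visiting order
    proc: Dict[str, float]      # P_j,m for each machine on the route
    arrival: float              # A_j
    due: float                  # D_j
    step: int = 0               # index of the next operation on the route
    completion: Optional[float] = None  # C_j, set when the route is finished


def simulate(jobs: List[Job]) -> None:
    """Event-driven job-shop simulation with FIFO queues; fills in C_j."""
    events = []                 # min-heap of (time, seq, kind, payload)
    seq = 0
    queues: Dict[str, List[Job]] = {}
    busy: Dict[str, bool] = {}

    def push(t, kind, payload):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, payload))  # seq breaks time ties
        seq += 1

    def start_if_idle(m, t):
        if not busy.get(m) and queues.get(m):
            job = queues[m].pop(0)          # FIFO: earliest job in the queue
            busy[m] = True
            push(t + job.proc[m], "finish", (job, m))

    for job in jobs:
        push(job.arrival, "arrive", job)
    while events:
        t, _, kind, payload = heapq.heappop(events)
        if kind == "arrive":                # job reaches its next machine
            job = payload
            m = job.route[job.step]
            queues.setdefault(m, []).append(job)
            start_if_idle(m, t)
        else:                               # an operation has finished
            job, m = payload
            busy[m] = False
            job.step += 1
            if job.step == len(job.route):
                job.completion = t          # route complete: record C_j
            else:
                push(t, "arrive", job)      # zero transport time assumed
            start_if_idle(m, t)


def metrics(job: Job):
    flow = job.completion - job.arrival     # Eq. (1): F_j = C_j - A_j
    dev = job.completion - job.due          # Eq. (2): Dev_j = C_j - D_j
    tardiness = max(dev, 0.0)               # T_j (positive deviation)
    earliness = max(-dev, 0.0)              # E_j as a non-negative magnitude
    return flow, dev, tardiness, earliness
```

For three jobs on the Figure 1 routes, `simulate` fills in each `completion`, after which `metrics` returns the flow time, deviation, tardiness, and earliness of each job.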
minimum arrival time should be selected and processed on the machine as a priority. FIFO
is formulated in Equation (3).
$$\min(Z_m) = D_j \quad \forall j, \qquad (5)$$
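The benchmark dispatching rules differ only in the priority index that is minimized when a machine selects its next job: arrival time at the machine for FIFO, processing time for SPT, and due date for EDD (Equation (5)). The Python sketch below illustrates this under stated assumptions: each queued job carries the Table 4 quantities A_j,m, P_j,m, and D_j; SPT uses the processing time of the operation waiting for this machine; and ties are broken by earliest arrival, which is our choice rather than something specified here. The names `QueuedJob`, `RULES`, and `select_next` are hypothetical.

```python
from typing import NamedTuple


class QueuedJob(NamedTuple):
    jid: int
    arrival_at_machine: float   # A_j,m
    proc_time: float            # P_j,m of the operation waiting for this machine
    due: float                  # D_j


# Priority index Z_m for each rule: the job with the smallest value is served next.
RULES = {
    "FIFO": lambda q: q.arrival_at_machine,
    "SPT": lambda q: q.proc_time,
    "EDD": lambda q: q.due,
}


def select_next(queue, rule):
    """Return the queued job chosen by FIFO, SPT, or EDD.

    Ties on the rule's index are broken by earliest arrival at the machine.
    """
    key = RULES[rule]
    return min(queue, key=lambda q: (key(q), q.arrival_at_machine))
```

Because each rule is just a key function, other index-based rules can be compared by adding entries to `RULES`.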
Figure 2. The proposed MAS-RL structure (agents: Job Agent, Database Agent, Decision Agent with the reinforcement learning mechanism, Queue Agent, and Machine Agent).
In the MAS-RL structure, jobs are created by a Job Agent, and a chain reaction begins when an order
arrives. The reaction continues until the maximum number of jobs has been reached. We describe the
goals, decisions, and internal mechanisms of the agents.
5.2. Reinforcement Learning Mechanism
In supervised learning, machine learning algorithms receive historical input and output data. The
supervised learning method allows the algorithm to produce outputs as close to the desired results
as possible by adjusting the model across the input/output pairs. Supervised learning algorithms
include decision trees, neural networks, support vector machines, and linear regression.
Labeled training sets are not used in unsupervised learning. Instead, the machine searches the data
for less obvious patterns, and this type of machine learning makes decisions by using the data to
find those patterns. K-means, hidden Markov models, Gaussian mixture models, and hierarchical
clustering models are common unsupervised learning algorithms.
RL is a type of machine learning that mirrors the human learning mechanism. The agent learns by
interacting with the environment and receives a positive reward or a negative reward (punishment).
The agent is programmed to seek a long-term reward to reach the goal [69]. The RL mechanism is
illustrated in Figure 3. The agent observes the state and takes an action. The environment changes
according to the action taken, and the agent receives a reward based on this change. The loop then
starts over with the agent observing the state again.
Figure 3. The agent–environment interaction in the RL framework.
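The loop in Figure 3 can be written generically as observe, act, receive a reward, and update. The Python sketch below is a hypothetical, minimal illustration of that loop only: the environment is a toy with two actions and Gaussian rewards, and the agent is a standard epsilon-greedy learner that keeps a running average of the reward for each action. It is not the MAS-RL of this paper, whose learning rule is the W-value update of Equations (7) and (8); all class names and parameter values are assumptions.

```python
import random


class ToyEnv:
    """Single-state toy environment: action 1 pays more on average (hypothetical)."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.state = 0

    def step(self, action: int):
        mean = 1.0 if action == 1 else 0.2
        reward = self.rng.gauss(mean, 0.1)
        return self.state, reward


class EpsilonGreedyAgent:
    def __init__(self, n_actions: int = 2, epsilon: float = 0.1, seed: int = 0):
        self.q = [0.0] * n_actions   # running-average reward per action
        self.n = [0] * n_actions     # how often each action was taken
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        # The toy has a single state, so the choice does not depend on it.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))                 # explore
        return max(range(len(self.q)), key=self.q.__getitem__)   # exploit

    def learn(self, action: int, reward: float) -> None:
        self.n[action] += 1
        self.q[action] += (reward - self.q[action]) / self.n[action]


def run(steps: int = 500) -> EpsilonGreedyAgent:
    env, agent = ToyEnv(), EpsilonGreedyAgent()
    state = env.state
    for _ in range(steps):
        action = agent.act(state)           # agent looks at the state and acts
        state, reward = env.step(action)    # environment changes and returns a reward
        agent.learn(action, reward)         # agent updates, then the loop repeats
    return agent
```

After a few hundred steps the agent's estimate for action 1 exceeds that for action 0, and it chooses action 1 most of the time.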
To use the RL mechanism, a priority table is needed, as in dispatching rules. Therefore, a priority
value is defined for each job type on each machine. These values are denoted by W. The W values are
shown in Table 5 only for the 3 × 3 problem, since the table grows with the number of job types and
machines. For the 5 × 5 and 10 × 10 problems, the table expands as the number of job types (i) and
machines (m) increases.
Table 5. W values by job type and machine indices (3 × 3 problem).
W_i,m       Machine (m): M1    M2    M3
When scheduler agents need to select a job from the corresponding machine’s queue,
they give priority to the job type with the highest W value. This structure is formulated in
Equation (6).
$$\max(Z_m) = W_{i,m} \quad \forall i, \qquad (6)$$
When a queue holds more than one job of the same job type, the W values of those jobs are equal, so
a tie occurs when trying to give priority to one of them. To break the tie, the FIFO rule is used to
select the job that arrived at the queue earliest.
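Equation (6) and the FIFO tie-break can be combined into a single selection key. The Python sketch below assumes W is stored as a nested mapping `W[i][m]` (job type, machine) and that each waiting job records its type and its arrival time at the machine; the names `Waiting` and `select_by_w` are hypothetical. Ties between different job types with equal W values are also resolved by FIFO here, which the text does not specify.

```python
from typing import Dict, NamedTuple


class Waiting(NamedTuple):
    jid: int
    job_type: int               # i
    arrival_at_machine: float   # A_j,m, used only to break ties (FIFO)


def select_by_w(queue, W: Dict[int, Dict[str, float]], m: str) -> Waiting:
    """Eq. (6): serve the job whose type i has the highest W[i][m].

    Jobs of the same type share the same W value, so ties go to the job
    that entered the queue first (smallest A_j,m).
    """
    return max(queue, key=lambda q: (W[q.job_type][m], -q.arrival_at_machine))
```

Negating the arrival time lets a single `max` call express both "highest W" and "earliest arrival among equals".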
The W values are updated every time a job leaves the system. A job leaving the system changes the
priority values of all jobs of the same type in the system. For example, a job of type 2 updates
W_21, W_22, and W_23 when it leaves the system. The magnitude of the change is given by Equation (7)
for tardy jobs and by Equation (8) for early jobs.
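$$W_{i,m}^{\text{new}} = W_{i,m}^{\text{old}} + \left[ \frac{T_j}{\max T_j} + \frac{N_i}{\max(N_i)} + \frac{M_m}{\max(M_m)} \right] \Big/ \alpha \quad \forall i, m, \qquad (7)$$
$$W_{i,m}^{\text{new}} = W_{i,m}^{\text{old}} - \left( \frac{E_j}{\max E_j} \right) \Big/ \beta \quad \forall i, m \qquad (8)$$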
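Equations (7) and (8) can be sketched as an in-place update of one row of the W table. This Python sketch follows the example in the text, in which a leaving job of type i updates W_i,m on every machine m. The quantities N_i and M_m are not defined in this section, so the hypothetical `update_w` function simply takes them as caller-supplied mappings; treating max T_j and max E_j as caller-supplied maxima, leaving W unchanged for a job that finishes exactly on time, and the zero-division guard are likewise assumptions.

```python
from typing import Dict


def update_w(W: Dict[int, Dict[str, float]], i: int, *, tardiness: float,
             earliness: float, max_tardiness: float, max_earliness: float,
             N: Dict[int, float], M: Dict[str, float],
             alpha: float, beta: float) -> None:
    """Update row W[i] in place after a job of type i leaves the system.

    Eq. (7) is applied when the job is tardy and Eq. (8) when it is early;
    a job finishing exactly on time leaves W unchanged (an assumption).
    N[i] and M[m] stand for the paper's N_i and M_m, supplied by the caller.
    """
    max_n = max(N.values()) or 1.0   # guard against division by zero
    max_m = max(M.values()) or 1.0
    for m in W[i]:
        if tardiness > 0:
            W[i][m] += (tardiness / max_tardiness
                        + N[i] / max_n + M[m] / max_m) / alpha      # Eq. (7)
        elif earliness > 0:
            W[i][m] -= (earliness / max_earliness) / beta           # Eq. (8)
```

Larger α and β damp the updates, so they act like learning rates for tardy and early jobs, respectively.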
6. Simulation Model
To simulate a real job-shop environment, all input data need to be generated dynamically and
stochastically throughout the simulation period. For this reason, while the simulation is running,
input data such as the jobs' arrival times, processing times, and due dates are generated from
probability distributions when needed. The routes and processing times of the jobs used in the
3 × 3 problem simulation model are given in Table 6. The processing times are randomly generated
from the normal distribution. The job type (i) rows in the table expand to 5 rows for the 5 × 5
problem and 10 rows for the 10 × 10 problem. The processing times used for the 5 × 5 and 10 × 10
problems are given in Tables 7 and 8, respectively.
The jobs’ arrival rates are assigned separately for five different scenarios. As the
time between arrivals becomes shorter, the jobs’ arrivals become more frequent, and the
workload of the system increases. The scenarios are presented in Table 9, which represents
very low, low, moderate, heavy, and very heavy workloads. The time between arrivals is
randomly generated by the exponential distribution.
There are different due date assignment methods in the literature for job-shop-scheduling problems,
and none of them has a clear advantage over the others. Owing to its ease of implementation, one of
the "processing time multiplying" methods, as used in [70], was adopted. In this method, the due date
of each job is assigned using the uniform distribution, as shown in Equation (9). The due date
calculation takes into account both the arrival time of the job and its estimated total processing
time. After the due date assignment, jobs go to the first machine on their routes.
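The input generation described in this section can be sketched as follows. This hypothetical Python generator draws exponential interarrival times, normal processing times, and due dates of the form D_j = A_j + k · P_j with k drawn uniformly from a range. That due-date form is a common processing-time multiplier rule assumed here because Equation (9) is not reproduced in this section; the uniform job-type mix, the clipping of negative normal draws, and all parameter values are likewise assumptions, and the paper's actual values are in Tables 6 to 9.

```python
import random
from typing import Dict, List, Tuple


def generate_jobs(n_jobs: int,
                  routes: Dict[int, List[str]],
                  proc_params: Dict[Tuple[int, str], Tuple[float, float]],
                  mean_interarrival: float,
                  k_range: Tuple[float, float],
                  seed: int = 0) -> List[dict]:
    """Hypothetical input generator for one simulation run.

    routes:            job type i -> machines in visiting order
    proc_params:       (i, m) -> (mean, std) of the normal processing time P_j,m
    mean_interarrival: mean time between arrivals (exponential); set per scenario
    k_range:           (low, high) of the due-date multiplier k ~ U(low, high)
    """
    rng = random.Random(seed)
    t = 0.0
    jobs = []
    types = sorted(routes)
    for jid in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)   # A_j
        i = rng.choice(types)                            # job type, uniform (assumed)
        proc = {}
        for m in routes[i]:
            mu, sigma = proc_params[(i, m)]
            # Normal draws can be negative; clip to a small positive value (assumed).
            proc[m] = max(rng.gauss(mu, sigma), 0.01)
        total = sum(proc.values())                       # P_j
        due = t + rng.uniform(*k_range) * total          # D_j = A_j + k * P_j
        jobs.append({"id": jid, "type": i, "arrival": t, "proc": proc, "due": due})
    return jobs
```

Shortening `mean_interarrival` raises the arrival rate, which is how the heavier-workload scenarios would be reproduced in this sketch.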
Figure 5. Event-based graph of W values in the 3 × 3 problem (panels include Scenario 4 and Scenario 5; x-axis: events; plotted series include W12, W13, W21, W23, W31, and W32).
The graphs in Figure 5 show that a learning curve (LC), which is very common in machine learning
studies in the literature, has emerged. An LC is known for sharp peaks at first, followed by
stabilization as time passes [71]. The LC describes a system's performance on a task as a function
of the resources spent on that task, as shown in Figure 6.
Figure 6. A representation of an LC.
In machine learning studies, performance criteria such as the Mean Squared Error (MSE) or the Mean
Absolute Percentage Error (MAPE) are often used. Instead of these, our study adopted a strategy of
correcting each error immediately as it occurred. The MAS-RL constantly monitored the system and
updated the W values according to the magnitude of the errors. That is, the W values changed in
step with the errors.
8. Conclusions
In this paper, a MAS-RL approach was proposed to solve the DJSP. The performance of the proposed
approach was compared with the FIFO, SPT, and EDD dispatching rules from the literature. Five
different scenarios with increasing job arrival rates and nine different performance criteria were
used for the comparison. Experiments were performed for the 3 × 3, 5 × 5, and 10 × 10 problem sizes.
The following conclusions were drawn from the experimental results.
1. As the workload increases, the MAS-RL performs better. From Scenario 1 to Scenario 5, the
performance of the MAS-RL improves as the workload increases. We attribute this to two factors.
First, as the workload increases, the number of jobs in the system also increases, so more
scheduling activities are needed; the MAS-RL quickly examines the status of all jobs in the
system and makes the most appropriate choices. Second, the MAS-RL starts to make more effective
decisions after completing its learning stage. Having many jobs in the system at the same time
enables the MAS-RL to learn faster and also allows it to apply what it has learned to more jobs.
2. The MAS-RL can effectively reduce tardiness. For all problem sizes, the MAS-RL gave the best results in Scenarios 4 and 5 for all the tardiness-related performance criteria. In the literature, EDD and its derivatives are known as the dispatching rules that work best for tardiness. However, the MAS-RL showed promising results for tardiness, outperforming EDD under heavy workloads.
3. The MAS-RL can reduce the flow time. For the 3 × 3 problem in Scenarios 4 and 5, the MAS-RL gave the best results for all the performance criteria related to flow time. For the 5 × 5 and 10 × 10 problems, the MAS-RL gave the best results only for PC6 (mean flow time).
4. The MAS-RL can also give competitive solutions in terms of the makespan, which is the total time required to complete a given set of jobs. For all problem sizes in Scenario 2, the MAS-RL gave the best results for the makespan. For real businesses, however, it is not very meaningful to look only at the makespan: even when the makespan is optimal, if orders miss their due dates, the business may soon lose its customers. We nevertheless included the makespan in our study, as it has been used as a criterion since the earliest studies of scheduling problems.
5. There is no notable relationship between the problem size and the performance of the MAS-RL. Apart from minor differences, the solution methods yield similar results for all problem sizes.
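To make the comparison criteria concrete, the following Python sketch computes eight of the nine performance criteria (WIP is omitted for brevity) for the three benchmark dispatching rules on a deliberately simplified single-machine instance. The `Job` record, the `simulate` and `criteria` helpers, and the three-job example are hypothetical illustrations under these assumptions, not the authors' simulation model, which is a multi-machine job shop. The example also reflects the point made about the makespan in conclusion 4: all three rules reach the same makespan while differing in tardiness and flow time.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    name: str
    arrival: float  # release time
    proc: float     # processing time on the single machine
    due: float      # due date


# Sort keys for the benchmark rules: the queued job with the smallest key is served next.
RULES = {
    "FIFO": lambda j: j.arrival,
    "SPT": lambda j: j.proc,
    "EDD": lambda j: j.due,
}


def simulate(jobs, rule):
    """Non-preemptive single-machine dispatching; returns {job name: completion time}."""
    key = RULES[rule]
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, completion, t = [], {}, 0.0
    while pending or queue:
        # Release every job that has arrived by the current time into the queue.
        while pending and pending[0].arrival <= t:
            queue.append(pending.pop(0))
        if not queue:
            t = pending[0].arrival  # machine idles until the next arrival
            continue
        job = min(queue, key=key)  # ties go to the earliest-queued job
        queue.remove(job)
        t += job.proc
        completion[job.name] = t
    return completion


def criteria(jobs, completion):
    """Eight of the nine criteria (WIP omitted); makespan is C_max measured from time 0."""
    tardiness = [max(0.0, completion[j.name] - j.due) for j in jobs]
    earliness = [max(0.0, j.due - completion[j.name]) for j in jobs]
    flow = [completion[j.name] - j.arrival for j in jobs]
    return {
        "tardy_share": sum(x > 0 for x in tardiness) / len(jobs),
        "mean_tardiness": mean(tardiness),
        "max_tardiness": max(tardiness),
        "mean_earliness": mean(earliness),
        "max_earliness": max(earliness),
        "mean_flow": mean(flow),
        "max_flow": max(flow),
        "makespan": max(completion.values()),
    }


if __name__ == "__main__":
    jobs = [Job("A", 0, 4, 10), Job("B", 0, 1, 6), Job("C", 0, 2, 3)]
    for rule in RULES:
        m = criteria(jobs, simulate(jobs, rule))
        print(rule, m["makespan"], m["max_tardiness"], round(m["mean_flow"], 2))
    # FIFO 7.0 4.0 5.33
    # SPT 7.0 0.0 3.67
    # EDD 7.0 0.0 4.0
```

In this toy instance, every rule finishes at time 7, yet FIFO leaves job C four time units late, which is why a makespan-only comparison can hide poor due-date performance.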
Another unique contribution of this study to the literature is that each job type could receive a different priority on each machine. In addition, these priorities were integrated with the RL mechanism so that they could change over time. This allowed for more flexible changes in the production schedule.
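The idea of giving each job type its own priority on each machine, with priorities that change over time, can be sketched as a small lookup table. This is a hypothetical illustration: the `PriorityTable` class, the step size `alpha`, and the incremental update p ← p + α(r − p) (the constant step-size update described in Sutton and Barto's textbook) are our assumptions, not necessarily the W-value calculation used in the MAS-RL.

```python
from collections import defaultdict


class PriorityTable:
    """Hypothetical table of priorities indexed by (job type, machine)."""

    def __init__(self, alpha=0.1, initial=0.0):
        self.alpha = alpha  # step size: how quickly priorities follow new rewards
        self.p = defaultdict(lambda: initial)  # unseen pairs start at `initial`

    def choose(self, machine, queue):
        """Pick the queued (job_id, job_type) with the highest priority on `machine`.

        Ties are broken in favor of the job that appears first in the queue.
        """
        return max(queue, key=lambda job: self.p[(job[1], machine)])

    def update(self, job_type, machine, reward):
        """Move the priority a fraction alpha of the way toward the observed reward."""
        key = (job_type, machine)
        self.p[key] += self.alpha * (reward - self.p[key])


if __name__ == "__main__":
    table = PriorityTable(alpha=0.5)
    table.update("T1", "M1", 1.0)   # T1 performed well on M1
    table.update("T2", "M1", -1.0)  # T2 performed poorly on M1
    table.update("T2", "M2", 1.0)   # ...but well on M2
    queue = [("j1", "T2"), ("j2", "T1")]
    print(table.choose("M1", queue))  # ('j2', 'T1')
    print(table.choose("M2", queue))  # ('j1', 'T2')
```

The same job type can therefore rank first on one machine and last on another, and repeated updates let those rankings shift as rewards accumulate.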
In future studies, the MAS-RL can be tested in larger or smaller systems. Dynamic events such as machine failures and order cancellations can also be implemented in future research. Different parameters could be used to calculate the W values, which may change the duration of the MAS-RL's learning period. Researchers can also adapt this scheduling method to different problem types in their own studies.
Author Contributions: Conceptualization, A.F.İ.; methodology, A.K.T. and A.F.İ.; software, A.F.İ. and
Ç.S.; validation, A.A., Ç.S. and S.E.; formal analysis, A.F.İ.; investigation, A.F.İ.; resources, A.F.İ. and
S.E.; data curation, A.F.İ.; writing—original draft preparation, A.F.İ. and Ç.S.; writing—review and
editing, A.F.İ. and Ç.S.; visualization, A.F.İ.; supervision, Ç.S., A.K.T. and S.E.; project administration,
A.F.İ. and S.E. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
OC Order cancellations
PT Processing times
RL Reinforcement learning
SA Simulated annealing
SEA Symbiotic evolutionary algorithm
SHFS Sustainable hybrid flow shop
SPT Shortest processing time
ST Setup times
TS Tabu search
UO Urgent orders
VNS Variable neighborhood search
WSI Weighted sum of indicators
References
1. Nelson, R.T.; Holloway, C.A.; Wong, R.M.-L. Centralized Scheduling and Priority Implementation Heuristics for a Dynamic Job
Shop Model. AIIE Trans. 1977, 9, 95–102. [CrossRef]
2. Anciaux, D.; Roy, D.; Vernadat, F. Reactive Shop-Floor Control with a Multi-Agent System. IFAC Proc. Vol. 1997, 30, 425–430.
[CrossRef]
3. Brauer, W.; Weib, G.; Munchen, T.U. Multi-Machine Scheduling—A Multi-Agent Learning Approach. In Proceedings of the
International Conference on Multi Agent Systems, (Cat. No. 98EX160). Paris, France, 3–7 July 1998; pp. 42–48.
4. Chen, Y.-Y.; Fu, L.-C.; Chen, Y.-C. Multi-agent based dynamic scheduling for a flexible assembly system. In Proceedings of the IEEE
International Conference on Robotics and Automation, (Cat. No. 98CH36146). Leuven, Belgium, 20 May 1998; pp. 2122–2127.
[CrossRef]
5. Shen, W.; Maturana, F.; Norrie, D. Learning in Agent-Based Manufacturing. In Proceedings of the Artificial Intelligence and
Manufacturing Research Planning Workshop, Madison, WI, USA, 26–30 July 1998; pp. 177–183.
6. Sousa, P.; Ramos, C. A distributed architecture and negotiation protocol for scheduling in manufacturing systems. Comput. Ind.
1999, 38, 103–113. [CrossRef]
7. Maturana, F.; Shen, W.; Norrie, D. MetaMorph: An adaptive agent-based architecture for intelligent manufacturing. Int. J. Prod.
Res. 1999, 37, 2159–2173. [CrossRef]
8. Ouelhadj, D.; Hanach, C.; Bouzouia, B. Multi-agent system for dynamic scheduling and control in manufacturing cells. In
Proceedings of the 1998 IEEE International Conference on Robotics and Automation, (Cat. No. 98CH36146). Leuven, Belgium,
20 May 1998. [CrossRef]
9. Ouelhadj, D.; Hanachi, C.; Bouzouia, B.; Moualek, A.; Farhi, A. A multi-contract net protocol for dynamic scheduling in flexible
manufacturing systems (FMS). In Proceedings of the 1999 IEEE International Conference on Robotics and Automation, (Cat. No.
99CH36288C). Detroit, MI, USA, 10–15 May 1999. [CrossRef]
10. Frey, D.; Nimis, J.; Wörn, H.; Lockemann, P. Benchmarking and robust multi-agent-based production planning and control. Eng.
Appl. Artif. Intell. 2003, 16, 307–320. [CrossRef]
11. Grandgirard, J.; Poinsot, D.; Krespi, L.; Nénon, J.P.; Cortesero, A.M. Costs of secondary parasitism in the facultative hyperpara-
sitoid Pachycrepoideus dubius: Does host size matter? Entomol. Exp. Appl. 2002, 103, 239–248. [CrossRef]
12. Bongaerts, L.; Monostori, L.; McFarlane, D.; Kádár, B. Hierarchy in distributed shop floor control. Comput. Ind. 2000, 43, 123–137.
[CrossRef]
13. Paternina-Arboleda, C.D.; Das, T.K. A multi-agent reinforcement learning approach to obtaining dynamic control policies for
stochastic lot scheduling problem. Simul. Model. Pr. Theory 2005, 13, 389–406. [CrossRef]
14. Liu, S.; Ong, H.; Ng, K. Metaheuristics for minimizing the makespan of the dynamic shop scheduling problem. Adv. Eng. Softw.
2005, 36, 199–205. [CrossRef]
15. Wang, Y.-C.; Usher, J.M. Application of reinforcement learning for agent-based production scheduling. Eng. Appl. Artif. Intell.
2005, 18, 73–82. [CrossRef]
16. Wong, T.; Leung, C.; Mak, K.; Fung, R. Dynamic shopfloor scheduling in multi-agent manufacturing systems. Expert Syst. Appl.
2006, 31, 486–494. [CrossRef]
17. Wang, S.-J.; Xi, L.-F.; Zhou, B.-H. FBS-enhanced agent-based dynamic scheduling in FMS. Eng. Appl. Artif. Intell. 2008, 21, 644–657.
[CrossRef]
18. Blanc, P.; Demongodin, I.; Castagna, P. A holonic approach for manufacturing execution system design: An industrial application.
Eng. Appl. Artif. Intell. 2008, 21, 315–330. [CrossRef]
19. Xiang, W.; Lee, H. Ant colony intelligence in multi-agent dynamic manufacturing scheduling. Eng. Appl. Artif. Intell.
2008, 21, 73–85. [CrossRef]
20. Chaouch, I.; Driss, O.B.; Ghedira, K. A Survey of Optimization Techniques for Distributed Job Shop Scheduling Problems in
Multi-factories. In Cybernetics and Mathematics Applications in Intelligent Systems: Proceedings of the 6th Computer Science
On-line Conference 2017 (CSOC2017), 26–29 April 2017; Springer International Publishing: Berlin/Heidelberg, Germany, 2017;
pp. 369–378. [CrossRef]
21. Guo, Q.-L.; Zhang, M. Multiagent-based scheduling optimization for Intelligent Manufacturing System. Int. J. Adv. Manuf.
Technol. 2008, 44, 595–605. [CrossRef]
22. Guo, Q.; Zhang, M. A novel approach for multi-agent-based Intelligent Manufacturing System. Inf. Sci. 2009, 179, 3079–3090.
[CrossRef]
23. Ouelhadj, D.; Petrovic, S. A survey of dynamic scheduling in manufacturing systems. J. Sched. 2008, 12, 417–431. [CrossRef]
24. Cossentino, M.; Fortino, G.; Gleizes, M.-P.; Pavón, J. Simulation-based design and evaluation of multi-agent systems. Simul.
Model. Pr. Theory 2010, 18, 1425–1427. [CrossRef]
25. Moyaux, T.; Liu, Y.; Bouleux, G.; Cheutet, V. An agent-based architecture of the Digital Twin for an Emergency Department.
Sustainability 2023, 15, 3412. [CrossRef]
26. Asadzadeh, L. A local search genetic algorithm for the job shop scheduling problem with intelligent agents. Comput. Ind. Eng.
2015, 85, 376–383. [CrossRef]
27. Aydemir, E.; Koruca, H.I. A New Production Scheduling Module Using Priority-Rule Based Genetic Algorithm. Int. J. Simul.
Model. 2015, 14, 450–462. [CrossRef]
28. Wang, S.; Wan, J.; Zhang, D.; Li, D.; Zhang, C. Towards smart factory for industry 4.0: A self-organized multi-agent system with
big data based feedback and coordination. Comput. Netw. 2016, 101, 158–168. [CrossRef]
29. Karnouskos, S.; Leitao, P. Key Contributing Factors to the Acceptance of Agents in Industrial Environments. IEEE Trans. Ind.
Inform. 2016, 13, 696–703. [CrossRef]
30. Li, K.; Zhou, T.; Liu, B.-H.; Li, H. A multi-agent system for sharing distributed manufacturing resources. Expert Syst. Appl.
2018, 99, 32–43. [CrossRef]
31. Zhou, L.; Zhang, L.; Sarker, B.R.; Laili, Y.; Ren, L. An event-triggered dynamic scheduling method for randomly arriving tasks in
cloud manufacturing. Int. J. Comput. Integr. Manuf. 2017, 31, 318–333. [CrossRef]
32. Gao, K.; Cao, Z.; Zhang, L.; Chen, Z.; Han, Y.; Pan, Q. A review on swarm intelligence and evolutionary algorithms for solving
flexible job shop scheduling problems. IEEE/CAA J. Autom. Sin. 2019, 6, 904–916. [CrossRef]
33. Wang, J.; Zhang, Y.; Liu, Y.; Wu, N. Multiagent and Bargaining-Game-Based Real-Time Scheduling for Internet of Things-Enabled
Flexible Job Shop. IEEE Internet Things J. 2018, 6, 2518–2531. [CrossRef]
34. Zhang, C.-L.; Wang, J.-Q.; Zhang, C.-W. Two-agent scheduling on a single parallel-batching machine to minimize the weighted
sum of the agents’ makespans. J. Ambient. Intell. Humaniz. Comput. 2018, 10, 999–1007. [CrossRef]
35. Mohan, J.; Lanka, K.; Rao, A.N. A Review of Dynamic Job Shop Scheduling Techniques. Procedia Manuf. 2019, 30, 34–39.
[CrossRef]
36. Komma, V.R.; Jain, P.K.; Mehta, N.K. An approach for agent modeling in manufacturing on JADE™ reactive architecture. Int. J.
Adv. Manuf. Technol. 2010, 52, 1079–1090. [CrossRef]
37. Owliya, M.; Saadat, M.; Anane, R.; Goharian, M. A New Agents-Based Model for Dynamic Job Allocation in Manufacturing
Shopfloors. IEEE Syst. J. 2012, 6, 353–361. [CrossRef]
38. Leitao, P.; Rodrigues, N.; Turrin, C.; Pagani, A. Multiagent System Integrating Process and Quality Control in a Factory Producing
Laundry Washing Machines. IEEE Trans. Ind. Inform. 2015, 11, 879–886. [CrossRef]
39. Yu, F.; Wen, P.; Yi, S. A multi-agent scheduling problem for two identical parallel machines to minimize total tardiness time and
makespan. Adv. Mech. Eng. 2018, 10, 1687814018756103. [CrossRef]
40. Wong, T.; Zhang, S.; Wang, G.; Zhang, L. Integrated process planning and scheduling—Multi-agent system with two-stage ant
colony optimisation algorithm. Int. J. Prod. Res. 2012, 50, 6188–6201. [CrossRef]
41. Wang, Y.; Liu, H.; Zheng, W.; Xia, Y.; Li, Y.; Chen, P.; Guo, K.; Xie, H. Multi-Objective Workflow Scheduling with Deep-Q-Network-
Based Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 39974–39982. [CrossRef]
42. Kim, Y.G.; Lee, S.; Son, J.; Bae, H.; Chung, B.D. Multi-agent system and reinforcement learning approach for distributed
intelligence in a flexible smart manufacturing system. J. Manuf. Syst. 2020, 57, 440–450. [CrossRef]
43. Lee, W.-C.; Chung, Y.-H.; Wang, J.-Y. A parallel-machine scheduling problem with two competing agents. Eng. Optim.
2016, 49, 962–975. [CrossRef]
44. Ahmadi, E.; Zandieh, M.; Farrokh, M.; Emami, S.M. A multi objective optimization approach for flexible job shop scheduling
problem under random machine breakdown by evolutionary algorithms. Comput. Oper. Res. 2016, 73, 56–66. [CrossRef]
45. Shiue, Y.-R.; Lee, K.-C.; Su, C.-T. Real-time scheduling for a smart factory using a reinforcement learning approach. Comput. Ind.
Eng. 2018, 125, 604–614. [CrossRef]
46. Sahin, C.; Demirtas, M.; Erol, R.; Baykasoğlu, A.; Kaplanoğlu, V. A multi-agent based approach to dynamic scheduling with
flexible processing capabilities. J. Intell. Manuf. 2015, 28, 1827–1845. [CrossRef]
47. Maoudj, A.; Bouzouia, B.; Hentout, A.; Kouider, A.; Toumi, R. Distributed multi-agent scheduling and control system for robotic
flexible assembly cells. J. Intell. Manuf. 2017, 30, 1629–1644. [CrossRef]
48. Huang, C.-J.; Liao, L.-M. A multi-agent-based negotiation approach for parallel machine scheduling with multi-objectives in an
electro-etching process. Int. J. Prod. Res. 2012, 50, 5719–5733. [CrossRef]
49. Liu, Y.; Wang, L.; Wang, Y.; Wang, X.V.; Zhang, L. Multi-agent-based scheduling in cloud manufacturing with dynamic task
arrivals. Procedia CIRP 2018, 72, 953–960. [CrossRef]
50. Jiang, Z.; Jin, Y.; Mingcheng, E.; Li, Q. Distributed Dynamic Scheduling for Cyber-Physical Production Systems Based on a
Multi-Agent System. IEEE Access 2017, 6, 1855–1869. [CrossRef]
51. Zhang, S.; Wong, T.N. Flexible job-shop scheduling/rescheduling in dynamic environment: A hybrid MAS/ACO approach. Int. J.
Prod. Res. 2016, 55, 3173–3196. [CrossRef]
52. Barenji, A.V.; Barenji, R.V.; Roudi, D.; Hashemipour, M. A dynamic multi-agent-based scheduling approach for SMEs. Int. J. Adv.
Manuf. Technol. 2016, 89, 3123–3137. [CrossRef]
53. Shi, L.; Guo, G.; Song, X. Multi-agent based dynamic scheduling optimisation of the sustainable hybrid flow shop in a ubiquitous
environment. Int. J. Prod. Res. 2019, 59, 576–597. [CrossRef]
54. Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput.
2020, 91, 106208. [CrossRef]
55. Baykasoglu, A.; Karaslan, F.S. Solving comprehensive dynamic job shop scheduling problem by using a GRASP-based approach.
Int. J. Prod. Res. 2017, 55, 3308–3325. [CrossRef]
56. Sel, Ç.; Hamzadayı, A. A simulated annealing approach based simulation-optimisation to the dynamic job-shop scheduling
problem. Pamukkale Univ. J. Eng. Sci. 2018, 24, 665–674. [CrossRef]
57. Zhang, H.; Roy, U. A semantics-based dispatching rule selection approach for job shop scheduling. J. Intell. Manuf. 2018, 30,
2759–2779. [CrossRef]
58. Turker, A.K.; Aktepe, A.; Inal, A.F.; Ersoz, O.O.; Das, G.S.; Birgoren, B. A Decision Support System for Dynamic Job-Shop
Scheduling Using Real-Time Data with Simulation. Mathematics 2019, 7, 278. [CrossRef]
59. Aydin, M.; Öztemel, E. Dynamic job-shop scheduling using reinforcement learning agents. Robot. Auton. Syst. 2000, 33, 169–178.
[CrossRef]
60. Kardos, C.; Laflamme, C.; Gallina, V.; Sihn, W. Dynamic scheduling in a job-shop production system with reinforcement learning.
Procedia CIRP 2021, 97, 104–109. [CrossRef]
61. Erol, R.; Sahin, C.; Baykasoglu, A.; Kaplanoglu, V. A multi-agent based approach to dynamic scheduling of machines and
automated guided vehicles in manufacturing systems. Appl. Soft Comput. 2012, 12, 1720–1732. [CrossRef]
62. Jana, T.K.; Bairagi, B.; Paul, S.; Sarkar, B.; Saha, J. Dynamic schedule execution in an agent based holonic manufacturing system. J.
Manuf. Syst. 2013, 32, 801–816. [CrossRef]
63. Leusin, M.E.; Kück, M.; Frazzon, E.M.; Maldonado, M.U.; Freitag, M. Potential of a Multi-Agent System Approach for Production
Control in Smart Factories. IFAC-PapersOnLine 2018, 51, 1459–1464. [CrossRef]
64. Leusin, M.E.; Frazzon, E.M.; Maldonado, M.U.; Kück, M.; Freitag, M. Solving the Job-Shop Scheduling Problem in the Industry
4.0 Era. Technologies 2018, 6, 107. [CrossRef]
65. Sels, V.; Gheysen, N.; Vanhoucke, M. A comparison of priority rules for the job shop scheduling problem under different flow
time- and tardiness-related objective functions. Int. J. Prod. Res. 2012, 50, 4255–4270. [CrossRef]
66. Holthaus, O.; Rajendran, C. Efficient dispatching rules for scheduling in a job shop. Int. J. Prod. Econ. 1997, 48, 87–105. [CrossRef]
67. Jain, A.; Meeran, S. Deterministic job-shop scheduling: Past, present and future. Eur. J. Oper. Res. 1999, 113, 390–434. [CrossRef]
68. Yazdani, M.; Aleti, A.; Khalili, S.M.; Jolai, F. Optimizing the sum of maximum earliness and tardiness of the job shop scheduling
problem. Comput. Ind. Eng. 2017, 107, 12–24. [CrossRef]
69. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018;
ISBN 9780262352703.
70. Baykasoǧlu, A.; Göçken, M.; Unutmaz, Z.D. New approaches to due date assignment in job shops. Eur. J. Oper. Res.
2008, 187, 31–45. [CrossRef]
71. Mohr, F.; van Rijn, J.N. Learning Curves for Decision Making in Supervised Machine Learning—A Survey. arXiv 2022,
arXiv:2201.12150.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.