Simulation
System Simulation and Modeling
Unit 1
Introduction to Simulation
Simulation is the imitation of the operation of a real-world process or system over time.
Simulation involves the generation of an artificial history of a system, and the observation
of that artificial history to draw inferences concerning the operating characteristics of the
real system that is represented.
1. Special training is required for model building. It is an art that is learned over time
and through experience.
2. Simulation results can be difficult to interpret, because most simulation outputs are
essentially random variables (they are usually based on random inputs).
3. Simulation modeling and analysis can be time-consuming and expensive.
1. Simulation should not be used when the problem can be solved using common
sense. For example, if customers arrive randomly at an average rate of 50/hour and
each server serves at a mean rate of 10/hour, simulation is not required to determine
the minimum number of servers. Just compute 50/10 = 5 servers.
2. Simulation should not be used if the problem can be solved analytically.
3. Simulation should not be used if it is easier to perform direct experiments.
4. Simulation should not be used if its costs exceed the savings.
5 & 6. Simulation should not be performed if the resources or time are not available.
7. If no data is available, not even estimates, then simulation is not advised, as it
requires data, sometimes lots of data.
8. This rule concerns the ability to verify and validate the model. If there is
not enough time or the personnel are not available, simulation is not appropriate.
9. If managers have unreasonable expectations (say, too much too soon) or the power of
simulation is overestimated, then simulation may not be appropriate.
10. If system behavior is too complex or cannot be defined, then simulation is not appropriate.
Manufacturing Applications
Semiconductor Manufacturing
Construction Engineering and project management
Military application
Logistics, Supply chain and distribution application
Transportation modes and Traffic
Business Process Simulation
Health Care
Automated Material Handling System (AMHS)
System Environment
A system is often affected by the changes occurring outside the system. Such changes are
said to occur in system environment.
Example: Factory system that makes and assembles parts into a product (Fig 1.1)
[Fig 1.1 Factory system. Labels recovered from the figure: Customer orders, Production Control Department]
Table 1.1 lists a few examples of the above-mentioned components of a system.
Other terminologies
A continuous system is one in which the state variables change continuously over time.
Example: Head of water behind a dam. After a rain storm, water flows into the lake behind
the dam. Water is drawn from the dam for flood control and to generate electricity. Fig 1.3
shows how the head of water behind the dam (the state variable) changes.
A Physical Model is a smaller or larger physical copy of an object. Examining a physical
model allows one to visualize information about the thing the model represents. An
architectural model of a building is one example.
A Static Mathematical Model gives the relationship between the system attributes when the
system is in equilibrium. For example, in a market model there is a balance between the supply
of and the demand for a commodity, and both factors depend upon price.
The process of developing a simulation model involves defining the situation or system to
be analyzed, identifying the associated variables, and describing the relationships between
them as accurately as possible.
[Fig 1.4 Types of Models. Labels recovered from the figure: Models are Physical or Mathematical,
and Static or Dynamic. Simulation models are Static / Dynamic, Deterministic / Stochastic,
and Discrete / Continuous.]
Problem formulation
The study begins with defining the problem statement. It can be developed either by the
analyst or by the client. If the statement is provided by the client, then the analyst must take
extreme care to ensure that the problem is clearly understood. If the problem statement is
prepared by the simulation analyst, it is important that the client understands and agrees with
the formulation. Even with all of these precautions, it is possible that the problem will need to
be reformulated as the simulation study progresses.
Model conceptualization
A model is a simplification of reality. The real-world system under investigation is abstracted
by a conceptual model. It is recommended that modeling begins with a simple model and
grows until a model of appropriate complexity has been achieved. For example,
consider the model of a manufacturing and material handling system. The basic model with
the arrivals, queues and servers is constructed first. Then the failures and shift schedules are
added. Next, the material-handling capabilities are added. Finally, the special features are
added. Constructing an excessively complex model will add to the cost of the study and the
time for its completion, without increasing the quality of the output. Maintaining client
involvement will enhance the quality of the resulting model and increase the client's
confidence in its use.
Data collection
This step involves gathering the desired input data. The data required changes as the
complexity of the model changes. Data collection takes a large share of the total time required
to perform a simulation, so it should be started at an early stage, together with model building.
The data collected should be relevant to the objectives of the study.
Model translation
The conceptual model constructed in Step 3 is coded into a computer-recognizable form,
an operational model. A suitable simulation language is used.
Verified?
Verification is with respect to the operational model. Is it performing properly? If the input
parameters and logical structure of model are correctly represented in computer, then
verification is completed.
Validated?
Validation is the determination that the model is an accurate representation of the real
system. This is done by calibration of the model, an iterative process of comparing the model
to the actual system behavior. This process is repeated until model accuracy is acceptable.
Experimental design
The alternatives to be simulated must be determined. For each scenario that is to be
simulated, decisions need to be made concerning the length of the simulation run, the
number of runs (also called replications), and the length of initialization period.
More runs?
After completing the analysis of the runs, the simulation analyst determines whether additional
runs are needed and which additional experiments, if any, should follow.
Implementation
If the client has been involved throughout the study period, and the simulation analyst has
followed all of the steps rigorously, then the likelihood of a successful implementation is
increased.
The simulation model building process shown in fig 1.5 can be divided into four phases:
Phase 1 - Problem formulation; setting of objectives and overall design
Phase 2 - Model conceptualization; data collection; model translation; verification; validation
Phase 3 - Experimental design; production runs and analysis; additional runs
Phase 4 - Documentation and reporting; implementation
Unit 2
Simulation of Queuing and Inventory Systems
2.1 Introduction to Queuing Systems
2.2 Characteristics of Queuing Systems
2.2.1 The calling population
2.2.2 System capacity
2.2.3 The arrival process
2.2.4 Queue behavior and queue discipline
2.2.5 Service times and the service mechanism
2.3 Queuing Notation
2.4 Simulation of Queuing Systems
2.5 Simulation of Inventory Systems
The term “customer” refers to any type of entity that can be viewed as requesting “service”
from the system. Therefore many service facilities like production systems, repair and
maintenance facilities, communications and computer systems and transport and material
handling systems can be viewed as queuing systems. Typical measures of system
performance include server utilization (percentage of time a server is busy), length of
waiting lines and delays of customers.
The term customer can refer to people, machines, trucks, mechanics, patients, airplanes,
e-mail, cases, orders, or dirty clothes: anything that arrives at a facility and requires service.
The term server might refer to receptionists, mechanics, tool-crib clerks, medical personnel,
automatic storage and retrieval machines (e.g., cranes), runways at an airport, automatic
packers, etc., which provide the requested service.
A more typical example is that of five tire-curing machines serviced by a single worker.
The machines are the “customers”, who “arrive” at the instant they automatically open. The
worker is the “server”, who “serves” an open machine as soon as possible. When all five
are closed, the arrival rate is at its maximum, but the instant a machine opens and requires
service, the arrival rate decreases. At those times when all five are open (so four machines are
waiting for service while the worker is attending the other one), the arrival rate is zero; that is,
no arrival is possible until the worker finishes with a machine, in which case it returns to the
calling population and becomes a potential arrival. It may seem odd that the arrival rate is
highest when all machines are closed, but if the arrival rate is defined as the expected number
of arrivals in the next unit of time, it becomes clear that this expectation is largest when all
machines could potentially open in the next unit of time.
Examples of infinite population include the potential customers of the restaurant, bank, or
other similar service facility and also very large group of machines serviced by a technician.
In systems with large population of potential customers, the calling population is usually
assumed to be infinite.
In many queuing systems there is a limit to the number of customers that may be in the
waiting line or system. When a system has limited capacity, a distinction is made between
the arrival rate (i.e., the number of arrivals per time unit) and the effective rate (i.e., the
number who arrive and enter the system per time unit).
For example,
1. (Limited capacity) - An automatic car wash may have room only for 10 cars to enter the
mechanism. It may be too dangerous or illegal for cars to wait in the street. An
arriving customer who finds the system full does not enter but returns immediately to the
calling population.
2. (Unlimited capacity) - Some systems, such as concert ticket sales for students, may be
considered as having unlimited capacity. There are no limits on the number of students
allowed to wait to purchase tickets.
The arrival process for infinite population models is usually characterized in terms of
interarrival times of successive customers. Arrivals may occur at scheduled times or at
random times.
The most important model for random arrivals is the Poisson arrival process. It is
used as a model for the arrival of people at restaurants, drive-in banks and other
service facilities, and for the arrival of telephone calls at a telephone exchange.
A second important class of arrivals is scheduled arrivals. In this case the
interarrival times may be constant, or constant plus or minus a small random amount to
represent early or late arrivals. Examples are patients at a physician's office and
scheduled airline flight arrivals at an airport.
A third situation occurs when at least one customer is assumed to be always present
in the queue, so that the server is never idle because of a lack of customers. For
example, a customer may represent raw material for a product, and sufficient raw
material is assumed to be always available.
For finite population models the arrival process is characterized in a different manner. A
customer is defined as pending when that customer is outside the queuing system and a
member of the calling population; a run time of a given customer is the length of time from
departure from the queuing system until that customer's next arrival to the queue. Let A1(i),
A2(i), … be the successive run times of customer i, and let S1(i), S2(i), … be the
corresponding successive system times, i.e. Sn(i) is the total time spent in the system by
customer i during the nth visit. For example, a tire-curing machine is pending when it is
closed and curing a tire. It becomes not pending the instant it opens and demands service
from the worker. Fig 2.2 illustrates these concepts for machine 3.
If it is assumed that all machines are pending at time 0, the first arrival to the
system occurs at time A1 = min {A1(1), A1(2), A1(3), A1(4), A1(5)}. If A1 = A1(2), then machine
2 is the first arrival (i.e., the first to open) after time 0. Here, the arrival rate is not constant
but is a function of the number of pending customers.
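The rule A1 = min {A1(1), …, A1(5)} can be sketched in a few lines of Python. This is only an illustration: the function name `first_arrival` and the run times in `example_run_times` are hypothetical and do not come from the tire-curing example.

```python
def first_arrival(first_run_times):
    """Return (time, machine) of the first arrival when all machines are pending at time 0.

    first_run_times maps machine number i to its first run time A1(i).
    """
    machine = min(first_run_times, key=first_run_times.get)
    return first_run_times[machine], machine


# Hypothetical first run times (minutes) for the five tire-curing machines.
example_run_times = {1: 12.0, 2: 7.5, 3: 15.0, 4: 9.0, 5: 20.0}
arrival_time, machine = first_arrival(example_run_times)
print(f"First arrival: machine {machine} at time {arrival_time}")
```

With these made-up values machine 2 has the smallest first run time, so it is the first to open.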
Queue behavior refers to customer actions while in a queue waiting for service to begin.
Incoming customers may:
Balk: leave when they see that the line is too long.
Renege: leave after being in the line when they see that the line is moving too slowly.
Jockey: move from one line to another if they think they have chosen a slow line.
Queue discipline refers to the logical ordering of customers in a queue and determines
which customer is chosen for service when the server becomes free.
Queue disciplines can be:
FIFO (first in, first out)
LIFO (last in, first out)
SIRO (service in random order)
SPT (shortest processing time first)
PR (priority service)
In a job shop, queue disciplines are sometimes based on due dates and on the expected
processing time for a given type of job.
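The queue disciplines listed above differ only in which waiting customer is chosen when the server becomes free. A minimal sketch, assuming each waiting customer is a dictionary with the hypothetical keys `name`, `service_time` and `priority` (a smaller number meaning higher priority), and that the list is kept in arrival order:

```python
import random


def next_customer(queue, discipline, rng=None):
    """Return the index of the customer to serve next under the given discipline.

    queue is a list of dicts in arrival order; the keys used here are
    illustrative assumptions, not part of the original notes.
    """
    if not queue:
        raise ValueError("queue is empty")
    if discipline == "FIFO":
        return 0
    if discipline == "LIFO":
        return len(queue) - 1
    if discipline == "SIRO":
        return (rng or random).randrange(len(queue))
    if discipline == "SPT":
        return min(range(len(queue)), key=lambda i: queue[i]["service_time"])
    if discipline == "PR":
        return min(range(len(queue)), key=lambda i: queue[i]["priority"])
    raise ValueError(f"unknown discipline: {discipline}")


# Hypothetical waiting line, listed in order of arrival.
waiting = [
    {"name": "a", "service_time": 5, "priority": 2},
    {"name": "b", "service_time": 2, "priority": 3},
    {"name": "c", "service_time": 4, "priority": 1},
]
for discipline in ("FIFO", "LIFO", "SPT", "PR"):
    print(discipline, waiting[next_customer(waiting, discipline)]["name"])
```

Ties under SPT and PR go to the earlier arrival in this sketch, which is one possible convention among several.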
The service times of successive arrivals are denoted by S1, S2, S3, … They may be
constant or of random duration. In the random case, {S1, S2, …} is usually characterized as
a sequence of independent and identically distributed random variables.
Distributions such as the exponential, Weibull and gamma can be used as models for service
times. Sometimes service times are identically distributed for all customers of a given type,
class or priority, while customers of different types have different service time
distributions. Sometimes service times depend upon the time of day or the length of the
waiting line.
A queuing system consists of a number of service centers and interconnecting queues. Each
service center consists of some number of servers c, working in parallel. Parallel service
mechanisms are either single server (c = 1), multiple server (1 < c < ∞) or unlimited server
(c = ∞). A self-service facility is usually characterized as having an unlimited number of
servers.
Example 2.1
Consider a discount warehouse where customers may either serve themselves or wait for
one of the 3 clerks, and finally leave after paying a single cashier. The system is represented
by the flow diagram in fig 2.3. The subsystem consisting of queue 2 and service center 2
is shown in more detail in fig 2.4.
For example, M / M / 1 / ∞ / ∞ is often shortened to M / M / 1. The tire-curing system can
be initially represented by G / G / 1 / 5 / 5.
State of the system - the number of units in the system and the status of the server (busy
or idle).
Event - a set of circumstances that causes an instantaneous change in the state of the system.
Only two events can affect the state of the system: the entry of a unit into
the system (the arrival event) and the completion of service on a unit (the departure event).
Simulation clock - used to track simulated time.
The queuing system includes the server, the unit being serviced and the units in the queue.
If a unit has just completed service, the simulation proceeds as shown in fig 2.5.
The arrival event occurs when the unit enters the system. The flow diagram for the arrival
event is shown in the fig 2.6.
The unit may find the server either idle or busy; therefore it begins service immediately or
enters the queue for the server. This course of action is shown in fig 2.7
                        Queue status
                        Not empty       Empty
Server      Busy        Enter queue     Enter queue
status      Idle        Impossible      Enter service
Fig 2.7 Potential unit actions upon arrival
After the completion of the service the server may become idle or remain busy with the
next unit. This relationship of the server outcomes to the status of the queue is shown in
fig 2.8
                        Queue status
                        Not empty            Empty
Server      Busy        Possible (shaded)    Impossible
outcomes    Idle        Impossible           Possible (shaded)
Fig 2.8 Server outcomes after service completion
If the queue is not empty another unit will enter the server and it will be busy. If the queue
is empty, the server will be idle after a service is completed. These two possibilities are
shown in the shaded portion of the fig 2.8
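The arrival and departure logic summarized in fig 2.7 and fig 2.8 can be written as two small event routines. The sketch below is one possible encoding; the class `QueueState` and the returned strings are illustrative names, not part of the original figures.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    server_busy: bool = False
    queue_length: int = 0  # units waiting, not counting the unit in service


def arrival(state):
    """Arrival event: enter service if the server is idle, otherwise join the queue."""
    if state.server_busy:
        state.queue_length += 1
        return "enter queue"
    state.server_busy = True
    return "enter service"


def departure(state):
    """Departure event: start the next waiting unit, or leave the server idle."""
    if not state.server_busy:
        raise RuntimeError("departure event with an idle server")
    if state.queue_length > 0:
        state.queue_length -= 1
        return "server busy"
    state.server_busy = False
    return "server idle"


state = QueueState()
for event in (arrival, arrival, departure, departure):
    print(event.__name__, "->", event(state), state)
```

The "impossible" cells of both figures never arise here, because a unit waits only while the server is busy.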
Simulation clock times for arrivals and departures are computed in a simulation table
customized for each problem. In simulation, events usually occur at random times. The
randomness needed to imitate real life is made possible through the use of random numbers.
Random numbers are distributed uniformly and independently on the interval (0, 1).
Random digits are uniformly distributed on the set {0, 1, 2, …, 9}. Random digits can be
used to form random numbers by selecting the proper number of digits for each random
number and placing a decimal point to the left of the value selected.
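Forming random numbers from random digits is a one-line computation. The helper names below (`digits_to_random_number`, `split_digits`) are hypothetical; the digit stream uses the first three entries of table 2.3.

```python
def digits_to_random_number(digits):
    """Place a decimal point to the left of a string of random digits."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of decimal digits")
    return int(digits) / 10 ** len(digits)


def split_digits(stream, places):
    """Cut a stream of random digits into consecutive groups of `places` digits."""
    usable = len(stream) - len(stream) % places
    return [stream[i:i + places] for i in range(0, usable, places)]


groups = split_digits("913727015", 3)
print(groups)
print([digits_to_random_number(g) for g in groups])
```

Three-place groups give numbers with three decimal places, which matches the precision of the probabilities in table 2.1.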
Example 2.2 (Single channel queue simulation problem)
A small grocery store has only one checkout counter. Customers arrive at this checkout
counter at random, from 1 to 8 minutes apart. Each possible value of interarrival time has the
same probability of occurrence, as shown in table 2.1. The service times vary from 1 to 6
minutes, with the probabilities shown in table 2.2. The problem is to analyze the system by
simulating the arrival and service of 20 customers.
Time between          Probability    Cumulative     Random-digit
arrivals (minutes)                   probability    assignment
1                     0.125          0.125          001-125
2                     0.125          0.250          126-250
3                     0.125          0.375          251-375
4                     0.125          0.500          376-500
5                     0.125          0.625          501-625
6                     0.125          0.750          626-750
7                     0.125          0.875          751-875
8                     0.125          1.000          876-000
Table 2.1 Distribution of time between arrivals
Obtain the random digits from the table of random digits (refer to the appendix). Since
the probabilities in table 2.1 are accurate to 3 significant digits, three-place
random numbers will suffice.
Similarly, for table 2.2 two-place random numbers will suffice. It is necessary to list
only 19 random numbers to generate the times between arrivals (for 20 customers), as
the first arrival is assumed to occur at time 0.
The third column in tables 2.1 and 2.2 contains the cumulative probability for the
distribution.
The rightmost column contains the random-digit assignment. In table 2.1 the first
random-digit assignment is 001-125. There are 1000 three-digit values possible
(001 through 000). The probability of a time between arrivals of 1 minute is 0.125,
and 125 of the 1000 random-digit values are assigned to such an occurrence.
In table 2.3 the time between arrivals is determined. The first random digits are
913. To obtain the corresponding time between arrivals, find in the fourth column of
table 2.1 the assignment in which 913 lies, and read 8 minutes from the first column
of the table.
Similarly service times for all 20 customers are generated in table 2.4 with the aid
of table 2.2.
Customer   Random   Time between          Customer   Random   Time between
           digits   arrivals (minutes)               digits   arrivals (minutes)
1          -        -                     11         109      1
2          913      8                     12         093      1
3          727      6                     13         607      5
4          015      1                     14         738      6
5          948      8                     15         359      3
6          309      3                     16         888      8
7          922      8                     17         106      1
8          753      7                     18         212      2
9          235      2                     19         493      4
10         302      3                     20         535      5
Table 2.3 Time–between–Arrivals determination
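The lookup from random digits to a time between arrivals can be checked mechanically. The sketch below encodes the random-digit assignment of table 2.1 and applies it to the 19 random-digit entries of table 2.3; the names `UPPER_BOUNDS` and `time_between_arrivals` are illustrative.

```python
from bisect import bisect_left

# Upper ends of the random-digit ranges in table 2.1 (000 is treated as 1000).
UPPER_BOUNDS = [125, 250, 375, 500, 625, 750, 875, 1000]
INTERARRIVAL_MINUTES = [1, 2, 3, 4, 5, 6, 7, 8]


def time_between_arrivals(random_digits):
    """Map a three-digit string such as '913' to minutes using table 2.1."""
    value = int(random_digits)
    if not 0 <= value <= 999:
        raise ValueError("expected three random digits")
    if value == 0:
        value = 1000  # 000 closes the last range, 876-000
    return INTERARRIVAL_MINUTES[bisect_left(UPPER_BOUNDS, value)]


TABLE_2_3_DIGITS = [
    "913", "727", "015", "948", "309", "922", "753", "235", "302", "109",
    "093", "607", "738", "359", "888", "106", "212", "493", "535",
]
times = [time_between_arrivals(d) for d in TABLE_2_3_DIGITS]
print(times)
print("sum of times between arrivals:", sum(times))
```

The resulting times agree with the third and sixth columns of table 2.3, and their sum is the 82 minutes used later for the average time between arrivals.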
The simulation table for a single channel queue is shown in table 2.5. The first customer is
assumed to arrive at time 0. Service begins immediately and finishes at time 4. The
customer was in the system for 4 minutes.
After the first customer, the subsequent rows in the table are based on the random numbers
for interarrival time and service time and the completion time of the previous customer.
For example, the second customer arrives at time 8 and thus the server was idle for 4 min.
Skipping down to the fourth customer, it is seen that this customer arrived at time 15 but
could not be served until time 18. This customer had to wait in the queue for 3 minutes.
This process continues for all 20 customers.
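The row-by-row construction of the simulation table can be sketched as follows. The interarrival times 8, 6 and 1 are taken from table 2.3. The service times are assumptions chosen only to be consistent with the narrative above (the first service ends at time 4, and the fourth customer begins service at time 18); they are not quoted from table 2.4.

```python
def simulate_single_server(interarrival_times, service_times):
    """Build the rows of a single-channel, first-in-first-out simulation table.

    The first customer arrives at time 0; interarrival_times holds the gaps
    before the second, third, ... customers.
    """
    rows = []
    arrival = 0
    previous_end = 0
    gaps = [0] + list(interarrival_times)
    for gap, service in zip(gaps, service_times):
        arrival += gap
        begin = max(arrival, previous_end)  # wait if the server is still busy
        end = begin + service
        rows.append({
            "arrival": arrival,
            "service_begins": begin,
            "wait_in_queue": begin - arrival,
            "service_ends": end,
            "time_in_system": end - arrival,
            "server_idle": begin - previous_end,
        })
        previous_end = end
    return rows


# Service times below are illustrative assumptions, not data from table 2.4.
for number, row in enumerate(simulate_single_server([8, 6, 1], [4, 1, 4, 3]), start=1):
    print(number, row)
```

Under these assumed service times the second customer finds the server idle for 4 minutes, and the fourth customer arrives at time 15 and waits 3 minutes, as described in the text.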
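1. The average waiting time for a customer is 2.8 minutes. This is determined by
Average waiting time = (total time customers wait in queue) / (total number of customers)
= 56 / 20 = 2.8 minutes
2. The probability that a customer has to wait in the queue is 0.65. It is obtained by
Probability (wait) = (number of customers who wait) / (total number of customers)
= 13 / 20 = 0.65
3. The probability that the server is idle is 0.21. It is obtained by
Probability of idle server = (total idle time of server) / (total run time of simulation)
= 18 / 86 = 0.21
The probability of the server being busy is the complement of 0.21, i.e. 0.79.
4. The average service time is 3.4 minutes. It is obtained by
Average service time = (total service time) / (total number of customers)
= 68 / 20 = 3.4 minutes
The result can be compared with the expected service time by finding the mean of the
service time distribution using the equation
$$E(S) = \sum_{s} s \, p(s)$$
5. The average time between arrivals is 4.3 minutes. It is obtained by
Average time between arrivals = (sum of all times between arrivals) / (number of arrivals – 1)
= 82 / 19 = 4.3 minutes
One is subtracted from the denominator because the first arrival is assumed to occur at time
0. The result can be compared with the expected time between arrivals, which is the mean of
a discrete uniform distribution with endpoints a = 1 and b = 8:
E(A) = (a + b) / 2 = (1 + 8) / 2 = 4.5 minutes
The expected time between arrivals is slightly higher than the average. As the simulation
becomes longer, the average value of time between arrivals will approach the theoretical
mean E(A).
6. The average waiting time of those who wait is 4.3 minutes. This is obtained by
Average waiting time of those who wait = (total time customers wait in queue) / (number of customers who wait)
= 56 / 13 = 4.3 minutes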
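7. The average time a customer spends in the system is 6.2 minutes. This can be determined
in two ways: by dividing the total time customers spend in the system by the total number of
customers, (56 + 68) / 20 = 6.2 minutes, or by adding the average waiting time in the queue to
the average service time, 2.8 + 3.4 = 6.2 minutes.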
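The measures of performance in this list follow directly from the column totals quoted in the text (56, 13, 18, 86, 68 and 82). The short script below recomputes them; the variable names are illustrative, and the total time in system is taken as the total waiting time plus the total service time.

```python
# Column totals quoted in the text for the 20-customer simulation.
customers = 20
total_wait = 56            # minutes customers waited in the queue
customers_who_waited = 13
total_idle = 18            # minutes the server was idle
total_run_time = 86        # minutes of simulated time
total_service = 68         # minutes of service
total_interarrival = 82    # sum of the 19 times between arrivals

stats = {
    "average waiting time (min)": total_wait / customers,
    "probability of waiting": customers_who_waited / customers,
    "probability server idle": total_idle / total_run_time,
    "probability server busy": 1 - total_idle / total_run_time,
    "average service time (min)": total_service / customers,
    "average time between arrivals (min)": total_interarrival / (customers - 1),
    "expected time between arrivals (min)": (1 + 8) / 2,
    "average wait of those who wait (min)": total_wait / customers_who_waited,
    "average time in system (min)": (total_wait + total_service) / customers,
}
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The average time in system also equals the average waiting time plus the average service time, since each customer's time in the system is the sum of the two.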
This is an example with more than one service channel (2 servers). A simplifying rule is
that Able gets the customer if both carhops are idle, because Able has seniority.
The problem is to find how well the current arrangement is working. A simulation of 60
minutes of operation is made to estimate the system performance.
Inventory system is one of the important classes of simulation problems. This inventory
system has a periodic review of length N, at which time the inventory is checked. At each
review, an order is made to bring the inventory up to the level M (Maximum inventory).
At the end of first review period, an order quantity Q1 is placed. The lead time (length of
time between the placement and receipt of an order) is zero.
A simple inventory system is shown in fig 2.9. Demand is shown as being uniform over
the time period in fig 2.9. In practice, demands are not usually uniform; they fluctuate over
time. One possibility is that demands all occur at the beginning of the cycle. Another is that
the lead time is random and of some positive length.
In the second cycle the amount in inventory drops below zero, indicating a shortage. In fig
2.9 these units are backordered; when the order arrives, the demands for the backordered
items are satisfied first.
Larger inventories decrease the possibility of shortages but cost more to carry. These costs
must be traded off in order to minimize the total cost of an inventory system. The total cost
(or profit) of an inventory system is the measure of performance, and it is affected by
changing M and N.
In an (M, N) inventory system, the events that may occur are the demand for items in the
inventory, the review of the inventory position and the receipt of an order at the end of each
review period. When the lead time is zero, as in fig 2.9, the last two events occur
simultaneously.
The revenue from sales is 50 cents for each paper sold. The cost of newspapers is
33 cents for each paper purchased.
The lost profit from excess demand is 50 – 33 = 17 cents for each paper demanded
that could not be provided.
The salvage value of scrap papers is 5 cents each.
Tables 2.11 and 2.12 provide the random digits for the types of newsdays and the
demands for those days.
The policy is changed to other values and the simulation is repeated until the best
value is found.
Type of Newsday    Probability    Cumulative     Random-digit
                                  probability    assignment
Good               0.35           0.35           01-35
Fair               0.45           0.80           36-80
Poor               0.20           1.00           81-00
Table 2.11 Random - digit assignment for type of newsday
On day 1 the demand is 60 newspapers. The revenue from the sale of 60 newspapers
is $30. Ten newspapers are left at the end of the day. The salvage value at 5 cents
each is 50 cents. The profit for the first day is
Profit = $30.00 - $23.10 – 0 + $0.50 = $7.40
On the 5th day the demand is greater than the supply. The revenue from sales is $35,
since only 70 papers are available under this policy. An additional 20 papers could
have been sold. Thus a lost profit of $3.40 (20 * 17 cents) is assessed. The daily
profit is determined as follows
Profit = $35.00 - $23.10 - $3.40 + 0 = $8.50
The profit for the 20-day period is the sum of the daily profits $174.90. It can also
be computed from the totals for the 20 days of simulation as
Total profit = $645.00 - $462.00 - $13.60 + $5.50 = $174.90.
This simulation is repeated with other values for the policy (the number of newspapers
purchased), such as 60, 80 and so on. The best policy is obtained by comparing the resulting
profits.
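The daily profit calculation can be expressed as a small function. This sketch uses the prices stated above (50, 33, 17 and 5 cents) and works in whole cents to avoid rounding problems; the function name `daily_profit_cents` is illustrative. The demand of 90 on the 5th day follows from the 70 papers available plus the 20 additional papers that could have been sold.

```python
# Cents per paper, as stated in the text.
PRICE = 50        # revenue per paper sold
COST = 33         # cost per paper purchased
LOST_PROFIT = 17  # lost profit per paper of unmet demand
SALVAGE = 5       # salvage value per unsold paper


def daily_profit_cents(purchased, demand):
    """Profit = revenue - cost - lost profit from excess demand + salvage."""
    sold = min(purchased, demand)
    revenue = PRICE * sold
    cost = COST * purchased
    lost = LOST_PROFIT * max(demand - purchased, 0)
    scrap = SALVAGE * max(purchased - demand, 0)
    return revenue - cost - lost + scrap


print("day 1:", daily_profit_cents(70, 60) / 100)  # demand of 60, 10 papers left over
print("day 5:", daily_profit_cents(70, 90) / 100)  # demand of 90, 20 sales lost
```

These two calls reproduce the $7.40 and $8.50 daily profits worked out above. Comparing policies then amounts to summing this function over the simulated demands for each candidate purchase quantity.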
Lead Time (days)    Probability    Cumulative     Random-digit
                                   probability    assignment
1                   0.6            0.6            1-6
2                   0.3            0.9            7-9
3                   0.1            1.0            0
Table 2.15 Random – Digit assignments for lead time
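The random-digit assignment of table 2.15 translates into a simple lookup. The function name `lead_time_days` is illustrative.

```python
def lead_time_days(random_digit):
    """Map one random digit (0-9) to a lead time in days using table 2.15."""
    if random_digit not in range(10):
        raise ValueError("expected a single digit from 0 to 9")
    if 1 <= random_digit <= 6:
        return 1
    if 7 <= random_digit <= 9:
        return 2
    return 3  # the digit 0 closes the cumulative range


print({digit: lead_time_days(digit) for digit in range(10)})
```

Six of the ten digits give a lead time of 1 day, three give 2 days and one gives 3 days, matching the probabilities 0.6, 0.3 and 0.1.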
To estimate the mean number of units in ending inventory, many cycles would have to be
simulated, and the simulation repeated for different values of M and N.
Unit 3
Statistical Models
A random variable is a rule that assigns a number to each outcome of an experiment. These
numbers are called the values of the random variable. Random variables are usually denoted
by X.
Ex 1. If a die is rolled, the outcome has a value from 1 through 6.
2. If a coin is tossed, the possible outcome is head ‘H’ or tail ‘T’.
A discrete random variable takes only specific, isolated numerical values. Discrete random
variables that take a finite number of values are called finite discrete random variables, and
those that can take an unlimited number of values are called infinite discrete random
variables. Examples are shown in table 3.1.
Random variable: Flip a coin three times; X = the total number of heads.
Values: {0, 1, 2, 3}
Type: Discrete Finite. There are only four possible values for X.

Random variable: Select a mutual fund; X = the number of companies in the fund portfolio.
Values: {2, 3, 4, ...}
Type: Discrete Infinite. There is no stated upper limit to the size of the portfolio.
Table 3.1 Examples for discrete random variables
Let
X → discrete random variable
RX → possible values of X, given by the range space of X
xi → individual outcome in RX
The number p(xi) = P(X = xi) gives the probability that the random variable equals the value
xi. The numbers p(xi), i = 1, 2, 3, … must satisfy two conditions:
1. p(xi) ≥ 0, for all i
2. $\sum_{i} p(x_i) = 1$
The collection of pairs (xi, p(xi)), i.e. the list of probabilities associated with each of the
possible values, is called the probability distribution of X, and p(xi) is called the probability
mass function (pmf) of X.
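Example 3.1
Consider the experiment of tossing a single die, defining X as the number of spots on the up
face of the die after a toss. The distribution below assigns each face a probability
proportional to its number of spots (a loaded die).
Solution
N = total number of observations = 21
The discrete probability distribution is given by
xi       1      2      3      4      5      6
p(xi)    1/21   2/21   3/21   4/21   5/21   6/21
The conditions are also satisfied, i.e.
1. p(xi) ≥ 0, for i = 1, 2, …, 6
2. $\sum_{i=1}^{6} p(x_i) = 1/21 + 2/21 + \dots + 6/21 = 1$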
A continuous random variable takes any value within a continuous range or an interval.
An example is tabulated in table 3.2.
For a continuous random variable X, the probability that X lies in the interval [a, b] is
given by
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
The function f(x) is called the probability density function (pdf) of the random variable X.
The pdf satisfies the following conditions:
1. f(x) ≥ 0, for all x in RX
2. $\int_{R_X} f(x)\,dx = 1$
3. f(x) = 0, if x is not in RX
For any specified value x0, P(X = x0) = 0 since
$$\int_{x_0}^{x_0} f(x)\,dx = 0$$
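Example 3.2
The life of a laser-ray device used to inspect cracks in aircraft wings is given by X, a
continuous random variable assuming all values in the range x ≥ 0. The pdf of the lifetime
in years is
$$f(x) = \begin{cases} \frac{1}{2} e^{-x/2}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Solution
The probability that the life of the laser-ray device is between 2 and 3 years is determined from
$$P(2 \le X \le 3) = \int_2^3 \frac{1}{2} e^{-x/2}\,dx = -e^{-3/2} + e^{-1} = -0.223 + 0.368 = 0.145$$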
The cumulative distribution function (cdf), denoted by F(x), measures the probability that
the random variable X is less than or equal to x, i.e. F(x) = P(X ≤ x).
If X is discrete, then
$$F(x) = \sum_{x_i \le x} p(x_i)$$
If X is continuous, then
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
Note - All probability questions about X can be answered in terms of the cdf. For example,
P(a < X ≤ b) = F(b) – F(a), for all a < b.
Example 3.3
The life of a laser-ray device used to inspect cracks in aircraft wings is given by X, a
continuous random variable assuming all values in the range x ≥ 0. The pdf of the lifetime
in years is
$$f(x) = \begin{cases} \frac{1}{2} e^{-x/2}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
1. Determine the probability that the device will last for less than 2 years.
2. Determine the probability that the life of the laser-ray device is between 2 and 3 years.
Solution
For x ≥ 0 the cdf is
$$F(x) = \int_0^x \frac{1}{2} e^{-t/2}\,dt = 1 - e^{-x/2}$$
1. The probability that the device will last for less than 2 years is
$$P(0 \le X \le 2) = F(2) - F(0) = 1 - e^{-1} = 0.632$$
2. The probability that the life of the laser-ray device is between 2 and 3 years is
$$P(2 \le X \le 3) = F(3) - F(2) = (1 - e^{-3/2}) - (1 - e^{-1}) = -e^{-3/2} + e^{-1} = -0.223 + 0.368 = 0.145$$
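The two probabilities can be checked numerically. The sketch below assumes the lifetime pdf f(x) = (1/2)e^(-x/2) for x ≥ 0, which is the density consistent with the values F(2) = 1 - e^(-1) and F(3) = 1 - e^(-3/2) used in the solution. The function names and the midpoint-rule helper `integrate` are illustrative.

```python
import math


def pdf(x):
    """Assumed lifetime density: (1/2) * exp(-x/2) for x >= 0, zero otherwise."""
    return 0.5 * math.exp(-x / 2) if x >= 0 else 0.0


def cdf(x):
    """Closed-form cdf of the density above: 1 - exp(-x/2) for x >= 0."""
    return 1 - math.exp(-x / 2) if x >= 0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


print("P(0 <= X <= 2) =", round(cdf(2) - cdf(0), 3))
print("P(2 <= X <= 3) =", round(cdf(3) - cdf(2), 3))
print("same, by numerical integration of the pdf:", round(integrate(pdf, 2, 3), 3))
```

Integrating the pdf directly and differencing the cdf give the same answer, which illustrates the relation P(a < X ≤ b) = F(b) - F(a).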
3.1.3 Expectation
Note
The mean E(X) is a measure of the central tendency of a random variable.
The variance V(X) is a measure of the variation of the possible values of X around the mean E(X).
Example 3.4
Find the mean, variance and standard deviation of the die-tossing experiment.
Solution
N = 21
xi       1      2      3      4      5      6
p(xi)    1/21   2/21   3/21   4/21   5/21   6/21
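The mean, variance and standard deviation of this distribution can be computed exactly with rational arithmetic. The sketch uses the standard definitions E(X) = Σ xi p(xi) and V(X) = E(X²) - [E(X)]², which are assumed here because the worked solution is not shown above.

```python
import math
from fractions import Fraction

# pmf of the die-tossing experiment: p(x) = x / 21 for x = 1, ..., 6.
pmf = {x: Fraction(x, 21) for x in range(1, 7)}

total_probability = sum(pmf.values())
mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())
variance = second_moment - mean ** 2
std_dev = math.sqrt(variance)

print("sum of p(xi):", total_probability)
print("E(X)   =", mean, "=", round(float(mean), 2))
print("E(X^2) =", second_moment)
print("V(X)   =", variance, "=", round(float(variance), 2))
print("standard deviation =", round(std_dev, 2))
```

Using `Fraction` keeps the intermediate values exact, so the pmf condition that the probabilities sum to 1 can be checked without rounding error.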
3.1.4 Mode
In the discrete case, the mode is the value of the random variable that occurs most frequently.
In the continuous case, the mode is the value at which the pdf is maximized.
During the conduct of a simulation, numerous situations arise where an investigator chooses
to introduce probabilistic events. For example:
In queuing systems, interarrival and service times are probabilistic.
In inventory models, the time between demands and the lead time may be probabilistic.
In reliability models, the time to failure may be probabilistic.
In each case, the simulation analyst wishes to generate random events and to use a known
statistical model if the distribution can be found. Some of these systems and the statistical
models chosen for them are discussed below.
Queuing system
In the queuing examples, interarrival-time and service-time patterns are given. The times
between arrivals and the service times are usually probabilistic, but either may also be
constant.
If service times are completely random, the exponential distribution is often used.
If the service time is nominally constant, but some random variability causes fluctuations in
a positive or negative way, then the normal distribution is used.
For example, the time it takes for a lathe to traverse a 10 cm shaft should always be the
same, but the material may have slight differences in hardness, causing different
processing times.
If there are more large service times than the exponential distribution can account for, the
Weibull distribution may be a better model.
The exponential, gamma and Weibull distributions are all used to model interarrival and
service times.
The differences between these distributions involve the location of the modes of their pdfs
and the shape of their tails for large and small times.
Mode: Exponential distribution - at the origin
      Gamma and Weibull distributions - at some point (≥ 0)
Tail: Gamma and exponential distributions - long
      Weibull distribution - may decline more rapidly or less rapidly
Inventory systems
An inventory system has three random variables:
1. Number of units demanded per order or per time period
2. Time between demands
3. Lead time (the time between placing an order and receiving that order)
In a simple mathematical model, demand is constant over time and the lead time is zero
or constant.
In realistic cases (simulation models), demand occurs randomly in time and the
number of units demanded each time is also random.
The geometric distribution has its mode at unity, given that at least one demand has
occurred.
If demand data are characterized by a long tail, the negative binomial distribution is
appropriate.
The Poisson distribution is often used to model demand.
Limited data
In many situations, simulation begins before data collection is complete. Three
distributions are useful when data are incomplete or limited:
The uniform distribution can be used when an interarrival or service time is known to
be random but no information is immediately available about the distribution.
The triangular distribution can be used when assumptions are made about the minimum,
maximum and modal values of the random variable.
The beta distribution.
Discrete random variables are used to describe random phenomena. Four distributions
are discussed.
Thus
$p(x_1, x_2, \ldots, x_n) = p_1(x_1)\, p_2(x_2) \cdots p_n(x_n)$
Mean of $X_j$ is given by
$E(X_j) = 0 \cdot q + 1 \cdot p = p$
and Variance,
$V(X_j) = [(0^2 \cdot q) + (1^2 \cdot p)] - p^2 = p(1 - p)$
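The random variable X that denotes the number of successes in n Bernoulli trials has a
binomial distribution given by p(x), where
$p(x) = \binom{n}{x} p^x q^{n-x}$ for $x = 0, 1, 2, \ldots, n$, and 0 otherwise.
The factor $p^x q^{n-x}$ is the probability of a particular outcome with all x successes (S)
occurring in the first x trials followed by n - x failures (F), and $\binom{n}{x}$ counts the
outcomes that have exactly x successes.
Here q = 1 - p, and
X = X1 + X2 + … + Xn
E (X) = p + p + … + p = np
V (X) = pq + pq + … + pq = npq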
Example 3.5
A production process manufactures computer chips on the average at 2% non-conforming.
Every day a random sample of size 50 is taken from the process. If the sample contains
more than two non-conforming chips, the process will be stopped. Determine the
probability that the process is stopped by the sampling scheme.
Solution
n = 50 Bernoulli trials
p = 2% = 0.02 (each trial)
q = 1 - p = 0.98
X → total number of non-conforming chips in the sample, which has a binomial distribution
The process is stopped when more than two non-conforming chips are present in the
sample, so the required probability is
$P(X > 2) = 1 - P(X \le 2) = 1 - \sum_{x=0}^{2} \binom{50}{x} (0.02)^x (0.98)^{50-x}$
Variance is
V (X) = npq = 50 (0.02) (0.98) = 0.98
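The stopping probability in Example 3.5 can be evaluated numerically. The following is a minimal sketch that assumes the standard binomial pmf; the function names `binomial_pmf` and `prob_more_than` are illustrative and not part of the notes.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob_more_than(k, n, p):
    """P(X > k) = 1 - P(X <= k) for a binomial random variable X."""
    return 1.0 - sum(binomial_pmf(x, n, p) for x in range(k + 1))

# Example 3.5: n = 50 chips sampled, p = 0.02 non-conforming, stop if X > 2
p_stop = prob_more_than(2, 50, 0.02)
print(round(p_stop, 4))
```

With these inputs the process is stopped on roughly 8% of the days.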
The geometric distribution is related to a sequence of Bernoulli trials. The random variable
X is defined as the number of trials needed to achieve the first success. The distribution of
X is given by
$p(x) = q^{x-1} p$ for $x = 1, 2, \ldots$, and 0 otherwise.
The event {X = x} occurs when there are x - 1 failures followed by a success. Each failure
has an associated probability q = 1 - p and each success has probability p. Thus
$P(FFF \ldots FS) = q^{x-1} p$
Mean, $E(X) = 1/p$
Variance, $V(X) = q/p^2$
Example 3.6
40% of assembled bubble-jet printers are rejected at the inspection station. Find the
probability that the first acceptable printer is the third one inspected.
Solution
Here a success is an acceptable printer, so the rejection probability is q.
q = 40% = 0.4
p = 1 - q = 1 - 0.4 = 0.6
$p(x) = q^{x-1} p$
$p(3) = (0.4)^2 (0.6) = 0.096$
Thus, in approximately 10% of cases the first acceptable printer is the third one inspected.
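The Poisson pmf is $p(x) = e^{-\alpha} \alpha^x / x!$ for $x = 0, 1, 2, \ldots$, and 0 otherwise,
where α > 0
Characteristics of most queuing systems, such as the arrival and departure processes, are
described by a Poisson distribution.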
Example 3.7
A computer terminal repair person is beeped each time there is a call for service. The
number of beeps per hour is known to occur in accordance with a poisson distribution with
a mean of α = 2 per hour.
1. Determine the probability of three beeps in next hour.
2. Determine the probability of two or more beeps in an hour period.
Solution
α=2
x=3
1. The probability of three beeps in next hour
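Both parts of Example 3.7 follow from the Poisson pmf, $p(x) = e^{-\alpha}\alpha^x/x!$. The sketch below assumes that standard form; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

def poisson_pmf(x, alpha):
    """Poisson probability of exactly x events when the mean is alpha."""
    return exp(-alpha) * alpha**x / factorial(x)

alpha = 2.0  # mean number of beeps per hour

# Part 1: exactly three beeps in the next hour
p_three = poisson_pmf(3, alpha)

# Part 2: two or more beeps = 1 - P(0 beeps) - P(1 beep)
p_two_or_more = 1.0 - poisson_pmf(0, alpha) - poisson_pmf(1, alpha)

print(round(p_three, 3), round(p_two_or_more, 3))
```

Part 2 uses the complement because summing the pmf from 2 to infinity directly is not practical.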
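Continuous random variables are used to describe random phenomena. The
distributions are described below.
For a random variable X uniformly distributed on the interval (a, b), the pdf is
$f(x) = 1/(b - a)$ for $a \le x \le b$, and 0 otherwise.
cdf, $F(x) = 0$ for $x < a$; $F(x) = (x - a)/(b - a)$ for $a \le x < b$; $F(x) = 1$ for $x \ge b$
Note that $P(x_1 < X < x_2) = F(x_2) - F(x_1) = (x_2 - x_1)/(b - a)$
The probability is proportional to the length of the interval for all x1 and x2 satisfying a
≤ x1 < x2 ≤ b.
Mean, $E(X) = (a + b)/2$
Variance, $V(X) = (b - a)^2/12$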
Example 3.8
A bus arrives every 20 minutes at a specified stop, beginning at 6:40 am and continuing until
8:40 am. A certain passenger does not know the schedule but arrives randomly (uniformly
distributed) between 7:00 am and 7:30 am every morning. What is the probability that the
passenger waits more than 5 minutes for a bus?
Solution
Buses arrive at 7:00, 7:20 and 7:40 am, so the passenger waits more than 5 minutes only if
his/her arrival time is between 7:00 am and 7:15 am or between 7:20 am and 7:30 am.
If X is a random variable that denotes the number of minutes past 7:00 am at which the
passenger arrives, then the probability is given by
P (0 < X < 15) + P (20 < X < 30)
Now, X is a uniform random variable on (0, 30). Therefore the desired probability is
F (15) - F (0) + F (30) - F (20)
= 15/30 - 0 + 1 - 20/30 = 5/6
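For the exponential distribution with parameter λ > 0, the pdf is
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and 0 otherwise.
Mean, $E(X) = 1/\lambda$
Variance, $V(X) = 1/\lambda^2$
The cdf, $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, and 0 for $x < 0$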
- This distribution is used to model interarrival times and service times when they
are completely random. In this case λ is a rate: arrivals per hour or services per
minute.
- It is also used to model the lifetime of a component that fails instantaneously, such
as a light bulb; then λ is the failure rate.
Example 3.9
Suppose the life of an industrial lamp, in thousands of hours, is exponentially distributed
with failure rate λ = 1/3 (one failure every 3000 hours, on average).
1. Determine the probability that the lamp lasts longer than its mean life of 3000 hours.
2. Determine the probability that the lamp lasts between 2000 and 3000 hours.
Solution
λ = 1/3
1. The probability that the lamp will last longer than its mean life is given by
$P(X > 3) = 1 - P(X \le 3)$
$= 1 - F(3)$
$= 1 - (1 - e^{-3/3})$
$= e^{-1} = 0.368$
2. The probability that the lamp will last between 2000 and 3000 hours is
$P(2 \le X \le 3) = F(3) - F(2)$
$= (1 - e^{-3/3}) - (1 - e^{-2/3})$
$= -0.368 + 0.513$
$= 0.145$
A random variable X with mean µ (-∞ < µ < ∞) and variance σ² (σ² > 0) has a normal
distribution if its pdf is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$, $-\infty < x < \infty$
The notation X ~ N(µ, σ²) indicates that the random variable X is normally distributed with
mean µ and variance σ². The normal pdf is represented in fig 3.4
The cdf is
$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^2\right] dt$
Since this integral has no closed form, it cannot be evaluated analytically and is computed
numerically. The transformation of variable z = (t - µ) / σ allows the evaluation to be
independent of µ and σ.
The pdf, $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, $-\infty < z < \infty$
The above is the pdf of a normal distribution with mean 0 and variance 1. Thus Z ~ N (0, 1),
and Z is said to have a standard normal distribution. It is shown in the fig 3.5
Example 3.10
X ~ N (50, 9). Determine F(56).
Solution
X ~ N (50, 9) → X is normally distributed with mean 50 and variance 9, so σ = 3
$F(56) = P(X \le 56)$
$= \Phi[(56 - 50)/3]$, using $F(x) = \Phi[(x - \mu)/\sigma]$
$= \Phi(2)$
= 0.9772 (using table A.3)
The fig 3.6(a) shows the pdf of X ~ N(50, 9) with x0 = 56. The shaded portion is the desired
probability. The fig 3.6(b) shows the standard normal distribution Z ~ N (0, 1) with the value 2
marked, since x0 = 56 is 2σ greater than the mean.
Example 3.11
The time required to load a vessel X is distributed N (12, 4). Determine the probability that
the vessel will be loaded in less than 10 hours.
Solution
µ = 12
σ² = 4, so σ = 2
F (10) = ?
$F(10) = \Phi[(10 - 12)/2]$
$= \Phi(-1)$
= 0.1587 [where Φ(-1) = 1 - Φ(1)]
Φ(1) = 0.8413; its complement, 1 - 0.8413 = 0.1587, is the probability contained in the tail.
The fig 3.7(a) shows the shaded portion of standard normal distribution. The fig 3.7(b)
shows the symmetry property, to determine the shaded region to be Φ (-1).
Example 3.12
The time to pass through a queue to begin self-service at a cafeteria has been found to be
N(15, 9). Determine the probability that an arriving customer waits between 14 and 17
minutes.
Solution
N (15, 9) => µ = 15, σ² = 9, so σ = 3
P (14 ≤ X ≤ 17) = F (17) - F (14)
= Φ [(17 - 15)/3] - Φ [(14 - 15)/3]
= Φ (0.667) - Φ (-0.333)
= 0.3780
If fig 3.8(a) represents the probability F(17) – F(14) then fig 3.8(b) represents the
equivalent probability Φ [0.667] - Φ [-0.333] for standard normal distribution.
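Examples 3.10 to 3.12 read Φ from table A.3. As a cross-check, the standard normal cdf can be computed from the error function, since Φ(z) = ½[1 + erf(z/√2)]. The sketch below uses that identity; `normal_cdf` is an illustrative name.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) for X ~ N(mu, sigma^2), via the standardisation z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example 3.10: X ~ N(50, 9), so sigma = 3
f_56 = normal_cdf(56, mu=50, sigma=3)

# Example 3.11: X ~ N(12, 4), so sigma = 2
f_10 = normal_cdf(10, mu=12, sigma=2)

# Example 3.12: X ~ N(15, 9), probability of waiting between 14 and 17 minutes
p_14_17 = normal_cdf(17, 15, 3) - normal_cdf(14, 15, 3)

print(round(f_56, 4), round(f_10, 4), round(p_14_17, 3))
```

Note that the function takes the standard deviation σ, not the variance σ² used in the N(µ, σ²) notation.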
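For the Weibull distribution with location parameter ν (-∞ < ν < ∞), scale parameter α > 0
and shape parameter β > 0, the pdf is
$f(x) = \frac{\beta}{\alpha}\left(\frac{x - \nu}{\alpha}\right)^{\beta - 1} \exp\left[-\left(\frac{x - \nu}{\alpha}\right)^{\beta}\right]$ for $x \ge \nu$, and 0 otherwise.
Mean, $E(X) = \nu + \alpha\,\Gamma(1/\beta + 1)$
Variance, $V(X) = \alpha^2\left[\Gamma(2/\beta + 1) - \Gamma(1/\beta + 1)^2\right]$
where $\Gamma(\cdot)$ is the gamma function, $\Gamma(\beta) = \int_0^{\infty} x^{\beta - 1} e^{-x}\,dx$
The cdf, $F(x) = 1 - \exp\left[-\left(\frac{x - \nu}{\alpha}\right)^{\beta}\right]$ for $x \ge \nu$, and 0 for $x < \nu$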
Example 3.13
The time it takes for an aircraft to land and clear the runway at a major airport has a Weibull
distribution with ν = 1.34 minutes, β = 0.5 and α = 0.04 minute. Determine the probability
that an incoming airplane will take more than 1.5 minutes to land and clear the runway.
Solution
ν = 1.34 minutes
β = 0.5
α = 0.04 minute
The probability that an incoming airplane will take 1.5 minutes or less is
$P(X \le 1.5) = F(1.5) = 1 - \exp[-\{(1.5 - 1.34)/0.04\}^{0.5}]$
$= 1 - e^{-2} = 1 - 0.135 = 0.865$
Therefore the probability that it will take more than 1.5 minutes is
$P(X > 1.5) = 1 - F(1.5) = e^{-2} = 0.135$
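Example 3.13 can be checked with a direct implementation of the three-parameter Weibull cdf, assumed here in the form F(x) = 1 - exp[-((x - ν)/α)^β] for x ≥ ν. The name `weibull_cdf` is illustrative.

```python
from math import exp

def weibull_cdf(x, nu, alpha, beta):
    """cdf of a Weibull distribution with location nu, scale alpha, shape beta."""
    if x < nu:
        return 0.0
    return 1.0 - exp(-(((x - nu) / alpha) ** beta))

# Example 3.13: nu = 1.34 min, alpha = 0.04 min, beta = 0.5
p_within = weibull_cdf(1.5, nu=1.34, alpha=0.04, beta=0.5)  # P(X <= 1.5)
p_more = 1.0 - p_within                                     # P(X > 1.5)

print(round(p_within, 3), round(p_more, 3))
```

The probability of needing more than 1.5 minutes is the complement of the cdf value, about 0.135.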
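For the triangular distribution with minimum a, mode b and maximum c:
Mean = E(X) = (a + b + c) / 3
Mode = b = 3 E(X) - (a + c)
Since a ≤ b ≤ c, $(2a + c)/3 \le E(X) \le (a + 2c)/3$
The cdf, $F(x) = 0$ for $x \le a$; $F(x) = \frac{(x - a)^2}{(b - a)(c - a)}$ for $a < x \le b$; $F(x) = 1 - \frac{(c - x)^2}{(c - b)(c - a)}$ for $b < x \le c$; $F(x) = 1$ for $x > c$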
Example 3.14
The central processing requirements for a program that will execute have a triangular
distribution with a = 0.05 second, b = 1.1 seconds and c = 6.5 seconds. Determine the
probability that the CPU requirement for a random program is 2.5 seconds or less.
Solution
a = 0.05 second
b = 1.1 seconds
c = 6.5 seconds
P (X ≤ 2.5) = F (2.5) = ?
The required probability covers the interval (0.05, 1.1) plus the portion of (1.1, 6.5) up
to 2.5, i.e. 1.1 < 2.5 ≤ 6.5 (b < x ≤ c), so
$F(2.5) = 1 - \frac{(6.5 - 2.5)^2}{(6.5 - 0.05)(6.5 - 1.1)} = 0.541$
Thus the probability is 0.541 that the CPU requirement is 2.5 seconds or less.
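The value 0.541 in Example 3.14 can be reproduced with a piecewise implementation of the triangular cdf. This is a sketch assuming the usual two-branch form of that cdf (one quadratic piece below the mode b, one above); `triangular_cdf` is an illustrative name.

```python
def triangular_cdf(x, a, b, c):
    """cdf of a triangular distribution with minimum a, mode b, maximum c."""
    if x <= a:
        return 0.0
    if x <= b:
        # rising branch, a < x <= b
        return (x - a) ** 2 / ((b - a) * (c - a))
    if x <= c:
        # falling branch, b < x <= c
        return 1.0 - (c - x) ** 2 / ((c - b) * (c - a))
    return 1.0

# Example 3.14: a = 0.05 s, b = 1.1 s, c = 6.5 s
p_cpu = triangular_cdf(2.5, a=0.05, b=1.1, c=6.5)
print(round(p_cpu, 3))
```

Because 2.5 lies above the mode, only the falling branch is evaluated.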
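The lognormal pdf is
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$ for $x > 0$, and 0 otherwise,
where σ² > 0
The parameters µ and σ² are not the mean and variance of the lognormal distribution. These
parameters come from the normal distribution: when Y has a N(µ, σ²) distribution, X = e^Y has
a lognormal distribution with parameters µ and σ². If the mean and variance of the lognormal
are known to be µL and σL², then the parameters µ and σ² are given by
$\mu = \ln\left(\frac{\mu_L^2}{\sqrt{\mu_L^2 + \sigma_L^2}}\right)$
$\sigma^2 = \ln\left(\frac{\mu_L^2 + \sigma_L^2}{\mu_L^2}\right)$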
The counting process {N(t),t ≥ 0}is said to be a poisson process with mean rate λ, if it
satisfies the following assumptions.
1. Arrivals occur one at a time.
2. {N(t),t ≥ 0} has stationary increments. The distribution of number of arrivals
between t and t + s depends only on length of interval s and not on starting point t.
Thus arrivals are completely random
3. {N(t),t ≥ 0} has independent increments.
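If arrivals occur according to a Poisson process, it can be shown that the probability that
N(t) is equal to n is
$P[N(t) = n] = \frac{e^{-\lambda t} (\lambda t)^n}{n!}$ for $t \ge 0$ and $n = 0, 1, 2, \ldots$
Comparing the Poisson pmf with the above equation, N(t) has a Poisson distribution with
parameter α = λt.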
For any times s and t, such that s < t, the assumption - stationary increments implies random
variable N(t) –N(s) representing number of arrivals in interval s to t is also poisson
distributed with mean λ (t - s).
Thus
$P[N(t) - N(s) = n] = \frac{e^{-\lambda(t - s)} [\lambda(t - s)]^n}{n!}$ for $n = 0, 1, 2, \ldots$
and
E [N(t) - N(s)] = λ (t - s) = V [N(t) - N(s)]
Now consider the times at which arrivals occur in a Poisson process. Let the first arrival
occur at time A1, the second at time A1 + A2, and so on. Thus A1, A2, … are successive
interarrival times. This is depicted in fig 3.10
The first arrival occurs after time t if and only if there are no arrivals in the interval
[0, t], so {A1 > t} = {N(t) = 0}
Therefore
$P(A_1 > t) = P[N(t) = 0] = e^{-\lambda t}$
The probability that the first arrival will occur in [0, t] is given by
$P(A_1 \le t) = 1 - e^{-\lambda t}$
This is the cdf of an exponential distribution with parameter λ. Therefore a Poisson process
can also be defined in terms of interarrival times: if the interarrival times are
exponentially and independently distributed, then the number of arrivals by time t, N(t),
meets the three assumptions above.
Suppose that each event of a Poisson process is classified as either a type I or a type II
event. Suppose further that each event is classified as type I with probability p and as
type II with probability 1 - p, independently of all other events. The two properties are
1. Random splitting
Let N1(t) and N2(t) be random variables denoting the number of type I and type II events,
respectively, so that N(t) = N1(t) + N2(t). N1(t) and N2(t) are both Poisson processes, with
rates λp and λ(1 - p), as shown in fig 3.11
2. Pooled process
This is the process of pooling two arrival streams. If Ni(t) are random variables representing
independent Poisson processes with rates λi, for i = 1 and 2, then N(t) = N1(t) + N2(t) is a
Poisson process with rate λ1 + λ2, as shown in fig 3.12
Fig 3.14 shows the cdf of the data, which is simply the empirical distribution of the given data.
It is seen that between 0 and 0.5 hour, there are 21 instances, between 0.5 and 1.0 hours 12
instances and so on. The empirical cdf is shown in fig 3.15
Unit 4
Random-Number Generation
The word "pseudo" is used because generating numbers by a known method removes
the potential for true randomness. Since the sequence of numbers is deterministic, the
numbers are referred to as "pseudo-random".
Goal: To produce a sequence of numbers in [0, 1] that simulates, or replicates, the
ideal properties of random numbers (RN).
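The linear congruential method, initially proposed by Lehmer [1951], is the most widely used
technique for generating random numbers. This method produces a sequence of integers
X1, X2, … between 0 and m - 1 by following the recursive relationship
$X_{i+1} = (a X_i + c) \bmod m$, $i = 0, 1, 2, \ldots$
The initial value X0 is called the seed, a is the multiplier, c is the increment and m is the
modulus. The random numbers are obtained as Ri = Xi / m. The selection of the values for a,
c, m and X0 drastically affects the statistical properties and the cycle length.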
Example 4.1
Use linear congruential method to generate sequence of random numbers with
X0 = 27, a = 17, c = 43, and m = 100.
Solution
Random numbers (Ri)
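The numbers for Example 4.1 can be produced with a few lines of code. This is a minimal sketch assuming the recursion X(i+1) = (a·Xi + c) mod m with Ri = Xi / m; the function name `lcg` is illustrative.

```python
def lcg(seed, a, c, m, count):
    """Return the first `count` (X_i, R_i) pairs of a linear congruential generator."""
    x = seed
    values = []
    for _ in range(count):
        x = (a * x + c) % m       # X_{i+1} = (a * X_i + c) mod m
        values.append((x, x / m)) # R_i = X_i / m
    return values

# Example 4.1: X0 = 27, a = 17, c = 43, m = 100
sequence = lcg(seed=27, a=17, c=43, m=100, count=3)
for x, r in sequence:
    print(x, r)
# X1 = 2, X2 = 77, X3 = 52, so R1 = 0.02, R2 = 0.77, R3 = 0.52
```

With such a small modulus the generator returns to the seed 27 on the fourth step, which illustrates why the choice of a, c and m matters for the cycle length.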
The secondary properties desired of a random-number generator include maximum density and
maximum period.
1. Maximum density means that the values assumed by Ri, i = 1, 2, …, leave no large gaps on
the interval [0, 1].
Problem: The values generated from Ri = Xi / m are discrete, restricted to the set
{0, 1/m, 2/m, …, (m - 1)/m}, instead of continuous.
Solution: Use a very large integer for the modulus m.
2. Maximum period
To achieve maximum density and avoid cycling, the generator should have the largest possible
period. Most digital computers use a binary representation of numbers, so speed and
efficiency are aided by choosing the modulus m to be (or to be close to) a power of 2.
Maximal period is achieved by proper choice of a, c, m and X0. The different cases are
Example 4.2
Using the multiplicative congruential method, find the period of the generator for a = 13,
m = $2^6$ and X0 = 1, 2, 3, and 4.
Solution
c = 0 (multiplicative congruential method), m = $2^6$ = 64 and a = 13 = 5 + 8(1), so a is
of the form 5 + 8k with k = 1.
Therefore the maximal period is P = m / 4 = 64 / 4 = 16, attained for odd seeds, i.e. for
X0 = 1 and 3.
The sequences for X0 = 2 and 4 are calculated similarly. The values are tabulated in table 4.1
Therefore
For X0 = 1 and 3, the period is 16
For X0 = 2, the period is 8
For X0 = 4, the period is 4
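The periods stated in Example 4.2 can be confirmed by iterating the generator until the seed reappears. The sketch below assumes the multiplicative form X(i+1) = a·Xi mod m; `period` is an illustrative helper, and the guard on the step count is a precaution for parameter choices where the seed never recurs.

```python
def period(a, m, seed):
    """Number of steps until a multiplicative congruential generator returns to its seed."""
    x = (a * seed) % m
    steps = 1
    while x != seed:
        x = (a * x) % m
        steps += 1
        if steps > m:
            raise ValueError("seed does not recur; it is not part of a cycle")
    return steps

# Example 4.2: a = 13, m = 2**6 = 64
periods = {x0: period(13, 64, x0) for x0 in (1, 2, 3, 4)}
print(periods)  # {1: 16, 2: 8, 3: 16, 4: 4}
```

Only the odd seeds reach the maximal period m/4 = 16.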
As computing power increases, the complexity of the systems being simulated also increases,
so a generator with a longer period and good statistical properties is needed. One successful
approach is to combine two or more multiplicative congruential generators.
Theorem: If $W_{i,1}, W_{i,2}, \ldots, W_{i,k}$ are any independent, discrete-valued random
variables and $W_{i,1}$ is uniformly distributed on the integers 0 to $m_1 - 2$, then
$W_i = \left(\sum_{j=1}^{k} W_{i,j}\right) \bmod (m_1 - 1)$
is uniformly distributed on the integers 0 to $m_1 - 2$.
Let $X_{i,1}, X_{i,2}, \ldots, X_{i,k}$ be the ith output from k different multiplicative
congruential generators, where the jth generator has prime modulus $m_j$ and the multiplier
$a_j$ is chosen so that the period is $m_j - 1$. Then the jth generator produces $X_{i,j}$ that
are approximately uniformly distributed on 1 to $m_j - 1$, and $W_{i,j} = X_{i,j} - 1$ is
approximately uniformly distributed on 0 to $m_j - 2$.
Example 4.3
For 32-bit computers, L’Ecuyer [1988] suggests combining k = 2 generators with m1
= 2,147,483,563, a1 = 40,014, m2 = 2,147,483,399 and a2 = 40,692. This leads to the
following algorithm:
Step 4: Return
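A possible implementation of the combined generator in Example 4.3 is sketched below. Only the constants m1, a1, m2 and a2 come from the example; the combination rule X = (X1 - X2) mod (m1 - 1) and the handling of X = 0 are assumptions based on the commonly cited form of L'Ecuyer's method, and the seeds are arbitrary illustrative values.

```python
M1, A1 = 2147483563, 40014
M2, A2 = 2147483399, 40692

def combined_generator(seed1, seed2, count):
    """Sketch of a combined multiplicative congruential generator with k = 2."""
    x1, x2 = seed1, seed2
    out = []
    for _ in range(count):
        x1 = (A1 * x1) % M1            # first component generator
        x2 = (A2 * x2) % M2            # second component generator
        x = (x1 - x2) % (M1 - 1)       # assumed combination rule
        # assumed mapping to (0, 1): X = 0 is mapped to (m1 - 1)/m1
        r = x / M1 if x > 0 else (M1 - 1) / M1
        out.append(r)
    return out

# hypothetical seeds, chosen only for illustration
numbers = combined_generator(seed1=12345, seed2=67890, count=5)
print(numbers)
```

Python integers do not overflow, so the products can be formed directly; an implementation in a language with 32-bit integers would need a different way of computing them.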
α = P (reject H0 | H0 true)
There are five types of tests. The first is concerned with testing for uniformity, whereas
the second through fifth are concerned with testing for independence.
The fundamental test performed to validate a new generator is the test for uniformity. The
two different methods of testing are
1. Kolmogorov-Smirnov test
2. Chi-Square test
1. Kolmogorov-Smirnov test
Notations used
F(x) →Continuous cdf
SN(x) → Empirical cdf
N →Total number of observations
R1, R2 …RN → Samples from Random generator
D → Sample statistic
Dα→ Critical value
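By definition,
$S_N(x) = \frac{\text{number of } R_1, R_2, \ldots, R_N \text{ that are} \le x}{N}$
and the test is based on the largest absolute deviation between F(x) and SN(x),
D = max |F(x) - SN(x)|.
Step 1 - Rank the data from smallest to largest. Let R(i) denote the ith smallest observation,
so that
R (1) ≤ R (2) ≤ … ≤ R (N)
Step 2 - Compute
$D^+ = \max_{1 \le i \le N} \{ i/N - R_{(i)} \}$
$D^- = \max_{1 \le i \le N} \{ R_{(i)} - (i - 1)/N \}$
Step 3 - Compute
D = max (D+, D-)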
Step 4 – Determine the critical value Dα, from the table A.8 for the specified significance
level α and the given sample size N.
Step 5
a. If D > Dα, the null hypothesis that the data are a sample from a uniform distribution
is rejected.
b. If D ≤ Dα, no difference has been detected between the true distribution of {R1,
R2, …, RN} and the uniform distribution, so the null hypothesis is not rejected.
Example 4.4
Suppose 5 generated numbers are 0.44, 0.81, 0.14, 0.05, and 0.93. It is desired to perform
a test for uniformity using Kolmogorov-Smirnov test with a level of significance α
= 0.05.
Solution
N = 5, i = 1, 2, 3, 4, 5. Arrange the Ri from smallest to largest.
Step 1 - R(i)              0.05  0.14  0.44  0.81  0.93
         i/N               0.20  0.40  0.60  0.80  1.00
Step 2 - i/N - R(i)        0.15  0.26  0.16  -     0.07
         R(i) - (i-1)/N    0.05  -     0.04  0.21  0.13
Step 3 - D+ = 0.26 and D- = 0.21, so D = max (0.26, 0.21) = 0.26
The calculations in the above table are depicted in the fig 4.2, where empirical cdf SN(x) is
compared to uniform cdf F(x). It is seen that D+ is the largest deviation of SN(x) above F(x)
and D- is the largest deviation of SN(x) below F(x).
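The table for Example 4.4 can be reproduced programmatically. The sketch below follows Steps 1 to 3; the critical value 0.565 for N = 5 and α = 0.05 is an assumption taken from a standard Kolmogorov-Smirnov table rather than from the text above, and `ks_uniform` is an illustrative name.

```python
def ks_uniform(sample):
    """Return (D+, D-, D) for a Kolmogorov-Smirnov test against uniform(0, 1)."""
    ordered = sorted(sample)  # Step 1: rank the data
    n = len(ordered)
    # Step 2: enumerate() is 0-based, so rank i corresponds to index i - 1
    d_plus = max((i + 1) / n - r for i, r in enumerate(ordered))
    d_minus = max(r - i / n for i, r in enumerate(ordered))
    # Step 3
    return d_plus, d_minus, max(d_plus, d_minus)

d_plus, d_minus, d = ks_uniform([0.44, 0.81, 0.14, 0.05, 0.93])

D_CRITICAL = 0.565  # assumed table value for N = 5, alpha = 0.05
reject = d > D_CRITICAL
print(round(d_plus, 2), round(d_minus, 2), round(d, 2), reject)
```

Here D = 0.26 is below the assumed critical value, so uniformity is not rejected.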
Example 4.5
Use chi-square test with α=0.05 to test whether the data shown below are uniformly
distributed.
0.34 0.90 0.25 0.89 0.87 0.44 0.12 0.21 0.46 0.67
0.83 0.76 0.79 0.64 0.70 0.81 0.94 0.74 0.22 0.74
0.96 0.99 0.77 0.67 0.56 0.41 0.52 0.73 0.99 0.02
0.47 0.30 0.17 0.82 0.56 0.05 0.45 0.31 0.78 0.05
0.79 0.71 0.23 0.19 0.82 0.93 0.65 0.37 0.39 0.42
0.99 0.17 0.99 0.46 0.05 0.66 0.10 0.42 0.18 0.49
0.37 0.51 0.54 0.01 0.81 0.28 0.69 0.34 0.75 0.49
0.72 0.43 0.56 0.97 0.30 0.94 0.96 0.58 0.73 0.05
0.06 0.39 0.84 0.24 0.40 0.64 0.40 0.19 0.79 0.62
0.18 0.26 0.97 0.88 0.64 0.47 0.60 0.11 0.29 0.78
Solution
Let n = 10, with the interval [0, 1] divided into classes of equal length: (0.01-0.10),
(0.11-0.20), …, (0.91-1.00)
N = 100
Ei = N/n = 100/10 = 10
Note:
In general, choose n such that Ei ≥ 5.
The Kolmogorov-Smirnov test is more powerful than the chi-square test and can also be
applied to small sample sizes, whereas the chi-square test requires a large sample,
say N ≥ 50.
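The chi-square statistic for Example 4.5 can be computed once the data are tallied into the ten classes. In the sketch below the observed counts were tallied by hand from the 100 values listed above, with each class closed on the right; the critical value 16.9 for 9 degrees of freedom at α = 0.05 is an assumption taken from a standard chi-square table. `chi_square_uniform` is an illustrative name.

```python
def chi_square_uniform(observed):
    """Chi-square statistic against equal expected counts in every class."""
    n_classes = len(observed)
    expected = sum(observed) / n_classes  # E_i = N / n
    return sum((o - expected) ** 2 / expected for o in observed)

# counts for classes (0.01-0.10), (0.11-0.20), ..., (0.91-1.00), tallied by hand
observed = [8, 8, 10, 9, 12, 8, 10, 14, 10, 11]
chi2 = chi_square_uniform(observed)

CRITICAL = 16.9  # assumed table value for n - 1 = 9 degrees of freedom
reject = chi2 > CRITICAL
print(round(chi2, 2), reject)
```

The statistic is well below the assumed critical value, so the hypothesis of uniformity would not be rejected for these data.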
Run - A succession of similar events preceded and followed by a different event is called
a run.
Run No.   Run length   Run
1         1            H
2         2            TT
3         2            HH
4         3            TTT
5         1            H
6         1            T
It has 7 runs, first run of length one, second run of length three, third run of length 3, fourth
run with one, fifth run with one, sixth run with three and seventh run with one. There are
three up runs and four down runs.
Mean, $\mu_a = \frac{2N - 1}{3}$
Variance, $\sigma_a^2 = \frac{16N - 29}{90}$
For N > 20, the distribution of a, the total number of runs, is reasonably approximated by a
normal distribution N(μa, σa²). This approximation is used to test the independence of numbers
from a generator. The test statistic is obtained by subtracting the mean from the observed
number of runs a and dividing by the standard deviation, i.e. the test statistic is given by
$Z_0 = \frac{a - \mu_a}{\sigma_a} = \frac{a - (2N - 1)/3}{\sqrt{(16N - 29)/90}}$
where Z0 ~ N (0, 1)
Example 4.7
Based on runs up and runs down, determine whether the following sequence of 40 numbers
is such that the hypothesis of independence can be rejected or accepted where α = 0.05.
0.41 0.68 0.89 0.94 0.74 0.91 0.55 0.62 0.36 0.27
0.19 0.72 0.75 0.08 0.54 0.02 0.01 0.36 0.16 0.28
0.18 0.01 0.95 0.69 0.18 0.47 0.23 0.32 0.82 0.53
0.31 0.42 0.73 0.04 0.83 0.45 0.13 0.57 0.63 0.29
Solution
The sequence of runs up and down is as follows:
+ + + - + - + - - - + + - + - - + - +
- - + - - + - + + - - + + - + - - + + -
No. of runs → a = 26
N = 40
μa = {2(40) - 1} / 3 = 26.33
σa² = {16(40) - 29} / 90 = 6.79
Z0 = (26 - 26.33) / √(6.79) = -0.13
Critical value → Zα/2 = Z0.025 = 1.96 (from table A.3)
-Zα/2 ≤ Z0 ≤ Zα/2 → -1.96 ≤ -0.13 ≤ 1.96
Therefore the independence of the numbers cannot be rejected; the null hypothesis is not rejected.
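The count of runs and the test statistic in Example 4.7 can be reproduced as follows. This is a sketch using the mean (2N - 1)/3 and variance (16N - 29)/90 from the worked solution; `runs_up_down` is an illustrative name, and the code assumes no two adjacent values are equal.

```python
from math import sqrt

def runs_up_down(values):
    """Return (number of runs up and down, Z0) for a sequence of numbers."""
    # True marks a '+' (increase), False a '-' (decrease)
    signs = [b > a for a, b in zip(values, values[1:])]
    # a new run starts wherever the sign changes
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    n = len(values)
    mean = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return runs, (runs - mean) / sqrt(variance)

data = [
    0.41, 0.68, 0.89, 0.94, 0.74, 0.91, 0.55, 0.62, 0.36, 0.27,
    0.19, 0.72, 0.75, 0.08, 0.54, 0.02, 0.01, 0.36, 0.16, 0.28,
    0.18, 0.01, 0.95, 0.69, 0.18, 0.47, 0.23, 0.32, 0.82, 0.53,
    0.31, 0.42, 0.73, 0.04, 0.83, 0.45, 0.13, 0.57, 0.63, 0.29,
]
a, z0 = runs_up_down(data)
print(a, round(z0, 2), abs(z0) <= 1.96)
```

The final value printed is True when Z0 lies inside the acceptance region for α = 0.05.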
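Runs can also be described as above or below the mean value. A '+' sign is used to indicate
an observation above the mean and a '-' sign one below the mean.
With n1 and n2 the numbers of observations above and below the mean, N = n1 + n2, and b the
total number of runs:
Mean, $\mu_b = \frac{2 n_1 n_2}{N} + \frac{1}{2}$
Variance, $\sigma_b^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - N)}{N^2 (N - 1)}$
For either n1 or n2 greater than 20, b is approximately normally distributed. The test statistic
is obtained by subtracting the mean from the number of runs b and dividing by the standard
deviation, i.e.
$Z_0 = \frac{b - \mu_b}{\sigma_b}$
The null hypothesis of independence is not rejected when -Zα/2 ≤ Z0 ≤ Zα/2, where α is the
level of significance.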
Example 4.8
Based on runs above and below mean, determine whether the following sequence of 40
numbers is such that the hypothesis of independence can be rejected or accepted where α
= 0.05.
0.41 0.68 0.89 0.94 0.74 0.91 0.55 0.62 0.36 0.27
0.19 0.72 0.75 0.08 0.54 0.02 0.01 0.36 0.16 0.28
0.18 0.01 0.95 0.69 0.18 0.47 0.23 0.32 0.82 0.53
0.31 0.42 0.73 0.04 0.83 0.45 0.13 0.57 0.63 0.29
Solution
Mean= 0.495
The sequence of runs above and below mean is as follows:
- + + + + + + + - - - + + - + - - - - -
- - + + - - - - + + - - + - + - - + + -
n1 = 18
n2 = 22
N = n1 + n2 = 40
b = 17
μb = [{2(18)(22)} / 40] + (1/2) = 20.3
σb² = [2(18)(22){(2)(18)(22) - 40}] / [(40)²(40 - 1)] = 9.54
Since n2 > 20, the normal approximation is acceptable.
Z0 = (17 - 20.3) / √(9.54) = -1.07
Critical value → Zα/2 = Z0.025 = 1.96 (from table A.3)
-Zα/2 ≤ Z0 ≤ Zα/2 → -1.96 ≤ -1.07 ≤ 1.96
Therefore the hypothesis of independence cannot be rejected on the basis of this test.
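Example 4.8 can be checked in the same way. The sketch below classifies each number against the stated mean of 0.495 and applies the mean and variance expressions used in the worked solution; `runs_above_below` is an illustrative name.

```python
from math import sqrt

def runs_above_below(values, mean):
    """Return (n1, n2, b, Z0) for runs above and below the given mean."""
    above = [v > mean for v in values]
    n1 = sum(above)            # observations above the mean
    n2 = len(values) - n1      # observations below the mean
    n = n1 + n2
    b = 1 + sum(1 for s, t in zip(above, above[1:]) if s != t)
    mu_b = 2 * n1 * n2 / n + 0.5
    var_b = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return n1, n2, b, (b - mu_b) / sqrt(var_b)

data = [
    0.41, 0.68, 0.89, 0.94, 0.74, 0.91, 0.55, 0.62, 0.36, 0.27,
    0.19, 0.72, 0.75, 0.08, 0.54, 0.02, 0.01, 0.36, 0.16, 0.28,
    0.18, 0.01, 0.95, 0.69, 0.18, 0.47, 0.23, 0.32, 0.82, 0.53,
    0.31, 0.42, 0.73, 0.04, 0.83, 0.45, 0.13, 0.57, 0.63, 0.29,
]
n1, n2, b, z0 = runs_above_below(data, mean=0.495)
print(n1, n2, b, round(z0, 2))
```

The reference value 0.495 is the mean used in the example, not the sample mean of the 40 numbers.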
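If, for example, two numbers below the mean are always followed by two numbers above the
mean, and so on, then the numbers are dependent, because runs of lengths other than two should
also occur. Let Yi be the number of runs of length i in a sequence of N numbers.
For runs above and below the mean, the expected value of Yi is given by
$E(Y_i) = \frac{N w_i}{E(I)}$, N > 20
where wi, the approximate probability that a run has length i, is given by
$w_i = \left(\frac{n_1}{N}\right)^i \left(\frac{n_2}{N}\right) + \left(\frac{n_1}{N}\right) \left(\frac{n_2}{N}\right)^i$, N > 20
and E(I), the approximate expected length of a run, is $E(I) = \frac{n_1}{n_2} + \frac{n_2}{n_1}$, N > 20
The approximate expected total number of runs (of all lengths), E(A), is given by
$E(A) = \frac{N}{E(I)}$, N > 20
The appropriate test is the chi-square test with Oi the observed number of runs of length i.
The test statistic is given by
$\chi_0^2 = \sum_{i=1}^{L} \frac{[O_i - E(Y_i)]^2}{E(Y_i)}$
where L = N - 1 for runs up and down and L = N for runs above and below the mean.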
Example 4.9
Given the sequence of numbers, can the hypothesis that the numbers are independent be
rejected on the basis of length of runs up and down at α = 0.05?
0.30 0.48 0.36 0.01 0.54 0.34 0.96 0.06 0.61 0.85
0.48 0.86 0.14 0.86 0.89 0.37 0.49 0.60 0.04 0.83
0.42 0.83 0.37 0.21 0.90 0.89 0.91 0.79 0.57 0.99
0.95 0.27 0.41 0.81 0.96 0.31 0.09 0.06 0.23 0.77
0.73 0.47 0.13 0.55 0.11 0.75 0.36 0.25 0.23 0.72
0.60 0.84 0.70 0.30 0.26 0.38 0.05 0.19 0.73 0.44
Solution
N = 60
The sequence of + and – are as follows
+ - - + - + - + + - + - + + - + + - + - + - - + - + - - + -
- + + + - - - + + - - - + - + - - - + - + - - - + - + + -
Calculate Oi
Run Length, i 1 2 3 4
Observed Runs, Oi 26 9 5 0
To find $\chi_0^2$, the calculations and procedures are shown in table 4.3
Solution
N = 60
Mean = (0.99+0.00)/2 = 0.495
n1 = 28
n2 = 32
N = n1 + n2 = 60
Calculate Oi
Run Length, i 1 2 3 ≥4
Observed Runs, Oi 17 8 1 5
The probabilities of runs of various lengths wi are as follows
For i ≥ 4,
To find $\chi_0^2$, the calculations and procedures are shown in table 4.4
[Table 4.4: rows for run lengths 1, 2, 3 and ≥4; the totals row gives 31 observed runs and 29.70 expected runs.]
Table 4.4 Length of runs above and below mean: $\chi_0^2$ test
The tests for autocorrelation are concerned with dependence between numbers in a
sequence.
For example
0.12 0.01 0.23 0.28 0.89 0.31 0.64 0.28 0.83 0.93
0.99 0.15 0.33 0.35 0.91 0.41 0.60 0.27 0.75 0.88
0.68 0.49 0.05 0.43 0.95 0.58 0.19 0.36 0.69 0.87
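Even though such numbers may pass all the previous tests, every fifth number (the 5th, 10th,
15th, and so on) is a large number. Hence the numbers are dependent, and a test for
autocorrelation is needed to detect this.
The autocorrelation ρim between the numbers Ri, Ri+m, Ri+2m, …, Ri+(M+1)m is tested, where M
is the largest integer such that i + (M + 1)m ≤ N. For large values of M, the distribution of
the estimator $\hat\rho_{im}$ is approximately normal if the values Ri, Ri+m, Ri+2m, …, Ri+(M+1)m
are uncorrelated. The test statistic can be formed as
$Z_0 = \frac{\hat\rho_{im}}{\sigma_{\hat\rho_{im}}}$
where
$\hat\rho_{im} = \frac{1}{M + 1}\left[\sum_{k=0}^{M} R_{i+km}\, R_{i+(k+1)m}\right] - 0.25$
$\sigma_{\hat\rho_{im}} = \frac{\sqrt{13M + 7}}{12(M + 1)}$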
Example 4.11
Test whether the 3rd, 8th, 13th and so on, numbers in the sequence are autocorrelated. The
level of significance is 0.05.
0.12 0.01 0.23 0.28 0.89 0.31 0.64 0.28 0.83 0.93
0.99 0.15 0.33 0.35 0.91 0.41 0.60 0.27 0.75 0.88
0.68 0.49 0.05 0.43 0.95 0.58 0.19 0.36 0.69 0.87
Solution
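α = 0.05
i = 3 (beginning with the 3rd number)
m = 5 (every fifth number)
N = 30
M = 4 (largest integer such that 3 + (M + 1)5 ≤ 30)
$\hat\rho_{35} = \frac{1}{4 + 1}[(0.23)(0.28) + (0.28)(0.33) + (0.33)(0.27) + (0.27)(0.05) + (0.05)(0.36)] - 0.25 = -0.1945$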
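The estimate -0.1945 in Example 4.11 can be recomputed directly from the 30 numbers. The sketch below assumes the usual form of the estimator, the mean of the lagged products minus 0.25, with standard deviation √(13M + 7) / (12(M + 1)); `autocorrelation_test` is an illustrative name.

```python
from math import sqrt

def autocorrelation_test(numbers, i, m):
    """Return (M, rho_hat, Z0) for the numbers at positions i, i+m, i+2m, ... (1-based)."""
    n = len(numbers)
    big_m = (n - i) // m - 1  # largest M with i + (M + 1) * m <= N
    picks = [numbers[i - 1 + k * m] for k in range(big_m + 2)]
    products = [a * b for a, b in zip(picks, picks[1:])]
    rho = sum(products) / (big_m + 1) - 0.25
    sigma = sqrt(13 * big_m + 7) / (12 * (big_m + 1))
    return big_m, rho, rho / sigma

data = [
    0.12, 0.01, 0.23, 0.28, 0.89, 0.31, 0.64, 0.28, 0.83, 0.93,
    0.99, 0.15, 0.33, 0.35, 0.91, 0.41, 0.60, 0.27, 0.75, 0.88,
    0.68, 0.49, 0.05, 0.43, 0.95, 0.58, 0.19, 0.36, 0.69, 0.87,
]
big_m, rho, z0 = autocorrelation_test(data, i=3, m=5)
print(big_m, round(rho, 4), round(z0, 2), abs(z0) <= 1.96)
```

Under these assumptions Z0 is about -1.52, which lies within ±1.96, so independence would not be rejected for this lag.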
The gap test measures the number of digits between successive occurrences of the same
digit. A gap of length x occurs when x other digits lie between two successive occurrences
of some specified digit. The probability of a gap is determined as
$P(\text{gap of length } x) = (0.9)^x (0.1)$, $x = 0, 1, 2, \ldots$
For example, consider the
lengths of the gaps associated with the digit 3.
4, 1, 3, 5, 1, 7, 2, 8, 2, 0, 7, 9, 1, 3, 5, 2, 7, 9, 4, 1, 6, 3
3, 9, 6, 3, 4, 8, 2, 3, 1, 9, 4, 4, 6, 8, 4, 1, 3, 8, 9, 5, 5, 7
3, 9, 5, 9, 8, 5, 3, 2, 2, 3, 7, 4, 7, 0, 3, 6, 3, 5, 9, 9, 5, 5
5, 0, 4, 6, 8, 0, 4, 7, 0, 3, 3, 0, 9, 5, 7, 9, 5, 1, 6, 6, 3, 8
8, 8, 9, 2, 9, 1, 8, 5, 4, 4, 5, 0, 2, 3, 9, 7, 1, 2, 0, 3, 6, 3
There are eighteen 3's in the list, and therefore 17 gaps. The first gap is of length 10, the
second gap is of length 7, and so on. We are interested in the frequency of gaps.
The probability of the first gap is determined as $P(\text{gap of } 10) = (0.9)^{10} (0.1)$
The observed frequencies for all digits are compared to the theoretical frequency using the
Kolmogorov-Smirnov test. The theoretical frequency distribution for randomly ordered
digits is given by
$P(\text{gap} \le x) = F(x) = 0.1 \sum_{n=0}^{x} (0.9)^n = 1 - 0.9^{x+1}$
Step 1 - Specify the cdf for the theoretical frequency distribution, $F(x) = 1 - 0.9^{x+1}$,
based on the selected class interval width.
Step 2 - Arrange the observed sample of gaps in a cumulative distribution with these
same classes.
Step 3 - Find D, the maximum deviation between F(x) and SN(x).
Step 4 - Determine the critical value Dα from table A.8 for the specified value of α and
the sample size N.
Step 5 - If the calculated value of D is greater than the tabulated value of Dα, the null
hypothesis of independence is rejected.
Example 4.12
Based on the frequency with which gaps occur analyze the following 110 digits to test
whether they are independent. Use α= 0.05.
4, 1, 3, 5, 1, 7, 2, 8, 2, 0, 7, 9, 1, 3, 5, 2, 7, 9, 4, 1, 6, 3
3, 9, 6, 3, 4, 8, 2, 3, 1, 9, 4, 4, 6, 8, 4, 1, 3, 8, 9, 5, 5, 7
3, 9, 5, 9, 8, 5, 3, 2, 2, 3, 7, 4, 7, 0, 3, 6, 3, 5, 9, 9, 5, 5
5, 0, 4, 6, 8, 0, 4, 7, 0, 3, 3, 0, 9, 5, 7, 9, 5, 1, 6, 6, 3, 8
8, 8, 9, 2, 9, 1, 8, 5, 4, 4, 5, 0, 2, 3, 9, 7, 1, 2, 0, 3, 6, 3
Solution
The numbers of gaps associated with the various digits are shown in table 4.5
Digit         0   1   2   3   4   5   6   7   8   9
No. of gaps   7   8   8  17  10  13   7   8   9  13
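The gap counts can be obtained by recording the positions of each digit and differencing them. The sketch below is a minimal illustration using the 110 digits of Example 4.12; `gaps_for_digit` is an illustrative name.

```python
def gaps_for_digit(digits, target):
    """Lengths of the gaps between successive occurrences of `target`."""
    positions = [k for k, d in enumerate(digits) if d == target]
    # a gap is the number of other digits strictly between two occurrences
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]

# the 110 digits of Example 4.12, one string per printed row
rows = [
    "4135172820791352794163",
    "3963482319446841389557",
    "3959853223747036359955",
    "5046804703309579516638",
    "8892918544502397120363",
]
digits = [int(ch) for row in rows for ch in row]

gaps_3 = gaps_for_digit(digits, 3)
gap_counts = {d: len(gaps_for_digit(digits, d)) for d in range(10)}
print(gaps_3[:2], gap_counts)
```

Since every digit occurs at least once, the total number of gaps is 110 - 10 = 100, which is the sample size N for the Kolmogorov-Smirnov comparison.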
The poker test is based on the frequency with which certain digits are repeated in a series of numbers.
For example
0.255 0.577 0.331 0.414 0.828 0.909 0.303 0.001 …
Example 4.13
A sequence of 1000 three-digit numbers has been generated and an analysis indicates that
680 have three different digits, 289 contain exactly one pair of like digits, and 31 contain
three like digits. Based on the poker test, are these numbers independent?
Solution
Let α = 0.05
The test is summarized in table 4.6
Combination, i           Observed frequency, Oi   Expected frequency, Ei   (Oi - Ei)²/Ei
Three different digits   680                      720                      2.22
Three like digits        31                       10                       44.10
Exactly one pair         289                      270                      1.33
Total                    1000                     1000                     47.65 = χ0²
Table 4.6 Poker test results
Therefore the independence of the numbers is rejected on the basis of this test.
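The entries of table 4.6 can be recomputed from the class probabilities for three-digit numbers: 0.72 for three different digits, 0.27 for exactly one pair and 0.01 for three like digits. The sketch below assumes those probabilities and a critical value of 5.99 (2 degrees of freedom, α = 0.05) from a standard chi-square table; `poker_chi_square` is an illustrative name.

```python
def poker_chi_square(observed, probabilities):
    """Chi-square statistic comparing observed class counts with expected counts."""
    total = sum(observed.values())
    chi2 = 0.0
    for name, count in observed.items():
        expected = total * probabilities[name]
        chi2 += (count - expected) ** 2 / expected
    return chi2

observed = {"three different": 680, "one pair": 289, "three like": 31}
# assumed class probabilities for independent three-digit numbers
probabilities = {"three different": 0.72, "one pair": 0.27, "three like": 0.01}

chi2 = poker_chi_square(observed, probabilities)
CRITICAL = 5.99  # assumed table value for 2 degrees of freedom
reject = chi2 > CRITICAL
print(round(chi2, 2), reject)
```

The unrounded statistic is about 47.66; the table shows 47.65 because each term was rounded before summing.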
Unit 5
Random Variate Generation
5.3 Inverse Transform Technique
5.1.1 Exponential Distributions
5.1.2 Uniform Distributions
5.1.3 Triangular Distributions
5.1.4 Discrete Distribution
Empirical Distribution
Uniform Distributions
Geometric Distributions
5.4 Acceptance-Rejection Technique
5.2 1 Poisson Distributions
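All the techniques for generating random variates assume that a source of uniform (0, 1)
random numbers R1, R2, … is readily available, where each Ri has
probability density function (pdf) $f_R(x) = 1$ for $0 \le x \le 1$, and 0 otherwise
cdf $F_R(x) = 0$ for $x < 0$; $F_R(x) = x$ for $0 \le x \le 1$; $F_R(x) = 1$ for $x > 1$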
Example 5.1
If the interarrival times X1, X2, X3, … have an exponential distribution with rate λ, then λ
can be interpreted as the mean number of arrivals per unit time. For any i, $E(X_i) = 1/\lambda$,
so 1/λ is the mean interarrival time.
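Step 1 - Compute the cdf of the desired random variable X. For the exponential distribution,
the cdf is
$F(x) = 1 - e^{-\lambda x}$, $x \ge 0$
Step 2 - Set F(X) = R on the range of X. For the exponential distribution,
$1 - e^{-\lambda X} = R$ on the range $x \ge 0$.
Step 3 - Solve the equation F(X) = R for X in terms of R. For the exponential distribution,
$X = -\frac{1}{\lambda} \ln(1 - R)$ (5.1)
Equation (5.1) is called a random variate generator for the exponential distribution and
can be written as $X = F^{-1}(R)$.
Step 4 - Generate uniform random numbers R1, R2, R3, … and compute the desired random
variates by
$X_i = F^{-1}(R_i)$
For the exponential distribution, using (5.1),
$X_i = -\frac{1}{\lambda} \ln(1 - R_i)$ for i = 1, 2, …
Note: Since both 1 - Ri and Ri are uniformly distributed on (0, 1), 1 - Ri may be replaced
by Ri, giving $X_i = -\frac{1}{\lambda} \ln R_i$.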
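The inverse-transform steps for the exponential distribution can be sketched in code as follows. The generator assumes the form X = -(1/λ) ln(1 - R); the function name, the rate and the seed are illustrative choices.

```python
import math
import random

def exponential_variate(rate, r):
    """Inverse-transform generator: X = -(1/rate) * ln(1 - R) for R in [0, 1)."""
    return -math.log(1.0 - r) / rate

# hypothetical usage: five interarrival times with rate 2 arrivals per unit time
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [exponential_variate(2.0, rng.random()) for _ in range(5)]
print(sample)
```

Using 1 - R rather than R avoids taking the logarithm of zero, because `random()` returns values in [0, 1).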
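For the uniform distribution on the interval [a, b], the random variate is generated by
X = a + (b - a) R (5.3)
The pdf of X is $f(x) = 1/(b - a)$ for $a \le x \le b$, and 0 otherwise
Step 1 - The cdf is $F(x) = (x - a)/(b - a)$ for $a \le x \le b$
Step 2 - Set F(X) = (X - a)/(b - a) = R, i.e.
X - a = R (b - a)
Step 3 - Solving for X gives X = a + (b - a) R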
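Consider a random variable X with the triangular distribution on (0, 2) with mode 1.
The pdf, $f(x) = x$ for $0 \le x \le 1$; $f(x) = 2 - x$ for $1 < x \le 2$; 0 otherwise
The cdf, $F(x) = 0$ for $x \le 0$; $F(x) = x^2/2$ for $0 < x \le 1$; $F(x) = 1 - (2 - x)^2/2$ for $1 < x \le 2$; $F(x) = 1$ for $x > 2$
For 0 ≤ X ≤ 1, $R = X^2/2$
and for 1 ≤ X ≤ 2, $R = 1 - (2 - X)^2/2$
Thus X is generated by $X = \sqrt{2R}$ for $0 \le R \le 1/2$, and $X = 2 - \sqrt{2(1 - R)}$ for $1/2 < R \le 1$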
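A generator for the triangular case can be sketched as below. It assumes the distribution is the triangular distribution on (0, 2) with mode 1, whose cdf is x²/2 on the lower half and 1 - (2 - x)²/2 on the upper half; `triangular_variate` is an illustrative name.

```python
from math import sqrt

def triangular_variate(r):
    """Inverse transform for the triangular distribution on (0, 2) with mode 1."""
    if r <= 0.5:
        return sqrt(2.0 * r)            # inverts R = X^2 / 2, 0 <= X <= 1
    return 2.0 - sqrt(2.0 * (1.0 - r))  # inverts R = 1 - (2 - X)^2 / 2, 1 <= X <= 2

print(triangular_variate(0.125), triangular_variate(0.5), triangular_variate(0.875))
# 0.5 1.0 1.5
```

The two branches meet at R = 1/2, where both give X = 1.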
The discrete distributions can be generated by using inverse transform technique either
numerically (table-lookup procedure) or algebraically (formula). It includes empirical
distribution and two standard discrete distributions – (discrete) uniform and geometric
distributions.
Example 5.2 (Empirical Discrete distribution)
At the end of the day, the number of shipments on the loading dock of ABC Company is 0, 1 or 2,
with relative frequencies of occurrence of 0.50, 0.30 and 0.20 respectively. Internal
consultants were asked to develop a model to improve the efficiency of the loading and hauling
operations. As part of this work they are required to generate values X to represent the
number of shipments on the loading dock at the end of each day. The distribution of this
discrete random variable is given in tables 5.1 and 5.2.
x p(x) F(x)
0 0.50 0.50
1 0.30 0.80
2 0.20 1.00
Table 5.1 Distributions of number of shipments X
i   Input, ri   Output, xi
1   0.50        0
2   0.80        1
3   1.00        2
Table 5.2 Lookup table for generating X
pmf is given by
p(0) = P (X=0) = 0.50
p(1) = P (X=1) = 0.30
p(2) = P (X=2) = 0.20
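The cdf of a discrete random variable always consists of horizontal line segments with jumps
of size p(x) at those points x that the random variable can assume. Here there is a jump
of size p(0) = 0.5 at x = 0, of size p(1) = 0.3 at x = 1 and of size p(2) = 0.2 at x = 2.
cdf, $F(x) = 0$ for $x < 0$; $F(x) = 0.5$ for $0 \le x < 1$; $F(x) = 0.8$ for $1 \le x < 2$; $F(x) = 1.0$ for $x \ge 2$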
1. Graphically
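First locate R1 = 0.73 on the vertical axis, draw a horizontal line segment until it hits a
'jump' in the cdf, and then drop a perpendicular to the horizontal axis to get the generated
variate.
2. Table lookup
Find the interval i in which R lies, i.e. find i such that $r_{i-1} < R \le r_i$, and set X = xi.
Here r0 = 0, x0 = -∞, while x1, x2, …, xn are the possible values of the random variable and
rk = p(x1) + … + p(xk), k = 1, 2, …, n.
For R1 = 0.73, r1 = 0.50 < 0.73 ≤ r2 = 0.80, so set X1 = x2 = 1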
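The table-lookup procedure can be sketched with a binary search over the cumulative probabilities. The lists below restate the shipment distribution from the example; `shipments` is an illustrative name.

```python
from bisect import bisect_left

CUMULATIVE = [0.50, 0.80, 1.00]  # r_1, r_2, r_3
VALUES = [0, 1, 2]               # x_1, x_2, x_3

def shipments(r):
    """Return x_i for the interval r_(i-1) < R <= r_i containing R, for 0 < R <= 1."""
    # bisect_left gives the first index whose cumulative value is >= r
    return VALUES[bisect_left(CUMULATIVE, r)]

print(shipments(0.73))  # 1
```

For a table with only three rows a linear scan would do equally well; the binary search only matters for long tables.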
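Consider the discrete uniform distribution on {1, 2, …, k}, with pmf p(x) = 1/k, x = 1, 2, …, k
cdf, $F(x) = 0$ for $x < 1$; $F(x) = i/k$ for $i \le x < i + 1$, $i = 1, 2, \ldots, k - 1$; $F(x) = 1$ for $x \ge k$
Let us consider
xi = i
ri = p(1) + p(2) + … + p(xi) = F(xi) = i/k
By using the inequality F (xi-1) = ri-1 < R ≤ ri = F (xi), the generated random number R satisfies
$\frac{i - 1}{k} < R \le \frac{i}{k}$ (5.6)
Then X is generated by setting X = i. The inequality (5.6) can be solved for i:
i - 1 < Rk ≤ i
Rk ≤ i < Rk + 1
so X = ⌈Rk⌉, the smallest integer greater than or equal to Rk.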
Example 5.4
Generate R and use inequality F (xi-1) = ri-1 < R ≤ ri = F (xi), such that
$\rightarrow x^2 - x - k(k + 1)R = 0$
The above is a quadratic equation in x, so the solution is obtained by using the
quadratic formula,
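Consider the geometric distribution with pmf $p(x) = p(1 - p)^x$, $x = 0, 1, 2, \ldots$, where 0 < p < 1.
cdf, $F(x) = \sum_{j=0}^{x} p(1 - p)^j$
$= p[1 + (1 - p) + (1 - p)^2 + (1 - p)^3 + \ldots + (1 - p)^x]$
$= 1 - (1 - p)^{x+1}$
Generate R and use the inequality F (xi-1) = ri-1 < R ≤ ri = F (xi), such that
$1 - (1 - p)^x < R \le 1 - (1 - p)^{x+1}$
$(1 - p)^{x+1} \le 1 - R < (1 - p)^x$
$(x + 1)\ln(1 - p) \le \ln(1 - R) < x \ln(1 - p)$
Dividing by ln(1 - p), which is negative and so reverses the inequalities,
$\frac{\ln(1 - R)}{\ln(1 - p)} - 1 \le x < \frac{\ln(1 - R)}{\ln(1 - p)}$
Thus X = x for the integer value of x satisfying the inequality. By using the round-up function,
$X = \left\lceil \frac{\ln(1 - R)}{\ln(1 - p)} - 1 \right\rceil$
For a geometric variate X that assumes values {q, q+1, q+2, …}, with pmf
$p(x) = p(1 - p)^{x - q}$, $x = q, q + 1, \ldots$, the variate is generated by
$X = q + \left\lceil \frac{\ln(1 - R)}{\ln(1 - p)} - 1 \right\rceil$
Note - Commonly q = 1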
Example 5.6
Generate 3 values from a geometric distribution on the range {X ≥ 1} with mean 2. Such a
geometric distribution has pmf $p(x) = p(1 - p)^{x-1}$, where x = 1, 2, …, with mean 1/p = 2.
Solution
p = 1/2
q = 1
$p(x) = p(1 - p)^{x-1}$
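A generator for Example 5.6 can be sketched as follows. It assumes the inverse-transform result X = q + ⌈ln(1 - R)/ln(1 - p) - 1⌉; the function name and the three random numbers are hypothetical and are not taken from the example.

```python
from math import ceil, log

def geometric_variate(p, r, q=1):
    """Geometric variate on {q, q+1, ...} by inverse transform, for 0 <= R < 1."""
    return q + ceil(log(1.0 - r) / log(1.0 - p) - 1.0)

# hypothetical random numbers, chosen only to illustrate the calculation
for r in (0.3, 0.6, 0.9):
    print(r, geometric_variate(0.5, r))
# 0.3 -> 1, 0.6 -> 2, 0.9 -> 4
```

With p = 1/2 any R up to 0.5 maps to X = 1, which matches p(1) = 1/2.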
Step 3 - If another uniform random variate on [1/4, 1] is needed, repeat the procedure
beginning at Step 1; otherwise stop.
To show this
then
This probability distribution of R, given that R is between 1/4 and 1, is the desired
distribution.
N can be interpreted as the number of arrivals from a Poisson arrival process in one unit of time.
For the exponential distribution of interarrival times, α is the mean number of arrivals per unit
time. Thus there is a relationship between the (discrete) Poisson distribution and the (continuous)
exponential distribution, i.e. N = n if and only if
A1 + A2 + … + An ≤ 1 < A1 + … + An + An+1 (5.7)
Where
A1, A2, … → interarrival times of customers, exponentially distributed with rate α.
N = n says that there are exactly n arrivals during one unit of time, while A1 + A2 + … + An ≤ 1 <
A1 + … + An + An+1 says that the nth arrival occurred before time 1, whereas the (n + 1)st arrival
occurred after time 1. Therefore these two statements are equivalent. Simplifying equation (5.7) by
using equation (5.2), which generates each Ai as –(1/α) ln Ri, we get
Σ(i = 1 to n) –(1/α) ln Ri ≤ 1 < Σ(i = 1 to n + 1) –(1/α) ln Ri
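Now multiply throughout by –α, which reverses the inequality signs, and use the fact that a sum
of logarithms is the logarithm of a product:
ln(R1 R2 … Rn) ≥ –α > ln(R1 R2 … Rn+1)
R1 R2 … Rn ≥ e^(–α) > R1 R2 … Rn+1
The procedure for generating a Poisson variate N is therefore
Step1 – Set n = 0, P = 1
Step2 – Generate a random number Rn+1 and replace P by P · Rn+1
Step3 – If P < e^(–α), accept N = n. Otherwise reject the current n, increase n by one and
return to Step2.
If N = n, then n + 1 random numbers are required, so the average number required is given
by E (N + 1) = α + 1.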
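The multiply-and-compare procedure can be sketched in Python as follows. The sketch assumes the rule used in Example 5.7: start with n = 0 and P = 1, multiply P by the next random number, accept N = n as soon as P < e^(–α), and otherwise increase n by one. The function name and the idea of passing the random numbers in as a sequence are illustrative choices, not part of the original notes.

```python
import math


def poisson_variate(alpha, random_numbers):
    """Return one Poisson(alpha) variate, consuming numbers from the iterable."""
    threshold = math.exp(-alpha)
    n, p = 0, 1.0                 # Step1
    for r in random_numbers:
        p *= r                    # Step2: P = P * R(n+1)
        if p < threshold:         # Step3: accept N = n
            return n
        n += 1                    # reject n and try n + 1
    raise ValueError("not enough random numbers supplied")


# First random number of Example 5.7: 0.4357 < e^(-0.2), so N = 0
print(poisson_variate(0.2, [0.4357]))
```

Supplying the random numbers explicitly makes it possible to reproduce a hand calculation step by step.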
Example 5.7
Generate three Poisson variates with mean α = 0.2 (random numbers are to be taken from
Table A.2)
Solution
Step1 – Set n = 0, P = 1
Step2 – R1 = 0.4357, P =1*0.4357 = 0.4357
Step3 – Since P < e−α i.e. 0.4357 < 0.8187, Accept N = 0.
Step1 – Set n = 0, P = 1
Step2 – R1 = 0.4146, P =1*0.4146 = 0.4146
Step3 – Since P < e−α i.e. 0.4146 < 0.8187, Accept N = 0.
Step1 – Set n = 0, P = 1
Step2 – R1 =0.8353, P = 0.8353
Step3 – Since P ≥ e−α, reject n = 0, return to step2 with n = 1.
Input data provide the driving force for a simulation model. In a queuing system, the
distributions of the time between arrivals and of the service times are the input data. The
distributions of demand and lead time are the input data for an inventory system.
“GIGO”, or “garbage in, garbage out”, is a basic concept in computing. If input data are not
accurately collected and analyzed, the simulation output will be misleading and may lead
to damaging and costly decisions.
The following suggestions enhance and facilitate data collection.
1. A useful expenditure of time is in planning.
Planning should start before the process is observed. Devise forms for
recording the data; these forms are likely to be modified several times before
actual data collection begins.
Check for unusual circumstances and consider how they will be handled.
If data have already been collected, plenty of time is required for converting
them into a usable format.
2. Try to analyze the data as they are being collected, i.e. check whether the data
collected are adequate for the simulation and identify any data that are useless to it.
3. Try to combine homogeneous data sets; an initial test is to see whether the means of
the distributions are the same.
4. Be aware of the possibility of data censoring, in which a quantity of interest is not
observed in its entirety, for example because a process starts before or finishes after
the observation period. Censoring can cause especially long process times to be left
out of the sample.
5. Build a scatter diagram to determine whether there is a relationship between two
variables. Scatter diagram – a plot of points with one variable on the x-axis and the
other on the y-axis.
6. Observe the sequence of inputs for autocorrelation. If the service time of the ith
customer affects that of the (i + n)th customer, then there is autocorrelation.
7. Keep in mind the difference between input data and output or performance
data. Make sure to collect input data, which represent the uncertain quantities that
are largely beyond the control of the system and will not be altered by changes made
to improve the system.
When the data are available, this step begins by developing a frequency distribution, or
histogram, of the data. Based on the frequency distribution and structural knowledge of the
process, a family of distributions is chosen.
6.2.1 Histogram
The number of class intervals depends on the number of observations and on the dispersion
in the data. If the intervals are too wide, the histogram will be coarse or blocky. If the
intervals are too narrow, the histogram will be ragged. Fig 6.1 shows the different forms of histogram.
A histogram for discrete data corresponds to the pmf of the distribution. If there are a large
number of data points, it should have a cell for each value in the range of the data. If there
are few data points, combine adjacent cells to eliminate the ragged appearance of the histogram.
Arrivals per period   Frequency   Arrivals per period   Frequency
4                     10          10                    3
5                     8           11                    1
Table 6.1 Number of arrivals in a 5 minute period
The number of vehicles is a discrete variable; since there are ample data, the histogram can
have a cell for each possible value in the range of the data. The resulting histogram is shown
in fig 6.2
Lifetime is usually a continuous variable. Since the data range widely, from 0.002 day to
144.695 days, intervals of width three are used, with the results shown in table 6.2. The
histogram is shown in fig 6.3.
0 ≤ xj <3 23
3 ≤ xj <6 10
6 ≤ xj <9 5
9 ≤ xj <12 1
12 ≤ xj <15 1
15 ≤ xj <18 2
18 ≤ xj <21 0
21 ≤ xj <24 1
24 ≤ xj <27 1
27 ≤ xj <30 0
30 ≤ xj <33 1
33 ≤ xj <36 1
42 ≤ xj <45 1
57 ≤ xj <60 1
78 ≤ xj <81 1
144 ≤ xj <147 1
Table 6.2 Electronics chip data
Negative binomial - Models the number of trials required to achieve ‘k’ successes.
Example: Number of chips that must be inspected to find 4 defective chips.
Poisson - Models the number of independent events that occur in a fixed amount of time.
Example: Number of customers arriving at a restaurant during 1 hour.
Exponential - Models the time between independent events, or a process time that is
memoryless.
Example: Times between arrivals from a large population of customers.
Gamma - Models nonnegative random variables; the gamma can be shifted away from 0
by adding a constant.
Discrete or continuous uniform - Models complete uncertainty, since all outcomes are
equally likely. This distribution is often used when there are no data.
Empirical - Used when no theoretical distribution is appropriate; it resamples from the
actual data collected.
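A quantile-quantile (q-q) plot is a useful tool for evaluating distribution fit, whereas a
histogram is less suitable for evaluating the fit of the chosen distribution.
If X is a random variable with cdf F, then the q-quantile of X is that value γ such that
F(γ) = q, for 0 < q < 1. When F has an inverse,
γ = F^-1(q)
Let {xi, i = 1, 2, …, n} be a sample of data from X, and let {yj, j = 1, 2, …, n} be the same
observations ordered from smallest to largest.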
The q-q plot is based on the fact that yj is an estimate of the (j – 1/2)/n quantile of X, i.e.
yj is approximately F^-1[(j – 1/2)/n].
j = 1, 2, …, 20
Sample mean = 99.99 seconds
Sample variance = (0.2832)² seconds²
j   yj (seconds)   (j – 1/2)/20
1 99.55 0.03
2 99.56 0.08
3 99.62 0.13
4 99.65 0.18
5 99.79 0.23
6 99.82 0.28
7 99.83 0.33
8 99.85 0.38
9 99.90 0.43
10 99.96 0.48
11 99.98 0.53
12 100.02 0.58
13 100.06 0.63
14 100.17 0.68
15 100.23 0.73
16 100.26 0.78
17 100.27 0.83
18 100.33 0.88
19 100.41 0.93
20 100.47 0.98
Table 6.3 Computation of values
The ordered observations yj are plotted against F^-1((j – 1/2)/20), as shown in fig 6.4
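The coordinates of such a q-q plot can be computed with a few lines of Python. The sketch below assumes, purely as an illustration, that the hypothesized distribution is normal with the sample mean 99.99 and standard deviation 0.2832 quoted above; the variable names are likewise assumptions. It pairs each ordered observation yj of table 6.3 with F^-1((j – 1/2)/20).

```python
from statistics import NormalDist

# Ordered observations y_j from table 6.3 (seconds)
observations = [99.55, 99.56, 99.62, 99.65, 99.79, 99.82, 99.83, 99.85,
                99.90, 99.96, 99.98, 100.02, 100.06, 100.17, 100.23,
                100.26, 100.27, 100.33, 100.41, 100.47]

n = len(observations)
fitted = NormalDist(mu=99.99, sigma=0.2832)  # assumed hypothesized distribution

# Each pair is (y_j, F^-1((j - 1/2) / n)); a good fit lies close to a straight line
pairs = [(y, fitted.inv_cdf((j - 0.5) / n))
         for j, y in enumerate(observations, start=1)]

for y, q in pairs:
    print(f"{y:7.2f}  {q:8.3f}")
```

If the plotted pairs fall near a line with slope 1 through the origin of the two scales, the chosen family and its parameters are plausible.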
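The sample mean and sample variance are used to estimate the parameters of a
hypothesized distribution.
1. If the data are discrete or continuous raw data X1, X2, …, Xn, then the sample mean and
sample variance are defined by
X̄ = (X1 + X2 + … + Xn) / n
S² = (Σ Xi² – n X̄²) / (n – 1)
2. If the data are discrete and grouped in a frequency distribution, then the mean and variance
are given by
X̄ = (Σ fj Xj) / n
S² = (Σ fj Xj² – n X̄²) / (n – 1)
where Xj is the jth distinct value and fj is its observed frequency.
3. If the data are discrete or continuous and have been placed in class intervals, then the mean
and variance are approximated by
X̄ ≈ (Σ fj mj) / n
S² ≈ (Σ fj mj² – n X̄²) / (n – 1)
Where
fj → Observed frequency in the jth class interval
mj → Midpoint of the jth interval
c → Number of class intervals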
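For data in class intervals, the mean and variance are usually approximated from the interval midpoints mj and frequencies fj. The Python sketch below assumes the standard grouped-data formulas X̄ ≈ Σ fj mj / n and S² ≈ (Σ fj mj² – n X̄²)/(n – 1), and applies them to the intervals listed in table 6.2; intervals not listed there are assumed to have frequency zero, and the variable names are illustrative.

```python
# Lower endpoints of the width-3 class intervals of table 6.2 and their frequencies
lower_endpoints = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 42, 57, 78, 144]
frequencies = [23, 10, 5, 1, 1, 2, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
width = 3

midpoints = [a + width / 2 for a in lower_endpoints]
n = sum(frequencies)

# Grouped-data approximations of the sample mean and sample variance
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
variance = (sum(f * m * m for f, m in zip(frequencies, midpoints))
            - n * mean ** 2) / (n - 1)

print(n, mean, variance)
```

Because every observation is replaced by its interval midpoint, the results differ slightly from those computed on the raw data.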
Numerical estimates of the distribution parameters are required to reduce the family of
distributions to a specific distribution and to test the resulting hypothesis. Table 6.4
contains suggested estimators for distributions often used in simulation.
Distribution   Parameter(s)
Exponential    λ
Gamma          β, θ
Normal         μ, σ²
Lognormal      μ, σ²
Solution
Natural log of the given data is
2.9, 3.3, 3.0, 1.8, 3.6, 1.6, 3.1, 0, 1.1 and 2.1
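The lognormal parameters are commonly estimated by the sample mean and sample variance of the logged data. A short Python check of that calculation on the ten values above follows; the use of the (n – 1)-divisor sample variance and the variable names are assumptions of this sketch.

```python
from statistics import mean, variance

# Natural logs of the data, as listed above
log_values = [2.9, 3.3, 3.0, 1.8, 3.6, 1.6, 3.1, 0, 1.1, 2.1]

mu_hat = mean(log_values)          # estimate of mu
sigma2_hat = variance(log_values)  # sample variance (divisor n - 1), estimate of sigma^2

print(round(mu_hat, 2), round(sigma2_hat, 2))
```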
If a very large amount of data is available, then a goodness-of-fit test is likely to reject all
candidate distributions.
6.4.1 Chi-square Test
This test is valid for large sample sizes and is used for both discrete and continuous
distributional assumptions. It formalizes the intuitive idea of comparing the histogram
of the data to the shape of the candidate density or mass function.
The test procedure begins by arranging the n observations into a set of k class intervals or
cells. The test statistic is given by
χ0² = Σ(i = 1 to k) (Oi – Ei)² / Ei
Ei = npi
Where
Oi → observed frequency in the ith class interval
Ei → expected frequency in the ith class interval
pi → theoretical, hypothesized probability associated with ith class interval.
χ0² approximately follows the chi-square distribution with k – s – 1 degrees of freedom,
Where,
s → Number of parameters of hypothesized distribution estimated by sample statistics.
The critical value χ²α, k–s–1 is found in table A.6. The null hypothesis H0 is accepted (not
rejected) if χ0² ≤ χ²α, k–s–1.
If an Ei value is too small, it can be combined with expected frequencies in adjacent class
intervals. The corresponding observed frequency Oi values should be also combined and k
should be reduced by one for each cell combined.
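The chi-square statistic compares observed and expected cell frequencies. The following Python sketch assumes the usual form of the statistic, the sum over cells of (Oi – Ei)²/Ei with Ei = npi; the function name and the three-cell data are hypothetical and serve only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all class intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: n = 60 observations, three equally likely cells
observed = [10, 20, 30]
probabilities = [1 / 3, 1 / 3, 1 / 3]
expected = [sum(observed) * p for p in probabilities]   # E_i = n * p_i

print(chi_square_statistic(observed, expected))
```

The result is then compared with the tabulated critical value for k – s – 1 degrees of freedom.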
If the distribution being tested is discrete, each value of random variable should be a class
interval, unless it is required to combine adjacent class intervals to meet minimum expected
cell-frequency requirements.
If the distribution being tested is continuous, the class intervals are given by [ai – 1, ai),
Where ai – 1 and ai are endpoints of ith class interval
Note
1. For discrete - Number of class intervals is determined by number of cells resulting
after combining adjacent cells as necessary.
2. For continuous – Number of class intervals must be specified.
The table 6.5 helps in determining the number of class intervals for continuous data.
Example 6.5
The vehicle arrival data is tabulated below
Solution
Hypotheses
H0 : Random variable is Poisson distributed
H1 : Random variable is not Poisson distributed
The probabilities associated with various values of x are obtained by using above equation
p(0) = 0.026 p(3) =0.211 p(6) =0.085 p(9) =0.008
p(1) =0.096 p(4) =0.192 p(7) =0.044 p(10)=0.003
p(2) =0.174 p(5) =0.140 p(8) =0.020 p(11)=0.001
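If a continuous distributional assumption is being tested, class intervals that are equal in
probability, rather than equal in width, should be used. With k equal-probability intervals,
pi = 1/k, and the requirement on the expected frequencies becomes
Ei = npi = n/k ≥ 5
Solving for k,
k ≤ n/5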
Example 6.6
Life tests were performed on a random sample of electronic chips at 1.5 times the nominal
voltage, and their lifetimes (or times to failure) in days were recorded:
Solution
Since the histogram appears to follow exponential distribution. The parameter is given a
Hypotheses
H0 : Random variable is exponentially distributed
H1 : Random variable is not exponentially distributed
The intervals must be of equal probability, so the end points of the class intervals must be
determined. Number of intervals should be less than or equal to n / 5.
Here n = 50, so k ≤ 50 / 5 → k ≤ 10
Let k = 8, then each interval will have probability as p= 1/k = 1/8 = 0.125
a0 = 0
a1 = 1.590
a2 = 3.425
a3 = 5.595
a4 = 8.252
a5 = 11.677
a6 = 16.503
a7 = 24.755
a8 = ∞
The first interval is [0, 1.590), that is 0 ≤ x < 1.590, the second interval is [1.590, 3.425),
and so on. The values are computed and tabulated in table 6.7
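The endpoints of equal-probability intervals for an exponential distribution follow from solving F(ai) = 1 – e^(–λ ai) = i/k for ai. The Python sketch below does this for k = 8. The rate λ = 0.084 is an assumption: it is not stated in the surviving text and is inferred here from a4 = 8.252 = ln(2)/λ. The function name is also illustrative.

```python
import math


def equal_probability_endpoints(lam, k):
    """Endpoints a_0, ..., a_k of k equal-probability cells of an exponential cdf."""
    # 1 - exp(-lam * a_i) = i / k  gives  a_i = ln(k / (k - i)) / lam
    endpoints = [math.log(k / (k - i)) / lam for i in range(k)]
    endpoints.append(math.inf)     # the last cell is unbounded on the right
    return endpoints


# lam = 0.084 is inferred from a4 = 8.252, not quoted from the source
for i, a in enumerate(equal_probability_endpoints(0.084, 8)):
    print(f"a{i} = {a:.3f}")
```

With these endpoints every interval has hypothesized probability 1/8, so each expected frequency is Ei = 50/8 = 6.25.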
Changing the number of classes and interval width affects the value of calculated
and tabulated chi-square.
A hypothesis may be accepted when the data are grouped in one way but rejected
if it is done in another way.
It requires the data to be placed in class intervals. In the continuous case, this grouping
is arbitrary.
The critical values in table A.8 are biased: they are too conservative. Conservative means
that the critical values will be too large, resulting in smaller Type I (α) errors than those
specified.
Example 6.7
The interarrival times (minutes) are collected over 100-minute interval and are arranged in
order of occurrence.
0.44 0.53 2.04 2.74 2.00 0.30 2.54 0.52 2.02 1.89 1.53 0.21
2.80 0.04 1.35 8.32 2.34 1.95 0.10 1.42 0.46 0.07 1.09 0.76
5.55 3.93 1.07 2.26 2.88 0.67 1.12 0.26 4.57 5.37 0.12 3.19
1.63 1.46 1.08 2.06 0.85 0.83 2.44 2.11 3.15 2.90 6.58 0.64
Solution
Hypotheses
H0 : The inter arrival times are exponentially distributed .
H1 : The inter arrival times are not exponentially distributed.
The data were collected over the interval 0 to 100 minutes, so T = 100 minutes. If the
underlying distribution of interarrival times {T1, T2, T3, …} is exponential, the arrival times
are uniformly distributed on the interval (0, T).
The arrival times T1, T1 + T2, T1 + T2 + T3, … are obtained by adding interarrival times, and
the arrival times are then normalized to (0, 1) so that the Kolmogorov-Smirnov test can be applied.
On the interval (0, 1), the points will be [T1 / T, (T1 + T2) / T, …]. The resulting 50 points are
0.0044 0.0097 0.0301 0.0575 0.0775 0.0805 0.1059 0.1111 0.1313 0.1502
0.1655 0.1676 0.1956 0.1960 0.2095 0.2927 0.3161 0.3356 0.3366 0.3508
0.3553 0.3561 0.3670 0.3746 0.4300 0.4694 0.4796 0.5027 0.5315 0.5382
0.5494 0.5520 0.5977 0.6514 0.6526 0.6845 0.7008 0.7154 0.7262 0.7468
0.7553 0.7636 0.7880 0.7982 0.8206 0.8417 0.8732 0.9022 0.9680 0.9744
D+ = 0.1054
D- = 0.0080
D = max {D+, D-} = max {0.1054, 0.0080} = 0.1054
Critical value → D0.05 = 1.36/√n = 1.36/√50 = 0.1923 (from table A.8)
D < Dα. Therefore the hypothesis that the interarrival times are exponentially distributed is not rejected.
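The values of D+ and D- can be reproduced with a short Python sketch. It uses the 50 normalized arrival times of this example, read with four decimal places, and the usual definitions D+ = max(i/n – xi) and D- = max(xi – (i – 1)/n) for a uniform (0, 1) hypothesis; the function name is an illustrative assumption.

```python
import math

points = [
    0.0044, 0.0097, 0.0301, 0.0575, 0.0775, 0.0805, 0.1059, 0.1111, 0.1313, 0.1502,
    0.1655, 0.1676, 0.1956, 0.1960, 0.2095, 0.2927, 0.3161, 0.3356, 0.3366, 0.3508,
    0.3553, 0.3561, 0.3670, 0.3746, 0.4300, 0.4694, 0.4796, 0.5027, 0.5315, 0.5382,
    0.5494, 0.5520, 0.5977, 0.6514, 0.6526, 0.6845, 0.7008, 0.7154, 0.7262, 0.7468,
    0.7553, 0.7636, 0.7880, 0.7982, 0.8206, 0.8417, 0.8732, 0.9022, 0.9680, 0.9744,
]


def ks_uniform(sample):
    """Return (D+, D-) for a test of the sample against the uniform (0, 1) cdf."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max(i / n - x for i, x in enumerate(xs, start=1))
    d_minus = max(x - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_plus, d_minus


d_plus, d_minus = ks_uniform(points)
d = max(d_plus, d_minus)
critical = 1.36 / math.sqrt(len(points))   # large-sample value for alpha = 0.05

print(round(d_plus, 4), round(d_minus, 4), round(critical, 4))
```

Because the test statistic D is smaller than the critical value, the uniformity of the arrival times, and hence the exponential hypothesis for the interarrival times, is not rejected.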
Note – A test similar to the Kolmogorov-Smirnov test is the Anderson-Darling test. It is also
based on the difference between the empirical cdf and the fitted cdf.
The p-value is the significance level at which one would just reject H0 for given value of
test statistic. Therefore a large p-value tends to indicate a good fit, while small p-value
suggests a poor fit.
The p-value can be viewed as a measure of fit. This suggests we could fit every distribution
at our disposal, compute a test statistic for each fit then choose the distribution that yields
largest p-value.
There are many ways to obtain information if data are not available. A few are mentioned
below.
1. Engineering data
The values provided by manufacturers provide a starting point for input modeling
by fixing a central value.
2. Expert opinion
Talk to experts who have experience with the process or with similar processes.
They can provide optimistic, pessimistic and most likely values.
3. Physical or conventional limitations
Many real processes have physical limits on performance (e.g. computer data entry
cannot be faster than a person can type). Do not ignore obvious limits or bounds that
narrow the range of the input process.
4. The nature of the process
The choice of distribution should be made after gaining a clear understanding of the
candidate distributions and of the process being modeled.
When no data are available, the uniform, triangular and beta distributions are often used as
input models. A useful refinement is obtained when the minimum, the maximum and one or more
breakpoints can be given. A breakpoint is an intermediate value together with the probability
of being less than or equal to that value.
Example 6.8
For a product planning simulation the sales volume of various products is required. The
salesperson responsible for product XYZ says that no fewer than 1000 units will be sold
because of existing contracts, and no more than 5000 units will be sold because that is the
entire market for the product. Based on experience she believes that there is
90% chance of selling more than 2000 units
25% chance of selling more than 3500 units
Only 1% chance of selling more than 4500 units
Solution
Minimum – 1000 units
Maximum – 5000 units
90% chance of selling more than 2000 units, so
10% = 0.10 chance of selling between 1000 and 2000 units
1% = 0.01 chance of selling more than 4500 units
25% – 1% = 24% = 0.24 chance of selling between 3500 and 4500 units (because the 25%
chance of selling more than 3500 units includes the 1% chance of selling more than 4500 units).
Remaining 65% = 0.65 chance of selling between 2000 and 3500 units
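One way to turn these breakpoints into an input model is a piecewise-linear cdf sampled by the inverse-transform technique. The Python sketch below assumes that sales are uniformly spread within each interval, which the example does not state, and the names used are illustrative. The cumulative probabilities are F(1000) = 0, F(2000) = 0.10, F(3500) = 0.75, F(4500) = 0.99 and F(5000) = 1.

```python
import bisect
import random

BREAKPOINTS = [1000, 2000, 3500, 4500, 5000]   # sales volume (units)
CUMULATIVE = [0.00, 0.10, 0.75, 0.99, 1.00]    # P(sales <= breakpoint)


def sales_volume(r):
    """Map a random number r in [0, 1] to a sales volume by linear interpolation."""
    j = bisect.bisect_left(CUMULATIVE, r)
    if j == 0:
        return float(BREAKPOINTS[0])
    lo_p, hi_p = CUMULATIVE[j - 1], CUMULATIVE[j]
    lo_x, hi_x = BREAKPOINTS[j - 1], BREAKPOINTS[j]
    return lo_x + (r - lo_p) / (hi_p - lo_p) * (hi_x - lo_x)


print([round(sales_volume(random.random())) for _ in range(5)])
```

Other interpolation choices inside each interval are possible; the linear one simply adds no information beyond the stated breakpoints.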
The input variables of a simulation model may be related, and if they are, the relationship
should be determined and taken into account. When inputs exhibit dependence, multivariate
input models are used. Example: The two random variables lead time and annual demand in an
inventory system.
Time series models are useful for representing a sequence of dependent inputs.
Example: Successive times between orders in a system.
Let
X1, X2 → two random variables
μi = E(Xi) → mean of Xi
σi² = var(Xi) → variance of Xi
Covariance and correlation are measures of the linear dependence between X1 and X2. In
other words, they indicate how well the relationship between X1 and X2 is described by the
model
(X1 – μ1) = β(X2 – μ2) + ε
The covariance is defined as
cov(X1, X2) = E[(X1 – μ1)(X2 – μ2)] = E(X1X2) – μ1μ2
and can take any value between –∞ and ∞. The correlation standardizes the
covariance to be between –1 and 1:
ρ = corr(X1, X2) = cov(X1, X2) / (σ1σ2)
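A sequence of random variables X1, X2, … that are identically distributed (same mean and
variance) but may be dependent is called a time series. The lag-h autocovariance of the
series is cov(Xt, Xt+h).
If the value of the autocovariance depends only on h and not on t, then the time series is
called covariance stationary.
If X1 and X2 are normally distributed, then the dependence between them can be modeled by the
bivariate normal distribution with parameters μ1, μ2, σ1², σ2² and ρ = corr(X1, X2)
To estimate ρ,
Let (X11, X21), (X12, X22), …, (X1n, X2n) → n independent and identically distributed pairs
Then the estimated covariance and correlation are
estimated cov(X1, X2) = [Σ(j = 1 to n) X1j X2j – n X̄1 X̄2] / (n – 1)
ρ̂ = estimated cov(X1, X2) / (σ̂1 σ̂2)
where X̄i and σ̂i² are the sample mean and sample variance of the observations on Xi.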
Example 6.9
The following data are available on demand and lead time for last 10 years. Determine the
correlation.
Lead time   Demand
6.5         103
4.3         83
6.9         116
6.0         97
6.9         112
6.9         104
5.8         106
7.3         109
4.5         92
6.3         96
Solution
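The correlation for the ten (lead time, demand) pairs of this example can be checked numerically. The Python sketch below assumes the usual sample estimators, with divisor n – 1 for both the covariance and the variances; the function and variable names are illustrative.

```python
import math

lead_time = [6.5, 4.3, 6.9, 6.0, 6.9, 6.9, 5.8, 7.3, 4.5, 6.3]
demand = [103, 83, 116, 97, 112, 104, 106, 109, 92, 96]


def sample_correlation(x1, x2):
    """Estimated correlation: sample covariance over the product of sample std devs."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    cov = (sum(a * b for a, b in zip(x1, x2)) - n * mean1 * mean2) / (n - 1)
    var1 = (sum(a * a for a in x1) - n * mean1 ** 2) / (n - 1)
    var2 = (sum(b * b for b in x2) - n * mean2 ** 2) / (n - 1)
    return cov / math.sqrt(var1 * var2)


rho = sample_correlation(lead_time, demand)
print(round(rho, 2))
```

A value close to 1 indicates that lead time and demand are strongly positively dependent and should not be modeled as independent inputs.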
If X1, X2, X3, … is a sequence of identically distributed but dependent and
covariance-stationary random variables, then there are a number of time series models that can
be used to represent the process.
AR(1) model
Consider the time-series model
Xt = µ + Ф(Xt-1 – µ) + εt , t = 2, 3, …
where
ε2, ε3, ε4, … → independent and identically normally distributed with mean zero and
variance σε², and –1 < Ф < 1
If X1 is appropriately chosen, then X1, X2, … are all normally distributed with mean µ, variance
σε² / (1 – Ф²) and ρh = Ф^h, h = 1, 2, … This time-series model is called the Autoregressive
order-1 model, or AR(1).
Then
Step1: Generate X1 from the normal distribution with mean µ and variance σε² / (1 – Ф²).
Set t = 2
Step2: Generate εt from the normal distribution with mean 0 and variance σε²
Step3: Set Xt = µ + Ф(Xt-1 – µ) + εt
Step4: Set t = t + 1 and go to Step2
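The generation steps for a stationary AR(1) series can be sketched in Python as follows. The sketch assumes that the steps are repeated for t = 2, 3, …, n; the function name, its arguments and the parameter values in the example call are illustrative assumptions.

```python
import math
import random


def ar1_series(n, mu, phi, sigma_eps, rng=random):
    """Generate X1, ..., Xn from a stationary AR(1) model with -1 < phi < 1."""
    # Step1: X1 ~ Normal(mu, sigma_eps^2 / (1 - phi^2)), the stationary distribution
    x = rng.gauss(mu, sigma_eps / math.sqrt(1.0 - phi ** 2))
    series = [x]
    for _ in range(2, n + 1):
        eps = rng.gauss(0.0, sigma_eps)     # Step2: eps_t ~ Normal(0, sigma_eps^2)
        x = mu + phi * (x - mu) + eps       # Step3: X_t = mu + phi (X_{t-1} - mu) + eps_t
        series.append(x)
    return series


print(ar1_series(5, mu=20.0, phi=0.7, sigma_eps=1.0))
```

Note that random.gauss takes a standard deviation, not a variance, hence the square root in Step1.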
EAR(1) model
Consider the time-series model
Xt = ФXt-1 with probability Ф
Xt = ФXt-1 + εt with probability 1 – Ф
for t = 2, 3, …
Where ε2, ε3, … → independent and identically exponentially distributed with mean 1/λ and
0 ≤ Ф < 1
If the initial value X1 is chosen appropriately, then X1, X2, … are all exponentially distributed
with mean 1/λ and ρh = Ф^h for h = 1, 2, … This time-series model is called the Exponential
Autoregressive order-1 model, or EAR(1). Only autocorrelations greater than 0 can be
represented by this model. Estimation of the parameters proceeds as for AR(1), by setting
Ф̂ = ρ̂, the estimated lag-1 autocorrelation, and λ̂ = 1/X̄.
Algorithm to generate a stationary EAR(1) time series (given values of the parameters Ф and
λ)
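Step1: Generate X1 from the exponential distribution with mean 1/λ. Set t = 2
Step2: Generate U from the uniform distribution on [0, 1]. If U ≤ Ф, then set Xt = ФXt-1.
Otherwise generate εt from the exponential distribution with mean 1/λ and set
Xt = ФXt-1 + εt
Step3: Set t = t + 1 and go to Step2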
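The EAR(1) generation step can be sketched in Python as below. Only the update step is given in the text, so the initialization (X1 drawn from the exponential distribution with mean 1/λ) and the loop over t are assumptions of this sketch, as are the function name and the parameter values in the example call.

```python
import random


def ear1_series(n, phi, lam, rng=random):
    """Generate X1, ..., Xn from an EAR(1) model with 0 <= phi < 1 and rate lam."""
    x = rng.expovariate(lam)        # assumed start: X1 ~ exponential with mean 1/lam
    series = [x]
    for _ in range(2, n + 1):
        u = rng.random()            # U ~ uniform on [0, 1)
        if u <= phi:
            x = phi * x             # with probability phi: X_t = phi * X_{t-1}
        else:
            x = phi * x + rng.expovariate(lam)   # otherwise add an exponential eps_t
        series.append(x)
    return series


print(ear1_series(5, phi=0.8, lam=0.2))
```

random.expovariate takes the rate λ as its argument, so the generated εt have mean 1/λ.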
Example 6.10
A stock brokerage would typically have a large sample of data. The following 20 time
gaps between customer buy and sell orders were recorded (in seconds):
1.95 1.75 1.58 1.42 1.28 1.15 1.04 0.93 0.84 0.75
0.68 0.61 11.98 10.79 9.71 14.02 12.62 11.36 10.22 9.20
Estimate lag-1 autocorrelation,
Solution
and
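The lag-1 autocorrelation of the 20 time gaps can be estimated numerically. The Python sketch below assumes one common estimator: the lag-1 sample autocovariance, [Σ Xt Xt+1 – (n – 1) X̄²]/(n – 1), divided by the sample variance. Other estimators differ slightly, and the function name is illustrative.

```python
gaps = [1.95, 1.75, 1.58, 1.42, 1.28, 1.15, 1.04, 0.93, 0.84, 0.75,
        0.68, 0.61, 11.98, 10.79, 9.71, 14.02, 12.62, 11.36, 10.22, 9.20]


def lag1_autocorrelation(x):
    """Estimate the lag-1 autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    var = (sum(v * v for v in x) - n * mean ** 2) / (n - 1)
    cov = (sum(x[t] * x[t + 1] for t in range(n - 1)) - (n - 1) * mean ** 2) / (n - 1)
    return cov / var


mean_gap = sum(gaps) / len(gaps)
rho_hat = lag1_autocorrelation(gaps)
print(round(mean_gap, 2), round(rho_hat, 2))
```

A clearly positive estimate, as here, together with nonnegative data, suggests that an EAR(1) model with Ф set to this estimate and λ set to the reciprocal of the sample mean is a candidate input model.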
Unit 7
Verification and Validation of Simulation Models
Verification and validation of the simulation model are among the most important and
difficult tasks carried out by the model developer, who must work closely with the end users
throughout the period of development in order to increase the model’s credibility.
Validation is an integral part of model development. The goal of the validation process is
twofold:
1. To produce a model that represents true system behavior closely enough to be used as a
substitute for the actual system for the purpose of experimenting.
2. To increase the credibility of the model to an acceptable level, so that the model will be
used by managers and other decision makers.
Conceptually, the verification and validation process consists of the following components.
1. Verification is concerned with building the model right. It compares the conceptual
model with the computer representation that implements it.
2. Validation is concerned with building the right model. It determines whether the
model is an accurate representation of the real system. It is achieved through
the calibration of the model, an iterative process of comparing the model to actual
system behavior. This process is repeated until model accuracy is judged to be
acceptable.
Step 1- The first step involves observing the real system in order to understand its behavior.
Persons familiar with the system or any subsystem should be questioned, to take advantage
of their special knowledge. As the development proceeds, new questions may arise and
model developers will return to this step.
Step 2- The second step is the construction of a conceptual model. It includes a
collection of assumptions on the components and structure of the system, and hypotheses
on the values of model input parameters.
Step 3- The third step is the translation of the operational model into a computer-recognizable
form, the computerized model.
Model building is not a linear process; the model builder returns to each of these steps
many times while building, verifying and validating the model. Fig 7.1 shows the
model building process.
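[Fig 7.1 Model building, verification and validation. The real system is linked to the
conceptual model by conceptual validation. The conceptual model consists of (1) assumptions
on system components, (2) structural assumptions, which define the interactions between
system components, and (3) input parameters and data assumptions. The conceptual model is
linked to the operational model (computerized representation) by model verification, and the
operational model is linked back to the real system by calibration and validation.]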
The main purpose of verification is to assure that the conceptual model is accurately
reflected in the operational model (computerized representation). The conceptual model
involves some degree of abstraction about the operations of the system.
4. Have the computerized representation print the input parameters at the end of the
simulation, to confirm that these parameter values have not been changed or
modified.
5. Make the computerized model as self-documenting as possible.
6. If the computerized representation is animated, check whether the animation
reflects the real system.
7. The interactive run controller (IRC) or debugger assists in finding and correcting
errors in the following ways.
a. The simulation can be monitored as it progresses, by advancing the
simulation to a desired time and then displaying model information.
b. Attention can be focused on a single line or multiple lines of logic that
constitute a procedure, or on a particular entity. For example, the simulation
will pause every time a specified entity becomes active.
c. Values of selected model components can be observed.
d. The simulation can be temporarily suspended or paused, to view information,
reassign values or redirect entities.
8. Graphical interfaces are recommended for representing the model graphically, which
simplifies understanding of the model.
The standard statistics (average waiting time, average queue length, etc.) are collected
automatically by simulation languages, so it takes little time to display all statistics of interest.
Two sets of statistics that can indicate model reasonableness are
1. Current contents, which refers to the number of items in each component of the system
at a given time.
2. Total count, which refers to the total number of items that have entered each component
of the system by a given time.
Careful evaluation is required to detect mistakes in model logic. To help in error
detection, it is best to adopt one or more of these verification processes.
Example 7.1
In a single server queue model, an analyst made a run over 16 units of time and observed
that the time-average length of the waiting line was L̂Q = 0.4375 customer, which is
reasonable for such a short run. Nevertheless, a more detailed verification is performed by
the analyst. The trace is shown in table 7.1.
Definition of variables:
CLOCK = Simulation clock
EVTYP = Event type ( start, arrival, departure, or stop)
NCUST = Number of customers in system at time ‘CLOCK’
STATUS = Status of server (1- busy, 0- idle)
The reader can verify that L̂Q is computed correctly from the data
The computed value is consistent with the recorded values of STATUS, but it is nevertheless
wrong, because the attribute STATUS did not hold the correct value.
Of the three techniques, it is recommended that the first two always be carried out. Close
examination of model output for reasonableness is especially valuable and informative.
Calibration is the iterative process of comparing the model to the real system, making
adjustments to the model, comparing the revised model to the reality, making additional
adjustments and comparing again. The fig 7.2 shows the relationship of model calibration
to the overall validation process.
Comparing the model to reality is performed by either a subjective test or an objective test.
1. A subjective test involves people who are knowledgeable about one or more aspects
of the system making judgments about the model and its output.
2. An objective test requires data on the system’s behavior and the corresponding data
produced by the model.
The iterative process of comparing model and real system, revising conceptual and
operational model, is carried out until the model is judged accurate.
A possible criticism of the calibration phase, if it stops at this point, is that the model has
only been “fit” to one data set. This can be overcome by collecting a new set of system data
and using it at the final stage of validation, i.e. after the model has been calibrated using the
original system data set, a final validation is done using the second system data set. In case
of any discrepancy, the modeler has to return to the calibration phase and modify the model
until it is acceptable.
Each revision of the model involves cost, time and effort. The modeler must weigh the increase
in model accuracy against the cost of increased validation effort. If the required level of
accuracy cannot be obtained within the budget constraints, then either the accuracy level must
be lowered or the model must be rejected.
The potential users of the model must be involved in the model construction from its
conceptualization to implementation, to assure that the reality is built into the model
through assumptions regarding system structure and reliable data.
1. They can evaluate the model output for reasonableness and help in identifying the
deficiencies. So they are involved in the calibration process, as the model is
iteratively improved.
2. The increase in the model’s perceived validity or credibility helps the manager to
trust the simulation results, a basis for decision making.
Sensitivity analysis can also be used to check a model’s face validity: the model user is
asked whether the model behaves in the expected way when one or more input variables
are changed. Based on experience with and observation of the real system, both the model user
and the builder usually have some idea of the direction in which the output should change.
For most large-scale simulation models there are many input variables and thus many possible
sensitivity tests. If it is too expensive or time consuming to vary all of them, the builder must
choose the most critical input variables for testing.
2. Data assumptions involve collection of reliable data and correct statistical analysis
of the data.
The reliability of the data is verified by consultation with bank managers, who identify typical
slack and rush times. When two or more collected data sets are combined, objective statistical
tests are performed for homogeneity of the data.
Additional tests may be required for correlation in the data. The analyst begins statistical
analysis as soon as he is assured of dealing with a random sample.
In this phase, the model is viewed as an input-output transformation, i.e. the model accepts
values of the input parameters and transforms these inputs into output measures of
performance. The modeler collects two sets of data: one data set is used when developing
and calibrating the model and the other, if required, at the final validation test.
In any case, the modeler should use the main responses of interest as the criteria for validating
a model. A necessary condition in this phase is that some version of the system under study
exists, so that data can be collected (for at least one set of input conditions) and compared
with the model predictions. If the system is in the planning stage and no system operating
data can be collected, complete input-output validation is not possible.
What about the validity of a model of a nonexistent proposed system, or of a model of an
existing system under new input conditions?
First, the responses of two models under similar input conditions will be used as
criteria for comparison of existing and proposed system.
If the changes are minor then it can be carefully verified and output from the new model is
accepted with confidence. If a similar subsystem exists elsewhere, it may be possible to
validate sub model that represents the subsystem and then integrate this sub model with
other validated sub models to build a complete model, this is a partial validation of major
changes.
To conduct a validation test using historical input data, it is important that all input data
(An, Sn, …) and all system response data, such as average delay (Z2), be collected
during the same time period. Otherwise the comparison of model responses to system responses
could be misleading, because the responses depend on the inputs as well as on the structure of
the system or model.
Implementation of this technique for a large system is difficult because of the need for
simultaneous data collection. Electronic counters and devices are sometimes used to ease
data collection. In this technique the modeler hopes that the simulation will provide a replica
of the real system, but to determine the level of accuracy both the model builder’s and the
model user’s judgment are considered.
7.3.5 Input-Output Validation: Using a Turing Test
The comparison of model output to system output can be carried out by persons who are
knowledgeable about system behavior, when no statistical test is readily applicable.
For example: Suppose five reports of system performance over five different days are
prepared and simulation output data are used to produce five fake reports. There are then 10
reports in exactly the same format, containing the information required by managers and
engineers. These 10 reports are shuffled randomly and submitted to the engineer, who is asked
to identify the fake and the real reports. If the engineer identifies the fake reports, then the
model builder questions the engineer and uses the information gained to improve the model. If
the engineer cannot distinguish between them, then the modeler will conclude that this model
is adequate. This type of validation test is called a Turing test.
Random digits
υ   t0.005   t0.01   t0.025   t0.05   t0.10
1 63.656 31.821 12.706 6.314 3.078
2 9.925 6.965 4.303 2.920 1.886
3 5.841 4.541 3.182 2.353 1.638
4 4.604 3.747 2.776 2.132 1.533
5 4.032 3.365 2.571 2.015 1.476
6 3.707 3.143 2.447 1.943 1.440
7 3.499 2.998 2.365 1.895 1.415
8 3.355 2.896 2.306 1.860 1.397
9 3.250 2.821 2.262 1.833 1.383
10 3.169 2.764 2.228 1.812 1.372
11 3.106 2.718 2.201 1.796 1.363
12 3.055 2.681 2.179 1.782 1.356
13 3.012 2.650 2.160 1.771 1.350
14 2.977 2.624 2.145 1.761 1.345
15 2.947 2.602 2.131 1.753 1.341
16 2.921 2.583 2.120 1.746 1.337
17 2.898 2.567 2.110 1.740 1.333
18 2.878 2.552 2.101 1.734 1.330
19 2.861 2.539 2.093 1.729 1.328
20 2.845 2.528 2.086 1.725 1.325
21 2.831 2.518 2.080 1.721 1.323
22 2.819 2.508 2.074 1.717 1.321
23 2.807 2.500 2.069 1.714 1.319
24 2.797 2.492 2.064 1.711 1.318
25 2.787 2.485 2.060 1.708 1.316
26 2.779 2.479 2.056 1.706 1.315
27 2.771 2.473 2.052 1.703 1.314
28 2.763 2.467 2.048 1.701 1.313
29 2.756 2.462 2.045 1.699 1.311
30 2.750 2.457 2.042 1.697 1.310
60 2.660 2.390 2.000 1.671 1.296
120 2.617 2.358 1.980 1.658 1.289
Infinity 2.576 2.326 1.960 1.645 1.282
υ   χ²0.005   χ²0.01   χ²0.025   χ²0.05   χ²0.10
1 7.87944 6.63490 5.02389 3.84146 2.70554
2 10.59663 9.21034 7.37776 5.99146 4.60517
3 12.83816 11.34487 9.34840 7.81473 6.25139
4 14.86026 13.27670 11.14329 9.48773 7.77944
5 16.74960 15.08627 12.83250 11.07050 9.23636
6 18.54758 16.81189 14.44938 12.59159 10.64464
7 20.27774 18.47531 16.01276 14.06714 12.01704
8 21.95495 20.09024 17.53455 15.50731 13.36157
9 23.58935 21.66599 19.02277 16.91898 14.68366
10 25.18818 23.20925 20.48318 18.30704 15.98718
11 26.75685 24.72497 21.92005 19.67514 17.27501
12 28.29952 26.21697 23.33666 21.02607 18.54935
13 29.81947 27.68825 24.73560 22.36203 19.81193
14 31.31935 29.14124 26.11895 23.68479 21.06414
15 32.80132 30.57791 27.48839 24.99579 22.30713
16 34.26719 31.99993 28.84535 26.29623 23.54183
17 35.71847 33.40866 30.19101 27.58711 24.76904
18 37.15645 34.80531 31.52638 28.86930 25.98942
19 38.58226 36.19087 32.85233 30.14353 27.20357
20 39.99685 37.56623 34.16961 31.41043 28.41198
21 41.40106 38.93217 35.47888 32.67057 29.61509
22 42.79565 40.28936 36.78071 33.92444 30.81328
23 44.18128 41.63840 38.07563 35.17246 32.00690
24 45.55851 42.97982 39.36408 36.41503 33.19624
25 46.92789 44.31410 40.64647 37.65248 34.38159
26 48.28988 45.64168 41.92317 38.88514 35.56317
27 49.64492 46.96294 43.19451 40.11327 36.74122
28 50.99338 48.27824 44.46079 41.33714 37.91592
29 52.33562 49.58788 45.72229 42.55697 39.08747
30 53.67196 50.89218 46.97924 43.77297 40.25602