Forecasting Assignment (Operations Management)
Department of Management
Postgraduate Program (MBA)
Sawla Campus
Operations Management
Group II
1. EYERUSALEM TIMHERTE Id no. PWBE/029/15
2. TESFANEH GORFU Id no. PWBE/033/15
3. TAMIRAT TEREFE Id no. PWBE//15
4. YAKOB PEKO Id no. PWBE//15
2. Features of forecasting
It is concerned with future events.
It is necessary for the planning process.
The impact of future events has to be considered in the planning process.
It is an estimation of future events.
It considers all the factors which affect organizational functions.
Personal observation also helps forecasting.
3. Elements of a good forecast
3. The forecast should be reliable; it should work consistently, so that users are not left with the uneasy feeling that they may get burned every time a new forecast is issued.
4. The forecast should be expressed in meaningful units. Financial planners need to know how many dollars will be needed, production planners need to know how many units will be needed, and schedulers need to know what machines and skills will be required. The choice of units depends on user needs.
5. The forecast should be in writing. Although this will not guarantee that all concerned are using the same information, it will at least increase the likelihood of it. In addition, a written forecast will permit an objective basis for evaluating the forecast once actual results are in.
6. The forecasting technique should be simple to understand and use.
7. The forecast should be cost-effective: The benefits should outweigh the costs.
4. Importance of forecasting
1. Pivotal role in an organization: Many organizations have failed because of a lack of forecasting or faulty forecasting. The reason is that planning is based on accurate forecasting.
2. Development of a business: The achievement of specified objectives depends upon proper forecasting, so the development of a business or an organization is largely based on forecasting.
3. Co-ordination: Forecasting helps to collect information about internal and external factors. The information thus collected provides a basis for co-ordination.
4. Effective control: Management executives can ascertain the strengths and weaknesses of subordinates or employees through forecasting.
5. Key to success: All business organizations face risks. Forecasting provides clues that reduce risk and uncertainty, and management executives can protect the business and achieve success by taking appropriate action.
6. Implementation of a project: Many entrepreneurs implement a project on the basis of their experience. Forecasting helps an entrepreneur to gain experience and improves the chances of success.
7. Primacy to planning: The information required for planning is supplied by forecasting, so forecasting is a precursor to planning.
Advantages
Balanced work-load
Minimization of fluctuations in production
Better use of production facilities
Better material management
Better customer service
Better utilization of capital and resources
Better design of facilities and production system.
Limitations
Forecasting is made on the basis of certain assumptions and human judgments, which may yield wrong results.
It cannot be considered a scientific method for predicting future events.
It does not specify any concrete relationship between past and future events.
It requires a high degree of skill.
It needs adequate and reliable information, which is often difficult to collect.
It involves heavy cost and is time-consuming.
It cannot be applied over a long period.
The predictability of an event or a quantity depends on how well we understand the factors that contribute to it, how much data are available, and whether the forecasts can affect the thing we are trying to forecast. For example, forecasts of electricity demand can be highly accurate because all three conditions are usually satisfied. We have a good idea of the contributing factors: electricity demand is driven largely by temperatures, with smaller effects for calendar variation such as holidays, and economic conditions. Provided there is a sufficient
history of data on electricity demand and weather conditions, and we have the skills
to develop a good model linking electricity demand and the key driver variables, the
forecasts can be remarkably accurate.
On the other hand, when forecasting currency exchange rates, only one of the
conditions is satisfied: there is plenty of available data. However, we have a limited
understanding of the factors that affect exchange rates, and forecasts of the exchange
rate have a direct effect on the rates themselves. If there are well-publicised forecasts
that the exchange rate will increase, then people will immediately adjust the price
they are willing to pay and so the forecasts are self-fulfilling. In a sense, the exchange
rates become their own forecasts. This is an example of the “efficient market
hypothesis”. Consequently, forecasting whether the exchange rate will rise or fall
tomorrow is about as predictable as forecasting whether a tossed coin will come down
as a head or a tail. In both situations, you will be correct about 50% of the time,
whatever you forecast. In situations like this, forecasters need to be aware of their
own limitations, and not claim more than is possible.
Often in forecasting, a key step is knowing when something can be forecast
accurately, and when forecasts will be no better than tossing a coin. Good forecasts
capture the genuine patterns and relationships which exist in the historical data, but
do not replicate past events that will not occur again. In this book, we will learn how
to tell the difference between a random fluctuation in the past data that should be
ignored, and a genuine pattern that should be modelled and extrapolated.
Many people wrongly assume that forecasts are not possible in a changing
environment. Every environment is changing, and a good forecasting model captures
the way in which things are changing. Forecasts rarely assume that the environment
is unchanging. What is normally assumed is that the way in which the environment is
changing will continue into the future. That is, a highly volatile environment will
continue to be highly volatile; a business with fluctuating sales will continue to have
fluctuating sales; and an economy that has gone through booms and busts will
continue to go through booms and busts. A forecasting model is intended to capture
the way things move, not just where things are. As Abraham Lincoln said, “If we could
first know where we are and whither we are tending, we could better judge what to do
and how to do it”.
Forecasting situations vary widely in their time horizons, factors determining actual
outcomes, types of data patterns, and many other aspects. Forecasting methods can
be simple, such as using the most recent observation as a forecast (which is called
the naïve method), or highly complex, such as neural nets and econometric systems
of simultaneous equations.
Forecasting is often done poorly, and is frequently confused with planning and goals. They are three different things.
Forecasting
is about predicting the future as accurately as possible, given all of the
information available, including historical data and knowledge of any future
events that might impact the forecasts.
Goals
are what you would like to have happen. Goals should be linked to forecasts
and plans, but this does not always occur. Too often, goals are set without any
plan for how to achieve them, and no forecasts for whether they are realistic.
Planning
is a response to forecasts and goals. Planning involves determining the
appropriate actions that are required to make your forecasts match your goals.
Forecasting should be an integral part of the decision-making activities of
management, as it can play an important role in many areas of a company. Modern
organisations require short-term, medium-term and long-term forecasts, depending
on the specific application.
Short-term forecasts
are needed for the scheduling of personnel, production and transportation. As
part of the scheduling process, forecasts of demand are often also required.
Medium-term forecasts
are needed to determine future resource requirements, in order to purchase
raw materials, hire personnel, or buy machinery and equipment.
Long-term forecasts
are used in strategic planning. Such decisions must take account of market
opportunities, environmental factors and internal resources.
An organisation needs to develop a forecasting system that involves several
approaches to predicting uncertain events. Such forecasting systems require the
development of expertise in identifying forecasting problems, applying a range of
forecasting methods, selecting appropriate methods for each problem, and evaluating
and refining forecasting methods over time. It is also important to have strong
organisational support for the use of formal forecasting methods if they are to be used
successfully.
Early in a forecasting project, decisions need to be made about what should be forecast. For example, if forecasts are required for items in a manufacturing environment, it is necessary to ask whether forecasts are needed for:
1. Every product line, or for groups of products?
2. Every sales outlet, or for outlets grouped by region, or only for total sales?
3. Weekly data, monthly data or annual data?
It is also necessary to consider the forecasting horizon. Will forecasts be required for
one month in advance, for 6 months, or for ten years? Different types of models will
be necessary, depending on what forecast horizon is most important.
How frequently are forecasts required? Forecasts that need to be produced frequently
are better done using an automated system than with methods that require careful
manual work.
It is worth spending time talking to the people who will use the forecasts to ensure
that you understand their needs, and how the forecasts are to be used, before
embarking on extensive work in producing the forecasts.
Once it has been determined what forecasts are required, it is then necessary to find
or collect the data on which the forecasts will be based. The data required for
forecasting may already exist. These days, a lot of data are recorded, and the
forecaster’s task is often to identify where and how the required data are stored. The
data may include sales records of a company, the historical demand for a product, or
the unemployment rate for a geographic region. A large part of a forecaster’s time can
be spent in locating and collating the available data prior to developing suitable
forecasting methods.
Anything that is observed sequentially over time is a time series. In this book, we will
only consider time series that are observed at regular intervals of time (e.g., hourly,
daily, weekly, monthly, quarterly, annually). Irregularly spaced time series can also
occur, but are beyond the scope of this book.
When forecasting time series data, the aim is to estimate how the sequence of
observations will continue into the future. Figure 1.1 shows the quarterly Australian
beer production from 1992 to the second quarter of 2010.
Figure 1.1: Australian quarterly beer production: 1992Q1–2010Q2, with two years of
forecasts.
The blue lines show forecasts for the next two years. Notice how the forecasts have
captured the seasonal pattern seen in the historical data and replicated it for the next
two years. The dark shaded region shows 80% prediction intervals. That is, each
future value is expected to lie in the dark shaded region with a probability of 80%.
The light shaded region shows 95% prediction intervals. These prediction intervals
are a useful way of displaying the uncertainty in forecasts. In this case the forecasts
are expected to be accurate, and hence the prediction intervals are quite narrow.
The simplest time series forecasting methods use only information on the variable to
be forecast, and make no attempt to discover the factors that affect its behaviour.
Therefore they will extrapolate trend and seasonal patterns, but they ignore all other
information such as marketing initiatives, competitor activity, changes in economic
conditions, and so on.
Time series models used for forecasting include decomposition models, exponential
smoothing models and ARIMA models.
There are several reasons a forecaster might select a time series model rather than an explanatory or mixed model. First, the system may not be understood, and even if it were understood it may be extremely difficult to measure the relationships that are
assumed to govern its behavior. Second, it is necessary to know or forecast the future
values of the various predictors in order to be able to forecast the variable of interest,
and this may be too difficult. Third, the main concern may be only to predict what will
happen, not to know why it happens. Finally, the time series model may give more
accurate forecasts than an explanatory or mixed model.
The model to be used in forecasting depends on the resources and data available, the
accuracy of the competing models, and the way in which the forecasting model is to be
used.
The best model to use depends on the availability of historical data, the strength of relationships between the forecast variable and any explanatory variables, and the way in which the forecasts are to be used. It is common to
compare two or three potential models. Each model is itself an artificial
construct that is based on a set of assumptions (explicit and implicit) and
usually involves one or more parameters which must be estimated using the
known historical data.
Figure 1.2: Total international visitors to Australia (1980-2015) along with ten possible
futures.
When we obtain a forecast, we are estimating the middle of the range of possible
values the random variable could take. Often, a forecast is accompanied by
a prediction interval giving a range of values the random variable could take with
relatively high probability. For example, a 95% prediction interval contains a range of
values which should include the actual future value with probability 95%.
Rather than plotting individual possible futures as shown in Figure 1.2, we usually
show these prediction intervals instead. The plot below shows 80% and 95% intervals
for the future Australian international visitors. The blue line is the average of the
possible future values, which we call the point forecasts.
Figure 1.3: Total international visitors to Australia (1980–2015) along with 10-year
forecasts and 80% and 95% prediction intervals.
We will use the subscript $t$ for time. For example, $y_t$ will denote the observation at time $t$. Suppose we denote all the information we have observed as $\mathcal{I}$ and we want to forecast $y_t$. We then write $y_t \mid \mathcal{I}$, meaning "the random variable $y_t$ given what we know in $\mathcal{I}$". The set of values that this random variable could take, along with their relative probabilities, is known as the "probability distribution" of $y_t \mid \mathcal{I}$. In forecasting, we call this the forecast distribution.
When we talk about the "forecast", we usually mean the average value of the forecast distribution, and we put a "hat" over $y$ to show this. Thus, we write the forecast of $y_t$ as $\hat{y}_t$, meaning the average of the possible values that $y_t$ could take given everything we know. Occasionally, we will use $\hat{y}_t$ to refer to the median (or middle value) of the forecast distribution instead.
It is often useful to specify exactly what information we have used in calculating the forecast. Then we will write, for example, $\hat{y}_{t|t-1}$ to mean the forecast of $y_t$ taking account of all previous observations $(y_1,\dots,y_{t-1})$. Similarly, $\hat{y}_{T+h|T}$ means the forecast of $y_{T+h}$ taking account of $y_1,\dots,y_T$ (i.e., an $h$-step forecast taking account of all observations up to time $T$).
6.1 ts objects
A time series can be thought of as a list of numbers, along with some information
about what times those numbers were recorded. This information can be stored as
a ts object in R.
Suppose you have annual observations for the last few years:
Year Observation
2012 123
2013 39
2014 78
2015 52
2016 110
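Such a series can be stored as a ts object with the ts() function; a minimal sketch using the values from the table above (the frequency argument, shown in the table below, is only needed for sub-annual data):
y <- ts(c(123, 39, 78, 52, 110), start = 2012)
# For quarterly or monthly data, add the frequency, e.g. for a quarterly
# series starting in 2012 Q1 (z is a hypothetical numeric vector):
# z <- ts(z, start = c(2012, 1), frequency = 4)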
Data        Frequency
Annual          1
Quarterly       4
Monthly        12
Weekly         52
There was a period in 1989 when no passengers were carried — this was due to an
industrial dispute.
There was a period of reduced load in 1992. This was due to a trial in which some
economy class seats were replaced by business class seats.
A large increase in passenger load occurred in the second half of 1991.
There are some large dips in load around the start of each year. These are due to
holiday effects.
There is a long-term fluctuation in the level of the series which increases during 1987,
decreases in 1989, and increases again through 1990 and 1991.
There are some periods of missing observations.
Any model will need to take all these features into account in order to effectively
forecast the passenger load into the future.
A simpler time series is shown in Figure 2.2.
autoplot(a10) +
ggtitle("Antidiabetic drug sales") +
ylab("$ million") +
xlab("Year")
Figure 2.2: Monthly sales of antidiabetic drugs in Australia.
Here, there is a clear and increasing trend. There is also a strong seasonal pattern that
increases in size as the level of the series increases. The sudden drop at the start of
each year is caused by a government subsidisation scheme that makes it cost-effective
for patients to stockpile drugs at the end of the calendar year. Any forecasts of this
series would need to capture the seasonal pattern, and the fact that the trend is
changing slowly.
Seasonal
A seasonal pattern occurs when a time series is affected by seasonal factors
such as the time of the year or the day of the week. Seasonality is always of a
fixed and known frequency. The monthly sales of antidiabetic drugs above
shows seasonality which is induced partly by the change in the cost of the drugs
at the end of the calendar year.
Cyclic
A cycle occurs when the data exhibit rises and falls that are not of a fixed
frequency. These fluctuations are usually due to economic conditions, and are
often related to the “business cycle”. The duration of these fluctuations is
usually at least 2 years.
Many people confuse cyclic behaviour with seasonal behaviour, but they are really
quite different. If the fluctuations are not of a fixed frequency then they are cyclic; if
the frequency is unchanging and associated with some aspect of the calendar, then the
pattern is seasonal. In general, the average length of cycles is longer than the length of
a seasonal pattern, and the magnitudes of cycles tend to be more variable than the
magnitudes of seasonal patterns.
Many time series include trend, cycles and seasonality. When choosing a forecasting
method, we will first need to identify the time series patterns in the data, and then
choose a method that is able to capture the patterns properly.
Figure 2.4: Seasonal plot of monthly antidiabetic drug sales in Australia.
These are exactly the same data as were shown earlier, but now the data from
each season are overlapped. A seasonal plot allows the underlying seasonal
pattern to be seen more clearly, and is especially useful in identifying years in
which the pattern changes.
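A seasonal plot of this kind can be produced with the ggseasonplot() function from the forecast package; a minimal sketch, assuming the a10 series plotted in Figure 2.2:
ggseasonplot(a10, year.labels = TRUE, year.labels.left = TRUE) +
  ylab("$ million") +
  ggtitle("Seasonal plot: antidiabetic drug sales")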
In this case, it is clear that there is a large jump in sales in January each year.
Actually, these are probably sales in late December as customers stockpile
before the end of the calendar year, but the sales are not registered with the
government until a week or two later. The graph also shows that there was an
unusually small number of sales in March 2008 (most other years show an
increase between February and March). The small number of sales in June
2008 is probably due to incomplete counting of sales at the time the data were
collected.
Figure 2.6: Seasonal subseries plot of monthly antidiabetic drug sales in Australia.
The horizontal lines indicate the means for each month. This form of plot enables the
underlying seasonal pattern to be seen clearly, and also shows the changes in
seasonality over time. It is especially useful in identifying changes within particular
seasons. In this example, the plot is not particularly revealing; but in some cases, this
is the most useful way of viewing seasonal changes over time.
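A sketch of how a seasonal subseries plot like Figure 2.6 can be produced, again with the forecast package and the a10 series:
ggsubseriesplot(a10) +
  ylab("$ million") +
  ggtitle("Seasonal subseries plot: antidiabetic drug sales")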
6.6 Scatterplots
The graphs discussed so far are useful for visualising individual time series. It is also
useful to explore relationships between time series.
Figure 2.7 shows two time series: half-hourly electricity demand (in Gigawatts) and
temperature (in degrees Celsius), for 2014 in Victoria, Australia. The temperatures
are for Melbourne, the largest city in Victoria, while the demand values are for the
entire state.
autoplot(elecdemand[,c("Demand","Temperature")], facets=TRUE) +
xlab("Year: 2014") + ylab("") +
ggtitle("Half-hourly electricity demand: Victoria, Australia")
Figure 2.7: Half hourly electricity demand and temperatures in Victoria, Australia, for
2014.
(The actual code for this plot is a little more complicated than what is shown in order
to include the months on the x-axis.)
We can study the relationship between demand and temperature by plotting one
series against the other.
qplot(Temperature, Demand, data=as.data.frame(elecdemand)) +
ylab("Demand (GW)") + xlab("Temperature (Celsius)")
#> Warning: `qplot()` was deprecated in ggplot2 3.4.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Figure 2.8: Half-hourly electricity demand plotted against temperature for 2014 in
Victoria, Australia.
This scatterplot helps us to visualise the relationship between the variables. It is clear
that high demand occurs when temperatures are high due to the effect of air-
conditioning. But there is also a heating effect, where demand increases for very low
temperatures.
Figure 2.11: Quarterly visitor nights for various regions of NSW, Australia.
To see the relationships between these five time series, we can plot each time series
against the others. These plots can be arranged in a scatterplot matrix, as shown in
Figure 2.12. (This plot requires the GGally package to be installed.)
GGally::ggpairs(as.data.frame(visnights[,1:5]))
Figure 2.12: A scatterplot matrix of the quarterly visitor nights in five regions of NSW,
Australia.
For each panel, the variable on the vertical axis is given by the variable name in that
row, and the variable on the horizontal axis is given by the variable name in that
column. There are many options available to produce different plots within each
panel. In the default version, the correlations are shown in the upper right half of the
plot, while the scatterplots are shown in the lower half. On the diagonal are shown
density plots.
The value of the scatterplot matrix is that it enables a quick view of the relationships
between all pairs of variables. In this example, the second column of plots shows
there is a strong positive relationship between visitors to the NSW north coast and
visitors to the NSW south coast, but no detectable relationship between visitors to the
NSW north coast and visitors to the NSW south inland. Outliers can also be seen.
There is one unusually high quarter for the NSW Metropolitan region, corresponding
to the 2000 Sydney Olympics. This is most easily seen in the first two plots in the left
column of Figure 2.12, where the largest value for NSW Metro is separate from the
main cloud of observations.
Figure 2.13: Lagged scatterplots for quarterly beer production.
Here the colours indicate the quarter of the variable on the vertical axis. The lines
connect points in chronological order. The relationship is strongly positive at lags 4
and 8, reflecting the strong seasonality in the data. The negative relationship seen for
lags 2 and 6 occurs because peaks (in Q4) are plotted against troughs (in Q2).
The window() function used here is very useful when extracting a portion of a time
series. In this case, we have extracted the data from ausbeer, beginning in 1992.
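As a sketch, the extraction and the lag plots of Figure 2.13 can be produced as follows (gglagplot() is from the forecast package):
beer2 <- window(ausbeer, start = 1992)  # keep only the data from 1992 onwards
gglagplot(beer2)                        # scatterplots of y_t against lagged values y_{t-k}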
6.9 Autocorrelation
Just as correlation measures the extent of a linear relationship between two variables,
autocorrelation measures the linear relationship between lagged values of a time series.
There are several autocorrelation coefficients, corresponding to each panel in the lag plot.
For example, $r_1$ measures the relationship between $y_t$ and $y_{t-1}$, $r_2$ measures the relationship between $y_t$ and $y_{t-2}$, and so on.
The value of $r_k$ can be written as
$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2},$$
where $T$ is the length of the time series.
The first nine autocorrelation coefficients for the beer production data are given in the
following table.
  r1      r2      r3      r4      r5      r6      r7      r8      r9
-0.102  -0.657  -0.060   0.869  -0.089  -0.635  -0.054   0.832  -0.108
These correspond to the nine scatterplots in Figure 2.13. The autocorrelation coefficients
are plotted to show the autocorrelation function or ACF. The plot is also known as
a correlogram.
ggAcf(beer2)
$r_4$ is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be four quarters apart and the troughs tend to be four quarters apart.
$r_2$ is more negative than for the other lags because troughs tend to be two quarters behind peaks.
The dashed blue lines indicate whether the correlations are significantly different from
zero. These are explained in Section 2.9.
6.10 White noise
Time series that show no autocorrelation are called white noise. Figure 2.17 gives an
example of a white noise series.
set.seed(30)
y <- ts(rnorm(50))
autoplot(y) + ggtitle("White noise")
For white noise series, we expect each autocorrelation to be close to zero. Of course,
they will not be exactly equal to zero as there is some random variation. For a white
noise series, we expect 95% of the spikes in the ACF to lie
within $\pm 2/\sqrt{T}$, where $T$ is the length of the time series. It is common to plot these
bounds on a graph of the ACF (the blue dashed lines above). If one or more large
spikes are outside these bounds, or if substantially more than 5% of spikes are outside
these bounds, then the series is probably not white noise.
In this example, $T = 50$ and so the bounds are at $\pm 2/\sqrt{50} = \pm 0.28$. All of the autocorrelation coefficients lie within
these limits, confirming that the data are white noise.
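Continuing the simulated white noise example above, the ACF can be plotted with ggAcf() from the forecast package, and the bound can be checked by hand:
ggAcf(y)             # correlogram; the dashed lines mark the +/- 2/sqrt(T) bounds
2 / sqrt(length(y))  # approximately 0.28 for T = 50, as quoted above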
Average method
Here, the forecasts of all future values are equal to the average (or “mean”) of the
historical data. If we let the historical data be denoted by $y_1,\dots,y_T$, then we can write the forecasts as
$$\hat{y}_{T+h|T} = \bar{y} = (y_1 + \cdots + y_T)/T.$$
The notation $\hat{y}_{T+h|T}$ is a short-hand for the estimate of $y_{T+h}$ based on the data $y_1,\dots,y_T$.
meanf(y, h)
# y contains the time series
# h is the forecast horizon
Naïve method
For naïve forecasts, we simply set all forecasts to be the value of the last observation.
That is,
$$\hat{y}_{T+h|T} = y_T.$$
This method works remarkably well for many economic and financial time series.
naive(y, h)
rwf(y, h) # Equivalent alternative
Because a naïve forecast is optimal when data follow a random walk, these are also
called random walk forecasts.
Drift method
A variation on the naïve method is to allow the forecasts to increase or decrease over
time, where the amount of change over time (called the drift) is set to be the average
change seen in the historical data. Thus the forecast for time $T+h$ is given by
$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + h\left(\frac{y_T - y_1}{T-1}\right).$$
This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
rwf(y, h, drift=TRUE)
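As a sketch of how these benchmark methods compare in practice, they can be applied to the quarterly beer data (beer2 as extracted earlier with window()) and overlaid on the series with autolayer() from the forecast package:
beer2 <- window(ausbeer, start = 1992, end = c(2007, 4))
autoplot(beer2) +
  autolayer(meanf(beer2, h = 11), series = "Mean", PI = FALSE) +
  autolayer(naive(beer2, h = 11), series = "Naive", PI = FALSE) +
  autolayer(rwf(beer2, h = 11, drift = TRUE), series = "Drift", PI = FALSE) +
  ggtitle("Forecasts for quarterly beer production") +
  xlab("Year") +
  guides(colour = guide_legend(title = "Forecast"))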
Calendar adjustments
Some of the variation seen in seasonal data may be due to simple calendar effects. In
such cases, it is usually much easier to remove the variation before fitting a
forecasting model. The monthdays() function will compute the number of days in each
month or quarter.
For example, if you are studying the monthly milk production on a farm, there will be
variation between the months simply because of the different numbers of days in each
month, in addition to the seasonal variation across the year.
dframe <- cbind(Monthly = milk,
DailyAverage = milk/monthdays(milk))
autoplot(dframe, facet=TRUE) +
xlab("Years") + ylab("Pounds") +
ggtitle("Milk production per cow")
Population adjustments
Any data that are affected by population changes can be adjusted to give per-capita data. That is, consider the data per person (or per thousand people, or per million
people) rather than the total. For example, if you are studying the number of hospital
beds in a particular region over time, the results are much easier to interpret if you
remove the effects of population changes by considering the number of beds per
thousand people. Then you can see whether there have been real increases in the
number of beds, or whether the increases are due entirely to population increases. It
is possible for the total number of beds to increase, but the number of beds per
thousand people to decrease. This occurs when the population is increasing faster
than the number of hospital beds. For most data that are affected by population
changes, it is best to use per-capita data rather than the totals.
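A minimal sketch of such a per-capita adjustment, using made-up hospital bed and population figures (both series and all their values are hypothetical, for illustration only):
beds       <- ts(c(1200, 1260, 1310, 1380, 1450), start = 2010)            # hypothetical totals
population <- ts(c(305000, 312000, 320000, 331000, 344000), start = 2010)  # hypothetical
beds_per_thousand <- beds / population * 1000
autoplot(beds_per_thousand) + ylab("Beds per 1,000 people") + xlab("Year")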
Inflation adjustments
Data which are affected by the value of money are best adjusted before modelling. For
example, the average cost of a new house will have increased over the last few decades
due to inflation. A $200,000 house this year is not the same as a $200,000 house
twenty years ago. For this reason, financial time series are usually adjusted so that all
values are stated in dollar values from a particular year. For example, the house price
data may be stated in year 2000 dollars.
To make these adjustments, a price index is used. If $z_t$ denotes the price index and $y_t$ denotes the original house price in year $t$, then $x_t = y_t / z_t \times z_{2000}$ gives the adjusted house price in year-2000 dollar values. Price indexes are often
constructed by government agencies. For consumer goods, a common price index is
the Consumer Price Index (or CPI).
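A sketch of this adjustment with hypothetical annual house price and price index series (the object names and all numbers are illustrative only):
houseprice <- ts(c(150, 172, 205, 260, 310), start = 1996)      # average price, $000s (made up)
cpi        <- ts(c(66.6, 70.0, 74.8, 78.6, 82.4), start = 1996) # price index (made up)
cpi2000  <- as.numeric(window(cpi, start = 2000, end = 2000))   # index value z_2000
adjusted <- houseprice / cpi * cpi2000                          # prices in year-2000 dollars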
Mathematical transformations
If the data show variation that increases or decreases with the level of the series, then
a transformation can be useful. For example, a logarithmic transformation is often
useful. If we denote the original observations as $y_1,\dots,y_T$ and the transformed observations as $w_1,\dots,w_T$, then $w_t = \log(y_t)$. Logarithms are useful
because they are interpretable: changes in a log value are relative (or percentage)
changes on the original scale. So if log base 10 is used, then an increase of 1 on the log
scale corresponds to a multiplication of 10 on the original scale. Another useful
feature of log transformations is that they constrain the forecasts to stay positive on
the original scale.
Sometimes other transformations are also used (although they are not so
interpretable). For example, square roots and cube roots can be used. These are
called power transformations because they can be written in the form $w_t = y_t^p$.
A useful family of transformations, that includes both logarithms and power transformations, is the family of Box-Cox transformations (Box & Cox, 1964), which depend on the parameter $\lambda$ and are defined as follows:
$$w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0; \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise.} \end{cases}$$
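In R, a suitable value of λ can be chosen automatically with BoxCox.lambda() and the transformation applied with BoxCox(), both from the forecast package. A minimal sketch, assuming elec is the monthly electricity demand series used in the fpp2 textbook:
(lambda <- BoxCox.lambda(elec))  # automatically selected transformation parameter
autoplot(BoxCox(elec, lambda))   # plot the transformed series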
Features of power transformations
Choose a simple value of λ. It makes explanations easier.
The forecasting results are relatively insensitive to the value of λ.
Often no transformation is needed.
Transformations sometimes make little difference to the forecasts but have a large effect on prediction intervals.
Bias adjustments
One issue with using mathematical transformations such as Box-Cox transformations
is that the back-transformed point forecast will not be the mean of the forecast
distribution. In fact, it will usually be the median of the forecast distribution
(assuming that the distribution on the transformed space is symmetric). For many
purposes, this is acceptable, but occasionally the mean forecast is required. For
example, you may wish to add up sales forecasts from various regions to form a
forecast for the whole country. But medians do not add up, whereas means do.
Figure 3.4: Forecasts of egg prices using a random walk with drift applied to the
logged data.
The blue line in Figure 3.4 shows the forecast medians while the red line shows the
forecast means. Notice how the skewed forecast distribution pulls up the point
forecast when we use the bias adjustment.
Bias adjustment is not done by default in the forecast package. If you want your
forecasts to be means rather than medians, use the argument biasadj=TRUE when you
select your Box-Cox transformation parameter.
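A sketch along the lines of Figure 3.4, assuming the annual eggs price series from the fpp2 package; lambda = 0 corresponds to a log transformation, and biasadj = TRUE requests bias-adjusted (mean) forecasts:
fc  <- rwf(eggs, drift = TRUE, lambda = 0, h = 50, level = 80)
fc2 <- rwf(eggs, drift = TRUE, lambda = 0, h = 50, level = 80, biasadj = TRUE)
autoplot(eggs) +
  autolayer(fc,  series = "Simple back transformation") +
  autolayer(fc2, series = "Bias adjusted", PI = FALSE) +
  guides(colour = guide_legend(title = "Forecast"))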
Fitted values
Each observation in a time series can be forecast using all of the previous observations; these one-step forecasts are called fitted values and are denoted by $\hat{y}_{t|t-1}$, or simply $\hat{y}_t$. Fitted values are often not true forecasts, however, because any parameters involved in the forecasting method are estimated using all available observations in the time series, including future observations. For example, if we use the average method, the fitted values are given by
$$\hat{y}_t = \hat{c},$$
where $\hat{c}$ is the average computed over all available observations, including those at times after $t$. Similarly, for the drift method, the drift parameter is estimated using all available observations. In this case, the fitted values are given by
$$\hat{y}_t = y_{t-1} + \hat{c},$$
where $\hat{c} = (y_T - y_1)/(T-1)$. In both cases, there is a parameter to be estimated from the data. The "hat" above the $c$ reminds us that this is an estimate. When the estimate of $c$ involves observations after time $t$, the fitted values are not true forecasts. On the other hand, naïve or seasonal naïve forecasts do not involve any parameters, and so fitted values are true forecasts in such cases.
Residuals
The “residuals” in a time series model are what is left over after fitting a model. F or
many (but not all) time series models, the residuals are equal to the difference
between the observations and the corresponding fitted values: et=yt−^yt.=−^.
Residuals are useful in checking whether a model has adequately captured the
information in the data. A good forecasting method will yield residuals with the
following properties:
1. The residuals are uncorrelated. If there are correlations between residuals, then there is
information left in the residuals which should be used in computing forecasts.
2. The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts
are biased.
Any forecasting method that does not satisfy these properties can be improved.
However, that does not mean that forecasting methods that satisfy these properties
cannot be improved. It is possible to have several different forecasting methods for
the same data set, all of which satisfy these properties. Checking these properties is
important in order to see whether a method is using all of the available information,
but it is not a good way to select a forecasting method.
If either of these properties is not satisfied, then the forecasting method can be
modified to give better forecasts. Adjusting for bias is easy: if the residuals have
mean m, then simply add m to all forecasts and the bias problem is solved.
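As a sketch, these residual properties can be checked with residuals() and checkresiduals() from the forecast package, here for naïve forecasts of the goog200 series used later in this document:
fit <- naive(goog200)
autoplot(residuals(fit)) + ggtitle("Residuals from the naive method")
checkresiduals(fit)  # time plot, ACF and histogram of residuals, plus a Ljung-Box test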
In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties.
3. The residuals have constant variance.
4. The residuals are normally distributed.
These two properties make the calculation of prediction intervals easier (see
Section 3.5 for an example). However, a forecasting method that does not satisfy
these properties cannot necessarily be improved. Sometimes applying a Box-Cox
transformation may assist with these properties, but otherwise there is usually little
that you can do to ensure that your residuals have constant variance and a normal
distribution. Instead, an alternative approach to obtaining prediction intervals is
necessary. Again, we will not address how to do this until later in the book.
7.4 Evaluating forecast accuracy
The size of the test set is typically about 20% of the total sample, although this value
depends on how long the sample is and how far ahead you want to forecast. The test
set should ideally be at least as large as the maximum forecast horizon required. The
following points should be noted.
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the
data.
Some references describe the test set as the “hold-out set” because these data are
“held out” of the data used for fitting. Other references call the training set the “in-sample data” and the test set the “out-of-sample data”. We prefer to use “training
data” and “test data” in this book.
Another useful function is subset() which allows for more types of subsetting. A great
advantage of this function is that it allows the use of indices to choose a subset.
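A sketch of how training data can be extracted in R, using the quarterly ausbeer series (window() is base R; subset() here dispatches to the time series method from the forecast package):
window(ausbeer, start = 1995)                       # all observations from 1995 Q1 onwards
subset(ausbeer, start = length(ausbeer) - 4*5 + 1)  # the last 20 quarters, by index
subset(ausbeer, quarter = 1)                        # all first-quarter observations
tail(ausbeer, 4*5)                                  # also the last 20 quarters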
Forecast errors
A forecast “error” is the difference between an observed value and its forecast. Here
“error” does not mean a mistake, it means the unpredictable part of an observation. It
can be written as
$$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T},$$
where the training data is given by $\{y_1,\dots,y_T\}$ and the test data is given by $\{y_{T+1}, y_{T+2},\dots\}$.
Note that forecast errors are different from residuals in two ways. First, residuals are
calculated on the training set while forecast errors are calculated on the test set.
Second, residuals are based on one-step forecasts while forecast errors can
involve multi-step forecasts.
We can measure forecast accuracy by summarising the forecast errors in different
ways.
Scale-dependent errors
The forecast errors are on the same scale as the data. Accuracy measures that are based only on $e_t$ are therefore scale-dependent and cannot be used to make comparisons between series that involve different units.
The two most commonly used scale-dependent measures are based on the absolute errors or squared errors:
$$\text{Mean absolute error: } \text{MAE} = \text{mean}(|e_t|),$$
$$\text{Root mean squared error: } \text{RMSE} = \sqrt{\text{mean}(e_t^2)}.$$
When comparing forecast methods applied to a single time series, or to several time series with the same units, the MAE is popular as it is easy to both understand and compute.
A forecast method that minimises the MAE will lead to forecasts of the median, while
minimising the RMSE will lead to forecasts of the mean. Consequently, the RMSE is
also widely used, despite being more difficult to interpret.
Percentage errors
The percentage error is given by $p_t = 100 e_t / y_t$. Percentage errors have the advantage of being unit-free, and so are frequently used to compare forecast performances between data sets. The most commonly used measure is:
$$\text{Mean absolute percentage error: } \text{MAPE} = \text{mean}(|p_t|).$$
Measures based on percentage errors have the disadvantage of being infinite or undefined if $y_t = 0$ for any $t$ in the period of interest, and having extreme values if any $y_t$ is close to zero. Another problem with percentage errors that is often overlooked is that they assume the unit of measurement has a meaningful zero. For example, a percentage error makes no
sense when measuring the accuracy of temperature forecasts on either the Fahrenheit
or Celsius scales, because temperature has an arbitrary zero point.
They also have the disadvantage that they put a heavier penalty on negative errors
than on positive errors. This observation led to the use of the so-called “symmetric”
MAPE (sMAPE) proposed by Armstrong (1978, p. 348), which was used in the M3
forecasting competition. It is defined by
$$\text{sMAPE} = \text{mean}\left(200\,|y_t - \hat{y}_t| / (y_t + \hat{y}_t)\right).$$
However, if $y_t$ is close to zero, $\hat{y}_t$ is also likely to be close to zero. Thus, the measure still
involves division by a number close to zero, making the calculation unstable. Also, the
value of sMAPE can be negative, so it is not really a measure of “absolute percentage
errors” at all.
Hyndman & Koehler (2006) recommend that the sMAPE not be used. It is included
here only because it is widely used, although we will not use it in this book.
Scaled errors
Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to using
percentage errors when comparing forecast accuracy across series with different
units. They proposed scaling the errors based on the training MAE from a simple
forecast method.
For a non-seasonal time series, a useful way to define a scaled error uses naïve forecasts:
$$q_j = \frac{e_j}{\frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|}.$$
Because the numerator and denominator both involve values on the scale of the original data, $q_j$ is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naïve forecast computed on the training data. Conversely, it is greater than one if the forecast is worse than the average naïve forecast computed on the training data.
For seasonal time series, a scaled error can be defined using seasonal naïve forecasts:
$$q_j = \frac{e_j}{\frac{1}{T-m}\sum_{t=m+1}^{T} |y_t - y_{t-m}|}.$$
The mean absolute scaled error is simply
$$\text{MASE} = \text{mean}(|q_j|).$$
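These measures can be computed with the accuracy() function from the forecast package; a sketch comparing benchmark forecasts of the quarterly beer data against a held-out test set:
beer2 <- window(ausbeer, start = 1992, end = c(2007, 4))  # training data
beer3 <- window(ausbeer, start = 2008)                    # test data
accuracy(meanf(beer2, h = 10), beer3)   # mean method
accuracy(rwf(beer2, h = 10), beer3)     # naive method
accuracy(snaive(beer2, h = 10), beer3)  # seasonal naive method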
The forecast accuracy is computed by averaging over the test sets. This procedure is
sometimes known as “evaluation on a rolling forecasting origin” because the “origin”
at which the forecast is based rolls forward in time.
With time series forecasting, one-step forecasts may not be as relevant as multi-step
forecasts. In this case, the cross-validation procedure based on a rolling forecasting
origin can be modified to allow multi-step errors to be used. Suppose that we are
interested in models that produce good 4-step-ahead forecasts. Then the cross-validation errors are based on 4-step-ahead forecasts rather than one-step forecasts.
As expected, the RMSE from the residuals is smaller, as the corresponding “forecasts”
are based on a model fitted to the entire data set, rather than being true forecasts.
A good way to choose the best forecasting model is to find the model with the smallest
RMSE computed using time series cross-validation.
Pipe operator
The ugliness of the above R code makes this a good opportunity to introduce some
alternative ways of stringing R functions together. In the above code, we are nesting
functions within functions within functions, so you have to read the code from the
inside out, making it difficult to understand what is being computed. Instead, we can
use the pipe operator %>% as follows.
goog200 %>% tsCV(forecastfunction=rwf, drift=TRUE, h=1) -> e
e^2 %>% mean(na.rm=TRUE) %>% sqrt()
#> [1] 6.233
goog200 %>% rwf(drift=TRUE) %>% residuals() -> res
res^2 %>% mean(na.rm=TRUE) %>% sqrt()
#> [1] 6.169
The left hand side of each pipe is passed as the first argument to the function on the
right hand side. This is consistent with the way we read from left to right in English.
When using pipes, all other arguments must be named, which also helps readability.
When using pipes, it is natural to use the right arrow assignment -> rather than the
left arrow. For example, the third line above can be read as “Take the goog200 series,
pass it to rwf() with drift=TRUE , compute the resulting residuals, and store them
as res”.
We will use the pipe operator whenever it makes the code easier to read. In order to
be consistent, we will always follow a function with parentheses to differentiate it
from other objects, even if it has no arguments. See, for example, the use
of sqrt() and residuals() in the code above.
Table 3.1: Multipliers to be used for prediction intervals.
Percentage    Multiplier
50 0.67
55 0.76
60 0.84
65 0.93
70 1.04
75 1.15
80 1.28
85 1.44
90 1.64
95 1.96
96 2.05
97 2.17
98 2.33
99 2.58
The value of prediction intervals is that they express the uncertainty in the forecasts. If
we only produce point forecasts, there is no way of telling how accurate the forecasts are.
However, if we also produce prediction intervals, then it is clear how much uncertainty is
associated with each forecast. For this reason, point forecasts can be of almost no value
without the accompanying prediction intervals.
For example, for naïve forecasts of the goog200 stock price series (used again later in this document), the last observed value is 531.48 and the standard deviation of the residuals is 6.21, so a 95% prediction interval for the next value is
$$531.48 \pm 1.96(6.21) = [519.3, 543.6].$$
Similarly, an 80% prediction interval is given by
$$531.48 \pm 1.28(6.21) = [523.5, 539.4].$$
The value of the multiplier (1.96 or 1.28) is taken from Table 3.1.
Benchmark methods
For the four benchmark methods, it is possible to mathematically derive the forecast
standard deviation under the assumption of uncorrelated residuals. If $\hat{\sigma}_h$ denotes the standard deviation of the $h$-step forecast distribution, and $\hat{\sigma}$ is the residual standard deviation, then we can use the following expressions.
Mean forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{1 + 1/T}$
Naïve forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{h}$
Seasonal naïve forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{k+1}$, where $k$ is the integer part of $(h-1)/m$ and $m$ is the seasonal period.
Drift forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{h(1 + h/T)}$.
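In R, these intervals are produced automatically by the benchmark forecasting functions; for example, naïve and drift forecasts of the goog200 series, with 80% and 95% prediction intervals by default:
naive(goog200, h = 10)
rwf(goog200, h = 10, drift = TRUE)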
ARMA Model
A mixed autoregressive moving average process of order (p, q), an ARMA(p, q) process, is a stationary process $\{Y_t\}$ which satisfies the relation
$$Y_t - \mu = \sum_{k=1}^{p} \phi_k (Y_{t-k} - \mu) + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \qquad t \in \mathbb{Z}, \tag{1}$$
where $\mu$ is the process mean, $\{\varepsilon_t\}$ is a white noise process with mean 0 and variance $\sigma^2$, $\phi_p \ne 0$ and $\theta_q \ne 0$.
Using the backshift operator $L$, Equation (1) may be written as
$$\phi(L)Y_t = c + \theta(L)\varepsilon_t, \qquad t \in \mathbb{Z}, \tag{2}$$
where the constant $c$ is given by $c = \mu\,(1 - \phi_1 - \cdots - \phi_p) = \mu\,\phi(1)$.
From now on we shall assume that the AR and MA characteristic polynomials have no common
factors, since, otherwise, the model would be over-parameterized and the common factors
could be cancelled out in Equation (2) to obtain an equivalent model of lower order with no
common factors. Note that ARMA(p,0) ≡ AR(p) and ARMA(0,q) ≡ MA(q).
The ARMA(p,q) model defines a stationary, linear process if and only if all the roots of the
AR characteristic equation φ(z) = 0 lie strictly outside the unit circle in the complex plane, which
is precisely the condition for the corresponding AR(p) model to define a stationary process. The
resulting process is invertible if and only if all the roots of the MA characteristic equation θ(z) =
0 lie strictly outside the unit circle in the complex plane, which is precisely the condition for the
corresponding MA(q) process to be invertible. We shall require both the stationarity and
invertibility conditions to be satisfied.
Having assumed for an ARMA model that the AR and MA characteristic polynomials have no
common factors and that the process is stationary and invertible, it follows that the model and
its parameter values (apart from the process mean µ) are uniquely identifiable from its
autocovariance function. It also follows from Equation (2) that the infinite moving average
expression for $\{Y_t\}$ is given by
$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$
i.e.,
$$Y_t = \mu + \psi(L)\varepsilon_t,$$
where the generating function $\psi$ is given by $\psi(z) = \theta(z)/\phi(z)$.
The ARMA processes all belong to the family of linear processes as defined in Section 4.3
(slightly generalized by the addition of the term µ for the process mean). What is important
about the ARMA processes for practical purposes is that they are characterized by a finite
number, p+q +1, of parameters — p autoregressive parameters, q moving average parameters
and one parameter µ for the process mean — which can be estimated from the observed time
series data to which the model is being fitted.
To obtain the autocovariance function, take $\mu = 0$ without loss of generality, multiply Equation (1) through by $Y_{t-\tau}$ and take expectations:
$$\gamma_\tau = \sum_{k=1}^{p} \phi_k\, E[Y_{t-k} Y_{t-\tau}] + E[\varepsilon_t Y_{t-\tau}] + \sum_{i=1}^{q} \theta_i\, E[\varepsilon_{t-i} Y_{t-\tau}]. \tag{4}$$
For $\tau > q$ we have $E[\varepsilon_{t-i} Y_{t-\tau}] = 0$ for $0 \le i \le q$. Hence, if $\tau > q$,
$$\gamma_\tau = \sum_{k=1}^{p} \phi_k \gamma_{\tau-k}.$$
Dividing through by $\gamma_0$,
$$\rho_\tau = \sum_{k=1}^{p} \phi_k \rho_{\tau-k}, \qquad \tau > q. \tag{5}$$
Equation (5) is similar to Equation (26) of Section 5.4 for the autocorrelation function of
an AR(p) process — it is the same difference equation but with a more restricted range
of validity. Hence the general form of solution for the autocorrelation function of an
ARMA(p,q) process, as a sum of geometric terms, is similar to that for the
corresponding AR(p) process, but the determination of the arbitrary constants in the
general solution is more complicated.
ARIMA models
ARIMA models provide another approach to time series forecasting. Exponential
smoothing and ARIMA models are the two most widely used approaches to time
series forecasting, and provide complementary approaches to the problem. While
exponential smoothing models are based on a description of the trend and seasonality
in the data, ARIMA models aim to describe the autocorrelations in the data.
Before we introduce ARIMA models, we must first discuss the concept of stationarity
and the technique of differencing time series.
In general, a stationary time series will have no predictable patterns in the long-term.
Time plots will show the series to be roughly horizontal (although some cyclic
behaviour is possible), with constant variance.
Backshift notation
The backward shift operator $B$ is a useful notational device when working with time series lags:
$$B y_t = y_{t-1}.$$
(Some references use $L$ for "lag" instead of $B$ for "backshift".) In other words, $B$, operating on $y_t$, has the effect of shifting the data back one period. Two applications of $B$ to $y_t$ shift the data back two periods:
$$B(B y_t) = B^2 y_t = y_{t-2}.$$
For monthly data, if we wish to consider "the same month last year," the notation is $B^{12} y_t = y_{t-12}$.
The backward shift operator is convenient for describing the process of differencing. A first difference can be written as
$$y'_t = y_t - y_{t-1} = y_t - B y_t = (1-B) y_t.$$
Note that a first difference is represented by $(1-B)$. Similarly, if second-order differences have to be computed, then:
$$y''_t = y_t - 2y_{t-1} + y_{t-2} = (1 - 2B + B^2) y_t = (1-B)^2 y_t.$$
In general, a $d$th-order difference can be written as $(1-B)^d y_t$.
Backshift notation is particularly useful when combining differences, as the operator can be treated using ordinary algebraic rules. In particular, terms involving $B$ can be multiplied together. For example, a seasonal difference followed by a first difference can be written as
$$(1-B)(1-B^m) y_t = (1 - B - B^m + B^{m+1}) y_t = y_t - y_{t-1} - y_{t-m} + y_{t-m-1},$$
the same result we obtained earlier.
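In R these differences correspond to the diff() function; a small sketch, where y is any ts object and m is its seasonal period (both are placeholders here):
dy  <- diff(y)                   # first difference, (1 - B) y_t
d2y <- diff(y, differences = 2)  # second-order difference, (1 - B)^2 y_t
dsy <- diff(diff(y, lag = m))    # seasonal difference followed by a first difference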
Autoregressive models
In a multiple regression model, we forecast the variable of interest using a linear
combination of predictors. In an autoregression model, we forecast the variable of
interest using a linear combination of past values of the variable. The
term autoregression indicates that it is a regression of the variable against itself.
Changing the parameters $\phi_1,\dots,\phi_p$ results in different time series patterns. The variance of the error term $\varepsilon_t$ will only change the scale of the series, not the patterns.
When $p \ge 3$, the restrictions are much more complicated. R takes care of these restrictions when estimating a model.
Figure 8.6: Two examples of data from moving average models with different parameters. Left: MA(1) with $y_t = 20 + \varepsilon_t + 0.8\varepsilon_{t-1}$. Right: MA(2) with $y_t = \varepsilon_t - \varepsilon_{t-1} + 0.8\varepsilon_{t-2}$. In both cases, $\varepsilon_t$ is normally distributed white noise with mean zero and variance one.
Figure 8.6 shows some data from an MA(1) model and an MA(2) model. Changing the parameters $\theta_1,\dots,\theta_q$ results in different time series patterns. As with autoregressive models, the variance of the error term $\varepsilon_t$ will only change the scale of the series, not the patterns.
It is possible to write any stationary AR($p$) model as an MA($\infty$) model. For example, using repeated substitution, we can demonstrate this for an AR(1) model:
$$y_t = \phi_1 y_{t-1} + \varepsilon_t = \phi_1(\phi_1 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \phi_1^2 y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t = \phi_1^3 y_{t-3} + \phi_1^2 \varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t = \cdots$$
Provided $-1 < \phi_1 < 1$, the value of $\phi_1^k$ will get smaller as $k$ gets larger. So eventually we obtain
$$y_t = \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \phi_1^3 \varepsilon_{t-3} + \cdots,$$
an MA($\infty$) process.
The reverse result holds if we impose some constraints on the MA parameters. Then the MA model is called invertible. That is, we can write any invertible MA($q$) process as an AR($\infty$) process. Invertible models are not simply introduced to enable us to convert from MA models to AR models. They also have some desirable mathematical properties.
For example, consider the MA(1) process, $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$. In its AR($\infty$) representation, the most recent error can be written as a linear function of current and past observations:
$$\varepsilon_t = \sum_{j=0}^{\infty} (-\theta_1)^j y_{t-j}.$$
When $|\theta_1| > 1$, the weights increase as lags increase, so the more distant the observations the greater their influence on the current error. When $|\theta_1| = 1$, the weights are constant in size, and the distant observations have the same influence as the recent observations. As neither of these situations makes much sense, we require $|\theta_1| < 1$, so the most recent observations have higher weight than observations from the more distant past. Thus, the process is invertible when $|\theta_1| < 1$.
The invertibility constraints for other models are similar to the stationarity
constraints.
More complicated conditions hold for $q \ge 3$. Again, R will take care of these constraints when estimating the models.
Information Criteria
Akaike’s Information Criterion (AIC), which was useful in selecting predictors for
regression, is also useful for determining the order of an ARIMA model. It can be
written as
$$\text{AIC} = -2\log(L) + 2(p + q + k + 1),$$
where $L$ is the likelihood of the data, $k = 1$ if $c \ne 0$ and $k = 0$ if $c = 0$. Note that the last term in parentheses is the number of parameters in the model (including $\sigma^2$, the variance of the residuals).
For ARIMA models, the corrected AIC can be written as
$$\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T - p - q - k - 2},$$
and the Bayesian Information Criterion can be written as
$$\text{BIC} = \text{AIC} + [\log(T) - 2](p + q + k + 1).$$
Good models are obtained by minimising the AIC, AICc or BIC. Our preference is to use the AICc.
It is important to note that these information criteria tend not to be good guides to
selecting the appropriate order of differencing ( d) of a model, but only for selecting
the values of p and q. This is because the differencing changes the data on which the
likelihood is computed, making the AIC values between models with different orders
of differencing not comparable. So we need to use some other approach to choose d,
and then we can use the AICc to select p and q.
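This strategy (choose d separately, then select p and q by the AICc) is what the auto.arima() function in the forecast package automates; a minimal sketch, using the a10 series plotted earlier:
fit <- auto.arima(a10)  # selects d via unit-root tests, then p and q by minimising the AICc
summary(fit)
autoplot(forecast(fit, h = 24))  # two years of monthly forecasts from the chosen model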
References
Shah, B. S. Principles of Management.
Ramasamy, T. Principles of Management.
Reedy, P. N. Principles of Management.
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). Monash University, Australia; OTexts: Melbourne.
Armstrong, J. S. (1978). Long-range forecasting: From crystal ball to computer. John Wiley & Sons.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26(2), 211–252.
Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3).
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54(1–3), 159–178.