Forecasting Assignment (Operations Management)
Department of Management
Postgraduate Program (MBA)
Sawla Campus
Operations Management
Group II
1. EYERUSALEM TIMHERTE Id no. PWBE/029/15
2. TESFANEH GORFU Id no. PWBE/033/15
3. TAMIRAT TEREFE Id no. PWBE//15
4. YAKOB PEKO Id no. PWBE//15
2. Features of forecasting
It is concerned with future events.
It is necessary for the planning process.
The impact of future events has to be considered in the planning process.
It is an estimation of future events.
It considers all the factors which affect organizational functions.
Personal observation also helps forecasting.
3. Elements of a good forecast
3. The forecast should be reliable; it should work consistently, so that users are not left with the uneasy feeling that they may get burned every time a new forecast is issued.
4. The forecast should be expressed in meaningful units. Financial planners need to know how many dollars will be needed, production planners need to know how many units will be needed, and schedulers need to know what machines and skills will be required. The choice of units depends on user needs.
5. The forecast should be in writing. Although this will not guarantee that all concerned are using the same information, it will at least increase the likelihood of it. In addition, a written forecast will permit an objective basis for evaluating the forecast once actual results are in.
6. The forecasting technique should be simple to understand and use.
7. The forecast should be cost-effective: The benefits should outweigh the costs.
4. Importance of forecasting
1. Pivotal role in an organization: Many organizations have failed because of a lack of forecasting or faulty forecasting. The reason is that planning is based on accurate forecasting.
2. Development of a business: The achievement of specified objectives depends upon proper forecasting, so the development of a business or an organization is largely based on forecasting.
3. Co-ordination: Forecasting helps to collect information about internal and external factors. The information thus collected provides a basis for co-ordination.
4. Effective control: Management executives can ascertain the strengths and weaknesses of subordinates or employees through forecasting.
5. Key to success: All business organizations face risks. Forecasting provides clues that reduce risk and uncertainty, and management executives can protect the business and achieve success by taking appropriate action.
6. Implementation of a project: Many entrepreneurs implement a project on the basis of their experience. Forecasting helps an entrepreneur to gain experience and improves the chances of success.
7. Primacy to planning: The information required for planning is supplied by forecasting, so forecasting is a precursor to planning.
Advantages
Balanced work-load
Minimization of fluctuations in production
Better use of production facilities
Better material management
Better customer service
Better utilization of capital and resources
Better design of facilities and production system.
Limitations
Forecasting is made on the basis of certain assumptions and human judgments, which may yield wrong results.
It cannot be considered a scientific method for predicting future events.
It does not specify any concrete relationship between past and future events.
It requires a high degree of skill.
It needs adequate and reliable information, which is often difficult to collect.
It involves heavy cost and is time-consuming.
It cannot be applied over a long period.
The predictability of an event or a quantity depends on how well we understand the factors that contribute to it, how much data are available, and whether the forecasts can affect the thing we are trying to forecast. For example, forecasts of electricity demand can be highly accurate because all three conditions are usually satisfied. We have a good idea of the contributing factors: electricity demand is driven largely by temperatures, with smaller effects for calendar variation such as holidays, and economic conditions. Provided there is a sufficient
history of data on electricity demand and weather conditions, and we have the skills
to develop a good model linking electricity demand and the key driver variables, the
forecasts can be remarkably accurate.
On the other hand, when forecasting currency exchange rates, only one of the
conditions is satisfied: there is plenty of available data. However, we have a limited
understanding of the factors that affect exchange rates, and forecasts of the exchange
rate have a direct effect on the rates themselves. If there are well-publicised forecasts
that the exchange rate will increase, then people will immediately adjust the price
they are willing to pay and so the forecasts are self-fulfilling. In a sense, the exchange
rates become their own forecasts. This is an example of the “efficient market
hypothesis”. Consequently, forecasting whether the exchange rate will rise or fall
tomorrow is about as predictable as forecasting whether a tossed coin will come down
as a head or a tail. In both situations, you will be correct about 50% of the time,
whatever you forecast. In situations like this, forecasters need to be aware of their
own limitations, and not claim more than is possible.
Often in forecasting, a key step is knowing when something can be forecast
accurately, and when forecasts will be no better than tossing a coin. Good forecasts
capture the genuine patterns and relationships which exist in the historical data, but
do not replicate past events that will not occur again. In this book, we will learn how
to tell the difference between a random fluctuation in the past data that should be
ignored, and a genuine pattern that should be modelled and extrapolated.
Many people wrongly assume that forecasts are not possible in a changing
environment. Every environment is changing, and a good forecasting model captures
the way in which things are changing. Forecasts rarely assume that the environment
is unchanging. What is normally assumed is that the way in which the environment is
changing will continue into the future. That is, a highly volatile environment will
continue to be highly volatile; a business with fluctuating sales will continue to have
fluctuating sales; and an economy that has gone through booms and busts will
continue to go through booms and busts. A forecasting model is intended to capture
the way things move, not just where things are. As Abraham Lincoln said, “If we could
first know where we are and whither we are tending, we could better judge what to do
and how to do it”.
Forecasting situations vary widely in their time horizons, factors determining actual
outcomes, types of data patterns, and many other aspects. Forecasting methods can
be simple, such as using the most recent observation as a forecast (which is called
the naïve method), or highly complex, such as neural nets and econometric systems
of simultaneous equations.
Forecasting is often done poorly, and is frequently confused with planning and goals. They are three different things.
Forecasting
is about predicting the future as accurately as possible, given all of the
information available, including historical data and knowledge of any future
events that might impact the forecasts.
Goals
are what you would like to have happen. Goals should be linked to forecasts
and plans, but this does not always occur. Too often, goals are set without any
plan for how to achieve them, and no forecasts for whether they are realistic.
Planning
is a response to forecasts and goals. Planning involves determining the
appropriate actions that are required to make your forecasts match your goals.
Forecasting should be an integral part of the decision-making activities of
management, as it can play an important role in many areas of a company. Modern
organisations require short-term, medium-term and long-term forecasts, depending
on the specific application.
Short-term forecasts
are needed for the scheduling of personnel, production and transportation. As
part of the scheduling process, forecasts of demand are often also required.
Medium-term forecasts
are needed to determine future resource requirements, in order to purchase
raw materials, hire personnel, or buy machinery and equipment.
Long-term forecasts
are used in strategic planning. Such decisions must take account of market
opportunities, environmental factors and internal resources.
An organisation needs to develop a forecasting system that involves several
approaches to predicting uncertain events. Such forecasting systems require the
development of expertise in identifying forecasting problems, applying a range of
forecasting methods, selecting appropriate methods for each problem, and evaluating
and refining forecasting methods over time. It is also important to have strong
organisational support for the use of formal forecasting methods if they are to be used
successfully.
Early in a forecasting project, decisions need to be made about what should be forecast. For example, if forecasts are required for items in a manufacturing environment, it is necessary to ask whether forecasts are needed for:
1. Every product line, or for groups of products?
2. Every sales outlet, or for outlets grouped by region, or only for total sales?
3. Weekly data, monthly data or annual data?
It is also necessary to consider the forecasting horizon. Will forecasts be required for
one month in advance, for 6 months, or for ten years? Different types of models will
be necessary, depending on what forecast horizon is most important.
How frequently are forecasts required? Forecasts that need to be produced frequently
are better done using an automated system than with methods that require careful
manual work.
It is worth spending time talking to the people who will use the forecasts to ensure
that you understand their needs, and how the forecasts are to be used, before
embarking on extensive work in producing the forecasts.
Once it has been determined what forecasts are required, it is then necessary to find
or collect the data on which the forecasts will be based. The data required for
forecasting may already exist. These days, a lot of data are recorded, and the
forecaster’s task is often to identify where and how the required data are stored. The
data may include sales records of a company, the historical demand for a product, or
the unemployment rate for a geographic region. A large part of a forecaster’s time can
be spent in locating and collating the available data prior to developing suitable
forecasting methods.
Anything that is observed sequentially over time is a time series. In this book, we will
only consider time series that are observed at regular intervals of time (e.g., hourly,
daily, weekly, monthly, quarterly, annually). Irregularly spaced time series can also
occur, but are beyond the scope of this book.
When forecasting time series data, the aim is to estimate how the sequence of
observations will continue into the future. Figure 1.1 shows the quarterly Australian
beer production from 1992 to the second quarter of 2010.
Figure 1.1: Australian quarterly beer production: 1992Q1–2010Q2, with two years of
forecasts.
The blue lines show forecasts for the next two years. Notice how the forecasts have
captured the seasonal pattern seen in the historical data and replicated it for the next
two years. The dark shaded region shows 80% prediction intervals. That is, each
future value is expected to lie in the dark shaded region with a probability of 80%.
The light shaded region shows 95% prediction intervals. These prediction intervals
are a useful way of displaying the uncertainty in forecasts. In this case the forecasts
are expected to be accurate, and hence the prediction intervals are quite narrow.
The simplest time series forecasting methods use only information on the variable to
be forecast, and make no attempt to discover the factors that affect its behaviour.
Therefore they will extrapolate trend and seasonal patterns, but they ignore all other
information such as marketing initiatives, competitor activity, changes in economic
conditions, and so on.
Time series models used for forecasting include decomposition models, exponential
smoothing models and ARIMA models.
There are several reasons a forecaster might select a time series model rather than an explanatory or mixed model. First, the system may not be understood, and even if it were understood it may be extremely difficult to measure the relationships that are
assumed to govern its behavior. Second, it is necessary to know or forecast the future
values of the various predictors in order to be able to forecast the variable of interest,
and this may be too difficult. Third, the main concern may be only to predict what will
happen, not to know why it happens. Finally, the time series model may give more
accurate forecasts than an explanatory or mixed model.
The model to be used in forecasting depends on the resources and data available, the
accuracy of the competing models, and the way in which the forecasting model is to be
used.
The best model to use depends on the availability of historical data, the strength of relationships between the forecast variable and any explanatory variables, and the way in which the forecasts are to be used. It is common to
compare two or three potential models. Each model is itself an artificial
construct that is based on a set of assumptions (explicit and implicit) and
usually involves one or more parameters which must be estimated using the
known historical data.
Figure 1.2: Total international visitors to Australia (1980-2015) along with ten possible
futures.
When we obtain a forecast, we are estimating the middle of the range of possible
values the random variable could take. Often, a forecast is accompanied by
a prediction interval giving a range of values the random variable could take with
relatively high probability. For example, a 95% prediction interval contains a range of
values which should include the actual future value with probability 95%.
Rather than plotting individual possible futures as shown in Figure 1.2, we usually
show these prediction intervals instead. The plot below shows 80% and 95% intervals
for the future Australian international visitors. The blue line is the average of the
possible future values, which we call the point forecasts.
Figure 1.3: Total international visitors to Australia (1980–2015) along with 10-year
forecasts and 80% and 95% prediction intervals.
We will use the subscript $t$ for time. For example, $y_t$ will denote the observation at time $t$. Suppose we denote all the information we have observed as $\mathcal{I}$ and we want to forecast $y_t$. We then write $y_t \mid \mathcal{I}$, meaning "the random variable $y_t$ given what we know in $\mathcal{I}$". The set of values that this random variable could take, along with their relative probabilities, is known as the "probability distribution" of $y_t \mid \mathcal{I}$. In forecasting, we call this the forecast distribution.
When we talk about the "forecast", we usually mean the average value of the forecast distribution, and we put a "hat" over $y$ to show this. Thus, we write the forecast of $y_t$ as $\hat{y}_t$, meaning the average of the possible values that $y_t$ could take given everything we know. Occasionally, we will use $\hat{y}_t$ to refer to the median (or middle value) of the forecast distribution instead.
It is often useful to specify exactly what information we have used in calculating the forecast. Then we will write, for example, $\hat{y}_{t|t-1}$ to mean the forecast of $y_t$ taking account of all previous observations $(y_1,\dots,y_{t-1})$. Similarly, $\hat{y}_{T+h|T}$ means the forecast of $y_{T+h}$ taking account of $y_1,\dots,y_T$ (i.e., an $h$-step forecast taking account of all observations up to time $T$).
6.1 ts objects
A time series can be thought of as a list of numbers, along with some information
about what times those numbers were recorded. This information can be stored as
a ts object in R.
Suppose you have annual observations for the last few years:
Year Observation
2012 123
2013 39
2014 78
2015 52
2016 110
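Such a series can be stored as a ts object with the ts() function; a minimal sketch using the values from the table above (the frequency argument, shown in the table below, is only needed for sub-annual data):
y <- ts(c(123, 39, 78, 52, 110), start = 2012)
# For quarterly or monthly data, add the frequency, e.g. for a quarterly
# series starting in 2012 Q1 (z is a hypothetical numeric vector):
# z <- ts(z, start = c(2012, 1), frequency = 4)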
Data        Frequency
Annual          1
Quarterly       4
Monthly        12
Weekly         52
There was a period in 1989 when no passengers were carried — this was due to an
industrial dispute.
There was a period of reduced load in 1992. This was due to a trial in which some
economy class seats were replaced by business class seats.
A large increase in passenger load occurred in the second half of 1991.
There are some large dips in load around the start of each year. These are due to
holiday effects.
There is a long-term fluctuation in the level of the series which increases during 1987,
decreases in 1989, and increases again through 1990 and 1991.
There are some periods of missing observations.
Any model will need to take all these features into account in order to effectively
forecast the passenger load into the future.
A simpler time series is shown in Figure 2.2.
autoplot(a10) +
ggtitle("Antidiabetic drug sales") +
ylab("$ million") +
xlab("Year")
Figure 2.2: Monthly sales of antidiabetic drugs in Australia.
Here, there is a clear and increasing trend. There is also a strong seasonal pattern that
increases in size as the level of the series increases. The sudden drop at the start of
each year is caused by a government subsidisation scheme that makes it cost-effective
for patients to stockpile drugs at the end of the calendar year. Any forecasts of this
series would need to capture the seasonal pattern, and the fact that the trend is
changing slowly.
Seasonal
A seasonal pattern occurs when a time series is affected by seasonal factors
such as the time of the year or the day of the week. Seasonality is always of a
fixed and known frequency. The monthly sales of antidiabetic drugs above
shows seasonality which is induced partly by the change in the cost of the drugs
at the end of the calendar year.
Cyclic
A cycle occurs when the data exhibit rises and falls that are not of a fixed
frequency. These fluctuations are usually due to economic conditions, and are
often related to the “business cycle”. The duration of these fluctuations is
usually at least 2 years.
Many people confuse cyclic behaviour with seasonal behaviour, but they are really
quite different. If the fluctuations are not of a fixed frequency then they are cyclic; if
the frequency is unchanging and associated with some aspect of the calendar, then the
pattern is seasonal. In general, the average length of cycles is longer than the length of
a seasonal pattern, and the magnitudes of cycles tend to be more variable than the
magnitudes of seasonal patterns.
Many time series include trend, cycles and seasonality. When choosing a forecasting
method, we will first need to identify the time series patterns in the data, and then
choose a method that is able to capture the patterns properly.
Figure 2.4: Seasonal plot of monthly antidiabetic drug sales in Australia.
These are exactly the same data as were shown earlier, but now the data from
each season are overlapped. A seasonal plot allows the underlying seasonal
pattern to be seen more clearly, and is especially useful in identifying years in
which the pattern changes.
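A seasonal plot of this kind can be produced with the ggseasonplot() function from the forecast package; a minimal sketch, assuming the a10 series plotted in Figure 2.2:
ggseasonplot(a10, year.labels = TRUE, year.labels.left = TRUE) +
  ylab("$ million") +
  ggtitle("Seasonal plot: antidiabetic drug sales")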
In this case, it is clear that there is a large jump in sales in January each year.
Actually, these are probably sales in late December as customers stockpile
before the end of the calendar year, but the sales are not registered with the
government until a week or two later. The graph also shows that there was an
unusually small number of sales in March 2008 (most other years show an
increase between February and March). The small number of sales in June
2008 is probably due to incomplete counting of sales at the time the data were
collected.
Figure 2.6: Seasonal subseries plot of monthly antidiabetic drug sales in Australia.
The horizontal lines indicate the means for each month. This form of plot enables the
underlying seasonal pattern to be seen clearly, and also shows the changes in
seasonality over time. It is especially useful in identifying changes within particular
seasons. In this example, the plot is not particularly revealing; but in some cases, this
is the most useful way of viewing seasonal changes over time.
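A sketch of how a seasonal subseries plot like Figure 2.6 can be produced, again with the forecast package and the a10 series:
ggsubseriesplot(a10) +
  ylab("$ million") +
  ggtitle("Seasonal subseries plot: antidiabetic drug sales")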
6.6 Scatterplots
The graphs discussed so far are useful for visualising individual time series. It is also
useful to explore relationships between time series.
Figure 2.7 shows two time series: half-hourly electricity demand (in Gigawatts) and
temperature (in degrees Celsius), for 2014 in Victoria, Australia. The temperatures
are for Melbourne, the largest city in Victoria, while the demand values are for the
entire state.
autoplot(elecdemand[,c("Demand","Temperature")], facets=TRUE) +
xlab("Year: 2014") + ylab("") +
ggtitle("Half-hourly electricity demand: Victoria, Australia")
Figure 2.7: Half hourly electricity demand and temperatures in Victoria, Australia, for
2014.
(The actual code for this plot is a little more complicated than what is shown in order
to include the months on the x-axis.)
We can study the relationship between demand and temperature by plotting one
series against the other.
qplot(Temperature, Demand, data=as.data.frame(elecdemand)) +
ylab("Demand (GW)") + xlab("Temperature (Celsius)")
#> Warning: `qplot()` was deprecated in ggplot2 3.4.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Figure 2.8: Half-hourly electricity demand plotted against temperature for 2014 in
Victoria, Australia.
This scatterplot helps us to visualise the relationship between the variables. It is clear
that high demand occurs when temperatures are high due to the effect of air-
conditioning. But there is also a heating effect, where demand increases for very low
temperatures.
Figure 2.11: Quarterly visitor nights for various regions of NSW, Australia.
To see the relationships between these five time series, we can plot each time series
against the others. These plots can be arranged in a scatterplot matrix, as shown in
Figure 2.12. (This plot requires the GGally package to be installed.)
GGally::ggpairs(as.data.frame(visnights[,1:5]))
Figure 2.12: A scatterplot matrix of the quarterly visitor nights in five regions of NSW,
Australia.
For each panel, the variable on the vertical axis is given by the variable name in that
row, and the variable on the horizontal axis is given by the variable name in that
column. There are many options available to produce different plots within each
panel. In the default version, the correlations are shown in the upper right half of the
plot, while the scatterplots are shown in the lower half. On the diagonal are shown
density plots.
The value of the scatterplot matrix is that it enables a quick view of the relationships
between all pairs of variables. In this example, the second column of plots shows
there is a strong positive relationship between visitors to the NSW north coast and
visitors to the NSW south coast, but no detectable relationship between visitors to the
NSW north coast and visitors to the NSW south inland. Outliers can also be seen.
There is one unusually high quarter for the NSW Metropolitan region, corresponding
to the 2000 Sydney Olympics. This is most easily seen in the first two plots in the left
column of Figure 2.12, where the largest value for NSW Metro is separate from the
main cloud of observations.
Figure 2.13: Lagged scatterplots for quarterly beer production.
Here the colours indicate the quarter of the variable on the vertical axis. The lines
connect points in chronological order. The relationship is strongly positive at lags 4
and 8, reflecting the strong seasonality in the data. The negative relationship seen for
lags 2 and 6 occurs because peaks (in Q4) are plotted against troughs (in Q2).
The window() function used here is very useful when extracting a portion of a time
series. In this case, we have extracted the data from ausbeer, beginning in 1992.
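As a sketch, the extraction and the lag plots of Figure 2.13 can be produced as follows (gglagplot() is from the forecast package):
beer2 <- window(ausbeer, start = 1992)  # keep only the data from 1992 onwards
gglagplot(beer2)                        # scatterplots of y_t against lagged values y_{t-k}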
6.9 Autocorrelation
Just as correlation measures the extent of a linear relationship between two variables,
autocorrelation measures the linear relationship between lagged values of a time series.
There are several autocorrelation coefficients, corresponding to each panel in the lag plot.
For example, $r_1$ measures the relationship between $y_t$ and $y_{t-1}$, $r_2$ measures the relationship between $y_t$ and $y_{t-2}$, and so on.
The value of $r_k$ can be written as
$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2},$$
where $T$ is the length of the time series.
The first nine autocorrelation coefficients for the beer production data are given in the
following table.
  r1      r2      r3      r4      r5      r6      r7      r8      r9
-0.102  -0.657  -0.060   0.869  -0.089  -0.635  -0.054   0.832  -0.108
These correspond to the nine scatterplots in Figure 2.13. The autocorrelation coefficients
are plotted to show the autocorrelation function or ACF. The plot is also known as
a correlogram.
ggAcf(beer2)
$r_4$ is higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be four quarters apart and the troughs tend to be four quarters apart.
$r_2$ is more negative than for the other lags because troughs tend to be two quarters behind peaks.
The dashed blue lines indicate whether the correlations are significantly different from
zero. These are explained in Section 2.9.
6.10 White noise
Time series that show no autocorrelation are called white noise. Figure 2.17 gives an
example of a white noise series.
set.seed(30)
y <- ts(rnorm(50))
autoplot(y) + ggtitle("White noise")
For white noise series, we expect each autocorrelation to be close to zero. Of course,
they will not be exactly equal to zero as there is some random variation. For a white
noise series, we expect 95% of the spikes in the ACF to lie
within $\pm 2/\sqrt{T}$, where $T$ is the length of the time series. It is common to plot these
bounds on a graph of the ACF (the blue dashed lines above). If one or more large
spikes are outside these bounds, or if substantially more than 5% of spikes are outside
these bounds, then the series is probably not white noise.
In this example, $T = 50$ and so the bounds are at $\pm 2/\sqrt{50} = \pm 0.28$. All of the autocorrelation coefficients lie within
these limits, confirming that the data are white noise.
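Continuing the simulated white noise example above, the ACF can be plotted with ggAcf() from the forecast package, and the bound can be checked by hand:
ggAcf(y)             # correlogram; the dashed lines mark the +/- 2/sqrt(T) bounds
2 / sqrt(length(y))  # approximately 0.28 for T = 50, as quoted above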
Average method
Here, the forecasts of all future values are equal to the average (or “mean”) of the
historical data. If we let the historical data be denoted by $y_1,\dots,y_T$, then we can write the forecasts as
$$\hat{y}_{T+h|T} = \bar{y} = (y_1 + \cdots + y_T)/T.$$
The notation $\hat{y}_{T+h|T}$ is a short-hand for the estimate of $y_{T+h}$ based on the data $y_1,\dots,y_T$.
meanf(y, h)
# y contains the time series
# h is the forecast horizon
Naïve method
For naïve forecasts, we simply set all forecasts to be the value of the last observation.
That is,
$$\hat{y}_{T+h|T} = y_T.$$
This method works remarkably well for many economic and financial time series.
naive(y, h)
rwf(y, h) # Equivalent alternative
Because a naïve forecast is optimal when data follow a random walk, these are also
called random walk forecasts.
Drift method
A variation on the naïve method is to allow the forecasts to increase or decrease over
time, where the amount of change over time (called the drift) is set to be the average
change seen in the historical data. Thus the forecast for time $T+h$ is given by
$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + h\left(\frac{y_T - y_1}{T-1}\right).$$
This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
rwf(y, h, drift=TRUE)
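As a sketch of how these benchmark methods compare in practice, they can be applied to the quarterly beer data (beer2 as extracted earlier with window()) and overlaid on the series with autolayer() from the forecast package:
beer2 <- window(ausbeer, start = 1992, end = c(2007, 4))
autoplot(beer2) +
  autolayer(meanf(beer2, h = 11), series = "Mean", PI = FALSE) +
  autolayer(naive(beer2, h = 11), series = "Naive", PI = FALSE) +
  autolayer(rwf(beer2, h = 11, drift = TRUE), series = "Drift", PI = FALSE) +
  ggtitle("Forecasts for quarterly beer production") +
  xlab("Year") +
  guides(colour = guide_legend(title = "Forecast"))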
Calendar adjustments
Some of the variation seen in seasonal data may be due to simple calendar effects. In
such cases, it is usually much easier to remove the variation before fitting a
forecasting model. The monthdays() function will compute the number of days in each
month or quarter.
For example, if you are studying the monthly milk production on a farm, there will be
variation between the months simply because of the different numbers of days in each
month, in addition to the seasonal variation across the year.
dframe <- cbind(Monthly = milk,
DailyAverage = milk/monthdays(milk))
autoplot(dframe, facet=TRUE) +
xlab("Years") + ylab("Pounds") +
ggtitle("Milk production per cow")
Population adjustments
Any data that are affected by population changes can be adjusted to give per-capita data. That is, consider the data per person (or per thousand people, or per million
people) rather than the total. For example, if you are studying the number of hospital
beds in a particular region over time, the results are much easier to interpret if you
remove the effects of population changes by considering the number of beds per
thousand people. Then you can see whether there have been real increases in the
number of beds, or whether the increases are due entirely to population increases. It
is possible for the total number of beds to increase, but the number of beds per
thousand people to decrease. This occurs when the population is increasing faster
than the number of hospital beds. For most data that are affected by population
changes, it is best to use per-capita data rather than the totals.
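A minimal sketch of such a per-capita adjustment, using made-up hospital bed and population figures (both series and all their values are hypothetical, for illustration only):
beds       <- ts(c(1200, 1260, 1310, 1380, 1450), start = 2010)            # hypothetical totals
population <- ts(c(305000, 312000, 320000, 331000, 344000), start = 2010)  # hypothetical
beds_per_thousand <- beds / population * 1000
autoplot(beds_per_thousand) + ylab("Beds per 1,000 people") + xlab("Year")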
Inflation adjustments
Data which are affected by the value of money are best adjusted before modelling. For
example, the average cost of a new house will have increased over the last few decades
due to inflation. A $200,000 house this year is not the same as a $200,000 house
twenty years ago. For this reason, financial time series are usually adjusted so that all
values are stated in dollar values from a particular year. For example, the house price
data may be stated in year 2000 dollars.
To make these adjustments, a price index is used. If $z_t$ denotes the price index and $y_t$ denotes the original house price in year $t$, then $x_t = y_t / z_t \times z_{2000}$ gives the adjusted house price in year-2000 dollar values. Price indexes are often
constructed by government agencies. For consumer goods, a common price index is
the Consumer Price Index (or CPI).
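A sketch of this adjustment with hypothetical annual house price and price index series (the object names and all numbers are illustrative only):
houseprice <- ts(c(150, 172, 205, 260, 310), start = 1996)      # average price, $000s (made up)
cpi        <- ts(c(66.6, 70.0, 74.8, 78.6, 82.4), start = 1996) # price index (made up)
cpi2000  <- as.numeric(window(cpi, start = 2000, end = 2000))   # index value z_2000
adjusted <- houseprice / cpi * cpi2000                          # prices in year-2000 dollars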
Mathematical transformations
If the data show variation that increases or decreases with the level of the series, then
a transformation can be useful. For example, a logarithmic transformation is often
useful. If we denote the original observations as $y_1,\dots,y_T$ and the transformed observations as $w_1,\dots,w_T$, then $w_t = \log(y_t)$. Logarithms are useful
because they are interpretable: changes in a log value are relative (or percentage)
changes on the original scale. So if log base 10 is used, then an increase of 1 on the log
scale corresponds to a multiplication of 10 on the original scale. Another useful
feature of log transformations is that they constrain the forecasts to stay positive on
the original scale.
Sometimes other transformations are also used (although they are not so
interpretable). For example, square roots and cube roots can be used. These are
called power transformations because they can be written in the form $w_t = y_t^p$.
A useful family of transformations, that includes both logarithms and power transformations, is the family of Box-Cox transformations (Box & Cox, 1964), which depend on the parameter $\lambda$ and are defined as follows:
$$w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0; \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise.} \end{cases}$$
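In R, a suitable value of λ can be chosen automatically with BoxCox.lambda() and the transformation applied with BoxCox(), both from the forecast package. A minimal sketch, assuming elec is the monthly electricity demand series used in the fpp2 textbook:
(lambda <- BoxCox.lambda(elec))  # automatically selected transformation parameter
autoplot(BoxCox(elec, lambda))   # plot the transformed series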
Features of power transformations
Choose a simple value of λ. It makes explanations easier.
The forecasting results are relatively insensitive to the value of λ.
Often no transformation is needed.
Transformations sometimes make little difference to the forecasts but have a large effect on prediction intervals.
Bias adjustments
One issue with using mathematical transformations such as Box-Cox transformations
is that the back-transformed point forecast will not be the mean of the forecast
distribution. In fact, it will usually be the median of the forecast distribution
(assuming that the distribution on the transformed space is symmetric). For many
purposes, this is acceptable, but occasionally the mean forecast is required. For
example, you may wish to add up sales forecasts from various regions to form a
forecast for the whole country. But medians do not add up, whereas means do.
Figure 3.4: Forecasts of egg prices using a random walk with drift applied to the
logged data.
The blue line in Figure 3.4 shows the forecast medians while the red line shows the
forecast means. Notice how the skewed forecast distribution pulls up the point
forecast when we use the bias adjustment.
Bias adjustment is not done by default in the forecast package. If you want your
forecasts to be means rather than medians, use the argument biasadj=TRUE when you
select your Box-Cox transformation parameter.
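A sketch along the lines of Figure 3.4, assuming the annual eggs price series from the fpp2 package; lambda = 0 corresponds to a log transformation, and biasadj = TRUE requests bias-adjusted (mean) forecasts:
fc  <- rwf(eggs, drift = TRUE, lambda = 0, h = 50, level = 80)
fc2 <- rwf(eggs, drift = TRUE, lambda = 0, h = 50, level = 80, biasadj = TRUE)
autoplot(eggs) +
  autolayer(fc,  series = "Simple back transformation") +
  autolayer(fc2, series = "Bias adjusted", PI = FALSE) +
  guides(colour = guide_legend(title = "Forecast"))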
Fitted values
Each observation in a time series can be forecast using all of the previous observations; these one-step forecasts are called fitted values and are denoted by $\hat{y}_{t|t-1}$, or simply $\hat{y}_t$. Fitted values are often not true forecasts, however, because any parameters involved in the forecasting method are estimated using all available observations in the time series, including future observations. For example, if we use the average method, the fitted values are given by
$$\hat{y}_t = \hat{c},$$
where $\hat{c}$ is the average computed over all available observations, including those at times after $t$. Similarly, for the drift method, the drift parameter is estimated using all available observations. In this case, the fitted values are given by
$$\hat{y}_t = y_{t-1} + \hat{c},$$
where $\hat{c} = (y_T - y_1)/(T-1)$. In both cases, there is a parameter to be estimated from the data. The "hat" above the $c$ reminds us that this is an estimate. When the estimate of $c$ involves observations after time $t$, the fitted values are not true forecasts. On the other hand, naïve or seasonal naïve forecasts do not involve any parameters, and so fitted values are true forecasts in such cases.
Residuals
The “residuals” in a time series model are what is left over after fitting a model. F or
many (but not all) time series models, the residuals are equal to the difference
between the observations and the corresponding fitted values: et=yt−^yt.=−^.
Residuals are useful in checking whether a model has adequately captured the
information in the data. A good forecasting method will yield residuals with the
following properties:
1. The residuals are uncorrelated. If there are correlations between residuals, then there is
information left in the residuals which should be used in computing forecasts.
2. The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts
are biased.
Any forecasting method that does not satisfy these properties can be improved.
However, that does not mean that forecasting methods that satisfy these properties
cannot be improved. It is possible to have several different forecasting methods for
the same data set, all of which satisfy these properties. Checking these properties is
important in order to see whether a method is using all of the available information,
but it is not a good way to select a forecasting method.
If either of these properties is not satisfied, then the forecasting method can be
modified to give better forecasts. Adjusting for bias is easy: if the residuals have
mean m, then simply add m to all forecasts and the bias problem is solved.
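As a sketch, these residual properties can be checked with residuals() and checkresiduals() from the forecast package, here for naïve forecasts of the goog200 series used later in this document:
fit <- naive(goog200)
autoplot(residuals(fit)) + ggtitle("Residuals from the naive method")
checkresiduals(fit)  # time plot, ACF and histogram of residuals, plus a Ljung-Box test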
In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties.
3. The residuals have constant variance.
4. The residuals are normally distributed.
These two properties make the calculation of prediction intervals easier (see
Section 3.5 for an example). However, a forecasting method that does not satisfy
these properties cannot necessarily be improved. Sometimes applying a Box-Cox
transformation may assist with these properties, but otherwise there is usually little
that you can do to ensure that your residuals have constant variance and a normal
distribution. Instead, an alternative approach to obtaining prediction intervals is
necessary. Again, we will not address how to do this until later in the book.
7.4 Evaluating forecast accuracy
The size of the test set is typically about 20% of the total sample, although this value
depends on how long the sample is and how far ahead you want to forecast. The test
set should ideally be at least as large as the maximum forecast horizon required. The
following points should be noted.
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the
data.
Some references describe the test set as the “hold-out set” because these data are
“held out” of the data used for fitting. Other references call the training set the “in-sample data” and the test set the “out-of-sample data”. We prefer to use “training
data” and “test data” in this book.
Another useful function is subset() which allows for more types of subsetting. A great
advantage of this function is that it allows the use of indices to choose a subset.
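A sketch of how training data can be extracted in R, using the quarterly ausbeer series (window() is base R; subset() here dispatches to the time series method from the forecast package):
window(ausbeer, start = 1995)                       # all observations from 1995 Q1 onwards
subset(ausbeer, start = length(ausbeer) - 4*5 + 1)  # the last 20 quarters, by index
subset(ausbeer, quarter = 1)                        # all first-quarter observations
tail(ausbeer, 4*5)                                  # also the last 20 quarters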
Forecast errors
A forecast “error” is the difference between an observed value and its forecast. Here
“error” does not mean a mistake, it means the unpredictable part of an observation. It
can be written as
$$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T},$$
where the training data is given by $\{y_1,\dots,y_T\}$ and the test data is given by $\{y_{T+1}, y_{T+2},\dots\}$.
Note that forecast errors are different from residuals in two ways. First, residuals are
calculated on the training set while forecast errors are calculated on the test set.
Second, residuals are based on one-step forecasts while forecast errors can
involve multi-step forecasts.
We can measure forecast accuracy by summarising the forecast errors in different
ways.
Scale-dependent errors
The forecast errors are on the same scale as the data. Accuracy measures that are based only on $e_t$ are therefore scale-dependent and cannot be used to make comparisons between series that involve different units.
The two most commonly used scale-dependent measures are based on the absolute errors or squared errors:
$$\text{Mean absolute error: } \text{MAE} = \text{mean}(|e_t|),$$
$$\text{Root mean squared error: } \text{RMSE} = \sqrt{\text{mean}(e_t^2)}.$$
When comparing forecast methods applied to a single time series, or to several time series with the same units, the MAE is popular as it is easy to both understand and compute.
A forecast method that minimises the MAE will lead to forecasts of the median, while
minimising the RMSE will lead to forecasts of the mean. Consequently, the RMSE is
also widely used, despite being more difficult to interpret.
Percentage errors
The percentage error is given by $p_t = 100 e_t / y_t$. Percentage errors have the advantage of being unit-free, and so are frequently used to compare forecast performances between data sets. The most commonly used measure is:
$$\text{Mean absolute percentage error: } \text{MAPE} = \text{mean}(|p_t|).$$
Measures based on percentage errors have the disadvantage of being infinite or undefined if $y_t = 0$ for any $t$ in the period of interest, and having extreme values if any $y_t$ is close to zero. Another problem with percentage errors that is often overlooked is that they assume the unit of measurement has a meaningful zero. For example, a percentage error makes no
sense when measuring the accuracy of temperature forecasts on either the Fahrenheit
or Celsius scales, because temperature has an arbitrary zero point.
They also have the disadvantage that they put a heavier penalty on negative errors
than on positive errors. This observation led to the use of the so-called “symmetric”
MAPE (sMAPE) proposed by Armstrong (1978, p. 348), which was used in the M3
forecasting competition. It is defined by
$$\text{sMAPE} = \text{mean}\left(200\,|y_t - \hat{y}_t| / (y_t + \hat{y}_t)\right).$$
However, if $y_t$ is close to zero, $\hat{y}_t$ is also likely to be close to zero. Thus, the measure still
involves division by a number close to zero, making the calculation unstable. Also, the
value of sMAPE can be negative, so it is not really a measure of “absolute percentage
errors” at all.
Hyndman & Koehler (2006) recommend that the sMAPE not be used. It is included
here only because it is widely used, although we will not use it in this book.
Scaled errors
Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to using
percentage errors when comparing forecast accuracy across series with different
units. They proposed scaling the errors based on the training MAE from a simple
forecast method.
For a non-seasonal time series, a useful way to define a scaled error uses naïve forecasts:
$$q_j = \frac{e_j}{\frac{1}{T-1}\sum_{t=2}^{T} |y_t - y_{t-1}|}.$$
Because the numerator and denominator both involve values on the scale of the original data, $q_j$ is independent of the scale of the data. A scaled error is less than one if it arises from a better forecast than the average naïve forecast computed on the training data. Conversely, it is greater than one if the forecast is worse than the average naïve forecast computed on the training data.
For seasonal time series, a scaled error can be defined using seasonal naïve forecasts:
$$q_j = \frac{e_j}{\frac{1}{T-m}\sum_{t=m+1}^{T} |y_t - y_{t-m}|}.$$
The mean absolute scaled error is simply
$$\text{MASE} = \text{mean}(|q_j|).$$
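These measures can be computed with the accuracy() function from the forecast package; a sketch comparing benchmark forecasts of the quarterly beer data against a held-out test set:
beer2 <- window(ausbeer, start = 1992, end = c(2007, 4))  # training data
beer3 <- window(ausbeer, start = 2008)                    # test data
accuracy(meanf(beer2, h = 10), beer3)   # mean method
accuracy(rwf(beer2, h = 10), beer3)     # naive method
accuracy(snaive(beer2, h = 10), beer3)  # seasonal naive method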
The forecast accuracy is computed by averaging over the test sets. This procedure is
sometimes known as “evaluation on a rolling forecasting origin” because the “origin”
at which the forecast is based rolls forward in time.
With time series forecasting, one-step forecasts may not be as relevant as multi-step
forecasts. In this case, the cross-validation procedure based on a rolling forecasting
origin can be modified to allow multi-step errors to be used. Suppose that we are
interested in models that produce good 4-step-ahead forecasts. Then the cross-validation errors are based on 4-step-ahead forecasts rather than one-step forecasts.
As expected, the RMSE from the residuals is smaller, as the corresponding “forecasts”
are based on a model fitted to the entire data set, rather than being true forecasts.
A good way to choose the best forecasting model is to find the model with the smallest
RMSE computed using time series cross-validation.
Pipe operator
The ugliness of the above R code makes this a good opportunity to introduce some
alternative ways of stringing R functions together. In the above code, we are nesting
functions within functions within functions, so you have to read the code from the
inside out, making it difficult to understand what is being computed. Instead, we can
use the pipe operator %>% as follows.
goog200 %>% tsCV(forecastfunction=rwf, drift=TRUE, h=1) -> e
e^2 %>% mean(na.rm=TRUE) %>% sqrt()
#> [1] 6.233
goog200 %>% rwf(drift=TRUE) %>% residuals() -> res
res^2 %>% mean(na.rm=TRUE) %>% sqrt()
#> [1] 6.169
The left hand side of each pipe is passed as the first argument to the function on the
right hand side. This is consistent with the way we read from left to right in English.
When using pipes, all other arguments must be named, which also helps readability.
When using pipes, it is natural to use the right arrow assignment -> rather than the
left arrow. For example, the third line above can be read as “Take the goog200 series,
pass it to rwf() with drift=TRUE , compute the resulting residuals, and store them
as res”.
We will use the pipe operator whenever it makes the code easier to read. In order to
be consistent, we will always follow a function with parentheses to differentiate it
from other objects, even if it has no arguments. See, for example, the use
of sqrt() and residuals() in the code above.
Table 3.1: Multipliers to be used for prediction intervals.
Percentage    Multiplier
50 0.67
55 0.76
60 0.84
65 0.93
70 1.04
75 1.15
80 1.28
85 1.44
90 1.64
95 1.96
96 2.05
97 2.17
98 2.33
99 2.58
The value of prediction intervals is that they express the uncertainty in the forecasts. If
we only produce point forecasts, there is no way of telling how accurate the forecasts are.
However, if we also produce prediction intervals, then it is clear how much uncertainty is
associated with each forecast. For this reason, point forecasts can be of almost no value
without the accompanying prediction intervals.
For example, for naïve forecasts of the goog200 stock price series (used again later in this document), the last observed value is 531.48 and the standard deviation of the residuals is 6.21, so a 95% prediction interval for the next value is
$$531.48 \pm 1.96(6.21) = [519.3, 543.6].$$
Similarly, an 80% prediction interval is given by
$$531.48 \pm 1.28(6.21) = [523.5, 539.4].$$
The value of the multiplier (1.96 or 1.28) is taken from Table 3.1.
Benchmark methods
For the four benchmark methods, it is possible to mathematically derive the forecast
standard deviation under the assumption of uncorrelated residuals. If $\hat{\sigma}_h$ denotes the standard deviation of the $h$-step forecast distribution, and $\hat{\sigma}$ is the residual standard deviation, then we can use the following expressions.
Mean forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{1 + 1/T}$
Naïve forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{h}$
Seasonal naïve forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{k+1}$, where $k$ is the integer part of $(h-1)/m$ and $m$ is the seasonal period.
Drift forecasts: $\hat{\sigma}_h = \hat{\sigma}\sqrt{h(1 + h/T)}$.
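In R, these intervals are produced automatically by the benchmark forecasting functions; for example, naïve and drift forecasts of the goog200 series, with 80% and 95% prediction intervals by default:
naive(goog200, h = 10)
rwf(goog200, h = 10, drift = TRUE)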
ARMA Model
A mixed autoregressive moving average process of order (p, q), an ARMA(p, q) process, is a stationary process $\{Y_t\}$ which satisfies the relation
$$Y_t - \mu = \sum_{k=1}^{p} \phi_k (Y_{t-k} - \mu) + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \qquad t \in \mathbb{Z}, \tag{1}$$
where $\mu$ is the process mean, $\{\varepsilon_t\}$ is a white noise process with mean 0 and variance $\sigma^2$, $\phi_p \ne 0$ and $\theta_q \ne 0$.
Using the backshift operator $L$, Equation (1) may be written as
$$\phi(L)Y_t = c + \theta(L)\varepsilon_t, \qquad t \in \mathbb{Z}, \tag{2}$$
where the constant $c$ is given by $c = \mu\,(1 - \phi_1 - \cdots - \phi_p) = \mu\,\phi(1)$.
From now on we shall assume that the AR and MA characteristic polynomials have no common
factors, since, otherwise, the model would be over-parameterized and the common factors
could be cancelled out in Equation (2) to obtain an equivalent model of lower order with no
common factors. Note that ARMA(p,0) ≡ AR(p) and ARMA(0,q) ≡ MA(q).
The ARMA(p,q) model defines a stationary, linear process if and only if all the roots of the
AR characteristic equation φ(z) = 0 lie strictly outside the unit circle in the complex plane, which
is precisely the condition for the corresponding AR(p) model to define a stationary process. The
resulting process is invertible if and only if all the roots of the MA characteristic equation θ(z) =
0 lie strictly outside the unit circle in the complex plane, which is precisely the condition for the
corresponding MA(q) process to be invertible. We shall require both the stationarity and
invertibility conditions to be satisfied.
Having assumed for an ARMA model that the AR and MA characteristic polynomials have no
common factors and that the process is stationary and invertible, it follows that the model and
its parameter values (apart from the process mean µ) are uniquely identifiable from its
autocovariance function. It also follows from Equation (2) that the infinite moving average
expression for $\{Y_t\}$ is given by
$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$
i.e.,
$$Y_t = \mu + \psi(L)\varepsilon_t,$$
where the generating function $\psi$ is given by $\psi(z) = \theta(z)/\phi(z)$.
The ARMA processes all belong to the family of linear processes as defined in Section 4.3
(slightly generalized by the addition of the term µ for the process mean). What is important
about the ARMA processes for practical purposes is that they are characterized by a finite
number, p+q +1, of parameters — p autoregressive parameters, q moving average parameters
and one parameter µ for the process mean — which can be estimated from the observed time
series data to which the model is being fitted.
To obtain the autocovariance function, take $\mu = 0$ without loss of generality, multiply Equation (1) through by $Y_{t-\tau}$ and take expectations:
$$\gamma_\tau = \sum_{k=1}^{p} \phi_k\, E[Y_{t-k} Y_{t-\tau}] + E[\varepsilon_t Y_{t-\tau}] + \sum_{i=1}^{q} \theta_i\, E[\varepsilon_{t-i} Y_{t-\tau}]. \tag{4}$$
For $\tau > q$ we have $E[\varepsilon_{t-i} Y_{t-\tau}] = 0$ for $0 \le i \le q$. Hence, if $\tau > q$,
$$\gamma_\tau = \sum_{k=1}^{p} \phi_k \gamma_{\tau-k}.$$
Dividing through by $\gamma_0$,
$$\rho_\tau = \sum_{k=1}^{p} \phi_k \rho_{\tau-k}, \qquad \tau > q. \tag{5}$$
Equation (5) is similar to Equation (26) of Section 5.4 for the autocorrelation function of
an AR(p) process — it is the same difference equation but with a more restricted range
of validity. Hence the general form of solution for the autocorrelation function of an
ARMA(p,q) process, as a sum of geometric terms, is similar to that for the
corresponding AR(p) process, but the determination of the arbitrary constants in the
general solution is more complicated.
ARIMA models
ARIMA models provide another approach to time series forecasting. Exponential
smoothing and ARIMA models are the two most widely used approaches to time
series forecasting, and provide complementary approaches to the problem. While
exponential smoothing models are based on a description of the trend and seasonality
in the data, ARIMA models aim to describe the autocorrelations in the data.
Before we introduce ARIMA models, we must first discuss the concept of stationarity
and the technique of differencing time series.
In general, a stationary time series will have no predictable patterns in the long-term.
Time plots will show the series to be roughly horizontal (although some cyclic
behaviour is possible), with constant variance.
Backshift notation
The backward shift operator $B$ is a useful notational device when working with time series lags:
$$B y_t = y_{t-1}.$$
(Some references use $L$ for "lag" instead of $B$ for "backshift".) In other words, $B$, operating on $y_t$, has the effect of shifting the data back one period. Two applications of $B$ to $y_t$ shift the data back two periods:
$$B(B y_t) = B^2 y_t = y_{t-2}.$$
For monthly data, if we wish to consider "the same month last year," the notation is $B^{12} y_t = y_{t-12}$.
The backward shift operator is convenient for describing the process of differencing. A first difference can be written as
$$y'_t = y_t - y_{t-1} = y_t - B y_t = (1-B) y_t.$$
Note that a first difference is represented by $(1-B)$. Similarly, if second-order differences have to be computed, then:
$$y''_t = y_t - 2y_{t-1} + y_{t-2} = (1 - 2B + B^2) y_t = (1-B)^2 y_t.$$
In general, a $d$th-order difference can be written as $(1-B)^d y_t$.
Backshift notation is particularly useful when combining differences, as the operator can be treated using ordinary algebraic rules. In particular, terms involving $B$ can be multiplied together. For example, a seasonal difference followed by a first difference can be written as
$$(1-B)(1-B^m) y_t = (1 - B - B^m + B^{m+1}) y_t = y_t - y_{t-1} - y_{t-m} + y_{t-m-1},$$
the same result we obtained earlier.
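In R these differences correspond to the diff() function; a small sketch, where y is any ts object and m is its seasonal period (both are placeholders here):
dy  <- diff(y)                   # first difference, (1 - B) y_t
d2y <- diff(y, differences = 2)  # second-order difference, (1 - B)^2 y_t
dsy <- diff(diff(y, lag = m))    # seasonal difference followed by a first difference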
Autoregressive models
In a multiple regression model, we forecast the variable of interest using a linear
combination of predictors. In an autoregression model, we forecast the variable of
interest using a linear combination of past values of the variable. The
term autoregression indicates that it is a regression of the variable against itself.
Changing the parameters $\phi_1,\dots,\phi_p$ results in different time series patterns. The variance of the error term $\varepsilon_t$ will only change the scale of the series, not the patterns.
When $p \ge 3$, the restrictions are much more complicated. R takes care of these restrictions when estimating a model.
Figure 8.6: Two examples of data from moving average models with different parameters. Left: MA(1) with $y_t = 20 + \varepsilon_t + 0.8\varepsilon_{t-1}$. Right: MA(2) with $y_t = \varepsilon_t - \varepsilon_{t-1} + 0.8\varepsilon_{t-2}$. In both cases, $\varepsilon_t$ is normally distributed white noise with mean zero and variance one.
Figure 8.6 shows some data from an MA(1) model and an MA(2) model. Changing the parameters $\theta_1,\dots,\theta_q$ results in different time series patterns. As with autoregressive models, the variance of the error term $\varepsilon_t$ will only change the scale of the series, not the patterns.
It is possible to write any stationary AR($p$) model as an MA($\infty$) model. For example, using repeated substitution, we can demonstrate this for an AR(1) model:
$$y_t = \phi_1 y_{t-1} + \varepsilon_t = \phi_1(\phi_1 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \phi_1^2 y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t = \phi_1^3 y_{t-3} + \phi_1^2 \varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t = \cdots$$
Provided $-1 < \phi_1 < 1$, the value of $\phi_1^k$ will get smaller as $k$ gets larger. So eventually we obtain
$$y_t = \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \phi_1^3 \varepsilon_{t-3} + \cdots,$$
an MA($\infty$) process.
The reverse result holds if we impose some constraints on the MA parameters. Then the MA model is called invertible. That is, we can write any invertible MA($q$) process as an AR($\infty$) process. Invertible models are not simply introduced to enable us to convert from MA models to AR models. They also have some desirable mathematical properties.
For example, consider the MA(1) process, $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$. In its AR($\infty$) representation, the most recent error can be written as a linear function of current and past observations:
$$\varepsilon_t = \sum_{j=0}^{\infty} (-\theta_1)^j y_{t-j}.$$
When $|\theta_1| > 1$, the weights increase as lags increase, so the more distant the observations the greater their influence on the current error. When $|\theta_1| = 1$, the weights are constant in size, and the distant observations have the same influence as the recent observations. As neither of these situations makes much sense, we require $|\theta_1| < 1$, so the most recent observations have higher weight than observations from the more distant past. Thus, the process is invertible when $|\theta_1| < 1$.
The invertibility constraints for other models are similar to the stationarity
constraints.
More complicated conditions hold for $q \ge 3$. Again, R will take care of these constraints when estimating the models.
Information Criteria
Akaike’s Information Criterion (AIC), which was useful in selecting predictors for
regression, is also useful for determining the order of an ARIMA model. It can be
written as
$$\text{AIC} = -2\log(L) + 2(p + q + k + 1),$$
where $L$ is the likelihood of the data, $k = 1$ if $c \ne 0$ and $k = 0$ if $c = 0$. Note that the last term in parentheses is the number of parameters in the model (including $\sigma^2$, the variance of the residuals).
For ARIMA models, the corrected AIC can be written as
$$\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T - p - q - k - 2},$$
and the Bayesian Information Criterion can be written as
$$\text{BIC} = \text{AIC} + [\log(T) - 2](p + q + k + 1).$$
Good models are obtained by minimising the AIC, AICc or BIC. Our preference is to use the AICc.
It is important to note that these information criteria tend not to be good guides to
selecting the appropriate order of differencing ( d) of a model, but only for selecting
the values of p and q. This is because the differencing changes the data on which the
likelihood is computed, making the AIC values between models with different orders
of differencing not comparable. So we need to use some other approach to choose d,
and then we can use the AICc to select p and q.
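This strategy (choose d separately, then select p and q by the AICc) is what the auto.arima() function in the forecast package automates; a minimal sketch, using the a10 series plotted earlier:
fit <- auto.arima(a10)  # selects d via unit-root tests, then p and q by minimising the AICc
summary(fit)
autoplot(forecast(fit, h = 24))  # two years of monthly forecasts from the chosen model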
References
Shah, B. S. Principles of Management.
Ramasamy, T. Principles of Management.
Reedy, P. N. Principles of Management.
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). Monash University, Australia; OTexts: Melbourne.
Armstrong, J. S. (1978). Long-range forecasting: From crystal ball to computer. John Wiley & Sons.
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26(2), 211–252.
Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3).
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679–688.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54(1–3), 159–178.