Chapter 1: Simple linear regression
• The simple linear regression model is also called the two-variable linear regression model or bivariate linear regression model because it relates
the two variables x and y.
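In the usual textbook notation, which is assumed here rather than taken from these slides, the relationship between the two variables can be sketched as follows, where \beta_0 (intercept), \beta_1 (slope) and u (error term) are illustrative symbol names:

```latex
y = \beta_0 + \beta_1 x + u
```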
The meaning of each term (1)
• The variables y and x have several different names that are used
interchangeably.
• y is called the dependent variable, the explained variable, the
response variable, the predicted variable, or the regressand.
• x is called the independent variable, the explanatory variable,
the control variable, the predictor variable, or the regressor.
• The terms “dependent variable” and “independent variable” are
frequently used in econometrics.
Examples
Deriving the OLS Estimates
Practice: Estimating parameters
• Let us approach the topic of regression analysis with an example. A mail order business adds a new
summer dress to its collection. The purchasing manager needs to know how many dresses to buy so that
by the end of the season the total quantity purchased equals the quantity ordered by customers. To
prevent stock shortages (i.e. customers going without wares) and stock surpluses (i.e. the business is left
stuck with extra dresses), the purchasing manager decides to carry out a sales forecast.
• What’s the best way to forecast sales? The economist immediately thinks of several possible predictors
or influencing variables. How high were sales of a similar dress in the previous year? How high is the
price? How large is the image of the dress in the catalogue? How large is the advertising budget for the
dress? But we don’t only want to know which independent variables exert an influence; we want to
know how large the respective influence is. To know that catalogue image size has an influence on the
number of orders does not suffice. We need to find out the number of orders that can be expected on
average when the image size is, say, 50 sq cm.
Let us first consider the case where future
demand is estimated from the sales of a
similar dress from the previous year. The
following figure displays the association as
a scatterplot for 100 dresses of a given price
category, with the future demand plotted on
the y-axis and the demand from the previous
year plotted on the x-axis.
If all the points lay on the 45-degree line (the angle bisector of the two axes, which divides the right angle
between them into two equal angles), the future demand of period (t) would equal the quantities sold in the
previous year (t-1). As is easy to see, this is only rarely the case. The resulting scatterplot contains some
large deviations, producing a correlation coefficient of only r = 0.42.
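As a rough illustration of how a correlation coefficient such as r is obtained, the sketch below computes Pearson's r from paired observations. The function name `pearson_r` and the six data pairs are hypothetical; they are not the 100 dresses shown in the figure, so the result will not reproduce r = 0.42.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: sales of similar dresses last year and this year
prev_year = [120, 150, 90, 200, 170, 130]
this_year = [140, 135, 110, 180, 210, 100]

r = pearson_r(prev_year, this_year)
print(f"r = {r:.2f}")
```

Values of r close to 1 indicate that the points lie near a rising straight line; values near 0 indicate little linear association.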
Now if, instead of equivalent dresses from
the previous year, we take into account the
catalogue image size for the current season
(t), we arrive at the scatterplot in the
following figure. We see immediately that
the data points lie much closer to the line,
which was drawn to best approximate the
course of the data. This line is better suited
for a sales forecast than a line produced
using the “equivalence method” in the
previous figure.
The relatively large correlation coefficient of r = 0.95 confirms that the linear
association between these variables is stronger. The points lie much closer to the line, which means
that the sales forecast will lead to lower costs from stock shortages and stock surpluses. But, again,
this applies only to products of the same quality and in a specific price category.
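One way to turn such a line into a forecast is to fit it by ordinary least squares, the method named in the "Deriving the OLS Estimates" heading, and then read off the expected number of orders at a given image size. The sketch below is a minimal example under stated assumptions: the helper `ols_fit` and the six observations are hypothetical and are not the data behind the figure.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: cross-products of deviations divided by squared deviations of x
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # Intercept: forces the line through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: catalogue image size (sq cm) and number of orders
image_size = [10, 20, 30, 40, 60, 80]
orders = [110, 160, 190, 260, 330, 440]

b0, b1 = ols_fit(image_size, orders)
forecast_50 = b0 + b1 * 50  # expected orders for a 50 sq cm image
print(f"orders = {b0:.1f} + {b1:.2f} * image_size")
print(f"forecast at 50 sq cm: {forecast_50:.0f}")
```

The slope answers the question of how large the influence is: it is the expected change in orders for each additional square centimetre of image size.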
Since we want to prevent both shortages and surpluses, we could position the line so
that the sum of deviations between the realized points and the
points on the line is as close to zero as possible. The problem
with this approach is that positive and negative deviations cancel out, so a variety
of possible lines with different qualities of fit all fulfil this condition. A selection
of possible lines is shown in the figure above.
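The ambiguity described above can be checked numerically. Any line that passes through the point of means (mean of x, mean of y) has deviations that sum to zero, whatever its slope. The sketch below uses hypothetical data and an arbitrary set of slopes; the sum of squared deviations is printed alongside only as one possible way to tell the candidate lines apart.

```python
def deviation_sums(x, y, intercept, slope):
    """Return (sum of deviations, sum of squared deviations) for one line."""
    deviations = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return sum(deviations), sum(d ** 2 for d in deviations)


# Hypothetical data: catalogue image size (sq cm) and number of orders
x = [10, 20, 30, 40, 60, 80]
y = [110, 160, 190, 260, 330, 440]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

results = []
for slope in (-2.0, 0.0, 3.0, 4.7):  # arbitrary candidate slopes
    # Choosing the intercept this way makes the line pass through the means
    intercept = mean_y - slope * mean_x
    total, total_sq = deviation_sums(x, y, intercept, slope)
    results.append((slope, total, total_sq))
    print(f"slope={slope:5.1f}  sum={total:10.6f}  sum of squares={total_sq:12.1f}")
```

Every candidate line reports a sum of deviations of zero up to rounding error, even the falling line with slope -2, while the sums of squared deviations differ widely.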