Lecture 1
Regression
• Financial analysts often need to examine whether a variable is
useful for explaining another variable.
• For example, the analyst may want to know whether earnings
growth, or perhaps cash flow growth, helps explain the
company’s value in the marketplace.
• Regression analysis is a tool for examining this type of issue.
• Suppose an analyst is examining the return on assets (ROA) for
an industry and observes the ROA for the six companies shown
in Exhibit 1.
• The average of these ROAs is 12.5%, but the range is from 4%
to 20%.
• In trying to understand why the ROAs differ among these companies, we could
look at why the ROA of Company A differs from that of Company B, why the
ROA of Company A differs from that of Company D, why the ROA of Company
F differs from that of Company C, and so on, comparing each pair of ROAs.
• A simpler approach is to try to understand why each company’s ROA differs
from the mean ROA of 12.5%.
• We look at the sum of squared deviations of the observations
from the mean to capture variations in ROA from their mean.
• Let Y represent the variable that we would like to explain,
which in this case is the return on assets. Let $Y_i$ represent an
observation of a company’s ROA, and let $\bar{Y}$ represent the mean
ROA for the sample of size n. We can describe the variation of
the ROAs as
$$\text{Variation of } Y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
• Our goal is to understand what drives these returns on assets or, in other
words, what explains the variation of Y. The variation of Y is often referred to
as the sum of squares total (SST), or the total sum of squares.
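As a rough illustration, the sum of squares total can be computed in a few lines of Python. Exhibit 1 is not reproduced in these notes, so the ROA values below are placeholders chosen only to be consistent with the stated mean of 12.5% and range of 4% to 20%; the names `roa` and `sum_of_squares_total` are likewise hypothetical.

```python
# Placeholder ROA values in percent for six companies.
# Assumption: these are illustrative numbers, not the Exhibit 1 data.
roa = [6.0, 4.0, 15.0, 20.0, 10.0, 20.0]


def sum_of_squares_total(values):
    """Return the sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


mean_roa = sum(roa) / len(roa)
sst = sum_of_squares_total(roa)
print(f"mean ROA = {mean_roa:.2f}%, SST = {sst:.2f}")
```

Squaring the deviations keeps positive and negative differences from cancelling, which is why SST is zero only when every observation equals the mean.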
• We now ask whether it is possible to explain the variation of the ROA using
another variable that also varies among the companies; note that if this
other variable is constant or random, it would not serve to explain why the
ROAs differ from one another.
• Suppose the analyst believes that the capital expenditures in the previous
period, scaled by the prior period’s beginning property, plant, and
equipment, are a driver for the ROA variable.
• Let us represent this scaled capital expenditures variable as CAPEX, as we
show in Exhibit 2.
• The variation of X, in this case CAPEX, is calculated as
$$\text{Variation of } X = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
• We can see the relation between ROA and CAPEX in the scatter
plot.
• We refer to the variable whose variation is being explained as the dependent
variable, or the explained variable; it is typically denoted by Y.
• We refer to the variable whose variation is being used to explain the variation
of the dependent variable as the independent variable, or the explanatory
variable; it is typically denoted by X.
• In our example, the ROA is the dependent variable (Y) and CAPEX is the
independent variable (X).
• A common method for relating the dependent and independent variables is
through the estimation of a linear relationship, which implies describing the
relation between the two variables as represented by a straight line.
• If we have only one independent variable, we refer to the method as simple
linear regression (SLR); if we have more than one independent variable, we
refer to the method as multiple regression.
• Linear regression allows us to test hypotheses about the relationship between
two variables by quantifying the strength of that relationship, and to use one
variable to make predictions about the other variable.
• Our focus is on linear regression with a single independent variable—that is,
simple linear regression.
Identifying the Dependent and Independent Variables in a
Regression
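$$Y_i = b_0 + b_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$$
where $b_0$ is the intercept, $b_1$ is the slope coefficient, and
$\varepsilon_i$ is the error term.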
Estimating the Regression Line
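• Fitting the line requires minimizing the sum of the squared
residuals, the sum of squares error (SSE), also known as the
residual sum of squares:
$$SSE = \sum_{i=1}^{n} (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
where $\hat{b}_0$ and $\hat{b}_1$ are the estimated intercept and slope,
$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$ is the fitted value, and $e_i$ is the
residual for observation i.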
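To make the minimization criterion concrete, the sketch below evaluates the SSE for two candidate lines. The data are placeholders (Exhibits 1 and 2 are not reproduced in these notes), and the function name `sse` and both candidate lines are hypothetical choices made only for illustration.

```python
# Placeholder data: ROA in percent (Y) and scaled CAPEX (X) for six companies.
# Assumption: illustrative numbers only, not the values from the exhibits.
roa = [6.0, 4.0, 15.0, 20.0, 10.0, 20.0]
capex = [0.7, 0.4, 5.0, 10.0, 8.0, 12.5]


def sse(y, x, b0, b1):
    """Sum of squared residuals for the candidate line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for yi, xi in zip(y, x))


# Candidate 1: a horizontal line at the mean of Y (slope of zero).
mean_roa = sum(roa) / len(roa)
sse_flat = sse(roa, capex, mean_roa, 0.0)

# Candidate 2: an arbitrary upward-sloping line (hypothetical coefficients).
sse_sloped = sse(roa, capex, 5.0, 1.2)

print(f"SSE, flat line at the mean: {sse_flat:.3f}")
print(f"SSE, sloped line:           {sse_sloped:.3f}")
```

The flat line ignores X entirely, so its SSE equals the total variation of Y. A line with a smaller SSE fits the observations more closely, and the least squares line is the one with the smallest SSE of all.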
• Once we estimate the slope, we can then estimate the intercept
using the mean of Y and the mean of X:
$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$
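The two estimation steps can be sketched as follows, using the standard least squares result that the slope is the covariation of X and Y divided by the variation of X, $\hat{b}_1 = \sum (Y_i - \bar{Y})(X_i - \bar{X}) / \sum (X_i - \bar{X})^2$. The data are placeholders rather than the exhibit values, and `fit_simple_linear_regression` is a hypothetical helper name.

```python
# Placeholder data: ROA in percent (Y) and scaled CAPEX (X) for six companies.
# Assumption: illustrative numbers only, not the values from the exhibits.
roa = [6.0, 4.0, 15.0, 20.0, 10.0, 20.0]
capex = [0.7, 0.4, 5.0, 10.0, 8.0, 12.5]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation of X and Y: sum of cross-products of deviations from the means.
    covariation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Variation of X: sum of squared deviations of X from its mean.
    variation_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariation / variation_x
    # The intercept follows from the means, so the line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0_hat, b1_hat = fit_simple_linear_regression(capex, roa)
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```

Note that the slope is undefined when X does not vary, because the variation of X in the denominator would be zero; this matches the earlier point that a constant variable cannot explain differences in Y.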
Interpreting the Regression Coefficients