A Brief Overview of The Classical Linear Regression Model (CLRM)

Chapter 3
A brief overview of the classical linear regression model (CLRM)
‘Introductory Econometrics for Finance’ © Chris Brooks 2013

Regression
• Regression is probably the single most important tool at the econometrician’s disposal.
But what is regression analysis?
• It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variable(s)).

Some Notation
• Denote the dependent variable by y and the independent variable(s) by x1, x2,
... , xk where there are k independent variables.
• Some alternative names for the y and x variables:

y                      x
dependent variable     independent variables
regressand             regressors
effect variable        causal variables
explained variable     explanatory variables
• Note that there can be many x variables, but we will limit ourselves to the
case where there is only one x variable to start with. In our set-up, there is
only one y variable.
Regression is different from Correlation
• If we say y and x are correlated, it means that we are treating y and x in a completely symmetrical way.
• In regression, we treat the dependent variable (y) and the independent variable(s) (x’s) very differently. The y variable is assumed to be random or “stochastic” in some way, i.e. to have a probability distribution. The x variables are, however, assumed to have fixed (“non-stochastic”) values in repeated samples.

Simple Regression
• For simplicity, say k=1. This is the situation where y depends on only one x
variable.
• Examples of the kind of relationship that may be of interest include:

– How asset returns vary with their level of market risk
– Measuring the long-term relationship between stock prices and
dividends.
– Constructing an optimal hedge ratio

Simple Regression: An Example
• Suppose that we have the following data on the excess returns on a fund
manager’s portfolio (“fund XXX”) together with the excess returns on a
market index:
Year, t    Excess return on fund XXX    Excess return on market index
           = rXXX,t - rft               = rmt - rft
1          17.8                         13.7
2          39.0                         23.2
3          12.8                         6.9
4          24.2                         16.8
5          17.2                         12.3
• We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between
x and y given the data that we have. The first stage would be to form a
scatter plot of the two variables.

Graph (Scatter Diagram)
[Figure: scatter plot of the excess return on fund XXX (vertical axis, 0 to 45) against the excess return on the market portfolio (horizontal axis, 0 to 25)]

Finding a Line of Best Fit
• We can use the general equation for a straight line,
$y = a + bx$
to get the line that best “fits” the data.
• However, this equation ($y = a + bx$) is completely deterministic.
• Is this realistic? No. So what we do is to add a random disturbance term, $u$, into the equation:
$y_t = \alpha + \beta x_t + u_t$
where $t = 1, 2, 3, 4, 5$

Why do we include a Disturbance term?
• The disturbance term can capture a number of features:
- We always leave out some determinants of yt

- There may be errors in the measurement of yt that cannot be
modelled.
- Random outside influences on yt which we cannot model

Determining the Regression Coefficients
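• So how do we determine what $\alpha$ and $\beta$ are?
• Choose $\hat{\alpha}$ and $\hat{\beta}$ so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).
[Figure: data points scattered around a fitted line, with y on the vertical axis and x on the horizontal axis]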
Ordinary Least Squares
• The most common method used to fit a line to the data is known as
OLS (ordinary least squares).
• What we actually do is take each distance and square it (i.e. take the
area of each of the squares in the diagram) and minimise the total sum
of the squares (hence least squares).
• Tightening up the notation, let

yt denote the actual data point t
ŷt denote the fitted value from the regression line
ût denote the residual, yt - ŷt

Actual and Fitted Value
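[Figure: for observation i, the actual value $y_i$, the fitted value $\hat{y}_i$ on the regression line at $x_i$, and the residual $\hat{u}_i$ as the vertical distance between them]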

How OLS Works
• So minimise $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the residual sum of squares.
• But what was $\hat{u}_t$? It was the difference between the actual point and the line, $y_t - \hat{y}_t$.
• So minimising $\sum_t (y_t - \hat{y}_t)^2$ is equivalent to minimising $\sum_t \hat{u}_t^2$ with respect to $\hat{\alpha}$ and $\hat{\beta}$.

What do We Use $\hat{\alpha}$ and $\hat{\beta}$ For?
• In the CAPM example used above, plugging the 5 observations into the OLS formulae would lead to the estimates $\hat{\alpha} = -1.74$ and $\hat{\beta} = 1.64$. We would write the fitted line as:
$\hat{y}_t = -1.74 + 1.64 x_t$
• Question: If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be?
• Solution: We can say that the expected value of y = “-1.74 + 1.64 × value of x”, so plug x = 20 into the equation to get the expected value for y:
$\hat{y}_i = -1.74 + 1.64 \times 20 = 31.06$
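As a check on the numbers above, the following minimal sketch fits the line to the five observations using the standard closed-form OLS expressions for a single regressor (slope = sum of cross-deviations divided by sum of squared x-deviations; intercept = mean of y minus slope times mean of x). The function name `ols_simple` is purely illustrative and is not part of the slides.

```python
# Excess returns from the table above: x is the market index, y is fund XXX.
x = [13.7, 23.2, 6.9, 16.8, 12.3]
y = [17.8, 39.0, 12.8, 24.2, 17.2]

def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) for the model y = alpha + beta*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

alpha_hat, beta_hat = ols_simple(x, y)

# Fitted excess return on the fund when the market excess return is 20.
forecast = alpha_hat + beta_hat * 20
print(round(alpha_hat, 2), round(beta_hat, 2), round(forecast, 2))
```

The forecast computed from the unrounded coefficients differs slightly from 31.06, which uses the coefficients rounded to two decimal places.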
Accuracy of Intercept Estimate
• Care needs to be exercised when considering the intercept estimate, particularly if there are no or few observations close to the y-axis.
[Figure: fitted regression line with y on the vertical axis and x on the horizontal axis, origin at 0]

The Population and the Sample
• The population is the total collection of all objects or people to be studied, for example:

Interested in                         Population of interest
predicting outcome of an election     the entire electorate

• A sample is a selection of just some items from the population.
• A random sample is a sample in which each individual item in the population is equally likely to be drawn.

The PRF and the SRF
• The population regression function (PRF) is a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e. the true values of $\alpha$ and $\beta$).
• The PRF is $y_t = \alpha + \beta x_t + u_t$
• The SRF (the sample regression function) is $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, and we also know that $\hat{u}_t = y_t - \hat{y}_t$.
• We use the SRF to infer likely values of the PRF.
• We also want to know how “good” our estimates of $\alpha$ and $\beta$ are.

Linearity
• In order to use OLS, we need a model which is linear in the parameters ($\alpha$ and $\beta$). It does not necessarily have to be linear in the variables (y and x).
• Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed etc.
• Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model
$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \iff \ln Y_t = \alpha + \beta \ln X_t + u_t$
• Then let $y_t = \ln Y_t$ and $x_t = \ln X_t$:
$y_t = \alpha + \beta x_t + u_t$
Linear and Non-linear Models
• This is known as the exponential regression model. Here, the coefficients can be interpreted as elasticities.
• Similarly, if theory suggests that y and x should be inversely related:
$y_t = \alpha + \frac{\beta}{x_t} + u_t$
then the regression can be estimated using OLS by substituting
$z_t = \frac{1}{x_t}$
• But some models are intrinsically non-linear, e.g.
$y_t = \alpha + x_t^{\beta} + u_t$
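To illustrate the log transformation of the exponential regression model, here is a small sketch. The parameter values and the X series are hypothetical, and the disturbance is set to zero so that OLS on the logged data recovers the parameters exactly.

```python
import math

# Hypothetical "true" parameters and regressor values, for illustration only.
alpha, beta = 0.5, 1.5
X = [1.0, 2.0, 4.0, 8.0, 16.0]

# Exponential model Y = exp(alpha) * X**beta, with the disturbance set to zero.
Y = [math.exp(alpha) * Xi ** beta for Xi in X]

# Taking logs gives a model that is linear in alpha and beta.
x = [math.log(Xi) for Xi in X]
y = [math.log(Yi) for Yi in Y]

# Simple-regression OLS formulas applied to the transformed data.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar
print(alpha_hat, beta_hat)
```

With a non-zero disturbance the estimates would only approximate the true values; the point here is that the transformed model can be estimated with the ordinary linear formulas.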

Estimator or Estimate?
• Estimators are the formulae used to calculate the coefficients
• Estimates are the actual numerical values for the coefficients.

The Assumptions Underlying the
Classical Linear Regression Model (CLRM)
• The model which we have used is known as the classical linear regression model.
• We observe data for xt, but since yt also depends on ut, we must be specific about
how the ut are generated.
• We usually make the following set of assumptions about the ut’s (the
unobservable error terms):
Technical Notation                               Interpretation
1. $E(u_t) = 0$                                  The errors have zero mean
2. $\mathrm{Var}(u_t) = \sigma^2 < \infty$       The variance of the errors is constant and finite over all values of $x_t$
3. $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$   The errors are uncorrelated with (linearly independent of) one another
4. $\mathrm{Cov}(u_t, x_t) = 0$                  No relationship between the error and corresponding x variate
The Assumptions Underlying the
CLRM Again
• An alternative assumption to 4., which is slightly stronger, is that the $x_t$’s are non-stochastic or fixed in repeated samples.
• A fifth assumption is required if we want to make inferences about the population parameters (the actual $\alpha$ and $\beta$) from the sample parameters ($\hat{\alpha}$ and $\hat{\beta}$).
• Additional Assumption
5. $u_t$ is normally distributed

Note
• Remember!
Variance and covariance are mathematical terms frequently used in
statistics and probability theory. Variance refers to the spread of a
data set around its mean value, while a covariance refers to the
measure of the directional relationship between two random
variables.
• Be careful!
Stochastic = random
Non-stochastic = Non random = deterministic = fixed
Instructor’s note
Properties of the OLS Estimator
• If assumptions 1. through 4. hold, then the estimators $\hat{\alpha}$ and $\hat{\beta}$ determined by OLS are known as Best Linear Unbiased Estimators (BLUE).
What does the acronym stand for?
• “Estimator” - $\hat{\beta}$ is an estimator of the true value of $\beta$.
• “Linear” - $\hat{\beta}$ is a linear estimator
• “Unbiased” - On average, the actual values of $\hat{\alpha}$ and $\hat{\beta}$ will be equal to the true values.
• “Best” - means that the OLS estimator $\hat{\beta}$ has minimum variance among the class of linear unbiased estimators. The Gauss-Markov theorem proves that the OLS estimator is best.

An Introduction to Statistical Inference
• We want to make inferences about the likely population values from the regression parameters.
Example: Suppose we have the following regression results (standard errors in parentheses):

$\hat{y}_t = 20.3 + 0.5091 x_t$
      (14.38)   (0.2561)
• $\hat{\beta} = 0.5091$ is a single (point) estimate of the unknown population parameter, $\beta$. How “reliable” is this estimate?
• The reliability of the point estimate is measured by the coefficient’s standard error.

Hypothesis Testing: Some Concepts
• We can use the information in the sample to make inferences about the
population.
• We will always have two hypotheses that go together, the null hypothesis
(denoted H0) and the alternative hypothesis (denoted H1).
• The null hypothesis is the statement or the statistical hypothesis that is actually
being tested. The alternative hypothesis represents the remaining outcomes of
interest.
• For example, suppose that, given the regression results above, we are interested in the hypothesis that the true value of $\beta$ is in fact 0.5. We would use the notation
$H_0: \beta = 0.5$
$H_1: \beta \neq 0.5$
This would be known as a two-sided test.

One-Sided Hypothesis Tests
• Sometimes we may have some prior information that, for example, we would expect $\beta > 0.5$ rather than $\beta < 0.5$. In this case, we would do a one-sided test:
$H_0: \beta = 0.5$
$H_1: \beta > 0.5$
or we could have had
$H_0: \beta = 0.5$
$H_1: \beta < 0.5$
• There are two ways to conduct a hypothesis test: via the test of
significance approach or via the confidence interval approach.

Testing Hypotheses:
The Test of Significance Approach
• Assume the regression equation is given by
$y_t = \alpha + \beta x_t + u_t$ for $t = 1, 2, ..., T$
• The steps involved in doing a test of significance are:
1. Estimate $\hat{\alpha}$, $\hat{\beta}$ and $SE(\hat{\alpha})$, $SE(\hat{\beta})$ in the usual way
2. Calculate the test statistic. This is given by the formula
$\text{test statistic} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$
where $\beta^*$ is the value of $\beta$ under the null hypothesis.

The Test of Significance Approach (cont’d)
3. We need some tabulated distribution with which to compare the estimated test statistics. Test statistics derived in this way can be shown to follow a t-distribution with T-2 degrees of freedom.
As the number of degrees of freedom increases, we need to be less cautious in our approach since we can be more sure that our results are robust.
4. We need to choose a “significance level”, often denoted $\alpha$. This is also sometimes called the size of the test and it determines the region where we will reject or not reject the null hypothesis that we are testing.
The intuitive explanation of a 5% level is that, if the null hypothesis were true, we would only expect a result as extreme as this or more extreme 5% of the time as a consequence of chance alone.
It is conventional to use a 5% size of test, but 10% and 1% are also commonly used.
Determining the Rejection Region for a Test of
Significance
5. Given a significance level, we can determine a rejection region and non-rejection region. For a 2-sided test:
[Figure: density f(x) with a 2.5% rejection region in each tail and a 95% non-rejection region in the centre]

The Rejection Region for a 1-Sided Test (Upper Tail)
[Figure: density f(x) with a 95% non-rejection region and a 5% rejection region in the upper tail]

The Rejection Region for a 1-Sided Test (Lower Tail)
[Figure: density f(x) with a 5% rejection region in the lower tail and a 95% non-rejection region]

The Test of Significance Approach: Drawing
Conclusions
6. Use the t-tables to obtain a critical value or values with which to compare the test statistic.
7. Finally perform the test. If the test statistic lies in the rejection region then reject the null hypothesis (H0), else do not reject H0.

A Note on the t and the Normal Distribution
• You should all be familiar with the normal distribution and its
characteristic “bell” shape.
• We can scale a normal variate to have zero mean and unit variance by
subtracting its mean and dividing by its standard deviation.
• There is, however, a specific relationship between the t- and the standard normal distribution. Both are symmetrical and centred on zero. The t-distribution has another parameter, its degrees of freedom. We will always know this (for the time being, it is the number of observations minus 2).

What Does the t-Distribution Look Like?
[Figure: densities of the normal distribution and the t-distribution plotted together]

The Confidence Interval Approach
to Hypothesis Testing
• An example of its usage: We estimate a parameter, say $\hat{\beta}$, to be 0.93, and a “95% confidence interval” to be (0.77, 1.09). This means that we are 95% confident that the interval contains the true (but unknown) value of $\beta$.
• Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.

Determining the Rejection Region
[Figure: density f(x) with 2.5% rejection regions in each tail, bounded by the critical values -2.086 and +2.086]
Performing the Test
• The hypotheses are:
$H_0: \beta = 1$
$H_1: \beta \neq 1$

Test of significance approach:
$\text{test stat} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$
Do not reject H0 since the test stat lies within the non-rejection region.

Confidence interval approach:
$\hat{\beta} \pm t_{crit} \times SE(\hat{\beta}) = 0.5091 \pm 2.086 \times 0.2561 = (-0.0251, 1.0433)$
Since 1 lies within the confidence interval, do not reject H0.
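The two approaches can be reproduced with a few lines of arithmetic. This is only a sketch: the estimate, standard error and critical value are the ones quoted in the slides, and the variable names are illustrative.

```python
beta_hat = 0.5091   # slope estimate from the regression results
se_beta = 0.2561    # its standard error
beta_null = 1.0     # value of beta under H0
t_crit = 2.086      # 5% two-sided critical value quoted in the slides

# Test of significance approach.
test_stat = (beta_hat - beta_null) / se_beta
reject_by_test = abs(test_stat) > t_crit

# Confidence interval approach.
lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta
reject_by_interval = not (lower <= beta_null <= upper)

print(round(test_stat, 3), round(lower, 4), round(upper, 4))
print(reject_by_test, reject_by_interval)
```

Both approaches use the same three numbers, so they must always lead to the same decision for a given significance level.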
Some More Terminology
• If we reject the null hypothesis at the 5% level, we say that the result
of the test is statistically significant.
• Note that a statistically significant result may be of no practical significance. E.g. if a shipment of cans of beans is expected to weigh 450g per tin, but the actual mean weight of some tins is 449g, the result may be highly statistically significant but presumably nobody would care about 1g of beans.

The Errors That We Can Make
Using Hypothesis Tests
• We usually reject H0 if the test statistic is statistically significant at a chosen significance level.
• There are two possible errors we could make:
1. Rejecting H0 when it was really true. This is called a type I error.
2. Not rejecting H0 when it was in fact false. This is called a type II error.

                                       Reality
Result of test                         H0 is true                  H0 is false
Significant (reject H0)                Type I error = $\alpha$     correct
Insignificant (do not reject H0)       correct                     Type II error = $\beta$
The Trade-off Between Type I and Type II Errors
• The probability of a type I error is just $\alpha$, the significance level or size of test we chose. To see this, recall what we said significance at the 5% level meant: it is only 5% likely that a result as extreme as this, or more extreme, could have occurred purely by chance.
• Note that there is no chance for a free lunch here! What happens if we reduce the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of making a type I error ... but we also reduce the probability that we will reject the null hypothesis at all, so we increase the probability of a type II error:
Reduce size of test → more strict criterion for rejection → reject null hypothesis less often →
- less likely to falsely reject
- more likely to incorrectly not reject
• So there is always a trade-off between type I and type II errors when choosing a significance level. The only way we can reduce the chances of both is to increase the sample size.
The Exact Significance Level or p-value
• This is equivalent to choosing an infinite number of critical t-values from tables. It gives us the marginal significance level where we would be indifferent between rejecting and not rejecting the null hypothesis.
• If the test statistic is large in absolute value, the p-value will be small, and vice versa. The p-value gives the plausibility of the null hypothesis.
e.g. a test statistic of 1.47 is distributed as $t_{62}$. The p-value = 0.12.
We reject the null whenever the p-value is smaller than the chosen significance level:
• Do we reject at the 5% level? No
• Do we reject at the 10% level? No
• Do we reject at the 20% level? Yes
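The decision rule behind these three answers is simply a comparison of the p-value with each candidate significance level, as this short sketch shows (the dictionary name `decisions` is illustrative).

```python
p_value = 0.12  # p-value quoted in the example above

# Reject H0 at a given level only if the p-value is below that level.
decisions = {level: p_value < level for level in (0.05, 0.10, 0.20)}

for level, reject in decisions.items():
    print(f"{level:.0%} level: {'reject' if reject else 'do not reject'} H0")
```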

Chapter 4
Further development and analysis of the classical linear regression model

Generalising the Simple Model to
Multiple Linear Regression
• Before, we have used the model
$y_t = \alpha + \beta x_t + u_t$, $t = 1, 2, ..., T$
• But what if our dependent (y) variable depends on more than one
independent variable?
For example the number of cars sold might plausibly depend on
1. the price of cars
2. the price of public transport
3. the price of petrol
4. the extent of the public’s concern about global warming
• Similarly, stock returns might depend on several factors.
• Having just one independent variable is no good in this case - we want to
have more than one x variable. It is very easy to generalise the simple
model to one with k-1 regressors (independent variables).

Multiple Regression and the Constant Term
• Now we write
$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + ... + \beta_k x_{kt} + u_t$, $t = 1, 2, ..., T$
• Where is $x_1$? It is the constant term. In fact the constant term is usually represented by a column of ones of length T:
$x_1 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$
$\beta_1$ is the coefficient attached to the constant term (which we called $\alpha$ before).

The OLS Estimator for the
Multiple Regression Model
• In order to obtain the parameter estimates, $\beta_1, \beta_2, ..., \beta_k$, we would minimise the RSS with respect to all the $\beta$s.
• It can be shown that
$\hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1} X'y$

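The matrix formula for the multiple regression OLS estimator can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a production routine: the data are hypothetical and noise-free, the helper names are invented here, and the code solves the normal equations $(X'X)\hat{\beta} = X'y$ by Gaussian elimination rather than forming the inverse explicitly, which gives the same $\hat{\beta}$.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def ols(X, y):
    """Return beta_hat solving (X'X) beta_hat = X'y."""
    columns = list(zip(*X))  # columns of X, i.e. rows of X'
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return solve(XtX, Xty)

# Hypothetical data generated exactly from y = 1 + 2*x2 + 3*x3 (no disturbance).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
X = [[1.0, a, b] for a, b in zip(x2, x3)]  # first column is the constant term
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x2, x3)]

beta_hat = ols(X, y)
print([round(b, 6) for b in beta_hat])
```

Note that the first column of X is the column of ones, so the first element of `beta_hat` is the intercept.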
Testing Multiple Hypotheses: The F-test
• We used the t-test to test single hypotheses, i.e. hypotheses involving only
one coefficient. But what if we want to test more than one coefficient
simultaneously?
• We do this using the F-test. The F-test involves estimating 2 regressions.
• The unrestricted regression is the one in which the coefficients are freely
determined by the data, as we have done before.
• The restricted regression is the one in which the coefficients are restricted, i.e. the restrictions are imposed on some $\beta$s.

The F-Distribution
• The test statistic follows the F-distribution, which has 2 d.f. parameters.
• The values of the degrees of freedom parameters are m and (T-k) respectively, where m is the number of restrictions (the order of the d.f. parameters is important).
• The appropriate critical value will be in column m, row (T-k).
• The F-distribution has only positive values and is not symmetrical. We therefore only reject the null if the test statistic > critical F-value.

What we Cannot Test with Either an F or a t-test
We cannot use this framework to test hypotheses which are not linear or which are multiplicative, e.g.
$H_0: \beta_2 \beta_3 = 2$ or $H_0: \beta_2^2 = 1$
cannot be tested.

The Relationship between the t and the F-
Distributions
• Any hypothesis which could be tested with a t-test could have been tested using an F-test, but not the other way around.
For example, consider the hypothesis
$H_0: \beta_2 = 0.5$
$H_1: \beta_2 \neq 0.5$
We could have tested this using the usual t-test:
$\text{test stat} = \frac{\hat{\beta}_2 - 0.5}{SE(\hat{\beta}_2)}$
or it could be tested in the framework above for the F-test.
• Note that the two tests always give the same result since the t-distribution is just a special case of the F-distribution (the square of a $t_{T-k}$ variate follows an $F(1, T-k)$ distribution).

Note
What is an appropriate sample size for model estimation?
• In general, as many observations as possible should be used.
• Why? Because sampling error is reduced by increasing the size of the sample, since the larger the sample, the less likely it is that all of the data drawn will be unrepresentative of the population.
Data Mining
• Data mining is searching many series for statistical relationships without theoretical justification.
• Data mining or data snooping is trying many variables in a
regression without basing the selection of the candidate variables
on a financial or economic theory.
• For example, suppose we generate one dependent variable and twenty
explanatory variables completely randomly and independently of each
other.
• If we regress the dependent variable separately on each independent
variable, on average one slope coefficient will be significant at 5%.
• If data mining occurs, the true significance level will be greater than
the nominal significance level.
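The arithmetic behind the twenty-regressor example can be sketched as follows. It assumes, as in the example, that the twenty tests are independent and that each has a true size of 5%; the variable names are illustrative.

```python
n_regressions = 20   # one regression per randomly generated explanatory variable
level = 0.05         # nominal significance level of each individual test

# Expected number of spuriously "significant" slopes.
expected_false_positives = n_regressions * level

# Probability that at least one of the twenty slopes appears significant.
prob_at_least_one = 1 - (1 - level) ** n_regressions

print(expected_false_positives, round(prob_at_least_one, 4))
```

The second figure is the sense in which the true significance level of a search over many variables exceeds the nominal 5%.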
Goodness of Fit Statistics
• We would like some measure of how well our regression model actually fits
the data.
• We have goodness of fit statistics to test this: i.e. how well the sample regression function (SRF) fits the data.
• The most common goodness of fit statistic is known as $R^2$. One way to define $R^2$ is to say that it is the square of the correlation coefficient between $y$ and $\hat{y}$.
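As an illustration of this definition, the sketch below computes $R^2$ for the fund XXX regression from earlier in these slides in two ways: as the squared correlation between $y$ and $\hat{y}$, and as one minus the ratio of the residual sum of squares to the total sum of squares about the mean. The second form is a standard equivalent definition for a regression with an intercept; it is included here as a cross-check and is not taken from the slides.

```python
# Fund XXX example data: x is the market excess return, y is the fund excess return.
x = [13.7, 23.2, 6.9, 16.8, 12.3]
y = [17.8, 39.0, 12.8, 24.2, 17.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar
y_hat = [alpha_hat + beta_hat * xi for xi in x]

def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

r2_corr = correlation(y, y_hat) ** 2

rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
r2_rss = 1 - rss / tss

print(r2_corr, r2_rss)
```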

The Limit Cases: R2 = 0 and R2 = 1
[Figure: two panels plotting $y_t$ against $x_t$, one for the case $R^2 = 0$ and one for the case $R^2 = 1$]

Problems with R2 as a Goodness of Fit Measure
• There are a number of them:
1. $R^2$ is defined in terms of variation about the mean of y so that if a model is reparameterised (rearranged) and the dependent variable changes, $R^2$ will change.
2. $R^2$ never falls if more regressors are added to the regression, e.g. consider:
Regression 1: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t$
Regression 2: $y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t$
$R^2$ will always be at least as high for regression 2 relative to regression 1.
3. $R^2$ quite often takes on values of 0.9 or higher for time series regressions.

Adjusted R2
• In order to get around these problems, a modification is often made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as $\bar{R}^2$, or adjusted $R^2$:
$\bar{R}^2 = 1 - \left[ \frac{T-1}{T-k} (1 - R^2) \right]$
• So if we add an extra regressor, k increases and unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall.
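A small numerical sketch of the adjusted $R^2$ formula follows. The sample size and $R^2$ values are hypothetical, chosen so that adding a regressor raises $R^2$ only slightly.

```python
def adjusted_r2(r2, T, k):
    """Adjusted R-squared for T observations and k parameters (including the constant)."""
    return 1 - (T - 1) / (T - k) * (1 - r2)

# Hypothetical case: adding a fourth parameter lifts R-squared from 0.500 to 0.505.
before = adjusted_r2(0.500, T=21, k=3)
after = adjusted_r2(0.505, T=21, k=4)

print(round(before, 4), round(after, 4))
```

Here the small gain in $R^2$ does not offset the lost degree of freedom, so the adjusted measure falls.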

A Regression Example:
Hedonic House Pricing Models
• Hedonic models are used to value real assets, especially housing, and view the
asset as representing a bundle of characteristics.
• Des Rosiers and Thérialt (1996) consider the effect of various amenities on rental values for buildings and apartments in 5 sub-markets in the Quebec area of Canada.
• The rental value in Canadian Dollars per month (the dependent variable) is a
function of 9 to 14 variables (depending on the area under consideration). The
paper employs 1990 data, and for the Quebec City region, there are 13,378
observations, and the 12 explanatory variables are:
LnAGE - log of the apparent age of the property
NBROOMS - number of bedrooms
AREABYRM - area per room (in square metres)
ELEVATOR - a dummy variable = 1 if the building has an elevator; 0 otherwise
BASEMENT - a dummy variable = 1 if the unit is located in a basement; 0
otherwise
Hedonic House Pricing Models:
Variable Definitions
OUTPARK - number of outdoor parking spaces

INDPARK - number of indoor parking spaces
NOLEASE - a dummy variable = 1 if the unit has no lease attached to it; 0
otherwise
LnDISTCBD - log of the distance in kilometres to the central business district
SINGLPAR - percentage of single parent families in the area where the
building stands
DSHOPCNTR - distance in kilometres to the nearest shopping centre
VACDIFF1 - vacancy difference between the building and the census figure
• Examine the signs and sizes of the coefficients.

– The coefficient estimates themselves show the Canadian dollar rental price
per month of each feature of the dwelling.
Hedonic House Price Results
Dependent Variable: Canadian Dollars per Month

Variable      Coefficient    t-ratio    A priori sign expected
Intercept        282.21        56.09    +
LnAGE            -53.10       -59.71    -
NBROOMS           48.47       104.81    +
AREABYRM           3.97        29.99    +
ELEVATOR          88.51        45.04    +
BASEMENT         -15.90       -11.32    -
OUTPARK            7.17         7.07    +
INDPARK           73.76        31.25    +
NOLEASE          -16.99        -7.62    -
LnDISTCBD          5.84         4.60    -
SINGLPAR          -4.27       -38.88    -
DSHOPCNTR        -10.04        -5.97    -
VACDIFF1           0.29         5.98    -

Notes: Adjusted $R^2$ = 0.651; regression F-statistic = 2082.27. Source: Des Rosiers and Thérialt (1996). Reprinted with permission of the American Real Estate Society.

• The adjusted R squared ($\bar{R}^2$) value indicates that 65% of the total variability of rental prices about their mean value is explained by the model.

Quantile Regression - Background
• Standard regression approaches effectively model the (conditional) mean of the dependent variable
• We could calculate from the fitted regression line the value that y
would take for any values of the explanatory variables
• But this would be an extrapolation of the behaviour of the relationship
between y and x at the mean to the remainder of the data
• This approach will often be suboptimal
• For example, there might be a non-linear (e.g., ∩-shaped) relationship
between x and y
• Estimating a standard linear regression model may lead to seriously
misleading estimates of this relationship as it will ‘average’ the
positive and negative effects.

Quantile Regression – Background 2
• It would be possible to include non-linear (i.e. polynomial) terms in the regression model (for example, squared, cubic, . . . terms)
• But quantile regressions represent a more natural and flexible way to
capture the complexities by estimating models for the conditional
quantile functions
• Quantile regressions can be conducted in both time-series and cross-
sectional contexts
• It is usually assumed that the dependent variable, often called the
response variable, is independently distributed and homoscedastic
• Quantile regressions are more robust to outliers and non-
normality than OLS regressions
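The robustness to outliers mentioned above can be illustrated with the loss function that quantile methods minimise. The sketch below uses the standard "check" (or pinball) loss from the quantile regression literature, which is not stated in these slides, and applies it to a hypothetical sample with one outlier and no explanatory variables, so the fitted "regression" is just a constant.

```python
def check_loss(c, data, tau):
    """Check (pinball) loss of the constant c for quantile level tau."""
    return sum(tau * (v - c) if v >= c else (1 - tau) * (c - v) for v in data)

# Hypothetical sample containing one extreme observation.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# The loss is piecewise linear, so a minimiser can be found among the data points.
median_fit = min(data, key=lambda c: check_loss(c, data, tau=0.5))

# The mean is the constant that minimises the sum of squared errors instead.
mean_fit = sum(data) / len(data)

print(median_fit, mean_fit)
```

The constant minimising the check loss at tau = 0.5 is the sample median, which is unaffected by how extreme the outlier is, whereas the least squares fit (the mean) is pulled towards it.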

Quantile Regression – Background 3
• Quantile regression is a non-parametric technique since no distributional assumptions are required to optimally estimate the parameters
• The notation and approaches commonly used in quantile regression modelling are different to those that we are familiar with in financial econometrics
• Increased interest in modelling the ‘tail behaviour’ of series has spurred applications of quantile regression in finance
• A common use of the technique here is in value at risk modelling
• This seems natural given that the models are based on estimating the quantile of a distribution of possible losses.

Quantiles – A Definition
• By definition, quantiles must lie between zero and one
• Quantile regressions effectively model the entire conditional distribution of y given the explanatory variables.
• Quantile regression is an extension of linear regression that is used when the conditions of linear regression are not met (i.e., linearity, homoscedasticity, independence, or normality).

Application
• We will be using the file ‘SandPhedge.xls’, which contains monthly returns for the S&P500 index (in column 2) and S&P500 futures (in column 3)
• The first step is to open a workfile. Open EViews and click on File/New/Workfile; choose Dated – regular frequency and Monthly frequency data. The start date is 2002:02 and the end date is 2013:04. Then import the Excel file by clicking Import and Read Text-Lotus-Excel

• The next step is to transform the levels of the two series into percentage returns. It is common in academic research to use continuously compounded returns rather than simple returns.
• To achieve this (i.e. to produce continuously compounded returns), click on Genr and in the ‘Enter Equation’ dialog box, enter
Rfutures=100*dlog(futures)
Then click Genr again and do the same for the spot series:
Rspot=100*dlog(spot)
Do not forget to Save the workfile. Continue to re-save it at regular intervals to ensure that no work is lost!
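For readers working outside EViews, the continuously compounded return produced by 100*dlog(series) is 100 times the log of the ratio of consecutive values. The following sketch shows the same calculation in Python; the price series and the function name are hypothetical.

```python
import math

def log_returns(prices):
    """Continuously compounded percentage returns: 100 * ln(p_t / p_{t-1})."""
    return [100 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical price levels, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(prices)

# One observation is lost because the first price has no predecessor.
print(len(prices), len(returns))
print([round(r, 4) for r in returns])
```

A convenient property of continuously compounded returns is that they add up over time: the sum of the series equals the log return over the whole period.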
• Since we have imported more than one series, we can examine a number of descriptive statistics together and measures of association between the series. For example, click Quick and Group Statistics. From there you will see that it is possible to calculate the covariances or correlations between series.
• For now, click on Descriptive Statistics and Common Sample. In the dialog box that appears, type rspot rfutures and click OK.

Note that the number of observations has reduced from 66 for the levels of the series to 65 when we computed the returns (as one observation is ‘lost’ in constructing the t − 1 value of the prices in the returns formula).
We can now proceed to estimate the regression. There are several ways to do this, but the easiest is to select Quick and then Estimate Equation.
In the ‘Equation Specification’ window, you insert the list of variables to be used, with the dependent variable (y) first, and including a constant (c), so type
rspot c rfutures
In the ‘Estimation settings’ box, the default estimation method is OLS and the default sample is the whole sample.
The parameter estimates for the intercept ($\hat{\alpha}$) and slope ($\hat{\beta}$) are 0.0006 and 1.007 respectively. Name the regression results returnreg.
Now estimate a regression for the levels of the series rather than the returns:
spot c futures
The intercept estimate ($\hat{\alpha}$) in this regression is 5.49 and the slope estimate ($\hat{\beta}$) is 0.99. Compare the results!
Multiple regression in EViews using
an APT-style model
In the spirit of arbitrage pricing theory (APT), the following example will examine regressions that seek to determine whether the monthly returns on Microsoft stock can be explained by reference to unexpected changes in a set of macroeconomic and financial variables.

Open the file ‘macro.xls’. There are 326 monthly observations in the file, starting in March 1986 and ending in April 2013. There are 13 series plus a column of dates.
The series in the Excel file are:
the Microsoft stock price,
the S&P500 index value,
the consumer price index,
an industrial production index,
Treasury bill yields for the following maturities: three months, six months, one year, three years, five years and ten years,
a measure of ‘narrow’ money supply,
a consumer credit series,
and a ‘credit spread’ series (which is defined as the difference in annualised average yields between a portfolio of bonds rated AAA and a portfolio of bonds rated BAA).
• The first stage is to generate a set of changes or differences for each of the
variables, since the APT posits that the stock returns can be explained by
reference to the unexpected changes in the macroeconomic variables
rather than their levels.
• The unexpected value of a variable can be defined as the difference
between the actual (realised) value of the variable and its expected value.
• The question then arises about how we believe that investors might have
formed their expectations, and while there are many ways to construct
measures of expectations, the easiest is to assume that investors have
naive expectations that the next period value of the variable is equal to the
current value.
• This being the case, the entire change in the variable from one period to the next is the unexpected change (because investors are assumed to expect no change).
• Transforming the variables:
Press Genr and then enter the following in the ‘Enter equation’ box:
dspread = baa_aaa_spread - baa_aaa_spread(-1)
and then click OK. Repeat for all the following transformations:
dcredit = consumer_credit - consumer_credit(-1)
dprod = industrial_production - industrial_production(-1)
rmsoft = 100*dlog(microsoft)
rsandp = 100*dlog(sandp)
dmoney = m1money_supply - m1money_supply(-1)
inflation = 100*dlog(cpi)
term = ustb10y - ustb3m
Next, we need to apply further transformations:
dinflation = inflation - inflation(-1)
mustb3m = ustb3m/12
rterm = term - term(-1)
ermsoft = rmsoft - mustb3m
ersandp = rsandp - mustb3m
The final two of these calculate excess returns for the stock and for the index.
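The logic of these transformations can be sketched outside EViews as well. In the Python illustration below, the two short series are hypothetical and the helper names are invented; the first difference plays the role of series - series(-1), and the monthly risk-free rate is the annualised three-month yield divided by 12, as in mustb3m above.

```python
import math

def first_difference(series):
    """Change from one period to the next: x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

def log_return_pct(series):
    """Continuously compounded percentage return: 100 * ln(x_t / x_{t-1})."""
    return [100 * math.log(b / a) for a, b in zip(series, series[1:])]

# Hypothetical monthly data, for illustration only.
stock_price = [50.0, 52.0, 51.0]
tbill_3m = [3.0, 3.6, 2.4]       # annualised yields in percent
credit_spread = [1.0, 1.2, 0.9]

dspread = first_difference(credit_spread)   # unexpected change under naive expectations
rstock = log_return_pct(stock_price)
monthly_rf = [y / 12 for y in tbill_3m]

# Align each return with the risk-free rate of the same month (the first month is lost).
excess_return = [r - rf for r, rf in zip(rstock, monthly_rf[1:])]
print(dspread, excess_return)
```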

We can now run the regression.
So click Object/New Object/Equation and name the object ‘msoftreg’.
Type the following variables in the Equation specification window
ERMSOFT C ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
and use Least Squares over the whole sample period.

• Examine the main regression results. Which of the variables has a
statistically significant impact on the Microsoft excess returns?
• Using your knowledge of the effects of the financial and macroeconomic
environment on stock returns, examine whether the coefficients have their
expected signs and whether the sizes of the parameters are plausible.
• The regression F-statistic takes a value of 11.76.
Remember that this tests the null hypothesis that all of
the slope parameters are jointly zero.
• The p-value attached to the test statistic is zero to the precision
shown, so this null hypothesis should be rejected.
• However, there are a number of parameter estimates
that are not significantly different from zero, specifically
those on the DPROD, DCREDIT and DSPREAD variables.
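The regression F-statistic can be recovered from R² alone using the standard relationship F = [R²/(k−1)] / [(1−R²)/(T−k)], where T is the number of observations and k the number of estimated parameters including the intercept. A small Python sketch follows; the function name is hypothetical, and the inputs are made-up numbers, not those behind the 11.76 reported above.

```python
def regression_f_stat(r_squared, n_obs, n_params):
    """F-statistic for H0: all slope coefficients are jointly zero.

    n_params counts every estimated coefficient including the
    intercept, so the null imposes n_params - 1 restrictions.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    restrictions = n_params - 1
    dof = n_obs - n_params
    if restrictions <= 0 or dof <= 0:
        raise ValueError("need at least one slope and positive degrees of freedom")
    return (r_squared / restrictions) / ((1.0 - r_squared) / dof)


# Made-up example: R^2 = 0.5 with 10 observations and 2 parameters.
print(regression_f_stat(0.5, 10, 2))  # 8.0
```

The statistic is compared with an F(k−1, T−k) critical value; a large value leads to rejection of the joint null even when several individual t-ratios are insignificant.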

Stepwise regression
• There is a procedure known as stepwise regression: an automatic variable
selection procedure which chooses the jointly most ‘important’ explanatory
variables from a set of candidate variables.
• There are a number of different stepwise regression procedures,
but the simplest is the uni-directional forwards method. This
starts with no variables in the regression (or only those
variables that are always required by the researcher to be in the
regression) and then it selects first the variable with the lowest
p-value (largest absolute t-ratio) if it were included, then the
remaining variable with the lowest p-value conditional upon the first
variable already being included, and so on.
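The uni-directional forwards method can be sketched in plain Python as below. This is an illustration under stated assumptions, not the EViews routine: all function names are hypothetical, an intercept is always included, and a fixed cut-off on the absolute t-ratio stands in for the p-value criterion. Within a step every candidate model has the same degrees of freedom, so the largest |t| is also the lowest p-value.

```python
import math


def _invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def last_t_ratio(y, columns):
    """OLS of y on an intercept plus `columns`; t-ratio of the last column."""
    x = [[1.0] + [col[t] for col in columns] for t in range(len(y))]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(x, y)) for i in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yt - sum(b * v for b, v in zip(beta, row)) for row, yt in zip(x, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)  # residual variance estimate
    return beta[-1] / math.sqrt(s2 * inv[-1][-1])


def forward_stepwise(y, candidates, t_threshold=2.0):
    """Add, one at a time, the candidate with the largest |t-ratio|.

    `candidates` maps a variable name to its list of observations.
    Selection stops when the best remaining |t| is below `t_threshold`.
    """
    selected = []
    remaining = dict(candidates)
    while remaining:
        base = [candidates[name] for name in selected]
        scores = {name: abs(last_t_ratio(y, base + [col]))
                  for name, col in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] < t_threshold:
            break
        selected.append(best)
        del remaining[best]
    return selected


# Made-up data: y depends on x1 only; x2 is unrelated by construction.
x1 = [float(t) for t in range(8)]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [1.0 + 2.0 * a + e for a, e in zip(x1, noise)]
print(forward_stepwise(y, {"x1": x1, "x2": x2}))
```

The sketch does not handle a perfect fit (zero residual variance) or samples with no spare degrees of freedom; a production routine would guard against both.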
To conduct a stepwise regression which will automatically select from among
these variables the most important ones for explaining the variations in
Microsoft stock returns, click Object/New Object/Equation. Name the equation
Msoftstepwise and then, in the ‘Estimation settings/Method’ box, change
‘LS - Least Squares (NLS and ARMA)’ to ‘STEPLS - Stepwise Least Squares’.
In the top box that appears, ‘Dependent variable followed by list of
always included regressors’, enter ERMSOFT C. This shows that the dependent
variable will be the excess returns on Microsoft stock and that an intercept
will always be included in the regression.
If the researcher had a strong prior view that a particular explanatory
variable must always be included in the regression, it should be listed in this
first box.
In the second box, ‘List of search regressors’, type the list of all of
the explanatory variables:
ERSANDP DPROD DCREDIT DINFLATION DMONEY DSPREAD RTERM
Clicking on the ‘Options’ tab gives a number of ways to conduct the
regression.
For example, ‘Forwards’ will start with the list of required regressors
(the intercept only in this case) and will sequentially add to them,
while ‘Backwards’ will start by including all of the variables and will
sequentially delete variables from the regression.
The default criterion is to include variables if the p-value is less than
0.5, but this seems high and could potentially result in the inclusion of
some very insignificant variables, so modify this to 0.2 and then click
OK.

As can be seen, the excess market return, the term structure, and
unexpected inflation variables have all been included, while the
default spread and credit variables have been omitted.
The end
