UNIT 4 MULTIPLE REGRESSION MODEL
Structure
4.0 Objectives
4.1 Introduction
4.2 Multiple Regression Model
4.3 Additional Assumptions
4.4 Tests of Significance
4.5 Coefficient of Determination
4.6 Matrix Form of Multiple Regression Model
4.7 Structural Stability of Regression Models - Chow Test
4.8 Prediction with Multiple Regression Model
4.9 Let Us Sum Up
4.10 Key Words
4.11 Some Useful Books
4.12 Answers/Hints to Check Your Progress Exercises
4.0 OBJECTIVES
After going through this unit, you will be in a position to:
• explain the concept of multiple regression;
• explain the concept of coefficient of determination;
• test the structural stability of regression models; and
• make predictions with multiple regression.
4.1 INTRODUCTION
In Units 2 and 3, we studied the basic statistical tools and procedures for
analysing relationships between two variables. However, the two-variable
framework is too restrictive for realistic analyses of economic phenomena.
Economic models generally contain more than two variables. When a
regression equation has three or more variables, we call it a multiple
regression model. The statistical formulas for estimating the parameters,
the variance, and testing the parameters are very similar, or in some cases
identical, to those of the two-variable regression model.
The simplest possible multiple regression model (without using matrix
algebra) is three-variable regression, with one dependent variable and two
explanatory variables. If you can understand the analysis of a relationship
between three variables, you should be able to generalise the concept to a
multiple regression model. A conventional example of a three-variable
equation is a demand equation in which quantity demanded depends not only
on the price of the commodity but also on the income of a consumer.
4.2 MULTIPLE REGRESSION MODEL
The multiple regression model with k variables can be written as

Y_i = β₁X_i1 + β₂X_i2 + ... + β_kX_ik + ε_i        ...(4.1)

where Y is the dependent variable, the X's are the independent variables, and ε
is the error term. X_ij represents the i-th observation on explanatory variable
X_j. Explanatory variable X₁ is taken here as 1, so that β₁ is the constant term
or intercept and β₂, ..., β_k are the slopes of the equation.
For the three-variable model

Y_i = β₁ + β₂X_i2 + β₃X_i3 + ε_i        ...(4.2)

the error sum of squares is

ESS = Σ ê_i² = Σ (Y_i − Ŷ_i)²,  where  Ŷ_i = β̂₁ + β̂₂X_i2 + β̂₃X_i3
4.3 ADDITIONAL ASSUMPTIONS

The assumptions of the multiple regression model are quite similar to those of
the two-variable model. Besides the assumptions of the two-variable regression
model, the multiple regression model has another assumption, viz., no exact
linear relationship exists between two or more independent variables, or we
can say there is no multicollinearity. Suppose in eq. (4.2), Y, X₂ and X₃
represent consumption expenditure, income and wealth of the consumer
respectively. In postulating that consumption expenditure is linearly related to
income and wealth, economic theory presumes that wealth and income may
have some independent influence on consumption. If not, there is no sense in
including both income and wealth variables in the model. In the extreme, if
there is an exact linear relationship between income and wealth, we have only
one independent variable, not two, and there is no need to include both the
variables. In short, the assumption of no multicollinearity requires that in the
regression model we include only those variables which are not linear
functions of some of the variables in the model.
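A quick way to see whether an exact linear relationship lurks among the regressors is to check the rank of the data matrix. The following sketch (with made-up income and wealth figures, not taken from the text) shows how the rank drops when wealth is an exact linear function of income:

```python
import numpy as np

# Hypothetical data: income (X2) and wealth (X3) for 5 consumers.
income = np.array([10.0, 12.0, 15.0, 20.0, 24.0])
wealth_ok = np.array([50.0, 48.0, 70.0, 95.0, 110.0])  # not a linear function of income
wealth_bad = 5.0 * income + 2.0                         # exact linear function of income

def design_matrix(x2, x3):
    """Stack the intercept column (X1 = 1) with the two regressors."""
    return np.column_stack([np.ones_like(x2), x2, x3])

X_ok = design_matrix(income, wealth_ok)
X_bad = design_matrix(income, wealth_bad)

# With no exact collinearity, rank equals k = 3; with it, rank drops below k.
print(np.linalg.matrix_rank(X_ok))    # 3
print(np.linalg.matrix_rank(X_bad))   # 2
```

When the rank falls below k, the matrix X′X is singular and the least squares estimates are not uniquely defined, which is precisely why the assumption is needed.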
4.4 TESTS OF SIGNIFICANCE

The F-statistic tests whether the regression as a whole explains a significant
proportion of the variation of Y about its mean. In other words, the F-statistic
tests the joint hypothesis that β₂ = β₃ = ... = β_k = 0. It can be shown that

F = [RSS/(k − 1)] / [ESS/(n − k)]

If the null hypothesis is true, then we would expect RSS, R², and therefore
F to be close to 0. Thus, a high value of the F-statistic is a rationale for
rejecting the null hypothesis. An F-statistic that is not significantly different
from 0 lets us conclude that the explanatory variables do little to explain the
variation of Y about its mean. In the two-variable model, the F-statistic tests
whether the regression line is horizontal. In such a case, R² = 0 and the
regression explains none of the variation in the dependent variable. Note that
we do not test whether the regression passes through the origin; our objective
is simply to see whether we can explain any variation around the mean of Y.
The F-test of the significance of a regression equation may allow for
rejection of the null hypothesis even though none of the regression
coefficients is found to be significant according to individual t tests. This
situation may arise, for example, if the independent variables are highly
correlated with each other. The result may be high standard errors for the
coefficients and low t values, yet the model as a whole may fit the data well.
¹ Recall that in a two-variable model we tested for the significance of β̂ at (N − 2) degrees of
freedom, because the model included two explanatory variables, the constant term and X.
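The situation just described (a significant F alongside insignificant individual t values) can be reproduced with simulated data in which the two regressors are almost perfectly correlated; all numbers below are artificial:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

# Two almost perfectly correlated regressors (simulated data).
x2 = rng.normal(size=n)
x3 = x2 + 0.01 * rng.normal(size=n)
y = 1.0 + x2 + x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS estimates
e = y - X @ b                                # residuals
ess = e @ e                                  # error sum of squares
tss = (y - y.mean()) @ (y - y.mean())
rss = tss - ess                              # regression sum of squares

F = (rss / (k - 1)) / (ess / (n - k))
s2 = ess / (n - k)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se

print(f"F = {F:.1f}")                # large: the regressors are jointly significant
print("t values:", np.round(t, 2))   # t on x2 and x3 individually small
```

The near-collinearity inflates the standard errors of the individual slopes, driving their t values toward zero, while the joint explanatory power, and hence F, stays large.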
Basic Econometric Theory
We can break down the difference between Y_i and its mean Ȳ as follows:

(Y_i − Ȳ) = (Ŷ_i − Ȳ) + (Y_i − Ŷ_i)
Example 4.1

Following is a numerical example for a three-variable case.

(Table: sample observations on Y, X₂ and X₃)
The above information gives us the sums of squares and cross-products needed
for estimation. Therefore,

ESS = Σ ê_i² = 96.84

so that

(the estimated regression equation follows, with the standard error of each
coefficient in parentheses)

From this equation we can compute

RSS = Σ ŷ_i² = 1383.16

and

F = [RSS/(k − 1)] / [ESS/(n − k)] = 14.28
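The computational sequence of Example 4.1, namely estimating the equation and then obtaining ESS, RSS and the F-statistic, can be sketched as follows. The data are hypothetical stand-ins, since the original table is not reproduced:

```python
import numpy as np

# Hypothetical observations on Y, X2, X3 (stand-ins for the lost table).
y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0, 23.0, 21.0])
x2 = np.array([ 2.0,  3.0,  4.0,  4.0,  5.0,  6.0,  7.0,  6.0])
x3 = np.array([ 1.0,  2.0,  2.0,  3.0,  3.0,  4.0,  5.0,  5.0])

n, k = len(y), 3
X = np.column_stack([np.ones(n), x2, x3])

bhat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates
yhat = X @ bhat
ess = np.sum((y - yhat) ** 2)                 # error sum of squares
rss = np.sum((yhat - y.mean()) ** 2)          # regression sum of squares
tss = np.sum((y - y.mean()) ** 2)             # total sum of squares

F = (rss / (k - 1)) / (ess / (n - k))
print(f"ESS = {ess:.4f}, RSS = {rss:.4f}, F = {F:.2f}")
```

Because the regression includes an intercept, TSS = RSS + ESS holds exactly, which is the decomposition used above.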
Example 4.2

(Table: year-wise data on the actual inflation rate Y, the unemployment rate X₂
and the expected inflation rate X₃, 1970-1982)
Based on the above data, the OLS method gives the following result:

Ŷ_t = 7.19 − 1.3925 X_2t + 1.4700 X_3t

where figures in the parentheses are the estimated standard errors. The
interpretation of this regression is as follows: for the sample period, if both
X₂ and X₃ were fixed at zero, the average rate of actual inflation would have
been about 7.19 per cent. The partial regression coefficient of −1.3925 means
that, holding X₃ (the expected inflation rate) constant, the actual inflation
rate on the average increased (decreased) by about 1.4 per cent for every one
unit (here one percentage point) decrease (increase) in the unemployment rate
over the period 1970-1982. Likewise, holding the unemployment rate
constant, the coefficient value of 1.4700 implies that over the same time
period the actual inflation rate on the average increased by about 1.47 per cent
for every one percentage point increase in the anticipated or expected rate of
inflation. The R² value of 0.88 means that the two explanatory variables
together account for about 88 per cent of the variation in the actual
inflation rate, a fairly high amount of explanatory power since R² can at best
be one.
Example 4.3

With the information given in Example 4.2, we will now test whether the
coefficients significantly determine the rate of inflation. We can postulate the
following hypotheses:

1) H₀: β₂ = 0 and H₁: β₂ ≠ 0
2) H₀: β₃ = 0 and H₁: β₃ ≠ 0
The null hypothesis in (1) states that, holding X₃ constant, unemployment has
no (linear) influence on the actual rate of inflation. Similarly, the null
hypothesis in (2) states that, holding X₂ constant, the expected inflation rate
has no (linear) influence on the actual rate of inflation.
1) Using the two-tailed t test, the critical t value is 3.169. Since the
computed t value of 4.566 exceeds the critical value of 3.169, we may
reject the null hypothesis and say that β̂₂ is statistically significant, that
is, significantly different from zero.

2) In this case, the computed t value exceeds the critical t value of 3.169
at 1 per cent level of significance. Therefore, we reject the null
hypothesis and find that the coefficient is significantly different from
zero.
The above two t-tests show that the independent variables have
significant influence on the dependent variable. In other words, the
increase (decrease) in actual inflation is due to both the decrease
(increase) in the unemployment rate as well as the increase (decrease) in
expected inflation.
against the alternative that at least one of the β's is nonzero. Present
the appropriate test and state its distribution (including the numbers of
degrees of freedom).
4.6 MATRIX FORM OF MULTIPLE REGRESSION MODEL

The matrix formulation of the above model is

Y = Xβ + ε        ...(4.14)

where

Y = N × 1 column vector of dependent variable observations
X = N × k matrix of independent variable observations
β = k × 1 column vector of unknown parameters
ε = N × 1 column vector of errors
In our representation of the matrix X, each component X_ij has two
subscripts, the first signifying the appropriate column (variable) and the
second signifying the row (observation). Each column of X represents a vector
of N observations on a given variable, with all observations associated with
the intercept equal to 1.
The assumptions of the classical linear regression model can be represented as
follows:

(i) The elements of X are fixed and have finite variance. In addition, X
has rank k, which is less than the number of observations N.

(ii) The error vector ε has E(ε) = 0 and

E(εε′) = σ²I

where I is an N × N identity matrix.
The error sum of squares is

ESS = ê′ê        ...(4.16)

where

ê = Y − Ŷ        ...(4.17)

and

Ŷ = Xβ̂        ...(4.18)

ê represents the N × 1 vector of regression residuals, while Ŷ represents the
N × 1 vector of fitted values for Y. Substituting eqs. (4.17) and (4.18) into eq.
(4.16), we get

ESS = (Y − Xβ̂)′(Y − Xβ̂) = Y′Y − β̂′X′Y − Y′Xβ̂ + β̂′X′Xβ̂ = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂

The last step follows because β̂′X′Y and Y′Xβ̂ are both scalars and are
equal to each other.
From Unit 2 we know that the least-squares estimators can be obtained by
minimising ESS. In order to minimise ESS we take partial derivatives with
respect to β̂ and set them to zero:

∂ESS/∂β̂ = −2X′Y + 2X′Xβ̂ = 0

so that

β̂ = (X′X)⁻¹X′Y

To see that β̂ is unbiased, substitute Y = Xβ + ε:

β̂ = (X′X)⁻¹X′(Xβ + ε) = β + (X′X)⁻¹X′ε

so that

E(β̂) = β since E(ε) = 0
By definition,

Var(β̂) = E[(β̂ − β)(β̂ − β)′] = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]

Since E(εε′) = σ²I, we get

Var(β̂) = σ²(X′X)⁻¹
To show that β̂ is the best linear unbiased estimator, let us take any other
linear estimator

β* = c′Y

where c′ is a k × N matrix. Then

β* = c′(Xβ + ε)

For β* to be unbiased we require

c′X = I_k

so that

E(β*) = c′Xβ = I_k β = β

We have

β* = c′Y = c′(Xβ + ε) = c′Xβ + c′ε = β + c′ε

or

β* − β = c′ε

Now

Var(β*) = E[(β* − β)(β* − β)′] = E(c′εε′c) = σ²c′c

We have

Var(β̂) = σ²(X′X)⁻¹
Let us write

c′ = w′ + d′, where w′ = (X′X)⁻¹X′

Then

c′c = (w′ + d′)(w + d)
    = w′w + d′d + w′d + d′w
    = w′w + d′d

since w′d = 0 and d′w = 0. Now

σ²c′c = σ²w′w + σ²d′d

or

Var(β*) = Var(β̂) + σ²d′d

Since d′d is a sum of squares, Var(β*) can never be smaller than Var(β̂);
hence β̂ is the best linear unbiased estimator.
Since

Y = Xβ̂ + ê

then

Y′Y = (Xβ̂ + ê)′(Xβ̂ + ê) = β̂′X′Xβ̂ + ê′Xβ̂ + β̂′X′ê + ê′ê = β̂′X′Xβ̂ + ê′ê

because X′ê = 0. Therefore

R² = 1 − ESS/TSS = 1 − ê′ê/(Y′Y) = β̂′X′Y/(Y′Y)
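The matrix results derived above, β̂ = (X′X)⁻¹X′Y, Var(β̂) = σ²(X′X)⁻¹ and R² = β̂′X′Y/Y′Y, translate directly into a few lines of code. A sketch using simulated data (the coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
N, k = 30, 3

# Simulated design matrix with an intercept column of ones (rank k < N).
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
beta_true = np.array([2.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ Y          # beta-hat = (X'X)^{-1} X'Y
e = Y - X @ bhat                  # residual vector e = Y - X beta-hat
s2 = (e @ e) / (N - k)            # unbiased estimate of sigma^2
var_bhat = s2 * XtX_inv           # estimated Var(beta-hat) = s^2 (X'X)^{-1}

# Uncentred R^2 = beta-hat' X'Y / Y'Y, as derived above.
r2 = (bhat @ X.T @ Y) / (Y @ Y)

print("beta-hat:", np.round(bhat, 3))
print("R^2 (uncentred):", round(float(r2), 3))
```

Note that the residuals satisfy X′ê = 0, the property used in the decomposition of Y′Y above.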
Find the least squares estimates of β₁, β₂ and β₃ using the data in the deviation
form. Calculate R² and interpret the estimated relation.

From the above data we construct the following deviation forms:

Now we have:
4.7 STRUCTURAL STABILITY OF REGRESSION MODELS - CHOW TEST

The parameters of a regression model may undergo change over time so that
there is a structural change in the model. In order to test for the structural
stability of a model there are several tests.
The Chow forecast test leads naturally to more general tests of structural
change. A structural change or break occurs if the parameters underlying a
relationship differ from one subset of the data to another. There may, of
course, be several relevant subsets of data, with the possibility of several
structural breaks. For the moment we will consider just two subsets of n₁ and
n₂ observations making up the total sample of n = n₁ + n₂ observations.
Suppose, for example, that we wish to investigate whether savings in a
country differ between pre-liberalisation and post-liberalisation periods.
Suppose we have observations on the relevant variables for n₁ pre-
liberalisation years and n₂ post-liberalisation years. A Chow test could be
performed by using the estimated pre-liberalisation function to forecast
savings in the post-liberalisation years. However, provided n₂ > k, one might
alternatively use the estimated post-liberalisation function to forecast savings
in pre-liberalisation years. It is, however, not clear which choice should be
made, and the two procedures might well yield different answers. If the
subsets are large enough it is better to estimate both functions and test for
common parameters.
To see the structural change, we formulate the savings functions for the two
periods as follows:

Period I:  Y_t = λ₁ + λ₂X_t + ε_1t,   t = 1, 2, ..., n₁
Period II: Y_t = γ₁ + γ₂X_t + ε_2t,   t = 1, 2, ..., n₂

where we assume that

(a) ε_1t ~ N(0, σ²) and ε_2t ~ N(0, σ²), i.e., the two error terms are
normally distributed with the same variance, and
(b) ε_1t and ε_2t are independently distributed.
The Chow test proceeds as follows:

Step I: Pool all the n = n₁ + n₂ observations and estimate a single regression
for the whole period. Obtain its residual sum of squares, say ESS_R (restricted
residual sum of squares), with DF = (n₁ + n₂ − k).

Step II: Estimate the two regressions for the two sub-periods separately and
obtain their residual sums of squares, ESS₁ and ESS₂.

Step III: Add these two residual sums of squares, say ESS_UR (unrestricted
residual sum of squares), with DF = (n₁ + n₂ − 2k). Find
ESS_R − ESS_UR, with DF = k.

Step IV: Given the assumptions of the Chow test, it can be shown that

F = [(ESS_R − ESS_UR)/k] / [ESS_UR/(n₁ + n₂ − 2k)]

follows the F distribution with (k, n₁ + n₂ − 2k) degrees of freedom. If the
computed F exceeds the critical F value, we reject the hypothesis that the two
regressions are the same.
Example 4.5

We present the data on personal savings and personal income in the United
Kingdom for the period 1946-1963 in the following. We want to test whether
the savings function is the same in the two time periods.
Period I                         Period II
1946-1954   Savings   Income     1955-1963   Savings   Income
1946        0.36      8.8        1955        0.59      15.5
1947        0.21      9.4        1956        0.90      16.7
1948        0.08      10.0       1957        0.95      17.7
1949        0.20      10.6       1958        0.82      18.6
1950        0.10      11.0       1959        1.04      19.7
1951        0.12      11.9       1960        1.53      21.1
1952        0.41      12.7       1961        1.94      22.8
1953        0.50      13.5       1962        1.75      23.9
1954        0.43      14.3       1963        1.99      25.2
Step I: The regression based on the pooled observations gives

Ŷ_t = −1.0821 + 0.1178 X_t
       (0.1452)   (0.0088)
t =   (−7.4548)  (13.4316)
Step II: The regressions for the two sub-periods are estimated separately,
giving the residual sums of squares ESS₁ and ESS₂.

Step III:

ESS_UR = ESS₁ + ESS₂ = 0.3327

Step IV: Then we have

F = [(ESS_R − ESS_UR)/k] / [ESS_UR/(n₁ + n₂ − 2k)]
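The Chow test steps can be checked against the data of Example 4.5 with a short script; the pooled (restricted) regression is re-estimated here, and the computed ESS_UR should match the 0.3327 reported in Step III:

```python
import numpy as np

# UK savings and income data, 1946-1963 (from Example 4.5).
income1  = np.array([8.8, 9.4, 10.0, 10.6, 11.0, 11.9, 12.7, 13.5, 14.3])
savings1 = np.array([0.36, 0.21, 0.08, 0.20, 0.10, 0.12, 0.41, 0.50, 0.43])
income2  = np.array([15.5, 16.7, 17.7, 18.6, 19.7, 21.1, 22.8, 23.9, 25.2])
savings2 = np.array([0.59, 0.90, 0.95, 0.82, 1.04, 1.53, 1.94, 1.75, 1.99])

def ess(x, y):
    """Residual sum of squares from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

k = 2
n1, n2 = len(savings1), len(savings2)

ess_r = ess(np.concatenate([income1, income2]),
            np.concatenate([savings1, savings2]))        # pooled (restricted)
ess_ur = ess(income1, savings1) + ess(income2, savings2)  # unrestricted

F = ((ess_r - ess_ur) / k) / (ess_ur / (n1 + n2 - 2 * k))
print(f"ESS_UR = {ess_ur:.4f}, F = {F:.2f}")
```

The computed F can then be compared with the 5 per cent critical value of F(2, 14), about 3.74; a larger computed value leads us to reject the hypothesis that the savings function is the same in the two periods.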
4.8 PREDICTION WITH MULTIPLE REGRESSION MODEL

Consider the predictor

Ŷ₀ = β̂₁ + β̂₂X_20 + β̂₃X_30

so that

Ŷ₀ − Y₀ = (β̂₁ − β₁) + (β̂₂ − β₂)X_20 + (β̂₃ − β₃)X_30 − ε₀

Since E(β̂₁ − β₁), E(β̂₂ − β₂), E(β̂₃ − β₃) and E(ε₀) are all equal to zero,
we have E(Ŷ₀ − Y₀) = 0. Thus the predictor Ŷ₀ is unbiased.
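As a minimal illustration of the point predictor Ŷ₀ = β̂₁ + β̂₂X_20 + β̂₃X_30, with purely hypothetical coefficient estimates and a hypothetical new observation:

```python
import numpy as np

# Hypothetical OLS estimates (intercept, slope on X2, slope on X3).
bhat = np.array([4.0, 1.5, -0.8])
# New observation, with X1 = 1 for the intercept, X2_0 = 10, X3_0 = 5.
x0 = np.array([1.0, 10.0, 5.0])

# Point prediction Y-hat_0 = b1 + b2*X2_0 + b3*X3_0, written as a dot product.
y0_hat = bhat @ x0
print(y0_hat)   # 4.0 + 15.0 - 4.0 = 15.0
```

Writing the prediction as a dot product generalises directly to any number of regressors, exactly as in the matrix form Ŷ = Xβ̂.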
Period I                              Period II
1981-91     Expenditure   NSDP        1991-2001   Expenditure   NSDP
            (Rs. Cr.)     (Rs. Cr.)               (Rs. Cr.)     (Rs. Cr.)
Besides, this unit discussed testing for structural stability of regression models
and prediction of dependent variables, given the values of the independent
variables.
4.10 KEY WORDS
Adjusted R-square : The coefficient of determination (R²) adjusted for the
degrees of freedom, i.e., for the number of explanatory variables in the model.
4.11 SOME USEFUL BOOKS
Gujarati, Damodar N., 1995, Basic Econometrics, McGraw-Hill Inc.,
Singapore.

Johnston, Jack and John DiNardo, 1997, Econometric Methods, The McGraw-Hill
Companies, Inc.