Chapter # 6: Multiple Regression Analysis: The Problem of Estimation
• OLS Estimators
• To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF of (7.1.1) as follows:
• $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i$ (7.4.1)
• where $\hat{u}_i$ is the residual term, the sample counterpart of the stochastic disturbance term $u_i$.
• As noted in Chapter 3, the OLS procedure consists of choosing the values of the unknown parameters so that the residual sum of squares (RSS) $\sum \hat{u}_i^2$ is as small as possible. Symbolically,
• $\min \sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i})^2$ (7.4.2)
• The most straightforward procedure to obtain the estimators that minimize (7.4.2) is to differentiate it with respect to the unknowns, set the resulting expressions to zero, and solve them simultaneously. This procedure gives the following normal equations:
• $\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}_2 + \hat{\beta}_3 \bar{X}_3$ (7.4.3)
• $\sum Y_i X_{2i} = \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 + \hat{\beta}_3 \sum X_{2i} X_{3i}$ (7.4.4)
• $\sum Y_i X_{3i} = \hat{\beta}_1 \sum X_{3i} + \hat{\beta}_2 \sum X_{2i} X_{3i} + \hat{\beta}_3 \sum X_{3i}^2$ (7.4.5)
• From Eq. (7.4.3) we see at once that
• $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$ (7.4.6)
• which is the OLS estimator of the population intercept $\beta_1$.
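• The following formulas are derived from the normal equations (7.4.3) to (7.4.5), where lowercase letters denote deviations from sample means (for example, $x_{2i} = X_{2i} - \bar{X}_2$):
• $\hat{\beta}_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$ (7.4.7)
• $\hat{\beta}_3 = \dfrac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$ (7.4.8)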
• which give the OLS estimators of the population partial regression coefficients β2
and β3, respectively.
• In passing, note the following:
• (1) Equations (7.4.7) and (7.4.8) are symmetrical in nature because one can be
obtained from the other by interchanging the roles of X2 and X3;
• (2) the denominators of these two equations are identical; and
• (3) the three-variable case is a natural extension of the two-variable case.
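• The closed-form estimators can be checked numerically. The sketch below is only an illustration: the eight observations are invented (they are not the Table 7.1 data), and the helper names `dot` and `dev` are assumptions of this example. It computes the slopes from the usual deviation-form expressions for (7.4.7) and (7.4.8), the intercept from (7.4.6), and then confirms that the three normal equations (7.4.3) to (7.4.5) are satisfied.

```python
# Small data set invented for this illustration (not taken from Table 7.1).
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(Y)


def dot(a, b):
    """Sum of products of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    """Deviations of a sequence from its sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


# Lowercase variables: deviations from sample means.
y, x2, x3 = dev(Y), dev(X2), dev(X3)

# Common denominator of the two slope formulas.
den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2

b2 = (dot(y, x2) * dot(x3, x3) - dot(y, x3) * dot(x2, x3)) / den  # (7.4.7)
b3 = (dot(y, x3) * dot(x2, x2) - dot(y, x2) * dot(x2, x3)) / den  # (7.4.8)
b1 = sum(Y) / n - b2 * sum(X2) / n - b3 * sum(X3) / n             # (7.4.6)

# Left side minus right side of each normal equation; all should be ~0.
ne_7_4_3 = sum(Y) / n - (b1 + b2 * sum(X2) / n + b3 * sum(X3) / n)
ne_7_4_4 = dot(Y, X2) - (b1 * sum(X2) + b2 * dot(X2, X2) + b3 * dot(X2, X3))
ne_7_4_5 = dot(Y, X3) - (b1 * sum(X3) + b2 * dot(X2, X3) + b3 * dot(X3, X3))

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, b3 = {b3:.4f}")
print(ne_7_4_3, ne_7_4_4, ne_7_4_5)
```

Swapping the roles of `x2` and `x3` in the line for `b2` reproduces the line for `b3`, which is the symmetry noted in point (1).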
Variances and Standard Errors of OLS Estimators
• As in the two-variable case, we need the standard errors for two main purposes: to establish
confidence intervals and to test statistical hypotheses. The relevant formulas are as follows:
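• $\operatorname{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}$ (7.4.12)
• $\operatorname{var}(\hat{\beta}_3) = \dfrac{\sigma^2}{\sum x_{3i}^2 \,(1 - r_{23}^2)}$ (7.4.15)
• where $r_{23}$ is the sample correlation coefficient between X2 and X3, and each standard error is the positive square root of the corresponding variance.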
• An unbiased estimator of $\sigma^2$ is given by:
• $\hat{\sigma}^2 = \dfrac{\sum \hat{u}_i^2}{n - 3}$ (7.4.18)
• The degrees of freedom are now $(n - 3)$ because in estimating $\sum \hat{u}_i^2$ we must first estimate β1, β2, and β3, which consume 3 df. The estimator $\hat{\sigma}^2$ can be computed from (7.4.18) once the residuals are available, but it can also be obtained more readily by using the following relation:
• $\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2 \sum y_i x_{2i} - \hat{\beta}_3 \sum y_i x_{3i}$ (7.4.19)
• which is the three-variable counterpart of the relation given in (3.3.6).
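• A short numerical check may help here. The sketch below uses invented data (not Table 7.1) and hypothetical helper names. It computes the RSS twice, once from the residuals and once from the shortcut (7.4.19), then forms the variance estimate (7.4.18) with n − 3 degrees of freedom. The slope standard errors assume that (7.4.12) and (7.4.15) take their usual form, var = σ² / [Σx² (1 − r23²)], with σ² replaced by its estimate.

```python
import math

# Invented data for illustration only.
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(Y)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


y, x2, x3 = dev(Y), dev(X2), dev(X3)
den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b2 = (dot(y, x2) * dot(x3, x3) - dot(y, x3) * dot(x2, x3)) / den
b3 = (dot(y, x3) * dot(x2, x2) - dot(y, x2) * dot(x2, x3)) / den
b1 = sum(Y) / n - b2 * sum(X2) / n - b3 * sum(X3) / n

# RSS computed directly from the residuals.
resid = [obs - (b1 + b2 * a + b3 * b) for obs, a, b in zip(Y, X2, X3)]
rss_direct = dot(resid, resid)

# RSS from the shortcut (7.4.19); no residuals needed.
rss_shortcut = dot(y, y) - b2 * dot(y, x2) - b3 * dot(y, x3)

# Unbiased variance estimate (7.4.18): three parameters were estimated.
sigma2_hat = rss_direct / (n - 3)

# Standard errors of the slopes, assuming the usual form of (7.4.12)/(7.4.15).
r23 = dot(x2, x3) / math.sqrt(dot(x2, x2) * dot(x3, x3))
se_b2 = math.sqrt(sigma2_hat / (dot(x2, x2) * (1 - r23 ** 2)))
se_b3 = math.sqrt(sigma2_hat / (dot(x3, x3) * (1 - r23 ** 2)))

print(rss_direct, rss_shortcut)
print(sigma2_hat, se_b2, se_b3)
```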
• Properties of OLS Estimators
• 1. The three-variable regression line (surface) passes through the means $\bar{Y}$, $\bar{X}_2$, and $\bar{X}_3$, which is evident from (7.4.3). This property holds generally. Thus in the k-variable linear regression model [a regressand and (k − 1) regressors]
• $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$ (7.4.20)
• we have
• $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3 - \cdots - \hat{\beta}_k \bar{X}_k$ (7.4.21)
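• 2. The mean value of the estimated $Y_i$ ($= \hat{Y}_i$) is equal to the mean value of the actual $Y_i$:
• $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}$
• $= (\bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3) + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}$
• $= \bar{Y} + \hat{\beta}_2 (X_{2i} - \bar{X}_2) + \hat{\beta}_3 (X_{3i} - \bar{X}_3)$ (7.4.22)
• $= \bar{Y} + \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i}$
• Summing both sides of (7.4.22) over the sample values and dividing through by the sample size n gives $\bar{\hat{Y}} = \bar{Y}$, because the deviations $x_{2i}$ and $x_{3i}$ each sum to zero. Notice that by virtue of (7.4.22) we can write
• $\hat{y}_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i}$ (7.4.23)
• where $\hat{y}_i = \hat{Y}_i - \bar{Y}$. Therefore, the SRF (7.4.1) can be expressed in the deviation form as:
• $y_i = \hat{y}_i + \hat{u}_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$ (7.4.24)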
• 3. $\sum \hat{u}_i = \bar{\hat{u}} = 0$, which can be verified from (7.4.24).
• 4. The residuals $\hat{u}_i$ are uncorrelated with $X_{2i}$ and $X_{3i}$, that is,
• $\sum \hat{u}_i X_{2i} = \sum \hat{u}_i X_{3i} = 0$
• 5. The residuals $\hat{u}_i$ are uncorrelated with $\hat{Y}_i$; that is,
• $\sum \hat{u}_i \hat{Y}_i = 0$.
• 6. From (7.4.12) and (7.4.15) it is evident that as $r_{23}$, the correlation coefficient between X2 and X3, increases toward 1, the variances of $\hat{\beta}_2$ and $\hat{\beta}_3$ increase for given values of $\sigma^2$ and $\sum x_{2i}^2$ or $\sum x_{3i}^2$. In the limit, when $r_{23} = 1$ (i.e., perfect collinearity), these variances become infinite.
• 7. Given the assumptions of the classical linear regression model, which are spelled out in Section 7.1, one can prove that the OLS estimators of the partial regression coefficients not only are linear and unbiased but also have minimum variance in the class of all linear unbiased estimators. In short, they are BLUE. Put differently, they satisfy the Gauss-Markov theorem.
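• Properties 2 to 6 lend themselves to a quick numerical check. The following sketch is illustrative only: the data are invented and the names (`properties`, `inflation`) are assumptions of this example. Each entry of `properties` should be zero up to floating-point rounding, and `inflation` tabulates the factor 1 / (1 − r23²) by which the slope variances grow as r23 approaches 1 (property 6).

```python
# Invented data, used only to illustrate the properties numerically.
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(Y)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


# OLS fit of the three-variable model via the deviation-form formulas.
y, x2, x3 = dev(Y), dev(X2), dev(X3)
den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b2 = (dot(y, x2) * dot(x3, x3) - dot(y, x3) * dot(x2, x3)) / den
b3 = (dot(y, x3) * dot(x2, x2) - dot(y, x2) * dot(x2, x3)) / den
b1 = sum(Y) / n - b2 * sum(X2) / n - b3 * sum(X3) / n

Y_hat = [b1 + b2 * a + b3 * b for a, b in zip(X2, X3)]
u_hat = [obs - fit for obs, fit in zip(Y, Y_hat)]

# Each quantity below should vanish (up to rounding error).
properties = {
    "2. mean(Y_hat) - mean(Y)": sum(Y_hat) / n - sum(Y) / n,
    "3. sum of residuals": sum(u_hat),
    "4. sum of residuals * X2": dot(u_hat, X2),
    "4. sum of residuals * X3": dot(u_hat, X3),
    "5. sum of residuals * Y_hat": dot(u_hat, Y_hat),
}
for name, value in properties.items():
    print(f"{name}: {value:.2e}")

# Property 6: multiplier of the slope variances for selected values of r23.
inflation = {r: 1.0 / (1.0 - r * r) for r in (0.0, 0.5, 0.9, 0.99)}
for r, factor in inflation.items():
    print(f"r23 = {r}: variance multiplied by {factor:.2f}")
```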
THE MULTIPLE COEFFICIENT OF DETERMINATION $R^2$ AND THE MULTIPLE COEFFICIENT OF CORRELATION $R$
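• In the three-variable model we would like to know the proportion of the variation in Y explained by the variables X2 and X3 jointly. The quantity that gives this information is known as the multiple coefficient of determination and is denoted by $R^2$; conceptually it is akin to $r^2$.
• To derive $R^2$, we may follow the derivation of $r^2$ given in Section 3.5. Recall that
• $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \hat{u}_i$ (7.5.1)
• $= \hat{Y}_i + \hat{u}_i$
• where $\hat{Y}_i$ is an estimator of the true $E(Y_i \mid X_{2i}, X_{3i})$. Eq. (7.5.1) may be written in the deviation form as:
• $y_i = \hat{\beta}_2 x_{2i} + \hat{\beta}_3 x_{3i} + \hat{u}_i$ (7.5.2)
• $= \hat{y}_i + \hat{u}_i$
• Squaring (7.5.2) on both sides and summing over the sample values, we obtain
• $\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2 \sum \hat{y}_i \hat{u}_i$ (7.5.3)
• $= \sum \hat{y}_i^2 + \sum \hat{u}_i^2$
• where the cross-product term drops out because the residuals are uncorrelated with the fitted values, so that $\sum \hat{y}_i \hat{u}_i = 0$.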
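• Verbally, Eq. (7.5.3) states that the total sum of squares (TSS) equals the explained sum of squares (ESS) plus the residual sum of squares (RSS). Now substituting for $\sum \hat{u}_i^2$ from (7.4.19), we obtain
• $\sum y_i^2 = \sum \hat{y}_i^2 + \sum y_i^2 - \hat{\beta}_2 \sum y_i x_{2i} - \hat{\beta}_3 \sum y_i x_{3i}$
• which, on rearranging, gives
• $\text{ESS} = \sum \hat{y}_i^2 = \hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}$ (7.5.4)
• Now, by definition
• $R^2 = \dfrac{\text{ESS}}{\text{TSS}} = \dfrac{\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}}{\sum y_i^2}$ (7.5.5)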
• Note that $R^2$, like $r^2$, lies between 0 and 1. If it is 1, the fitted regression line explains 100 percent of the variation in Y. On the other hand, if it is 0, the model does not explain any of the variation in Y. The fit of the model is said to be "better" the closer $R^2$ is to 1.
• In the three-or-more-variable case, the coefficient of multiple correlation is denoted by R, and it is a measure of the degree of association between Y and all the explanatory variables jointly.
• R is always taken to be positive. In practice, however, R is of little importance. The more meaningful quantity is $R^2$.
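• The decomposition TSS = ESS + RSS and the formula (7.5.5) can be verified on a small example. The sketch below uses invented data and hypothetical variable names; it computes $R^2$ in three ways (ESS/TSS, formula (7.5.5), and 1 − RSS/TSS), which should agree up to rounding, and then takes R as the positive square root.

```python
import math

# Invented data for illustration only.
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(Y)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


y, x2, x3 = dev(Y), dev(X2), dev(X3)
den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b2 = (dot(y, x2) * dot(x3, x3) - dot(y, x3) * dot(x2, x3)) / den
b3 = (dot(y, x3) * dot(x2, x2) - dot(y, x2) * dot(x2, x3)) / den

# Fitted values and residuals in deviation form, as in (7.5.2).
y_hat = [b2 * a + b3 * b for a, b in zip(x2, x3)]
u_hat = [obs - fit for obs, fit in zip(y, y_hat)]

tss = dot(y, y)          # total sum of squares
ess = dot(y_hat, y_hat)  # explained sum of squares
rss = dot(u_hat, u_hat)  # residual sum of squares

r2_ess = ess / tss
r2_755 = (b2 * dot(y, x2) + b3 * dot(y, x3)) / tss  # formula (7.5.5)
r2_rss = 1.0 - rss / tss

R = math.sqrt(r2_ess)  # coefficient of multiple correlation, taken positive

print(tss, ess + rss)
print(r2_ess, r2_755, r2_rss, R)
```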
EXAMPLE 7.1: CHILD MORTALITY IN RELATION TO PER CAPITA GNP AND FEMALE LITERACY RATE
• When we introduce more than one variable into our model, we need to net out the influence of each of the regressors. That is, we need to estimate the (partial) regression coefficient of each regressor. Thus our model is:
• $CM_i = \beta_1 + \beta_2\, PGNP_i + \beta_3\, FLR_i + u_i$ (7.6.1)
• The necessary data are given in Table 7.1. Keep in mind that CM is the number of deaths of children under five per 1000 live births, PGNP is per capita GNP in 1980, and FLR is the female literacy rate, measured in percent. Our sample consists of 64 countries.
Table 7.1
OBSER   CM  FLR   PGNP | OBSER   CM  FLR   PGNP | OBSER   CM  FLR   PGNP | OBSER   CM  FLR   PGNP
    1  128   37   1870 |    17  148   30    580 |    33  142   50   8640 |    49   83   85    690
    2  204   22    130 |    18   98   69    660 |    34  104   62    350 |    50  223   33    200
    3  202   16    310 |    19  161   43    420 |    35  287   31    230 |    51  240   19    450
    4  197   65    570 |    20  118   47   1080 |    36   41   66   1620 |    52  312   21    280
    5   96   76   2050 |    21  269   17    290 |    37  312   11    190 |    53   12   79   4430
    6  209   26    200 |    22  189   35    270 |    38   77   88   2090 |    54   52   83    270
    7  170   45    670 |    23  126   58    560 |    39  142   22    900 |    55   79   43   1340
    8  240   29    300 |    24   12   81   4240 |    40  262   22    230 |    56   61   88    670
    9  241   11    120 |    25  167   29    240 |    41  215   12    140 |    57  168   28    410
   10   55   55    290 |    26  135   65    430 |    42  246    9    330 |    58   28   95   4370
   11   75   87   1180 |    27  107   87   3020 |    43  191   31   1010 |    59  121   41   1310
   12  129   55    900 |    28   72   63   1420 |    44  182   19    300 |    60  115   62   1470
   13   24   93   1730 |    29  128   49    420 |    45   37   88   1730 |    61  186   45    300
   14  165   31   1150 |    30   27   63  19830 |    46  103   35    780 |    62   47   85   3630
   15   94   77   1160 |    31  152   84    420 |    47   67   85   1300 |    63  178   45    220
   16   96   80   1270 |    32  224   23    530 |    48  143   78    930 |    64  142   67    560
• Using the EViews statistical package, we obtained the following results:
• $\widehat{CM}_i = 263.6416 - 0.0056\, PGNP_i - 2.2316\, FLR_i$ (7.6.2)
• se = (11.5932) (0.0019) (0.2099)
• $R^2 = 0.7077$, $\bar{R}^2 = 0.6981$
• Observe that the partial slope coefficient of PGNP, namely −0.0056, is precisely the same as that obtained from the three-step procedure discussed before, but here we obtained it without that cumbersome procedure.
• The partial regression coefficient of PGNP is −0.0056: with the influence of FLR held constant, as PGNP increases by, say, a dollar, child mortality goes down on average by 0.0056 units. To make this more economically interpretable, if per capita GNP goes up by a thousand dollars, the number of deaths of children under age 5 goes down on average by about 5.6 per thousand live births.
• The coefficient −2.2316 tells us that, holding the influence of PGNP constant, the number of deaths of children under 5 goes down on average by about 2.23 per thousand live births as the female literacy rate increases by one percentage point.
• The intercept value of about 263, mechanically interpreted, means that if the values of PGNP and FLR were fixed at zero, the mean child mortality would be about 263 deaths per thousand live births. The $R^2$ value of about 0.71 means that about 71 percent of the variation in child mortality is explained by PGNP and FLR, a fairly high value considering that $R^2$ can be at most 1.
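• The interpretation of the coefficients can be turned into simple arithmetic. The sketch below hard-codes the estimates reported in (7.6.2); the function name `predicted_cm` and the input values (PGNP of 1000 or 2000 dollars, FLR of 50 or 51 percent) are hypothetical and chosen only for illustration. The last lines recompute the adjusted $R^2$ from the reported $R^2$ using the standard definition $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - k)$ with n = 64 and k = 3; that this is the definition behind the reported value is an assumption of this sketch.

```python
# Coefficient estimates as reported in (7.6.2).
B1, B2, B3 = 263.6416, -0.0056, -2.2316


def predicted_cm(pgnp, flr):
    """Fitted child mortality (deaths per 1000 live births) from (7.6.2)."""
    return B1 + B2 * pgnp + B3 * flr


# Partial effect of PGNP: raise PGNP by 1000 dollars, hold FLR at 50 percent.
effect_pgnp_1000 = predicted_cm(2000, 50) - predicted_cm(1000, 50)

# Partial effect of FLR: raise FLR by one point, hold PGNP at 1000 dollars.
effect_flr_1pt = predicted_cm(1000, 51) - predicted_cm(1000, 50)

# Mechanical interpretation of the intercept: both regressors set to zero.
intercept_case = predicted_cm(0, 0)

# Adjusted R-squared from the reported R-squared (standard definition).
n, k, r2 = 64, 3, 0.7077
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

print(f"PGNP + 1000 dollars: {effect_pgnp_1000:.2f} deaths per 1000")
print(f"FLR + 1 point:       {effect_flr_1pt:.2f} deaths per 1000")
print(f"Adjusted R-squared:  {adj_r2:.4f}")
```

Because the model is linear, the two partial effects do not depend on the levels at which the other regressor is held.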
Regression on Standardized Variables
• A variable is said to be standardized, or in standard deviation units, if it is expressed in terms of deviation from its mean and divided by its standard deviation. Regressing standardized CM on standardized PGNP and FLR for our child mortality example gives the following results:
• $\widehat{CM}^*_i = -0.2026\, PGNP^*_i - 0.7639\, FLR^*_i$ (7.6.3)
• se = (0.0713) (0.0713), $R^2 = 0.7077$
• Note: The starred variables are standardized variables.
• As you can see from this regression, with FLR held constant, a standard deviation increase in PGNP leads, on average, to a 0.2026 standard deviation decrease in CM. Similarly, holding PGNP constant, a standard deviation increase in FLR leads, on average, to a 0.7639 standard deviation decrease in CM. Relatively speaking, female literacy has more impact on child mortality than per capita GNP.
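• The link between ordinary and standardized coefficients can be illustrated with a small sketch. The data below are invented and the helper names are hypothetical. The code fits the model once on the raw variables and once on the standardized variables, and checks that each standardized slope equals the raw slope multiplied by the ratio of the regressor's standard deviation to that of the regressand.

```python
import math

# Invented data for illustration only (not the child mortality data).
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def std(v):
    """Sample standard deviation (divisor n - 1)."""
    d = dev(v)
    return math.sqrt(dot(d, d) / (len(v) - 1))


def standardize(v):
    s = std(v)
    return [a / s for a in dev(v)]


def slopes(y, x2, x3):
    """OLS slopes for zero-mean inputs, using the deviation-form formulas."""
    den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
    b2 = (dot(y, x2) * dot(x3, x3) - dot(y, x3) * dot(x2, x3)) / den
    b3 = (dot(y, x3) * dot(x2, x2) - dot(y, x2) * dot(x2, x3)) / den
    return b2, b3


# Raw-variable regression (slopes only; the intercept is not needed here).
b2, b3 = slopes(dev(Y), dev(X2), dev(X3))

# Regression on standardized variables; it has no intercept because
# every standardized variable has mean zero.
b2_star, b3_star = slopes(standardize(Y), standardize(X2), standardize(X3))

# Conversion between the two sets of coefficients.
b2_converted = b2 * std(X2) / std(Y)
b3_converted = b3 * std(X3) / std(Y)

print(b2_star, b2_converted)
print(b3_star, b3_converted)
```

Since standardized coefficients are unit-free, their magnitudes can be compared across regressors, which is what the comparison of 0.2026 and 0.7639 above relies on.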
SIMPLE REGRESSION IN THE CONTEXT OF MULTIPLE REGRESSION: INTRODUCTION
TO SPECIFICATION BIAS
• The partial correlations given in Eqs. (7.11.1) to (7.11.3) are called first-order correlation coefficients. By order we mean the number of secondary subscripts. Thus $r_{12.34}$ would be the correlation coefficient of order two, $r_{12.345}$ would be the correlation coefficient of order three, and so on. As noted previously, $r_{12}$, $r_{13}$, and so on are called simple or zero-order correlations. The interpretation of, say, $r_{12.34}$ is that it gives the coefficient of correlation between Y and X2, holding X3 and X4 constant.
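• To make "holding X3 constant" concrete, the sketch below computes the first-order partial correlation between Y and X2 in two ways on invented data. The first uses the standard formula $r_{12.3} = (r_{12} - r_{13} r_{23}) / \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, which is assumed here to be what (7.11.1) states. The second correlates the residuals of Y and of X2 after each has been regressed on X3. The function names are hypothetical.

```python
import math

# Invented data for illustration only.
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def corr(a, b):
    """Simple (zero-order) correlation coefficient."""
    a, b = dev(a), dev(b)
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def partial_corr(r12, r13, r23):
    """First-order partial correlation r12.3 from zero-order correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def residuals_on(v, z):
    """Residuals from a simple regression of v on z (with intercept)."""
    v, z = dev(v), dev(z)
    slope = dot(v, z) / dot(z, z)
    return [a - slope * b for a, b in zip(v, z)]


r12, r13, r23 = corr(Y, X2), corr(Y, X3), corr(X2, X3)

# Route 1: formula based on the three zero-order correlations.
r12_3_formula = partial_corr(r12, r13, r23)

# Route 2: correlate what is left of Y and X2 once X3 has been removed.
r12_3_resid = corr(residuals_on(Y, X3), residuals_on(X2, X3))

print(r12_3_formula, r12_3_resid)
```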
• Interpretation of Simple and Partial Correlation Coefficients
• Observe the following:
• 1. Even if $r_{12} = 0$, $r_{12.3}$ will not be zero unless $r_{13}$ or $r_{23}$ or both are zero.
• 2. If $r_{12} = 0$ and $r_{13}$ and $r_{23}$ are nonzero and are of the same sign, $r_{12.3}$ will be negative, whereas if they are of opposite signs, it will be positive.
• An example will make this point clear. Let Y = crop yield, X2 = rainfall, and X3 = temperature. Assume $r_{12} = 0$, that is, no association between crop yield and rainfall. Assume further that $r_{13}$ is positive and $r_{23}$ is negative. Then, as (7.11.1) shows, $r_{12.3}$ will be positive; that is, holding temperature constant, there is a positive association between yield and rainfall. This seemingly paradoxical result, however, is not surprising. Since temperature X3 affects both yield Y and rainfall X2, in order to find out the net relationship between crop yield and rainfall, we need to remove the influence of the "nuisance" variable temperature. This example shows how one might be misled by the simple coefficient of correlation.
• 3. The terms $r_{12.3}$ and $r_{12}$ (and similar comparisons) need not have the same sign.
• 4. In the two-variable case we have seen that $r^2$ lies between 0 and 1. The same property holds true of the squared partial correlation coefficients. Using this fact, the reader should verify that one can obtain the following expression from (7.11.1):
• $0 \le r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} \le 1$ (7.11.4)
• which gives the interrelationships among the three zero-order correlation coefficients. Similar expressions can be derived from Eqs. (7.9.3) and (7.9.4).
• 5. Suppose that $r_{13} = r_{23} = 0$. Does this mean that $r_{12}$ is also zero? The answer is obvious from (7.11.4). The fact that Y and X3, and X2 and X3, are uncorrelated does not mean that Y and X2 are uncorrelated. In passing, note that the expression $r_{12.3}^2$ may be called the coefficient of partial determination and may be interpreted as the proportion of the variation in Y not explained by the variable X3 that has been explained by the inclusion of X2 into the model. Conceptually it is similar to $R^2$.
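• Points 1, 2, 4, and 5 can be checked with a few lines of arithmetic. The sketch below assumes the standard first-order formula $r_{12.3} = (r_{12} - r_{13} r_{23}) / \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$; the correlation values are hypothetical numbers picked to match the sign patterns discussed above, not estimates from any data.

```python
import math


def partial_corr(r12, r13, r23):
    """First-order partial correlation r12.3 from zero-order correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def bound_7_11_4(r12, r13, r23):
    """Middle expression of inequality (7.11.4)."""
    return r12 ** 2 + r13 ** 2 + r23 ** 2 - 2 * r12 * r13 * r23


# Crop-yield pattern: r12 = 0, r13 > 0, r23 < 0 (hypothetical magnitudes).
crop = partial_corr(0.0, 0.5, -0.4)

# Point 2, other case: r13 and r23 share the same sign.
same_sign = partial_corr(0.0, 0.5, 0.4)

# Point 5: r13 = r23 = 0 leaves r12.3 equal to r12, whatever r12 is.
x3_unrelated = partial_corr(0.3, 0.0, 0.0)

print("opposite signs:", crop)           # positive
print("same signs:    ", same_sign)      # negative
print("r13 = r23 = 0: ", x3_unrelated)   # equals r12
print("(7.11.4) value:", bound_7_11_4(0.0, 0.5, -0.4))
```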
• Before moving on, note the following relationships between $R^2$, simple correlation coefficients, and partial correlation coefficients:
• $R^2 = \dfrac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$ (7.11.5)
• $R^2 = r_{12}^2 + (1 - r_{12}^2)\, r_{13.2}^2$ (7.11.6)
• $R^2 = r_{13}^2 + (1 - r_{13}^2)\, r_{12.3}^2$ (7.11.7)
• In concluding this section, consider the following: It was stated previously that $R^2$ will not decrease if an additional explanatory variable is introduced into the model, which can be seen clearly from (7.11.6). This equation states that the proportion of the variation in Y explained by X2 and X3 jointly is the sum of two parts: the part explained by X2 alone ($= r_{12}^2$) and the part not explained by X2 ($= 1 - r_{12}^2$) times the proportion that is explained by X3 after holding the influence of X2 constant. Now $R^2 > r_{12}^2$ so long as $r_{13.2}^2 > 0$. At worst, $r_{13.2}^2$ will be zero, in which case $R^2 = r_{12}^2$.
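• The three relationships can be confirmed numerically. The sketch below uses invented data and hypothetical helper names: it obtains $R^2$ from the fitted three-variable regression and compares it with the right-hand sides of (7.11.5), (7.11.6), and (7.11.7), computing the partial correlations from the standard first-order formula.

```python
import math

# Invented data for illustration only.
Y = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0, 25.0, 30.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
X3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


y, x2, x3 = dev(Y), dev(X2), dev(X3)

# R-squared from the fitted regression (1 - RSS/TSS).
den = dot(x2, x2) * dot(x3, x3) - dot(x2, x3) ** 2
b2 = (dot(y, x2) * dot(x3, x3) - dot(y, x3) * dot(x2, x3)) / den
b3 = (dot(y, x3) * dot(x2, x2) - dot(y, x2) * dot(x2, x3)) / den
u_hat = [a - b2 * b - b3 * c for a, b, c in zip(y, x2, x3)]
r2_regression = 1.0 - dot(u_hat, u_hat) / dot(y, y)

# Zero-order correlations.
r12 = dot(y, x2) / math.sqrt(dot(y, y) * dot(x2, x2))
r13 = dot(y, x3) / math.sqrt(dot(y, y) * dot(x3, x3))
r23 = dot(x2, x3) / math.sqrt(dot(x2, x2) * dot(x3, x3))

# First-order partial correlations (standard formula).
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

r2_7_11_5 = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
r2_7_11_6 = r12 ** 2 + (1 - r12 ** 2) * r13_2 ** 2
r2_7_11_7 = r13 ** 2 + (1 - r13 ** 2) * r12_3 ** 2

print(r2_regression, r2_7_11_5, r2_7_11_6, r2_7_11_7)
print("R2 >= r12^2:", r2_regression >= r12 ** 2)
```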