Unit - III


Example (continued) From the data given earlier,

$$\mathcal{X} = \begin{bmatrix} -3 & -6 \\ 3 & 6 \\ 1 & -2 \\ -1 & 2 \\ 0 & 0 \end{bmatrix} \qquad (43)$$

and

$$\mathcal{X}'\mathcal{X} = \begin{bmatrix} 425 - 5(9^2) & 1562 - 5(9)(34) \\ 1562 - 5(9)(34) & 5860 - 5(34^2) \end{bmatrix} = \begin{bmatrix} 20 & 32 \\ 32 & 80 \end{bmatrix}$$

with

$$(\mathcal{X}'\mathcal{X})^{-1} = \frac{1}{144}\begin{bmatrix} 20 & -8 \\ -8 & 5 \end{bmatrix} \qquad (44)$$

and

$$\mathcal{X}'y = \begin{bmatrix} 665 - 5(14)(9) \\ 2430 - 5(14)(34) \end{bmatrix} = \begin{bmatrix} 35 \\ 50 \end{bmatrix}. \qquad (45)$$

Therefore, on substituting in (41),

$$\frac{1}{144}\begin{bmatrix} 20 & -8 \\ -8 & 5 \end{bmatrix}\begin{bmatrix} 35 \\ 50 \end{bmatrix} = \begin{bmatrix} 50/24 \\ -5/24 \end{bmatrix}$$

as in (28). And from (42)

$$\hat{b}_0 = 14 - \begin{bmatrix} 50/24 & -5/24 \end{bmatrix}\begin{bmatrix} 9 \\ 34 \end{bmatrix} = 56/24$$

as in (28).
Derivation of b̂ in this manner does not apply for the no-intercept model, which contains no b₀-term. For then the partitioning of b' as [b₀  𝒷'], with 𝒷 the vector of slope coefficients, does not exist; b' is itself the vector of the b's corresponding to the k x-variables, and b̂ = (X'X)⁻¹X'y is based on uncorrected sums of squares and products, as exemplified in (24).
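As a quick numerical check of the continued example, here is a minimal sketch assuming Python with NumPy (not part of the original text); it recomputes the slope estimates and the intercept from the centered matrix in (43), the corrected sums of products in (45), and the means ȳ = 14, x̄₁ = 9, x̄₂ = 34.

```python
import numpy as np

# Centered x-variables from (43); the example has N = 5, ybar = 14, xbar = (9, 34)
Xc = np.array([[-3, -6],
               [ 3,  6],
               [ 1, -2],
               [-1,  2],
               [ 0,  0]], dtype=float)

# Corrected sums of products X'y, as computed in (45)
Xc_y = np.array([665 - 5*14*9, 2430 - 5*14*34], dtype=float)   # = [35, 50]

slopes = np.linalg.solve(Xc.T @ Xc, Xc_y)    # (X'X)^{-1} X'y on corrected sums
b0 = 14 - np.array([9, 34]) @ slopes         # intercept as in (42)

print(slopes)   # [ 2.0833... -0.2083...]  i.e. 50/24 and -5/24
print(b0)       # 2.3333...                i.e. 56/24
```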

3. FOUR METHODS OF ESTIMATION

In deriving the estimator b̂ = (X'X)⁻¹X'y in the previous section we blithely adopted the least squares procedure for doing so. This is a well-accepted method of estimation and its rationale will not be discussed here. However, for convenient reference we summarize four common methods of estimation which, although differing in basic concept, all lead to the same estimator under certain frequently-used assumptions. All four procedures are summarized in terms of the full-rank model where, in y = Xb + e, X has full column rank, E(y) = Xb and E(e) = 0. Reference to their use in models not of full rank is made in Chapter 5.
a. Ordinary least squares
This involves choosing b̂ as the value of b which minimizes the sum of squares of deviations of the observations from their expected values; i.e., choose b̂ as that b which minimizes

$$\sum_{i=1}^{N} [y_i - E(y_i)]^2 = (y - Xb)'(y - Xb).$$

The resulting estimator is, as we have seen,

b̂ = (X'X)⁻¹X'y.
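As an illustration of this formula, the following sketch (assuming Python with NumPy and simulated data; the variable names are illustrative, not from the text) computes the ordinary least squares estimator by solving the normal equations X'Xb = X'y:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])   # full column rank
b_true = np.array([2.0, -1.0, 0.5])
y = X @ b_true + rng.normal(scale=0.3, size=N)

# Solve the normal equations X'X b = X'y rather than forming (X'X)^{-1} explicitly
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(b_ols)   # close to b_true
```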
b. Generalized least squares
On assuming that the variance-covariance matrix of e is var(e) = V, this method involves minimizing (y − Xb)'V⁻¹(y − Xb) with respect to b. This leads to

b̃ = (X'V⁻¹X)⁻¹X'V⁻¹y.

Clearly, when V = σ²I, the generalized and the ordinary least squares estimators are the same: b̃ = b̂.
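A minimal sketch of this computation, again assuming NumPy, with an arbitrary positive definite V (the function name and data are illustrative only):

```python
import numpy as np

def gls(X, y, V):
    """Generalized least squares: (X'V^{-1}X)^{-1} X'V^{-1} y."""
    Vinv_X = np.linalg.solve(V, X)      # V^{-1}X without forming V^{-1} explicitly
    Vinv_y = np.linalg.solve(V, y)      # V^{-1}y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# When V = sigma^2 I the result coincides with ordinary least squares:
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.normal(size=20)
sigma2 = 2.5
assert np.allclose(gls(X, y, sigma2 * np.eye(20)),
                   np.linalg.solve(X.T @ X, X.T @ y))
```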
c. Maximum likelihood
With least squares estimation no assumption is made about the form of the
distribution of the random error terms in the model, the terms represented by
e. With maximum likelihood estimation some assumption is made about this
distribution (often that it is normal) and the likelihood of the sample of
observations represented by the data is then maximized. On assuming that the
e's are normally distributed with zero mean and variance-covariance matrix V, i.e., e ~ N(0, V), the likelihood is

$$L = (2\pi)^{-\frac{1}{2}N}\,|V|^{-\frac{1}{2}} \exp\{-\tfrac{1}{2}(y - Xb)'V^{-1}(y - Xb)\}.$$

Maximizing this with respect to b is equivalent to solving ∂(logₑ L)/∂b = 0. The solution is the maximum likelihood estimator of b and turns out to be

b̃ = (X'V⁻¹X)⁻¹X'V⁻¹y,

the same as the generalized least squares estimator. As before, when V = σ²I, b̃ simplifies to b̂. Only then, in thinking of b̂ as the maximum likelihood estimator, do we do so on the basis of assuming e ~ N(0, σ²I).
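To illustrate this equivalence numerically, the sketch below (assuming NumPy and SciPy with simulated data; not part of the original text) maximizes the normal log-likelihood in b by a generic optimizer and compares the result with the closed-form generalized least squares estimator:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N, k = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
A = rng.normal(size=(N, N))
V = A @ A.T + N * np.eye(N)                    # a known, positive definite V
y = X @ np.array([1.0, 2.0, -0.5]) + rng.multivariate_normal(np.zeros(N), V)

Vinv = np.linalg.inv(V)

def neg_log_lik(b):
    r = y - X @ b
    return 0.5 * r @ Vinv @ r                  # only the term of -log L that depends on b

b_ml = minimize(neg_log_lik, x0=np.zeros(k)).x
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(b_ml, b_gls, atol=1e-4))     # True: the two estimators agree
```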
Two well-known points are worth emphasizing about these estimators. First, least squares estimation does not presuppose any distributional properties of the e's other than finite (in our case zero) means and finite variances. Second, maximum likelihood estimation under normality assumptions leads to the same estimator, b̃, as generalized least squares; and this reduces to the ordinary least squares estimator b̂ when V = σ²I.
d. The best linear unbiased estimator (b.l.u.e.)
For any row vector t' conformable with b the scalar t'b is a linear function
of the elements of the parameter vector b. A fourth estimation procedure
derives a best, linear, unbiased estimator (b.l.u.e.) of t'b.
The three characteristics of the estimator inherent in its definition lead to
its derivation.
(i) linearity: it is to be a linear function of the observations y. Let the estimator be λ'y, where λ' is a row vector of order N. Then λ is uniquely determined by the other two characteristics of the definition, as shall be shown.
(ii) unbiasedness: λ'y is to be an unbiased estimator of t'b. Therefore E(λ'y) must equal t'b; i.e., λ'Xb = t'b. Since this is to be true for all b,

λ'X = t'.  (46)

(iii) a "best" estimator: "best" means that in the class of linear, unbiased estimators of t'b, the "best" is to be the one that has minimum variance. This is the criterion for deriving λ'.
Suppose var(y) = V. Then v(λ'y) = λ'Vλ, and for λ'y to be "best" this variance must be a minimum; i.e., λ is chosen to minimize λ'Vλ subject to the limitation that λ'X = t' derived in (46). Using 2θ as a vector of Lagrange multipliers we therefore minimize

w = λ'Vλ − 2θ'(X'λ − t)

with respect to the elements of λ' and θ'. Clearly ∂w/∂θ = 0 gives (46), and ∂w/∂λ = 0 gives

Vλ = Xθ, or λ = V⁻¹Xθ,

since V⁻¹ exists. Substitution in (46) gives t' = λ'X = θ'X'V⁻¹X, and so θ' = t'(X'V⁻¹X)⁻¹, and hence

λ' = θ'X'V⁻¹ = t'(X'V⁻¹X)⁻¹X'V⁻¹.  (47)

Hence the b.l.u.e. of t'b is t'(X'V⁻¹X)⁻¹X'V⁻¹y, and its variance is

v(b.l.u.e. of t'b) = v(λ'y) = λ'Vλ = t'(X'V⁻¹X)⁻¹t  (48)

on substituting for λ from (47). These results are quite general: from among all estimators of t'b that are both linear and unbiased, the one having the smallest variance is t'(X'V⁻¹X)⁻¹X'V⁻¹y; and the value of this smallest variance is t'(X'V⁻¹X)⁻¹t.
Since (47) is the sole solution to the problem of minimizing λ'Vλ subject to (46), the b.l.u.e. λ'y of t'b is the unique estimator of t'b having the properties of linearity, unbiasedness and "bestness" (minimum variance among all linear unbiased estimators). Thus the b.l.u.e. of t'b is unique, namely λ'y for λ' given in (47).
Furthermore, this result is true for any vector t'. Thus for some other vector, p' say, the b.l.u.e. of p'b is p'(X'V⁻¹X)⁻¹X'V⁻¹y, and its variance is p'(X'V⁻¹X)⁻¹p; and its covariance with the b.l.u.e. of t'b is p'(X'V⁻¹X)⁻¹t, as may be readily shown.
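These properties of λ' in (47) and (48) are easy to check numerically. The sketch below, assuming NumPy with a simulated X, V and an arbitrary t (all illustrative, not from the text), verifies the unbiasedness condition λ'X = t' and the variance formula (48):

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 25, 4
X = rng.normal(size=(N, k))
A = rng.normal(size=(N, N))
V = A @ A.T + N * np.eye(N)                     # positive definite var(y)
t = rng.normal(size=k)                          # an arbitrary t'

M = np.linalg.inv(X.T @ np.linalg.solve(V, X))  # (X'V^{-1}X)^{-1}
lam = np.linalg.solve(V, X) @ M @ t             # lambda = V^{-1}X(X'V^{-1}X)^{-1}t, from (47)

print(np.allclose(lam @ X, t))                  # unbiasedness (46): lambda'X = t'
print(np.allclose(lam @ V @ lam, t @ M @ t))    # variance (48): t'(X'V^{-1}X)^{-1}t
```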
Suppose that t' takes the value uᵢ', the ith row of Iₖ. Then uᵢ'b is bᵢ, the ith element of b, and the b.l.u.e. of bᵢ is uᵢ'(X'V⁻¹X)⁻¹X'V⁻¹y, the ith element of (X'V⁻¹X)⁻¹X'V⁻¹y; and its variance is uᵢ'(X'V⁻¹X)⁻¹uᵢ, the ith diagonal term of (X'V⁻¹X)⁻¹. Thus by letting t' be, in turn, each row of Iₖ, the b.l.u.e. of b is

b̃ = (X'V⁻¹X)⁻¹X'V⁻¹y,  with var(b̃) = (X'V⁻¹X)⁻¹.  (49)
This expression for b̃ is identical to that given earlier; i.e., the generalized least squares estimator, the maximum likelihood estimator under normality assumptions and the b.l.u.e. are all the same, b̃.
It was shown above that b̂ = (X'X)⁻¹X'y is the b.l.u.e. of b when V = Iσ². More generally, McElroy (1967) has shown that b̂ is the b.l.u.e. of b whenever V = [(1 − ρ)I + ρ11']σ² for 0 ≤ ρ < 1. This form of V demands equality of variances of the eᵢ's, and equality of all covariances between them, with the correlation between any two eᵢ's being ρ; clearly ρ = 0 is the case V = Iσ².
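McElroy's result can be checked numerically for a particular ρ. The sketch below (assuming NumPy; simulated data, with a column of ones in X for the b₀-term as in the intercept model of this chapter) compares the ordinary and generalized least squares estimators under the equicorrelated V:

```python
import numpy as np

rng = np.random.default_rng(4)
N, rho, sigma2 = 40, 0.6, 1.5
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])    # model with an intercept
y = rng.normal(size=N)

# Equicorrelated errors: V = [(1 - rho)I + rho 11'] sigma^2
V = sigma2 * ((1 - rho) * np.eye(N) + rho * np.ones((N, N)))

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_gls = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, y))
print(np.allclose(b_ols, b_gls))   # True: ordinary least squares is b.l.u.e. under this V
```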

4. CONSEQUENCES OF ESTIMATION

Properties of b̂ = (X'X)⁻¹X'y and consequences thereof are now discussed. The topics dealt with in this section are based solely on the two properties so far attributed to e, that E(e) = 0 and var(e) = σ²I. In the next section we consider distributional properties, based upon the further assumption of normality of the e's; but this assumption is not made here. The general case of var(e) = V is left largely to the reader (see Sec. 5.8).
a. Unbiasedness
Since b̂ is the b.l.u.e. of b for V = σ²I, it is unbiased. This can also be shown directly:

E(b̂) = E[(X'X)⁻¹X'y] = (X'X)⁻¹X'E(y) = (X'X)⁻¹X'Xb = b.  (50)

Thus the expected value of b̂ is b and so b̂ is unbiased, implying, of course, that in b̂' = [b̂₀  𝒷̂'] the estimator 𝒷̂ is also unbiased.
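A small simulation sketch (assuming NumPy; purely illustrative, not from the text) shows the average of repeated estimates settling near b, consistent with (50):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])   # fixed design
b = np.array([1.0, -2.0, 0.5])
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)                    # (X'X)^{-1}X'

reps = 20000
estimates = np.empty((reps, 3))
for r in range(reps):
    y = X @ b + rng.normal(scale=0.8, size=N)                 # E(e) = 0, var(e) = sigma^2 I
    estimates[r] = XtX_inv_Xt @ y

print(estimates.mean(axis=0))   # close to [1.0, -2.0, 0.5]
```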
