Econ-607 - Unit2-W1-3


Addis Ababa University

College of Business and Economics


Department of Economics
Econ 607: Econometrics I
2. Regression Models
Fantu Guta Chemrie (PhD)
2. Regression Models

2.1 K-Variables Linear Regression Models

The K-variable linear regression model is specified as:

$$y_i = \beta_1 + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \cdots + \beta_k x_{i,k} + e_i; \quad i = 1, 2, \ldots, n$$

$$= 1\cdot\beta_1 + x_{i,2}\beta_2 + x_{i,3}\beta_3 + \cdots + x_{i,k}\beta_k + e_i$$

$$= \begin{pmatrix} 1 & x_{i,2} & \cdots & x_{i,k} \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + e_i = x_i'\beta + e_i$$
Stacking all the n observations as a column vector gives:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1'\beta + e_1 \\ x_2'\beta + e_2 \\ \vdots \\ x_n'\beta + e_n \end{pmatrix} = \begin{pmatrix} x_1'\beta \\ x_2'\beta \\ \vdots \\ x_n'\beta \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}\beta + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = x\beta + e$$

Therefore, a K-variables linear regression model can compactly be written as:

$$\underset{(n\times 1)}{y} = \underset{(n\times k)}{x}\,\underset{(k\times 1)}{\beta} + \underset{(n\times 1)}{e} \qquad \text{(observables: } y, x\text{)}$$

To obtain the least squares estimator $\hat{\beta}$ of $\beta$, minimize

$$S(\beta) = e'e = (y - x\beta)'(y - x\beta) = y'y - y'x\beta - \beta'x'y + \beta'x'x\beta = y'y - 2\beta'x'y + \beta'x'x\beta$$

The first order condition for a minimum is obtained as follows:

$$\frac{\partial S(\beta)}{\partial \beta} = -2x'y + 2x'x\beta, \qquad \left.\frac{\partial S(\beta)}{\partial \beta}\right|_{\beta = \hat{\beta}} = 0$$

$$\Rightarrow\; -2x'y + 2x'x\hat{\beta} = 0 \;\Rightarrow\; \hat{\beta} = (x'x)^{-1}x'y$$

$\hat{\beta}$ is a linear estimator.
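A minimal numerical sketch of the formula $\hat{\beta} = (x'x)^{-1}x'y$ on simulated data (the design, coefficients, and seed below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # first column: intercept
beta = np.array([1.0, 2.0, -0.5])
y = x @ beta + rng.normal(size=n)

# beta_hat = (x'x)^{-1} x'y; solve() is preferred to forming an explicit inverse
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
print(beta_hat)   # close to the true beta
```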


Consider the class of all linear estimators

$$\tilde{\beta} = Ay = \left[B + (x'x)^{-1}x'\right]y$$

$$E\left(\tilde{\beta}\,\middle|\,x\right) = \left[B + (x'x)^{-1}x'\right]E(y|x) = \left[B + (x'x)^{-1}x'\right]E\left[(x\beta + e)\,\middle|\,x\right]$$

Assume that $E(e|x) = 0$. Then

$$E\left(\tilde{\beta}\,\middle|\,x\right) = \left[B + (x'x)^{-1}x'\right]x\beta = Bx\beta + \beta = \beta \;\;\forall\,\beta \;\text{ iff } Bx = 0$$

So, if we assume

a) $E(e|x) = 0$, then any other linear unbiased estimator has the property that $Bx = 0$, and thus $\hat{\beta}$ is unbiased, as $Bx = 0$ holds trivially for $\hat{\beta}$ (where $B = 0$). If we restrict ourselves to the class of all linear unbiased estimators $\tilde{\beta}$, we need an assumption for $\text{var}(e)$.

b) $\text{var}(e|x) = \sigma^2 I_n$, i.e., $e$ is homoscedastic and uncorrelated; then

$$\tilde{\beta} - E\left(\tilde{\beta}\,\middle|\,x\right) = \left[B + (x'x)^{-1}x'\right]e$$


Therefore,

$$\text{var}\left(\tilde{\beta}\,\middle|\,x\right) = E\left[\left(\tilde{\beta} - E\tilde{\beta}\right)\left(\tilde{\beta} - E\tilde{\beta}\right)'\,\middle|\,x\right]$$

$$= E\left[\left(B + (x'x)^{-1}x'\right)ee'\left(x(x'x)^{-1} + B'\right)\middle|\,x\right]$$

$$= \left(B + (x'x)^{-1}x'\right)E\left(ee'\,|\,x\right)\left(x(x'x)^{-1} + B'\right)$$

$$= \sigma^2 BB' + \sigma^2(x'x)^{-1}, \quad \text{as } Bx = x'B' = 0$$

This exceeds $\sigma^2(x'x)^{-1}$ by the positive semi-definite matrix $\sigma^2 BB'$, minimized by choosing $B = 0$, i.e., $\tilde{\beta} = \hat{\beta}$.

Thus, $\hat{\beta}$ has the minimum variance among the class of all linear unbiased estimators (MVLUE/BLUE) and has variance given by $\text{var}\left(\hat{\beta}\,\middle|\,x\right) = \sigma^2(x'x)^{-1}$.

Gauss-Markov Theorem

If $\tilde{\beta}$ is any other linear unbiased estimator of $\beta$, then $\text{var}\left(\tilde{\beta}\,\middle|\,x\right) - \text{var}\left(\hat{\beta}\,\middle|\,x\right)$ is a positive semi-definite matrix, so for the variance of any linear combination, $\text{var}\left(\lambda'\hat{\beta}\,\middle|\,x\right) \le \text{var}\left(\lambda'\tilde{\beta}\,\middle|\,x\right)$.


This is to mean that

$$\text{var}\left(\lambda'\tilde{\beta}\,\middle|\,x\right) - \text{var}\left(\lambda'\hat{\beta}\,\middle|\,x\right) = \lambda'\,\text{var}\left(\tilde{\beta}\,\middle|\,x\right)\lambda - \lambda'\,\text{var}\left(\hat{\beta}\,\middle|\,x\right)\lambda = \lambda'\left[\text{var}\left(\tilde{\beta}\,\middle|\,x\right) - \text{var}\left(\hat{\beta}\,\middle|\,x\right)\right]\lambda \ge 0$$

N.B. $\lambda'\hat{\beta}$ is an estimator of $\lambda'\beta$.

Example

Example of a PD matrix: $A = \begin{pmatrix} 4 & 1 & 0 \\ 1 & 5 & 1 \\ 0 & 1 & 1 \end{pmatrix}$.
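As a quick illustrative check (not part of the slides): a symmetric matrix is positive definite if and only if a Cholesky factorization exists, so the example matrix A can be verified numerically:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 1.0],
              [0.0, 1.0, 1.0]])
try:
    np.linalg.cholesky(A)          # succeeds iff A is positive definite
    print("A is positive definite")
except np.linalg.LinAlgError:
    print("A is not positive definite")
print(np.linalg.eigvalsh(A))       # all eigenvalues strictly positive
```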


Least squares can be regarded as decomposing the vector y into two orthogonal components: Py is the projection of y onto the space spanned by the column vectors of the matrix x, and My is the projection of y onto the space orthogonal to x.

$$y'y = y'Py + y'My$$

$$y'y - \tfrac{1}{n}y'ii'y = y'Py - \tfrac{1}{n}y'ii'y + y'My$$

where i is a column vector of ones of order n. Then,

$$y'\left(I_n - \tfrac{1}{n}ii'\right)y = y'Py - \tfrac{1}{n}y'Pii'Py + y'My$$

where $P = x(x'x)^{-1}x'$ and $Pi = i$, as $i = xJ$ where $J = \begin{pmatrix} 1 \\ 0_{(k-1)\times 1} \end{pmatrix}$.

The matrix P is an idempotent matrix, i.e., $PP = P$ and $P = P'$.
Therefore,

$$y'\left(I_n - \tfrac{1}{n}ii'\right)y = y'P\left(I_n - \tfrac{1}{n}ii'\right)Py + y'My$$

$$\Rightarrow\; y'Ay = y'PAPy + y'My, \quad \text{where } A = I_n - \tfrac{1}{n}ii'$$

Implicit assumptions:

i). The model is correctly specified.
ii). $\text{rank}(x) = k$, full column rank, and x is fixed (non-stochastic).
iii). $E(e|x) = 0$.
iv). $\text{var}(e|x) = \sigma^2 I_n$.

Residuals:

$$\hat{e} = y - x\hat{\beta}, \quad \text{where } \hat{\beta} = (x'x)^{-1}x'y$$
$$= y - x(x'x)^{-1}x'y = \left(I_n - x(x'x)^{-1}x'\right)y$$
$$= (I_n - P)y, \quad \text{where } P = x(x'x)^{-1}x'$$
$$= My, \quad \text{where } M = I_n - P$$

We know that $Px = x$ and $Mx = 0$; P and M are both idempotent matrices.

$$\Rightarrow\; My = M(x\beta + e) = Me$$

$$\Rightarrow\; E(\hat{e}|x) = E(My|x) = E(Me|x) = M\,E(e|x) = 0$$

$$\text{var}(\hat{e}|x) = E\left(\hat{e}\hat{e}'\,|\,x\right) = E\left(Mee'M'\,|\,x\right) = E\left(Mee'M\,|\,x\right) = M\,E\left(ee'\,|\,x\right)M = \sigma^2 M$$

$$\text{tr}(M) = \text{tr}\left(I_n - x(x'x)^{-1}x'\right) = \text{tr}(I_n) - \text{tr}\left(x(x'x)^{-1}x'\right) = n - \text{tr}\left((x'x)^{-1}x'x\right) = n - \text{tr}(I_k) = n - k = \text{rank}(M)$$

In fact, the residuals obey the k linear constraints

$$x'\hat{e} = x'My = 0, \quad \text{as } Mx = 0,$$

so the residual vector has a degenerate distribution (it satisfies k exact linear restrictions).

To estimate $\sigma^2$, consider $\hat{e}'\hat{e}$:

$$E\left(\hat{e}'\hat{e}\,\middle|\,x\right) = E\left[(Me)'(Me)\,\middle|\,x\right] = E\left(e'M'Me\,\middle|\,x\right) = E\left(e'Me\,\middle|\,x\right)$$
$$= E\left[\text{tr}\left(e'Me\right)\middle|\,x\right], \;\text{as } e'Me \text{ is a scalar}$$
$$= E\left[\text{tr}\left(Mee'\right)\middle|\,x\right], \;\text{as } \text{tr}(AB) = \text{tr}(BA)$$
$$= \text{tr}\left[M\,E\left(ee'\,|\,x\right)\right] = \text{tr}\left(M\sigma^2 I_n\right) = \sigma^2\,\text{tr}(M) = \sigma^2(n-k)$$


Therefore, $S^2 = \dfrac{\hat{e}'\hat{e}}{n-k}$ has $E(S^2) = \sigma^2$ and is an unbiased estimator of $\sigma^2$.

N.B. x is "fixed": we just need $E(y|x) = x\beta$ and $\text{var}(y|x) = \sigma^2 I_n$, i.e., $\text{var}(e|x) = \sigma^2 I_n$.

When is conditioning valid, i.e., when is there no loss of information? (Exactly or asymptotically) when x is weakly exogenous.

v). $e|x \sim N(0, \sigma^2 I_n)$.

So assumptions (iii)-(v) imply $y|x \sim N(x\beta, \sigma^2 I_n)$.
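An illustrative numerical check (on simulated data, with made-up design and seed) of the projection algebra above: M is idempotent, tr(M) = n - k, the residuals are orthogonal to x, and S² estimates σ²:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = x @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)  # sigma^2 = 1

P = x @ np.linalg.solve(x.T @ x, x.T)    # P = x (x'x)^{-1} x'
M = np.eye(n) - P
e_hat = M @ y

print(np.allclose(M @ M, M))             # idempotent
print(np.isclose(np.trace(M), n - k))    # tr(M) = n - k
print(np.allclose(x.T @ e_hat, 0))       # k linear constraints x'e_hat = 0
print(e_hat @ e_hat / (n - k))           # S^2, close to sigma^2 = 1
```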
1). To obtain the distribution of $\hat{\beta}$, we need the following theorem.

Theorem

$\underset{(J\times 1)}{x} \sim N(\mu, V) \;\Rightarrow\; Lx \sim N(L\mu, LVL')$. More generally: $Lx + \gamma \sim N(L\mu + \gamma, LVL')$.

$$\hat{\beta} = (x'x)^{-1}x'y = (x'x)^{-1}x'(x\beta + e) = \beta + (x'x)^{-1}x'e$$

$$\Rightarrow\; e|x \sim N\left(0, \sigma^2 I_n\right) \;\Rightarrow\; \hat{\beta}\,|\,x \sim N\left(\beta, \sigma^2(x'x)^{-1}\right)$$


2). $\hat{\beta}$ and $S^2$ are independent. To show this, we need the following theorem:

Theorem

If $x \sim N(0, I)$, then $Lx$ and $x'Ax$ are independent if $LA = 0$.

$$\frac{1}{\sigma}\left(\hat{\beta} - \beta\right) = \frac{1}{\sigma}(x'x)^{-1}(x'e) = (x'x)^{-1}x'\left(\frac{e}{\sigma}\right)$$

$$\text{and} \quad \frac{1}{\sigma^2}\hat{e}'\hat{e} = \frac{1}{\sigma^2}e'Me = \left(\frac{e}{\sigma}\right)'M\left(\frac{e}{\sigma}\right)$$

are independent, as $(x'x)^{-1}x'M = 0$ and $\dfrac{e}{\sigma} \sim N(0, I)$.

$\Rightarrow\; \hat{\beta}$ and $S^2$ are independent.

3). Distribution of $S^2$:

$$\frac{(n-k)S^2}{\sigma^2} = \left(\frac{e}{\sigma}\right)'M\left(\frac{e}{\sigma}\right) \sim \chi^2(n-k),$$

i.e., $\left(\dfrac{e}{\sigma}\right)'M\left(\dfrac{e}{\sigma}\right) \sim \chi^2(\text{rank}(M))$.

Now, suppose we want to test one or more linear restrictions concerning β. For example, say k = 4, and

$$H_0: \begin{cases} \beta_1 = 0 \\ \beta_2 = \beta_3 \\ \beta_4 = 1 \end{cases}$$

This hypothesis can be expressed in matrix form as:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
In general, a linear hypothesis is of the form

$$H_0: R\beta = r$$

Assume our constraints are linearly independent: R is $(q \times k)$ with $\text{rank}(R) = q$ (if not, we eliminate the linearly dependent constraints).

Consider:

$$\hat{\beta}\,|\,x \sim N\left(\beta, \sigma^2(x'x)^{-1}\right)$$
$$R\hat{\beta}\,|\,x \sim N\left(R\beta, \sigma^2 R(x'x)^{-1}R'\right)$$

and so, under $H_0: R\beta = r$,

$$R\hat{\beta} - r\,|\,x \sim N\left(0, \sigma^2 R(x'x)^{-1}R'\right)$$

Now, we need an extra theorem:

Theorem

If $\underset{(J\times 1)}{x} \sim N(\mu, V)$, then $(x - \mu)'V^{-1}(x - \mu) \sim \chi^2(J)$, where V is a positive definite matrix.

Proof.

Let $PP' = V \;\Rightarrow\; I_J = P^{-1}V(P')^{-1}$ and $V^{-1} = (P')^{-1}P^{-1}$.

So, $(x - \mu)'V^{-1}(x - \mu) = (x - \mu)'(P')^{-1}P^{-1}(x - \mu) = y'y$, where $y = P^{-1}(x - \mu)$ and $(x - \mu) \sim N(0, V)$.

So, $y = P^{-1}(x - \mu) \sim N\left(0, P^{-1}V(P')^{-1}\right) = N(0, I_J)$

$$\Rightarrow\; (x - \mu)'V^{-1}(x - \mu) = y'y \sim \chi^2(J)$$
Returning to $R\hat{\beta} - r\,|\,x \sim N\left(0, \sigma^2 R(x'x)^{-1}R'\right)$:

$$\Rightarrow\; \left(R\hat{\beta} - r\right)'\left[\sigma^2 R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \sim \chi^2(q), \;\text{under } H_0$$

$$\Rightarrow\; \frac{1}{\sigma^2}\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \sim \chi^2(q) \tag{2.1.1}$$

Here, we are assuming that since $(x'x)^{-1}$ is positive definite and $\text{rank}(R) = q$, $R(x'x)^{-1}R'$ is positive definite and thus invertible.

We cannot use (2.1.1) for a test (σ² is unknown), but if we divide by the degrees of freedom q, we have a $\chi^2(q)/q$ variate, independent of $\left[(n-k)S^2/\sigma^2\right]/(n-k) \sim \chi^2(n-k)/(n-k)$.

Thus, the σ² cancels on division, as do the (n-k)'s, and

$$F = \frac{\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)}{qS^2} \sim F(q, n-k), \;\text{under } H_0$$

i). Testing the significance of a particular coefficient can be carried out as follows:

$$\beta_j = 0; \quad q = 1; \quad R = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} = e_j' \quad (1 \text{ in the } j\text{th position})$$
$R(x'x)^{-1}R' = e_j'(x'x)^{-1}e_j = (x'x)^{-1}_{jj}$, the jth principal diagonal element of $(x'x)^{-1}$.

$$\frac{\left[R(x'x)^{-1}R'\right]^{-1}}{qS^2} = \frac{1}{S^2(x'x)^{-1}_{jj}} = \frac{1}{\left[\widehat{\text{s.e.}}\left(\hat{\beta}_j\right)\right]^2}, \quad \text{s.e.} = \text{standard error}$$

So, we can test $H_0: \beta_j = 0$ by calculating

$$\frac{\hat{\beta}_j}{\widehat{\text{s.e.}}\left(\hat{\beta}_j\right)} \sim t(n-k) \;\text{under } H_0.$$

If the alternative is $H_A: \beta_j \neq 0$, we use a two-tailed test, with the $2\tfrac{1}{2}\%$ point of the t statistic being about 2 (or see a table).
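An illustrative sketch of t-ratios and an F test of Rβ = r on simulated data (the design, restrictions, and true coefficients below are made up for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 120, 4
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = x @ np.array([1.0, 0.0, 1.5, -2.0]) + rng.normal(size=n)

xtx_inv = np.linalg.inv(x.T @ x)
b = xtx_inv @ x.T @ y
e_hat = y - x @ b
s2 = e_hat @ e_hat / (n - k)                 # S^2

se = np.sqrt(s2 * np.diag(xtx_inv))          # standard errors
t_ratios = b / se
p_vals = 2 * stats.t.sf(np.abs(t_ratios), df=n - k)

# F test of H0: beta_2 = 0 and beta_3 = beta_4  (q = 2 restrictions)
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
r = np.zeros(2)
d = R @ b - r
F = d @ np.linalg.solve(R @ xtx_inv @ R.T, d) / (2 * s2)
print(t_ratios, p_vals, F, stats.f.sf(F, 2, n - k))
```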
Standard regression output might look like the following table:

Equation for US investment, 1968-82

Variable        Coefficient   Standard error   t-ratio
Constant        0.509         0.05510          9.2
Time            0.017         0.00197          8.4
Real GDP        0.670         0.05500          12.2
Interest rate   0.002         0.00122          1.91
Inflation       0.9E-4        0.13E-2          0.07

"Interest rate" is borderline: significant at 10%, or at 5% on a one-tailed test.

Note that x has five columns,

$$x = \begin{pmatrix} 1 & 1 & x_{13} & x_{14} & x_{15} \\ 1 & 2 & x_{23} & x_{24} & x_{25} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & n & x_{n3} & x_{n4} & x_{n5} \end{pmatrix}$$

Using $x_2' = \begin{pmatrix} 1 & \cdots & n \end{pmatrix}$ or $x_2' = \begin{pmatrix} 1968 & \cdots & 1982 \end{pmatrix}$ makes a difference only to the constant term.

If we accept $\beta_j = 0$, we can eliminate $x_j$ from the equation and reduce k by 1. However, our different t ratios are not independent, either within a regression or between different regressions for the same y.
ii). Other Linear Hypotheses:

If $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$; $H_0: \beta_2 = 0$ (a subset of β, without loss of generality the last J elements, is zero):

$$R = \begin{pmatrix} 0 & I_J \end{pmatrix}; \quad r = \underset{(J\times 1)}{0}; \quad R\beta = \begin{pmatrix} 0 & I_J \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \beta_2$$

What is $R(x'x)^{-1}R'$?

If $V = (x'x)^{-1} = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$, where $V_{22}$ is $(J \times J)$, then

$$R(x'x)^{-1}R' = V_{22}; \quad R\hat{\beta} - r = \hat{\beta}_2 \;\text{ if } \hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$$

So,

$$F = \frac{\hat{\beta}_2' V_{22}^{-1}\hat{\beta}_2}{JS^2} \sim F(J, n-k) \;\text{under } H_0: \beta_2 = 0$$

To make progress with this, we need to be able to partition the inverse of (x'x). Alternatively,

iii). The hypothesis $R\beta = r$ can also be tested using:

$$F = \frac{(\tilde{e}'\tilde{e} - \hat{e}'\hat{e})/J}{\hat{e}'\hat{e}/(n-k)} = \frac{(RRSS - URSS)/J}{URSS/(n-k)} \sim F(J, n-k), \quad \text{where } S^2 = \frac{\hat{e}'\hat{e}}{n-k}$$

R² and R̄²

$$R^2 = 1 - \frac{\hat{e}'\hat{e}}{\sum_{i=1}^n (y_i - \bar{y})^2} = 1 - \frac{\hat{e}'\hat{e}}{y'Ay}$$

where $A = I_n - \tfrac{1}{n}ii'$ and $i = \begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}'$.


R²: descriptive, the proportion of variation explained.

$R^2 = \left[r(\hat{y}, y)\right]^2$ as long as x contains $i = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}'$, where $\hat{y} = x\hat{\beta}$.

But adding columns to the x matrix can only increase R²; therefore, $R_1^2 \le R_2^2$.

$$\bar{R}^2 = 1 - \frac{\hat{e}'\hat{e}/(n-k)}{y'Ay/(n-1)}$$

R̄²: attempts to avoid an ever-increasing R², but adding variables increases R̄² if $F(\beta_2 = 0) > 1$.

Note that $\sum_{i=1}^n (y_i - \bar{y})^2$ is the RRSS obtained from a regression of y on i, and $\hat{e}'\hat{e} = URSS$.

$$F = \frac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} \sim F(k-1, n-k)$$

tests $H_0: \beta_2 = 0$ (a joint test of significance on the slopes).

If the regression does not contain a constant term, R² can be negative. When the regression does not include the constant term, to avoid $R^2 < 0$, use $\left[r(\hat{y}, y)\right]^2$.
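A short illustrative computation of R² and R̄² from the quantities above (function and data are a sketch, not from the slides):

```python
import numpy as np

def r_squared(y, x):
    n, k = x.shape
    b = np.linalg.solve(x.T @ x, x.T @ y)
    e_hat = y - x @ b
    tss = np.sum((y - y.mean()) ** 2)   # y'Ay (RRSS from regressing y on i)
    rss = e_hat @ e_hat                 # URSS
    r2 = 1 - rss / tss
    r2_bar = 1 - (rss / (n - k)) / (tss / (n - 1))
    return r2, r2_bar

rng = np.random.default_rng(3)
n = 80
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = x @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)
print(r_squared(y, x))
```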

2.2 Restricted Least Squares

To obtain the restricted least squares estimator $\tilde{\beta}$ of β, minimize the sum of squared errors with respect to β, subject to the given constraint:

$$S(\beta) = (y - x\beta)'(y - x\beta) \quad \text{subject to } R\beta = r$$
The Lagrangian is given by:

$$\mathcal{L}(\beta, \lambda) = (y - x\beta)'(y - x\beta) + 2\lambda'(r - R\beta)$$

The first order conditions are

$$\left.\frac{\partial \mathcal{L}(\beta, \lambda)}{\partial \beta}\right|_{\tilde{\beta}, \tilde{\lambda}} = -2x'y + 2x'x\tilde{\beta} - 2R'\tilde{\lambda} = 0 \tag{2.2.1}$$

$$\left.\frac{\partial \mathcal{L}(\beta, \lambda)}{\partial \lambda}\right|_{\tilde{\beta}, \tilde{\lambda}} = 2\left(r - R\tilde{\beta}\right) = 0 \tag{2.2.2}$$

From equation (2.2.1), it follows that

$$x'x\tilde{\beta} = x'y + R'\tilde{\lambda}$$
$$\Rightarrow\; \tilde{\beta} = (x'x)^{-1}x'y + (x'x)^{-1}R'\tilde{\lambda} = \hat{\beta} + (x'x)^{-1}R'\tilde{\lambda}$$
$$\Rightarrow\; R\tilde{\beta} = R\hat{\beta} + R(x'x)^{-1}R'\tilde{\lambda} \tag{2.2.3}$$

Using equation (2.2.2), equation (2.2.3) becomes

$$r = R\hat{\beta} + R(x'x)^{-1}R'\tilde{\lambda}$$
$$\Rightarrow\; \tilde{\lambda} = \left[R(x'x)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right)$$
$$\Rightarrow\; \tilde{\beta} = \hat{\beta} + (x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right)$$
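A sketch of the restricted least squares formula just derived, on made-up data and a made-up restriction:

```python
import numpy as np

# beta_tilde = beta_hat + (x'x)^{-1} R' [R (x'x)^{-1} R']^{-1} (r - R beta_hat)
def restricted_ls(x, y, R, r):
    xtx_inv = np.linalg.inv(x.T @ x)
    b = xtx_inv @ x.T @ y
    adj = np.linalg.solve(R @ xtx_inv @ R.T, r - R @ b)
    return b + xtx_inv @ R.T @ adj

rng = np.random.default_rng(4)
n = 100
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = x @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, 1.0]])    # restriction: beta_2 + beta_3 = 5
r = np.array([5.0])
b_tilde = restricted_ls(x, y, R, r)
print(b_tilde, R @ b_tilde)        # R beta_tilde = r holds exactly
```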
Note that if $R\hat{\beta} = r$, then $\tilde{\beta} = \hat{\beta}$. One can find $E\left(\tilde{\beta}\right)$ and $\text{var}\left(\tilde{\beta}\right)$, and establish that $\text{var}\left(\hat{\beta}\right) - \text{var}\left(\tilde{\beta}\right)$ is psd; but $\tilde{\beta}$ is biased unless $R\beta = r$.

One can compare $MSE\left(\tilde{\beta}\right)$ and $MSE\left(\hat{\beta}\right)$:

$$MSE\left(\tilde{\beta}\right) = E\left[\left(\tilde{\beta} - \beta\right)\left(\tilde{\beta} - \beta\right)'\right]$$
$$= E\left\{\left[\left(\tilde{\beta} - E\tilde{\beta}\right) + \left(E\tilde{\beta} - \beta\right)\right]\left[\left(\tilde{\beta} - E\tilde{\beta}\right) + \left(E\tilde{\beta} - \beta\right)\right]'\right\}$$
$$= \text{var}\left(\tilde{\beta}\right) + \left[E\tilde{\beta} - \beta\right]\left[E\tilde{\beta} - \beta\right]'$$

Consider whether it is worth imposing restrictions that may be false. It turns out that the condition for

$$MSE\left(\tilde{\beta}\right) - MSE\left(\hat{\beta}\right) \;\text{nsd}$$

is

$$\lambda = (R\beta - r)'\left[R(x'x)^{-1}R'\right]^{-1}(R\beta - r)/\sigma^2 < 1$$

Under H₀, our F test is the ratio of two independent χ²'s divided by their degrees of freedom. Under H₁, the numerator has a non-central χ² (density shifted to the right), and the ratio is a non-central F distribution with non-centrality parameter λ or λ/2, depending on the correction adopted.

The right shift explains our use of the upper tail when testing. The MSE argument suggests testing λ < 1, rather than Rβ = r, using the 5% upper tail of a non-central F distribution $F'(q, n-k; \lambda)$ with λ = 1.
Returning to restricted least squares: if $\tilde{e} = y - x\tilde{\beta}$, then

$$\tilde{e} = y - x\hat{\beta} + x\hat{\beta} - x\tilde{\beta} = \hat{e} + x\left(\hat{\beta} - \tilde{\beta}\right)$$

$$\Rightarrow\; \tilde{e}'\tilde{e} = \hat{e}'\hat{e} + \left(\hat{\beta} - \tilde{\beta}\right)'x'x\left(\hat{\beta} - \tilde{\beta}\right)$$

But $\hat{\beta} - \tilde{\beta} = (x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$

$$\Rightarrow\; \tilde{e}'\tilde{e} - \hat{e}'\hat{e} = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

Thus, RRSS − URSS has the form asserted earlier.

Maximum Likelihood Estimation

We have considered the linear model $y = x\beta + e$, assuming:

i). x is full column rank and fixed (non-stochastic)
ii). $E(e) = 0$; $E(ee') = \sigma^2 I_n$, or
iii). $e \sim N(0, \sigma^2 I_n)$

Thus OLS yields estimators which are BLUE given these assumptions. The assumption of normality is strong, but it has an advantage: it gives us exact finite-sample distributional results.

It is desirable to have a method which is robust to departures from these assumptions; OLS is unlikely to be robust. In particular,

i). x may be stochastic
ii). e may be non-normal, and may be correlated.

We may relax these assumptions, but we sacrifice the exact finite-sample distributional results, and so must rely on asymptotic results.
2.3 Asymptotic Theory

Concerned with the behaviour of random variables as the sample size tends to infinity. In particular, how does the random variable $\hat{\beta}$ behave as $n \to \infty$?

Convergence in Probability:

Suppose we have a sample of n observations on X drawn at random from a distribution with mean μ and variance σ². Consider $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then $E\left(\bar{X}_n\right) = \mu$ and $\text{var}\left(\bar{X}_n\right) = \sigma^2/n$, i.e., $\bar{X}_n$ is unbiased and $\text{var}\left(\bar{X}_n\right) \to 0$ as $n \to \infty$, i.e., the distribution of $\bar{X}_n$ becomes more and more concentrated around μ.

Consider the neighbourhood $\mu \pm \varepsilon$; the probability that $\bar{X}_n$ lies in this interval is given by:

$$P\left\{\mu - \varepsilon < \bar{X}_n < \mu + \varepsilon\right\} = P\left\{\left|\bar{X}_n - \mu\right| < \varepsilon\right\} \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$$

We can make the interval smaller by decreasing ε. Since the variance declines monotonically with n, there exist an $n^*$ and a δ (0 < δ < 1) such that for all $n > n^*$

$$P\left\{\left|\bar{X}_n - \mu\right| < \varepsilon\right\} > 1 - \delta$$

Then, the random variable $\bar{X}_n$ is said to converge in probability to μ.

Alternatively, as n increases, the probability that $\bar{X}_n$ lies in a specified interval increases (i.e., δ decreases), so equivalently:

$$\lim_{n\to\infty} P\left\{\left|\bar{X}_n - \mu\right| < \varepsilon\right\} = 1, \quad \text{i.e., } \text{plim}\,\bar{X}_n = \mu$$

i.e., by decreasing δ we can make the probability of $\bar{X}_n$ lying in an arbitrarily small interval around μ equal to one. We write $\text{plim}\,\bar{X}_n = \mu$.
We say that $\bar{X}_n$ is a consistent estimator of μ.

Consider an alternative estimator of μ, denoted $m_n$, such that $E(m_n) = \mu + \frac{c}{n}$, where c is a constant. This is obviously biased in small samples, but $\lim_{n\to\infty} E(m_n) = \mu$, i.e., $m_n$ is asymptotically unbiased.

Chebyshev's Theorem:

For a random variable X with finite mean and variance (μ, σ²) and for a given λ > 0,

$$P\left\{|X - \mu| \ge \lambda\sigma\right\} \le \frac{1}{\lambda^2}.$$

Using this theorem, we obtain

$$P\left\{\left|m_n - \mu - \tfrac{c}{n}\right| \ge \lambda\sqrt{\text{var}(m_n)}\right\} \le \frac{1}{\lambda^2}$$

Letting $\varepsilon = \lambda\sqrt{\text{var}(m_n)}$, we have

$$P\left\{\left|m_n - \mu - \tfrac{c}{n}\right| \ge \varepsilon\right\} \le \frac{\text{var}(m_n)}{\varepsilon^2}$$

$$\Rightarrow\; \lim_{n\to\infty} P\left\{\left|m_n - \mu - \tfrac{c}{n}\right| \ge \varepsilon\right\} = 0$$

so that $m_n$ is consistent if $\text{var}(m_n) \to 0$ as $n \to \infty$.

Sufficient conditions for consistency:

i). Asymptotic unbiasedness
ii). Variance → 0 as $n \to \infty$.

The operator plim has the following properties:

$$\text{plim}\,X^2 = (\text{plim}\,X)^2, \qquad \text{plim}\,XY = (\text{plim}\,X)(\text{plim}\,Y)$$
Example

Consistency of the least squares estimator $\hat{\beta}$:

$$\hat{\beta} = (x'x)^{-1}x'y = \beta + (x'x)^{-1}x'e = \beta + \left(x'x/n\right)^{-1}\left(x'e/n\right) = \beta + A^{-1}B$$

$$\text{where } A = \frac{x'x}{n}, \quad B = \frac{x'e}{n}$$

If x is stochastic, we will assume that

$$\text{plim}\,\frac{x'x}{n} = \Sigma \quad \text{(positive definite)}$$
If x contains the constant term:

$$\text{plim}\,\frac{x'e}{n} = \begin{pmatrix} \text{plim}\,\sum e_i/n \\ \text{plim}\,\sum x_{i,2}e_i/n \\ \vdots \\ \text{plim}\,\sum x_{i,k}e_i/n \end{pmatrix}$$

Since $E(\bar{e}) = 0$ and $\text{var}(\bar{e}) = \sigma^2/n$, $\text{plim}\,\bar{e} = 0$.

Now, $E\left(\sum_{i=1}^n x_{i,j}e_i/n\right) = 0$ for every $j = 2, \ldots, k$, if

i). x is fixed or non-stochastic, or
ii). x is stochastic with $\text{cov}(x, e) = 0$.

Now, $\text{var}\left(\sum_{i=1}^n x_{i,j}e_i/n\right) = \dfrac{\sigma^2}{n}\sum_{i=1}^n x_{i,j}^2/n$; $j = 2, \ldots, k$.

In view of the assumption that $\text{plim}\,x'x/n = \Sigma$, we have that $\text{plim}\,\sum_{i=1}^n x_{i,j}^2/n$ is constant and $\text{plim}\,\sigma^2/n$ is zero, i.e.,

$$\text{plim}\,\frac{x'e}{n} = 0$$

$$\Rightarrow\; \text{plim}\,\hat{\beta} = \beta + \left(\text{plim}\,A\right)^{-1}\left(\text{plim}\,B\right) = \beta + \Sigma^{-1}\cdot 0 = \beta$$
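An illustrative Monte Carlo sketch (simulated data) of the consistency result: the OLS estimator concentrates around β as n grows:

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 2.0])
for n in [50, 500, 5000, 50000]:
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = x @ beta + rng.normal(size=n)
    b = np.linalg.solve(x.T @ x, x.T @ y)
    print(n, b - beta)   # deviations shrink toward zero as n grows
```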

Convergence in distribution

Suppose $X \sim N(\mu, \sigma^2)$; then $\bar{X}_n \sim N(\mu, \sigma^2/n)$. But as $n \to \infty$, the distribution of $\bar{X}_n$ is degenerate, i.e., it collapses around the point μ. However, consider

$$Z_n = \sqrt{n}\left(\bar{X}_n - \mu\right) \;\Rightarrow\; E(Z_n) = 0, \;\text{var}(Z_n) = \sigma^2$$

Thus the density function (distribution) of $Z_n$ is $N(0, \sigma^2)$, which is independent of n, and hence finite-sample distributions are the same as the limiting distribution.

Often, small-sample distributions cannot be derived or are difficult to calculate, but limiting distributions based on standardized variates such as $Z_n$ are available.

Central Limit Theorem:

Suppose X has mean μ and variance σ². Define $Z_n = \sqrt{n}\left(\bar{X}_n - \mu\right)$; then $Z_n \xrightarrow{d} N(0, \sigma^2)$, independent of the actual distribution of the X's. We say that $\bar{X}_n \sim AN(\mu, \sigma^2/n)$, where $\text{asy var}\left(\bar{X}_n\right) = \sigma^2/n$.

Asymptotic (large sample) theory:

Convergence in probability: consistency, $\text{plim}\,\hat{\theta} = \theta$.

Convergence in distribution: central limit theorem, $Z_n \xrightarrow{d} N(0, \sigma^2)$.
2.4 Maximum Likelihood Estimation:

A very general method with widespread application.

Definition

The likelihood of an observation of X is the value of the density function at x, f(x, θ), where f depends on θ (parameter).

Clarification: the density f(x, θ) is regarded as a function of x for a given θ. The likelihood function is regarded as a function of θ for a given x.


Maximum Likelihood Principle: choose the value of θ (possibly a vector) which is most likely to have generated the observed (given) sample values of x.

Given $x_1, x_2, \ldots, x_n$, i.e., n observations on the random variable X, the likelihood function (provided $x_1, x_2, \ldots, x_n$ are independent) is

$$L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta)$$

The maximum likelihood estimates (MLE) are obtained as a solution to:

$$\max_{\theta} L(\theta; x) \;\to\; \hat{\theta} = h(x)$$
MLE of the General Linear Model:

$$\text{Model}: \; y = x\beta + u \tag{2.4.1}$$

Assume x is fixed (non-stochastic). Equation (2.4.1) is a transformation from u to y. The (multivariate) density of y may be written as:

$$f_Y(y) = f_U(u(y))\left|\frac{\partial u}{\partial y'}\right| = f_U(u(y))\,|I_n| = f_U(u(y))$$

Assuming $u \sim N(0, \sigma^2 I_n)$, we have

$$f_U(u) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}u'u\right) \tag{2.4.2}$$

$$\Rightarrow\; f_Y(y) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}(y - x\beta)'(y - x\beta)\right)$$

We want to maximize L with respect to β and σ². Let $\theta = \begin{pmatrix} \beta' & \sigma^2 \end{pmatrix}'$, i.e., θ is a $(k+1)\times 1$ column vector. Then $\hat{\theta}$ is the solution to $\left.\dfrac{\partial \ln L}{\partial \theta}\right|_{\hat{\theta}} = 0$. We use (maximize) ln L.


From equation (2.4.2), it follows that:

$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - x\beta)'(y - x\beta)$$

$$\left.\begin{aligned}
\left.\frac{\partial \ln L}{\partial \beta}\right|_{\hat{\beta}, \hat{\sigma}^2} &= -\frac{1}{2\hat{\sigma}^2}\left(-2x'y + 2x'x\hat{\beta}\right) = \frac{1}{\hat{\sigma}^2}\left(x'y - x'x\hat{\beta}\right) = 0 \\
\left.\frac{\partial \ln L}{\partial \sigma^2}\right|_{\hat{\beta}, \hat{\sigma}^2} &= -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\left(y - x\hat{\beta}\right)'\left(y - x\hat{\beta}\right) = 0
\end{aligned}\right\} \tag{2.4.3}$$

From equation (2.4.3), it follows that

$$\hat{\beta} = (x'x)^{-1}x'y \quad \text{(this is the OLS estimator)}$$

$$\hat{\sigma}^2 = \frac{1}{n}\left(y - x\hat{\beta}\right)'\left(y - x\hat{\beta}\right) = \frac{\hat{e}'\hat{e}}{n}, \quad \text{where } \hat{e} = y - x\hat{\beta}$$

Now $E\left(\hat{\beta}\right) = \beta$, but $E\left(\hat{\sigma}^2\right) \neq \sigma^2$; $E\left(\hat{\sigma}^2\right) = \dfrac{n-k}{n}\sigma^2$.
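An illustrative simulation (made-up design and seed) of this bias: $\hat{\sigma}^2 = \hat{e}'\hat{e}/n$ averages to about $(n-k)\sigma^2/n$ while $S^2 = \hat{e}'\hat{e}/(n-k)$ averages to $\sigma^2$ (here $\sigma^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, reps = 30, 3, 20000
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - x @ np.linalg.solve(x.T @ x, x.T)   # residual-maker matrix
s2_ml = np.zeros(reps)
s2_unb = np.zeros(reps)
for j in range(reps):
    e = rng.normal(size=n)
    ee = e @ M @ e               # residual sum of squares e_hat'e_hat
    s2_ml[j] = ee / n
    s2_unb[j] = ee / (n - k)
print(s2_ml.mean(), s2_unb.mean())   # approx (n-k)/n = 0.9 versus 1.0
```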

Subject to certain regularity conditions¹, if $(y_1, y_2, \ldots, y_n)$ is a random sample and $\hat{\theta}_n$ is the maximum likelihood estimator of θ₀ (the true value) based on a sample of size n, then

$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{d} N(0, V), \quad \text{where } V = \left[\lim_{n\to\infty}\left(-E\,\frac{1}{n}\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right)\right]^{-1}$$

¹ For details, see the regularity conditions listed at the end of this unit.

This suggests that we may conduct approximate tests in finite samples based on the limiting distributions, e.g., use a t test. The approximation obviously becomes better as n gets larger.
Theorem (Cramer-Rao)

Subject to certain regularity conditions, given a random sample $(y_1, y_2, \ldots, y_n)$ and an unbiased estimator $\hat{\theta}_n$ of θ₀, then

$$\text{var}\left(\hat{\theta}_n\right) - I(\theta_0)^{-1} \;\text{is positive semi-definite}$$

$$\text{where } I(\theta_0) = E\left(\frac{\partial \ln L}{\partial \theta}\frac{\partial \ln L}{\partial \theta'}\right) = -E\left(\frac{\partial^2 \ln L}{\partial \theta\,\partial \theta'}\right)$$

$I(\theta_0)$ is known as the information matrix. The matrix $I(\theta_0)^{-1}$ serves as the minimum variance bound (MVB).

If $\text{var}\left(\hat{\theta}_n\right) = I(\theta_0)^{-1}$, then $\hat{\theta}_n$ is said to be efficient.
N.B. $\underset{(k\times k)}{\text{var}\left(\hat{\theta}_n\right)} - \underset{(k\times k)}{I(\theta_0)^{-1}} \ge 0$ in the sense that the difference is a psd matrix.

Now $\hat{\beta}_{ML}$ is unbiased but $\hat{\sigma}^2_{ML}$ is biased. So $\begin{pmatrix} \hat{\beta}_{ML} \\ \hat{\sigma}^2_{ML} \end{pmatrix}$ is a biased estimator of $\begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix}$.

This is to mean that, strictly speaking, the Cramer-Rao Theorem is not applicable here, but it is of interest to examine these properties:


From equation (2.4.3), it follows that

$$\begin{cases}
\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \beta'} = -\dfrac{x'x}{\sigma^2} \\[2mm]
\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \sigma^2} = -\dfrac{1}{\sigma^4}\left(x'y - x'x\beta\right) \\[2mm]
\dfrac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - x\beta)'(y - x\beta)
\end{cases}$$

$$\Rightarrow\;\begin{cases}
-E\left(\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \beta'}\right) = \dfrac{x'x}{\sigma^2} \\[2mm]
-E\left(\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \sigma^2}\right) = 0 \\[2mm]
-E\left(\dfrac{\partial^2 \ln L}{\partial (\sigma^2)^2}\right) = -\dfrac{n}{2\sigma^4} + \dfrac{n}{\sigma^4} = \dfrac{n}{2\sigma^4}
\end{cases}$$

The information matrix is therefore:

$$I(\theta) = \begin{pmatrix} \dfrac{x'x}{\sigma^2} & 0 \\ 0' & \dfrac{n}{2\sigma^4} \end{pmatrix} \;\Rightarrow\; [I(\theta)]^{-1} = \begin{pmatrix} \sigma^2(x'x)^{-1} & 0 \\ 0' & 2\sigma^4/n \end{pmatrix}$$

Thus, $\text{var}\left(\hat{\beta}\right) = \text{var}\left(\hat{\beta}_{ML}\right) = \sigma^2(x'x)^{-1} = MVB$.

So $\hat{\beta}$ and $\hat{\beta}_{ML}$ are both efficient estimators of β. Since $\hat{\sigma}^2_{ML}$ is a biased estimator of σ², the theorem is not really relevant or applicable to it.

But $\text{var}(S^2) = \dfrac{2\sigma^4}{n-k} > \dfrac{2\sigma^4}{n}$ for finite n, so $S^2$ is not an efficient estimator of σ². In fact, there is no unbiased estimator of σ² that attains the MVB.
Hypothesis Testing: the Likelihood Ratio Principle

So far we have considered linear restrictions of the form $H_0: R\beta = r$, which can be tested using an F statistic. But what about non-linear restrictions of the form, say, $H_0: \beta_1\beta_2 = 1$?

This cannot be written in the form $R\beta = r$, but we can represent it as $H_0: g(\beta) = 0$.

This hypothesis can be tested using the likelihood ratio test:

$$\lambda = \frac{\max_{H_0} L}{\max_{H_0 \cup H_A} L}$$

Example

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$

Consider a null hypothesis of the form $H_0: \beta_2 = \dfrac{1}{\beta_3}$.

Under $H_0$, $y_i = \beta_1 + \beta_2 x_{i2} + \dfrac{1}{\beta_2}x_{i3} + e_i$.
Thus OLS is inapplicable, i.e., it won't take the restriction into account. The procedure is:

Step I: Estimate the unrestricted model (by OLS) and obtain the maximized value of the log likelihood, i.e., $\ln L_U$.

Step II: Impose the restrictions, estimate by maximum likelihood (ML), and obtain the maximized value of the restricted log likelihood, i.e., $\ln L_R$.

Step III: $-2\ln\lambda = -2\left(\ln L_R - \ln L_U\right) \overset{\text{approx}}{\sim} \chi^2(q)$, where q is the number of restrictions imposed by the null hypothesis.

Step IV: Accept $H_0$ if $-2\ln\lambda < \bar{c}$ (critical value based on $\chi^2(q)$); reject $H_0$ otherwise.

N.B. $\ln L_R \le \ln L_U \;\Rightarrow\; -2\ln\lambda \ge 0$.
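A sketch of these four steps on simulated data, assuming the restriction of the example above (β₂β₃ = 1); the data, seed, and concentrated log-likelihood parametrization are illustrative:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 2.0 * x2 + 0.5 * x3 + rng.normal(size=n)   # satisfies b2*b3 = 1

def neg_loglik(params, restricted):
    if restricted:
        b1, b2 = params
        b3 = 1.0 / b2                 # impose the restriction b2*b3 = 1
    else:
        b1, b2, b3 = params
    resid = y - b1 - b2 * x2 - b3 * x3
    s2 = resid @ resid / n            # concentrated sigma^2
    return 0.5 * n * (np.log(2 * np.pi) + np.log(s2) + 1)

ll_u = -optimize.minimize(neg_loglik, [0.0, 1.0, 1.0], args=(False,)).fun
ll_r = -optimize.minimize(neg_loglik, [0.0, 1.0], args=(True,)).fun
lr = -2 * (ll_r - ll_u)               # approx chi2(1) under H0
print(lr, stats.chi2.sf(lr, df=1))
```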

2.5 Wald, Likelihood Ratio and Lagrange Multiplier Tests

These are general methods for constructing test statistics. Usually they provide us with asymptotically valid tests. We will define the general form of these test statistics, and then apply them to testing $R\beta = r$ as an illustration.

We start with maximum likelihood estimation:

$$\sqrt{n}\left(\hat{\theta} - \theta\right) \xrightarrow{d} N\left(0, \left[\lim_{n\to\infty}\tfrac{1}{n}I_n(\theta)\right]^{-1}\right)$$

where, if $\ell(\theta) = \ln L(\theta)$,

$$\frac{\partial \ell(\theta)}{\partial \theta} = S(\theta) \quad \text{the score vector}$$

$$\frac{\partial^2 \ell(\theta)}{\partial \theta\,\partial \theta'} = \frac{\partial S(\theta)}{\partial \theta'} = H \quad \text{the Hessian matrix}$$

$$-E(H) = I(\theta) \quad \text{the information matrix}$$

In general, in i.i.d. sampling,

$$L(\theta) = \prod_{i=1}^n f(X_i\,|\,\theta); \quad \ell(\theta) = \ln L(\theta) = \sum_{i=1}^n \ln f(X_i\,|\,\theta)$$

So ℓ, and hence S, H and I, are sums of n identically distributed terms.
So $\frac{1}{n}I_n(\theta)$ is a mean; subscripting I emphasizes its dependence on the sample size n.

For the linear regression model $y = x\beta + e$, where $e \sim N(0, \sigma^2 I_n)$,

$$\ell(\theta) = \ell(\beta, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - x\beta)'(y - x\beta)$$

So,

$$S(\theta) = \frac{\partial \ell(\theta)}{\partial \theta} = \begin{pmatrix} \dfrac{\partial \ell(\theta)}{\partial \beta} \\[2mm] \dfrac{\partial \ell(\theta)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sigma^2}\left(x'y - x'x\beta\right) \\[2mm] -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}(y - x\beta)'(y - x\beta) \end{pmatrix}$$

So, the Hessian matrix is

$$H = \begin{pmatrix} \dfrac{\partial^2 \ell(\theta)}{\partial \beta\,\partial \beta'} & \dfrac{\partial^2 \ell(\theta)}{\partial \beta\,\partial \sigma^2} \\[2mm] \dfrac{\partial^2 \ell(\theta)}{\partial \sigma^2\,\partial \beta'} & \dfrac{\partial^2 \ell(\theta)}{\partial (\sigma^2)^2} \end{pmatrix} = \begin{pmatrix} -\dfrac{1}{\sigma^2}x'x & -\dfrac{1}{\sigma^4}x'(y - x\beta) \\[2mm] -\dfrac{1}{\sigma^4}(y - x\beta)'x & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - x\beta)'(y - x\beta) \end{pmatrix}$$

$$\Rightarrow\; E(H) = \begin{pmatrix} -\dfrac{1}{\sigma^2}x'x & 0 \\ 0' & \dfrac{n}{2\sigma^4} - \dfrac{n}{\sigma^4} \end{pmatrix} = \begin{pmatrix} -\dfrac{1}{\sigma^2}x'x & 0 \\ 0' & -\dfrac{n}{2\sigma^4} \end{pmatrix}$$


and

$$I_n(\theta) = -E(H) = \begin{pmatrix} \dfrac{1}{\sigma^2}x'x & 0 \\ 0' & \dfrac{n}{2\sigma^4} \end{pmatrix} \;\Rightarrow\; \frac{1}{n}I_n(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2}\dfrac{x'x}{n} & 0 \\ 0' & \dfrac{1}{2\sigma^4} \end{pmatrix}$$

If $\frac{1}{n}x'x \to Q$, a positive definite matrix, then

$$\lim_{n\to\infty}\left[\frac{1}{n}I_n(\theta)\right]^{-1} = \begin{pmatrix} \sigma^2\left(\lim_{n\to\infty}\frac{1}{n}x'x\right)^{-1} & 0 \\ 0' & 2\sigma^4 \end{pmatrix}$$

So, the asymptotic maximum likelihood result stated for the i.i.d. case suggests that

$$\sqrt{n}\begin{pmatrix} \hat{\beta} - \beta \\ \hat{\sigma}^2 - \sigma^2 \end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2\left(\lim_{n\to\infty}\frac{1}{n}x'x\right)^{-1} & 0 \\ 0' & 2\sigma^4 \end{pmatrix}\right)$$

In fact, $\sqrt{n}\left(\hat{\beta} - \beta\right)$ is exactly $N\left(0, \sigma^2(x'x/n)^{-1}\right)$, independently of the distribution of $\hat{\sigma}^2$, and

$$\text{var}\left(\hat{\sigma}^2\right) = 2\sigma^4\frac{n-k}{n^2}, \quad E\left(\hat{\sigma}^2\right) = \sigma^2\frac{n-k}{n}, \quad \text{and } \hat{\sigma}^2 \text{ is } \frac{\sigma^2}{n}\chi^2(n-k).$$

We define the likelihood ratio (LR) test as follows:

If $H_0: \underset{(J\times 1)}{C(\theta)} = 0$, $\tilde{L}_R$ is the likelihood maximized subject to the restriction, and $\hat{L}_U$ is the unrestricted maximum value of the likelihood function, then

$$-2\ln\lambda = -2\ln\left(\frac{\tilde{L}_R}{\hat{L}_U}\right), \quad -2\ln\lambda \sim \chi^2(J)$$

In our example of testing $R\beta = r$, we know that

$$\ln \hat{L}_U = -\frac{n}{2}(1 + \ln 2\pi) - \frac{n}{2}\ln\hat{\sigma}^2, \quad \text{where } \hat{\sigma}^2 = \frac{\hat{e}'\hat{e}}{n}$$

Similarly,

$$\ln \tilde{L}_R = -\frac{n}{2}(1 + \ln 2\pi) - \frac{n}{2}\ln\tilde{\sigma}^2, \quad \text{where } \tilde{\sigma}^2 = \frac{\tilde{e}'\tilde{e}}{n}$$
So,

$$-2\ln\lambda = 2\left(\ln\hat{L}_U - \ln\tilde{L}_R\right) = n\ln\frac{\tilde{\sigma}^2}{\hat{\sigma}^2}$$

$$\Rightarrow\; \lambda = \left(\frac{\hat{\sigma}^2}{\tilde{\sigma}^2}\right)^{n/2} = \left(\frac{\hat{e}'\hat{e}}{\tilde{e}'\tilde{e}}\right)^{n/2}$$

$$\lambda^{-2/n} - 1 = \frac{\tilde{e}'\tilde{e}}{\hat{e}'\hat{e}} - 1 = \frac{\tilde{e}'\tilde{e} - \hat{e}'\hat{e}}{\hat{e}'\hat{e}}$$

$$\Rightarrow\; \frac{n-k}{J}\left(\lambda^{-2/n} - 1\right) = \frac{(\tilde{e}'\tilde{e} - \hat{e}'\hat{e})/J}{\hat{e}'\hat{e}/(n-k)} = F \sim F(J, n-k)$$

$$\text{i.e., } \lambda^{-2/n} = 1 + \frac{JF}{n-k}$$

$$\Rightarrow\; -\frac{2}{n}\ln\lambda = \ln\left(1 + \frac{JF}{n-k}\right) \;\Rightarrow\; -2\ln\lambda = n\ln\left(1 + \frac{JF}{n-k}\right)$$

Now, $1 + x < 1 + x + \dfrac{x^2}{2!} + \cdots = e^x$ for $x > 0$, so $\ln(1 + x) < x$:

$$-2\ln\lambda < \frac{nJF}{n-k} \xrightarrow{d} \chi^2(J)$$

But the approximation will become very close for n large enough.

The Wald Test: $H_0: C(\theta) = 0$

$$W = C\left(\hat{\theta}\right)'\left[\widehat{\text{var}}\left(C\left(\hat{\theta}\right)\right)\right]^{-1}C\left(\hat{\theta}\right) \xrightarrow{d} \chi^2(J) \;\text{under } H_0$$

$$\widehat{\text{var}}\left(C\left(\hat{\theta}\right)\right) \simeq c\,\widehat{\text{var}}\left(\hat{\theta}\right)c', \quad \text{where } \underset{(J\times(k+1))}{c} = \left.\frac{\partial C(\theta)}{\partial \theta'}\right|_{\hat{\theta}}$$

$$W \simeq C\left(\hat{\theta}\right)'\left[c\,\widehat{\text{var}}\left(\hat{\theta}\right)c'\right]^{-1}C\left(\hat{\theta}\right) \xrightarrow{d} \chi^2(J) \;\text{under } H_0$$

In our example, $C(\theta) = 0 \;\Leftrightarrow\; R\beta - r = 0$. So $C\left(\hat{\theta}\right) = R\hat{\beta} - r$ and

$$c = \left.\frac{\partial C(\theta)}{\partial \theta'}\right|_{\hat{\theta}} = \begin{pmatrix} \dfrac{\partial C(\theta)}{\partial \beta'} & \dfrac{\partial C(\theta)}{\partial \sigma^2} \end{pmatrix}_{\hat{\theta}} = \begin{pmatrix} R & 0 \end{pmatrix}; \quad \theta = \begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix}$$


Thus,

$$\widehat{\text{var}}\left(\hat{\theta}\right) = \frac{1}{n}\left[\lim_{n\to\infty}\frac{1}{n}I_n(\theta)\right]^{-1} = \frac{1}{n}\begin{pmatrix} \sigma^2\left(\lim_{n\to\infty}\frac{1}{n}x'x\right)^{-1} & 0 \\ 0' & 2\sigma^4 \end{pmatrix}$$

and we use

$$\begin{pmatrix} \sigma^2(x'x)^{-1} & 0 \\ 0' & 2\sigma^4/n \end{pmatrix}$$

So,

$$c\,\widehat{\text{var}}\left(\hat{\theta}\right)c' = \begin{pmatrix} R & 0 \end{pmatrix}\begin{pmatrix} \sigma^2(x'x)^{-1} & 0 \\ 0' & 2\sigma^4/n \end{pmatrix}\begin{pmatrix} R' \\ 0' \end{pmatrix} = \sigma^2 R(x'x)^{-1}R'$$

$$\Rightarrow\; W = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/\sigma^2$$

Replacing σ² by the unconstrained MLE $\hat{\sigma}^2$, we obtain:

$$W = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/\hat{\sigma}^2$$

Since $\hat{\sigma}^2 = \hat{e}'\hat{e}/n$ and $S^2 = \hat{e}'\hat{e}/(n-k)$,

$$W = \frac{n}{n-k}\cdot\frac{\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)}{S^2} = \frac{nJ}{n-k}F, \quad F \sim F(J, n-k)$$

and as $n \to \infty$, $\dfrac{n}{n-k} \to 1 \;\Rightarrow\; W = JF \xrightarrow{d} \chi^2(J)$ under $H_0$.


As

$$-2\ln\lambda = n\ln\left(1 + \frac{JF}{n-k}\right) < \frac{nJF}{n-k} = W,$$

we have LR < W, but LR → W as $n \to \infty$.

The Lagrange Multiplier (Score) Test:

Exists in two forms. In general the Lagrangian is $\mathcal{L} = \ell(\theta) - \lambda'C(\theta)$, and the F.O.C. is

$$\left.\frac{\partial \mathcal{L}}{\partial \theta}\right|_{\tilde{\theta}, \tilde{\lambda}} = \left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\tilde{\theta}, \tilde{\lambda}} - c'\tilde{\lambda} = 0, \quad \text{where } c = \left.\frac{\partial C(\theta)}{\partial \theta'}\right|_{\tilde{\theta}}$$

The F.O.C. implies

$$c\left(\tilde{\theta}\right)'\tilde{\lambda} = S\left(\tilde{\theta}\right), \quad \text{where } S\left(\tilde{\theta}\right) = \frac{\partial \ell\left(\tilde{\theta}\right)}{\partial \theta} \;\text{(the score vector)}$$

If $\tilde{\theta}$ is close to $\hat{\theta}$, we expect $S\left(\tilde{\theta}\right) \simeq S\left(\hat{\theta}\right) = 0$, so we can use as our test statistic

$$S\left(\tilde{\theta}\right)'\left[\widehat{\text{var}}\left(S\left(\tilde{\theta}\right)\right)\right]^{-1}S\left(\tilde{\theta}\right) \xrightarrow{d} \chi^2(J) \;\text{under } H_0$$

Theorem

In general,

i). As $E(S(\theta)) = 0$, $E\left(S\left(\tilde{\theta}\right)\right) = 0$ under $H_0$.

ii). As $E(S(\theta)) = 0$, $\text{var}(S(\theta)) = E\left[S(\theta)S(\theta)'\right] = I_n(\theta)$.

So, $LM = \tilde{\lambda}'c\left(\tilde{\theta}\right)\left[I_n\left(\tilde{\theta}\right)\right]^{-1}c\left(\tilde{\theta}\right)'\tilde{\lambda}$.

Of course $\tilde{\lambda}$ should be close to 0, as the 'cost' of the constraint, if the constraint is nearly valid. It is better to write

$$LM = \frac{1}{n}S\left(\tilde{\theta}\right)'\left[\lim_{n\to\infty}\frac{1}{n}I_n\left(\tilde{\theta}\right)\right]^{-1}S\left(\tilde{\theta}\right)$$


For the linear regression model:

$$S\left(\tilde{\theta}\right) = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}\left(x'y - x'x\tilde{\beta}\right) \\[2mm] -\dfrac{n}{2\tilde{\sigma}^2} + \dfrac{1}{2\tilde{\sigma}^4}\left(y - x\tilde{\beta}\right)'\left(y - x\tilde{\beta}\right) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}x'\tilde{e} \\[2mm] -\dfrac{n}{2\tilde{\sigma}^2} + \dfrac{n\tilde{\sigma}^2}{2\tilde{\sigma}^4} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}x'\tilde{e} \\[2mm] 0 \end{pmatrix} \tag{2.5.1}$$

But $\tilde{e} = y - x\tilde{\beta} = y - x\hat{\beta} + x\left(\hat{\beta} - \tilde{\beta}\right) = \hat{e} + x\left(\hat{\beta} - \tilde{\beta}\right)$

$$\Rightarrow\; x'\tilde{e} = x'\hat{e} + x'x\left(\hat{\beta} - \tilde{\beta}\right) = x'x\left(\hat{\beta} - \tilde{\beta}\right), \quad \text{as } x'\hat{e} = 0$$

However,

$$\hat{\beta} - \tilde{\beta} = -(x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right) = (x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

$$\Rightarrow\; x'x\left(\hat{\beta} - \tilde{\beta}\right) = R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

$$\Rightarrow\; x'\tilde{e} = R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \tag{2.5.2}$$

On the other hand,

$$\lim_{n\to\infty}\frac{1}{n}I_n\left(\tilde{\theta}\right) = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}\dfrac{x'x}{n} & 0 \\ 0' & \dfrac{1}{2\tilde{\sigma}^4} \end{pmatrix} \tag{2.5.3}$$
From (2.5.1), (2.5.2) and (2.5.3), it follows that

$$LM = \frac{1}{n}S\left(\tilde{\theta}\right)'\left[\lim_{n\to\infty}\frac{1}{n}I_n\left(\tilde{\theta}\right)\right]^{-1}S\left(\tilde{\theta}\right) = \frac{1}{\tilde{\sigma}^2}\,\tilde{e}'x(x'x)^{-1}x'\tilde{e}$$

Substituting (2.5.2),

$$\Rightarrow\; LM = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/\tilde{\sigma}^2 \tag{2.5.4}$$


From (2.5.4), it follows that

$$LM = \frac{\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)}{\hat{\sigma}^2}\cdot\frac{\hat{\sigma}^2}{\tilde{\sigma}^2} = \frac{W\hat{\sigma}^2}{\tilde{\sigma}^2}$$

Now, as $\hat{\sigma}^2 < \tilde{\sigma}^2$, LM < W. Moreover,

$$LM = \frac{\tilde{e}'\tilde{e} - \hat{e}'\hat{e}}{\tilde{e}'\tilde{e}/n} = n\left(1 - \frac{\hat{e}'\hat{e}}{\tilde{e}'\tilde{e}}\right)$$

$$LR = -2\ln\lambda = n\ln\frac{\tilde{e}'\tilde{e}}{\hat{e}'\hat{e}} \;\Rightarrow\; e^{LR/n} = \frac{\tilde{e}'\tilde{e}}{\hat{e}'\hat{e}}$$
Therefore,

$$LM = n\left(1 - e^{-LR/n}\right) \;\Rightarrow\; e^{-LR/n} = 1 - \frac{LM}{n} \;\Rightarrow\; -\frac{LR}{n} = \ln\left(1 - \frac{LM}{n}\right) < -\frac{LM}{n}$$

The preceding inequality holds as, for $x > 0$,

$$\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots < -x$$

$$\Rightarrow\; LR > LM$$

Moreover, as $n \to \infty$, LR → LM. Therefore, for linear hypotheses in the linear regression model, we have: $W \ge LR \ge LM$.

This ranking does not carry over to non-linear restrictions. However, if we have linear restrictions plus a quadratic likelihood, then it follows that W = LR = LM. Note that in regression we have a likelihood quadratic in β but not in σ², so equality holds if σ² is known or as $n \to \infty$.
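An illustrative numerical check of the ranking W ≥ LR ≥ LM for a single linear restriction (H₀: β₂ = 0 in a made-up bivariate regression), using the RSS-based formulas derived above:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 60
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ np.array([1.0, 0.2]) + rng.normal(size=n)

# Unrestricted RSS, and restricted RSS under beta_2 = 0 (regress y on i only)
b_u = np.linalg.solve(x.T @ x, x.T @ y)
rss_u = np.sum((y - x @ b_u) ** 2)        # e_hat'e_hat
rss_r = np.sum((y - y.mean()) ** 2)       # e_tilde'e_tilde

W = n * (rss_r - rss_u) / rss_u
LR = n * np.log(rss_r / rss_u)
LM = n * (rss_r - rss_u) / rss_r
print(W, LR, LM)                          # observe W >= LR >= LM
```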


Notes:

i). W, LR, and LM are asymptotically equivalent under $H_0 \cup H_A = H_m$, the maintained hypothesis, but if $H_m$ is false, they may behave differently.

ii). It is easier to modify W and LM for, e.g., heteroscedasticity, using HCSE formulae.

iii). Small-sample distributions may vary (considerably) from each other and from the asymptotic χ².

iv). A special problem with W is that it is not invariant to the way restrictions are written:

Example

$$H_0: \beta_1 = \beta_2 \quad \text{or} \quad H_0: \frac{\beta_1}{\beta_2} = 1$$

Same LR, LM, but different W (and a problem if $\hat{\beta}_2$ is close to zero).

v). Wald makes general-to-simple testing easy, as it only needs $\hat{\theta}$; moreover, if $W(k_1)$ tests $k_1$ restrictions, $W(k_2)$ tests $k_2$ restrictions, and $k_1 < k_2$ (with the $k_1$ restrictions nested in the $k_2$ restrictions), then $W(k_2) - W(k_1) \sim \chi^2(k_2 - k_1)$ under "all restrictions are valid", independently of $W(k_1) \sim \chi^2(k_1)$.

vi). When the restricted model is easier to estimate: e.g., $y_t = x_t'\beta + u_t$ with $u_t = \rho u_{t-1} + e_t$ (regression with AR(1) errors) and $H_0: \rho = 0$; then $\tilde{\beta}$ is just OLS, and LM is the natural test.
2.6 Generalized Least Squares

So far we have assumed $E(uu') = \sigma^2 I_n$. This plays a crucial role in the results obtained so far, but is rather restrictive.

Sources of non-spherical disturbances:

1). Suppose we have a cross-section of households where we observe income and expenditure. We might expect there to be more variation in expenditures by households at higher income levels, leading to the assumption that:


$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

where $\sigma_1^2 < \sigma_2^2 < \cdots < \sigma_n^2$, i.e., the variance increases as income increases.

N.B. the data are ordered from the lowest income (1) to the highest income (n).

This clearly violates the spherical nature of the disturbances, i.e., the variances are not constant even though the covariances are (assumed) to be zero (Heteroscedasticity).

2). The effects of shocks or unmodelled components in a model may persist into subsequent periods. For the model $y_t = x_t'\beta + u_t$, where $x_t' = \begin{pmatrix} x_{t1} & x_{t2} & \cdots & x_{tk} \end{pmatrix}$, we may find that

$$u_t = \rho u_{t-1} + e_t \tag{2.6.1}$$

where $E(e_t) = 0$, $E(e_t^2) = \sigma_e^2$ and $E(e_t e_s) = 0$ for all $t \neq s$,
i.e., $e_t$ is spherical. In this case we have 'autocorrelation'. What are the properties of $u_t$?

Let L denote the lag operator:

$$Lx_t = x_{t-1}, \quad L^2 x_t = x_{t-2}, \quad \ldots, \quad L^j x_t = x_{t-j}, \quad L^0 x_t = x_t$$

Then, equation (2.6.1) can be written as:

$$u_t = \rho L u_t + e_t \;\Rightarrow\; (1 - \rho L)u_t = e_t$$

$$\Rightarrow\; u_t = \frac{e_t}{1 - \rho L} = \left(1 + \rho L + \rho^2 L^2 + \cdots\right)e_t$$

Or,

$$u_t = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \cdots \tag{2.6.2}$$

Now, $E(u_t) = 0$ and

$$\text{var}(u_t) = \frac{\sigma_e^2}{1 - \rho^2} = \sigma^2, \quad \text{if } |\rho| < 1 \tag{2.6.3}$$
Also,

$$E(u_t u_{t-1}) = \rho\sigma^2, \quad E(u_t u_{t-2}) = \rho^2\sigma^2, \quad \ldots, \quad E(u_t u_{t-j}) = \rho^j\sigma^2$$

$$\text{i.e., } E(uu') = \sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix} \tag{2.6.4}$$
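A short illustrative sketch of building the AR(1) covariance matrix (2.6.4), whose (t, s) element is $\sigma^2\rho^{|t-s|}$ with $\sigma^2 = \sigma_e^2/(1-\rho^2)$ from (2.6.3):

```python
import numpy as np

def ar1_cov(T, rho, sigma_e2=1.0):
    sigma2 = sigma_e2 / (1.0 - rho**2)               # var(u_t), eq. (2.6.3)
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

V = ar1_cov(T=5, rho=0.7)
print(np.round(V, 3))
```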


Alternatively, we can specify this as:

$$E(uu') = V \tag{2.6.5}$$

depending on whether we wish to concentrate on σ² or not.

OLS with Non-spherical Disturbances

Model: $y = x\beta + u$

Assume: x is fixed (non-stochastic) and of full column rank, $\text{rank}(x) = k$.
Assume that

$$E(u) = 0, \qquad E(uu') = \sigma^2\Omega$$

The OLS estimator $\hat{\beta}$ of β is given by

$$\hat{\beta} = (x'x)^{-1}x'y = \beta + (x'x)^{-1}x'u$$

Thus, $E\left(\hat{\beta}\right) = \beta$, since $E(u) = 0$, i.e., OLS is unbiased. But

$$\text{var}\left(\hat{\beta}\right) = E\left[\left(\hat{\beta} - E\hat{\beta}\right)\left(\hat{\beta} - E\hat{\beta}\right)'\right] = E\left[(x'x)^{-1}x'uu'x(x'x)^{-1}\right]$$
$$= (x'x)^{-1}x'E(uu')x(x'x)^{-1} = (x'x)^{-1}x'\left(\sigma^2\Omega\right)x(x'x)^{-1}$$

$$\Rightarrow\; \text{var}\left(\hat{\beta}\right) = \sigma^2(x'x)^{-1}(x'\Omega x)(x'x)^{-1} \tag{2.6.6}$$

Although $\hat{\beta}$ is still unbiased, the usual expression for $\text{var}\left(\hat{\beta}\right)$ is no longer appropriate; inferences will be misleading or incorrect, i.e., t statistics, F statistics,


confidence intervals. The optimal properties of the OLS estimator are lost (it is no longer minimum variance).

The Generalized Least Squares (GLS) Estimator

Suppose we premultiply both sides of the model by a non-singular transformation matrix W to obtain:

$$Wy = Wx\beta + Wu \tag{2.6.7}$$

such that:

$$E(Wu) = 0$$
$$E(Wuu'W') = \sigma^2 W\Omega W' \tag{2.6.8}$$

If we could choose W such that $W\Omega W' = I_T$, then we could apply OLS to (2.6.7).

Theorem

If Ω is a symmetric positive definite matrix, then we can find a matrix P such that:

$$\Omega = PP' \tag{2.6.9}$$

This suggests $P^{-1}\Omega(P')^{-1} = I_T$ and that we could set $W = P^{-1}$, so that

$$\Omega^{-1} = (P')^{-1}P^{-1} = W'W$$

Applying OLS to equation (2.6.7), we obtain:

$$\hat{\beta}_G = (x'W'Wx)^{-1}(x'W'Wy) = \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}y$$

which may be written as:

$$\hat{\beta}_G = \beta + \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}u$$
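A sketch of GLS via this whitening argument: with Ω = PP′ from a Cholesky factorization and W = P⁻¹, OLS on the transformed data (Wy, Wx) reproduces $\hat{\beta}_G = (x'\Omega^{-1}x)^{-1}x'\Omega^{-1}y$. The AR(1)-style Ω, data, and seed below are illustrative:

```python
import numpy as np

def gls(x, y, omega):
    P = np.linalg.cholesky(omega)       # omega = P P'
    Wx = np.linalg.solve(P, x)          # W x = P^{-1} x
    Wy = np.linalg.solve(P, y)
    return np.linalg.solve(Wx.T @ Wx, Wx.T @ Wy)

rng = np.random.default_rng(9)
T, rho = 200, 0.8
idx = np.arange(T)
omega = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type correlation
x = np.column_stack([np.ones(T), rng.normal(size=T)])
u = np.linalg.cholesky(omega) @ rng.normal(size=T)   # cov(u) = omega
y = x @ np.array([1.0, 2.0]) + u
print(gls(x, y, omega))
```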
$$\Rightarrow\; E\left(\hat{\beta}_G\right) = \beta, \quad \text{as } E(u) = 0$$

$$\text{var}\left(\hat{\beta}_G\right) = E\left[\left(\hat{\beta}_G - E\hat{\beta}_G\right)\left(\hat{\beta}_G - E\hat{\beta}_G\right)'\right]$$
$$= E\left[\left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}uu'\Omega^{-1}x\left(x'\Omega^{-1}x\right)^{-1}\right]$$
$$= \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}E(uu')\Omega^{-1}x\left(x'\Omega^{-1}x\right)^{-1}$$
$$= \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}\left(\sigma^2\Omega\right)\Omega^{-1}x\left(x'\Omega^{-1}x\right)^{-1}$$

$$\Rightarrow\; \text{var}\left(\hat{\beta}_G\right) = \sigma^2\left(x'\Omega^{-1}x\right)^{-1}$$

In a sense, we standardize the observations by their variances/covariances. Now, $\text{var}\left(\hat{\beta}_G\right) - \text{var}\left(\hat{\beta}\right) \le 0$, in the sense that this difference is an nsd matrix, i.e., we prefer $\hat{\beta}_G$, since it is the minimum variance estimator.
An Alternative Derivation of $\hat{\beta}_G$

In the classical linear regression model, the OLS estimator is found as the solution to:

$$\min_{\beta} S = (y - x\beta)'(y - x\beta)$$

This is equivalent to:

$$\min_{\beta} \bar{S} = (y - x\beta)'\left(\sigma^2 I_n\right)^{-1}(y - x\beta)$$

With non-spherical disturbances, consider:

$$\min_{\beta} S_G = (y - x\beta)'\Omega^{-1}(y - x\beta)$$

$$\left.\frac{\partial S_G}{\partial \beta}\right|_{\hat{\beta}_G} = -2x'\Omega^{-1}y + 2x'\Omega^{-1}x\hat{\beta}_G = 0$$

$$\Rightarrow\; x'\Omega^{-1}x\hat{\beta}_G = x'\Omega^{-1}y \quad \text{or} \quad \hat{\beta}_G = \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}y$$

This is equivalent to maximizing the likelihood function under the assumption that u is normal.

Under the assumption of normality, we can use the usual F statistic to test $H_0: R\beta = r$, given by:

$$F = \frac{\left(R\hat{\beta}_G - r\right)'\left[R\left(x'\Omega^{-1}x\right)^{-1}R'\right]^{-1}\left(R\hat{\beta}_G - r\right)}{qS^2} \sim F(q, T-k)$$

$$\text{where } S^2 = \frac{\left(y - x\hat{\beta}_G\right)'\Omega^{-1}\left(y - x\hat{\beta}_G\right)}{T-k} = \frac{\hat{u}_G'\Omega^{-1}\hat{u}_G}{T-k}$$
The following assumptions are called regularity conditions:

i). $\dfrac{\partial}{\partial\theta}\log f(x;\theta)$ exists for all x and θ.

ii). $\dfrac{\partial}{\partial\theta}\displaystyle\int\cdots\int \prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n = \displaystyle\int\cdots\int \dfrac{\partial}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n$

iii). $\dfrac{\partial}{\partial\theta}\displaystyle\int\cdots\int t(x_1,\ldots,x_n)\prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n = \displaystyle\int\cdots\int t(x_1,\ldots,x_n)\dfrac{\partial}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n$

iv). $0 < E_\theta\left[\left(\dfrac{\partial}{\partial\theta}\log f(x;\theta)\right)^2\right] < \infty$ for all $\theta \in \Theta$.
