Econ-607 - Unit2-W1-3


Addis Ababa University

College of Business and Economics


Department of Economics
Econ 607: Econometrics I
2. Regression Models
Fantu Guta Chemrie (PhD)
2. Regression Models

2.1 K-Variables Linear Regression Models

The K-variable linear regression model is specified as:

$$y_i = \beta_1 + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \cdots + \beta_k x_{i,k} + e_i; \quad i = 1, 2, \ldots, n$$

$$= 1\cdot\beta_1 + x_{i,2}\beta_2 + x_{i,3}\beta_3 + \cdots + x_{i,k}\beta_k + e_i$$

$$= \begin{pmatrix} 1 & x_{i,2} & \cdots & x_{i,k} \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + e_i = x_i'\beta + e_i$$
Stacking all the n observations as a column vector gives:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1'\beta + e_1 \\ x_2'\beta + e_2 \\ \vdots \\ x_n'\beta + e_n \end{pmatrix} = \begin{pmatrix} x_1'\beta \\ x_2'\beta \\ \vdots \\ x_n'\beta \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}\beta + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = x\beta + e$$

Therefore, a K-variables linear regression model can compactly be written as:

$$\underset{(n\times 1)}{y} = \underset{(n\times k)}{x}\,\underset{(k\times 1)}{\beta} + \underset{(n\times 1)}{e} \qquad \text{(observables: } y, x\text{)}$$

To obtain the least squares estimator $\hat{\beta}$ of $\beta$, minimize

$$S(\beta) = e'e = (y - x\beta)'(y - x\beta) = y'y - y'x\beta - \beta'x'y + \beta'x'x\beta = y'y - 2\beta'x'y + \beta'x'x\beta$$

The first order condition for a minimum is obtained as follows:

$$\frac{\partial S(\beta)}{\partial \beta} = -2x'y + 2x'x\beta, \qquad \left.\frac{\partial S(\beta)}{\partial \beta}\right|_{\beta = \hat{\beta}} = 0$$

$$\Rightarrow\; -2x'y + 2x'x\hat{\beta} = 0 \;\Rightarrow\; \hat{\beta} = (x'x)^{-1}x'y$$

$\hat{\beta}$ is a linear estimator.
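A minimal numerical sketch of the formula $\hat{\beta} = (x'x)^{-1}x'y$ on simulated data (the design, coefficients, and seed below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # first column: intercept
beta = np.array([1.0, 2.0, -0.5])
y = x @ beta + rng.normal(size=n)

# beta_hat = (x'x)^{-1} x'y; solve() is preferred to forming an explicit inverse
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
print(beta_hat)   # close to the true beta
```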


Consider the class of all linear estimators

$$\tilde{\beta} = Ay = \left[B + (x'x)^{-1}x'\right]y$$

$$E\left(\tilde{\beta}\,\middle|\,x\right) = \left[B + (x'x)^{-1}x'\right]E(y|x) = \left[B + (x'x)^{-1}x'\right]E\left[(x\beta + e)\,\middle|\,x\right]$$

Assume that $E(e|x) = 0$. Then

$$E\left(\tilde{\beta}\,\middle|\,x\right) = \left[B + (x'x)^{-1}x'\right]x\beta = Bx\beta + \beta = \beta \;\;\forall\,\beta \;\text{ iff } Bx = 0$$

So, if we assume

a) $E(e|x) = 0$, then any other linear unbiased estimator has the property that $Bx = 0$, and thus $\hat{\beta}$ is unbiased, as $Bx = 0$ holds trivially for $\hat{\beta}$ (where $B = 0$). If we restrict ourselves to the class of all linear unbiased estimators $\tilde{\beta}$, we need an assumption for $\text{var}(e)$.

b) $\text{var}(e|x) = \sigma^2 I_n$, i.e., $e$ is homoscedastic and uncorrelated; then

$$\tilde{\beta} - E\left(\tilde{\beta}\,\middle|\,x\right) = \left[B + (x'x)^{-1}x'\right]e$$


Therefore,

$$\text{var}\left(\tilde{\beta}\,\middle|\,x\right) = E\left[\left(\tilde{\beta} - E\tilde{\beta}\right)\left(\tilde{\beta} - E\tilde{\beta}\right)'\,\middle|\,x\right]$$

$$= E\left[\left(B + (x'x)^{-1}x'\right)ee'\left(x(x'x)^{-1} + B'\right)\middle|\,x\right]$$

$$= \left(B + (x'x)^{-1}x'\right)E\left(ee'\,|\,x\right)\left(x(x'x)^{-1} + B'\right)$$

$$= \sigma^2 BB' + \sigma^2(x'x)^{-1}, \quad \text{as } Bx = x'B' = 0$$

This exceeds $\sigma^2(x'x)^{-1}$ by the positive semi-definite matrix $\sigma^2 BB'$, minimized by choosing $B = 0$, i.e., $\tilde{\beta} = \hat{\beta}$.

Thus, $\hat{\beta}$ has the minimum variance among the class of all linear unbiased estimators (MVLUE/BLUE) and has variance given by $\text{var}\left(\hat{\beta}\,\middle|\,x\right) = \sigma^2(x'x)^{-1}$.

Gauss-Markov Theorem

If $\tilde{\beta}$ is any other linear unbiased estimator of $\beta$, then $\text{var}\left(\tilde{\beta}\,\middle|\,x\right) - \text{var}\left(\hat{\beta}\,\middle|\,x\right)$ is a positive semi-definite matrix, so for the variance of any linear combination, $\text{var}\left(\lambda'\hat{\beta}\,\middle|\,x\right) \le \text{var}\left(\lambda'\tilde{\beta}\,\middle|\,x\right)$.


This is to mean that

$$\text{var}\left(\lambda'\tilde{\beta}\,\middle|\,x\right) - \text{var}\left(\lambda'\hat{\beta}\,\middle|\,x\right) = \lambda'\,\text{var}\left(\tilde{\beta}\,\middle|\,x\right)\lambda - \lambda'\,\text{var}\left(\hat{\beta}\,\middle|\,x\right)\lambda = \lambda'\left[\text{var}\left(\tilde{\beta}\,\middle|\,x\right) - \text{var}\left(\hat{\beta}\,\middle|\,x\right)\right]\lambda \ge 0$$

N.B. $\lambda'\hat{\beta}$ is an estimator of $\lambda'\beta$.

Example

Example of a PD matrix: $A = \begin{pmatrix} 4 & 1 & 0 \\ 1 & 5 & 1 \\ 0 & 1 & 1 \end{pmatrix}$.
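As a quick illustrative check (not part of the slides): a symmetric matrix is positive definite if and only if a Cholesky factorization exists, so the example matrix A can be verified numerically:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 1.0],
              [0.0, 1.0, 1.0]])
try:
    np.linalg.cholesky(A)          # succeeds iff A is positive definite
    print("A is positive definite")
except np.linalg.LinAlgError:
    print("A is not positive definite")
print(np.linalg.eigvalsh(A))       # all eigenvalues strictly positive
```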


Least squares can be regarded as decomposing the vector y into two orthogonal components: Py is the projection of y onto the space spanned by the column vectors of the matrix x, and My is the projection of y onto the space orthogonal to x.

$$y'y = y'Py + y'My$$

$$y'y - \tfrac{1}{n}y'ii'y = y'Py - \tfrac{1}{n}y'ii'y + y'My$$

where i is a column vector of ones of order n. Then,

$$y'\left(I_n - \tfrac{1}{n}ii'\right)y = y'Py - \tfrac{1}{n}y'Pii'Py + y'My$$

where $P = x(x'x)^{-1}x'$ and $Pi = i$, as $i = xJ$ where $J = \begin{pmatrix} 1 \\ 0_{(k-1)\times 1} \end{pmatrix}$.

The matrix P is an idempotent matrix, i.e., $PP = P$ and $P = P'$.
Therefore,

$$y'\left(I_n - \tfrac{1}{n}ii'\right)y = y'P\left(I_n - \tfrac{1}{n}ii'\right)Py + y'My$$

$$\Rightarrow\; y'Ay = y'PAPy + y'My, \quad \text{where } A = I_n - \tfrac{1}{n}ii'$$

Implicit assumptions:

i). The model is correctly specified.
ii). $\text{rank}(x) = k$, full column rank, and x is fixed (non-stochastic).
iii). $E(e|x) = 0$.
iv). $\text{var}(e|x) = \sigma^2 I_n$.

Residuals:

$$\hat{e} = y - x\hat{\beta}, \quad \text{where } \hat{\beta} = (x'x)^{-1}x'y$$
$$= y - x(x'x)^{-1}x'y = \left(I_n - x(x'x)^{-1}x'\right)y$$
$$= (I_n - P)y, \quad \text{where } P = x(x'x)^{-1}x'$$
$$= My, \quad \text{where } M = I_n - P$$

We know that $Px = x$ and $Mx = 0$; P and M are both idempotent matrices.

$$\Rightarrow\; My = M(x\beta + e) = Me$$

$$\Rightarrow\; E(\hat{e}|x) = E(My|x) = E(Me|x) = M\,E(e|x) = 0$$

$$\text{var}(\hat{e}|x) = E\left(\hat{e}\hat{e}'\,|\,x\right) = E\left(Mee'M'\,|\,x\right) = E\left(Mee'M\,|\,x\right) = M\,E\left(ee'\,|\,x\right)M = \sigma^2 M$$

$$\text{tr}(M) = \text{tr}\left(I_n - x(x'x)^{-1}x'\right) = \text{tr}(I_n) - \text{tr}\left(x(x'x)^{-1}x'\right) = n - \text{tr}\left((x'x)^{-1}x'x\right) = n - \text{tr}(I_k) = n - k = \text{rank}(M)$$

In fact, the residuals obey the k linear constraints

$$x'\hat{e} = x'My = 0, \quad \text{as } Mx = 0,$$

so the residual vector has a degenerate distribution (it satisfies k exact linear restrictions).

To estimate $\sigma^2$, consider $\hat{e}'\hat{e}$:

$$E\left(\hat{e}'\hat{e}\,\middle|\,x\right) = E\left[(Me)'(Me)\,\middle|\,x\right] = E\left(e'M'Me\,\middle|\,x\right) = E\left(e'Me\,\middle|\,x\right)$$
$$= E\left[\text{tr}\left(e'Me\right)\middle|\,x\right], \;\text{as } e'Me \text{ is a scalar}$$
$$= E\left[\text{tr}\left(Mee'\right)\middle|\,x\right], \;\text{as } \text{tr}(AB) = \text{tr}(BA)$$
$$= \text{tr}\left[M\,E\left(ee'\,|\,x\right)\right] = \text{tr}\left(M\sigma^2 I_n\right) = \sigma^2\,\text{tr}(M) = \sigma^2(n-k)$$


Therefore, $S^2 = \dfrac{\hat{e}'\hat{e}}{n-k}$ has $E(S^2) = \sigma^2$ and is an unbiased estimator of $\sigma^2$.

N.B. x is "fixed": we just need $E(y|x) = x\beta$ and $\text{var}(y|x) = \sigma^2 I_n$, i.e., $\text{var}(e|x) = \sigma^2 I_n$.

When is conditioning valid, i.e., when is there no loss of information? (Exactly or asymptotically) when x is weakly exogenous.

v). $e|x \sim N(0, \sigma^2 I_n)$.

So assumptions (iii)-(v) imply $y|x \sim N(x\beta, \sigma^2 I_n)$.
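An illustrative numerical check (on simulated data, with made-up design and seed) of the projection algebra above: M is idempotent, tr(M) = n - k, the residuals are orthogonal to x, and S² estimates σ²:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = x @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)  # sigma^2 = 1

P = x @ np.linalg.solve(x.T @ x, x.T)    # P = x (x'x)^{-1} x'
M = np.eye(n) - P
e_hat = M @ y

print(np.allclose(M @ M, M))             # idempotent
print(np.isclose(np.trace(M), n - k))    # tr(M) = n - k
print(np.allclose(x.T @ e_hat, 0))       # k linear constraints x'e_hat = 0
print(e_hat @ e_hat / (n - k))           # S^2, close to sigma^2 = 1
```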
1). To obtain the distribution of $\hat{\beta}$, we need the following theorem.

Theorem

$\underset{(J\times 1)}{x} \sim N(\mu, V) \;\Rightarrow\; Lx \sim N(L\mu, LVL')$. More generally: $Lx + \gamma \sim N(L\mu + \gamma, LVL')$.

$$\hat{\beta} = (x'x)^{-1}x'y = (x'x)^{-1}x'(x\beta + e) = \beta + (x'x)^{-1}x'e$$

$$\Rightarrow\; e|x \sim N\left(0, \sigma^2 I_n\right) \;\Rightarrow\; \hat{\beta}\,|\,x \sim N\left(\beta, \sigma^2(x'x)^{-1}\right)$$


2). $\hat{\beta}$ and $S^2$ are independent. To show this, we need the following theorem:

Theorem

If $x \sim N(0, I)$, then $Lx$ and $x'Ax$ are independent if $LA = 0$.

$$\frac{1}{\sigma}\left(\hat{\beta} - \beta\right) = \frac{1}{\sigma}(x'x)^{-1}(x'e) = (x'x)^{-1}x'\left(\frac{e}{\sigma}\right)$$

$$\text{and} \quad \frac{1}{\sigma^2}\hat{e}'\hat{e} = \frac{1}{\sigma^2}e'Me = \left(\frac{e}{\sigma}\right)'M\left(\frac{e}{\sigma}\right)$$

are independent, as $(x'x)^{-1}x'M = 0$ and $\dfrac{e}{\sigma} \sim N(0, I)$.

$\Rightarrow\; \hat{\beta}$ and $S^2$ are independent.

3). Distribution of $S^2$:

$$\frac{(n-k)S^2}{\sigma^2} = \left(\frac{e}{\sigma}\right)'M\left(\frac{e}{\sigma}\right) \sim \chi^2(n-k),$$

i.e., $\left(\dfrac{e}{\sigma}\right)'M\left(\dfrac{e}{\sigma}\right) \sim \chi^2(\text{rank}(M))$.

Now, suppose we want to test one or more linear restrictions concerning β. For example, say k = 4, and

$$H_0: \begin{cases} \beta_1 = 0 \\ \beta_2 = \beta_3 \\ \beta_4 = 1 \end{cases}$$

This hypothesis can be expressed in matrix form as:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
In general, a linear hypothesis is of the form

$$H_0: R\beta = r$$

Assume our constraints are linearly independent: R is $(q \times k)$ with $\text{rank}(R) = q$ (if not, we eliminate the linearly dependent constraints).

Consider:

$$\hat{\beta}\,|\,x \sim N\left(\beta, \sigma^2(x'x)^{-1}\right)$$
$$R\hat{\beta}\,|\,x \sim N\left(R\beta, \sigma^2 R(x'x)^{-1}R'\right)$$

and so, under $H_0: R\beta = r$,

$$R\hat{\beta} - r\,|\,x \sim N\left(0, \sigma^2 R(x'x)^{-1}R'\right)$$

Now, we need an extra theorem:

Theorem

If $\underset{(J\times 1)}{x} \sim N(\mu, V)$, then $(x - \mu)'V^{-1}(x - \mu) \sim \chi^2(J)$, where V is a positive definite matrix.

Proof.

Let $PP' = V \;\Rightarrow\; I_J = P^{-1}V(P')^{-1}$ and $V^{-1} = (P')^{-1}P^{-1}$.

So, $(x - \mu)'V^{-1}(x - \mu) = (x - \mu)'(P')^{-1}P^{-1}(x - \mu) = y'y$, where $y = P^{-1}(x - \mu)$ and $(x - \mu) \sim N(0, V)$.

So, $y = P^{-1}(x - \mu) \sim N\left(0, P^{-1}V(P')^{-1}\right) = N(0, I_J)$

$$\Rightarrow\; (x - \mu)'V^{-1}(x - \mu) = y'y \sim \chi^2(J)$$
Returning to $R\hat{\beta} - r\,|\,x \sim N\left(0, \sigma^2 R(x'x)^{-1}R'\right)$:

$$\Rightarrow\; \left(R\hat{\beta} - r\right)'\left[\sigma^2 R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \sim \chi^2(q), \;\text{under } H_0$$

$$\Rightarrow\; \frac{1}{\sigma^2}\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \sim \chi^2(q) \tag{2.1.1}$$

Here, we are assuming that since $(x'x)^{-1}$ is positive definite and $\text{rank}(R) = q$, $R(x'x)^{-1}R'$ is positive definite and thus invertible.

We cannot use (2.1.1) for a test (σ² is unknown), but if we divide by the degrees of freedom q, we have a $\chi^2(q)/q$ variate, independent of $\left[(n-k)S^2/\sigma^2\right]/(n-k) \sim \chi^2(n-k)/(n-k)$.

Thus, the σ² cancels on division, as do the (n-k)'s, and

$$F = \frac{\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)}{qS^2} \sim F(q, n-k), \;\text{under } H_0$$

i). Testing the significance of a particular coefficient can be carried out as follows:

$$\beta_j = 0; \quad q = 1; \quad R = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} = e_j' \quad (1 \text{ in the } j\text{th position})$$
$R(x'x)^{-1}R' = e_j'(x'x)^{-1}e_j = (x'x)^{-1}_{jj}$, the jth principal diagonal element of $(x'x)^{-1}$.

$$\frac{\left[R(x'x)^{-1}R'\right]^{-1}}{qS^2} = \frac{1}{S^2(x'x)^{-1}_{jj}} = \frac{1}{\left[\widehat{\text{s.e.}}\left(\hat{\beta}_j\right)\right]^2}, \quad \text{s.e.} = \text{standard error}$$

So, we can test $H_0: \beta_j = 0$ by calculating

$$\frac{\hat{\beta}_j}{\widehat{\text{s.e.}}\left(\hat{\beta}_j\right)} \sim t(n-k) \;\text{under } H_0.$$

If the alternative is $H_A: \beta_j \neq 0$, we use a two-tailed test, with the $2\tfrac{1}{2}\%$ point of the t statistic being about 2 (or see a table).
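An illustrative sketch of t-ratios and an F test of Rβ = r on simulated data (the design, restrictions, and true coefficients below are made up for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 120, 4
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = x @ np.array([1.0, 0.0, 1.5, -2.0]) + rng.normal(size=n)

xtx_inv = np.linalg.inv(x.T @ x)
b = xtx_inv @ x.T @ y
e_hat = y - x @ b
s2 = e_hat @ e_hat / (n - k)                 # S^2

se = np.sqrt(s2 * np.diag(xtx_inv))          # standard errors
t_ratios = b / se
p_vals = 2 * stats.t.sf(np.abs(t_ratios), df=n - k)

# F test of H0: beta_2 = 0 and beta_3 = beta_4  (q = 2 restrictions)
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
r = np.zeros(2)
d = R @ b - r
F = d @ np.linalg.solve(R @ xtx_inv @ R.T, d) / (2 * s2)
print(t_ratios, p_vals, F, stats.f.sf(F, 2, n - k))
```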
Standard regression output might look like the following table:

Equation for US investment, 1968-82

Variable        Coefficient   Standard error   t-ratio
Constant        0.509         0.05510          9.2
Time            0.017         0.00197          8.4
Real GDP        0.670         0.05500          12.2
Interest rate   0.002         0.00122          1.91
Inflation       0.9E-4        0.13E-2          0.07

"Interest rate" is borderline: significant at 10%, or at 5% on a one-tailed test.

Note that x has five columns,

$$x = \begin{pmatrix} 1 & 1 & x_{13} & x_{14} & x_{15} \\ 1 & 2 & x_{23} & x_{24} & x_{25} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & n & x_{n3} & x_{n4} & x_{n5} \end{pmatrix}$$

Using $x_2' = \begin{pmatrix} 1 & \cdots & n \end{pmatrix}$ or $x_2' = \begin{pmatrix} 1968 & \cdots & 1982 \end{pmatrix}$ makes a difference only to the constant term.

If we accept $\beta_j = 0$, we can eliminate $x_j$ from the equation and reduce k by 1. However, our different t ratios are not independent, either within a regression or between different regressions for the same y.
ii). Other Linear Hypotheses:

If $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$; $H_0: \beta_2 = 0$ (a subset of β, without loss of generality the last J elements, is zero):

$$R = \begin{pmatrix} 0 & I_J \end{pmatrix}; \quad r = \underset{(J\times 1)}{0}; \quad R\beta = \begin{pmatrix} 0 & I_J \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \beta_2$$

What is $R(x'x)^{-1}R'$?

If $V = (x'x)^{-1} = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$, where $V_{22}$ is $(J \times J)$, then

$$R(x'x)^{-1}R' = V_{22}; \quad R\hat{\beta} - r = \hat{\beta}_2 \;\text{ if } \hat{\beta} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}$$

So,

$$F = \frac{\hat{\beta}_2' V_{22}^{-1}\hat{\beta}_2}{JS^2} \sim F(J, n-k) \;\text{under } H_0: \beta_2 = 0$$

To make progress with this, we need to be able to partition the inverse of (x'x). Alternatively,

iii). The hypothesis $R\beta = r$ can also be tested using:

$$F = \frac{(\tilde{e}'\tilde{e} - \hat{e}'\hat{e})/J}{\hat{e}'\hat{e}/(n-k)} = \frac{(RRSS - URSS)/J}{URSS/(n-k)} \sim F(J, n-k), \quad \text{where } S^2 = \frac{\hat{e}'\hat{e}}{n-k}$$

R² and R̄²

$$R^2 = 1 - \frac{\hat{e}'\hat{e}}{\sum_{i=1}^n (y_i - \bar{y})^2} = 1 - \frac{\hat{e}'\hat{e}}{y'Ay}$$

where $A = I_n - \tfrac{1}{n}ii'$ and $i = \begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}'$.


R²: descriptive, the proportion of variation explained.

$R^2 = \left[r(\hat{y}, y)\right]^2$ as long as x contains $i = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}'$, where $\hat{y} = x\hat{\beta}$.

But adding columns to the x matrix can only increase R²; therefore, $R_1^2 \le R_2^2$.

$$\bar{R}^2 = 1 - \frac{\hat{e}'\hat{e}/(n-k)}{y'Ay/(n-1)}$$

R̄²: attempts to avoid an ever-increasing R², but adding variables increases R̄² if $F(\beta_2 = 0) > 1$.

Note that $\sum_{i=1}^n (y_i - \bar{y})^2$ is the RRSS obtained from a regression of y on i, and $\hat{e}'\hat{e} = URSS$.

$$F = \frac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} \sim F(k-1, n-k)$$

tests $H_0: \beta_2 = 0$ (a joint test of significance on the slopes).

If the regression does not contain a constant term, R² can be negative. When the regression does not include the constant term, to avoid $R^2 < 0$, use $\left[r(\hat{y}, y)\right]^2$.
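A short illustrative computation of R² and R̄² from the quantities above (function and data are a sketch, not from the slides):

```python
import numpy as np

def r_squared(y, x):
    n, k = x.shape
    b = np.linalg.solve(x.T @ x, x.T @ y)
    e_hat = y - x @ b
    tss = np.sum((y - y.mean()) ** 2)   # y'Ay (RRSS from regressing y on i)
    rss = e_hat @ e_hat                 # URSS
    r2 = 1 - rss / tss
    r2_bar = 1 - (rss / (n - k)) / (tss / (n - 1))
    return r2, r2_bar

rng = np.random.default_rng(3)
n = 80
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = x @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)
print(r_squared(y, x))
```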

2.2 Restricted Least Squares

To obtain the restricted least squares estimator $\tilde{\beta}$ of β, minimize the sum of squared errors with respect to β, subject to the given constraint:

$$S(\beta) = (y - x\beta)'(y - x\beta) \quad \text{subject to } R\beta = r$$
The Lagrangian is given by:

$$\mathcal{L}(\beta, \lambda) = (y - x\beta)'(y - x\beta) + 2\lambda'(r - R\beta)$$

The first order conditions are

$$\left.\frac{\partial \mathcal{L}(\beta, \lambda)}{\partial \beta}\right|_{\tilde{\beta}, \tilde{\lambda}} = -2x'y + 2x'x\tilde{\beta} - 2R'\tilde{\lambda} = 0 \tag{2.2.1}$$

$$\left.\frac{\partial \mathcal{L}(\beta, \lambda)}{\partial \lambda}\right|_{\tilde{\beta}, \tilde{\lambda}} = 2\left(r - R\tilde{\beta}\right) = 0 \tag{2.2.2}$$

From equation (2.2.1), it follows that

$$x'x\tilde{\beta} = x'y + R'\tilde{\lambda}$$
$$\Rightarrow\; \tilde{\beta} = (x'x)^{-1}x'y + (x'x)^{-1}R'\tilde{\lambda} = \hat{\beta} + (x'x)^{-1}R'\tilde{\lambda}$$
$$\Rightarrow\; R\tilde{\beta} = R\hat{\beta} + R(x'x)^{-1}R'\tilde{\lambda} \tag{2.2.3}$$

Using equation (2.2.2), equation (2.2.3) becomes

$$r = R\hat{\beta} + R(x'x)^{-1}R'\tilde{\lambda}$$
$$\Rightarrow\; \tilde{\lambda} = \left[R(x'x)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right)$$
$$\Rightarrow\; \tilde{\beta} = \hat{\beta} + (x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right)$$
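A sketch of the restricted least squares formula just derived, on made-up data and a made-up restriction:

```python
import numpy as np

# beta_tilde = beta_hat + (x'x)^{-1} R' [R (x'x)^{-1} R']^{-1} (r - R beta_hat)
def restricted_ls(x, y, R, r):
    xtx_inv = np.linalg.inv(x.T @ x)
    b = xtx_inv @ x.T @ y
    adj = np.linalg.solve(R @ xtx_inv @ R.T, r - R @ b)
    return b + xtx_inv @ R.T @ adj

rng = np.random.default_rng(4)
n = 100
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = x @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, 1.0]])    # restriction: beta_2 + beta_3 = 5
r = np.array([5.0])
b_tilde = restricted_ls(x, y, R, r)
print(b_tilde, R @ b_tilde)        # R beta_tilde = r holds exactly
```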
Note that if $R\hat{\beta} = r$, then $\tilde{\beta} = \hat{\beta}$. One can find $E\left(\tilde{\beta}\right)$ and $\text{var}\left(\tilde{\beta}\right)$, and establish that $\text{var}\left(\hat{\beta}\right) - \text{var}\left(\tilde{\beta}\right)$ is psd; but $\tilde{\beta}$ is biased unless $R\beta = r$.

One can compare $MSE\left(\tilde{\beta}\right)$ and $MSE\left(\hat{\beta}\right)$:

$$MSE\left(\tilde{\beta}\right) = E\left[\left(\tilde{\beta} - \beta\right)\left(\tilde{\beta} - \beta\right)'\right]$$
$$= E\left\{\left[\left(\tilde{\beta} - E\tilde{\beta}\right) + \left(E\tilde{\beta} - \beta\right)\right]\left[\left(\tilde{\beta} - E\tilde{\beta}\right) + \left(E\tilde{\beta} - \beta\right)\right]'\right\}$$
$$= \text{var}\left(\tilde{\beta}\right) + \left[E\tilde{\beta} - \beta\right]\left[E\tilde{\beta} - \beta\right]'$$

Consider whether it is worth imposing restrictions that may be false. It turns out that the condition for

$$MSE\left(\tilde{\beta}\right) - MSE\left(\hat{\beta}\right) \;\text{nsd}$$

is

$$\lambda = (R\beta - r)'\left[R(x'x)^{-1}R'\right]^{-1}(R\beta - r)/\sigma^2 < 1$$

Under H₀, our F test is the ratio of two independent χ²'s divided by their degrees of freedom. Under H₁, the numerator has a non-central χ² (density shifted to the right), and the ratio is a non-central F distribution with non-centrality parameter λ or λ/2, depending on the correction adopted.

The right shift explains our use of the upper tail when testing. The MSE argument suggests testing λ < 1, rather than Rβ = r, using the 5% upper tail of a non-central F distribution $F'(q, n-k; \lambda)$ with λ = 1.
Returning to restricted least squares: if $\tilde{e} = y - x\tilde{\beta}$, then

$$\tilde{e} = y - x\hat{\beta} + x\hat{\beta} - x\tilde{\beta} = \hat{e} + x\left(\hat{\beta} - \tilde{\beta}\right)$$

$$\Rightarrow\; \tilde{e}'\tilde{e} = \hat{e}'\hat{e} + \left(\hat{\beta} - \tilde{\beta}\right)'x'x\left(\hat{\beta} - \tilde{\beta}\right)$$

But $\hat{\beta} - \tilde{\beta} = (x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$

$$\Rightarrow\; \tilde{e}'\tilde{e} - \hat{e}'\hat{e} = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

Thus, RRSS − URSS has the form asserted earlier.

Maximum Likelihood Estimation

We have considered the linear model $y = x\beta + e$, assuming:

i). x is full column rank and fixed (non-stochastic)
ii). $E(e) = 0$; $E(ee') = \sigma^2 I_n$, or
iii). $e \sim N(0, \sigma^2 I_n)$

Thus OLS yields estimators which are BLUE given these assumptions. The assumption of normality is strong, but it has an advantage: it gives us exact finite-sample distributional results.

It is desirable to have a method which is robust to departures from these assumptions; OLS is unlikely to be robust. In particular,

i). x may be stochastic
ii). e may be non-normal, and may be correlated.

We may relax these assumptions, but we sacrifice the exact finite-sample distributional results, and so must rely on asymptotic results.
2.3 Asymptotic Theory

Concerned with the behaviour of random variables as the sample size tends to infinity. In particular, how does the random variable $\hat{\beta}$ behave as $n \to \infty$?

Convergence in Probability:

Suppose we have a sample of n observations on X drawn at random from a distribution with mean μ and variance σ². Consider $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then $E\left(\bar{X}_n\right) = \mu$ and $\text{var}\left(\bar{X}_n\right) = \sigma^2/n$, i.e., $\bar{X}_n$ is unbiased and $\text{var}\left(\bar{X}_n\right) \to 0$ as $n \to \infty$, i.e., the distribution of $\bar{X}_n$ becomes more and more concentrated around μ.

Consider the neighbourhood $\mu \pm \varepsilon$; the probability that $\bar{X}_n$ lies in this interval is given by:

$$P\left\{\mu - \varepsilon < \bar{X}_n < \mu + \varepsilon\right\} = P\left\{\left|\bar{X}_n - \mu\right| < \varepsilon\right\} \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$$

We can make the interval smaller by decreasing ε. Since the variance declines monotonically with n, there exist an $n^*$ and a δ (0 < δ < 1) such that for all $n > n^*$

$$P\left\{\left|\bar{X}_n - \mu\right| < \varepsilon\right\} > 1 - \delta$$

Then, the random variable $\bar{X}_n$ is said to converge in probability to μ.

Alternatively, as n increases, the probability that $\bar{X}_n$ lies in a specified interval increases (i.e., δ decreases), so equivalently:

$$\lim_{n\to\infty} P\left\{\left|\bar{X}_n - \mu\right| < \varepsilon\right\} = 1, \quad \text{i.e., } \text{plim}\,\bar{X}_n = \mu$$

i.e., by decreasing δ we can make the probability of $\bar{X}_n$ lying in an arbitrarily small interval around μ equal to one. We write $\text{plim}\,\bar{X}_n = \mu$.
We say that $\bar{X}_n$ is a consistent estimator of μ.

Consider an alternative estimator of μ, denoted $m_n$, such that $E(m_n) = \mu + \frac{c}{n}$, where c is a constant. This is obviously biased in small samples, but $\lim_{n\to\infty} E(m_n) = \mu$, i.e., $m_n$ is asymptotically unbiased.

Chebyshev's Theorem:

For a random variable X with finite mean and variance (μ, σ²) and for a given λ > 0,

$$P\left\{|X - \mu| \ge \lambda\sigma\right\} \le \frac{1}{\lambda^2}.$$

Using this theorem, we obtain

$$P\left\{\left|m_n - \mu - \tfrac{c}{n}\right| \ge \lambda\sqrt{\text{var}(m_n)}\right\} \le \frac{1}{\lambda^2}$$

Letting $\varepsilon = \lambda\sqrt{\text{var}(m_n)}$, we have

$$P\left\{\left|m_n - \mu - \tfrac{c}{n}\right| \ge \varepsilon\right\} \le \frac{\text{var}(m_n)}{\varepsilon^2}$$

$$\Rightarrow\; \lim_{n\to\infty} P\left\{\left|m_n - \mu - \tfrac{c}{n}\right| \ge \varepsilon\right\} = 0$$

so that $m_n$ is consistent if $\text{var}(m_n) \to 0$ as $n \to \infty$.

Sufficient conditions for consistency:

i). Asymptotic unbiasedness
ii). Variance → 0 as $n \to \infty$.

The operator plim has the following properties:

$$\text{plim}\,X^2 = (\text{plim}\,X)^2, \qquad \text{plim}\,XY = (\text{plim}\,X)(\text{plim}\,Y)$$
Example

Consistency of the least squares estimator $\hat{\beta}$:

$$\hat{\beta} = (x'x)^{-1}x'y = \beta + (x'x)^{-1}x'e = \beta + \left(x'x/n\right)^{-1}\left(x'e/n\right) = \beta + A^{-1}B$$

$$\text{where } A = \frac{x'x}{n}, \quad B = \frac{x'e}{n}$$

If x is stochastic, we will assume that

$$\text{plim}\,\frac{x'x}{n} = \Sigma \quad \text{(positive definite)}$$
If x contains the constant term:

$$\text{plim}\,\frac{x'e}{n} = \begin{pmatrix} \text{plim}\,\sum e_i/n \\ \text{plim}\,\sum x_{i,2}e_i/n \\ \vdots \\ \text{plim}\,\sum x_{i,k}e_i/n \end{pmatrix}$$

Since $E(\bar{e}) = 0$ and $\text{var}(\bar{e}) = \sigma^2/n$, $\text{plim}\,\bar{e} = 0$.

Now, $E\left(\sum_{i=1}^n x_{i,j}e_i/n\right) = 0$ for every $j = 2, \ldots, k$, if

i). x is fixed or non-stochastic, or
ii). x is stochastic with $\text{cov}(x, e) = 0$.

Now, $\text{var}\left(\sum_{i=1}^n x_{i,j}e_i/n\right) = \dfrac{\sigma^2}{n}\sum_{i=1}^n x_{i,j}^2/n$; $j = 2, \ldots, k$.

In view of the assumption that $\text{plim}\,x'x/n = \Sigma$, we have that $\text{plim}\,\sum_{i=1}^n x_{i,j}^2/n$ is constant and $\text{plim}\,\sigma^2/n$ is zero, i.e.,

$$\text{plim}\,\frac{x'e}{n} = 0$$

$$\Rightarrow\; \text{plim}\,\hat{\beta} = \beta + \left(\text{plim}\,A\right)^{-1}\left(\text{plim}\,B\right) = \beta + \Sigma^{-1}\cdot 0 = \beta$$
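An illustrative Monte Carlo sketch (simulated data) of the consistency result: the OLS estimator concentrates around β as n grows:

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 2.0])
for n in [50, 500, 5000, 50000]:
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = x @ beta + rng.normal(size=n)
    b = np.linalg.solve(x.T @ x, x.T @ y)
    print(n, b - beta)   # deviations shrink toward zero as n grows
```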

Convergence in distribution

Suppose $X \sim N(\mu, \sigma^2)$; then $\bar{X}_n \sim N(\mu, \sigma^2/n)$. But as $n \to \infty$, the distribution of $\bar{X}_n$ is degenerate, i.e., it collapses around the point μ. However, consider

$$Z_n = \sqrt{n}\left(\bar{X}_n - \mu\right) \;\Rightarrow\; E(Z_n) = 0, \;\text{var}(Z_n) = \sigma^2$$

Thus the density function (distribution) of $Z_n$ is $N(0, \sigma^2)$, which is independent of n, and hence finite-sample distributions are the same as the limiting distribution.

Often, small-sample distributions cannot be derived or are difficult to calculate, but limiting distributions based on standardized variates such as $Z_n$ are available.

Central Limit Theorem:

Suppose X has mean μ and variance σ². Define $Z_n = \sqrt{n}\left(\bar{X}_n - \mu\right)$; then $Z_n \xrightarrow{d} N(0, \sigma^2)$, independent of the actual distribution of the X's. We say that $\bar{X}_n \sim AN(\mu, \sigma^2/n)$, where $\text{asy var}\left(\bar{X}_n\right) = \sigma^2/n$.

Asymptotic (large sample) theory:

Convergence in probability: consistency, $\text{plim}\,\hat{\theta} = \theta$.

Convergence in distribution: central limit theorem, $Z_n \xrightarrow{d} N(0, \sigma^2)$.
2.4 Maximum Likelihood Estimation:

A very general method with widespread application.

Definition

The likelihood of an observation of X is the value of the density function at x, f(x, θ), where f depends on θ (parameter).

Clarification: the density f(x, θ) is regarded as a function of x for a given θ. The likelihood function is regarded as a function of θ for a given x.


Maximum Likelihood Principle: choose the value of θ (possibly a vector) which is most likely to have generated the observed (given) sample values of x.

Given $x_1, x_2, \ldots, x_n$, i.e., n observations on the random variable X, the likelihood function (provided $x_1, x_2, \ldots, x_n$ are independent) is

$$L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i; \theta)$$

The maximum likelihood estimates (MLE) are obtained as a solution to:

$$\max_{\theta} L(\theta; x) \;\to\; \hat{\theta} = h(x)$$
MLE of the General Linear Model:

$$\text{Model}: \; y = x\beta + u \tag{2.4.1}$$

Assume x is fixed (non-stochastic). Equation (2.4.1) is a transformation from u to y. The (multivariate) density of y may be written as:

$$f_Y(y) = f_U(u(y))\left|\frac{\partial u}{\partial y'}\right| = f_U(u(y))\,|I_n| = f_U(u(y))$$

Assuming $u \sim N(0, \sigma^2 I_n)$, we have

$$f_U(u) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}u'u\right) \tag{2.4.2}$$

$$\Rightarrow\; f_Y(y) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}(y - x\beta)'(y - x\beta)\right)$$

We want to maximize L with respect to β and σ². Let $\theta = \begin{pmatrix} \beta' & \sigma^2 \end{pmatrix}'$, i.e., θ is a $(k+1)\times 1$ column vector. Then $\hat{\theta}$ is the solution to $\left.\dfrac{\partial \ln L}{\partial \theta}\right|_{\hat{\theta}} = 0$. We use (maximize) ln L.


From equation (2.4.2), it follows that:

$$\ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - x\beta)'(y - x\beta)$$

$$\left.\begin{aligned}
\left.\frac{\partial \ln L}{\partial \beta}\right|_{\hat{\beta}, \hat{\sigma}^2} &= -\frac{1}{2\hat{\sigma}^2}\left(-2x'y + 2x'x\hat{\beta}\right) = \frac{1}{\hat{\sigma}^2}\left(x'y - x'x\hat{\beta}\right) = 0 \\
\left.\frac{\partial \ln L}{\partial \sigma^2}\right|_{\hat{\beta}, \hat{\sigma}^2} &= -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\left(y - x\hat{\beta}\right)'\left(y - x\hat{\beta}\right) = 0
\end{aligned}\right\} \tag{2.4.3}$$

From equation (2.4.3), it follows that

$$\hat{\beta} = (x'x)^{-1}x'y \quad \text{(this is the OLS estimator)}$$

$$\hat{\sigma}^2 = \frac{1}{n}\left(y - x\hat{\beta}\right)'\left(y - x\hat{\beta}\right) = \frac{\hat{e}'\hat{e}}{n}, \quad \text{where } \hat{e} = y - x\hat{\beta}$$

Now $E\left(\hat{\beta}\right) = \beta$, but $E\left(\hat{\sigma}^2\right) \neq \sigma^2$; $E\left(\hat{\sigma}^2\right) = \dfrac{n-k}{n}\sigma^2$.
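An illustrative simulation (made-up design and seed) of this bias: $\hat{\sigma}^2 = \hat{e}'\hat{e}/n$ averages to about $(n-k)\sigma^2/n$ while $S^2 = \hat{e}'\hat{e}/(n-k)$ averages to $\sigma^2$ (here $\sigma^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, reps = 30, 3, 20000
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - x @ np.linalg.solve(x.T @ x, x.T)   # residual-maker matrix
s2_ml = np.zeros(reps)
s2_unb = np.zeros(reps)
for j in range(reps):
    e = rng.normal(size=n)
    ee = e @ M @ e               # residual sum of squares e_hat'e_hat
    s2_ml[j] = ee / n
    s2_unb[j] = ee / (n - k)
print(s2_ml.mean(), s2_unb.mean())   # approx (n-k)/n = 0.9 versus 1.0
```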

Subject to certain regularity conditions¹, if $(y_1, y_2, \ldots, y_n)$ is a random sample and $\hat{\theta}_n$ is the maximum likelihood estimator of θ₀ (the true value) based on a sample of size n, then

$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{d} N(0, V), \quad \text{where } V = \left[\lim_{n\to\infty}\left(-E\,\frac{1}{n}\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta'}\right)\right]^{-1}$$

¹ For details, see the regularity conditions listed at the end of this unit.

This suggests that we may conduct approximate tests in finite samples based on the limiting distributions, e.g., use a t test. The approximation obviously becomes better as n gets larger.
Theorem (Cramer-Rao)

Subject to certain regularity conditions, given a random sample $(y_1, y_2, \ldots, y_n)$ and an unbiased estimator $\hat{\theta}_n$ of θ₀, then

$$\text{var}\left(\hat{\theta}_n\right) - I(\theta_0)^{-1} \;\text{is positive semi-definite}$$

$$\text{where } I(\theta_0) = E\left(\frac{\partial \ln L}{\partial \theta}\frac{\partial \ln L}{\partial \theta'}\right) = -E\left(\frac{\partial^2 \ln L}{\partial \theta\,\partial \theta'}\right)$$

$I(\theta_0)$ is known as the information matrix. The matrix $I(\theta_0)^{-1}$ serves as the minimum variance bound (MVB).

If $\text{var}\left(\hat{\theta}_n\right) = I(\theta_0)^{-1}$, then $\hat{\theta}_n$ is said to be efficient.
N.B. $\underset{(k\times k)}{\text{var}\left(\hat{\theta}_n\right)} - \underset{(k\times k)}{I(\theta_0)^{-1}} \ge 0$ in the sense that the difference is a psd matrix.

Now $\hat{\beta}_{ML}$ is unbiased but $\hat{\sigma}^2_{ML}$ is biased. So $\begin{pmatrix} \hat{\beta}_{ML} \\ \hat{\sigma}^2_{ML} \end{pmatrix}$ is a biased estimator of $\begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix}$.

This is to mean that, strictly speaking, the Cramer-Rao Theorem is not applicable here, but it is of interest to examine these properties:


From equation (2.4.3), it follows that

$$\begin{cases}
\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \beta'} = -\dfrac{x'x}{\sigma^2} \\[2mm]
\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \sigma^2} = -\dfrac{1}{\sigma^4}\left(x'y - x'x\beta\right) \\[2mm]
\dfrac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - x\beta)'(y - x\beta)
\end{cases}$$

$$\Rightarrow\;\begin{cases}
-E\left(\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \beta'}\right) = \dfrac{x'x}{\sigma^2} \\[2mm]
-E\left(\dfrac{\partial^2 \ln L}{\partial \beta\,\partial \sigma^2}\right) = 0 \\[2mm]
-E\left(\dfrac{\partial^2 \ln L}{\partial (\sigma^2)^2}\right) = -\dfrac{n}{2\sigma^4} + \dfrac{n}{\sigma^4} = \dfrac{n}{2\sigma^4}
\end{cases}$$

The information matrix is therefore:

$$I(\theta) = \begin{pmatrix} \dfrac{x'x}{\sigma^2} & 0 \\ 0' & \dfrac{n}{2\sigma^4} \end{pmatrix} \;\Rightarrow\; [I(\theta)]^{-1} = \begin{pmatrix} \sigma^2(x'x)^{-1} & 0 \\ 0' & 2\sigma^4/n \end{pmatrix}$$

Thus, $\text{var}\left(\hat{\beta}\right) = \text{var}\left(\hat{\beta}_{ML}\right) = \sigma^2(x'x)^{-1} = MVB$.

So $\hat{\beta}$ and $\hat{\beta}_{ML}$ are both efficient estimators of β. Since $\hat{\sigma}^2_{ML}$ is a biased estimator of σ², the theorem is not really relevant or applicable to it.

But $\text{var}(S^2) = \dfrac{2\sigma^4}{n-k} > \dfrac{2\sigma^4}{n}$ for finite n, so $S^2$ is not an efficient estimator of σ². In fact, there is no unbiased estimator of σ² that attains the MVB.
Hypothesis Testing: the Likelihood Ratio Principle

So far we have considered linear restrictions of the form $H_0: R\beta = r$, which can be tested using an F statistic. But what about non-linear restrictions of the form, say, $H_0: \beta_1\beta_2 = 1$?

This cannot be written in the form $R\beta = r$, but we can represent it as $H_0: g(\beta) = 0$.

This hypothesis can be tested using the likelihood ratio test:

$$\lambda = \frac{\max_{H_0} L}{\max_{H_0 \cup H_A} L}$$

Example

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i$$

Consider a null hypothesis of the form $H_0: \beta_2 = \dfrac{1}{\beta_3}$.

Under $H_0$, $y_i = \beta_1 + \beta_2 x_{i2} + \dfrac{1}{\beta_2}x_{i3} + e_i$.
Thus OLS is inapplicable, i.e., it won't take the restriction into account. The procedure is:

Step I: Estimate the unrestricted model (by OLS) and obtain the maximized value of the log likelihood, i.e., $\ln L_U$.

Step II: Impose the restrictions, estimate by maximum likelihood (ML), and obtain the maximized value of the restricted log likelihood, i.e., $\ln L_R$.

Step III: $-2\ln\lambda = -2\left(\ln L_R - \ln L_U\right) \overset{\text{approx}}{\sim} \chi^2(q)$, where q is the number of restrictions imposed by the null hypothesis.

Step IV: Accept $H_0$ if $-2\ln\lambda < \bar{c}$ (critical value based on $\chi^2(q)$); reject $H_0$ otherwise.

N.B. $\ln L_R \le \ln L_U \;\Rightarrow\; -2\ln\lambda \ge 0$.
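A sketch of these four steps on simulated data, assuming the restriction of the example above (β₂β₃ = 1); the data, seed, and concentrated log-likelihood parametrization are illustrative:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 2.0 * x2 + 0.5 * x3 + rng.normal(size=n)   # satisfies b2*b3 = 1

def neg_loglik(params, restricted):
    if restricted:
        b1, b2 = params
        b3 = 1.0 / b2                 # impose the restriction b2*b3 = 1
    else:
        b1, b2, b3 = params
    resid = y - b1 - b2 * x2 - b3 * x3
    s2 = resid @ resid / n            # concentrated sigma^2
    return 0.5 * n * (np.log(2 * np.pi) + np.log(s2) + 1)

ll_u = -optimize.minimize(neg_loglik, [0.0, 1.0, 1.0], args=(False,)).fun
ll_r = -optimize.minimize(neg_loglik, [0.0, 1.0], args=(True,)).fun
lr = -2 * (ll_r - ll_u)               # approx chi2(1) under H0
print(lr, stats.chi2.sf(lr, df=1))
```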

2.5 Wald, Likelihood Ratio and Lagrange Multiplier Tests

These are general methods for constructing test statistics. Usually they provide us with asymptotically valid tests. We will define the general form of these test statistics, and then apply them to testing $R\beta = r$ as an illustration.

We start with maximum likelihood estimation:

$$\sqrt{n}\left(\hat{\theta} - \theta\right) \xrightarrow{d} N\left(0, \left[\lim_{n\to\infty}\tfrac{1}{n}I_n(\theta)\right]^{-1}\right)$$

where, if $\ell(\theta) = \ln L(\theta)$,

$$\frac{\partial \ell(\theta)}{\partial \theta} = S(\theta) \quad \text{the score vector}$$

$$\frac{\partial^2 \ell(\theta)}{\partial \theta\,\partial \theta'} = \frac{\partial S(\theta)}{\partial \theta'} = H \quad \text{the Hessian matrix}$$

$$-E(H) = I(\theta) \quad \text{the information matrix}$$

In general, in i.i.d. sampling,

$$L(\theta) = \prod_{i=1}^n f(X_i\,|\,\theta); \quad \ell(\theta) = \ln L(\theta) = \sum_{i=1}^n \ln f(X_i\,|\,\theta)$$

So ℓ, and hence S, H and I, are sums of n identically distributed terms.
So $\frac{1}{n}I_n(\theta)$ is a mean; subscripting I emphasizes its dependence on the sample size n.

For the linear regression model $y = x\beta + e$, where $e \sim N(0, \sigma^2 I_n)$,

$$\ell(\theta) = \ell(\beta, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - x\beta)'(y - x\beta)$$

So,

$$S(\theta) = \frac{\partial \ell(\theta)}{\partial \theta} = \begin{pmatrix} \dfrac{\partial \ell(\theta)}{\partial \beta} \\[2mm] \dfrac{\partial \ell(\theta)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sigma^2}\left(x'y - x'x\beta\right) \\[2mm] -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}(y - x\beta)'(y - x\beta) \end{pmatrix}$$

So, the Hessian matrix is

$$H = \begin{pmatrix} \dfrac{\partial^2 \ell(\theta)}{\partial \beta\,\partial \beta'} & \dfrac{\partial^2 \ell(\theta)}{\partial \beta\,\partial \sigma^2} \\[2mm] \dfrac{\partial^2 \ell(\theta)}{\partial \sigma^2\,\partial \beta'} & \dfrac{\partial^2 \ell(\theta)}{\partial (\sigma^2)^2} \end{pmatrix} = \begin{pmatrix} -\dfrac{1}{\sigma^2}x'x & -\dfrac{1}{\sigma^4}x'(y - x\beta) \\[2mm] -\dfrac{1}{\sigma^4}(y - x\beta)'x & \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}(y - x\beta)'(y - x\beta) \end{pmatrix}$$

$$\Rightarrow\; E(H) = \begin{pmatrix} -\dfrac{1}{\sigma^2}x'x & 0 \\ 0' & \dfrac{n}{2\sigma^4} - \dfrac{n}{\sigma^4} \end{pmatrix} = \begin{pmatrix} -\dfrac{1}{\sigma^2}x'x & 0 \\ 0' & -\dfrac{n}{2\sigma^4} \end{pmatrix}$$


and

$$I_n(\theta) = -E(H) = \begin{pmatrix} \dfrac{1}{\sigma^2}x'x & 0 \\ 0' & \dfrac{n}{2\sigma^4} \end{pmatrix} \;\Rightarrow\; \frac{1}{n}I_n(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2}\dfrac{x'x}{n} & 0 \\ 0' & \dfrac{1}{2\sigma^4} \end{pmatrix}$$

If $\frac{1}{n}x'x \to Q$, a positive definite matrix, then

$$\lim_{n\to\infty}\left[\frac{1}{n}I_n(\theta)\right]^{-1} = \begin{pmatrix} \sigma^2\left(\lim_{n\to\infty}\frac{1}{n}x'x\right)^{-1} & 0 \\ 0' & 2\sigma^4 \end{pmatrix}$$

So, the asymptotic maximum likelihood result stated for the i.i.d. case suggests that

$$\sqrt{n}\begin{pmatrix} \hat{\beta} - \beta \\ \hat{\sigma}^2 - \sigma^2 \end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2\left(\lim_{n\to\infty}\frac{1}{n}x'x\right)^{-1} & 0 \\ 0' & 2\sigma^4 \end{pmatrix}\right)$$

In fact, $\sqrt{n}\left(\hat{\beta} - \beta\right)$ is exactly $N\left(0, \sigma^2(x'x/n)^{-1}\right)$, independently of the distribution of $\hat{\sigma}^2$, and

$$\text{var}\left(\hat{\sigma}^2\right) = 2\sigma^4\frac{n-k}{n^2}, \quad E\left(\hat{\sigma}^2\right) = \sigma^2\frac{n-k}{n}, \quad \text{and } \hat{\sigma}^2 \text{ is } \frac{\sigma^2}{n}\chi^2(n-k).$$

We define the likelihood ratio (LR) test as follows:

If $H_0: \underset{(J\times 1)}{C(\theta)} = 0$, $\tilde{L}_R$ is the likelihood maximized subject to the restriction, and $\hat{L}_U$ is the unrestricted maximum value of the likelihood function, then

$$-2\ln\lambda = -2\ln\left(\frac{\tilde{L}_R}{\hat{L}_U}\right), \quad -2\ln\lambda \sim \chi^2(J)$$

In our example of testing $R\beta = r$, we know that

$$\ln \hat{L}_U = -\frac{n}{2}(1 + \ln 2\pi) - \frac{n}{2}\ln\hat{\sigma}^2, \quad \text{where } \hat{\sigma}^2 = \frac{\hat{e}'\hat{e}}{n}$$

Similarly,

$$\ln \tilde{L}_R = -\frac{n}{2}(1 + \ln 2\pi) - \frac{n}{2}\ln\tilde{\sigma}^2, \quad \text{where } \tilde{\sigma}^2 = \frac{\tilde{e}'\tilde{e}}{n}$$
So,

$$-2\ln\lambda = 2\left(\ln\hat{L}_U - \ln\tilde{L}_R\right) = n\ln\frac{\tilde{\sigma}^2}{\hat{\sigma}^2}$$

$$\Rightarrow\; \lambda = \left(\frac{\hat{\sigma}^2}{\tilde{\sigma}^2}\right)^{n/2} = \left(\frac{\hat{e}'\hat{e}}{\tilde{e}'\tilde{e}}\right)^{n/2}$$

$$\lambda^{-2/n} - 1 = \frac{\tilde{e}'\tilde{e}}{\hat{e}'\hat{e}} - 1 = \frac{\tilde{e}'\tilde{e} - \hat{e}'\hat{e}}{\hat{e}'\hat{e}}$$

$$\Rightarrow\; \frac{n-k}{J}\left(\lambda^{-2/n} - 1\right) = \frac{(\tilde{e}'\tilde{e} - \hat{e}'\hat{e})/J}{\hat{e}'\hat{e}/(n-k)} = F \sim F(J, n-k)$$

$$\text{i.e., } \lambda^{-2/n} = 1 + \frac{JF}{n-k}$$

$$\Rightarrow\; -\frac{2}{n}\ln\lambda = \ln\left(1 + \frac{JF}{n-k}\right) \;\Rightarrow\; -2\ln\lambda = n\ln\left(1 + \frac{JF}{n-k}\right)$$

Now, $1 + x < 1 + x + \dfrac{x^2}{2!} + \cdots = e^x$ for $x > 0$, so $\ln(1 + x) < x$:

$$-2\ln\lambda < \frac{nJF}{n-k} \xrightarrow{d} \chi^2(J)$$

But the approximation will become very close for n large enough.

The Wald Test: $H_0: C(\theta) = 0$

$$W = C\left(\hat{\theta}\right)'\left[\widehat{\text{var}}\left(C\left(\hat{\theta}\right)\right)\right]^{-1}C\left(\hat{\theta}\right) \xrightarrow{d} \chi^2(J) \;\text{under } H_0$$

$$\widehat{\text{var}}\left(C\left(\hat{\theta}\right)\right) \simeq c\,\widehat{\text{var}}\left(\hat{\theta}\right)c', \quad \text{where } \underset{(J\times(k+1))}{c} = \left.\frac{\partial C(\theta)}{\partial \theta'}\right|_{\hat{\theta}}$$

$$W \simeq C\left(\hat{\theta}\right)'\left[c\,\widehat{\text{var}}\left(\hat{\theta}\right)c'\right]^{-1}C\left(\hat{\theta}\right) \xrightarrow{d} \chi^2(J) \;\text{under } H_0$$

In our example, $C(\theta) = 0 \;\Leftrightarrow\; R\beta - r = 0$. So $C\left(\hat{\theta}\right) = R\hat{\beta} - r$ and

$$c = \left.\frac{\partial C(\theta)}{\partial \theta'}\right|_{\hat{\theta}} = \begin{pmatrix} \dfrac{\partial C(\theta)}{\partial \beta'} & \dfrac{\partial C(\theta)}{\partial \sigma^2} \end{pmatrix}_{\hat{\theta}} = \begin{pmatrix} R & 0 \end{pmatrix}; \quad \theta = \begin{pmatrix} \beta \\ \sigma^2 \end{pmatrix}$$


Thus,

$$\widehat{\text{var}}\left(\hat{\theta}\right) = \frac{1}{n}\left[\lim_{n\to\infty}\frac{1}{n}I_n(\theta)\right]^{-1} = \frac{1}{n}\begin{pmatrix} \sigma^2\left(\lim_{n\to\infty}\frac{1}{n}x'x\right)^{-1} & 0 \\ 0' & 2\sigma^4 \end{pmatrix}$$

and we use

$$\begin{pmatrix} \sigma^2(x'x)^{-1} & 0 \\ 0' & 2\sigma^4/n \end{pmatrix}$$

So,

$$c\,\widehat{\text{var}}\left(\hat{\theta}\right)c' = \begin{pmatrix} R & 0 \end{pmatrix}\begin{pmatrix} \sigma^2(x'x)^{-1} & 0 \\ 0' & 2\sigma^4/n \end{pmatrix}\begin{pmatrix} R' \\ 0' \end{pmatrix} = \sigma^2 R(x'x)^{-1}R'$$

$$\Rightarrow\; W = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/\sigma^2$$

Replacing σ² by the unconstrained MLE $\hat{\sigma}^2$, we obtain:

$$W = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/\hat{\sigma}^2$$

Since $\hat{\sigma}^2 = \hat{e}'\hat{e}/n$ and $S^2 = \hat{e}'\hat{e}/(n-k)$,

$$W = \frac{n}{n-k}\cdot\frac{\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)}{S^2} = \frac{nJ}{n-k}F, \quad F \sim F(J, n-k)$$

and as $n \to \infty$, $\dfrac{n}{n-k} \to 1 \;\Rightarrow\; W = JF \xrightarrow{d} \chi^2(J)$ under $H_0$.


As

$$-2\ln\lambda = n\ln\left(1 + \frac{JF}{n-k}\right) < \frac{nJF}{n-k} = W,$$

we have LR < W, but LR → W as $n \to \infty$.

The Lagrange Multiplier (Score) Test:

Exists in two forms. In general the Lagrangian is $\mathcal{L} = \ell(\theta) - \lambda'C(\theta)$, and the F.O.C. is

$$\left.\frac{\partial \mathcal{L}}{\partial \theta}\right|_{\tilde{\theta}, \tilde{\lambda}} = \left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\tilde{\theta}, \tilde{\lambda}} - c'\tilde{\lambda} = 0, \quad \text{where } c = \left.\frac{\partial C(\theta)}{\partial \theta'}\right|_{\tilde{\theta}}$$

The F.O.C. implies

$$c\left(\tilde{\theta}\right)'\tilde{\lambda} = S\left(\tilde{\theta}\right), \quad \text{where } S\left(\tilde{\theta}\right) = \frac{\partial \ell\left(\tilde{\theta}\right)}{\partial \theta} \;\text{(the score vector)}$$

If $\tilde{\theta}$ is close to $\hat{\theta}$, we expect $S\left(\tilde{\theta}\right) \simeq S\left(\hat{\theta}\right) = 0$, so we can use as our test statistic

$$S\left(\tilde{\theta}\right)'\left[\widehat{\text{var}}\left(S\left(\tilde{\theta}\right)\right)\right]^{-1}S\left(\tilde{\theta}\right) \xrightarrow{d} \chi^2(J) \;\text{under } H_0$$

Theorem

In general,

i). As $E(S(\theta)) = 0$, $E\left(S\left(\tilde{\theta}\right)\right) = 0$ under $H_0$.

ii). As $E(S(\theta)) = 0$, $\text{var}(S(\theta)) = E\left[S(\theta)S(\theta)'\right] = I_n(\theta)$.

So, $LM = \tilde{\lambda}'c\left(\tilde{\theta}\right)\left[I_n\left(\tilde{\theta}\right)\right]^{-1}c\left(\tilde{\theta}\right)'\tilde{\lambda}$.

Of course $\tilde{\lambda}$ should be close to 0, as the 'cost' of the constraint, if the constraint is nearly valid. It is better to write

$$LM = \frac{1}{n}S\left(\tilde{\theta}\right)'\left[\lim_{n\to\infty}\frac{1}{n}I_n\left(\tilde{\theta}\right)\right]^{-1}S\left(\tilde{\theta}\right)$$


For the linear regression model:

$$S\left(\tilde{\theta}\right) = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}\left(x'y - x'x\tilde{\beta}\right) \\[2mm] -\dfrac{n}{2\tilde{\sigma}^2} + \dfrac{1}{2\tilde{\sigma}^4}\left(y - x\tilde{\beta}\right)'\left(y - x\tilde{\beta}\right) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}x'\tilde{e} \\[2mm] -\dfrac{n}{2\tilde{\sigma}^2} + \dfrac{n\tilde{\sigma}^2}{2\tilde{\sigma}^4} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}x'\tilde{e} \\[2mm] 0 \end{pmatrix} \tag{2.5.1}$$

But $\tilde{e} = y - x\tilde{\beta} = y - x\hat{\beta} + x\left(\hat{\beta} - \tilde{\beta}\right) = \hat{e} + x\left(\hat{\beta} - \tilde{\beta}\right)$

$$\Rightarrow\; x'\tilde{e} = x'\hat{e} + x'x\left(\hat{\beta} - \tilde{\beta}\right) = x'x\left(\hat{\beta} - \tilde{\beta}\right), \quad \text{as } x'\hat{e} = 0$$

However,

$$\hat{\beta} - \tilde{\beta} = -(x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(r - R\hat{\beta}\right) = (x'x)^{-1}R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

$$\Rightarrow\; x'x\left(\hat{\beta} - \tilde{\beta}\right) = R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)$$

$$\Rightarrow\; x'\tilde{e} = R'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right) \tag{2.5.2}$$

On the other hand,

$$\lim_{n\to\infty}\frac{1}{n}I_n\left(\tilde{\theta}\right) = \begin{pmatrix} \dfrac{1}{\tilde{\sigma}^2}\dfrac{x'x}{n} & 0 \\ 0' & \dfrac{1}{2\tilde{\sigma}^4} \end{pmatrix} \tag{2.5.3}$$
From (2.5.1), (2.5.2) and (2.5.3), it follows that

$$LM = \frac{1}{n}S\left(\tilde{\theta}\right)'\left[\lim_{n\to\infty}\frac{1}{n}I_n\left(\tilde{\theta}\right)\right]^{-1}S\left(\tilde{\theta}\right) = \frac{1}{\tilde{\sigma}^2}\,\tilde{e}'x(x'x)^{-1}x'\tilde{e}$$

Substituting (2.5.2),

$$\Rightarrow\; LM = \left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)/\tilde{\sigma}^2 \tag{2.5.4}$$


From (2.5.4), it follows that

$$LM = \frac{\left(R\hat{\beta} - r\right)'\left[R(x'x)^{-1}R'\right]^{-1}\left(R\hat{\beta} - r\right)}{\hat{\sigma}^2}\cdot\frac{\hat{\sigma}^2}{\tilde{\sigma}^2} = \frac{W\hat{\sigma}^2}{\tilde{\sigma}^2}$$

Now, as $\hat{\sigma}^2 < \tilde{\sigma}^2$, LM < W. Moreover,

$$LM = \frac{\tilde{e}'\tilde{e} - \hat{e}'\hat{e}}{\tilde{e}'\tilde{e}/n} = n\left(1 - \frac{\hat{e}'\hat{e}}{\tilde{e}'\tilde{e}}\right)$$

$$LR = -2\ln\lambda = n\ln\frac{\tilde{e}'\tilde{e}}{\hat{e}'\hat{e}} \;\Rightarrow\; e^{LR/n} = \frac{\tilde{e}'\tilde{e}}{\hat{e}'\hat{e}}$$
Therefore,

$$LM = n\left(1 - e^{-LR/n}\right) \;\Rightarrow\; e^{-LR/n} = 1 - \frac{LM}{n} \;\Rightarrow\; -\frac{LR}{n} = \ln\left(1 - \frac{LM}{n}\right) < -\frac{LM}{n}$$

The preceding inequality holds as, for $x > 0$,

$$\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots < -x$$

$$\Rightarrow\; LR > LM$$

Moreover, as $n \to \infty$, LR → LM. Therefore, for linear hypotheses in the linear regression model, we have: $W \ge LR \ge LM$.

This ranking does not carry over to non-linear restrictions. However, if we have linear restrictions plus a quadratic likelihood, then it follows that W = LR = LM. Note that in regression we have a likelihood quadratic in β but not in σ², so equality holds if σ² is known or as $n \to \infty$.
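An illustrative numerical check of the ranking W ≥ LR ≥ LM for a single linear restriction (H₀: β₂ = 0 in a made-up bivariate regression), using the RSS-based formulas derived above:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 60
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ np.array([1.0, 0.2]) + rng.normal(size=n)

# Unrestricted RSS, and restricted RSS under beta_2 = 0 (regress y on i only)
b_u = np.linalg.solve(x.T @ x, x.T @ y)
rss_u = np.sum((y - x @ b_u) ** 2)        # e_hat'e_hat
rss_r = np.sum((y - y.mean()) ** 2)       # e_tilde'e_tilde

W = n * (rss_r - rss_u) / rss_u
LR = n * np.log(rss_r / rss_u)
LM = n * (rss_r - rss_u) / rss_r
print(W, LR, LM)                          # observe W >= LR >= LM
```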


Notes:

i). W, LR, and LM are asymptotically equivalent under $H_0 \cup H_A = H_m$, the maintained hypothesis, but if $H_m$ is false, they may behave differently.

ii). It is easier to modify W and LM for, e.g., heteroscedasticity, using HCSE formulae.

iii). Small-sample distributions may vary (considerably) from each other and from the asymptotic χ².

iv). A special problem with W is that it is not invariant to the way restrictions are written:

Example

$$H_0: \beta_1 = \beta_2 \quad \text{or} \quad H_0: \frac{\beta_1}{\beta_2} = 1$$

Same LR, LM, but different W (and a problem if $\hat{\beta}_2$ is close to zero).

v). Wald makes general-to-simple testing easy, as it only needs $\hat{\theta}$; moreover, if $W(k_1)$ tests $k_1$ restrictions, $W(k_2)$ tests $k_2$ restrictions, and $k_1 < k_2$ (with the $k_1$ restrictions nested in the $k_2$ restrictions), then $W(k_2) - W(k_1) \sim \chi^2(k_2 - k_1)$ under "all restrictions are valid", independently of $W(k_1) \sim \chi^2(k_1)$.

vi). When the restricted model is easier to estimate: e.g., $y_t = x_t'\beta + u_t$ with $u_t = \rho u_{t-1} + e_t$ (regression with AR(1) errors) and $H_0: \rho = 0$; then $\tilde{\beta}$ is just OLS, and LM is the natural test.
2.6 Generalized Least Squares

So far we have assumed $E(uu') = \sigma^2 I_n$. This plays a crucial role in the results obtained so far, but is rather restrictive.

Sources of non-spherical disturbances:

1). Suppose we have a cross-section of households where we observe income and expenditure. We might expect there to be more variation in expenditures by households at higher income levels, leading to the assumption that:


$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

where $\sigma_1^2 < \sigma_2^2 < \cdots < \sigma_n^2$, i.e., the variance increases as income increases.

N.B. the data are ordered from the lowest income (1) to the highest income (n).

This clearly violates the spherical nature of the disturbances, i.e., the variances are not constant even though the covariances are (assumed) to be zero (Heteroscedasticity).

2). The effects of shocks or unmodelled components in a model may persist into subsequent periods. For the model $y_t = x_t'\beta + u_t$, where $x_t' = \begin{pmatrix} x_{t1} & x_{t2} & \cdots & x_{tk} \end{pmatrix}$, we may find that

$$u_t = \rho u_{t-1} + e_t \tag{2.6.1}$$

where $E(e_t) = 0$, $E(e_t^2) = \sigma_e^2$ and $E(e_t e_s) = 0$ for all $t \neq s$,
i.e., $e_t$ is spherical. In this case we have 'autocorrelation'. What are the properties of $u_t$?

Let L denote the lag operator:

$$Lx_t = x_{t-1}, \quad L^2 x_t = x_{t-2}, \quad \ldots, \quad L^j x_t = x_{t-j}, \quad L^0 x_t = x_t$$

Then, equation (2.6.1) can be written as:

$$u_t = \rho L u_t + e_t \;\Rightarrow\; (1 - \rho L)u_t = e_t$$

$$\Rightarrow\; u_t = \frac{e_t}{1 - \rho L} = \left(1 + \rho L + \rho^2 L^2 + \cdots\right)e_t$$

Or,

$$u_t = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \cdots \tag{2.6.2}$$

Now, $E(u_t) = 0$ and

$$\text{var}(u_t) = \frac{\sigma_e^2}{1 - \rho^2} = \sigma^2, \quad \text{if } |\rho| < 1 \tag{2.6.3}$$
Also,

$$E(u_t u_{t-1}) = \rho\sigma^2, \quad E(u_t u_{t-2}) = \rho^2\sigma^2, \quad \ldots, \quad E(u_t u_{t-j}) = \rho^j\sigma^2$$

$$\text{i.e., } E(uu') = \sigma^2\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix} \tag{2.6.4}$$
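A short illustrative sketch of building the AR(1) covariance matrix (2.6.4), whose (t, s) element is $\sigma^2\rho^{|t-s|}$ with $\sigma^2 = \sigma_e^2/(1-\rho^2)$ from (2.6.3):

```python
import numpy as np

def ar1_cov(T, rho, sigma_e2=1.0):
    sigma2 = sigma_e2 / (1.0 - rho**2)               # var(u_t), eq. (2.6.3)
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

V = ar1_cov(T=5, rho=0.7)
print(np.round(V, 3))
```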


Alternatively, we can specify this as:

$$E(uu') = V \tag{2.6.5}$$

depending on whether we wish to concentrate on σ² or not.

OLS with Non-spherical Disturbances

Model: $y = x\beta + u$

Assume: x is fixed (non-stochastic) and of full column rank, $\text{rank}(x) = k$.
Assume that

$$E(u) = 0, \qquad E(uu') = \sigma^2\Omega$$

The OLS estimator $\hat{\beta}$ of β is given by

$$\hat{\beta} = (x'x)^{-1}x'y = \beta + (x'x)^{-1}x'u$$

Thus, $E\left(\hat{\beta}\right) = \beta$, since $E(u) = 0$, i.e., OLS is unbiased. But

$$\text{var}\left(\hat{\beta}\right) = E\left[\left(\hat{\beta} - E\hat{\beta}\right)\left(\hat{\beta} - E\hat{\beta}\right)'\right] = E\left[(x'x)^{-1}x'uu'x(x'x)^{-1}\right]$$
$$= (x'x)^{-1}x'E(uu')x(x'x)^{-1} = (x'x)^{-1}x'\left(\sigma^2\Omega\right)x(x'x)^{-1}$$

$$\Rightarrow\; \text{var}\left(\hat{\beta}\right) = \sigma^2(x'x)^{-1}(x'\Omega x)(x'x)^{-1} \tag{2.6.6}$$

Although $\hat{\beta}$ is still unbiased, the usual expression for $\text{var}\left(\hat{\beta}\right)$ is no longer appropriate; inferences will be misleading or incorrect, i.e., t statistics, F statistics,


confidence intervals. The optimal properties of the OLS estimator are lost (it is no longer minimum variance).

The Generalized Least Squares (GLS) Estimator

Suppose we premultiply both sides of the model by a non-singular transformation matrix W to obtain:

$$Wy = Wx\beta + Wu \tag{2.6.7}$$

such that:

$$E(Wu) = 0$$
$$E(Wuu'W') = \sigma^2 W\Omega W' \tag{2.6.8}$$

If we could choose W such that $W\Omega W' = I_T$, then we could apply OLS to (2.6.7).

Theorem

If Ω is a symmetric positive definite matrix, then we can find a matrix P such that:

$$\Omega = PP' \tag{2.6.9}$$

This suggests $P^{-1}\Omega(P')^{-1} = I_T$ and that we could set $W = P^{-1}$, so that

$$\Omega^{-1} = (P')^{-1}P^{-1} = W'W$$

Applying OLS to equation (2.6.7), we obtain:

$$\hat{\beta}_G = (x'W'Wx)^{-1}(x'W'Wy) = \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}y$$

which may be written as:

$$\hat{\beta}_G = \beta + \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}u$$
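A sketch of GLS via this whitening argument: with Ω = PP′ from a Cholesky factorization and W = P⁻¹, OLS on the transformed data (Wy, Wx) reproduces $\hat{\beta}_G = (x'\Omega^{-1}x)^{-1}x'\Omega^{-1}y$. The AR(1)-style Ω, data, and seed below are illustrative:

```python
import numpy as np

def gls(x, y, omega):
    P = np.linalg.cholesky(omega)       # omega = P P'
    Wx = np.linalg.solve(P, x)          # W x = P^{-1} x
    Wy = np.linalg.solve(P, y)
    return np.linalg.solve(Wx.T @ Wx, Wx.T @ Wy)

rng = np.random.default_rng(9)
T, rho = 200, 0.8
idx = np.arange(T)
omega = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type correlation
x = np.column_stack([np.ones(T), rng.normal(size=T)])
u = np.linalg.cholesky(omega) @ rng.normal(size=T)   # cov(u) = omega
y = x @ np.array([1.0, 2.0]) + u
print(gls(x, y, omega))
```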
$$\Rightarrow\; E\left(\hat{\beta}_G\right) = \beta, \quad \text{as } E(u) = 0$$

$$\text{var}\left(\hat{\beta}_G\right) = E\left[\left(\hat{\beta}_G - E\hat{\beta}_G\right)\left(\hat{\beta}_G - E\hat{\beta}_G\right)'\right]$$
$$= E\left[\left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}uu'\Omega^{-1}x\left(x'\Omega^{-1}x\right)^{-1}\right]$$
$$= \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}E(uu')\Omega^{-1}x\left(x'\Omega^{-1}x\right)^{-1}$$
$$= \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}\left(\sigma^2\Omega\right)\Omega^{-1}x\left(x'\Omega^{-1}x\right)^{-1}$$

$$\Rightarrow\; \text{var}\left(\hat{\beta}_G\right) = \sigma^2\left(x'\Omega^{-1}x\right)^{-1}$$

In a sense, we standardize the observations by their variances/covariances. Now, $\text{var}\left(\hat{\beta}_G\right) - \text{var}\left(\hat{\beta}\right) \le 0$, in the sense that this difference is an nsd matrix, i.e., we prefer $\hat{\beta}_G$, since it is the minimum variance estimator.
An Alternative Derivation of $\hat{\beta}_G$

In the classical linear regression model, the OLS estimator is found as the solution to:

$$\min_{\beta} S = (y - x\beta)'(y - x\beta)$$

This is equivalent to:

$$\min_{\beta} \bar{S} = (y - x\beta)'\left(\sigma^2 I_n\right)^{-1}(y - x\beta)$$

With non-spherical disturbances, consider:

$$\min_{\beta} S_G = (y - x\beta)'\Omega^{-1}(y - x\beta)$$

$$\left.\frac{\partial S_G}{\partial \beta}\right|_{\hat{\beta}_G} = -2x'\Omega^{-1}y + 2x'\Omega^{-1}x\hat{\beta}_G = 0$$

$$\Rightarrow\; x'\Omega^{-1}x\hat{\beta}_G = x'\Omega^{-1}y \quad \text{or} \quad \hat{\beta}_G = \left(x'\Omega^{-1}x\right)^{-1}x'\Omega^{-1}y$$

This is equivalent to maximizing the likelihood function under the assumption that u is normal.

Under the assumption of normality, we can use the usual F statistic to test $H_0: R\beta = r$, given by:

$$F = \frac{\left(R\hat{\beta}_G - r\right)'\left[R\left(x'\Omega^{-1}x\right)^{-1}R'\right]^{-1}\left(R\hat{\beta}_G - r\right)}{qS^2} \sim F(q, T-k)$$

$$\text{where } S^2 = \frac{\left(y - x\hat{\beta}_G\right)'\Omega^{-1}\left(y - x\hat{\beta}_G\right)}{T-k} = \frac{\hat{u}_G'\Omega^{-1}\hat{u}_G}{T-k}$$
The following assumptions are called regularity conditions:

i). $\dfrac{\partial}{\partial\theta}\log f(x;\theta)$ exists for all x and θ.

ii). $\dfrac{\partial}{\partial\theta}\displaystyle\int\cdots\int \prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n = \displaystyle\int\cdots\int \dfrac{\partial}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n$

iii). $\dfrac{\partial}{\partial\theta}\displaystyle\int\cdots\int t(x_1,\ldots,x_n)\prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n = \displaystyle\int\cdots\int t(x_1,\ldots,x_n)\dfrac{\partial}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)\,dx_1\cdots dx_n$

iv). $0 < E_\theta\left[\left(\dfrac{\partial}{\partial\theta}\log f(x;\theta)\right)^2\right] < \infty$ for all $\theta \in \Theta$.
