
Zellner’s Seemingly Unrelated Regressions Model

James L. Powell
Department of Economics
University of California, Berkeley

Overview
The seemingly unrelated regressions (SUR) model, proposed by Zellner, can be viewed as a special case of the generalized regression model $E(y) = X\beta$, $V(y) = \sigma^2\Omega$; however, it does not share all of the features or problems of other leading special cases (e.g., models of heteroskedasticity or serial correlation). While, like those models, the matrix $\Omega$ generally involves unknown parameters which must be estimated, the usual estimators for the covariance matrix of the least squares estimator $\hat\beta_{LS}$ remain valid, so the usual inference procedures based on normal theory apply if the dependent variable $y$ is multinormal or if the sample size $N$ is large and suitable limit theorems are applicable. Also, unlike those other models, there is little reason to test the null hypothesis $H_0: \Omega = I$; the form of $\Omega$ is straightforward and its parameters are easy to estimate consistently, so a feasible version of Aitken's GLS estimator is an attractive alternative to the asymptotically inefficient LS estimator.


The basic SUR model assumes that, for each individual observation $i$, there are $M$ dependent variables $y_{i1}, \dots, y_{ij}, \dots, y_{iM}$ available, each with its own linear regression model:
\[
y_{ij} = x_{ij}'\beta_j + \varepsilon_{ij}, \qquad i = 1, \dots, N,
\]
or, with the usual stacking of observations over $i$,
\[
y_j = X_j\beta_j + \varepsilon_j
\]
for $j = 1, \dots, M$, where $y_j$ and $\varepsilon_j$ are $N$-vectors and $X_j$ is an $N \times K_j$ matrix, where
\[
K_j = \dim(\beta_j)
\]
is the number of regressors for the $j$th regression.


The standard conditions for the classical regression model are assumed to hold for each $j$; namely,
\[
E(y_j) = X_j\beta_j, \qquad V(y_j) = \sigma_{jj} I_N,
\]

with $X_j$ nonstochastic and $\mathrm{rank}(X_j) = K_j$. Under these conditions, and the additional condition of multinormality of $y_j$, the usual inference theory is valid for the classical LS estimator of $\beta_j$, applied separately to each equation.
However, the SUR model permits nonzero covariance between the error terms $\varepsilon_{ij}$ and $\varepsilon_{ik}$ for a given individual $i$ across equations $j$ and $k$, i.e.,
\[
\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) = \sigma_{jk},
\]
while assuming
\[
\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{i'k}) = 0
\]
if $i \neq i'$. This can be expressed more compactly in matrix form:
\[
C(\varepsilon_j, \varepsilon_k) = \sigma_{jk} I_N.
\]
It is the potential nonzero covariance across equations $j$ and $k$ that allows for an improvement in efficiency of GLS relative to the classical LS estimator of each $\beta_j$.
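As a concrete illustration of this error structure, the following is a minimal simulation sketch in Python/NumPy; the coefficient values, regressors, and the matrix Sigma are arbitrary illustrative choices, not taken from the text. It generates a two-equation system in which the errors for the same individual are correlated across equations.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 500                                  # observations per equation
    beta1 = np.array([1.0, 0.5])             # coefficients for equation 1 (arbitrary)
    beta2 = np.array([-0.3, 2.0])            # coefficients for equation 2 (arbitrary)

    # Different regressors in each equation -- the "seemingly unrelated" feature
    X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
    X2 = np.column_stack([np.ones(N), rng.normal(size=N)])

    # Cross-equation error covariance matrix Sigma, with sigma_12 != 0
    Sigma = np.array([[1.0, 0.8],
                      [0.8, 2.0]])
    eps = rng.multivariate_normal(np.zeros(2), Sigma, size=N)   # row i = (eps_i1, eps_i2)

    y1 = X1 @ beta1 + eps[:, 0]
    y2 = X2 @ beta2 + eps[:, 1]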

Kronecker Product Notation


Zellner’s insight was that, like the usual stacking of the individual dependent variables $y_{ij}$ into an $N$-vector $y_j$, those latter vectors can themselves be stacked into an $MN$-dimensional vector $y$, with a corresponding arrangement for the error terms, coefficient vectors, and regressors:
\[
\underset{(MN\times 1)}{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{pmatrix}, \qquad
\underset{(MN\times 1)}{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_M \end{pmatrix}, \qquad
\underset{(K\times 1)}{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{pmatrix},
\]
and
\[
\underset{(MN\times K)}{X} = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & X_M \end{pmatrix},
\]
with
\[
K \equiv \sum_{j=1}^{M} K_j.
\]

With this notation, and the individual assumptions for each equation $j$, it follows that
\[
E(y) = X\beta
\]
and
\[
\underset{(MN\times MN)}{V(y)} = \begin{pmatrix} \sigma_{11} I_N & \sigma_{12} I_N & \cdots & \sigma_{1M} I_N \\ \sigma_{21} I_N & \sigma_{22} I_N & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1} I_N & \cdots & \cdots & \sigma_{MM} I_N \end{pmatrix}.
\]
This nonscalar covariance matrix is a particular mixture of the matrix
\[
\underset{(M\times M)}{\Sigma} \equiv \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\ \sigma_{21} & \sigma_{22} & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{M1} & \cdots & \cdots & \sigma_{MM} \end{pmatrix}
\]
and the $(N \times N)$ identity matrix $I_N$. A notation system for such combinations was proposed by the otherwise-despicable mathematician Kronecker, the so-called Kronecker product notation; for two matrices $A \equiv [a_{ij}]$ $(i = 1, \dots, L,\ j = 1, \dots, M)$ and $B$, the Kronecker product of $A$ and $B$ is defined as
\[
A \otimes B \equiv \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1M}B \\ a_{21}B & a_{22}B & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ a_{L1}B & \cdots & \cdots & a_{LM}B \end{pmatrix}.
\]

With this notation, clearly
\[
V(y) = \Sigma \otimes I_N
\]
for the stacked SUR model.


Kronecker products satisfy a mixed-product rule, which will come in handy later:
\[
(A \otimes B)(C \otimes D) = AC \otimes BD,
\]
assuming all matrix products are well defined. From this rule follows another for inverses of Kronecker products:
\[
(A \otimes B)^{-1} = A^{-1} \otimes B^{-1},
\]
assuming both $A$ and $B$ are invertible.
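Both rules are easy to check numerically; the short NumPy sketch below (illustrative only) verifies them for small random matrices using np.kron.

    import numpy as np

    rng = np.random.default_rng(1)
    A, C = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
    B, D = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

    # Product rule: (A (x) B)(C (x) D) = AC (x) BD
    assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

    # Inverse rule: (A (x) B)^{-1} = A^{-1} (x) B^{-1}
    assert np.allclose(np.linalg.inv(np.kron(A, B)),
                       np.kron(np.linalg.inv(A), np.linalg.inv(B)))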

Least Squares and Generalized Least Squares
With the foregoing notation, the classical least squares estimator for the vector $\beta$ can be expressed as
\[
\hat\beta_{LS} = (X'X)^{-1}X'y = \begin{pmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \\ \vdots \\ (X_M'X_M)^{-1}X_M'y_M \end{pmatrix}.
\]
In contrast, the GLS estimator of $\beta$ (assuming $\Sigma$ is known) is
\[
\begin{aligned}
\hat\beta_{GLS} &= \left(X'(\Sigma \otimes I_N)^{-1}X\right)^{-1} X'(\Sigma \otimes I_N)^{-1}y \\
&= \left(X'\left(\Sigma^{-1} \otimes I_N\right)X\right)^{-1} X'\left(\Sigma^{-1} \otimes I_N\right)y \\
&= \begin{pmatrix} \sigma^{11}(X_1'X_1) & \sigma^{12}(X_1'X_2) & \cdots & \sigma^{1M}(X_1'X_M) \\ \sigma^{21}(X_2'X_1) & \sigma^{22}(X_2'X_2) & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^{M1}(X_M'X_1) & \cdots & \cdots & \sigma^{MM}(X_M'X_M) \end{pmatrix}^{-1}
\begin{pmatrix} X_1'\left(\sum_j \sigma^{1j}y_j\right) \\ X_2'\left(\sum_j \sigma^{2j}y_j\right) \\ \vdots \\ X_M'\left(\sum_j \sigma^{Mj}y_j\right) \end{pmatrix},
\end{aligned}
\]
where $\sigma^{ij}$ is defined to be the element in the $i$th row and $j$th column of $\Sigma^{-1}$, i.e., $\Sigma^{-1} \equiv [\sigma^{ij}]$.
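For a sense of how these formulas translate into computation, here is a minimal NumPy sketch (re-using the illustrative two-equation design from the earlier sketch; all numerical choices are arbitrary) that computes the equation-by-equation LS estimator and the GLS estimator with $\Sigma$ treated as known, via the stacked Kronecker-product representation.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 500
    X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
    X2 = np.column_stack([np.ones(N), rng.normal(size=N)])
    Sigma = np.array([[1.0, 0.8],
                      [0.8, 2.0]])                 # cross-equation error covariance
    eps = rng.multivariate_normal(np.zeros(2), Sigma, size=N)
    y1 = X1 @ np.array([1.0, 0.5]) + eps[:, 0]
    y2 = X2 @ np.array([-0.3, 2.0]) + eps[:, 1]

    # Equation-by-equation least squares: b_j = (X_j'X_j)^{-1} X_j'y_j
    b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
    b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)

    # Stacked system: y is (MN x 1), X is block-diagonal (MN x K)
    y = np.concatenate([y1, y2])
    X = np.zeros((2 * N, 4))
    X[:N, :2], X[N:, 2:] = X1, X2

    # GLS with Sigma known: (X'(Sigma^{-1} (x) I_N) X)^{-1} X'(Sigma^{-1} (x) I_N) y
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(N))
    beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)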
To get a better idea of what is going on with the GLS estimator, consider the special case $M = 2$, with
\[
\hat\beta_{LS} \equiv \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \quad\text{and}\quad \hat\beta_{GLS} \equiv \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix};
\]
then it can be shown that the GLS estimators $\hat\beta_1$ and $\hat\beta_2$ satisfy the two equations
\[
\hat\beta_1 = b_1 - \left(\frac{\sigma_{21}}{\sigma_{22}}\right)(X_1'X_1)^{-1}X_1'\left(y_2 - X_2\hat\beta_2\right),
\]
\[
\hat\beta_2 = b_2 - \left(\frac{\sigma_{12}}{\sigma_{11}}\right)(X_2'X_2)^{-1}X_2'\left(y_1 - X_1\hat\beta_1\right).
\]

Thus the GLS estimators can be viewed as “adjusted” versions of classical LS, where the adjustment for each equation involves a regression of the GLS residuals from the other equation on that equation’s regressors. As noted by Telser (JASA, 1964), the GLS estimator for this model can be calculated iteratively by including appropriately-reweighted residuals from the other equations as additional regressors in each equation.
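To see where these expressions come from, write out the GLS normal equations $X'(\Sigma^{-1} \otimes I_N)(y - X\hat\beta_{GLS}) = 0$ block by block for $M = 2$, with $\sigma^{jk}$ again denoting the elements of $\Sigma^{-1}$:
\[
\sigma^{11}X_1'\left(y_1 - X_1\hat\beta_1\right) + \sigma^{12}X_1'\left(y_2 - X_2\hat\beta_2\right) = 0,
\]
\[
\sigma^{21}X_2'\left(y_1 - X_1\hat\beta_1\right) + \sigma^{22}X_2'\left(y_2 - X_2\hat\beta_2\right) = 0.
\]
Solving the first equation for $\hat\beta_1$ gives $\hat\beta_1 = b_1 + (\sigma^{12}/\sigma^{11})(X_1'X_1)^{-1}X_1'(y_2 - X_2\hat\beta_2)$, and since $\sigma^{12}/\sigma^{11} = -\sigma_{12}/\sigma_{22}$ when $M = 2$, this is the first displayed equation; the second follows by symmetry.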

Another important special case is when the matrix $\Sigma$ is diagonal, i.e., $\sigma_{ij} = 0$ if $i \neq j$. In this case, since $\Sigma^{-1} = \mathrm{diag}[1/\sigma_{ii}]$, it follows that
\[
\begin{aligned}
\hat\beta_{GLS} &= \begin{pmatrix} \frac{1}{\sigma_{11}}(X_1'X_1) & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_{22}}(X_2'X_2) & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & \frac{1}{\sigma_{MM}}(X_M'X_M) \end{pmatrix}^{-1}
\begin{pmatrix} \frac{1}{\sigma_{11}}X_1'y_1 \\ \frac{1}{\sigma_{22}}X_2'y_2 \\ \vdots \\ \frac{1}{\sigma_{MM}}X_M'y_M \end{pmatrix} \\
&= \begin{pmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \\ \vdots \\ (X_M'X_M)^{-1}X_M'y_M \end{pmatrix} \\
&\equiv \hat\beta_{LS}.
\end{aligned}
\]

Not surprisingly, then, if there is no covariance across equations in the error terms, there is no prospect
for an efficiency improvement in the GLS estimator relative to LS, applied equation by equation.
Still another important special case is when the matrix of regressors is identical for each equation, i.e., $X_j \equiv X_0$ for some $N \times K^*$ matrix $X_0$, with $K^* = K/M$. Here the stacked matrix $X$ takes the form
\[
\underset{(MN\times K)}{X} = \begin{pmatrix} X_0 & 0 & \cdots & 0 \\ 0 & X_0 & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & \cdots & 0 & X_0 \end{pmatrix} = (I_M \otimes X_0),
\]
and the GLS estimator also reduces to classical LS:
\[
\begin{aligned}
\hat\beta_{GLS} &= \left(X'(\Sigma \otimes I_N)^{-1}X\right)^{-1} X'(\Sigma \otimes I_N)^{-1}y \\
&= \left((I_M \otimes X_0)'\left(\Sigma^{-1} \otimes I_N\right)(I_M \otimes X_0)\right)^{-1}(I_M \otimes X_0)'\left(\Sigma^{-1} \otimes I_N\right)y \\
&= \left(\Sigma^{-1} \otimes X_0'X_0\right)^{-1}\left(\Sigma^{-1} \otimes X_0'\right)y \\
&= \left(I_M \otimes (X_0'X_0)^{-1}X_0'\right)\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{pmatrix} \\
&= \begin{pmatrix} (X_0'X_0)^{-1}X_0'y_1 \\ (X_0'X_0)^{-1}X_0'y_2 \\ \vdots \\ (X_0'X_0)^{-1}X_0'y_M \end{pmatrix} \\
&= \hat\beta_{LS}.
\end{aligned}
\]

Some intuition for this reduction can be obtained by considering Telser’s result that GLS can be obtained iteratively, by starting from classical LS estimators and including the residuals from other equations as regressors for each equation. Since the LS residuals are, by construction, orthogonal to the common matrix of regressors $X_0$, their inclusion in each equation will not affect the LS estimates of the $\beta_j$ coefficients. An extension of this argument implies that, if each matrix of regressors $X_j$ has a submatrix $X_0$ in common (for example, if all equations have an intercept term), then the GLS coefficients corresponding to those common regressors $X_0$ will be identical to their LS counterparts.
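The reduction with identical regressor matrices is easy to confirm numerically; the sketch below (again purely illustrative, with arbitrary numerical choices) builds a two-equation system with a common regressor matrix $X_0$ and checks that the GLS and equation-by-equation LS estimates coincide.

    import numpy as np

    rng = np.random.default_rng(1)
    N, M = 200, 2
    X0 = np.column_stack([np.ones(N), rng.normal(size=N)])      # common regressor matrix
    Sigma = np.array([[1.0, 0.8],
                      [0.8, 2.0]])
    eps = rng.multivariate_normal(np.zeros(M), Sigma, size=N)
    Y = np.column_stack([X0 @ np.array([1.0, 0.5]) + eps[:, 0],
                         X0 @ np.array([-0.3, 2.0]) + eps[:, 1]])

    # Stacked system with X = I_M (x) X0
    y = Y.T.reshape(-1)                       # (y1', y2')' stacked into one MN-vector
    X = np.kron(np.eye(M), X0)
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(N))

    beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
    beta_ols = np.concatenate([np.linalg.solve(X0.T @ X0, X0.T @ Y[:, j]) for j in range(M)])
    assert np.allclose(beta_gls, beta_ols)    # GLS coincides with equation-by-equation LS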

Feasible GLS
An obvious estimator of the unknown covariance matrix $\Sigma = [\sigma_{jk}]$ would be $\hat\Sigma \equiv [\hat\sigma_{jk}]$, with
\[
\hat\sigma_{jk} \equiv \frac{1}{N}\left(y_j - X_j\hat\beta_j\right)'\left(y_k - X_k\hat\beta_k\right);
\]
while these estimators are not unbiased for $\sigma_{jk}$, they are consistent under the usual conditions, and obtaining unbiased estimators for $\sigma_{jk}$ when $j \neq k$ involves more than a simple “degrees of freedom” adjustment.
Again imposing reasonable regularity conditions, it can be shown that the feasible GLS estimator
\[
\hat\beta_{FGLS} = \left(X'\left(\hat\Sigma^{-1} \otimes I_N\right)X\right)^{-1} X'\left(\hat\Sigma^{-1} \otimes I_N\right)y
\]
is asymptotically equivalent to the infeasible GLS estimator which assumes $\Sigma$ is known:
\[
\sqrt{N}\left(\hat\beta_{FGLS} - \hat\beta_{GLS}\right) \overset{p}{\to} 0.
\]
Thus
\[
\sqrt{N}\left(\hat\beta_{FGLS} - \beta\right) \overset{d}{\to} N(0, V),
\]
where
\[
V = \mathrm{plim}\left(\frac{1}{N}X'\left(\Sigma^{-1} \otimes I_N\right)X\right)^{-1} = \mathrm{plim}\left(\frac{1}{N}X'\left(\hat\Sigma^{-1} \otimes I_N\right)X\right)^{-1} \equiv \mathrm{plim}\ \hat V,
\]

so inference on the parameter vector $\beta$ can be carried out using the approximate normality of $\hat\beta_{FGLS}$:
\[
\hat\beta_{FGLS} \overset{A}{\sim} N\left(\beta, \hat V/N\right).
\]
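In practice, the feasible GLS (or two-step Zellner) estimator is computed by running equation-by-equation LS, forming $\hat\Sigma$ from the residuals, and then applying the GLS formula with $\hat\Sigma$ in place of $\Sigma$. The following NumPy sketch (illustrative only; the design and numerical values are arbitrary) carries out both steps and forms the estimated covariance matrix $\hat V/N = \left(X'(\hat\Sigma^{-1}\otimes I_N)X\right)^{-1}$ used for approximate normal inference.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 500
    X1 = np.column_stack([np.ones(N), rng.normal(size=N)])
    X2 = np.column_stack([np.ones(N), rng.normal(size=N)])
    Sigma_true = np.array([[1.0, 0.8],
                           [0.8, 2.0]])
    eps = rng.multivariate_normal(np.zeros(2), Sigma_true, size=N)
    y1 = X1 @ np.array([1.0, 0.5]) + eps[:, 0]
    y2 = X2 @ np.array([-0.3, 2.0]) + eps[:, 1]

    # Step 1: equation-by-equation LS residuals and Sigma_hat = [sigma_hat_jk]
    b1 = np.linalg.solve(X1.T @ X1, X1.T @ y1)
    b2 = np.linalg.solve(X2.T @ X2, X2.T @ y2)
    E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])   # N x M matrix of residuals
    Sigma_hat = E.T @ E / N                              # sigma_hat_jk = e_j'e_k / N

    # Step 2: feasible GLS on the stacked system
    y = np.concatenate([y1, y2])
    X = np.zeros((2 * N, 4))
    X[:N, :2], X[N:, 2:] = X1, X2
    Omega_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(N))
    XtOX = X.T @ Omega_inv @ X
    beta_fgls = np.linalg.solve(XtOX, X.T @ Omega_inv @ y)

    # Estimated covariance matrix V_hat / N = (X'(Sigma_hat^{-1} (x) I_N) X)^{-1}
    cov_fgls = np.linalg.inv(XtOX)
    std_err = np.sqrt(np.diag(cov_fgls))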
