Addis Ababa University

College of Business and Economics


Department of Economics
Econ 654: Applied Econometrics
3. Panel Data Modelling

Fantu Guta Chemrie (PhD)


F. Guta (CoBE) CoBE
Econ 654 Dec, 2023 1 / 211
3. Panel Data Modelling
3.1 Introduction to Panel Data Modelling
3.1.1 Defining Panel Data Models
3.1.2 Reasons for using Panel Data
3.1.3 Limitations of Panel Data

3.2 Panel Data Models
3.2.1 Pooled Regression
3.2.2 Fixed Effects Model
3.2.3 Random Effects Model
3.2.4 Relationship between Total, Within and Between Effects
3.2.5 Hausman’s Specification Test for the Random Effects Model
3.2.6 Post Estimation Tests
3.3 Endogeneity in Static Panel Data Models
3.3.1 Within Group Instrumental Variables
3.3.2 Hausman-Taylor’s IV Estimators
3.3.3 Mundlak (1978) Approach
3.3.4 Chamberlain (1982, 1984) Approach

3.4 Dynamic Panel Data Model
3.4.1 Introduction to Dynamic Panel Data Model
3.4.2 Estimation of Dynamic Panel Data Model
3.4.3 Dynamic Models with Exogenous Variables
3.4.4 Weak Instruments (Blundell and Bond) and Other Issues
3.5 Models with Limited Dependent Variables
3.5.1 Binary Choice Models
3.5.2 The Fixed Effects Logit Model
3.5.3 The Random Effects Probit Model
3.5.4 Tobit Models
3.5.5 Dynamics and the Problem of Initial Conditions


3.1 Introduction to Panel Data Modelling

A “panel” data set consists of $n$ sets of observations on individuals, denoted by $i = 1, \ldots, n$.
If each individual in the data set is observed the same number of times, usually denoted by $T$, the data set is a balanced panel.
An unbalanced panel data set is one in which individuals may be observed different numbers of times. We denote this number by $T_i$.

A fixed panel is one in which the same set of individuals is observed for the duration of the study.
A rotating panel is one in which the cast of individuals changes from one period to the next.
The availability of repeated observations on the same units allows us to specify and estimate more complicated and more realistic models than a single cross-section or a single time series would allow.
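As a small illustration of the balanced/unbalanced distinction, the sketch below counts $T_i$ for each individual from a list of unit identifiers (one entry per observation). The function name `panel_balance` and the toy identifiers are hypothetical. It only compares the number of observations per individual; it does not check that the same calendar periods are covered.

```python
from collections import Counter


def panel_balance(ids):
    """Return (is_balanced, T_i per individual) given one unit id per observation."""
    counts = Counter(ids)  # T_i: number of periods in which individual i is observed
    is_balanced = len(counts) > 0 and len(set(counts.values())) == 1
    return is_balanced, dict(counts)


# Hypothetical toy panels: each entry is the id of the individual behind one (i, t) row
balanced = [1, 1, 1, 2, 2, 2, 3, 3, 3]
unbalanced = [1, 1, 1, 2, 2, 3, 3, 3]
print(panel_balance(balanced))    # (True, {1: 3, 2: 3, 3: 3})
print(panel_balance(unbalanced))  # (False, {1: 3, 2: 2, 3: 3})
```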
3.1.1 Defining Panel Data Models

Index all variables by an $i$ for the individual ($i = 1, \ldots, n$) and a $t$ for the time period ($t = 1, \ldots, T$).
In very general terms, we could specify a linear model as
$$y_{it} = x_{it}'\beta_{it} + \varepsilon_{it},$$
where $\beta_{it}$ measures the partial effects of $x_{it}$ in period $t$ for unit $i$.


This model is much too general to be useful, and we need to put more structure on the coefficients $\beta_{it}$.
The standard assumption, used in many empirical cases, is that $\beta_{it}$ is constant for all $i$ and $t$, except possibly the intercept term. This could be written as
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \qquad (3.1.1)$$
where $x_{it}$ is a $K$-dimensional vector of explanatory variables, not including a constant.


This means that the effects of a change in $x$ are the same for all units and all periods, but that the average level for unit $i$ may be different from that for unit $j$.
The $\alpha_i$ capture the effects of those variables that are specific to the $i$th individual and that are constant over time.
The fixed¹ effects approach takes $\alpha_i$ to be a group-specific constant term in the regression model.

¹ Note that the term “fixed” as used here signifies the correlation of $\alpha_i$ and $x_{it}$, not that $\alpha_i$ is nonstochastic.
In the standard case, we assume $\varepsilon_{it} \sim \text{i.i.d.}(0, \sigma^2_\varepsilon)$.
If we treat the $\alpha_i$ as $n$ fixed unknown parameters, the model in (3.1.1) is referred to as the standard fixed effects model.
Alternatively, one may assume that the intercepts of the individuals are different but that they can be treated as random drawings from a distribution with mean $\alpha$ and variance $\sigma^2_\alpha$.
The essential assumption here is that these drawings are independent of the explanatory variables in $x_{it}$, i.e., $\text{cov}(\alpha_i, x_{it}) = 0$.

This leads to the random effects model, where the individual effects $\alpha_i$ are treated as random.
The random effects approach takes $\alpha_i$ as a group-specific random element, similar to $\varepsilon_{it}$ except that for each group there is but a single draw that enters the regression identically in each period.
The error term in this model consists of two parts: a time-invariant component² $\alpha_i$ and a remainder component $\varepsilon_{it}$ that is uncorrelated over time³.

If $\text{cov}(\alpha_i, x_{it}) = 0$, then the model may be formulated as
$$y_{it} = \alpha + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad (3.1.2)$$
where $\alpha$ denotes the intercept term.

² In the random effects model, the $\alpha_i$’s are redefined to have a zero mean.
³ The model is sometimes referred to as a (one-way) error components model.
3.1.2 Reasons for using Panel Data

1). Panel data give more information, more variability, less collinearity among variables, more degrees of freedom and more efficiency.
2). Panel data account for individual heterogeneity:
Individual heterogeneity refers to the fact that individual units (e.g. firms/countries) are different.
For example, different firms have different managerial skills and access to technology.


3). Panel data are ideal for the study of dynamics or change because they follow the same individual units through time.
4). Panel data can better detect and measure effects that are unobservable in cross-section (CS) or time-series (TS) data, e.g.
Effect of minimum wage on employment and income
Impact of social interventions on poverty.
5). Panel data enable the study of more complicated behavioural models, e.g.
Tracking technological change, economies of scale, etc.
6). Panel data minimize the bias that results from using broad aggregates.

Two main reasons for using panel data are:

1). Efficiency of Parameter Estimators:
Because panel data sets are larger than cross-sectional or time series data sets, and explanatory variables vary over two dimensions (individuals and time) rather than one, estimators based on panel data are quite often more accurate than estimators based on other sources.
Even with identical sample sizes, the use of a panel data set will often yield more efficient estimators than a series of independent cross-sections (where different units are sampled in each period).

2). Identification of Parameters: The other advantage of the availability of panel data is that it reduces identification problems.
Although this advantage may come under different headings, in many cases it involves identification in the presence of endogenous regressors or measurement error, robustness to omitted variables and the identification of individual dynamics.

3.1.3 Limitations of Panel Data


1). Design and data collection problems:
Incomplete coverage of population of interest
Non-response due to lack of cooperation of the
interviewees and frequency of interviewing.
Faulty response due to memory errors.



2). Short time-series dimension:
Because of cost reasons, typical panels involve annual data covering a short time period for each individual.

3). Attrition:
For one reason or another, subjects of the panel drop out over time, so there are fewer subjects left for subsequent surveys.

4). Because we repeatedly observe the same units, it is usually no longer appropriate to assume that different observations are independent.
This may complicate the analysis, particularly in nonlinear and dynamic models.

3.2 Panel Data Models

In this subsection we discuss the static linear model in a panel data setting.
After a first look at pooled regression, we discuss two basic models, the fixed effects and the random effects model, and then the choice between the two.

3.2.1 Pooled Regression
Suppose we want to estimate a wage equation using the panel data set.
A wage equation describes the relationship between the wage and its determinants, such as the level of education and experience.
We specify a wage equation as:
$$\text{wage}_{it} = \beta_1 + \beta_2\,\text{school}_{it} + \beta_3\,\text{exper}_{it} + \beta_4\,\text{exper}^2_{it} + \beta_5\,\text{union}_i + \beta_6\,\text{married}_i + \beta_7\,\text{black}_i + \varepsilon_{it}$$
where $i$ denotes individual workers and $t$ time.
In the pooled regression case we simply apply OLS to the pooled data:

. reg wage schooling exper exper2 union married black

      Source |       SS       df       MS      Number of obs =    4360
                                               F(  6,  4353) =  166.30
       Model |  230.588267     6  38.4313779   Prob > F      =  0.0000
    Residual |  1005.94137  4353  .231091517   R-squared     =  0.1865
                                               Adj R-squared =  0.1854
       Total |  1236.52964  4359  .283672779   Root MSE      =  .48072

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
   schooling |   .0987582   .0046023    21.46   0.000     .0897353     .107781
       exper |   .0889479   .0101059     8.80   0.000     .0691352    .1087606
      exper2 |  -.0028299   .0007069    -4.00   0.000    -.0042158   -.0014441
       union |   .1807191   .0170982    10.57   0.000     .1471979    .2142403
     married |   .1075152   .0156944     6.85   0.000     .0767461    .1382843
       black |   -.146868   .0232142    -6.33   0.000    -.1923797   -.1013563
       _cons |  -.0240287   .0629953    -0.38   0.703    -.1475315    .0994741

An increase in the level of schooling by one year increases the hourly wage by US$ 0.0988, holding the other regressors constant.
When experience changes from 9 to 10 years, experience squared changes from 81 to 100, so the total effect is given by
$$0.0889479 \times (10 - 9) - 0.0028299 \times (100 - 81) = 0.0889479 - 19 \times 0.0028299 \approx 0.035180$$
Hence an increase in experience from 9 to 10 years results in a US$ 0.035 increase in hourly wage rates.
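The same arithmetic as a small function, using the coefficients from the pooled OLS output above; `wage_change` is a hypothetical helper name, not part of any package.

```python
# Coefficients copied from the pooled OLS output above
b_exper = 0.0889479
b_exper2 = -0.0028299


def wage_change(exper_from, exper_to):
    """Predicted change in wage when experience moves between two values, other regressors held fixed."""
    return b_exper * (exper_to - exper_from) + b_exper2 * (exper_to ** 2 - exper_from ** 2)


print(round(wage_change(9, 10), 6))  # 0.03518
```

For a quadratic, the discrete change from 9 to 10 years equals the derivative $\beta_3 + 2\beta_4\,\text{exper}$ evaluated at the midpoint 9.5, which is why both give about 0.0352.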
Pooled regression does not make full use of the richness of the panel data.
The pooled regression model ignores time effects and individual heterogeneity, and this might lead to wrong conclusions.

Testing for time effects

It is advisable to check whether pooling is appropriate:
If there are no time effects, then the pooled regression is appropriate.
If there are time effects, then we need to control for them, and thus the pooled regression isn’t sufficient.
Three simple steps in testing for time effects

1). Add time dummies to the pooled regression model.
The sample covers the years 1980 to 1987, so we include 7 year dummies (from 1981 to 1987) to avoid the dummy variable trap.
2). Estimate the model using OLS.
3). Test whether the time dummies are jointly equal to zero:
a). If we cannot reject the null hypothesis that they are jointly equal to zero, we conclude that there are no time effects (the pooled regression model is appropriate).
b). If we reject the null, the pooled model is inappropriate.
With Stata, you can perform steps 1 and 2 in one go as follows:

. reg wage schooling exper exper2 union married black i.yr

      Source |       SS       df       MS      Number of obs =    4360
                                               F( 13,  4346) =   77.99
       Model |  233.911386    13  17.9931835   Prob > F      =  0.0000
    Residual |  1002.61826  4346  .230699093   R-squared     =  0.1892
                                               Adj R-squared =  0.1867
       Total |  1236.52964  4359  .283672779   Root MSE      =  .48031

        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
   schooling |   .0907146   .0051718    17.54   0.000     .0805752    .1008539
       exper |   .0669274   .0136884     4.89   0.000     .0400911    .0937636
      exper2 |  -.0023875   .0008193    -2.91   0.004    -.0039938   -.0007812
       union |   .1831315   .0171339    10.69   0.000     .1495403    .2167227
     married |    .108094   .0156873     6.89   0.000     .0773389    .1388492
       black |  -.1423222   .0232352    -6.13   0.000     -.187875   -.0967694
             |
          yr |
        1981 |   .0584744   .0303516     1.93   0.054    -.0010301    .1179789
        1982 |   .0630235   .0332109     1.90   0.058    -.0020869    .1281339
        1983 |   .0623225   .0366562     1.70   0.089    -.0095424    .1341873
        1984 |   .0907743   .0400868     2.26   0.024     .0121836    .1693649
        1985 |   .1095214    .043349     2.53   0.012     .0245353    .1945075
        1986 |   .1421435   .0464202     3.06   0.002     .0511363    .2331507
        1987 |   .1738353   .0494307     3.52   0.000     .0769258    .2707447
             |
       _cons |   .1028863   .0769932     1.34   0.182    -.0480596    .2538322


Seven year dummies are included, with 1980 as the base (reference) year.
Testing for time effects:

. testparm i.yr

 ( 1)  1981.yr = 0
 ( 2)  1982.yr = 0
 ( 3)  1983.yr = 0
 ( 4)  1984.yr = 0
 ( 5)  1985.yr = 0
 ( 6)  1986.yr = 0
 ( 7)  1987.yr = 0

       F(  7,  4346) =    2.06
            Prob > F =    0.0447

The p-value is 0.0447 < 0.05, so we reject the null hypothesis of no time effects at the 5% level.
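Under the classical OLS assumptions, the Wald F statistic reported by testparm equals the familiar restricted-versus-unrestricted RSS form, so it can be reproduced from the two regressions above (restricted: without year dummies; unrestricted: with them). A quick check, with a hypothetical helper name:

```python
def f_from_rss(rss_restricted, rss_unrestricted, q, df_resid):
    """F statistic for q linear restrictions: ((RSS_R - RSS_U) / q) / (RSS_U / df_resid)."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)


# Residual SS from the regressions without (restricted) and with (unrestricted) year dummies
F = f_from_rss(1005.94137, 1002.61826, q=7, df_resid=4346)
print(round(F, 2))  # 2.06
```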
Robust Standard Errors: correcting for heteroscedasticity and serial correlation

The simplest and most widely used approach is to compute standard errors of the regression coefficients that are robust to heteroscedasticity and serial correlation.
Such standard errors can be used to test hypotheses and construct confidence intervals.
Robust standard errors of this kind are also known as “clustered standard errors” because serial correlation within each panel is referred to as clustering.
. reg wage schooling exper exper2 union married black, robust cluster(nr)

Linear regression                               Number of obs =    4360
                                                F(  6,   544) =   68.07
                                                Prob > F      =  0.0000
                                                R-squared     =  0.1865
                                                Root MSE      =  .48072

                          (Std. Err. adjusted for 545 clusters in nr)
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
   schooling |   .0987582   .0089413    11.05   0.000     .0811944     .116322
       exper |   .0889479   .0124591     7.14   0.000     .0644741    .1134217
      exper2 |  -.0028299   .0008731    -3.24   0.001    -.0045449    -.001115
       union |   .1807191    .027687     6.53   0.000     .1263326    .2351056
     married |   .1075152   .0260666     4.12   0.000     .0563117    .1587186
       black |   -.146868   .0492885    -2.98   0.003    -.2436872   -.0500489
       _cons |  -.0240287   .1143511    -0.21   0.834    -.2486525    .2005951

The coefficients are identical to the pooled OLS estimates, but the clustered standard errors are considerably larger (for schooling, .0089 instead of .0046).
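To make the clustering idea concrete, here is a minimal sketch of the cluster-robust (sandwich) variance for the slope in a regression of y on a single x with an intercept. By the Frisch–Waugh–Lovell theorem the slope depends only on the demeaned regressor, so the scores are $(x_i - \bar x)\hat e_i$, summed within each cluster before squaring. Unlike Stata, the sketch applies no finite-sample scaling (Stata multiplies by $\frac{G}{G-1}\frac{N-1}{N-K}$), so numbers will differ slightly; the function name and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ols_slope_cluster_se(x, y, cluster):
    """Slope of y on x (intercept included) and its cluster-robust standard error.

    Sandwich formula only: no finite-sample scaling is applied.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xd = [xi - xbar for xi in x]          # demeaned regressor (Frisch-Waugh-Lovell)
    sxx = sum(v * v for v in xd)
    b = sum(v * (yi - ybar) for v, yi in zip(xd, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Add up the scores (x_i - xbar) * e_i within each cluster before squaring,
    # so that arbitrary correlation inside a cluster is allowed for
    score_sum = defaultdict(float)
    for v, e, g in zip(xd, resid, cluster):
        score_sum[g] += v * e
    meat = sum(s * s for s in score_sum.values())
    return b, sqrt(meat) / sxx


# Hypothetical data: two individuals (clusters), two periods each
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0, 3.0]
b, se = ols_slope_cluster_se(x, y, cluster=["A", "A", "B", "B"])
print(b, se)
```

With one cluster per observation this reduces to the heteroskedasticity-robust (HC0) standard error; with a single cluster for all observations the scores sum to zero by the OLS normal equations, which is one reason clustered inference needs many clusters.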

3.2.2 Fixed Effects Model

The fixed effects model is a linear regression model in which the intercept terms vary over the individual units $i$, i.e.,
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \qquad \varepsilon_{it} \sim \text{i.i.d.}(0, \sigma^2_\varepsilon) \qquad (3.2.1)$$
where it is assumed that all $x_{it}$ are independent of all $\varepsilon_{it}$.

We can write this in the usual regression framework by including a dummy variable for each unit $i$ in the model, i.e.,
$$y_{it} = \sum_{j=1}^{n} \alpha_j d_{ij} + x_{it}'\beta + \varepsilon_{it}, \qquad (3.2.2)$$
where $d_{ij} = 1$ if $i = j$ and 0 elsewhere.

We thus have a set of $n$ dummy variables in the model.
The parameters $\alpha_1, \ldots, \alpha_n$ and $\beta$ can be estimated by OLS in (3.2.2).
The implied estimator for $\beta$ is referred to as the least squares dummy variable (LSDV) estimator.
It is, however, numerically unattractive to have a regression model with so many regressors.
Fortunately, one can compute the estimator for $\beta$ in a simpler way.
It can be shown that the same estimator for $\beta$ is obtained if the regression is performed in deviations from individual means.
This implies that we eliminate the individual effects $\alpha_i$ first by transforming the data.
To see this, first note that
$$\bar y_i = \alpha_i + \bar x_i'\beta + \bar\varepsilon_i, \qquad (3.2.3)$$
where $\bar y_i = T^{-1}\sum_{t} y_{it}$ and similarly for the other variables.

Consequently, we can write
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i) \qquad (3.2.4)$$
This is a regression model in deviations from individual means and does not include the individual effects (heterogeneity) $\alpha_i$.
The transformation that produces observations in deviation from individual means, as in (3.2.4), is called the within transformation.
The OLS estimator for $\beta$ obtained from this transformed model is often called the within estimator or fixed effects estimator, and it is exactly identical to the LSDV estimator described above.
The fixed effects estimator is given by
$$\hat\beta_{FE} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)'\right)^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(y_{it}-\bar y_i). \qquad (3.2.5)$$
If all $x_{it}$ are assumed to be independent of all $\varepsilon_{it}$, the FE estimator can be shown to be unbiased for $\beta$.


If, in addition, normality of $\varepsilon_{it}$ is imposed, $\hat\beta_{FE}$ also has a normal distribution.
For consistency of $\hat\beta_{FE}$, it is required that
$$E\{(x_{it}-\bar x_i)\varepsilon_{it}\} = 0. \qquad (3.2.6)$$
Sufficient for consistency is that $x_{it}$ is uncorrelated with $\varepsilon_{it}$ and that $\bar x_i$ has no correlation with $\varepsilon_{it}$.
These conditions are in turn implied by
$$E\{x_{it}\varepsilon_{is}\} = 0 \quad \text{for all } s, t, \qquad (3.2.7)$$
in which case we call $x_{it}$ strictly exogenous.
A strictly exogenous variable is not allowed to depend upon current, future or past values of the error term.
The $n$ intercepts (individual effects) are estimated as
$$\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta_{FE}, \qquad i = 1, \ldots, n.$$
Under assumption (3.2.6) these estimators are unbiased for the fixed effects $\alpha_i$, and they are consistent provided $T \to \infty$.
The reason why $\hat\alpha_i$ is inconsistent for fixed $T$ is that when $T$ is fixed the individual averages $\bar y_i$ and $\bar x_i$ do not converge to anything, even if the number of individuals increases.

The covariance matrix for $\hat\beta_{FE}$, assuming that $\varepsilon_{it}$ is i.i.d. across individuals and time with variance $\sigma^2_\varepsilon$, is given by
$$\text{var}(\hat\beta_{FE}) = \sigma^2_\varepsilon\left(\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)'\right)^{-1} \qquad (3.2.8)$$
Unless $T$ is large, using the standard OLS estimate for the covariance matrix based upon the within regression in (3.2.4) will underestimate the true variance.
The reason is that in the transformed regression the error covariance matrix is singular (as $\sum_t (\varepsilon_{it} - \bar\varepsilon_i) = 0$ for each $i$) and the variance of $\varepsilon_{it} - \bar\varepsilon_i$ is $\frac{T-1}{T}\sigma^2_\varepsilon$ rather than $\sigma^2_\varepsilon$.
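The variance claim follows in one line from $\operatorname{cov}(\varepsilon_{it}, \bar\varepsilon_i) = \sigma^2_\varepsilon/T$ and $\operatorname{var}(\bar\varepsilon_i) = \sigma^2_\varepsilon/T$ under the i.i.d. assumption:

```latex
\operatorname{var}(\varepsilon_{it}-\bar\varepsilon_i)
  = \sigma^2_\varepsilon - 2\,\frac{\sigma^2_\varepsilon}{T} + \frac{\sigma^2_\varepsilon}{T}
  = \frac{T-1}{T}\,\sigma^2_\varepsilon ,
\qquad
E\Big\{\sum_{t=1}^{T}(\varepsilon_{it}-\bar\varepsilon_i)^2\Big\} = (T-1)\,\sigma^2_\varepsilon .
```

Summing over the $n$ individuals gives $n(T-1)\sigma^2_\varepsilon$, which motivates dividing the within residual sum of squares by $n(T-1)$ rather than $nT$.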
A consistent estimator for $\sigma^2_\varepsilon$ is given by the within residual sum of squares divided by $n(T-1)$, i.e.,
$$\hat\sigma^2_\varepsilon = \frac{1}{n(T-1)}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(y_{it} - \hat\alpha_i - x_{it}'\hat\beta_{FE}\right)^2 = \frac{1}{n(T-1)}\sum_{i=1}^{n}\sum_{t=1}^{T}\left\{y_{it} - \bar y_i - (x_{it} - \bar x_i)'\hat\beta_{FE}\right\}^2 \qquad (3.2.9)$$
It is possible to apply the usual degrees-of-freedom correction, in which $K$ is subtracted from the denominator.
The FE estimator is asymptotically normal, so that the usual inference procedures can be used (like t and Wald tests).
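A minimal sketch of the within estimator for a single regressor, computing $\hat\beta_{FE}$ from (3.2.5), the intercepts $\hat\alpha_i$, and $\hat\sigma^2_\varepsilon$ from (3.2.9) without the $K$ correction. Dividing by $N - n$ equals $n(T-1)$ in a balanced panel and $\sum_i (T_i - 1)$ in an unbalanced one. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def within_estimator(ids, x, y):
    """FE (within) estimator for one regressor: beta_FE (3.2.5), alpha_i, sigma2_eps (3.2.9)."""
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))
    # Individual means (xbar_i, ybar_i)
    means = {i: (sum(o[0] for o in obs) / len(obs), sum(o[1] for o in obs) / len(obs))
             for i, obs in groups.items()}
    sxx = sxy = 0.0
    for i, xi, yi in zip(ids, x, y):
        xm, ym = means[i]
        sxx += (xi - xm) ** 2              # within variation of x
        sxy += (xi - xm) * (yi - ym)
    beta = sxy / sxx
    alpha = {i: ym - xm * beta for i, (xm, ym) in means.items()}  # alpha_i = ybar_i - xbar_i * beta
    rss = sum((yi - alpha[i] - xi * beta) ** 2 for i, xi, yi in zip(ids, x, y))
    dof = len(y) - len(groups)             # n(T - 1) in a balanced panel
    return beta, alpha, rss / dof


# Hypothetical panel: x is higher for individual 2, whose intercept is lower
ids = [1, 1, 1, 2, 2, 2]
x = [0, 1, 2, 5, 6, 7]
y = [10, 12, 14, 7, 9, 11]
print(within_estimator(ids, x, y))  # (2.0, {1: 10.0, 2: -3.0}, 0.0)
```

On these toy numbers pooled OLS of y on x would give a negative slope, because x is correlated with the individual effects; the within transformation removes that correlation and recovers the slope of 2.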
Essentially, the fixed effects model concentrates on differences ‘within’ individuals, i.e., it explains to what extent $y_{it}$ differs from $\bar y_i$ and does not explain why $\bar y_i$ is different from $\bar y_j$.
The parametric assumptions about $\beta$, on the other hand, impose that a change in $x$ has the same (ceteris paribus) effect, whether it is a change from one period to the other or a change from one individual to the other.


Estimation with First Differences

Here, the intent is explicitly to transform the latent heterogeneity out of the model.
The base case would be
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it},$$
which implies the first differences equation
$$\Delta y_{it} = (\Delta x_{it})'\beta + \Delta\varepsilon_{it} = (\Delta x_{it})'\beta + u_{it}$$
The advantage of first differencing is that it removes the latent heterogeneity from the model whether the fixed or random effects model is appropriate.

The disadvantage is that the differencing also removes any time-invariant variables from the model.
If the time-invariant variables in the model are of no interest, then this is a robust approach that can estimate the parameters of the time-varying variables consistently.
However, this is not helpful if the time-invariant variables in the model are the primary object of the analysis.
Note also that the differencing procedure trades the cross-observation correlation in $\alpha_i$ for a moving average (MA) disturbance, $u_{i,t} = \varepsilon_{i,t} - \varepsilon_{i,t-1}$.
The new disturbance $u_{i,t}$ is autocorrelated, though across only one period.
One can estimate such models using two-step feasible GLS for an MA disturbance.
Alternatively, this model is a natural candidate for OLS with the Newey–West robust covariance estimator, since the right number of lags (one) is known.
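A minimal sketch of the first-difference point estimate for a single regressor, assuming each individual's observations are stored in time order with no gaps; the function name and data layout are hypothetical. It returns only $\hat\beta$; in line with the text, standard errors should allow for the MA(1) structure of $u_{it}$ (for example Newey–West, or clustering by individual).

```python
def first_difference_estimator(panel):
    """OLS (no intercept) of Delta y_it on Delta x_it for a single regressor.

    `panel` maps each individual to its time-ordered list of (x_it, y_it) pairs;
    differencing within an individual removes alpha_i.
    """
    sxx = sxy = 0.0
    for obs in panel.values():
        for (x_prev, y_prev), (x_cur, y_cur) in zip(obs, obs[1:]):
            dx, dy = x_cur - x_prev, y_cur - y_prev
            sxx += dx * dx
            sxy += dx * dy
    return sxy / sxx


# Hypothetical two-individual panel with alpha_1 = 10, alpha_2 = -3 and beta = 2
panel = {1: [(0, 10), (1, 12), (2, 14)], 2: [(5, 7), (6, 9), (7, 11)]}
print(first_difference_estimator(panel))  # 2.0
```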
Many studies involve two-period “panels”, before and after a treatment.
In these cases, the phenomenon of interest may be the change in the outcome variable, i.e., the “treatment effect.”

Consider the model
$$y_{it} = \alpha_i + x_{it}'\beta + \theta S_{it} + \varepsilon_{it},$$
where $t = 1, 2$ and $S_{it} = 0$ in period 1 and 1 in period 2; $S_{it}$ indicates a treatment that takes place between the two observations.
The “treatment effect” would be
$$E[\Delta y_i \mid \Delta x_i = 0] = \theta,$$
which is precisely the constant term in the first difference regression,
$$\Delta y_i = \theta + (\Delta x_i)'\beta + u_i$$
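Differencing the two periods makes the origin of the constant explicit, since $\alpha_i$ cancels and $S_{i2} - S_{i1} = 1$:

```latex
y_{i2} - y_{i1}
  = (\alpha_i - \alpha_i) + (x_{i2} - x_{i1})'\beta + \theta\,(S_{i2} - S_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})
  = \theta + (\Delta x_i)'\beta + u_i ,
```

so that $E[\Delta y_i \mid \Delta x_i = 0] = \theta$ whenever $E[u_i \mid \Delta x_i] = 0$.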
Testing for Heteroskedasticity
A test for groupwise heteroskedasticity in the fixed effects model is available through the command xttest3, run after estimating the fixed effects model.
This is a user-written programme; to install it, type:

. ssc install xttest3
. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (545)  =    2.2e+05
Prob>chi2 =      0.0000

The p-value is 0.0000, so the null hypothesis of a common error variance across individuals is rejected.


3.2.3 Random Effects Model

In regression analysis it is common to assume that all factors that affect the dependent variable, but that have not been included as regressors, can be appropriately summarized by a random error term.
In the RE model, we assume that the $\alpha_i$ are random factors, i.i.d. over individuals.
Thus we write the random effects model as
$$y_{it} = \alpha + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad (3.2.10)$$
$$\varepsilon_{it} \sim \text{i.i.d.}(0, \sigma^2_\varepsilon), \qquad \alpha_i \sim \text{i.i.d.}(0, \sigma^2_\alpha),$$
where $\alpha_i + \varepsilon_{it}$ is treated as an error term consisting of an individual-specific component (random heterogeneity specific to the $i$th observation) and a remainder component, which is assumed to be uncorrelated over time.
All correlation of the error terms over time is attributed to the individual effects $\alpha_i$.
$\alpha_i$ and $\varepsilon_{it}$ are assumed to be mutually independent and independent of $x_{js}$ (for all $j$ and $s$).
This implies that the OLS estimator for $\alpha$ and $\beta$ from (3.2.10) is unbiased and consistent.
The error components structure implies that the composite error term $\alpha_i + \varepsilon_{it}$ exhibits a particular form of autocorrelation (unless $\sigma^2_\alpha = 0$).
Consequently, the usual OLS standard errors are incorrect, and an efficient (GLS) estimator can be obtained by exploiting the structure of the error covariance matrix.
To derive the GLS estimator, first note that for individual $i$ all error terms can be stacked as $\alpha_i\iota_T + \varepsilon_i$, where $\iota_T = (1, \ldots, 1)'$ is of dimension $T$ and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$.
The covariance matrix of the vector $\alpha_i\iota_T + \varepsilon_i$ is
$$\text{var}(\alpha_i\iota_T + \varepsilon_i) = \Omega = \sigma^2_\alpha\iota_T\iota_T' + \sigma^2_\varepsilon I_T \qquad (3.2.11)$$
where $I_T$ is the $T$-dimensional identity matrix.

This can be used to derive the GLS estimator for the parameters in (3.2.10).
For each individual, we can transform the data by premultiplying the vectors $y_i = (y_{i1}, \ldots, y_{iT})'$, etc., by $\Omega^{-1/2}$, where⁴
$$\Omega^{-1} = \sigma_\varepsilon^{-2}\left[I_T - \frac{\sigma^2_\alpha}{\sigma^2_\varepsilon + T\sigma^2_\alpha}\,\iota_T\iota_T'\right],$$
which can also be written as
$$\Omega^{-1} = \sigma_\varepsilon^{-2}\left[\left(I_T - \frac{1}{T}\iota_T\iota_T'\right) + \psi\,\frac{1}{T}\iota_T\iota_T'\right],$$
where
$$\psi = \frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon + T\sigma^2_\alpha}.$$
Noting that $I_T - (1/T)\iota_T\iota_T'$ transforms the data in deviations from individual means and $(1/T)\iota_T\iota_T'$ takes individual means, the GLS estimator for $\beta$ can be written as
$$\hat\beta_{GLS} = \left(\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)' + \psi T\sum_{i=1}^{n}(\bar x_i-\bar x)(\bar x_i-\bar x)'\right)^{-1}\left(\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(y_{it}-\bar y_i) + \psi T\sum_{i=1}^{n}(\bar x_i-\bar x)(\bar y_i-\bar y)\right), \qquad (3.2.12)$$
where $\bar x = (nT)^{-1}\sum_{i,t} x_{it}$ denotes the overall average of $x_{it}$.
It is easy to see that for $\psi = 0$ the fixed effects estimator arises.
Because $\psi \to 0$ as $T \to \infty$, the fixed and random effects estimators are equivalent for large $T$.

⁴ $[A + bb']^{-1} = A^{-1} - \dfrac{1}{1 + b'A^{-1}b}\,A^{-1}bb'A^{-1}$
If $\psi = 1$, the GLS estimator is just the OLS estimator (and $\Omega$ is diagonal).
From the general formula for the GLS estimator it can be derived that
$$\hat\beta_{GLS} = \Delta\hat\beta_B + (I_K - \Delta)\hat\beta_{FE},$$
where
$$\hat\beta_B = \left(\sum_{i=1}^{n}(\bar x_i-\bar x)(\bar x_i-\bar x)'\right)^{-1}\sum_{i=1}^{n}(\bar x_i-\bar x)(\bar y_i-\bar y),$$
$$\Delta = \left[S^W_{xx} + \psi S^B_{xx}\right]^{-1}\psi S^B_{xx},$$
$$S^W_{xx} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)', \qquad S^B_{xx} = \sum_{i=1}^{n}T(\bar x_i-\bar x)(\bar x_i-\bar x)'.$$
$\hat\beta_B$ is the so-called between estimator for $\beta$.
$\hat\beta_B$ is the OLS estimator in the model for individual means
$$\bar y_i = \alpha + \bar x_i'\beta + \alpha_i + \bar\varepsilon_i, \qquad i = 1, 2, \ldots, n \qquad (3.2.13)$$
$\Delta$ is a weighting matrix and is proportional to the inverse of the covariance matrix of $\hat\beta_B$.
The GLS estimator is a matrix-weighted average of the within and the between estimators, where the weights depend upon the relative variances of the two estimators.
The between estimator discards the time series information in the data set.
The GLS estimator is the optimal combination of the within and the between estimators, and is thus more efficient than either of these two estimators.
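In the scalar case ($K = 1$) the matrix weights become ordinary numbers, which makes the averaging explicit. Here $S^W_{xx}$ and $S^B_{xx}$ are as defined above, and $S^W_{xy}$, $S^B_{xy}$ denote the corresponding cross products with $y$ that appear in (3.2.12) (a notation introduced only for this derivation):

```latex
\hat\beta_{GLS}
  = \frac{S^W_{xy} + \psi S^B_{xy}}{S^W_{xx} + \psi S^B_{xx}}
  = \underbrace{\frac{S^W_{xx}}{S^W_{xx} + \psi S^B_{xx}}}_{1-\Delta}\,\hat\beta_{FE}
  + \underbrace{\frac{\psi S^B_{xx}}{S^W_{xx} + \psi S^B_{xx}}}_{\Delta}\,\hat\beta_{B},
\qquad
\hat\beta_{FE} = \frac{S^W_{xy}}{S^W_{xx}},\quad
\hat\beta_{B} = \frac{S^B_{xy}}{S^B_{xx}}
```

With $\psi = 0$ all weight goes to $\hat\beta_{FE}$; with $\psi = 1$ the weights are $S^W_{xx}/(S^W_{xx}+S^B_{xx})$ and $S^B_{xx}/(S^W_{xx}+S^B_{xx})$, which gives pooled OLS.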

The OLS estimator (with $\psi = 1$) is also a linear combination of the two estimators, but not the efficient one.
If the explanatory variables are independent of all $\varepsilon_{it}$ and all $\alpha_i$, the GLS estimator is unbiased. It is a consistent estimator for $n$ or $T$ or both tending to infinity if, in addition to (3.2.6), it also holds that $E\{\bar x_i\varepsilon_{it}\} = 0$ and, most importantly, that
$$E\{\bar x_i\alpha_i\} = 0. \qquad (3.2.14)$$
Note that these conditions are also required for the between estimator to be consistent.

An easy way to compute the GLS estimator is obtained by noting that it can be determined as the OLS estimator in a transformed model, given by
$$y_{it} - \theta\bar y_i = \alpha(1-\theta) + (x_{it} - \theta\bar x_i)'\beta + \alpha_i(1-\theta) + (\varepsilon_{it} - \theta\bar\varepsilon_i)$$
$$y_{it} - \theta\bar y_i = \alpha(1-\theta) + (x_{it} - \theta\bar x_i)'\beta + u_{it} \qquad (3.2.15)$$
where $\theta = 1 - \psi^{1/2}$.
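A sketch of the quasi-demeaning step in (3.2.15), treating the variance components as known. For illustration it plugs in the variance components reported in the xttest0 output later in this section (Var(u) for $\sigma^2_\alpha$, Var(e) for $\sigma^2_\varepsilon$) and $T = 8$, which follows from 4360 observations on 545 individuals. The function names are hypothetical.

```python
from math import sqrt


def re_theta(sigma2_alpha, sigma2_eps, T):
    """theta = 1 - psi**(1/2), with psi = sigma2_eps / (sigma2_eps + T * sigma2_alpha)."""
    psi = sigma2_eps / (sigma2_eps + T * sigma2_alpha)
    return 1.0 - sqrt(psi)


def quasi_demean(series, theta):
    """Transform one individual's series z_i1..z_iT into z_it - theta * zbar_i, as in (3.2.15)."""
    zbar = sum(series) / len(series)
    return [z - theta * zbar for z in series]


# Variance components from the xttest0 output (u: sigma2_alpha, e: sigma2_eps);
# T = 4360 / 545 = 8 periods per individual
theta = re_theta(0.1055083, 0.1233862, T=8)
print(round(theta, 3))  # 0.643
```

With $\sigma^2_\alpha = 0$ the function returns $\theta = 0$ (pooled OLS), and $\theta$ approaches 1 (the within transformation) as $T\sigma^2_\alpha$ grows relative to $\sigma^2_\varepsilon$.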
The error term in this transformed regression is i.i.d. over individuals and time.
The variance components $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are unknown.
Thus we can use a feasible GLS estimator (EGLS), where the unknown variances are consistently estimated in a first step.
The estimator for $\sigma^2_\varepsilon$ is easily obtained from the within residuals, as given in (3.2.9).
For the between regression the error variance is $\sigma^2_\alpha + \frac{1}{T}\sigma^2_\varepsilon$, which we can estimate consistently by
$$\hat\sigma^2_B = \frac{1}{n}\sum_{i=1}^{n}\left(\bar y_i - \hat\alpha_B - \bar x_i'\hat\beta_B\right)^2 \qquad (3.2.16)$$
where $\hat\alpha_B$ is the between estimator for $\alpha$.
It is possible to adjust this estimator by applying a degrees-of-freedom correction, implying that $K + 1$ is subtracted in the denominator.
From this, a consistent estimator for $\sigma^2_\alpha$ follows as
$$\hat\sigma^2_\alpha = \hat\sigma^2_B - \frac{1}{T}\hat\sigma^2_\varepsilon \qquad (3.2.17)$$
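A minimal sketch of this first EGLS step for a single regressor in a balanced panel: the between regression (3.2.13) on group means, $\hat\sigma^2_B$ from (3.2.16) without the d.f. correction, and $\hat\sigma^2_\alpha$ from (3.2.17). Because (3.2.17) is a difference, it can turn out negative in a given sample; truncating it at zero here is an assumed convention, not something stated in the text. The function names and data are hypothetical.

```python
from collections import defaultdict


def between_regression(ids, x, y):
    """OLS of ybar_i on xbar_i (model 3.2.13); returns alpha_B, beta_B and sigma2_B from (3.2.16)."""
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))
    xm = [sum(o[0] for o in obs) / len(obs) for obs in groups.values()]  # xbar_i
    ym = [sum(o[1] for o in obs) / len(obs) for obs in groups.values()]  # ybar_i
    n = len(xm)
    mx, my = sum(xm) / n, sum(ym) / n
    beta_B = (sum((a - mx) * (b - my) for a, b in zip(xm, ym))
              / sum((a - mx) ** 2 for a in xm))
    alpha_B = my - beta_B * mx
    # Divide by n, i.e. no degrees-of-freedom correction
    sigma2_B = sum((b - alpha_B - beta_B * a) ** 2 for a, b in zip(xm, ym)) / n
    return alpha_B, beta_B, sigma2_B


def sigma2_alpha_hat(sigma2_B, sigma2_eps, T):
    """(3.2.17): sigma2_B - sigma2_eps / T, truncated at zero (an assumed convention)."""
    return max(sigma2_B - sigma2_eps / T, 0.0)


# Hypothetical balanced panel: three individuals, T = 2
ids = [1, 1, 2, 2, 3, 3]
x = [0, 2, 2, 4, 4, 6]
y = [1, 3, 3, 5, 7, 9]
alpha_B, beta_B, s2B = between_regression(ids, x, y)
print(beta_B, sigma2_alpha_hat(s2B, sigma2_eps=0.4, T=2))
```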
The resulting EGLS estimator is referred to as the RE estimator for $\beta$ (and $\alpha$), denoted below as $\hat\beta_{RE}$.
It is also known as the Balestra–Nerlove estimator.

Under weak regularity conditions, the RE estimator is asymptotically normal with covariance matrix
$$\text{var}(\hat\beta_{RE}) = \sigma^2_\varepsilon\left(\sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)' + \psi T\sum_{i=1}^{n}(\bar x_i-\bar x)(\bar x_i-\bar x)'\right)^{-1}, \qquad (3.2.18)$$
which shows that the RE estimator is more efficient than the FE estimator as long as $\psi > 0$.
The gain in efficiency is due to the use of the between variation in the data, $\bar x_i - \bar x$.
The covariance matrix in (3.2.18) is estimated by the OLS expressions in the transformed model (3.2.15).
Testing for Individual Heterogeneity in the REM
When there is individual heterogeneity, the RE model is more efficient than the pooled model.
But if there is no heterogeneity in the panel data, it is better to use the pooled model (apply OLS).
The Breusch-Pagan test can be used to test for the presence of heterogeneity.
The null hypothesis of this test is that “there is no heterogeneity” (there are no significant differences across units).
So rejection of the null hypothesis can be taken as evidence in favour of the RE model.
Breusch and Pagan (1980) devised an LM test for the REM against the pooled regression, based on the OLS residuals.
The hypotheses are:
H0: Pooled regression is appropriate (the variance across units is zero, i.e. no panel effect: $\sigma^2_\alpha = 0$)
H1: REM is appropriate

The LM test statistic is based on the OLS residuals $\hat\varepsilon_{it}$ and, under H0, follows a $\chi^2(1)$ distribution:
$$LM = \frac{nT}{2(T-1)}\left[\frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T}\hat\varepsilon_{it}\right)^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat\varepsilon_{it}^2} - 1\right]^2$$
If H0 is rejected, the REM is more appropriate. In practice, use the command xttest0 after estimating the REM:
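. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        wage[nr,t] = Xb + u[nr] + e[nr,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                    wage |   .2836728       .5326094
                       e |   .1233862       .3512637
                       u |   .1055083       .3248205

        Test:   Var(u) = 0
                             chibar2(01) =  3217.14
                          Prob > chibar2 =   0.0000

The p-value is 0.0000, so H0 is rejected: there is significant individual heterogeneity, and the REM is preferred to the pooled regression.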
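The LM formula above translates directly into code. This sketch assumes a balanced panel and takes the pooled OLS residuals grouped by individual; the function name is hypothetical. Because the bracket is squared, residuals that are negatively correlated within an individual also inflate this two-sided statistic; Stata's label chibar2(01) indicates that xttest0 reports a one-sided version, so its output is not a direct check of this sketch.

```python
def breusch_pagan_lm(resid_by_unit):
    """Breusch-Pagan LM statistic for H0: sigma2_alpha = 0 (balanced panel).

    resid_by_unit: list of n lists, each holding the T pooled-OLS residuals of one individual.
    """
    n = len(resid_by_unit)
    T = len(resid_by_unit[0])
    sum_sq_of_unit_sums = sum(sum(e_i) ** 2 for e_i in resid_by_unit)
    sum_of_squares = sum(e * e for e_i in resid_by_unit for e in e_i)
    return n * T / (2 * (T - 1)) * (sum_sq_of_unit_sums / sum_of_squares - 1) ** 2


# Hypothetical residuals: each individual's residuals share a sign, hinting at an individual effect
print(breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]]))  # 2.0
```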

3.2.4 Relationship between Total, Within and Between Effects

We can formulate the pooled regression model in three ways.
First, the original formulation is
$$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it} \qquad (3.2.19a)$$
In terms of the group means,
$$\bar y_i = \alpha + \bar x_i'\beta + \bar\varepsilon_i \qquad (3.2.19b)$$
Or in terms of deviations from the group means,
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + \varepsilon_{it} - \bar\varepsilon_i \qquad (3.2.19c)$$
Here we assume that there are no time-invariant variables in $x_{it}$.
All three are classical regression models and, in principle, all three could be estimated, at least consistently if not efficiently, by OLS.
Note that (3.2.19b) defines only $n$ observations, the group means.
Consider the matrices of sums of squares and cross products that would be used in each case, where we focus only on estimation of $\beta$.
In (3.2.19a), the moments accumulate variation about the overall means, $\bar y$ and $\bar x$, and we would use the total sums of squares and cross products,
$$S^t_{xx} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x)(x_{it}-\bar x)' \quad\text{and}\quad S^t_{xy} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x)(y_{it}-\bar y) \qquad (3.2.20)$$
For (3.2.19c), because the data are in deviations already, the means of $(y_{it} - \bar y_i)$ and $(x_{it} - \bar x_i)$ are zero.
The moment matrices are within-groups (i.e., variation around group means) sums of squares and cross products,
$$S^w_{xx} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)' \quad\text{and}\quad S^w_{xy} = \sum_{i=1}^{n}\sum_{t=1}^{T}(x_{it}-\bar x_i)(y_{it}-\bar y_i)$$
Finally, for (3.2.19b), the mean of group means is the overall mean.
The moment matrices are the between-groups sums of squares and cross products, that is, the variation of the group means around the overall means:
$$S^b_{xx} = \sum_{i=1}^{n}T(\bar x_i-\bar x)(\bar x_i-\bar x)' \quad\text{and}\quad S^b_{xy} = \sum_{i=1}^{n}T(\bar x_i-\bar x)(\bar y_i-\bar y)$$
It is easy to verify that
$$S^t_{xx} = S^w_{xx} + S^b_{xx} \quad\text{and}\quad S^t_{xy} = S^w_{xy} + S^b_{xy}.$$
Therefore, there are three possible least squares estimators of $\beta$ corresponding to the decomposition.
The least squares estimator is
h i 1h i
1
βb t = Stxx Stxy = Sw
xx + Sb
xx Sw
xy + Sb
xy . (3.2.21)

The within-groups estimator is

βb w = [Sw
xx ]
1 w
Sxy . (3.2.22)

An alternative estimator would be the


between-groups estimator,
h i 1
b
β b = Sbxx Sbxy . (3.2.23)
F. Guta (CoBE) Econ 654 Dec, 2023 69 / 211
This is the group means estimator: the least squares estimator of (3.2.19b), based on the $n$ sets of group means.

Note: We are assuming that $n$ is at least as large as $K$.

Equations (3.2.22) and (3.2.23) imply
$$S^w_{xy} = S^w_{xx}\hat\beta_w \quad\text{and}\quad S^b_{xy} = S^b_{xx}\hat\beta_b.$$
Inserting these in (3.2.21), we see that the least squares estimator is a matrix weighted average of the within- and between-groups estimators:
$$\hat\beta_t = F^w\hat\beta_w + F^b\hat\beta_b, \tag{3.2.24}$$
where
$$F^w = [S^w_{xx}+S^b_{xx}]^{-1}S^w_{xx} = I - F^b.$$
It can be shown that
$$F^w = \left\{[\text{Asy.Var}(\hat\beta_w)]^{-1} + [\text{Asy.Var}(\hat\beta_b)]^{-1}\right\}^{-1}[\text{Asy.Var}(\hat\beta_w)]^{-1}.$$
In the weighted average, the estimator with the smaller variance receives the greater weight.
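To see the weighted average (3.2.24) at work, the following sketch uses one regressor, so the weight $F^w$ is a scalar; `estimators` and `simulate` are illustrative names and the data-generating process (an individual effect correlated with $\bar x_i$) is an assumption:

```python
import random

def estimators(x, y):
    """beta_t (3.2.21), beta_w (3.2.22), beta_b (3.2.23) and the weight F_w
    for a balanced panel with one regressor."""
    n, T = len(x), len(x[0])
    xbar, ybar = sum(map(sum, x)) / (n * T), sum(map(sum, y)) / (n * T)
    xi, yi = [sum(r) / T for r in x], [sum(r) / T for r in y]
    cells = [(i, t) for i in range(n) for t in range(T)]
    Stxx = sum((x[i][t] - xbar) ** 2 for i, t in cells)
    Stxy = sum((x[i][t] - xbar) * (y[i][t] - ybar) for i, t in cells)
    Swxx = sum((x[i][t] - xi[i]) ** 2 for i, t in cells)
    Swxy = sum((x[i][t] - xi[i]) * (y[i][t] - yi[i]) for i, t in cells)
    Sbxx = sum(T * (xi[i] - xbar) ** 2 for i in range(n))
    Sbxy = sum(T * (xi[i] - xbar) * (yi[i] - ybar) for i in range(n))
    b_t = Stxy / Stxx                          # pooled OLS from total moments
    b_w, b_b = Swxy / Swxx, Sbxy / Sbxx
    F_w = Swxx / (Swxx + Sbxx)                 # F^b = 1 - F_w
    return b_t, b_w, b_b, F_w

def simulate(n, T, seed):
    """x_it = a_i + e_it and y_it = 1 + 2 x_it + 3 a_i + u_it, so the
    individual effect 3 a_i is correlated with the group mean of x."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        row = [a + rng.gauss(0, 1) for _ in range(T)]
        x.append(row)
        y.append([1 + 2 * v + 3 * a + rng.gauss(0, 1) for v in row])
    return x, y

x, y = simulate(500, 5, seed=2)
b_t, b_w, b_b, F_w = estimators(x, y)
print(b_t, b_w, b_b, F_w)
print(F_w * b_w + (1 - F_w) * b_b)   # reproduces b_t
```

With $K = 1$ the weight reduces to $F^w = S^w_{xx}/(S^w_{xx}+S^b_{xx})$; in this design the within estimate stays near the true slope of 2, while the between estimate absorbs the correlated effect.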
3.2.5 Hausman's Specification Test for the REM

Whether to treat the individual effects $\alpha_i$ as fixed or random is not an easy question to answer.
It can make a surprising amount of difference in the estimates of the $\beta$ parameters in cases where $T$ is small and $n$ is large.
The FEs approach is conditional upon the values of $\alpha_i$, i.e., it considers the distribution of $y_{it}$ given $\alpha_i$, where the $\alpha_i$'s can be estimated.
This makes sense intuitively if the individuals in the sample cannot be viewed as a random draw from some underlying population.
This interpretation is appropriate when $i$ denotes countries, large companies or industries, and the predictions we want to make are for a particular country, company or industry.
In contrast, the random effects approach is not conditional upon the individual $\alpha_i$'s.

Formally, while the REs model states that
$$E\{y_{it}\mid x_{it}\} = x_{it}'\beta, \tag{3.2.25}$$
the FEs model estimates
$$E\{y_{it}\mid x_{it},\alpha_i\} = x_{it}'\beta + \alpha_i. \tag{3.2.26}$$
Note that the $\beta$ coefficients in these two conditional expectations are the same only if $E\{\alpha_i\mid x_{it}\} = 0$.
In summary, a first reason why one may prefer the FEs estimator is that some interest lies in $\alpha_i$, which makes sense if the number of units is relatively small and of a specific nature, i.e., identification of individual units is important.

However, even if we are interested in the larger population of individual units, and a REs framework seems appropriate, the FEs estimator may be preferred.
The reason is possible correlation between $\alpha_i$ and $x_{it}$, in which case the random effects approach, which ignores this correlation, leads to inconsistent estimators.
The problem of correlation between the individual effects $\alpha_i$ and the explanatory variables in $x_{it}$ can be handled using the FEs approach, which removes $\alpha_i$ from the model and thus eliminates any problems they may cause.
Hausman (1978) has suggested a test for the null hypothesis that $x_{it}$ and $\alpha_i$ are uncorrelated.
The general idea of a Hausman test is that two estimators are compared: one which is consistent under both the null and the alternative hypothesis ($\hat\beta_{FE}$) and one which is consistent (and typically efficient) under the null hypothesis only ($\hat\beta_{RE}$).

A significant difference between the two estimators indicates that the null hypothesis is unlikely to hold.
Assume that $E\{\varepsilon_{it}x_{is}\} = 0$ for all $s, t$, so that the FEs estimator $\hat\beta_{FE}$ is consistent for $\beta$ irrespective of whether $x_{it}$ and $\alpha_i$ are uncorrelated, while the REs estimator $\hat\beta_{RE}$ is consistent and efficient only if $x_{it}$ and $\alpha_i$ are not correlated.
Let us consider the difference vector $\hat\beta_{FE}-\hat\beta_{RE}$.
To evaluate the significance of this difference, we need its covariance matrix.

In general this would require estimating $\text{cov}\{\hat\beta_{FE},\hat\beta_{RE}\}$, but because the latter estimator is efficient under $H_0$, it can be shown that (under $H_0$)
$$\text{var}\{\hat\beta_{FE}-\hat\beta_{RE}\} = \text{var}\{\hat\beta_{FE}\} - \text{var}\{\hat\beta_{RE}\}.^{5} \tag{3.2.27}$$

⁵ Hausman's essential result is that $\text{Cov}[\hat\beta_{FE}-\hat\beta_{RE},\hat\beta_{RE}] = \text{Cov}[\hat\beta_{FE},\hat\beta_{RE}] - \text{Var}[\hat\beta_{RE}] = 0$.
Thus, we can compute the Hausman test statistic as
$$\xi_H = (\hat\beta_{FE}-\hat\beta_{RE})'\left[\widehat{\text{var}}\{\hat\beta_{FE}\} - \widehat{\text{var}}\{\hat\beta_{RE}\}\right]^{-1}(\hat\beta_{FE}-\hat\beta_{RE}), \tag{3.2.28}$$
where the $\widehat{\text{var}}$s denote estimates of the true covariance matrices.
Under $H_0$, which implicitly says that $\text{plim}(\hat\beta_{FE}-\hat\beta_{RE}) = 0$, the statistic $\xi_H$ has an asymptotic $\chi^2(K)$ distribution, where $K$ is the number of elements in $\beta$.
The Hausman test thus tests whether the FEs and REs estimators are significantly different.
An important reason why the two estimators would be different is the existence of correlation between $x_{it}$ and $\alpha_i$, although other sorts of misspecification may also lead to rejection.
A practical problem when computing (3.2.28) is that the covariance matrix in square brackets may not be positive definite in finite samples, such that its inverse cannot be computed.
As an alternative, it is possible to test for a subset of the elements in $\beta$.
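As an illustration only, the sketch below computes a scalar version of (3.2.28) for one regressor. It assumes a Swamy–Arora-type estimate of the variance components (within residuals for $\sigma^2_\varepsilon$, between residuals for $\sigma^2_\alpha$) and computes the RE estimator as OLS on quasi-demeaned data; these choices, the simulated design and all names are assumptions, not the procedure of any particular package. Using the same $\hat\sigma^2_\varepsilon$ in both variances keeps the difference in (3.2.27) positive.

```python
import math
import random

def simulate(n, T, corr, seed):
    """y_it = 1 + 2 x_it + alpha_i + eps_it with x_it = a_i + e_it and
    alpha_i = corr * a_i + v_i; corr = 0 corresponds to H0."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        alpha = corr * a + rng.gauss(0, 1)
        row = [a + rng.gauss(0, 1) for _ in range(T)]
        x.append(row)
        y.append([1 + 2 * v + alpha + rng.gauss(0, 1) for v in row])
    return x, y

def hausman(x, y):
    """Scalar version of (3.2.28): FE slope versus feasible-GLS RE slope."""
    n, T = len(x), len(x[0])
    xm, ym = [sum(r) / T for r in x], [sum(r) / T for r in y]
    cells = [(i, t) for i in range(n) for t in range(T)]
    # Fixed effects (within) estimator and sigma2_eps
    Swxx = sum((x[i][t] - xm[i]) ** 2 for i, t in cells)
    b_fe = sum((x[i][t] - xm[i]) * (y[i][t] - ym[i]) for i, t in cells) / Swxx
    ssr_w = sum(((y[i][t] - ym[i]) - b_fe * (x[i][t] - xm[i])) ** 2 for i, t in cells)
    s2_eps = ssr_w / (n * (T - 1) - 1)
    # Between regression (with intercept) estimates sigma2_alpha + sigma2_eps / T
    gx, gy = sum(xm) / n, sum(ym) / n
    b_b = (sum((xm[i] - gx) * (ym[i] - gy) for i in range(n))
           / sum((v - gx) ** 2 for v in xm))
    ssr_b = sum(((ym[i] - gy) - b_b * (xm[i] - gx)) ** 2 for i in range(n))
    s2_alpha = max(ssr_b / (n - 2) - s2_eps / T, 0.0)
    theta = 1 - math.sqrt(s2_eps / (s2_eps + T * s2_alpha))
    # RE = OLS on quasi-demeaned data; the constant column becomes (1 - theta)
    xs = [x[i][t] - theta * xm[i] for i, t in cells]
    ys = [y[i][t] - theta * ym[i] for i, t in cells]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((v - mx) ** 2 for v in xs)
    b_re = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sxx
    v_fe, v_re = s2_eps / Swxx, s2_eps / sxx
    return b_fe, b_re, (b_fe - b_re) ** 2 / (v_fe - v_re)

x, y = simulate(300, 4, corr=2.0, seed=3)   # alpha_i correlated with x_it
b_fe, b_re, xi_h = hausman(x, y)
print(b_fe, b_re, xi_h, xi_h > 3.841)       # 3.841: 5% critical value of chi2(1)
```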
3.2.6 Post Estimation Tests

Goodness-of-fit

Computation of goodness-of-fit measures in panel data applications is somewhat uncommon.
One reason is the fact that one may attach different importance to explaining the within and the between variation in the data.
Another reason is that the $R^2$ or $\bar R^2$ criteria are only appropriate if the model is estimated by OLS.
Our starting point here is the definition of the $R^2$ in terms of the squared correlation coefficient between actual and fitted values.
This definition has the advantage that it produces values within the $[0, 1]$ interval, irrespective of the estimator that is used to generate the fitted values.
The total variation in $y_{it}$ can be written as the sum of the within and the between variation, i.e.,
$$\frac{1}{nT}\sum_{i,t}(y_{it}-\bar y)^2 = \frac{1}{nT}\sum_{i,t}(y_{it}-\bar y_i)^2 + \frac{1}{n}\sum_{i}(\bar y_i-\bar y)^2, \tag{3.2.29}$$
where $\bar y$ denotes the overall sample average.
Now, we can define alternative versions of an $R^2$ measure, depending upon the dimension of the data that we are interested in.
For example, the fixed effects estimator is chosen to explain the within variation, and thus maximizes the 'within $R^2$' given by
$$R^2_{within}(\hat\beta_{FE}) = \text{corr}^2\{\hat y^{FE}_{it}-\hat y^{FE}_{i},\; y_{it}-\bar y_i\}, \tag{3.2.30}$$
where $\hat y^{FE}_{it}-\hat y^{FE}_{i} = (x_{it}-\bar x_i)'\hat\beta_{FE}$.
The between estimator, being an OLS estimator in the model in terms of individual means, maximizes the 'between $R^2$', which we define as
$$R^2_{between}(\hat\beta_b) = \text{corr}^2\{\hat y^{b}_{i},\; \bar y_i\}, \tag{3.2.31}$$
where $\hat y^{b}_{i} = \bar x_i'\hat\beta_b$.
The OLS estimator maximizes the overall goodness-of-fit and thus the overall $R^2$, which is defined as
$$R^2_{overall}(\hat\beta_t) = \text{corr}^2\{\hat y_{it},\; y_{it}\}, \tag{3.2.32}$$
with $\hat y_{it} = x_{it}'\hat\beta_t$.

The three measures above can be computed for any of the estimators that we considered.
They also provide possible criteria for choosing between alternative specifications of the model.
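The following sketch (two regressors, simulated data, hypothetical function names) computes the three measures (3.2.30)–(3.2.32) for the FE, between and pooled OLS estimators. Two regressors are used on purpose: with a single regressor the squared correlations would not depend on the slope at all.

```python
import random

def corr2(a, b):
    """Squared correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab * sab / (saa * sbb)

def ols2(X, y):
    """OLS slopes (intercept included) of y on two regressors; X = [(x1, x2), ...]."""
    m = len(y)
    m1, m2, my = sum(r[0] for r in X) / m, sum(r[1] for r in X) / m, sum(y) / m
    a11 = sum((r[0] - m1) ** 2 for r in X)
    a22 = sum((r[1] - m2) ** 2 for r in X)
    a12 = sum((r[0] - m1) * (r[1] - m2) for r in X)
    c1 = sum((r[0] - m1) * (v - my) for r, v in zip(X, y))
    c2 = sum((r[1] - m2) * (v - my) for r, v in zip(X, y))
    det = a11 * a22 - a12 * a12
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)

def estimators(x, y):
    """FE (within), between and pooled OLS slope vectors."""
    n, T = len(y), len(y[0])
    xm = [(sum(p[0] for p in r) / T, sum(p[1] for p in r) / T) for r in x]
    ym = [sum(r) / T for r in y]
    Xw = [(x[i][t][0] - xm[i][0], x[i][t][1] - xm[i][1])
          for i in range(n) for t in range(T)]
    yw = [y[i][t] - ym[i] for i in range(n) for t in range(T)]
    return {"FE": ols2(Xw, yw), "between": ols2(xm, ym),
            "OLS": ols2([p for r in x for p in r], [v for r in y for v in r])}

def r2_measures(x, y, b):
    """(within, between, overall) R^2 of (3.2.30)-(3.2.32) for slopes b."""
    n, T = len(y), len(y[0])
    fit = [[b[0] * p[0] + b[1] * p[1] for p in r] for r in x]
    fm, ym = [sum(r) / T for r in fit], [sum(r) / T for r in y]
    within = corr2([fit[i][t] - fm[i] for i in range(n) for t in range(T)],
                   [y[i][t] - ym[i] for i in range(n) for t in range(T)])
    between = corr2(fm, ym)
    overall = corr2([v for r in fit for v in r], [v for r in y for v in r])
    return within, between, overall

def simulate(n, T, seed):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                    # individual effect, correlated with x
        row = [(a + rng.gauss(0, 1), 0.5 * a + rng.gauss(0, 1)) for _ in range(T)]
        x.append(row)
        y.append([a + p - q + rng.gauss(0, 1) for p, q in row])
    return x, y

x, y = simulate(200, 5, seed=3)
R = {name: r2_measures(x, y, b) for name, b in estimators(x, y).items()}
for name, (w, bt, o) in R.items():
    print(f"{name:8s} within={w:.3f} between={bt:.3f} overall={o:.3f}")
```

Each estimator attains the largest value of the criterion it is built for: FE for the within $R^2$, the between estimator for the between $R^2$, and OLS for the overall $R^2$.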

Testing the Significance of the Group Effects

If we are interested in differences across groups, then we can test the hypothesis that the constant terms are all equal with an F test.
Under the null hypothesis of equality, the efficient estimator is pooled least squares.

The F ratio used for this test is
$$F(n-1,\; nT-n-K) = \frac{(R^2_{LSDV}-R^2_{Pooled})/(n-1)}{(1-R^2_{LSDV})/(nT-n-K)}, \tag{3.2.33}$$
where LSDV indicates the dummy variable model and Pooled indicates the pooled or restricted model with only a single overall constant term.
Alternatively, the model may have been estimated with an overall constant and $n-1$ dummy variables.
All other results (i.e., the least squares slopes, $s^2$, $R^2$) will be unchanged, but rather than estimating $\alpha_i$, each dummy variable coefficient will now be an estimate of $\alpha_i-\alpha_1$, where "1" is the omitted group.
The F test that the coefficients on these $n-1$ dummy variables are zero is identical to the one above.
However, although the statistical results are the same, the interpretation of the dummy variable coefficients in the two formulations is different.
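A minimal sketch of (3.2.33) with one regressor ($K = 1$), assuming simulated data with sizeable individual effects (names are hypothetical). It also computes the equivalent form based on the residual sums of squares of the pooled and LSDV (within) regressions, which gives the same number:

```python
import random

def simulate(n, T, sd_alpha, seed):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        alpha = rng.gauss(0, sd_alpha)
        row = [rng.gauss(0, 1) for _ in range(T)]
        x.append(row)
        y.append([alpha + 0.5 * v + rng.gauss(0, 1) for v in row])
    return x, y

def group_effects_F(x, y, K=1):
    """F test (3.2.33) of equal intercepts: LSDV versus pooled OLS."""
    n, T = len(x), len(x[0])
    fx, fy = [v for r in x for v in r], [v for r in y for v in r]
    mx, my = sum(fx) / len(fx), sum(fy) / len(fy)
    sst = sum((v - my) ** 2 for v in fy)
    # pooled OLS with a single overall constant
    b_p = (sum((u - mx) * (v - my) for u, v in zip(fx, fy))
           / sum((u - mx) ** 2 for u in fx))
    ssr_p = sum(((v - my) - b_p * (u - mx)) ** 2 for u, v in zip(fx, fy))
    # LSDV: the n group dummies are equivalent to the within transformation
    xm, ym = [sum(r) / T for r in x], [sum(r) / T for r in y]
    cells = [(i, t) for i in range(n) for t in range(T)]
    b_w = (sum((x[i][t] - xm[i]) * (y[i][t] - ym[i]) for i, t in cells)
           / sum((x[i][t] - xm[i]) ** 2 for i, t in cells))
    ssr_w = sum(((y[i][t] - ym[i]) - b_w * (x[i][t] - xm[i])) ** 2 for i, t in cells)
    r2_lsdv, r2_pool = 1 - ssr_w / sst, 1 - ssr_p / sst
    df2 = n * T - n - K
    F = ((r2_lsdv - r2_pool) / (n - 1)) / ((1 - r2_lsdv) / df2)
    F_ssr = ((ssr_p - ssr_w) / (n - 1)) / (ssr_w / df2)   # same statistic via SSRs
    return F, F_ssr

x, y = simulate(100, 5, sd_alpha=2.0, seed=4)
F, F_ssr = group_effects_F(x, y)
print(F, F_ssr)   # identical up to rounding
```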

Testing for Heteroscedasticity and Autocorrelation

Most of the tests for heteroscedasticity or autocorrelation in the REs model are computationally burdensome.
For the FEs model, which is estimated by OLS, things are relatively less complex.
As the FEs estimator can be applied even if we make the REs assumption that $\alpha_i$ is i.i.d. and independent of the explanatory variables, the tests for the FEs model can also be used in the REs case.
A test for autocorrelation in the FEs model is based upon the Durbin–Watson test.
The alternative hypothesis is that
$$\varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + \upsilon_{it}, \tag{3.2.34}$$
where $\upsilon_{it}$ is i.i.d. across individuals and time.

This allows for autocorrelation over time with the restriction that each individual has the same autocorrelation coefficient $\rho$.
The null hypothesis to be tested is $H_0: \rho = 0$ against the one-sided alternative $\rho < 0$ or $\rho > 0$.
Let $\hat\varepsilon_{it}$ be the residuals from the within regression (3.2.4) or, equivalently, from (3.2.2).
Bhargava, Franzini and Narendranathan (1983) suggest the following generalization of the DW statistic:
$$dw_\rho = \frac{\sum_{i=1}^{n}\sum_{t=2}^{T}(\hat\varepsilon_{it}-\hat\varepsilon_{i,t-1})^2}{\sum_{i=1}^{n}\sum_{t=1}^{T}\hat\varepsilon_{it}^2}. \tag{3.2.35}$$
Using derivations similar to those of Durbin and Watson, the authors are able to derive lower and upper bounds on the true critical values that depend upon $n$, $T$ and $K$ only.
Unlike the pure time series case, the inconclusive region for the panel data DW test is very small, particularly when the number of individuals in the panel is large.
For panels with very large $n$, Bhargava et al. suggest simply testing whether the computed statistic $dw_\rho$ is less than two, when testing against positive autocorrelation.
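The statistic (3.2.35) needs only the within residuals. In the sketch below (hypothetical names), the simulated "residuals" are within-demeaned AR(1) errors rather than residuals from an actual regression, an assumption made to keep it short; it shows that positive autocorrelation pushes $dw_\rho$ well below two:

```python
import random

def dw_panel(resid):
    """Bhargava-Franzini-Narendranathan statistic (3.2.35); resid[i][t] are
    within (FE) residuals for individual i."""
    num = sum((r[t] - r[t - 1]) ** 2 for r in resid for t in range(1, len(r)))
    den = sum(e * e for r in resid for e in r)
    return num / den

def demeaned_ar1(n, T, rho, seed):
    """Within-demeaned AR(1) errors, standing in for FE residuals."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e = rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5   # stationary start
        row = []
        for _ in range(T):
            e = rho * e + rng.gauss(0, 1)
            row.append(e)
        m = sum(row) / T
        out.append([v - m for v in row])
    return out

print(dw_panel([[1.0, -1.0, 1.0, -1.0]]))              # 3.0
print(dw_panel(demeaned_ar1(500, 10, 0.8, seed=5)))     # well below 2
```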
Because the FEs estimator is also consistent in the REs model, it is also possible to use this panel data DW test in the latter model.
To test for heteroscedasticity in $\varepsilon_{it}$, we can again use the FEs residuals $\hat\varepsilon_{it}$.
The auxiliary regression of the test regresses $\hat\varepsilon_{it}^2$ upon a constant and the $J$ variables $z_{it}$ that we think may affect heteroscedasticity.
The alternative hypothesis is that
$$\text{var}\{\varepsilon_{it}\} = \sigma^2 h(z_{it}'\alpha), \tag{3.2.36}$$
where $h$ is an unknown continuously differentiable function with $h(0) = 1$, so that the null hypothesis that is tested is given by $H_0: \alpha = 0$.

Under $H_0$, the test statistic, computed as $n(T-1)$ times the $R^2$ of the auxiliary regression, will have an asymptotic $\chi^2(J)$ distribution.
An alternative test is based upon $n$ times the $R^2$ of an auxiliary regression of the between squared residuals upon $\bar z_i$ or, more generally, upon $z_{i1},\ldots,z_{iT}$.
Under the null hypothesis of homoscedasticity, the test statistic has an asymptotic $\chi^2$ distribution, with degrees of freedom equal to the number of variables included in the auxiliary regression (excluding the intercept).
The alternative hypothesis of the latter test is less well defined.
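A sketch of the $n(T-1)R^2$ version with a single variable $z_{it}$ ($J = 1$), assuming the error variance in the simulated data is $\exp(z_{it})$ (an arbitrary choice of $h$, for illustration only); the statistic is compared with 3.841, the 5% critical value of $\chi^2(1)$:

```python
import math
import random

def simulate(n, T, seed):
    """FE model with var(eps_it) = exp(z_it): heteroscedastic in z."""
    rng = random.Random(seed)
    x, y, z = [], [], []
    for _ in range(n):
        alpha = rng.gauss(0, 1)
        xr = [rng.gauss(0, 1) for _ in range(T)]
        zr = [rng.gauss(0, 1) for _ in range(T)]
        x.append(xr)
        z.append(zr)
        y.append([alpha + 1.5 * a + math.exp(c / 2) * rng.gauss(0, 1)
                  for a, c in zip(xr, zr)])
    return x, y, z

def het_test(x, y, z):
    """n(T-1) R^2 from regressing squared FE residuals on a constant and z."""
    n, T = len(x), len(x[0])
    xm, ym = [sum(r) / T for r in x], [sum(r) / T for r in y]
    cells = [(i, t) for i in range(n) for t in range(T)]
    b_w = (sum((x[i][t] - xm[i]) * (y[i][t] - ym[i]) for i, t in cells)
           / sum((x[i][t] - xm[i]) ** 2 for i, t in cells))
    e2 = [((y[i][t] - ym[i]) - b_w * (x[i][t] - xm[i])) ** 2 for i, t in cells]
    zz = [z[i][t] for i, t in cells]
    # with one regressor plus a constant, R^2 is the squared correlation
    me, mz = sum(e2) / len(e2), sum(zz) / len(zz)
    sez = sum((e - me) * (v - mz) for e, v in zip(e2, zz))
    r2 = sez ** 2 / (sum((e - me) ** 2 for e in e2) * sum((v - mz) ** 2 for v in zz))
    return n * (T - 1) * r2

x, y, z = simulate(300, 4, seed=4)
stat = het_test(x, y, z)
print(stat, stat > 3.841)
```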
Testing for serial correlation

Serial correlation tests apply to macro panels with long time series (over 20–30 years).
Serial correlation is usually not a major problem in micro panels (with very few years).
Positive serial correlation causes the conventional standard errors of the coefficients to be smaller than they actually are, and the $R^2$ to be higher.
A test for first-order serial correlation (Wooldridge's test) is available through the command xtserial.
This is a user-written program; to install and use it, type
```stata
* install the user-written command (needed only once)
ssc install xtserial
* syntax: xtserial depvar indepvars
xtserial y x1
```
Example output:
```text
. xtserial wage schooling exper exper2

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,     544) =     25.058
           Prob > F =      0.0000
```
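For readers without Stata, the following Python sketch reproduces the idea behind Wooldridge's test as it is commonly described: if $\varepsilon_{it}$ is serially uncorrelated, the residuals of the first-differenced regression have first-order autocorrelation $-0.5$, so one regresses these residuals on their own lag and tests whether the coefficient equals $-0.5$, with standard errors clustered by individual. The details (no constants, no finite-sample correction, the simulated design, the names) are assumptions, so the numbers will not match `xtserial` exactly:

```python
import random

def simulate(n, T, rho, seed):
    """y_it = alpha_i + x_it + e_it with AR(1) errors e_it = rho e_{i,t-1} + eta_it."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        alpha = rng.gauss(0, 1)
        e = rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5   # stationary start
        xr, yr = [], []
        for _ in range(T):
            e = rho * e + rng.gauss(0, 1)
            xv = rng.gauss(0, 1)
            xr.append(xv)
            yr.append(alpha + xv + e)
        x.append(xr)
        y.append(yr)
    return x, y

def wooldridge_ar1(x, y):
    """Returns the lagged-residual coefficient c and the Wald F for H0: c = -0.5."""
    dx = [[r[t] - r[t - 1] for t in range(1, len(r))] for r in x]
    dy = [[r[t] - r[t - 1] for t in range(1, len(r))] for r in y]
    # Step 1: first-difference regression (no constant); keep the residuals
    b = (sum(u * v for ru, rv in zip(dx, dy) for u, v in zip(ru, rv))
         / sum(u * u for r in dx for u in r))
    res = [[v - b * u for u, v in zip(ru, rv)] for ru, rv in zip(dx, dy)]
    # Step 2: regress each residual on its own lag (no constant)
    den = sum(r[t - 1] ** 2 for r in res for t in range(1, len(r)))
    c = sum(r[t - 1] * r[t] for r in res for t in range(1, len(r))) / den
    # cluster-robust (by individual) variance of c
    meat = sum(sum(r[t - 1] * (r[t] - c * r[t - 1]) for t in range(1, len(r))) ** 2
               for r in res)
    var_c = meat / den ** 2
    return c, (c + 0.5) ** 2 / var_c

c0, F0 = wooldridge_ar1(*simulate(500, 8, 0.0, seed=8))   # no serial correlation
c1, F1 = wooldridge_ar1(*simulate(500, 8, 0.8, seed=9))   # AR(1) errors, rho = 0.8
print(c0, F0)
print(c1, F1)
```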

3.3 Endogeneity in Static Panel Data Models

Two types:
(1). Due to correlation between some of the x's and the equation error $\varepsilon$.
(2). Due to correlation between some of the x's and $\alpha$.

(1) occurs when an x is simultaneously determined with the dependent variable y. Example: wages and tenure are simultaneously determined; workers stay longer in high-pay jobs, and high-tenure workers earn more.
(2) occurs when an x is influenced by the unobserved factor $\alpha$ but is not simultaneously determined with y. Example: ability is correlated with education in an earnings equation.

Estimation:
1). IV methods in a panel data context.
2). Hausman–Taylor IV.
Possible difficulty: finding suitable instruments!
3). Mundlak (1978) or Chamberlain (1982) method.

3.3.1 Within Group IV: Write the equation as
$$y_{it} = x_{1,it}'\beta_1 + x_{2,it}'\beta_2 + z_{1,i}'\gamma_1 + z_{2,i}'\gamma_2 + \alpha_i + \varepsilon_{it}, \tag{3.3.1}$$
$$\text{cov}(x_{1,it},\varepsilon_{it}) = 0, \qquad \text{cov}(x_{2,it},\varepsilon_{it}) \neq 0.$$
The within group IV estimator (WIV) is obtained as follows.
First write the model in terms of deviations from group means (within transformation) to obtain:
$$y_{it}-\bar y_i = (x_{1,it}-\bar x_{1,i})'\beta_1 + (x_{2,it}-\bar x_{2,i})'\beta_2 + \varepsilon_{it}-\bar\varepsilon_i.$$
The above model can be written as:
$$y_{it}-\bar y_i = \begin{pmatrix}(x_{1,it}-\bar x_{1,i})' & (x_{2,it}-\bar x_{2,i})'\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix} + \varepsilon_{it}-\bar\varepsilon_i.$$
Stacking the preceding model vertically for $t = 1, 2, \ldots, T$ gives:
$$\underbrace{\begin{pmatrix} y_{i1}-\bar y_i\\ y_{i2}-\bar y_i\\ \vdots\\ y_{iT}-\bar y_i\end{pmatrix}}_{\tilde y_i} = \underbrace{\begin{pmatrix}(x_{1,i1}-\bar x_{1,i})' & (x_{2,i1}-\bar x_{2,i})'\\ (x_{1,i2}-\bar x_{1,i})' & (x_{2,i2}-\bar x_{2,i})'\\ \vdots & \vdots\\ (x_{1,iT}-\bar x_{1,i})' & (x_{2,iT}-\bar x_{2,i})'\end{pmatrix}}_{\tilde x_i}\underbrace{\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}}_{\beta} + \underbrace{\begin{pmatrix}\varepsilon_{i1}-\bar\varepsilon_i\\ \varepsilon_{i2}-\bar\varepsilon_i\\ \vdots\\ \varepsilon_{iT}-\bar\varepsilon_i\end{pmatrix}}_{\tilde\varepsilon_i}$$
The above model can be compactly written as:
$$\tilde y_i = \tilde x_i\beta + \tilde\varepsilon_i.$$
Instruments: $q_{2,it}$ should have at least as many elements as $x_{2,it}$; $x_{1,it}$ acts as its own instrument.
Let $q_{it}' = (x_{1,it}',\; q_{2,it}')$ be the vector of IVs. Stacking this vector vertically for $t = 1, \ldots, T$ gives:
$$q_i = \begin{pmatrix} q_{i1}'\\ q_{i2}'\\ \vdots\\ q_{iT}'\end{pmatrix} = \begin{pmatrix} x_{1,i1}' & q_{2,i1}'\\ x_{1,i2}' & q_{2,i2}'\\ \vdots & \vdots\\ x_{1,iT}' & q_{2,iT}'\end{pmatrix}.$$
Stacking $q_i$ vertically for $i = 1, \ldots, n$ gives:
$$q = \begin{pmatrix} q_1\\ q_2\\ \vdots\\ q_n\end{pmatrix}.$$
We use IV estimation with $q$ as instruments.
Assume the number of instruments is $L$, with $L \geq K$.
The population moment conditions are:
$$E[q_i'\tilde\varepsilon_i] = 0 \qquad (L \text{ equations}).$$
Then IV estimation solves:


$$\frac{1}{n}q'\hat{\tilde\varepsilon} = \frac{1}{n}\sum_{i=1}^{n}q_i'(\tilde y_i-\tilde x_i\hat\beta_{WIV}) = m(\hat\beta_{WIV}) = 0.$$

The preceding expression involves $L$ equations in $K$ unknowns.
Note that if $L < K$, we can't solve for our estimates: the model is not identified.
If $L = K$, we have $K$ equations and $K$ unknowns and hence a unique solution (the model is just identified):
$$\hat\beta_{WIV} = \left(\frac{1}{n}\sum_{i=1}^{n}q_i'\tilde x_i\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}q_i'\tilde y_i\right) \qquad \text{(simple IVE)},$$
which is $\hat\beta_{WIV} = (q'\tilde x)^{-1}(q'\tilde y)$.
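However, if $L > K$, then we have more equations than unknowns, and the model is said to be over-identified.
In this case the efficient estimator is the GMM estimator.

GMM chooses $\hat\beta_{WIV}$ to make $m(\hat\beta_{WIV})$ as small as possible using a quadratic loss function. That is, the GMM estimator $\hat\beta_{WIV}$ minimizes
$$Q_n(\beta) = \left[\frac{1}{n}\sum_{i=1}^{n}q_i'(\tilde y_i-\tilde x_i\beta)\right]'W_{qq}^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}q_i'(\tilde y_i-\tilde x_i\beta)\right] = \left[\frac{1}{n}q'(\tilde y-\tilde x\beta)\right]'W_{qq}^{-1}\left[\frac{1}{n}q'(\tilde y-\tilde x\beta)\right].$$
$W_{qq}^{-1}$ is an $L \times L$ matrix of weights which is chosen 'optimally' (i.e., giving the GMM estimator with the smallest variance).

The solution is
$$\hat\beta_{WIV} = \left[\left(\sum_{i=1}^{n}\tilde x_i'q_i\right)W_{qq}^{-1}\left(\sum_{i=1}^{n}q_i'\tilde x_i\right)\right]^{-1}\left[\left(\sum_{i=1}^{n}\tilde x_i'q_i\right)W_{qq}^{-1}\left(\sum_{i=1}^{n}q_i'\tilde y_i\right)\right]$$
$$= (\tilde x'qW_{qq}^{-1}q'\tilde x)^{-1}\tilde x'qW_{qq}^{-1}q'\tilde y = (W_{xq}W_{qq}^{-1}W_{qx})^{-1}W_{xq}W_{qq}^{-1}W_{qy},$$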

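where $W_{xq} = \frac{1}{n}\sum_{i=1}^{n}\tilde x_i'q_i$, $W_{qx} = \frac{1}{n}\sum_{i=1}^{n}q_i'\tilde x_i$ and $W_{qy} = \frac{1}{n}\sum_{i=1}^{n}q_i'\tilde y_i$.
The optimal weighting matrix should be a consistent estimate, up to a multiplicative constant, of the variance of the sample orthogonality conditions
$$\frac{1}{n}q'\hat{\tilde\varepsilon} = \frac{1}{n}\sum_{i=1}^{n}q_i'(\tilde y_i-\tilde x_i\hat\beta_{WIV}) = m(\hat\beta_{WIV}).$$
That is, $W_{qq}$ is the variance
$$\text{var}\left(\sqrt{n}\,m(\beta)\right) = \text{var}\left(\frac{1}{\sqrt{n}}q'\tilde\varepsilon\right) = \frac{1}{n}E\left[q'\tilde\varepsilon\tilde\varepsilon'q\right].$$
One can show that the GMM estimator is consistent and
$$\sqrt{n}(\hat\beta_{WIV}-\beta) \xrightarrow{d} N(0, V), \qquad V = (W_{xq}W_{qq}^{-1}W_{qx})^{-1}.$$
Estimate $W_{qq}$ by $\frac{1}{n}\sum_{i=1}^{n}q_i'\hat{\tilde\varepsilon}_i\hat{\tilde\varepsilon}_i'q_i$. This is the estimated variance of $\frac{1}{\sqrt{n}}q'\tilde\varepsilon$.

NOTE:
We can't estimate the $\gamma$'s (the within transformation removes the time-invariant $z_i$).
We can do the RE IV, but it will need a stronger assumption.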
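A compact sketch of the within-group IV/GMM estimator with one endogenous regressor ($K = 1$, no $x_1$ variables) and two excluded instruments ($L = 2$), so the model is over-identified. It first uses $W_{qq} = q'q/n$ (which gives 2SLS) and then the estimate $\frac{1}{n}\sum_i q_i'\hat{\tilde\varepsilon}_i\hat{\tilde\varepsilon}_i'q_i$ described above. The data-generating process and all names are illustrative assumptions:

```python
import random

def within(panel):
    """Subtract each individual's time mean (within transformation)."""
    return [[v - sum(r) / len(r) for v in r] for r in panel]

def simulate(n, T, seed):
    """y_it = 1.0 * x_it + alpha_i + eps_it; x_it is endogenous through the shared
    shock u_it, and q1_it, q2_it are excluded exogenous instruments."""
    rng = random.Random(seed)
    y, x, q1, q2 = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        ry, rx, r1, r2 = [], [], [], []
        for _ in range(T):
            z1, z2, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            xv = a + 0.8 * z1 + 0.5 * z2 + u + rng.gauss(0, 1)
            ry.append(xv + a + u + rng.gauss(0, 1))   # eps_it = u_it + noise
            rx.append(xv); r1.append(z1); r2.append(z2)
        y.append(ry); x.append(rx); q1.append(r1); q2.append(r2)
    return y, x, q1, q2

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def wiv_gmm(y, x, q1, q2):
    """Returns (step-1 2SLS, step-2 efficient GMM) estimates of beta."""
    yt, xt, z1, z2 = within(y), within(x), within(q1), within(q2)
    n, T = len(y), len(y[0])
    cells = [(i, t) for i in range(n) for t in range(T)]
    Wqx = [sum(z[i][t] * xt[i][t] for i, t in cells) / n for z in (z1, z2)]
    Wqy = [sum(z[i][t] * yt[i][t] for i, t in cells) / n for z in (z1, z2)]

    def beta(Wqq):
        Wi = inv2(Wqq)
        g = [Wi[0][0] * Wqx[0] + Wi[0][1] * Wqx[1],   # Wqq^{-1} Wqx
             Wi[1][0] * Wqx[0] + Wi[1][1] * Wqx[1]]
        return (g[0] * Wqy[0] + g[1] * Wqy[1]) / (g[0] * Wqx[0] + g[1] * Wqx[1])

    # Step 1: Wqq = q'q / n
    s11 = sum(z1[i][t] ** 2 for i, t in cells) / n
    s12 = sum(z1[i][t] * z2[i][t] for i, t in cells) / n
    s22 = sum(z2[i][t] ** 2 for i, t in cells) / n
    b1 = beta([[s11, s12], [s12, s22]])
    # Step 2: Wqq = (1/n) sum_i q_i' e_i e_i' q_i with step-1 residuals
    W = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        e = [yt[i][t] - b1 * xt[i][t] for t in range(T)]
        s = (sum(z1[i][t] * e[t] for t in range(T)), sum(z2[i][t] * e[t] for t in range(T)))
        for j in range(2):
            for k in range(2):
                W[j][k] += s[j] * s[k] / n
    return b1, beta(W)

y, x, q1, q2 = simulate(500, 5, seed=7)
xt, yt = within(x), within(y)
b_ols = (sum(u * v for ru, rv in zip(xt, yt) for u, v in zip(ru, rv))
         / sum(u * u for r in xt for u in r))
b_2sls, b_gmm = wiv_gmm(y, x, q1, q2)
print(b_ols, b_2sls, b_gmm)   # true beta = 1; within OLS ignores the endogeneity
```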
3.3.2 Hausman–Taylor's IV Estimator

Recall that the REs model is based on the assumption that the unobserved person-specific effects, $\alpha_i$, are uncorrelated with the included variables, $x_{it}$.
This assumption is a major shortcoming of the model.
However, the REs treatment does allow the model to contain observed time-invariant characteristics, such as demographic characteristics, while the FEs model does not: if present, they are simply absorbed into the FEs.
Hausman and Taylor's (1981) estimator for the REs model suggests a way to overcome the first of these while accommodating the second.
Some regressors are correlated with the $\alpha_i$.
An important example is the earnings equation:
$$y_{it} = x_{1,it}'\beta_1 + x_{2,it}'\beta_2 + z_{1,i}'\gamma_1 + z_{2,i}'\gamma_2 + \alpha_i + \varepsilon_{it}. \tag{3.3.2}$$
In this formulation, all individual effects denoted by $z_i$ are observed.
Unobserved individual effects are contained in the person-specific random term, $\alpha_i$.
Hausman and Taylor define four sets of observed variables in the model:
$x_{1,it}$ is $K_1$ time-varying variables that are uncorrelated with $\alpha_i$,
$z_{1,i}$ is $L_1$ variables that are time-invariant and uncorrelated with $\alpha_i$,
$x_{2,it}$ is $K_2$ variables that are time-varying and are correlated with $\alpha_i$,
$z_{2,i}$ is $L_2$ variables that are time-invariant and are correlated with $\alpha_i$.

But assume that
$$E[\alpha_i\mid x_{1,it}, z_{1,i}] = 0 \quad\text{though}\quad E[\alpha_i\mid x_{2,it}, z_{2,i}] \neq 0,$$
$$\text{var}(\alpha_i\mid x_{1,it}, z_{1,i}, x_{2,it}, z_{2,i}) = \sigma^2_\alpha,$$
$$\text{cov}(\varepsilon_{it}, \alpha_i\mid x_{1,it}, z_{1,i}, x_{2,it}, z_{2,i}) = 0,$$
$$\text{var}(\varepsilon_{it}+\alpha_i\mid x_{1,it}, z_{1,i}, x_{2,it}, z_{2,i}) = \sigma^2 = \sigma^2_\alpha + \sigma^2_\varepsilon,$$
$$\text{corr}(\varepsilon_{it}+\alpha_i, \varepsilon_{is}+\alpha_i\mid x_{1,it}, z_{1,i}, x_{2,it}, z_{2,i}) = \rho = \sigma^2_\alpha/\sigma^2.$$
By construction, any OLS or GLS estimators of this model are inconsistent.
Hausman and Taylor have proposed an IVs estimator that uses only the information within the model.

The estimation strategy is based on the following logic. First, take deviations from group means to get
$$y_{it}-\bar y_i = (x_{1,it}-\bar x_{1,i})'\beta_1 + (x_{2,it}-\bar x_{2,i})'\beta_2 + \varepsilon_{it}-\bar\varepsilon_i. \tag{3.3.3}$$
In this model both parts of $\beta$ can be consistently estimated by least squares, in spite of the correlation between $x_2$ and $\alpha$.
In the original model, Hausman and Taylor show that the group mean deviations can be used as $(K_1+K_2)$ IVs for estimation of $(\beta, \gamma)$. That is the implication of (3.3.3).
Because $z_1$ is uncorrelated with the disturbances, it can likewise serve as a set of $L_1$ instrumental variables.
That leaves a need for $L_2$ instrumental variables.
The authors show that the group means of $x_1$ can serve as these remaining instruments, and the model will be identified if $K_1 \geq L_2$.
For identification purposes, then, $K_1 \geq L_2$.
The authors propose the following set of steps for consistent and efficient estimation:
Step 1: Obtain $\hat\beta_{1w}$ and $\hat\beta_{2w}$, the LSDV (FEs) estimator of $\beta = (\beta_1', \beta_2')'$ based on $x_1$ and $x_2$. The residual variance estimator from this step is a consistent estimator of $\sigma^2_\varepsilon$.
Step 2: Write the equation as
$$\bar y_i - \bar x_i'\hat\beta_w = z_{1,i}'\gamma_1 + z_{2,i}'\gamma_2 + \text{error}.$$
Use the instruments $(z_{1,i}, \bar x_{1,i})$ to estimate the $\gamma$ coefficients (OLS is inconsistent here).
Step 3: Obtain the variance components as before: $\hat\sigma^2_\varepsilon$ and $\hat\sigma^2_\varepsilon + T\hat\sigma^2_\alpha$ (this assumes that the RE specification is correct!).
Step 4: Use the Step 3 estimates to transform the variables $y_{it}$ to $y^*_{it}$ and $w_{it}' = (x_{1,it}', x_{2,it}', z_{1,i}', z_{2,i}')$ to $w^{*\prime}_{it}$.
Collect these $nT$ observations in the rows of the data matrix $W^*$.
The transformed variables for GLS are, as before when we first fit the REs model,
$$w^{*\prime}_{it} = w_{it}' - \hat\theta\,\bar w_i' \quad\text{and}\quad y^*_{it} = y_{it} - \hat\theta\,\bar y_i,$$
where $\hat\theta$ denotes the sample estimate of $\theta$.
The transformed data are collected in the rows of the data matrix $W^*$ and in the column vector $y^*$.

Note: In the case of time-invariant variables in $w_{it}$, the group mean is the original variable, and the transformation just multiplies the variable by $1-\hat\theta$.
The instrumental variables are
$$v_{it}' = \left[(x_{1,it}-\bar x_{1,i})',\; (x_{2,it}-\bar x_{2,i})',\; z_{1,i}',\; \bar x_{1,i}'\right].$$
These are stacked in the rows of the $nT \times (K_1+K_2+L_1+K_1)$ matrix $V$.

Note: For the third and fourth sets of instruments, time-invariant variables and group means are repeated for each member of the group.

Step 5: Estimate the RE-IV. The instrumental variable estimator would be
$$\begin{pmatrix}\hat\beta\\ \hat\gamma\end{pmatrix}_{IV} = \left[(W^{*\prime}V)(V'V)^{-1}(V'W^*)\right]^{-1}\left[(W^{*\prime}V)(V'V)^{-1}(V'y^*)\right]. \tag{3.3.4}$$
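The data preparation in Steps 4 and 5 can be sketched as follows for one individual with scalar $x_1$, $x_2$ and $z_1$ ($K_1 = K_2 = L_1 = 1$). The sketch assumes the usual random-effects definition $\theta = 1-\sqrt{\sigma^2_\varepsilon/(\sigma^2_\varepsilon+T\sigma^2_\alpha)}$ from the earlier RE section; the function names are hypothetical:

```python
import math

def theta(s2_eps, s2_alpha, T):
    """RE quasi-demeaning weight (assumed to match the earlier RE section)."""
    return 1.0 - math.sqrt(s2_eps / (s2_eps + T * s2_alpha))

def quasi_demean(series, th):
    """Step 4 transformation for one individual: w_it - theta * wbar_i."""
    m = sum(series) / len(series)
    return [v - th * m for v in series]

def ht_instruments(x1, x2, z1):
    """Rows v_it' = [x1_it - x1bar_i, x2_it - x2bar_i, z1_i, x1bar_i] for one
    individual; z1 and the group mean are repeated in every row."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    return [[a - m1, b - m2, z1, m1] for a, b in zip(x1, x2)]

def ht_identified(K1, L2):
    """Order condition: at least as many x1 group means as endogenous z2's."""
    return K1 >= L2

th = theta(1.0, 0.75, 4)
print(th, quasi_demean([3.0, 3.0, 3.0, 3.0], th))   # time-invariant: 3 * (1 - theta)
print(ht_instruments([1.0, 3.0], [2.0, 2.0], 5.0))
print(ht_identified(2, 1), ht_identified(1, 2))
```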
Tests for over-identification:
$$\hat q = \hat\beta_w - \hat\beta_{HT}, \qquad \hat q'\left[\widehat{\text{var}}(\hat\beta_w) - \widehat{\text{var}}(\hat\beta_{HT})\right]^{-1}\hat q \;\overset{asy}{\sim}\; \chi^2(K),$$
where here $K = \min\{K_1 - L_2,\; nT - K_1 - K_2\}$.

Notes: The estimator has been found to be sensitive to the assumptions of exogeneity of the different sets of x and z variables.

3.3.3 Mundlak (1978) Approach

Mundlak's (1978) approach suggests the specification
$$E[\alpha_i\mid X_i] = \bar x_i'\pi.$$
Substituting this in the REs model, we obtain
$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it} = x_{it}'\beta + \bar x_i'\pi + u_i + \varepsilon_{it},$$
where $u_i = \alpha_i - E[\alpha_i\mid X_i]$.
Estimating this equation, which includes $\bar x_i$, by GLS (or by OLS) gives the WG estimator for $\beta$. That is, estimation of
$$y_{it} = x_{it}'\beta + \bar x_i'\pi + u_i + \varepsilon_{it}$$
gives rise to $\hat\beta_{GLS} = \hat\beta_w$ and $\hat\pi = \hat\beta_b - \hat\beta_w$.
What does this mean? The BLUE of the REs model, when the correlation is allowed for, gives you the WG estimate.
We can test for $\pi = 0$: this is the Wu–Hausman test for strict exogeneity (discussed in Econ 607).
Mundlak's approach is often used as a compromise between the fixed and random effects models.
One side benefit of the specification is that it provides another approach to the Hausman test.
As the model is formulated above, the difference between the "fixed effects" model and the "random effects" model is the nonzero $\pi$.
A statistical test of the null hypothesis that $\pi$ equals zero should provide an alternative approach to the two methods suggested earlier.
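A numerical check of the Mundlak result, as a sketch with one regressor and simulated correlated effects (names and data are assumptions): OLS of $y_{it}$ on a constant, $x_{it}$ and $\bar x_i$ reproduces $\hat\beta_w$, and the coefficient on $\bar x_i$ equals $\hat\beta_b - \hat\beta_w$, because in a balanced panel $x_{it}-\bar x_i$ is orthogonal to $\bar x_i$:

```python
import random

def simulate(n, T, seed):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        alpha = 1.5 * a + rng.gauss(0, 1)      # alpha_i correlated with x_it
        row = [a + rng.gauss(0, 1) for _ in range(T)]
        x.append(row)
        y.append([2 * v + alpha + rng.gauss(0, 1) for v in row])
    return x, y

def mundlak(x, y):
    """OLS of y_it on a constant, x_it and xbar_i; returns (beta, pi, beta_w, beta_b)."""
    n, T = len(x), len(x[0])
    xm, ym = [sum(r) / T for r in x], [sum(r) / T for r in y]
    gx, gy = sum(xm) / n, sum(ym) / n
    cells = [(i, t) for i in range(n) for t in range(T)]
    X1 = [x[i][t] - gx for i, t in cells]      # demeaning absorbs the constant
    X2 = [xm[i] - gx for i, t in cells]
    Y = [y[i][t] - gy for i, t in cells]
    a11 = sum(u * u for u in X1)
    a12 = sum(u * v for u, v in zip(X1, X2))
    a22 = sum(v * v for v in X2)
    c1 = sum(u * w for u, w in zip(X1, Y))
    c2 = sum(v * w for v, w in zip(X2, Y))
    det = a11 * a22 - a12 * a12
    beta, pi = (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det
    # within and between estimators for comparison
    b_w = (sum((x[i][t] - xm[i]) * (y[i][t] - ym[i]) for i, t in cells)
           / sum((x[i][t] - xm[i]) ** 2 for i, t in cells))
    b_b = (sum((xm[i] - gx) * (ym[i] - gy) for i in range(n))
           / sum((v - gx) ** 2 for v in xm))
    return beta, pi, b_w, b_b

x, y = simulate(200, 5, seed=6)
beta, pi, b_w, b_b = mundlak(x, y)
print(beta, b_w)        # identical up to rounding
print(pi, b_b - b_w)    # identical up to rounding
```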

3.3.4 Chamberlain (1982, 1984) Approach

Chamberlain's approach assumes (as it requires) a balanced panel of $T$ observations per group.
For purposes of this development, assume $T = 3$. The generalization will be obvious at the conclusion.
Then the projection suggested by Chamberlain is
$$\alpha_i = \alpha + x_{i1}'\pi_1 + x_{i2}'\pi_2 + x_{i3}'\pi_3 + v_i, \qquad v_i \sim iid,$$
where now, by construction, $v_i$ is orthogonal to $x_{it}$.
Substituting this in the REs model, we obtain
$$y_{it} = \alpha + x_{i1}'\pi_1 + x_{i2}'\pi_2 + x_{i3}'\pi_3 + x_{it}'\beta + v_i + \varepsilon_{it}.$$
Use minimum distance methods. This gives WG for $\beta$.
3.4 Dynamic Panel Data Model

Among the major advantages of panel data is the ability to model individual dynamics.
Many economic models suggest that current behaviour depends on past behaviour (persistence, habit formation, partial adjustment, etc.), so we would like to estimate a dynamic model on an individual level.

3.4.1 Introduction to Dynamic Panel Data Model

Consider the linear dynamic model with exogenous variables and a lagged dependent variable, that is
$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it},$$
where it is assumed that $\varepsilon_{it}$ is IID$(0, \sigma^2_\varepsilon)$.
Here $y_{i,t-1}$ will depend upon $\alpha_i$, irrespective of the way we treat $\alpha_i$.
To illustrate the problems that this causes, we first consider the case where there are no exogenous variables included and the model reads
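$$y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}, \qquad |\gamma| < 1. \tag{3.4.2}$$
Assume that we have observations on $y_{it}$ for periods $t = 0, 1, \ldots, T$.
The fixed effects estimator for $\gamma$ is given by
$$\hat\gamma_{FE} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{it}-\bar y_i)(y_{i,t-1}-\bar y_{i,-1})}{\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{i,t-1}-\bar y_{i,-1})^2}, \tag{3.4.3}$$
where $\bar y_i = \frac{1}{T}\sum_{t=1}^{T}y_{it}$ and $\bar y_{i,-1} = \frac{1}{T}\sum_{t=1}^{T}y_{i,t-1}$.

To analyze the properties of $\hat\gamma_{FE}$ we can substitute (3.4.2) into (3.4.3) to obtain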
$$\hat\gamma_{FE} = \gamma + \frac{(1/nT)\sum_{i=1}^{n}\sum_{t=1}^{T}(\varepsilon_{it}-\bar\varepsilon_i)(y_{i,t-1}-\bar y_{i,-1})}{(1/nT)\sum_{i=1}^{n}\sum_{t=1}^{T}(y_{i,t-1}-\bar y_{i,-1})^2}. \tag{3.4.4}$$
This estimator, however, is biased and inconsistent for $n \to \infty$ and fixed $T$, as the last term on the right-hand side of (3.4.4) does not have expectation zero and does not converge to zero as $n$ goes to infinity. In particular, it can be shown that
$$\underset{n\to\infty}{\text{plim}}\;\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}(\varepsilon_{it}-\bar\varepsilon_i)(y_{i,t-1}-\bar y_{i,-1}) = -\frac{\sigma^2_\varepsilon}{T^2}\cdot\frac{(T-1)-T\gamma+\gamma^T}{(1-\gamma)^2} \neq 0. \tag{3.4.5}$$
Thus, for fixed $T$ we have an inconsistent estimator.

Note that this inconsistency is not caused by what we assumed about the $\alpha_i$'s, as these are eliminated in estimation.
The problem is that the within-transformed lagged dependent variable is correlated with the within-transformed error.
If $T \to \infty$, (3.4.5) converges to 0, so that the FEs estimator is consistent for $\gamma$ if both $T \to \infty$ and $n \to \infty$.
One could think that the asymptotic bias for fixed $T$ is quite small and therefore not a real problem.
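This is certainly not the case, as for finite $T$ the bias can hardly be ignored.
For example, if the true value of $\gamma$ equals 0.5 and $\sigma^2_\varepsilon = 1$, it can easily be computed from (3.4.5) that (for $n \to \infty$, assuming the initial observation is available) the probability limit of the numerator of the bias term in (3.4.4) is
$$-0.250 \text{ when } T = 2, \qquad \approx -0.278 \text{ when } T = 3, \qquad \approx -0.266 \text{ when } T = 4.$$
$\text{plim}(\hat\gamma_{FE}-\gamma)$ is this quantity divided by the probability limit of the denominator in (3.4.4), so even for moderate values of $T$ the bias is substantial.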
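The right-hand side of (3.4.5) is easy to evaluate. The sketch below (hypothetical names) computes it for $\gamma = 0.5$ and $\sigma^2_\varepsilon = 1$, and checks it against a Monte Carlo average of the left-hand side, assuming a stationary start $y_{i0}$:

```python
import random

def nickell_numerator(gamma, T, sigma2=1.0):
    """Right-hand side of (3.4.5)."""
    return -sigma2 / T ** 2 * ((T - 1) - T * gamma + gamma ** T) / (1 - gamma) ** 2

for T in (2, 3, 4, 10):
    print(T, round(nickell_numerator(0.5, T), 3))

def simulated_numerator(n, T, gamma, seed):
    """Monte Carlo: (1/nT) sum_i sum_t (eps_it - epsbar_i)(y_{i,t-1} - ybar_{i,-1})."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        alpha = rng.gauss(0, 1)
        y = [alpha / (1 - gamma) + rng.gauss(0, 1) / (1 - gamma ** 2) ** 0.5]
        eps = []
        for _ in range(T):
            e = rng.gauss(0, 1)
            eps.append(e)
            y.append(gamma * y[-1] + alpha + e)
        lag = y[:-1]                            # y_i0, ..., y_{i,T-1}
        me, ml = sum(eps) / T, sum(lag) / T
        total += sum((e - me) * (l - ml) for e, l in zip(eps, lag))
    return total / (n * T)

print(simulated_numerator(20000, 3, 0.5, seed=6))   # close to -0.278
```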

Fortunately, there are relatively easy ways to avoid these biases.

To solve the inconsistency problem, we first of all take a first-difference transformation to eliminate the individual effects $\alpha_i$. This gives
$$y_{it}-y_{i,t-1} = \gamma(y_{i,t-1}-y_{i,t-2}) + (\varepsilon_{it}-\varepsilon_{i,t-1}), \qquad t = 2, \ldots, T. \tag{3.4.6}$$
If we estimate this by OLS we do not get a consistent estimator for $\gamma$, because $y_{i,t-1}$ and $\varepsilon_{i,t-1}$ are, by definition, correlated, even if $T \to \infty$.

However, this transformed specification suggests an IVs approach. For example, $y_{i,t-2}$ is correlated with $y_{i,t-1}-y_{i,t-2}$ but not with $\varepsilon_{i,t-1}$, unless $\varepsilon_{it}$ exhibits autocorrelation (which we excluded by assumption).
This suggests an IVs estimator for $\gamma$ as
$$\hat\gamma_{IV} = \frac{\sum_{i=1}^{n}\sum_{t=2}^{T}y_{i,t-2}(y_{it}-y_{i,t-1})}{\sum_{i=1}^{n}\sum_{t=2}^{T}y_{i,t-2}(y_{i,t-1}-y_{i,t-2})}. \tag{3.4.7}$$
A necessary condition for consistency of this estimator is that
$$\text{plim}\;\frac{1}{n(T-1)}\sum_{i=1}^{n}\sum_{t=2}^{T}(\varepsilon_{it}-\varepsilon_{i,t-1})\,y_{i,t-2} = 0 \tag{3.4.8}$$
for either $T$, or $n$, or both going to infinity.

The estimator in (3.4.7) is one of the estimators proposed by Anderson and Hsiao (1981).
They also proposed an alternative, where $y_{i,t-2}-y_{i,t-3}$ is used as an instrument. This gives


$$\hat\gamma^{(2)}_{IV} = \frac{\sum_{i=1}^{n}\sum_{t=3}^{T}(y_{i,t-2}-y_{i,t-3})(y_{it}-y_{i,t-1})}{\sum_{i=1}^{n}\sum_{t=3}^{T}(y_{i,t-2}-y_{i,t-3})(y_{i,t-1}-y_{i,t-2})}, \tag{3.4.9}$$
which is consistent (under regularity conditions) if
$$\text{plim}\;\frac{1}{n(T-2)}\sum_{i=1}^{n}\sum_{t=3}^{T}(\varepsilon_{it}-\varepsilon_{i,t-1})(y_{i,t-2}-y_{i,t-3}) = 0. \tag{3.4.10}$$
Consistency of both estimators is guaranteed by the assumption that $\varepsilon_{it}$ has no autocorrelation.
Note that the second IVs estimator requires an additional lag to construct the instrument, such that the effective number of observations used in estimation is reduced.
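A small simulation sketch comparing the fixed effects estimator (3.4.3) with the two Anderson–Hsiao estimators (3.4.7) and (3.4.9). The design (stationary start, $\gamma = 0.5$, $\sigma_\alpha = 0.5$, $T = 5$) and all names are assumptions:

```python
import random

def simulate(n, T, gamma, sd_alpha, seed):
    """y_it = gamma y_{i,t-1} + alpha_i + eps_it for t = 1..T, stationary y_i0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.gauss(0, sd_alpha)
        y = [a / (1 - gamma) + rng.gauss(0, 1) / (1 - gamma ** 2) ** 0.5]
        for _ in range(T):
            y.append(gamma * y[-1] + a + rng.gauss(0, 1))
        data.append(y)
    return data

def fe_estimator(data):
    """(3.4.3): within estimator using t = 1..T."""
    num = den = 0.0
    for y in data:
        T = len(y) - 1
        cur, lag = y[1:], y[:-1]
        mc, ml = sum(cur) / T, sum(lag) / T
        num += sum((c - mc) * (l - ml) for c, l in zip(cur, lag))
        den += sum((l - ml) ** 2 for l in lag)
    return num / den

def ah_level(data):
    """(3.4.7): instrument y_{i,t-2} for the differenced equation, t = 2..T."""
    num = den = 0.0
    for y in data:
        for t in range(2, len(y)):
            num += y[t - 2] * (y[t] - y[t - 1])
            den += y[t - 2] * (y[t - 1] - y[t - 2])
    return num / den

def ah_diff(data):
    """(3.4.9): instrument y_{i,t-2} - y_{i,t-3}, t = 3..T."""
    num = den = 0.0
    for y in data:
        for t in range(3, len(y)):
            z = y[t - 2] - y[t - 3]
            num += z * (y[t] - y[t - 1])
            den += z * (y[t - 1] - y[t - 2])
    return num / den

data = simulate(5000, 5, 0.5, 0.5, seed=11)
print(fe_estimator(data), ah_level(data), ah_diff(data))   # true gamma = 0.5
```

In this design the FE estimate lands far below 0.5, while both IV estimates are close to it; the difference-instrument version is noticeably less precise, reflecting the weaker instrument and the lost observations.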
Consider
$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \qquad |\gamma| < 1. \tag{3.4.11}$$
Assume that we have observations on $y_{it}$ for periods $t = 0, 1, \ldots, T$.
Assumption (3.4.12) below is a weak exogeneity assumption:
$$E(y_{it}\mid x_{it}, y_{i,t-1}, \ldots, x_{i2}, y_{i1}, \alpha_i) = E(y_{it}\mid x_{it}, y_{i,t-1}, \alpha_i) = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i. \tag{3.4.12}$$
Note 1: The above does not require $\varepsilon_{it}$ to be uncorrelated with the future values of x. That is, feedback is allowed from $y_{it}$ to $x_{i,t+1}, \ldots, x_{iT}$. If necessary, we can impose additional orthogonality conditions to make $\varepsilon$ uncorrelated with past, current and future x's.
Example: Static model with feedback [x is weakly exogenous]:
$$y_{it} = \delta x_{it} + \alpha_i + \varepsilon_{it} \quad\text{and}\quad x_{it} = \beta y_{i,t-1} + \phi\alpha_i + v_{it}.$$
Here, the strict exogeneity assumption for x will not work!
Consider $E(x_{i,t+1}\varepsilon_{it}) = \beta E(y_{it}\varepsilon_{it}) = \beta E(\varepsilon^2_{it}) \neq 0$, even if the other terms are 0.

Note 2: With short panels we cannot calculate individual time series autocovariances using something like $\frac{1}{T-1}\sum_{t=1}^{T}y_t y_{t-1}$. What we calculate is the cross-sectional autocovariance, using $\frac{1}{n}\sum_{i=1}^{n}y_{i1}y_{i2}$, for example.
Note 3: Asymptotics are based on large $n$ and fixed $T$, so there is no need to bother with assumptions about $\gamma$. The identification of parameters and the small-sample properties of the estimators depend on the time series properties of the series. The influence of the initial observation cannot be ignored when $T$ is small.

Note 4: Persistence comes from two sources: heterogeneity or state dependence. We have true state dependence when $\gamma \neq 0$. When $\gamma = 0$ we still observe non-zero correlation (see the RE model). When this happens, we say that we have spurious state dependence.
Note 5: Identification when $T = 2$ (it becomes a cross-section model). Consider
$$\text{Model 1: } y_{it} = \alpha + u_{it}, \quad u_{it} = \rho u_{i,t-1} + \varepsilon_{it} \qquad\text{versus}\qquad \text{Model 2: } y_{it} = \alpha_i + u_{it}.$$
In Model 1:
$$\text{Corr}(y_{i1}, y_{i2}) = \text{Corr}(\alpha + u_{i1}, \alpha + u_{i2}) = \rho.$$
In Model 2:
$$\text{Corr}(y_{i1}, y_{i2}) = \sigma^2_\alpha/(\sigma^2_\alpha + \sigma^2_u).$$
So we can't distinguish between $\rho = 0.4$ and $\sigma^2_\alpha/(\sigma^2_\alpha + \sigma^2_u) = 0.4$, for example. We need at least $T = 3$ here.

Note 6: $y_{i,t-1}$ and $\alpha_i$ are correlated.
Note 7: Bias calculations for various estimators are based on assumptions about the initial condition of the process.
Note 8: Estimation: the WG transformation to eliminate the $\alpha_i$ will not help. The lagged dependent variable in mean deviations will be correlated with $\varepsilon_{it}$ in mean deviations via the means. Thus WG is inconsistent (Nickell bias).
The inconsistency vanishes as $T \to \infty$, i.e., we need large $T$ as well as large $n$. With large $\sigma^2_\alpha$ and fixed $y_{i1}$ the bias in the WG estimate will be small. WG is biased downwards.
Note 9: Pooled OLS is also biased and inconsistent. Here the bias is upwards.
Note 10: The FD (OLS) estimate is also biased downwards.
Note 11: The MLE depends on the assumption about $y_{i1}$ [fixed or stochastic, correlated with $\alpha_i$ or not, stationarity].
Note 12: So it is better to use IV.
3.4.2 Estimation of Dynamic Panel Data Model

Take first differences and then use IVs estimation.
Consider the simpler model where there are no exogenous variables included:
$$\Delta y_{it} = \gamma\,\Delta y_{i,t-1} + \Delta\varepsilon_{it}, \qquad t = 2, \ldots, T. \tag{3.4.13}$$
The choice of instruments depends on the assumptions we make.
Assumption (3.4.12), which here reads $E(\varepsilon_{it}\mid y_{i,t-1}, \ldots, y_{i1}, \alpha_i) = 0$, implies the following:
1). $E(y_{i,t-1}\varepsilon_{it}) = 0$.
2). $E(y_{is}\varepsilon_{it}) = 0$, $s = 1, \ldots, t-2$, $t > 2$ [note: $y_{it} - \gamma y_{i,t-1} - \alpha_i = \varepsilon_{it}$].
3). $E(\varepsilon_{it}\varepsilon_{i,t-j}\mid y_{i1}, \ldots, y_{i,t-1}, \alpha_i) = 0$, $j > 0$ (serially uncorrelated conditionally).
4). $E(\varepsilon_{it}\varepsilon_{i,t-j}) = 0$ (serially uncorrelated unconditionally too).
5). $E(\alpha_i\varepsilon_{it}) = 0$ for all $t$.

Thus, $(y_{i1}, \ldots, y_{i,t-2})$ are valid instruments at time $t$.

Instruments (Anderson and Hsiao (1981)): $(y_{i,t-2} - y_{i,t-3})$ or $y_{i,t-2}$ for $\Delta y_{i,t-1}$ in
$$\Delta y_{it} = \gamma\,\Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it}. \tag{3.4.14}$$
In practice, the choice will obviously depend on the relevant correlations.

Arellano and Bond Estimators
Arellano and Bond (1991) suggest that the list of instruments can be extended by exploiting additional moment conditions and letting their number vary with $t$.
To do this, they keep $T$ fixed. For example, when $T = 2$, we have
$$E\{(\varepsilon_{i2}-\varepsilon_{i1})y_{i0}\} = 0$$
as the moment condition for $t = 2$. For $t = 3$, we have
$$E\{(\varepsilon_{i3}-\varepsilon_{i2})y_{i1}\} = 0,$$
but it also holds that
$$E\{(\varepsilon_{i3}-\varepsilon_{i2})y_{i0}\} = 0.$$
For period $t = 4$, we have three moment conditions and three valid instruments:
$$E\{(\varepsilon_{i4}-\varepsilon_{i3})y_{i0}\} = 0, \qquad E\{(\varepsilon_{i4}-\varepsilon_{i3})y_{i1}\} = 0, \qquad E\{(\varepsilon_{i4}-\varepsilon_{i3})y_{i2}\} = 0.$$
All these moment conditions can be exploited in a
GMM framework.
To introduce the GMM estimator define, for general sample size $T$,

$$\Delta\varepsilon_i = \begin{pmatrix} \varepsilon_{i2} - \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iT} - \varepsilon_{i,T-1} \end{pmatrix} \qquad (3.4.15)$$

as the vector of transformed error terms, and

$$Z_i = \begin{pmatrix} [y_{i0}] & 0 & \cdots & 0 \\ 0 & [y_{i0}, y_{i1}] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & [y_{i0}, y_{i1}, \ldots, y_{i,T-2}] \end{pmatrix} \qquad (3.4.16)$$

as the matrix of instruments.

Each row in the matrix $Z_i$ contains the instruments that are valid for a given period. Consequently, the set of all moment conditions can be written concisely as

$$E\{Z_i'\Delta\varepsilon_i\} = 0 \qquad (3.4.17)$$

The matrix of instruments $Z_i$ has $T-1$ rows and $\tfrac12 T(T-1)$ columns. Note that these are $1 + 2 + 3 + \cdots + (T-1) = \tfrac12 T(T-1)$ conditions.
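As a sketch of the block-diagonal structure in (3.4.16), the hypothetical helper below builds $Z_i$ for one individual from $y_{i0},\ldots,y_{iT}$ and confirms the $(T-1) \times \tfrac12 T(T-1)$ shape.

```python
import numpy as np


def ab_instruments(y_i):
    """Arellano-Bond instrument matrix Z_i of (3.4.16) for one individual.

    y_i holds y_{i0}, ..., y_{iT}. The row for period t (t = 2..T) contains
    the levels y_{i0}, ..., y_{i,t-2} in its own diagonal block.
    """
    y_i = np.asarray(y_i, dtype=float)
    T = len(y_i) - 1
    Z = np.zeros((T - 1, T * (T - 1) // 2))
    col = 0
    for t in range(2, T + 1):
        k = t - 1                       # number of valid instruments in period t
        Z[t - 2, col:col + k] = y_i[:k]
        col += k
    return Z


Z = ab_instruments([1.0, 2.0, 3.0, 4.0, 5.0])   # T = 4
print(Z.shape)   # (3, 6)
print(Z)
```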
To derive the GMM estimator, write this as

$$E\{Z_i'(\Delta y_i - \gamma\,\Delta y_{i,-1})\} = 0 \qquad (3.4.18)$$

where

$$\Delta y_i = \begin{pmatrix} y_{i2} - y_{i1} \\ \vdots \\ y_{iT} - y_{i,T-1} \end{pmatrix} \quad\text{and}\quad \Delta y_{i,-1} = \begin{pmatrix} y_{i1} - y_{i0} \\ \vdots \\ y_{i,T-1} - y_{i,T-2} \end{pmatrix}$$

Because the number of moment conditions will exceed the number of unknown coefficients, we estimate $\gamma$ by minimizing a quadratic expression in terms of the corresponding sample moments, that is

$$\min_\gamma \left[\frac1n\sum_{i=1}^n Z_i'(\Delta y_i - \gamma\,\Delta y_{i,-1})\right]' W_n \left[\frac1n\sum_{i=1}^n Z_i'(\Delta y_i - \gamma\,\Delta y_{i,-1})\right] \qquad (3.4.19)$$


where $W_n$ is a symmetric positive definite weighting matrix.⁶

Differentiating this with respect to $\gamma$ and solving for $\gamma$ gives

$$\hat\gamma_{GMM} = \left(\left(\sum_{i=1}^n \Delta y_{i,-1}'Z_i\right) W_n \left(\sum_{i=1}^n Z_i'\Delta y_{i,-1}\right)\right)^{-1} \left(\sum_{i=1}^n \Delta y_{i,-1}'Z_i\right) W_n \left(\sum_{i=1}^n Z_i'\Delta y_i\right) \qquad (3.4.20)$$

The properties of this estimator depend upon the choice for $W_n$, although it is consistent as long as $W_n$ is positive definite, for example, for $W_n = I$, the identity matrix.

⁶ The suffix $n$ reflects that $W_n$ can depend upon the sample size $n$ and does not reflect the dimension of the matrix.
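The closed form (3.4.20) can be sketched as follows for an arbitrary positive definite $W_n$, here $W_n = I$. The simulated panel and helper names are hypothetical; the instrument matrices (3.4.16) of all individuals are stacked into one array so that the sums over $i$ become `np.einsum` contractions.

```python
import numpy as np


def simulate_panel(n, T, gamma, sigma_alpha=1.0, seed=0):
    """Hypothetical AR(1) panel with fixed effects; columns are t = 0..T."""
    rng = np.random.default_rng(seed)
    alpha = sigma_alpha * rng.standard_normal(n)
    y = np.empty((n, T + 1))
    y[:, 0] = alpha / (1 - gamma) + rng.standard_normal(n) / np.sqrt(1 - gamma**2)
    for t in range(1, T + 1):
        y[:, t] = gamma * y[:, t - 1] + alpha + rng.standard_normal(n)
    return y


def ab_instruments_all(y):
    """Stack Z_i of (3.4.16) for every i: array of shape (n, T-1, T(T-1)/2)."""
    n, T = y.shape[0], y.shape[1] - 1
    Z = np.zeros((n, T - 1, T * (T - 1) // 2))
    col = 0
    for t in range(2, T + 1):
        Z[:, t - 2, col:col + t - 1] = y[:, :t - 1]   # y_{i0}, ..., y_{i,t-2}
        col += t - 1
    return Z


def gmm_gamma(y, Z, W):
    """GMM estimator (3.4.20) of gamma for instruments Z and weighting matrix W."""
    dy_all = np.diff(y, axis=1)                # Delta y_{i1}, ..., Delta y_{iT}
    dy, dylag = dy_all[:, 1:], dy_all[:, :-1]  # Delta y_i and Delta y_{i,-1}
    a = np.einsum("itl,it->l", Z, dylag)       # sum_i Z_i' Delta y_{i,-1}
    b = np.einsum("itl,it->l", Z, dy)          # sum_i Z_i' Delta y_i
    return (a @ W @ b) / (a @ W @ a)


y = simulate_panel(n=20000, T=5, gamma=0.5)
Z = ab_instruments_all(y)
print(gmm_gamma(y, Z, np.eye(Z.shape[2])))    # W_n = I
```

Because $\gamma$ is a scalar here, the matrix inverse in (3.4.20) reduces to a division.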

The optimal weighting matrix is the one that gives the most efficient estimator, i.e., that gives the smallest asymptotic covariance matrix for $\hat\gamma_{GMM}$. The optimal weighting matrix is (asymptotically) proportional to the inverse of the covariance matrix of the sample moments.

In this case, this means that the optimal weighting matrix should satisfy

$$\mathrm{plim}_{n\to\infty}\, W_n = V\{Z_i'\Delta\varepsilon_i\}^{-1} = E\{Z_i'\Delta\varepsilon_i\Delta\varepsilon_i'Z_i\}^{-1} \qquad (3.4.21)$$

In the case where no restrictions are imposed upon the covariance matrix of $\Delta\varepsilon_i$, this can be estimated using a first-step consistent estimator of $\gamma$ and replacing the expectation operator by a sample average. This gives

$$\hat W_n^{opt} = \left(\frac1n\sum_{i=1}^n Z_i'\Delta\hat\varepsilon_i\Delta\hat\varepsilon_i'Z_i\right)^{-1} \qquad (3.4.22)$$

where $\Delta\hat\varepsilon_i$ is a residual vector from a first-step consistent estimator, for example using $W_n = I$.

The general GMM approach does not impose that $\varepsilon_{it}$ is i.i.d. over individuals and time, and the optimal weighting matrix is thus estimated without imposing these restrictions.
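A sketch of the two-step procedure: estimate $\gamma$ with $W_n = I$, form the first-differenced residuals $\Delta\hat\varepsilon_i$, build $\hat W_n^{opt}$ as in (3.4.22), and re-estimate. The simulated data and helper names are hypothetical, and no small-sample correction of the resulting covariance matrix is attempted.

```python
import numpy as np


def simulate_panel(n, T, gamma, sigma_alpha=1.0, seed=0):
    """Hypothetical AR(1) panel with fixed effects; columns are t = 0..T."""
    rng = np.random.default_rng(seed)
    alpha = sigma_alpha * rng.standard_normal(n)
    y = np.empty((n, T + 1))
    y[:, 0] = alpha / (1 - gamma) + rng.standard_normal(n) / np.sqrt(1 - gamma**2)
    for t in range(1, T + 1):
        y[:, t] = gamma * y[:, t - 1] + alpha + rng.standard_normal(n)
    return y


def ab_instruments_all(y):
    """Stack Z_i of (3.4.16) for every i: array of shape (n, T-1, T(T-1)/2)."""
    n, T = y.shape[0], y.shape[1] - 1
    Z = np.zeros((n, T - 1, T * (T - 1) // 2))
    col = 0
    for t in range(2, T + 1):
        Z[:, t - 2, col:col + t - 1] = y[:, :t - 1]
        col += t - 1
    return Z


def gmm_gamma(y, Z, W):
    """GMM estimator (3.4.20) of gamma for instruments Z and weighting matrix W."""
    dy_all = np.diff(y, axis=1)
    dy, dylag = dy_all[:, 1:], dy_all[:, :-1]
    a = np.einsum("itl,it->l", Z, dylag)
    b = np.einsum("itl,it->l", Z, dy)
    return (a @ W @ b) / (a @ W @ a)


def two_step_gmm(y):
    """Step 1 with W_n = I, step 2 with the estimated optimal W_n of (3.4.22)."""
    Z = ab_instruments_all(y)
    dy_all = np.diff(y, axis=1)
    dy, dylag = dy_all[:, 1:], dy_all[:, :-1]
    gamma1 = gmm_gamma(y, Z, np.eye(Z.shape[2]))   # first-step consistent estimate
    resid = dy - gamma1 * dylag                     # Delta eps_hat_i, shape (n, T-1)
    g = np.einsum("itl,it->il", Z, resid)           # Z_i' Delta eps_hat_i, one row per i
    W_opt = np.linalg.inv(g.T @ g / len(y))         # (3.4.22)
    gamma2 = gmm_gamma(y, Z, W_opt)
    return gamma1, gamma2, W_opt


y = simulate_panel(n=20000, T=5, gamma=0.5)
gamma1, gamma2, W_opt = two_step_gmm(y)
print(gamma1, gamma2)
```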
Note, however, that the absence of autocorrelation was needed to guarantee the validity of the moment conditions. Instead of estimating the optimal weighting matrix unrestrictedly, it is also possible (and potentially advisable in small samples) to impose the absence of autocorrelation in $\varepsilon_{it}$, combined with a homoscedasticity assumption. Noting that under these restrictions

$$E\{\Delta\varepsilon_i\Delta\varepsilon_i'\} = \sigma^2_\varepsilon H = \sigma^2_\varepsilon \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & \ddots & \ddots & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix} \qquad (3.4.23)$$

the optimal weighting matrix can be determined as

$$W_n^{opt} = \left(\frac1n\sum_{i=1}^n Z_i'HZ_i\right)^{-1} \qquad (3.4.24)$$

Note that this matrix does not involve unknown parameters, so that the optimal GMM estimator can be computed in one step if the original errors $\varepsilon_{it}$ are assumed to be homoscedastic and exhibit no autocorrelation.
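The matrix $H$ in (3.4.23) and the one-step weight (3.4.24) are easy to form directly. This sketch (hypothetical names, simulated data) first checks by Monte Carlo that i.i.d. unit-variance errors give $E\{\Delta\varepsilon_i\Delta\varepsilon_i'\} \approx H$, and then computes the one-step GMM estimate, which needs no preliminary estimate of $\gamma$.

```python
import numpy as np


def simulate_panel(n, T, gamma, sigma_alpha=1.0, seed=0):
    """Hypothetical AR(1) panel with fixed effects; columns are t = 0..T."""
    rng = np.random.default_rng(seed)
    alpha = sigma_alpha * rng.standard_normal(n)
    y = np.empty((n, T + 1))
    y[:, 0] = alpha / (1 - gamma) + rng.standard_normal(n) / np.sqrt(1 - gamma**2)
    for t in range(1, T + 1):
        y[:, t] = gamma * y[:, t - 1] + alpha + rng.standard_normal(n)
    return y


def ab_instruments_all(y):
    """Stack Z_i of (3.4.16) for every i: array of shape (n, T-1, T(T-1)/2)."""
    n, T = y.shape[0], y.shape[1] - 1
    Z = np.zeros((n, T - 1, T * (T - 1) // 2))
    col = 0
    for t in range(2, T + 1):
        Z[:, t - 2, col:col + t - 1] = y[:, :t - 1]
        col += t - 1
    return Z


def gmm_gamma(y, Z, W):
    """GMM estimator (3.4.20) of gamma for instruments Z and weighting matrix W."""
    dy_all = np.diff(y, axis=1)
    dy, dylag = dy_all[:, 1:], dy_all[:, :-1]
    a = np.einsum("itl,it->l", Z, dylag)
    b = np.einsum("itl,it->l", Z, dy)
    return (a @ W @ b) / (a @ W @ a)


def make_H(m):
    """m x m matrix H of (3.4.23): 2 on the diagonal, -1 on the first off-diagonals."""
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)


# Monte Carlo check of E{Delta eps_i Delta eps_i'} = sigma_eps^2 H with sigma_eps = 1
rng = np.random.default_rng(42)
eps = rng.standard_normal((200_000, 5))     # eps_i1, ..., eps_i5
d_eps = np.diff(eps, axis=1)                # Delta eps_i2, ..., Delta eps_i5
emp_cov = d_eps.T @ d_eps / len(d_eps)
print(np.round(emp_cov, 2))

# One-step GMM with W_n = ((1/n) sum_i Z_i' H Z_i)^{-1}, as in (3.4.24)
y = simulate_panel(n=20000, T=5, gamma=0.5)
Z = ab_instruments_all(y)
H = make_H(Z.shape[1])
HZ = np.einsum("ts,isl->itl", H, Z)         # H Z_i for every i
W1 = np.linalg.inv(np.einsum("itl,itm->lm", Z, HZ) / len(y))
gamma_one_step = gmm_gamma(y, Z, W1)
print(gamma_one_step)
```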
Under weak regularity conditions, the GMM estimator for $\gamma$ is asymptotically normal for $n \to \infty$ and fixed $T$, with its covariance matrix given by

$$\mathrm{plim}_{n\to\infty}\left(\left(\frac1n\sum_{i=1}^n \Delta y_{i,-1}'Z_i\right)\left(\frac1n\sum_{i=1}^n Z_i'\Delta\varepsilon_i\Delta\varepsilon_i'Z_i\right)^{-1}\left(\frac1n\sum_{i=1}^n Z_i'\Delta y_{i,-1}\right)\right)^{-1} \qquad (3.4.25)$$

With i.i.d. errors, $\frac1n\sum_i Z_i'\Delta\varepsilon_i\Delta\varepsilon_i'Z_i$ has the same probability limit as $\sigma^2_\varepsilon\,\frac1n\sum_i Z_i'HZ_i$, so the middle (inverted) term reduces to $\sigma_\varepsilon^{-2} W_n^{opt} = \sigma_\varepsilon^{-2}\left(\frac1n\sum_{i=1}^n Z_i'HZ_i\right)^{-1}$.

Alvarez and Arellano (2003) show that, in general, the GMM estimator is also consistent when both $n$ and $T$ tend to infinity, despite the fact that the number of moment conditions tends to infinity with the sample size. For large $T$, however, the GMM estimator will be close to the fixed effects estimator, which provides a more attractive alternative.

3.4.3 Models with Exogenous Variables


If the model contains exogenous variables, we have

$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it} \qquad (3.4.26)$$

which can also be estimated by the generalized IV or GMM approach. Depending upon the assumptions made about $x_{it}$, different sets of additional instruments can be constructed.

If the $x_{it}$ are strictly exogenous in the sense that they are uncorrelated with any of the $\varepsilon_{is}$ error terms, we also have that

$$E\{x_{is}\Delta\varepsilon_{it}\} = 0 \quad\text{for each } s, t, \qquad (3.4.27)$$

so that $x_{i1},\ldots,x_{iT}$ can be added to the instrument list for the first-differenced equation in each period. This would make the number of columns in $Z_i$ quite large.

Instead, almost the same level of information may be retained when the first-differenced $x_{it}$s are used as their own instruments.⁷ In this case, we impose the moment conditions

$$E\{\Delta x_{it}\Delta\varepsilon_{it}\} = 0 \quad\text{for each } t, \qquad (3.4.28)$$

and the instrument matrix can be written as

$$Z_i = \begin{pmatrix} [y_{i0}, \Delta x_{i2}'] & 0 & \cdots & 0 \\ 0 & [y_{i0}, y_{i1}, \Delta x_{i3}'] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & [y_{i0}, \ldots, y_{i,T-2}, \Delta x_{iT}'] \end{pmatrix}$$

⁷ We give up potential efficiency gains if some $x_{it}$ variables help explain the lagged endogenous variables.
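A sketch of the instrument matrix with strictly exogenous regressors used as their own instruments in first differences, as in (3.4.28). The helper name is hypothetical; it assumes $x_{i1},\ldots,x_{iT}$ are supplied as a $T \times K$ array, so that each period-$t$ block has $t-1$ lagged levels of $y$ plus $K$ columns for $\Delta x_{it}'$, giving $\tfrac12 T(T-1) + (T-1)K$ columns in total.

```python
import numpy as np


def ab_instruments_x(y_i, x_i):
    """Z_i with blocks [y_{i0}, ..., y_{i,t-2}, Delta x_it'] for t = 2..T.

    y_i: y_{i0}, ..., y_{iT} (length T+1); x_i: array (T, K) with rows x_{i1}, ..., x_{iT}.
    """
    y_i = np.asarray(y_i, dtype=float)
    x_i = np.asarray(x_i, dtype=float)
    T, K = len(y_i) - 1, x_i.shape[1]
    assert x_i.shape[0] == T, "x_i must hold x_i1, ..., x_iT"
    dx = np.diff(x_i, axis=0)                     # rows: Delta x_i2, ..., Delta x_iT
    Z = np.zeros((T - 1, T * (T - 1) // 2 + (T - 1) * K))
    col = 0
    for t in range(2, T + 1):
        k = t - 1
        Z[t - 2, col:col + k] = y_i[:k]               # lagged levels of y
        Z[t - 2, col + k:col + k + K] = dx[t - 2]     # Delta x_it as its own instrument
        col += k + K
    return Z


y_i = [1.0, 2.0, 3.0, 4.0, 5.0]                   # T = 4
x_i = np.array([[1.0], [4.0], [9.0], [16.0]])     # K = 1
print(ab_instruments_x(y_i, x_i))
```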
If the $x_{it}$ variables are not strictly exogenous but predetermined, in which case current and lagged $x_{it}$s are uncorrelated with current error terms, we only have that $E\{x_{it}\varepsilon_{is}\} = 0$ for $s \ge t$. In this case, only $x_{i,t-1},\ldots,x_{i1}$ are valid instruments for the first-differenced equation in period $t$. Thus, the moment conditions that can be imposed are

$$E\{x_{i,t-j}\Delta\varepsilon_{it}\} = 0 \quad\text{for } j = 1,\ldots,t-1 \text{ (for each } t) \qquad (3.4.29)$$


In practice, a combination of strictly exogenous and predetermined $x$ variables may occur rather than one of these two extreme cases, and the matrix $Z_i$ should then be adjusted accordingly.

Information in levels can also be exploited in estimation: in addition to the moment conditions presented above, it is possible to exploit the presence of valid instruments for the levels equation (3.4.26) or its average over time (the between regression). This is important when the $\gamma$ coefficient is close to unity; see Blundell & Bond (1998) and Arellano (2003, Section 6.6).

With additional assumptions, there are other MCs that can be used to improve the efficiency of GMM.

The Stata command is xtabond. xtabond2 is more flexible, but you have to download and install it.

3.4.4 Weak Instruments (Blundell & Bond) and Other Issues
If $\{y\}$ is very close to being a random walk or when $\sigma^2_\alpha/\sigma^2_\varepsilon$ becomes large, the correlation between the level of $y$ and the difference of $y$ will be weak, and the IV method will not work properly (weak instruments). IV suffers from very serious finite-sample bias when the instruments are weak.
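To see why the instruments become weak, one can compute the correlation between the instrument $y_{i,t-2}$ and the regressor $\Delta y_{i,t-1}$ in the pure AR(1) model. Assuming a stationary process with $\sigma^2_\varepsilon = 1$ (an assumption made for this sketch), a short calculation gives $\mathrm{cov}(y_{i,t-2}, \Delta y_{i,t-1}) = -1/(1+\gamma)$, $V(\Delta y_{i,t-1}) = 2/(1+\gamma)$ and $V(y_{i,t-2}) = \sigma^2_\alpha/(1-\gamma)^2 + 1/(1-\gamma^2)$. The hypothetical helpers below evaluate the implied correlation and compare it with a simulation.

```python
import numpy as np


def corr_level_diff(gamma, sigma2_alpha):
    """corr(y_{i,t-2}, Delta y_{i,t-1}) in a stationary AR(1) panel with sigma_eps^2 = 1."""
    cov = -1.0 / (1.0 + gamma)
    var_diff = 2.0 / (1.0 + gamma)
    var_level = sigma2_alpha / (1.0 - gamma) ** 2 + 1.0 / (1.0 - gamma**2)
    return cov / np.sqrt(var_diff * var_level)


def simulated_corr(gamma, sigma2_alpha, n=200_000, seed=0):
    """Monte Carlo counterpart with a stationary start: y_i0 plays y_{i,t-2}."""
    rng = np.random.default_rng(seed)
    alpha = np.sqrt(sigma2_alpha) * rng.standard_normal(n)
    y0 = alpha / (1 - gamma) + rng.standard_normal(n) / np.sqrt(1 - gamma**2)
    y1 = gamma * y0 + alpha + rng.standard_normal(n)
    return np.corrcoef(y0, y1 - y0)[0, 1]


for g, s2 in [(0.5, 1.0), (0.95, 1.0), (0.5, 10.0)]:
    print(g, s2, round(corr_level_diff(g, s2), 3), round(simulated_corr(g, s2), 3))
```

The correlation falls sharply in absolute value both as $\gamma \to 1$ and as $\sigma^2_\alpha/\sigma^2_\varepsilon$ grows, which is the weak-instrument case described above.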
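Solution: under the homoscedasticity assumption and provided that $y_{i0}$ satisfies mean stationarity,

$$E(y_{i0}) = \frac{\alpha_i}{1-\gamma} \quad\text{for each } i,$$

we can use $\Delta y_{i,t-1}$ as an instrument for the levels equation. To see this, write

$$y_{i0} = \frac{\alpha_i}{1-\gamma} + \varepsilon_{i0} \quad\text{(under mean stationarity)},$$

which gives $E(\alpha_i\varepsilon_{i0}) = 0$. Then

$$y_{i1} = \gamma y_{i0} + \alpha_i + \varepsilon_{i1} = \gamma\frac{\alpha_i}{1-\gamma} + \gamma\varepsilon_{i0} + \alpha_i + \varepsilon_{i1} = \frac{\alpha_i}{1-\gamma} + \gamma\varepsilon_{i0} + \varepsilon_{i1}$$

Thus, $\Delta y_{i1} = (\gamma - 1)\varepsilon_{i0} + \varepsilon_{i1}$, giving $E(\Delta y_{i1}\alpha_i) = 0$.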
This in turn gives (with the previous assumptions)

$$E[(y_{i,t-1} - y_{i,t-2})(\alpha_i + \varepsilon_{it})] = E[\Delta y_{i,t-1}(\alpha_i + \varepsilon_{it})] = 0, \quad t = 2, 3, \ldots, T$$

Hence differences can be used as instruments for the levels, giving an additional $(T-2)$ linear MCs. The conditions regarding the initial observation translate to other periods because $y_{it}$ can be written as a function of $y_{i1}$ plus errors using repeated substitution.

In summary, use $E(Z_i'\Delta\varepsilon_i) = 0$ and $E[\Delta y_{i,t-1}(\alpha_i + \varepsilon_{it})] = 0$.
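A simulation sketch of the extra levels moment conditions $E[\Delta y_{i,t-1}(\alpha_i + \varepsilon_{it})] = 0$: under a mean-stationary start they hold, whereas a start at $y_{i0} = 0$ (a hypothetical counterexample chosen here) violates them. The parameter values, $\sigma_\alpha = 1$ and the function name are illustrative assumptions.

```python
import numpy as np


def level_moments(gamma, T, stationary_start, n=200_000, seed=0):
    """Sample analogues of E[Delta y_{i,t-1} (alpha_i + eps_it)] for t = 2..T."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(n)                   # sigma_alpha = 1
    eps = rng.standard_normal((n, T + 1))            # eps_i0, ..., eps_iT
    y = np.empty((n, T + 1))
    if stationary_start:
        # mean-stationary start: y_i0 = alpha_i / (1 - gamma) + (scaled) eps_i0
        y[:, 0] = alpha / (1 - gamma) + eps[:, 0] / np.sqrt(1 - gamma**2)
    else:
        y[:, 0] = 0.0                                # not mean stationary
    for t in range(1, T + 1):
        y[:, t] = gamma * y[:, t - 1] + alpha + eps[:, t]
    dy = np.diff(y, axis=1)                          # Delta y_i1, ..., Delta y_iT
    u = alpha[:, None] + eps[:, 2:]                  # levels error alpha_i + eps_it, t = 2..T
    return (dy[:, :-1] * u).mean(axis=0)             # pairs Delta y_{i,t-1} with period t


print(level_moments(0.5, T=5, stationary_start=True))
print(level_moments(0.5, T=5, stationary_start=False))
```

Under the zero start the first moment is close to $\sigma^2_\alpha = 1$, because $\Delta y_{i1} = \alpha_i + \varepsilon_{i1}$ then contains $\alpha_i$.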
The estimation in the above case requires a combination of equations in FD as well as in levels. The use of both reduces the problem of weak instruments. For the large-$T$ case, consistency depends on $T/n \to$ constant $< \infty$. The Stata command is xtabond2.

Other points

MLE and GLS require some assumptions regarding $y_{i0}$ (stochastic or non-stochastic, correlated with $\alpha_i$ or not, etc.).
Test for overidentifying restrictions [Sargan/Hansen]: the moment condition $E(Z_i'\Delta\varepsilon_i) = 0$ uses $L$ instruments for $K$ covariates, so there are $L - K$ overidentifying restrictions.

Use different sets of moment restrictions to see how the estimates change, since having too many instruments can cause small-sample bias and also inefficiency. Also, Sargan's test has low power when too many instruments are used.

Test for serial correlation

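There should be no serial correlation in the $\varepsilon_{it}$. Use the sample correlation coefficients to test $H_0$: no correlation. The Arellano-Bond test statistic is standard normally distributed under the null of no serial correlation. Because $\Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}$, first-differenced residuals are expected to show first-order correlation even when $\varepsilon_{it}$ is serially uncorrelated; the informative test is therefore for second-order correlation in $\Delta\varepsilon_{it}$.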
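The following sketch illustrates why first-order correlation in differenced residuals is expected: with i.i.d. errors (simulated here as standard normal, an assumption), $\Delta\varepsilon_{it}$ has first-order autocorrelation of about $-0.5$ and essentially none at order two. It is only an illustration, not an implementation of the Arellano-Bond test statistic.

```python
import numpy as np


def pooled_lag_corr(d, lag):
    """Correlation between d_it and d_{i,t-lag}, pooled over i and t."""
    current, lagged = d[:, lag:].ravel(), d[:, :-lag].ravel()
    return np.corrcoef(current, lagged)[0, 1]


rng = np.random.default_rng(1)
eps = rng.standard_normal((100_000, 7))    # serially uncorrelated eps_it
d_eps = np.diff(eps, axis=1)               # Delta eps_it
r1 = pooled_lag_corr(d_eps, 1)             # close to -0.5 by construction
r2 = pooled_lag_corr(d_eps, 2)             # close to 0
print(round(r1, 3), round(r2, 3))
```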
In Stata's xtabond you can do the following:

1-step GMM: this uses the $H$ matrix in equation (3.4.23) to calculate the estimates, since the $H$ matrix does not depend on any parameters.

1-step GMM with robust covariance matrix: estimates the parameters using the 1-step method but, instead of basing the covariance matrix on $\sigma^2_\varepsilon H$, uses the residuals to calculate

$$\widehat{\mathrm{avar}}(\hat\beta) = \left[\left(\sum_{i=1}^n X_i'Z_i\right)\left(\sum_{i=1}^n Z_i'\Delta\hat\varepsilon_i\Delta\hat\varepsilon_i'Z_i\right)^{-1}\left(\sum_{i=1}^n Z_i'X_i\right)\right]^{-1} \qquad (3.4.30)$$

Here $X_i$ stacks the first-differenced regressors, including the lagged dependent variable, so that $\gamma$ is absorbed in $\beta$.

2-step GMM: uses the optimal weighting matrix and calculates the avar as in (3.4.30). That is, it uses the residuals from step 1 to calculate the optimal weighting matrix as in (3.4.22) and recalculates $\hat\beta$ using this new $W$. The avar is then calculated as in (3.4.30) above using the new set of residuals.

3.5 Models with Limited Dependent Variables

Panel data are often used in microeconomic problems where the models of interest involve nonlinearities. Discrete or limited dependent variables are an important phenomenon in this area, and their combination with panel data usually complicates estimation.

The reason is that with panel data it usually cannot be argued that different observations on the same unit are independent. Correlations between different error terms typically complicate the likelihood functions of such models and therefore complicate their estimation.
3.5.1 Binary Choice Models

As in the cross-sectional case, the binary choice model is usually formulated in terms of an underlying latent model. Typically, we write⁸

$$y_{it}^* = \alpha_i + x_{it}'\beta + \varepsilon_{it} \qquad (3.5.1)$$

where we observe $y_{it} = 1$ if $y_{it}^* > 0$ and $y_{it} = 0$ otherwise. For example, $y_{it}$ may indicate whether person $i$ is working in period $t$ or not.

⁸ We shall assume that $x_{it}$ includes a constant, whenever appropriate.

Let us assume that the idiosyncratic error term $\varepsilon_{it}$ has a symmetric distribution with distribution function $F(\cdot)$, i.i.d. across individuals and time and independent of all $x_{is}$. Even in this case the presence of $\alpha_i$ complicates estimation, both when we treat them as fixed unknown parameters and when we treat them as random error terms.

If we treat the $\alpha_i$ as fixed unknown parameters, we are essentially including $n$ dummy variables in the model. The loglikelihood function is thus given by

$$\ln L(\beta, \alpha_1, \ldots, \alpha_n) = \sum_{i,t} y_{it}\ln F(\alpha_i + x_{it}'\beta) + \sum_{i,t}(1 - y_{it})\ln\left[1 - F(\alpha_i + x_{it}'\beta)\right] \qquad (3.5.2)$$

Maximizing the loglikelihood with respect to $\beta$ and $\alpha_i$ ($i = 1,\ldots,n$) results in consistent estimators provided that the number of time periods $T$ goes to infinity.
For fixed $T$ and $n \to \infty$, the estimators are inconsistent. The reason is that for fixed $T$, the number of parameters grows with the sample size $n$ and we have what is known as an 'incidental parameter' problem. We can only estimate $\alpha_i$ consistently if the number of observations for individual $i$ grows, which requires that $T$ tends to infinity. In general, the inconsistency of $\hat\alpha_i$ for fixed $T$ will carry over to the estimator for $\beta$.

The incidental parameter problem, where the number of parameters increases with the number of observations, arises in any FEs model, including the linear model. For the linear case it was possible to eliminate the $\alpha_i$s, such that $\beta$ could be estimated consistently, even though the $\alpha_i$ parameters could not. For most nonlinear models, the inconsistency of $\hat\alpha_i$ leads to inconsistency of the other parameter estimators as well.
Although it is possible to transform the latent variable model such that the individual effects $\alpha_i$ are eliminated, this does not help in this context because there is no mapping from, for example, $y_{it}^* - y_{i,t-1}^*$ to observables like $y_{it} - y_{i,t-1}$.

An alternative strategy is the use of conditional maximum likelihood. In this case, we consider the likelihood function conditional upon a set of statistics $t_i$ that are sufficient for $\alpha_i$. This means that conditional upon $t_i$ an individual's likelihood contribution no longer depends upon $\alpha_i$, but still depends upon the other parameters $\beta$. In the panel data binary choice model, the existence of a sufficient statistic depends upon the functional form of $F$, i.e., upon the distribution of $\varepsilon_{it}$.

A statistic $t$ is called sufficient for a parameter $\theta$ if the distribution of the sample given $t$ does not depend on $\theta$.

At the general level let us write the joint probability mass function of $y_{i1},\ldots,y_{iT}$ as $f(y_{i1},\ldots,y_{iT} \mid \alpha_i, \beta)$, which depends upon the parameters $\beta$ and $\alpha_i$. If a sufficient statistic $t_i$ exists, this means that there exists an observable variable $t_i$ such that

$$f(y_{i1},\ldots,y_{iT} \mid t_i, \alpha_i, \beta) = f(y_{i1},\ldots,y_{iT} \mid t_i, \beta)$$

and so does not depend upon $\alpha_i$. Consequently, we can maximize the conditional likelihood function, based upon $f(y_{i1},\ldots,y_{iT} \mid t_i, \beta)$, to get a consistent estimator for $\beta$.

For the linear model with normal errors, a sufficient statistic for $\alpha_i$ is $\bar y_i$. That is, the conditional distribution of $y_{it}$ given $\bar y_i$ does not depend upon $\alpha_i$, and maximizing the conditional likelihood function can be shown to reproduce the FEs estimator for $\beta$. Unfortunately, this result does not automatically extend to nonlinear models.
For the probit model, for example, it has been shown that no sufficient statistic for $\alpha_i$ exists. This means that we cannot estimate a FEs probit model consistently for fixed $T$.

3.5.2 The Fixed Effects Logit Model

For the FEs logit model, $t_i = \bar y_i$ is a sufficient statistic for $\alpha_i$ and consistent estimation is possible by conditional maximum likelihood. It should be noted that the conditional distribution of $y_{i1},\ldots,y_{iT}$ is degenerate if $t_i = 0$ or $t_i = 1$. Consequently, such individuals do not contribute to the conditional likelihood and should be discarded in estimation. This means that only individuals that change status at least once are relevant for estimating $\beta$.

To illustrate the FEs logit model, consider the case with $T = 2$.
Conditional upon $t_i = 1/2$, the two possible outcomes are $(0,1)$ and $(1,0)$. The conditional probability of the first outcome is

$$P\{(0,1) \mid t_i = 1/2, \alpha_i, \beta\} = \frac{P\{(0,1) \mid \alpha_i, \beta\}}{P\{(0,1) \mid \alpha_i, \beta\} + P\{(1,0) \mid \alpha_i, \beta\}} \qquad (3.5.3)$$

Using that $P\{(0,1) \mid \alpha_i, \beta\} = P\{y_{i1} = 0 \mid \alpha_i, \beta\}\,P\{y_{i2} = 1 \mid \alpha_i, \beta\}$ with

$$P\{y_{i1} = 0 \mid \alpha_i, \beta\} = \frac{1}{1 + \exp(\alpha_i + x_{i1}'\beta)}, \qquad P\{y_{i2} = 1 \mid \alpha_i, \beta\} = \frac{\exp(\alpha_i + x_{i2}'\beta)}{1 + \exp(\alpha_i + x_{i2}'\beta)},$$

it follows that the conditional probability is given by

$$P\{(0,1) \mid t_i = 1/2, \alpha_i, \beta\} = \frac{\exp\{(x_{i2} - x_{i1})'\beta\}}{1 + \exp\{(x_{i2} - x_{i1})'\beta\}} \qquad (3.5.4)$$

which indeed does not depend upon $\alpha_i$. Similarly,

$$P\{(1,0) \mid t_i = 1/2, \alpha_i, \beta\} = \frac{1}{1 + \exp\{(x_{i2} - x_{i1})'\beta\}} \qquad (3.5.5)$$
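A quick numerical check of (3.5.3)-(3.5.5): computing the conditional probability from the unconditional logit probabilities gives the same value for any $\alpha_i$, and it matches the closed form. The index values $x_{i1}'\beta = 0.3$ and $x_{i2}'\beta = 1.1$ are arbitrary illustrative numbers.

```python
import math


def logistic(z):
    """Logistic CDF: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def cond_prob_01(alpha, xb1, xb2):
    """P{(0,1) | y_i1 + y_i2 = 1} built from the unconditional probabilities, as in (3.5.3)."""
    p1 = logistic(alpha + xb1)          # P{y_i1 = 1 | alpha_i, beta}
    p2 = logistic(alpha + xb2)          # P{y_i2 = 1 | alpha_i, beta}
    p01 = (1.0 - p1) * p2
    p10 = p1 * (1.0 - p2)
    return p01 / (p01 + p10)


xb1, xb2 = 0.3, 1.1                     # x_i1'beta and x_i2'beta (illustrative)
closed_form = logistic(xb2 - xb1)       # (3.5.4)
for alpha in (-3.0, 0.0, 2.5):
    print(alpha, cond_prob_01(alpha, xb1, xb2), closed_form)
```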

This means that we can estimate the FEs logit model for $T = 2$ using a standard logit with $x_{i2} - x_{i1}$ as explanatory variables and the change in $y_{it}$ as the endogenous event (1 for a positive change, 0 for a negative one).
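A minimal sketch of this two-period conditional logit: keep the switchers, code the outcome as 1 for a $(0,1)$ sequence, and fit a logit without intercept on $x_{i2} - x_{i1}$ by Newton-Raphson. The data are simulated under assumed values ($\beta = 1$, standard logistic errors, fixed effects correlated with $x_{it}$), and the function names are hypothetical.

```python
import numpy as np


def logit_newton(X, d, iters=25):
    """Logit ML without intercept via Newton-Raphson (the loglikelihood is concave)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fe_logit_T2(y, x):
    """FEs logit for T = 2; y has shape (n, 2), x has shape (n, 2, K)."""
    switch = y[:, 0] != y[:, 1]                 # only changers are informative
    d = (y[switch, 1] == 1).astype(float)       # 1 for (0,1), 0 for (1,0)
    dx = x[switch, 1, :] - x[switch, 0, :]      # x_i2 - x_i1
    return logit_newton(dx, d), int(switch.sum())


rng = np.random.default_rng(3)
n, beta_true = 20_000, 1.0
x = rng.standard_normal((n, 2, 1))
alpha = x.mean(axis=(1, 2)) + rng.standard_normal(n)     # fixed effects correlated with x_it
ystar = alpha[:, None] + beta_true * x[:, :, 0] + rng.logistic(size=(n, 2))
y = (ystar > 0).astype(int)
beta_hat, n_switchers = fe_logit_T2(y, x)
print(beta_hat, n_switchers)
```

Individuals with $y_{i1} = y_{i2}$ are dropped before estimation, exactly as the conditional likelihood requires.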

Note that in this FEs binary choice model it is even clearer than in the linear case that the model is only identified through the 'within dimension' of the data; individuals who do not change status are simply discarded in estimation as they provide no information whatsoever about $\beta$. For the case with larger $T$ it is cumbersome to derive all necessary conditional probabilities, but in principle it is an extension of the above case (Cameron & Trivedi, 2005, or Maddala, 1987).

If it can be assumed that the $\alpha_i$ are independent of the explanatory variables in $x_{it}$, a REs treatment seems more appropriate. This is easily achieved in the context of a probit model.

3.5.3 The Random Effects Probit Model

Let us start with the latent variable specification

$$y_{it}^* = x_{it}'\beta + u_{it} \qquad (3.5.6)$$

with

$$y_{it} = 1 \text{ if } y_{it}^* > 0, \qquad y_{it} = 0 \text{ if } y_{it}^* \le 0,$$

where $u_{it}$ is an error term with mean zero and unit variance, independent of $(x_{i1},\ldots,x_{iT})$. To estimate $\beta$ by maximum likelihood we will have to complement this with an assumption about the joint distribution of $u_{i1},\ldots,u_{iT}$.
The likelihood contribution of individual $i$ is the (joint) probability of observing the $T$ outcomes $y_{i1},\ldots,y_{iT}$. This joint probability is determined from the joint distribution of the latent variables $y_{i1}^*,\ldots,y_{iT}^*$ by integrating over the appropriate intervals. In general, this will imply $T$ integrals, which in estimation are typically to be computed numerically. When $T$ is 4 or more, this makes maximum likelihood estimation infeasible. It is possible to circumvent this 'curse of dimensionality' by using simulation-based estimators.

If we assume that all $u_{it}$ are independent, we have that $f(y_{i1},\ldots,y_{iT} \mid x_{i1},\ldots,x_{iT}, \beta) = \prod_t f(y_{it} \mid x_{it}, \beta)$, which involves $T$ one-dimensional integrals only (as in the cross-sectional case).

If we make an error components assumption, and assume that $u_{it} = \alpha_i + \varepsilon_{it}$, where $\varepsilon_{it}$ is independent over time (and individuals), we can write the joint probability as

$$\begin{aligned} f(y_{i1},\ldots,y_{iT} \mid x_{i1},\ldots,x_{iT}, \beta) &= \int_{-\infty}^{\infty} f(y_{i1},\ldots,y_{iT} \mid x_{i1},\ldots,x_{iT}, \alpha_i, \beta)\,f(\alpha_i)\,d\alpha_i \\ &= \int_{-\infty}^{\infty} \prod_t f(y_{it} \mid x_{it}, \alpha_i, \beta)\,f(\alpha_i)\,d\alpha_i \qquad (3.5.7) \end{aligned}$$

which requires numerical integration over one dimension. This is a feasible specification that allows the error terms to be correlated across different periods, albeit in a restrictive way.

The crucial step in (3.5.7) is that conditional upon $\alpha_i$ the errors from different periods are independent. In principle arbitrary assumptions can be made about the distributions of $\alpha_i$ and $\varepsilon_{it}$. However, the most common approach is to start from a multivariate normal distribution, which leads to the REs probit model.

Assume that the joint distribution of $u_{i1},\ldots,u_{iT}$ is normal with zero means and variances equal to 1 and $\mathrm{cov}\{u_{it}, u_{is}\} = \sigma^2_\alpha$, $s \ne t$. This corresponds to assuming that $\alpha_i$ is NID$(0, \sigma^2_\alpha)$ and $\varepsilon_{it}$ is NID$(0, 1 - \sigma^2_\alpha)$. Recall that, as in the cross-sectional case, we need a normalization on the errors' variances. The normalization chosen here implies that the error variance in a given period is unity, such that the estimated $\beta$ coefficients are directly comparable to estimates obtained from estimating the model from one wave of the panel using cross-sectional probit maximum likelihood.

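For the REs probit model, the expressions in the likelihood function are given by

$$f(y_{it} \mid x_{it}, \alpha_i, \beta) = \begin{cases} \Phi\!\left(\dfrac{x_{it}'\beta + \alpha_i}{\sqrt{1 - \sigma^2_\alpha}}\right) & \text{if } y_{it} = 1 \\[2ex] 1 - \Phi\!\left(\dfrac{x_{it}'\beta + \alpha_i}{\sqrt{1 - \sigma^2_\alpha}}\right) & \text{if } y_{it} = 0 \end{cases} \qquad (3.5.8)$$

where $\Phi$ denotes the cumulative distribution function of the standard normal distribution. The density of $\alpha_i$ is given by

$$f(\alpha_i) = \frac{1}{\sqrt{2\pi\sigma^2_\alpha}}\exp\left\{-\frac12\frac{\alpha_i^2}{\sigma^2_\alpha}\right\} \qquad (3.5.9)$$

The integral in (3.5.7) is computed numerically.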
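One common way to do this numerical integration is Gauss-Hermite quadrature; the sketch below uses it for a single individual's contribution under (3.5.8)-(3.5.9). The choice of Gauss-Hermite, the 20 nodes and the helper names are assumptions of this sketch rather than a description of any particular package. With the substitution $\alpha_i = \sqrt{2\sigma^2_\alpha}\,u$, the integral becomes $\pi^{-1/2}\sum_k w_k\, g(\sqrt{2\sigma^2_\alpha}\,u_k)$.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

_erf = np.vectorize(math.erf)


def Phi(z):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def re_probit_contribution(y_i, X_i, beta, sigma2_alpha, n_nodes=20):
    """Likelihood contribution (3.5.7) of one individual under (3.5.8)-(3.5.9)."""
    u, w = hermgauss(n_nodes)                     # nodes/weights for weight exp(-u^2)
    alphas = math.sqrt(2.0 * sigma2_alpha) * u    # change of variables to N(0, sigma2_alpha)
    scale = math.sqrt(1.0 - sigma2_alpha)         # sd of eps_it, since var(u_it) = 1
    index = X_i @ beta                            # x_it' beta for t = 1..T
    p1 = Phi((index[None, :] + alphas[:, None]) / scale)   # nodes x periods
    y_i = np.asarray(y_i)
    probs = np.where(y_i[None, :] == 1, p1, 1.0 - p1)
    return float(w @ probs.prod(axis=1) / math.sqrt(math.pi))


X_i = np.array([[1.0, 0.2], [1.0, -0.5], [1.0, 1.3]])   # T = 3: constant + one regressor
beta = np.array([0.1, 0.8])
total = sum(re_probit_contribution(ys, X_i, beta, 0.4)
            for ys in itertools.product([0, 1], repeat=3))
print(total)   # probabilities of all 2^T outcomes sum to one (up to rounding)
```

Setting `sigma2_alpha = 0` collapses the integral to the product of cross-sectional probit probabilities, which is a convenient sanity check.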
Several software packages (e.g., LIMDEP & Stata) have standard routines for estimating the REs probit model.

Ignoring the correlations across periods and estimating the $\beta$ coefficients using standard probit ML on the pooled data is consistent, though inefficient. Routinely computed standard errors are incorrect, however. Nevertheless, these values can be used as initial estimates in an iterative ML procedure based on (3.5.7).

3.5.4 Tobit Models

The REs tobit model is similar to the REs probit model; the only difference is in the observation rule. Let us start with

$$y_{it}^* = x_{it}'\beta + \alpha_i + \varepsilon_{it} \qquad (3.5.10)$$

while

$$y_{it} = y_{it}^* \text{ if } y_{it}^* > 0, \qquad y_{it} = 0 \text{ if } y_{it}^* \le 0.$$

We make the REs assumption that $\alpha_i$ and $\varepsilon_{it}$ are i.i.d. normally distributed, independent of $x_{i1},\ldots,x_{iT}$, with zero means and variances $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$, respectively.
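Using $f$ as generic notation for a density or probability mass function, the likelihood function can be written as in (3.5.7),

$$f(y_{i1},\ldots,y_{iT} \mid x_{i1},\ldots,x_{iT}, \beta) = \int_{-\infty}^{\infty} \prod_t f(y_{it} \mid x_{it}, \alpha_i, \beta)\,f(\alpha_i)\,d\alpha_i, \qquad (3.5.11)$$

where $f(\alpha_i)$ is given by (3.5.9) and $f(y_{it} \mid x_{it}, \alpha_i, \beta)$ is given by

$$f(y_{it} \mid x_{it}, \alpha_i, \beta) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma^2_\varepsilon}}\exp\left\{-\dfrac12\dfrac{(y_{it} - x_{it}'\beta - \alpha_i)^2}{\sigma^2_\varepsilon}\right\} & \text{if } y_{it} > 0 \\[2ex] 1 - \Phi\!\left(\dfrac{x_{it}'\beta + \alpha_i}{\sigma_\varepsilon}\right) & \text{if } y_{it} = 0 \end{cases} \qquad (3.5.12)$$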
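A sketch of one individual's REs tobit contribution (3.5.11)-(3.5.12), again using Gauss-Hermite quadrature over $\alpha_i$ (an assumed numerical method; the node count and names are illustrative). With $T = 1$ the integral has a closed form, since $\alpha_i + \varepsilon_{it} \sim N(0, \sigma^2_\alpha + \sigma^2_\varepsilon)$, which gives an easy accuracy check.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

_erf = np.vectorize(math.erf)


def Phi(z):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def re_tobit_contribution(y_i, X_i, beta, sigma_alpha, sigma_eps, n_nodes=40):
    """Likelihood contribution (3.5.11) with per-period terms (3.5.12)."""
    u, w = hermgauss(n_nodes)
    alphas = math.sqrt(2.0) * sigma_alpha * u         # alpha_i ~ N(0, sigma_alpha^2)
    m = (X_i @ beta)[None, :] + alphas[:, None]        # x_it' beta + alpha_i, nodes x periods
    y = np.asarray(y_i, dtype=float)[None, :]
    dens = np.exp(-0.5 * ((y - m) / sigma_eps) ** 2) / math.sqrt(2.0 * math.pi * sigma_eps**2)
    cens = 1.0 - Phi(m / sigma_eps)                    # P(y_it* <= 0 | alpha_i)
    f = np.where(y > 0, dens, cens)
    return float(w @ f.prod(axis=1) / math.sqrt(math.pi))


X1 = np.array([[1.0, 0.5]])                # T = 1, x_i1' beta = 0.5 below
beta = np.array([0.7, -0.4])
s_a, s_e = 0.8, 1.2
s_tot = math.sqrt(s_a**2 + s_e**2)
print(re_tobit_contribution([0.0], X1, beta, s_a, s_e), 1.0 - float(Phi(0.5 / s_tot)))
```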
Note that these expressions are similar to the likelihood contributions in the cross-sectional case. The only difference is the inclusion of $\alpha_i$ in the conditional mean. In a similar fashion, other forms of censoring can be considered, to obtain, for example, the REs ordered probit model. However, in all cases, the integration over $\alpha_i$ has to be done numerically.
3.5.5 Dynamics & the Problem of Initial Conditions

Suppose we are explaining whether or not an individual is unemployed over a number of consecutive months. Typically, individuals who have a longer history of being unemployed are less likely to leave the state of unemployment.

There are two explanations for this: an individual with a longer unemployment history may be discouraged in looking for a job or may be less attractive for an employer to hire. This phenomenon is called state dependence: the longer you are in a certain state the less likely you are to leave it. Alternatively, it is possible that unobserved heterogeneity is present such that individuals with certain unobserved characteristics are less likely to leave unemployment.
In the binary choice models discussed above, the individual effects $\alpha_i$ capture the unobserved heterogeneity. If we include a lagged dependent variable, we can distinguish between the above two explanations. Let us consider the REs probit model, although similar results hold for the REs tobit case. Suppose the latent variable specification is changed into

$$y_{it}^* = x_{it}'\beta + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it} \qquad (3.5.13)$$

with $y_{it} = 1$ if $y_{it}^* > 0$ and 0 otherwise.
In this model $\gamma > 0$ indicates positive state dependence: the ceteris paribus probability that $y_{it} = 1$ is larger if $y_{i,t-1}$ is also one. Consider ML estimation of this dynamic REs probit model, under the same distributional assumptions as before. The likelihood contribution of individual $i$ is given by

$$\begin{aligned} f(y_{i1},\ldots,y_{iT} \mid x_{i1},\ldots,x_{iT}, \beta) &= \int_{-\infty}^{\infty} f(y_{i1},\ldots,y_{iT} \mid x_{i1},\ldots,x_{iT}, \alpha_i, \beta)\,f(\alpha_i)\,d\alpha_i \\ &= \int_{-\infty}^{\infty} \left[\prod_{t=2}^{T} f(y_{it} \mid y_{i,t-1}, x_{it}, \alpha_i, \beta)\right] f(y_{i1} \mid x_{i1}, \alpha_i, \beta)\,f(\alpha_i)\,d\alpha_i \qquad (3.5.14) \end{aligned}$$

where

$$f(y_{it} \mid y_{i,t-1}, x_{it}, \alpha_i, \beta) = \begin{cases} \Phi\!\left(\dfrac{x_{it}'\beta + \gamma y_{i,t-1} + \alpha_i}{\sqrt{1 - \sigma^2_\alpha}}\right) & \text{if } y_{it} = 1 \\[2ex] 1 - \Phi\!\left(\dfrac{x_{it}'\beta + \gamma y_{i,t-1} + \alpha_i}{\sqrt{1 - \sigma^2_\alpha}}\right) & \text{if } y_{it} = 0 \end{cases}$$

This is completely analogous to the static case, and $y_{i,t-1}$ is simply included as an additional explanatory variable.
However, the term $f(y_{i1} \mid x_{i1}, \alpha_i, \beta)$ in the likelihood function may cause problems. This term gives the probability of observing $y_{i1} = 1$ or 0 without knowing the previous state but conditional upon the unobserved heterogeneity term $\alpha_i$.

If the initial value is exogenous in the sense that its distribution does not depend on $\alpha_i$, we can put $f(y_{i1} \mid x_{i1}, \alpha_i, \beta) = f(y_{i1} \mid x_{i1}, \beta)$ outside the integral. In this case, we can simply consider the likelihood function conditional upon $y_{i1}$ and ignore the term $f(y_{i1} \mid x_{i1}, \beta)$ in estimation. The only consequence may be a loss of efficiency if $f(y_{i1} \mid x_{i1}, \beta)$ provides information about $\beta$. This approach would be appropriate if the initial state is necessarily the same for all individuals or if it is randomly assigned to individuals.
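A sketch of the likelihood contribution conditional upon $y_{i1}$ when the initial value is treated as exogenous: the factor $f(y_{i1} \mid x_{i1}, \beta)$ is dropped and only periods $t = 2,\ldots,T$ enter the integral in (3.5.14). Gauss-Hermite quadrature, the parameter values and the function names are assumptions of this sketch.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss

_erf = np.vectorize(math.erf)


def Phi(z):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def dyn_re_probit_conditional(y_i, X_i, beta, gamma, sigma2_alpha, n_nodes=20):
    """Contribution (3.5.14) conditional on y_i1, treating y_i1 as exogenous."""
    u, w = hermgauss(n_nodes)
    alphas = math.sqrt(2.0 * sigma2_alpha) * u
    scale = math.sqrt(1.0 - sigma2_alpha)
    y = np.asarray(y_i)
    index = X_i[1:] @ beta + gamma * y[:-1]           # x_it' beta + gamma * y_{i,t-1}, t = 2..T
    p1 = Phi((index[None, :] + alphas[:, None]) / scale)
    probs = np.where(y[None, 1:] == 1, p1, 1.0 - p1)
    return float(w @ probs.prod(axis=1) / math.sqrt(math.pi))


X_i = np.array([[1.0, 0.3], [1.0, -0.2], [1.0, 0.9], [1.0, 0.1]])   # T = 4
beta, gamma, s2a = np.array([-0.2, 0.5]), 0.8, 0.3
for y1 in (0, 1):
    total = sum(dyn_re_probit_conditional((y1,) + rest, X_i, beta, gamma, s2a)
                for rest in itertools.product([0, 1], repeat=3))
    print(y1, total)   # sums to one for each initial state
```

With $\gamma > 0$, the conditional probability of $y_{i2} = 1$ is higher after $y_{i1} = 1$ than after $y_{i1} = 0$, which is the state dependence the model is meant to capture.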
It may be hard to argue in many applications that the initial value $y_{i1}$ is exogenous and does not depend upon a person's unobserved heterogeneity. In that case we would need an expression for $f(y_{i1} \mid x_{i1}, \alpha_i, \beta)$, and this is problematic. If the process that we are estimating has been going on for several periods before the current sample period, $f(y_{i1} \mid x_{i1}, \alpha_i, \beta)$ is a complicated function that depends upon person $i$'s unobserved history. This means that it is impossible to derive an expression for the marginal probability $f(y_{i1} \mid x_{i1}, \alpha_i, \beta)$ that is consistent with the rest of the model.

Heckman (1981) suggests an approximate solution to this initial conditions problem that appears to work reasonably well in practice. It requires an approximation for the marginal probability of the initial state by a probit function using as much pre-sample information as available, without imposing restrictions between its coefficients and the structural $\beta$ and $\gamma$ parameters. Vella and Verbeek (1999a) provide an illustration of this approach in a dynamic REs tobit model.

The impact of the initial conditions diminishes if the number of sample periods $T$ increases, so one may decide to ignore the problem when $T$ is fairly large.

Stata Linear Panel Commands

1). Panel summary: xtset; xtdescribe; xtsum; xttab
a). xtdescribe: extent to which the panel is unbalanced
b). xtsum: separate within (over time) and between (over individuals) variation
c). xttab: tabulations within and between for discrete data, e.g. binary

2). Pooled OLS: regress
3). Feasible GLS: xtgee, family(gaussian); xtgls; xtpcse
a). xtpcse: linear regression with panel-corrected standard errors
b). xtgee: fits generalized linear models and allows you to specify the within-group correlation structure for the panels

4). Random effects: xtreg, re; xtregar, re
5). Fixed effects: xtreg, fe; xtregar, fe
6). First differences: regress (with differenced data)
7). Static IV: xtivreg; xthtaylor
8). Dynamic IV: xtabond (Arellano-Bond); xtdpdsys (Arellano-Bover/Blundell-Bond linear dynamic panel-data estimation)
9). xtdpd: linear dynamic panel-data estimation
10). xtprobit: random-effects probit models
11). xtlogit: fixed-effects and random-effects logit models
12). xttobit: random-effects tobit models
