Lec06 - Panel Data
Pooled Data
Pooling Independent Cross-Sections across Time
o Since a random sample is drawn at each time period, pooling the resulting random
samples gives us an independently pooled cross-section
o As such, we can use standard OLS methods
o Advantage of pooling is to increase the sample size, thereby obtaining more precise
estimates and test statistics with greater power
o Pooling is only useful in this regard if the relationship between the dependent variable
and at least some of the independent variables remains constant over time
o To reflect the fact that the population may have different distributions in different time
periods, the intercept is usually allowed to differ across time periods (can be
accomplished by including year dummies)
o The coefficients on the year dummies may be of interest (e.g. after controlling for other
factors, has the pattern of fertility changed over time?)
o Year dummies can also be interacted with other explanatory variables to see if the
effect of that variable has changed over time
Example
o A state offers a tax break to firms providing employees with health insurance. To estimate
the impact of the bill on the percentage of firms offering health insurance we could use
data on a state that didn't implement such a law as a control group. It is not correct simply
to compare pre- and post-law changes in the percentage of firms offering health
insurance, i.e. to estimate
y = β0 + δ0·d2 + u    (1)
where d2 is a dummy equal to one in the second (post-law) period
o Here the coefficient estimate δ̂0 gives an estimate of the difference in the percentage of
firms offering health insurance between periods one and two
o The coefficient doesn't necessarily provide a (causal) estimate of the impact of the tax
break, however, since there could be a trend towards more employers offering health
insurance over time
With repeated cross sections, let A be the control group and B the treatment group. Write
y = β0 + β1·dB + δ0·d2 + δ1·d2·dB + u    (2)
where:
y is the outcome of interest (e.g. percentage of firms offering health insurance in each State)
dB captures possible differences between the treatment and control groups prior to the
policy change (e.g. State A versus State B)
d2 captures aggregate factors that would cause changes in y over time even in the absence
of a policy change, i.e. for both States (e.g. time dummies)
The coefficient of interest is δ1, which gives an estimate of the change in health insurance
take-up for firms in State B, and which is called the difference-in-differences estimator.
The DD estimate can be computed directly from the four group-period means:

                          Year 1       Year 2       Year 2 − Year 1
    State A (control)     ȳ_{A,1}      ȳ_{A,2}      ȳ_{A,2} − ȳ_{A,1}
    State B (treatment)   ȳ_{B,1}      ȳ_{B,2}      ȳ_{B,2} − ȳ_{B,1}

δ̂1 = (ȳ_{B,2} − ȳ_{B,1}) − (ȳ_{A,2} − ȳ_{A,1})    (3)

If the policy targets a particular group within each state (say the elderly, E, with the
non-elderly, N, serving as a within-state comparison group), a further difference can be taken:

δ̂_B = (ȳ_{B,E,2} − ȳ_{B,E,1}) − (ȳ_{B,N,2} − ȳ_{B,N,1})    (4)
δ̂1 = δ̂_B − [(ȳ_{A,E,2} − ȳ_{A,E,1}) − (ȳ_{A,N,2} − ȳ_{A,N,1})]    (5)

where the A subscript means the state not implementing the policy and the N subscript
represents the non-elderly. This is the difference-in-difference-in-differences (DDD) estimate.
Can add covariates to either the DD or DDD analysis to control for compositional changes.
Can use multiple time periods and groups.
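The DDD arithmetic above is just differences of cell means. A minimal sketch in Python, where all the take-up rates are made-up numbers (states A = control, B = treatment; groups E = affected, N = comparison; periods 1 = pre, 2 = post):

```python
# Hypothetical cell means: fraction of firms offering insurance,
# indexed by (state, group, period). All values are invented.
means = {
    ("B", "E", 2): 0.62, ("B", "E", 1): 0.50,
    ("B", "N", 2): 0.55, ("B", "N", 1): 0.48,
    ("A", "E", 2): 0.53, ("A", "E", 1): 0.49,
    ("A", "N", 2): 0.52, ("A", "N", 1): 0.47,
}

def dd(state):
    """Within-state DD: (change for affected group) - (change for comparison group)."""
    return ((means[(state, "E", 2)] - means[(state, "E", 1)])
            - (means[(state, "N", 2)] - means[(state, "N", 1)]))

# DDD: the treatment state's DD minus the control state's DD.
ddd = dd("B") - dd("A")
```

Here dd("B") = 0.12 − 0.07 = 0.05 and dd("A") = 0.04 − 0.05 = −0.01, so the DDD estimate nets out both the state-level trend and the group-level trend.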
This methodology has a number of applications, particularly when the data arise from a
natural experiment (or quasi-experiment)
o Occurs when some exogenous event, often a change in government policy, changes
the environment in which individuals, families, firms or cities operate
A natural experiment always has a control group, which is not affected by the policy change,
and a treatment group thought to be affected by the policy change
Unlike in a true experiment, the control and treatment groups in natural experiments
arise from the particular policy change and are not randomly assigned
If A is the control group and B the treatment group, let dB equal one for those in
the treatment group B, and zero otherwise. Then, letting d2 denote a dummy for the
second (post-policy change) time period, the equation of interest is:
y = β0 + δ0·d2 + β1·dB + δ1·d2·dB + (other factors) + u
                           Before           After                    After − Before
    Control                β0               β0 + δ0                  δ0
    Treatment              β0 + β1          β0 + β1 + δ0 + δ1        δ0 + δ1
    Treatment − Control    β1               β1 + δ1                  δ1
The parameter δ1, sometimes called the average treatment effect, can be estimated in
two ways:
1. Compute the differences in averages between the treatment and control groups in
each time period, and then difference the results over time
2. Compute the change in averages over time for each of the treatment and control
groups, and then difference these changes, i.e. write
δ̂1 = (ȳ_{2,B} − ȳ_{2,A}) − (ȳ_{1,B} − ȳ_{1,A})
When explanatory variables are added to the regression, the OLS estimate of δ1 no longer
has a simple form, but its interpretation is similar
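The two computations are algebraically identical, which is easy to verify on hypothetical group averages (a sketch; the numbers are made up):

```python
# Hypothetical group averages ybar[(group, period)], with "B" the
# treatment group and "A" the control group (all values invented).
ybar = {("A", 1): 0.40, ("A", 2): 0.46,
        ("B", 1): 0.38, ("B", 2): 0.55}

# Way 1: difference across groups within each period, then across time.
way1 = (ybar[("B", 2)] - ybar[("A", 2)]) - (ybar[("B", 1)] - ybar[("A", 1)])

# Way 2: change over time within each group, then difference the changes.
way2 = (ybar[("B", 2)] - ybar[("B", 1)]) - (ybar[("A", 2)] - ybar[("A", 1)])
```

Both orderings of the differencing give the same estimate (0.11 with these numbers).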
Panel Data
A balanced panel has the same number of time observations (T) for each of the N individuals
An unbalanced panel has different numbers of time observations (T_i) on each individual
A compact panel covers only consecutive time periods for each individual, i.e. there are no
gaps
Attrition is the process of drop-out of individuals from the panel, leading to an unbalanced
(and possibly non-compact) panel
A short panel has a large number of individuals but few time observations on each, (e.g. the
British Household Panel Survey has 5,500 households and 14 waves)
A long panel has a long run of time observations on each individual, permitting separate
time-series analysis for each (e.g. Penn World Tables has data from 1960)
While panel data can be analyzed using standard OLS techniques, it is better to use some
techniques specifically designed to take advantage of panel data.
o Specifically, you know that in a panel data set there is a special relationship between
the multiple observations of a particular individual.
Consider the following regression specification:
y_it = x_it·β + ε_it
A common assumption used in panel data is that we can write the error term as:
ε_it = u_it + α_i
where α_i is called a fixed (or random) effect that doesn't vary over time
One assumption of OLS is that E(ε_it | x_it) = 0. If the α_i are correlated with x_it, OLS
will therefore provide inconsistent estimates of the parameters
Panel data methods allow us to estimate the parameters consistently using so-called fixed
effects (or related) methods
We replace the assumption that E(ε_it | x_it) = E(u_it + α_i | x_it) = 0 with the weaker
assumption that E(u_it | x_it) = 0
However, if you know that this is panel data, you might consider which points in the graph
are observations on the same individuals and, perhaps, circle observations of particular
individuals. In this case, you might find the following:
This reveals a very different relationship between y and x. The knowledge of which
observations came from the same person, company, plant, unit, state, city or whatever can
dramatically impact the results of your research.
In this case, we see a negative relationship between the variables, but we might imagine
that there is a separate intercept for each person. This is one type of panel data model.
y_it = α_i + x_it·β + u_it    (1)
The model can be estimated using the (pooled) Ordinary Least Squares (OLS) estimator
o The simplest approach to the estimation.
o Individual effects α_i are fixed and common across economic agents, such that α_i = α
for all i = 1, . . . , N
o OLS produces consistent and efficient estimates of α and β.
One assumption of OLS is that there is a zero correlation between the error terms of any
two observations.
The problem with panel data is that we would expect there to be correlations between error
terms for a particular individual across different time periods.
So, if unobserved variables for an individual tend to make its error term positive in one
period, they will tend to make its error term positive in other periods as well. For example, if
a county has a particularly high rate of unemployment in one year, it is likely to have a high
rate the next year, too.
This correlation between error terms is a violation of one of the assumptions of OLS. This
violation means that OLS is not the best estimator.
OLS on this transformed equation will yield consistent estimates of β, since the α_i have been
removed through first-differencing
Δx_it and Δu_it are uncorrelated because of the assumption that E(u_it | x_it) = 0
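A minimal sketch of the first-difference estimator on a tiny made-up panel (one regressor, noiseless data with true slope 2, so the intercepts a_i drop out exactly):

```python
# Made-up noiseless panel: y_it = a_i + 2*x_it, with a_1 = 1, a_2 = 2.
y = {1: [1.0, 3.4, 6.0], 2: [10.0, 11.8, 14.0]}  # y[i] over t = 1..3
x = {1: [0.0, 1.2, 2.5], 2: [4.0, 4.9, 6.0]}

dy, dx = [], []
for i in y:                         # first-difference within each individual
    for t in range(1, len(y[i])):
        dy.append(y[i][t] - y[i][t - 1])
        dx.append(x[i][t] - x[i][t - 1])

# OLS through the origin on the differenced data: a_i has been removed.
b_fd = sum(d1 * d2 for d1, d2 in zip(dx, dy)) / sum(d * d for d in dx)
```

With noiseless data the estimator recovers the slope exactly, even though the two individuals have different intercepts.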
o Can be used if individual effects α_i are fixed but not common across i = 1, . . . , N
o Eliminates the fixed effect α_i by differencing with respect to the mean
Let ȳ_i = (1/T) Σ_t y_it and x̄_i = (1/T) Σ_t x_it
Define: ẍ_it = x_it − x̄_i and ÿ_it = y_it − ȳ_i
Then ȳ_i = α_i + x̄_i·β + ū_i
Subtracting from (1) gives:
y_it − ȳ_i = (α_i − α_i) + (x_it − x̄_i)·β + (u_it − ū_i)
y_it − ȳ_i = (x_it − x̄_i)·β + (u_it − ū_i)
Or ÿ_it = ẍ_it·β + ü_it
Which can then be estimated by OLS
The individual effects can be estimated as α̂_i = ȳ_i − x̄_i·β̂
The estimator of the slope parameters, β, is consistent if either N or T becomes large
The estimator of the individual effects, α̂_i, is consistent only if T becomes large
The number of degrees of freedom needs to be adjusted.
o Usually the degrees of freedom would be NT − K, but with individual effects we have
NT − N − K (software packages usually make this correction when running their panel
commands)
This gives:
ÿ_it = ẍ_it·β + ü_it    (2)
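A sketch of the within estimator on the same kind of tiny noiseless panel (true slope 2, individual intercepts 1 and 2; all data made up). It also recovers the fixed effects via α̂_i = ȳ_i − x̄_i·β̂:

```python
# Made-up noiseless panel: y_it = a_i + 2*x_it, with a_1 = 1, a_2 = 2.
y = {1: [1.0, 3.4, 6.0], 2: [10.0, 11.8, 14.0]}
x = {1: [0.0, 1.2, 2.5], 2: [4.0, 4.9, 6.0]}

num = den = 0.0
for i in y:
    ybar = sum(y[i]) / len(y[i])            # individual means
    xbar = sum(x[i]) / len(x[i])
    for yit, xit in zip(y[i], x[i]):
        num += (xit - xbar) * (yit - ybar)  # demeaned cross-product
        den += (xit - xbar) ** 2
b_within = num / den

# Recover the individual effects from the group means.
a_hat = {i: sum(y[i]) / len(y[i]) - b_within * sum(x[i]) / len(x[i])
         for i in y}
```

On two-period data the within and first-difference estimators coincide; with T > 2 they generally differ.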
One possibility is to estimate a regression model with panel data using OLS. This imposes the assumption that the fixed
effects are the same for each individual.
                                                       Number of obs =   4360
                                                       F(6, 4353)    = 141.60
                                                       Prob > F      = 0.0000
                                                       R-squared     = 0.1659
                                                       Root MSE      = .48676

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0989097   .0045823    21.59   0.000     .0899261    .1078932
       exper |   .0940639   .0101683     9.25   0.000     .0741289    .1139989
     expersq |  -.0032342    .000678    -4.77   0.000    -.0045634   -.0019051
       black |  -.1142367   .0248047    -4.61   0.000    -.1628665   -.0656069
     married |   .1165982   .0154429     7.55   0.000     .0863222    .1468742
        hisp |   .0266568   .0198779     1.34   0.180     -.012314    .0656276
       _cons |  -.0065678   .0649791    -0.10   0.919      -.13396    .1208244
------------------------------------------------------------------------------
In this case it is often useful to cluster standard errors by individual. Stata's vce(cluster) option specifies
that the standard errors allow for intragroup correlation, relaxing the usual requirement that the observations be
independent. That is, the observations are independent across groups (clusters) but not necessarily within groups.
. regress lwage educ exper expersq black married hisp, vce(cluster nr)
Linear regression                                      Number of obs =   4360
                                                       F(6, 544)     =  58.28
                                                       Prob > F      = 0.0000
                                                       R-squared     = 0.1659
                                                       Root MSE      = .48676
To allow for differences in the distribution in different time periods it is often desirable to allow for differences
in the intercept over time. This can be achieved by including a set of time dummies.
. regress lwage educ exper expersq black married hisp d8*, vce(cluster nr)
Linear regression                                      Number of obs =   4360
                                                       F(13, 544)    =  43.29
                                                       Prob > F      = 0.0000
                                                       R-squared     = 0.1682
                                                       Root MSE      = .48649
To estimate the model using panel techniques (i.e. fixed and random effects) it is necessary to tell Stata which is the
group and which is the time identifier.
. tsset nr year
       panel variable:  nr (strongly balanced)
        time variable:  year, 1980 to 1987
                delta:  1 unit
To run a fixed effects model is then straightforward using the xtreg command.
. xtreg lwage educ exper expersq married black hisp, fe vce(robust)
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8

                                                F                  =    135.44
corr(u_i, Xb)  = -0.1289                        Prob > F           =    0.0000

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8

                                                Wald chi2(6)       =    517.71
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
It is also possible to estimate the fixed effects model using the least squares dummy variable (LSDV) approach. Initially we
have to define a set of dummy variables, one for each individual.
. tab(nr), gen(dumi)
Then estimate the regression model including the dummy variables using OLS
. regress lwage educ exper expersq married black hisp dumi*, noc vce(robust)
note: dumi375 omitted because of collinearity
note: dumi395 omitted because of collinearity
note: dumi462 omitted because of collinearity
Linear regression                                      Number of obs =   4360
                                                       F(548, 3812)  = 1048.92
                                                       Prob > F      = 0.0000
                                                       R-squared     = 0.9639
                                                       Root MSE      = .35204
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0826707   .0040599    20.36   0.000     .0747108    .0906305
       exper |   .1169371   .0091402    12.79   0.000     .0990168    .1348573
     expersq |  -.0043329   .0005988    -7.24   0.000    -.0055069   -.0031589
     married |   .0473384   .0181832     2.60   0.009     .0116886    .0829882
       black |   .1943954   .0733538     2.65   0.008     .0505789    .3382118
        hisp |   .4369434   .0835889     5.23   0.000     .2730601    .6008268
       dumi1 |  -.3174653   .3215929    -0.99   0.324     -.947976    .3130454
       dumi2 |  -.0474871   .0700694    -0.68   0.498    -.1848643       .0898
         ..................
     dumi542 |   .5402538    .078385     6.89   0.000     .3865731    .6939344
     dumi543 |  -.3221958   .1511144    -2.13   0.033    -.6184686     -.025923
     dumi544 |   .7369184   .0553026    13.33   0.000     .6284928    .8453439
     dumi545 |  -.0496064   .0986855    -0.50   0.615    -.2430879    .1438751
------------------------------------------------------------------------------
Warning: we now have estimates for educ, black, etc, but things are not as they appear!
                                                       Number of obs =   3815
                                                       F             =   5.08
                                                       Prob > F      = 0.0063
                                                       R-squared     = 0.0030
                                                       Root MSE      = .44326

------------------------------------------------------------------------------
             |               Robust
     D.lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |
         D1. |  (omitted)
             |
       exper |
         D1. |  (omitted)
             |
     expersq |
         D1. |  -.0038615   .0013887    -2.78   0.005    -.0065841   -.0011388
             |
       black |
         D1. |  (omitted)
             |
     married |
         D1. |   .0383504   .0233253     1.64   0.100    -.0073809    .0840818
             |
        hisp |
         D1. |  (omitted)
             |
       _cons |   .1155318   .0211452     5.46   0.000     .0740747    .1569889
------------------------------------------------------------------------------
or,
y_it = μ + λ_i + ω_t + x_it·β + u_it
where Σ_i λ_i = 0 and Σ_t ω_t = 0
The two-way fixed effects panel model can be estimated using the LSDV approach by
including time dummies f_s,it = 1(t = s) in addition to individual dummies, thus estimating:
y_it = α_1·d1_it + α_2·d2_it + ⋯ + α_N·dN_it + γ_2·f2_it + ⋯ + γ_T·fT_it + x_it·β + u_it
One problem with estimating the two-way panel model using dummy variables is that there
is an incidental parameters problem as either N or T goes to infinity
o A new within transformation can remove these:
ÿ_it = y_it − ȳ_i − ȳ_t + ȳ
o The two-way within model can then be written as
ÿ_it = ẍ_it·β + ü_it
The average, individual and time effects can now be estimated as
o μ̂ = ȳ − x̄·β̂
o λ̂_i = ȳ_i − μ̂ − x̄_i·β̂
o ω̂_t = ȳ_t − μ̂ − x̄_t·β̂
Consistency:
o μ̂ and β̂ are consistent as either N or T tends to infinity
o λ̂_i is only T-consistent
o ω̂_t is only N-consistent
The two-way within transformation removes both observed and unobserved heterogeneity
for both individual and time effects
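A sketch of the two-way within transformation on a tiny balanced panel (made-up numbers), verifying that the transformed data have zero individual and time means, so any additive individual or time effect is swept out:

```python
# Tiny balanced panel of one variable: rows = individuals, cols = time.
x = [[1.0, 2.0, 6.0],
     [3.0, 5.0, 10.0]]
N, T = len(x), len(x[0])

row = [sum(r) / T for r in x]                                  # individual means
col = [sum(x[i][t] for i in range(N)) / N for t in range(T)]   # time means
grand = sum(row) / N                                           # overall mean

# Two-way within transform: x_it - xbar_i - xbar_t + xbar.
xdd = [[x[i][t] - row[i] - col[t] + grand for t in range(T)] for i in range(N)]
```

After the transformation every row sum and every column sum of xdd is (numerically) zero.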
The two-way model can also be estimated using a random effects model by GLS
In one-way models the individual effects are either fixed or random. In a two-way model the
individual and time effects can each be fixed or random
o i.e. we may have mixed random effects / fixed effects models where the time effect is
assumed fixed and the individual effect random for example
o if is small for example, one may estimate a one-way random effects model on a set
of exogenous variables and time dummies
In certain cases a dataset might have more than 2 dimensions: e.g. firm-industry-year;
country region-year; individual-household-year; employee-firm-year; farm-region-year.
This class of data can be analyzed using nested error component models.
y_ijt = α + x_ijt·β + u_ijt,   u_ijt = λ_i + φ_j + ω_t + η_ij + ε_ijt
Which can be estimated using LSDV methods.
Other Estimators
Between Effects Regression
Between Groups estimation involves OLS on the cross-section equation:
ȳ_i = x̄_i·β + (α_i + ū_i)
i.e. we average out all of the within-individual variation, leaving only the between-individual
variation
The model can be estimated using OLS by either
o Using one group-mean observation per individual
o Or using T_i copies of the individual group mean data for individual i
The latter is equivalent to a weighted regression of ȳ_i on x̄_i, with the weights given by T_i for
individual i. It is often desirable to give more weight to individuals with longer time series.
Consistency requires that E(x_it·α_i) = 0
Between groups estimation is not efficient
Usually only used to obtain an estimate of σ²_α when using feasible GLS
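A sketch of the between estimator: collapse a made-up panel to its group means and run simple OLS (intercept plus one regressor) on the N mean observations:

```python
# Made-up panel for 3 individuals, 2 periods each; the group means lie
# exactly on the line ybar_i = 0.1 + 2*xbar_i.
y = {1: [2.0, 2.6], 2: [4.1, 4.5], 3: [6.2, 6.0]}
x = {1: [1.0, 1.2], 2: [2.0, 2.2], 3: [3.1, 2.9]}

yb = [sum(v) / len(v) for v in y.values()]   # one observation per individual
xb = [sum(v) / len(v) for v in x.values()]
n = len(yb)

mx, my = sum(xb) / n, sum(yb) / n
b_between = (sum((xi - mx) * (yi - my) for xi, yi in zip(xb, yb))
             / sum((xi - mx) ** 2 for xi in xb))
a_between = my - b_between * mx
```

All the within-individual variation has been averaged out: only the N between-individual observations enter the regression.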
One potential problem with the SUR model is that you might have more explanatory
variables than you have observations for any individual in the data, which would essentially
make this approach impossible.
In the above equation we can consider β to be the common mean coefficient vector and
the β_i's as the individual deviations from the mean.
Rewriting the above equation we have:
y_it = (α + x_it·β) + (α_i + x_it·β_i) + u_it
η_it = (α_i + x_it·β_i) + u_it
Var(η_it) = σ²_α + 2σ_{αβ}·x_it + σ²_β·x²_it + σ²_u
                                                Number of obs      =      2287
                                                Number of groups   =       131

                                                Obs per group: min =         4
                                                               avg =      17.5
                                                               max =        35

                                                Wald chi2(1)       =    962.03
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        iqvc |    2.52637   .0814522    31.02   0.000     2.366726    2.686013
       _cons |   40.70956   .3042423   133.81   0.000     40.11325    41.30586
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolnr: Unstructured       |
                    sd(iqvc) |   .4583713   .1100965      .286264    .7339526
                   sd(_cons) |   3.058354   .2491357     2.607043    3.587791
            corr(iqvc,_cons) |  -.8168636   .1743621    -.9744848   -.1196644
-----------------------------+------------------------------------------------
                sd(Residual) |    6.44051   .1004244     6.246659    6.640377
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 246.91          Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
The expected language score for a child with average IQ now averages 40.7 across schools, with a standard deviation of 3.1.
The expected gain in language score per point of IQ averages 2.5, with a standard deviation of 0.46.
The intercept and slope have a negative correlation of -0.82 across schools, so schools with higher language scores for a child with average verbal IQ tend
to show smaller average gains.
Although random effects are not directly estimated, you can form best linear unbiased predictions (BLUPs) of them (and standard errors) using predict after
xtmixed.
The next step is to predict fitted values as well as the random effects (we can also verify that we can reproduce the fitted values)
. predict yhat2, fitted
//residual
     +---------------------+
     |    yhat2      check |
     |---------------------|
  1. | 20.78043   20.78043 |
  2. | 24.53934   24.53934 |
  3. | 27.04527   27.04527 |
  4. | 30.80418   30.80418 |
  5. | 33.31012   33.31012 |
     |---------------------|
  6. | 34.56309   34.56309 |
  7. | 34.56309   34.56309 |
  8. | 34.56309   34.56309 |
  9. | 34.56309   34.56309 |
 10. | 35.81606   35.81606 |
     +---------------------+
The graph of fitted lines shows clearly how school differences are more pronounced at lower than at higher verbal IQs.
o The α_i summarise a number of factors specific to the cross-section units and thus represent
"specific ignorance"; they can be treated as random variables, in the same manner as the u_it,
which represent "general ignorance"
Comparison of Estimators
                (1)          (2)             (3)          (4)          (5)
                OLS          OLS             FE           RE           BE
                             (clustered se)
educ            0.0989***    0.0989***                    0.101***     0.0941***
                (0.00458)    (0.00923)                    (0.00889)    (0.0112)
exper           0.0941***    0.0941***       0.117***     0.113***     -0.00271
                (0.0102)     (0.0124)        (0.0107)     (0.0105)     (0.0511)
expersq         -0.00323***  -0.00323***     -0.00433***  -0.00415***  0.00212
                (0.000678)   (0.000864)      (0.000686)   (0.000672)   (0.00327)
married         0.117***     0.117***        0.0473**     0.0653***    0.161***
                (0.0154)     (0.0266)        (0.0212)     (0.0192)     (0.0423)
black           -0.114***    -0.114**                     -0.127**     -0.0963*
                (0.0248)     (0.0520)                     (0.0515)     (0.0498)
hisp            0.0267       0.0267                       0.0265       0.0233
                (0.0199)     (0.0404)                     (0.0407)     (0.0439)
Observations    4360         4360            4360         4360         4360
Chow Test
o Provides a test of the pooled (restricted model) versus the fixed effects (unrestricted)
model
o This is simply a joint test of whether the fixed effects are significant
F_Chow = [(RRSS − URSS)/(N − 1)] / [URSS/(NT − N − K)]
where RRSS and URSS are the residual sums of squares from the restricted and unrestricted
models respectively. This is distributed F(N−1, NT−N−K) under the null of no fixed effects.
o If there are a number of observed individual specific variables in the model, these are
included in the pooled model, but not the fixed effects model (i.e. we want to test for
unobserved heterogeneity)
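The F statistic is a one-line computation once the two residual sums of squares are known. The RSS values below are illustrative only (chosen to be of the same order as the lecture's example with N = 545, T = 8 and K = 6 slope parameters), not taken from the output:

```python
def chow_f(rrss, urss, n, t, k):
    """Chow-type F test of pooled OLS (restricted) vs fixed effects
    (unrestricted): F = [(RRSS - URSS)/(N - 1)] / [URSS/(NT - N - K)]."""
    return ((rrss - urss) / (n - 1)) / (urss / (n * t - n - k))

# Hypothetical residual sums of squares for the restricted and
# unrestricted models (invented numbers).
f_stat = chow_f(rrss=1033.0, urss=472.4, n=545, t=8, k=6)
```

The resulting statistic (about 8.3 with these invented inputs) would be compared with the F(N−1, NT−N−K) critical value.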
                                                       Number of obs  =   4360
                                                       F(547, 3812)   =  11.27
                                                       Prob > F       = 0.0000
                                                       R-squared      = 0.6179
                                                       Adj R-squared  = 0.5631
                                                       Root MSE       = .35204
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0290989   .0352816     0.82   0.410    -.0400737    .0982715
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |   .8273663   .1536651     5.38   0.000     .5260925     1.12864
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |   .6551428   .1539299     4.26   0.000     .3533498    .9569357
             |
          nr |
          17 |   .2164064   .1614937     1.34   0.180    -.1002159    .5330287
          18 |   .5947677   .1539649     3.86   0.000     .2929061    .8966293
          45 |    .496684   .1534798     3.24   0.001     .1957735    .7975946
          .. |
       12500 |  -.1118741   .1536758    -0.73   0.467    -.4131688    .1894206
       12534 |   .8936683   .1537159     5.81   0.000      .592295    1.195042
       12548 |  (omitted)
             |
       _cons |   .4325396   .4165826     1.04   0.299    -.3842066    1.249286
------------------------------------------------------------------------------
. testparm i.nr
 (  1)  17.nr = 0
 (  2)  18.nr = 0
 (  3)  45.nr = 0
 (  4)  110.nr = 0
 (  5)  120.nr = 0
 ...
 (536)  12420.nr = 0
 (537)  12433.nr = 0
 (538)  12451.nr = 0
 (539)  12477.nr = 0
 (540)  12500.nr = 0
 (541)  12534.nr = 0

       F(541, 3812) =    8.34
            Prob > F =    0.0000
So, we reject the null that all fixed effects are zero (and thus prefer the fixed effects over the pooled model)
Hausman Test
o Usually applied to test for fixed versus random effects models
o Compares directly the random effects estimator, β̂_RE, to the fixed effects estimator, β̂_FE
o In the presence of a correlation between the individual effects and the regressors the
GLS estimates are inconsistent, while the OLS fixed effects results are consistent
o If there is no correlation between the fixed effects and the regressors both estimators
are consistent, but the OLS fixed effects estimator is inefficient
o Construct q̂ = β̂_FE − β̂_RE and Var(q̂) = Var(β̂_FE) − Var(β̂_RE)
o Test statistic: H = q̂′[Var(q̂)]⁻¹q̂, distributed as a χ² statistic with K degrees
of freedom (where K is the dimensionality of β)
o The null hypothesis is that the preferred model is a random effects model and the
alternative that the fixed effects model is preferred
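In the one-parameter case the statistic reduces to a scalar. A sketch using the rounded "married" coefficients and non-robust standard errors from the FE and RE outputs in this lecture (note that Stata's hausman command uses the full coefficient vector, so this per-coefficient version is only illustrative):

```python
# Rounded 'married' coefficient and standard error from the FE output,
# and the corresponding values from the RE output.
b_fe, v_fe = 0.0473, 0.0183 ** 2
b_re, v_re = 0.0653, 0.0168 ** 2

q = b_fe - b_re                    # contrast between the two estimators
h = q ** 2 / (v_fe - v_re)         # chi2(1) under the null of no correlation
reject_re = h > 3.84               # 5% critical value of chi2(1)
```

Here h is about 6.15 > 3.84, so for this coefficient the null (random effects appropriate) would be rejected in favour of fixed effects.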
                                                Number of obs      =      4360
                                                Number of groups   =       545

R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8

corr(u_i, Xb)  = -0.1289                        F(3,3812)          =    267.93
                                                Prob > F           =    0.0000
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |  (omitted)
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |  (omitted)
       _cons |   1.085044    .026295    41.26   0.000     1.033491    1.136598
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3812) =     8.29       Prob > F = 0.0000
. estimates store coeff_consistent
                                                Number of obs      =      4360
                                                Number of groups   =       545

R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8

corr(u_i, X)   = 0 (assumed)                    Wald chi2(6)       =    901.13
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0091567    11.04   0.000     .0831565    .1190501
       exper |   .1128358   .0082738    13.64   0.000     .0966195    .1290521
     expersq |  -.0041483   .0005928    -7.00   0.000    -.0053101   -.0029864
       black |  -.1269633   .0488629    -2.60   0.009    -.2227328   -.0311938
     married |    .065336   .0168465     3.88   0.000     .0323175    .0983546
        hisp |    .026507   .0437909     0.61   0.545    -.0593215    .1123355
       _cons |  -.0845855   .1135289    -0.75   0.456    -.3070982    .1379271
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store coeff_efficient
Ho: σ²_α = 0
o Define: S1 = Σ_i (Σ_t û_it)² and S2 = Σ_i Σ_t û²_it, where the û_it are the pooled OLS residuals
o The LM statistic is then LM = [NT/(2(T − 1))]·(S1/S2 − 1)², distributed as χ²(1) under the null
                                                Number of obs      =      4360
                                                Number of groups   =       545

R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8

corr(u_i, X)   = 0 (assumed)                    Wald chi2(6)       =    517.71
                                                Prob > chi2        =    0.0000
. xttest0
Breusch and Pagan Lagrangian multiplier test for random effects

        lwage[nr,t] = Xb + u[nr] + e[nr,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                   lwage |   .2836728       .5326094
                       e |    .123934       .3520426
                       u |   .1126352       .3356117

        Test:   Var(u) = 0
                             chibar2(01) =  3425.60
                          Prob > chibar2 =   0.0000
We reject the null hypothesis, which indicates that there are significant differences across individuals and
that random effects is more appropriate than pooled OLS
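The LM statistic itself is easy to compute by hand from pooled-OLS residuals, LM = [NT/(2(T−1))]·(S1/S2 − 1)² with S1 = Σ_i(Σ_t e_it)² and S2 = Σ_i Σ_t e²_it. A sketch on made-up residuals that are persistently signed within individuals (exactly the pattern an individual effect produces):

```python
# Made-up pooled-OLS residuals e[i][t] for N = 2 individuals, T = 3 periods.
e = [[0.5, 0.4, 0.6],      # individual 1: persistently positive residuals
     [-0.5, -0.6, -0.4]]   # individual 2: persistently negative residuals
N, T = len(e), len(e[0])

s1 = sum(sum(ei) ** 2 for ei in e)            # sum of squared row sums
s2 = sum(x ** 2 for ei in e for x in ei)      # sum of squared residuals
lm = N * T / (2 * (T - 1)) * (s1 / s2 - 1) ** 2   # chi2(1) under Var(u) = 0
```

Because the residuals cluster by individual, S1 is much larger than S2 and the statistic (about 5.5 here) exceeds the 5% χ²(1) critical value of 3.84.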
Misspecification Tests
It is difficult to investigate the time-series properties (e.g. autocorrelation, stationarity, etc.) of
panel data when T is small
Testing for heteroscedasticity is possible with small T using the Bickel version of the
Breusch-Pagan test
o This is a test of both within and between heterogeneity
o This is a test of γ1 = ⋯ = γp = 0 in the regression model
û²_it = γ0 + γ1·ŷ_it + ⋯ + γp·ŷᵖ_it + v_it
where û_it and ŷ_it are the residuals and fitted values respectively from the within regression;
the test statistic is distributed as χ²_p
o The simplest test is then the Breusch-Godfrey test, based on the estimated first-order
autocorrelation coefficient ρ̂ of the within residuals, scaled so that it is distributed
N(0,1) under the null hypothesis
o Given the slow convergence to normality a superior alternative due to Fisher is often
used: z = √(NT − N − K) · ½·ln[(1 + ρ̂)/(1 − ρ̂)]
A user-written command in Stata (xttest3) allows one to test for heteroscedasticity. This tests for
heteroscedasticity within groups.
. ssc install xttest3
. xtreg lwage educ exper expersq black married hisp, fe
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity
Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8

corr(u_i, Xb)  = -0.1289                        F(3,3812)          =    267.93
                                                Prob > F           =    0.0000
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |  (omitted)
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |  (omitted)
       _cons |   1.085044    .026295    41.26   0.000     1.033491    1.136598
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3812) =     8.29       Prob > F = 0.0000
. xttest3
Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (545)  =    2.2e+05
Prob>chi2   =    0.0000
Consider now the dynamic panel model
y_it = γ·y_{i,t−1} + x_it·β + α_i + u_it,   t = 1, …, T
and apply the within transformation, with ȳ_{i,−1} denoting the individual mean of the
lagged dependent variable.
We have got rid of the individual effect. But what are the statistical properties of a
regression of y_it − ȳ_i on (x_it − x̄_i) and y_{i,t−1} − ȳ_{i,−1}?
By repeated substitution,
y_it = γᵗ·y_i0 + (x_it + γ·x_{i,t−1} + ⋯ + γᵗ⁻¹·x_i1)·β + [(1 − γᵗ)/(1 − γ)]·α_i
       + (u_it + γ·u_{i,t−1} + ⋯ + γᵗ⁻¹·u_i1)
so ȳ_{i,−1} is a function of u_{i,T−1}, …, u_{i1} and α_i. The demeaned lagged dependent
variable is therefore correlated with the demeaned error term, and the within estimator is
biased for fixed T (the Nickell bias).
As with the within-groups estimator there is a bias in the estimator when first-differencing
the model
This is also true for other models (e.g. pooled OLS, random effects,)
Consider first-differencing to eliminate individual effects
y_it − y_{i,t−1} = (x_it − x_{i,t−1})·β + γ·(y_{i,t−1} − y_{i,t−2}) + (u_it − u_{i,t−1})
OLS is inconsistent since y_{i,t−1} − y_{i,t−2} is correlated with u_it − u_{i,t−1} (even under the
assumption that u_it is serially uncorrelated)
o The transformed error term u_it − u_{i,t−1} is an MA(1) process which contains u_{i,t−1}, and
is thus correlated with y_{i,t−1} − y_{i,t−2}
There are several IV estimators which correct for endogeneity of the lagged dependent
variable.
o Similar to the method of Hausman and Taylor (see below) the instruments come from
within the model.
o Examples include Anderson and Hsiao, Arellano and Bond, and Blundell and Bond
What we need is a set of instruments that are correlated with y_{i,t−1} − y_{i,t−2}, but not with
u_it − u_{i,t−1}
o All lagged x_it and y_{i,t−2}, …, y_{i1} are valid instruments if {u_it} is serially independent
Since y_{i,t−2} is not correlated with u_it − u_{i,t−1}, Anderson and Hsiao suggested using y_{i,t−2}
as an instrument for y_{i,t−1} − y_{i,t−2}, alongside x_it, x_{i,t−1} and x_{i,t−2}
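A sketch of the Anderson-Hsiao idea for a pure AR(1) panel (no x's), using the level y_{i,t−2} as the instrument for the differenced lag. With one instrument and one regressor the IV estimator is just a ratio of sums (all data made up):

```python
# Tiny made-up panel y[i] observed over t = 0..3 for two individuals.
y = {1: [1.0, 1.8, 2.1, 2.6], 2: [0.5, 0.9, 1.4, 1.6]}

num = den = 0.0
for i in y:
    for t in range(2, len(y[i])):
        dy = y[i][t] - y[i][t - 1]          # differenced dependent variable
        dlag = y[i][t - 1] - y[i][t - 2]    # endogenous differenced lag
        z = y[i][t - 2]                     # instrument: twice-lagged level
        num += z * dy
        den += z * dlag

g_iv = num / den   # simple IV estimator of the AR coefficient
```

The instrument is valid because y_{i,t−2} predates (and is uncorrelated with) the differenced error u_it − u_{i,t−1}, while still being correlated with the differenced lag.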
Problems
If y_it is (or is close to) a random walk then y_{i,t−2} is not correlated with y_{i,t−1} − y_{i,t−2} and
is not a valid instrument
Methods based solely on the differenced equation ignore potentially valuable information
contained in the initial conditions
What is the optimal point on the trade-off between the number of lags used as instruments
and the number of time periods retained in the estimation sample?
System Estimators
The time-differenced model:
y_it − y_{i,t−1} = (x_it − x_{i,t−1})·β + γ·(y_{i,t−1} − y_{i,t−2}) + (u_it − u_{i,t−1})
Δy_it = Δx_it·β + γ·Δy_{i,t−1} + Δu_it,   t = 2, …, T    (1)
Often there are more moment conditions than parameters to be estimated. Then the
moment conditions don't have a unique solution
In this case, we minimise a (weighted) sum of squares of the sample moments. In vector
notation this is written in the general case as
m(θ)′·W·m(θ)
where m(θ) is the vector of sample moments and W a weighting matrix
This is called the generalised method of moments (GMM)
IV estimators are members of the class of GMM estimators (e.g. 2SLS)
At t = T: Δy_iT = Δx_iT·β + γ·Δy_{i,T−1} + Δu_iT, with instruments z_iT = (y_{i,T−2}, …, y_{i1}, x_i).
It is the use of different instruments for equations of different time periods that defines the
A&B method relative to conventional IV estimation, which uses the same instrument set for
all endogenous variables.
Conventional instruments can also be used in the analysis
A problem arises with the Arellano and Bond method if the variables are close to a random
walk, with lagged levels being poor instruments for the first differences
Arellano and Bover (1995) and Blundell and Bond (1998) show that adding the original
equation in levels to the system can increase the number of moment conditions and
increase efficiency
o In the levels equations endogenous variables are instrumented with lags of their first
differences
XTABOND2 in Stata fits both the Arellano and Bond difference GMM estimator and the
Blundell and Bond system (i.e. levels and differences) GMM estimator
Under serial independence of u_it the differenced errors follow an MA(1) with
E[Δu_it·Δu_{i,t−1}] = E[(u_it − u_{i,t−1})(u_{i,t−1} − u_{i,t−2})] = −σ²_u
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       751
Time variable : year                            Number of groups   =       140
Number of instruments = 33                      Obs per group: min =         5
Wald chi2(10) =   1235.04                                      avg =      5.36
Prob > chi2   =     0.000                                      max =         7
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |    .554094   .1037099     5.34   0.000     .3508264    .7573616
             |
           w |  -.7749952   .2021376    -3.83   0.000    -1.171178   -.3788128
           k |     .48439   .1422565     3.41   0.001     .2055723    .7632077
             |
           w |
         L1. |   .3597913   .1214988     2.96   0.003      .121658    .5979247
             |
           k |
         L1. |   -.334291   .1447846    -2.31   0.021    -.6180637   -.0505183
             |
      yr1980 |  -.0139178   .0134693    -1.03   0.301    -.0403172    .0124815
      yr1981 |  -.0466677   .0231872    -2.01   0.044    -.0921137   -.0012217
      yr1982 |   -.038262   .0386544    -0.99   0.322    -.1140232    .0374993
      yr1983 |  -.0311078   .0519977    -0.60   0.550    -.1330214    .0708058
      yr1984 |  -.0303459   .0642001    -0.47   0.636    -.1561758     .095484
------------------------------------------------------------------------------
0.002
0.106
. xtabond2 n l.n w k l.w l.k yr1980 yr1981 yr1982 yr1983 yr1984, gmm(l.n) iv(yr1980-yr1984)

Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 41                      Obs per group: min =         6
Wald chi2(10) = 6066.80                                        avg =      6.36
Prob > chi2   =   0.000                                        max =         8
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .8658989   .0572771    15.12   0.000     .7536379    .9781598
           w |  -.7153568   .1246113    -5.74   0.000    -.9595905   -.4711232
           k |   .5904417    .091638     6.44   0.000     .4108345     .770049
           w |
         L1. |   .5637949   .1067572     5.28   0.000     .3545547    .7730351
           k |
         L1. |  -.4850106   .0941143    -5.15   0.000    -.6694714   -.3005499
      yr1980 |   -.000474   .0123318    -0.04   0.969    -.0246438    .0236958
      yr1981 |  -.0172628   .0170324    -1.01   0.311    -.0506458    .0161201
      yr1982 |   .0163149   .0195558     0.83   0.404    -.0220137    .0546435
      yr1983 |   .0206385   .0204995     1.01   0.314    -.0195397    .0608167
      yr1984 |   .0126074   .0295783     0.43   0.670    -.0453651    .0705799
       _cons |   .6428872   .3137909     2.05   0.040     .0278683    1.257906
------------------------------------------------------------------------------
Overidentification tests (chi2 p-values): Sargan = 0.000, Hansen = 0.546;
difference-in-Hansen tests: 0.002, 0.056
Endogeneity Revisited
Consider the following wage regression:
w_it = a_i + β1 ed_i + β2 exp_it + β3 tenure_it + ε_it
We can think of two possible forms of endogeneity:
o Two-way causation: experience is rewarded with high pay and workers tend to stay in high-paid jobs
o Unobserved common factors: ability is rewarded with high pay and high-ability people stay in education longer
Two-way causation
Tenure model:
tenure_it = γ w_it + v_it
Unobserved common factors: a_i represents high ability and high-ability people stay in education longer:
ed_i = δ a_i + noise,  so that  cov(ed_i, a_i) > 0
To deal with this kind of endogeneity we can estimate a within-group regression model
o The within-group transformation eliminates the a_i's
o It also eliminates time-invariant variables, but there are approaches (e.g. Hausman-Taylor) to obtain coefficients on these variables
Other IV Estimators
By applying the between-group transformation or the random-effects GLS transformation to
the model and instruments, we can define between-group and random effects IV estimators
analogous to the regression case
o As with standard regression, these estimators are not robust with respect to correlation between the a_i and the x_it
Fixed-effects (within) regression               Number of obs      =     28093
                                                Number of groups   =      4699
R-sq:  within  = 0.1335                         Obs per group: min =         1
       between = 0.2484                                        avg =       6.0
       overall = 0.1862                                        max =        15
corr(u_i, Xb) = 0.1840                          F(3,23391)         =   1201.75
                                                Prob > F           =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .0209418   .0008001    26.17   0.000     .0193735      .02251
         age |   .0123481   .0004125    29.93   0.000     .0115395    .0131566
    not_smsa |   -.099398   .0097221   -10.22   0.000     -.118454     -.080342
       _cons |   1.280688   .0112142   114.20   0.000     1.258707    1.302668
-------------+----------------------------------------------------------------
     sigma_u |  .38143467
     sigma_e |  .29745202
         rho |  .62184184   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(4698, 23391) = 7.33              Prob > F = 0.0000
Fixed-effects (within) IV regression            Number of obs      =     19007
                                                Number of groups   =      4134
R-sq:  within  =      .                         Obs per group: min =         1
       between = 0.1277                                        avg =       4.6
       overall = 0.0879                                        max =        12
corr(u_i, Xb) = -0.6875                         Wald chi2(3)       = 141873.28
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .2452291   .0386354     6.35   0.000     .1695051    .3209531
         age |  -.0651322   .0127701    -5.10   0.000    -.0901611   -.0401034
    not_smsa |  -.0159519   .0346643    -0.46   0.645    -.0838927    .0519888
       _cons |   2.831893   .2443845    11.59   0.000     2.352908    3.310878
-------------+----------------------------------------------------------------
     sigma_u |  .71942007
     sigma_e |  .64359089
         rho |  .55546192   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(4133,14870) = 1.38               Prob > F = 0.0000
Instrumented:  tenure
Instruments:   age not_smsa union south
Random-effects GLS regression                   Number of obs      =     28093
                                                Number of groups   =      4699
R-sq:  within  = 0.1322                         Obs per group: min =         1
       between = 0.2638                                        avg =       6.0
       overall = 0.1979                                        max =        15
corr(u_i, X) = 0 (assumed)                      Wald chi2(3)       =   4879.47
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |    .025503   .0007467    34.15   0.000     .0240395    .0269666
         age |   .0118608   .0003859    30.74   0.000     .0111044    .0126171
    not_smsa |  -.1580611   .0077541   -20.38   0.000    -.1732589   -.1428633
       _cons |   1.289867   .0116069   111.13   0.000     1.267118    1.312616
-------------+----------------------------------------------------------------
     sigma_u |  .32162515
     sigma_e |  .29745202
         rho |  .53898759   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Random-effects IV regression                    Number of obs      =     19007
                                                Number of groups   =      4134
R-sq:  within  = 0.0607                         Obs per group: min =         1
       between = 0.1725                                        avg =       4.6
       overall = 0.1192                                        max =        12
corr(u_i, X) = 0 (assumed)                      Wald chi2(3)       =    929.08
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .1768498   .0110283    16.04   0.000     .1552346    .1984649
         age |  -.0333235   .0030544   -10.91   0.000      -.03931     -.027337
    not_smsa |  -.2135208   .0129503   -16.49   0.000    -.2389029   -.1881386
       _cons |    2.20578   .0581674    37.92   0.000     2.091774    2.319787
-------------+----------------------------------------------------------------
     sigma_u |  .32796027
     sigma_e |  .64359089
         rho |  .20614163   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:  tenure
Instruments:   age not_smsa union south
First-differenced IV regression                 Number of obs      =      5934
                                                Number of groups   =      3461
R-sq:  within  = 0.1235                         Obs per group: min =         1
       between = 0.2071                                        avg =       4.3
       overall = 0.0892                                        max =        11
corr(u_i, Xb) = -0.4766                         Wald chi2(3)       =      5.83
                                                Prob > chi2        =    0.1203
------------------------------------------------------------------------------
   D.ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |
         D1. |   .1365949   .0778382     1.75   0.079    -.0159652     .289155
         age |
         D1. |  -.0048762   .0135226    -0.36   0.718      -.03138    .0216277
    not_smsa |
         D1. |  -.0633273   .0382332    -1.66   0.098     -.138263    .0116083
       _cons |  -.0694077   .0598777    -1.16   0.246    -.1867658    .0479503
-------------+----------------------------------------------------------------
     sigma_u |  .53692033
     sigma_e |  .28615582
         rho |  .77878957   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:  tenure
Instruments:   age not_smsa union south
Hausman-Taylor IV Estimator
Uses exogenous time-varying regressors x_it from periods other than the current one as instruments
One benefit of this approach is that it allows the estimation of a coefficient on a time-invariant regressor in a fixed effects model (which is not possible using the standard FE model)
The model is:
y_it = x_it β + z_i γ + a_i + ε_it    (1)
where z_i are a set of individual-specific (non-time-varying) variables (e.g. education)
Partition x_it and z_i into exogenous and endogenous parts: x_it = (x1_it, x2_it) and z_i = (z1_i, z2_i), where x1 and z1 are exogenous and x2 and z2 are correlated with a_i
Step 1: compute the within-group estimator for β:
o Regress y_it − ȳ_i on (x_it − x̄_i), which gives us the estimates β̂_W (which are consistent estimates of the β parameters)
Step 2: construct the within-group residuals and estimate σ_ε²:
d_it = y_it − ȳ_i − (x_it − x̄_i)β̂_W
Step 3: estimate γ by IV from a regression of the group-mean residuals on the z's:
d̄_i = μ + z_i γ + residual
o To do this, stack the group means of these residuals in a full sample length data vector
o Use as IVs v_it = (x1_it, z1_i) (which requires that the number of x1's is at least as large as the number of z2's)
o This provides a consistent estimator of the γ's
Step 4: Construct û_i = ȳ_i − z_i γ̂ − x̄_i β̂; estimate σ_a² from the d_it and û_i. These form the weights in the GLS (random effects) estimation
Step 5: Estimate (1) as a random effects model using as IVs v_it = (z1_i, (x1_it − x̄1_i), (x2_it − x̄2_i), x̄1_i)
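Step 1 above is just within-group OLS; a minimal Python sketch (illustrative only, not the full Hausman-Taylor procedure) makes the demeaning explicit:

```python
import numpy as np

def within_ols(y, X, ids):
    """Within-group (fixed effects) OLS: demean y and X inside each
    group, then regress the demeaned y on the demeaned X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ids = np.asarray(ids)
    yd, Xd = y.copy(), X.copy()
    for g in np.unique(ids):
        m = ids == g
        yd[m] -= y[m].mean()
        Xd[m] -= X[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    return beta

# Tiny panel where y = 2*x + a_i exactly: the within estimator recovers
# the slope 2 even though the individual effects a_i differ across groups.
ids = [1, 1, 2, 2]
x = [[0.0], [1.0], [0.0], [2.0]]
a = [5.0, 5.0, -3.0, -3.0]
y = [2 * xi[0] + ai for xi, ai in zip(x, a)]
print(within_ols(y, x, ids))  # [2.]
```

The demeaning kills anything constant within a group, which is exactly why z_i drops out and Steps 2-5 are needed to recover γ.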
This estimator was first proposed as a way of estimating wage regressions:
o Given that unobserved ability is omitted from the regression model, random effects estimation will suffer from an endogeneity bias
o Fixed effects estimation can eliminate this bias, but also prevents us from estimating
the coefficients on schooling as well as other time-invariant (dummy) variables
Pooled OLS:
    Number of obs = 4165       F(12, 594) = 65.91      Prob > F = 0.0000
    R-squared = 0.4286         Root MSE = .34936

Fixed-effects (within) regression:
    Number of obs = 4165       Number of groups = 595
    Obs per group: min = 7, avg = 7.0, max = 7
    R-sq: within = 0.6581, between = 0.0261, overall = 0.0461
    corr(u_i, Xb) = -0.9100    F(9,594) = 377.62       Prob > F = 0.0000
. xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 occ ind union ed) constant(fem blk ed)

Hausman-Taylor estimation                       Number of obs      =      4165
Group variable: id                              Number of groups   =       595
                                                Obs per group: min =         7
                                                               avg =         7
                                                               max =         7
                                                Wald chi2(12)      =   6874.89
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
TVexogenous  |
         wks |    .000909   .0005988     1.52   0.129    -.0002647    .0020827
       south |   .0071377    .032548     0.22   0.826    -.0566553    .0709306
        smsa |  -.0417623   .0194019    -2.15   0.031    -.0797893   -.0037352
          ms |   -.036344   .0188576    -1.93   0.054    -.0733041    .0006161
TVendogenous |
         exp |   .1129718   .0024697    45.74   0.000     .1081313    .1178122
        exp2 |  -.0004191   .0000546    -7.68   0.000    -.0005261   -.0003121
         occ |  -.0213946   .0137801    -1.55   0.121     -.048403    .0056139
         ind |   .0188416   .0154404     1.22   0.222     -.011421    .0491043
       union |   .0303548   .0148964     2.04   0.042     .0011583    .0595513
TIexogenous  |
         fem |  -.1368468   .1272797    -1.08   0.282    -.3863104    .1126169
         blk |  -.2818287   .1766269    -1.60   0.111     -.628011    .0643536
TIendogenous |
          ed |   .1405254   .0658715     2.13   0.033     .0114197    .2696311
       _cons |   2.884418   .8527775     3.38   0.001     1.213004    4.555831
-------------+----------------------------------------------------------------
     sigma_u |  .94172547
     sigma_e |  .15180273
         rho |  .97467381   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note: TV refers to time varying; TI refers to time invariant.
Certain econometric problems are easier to address within the LPM framework than with probit and logit models (e.g. when using instrumental variables whilst controlling for fixed effects)
The two main problems with the LPM were: nonsense predictions are possible (there is nothing to bind the predicted probability ŷ to the (0,1) range); and linearity doesn't make much sense conceptually. To address these problems we can use a nonlinear binary response model
A general index model has the form:
P(y = 1|x) = G(xβ)
for some function G with 0 < G(z) < 1 for all z. In most cases, G(z) is a cumulative distribution function for a continuous random variable with density g(z). Then, G(z) is strictly increasing, and the estimates are easier to interpret.
The leading cases are:
o The Probit Model: G(z) = Φ(z) = ∫ from −∞ to z of φ(v) dv, where φ(v) = (2π)^(−1/2) exp(−v²/2) is the standard normal density    (2)
o The Logit Model: G(z) = Λ(z) = exp(z)/(1 + exp(z)), the logistic CDF
Suppose we estimate a probit modelling the probability that a firm does some exporting as a function of firm size.
For simplicity, abstract from other explanatory variables. Our model is thus:
P(export = 1|size) = Φ(β0 + β1 size)
where size is defined as the natural logarithm of employment. The estimates are:

               Coefficient   t-statistic
    constant       -2.85         16.6
    size            0.54         13.4
Since the coefficient on size is positive, we know that the marginal effect must be positive.
Treating size as a continuous variable, it follows that the marginal effect is equal to:
∂P(export = 1|size)/∂size = φ(β0 + β1 size) β1
∂P(export = 1|size)/∂size = φ(−2.85 + 0.54 size) × 0.54
Assume that the mean value of log employment is 3.4 (i.e. about 30 workers); we can then evaluate the marginal effect at the mean (this is the partial effect at the average). In general,
∂p̂/∂x_j = φ(β̂0 + β̂1 x1 + β̂2 x2 + ⋯ + β̂k xk) β̂_j
We then have:
∂P(export = 1|size = 3.4)/∂size = (1/√(2π)) exp(−(−2.85 + 0.54 × 3.4)²/2) × 0.54
∂P(export = 1|size = 3.4)/∂size ≈ 0.13
So, evaluated at log employment = 3.4, the results imply that an increase in log employment by a small amount Δ raises the probability of exporting by approximately 0.13Δ
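The arithmetic above can be checked directly; a minimal sketch using only the standard library:

```python
import math

def probit_partial_effect(b0, b1, x):
    """Partial effect of x in a probit P(y=1|x) = Phi(b0 + b1*x):
    phi(b0 + b1*x) * b1, with phi the standard normal density."""
    z = b0 + b1 * x
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi) * b1

# Partial effect at the average (log employment = 3.4):
print(round(probit_partial_effect(-2.85, 0.54, 3.4), 2))  # 0.13
```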
A second way of calculating the marginal effect is the average partial effect:
APE_j = N^(−1) Σ from i=1 to N of φ(x_i β̂) β̂_j = [N^(−1) Σ from i=1 to N of φ(x_i β̂)] β̂_j
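The APE just averages the density term over the sample before multiplying by the coefficient; a sketch with hypothetical log-employment values:

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def probit_ape(b0, b1, xs):
    """Average partial effect: mean of phi(b0 + b1*x_i) over the
    sample, multiplied by the coefficient b1."""
    return b1 * sum(norm_pdf(b0 + b1 * x) for x in xs) / len(xs)

# Hypothetical sample of log-employment values (illustrative only):
sizes = [1.5, 2.0, 3.4, 4.8, 6.0]
ape = probit_ape(-2.85, 0.54, sizes)
```

With a single observation at the mean, the APE collapses to the partial effect at the average computed above.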
For panel data, the analogous model is:
P(y_it = 1|x_it, a_i) = G(x_it β + a_i)
where G(·) is either the standard normal CDF (probit) or the logistic CDF (logit)
In linear panel models it is easy to eliminate the individual effects (i.e. the a_i's) by first differencing or by using the within-groups transformation
This is not possible in this case because of the non-linear nature of the model (i.e. the G(·) function)
If we attempt to estimate the a_i's through the inclusion of dummy variables in the probit and logit specification we will get biased estimates of β unless T is large.
This is the incidental parameters problem
o With small T the estimates of the a_i's are inconsistent (and increasing N doesn't solve this problem).
o Unlike in the linear model, the inconsistency of the a_i's has a knock-on effect in the sense that the estimate of β becomes inconsistent too.
Example:
Consider the logit model in which T = 2, β is a scalar, and x_it is a time dummy such that x_i1 = 0, x_i2 = 1. Thus,
P(y_i1 = 1|x_i1, a_i) = exp(β x_i1 + a_i) / (1 + exp(β x_i1 + a_i)) = Λ(β x_i1 + a_i)
P(y_i2 = 1|x_i2, a_i) = exp(β x_i2 + a_i) / (1 + exp(β x_i2 + a_i)) = Λ(β x_i2 + a_i)
Suppose we attempt to estimate this model with N dummy variables included to control for the individual fixed effects. There would thus be N + 1 parameters to estimate. It can be shown that in this case the probability limit of the estimator of our parameter of interest, β, is:
plim β̂ = 2β
That is, the probability limit of the logit dummy variable estimator for this admittedly very special case is double the true value of β. With a bias of 100% in very large (infinite) samples, this is not a very useful approach. This form of inconsistency also holds in more general cases: unless T is large, the logit dummy variable estimator will not work.
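This result can be illustrated by simulation. In this two-period time-dummy case the conditional (fixed-effects) logit estimator has the closed form ln(n01/n10), where n01 and n10 count the two kinds of switchers, and a known closed form for this special case gives the dummy-variable MLE as exactly twice that quantity. A sketch, assuming β = 1 and standard-normal individual effects:

```python
import numpy as np

rng = np.random.default_rng(42)
N, beta = 200_000, 1.0
a = rng.normal(size=N)                   # individual effects a_i
x1, x2 = 0.0, 1.0                        # the time dummy: x_i1 = 0, x_i2 = 1

p1 = 1 / (1 + np.exp(-(beta * x1 + a)))  # P(y_i1 = 1 | a_i)
p2 = 1 / (1 + np.exp(-(beta * x2 + a)))  # P(y_i2 = 1 | a_i)
y1 = rng.random(N) < p1
y2 = rng.random(N) < p2

n01 = np.sum(~y1 & y2)                   # switchers 0 -> 1
n10 = np.sum(y1 & ~y2)                   # switchers 1 -> 0
b_conditional = np.log(n01 / n10)        # conditional logit: consistent for beta
b_dummy = 2 * b_conditional              # dummy-variable MLE in this T = 2 case

# b_conditional lands close to 1 (the truth); b_dummy close to 2.
```

Increasing N tightens both estimates around their probability limits, which is exactly the point: more individuals do not rescue the dummy-variable estimator.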
The random effects probit model assumes a_i ~ Normal(0, σ_a²), independent of x_i, and
P(y_it = 1|x_it, a_i) = Φ(x_it β + a_i)
These are restrictive assumptions, especially since endogeneity in the explanatory variables is ruled out. The only advantage over a simple pooled probit model is that the RE model allows for serial correlation in the unobserved factors determining y_it, i.e. in (a_i + u_it)
However, it is fairly straightforward to extend the model and allow for correlation between a_i and x_it; this is what the Mundlak-Chamberlain approach does (see below)
If a_i had been observed, the likelihood of observing individual i would have been:
f(y_i1, …, y_iT | x_i, a_i; β) = ∏ from t=1 to T of Φ(x_it β + a_i)^(y_it) [1 − Φ(x_it β + a_i)]^(1−y_it)
and it would be straightforward to maximize the sample likelihood conditional on x_it, a_i and y_it
Because the a_i are unobserved however, they cannot be included in the likelihood function.
As discussed above, a dummy variables approach cannot be used, unless T is large.
Recall from basic statistics (Bayes' theorem for probability densities) that, in general,
f_{y|x}(y|x) = f(y, x) / f_x(x)
The marginal density of y can be obtained by integrating out x from the joint density:
f_y(y) = ∫ f(y, x) dx = ∫ f_{y|x}(y|x) f_x(x) dx
We can think about f_y(y) as a likelihood contribution. For a linear model, we might write:
L_i(θ) = ∫ f_{y|a}(y_i | a; θ) f_a(a) da
In the context of the traditional RE probit, we integrate out a_i from the likelihood as follows:
L_i(y_i1, …, y_iT | x_i1, …, x_iT; β, σ_a) = ∫ { ∏ from t=1 to T of Φ(x_it β + a)^(y_it) [1 − Φ(x_it β + a)]^(1−y_it) } (1/σ_a) φ(a/σ_a) da
Since β and σ_a can be estimated, the partial effects at a = 0 as well as the APEs can be estimated
Marginal effects at a_i = 0 can be computed using standard techniques, with the APE again a useful effect to calculate
Since a_i ~ Normal(0, σ_a²), the APE for a continuous x_tj is:
APE_j = β_j (1 + σ_a²)^(−1/2) φ(x_t β (1 + σ_a²)^(−1/2))
Whilst perhaps elegant, the above model does not allow for a correlation between 40 and
the explanatory variables, and so does not achieve anything in terms of addressing an
endogeneity problem. We now turn to more useful models in that context.
One important advantage of this model over the probit model is that it will be possible to obtain a consistent estimator of β without making any assumptions about how a_i is related to x_it (however, we still need strict exogeneity to hold).
This is possible because the logit functional form enables us to eliminate a_i from the estimating equation
What we do is find the joint distribution of y_i ≡ (y_i1, …, y_iT) conditional on x_i, a_i and n_i ≡ Σ from t=1 to T of y_it
It turns out in the logit case that this conditional distribution does not depend upon a_i, so that it is also the distribution of y_i given x_i and n_i
Take T = 2. The key thing to note here is that we condition on y_i1 + y_i2 = 1, i.e. that y_it changes between the two time periods. For the logit functional form, we have:
P(y_i1 + y_i2 = 1 | x_i1, x_i2, a_i)
 = [exp(x_i1 β + a_i)/(1 + exp(x_i1 β + a_i))] × [1/(1 + exp(x_i2 β + a_i))]
 + [1/(1 + exp(x_i1 β + a_i))] × [exp(x_i2 β + a_i)/(1 + exp(x_i2 β + a_i))]
Or simply:
P(y_i1 + y_i2 = 1 | x_i1, x_i2, a_i) = [exp(x_i1 β + a_i) + exp(x_i2 β + a_i)] / [(1 + exp(x_i1 β + a_i))(1 + exp(x_i2 β + a_i))]
Furthermore:
P(y_i1 = 0, y_i2 = 1 | x_i1, x_i2, a_i) = [1/(1 + exp(x_i1 β + a_i))] × [exp(x_i2 β + a_i)/(1 + exp(x_i2 β + a_i))]
so that, dividing by the probability of a switch:
P(y_i1 = 0, y_i2 = 1 | x_i1, x_i2, a_i, y_i1 + y_i2 = 1) = exp(x_i2 β + a_i) / [exp(x_i1 β + a_i) + exp(x_i2 β + a_i)]
P(y_i1 = 0, y_i2 = 1 | x_i1, x_i2, a_i, y_i1 + y_i2 = 1) = exp(Δx_i β) / (1 + exp(Δx_i β))
where Δx_i ≡ x_i2 − x_i1. The key result is that the a_i have been eliminated. It follows that:
P(y_i1 = 1, y_i2 = 0 | x_i1, x_i2, a_i, y_i1 + y_i2 = 1) = 1 / (1 + exp(Δx_i β))
Notes:
o These probabilities condition on y_i1 + y_i2 = 1
o These probabilities are independent of a_i
The conditional log-likelihood, based on the switchers only, is:
log L = Σ over i of { w_i ln[exp(Δx_i β)/(1 + exp(Δx_i β))] + (1 − w_i) ln[1/(1 + exp(Δx_i β))] }
where w_i = 1 if (y_i1, y_i2) = (0, 1) and w_i = 0 if (y_i1, y_i2) = (1, 0)
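The cancellation of a_i is easy to verify numerically: evaluating the conditional probability at several values of the individual effect gives the same answer, equal to Λ(Δx β). A quick sketch with arbitrary illustrative numbers:

```python
import math

def lam(z):
    """Logistic CDF."""
    return 1 / (1 + math.exp(-z))

def p01_given_switch(x1, x2, beta, a):
    """P(y1 = 0, y2 = 1 | y1 + y2 = 1) in the two-period logit model
    with individual effect a."""
    p01 = (1 - lam(beta * x1 + a)) * lam(beta * x2 + a)
    p10 = lam(beta * x1 + a) * (1 - lam(beta * x2 + a))
    return p01 / (p01 + p10)

# Identical for any value of a, and equal to lam(beta * (x2 - x1)):
vals = [p01_given_switch(0.2, 1.0, 0.7, a) for a in (-3.0, 0.0, 5.0)]
target = lam(0.7 * (1.0 - 0.2))
```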
Another awkward issue concerns the interpretation of the results. The estimation procedure just outlined implies we do not obtain estimates of a_i, which means we can't compute marginal effects.
o We can't estimate the partial effects on the response probabilities unless we plug in a value for a
o Because the distribution of a_i is unrestricted, and in particular E(a_i) is not necessarily zero, it is hard to know what to plug in for a
o We can't estimate APEs, since doing so would require finding E[Λ(x_t β + a_i)], a task that requires specifying a distribution for a_i
We will now discuss an approach which, in some ways, can be thought of as representing a middle way. Start from the latent variable model:
y*_it = x_it β + a_i + u_it
y_it = 1[y*_it > 0]
Consider writing the a_i as an explicit function of the x-variables (i.e. allowing for correlation between a_i and x_i), for example as follows:
a_i = ψ + x̄_i ξ + c_i    (1)
or
a_i = ψ + x_i λ + c_i    (2)
where x̄_i is an average of x_it over time for individual i (hence time invariant); x_i contains x_it for all t; c_i is assumed uncorrelated with x̄_i; c_i is assumed uncorrelated with x_i. Equation (1) is easier to implement and so we will focus on this.
Equation (1) may be considered restrictive, in the sense that functional form assumptions are made, but it at least allows for non-zero correlation between a_i and the regressors x_it.
The probability that y_it = 1 can now be written as:
P(y_it = 1|x_it, a_i) = P(y_it = 1|x_it, x̄_i, c_i) = Φ(x_it β + ψ + x̄_i ξ + c_i)
We now see that, after having added x̄_i to the RHS, we arrive at the traditional random effects probit model:
L_i(y_i1, …, y_iT | x_i1, …, x_iT; β, σ_c) = ∫ { ∏ from t=1 to T of Φ(x_it β + ψ + x̄_i ξ + c)^(y_it) [1 − Φ(x_it β + ψ + x̄_i ξ + c)]^(1−y_it) } (1/σ_c) φ(c/σ_c) dc
Note that E(a_i) = ψ + E(x̄_i) ξ
The APE can be obtained from the average structural function:
ASF = N^(−1) Σ from i=1 to N of Φ(x_t β_c + ψ_c + x̄_i ξ_c)
where the c subscripts indicate that coefficients have been scaled by (1 + σ_c²)^(−1/2)
For a discrete variable, the above expression can be evaluated for two different values of x_t
For a continuous variable, x_tj, the APE can be evaluated by using the average across i of β_cj φ(x_t β_c + ψ_c + x̄_i ξ_c) to get the approximate APE of a one-unit increase in x_tj
Fixed-effects (LPM) regression:
    Number of obs = 28315      Number of groups = 5663
    Obs per group: min = 5, avg = 5.0, max = 5
    R-sq: within = 0.0031, between = 0.0103, overall = 0.0091
    corr(u_i, Xb) = -0.0073    F(6,5662) = 5.61        Prob > F = 0.0000
note: per5 omitted because of collinearity

Probit regression                               Number of obs      =     28315
                                                LR chi2(10)        =   2304.70
                                                Prob > chi2        =    0.0000
Log likelihood = -16556.671                     Pseudo R2          =    0.0651
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1989144   .0074815   -26.59   0.000    -.2135779   -.1842509
       lhinc |  -.2110738   .0130489   -16.18   0.000    -.2366493   -.1854984
        educ |   .0796863    .003201    24.89   0.000     .0734125    .0859601
       black |   .2209396   .0334141     6.61   0.000     .1554492    .2864301
         age |   .1449159   .0061536    23.55   0.000      .132855    .1569767
       agesq |  -.0019912   .0000756   -26.34   0.000    -.0021393     -.001843
        per1 |   .0577767    .025249     2.29   0.022     .0082896    .1072637
        per2 |   .0453522   .0252187     1.80   0.072    -.0040756    .0947799
        per3 |   .0252589   .0251707     1.00   0.316    -.0240749    .0745926
        per4 |   .0116797    .025157     0.46   0.642    -.0376272    .0609865
        per5 |  (omitted)
       _cons |  -1.122226   .1369621    -8.19   0.000    -1.390667     -.853785
------------------------------------------------------------------------------
Average marginal effects                        Number of obs      =     28315
Expression   : Pr(lfp), predict()
dy/dx w.r.t. : kids lhinc
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.0660184    .002395   -27.56   0.000    -.0707126   -.0613242
       lhinc |   -.070054   .0042791   -16.37   0.000    -.0784409   -.0616671
------------------------------------------------------------------------------
note: per5 omitted because of collinearity

Probit regression                               Number of obs      =     28315
                                                LR chi2(12)        =   2385.17
                                                Prob > chi2        =    0.0000
Log likelihood = -16516.436                     Pseudo R2          =    0.0673
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1173749   .0372874    -3.15   0.002    -.1904569     -.044293
       lhinc |  -.0288098   .0248077    -1.16   0.246     -.077432    .0198125
    kids_bar |  -.0856913   .0380322    -2.25   0.024     -.160233   -.0111495
   lhinc_bar |  -.2501781   .0290625    -8.61   0.000    -.3071396   -.1932167
        educ |   .0841338   .0032539    25.86   0.000     .0777562    .0905114
       black |   .2030668   .0335069     6.06   0.000     .1373945     .268739
         age |   .1516424   .0062081    24.43   0.000     .1394748    .1638101
       agesq |  -.0020672   .0000762   -27.13   0.000    -.0022166   -.0019179
        per1 |   .0552425   .0252773     2.19   0.029     .0056999    .1047851
        per2 |   .0416724   .0252544     1.65   0.099    -.0078254    .0911701
        per3 |   .0220434   .0252037     0.87   0.382     -.027355    .0714417
        per4 |   .0162108   .0251878     0.64   0.520    -.0331564    .0655779
        per5 |  (omitted)
       _cons |  -.7812987   .1426149    -5.48   0.000    -1.060819   -.5017785
------------------------------------------------------------------------------
Average marginal effects                        Number of obs      =     28315
Expression   : Pr(lfp), predict()
dy/dx w.r.t. : kids lhinc
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |   -.038852   .0123363    -3.15   0.002    -.0630307   -.0146734
       lhinc |  -.0095363    .008211    -1.16   0.245    -.0256296    .0065571
------------------------------------------------------------------------------
. xtprobit lfp kids lhinc kids_bar lhinc_bar educ black age agesq per1 per2 per3

Random-effects probit regression                Number of obs      =     28315
Group variable: id                              Number of groups   =      5663
                                                Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5
Log likelihood = -8609.9002                     Wald chi2(12)      =    623.40
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.3970102   .0701298    -5.66   0.000     -.534462   -.2595584
       lhinc |    -.10034   .0469979    -2.13   0.033    -.1924541   -.0082258
    kids_bar |  -.4085664   .0898875    -4.55   0.000    -.5847427   -.2323901
   lhinc_bar |  -.8941069   .1199703    -7.45   0.000    -1.129244   -.6589695
        educ |   .3189079    .024327    13.11   0.000     .2712279     .366588
       black |   .6388783   .1903525     3.36   0.001     .2657943    1.011962
         age |   .7282056   .0445623    16.34   0.000      .640865    .8155461
       agesq |  -.0098358   .0005747   -17.11   0.000    -.0109623   -.0087094
        per1 |    .200357    .049539     4.04   0.000     .1032624    .2974515
        per2 |   .1551917   .0499822     3.10   0.002     .0572284     .253155
        per3 |   .0756514   .0499737     1.51   0.130    -.0222952     .173598
        per4 |   .0646736    .049747     1.30   0.194    -.0328288    .1621759
        per5 |  (omitted)
       _cons |  -5.559732   1.000528    -5.56   0.000     -7.52073   -3.598733
-------------+----------------------------------------------------------------
    /lnsig2u |   2.947234   .0435842                      2.861811    3.032657
-------------+----------------------------------------------------------------
     sigma_u |   4.364995   .0951224                      4.182484     4.55547
         rho |   .9501326    .002065                       .945926    .9540279
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 1.6e+04  Prob >= chibar2 = 0.000
From the output, σ̂_u = 4.364995, so σ̂_u² = 19.05318 and the scale factor is (1 + 19.05318)^(−1/2) = 0.22331

gen temp = normalden(0.22331*(_b[kids]*kids + _b[lhinc]*lhinc + _b[kids_bar]*kids_bar + _b[lhinc_bar]*lhinc_bar + _b[educ]*educ + _b[black]*black + _b[age]*age + _b[agesq]*agesq + _b[per1]*per1 + _b[per2]*per2 + _b[per3]*per3 + _b[per4]*per4 + _b[_cons]))
egen temp1 = mean(temp)

temp1 = 0.3250374 (this is the scale factor)
So, the APE for kids is (−0.3970102 × 0.22331) × 0.3250374 ≈ −0.0288
Estimating the LPM by FE gives estimated coefficients of roughly −0.039 and −0.009 on the kids and lhinc variables respectively. That is, each child reduces the probability of labour force participation by about 0.039 (i.e. 3.9 percentage points), while a 10% increase in a husband's income lowers the probability by about 0.0009.
The APEs become much larger when we use the probit and assume that a_i is independent of x_i
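The scale-factor arithmetic above can be reproduced directly from the reported estimates:

```python
# Reproduce the APE calculation from the RE probit output above.
sigma_u = 4.364995                      # sigma_u reported by xtprobit
scale = (1 + sigma_u ** 2) ** -0.5      # (1 + sigma_u^2)^(-1/2) = 0.22331
b_kids = -0.3970102                     # coefficient on kids
mean_phi = 0.3250374                    # sample mean of the scaled density
ape_kids = b_kids * scale * mean_phi    # ~ -0.0288
print(round(scale, 5), round(ape_kids, 4))  # 0.22331 -0.0288
```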
Conditional fixed-effects logit regression      Number of obs      =      5275
                                                Number of groups   =      1055
                                                Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5
Log likelihood = -2003.4184                     LR chi2(6)         =     57.27
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.6438386   .1247828    -5.16   0.000    -.8884084   -.3992688
       lhinc |  -.1842911   .0826019    -2.23   0.026    -.3461878   -.0223943
        educ |  (omitted)
       black |  (omitted)
         age |  (omitted)
       agesq |  (omitted)
        per1 |   .3563745   .0888354     4.01   0.000     .1822604    .5304886
        per2 |   .2635706   .0886977     2.97   0.003     .0897262     .437415
        per3 |   .1315756   .0880899     1.49   0.135    -.0410774    .3042286
        per4 |   .1084422   .0879067     1.23   0.217    -.0638517    .2807361
        per5 |  (omitted)
------------------------------------------------------------------------------
Coefficient estimates are difficult to interpret
The relative size is 0.644 / 0.184 = 3.5, which is not too different from the ratio when using the pooled MLE Chamberlain model for example
The dynamic unobserved-effects probit model is:
P(y_it = 1 | y_{i,t−1}, …, y_i0, z_i, a_i) = Φ(z_it δ + ρ y_{i,t−1} + a_i)    (1)
Model the individual effect as a_i = α + ξ0 y_i0 + z_i ξ + c_i, so that y_it given y_{i,t−1}, …, y_i0, z_i, c_i follows a probit model and c_i given (y_i0, z_i) is distributed as Normal(0, σ_c²)
After integrating out the c_i the likelihood function in the dynamic probit model is:
L_i = ∫ { ∏ from t=1 to T of Φ(z_it δ + ρ y_{i,t−1} + α + ξ0 y_i0 + z_i ξ + c)^(y_it) [1 − Φ(z_it δ + ρ y_{i,t−1} + α + ξ0 y_i0 + z_i ξ + c)]^(1−y_it) } (1/σ_c) φ(c/σ_c) dc
This gives a density in exactly the same form as that for the RE probit MLE, with c and σ_c² replacing a and σ_a²
This means that we can use standard RE probit commands to estimate these dynamic models
We simply expand the list of explanatory variables to include y_i0 and z_i in each time period
It is then simple to test whether ρ = 0, meaning that there is no state dependence, once we control for an unobserved effect
In estimating the dynamic model it is important to remember that it is not possible to obtain consistent estimates of the parameters using pooled probit of y_it on 1, z_it, y_{i,t−1}, y_i0, z_i.
We can estimate Average Partial Effects, but we must now average out the initial condition along with leads and lags of all strictly exogenous variables.
Let z_t and y_{t−1} be given values of the explanatory variables
Then the Average Structural Function is estimated by:
ASF = N^(−1) Σ from i=1 to N of Φ(z_t δ_c + ρ_c y_{t−1} + α_c + ξ_{0c} y_i0 + z_i ξ_c)
where the c subscript denotes that the original coefficients have been multiplied by (1 + σ̂_c²)^(−1/2), and α̂, δ̂, ρ̂, ξ̂0, ξ̂ and σ̂_c² are the estimates reported by the statistical package
We can then take derivatives of this expression w.r.t. continuous elements of z_t, or take differences with respect to discrete elements
A particularly interesting case is to alternatively set y_{t−1} = 1 and y_{t−1} = 0 and obtain the change in probability that y_it = 1 when y_{t−1} goes from zero to one
To obtain a single APE we can also average across all time periods
We further include the variables educ, black, age and agesq, and a full set of time dummies
We include among the regressors: lfp_i0, and the kids and lhinc variables from every period (kids, kids_1 through kids_4; lhinc, lhinc_1 through lhinc_4)
xtprobit lfp l.lfp lfp_0 kids kids_1 kids_2 kids_3 kids_4 lhinc lhinc_1 lhinc_2 lhinc_3 lhinc_4 educ black age agesq per2 per3 per4 per5

Random-effects probit regression                Number of obs      =     22652
Group variable: id                              Number of groups   =      5663
                                                Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4
Log likelihood = -5039.9867                     Wald chi2(19)      =   4093.48
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfp |
         L1. |   1.536826   .0665669    23.09   0.000     1.406358    1.667295
       lfp_0 |   2.545303   .1556541    16.35   0.000     2.240227     2.85038
        kids |  -.3591127   .0646669    -5.55   0.000    -.4858574   -.2323679
      kids_1 |   .2595521   .0665754     3.90   0.000     .1290668    .3900374
      kids_2 |   .0206725   .0355721     0.58   0.561    -.0490476    .0903925
      kids_3 |   .0024968   .0358429     0.07   0.944    -.0677541    .0727477
      kids_4 |    .047252   .0366343     1.29   0.197      -.02455     .119054
       lhinc |  -.0745293   .0485506    -1.54   0.125    -.1696867    .0206281
     lhinc_1 |  -.0747431   .0531001    -1.41   0.159    -.1788173    .0293311
     lhinc_2 |  -.0080621   .0501491    -0.16   0.872    -.1063524    .0902283
     lhinc_3 |   .0088362   .0511642     0.17   0.863    -.0914437    .1091162
     lhinc_4 |  -.1189348   .0610491    -1.95   0.051    -.2385888    .0007193
        educ |   .0459694   .0098917     4.65   0.000      .026582    .0653569
       black |   .1281378   .0984119     1.30   0.193    -.0647459    .3210216
         age |   .1383024    .019357     7.14   0.000     .1003634    .1762414
       agesq |  -.0017838   .0002402    -7.43   0.000    -.0022545   -.0013131
        per2 |  -.7521025   .5635128    -1.33   0.182    -1.856567    .3523623
        per3 |  -.7700739   .5206839    -1.48   0.139    -1.790596    .2504478
        per4 |  -.8158966   .4836915    -1.69   0.092    -1.763915    .1321213
        per5 |  (omitted)
       _cons |  -2.818611   .5587894    -5.04   0.000    -3.913818   -1.723403
-------------+----------------------------------------------------------------
    /lnsig2u |   .1151956   .1209124                     -.1217884    .3521796
-------------+----------------------------------------------------------------
     sigma_u |   1.059289   .0640406                      .9409228    1.192545
         rho |   .5287671    .030128                      .4695905     .587146
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 164.94  Prob >= chibar2 = 0.000
From the above, σ̂_c² = 1.059289² = 1.122093
So, (1 + σ̂_c²)^(−1/2) = 0.686464

gen temp = normalden(_b[lfp_0]*0.686464*lfp_0 + _b[kids]*0.686464*kids + _b[kids_1]*0.686464*kids_1 + _b[kids_2]*0.686464*kids_2 + _b[kids_3]*0.686464*kids_3 + _b[kids_4]*0.686464*kids_4 + _b[lhinc]*0.686464*lhinc + _b[lhinc_1]*0.686464*lhinc_1 + _b[lhinc_2]*0.686464*lhinc_2 + _b[lhinc_3]*0.686464*lhinc_3 + _b[lhinc_4]*0.686464*lhinc_4 + _b[educ]*0.686464*educ + _b[black]*0.686464*black + _b[age]*0.686464*age + _b[agesq]*0.686464*agesq + _b[per2]*0.686464*per2 + _b[per3]*0.686464*per3 + _b[per4]*0.686464*per4 + _b[_cons]*0.686464 + _b[L1.]*0.686464) - normalden(_b[lfp_0]*0.686464*lfp_0 + _b[kids]*0.686464*kids + _b[kids_1]*0.686464*kids_1 + _b[kids_2]*0.686464*kids_2 + _b[kids_3]*0.686464*kids_3 + _b[kids_4]*0.686464*kids_4 + _b[lhinc]*0.686464*lhinc + _b[lhinc_1]*0.686464*lhinc_1 + _b[lhinc_2]*0.686464*lhinc_2 + _b[lhinc_3]*0.686464*lhinc_3 + _b[lhinc_4]*0.686464*lhinc_4 + _b[educ]*0.686464*educ + _b[black]*0.686464*black + _b[age]*0.686464*age + _b[agesq]*0.686464*agesq + _b[per2]*0.686464*per2 + _b[per3]*0.686464*per3 + _b[per4]*0.686464*per4 + _b[_cons]*1.45674)
. summarize temp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        temp |     28315    .0830207    .2166758   -.3857821   .3968095
Averaged across all women and all time periods, the probability of being in the labour force at time t is about 0.08 higher if the woman was in the labour force at time t − 1.
It is instructive to compare this APE with the estimate from a dynamic probit model that ignores the initial conditions problem (and so does not properly control for unobserved heterogeneity):
. xtprobit lfp l.lfp kids lhinc educ black age agesq per1-per5

                                        Number of obs      =     22652
                                        Number of groups   =      5663

                                        Obs per group: min =         4
                                                       avg =       4.0
                                                       max =         4

Log likelihood = -5332.529              Wald chi2(10)      =  12071.51
                                        Prob > chi2        =    0.0000
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfp |
         L1. |   2.875683   .0269811   106.58   0.000     2.822801    2.928565
             |
        kids |  -.0607933    .012217    -4.98   0.000    -.0847381   -.0368484
       lhinc |  -.1143188   .0211668    -5.40   0.000    -.1558051   -.0728325
        educ |   .0291874   .0052362     5.57   0.000     .0189246    .0394501
       black |    .079251   .0536696     1.48   0.140    -.0259396    .1844415
         age |    .084404   .0099983     8.44   0.000     .0648076    .1040004
       agesq |  -.0010991   .0001236    -8.90   0.000    -.0013413    -.000857
        per1 |  (omitted)
        per2 |   .0304145    .037152     0.82   0.413     -.042402     .103231
        per3 |  -.0036646   .0369207    -0.10   0.921    -.0760278    .0686986
        per4 |   .0326971   .0371438     0.88   0.379    -.0401035    .1054977
        per5 |  (omitted)
       _cons |  -2.201223   .2218053    -9.92   0.000    -2.635954   -1.766493
-------------+----------------------------------------------------------------
    /lnsig2u |  -15.70567   14.44481                     -44.01697    12.60564
-------------+----------------------------------------------------------------
     sigma_u |   .0003886     .002807                     2.77e-10     546.109
         rho |   1.51e-07   2.18e-06                      7.65e-20    .9999966
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =    0.00  Prob >= chibar2 = 1.000
. summarize temp

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        temp |      28315    .8375263    .0121556   .6019519    .849521
The APE for state dependence is much higher in this case than when heterogeneity is controlled for.
o Averaged across all women and all time periods, the probability of being in the labour force at time t is about 0.84 higher if the woman was in the labour force at time t − 1.
Therefore, much of the persistence in the labour force participation of married women is accounted for by unobserved heterogeneity. There is some state dependence, but its magnitude is much smaller than a simple dynamic probit indicates.
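The gap between the two APEs has a simple mechanical explanation: when rho is estimated to be essentially zero, the scale factor (1 + σ̂_u²)^(-1/2) is essentially one, so the large lag coefficient shifts the probit index at full strength. A rough numeric sketch (the coefficient 2.875683 comes from the output above; the baseline index −1.4 is a hypothetical value chosen for illustration):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta_lag = 2.875683   # lag coefficient from the simple dynamic probit above
xb = -1.4             # hypothetical baseline index x'b for one observation

# With sigma_u ~ 0 the scale factor is ~1, so the coefficient enters unscaled
jump = norm_cdf(xb + beta_lag) - norm_cdf(xb)
print(round(jump, 2))  # ~0.85 for this index value, in line with the APE of 0.84
```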