GMM Stata
Stata 11
David M. Drukker
StataCorp
Stata Conference
Washington, DC 2009
Outline
1 A quick introduction to GMM
2 gmm examples
  Ordinary least squares
  Two-stage least squares
  Cross-sectional Poisson with endogenous covariates
  Fixed-effects Poisson regression
A quick introduction to GMM
Method of Moments (MM)
We estimate the mean of a distribution by the sample mean, the
variance by the sample variance, etc
We want to estimate $\mu = E[y]$
We use $\hat{\mu} = (1/N)\sum_{i=1}^{N} y_i$
This estimator has nice properties because it solves the sample
moment condition
$(1/N)\sum_{i=1}^{N} (y_i - \hat{\mu}) = 0$
which is the sample analog of the population moment condition
$E[y - \mu] = 0$
Estimators that solve sample moment equations to produce
estimates are called method-of-moments (MM) estimators
This method dates back to Pearson (1895)
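A minimal sketch of the idea in Python (used here only for illustration; the data values are made up and the function name `sample_moment` is hypothetical). It checks that the sample mean sets the sample moment condition to zero, and adds the analogous MM estimator of the variance.

```python
# Hypothetical data, chosen only for illustration.
y = [2.0, 4.0, 4.0, 5.0, 10.0]
N = len(y)

def sample_moment(mu):
    """Sample analog of E[y - mu]: the average of (y_i - mu)."""
    return sum(yi - mu for yi in y) / N

# The MM estimator of the mean sets the sample moment to zero,
# which gives the sample mean.
mu_hat = sum(y) / N

# A second moment condition, E[(y - mu)^2 - sigma2] = 0, gives the
# MM estimator of the variance (divisor N, not N - 1).
sigma2_hat = sum((yi - mu_hat) ** 2 for yi in y) / N

print(mu_hat, sample_moment(mu_hat), sigma2_hat)
```

Two parameters and two moment conditions: the system is exactly identified, so both sample moments can be solved exactly.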
Generalized method-of-moments (GMM)
The MM only works when the number of moment conditions
equals the number of parameters to estimate
If there are more moment conditions than parameters, the
system of equations is algebraically overidentified and cannot,
in general, be solved exactly
Generalized method-of-moments (GMM) estimators choose the
estimates that minimize a quadratic form of the sample
moment conditions
GMM gets as close to solving the overidentified system of
sample moment equations as possible
GMM reduces to MM when the number of parameters equals
the number of moment conditions
Hansen (1982) produced many of the key results; Wooldridge
(2002) and Cameron and Trivedi (2005) provide good introductions
Definition of the GMM estimator
Our research question implies q population moment conditions
$E[m(w_i, \theta)] = 0$
m is a $q \times 1$ vector of functions whose expected values are zero in
the population
$w_i$ is the data on person i
$\theta$ is a $k \times 1$ vector of parameters, $k \le q$
The sample moments that correspond to the population
moments are
$\bar{m}(\theta) = (1/N)\sum_{i=1}^{N} m(w_i, \theta)$
When k < q, GMM chooses the parameters that are as close as
possible to solving the overidentified system of moment
equations
$\hat{\theta}_{GMM} \equiv \arg\min_{\theta}\; \bar{m}(\theta)' W \bar{m}(\theta)$
Some properties of the GMM estimator
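$\hat{\theta}_{GMM} \equiv \arg\min_{\theta}\; \bar{m}(\theta)' W \bar{m}(\theta)$
When k = q, the MM estimator solves $\bar{m}(\theta) = 0$ exactly, so
$\bar{m}(\theta)' W \bar{m}(\theta) = 0$
W only affects the efficiency of the GMM estimator
Setting W = I yields consistent, but inefficient, estimates
Setting $W = Cov[\bar{m}(\theta)]^{-1}$ yields an efficient GMM estimator
We can take multiple steps to get an efficient GMM estimator
1 Let W = I and get $\hat{\theta}_{GMM1} \equiv \arg\min_{\theta}\; \bar{m}(\theta)' \bar{m}(\theta)$
2 Use $\hat{\theta}_{GMM1}$ to get $\hat{W}$, which is an estimate of $Cov[\bar{m}(\theta)]^{-1}$
3 Get $\hat{\theta}_{GMM2} \equiv \arg\min_{\theta}\; \bar{m}(\theta)' \hat{W} \bar{m}(\theta)$
4 Repeat steps 2 and 3 using $\hat{\theta}_{GMM2}$ in place of $\hat{\theta}_{GMM1}$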
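The steps above can be sketched in Python on a toy overidentified problem (an illustration only, not the internals of gmm). The assumptions are loud ones: two simulated measurements share a common mean, giving two moment conditions for one parameter; the moments are linear in the parameter, so each minimization has a closed form; and the function names `gmm_common_mean` and `efficient_weight` are hypothetical.

```python
import random

def gmm_common_mean(y1, y2, W):
    """Minimize mbar(theta)' W mbar(theta) for moments (y1 - theta, y2 - theta).

    mbar(theta) = (a1 - theta, a2 - theta) is linear in theta, so the
    minimizer has the closed form (1'W a) / (1'W 1).
    """
    n = len(y1)
    a = (sum(y1) / n, sum(y2) / n)
    num = (W[0][0] + W[1][0]) * a[0] + (W[0][1] + W[1][1]) * a[1]
    den = W[0][0] + W[0][1] + W[1][0] + W[1][1]
    return num / den

def efficient_weight(y1, y2, theta):
    """Inverse of the estimated second-moment matrix of the moments at theta.

    Cov[mbar] is this matrix divided by N; the scale factor does not
    change the minimizer, so it is omitted.
    """
    n = len(y1)
    m1 = [v - theta for v in y1]
    m2 = [v - theta for v in y2]
    s11 = sum(u * u for u in m1) / n
    s22 = sum(v * v for v in m2) / n
    s12 = sum(u * v for u, v in zip(m1, m2)) / n
    det = s11 * s22 - s12 * s12
    return [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]

# Simulated data (an assumption of this sketch): same mean, different noise.
random.seed(12345)
N = 2000
y1 = [random.gauss(5.0, 1.0) for _ in range(N)]  # precise measurement
y2 = [random.gauss(5.0, 3.0) for _ in range(N)]  # noisy measurement

identity = [[1.0, 0.0], [0.0, 1.0]]
theta1 = gmm_common_mean(y1, y2, identity)  # step 1: W = I
W_hat = efficient_weight(y1, y2, theta1)    # step 2: estimate W
theta2 = gmm_common_mean(y1, y2, W_hat)     # step 3: re-estimate
print(theta1, theta2)
```

With W = I the two sample means get equal weight; the second step shifts weight toward the less noisy measurement, which is where the efficiency gain comes from.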
gmm examples
The gmm command
The new command gmm estimates parameters by GMM
gmm is similar to nl: you specify the sample moment conditions
as substitutable expressions
Substitutable expressions enclose the model parameters in braces
{}
gmm examples Ordinary least squares
The interactive syntax of gmm
For many models, the population moment conditions have the
form
$E[z\,e(\beta)] = 0$
where z is a $q \times 1$ vector of instrumental variables and $e(\beta)$ is a
scalar function of the data and the parameters
The corresponding syntax of gmm is
gmm (eb_expression) [if] [in] [weight], instruments(instrument_varlist) [options]
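For OLS, the sample moment conditions are
$\sum_{i=1}^{N} x_i (y_i - x_i'\beta) = 0$
Solving for $\beta$ yields
$\hat{\beta}_{OLS} = \left(\sum_{i=1}^{N} x_i x_i'\right)^{-1} \sum_{i=1}^{N} x_i y_i$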
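As a small numerical check of this formula, here is a Python sketch (illustrative only; the data are made up) for one regressor plus a constant, so that $x_i = (x_{i1}, 1)'$. It solves the two sample moment equations directly and confirms that both moments are zero at the solution.

```python
# Hypothetical data for one regressor plus a constant: x_i = (x_i1, 1)'.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
N = len(y)

# Elements of sum_i x_i x_i' (2 x 2) and sum_i x_i y_i (2 x 1).
sxx = sum(v * v for v in x)
sx = sum(x)
sxy = sum(v * w for v, w in zip(x, y))
sy = sum(y)

# Solve [sxx sx; sx N] b = [sxy; sy] by Cramer's rule.
det = sxx * N - sx * sx
b_slope = (sxy * N - sx * sy) / det
b_cons = (sxx * sy - sx * sxy) / det

# At the solution, both sample moment conditions hold.
resid = [w - b_slope * v - b_cons for v, w in zip(x, y)]
moment_x = sum(v * e for v, e in zip(x, resid))
moment_cons = sum(resid)
print(b_slope, b_cons, moment_x, moment_cons)
```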
Modeling crime data I
We have (fictional) data on crime in 3,000 communities
. use cscrime2, clear
. describe
Contains data from cscrime2.dta
obs: 3,000
vars: 5 29 Jul 2009 12:02
size: 132,000 (98.7% of memory free) (_dta has notes)
              storage  display     value
variable name   type   format      label      variable label
policepc        double %10.0g                 police officers per thousand
arrestp         double %10.0g                 arrests/crimes
convictp        double %10.0g                 convictions/arrests
legalwage       double %10.0g                 legal wage index 0-20 scale
crime           double %10.0g                 property-crime index 0-50 scale
Sorted by:
Modeling crime data II
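We specify that
$\mathrm{crime}_i = \mathrm{policepc}_i\,\beta_1 + \mathrm{legalwage}_i\,\beta_2 + \beta_3 + \epsilon_i$
We want to model
$E[\mathrm{crime} \mid \mathrm{policepc}, \mathrm{legalwage}] = \mathrm{policepc}\,\beta_1 + \mathrm{legalwage}\,\beta_2 + \beta_3$
If $E[\epsilon \mid \mathrm{policepc}, \mathrm{legalwage}] = 0$, the population moment
conditions are
$E\left[\begin{pmatrix} \mathrm{policepc} \\ \mathrm{legalwage} \end{pmatrix} (\mathrm{crime} - \mathrm{policepc}\,\beta_1 - \mathrm{legalwage}\,\beta_2 - \beta_3)\right] = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$
The constant supplies the third moment condition, $E[\epsilon] = 0$; gmm
adds a constant to the instrument list by default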
OLS by GMM I
. gmm (crime - policepc*{b1} - legalwage*{b2} - {b3}), ///
> instruments(policepc legalwage) nolog
Final GMM criterion Q(b) = 2.62e-31
GMM estimation
Number of parameters = 3
Number of moments = 3
Initial weight matrix: Unadjusted Number of obs = 3000
GMM weight matrix: Robust
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 -.4226003 .0100658 -41.98 0.000 -.4423289 -.4028716
/b2 -7.543894 .3969104 -19.01 0.000 -8.321824 -6.765964
/b3 27.79852 .0546507 508.66 0.000 27.69141 27.90563
Instruments for equation 1: policepc legalwage _cons
OLS by GMM II
. regress crime policepc legalwage, robust
Linear regression Number of obs = 3000
F( 2, 2997) = 1384.95
Prob > F = 0.0000
R-squared = 0.6217
Root MSE = 1.7972
Robust
crime Coef. Std. Err. t P>|t| [95% Conf. Interval]
policepc -.4226003 .0100709 -41.96 0.000 -.4423468 -.4028538
legalwage -7.543894 .397109 -19.00 0.000 -8.322528 -6.765261
_cons 27.79852 .054678 508.40 0.000 27.69131 27.90573
gmm examples Two-stage least squares
IV and 2SLS
For some variables, the assumption $E[\epsilon \mid x] = 0$ is too strong and
we need to allow for $E[\epsilon \mid x] \neq 0$
If we have q variables z for which $E[\epsilon \mid z] = 0$ and the correlation
between z and x is sufficiently strong, we can estimate $\beta$ from
the population moment conditions
$E[z(y - x'\beta)] = 0$
z are known as instrumental variables
If the number of variables in z and x is the same (q = k),
solving the sample moment conditions yields the MM
estimator known as the instrumental variables (IV) estimator
If there are more variables in z than in x (q > k) and we let
$W = \left(\sum_{i=1}^{N} z_i z_i'\right)^{-1}$ in our GMM estimator, we obtain the
two-stage least-squares (2SLS) estimator
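A hedged Python sketch of the just-identified case (q = k), using simulated data that are an assumption of this example rather than the talk's dataset. With one endogenous regressor, one instrument, and a constant, solving the sample moment conditions reduces to a ratio of sample covariances; the helper name `cov` is hypothetical.

```python
import random

random.seed(2009)
N = 5000
beta_true = 2.0

z = [random.gauss(0.0, 1.0) for _ in range(N)]  # instrument
e = [random.gauss(0.0, 1.0) for _ in range(N)]  # structural error
# x depends on z (relevance) and on e (endogeneity).
x = [zi + 0.8 * ei + random.gauss(0.0, 1.0) for zi, ei in zip(z, e)]
y = [beta_true * xi + ei for xi, ei in zip(x, e)]

def cov(a, b):
    """Sample covariance with divisor N."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

# OLS uses x as its own instrument: sum_i (x_i, 1)'(y_i - b*x_i - c) = 0.
beta_ols = cov(x, y) / cov(x, x)

# IV replaces x with z in the instrument vector:
# sum_i (z_i, 1)'(y_i - b*x_i - c) = 0.
beta_iv = cov(z, y) / cov(z, x)

print(beta_ols, beta_iv)
```

In this simulation OLS is pulled away from the true slope because x is correlated with the error, while the IV estimate stays close to it.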
2SLS on crime data I
The assumption that $E[\epsilon \mid \mathrm{policepc}] = 0$ is false if communities
increase policepc in response to an increase in crime (an increase
in $\epsilon_i$)
The variables arrestp and convictp are valid instruments if
they measure some components of communities' toughness on
crime that are unrelated to $\epsilon$ but are related to policepc
We will continue to maintain that $E[\epsilon \mid \mathrm{legalwage}] = 0$
2SLS by GMM I
. gmm (crime - policepc*{b1} - legalwage*{b2} - {b3}), ///
> instruments(arrestp convictp legalwage ) nolog onestep
Final GMM criterion Q(b) = .0001736
GMM estimation
Number of parameters = 3
Number of moments = 4
Initial weight matrix: Unadjusted Number of obs = 3000
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 -.9516683 .0785137 -12.12 0.000 -1.105552 -.7977844
/b2 -2.304205 .9648523 -2.39 0.017 -4.195281 -.4131291
/b3 29.88578 .3135637 95.31 0.000 29.2712 30.50035
Instruments for equation 1: arrestp convictp legalwage _cons
2SLS by GMM II
. ivregress 2sls crime legalwage (policepc = arrestp convictp) , robust
Instrumental variables (2SLS) regression Number of obs = 3000
Wald chi2(2) = 696.63
Prob > chi2 = 0.0000
R-squared = .
Root MSE = 3.0516
Robust
crime Coef. Std. Err. z P>|z| [95% Conf. Interval]
policepc -.9516683 .0785137 -12.12 0.000 -1.105552 -.7977844
legalwage -2.304205 .9648523 -2.39 0.017 -4.195281 -.4131291
_cons 29.88578 .3135637 95.31 0.000 29.2712 30.50035
Instrumented: policepc
Instruments: legalwage arrestp
convictp
gmm examples Cross-sectional Poisson with endogenous covariates
Poisson with endogenous covariates
We want to model $E[y_i \mid x_i, \epsilon_i] = \exp(x_i'\beta)\,\epsilon_i$
This setup allows the distribution of $\epsilon_i$ to depend on $x_i$
Mullahy (1997) showed that we can use instrumental variables $z_i$
and the population moment conditions
$E[z_i (y_i \exp(-x_i'\beta) - 1)] = 0$
to estimate $\beta$
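To make the multiplicative moment condition concrete, here is a deliberately simplified Python sketch. The assumptions are strong and chosen for checkability: a single regressor with no constant, the regressor used as its own instrument, and noise-free outcomes, so the sample moment has its root exactly at the value used to generate the data. The names `sample_moment` and `bisect` are hypothetical.

```python
import math

beta_true = 0.5
x = [0.2, 0.5, 1.0, 1.5, 2.0]
z = x[:]  # the regressor serves as its own instrument in this sketch
y = [math.exp(beta_true * xi) for xi in x]  # noise-free outcomes

def sample_moment(b):
    """(1/N) sum_i z_i * (y_i * exp(-x_i * b) - 1)."""
    return sum(zi * (yi * math.exp(-xi * b) - 1.0)
               for zi, xi, yi in zip(z, x, y)) / len(y)

def bisect(f, lo, hi, tol=1e-12):
    """Find the root of a decreasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# With positive z, x, and y, the sample moment is decreasing in b,
# so bisection finds its unique root.
b_hat = bisect(sample_moment, -5.0, 5.0)
print(b_hat)
```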
. use accident2, clear
. describe
Contains data from accident2.dta
obs: 948
vars: 6 29 Jul 2009 11:59
size: 26,544 (99.7% of memory free)
storage display value
variable name type format label variable label
kids float %9.0g
cvalue float %9.0g
tickets float %9.0g
traffic float %9.0g
male float %9.0g
accidents float %9.0g
Sorted by:
traffic and male are exogenous variables
tickets is an endogenous variable
kids and cvalue are instrumental variables
. gmm (accidents*exp(-tickets*{b1} - traffic*{b2} - male*{b3} - {b4}) - 1), ///
> instruments(kids cvalue traffic male) onestep nolog
Final GMM criterion Q(b) = .0109217
GMM estimation
Number of parameters = 4
Number of moments = 5
Initial weight matrix: Unadjusted Number of obs = 948
Robust
Coef. Std. Err. z P>|z| [95% Conf. Interval]
/b1 1.745919 .1984268 8.80 0.000 1.357009 2.134828
/b2 .1216527 .0421674 2.88 0.004 .0390061 .2042993
/b3 4.693161 .5129505 9.15 0.000 3.687797 5.698526
/b4 -11.51383 1.208924 -9.52 0.000 -13.88327 -9.144379
Instruments for equation 1: kids cvalue traffic male _cons
gmm examples Fixed-effects Poisson regression
More complicated moment conditions
The structure of the moment conditions for some models is too
complicated to fit into the interactive syntax used thus far
For example, Wooldridge (1999, 2002) and Blundell, Griffith, and
Windmeijer (2002) discuss estimating the fixed-effects Poisson
model for panel data by GMM.
In the Poisson panel-data model we are modeling
$E[y_{it} \mid x_{it}, \eta_i] = \exp(x_{it}'\beta + \eta_i)$
Hausman, Hall, and Griliches (1984) derived a conditional
log-likelihood function when the outcome is assumed to come
from a Poisson distribution with mean $\exp(x_{it}'\beta + \eta_i)$ and $\eta_i$ is
an unobserved component that is correlated with the $x_{it}$
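Wooldridge (1999) showed that you could estimate the
parameters of this model by solving the sample moment
equations
$\sum_i \sum_t x_{it} \left( y_{it} - \mu_{it}\,\frac{\bar{y}_i}{\bar{\mu}_i} \right) = 0$
where $\mu_{it} = \exp(x_{it}'\beta)$, and $\bar{y}_i$ and $\bar{\mu}_i$ are the
within-panel means of $y_{it}$ and $\mu_{it}$
These moment conditions do not fit into the interactive syntax
because the term $\bar{\mu}_i$ depends on the parameters
We need to use the moment-evaluator program syntax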
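The structure of these moment equations can be sketched in Python (an illustration, not the Stata moment-evaluator program). The assumptions: a tiny made-up panel with a scalar regressor, noise-free outcomes generated with panel effects, and hypothetical function names. The sketch shows two things: the within-panel residuals sum to zero for any parameter value, and the panel effects drop out, so the moment equation is solved at the generating parameter.

```python
import math

beta_true = 0.7
eta = [-1.0, 0.3, 2.0]  # panel effects (hypothetical values)
x = [[0.1, 0.9, 1.4], [0.5, 0.2, 1.1], [1.3, 0.4, 0.8]]  # x[i][t]
# Noise-free outcomes: y_it = exp(x_it * beta + eta_i).
y = [[math.exp(beta_true * xit + a) for xit in xi]
     for xi, a in zip(x, eta)]

def residuals(b, xi, yi):
    """u_it = y_it - mu_it * ybar_i / mubar_i, with mu_it = exp(x_it * b)."""
    mu = [math.exp(b * xit) for xit in xi]
    ratio = (sum(yi) / len(yi)) / (sum(mu) / len(mu))
    return [yit - m * ratio for yit, m in zip(yi, mu)]

def sample_moment(b):
    """sum_i sum_t x_it * u_it(b)."""
    return sum(xit * u
               for xi, yi in zip(x, y)
               for xit, u in zip(xi, residuals(b, xi, yi)))

def bisect(f, lo, hi, tol=1e-12):
    """Find the root of a decreasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For a scalar parameter the moment is decreasing in b, so bisection works.
b_hat = bisect(sample_moment, -5.0, 5.0)
print(b_hat)
```

Because the ratio $\bar{y}_i / \bar{\mu}_i$ has to be recomputed within each panel for every trial value of the parameter, the residual is not a single substitutable expression in the data and parameters.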
Moment-evaluator program syntax
An abbreviated form of the program syntax for gmm is
gmm moment_program [if] [in] [weight], equations(moment_cond_names) parameters(parameter_names) [instruments() options]