This Content Downloaded From 190.164.207.86 On Sun, 27 Jun 2021 01:50:18 UTC
Taylor & Francis, Ltd. and American Statistical Association are collaborating with JSTOR to
digitize, preserve and extend access to Journal of Business & Economic Statistics
Although formal conditions for identification in the multinomial probit (MNP) model are now
clearly established, little is known about how various estimable MNP specifications perform in
practice. This article shows that parameter identification in the MNP model is extremely tenuous
in the absence of exclusion restrictions. This previously unnoticed fact is important because
formal identification of MNP models does not require exclusion restrictions, and many potential
economic applications of MNP are to situations in which exclusion restrictions are not readily
available. Thus, failure to be aware of the difficulties present in such situations may lead to
reporting of unreliable results.
The multinomial probit (MNP) model has rarely been used as a model of choice in applied work,
despite its well-known advantages over the popular logit model (i.e., its relaxation of the
restrictive independence of irrelevant alternatives assumption). The lack of use of MNP stems from
the computational burden involved in its estimation. The model generates choice probabilities that
are multivariate integrals of order M - 1, where M is the number of alternatives. Thus, even when
M = 3, estimation by maximum likelihood (ML) is expensive given large data sets.

The recent development of a computationally practical method of simulated moments (MSM) estimator
for the MNP model (see McFadden 1989; Pakes and Pollard 1989), along with the development of
practical parameterization methods that avoid proliferation of covariance matrix elements (see
Ben-Akiva and Bolduc 1989; Bolduc 1991; Elrod and Keane 1991), has raised renewed interest in MNP
as a model of choice. Given the lack of practical experience with MNP models, however, there is a
need to develop a "folklore" concerning the conditions under which the model performs well. An
important step in that direction is the recent paper by Bunch and Kitamura (1991).

The purpose of this article is to demonstrate that parameter identification in MNP models is
extremely tenuous in the absence of exclusion restrictions. By exclusion restrictions I mean
restrictions that certain exogenous variables in the model do not affect the utility levels of
certain alternatives. This fact is important, because formal identification of MNP models does not
require exclusion restrictions, and (to my knowledge) the fact that identification is tenuous in
their absence has not been previously noted in the literature. Hence, given the new MSM technology,
researchers are likely to attempt simulation estimation of unrestricted multinomial probits. This
may, in turn, lead to the reporting of unreliable results.

For clarity of exposition, it is important to note that there are two types of identification
problems. For a formally nonidentified model, there is a range of parameter values that generate
the maximized value of the objective function. In addition, a model may be formally identified yet
exhibit very small variation in the objective function from its maximum over a wide range of
parameter values. I refer to identification in such cases as being tenuous or fragile. Common
symptoms of this problem in practice are a close-to-singular Hessian, large standard errors, and
inability of optimization algorithms to find steps that improve the objective or to achieve
convergence. As we shall see, these symptoms of fragile identification are quite severe in MNP
models without exclusion restrictions.

Note that in complex nonlinear models such as MNP, formal identification or nonidentification is
often very difficult to prove due to the complexity of the analytical Hessian. Thus, in this
article, rather than working with the analytical Hessian, I use a series of trial estimations of
MNP models on Monte Carlo and actual data to illustrate the nature of the identification problem.
Because of the difficulties in proving identification in nonlinear models, a common practice in
applied work is to simply attempt to estimate a model and see if the Hessian is singular (or nearly
singular). Such practice is dangerous when using simulation estimators of the type likely to be
applied to MNP models, because simulation error will generate contours where the true objective
function is flat and will generate a nonsingular Hessian when the true Hessian is singular. An
illustration of this danger was provided by Horowitz, Sparmann, and Daganzo (1982). They were
testing the accuracy of simulated ML based on the Clark approximation for purposes of estimating
the MNP model. One of the models they used for the tests was actually nonidentified, but
approximation errors introduced contours in the objective function that masked the problem and
allowed
them to obtain estimates. (Their other tests did use identified models, however, and they indicate
that the Clark approximation does not work well for purposes of estimating the MNP model.) Because
of this problem, all of the estimation in this article is done using ML estimation based on
accurate numerical evaluations of the MNP choice probabilities.

1. THE MULTINOMIAL PROBIT MODEL

In this section, I describe the trinomial probit model (TNP). Extension to multinomial probit is
obvious. In the TNP model there are three alternatives with random utilities given by

$U_{1i} = \alpha_1 + \beta_1 X_i + \varepsilon_{1i}$

... probabilities of choosing alternatives 2 and 3 are expressed symmetrically.

This unrestricted model is not identified for two obvious reasons. First, a proportional change in
all elements of the covariance matrix and of the $(\alpha_j, \beta_j)$ for j = 1, 3 leaves all
probabilities unaffected. Second, addition of a constant to all of the $\alpha_j$ leaves all
probabilities unchanged, because choice depends only on utility differences.

Thus, as described by Bunch (1991), Dansie (1985), and Albright, Lerman, and Manski (1977),
identification may be achieved by normalizing the utility of one alternative to 0 and by
restricting one covariance matrix element. (Other normalizations are possible.) Setting $U_3 = 0$
and $\sigma_1 = 1$, we have the model
a single regressor $X_i$ (i = 1, 8,000) from the N(6, 5) distribution, and by drawing
$(\varepsilon_{1i}, \varepsilon_{2i})$ for i = 1, 8,000 with covariance matrix given by (2').

TNP model estimates were obtained by ML using the algorithm of Berndt, Hall, Hall, and Hausman
(1974), which will be referred to as the BHHH algorithm. Let $\theta$ denote the vector of all
parameters of the model, and let $\theta_k$ denote the estimate of $\theta$ at iteration k. The
log-likelihood function L for the TNP model is given by

$$L(\theta_k) = \sum_{i=1}^{N} \sum_{j=1}^{3} d_{ij} \ln \mathrm{Pr}_{ji}(\theta_k), \qquad (4)$$

where $d_{ij}$ is an indicator equal to 1 if i chooses alternative j and 0 otherwise. The TNP
choice probabilities $\mathrm{Pr}_{ji}(\theta_k)$ were evaluated using 100 term tetrachoric
expansions. A modified Newton-Raphson step is given by

$$\theta_{k+1} = \theta_k - \lambda_k H^{-1} \left. \frac{\partial L(\theta)}{\partial \theta} \right|_{\theta_k}, \qquad (5)$$

where

$$H = \left. \frac{\partial^2 L(\theta)}{\partial \theta \, \partial \theta'} \right|_{\theta_k}$$

is the Hessian evaluated at $\theta_k$ and $\lambda_k$ is a step size that is 1 on the first step
of an iteration but that is reduced if a unit step does not improve the log-likelihood function.
Of course, numerical evaluation of the Hessian is quite computationally expensive in this model.
In the BHHH algorithm, the fact that the expected value of the Hessian is (at the optimum) equal to
the negative of the expectation of the outer product of the gradient vectors is invoked to justify
approximation of the Hessian by

$$H \approx - \sum_{i=1}^{N} \left. \frac{\partial L_i(\theta)}{\partial \theta} \frac{\partial L_i(\theta)}{\partial \theta'} \right|_{\theta_k},$$

where $L_i$ denotes the contribution of observation i to L. In this article, this approximation to
the Hessian was used to obtain steps and in covariance matrix calculations. When using Monte Carlo
data, the outer product of the gradients was evaluated at the $\theta$ vector. Since in the Monte
Carlo data we know the true model and since the sample size of 8,000 is rather large, this
approximation to the Hessian should be a good one.

The first set of results using Monte Carlo data is presented in Table 1. The true values of the
parameters are reported in the column headed "True value" [here $\rho$ is defined as
corr($\varepsilon_1$, $\varepsilon_2$)]. The columns headed (1)-(4) report estimates obtained with
$\rho$ and $\sigma_2$ pegged at various values. The $\chi^2(2)$ tests are for the null that the
hypothesized constraints are valid [note that the optimized value of the log-likelihood for the
unrestricted model is -7,953.57 (Table 2, col. (1))]. The true values of $\rho$ and $\sigma_2$
used in constructing the data were .60 and 1.50, respectively.

In column (2), $\rho$ is restricted to 0 (whereas $\sigma_2$ is restricted to the true value of
1.5). The deterioration of the log-likelihood resulting from this restriction is only .57 of a
point. Similarly, in column (3), $\sigma_2$ is restricted to 1.00 whereas $\rho$ is pegged at the
true value of .60. The deterioration of the log-likelihood resulting from this restriction is only
one point. In column (4), $\rho$ is restricted to 0, and $\sigma_2$ is restricted to 1.00. The
deterioration in the log-likelihood is 2.25 points, giving a $\chi^2(2)$ value of 4.5 that is not
significant at the 10% level (critical value = 4.605).

It is apparent that restrictions on $\rho$ and $\sigma_2$ produce some slight deterioration in fit
of the TNP model without exclusion restrictions, as they must because these parameters are formally
identified. The improvement in fit obtained by introducing these parameters is so minor, however,
as to render their identification in practice problematic. The problem is illustrated in Table 2.
Here TNP models were estimated with $\rho$ and $\sigma_2$ free. The starting values used for the
runs in columns (1)-(4) of Table 2 are, respectively, the estimates in columns (1)-(4) of Table 1.
Classic symptoms of fragile identification were found in all runs in Table 2. First, the Hessian
was so close to singular that, to obtain a sensible direction vector for use in BHHH iterations,
the Marquardt (1963) procedure of adding positive diagonal elements to the (approximate) Hessian
had to be applied (note, however, that this was not done when cal-
[Figure 1: plot of $U_1$ and $U_2$ against $X_i$; legible line labels: $U_1 = -0.8 + 0.2 X_i$ and $U_2 = -2.200 + 0.393 X_i$.]
Figure 1. Effect of Fixing $\rho$ at .0 When True $\rho$ = .6. Note that the solid lines
are the true $U_1$ and $U_2$ lines. The dashed lines are the $U_1$ and $U_2$ lines estimated
with $\rho$ fixed at .0 when the true $\rho$ is .60 [see Table 1, col. (2)].
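
The BHHH iteration with a Marquardt-style diagonal adjustment, as described in the estimation discussion, can be sketched on a toy problem. This is an illustration under stated assumptions, not the article's code: a two-parameter binary logit is swapped in for the TNP likelihood so that the scores have a closed form, and the names `simulate_logit`, `loglik_and_scores`, `bhhh_direction`, `fit_bhhh`, and the `ridge` parameter are hypothetical.

```python
import math
import random


def simulate_logit(n, a, b, seed=0):
    """Draw (x, y) pairs from a binary logit; a stand-in for the TNP model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        data.append((x, 1 if rng.random() < p else 0))
    return data


def loglik_and_scores(theta, data):
    """Return the log-likelihood and the per-observation score vectors."""
    a, b = theta
    loglik, scores = 0.0, []
    for x, y in data:
        z = a + b * x
        log1pexp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable log(1 + e^z)
        loglik += y * z - log1pexp
        resid = y - math.exp(z - log1pexp)  # y minus the fitted probability
        scores.append((resid, resid * x))
    return loglik, scores


def bhhh_direction(scores, ridge=0.0):
    """Solve (B + ridge * I) d = g, where B is the summed outer product of scores."""
    g0 = sum(s[0] for s in scores)
    g1 = sum(s[1] for s in scores)
    b00 = sum(s[0] * s[0] for s in scores) + ridge
    b01 = sum(s[0] * s[1] for s in scores)
    b11 = sum(s[1] * s[1] for s in scores) + ridge
    det = b00 * b11 - b01 * b01
    direction = ((b11 * g0 - b01 * g1) / det, (b00 * g1 - b01 * g0) / det)
    return direction, (g0, g1)


def fit_bhhh(data, theta=(0.0, 0.0), ridge=0.0, max_iter=200, tol=1e-10):
    """Iterate BHHH steps, halving the step size until the log-likelihood improves."""
    loglik, scores = loglik_and_scores(theta, data)
    for _ in range(max_iter):
        (d0, d1), _ = bhhh_direction(scores, ridge)
        step = 1.0  # unit step first, reduced only if it fails to improve the fit
        while step > 1e-12:
            trial = (theta[0] + step * d0, theta[1] + step * d1)
            trial_loglik, trial_scores = loglik_and_scores(trial, data)
            if trial_loglik > loglik:
                break
            step /= 2.0
        else:
            return theta, loglik  # no improving step was found
        gain = trial_loglik - loglik
        theta, loglik, scores = trial, trial_loglik, trial_scores
        if gain < tol:
            break
    return theta, loglik


if __name__ == "__main__":
    sample = simulate_logit(2000, 0.5, 1.0, seed=1)
    print(fit_bhhh(sample))             # plain BHHH
    print(fit_bhhh(sample, ridge=5.0))  # with a positive diagonal adjustment
```

Because the BHHH approximation sets the Hessian to minus the summed outer product of scores, the update in (5) becomes an ascent step along $B^{-1} g$. In this sketch the positive diagonal term is added to $B$, which is one way to read the Marquardt adjustment when $B$ is close to singular; the sign convention is an assumption here.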
A TNP model with exclusion restrictions is estimated in Table 3 (p. 199). This model has the form

$U_{1i} = \alpha_1 + \beta_{11} X_{1i} + \beta_{12} X_{2i}$
$U_{2i} = \alpha_2 + \beta_{21} X_{1i} + \beta_{23} X_{3i}$, (6)

where $X_{1i}$ is constructed as before, but $X_{2i}$ and $X_{3i}$ are dummy variables equal to 1
with probability .50. Thus there are four $U_1(X_{1i}) = U_2(X_{1i})$ points corresponding to the
four possible values of $(X_{2i}, X_{3i})$, and any pivoting to increase or reduce the distance
$|U_1(X_{1i}) - U_2(X_{1i})|$ for all $X_{1i}$ is impossible.

The true values of the parameters of the model are reported in the column headed "True value." All
values are the same as before except that the new parameters $\beta_{12}$ and $\beta_{23}$ equal
-.60. The unrestricted model results are reported in column (4). In contrast to the results in
Table 2, all of the parameters of the model are estimated with precision. Columns (1)-(3) report
estimates obtained with $\rho$ and $\sigma_2$ restricted as in Table 1, columns (2)-(4). Here, all
of the restrictions are overwhelmingly rejected. For example, the $\chi^2(2)$ statistic for the
restriction $\rho$ = .0, $\sigma_2$ = 1.0 is 44.78, compared to a 1% critical value of 9.21.
Clearly, the problem of fragile identification in the TNP model is solved by introducing exclusion
restrictions.

Experimentation with a wide range of specifications revealed that in MNP models it is necessary to
have one exclusion from each utility index to avoid identification problems. Simply introducing
additional regressors, without introducing exclusion restrictions, does not solve the problem. This
is illustrated by the results in column (5) of Table 3, where estimates were obtained with the
$\beta_{13} = \beta_{22} = 0$ restrictions removed [the results from col. (4) were used as starting
values]. The symptoms of fragile identification, including increases in the standard errors of up
to 700%, are again apparent here.

It is also apparent that the identification problems found in the preceding Monte Carlo data do not
result from the particular distributional assumptions on the $X_i$. This is best seen by
considering a TNP model estimated on actual data. In Table 4, I report results of estimating a
model of industry choice on data from the National Longitudinal Survey of Young Men (NLS). This
application is motivated by the fact that the only application of MNP in labor economics is that of
Heckman and Sedlacek (1985), who considered industry choice in Current Population Survey data. As
in the work of Heckman and Sedlacek, industries are grouped into three alternatives: manufacturing
(M), nonmanufacturing (NM), and unemployment. The NLS sample used
[Figure 2: plot of $U_1$ and $U_2$ against $X_i$; legible line label: $U_2 = -1.249 + 0.278 X_i$.]
Figure 2. Effect of Fixing $\sigma_2$ at 1.0 When True $\sigma_2$ = 1.5. Note that the solid
lines are the true $U_1$ and $U_2$ lines. The dashed lines are the $U_1$ and $U_2$ lines
estimated with $\sigma_2$ fixed at 1.0 when the true $\sigma_2$ is 1.5 [see Table 1, col. (3)].
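
The Monte Carlo design with exclusion restrictions in (6) can be sketched as a small simulator. This is a minimal illustration, not the article's procedure: `simulate_tnp` and `choice_shares` are hypothetical names, the intercepts and $X_{1i}$ slopes are placeholder values, N(6, 5) is read as mean 6 and variance 5, and $\sigma_2$ is treated as the standard deviation of the second error. Only $\rho$ = .60, $\sigma_2$ = 1.50, and the -.60 coefficients on the dummies come from the text.

```python
import math
import random


def simulate_tnp(n, rho=0.6, sigma2=1.5, seed=0,
                 alpha=(-0.8, -1.2), beta_x1=(0.2, 0.3), beta_excl=(-0.6, -0.6)):
    """Simulate choices from a trinomial probit with one exclusion per utility index.

    rho, sigma2, and beta_excl follow the true values quoted in the text.
    alpha and beta_x1 are illustrative placeholders, not the article's values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(6.0, math.sqrt(5.0))  # assumption: N(6, 5) means variance 5
        x2 = 1 if rng.random() < 0.5 else 0  # dummy that enters U1 only
        x3 = 1 if rng.random() < 0.5 else 0  # dummy that enters U2 only
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = z1  # sigma_1 normalized to 1
        # assumption: sigma2 is the standard deviation of e2, and corr(e1, e2) = rho
        e2 = sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        u1 = alpha[0] + beta_x1[0] * x1 + beta_excl[0] * x2 + e1
        u2 = alpha[1] + beta_x1[1] * x1 + beta_excl[1] * x3 + e2
        utilities = (u1, u2, 0.0)  # U3 normalized to 0
        choice = 1 + utilities.index(max(utilities))
        rows.append((x1, x2, x3, choice))
    return rows


def choice_shares(rows):
    """Fraction of observations choosing each of the three alternatives."""
    counts = [0, 0, 0]
    for _, _, _, choice in rows:
        counts[choice - 1] += 1
    return [c / len(rows) for c in counts]


if __name__ == "__main__":
    sample = simulate_tnp(8000, seed=1)
    print(choice_shares(sample))
```

Each dummy shifts only one utility index, which is what an exclusion restriction means here: $X_{2i}$ moves $U_1$ while leaving $U_2$ fixed, and $X_{3i}$ does the reverse.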
                 (1)                   (2)                   (3)
Parameter        M          NM         M          NM         M          NM
Regressor coefficients
NLINC            .0077**   -.0471**    .0081**   -.0473**    .0029     -.0367
                (.0033)    (.0061)    (.0036)    (.0229)    (.0028)    (.0255)
U-RATE          -.0760*    -.0484**   -.0752**   -.0484**   -.0978**   -.0804**
                (.0145)    (.0135)    (.0176)    (.0179)    (.0202)    (.0222)
TREND           -.0231**    .0462**   -.0233**    .0461*    -.0116      .0370
                (.0070)    (.0089)    (.0078)    (.0270)    (.0078)    (.0314)
EDUC             .0121*     .1083**    .0138      .1106**    .0328**    .1024*
                (.0071)    (.0074)    (.0140)    (.0496)    (.0152)    (.0552)
EXPER            .0252**   -.0290**    .0269**   -.0277      .0142     -.0229
                (.0109)    (.0110)    (.0111)    (.0205)    (.0115)    (.0249)
EXPER2          -.0017**    .0005     -.0018**    .0005     -.0015**   -.0000
                (.0005)    (.0005)    (.0005)    (.0007)    (.0005)    (.0008)
WHITE            .1047**    .0865*     .1117**    .0976*     .1455**    .1356*
                (.0495)    (.0479)    (.0565)    (.0567)    (.0585)    (.0695)
WIFE             .4711**    .9468**    .4782**    .9599**    .5080**    .9110**
                (.0390)    (.0922)    (.0639)    (.3610)    (.0678)    (.3911)
KIDS             .1164**   -.1777**    .1174**   -.1798*     .0984**   -.1092
                (.0225)    (.0321)    (.0276)    (.1053)    (.0259)    (.1184)
CONSTANT        -.0585     -.1268     -.0741     -.1346      .4552      .3131
                (.1434)    (.1156)    (.1730)    (.3379)    (.1814)    (.3499)
Covariance matrix
$\rho$           .0000                 .0315                 .6419*
                                      (.4093)               (.3682)
$\sigma_2$      1.0000                1.0506**              1.1596**
                                      (.5087)               (.5843)
Log-likelihood  -10,300.710           -10,299.700           -10,299.645
NOTE: M denotes manufacturing utility index coefficients. NM denotes nonmanufacturing. Standard errors are in parentheses. Double
asterisks indicate significance at the 5% level. An asterisk indicates significance at the 10% level. The estimates in column (1) were
obtained by fixing $\rho$ and $\sigma_2$ at 0 and 1, respectively. The estimates in column (2) were obtained using the column (1) estimates as
starting values. The estimates in column (3) were obtained using as starting values the estimates from a model that included only
CONSTANT, U-RATE, TREND, and NLINC as regressors and that held $\rho$ and $\sigma_2$ fixed at 0 and 1, respectively.
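
The likelihood-ratio comparisons used throughout can be checked by hand from the reported log-likelihoods. The sketch below is an illustration with hypothetical helper names. It relies on the fact that the chi-square distribution with 2 degrees of freedom has survival function exp(-x/2), and it reads the column (1) log-likelihood as -10,300.710 on the assumption that a minus sign belongs there.

```python
import math


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_2df_critical(level):
    """Critical value of chi-square(2): solve exp(-x / 2) = level for x."""
    return -2.0 * math.log(level)


def chi2_2df_pvalue(stat):
    """Upper-tail probability of chi-square(2) at the given statistic."""
    return math.exp(-stat / 2.0)


if __name__ == "__main__":
    # Log-likelihoods from the table above; column (1) fixes rho = 0 and sigma_2 = 1.
    # Assumption: the column (1) value is negative, like the other two.
    restricted = -10300.710
    for label, unrestricted in (("col. (2)", -10299.700), ("col. (3)", -10299.645)):
        stat = lr_statistic(restricted, unrestricted)
        print(label, round(stat, 3), round(chi2_2df_pvalue(stat), 3))
    print("10% critical value:", round(chi2_2df_critical(0.10), 3))
    print("1% critical value:", round(chi2_2df_critical(0.01), 3))
```

On these numbers the statistics are about 2.02 and 2.13, both below the 10% critical value of 4.605, so freeing $\rho$ and $\sigma_2$ buys almost no improvement in fit in the NLS data.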