I Just Ran Two Million Regressions

By XAVIER X. SALA-I-MARTIN *

Following the seminal work of Robert Barro (1991), the recent empirical literature on economic growth has identified a substantial number of variables that are partially correlated with the rate of economic growth. The basic methodology consists of running cross-sectional regressions of the form

(1)  $\gamma = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$

where $\gamma$ is the vector of rates of economic growth, and $x_1, \ldots, x_n$ are vectors of explanatory variables, which vary across researchers and across papers. Each paper typically reports a (possibly nonrandom) sample of the regressions actually run by the researcher. Variables like the initial level of income, the investment rate, various measures of education, some policy indicators, and many other variables have been found to be significantly correlated with growth in regressions like (1). I have collected around 60 variables which have been found to be significant in at least one regression.

The problem faced by empirical growth economists is that growth theories are not explicit enough about what variables $x_j$ belong in the "true" regression. That is, even if it is known that the "true" model looks like (1), one does not know exactly what particular variables $x_j$ should be used. If one starts running regressions combining the various variables, variable $x_1$ will soon be found to be significant when the regression includes variables $x_2$ and $x_3$, but it becomes nonsignificant when $x_4$ is included. Since the "true" variables that should be included are not known, one is left with the question: what are the variables that are really correlated with growth?

An initial answer to this question was given by Ross Levine and David Renelt (1992).¹ They applied Edward Leamer's (1985) extreme-bounds test to identify "robust" empirical relations in the economic growth literature. In short, the extreme-bounds test works as follows. Imagine that there is a pool of N variables that previously have been identified to be related to growth and one is interested in knowing whether variable z is "robust." One would estimate regressions of the form

(2)  $\gamma = \alpha_j + \beta_{yj}\, y + \beta_{zj}\, z + \beta_{xj}\, x_j + \varepsilon$

where y is a vector of variables that always appear in the regressions (in the Levine and Renelt paper, these variables are the initial level of income, the investment rate, the secondary school enrollment rate, and the rate of population growth), z is the variable of interest, and $x_j \in X$ is a vector of up to three variables taken from the pool X of N variables available. One needs to estimate this regression or model for all the possible M combinations of $x_j \in X$. For each model j, one finds an estimate, $\hat\beta_{zj}$, and a standard deviation, $\hat\sigma_{zj}$. The lower extreme bound is defined to be the lowest value of $\hat\beta_{zj} - 2\hat\sigma_{zj}$, and the upper extreme bound is defined to be the largest value of $\hat\beta_{zj} + 2\hat\sigma_{zj}$. The extreme-bounds test for variable z says that if the lower extreme bound for z is negative and the upper extreme bound is positive, then variable z is not robust. Note that this amounts to saying that if one finds a single regression for which the sign of the coefficient $\beta_z$ changes or becomes insignificant, then the variable is not robust.

Not surprisingly, Levine and Renelt's conclusion is that very few (or no) variables are robust.
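The extreme-bounds procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Levine and Renelt's code: the function names (`ols`, `extreme_bounds`, `is_robust`) are hypothetical, the data are synthetic, and each model uses exactly `k` pool variables, whereas the text allows "up to three."

```python
from itertools import combinations

import numpy as np


def ols(y, X):
    """OLS coefficients and standard errors; X must already contain a constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def extreme_bounds(growth, fixed, z, pool, k=3):
    """Lower and upper extreme bounds of the coefficient on z.

    One regression is run per combination of k columns of `pool`
    (assumption: exactly k, not "up to" k).
    """
    n = growth.shape[0]
    z_index = 1 + fixed.shape[1]  # columns: constant, fixed variables, z, x_j
    lower, upper = np.inf, -np.inf
    for combo in combinations(range(pool.shape[1]), k):
        X = np.column_stack([np.ones(n), fixed, z, pool[:, list(combo)]])
        beta, se = ols(growth, X)
        lower = min(lower, beta[z_index] - 2.0 * se[z_index])
        upper = max(upper, beta[z_index] + 2.0 * se[z_index])
    return lower, upper


def is_robust(lower, upper):
    """The variable fails the test when the two bounds straddle zero."""
    return not (lower < 0.0 < upper)


# Synthetic data (hypothetical): growth depends strongly on z.
rng = np.random.default_rng(0)
n = 100
fixed = rng.normal(size=(n, 2))  # variables kept in every regression
pool = rng.normal(size=(n, 5))   # pool X of candidate controls
z = rng.normal(size=n)           # variable of interest
growth = 1.0 + 0.5 * fixed[:, 0] + 2.0 * z + 0.1 * rng.normal(size=n)

lower, upper = extreme_bounds(growth, fixed, z, pool)
print(lower, upper, is_robust(lower, upper))
```

Because the test takes the minimum and maximum over every model, a single regression whose interval crosses zero is enough to label the variable nonrobust.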

* Department of Economics, Columbia University, 420 West 118th St., New York, NY 10027, and Universitat Pompeu Fabra, Barcelona, Spain.

¹ The data for this paper were taken from the World Bank Research Department's Web page.

This content downloaded from 129.82.37.215 on Wed, 28 Sep 2016 22:57:11 UTC
All use subject to http://about.jstor.org/terms
VOL. 87 NO. 2 RECENT EMPIRICAL GROWTH RESEARCH 179

One possible reason for finding few or no robust variables is, of course, that very few variables can be identified to be correlated systematically with growth. Hence, some researchers' reading of the Levine and Renelt paper concluded that nothing can be learned from this empirical growth literature because no variables are robustly correlated with growth. Another explanation, however, is that the test is too strong for any variable to pass it: if the distribution of the estimators of $\beta_z$ has some positive and some negative support, then one is bound to find one regression for which the estimated coefficient changes signs if enough regressions are run. Thus, giving the label of nonrobust to all variables is all but guaranteed.

I. Moving Away from Extreme Tests

In this paper I want to move away from this "extreme test." In fact, I want to depart from the zero-one labeling of variables as "robust" vs. "nonrobust," and instead, I want to assign some level of confidence to each of the variables. One way to move away from the extreme-bounds test is to look at the entire distribution of the estimators of $\beta_z$. In particular, one might be interested in the fraction of the density function lying on each side of zero: if 95 percent of the density function for the estimates of $\beta_1$ lies to the right of zero and only 52 percent of the density function for $\beta_2$ lies to the right of zero, one will probably think of variable 1 as being more likely to be correlated with growth than variable 2.² The immediate problem is that, even though each individual estimate follows a Student-t distribution, the estimates themselves could be scattered around in a strange fashion. Hence, I will operate under two different assumptions.

A. Case 1: The Distribution of the Estimates of $\beta_z$ across Models Is Normal

In order to compute the cumulative distribution function [CDF(0)], one needs to know the mean and the standard deviation of this distribution. For each of the M models, compute the (integrated) likelihood, $L_{zj}$, the point estimate $\hat\beta_{zj}$, and the standard deviation $\hat\sigma_{zj}$. With all these numbers one can construct the mean estimate of $\beta_z$ as the weighted average of each of the M point estimates, $\hat\beta_{zj}$:

(3)  $\hat\beta_z = \sum_{j=1}^{M} w_{zj}\, \hat\beta_{zj}$

where the weights, $w_{zj}$, are proportional to the (integrated) likelihoods

(4)  $w_{zj} = \dfrac{L_{zj}}{\sum_{i=1}^{M} L_{zi}}$

The reason for using this weighting scheme is to give more weight to the regressions or models that are more likely to be the true model. (Incidentally, this is another reason for using regressions with the same number of explanatory variables, since models with more variables will tend to have better fit. To the extent that the fit of model j is an indication of its probability of being the true model, a likelihood-weighted scheme like the one proposed here should be reasonable.)

I also compute the average variance as the weighted average of the M estimated variances, where the weights are given by (4):

(5)  $\hat\sigma_z^2 = \sum_{j=1}^{M} w_{zj}\, \hat\sigma_{zj}^2$

Once the mean and the variance of the normal distribution are known, I compute the CDF(0) using the standard normal distribution.
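As a rough illustration of equations (3) to (5), the sketch below builds likelihood weights, the weighted mean and variance, and the resulting CDF(0) under normality. For comparison it also includes the nonnormal aggregate that equation (6) defines for Case 2. Several things here are assumptions rather than the paper's implementation: the function names are hypothetical, the likelihoods are supplied as log-likelihoods, each per-model CDF(0) uses a normal approximation instead of a Student-t, and the numbers are made up.

```python
import math

import numpy as np


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def likelihood_weights(log_likelihoods):
    """Equation (4): weights proportional to each model's likelihood."""
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())  # shift by the max to avoid underflow
    return w / w.sum()


def larger_area(below_zero):
    """Footnote 2 convention: report the larger of the two areas around zero."""
    return max(below_zero, 1.0 - below_zero)


def cdf0_normal_case(betas, sds, weights):
    """Case 1: weighted mean (3), weighted variance (5), normal CDF at zero."""
    betas, sds, weights = (
        np.asarray(a, dtype=float) for a in (betas, sds, weights)
    )
    mean = float(np.sum(weights * betas))
    var = float(np.sum(weights * sds ** 2))
    return mean, var, larger_area(normal_cdf(0.0, mean, math.sqrt(var)))


def cdf0_nonnormal_case(betas, sds, weights):
    """Case 2: weighted average (6) of the per-model CDF(0) values."""
    individual = [normal_cdf(0.0, b, s) for b, s in zip(betas, sds)]
    return larger_area(float(np.dot(weights, individual)))


# Hypothetical estimates from M = 3 models for one variable z.
betas = [0.020, 0.015, 0.030]
sds = [0.010, 0.008, 0.012]
weights = likelihood_weights([-120.0, -121.5, -119.2])

mean, var, cdf_normal = cdf0_normal_case(betas, sds, weights)
cdf_nonnormal = cdf0_nonnormal_case(betas, sds, weights)
print(f"mean={mean:.4f} sd={math.sqrt(var):.4f} "
      f"CDF(0) normal={cdf_normal:.3f} nonnormal={cdf_nonnormal:.3f}")
```

Passing equal weights instead of `likelihood_weights(...)` gives unweighted averages, which is a simple way to see how much a single high-likelihood model drives the result.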

² Zero divides the area under the density in two. For the rest of the paper, and in order to economize on space, the larger of the two areas will be called CDF(0), regardless of whether this is the area above zero or below zero [in other words, regardless of whether this is the CDF(0) or 1 - CDF(0)].

180 AEA PAPERS AND PROCEEDINGS MAY 1997

B. Case 2: The Distribution of the Estimates of $\beta_z$ across Models Is Not Normal

If the distribution is not normal, one can still compute its CDF(0) as follows.
For each of the M regressions, compute the individual CDF(0), denoted by $\Phi_{zj}(0 \mid \hat\beta_{zj}, \hat\sigma_{zj}^2)$. Then compute the aggregate CDF(0) of $\beta_z$ as the weighted average of all the individual CDF(0)'s, where the weights are, again, the integrated likelihoods given by (4). In other words,

(6)  $\Phi_z(0) = \sum_{j=1}^{M} w_{zj}\, \Phi_{zj}(0 \mid \hat\beta_{zj}, \hat\sigma_{zj}^2)$.

A potential problem with this method is that the goodness of fit of model j may not be a good indicator of the probability that model j is the true model. This might happen, for example, when some explanatory variables in the data set are endogenous: models with endogenous variables may have a (spurious) better fit. Thus, the weights given to these models will tend to be larger, and in fact, they may very well dominate the estimates. It may be found that only one or two of the models get all of the weight in the estimated weighted average, and these one or two models may suffer from endogeneity bias. It can be argued that, when this is a serious problem, the unweighted average of all the models may be superior to the weighted averages, so I also computed unweighted versions of (3), (5), and (6).

II. Specifications and Data

Even though I depart from Levine and Renelt when it comes to "testing" variables, I keep their specification in the sense that I am going to estimate models like (2). Model j combines some variables which appear in all regressions (y), the variable of interest (z), with the trio $x_j$ taken from the pool X of the remaining variables proposed in the literature. The reason for keeping some variables in all regressions and the reason for allowing the remaining variables to come only in trios is that the typical growth regression in the literature has (at least) seven right-hand-side variables. I found a total of 62 variables in the literature. If I tested one variable and allowed the remaining 61 to be combined in groups of 6, I would have to estimate 3.4 billion regressions, which would take me about four years to estimate, using my computer.³ A possible alternative was to run regressions with only three or four explanatory variables. The problem then would be that a lot of the regressions would be clearly misspecified (missing important variables is more of a problem than introducing irrelevant variables). Given these problems, I decided to follow Levine and Renelt and allow all the models to include three fixed variables, so when I combine these three variables along with the tested variable and then with trios of the remaining 59 variables, I always have regressions with seven explanatory variables.

Of all the variables in the literature, I chose a total of 62. The selection was made keeping in mind that I want variables that are measured at the beginning of the period (which is 1960) or as close as possible to it to minimize endogeneity. This eliminated all those variables that were computed for the later years only.

The next thing I needed to do was to choose the three fixed variables (i.e., the variables that appear in all regressions). These variables need to be "good" a priori. By this I mean that they have to be widely used in the literature, they have to be variables evaluated in the beginning of the period (1960) to avoid endogeneity, and they have to be variables that are somewhat "robust" in the sense that they systematically seem to matter in all regressions run in the previous literature. One obvious variable here is the level of income in 1960, since most researchers include it in their analysis and find it to be significant (this is the conditional convergence effect). The other two variables chosen are the life expectancy in 1960 and the primary-school enrollment rate in 1960. Both are reasonable and widely used measures of the initial stock of human capital.

In summary, I have a total of 62 variables. I will use three of them in all regressions, so for each variable tested I will combine the remaining 58 variables in sets of three. Hence, I will estimate 30,856 regressions per variable or a total of nearly 2 million regressions.

³ Some regressions are repeatedly estimated. Repetition could be reduced (and, hence, speed increased), but only at a high cost in terms of memory usage.
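The regression counts quoted in this section can be checked with binomial coefficients. The short sketch below does the arithmetic; the constant and variable names are mine. Treating the 3.4 billion figure as 62 tested variables times the number of 6-variable groups from the other 61 is an assumption: it is one reading that happens to reproduce the number.

```python
from math import comb

TOTAL_VARIABLES = 62  # variables collected from the literature
FIXED = 3             # variables that appear in every regression

# Design actually used: 3 fixed variables, 1 tested variable, and a trio
# drawn from the 58 variables that are neither fixed nor being tested.
tested = TOTAL_VARIABLES - FIXED        # 59 tested variables
per_variable = comb(tested - 1, 3)      # trios from the remaining 58
total = tested * per_variable

# Alternative considered: no fixed variables, groups of 6 from the other 61.
# Assumption: the quoted total multiplies this by all 62 variables.
alt_per_variable = comb(TOTAL_VARIABLES - 1, 6)
alt_total = TOTAL_VARIABLES * alt_per_variable

print(f"{per_variable:,} regressions per variable, {total:,} in total")
print(f"alternative design: {alt_total:,} regressions")
```

The first line reproduces the 30,856 regressions per variable; the total of about 1.8 million is what the text rounds to "nearly 2 million."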


I should mention that, even though I do not report these results, I performed the extreme-bounds test on the 59 tested variables and found that only one passes it.⁴ However, when I look at the t ratios, I see that some variables are significant almost all of the time (or over 90 percent), while others are significant less than 10 percent or even 1 percent of the time.

III. Results

I will only report here the results for the variables that appear to be "significantly" correlated with growth. By this I mean those variables whose weighted CDF(0) is larger than 0.95. The full results are reported in Sala-i-Martin (1996).⁵

TABLE 1: MAIN RESULTS OF REGRESSIONS (DEPENDENT VARIABLE = GROWTH)

Independent variable              (i) β    (ii) SD   (iii) CDFᵃ
Equipment investment             0.2175    0.0408    1.000
Number of years open economy     0.0195    0.0042    1.000
Fraction Confucian               0.0676    0.0149    1.000
Rule of law                      0.0190    0.0049    1.000
Fraction Muslim                  0.0142    0.0035    1.000
Political rights                -0.0026    0.0009    0.998
Latin America dummy             -0.0115    0.0029    0.998
Sub-Saharan Africa dummy        -0.0121    0.0032    0.997
Civil liberties                 -0.0029    0.0010    0.997
Revolutions and coups           -0.0118    0.0045    0.995
Fraction of GDP in mining        0.0353    0.0138    0.994
SD black-market premium         -0.0290    0.0118    0.993
Primary exports in 1970         -0.0140    0.0053    0.990
Degree of capitalism             0.0018    0.0008    0.987
War dummy                       -0.0056    0.0023    0.984
Non-equipment investment         0.0562    0.0242    0.982
Absolute latitude                0.0002    0.0001    0.980
Exchange-rate distortions       -0.0590    0.0302    0.968
Fraction Protestant             -0.0129    0.0053    0.966
Fraction Buddhist                0.0148    0.0076    0.964
Fraction Catholic               -0.0089    0.0034    0.963
Spanish colony                  -0.0065    0.0032    0.938

ᵃ Nonnormal.

Column (i) of Table 1 reports the estimated weighted mean [described in (3)] of the estimated coefficients for each variable. Column (ii) reports the weighted standard error [described in (5)]. Column (iii) reports the level of significance under the assumption of nonnormality, as described by equation (6) (the levels of significance under normality can be computed by the reader using the average mean and standard deviations reported in columns (i) and (ii), respectively). The table shows that 22 out of the 59 variables appear to be "significant." These variables include the following:

1. Regional Variables: Sub-Saharan Africa, Latin America (negatively related to growth), and Absolute Latitude (far away from the equator is good for growth). These variables are from the Barro and Jong-Wha Lee (1993) data set.⁶
2. Political Variables: Rule of Law, Political Rights, and Civil Liberties (good for growth); Number of Revolutions and Military Coups and War dummy (bad for growth). All of these are from the Barro and Lee (1993) data set.
3. Religious Variables: Confucian, Buddhist, and Muslim (positive); and Protestant and Catholic (negative). All of these variables are from Barro (1996).
4. Market Distortions and Market Performance: Real Exchange Rate Distortions and Standard Deviation of the Black Market Premium (both from Barro and Lee [1993] and both negative).

⁴ The detailed results can be found in Sala-i-Martin (1996).

⁵ It turns out that the "levels of significance" found under the assumption of normal distribution and under the assumption of nonnormal distribution are virtually identical. This may indicate that the distribution is close to normal or that, for each variable, there is only one model that takes all the weight.

⁶ The data for this paper were taken from the NBER Web page.


5. Types of Investment: Equipment Investment and Non-Equipment Investment (both positive, although the coefficient for non-equipment investment [β = 0.0562] is about one-fourth the coefficient for equipment investment [β = 0.2175]; see Bradford De Long and Lawrence Summers [1991]).⁷
6. Primary Sector Production: Jeffrey Sachs and Andrew Warner's (1995) Fraction of Primary Products in Total Exports (negative) and Robert Hall and Charles Jones's (1996) Fraction of GDP in Mining (positive).⁸
7. Openness: Sachs and Warner's (1996) Number of Years an Economy Has Been Open Between 1950 and 1990 (positive).
8. Type of Economic Organization: Hall and Jones's (1996) Degree of Capitalism (positive).
9. Former Spanish Colonies.

It is interesting to note some of the variables that are not in the table (because they appear not to be important): no measure of government spending (including investment) appears to affect growth in a significant way. The various measures of financial sophistication, the inflation rate, and its variance do not appear to matter much. (In fairness to the authors who proposed these variables, I should say that they specifically say that they affect growth in nonlinear ways, and my analysis allowed these variables to enter in a linear fashion only.) Other variables that do not seem to matter include various measures of scale effects (measured by total area and total labor force), outward orientation, tariff restrictions, the black-market premium, and the recently publicized "ethno-linguistic fractionalization" (which is supposed to capture the degree to which there are internal fights among various ethnic groups).⁹

As mentioned earlier, the likelihood weights used up to now are valid only to the extent that all the models are true regression models. If there are models with spurious good fits, then a nonweighted scheme may be superior. In Sala-i-Martin (1996) I report the detailed results. Suffice it to say that only four variables that are above the magic line of 0.95 according to the weighted CDF(0) drop below that mark when an unweighted average of the individual CDF(0)'s is used. These variables are Civil Liberties, Revolutions and Coups, Fraction of GDP in Mining, and the War dummy. On the other hand, only one variable with an unweighted CDF(0) above 0.95 gets a weighted CDF(0) below 0.95: the Ratio of Liquid Liabilities to GDP, which is a measure of the degree of financial development.

IV. Conclusions

My claim in this paper is that, if one is interested in knowing the coefficient of a particular variable in a growth regression, the picture emerging from the empirical growth literature is not the pessimistic "nothing is robust" obtained with the extreme-bounds analysis. Instead, a substantial number of variables can be found to be strongly related to growth.

⁷ The data for this paper were taken from the World Bank Research Department's Web page.

⁸ The data for Sachs and Warner (1995) were provided by Andrew Warner; the data for Hall and Jones (1996) were taken from Charles Jones's Web page.

⁹ See Sala-i-Martin (1996) for the complete list of variables, with their estimated coefficients and levels of significance.

REFERENCES

Barro, Robert J. "Economic Growth in a Cross Section of Countries." Quarterly Journal of Economics, May 1991, 106(2), pp. 407-43.
Barro, Robert J. "Determinants of Democracy." Mimeo, Harvard University, July 1996.
Barro, Robert J. and Lee, Jong-Wha. "International Comparisons of Educational Attainment." Journal of Monetary Economics, December 1993, 32(3), pp. 363-94.
De Long, J. Bradford and Summers, Lawrence. "Equipment Investment and Economic Growth." Quarterly Journal of Economics, May 1991, 106(2), pp. 445-502.
Hall, Robert and Jones, Charles. "The Productivity of Nations." National Bureau of Economic Research (Cambridge, MA) Working Paper No. 5812, November 1996.
