Drawbacks in The 3-Factor Approach of Fama and French (2018)
David E. Allen^{a,∗} and Michael McAleer^{b}
EI2019-20
Abstract
This paper features a statistical analysis of the monthly three-factor
Fama/French return series. We apply rolling OLS regressions to explore the
relationship between the three factors, using monthly and weekly data from
July 1926 to June 2018 that are freely available on French's website. The
results suggest there are significant and time-varying relationships between
the factors. This is confirmed by non-parametric tests. We then switch to a
sub-sample from July 1990 to July 2018, also taken from French's website. The
three series and their inter-relationships are analysed using two stage least
squares and the Hausman test to check for issues related to endogeneity, the
Sargan over-identification test, and the Cragg-Donald weak instrument test.
The relationship between factors is also examined using OLS, incorporating
Ramsey's RESET tests of functional form misspecification, plus Nadaraya-Watson
kernel regression techniques. The empirical results suggest that the factors,
when combined in OLS regression analysis, as suggested by Fama and French
(2018), are likely to suffer from endogeneity. OLS regression analysis and the
application of Ramsey's RESET tests suggest a non-linear relationship exists
between the three series, in which cubed terms are significant. This
non-linearity is also confirmed by the kernel regression analysis. We use two
instruments to estimate the market betas, and then use the factor estimates in
a second set of panel data tests using a small sample of monthly returns for
US firms that are drawn from the on-line data source Tiingo. These issues are
analysed using methods suggested by Petersen (2009) to permit clustering in
the panels by date and firm. The empirical results suggest that using an
instrument to capture endogeneity reduces the standard error of market beta in
subsequent cross-sectional tests, but that clustering effects, as suggested by
Petersen (2009), will also impact on the estimated standard errors. The
empirical results suggest that using these factors in linear regression
analysis, such as suggested by Fama and French (2018), as a method of
screening factor relevance, is problematic in that the estimated standard
errors are highly sensitive to the correct model specification.
The authors are grateful to Adrian Pagan for helpful comments and suggestions.
The second author wishes to acknowledge the Australian Research Council and
the Ministry of Science and Technology (MOST), Taiwan, for financial support.
∗ Corresponding author
1. Introduction
In a fundamental paper, Fama and French (1993, p.3) stated that: "there are
three stock-market factors: an overall market factor and factors related to
firm size and book-to-market equity". French generously provides estimates of
these original factors, and more recently suggested additions, on his personal
website (see
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html).
The original 1993 paper triggered the development of a virtual global industry
in testing the effects of various factors on various portfolios selected from
global markets. Both Fama and French are Directors and advisors to a set of
corporate entities under the rubric, Dimensional Fund Advisors, which applies
factor models in a managed fund and investment advisory setting.
Cochrane (2011, p.1047), in his Presidential Address delivered to the American
Finance Association, observed that: "we also thought that the cross-section of
expected returns came from the CAPM. Now we have a zoo of new factors".
Harvey, Liu, and Zhu (2015) list 316 anomalies proposed as potential factors
in asset-pricing models, and comment that there are others that do not make
their list. Fama and French (2018) respond to these new challenges by
suggesting how to choose among competing factors, and explain that previous
approaches can be described under two main headings. The left-hand-side (LHS)
approach judges competing models on the intercepts (unexplained average
returns) in time series regressions to explain excess returns on sets of LHS
portfolios. A drawback is that different sets of LHS portfolios can lead to
different intercepts and, therefore, to different inferences.
An alternative right-hand-side (RHS) approach uses spanning regressions
to judge whether individual factors contribute to the explanation of average
returns provided by a model. Each candidate factor is regressed on the model's
other factors. If the intercept in a spanning regression is non-zero, the factor
adds to the model's explanation of average returns in that sample period. Fama
and French (2018) note that the GRS statistic of Gibbons, Ross, and Shanken
(1989), hereafter GRS, produces a test of whether multiple factors add to a base
model's explanatory power.
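The spanning regression just described can be sketched in a few lines. This is an illustrative sketch only, not the code used by Fama and French (2018) or in this paper: the function name `spanning_regression` and the simulated series are hypothetical, and conventional (non-robust) OLS standard errors are assumed.

```python
import numpy as np

def spanning_regression(candidate, others):
    """Regress a candidate factor on a model's other factors by OLS.

    Returns the intercept, its conventional OLS standard error and the
    t-ratio. A non-zero intercept suggests that the candidate adds to the
    model's explanation of average returns in the sample.
    """
    y = np.asarray(candidate, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(others, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    alpha, se = coef[0], np.sqrt(cov[0, 0])
    return alpha, se, alpha / se

# Hypothetical data: a candidate factor that is spanned by two others plus noise.
rng = np.random.default_rng(0)
others = rng.normal(0.5, 4.0, size=(600, 2))
candidate = 0.3 * others[:, 0] - 0.2 * others[:, 1] + rng.normal(0.0, 1.0, 600)
alpha, se, t_ratio = spanning_regression(candidate, others)
```

The t-ratio is only as reliable as the OLS assumptions behind it, which is the concern raised in the remainder of this comment.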
A perusal of GRS reveals that their test is based on the strong assumptions
of linearity, independence and a Gaussian distribution. They proceed on the
assumption that there is a given riskless rate of interest, $R_{ft}$, for each
time period. Excess returns are computed by subtracting $R_{ft}$ from the total
rates of return. Then they consider the following multivariate linear
regression:
$$\tilde{r}_{it} = \alpha_{ip} + \beta_{ip}\,\tilde{r}_{pt} + \tilde{\varepsilon}_{it}, \quad i = 1, \ldots, N, \qquad (1)$$
where $\tilde{r}_{it}$ ≡ excess return on asset i in period t, $\tilde{r}_{pt}$ ≡
excess return on the portfolio whose efficiency is being tested, and
$\tilde{\varepsilon}_{it}$ ≡ disturbance term for asset i in period t. The
disturbances are assumed to be jointly normally distributed in each period,
with mean zero and nonsingular covariance matrix $\Sigma$, conditional on the
excess returns for portfolio p. They also assume independence of the
disturbances over time. In order that $\Sigma$ be non-singular, $\tilde{r}_{pt}$
and the N left-hand-side assets must be linearly independent.
GRS suggest that if a particular portfolio is mean-variance efficient (that is,
it minimizes variance for a given level of expected return), then the following
first-order condition must be satisfied for the given N assets:
$$E(\tilde{r}_{it}) = \beta_{ip}\,E(\tilde{r}_{pt}). \qquad (2)$$
Therefore, when they combine the first-order condition in (2) with the
distributional assumption suggested by (1), they obtain the following
parametric restriction, which they state in the form of a null hypothesis:
$$H_0: \alpha_{ip} = 0, \quad i = 1, \ldots, N. \qquad (3)$$
Thus, the test is based on a null hypothesis that the intercept in the above
regression, as shown in expressions (1) and (2), is zero. There are several
assumptions required for this test to be valid, namely linearity, independence,
and Gaussian distributions.
In this comment, we apply simple tests of endogeneity and independence to
a set of monthly data taken from French's website featuring the Fama/French
estimates of the excess return on the market portfolio, and estimates of SMB
and HML. The Fama/French factors are constructed using the 6 value-weight
portfolios formed on size and book-to-market.
SMB (Small Minus Big) is the average return on the three small portfolios
minus the average return on the three big portfolios.
SMB = 1/3 (Small Value + Small Neutral + Small Growth) - 1/3 (Big Value
+ Big Neutral + Big Growth).
HML (High Minus Low) is the average return on the two value portfolios
minus the average return on the two growth portfolios.
HML = 1/2 (Small Value + Big Value) - 1/2 (Small Growth + Big Growth).
Rm-Rf, the excess return on the market, is the value-weight return of all CRSP
firms incorporated in the USA and listed on the NYSE, AMEX, or NASDAQ that
have a CRSP share code of 10 or 11 at the beginning of month t, good shares
and price data at the beginning of t, and good return data for t, minus the
one-month Treasury bill rate (from Ibbotson Associates).
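As a small worked example of the SMB and HML definitions above, the following sketch computes both factors from the returns on the six size and book-to-market portfolios. The function name `smb_hml` and the one-month returns are hypothetical values chosen purely for illustration.

```python
def smb_hml(small_value, small_neutral, small_growth,
            big_value, big_neutral, big_growth):
    """Return (SMB, HML) from the six value-weight portfolio returns."""
    smb = (small_value + small_neutral + small_growth) / 3 \
        - (big_value + big_neutral + big_growth) / 3
    hml = (small_value + big_value) / 2 - (small_growth + big_growth) / 2
    return smb, hml

# Hypothetical one-month returns, in per cent.
smb, hml = smb_hml(small_value=2.0, small_neutral=1.5, small_growth=1.0,
                   big_value=1.2, big_neutral=0.9, big_growth=0.6)
```

Both factors share the same six underlying portfolios, which is one reason why the factors need not be independent of one another.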
For the purpose of providing an example, we use a sample of capitalization
change adjusted company prices from the free on-line data source Tiingo (see:
https://www.tiingo.com). We employed the R library package riingo, which
provides an interface to the database
(see: https://cran.r-project.org/web/packages/riingo/index.html), and
downloaded adjusted monthly price data for 21 companies. We also use a data
set of three time series of market factors, consisting of 220 monthly
observations from January 2000 through to July 2018, and use a subset of the
monthly data from January 2000 to the end of December 2010, comprising 132
observations, to estimate market factors. Tests of endogeneity using two stage
least squares and the Hausman test are used, as Fama and French (2018) adopt a
test proposed by Barillas and Shanken (BS, 2018).
Barillas and Shanken (2018) assume that the factors of competing models are
among the LHS returns that each model is supposed to explain. Formally, let R
be the target set of non-factor LHS excess returns, $f_i$ the factors of model
i, and $F_{Ai}$ the union of the factors of model i's competitors. In the BS
approach, the set of LHS returns for model i, $\Pi_i$, combines R and $F_{Ai}$,
with linearly dependent components deleted. Competing models are assessed on
the maximum (max) squared Sharpe ratio for the intercepts from time series
regressions of LHS returns on a model's factors.
Define $a_i$ as the vector of intercepts from regressions of $\Pi_i$ on $f_i$,
and $\Sigma_i$ as the residual covariance matrix. The maximum squared Sharpe
ratio for the intercepts is given by:
$$Sh^2(a_i) = a_i' \Sigma_i^{-1} a_i, \qquad (4)$$
and the superior model is judged to be the one with the smallest $Sh^2(a_i)$.
Gibbons et al. (1989) show that $a_i' \Sigma_i^{-1} a_i$ is the difference
between the max squared Sharpe ratio constructed from $f_i$ and $\Pi_i$
together, and the max for $f_i$ individually:
$$Sh^2(a_i) = Sh^2(\Pi_i, f_i) - Sh^2(f_i). \qquad (5)$$
Fama and French (2018) suggest that since $\Pi_i$ includes the factors of all
model i's competitors, the union of $\Pi_i$ and $f_i$, which they call $\Pi$,
does not depend on i. This means that equation (5) can be simplified to:
$$Sh^2(a_i) = Sh^2(\Pi) - Sh^2(f_i). \qquad (6)$$
They assume that R is the target set of non-factor LHS excess returns, and
that the best model is the one which produces the highest $Sh^2(f)$. They
suggest that there is bias when comparing non-nested models, and conduct a
bootstrap simulation of in- and out-of-sample results to compensate.
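The ranking metric can be illustrated with a short sketch that computes the maximum squared Sharpe ratio of a set of factors as the quadratic form of their mean vector in the inverse covariance matrix. The function name `max_squared_sharpe` and the simulated returns are assumptions for illustration, and the sample covariance with the usual degrees-of-freedom correction is an arbitrary choice; the bootstrap bias adjustment of Fama and French (2018) is not reproduced.

```python
import numpy as np

def max_squared_sharpe(excess_returns):
    """Sh^2(f) = mu' Sigma^{-1} mu for a T x K array of factor excess returns."""
    r = np.asarray(excess_returns, dtype=float)
    if r.ndim == 1:
        r = r[:, None]                       # treat a single series as one factor
    mu = r.mean(axis=0)
    sigma = np.atleast_2d(np.cov(r, rowvar=False))
    return float(mu @ np.linalg.solve(sigma, mu))

# Hypothetical monthly excess returns on three factors.
rng = np.random.default_rng(0)
factors = rng.normal([0.6, 0.2, 0.3], [4.0, 3.0, 3.0], size=(360, 3))
sh2_one_factor = max_squared_sharpe(factors[:, 0])
sh2_three_factor = max_squared_sharpe(factors)
```

In sample, adding a factor can never lower this statistic, which is why comparisons of nested and non-nested models by $Sh^2(f)$ need care.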
What Fama and French (2018) do not mention is a potential problem with
endogeneity of the RHS variables that is integral to their suggested metric.
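$$H = (b_1 - b_0)' \left( \mathrm{Var}(b_1) - \mathrm{Var}(b_0) \right)^{\dagger} (b_1 - b_0), \qquad (7)$$
$$y = Y\beta + X\gamma + u$$
$$Y = Z\Pi + X\Phi + V,$$
$$G_T = (Y' M_Z Y)^{-1/2}\, Y^{\perp\prime} P_{Z^{\perp}} Y^{\perp}\, (Y' M_Z Y)^{-1/2}\, \frac{T - K_1 - K_2}{K_2}. \qquad (8)$$
The minimum eigenvalue of $G_T$ is the statistic used for testing for weak
instruments.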
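The Hausman statistic in equation (7) compares an instrumental variables estimate, $b_1$, with the OLS estimate, $b_0$. The sketch below is a minimal illustration under stated assumptions: the helper names `ols`, `tsls` and `hausman` are hypothetical, the data are simulated, conventional homoskedastic covariance matrices are used, and the dagger in (7) is taken to be the Moore-Penrose pseudo-inverse. It is not the code used to produce the results in this paper.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and their conventional covariance matrix."""
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    e = y - X @ b
    return b, (e @ e / (len(y) - X.shape[1])) * xtx_inv

def tsls(y, X, Z):
    """Two stage least squares of y on X with instrument matrix Z."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # first-stage fitted values
    xhx_inv = np.linalg.inv(X_hat.T @ X_hat)
    b = xhx_inv @ X_hat.T @ y
    e = y - X @ b                                     # residuals use the original X
    return b, (e @ e / (len(y) - X.shape[1])) * xhx_inv

def hausman(b1, v1, b0, v0):
    """H = (b1 - b0)' pinv(Var(b1) - Var(b0)) (b1 - b0), as in equation (7)."""
    d = b1 - b0
    return float(d @ np.linalg.pinv(v1 - v0) @ d)

# Simulated illustration: x is correlated with the error u; z is an instrument.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
b_ols, v_ols = ols(y, X)
b_iv, v_iv = tsls(y, X, Z)
H = hausman(b_iv, v_iv, b_ols, v_ols)
```

In finite samples the difference of the two covariance matrices need not be positive semi-definite, so the computed statistic should be read with caution.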
We run a series of tests in which we explore the relationship between the
factors themselves. We regress SMB and HML on RM − RF. As instruments,
we use monthly OECD surveys of expected US manufacturing production, plus
the monthly return on the VIX. This was the older version of the VIX (VXO),
based on implied volatilities, rather than the new 'model free' VIX, which was
introduced in 2003. We run individual time series regressions in which the
RHS variables are the other two factors in the 3-factor model. The key issue is
the endogeneity of the factors. If they are found to be endogenous, then OLS
estimates in a multiple regression are likely to be biased and inconsistent.
This would make the validity of the test recommended by Fama and French (2018)
for choosing factors more sensitive to the validity of the regression
specification used in estimating the factor loadings.
Figure 1(c), the third diagram in the set of three, shows the OLS relationship
between HML and SMB. This relationship is significantly positive from 1932
to 1950. The relationship is then insignificant until about 1973, when there
is a brief spell when it becomes significantly negative. The significant
negative relationship re-occurs between 1982 and 1991, and then again from 1995
to 2004. Then from 2009 to 2014, it becomes significantly positive, which is
followed by a period of insignificance.
We then repeated the exercise using weekly data for the same three factor
series, again downloaded from French's website, for a period from the first
week in July 1926 to the last week in October 2016, comprising a total of 4817
observations. We again ran bivariate rolling OLS regressions between the three
factors, using a window of 52 weeks, and the results are shown in Figure 2.
The results for the regression of SMB on RM − RF, using a one-year window of
weekly data, are shown in Figure 2(a). They reveal significant positive
relationships between these two factors in the late 1930s and for a long
period in the 1940s. The relationship then becomes significant and negative
for periods in the 1950s and 1960s. It becomes significant and positive again
in the 1960s, and then switches sign, being significantly negative and then
significantly positive, in the late 1970s and early 1980s. It then becomes
negative and significant in the early and late 1990s. In the mid 2000s, it is
positive and significant, and this significant positive relationship
re-emerges twice in the period between 2010 and 2018.
The results shown in Figures 1 and 2 suggest that, if the 3 factors are
employed jointly in a time series regression to estimate factor loadings, then
great care must be taken to check the relationships between the factors.
Figures 1 and 2 show that the factors are not independent for long periods of
time between 1926 and 2018. If they are employed as independent variables in a
time series regression, they are likely to suffer from endogeneity.
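A bivariate rolling OLS regression of the kind summarised in Figures 1 and 2 can be sketched as follows. The function name `rolling_ols`, the simulated series and the 52-observation window are illustrative assumptions; conventional OLS t-ratios are used, and the figures in this paper were not produced with this code.

```python
import numpy as np

def rolling_ols(y, x, window):
    """Slope and t-ratio from regressing y on x (with intercept) in each window.

    Returns an array with one row per window: (slope, t-ratio).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    rows = []
    for end in range(window, len(y) + 1):
        ys, xs = y[end - window:end], x[end - window:end]
        xd, yd = xs - xs.mean(), ys - ys.mean()
        sxx = xd @ xd
        slope = (xd @ yd) / sxx
        resid = yd - slope * xd
        s2 = resid @ resid / (window - 2)     # two estimated parameters
        rows.append((slope, slope / np.sqrt(s2 / sxx)))
    return np.array(rows)

# Hypothetical weekly factor returns whose relationship changes sign halfway.
rng = np.random.default_rng(2)
rm_rf = rng.normal(0.1, 2.0, 520)
beta_path = np.where(np.arange(520) < 260, 0.4, -0.4)
smb = beta_path * rm_rf + rng.normal(0.0, 1.0, 520)
results = rolling_ols(smb, rm_rf, window=52)
```

Plotting the slope with bands derived from the t-ratio would give a display of the type described for Figures 1 and 2.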
As a further check we examined the relationship using a non-parametric
measure for testing non-linear pairwise independence suggested by Maasoumi
and Racine (2002), which is available in the R library package 'np' as set out
by Hayfield and Racine (2008). This tests the null of pairwise independence of
two univariate density (or probability) functions. In the case of continuous
variables we construct:
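$$S_\rho = \frac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left(f_1^{1/2} - f_2^{1/2}\right)^2 dx\,dy
= \frac{1}{2}\int\!\!\int\left(1 - \frac{f_2^{1/2}}{f_1^{1/2}}\right)^2 dF_1(x, y), \qquad (9)$$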
where $f_1 = f(x_i, y_i)$ is the joint density and $f_2 = g(x_i) \times h(y_i)$
is the product of the marginal densities of the random variables $X_i$ and
$Y_i$. The unknown density/probability functions are replaced with
nonparametric kernel estimates.
The bootstrap distribution is obtained by resampling with replacement from
the empirical distribution of X, delivering $\{X_i, Y_i\}$ pairs under the null
generated as $\{X_i^*, Y_i\}$, where $X^*$ is the bootstrap resample (i.e. we
'shuffle' X leaving Y unchanged, thereby breaking any pairwise dependence to
generate resamples under the null). Bandwidths are obtained via likelihood
cross-validation by default for the marginal and joint densities.
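The logic of the test can be sketched in simplified form. The sketch below is an assumption-laden stand-in, not the 'np' package implementation: it replaces the kernel density estimates and cross-validated bandwidths with histogram cell frequencies on a fixed number of bins, and the function names `s_rho` and `independence_test` are hypothetical.

```python
import numpy as np

def s_rho(x, y, bins=10):
    """Discrete analogue of equation (9) using histogram cell frequencies."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)     # marginal of x
    py = joint.sum(axis=0, keepdims=True)     # marginal of y
    return 0.5 * float(np.sum((np.sqrt(joint) - np.sqrt(px * py)) ** 2))

def independence_test(x, y, reps=999, bins=10, seed=0):
    """Bootstrap p-value for the null that x and y are independent."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    stat = s_rho(x, y, bins)
    # Resample x with replacement, leaving y unchanged, to impose the null.
    null = np.array([s_rho(rng.choice(x, size=len(x), replace=True), y, bins)
                     for _ in range(reps)])
    p_value = (1 + np.sum(null >= stat)) / (reps + 1)
    return stat, p_value
```

In the application described in the text, x would be a factor series and y the fitted values from a pairwise OLS regression; a small p-value rejects independence.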
We implemented this test using a measure of predictability for variable Y
and its predicted values Ŷ (from our implemented model). In our case, the three
models implemented were the linear OLS regressions estimated pairwise of the
three Fama-French factors on one another. The results of these tests, using a
bootstrap with 999 replications, are shown in Table 1.
The tests, using predictions obtained via OLS and the full sample period from
July 1926 to June 2018, comprising some 1102 observations, reject the null of
independence in all three pairwise cases at better than the 1 per cent level.
We then set up further simple tests using a more recent subset of the data:
the monthly 3-factor Fama-French return series from July 1990 to July 2018,
available on French's website, with the monthly excess return on the US
market, RM − RF, also taken from French's website, employed as the independent
variable in a set of time series regressions. To check for endogeneity and to
estimate the regression equations using two stage least squares, we need
suitable instrumental variables that are independently related to some of the
factors. We chose Business Tendency Surveys for Manufacturing: Confidence
Indicators: Composite Indicators: European Commission and National Indicators
for the United States (BSCICP02USM460S), which is an OECD monthly indicator
series. This series is available on the Federal Reserve Bank of St. Louis
(FRED) database, and features the results of surveys of confidence in US
Manufacturing. We also used the return on the older version of the VIX (VXO),
based on implied volatilities, rather than the new 'model free' VIX, which was
introduced in 2003. This was because the data set commences in 2000.
Given the evidence discussed in Figures 1 and 2, we first explored whether the
factors used in the typical Fama-French regression are related, for this
smaller sample period, by regressing SMB and HML on the market factor RM − RF.
The results of these regressions are shown in Table 2, which reveals a
significant relationship between SMB and RM − RF. Similar to the results in
Figures 1 and 2, this suggests that they are likely to suffer from an
endogeneity problem if they are used as explanatory variables in time series
regressions.
3.2. Further tests using a subset of the data
$$\hat{M}_h(x) = \frac{\sum_{i=1}^{N} K_h(x - x_i)\, y_i}{\sum_{j=1}^{N} K_h(x - x_j)}, \qquad (10)$$
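The estimator in equation (10) is a kernel-weighted local average of the $y_i$. A minimal sketch follows, assuming a Gaussian kernel and a bandwidth `h` supplied by the user; both choices, and the function name `nadaraya_watson`, are illustrative assumptions rather than the settings used for the results in this paper.

```python
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    """Evaluate the kernel regression estimate of E[y | x] at each grid point."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x_grid[:, None] - x[None, :]) / h
    weights = np.exp(-0.5 * u ** 2)       # Gaussian kernel; constants cancel
    return (weights @ y) / weights.sum(axis=1)

# Hypothetical data with a cubic relationship between two return series.
rng = np.random.default_rng(3)
rm_rf = rng.normal(0.0, 4.0, 300)
hml = 0.002 * rm_rf ** 3 + rng.normal(0.0, 1.0, 300)
grid = np.linspace(-8.0, 8.0, 41)
fit = nadaraya_watson(grid, rm_rf, hml, h=1.5)
```

A fitted curve that departs visibly from a straight line is informal evidence of the kind of non-linearity that the RESET tests are designed to detect.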
3.3. OLS and Two Stage Least Squares results, and Hausman tests
The next step is to compare the customary method adopted in asset pricing
tests: namely time series regressions using OLS as a means to estimate betas,
with two stage least squares, including the use of instruments, and tests of
endogeneity using the Hausman test. This next step requires some company
return series.
These preliminary results suggest there is a potential endogeneity problem
with an OLS time series model, and the regression of returns on a stock or
portfolio to estimate their factor loadings or betas in a 3-factor setting.
To assess the extent of the problem, we downloaded a sample of capi-
talization change adjusted company prices from the free on-line data source
We re-estimated the time series beta estimates for the 3-factor model using
instrumental variables. The results are shown in Tables 11, 12, 13, and 14.
These regressions, undertaken using two stage least squares, with the lagged
instrument based on expectations of US Production, while not biased, are even
stronger.
If we consider the time series estimates, and the beta coefficients estimated
on the market factor, RM − RF, of the 21 regression estimates, 17 are
significant at the 5% level or better. SMB has 7 significant coefficients and
HML has 9. Thus, there are more significant coefficients than in the simple
time series regressions.
However, we compared the estimated slope coefficients from the time series
estimation of factor loadings using OLS, and those from the estimates using two
instrumental variables with one lag, to adjust for the endogeneity problem,
plus the application of two stage least squares, and used non-parametric sign
tests to examine whether there are any significant differences between the two
sets of estimates.
The results, which are reported in Table 16, suggest that there are significant
differences in the estimates of the loadings on the excess market return
RM − RF, and on SMB, while there is no significant difference in the loading on
HML. This is reassuring, in that the use of the instruments focused on the
excess market return RM − RF and SMB, while HML was on the borderline of being
endogenous.
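A paired sign test of the kind referred to above can be sketched as follows. The function name `sign_test` and the example loadings are hypothetical, and an exact two-sided binomial p-value with ties dropped is assumed; the paper does not state which variant of the sign test was applied.

```python
from math import comb

def sign_test(a, b):
    """Two-sided exact sign test for paired estimates a and b.

    Returns (number of positive differences, number of non-tied pairs, p-value).
    """
    diffs = [u - v for u, v in zip(a, b) if u != v]   # drop ties
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = sum(comb(n, j) for j in range(min(k, n - k) + 1)) / 2 ** n
    return k, n, min(1.0, 2 * tail)

# Hypothetical market betas for five firms estimated by TSLS and by OLS.
beta_tsls = [1.10, 0.95, 1.30, 0.80, 1.05]
beta_ols = [1.00, 0.90, 1.20, 0.70, 1.00]
k, n, p_value = sign_test(beta_tsls, beta_ols)
```

With only five pairs, even a unanimous sign pattern gives a p-value of 0.0625, which illustrates how little power the test has in very small cross-sections.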
Company Code
1 APPLE.INC AAPL
2 INTERNATIONAL BUSINESS MACHINES CO IBM
3 AGILENT TECHNOLOGIES INC A
4 YAHOO.INC AABA
5 ALABAMA AIRCRAFT INDUSTRIES AAIIQ
6 ATLANTIC AMERICA CORP AAME
7 ARMADA MERCANTILE LTD AAMTF
8 AARON'S INC AAN
9 AAON. INC AAON
10 AMER-PETRO HUNTER.INC AAPH
11 ALL-AMERICAN SPORTPARK.INC AASP
12 ALLIANCEBERNSTEIN HOLDING L.P. AB
13 ABAXIS.INC ABAX
14 AMERIS BANCORP ABCB
15 ABEO ABEO
16 AMBEV SA ABEV
17 ARKANSAS BEST CORP ABFS
18 ARCA BIOPHARMA INC ABIO
19 ABM INDUSTRIES INC ABM
20 ABBOTT LABORATORIES ABT
21 AUTOBYTEL INC ABTL
Dependent variable:
i ~RM + SMB + HML
∗∗∗ ∗∗ ∗
SMB -0.0005 -0.004 0.007 -0.005 0.005 0.001
(0.003) (0.001) (0.003) (0.003) (0.004) (0.003)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML
∗∗∗
HML 0.006 -0.003 0.004 0.010 0.006 0.007
(0.007) (0.002) (0.003) (0.014) (0.012) (0.002)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML
∗∗∗ ∗∗∗
SMB 0.002 0.010 -0.003 0.001 -0.0001 0.021
(0.003) (0.002) (0.006) (0.002) (0.002) (0.006)
∗∗∗ ∗∗∗
HML 0.0003 0.016 -0.009 0.002 0.014 -0.011
(0.003) (0.002) (0.007) (0.003) (0.003) (0.007)
∗ ∗∗∗
Constant 0.002 -0.011 -0.021 0.016 -0.003 -0.068
(0.011) (0.008) (0.023) (0.009) (0.010) (0.024)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML
∗∗∗
SMB 0.009 -0.002 0.006
(0.002) (0.001) (0.004)
∗∗∗
HML 0.008 0.0002 0.006
(0.002) (0.002) (0.005)
∗
Constant -0.002 0.006 -0.030
(0.007) (0.005) (0.017)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML |USPROD + USPRODL1 + VRET + VRETL1
∗∗∗ ∗
SMB 0.004 -0.002 0.010 0.005 0.010 -0.001
(0.004) (0.002) (0.003) (0.003) (0.005) (0.004)
∗∗ ∗∗ ∗∗ ∗∗
HML -0.006 0.0004 -0.005 -0.005 0.009 0.003
(0.002) (0.001) (0.002) (0.002) (0.004) (0.003)
∗
Constant 0.020 0.003 -0.007 -0.012 -0.025 -0.004
(0.011) (0.006) (0.010) (0.011) (0.017) (0.013)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML |USPROD + USPRODL1 + VRET + VRETL1
∗ ∗ ∗∗
SMB 0.006 0.0003 0.005 0.029 0.032 0.002
(0.007) (0.003) (0.002) (0.015) (0.013) (0.002)
∗∗ ∗∗∗
HML -0.003 -0.002 0.005 0.002 0.002 0.005
(0.005) (0.002) (0.002) (0.010) (0.009) (0.001)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML |USPROD + USPRODL1 + VRET + VRETL1
∗∗∗ ∗∗∗
SMB 0.002 0.009 0.005 0.0004 0.002 0.034
(0.003) (0.003) (0.007) (0.003) (0.003) (0.007)
∗∗ ∗∗∗
Constant 0.009 -0.007 -0.017 0.019 -0.0004 -0.060
(0.011) (0.008) (0.023) (0.009) (0.010) (0.023)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Dependent variable:
i ~RM + SMB + HML |USPROD + USPRODL1 + VRET + VRETL1
∗
Constant 0.004 0.008 -0.032
(0.007) (0.005) (0.016)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
where equation (9) includes observations on firms i across years t. X and ε are
assumed to be independent of each other, and ε to possess a zero mean and
finite variance. The beta coefficient estimated by OLS is:
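$$\hat{\beta}_{OLS} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it} Y_{it}}{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}
= \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}(X_{it}\beta + \varepsilon_{it})}{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}
= \beta + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}\varepsilon_{it}}{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}. \qquad (12)$$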
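$$AVar[\hat{\beta}_{OLS} - \beta]
= \underset{N \to \infty,\; T\ \mathrm{fixed}}{\mathrm{plim}} \left[ \frac{1}{N^2} \left( \sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2 \varepsilon_{it}^2 \right) \left( \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}{N} \right)^{-2} \right]
= \frac{1}{N}\, T \sigma_X^2 \sigma_\varepsilon^2 \left( T \sigma_X^2 \right)^{-2}
= \frac{\sigma_\varepsilon^2}{\sigma_X^2 N T}. \qquad (13)$$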
The above expression is the OLS formula, which is correct when the errors
are i.i.d.
Petersen (2009) then assumes that the errors are no longer independent.
First, he assumes that the data have an unobserved firm effect that is fixed.
This suggests that the residuals contain a firm-specific component $\gamma_i$,
and an idiosyncratic component that is unique to each observation, $\eta_{it}$.
It follows that the residuals can be specified as:
$$\varepsilon_{it} = \gamma_i + \eta_{it}. \qquad (14)$$
Petersen (2009) also assumes that the independent variable X has a
firm-specific component:
$$X_{it} = \mu_i + \nu_{it}. \qquad (15)$$
The components of X (µ and ν) and ε (γ and η) have zero mean, finite
variance, and are independent of one another. This ensures that the estimated
coefficients are consistent. The independent variable and the errors are
correlated across observations of the same firm, but are independent across
firms. This can be shown as:
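$$\mathrm{corr}(X_{it}, X_{js}) = \begin{cases} 1 & \text{for } i = j \text{ and } t = s \\ \rho_X = \sigma_\mu^2 / \sigma_X^2 & \text{for } i = j \text{ and all } t \neq s \\ 0 & \text{for all } i \neq j, \end{cases}$$
$$\mathrm{corr}(\varepsilon_{it}, \varepsilon_{js}) = \begin{cases} 1 & \text{for } i = j \text{ and } t = s \\ \rho_\varepsilon = \sigma_\gamma^2 / \sigma_\varepsilon^2 & \text{for } i = j \text{ and all } t \neq s \\ 0 & \text{for all } i \neq j. \end{cases} \qquad (16)$$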
It follows that the square of the summed errors is not equal to the sum of
the squared errors. The same observation can be made about the independent
variable. This means that the covariances between the errors must be included.
The asymptotic variance of the OLS coefficient estimate can then be written as:
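$$\begin{aligned}
AVar[\hat{\beta}_{OLS} - \beta]
&= \underset{N \to \infty,\; T\ \mathrm{fixed}}{\mathrm{plim}} \left[ \frac{1}{N^2} \left( \sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}\varepsilon_{it} \right)^{2} \left( \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}{N} \right)^{-2} \right] \\
&= \underset{N \to \infty,\; T\ \mathrm{fixed}}{\mathrm{plim}} \left[ \frac{1}{N^2} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} X_{it}\varepsilon_{it} \right)^{2} \left( \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}{N} \right)^{-2} \right] \\
&= \underset{N \to \infty,\; T\ \mathrm{fixed}}{\mathrm{plim}} \left[ \frac{1}{N^2} \sum_{i=1}^{N} \left( \sum_{t=1}^{T} X_{it}^2 \varepsilon_{it}^2 + 2 \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} X_{it} X_{is} \varepsilon_{it} \varepsilon_{is} \right) \left( \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} X_{it}^2}{N} \right)^{-2} \right] \\
&= \frac{1}{N} \left( T \sigma_X^2 \sigma_\varepsilon^2 + T(T-1)\, \rho_X \sigma_X^2\, \rho_\varepsilon \sigma_\varepsilon^2 \right) \left( T \sigma_X^2 \right)^{-2} \\
&= \frac{\sigma_\varepsilon^2}{\sigma_X^2 N T} \left( 1 + (T-1)\, \rho_X \rho_\varepsilon \right). \qquad (17)
\end{aligned}$$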
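The practical meaning of equations (13) and (17) is that the i.i.d. formula understates the variance by the factor $1 + (T-1)\rho_X\rho_\varepsilon$ whenever both within-firm correlations are positive. The short sketch below evaluates both expressions; the function names and the numerical inputs are hypothetical and chosen only to illustrate the arithmetic.

```python
from math import sqrt

def ols_variance(sigma_e2, sigma_x2, n_firms, n_periods):
    """Equation (13): asymptotic variance when the errors are i.i.d."""
    return sigma_e2 / (sigma_x2 * n_firms * n_periods)

def firm_effect_variance(sigma_e2, sigma_x2, n_firms, n_periods, rho_x, rho_e):
    """Equation (17): variance with a fixed firm effect in X and in the errors."""
    inflation = 1 + (n_periods - 1) * rho_x * rho_e
    return ols_variance(sigma_e2, sigma_x2, n_firms, n_periods) * inflation

# Hypothetical panel: 20 firms, 10 periods, within-firm correlations of 0.5.
v_iid = ols_variance(1.0, 1.0, 20, 10)
v_firm = firm_effect_variance(1.0, 1.0, 20, 10, rho_x=0.5, rho_e=0.5)
se_ratio = sqrt(v_firm / v_iid)   # how far the i.i.d. standard error is understated
```

With these inputs the variance is inflated by 1 + 9 × 0.25 = 3.25, so the correct standard error is roughly 1.8 times the i.i.d. one.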
We explore this issue in relation to the two sets of estimates of the factor
loadings undertaken in the context of the 3-factor Fama-French model: the
estimates which use time series regressions based on OLS, versus those which
applied two stage least squares (TSLS) and instrumental variables to adjust
for endogeneity. We undertake a limited example of the cross-sectional panel
regression analysis typical of asset pricing tests using the companies downloaded
from 'tiingo'. We use a total of 20 companies because one company had a data
set which ended in 2014, as opposed to continuing to the end of 2017. The cross-
sectional monthly returns sample is from February 2011 to the end of December
2017.
The regressions feature a basic asset pricing test in which the dependent
variable is the actual return on the sample companies. The predicted return is
constructed by applying the estimated company market beta to the actual return
on the market in month t to produce a series of predicted returns. We decided
to switch to a one-factor model, rather than a 3-factor model, to concentrate
on the impact on the beta estimates, given that the instrument used in the TSLS
time series regressions in the first stage of the analysis was related to the
market factor.
The results for the stage one time series regressions of beta for the excess
return on the market factor estimated by OLS, with the second stage asset
pricing tests in a panel context using OLS, Robust OLS, clustered by date, by
firm, and by both date and firm, are shown in Table 17.
Table 18 provides estimates in which the first stage estimates of the 3-factor
loadings were by two stage least squares (TSLS), plus an instrument. We repeat
that, in the cross-sectional tests reported in Tables 17 and 18, we have
concentrated on a one-factor model using beta on the excess market return, as
this estimate has been one of the main focuses of the adoption of the
instrument.
The key issue is the variation in the estimated standard errors. Fama and
French (2018) suggest ranking models by the intercept estimate, but Tables 17
and 18 show that the standard errors of the intercept are likely to vary, according
to whether we use vanilla OLS, Robust OLS, or tests which allow for potential
clustering of errors in the panel regressions used in the asset pricing tests.
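The different standard errors compared in Tables 17 and 18 can be illustrated with a compact sandwich estimator. This is a sketch under stated assumptions, not the R code supplied by Petersen and McDonald: the function names `cluster_cov` and `panel_ols` are hypothetical, no small-sample corrections are applied, and two-way clustering is computed as the firm-clustered plus date-clustered minus heteroskedasticity-robust covariance, which assumes one observation per firm and date.

```python
import numpy as np

def cluster_cov(X, resid, groups):
    """Sandwich covariance of OLS coefficients with errors clustered by groups."""
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]   # sum of x_it * e_it within the cluster
        meat += np.outer(score, score)
    return xtx_inv @ meat @ xtx_inv

def panel_ols(y, X, firm, date):
    """OLS coefficients with vanilla, robust and clustered standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    firm, date = np.asarray(firm), np.asarray(date)
    n, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    cov = {
        "ols": resid @ resid / (n - k) * np.linalg.inv(X.T @ X),
        "white": cluster_cov(X, resid, np.arange(n)),   # each observation alone
        "firm": cluster_cov(X, resid, firm),
        "date": cluster_cov(X, resid, date),
    }
    cov["both"] = cov["firm"] + cov["date"] - cov["white"]
    # The two-way variance can be negative in small samples; clip it at zero.
    se = {name: np.sqrt(np.clip(np.diag(v), 0.0, None)) for name, v in cov.items()}
    return beta, se

# Hypothetical panel: 3 firms observed over 4 months.
rng = np.random.default_rng(4)
firm = np.repeat(np.arange(3), 4)
date = np.tile(np.arange(4), 3)
x = rng.normal(size=12)
y = 0.5 * x + rng.normal(size=3)[firm] + rng.normal(size=12)
beta, se = panel_ols(y, np.column_stack([np.ones(12), x]), firm, date)
```

The coefficient estimates are identical across the variants; only the standard errors, and hence the t statistics, change with the clustering assumption.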
1 We are grateful to Mitchell Petersen and Robert McDonald for supplying copies of the R
code to replicate Petersen (2009).
Table 17: Vanilla, Robust and Clustered Standard Errors for OLS
using Returns adjusted by Standard Betas
Variables      OLS         Robust OLS   Cluster: date   Cluster: firm   Cluster: both
X              0.450603    0.4216       0.4506026       0.450602552     0.45060261
SE             0.120462    0.0458       0.1060870       0.095742356     0.1170480
t Statistic    3.741***    9.2124***    4.2475***       4.70641***      3.7687***
Constant       -0.007239   -0.0002      -0.0072391      -0.00723912     -0.0072391
SE             0.005895    0.0022       0.0064021       0.006699706     0.0075205
t Statistic    -1.228      -0.1076      -0.9996         -0.00723912     -0.8815
Adj. R-squared 0.007506    n.a.         0.007506        0.007506        0.007506
F statistic    13.99***    83.297***    13.99**         13.99**         13.99**
Res SE         0.2317      0.07618      0.2317          0.2317          0.2317
Observations   1721        1721         1721            1721            1721
Table 18 repeats the analysis but, in this case, uses estimates of the beta on
the market factors which have used two stage least squares plus two instruments
to correct for endogeneity.
Table 18: Vanilla, Robust and Clustered Standard Errors for OLS
using Returns adjusted by Betas estimated by TSLS
Variables      OLS          Robust OLS   Cluster: date   Cluster: firm   Cluster: both
X              0.6651151    0.7038       0.66511515      0.665115150     0.66511515
SE             0.1475692    0.0557       0.14894697      0.114488052     0.15380479
t Statistic    4.507***     12.6381***   4.4654***       5.80947***      4.3244***
Constant       -0.0004105   0.0041       -0.00041052     -0.000410517    -0.00041052
SE             0.0055778    0.0021       0.00742999      0.006121148     0.00781964
t Statistic    -0.074       1.9344       -0.0553         -0.06707        -0.0525
Adj. R-squared 0.01112      n.a.         0.01112         0.01112         0.01112
F statistic    20.31***     152.48***    20.31***        20.31***        20.31***
Res SE         0.2313       0.07516      0.2313          0.2313          0.2313
Observations   1721         1721         1721            1721            1721
5. Conclusion
In this paper we have used data that are accessible from French's website
to explore the relationship between the data for three monthly market factors
relating to US markets, representing the excess return on the market portfolio
RM − RF, SMB, and HML. We first downloaded the entire monthly and weekly
series of the three market factors, which commenced in July 1926 and
terminated in June 2018, and estimated rolling bivariate regressions between
the three series to explore their relationship through time. The rolling
regressions revealed that there are prolonged periods during which the factors
are related, and also intervals when they are not. Their relationship is not
constant and changes sign in some periods. These results suggest that
endogeneity between the factors needs to be considered in certain sub-periods
drawn from this 92-year sample. This was confirmed by non-parametric tests of
the independence of the series against the predictions obtained from pairwise
OLS regressions.
We then used monthly data from January 2000 to December 2010 to further
examine these relationships. An exploration of the relationships between these
three factors in this sub-period, using OLS, revealed a significant
relationship between RM − RF and SMB. Ramsey's RESET test also revealed a
non-linear relationship between HML and RM − RF. This was further explored via
the application of Nadaraya (1964) and Watson (1964) kernel regressions, which
suggested the existence of non-linearities.
Given that asset pricing tests assume linearity between factors and return
series, we set aside the issue of non-linearity and concentrated on the issue
of endogeneity, which empirical evidence suggests as being a complication in
linear time series estimates of factor loadings in a multiple regression
context. We used Business Tendency Surveys for Manufacturing: Confidence
Indicators: Composite Indicators: European Commission and National Indicators
for the United States (BSCICP02USM460S), a monthly OECD indicator series, as an
instrument in the estimation of factor loadings using two stage least squares
(TSLS), plus the return on VXO, and one lag of each of these variables.
Non-parametric sign tests on the beta estimates for the loadings on RM − RF
and SMB suggested that there are significant differences in the loadings on
these factors estimated by OLS, as opposed to TSLS using instrumental
variables. Given this finding, we then used a small sample of company returns
to undertake cross-sectional tests of sensitivity to the market factor
RM − RF, allowing for clustering of standard errors in this panel of 20 firms,
as suggested by Petersen (2009). The results suggested that the estimated
standard errors in the panel tests are different when the beta estimates were
estimated by TSLS, which adjusted for endogeneity, than by OLS. They also
varied if clustering was present by date, or firm, or both, as originally
suggested by Petersen (2009).
These empirical results suggest that using these factors in linear regression
analysis, such as suggested by Fama and French (2018), as a method of screening
factor relevance, is problematic in that the standard errors are sensitive to
the correct model specification, in both the initial estimation of the factor
loadings, and in the subsequent panel data tests, in which error clustering
may be a serious issue.
References
[1] Barillas, F., and J. Shanken (2018) Comparing asset pricing models,
Journal of Finance, 73(2), 715-754.
[3] Cragg, J.G., and S.G. Donald (1993) Testing identifiability and
specification in instrumental variable models, Econometric Theory, 9(2),
222-240.
[5] Fama, E.F., and K.R. French (1993) Common risk factors in the returns
on stocks and bonds, Journal of Financial Economics, 33, 3-56.
[6] Fama, E.F., and K.R. French (2018) Choosing factors, Journal of Financial
Economics, 128(2), 234-252.
[7] Gibbons, M.R., S.A. Ross, and J. Shanken (1989) A test of the efficiency
of a given portfolio, Econometrica, 57(5), 1121-1152.
[8] Harvey, C.R., Y. Liu, and H. Zhu (2015) . . . and the cross-section of
expected returns, Review of Financial Studies, 29, 5-68.
[11] Maasoumi, E., and J.S. Racine (2002) Entropy and predictability of stock
market returns, Journal of Econometrics, 107(2), 291-312.
[14] Petersen, M. (2009) Estimating standard errors in finance panel data sets:
Comparing approaches, Review of Financial Studies, 22(1), 435-480.
[15] Sargan, J.D. (1958) The estimation of economic relationships using
instrumental variables, Econometrica, 26(3), 393-415.
[16] Sargan, J.D. (1975) Testing for misspecification after estimating using
instrumental variables, Mimeo, London School of Economics.
[18] Wu, D.M. (1973) Alternative tests of independence between stochastic
regressors and disturbances, Econometrica, 41(4), 733-750.