Journal of Safety Research: Frank Gross, Eric T. Donnell
Article info
Available online 22 March 2011
Keywords: Road safety; Case–control; Cross-sectional; Crash modification factor; Countermeasure evaluation

Abstract
Problem: While observational before–after studies are considered the industry standard for developing crash modification factors (CMFs), there are practical limitations that may preclude their use in highway safety analysis. There is a need to explore alternative methods for estimating CMFs. Method: This paper employs case–control and cross-sectional analyses to estimate CMFs for fixed roadway lighting and the allocation of lane and shoulder widths. Results: Based on the case–control method, the CMF for intersection lighting is 0.886, while the cross-sectional study indicates a CMF of 0.881. The CMFs developed for lane and shoulder widths are also similar when comparing the two methods. Conclusions: This paper suggests that case–control and cross-sectional studies produce consistent results if care is taken in the study design and model development. Impact on industry: Case–control and cross-sectional studies may provide a viable alternative to estimate CMFs when a before–after study is impractical due to data restrictions.
© 2011 National Safety Council and Elsevier Ltd. All rights reserved.
0022-4375/$ – see front matter © 2011 National Safety Council and Elsevier Ltd. All rights reserved.
doi:10.1016/j.jsr.2011.03.003
118 F. Gross, E.T. Donnell / Journal of Safety Research 42 (2011) 117–129
before–after, and cross-sectional studies, given the same data limitations, to investigate their potential as alternative methods for safety evaluations.

The objective of this study is to compare case–control and cross-sectional methods to estimate measures of safety effectiveness using two independent datasets. The safety effects of fixed, at-grade intersection lighting in Minnesota were estimated using both evaluation methods. Similarly, the safety effects of lane and shoulder width dimensions were evaluated using two-lane, rural highway data from Pennsylvania. An observational before–after evaluation was not considered in the present study because roadway lighting is seldom the only countermeasure applied to a site, making it difficult to isolate the safety effects of roadway lighting using this method. Similarly, the lane and shoulder width evaluation did not involve a treatment; rather, various lane and shoulder width combinations were compared to a baseline condition.

2. Background

2.1. Cross-sectional studies

Cross-sectional studies are commonly used in transportation safety research to estimate the expected number of crashes on a roadway segment, interchange, or intersection. CMFs derived from cross-sectional data are based on a prescribed time period under the assumption that the ratio of average crash frequencies for sites with and without a feature is an estimate of the CMF for implementing that feature. The strength of a cross-sectional study is that a large number of sites with and without a specific countermeasure can often be identified. A weakness of a cross-sectional study is that it is difficult to determine the reason that certain safety countermeasures exist at one location and not at other similar locations. As such, the observed difference in crash experience can be due to known or unknown factors other than the feature of interest. Known factors, such as traffic volume or geometric characteristics, can be controlled for in principle by estimating a multivariate regression model. However, the issue is not completely resolved since it is difficult to properly account for unknown, or known but unmeasured, factors. Several examples of developing CMFs from a cross-sectional study are contained in the literature. Lord and Bonneson (2007) developed CMFs for lane width, shoulder width, and edge-line marking presence for frontage roads in Texas. Bonneson and Pratt (2008) recently proposed a procedure to develop CMFs for curve radius along two-lane rural highways. Additionally, Fitzpatrick, Lord, and Park (2008) developed CMFs for median width on freeways and rural multi-lane highways in Texas.

ratio, which can be used as a direct estimate of safety effectiveness. The odds ratio is a measure of the percent change in the chance of an outcome given the presence of a risk factor compared to the baseline level of the risk factor. This lends itself well to the approximation of CMFs because the purpose is to provide an estimate of the incremental safety effect of a particular feature in relation to a certain baseline level.

The case–control method, in general, is associated with several advantages over alternative safety evaluation methods, and the matched case–control design has additional distinct advantages as follows:

• Studying rare events: the case–control design is ideal for studying rare events, such as crashes, because the sample may be selected so that a pre-specified number of cases are enrolled in the study, ensuring an adequate sample for analysis.
• Evaluating multiple risk factors from a single sample: the sample is selected based on outcome status and investigated to determine potential risk factors. Any variables not included in the case definition or matching scheme may be assessed, simultaneously, as individual risk factors.
• Controlling for confounding variables: confounders include variables that completely or partially account for the apparent association between an outcome and risk factor. Specifically, a confounder is a variable that is a risk factor for the outcome under study, and is associated with, but not a consequence of, the risk factor in question (Collett, 2003). In highway safety, an example of a confounder is average daily traffic (ADT). ADT has been shown to be associated with crash risk and is also associated with, but not a consequence of, several geometric features (e.g., lane width, shoulder width, and horizontal curvature).
• Matching: the primary reason for a matched design is to directly control for confounding variables. Control sites are matched to each case through random sampling based on similar values of potential confounding variables.
  o Matching provides a balanced design and adjusts for the effects of variables included in the matching scheme.
  o Matching ensures that adjustment is possible when the confounding variable is distributed differently within the case and control populations. In rare cases, the distribution of a confounding variable may not overlap for a random sample of cases and controls. In this case, there would be no way to adjust the results during the analysis phase.
  o Matching improves the efficiency of the design, requiring smaller sample sizes or resulting in estimates with a narrower confidence interval. However, this only holds when the matching is based on true confounders (Woodward, 2005).
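
As a minimal sketch of the random matching described in the Matching bullet, the following Python snippet pairs each case with one randomly chosen control that shares the same values on the matching variables. The function name match_controls, the dictionary-based site records, the example sites, and the choice to sample controls without replacement are illustrative assumptions, not details taken from the study.

```python
import random
from collections import defaultdict

def match_controls(cases, controls, keys, seed=0):
    """Pair each case with one unused control that has identical values on `keys`.

    Assumption: controls are sampled without replacement; the study does not
    state which sampling scheme was used.
    """
    rng = random.Random(seed)

    # Group the candidate controls by their combination of matching values.
    pools = defaultdict(list)
    for site in controls:
        pools[tuple(site[k] for k in keys)].append(site)

    # Shuffle each group so that popping from it is a random draw.
    for pool in pools.values():
        rng.shuffle(pool)

    pairs, unmatched = [], []
    for case in cases:
        pool = pools.get(tuple(case[k] for k in keys))
        if pool:
            pairs.append((case, pool.pop()))
        else:
            # No control with the same matching values remains.
            unmatched.append(case)
    return pairs, unmatched

# Hypothetical sites, matched on area type and number of legs.
cases = [
    {"id": 1, "area": "rural", "legs": 3},
    {"id": 2, "area": "urban", "legs": 4},
    {"id": 3, "area": "rural", "legs": 4},
]
controls = [
    {"id": 10, "area": "rural", "legs": 3},
    {"id": 11, "area": "urban", "legs": 4},
    {"id": 12, "area": "urban", "legs": 4},
]
pairs, unmatched = match_controls(cases, controls, keys=("area", "legs"))
```

In this sketch, a case with no eligible control is returned in the unmatched list rather than paired with a dissimilar site, which mirrors the drawback noted for matched designs: exact matching can leave small samples within each matching combination.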
• Care must be taken to ensure that cases and controls are representative of the underlying populations of interest. Specifically, the chance of being included in the study must not be associated with the risk factor(s) of interest.
• The traditional case–control design does not recognize differences between locations with many crashes or a single crash. This is a loss of potentially important information and thus, the true increase in risk could be underestimated. Gross (2006) explored methods to account for locations with multiple crashes, but further research is needed in this area.
• Matching is a powerful tool to account for confounding variables, but it also has drawbacks, including:
  o Increased complexity of data collection and sample selection, especially when there are many matching variables.
  o The sample sizes within each matching combination often become small due to the limited number of subjects (sites) that match the criteria exactly. In transportation, this has been stated as a limitation to cross-sectional studies that involve matching (Hauer, 2010).
  o The effectiveness of matching variables cannot be estimated. The interaction between matching variables and risk factors, however, may be analyzed.

3. Methodology

3.1. Cross-sectional studies

It is common practice to use generalized linear modeling techniques, assuming a negative binomial error structure, to estimate crash frequency models. Several examples are published in the highway safety literature for intersections (e.g., Poch & Mannering, 1996; Bauer & Harwood, 1996; Washington, Persaud, Lyon, & Oh, 2005; Donnell, Porter, & Shankar, 2010) and rural road segments (e.g., Shankar, Mannering, & Barfield, 1995; Vogt & Bared, 1998). A negative binomial regression model relates a crash count as a left-hand-side (LHS) variable to a number of right-hand-side (RHS) variables, coefficients that quantify magnitudes of relationships between LHS and RHS variables, and a disturbance term. For this study, the negative binomial regression model was assumed to take the following form:

\[ \ln \lambda_i = \beta X_i + \varepsilon_i \tag{1} \]

where:

λ_i   expected number of crashes at location i;
β     vector of estimable regression parameters corresponding to geometric design and traffic volume explanatory variables;
X_i   vector of geometric design and traffic volume explanatory variables for location i;
ε_i   gamma-distributed error term.

Regression parameters are estimated using the method of maximum likelihood.

3.2. Case–control studies

In a typical case–control design, cases are defined and subjects are enrolled based upon their current outcome status (i.e., cases and controls). The prior risk factor status within each outcome group is then determined. For this study, cases are defined as locations that experience at least one crash during a particular year of the study period; controls are those locations that do not experience a crash in the same period. Control locations are randomly matched to each case segment based on several factors to account for potential confounding.

The ratio of controls to cases may vary and often depends on the availability of time, budget, and other factors. As the ratio of controls to cases increases, the power of the design increases but at a decreasing rate. There is often little additional power gained at ratios greater than four controls per case. In this study, one control was randomly matched to each case (a one-to-one ratio), which achieves about 90% power (Woodward, 2005).

Presence of risk factors may be determined only after the case–control sample has been obtained. Risk factors may take the form of binary variables (e.g., presence of guardrail or rumble strips) or multi-level variables (e.g., speed limit or traffic control type).

The odds ratio is the appropriate measure of effectiveness for case–control studies and is an estimate of the relative effect of the risk factor compared to a baseline (e.g., no risk factor for a binary variable). The odds ratio is interpreted as the expected percent change in the outcome in question (i.e., one or more crashes in a year) due to the presence of the risk factor. In the case of a binary risk factor (i.e., presence or absence), the baseline is usually the absence of a particular feature and the odds ratio indicates the percent change in the risk of an outcome due to the presence of the risk factor in question. For risk factors with more than two possible categories, the baseline may be selected as any one category to which other categories are compared. Each category may be compared to the baseline, in which case the measure of effectiveness would indicate the percent change in the risk of an outcome due to the specified deviation from the baseline.

Conditional binary logistic regression is used to estimate the odds ratio for matched case–control designs. The regression model is conditional on the fact that controls are purposefully matched to cases on specific variables. Conditional binary logistic regression may be used to estimate the odds ratio of a binary outcome and account for multi-level risk factors, confounding variables that are not included in the matching process, and interaction terms. The conditional probability of an outcome associated with the unmatched variables X_1,…,X_p for each member of the jth matched set is given by Eq. (3) (Schlesselman, 1982).

\[ \Pr(Y = 1) = \frac{1}{1 + \exp\left[-\left(\alpha_j + \sum_{i=1}^{p} \beta_i X_i\right)\right]} \tag{3} \]

\[ L(\beta) = -\sum_{j=1}^{n} \ln\left[1 + \sum_{k=1}^{c} \exp\left(\sum_{i=1}^{p} \beta_i \left(X_{jki} - X_{j0i}\right)\right)\right] \tag{4} \]
where:

n        number of cases;
c        number of controls matched to each of n cases;
X_i      unmatched explanatory variables;
X_{j0i}  value of X_i for a case in the jth matched set;
X_{jki}  value of X_i for the kth matched control in the jth matched set.

In this study, the dependent variable is coded as a one or zero to indicate a case or control, respectively. Risk factors are represented by X_i in Eq. (3). Confounding variables included in the matching scheme do not appear in the estimation because matching variables have the same value for cases and controls.

4. Empirical setting

To compare safety effectiveness results obtained from case–control and cross-sectional studies, two independent databases were developed. At-grade intersections in Minnesota were used to evaluate the safety effectiveness of roadway lighting presence. Two-lane rural highway segments in Pennsylvania were used to assess the safety effectiveness of lane and shoulder widths. Each database is described below.

At-grade intersection data in Minnesota were obtained from the Federal Highway Administration (FHWA) Highway Safety Information System (HSIS) files (Council & Williams, 2001). Four years (2001 through 2004, inclusive) of data were used in the analyses, which included 6,464 intersections. Approximately 13.7% of the intersections in the sample had signal control while the remainder operated under stop control. There were three intersection forms coded in the database: cross, tee, and skew. Approximately 49% of the intersections were four-leg cross intersections, nearly 40% were three-leg tee intersections, and the remaining 11% were four-leg skewed intersections. There were 38,437 reported crashes at the intersections included in the analysis database. Each crash represented a potential case for the case–control study, and controls (i.e., intersections with no crashes) were randomly sampled and matched to each case as described previously.

Minnesota is a unique HSIS database in that it contains a variable for lighting presence for at-grade intersections. As noted previously, only the presence of roadway lighting was considered in the present study. Table 1 provides a summary of Minnesota intersection data included in the analysis. Only geometric design data for the major roadways were included because of considerable missing data for the minor intersecting roadways. More than 42% of the intersections contained roadway lighting.

Regression models of total daytime and nighttime crashes were estimated for intersections. In the present study, the safety effects of roadway lighting were estimated from daytime and nighttime regression parameters. The daytime crash frequency model was used as a baseline scenario against which to compare the nighttime crash frequency model. It was assumed that the daytime lighting indicator variable should be near zero (i.e., no difference in the expected daytime crash frequency with or without lighting) because no illumination is provided by the lighting system during the day. In such a case, the nighttime crash frequency model could be used alone to determine the safety effect of fixed roadway lighting. However, if this assumption did not hold, the relative difference between the nighttime and daytime lighting indicator parameter estimates was needed to determine the effect of fixed roadway lighting. The method to compute the percent reduction in crashes from the daytime and nighttime crash frequency models is as follows:

\[ \frac{\exp(\hat{\theta}_N)}{\exp(\hat{\theta}_D)} - 1 \tag{5} \]

where:

θ̂_N  estimated regression parameter associated with the presence of roadway lighting in the nighttime crash frequency model; and
θ̂_D  estimated regression parameter associated with the presence of roadway lighting in the daytime crash frequency model.

A negative value resulting from Eq. (5) indicates that the expected number of crashes is lower on roadways with fixed roadway lighting compared to those without lighting. Eq. (5) produces a crash reduction factor (CRF); a CMF is computed by a simple conversion of the CRF (i.e., CMF = 1 − (CRF/100)). The advantage of this approach over a before–after approach is consideration of variables other than the presence of lighting that may also influence the night-to-day crash ratio (e.g., traffic volume, geometrics). By considering these variables, the aim is to isolate the effect of roadway lighting while controlling for other variables present at the intersection.

The original dataset, used for the negative binomial models, was modified for the case–control study. Two different datasets were developed for the case–control study: one to analyze nighttime crashes and one to analyze daytime crashes. The two datasets were
Table 1
Descriptive Statistics of Minnesota Intersection Data.
developed using a similar methodology, but for the nighttime dataset, cases were defined as intersections that experience at least one nighttime crash in a year, whereas the daytime dataset defines cases as intersections that experience at least one daytime crash in a year. Once the cases were defined, the following general method was used to develop the two datasets:

1. Separate case and control populations.
2. Randomly match one control to each case based on area type, median type, number of lanes, and intersection type.
3. Develop indicator variables for lighting, year of crash, speed limit, and traffic control.

Table 2 shows the sample sizes for the nighttime and daytime case–control datasets. For this study, case–control pairs were matched on area type (i.e., rural or urban/suburban), median type on the major road (i.e., divided or undivided), number of lanes on the major road (i.e., 2-lane or 4-lane), and intersection type (i.e., 3-legged, 4-legged cross, 4-legged skew).

Roadway lighting took the form of a binary variable (i.e., lighting present or lighting absent); other covariates included in the cross-sectional and case–control comparison were:

tions were available (21,688 segments × 5 years of data). The total number of crashes occurring over the analysis period was 56,732. The average daily traffic on the analysis segments ranged from 95 to 25,844 vehicles per day. Descriptive statistics for all variables included in the analysis are shown in Table 3.

Table 3
Descriptive Statistics for Pennsylvania 2-lane Rural Roads.

Continuous Variables                 Minimum   Maximum   Mean    Standard Deviation
Number of total crashes (per year)   0         18        0.534   0.903
Average daily traffic (veh/day)      95        25,844    3,419   3,061
Segment length (miles)               0.01      1.48      0.48    0.13

Categorical Variables                Proportion of Sample (%)

As shown in Table 3, the lane width, shoulder width, posted speed limit, location of the roadway segment (engineering district), year, and additional shoulder width (total minus paved) were the categorical explanatory variables used in the analysis. The relatively low proportion of mileage in engineering districts 5–0, 6–0, 11–0, and
12–0 is a result of these locations being primarily urban, while the remaining engineering districts are primarily rural districts with a significant proportion of rural two-lane highway mileage. The additional shoulder width variable was created to determine if there is a safety benefit to providing a shoulder that is not paved.

A negative binomial regression model of total crashes was estimated using the general functional form of the model shown in Eq. (1). The parameter estimates from the model were used to evaluate the safety effect of the lane and shoulder width variables when compared to a baseline level. In the present study, the lane width baseline was 12 feet (3.6 m), the shoulder width baseline was 6 feet (1.8 m), and the additional shoulder width baseline was 0 feet (0 m) to be consistent with the rural two-lane highway AMFs shown in the Highway Safety Manual (AASHTO, 2010). As such, the relative effect of the explanatory variable is compared to the baseline. A value less than 1.0 indicates that a particular lane or shoulder width dimension is associated with fewer crashes than the baseline condition, while a value greater than 1.0 is associated with a higher expected total crash frequency when compared to the baseline condition.

The dataset used for the negative binomial models was modified for the case–control study. Only a single database was created for the case–control analysis to evaluate the safety effects of lane and shoulder width on two-lane rural highways in Pennsylvania. Creating the case and control populations for the segment data generally followed the same procedure as that outlined above for the intersection data. Cases were defined as segments that experience at least one crash during a particular year of the study period, and controls were defined as those segments not experiencing a crash within the same year. The variables used in the matching process were year, segment length, and average daily traffic. Whereas year was included as a covariate in the analysis of Minnesota intersections, it was used as a matching criterion in the analysis of segments in Pennsylvania. Including year as a covariate in the Pennsylvania analysis would likely not change the results, but there was a sufficient sample size to include year as a matching variable and thus control for time trends. A transformation was used to apply a non-arbitrary rule for creating categories for the continuous variables. The procedure involves transforming the data to a normal distribution, estimating the mean and standard deviation, and then selecting categories based upon plus or minus one standard deviation from the mean. Details regarding the final categories are provided in previous work (Gross & Jovanis, 2007).

Table 4 shows the sample sizes for the case–control database for rural two-lane highway segments in Pennsylvania. A total of 83,308 observations were available for the case–control analysis (41,654 cases and 41,654 controls). The covariates used in the case–control analysis are the same as those described above for the negative binomial regression model, excluding year, segment length, and average daily traffic, as these variables were used in the matching process. The covariates entered the model as categorical indicators. The average values for the case sample are as follows: 11.2 ft (3.41 m) lane width, 3.0 ft (0.91 m) paved shoulder width, 4.1 ft (1.25 m) total shoulder width, 47 mi/h (75.6 km/h) speed limit, 3,921 average daily traffic, and 2,598 ft (791.9 m) segment length. The average values for the control sample are as follows: 11.2 ft (3.41 m) lane width, 3.1 ft (0.94 m) paved shoulder width, 4.3 ft (1.31 m) total shoulder width, 48 mi/h (77.2 km/h) speed limit, 3,701 average daily traffic, and 2,578 ft (785.8 m) segment length.

Table 4
Case–control Sample for Pennsylvania 2-lane Rural Roads.

Covariate                 Case Sample   Control Sample
Lane width 9 feet         1,677         2,094
Lane width 10 feet        10,236        9,850
Lane width 10.5 feet      1,511         1,361
Lane width 11 feet        16,862        16,209
Lane width 11.5 feet      439           464
Lane width 12 feet        8,217         9,038
Lane width 12.5 feet      123           167
Lane width 13 feet        320           363
Lane width 13.5 feet      2,269         2,108
Shoulder width 0 feet     10,374        9,857
Shoulder width 1 foot     869           783
Shoulder width 2 feet     5,396         5,024
Shoulder width 3 feet     7,392         7,242
Shoulder width 4 feet     9,943         9,978
Shoulder width 5 feet     2,511         2,633
Shoulder width 6 feet     2,427         2,714
Shoulder width 7 feet     364           384
Shoulder width 8 feet     1,804         2,039
Shoulder width 9 feet     46            93
Shoulder width 10 feet    528           907

5. Analysis results

5.1. Minnesota intersections and roadway lighting

Results of the negative binomial regression models are shown in Table 5 and Table 6 for daytime and nighttime model specifications, respectively. The natural logarithm of the major and minor roadway traffic volumes was used to estimate the relationship between traffic volume and the total daytime and nighttime crash frequency. Comparing the results in Table 5 and Table 6, the only parameter estimate that changes in sign is the lighting presence indicator (from 0.046 in the daytime model to −0.081 in the nighttime model). This result was expected because the presence of fixed roadway lighting was expected to have a negative association with nighttime crash frequency, and either a slightly positive or no association with daytime crashes. In the daytime crash frequency model, the lighting indicator was slightly positive and marginally significant, while in the nighttime model the lighting indicator was negative and statistically significant at the 10% level. As such, Eq. (5) was used to compute the mean percent reduction in the expected crash frequency of approximately 11.9%. When considering the standard error of the estimate, the 95% confidence interval for the reduction in crashes is 10.9 to 13.0%. The CMF for lighting presence would be 0.881.

Results of the case–control models are shown in Table 7 and Table 8 for daytime and nighttime model specifications, respectively. Typically, results of case–control studies are reported as odds ratios, but for comparison purposes, the odds ratios have been converted to standard parameter estimates. The odds ratio is simply exp(β), so to obtain the parameter estimate, the natural log of the odds ratio is computed.

Comparing the results in Table 7 and Table 8, the only statistically significant parameter estimate that changes in sign is the lighting presence indicator (from 0.055 in the daytime model to −0.121 in the nighttime model). Again, this was expected and is consistent with the findings from the cross-sectional analysis presented in Table 5 and Table 6. Using Eq. (5), the mean percent reduction in the expected crash frequency is approximately 16.4%. When considering the standard error of the reduction, the 95% confidence interval is 15.7 to 16.6%. Of note is the fact that the daytime lighting presence variable in the case–control model was not statistically significant. If only the nighttime lighting indicator parameter is used to compute the percent difference in crashes at intersections with and without lighting, the expected crash reduction is 11.4%, a near identical match to the estimate obtained from the cross-sectional approach. The CMF would be 0.886.

The only other variable that changes sign between the daytime and nighttime case–control models is the indicator for year 2002. However, in both models this variable is not statistically significant. In general, the parameter estimates for the remaining variables are intuitive and relatively consistent in the cross-sectional and case–control models presented in Tables 5–8. The year indicators
Table 5
Negative Binomial Regression Model for Daytime Crashes at Minnesota Intersections.
are included to adjust for time-related trends in crashes. Based on the results, 2002 and 2004 are not significantly different from 2001 (the baseline). The year 2003 is significantly different from the baseline and indicates that more crashes occurred in 2003 than in 2001; this is consistent in both the cross-sectional and case–control models. Daytime and nighttime crashes are expected to increase as traffic volume increases on the major and minor roads. Likewise, signalized intersections are expected to have a higher daytime and nighttime crash frequency when compared to stop-controlled intersections. Higher speed roads are associated with a lower crash frequency when compared to low-speed roadways in both the cross-sectional and case–control models. This may seem counterintuitive, but this variable is likely correlated with functional classification, where high speed roads are often designed with more uniformly-applied criteria than low speed roads (e.g., wider lanes, gentler curves, and wider clear zone). The only variable that appears to be counterintuitive in the cross-sectional and case–control models is the percent heavy vehicles. One may expect crash risk to increase as the percent heavy vehicles increases; however, both modeling approaches indicate that crash risk decreases slightly as percent heavy vehicles increases.

Restricted models were estimated to investigate the effects of other covariates on the parameter of interest (i.e., lighting). Restricted models for both daytime and nighttime crash frequency at Minnesota intersections were developed. The restricted cross-sectional models included the lighting covariate and all of the matching variables that were included in the case–control model to allow for direct comparisons. The restricted case–control models included only the lighting variable.

The results of the restricted cross-sectional models are shown in the last column of Table 5 and Table 6. When the cross-sectional model is restricted (i.e., when covariates are removed), the parameter estimates for lighting change in both the daytime and nighttime crash frequency models (both parameter estimates increase). However, the night-to-day ratio, estimated using Eq. (5), is −0.129. Using the lighting parameter estimates from the full model, the night-to-day
Table 6
Negative Binomial Regression Model for Nighttime Crashes at Minnesota Intersections.
Table 7
Case–control Daytime Model for Lighting Presence at Minnesota Intersections.
ratio is − 0.119. This finding suggests that the relative safety effects of statistically different than 1.0. The parameter estimates for the lane
lighting (night-to-day ratio) remain consistent between the full and restricted models. However, the restricted models may over-estimate the safety effects of intersection lighting when looking only at nighttime crash frequency.

The results of the restricted case–control models are shown in the last column of Table 7 and Table 8. Similar to the restricted cross-sectional models, the estimated parameter for lighting increases in both the daytime and nighttime model when the other covariates are removed. The lighting indicator is now positive for the nighttime model, indicating an increase in expected crashes. However, the relative parameter estimates for lighting are similar to the full model in that the daytime model indicates a relatively larger increase in expected crashes when compared to the nighttime model. Specifically, the relative effect, comparing night-to-day estimates using Eq. (5), is −0.23 (i.e., a 23% reduction in nighttime crashes when lighting is present). Recall that the relative effect from the full model was −0.16 [based on Eq. (5)], indicating a 16% reduction in nighttime crashes when lighting is present. The restricted model appears to over-estimate the relative effect of lighting. It is clear from these results that it is important to control for potential confounding variables when estimating the safety effects of lighting.

5.2. Pennsylvania roadway segments and lane and shoulder widths

The negative binomial regression model for total crashes on two-lane rural highway segments in Pennsylvania is shown in Table 9. The natural logarithm of the average daily traffic and segment length were both included in the model specification. The segment length was not included as an offset variable because the parameter estimate was

width indicator variables corresponding to 9, 12.5, and 13 feet (2.7, 3.8, and 4.0 m) are all negative and statistically significant in Table 9. This suggests that each of these lane widths is associated with fewer expected crashes than the baseline of 12 feet (3.6 m). Likewise, the indicator variables for shoulder widths corresponding to 9 and 10 feet (2.7 and 3.0 m) were both negative and statistically significant, suggesting that these shoulder widths are associated with fewer expected crashes than the baseline of 6 feet (1.8 m). The indicator for the 8-foot (2.4 m) shoulder was negative, but not statistically significant. In the case of both lane width and shoulder width, the cross-sectional approach indicates that dimensions wider than the baseline have beneficial safety effects.

For the additional shoulder width indicator variables shown in Table 9, all have a negative sign. Once the additional shoulder width reaches 4 feet (1.2 m), all parameter estimates are statistically significant compared to the baseline of no additional shoulder width. Given that this variable is a measure of the unpaved shoulder width, this finding suggests that providing at least 4 feet (1.2 m) of unpaved shoulder beyond that which is paved produces a beneficial safety effect.

The results of the case–control study are shown in Table 10. With respect to lane width, the parameter estimates from the case–control analysis all have the same sign as the cross-sectional model, except in the case of the 11.5-foot (3.5 m) indicator. Many of the magnitudes are similar when comparing the parameter estimates in Tables 9 and 10; this is even more evident when comparing odds ratios (exp[β]), which indicate the relative effect of the indicator when compared to the baseline. The relative effects differ by less than 10% for all lane width indicators except for 9 and 11.5 feet (2.7 and 3.5 m).
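
The conversion from parameter estimates to relative effects described above can be sketched in a few lines of Python. The coefficient values below are hypothetical placeholders, not values from Tables 9 or 10, and the function names are illustrative. Expressing the gap between methods as a percentage of the cross-sectional estimate is also an assumption, since the text does not state how the percent difference was computed.

```python
import math

def odds_ratio(beta):
    """Relative effect of an indicator versus its baseline: exp(beta)."""
    return math.exp(beta)

def percent_difference(cmf_a, cmf_b):
    """Absolute gap between two relative effects, as a percent of cmf_b (assumed definition)."""
    return abs(cmf_a - cmf_b) / cmf_b * 100.0

# Hypothetical coefficients for a single lane width indicator (illustration only).
beta_cross_sectional = -0.10
beta_case_control = -0.15

cmf_cs = odds_ratio(beta_cross_sectional)
cmf_cc = odds_ratio(beta_case_control)

print(f"cross-sectional relative effect: {cmf_cs:.3f}")
print(f"case-control relative effect:    {cmf_cc:.3f}")
print(f"difference between methods:      {percent_difference(cmf_cc, cmf_cs):.1f}%")
```

A negative coefficient gives a relative effect below 1.0 (fewer expected crashes than the baseline), and a coefficient of zero gives exactly 1.0 (no difference from the baseline).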
Table 8
Case–control Nighttime Model for Lighting Presence at Minnesota Intersections.
Table 9
Negative Binomial Regression Model for Rural Two-lane Highways.
For shoulder width, the case–control modeling results are similar to the cross-sectional model results in both sign and magnitude. Shoulder widths of 9 and 10 feet (2.7 and 3.0 m) are both negative and statistically significant, indicating that wider shoulders provide a safety benefit when compared to the baseline of a 6 foot (1.8 m) shoulder width. The relative effects for 9- and 10-foot (2.7 and 3.0 m) shoulders in the case–control and cross-sectional studies differ by less than 15%. These differences are much less pronounced for shoulder widths between 0 and 6 feet (0 and 1.8 m).

Similar trends are observed when comparing the additional shoulder width parameter estimates across the case–control and cross-sectional models. The signs are nearly all identical for all levels of the variable. The most significant differences in the relative effects occur for additional shoulder widths of 7 to 10 feet (2.1 to 3.0 m), which range from 12% to 22%.

Fig. 1 shows the relative effects (odds ratios) of the lane width, shoulder width, and additional shoulder width indicator variables for the negative binomial and case–control study methods. In Fig. 1(a) and (c), the CMF using the cross-sectional approach is generally larger in magnitude than that which was developed using the case–control approach. The opposite is true for the shoulder width CMF in Fig. 1(b).

Similar to the lighting study, restricted cross-sectional and case–control models were estimated for two-lane, rural highways using
Table 10
Case–control Model for Rural Two-lane Highways.
data from Pennsylvania. Again, the restricted cross-sectional models included the variables of interest (i.e., lane and shoulder width) and also included the matching variables (year indicators, segment length, and ADT) used in the case–control study to allow for direct comparisons. The restricted case–control model included only lane and shoulder width variables.

The results of the restricted cross-sectional model are shown in the last column of Table 9. Comparing the restricted and full models, it is evident that the parameter estimates for lane width are higher in the full model, but the trend relative to the baseline (lane width = 12 feet) is similar to that shown in Fig. 1a. For shoulder width, some of the parameter estimates increased while others decreased, yet the trend relative to the baseline (shoulder width = 6 feet) is similar to that shown in Fig. 1b.

The results of the restricted case–control model are shown in the last column of Table 10. Comparing the results of the full and restricted models, there are several notable differences in the parameter estimates. In some cases, the restricted model indicates an increase in the parameter estimate, while in other cases the restricted model indicates a reduction. Some of the parameter estimates even flip signs (i.e., positive to negative or negative to positive). The results are quite similar to those in the restricted cross-sectional model. Again, the comparison of the full and restricted models illustrates the importance of controlling for potential confounding variables when estimating safety effects. Excluding important confounders could result in over- or under-estimation of the expected safety effects.

6. Discussion

As noted previously, the purpose of this study was to compare the CMFs developed using two independent datasets based on cross-sectional and case–control methodologies. An observational before–after study could not be performed using the data in the present study because the installation dates of the roadway lighting were not available, and the lane width and shoulder widths were not changed during the study period. From the present analysis, the cross-sectional and case–control methods produced similar results for the roadway
lighting CMF at intersection locations in Minnesota, and the lane and shoulder width CMFs on two-lane rural highway segments in Pennsylvania.

For the intersection data from Minnesota, the cross-sectional models were specified using major and minor road traffic volumes, major road heavy vehicle proportions, intersection geometric design features, a traffic control indicator, a roadway lighting presence indicator, and crash year indicators. Each of these variables served as explanatory variables in the model. The case–control method also used each of these variables in the analysis, but for two different purposes. The intersection type, number of approach lanes on the major road, area type, and median type were used to match cases and controls, ensuring that the geometric configuration of each control was similar to the matched case site. The remaining variables (i.e., year of crash, major and minor road traffic volumes, percentage of heavy vehicles on the major approach, traffic control type, and major road speed limit indicator) were used as explanatory variables in a conditional logistic regression model.

It was hypothesized that the expected number of daytime crashes at intersections in Minnesota would not be associated with fixed roadway lighting. The case–control method supported this hypothesis: it was not possible to conclude, based on the sample, that lighting was statistically associated with daytime crash frequency. As such, the net effect of lighting for the case–control analysis is only based on the nighttime model. The nighttime crash frequency model indicated that the crash frequency is expected to be 11.4% lower at intersections with roadway lighting than at intersections without roadway lighting. In the cross-sectional method, the lighting presence parameter for the daytime crash frequency model was marginally significant (t-stat = 1.53, p = 0.124). As such, the net effect of roadway lighting was computed using the ratio of the expected nighttime and daytime crashes. The ratio indicated that the crash frequency is expected to be 11.9% lower at intersections with roadway lighting than at intersections without roadway lighting, which is almost identical to the results from the case–control analysis.

The results of the present analyses are consistent with the lighting-safety literature that suggests there is a reduction in total crash frequency attributed to lighting intersections. As noted by Donnell et al. (2010), differences between past lighting-safety literature and regression modeling approaches similar to those applied in the present study are expected and likely due to the inclusion of traffic volume, intersection geometry, and traffic control features in model estimation. Additionally, unlike many past lighting-safety studies (see Commission Internationale de l'Eclairage, 1992; Elvik, 1995 for synthesis of results), the modeling approach used in this study is based on expected crash frequency rather than observed crash counts, and does not assume that the relationship between crash frequency and traffic volume is linear, which may lead to incorrect conclusions related to intersection safety performance (Persaud, 2001).

Using the roadway segment data from Pennsylvania, the cross-sectional model included traffic volume, segment length, lane width, shoulder width, additional shoulder width, posted speed limit, and the engineering district. The case–control approach used traffic volume and segment length as matching variables and specified a model using the remaining covariates. In general, the different approaches appear to produce similar results. With the exception of 9- and 11.5-foot (2.7 and 3.5 m) lanes, the CMFs produced by the cross-sectional and case–control approaches differed by less than 10% (in absolute terms). In the case of lane widths, the case–control approach generally produced a CMF that was lower than that which was produced by the cross-sectional approach. When comparing the two methods, the CMF for additional shoulder width differed by less than 10%, until the magnitude of this variable reached 7 feet (2.1 m). This may have been the result of the very small sample size (0.1 to 1.2% of entire sample) used to specify the models. For the shoulder width, the case–control method generally produced a CMF that was greater in magnitude than that which was produced using the cross-sectional approach; however, the results differed by less than 10% for shoulder widths ranging from 0 to 8 feet (0 to 2.4 m). The larger difference in the CMF for shoulder widths of 9 and 10 feet (2.7 and 3.0 m) may be the result of the small proportion of the sample available (0.2 and 1.5%, respectively) to specify the models.

The safety trends for the lane and shoulder widths shown in Fig. 1a and b are not unlike the CMFs presented in the Highway Safety Manual (AASHTO, 2010) for two-lane, rural highway segments. The HSM shows higher expected crash frequencies as the lane and shoulder widths narrow; the same general trend was found in the present study except at the upper and lower ends of the lane width distribution (9 and 13.5 feet).

Fig. 1. Accident Modification Factor Comparison Based on Case–control and Cross-Sectional Analyses.

In general, the case–control approach produced CMFs that were greater than the respective CMFs from the cross-sectional approach.
This trend held for results from both the Minnesota and Pennsylvania datasets. As discussed in the Methodology, the traditional case–control design does not account for sites with multiple outcomes (e.g., segments or intersections with multiple crashes), potentially under-estimating the effects of a treatment. The cross-sectional approach, on the other hand, explicitly considers the number of outcomes in the estimation of model coefficients (i.e., estimated safety effects). This is likely one explanation for the difference in results.

Another noteworthy finding from the present study is that, generally, the explanatory variables in the cross-sectional and case–control model specifications are similar in sign. For those coefficients that differ in sign, the difference is likely due to the fact that the case–control design used each crash as a potential case, whereas the cross-sectional study used the total crashes per year as the outcome in the model. Therefore, if an intersection or roadway segment experienced multiple crashes in a year, the features associated with the intersection or roadway segment would be given more weight in the model. These findings suggest that the cross-sectional approach that controls for various roadway features and traffic volumes, strictly by modeling, closely approximates the results obtained from a case–control method that matches on some covariates while controlling for others through modeling. It is likely that cross-sectional studies will approximate the results obtained from a case–control study when the covariates have similar distributions in the case and control populations.

While the results from this study are generally consistent between the case–control and cross-sectional studies, this might not have been the case if care had not been taken in developing the cross-sectional model. Selecting an appropriate distribution and functional form has been cited as a weakness of cross-sectional studies. A case–control design, on the other hand, does not entail the burden of selecting an appropriate functional form, but conditional logistic regression must be used to account for the matching.

It is important to note that quasi-experimental designs are always subject to confounding due to specification errors and absence of significant covariates. This study illustrated the danger of excluding important covariates by comparing the results from full and restricted models. While this study included several covariates to control for potential confounding factors, there are other factors such as weather, driver demographics, level of enforcement, and prior crash rate that were not included in the analysis. As such, these factors should be considered in future CMF estimation research to reduce the potential bias associated with omitting variables from case–control or cross-sectional models.

7. Conclusions

The safety effects of roadway lighting and lane and shoulder width were estimated and compared using a cross-sectional and case–control methodology for at-grade intersections in Minnesota and rural two-lane highway segments in Pennsylvania, respectively. The estimated safety effectiveness for lighting and lane and shoulder width was similar when comparing the two methods. This seems to suggest that the two analysis methods are comparable when care is taken in developing cross-sectional models (i.e., consideration of many covariates, functional form). While the results presented in this paper are based on two independent datasets, future research should be undertaken to determine if this trend remains consistent. Furthermore, comparing the results from cross-sectional, case–control, and observational before–after studies should be undertaken to determine the similarities and differences resulting from the differing statistical approaches to develop CMFs. Another alternative that should be considered in this comparison is the pre-post multiple groups design. While this type of study is subject to several of the limitations of an observational before–after study (e.g., requires the implementation of a specific treatment), it provides an opportunity to test the main effects (i.e., period and group) as well as the interaction between period and group. The present study suggests that case–control and cross-sectional studies based on observational data produce consistent results when a before–after study is impractical due to data restrictions.

The method of matching used in the present study included several geometric features for the analysis of lighting at intersections, and traffic volumes and segment length in the analysis of lane and shoulder widths along rural two-lane roadway segments. In the case of at-grade intersections, the database was not large enough to consider other matching parameters without resulting in small samples for certain combinations. Future research should consider the effects of different matching schemes to determine when the results of case–control and cross-sectional study comparisons either begin to converge or diverge. Aside from cross-sectional elements, the data from Pennsylvania did not include other geometric features (e.g., horizontal curve radius, vertical grade) that could be used for matching purposes. Future research should also consider additional matching variables to determine when the results of the case–control and cross-sectional modeling approaches converge or diverge. Other important covariates that should be considered in future research include information related to weather, driver demographics, and level of enforcement.

The matching method used in the present study was also one-to-one (a single case matched to a single control). In some cases, roadway segments experience more than a single crash in a given year. This density is not explicitly considered in the matching process. For example, assume a segment experiences three crashes in a given year. In the matched case–control design, the three crashes occurring during a year on this segment would be treated as three separate cases and then matched to three controls (year was a matching variable). Future research should be undertaken to determine what effect such a matching method has on the estimation results. An example would be to use ordinal logistic regression to estimate the odds ratio in a matched case–control design. This approach would permit explicit consideration of multi-crash segments in the estimation process.

Lastly, when a before–after study is not practical, it may be necessary to develop CMFs using alternative methods. Two examples are the cross-sectional and case–control methods described in this paper. For the two different datasets considered in this study, the two methods appear to produce consistent results. When consistent findings are obtained for CMF estimates using two alternative analysis methods, one might have confidence that similar safety performance will ensue in future implementations.

8. Impacts on industry

Well-designed observational before–after studies are the current industry standard for developing CMFs. While they offer several advantages over other safety countermeasure evaluation methods, there are several practical limitations, which sometimes preclude them as a viable method. Some have argued that cross-sectional studies have not proven successful as a method to identify cause and effect in road safety. Further, it has been suggested that epidemiological methods (e.g., the case–control study design) may provide a viable alternative for estimating CMFs. This paper suggests that case–control and cross-sectional studies based on observational data produce consistent results when a before–after study is impractical due to data restrictions. While the case–control method can be used to account for the many sources of variation present in a cross-sectional model, a cross-sectional analysis can produce similar results if care is taken in the study design and model development (e.g., selection of covariates and model functional form).
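
As a quick arithmetic check on the consistency between methods, the intersection lighting CMFs reported in the abstract (0.886 for the case–control method and 0.881 for the cross-sectional method) can be converted to percent crash reductions. This is a minimal sketch: the helper name is illustrative, and the conversion assumes the usual reading of a CMF as the ratio of expected crashes with the treatment to expected crashes without it.

```python
def percent_reduction(cmf):
    """Percent change in expected crashes implied by a CMF (positive means a reduction)."""
    return (1.0 - cmf) * 100.0

cmf_case_control = 0.886     # intersection lighting CMF reported in the abstract
cmf_cross_sectional = 0.881  # intersection lighting CMF reported in the abstract

for label, cmf in [("case-control", cmf_case_control),
                   ("cross-sectional", cmf_cross_sectional)]:
    print(f"{label}: {percent_reduction(cmf):.1f}% reduction")
```

Rounded to one decimal place, these are the 11.4% and 11.9% reductions quoted in the Discussion.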
References

American Association of State Highway Transportation Officials [AASHTO]. (2010). Highway Safety Manual (1st Edition). Washington, DC: Author.
Bauer, K. M., & Harwood, D. W. (1996). Statistical Models of At-Grade Intersection Accidents. Report No. FHWA-RD-96-125. McLean, VA: Federal Highway Administration.
Bonneson, J. A., & Pratt, M. P. (2008). Procedure for Developing Accident Modification Factors from Cross-Sectional Data. Transportation Research Board: Journal of the Transportation Research Board, 2083, 40–48.
Collett, D. (2003). Modeling Binary Data (Second Edition). New York: Chapman and Hall/CRC Press.
Commission Internationale de l'Eclairage. (1992). Road Lighting as an Accident Countermeasure, CIE No. 93. Vienna, Austria: Author.
Council, F. M., & Williams, C. D. (2001). The Highway Safety Information System Guidebook for the Minnesota Data Files. Washington, DC: Federal Highway Administration. Available at http://www.hsisinfo.org/pdf/minn_vol1_05.pdf.
Donnell, E. T., Porter, R. J., & Shankar, V. N. (2010). Framework for Assessing the Safety Effects of Roadway Lighting. Safety Science, 48(10), 1436–1444.
Elvik, R. (1995). Meta-analysis of Evaluations of Public Lighting as Accident Countermeasure. Transportation Research Record, 1485, 112–123.
Fitzpatrick, K., Lord, D., & Park, B. J. (2008). Accident Modification Factors for Medians on Freeways and Multilane Rural Highways in Texas. Transportation Research Board: Journal of the Transportation Research Board, 2083, 62–71.
Gross, F. (2006). A Dissertation in Civil Engineering: Alternative Methods for Estimating Safety Effectiveness on Rural, Two-Lane Highways: Case–control and Cohort Methods. University Park, PA: The Pennsylvania State University.
Gross, F., & Jovanis, P. P. (2007). Estimation of the Safety Effectiveness of Lane and Shoulder Width: The Case–control Approach. Journal of Transportation Engineering, 133(6), 362–369.
Gross, F., & Jovanis, P. P. (2008). Estimation of Safety Effectiveness of Changes in Shoulder Width using Case–control and Cohort Methods. Transportation Research Record: Journal of the Transportation Research Board, 2019, 237–245.
Gross, F., Persaud, B., & Lyon, C. (2010). A Guide to Developing Quality Crash Modification Factors, Report No. FHWA-SA-10-032. Washington, DC: Federal Highway Administration.
Harwood, D. W., Council, F. M., Hauer, E., Hughes, W. E., & Vogt, A. (2000). Prediction of the Expected Safety Performance of Two-lane Rural Highways, Report No. FHWA-RD-99-207. McLean, VA: Federal Highway Administration.
Hauer, E. (2010). Cause, Effect and Regression in Road Safety: A Case Study. Accident Analysis and Prevention, 42(4), 1128–1135.
Jovanis, P. P., Park, S. W., Chen, K. Y., & Gross, F. (2005). On the Relationship of Crash Risk and Driver Hours of Service. 2005 International Truck and Bus Safety and Security Symposium, Alexandria, Virginia, November 14–16.
Lord, D., & Bonneson, J. A. (2007). Development of Accident Modification Factors for Rural Frontage Road Segments in Texas. Transportation Research Record: Journal of the Transportation Research Board, 2023, 20–27.
Persaud, B. N. (2001). NCHRP Synthesis 295: Statistical Methods in Highway Safety Analysis. Washington, DC: Transportation Research Board.
Poch, M., & Mannering, F. (1996). Negative Binomial Analysis of Intersection Accident Frequencies. Journal of Transportation Engineering, 122(2), 105–113.
Schlesselman, J. J. (1982). Case–control Studies: Design, Conduct, Analysis. New York: Oxford University Press.
Shankar, V., Mannering, F., & Barfield, W. (1995). Effect of roadway geometrics and environmental factors on rural accident frequencies. Accident Analysis and Prevention, 27(3), 371–389.
Stevenson, M. R., Jamrozik, K. D., & Spittle, J. (1995). A Case–control Study of Traffic Risk Factors and Child Pedestrian Injury. International Journal of Epidemiology, 24(5), 957–964.
Tsai, Y. J., Wang, J. D., & Huang, W. F. (1995). Case–control Study of the Effectiveness of Different Types of Helmets for the Prevention of Head Injuries among Motorcycle Riders in Taipei, Taiwan. American Journal of Epidemiology, 142(9), 974–981.
Vogt, A., & Bared, J. (1998). Accident Models for Two-Lane Rural Segments and Intersections. Transportation Research Record, 1635, 18–29.
Washington, S., Persaud, B., Lyon, C., & Oh, J. (2005). Validation of Accident Models for Intersections, Report No. FHWA-RD-03-037. McLean, VA: Federal Highway Administration.
Woodward, M. (2005). Epidemiology: Study Design and Data Analysis (Second Edition). New York: Chapman and Hall/CRC.

Dr. Frank Gross is a Highway Safety Engineer at Vanasse Hangen Brustlin, Inc. (VHB) with nearly 10 years of diverse transportation research and engineering experience. He received a B.S. in Civil Engineering from Clarkson University. He then completed his M.S. and Ph.D. in Civil Engineering at the Pennsylvania State University, specializing in transportation safety while earning a graduate minor in statistics. As a safety researcher, he specializes in conducting road safety audits, analyzing crash data, and developing crash modification factors. His research interests include highway safety with particular interest in the safety of intersections, rural and low-volume roads, and pedestrians.

Dr. Eric T. Donnell is an Associate Professor in the Department of Civil and Environmental Engineering at the Pennsylvania State University. He earned B.S., M.E., and Ph.D. degrees in Civil Engineering from the Pennsylvania State University. Dr. Donnell teaches both undergraduate and graduate courses in transportation engineering. He has 13 years of research experience related to roadway and roadside safety, highway and street design, speed management, and roadway delineation systems. His primary research interest is directed at integrating safety and operational performance into design decision-making.