Journal of Safety Research: Frank Gross, Eric T. Donnell

This study compares case-control and cross-sectional methods for estimating crash modification factors (CMFs) to evaluate the safety effects of roadway countermeasures. The study estimates CMFs for intersection lighting and lane/shoulder widths using both methods on independent datasets. For intersection lighting in Minnesota, the case-control method produced a CMF of 0.886 while the cross-sectional method yielded 0.881. Similarly, for lane/shoulder widths in Pennsylvania, the CMFs from the two methods were comparable. The results suggest case-control and cross-sectional studies can provide consistent CMF estimates and may be viable alternatives to before-after studies when certain data limitations exist.

Journal of Safety Research 42 (2011) 117–129

Journal homepage: www.elsevier.com/locate/jsr

Case–control and cross-sectional methods for estimating crash modification factors: Comparisons from roadway lighting and lane and shoulder width safety effect studies

Frank Gross a,⁎, Eric T. Donnell b

a Vanasse Hangen Brustlin, Inc. (VHB), 333 Fayetteville St, Suite 1450, Raleigh, NC 27601, USA
b The Pennsylvania State University, Department of Civil and Environmental Engineering, 212 Sackett Building, University Park, PA 16802, USA

Article info

Available online 22 March 2011

Keywords: Road safety; Case–control; Cross-sectional; Crash modification factor; Countermeasure evaluation

Abstract

Problem: While observational before–after studies are considered the industry standard for developing crash modification factors (CMFs), there are practical limitations that may preclude their use in highway safety analysis. There is a need to explore alternative methods for estimating CMFs. Method: This paper employs case–control and cross-sectional analyses to estimate CMFs for fixed roadway lighting and the allocation of lane and shoulder widths. Results: Based on the case–control method, the CMF for intersection lighting is 0.886, while the cross-sectional study indicates a CMF of 0.881. The CMFs developed for lane and shoulder widths are also similar when comparing the two methods. Conclusions: This paper suggests that case–control and cross-sectional studies produce consistent results if care is taken in the study design and model development. Impact on industry: Case–control and cross-sectional studies may provide a viable alternative to estimate CMFs when a before–after study is impractical due to data restrictions.

© 2011 National Safety Council and Elsevier Ltd. All rights reserved.

⁎ Corresponding author. Tel.: +1 919 834 3972; fax: +1 919 834 3970.
E-mail addresses: fgross@vhb.com (F. Gross), edonnell@engr.psu.edu (E.T. Donnell).
0022-4375/$ – see front matter © 2011 National Safety Council and Elsevier Ltd. All rights reserved.
doi:10.1016/j.jsr.2011.03.003

1. Introduction

In an observational before–after study, a safety countermeasure is implemented at a location and the safety effect is estimated by comparing the observed crash frequency after implementation to an estimate of the expected number of crashes that would have occurred had the countermeasure not been implemented. If the only change to the site being evaluated is the implementation of a single safety countermeasure, then it is reasonable to conclude that the countermeasure caused the observed change in crashes at the site. Harwood, Council, Hauer, Hughes, and Vogt (2000) suggest that well-designed observational before–after studies offer several advantages over other safety countermeasure evaluation methods; however, there are several practical limitations, including:

1. Confounding factors: several improvements may be implemented simultaneously, making it difficult to isolate the effect of a single countermeasure using a before–after study. Similarly, changes in traffic volume, driver population, vehicle mix, and other factors may occur over the analysis time period in a before–after study.
2. Sample size: it is sometimes difficult to find an adequate sample of sites where the treatment of interest has actually been implemented. Results from a limited sample will have a high level of statistical uncertainty. When there are few or no sites being treated with the countermeasure of interest, a before–after study is difficult to employ.
3. Study period: a before–after study requires a time sequence, where it is necessary to implement a countermeasure and wait for sufficient data in the after period. While data collection can be time consuming for any safety evaluation, waiting several years after implementation is a practical concern in before–after studies.

Given the limitations associated with observational before–after studies, alternative evaluation methods are sometimes needed to provide estimates of countermeasure safety effectiveness. A Guide to Developing Quality Crash Modification Factors introduces various methods for estimating the safety effects of countermeasures (Gross, Persaud, & Lyon, 2010). Several variations of observational before–after studies are presented along with alternative methods. Two of the alternative methods are explored in this paper.

One alternative method to estimate the safety effectiveness of a countermeasure is a cross-sectional study. Hauer (2010) argues that cross-sectional studies have not proven successful to identify cause and effect in road safety because multivariable regression typically does not produce consistent results between studies. He suggests that an observational epidemiological approach may, however, be a viable method to control for the many sources of variation present in cross-sectional data. A case–control study is one example of an observational epidemiology evaluation method that may be used to develop crash modification factors (CMFs) for a countermeasure. There is a need to compare CMFs developed from observational case–control,
before–after, and cross-sectional studies, given the same data limitations, to investigate their potential as alternative methods for safety evaluations.

The objective of this study is to compare case–control and cross-sectional methods to estimate measures of safety effectiveness using two independent datasets. The safety effects of fixed, at-grade intersection lighting in Minnesota were estimated using both evaluation methods. Similarly, the safety effects of lane and shoulder width dimensions were evaluated using two-lane, rural highway data from Pennsylvania. An observational before–after evaluation was not considered in the present study because roadway lighting is seldom the only countermeasure applied to a site, making it difficult to isolate the safety effects of roadway lighting using this method. Similarly, the lane and shoulder width evaluation did not involve a treatment; rather, various lane and shoulder width combinations were compared to a baseline condition.

2. Background

2.1. Cross-sectional studies

Cross-sectional studies are commonly used in transportation safety research to estimate the expected number of crashes on a roadway segment, interchange, or intersection. CMFs derived from cross-sectional data are based on a prescribed time period under the assumption that the ratio of average crash frequencies for sites with and without a feature is an estimate of the CMF for implementing that feature. The strength of a cross-sectional study is that a large number of sites with and without a specific countermeasure can often be identified. A weakness of a cross-sectional study is that it is difficult to determine the reason that certain safety countermeasures exist at one location and not at other similar locations. As such, the observed difference in crash experience can be due to known or unknown factors other than the feature of interest. Known factors, such as traffic volume or geometric characteristics, can be controlled for in principle by estimating a multivariate regression model. However, the issue is not completely resolved since it is difficult to properly account for unknown, or known but unmeasured, factors. Several examples of developing CMFs from a cross-sectional study are contained in the literature. Lord and Bonneson (2007) developed CMFs for lane width, shoulder width, and edge-line marking presence for frontage roads in Texas. Bonneson and Pratt (2008) recently proposed a procedure to develop CMFs for curve radius along two-lane rural highways. Additionally, Fitzpatrick, Lord, and Park (2008) developed CMFs for median width on freeways and rural multi-lane highways in Texas.

2.2. Case–control studies

Case–control designs are well established in epidemiology where they are used to relate risk factors within a population to a particular outcome or disease. In the highway safety context, their use has often been limited to studies of the road-user and vehicle (Tsai, Wang, & Huang, 1995; Stevenson, Jamrozik, & Spittle, 1995; Jovanis, Park, Chen, & Gross, 2005). More recently, the case–control method has been applied to estimate the safety effectiveness of lane and shoulder width (Gross & Jovanis, 2007; Gross & Jovanis, 2008); the results of this research showed a striking resemblance to CMFs proposed in the Highway Safety Manual for two-lane, rural highways (American Association of State Highway Transportation Officials [AASHTO], 2010). This study concludes that case–control methods may be a viable alternative for estimating CMFs when before–after studies are not practical or feasible.

Case–control studies assess whether exposure to a potential risk factor is disproportionately distributed between cases and controls, thereby indicating the likelihood of an outcome given the presence of the risk factor. Case–control studies produce an estimate of the odds ratio, which can be used as a direct estimate of safety effectiveness. The odds ratio is a measure of the percent change in the chance of an outcome given the presence of a risk factor compared to the baseline level of the risk factor. This lends itself well to the approximation of CMFs because the purpose is to provide an estimate of the incremental safety effect of a particular feature in relation to a certain baseline level.

The case–control method, in general, is associated with several advantages over alternative safety evaluation methods, and the matched case–control design has additional distinct advantages as follows:

• Studying rare events: the case–control design is ideal for studying rare events, such as crashes, because the sample may be selected so that a pre-specified number of cases are enrolled in the study, ensuring an adequate sample for analysis.
• Evaluating multiple risk factors from a single sample: the sample is selected based on outcome status and investigated to determine potential risk factors. Any variables not included in the case definition or matching scheme may be assessed, simultaneously, as individual risk factors.
• Controlling for confounding variables: confounders include variables that completely or partially account for the apparent association between an outcome and risk factor. Specifically, a confounder is a variable that is a risk factor for the outcome under study, and is associated with, but not a consequence of, the risk factor in question (Collett, 2003). In highway safety, an example of a confounder is average daily traffic (ADT). ADT has been shown to be associated with crash risk and is also associated with, but not a consequence of, several geometric features (e.g., lane width, shoulder width, and horizontal curvature).
• Matching: the primary reason for a matched design is to directly control for confounding variables. Control sites are matched to each case through random sampling based on similar values of potential confounding variables.
  o Matching provides a balanced design and adjusts for the effects of variables included in the matching scheme.
  o Matching ensures that adjustment is possible when the confounding variable is distributed differently within the case and control populations. In rare cases, the distribution of a confounding variable may not overlap for a random sample of cases and controls. In this case, there would be no way to adjust the results during the analysis phase.
  o Matching improves the efficiency of the design, requiring smaller sample sizes or resulting in estimates with a narrower confidence interval. However, this only holds when the matching is based on true confounders (Woodward, 2005).

Case–control designs are appealing due to their ability to estimate risk while properly controlling for confounding variables; however, there are disadvantages that must be recognized and addressed. Disadvantages of the case–control method are as follows:

• Case–control studies cannot be used to measure the probability of an event (e.g., crash, severe injury) in terms of expected frequency. They are more often used to show the relative effects of risk factors.
• Case–control studies often rely on collecting retrospective data for risk factors and outcome status, relying on the availability of historical documentation to provide information regarding risk factors and outcomes.
• Case–control studies are based on cross-sectional data; however, they should not be confused with cross-sectional studies in general. Case–control studies select subjects based on outcome status where cross-sectional studies generally sample based on risk factor status. Whether used for a case–control design or cross-sectional design, cross-sectional data do not involve a time sequence of data collection. Hence, they can only demonstrate associations, not causality.
• Although case–control studies may be used to explore multiple risk factors, they can only investigate one outcome per sample.
• Care must be taken to ensure that cases and controls are representative of the underlying populations of interest. Specifically, the chance of being included in the study must not be associated with the risk factor(s) of interest.
• The traditional case–control design does not recognize differences between locations with many crashes or a single crash. This is a loss of potentially important information and thus, the true increase in risk could be underestimated. Gross (2006) explored methods to account for locations with multiple crashes, but further research is needed in this area.
• Matching is a powerful tool to account for confounding variables, but it also has drawbacks, including:
  o Increased complexity of data collection and sample selection, especially when there are many matching variables.
  o The sample sizes within each matching combination often become small due to the limited number of subjects (sites) that match the criteria exactly. In transportation, this has been stated as a limitation to cross-sectional studies that involve matching (Hauer, 2010).
  o The effectiveness of matching variables cannot be estimated. The interaction between matching variables and risk factors, however, may be analyzed.

3. Methodology

3.1. Cross-sectional studies

It is common practice to use generalized linear modeling techniques, assuming a negative binomial error structure, to estimate crash frequency models. Several examples are published in the highway safety literature for intersections (e.g., Poch & Mannering, 1996; Bauer & Harwood, 1996; Washington, Persaud, Lyon, & Oh, 2005; Donnell, Porter, & Shankar, 2010) and rural road segments (e.g., Shankar, Mannering, & Barfield, 1995; Vogt & Bared, 1998). A negative binomial regression model relates a crash count as a left-hand-side (LHS) variable to a number of right-hand-side (RHS) variables, coefficients that quantify magnitudes of relationships between LHS and RHS variables, and a disturbance term. For this study, the negative binomial regression model was assumed to take the following form:

$$\ln \lambda_i = \beta X_i + \varepsilon_i \qquad (1)$$

where:

$\lambda_i$: expected number of crashes at location $i$;
$\beta$: vector of estimable regression parameters corresponding to geometric design and traffic volume explanatory variables;
$X_i$: vector of geometric design and traffic volume explanatory variables for location $i$;
$\varepsilon_i$: gamma-distributed error term.

The mean-variance relationship for the negative binomial distribution is:

$$\mathrm{Var}(y_i) = E(y_i)\,[1 + \alpha E(y_i)] \qquad (2)$$

where:

$\mathrm{Var}(y_i)$: variance of observed crashes $y$ occurring at location $i$;
$E(y_i)$: expected crash frequency at location $i$;
$\alpha$: overdispersion parameter.

Regression parameters are estimated using the method of maximum likelihood.

3.2. Case–control studies

In a typical case–control design, cases are defined and subjects are enrolled based upon their current outcome status (i.e., cases and controls). The prior risk factor status within each outcome group is then determined. For this study, cases are defined as locations that experience at least one crash during a particular year of the study period; controls are those locations that do not experience a crash in the same period. Control locations are randomly matched to each case segment based on several factors to account for potential confounding.

The ratio of controls to cases may vary and often depends on the availability of time, budget, and other factors. As the ratio of controls to cases increases, the power of the design increases but at a decreasing rate. There is often little additional power gained at ratios greater than four controls per case. In this study, one control was randomly matched to each case (a one-to-one ratio), which achieves about 90% power (Woodward, 2005).

Presence of risk factors may be determined only after the case–control sample has been obtained. Risk factors may take the form of binary variables (e.g., presence of guardrail or rumble strips) or multi-level variables (e.g., speed limit or traffic control type).

The odds ratio is the appropriate measure of effectiveness for case–control studies and is an estimate of the relative effect of the risk factor compared to a baseline (e.g., no risk factor for a binary variable). The odds ratio is interpreted as the expected percent change in the outcome in question (i.e., one or more crashes in a year) due to the presence of the risk factor. In the case of a binary risk factor (i.e., presence or absence), the baseline is usually the absence of a particular feature and the odds ratio indicates the percent change in the risk of an outcome due to the presence of the risk factor in question. For risk factors with more than two possible categories, the baseline may be selected as any one category to which other categories are compared. Each category may be compared to the baseline, in which case the measure of effectiveness would indicate the percent change in the risk of an outcome due to the specified deviation from the baseline.

Conditional binary logistic regression is used to estimate the odds ratio for matched case–control designs. The regression model is conditional on the fact that controls are purposefully matched to cases on specific variables. Conditional binary logistic regression may be used to estimate the odds ratio of a binary outcome and account for multi-level risk factors, confounding variables that are not included in the matching process, and interaction terms. The conditional probability of an outcome associated with the unmatched variables $X_1, \ldots, X_p$ for each member of the $j$th matched set is given by Eq. (3) (Schlesselman, 1982).

$$\Pr(Y = 1) = \frac{1}{1 + \exp\left[-\left(\alpha_j + \sum_{i=1}^{p} \beta_i X_i\right)\right]} \qquad (3)$$

where:

$\alpha_j$: effect of matching variables for each matched set;
$\beta_i$: estimated regression parameters for unmatched explanatory variables;
$X_i$: unmatched explanatory variables included in the model.

Estimates of the coefficients for the explanatory variables are obtained by maximizing the likelihood expression in Eq. (4).

$$L(\beta) = -\sum_{j=1}^{n} \ln\left[1 + \sum_{k=1}^{c} \exp\left(\sum_{i=1}^{p} \beta_i \left(X_{jki} - X_{j0i}\right)\right)\right] \qquad (4)$$
where:

$n$: number of cases;
$c$: number of controls matched to each of $n$ cases;
$X_i$: unmatched explanatory variables;
$X_{j0i}$: value of $X_i$ for a case in the $j$th matched set;
$X_{jki}$: value of $X_i$ for the $k$th matched control in the $j$th matched set.

In this study, the dependent variable is coded as a one or zero to indicate a case or control, respectively. Risk factors are represented by $X_i$ in Eq. (3). Confounding variables included in the matching scheme do not appear in the estimation because matching variables have the same value for cases and controls.

4. Empirical setting

To compare safety effectiveness results obtained from case–control and cross-sectional studies, two independent databases were developed. At-grade intersections in Minnesota were used to evaluate the safety effectiveness of roadway lighting presence. Two-lane rural highway segments in Pennsylvania were used to assess the safety effectiveness of lane and shoulder widths. Each database is described below.

4.1. Minnesota intersections

At-grade intersection data in Minnesota were obtained from the Federal Highway Administration (FHWA) Highway Safety Information System (HSIS) files (Council & Williams, 2001). Four years (2001 through 2004, inclusive) of data were used in the analyses, which included 6,464 intersections. Approximately 13.7% of the intersections in the sample had signal control while the remainder operated under stop control. There were three intersection forms coded in the database: cross, tee, and skew. Approximately 49% of the intersections were four-leg cross intersections, nearly 40% were three-leg tee intersections, and the remaining 11% were four-leg skewed intersections. There were 38,437 reported crashes at the intersections included in the analysis database. Each crash represented a potential case for the case–control study, and controls (i.e., intersections with no crashes) were randomly sampled and matched to each case as described previously.

Minnesota is a unique HSIS database in that it contains a variable for lighting presence for at-grade intersections. As noted previously, only the presence of roadway lighting was considered in the present study. Table 1 provides a summary of Minnesota intersection data included in the analysis. Only geometric design data for the major roadways were included because of considerable missing data for the minor intersecting roadways. More than 42% of the intersections contained roadway lighting.

Regression models of total daytime and nighttime crashes were estimated for intersections. In the present study, the safety effects of roadway lighting were estimated from daytime and nighttime regression parameters. The daytime crash frequency model was used as a baseline scenario against which to compare the nighttime crash frequency model. It was assumed that the parameter estimate for the daytime lighting indicator variable should be near zero (i.e., no difference in the expected daytime crash frequency with or without lighting) because no illumination is provided by the lighting system during the day. In such a case, the nighttime crash frequency model could be used alone to determine the safety effect of fixed roadway lighting. However, if this assumption did not hold, the relative difference between the nighttime and daytime lighting indicator parameter estimates was needed to determine the effect of fixed roadway lighting. The method to compute the percent reduction in crashes from the daytime and nighttime crash frequency models is as follows:

$$\frac{\exp(\hat{\theta}_N)}{\exp(\hat{\theta}_D)} - 1 \qquad (5)$$

where:

$\hat{\theta}_N$: estimated regression parameter associated with the presence of roadway lighting in the nighttime crash frequency model; and
$\hat{\theta}_D$: estimated regression parameter associated with the presence of roadway lighting in the daytime crash frequency model.

A negative value resulting from Eq. (5) indicates that the expected number of crashes is lower on roadways with fixed roadway lighting compared to those without lighting. Eq. (5) produces a crash reduction factor (CRF); a CMF is computed by a simple conversion of the CRF (i.e., CMF = 1 − (CRF/100)). The advantage of this approach over a before–after approach is consideration of variables other than the presence of lighting that may also influence the night-to-day crash ratio (e.g., traffic volume, geometrics). By considering these variables, the hopeful result is the ability to isolate the effect of roadway lighting while controlling for other variables present at the intersection.

The original dataset, used for the negative binomial models, was modified for the case–control study. Two different datasets were developed for the case–control study: one to analyze nighttime crashes and one to analyze daytime crashes.

Table 1
Descriptive Statistics of Minnesota Intersection Data.

| Variable | Minimum | Maximum | Mean | Standard Deviation |
| --- | --- | --- | --- | --- |
| Night accident frequency (per year) | 0 | 28 | 0.366 | 0.969 |
| Day accident frequency (per year) | 0 | 55 | 1.121 | 2.457 |
| Log major road average daily traffic | 3.700 | 11.300 | 8.431 | 1.145 |
| Percent heavy vehicles on major road | 0 | 61.11 | 8.888 | 5.109 |
| Log minor road average daily traffic | 0 | 11.300 | 6.939 | 1.733 |
| Indicator for number of lanes on major road (1 = 4-lanes; 0 = 2-lanes) | 0 | 1 | 0.227 | 0.419 |
| Area type indicator (1 = urban/suburban; 0 = rural) | 0 | 1 | 0.446 | 0.497 |
| Traffic control indicator (1 = signal; 0 = stop-control) | 0 | 1 | 0.137 | 0.344 |
| Lighting indicator (1 = present; 0 = not present) | 0 | 1 | 0.421 | 0.494 |
| Intersection type indicator (1 = skew; 0 = cross or tee) | 0 | 1 | 0.110 | 0.312 |
| Speed indicator on major road (1 = 80 kph (50 mph) or greater; 0 otherwise) | 0 | 1 | 0.673 | 0.469 |
| Depressed median indicator on major road (1 = depressed median; 0 = barrier or no median) | 0 | 1 | 0.116 | 0.320 |
| Indicator for 2001 (1 = year 2001; 0 = otherwise) | 0 | 1 | 0.250 | 0.433 |
| Indicator for 2002 (1 = year 2002; 0 = otherwise) | 0 | 1 | 0.250 | 0.433 |
| Indicator for 2003 (1 = year 2003; 0 = otherwise) | 0 | 1 | 0.250 | 0.433 |
| Indicator for 2004 (1 = year 2004; 0 = otherwise) | 0 | 1 | 0.250 | 0.433 |
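Table 1 also gives an informal view of why the negative binomial form in Eq. (2) suits these counts. The sketch below rearranges Eq. (2) into a method-of-moments expression, α = (Var(y) − E(y)) / E(y)², and applies it to the unconditional means and standard deviations reported in Table 1. This is an illustration under strong assumptions, not the authors' estimation: it ignores every covariate, so the values it prints are not the overdispersion parameters of the fitted models, and the function name `moment_overdispersion` is hypothetical.

```python
def moment_overdispersion(mean: float, std_dev: float) -> float:
    """Solve Var(y) = E(y) * (1 + alpha * E(y)) for alpha (Eq. (2) rearranged)."""
    variance = std_dev ** 2
    return (variance - mean) / mean ** 2


# Means and standard deviations of yearly crash counts, copied from Table 1.
table1 = {
    "night": {"mean": 0.366, "std_dev": 0.969},
    "day": {"mean": 1.121, "std_dev": 2.457},
}

for period, stats in table1.items():
    alpha = moment_overdispersion(stats["mean"], stats["std_dev"])
    # alpha > 0 means the variance exceeds the mean (overdispersed relative to Poisson).
    print(period, round(alpha, 2))
```

Both values come out positive, meaning the sample variance exceeds the sample mean for nighttime and daytime counts alike, which is the pattern Eq. (2) accommodates and a Poisson model does not.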
F. Gross, E.T. Donnell / Journal of Safety Research 42 (2011) 117–129 121

developed using a similar methodology, but for the nighttime dataset, tions were available (21,688 segments x 5 years of data). The total
cases were defined as intersections that experience at least one number of crashes occurring over the analysis period was 56,732. The
nighttime crash in a year, whereas the daytime dataset defines cases average daily traffic on the analysis segments ranged from 95 to
as intersections that experience at least one daytime crash in a year. 25,844 vehicles per day. Descriptive statistics for all variables included
Once the cases were defined, the following general method was used in the analysis are shown in Table 3.
to develop the two datasets: As shown in Table 3, the lane width, shoulder width, posted speed
limit, location of the roadway segment (engineering district), year,
1. Separate case and control populations.
and additional shoulder width (total minus paved) were the
2. Randomly match one control to each case based on area type,
categorical explanatory variables used in the analysis. The relatively
median type, number of lanes, and intersection type.
low proportion of mileage in engineering districts 5–0, 6–0, 11–0, and
3. Develop indicator variables for lighting, year of crash, speed limit,
and traffic control.

Table 2 shows the sample sizes for the nighttime and daytime Table 3
case–control datasets. For this study, case–control pairs were matched Descriptive Statistics for Pennsylvania 2-lane Rural Roads.
on area type (i.e., rural or urban/suburban), median type on the major
Continuous Variables Minimum Maximum Mean Standard
road (i.e., divided or undivided), number of lanes on the major road Deviation
(i.e., 2-lane or 4-lane), and intersection type (i.e., 3-legged, 4-legged
Number of total crashes (per year) 0 18 0.534 0.903
cross, 4-legged skew). Average daily trafﬁc (veh/day) 95 25,844 3,419 3,061
Roadway lighting took the form of a binary variable (i.e., lighting Segment length (miles) 0.01 1.48 0.48 0.13
present or lighting absent); other covariates included in the cross-
sectional and case–control comparison were: Categorical Variables Proportion of Sample (%)

Lane width = 9 feet 7.4

• Year of crash, Lane width = 10 feet 26.5
• Average daily traffic (ADT) on the major road, Lane width = 10.5 feet 3.7
• Average daily traffic (ADT) on the minor road, Lane width = 11 feet 36.1
• Percent heavy vehicles on the major road, Lane width = 11.5 feet 1.1
Lane width = 12 feet 19.0
• Speed limit on the major road, and
Lane width = 12.5 feet 0.4
• Traffic control type (signal vs. stop-control). Lane width = 13 feet 0.8
Lane width = 13.5 feet 5.0
For this study, the crash year was included as an indicator variable Shoulder width = 0 feet 29.0
to account for time trends. Average daily traffic on the major and Shoulder width = 1 foot 2.2
minor roads was included in the model to account for traffic volumes. Shoulder width = 2 feet 13.1
For both major and minor roads, the natural log of ADT was used as a Shoulder width = 3 feet 16.6
Shoulder width = 4 feet 21.1
continuous variable in the negative binomial and case–control
Shoulder width = 5 feet 5.6
models. The percentage of heavy vehicles on the major road was Shoulder width = 6 feet 5.6
also included in the models as a continuous variable. Speed limit and traffic control type entered the model as indicator variables. Speed limit was entered as high speed (≥80 kph (50 mph)) or low speed (<80 kph (50 mph)). The traffic control variable indicated either a signalized or unsignalized intersection. As discussed previously, area type, median type, number of lanes, and intersection type were controlled for in the case–control study by matching. In the negative binomial models, these variables were controlled for by creating indicators and entering them in the model specification as covariates.

Table 2
Case–control Sample Size for Minnesota Intersections.

Intersection Type   Area Type        Nighttime Dataset   Daytime Dataset
Cross               Urban/Suburban   9,012               4,576
Cross               Rural            5,650               6,894
Cross-Skew          Urban/Suburban   2,276               930
Cross-Skew          Rural            2,384               1,872
Tee                 Urban/Suburban   3,970               3,966
Tee                 Rural            5,194               5,836
Total                                28,486              24,074

4.2. Two-lane rural highways in Pennsylvania

Electronic roadway inventory and crash data were obtained from the Pennsylvania Department of Transportation (PennDOT) to evaluate the safety effectiveness of lane and shoulder widths. Five years of data (1997 through 2001, inclusive) were used in the analysis. A total of 21,688 segments were available in the PennDOT data files; each segment was nominally 0.8 km (0.5 mi) long. Over the five-year analysis period, a total of 108,340 segment-level observa-

Shoulder width = 7 feet                 0.9
Shoulder width = 8 feet                 4.2
Shoulder width = 9 feet                 0.2
Shoulder width = 10 feet                1.5
Posted speed limit ≤ 25 mph             1.6
Posted speed limit = 30 or 35 mph       9.8
Posted speed limit = 40 or 45 mph       41.2
Posted speed limit = 50 or 55 mph       47.4
Year 1997                               20.0
Year 1998                               20.0
Year 1999                               20.0
Year 2000                               20.0
Year 2001                               20.0
Engineering district 1-0                11.2
Engineering district 2-0                15.3
Engineering district 3-0                13.4
Engineering district 4-0                9.1
Engineering district 5-0                6.5
Engineering district 6-0                2.0
Engineering district 8-0                13.6
Engineering district 9-0                10.5
Engineering district 10-0               9.2
Engineering district 11-0               2.4
Engineering district 12-0               6.8
Additional shoulder width(a) = 0 feet   58.9
Additional shoulder width = 1 foot      4.3
Additional shoulder width = 2 feet      15.3
Additional shoulder width = 3 feet      6.7
Additional shoulder width = 4 feet      9.6
Additional shoulder width = 5 feet      1.2
Additional shoulder width = 6 feet      2.1
Additional shoulder width = 7 feet      0.2
Additional shoulder width = 8 feet      1.2
Additional shoulder width = 9 feet      0.1
Additional shoulder width = 10 feet     0.4

Number of observations = 108,340.
(a) Additional shoulder width is the difference between the total shoulder width and paved shoulder width.
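As an illustration of the variable coding described in this section, the sketch below derives the high-speed indicator (posted limit of at least 80 kph) and the additional shoulder width (total minus paved). The function and argument names are hypothetical and are not taken from the study's data files.

```python
def high_speed_indicator(speed_limit_kph: float) -> int:
    """Return 1 for high speed (>= 80 kph, i.e. 50 mph), else 0."""
    return 1 if speed_limit_kph >= 80 else 0


def additional_shoulder_width_ft(total_ft: float, paved_ft: float) -> float:
    """Unpaved portion of the shoulder: total width minus paved width."""
    return total_ft - paved_ft


print(high_speed_indicator(88.5))          # 55 mph limit -> 1
print(high_speed_indicator(72.4))          # 45 mph limit -> 0
print(additional_shoulder_width_ft(6, 4))  # -> 2
```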

12–0 is a result of these locations being primarily urban, while the remaining engineering districts are primarily rural districts with a significant proportion of rural two-lane highway mileage. The additional shoulder width variable was created to determine if there is a safety benefit to providing a shoulder that is not paved.

A negative binomial regression model of total crashes was estimated using the general functional form of the model shown in Eq. (1). The parameter estimates from the model were used to evaluate the safety effect of the lane and shoulder width variables when compared to a baseline level. In the present study, the lane width baseline was 12 feet (3.6 m), the shoulder width baseline was 6 feet (1.8 m), and the additional shoulder width baseline was 0 feet (0 m) to be consistent with the rural two-lane highway AMFs shown in the Highway Safety Manual (AASHTO, 2010). As such, the relative effect of the explanatory variable is compared to the baseline. A value less than 1.0 indicates that a particular lane or shoulder width dimension is associated with fewer crashes than the baseline condition, while a value greater than 1.0 is associated with a higher expected total crash frequency when compared to the baseline condition.

The dataset used for the negative binomial models was modified for the case–control study. Only a single database was created for the case–control analysis to evaluate the safety effects of lane and shoulder width on two-lane rural highways in Pennsylvania. Creating the case and control populations for the segment data generally followed the same procedure as that outlined above for the intersection data. Cases were defined as segments that experienced at least one crash during a particular year of the study period, and controls were defined as those segments not experiencing a crash within the same year. The variables used in the matching process were year, segment length, and average daily traffic. Whereas year was included as a covariate in the analysis of Minnesota intersections, it was used as a matching criterion in the analysis of segments in Pennsylvania. Including year as a covariate in the Pennsylvania analysis would likely not change the results, but there was a sufficient sample size to include year as a matching variable and thus control for time trends. A transformation was used to apply a non-arbitrary rule for creating categories for the continuous variables. The procedure involves transforming the data to a normal distribution, estimating the mean and standard deviation, and then selecting categories based upon plus or minus one standard deviation from the mean. Details regarding the final categories are provided in previous work (Gross & Jovanis, 2007).

Table 4 shows the sample sizes for the case–control database for rural two-lane highway segments in Pennsylvania. A total of 83,308 observations were available for the case–control analysis (41,654 cases and 41,654 controls). The covariates used in the case–control analysis are the same as those described above for the negative binomial regression model, excluding year, segment length, and AADT, as these variables were used in the matching process. The covariates entered the model as categorical indicators. The average values for the case sample are as follows: 11.2 ft (3.41 m) lane width, 3.0 ft (0.91 m) paved shoulder width, 4.1 ft (1.25 m) total shoulder width, 47 mi/h (75.6 km/h) speed limit, 3,921 average daily traffic, and 2,598 ft (791.9 m) segment length. The average values for the control sample are as follows: 11.2 ft (3.41 m) lane width, 3.1 ft (0.94 m) paved shoulder width, 4.3 ft (1.31 m) total shoulder width, 48 mi/h (77.2 km/h) speed limit, 3,701 average daily traffic, and 2,578 ft (785.8 m) segment length.

5. Analysis results

5.1. Minnesota intersections and roadway lighting

Results of the negative binomial regression models are shown in Table 5 and Table 6 for daytime and nighttime model specifications, respectively. The natural logarithm of the major and minor roadway traffic volumes was used to estimate the relationship between traffic volume and the total daytime and nighttime crash frequency. Comparing the results in Table 5 and Table 6, the only parameter estimate that changes in sign is the lighting presence indicator (from 0.046 in the daytime model to −0.081 in the nighttime model). This result was expected because the presence of fixed roadway lighting was expected to have a negative association with nighttime crash frequency, and either a slightly positive or no association with daytime crashes. In the daytime crash frequency model, the lighting indicator was slightly positive and marginally significant, while in the nighttime model the lighting indicator was negative and statistically significant at the 10% level. As such, Eq. (5) was used to compute the mean percent reduction in the expected crash frequency, which is approximately 11.9%. When considering the standard error of the estimate, the 95% confidence interval for the reduction in crashes is 10.9 to 13.0%. The CMF for lighting presence would be 0.881.

Results of the case–control models are shown in Table 7 and Table 8 for daytime and nighttime model specifications, respectively. Typically, results of case–control studies are reported as odds ratios, but for comparison purposes, the odds ratios have been converted to standard parameter estimates. The odds ratio is simply exp(β), so to obtain the parameter estimate, the natural log of the odds ratio is computed.
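The conversions described above can be sketched in Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and because Eq. (5) is not reproduced in this section, the sketch assumes that the lighting effect is the nighttime parameter measured relative to the daytime parameter, exp(β_night − β_day). Under that assumption, the cross-sectional estimates quoted above reproduce the reported 11.9% reduction and CMF of 0.881.

```python
import math


def odds_ratio_to_beta(odds_ratio: float) -> float:
    """Parameter estimate is the natural log of the odds ratio."""
    return math.log(odds_ratio)


def beta_to_cmf(beta: float) -> float:
    """Multiplicative effect implied by a log-scale parameter: exp(beta)."""
    return math.exp(beta)


def percent_reduction(cmf: float) -> float:
    """Percent reduction in expected crash frequency implied by a CMF."""
    return 100.0 * (1.0 - cmf)


# Lighting presence parameters from the negative binomial models (text above).
beta_day, beta_night = 0.046, -0.081

# ASSUMPTION: nighttime effect taken relative to the daytime effect.
cmf_lighting = beta_to_cmf(beta_night - beta_day)

print(round(cmf_lighting, 3))                     # 0.881
print(round(percent_reduction(cmf_lighting), 1))  # 11.9
```

The same `beta_to_cmf` call applied to a single nighttime parameter gives the night-only effect, and `odds_ratio_to_beta` is its inverse for results reported as odds ratios.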
Table 4
Case–control Sample for Pennsylvania 2-lane Rural Roads.

Covariate                 Case Sample   Control Sample
Lane width 9 feet         1,677         2,094
Lane width 10 feet        10,236        9,850
Lane width 10.5 feet      1,511         1,361
Lane width 11 feet        16,862        16,209
Lane width 11.5 feet      439           464
Lane width 12 feet        8,217         9,038
Lane width 12.5 feet      123           167
Lane width 13 feet        320           363
Lane width 13.5 feet      2,269         2,108
Shoulder width 0 feet     10,374        9,857
Shoulder width 1 foot     869           783
Shoulder width 2 feet     5,396         5,024
Shoulder width 3 feet     7,392         7,242
Shoulder width 4 feet     9,943         9,978
Shoulder width 5 feet     2,511         2,633
Shoulder width 6 feet     2,427         2,714
Shoulder width 7 feet     364           384
Shoulder width 8 feet     1,804         2,039
Shoulder width 9 feet     46            93
Shoulder width 10 feet    528           907

Comparing the results in Table 7 and Table 8, the only statistically significant parameter estimate that changes in sign is the lighting presence indicator (from 0.055 in the daytime model to −0.121 in the nighttime model). Again, this was expected and is consistent with the cross-sectional findings presented in Table 5 and Table 6. Using Eq. (5), the mean percent reduction in the expected crash frequency is approximately 16.4%. When considering the standard error of the reduction, the 95% confidence interval is 15.7 to 16.6%. Of note is the fact that the daytime lighting presence variable in the case–control model was not statistically significant. If only the nighttime lighting indicator parameter is used to compute the percent difference in crashes at intersections with and without lighting, the expected crash reduction is 11.4%, a nearly identical match to the estimate obtained from the cross-sectional approach. The CMF would be 0.886.

The only other variable that changes sign between the daytime and nighttime case–control models is the indicator for year 2002. However, in both models this variable is not statistically significant. In general, the parameter estimates for the remaining variables are intuitive and relatively consistent in the cross-sectional and case–control models presented in Tables 5–8. The year indicators

Table 5
Negative Binomial Regression Model for Daytime Crashes at Minnesota Intersections.