Federal Aviation


Office of Aerospace Medicine
Washington, DC 20591

Predicting Accident Rates

From General Aviation Pilot
Total Flight Hours

William R. Knecht
Civil Aerospace Medical Institute,
Federal Aviation Administration
Oklahoma City, OK 73125

February 2015

Final Report

Deep appreciation is extended to Joe Mooney, FAA (AVP-210), for his help reviewing the NTSB
database queries that provided the accident data used here, to David Nelms, FAA (AAM-300), for
providing TFH from the DIWS database, and to Dana Broach, FAA (AAM-500) for valuable commentary
on this series of manuscripts.
This research was conducted under the Flight Deck Program Directive/Level of Effort Agreement
between FAA Headquarters and the Aerospace Human Factors Division of the Civil Aerospace Medical
Institute, sponsored by the Office of Aerospace Medicine and supported through the FAA NextGen
Human Factors Division.


INTRODUCTION
  The Problem With Rates
METHOD
  The Modeling Function
  The Empirical Data
  Important Considerations
  Parameter Estimation
  The Problem of Noise
  Parameter Start Values and Evolution
RESULTS
  Modeling the Accident Rate Histograms
  Model Evaluation
USING Grate
  A Point-Estimate of GA Flight Risk
  A Cumulative Estimate of GA Flight Risk
DISCUSSION
REFERENCES


Is there a range of pilot flight hours over which general aviation (GA) pilots are at greatest risk? More broadly, can we
predict accident rates, given a pilot’s total flight hours (TFH)? Many GA research studies implicitly assume that accident
rates are a linear function of TFH when, in fact, that relation appears nonlinear. This work explores the ability of a nonlinear
gamma-based function (Grate) to predict GA accident rates from noisy TFH data. Two sets of National Transportation Safety
Board (NTSB)/Federal Aviation Administration (FAA) data, parsed by pilot instrument rating, produced weighted goodness-of-fit
(R²w) estimates of .654 and .775 for non-instrument-rated (non-IR) and instrument-rated (IR) pilots, respectively.
This model class would be useful in direct prediction of GA accident rates, and as a statistical covariate to factor in flight
risk during other types of modeling. Applied to FAA data, these models show that the range for relatively high risk may
be far broader than first imagined, and may extend well beyond the 2,000-hour mark before leveling off to a baseline rate.


INTRODUCTION

Total flight hours has long been considered one of the risk factors in aviation, and is often used to represent either pilot flight experience or cumulative risk exposure (e.g., Dionne, Gagné & Vanasse, 1992; Guohua, Baker, Grabowski, Qiang, McCarthy & Rebok, 2011; Mills, 2005; Sherman, 1997). TFH has served as both an independent variable in its own right, as well as a statistical covariate, to control for the effects of experience or risk.

Investigators have often unwittingly assumed a linear relation between TFH and accident frequency and/or rate, that is, a straight-line prediction function ŷ=a+bx, with a as y-intercept and b as slope. However, evidence is emerging that such relations are actually nonlinear. For instance, Bazargan & Guzhva (2007) reported that the logarithmic transform log(TFH) significantly predicted GA fatalities in a logistic regression model. More recently, Knecht & Smith (2012) reported that a risk covariate starting with log(TFH), followed by a gamma transform, significantly predicted GA fatalities in a log-linear model.

In his 2001 book, The Killing Zone, Paul Craig presented early evidence that GA pilot fatalities might relate nonlinearly to TFH. Craig showed that fatalities occur most frequently at a middle range of TFH (≈50-350), and hypothesized that this band of time may be one in which pilots are at greatest risk due to overconfidence at having mastered flying the aircraft, combined with lack of actual experience and skill in dealing with rare, challenging events. He supported this hypothesis with histograms of GA accident frequencies, although not with a formal model. In group after group, a similar pattern emerged in his data, one of a “skewed camel hump” with a long tail at higher TFH.

Following Craig’s findings, I investigated a nonlinear model for accident frequency counts (Knecht, 2012). Derived from NTSB and FAA data, Figure 1 illustrates this basic class of relations between TFH and fatal accident counts for a group of 831 U.S. instrument-rated GA pilots from 1983-2011.

Figure 1. Frequency histogram of GA IR fatal accident counts (y-axis) as a function of total flight hours (x-axis, bin width=100 TFH).

The “skewed camel-hump” relation appears as we focus on the range TFH<5000. It appears quite durable, persisting whether we break out different data categories (e.g., IR vs. non-IR or serious vs. fatal accidents) or even combine them. Figure 2 shows GA accident data for 832 pilots from 2003-2007 for NTSB SERIOUS+FATAL accident categories that were combined to boost frequency counts.

Figure 2. Frequency histogram of GA IR SERIOUS + FATAL accident counts as a function of TFH (bin width=100 TFH).

So, frequency count histograms do support the notion of a nonlinear “killing zone” somewhere in the lower range of TFH.

And, we have successfully modeled that relation with a gamma probability density function (pdf).

But, this raises a deeper question: How can we be certain that this “relation” is not an artifact, merely a side-effect of the fact that there are simply fewer pilots at higher TFH, hence fewer accidents? Figure 3 shows how frequency counts for a matched sample of non-accident pilots also drop off markedly as TFH increases.

Figure 3. A histogram of GA IR non-accident pilot counts from the same data cohort as Fig. 2, showing how numbers of pilots decrease rapidly as TFH increases (bin width=100).

Figures 2 and 3 look suspiciously similar. In fact, the Pearson correlation between these datasets is .845 (p<.001). This implies that most (r²≈.71) of the “killing zone” seen in accident frequency counts can be explained simply by the fact that the numbers of pilots tend to decrease as TFH increases.

With that in mind, what support remains for the hypothesis of an actual killing zone? And, if it exists, what shape does it truly assume?

We can address these questions by analyzing accident rate histograms, in which the ith data bin is

$$rate_i = \frac{accidents_i}{accidents_i + nonaccidents_i} \tag{1}$$

Rates will control for the fact that our frequency counts decrease as TFH increases, and uniformly sized bins will control for the effect of exposure. Therefore, if a killing zone continues to persist with binned rates, then the empirical evidence for it would be much stronger.

The Problem With Rates
Unfortunately, analyzing accident rates is easier said than done. Rate data can be quite noisy, as Figure 4 depicts.

Figure 4. A noisy histogram showing combined “serious”+“fatal” accident rates for all GA pilots as a function of TFH (bin width=100).¹

¹ Data from Knecht & Smith (2012).

This “noise” represents sampling error, that is, random fluctuations in rate from one bin to the next. It is a vexing problem at higher TFH, where Eq. 1’s denominator is small and even small changes in frequency count can cause substantial variations in accident rate. Despite having tens of thousands of cases in the “hump,” bins in the high-TFH tail may, by pure chance, sometimes consist of one or two individuals, resulting in alarmingly high-rate bins right next to zero-rate bins.

It is not hard to appreciate how useful a modeling function could be here. A good model would smooth the noise in the data, allowing investigators to better predict rates, given TFH. This would be directly useful in areas such as determination of insurance premiums, allocation of resources for pilot training, accident investigation staffing, and public relations. To safety researchers, it would be useful as an improved statistical covariate, to either factor in or out risk as a function of flight experience. This would have broad application in numerous types of aviation research.

With this in mind, the present work moves from fitting GA accident frequency count data to fitting accident rates as a function of GA pilot TFH. The calculation of binned accident rates is a key step in risk analysis because uniformly sized bins will control for risk exposure while rates control for the actual number of individuals within a given data bin.

METHOD

The Modeling Function
Ideally, modeling functions should be motivated by theory involving causal processes inherent to the data. Unfortunately, in aviation accidents, causes tend to be numerous and hard to disentangle, making theory-based modeling difficult.

We therefore proceed with the less ambitious goal of simply trying to fit a well-behaved, continuous mathematical function to empirical GA accident rate data. Standard techniques are used, namely minimization of least-squares residuals between a modeling function and empirical data.

The model itself begins with a versatile probability density function Gpdf (Spanier & Oldham, 1987). Its x-axis represents TFH. The y-axis can represent either frequency count or proportion (here, we will focus on proportions). The model contains a shape parameter, α>0, and a scale parameter, β>0.

$$\Gamma_{pdf}(x; \alpha, \beta) = \frac{e^{-x/\beta}\, x^{\alpha-1}\, \beta^{-\alpha}}{\Gamma(\alpha)} \tag{2}$$
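As a rough illustration of Eq. 2, the gamma pdf can be evaluated numerically in a few lines. This is only a sketch: the function names `gamma_pdf` and `trapezoid_area` are hypothetical, the log-space evaluation is a numerical convenience rather than anything the report prescribes, and the values α=6, β=0.5 are simply the start values quoted later in the text.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Eq. 2: gamma pdf with shape alpha > 0 and scale beta > 0."""
    if x <= 0:
        return 0.0  # sketch assumption: treat the pdf as 0 at and below x = 0
    # Evaluate in log space so that Gamma(alpha) cannot overflow for large alpha.
    log_pdf = (-x / beta
               + (alpha - 1.0) * math.log(x)
               - alpha * math.log(beta)
               - math.lgamma(alpha))
    return math.exp(log_pdf)

def trapezoid_area(alpha, beta, upper, step=0.001):
    """Trapezoid-rule integral of the pdf over (0, upper]."""
    n = int(round(upper / step))
    total = 0.0
    prev = gamma_pdf(0.0, alpha, beta)
    for k in range(1, n + 1):
        cur = gamma_pdf(k * step, alpha, beta)
        total += 0.5 * (prev + cur) * step
        prev = cur
    return total

alpha, beta = 6.0, 0.5
area = trapezoid_area(alpha, beta, upper=40.0)
mode = (alpha - 1.0) * beta  # location of the peak when alpha > 1
print(f"area under pdf = {area:.4f}, peak at x = {mode}")
```

The area check reflects the property that distinguishes a pdf from a distribution of rates: the area under the pdf is constrained to 1, whatever α and β are chosen.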

The gamma function Γ(α) itself is described as the Euler integral, defined for α>0

$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\, dt \tag{3}$$

Figure 5. Gpdf with various values of α and β.

Shown in Figure 5, Gpdf has the attractive feature of being able to represent overdispersion (variance>mean) or underdispersion (variance<mean), as well as having calculable confidence intervals around the function itself.

Gamma pdfs have been used to model a wide variety of processes, including the size of insurance claims (Hogg & Klugman, 1984), amounts of rainfall (Chiew, Srikanthan, Frost, & Payne, 2005), waiting times and mean-time-to-failure (Winkelman, 2008), where it represents time until the αth event in a constant-hazard model, and distributions of microburst wind velocity (Mackey, 1998).

Prior experience with GA accident frequency data shows that this canonical version of Gpdf is unable to deal with critical features of data such as amplitude, non-zero x-axis (TFH) start values, and the long right-hand tail inherent to real-world data. Therefore, a more practical model is proposed (Eq. 4), which we can call Grate. This includes an amplitude term A (which can “stretch” the entire function up or down on the y-axis), a location parameter δ (which can shift it left or right), an x-axis log-transform (which compresses larger values more than smaller ones), and a base-rate term b (which sets the pdf on a thin, long rectangle capable of moving up or down on the y-axis), added to reflect the notion that all flights harbor some constant risk, no matter how experienced the pilot.

$$\Gamma_{rate} = b + A\,\frac{e^{-(\ln(x)-\delta)/\beta}\,(\ln(x)-\delta)^{\alpha-1}\,\beta^{-\alpha}}{\Gamma(\alpha)} \tag{4}$$

The Empirical Data
Models need values for their parameters. The current work parameterizes Grate on four U.S. GA pilot datasets collected during another project (Knecht & Smith, 2012). Those data span the time period 1/1/2003-8/26/2007 (4.65 yr.), and represent GA pilots licensed after 1995, matched on private-pilot school type and examiner type for the purposes of that particular study.² To boost frequency counts, two official categories of injury (Serious+Fatal) are now combined into a single category. The data are parsed by pilot instrument rating (IR vs. non-IR) and accident status (Accident vs. Non-accident). Table 1 shows set sizes.

² Currently, NTSB accident data beyond 2007 are not readily available to the FAA, for reasons unknown.

Table 1. Pilot statistics in the four datasets modeled.

                  Accident status
Pilot rating   Accs (A)     Non-accs      Totals    Est. annual TFH (B)   Annual acc rate per 100 FH
IR             n11=832      n12=27,528    28,360    7,805,950             .00229
non-IR         n21=1,036    n22=38,291    39,327    2,630,452             .00847
Totals         1,868        65,819        67,687    10,436,402            .00385
(A) Based on 4.65 years’ NTSB accident data.
(B) Based on 2 times the reported past-6-month FH from DIWS database.

These accident data come from the NTSB accident database, by way of FAA’s Accident Investigation and Prevention Division (AVP-210). Non-accident data come from the FAA Comprehensive Airman Information System (CAIS, pronounced “CASS”), managed by the Flight Standards Service (AFS-760), supplemented with pilot TFH from the FAA Document Imaging Workflow System (DIWS), managed by the Aerospace Medical Certification Division (AAM-300).

Each of Table 1’s four datasets is aggregated into x-axis bins 100 TFH wide. Experience shows that a bin width of 100 produces a reasonable balance between the number of bins and the population of each, given datasets in Table 1’s size range. The bins span a range from 0-32,500 TFH, with each bin’s x-value centered mid-way into its data range. Student pilots are excluded, so the actual data range for the first bin is 45-100. Consequently, x1=77.5, x2=150, x3=250...xi>1=100i-50. Bin accident rates are then calculated by Eq. 1 to produce one dataset for IR pilots and a second for non-IR pilots.

Important Considerations
First, it is important to normalize Grate. This requires some explanation. Accidents (which form our rate numerators in each TFH bin) accumulate over time, whereas the corresponding number of non-accident pilots (which forms each bin’s
denominator) is based on a “snapshot” in time and is therefore considered a constant. Consequently, we should normalize our bin accident rates to reflect a standard unit time such as one year. In that event, each bin’s initial rate should be divided by the number of years over which accident data were accumulated (here, 4.65 years) to represent estimated accident rate per year, a.k.a. annualized rate.

Second, we need to clearly understand that, unlike accident frequency histograms, a distribution of rates is not technically a pdf, even though we may try to curve-fit binned rate data with a pdf-like function (e.g., Eq. 4).³ More specifically, to cumulate accident frequencies over a range of TFH, we would compute a pdf’s definite integral over that range. In contrast, to cumulate the probability of having an accident over a range of TFH requires a different method (described later in Eq. 12).

³ The area under a pdf is always constrained to equal 1.0. The area under a distribution of rates is unconstrained.

Parameter Estimation
The NonlinearModelFit function of Mathematica 7.0 (Wolfram, 2008) is used to estimate parameters for Eq. 4. For unconstrained parameters, NonlinearModelFit offers a range of standard numerical methods (e.g., Newton-Gauss, quasi-Newton, Levenberg-Marquardt). For constrained parameters (the method used here), where starting and/or final parameter values are forced to lie within some range pmin<pt<pmax, the Karush-Kuhn-Tucker (KKT) method was used.

The Problem of Noise
Noisy data such as Figure 4’s pose great difficulty for standard least-squares minimization. Parameters often fail to converge in rugged ratescapes, or may converge to a local minimum rather than a global one.

To confront the noise, we first tried the simplest approach of using wider data bins. Surprisingly, this had almost no effect. Accidents within these particular data tended to bunch up in several adjacent bins, leaving no way to smooth them without resorting to arbitrary “bin widths of convenience,” which would negate our attempt to standardize risk exposure by having equal bin widths.

We next tried another standard tactic, namely dropping outliers. Outliers can be defined in various ways, but a rule of thumb is to drop all data points (here, rates) greater than 2.5-3 standard deviations (s) from the group sample mean ȳ

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{5}$$

This method was tested but abandoned for several reasons. Not only was it based on an (here invalid) assumption of distributional normality, but useful information was destroyed, and the extremely long tail of the fitting function resulted in an s so small that we ended up dropping very few of the obviously noisy values in the tail that threatened our curve-fit.

A third standard approach was next tried, involving data-smoothing using moving averages. Unfortunately, this also had the effect of destroying information by smearing, greatly altering the “camel hump” representing the majority of the data.

A fourth approach was then considered. Simulated annealing (Kirkpatrick, Gelatt, & Vecchi, 1983) would involve adding noise to each parameter’s starting value before engaging in the gradient-descent, residual error-minimization process. Then, as that process progressed, we would “cool the noise,” allowing parameter estimates to bounce around and out of local minima until a global near-minimum could be found.

Nonetheless, in the case of noisy rate data, even such an elegant approach unfortunately would still leave us one major logical problem: All our data bins, including the noisy ones, would have equal opportunity to influence the residual sum-of-squares during the data-fit. Any “one bin, one vote” approach would implicitly allow all bins equal influence, no matter how many individuals each bin represented or what that bin’s expected variance was. This would confer inordinately large influence to sparsely populated bins, as well as to larger bins having inherently low reliability.

After being frustrated by the problems outlined above, we finally settled on the standard procedure of weighting empty bins by 0 while weighting each non-empty bin’s accident rate ri by the inverse of its sample variance (1/si²). This is easily done in Mathematica by supplying a vector Weights→w to NonlinearModelFit, where w={w1, w2 ... wi}.

Since our rate data are proportions, 0≤ri≤1, the ith bin weight wi becomes

$$w_i = \frac{1}{s_i^2} = \frac{n_i-1}{r_i(1-r_i)\left(1-(n_i/N_i)\right)} \tag{6}$$

ni being the ith bin’s total frequency count (accidents+non-accidents) used to derive ri, and Ni being the number of pilots in the general population with that range of TFH.

In this particular instance, we do not know Ni, so we shall assume a constant sample ratio ni/Ni, making the term 1-(ni/Ni) constant for all bins. Since weights only need to be expressed relative to each other during curve-fitting, we can eliminate this constant term, leading to a functional weight of

$$w_i = \frac{n_i-1}{r_i(1-r_i)} \tag{7}$$

Unfortunately, Eq. 7 is ill-behaved when ri=0 or 1, leading to division by zero. This can arise when either TFH bins are too narrow or when TFH is large, making the bin sample size ni too small, since few pilots have high TFH. Figure 6 (top) illustrates the problem.
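The binning arithmetic described above (Eq. 1, the 4.65-year annualization, and the Eq. 7 weights) can be sketched as follows. The helper names and the toy bin counts are hypothetical and chosen only for illustration; the sketch returns `None` wherever Eq. 7 would divide by zero rather than assuming any particular remedy, and it uses the raw bin proportion as ri, which is an assumption since the text does not say whether ri is annualized before weighting. The last function reproduces the "annual accident rate per 100 FH" column of Table 1 from the table's own counts.

```python
YEARS = 4.65  # accident accumulation period reported in the text

def bin_rate(accidents, nonaccidents):
    """Eq. 1: proportion of pilots in a bin who had an accident (None for an empty bin)."""
    n = accidents + nonaccidents
    return None if n == 0 else accidents / n

def annualize(rate, years=YEARS):
    """Divide a bin rate by the accumulation period to get a per-year rate."""
    return None if rate is None else rate / years

def eq7_weight(accidents, nonaccidents):
    """Eq. 7: (n - 1) / (r (1 - r)); 0 for empty bins, None where Eq. 7 is undefined."""
    n = accidents + nonaccidents
    if n == 0:
        return 0.0          # empty bins are weighted 0
    r = accidents / n
    if r == 0.0 or r == 1.0:
        return None         # division by zero in Eq. 7
    return (n - 1) / (r * (1.0 - r))

def annual_rate_per_100_fh(accidents, annual_tfh, years=YEARS):
    """Table 1, last column: accidents per year per 100 flight hours."""
    return accidents / years / annual_tfh * 100.0

# Hypothetical (accidents, non-accidents) counts, NOT taken from the report.
toy_bins = [(12, 4000), (9, 2500), (1, 3), (0, 2), (0, 0)]
for acc, non in toy_bins:
    print(acc, non, annualize(bin_rate(acc, non)), eq7_weight(acc, non))

print(round(annual_rate_per_100_fh(832, 7_805_950), 5))    # IR row of Table 1
print(round(annual_rate_per_100_fh(1036, 2_630_452), 5))   # non-IR row of Table 1
```

The sparsely populated toy bins show the two failure modes discussed in the text: a bin of four pilots with one accident yields a very high rate with a small weight, while a bin with no accidents leaves Eq. 7 undefined.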

Figure 6. (red, solid lines) With Eq. 7, as r→0 or 1, 1/(r(1-r))→∞; (top, dashed, blue line) With ε=.001, Eq. 8 gives a good approximation across most of r, from around .006<r<.994, but gracefully self-limits at r=0 and 1.

To address this issue, Eq. 7 is modified slightly by adding a term ε designed to constrain behavior at 0 and 1 (Eq. 8). As Figure 6 shows, when ε is small, Eq. 8 closely approximates Eq. 7 over most of its range but self-limits gracefully at ri=0 and 1.

Parameter Start Values and Evolution
In multidimensional parameter spaces with unknown local minima, the choice of start values can be critical. A hybrid approach was used here. Start values were first selected that roughly approximated the shape of the binned data, namely A=1, α=6, β=0.5, δ=b=0. Figure 7 shows the shape of that plot.

Figure 7. Graph of Eq. 4 with A=1, α=6, β=0.5, δ=b=0.

To allow parameters to evolve in a manner emulating annealing, NonlinearModelFit was set up to use weighted binned data and to run inside a For[i=1, γ >1.0, i=i+1] loop. Inside the For[] loop, to emulate noise injected at each iteration, the value of each parameter pj was multiplied by a random real number drawn from the range 1.0±γ, with γ starting at 0.02.

During each (ith) iteration of the For[] loop, NonlinearModelFit was itself allowed to run for i iterations, after which i increased by 1 while γ decreased by a small fixed amount (.0002). γ thus emulated the cooling rate. Eventually, γ fell below 1.0, halting the injection of parameter noise and terminating the For[] loop after one final gradient descent.

Running this method repeatedly on a given dataset, we could broadly sample the parameter space. Once inspection of the graphical data-fit plots verified a good fit, and multiple runs showed parameters agreeing to three significant figures, we could be confident that those parameter values represented a near-optimal data-fit.

RESULTS

Modeling the Accident Rate Histograms
Below are the curve-fits for the empirical data, our two accident rate histograms. Figure 8 represents non-IR pilots, Figure 9, IR pilots. Grate models are overlaid and parameter estimates given beneath.

Parameter    A       α       β       δ       b       R²w
N=39327      .0130   64.38   .0924   .3648   .0025   .654
Figure 8. Non-IR GA accident rates (median TFH=250.5).

Parameter    A       α       β       δ       b       R²w
N=28360      .0174   51.35   .0890   2.243   .0011   .775
Figure 9. IR GA accident rates (median TFH=823.5).

Each Figure shows accident rate (y-axis) plotted by TFH-of-pilot-at-the-time-of-accident (x-axis, bin width = 100). Each bin’s raw number of accidents (based on 4.65 years’ data) was divided by the total number of pilots (accident+non-accident) in the same range of TFH to produce a bin rate for that 100-hr range.⁴ The resulting rates were then divided by 4.65 to represent one average year’s rates before starting the curve-fitting process. Keep in mind that the height of the y-axis represents the probability of having an accident over a 100-FH span, x±50.

⁴ Since each rate bin is 100 TFH wide, the per-FH rate is simply the height of the y-axis divided by 100.

Grate is based on Eq. 4 with data weighted by Eq. 8 (ε=.001). Relative weights are drawn in green and have been scaled to fit within the Figure. Empty bins (ni=0) were weighted 0 to prevent them from influencing the curve-fit. The yellow band surrounding Grate represents its 95% confidence interval (CI).

Model Evaluation
A model’s ability to predict empirical data can be expressed by a variety of metrics. One of the simplest measures of goodness-of-fit is the coefficient of determination, R². R² varies between 0 and 1 and estimates the proportion of explained variance. The weighted form of R² is nominally

$$R_w^2 = 1 - \frac{SS_{w,error}}{SS_{w,total}} = 1 - \frac{\sum_{i=1}^{n} w_i (y_i - f_i)^2}{\sum_{i=1}^{n} w_i (y_i - \bar{y}_w)^2} \tag{9}$$

with weighted means, for instance ȳw, of the form

$$\bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i} \tag{10}$$

However, Eq. 9 can throw values < 0 with nonlinear data, so we opt for the robust form of R²w = r²w, the square of the weighted correlation coefficient

$$r_w = \frac{\sum_i w_i (x_i - \bar{x}_w)(y_i - \bar{y}_w)}{\sqrt{\left(\sum_i w_i (x_i - \bar{x}_w)^2\right)\left(\sum_i w_i (y_i - \bar{y}_w)^2\right)}} \tag{11}$$

With these data, Eq. 11 yields r²w(non-IR) = .654 and r²w(IR) = .775, both in the “moderate” range of fit. This is surprisingly high, given the noisy data. The explanation is straightforward. We see from the distribution of weights and the .95CIs that these relatively high r²w values are due to the weighting function (Eq. 8) assigning large weights mostly to low values of TFH. The large majority of pilots have low TFH (median values x̃NIR=250.5, x̃IR=823.5). Large values of n increase the numerator of Eq. 8, and thus bin weight. In practical terms, weights are simply reliability estimates that are higher when based on larger numbers coupled with proportions close to 0 or 1. The .95CI reflects this. As weights shrink, .95CI widens.

The downside of high reliability at low TFH is lower reliability at high TFH. Figures 8 and 9 show the penalty imposed by rate noise. We can predict fairly reliably within the low-end range of about 45-250 TFH for non-IR (representing about 50% of non-IR pilots) and about 45-500 TFH for IR pilots (≈30%). More accurate prediction awaits the arrival of larger datasets, an important point we will revisit in the Discussion section.

USING Grate

Grate can either be used as a point-estimate of relative accident risk, or as a measure of cumulative risk over a known range of TFH. The latter is preferable but requires accurate values of TFH for each pilot at known start and end times. These are rarely available. Therefore, both estimators are defined.

A Point-Estimate of GA Flight Risk
To estimate the relative accident rate for a given value of TFH=x (meaning the range from x-50 to x+50, flown over the course of 1 year), simply populate Eq. 4’s parameters given the instrument rating of the pilot, and insert x. A variety of mathematical and statistical programs will handle the computation, including estimating Γ(α) (Mathematica, MATLAB, SPSS, SAS, and Excel, to name a few).

To illustrate, for non-IR pilots with the median value of TFH, x̃NIR=250.5, Eq. 4 becomes

$$\Gamma_{rate} = .0025 + .013\,\frac{e^{-(\ln(250.5)-.3648)/.0924}\,(\ln(250.5)-.3648)^{64.38-1}\,.0924^{-64.38}}{\Gamma(64.38)}$$

With Γ(64.38) ≈ 9.611×10^87, Grate = .0068 (≈1 in 150) expected serious-to-fatal accidents per 100 TFH, which we can see is correct from Figure 8.⁵

⁵ Gamma can be easily determined using the Microsoft Excel function =EXP(GAMMALN(α))

This point-estimate can be used as a statistical covariate to control for estimated relative flight risk when the data contain only a single value of TFH for each pilot. What the point-estimate essentially embodies is an estimated average probability of having an accident while flying 100 flight hours over the course of one year, given that value of TFH=x as the midpoint (i.e., x±50).

What we need to keep in mind while using such a point-estimate is the implicit assumption that a) all pilots fly the same number of hours per year, and b) their accident rate does not change during that time. Naturally, both assumptions are false for any given pilot and can only be safely ignored where large sample sizes tend to dampen statistical randomness.

In spite of these deficiencies, the point estimate should mark an improvement over a linear assumption of flight risk because it embodies a nonlinear relation derived from a reasonably large sample of GA pilots.

We now turn to a second, more comprehensive estimate of flight risk.
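The point-estimate worked above can be checked with a short script. This is a sketch rather than the report's own procedure: the function name `grate` and the `PARAMS` dictionary are hypothetical, the parameter values are the ones listed under Figures 8 and 9, and `math.lgamma` stands in for the Excel `GAMMALN` route mentioned in the text. Returning the base rate b when ln(x) ≤ δ is an assumption made here so the function stays defined for very small TFH.

```python
import math

# Fitted parameters as listed under Figures 8 and 9.
PARAMS = {
    "non-IR": dict(A=0.0130, alpha=64.38, beta=0.0924, delta=0.3648, b=0.0025),
    "IR":     dict(A=0.0174, alpha=51.35, beta=0.0890, delta=2.243,  b=0.0011),
}

def grate(tfh, A, alpha, beta, delta, b):
    """Eq. 4: annualized accident rate per 100 TFH at total flight hours tfh."""
    u = math.log(tfh) - delta
    if u <= 0:
        return b  # sketch assumption: only the base rate applies below the shifted origin
    # Evaluate the gamma term in log space; Gamma(64.38) is about 1e88.
    log_core = (-u / beta
                + (alpha - 1.0) * math.log(u)
                - alpha * math.log(beta)
                - math.lgamma(alpha))
    return b + A * math.exp(log_core)

non_ir_median = grate(250.5, **PARAMS["non-IR"])
ir_median = grate(823.5, **PARAMS["IR"])
print(f"non-IR at 250.5 TFH: {non_ir_median:.4f} per 100 TFH")
print(f"IR at 823.5 TFH:     {ir_median:.4f} per 100 TFH")
```

The non-IR value should land close to the .0068 quoted in the text. Working in log space avoids forming Γ(α) or β^(-α) explicitly, both of which are extremely large for these parameter values.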

A Cumulative Estimate of GA Flight Risk Pseudocode for ∏ will resemble
In the fortunate circumstance where our data contain two myPi=1.0;
separate, time-stamped values of TFH for each pilot,6 we can For[x=347 to 633;
represent each pilot’s annualized accident probability as myPi = myPi *(1-(Grate(x)/100));
1  x2
 Γrate ( x) dx   ];
1 − lim ∏ 1 −
t 2 − t1  dx→0 TFH = x1  100  
 (12)
Final=(365/861)*(1- myPi);

where t2-t1 is the number of years over which the number of Here, Eq. 13 evaluates to about 0.0100 effective annual prob-
hours (TFH2-TFH1) were flown. The term Grate(x) is the value of ability of having a serious-to-fatal accident, given the number
Eq. 4, appropriate to the pilot’s instrument rating, at TFH=x. of hours flown.
Bin width is dx, which we could ideally shrink to zero to find
the limit of the definite product ∏. Note that the constant DISCUSSION
1/100 corrects for the original bin width of 100 TFH that we
first used during curve-fitting. That is, Grate is now renormalized Can total flight hours predict general aviation accident rates? If
to an hourly rate, and the total number of bins used to calculate so, what does that relation look like? Is there a “killing zone”—a
the limit of Eq. 12 would be (TFH2-TFH1)/dx. range of TFH over which GA pilots are at greatest risk? These
Eq. 12 represents an idealized process of finding the overall questions interest pilots, aviation policy makers, and insurance
probability of having an accident over a specified period of time. underwriters alike.
Each separate bin’s probability pi of having an accident is first Craig (2001) proposed that such a killing zone does indeed
transformed into the probability 1-pi of not having an accident exist, and that it spans the range of approximately 50 to 350 total
over the small period dx. These separate 1-pis are then multiplied flight hours. Unfortunately, his analysis relied solely on ­accident
together to let ∏ represent the probability pnoAcc of not having frequency counts, which fails to control for the number of non-
any accident across TFH2-TFH1. This then makes 1-pnoAcc the accident pilots having equivalent TFH. The use of accident rates
probability of having at least one accident during TFH2-TFH1. solves that problem by dividing the number of accident pilots
Finally, (1-pnoAcc)/(t2-t1) represents that as an annualized rate. by the number of non-accident pilots in each bin of our TFH
Finding a closed solution for the limit of Eq. 12 is elusive frequency histograms.
and unnecessary in practice. We can let dx=1.0 and get solutions Now, given this more proper methodology, if such a killing
accurate to about four decimal places—precise enough for our zone truly exists, what does it look like? Many aviation research
noisy empirical data. Eq. 12 then reduces to studies implicitly assume a straight-line relation between accident
rates and TFH. They merely assume that risk decreases as pilots
1  x2
 Γrate ( x)   get more experienced. In fact, that relation appears markedly
1 − ∏ 1 −
t 2 − t1  TFH = x1  100  
 (13)
The present work uses a nonlinear, gamma-based function
Eq. 13 is easily computed by a variety of common programs, (Grate) to predict GA accident rates, even from extremely noisy
using a simple For[] loop. Excel will do this as a macro, SPSS TFH data. Two log-transformed sets of 2003-2007 NTSB/FAA
will do it with syntax, and so forth. data produced weighted goodness-of-fit (R2w) of .654 and .775 for
To illustrate, suppose an IR pilot has 347 TFH reported on 3 non-instrument-rated and instrument-rated pilots, respectively,
Feb, 2012, and 633 TFH on 13 Jun, 2014 (i.e., over 861 days). considered to be “moderately good” by convention.
With parameters from Figure 9, and Γ(51.35)=1.202⋅1065, Grate

$$\Gamma_{rate}(x) = .0011 + .0174\,\frac{e^{-(\ln(x)-2.243)/.089}\,(\ln(x)-2.243)^{50.35}\,.089^{-51.35}}{1.202\cdot10^{65}}$$

and Eq. 13 is

$$\frac{365}{861}\left[1 - \prod_{x=347}^{633}\left(1 - \frac{\Gamma_{rate}(x)}{100}\right)\right]$$

All raw data should be checked to ensure that t2>t1 and TFH2>TFH1. Experience with self-reported FAA data shows that this is usually (but not always) the case.
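The worked example for Eq. 13 (347 to 633 TFH over 861 days) can be sketched as a short loop. The Python below is only an illustration under stated assumptions, not the code used for this report: the function names `gamma_rate` and `annualized_rate` are hypothetical, TFH values are assumed to be whole hours, the gamma term is evaluated in log space with `math.lgamma` rather than with the rounded constant 1.202⋅10^65, the year is taken as 365 days to match the 365/861 factor, and returning the baseline when ln(x) ≤ 2.243 is an assumption about behavior outside the range shown in the example.

```python
import math
from datetime import date

# Parameter values quoted in the worked example (IR pilots, Figure 9).
BASELINE = 0.0011   # additive baseline term
SCALE = 0.0174      # multiplier on the gamma density
SHIFT = 2.243       # subtracted from ln(TFH)
THETA = 0.089       # gamma scale parameter
K = 51.35           # gamma shape parameter; Gamma(K) is about 1.202e65


def gamma_rate(x):
    """Gamma-based rate at x total flight hours, as in the worked example."""
    y = math.log(x) - SHIFT
    if y <= 0:
        # Assumption: the gamma term vanishes here, leaving the baseline.
        return BASELINE
    # Log of y^(K-1) * exp(-y/THETA) * THETA^(-K) / Gamma(K).
    log_density = ((K - 1) * math.log(y) - y / THETA
                   - K * math.log(THETA) - math.lgamma(K))
    return BASELINE + SCALE * math.exp(log_density)


def annualized_rate(tfh1, tfh2, t1, t2):
    """Eq. 13 with dx = 1: annualized probability of at least one accident."""
    # Data check from the text: require t2 > t1 and TFH2 > TFH1.
    if not (t2 > t1 and tfh2 > tfh1):
        raise ValueError("need t2 > t1 and TFH2 > TFH1")
    years = (t2 - t1).days / 365
    p_no_acc = 1.0
    for x in range(tfh1, tfh2 + 1):   # product over x = x1 .. x2, inclusive
        p_no_acc *= 1.0 - gamma_rate(x) / 100.0
    return (1.0 - p_no_acc) / years


rate = annualized_rate(347, 633, date(2012, 2, 3), date(2014, 6, 13))
print(f"annualized rate: {rate:.4f}")
```

Records that fail the t2>t1 or TFH2>TFH1 check raise an error here; in practice they might instead be logged and set aside for review.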

We end with an appeal to the FAA and NTSB. The data needed
to compute accident rates such as these come from multiple
sources that are hard to access in the U.S. These include the
FAA CAIS and DIWS databases, as well as the NTSB Aviation
Accident Database. These databases do not intercommunicate
well, and could be greatly improved by sharing a common pilot
identifier code. The FAA has such an identifier called UniqueID,
and sharing that with the NTSB would allow researchers to
conduct studies with far greater sample sizes and reliability than
is currently possible.
Nonetheless, we have explored a methodology here that will hopefully prove useful, either in its own right, or as a stimulus to further research.

Figure 10. Annualized non-IR GA accident rates (median TFH=…).
Figure 11. Annualized IR GA accident rates (median TFH=823.5).

Figures 10 and 11 show the rate data with the modeling function superimposed. The yellow region surrounding each modeling curve represents the .95 confidence interval.
Consistent with our intuition and the frequency count studies, these models suggest that a “killing zone” indeed exists. Accident rates seem to increase for GA pilots early in their post-certification careers, reach a peak, and then decline with greater flight experience.
However, that zone may be far broader than earlier imagined. Relatively high risk for an individual pilot may extend well beyond the 2,000-hour mark before leveling off to a baseline rate.
For now, we should consider these conclusions tentative. First, the 67,687 pilots in these datasets constitute just over 10% of those currently licensed in the U.S. While this might represent an excellent dataset size in other venues, the relatively small number of pilots at high TFH leads to considerable noise in high-TFH rates. Data-weighting can overcome part of the problem but leads to wide confidence intervals at high TFH, reflecting the low statistical reliability of high-TFH data.
The net effect is that, at this point in time, we have high confidence in our data only at relatively low values of TFH. Better results await improved cross-communication between the FAA and NTSB databases, which will result in much larger sample sizes and much greater reliability across the full spectrum of TFH.

REFERENCES

Bazargan, M., & Guzhva, V.S. (2007). Factors contributing to fatalities in general aviation. World Review of Intermodal Transportation Research, 1(2), 170-181.

Chiew, F.H.S., Srikanthan, R., Frost, A.J., & Payne, E.G.I. (2005). Reliability of daily and annual stochastic rainfall data generated from different data lengths and data characteristics. In: MODSIM 2005 International Congress on Modeling and Simulation, Modeling and Simulation Society of Australia and New Zealand, Melbourne, December 2005, pp. 1223-1229.

Craig, P.A. (2001). The killing zone. New York: McGraw-Hill.

Dionne, G., Gagne, R., & Vanasse, C. (1992). A statistical analysis of airline accidents in Canada, 1976-1987. (Report No. CRT-811). Montreal: Center for Research on Transportation.

Guohua, L., Baker, S.P., Grabowski, J.G., Qiang, Y., McCarthy, M.L., & Rebok, G.W. (2003). Age, flight experience, and risk of crash involvement in a cohort of professional pilots. American Journal of Epidemiology, 157(10), 874-80.

Hogg, R.V., & Klugman, S.A. (1984). Loss distributions. New York: Wiley.

Kirkpatrick, S., Gelatt, C.D., & Vecchi, M.P. (1983). Optimization by simulated annealing. Science, 220, 671-80.

Knecht, W.R. (2012). Predicting general aviation accident frequency from pilot total flight hours. (Technical Report No. DOT/FAA/AAM-12/13). Washington, DC: Federal Aviation Administration, Office of Aerospace Medicine.

Knecht, W.R., & Smith, J. (2013). Effects of training school type and examiner type on general aviation flight safety. (Technical Report No. DOT/FAA/AAM-12/4). Washington, DC: Federal Aviation Administration, Office of Aerospace Medicine.

Mackey, J.B. (1998). Forecasting wet microbursts associated with summertime airmass thunderstorms over the southeastern United States. Unpublished MS Thesis, Air Force Institute of Technology.

Mills, W.D. (2005). The association of aviator’s health conditions, age, gender, and flight hours with aircraft accidents and incidents. Unpublished Ph.D. dissertation. University of Oklahoma, Health Sciences Center Graduate College, Norman, OK.

National Transportation Safety Board. (2011). User-downloadable database. Downloaded May 26, 2011

Sherman, P.J. (1997). Aircrews’ evaluations of flight deck automation training and use: Measuring and ameliorating threats to safety. Unpublished doctoral dissertation. The University of Texas at Austin.

Spanier, J., & Oldham, K.B. (1987). An atlas of functions. New York: Hemisphere.

Wolfram Mathematica Documentation Center. (2008). Some notes on internal implementation. Downloaded April 29, 2011 from note/SomeNotesOnInternalImplementation.html#20880

