Ann. Occup. Hyg., Vol. 51, No. 7, pp.
611–632, 2007
© The Author 2007. Published by Oxford University Press
on behalf of the British Occupational Hygiene Society
doi:10.1093/annhyg/mem045
A Comparison of Several Methods for Analyzing Censored Data
PAUL HEWETT1* and GARY H. GANSER2
1Exposure Assessment Solutions, Inc., Morgantown, West Virginia; 2Department of Mathematics,
West Virginia University, Morgantown, West Virginia
Received 5 March 2007; in final form 17 August 2007
The purpose of this study was to compare the performance of several methods for statistically
analyzing censored datasets [i.e. datasets that contain measurements that are less than the field
limit-of-detection (LOD)] when estimating the 95th percentile and the mean of right-skewed oc-
cupational exposure data. The methods examined were several variations on the maximum like-
lihood estimation (MLE) and log-probit regression (LPR) methods, the common substitution
methods, several non-parametric (NP) quantile methods for the 95th percentile and the NP
Kaplan–Meier (KM) method. Each method was challenged with computer-generated censored
datasets for a variety of plausible scenarios where the following factors were allowed to vary ran-
domly within fairly wide ranges: the true geometric standard deviation, the censoring point or
LOD and the sample size. This was repeated for both a single-laboratory scenario (i.e. single
LOD) and a multiple-laboratory scenario (i.e. three LODs) as well as a single lognormal distri-
bution scenario and a contaminated lognormal distribution scenario. Each method was used to
estimate the 95th percentile and mean for the censored datasets (the NP quantile methods esti-
mated only the 95th percentile). For each scenario, the method bias and overall imprecision (as
indicated by the root mean square error or rMSE) were calculated for the 95th percentile and
mean. No single method was unequivocally superior across all scenarios, although nearly all
of the methods excelled in one or more scenarios. Overall, only the MLE- and LPR-based meth-
ods performed well across all scenarios, with the robust versions generally showing less bias than
the standard versions when challenged with a contaminated lognormal distribution and multiple
LODs. All of the MLE- and LPR-based methods were remarkably robust to departures from the
lognormal assumption, nearly always having lower rMSE values than the NP methods for the
exposure scenarios postulated. In general, the MLE methods tended to have smaller rMSE val-
ues than the LPR methods, particularly for the small sample size scenarios. The substitution
methods tended to be strongly biased, but in some scenarios had the smaller rMSE values, espe-
cially for sample sizes <20. Surprisingly, the various NP methods were not as robust as expected,
performing poorly in the contaminated distribution scenarios for both the 95th percentile and
the mean. In conclusion, when using the rMSE rather than bias as the preferred comparison
metric, the standard MLE method consistently outperformed the so-called robust variations
of the MLE-based and LPR-based methods, as well as the various NP methods, for both the
95th percentile and the mean. When estimating the mean, the standard LPR method tended
to outperform the robust LPR-based methods. Whenever bias is the main consideration, the ro-
bust MLE-based methods should be considered. The KM method, currently hailed by some as
the preferred method for estimating the mean when the lognormal distribution assumption is
questioned, did not perform well for either the 95th percentile or mean and is not recommended.
Keywords: censored data analysis; limit-of-detection
*Author to whom correspondence should be addressed. Tel: 001 304 685 7050; e-mail: phewett_2006_07@oesh.com

INTRODUCTION

As exposure limits decrease and exposure controls improve, an increasingly frequent occurrence is the left-censored dataset; that is, a dataset where one or more measurements are less than the field limit-of-detection (LOD; i.e. the laboratory LOD divided by the sample volume) for a particular combination of sampling method, flow rate and sample time. Left-censored datasets tend to occur whenever there is a high LOD relative to exposures;
the exposures span several orders of magnitude; or the sample time is short or the flow rate is low, resulting in a small sample volume. Published censored data analysis (CDA) methods fall into four general categories: substitution methods, log-probit regression (LPR) methods, maximum likelihood estimation (MLE) methods and non-parametric (NP) methods. Within each category or family, there are several variations, usually developed to reduce transformation bias (discussed later) or for the situation where it is suspected that the underlying distribution departs significantly from the assumed lognormal distribution in the hope of reducing the bias or improving overall accuracy (defined as bias plus precision) when estimating the ‘mean concentration’ (Helsel, 2005). This has resulted in numerous peer-reviewed articles that offer sometimes contradictory guidance regarding which is the preferable method. Information on the accuracy of upper percentile compliance statistics, such as the 95th percentile, when calculated from a censored dataset, is difficult to find. Helsel (2005), author of a recently published text devoted entirely to CDA, strongly recommended using the NP Kaplan–Meier (KM) method to estimate the mean whenever the dataset is <50% censored and ‘robust’ parametric methods in other instances. Last, the simple substitution methods, which continue to be commonly used, are often condemned in the literature in preference to these other methods.

We distinguish between ‘simple-censored’ and ‘complex-censored’ datasets. A simple-censored dataset contains one or more measurements censored at a single LOD, or two or more LODs, but all at the low end. In contrast, a complex-censored dataset contains measurements censored at two or more LODs with uncensored measurements scattered in between (Bullock and Ignacio, 2006). In our experience, most censored datasets appear to be of the simple-censored variety: a single laboratory is used and the exposure profile for the exposure group is reasonably stable. Complex datasets occur whenever investigators combine data from several studies where different laboratories were used or combine data from exposure groups that were collected across some broad span of time during which several laboratories were utilized. Furthermore, with large datasets comes the possibility that the data reflect different exposure distributions as exposure profiles are unlikely to be stable for periods of more than a year (Symanski et al., 1996).

In this paper we address the following questions:
What is the method bias and overall accuracy when estimating the 95th percentile or mean of a lognormal or contaminated lognormal exposure profile?
Assuming that the underlying exposure profile is reasonably lognormal, which CDA methods should be considered?
Which method should be used for complex-censored datasets and/or when we suspect that the underlying exposure distribution departs significantly from the single lognormal assumption?
Is there an ‘omnibus’ method that should be the first choice, regardless of the sample size, (observed) percent censored, complexity of censoring or variability in the data?

To address these questions, we estimated the bias and root mean square error (rMSE) for each of the CDA methods when estimating a commonly used compliance statistic (i.e. the 95th percentile) and the exposure profile mean (often used in environmental evaluations and epidemiological studies). The bias is a function not only of the CDA method employed but also of the true geometric standard deviation (GSD), true percent censored and the sample size. The rMSE for each method is an estimate of the overall accuracy (i.e. overall imprecision), which is a function of both bias and precision.

While earlier studies of CDA methods (discussed later) are helpful for sorting out the above issues, they often had limitations. Obviously, the very early studies will be dated with respect to newer methods. Most studies, even the relatively recent studies, focused on only a limited number of methods. Many of the studies averaged their results across all of the posited scenarios and underlying distributions, making it difficult, if not impossible, to sort out the results for a specific method and type of underlying distribution. Most of the studies considered only one or a few sample sizes and/or only a few levels of variability. Many of the studies generated only 500–1000 simulated datasets per method, which can result in variable and misleading bias and rMSE values; in this study we attempted to avoid this by using 100 000 as the simulation sample size. Several studies examined the robustness of the CDA methods by challenging them with data drawn from identical ‘contaminated lognormal’ distributions, that is, distributions created by combining two lognormal distributions. However, these contaminated distributions were similar in shape to a standard lognormal and thus not a severe challenge for the methods. Last, all of the studies focused exclusively or primarily on estimating the mean of a lognormal, right-skewed distribution. Our main interests are the upper percentile estimates, such as the 95th percentile exposure (Mulhausen and Damiano, 1998; Bullock and Ignacio, 2006), and while a few of the studies passingly addressed upper percentiles, none did so comprehensively enough for us to draw reliable inferences.

We recognize that many aspects of our computer simulations repeat work reported in earlier papers.
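As a minimal sketch of the two comparison metrics introduced above, the bias and rMSE of a set of simulated estimates might be computed as follows. The function name `bias_and_rmse`, the example numbers and the use of absolute (rather than relative or percent) units are assumptions made for illustration; they are not taken from the authors' program.

```python
import math

def bias_and_rmse(estimates, true_value):
    """Return (bias, rMSE) of simulated estimates against the known true value."""
    n = len(estimates)
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / n                                   # average signed error
    rmse = math.sqrt(sum(err * err for err in errors) / n)   # overall accuracy
    return bias, rmse

# Hypothetical estimates of a parameter whose true value is 10.0
estimates = [9.0, 11.5, 10.5, 12.0, 9.5]
bias, rmse = bias_and_rmse(estimates, 10.0)

# rMSE combines bias and precision: MSE = bias^2 + variance of the estimates
mean_est = sum(estimates) / len(estimates)
variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
assert math.isclose(rmse ** 2, bias ** 2 + variance)
```

The final assertion illustrates why a strongly biased method can still have a small rMSE: a low variance can offset the squared bias.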
For example, we will address the analysis of both lognormal and contaminated lognormal exposure profiles, examine the effect of LODs in ranges of 1–50% and 50–80% of the underlying distribution and cover sample sizes ranging from 5 to 100. Furthermore, we will contrast and compare the most common examples from all four families of CDA methods, nearly all of which have been addressed by one or more previous papers. However, we feel these repeats are justified as our results provide interesting and sometimes surprising comparisons between the various methods.

We purposefully did not address confidence intervals in this study. Neither did the majority of the previous investigators. Some authors have examined and promoted the use of the bootstrap or jackknife methods for devising confidence intervals (Shumway et al., 2002; Helsel, 2005). Helsel (2005) devotes a chapter to calculating parameter confidence intervals for censored datasets and discusses the various alternatives.

Last, we suspected at the outset that it was unlikely that a single method would always perform better than the others, regardless of the exposure scenario, which indeed proved true. Therefore, our goal is to inform the reader regarding the ability of each of the selected methods to extract unbiased and precise estimates of the usual parameters from datasets where the underlying distribution is either lognormal or contaminated lognormal, as well as from both simple- and complex-censored datasets. To accomplish this goal, we devised several exposure scenarios with the expectation that the reader will select the scenarios that best describe their experience and focus on the methods that perform best under these conditions.

CENSORED DATA ANALYSIS METHODS

Published or recommended methods for analyzing such datasets tend to fall into four categories or families:
substitution methods,
LPR methods,
MLE methods, and
NP methods.

Substitution methods

The three common substitution methods are LOD, LOD/2 and LOD/√2 substitution, although the choice of the substitution fraction is largely arbitrary. Substitution of each ‘less than value’ with the LOD continues to occur, despite the numerous recommendations against, with the invariable justification that such a practice is conservative (LOD substitution tends to result in a ‘conservative’ or positive bias for the mean and a negative bias for variability. The reduction in variability tends to result in a negative bias for the 95th percentile (which is calculated from the sample GM and sample GSD)). LOD/2 substitution appears to be the CDA method of choice in the epidemiological literature whenever large, complex-censored datasets are used to construct a job-exposure matrix (Hornung and Reed, 1990; Glass and Gray, 2001). When estimating the true geometric mean (GM) and GSD, Hornung and Reed (1990) recommended using LOD/√2 substitution whenever it was suspected that the underlying GSD is <3, and LOD/2 substitution otherwise. But when estimating the mean, they recommended the LOD/2 method provided the percent censoring was <50%. Their paper is frequently referenced to justify using the LOD/2 substitution method.

All of the substitution methods are biased, and this bias will be a function of the true GSD, the true percent censored and the sample size. As the sample size increases, the bias asymptotically approaches a fixed value. El-Shaarawi and Esterby (1992) derived formulae for directly calculating the large sample bias for the mean when using substitution methods, given known values for the GM, GSD and the true percent censored. But since their formulae cannot be used to determine the bias for small sample sizes, bias when estimating upper percentiles, or the overall accuracy (bias plus precision) of the estimates and cannot be applied to complex-censored or contaminated lognormal distributions, we included the substitution methods in our simulations.

LPR methods

In occupational health, the LPR method has long been recommended (Hawkins et al., 1991; Mulhausen and Damiano, 1998) for analyzing censored data. The data, including the LODs, are sorted and plotted using log-probability plotting paper. This method, the LPR method, is based on the following relationship, which is derived from the Z-value equation:

$$y_i = \hat{\mu}_y + \hat{\sigma}_y \Phi^{-1}(p_i),$$

where $y_i = \ln(x_i)$ and $\Phi^{-1}(p_i)$ refers to the inverse cumulative normal distribution for plotting position $p_i$. This is a linear equation which is solved for the non-LOD pairs of $y_i$ and $\Phi^{-1}(p_i)$. The sample GM and sample GSD are estimated using the exponential of the intercept and slope, respectively. Blom’s formula is typically used for calculating the $i$th plotting position: $p_i = (i - 3/8)/(n + 1/4)$.

A variation on this method is the robust LPR (LPRr) method, which is felt to be less susceptible to departures from the lognormal assumption (Kroll and Stedinger, 1996) and avoids transformation bias [transformation bias refers to the bias in the estimate of the mean that results when the mean (on the concentration scale) is calculated from the sample GM and sample GSD (which are estimated using
the log-transformed data); the minimum variance unbiased estimator equation (Mulhausen and Damiano, 1998) is typically used with complete (i.e. uncensored) samples to reduce this bias] since the mean can be estimated using the simple arithmetic mean. With LPRr, the missing values, represented by the non-detects, are predicted using the ‘initial’ values of the sample GM and sample GSD determined using LPR. The ‘final’ sample estimates are calculated by combining the predicted values and the detects and analyzing the dataset in the conventional fashion for the sample GM and GSD (and from these calculating the sample 95th percentile) and for the sample arithmetic mean (thus avoiding transformation bias).

In principle, neither the LPR nor the LPRr method should be applied to complex-censored datasets. Helsel and Cohn (1988) devised an ad hoc method that can be applied to a complex-censored dataset by evenly spreading the LODs throughout the lower portion of the dataset. This method, which we will call the robust, multiple LOD LPR method (LPRrm), was recommended by Helsel and Cohn (1988) whenever the lognormal distribution assumption is in doubt. The methodology is complex, so the reader is referred to the original articles. Helsel (2005) provided a worked example of the LPRrm method, which he referred to as the ‘robust regression on order statistics’ method.

MLE methods

The MLE method is often considered the gold standard provided the data are well described by a lognormal distribution. The sample GM and GSD are those values that maximize the likelihood function:

$$\mathrm{LF} = \prod_{i=k+1}^{n} \mathrm{pdf}(\ln x_i \mid \ln \mathrm{GM}, \ln \mathrm{GSD}) \prod_{j=1}^{k} \mathrm{cdf}(\ln x_j \mid \ln \mathrm{GM}, \ln \mathrm{GSD}),$$

where $n$ = sample size (including both censored and uncensored data), $k$ = number of censored data, pdf refers to the probability density function and cdf refers to the cumulative distribution function. Since there is no closed-form solution to this equation, Finkelstein and Verma (2001) recommended using the solver function available in most spreadsheets to find the optimal solution. They provided an easy-to-follow example that can be extended to virtually any sample size and degree of censoring.

If the underlying distribution is felt to depart significantly from the lognormal distribution assumption, Kroll and Stedinger (1996) recommended using robust maximum likelihood estimation (MLEr) where the MLE method is used to derive the initial estimates of the sample GM and GSD. As with LPRr, the missing values (i.e. the censored exposures) are predicted using the initial sample GM and GSD and then combined with the detects. The final sample GM and GSD, as well as the simple arithmetic mean (thus avoiding transformation bias), are calculated in the conventional manner using the combined dataset.

While in principle either MLE or MLEr can be applied to complex-censored datasets, we decided to create an additional variation (MLErm) which is identical to LPRrm, except that the MLE method, rather than the LPR method, is used to generate the initial estimates of the sample GM and GSD. Otherwise, all other calculations are the same as those used for the LPRrm method (see the citations for LPRrm, such as Helsel, 2005).

The last variation on the MLE method is that devised by Succop et al. (2004). The MLE method is used to derive initial estimates of the sample GM and GSD. These estimates are used to estimate the cdf for each unique LOD in the dataset. Each non-detect is then replaced with what Succop et al. called ‘the most probable value’ concentration, which corresponds to the predicted concentration at half of the cdf value for the LOD (for example, if an LOD is situated at the 10th percentile for the lognormal exposure profile predicted using the MLE method, each non-detect is simply replaced with the concentration predicted to occur at the 5th percentile). The mean of the new dataset, which consists of the uncensored and the most probable values, is then estimated using the standard arithmetic mean formula. The authors did not estimate, via computer simulation, the bias or accuracy for this method, but after comparing these most probable values to laboratory values (where the laboratory was persuaded to provide measurements below the LOD), the authors concluded that this method is preferable to the simple substitution methods. We implemented the authors’ method (referred to here as MLEmpv) and, after replacing all LOD values with the most probable values, estimated the GM, GSD, 95th percentile and mean using the standard statistical formulae. We should note, however, that Succop et al. never claimed that their method could be applied to accurately estimating any parameter other than the mean.

NP methods

KM quantile and mean. Schmoyer et al. (1996) and She (1997) recommended applying the KM method, originally intended for application to the right-censored data that occur in prospective epidemiological studies and in clinical trials, to the left-censored concentration data encountered in environmental studies. The KM method is basically an NP method based on the empirical cdf [for uncensored datasets, the KM method produces quantiles that are identical to those produced by the empirical cdf method for all sample sizes except those where np
(where $n$ = sample size and $p$ = proportion: for example, $p$ = 0.95 for the 95th percentile) equals an integer; in these cases, the empirical cdf method assigns the proportion to the ranked value for that integer, while the KM method assigns the proportion to the next higher ranked value]. Its main advantage is the ability to estimate the mean in the presence of non-detects, without relying upon a distributional assumption. The KM method is available in many statistics programs, but because it was originally intended for right-censored datasets, the exposure data must be ‘flipped’ before analysis (i.e. the left-censored dataset must be converted to a right-censored dataset). Helsel (2005) provided an example of the calculations and recommended it in preference to ‘all other methods’ whenever the (observed) percent censored is <50%. To implement this method, we wrote computer code that does not require flipping of the data. KM can also be used to estimate the median and other quantiles. For the 95th percentile, the minimum sample size should be 20.

Quantiles. NP methods for estimating quantiles (i.e. NP percentiles) do not require an assumption regarding the shape of the underlying distribution and are therefore considered to be robust to departures from the lognormal or any other distributional assumption. For the 95th quantile, that is, the NP sample 95th percentile, at least 19 or 20 measurements are required, depending upon the calculation method, and an observed percent censored that is no more than roughly 90% (depending upon the sample size). Hyndman and Fan (1996) present several methods, which they numbered Q1 through Q8, for calculating NP quantiles (see Appendix 1). The Systat (2007) statistical package default quantile method is ‘Cleveland’s method’ (i.e. the Q5 method in Appendix 1). The SAS (2006) statistical package default method is the ‘Empirical CDF with averaging’ method (i.e. the Q2 method). The EPA recently recommended using the Q7 method. For these simulations, we selected the Q6 method as it was recommended by Gilbert (1987) and Helsel and Hirsch (2002):
sort the data from low to high,
calculate $i$ = integer portion of $0.95(n + 1)$, and
estimate the 95th percentile: $\hat{X}_{0.95} = x_i + (0.95(n + 1) - i)(x_{i+1} - x_i)$.

Recommendations in the literature

Gilliom and Helsel (1986) applied several CDA methods (including LOD, LOD/2, LPR and MLE) to the analysis of randomly generated datasets of size 10, 25 and 50, drawn from lognormal, ‘contaminated lognormal’ (i.e. a mixture of two specific lognormal distributions), gamma and delta distributions. The datasets were censored at a single LOD set at 20%, 40%, 60% and 80% of the underlying distribution. Simulations were done at four levels of variability (i.e. four coefficients of variation) for each of the four types of distributions. For each of the 16 scenarios, 500 random datasets were generated. The rMSE summary statistics were calculated across all 16 simulations, making the results difficult to interpret for any one method and scenario, but the authors concluded that the LPR and MLE methods were, across all the distributions, the preferred methods for estimating the mean and ‘median and interquartile range’, respectively.

Helsel and Cohn (1988) extended the work of Gilliom and Helsel (1986) by considering multiple LODs and adding the LPRrm method to the methods tested. They looked at only one sample size ($n$ = 25) and three LODs, set at 20%, 40% and 80% of the underlying distribution. Roughly one-third of the measurements were assigned to each LOD. (If a randomly generated measurement was less than the assigned LOD, the measurement was truncated at the LOD.) Otherwise, their procedures were identical to those of Gilliom and Helsel (1986). The authors used the rMSE as the primary metric for comparison and concluded that the LPRrm method is superior to all others when the underlying distribution departs from the lognormal distribution assumption.

Kroll and Stedinger (1996) compared the LPR, LPRr, MLE and MLEr methods when analyzing censored datasets drawn from the lognormal, contaminated lognormal [using the same contaminated lognormal distributions used by Gilliom and Helsel (1986) and Helsel and Cohn (1988)], gamma, delta and other distributions. They generated concentration datasets for sample sizes of 10, 25 and 50 and single LODs set at 10%, 20%, 40%, 60% and 80% of the underlying distribution. Because they summarized the rMSE results across all the distributions and their variations, as well as calculated ratios comparing the various methods, their results are difficult to interpret and to compare to the results of others. The authors used the rMSE as the metric for comparison and concluded that for censoring of 60% or less, the methods in general produced similar rMSE values, but that the MLE method was superior for the 80% censored scenarios. The robust methods (LPRr and MLEr) performed better when estimating the mean, with the MLEr performing slightly better than the LPRr method. However, for low to moderate censoring, the authors recommended the LPR and LPRr methods over the MLE and MLEr methods on the basis that they are easier to understand and implement.

Schmoyer et al. (1996) compared the KM method to the MLE method when doing hypothesis tests on the mean for datasets of size $n$ = 10 drawn from lognormal, truncated normal and gamma distributions where the percent censored was roughly 25%, 50% and 75% of the underlying distribution. For each combination they generated 500 datasets. They presented their results in terms of power curves for a test
on the mean exposure. While they allowed that the interpretation of the power curves was subjective, they concluded that ‘in general, the (KM) test seems better’ than the MLE method. She (1997) compared the KM method to the LOD/2 substitution, LPR and MLE methods. She generated 1000 datasets of size $n$ = 21 where each dataset had three censoring points randomly assigned from 10% to 80% of the underlying distribution (in increments of 10%). The bias and rMSE results varied, with the LOD/2 substitution occasionally performing better than the KM method. However, She rejected the LOD/2 as a valid method based on the perception that it ‘has no statistical theoretical basis’. She concluded that the KM method ‘performs as well as or better than’ the MLE, LPR or LOD/2 substitution methods, making it an ‘attractive alternative . . . because it is non-parametric and quite robust when the distribution departs from normality (for the log-transformed data)’.

Shumway et al. (2002) compared the LPRr and MLE methods when estimating the mean and variance from censored datasets drawn from the lognormal distribution. They looked at sample sizes of 20 and 50, with LODs set at 50% and 80% of the underlying distribution. They concluded that neither method was consistently better and that the choice depends upon the percent censored and departures from lognormality. They warned against combining different datasets as it would increase the probability that the data will come from a ‘mixture of distributions’, recommending instead the ‘grouping of data into similar subsets’.

There are other articles, but the above appeared to be the most relevant. Textbooks on the statistical analysis of environmental data offer differing advice. Gibbons and Goleman (2001) reviewed the literature and concluded that the MLE method is the ‘best overall estimator’. In what may be the only published textbook devoted to the topic of CDA, Helsel (2005) reviewed the literature and offered the following recommendations:
for <50% censored use the KM method (for all sample sizes),
for 50–80% censored use MLEr or LPRrm for sample sizes <50, and MLE for sample sizes >50, and
for >80% censored report the NP exceedance fraction for the limit whenever the sample size is <50 and the NP upper percentiles (e.g. 90th or 95th) for sample sizes >50.

Helsel (2005) justified the universal application of the KM method for low to moderately censored datasets on the strength of the She (1997) and Shumway et al. (2002) papers, and the opinion that the KM method, since it does not require any distributional assumption, is robust to all situations where the true exposure profile departs significantly from the lognormal distribution assumption.

US agencies and organizations have published several monographs on the analysis of environmental data. The Environmental Protection Agency (EPA, 2006) offered the following general recommendations:
if the percent censored is <15%, use substitution with zero, LOD/2, or the LOD, or use the MLE method,
for 15–50% censored, use the MLE method, and
for 50–90% censored, calculate the NP exceedance fraction for the limit.

The US Geological Survey (Helsel and Hirsch, 2002) published a guide to statistical methods in which the substitution methods were recognized to have good overall accuracy (i.e. low rMSE), but were not recommended because they tend to be biased and have no theoretical foundation. The MLE, MLEr or LPRr methods were recommended, the last two in particular when the lognormal distribution assumption is in doubt. The LPRrm method was recommended whenever there are multiple LODs in the dataset. Frome and Wambach (2005) of Oak Ridge National Laboratory published an overview of the statistical methods that can be applied to censored datasets. They recommended the MLE method for general use and the KM method whenever the lognormal distribution assumption is in doubt.

In summary, it would appear that no one method has been recommended for all instances of sample size, degree of conformity to the lognormal distribution assumption and the degree of censoring. It also appears that the published computer simulations more often than not were limited in terms of the number of methods compared, the simulation sizes, the sample sizes selected and/or ranges for the percent censored. Our computer simulations are more comprehensive with regard to all of these issues and hopefully can be generalized to a wider range of actual scenarios.

COMPUTER SIMULATION METHODS

To estimate the bias and rMSE, we developed a computer program to generate and analyse censored datasets using the following CDA methods:
Substitution: LOD, LOD/2 and LOD/√2.
Log-probit regression: LPR, LPRr, LPRrm.
Maximum likelihood estimation: MLE, MLEr, MLErm and MLEmpv.
Non-parametric methods: NP and KM.

For each artificial dataset, the estimates of the 95th percentile [most of the CDA methods lead to a sample GM and GSD, from which the sample 95th percentile can be estimated using the standard equation: $\hat{X}_{0.95} = \exp(\ln(\mathrm{GM}) + 1.645 \ln(\mathrm{GSD}))$] and mean
were compared to the true values. After the genera- For Simulation 1, we generated 100 000 artificial
tion and analysis of 100 000 artificial-censored data- datasets from censored lognormal or censored con-
sets, the average bias and rMSE (an estimate of the taminated lognormal distributions. The sample size
overall accuracy, discussed later) were calculated for each dataset was randomly varied (using the uni-
for each parameter. form distribution) between 20 and 100 (inclusive).
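The standard equation quoted above for the sample 95th percentile can be sketched as follows for a complete (uncensored) sample. The function name `lognormal_p95` and the example concentrations are hypothetical, and the sample GM and GSD are computed here directly from the log-transformed data; each CDA method would supply its own estimates of these two quantities.

```python
import math
import statistics

def lognormal_p95(values):
    """Estimate GM, GSD and the 95th percentile from positive concentrations."""
    logs = [math.log(v) for v in values]
    ln_gm = statistics.mean(logs)     # ln(GM): mean of the log-transformed data
    ln_gsd = statistics.stdev(logs)   # ln(GSD): sample SD of the log-transformed data
    gm, gsd = math.exp(ln_gm), math.exp(ln_gsd)
    p95 = math.exp(ln_gm + 1.645 * ln_gsd)   # X_0.95 = exp(ln GM + 1.645 ln GSD)
    return gm, gsd, p95

# Hypothetical uncensored concentrations (arbitrary units)
gm, gsd, p95 = lognormal_p95([0.4, 0.9, 1.0, 1.6, 2.5, 3.1])
```

Equivalently, the estimate is GM multiplied by GSD raised to the power 1.645, which makes clear that an underestimated GSD pulls the 95th percentile downward.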
To compare the methods, we devised the following The percentage of the distribution that was censored
three simulations (Table 1): was also randomly varied (using the uniform distri-
bution), between 1% and 50% (inclusive). The labo-
Simulation 1: n ranged between 20 and 100, the ratory LOD was then set at the concentration in the
true percent censored ranged between 1% and distribution corresponding to the percent censored.
50% and the true GSD ranged between 1.2 and 4, Simulation 1 was repeated for each of four scenar-
Simulation 2: n ranged between 20 and 100, the ios (see Table 1) and for each of the CDA methods. In
true percent censored ranged between 50% and Scenario I, a single lognormal distribution was as-
80% and the true GSD ranged between 1.2 and sumed as well as a single laboratory. The GM was
4, and
fixed at 1, while the GSD for the distribution was ran-
Simulation 3: n ranged between 5 and 19, the true
domly varied between 1.2 and 4 (inclusive) using the
percent censored ranged between 1% and 50%
uniform distribution. In Scenario II, a single lognor-

and the true GSD ranged between 1.2 and 4.
mal distribution was also assumed, as in Scenario I,
For each of these simulations, we devised the follow- but three laboratories with different LODs were as-
ing four scenarios: sumed to have been used, one each for approximately
one-third of the samples. The LOD for each labora-
Scenario I: a single lognormal distribution and
tory was randomly generated as described above. In
a single LOD,
Scenario III, a single laboratory was used; however,
Scenario II: a single lognormal distribution and
the underlying distribution was contaminated. A con-
three LODs,
taminated (i.e. non-lognormal) distribution was cre-
Scenario III: a contaminated lognormal distribu-
ated by combining two lognormal distributions.
tion and a single LOD, and
The GM and GSD for each distribution were ran-
Scenario IV: a contaminated lognormal distribu-
domly generated from uniform distributions where
tion and three LODs.
the minimum and maximum values were 1 and 3,
While there are other permutations that we could and 1.2 and 4, respectively. The fraction that the first
have devised (and did, as discussed later), we felt that distribution contributed to the overall distribution
the simulations and scenarios above represented was also randomly varied by generating a fraction
a thorough testing of the various methods for analyz- from a uniform distribution where the minimum and
ing censored data. Readers should be able to identify maximum were 0 and 1. (The fraction contributed
a familiar scenario and then determine the best by the second distribution was one minus this value.)
method or methods. In Scenario IV, a contaminated distribution was again
Table 1. Parameters used in Simulation 1

Simulation parameter                  Scenario I   Scenario II   Scenario III   Scenario IV
                                      min–max      min–max       min–max        min–max
Sample size                           20–100       20–100        20–100         20–100
Exposure profile distributions
  GM1 (a)                             1            1             1–3            1–3
  GSD1                                1.2–4        1.2–4         1.2–4          1.2–4
  GM2                                 -            -             1–3            1–3
  GSD2                                -            -             1.2–4          1.2–4
  Distribution1 % (b)                 100%         100%          0–100%         0–100%
Laboratory LOD as % of the exposure profile (c)
  Laboratory1 LOD %                   1–50%        1–50%         1–50%          1–50%
  Laboratory2 LOD %                   -            1–50%         -              1–50%
  Laboratory3 LOD %                   -            1–50%         -              1–50%

Simulations 2 and 3 were identical to Simulation 1 except that for Simulation 2 the LOD as a percent of the exposure profile ranged between 50% and 80% and for Simulation 3 the sample size ranged between 5 and 19.
(a) Since only a single distribution is used in Scenarios I and II, the GM was fixed at 1.
(b) Distribution2 % will be 100% minus the percentage for Distribution1.
(c) If three laboratories were used each was used 1/3 of the time.
used, as just described, but with three laboratories and three LODs, as was described for Scenario II.

Simulation 2 was identical to Simulation 1 above, except that the percentage censored varied between 50% and 80% (inclusive). Simulation 3 was also identical to Simulation 1, except that the sample sizes were allowed to vary between 5 and 19 (inclusive), rather than 20 and 100.

For each randomly generated dataset, the computer program did the following:

determined if the dataset was invalid,
determined if the dataset was completely uncensored,
applied standard statistical methods to each valid, uncensored dataset, and
applied the selected CDA method to each valid, censored dataset.

A dataset was invalid if all n measurements were censored or there were too few uncensored data. For the LPR-based and MLE-based methods, a valid dataset was one with at least three measurements, at least two of which were uncensored. For the substitution methods, a valid dataset must have at least two measurements and at least one must be uncensored. The fraction of invalid datasets for each method was tracked by the program [for Simulations 1 and 2, the fraction of invalid datasets was typically <0.1%; for Simulation 3, the typical fraction was <0.5% (the smaller sample sizes increased the likelihood of an invalid or completely censored dataset)].

If all n measurements were uncensored, the dataset was statistically analysed by the program using the standard statistical methods for estimating the GM, GSD, 95th percentile and mean (see Mulhausen and Damiano, 1998). The selected CDA method was applied whenever the dataset was valid and there was at least one LOD value. After the sample GM and GSD were calculated, the sample 95th percentile was calculated using the standard equation [most of the CDA methods lead to a sample GM and GSD, from which the sample 95th percentile can be estimated using the standard equation: $\hat{X}_{0.95} = \exp[\ln(\mathrm{GM}) + 1.645\,\ln(\mathrm{GSD})]$]. For the LPR and MLE methods, the mean was estimated using the minimum variance unbiased estimator (mvue) equation (Mulhausen and Damiano, 1998), using n − k and n as the sample size, respectively. (The mvue equation was designed to minimize the transformation bias that occurs when moving from the log scale to the concentration scale. Preliminary simulations suggested that using the full sample size (n) in the mvue equation was appropriate for the MLE-based methods, while for the LPR-based methods, a reduced sample size (n − k) results in less bias when used in the mvue equation.) For the substitution methods, the robust MLE or LPR methods and the MLEmpv method, the mean was estimated using the standard simple arithmetic mean formula. For the KM method, the mean was estimated following the procedure outlined in Helsel (2005) without modification. However, we estimated the mean regardless of the actual fraction of censored data [Helsel (2005) recommended that the KM method should not be used to estimate the mean whenever the dataset is >50% censored]. The KM and NP methods were used to estimate the 95th percentile in those simulations where the sample size was 20 or greater.

Once the sample 95th percentile and mean were calculated, the differences between the sample estimates and the true values were determined. After all 100 000 datasets were generated, the program calculated the average bias for each of the parameters across all 100 000 datasets:

$$\mathrm{Bias} = \bar{x} - \theta,$$

where $\bar{x}$ is the mean of the 100 000 parameter estimates and $\theta$ is the true value. The program also calculated the rMSE, which is a combination of bias and precision:

$$\mathrm{rMSE} = \sqrt{(\bar{x} - \theta)^2 + \frac{\sum (x - \bar{x})^2}{N - 1}},$$

where N = 100 000. The rMSE value can be considered a measure of overall imprecision or overall accuracy. For example, for a particular parameter, a proposed method may be biased but have lower variability. If the resulting rMSE is comparable to a gold standard method, the proposed method could be considered suitably accurate.

The bias and rMSE can be evaluated on the log scale (e.g. bias relative to $\ln X_{0.95}$) or the concentration scale (e.g. bias relative to $X_{0.95}$). Consistent with the overwhelming majority of other investigators, we chose to examine the bias and rMSE on the concentration scale, or relative to the true 95th percentile and mean.

RESULTS AND DISCUSSION

The results for the Simulation and Scenario combinations are listed in Tables 2 through 13. For the bias comparison metric, the methods are ranked in order of the absolute value of the bias. For the rMSE comparison metric, the methods are ranked from low to high. The bias and rMSE are given in terms of the percent of the true value. Given that we used N = 100 000 for all simulations, the estimates of bias and rMSE should be fairly stable. Repeating the same simulation will generally produce a result that is within ±0.2 of the table values. In contrast, when we reduce the simulation size to N = 500 or N = 1000, as was used in most of the published studies on CDA, we observed that the bias and rMSE values frequently vary from the table values by plus or
Table 2. Simulation 1, Scenario I: single lognormal distribution and a single laboratory where the laboratory LOD is 1–50% of the true distribution; 20 ≤ n ≤ 100; 1.2 ≤ GSD ≤ 4

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLErm       0.4       MLEmpv      22.4    MLErm       0.0       MLE         17.9
LPRr        0.4       Sub LOD/√2  22.6    MLEr        0.1       LPR         18.4
Sub LOD/2   0.5       MLErm       23.0    LPRr        0.1       MLEr        19.5
MLEr        0.5       Sub LOD/2   23.1    MLEmpv      0.1       MLErm       19.6
MLE         0.6       MLEr        23.2    LPRrm       0.2       LPRr        19.6
LPRrm       1.4       MLE         23.3    LPR         0.5       LPRrm       19.8
LPR         2.5       LPRr        23.8    MLE         0.5       MLEmpv      19.8
KM          2.8       LPRrm       24.1    Sub LOD/√2  0.6       Sub LOD/√2  19.9
MLEmpv      6.0       Sub LOD     24.2    Sub LOD/2   1.8       Sub LOD/2   20.4
Sub LOD/√2  7.8       LPR         24.9    Sub LOD     4.2       KM          20.5
Sub LOD     13.5      KM          35.8    KM          4.2       Sub LOD     20.5
NP          15.2      NP          50.6
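
In Table 2 the KM and Sub LOD rows for the mean coincide (bias 4.2 and rMSE 20.5 for both). The text attributes this to the KM mean being mathematically identical to LOD substitution when there is a single LOD. The Python sketch below illustrates that identity. It is not the authors' program: it assumes the common "flipping" treatment of left-censored data and takes the mean as the area under the survival curve out to the largest flipped value (which amounts to treating the lowest non-detect as if it were detected at its LOD). The function names and the (value, censored) data layout are hypothetical.

```python
def lod_substitution_mean(data):
    """Arithmetic mean with each non-detect replaced by its LOD.

    data: list of (value, censored) pairs; censored pairs carry the LOD.
    """
    return sum(value for value, _ in data) / len(data)


def km_mean_left_censored(data):
    """Kaplan-Meier mean for left-censored (value, censored) pairs (sketch)."""
    # Flip about a constant larger than every value so that left-censored
    # concentrations become right-censored "survival times".
    flip = max(value for value, _ in data) + 1.0
    # Sort by flipped time; at tied times detects (False) sort before non-detects.
    flipped = sorted((flip - value, censored) for value, censored in data)

    at_risk = len(flipped)
    survival = 1.0
    area = 0.0
    previous_time = 0.0
    i = 0
    while i < len(flipped):
        time = flipped[i][0]
        # Accumulate the area under the step function up to this time.
        area += survival * (time - previous_time)
        events = removed = 0
        while i < len(flipped) and flipped[i][0] == time:
            if not flipped[i][1]:
                events += 1
            removed += 1
            i += 1
        survival *= 1.0 - events / at_risk
        at_risk -= removed
        previous_time = time
    # The area is the mean of the flipped variable; flip back to concentration.
    return flip - area
```

With a single LOD below every detect, both functions return the same value. With an internal non-detect (one bounded on both sides by detects), the KM mean in this sketch falls below the LOD-substitution mean, in line with the direction of bias described in the text.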

Table 3. Simulation 1, Scenario II: single lognormal distribution and three laboratories where the LOD for each laboratory fell in the range of 1–50% of the true distribution; 20 ≤ n ≤ 100; 1.2 ≤ GSD ≤ 4

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLE         0.4       Sub LOD/√2  21.9    LPRrm       0.0       LPR         16.5
Sub LOD/2   0.7       LPRr        22.0    LPR         0.0       MLE         17.9
LPRrm       1.4       MLErm       22.1    MLErm       0.2       LPRrm       19.6
MLErm       2.3       MLEr        22.3    MLEmpv      0.3       MLErm       19.6
MLEr        2.5       LPR         22.4    Sub LOD/√2  0.6       LPRr        19.8
KM          2.9       Sub LOD/2   22.4    MLE         0.7       KM          19.8
MLEmpv      4.1       LPRrm       22.5    KM          1.4       MLEr        19.9
LPR         6.5       Sub LOD     22.6    MLEr        1.4       MLEmpv      19.9
LPRr        7.1       MLE         23.1    Sub LOD/2   1.8       Sub LOD/√2  20.0
Sub LOD/√2  7.4       MLEmpv      23.5    LPRr        3.2       Sub LOD     20.1
Sub LOD     12.1      KM          34.2    Sub LOD     4.2       Sub LOD/2   20.1
NP          15.0      NP          52.1
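
The Bias (%) and rMSE columns in these tables follow the two formulas given in the methods section. A minimal Python sketch of that calculation is shown below. The function name is hypothetical, and expressing both results as a percentage of a single true value is an assumption about the normalization; in the simulations the true value changes from dataset to dataset, so each estimate would presumably be scaled by its own true value before averaging.

```python
import math


def bias_and_rmse_percent(estimates, true_value):
    """Return (bias %, rMSE %) of the estimates relative to true_value."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value
    # Variance of the estimates about their own mean, with the N - 1
    # denominator that appears in the rMSE formula.
    variance = sum((x - mean_estimate) ** 2 for x in estimates) / (n - 1)
    rmse = math.sqrt(bias ** 2 + variance)
    return 100.0 * bias / true_value, 100.0 * rmse / true_value
```

For example, `bias_and_rmse_percent([9.0, 10.0, 11.0, 12.0], 10.0)` gives a bias of 5% and an rMSE of about 13.8%: the estimates average 10.5, and their scatter adds to the squared bias under the square root.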
Table 4. Simulation 1, Scenario III: a contaminated lognormal distribution and a single laboratory where the laboratory LOD is 1–50% of the true distribution; 20 ≤ n ≤ 100; 1.2 ≤ GSD ≤ 4

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
LPRr        0.2       MLEmpv      24.1    MLErm       0.1       MLE         18.2
MLEr        0.3       Sub LOD/√2  24.4    MLEmpv      0.1       LPR         18.7
MLE         0.5       Sub LOD/2   24.4    LPRr        0.1       MLErm       21.4
MLErm       1.2       MLErm       24.7    MLEr        0.2       MLEmpv      21.5
LPRrm       1.3       MLE         24.9    LPRrm       0.4       Sub LOD/√2  21.8
Sub LOD/2   1.6       MLEr        25.2    Sub LOD/√2  1.1       MLEr        21.8
LPR         1.8       LPRr        25.7    Sub LOD/2   1.4       Sub LOD/2   21.9
KM          4.6       Sub LOD     26.0    LPR         1.5       LPRrm       21.9
MLEmpv      7.2       LPRrm       26.5    MLE         2.3       LPRr        22.6
NP          19.7      NP          64.4
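
Scenarios III and IV draw data from a contaminated distribution, that is, a weighted combination of two lognormals, and compare the estimates with the true mean and true 95th percentile of that combination. How those true values were obtained is not stated in this section. The Python sketch below shows one way, assuming the standard lognormal mean for each component and a bisection search on the mixture CDF for the percentile. All function names and the (weight, GM, GSD) layout are hypothetical.

```python
import math
from statistics import NormalDist


def log_component(gm, gsd):
    """Normal distribution of ln(X) for a lognormal with the given GM and GSD."""
    return NormalDist(math.log(gm), math.log(gsd))


def mixture_mean(parts):
    """True mean of a lognormal mixture; parts is a list of (weight, GM, GSD)."""
    return sum(w * gm * math.exp(0.5 * math.log(gsd) ** 2) for w, gm, gsd in parts)


def mixture_cdf(x, parts):
    """Probability that a draw from the mixture is at or below x."""
    return sum(w * log_component(gm, gsd).cdf(math.log(x)) for w, gm, gsd in parts)


def mixture_quantile(p, parts, iterations=200):
    """True p-quantile of the mixture, found by bisection on the log scale."""
    # The mixture quantile lies between the smallest and largest
    # component quantiles, so these bracket the root.
    logs = [log_component(gm, gsd).inv_cdf(p) for _, gm, gsd in parts]
    lo, hi = min(logs), max(logs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(math.exp(mid), parts) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def draw_from_mixture(rng, parts):
    """Draw one value: choose a component by weight, then sample from it."""
    u = rng.random()
    cumulative = 0.0
    for w, gm, gsd in parts:
        cumulative += w
        if u < cumulative:
            return rng.lognormvariate(math.log(gm), math.log(gsd))
    _, gm, gsd = parts[-1]  # guard against rounding in the weights
    return rng.lognormvariate(math.log(gm), math.log(gsd))
```

A single-component list such as `[(1.0, 1.0, 2.0)]` reduces to the ordinary lognormal case, which gives a simple check on the bisection.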
Table 5. Simulation 1, Scenario IV: a contaminated lognormal distribution and three laboratories where the LOD for each laboratory fell in the range of 1–50% of the true distribution; 20 ≤ n ≤ 100; 1.2 ≤ GSD ≤ 4

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLE         0.7       Sub LOD/√2  23.8    MLErm       0.0       LPR         16.8
Sub LOD/2   1.3       MLErm       24.1    LPRrm       0.2       MLE         18.1
LPRrm       2.2       Sub LOD/2   24.2    MLEmpv      0.4       LPRrm       21.4
MLEr        3.2       LPR         24.3    Sub LOD/√2  1.0       LPRr        21.5
MLErm       3.3       LPRr        24.3    Sub LOD/2   1.3       MLEr        21.6
KM          4.4       MLEr        24.3    MLEr        1.4       MLEmpv      21.8
MLEmpv      5.4       LPRrm       24.6    KM          1.6       KM          21.9
LPRr        7.9       MLE         24.7    LPR         1.9       Sub LOD/√2  21.9
LPR         7.9       Sub LOD     24.8    MLE         2.4       MLErm       21.9
Sub LOD/√2  8.7       MLEmpv      25.3    LPRr        3.1       Sub LOD/2   22.3
Sub LOD     13.3      KM          39.7    Sub LOD     4.3       Sub LOD     22.4
NP          19.5      NP          62.7
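
The NP rows in Tables 2 to 5 come from a sample quantile rule, which the text identifies as the Q6 method from Hyndman and Fan (1996). The Python sketch below shows two such rules on an uncensored sample. It assumes that the paper's Q6 and Q7 correspond to Hyndman and Fan's definitions 6 and 7; the paper's Appendix 1 is the authority on that mapping, and the function name is hypothetical.

```python
import math


def hf_quantile(values, p, kind):
    """Sample quantile by linear interpolation between order statistics.

    kind 6 uses plotting position k / (n + 1);
    kind 7 uses plotting position (k - 1) / (n - 1).
    """
    xs = sorted(values)
    n = len(xs)
    if kind == 6:
        h = (n + 1) * p
    elif kind == 7:
        h = (n - 1) * p + 1
    else:
        raise ValueError("only definitions 6 and 7 are sketched here")
    # Clamp to the range of available order statistics (1-based positions).
    h = min(max(h, 1.0), float(n))
    k = math.floor(h)
    if k >= n:
        return xs[-1]
    return xs[k - 1] + (h - k) * (xs[k] - xs[k - 1])
```

For n = 20 and p = 0.95, definition 6 interpolates at position 19.95, almost at the sample maximum, while definition 7 interpolates at position 19.05. For right-skewed data that difference in position can translate into a large difference in the estimate.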

Table 6. Simulation 2, Scenario I

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLE         0.0       MLErm       24.2    MLE         0.0       MLE         18.9
MLEr        0.3       MLE         24.8    LPR         0.0       LPR         19.5
LPRr        0.9       MLEr        25.2    MLErm       0.3       MLEr        19.8
LPRrm       1.7       LPRr        25.8    MLEr        0.6       LPRr        20.0
MLErm       1.8       LPRrm       26.4    Sub LOD/2   0.7       MLEmpv      20.0
KM          2.5       Sub LOD     26.6    MLEmpv      1.4       MLErm       20.1
LPR         3.5       Sub LOD/√2  27.7    LPRrm       1.5       LPRrm       20.4
NP          14.9      LPR         28.0    LPRr        1.6       Sub LOD/2   21.7
Sub LOD/2   19.8      Sub LOD/2   28.3    Sub LOD/√2  12.0      Sub LOD/√2  24.5
Sub LOD     20.7      MLEmpv      29.6    KM          30.2      KM          38.2
Sub LOD/√2  21.3      KM          33.5    Sub LOD     30.3      Sub LOD     38.3
MLEmpv      21.6      NP          49.8

Table 7. Simulation 2, Scenario II

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLE         0.2       MLErm       22.9    MLE         0.0       MLE         19.0
KM          2.9       LPRr        22.9    MLErm       0.1       MLErm       19.6
LPRrm       3.1       LPRrm       24.1    LPRrm       0.6       LPRrm       19.8
LPR         4.2       Sub LOD     24.3    Sub LOD/2   0.8       MLEr        19.9
LPRr        4.6       MLE         24.9    MLEmpv      1.7       MLEmpv      20.6
MLEr        4.8       Sub LOD/√2  26.3    MLEr        3.4       Sub LOD/2   21.4
MLErm       5.4       LPR         26.6    Sub LOD/√2  12.1      Sub LOD/√2  23.9
NP          15.0      MLEr        26.7    LPR         17.9      LPR         25.8
Sub LOD     17.8      Sub LOD/2   27.2    KM          19.8      KM          28.1
Sub LOD/2   19.0      MLEmpv      29.5    LPRr        21.9      LPRr        30.6
MLEmpv      21.8      NP          51.2

Table 8. Simulation 2, Scenario III

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLE         0.9       MLErm       26.7    Sub LOD/2   0.2       MLE         20.1
MLEr        1.0       MLE         27.7    LPRr        0.9       LPR         21.0
MLErm       1.2       MLEr        27.9    MLEr        1.0       MLErm       21.6
LPRr        3.0       Sub LOD     28.2    MLErm       1.1       MLEr        21.6
LPRrm       3.8       Sub LOD/√2  29.1    MLE         2.3       LPRr        21.8
KM          4.5       LPRr        29.5    LPR         3.2       MLEmpv      21.8
LPR         5.4       Sub LOD/2   29.7    MLEmpv      3.2       LPRrm       22.3
NP          19.7      LPRrm       30.3    LPRrm       3.8       Sub LOD/2   22.7
Sub LOD     21.1      MLEmpv      31.3    Sub LOD/√2  12.4      Sub LOD/√2  25.9
Sub LOD/2   21.2      LPR         31.8    Sub LOD     29.9      Sub LOD     39.0
Sub LOD/√2  22.1      KM          40.0    KM          30.0      KM          39.1
MLEmpv      22.7      NP          62.3

Table 9. Simulation 2, Scenario IV

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLE         1.0       MLErm       25.4    Sub LOD/2   0.1       MLE         19.9
LPRrm       2.1       LPRr        25.7    MLErm       1.3       LPRrm       21.7
LPR         3.5       LPR         26.2    MLEr        2.0       MLErm       21.8
LPRr        3.6       Sub LOD     26.2    MLE         2.3       MLEr        21.8
KM          4.6       LPRrm       26.9    LPRrm       2.8       MLEmpv      22.4
MLErm       5.1       MLE         27.7    MLEmpv      3.2       Sub LOD/2   22.9
MLEr        5.7       Sub LOD/√2  28.1    Sub LOD/√2  12.3      LPR         24.6
Sub LOD     18.5      Sub LOD/2   28.8    LPR         15.2      Sub LOD/√2  25.7
NP          19.7      MLEr        29.5    KM          19.8      KM          29.4
Sub LOD/2   20.3      MLEmpv      31.2    LPRr        20.4      LPRr        31.1
MLEmpv      22.8      NP          61.6
Table 10. Simulation 3, Scenario I

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLEmpv      1.3       Sub LOD/√2  54.3    LPRrm       0.0       MLE         38.9
Sub LOD/√2  1.8       Sub LOD/2   58.3    MLErm       0.0       LPR         40.2
MLE         3.7       Sub LOD     60.9    MLEr        0.2       LPRr        41.2
MLErm       4.1       MLEmpv      61.3    MLEmpv      0.2       MLErm       41.8
LPRr        4.4       MLE         63.4    Sub LOD/√2  0.7       MLEr        42.0
MLEr        5.4       MLErm       63.7    LPR         0.8       KM          42.1
Sub LOD/2   6.8       MLEr        66.7    LPRr        0.9       Sub LOD/√2  42.7
Sub LOD     7.9       LPRr        69.0    Sub LOD/2   1.9       Sub LOD/2   42.9
LPRrm       11.5      LPRrm       94.8    MLE         2.1       MLEmpv      43.1
LPR         12.6      LPR         110.5   KM          4.0       LPRrm       43.2
                                          Sub LOD     4.4       Sub LOD     45.6

Table 11. Simulation 3, Scenario II

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLErm       0.9       Sub LOD     50.8    MLErm       0.1       MLE         38.0
MLEmpv      0.8       Sub LOD/√2  56.0    MLEmpv      0.3       LPR         38.0
LPRr        1.0       MLErm       59.6    LPRrm       0.4       MLErm       41.0
Sub LOD/√2  1.2       MLEmpv      60.2    Sub LOD/√2  0.8       MLEmpv      41.7
MLEr        2.9       Sub LOD/2   60.6    MLEr        1.5       MLEr        42.1
MLE         3.1       MLE         60.7    LPR         1.6       Sub LOD     42.5
LPR         3.4       LPRr        62.3    KM          1.6       LPRrm       42.7
LPRrm       4.1       MLEr        63.8    Sub LOD/2   1.8       LPRr        42.8
Sub LOD     6.8       LPRrm       72.3    MLE         2.3       KM          44.0
Sub LOD/2   7.2       LPR         90.6    LPRr        4.0       Sub LOD/2   44.9
                                          Sub LOD     4.2       Sub LOD/√2  45.6

Table 12. Simulation 3, Scenario III

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLEmpv      0.5       Sub LOD     56.7    MLEmpv      0.0       MLE         39.8
Sub LOD/√2  2.6       Sub LOD/√2  58.1    LPRrm       0.1       LPR         40.8
MLE         3.4       Sub LOD/2   62.8    MLEr        0.1       MLErm       45.4
MLErm       4.0       MLEmpv      64.4    MLErm       0.3       LPRr        45.8
MLEr        5.5       MLE         65.8    LPRr        0.7       MLEr        45.8
Sub LOD/2   5.7       MLErm       68.0    LPR         0.8       MLEmpv      46.5
LPRr        6.1       MLEr        73.1    Sub LOD/√2  1.0       Sub LOD/2   46.6
Sub LOD     8.6       LPRr        100.9   Sub LOD/2   1.5       Sub LOD     47.5
LPRrm       13.2      LPRrm       103.6   MLE         3.4       Sub LOD/√2  47.8
LPR         13.7      LPR         121.3   Sub LOD     4.3       KM          47.9
                                          KM          4.5       LPRrm       48.0

Table 13. Simulation 3, Scenario IV

95th percentile                           Mean
Method      Bias (%)  Method      rMSE    Method      Bias (%)  Method      rMSE
MLEmpv      0.3       Sub LOD     54.8    MLErm       0.2       LPR         38.5
MLErm       0.8       Sub LOD/√2  57.4    LPR         0.2       MLE         38.9
LPRr        0.8       MLE         63.7    MLEmpv      0.2       LPRr        46.2
Sub LOD/√2  2.3       Sub LOD/2   64.3    LPRrm       0.7       MLErm       46.5
MLE         2.3       MLEmpv      65.5    Sub LOD/√2  0.9       Sub LOD     46.5
MLEr        2.9       MLErm       66.4    Sub LOD/2   1.3       Sub LOD/√2  47.4
LPR         3.9       LPRr        66.8    MLEr        1.4       Sub LOD/2   47.8
LPRrm       4.9       MLEr        70.6    KM          1.7       MLEmpv      47.9
Sub LOD/2   6.0       LPRrm       84.0    LPRr        3.8       MLEr        47.9
Sub LOD     7.5       LPR         89.1    MLE         4.0       LPRrm       48.3
                                          Sub LOD     4.2       KM          49.2
minus several percentage points. Consequently, we feel that the bias and rMSE estimates in the tables are reliable.

When comparing methods, our view is that there is little practical difference between methods having an absolute bias that differs by only 1% or so, or an rMSE that differs by only 2% or so. Furthermore, it is worth mentioning that these are composite results for a wide range of distributions and censoring points created using the simulation parameter ranges specified in Table 1. It is likely that the rankings would change if the methods were challenged with a specific distribution, a specific LOD and a specific sample size. Our view is that a composite analysis is more informative regarding the performance that one should expect in the long run from each method.

Which comparison metric should be used: the bias or rMSE? The majority of the studies used the rMSE as a basis for comparing methods, a few used only bias and several provided both. We provide both metrics for the reader to consider, but lean toward the rMSE as the more informative metric. In principle, a method with the lowest rMSE has the best combination of both bias and precision. However, past investigators have often rejected the LOD/2 substitution method, even though the method had similar or lower rMSE values than their preferred method, on the basis that it had no theoretical basis. Our view is more utilitarian. While some methods do indeed appeal to a distributional assumption, we feel that, because the true underlying exposure profile will never be identical in shape to this assumed distribution, all methods are essentially ad hoc. Therefore, those methods that have low rMSE values and are reasonably robust to departures from the unimodal, lognormal model should be preferred, with relative ties going to the method having the lowest bias.

Looking at both the rMSE and bias results, none of the CDA methods stood out as the ‘single best method’. If anything, what was clear is that nearly every method will occasionally excel at estimating the 95th percentile or mean given some particular Simulation–Scenario combination. Given that there was no obvious overall winner, questions that could be addressed by inspecting the results in Tables 2 through 13 are:

Which family of methods or specific method is generally superior overall?
Which family of methods or specific method is generally superior given the types of distributions typically encountered in your experience?

For example, in our experience, a typical scenario is one where a single laboratory is used each year, resulting in a single censoring point, or LOD, assuming that the measurements all have roughly the same sample volume. Therefore, we are more inclined to weight the Scenario I results over the results of the other Simulation–Scenario combinations. Other investigators may have a different experience. For example, perhaps only a single laboratory is used, but the sample sizes are typically <20, suggesting that the Simulation 3 results will be more informative. Or perhaps the datasets are always complex and the lognormal distribution assumption is always in doubt, resulting from the combination of data from dispersed areas and/or time periods, in which case the Scenario IV simulations are of interest.

Substitution methods

While the substitution methods have often been condemned (She, 1997; Helsel, 2005), their use continues. Our results, as expected, show that there is good reason for not using the LOD substitution method, as it consistently ranked near the bottom of the rankings in terms of both bias and rMSE, consistently underestimating the 95th percentile and overestimating the mean. In Simulation 1 (Tables 2–5), where the maximum percent censored was 50%, the LOD/2 and LOD/√2 substitution methods did surprisingly well when estimating the 95th percentile, being consistently in the top half of the rMSE rankings. Neither of the LOD/2 and LOD/√2 methods did well when estimating the mean (with the LOD/√2 substitution method doing slightly better than the LOD/2 method), being consistently in the bottom half of the rankings. In Simulation 2 (Tables 6–9), where the percent censored ranged between 50% and 80%, the substitution methods were consistently in the middle rankings for rMSE when estimating the 95th percentile, but were nearly always at the bottom of the rankings for bias. For the mean, the substitution methods were consistently in the bottom of the rankings for both bias and rMSE. On the other hand, in Simulation 3 (Tables 10–13), where the sample sizes ranged between 5 and 19 (inclusive), the bias was variable, leading to no general observation, except to say that all three methods performed consistently well, using the rMSE metric, when estimating the 95th percentile.

Overall, the substitution methods (LOD, LOD/2 and LOD/√2 substitution) did poorly when their bias is compared to the bias of the MLE-based and LPR-based methods (discussed later) for both the 95th percentile and mean, particularly in Simulation 2 where the distributions were highly censored. Of the three, LOD substitution was, as expected, the most severely biased and should be categorically avoided. However, there were scenarios, particularly those in Simulation 3, where the rMSE of the LOD/2 or LOD/√2 methods was similar to or less than that of the higher order methods (i.e. the LPR-based or MLE-based methods), suggesting that the bias inherent in the substitution methods is somewhat offset by the reduced variability (i.e. increased precision) in the estimates.
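
The three substitution methods discussed above differ only in the value that replaces each non-detect. The Python sketch below follows the description in the methods section: the 95th percentile comes from the sample GM and GSD through the standard equation, the mean is the simple arithmetic mean, and a dataset needs at least two measurements with at least one uncensored. The function name and data layout are hypothetical, and the use of the n − 1 denominator for the log-scale standard deviation is an assumption.

```python
import math


def substitution_estimates(data, divisor):
    """Estimate GM, GSD, 95th percentile and mean after substitution.

    data: list of (value, censored) pairs; censored pairs carry the LOD.
    divisor: 1 for LOD, 2 for LOD/2, math.sqrt(2) for LOD/sqrt(2).
    """
    if len(data) < 2 or all(censored for _, censored in data):
        raise ValueError("need at least two measurements, one of them uncensored")
    # Replace each non-detect by LOD / divisor; keep detects as measured.
    filled = [value / divisor if censored else value for value, censored in data]
    logs = [math.log(x) for x in filled]
    n = len(logs)
    mean_log = sum(logs) / n
    sd_log = math.sqrt(sum((y - mean_log) ** 2 for y in logs) / (n - 1))
    gm = math.exp(mean_log)
    gsd = math.exp(sd_log)
    # Standard lognormal equation for the 95th percentile.
    x95 = math.exp(math.log(gm) + 1.645 * math.log(gsd))
    mean = sum(filled) / n
    return gm, gsd, x95, mean
```

Calling the function three times on the same dataset, with divisors 1, 2 and `math.sqrt(2)`, gives the three sets of estimates side by side. A smaller substituted value lowers the mean and the GM while raising the GSD, so the net effect on the 95th percentile is not obvious in advance.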
LPR-based methods

While there were exceptions, the LPR-based methods tended to be in the middle to top half of the bias and rMSE rankings for Simulations 1 and 2. The LPR-based methods appear to be fairly robust when confronted with multiple LODs and/or contaminated distributions. The LPRrm method tended to have lower bias than the LPR and LPRr methods in the multiple LOD scenarios (Scenarios II and IV) in Simulations 1 and 2. Overall, the LPR-based methods were slightly lower in the rankings than the MLE-based methods, although exceptions frequently occurred. The LPRrm method, which was designed solely with the intention of estimating the mean from complex datasets and when the single lognormal distribution assumption is in doubt, consistently had low bias when estimating the mean in all three simulations. However, the simpler LPR method consistently had lower rMSE values, suggesting that it is the superior method even when confronted with multiple LODs and/or contaminated distributions.

All three LPR-based methods did poorly when estimating the 95th percentile from small datasets (Simulation 3; Tables 10–13), frequently having both large bias and rMSE values. Overall, the LPR-based methods tended to have larger rMSE values when compared to the MLE-based methods (due to the occasional very large sample GSD). When estimating the 95th percentile, there was no consistent winner among the LPR-based methods, in which case the simplest to implement (that is, the LPR method) should probably be preferred. When estimating the mean, if only the rMSE is considered, the standard LPR method, while often more biased relative to the two so-called robust LPR-based methods, almost always had the lowest rMSE, regardless of the simulation or scenario. If bias for the mean estimate is a concern, the LPRrm method should be considered.

MLE-based methods

With the exception of the MLEmpv method, the MLE-based methods performed well in the single-distribution scenarios and were generally fairly robust in the multiple LOD and contaminated distribution scenarios. Typically, when estimating the 95th percentile, the MLE or MLEr methods often ranked high for the bias metric, while for the mean, the MLErm method was often ranked very high. Regarding the rMSE metric, the MLE, MLEr and MLErm methods were usually in the top half of the rankings. In Simulations 1 and 2, the MLEmpv method consistently ranked below the other MLE methods and most of the LPR-based methods for both bias and rMSE, particularly when estimating the 95th percentile. However, in the small sample size simulation (Simulation 3, Tables 10–13), the MLEmpv method performed well, consistently exhibiting low bias and rMSE for the 95th percentile and low bias for the mean.

Overall, our choice would be either of the MLE or MLEr methods when estimating the 95th percentile and the MLE or MLErm methods when estimating the mean. Since both the MLEr and MLErm methods require additional manipulations, we would, if confined to a single choice, select the standard MLE method. Regarding the mean, the standard MLE method almost always had the lowest rMSE, regardless of the simulation or scenario. For the 95th percentile, the MLE method consistently appeared in the top half of the rankings for both bias and rMSE, particularly for the severely censored scenarios (Simulation 2; Tables 6–9) and appeared to be surprisingly robust when confronted with contaminated distributions (Scenarios III and IV) and complex-censored datasets (Scenarios II and IV). Given that the MLEmpv method generally ranked lower than the other MLE methods for bias in Simulations 1 and 2, particularly for the complex-censored scenarios, and rarely exhibited clearly superior rMSE values, we see no compelling reason for using this method, even when estimating the mean (for which it was originally intended).

NP methods

KM method for estimating the 95th percentile and the mean. With occasional exceptions, the KM method was consistently in the middle to bottom half of the bias and rMSE rankings when estimating the 95th percentile. Regarding the mean, the KM method was, along with the LOD method, the worst of all the methods whenever there was a single LOD (Scenarios I and III), consistently yielding a strong positive bias for the mean. Its performance improved whenever there were multiple LODs (Scenarios II and IV), but still it remained consistently in the bottom half of the rankings. Helsel (2005) recommended against using the KM method whenever the observed percent censored was 50% or greater on the basis that the median cannot be estimated under such circumstances (and presumably the estimation of the mean also becomes problematic). In our simulations, we chose to ignore this restriction and estimated the mean and 95th percentile for any observed percent censored of <95%. Running the simulations with Helsel’s restriction did not improve the situation, and in fact, increased the absolute bias.

The KM method was selected by Helsel (2005) as the method of choice for estimating the mean for all sample sizes whenever the (observed) percent censored is <50%. This recommendation appeared to be based solely upon two studies that endorsed the KM method as a reasonable alternative to the standard CDA methods. In one study, the investigators (Schmoyer et al., 1996) concluded that the KM
method ‘seems better’ than the MLE method, but their conclusion was not free of equivocation and was focused on hypothesis testing rather than parameter estimation (as is the focus of our study). In the other study, the investigator (She, 1997) found that the LOD/2 substitution method often outperformed the KM method, but recommended the KM method over the substitution method because the LOD/2 method ‘has no statistical theoretical basis’.

Interestingly, as we programmed the KM method, we immediately recognized that when estimating the mean exposure the KM method is mathematically identical to the worst of the substitution methods (LOD substitution), which was demonstrated in the results (e.g. see the tables for Scenarios I and III), resulting in a strong positive bias for the mean. If the non-detects are internal, that is, bounded on both sides by detects, the KM method, in principle, has a negative bias. When a dataset has both a terminal non-detect and one or more internal non-detects, the biases introduced tend to cancel, but with no guarantee of a near zero overall bias, as we see in the tables for Scenarios II and IV. Helsel (2005) felt that in most multi-LOD situations, the overall bias ‘is not large’, but did not determine the degree of the bias for any particular scenario. We found that with three LODs (i.e. Scenarios II and IV), the KM method performs better than the LOD method, but still does poorly when compared to the MLE-based and LPR-based methods, even for Scenario IV (i.e. the multiple LOD and contaminated distribution scenario) where one would expect it to outperform those methods that require a distributional assumption. Regarding the 95th percentile (whenever the sample size is 20 or more), the KM method performed better than the LOD method, but only infrequently outperformed the higher order methods.

Based on these results, we see no compelling reason to recommend the KM method for estimating either the mean or the 95th percentile, even when the dataset contains multiple LODs and is suspected to contain multiple distributions (i.e. is a contaminated lognormal). Furthermore, according to She (1997) for the KM to be successfully applied the censoring must be random: ‘...the probability that the measurement of an object is censored cannot depend on the value of censored variable’. Schmoyer et al. (1996) recognized this essential requirement when in their computer simulations the authors assumed that the censoring point or LOD was a random variable and generated a random LOD for each random measurement. However, with occupational exposure data (and we suspect environmental data as well), the probability that a measurement will be censored does indeed depend on the true ‘value’ or concentration: the censoring point is relatively fixed and any true concentration below that censoring point will result in a LOD measurement. These considerations make it difficult to envision an occupational scenario where the KM method could be applied, even if it consistently performed better in the computer simulations than the other methods (which it did not).

Quantile methods for estimating the 95th percentile. In terms of both bias and rMSE, the NP quantile method that we selected for estimating the 95th percentile (i.e. the Q6 method, listed in Appendix 1) was either the worst or among the worst for both bias and rMSE, even for the contaminated distribution scenarios (Scenarios III and IV). Using the parameters for Simulation 1, Scenario IV, we tested the other quantile methods for estimating the 95th percentile (see Table 14 and Appendix 1). While all had less absolute bias than the Q6 method, the rMSE values were at the bottom of the rMSE rankings when compared to the parametric CDA methods (compare to the results in Table 5). This suggests that for the sample sizes considered in Simulation 1, Scenario IV (i.e. 20 ≤ n ≤ 100), the LPR- and MLE-based methods are sufficiently robust as to outperform the NP quantile methods. This raises the question: at what point will the NP methods be superior?

We increased the severity of the contaminated distributions used in the Simulation 1–Scenario IV combination by increasing the range for the two GMs from 3-fold to 10-fold, but otherwise retained the other simulation parameters in Table 1. As expected,
Table 14. Simulation using Simulation 1, Scenario IV parameters, comparing the eight 95th percentile (quantile) estimation methods presented by Hyndman and Fan (1996)

95th percentile
Method                           Bias (%)   Method                           rMSE
Q1: empirical CDF                3.3        Q4: weighted average 1           30.0
Q2: empirical CDF w/averaging    3.9        Q7: weighted average 3           30.1
Q7: weighted average 3           4.1        Q3: closest value                32.0
Q4: weighted average 1           4.8        Q2: empirical CDF w/averaging    37.6
Q3: closest value                4.9        Q1: empirical CDF                38.8
Q5: Cleveland                    5.8        Q5: Cleveland                    39.9
Q8: median quartile              10.1       Q8: median quartile              47.6
Q6: weighted average 2           19.4       Q6: weighted average 2           63.7
the bias and rMSE values for the NP quantile methods were virtually identical to those in
Table 14. For the LPR- and MLE-based methods, the bias and rMSE values were again superior,
little worse than the values in Table 5. However, when we increased the sample size range from
20–100 to 100–1000, holding all other Simulation 1, Scenario IV parameters the same, the bias
and rMSE values for the NP quantile methods tended to approach those of the LPR- and MLE-based
methods. This suggests that the NP methods could be applied to contaminated distributions, but
only for very large sample sizes.
In summary, the NP methods, while they have the advantage of being applicable to any underlying
exposure profile, whether or not that profile is close to some assumed distribution function,
do not perform as well as the parametric methods for right-skewed exposure profiles for sample
sizes of 100 or less, even for highly contaminated distributions. We feel that the contaminated
distribution scenario that we developed was a rigorous test of the robustness of the parametric
methods (i.e. the LPR- and MLE-based methods): the GMs for the two distributions were allowed to
vary by a factor of three (or even 10, as described above), the GSDs were allowed to range from
very low to extreme variability, and the percentage contribution of each distribution was
allowed to range between 0% and 100%. Even so, the parametric methods appear to be sufficiently
robust to these departures from the single lognormal distribution model that we routinely
assume for our data.

Small (<20) and large (>100) sample sizes
Most of the above discussions apply to Simulations 1 and 2. At the smaller sample sizes used in
Simulation 3, which unfortunately are all too common, the rankings could lead to different
recommendations. For example, the MLEmpv method, which did not fare well when confronted with
larger datasets, consistently did very well for both the 95th percentile and mean, with both
bias and rMSE at or near the top of the rankings for all four scenarios. However, the other
MLE-based methods, which appear to be more universally applicable, also did well. The LPR-based
methods tended to fall in the rankings, consistently ranking at or near the bottom for the
rMSE. Furthermore, if the rMSE is taken as the primary comparison metric, it has to be noted
that for this set of simulations the substitution methods did very well when estimating the
95th percentile.
The above discussions apply to either small or moderate sample sizes. As noted earlier in the
discussion of the quantile results, we repeated the simulations, this time increasing the
sample size range from 20–100 to 100–1000 and holding all other Simulation 1, Scenario IV
parameters the same. For these larger sample sizes, the bias and rMSE results for the LPR- and
MLE-based methods tend to converge, suggesting that method-related differences should be a
minor consideration when estimating either the 95th percentile or the mean. When estimating the
95th percentile in the contaminated distribution scenarios, the NP quantile methods tended to
slightly outperform the MLE-based methods in terms of bias, but not rMSE, suggesting that the
NP methods should not in such instances be considered an automatic alternative to the higher
order methods when estimating the 95th percentile for large datasets. For sample sizes <100,
the LPR- and MLE-based methods are sufficiently robust to be preferable to the NP quantile
methods.

Opportunities for improvement
The purpose of this paper is to present and discuss the computer simulation results and not to
identify any real or imagined defects in the various CDA methods. However, since all of the
methods are essentially ad hoc and none can be considered the acknowledged universal choice,
there must be room for improvement. For example, the effect of the plotting position formula on
the LPR-based methods could be reexamined, although Helsel and Cohn (1988) stated that it had
little effect. The plotting position method for the LPRrm method is considerably different from
that used in the LPR and LPRr methods, being based upon the Weibull plotting positions rather
than formulae that assume an underlying normal distribution (for the log-transformed values)
(see Helsel, 2005). We found that the LPRrm method causes identical plotting positions to be
assigned to adjacent single, unique non-detects. Is this correct or is this a defect? Perhaps
there are superior plotting position schemes for all of the LPR-based methods. Furthermore, all
of the LPR-based methods rely upon standard linear regression with its independent–dependent
variable assumption. Perhaps the utility of methods that do not assume that all measurement
error resides with the dependent variable (such as major axis regression or reduced major axis
regression) should be examined. For both the LPR and MLE methods, we adopted a simple scheme
that allowed the use of the mvue equation (Mulhausen and Damiano, 1998) to reduce the
transformation bias: for the LPR method we used the number of detects as the sample size in the
mvue equation and for the MLE method we used the full sample size. This scheme was based upon
preliminary computer simulations and seemed to work well, particularly as the sample size
increased. However, a more sophisticated scheme for adjusting the sample size for the mvue
equation might improve the bias and rMSE of these methods when estimating the mean.
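To make the role of the plotting position formula concrete, the following is a minimal sketch of a log-probit regression for a dataset with a single LOD. It is not the authors' implementation: the function names are hypothetical, the non-detects are simply assumed to occupy the lowest ranks, and the Blom formula (i − 3/8)/(n + 1/4) is an assumed stand-in for a formula that presumes an underlying normal distribution. The Weibull formula i/(n + 1) is included so the two can be swapped.

```python
import math
from statistics import NormalDist


def blom(i, n):
    """Blom plotting position for rank i of n (assumed normal-theory formula)."""
    return (i - 0.375) / (n + 0.25)


def weibull(i, n):
    """Weibull plotting position for rank i of n."""
    return i / (n + 1)


def lpr_fit(detects, n_nondetects, position=blom):
    """Regress ln(x) on normal scores for the detects only.

    Non-detects are assumed to hold ranks 1..n_nondetects (single LOD).
    Returns (intercept, slope), i.e. estimates of ln(GM) and ln(GSD).
    Requires at least two detects.
    """
    n = len(detects) + n_nondetects
    ys = [math.log(x) for x in sorted(detects)]
    zs = [NormalDist().inv_cdf(position(n_nondetects + j + 1, n))
          for j in range(len(ys))]
    z_bar = sum(zs) / len(zs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
             / sum((z - z_bar) ** 2 for z in zs))
    return y_bar - slope * z_bar, slope


def lpr_percentile(detects, n_nondetects, p=0.95, position=blom):
    """Estimate the p-th percentile from the fitted log-probit line."""
    intercept, slope = lpr_fit(detects, n_nondetects, position)
    return math.exp(intercept + slope * NormalDist().inv_cdf(p))
```

Calling `lpr_percentile(detects, n_nondetects, position=weibull)` instead of the default gives a different fitted line for the same data, which is the kind of sensitivity the paragraph above suggests re-examining.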
One of the themes of this paper is that there will be datasets where the unimodal, lognormal
distribution assumption is inappropriate, but to date, there is little guidance on how to make
this determination. Certainly, subjective graphical techniques (log-probability plots and
histograms) are helpful, but thus far all of the objective goodness-of-fit procedures assume or
require a complete or uncensored dataset. A censored data goodness-of-fit method, perhaps
consisting of both a subjective graphical test and an objective statistical test, where the
outcome is a decision to use a standard versus robust version of a method (e.g. MLE versus MLEr
or MLErm) might prove to be useful.
None of the methods studied here account for the residual information in a complex dataset: the
LOD and the laboratory in use for each measurement (whether a detect or a non-detect). It is
conceivable that a superior CDA method could be devised for complex datasets that makes use of
this discarded information.

Additional simulations
There are obviously numerous simulations and scenarios that we could have devised. We have
already mentioned other simulations where we increased the sample size range or increased the
severity of the contaminated distributions to examine the usefulness of the NP quantile methods
when confronted with contaminated distributions. In addition, we repeated Simulation 1, but
this time increasing the range for the percent censored in the four Scenarios from 1–50% to
1–80%. While there were some changes in the rankings, our conclusions above remain the same.
Hornung and Reed (1990) suggested that the LOD/2 substitution method could be used in
epidemiological studies where the mean exposure is of interest and the percent censored is
<50%. We repeated Simulation 1, Scenario I, but this time restricting the range of GSDs to
1.2–2, 2–3 and 3–4. For the low GSD range, we found that the LOD/√2 method, while slightly
biased, had an rMSE value only slightly greater than those of the LPR- and MLE-based methods.
For the medium GSD range, we found that both methods, while again slightly biased, had rMSE
values slightly greater than those of the LPR- and MLE-based methods. For the high GSD range,
we found that the LOD/2 method was virtually unbiased for the mean and had an rMSE value
slightly greater than the LPR- and MLE-based methods. These findings are consistent with those
of Hornung and Reed (1990) and suggest that the use of a judiciously selected substitution
method when estimating the mean is not unreasonable.

Effect of exposure measurement error
To our knowledge, none of the published papers on CDA have considered the potentially
confounding effect of measurement error. In our own computer simulations, we assumed (i) that
the randomly generated exposures were measured precisely and (ii) that the measurements were
not rounded or truncated. Therefore, our conclusions, and those of the referenced studies, are
strictly applicable to ideal exposure measurement systems.
A sampling and analytical method produces an estimate of the true concentration. The accuracy
(referring to the combination of bias and precision) of these estimates varies with the mass
collected, the analyte and the analytical method. Due to variability in the sampling pump flow
rate and manufacturing variation in the sampling device (e.g. the sampling device used to
obtain a respirable dust sample), the mass collected will be an estimate of the true mass per
unit volume at the location sampled. An analytical method is used to estimate the true mass of
the analyte collected by the sampling device. The relative variability in the analytical method
increases with decreasing collected mass. The overall effect of these factors on the method's
total coefficient of variation (CVt) is demonstrated in Appendix 2 for sampling respirable dust
(i.e. respirable particulate mass or RPM).
The second issue involves the rounding or truncation of the measurements. The sample volume is
typically rounded to two or three significant figures. More importantly, the analyte mass for
samples near the laboratory LOD is reported using one significant digit. As the detected mass
increases, the laboratory result will generally have two or three significant digits.
Furthermore, instead of rounding, a laboratory may use truncation to obtain the necessary
number of significant digits. All of this adds additional uncertainty to the reported mass
detected per sample and the eventual calculated concentration.
We repeated Scenario I (see Table 1) for Simulations 1, 2 and 3 for the sampling of RPM and
taking into account measurement error (but ignoring for now the potential effect of rounding or
truncation). We assumed that each filter is blank corrected using a separate blank per sample.
Other assumptions regarding filter weighing precision and the variability associated with the
flow rate and the sampling device are described in Appendix 2. In all of the previous
Scenario I simulations (see Table 1), the GM was fixed at 1, and the field LOD varied according
to the percentage of the distribution that was censored. Here the field LOD was fixed at
0.037 mg m⁻³ (see Appendix 2), which required that the GM vary according to the percentage of
the distribution that was censored. Otherwise, all of the simulation conditions presented in
the Computer Simulation Methods section for Scenario I apply. True concentrations were
generated as before, but this time a ‘measurement’ was simulated
by adding measurement error to the true concentration using the following equation:

$$x' = x\,\bigl(1 + Z_r\,CV_t(x)\bigr),$$

where x' = measured concentration, x = true concentration (a random value generated from a
lognormal distribution), Z_r = random Z-value and CV_t(x) = the sampling and analytical method
CVt at x (see Appendix 2).
The results are presented in Tables 15–17. Interestingly, while the bias tended to increase for
all methods, the rMSE often changed little or even decreased. (After considering the issue, we
recognized that, since the CVt increases with decreasing concentration, it is more likely that
a true non-detect will be ‘measured’ as a detect and less likely that a detect will be measured
as a non-detect. This results in a positive bias for the GM and a negative bias for the GSD,
with an overall positive bias for both the 95th percentile and the mean with little change in
the rMSE. This effect becomes more pronounced as the percent censored increases.) Although the
relative ranking of an individual method often changed, the relative ranking of the LPR- and
MLE-based methods remained much the same. Therefore, our general conclusions remain unchanged.
The results of the modified Scenario I simulations indicate that the general effect of
measurement error is an increase in the bias for both the 95th percentile and the mean, but
with little to no change to the rMSE, leaving unchanged our general conclusions regarding the
relative rankings of the methods, at least when sampling respirable dust. However, we suspect
that this will also be true for substances other than respirable dust (i.e. RPM), given that
the flow rate CV and analysis CV used in the RPM analysis are similar to those for most
sampling and analytical methods. In conclusion, the general effect of measurement error on the
comparisons of the various CDA methods should be minimal. The potentially confounding effect of
rounding or truncation will be investigated at a later date.

RECOMMENDATIONS

Since in reality occupational exposure profiles cannot be truly lognormal, our view is that all
of the CDA methods discussed here are ad hoc. While some have a practical basis (substitution,
KM, NP) and the remainder appeal to the notion that the data are reasonably well described by a
theoretical distribution function (the LPR- and MLE-based methods), all are data analysis tools
that we use in the hope that the results will be reasonably close to the truth. In occupational
health, we rely heavily on the lognormal distribution assumption for summarizing our
right-skewed datasets and for approximating the shape of the true exposure profile. However,
unlike the lognormal distribution function, the actual exposure profile for any particular
workplace has an upper boundary and is an unknown function of the physical parameters of the
workplace and work practices of the employees, and not a function of the GM and GSD. Even if
the underlying exposure profile is reasonably lognormal, we will never know the true GSD or the
true percentage of the underlying distribution that lies below the field LOD, so that overall
it can be difficult to determine which of the simulation and scenario combinations devised here
best fit our situation.
Our results show that for the simulations and scenarios postulated an ‘omnibus’ CDA method does
not yet exist. In our view, a ‘preferred’ CDA method is one that has both low bias and rMSE for
the exposure profile parameter of interest and is robust whenever the true underlying
distribution departs from the lognormal distribution model. Our personal preference when
estimating the 95th percentile is to use an MLE-based method (with the exception of the MLEmpv
method). The MLE-based methods appear to be fairly robust, especially when compared to the
supposedly robust NP methods, and preferable to the LPR-based methods, which tend to have
larger rMSE values, particularly when the sample size is <20. Within the MLE-based methods, the
standard MLE method comes closest to being an omnibus method, and therefore receives our
recommendation as the preferred method.
Another selection factor to consider is the ease of calculation (or accessibility). The
substitution methods are by far the easiest to implement and when dealing with large datasets,
such as when constructing a job-exposure-matrix for the mean exposure, are certainly expedient
and may be reasonably accurate, as was suggested by Hornung and Reed (1990), depending upon the
true (but unknown) underlying GSD and percent censored. The LPR-based methods are more
complicated, particularly the LPRr and LPRrm methods as these involve ad hoc methods and as
such are difficult to automate via a programming language. The MLE method is easily accessible
using the solver-function of most computer spreadsheets (Finkelstein and Verma, 2001), but some
manual manipulation of the data is required. The MLEr and MLErm methods require additional
manual manipulation and are therefore also difficult to program.
Procedures using r-code have been published for implementing the more complicated methods (for
example, MLE and KM), but even here the user is required to become proficient in the
statistical programming language of r-plus (Frome and Wambach, 2005). Many statistical packages
include the KM method for right-censored data, which necessitates some manipulation of both the
data and results in order to apply the method to left-censored data. Consequently,
Table 15. Simulation 1, Scenario I (with measurement error): single lognormal distribution and a single laboratory where the
laboratory LOD is 1–50% of the true distribution; 20 ≤ n ≤ 100; 1.2 ≤ GSD ≤ 4
MLErm 1.7 MLEmpv 22.5 MLE 0.4 MLE 17.9
LPRr 2.5 Sub LOD/√2 22.8 Sub LOD/√2 0.8 LPR 18.4
MLEr 2.8 MLErm 22.9 LPRrm 1.0 MLErm 19.6
MLE 2.8 Sub LOD 23.7 MLEmpv 1.1 LPRrm 19.7
Sub LOD/2 3.5 MLE 23.7 MLErm 1.1 MLEr 19.7
LPRrm 3.5 MLEr 23.7 MLEr 1.2 MLEmpv 19.7
MLEmpv 3.6 LPRr 24.1 LPRr 1.5 LPRr 19.9
LPR 4.3 LPRrm 24.7 LPR 1.5 Sub LOD/√2 20.0
KM 4.7 Sub LOD/2 24.9 Sub LOD/√2 1.7 Sub LOD/2 20.6
NP 16.9 NP 52.3

MLErm 2.4 MLErm 24.4 Sub LOD/2 3.6 MLE 19.6
MLEr 4.3 Sub LOD 25.4 LPRrm 4.1 LPR 20.2
LPRr 5.0 MLEr 25.5 MLEmpv 5.1 MLEmpv 20.5
MLE 5.4 MLE 25.8 LPR 5.5 LPRrm 20.9
LPRrm 6.0 Sub LOD/√2 26.1 MLErm 6.7 MLEr 20.9
KM 6.6 LPRr 26.6 LPRr 6.8 MLErm 20.9
LPR 7.3 Sub LOD/2 26.7 MLE 6.8 Sub LOD/2 21.1
Sub LOD/2 13.4 MLEmpv 26.7 MLEr 6.8 LPRr 21.2
MLEmpv 15.6 LPRrm 27.7 Sub LOD/√2 15.9 Sub LOD/√2 26.5
NP 19.0 NP 51.7

Sub LOD/√2 0.8 Sub LOD 52.4 Sub LOD/2 0.9 MLE 38.5
MLEmpv 3.7 Sub LOD/√2 55.6 LPRrm 0.9 LPR 40.5
MLE 6.1 Sub LOD/2 59.7 MLE 1.0 MLErm 41.7
Sub LOD 6.1 MLEmpv 61.0 MLEr 1.0 MLEr 41.9
LPRr 6.4 MLE 62.2 MLErm 1.1 LPRr 41.9
MLErm 6.6 MLEr 65.0 MLEmpv 1.3 MLEmpv 42.1
MLEr 7.2 MLErm 65.8 Sub LOD/√2 1.8 LPRrm 42.9
Sub LOD/2 9.8 LPRr 70.9 LPR 2.0 Sub LOD 43.2
LPRrm 13.1 LPRrm 85.6 LPRr 2.2 KM 43.2
LPR 13.9 LPR 96.2 KM 5.2 Sub LOD/√2 43.7
Sub LOD 5.3 Sub LOD/2 44.0
Table 18. Recommended methods based on the rMSE results

Distribution assumption Sample size Parameter
X0.95 Mean
1–50% Censored
Reasonably lognormal Small n, 5 ≤ n ≤ 19 MLE MLE (MLErm) LPR
Large n, 20 ≤ n ≤ 100 MLErm (MLE, MLEr) MLE MLErm LPR (LPRrm)
Contaminated lognormal Small n, 5 ≤ n ≤ 19 MLE MLE LPR
Large n, 20 ≤ n ≤ 100 MLE (MLEr MLErm) MLE (MLEr MLErm) LPR
50–80% Censored
Reasonably lognormal Large n, 20 ≤ n ≤ 100 MLE MLE
Contaminated lognormal Large n, 20 ≤ n ≤ 100 MLE (MLErm) MLE (MLErm)
Methods in parentheses are roughly equivalent in performance to the recommended method.
apart from any opinions regarding the preferred method, ease of use and accessibility are bound
to be factors in the final selection of a CDA method.
One obvious solution to these dilemmas is to simply eliminate or reduce the need for a CDA
method through the judicious selection of an analytical method that has a low LOD and/or the
collection of larger volume samples. Furthermore, those laboratories that routinely report LOQs
rather than LODs should be requested to additionally report the traditional LOD and the mass
detected between the LOD and the LOQ. While a measurement between the LOD and LOQ is admittedly
less reliable than a measurement above the LOQ, the loss of information when using the LOQ
makes it even more difficult to estimate the lognormal parameters for the underlying
distribution (Eduard, 2002; Helsel, 2005).
We agree with Shumway et al. (2002) that the temptation to combine disparate data (that is,
data collected from different plants or similar exposure groups or from periods when different
laboratories were used) should be resisted. The resulting datasets are often complex, with
multiple censoring points, and probably were drawn from more than one underlying distribution.
It is unreasonable, in our view, to expect a CDA method to extract a highly accurate estimate
of a parameter of interest under such circumstances, and a faith that an NP method will indeed
do so is, as our simulations reveal, somewhat misplaced. However, if the analysis of a complex
dataset is required and it is strongly suspected that the underlying distribution departs
significantly from the unimodal lognormal model, our recommendation is to use the standard MLE
method for estimating either the 95th percentile (or other upper percentiles) or the mean. The
results in this study suggest that little is gained from the increased complexity of the MLEr
and MLErm methods (or the LPRr and LPRrm methods).
Finally, Table 18 summarizes our recommendations for sample sizes of 100 or less. The listed
methods are felt to be roughly equal in performance. Our preference is to select from a
particular family of methods when at all possible. Since the MLE-based methods did consistently
well in all of the scenarios, our table consists primarily of the MLE-based methods. However, a
similar table could be constructed that would hold primarily LPR-based methods (except for
perhaps when n is small). In our view, the standard MLE method comes closest to being an
omnibus method. The so-called robust versions of the MLE- and LPR-based methods did not
consistently result in superior performance when challenged with contaminated lognormal
distributions. Due to the failure of the NP quantile methods and the KM method to perform
better than the LPR- and MLE-based methods when confronted with contaminated distributions (for
the scenarios postulated in this paper), we do not recommend their inclusion in any such table.

APPENDIX 1. QUANTILE CALCULATION METHODS

The following table lists the NP quantiles presented by Hyndman and Fan (1996), the
corresponding quantiles offered by two commercial statistics packages and the necessary
calculations. Hyndman and Fan recommended the Q8 method. Q1 through Q3 use a step function
approach. However, the Q2 method uses an averaging method whenever 0.95n equals an integer. The
Q4 through Q8 methods use linear interpolation to estimate the 95th percentile whenever k is
not an integer. The default quantile methods for the Systat Version 11 and the SAS Version 9
statistics programs are the Q5 and Q2 methods, respectively.

APPENDIX 2. CALCULATION OF MEASUREMENT ERROR AND THE MINIMUM DETECTABLE CONCENTRATION

Overall method accuracy is usually summarized using the propagation of errors formula (Kogut
et al., 1997):

$$CV_t = \sqrt{CV_{pump}^2 + CV_{sampler}^2 + CV_{analysis}^2},$$
Hyndman and Fan (1996)   Systat Version 11              SAS Version 9         Intermediate calculations (a)            95th percentile calculation
quantile method
Q1                       Empirical CDF                  QNTLDEF 3             k = 0.95n, i = Floor(k)                  If (k − i) > 0 then X0.95 = x_{i+1}; if (k − i) = 0 then X0.95 = x_i
Q2                       Empirical CDF with averaging   QNTLDEF 5 (default)   k = 0.95n, i = Floor(k)                  If (k − i) > 0 then X0.95 = x_{i+1}; if (k − i) = 0 then X0.95 = ½(x_{i+1} + x_i)
Q3                       Closest value                  QNTLDEF 2             k = 0.95n, i = Round(k)                  X0.95 = x_i
Q4                       Weighted average 1             QNTLDEF 1             k = 0.95n, i = Floor(k)                  X0.95 = x_i + (k − i)(x_{i+1} − x_i)
Q5                       Cleveland's method (default)                         k = 0.95n + ½, i = Floor(k)              X0.95 = x_i + (k − i)(x_{i+1} − x_i)
Q6                       Weighted average 2             QNTLDEF 4             k = 0.95(n + 1), i = Floor(k)            X0.95 = x_i + (k − i)(x_{i+1} − x_i)
Q7                       Weighted average 3                                   k = 0.95(n − 1) + 1, i = Floor(k)        X0.95 = x_i + (k − i)(x_{i+1} − x_i)
Q8                                                                            k = 0.95(n + ⅓) + ⅓, i = Floor(k)        X0.95 = x_i + (k − i)(x_{i+1} − x_i)

(a) The ‘Floor(k)’ function returns the highest integer less than or equal to k. The ‘Round(k)’ function uses Banker's Rounding.
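The calculations in the Appendix 1 table can be sketched in a few lines of code. The function below is an illustration, not the code used in the study: the name `x95` is hypothetical, exact fractions are used so that the test "0.95n equals an integer" is not disturbed by floating-point error, and order statistics beyond the sample range are clamped to the smallest or largest value (an assumption, since the table does not say how that case is handled). Q2 follows the Appendix 1 text, averaging only when 0.95n is an integer.

```python
from fractions import Fraction
from math import floor

P = Fraction(95, 100)  # target quantile, kept exact


def _order_stat(xs, i):
    """1-based order statistic x_i, clamped to the sample range (assumption)."""
    return xs[min(max(i, 1), len(xs)) - 1]


def _interpolate(xs, k):
    """x_i + (k - i)(x_{i+1} - x_i) with i = Floor(k)."""
    i = floor(k)
    g = float(k - i)
    return _order_stat(xs, i) + g * (_order_stat(xs, i + 1) - _order_stat(xs, i))


def x95(data, method="Q8"):
    """95th percentile by one of the eight Hyndman and Fan (1996) definitions."""
    xs = sorted(data)
    n = len(xs)
    k = P * n
    i = floor(k)
    if method == "Q1":   # empirical CDF
        return _order_stat(xs, i + 1) if k > i else _order_stat(xs, i)
    if method == "Q2":   # empirical CDF with averaging
        if k > i:
            return _order_stat(xs, i + 1)
        return 0.5 * (_order_stat(xs, i) + _order_stat(xs, i + 1))
    if method == "Q3":   # closest value; round() on a Fraction rounds half to even
        return _order_stat(xs, round(k))
    k_by_method = {
        "Q4": P * n,
        "Q5": P * n + Fraction(1, 2),
        "Q6": P * (n + 1),
        "Q7": P * (n - 1) + 1,
        "Q8": P * (n + Fraction(1, 3)) + Fraction(1, 3),
    }
    return _interpolate(xs, k_by_method[method])


# The eight definitions generally give different answers for the same small sample
for name in ("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8"):
    print(name, x95(range(1, 21), name))
```

For small samples the choice of definition matters: with n = 20 the interpolating methods place the 95th percentile anywhere between the 19th and 20th order statistics.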
where CVt = total coefficient of variation for a sampling and analytical method; CVpump =
fractional variation due to random variations in the pump flowrate; CVsampler = fractional
variation due to the manufacturing variation in the sampling device and CVanalysis = fractional
variation due to the analytical determination of the analyte mass.
The CVpump and CVsampler are generally considered to be independent of the true concentration.
The CVanalysis is not independent of the true concentration, resulting in the following general
equation for CVt:

$$CV_t(x) = \sqrt{CV_{pump}^2 + CV_{sampler}^2 + \left(\frac{\sigma_{mass}}{Q\,T}\cdot\frac{1}{x}\right)^2},$$

where σmass = the standard deviation of the analytical system; Q = flowrate; T = averaging time
for the measurement and x = true concentration.

Fig. 1. Total coefficient of variation (CVt) calculated as a function of true concentration when sampling respirable dust (i.e. RPM).

Using as an example the sampling of respirable dust (respirable particulate mass, RPM) (and
ignoring any uncorrectable particle size distribution effects), the total coefficient of
variation at different concentrations of RPM can be estimated. Although recent studies (Kogut
et al., 1997) have shown slightly lower values, the CVpump has traditionally been given a value
of 0.05. The CVsampler for the Dorr-Oliver 10-mm nylon cyclone has been estimated by Kogut
et al. (1997) to be 0.023, but in this example we will use 0.05 as reported by Bartley et al.
(1994). The σmass will vary with the analyte and analytical method. For a single weighing of a
filter used in respirable dust sampling, a typical σmass is 0.005 mg. Let us assume that the
mass collected on each filter is also blank corrected (i.e. each sample filter has a matching
blank filter). Since both the sample filter and the matched blank are pre- and post-weighed, a
total of four weighings are required to estimate the mass collected on the sample filter,
resulting in an overall σmass of 0.010 mg. Finally, the flow rate and averaging time will be
set at the standard values of 1.7 Lpm (for RPM) and 480 min, resulting in the following
equation:

$$CV_t(x) = \sqrt{0.05^2 + 0.05^2 + \left(\frac{0.010\ \mathrm{mg}}{0.0017\ \mathrm{m^3\,min^{-1}} \times 480\ \mathrm{min}}\cdot\frac{1}{x}\right)^2}.$$

Figure 1 shows the relationship between the CVt and the true RPM concentration. At the higher
concentrations, the CVt is relatively constant. Sufficient mass is collected that the
contribution of the CVanalysis becomes insignificant compared to the fixed variability due to
the sampling pump and sampler. At low concentrations, the CVanalysis predominates and steadily
increases with decreasing collected mass. (The curve does not remain flat forever as the
concentration increases. Very high concentrations will tend to result in overloaded samplers,
which will drive the CVt upwards.)
A CVt curve can be determined for any analyte and sampling method, and will have a shape
similar to that in Fig. 1. According to NIOSH (Abell and Kennedy, 1997), a reasonably accurate
method should have a true CVt that is <0.128 over the range of 10–200% of the exposure limit.
However, at the method's field LOD (that is, the laboratory LOD divided by the sample volume,
also called the minimally detectable concentration) the CVt can be much greater. The field LOD
for sampling RPM can be estimated using the σmass:

$$LOD = \frac{3\,\sigma_{mass}}{Q\,T}.$$

$$LOD = \frac{3 \times 0.010\ \mathrm{mg}}{0.0017\ \mathrm{m^3\,min^{-1}} \times 480\ \mathrm{min}} = 0.037\ \mathrm{mg\,m^{-3}}.$$

The factor of three forces the field LOD to be three standard deviations above the mean weight
change for an unused filter (the true mean weight change is zero). Three standard deviations
above the mean instrument response is the traditional method for determining the analytical LOD
(Abell and Kennedy, 1997).
At the field LOD the CVt is 0.34, indicating that there is a considerable amount of method
variability, or what is commonly referred to as measurement error. This suggests that at and
below the LOD, it is highly likely that a true non-detect (that is, a true concentration that
is less than the LOD) could be reported as a detect, above the LOD, due to measurement error.
The reverse is also true: a true detectable concentration could be reported as a non-detect.

REFERENCES

Abell MT, Kennedy ER. (1997) A computer program to promote understanding of the monitoring
evaluation guidelines used at NIOSH. Am Ind Hyg Assoc J; 58: 236–41.
Bartley DL, Chen C, Song R et al. (1994) Respirable aerosol sampler performance testing. Am Ind
Hyg Assoc J; 55: 1036–46.
Bullock WH, Ignacio JS, editors. (2006) A strategy for assessing and managing occupational
exposures. 3rd edn. Fairfax, VA: American Industrial Hygiene Association.
Eduard W. (2002) Estimation of mean and standard deviation (letter to the editor). Am Ind Hyg
Assoc J; 63: 4.
El-Shaarawi AH, Esterby SR. (1992) Replacement of censored observations by a constant: an
evaluation. Water Res; 26: 835–44.
Environmental Protection Agency (EPA). (2006) Data quality assessment: statistical methods for
practitioners, EPA QA/G-9S. Washington, DC: Environmental Protection Agency.
Finkelstein MM, Verma DK. (2001) Exposure estimation in the presence of nondetectable values:
another look (see AIHAJ 63:4 2002 for letters to the editor). AIHAJ; 62: 195–8.
Frome EL, Wambach PF. (2005) Statistical methods and software for the analysis of occupational
exposure data with non-detectable values. Oak Ridge, TN: Oak Ridge National Laboratory
ORNL/TM-2005/52.
Gibbons RD, Goleman DE. (2001) Statistical methods for detection and quantification of
environmental contamination. New York: John Wiley and Sons, Inc.
Gilbert RO. (1987) Statistical methods for environmental pollution monitoring. New York: Van
Nostrand Reinhold.
Gilliom RJ, Helsel DR. (1986) Estimation of distributional parameters for censored trace level
water quality data 1. Estimation techniques. Water Resour Res; 22: 135–46.
Glass DC, Gray CN. (2001) Estimating mean exposures from censored data: exposure to benzene in
the Australian petroleum industry. Ann Occup Hyg; 45: 275–82.
Hawkins NC, Norwood SK and Rock JC, editors. (1991) A strategy for occupational exposure
assessment. Fairview, VA: American Industrial Hygiene Association.
Helsel DR, Cohn TA. (1988) Estimation of descriptive statistics for multiply censored water
quality data. Water Resour Res; 24: 1997–2004.
Helsel DR, Hirsch RM. (2002) Statistical methods in water resources. Department of the
Interior, United States Geological Survey, Reston, Virginia. Available from:
http://water.usgs.gov/pubs/twri/twri4a3/.
Helsel DR. (2005) Nondetects and data analysis. New York: John Wiley & Sons, Inc.
Hornung RW, Reed LD. (1990) Estimation of average concentration in the presence of
nondetectable values. Appl Occup Environ Hyg; 5: 46–51.
Hyndman RJ, Fan Y. (1996) Sample quantiles in statistical packages. Am Stat; 50: 361–5.
Kogut J, Tomb TF, Parobeck PS et al. (1997) Measurement precision with the coal mine dust
personal sampler. Appl Occup Environ Hyg; 12: 999–1006.
Kroll CN, Stedinger JR. (1996) Estimation of moments and quantiles using censored data. Water
Resour Res; 32: 1005–12.
Mulhausen J, Damiano J, editors. (1998) A strategy for assessing and managing occupational
exposures. 2nd edn. Fairfax, VA: American Industrial Hygiene Association.
SAS. (2006) SAS/STAT software, Version 9. SAS Institute, Inc. Available from:
http://www.sas.com.
Schmoyer RL, Beaucamp JJ, Brandt CC et al. (1996) Difficulties with the lognormal model in mean
estimation and testing. Environ Ecol Stat; 3: 81–97.
She N. (1997) Analyzing censored water quality data using a non-parametric approach. J Am Water
Res Assoc; 33: 615–24.
Shumway RH, Azari RS, Kayhanian M. (2002) Statistical approaches to estimating mean water
quality concentrations with detection limits. Environ Sci Tech; 36: 3345–53.
Succop PA, Clark S, Chen M et al. (2004) Imputation of data values that are less than a
detection limit. J Occup Environ Health; 1: 436–41.
Symanski E, Kupper LL, Kromhout H, Rappaport SM. (1996) An investigation of systematic changes
in occupational exposure. Am Ind Hyg Assoc J; 57: 724–35.
Systat. (2007) Systat, Version 11. Systat Software, Inc. Available from: http://www.systat.com.
