
Statistical Estimation Methods in Hydrological Engineering

P.H.A.J.M. van Gelder


TU Delft, The Netherlands
p.vangelder@ct.tudelft.nl, http://surf.to/vangelder

Introduction
In the design of civil engineering structures, use is made of probabilistic calculation
methods. Stress and load parameters are described by statistical distribution functions,
whose parameters can be estimated by various methods. An extensive comparison of
these estimation methods is given in this paper. The main point of interest is the
behaviour of each method in predicting p-quantiles (the value which is exceeded by
the random variable with probability p), where p « 1.
The estimation of extreme quantiles corresponding to a small probability of
exceedance is commonly required in the risk analysis of hydraulic structures. Such
extreme quantiles may represent design values of environmental loads (wind, waves,
snow, earthquake), river discharges, and flood levels specified by design codes and
regulations (TAW, 1990). In this paper the performance of the parameter estimation
methods with respect to their small-sample behaviour is analyzed with Monte Carlo
simulations, supplemented with mathematical proofs.
In civil engineering practice many parameter estimation methods for
probability distribution functions are in circulation. Well-known methods include:
- the method of moments (Johan Bernoulli, 1667-1748),
- the method of maximum likelihood (Daniel Bernoulli, 1700-1782),
- the method of least squares (on the original or on the linearized data), (Gauss, 1777-
1855),
- the method of Bayesian estimation (Bayes, 1763),
- the method of minimum cross entropy (Shannon, 1949),
- the method of probability weighted moments (Greenwood et al., 1979),
- the method of L-moments (Hosking, 1990).
Textbooks such as Benjamin and Cornell (1970) and Berger (1980) treat the
traditional methods in detail. The methods will be briefly reviewed in this paper.

Many attempts (for instance, Goda and Kobune (1990), Burcharth and Liu (1994),
Yamaguchi (1997), and Van Gelder and Vrijling (1997a)) have been made to find
out which estimation method is preferable for the parameter estimation of a particular
probability distribution in order to obtain a reliable estimate of the p-quantiles. In this
paper, we will in particular investigate the performance of the parameter estimation
methods with respect to two sets of criteria: (i) the relative bias and root mean squared
error (RMSE), and (ii) the degree of over- and underdesign.
It is desirable that the quantile estimate be unbiased, that is, its expected value
should be equal to the true value. It is also desirable that an unbiased estimate be
efficient, i.e., its variance should be as small as possible. The problem of unbiased
and efficient estimation of extreme quantiles from small samples is commonly
encountered in civil engineering practice. For example, annual flood discharge
data may be available for the past 50 to 100 years, and on that basis one may have to
estimate a design flood level corresponding to a 1,000 to 10,000 year return period
(Van Gelder et al., 1995).
The first step in quantile estimation involves fitting an analytical probability
distribution to represent adequately the sample observations. To achieve this, the
distribution type should be judged from data and then parameters of the selected
distribution should be estimated. Since the bias and efficiency of quantile estimates
are sensitive to the distribution type, the development of simple and robust criteria for
fitting a representative distribution to small samples of observations has been an
active area of research. In this paper three different methods for the selection of the
distribution type will be reviewed, extended and tested. The first method is based on
Bayesian statistics, the second one on linear regression, and the third one on L-
moments. Certain linear combinations of expectations of order statistics, also referred
to as L-moments by Hosking (1990), have been shown to be very useful in statistical
parameter estimation. Being a linear combination of data, they are less influenced by
outliers, and the bias of their small sample estimates remains fairly small. A measure
of kurtosis derived from L-moments, referred to as L-kurtosis, was suggested as a
useful indicator of distribution shape (Hosking, 1992).
Hosking (1997) proposed a simple but effective approach to fit 3-parameter
distributions. The approach involves the computation of three L-moments from a
given sample. By matching the three L-moments, a set of 3-parameter distributions
can be fitted to the sample data. In this paper, a selection of the distribution type
based on the 4th L-moment is suggested to identify the most representative
distribution, which should then be used for quantile estimation. In essence, the
L-kurtosis, which is related to the 4th L-moment, can be interpreted as a measure of
resemblance between two distributions having common values of the first three L-moments.
The concept of probabilistic distance or discrimination between two
distributions is discussed in great detail in modern information theory (Kullback 1959,
Jumarie 1990). A mathematically sound measure of probabilistic distance, namely the
divergence, has been used to establish resemblance between two distributions or,
conversely, to select the closest possible posterior distribution given an assumed prior
distribution. The divergence is a comprehensive measure of probabilistic distance,
since it involves the computation of the departure of a distribution from the reference
parent distribution over the entire range of the random variable. Apart from the
performance of estimation methods based on bias and RMSE, a performance measure
based on under- and overdesign is suggested in this paper.
Furthermore, this paper will focus on evaluating the robustness of the L-
kurtosis measure in the distribution selection and extreme quantile estimation from
small samples. The robustness is evaluated against the benchmark estimates obtained
from the information theoretic measure, namely, the divergence. For this purpose, a
series of Monte Carlo simulation experiments were designed in which probability
distributions were fitted to the sample observations based on L-kurtosis and
divergence based criteria, and the accuracies of quantile estimates were compared.
The simulation study revealed that the L-kurtosis measure is fairly effective in
quantile estimation.
Finally, this paper shows some analytical considerations concerning statistical
estimation methods and probability distribution functions. The paper ends with a
discussion.

Classical estimation methods
To make statements about the population on the basis of a sample, it is important to
understand in what way the sample relates to the population. In most cases the
following assumptions will be made:
1. Every sample observation x is the outcome of a random variable X which has an
identical distribution (either discrete or continuous) for every member of the
population;
2. The random variables X1, X2, ..., Xn corresponding to the different members of the
sample are independent.

These two assumptions, abbreviated to i.i.d. (independent identically distributed),
formalize what is meant by the statement of drawing a random sample from a
population.
We have now reduced the problem to one which is mathematically very
simple to state: we have i.i.d. observations x1, x2, ..., xn of a random variable X with
probability function (in the discrete case) or probability density function (in the
continuous case) f, and we want to estimate some aspect of this population
distribution (for instance the mean or the variance).
It is helpful here to stress the notation that we are using in this paper: small
case letters xi denote actual sample values. But each xi is the realisation of a random
variable Xi, denoted by capitals. Thus X=(X1, X2, ..., Xn) denotes a random sample,
whose particular value is x=(x1, x2, ..., xn). The distinction helps us to distinguish
between a random quantity and the outcome this quantity actually realises.
A statistic, T(X), is any function of the data (note that T(X) denotes that this is
a random quantity which varies from sample to sample; T(x) will denote the value for
a specific sample x). If a statistic is used for the purpose of estimating a parameter θ
then it is called an estimator, and the realised value T(x) is called an estimate. The
basis of our approach will be to use T(x) as the estimate of θ, but to look at the
sampling properties of the estimator T(X) to judge the accuracy of the estimate. Since
any function of the sample data is a potential estimator, how should we determine
whether an estimator is good or not? There are, in fact, many such criteria; we will
focus on the two most widely used:
- Though we cannot hope to estimate a parameter perfectly, we might hope that ‘on
average’ the estimation procedure gives the correct result.
- Estimators are to be preferred if they have small variability; in particular, we may
require the variability to diminish as we take samples of a larger size.

These concepts are formalized as follows.


The estimator T(X) is unbiased for θ if:

E(T(X)) = θ (0.1)

Otherwise, B(T) = E(T(X)) − θ is the bias of T.

If B(T) → 0 as the sample size n → ∞, then T is said to be asymptotically unbiased for θ.
The mean-squared error of an estimator is defined by:

MSE(T) = E((T(X) − θ)²) (0.2)

Note that MSE(T) = var(T) + B²(T). Indeed,

MSE(T) = E(T²(X) − 2θT(X) + θ²) = E(T²(X)) − 2θE(T(X)) + θ² = E(T²(X)) − 2θ(B(T) + θ) + θ² = E(T²(X)) − 2θB(T) − θ²

and

var(T) = E(T²(X)) − E²(T(X)) = E(T²(X)) − (B(T) + θ)² = E(T²(X)) − B²(T) − 2θB(T) − θ².

Subtracting the two expressions gives MSE(T) − var(T) = B²(T), which proves the equality.

The root mean-squared error of an estimator is defined as:

RMSE(T) = √MSE(T) (0.3)

An estimator T is said to be mean-squared consistent for θ if MSE(T) → 0 as the
sample size n → ∞.
Ideally, estimators are both unbiased and consistent. We also prefer estimators
to have as small a variance as possible. In particular, given two estimators T1 and T2,
both unbiased for θ, T1 is said to be more efficient than T2 if:

var(T1(X)) < var(T2(X)) (0.4)

If estimators are not unbiased it is not so straightforward to determine efficiency: we
often have to make a choice between estimators that have low bias but high variance
and estimators that have high bias but low variance.
Having established some possible criteria by which to judge estimators, we
now turn to general procedures for constructing estimators. In this section we will
concentrate on three classical estimation methods, and derive their properties for a
number of distribution functions. Further on in the paper, a powerful procedure based
on L-moments, as well as the Bayesian method, will be presented.

Method of Moments (MOM)

It is difficult to trace back who introduced the MOM, but Johan Bernoulli (1667-
1748) was one of the first to use the method in his work. With the MOM, the
moments of a distribution function, expressed in terms of its parameters, are set equal
to the moments of the observed sample. Analytical expressions can be derived quite
easily (Appendix), but the estimators can be biased and inefficient. The moment
estimators, however, can serve very well as starting values in an iterative estimation
process.

The central moments of a distribution are given by:

µr = E(X − µ)^r = ∫ (x − µ)^r fX(x) dx

Variance: σ² = µ2
Skewness: β1 = µ3/µ2^(3/2) (0.5)
Kurtosis: β2 = µ4/µ2²

The sample moments are given by:

x̄ = n⁻¹ Σ xi
(0.6)
mr = n⁻¹ Σ (xi − x̄)^r

The sample mean x̄ is a natural estimator for µ. The higher sample moments mr are
reasonable estimators of the µr, but they are not unbiased. Unbiased estimators are
often used. In particular, σ², µ3 and the fourth cumulant κ4 = µ4 − 3µ2² are unbiasedly
estimated by:

s² = (n − 1)⁻¹ Σ (xi − x̄)²

m3* = n²/((n − 1)(n − 2)) · m3 (0.7)

k4* = n²/((n − 2)(n − 3)) · { (n + 1)/(n − 1) · m4 − 3m2² }

The sample standard deviation s = √s² is an estimator of σ but is not unbiased. The
sample estimators of CV (coefficient of variation), skewness and kurtosis are,
respectively:

Ĉv = s/x̄
g = m3*/s³ (0.8)
k = k4*/s⁴ + 3
Finding theoretical moments as a function of θ is not easy for all probability
distributions, and the method is difficult to generalize to more complex situations
(dependent data, covariates, non-identically distributed data). Sample covariances
may be used to estimate parameters that determine dependence. For some
distributions (such as the Cauchy), moments may not exist. In the Appendix moments
are given for the most familiar PDFs.
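
As a small illustration of the MOM, the following Python sketch fits a Gumbel distribution by matching the first two sample moments. It assumes the standard Gumbel relations mean = u + γβ and variance = π²β²/6 (with γ Euler's constant); the function name is ours, not from the paper:

    import math

    def gumbel_mom(x):
        # Method-of-moments fit of a Gumbel(u, beta) distribution,
        # using mean = u + gamma_E * beta and variance = pi^2 * beta^2 / 6.
        n = len(x)
        mean = sum(x) / n
        s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)  # unbiased s^2
        beta = math.sqrt(6.0 * s2) / math.pi              # scale estimate
        u = mean - 0.5772156649 * beta                    # location estimate
        return u, beta

Such moment estimates are often used, as noted above, as starting values for an iterative method such as maximum likelihood.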

Method of Maximum Likelihood (MML)

Also for the MML it is difficult to say who discovered the method, although Daniel
Bernoulli (1700-1782) was one of the first to report on it (Kendall, 1961). The
likelihood function gives the relative likelihood of the obtained observations as a
function of the parameters θ:

L(θ, x) = Π f(xi, θ) (0.9)

With this method one chooses the value of θ for which the likelihood function is
maximized. In the Appendix an overview is given of the likelihood functions of some
familiar distribution functions. The ML-method gives asymptotically unbiased
parameter estimates, and of all unbiased estimators it has the smallest mean
squared error. The variances approach asymptotically:

Var(θ̂) = [−E(∂² log L(θ, x)/∂θ²)]⁻¹ (0.10)

Furthermore these estimators are invariant, consistent and sufficient. For the
definitions we refer to Hald (1952). Analytical expressions for the parameter
estimators are sometimes difficult to derive. In those cases, numerical optimization
routines have to be used to determine the maximum of the likelihood function, which
can also be quite difficult since the optimum of the likelihood function can be
extremely flat for large sample sizes. Optimization of the likelihood function may also
be hampered by the presence of local maxima. Furthermore:
- MML is (usually) straightforward to implement,
- Maximum likelihood estimators (MLEs) may not exist, and when they do, they may
not be unique or may be biased (Koch, 1991),
- MLE may give inadmissible results (Lundgren, 1988),
- The likelihood function can be used for much more than just finding the MLE:
values close to the MLE are more plausible than those further away, for example. This
argument can be used to obtain an interval [pL, pU] which comprises a plausible range
of values for θ,
- ML is adaptable to more complex modeling situations, because the MLE satisfies a
very convenient invariance property: if q = h(θ) where h is a bijective function, then
qML = h(θML). For example, if q = 1/θ, then qML = 1/θML. So, having found the MLE for
one parameterization, the MLEs for other parameterizations are immediate,
- The maximum likelihood estimator is unbiased, fully efficient (in that it achieves the
Cramér-Rao bound under regularity conditions), and normally distributed, all in the
asymptotic sense. Regularity conditions are not fulfilled if the range of the
random variable X depends on unknown parameters, as is the case for many
distributions in the present paper.

This asymptotic approximation is extremely useful, since it is often quite
straightforward to evaluate from the MLE and the observed information. Nonetheless
it is an approximation, and should only be trusted for large values of n (the quality of
the approximation will vary from model to model).
If the available sample sizes are large, there seems little doubt that the
maximum-likelihood estimator is a good choice. It should be emphasized, however,
that the properties above are asymptotic (large n), and better estimators may be
available when sample sizes are small.
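
The following sketch illustrates the numerical route described above for the Gumbel distribution: the negative log-likelihood is minimized with a general-purpose optimizer, started from a moment estimate. This is a minimal illustration, not the paper's own implementation, and it assumes scipy is available:

    import math
    from scipy.optimize import minimize

    def gumbel_neg_loglik(params, x):
        # -log L(u, beta; x) for a Gumbel sample.
        u, beta = params
        if beta <= 0:
            return float("inf")
        z = [(xi - u) / beta for xi in x]
        return len(x) * math.log(beta) + sum(zi + math.exp(-zi) for zi in z)

    def gumbel_mle(x, start):
        # 'start' can be the method-of-moments estimate (u0, beta0).
        res = minimize(gumbel_neg_loglik, start, args=(x,), method="Nelder-Mead")
        return res.x  # (u_hat, beta_hat)

A derivative-free method such as Nelder-Mead sidesteps flat optima to some extent, but, in line with the warning about local maxima above, several starting values should be tried for difficult likelihoods.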

Method of Least Squares (MLS)

The method of least squares was introduced by Gauss (1777-1855). Given the
observations x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn), a regression model can be
fitted. For the general case:

E(Y|x) = α + βx (0.11)

with σ² the assumed constant variance of Y around its regression line, the parameter
estimates are:

α* = mY − β* mX
β* = sXY/sX² (0.12)
σ²* = n/(n − 2) (1 − rXY²) sY²

in which mX, mY, sX, sY, sXY and rXY are defined as:

mX = 1/n Σi=1..n xi
mY = 1/n Σi=1..n yi
sX² = 1/n Σi=1..n (xi − mX)² (0.13)
sY² = 1/n Σi=1..n (yi − mY)²
sXY = 1/n Σi=1..n (xi − mX)(yi − mY)
rXY = sXY/(sX sY)

The estimators "* and $* are linear functions of the Yi ‘s and they are unbiased. Their
variances are:

6
FA2 = F2/n (1+mX2/sX2)
FB2 = F2 / nsX2 (0.14)
in which A and B are respectively, "* and $* treated as random variables.
With the above regression techniques, the MLS can be defined. Assume that
we have n observations in sorted order given by x1:n, x2:n, ..., xn:n. Define the plotting
position pi = i/(n+1) of the i-th observation. We want to estimate the optimal value 2
of the distribution function F(x|2) in least squares sense. The following options are
available:

1) Least squares error: Σi=1..n [pi − F(xi:n|θ)]² is to be minimized over θ (sometimes also
referred to as non-linear regression (Demetracopoulis, 1994)).

Assume that we can linearize the probability distribution F(x) under consideration
(with scale parameter B and location parameter A), i.e. we can find a function g such that:

g(F(x)) = (x − A)/B (0.15)

For instance, for the Exponential distribution g is given by gE(u) = −ln(1 − u) and for the
Gumbel distribution we have gG(u) = −ln(−ln(u)) (see Appendix for a complete
overview of linearization functions).

2) Least squares linearized error: Σi=1..n [g(pi) − g(F(xi:n|θ))]² is to be minimized over θ.

3) Weighted least squares error: Σi=1..n [w(pi) − w(F(xi:n|θ))]² is to be minimized over θ,
where the function w can for instance be chosen to weigh the extreme observations
more heavily, so that they are better fitted by the distribution function (for instance
w(u) = 1/(1 − u)⁴).

4) Least squares on the ordered observations themselves: Σi=1..n [F⁻¹(pi|θ) − xi:n]², as
suggested by Moharram et al. (1993).

In Van Gelder (1996b), several LS-methods are applied to wave data at the location of
Pozzallo in Southern Italy. It can be shown that the parameter estimation of the scale
of various 2-parameter distributions with LS always leads to larger p-quantiles than
with the method of moments. Furthermore, note that under the assumption that the
errors in a regression model are independent and normally distributed with zero mean
and a fixed standard deviation, the ML-estimators of the error-distribution are exactly
the same as the LS-estimators (Gauss, 1777-1855). Problems with LS-estimators are
described by McCuen et al. (1990), in which they showed that logarithmic
transformations may lead to a biased model.
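
A minimal sketch of option 2 (least squares on the linearized data) for the Gumbel distribution follows. With g(p) = −ln(−ln p) and plotting positions pi = i/(n+1), Eqn. (0.15) gives xi:n ≈ A + B·g(pi), so location and scale follow from a simple linear regression; the code is illustrative and the function name is ours:

    import math

    def gumbel_linearized_ls(x):
        # Regress the sorted observations on the Gumbel reduced variate
        # y_i = -ln(-ln(p_i)) with plotting positions p_i = i/(n+1).
        xs = sorted(x)
        n = len(xs)
        y = [-math.log(-math.log(i / (n + 1.0))) for i in range(1, n + 1)]
        my = sum(y) / n
        mx = sum(xs) / n
        sxy = sum((yi - my) * (xi - mx) for yi, xi in zip(y, xs)) / n
        syy = sum((yi - my) ** 2 for yi in y) / n
        B = sxy / syy    # scale estimate (slope)
        A = mx - B * my  # location estimate (intercept)
        return A, B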

Method of L-Moments
Hosking (1990) introduced the L-moments. They have become popular tools for
solving various problems related to parameter estimation, distribution identification,
and regionalization. It can be shown that L-moments are linear functions of
probability weighted moments (PWM's) and hence for certain applications, such as
the estimation of distribution parameters, serve identical purposes (Hosking, 1986). In
other situations, however, L-moments have significant advantages over PWM's,
notably their ability to summarize a statistical distribution in a more meaningful way.
Since L-moment estimators are linear functions of the ordered data values, they are
virtually unbiased and have relatively small sampling variance. L-moment ratio
estimators also have small bias and variance, especially in comparison with the
classical coefficients of skewness and kurtosis. Moreover, estimators of L-moments
are relatively insensitive to outliers. These often-heard arguments in favor of
estimation of distribution parameters by L-moments (or PWM's) should, nevertheless,
not be accepted blindly. In for instance a wave height frequency analysis, the interest
is the estimation of a given quantile, not in the L-moments themselves. Although the
latter may have desirable sampling properties, the same does not necessarily apply to
a function of them, such as a quantile estimator. In fact, several simulation studies
have demonstrated that for some distributions, other estimation methods may be
superior in terms of mean square errors of quantile estimators (Hosking and Wallis,
1987; Rosbjerg et al., 1992). As compared with for example the classical method of
moments, the robustness vis-à-vis sample outliers is clearly a characteristic of
L-moment estimators. However, estimators can be “too robust” in the sense that large
(or small) sample values reflecting important information on the tail of the parent
distribution are given too little weight in the estimation. Hosking (1990) assessed that
L-moments weigh each element of a sample according to its relative importance.
In this section first the theory of L-Moments will be briefly described,
followed by an overview of papers with applications of L-moments. The literature
review has shown that the theory of L-moments has mostly been applied in a
regionalized setting, combining data from more than one site. In univariate
settings, however, the method of L-moments has not been investigated as much.
Therefore, further on in this paper a Monte Carlo experiment is designed in a
univariate setting in order to compare the L-moments method with the classical
parameter estimation methods (MOM, MML, and MLS). The performance of these
methods will also be analyzed with respect to inhomogeneous data.
Finally, L-moments are in fact nothing more than summary statistics for probability
distributions and data samples. They are analogous to ordinary moments -- they
provide measures of location, dispersion, skewness, kurtosis, and other aspects of the
shape of probability distributions or data samples -- but are computed from linear
combinations of the ordered data values (hence the prefix L). Hosking and Wallis
(1997) give an excellent overview on the whole theory of L-Moments.

L-Moments for data samples

Probability weighted moments, defined by Greenwood et al. (1979), are precursors of
L-moments. Sample probability weighted moments, computed from the data values
x1:n, x2:n, ..., xn:n arranged in increasing order, are given by:

b0 = n⁻¹ Σj=1..n xj:n

br = n⁻¹ Σj=r+1..n [(j − 1)(j − 2)...(j − r)] / [(n − 1)(n − 2)...(n − r)] · xj:n (0.16)

L-moments are certain linear combinations of probability weighted moments that
have simple interpretations as measures of the location, dispersion and shape of the
data sample. A sample of size 2 contains two observations in ascending order, x1:2 and
x2:2. The difference between the two observations, x2:2 − x1:2, is a measure of the scale
of the distribution. A sample of size 3 contains three observations in ascending order,
x1:3, x2:3 and x3:3. The difference between the observations x2:3 − x1:3 and the
difference x3:3 − x2:3 can be subtracted from each other to obtain a measure of the
skewness of the distribution. This leads to: x3:3 − x2:3 − (x2:3 − x1:3) = x3:3 − 2x2:3 + x1:3.
A sample of size 4 contains four observations in ascending order x1:4, x2:4, x3:4 and
x4:4. A measure for the kurtosis of the distribution is given by:
x4:4 − x1:4 − 3(x3:4 − x2:4). In short: the above linear combinations of the elements of
the ordered sample contain information about the location, scale, skewness and
kurtosis of the distribution from which the sample was drawn. A natural way to
generalize the above approach to samples of size n is to take all possible sub-samples
of size 2 and to average the differences (x2:2 − x1:2)/2:

l2 = (1/2) C(n,2)⁻¹ Σi>j (xi:n − xj:n) (0.17)

where C(n,k) denotes the binomial coefficient "n choose k".

Furthermore, the skewness and kurtosis are related to:

l3 = (1/3) C(n,3)⁻¹ Σi>j>k (xi:n − 2xj:n + xk:n) (0.18)

l4 = (1/4) C(n,4)⁻¹ Σi>j>k>l (xi:n − 3xj:n + 3xk:n − xl:n) (0.19)

Hosking (1990) showed that the first few L-moments follow from PWMs via:
l1 = b0
l2 = 2b1 − b0
(0.20)
l3 = 6b2 − 6b1 + b0
l4 = 20b3 − 30b2 + 12b1 − b0

The coefficients in Eqn. (0.20) are those of the shifted Legendre polynomials. The
first L-moment is the sample mean, a measure of location. The second L-moment is (a
multiple of) Gini's mean difference statistic (Johnson et al., 1994), a measure of the
dispersion of the data values about their mean. By dividing the higher-order L-
moments by the dispersion measure, we obtain the L-moment ratios,

tr = lr/l2 (0.21)

These are dimensionless quantities, independent of the units of measurement of the
data; t3 is a measure of skewness and t4 is a measure of kurtosis -- these are
respectively the L-skewness and L-kurtosis. They take values between -1 and +1
(exception: some even-order L-moment ratios computed from very small samples can
be less than -1). The L-moment analogue of the coefficient of variation (standard
deviation divided by the mean) is the L-CV, defined by:

t = l2/l1 (0.22)

It takes values between 0 and 1 (if X ≥ 0).
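
The sample quantities above translate directly into code. The following sketch computes b0..b3 from Eqn. (0.16) and the L-moments and ratios from Eqns. (0.20)-(0.22); it requires n ≥ 4 and is our illustration rather than Hosking's reference implementation:

    def sample_lmoments(x):
        # Sample PWMs b_r (0.16) and L-moments l_1..l_4 (0.20),
        # with ratios t = l2/l1, t3 = l3/l2, t4 = l4/l2.
        xs = sorted(x)
        n = len(xs)
        b = [0.0, 0.0, 0.0, 0.0]
        for j, xj in enumerate(xs, start=1):
            b[0] += xj
            b[1] += xj * (j - 1) / (n - 1)
            b[2] += xj * (j - 1) * (j - 2) / ((n - 1) * (n - 2))
            b[3] += xj * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3))
        b = [bi / n for bi in b]
        l1 = b[0]
        l2 = 2 * b[1] - b[0]
        l3 = 6 * b[2] - 6 * b[1] + b[0]
        l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
        return l1, l2, l2 / l1, l3 / l2, l4 / l2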

L-Moments for probability distributions

For a probability distribution with cumulative distribution function F(x), probability
weighted moments are defined by:

βr = ∫ x {F(x)}^r dF(x), r = 0, 1, 2, ... (0.23)

L-moments are defined in terms of probability weighted moments, analogously to the
sample L-moments:

λ1 = β0
λ2 = 2β1 − β0
(0.24)
λ3 = 6β2 − 6β1 + β0
λ4 = 20β3 − 30β2 + 12β1 − β0

L-moment ratios are defined by:

τr = λr/λ2 (0.25)

The L-moment analogue of the coefficient of variation is the L-CV, defined by:

τ = λ2/λ1 (0.26)

Examples (for a complete overview, see the Appendix):

Uniform (rectangular) distribution on (0,1):
λ1 = 1/2, λ2 = 1/6, τ3 = 0, τ4 = 0. (0.27)

Normal distribution with mean 0 and variance 1:
λ1 = 0, λ2 = 1/√π, τ3 = 0, τ4 ≈ 0.123. (0.28)
The theory of L-moments has been applied in numerous papers. The following work
is worth mentioning: Rao and Hamed (1997), Duan et al. (1998), Ben-Zvi and Azmon
(1997), Van Gelder and Neykov (1998), Demuth and Kuells (1997), Pearson et al.
(1991), Ruprecht and Karafilis (1994), Anctil et al. (1998), Lin and Vogel (1993) and
Gingras and Adamowski (1994).

Relation of L-Moments with order statistics

Consider a sample consisting of n observations {x1, x2, ..., xn} randomly drawn from
a statistical population. If the sample values are rearranged in non-decreasing order
of magnitude, x1:n ≤ x2:n ≤ ... ≤ xn:n, then the r-th member (xr:n) of this new sequence
is called the r-th order statistic of the sample (Harter, 1969). When all the sample
values come from a common parent population with cumulative distribution function
F(x), the probability distribution (CDF) of the r-th order statistic, i.e., Prob[Xr:n ≤ x],
is the probability that at least r observations in a sample of n do not exceed a fixed
value x.

A sample randomly drawn from a distribution is analogous to a Bernoulli experiment
in which success is defined by the sampled value being less than the threshold x.
Naturally, the probability of success in such an experiment is given by p = F(x), and
the number of successes, a random variable, follows the binomial distribution. Based
on this argument, the CDF of the r-th order statistic, F(r)(x), can be mathematically
expressed as

F(r)(x) = Σk=r..n C(n,k) F^k(x) [1 − F(x)]^(n−k) (0.29)

The incomplete Beta function Ix(a,b) (Kendall and Stuart 1977) is defined via the Beta
function B(a,b) as:

Ix(a,b) = 1/B(a,b) ∫0..x t^(a−1) (1 − t)^(b−1) dt

in which: (0.30)

B(a,b) = ∫0..1 t^(a−1) (1 − t)^(b−1) dt = (a − 1)!(b − 1)!/(a + b − 1)! if a, b > 0

So, the expression (0.29) can be written in terms of an incomplete Beta function as:

F(r)(x) = r C(n,r) ∫0..F(x) u^(r−1) (1 − u)^(n−r) du = I_F(x)(r, n − r + 1) (0.31)

Indeed, note that 1/B(r, n − r + 1) = n!/((r − 1)!(n − r)!) = r C(n,r).
HK
The probability density function of Xr:n is given by the first derivative of Eqn. (0.31):

f(r)(x) = r C(n,r) F^(r−1)(x) [1 − F(x)]^(n−r) f(x) (0.32)

Now, the expected value of the r-th order statistic can be obtained as:

E[Xr:n] = ∫−∞..∞ x f(r)(x) dx (0.33)

Substituting from Eqn. (0.32) into (0.33) and introducing the transformation u = F(x),
or x = F⁻¹(u), 0 ≤ u ≤ 1, leads to:

E[Xr:n] = r C(n,r) ∫0..1 x(u) u^(r−1) (1 − u)^(n−r) du (0.34)

Note that x(u) denotes the quantile function of a random variable. The expectations of
the maximum and minimum of a sample of size n can be easily obtained from
Eqn. (0.34) by setting r = n and r = 1, respectively:

E[Xn:n] = n ∫0..1 x(u) u^(n−1) du, and E[X1:n] = n ∫0..1 x(u) (1 − u)^(n−1) du (0.35)
The probability weighted moment (PWM) of a random variable was formally defined
by Greenwood et al. (1979) as:

Mi,j,k = E[X^i u^j (1 − u)^k] = ∫0..1 x(u)^i u^j (1 − u)^k du (0.36)

The following two forms of PWM are particularly simple and useful:

Type 1: αk = M1,0,k = ∫0..1 x(u) (1 − u)^k du (k = 0, 1, ..., n) (0.37)

and

Type 2: βk = M1,k,0 = ∫0..1 x(u) u^k du (k = 0, 1, ..., n) (0.38)

Comparing Eqns. (0.37) and (0.38), it can be seen that αk and βk, respectively, are
related to the expectations of the minimum and maximum in a sample of size k + 1:

αk = 1/(k + 1) · E[X1:k+1],  βk = 1/(k + 1) · E[Xk+1:k+1]  (k ≥ 1)
(0.39)

In essence, PWMs are the normalized expectations of the minimum/maximum of
k + 1 random observations; the normalization is done by the sample size (k + 1) itself.
From Eqn. (0.39) we notice that E(Xn:n) = nβn−1, and from Eqn. (0.23) we have
βn−1 = ∫ x F^(n−1)(x) f(x) dx, so E(Xn:n) = ∫ x n f(x) F^(n−1)(x) dx. On the other hand,
using Eqn. (0.16), the sum for bn−1 reduces to its last term, bn−1 = n⁻¹ xn:n. From this
it indeed follows that bn−1 is an unbiased estimator of βn−1. Landwehr et al. (1979)
gave a proof that br is an unbiased estimator of βr for other values of r.
The expression for βr in Eqn. (0.23) can be calculated numerically by using a
plotting-position formula as follows:

βr = ∫ x {F(x)}^r dF(x) ≈ n⁻¹ Σj=1..n (j/(n + 1))^r xj:n = n⁻¹ Σi=1..n li xi:n (0.40)

Notice that this expression looks almost the same as Eqn. (0.16), which can be
written as:

br = n⁻¹ Σj=r+1..n [(j − 1)(j − 2)...(j − r)] / [(n − 1)(n − 2)...(n − r)] · xj:n = n⁻¹ Σi=r+1..n ki xi:n (0.41)

The terms li and ki in Eqns. (0.40) and (0.41) are compared in Figure 1, and we notice
indeed a very close similarity. Reiss (1989) derived more approximate distributions of
order statistics and Durrans (1992a) derived distributions of fractional order statistics.

Figure 1: Comparison of Eqn. (0.40) and (0.41).

Summarizing: L-moments are certain linear combinations of probability weighted
moments that are analogous to ordinary moments, in the sense that they also provide
measures of location, dispersion, skewness, kurtosis, and other aspects of the shape of
probability distributions or data samples. The r-th order L-moment is mathematically
defined as:

λr = Σk=1..r p*(r−1,k−1) βk−1 (0.42)

where p*(r,k) represents the coefficients of the shifted Legendre polynomials (Hosking,
1990). The following normalized form of the higher-order L-moments is convenient to
work with:

τr = λr/λ2, r = 3, 4, ... and |τr| < 1 (0.43)
The normalized fourth-order L-moment, τ4, is referred to as the L-kurtosis of a
distribution. Hosking and Wallis (1997) showed that L-moments are very efficient in
estimating parameters of a wide range of distributions from small samples. The
required computation is fairly limited compared with other traditional techniques,
such as maximum likelihood and least squares. In the Appendix the L-moment
formulae are given for a selection of PDFs. Apart from the well-known moment
diagrams, L-moment diagrams also exist, in which L-skewness and L-kurtosis are
plotted against each other. However, the L-moment diagrams do not form a complete
class; that is to say, points in the diagram may correspond to more than one probability
distribution. This is in contrast to the ordinary moment diagram and also to the δ1-δ2
diagram of the Halphen distributions (Bobee et al., 1993).
The Bayesian method
An introduction

As seen in Section 2.2.3, a consistent method to model parameter uncertainty is given
by the Bayesian approach (see also Box and Tiao, 1973). As we will see later, model
uncertainty can also be modeled by the Bayesian approach. As an introduction to
the Bayesian approach, the following Gumbel model will be analyzed. The Gumbel
likelihood model with location λ and scale δ is given by:

l(x|λ,δ) = (1/δ)^n exp(−Σ(xi − λ)/δ) exp(−Σ exp(−(xi − λ)/δ)) (0.44)

where x = (x1, x2, ..., xn). If, for instance, we assume a normal distribution for the
location parameter, p(λ) = N(λ|µλ, σλ), then the posterior distribution of λ becomes:

p(λ|x) = C N(λ|µλ, σλ) (1/δ)^n exp(−Σ(xi − λ)/δ) exp(−Σ exp(−(xi − λ)/δ)) (0.45)

in which C is a normalisation constant. Figure 2 shows the prior and posterior of the
λ-parameter when the following values are given: λG = 20, δG = 20, (µλ, σλ) = (17, 2),
n = 150.
Figure 2: Prior and posterior of the λ-parameter of the Gumbel likelihood model
(probability density versus λ).

Notice that the updating process with 150 observations has led to a more peaked and
shifted distribution function of the λ-parameter.
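
The posterior of Eqn. (0.45) is easy to evaluate numerically. The sketch below computes the normalized posterior of the location λ on an equally spaced grid, treating the scale δ as known; the log-domain computation avoids underflow for n = 150 observations. The function is our illustration of the updating step, not code from the paper:

    import math

    def gumbel_location_posterior(x, delta, mu, sigma, grid):
        # Unnormalized log-posterior: Gumbel log-likelihood + N(mu, sigma) log-prior.
        logpost = []
        for lam in grid:
            loglik = (-len(x) * math.log(delta)
                      - sum((xi - lam) / delta for xi in x)
                      - sum(math.exp(-(xi - lam) / delta) for xi in x))
            logprior = -0.5 * ((lam - mu) / sigma) ** 2
            logpost.append(loglik + logprior)
        m = max(logpost)                      # stabilize before exponentiating
        w = [math.exp(lp - m) for lp in logpost]
        step = grid[1] - grid[0]
        total = sum(w) * step                 # normalize to unit area
        return [wi / total for wi in w]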
The way in which an engineer applies his information about a parameter θ
depends on the objectives in analyzing the data. If he is involved, for instance, in
calculating the frequency of wave heights H, then the inferences he makes on H
should reflect the uncertainty in θ. In the Bayesian framework we are interested in the
so-called predictive probability density function:

fH(h) = ∫θ fH(h|θ) f(θ|h1, h2, ..., hn, I) dθ (0.46)

where fH(h|θ) is the probabilistic model of wave heights, conditional on the
parameters θ, fH(h) is the predictive density of the wave heights (now parameter
free), and f(θ|h1, h2, ..., hn, I) is the posterior density of θ when both the prior
information I and the observations h1, h2, ..., hn are given. In popular words: "the
uncertainty in the θ parameters has been integrated out".
The predictive distribution can be interpreted as the density fH(h|θ)
weighted by f(θ|h1, h2, ..., hn, I). Inferences combining new information are
achieved by updating the distributions of the uncertain parameters through Bayes'
theorem and then calculating the updated predictive function fH(h).
If we want to "summarize" the posterior distribution of θ by one value we
can use the Bayes estimator θ* = E(f(θ)) or θ* = max(f(θ)) (associated with a
quadratic loss function and a 0-1 loss function, respectively). In making inferences on
wave heights it is important to use the predictive function for h, as opposed to the
probabilistic model for h with the Bayes estimator for the parameter set θ, i.e. fH(h|θ*).
This is because using point estimators for uncertain parameters underestimates the
variance in wave heights.
This is, in short, the Bayesian way of thinking (see also Kuczera, 1994). A note
on the way of implementation is valuable. If we have to calculate an integral of the
form ∫f(θ)g(θ)dθ, we can apply Riemann integration:

∫f(θ)g(θ)dθ ≈ Σ f(θi)g(θi)Δθi (0.47)

where θi is a suitable discretisation of θ with discretisation interval Δθi.

If g is a probability distribution function, we can also apply Monte Carlo simulation.
Draw θi (i = 1..n) from g and we have:

∫f(θ)g(θ)dθ = lim n→∞ 1/n Σi=1..n f(θi) (0.48)

More advanced implementation techniques are available, such as Markov chain
Monte Carlo methods, among others Gibbs sampling and the Metropolis-Hastings
method (see Carlin and Louis, 1996).
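
A minimal sketch of the Monte Carlo rule (0.48): draw θi from g and average f(θi). The example integrand and density are our own choices for illustration:

    import random

    def mc_integral(f, sample_g, n=100000):
        # Approximates the integral of f(theta) g(theta) dtheta, Eqn. (0.48),
        # where sample_g() draws one value from the density g.
        return sum(f(sample_g()) for _ in range(n)) / n

    # Example: E[theta^2] under a standard normal g equals 1.
    approx = mc_integral(lambda t: t * t, lambda: random.gauss(0.0, 1.0))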

Obtaining conjugate priors

Provided they are not in direct conflict with our prior beliefs, and provided such a
family can be found, the simplification obtained by using a conjugate prior is
considerable. But in which situations can a conjugate family be obtained?

It appears that the only case where conjugates can be easily obtained is for
data models within the exponential family (Bernardo and Smith, 1994). That is,

f(x|θ) = h(x) g(θ) e^(t(x)c(θ)) (0.49)

for functions h, g, t and c such that

∫ f(x|θ) dx = g(θ) ∫ h(x) e^(t(x)c(θ)) dx = 1 (0.50)

This might seem restrictive, but in fact Eqn. (0.49) includes the exponential
distribution, the Poisson distribution, the one-parameter Gamma distribution, the
Binomial distribution, the Normal distribution (with known variance), and the
extreme value distributions. For instance, the Gumbel distribution can be written as:

f(x|α,ξ) = α exp{−(x − ξ)α} exp[−exp{−(x − ξ)α}]
= α exp(−αx) exp(ξα) exp[−exp(−αx) exp(ξα)] = h(x) g(ξ) exp(t(x) c(ξ))

in which
(0.51)
h(x) = α exp(−αx)
g(ξ) = exp(ξα)
t(x) = exp(−αx)
c(ξ) = −exp(αξ)

Here α is considered a constant; ξ is the conjugate parameter. With a prior f(θ) on θ,
we can write:

f(θ|x) ∝ f(θ) l(x|θ) = f(θ) Πi=1..n h(xi) g(θ)^n e^(Σi=1..n t(xi) c(θ)) ∝ f(θ) g(θ)^n e^(Σi=1..n t(xi) c(θ)) (0.52)

Thus if we choose:

f(θ) ∝ g(θ)^d e^(b c(θ)) (0.53)

(in case of the Gumbel distribution, f(ξ) ∝ exp(dαξ) exp(−b exp(αξ))), we obtain:

f(θ|x) ∝ g(θ)^(n+d) exp{c(θ)[Σi=1..n t(xi) + b]} = g(θ)^d̃ e^(b̃ c(θ)) (0.54)

giving a posterior in the same family as the prior, but with modified parameters
d̃ = n + d, b̃ = Σi=1..n t(xi) + b. Indeed, for the Gumbel likelihood model and conjugate
prior with parameters d and b, we obtain:

f(ξ|x) = e^(dξα) e^(−b e^(ξα)) Π α e^(−α(xi−ξ)) e^(−e^(−α(xi−ξ)))
= α^n e^(dξα − Σα(xi−ξ)) e^(−b e^(ξα) − Σ e^(−α(xi−ξ)))
= α^n e^(−αΣxi) e^(dξα + nξα) e^(−e^(αξ)(b + Σ e^(−αxi)))
∝ e^(ξα(n+d)) e^(−e^(αξ)(b + Σ e^(−αxi)))
(0.55)

The use of conjugate priors should be seen for what it is: a convenient mathematical
device. However, expression of one's prior beliefs as a parametric distribution is
always an approximation. In many situations the richness of the conjugate family is
great enough for a conjugate prior to be found which is sufficiently close to one's
beliefs (see also Pannullo et al., 1993).
The conjugate priors for the scale parameters of other PDFs can be easily
found with the techniques from this section, and they are summarized in the
Appendix.

The predictive distribution

So far, we have focused on parameter estimation. That is, we have specified a
probability model to describe the random process which has generated a set of data,
and have shown how the Bayesian framework combines sample information and prior
information to give parameter estimates in the form of a posterior distribution.
Commonly the purpose of formulating a statistical model is to make predictions about
future values of the process. This is handled much more elegantly in Bayesian
statistics than in the corresponding classical theory. The essential point is that in
making predictions about future values on the basis of an estimated model there are
two sources of uncertainty (Van Gelder, 2000):

- Statistical uncertainty in the parameter values which have been estimated on the
basis of past data; and
- Inherent uncertainty due to the fact that any future value is itself a random event.

In classical statistics it is usual to fit a model to the past data, and then make
predictions of future values on the assumption that this model is correct, the so-called
estimative approach. That is, only the second source of uncertainty is included in the
analysis, leading to estimates which are believed to be more precise than they really
are. There is no completely satisfactory way around this problem in the classical
framework since parameters are not thought of as being random.
Within Bayesian inference it is straightforward to allow for both sources of
uncertainty by simply averaging over the uncertainty in the parameter estimates (the
information of which is completely contained in the posterior distribution).
So, suppose we have past observations of a variable with density function (or
likelihood) f(x|θ) and we wish to make inferences about the distribution of a future
value y from this process. With a prior distribution f(θ), Bayes' theorem leads to a
posterior distribution f(θ|x). Then the predictive density function of y given the data x
is:

f(y|x) = ∫ f(y|θ) f(θ|x) dθ (0.56)
Thus, the predictive density is the integral of the likelihood (of a single observation)
times the posterior. Notice that this definition is simply constructed from the usual
laws of probability manipulation, and has a straightforward interpretation in terms of
probabilities.
The corresponding approach in classical statistics would be, for example, to
obtain the maximum-likelihood estimate θ* of θ and to base inference on the
distribution f(y|θ*), the estimative distribution.
To emphasize again: this makes no allowance for the variability incurred as a
result of estimating θ, and so gives a false sense of precision (the predictive density
f(y|x) is more variable because of the averaging across the posterior distribution for θ).
For data models within the exponential family, we obtain from (0.49):

f(y|x) = ∫ f(y|θ) f(θ|x) dθ = h(y) ∫ g(θ)^(d̃+1) e^(c(θ)(t(y)+b̃)) dθ

Note:

∫ θ^(υ−1) e^(−µθ) dθ = Γ(υ)/µ^υ (0.57)

The latter expression can be used to solve the integral in the case of simple h, g, c and t
functions. In the case of the Gumbel likelihood model, we can derive (by applying the
substitution u = e^(αξ)):

f(y|x) ∝ h(y) ∫ u^(d̃+1) e^(−u(e^(−αy)+b̃)) du/(αu) = h(y) (1/α) ∫ u^d̃ e^(−u(e^(−αy)+b̃)) du = e^(−αy) Γ(d̃+1)/(e^(−αy) + b̃)^(d̃+1) (0.58)

The normalising constant follows from ∫f(y|x)dy = 1. The posterior functions of the
location and shape parameters of the exponential family (except for the normal
distribution) cannot be expressed in explicit form; numerical integration has to be
performed. Uniform prior distributions can be used for the scale and location
parameters. For a large selection of models, the following information is given in the
Appendix: likelihood model, non-informative prior (and corresponding posterior), the
conjugate prior, the conjugate posterior, and the conjugate posterior predictive.
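
When no closed form such as (0.58) is available, the predictive density (0.56) can be approximated on the same parameter grid used for the posterior. The sketch below averages the single-observation likelihood over normalized posterior weights, in the spirit of the grid posterior sketched earlier; the names are ours:

    def predictive_density(y, grid, post, lik):
        # f(y|x) ~= sum_i f(y|theta_i) p(theta_i|x) * step, Eqn. (0.56),
        # with 'post' the normalized posterior on an equally spaced 'grid'
        # and lik(y, theta) the density of a single new observation.
        step = grid[1] - grid[0]
        return sum(lik(y, theta) * p for theta, p in zip(grid, post)) * step

As the text stresses, this predictive density is wider than the estimative density f(y|θ*) evaluated at a point estimate.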

Some comments on the Empirical Bayes Method and the Bayes Linear
Estimation

The empirical Bayes method is a way of using sample information to assist in
specifying the prior distribution. As such, the procedure is not strictly Bayesian, since
in a proper Bayesian procedure the prior distribution must be formulated
independently of the data. However, the technique is now widely used.
It is clearly very difficult to formulate one's prior information very accurately,
and it may be possible only to specify means, variances and covariances with any real
confidence. However, the posterior distribution will depend on a complete specification
of the prior. The Bayes linear estimator of a parameter is an estimator whose value
depends only on means and covariances and so does not require a fuller prior
specification.

Nonparametric methods
The basic idea of nonparametric density estimation is to relax the parametric
assumptions about the data, typically replacing these assumptions with ones about the
smoothness of the density. The most common and familiar nonparametric estimator is
the histogram. Here the assumption is that the density is fairly smooth (as determined
by the bin widths) and an estimate is made by binning the data and displaying the
proportion of points in each bin (producing a necessarily non-differentiable, but still
useful estimate).
The kernel density estimator is related to the histogram, but produces a smooth
(differentiable) estimate of the density. It has been studied widely since its
introduction in Rosenblatt (1956). Given i.i.d. data x1, ..., xn drawn from the unknown
density α, the standard kernel estimator (SKE) is the single-bandwidth estimator:

α̂(x) = (1/nh) Σi=1..n K((x − xi)/h) (0.59)

The bandwidth h determines the amount of smoothing produced by the estimator. See
the books by Silverman (1986) and Scott (1992), and the bibliographies contained
therein, for a good introduction to kernel estimators. Much work has been done on
selecting the optimal bandwidth h under different assumptions on α or different
optimality criteria.
An obvious problem with this kind of estimator for a finite data set is that it
uses a single bandwidth (or smoothing parameter) throughout the entire support of the
density. For densities with long tails, or modes with different variances, this can be a
problem.
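
For reference, the standard kernel estimator (0.59) with a Gaussian kernel is only a few lines of code; this is a generic illustration, not the estimator studied by the authors cited below:

    import math

    def kernel_density(x0, data, h):
        # Single-bandwidth kernel estimate (0.59) at the point x0,
        # with Gaussian kernel K(u) = exp(-u^2/2)/sqrt(2*pi).
        n = len(data)
        total = sum(math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in data)
        return total / (n * h * math.sqrt(2 * math.pi))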
The group at the Utah Water Research Laboratory, under the guidance of Prof.
Upmanu Lall, has been working on developing and applying nonparametric estimation
techniques to a wide range of surface and groundwater hydrologic problems, including
time-series forecasting. Research by Upmanu Lall and his co-workers has focused on
developing nonparametric statistical methods for the estimation of probabilities of
rare floods that are more appropriate in such situations (Moon and Lall, 1994; Moon
et al., 1993; Lall et al., 1993).
For densities with long tails, or modes with different variances, there are
modifications to the standard kernel estimator (0.59), proposed by Marchette (1995)
and Marchette et al. (1994), which use a small number of bandwidths rather than a
single one as in (0.59), allowing local tuning of the density. As motivation for
this estimator, consider a density which consists of a mixture of two normals with
different variances which are very far apart. For concreteness, let:

α(x) = p φ(x; −µ, σ1²) + (1 − p) φ(x; µ, σ2²) (0.60)

Suppose we wished to use the standard (single bandwidth) kernel estimator if
possible. It seems reasonable that near the left-hand mode one would wish to use a
bandwidth appropriate to that normal, and similarly on the right. So a reasonable
approach might be to filter the data into two distinct data sets, one from the right
component and one from the left, and estimate these two components separately.
One way to do this (approximately) is to define

ρ1(x) = χ{x>0}(x)
(0.61)
ρ2(x) = χ{x≤0}(x)

and define our estimator to be:

α̂(x) = (1/n) Σi=1..n [ ρ1(xi)/h1 · K((x − xi)/h1) + ρ2(xi)/h2 · K((x − xi)/h2) ] (0.62)

This allows us to use the bandwidths appropriate to the different components in the
different regions where they are supported. Equation (0.61) is not quite right,
however, for as we move the two components closer together, the overlapping region
becomes more and more significant. What we really want to do is use the posteriors
for each component as our ρ functions. This is the motivation of the filtered kernel
estimator as proposed by Marchette (1995).
With the above example in mind, suppose we wish to have a small number of
bandwidths where each bandwidth is associated with a region of the support of the
density. To each bandwidth we associate a function which "filters" the data, as in
(0.61). Basically, the filter will define the extent to which each local bandwidth is to
be used for any particular data point. We can then construct a kernel estimator which
is a combination of the kernel estimators constructed using each bandwidth, with the
data filtered by the filtering functions.
Another method of nonparametric estimation is the following. Suppose we
have independent random variables X1, X2, ..., all of them with distribution function
F. Suppose that F is in the domain of attraction of some extreme value distribution
Gγ. In other words: suppose 1 − F is regularly varying at infinity, i.e.,

lim t→∞ (1 − F(tx))/(1 − F(t)) = x^(−1/γ) (0.63)

for x > 0, where γ is a positive parameter.


Let X1:n ≤ X2:n ≤ ... ≤ Xn:n be the order statistics. For some m with 4m ≤ n define:

γ̂P = (log 2)⁻¹ log[ (Xn−m:n − Xn−2m+1:n)/(Xn−2m+1:n − Xn−4m+1:n) ] (0.64)

This is Pickands' estimator for γ (Pickands, 1975).
Another estimator for γ is the moment estimator (Dekkers and De Haan, 1989):

γ̂M = Mn(1) + 1 − (1/2) [1 − (Mn(1))²/Mn(2)]⁻¹ (0.65)

in which:

Mn(j) = (1/k) Σi=0..k−1 (log Xn−i:n − log Xn−k:n)^j (0.66)

The estimators are consistent for any γ. Under appropriate smoothness conditions
on F, and a further bound on the rate of increase of k(n), the estimators for γ are
asymptotically normal after normalization (see e.g. Dekkers and de Haan, 1989).
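
Both tail-index estimators are direct translations of Eqns. (0.64)-(0.66). The sketch below assumes positive data (for the logarithms in the moment estimator) and 4m ≤ n (for Pickands' estimator); the 0-based indexing converts Xr:n into xs[r-1]:

    import math

    def pickands(x, m):
        # Pickands' estimator (0.64); requires 4*m <= n.
        xs = sorted(x)
        n = len(xs)
        a = xs[n - m - 1]      # X_{n-m:n}
        b = xs[n - 2 * m]      # X_{n-2m+1:n}
        c = xs[n - 4 * m]      # X_{n-4m+1:n}
        return math.log((a - b) / (b - c)) / math.log(2.0)

    def moment_estimator(x, k):
        # Dekkers-De Haan moment estimator (0.65)-(0.66); positive data.
        xs = sorted(x)
        n = len(xs)
        logs = [math.log(xs[n - 1 - i]) - math.log(xs[n - 1 - k]) for i in range(k)]
        m1 = sum(logs) / k
        m2 = sum(v * v for v in logs) / k
        return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)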

Performance of the statistical estimation methods

In the previous six sections an overview of statistical estimation methods has been
given. The large number of distributions and estimation methods proposed in the
literature may cause confusion about which method and/or distribution to use (Bobee
et al., 1993b). The World Meteorological Organization (1989) and Cunnane
(1987-1988) published a few reports which compare current methodologies and
recommend a number of statistical distributions and estimation procedures. These
reports are now about ten years old, and the last decade has produced many new
results on this topic. In the present section we report the latest developments, show
advantages and disadvantages of the various methods, and propose new ideas for a
possible comparison strategy.

Variability of estimators

As written earlier in the paper, we prefer estimators which have low variability. But
the question arises as to whether there is any limit to the accuracy that an estimator
can achieve (Beard, 1994). Intuitively, we would expect that there is, since the
variation from one sample to another means that any estimator is bound to have some
degree of variation. However, is it possible to bound the accuracy which an estimator
can achieve? In fact, provided we restrict ourselves to unbiased estimators, there is a
remarkable theorem which provides a bound on the variance of any (unbiased)
estimator (Barnett, 1973).
The quantity:

I(p) = −(1/n) E(∂² log L(p, x)/∂p²) (0.67)

is extremely important in statistical theory. It is termed the Expected Information, and
is related to the Fisher matrix of Eqn. (2.2) up to a factor n⁻¹ (Fisher, 1934). Since
second derivatives measure curvature, I(p) is a measure of the expected curvature of
the likelihood function at the true parameter value. The importance of the expected
information in the context of minimum variance estimation is provided by the
following result, which is known as the Cramér-Rao theorem (1946):

If T(X) is an unbiased estimator of p, then:

Var(T(X)) ≥ I⁻¹(p) (0.68)


Thus, within the class of unbiased estimators, no estimator can have a variance which
is smaller than the reciprocal of the expected information. So, as required, we obtain a
bound on the maximum precision that can be attained within the class of unbiased
estimators. Note that the bound only applies to unbiased estimators. It is always
possible to obtain an estimator with lower variance by giving up on the property of
unbiasedness. (For example, the estimator T(X)=100 has zero variance for any
problem).

The result is important for both theoretical and practical reasons; it formulates
in a precise way the limits that can be achieved in statistical inference due to random
variation in the population. Though the Cramér-Rao bound puts a lower limit on the
variance of unbiased estimators, the bound may not be achievable.
So far we have used the likelihood function only to determine an estimate of
an unknown parameter θ. We have noted several times, however, that values of θ
with relatively high likelihood are more plausible than those with low likelihood, and
that it should be possible to exploit this knowledge to construct a credible range for θ.
This is achieved formally by looking at the sampling properties of the likelihood
function: quantifying the variation in L(θ) from sample to sample.
In most situations exact calculation is very difficult and we will be forced to
use (asymptotic) approximations that assume large sample sizes. In the next Sub-
section we present some methods which are useful in deriving the sampling properties
of the likelihood function. It will be shown that these methods can be used to derive
the sampling distributions of the ML and LM estimators of the exponential
distribution.

Methods for deriving the sampling distribution of estimators

Mood et al. (1974) present methods for deriving the sampling distribution of estimators.
In some cases it is possible to establish the exact sampling distribution of an estimator
and use this as the basis for confidence interval estimation. More generally this is not
possible but approximations can be obtained. The following relations can be used in
deriving the exact sampling distributions:

Let X and Y be independent random variables with PDFs f and g, respectively, and let

Z = X + Y, U = X − Y, V = XY and W = X/Y.

Then the PDFs of Z, U, V and W are, respectively, given by:

fZ(z) = ∫ f(x) g(z − x) dx
(0.69)
fU(u) = ∫ f(u + y) g(y) dy

fV(v) = ∫ f(x) g(v/x) |x|⁻¹ dx

fW(w) = ∫ f(xw) g(x) |x| dx

The proof of these formulae is given in Mood et al. (1974). In textbooks on statistics
the following relations are proven (the first holds for independent X and Y):

E(XY) = E(X)E(Y) (0.70)

Var(Σi=1..n ai Xi) = Σi=1..n ai² Var(Xi) + 2 Σi=1..n Σj=i+1..n ai aj Cov(Xi, Xj)

Furthermore, it is possible to derive the following property for the product V = XY of
independent random variables:

Var(V) = Var(X)Var(Y) + E²(X)Var(Y) + E²(Y)Var(X)
(0.71)

If exact calculations are not possible, the following approximation rules can be used
(based on Taylor's formula):

g(X) = g(mX) + (X − mX)·g′(mX) + ((X − mX)²/2)·g″(mX) + ... (0.72)

From this, we can derive:

E(g(X)) ≈ g(E(X)) (0.73)

Var(g(X)) ≈ Var(X) [g′(mX)]² (0.74)

If the coefficient of variation of X is less than c, the error involved in these
approximations is less than c². In particular, the following useful approximations can
be used:

E(√X) ≈ √E(X),  Var(√X) ≈ Var(X)/(4E(X))
(0.75)
E(X⁻¹) ≈ 1/E(X),  Var(X⁻¹) ≈ Var(X)/E⁴(X)

From Eqn. (0.75) the approximation

CV(1/X) = σ(1/X)/E(1/X) ≈ (σ(X)/E²(X))·E(X) = σ(X)/E(X) = CV(X)

follows directly. This approximation was shown to be very useful in Sec. 2.6.
The second equation in (0.75) is useful in deriving the sampling distribution of
the standard deviation. It is well known that Var(s²) = 2σ⁴/N, in which σ is the
population standard deviation of the normally distributed Xi and all Xi's are i.i.d.
Therefore:

σ(s) ≈ √( (2σ⁴/N)/(4σ²) ) = σ/√(2N) (0.76)
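
A quick Monte Carlo check of Eqn. (0.76), with parameter values chosen only for illustration:

    import math
    import random

    def check_sigma_s(sigma=2.0, N=50, reps=20000):
        # Compares the empirical standard deviation of s with sigma/sqrt(2N).
        ss = []
        for _ in range(reps):
            x = [random.gauss(0.0, sigma) for _ in range(N)]
            m = sum(x) / N
            ss.append(math.sqrt(sum((xi - m) ** 2 for xi in x) / (N - 1)))
        mean_s = sum(ss) / reps
        emp = math.sqrt(sum((s - mean_s) ** 2 for s in ss) / (reps - 1))
        return emp, sigma / math.sqrt(2 * N)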

Exact calculations for the estimators of the 2-parameter Exponential distribution will
be given in the next sub-section.

Exact sampling distributions for the Exponential distribution

The techniques from the previous section will be shown to be successful in deriving
the exact sampling distributions of the MML and MLM parameter estimates of the
exponential distribution.
The expressions of the MML and MLM for the location and scale parameter of
the exponential distribution are given in the Appendix. Suppose that X1, X2, ..., Xn ~
Exp(ξ, α), i.i.d. This is the usual exponential model, parameterized in such a way that
ξ + α is the population mean. The sampling distribution of mX = (1/n) Σi=1..n Xi is that
of the sum of n exponential variables divided by n. According to Johnson et al. (1994),
a sum of n independent exponential variables, each with scale parameter α, is gamma
distributed with parameters (n, α), and a gamma distribution with parameters (α, β)
divided by a constant k is again gamma distributed, with parameters (α, β/k).
Therefore mX is ξ plus a gamma distribution with parameters (n, α/n).
Furthermore, P(min(X1, X2, ..., Xn) < x) = 1 − P(min(X1, X2, ..., Xn) > x) =
= 1 − ∏ P(Xi > x) = 1 − (1 − P(Xi < x))ⁿ. So the minimum of n exponential distributions,
each with scale parameter α, is also exponentially distributed, with scale parameter
α/n. Finally, Sukhatme (1937) showed that (2n/α)(mX − min(X1, X2, ..., Xn)) has a Chi-
square distribution with 2(n − 1) degrees of freedom.
With the above considerations and with Eqn. (0.70) we can derive the
following properties of the sampling distribution of the MLEs:

E(α_ML) = α(1 − 1/n)
Var(α_ML) = α²(n⁻¹ − n⁻²) (0.77)
E(ξ_ML) = ξ + α/n
Var(ξ_ML) = α²/n²
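These properties are easily verified by Monte Carlo simulation. A minimal sketch, assuming the usual closed-form ML estimators of the two-parameter exponential distribution (location estimated by the sample minimum, scale by the sample mean minus the minimum; the parameter values below are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
xi, alpha, n = 2.0, 1.5, 10            # true location, scale, sample size

x = xi + rng.exponential(alpha, size=(100_000, n))
xi_ml = x.min(axis=1)                  # ML estimator of the location
a_ml = x.mean(axis=1) - xi_ml          # ML estimator of the scale

# Compare with Eqn (0.77):
print(a_ml.mean(), alpha * (1 - 1/n))  # ~ 1.35
print(xi_ml.mean(), xi + alpha/n)      # ~ 2.15
print(xi_ml.var(), alpha**2 / n**2)    # ~ 0.0225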

In order to derive the sampling distribution for the MLM, we use the following
properties of the order statistics of the standard exponential distribution (Johnson et
al., 1994), valid for i ≤ k:

E(X_(i)) = ∑_{j=1}^{i} 1/(n − j + 1)

Var(X_(i)) = ∑_{j=1}^{i} 1/(n − j + 1)² (0.78)

Cov(X_(i), X_(k)) = ∑_{j=1}^{i} 1/(n − j + 1)² = Var(X_(i))

The following summations can be derived:

∑_{j=1}^{n} ∑_{k=1}^{j} 1/(n − k + 1) = n

∑_{j=2}^{n} ∑_{k=1}^{j} (j − 1)/(n − k + 1) = c(n) (0.79)

∑_{j=2}^{n} (2j − 1 − n)/(n(n − 1)) = 1/n
The function c(n) can be derived as c(n) = ((n − 1)n/2) Ψ(n) − ∑_{j=1}^{n−1} (n − j) Ψ(j), in which
Ψ is the Psi-function Ψ(x) = Γ′(x)/Γ(x), having the property ∑_{i=k}^{n−1} 1/i = Ψ(n) − Ψ(k)
(Abramowitz and Stegun, 1965). The function c(n) behaves as n² for large values of n;
we write c(n) = O(n²).

For n = 2 the L-Moments estimator of α reads α_LM = x_{2:2} − x_{1:2}. The
difference x_{2:2} − x_{1:2} has an exponential distribution with parameter α. Consequently
E(α_LM) = α (unbiased) and Var(α_LM) = α² for n = 2. Eqn. (0.79) and the asymptotic
covariance matrix of the sample L-moments of Hosking (1986) are needed to
derive the following sampling properties of the MLM:

E(α_LM) = α(4n⁻¹(n − 1)⁻¹c(n) − 2)
Var(α_LM) = 4α²/(3n) + O(n⁻²)
E(ξ_LM) = ξ + α(4n⁻¹(n − 1)⁻¹c(n) − 3) (0.80)
Var(ξ_LM) = α²/(3n) + O(n⁻²)

Note that the variance of α_LM is slightly larger than the variance of α_ML. The variance
of ξ_LM decreases as O(n⁻¹) instead of O(n⁻²) as in the case of the maximum likelihood
method. Furthermore, the expressions 3 − 4n⁻¹(n − 1)⁻¹c(n) and n⁻¹ are plotted in
Figure 3 for a graphical comparison of the biases:

Figure 3: Expressions 3 − 4n⁻¹(n − 1)⁻¹c(n) and n⁻¹ (bias versus sample size, 0 to 60, for the L-Moments and ML estimators respectively)

Notice that the ML-method has a slightly better performance for the parameter
estimation of the exponential distribution than the L-Moments method; its bias and
RMSE are lower than the MLM equivalents.
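The bias comparison in Figure 3 can also be reproduced empirically. A minimal sketch, using plotting-position PWMs with p_j = (j − 0.35)/n as the L-moments implementation (a common choice, assumed here; the exact expressions used in this section are those of the Appendix):

import numpy as np

rng = np.random.default_rng(4)
xi, alpha, n = 0.0, 1.0, 10

x = np.sort(xi + rng.exponential(alpha, size=(100_000, n)), axis=1)

# ML estimator of the location parameter: the sample minimum.
xi_ml = x[:, 0]

# L-moments estimator via plotting-position PWMs, p_j = (j - 0.35)/n.
p = (np.arange(1, n + 1) - 0.35) / n
b0 = x.mean(axis=1)
b1 = (p * x).mean(axis=1)
l1, l2 = b0, 2*b1 - b0
xi_lm = l1 - 2*l2    # for Exp(xi, alpha): lambda1 = xi + alpha, lambda2 = alpha/2

print(xi_ml.mean() - xi, xi_lm.mean() - xi)   # empirical biases of both estimators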

Performance based on relative bias and RMSE of the estimators

With Monte Carlo simulation studies, datasets can be generated from a probability
distribution function that is known beforehand (and with known p-quantile). Different parameter
estimation methods can be applied on these datasets and compared with respect to
their estimates of the p-quantiles. The estimation method with the smallest bias and/or
variance is then considered to be the best method for that particular distribution
function. In order to familiarize the reader with the concept, in the following we start
with a simulation analysis of six estimation methods for the scale parameter of a one-
parameter exponential distribution F(x) = 1 − e^(−λx). In Figure 4 we see the performances
of the six methods for one simulation of 20 values from the exponential distribution with
scale parameter λ = 1.5. The different methods lead to a very wide range of frequency
lines. The 100 year event (or the 0.99 quantile) of the theoretical distribution is 3.07 m.

Figure 4: Performance of six methods

The above simulation analysis is repeated 200 times, and from each simulation we
store, per estimation method, the result of the 1/100 year prediction. This exercise is
also repeated for other sample sizes: apart from n = 20 values we look at n = 3, 6, 9, ...,
60 values. In Figure 5 we have plotted the mean and standard deviation of
each estimation method.
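A condensed version of this experiment, for two of the six methods (maximum likelihood, and least squares on exponential probability paper with Weibull plotting positions, an illustrative subset chosen here), could look as follows:

import numpy as np

rng = np.random.default_rng(5)
lam, p = 1.5, 0.99
x_true = -np.log(1 - p) / lam              # theoretical 0.99 quantile, ~3.07

for n in range(3, 61, 3):
    q_ml, q_ls = [], []
    for _ in range(200):
        x = np.sort(rng.exponential(1/lam, n))
        # ML: lambda_hat = 1 / sample mean, so x_p = -ln(1-p) * mean
        q_ml.append(-np.log(1 - p) * x.mean())
        # LS: fit -ln(1 - F_i) = lambda * x_i through the origin, with
        # Weibull plotting positions F_i = i/(n+1) (our choice here)
        y = -np.log(1 - np.arange(1, n + 1) / (n + 1))
        lam_ls = np.sum(x * y) / np.sum(x * x)
        q_ls.append(-np.log(1 - p) / lam_ls)
    print(n, np.mean(q_ml), np.std(q_ml), np.mean(q_ls), np.std(q_ls))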

Figure 5: Performance of six estimation methods as function of sample size (source: Van
Gelder and Vrijling (1997a))

In general it is impossible to say which estimation method is the most appropriate
method for a particular model and dataset. This depends on the size of the sample, the
type of the distribution, the choice of the parameters of the distribution, the
inhomogeneity that is embedded in the data, and of course the choice of the criterion.
However, Monte Carlo simulation is a suitable method to examine the
performance for a certain choice of the above mentioned factors.
Lots of simulation work has been performed to judge the performance of the
estimation methods based on the relative bias and RMSE of the distribution
parameters. An overview of this work is given in the following Table 1:

Table 1: Literature review (only first authors are shown)


(rows: distributions; entries grouped per estimation method: MOM, LM or PWM, ML, LS, Bayes, ME)

GUM:  MOM: Takara (1989), Yamaguchi (1997), Carter (1983), Landwehr (1979);
      LM/PWM: Takara (1989), Yamaguchi (1997), Landwehr (1979), Guo (1991);
      ML: Takara (1989), Yamaguchi (1997), Carter (1983), Corsini (1995);
      LS: Takara (1989), Yamaguchi (1997), Carter (1983);
      Bayes: Coles (1996); ME: Takara (1989)

WEIB: MOM: Yamaguchi (1997), Abernethy (1983); LM/PWM: Yamaguchi (1997);
      ML: Yamaguchi (1997), Smith (1987); LS: Yamaguchi (1997);
      Bayes: Smith (1987); ME: Singh (1990)

GPA:  MOM: Moharram (1993), Hosking (1987), Castillo (1997);
      LM/PWM: Moharram (1993), Hosking (1987), Castillo (1997);
      ML: Moharram (1993), Hosking (1987), Castillo (1997);
      LS: Moharram (1993); Bayes: Coles (1996); ME: Singh (1997)

GEV:  MOM: Takara (1989), Yamaguchi (1997), Sankarasubramanian (1999);
      LM/PWM: Takara (1989), Yamaguchi (1997), Lu (1992b), Hosking (1985), Wang (1998);
      ML: Takara (1989), Yamaguchi (1997); LS: Yamaguchi (1997);
      Bayes: Fill (1998), Coles (1996), Preumont (1988);
      ME: Singh (1992), Jowitt (1979), Lind (1991)

LN:   MOM: Stedinger (1980), Hoshi (1986), Yamaguchi (1997), Goda (1992);
      LM/PWM: Takara (1990), Sankarasubramanian (1999), Yamaguchi (1997), Takeuchi (1988);
      ML: Stedinger (1980), Takara (1990), Yamaguchi (1997), Takeuchi (1988);
      LS: Takara (1990), Lechner (1991); Bayes: Corbyn (1988), Lye (1988);
      ME: Singh (1987)

GAM:  MOM: Hoshi (1986), Bobée (1991), Durrans (1992b);
      LM/PWM: Rasmussen (1994), Wu (1991); ML: Hoshi (1986), Hu (1987), Stacy (1965);
      LS: Ashkar (1998); Bayes: van Noortwijk (1999), Ribeiro (1993);
      ME: Singh (1985)

Maximum likelihood estimation of the generalized Pareto distribution (GPA) has
previously been considered in the literature, but Hosking et al. (1987) show that
unless the sample size is 500 or more, estimators derived by the method of moments
or the method of probability-weighted moments are more reliable. They also use
simulations to assess the accuracy of confidence intervals for the parameters and
quantiles of the generalized Pareto distribution.
Various estimation methods for the three-parameter case of Generalized Pareto
distribution are given in Moharram et al. (1993), including an alternative method
based on least squares (LS). Modified formulae for computing estimators are
provided. The performances of these methods are compared by using Monte Carlo
simulation. It is found that the LS method has generally a lower root mean squared
error (RMSE) than that obtained using other methods. The LS method also performs
best in terms of BIAS when the shape parameter is greater than zero, while the
probability weighted moments (PWM) method performs best when the shape
parameter is less than zero.
In Castillo and Hadi (1997), it is shown that when the shape parameter of the
GPA is greater than 1, the maximum likelihood estimates do not exist, and when the
shape parameter is between 1/2 and 1, they may have problems. Furthermore, for
shape parameters less than or equal to -1/2, second and higher moments do not exist,
and hence both the method-of-moments (MOM) and the probability-weighted
moments (PWM) estimates do not exist. Another and perhaps more serious problem
with the MOM and PWM methods is that they can produce nonsensible estimates
(i.e., estimates inconsistent with the observed data). In Castillo and Hadi (1997), a
simulation study is carried out to evaluate the performance of the parameter
estimation methods and to compare them with other methods suggested in the
literature. The simulation results indicate that no method is uniformly best for all the
parameter values.
In Fill and Stedinger (1998), it was shown that for realistic generalized
extreme value (GEV) distributions and short records, a simple index-flood quantile
estimator performs better than two-parameter (2P) GEV quantile estimators with
probability weighted moment (PWM) estimation using a regional shape parameter
and at-site mean and L-coefficient of variation (L-CV), and full three-parameter at-
site GEV/PWM quantile estimators. However, as regional heterogeneity or record
lengths increase, the 2P-estimator quickly dominates. Fill and Stedinger (1998)
generalize the index flood procedure by employing regression with physiographic
information to refine a normalized T-year flood estimator. A linear empirical Bayes
estimator uses the normalized quantile regression estimator to define a prior
distribution which is employed with the normalized 2P-quantile estimator. Monte
Carlo simulations indicate that this empirical Bayes estimator does essentially as well
as or better than the simpler normalized quantile regression estimator at sites with
short records, and performs as well as or better than the 2P-estimator at sites with
longer records or smaller L-CV.
In Wang (1998), approximate goodness-of-fit tests of fitted generalized
extreme value (GEV) distributions using LH moments are formulated on the basis of
a comparison of sample LH kurtosis estimates and theoretical LH kurtosis values of the
fitted distributions. These tests are different from those that have been derived for
testing GEV distributions whose parameter values are known a priori. The tests
are intended to answer the following questions: Does a fitted GEV distribution
describe adequately a given data series? If not, can the GEV distribution function
describe adequately the larger events in that data series for use for high quantile
estimation? If so, what degree of emphasis on the larger events is needed in order that
the GEV distribution becomes acceptable? The use of the GEV distribution in
conjunction with the LH moment estimation method and the formulated tests should
alleviate the need for finding the "correct" distribution. The tests are evaluated by
Monte Carlo simulations using generated samples of both GEV and Wakeby
distributions.
Takeuchi and Tsuchiya (1988) derive PWM solutions for the Normal and 3-parameter
Lognormal distributions. Their paper presents the relative accuracy of these estimators
in comparison with other parameter estimation procedures, such as the Moment,
Maximum likelihood, Quantile and Sextile methods, through Monte Carlo simulation experiments.
Simulation results revealed that PWM estimates of quantiles are unbiased for the
Normal distribution and less biased than those of the Moment method for Lognormal
distribution with a large coefficient of skewness. It was also revealed that the RMSE
of PWM estimates of quantiles is as small as that of the Moment method for the
Normal distribution but larger for the Lognormal distribution.
In Lechner (1991) three common estimators for the parameters of the
lognormal distribution are evaluated. Correction factors which eliminate essentially
all the bias, and formulas for the standard deviations of the estimators, are presented.
It is reported that the Persson-Rootzen estimators are about as good as the maximum-
likelihood estimators, without the penalty of requiring iterative optimization. Also, the
estimators resulting from (least squares) fitting a line to the plot of log lifetimes on
normal (Gaussian) probability paper are reasonably good. Formulas are given for
obtaining these latter estimators without actually plotting the points. Lechner (1991)
simulated 5k to 30k samples (more samples for smaller N for each case) and
calculated the following: the means, standard deviations, and third moments of each
estimator; correlations between the two members of each pair; comparisons between
the estimators; and simple corrections to improve the performance of the estimators.
In Corbyn (1988), methods are developed for the determination of the
posterior distribution of the first moment of the lognormal distribution with
exponential and other prior distributions. Bayesian methods of statistical inference are
compared with the more generally used method of inference based on confidence
limits. The general problem of Bayesian estimation of the mean of a correlated
random variable is discussed.
Sankarasubramanian and Srinivasan (1999) deal with the fitting of regression equations for
the sampling properties, variance of L-standard deviation, and bias and variance of L-
skewness, based on Monte-Carlo simulation results, for generalised Normal
(Lognormal-3) and Pearson-3 distributions. These fitted equations will be useful in
formulating goodness-of-fit test statistics in regional frequency analysis. The second
part of their paper presents a comparison of the sampling properties between L-
moments and conventional product moments for generalised Normal, generalised
Extreme Value, generalised Pareto and Pearson-3 distributions, in a relative form. The
comparison reveals that the bias in L-skewness is found to be insignificant up to a
skewness of about 1.0, even for small samples. In case of higher skewness, for a
reasonable sample size of 30, L-skewness is found to be nearly unbiased. However,
the conventional skewness is found to be significantly biased, even for a low
skewness of 0.5 and a reasonable sample size of 30. The overall performance
evaluation in terms of "Relative-RMSE in third moment ratio" reveals that
conventional moments are preferable at lower skewness, particularly for smaller
samples, while L-moments are preferable at higher skewness, for all sample sizes.
Corsini et al. (1995) analyze maximum likelihood (ML) algorithms and
Cramér-Rao (CR) bounds for the location and scale parameters of the Gumbel
distribution. First they consider the case in which the scale parameter is
known, obtaining the estimator of the location parameter by solving the likelihood
equation and then evaluating its performance. They also consider the case where both
the location parameter and the scale parameter are unknown and need to be estimated
simultaneously from the reference samples. For this case, performance is analyzed by
means of Monte Carlo simulation and compared with the asymptotic CR bound.
Results of a Monte Carlo study are also presented in Guo and Cunnane (1991),
comparing different simulation procedures and assessing the value of historical floods
for at-site flood frequency analysis on the assumption of a Gumbel distribution.
In Wu et al. (1991), a new procedure, the method of lower-bound (MLB), is
proposed for determining the design quantile x_p. Their basic concept is first to
determine an estimate of the location parameter using the probability weighted
moment (PWM) method, and then to transform the variable X from the original (X)
space to a new (Y) space. The variable Y is considered to have a two parameter
gamma distribution. In Y-space, the two parameters are estimated by the PWM or an
autocovariance method, then transformed from the Y-space to the X-space. Results
from the Monte Carlo experiments show that the MLB estimates are less biased than
comparable moment estimates and maximum likelihood estimates, and more efficient
than those of PWM for the design quantile x_p.
In Hu (1987a), the determination of confidence intervals for design floods
using the Pearson Type III distribution with a known skewness is analyzed. Tables of
confidence factors based on moment and curve-fitting estimates were developed
by a Monte Carlo simulation technique and were used to construct confidence
intervals for frequency curves. The performance of the methods based on the tables
presented therein and of the method of B values based on the curve-fitting method is evaluated.
In Naghavi and Yu (1996), it is shown that the quantile prediction accuracy of
the log-Pearson type III (LP3) distribution depends largely on the accuracy of the
parameter-estimation method used. The performance of a parameter-estimation
method, on the other hand, depends on both the individual population chosen from the
LP3 family and the sample size. In this study Monte Carlo experiments were
conducted to evaluate four parameter-estimation methods that are frequently used in
hydrological analysis. The four methods tested are the method of indirect moments
(MMI), the method of mixed moments (MIX), the method of direct moments (MMD),
and a modification of MMI using optimization techniques (MMO). A quantile ratio
index (QRI) was devised to identify the limits (sample size and LP3 population
subset) within which each of these methods will perform best. This study
suggested that when QRI ≤ 1.14, MMI or MMO should be used for sample sizes
N ≤ 30, MIX for 30 < N < 100, and any of the four methods for N ≥ 100. When
QRI > 1.14, MMO is recommended for N ≤ 30, MIX for 30 < N < 100, and MIX,
MMO, or MMI for N ≥ 100.
In Pilon and Adamowski (1993), maximum likelihood and censored sample
theory are applied for flood frequency analysis purposes to the log Pearson Type III
(LP3) distribution. The logarithmic likelihood functions are developed and solved in
terms of fully specified floods, historical information, and parameters to be estimated.
The asymptotic standard error of estimate of the T-year flood is obtained using the
general equation for the variance of estimate of a function. The variances and
covariances of the parameters are obtained through inversion of Fisher's information
matrix. Monte Carlo studies to verify the accuracy of the derived asymptotic
expression for the standard errors of the 10, 50, 100, and 500 year floods, indicate that
these are accurate for both Type I and Type II censored samples, while the bias is less
than 2.5%. Subsequently, the Type II censored data were subjected to a random,
multiplicative error. Results indicate that historical information contributes greatly to
the accuracy of estimation of the quantities even when the error of its measurement
becomes excessive. (Type I: The threshold is fixed and the number of censored values
is a random variable. Type II: The number of censored values is fixed and the
threshold is a random variable).
In Lye et al. (1988), the three-parameter lognormal distribution is studied:
Bayesian estimates of the parameters and of the T-year quantile, together with their
posterior variances, are obtained using Lindley's Bayesian approximation
procedure. These estimates are compared to estimates obtained by the method of
maximum likelihood. In all cases the posterior variances of the Bayes estimates of the
T-year flood events are less than the corresponding variances of their maximum
likelihood counterparts.
In Lye et al. (1993), ML estimators were compared with Bayesian estimators
for the reliability functions of the extreme value distributions.
In Yamaguchi (1997), Monte Carlo simulations have been performed to
determine a preferable parameter estimation method for each of eight distribution
functions. It was also shown that a jackknife method is beneficial to correct the bias
and RMSE irrespective of the estimation method used.
In Takara and Stedinger (1994) the use of two-parameter distributions is
recommended from the viewpoint of quantile estimation accuracy for datasets having
sample skewness greater than 0.38 and less than 1.8. They found that the quantile
lower bound estimators are likely to provide more accurate quantile estimators than
other procedures.
Singh and Guo (1997) and Singh and Singh (1985, 1987) showed that the
Method of Entropy yielded parameter estimates for the Generalized Pareto, the
Gamma and the lognormal distributions which were comparable or better within
certain ranges of sample size and coefficient of variation in comparison with MOM,
PWM and ML.

Performance based on over- and underdesign

In Van Gelder (1996b) it was suggested to measure the performance of statistical
estimation methods with respect to over- and underdesign. In fact there is a strong
relation with the performance based on relative bias and RMSE from the previous
section. The relative bias and RMSE can give a first indication of how much the under-
or overdesign is; however, in case of very skewed distributions of the quantile, this
indication might give a false impression of the amount of under- or overdesign.
Under- or overestimation of the p-quantiles has an important meaning in civil
engineering practice as well. Underestimation may give rise to unsafe structures,
whereas overestimation may lead to conservatism or too expensive structures.
Therefore it is very useful to study the probabilities of under- and overdesign of a
certain estimation method.
As a typical result from Van Gelder (1996b), in this section we will concentrate in
particular on the under- and overestimation of the p-quantile of an Exponential and a
Gumbel distribution with an ML- and an LS-parameter estimation method. Different
sample sizes are considered (n = 10, 30 and 100) for the same quantile of interest x100,
such that P(x > x100 | Data) = 1/100. Typical results are shown in Tables 3 and 4.

Data from Gumbel        n     ML     LS
Fitted by Gumbel        10    0.59   0.34
                        30    0.56   0.36
                        100   0.53   0.38
Fitted by Exponential   10    0.18   0.19
                        30    0.01   0.12
                        100   0.00   0.05
Table 3: Probabilities of underdesign pu

Data from Exponential   n     ML     LS
Fitted by Gumbel        10    0.85   0.50
                        30    0.96   0.55
                        100   0.99   0.67
Fitted by Exponential   10    0.59   0.37
                        30    0.54   0.37
                        100   0.52   0.37
Table 4: Probabilities of underdesign pu

The probabilities of overdesign follow from the relation po = 1 − pu. From Tables 3
and 4, it follows that the least squares method usually gives lower probabilities of
underdesign than the maximum likelihood method. That is why the least squares method
is so popular among engineers. If we define asymmetric loss functions, in which we can
model the risk aversion of a designer towards underdesign, we can determine optimal
choices for the distribution type and estimation method. For example, if we penalize
underdesign by a factor 4 more than overdesign, we get the following Table 5.

n      Gumbel        Exponential
       f*    EM*     f*    EM*
10     Exp   ML      Exp   LS
30     G     LS      Exp   LS
100    G     LS      Exp   ML
Table 5: Optimal choices for distribution type f* and estimation method EM*

From this table, we indeed notice a preference for the least squares method, except for
large sample sizes from an exponential distribution, where an ML-method is preferred,
and for small sample sizes from a Gumbel distribution, which for risk-averse engineers
are better modeled by an exponential distribution with an ML-method.
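The probabilities of underdesign in Tables 3 and 4 can be approximated with simulations of this type. A minimal sketch for a single case (exponential data fitted by a one-parameter exponential distribution with ML; the table itself may be based on the two-parameter variant, so values need not match exactly):

import numpy as np

rng = np.random.default_rng(6)
lam, n, p = 1.0, 30, 0.99
x100_true = -np.log(1 - p) / lam            # true 1/100 quantile

x = rng.exponential(1/lam, size=(100_000, n))
x100_hat = -np.log(1 - p) * x.mean(axis=1)  # ML quantile estimates
print(np.mean(x100_hat < x100_true))        # p_u: fraction of underdesign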

Discussion

In this section, an overview and references were given of parameter estimation
techniques that are well known in civil engineering practice. With Monte Carlo
simulation studies, these estimation techniques can easily be compared. Table 1 gave
an overview of all the simulation work that has been performed for the pairs (f,EM):
distribution and estimation method. With the references given in this table it is in
principle possible to determine the optimal (w.r.t. minimum bias and RMSE)
estimation method given a certain distribution function. However, the optimal choice
for a pair (f,EM) can change very quickly if the main assumption of i.i.d. data is
violated, or when the criterion for the optimal pair is changed (performance measured
in terms of under- and overdesign), as was shown in this paper. Under- and
overdesign are important measures for the engineer. Based on simulations from an
Exponential distribution and some mathematical proofs, an ordering in risk aversion
of the different estimation techniques can be made. The Maximum Likelihood,
Bayesian point estimation (mean of posterior distributions) and Method of Moments
parameter estimation techniques give a relatively higher proportion of underdesign
than the Bayesian predictive (integration over the posterior distribution) and Least
Squares techniques. Asymmetric loss criteria can be used to model the risk aversion of
the engineer in mathematical terms. The optimal choice of the probability distribution
function and the parameter estimation method can then be determined by minimizing
the asymmetric loss. In Van Gelder (1996b), this idea has been worked out for more
types of loss functions and includes parameter and model uncertainty. In Sec. 5.5 and
5.6 we come back to this issue.
Most of the probability models given in this paper have been implemented in
computer programs. Kuczera (1995) made a program called FLIKE in which the
Bayesian analysis of GEV, LN and GPA models is included. Perreault et al. (1994)
and Perron et al. (1994) developed the very powerful and user-friendly AJUSTE
software.

Discussion

The choice of statistical estimation methods for probability distribution functions is
one of the most challenging problems within civil engineering, and one that is filled
with many controversies. However, it is a topic with great practical importance and it
needs to be dealt with (see also Lambert et al. (1994), Haimes et al. (1994), and Seiler
and Alvarez (1996)). Attempts to develop new methods have been extremely
abundant (see also Bardsley (1994), Capehart et al. (1998), and Chow and Watt
(1990)). This is the reason for the large number of references in this paper. This paper
focused on the most important statistical estimation methods that are in circulation
under civil engineers. The question which estimation method can best be used is
impossible to answer. This depends on too many factors. The definition of what is
considered best is one of those factors. However, Table 1 gives an overview of the
most important journal papers which investigate the performance of a certain pair (f,
EM) w.r.t. minimum bias and RMSE of a quantile. Depending on various conditions,
sometimes a classification in the performance of the estimation methods can be made.
Furthermore, methods for deriving sampling distributions have been described in this
paper and have been applied to the Exponential distribution. The problem of weight
factor estimation has been investigated with various methods. Bayes factors appear to
perform very well, and weight factors based on L-Kurtosis are shown to be in fairly close
agreement with those obtained from the minimum divergence criterion.
It is recommended in this paper (as well as in Mendel and Chick (1993)) to use
theoretical considerations as much as possible in the distribution selection. Chick et
al. (1995, 1996) proposed a physics-based approach to determine the PDFs of extreme
river discharges. In their papers, a new model for predicting the frequency of extreme
river levels is proposed which encapsulates physical knowledge about river dynamics,
including formulae which describe river discharge. The model accounts for the river
dynamics at a given location by modeling both how water gets into the river (via
upstream tributaries) and how water leaves (discharge modeled by Chézy's equation).
Although the simplified physical model makes several rough approximations (using
memoryless properties and Chézy's equation for approximating discharge), insights
were gained into the effects of the parameters of Chézy's equation on the shape of the
curves relating river level and flood return frequency. These shapes do not always
conform to the curves found for traditional models. In particular, the relation is not
necessarily linear on log paper, as with the Exponential model. It was shown that an
increase in the power parameter of Chézy's equation led to a non-linear relation on log
paper. As the power increased, the slope of the curve relating flood volume and the
frequency of extreme floods decreased.
This may be true for more complicated systems as well. It was concluded by Chick
et al. (1995, 1996) that flood protection designs based on drawing straight lines on log
paper would be conservative for extremely rare floods.

Figure 6: Effect of changes in the power parameter of Chézy's equation on the flood
frequency curve (source: Chick et al., 1996)

References

Abernethy, R.B., 1983. Weibull Analysis Handbook, Pratt & Whitney Aircraft, November 1983.
Abramowitz, M., and Stegun, I., 1965. Handbook of Mathematical Functions, Dover Publications Inc.,
New York.
Anctil, F., Martel, N., and Van Diem Hoang. 1998. Analyse régionale des crues journalières de la
province de Québec. Canadian Journal of Civil Engineering, 25(2), 360-369.
Ashkar, F., and Ouarda, T.B.M.J., 1998. Approximate confidence intervals for quantiles of gamma and
generalized gamma distributions. Journal of Hydrologic Engineering, 3(1), 43-51.
Bardsley, W.E., 1994. Against objective statistical analysis of hydrological extremes, Journal of
Hydrology, 162(3-4), 429-431.
Barnett, V., 1973. Comparative Statistical Inference, John Wiley and Sons, Inc.
Bayes, T., 1763. An essay towards solving a problem in the doctrine of chances, Reprinted from
Philos. Trans. Roy. Soc. London. Vol. 53: 370-418.
Beard, L.R., 1994. Anatomy of the best estimate. Journal of Hydraulic Engineering, Vol. 120, No. 6,
June, 1994.
Benjamin, J.R., and Cornell, C.A., 1970. Probability, Statistics and Decision for Civil Engineers,
McGraw-Hill, Inc.
Ben-Zvi, A., and Azmon, B., 1997. Joint use of L-moment diagram and goodness-of-fit test: A case
study of diverse series, Journal of Hydrology, 198(1-4), 245-259.
Berger, J.O., 1980. Statistical Decision Theory, Foundations, Concepts, and Methods, Springer-Verlag.
Bernardo, J.M., and Smith, A.F.M., 1994. Bayesian Theory, John Wiley & Sons, Inc., Wiley Series in
Probability and Mathematical Statistics.
Bernoulli Family, 1999. The MacTutor History of Mathematics archive, http://www-history.mcs.st-
and.ac.uk/history/

Bobée, B., and F. Ashkar, 1991. The Gamma Family and Derived Distributions Applied in Hydrology,
203 pp., Water Resources Publications, Littleton, CO.
Bobée, B., Ashkar, F., and Perreault, L., 1993a. Two kinds of moment ratio diagrams and their
applications in hydrology, Stochastic Hydrol. Hydraul., 7, 41-65.
Bobée, B., Cavadias, G., Ashkar, F., Bernier, J., and Rasmussen, P.F., 1993b. Towards a systematic
approach to comparing distributions used in flood frequency analysis, J. Hydrol., 142, 121-136.
Box, G.E.P., and Tiao, G.C., 1973. Bayesian Inference in Statistical Analysis, Addison-Wesley.
Burcharth, H.F., and Liu, Z., 1994. On the extreme wave height analysis, In Port and Harbour Research
Institute, Ministry of Transport, editor, International Conference on Hydro-Technical
Engineering for Port and Harbor Construction (Hydro-Port), Yokosuka, Japan, 1994, pages 123-
142, Yokosuka: Coastal Development Institute of Technology.
Capehart, B.L., Mahoney, J.F., and Sivazlian, B.D., 1998. Technological risk assessment: The
statistical evaluation of low frequency events. Inderscience Enterprises Ltd, Geneva,
Switzerland, pp. 749-755. In: Technology Management 1: Proceedings of the First International
Conference on Technology Management, Special Publication of the International Journal of
Technology Management, Miami, FL, USA.
Carlin, B.P., and Louis, T.A., 1996. Bayes and Empirical Bayes Methods for Data Analysis, Chapman
& Hall.
Carter, D.J.T., and Challenor, P.G., 1983. Methods of Fitting the Fisher-Tippett Type 1 Extreme Value
Distribution, Ocean Engineering, Vol.10, No. 3, pp.191-199.
Castillo, E., and Hadi, A.S., 1997. Fitting the generalized Pareto distribution to data, Journal of the
American Statistical Association, 92(440), 1609-1620.
Chick, S., Shortle, J., Van Gelder, P., and Mendel, M., 1995. A Physics-Based Approach to Predicting
the Frequency of Extreme River Levels, Engineering Probabilistic Design and Maintenance for
Flood Protection, pp.89-107. Discussion: pp.109-138.
Chick, S., Shortle, J., Van Gelder, P., and Mendel, M., 1996. A Model for the Frequency of Extreme
River Levels Based on River Dynamics, Structural Safety, Vol. 18, Nr. 4, pp.261-276.
Chow, K.C.A., and Watt, W.E., 1990. A knowledge-based expert system for flood frequency analysis,
Can. J. Civ. Eng., 17, 597-609.
Coles, S.G., and Powell, E.A., 1996. Bayesian methods in extreme value modelling: A review and new
developments, International Statistical Review, 64(1), 119-136.
Coles, S.G., and Tawn, J.A., 1996. A Bayesian analysis of extreme rainfall data, Applied Statistics
(Journal of the Royal Statistical Society, Series C), 45(4), 463-478.
Corbyn, J.A., 1988. Statistical analysis of samples from lognormal distribution by Bayesian methods
with minerals industry applications. Transactions of the Institution of Mining and Metallurgy,
Section A, 97, 118-124.
Corsini, G., Gini, F., Greco, M.V., and Verrazzani, L., 1995. Cramer-Rao bounds and estimation of the
parameters of the Gumbel distribution, IEEE Transactions on Aerospace and Electronic
Systems, 31(3), 1202-1204.
Cramer, H., 1946. Mathematical methods of statistics, Princeton University Press.
Cunnane, C., 1987. Review of statistical methods for flood frequency estimation, in Hydrologic
Frequency Modeling, edited by V.P. Singh, pp. 49-95, D. Reidel, Dordrecht.
Cunnane, C., 1988. Methods and merits of regional flood frequency analysis, J. Hydrol., 100, p. 269-
290.

Dekkers, A.L.M., and De Haan, L., 1989. On the estimation of the extreme-value index and large
quantile estimation. Annals of Statistics 17, 1795-1832.
Demetracopoulos, A.C., 1994. Nonlinear regression applied to hydrologic data, Journal of Irrigation and
Drainage Engineering-ASCE, 120(3), 652-659.
Demuth, S., and Kuells, C., 1997. Probability analysis and regional aspects of droughts in southern
Germany. Symp 1: Sustainability of Water Resources under Increasing Uncertainty IAHS
Publication (International Association of Hydrological Sciences). n 240.
Duan, J., Selker, J., and Grant, G.E., 1998. Evaluation of probability density functions in precipitation
models for the Pacific Northwest, Journal of the American Water Resources Association. v 34 n
3 Jun 1998. p 617-627.
Durrans, S.R., 1992a. Distributions of fractional order statistics in hydrology, Water Resour. Res.,
28(6), 1649-1655.
Durrans, S.R., 1992b. Parameter estimation for the Pearson type 3 distribution using order statistics, J.
Hydrol., 133, 215-232.
Fill, H.D., and Stedinger, J.R., 1995. Homogeneity tests based upon Gumbel distribution and a critical
appraisal of Dalrymple’s test, Journal of Hydrology. v 166 p 81-105.
Fill, H.D., and Stedinger, J.R., 1998. Using regional regression within index flood procedures and an
empirical Bayesian estimator, Journal of Hydrology, 210(1-4), 128-145.
Fisher, R.A., 1934. Statistical methods for research workers, Publisher Edinburgh, Biological
monographs and manuals. no. 5.
Gauss, C.F., 1901. Sechs Beweise des Fundamentaltheorems über quadratische Reste (Six proofs of the
fundamental theorems on quadratic residuals; in German), Publisher Leipzig : Engelmann,
HRSG von E. Netto. 1901, 111 pages, Series Ostwald's Klassiker der exakten Wissenschaften
Nr. 122.
Gingras, D., and Adamowski, K., 1994. Performance of L-moments and nonparametric flood frequency
analysis. Canadian Journal of Civil Engineering. v 21 n 5 Oct 1994. p 856-862.
Goda, Y., and Kobune, K., 1990. Distribution Function Fitting for Storm Wave Data, Proceedings of
the International Conference on Coastal Engineering, Delft, The Netherlands.
Goda, Y., Hawkes, P.J., Mansard, E., Martin, M.J., Mathiesen, M., Peltier, E., Thompson, E.F., and
Van Vledder, G., 1993. Intercomparison of extremal wave analysis methods using numerically
simulated data, Ocean wave measurement and analysis, pp. 963-977.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C., and Wallis, J.R., 1979. Probability weighted
moments: Definition and relation to parameters of several distributions expressable in inverse
form. Water Resources Research, 15(5), 1049-1054.
Guo, S.L., and Cunnane, C., 1991. Evaluation of the usefulness of historical and palaeological floods in
quantile estimation, J. Hydrol., 129, 245-262.
Haimes, Y.Y., Barry, T., and Lambert, J.H., 1994. When and how can you specify a probability
distribution when you don't know much, Risk-Based Decision Making in Water Resources VI.
Y.Y. Haimes, D.A. Moser, and E.Z. Stakhiv, Eds., New York: American Society of Civil
Engineers, pp. 19-25.
Hald, A., 1952. Statistical Theory with Engineering Applications, John Wiley and Sons, Inc., New
York.
Harter, H.L., 1969. Order Statistics and Their Use in Testing and Estimation. Volume 2, Aerospace
Research Laboratories, U.S. Air Force, Washington DC, USA.

Hoshi, K., and Leeyavanija, U., 1986. A new approach to parameter estimations of gamma-type
distributions, Journal of Hydroscience and Hydraulic Engineering, JSCE, 4 (2), 79-95.
Hosking, J.R.M., 1986. The theory of probability weighted moments, Research Rep. RC 12210, 160
pp., IBM Research Division, Yorktown Heights, NY.
Hosking, J.R.M., 1990. L-moments: Analysis and estimation of distributions using linear combinations
of order statistics. J. R. Stat. Soc. Ser. B., vol. 52, 105-124.
Hosking, J.R.M., 1992. Moments or L moments? An example comparing two measures of distribution
shape. The American Statistician, 46(3), 186-189.
Hosking, J.R.M., 1997. Fortran Routines for Use with the Method of L-Moments. IBM Research
Report, RC20525, Yorktown Heights, NY.
Hosking, J.R.M., and Wallis, J.R., 1987. Parameter and quantile estimation for the generalized Pareto
distribution, Technometrics, 29(3), 339-349.
Hosking, J.R.M., and Wallis, J.R., 1997. Regional Frequency Analysis: An Approach based on L-
Moments. Cambridge University Press, Cambridge, UK.
Hosking, J.R.M., Wallis, J.R., and Wood, E.F., 1985. Estimation of the Generalized Extreme Value
Distribution by the Method of Probability Weighted Moments, Technometrics, August 1985,
Vol. 27, No.3, pp.251-261.
Hu, S., 1987a. Determination of confidence intervals for design floods. Journal of Hydrology, 96(1-4),
201-213.
Hu, S., 1987b. Problems with outlier test methods in flood frequency analysis. Journal of Hydrology,
96(1-4), 375-383.
Johnson, N.L., Kotz, S., and Balakrishnan, N., 1994. Continuous univariate distributions. Vol. 1
Publisher New York : Wiley, 1994 ISBN 0-471-58495-9, 756 pages.
Jowitt, P.W., 1979. The extreme-value type-1 distribution and the principle of maximum entropy,
Journal of Hydrology, 42, pp.23-28.
Jumarie, G., 1990. Relative Information: Theories and Applications. Springer-Verlag, Berlin,
Germany.
Kendall, M.G., 1961. Daniel Bernoulli on maximum likelihood, Biometrika 48, 1-18.
Koch, S.P., 1991. Bias error in maximum likelihood estimation, J. Hydrol., 122, p. 289-300.
Kuczera, G., 1994. Comprehensive Bayesian at-site flood frequency inference, Proceedings of the
Water Down Under 1994 Conference. Part 3 (of 3), IE Aust, Crows Nest, NSW, Aust. p 205-
210.
Kullback, S., 1959. Information Theory and Statistics. John Wiley & Sons Inc., New York, USA.
Lall, U., Moon, Y., and Bosworth, K., 1993. Kernel Flood Frequency Estimators: Bandwidth Selection
and Kernel Choice. Water Resources Research , April 1993, pp. 1003-1016.
Lambert, J.H., Matalas, N.C., Ling, C.W., Haimes, Y.Y., and Li, D., 1994. Selection of Probability
Distributions in Characterizing Risk of Extreme Events, Risk Analysis, Vol.14, No.5.
Landwehr, J.M., Matalas, N.C., and Wallis, J.R., 1979. Probability weighted moments compared with
some traditional techniques in estimating Gumbel parameters and Quantiles, Water Resources
Research, Vol. 15., No.5, October 1979, pp.1055-1064.
Lechner, J.A., 1991. Estimators for type-II censored (log)normal samples. IEEE Transactions on
Reliability, 40(5), 547-552.
Lin, B., and Vogel, J.L., 1993. Comparison of L-moments with method of moments, Proceedings of the
Symposium on Engineering Hydrology, ASCE, New York, NY, USA, pp. 443-448.

Lind, N.C., and Hong, H.P., 1991. Entropy estimation of hydrological extremes, Stochastic
Hydrol. Hydraul., 5, 77-87.
Lu, L.-H., and Stedinger, J.R., 1992b. Variance of two- and three-parameter GEV/PWM quantile
estimators: formulae, confidence intervals, and a comparison, J. Hydrol., 138, 247-267.
Lundgren, H., 1988. Variational criteria in spectrum analysis and image reconstruction, Signal
Processing IV, Elsevier, pp. 1573-1576.
Lye, L.M., Hapuarachchi, K.P., and Ryan, S., 1993. Bayes estimation of the extreme-value reliability
function, IEEE Transactions on Reliability, Vol. 42, No. 4, 1993 December.
Lye, L.M., Sinha, S.K., and Booy, C., 1988. Bayesian analysis of the tau-year events for flood data
fitted by a three-parameter lognormal distribution. Civil Engineering Systems, 5(2), 81-86.
Marchette, D.J., 1995. An Investigation of Misspecification of Terms in Mixture Densities, Technical
Report, Computational Statistics at George Mason University
Marchette, D.J., Priebe, C.E., Rogers G.W., and Solka, J.L., 1994. The Filtered Kernel Estimator,
Center for Computational Statistics, GMU, Tech Rpt No 104, October 1994.
McCuen, R.H., Leahy, R.B., and Johnson, P.A., 1990. Problems with logarithmic transformations in
regression, Journal of Hydraulic Engineering-ASCE, 116(3), 414-428.
Mendel, M.B., and Chick, S.E., 1993. The Geometry and Calculus of Engineering Probability, UC-
Berkeley, Lecture notes, December 1993.
Moharram, S.H., Gosain, A.K., and Kapoor, P.N., 1993. A comparative study for the estimators of the
generalized Pareto distribution, J. Hydrol., 150, 169-185.
Mood, A.M., Graybill, F.A., and Boes, D.C., 1974. Introduction to the theory of statistics. 3rd ed.
Publisher New York : McGraw-Hill, 1974 Series in probability and statistics.
Moon, Y., and Lall, U., 1994. A Kernel Quantile Function Estimator For Flood Frequency Analysis, in
Extreme Values: Floods and Droughts, ed. K. Hipel, Kluwer.
Moon, Y., Lall, U., and Bosworth, K., 1993. A Comparison of Tail Probability Estimators for Flood
Frequency Analysis. Journal of Hydrology , V. 151, 1993, pp. 343-363.
Naghavi, B., and Yu, F.-X., 1996. Selection of parameter-estimation method for LP3 distribution.
Journal of Irrigation and Drainage Engineering, 122(1), 24-30.
Pannullo, J.E., Li, D., and Haimes, Y.Y., 1993. Posterior analysis in assessing risk of extreme events: a
conjugate family approach, Proceedings of the IEEE International Conference on Systems, Man
and Cybernetics, Vol. 1, IEEE Service Center, Piscataway, NJ, USA, pp. 477-482.
Pearson, C.P., McKerchar, A.I., and Woods, R.A., 1991. Regional flood frequency analysis of
Western Australian data using L-moments. International Hydrology and Water Resources
Symposium 1991 Part 2. v 2 n 91 pt 22. p 631-632.
Perreault, L., Bobée, B., and Perron, H., 1994. AJUSTE II: hydrological frequency analysis software,
part 1: theoretical aspects, International Conference on Hydraulic Engineering Software
(Hydrosoft), Proceedings 2, Computational Mechanics Publications, Southampton, England,
pp. 243-250.
Perron, H., Bobée, B., Perreault, L., and Roy, R., 1994. AJUSTE II: hydrological frequency analysis
software, part 2: software aspects and applications, International Conference on Hydraulic
Engineering Software (Hydrosoft), Proceedings 2, Computational Mechanics Publications,
Southampton, England, pp. 251-258.
Pickands, J., 1975. Statistical inference using extreme order statistics, The Annals of Statistics, 3(1),
119-131.

Pilon, P.J., and Adamowski, K., 1993. Asymptotic variance of flood quantile in log Pearson Type III
distribution with historical information, J. Hydrol., 143, 481-503.
Preumont, A., 1988. Bayesian analysis of the extreme value of a time history. Reliability Engineering
& System Safety, 20(3), 165-172.
Rao, A.R., and Hamed, K.H., 1997. Regional frequency analysis of Wabash River flood data by L-
moments, Journal of Hydrologic Engineering. v 2 n 4 Oct 1997. p 169-179.
Rasmussen, P.F., Bobée, B., and Bernier, J., 1994. Parameter estimation for the Pearson type 3
distribution using order statistics (Durrans, 1992)---Comment, J. Hydrol., 153, 417-424.
Reiss, R.-D., 1989. Approximate Distributions of Order Statistics, With Applications to Nonparametric
Statistics, Springer Verlag Series in Statistics.
Ribeiro-Correa, J., and Rousselle, J., 1993. A hierarchical and empirical Bayes approach for the
regional Pearson type III distribution, Water Resour. Res., 29(2), p. 435-444.
Rosbjerg, D., Madsen, H., and Rasmussen, P.F., 1992. Prediction in partial duration series with
generalized Pareto-distributed exceedances, Water Resour. Res., 28(11), p. 3001-3010.
Rosenblatt, M., 1956, Remarks on some Nonparametric Estimates of a Density Function, Ann. Math.
Statist., 27, p. 832-835.
Ruprecht, J.K., and Karafilis, D.W., 1994. Regional flood frequency: caution needed, Proceedings of
the Water Down Under 1994 Conference, Part 3 (of 3), Adelaide, Australia, 21-25 November 1994.
Sankarasubramanian, A., and Srinivasan, K., 1999. Investigation and comparison of sampling
properties of L-moments and conventional moments. Journal of Hydrology, 218(1-2), 13-34.
Scott, D.W., 1992. Multivariate Density Estimation, New York: John Wiley.
Seiler, F.A., and Alvarez, J.L., 1996. On the selection of distributions for stochastic variables, Risk
Analysis, Vol. 16, No. 1, pp. 5-18.
Shannon, C.E., 1949. The Mathematical Theory of Communication. The University of Illinois Press,
Urbana, IL, USA.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis, New York: Chapman and
Hall.
Singh, V.P., 1997. The use of entropy in hydrology and water resources, Hydrological Processes,
11(6), 587-626.
Singh, V.P., and Guo, H., 1997. Parameter estimation for 2-parameter generalized Pareto distribution
by POME, Stochastic Hydrology and Hydraulics, 11(3), 211-227.
Singh, V.P., and Fiorentino, M., 1992. A Historical perspective of entropy applications in water
resources, In Singh, V.P., and Fiorentino, M. (eds.), Entropy and Energy Dissipation in Water
Resources, pp.21-61, Kluwer Academic Publishers.
Singh, V.P., and Singh, K., 1985. Derivation of the Gamma distribution by using the principle of
maximum entropy (POME). Water Resources Bulletin, Vol. 21, No. 6, pp.941-952.
Singh, V.P., and Singh, K., 1987. Parameter estimation for TPLN distribution for flood frequency
analysis. Water Resources Bulletin, vol. 23, No. 6, pp.1185-1992.
Singh, V.P., Cruise, J.F., and Ma, M., 1990. A comparative evaluation of the estimators of the 3-
parameter lognormal distribution by Monte-Carlo simulation, Computational Statistics & Data
Analysis, 10(1), 71-85.
Smith, R.L., and Naylor, J.C., 1987. A comparison of Maximum Likelihood and Bayesian Estimators
for the Three-parameter Weibull Distribution, Appl. Statist. 36, No. 3, pp. 358-369.

Smith, R.L., and Weissman, I., 1987. Large deviations of tail estimators based on the Pareto
approximation. J. Appl. Prob. Vol. 24, pp. 619-630.
Stacy, E.W., and Mihram, G.A., 1965. Parameter estimation for a generalized gamma distribution,
Technometrics, Vol. 7, No. 3, pp. 349-358, August 1965.
Stedinger, J.R., 1980. Fitting log normal distributions to hydrologic data, Water Resources Research,
16 (3), 481-490.
Sukhatme, P.V., 1937. Tests of significance for samples of the Chi-squared population with two
degrees of freedom, Ann. Eugen., Vol. 8, pp. 52-56.
Takara, K., and Takasao, T., 1990. Comparison for parameter estimation methods for hydrologic
frequency analysis models, Proc. Hydraulic Engineering, JSCE, 34, 7-12 (in Japanese).
Takara, K., Takasao, T., and Shimizu, A., 1989. Comparison of parameter estimation methods for
extreme value distributions. Annuals, DPRI, Kyoto University, 32 B-2, 455-469 (in Japanese).
Takara, K.T., and Stedinger, J.R., 1994. Recent Japanese contributions to frequency analysis and
quantile lower bound estimators, In: K.W. Hipel (ed.), Stochastic and Statistical methods in
hydrology and environmental engineering, Vol. 1, p. 217-234, Kluwer Academic Publishers.
Takeuchi, K., and Tsuchiya, K., 1988. On relative accuracy of PWM estimates of normal and 3-
parameter lognormal distributions, Proc. of JSCE, Vol. 393/II-9, May, pp. 103-112.
TAW: Technical Advisory Committee on Water Defences, 1990. Probabilistic Design of Flood
Defences. Report 141, Centre for Civil Engineering Research and Codes, The Netherlands.
Van Gelder, P.H.A.J.M., 1999b. Risks and safety of flood protection structures in the Netherlands,
Proceedings of the Participation of Young Scientists in the Forum Engelberg 1999 on Risk and
Safety of Technical Systems - in View of Profound Changes, pp.55-60, p.93.
Van Gelder, P.H.A.J.M., and Neykov, N.M., 1998. Regional frequency analysis of extreme water
levels along the Dutch coast using L-moments: A preliminary study, In: Stochastic models of
hydrological processes and their applications to problems of environmental preservation, pp.14-
20.
Van Gelder, P.H.A.J.M., and Vrijling, J.K., 1997a. A comparative study of different parameter
estimation methods for statistical distribution functions in civil engineering applications,
Structural Safety and Reliability, Vol. 1, pp.665-668.
Van Gelder, P.H.A.J.M., Pandey, M.D., and Vrijling, J.K., 1999b. The use of L-Kurtosis in the
Estimation of Extreme Floods, 9th Annual Conference Risk Analysis: Facing the New
Millennium, Rotterdam, The Netherlands, October 10 - 13, 1999.
Van Gelder, P.H.A.J.M., 2000. Statistical methods for the risk-based design of civil structures,
Publisher TU Delft, Descr. 248 p, Series Communications on Hydraulic and Geotechnical
Engineering ISSN:0169-6548 00-1.
Van Noortwijk, J.M., 1999. Quantiles of Generalised Gamma Distributions from a Bayesian Point of
View, HKV-report.
Vrijling, J.K., and Van Gelder, P.H.A.J.M., 1995. Probabilistic design of berm breakwaters, in:
Engineering Probabilistic Design and Maintenance for Flood Protection, pp. 181-198,
Discussion: pp. 199-213.
Vrijling, J.K., and Van Gelder, P.H.A.J.M., 1997a. Societal risk and the concept of risk aversion,
Advances in Safety and Reliability, Vol. 1, pp. 45-52.
Wang, Q.J., 1998. Approximate goodness-of-fit tests of fitted generalized extreme value distributions
using LH moments, Water Resources Research, 34: (12) 3497-3502 DEC 1998.

WMO, 1989. Statistical distributions for flood frequency analysis, Operational Hydrology Report No.
33, World Meteorological Organization, No. 718, Geneva, Switzerland.
Wu, B., Hou, Y., and Ding, J., 1991. Method of lower-bound to estimate the parameters of a Pearson
type III distribution. Hydrological Sciences Journal, 36(3), 271-280.
Yamaguchi, M., 1997. Intercomparison of parameter estimation methods in extremal wave analysis. In
B.L. Edge, editor, 25th International Conference on Coastal Engineering, Orlando, Florida,
U.S.A., 1996, pages 900-913, New York: American Society of Civil Engineers (ASCE).
