Discriminating Among Weibull, Log-Normal and Log-Logistic Distributions
M. Z. Raqab^{a,b,*}, S. Al-Awadhi^{a}, Debasis Kundu^{c}
^{a} Department of Statistics and OR, Kuwait University, Safat 13060, Kuwait
^{b} Department of Mathematics, The University of Jordan, Amman 11942, Jordan
^{c} Department of Mathematics, Indian Institute of Technology Kanpur, Kanpur, Pin 208016, India
^{*} Corresponding author
Abstract
In this paper we consider the problem of model selection/discrimination among three different positively skewed lifetime distributions. All three distributions, namely the Weibull, log-normal and log-logistic, have been used quite effectively to analyze positively skewed lifetime data. We use three different methods to discriminate among these three distributions. We use the maximized likelihood method to choose the correct model and compute the asymptotic probability of correct selection. We further obtain the Fisher information matrices of the three distributions and compare them for complete and censored observations; these measures can also be used to discriminate among the three distributions. We also propose to use the Kolmogorov-Smirnov distance to choose the correct model. Extensive simulations have been performed to compare the performances of the three methods. It is observed that each method performs better than the other two for some distributions and for certain ranges of parameters. Further, the loss of information due to censoring is compared for the three distributions. The analysis of a real data set is performed for illustrative purposes.
1 Introduction
Among several right skewed distributions, the Weibull (WE), log-normal (LN) and
log-logistic (LL) distributions have been used quite effectively in analyzing posi-
tively skewed lifetime data. These three distributions have several interesting dis-
tributional properties and their probability density functions also can take different
shapes. For example, the WE distribution can have a decreasing or an unimodal
probability density function (PDF), and a decreasing, a constant and an increasing
hazard function, depending on the shape parameter. Similarly, the PDF of the LN distribution is always unimodal, and it has an inverted bathtub shaped hazard function. Moreover, the LL distribution has either a reversed J shaped or a unimodal PDF, and the hazard function of the LL distribution is either decreasing or inverted bathtub shaped. For further details about the distributional behavior
of these distributions, one may refer to Johnson et al. (1995).
Let us consider the following problem. Suppose {x1 , . . . , xn } is a random sam-
ple of size n from some unknown lifetime distribution function F (·), i.e. F (0−) = 0,
and the preliminary data analysis suggests that it is coming from a positively skewed
distribution. Hence, any one of the above three distributions can be used to analyze
this data set. In this paper, we would like to explore among these WE, LN and
LL distributions, which one fits the data ‘best’. It can be observed that for certain
ranges of the parameters, the corresponding PDFs or the cumulative distribution
functions(CDFs) are very close to each other but can be quite different with respect
to other characteristics. Before explaining this with an example let us introduce
the following notations.
The WE distribution with the shape parameter α > 0 and scale parameter
λ > 0 will be denoted by W E(α, λ). The corresponding PDF and CDF for x > 0,
are
$$f_{WE}(x; \alpha, \lambda) = \alpha \lambda^{\alpha} x^{\alpha-1} e^{-(\lambda x)^{\alpha}} \quad \text{and} \quad F_{WE}(x; \alpha, \lambda) = 1 - e^{-(\lambda x)^{\alpha}},$$
respectively. The LN distribution is denoted by LN (σ, β) with the shape parameter
σ > 0 and scale parameter β > 0. The PDF and CDF of this distribution for x > 0,
can be written as
$$f_{LN}(x; \sigma, \beta) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{1}{2\sigma^2}(\ln x - \ln \beta)^2}$$
and
$$F_{LN}(x; \sigma, \beta) = \Phi\!\left(\frac{\ln x - \ln \beta}{\sigma}\right) = \frac{1}{2} + \frac{1}{2}\,\mathrm{Erf}\!\left(\frac{\ln x - \ln \beta}{\sqrt{2}\,\sigma}\right),$$
respectively, where Φ(·) is the CDF of a standard normal distribution with Erf(x) = 2Φ(√2 x) − 1. The PDF and CDF of the LL distribution, denoted by LL(γ, ξ),
with the shape parameter γ > 0 and scale parameter ξ > 0, for x > 0, are
$$f_{LL}(x; \gamma, \xi) = \frac{1}{\gamma x}\, \frac{e^{(\ln x - \ln \xi)/\gamma}}{\left(1 + e^{(\ln x - \ln \xi)/\gamma}\right)^2} \quad \text{and} \quad F_{LL}(x; \gamma, \xi) = 1 - \frac{1}{1 + e^{(\ln x - \ln \xi)/\gamma}},$$
respectively.
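For numerical work with these three families it is convenient to note that each of the above parameterizations maps onto a standard scipy.stats distribution. The following is a minimal sketch under that assumption; the helper names are ours, not from the paper.

```python
# Minimal sketch mapping the paper's parameterizations onto scipy.stats.
import numpy as np
from scipy import stats

def we(alpha, lam):
    # F_WE(x) = 1 - exp(-(lam*x)^alpha)  <->  weibull_min(c=alpha, scale=1/lam)
    return stats.weibull_min(c=alpha, scale=1.0 / lam)

def ln_dist(sigma, beta):
    # F_LN(x) = Phi((ln x - ln beta)/sigma)  <->  lognorm(s=sigma, scale=beta)
    return stats.lognorm(s=sigma, scale=beta)

def ll(gamma, xi):
    # F_LL(x) = 1 - 1/(1 + (x/xi)^(1/gamma))  <->  fisk(c=1/gamma, scale=xi)
    return stats.fisk(c=1.0 / gamma, scale=xi)

# The CDFs plotted in Figure 1 are nearly indistinguishable:
x = np.linspace(0.01, 5.0, 500)
print(np.max(np.abs(we(4.18, 0.56).cdf(x) - ln_dist(0.27, 1.60).cdf(x))))
print(np.max(np.abs(ln_dist(0.27, 1.60).cdf(x) - ll(0.16, 1.60).cdf(x))))
```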
In Figure 1, we have plotted the CDFs of WE(4.18,0.56), LN(0.27,1.60) and
LL(0.16,1.60). It is clear from Figure 1 that all the CDFs are very close to each
other. Therefore, if the data are coming from any one of these three distributions,
the other two distributions can easily be used to analyze this data set. Although,
these three CDFs are quite close to each other, the hazard functions of the above three distribution functions, see Figure 2, are completely different.

Figure 1: The CDFs of WE(4.18, 0.56), LN(0.27, 1.60) and LL(0.16, 1.60).

Figure 2: The hazard functions of WE(4.18, 0.56), LN(0.27, 1.60) and LL(0.16, 1.60).
Moreover, the inferences based on a misspecified model can be quite different, for example for the hazard function or for the percentile points. This issue would be more crucial when the sample sizes are small or even moderate. Therefore, the discrimination problem between different distributions has received considerable attention in the last few decades.
Cox (1962) first addressed the problem of discriminating between the LN
and the exponential distributions based on the likelihood function and derived the
asymptotic distribution of the likelihood ratio test statistic. Since then extensive
work has been done in discriminating among different distributions. Some of the
recent work regarding discriminating between different lifetime distributions can be
found in Alshunnar et al.(2010), Pakyari (2012), Elsherpieny et al. (2013), Sultan
and Al-Moisheer (2013), Ahmad et al. (2016) and the references cited therein.
Although extensive work has been done on discriminating between two distributions, not much has been done for the case where more than two distributions are under consideration, except the work of Marshall et al. (2001) and Dey and Kundu (2009). Moreover, most of the work to date is based on the likelihood ratio test.
The aim of this paper is twofold. First, we derive the Fisher information matrices of these three distributions and obtain different Fisher information measures both for complete and censored samples. We also provide the loss of information due to truncation for the three distributions. It is observed that the Fisher information measure can be used for discrimination purposes. Our second aim is to compare three different methods, namely (i) the method based on the Fisher information measures, (ii) the method based on the likelihood ratio and (iii) the method based on the Kolmogorov-Smirnov distance, for discriminating among these three distributions. We perform extensive simulation experiments to compare the different methods for different sample sizes and for different parameter values. It is observed that the performance of each method depends on the true underlying distribution and the set of parameter values.
The rest of the paper is organized as follows. In Section 2, we derive the Fisher information measures for complete samples for all three cases and show how they can be used for discrimination purposes. In Section 3, we provide the discrimination procedure based on the likelihood ratio statistics and derive their asymptotic
properties. Monte Carlo simulation results are presented in Section 4. In Section
5, we provide the Fisher information measures for censored samples and the loss of
information due to truncation. The analysis of a data set is presented in Section 6,
and finally we conclude the paper in Section 7.
2 FI measures for complete samples
Let X > 0 be a continuous random variable with PDF and CDF f(x; θ) and F(x; θ), respectively, where θ = (θ1, θ2) is a vector parameter. Under the standard regularity conditions, see Lehmann (1991), the FI matrix for the parameter vector θ is
$$I(\theta) = E\left[\begin{pmatrix} \frac{\partial}{\partial\theta_1}\ln f(X;\theta) \\[4pt] \frac{\partial}{\partial\theta_2}\ln f(X;\theta) \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial\theta_1}\ln f(X;\theta) & \frac{\partial}{\partial\theta_2}\ln f(X;\theta) \end{pmatrix}\right].$$
In this section, we present the FI measures for the WE, LN and LL distributions
based on a complete data. The FI matrices of WE and LN (see, for example,
Alshunnar et al. (2010) and Ahmad et al. (2016)) can be described as follows:
$$I_W(\alpha, \lambda) = \begin{pmatrix} f_{11W} & f_{12W} \\ f_{21W} & f_{22W} \end{pmatrix} \quad \text{and} \quad I_N(\sigma, \beta) = \begin{pmatrix} f_{11N} & f_{12N} \\ f_{21N} & f_{22N} \end{pmatrix},$$
where
$$f_{11W} = \frac{1}{\alpha^2}\left[\psi'(1) + \psi^2(2)\right], \quad f_{12W} = f_{21W} = \frac{1}{\lambda}\left(1 + \psi(1)\right), \quad f_{22W} = \frac{\alpha^2}{\lambda^2},$$
and
$$f_{11N} = \frac{2}{\sigma^2}, \quad f_{12N} = f_{21N} = 0, \quad f_{22N} = \frac{1}{\beta^2 \sigma^2}.$$
Here ψ(x) = Γ′(x)/Γ(x) and ψ′(x) are the psi (or digamma) and trigamma functions, respectively, with Γ(·) being the complete gamma function. The FI matrix for the LL distribution based on complete data, in terms of the parameters γ and ξ, is presented in Theorem 1 below. The proof follows easily via differentiation techniques and straightforward algebra.
Theorem 1: The FI matrix for the LL distribution is
$$I_L(\gamma, \xi) = \begin{pmatrix} f_{11L} & f_{12L} \\ f_{21L} & f_{22L} \end{pmatrix},$$
where
$$f_{11L} = \frac{1}{3\gamma^2}\left(1 + \frac{\pi^2}{3}\right), \quad f_{22L} = \frac{1}{3\gamma^2 \xi^2}, \quad \text{and} \quad f_{12L} = f_{21L} = 0.$$
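Since all three FI matrices above are available in closed form, they are straightforward to evaluate numerically. A small sketch follows; function names are ours, and the digamma/trigamma values come from scipy.special.

```python
# Sketch evaluating the closed-form FI matrices of this section numerically.
import numpy as np
from scipy.special import digamma, polygamma

def fi_weibull(alpha, lam):
    f11 = (polygamma(1, 1) + digamma(2) ** 2) / alpha ** 2
    f12 = (1 + digamma(1)) / lam
    f22 = alpha ** 2 / lam ** 2
    return np.array([[f11, f12], [f12, f22]])

def fi_lognormal(sigma, beta):
    return np.array([[2 / sigma ** 2, 0.0],
                     [0.0, 1 / (beta ** 2 * sigma ** 2)]])

def fi_loglogistic(gamma, xi):
    return np.array([[(1 + np.pi ** 2 / 3) / (3 * gamma ** 2), 0.0],
                     [0.0, 1 / (3 * gamma ** 2 * xi ** 2)]])

# Trace of I_W for WE(1, 1): 1.8237 + 1 = 2.8237, matching Table 2 below.
print(np.trace(fi_weibull(1.0, 1.0)))
```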
As a scalar summary we use the trace of the FI matrix, the total information (TI). The trace of the FI matrix of the WE distribution is the sum of the information measure of α when λ is known and of λ when α is known. The traces of the FI matrices of the LN and LL distributions are defined similarly. Although the shape and scale parameters are essential to many distributional properties, they do not characterize the same prominent distributional features of the corresponding distributions. To compare distributional characteristics of the three distributions, we evaluate the asymptotic variances of the percentile estimators for
these distributions. In our case, the p-th (0 < p < 1) percentiles of the WE, LN and LL distributions are, respectively,
$$P_{WE}(\alpha, \lambda) = \frac{1}{\lambda}\left(-\ln(1-p)\right)^{1/\alpha}, \quad P_{LN}(\beta, \sigma) = \beta\, e^{\sigma \Phi^{-1}(p)}, \quad P_{LL}(\gamma, \xi) = \xi\left(\frac{p}{1-p}\right)^{\gamma}.$$
Therefore, Var_WE(p), Var_LN(p) and Var_LL(p), the asymptotic variances of the logarithm of the p-th percentile estimators of the WE, LN and LL distributions, respectively, can be written as quadratic forms in the inverses of the corresponding FI matrices; for the LL distribution, for example,
$$Var_{LL}(p) = \begin{pmatrix} \frac{\partial \ln P_{LL}}{\partial \gamma} & \frac{\partial \ln P_{LL}}{\partial \xi} \end{pmatrix} \begin{pmatrix} f_{11L} & f_{12L} \\ f_{21L} & f_{22L} \end{pmatrix}^{-1} \begin{pmatrix} \frac{\partial \ln P_{LL}}{\partial \gamma} \\[4pt] \frac{\partial \ln P_{LL}}{\partial \xi} \end{pmatrix}, \tag{3}$$
with Var_WE(p) and Var_LN(p) defined analogously.
The asymptotic variances of the median or the 95-th percentile estimators of the three distributions can be used for comparison purposes. Using the average asymptotic variance with respect to a probability measure W(·) proposed by Gupta and Kundu (2006), we compare the following measures:
$$AV_{WE} = \int_0^1 Var_{WE}(p)\, dW(p), \quad AV_{LN} = \int_0^1 Var_{LN}(p)\, dW(p), \quad AV_{LL} = \int_0^1 Var_{LL}(p)\, dW(p),$$
where W(·) is a weight function such that $\int_0^1 W(p)\, dp = 1$. For convenience, one may consider the average asymptotic variances of all percentile estimators, that is, W(p) = 1, 0 < p < 1.
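As an illustration of how (3) and the averaged measure can be evaluated, the following sketch (names are ours) computes Var_LL(p) for the log-percentile and approximates AV_LL with W(p) = 1 on a grid:

```python
# Sketch: Var_LL(p) of eq. (3) and its crude average with W(p) = 1.
import numpy as np

def var_ll_log_percentile(p, gamma, xi):
    # FI matrix of Theorem 1; gradient of ln P_LL = ln xi + gamma*ln(p/(1-p))
    I = np.array([[(1 + np.pi ** 2 / 3) / (3 * gamma ** 2), 0.0],
                  [0.0, 1.0 / (3 * gamma ** 2 * xi ** 2)]])
    g = np.array([np.log(p / (1 - p)), 1.0 / xi])
    return g @ np.linalg.inv(I) @ g

# Grid approximation of AV_LL for LL(1, 1):
ps = np.linspace(0.005, 0.995, 199)
print(np.mean([var_ll_log_percentile(p, 1.0, 1.0) for p in ps]))
```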
To conduct a comparative study of the total information measure between
any two distributions, we have to compute these measures at their closest val-
ues. One way to define the closeness (distance) between two distributions is to
use the Kullback-Leibler (KL) distance (see, for example, White (1982, Theorem
1), Kundu and Manglick (2004)). For notational convenience, let θ̃1 and θ̃2 be the misclassified parameters of the F1 distribution for given δ1 and δ2 of the F2 distribution, so that F1(θ̃1, θ̃2) is closest to F2(δ1, δ2) in terms of the Kullback-Leibler distance. Lemma 1 below provides the parameter values at which any two distributions among WE, LN and LL are closest to each other. Further, the maximum likelihood estimators (MLEs) of θ1 and θ2 are denoted by θ̂1 and θ̂2.
Lemma 1 [Kundu and Manglick (2004)]:
(i) If the underlying distribution is WE(α, λ), then the closest LN distribution in terms of the KL distance is LN(σ̃, β̃), where
$$\tilde\sigma = \frac{\sqrt{\psi'(1)}}{\alpha} \quad \text{and} \quad \tilde\beta = \frac{1}{\lambda}\, e^{\psi(1)/\alpha}. \tag{4}$$
(ii) If LN(σ, β) is the valid distribution, then the closest WE distribution in terms of the KL distance is WE(α̃, λ̃), where
$$\tilde\alpha = \frac{1}{\sigma} \quad \text{and} \quad \tilde\lambda = \frac{1}{\beta}\, e^{-\sigma/2}. \tag{5}$$
Lemma 2 [Dey and Kundu (2009)]: Under the assumption that the data are from LL(γ, ξ), we have, as n → ∞, σ̂ → σ̃ a.s. and β̂ → β̃ a.s., where
$$E_{LL}\left[\ln f_{LN}(X; \tilde\sigma, \tilde\beta)\right] = \max_{\sigma, \beta} E_{LL}\left[\ln f_{LN}(X; \sigma, \beta)\right].$$
In fact, Dey and Kundu (2009) have shown that for σ = β = 1, E_LN(ln f_LL(X; γ, ξ)) is maximized when γ̃ = 0.5718 and ξ̃ = 1, while when γ = ξ = 1, the maximization of E_LL(ln f_LN(X; σ, β)) occurs at σ̃ = π/√3 and β̃ = 1. In general, if the data are coming from LN(σ, β), then γ̃ and ξ̃ can be obtained by maximizing
$$E_{LN}\left(\ln f_{LL}(X; \gamma, \xi)\right) = E_{LN}\left[-\ln\gamma + \frac{\ln X - \ln\xi}{\gamma} - 2\ln\left(1 + e^{(\ln X - \ln\xi)/\gamma}\right)\right]$$
$$= -\ln\gamma + \frac{\ln\beta - \ln\xi}{\gamma} - 2\, E_N\left[\ln\left(1 + \left(\frac{\beta}{\xi}\right)^{1/\gamma} e^{\sigma X/\gamma}\right)\right],$$
where E_N(·) stands for the expectation under the standard normal distribution. These quantities do not have explicit forms and need to be obtained numerically.
To obtain σ̃ and β̃ such that LL(γ, ξ) is closest to LN(σ̃, β̃), we have to maximize
$$E_{LL}\left(\ln f_{LN}(X; \sigma, \beta)\right) = -\frac{\ln(2\pi)}{2} - \ln\sigma - E_{LL}(\ln X) - \frac{1}{2\sigma^2}\, E_{LL}\left(\ln X - \ln\beta\right)^2.$$
It is easily seen that E_LL(ln X) = ln ξ. By applying the Taylor series −ln(1 − z) = Σ_{j=1}^∞ z^j/j, it readily follows that
$$\int_0^1 \ln z\, \ln(1-z)\, dz = \sum_{j=1}^{\infty} \frac{1}{j(j+1)^2} = 2 - \frac{\pi^2}{6}.$$
Consequently,
$$E_{LL}\left(\ln f_{LN}(X; \sigma, \beta)\right) = -\frac{\ln(2\pi)}{2} - \ln\sigma - \ln\xi - \frac{1}{2\sigma^2}\left[(\ln\xi - \ln\beta)^2 + \frac{\pi^2\gamma^2}{3}\right].$$
It can be easily verified that β̃ = ξ and σ̃ = πγ/√3 maximize E_LL(ln f_LN(X; σ, β)).
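The closed-form closest-parameter maps of Lemma 1 and of the derivation above are easy to code directly; a sketch with illustrative helper names:

```python
# Sketch of the KL-closest parameter maps (Lemmas 1-2 and the result above).
import numpy as np
from scipy.special import digamma, polygamma

def closest_ln_to_we(alpha, lam):
    # Lemma 1(i): sigma~ = sqrt(psi'(1))/alpha, beta~ = exp(psi(1)/alpha)/lambda
    return np.sqrt(polygamma(1, 1)) / alpha, np.exp(digamma(1) / alpha) / lam

def closest_we_to_ln(sigma, beta):
    # Lemma 1(ii): alpha~ = 1/sigma, lambda~ = exp(-sigma/2)/beta
    return 1.0 / sigma, np.exp(-sigma / 2) / beta

def closest_ln_to_ll(gamma, xi):
    # Result above: sigma~ = pi*gamma/sqrt(3), beta~ = xi
    return np.pi * gamma / np.sqrt(3), xi

print(closest_ln_to_we(1.0, 1.0))   # ~ (1.2826, 0.5615), cf. Table 2
print(closest_ln_to_ll(1.0, 1.0))   # ~ (1.8138, 1.0),    cf. Table 4
```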
Similarly to Lemmas 1 and 2, we have the following lemma relating the WE and LL distributions.
Lemma 3:
(i) Suppose the data come from WE(α, λ); then as n → ∞, γ̂ → γ̃ a.s. and ξ̂ → ξ̃ a.s., where
$$E_{WE}\left[\ln f_{LL}(X; \tilde\gamma, \tilde\xi)\right] = \max_{\gamma, \xi} E_{WE}\left[\ln f_{LL}(X; \gamma, \xi)\right].$$
(ii) If the data come from LL(γ, ξ), then as n → ∞, α̂ → α̃ a.s. and λ̂ → λ̃ a.s., where
$$E_{LL}\left[\ln f_{WE}(X; \tilde\alpha, \tilde\lambda)\right] = \max_{\alpha, \lambda} E_{LL}\left[\ln f_{WE}(X; \alpha, \lambda)\right].$$
Proof: The proof follows along the lines of Dey and Kundu (2009).
(i) To find γ̃ and ξ̃, let us define
$$g(\gamma, \xi) = E_{WE}\left[\ln f_{LL}(X; \gamma, \xi)\right] = -\ln\gamma - E_{WE}(\ln X) + \frac{E_{WE}(\ln X) - \ln\xi}{\gamma} - 2\, E_{WE}\left[\ln\left(1 + \left(\frac{X}{\xi}\right)^{1/\gamma}\right)\right]. \tag{6}$$
The second term in (6) is evaluated to be E_WE(ln X) = −ln λ + ψ(1)/α, where C = −ψ(1) = 0.57721... is Euler's constant, while the last term can be rewritten as
$$E_{WE}\left[\ln\left(1 + \left(\frac{X}{\xi}\right)^{1/\gamma}\right)\right] = \int_0^{\infty} \ln\left(1 + \left(\frac{y^{1/\alpha}}{\lambda\xi}\right)^{1/\gamma}\right) e^{-y}\, dy.$$
Consequently, Eq. (6) can take the following simplified form:
$$g(\gamma, \xi) = -\ln\gamma + \left(\frac{1}{\gamma} - 1\right)\left(-\ln\lambda + \frac{\psi(1)}{\alpha}\right) - \frac{\ln\xi}{\gamma} - 2\int_0^{\infty} \ln\left(1 + \left(\frac{y^{1/\alpha}}{\lambda\xi}\right)^{1/\gamma}\right) e^{-y}\, dy. \tag{7}$$
Differentiating (7) with respect to ξ and equating the result to 0 gives
$$\int_0^{\infty} \frac{\left(\frac{y^{1/\alpha}}{\lambda\xi}\right)^{1/\gamma}}{1 + \left(\frac{y^{1/\alpha}}{\lambda\xi}\right)^{1/\gamma}}\, e^{-y}\, dy = \frac{1}{2}. \tag{8}$$
Upon differentiating (7) with respect to γ, using (8) and equating the resulting expression to 0, we compute γ̃ as a numerical solution of
$$\frac{2}{\alpha}\int_0^{\infty} \frac{\ln y \left(\frac{y^{1/\alpha}}{\lambda\xi}\right)^{1/\gamma}}{1 + \left(\frac{y^{1/\alpha}}{\lambda\xi}\right)^{1/\gamma}}\, e^{-y}\, dy - \gamma - \frac{\psi(1)}{\alpha} = 0. \tag{9}$$
(ii) For given γ and ξ, we maximize E_LL[ln f_WE(X; α, λ)] with respect to α and λ. Let
$$h(\alpha, \lambda) = E_{LL}\left[\ln f_{WE}(X; \alpha, \lambda)\right] = \ln\alpha + \alpha\ln\lambda + (\alpha - 1)\, E_{LL}(\ln X) - \lambda^{\alpha}\, E_{LL}(X^{\alpha}).$$
By using the facts E_LL(ln X) = ln ξ and E_LL(X^α) = ξ^α B(1 − αγ, 1 + αγ), αγ < 1, we have
$$h(\alpha, \lambda) = \ln\alpha + \alpha\ln\lambda + (\alpha - 1)\ln\xi - (\lambda\xi)^{\alpha}\, B(1 - \alpha\gamma, 1 + \alpha\gamma).$$
By arguments similar to those in (i), we find α̃ and λ̃ for which LL(γ, ξ) is closest to WE(α̃, λ̃) by solving the following normal equations:
$$\tilde\lambda = \frac{1}{\xi\left[B(1 - \alpha\gamma, 1 + \alpha\gamma)\right]^{1/\alpha}}, \quad \text{and} \quad \psi(1 + \alpha\gamma) - \psi(1 - \alpha\gamma) = \frac{1}{\gamma\alpha}. \tag{10}$$
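Equation (10) has no closed-form solution in α, but it is a one-dimensional root-finding problem. A sketch using scipy (function names are ours; the bracketing interval enforces αγ < 1, as required for the beta function):

```python
# Sketch solving the normal equations (10) numerically.
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, beta as beta_fn

def closest_we_to_ll(gamma, xi):
    g = lambda a: digamma(1 + a * gamma) - digamma(1 - a * gamma) - 1 / (gamma * a)
    alpha = brentq(g, 1e-6, (1 - 1e-9) / gamma)  # root of the psi equation
    lam = 1.0 / (xi * beta_fn(1 - alpha * gamma, 1 + alpha * gamma) ** (1 / alpha))
    return alpha, lam

print(closest_we_to_ll(1.0, 1.0))  # ~ (0.5, 0.4053), cf. Table 4
```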
Table 2: The TI and TV of the FI matrices of LN(σ̃, β̃), LL(γ̃, ξ̃) and WE(α, 1) for different α. Here TI denotes the trace of the FI matrix and TV the total asymptotic variance (the trace of the inverse FI matrix).

α      σ̃        β̃        γ̃        ξ̃        TI(WE)   TI(LN)   TI(LL)    TV(WE)   TV(LN)   TV(LL)
0.8    1.6032    0.4860    0.8699    0.5609    3.4895   2.4254   3.2898    2.1214   1.8922   1.2434
1.0    1.2826    0.5615    0.6959    0.6297    2.8237   3.1438   4.6886    1.7166   1.3412   0.9147
1.2    1.0688    0.6182    0.5799    0.6802    2.7064   4.0414   6.3946    1.6453   1.0077   0.7019
1.4    0.9161    0.6621    0.4971    0.7187    2.8905   5.1012   8.3983    1.7572   0.7875   0.5557
1.6    0.8016    0.6971    0.4349    0.7490    3.2724   6.3151   10.7019   1.9894   0.6335   0.4506
1.8    0.7125    0.7257    0.3866    0.7734    3.8029   7.6799   13.2961   2.3119   0.5212   0.3727
2.0    0.6413    0.7493    0.3480    0.7935    4.4559   9.1938   16.1791   2.7089   0.4365   0.3134
Table 3: The TI and TV of the FI matrices of WE(α̃, λ̃), LL(γ̃, ξ̃) and LN(σ, 1) for different values of σ².

σ²     α̃        λ̃        γ̃        ξ̃        TI(LN)   TI(WE)   TI(LL)   TV(LN)   TV(WE)   TV(LL)
0.8    1.1180    0.6394    0.5115    1.0000    3.7500   4.5163   6.7396   1.2000   1.1225   0.9679
1.0    1.0000    0.6065    0.5718    1.0000    3.0000   4.5422   5.3931   1.5000   1.0157   1.2095
1.2    0.9129    0.5783    0.6264    1.0000    2.5000   4.6802   4.4939   1.8000   0.9515   1.4515
1.4    0.8452    0.5534    0.6766    1.0000    2.1429   4.8855   3.8518   2.1000   0.9096   1.6935
1.6    0.7906    0.5313    0.7233    1.0000    1.8750   5.1320   3.3704   2.4000   0.8807   1.9354
1.8    0.7454    0.5113    0.7672    1.0000    1.6667   5.4076   2.9958   2.7000   0.8594   2.1774
2.0    0.7071    0.4931    0.8087    1.0000    1.5000   5.7038   2.6962   3.0000   0.8431   2.4193
Table 4: The TI and TV of the FI matrices of WE(α̃, λ̃), LN(σ̃, β̃) and LL(γ, 1) for different γ.

γ      α̃        λ̃        σ̃        β̃        TI(LL)    TI(WE)    TI(LN)    TV(LL)    TV(WE)   TV(LN)
0.3    1.6667    0.7627    0.5441    1.0000    19.5921   5.4319    10.1336   0.3329    1.9209   0.4441
0.5    1.0000    0.6366    0.9069    1.0000    7.0532    4.2912    3.6476    0.9248    1.0572   1.2337
0.8    0.6250    0.4855    1.4510    1.0000    2.7551    6.3259    1.4249    2.3676    0.9065   3.1580
1.0    0.5000    0.4053    1.8138    1.0000    1.7633    8.8166    0.9119    3.6993    0.8805   4.9348
1.2    0.4167    0.3383    2.1766    1.0000    1.2245    12.0199   0.6332    5.3270    0.8363   7.1064
1.4    0.3571    0.2824    2.5393    1.0000    0.8996    15.9001   0.4653    7.2507    0.7709   9.6721
1.6    0.3125    0.2357    2.9021    1.0000    0.6888    20.4323   0.3562    9.4703    0.6901   12.6333
1.8    0.2778    0.1968    3.2648    1.0000    0.5442    25.6237   0.2815    11.9858   0.6033   15.9884
2.0    0.2500    0.1643    3.6276    1.0000    0.4408    31.4942   0.2280    14.7973   0.5168   19.7392
Based on the FI measures, we choose WE if FD1 > 0 and FD2 > 0, LN if FD2 < 0 and FD3 > 0, and LL if FD2 < 0 and FD3 < 0, as the preferred distribution. The respective PCSs are defined as follows:
$$PCS^{FI}_{WE} = P(FD_1 > 0, FD_2 > 0 \mid \text{data follow WE}),$$
$$PCS^{FI}_{LN} = P(FD_2 < 0, FD_3 > 0 \mid \text{data follow LN}),$$
$$PCS^{FI}_{LL} = P(FD_2 < 0, FD_3 < 0 \mid \text{data follow LL}).$$

3 Discrimination based on the ratio of maximized likelihoods

The PCSs based on the likelihood ratio statistics Q1, Q2 and Q3 are defined analogously:
$$PCS^{LR}_{WE} = P(Q_1 > 0, Q_2 > 0 \mid \text{data follow WE}),$$
$$PCS^{LR}_{LN} = P(Q_1 < 0, Q_3 > 0 \mid \text{data follow LN}),$$
$$PCS^{LR}_{LL} = P(Q_2 < 0, Q_3 < 0 \mid \text{data follow LL}).$$
Now we have the following results.
Theorem 2:
(i) Under the assumption that the data are from WE(α, λ), (Q1, Q2) is asymptotically bivariate normally distributed with mean vector (E_WE(Q1), E_WE(Q2)) and dispersion matrix
$$\Sigma_{WE} = \begin{pmatrix} Var_{WE}(Q_1) & Cov_{WE}(Q_1, Q_2) \\ Cov_{WE}(Q_1, Q_2) & Var_{WE}(Q_2) \end{pmatrix}.$$
(ii) Under the assumption that the data are from LN(σ, β), (Q1, Q3) is asymptotically bivariate normally distributed with mean vector (E_LN(Q1), E_LN(Q3)) and dispersion matrix
$$\Sigma_{LN} = \begin{pmatrix} Var_{LN}(Q_1) & Cov_{LN}(Q_1, Q_3) \\ Cov_{LN}(Q_1, Q_3) & Var_{LN}(Q_3) \end{pmatrix}.$$
(iii) Under the assumption that the data are from LL(γ, ξ), (Q2, Q3) is asymptotically bivariate normally distributed with mean vector (E_LL(Q2), E_LL(Q3)) and dispersion matrix
$$\Sigma_{LL} = \begin{pmatrix} Var_{LL}(Q_2) & Cov_{LL}(Q_2, Q_3) \\ Cov_{LL}(Q_2, Q_3) & Var_{LL}(Q_3) \end{pmatrix}.$$
Proof: The proof can be obtained along the same lines as Theorem 1 of Dey and Kundu (2009); the details are omitted.
4 Numerical Comparisons

In this section we perform some simulation experiments to compare the different methods for discriminating among these three distributions. We use the method based on the Fisher information (FI), the ratio of maximized likelihood (RML) method and the method based on the Kolmogorov-Smirnov (KS) distance. The method based on the KS distance (KSD) can be described as follows. Based on the observed sample, we compute the KS distances between (i) the empirical distribution function (EDF) and WE(α̂, λ̂), (ii) the EDF and LN(σ̂, β̂), and (iii) the EDF and LL(γ̂, ξ̂). Whichever gives the minimum distance, we choose that particular distribution as the best fitted one. The PCS can be defined in the same manner. It can be proved theoretically that the PCS based on the RML does not depend on the parameter values. In all cases we have taken the scale parameters to be 1.
To compare the performances of the different methods for different sample sizes and parameter values, we generated samples from the three distributions and computed the PCS for each method based on 1000 replications. The results are reported in Table 5. Some points are quite clear from these experimental results. As the sample size increases, the PCS increases in each case, indicating the consistency of each method. In the case of the FI, the PCS decreases as the shape parameter increases in all three cases.
No such pattern, however, exists in the case of the KSD. It is further observed that the performance of the RML is very poor when the data are obtained from the LL distribution. It seems, although we could not prove it theoretically, that the PCS remains constant in the case of the KSD when the data are obtained from the LL distribution. If the shape parameter is less than 1, then the method based on the FI outperforms the RML method, although nothing can be said when the shape parameter is greater than 1.
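For concreteness, a Monte Carlo sketch of the KSD rule is given below (an analogous loop comparing maximized log-likelihoods would give the RML rule). The helper names and the seed are illustrative, so the estimated PCS will only roughly track the corresponding Table 5 entries.

```python
# Monte Carlo sketch of the KSD selection rule of this section.
import numpy as np
from scipy import stats

def ks_select(x):
    """Fit WE, LN, LL by ML (loc fixed at 0) and pick the smallest KS distance."""
    ds = []
    for dist in (stats.weibull_min, stats.lognorm, stats.fisk):
        params = dist.fit(x, floc=0)
        ds.append(stats.kstest(x, dist(*params).cdf).statistic)
    return int(np.argmin(ds))  # 0 = WE, 1 = LN, 2 = LL

# Estimated PCS when the data are WE(alpha = 1.5, 1) with n = 20 (cf. Table 5):
rng = np.random.default_rng(1)
hits = sum(ks_select(stats.weibull_min.rvs(1.5, size=20, random_state=rng)) == 0
           for _ in range(1000))
print(hits / 1000)
```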
5 FI measures for censored samples

Suppose the data are observed only on a fixed interval [T1, T2], so that observations below T1 are left censored and those above T2 are right censored. The FI matrix can then be decomposed as
$$I(\theta; T_1, T_2) = I_M(\theta; T_1, T_2) + I_R(\theta; T_2) + I_L(\theta; T_1),$$
where
$$I_M(\theta; T_1, T_2) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$
$$I_R(\theta; T_2) = \frac{1}{1 - F(T_2; \theta)}\begin{pmatrix} \frac{\partial}{\partial\theta_1} F(T_2; \theta) \\[4pt] \frac{\partial}{\partial\theta_2} F(T_2; \theta) \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial\theta_1} F(T_2; \theta) & \frac{\partial}{\partial\theta_2} F(T_2; \theta) \end{pmatrix},$$
$$I_L(\theta; T_1) = \frac{1}{F(T_1; \theta)}\begin{pmatrix} \frac{\partial}{\partial\theta_1} F(T_1; \theta) \\[4pt] \frac{\partial}{\partial\theta_2} F(T_1; \theta) \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial\theta_1} F(T_1; \theta) & \frac{\partial}{\partial\theta_2} F(T_1; \theta) \end{pmatrix},$$
and
$$a_{ij} = \int_{T_1}^{T_2} \frac{\partial \ln f(x; \theta)}{\partial\theta_i}\, \frac{\partial \ln f(x; \theta)}{\partial\theta_j}\, f(x; \theta)\, dx,$$
Table 5: The PCSs under the WE, LN and LL distributions for different sample sizes and different parameter values.
W E(α, 1)
n = 20 n = 30 n = 50
α↓ RML FI K-S RML FI K-S RML FI K-S
0.3 0.750 0.905 0.400 0.825 0.945 0.510 0.890 0.970 0.650
0.6 0.760 0.890 0.420 0.795 0.905 0.485 0.895 0.980 0.655
0.8 0.770 0.850 0.448 0.815 0.870 0.480 0.890 0.975 0.660
1.0 0.764 0.695 0.440 0.800 0.726 0.485 0.902 0.782 0.630
1.2 0.770 0.620 0.460 0.820 0.680 0.560 0.915 0.740 0.645
1.5 0.755 0.478 0.490 0.795 0.496 0.615 0.910 0.556 0.780
LN (σ, 1)
n = 20 n = 30 n = 50
σ2 ↓ RML FI K-S RML FI K-S RML FI K-S
0.3 0.775 0.840 0.775 0.800 0.905 0.825 0.905 0.945 0.900
0.5 0.780 0.800 0.730 0.810 0.935 0.885 0.900 0.950 0.920
0.8 0.770 0.780 0.715 0.800 0.845 0.875 0.905 0.930 0.885
1.0 0.775 0.735 0.800 0.820 0.855 0.860 0.900 0.875 0.905
1.2 0.785 0.685 0.585 0.830 0.745 0.590 0.910 0.810 0.775
1.5 0.795 0.565 0.354 0.825 0.590 0.385 0.890 0.652 0.740
LL(γ, 1)
n = 20 n = 30 n = 50
γ↓ RML FI K-S RML FI K-S RML FI K-S
0.2 0.330 1.000 0.615 0.395 1.000 0.650 0.520 1.000 0.705
0.4 0.350 0.980 0.625 0.390 0.990 0.665 0.525 1.000 0.685
0.6 0.328 0.566 0.626 0.376 0.570 0.628 0.535 0.620 0.745
0.8 0.340 0.525 0.620 0.425 0.550 0.670 0.530 0.610 0.735
1.0 0.350 0.480 0.645 0.395 0.520 0.650 0.530 0.550 0.725
1.5 0.330 0.420 0.680 0.410 0.480 0.685 0.520 0.530 0.745
for i, j = 1, 2. Therefore, the FI for a complete sample, or for a fixed right censored (at time T2) sample, or for a fixed left censored (at time T1) sample, with parameter vector θ, can be obtained accordingly. Writing p1 = F(T1; θ) and p2 = F(T2; θ), we have:
(i)
$$I_{MW}(\alpha, \lambda; T_1, T_2) = \begin{pmatrix} a_{11W} & a_{12W} \\ a_{21W} & a_{22W} \end{pmatrix}, \quad I_{RW}(\alpha, \lambda; T_2) = \begin{pmatrix} b_{11W} & b_{12W} \\ b_{21W} & b_{22W} \end{pmatrix}, \quad I_{LW}(\alpha, \lambda; T_1) = \begin{pmatrix} c_{11W} & c_{12W} \\ c_{21W} & c_{22W} \end{pmatrix},$$
where
$$a_{11W} = \frac{1}{\alpha^2}\int_{-\ln(1-p_1)}^{-\ln(1-p_2)} (1 + \ln u - u\ln u)^2\, e^{-u}\, du, \quad a_{22W} = \frac{\alpha^2}{\lambda^2}\int_{-\ln(1-p_1)}^{-\ln(1-p_2)} (1-u)^2\, e^{-u}\, du,$$
$$a_{12W} = a_{21W} = \frac{1}{\lambda}\int_{-\ln(1-p_1)}^{-\ln(1-p_2)} (1 + \ln u - u\ln u)(1-u)\, e^{-u}\, du,$$
$$b_{11W} = \frac{(1-p_2)\ln^2(1-p_2)\ln^2[-\ln(1-p_2)]}{\alpha^2}, \quad b_{22W} = \frac{\alpha^2 (1-p_2)\ln^2(1-p_2)}{\lambda^2},$$
$$b_{12W} = b_{21W} = \frac{(1-p_2)\ln^2(1-p_2)\ln[-\ln(1-p_2)]}{\lambda},$$
$$c_{11W} = \frac{(1-p_1)^2\ln^2(1-p_1)\ln^2[-\ln(1-p_1)]}{\alpha^2 p_1}, \quad c_{22W} = \frac{\alpha^2 (1-p_1)^2\ln^2(1-p_1)}{\lambda^2 p_1},$$
$$c_{12W} = c_{21W} = \frac{(1-p_1)^2\ln^2(1-p_1)\ln[-\ln(1-p_1)]}{\lambda p_1}.$$
(ii)
$$I_{MN}(\sigma, \beta; T_1, T_2) = \begin{pmatrix} a_{11N} & a_{12N} \\ a_{21N} & a_{22N} \end{pmatrix}, \quad I_{RN}(\sigma, \beta; T_2) = \begin{pmatrix} b_{11N} & b_{12N} \\ b_{21N} & b_{22N} \end{pmatrix}, \quad I_{LN}(\sigma, \beta; T_1) = \begin{pmatrix} c_{11N} & c_{12N} \\ c_{21N} & c_{22N} \end{pmatrix},$$
where
$$a_{11N} = \frac{1}{\sigma^2}\int_{\Phi^{-1}(p_1)}^{\Phi^{-1}(p_2)} (1 - y^2)^2\, \phi(y)\, dy, \quad a_{22N} = \frac{1}{\beta^2\sigma^2}\int_{\Phi^{-1}(p_1)}^{\Phi^{-1}(p_2)} y^2\, \phi(y)\, dy,$$
$$a_{12N} = a_{21N} = \frac{1}{\beta\sigma^2}\int_{\Phi^{-1}(p_1)}^{\Phi^{-1}(p_2)} (y^2 - 1)\, y\, \phi(y)\, dy,$$
$$b_{11N} = \frac{\left[\phi\left(\Phi^{-1}(p_2)\right)\right]^2 \left[\Phi^{-1}(p_2)\right]^2}{(1-p_2)\,\sigma^2}, \quad b_{22N} = \frac{\left[\phi\left(\Phi^{-1}(p_2)\right)\right]^2}{\beta^2 (1-p_2)\,\sigma^2},$$
$$b_{12N} = b_{21N} = \frac{\left[\phi\left(\Phi^{-1}(p_2)\right)\right]^2 \Phi^{-1}(p_2)}{\beta (1-p_2)\,\sigma^2},$$
$$c_{11N} = \frac{\left[\phi\left(\Phi^{-1}(p_1)\right)\right]^2 \left[\Phi^{-1}(p_1)\right]^2}{p_1\,\sigma^2}, \quad c_{22N} = \frac{\left[\phi\left(\Phi^{-1}(p_1)\right)\right]^2}{\beta^2 p_1\,\sigma^2},$$
$$c_{12N} = c_{21N} = \frac{\left[\phi\left(\Phi^{-1}(p_1)\right)\right]^2 \Phi^{-1}(p_1)}{\beta p_1\,\sigma^2}.$$
(iii)
$$I_{ML}(\gamma, \xi; T_1, T_2) = \begin{pmatrix} a_{11L} & a_{12L} \\ a_{21L} & a_{22L} \end{pmatrix}, \quad I_{RL}(\gamma, \xi; T_2) = \begin{pmatrix} b_{11L} & b_{12L} \\ b_{21L} & b_{22L} \end{pmatrix}, \quad I_{LL}(\gamma, \xi; T_1) = \begin{pmatrix} c_{11L} & c_{12L} \\ c_{21L} & c_{22L} \end{pmatrix},$$
where
$$a_{11L} = \frac{1}{\gamma^2}\int_{p_1/(1-p_1)}^{p_2/(1-p_2)} \left[1 + \ln u - \frac{2u\ln u}{1+u}\right]^2 \frac{du}{(1+u)^2}, \quad a_{22L} = \frac{1}{\gamma^2\xi^2}\int_{p_1/(1-p_1)}^{p_2/(1-p_2)} \frac{(1-u)^2}{(1+u)^4}\, du,$$
$$a_{12L} = a_{21L} = \frac{1}{\gamma^2\xi}\int_{p_1/(1-p_1)}^{p_2/(1-p_2)} \left[1 + \ln u - \frac{2u\ln u}{1+u}\right] \frac{1-u}{(1+u)^3}\, du,$$
$$b_{11L} = \frac{(1-p_2)\, p_2^2 \ln^2\!\frac{p_2}{1-p_2}}{\gamma^2}, \quad b_{22L} = \frac{p_2^2 (1-p_2)}{\gamma^2\xi^2}, \quad b_{12L} = b_{21L} = \frac{p_2^2 (1-p_2) \ln\frac{p_2}{1-p_2}}{\gamma^2\xi},$$
$$c_{11L} = \frac{p_1 (1-p_1)^2 \ln^2\!\frac{p_1}{1-p_1}}{\gamma^2}, \quad c_{22L} = \frac{p_1 (1-p_1)^2}{\gamma^2\xi^2}, \quad c_{12L} = c_{21L} = \frac{p_1 (1-p_1)^2 \ln\frac{p_1}{1-p_1}}{\gamma^2\xi}.$$
Table 6: Loss of FI of the shape and scale parameters for Weibull distribution
Scheme→
Parameter ↓ Scheme 1 Scheme 2 Scheme 3
Shape 46% 78% 38%
Scale 25% 78% 9%
It can be observed from the structure of the Fisher information matrix that if p1L = 1 − p2R, where p1L = F_LN(T1) and p2R = F_LN(T2) correspond to the left and right censoring cases, respectively, then left and right censored data for the LN distribution carry the same FI about both parameters. The same is true for the LL distribution. It seems this is due to the fact that both the normal (ln LN) and logistic (ln LL) distributions are symmetric, although we could not prove it theoretically. In the case of the WE distribution, the FI takes higher values towards the right tail than the left tail.
Now one important question is which of the two parameters has more impact.
For this, we would like to discuss the loss of information due to truncation in one
parameter, when the other parameter is known. Suppose the WE distribution is the
underlying distribution with fixed truncation points. If the scale (shape) parameter
is known, the loss of information of the shape (scale) parameter for WE distribution
is
$$Loss_{WE}(\alpha) = 1 - \frac{a_{11W} + b_{11W} + c_{11W}}{f_{11W}}$$
$$= 1 - \frac{1}{\psi'(1) + \psi^2(2)}\left[\int_{-\ln(1-p_1)}^{-\ln(1-p_2)} (1 + \ln u - u\ln u)^2 e^{-u}\, du + (1-p_2)\ln^2(1-p_2)\ln^2\left(-\ln(1-p_2)\right) + \frac{(1-p_1)^2\ln^2(1-p_1)\ln^2\left(-\ln(1-p_1)\right)}{p_1}\right],$$
and
$$Loss_{WE}(\lambda) = 1 - \frac{a_{22W} + b_{22W} + c_{22W}}{f_{22W}} = 1 - \left[\int_{-\ln(1-p_1)}^{-\ln(1-p_2)} (1-u)^2 e^{-u}\, du + (1-p_2)\ln^2(1-p_2) + \frac{(1-p_1)^2\ln^2(1-p_1)}{p_1}\right].$$
Clearly both losses are free of any parameters and depend only on the truncation proportions p1 and p2. The losses of FI of the shape and scale parameters due to Schemes 1, 2 and 3 are presented in Table 6. It is easily seen that the loss of information due to interval censoring is similar for the two parameters, while the losses due to right and left censoring are different. It is also of interest that the last portion of the data contains more information about the shape and scale parameters for the WE distribution.
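Since Loss_WE(α) is parameter free, it can be evaluated by a single quadrature for any censoring proportions (p1, p2) with p1 > 0. A sketch under that assumption (the exact Scheme 1-3 proportions are not restated here, so the values used below are illustrative):

```python
# Parameter-free evaluation of Loss_WE(alpha) above (helper names are ours).
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma, polygamma

def loss_we_shape(p1, p2):
    a11, _ = quad(lambda u: (1 + np.log(u) - u * np.log(u)) ** 2 * np.exp(-u),
                  -np.log(1 - p1), -np.log(1 - p2))
    b11 = (1 - p2) * np.log(1 - p2) ** 2 * np.log(-np.log(1 - p2)) ** 2
    c11 = ((1 - p1) ** 2 * np.log(1 - p1) ** 2
           * np.log(-np.log(1 - p1)) ** 2 / p1)  # requires p1 > 0
    return 1 - (a11 + b11 + c11) / (polygamma(1, 1) + digamma(2) ** 2)

print(loss_we_shape(0.15, 0.90))  # the [T1, T2] scheme used in Section 6
```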
Now for the LN distribution, if the scale (shape) parameter is known, the loss of information of the shape (scale) parameter is
$$Loss_{LN}(\sigma) = 1 - \frac{a_{11N} + b_{11N} + c_{11N}}{f_{11N}} = 1 - \frac{1}{2}\left[\int_{\Phi^{-1}(p_1)}^{\Phi^{-1}(p_2)} (1 - u^2)^2 \phi(u)\, du + \frac{\left[\phi\left(\Phi^{-1}(p_2)\right)\Phi^{-1}(p_2)\right]^2}{1-p_2} + \frac{\left[\phi\left(\Phi^{-1}(p_1)\right)\Phi^{-1}(p_1)\right]^2}{p_1}\right]$$
Table 7: Loss of FI of the shape and scale parameters for the log-normal distribution
Scheme→
Parameter ↓ Scheme 1 Scheme 2 Scheme 3
Shape 31% 80% 51%
Scale 6% 72% 13%
and
"Z #
Φ−1 (p2 ) 2 2
[φ(Φ−1
(p 2 ))] [φ(Φ−1
(p 1 ))]
LossLN (β) = 1 − u2 φ(u) du + + .
Φ−1 (p1 ) 1 − p2 p1
For the LN distribution, the losses of FI of the shape and scale parameters under the different schemes are presented in Table 7. It is clear that the maximum information about the shape and scale parameters of the LN distribution occurs in the initial portion of the data.
Proceeding similarly, the losses of FI of the shape and scale parameters of the LL distribution are
$$Loss_{LL}(\gamma) = 1 - \frac{3}{1 + \frac{\pi^2}{3}}\left[\int_{p_1/(1-p_1)}^{p_2/(1-p_2)} \left[1 + \ln u - \frac{2u\ln u}{1+u}\right]^2 \frac{du}{(1+u)^2} + (1-p_2)\, p_2^2 \ln^2\!\left(\frac{p_2}{1-p_2}\right) + p_1 (1-p_1)^2 \ln^2\!\left(\frac{p_1}{1-p_1}\right)\right],$$
and
$$Loss_{LL}(\xi) = 1 - 3\left[\int_{p_1/(1-p_1)}^{p_2/(1-p_2)} \frac{(1-u)^2}{(1+u)^4}\, du + p_2^2(1-p_2) + p_1(1-p_1)^2\right],$$
respectively. The losses of information for the shape and scale parameters under the different censoring schemes are presented in Table 8. It is clearly observed that for both the LN and LL distributions, the initial portion of the data carries more information than the right tail, whereas for the WE distribution it is the other way around.
Table 8: Loss of FI of the shape and scale parameters for the log-logistic distribution
Scheme→
Parameter ↓ Scheme 1 Scheme 2 Scheme 3
Shape 26% 74% 51%
Scale 2% 57% 14%
6 Data analysis

Here we discuss the analysis of real life data representing the fatigue life (rounded to the nearest thousand cycles) of 67 specimens of Alloy T7987 that failed before accumulating 300 thousand cycles of testing. The recorded values (in hundreds) are
0.94 1.18 1.39 1.59 1.71 1.89 2.27 0.96 1.21 1.40 1.59 1.72 1.90 2.56
0.99 1.21 1.41 1.59 1.73 1.96 2.57 0.99 1.23 1.41 1.59 1.76 1.97 2.69
1.04 1.29 1.43 1.62 1.77 2.03 2.71 1.08 1.31 1.44 1.68 1.80 2.05 2.74
1.12 1.33 1.49 1.68 1.80 2.11 2.91 1.14 1.35 1.49 1.69 1.84 2.13 1.17
1.36 1.52 1.70 1.87 2.24 1.17 1.39 1.53 1.70 1.88 2.26.
First we provide some preliminary data analysis results. The mean, median, standard deviation and coefficient of skewness are 1.6608, 1.5900, 0.4672 and 0.7488, respectively. The histogram of the above data set is presented in Figure 3.

Figure 3: Histogram of the fatigue life data.

From the preliminary data analysis, it is clear that a skewed distribution may be used
to analyze this data set. Barreto-Souza et al. (2010) fitted the Weibull-geometric,
extended exponential geometric and WE models to this real data set. Graphically,
Meeker and Escobar (1998, p. 149) showed that the LN distribution provides a much better fit than the WE distribution. Before progressing further we provide the plot of the scaled total time on test (TTT) transform.

Table 9: The traces (TI) and total variances (TV) of the FI matrices of LN(σ̃, β̃), LL(γ̃, ξ̃) and WE(α, 1) for three different censoring schemes.

α      σ̃        β̃        γ̃        ξ̃        TI(WE)   TI(LN)   TI(LL)    TV(WE)   TV(LN)   TV(LL)
0.8    1.6032    0.4860    0.8699    0.5609    2.0136   2.0815   2.7803    2.7382   2.5921   1.4452
1.0    1.2826    0.5615    0.6959    0.6297    1.7315   2.6455   3.9000    2.3546   1.8015   1.0463
1.2    1.0688    0.6182    0.5799    0.6802    1.7616   3.3532   5.2638    2.3956   1.3346   0.7946
1.4    0.9161    0.6621    0.4971    0.7187    1.9708   4.1886   6.8642    2.6800   1.0322   0.6247
1.6    0.8016    0.6971    0.4349    0.7490    2.3034   5.1441   8.7018    3.1324   0.8237   0.5039
1.8    0.7125    0.7257    0.3866    0.7734    2.7329   6.2169   10.7689   3.7165   0.6734   0.4152
2.0    0.6413    0.7493    0.3480    0.7935    3.2454   7.4050   13.0637   4.4134   0.5612   0.3481

It is well known that the scaled
TTT transform plot provides an indication about the shape of the hazard function.
For example, if the plot is a concave function then it indicates that the hazard
function is an increasing function, or if the plot is first concave and then convex, it
indicates that the hazard function is an upside down function etc. We provide the
plot of the scaled TTT transform in Figure 4. Although it is not very clear, there is an indication that the plot is concave at the beginning and convex thereafter, which suggests that the hazard function is of upside down (unimodal) type.
Figure 4: Scaled TTT transform plot of the fatigue life data.
Now we fit all three distributions, WE, LN and LL, to this data set. The MLEs of the model parameters are computed numerically using the Newton-Raphson (NR) method. The MLEs, the Kolmogorov-Smirnov (K-S) distances between the fitted and the empirical distribution functions, and the corresponding p-values (in parentheses) are presented in Table 10. The empirical and fitted distribution functions are presented in Figure 5. The CDFs of LN and LL are very close to each other, and the CDF of WE is quite different from the other two.
Figure 5: The empirical and fitted distribution functions for the fatigue life data.
Table 10: MLEs, K-S statistics and the corresponding p-values for the data set.

Model   Shape         Scale         K-S (p-value)
WE      α̂ = 3.7257    λ̂ = 0.5446    0.0973 (0.5497)
LN      σ̂ = 0.2722    β̂ = 1.5998    0.0418 (0.9998)
LL      γ̂ = 0.1576    ξ̂ = 1.5998    0.0502 (0.9959)
For the complete sample, the FI matrix of the fitted LL distribution, for instance, is
$$I_{LL}(\hat\gamma, \hat\xi) = \begin{pmatrix} 57.5573 & 0 \\ 0 & 5.2437 \end{pmatrix}.$$
Considering also the total asymptotic variances of the three fitted models, we conclude that the LN is the most preferred among these three distributions for this particular data set.
Next, let us assess the variances of the p-th percentile estimators of the three distributions for various choices of p. Figure 6 shows the asymptotic variances of the p-th percentile estimators for complete and censored samples. For the complete sample, it is clear that the WE distribution has a higher variance than the LN and LL distributions for p < 0.7. It is also evident from Figure 6 that the asymptotic variances of the p-th percentile estimators for censored samples differ between LN and LL, while their respective curves tend to be identical in the complete sample case. This suggests that the LN and LL distributions can be discriminated more easily when a censored data set is available. Taking censored observations on [T1, T2] with p1 = 0.15 and p2 = 0.90, the FI matrices for the WE, LN and LL distributions are computed to be
$$I_{WE}(\hat\alpha, \hat\lambda) = \begin{pmatrix} 0.0488 & 0.8918 \\ 0.8918 & 35.1012 \end{pmatrix}, \quad I_{LN}(\hat\sigma, \hat\beta) = \begin{pmatrix} 5.4529 & 0.1678 \\ 0.1678 & 1.4947 \end{pmatrix},$$
and
$$I_{LL}(\hat\gamma, \hat\xi) = \begin{pmatrix} 28.5881 & 5.7614 \\ 5.7614 & 3.2150 \end{pmatrix},$$
respectively. Based on the FI matrices for the complete and censored data sets, it is observed that the loss of information due to truncation for the LN distribution is much greater than for the WE and LL distributions with respect to both parameters, while the loss of information of the shape parameter due to truncation for WE and LL is much greater than that of the scale parameter for the same distributions.
Figure 6: The variances of the p-th percentile estimators for the WE, LN and LL distributions for complete (left) and censored (right) samples.
7 Conclusions
In this article, we have considered the problem of discrimination among the WE,
LN and LL distributions using three different methods. The asymptotic variance
of the percentile estimators is also compared for these three distributions. It is
observed that although all three distributions may fit a specific data set adequately, the total information of the Fisher information matrices as well as the asymptotic variances of the percentile estimators can be quite different. An extensive simulation experiment has been carried out to compute the PCS of the different methods, and it is observed that the method based on the Fisher information measure competes well with the other existing methods, especially for certain ranges of parameter values.
Acknowledgement
The authors would like to thank the unknown reviewers for their constructive com-
ments which have helped to improve the manuscript significantly.
Appendix
Proof of Theorem 1:
Taking the natural logarithm of the PDF of the LL distribution, we get
$$\ln f_{LL}(x; \gamma, \xi) \propto -\ln\gamma + \frac{\ln x - \ln\xi}{\gamma} - 2\ln\left(1 + e^{(\ln x - \ln\xi)/\gamma}\right), \tag{13}$$
and
$$\frac{\partial \ln f_{LL}(x; \gamma, \xi)}{\partial \xi} = -\frac{1}{\gamma\xi}\left(1 - \frac{2\, e^{(\ln x - \ln\xi)/\gamma}}{1 + e^{(\ln x - \ln\xi)/\gamma}}\right).$$
Then
$$f_{11L} = \frac{1}{\gamma^2}\int_0^{\infty} \left[1 + \frac{\ln x - \ln\xi}{\gamma} - 2\,\frac{\ln x - \ln\xi}{\gamma}\cdot\frac{e^{(\ln x - \ln\xi)/\gamma}}{1 + e^{(\ln x - \ln\xi)/\gamma}}\right]^2 f_{LL}(x; \gamma, \xi)\, dx,$$
$$f_{22L} = \frac{1}{\gamma^2\xi^2}\int_0^{\infty} \left[\frac{2\, e^{(\ln x - \ln\xi)/\gamma}}{1 + e^{(\ln x - \ln\xi)/\gamma}} - 1\right]^2 f_{LL}(x; \gamma, \xi)\, dx.$$
A more simplified expression for f11L can be obtained via the second derivative approach as follows:
$$f_{11L} = -E\left[\frac{\partial^2 \ln f(X; \gamma, \xi)}{\partial \gamma^2}\right] = -\frac{1}{\gamma^2} - \frac{2}{\gamma^2}\, E\left[\frac{\ln X - \ln\xi}{\gamma}\right] + \frac{4}{\gamma^2}\, E\left[\frac{\ln X - \ln\xi}{\gamma}\cdot\frac{e^{(\ln X - \ln\xi)/\gamma}}{1 + e^{(\ln X - \ln\xi)/\gamma}}\right] + \frac{2}{\gamma^2}\, E\left[\left(\frac{\ln X - \ln\xi}{\gamma}\right)^2 \frac{e^{(\ln X - \ln\xi)/\gamma}}{\left(1 + e^{(\ln X - \ln\xi)/\gamma}\right)^2}\right]. \tag{14}$$
Since the integrand involved in the second term on the right hand side of (14) is an odd function, its integral is 0. By substitution arguments, we can readily obtain the following identities:
$$E\left[\frac{\ln X - \ln\xi}{\gamma}\cdot\frac{e^{(\ln X - \ln\xi)/\gamma}}{1 + e^{(\ln X - \ln\xi)/\gamma}}\right] = \int_0^1 \left[\ln(1-z) - \ln z\right](1-z)\, dz = \frac{1}{2},$$
and
$$E\left[\left(\frac{\ln X - \ln\xi}{\gamma}\right)^2 \frac{e^{(\ln X - \ln\xi)/\gamma}}{\left(1 + e^{(\ln X - \ln\xi)/\gamma}\right)^2}\right] = \int_0^1 \left[\ln(1-z) - \ln z\right]^2 z(1-z)\, dz = \frac{19}{54} - 2\int_0^1 \ln z\, \ln(1-z)\, z(1-z)\, dz.$$
This leads to
$$f_{11L} = \frac{1}{\gamma^2}\left[\frac{46}{27} - 4\int_0^1 \ln z\, \ln(1-z)\, z(1-z)\, dz\right]. \tag{15}$$
By using the Taylor series expansion −ln(1 − z) = Σ_{j=1}^∞ z^j/j and the identity (see Gradshteyn and Ryzhik (1994), §4.272.16, p. 548), we have
$$\int_0^1 (-\ln z)\, z^{j+1} (1-z)\, dz = \sum_{k=0}^{1} \binom{1}{k} \frac{(-1)^k}{(j+k+2)^2} = \frac{1}{(j+2)^2} - \frac{1}{(j+3)^2},$$
so that
$$\int_0^1 \ln z\, \ln(1-z)\, z(1-z)\, dz = \sum_{j=1}^{\infty} \frac{1}{j}\left[\frac{1}{(j+2)^2} - \frac{1}{(j+3)^2}\right].$$
The series on the right hand side can be computed by straightforward partial fraction decomposition, telescoping arguments and the Euler-Riemann zeta value ζ(2) = Σ_{i=1}^∞ 1/i² = π²/6. This gives
$$\int_0^1 \ln z\, \ln(1-z)\, z(1-z)\, dz = \frac{74 - 6\pi^2}{216},$$
and consequently (15) simplifies to
$$f_{11L} = \frac{1}{3\gamma^2}\left(1 + \frac{\pi^2}{3}\right).$$
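The identity above is easy to confirm numerically; a short quadrature sanity check:

```python
# Numerical check: int_0^1 ln(z) ln(1-z) z(1-z) dz = (74 - 6*pi^2)/216.
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda z: np.log(z) * np.log(1 - z) * z * (1 - z), 0.0, 1.0)
print(val, (74 - 6 * np.pi ** 2) / 216)  # both ~ 0.068467
```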
References
[1] Ahmad, M. A., Raqab, M. Z. and Kundu, D. (2017). Discriminating between
the generalized Rayleigh and Weibull distributions: Some comparative studies,
Communications in Statistics-Simulation & Computation, In Press, Online Link:
http://dx.doi.org/10.1080/03610918.2015.1136415.
[2] Alshunnar, F.S., Raqab, M.Z. and Kundu, D. (2010). On the comparison of
the Fisher information of the log-normal and Weibull distributions, Journal of
Applied Statistics 37, 391 - 404.
[7] Gradshteyn, I.S. and Ryzhik, I.M. (1994), In: A. Jeffrey (Ed), Table of Integrals
Series and Products, 5th edn, Academic Press, San Diego, USA.
[8] Gupta, R.D. and Kundu, D, (2006). On the comparison of Fisher information of
the Weibull and GE distributions, Journal of Statistical Planning & Inference
136, 3130-3144.
[10] Kundu, D. and Manglick, A. (2004). Discriminating between the Weibull and log-normal distributions, Naval Research Logistics 51, 893-905.
[12] Marshall, A. W., Meza, J. C., and Olkin, I. (2001). Can data recognize its parent distribution?, Journal of Computational and Graphical Statistics 10, 555-580.
[17] White, H. (1982). Regularity conditions for Cox’s test of non-nested hypothe-
ses, Journal of Econometrics 19, 301-318.
[18] Yu, H. F. (2007). Mis-specification analysis between normal and extreme value
distributions for a linear regression model, Communications in Statistics-Theory
& Methods 36, 499-521.