9511_et_Module-2
Shirsendu Mukherjee
Department of Statistics, Asutosh College, Kolkata, India.
shirsendu st@yahoo.co.in
In any given problem of estimation we may have a large, often infinite, class of competing estimators for g(θ), a real-valued function of the parameter θ. A natural question that arises is: are some of the many possible estimators better, in some sense, than others? In this section we define certain criteria, which an estimator may or may not possess, that will help us compare the performances of rival estimators and decide which one is perhaps the 'best'.
Closeness
If our object is to estimate a parametric function g(θ), then we would like the estimator T(X) to be close to g(θ). Since T(X) is a statistic, the discrepancy |T(X) − g(θ)| is itself a random variable, and as a measure of closeness of T we use the probability Pθ(|T − g| < ε) for ε > 0.
Consider two estimators T1 and T2 for estimating a parametric function g = g(θ) of θ. The estimator T1 will be called more concentrated about g(θ) than T2 if, for every ε > 0,

Pθ(|T1 − g| < ε) ≥ Pθ(|T2 − g| < ε) for all θ ∈ Θ.
Since (T1 − g)² is a non-negative random variable, we have

Eθ(T1 − g)² = ∫_0^∞ Pθ((T1 − g)² > ε) dε = ∫_0^∞ Pθ(|T1 − g| > √ε) dε.

It follows that

Eθ(T1 − g)² − Eθ(T2 − g)² = ∫_0^∞ [Pθ(|T1 − g| > √ε) − Pθ(|T2 − g| > √ε)] dε.

Hence, if T1 is more concentrated about g(θ) than T2, i.e. if Pθ(|T1 − g| < ε) ≥ Pθ(|T2 − g| < ε) (equivalently, Pθ(|T1 − g| ≥ ε) ≤ Pθ(|T2 − g| ≥ ε)) for every ε > 0 and every θ ∈ Θ, then the integrand above is non-positive, and therefore Eθ(T1 − g)² ≤ Eθ(T2 − g)² for all θ ∈ Θ.
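This relation between concentration and mean squared error can be checked numerically. Below is a minimal Python sketch (the sample size, the number of Monte Carlo replications and the choice of the sample median as the competing estimator T2 are illustrative assumptions, not part of these notes); in this simulation the sample mean of N(θ, 1) data is more concentrated about θ than the sample median and, correspondingly, has the smaller estimated MSE.

import numpy as np

rng = np.random.default_rng(0)                 # fixed seed, for reproducibility
theta, n, reps = 0.0, 25, 200_000              # true parameter, sample size, replications
X = rng.normal(theta, 1.0, size=(reps, n))     # reps samples of size n from N(theta, 1)
T1 = X.mean(axis=1)                            # sample mean
T2 = np.median(X, axis=1)                      # sample median (competing estimator)

for eps in (0.05, 0.1, 0.2, 0.4):
    p1 = np.mean(np.abs(T1 - theta) < eps)     # estimate of P(|T1 - theta| < eps)
    p2 = np.mean(np.abs(T2 - theta) < eps)     # estimate of P(|T2 - theta| < eps)
    print(f"eps={eps:.2f}  mean: {p1:.3f}  median: {p2:.3f}")

print("MSE(mean)  ", np.mean((T1 - theta) ** 2))   # close to 1/n
print("MSE(median)", np.mean((T2 - theta) ** 2))   # larger, roughly (pi/2)/n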
Mean-Squared Error (MSE)
If T is an estimator of g, then the MSE of T is defined by

MSEθ(T) = Eθ(T − g(θ))², θ ∈ Θ.

The term (T − g) is called the error of T in estimating g, and Eθ(T − g)² is accordingly called the mean squared error of T. It measures the average squared difference between the estimator T and the parametric function g. From the result above it is clear that the smaller the MSE, the better the estimator. Naturally, we would prefer an estimator with smaller, or the smallest, MSE; if such an estimator exists it will be best for the parameter g.
An estimator T is said to be best for g if MSEθ(T) ≤ MSEθ(T′) for all θ ∈ Θ, for any other estimator T′ of g. The problem is that no best estimator exists in this sense, as will be clear from the following discussion.
For a particular value of θ, say θ0, let T′ be the constant estimator defined by T′ ≡ g(θ0). Then

MSEθ0(T′) = Eθ0(T′ − g(θ0))² = 0,

so if T is best for g we must have MSEθ0(T) ≤ MSEθ0(T′) = 0, i.e. MSEθ0(T) = 0. Hence T = g(θ0) with probability 1 under θ0. Since θ0 is arbitrary, for any θ we would need T = g(θ) with probability 1 under θ. But T, being a statistic, cannot be a function of the unknown θ. Hence such a best estimator does not exist.
Consider the following example.
Example Let X1, X2, . . . , Xn be i.i.d. N(θ, 1) random variables, θ ∈ R. To estimate θ let us consider two estimators, T = X̄ = (1/n) Σ_{i=1}^{n} Xi and T′ = θ0. Then we get

MSEθ(T) = 1/n and MSEθ(T′) = (θ0 − θ)².

Now for values of θ ∈ [θ0 − 1/√n, θ0 + 1/√n] we have MSEθ(T′) ≤ MSEθ(T), and for other values of θ we have MSEθ(T′) > MSEθ(T).
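As a quick numerical check, the following Python sketch (the values n = 25 and θ0 = 2 are purely illustrative) evaluates the two MSE expressions on a grid of θ values; T′ beats X̄ only when θ lies within 1/√n of θ0.

import numpy as np

n, theta0 = 25, 2.0                                  # illustrative sample size and theta_0
theta = np.linspace(theta0 - 0.5, theta0 + 0.5, 11)  # grid of true parameter values
mse_mean = np.full_like(theta, 1.0 / n)              # MSE_theta(X_bar) = 1/n
mse_const = (theta0 - theta) ** 2                    # MSE_theta(T') = (theta0 - theta)^2

for t, m1, m2 in zip(theta, mse_mean, mse_const):
    winner = "T'" if m2 <= m1 else "X_bar"
    print(f"theta={t:5.2f}  MSE(X_bar)={m1:.4f}  MSE(T')={m2:.4f}  smaller: {winner}")
# T' has the smaller MSE only for |theta - theta0| <= 1/sqrt(n) = 0.2 here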
Here T′ is not a good estimator of θ, since it always estimates θ to be θ0 and does not depend on the observations at all. On the other hand, the estimator T utilizes the observations and is therefore better than T′.
From the above discussion it is clear that if MSE is the only criterion in the search for a good estimator, then there may be some 'freak' estimators (like T′) that are extremely prejudiced in favour of particular values of θ, and they would perform better than a generally good estimator at those points. For instance, in the above example the estimator T′ is highly partial to θ0, since it always estimates θ to be θ0. One could rule out such freak estimators by considering only estimators that satisfy some other property. One such property is unbiasedness.
Unbiasedness
Definition An estimator T is said to be an unbiased estimator (UE) of g(θ) if

Eθ(T) = g(θ) for all θ ∈ Θ.
The bias of an estimator T of g(θ) is defined by

Bθ(T) = Eθ(T) − g(θ), θ ∈ Θ,

so that T is unbiased precisely when Bθ(T) = 0 for all θ ∈ Θ.
The following result gives a relationship between the MSE and the variance of an estimator in terms of its bias:

MSEθ(T) = Eθ(T − g(θ))²
= Eθ(T − Eθ(T) + Eθ(T) − g(θ))²
= Eθ(T − Eθ(T))² + (Eθ(T) − g(θ))²   (the cross term has expectation zero)
= Varθ(T) + Bθ²(T).
Thus, MSE incorporates two components, one measuring the variability of the estimator (precision) and the other measuring its bias (accuracy). An estimator that has good MSE properties has small combined variance and bias. To find an estimator with good MSE properties, we need to find estimators that control both variance and bias. Clearly, unbiased estimators do a good job of controlling bias, and for an unbiased estimator T we have MSEθ(T) = Varθ(T).
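The decomposition MSEθ(T) = Varθ(T) + Bθ²(T) is easy to verify by simulation. Below is a minimal Python sketch (the deliberately biased estimator T = 0.9 X̄ and all numerical settings are illustrative assumptions); the simulated MSE matches the simulated variance plus squared bias up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(1)
theta, n, reps, c = 3.0, 20, 200_000, 0.9        # c*X_bar is deliberately biased for theta
X = rng.normal(theta, 1.0, size=(reps, n))
T = c * X.mean(axis=1)                           # the (biased) estimator

mse = np.mean((T - theta) ** 2)                  # simulated MSE_theta(T)
var = np.var(T)                                  # simulated Var_theta(T)
bias = np.mean(T) - theta                        # simulated B_theta(T)
print(mse, var + bias ** 2)                      # the two numbers agree up to Monte Carlo error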
Although many unbiased estimators are reasonable from the standpoint of MSE, controlling bias alone does not guarantee that the MSE is controlled. In some cases a trade-off occurs between the variance and the bias, in such a way that a small increase in bias can be traded for a larger decrease in variance, resulting in a smaller MSE. This is clear from the following example.
Example 1 Let Xi ∼ N(µ, σ²), i = 1, 2, . . . , n, independently, where µ and σ² are unknown. Consider all estimators of σ² of the form T = cS², where c > 0 is a constant and

S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².

Now

MSEσ(cS²) = Eσ(cS² − σ²)² = c² Eσ(S⁴) − 2cσ² Eσ(S²) + σ⁴.

Since (n − 1)S²/σ² ∼ χ²_{n−1},

Eσ((n − 1)S²/σ²) = n − 1 and Varσ((n − 1)S²/σ²) = 2(n − 1),

which gives Eσ(S²) = σ², Varσ(S²) = 2σ⁴/(n − 1) and hence Eσ(S⁴) = Varσ(S²) + (Eσ(S²))² = (n + 1)σ⁴/(n − 1). After some routine algebra we get

MSEσ(cS²) = σ⁴ [ c²(n + 1)/(n − 1) − 2c + 1 ],

which attains its minimum when c = (n − 1)/(n + 1), the minimum value being

2σ⁴/(n + 1) < 2σ⁴/(n − 1) = MSEσ(S²).

Hence T = (1/(n + 1)) Σ_{i=1}^{n} (Xi − X̄)² has smaller MSE than the unbiased estimator S² of σ². But T is not unbiased for σ², since Eσ(T) = ((n − 1)/(n + 1)) σ². Thus, if we use the criterion of MSE, the estimator T, which is biased for σ², is better than the unbiased estimator S² of σ².
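The minimisation over c can also be verified numerically. The following Python sketch (the values of n and σ are illustrative) evaluates MSEσ(cS²) = σ⁴[c²(n + 1)/(n − 1) − 2c + 1] over a grid of c and compares the numerical minimiser and minimum with (n − 1)/(n + 1) and 2σ⁴/(n + 1), as well as with the MSE 2σ⁴/(n − 1) of the unbiased choice c = 1.

import numpy as np

n, sigma = 10, 1.5                                     # illustrative sample size and sigma
c = np.linspace(0.5, 1.5, 100001)                      # fine grid of values of c
mse = sigma**4 * (c**2 * (n + 1) / (n - 1) - 2 * c + 1)

print(c[np.argmin(mse)], (n - 1) / (n + 1))            # numerical minimiser vs (n-1)/(n+1)
print(mse.min(), 2 * sigma**4 / (n + 1))               # minimum MSE vs 2*sigma^4/(n+1)
print(sigma**4 * ((n + 1) / (n - 1) - 1), 2 * sigma**4 / (n - 1))  # MSE of S^2, i.e. c = 1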
Example 2 Let X1, X2, . . . , Xn be i.i.d. N(θ, 1) random variables, where the parameter θ is known to lie in [−1, 1], and let X̄ = (1/n) Σ_{i=1}^{n} Xi. Define the estimator T of θ by

T = −1 if X̄ < −1,
T = X̄ if |X̄| ≤ 1,
T = 1 if X̄ > 1.

Then

Eθ(T) = Pθ(X̄ > 1) − Pθ(X̄ < −1) + ∫_{−1}^{1} x̄ f(x̄) dx̄ ≠ θ in general,

where f(x̄) denotes the p.d.f. of X̄. Hence T is a biased estimator of θ, and the MSE of T is given by

MSEθ(T) = (1 − θ)² Pθ(X̄ > 1) + (−1 − θ)² Pθ(X̄ < −1) + ∫_{−1}^{1} (x̄ − θ)² f(x̄) dx̄.
For the estimator X̄ we have Eθ(X̄) = θ, so X̄ is unbiased for θ. Now

MSEθ(X̄) − MSEθ(T) = ∫_{−∞}^{∞} (x̄ − θ)² f(x̄) dx̄ − MSEθ(T)
= ∫_{−∞}^{−1} [(x̄ − θ)² − (−1 − θ)²] f(x̄) dx̄ + ∫_{1}^{∞} [(x̄ − θ)² − (1 − θ)²] f(x̄) dx̄
= I1 + I2, say.

For −1 ≤ θ ≤ 1,

x̄ < −1 ⇒ (x̄ − θ) < (−1 − θ) ≤ 0 ⇒ (x̄ − θ)² > (−1 − θ)²,
x̄ > 1 ⇒ (x̄ − θ) > (1 − θ) ≥ 0 ⇒ (x̄ − θ)² > (1 − θ)².

Hence we get I1 + I2 > 0. Thus T is a biased estimator of θ, but the MSE of T is less than that of X̄ for every θ in [−1, 1].
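Again the conclusion can be checked by simulation. Below is a minimal Python sketch (assuming, as above, i.i.d. N(θ, 1) observations with θ ∈ [−1, 1]; the sample size and number of replications are illustrative); for every θ on the grid the truncated estimator T has smaller estimated MSE than X̄.

import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000                               # small n, so that truncation actually occurs

for theta in (-1.0, -0.5, 0.0, 0.5, 1.0):          # parameter values inside [-1, 1]
    X = rng.normal(theta, 1.0, size=(reps, n))
    xbar = X.mean(axis=1)
    T = np.clip(xbar, -1.0, 1.0)                   # truncate X_bar to the parameter space
    mse_xbar = np.mean((xbar - theta) ** 2)        # approximately 1/n
    mse_T = np.mean((T - theta) ** 2)
    print(f"theta={theta:5.2f}  MSE(X_bar)={mse_xbar:.4f}  MSE(T)={mse_T:.4f}")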
Note In both examples a natural question arises: which estimator should then be preferred? The answer obviously depends on the purpose for which the estimate is obtained.