THEORY OF ESTIMATION NOTES
UNDERGRADUATE STUDIES
R.O. OTIENO, G.O. ORWA and O.O. NGESA
Prerequisites: Probability, Statistics and Calculus.
1. Introduction
The field of statistical inference consists of those methods used to make decisions about some unknown parameter, or to draw conclusions about a population. These methods utilize information contained in a sample, which is usually denoted by X1, X2, ∙ ∙ ∙, Xn. Statistical inference is divided into two major areas: Parameter Estimation and Hypothesis Testing. The concern of this course is Parameter Estimation.
2. Point Estimation
Suppose a manufacturer is interested in finding the mean daily production of a plant. The manufacturer will collect daily productions for a few days in a month and use the information collected from this sample to compute a number that is in some sense a reasonable value (or guess) of the true mean daily production. This number is called a point estimate.
Example
Suppose that X ∼ N(μ, σ²); here the parameters are θ1 = μ and θ2 = σ². If we consider the sample X1, X2, ∙ ∙ ∙, Xn, then from these observations we can find the statistic

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}

The distribution of x̄ is called its sampling distribution and it may be found as follows:
If X ∼ N(μ, σ²), then it is known that E(X) = μ and Var(X) = σ², so that

E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{n\mu}{n} = \mu

Var(\bar{x}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
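As a quick numerical check of these two results (not part of the original notes; a sketch assuming NumPy is available, with arbitrary values μ = 10, σ = 2, n = 25):

import numpy as np

# Simulate the sampling distribution of the sample mean.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 100_000

xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())  # close to mu = 10
print(xbars.var())   # close to sigma^2 / n = 4 / 25 = 0.16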
Example
Suppose that the random variable X is normally distributed with an unknown mean μ. The sample mean is a point estimator of the unknown population mean μ; that is, μ̂ = x̄. After the sample has been selected, the numerical value of x̄ is the point estimate of μ. Thus, if X1 = 25, X2 = 30, X3 = 29 and X4 = 31, the point estimate of μ is

\bar{x} = \frac{25 + 30 + 29 + 31}{4} = 28.75
2.2. Methods of Point Estimation
In this section, we discuss three methods for obtaining point estimators: the method of moments, the method of maximum likelihood and the method of Ordinary Least Squares. Maximum likelihood estimates are generally preferable to moment estimators because they have better efficiency properties. However, moment estimators are sometimes easier to compute.
The general idea behind the method of moments is to equate population moments, which are defined in terms of expected values, to the corresponding sample moments. The population moments will be functions of the unknown parameters. Then these equations are solved to yield estimators of the unknown parameters.
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from the probability distribution f(x), where f(x) can be a discrete probability mass function or a continuous probability density function.
The kth population moment (or distribution moment), μ′_k, is E(X^k), k = 1, 2, ∙ ∙ ∙. The corresponding kth sample moment, M′_k, is \frac{1}{n}\sum_{i=1}^{n} X_i^k, k = 1, 2, ∙ ∙ ∙.
Example 1
Suppose that X1, X2, ∙ ∙ ∙, Xn is a random sample from an exponential distribution with parameter λ. Find the moments estimator of λ.
Solution
For the method of moments, μ′_k = M′_k, k = 1, 2, ∙ ∙ ∙. Here k = 1.
There is only one parameter to estimate, so we equate E(X) to the first sample moment \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{x}.
For the exponential distribution, we know that

f(x, \lambda) = \lambda e^{-\lambda x}, \quad x > 0

so that E(X) = \frac{1}{\lambda}. Equating, \frac{1}{\lambda} = \bar{x}, which gives the moment estimator \hat{\lambda} = \frac{1}{\bar{x}}.
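A small simulation sketch of this estimator (an illustration added to the notes; it assumes NumPy and an arbitrary true value λ = 2.5):

import numpy as np

rng = np.random.default_rng(1)
lam_true, n = 2.5, 10_000

# NumPy parameterises the exponential by its mean (scale = 1/lambda).
x = rng.exponential(scale=1 / lam_true, size=n)

lam_mom = 1 / x.mean()   # method-of-moments estimate: 1 / x_bar
print(lam_mom)           # close to 2.5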
Example 2
Suppose that X1, X2, ∙ ∙ ∙, Xn is a random sample from a normal distribution with parameters μ and σ². Obtain the moment estimators of the two parameters.
Solution
For the normal distribution, E(X) = μ and E(X²) = μ² + σ². Equating E(X) and E(X²) to the first and second sample moments and solving gives

\hat{\mu} = \bar{x}

and

\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}
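The same estimators can be checked numerically (an added sketch, assuming NumPy and arbitrary true values μ = 5, σ = 3):

import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma_true, n = 5.0, 3.0, 10_000

x = rng.normal(mu_true, sigma_true, size=n)

mu_hat = x.mean()                          # first sample moment
sigma2_hat = (x**2).mean() - x.mean()**2   # M2' - (M1')^2 = (1/n) * sum (x_i - x_bar)^2
print(mu_hat, sigma2_hat)                  # close to 5 and 9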
Example 3
Obtain the moment estimators of the parameters A and D in the Uniform distribution on the interval (A, A + D).
Solution
Recall that for X ∼ U(a, b),

f(x) = \frac{1}{b - a}, \quad a < x < b

and zero elsewhere. Here a = A and b = A + D, so f(x) = 1/D for A < x < A + D.
We need to find the sample moments and population moments and solve the resulting equations. For the first population moment,

E(X) = \int_{A}^{A+D} \frac{x}{D}\, dx = A + \frac{D}{2}

For the first sample moment, M′_1 = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{x}.
Equating the two yields

\bar{x} = \hat{A} + \frac{\hat{D}}{2} \qquad (2.1)

For the second population moment,

E(X^2) = \int_{A}^{A+D} \frac{x^2}{D}\, dx = A^2 + AD + \frac{D^2}{3}
For the second sample moment, M′_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.
We know that

s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{x}^2

This implies that

\frac{1}{n}\sum_{i=1}^{n} X_i^2 = s^2 + \bar{x}^2

Equating these two second moments yields

s^2 + \bar{x}^2 = \hat{A}^2 + \hat{A}\hat{D} + \frac{\hat{D}^2}{3} \qquad (2.2)
Squaring equation 2.1 above yields

\bar{x}^2 = \hat{A}^2 + \hat{A}\hat{D} + \frac{\hat{D}^2}{4} \qquad (2.3)

Subtracting equation 2.3 from equation 2.2 gives

s^2 = \frac{\hat{D}^2}{3} - \frac{\hat{D}^2}{4} = \frac{\hat{D}^2}{12} \qquad (2.4)

Solving equations 2.1 and 2.4 simultaneously yields

\hat{D} = s\sqrt{12}

and

\hat{A} = \bar{x} - s\sqrt{3}
Exercise
1. Obtain the moment estimator for λ in the Poisson distribution with parameter λ. [Ans: λ̂ = x̄].
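Returning to Example 3 above, the two moment estimators D̂ = s√12 and Â = x̄ − s√3 can be checked by simulation (an added sketch, assuming NumPy and arbitrary true values A = 2, D = 6):

import numpy as np

rng = np.random.default_rng(3)
A_true, D_true, n = 2.0, 6.0, 10_000

x = rng.uniform(A_true, A_true + D_true, size=n)

s2 = ((x - x.mean())**2).mean()      # sample variance with divisor n, as in the notes
D_hat = np.sqrt(12 * s2)             # D_hat = s * sqrt(12)
A_hat = x.mean() - np.sqrt(3 * s2)   # A_hat = x_bar - s * sqrt(3)
print(A_hat, D_hat)                  # close to 2 and 6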
We now turn to the method of maximum likelihood. Given a random sample x1, x2, ∙ ∙ ∙, xn from a distribution with density f(x; θ), the likelihood function is the joint density of the sample regarded as a function of the parameter, L(θ) = \prod_{i=1}^{n} f(x_i; θ). Note that the likelihood function is now a function of only the unknown parameter.
Example 1
Suppose that X1, X2, ∙ ∙ ∙, Xn is a random sample from a Poisson distribution with parameter θ. Find the likelihood function.
Solution

L(x; \theta) = L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{e^{-\theta}\theta^{x_i}}{x_i!} = \frac{e^{-n\theta}\,\theta^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}

which is the likelihood function.
Example 2
Let f(x, λ) = λe^{-λx}, x > 0 and zero elsewhere. Find the likelihood function.
Solution

L(\lambda) = \prod_{i=1}^{n} f(x_i; \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i}
Remark
A statistic, say u(x1, x2, ∙ ∙ ∙, xn), such that when θ is replaced with it the likelihood function is a maximum, is called the maximum likelihood estimator of θ, denoted by θ̂_MLE. This means that the principle of Maximum Likelihood Estimation is to obtain the value of θ which maximises L(θ).
To maximize the likelihood with respect to a parameter, we differentiate L(θ) with respect to that parameter, equate to zero and solve for that parameter, i.e. we obtain the solution of

\frac{d}{d\theta}\left[L(\theta)\right] = 0 \qquad (2.6)

The value of θ that maximizes L(θ) also maximizes the log-likelihood function, lnL(θ). So a researcher often maximizes lnL(θ) rather than L(θ) because of the simplicity of many lnL(θ) functions, so that we may work with

\frac{d}{d\theta}\left[\log_e L(\theta)\right] = \frac{d}{d\theta}\left[\ln L(\theta)\right] = \frac{dl}{d\theta}

NB: lnL(θ) = l.
If the likelihood function contains k parameters, so that L(x, θ) = L(x, θ1, θ2, ∙ ∙ ∙, θk), then the MLEs of the k parameters are the values of θi, i = 1(1)k, which maximise the value L(x, θ1, θ2, ∙ ∙ ∙, θk). The point where the likelihood function L(x, θ1, θ2, ∙ ∙ ∙, θk) is maximum is found by setting the k partial derivatives ∂l/∂θi equal to zero and solving the resulting equations simultaneously.
Example 1
Let X be exponentially distributed with parameter λ. Consider a random sample of size n, say X1, X2, ∙ ∙ ∙, Xn. Find the maximum likelihood estimate of λ.
Solution
From Example 2 above, L(λ) = λ^n e^{-λ\sum_{i=1}^{n} X_i}, so that lnL(λ) = n lnλ − λ\sum_{i=1}^{n} X_i and

\frac{d\,\ln L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i

Setting this derivative equal to zero gives \frac{n}{\hat{\lambda}} = \sum_{i=1}^{n} X_i, so that \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{x}}.
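The closed-form MLE can be compared with a direct numerical maximisation of the log-likelihood n lnλ − λ Σx_i (an added sketch, assuming NumPy and an arbitrary true value λ = 1.7):

import numpy as np

rng = np.random.default_rng(4)
lam_true, n = 1.7, 5_000
x = rng.exponential(scale=1 / lam_true, size=n)

lam_mle = 1 / x.mean()   # closed-form MLE: n / sum(x) = 1 / x_bar

# Grid search over lambda for the maximiser of the log-likelihood.
grid = np.linspace(0.01, 5.0, 100_000)
loglik = n * np.log(grid) - grid * x.sum()
print(lam_mle, grid[np.argmax(loglik)])   # the two values agree closely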
Example 2
Given an independent random sample of size n from a normal distribution with mean μ and variance σ², obtain the MLEs of μ and σ².
Solution
For the normal distribution, f(x; μ, σ²) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(x-\mu)^2}.
This has 2 parameters, hence k = 2 here.

L(\theta) = \prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(X_i-\mu)^2} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}\sum(X_i-\mu)^2} \qquad (2.10)

l = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln 2\pi - \frac{1}{2\sigma^2}\sum(X_i-\mu)^2 \qquad (2.11)

\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2}\sum(X_i-\mu)

\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum(X_i-\mu)^2}{2(\sigma^2)^2} \qquad (2.12)

Setting these two partial derivatives equal to zero and solving simultaneously gives the results below.
Hence:

\hat{\mu} = \bar{x}

and

\hat{\sigma}^2 = \frac{\sum(X_i - \bar{x})^2}{n}
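In code, these two MLEs are simply the sample mean and the sample variance with divisor n (an added sketch, assuming NumPy and arbitrary true values μ = −1, σ = 2):

import numpy as np

rng = np.random.default_rng(5)
mu_true, sigma_true, n = -1.0, 2.0, 10_000
x = rng.normal(mu_true, sigma_true, size=n)

mu_mle = x.mean()
sigma2_mle = ((x - x.mean())**2).mean()   # divisor n, not n - 1
print(mu_mle, sigma2_mle)                 # close to -1 and 4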
Exercise
1. Find the MLE of θ for the population whose p.d.f. is given by f(x, θ) = \frac{1}{\theta} for 0 < x < θ and zero elsewhere.
3.1. Unbiasedness
An estimator θ̂ of θ is said to be unbiased if E(θ̂) = θ.
Example 1
Suppose there is a random sample of size n from a population with mean θ. Verify that the sample mean x̄ is an unbiased estimator of θ.
Solution
If x̄ is unbiased for θ, then E(x̄) = θ.
Now \bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i, so that E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i).
But from the hypothesis, E(X_i) = θ. Therefore \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n}\theta = \frac{n\theta}{n} = \theta.
Since E(x̄) = θ, x̄ is an unbiased estimator of θ.
Example 2
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from a normal population with mean θ and variance σ². Show that the statistic defined by

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2

is an unbiased estimator of σ².
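A Monte Carlo sketch of this unbiasedness result (added for illustration; it assumes NumPy, σ² = 4 and a small sample size n = 5 so the difference between the divisors n − 1 and n is visible):

import numpy as np

rng = np.random.default_rng(6)
theta, sigma, n, reps = 0.0, 2.0, 5, 200_000

samples = rng.normal(theta, sigma, size=(reps, n))
xbar = samples.mean(axis=1, keepdims=True)
ss = ((samples - xbar)**2).sum(axis=1)    # sum of squared deviations per sample

print((ss / (n - 1)).mean())   # close to sigma^2 = 4 : divisor n - 1 is unbiased
print((ss / n).mean())         # close to (1 - 1/n) * sigma^2 = 3.2 : divisor n is biased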
3.2. Consistency
Consider the mean squared error of an estimator θ̂n of θ, MSE(θ̂n) = E(θ̂n − θ)². Then

E(\hat{\theta}_n - \theta)^2 = E\left[\hat{\theta}_n - E(\hat{\theta}_n) + E(\hat{\theta}_n) - \theta\right]^2
= E\left[\hat{\theta}_n - E(\hat{\theta}_n)\right]^2 + \left[E(\hat{\theta}_n) - \theta\right]^2
= Var(\hat{\theta}_n) + \left[Bias(\hat{\theta}_n)\right]^2

where the cross term vanishes because E[θ̂n − E(θ̂n)] = 0.
This means that MSE(θ̂n) = Var(θ̂n) + [Bias(θ̂n)]².
Thus the sequence of estimators θ̂i, i = 1, 2, ∙ ∙ ∙, is said to be strongly consistent iff
(i). lim_{n→∞} Var(θ̂n) = 0
and
(ii). lim_{n→∞} Bias(θ̂n) = 0
Example 1
Suppose that X1, X2, ∙ ∙ ∙, Xn is a random sample of size n from a normal population with mean μ and variance σ².
Let \bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i and s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{x})^2 be the sample mean and sample variance respectively. Show that x̄ and s² are mean squared error consistent estimators of μ and σ² respectively.
Solution
First we deal with x̄. Here μ̂ = x̄ and

Var(\bar{x}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}

so that lim_{n→∞} Var(x̄) = lim_{n→∞} σ²/n = 0.
Next, Bias(x̄) = E(x̄) − μ, but

E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{n\mu}{n} = \mu

Therefore Bias(x̄) = 0 and lim_{n→∞} Bias(x̄) = 0.
Since the two conditions for strong consistency are satisfied, we conclude that x̄ is a strongly consistent estimator of the population mean μ.
Now we proceed to check for s².

\frac{\sum_{i=1}^{n}(X_i - \bar{x})^2}{\sigma^2} = \frac{n s^2}{\sigma^2} \sim \chi^2_{(n-1)}

Therefore

E\left(\frac{n s^2}{\sigma^2}\right) = n - 1 \;\Rightarrow\; E(s^2) = \frac{n-1}{n}\,\sigma^2 = \left(1 - \frac{1}{n}\right)\sigma^2

The bias of s² is E(s^2) - \sigma^2 = \left(1 - \frac{1}{n}\right)\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}.
Hence lim_{n→∞} Bias(s²) = lim_{n→∞}\left(-\frac{\sigma^2}{n}\right) = 0.
Also Var\left(\frac{n s^2}{\sigma^2}\right) = 2(n-1), so that

Var(s^2) = \frac{2(n-1)}{n^2}(\sigma^2)^2 = \frac{2(n-1)\sigma^4}{n^2}

and lim_{n→∞} Var(s²) = lim_{n→∞} \frac{2(n-1)\sigma^4}{n^2} = 0.
Since the two conditions for strong consistency are satisfied, we conclude that s² is a strongly consistent estimator of the population variance σ².
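Both conditions can also be illustrated by simulation (an added sketch, assuming NumPy and a standard normal population): the estimated bias behaves like −σ²/n and the estimated variance like 2(n − 1)σ⁴/n², both shrinking as n grows.

import numpy as np

rng = np.random.default_rng(7)
mu, sigma, reps = 0.0, 1.0, 10_000

for n in (10, 100, 1000):
    samples = rng.normal(mu, sigma, size=(reps, n))
    s2 = samples.var(axis=1)            # divisor n, matching the notes' definition of s^2
    print(n, s2.mean() - sigma**2,      # approx -sigma^2 / n
          s2.var())                     # approx 2 (n - 1) sigma^4 / n^2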
3.2.2. Weak Consistency
A sequence of estimators θ̂n is said to be weakly consistent for θ if, for every ε > 0, lim_{n→∞} Pr[|θ̂n − θ| ≥ ε] = 0, or equivalently

\lim_{n\to\infty} Pr\left[\,\left|\hat{\theta}_n - \theta\right| < \varepsilon\,\right] = 1 \quad \forall\; \theta \in \Omega
Example
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from a population with mean μ and finite variance σ². Show that x̄ is a weakly consistent estimator of μ.
Solution

Var(\bar{x}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}

Now |θ̂n − θ| = |x̄ − μ| and

Pr\left[\,\left|\hat{\theta}_n - \theta\right| < \varepsilon\,\right] = Pr\left[\,\left|\bar{x} - \mu\right| < \varepsilon\,\right]

Now using Chebychev’s Inequality, which we may state in our own context as follows: if sampling from a group with mean μ and finite variance σ², then

Pr\left[\,\left|\bar{x} - \mu\right| < \varepsilon\,\right] \geq 1 - \frac{Var(\bar{x})}{\varepsilon^2}

so that after substitution we have

Pr\left[\,\left|\bar{x} - \mu\right| < \varepsilon\,\right] \geq 1 - \frac{\sigma^2}{n\varepsilon^2}

\Rightarrow \lim_{n\to\infty} Pr\left[\,\left|\bar{x} - \mu\right| < \varepsilon\,\right] \geq \lim_{n\to\infty}\left(1 - \frac{\sigma^2}{n\varepsilon^2}\right) = 1

and since a probability cannot exceed 1, the limit equals 1. Hence x̄ is a weakly consistent estimator of μ.
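The convergence of Pr[|x̄ − μ| < ε] to 1 can be seen numerically (an added sketch, assuming NumPy, with μ = 0, σ = 1 and ε = 0.1):

import numpy as np

rng = np.random.default_rng(8)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 10_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) < eps))   # tends to 1 as n grows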
3.3. Sufficiency
A statistic T is sufficient for a parameter if the conditional distribution of the sample given T = t does not depend on that parameter.
Example
Suppose there is a random sample X1, X2, ∙ ∙ ∙, Xn from a Poisson distribution with parameter λ. Verify that T = \sum_{i=1}^{n} X_i is sufficient for λ.

f(x, \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \cdots

and zero elsewhere. We need to determine the conditional distribution of the sample given T = \sum_{i=1}^{n} X_i. This conditional distribution is given by

g(X \mid T = t) = \frac{L(X, t, \lambda)}{g(t, \lambda)}
where L is the likelihood function of the sample while g(t) is the marginal density function of T.

L(X, t, \lambda) = Pr(X_1 = x_1, X_2 = x_2, \cdots, X_n = x_n; T = t)

Let
A = \{X_1 = x_1, X_2 = x_2, \cdots, X_n = x_n\}
B = \{T = t\}
Clearly A ⊂ B, therefore A ∩ B = A and Pr(A ∩ B) = Pr(A). Because of this, we only need to find L(X, λ):

L(X, \lambda) = \prod_{i=1}^{n} f(x_i, \lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum x_i}}{\prod_{i=1}^{n} x_i!} = \frac{e^{-n\lambda}\,\lambda^{t}}{\prod_{i=1}^{n} x_i!}

where t = \sum_{i=1}^{n} x_i.
Next, g(t, λ) = Pr(T = t).
Now, if X_i ∼ Poisson(λ), then \sum_{i=1}^{n} X_i ∼ Poisson(nλ), so that

g(t, \lambda) = \frac{e^{-n\lambda}(n\lambda)^{t}}{t!}

Therefore

g(x \mid T = t) = \frac{e^{-n\lambda}\lambda^{t} / \prod_{i=1}^{n} x_i!}{e^{-n\lambda}(n\lambda)^{t} / t!} = \frac{t!}{\prod_{i=1}^{n} x_i!\; n^{t}} \qquad (3.1)

Since this result from the division is independent of λ, we conclude that T = \sum_{i=1}^{n} X_i is sufficient for λ.
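The practical meaning of sufficiency can be illustrated numerically: two Poisson samples with the same total t produce likelihood functions of λ that differ only by the constant factor 1/Πx_i!, so once normalised they are identical (an added sketch using only NumPy and the standard library; the two small samples are arbitrary):

import numpy as np
from math import factorial

x1 = np.array([1, 2, 2, 3])   # total t = 8
x2 = np.array([0, 0, 4, 4])   # a different sample with the same total t = 8

def likelihood(x, lam):
    # e^{-n lam} lam^t / prod(x_i!), as derived above
    n, t = len(x), x.sum()
    h = np.prod([factorial(int(v)) for v in x])
    return np.exp(-n * lam) * lam**t / h

lams = np.linspace(0.1, 6.0, 200)
L1 = np.array([likelihood(x1, lam) for lam in lams])
L2 = np.array([likelihood(x2, lam) for lam in lams])

# The normalised likelihood curves coincide: all information about lambda is in t.
print(np.allclose(L1 / L1.sum(), L2 / L2.sum()))   # True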
Exercises
1. Let X1, X2, ∙ ∙ ∙, Xn be i.i.d. having the binomial distribution b(1, p). Let T = \sum_{i=1}^{n} X_i. Verify that T is sufficient for p.
2. Let X1, X2, ∙ ∙ ∙, Xn be i.i.d. having the geometric distribution

f(x, \theta) = \theta^{x}(1 - \theta), \quad x = 0, 1, 2, \cdots

Verify that T = \sum_{i=1}^{n} X_i is sufficient for θ.
Given a function f(x, θ), with X being a random variable while θ is the governing parameter in f(x, θ), it is usually possible to identify/pick the sufficient statistic(s) if there exist any. To achieve this, the factorization criterion is usually employed: a statistic T = t is sufficient for θ if and only if the likelihood function can be factored as

L = q(t, \theta) \cdot h(x)

where q(t, θ) depends on the sample only through t and h(x) does not depend on θ.
Example 1
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from the Bernoulli distribution with

f(x, \theta) = \theta^{x}(1 - \theta)^{1-x}, \quad x = 0, 1

and zero elsewhere. Use the factorization criterion to verify that T = \sum_{i=1}^{n} x_i is sufficient for θ.
Solution

L(x, \theta) = \prod_{i=1}^{n} \theta^{x_i}(1 - \theta)^{1-x_i} = \theta^{\sum_{i=1}^{n} x_i}(1 - \theta)^{\,n - \sum_{i=1}^{n} x_i}

But T = \sum_{i=1}^{n} x_i, so L(x, θ) = θ^t (1 − θ)^{n−t}.
The above L(x, θ) is already of the form q(t, θ) h(x), where

q(t, \theta) = \theta^{t}(1 - \theta)^{n-t}

and h(x) = 1.
This verifies that T = t is sufficient for θ.
Example 2
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from a continuous distribution with PDF

f(x, \theta) = \theta x^{\theta - 1}, \quad 0 < x < 1

Show that T = \prod_{i=1}^{n} X_i is sufficient for θ.
Solution

L(x, \theta) = \prod_{i=1}^{n} \theta x_i^{\theta - 1} = \theta^{n}\left(\prod_{i=1}^{n} x_i\right)^{\theta - 1}

But T = \prod_{i=1}^{n} x_i, so that L(x, θ) = θ^n t^{θ−1}.
This is in the form q(t, θ) h(x), where q(t, θ) = θ^n t^{θ−1} while h(x) = 1.
Hence T = \prod_{i=1}^{n} x_i is sufficient for θ.
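Numerically, the log-likelihood n lnθ + (θ − 1) Σ ln x_i depends on the data only through Πx_i (equivalently Σ ln x_i); two samples with the same product give identical likelihood curves (an added sketch assuming NumPy; the two samples are arbitrary but chosen to have equal products):

import numpy as np

x1 = np.array([0.2, 0.5, 0.9])   # product = 0.09
x2 = np.array([0.3, 0.6, 0.5])   # different sample, same product = 0.09

def loglik(x, theta):
    return len(x) * np.log(theta) + (theta - 1) * np.log(x).sum()

thetas = np.linspace(0.1, 10.0, 50)
print(np.allclose([loglik(x1, t) for t in thetas],
                  [loglik(x2, t) for t in thetas]))   # True: same curve in theta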
Exercise
Let X be a normally distributed random variable with mean μ and variance σ 2
where σ 2 is known. Find the MLE of μ and examine it for sufficiency. Further,
assume that μ is known, derive MLE of σ 2 and examine it for sufficiency.
Remarks
The factorization criterion is best used in cases of multi-parameter functions, in which we have joint sufficiency. By extension, the statistics T1, T2, ∙ ∙ ∙, Tk are said to be jointly sufficient for θ1, θ2, ∙ ∙ ∙, θk iff the joint pdf factors as

L(x, \theta_1, \theta_2, \cdots, \theta_k) = q(t, \theta_1, \theta_2, \cdots, \theta_k)\, h(x)

where t = (t_1, t_2, \cdots, t_k).
Example
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from a normal distribution with both μ and σ² unknown. Verify that T_1 = \bar{x} and T_2 = \sum(x_i - \bar{x})^2 are jointly sufficient for μ and σ².
Solution

f(x, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

L(x, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}

but

\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x} + \bar{x} - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 = t_2 + n(t_1 - \mu)^2

This means that

L(x, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2}\left[t_2 + n(t_1 - \mu)^2\right]}

which is of the form q(t_1, t_2, \mu, \sigma^2)\, h(x) with h(x) = 1. Hence T_1 = \bar{x} and T_2 = \sum(x_i - \bar{x})^2 are jointly sufficient for μ and σ².
A family of densities f(x, θ) is said to belong to the one-parameter exponential family if it can be written in the form f(x, θ) = C(θ) m(x) exp[φ(θ)ρ(x)]; in that case T = \sum_{i=1}^{n} \rho(x_i) is sufficient for θ.
Example
Let X1, X2, ∙ ∙ ∙, Xn denote a random sample from a Bernoulli distribution having PDF

f(x, \theta) = \theta^{x}(1 - \theta)^{1-x}, \quad x = 0, 1, \quad 0 < \theta < 1

Show that the family of Bernoulli distributions belongs to the one-parameter exponential family.
Solution

f(x, \theta) = \theta^{x}(1 - \theta)^{1-x} = (1 - \theta)\left(\frac{\theta}{1 - \theta}\right)^{x}

which may further be written as

(1 - \theta)\exp\left[x \ln\frac{\theta}{1 - \theta}\right]
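Matching this expression to the one-parameter exponential form C(θ) m(x) exp[φ(θ)ρ(x)] used in the Poisson exercise below (an added remark), we can read off

C(\theta) = 1 - \theta, \quad m(x) = 1, \quad \varphi(\theta) = \ln\frac{\theta}{1-\theta}, \quad \rho(x) = x

so that T = \sum_{i=1}^{n} \rho(x_i) = \sum_{i=1}^{n} x_i is sufficient for θ.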
Exercise
Suppose X1, X2, ∙ ∙ ∙, Xn form a random sample from a Poisson distribution with parameter θ > 0. Show that the family of Poisson distributions belongs to the one-parameter exponential family and determine a sufficient statistic for θ.
Possible answers:

f(x, \theta) = \frac{e^{-\theta}\theta^{x}}{x!} = \frac{1}{x!}\, e^{-\theta} \exp(x \ln\theta)

so that C(\theta) = e^{-\theta}, m(x) = \frac{1}{x!}, \varphi(\theta) = \ln\theta, \rho(x) = x

\Rightarrow T = \sum_{i=1}^{n} \rho(x_i) = \sum_{i=1}^{n} x_i will be sufficient for θ.
Example
Let X1, X2, ∙ ∙ ∙, Xn be a random sample from X ∼ N(μ, σ²), with both parameters unknown. Verify that this family of normal distributions belongs to the two-parameter exponential family and hence determine the jointly sufficient statistics for μ and σ².
Solution

f(x, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

which is easily re-written as

f(x, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} e^{-\frac{\mu^2}{2\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2}} \qquad (**)

since (x − μ)² = x² − 2xμ + μ², so that

-\frac{(x-\mu)^2}{2\sigma^2} = -\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2}

The form (**) is a two-parameter exponential family with ρ_1(x) = x² and ρ_2(x) = x, so the jointly sufficient statistics for μ and σ² are T_1 = \sum_{i=1}^{n} x_i^2 and T_2 = \sum_{i=1}^{n} x_i.
Exercise
Let X1, X2, ∙ ∙ ∙, Xn denote a random sample from a Beta distribution with parameters α > 0 and β > 0. Show that the family of Beta distributions with both α and β unknown belongs to a two-parameter exponential family. Show also that the family of Gamma densities

f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x > 0

belongs to a two-parameter exponential family.
3.4. Completeness
A family of densities {f(x, θ), θ ∈ Ω} is said to be complete if E[φ(X)] = 0 for all θ ∈ Ω implies φ(x) = 0 for all x.
Example
Show that the binomial family of densities is complete.
Solution
If X ∼ Bin(n, θ), then
f(x, \theta) = \binom{n}{x}\theta^{x}(1 - \theta)^{n-x}, \quad x = 0, 1, \cdots, n

Let φ(x) be any function of x; then

E[\varphi(x)] = \sum_{x=0}^{n} \varphi(x)\binom{n}{x}\theta^{x}(1 - \theta)^{n-x}

Let a(x) = \varphi(x)\binom{n}{x}, so that we have

E[\varphi(x)] = \sum_{x=0}^{n} a(x)\theta^{x}(1 - \theta)^{n-x}

Suppose that E[φ(x)] = 0 for every θ in (0, 1). Then

\sum_{x=0}^{n} a(x)\theta^{x}(1 - \theta)^{n-x} = 0

but the left-hand side is a polynomial in θ of order n, i.e.

a(0)\theta^{0}(1 - \theta)^{n} + a(1)\theta^{1}(1 - \theta)^{n-1} + a(2)\theta^{2}(1 - \theta)^{n-2} + \cdots + a(n)\theta^{n}(1 - \theta)^{0} = 0

If this equation is to hold for every θ, then all the coefficients a(x) must be identically equal to zero, i.e. a(x) = 0 ∀x. Thus \varphi(x)\binom{n}{x} = 0, but \binom{n}{x} \neq 0, which means that φ(x) = 0 for all x.
Therefore the family of binomial densities is complete.
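As a concrete instance (an added remark), take n = 1: then

E[\varphi(X)] = \varphi(0)(1 - \theta) + \varphi(1)\theta = \varphi(0) + [\varphi(1) - \varphi(0)]\theta

and if this is zero for every θ in (0, 1), both the constant term and the coefficient of θ must vanish, so φ(0) = φ(1) = 0.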
Exercise
Show that the family of Poisson densities is complete.