
THEORY OF ESTIMATION FOR

UNDERGRADUATE STUDIES
R.O. OTIENO, G.O. ORWA and O.O. NGESA
Prerequisites: Probability, Statistics and Calculus.

Course Outline: Point Estimation: Method of Moments (MME), Maximum Likelihood Estimation. Properties of Estimators: Unbiasedness, Sufficiency, Completeness, Consistency, Minimum Variance. Efficiency: Cramér-Rao inequality for a single parameter. Interval Estimation, Loss and Risk Functions, Bayesian Estimation, Minimax Estimation.

1. Introduction

The field of statistical inference consists of those methods used to make decisions about an unknown parameter, or to draw conclusions about a population. These methods utilize the information contained in a sample, usually denoted by X1, X2, ..., Xn. Statistical inference is divided into two major areas: Parameter Estimation and Hypothesis Testing. The concern of this course is Parameter Estimation.

2. Point Estimation

2.1. The problem of Point estimation

Suppose that X1, X2, ..., Xn is a random sample of size n from the distribution of a random variable X. The problem of point estimation is concerned with methods of estimating θ given the known form of f(x, θ) and the sample observations X1, X2, ..., Xn.
Examples of f(x, θ) include:

The Poisson distribution: f(x, λ) = e^(−λ) λ^x / x! for x = 0, 1, 2, ..., and zero elsewhere.
Here, θ = λ.

The Binomial distribution: f(x, p) = C(n, x) p^x q^(n−x) for x = 0, 1, 2, ..., n, and zero elsewhere.
Here, θ = p.

The Normal distribution: f(x, μ, σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)).

Here, θ1 = μ and θ2 = σ².
Suppose a manufacturer is interested in finding the mean daily production of a plant. The manufacturer will collect daily productions for a few days in a month and use the information collected from this sample to compute a number that is in some sense a reasonable value (or guess) of the true mean daily production. This number is called a point estimate.

Definitions: Statistic and Estimator


A statistic is any function of the elements of a random sample which does not contain any unknown parameters. Since a statistic is a function of random variables, it is itself a random variable and therefore has a probability distribution. We call the probability distribution of a statistic its sampling distribution.

Example
Suppose that X ∼ N(μ, σ²). If we consider the sample X1, X2, ..., Xn and from these observations form the statistic

x̄ = (1/n) Σ_{i=1}^n Xi = (X1 + X2 + ... + Xn)/n

The distribution of x̄ is called its sampling distribution, and it may be found as follows.
If X ∼ N(μ, σ²), then it is known that E(X) = μ and Var(X) = σ², so that

E(x̄) = E((1/n) Σ_{i=1}^n Xi) = (1/n) Σ_{i=1}^n E(Xi) = (1/n) Σ_{i=1}^n μ = nμ/n = μ

Var(x̄) = Var((1/n) Σ_{i=1}^n Xi) = (1/n²) Σ_{i=1}^n Var(Xi) = nσ²/n² = σ²/n

So the sampling distribution of x̄ is

x̄ ∼ N(μ, σ²/n)
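
This sampling distribution is easy to check by simulation. The sketch below is a minimal illustration only (NumPy is assumed to be available, and the values μ = 10, σ = 2, n = 25 are arbitrary choices, not taken from the notes): it draws many samples of size n and compares the mean and variance of the simulated sample means with μ and σ²/n.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 100_000   # illustrative values, not from the notes

# Draw `reps` independent samples of size n and compute each sample mean.
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)

print(xbar.mean())        # close to mu = 10
print(xbar.var())         # close to sigma^2 / n = 0.16
print(sigma**2 / n)
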
In general, if X is a random variable with probability distribution f(x, θ), characterized by the unknown parameter θ, and X1, X2, ..., Xn is a random sample of size n from X, then the statistic θ̂ = h(X1, X2, ..., Xn) used to estimate θ is called an estimator of θ.
After the sample has been selected, θ̂ takes on a particular numerical value called the point estimate of θ; i.e., in general, a realized value of an estimator is called an estimate.

Example
Suppose that the random variable X is normally distributed with an unknown mean μ. The sample mean is a point estimator of the unknown population mean μ; that is, μ̂ = x̄. After the sample has been selected, its numerical value is the point estimate of μ. Thus, if X1 = 25, X2 = 30, X3 = 29 and X4 = 31, the point estimate of μ is

x̄ = (25 + 30 + 29 + 31)/4 = 28.75

2.2. Methods of Point Estimation

In this section, we discuss two methods for obtaining point estimators: the method of moments and the method of maximum likelihood. Maximum likelihood estimates are generally preferable to moment estimators because they have better efficiency properties. However, moment estimators are sometimes easier to compute.

2.2.1. Method of Moments

The general idea behind the method of moments is to equate population moments, which are defined in terms of expected values, to the corresponding sample moments. The population moments will be functions of the unknown parameters. These equations are then solved to yield estimators of the unknown parameters.
Let X1, X2, ..., Xn be a random sample from the probability distribution f(x), where f(x) can be a discrete probability mass function or a continuous probability density function.
The k-th population moment (or distribution moment) is μ'_k = E(X^k), k = 1, 2, .... The corresponding k-th sample moment is M'_k = (1/n) Σ_{i=1}^n Xi^k, k = 1, 2, ....

Example 1
Suppose that X1, X2, ..., Xn is a random sample from an exponential distribution with parameter λ. Find the moment estimator of λ.

Solution
For the method of moments, μ'_k = M'_k, k = 1, 2, ....
Here k = 1. There is only one parameter to estimate, so we must equate E(X) to (1/n) Σ_{i=1}^n Xi.
For the exponential distribution, we know that

f(x, λ) = λ e^(−λx)

for x > 0, and zero otherwise.

For the population moment, we have

μ'_1 = E(X) = ∫_0^∞ x λ e^(−λx) dx = 1/λ

[Hint: use integration by parts or the gamma function (Γ) to show this.]
For the sample moment, we have M'_1 = (1/n) Σ_{i=1}^n xi = x̄.
Equating the population and sample moments leads to

x̄ = 1/λ  ⇒  λ̂ = 1/x̄

so 1/x̄ is the moment estimator of λ.
The implication of this result is that, to estimate λ by the method of moments when the density is exponential, we simply find the sample mean x̄ from the observations X1, X2, ..., Xn and use its reciprocal as the estimate of λ.
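
As a quick numerical illustration of this result (a sketch only; NumPy and the chosen rate λ = 0.5 are assumptions, not part of the notes), the moment estimate λ̂ = 1/x̄ can be computed directly from simulated exponential data:

import numpy as np

rng = np.random.default_rng(1)
true_lambda = 0.5                          # assumed value for the demonstration
x = rng.exponential(scale=1 / true_lambda, size=5000)

lambda_mme = 1 / x.mean()                  # moment estimate: reciprocal of the sample mean
print(lambda_mme)                          # close to 0.5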

Example 2
Suppose that X1 , X2 , ∙ ∙ ∙ , Xn is a random sample from a normal distribution
with parameters μ and σ 2 . Obtain the moment estimators of the two parameters.

Solution
For the normal distribution, E(X) = μ and E(X²) = μ² + σ². Equating E(X) to x̄ and E(X²) to (1/n) Σ_{i=1}^n Xi² gives

μ = x̄
and
μ² + σ² = (1/n) Σ_{i=1}^n Xi²

Solving these equations gives the moment estimators

μ̂ = x̄
and
σ̂² = (1/n) Σ_{i=1}^n Xi² − x̄² = (1/n) Σ_{i=1}^n (Xi − x̄)²


Example 3
Obtain the moment estimators of the parameters A and D in the Uniform distribution on the interval (A, A + D).

Solution
Recall that for X ∼ U(a, b),

f(x) = 1/(b − a)

for a < x < b, and zero elsewhere.
Using the interval given for this question, we have

f(x, A, D) = 1/((A + D) − A) = 1/D

for A < x < A + D, and zero elsewhere.

We need to find the sample moments and population moments and solve the resulting equations. For the first population moment,

E(X) = ∫_A^(A+D) (x/D) dx = A + D/2

For the first sample moment, M'_1 = (1/n) Σ_{i=1}^n Xi = x̄.
Equating the two yields

x̄ = Â + D̂/2     (2.1)
For the second population moment,

E(X²) = ∫_A^(A+D) (x²/D) dx = A² + AD + D²/3

For the second sample moment, M'_2 = (1/n) Σ_{i=1}^n Xi².
We know that

s² = (1/n) Σ_{i=1}^n (Xi − x̄)² = (1/n) Σ_{i=1}^n Xi² − x̄²

which implies that

(1/n) Σ_{i=1}^n Xi² = s² + x̄²

Equating these two second moments yields

s² + x̄² = Â² + ÂD̂ + D̂²/3     (2.2)
Squaring equation 2.1 yields

x̄² = Â² + ÂD̂ + D̂²/4     (2.3)

Subtracting equation 2.3 from equation 2.2 leads to

s² = D̂²/12     (2.4)
Solving equations 2.1 and 2.4 simultaneously yields

D̂ = s√12
and
 = x̄ − s√3

Exercise
1. Obtain the moment estimator for λ in the Poisson distribution with parameter λ. [Ans: λ̂ = x̄]

2. Let X1, X2, ..., Xn be a random sample from a uniform distribution on (μ − σ√3, μ + σ√3). Find the M.M.E. of μ and σ. [Ans: μ̂ = x̄, σ̂ = s]
2.2.2. Method of Maximum Likelihood

One of the best methods of obtaining a point estimator of a parameter is the method of maximum likelihood. This technique was developed in the 1920s by the famous British statistician Sir R. A. Fisher. As the name implies, the estimator will be the value of the parameter that maximizes the likelihood function.
The likelihood function of n random variables X1, X2, ..., Xn is defined to be the joint density function or joint mass function of the n random variables. In particular, suppose that X is a random variable with probability distribution f(x; θ), where θ is a single unknown parameter.
Let X1, X2, ..., Xn be the observed values in a random sample of size n. Then the likelihood function of the sample, L(x1, x2, ..., xn; θ) = L(x; θ) = L(θ), is defined as

L(θ) = f(x1; θ) · f(x2; θ) · · · f(xn; θ) = Π_{i=1}^n f(xi; θ)     (2.5)

Note that the likelihood function is now a function of only the unknown parameter.


Examples on finding likelihood functions.


Example 1
Suppose that f(x; θ) = e^(−θ) θ^x / x! for x = 0, 1, 2, ..., and zero elsewhere. Find the corresponding likelihood function.

Solution

L(x; θ) = L(θ) = Π_{i=1}^n f(xi; θ) = Π_{i=1}^n e^(−θ) θ^(xi) / xi! = e^(−nθ) θ^(Σ xi) / Π_{i=1}^n xi!

which is the likelihood function.

Example 2
Let f(x, λ) = λ e^(−λx) for x > 0, and zero elsewhere. Find the likelihood function.

Solution

L(λ) = Π_{i=1}^n f(xi; λ) = Π_{i=1}^n λ e^(−λxi) = λ^n e^(−λ Σ xi)

Remark
A statistic, say u(X1, X2, ..., Xn), such that when θ is replaced by it the likelihood function is a maximum, is called a maximum likelihood estimator of θ, denoted by θ̂_MLE. This means that the principle of Maximum Likelihood Estimation is to obtain the value of θ which maximises L(θ).
To maximize the likelihood with respect to a parameter, we differentiate L(θ) (or, equivalently, ln L(θ)) with respect to that parameter, equate the derivative to zero and solve for that parameter, i.e., we obtain the solution of

d/dθ [L(θ)] = 0     (2.6)

The value of θ that maximizes L(θ) also maximizes the log-likelihood function ln L(θ). A researcher therefore often maximizes ln L(θ) rather than L(θ), because many ln L(θ) functions are simpler, so that we may work with

d/dθ [log_e L(θ)] = d/dθ [ln L(θ)] = dl/dθ

NB: we write l = ln L(θ).
If the likelihood function contains k parameters, so that L(x, θ) = L(x, θ1, θ2, ..., θk), then the MLEs of the k parameters are the values of θi, i = 1(1)k, which maximise L(x, θ1, θ2, ..., θk).
The point where the likelihood function L(x, θ1, θ2, ..., θk) is a maximum is found by solving the k equations

∂l/∂θi = 0,  i = 1, 2, 3, ..., k     (2.7)

NB: these are k partial-derivative equations.

Example 1
Let X be exponentially distributed with parameter λ. Consider a random sample of size n, say X1, X2, ..., Xn. Find the maximum likelihood estimate of λ.

Since X is exponential, the PDF is f(x, λ) = λ e^(−λx).
The likelihood function for this case is given by

L(λ) = Π_{i=1}^n λ e^(−λxi) = λ^n e^(−λ Σ xi)     (2.8)
The log-likelihood is

ln L(λ) = n ln λ − λ Σ_{i=1}^n xi

d ln L(λ)/dλ = n/λ − Σ_{i=1}^n xi

Equating this to zero and solving for λ yields

λ̂ = n / Σ_{i=1}^n Xi = 1/x̄     (2.9)
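
The closed form λ̂ = 1/x̄ can be cross-checked by maximising the log-likelihood numerically. The sketch below is illustrative only; the use of SciPy's minimize_scalar, the simulated data and the search bounds are assumptions, not part of the notes.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=1000)   # true lambda = 0.5 (assumed)

def neg_log_lik(lam):
    # -ln L(lambda) = -(n ln(lambda) - lambda * sum(x))
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1 / x.mean())                  # numerical MLE versus closed form 1/x-bar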

Example 2
Given an independent random sample of size n from a normal distribution with mean μ and variance σ², obtain the MLEs of μ and σ².

Solution
For the Normal distribution, f(x, μ, σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)).
This has 2 parameters, hence k = 2 here.

L(θ) = Π_{i=1}^n (1/(σ√(2π))) e^(−(Xi−μ)²/(2σ²)) = (1/(2πσ²))^(n/2) e^(−(1/(2σ²)) Σ (Xi − μ)²)     (2.10)

l = −(n/2) ln σ² − (n/2) ln 2π − (1/(2σ²)) Σ (Xi − μ)²     (2.11)

∂l/∂σ² = −n/(2σ²) + Σ (Xi − μ)² / (2(σ²)²)     (2.12)

Equating the above equation to zero and solving yields

n/(2σ²) = Σ (Xi − μ)² / (2(σ²)²)     (2.13)

This yields

σ² = Σ (Xi − μ)² / n     (2.14)

Next:

∂l/∂μ = −(2/(2σ²)) Σ (Xi − μ)(−1) = (1/σ²) Σ (Xi − μ)     (2.15)

Equating the above equation to zero and solving yields
Σ (Xi − μ) = 0  ⇒  Σ Xi = nμ
This implies that
μ = Σ Xi / n = x̄

Hence:
μ̂ = x̄
and
σ̂² = Σ (Xi − x̄)² / n
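
Both MLEs can be read directly off a data set; note that σ̂² uses divisor n, not n − 1. A minimal sketch (NumPy assumed; the parameter values are illustrative only):

import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, scale=3.0, size=2000)   # mu = 5, sigma^2 = 9 (assumed)

mu_mle = x.mean()                               # MLE of mu is the sample mean
sigma2_mle = ((x - mu_mle) ** 2).mean()         # MLE of sigma^2 uses divisor n
print(mu_mle, sigma2_mle)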

Exercise
1. Find the MLE of θ for the population whose p.d.f. is given by f(x, θ) = 1/θ for 0 < x < θ, and zero elsewhere.

2. Suppose that X1, X2, ..., Xn is a random sample from a gamma distribution with p.d.f.

f(x, α) = (1/Γ(α)) x^(α−1) e^(−x) for x > 0, and zero elsewhere,

where α is unknown. Show that the MLE of α is the value of α that satisfies the equation

Γ'(α̂)/Γ(α̂) = (1/n) Σ_{i=1}^n ln xi

3. Properties of Good Estimators

3.1. Unbiasedness

The point estimator θ̂ is said to be an unbiased estimator of the parameter θ if E(θ̂) = θ.

Example 1
Suppose there is a random sample of size n from a population with mean θ. Verify that the sample mean x̄ is an unbiased estimator of θ.
Solution
x̄ is unbiased for θ if E(x̄) = θ.
Now x̄ = (1/n) Σ_{i=1}^n Xi, so that E(x̄) = E((1/n) Σ_{i=1}^n Xi) = (1/n) Σ_{i=1}^n E(Xi).
But by hypothesis, E(Xi) = θ.
Therefore (1/n) Σ_{i=1}^n E(Xi) = (1/n) Σ_{i=1}^n θ = nθ/n = θ.
Since E(x̄) = θ, x̄ is an unbiased estimator of θ.

Example 2
Let X1, X2, ..., Xn be a random sample from a normal population with mean θ and variance σ². Show that the statistic defined by

s² = (1/(n − 1)) Σ_{i=1}^n (xi − x̄)²

is an unbiased estimator of σ², where s² is the sample variance.


Solution
s² is unbiased for σ² if E(s²) = σ².
We have s² = (1/(n − 1)) Σ_{i=1}^n (Xi − x̄)², so (n − 1)s² = Σ_{i=1}^n (Xi − x̄)².
Dividing both sides by σ² yields

(n − 1)s²/σ² = Σ_{i=1}^n (Xi − x̄)²/σ² ∼ χ²_(n−1)

Recall that if X ∼ χ²_(n), then E(X) = n and Var(X) = 2n.
⇒ For X ∼ χ²_(n−1), E(X) = n − 1 and Var(X) = 2(n − 1).

E[(n − 1)s²/σ²] = E[Σ_{i=1}^n (Xi − x̄)²/σ²] = n − 1

so (n − 1)E(s²)/σ² = n − 1  ⇒  E(s²) = σ².
Since E(s²) = σ², s² is an unbiased estimator of σ².
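
A simulation makes the role of the divisor n − 1 visible: averaging s² over many samples approaches σ², while the divisor-n version is biased downward by σ²/n. The sketch below is illustrative only (NumPy assumed; the values of σ², n and the number of replications are arbitrary choices).

import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 4.0, 10, 200_000              # assumed values

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)       # divisor n - 1
s2_biased = samples.var(axis=1, ddof=0)         # divisor n

print(s2_unbiased.mean())                       # close to 4.0
print(s2_biased.mean())                         # close to 4.0 * (n - 1) / n = 3.6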

3.2. Consistency

A good estimator should be one for which the accuracy (precision) becomes higher as the sample size increases. An estimator which conforms to this expectation is termed a consistent estimator.
Let

θ̂1 be an estimator of θ based on sample I
θ̂2 be an estimator of θ based on sample II
θ̂3 be an estimator of θ based on sample III
...
θ̂n be an estimator of θ based on sample n

Clearly θ̂1, θ̂2, ..., θ̂n is a sequence of estimators of θ.
3.2.1. Strong Consistency

A sequence of estimators θ̂i, i = 1, 2, ... is called a strongly consistent sequence of estimators iff

lim_{n→∞} E[(θ̂n − θ)²] = 0,  ∀ θ ∈ Ω

where Ω is the parameter space.
This is also known as a mean square consistent sequence of estimators.
Now:

E(θ̂n − θ)² = E[θ̂n − E(θ̂n) + E(θ̂n) − θ]²
           = E[θ̂n − E(θ̂n)]² + [E(θ̂n) − θ]²   (the cross term vanishes since E[θ̂n − E(θ̂n)] = 0)
           = Var(θ̂n) + [Bias(θ̂n)]²

This means that MSE(θ̂n) = Var(θ̂n) + [Bias(θ̂n)]².
Thus the sequence of estimators θ̂i, i = 1, 2, ... is strongly consistent iff
(i) lim_{n→∞} Var(θ̂n) = 0, and
(ii) lim_{n→∞} Bias(θ̂n) = 0.
Example 1
Suppose that X1, X2, ..., Xn is a random sample of size n from a normal population with mean μ and variance σ².
Let x̄ = (1/n) Σ_{i=1}^n Xi and s² = (1/n) Σ_{i=1}^n (Xi − x̄)² be the sample mean and sample variance respectively. Show that x̄ and s² are mean square consistent estimators of μ and σ² respectively.

Solution
First we deal with x̄.
Here μ̂ = x̄ and Var(x̄) = Var((1/n) Σ_{i=1}^n Xi) = (1/n²) Σ_{i=1}^n Var(Xi) = nσ²/n² = σ²/n, so
lim_{n→∞} Var(x̄) = lim_{n→∞} σ²/n = 0.

Next,
Bias(x̄) = E(x̄) − μ, but E(x̄) = E((1/n) Σ_{i=1}^n Xi) = (1/n) Σ_{i=1}^n E(Xi) = nμ/n = μ.
Therefore Bias(x̄) = 0 and lim_{n→∞} Bias(x̄) = 0.
Since the two conditions for strong consistency are satisfied, we conclude that x̄ is a strongly consistent estimator of the population mean μ.
Now we proceed to check for s².
It is known that if s² = (1/n) Σ_{i=1}^n (Xi − x̄)², then

ns²/σ² = Σ_{i=1}^n (Xi − x̄)²/σ² ∼ χ²_(n−1)

Therefore

E(ns²/σ²) = n − 1  ⇒  E(s²) = ((n − 1)/n) σ² = (1 − 1/n) σ²

The bias of s² is E(s²) − σ² = (1 − 1/n)σ² − σ² = −σ²/n.
Hence lim_{n→∞} Bias(s²) = lim_{n→∞} (−σ²/n) = 0.
Also Var(ns²/σ²) = 2(n − 1), so Var(s²) = 2((n − 1)/n²)(σ²)² = 2((n − 1)/n²) σ⁴, and
lim_{n→∞} Var(s²) = lim_{n→∞} 2((n − 1)/n²) σ⁴ = 0.
Since the two conditions for strong consistency are satisfied, we conclude that s² is a strongly consistent estimator of the population variance σ².
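
The two conditions (vanishing variance and vanishing bias) can be observed numerically by estimating MSE(x̄) and MSE(s²) for increasing n. The following sketch is illustrative only (NumPy assumed; the grid of sample sizes and the number of replications are arbitrary choices):

import numpy as np

rng = np.random.default_rng(6)
mu, sigma2, reps = 0.0, 1.0, 10_000             # assumed values

for n in (10, 100, 1000):
    samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    xbar = samples.mean(axis=1)
    s2 = samples.var(axis=1, ddof=0)            # divisor n, as in this section
    mse_xbar = ((xbar - mu) ** 2).mean()
    mse_s2 = ((s2 - sigma2) ** 2).mean()
    print(n, mse_xbar, mse_s2)                  # both MSEs shrink toward 0 as n grows
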
3.2.2. Weak Consistency

Weak consistency is considered when θ̂n approaches θ, as n increases, in the probabilistic sense.
A sequence of estimators θ̂i, i = 1, 2, ... is called a weakly consistent sequence of estimators, or a simple consistent sequence of estimators, of θ iff for each positive number ε we have

lim_{n→∞} Pr[|θ̂n − θ| ≥ ε] = 0  ∀ θ ∈ Ω

or, equivalently,

lim_{n→∞} Pr[|θ̂n − θ| < ε] = 1  ∀ θ ∈ Ω

Example

Suppose that X1, X2, ..., Xn is a random sample of size n from a population with mean μ and variance σ². Show that the sample mean is a weakly consistent estimator of μ.

Solution
Var(x̄) = Var((1/n) Σ_{i=1}^n Xi) = (1/n²) Σ_{i=1}^n Var(Xi) = σ²/n.
Now |θ̂n − θ| = |x̄ − μ| and

Pr[|θ̂n − θ| < ε] = Pr[|x̄ − μ| < ε]

Now using Chebyshev's inequality, which we may state in our own context as: if sampling from a population with mean μ and finite variance, then

Pr[|x̄ − μ| < ε] ≥ 1 − Var(x̄)/ε²

so that after substitution we have

Pr[|x̄ − μ| < ε] ≥ 1 − σ²/(nε²)

⇒ lim_{n→∞} Pr[|x̄ − μ| < ε] ≥ lim_{n→∞} [1 − σ²/(nε²)] = 1, and since a probability cannot exceed 1, the limit is exactly 1.
Hence x̄ is a weakly consistent estimator of μ.
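
The Chebyshev argument can also be checked by estimating Pr[|x̄ − μ| ≥ ε] by simulation for increasing n and comparing it with the bound σ²/(nε²). The sketch below is illustrative only (NumPy assumed; ε and the other constants are arbitrary choices):

import numpy as np

rng = np.random.default_rng(7)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 10_000    # assumed values

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    p = np.mean(np.abs(xbar - mu) >= eps)       # estimate of Pr[|x-bar - mu| >= eps]
    bound = sigma**2 / (n * eps**2)             # Chebyshev bound on that probability
    print(n, p, min(bound, 1.0))                # p <= bound, and p -> 0 as n grows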

3.3. Sufficiency

Let T be a statistic and t be any particular value of T. T is said to be a sufficient statistic for the parameter θ if the conditional distribution of X1, X2, ..., Xn given T = t does not depend on θ. This means that T is sufficient if it contains all the information about θ that may be required.

Example
Suppose there is a random sample X1, X2, ..., Xn from a Poisson distribution with parameter λ. Verify that T = Σ_{i=1}^n Xi is sufficient for λ.

f(x, λ) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...

and zero elsewhere.
We need to determine the conditional distribution of X1, X2, ..., Xn given T = t, where T = Σ_{i=1}^n Xi.
This conditional distribution is given by

g(X | T = t) = L(X, t)/g(t) = L(X, t, λ)/g(t, λ)

where L is the likelihood function of the sample and g(t) is the marginal density function of T.

L(X, t, λ) = Pr(X1 = x1, X2 = x2, ..., Xn = xn; T = t)

Let
A = {X1 = x1, X2 = x2, ..., Xn = xn}
B = {T = t}
Clearly A ⊂ B, therefore A ∩ B = A and Pr(A ∩ B) = Pr(A).
Because of this, we only need to find L(X, λ):

L(X, λ) = Π_{i=1}^n f(xi, λ) = e^(−nλ) λ^(Σ xi) / Π_{i=1}^n xi! = e^(−nλ) λ^t / Π_{i=1}^n xi!

where t = Σ_{i=1}^n Xi.
Next, g(t, λ) = Pr(T = t). Now, if Xi ∼ Poisson(λ), then Σ_{i=1}^n Xi ∼ Poisson(nλ), so that

g(t, λ) = e^(−nλ) (nλ)^t / t!

Therefore

g(x | T = t) = [e^(−nλ) λ^t / Π_{i=1}^n xi!] / [e^(−nλ) (nλ)^t / t!] = t! / (Π_{i=1}^n xi! · n^t)     (3.1)

Since this result of the division is independent of λ, we conclude that T = Σ_{i=1}^n Xi is sufficient for λ.
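
The calculation can be mirrored numerically: for a fixed data vector, the ratio of the joint Poisson probability to the Poisson(nλ) probability of the total is the same for every λ, which is exactly the statement that the conditional distribution given T is free of λ. The sketch below is illustrative only (SciPy is assumed to be available, and the data vector and the two λ values are arbitrary choices):

import numpy as np
from math import factorial
from scipy.stats import poisson

x = np.array([2, 0, 3, 1, 4])                   # arbitrary illustrative observations
n, t = len(x), int(x.sum())

for lam in (0.7, 2.5):                          # two different parameter values
    joint = np.prod(poisson.pmf(x, lam))        # L(X, lambda)
    marginal = poisson.pmf(t, n * lam)          # g(t, lambda): sum Xi ~ Poisson(n*lambda)
    print(lam, joint / marginal)                # identical for both values of lambda

# Closed form from (3.1): t! / (prod(xi!) * n^t)
print(factorial(t) / (np.prod([factorial(int(k)) for k in x]) * n ** t))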

Exercises
1. Let X1, X2, ..., Xn be i.i.d. having the binomial distribution b(1, p). Let T = Σ_{i=1}^n Xi. Verify that T is sufficient for p.
2. Let X1, X2, ..., Xn be i.i.d. having the geometric distribution

f(x, θ) = θ^x (1 − θ),  x = 0, 1, 2, ...

Verify that T = Σ_{i=1}^n Xi is sufficient for θ.


3.3.1. Methods of Identifying Sufficient Statistics from a function

Given a function f(x, θ), with X a random variable and θ the governing parameter in f(x, θ), it is usually possible to identify/pick the sufficient statistic(s) if any exist. To achieve this, the following methods are usually employed.

1. The Neyman-Fisher Factorisation Criterion

Let L be the joint density of X1, X2, ..., Xn coming from the density f(x, θ). Let T = t(x) be any statistic. Then T = t(x) is sufficient for θ iff

L = q(t, θ) · h(x)

where h(x) is a function of the xi's alone, entirely independent of θ, whereas q(t, θ) depends on θ and may depend on the observations x only through the statistic T = t(x).
Example 1
Let X1, X2, ..., Xn form a random sample from a Bernoulli distribution for which the probability of success is θ. Use the factorisation criterion to confirm that T = Σ_{i=1}^n Xi is sufficient for θ.

Solution
f(x, θ) = θ^x (1 − θ)^(1−x), x = 0, 1, and zero elsewhere.

L(x, θ) = Π_{i=1}^n θ^(xi) (1 − θ)^(1−xi) = θ^(Σ xi) (1 − θ)^(n − Σ xi)

But T = Σ_{i=1}^n xi, so L(x, θ) = θ^t (1 − θ)^(n−t).
The above L(x, θ) is already of the form q(t, θ) h(x), where
q(t, θ) = θ^t (1 − θ)^(n−t)
and
h(x) = 1.
This verifies that T = Σ_{i=1}^n xi is sufficient for θ.
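
The factorisation can be verified numerically: computing the likelihood observation by observation gives the same value as q(t, θ) with t = Σ xi and h(x) = 1. A minimal sketch (NumPy assumed; the 0/1 data vector and the value of θ are arbitrary illustrative choices):

import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])    # arbitrary Bernoulli observations
theta = 0.4                                     # any value in (0, 1)
n, t = len(x), x.sum()

L_direct = np.prod(theta ** x * (1 - theta) ** (1 - x))   # product of theta^xi (1-theta)^(1-xi)
q = theta ** t * (1 - theta) ** (n - t)                   # q(t, theta), with h(x) = 1
print(L_direct, q)                              # equal up to floating-point rounding
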
Example 2
Let X1, X2, ..., Xn be a random sample from a continuous distribution with PDF

f(x, θ) = θ x^(θ−1),  0 < x < 1

Show that T = Π_{i=1}^n Xi is sufficient for θ.

L(x, θ) = Π_{i=1}^n θ xi^(θ−1) = θ^n (Π_{i=1}^n xi)^(θ−1)

But T = Π_{i=1}^n xi, so that L(x, θ) = θ^n t^(θ−1).
This is in the form q(t, θ) h(x) where q(t, θ) = θ^n t^(θ−1) while h(x) = 1.
Hence T = Π_{i=1}^n xi is sufficient for θ.

Exercise
Let X be a normally distributed random variable with mean μ and variance σ², where σ² is known. Find the MLE of μ and examine it for sufficiency. Further, assuming that μ is known, derive the MLE of σ² and examine it for sufficiency.

Remarks
The factorisation criterion is best used in cases of multi-parameter functions in which we have joint sufficiency. By extension, the statistics T1, T2, ..., Tk are said to be jointly sufficient for θ1, θ2, ..., θk iff the joint p.d.f. factorises as
L(x, θ1, θ2, ..., θk) = q(t, θ1, θ2, ..., θk) h(x)
where t = (t1, t2, t3, ..., tk).

Example
Let X1, X2, ..., Xn be a random sample from a normal distribution with both μ and σ² unknown. Verify that T1 = x̄ and T2 = Σ (xi − x̄)² are jointly sufficient for μ and σ².

Solution
f(x, μ, σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))

L(x, μ, σ²) = (1/(2πσ²))^(n/2) e^(−(1/(2σ²)) Σ_{i=1}^n (xi − μ)²)

but

Σ_{i=1}^n (xi − μ)² = Σ_{i=1}^n (xi − x̄ + x̄ − μ)² = Σ_{i=1}^n (xi − x̄)² + n(x̄ − μ)² = t2 + n(t1 − μ)²

This means that

L(x, μ, σ²) = (1/(2πσ²))^(n/2) e^(−(1/(2σ²))[t2 + n(t1 − μ)²])

which is in the form q(t1, t2, μ, σ²) h(x),
where q(t1, t2, μ, σ²) = (1/(2πσ²))^(n/2) e^(−(1/(2σ²))[t2 + n(t1 − μ)²]) and h(x) = 1.
Hence we conclude that T1 and T2 are jointly sufficient for μ and σ².

Example

Let X1, X2, ..., Xn be a random sample from a uniform distribution on the interval (θ1, θ2), where θ1 < θ2, with both θ1 and θ2 unknown. Show that the statistics Y1 = min(X1, X2, ..., Xn) and Y2 = max(X1, X2, ..., Xn) are jointly sufficient for θ1 and θ2.
2. Sufficiency and the Exponential Family
(a) The one parameter exponential family
The family {f(x, θ), θ ∈ Ω} is said to be a one parameter exponential family if f(x, θ) can be written, for all values of θ ∈ Ω and all x, in the form

f(x, θ) = C(θ) m(x) exp[φ(θ) ρ(x)]     (1)

where C(θ) and φ(θ) are arbitrary functions of θ while m(x) and ρ(x) are arbitrary functions of x. If a p.d.f. takes the structure of equation (1), then T = Σ_{i=1}^n ρ(xi) is a sufficient statistic for θ.
Also

L(x, θ) = [C(θ)]^n [Π_{i=1}^n m(xi)] exp[φ(θ) Σ_{i=1}^n ρ(xi)]

Example
Let X1, X2, ..., Xn denote a random sample from a Bernoulli distribution having PDF

f(x, θ) = θ^x (1 − θ)^(1−x),  x = 0, 1,  0 < θ < 1

Show that the family of Bernoulli distributions belongs to the one parameter exponential family.

Solution
f(x, θ) = θ^x (1 − θ)^(1−x) = (1 − θ) (θ/(1 − θ))^x,
which may further be written as (1 − θ) exp[x ln(θ/(1 − θ))].
This is of the form specified in equation (1) with C(θ) = 1 − θ, m(x) = 1, φ(θ) = ln(θ/(1 − θ)) and ρ(x) = x.
Thus the family of Bernoulli distributions belongs to the one parameter exponential family, and Σ_{i=1}^n ρ(xi) = Σ_{i=1}^n xi is the sufficient statistic for θ.

Exercise
Suppose X1, X2, ..., Xn form a random sample from a Poisson distribution with parameter θ > 0 (so x = 0, 1, 2, ...). Show that the family of Poisson distributions belongs to the one parameter exponential family and determine a sufficient statistic for θ.
Possible answers:
(i) f(x, θ) = (1/(x! e^θ)) exp(x ln θ), so that
C(θ) = e^(−θ), m(x) = 1/x!, φ(θ) = ln θ, ρ(x) = x
⇒ T = Σ_{i=1}^n ρ(xi) = Σ_{i=1}^n xi is sufficient for θ.

(ii) Equivalently, writing the p.m.f. as exp(x ln θ − ln(x!) − θ) and pulling the factors free of the product φ(θ)ρ(x) out of the exponent gives the same identification: C(θ) = e^(−θ), m(x) = 1/x!, φ(θ) = ln θ, ρ(x) = x.
Hence, accordingly, Σ_{i=1}^n ρ(xi) = Σ_{i=1}^n xi is sufficient for θ.

(b) The multi-parameter exponential family

The family {f(x, θ), θ ∈ Ω}, where θ is a vector consisting of k components, is said to be a k parameter exponential family if

f(x, θ) = C(θ) m(x) exp[Σ_{j=1}^k φj(θ) ρj(x)]     (2)

for all θ ∈ Ω and for all values of x.
Accordingly,

T1 = Σ_{i=1}^n ρ1(xi), T2 = Σ_{i=1}^n ρ2(xi), ..., Tk = Σ_{i=1}^n ρk(xi)

are said to be jointly sufficient for θ1, θ2, ..., θk.

Example
Let X1, X2, ..., Xn be a random sample from X ∼ N(μ, σ²), with both parameters unknown. Verify that this family of normal distributions belongs to the two parameter exponential family and hence determine the jointly sufficient statistics for μ and σ².
Solution
f(x, μ, σ²) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), which is easily rewritten as

f(x, μ, σ²) = (1/(2πσ²))^(1/2) e^(−μ²/(2σ²)) exp[(μ/σ²) x − (1/(2σ²)) x²]     (**)

since (x − μ)² = x² − 2xμ + μ² implies −(x − μ)²/(2σ²) = −x²/(2σ²) + μx/σ² − μ²/(2σ²).

(**) is already in the form of equation (2) with

φ1(μ, σ²) = μ/σ²,  φ2(μ, σ²) = −1/(2σ²)

ρ1(x) = x, ρ2(x) = x², m(x) = 1, while C(θ) = C(μ, σ²) = (1/(2πσ²))^(1/2) e^(−μ²/(2σ²)).
Hence the family of normal distributions belongs to a two parameter exponential family and therefore the jointly sufficient statistics for μ and σ² are

T1 = Σ_{i=1}^n ρ1(xi) = Σ_{i=1}^n xi  and  T2 = Σ_{i=1}^n ρ2(xi) = Σ_{i=1}^n xi²

Exercise
Let X1, X2, ..., Xn denote a random sample from a Beta distribution with parameters α > 0 and β > 0; show that the family of Beta distributions with both α and β unknown belongs to a two parameter exponential family. Also show that the family of Gamma densities

f(x; α, β) = β^α x^(α−1) e^(−βx) / Γ(α),  x > 0

where α is known but β is unknown, belongs to a one parameter exponential family, and hence determine a sufficient statistic for β.

3.4. Completeness

Consider a family of densities {f(x, θ), θ ∈ Ω} on the sample space denoted by Θ. The family is said to be complete if, for any function of x, say φ(x), whose expected value exists and whose variance is finite, E[φ(x)] = 0 ∀ θ ∈ Ω implies that φ(x) = 0.

Example
Show that the binomial family of densities is complete.
Solution
If X ∼ Bin(n, θ), then

f(x, θ) = C(n, x) θ^x (1 − θ)^(n−x)

Let φ(x) be any function of x; then E[φ(x)] = Σ_{x=0}^n φ(x) C(n, x) θ^x (1 − θ)^(n−x).
Let a(x) = φ(x) C(n, x), so that we have

E[φ(x)] = Σ_{x=0}^n a(x) θ^x (1 − θ)^(n−x)

For completeness we require E[φ(x)] = 0 for all θ, i.e.

Σ_{x=0}^n a(x) θ^x (1 − θ)^(n−x) = 0

but this is a polynomial of order n in θ:

a(0) θ^0 (1 − θ)^n + a(1) θ^1 (1 − θ)^(n−1) + a(2) θ²(1 − θ)^(n−2) + ... + a(n) θ^n (1 − θ)^0 = 0

If this equation is to hold for all θ, then all the coefficients a(x) must be identically equal to zero, i.e. a(x) = 0 ∀ x ⇒ φ(x) C(n, x) = 0; but C(n, x) ≠ 0, which means that φ(x) = 0.
Hence E[φ(x)] = 0 for all θ implies φ(x) = 0, and therefore the family of binomial densities is complete.
Exercise
Show that the family of Poisson densities is complete.

