3 The Rao-Blackwell Theorem

    Variance is what any two statisticians are at.

3.1 Mean squared error

A good estimator should take values close to the true value of the parameter it is attempting to estimate. If θ̂ is an unbiased estimator of θ then E(θ̂ − θ)² is the variance of θ̂. If θ̂ is a biased estimator of θ then E(θ̂ − θ)² is no longer the variance of θ̂, but it is still useful as a measure of the mean squared error (MSE) of θ̂.

Example 3.1 Consider the estimators in Example 1.3. Each is unbiased, so its MSE is just its variance.

    var(p̂) = var((1/n)(X₁ + · · · + Xₙ)) = (var(X₁) + · · · + var(Xₙ))/n² = np(1 − p)/n² = p(1 − p)/n

    var(p̃) = var((1/3)(X₁ + 2X₂)) = (var(X₁) + 4 var(X₂))/9 = 5p(1 − p)/9

Not surprisingly, var(p̂) < var(p̃). In fact, var(p̂)/var(p̃) → 0 as n → ∞.

Note that p̂ is the MLE of p. Another possible unbiased estimator would be

    p* = (X₁ + 2X₂ + · · · + nXₙ)/(½n(n + 1))

with variance

    var(p*) = (1² + 2² + · · · + n²)/(½n(n + 1))² p(1 − p) = 2(2n + 1)/(3n(n + 1)) p(1 − p).

Here var(p̂)/var(p*) → 3/4.
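These variance formulas are easy to check by simulation. Below is a minimal Monte Carlo sketch (not part of the notes; the values of n, p and the random seed are arbitrary choices) comparing the three unbiased estimators:

    # Monte Carlo check of Example 3.1 (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, p, trials = 20, 0.3, 100_000
    X = rng.binomial(1, p, size=(trials, n))          # Bernoulli(p) samples

    p_hat = X.mean(axis=1)                            # (X1 + ... + Xn)/n
    p_tilde = (X[:, 0] + 2 * X[:, 1]) / 3             # (X1 + 2 X2)/3
    p_star = X @ np.arange(1, n + 1) / (n * (n + 1) / 2)

    for name, est, theory in [
        ("p_hat", p_hat, p * (1 - p) / n),
        ("p_tilde", p_tilde, 5 * p * (1 - p) / 9),
        ("p_star", p_star, 2 * (2 * n + 1) * p * (1 - p) / (3 * n * (n + 1))),
    ]:
        print(f"{name}: empirical var {est.var():.5f}, theoretical {theory:.5f}")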
The next example shows that neither an MLE nor an unbiased estimator necessarily minimizes the mean squared error.

Example 3.2 Suppose X₁, . . . , Xₙ ∼ N(µ, σ²), with µ and σ² unknown and to be estimated. To find the MLEs we consider

    log f(x | µ, σ²) = log ∏_{i=1}^n (2πσ²)^{−1/2} e^{−(xᵢ−µ)²/2σ²} = −(n/2) log(2πσ²) − (1/2σ²) ∑_{i=1}^n (xᵢ − µ)².

This is maximized where ∂(log f)/∂µ = 0 and ∂(log f)/∂σ² = 0. So

    (1/σ̂²) ∑_{i=1}^n (xᵢ − µ̂) = 0,  and  −n/(2σ̂²) + (1/2σ̂⁴) ∑_{i=1}^n (xᵢ − µ̂)² = 0,

giving

    µ̂ = X̄ = (1/n) ∑_{i=1}^n Xᵢ,  σ̂² = S_XX/n,  where S_XX := ∑_{i=1}^n (Xᵢ − X̄)².

It is easy to check that µ̂ is unbiased. As regards σ̂², note that

    E ∑_{i=1}^n (Xᵢ − X̄)² = E ∑_{i=1}^n (Xᵢ − µ + µ − X̄)² = E ∑_{i=1}^n (Xᵢ − µ)² − nE(µ − X̄)² = nσ² − n(σ²/n) = (n − 1)σ²,

so σ̂² is biased. An unbiased estimator is s² = S_XX/(n − 1).

Let us consider an estimator of the form λS_XX. Above we see S_XX has mean (n − 1)σ², and later we will see that its variance is 2(n − 1)σ⁴. So

    E(λS_XX − σ²)² = [2(n − 1)σ⁴ + (n − 1)²σ⁴]λ² − 2(n − 1)σ⁴λ + σ⁴.

Setting the derivative with respect to λ to zero, this is minimized by λ = (n − 1)/[2(n − 1) + (n − 1)²] = 1/(n + 1). Thus the estimator which minimizes the mean squared error is S_XX/(n + 1), and this is neither the MLE nor unbiased. Of course there is little difference between any of these estimators when n is large.

Note that E[σ̂²] → σ² as n → ∞. So again the MLE is asymptotically unbiased.
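The minimizing divisor n + 1 can also be seen empirically. Here is a short sketch (my own illustration; the values of n, σ² and the seed are arbitrary) comparing the MSEs of S_XX/n, S_XX/(n − 1) and S_XX/(n + 1):

    # MSE comparison for Example 3.2 (illustrative sketch only).
    import numpy as np

    rng = np.random.default_rng(1)
    n, mu, sigma2, trials = 10, 0.0, 4.0, 200_000
    X = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
    SXX = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    for divisor in (n, n - 1, n + 1):                 # MLE, unbiased, minimum-MSE
        mse = ((SXX / divisor - sigma2) ** 2).mean()
        print(f"divisor {divisor}: MSE {mse:.4f}")
    # The smallest MSE appears for divisor n + 1, as the calculation predicts.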
3.2 The Rao-Blackwell theorem

The following theorem says that if we want an estimator with small MSE we can confine our search to estimators which are functions of the sufficient statistic.

Theorem 3.3 (Rao-Blackwell Theorem) Let θ̂ be an estimator of θ with E(θ̂²) < ∞ for all θ. Suppose that T is sufficient for θ, and let θ* = E(θ̂ | T). Then for all θ,

    E(θ* − θ)² ≤ E(θ̂ − θ)².

The inequality is strict unless θ̂ is a function of T.

Proof.

    E(θ* − θ)² = E[E(θ̂ | T) − θ]² = E[E(θ̂ − θ | T)]² ≤ E[E((θ̂ − θ)² | T)] = E(θ̂ − θ)².

The outer expectation is being taken with respect to T. The inequality follows from the fact that for any RV W, var(W) = E(W²) − (EW)² ≥ 0. We put W = (θ̂ − θ | T) and note that there is equality only if var(W) = 0, i.e., θ̂ − θ can take just one value for each value of T, or in other words, θ̂ is a function of T.
Note that if θ̂ is unbiased then θ* is also unbiased, since

    E θ* = E[E(θ̂ | T)] = E θ̂ = θ.

We now have a quantitative rationale for basing estimators on sufficient statistics: if an estimator is not a function of a sufficient statistic, then there is another estimator which is a function of the sufficient statistic and which is at least as good, in the sense of mean squared error of estimation.

Examples 3.4

(a) X₁, . . . , Xₙ ∼ P(λ), λ to be estimated.

In Example 2.3 (a) we saw that a sufficient statistic is ∑ᵢ xᵢ. Suppose we start with the unbiased estimator λ̃ = X₁. Then ‘Rao–Blackwellization’ gives

    λ* = E[X₁ | ∑ᵢ Xᵢ = t].

But

    ∑ᵢ E[Xᵢ | ∑ⱼ Xⱼ = t] = E[∑ᵢ Xᵢ | ∑ᵢ Xᵢ = t] = t.

By the fact that X₁, . . . , Xₙ are IID, every term within the sum on the l.h.s. must be the same, and hence equal to t/n. Thus we recover the estimator λ* = λ̂ = X̄.
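The conditional expectation in (a) can be estimated directly by simulation: draw many samples, keep those whose total equals t, and average X₁ over them. A small sketch (my own; λ, n and t are arbitrary):

    # Empirical check of E[X1 | sum = t] = t/n in Example 3.4 (a).
    import numpy as np

    rng = np.random.default_rng(2)
    n, lam, trials, t = 5, 2.0, 500_000, 10
    X = rng.poisson(lam, size=(trials, n))
    mask = X.sum(axis=1) == t            # condition on the sufficient statistic
    print("E[X1 | sum = t] ~", X[mask, 0].mean(), "  t/n =", t / n)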
(b) X₁, . . . , Xₙ ∼ P(λ), θ = e^{−λ} to be estimated.

Now θ = P(X₁ = 0). So a simple unbiased estimator is θ̂ = 1{X₁ = 0}. Then

    θ* = E[1{X₁ = 0} | ∑_{i=1}^n Xᵢ = t] = P(X₁ = 0 | ∑_{i=1}^n Xᵢ = t)
       = P(X₁ = 0; ∑_{i=2}^n Xᵢ = t) / P(∑_{i=1}^n Xᵢ = t)
       = e^{−λ} [((n − 1)λ)^t e^{−(n−1)λ}/t!] / [(nλ)^t e^{−nλ}/t!]
       = ((n − 1)/n)^t.

Since θ̂ is unbiased, so is θ*. As it should be, θ* is only a function of t. If you do Rao-Blackwellization and you do not get just a function of t then you have made a mistake.
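A quick simulation (again a sketch of my own, with arbitrary λ and n) confirms that both estimators in (b) are unbiased for θ = e^{−λ} but that the Rao-Blackwellized one has much smaller variance:

    # Variance reduction in Example 3.4 (b).
    import numpy as np

    rng = np.random.default_rng(3)
    n, lam, trials = 10, 1.5, 200_000
    X = rng.poisson(lam, size=(trials, n))
    T = X.sum(axis=1)                          # sufficient statistic

    theta_hat = (X[:, 0] == 0).astype(float)   # 1{X1 = 0}
    theta_star = ((n - 1) / n) ** T            # ((n-1)/n)^T
    print("theta          =", np.exp(-lam))
    print("theta_hat : mean", theta_hat.mean(), " var", theta_hat.var())
    print("theta_star: mean", theta_star.mean(), " var", theta_star.var())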
(c) X₁, . . . , Xₙ ∼ U[0, θ], θ to be estimated.

In Example 2.3 (c) we saw that a sufficient statistic is maxᵢ xᵢ. Suppose we start with the unbiased estimator θ̃ = 2X₁. Rao–Blackwellization gives

    θ* = E[2X₁ | maxᵢ Xᵢ = t] = 2[(1/n)t + ((n − 1)/n)(t/2)] = ((n + 1)/n)t.

This is an unbiased estimator of θ. In the above calculation we use the idea that X₁ = maxᵢ Xᵢ with probability 1/n, and if X₁ is not the maximum then its expected value is half the maximum. Note that the MLE θ̂ = maxᵢ Xᵢ is biased.
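The same pattern shows up numerically in (c): both 2X₁ and (n + 1) maxᵢ Xᵢ/n are unbiased, the latter with far smaller variance, while the MLE maxᵢ Xᵢ is biased low. A sketch (my own; θ and n arbitrary):

    # Example 3.4 (c): comparing 2*X1, (n+1)*max/n and the MLE max.
    import numpy as np

    rng = np.random.default_rng(4)
    n, theta, trials = 10, 3.0, 200_000
    X = rng.uniform(0, theta, size=(trials, n))
    M = X.max(axis=1)                          # sufficient statistic (and MLE)

    t1 = 2 * X[:, 0]                           # unbiased, high variance
    t2 = (n + 1) * M / n                       # Rao-Blackwellized, unbiased
    print("2*X1     : mean", t1.mean(), " var", t1.var())
    print("(n+1)M/n : mean", t2.mean(), " var", t2.var())
    print("MLE M    : mean", M.mean(), "  (E M = n*theta/(n+1) =", n * theta / (n + 1), ")")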
3.3 Consistency and asymptotic efficiency*

Two further properties of maximum likelihood estimators are consistency and asymptotic efficiency. Suppose θ̂ is the MLE of θ.

To say that θ̂ is consistent means that

    P(|θ̂ − θ| > ε) → 0 as n → ∞.

In Example 3.1 this is just the weak law of large numbers:

    P(|(X₁ + · · · + Xₙ)/n − p| > ε) → 0.

It can be shown that var(θ̃) ≥ 1/nI(θ) for any unbiased estimator θ̃, where I(θ) is the Fisher information and 1/nI(θ) is called the Cramér-Rao lower bound. To say that θ̂ is asymptotically efficient means that

    lim_{n→∞} var(θ̂)/[1/nI(θ)] = 1.

The MLE is asymptotically efficient and so asymptotically of minimum variance.
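Both properties are visible in the Bernoulli case of Example 3.1, where I(p) = 1/[p(1 − p)], so the Cramér-Rao bound is p(1 − p)/n and is attained exactly by p̂ for every n. A numerical sketch (my own; the values of p and n are arbitrary):

    # Consistency and the Cramer-Rao bound for the Bernoulli MLE p_hat.
    import numpy as np

    rng = np.random.default_rng(5)
    p, trials = 0.3, 100_000
    for n in (10, 100, 1000):
        p_hat = rng.binomial(n, p, size=trials) / n
        bound = p * (1 - p) / n                # 1/(n I(p)) with I(p) = 1/(p(1-p))
        print(f"n={n}: var(p_hat) {p_hat.var():.6f}, CR bound {bound:.6f}")
    # The variance matches the bound and shrinks to 0 as n grows.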
3.4 Maximum likelihood and decision-making

We have seen that the MLE is a function of the sufficient statistic, asymptotically unbiased, consistent and asymptotically efficient. These are nice properties. But consider the following example.

Example 3.5 You and a friend have agreed to meet sometime just after 12 noon. You have arrived at noon, have waited 5 minutes and your friend has not shown up. You believe that either your friend will arrive at X minutes past 12, where you believe X is exponentially distributed with an unknown parameter λ, λ > 0, or that she has completely forgotten and will not show up at all. We can associate the latter event with the parameter value λ = 0. Then

    P(data | λ) = P(you wait at least 5 minutes | λ) = ∫₅^∞ λe^{−λt} dt = e^{−5λ}.

Thus the maximum likelihood estimator for λ is λ̂ = 0. If you base your decision as to whether or not you should wait a bit longer only upon the maximum likelihood estimator of λ, then you will estimate that your friend will never arrive and decide not to wait. This argument holds even if you have only waited 1 second.

The above analysis is unsatisfactory because we have not modelled the costs of either waiting in vain, or deciding not to wait but then having the friend turn up.
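The boundary maximum is easy to see: e^{−5λ} is strictly decreasing in λ ≥ 0, so the likelihood is largest at λ = 0. A tiny sketch (my own) tabulating it:

    # The likelihood in Example 3.5 decreases in lambda, so the MLE is 0.
    import numpy as np

    waited = 5.0                               # minutes waited so far
    for lam in np.linspace(0.0, 2.0, 5):
        print(f"lambda = {lam:.2f}: P(wait >= {waited:.0f} | lambda) = {np.exp(-waited * lam):.4f}")
    # The maximizer is lambda = 0, however long you have actually waited.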