
Stein’s Paradox

Dr Richard J. Samworth, Statslab Cambridge

Perhaps the most surprising result in Statistics arises in a remarkably simple estimation problem. Let X1, …, Xp be independent random variables, with Xi ∼ N(θi, 1) for i = 1, …, p. Writing X = (X1, …, Xp)T, suppose we want to find a good estimator θ̂ = θ̂(X) of θ = (θ1, …, θp)T. To define more precisely what is meant by a good estimator, we use the language of statistical decision theory. We introduce a loss function L(θ̂, θ), which measures the loss incurred when the true value of our unknown parameter is θ, and we estimate it by θ̂. We will be particularly interested in the squared error loss function L(θ̂, θ) = ‖θ̂ − θ‖², where ‖·‖ denotes the Euclidean norm, but other choices, such as the absolute error loss L(θ̂, θ) = ∑_{i=1}^p |θ̂i − θi|, are of course perfectly possible.

Now L(θ̂, θ) is a random quantity, which is not ideal for comparing the overall performance of two different estimators (as opposed to the losses they each incur on a particular data set). We therefore introduce the risk function

R(θ̂, θ) = E{L(θ̂(X), θ)}.

If θ̂ and θ̃ are both estimators of θ, we say θ̂ strictly dominates θ̃ if R(θ̂, θ) ≤ R(θ̃, θ) for all θ, with strict inequality for some value of θ. In this case, we say θ̃ is inadmissible. If θ̂ is not strictly dominated by any estimator of θ, it is said to be admissible. Notice that admissible estimators are not necessarily sensible: for instance, in our problem above with p = 1 and the squared error loss function, the estimator θ̂ = 37 (which ignores the data!) is admissible. On the other hand, decision theory dictates that inadmissible estimators can be discarded, and that we should restrict our choice of estimator to the set of admissible ones.

This discussion may seem like overkill in this simple problem, because there is a very obvious estimator of θ: since all the components of X are independent, and E(Xi) = θi (in other words, Xi is an unbiased estimator of θi), why not just use θ̂0(X) = X? Indeed, this estimator appears to have several desirable properties (for example, it is the maximum likelihood estimator and the uniform minimum variance unbiased estimator), and by the early 1950s, three proofs had emerged to show that θ̂0 is admissible for squared error loss when p = 1. Nevertheless, Stein (1956) stunned the statistical world when he proved that, although θ̂0 is admissible for squared error loss when p = 2, it is inadmissible when p ≥ 3. In fact, James and Stein (1961) showed that the estimator

θ̂JS = (1 − (p − 2)/‖X‖²) X

strictly dominates θ̂0. The proof of this remarkable fact is relatively straightforward, and is given in the Appendix.
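The dominance claim is easy to probe numerically. The following sketch (illustrative code, not from the article; it assumes only the James–Stein formula quoted above) estimates the risks of θ̂0 and θ̂JS by Monte Carlo for p = 5 at θ = 0, where the simulated values should come out near 5 and 2 respectively.

```python
import random

def risk(estimator, theta, reps=20000, seed=0):
    """Monte Carlo estimate of R(theta_hat, theta) = E ||theta_hat(X) - theta||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]  # X_i ~ N(theta_i, 1)
        est = estimator(x)
        total += sum((e - t) ** 2 for e, t in zip(est, theta))
    return total / reps

def usual(x):
    # theta_hat_0(X) = X
    return x

def james_stein(x):
    # theta_hat_JS(X) = (1 - (p - 2)/||X||^2) X
    p = len(x)
    s = sum(xi * xi for xi in x)
    return [(1.0 - (p - 2) / s) * xi for xi in x]

theta = [0.0] * 5
print(risk(usual, theta), risk(james_stein, theta))  # roughly 5 and 2
```

Swapping in any other θ of length p ≥ 3 shows the same strict ordering, with the gap shrinking as ‖θ‖ grows.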

One of the things that is so surprising about this result is that even though all of the components of X are independent, the ith component of θ̂JS depends on all of the components of X. To give an unusual example to emphasise the point, suppose that we were interested in estimating the proportion of the US electorate who will vote for Barack Obama, the proportion of babies born in China that are girls and the proportion of Britons with light-coloured eyes. Then our James–Stein estimate of the proportion of Democratic voters depends on our hospital and eye colour data!

The reader might reasonably complain that in the above examples, the data would be binomially rather than normally distributed. However, one can easily transform binomially distributed data so that it is well approximated by a normal distribution with unit variance (see the baseball example below), and then consider the estimation problem on the transformed scale, before applying the inverse transform.

Geometrically, the James–Stein estimator shrinks each component of X towards the origin, and it is therefore not particularly surprising that the biggest improvement in risk over θ̂0 comes when ‖θ‖ is close to zero; see Figure 1 for plots of the risk functions of θ̂0 and θ̂JS when p = 5. A simple calculation shows that R(θ̂JS, 0) = 2 for all p ≥ 2, so the improvement in risk can be substantial when p is moderate or large. In terms of choosing a point to shrink towards, though, there is nothing special about the origin, and we could equally well shrink towards any pre-chosen θ0 ∈ Rp using the estimator

θ̂JS_θ0 = θ0 + (1 − (p − 2)/‖X − θ0‖²)(X − θ0).

In this case, we have R(θ̂JS_θ0, θ) = R(θ̂JS, θ − θ0), so θ̂JS_θ0 still strictly dominates θ̂0 when p ≥ 3.

Note that the shrinkage factor in θ̂JS_θ0 becomes negative when ‖X − θ0‖² < p − 2, and indeed it can be proved that θ̂JS_θ0 is strictly dominated by the positive-part James–Stein estimator

θ̂+,JS_θ0 = θ0 + (1 − (p − 2)/‖X − θ0‖²)₊ (X − θ0),

where x₊ = max(x, 0). The risk of the positive-part James–Stein estimator θ̂+JS = θ̂+,JS_0 is also included in Figure 1 for comparison. Remarkably, even the positive-part James–Stein estimator is inadmissible, though it cannot be improved by much, and it took until Shao and Strawderman (1994) to find a (still inadmissible!) estimator to strictly dominate it.

Generalisations and Related Problems

It is natural to ask how crucial the normality and squared error loss assumptions are to the Stein phenomenon. As a consequence of many papers written since Stein's original masterpiece, it is now known that the normality assumption is not critical at all; similar (but more complicated) results can be proved for very wide classes of distributions. The original result can also be generalised to different loss functions, but there is an important caveat here: the Stein phenomenon only holds when we are interested in simultaneous estimation of all components of θ. If our loss function were L(θ̂, θ) = (θ̂1 − θ1)², for example, then we could not improve on θ̂0. This explains why it wouldn't make much sense to use the James–Stein estimator in our bizarre example above; it is inconceivable that we would be simultaneously interested in three such different quantities to the extent that we would want to incorporate all three estimation errors into our loss function.

Although Stein's result is very clean to state and prove, it may seem somewhat removed from practical statistical problems. Nevertheless, the idea at the heart of Stein's proposal, namely that of employing shrinkage to reduce variance (at the expense of introducing bias), turns out to be a very powerful one that has had a huge impact on statistical methodology. In particular, many modern statistical models may involve thousands or even millions of parameters (e.g. in microarray experiments in genetics, or fMRI studies in neuroimaging); in such circumstances, we would almost certainly want estimators to set some of the parameters to zero, not only to improve performance but also to ensure the interpretability of the fitted model.
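Returning to the estimators themselves: in code, the positive-part modification is a one-line change to the plain shrinkage rule. A minimal sketch (the function name is our own; the formula is the positive-part James–Stein estimator with shrink point θ0 described above):

```python
def js_positive_part(x, theta0):
    """Positive-part James-Stein estimate of theta, shrinking x towards theta0."""
    p = len(x)
    diff = [xi - ti for xi, ti in zip(x, theta0)]
    s = sum(d * d for d in diff)          # ||x - theta0||^2
    shrink = max(0.0, 1.0 - (p - 2) / s)  # (1 - (p - 2)/||x - theta0||^2)_+
    return [ti + shrink * d for ti, d in zip(theta0, diff)]

# When ||x - theta0||^2 < p - 2 the estimate collapses exactly to theta0:
print(js_positive_part([0.5, 0.5, 0.5], [0.0, 0.0, 0.0]))  # prints [0.0, 0.0, 0.0]
```

The `max(0.0, …)` truncation is precisely what prevents the negative shrinkage factor discussed in the text.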

[Figure 1: Risks with respect to squared error loss of the usual estimator θ̂0, the James–Stein estimator θ̂JS and the positive-part James–Stein estimator θ̂+JS when p = 5; risk is plotted against ‖θ‖ from 0 to 6.]

Table 1: Number of times at bat ni, batting average Zi in 1990, and career batting average πi, of p = 9 baseball players.

Player       ni    Zi     πi
Baines       415   0.284  0.289
Barfield     476   0.246  0.256
Bell         583   0.254  0.265
Biggio       555   0.276  0.287
Bonds        519   0.301  0.297
Bonilla      625   0.280  0.279
Brett        544   0.329  0.305
Brooks Jr.   568   0.266  0.269
Browne       513   0.267  0.271

Another important problem that is closely related to estimation is that of constructing a confidence set for θ, the aim being to give an idea of the uncertainty in our estimate of θ. Given α ∈ (0, 1), an exact (1 − α)-level confidence set is a subset C = C(X) of Rp such that, whatever the true value of θ, the confidence set contains it with probability exactly 1 − α. The usual, exact (1 − α)-level confidence set for θ in our original normal distribution set-up is a sphere centred at X. More precisely, it is

C(X) = {θ ∈ Rp : ‖X − θ‖² ≤ χ²p(α)},

where χ²p(α) denotes the upper α-point of the χ²p distribution (in other words, if Z ∼ χ²p, then P{Z > χ²p(α)} = α). But in the light of what we have seen in the estimation problem, it is natural to consider confidence sets that are spheres centred at θ̂+JS (or θ̂+,JS_θ0, for some θ0 ∈ Rp). Since the distribution of ‖θ̂+JS − θ‖² depends on ‖θ‖, we can no longer obtain an exact (1 − α)-level confidence set, but it may be possible to construct much smaller confidence sets – using bootstrap methods to obtain the radius, for example – which still have at least (1 − α)-level coverage (e.g. Samworth, 2005).

A baseball data example

The following example is adapted from Samworth (2005). The data in Table 1 give the baseball batting averages (number of hits divided by number of times at bat) of p = 9 baseball players, all of whom were active in 1990. The source was www.baseball-reference.com. For i = 1, …, p, let ni and Zi respectively denote the number of times at bat and batting average of the ith player during the 1990 season. Further, let πi denote the player's true batting average, taken to be his career batting average. (Each player had at least 3000 at bats in his career.) We consider the model where Z1, …, Zp are independent, with Zi ∼ ni⁻¹ Bin(ni, πi).

We make the transformation

Xi = √ni sin⁻¹(2Zi − 1),

and let θi = √ni sin⁻¹(2πi − 1), which means that Xi is approximately distributed as N(θi, 1). A heuristic argument (which can be made rigorous) to justify this is that by a Taylor expansion applied to the function g(x) = sin⁻¹(2x − 1), we have

Xi − θi = √ni {g(Zi) − g(πi)} ≈ √ni g′(πi)(Zi − πi) = (Zi − πi)√(ni/{πi(1 − πi)}),

and this latter expression has an approximate N(0, 1) distribution when ni is large, by the central limit theorem. In fact, since mini ni ≥ 400, an exact calculation gives that the variance of each Xi is between 1 and 1.005 for πi ∈ [0.2, 0.8]. For our prior guess θ0 = (θ0,1, …, θ0,p)T, we take θ0,i = √n̄ sin⁻¹(2π0 − 1), with π0 = 0.275 and n̄ = p⁻¹ ∑_{i=1}^p ni. We find that ‖X − θ‖² = 2.56, somewhat below its expected value of around 9, though since the variance of a χ²9 random variable is 18, this observation is only around 1.5 standard deviations away from its mean. On the other hand, ‖θ̂+,JS_θ0 − θ‖² = 1.50, so Stein estimation does provide an improvement in this case.
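The arithmetic above can be reproduced directly from Table 1. This sketch (illustrative code, not part of the article, assuming the transformation and prior guess described in the text) applies the positive-part James–Stein estimator to the transformed data; it recovers squared errors close to the quoted 2.56 and 1.50:

```python
import math

# Table 1 data: (player, n_i at bats, Z_i 1990 average, pi_i career average)
players = [
    ("Baines",     415, 0.284, 0.289),
    ("Barfield",   476, 0.246, 0.256),
    ("Bell",       583, 0.254, 0.265),
    ("Biggio",     555, 0.276, 0.287),
    ("Bonds",      519, 0.301, 0.297),
    ("Bonilla",    625, 0.280, 0.279),
    ("Brett",      544, 0.329, 0.305),
    ("Brooks Jr.", 568, 0.266, 0.269),
    ("Browne",     513, 0.267, 0.271),
]
p = len(players)

def g(x):
    # variance-stabilising transform g(x) = arcsin(2x - 1)
    return math.asin(2.0 * x - 1.0)

X = [math.sqrt(n) * g(z) for _, n, z, _ in players]
theta = [math.sqrt(n) * g(pi) for _, n, _, pi in players]

# prior guess theta_0,i = sqrt(n_bar) * g(pi_0), with pi_0 = 0.275
pi0 = 0.275
n_bar = sum(n for _, n, _, _ in players) / p
theta0 = [math.sqrt(n_bar) * g(pi0)] * p

err_usual = sum((x - t) ** 2 for x, t in zip(X, theta))  # ||X - theta||^2

d = [x - t0 for x, t0 in zip(X, theta0)]
s = sum(di * di for di in d)                              # ||X - theta_0||^2
shrink = max(0.0, 1.0 - (p - 2) / s)
est = [t0 + shrink * di for t0, di in zip(theta0, d)]     # positive-part JS
err_js = sum((e - t) ** 2 for e, t in zip(est, theta))

print(err_usual, err_js)  # close to the quoted 2.56 and 1.50
```

The shrinkage factor here comes out at roughly 0.57, so each transformed observation moves a little under halfway towards the common prior guess.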

Letting π = (π1, …, πp) and recalling that θ is a function of π, the usual 95% confidence set for π is

On the other hand, the 95% confidence set for π constructed using the bootstrap approach is

Numerical integration gives that the volume ratio of the bootstrap confidence set to the usual confidence set in this case is 0.26, so the benefits of having centred the confidence set more appropriately are quite substantial.

References

1. W. James and C. Stein, Estimation with quadratic loss, Proc. Fourth Berkeley Symposium, 1, 361–380, Univ. California Press (1961)
2. R. Samworth, Small confidence sets for the mean of a spherically symmetric distribution, J. Roy. Statist. Soc., Ser. B, 67, 343–361 (2005)
3. C. Stein, Inadmissibility of the usual estimator of the mean of a multivariate normal distribution, Proc. Third Berkeley Symposium, 1, 197–206, Univ. California Press (1956)
4. P. Y.-S. Shao and W. E. Strawderman, Improving on the James–Stein positive-part estimator, Ann. Statist., 22, 1517–1538 (1994)

Appendix
First note that since ‖X − θ‖² ∼ χ²p, we have R(θ̂0, θ) = p for all θ ∈ Rp. To compute the risk of the James–Stein estimator, note that we can write

R(θ̂JS, θ) = E‖X − θ − (p − 2)X/‖X‖²‖² = E‖X − θ‖² − 2(p − 2) ∑_{i=1}^p E{(Xi − θi)Xi/‖X‖²} + (p − 2)² E(1/‖X‖²).

Consider the expectation inside the sum when i = 1. We can simplify this expectation by writing it out as a p-fold integral, and computing the inner integral by parts:

E{(X1 − θ1)X1/‖X‖²} = E(1/‖X‖² − 2X1²/‖X‖⁴),

since the integrated term vanishes. Repeating virtually the same calculation for components i = 2, …, p, we obtain

∑_{i=1}^p E{(Xi − θi)Xi/‖X‖²} = E(p/‖X‖² − 2/‖X‖²) = (p − 2) E(1/‖X‖²).

We therefore conclude that

R(θ̂JS, θ) = p − (p − 2)² E(1/‖X‖²) < p = R(θ̂0, θ)

for all θ ∈ Rp, as required.
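The concluding identity is easy to sanity-check by simulation. This sketch (illustrative; the choices p = 5 and θ = (1, …, 1) are our own) estimates both sides of R(θ̂JS, θ) = p − (p − 2)² E(1/‖X‖²) from the same random draws:

```python
import random

# Monte Carlo check of R(theta_hat_JS, theta) = p - (p - 2)^2 E(1/||X||^2)
rng = random.Random(1)
p, reps = 5, 40000
theta = [1.0] * p

loss_sum = 0.0
inv_sq_norm_sum = 0.0
for _ in range(reps):
    x = [t + rng.gauss(0.0, 1.0) for t in theta]  # X_i ~ N(theta_i, 1)
    s = sum(xi * xi for xi in x)                  # ||X||^2
    shrink = 1.0 - (p - 2) / s
    loss_sum += sum((shrink * xi - t) ** 2 for xi, t in zip(x, theta))
    inv_sq_norm_sum += 1.0 / s

lhs = loss_sum / reps                              # estimated risk of theta_hat_JS
rhs = p - (p - 2) ** 2 * inv_sq_norm_sum / reps    # right-hand side of the identity
print(lhs, rhs)
```

The two estimates agree closely, and both sit strictly below p = R(θ̂0, θ), in line with the dominance result.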
