The Two Envelopes Problem
in the new
distribution. So we write down:
$$ p(Y|C=H) = \int_0^\infty p(Y'|C=L)\,\delta(2Y' - Y)\,dY' \tag{13} $$
$$ = \frac{1}{2}\int_0^\infty p(Y''/2\,|\,C=L)\,\delta(Y'' - Y)\,dY'' $$
so the density p(Y|C=H) must be half that of p(Y/2|C=L).
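The change of variables can be checked numerically. The sketch below assumes, purely for illustration, an exponential density for the smaller amount (the function names and the rate `lam` are hypothetical choices) and confirms that p(Y/2|C=L)/2 is a normalised density that puts the same mass below 2a as the original puts below a.

```python
import math

def p_low(y, lam=1.0):
    """Illustrative density for the smaller amount (assumed exponential)."""
    return lam * math.exp(-lam * y) if y >= 0 else 0.0

def p_high(y, lam=1.0):
    """Density of the doubled amount via the change of variables: p(Y/2|C=L)/2."""
    return 0.5 * p_low(y / 2.0, lam)

def integrate(f, a, b, steps=100_000):
    """Plain midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Both densities should integrate to (almost exactly) one.
total_low = integrate(p_low, 0.0, 60.0)
total_high = integrate(p_high, 0.0, 120.0)

# The larger amount is below 3 exactly when the smaller amount is below 1.5.
mass_high = integrate(p_high, 0.0, 3.0)
mass_low = integrate(p_low, 0.0, 1.5)

print(total_low, total_high, mass_high, mass_low)
```

Without the factor of one half, the density for the larger amount would integrate to two rather than one.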
[3] Note, however, that as we take the limit, the paradox (that we should
always switch regardless of Y) does not rear its ugly head.
If the ratio p(Y|C=L)/p(Y|C=H) equals one then this means:
p(Y|C=L) = p(Y|C=H) = p(Y/2|C=L)/2 for all Y, which holds when
p(Y|C=L) = 1/Y. This is a uniform distribution on log Y and says,
"I have no idea about the scale of Y. I think Y is as likely to be
between 1 and 10 as it is to be between 10 and 100." This seems more
sensible than the uniform prior before: for one thing, it decreases
with Y.
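As a quick check (a worked sketch added for illustration), the 1/Y form can be substituted into the condition p(Y|C=L) = p(Y/2|C=L)/2, and the mass in two successive decades compared:

```latex
\begin{align*}
\frac{1}{2}\,p\!\left(\tfrac{Y}{2}\,\middle|\,C=L\right)
  &= \frac{1}{2}\cdot\frac{2}{Y} = \frac{1}{Y} = p(Y|C=L), \\
\int_{1}^{10}\frac{dY}{Y} &= \ln 10 = \int_{10}^{100}\frac{dY}{Y}.
\end{align*}
```

The first line shows the condition holds for every Y. The second shows that each decade carries the same (unnormalised) mass.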
However, this distribution is not normalisable (without introducing
a lower and upper cut-off, and then the ratio is not 1 for all Y).
Limiting arguments fail for the same reason they did in the uniform
case: in the region where both the distributions are non-zero, you
should always switch. So taking Y_max to infinity pushes the regime
where you should stick to a place where no data will land. Again, the
limit corresponding to the improper distribution corresponds to a
terrible set of assumptions.
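The limiting argument can be made concrete with a small sketch. It assumes a log-uniform density truncated to [y_min, y_max] with y_max >= 2*y_min, equal prior odds on the two envelopes, and a switching rule in the expected-value form "switch when p(Y|C=L) > p(Y|C=H)/2". That rule is an assumption about the form of the earlier switching condition, and all function names are hypothetical.

```python
import math

def p_low(y, y_min, y_max):
    """Log-uniform density for the smaller amount, truncated to [y_min, y_max]."""
    if y_min <= y <= y_max:
        return 1.0 / (y * math.log(y_max / y_min))
    return 0.0

def p_high(y, y_min, y_max):
    """Density for the larger amount: p(Y/2|C=L)/2, supported on [2*y_min, 2*y_max]."""
    return 0.5 * p_low(y / 2.0, y_min, y_max)

def should_switch(y, y_min, y_max):
    """Assumed rule: switching gains Y if we hold the smaller amount and loses
    Y/2 otherwise, so with equal prior odds we switch when p_low > p_high / 2."""
    return p_low(y, y_min, y_max) > 0.5 * p_high(y, y_min, y_max)

def prob_stick(y_min, y_max):
    """Probability that the observed amount lands in (y_max, 2*y_max], the only
    region where the rule says stick. Only the larger envelope (prior 1/2) can
    land there, and it does so with probability ln 2 / ln(y_max / y_min)."""
    return 0.5 * math.log(2.0) / math.log(y_max / y_min)

for y_max in (1e2, 1e4, 1e8):
    print(y_max, prob_stick(1.0, y_max))
```

As y_max grows, the probability of ever observing an amount in the stick region shrinks towards zero, which is the sense in which the limit pushes that regime to a place where no data will land.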
0.8 A sensible approach
Is there a sensible approach? In a word, yes. However, it will depend
on what assumptions you want to make. In a way the previous discussion
is a red herring: you might be quite happy to use a uniform
distribution, or a log-uniform distribution, and everything will be OK
so long as you think carefully about the choice for the ranges (am I
playing with a friend for his pocket money, or am I through to the
final stages of a game-show called "Who wants to be a billionaire?").[4]
One final alternative (which does not have hard cut-offs) is to place an
exponential distribution over Y:
$$ P(Y|C=L) = \lambda \exp(-\lambda Y) \tag{16} $$
1/λ is the characteristic length scale over which the density decays.
From eqn. 8 we should switch when:
$$ \frac{1}{2} < \frac{P(Y|C=L)}{P(Y|C=H)} \tag{17} $$
$$ = 2 \exp(-\lambda Y/2) \tag{18} $$
(An intuition for this equation is that it is the result of a hypothesis
test weighted by the stakes: we should switch when it is likely enough
that we've chosen the envelope containing the smaller amount. Switching
gains Y in that case and loses only Y/2 otherwise, so the condition is
P(C=L|Y) > P(C=H|Y)/2. Plugging in values we'll get the same result as
above.)
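The switching condition for the exponential case can be checked with a short sketch. It assumes equal prior odds on the two envelopes and the expected-value form of the rule (switch when the ratio P(Y|C=L)/P(Y|C=H) exceeds 1/2). The function names are hypothetical.

```python
import math

def switch_threshold(lam):
    """Largest Y for which 2*exp(-lam*Y/2) > 1/2, i.e. Y < (2/lam) * ln 4."""
    return 2.0 * math.log(4.0) / lam

def expected_gain(y, lam):
    """Expected gain from switching after observing y (equal prior odds assumed)."""
    p_l = lam * math.exp(-lam * y)               # P(Y|C=L)
    p_h = 0.5 * lam * math.exp(-lam * y / 2.0)   # P(Y|C=H) = P(Y/2|C=L)/2
    post_l = p_l / (p_l + p_h)                   # P(C=L|Y)
    # Gain Y if we hold the smaller amount, lose Y/2 if we hold the larger.
    return post_l * y - (1.0 - post_l) * y / 2.0

lam = 0.5
y_star = switch_threshold(lam)
print(y_star, expected_gain(0.5 * y_star, lam), expected_gain(2.0 * y_star, lam))
```

Under these assumptions the expected gain is positive below the threshold, zero at it, and negative above it, so the decision depends on the observed amount.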
[4] One alternative is to place a prior over the ranges and integrate them out:
$$ P(Y|C=L) = \int P(Y|C=L, Y_{min}, Y_{max})\, P(Y_{min}, Y_{max})\, dY_{min}\, dY_{max} $$
So we switch when Y < (2/λ) ln 4.
$$ \sum_{n=0}^{\infty} (-2)^n P(2^n Y) $$
under the proviso that 2^N P(2^N Y|C=L) → 0 as N → ∞. This equation
says: "You give me a marginal, I'll give you the conditional."
0.10 A sensible improper distribution
Does using an improper distribution always lead to the paradox? Some
authors claim so, but it hasn't been proved. Moreover, the intuition
from the above examples is that it wasn't the improper nature of the
distribution which led to the paradox (as some authors glibly state) -
it was the fact that they corresponded to stupid assumptions.
Let's try and invent a distribution which corresponds to reasonable
assumptions in the limit where it becomes un-normalisable.[5]
The key intuition is this: we'll be fine as long as the point up to
which we always switch is not situated at the extreme of the range of
allowed Y values. For instance, if the conditional distributions have
some periodic structure of correctly chosen amplitude then there can be
regions where one distribution is much higher than the other, and
regions where the converse is true. In this case the paradox will not
arise. What's more, the marginal distribution P(Y) implied by the two
periodic conditionals may not be such a stupid one.[6] Let me give a
simple example:
[5] Another point noted by authors is that the improper distribution above has an infinite
mean value - this may correspond to another undesirable assumption. For instance,
there are proper distributions which have infinite expected value, and in such cases the
paradox can occur too.
[6] Although a counter example might be enough to convince the mathematicians the
paradox doesn't arise from un-normalisability, it would be nice if such a counter example
corresponded to a realistic
P(Y|C = L) = 4      2n + 1 > Y > 2n      (19)
P(Y|C = L) = 1      otherwise            (20)
These densities are unnormalised: they are defined over some range L
that we extend to infinity (so that the normalising constant tends to 0)
to make our improper prior.
This specifies the other conditional:
P(Y|C = H) = 2      4n + 2 > Y > 4n      (21)
P(Y|C = H) = 1/2    otherwise            (22)
And the marginal:
P(Y) = 3      4n + 1 > Y > 4n            (23)
P(Y) = 3/2    4n + 2 > Y > 4n + 1        (24)
P(Y) = 9/4    4n + 3 > Y > 4n + 2        (25)
P(Y) = 3/4    4n + 4 > Y > 4n + 3        (26)
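The second conditional and the marginal follow mechanically from eqns. 19-20. The sketch below reproduces them, assuming P(Y|C=H) = P(Y/2|C=L)/2 and equal prior odds so that the marginal is the average of the two conditionals. The function names are hypothetical, all values are unnormalised, and interval boundaries (a set of measure zero) are ignored.

```python
import math

def p_low(y):
    """Eqns. 19-20: relative density 4 on (2n, 2n+1), 1 elsewhere."""
    return 4.0 if math.floor(y) % 2 == 0 else 1.0

def p_high(y):
    """Conditional for the larger amount, P(Y/2|C=L)/2 (eqns. 21-22)."""
    return 0.5 * p_low(y / 2.0)

def marginal(y):
    """Average of the two conditionals, assuming equal prior odds (eqns. 23-26)."""
    return 0.5 * (p_low(y) + p_high(y))

# One sample point from each of the four regions of a period.
for y in (0.5, 1.5, 2.5, 3.5):
    print(y, p_low(y), p_high(y), marginal(y))
```

Evaluating at one point per region gives back the four marginal values listed above, and the pattern repeats with period 4.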
Is this a sensible marginal? Well, imagine that some types of
coins were harder to come by than others. For example, imagine only
1, 5, 9, 13, ... valued coins existed. Then we might like something like
this (perhaps we'd like to add in a 1/Y decay too).
When should we switch? This depends on Y in the following way:
Region  Range                  P(Y|C = L)  P(Y|C = H)  Switch?
I       4n + 1 > Y > 4n        4           2           doesn't matter
II      4n + 2 > Y > 4n + 1    1           2           no
III     4n + 3 > Y > 4n + 2    4           1/2         yes
IV      4n + 4 > Y > 4n + 3    1           1/2         doesn't matter
So we do need to listen to what the data are telling us even though the
conditionals are improper.