SMSTC (2022/23)
Foundations of Probability
Chapter 4: Random variables and their laws II
The Probability Team
www.smstc.ac.uk
Contents
4.1 Moments and inequalities
4.1.1 Markov and Chernoff inequalities
4.1.2 Jensen's inequality
4.1.3 Moments and the spaces Lp
4.1.4 Hölder and related inequalities
4.2 Moment generating functions
4.3 Characteristic functions
4.4 Joint distributions
4.4.1 Joint distribution function
4.4.2 Knowledge of FX,Y implies knowledge of PX,Y
4.4.3 Discrete distributions
4.4.4 Absolutely continuous distributions and joint density
4.5 Independence
Lemma 4.2. Let X be a nonnegative random variable such that EX = 0. Then P(X = 0) = 1.
Proof By the Markov inequality, for each n ≥ 1, P(X > 1/n) = 0, and the result now follows
from the continuity property P9 of Chapter 1, since {X > 0} = ⋃_n {X > 1/n} and so
P(X > 0) = lim_{n→∞} P(X > 1/n).
Lemma 4.3 (Chernoff inequality). Let X be a real-valued random variable. Then, for any
positive nondecreasing function g
    P(X ≥ t) ≤ Eg(X) / g(t).
Exercise 4.1. Let X be the number of heads obtained in n independent tosses of a fair coin
(i.e. X has a binomial distribution with parameters n and 1/2). For a > 1/2, use the Chernoff
inequality to obtain an upper bound for P(X > na), and show that this bound decays exponen-
tially in n. [Hint: take the function g of Lemma 4.3 to be given by g(x) = e^{θx} for some θ > 0.
We then have (from the binomial theorem) that Ee^{θX} = 2^{−n}(1 + e^{θ})^{n}; now choose θ so as to
optimise the bound.]
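The optimisation in the hint is easy to carry out numerically. The following is a minimal Python sketch (assuming NumPy and SciPy are available; the values n = 200 and a = 0.6 are illustrative only) which minimises the bound e^{−θna} Ee^{θX} over θ > 0 and compares it with the exact binomial tail probability.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

def chernoff_bound(n, a):
    # log of the bound e^{-theta n a} E e^{theta X}, with E e^{theta X} = 2^{-n} (1 + e^theta)^n
    log_bound = lambda theta: n * (np.log1p(np.exp(theta)) - np.log(2.0)) - theta * n * a
    res = minimize_scalar(log_bound, bounds=(1e-6, 50.0), method="bounded")
    return np.exp(res.fun)          # optimised upper bound on P(X > na)

n, a = 200, 0.6                     # illustrative values only
print("Chernoff bound:", chernoff_bound(n, a))
print("exact tail    :", binom.sf(int(np.floor(n * a)), n, 0.5))   # P(X > na)
```

The optimal choice is θ = log(a/(1 − a)), and the resulting bound decays geometrically in n, as the exercise asks you to show.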
4.1.2 Jensen's inequality

A function ϕ : R → R is said to be convex if

    ϕ(pa + (1 − p)b) ≤ pϕ(a) + (1 − p)ϕ(b)    (4.1)

for all a, b ∈ R and all 0 ≤ p ≤ 1. Geometrically this means that the line joining (a, ϕ(a)) and
(b, ϕ(b)) lies above the function ϕ (not necessarily strictly) at all points intermediate between a
and b. Examples of convex functions are (i) any linear function x ↦ a + bx for constants a and
b, (ii) x ↦ |x|^a for any a ≥ 1, and (iii) x ↦ e^x. Further, the function given by the (pointwise)
supremum of any family of convex functions is convex. [Exercise: prove this.]
Indeed we have the following useful result: a function ϕ is convex if and only if, for all a ∈ R,
there exists a constant b (depending on a) such that

    ϕ(x) ≥ ϕ(a) + b(x − a) for all x ∈ R,

i.e. if and only if, for every a, there exists a straight line of some slope b through (a, ϕ(a)) which
lies below (not necessarily strictly) the function ϕ at all other points x.
The “if” part of this result is immediate from the observation of the preceding paragraph, while
the “only if” part (which is what we need below) is geometrically “obvious”—see Figure 4.1. A
formal proof is a useful exercise, or, failing this, see [2]. (Note that we do not necessarily have
that ϕ is differentiable at a.)
Now notice that if ξ is a random variable with P(ξ = a) = p, P(ξ = b) = 1−p, the definition (4.1)
can be written as
ϕ(Eξ) ≤ Eϕ(ξ).
Jensen’s inequality generalises this observation.
Figure 4.1: Convex function ϕ and “supporting” straight line at (a, ϕ(a))
Lemma 4.4 (Jensen's inequality). Let X be a real-valued integrable random variable (i.e.
E|X| < ∞) and let ϕ be a convex function. Then

    ϕ(EX) ≤ Eϕ(X).    (4.3)

Proof By the characterisation of convexity above, applied at the point a = EX, there exists a
constant b such that ϕ(x) ≥ ϕ(EX) + b(x − EX) for all x, so that

    ϕ(EX) + b(X − EX) ≤ ϕ(X) a.s.

Taking expectations of both sides gives the required result (4.3).
4.1.3 Moments and the spaces Lp

For any r > 0 and any random variable X, write ||X||r := (E|X|^r)^{1/r}, which may be infinite.
Note that for integer r ≥ 1, we may also define the r-moment of X by EX^r, whenever this
exists. (Recall from Chapter 3 that EX^r = E((X^r)^+) − E((X^r)^−) provided that at least one of
the expectations on the right side is finite.)
Lemma 4.5. For any random variable X, we have that ||X||r is increasing in r.
Proof We require to show that for any 0 < r < s we have ||X||r ≤ ||X||s . In the case where
||X||r < ∞ (i.e. E|X|^r < ∞) this follows immediately from the application of Jensen's inequality
to the random variable |X|^r using the convex function ϕ given by ϕ(x) = x^{s/r} for x ≥ 0 (see
above). In the case where ||X||r = ∞, we may (from the definition of expectation) consider a
sequence Xn of positive simple random variables such that Xn ≤ |X| and ||Xn ||r → ∞. Thus,
from our result in the finite case, ||Xn ||s → ∞, and so it follows by monotonicity that ||X||s = ∞
as required.
Corollary 4.1. If E|X|^p < ∞ for some p > 0 then E|X|^r < ∞ for all 0 < r < p.
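The monotonicity asserted in Lemma 4.5 is also easy to observe empirically. The following minimal sketch (assuming NumPy; the exponential sample is an arbitrary illustrative choice of distribution) estimates ||X||r from a large sample for several values of r.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)   # sample approximating the law of X (illustrative)

for r in (0.5, 1.0, 2.0, 3.0, 4.0):
    # empirical estimate of ||X||_r = (E|X|^r)^{1/r}; the values are nondecreasing in r
    print(r, np.mean(np.abs(x) ** r) ** (1.0 / r))
```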
The space Lp . For any p > 0, define Lp (sometimes identified more carefully as Lp (Ω, F, P))
to be the collection of real-valued random variables X such that E|X|^p < ∞, or equivalently
||X||p < ∞. It follows easily from the elementary inequality

    |x + y|^p ≤ 2^p (|x|^p + |y|^p), x, y ∈ R,

that Lp is a vector space. (In particular, if X, Y ∈ Lp , then, for any constants a, b, we have
aX + bY ∈ Lp .) Corollary 4.1 may be restated as Lp ⊆ Lq for all 0 < q < p (note that, other
than on finite probability spaces, the inclusion is strict). Of particular interest are the space L1
of integrable random variables, and the smaller space L2 , which we now discuss.
The space L2 . This is the space of random variables X such that EX² < ∞. Note that if
X ∈ L2 , then (since L2 ⊆ L1 ) EX exists and is finite, and so also X − EX ∈ L2 . For X ∈ L2 ,
the variance of X is defined by Var X := E(X − EX)².
Exercise 4.2. Note (from positivity) that for X ∈ L2 we have Var X ≥ 0 always. Use the other
basic properties of expectation and Lemma 4.2 to show that Var X = 0 if and only if X is almost
surely constant, and to establish the identity
    Var X = EX² − (EX)².
Exercise 4.3. For X ∈ L2 apply the Markov inequality to (X − EX)² to prove the Chebyshev
inequality:

    P(|X − EX| ≥ t) ≤ Var X / t².
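As a quick numerical illustration of the Chebyshev inequality, the following sketch (assuming NumPy and SciPy; the exponential distribution and the values of t are arbitrary choices) compares the exact tail probability with the bound Var X/t².

```python
import numpy as np
from scipy.stats import expon

mu, var = expon.mean(), expon.var()                # exponential(1): mean 1, variance 1
for t in (1.0, 2.0, 3.0, 5.0):
    exact = expon.cdf(mu - t) + expon.sf(mu + t)   # P(|X - EX| >= t)
    print(t, exact, var / t**2)                    # exact tail vs Chebyshev bound
```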
Exercise 4.4. For X, Y ∈ L2 , define the covariance between X and Y by

    Cov(X, Y ) := E[(X − EX)(Y − EY )].    (4.4)

It follows from the elementary inequality 0 ≤ 2|xy| ≤ x² + y² (applied to the random variables
X − EX and Y − EY ) that Cov(X, Y ) is finite. Show that, for any constants a and b,

    Var(aX + bY ) = a² Var X + b² Var Y + 2ab Cov(X, Y ).    (4.5)
4.1.4 Hölder and related inequalities

Lemma 4.6 (Hölder inequality). Let the random variables X, Y be such that X ∈ Lp and
Y ∈ Lq for some p, q > 1 such that p^{−1} + q^{−1} = 1. Then

    |E(XY )| ≤ E|XY | ≤ ||X||p ||Y ||q .    (4.6)
Remarks. It is really the second of the inequalities in (4.6) which is the Hölder inequality (i.e.
it is in essence an inequality for nonnegative random variables); also this result implies that in
particular (under the given conditions) XY is integrable, i.e. XY ∈ L1 . The first inequality in
(4.6) is just the elementary modulus inequality for expectation—see Chapter 3.
In the special case p = q = 2, the Hölder inequality reduces to the well-known Cauchy-
Bunyakowskii-Schwarz inequality (often just referred to as the Schwarz inequality).
SMSTC: Foundations of Probability 4–5
Exercise. For X, Y ∈ L2 with Var X > 0 and Var Y > 0, define the correlation coefficient
ρ(X, Y ) := Cov(X, Y )/(Var X Var Y )^{1/2}. Show that

    −1 ≤ ρ(X, Y ) ≤ 1.
Finally, we remark that the following lemma is the triangle inequality which justifies the
interpretation of || · ||p as a norm. For p > 1, it is readily deduced from the Hölder inequality—for
details again see, e.g., [2].

Lemma 4.7 (Minkowski inequality). Let p ≥ 1 and let X, Y ∈ Lp . Then

    ||X + Y ||p ≤ ||X||p + ||Y ||p .

4.2 Moment generating functions

The moment generating function of a real-valued random variable X is the function M given by

    M (θ) := E(e^{θX}), θ ∈ R.
Notice that in particular M (0) = 1 always. The values of M (θ) are also referred to as the
exponential moments of the random variable X and the function L(θ) := M (−θ) is also
known as the Laplace transform of (the distribution of) X.
The function M is useful if M (θ) < ∞ for some θ ≠ 0. (Indeed, there are cases where θ = 0 is
the only point at which M is finite.) If X is a positive random variable, then M (θ) ≤ 1 for all
θ ≤ 0. If X is a negative random variable, then M (θ) ≤ 1 for all θ ≥ 0.
Note that (see Theorem 3.2 of Chapter 3) the moment generating function M depends only on
the distribution, or law, of X. Indeed, from that theorem, in the case where X has an absolutely
continuous distribution function F with density f , we can write
    M (θ) = ∫_R e^{θx} f (x) dx (Lebesgue integral).
Lemma 4.8. Suppose there exist a < 0 < b such that M (θ) < ∞ for all a < θ < b. Then
(i) the r-moment of X exists for all r = 0, 1, 2, . . . and is given by the r-derivative of M at 0:

    E(X^r) = D^r M (0).
(ii)

    M (θ) = Σ_{r=0}^{∞} (E(X^r)/r!) θ^r , a < θ < b.
(iii) There is only one law such that if a random variable has that law then it has moment
generating function M .
Note that the result (iii) above says that, under the conditions of the lemma, the moment
generating function uniquely determines the corresponding law.
Proof [sketch] Using the Dominated Convergence Theorem (see Chapters 3 and 6), we can see
that M is infinitely differentiable at 0 with r-derivative equal to the r-moment of X. Moreover,
we can see that M is a real analytic function around 0. Hence Taylor’s theorem holds, which
yields the second claim.
Exercise 4.7. Suppose that the random variable X has moment generating function MX . Show
that, for any constants a, b, the random variable Y := a + bX has moment generating function
MY given by MY (θ) = e^{aθ} MX (bθ).
Exercise 4.8. Suppose that the random variable X has an exponential distribution with
parameter λ, i.e. has distribution function F given by F (x) = 0 for x < 0 and F (x) = 1 − e^{−λx}
for x ≥ 0. Show that the corresponding moment generating function is given by M (θ) = λ/(λ − θ)
for θ < λ (and by M (θ) = ∞ for θ ≥ λ). Use the above lemma to deduce directly that EX = λ^{−1}
and Var X = λ^{−2}.
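The moment calculations in Exercise 4.8 can be checked symbolically. A minimal SymPy sketch (assuming SymPy is available), which takes the moment generating function M(θ) = λ/(λ − θ) as given and applies Lemma 4.8(i):

```python
import sympy as sp

theta, lam = sp.symbols('theta lambda', positive=True)

M = lam / (lam - theta)                    # MGF of the exponential(lambda) distribution, theta < lambda

EX  = sp.diff(M, theta, 1).subs(theta, 0)  # E X   = 1/lambda
EX2 = sp.diff(M, theta, 2).subs(theta, 0)  # E X^2 = 2/lambda^2
print(EX, sp.simplify(EX2 - EX**2))        # Var X = E X^2 - (E X)^2 = 1/lambda^2
```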
Exercise 4.9. Let X be absolutely continuous with density f (x) = π^{−1}(1 + x²)^{−1}, x ∈ R (i.e.
X has a Cauchy distribution). Show that the corresponding moment generating function M (θ)
is finite only for θ = 0.
Moment generating functions are particularly useful in regard to the addition of independent
random variables, as we shall see.
4.3 Characteristic functions

The characteristic function of a real-valued random variable X is the complex-valued function
ϕ given by

    ϕ(t) := Ee^{itX}, t ∈ R,

where the expectation of a complex-valued random variable W is defined by EW := E(Re W ) + iE(Im W )
and satisfies the modulus inequality

    |EW | ≤ E|W |.    (4.7)

Thus

    ϕ(t) = E cos(tX) + iE sin(tX), t ∈ R.    (4.8)

Further |e^{itX}| = [cos(tX)² + sin(tX)²]^{1/2} = 1, and therefore ϕ(t) is defined for all t and, from
(4.7), |ϕ(t)| ≤ 1. Note further that ϕ(0) = 1 always.
As in the case of the moment generating function, note that (by, e.g. Theorem 3.2 of Chapter 3)
the characteristic function ϕ depends only on the law of the random variable X.
The following result (for a proof of which see, for example, [2]) is very important.
Theorem 4.1. Given any characteristic function ϕ, there is only one law such that if a random
variable has that law then it has characteristic function ϕ.
In other words, the characteristic function uniquely determines (characterises) the law of a
random variable. (This contrasts with the situation for the moment generating function where
we saw that additional conditions are required for it to uniquely characterise the corresponding
law.)
It is further straightforward to show [exercise!] that, for any characteristic function ϕ of a
random variable X:
1. ϕ is a continuous (indeed uniformly continuous) function.
2. The complex conjugate of ϕ(t) is ϕ(−t). If X is symmetric around 0 (i.e. −X has the same
law as X) then, for all t ∈ R, we have that ϕ(t) is a real number.
In the case where X has an absolutely continuous distribution with density f , we have

    ϕ(t) = ∫_R e^{itx} f (x) dx, t ∈ R.

This is the Fourier transform of the density f . The inverse Fourier transform determines
f from ϕ (at least when ϕ is integrable):

    f (x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} ϕ(t) dt.
However, this inversion formula is rarely used in practice, where, given a characteristic function
ϕ, we typically conjecture the corresponding law, check that its characteristic function is indeed
ϕ, and then appeal to Theorem 4.1.
Exercise 4.10. Prove the modulus inequality (4.7) for complex random variables.
Exercise 4.11. Suppose that the (real-valued) random variable X has characteristic function
ϕX . Show that, for any constants a, b, the random variable Y := a + bX has characteristic
function ϕY given by ϕY (t) = e^{iat} ϕX (bt).
Exercise 4.12. Let the random variable U have a uniform distribution on (0, 1). Show that
the characteristic function of U is given by
    ϕ(t) = (e^{it} − 1)/(it), t ≠ 0.

Use also the result of the previous exercise to deduce that if a random variable has a uniform
distribution on (a, b), then its characteristic function is given by ϕa,b (t) = (e^{ibt} − e^{iat})/(i(b − a)t)
for t ≠ 0.
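A quick numerical sanity check of the last formula is possible by Monte Carlo. The following sketch (assuming NumPy; the choice a = 2, b = 5 and the points t are arbitrary) compares an empirical estimate of Ee^{itU} with ϕa,b(t).

```python
import numpy as np

def cf_uniform(t, a, b):
    # characteristic function of the uniform distribution on (a, b), for t != 0
    return (np.exp(1j * b * t) - np.exp(1j * a * t)) / (1j * (b - a) * t)

rng = np.random.default_rng(0)
u = rng.uniform(2.0, 5.0, size=200_000)            # a = 2, b = 5 (illustrative)
for t in (0.7, 1.3, -2.4):
    print(t, np.mean(np.exp(1j * t * u)), cf_uniform(t, 2.0, 5.0))
```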
Exercise 4.13. Show that an exponentially-distributed random variable, with density function
f given by f (x) = λe^{−λx} for x ≥ 0 and f (x) = 0 for x < 0, has characteristic function
ϕ(t) = λ/(λ − it).
Now recall that the normal distribution with mean 0 and variance 1 has density
    f (z) = (1/√(2π)) e^{−z²/2},
and is usually denoted by N (0, 1). The following result is required for the proof of the central
limit theorem in Chapter 7.
Lemma 4.9. The N (0, 1) distribution has characteristic function ϕ given by ϕ(t) = e^{−t²/2}.
Proof Let the random variable Z have an N (0, 1) distribution. It follows from (4.8) and the
symmetry around 0 of this distribution that, for t ∈ R,
    ϕ(t) = E cos(tZ) = (1/√(2π)) ∫_{−∞}^{∞} cos(tz) e^{−z²/2} dz.
A little elementary analysis then gives
    ϕ′(t) = −(1/√(2π)) ∫_{−∞}^{∞} z sin(tz) e^{−z²/2} dz
          = −tϕ(t),
where, in the first line in the above display, a little care is needed in order to justify the inter-
change of limits involved in differentiation with respect to t and integration with respect to z,
and where the second line follows on integrating by parts. Now, easily,
    (d/dt)[e^{t²/2} ϕ(t)] = 0,
and so the result follows on recalling also ϕ(0) = 1.
Alternatively the result is also easily deduced by using contour integration in the complex plane.
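Lemma 4.9 can also be checked numerically. The following minimal sketch (assuming NumPy and SciPy) evaluates E cos(tZ) by quadrature and compares it with e^{−t²/2}.

```python
import numpy as np
from scipy.integrate import quad

def cf_normal_numeric(t):
    # (1/sqrt(2*pi)) * integral over R of cos(t z) exp(-z^2/2) dz
    integrand = lambda z: np.cos(t * z) * np.exp(-z * z / 2.0)
    val, _ = quad(integrand, -np.inf, np.inf)
    return val / np.sqrt(2.0 * np.pi)

for t in (0.0, 0.5, 1.0, 2.5):
    print(t, cf_normal_numeric(t), np.exp(-t * t / 2.0))
```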
4.4 Joint distributions

Let X = (X1 , . . . , Xd ) be a random vector, i.e. a d-tuple of random variables defined on a
common probability space. Its joint distribution (or law) PX is the probability measure on
(R^d , B(R^d)) given by PX (B) := P(X ∈ B) for B ∈ B(R^d).

Note that the joint distribution PX determines the (marginal) laws or distributions PX1 , . . . , PXd
of each of the individual random variables X1 , . . . , Xd : for example, for each (one-dimensional)
Borel set B ∈ B,

    PX1 (B) = P(X1 ∈ B) = PX (B × R × · · · × R).
However, PX provides information not just about these individual distributions, but also about
the ways in which the random variables X1 , . . . , Xd are probabilistically associated with each
other.
For simplicity, in the rest of this section we specialise to the case d = 2 and replace the random
vector (or pair of random variables) (X1 , X2 ) with the random vector (X, Y ).
4.4.1 Joint distribution function

The joint distribution function of the random vector (X, Y ) is the function FX,Y on R² given by

    FX,Y (x, y) := P(X ≤ x, Y ≤ y), (x, y) ∈ R².    (4.10)

(Note also that the choice of ≤ instead of < in (4.10) is an arbitrary convention.)
Analogously to Lemma 3.2 of Chapter 3, the joint distribution function FX,Y has the following
properties.
(i) FX,Y is nondecreasing in each of its arguments, [monotonicity]
(ii) limx→−∞ FX,Y (x, y) = 0 for all y, and also limy→−∞ FX,Y (x, y) = 0 for all x,
(iii) limx,y→+∞ FX,Y (x, y) = 1,
(iv) limn→∞ FX,Y (x + 1/n, y) = FX,Y (x, y) for all y, and also limn→∞ FX,Y (x, y + 1/n) =
FX,Y (x, y) for all x. [right continuity]
However, in the two-dimensional case d = 2 (and in higher-dimensional cases) these properties
are insufficient to characterise joint distribution functions—but see below for the additional
condition required.
4.4.3 Discrete distributions

Suppose now that X and Y are discrete, taking values in countable sets S1 and S2 respectively,
and define their joint probability (mass) function by pX,Y (x, y) := P(X = x, Y = y) for x ∈ S1 ,
y ∈ S2 . We then have that the marginal probability functions of X and Y are given by the
nonnegative functions

    pX (x) := P(X = x) = Σ_{y∈S2} pX,Y (x, y)

and

    pY (y) := P(Y = y) = Σ_{x∈S1} pX,Y (x, y).
We define the conditional probability function of Y given X by

    pY|X (y|x) := pX,Y (x, y)/pX (x) if pX (x) ≠ 0, and pY|X (y|x) := 0 otherwise.    (4.13)

We then have pX,Y (x, y) = pX (x) pY|X (y|x) for all x, y (since if pX (x) = 0 then also
pX,Y (x, y) = 0) and also, if pX (x) > 0,

    P(Y = y | X = x) = pY|X (y|x).
Exercise 4.14. Let the random variables N1 , N2 be the numbers obtained on two independent rolls of
a die. Define X = min(N1 , N2 ) and Y = max(N1 , N2 ). Find the joint probability function of
(X, Y ). Use it to calculate the marginal probability functions of X and Y and verify that these
are as you would expect by direct determination of each of them. Find also the conditional
probability functions and verify that these are as you would expect.
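For Exercise 4.14, the joint and marginal probability functions can be tabulated by direct enumeration; the following is a minimal sketch (standard library only) against which to check your hand calculation.

```python
from collections import Counter
from fractions import Fraction

# Enumerate two independent fair dice and tabulate the joint pmf of (X, Y) = (min, max)
joint = Counter()
for n1 in range(1, 7):
    for n2 in range(1, 7):
        joint[(min(n1, n2), max(n1, n2))] += Fraction(1, 36)

p_X, p_Y = Counter(), Counter()        # marginals of X = min and Y = max
for (x, y), p in joint.items():
    p_X[x] += p
    p_Y[y] += p

print(dict(joint))    # p_{X,Y}(x, y) = 2/36 if x < y, 1/36 if x = y
print(dict(p_X))      # p_X(x) = (13 - 2x)/36
print(dict(p_Y))      # p_Y(y) = (2y - 1)/36
```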
4.4.4 Absolutely continuous distributions and joint density

Suppose now that there exists a nonnegative (measurable) function fX,Y on R² such that, for
all (x, y) ∈ R²,

    FX,Y (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y (s, t) dt ds.

In such a case, we say that the random vector (X, Y ) has an absolutely continuous distribution,
and the function fX,Y is referred to as the joint density of this distribution.
It can then be shown (see also Figure 4.2) that, for any Borel set B ∈ B(R²),

    P((X, Y ) ∈ B) = PX,Y (B) = ∫_B fX,Y (s, t) ds dt.    (4.15)
Figure 4.2: The probability PX,Y (B) is given by integrating the joint density fX,Y over the
shaded Borel set B.
(In the case where B is a rectangle B = (a1 , b1 ] × (a2 , b2 ] as above, this follows from (4.12)
[exercise].) We also have of course that
    ∫_{R²} fX,Y (s, t) ds dt = 1.
Further, for any Borel set A ⊆ R,

    P(X ∈ A) = PX,Y (A × R) = ∫_A ( ∫_R fX,Y (s, t) dt ) ds = ∫_A fX (s) ds,

where

    fX (x) := ∫_R fX,Y (x, t) dt.    (4.17)

(Here we have used Fubini's theorem—again see [2], which essentially allows the two-dimensional
integral to be treated as a succession of two one-dimensional integrals.) It follows that the
function fX given by (4.17) is a density for X—and so X is absolutely continuous, as is Y .
We can also, naïvely, define the conditional density of Y given X by

    fY|X (y|x) := fX,Y (x, y)/fX (x) if fX (x) ≠ 0, and fY|X (y|x) := 0 otherwise.
Some justification for this is given by the observation that then (4.15) continues to hold when
fX,Y (s, t) is replaced by fX (s)fY |X (t|s). However, we leave a proper treatment of what it means
to condition on a random variable until Chapter 8.
Exercise 4.15. Suppose that the random variables X and Y have joint density
    fX,Y (x, y) = a(x + y²) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and fX,Y (x, y) = 0 otherwise.
(i) Find the marginal densities fX and fY and the marginal distribution functions FX and
FY of X and Y . Hence identify also the constant a.
(ii) Find the joint distribution function FX,Y of X and Y .
(iii) Find P(X > Y ) and also P(X² > Y ).
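Parts of Exercise 4.15 can be confirmed symbolically; the following SymPy sketch (assuming SymPy is available) simply carries out the relevant integrations, so use it only to check your own working.

```python
import sympy as sp

x, y, a = sp.symbols('x y a', positive=True)
f = a * (x + y**2)

# normalisation of the joint density determines a
a_val = sp.solve(sp.Eq(sp.integrate(f, (x, 0, 1), (y, 0, 1)), 1), a)[0]
f = f.subs(a, a_val)

f_X = sp.integrate(f, (y, 0, 1))                        # marginal density of X on [0, 1]
f_Y = sp.integrate(f, (x, 0, 1))                        # marginal density of Y on [0, 1]
p_X_gt_Y  = sp.integrate(f, (x, y, 1), (y, 0, 1))       # P(X > Y)
p_X2_gt_Y = sp.integrate(f, (y, 0, x**2), (x, 0, 1))    # P(X^2 > Y)
print(a_val, sp.expand(f_X), sp.expand(f_Y), p_X_gt_Y, p_X2_gt_Y)
```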
4.5 Independence
We continue to consider a random vector (pair of random variables) (X, Y ), but the results of
this section have obvious extensions to d-dimensional random vectors (X1 , . . . , Xd ) or indeed to
random sequences (sequences of random variables) (X1 , X2 , . . . ).
Recall that random variables X and Y are independent if and only if the generated σ-algebras
σ(X) and σ(Y ) are independent.
Lemma 4.10. Suppose that the random variables X and Y are independent. Then
(i) If X and Y have distribution functions FX and FY respectively, then the joint distribution
function FX,Y of the random vector (X, Y ) is given by

    FX,Y (x, y) = FX (x)FY (y) for all x, y.
(ii) If X and Y are discrete, with probability (mass) functions pX and pY respectively, then
the random vector (X, Y ) is discrete with probability function pX,Y given by

    pX,Y (x, y) = pX (x)pY (y) for all x, y.
(iii) If X and Y are absolutely continuous, with density functions fX and fY respectively, then
the random vector (X, Y ) is absolutely continuous with density function fX,Y given by

    fX,Y (x, y) = fX (x)fY (y) for all x, y.
(iv) If X and Y are positive [respectively integrable], then the random variable XY is positive
[respectively integrable] and
E(XY ) = EX EY. (4.21)
(v) If X and Y have moment generating functions MX and MY respectively, then the random
vector (X, Y ) has joint moment generating function

    MX,Y (η, θ) := Ee^{ηX+θY} = MX (η)MY (θ), η, θ ∈ R,    (4.22)

where we allow the possibility that, for appropriate η or θ, MX (η) or MY (θ) may be infinite
(i.e. if the right side of the above identity is infinite, then so is the left).
(vi) If X and Y have characteristic functions ϕX and ϕY respectively, then the random vector
(X, Y ) has joint characteristic function

    ϕX,Y (s, t) := Ee^{i(sX+tY)} = ϕX (s)ϕY (t), s, t ∈ R.    (4.23)
Proof The result (i) is immediate from the definitions of the joint distribution function and
of independence, i.e., for any x, y,

    FX,Y (x, y) = P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y) = FX (x)FY (y).

The result (ii) follows similarly. For (iii), note that, for any x, y,

    FX,Y (x, y) = FX (x)FY (y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX (s)fY (t) dt ds,

so that the function of s and t given by fX (s)fY (t) is indeed the joint density function of (X, Y ).
For (iv) note that if X and Y are discrete, then the result follows from (ii) and the rearrangement
of a double sum [exercise!]. In the general case the result is yet another application of Fubini’s
Theorem, which, as previously remarked, permits similar rearrangements of double integrals.
The results (v) and (vi) are immediate applications of (iv) on noting that the independence of
X and Y implies also that of eηX and eθY and of eisX and eitY .
Corollary 4.2. Suppose that the random variables X, Y ∈ L2 are independent. Then
Cov(X, Y ) = 0, and so Var(X + Y ) = Var X + Var Y .

Proof Since X and Y are independent so also are X − EX and Y − EY . Hence, from (iv)
of Lemma 4.10, Cov(X, Y ) = E(X − EX)E(Y − EY ) = 0, and the required result now follows
from (4.5).
Exercise 4.16. Let V, W be independent identically distributed random variables with distri-
bution functions FV (t) = FW (t) = 1 − e^{−t}, t > 0. Let X = 2V , Y = V − W . Compute the joint
distribution function FX,Y of the random vector (X, Y ). [Hint. Note that since V and W are
both nonnegative we necessarily have X ≥ 0 and Y ≤ X/2. Hence it is sufficient to evaluate
FX,Y (x, y) for x > 0, y ≤ x/2 [why?]. For such (x, y),
    FX,Y (x, y) = ∫_A e^{−v} e^{−w} dv dw
where A is the region {(v, w) : v ≤ x/2, w ≥ v − y}. Now sketch this region in (v, w)-space, and
integrate first with respect to one variable, and then with respect to the other.]
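The answer to Exercise 4.16 can be sanity-checked numerically. The following sketch (assuming NumPy and SciPy; the evaluation points (x, y) are arbitrary) compares a Monte Carlo estimate of FX,Y (x, y) with the integral over the region A in the hint.

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(1)
V = rng.exponential(1.0, size=500_000)
W = rng.exponential(1.0, size=500_000)
X, Y = 2 * V, V - W

def F_by_integration(x, y):
    # integrate e^{-v} e^{-w} over {0 < v <= x/2, w >= max(v - y, 0)}
    inner = lambda v: np.exp(-v) * np.exp(-max(v - y, 0.0))
    val, _ = quad(inner, 0.0, x / 2.0)
    return val

for x, y in [(1.0, 0.2), (2.0, -0.5), (3.0, 1.0)]:
    print((x, y), np.mean((X <= x) & (Y <= y)), F_by_integration(x, y))
```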
Exercise 4.17. Let X, Y be independent random variables with common moment generating
function MX (θ) = MY (θ) = e^{θ²}. (This corresponds to the common distribution of X and Y
being normal with mean 0 and variance 2.) Let X′ := X + Y , Y′ := X − Y . Compute the joint
moment generating function of (X′ , Y′ ) (you will need to use the definition contained in (4.22)
above). Show that X′ , Y′ are independent.
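The factorisation asked for in Exercise 4.17 can be seen numerically as well. A minimal Monte Carlo sketch (assuming NumPy; the evaluation points (η, θ) are arbitrary) compares the empirical joint moment generating function of (X′, Y′) with the product of the empirical marginal ones.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0.0, np.sqrt(2.0), size=1_000_000)   # N(0, 2), so M_X(theta) = e^{theta^2}
Y = rng.normal(0.0, np.sqrt(2.0), size=1_000_000)
Xp, Yp = X + Y, X - Y

for eta, theta in [(0.3, -0.2), (0.5, 0.4)]:
    joint = np.mean(np.exp(eta * Xp + theta * Yp))                  # M_{X',Y'}(eta, theta)
    prod = np.mean(np.exp(eta * Xp)) * np.mean(np.exp(theta * Yp))  # M_{X'}(eta) M_{Y'}(theta)
    print((eta, theta), joint, prod)
```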
Again suppose that the random variables X and Y are independent—with respective laws (dis-
tributions) PX and PY and respective distribution functions FX and FY . Then the law (distribu-
tion) of the random variable X + Y is referred to as the convolution of the laws (distributions)
of X and Y .
However, it is not always easy to calculate this. If, for example, FX+Y denotes the distribution
function of X + Y , then, for z ∈ R,

    FX+Y (z) = ∫_R FY (z − x) PX (dx),    (4.24)
i.e., from (4.24), FX+Y (z) is the Lebesgue integral of the function of x given by FY (z − x) with
respect to the probability measure (law) PX on R. (Of course the roles of X and Y may be
interchanged!) When, for example, X is discrete with probability function pX ,
    FX+Y (z) = Σ_{all x} FY (z − x) pX (x),    (4.25)
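Formula (4.25) is straightforward to evaluate in practice. The following minimal sketch (assuming NumPy; the choice of X uniform on {1, . . . , 6} and Y exponential with parameter 1 is purely illustrative) compares (4.25) with a Monte Carlo estimate of FX+Y (z).

```python
import numpy as np

rng = np.random.default_rng(3)
p_X = {k: 1.0 / 6.0 for k in range(1, 7)}                   # X uniform on {1,...,6}
F_Y = lambda t: 1.0 - np.exp(-t) if t > 0 else 0.0          # Y exponential(1)

def F_sum(z):
    # formula (4.25): sum over the possible values x of X
    return sum(F_Y(z - x) * p for x, p in p_X.items())

X = rng.integers(1, 7, size=1_000_000)
Y = rng.exponential(1.0, size=1_000_000)
for z in (1.5, 3.0, 6.2):
    print(z, F_sum(z), np.mean(X + Y <= z))
```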
Lemma 4.11. Suppose that the random variables X and Y are independent. Then

(i) If X and Y have moment generating functions MX and MY respectively, then the random
variable X + Y has moment generating function

    MX+Y (θ) = MX (θ)MY (θ), θ ∈ R,

where we again allow the possibility that, for appropriate θ, MX (θ) or MY (θ) may be
infinite (i.e. if the right side of the above identity is infinite, then so is the left).

(ii) If X and Y have characteristic functions ϕX and ϕY respectively, then the random variable
X + Y has characteristic function

    ϕX+Y (t) = ϕX (t)ϕY (t), t ∈ R.
Proof The proof of (i) is immediate from part (v) of Lemma 4.10 on putting η = θ, while
the proof of (ii) follows similarly from part (vi) of Lemma 4.10 on putting s = t. Alternatively,
either of these results may be proved directly by using part (iv) of Lemma 4.10.
These results may be used to calculate laws (distributions) of sums of independent random
variables, since (frequently) the moment generating function determines the corresponding law
(distribution) uniquely, while (always) the characteristic function determines the corresponding
law (distribution) uniquely.
Exercise 4.18. Let X and Y be independent random variables having Poisson distributions
with parameters λ > 0 and µ > 0 respectively, e.g.
    P(X = n) = e^{−λ} λ^n / n!, n = 0, 1, 2, . . . .
Show that the moment generating functions MX and MY of X and Y are given by MX (θ) =
e^{λ(e^θ −1)} and MY (θ) = e^{µ(e^θ −1)} respectively. Deduce that X + Y has a Poisson distribution with
parameter λ + µ.
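A numerical check of Exercise 4.18 (a sketch assuming NumPy; the parameter values and the points θ are arbitrary) compares the empirical moment generating function of X + Y with that of a Poisson distribution with parameter λ + µ.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, mu = 1.7, 2.4                                    # illustrative parameter values
X = rng.poisson(lam, size=1_000_000)
Y = rng.poisson(mu, size=1_000_000)

for theta in (-0.5, 0.2, 0.5):
    lhs = np.mean(np.exp(theta * (X + Y)))            # empirical MGF of X + Y
    rhs = np.exp((lam + mu) * (np.exp(theta) - 1.0))  # MGF of Poisson(lam + mu)
    print(theta, lhs, rhs)
```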
Converse results
Lemma 4.12. Suppose that the joint distribution function FX,Y of the random variables X
and Y factorises as

    FX,Y (x, y) = GX (x)GY (y) for all x, y,    (4.29)

for some functions GX and GY . Then X and Y are independent; moreover, for some constant
k > 0,

    FX (x) = k GX (x) for all x,    (4.30)

and

    FY (y) = k^{−1} GY (y) for all y.    (4.31)

Proof By letting y → ∞ in (4.29), we see that (4.30) holds with k = lim_{y→∞} GY (y). We
similarly obtain (4.31)—that the constant there is k^{−1} follows since, from (4.29),

    lim_{x→∞} GX (x) lim_{y→∞} GY (y) = lim_{x,y→∞} FX,Y (x, y) = 1 = lim_{x→∞} FX (x) lim_{y→∞} FY (y).
Hence the conditions of the lemma imply that the joint distribution function of X and Y
factorises as the product of the marginal distribution functions:

    FX,Y (x, y) = FX (x)FY (y) for all x, y.    (4.32)
To show that this implies the independence of X and Y , let X 0 and Y 0 be independent random
variables with the distribution functions FX and FY respectively. Then, from (4.32), the joint
distribution function of (X 0 , Y 0 ) is also FX,Y . Since any joint distribution function uniquely
determines the corresponding law, it follows that the law PX,Y of (X, Y ) is the same as that of
(X 0 , Y 0 ) and so X, Y are independent also.
In the case where random variables X and Y are discrete, it similarly follows that factorisation
of the joint probability function is sufficient for their independence; in the case where X and
Y are absolutely continuous, factorisation of the joint density function is sufficient for their
independence. For any random variables X and Y , the factorisation of their joint characteristic
function (as in (4.23), but the factors on the right side do not need to be identified a priori as
characteristic functions) is sufficient for their independence. The proof is again entirely similar
to that of Lemma 4.12 and again relies on the fact that any characteristic function identifies
the corresponding law uniquely. A corresponding result holds for the joint moment generating
function (factorisation implies independence) provided that this function exists in (at least) a
(one-sided) neighbourhood of zero.
Finally we remark that the condition (4.21) (a weak condition not comparable with the remaining
conditions featuring in Lemma 4.10) is not sufficient for the independence of X and Y .
References
[1] B. Fristedt & L. Gray, A Modern Approach to Probability Theory, Birkhäuser, 1997.