
SMSTC (2022/23)
Foundations of Probability
Chapter 4: Random variables and their laws II
The Probability Team^a

www.smstc.ac.uk

Contents
4.1 Moments and inequalities
    4.1.1 Markov and Chernoff inequalities
    4.1.2 Jensen's inequality
    4.1.3 Moments and the spaces Lp
    4.1.4 Hölder and related inequalities
4.2 Moment generating functions
4.3 Characteristic functions
4.4 Joint distributions
    4.4.1 Joint distribution function
    4.4.2 Knowledge of FX,Y implies knowledge of PX,Y
    4.4.3 Discrete distributions
    4.4.4 Absolutely continuous distributions and joint density
4.5 Independence

4.1 Moments and inequalities


4.1.1 Markov and Chernoff inequalities
Lemma 4.1 (Markov inequality). Let X be a nonnegative random variable. Then

    P(X ≥ t) ≤ EX / t,   t > 0.
Proof We have, for t > 0,
X ≥ tI{X≥t} a.s.,
where, as usual, I{X≥t} is the indicator random variable of the event {X ≥ t}, and where “a.s.”
is the usual abbreviation for “almost surely”, i.e. the above identity holds on a set of probability 1
(indeed since X is nonnegative, the above identity here holds for all ω ∈ Ω, but “almost surely”
is all we need). Hence, by the monotonicity property of expectation,
EX ≥ E(tI{X≥t} )
= tP(X ≥ t).

The following lemma gives an important additional “basic” property of expectation, which may
be added to those of the previous chapter.
^a B.Buke@ed.ac.uk


Lemma 4.2. Let X be a nonnegative random variable such that EX = 0. Then P(X = 0) = 1.

Proof  From the Markov inequality, for any integer n > 0,

    P(X > 1/n) ≤ P(X ≥ 1/n) ≤ n EX = 0,

and the result now follows from the continuity property P9 of Chapter 1, since {X > 0} = ∪_n {X > 1/n} and so P(X > 0) = lim_{n→∞} P(X > 1/n) = 0.  □

Lemma 4.3 (Chernoff inequality). Let X be a real-valued random variable. Then, for any positive nondecreasing function g and any t ∈ R,

    P(X ≥ t) ≤ Eg(X) / g(t).

Proof  Since g is nondecreasing,

    {X ≥ t} ⊆ {g(X) ≥ g(t)},

so that P(X ≥ t) ≤ P(g(X) ≥ g(t)). Now apply the Markov inequality to the nonnegative random variable g(X).  □

Exercise 4.1. Let X be a discrete random variable with P(X = k) = (n choose k) 2^{−n}, k = 0, 1, . . . , n (i.e. X has a binomial distribution with parameters n and 1/2). For a > 1/2, use the Chernoff inequality to obtain an upper bound for P(X > na), and show that this bound decays exponentially in n. [Hint: take the function g of Lemma 4.3 to be given by g(x) = e^{θx} for some θ > 0. We then have (from the binomial theorem) that Ee^{θX} = 2^{−n}(1 + e^θ)^n; now choose θ so as to optimise the bound.]
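The exponential decay asked for in the hint can be seen numerically. The following Python sketch (an added illustration, not part of the exercise) compares the exact tail P(X > na) for a = 0.6 with the Chernoff bound Ee^{θX} e^{−θna}, minimised over a grid of θ > 0 rather than in closed form; the grid, the value a = 0.6 and the chosen values of n are arbitrary.

    import math

    def exact_tail(n, a):
        # P(X > n*a) for X ~ Binomial(n, 1/2)
        k0 = math.floor(n * a) + 1
        return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2.0 ** n

    def chernoff_bound(n, a):
        # minimise 2^{-n} (1 + e^theta)^n e^{-theta*n*a} over a grid of theta > 0,
        # working in log space to avoid overflow for large n
        thetas = [k / 200.0 for k in range(1, 1000)]
        log_vals = [n * math.log(0.5 * (1.0 + math.exp(th))) - th * n * a
                    for th in thetas]
        return math.exp(min(log_vals))

    for n in (20, 40, 80, 160):
        print(n, exact_tail(n, 0.6), chernoff_bound(n, 0.6))

Both columns shrink roughly geometrically as n doubles, which is the exponential decay the exercise asks for.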

4.1.2 Jensen’s inequality


A function ϕ : R → R is convex if

ϕ(pa + (1 − p)b) ≤ pϕ(a) + (1 − p)ϕ(b) (4.1)

for all a, b ∈ R and all 0 ≤ p ≤ 1. Geometrically this means that a line joining (a, ϕ(a)) and
(b, ϕ(b)) lies above the function ϕ (not necessarily strictly) at all points intermediate between a
and b. Examples of convex functions are (i) any linear function x ↦ a + bx for constants a and b, (ii) x ↦ |x|^a for any a ≥ 1, and (iii) x ↦ e^x. Further the function given by the (pointwise)
supremum of any family of convex functions is convex. [Exercise: prove this.]
Indeed we have the following useful result: a function ϕ is convex if and only if, for all a ∈ R,
there exists a constant b (depending on a) such that

ϕ(a) + b(x − a) ≤ ϕ(x), x ∈ R, (4.2)

i.e. if and only if, for every a, there exists a straight line of some slope b through (a, ϕ(a)) which
lies below (not necessarily strictly) the function ϕ at all other points x.
The “if” part of this result is immediate from the observation of the preceding paragraph, while
the “only if” part (which is what we need below) is geometrically “obvious”—see Figure 4.1. A
formal proof is a useful exercise, or, failing this, see [2]. (Note that we do not necessarily have
that ϕ is differentiable at a.)
Now notice that if ξ is a random variable with P(ξ = a) = p, P(ξ = b) = 1−p, the definition (4.1)
can be written as
ϕ(Eξ) ≤ Eϕ(ξ).
Jensen’s inequality generalises this observation.

Figure 4.1: Convex function ϕ and "supporting" straight line ϕ(a) + b(x − a), of slope b, at (a, ϕ(a)).

Lemma 4.4 (Jensen’s inequality). Let X be a real-valued integrable random variable (i.e.
E|X| < ∞) and let ϕ be a convex function. Then

ϕ(EX) ≤ Eϕ(X). (4.3)

Proof Let b be such that (4.2) holds with a = EX. Then

ϕ(EX) + b(x − EX) ≤ ϕ(x), x ∈ R,

so that
ϕ(EX) + b(X − EX) ≤ ϕ(X) a.s.
Taking expectations of both sides gives the required result (4.3).  □
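As a quick illustrative check (a minimal sketch with an arbitrary choice of distribution and convex function, not part of the notes), the two sides of (4.3) can be compared by simulation; here X is uniform on (0, 1) and ϕ(x) = e^x, so that ϕ(EX) = e^{1/2} ≈ 1.649 and Eϕ(X) = e − 1 ≈ 1.718.

    import math, random, statistics

    random.seed(0)
    xs = [random.random() for _ in range(100_000)]        # X ~ Uniform(0, 1)
    lhs = math.exp(statistics.fmean(xs))                  # phi(EX), estimated, phi = exp
    rhs = statistics.fmean([math.exp(x) for x in xs])     # E phi(X), estimated
    print(lhs, rhs, lhs <= rhs)                           # roughly 1.649, 1.718, True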

4.1.3 Moments and the spaces Lp


Definition 4.1. For r > 0 and a real-valued random variable X, the r-absolute moment of X is defined as the quantity E|X|^r. Note that this always exists, but may be finite or infinite. We define also the quantity ||X||_r := (E|X|^r)^{1/r}, again allowing the possibility that this may be infinite. (The latter quantity may be thought of as a properly dimensioned version of the r-absolute moment: note in particular that, for any constant a, we have ||aX||_r = |a| ||X||_r. In the case where ||X||_r < ∞ it is referred to as the r-norm of X.)

Note that for integer r ≥ 1, we may also define the r-moment of X by EX^r, whenever this exists. (Recall from Chapter 3 that EX^r = E((X^+)^r) − E((X^−)^r) provided that at least one of the expectations on the right side is finite.)

Lemma 4.5. For any random variable X, we have that ||X||_r is nondecreasing in r.

Proof  We require to show that for any 0 < r < s we have ||X||_r ≤ ||X||_s. In the case where ||X||_r < ∞ (i.e. E|X|^r < ∞) this follows immediately from the application of Jensen's inequality to the random variable |X|^r using the convex function ϕ given by ϕ(x) = x^{s/r} for x ≥ 0 (see above). In the case where ||X||_r = ∞, we may (from the definition of expectation) consider a sequence X_n of positive simple random variables such that X_n ≤ |X| and ||X_n||_r → ∞. Thus, from our result in the finite case, ||X_n||_s → ∞, and so it follows by monotonicity that ||X||_s = ∞ as required.  □

Corollary 4.1. If E|X|^p < ∞ for some p > 0 then E|X|^r < ∞ for all 0 < r < p.
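A small numerical illustration of Lemma 4.5 and Corollary 4.1 (a sketch with an arbitrarily chosen lognormal sample, for which all absolute moments are finite; the seed and sample size are likewise arbitrary):

    import math, random, statistics

    random.seed(1)
    xs = [math.exp(random.gauss(0.0, 1.0)) for _ in range(200_000)]   # lognormal sample

    def r_norm(sample, r):
        # empirical version of ||X||_r = (E|X|^r)^(1/r)
        return statistics.fmean([abs(v) ** r for v in sample]) ** (1.0 / r)

    for r in (0.5, 1.0, 2.0, 3.0):
        print(r, r_norm(xs, r))    # the printed values should be nondecreasing in r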

The space Lp. For any p > 0, define Lp (sometimes identified more carefully as Lp(Ω, F, P)) to be the collection of real-valued random variables X such that E|X|^p < ∞, or equivalently ||X||_p < ∞. It follows easily from the elementary inequality

    (x + y)^p ≤ (2 max(x, y))^p ≤ 2^p max(x^p, y^p),   x ≥ 0, y ≥ 0,

that Lp is a vector space. (In particular, if X, Y ∈ Lp, then, for any constants a, b, we have aX + bY ∈ Lp.) Corollary 4.1 may be restated as Lp ⊆ Lq for all 0 < q < p (note that, other than on finite probability spaces, the inclusion is strict). Of particular interest are the space L1 of integrable random variables, and the smaller space L2, which we now discuss.

The space L2. This is the space of random variables X such that EX^2 < ∞. Note that if X ∈ L2, then (since L2 ⊆ L1) EX exists and is finite, and so also X − EX ∈ L2.

Definition 4.2. The variance of a random variable X ∈ L2 is defined by

    Var X := E(X − EX)^2.

Exercise 4.2. Note (from positivity) that for X ∈ L2 we have Var X ≥ 0 always. Use the other
basic properties of expectation and Lemma 4.2 to show that Var X = 0 if and only if X is almost
surely constant, and to establish the identity

    Var X = EX^2 − (EX)^2.

Exercise 4.3. For X ∈ L2 apply the Markov inequality to (X − EX)^2 to prove the Chebyshev inequality:

    P(|X − EX| ≥ t) ≤ Var X / t^2,   t > 0.
Exercise 4.4. For X, Y ∈ L2, define the covariance between X and Y by

    Cov(X, Y ) := E((X − EX)(Y − EY )). (4.4)

It follows from the elementary inequality 0 ≤ 2|xy| ≤ x^2 + y^2 (applied to the random variables X − EX and Y − EY ) that Cov(X, Y ) is finite. Show that, for any constants a and b,

    Var(aX + bY ) = a^2 Var X + 2ab Cov(X, Y ) + b^2 Var Y. (4.5)
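The identity (4.5) is easy to check on simulated data. The following Python sketch (illustrative only; the correlated sample, the constants a, b, the seed and the sample size are arbitrary choices) estimates both sides empirically:

    import random, statistics

    random.seed(2)
    n = 200_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]     # correlated with xs
    a, b = 2.0, -3.0

    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    var_x = statistics.fmean([(x - mx) ** 2 for x in xs])
    var_y = statistics.fmean([(y - my) ** 2 for y in ys])
    cov_xy = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

    zs = [a * x + b * y for x, y in zip(xs, ys)]
    mz = statistics.fmean(zs)
    var_z = statistics.fmean([(z - mz) ** 2 for z in zs])
    # the two printed numbers agree up to sampling error
    print(var_z, a * a * var_x + 2 * a * b * cov_xy + b * b * var_y)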

4.1.4 Hölder and related inequalities


The following important inequality has many useful consequences. For a proof, see e.g. [2].

Lemma 4.6 (Hölder inequality). Let the random variables X, Y be such that X ∈ Lp and Y ∈ Lq for some p, q > 1 such that p^{−1} + q^{−1} = 1. Then

    |E(XY )| ≤ E|XY | ≤ ||X||_p ||Y ||_q. (4.6)

Remarks. It is really the second of the inequalities in (4.6) which is the Hölder inequality (i.e.
it is in essence an inequality for nonnegative random variables); also this result implies that in
particular (under the given conditions) XY is integrable, i.e. XY ∈ L1 . The first inequality in
(4.6) is just the elementary modulus inequality for expectation—see Chapter 3.
In the special case p = q = 2, the Hölder inequality reduces to the well-known Cauchy-
Bunyakowskii-Schwarz inequality (often just referred to as the Schwarz inequality).

Corollary 4.2 (Cauchy-Bunyakowskii-Schwarz). Let the random variables X, Y ∈ L2. Then

    |E(XY )| ≤ E|XY | ≤ ||X||_2 ||Y ||_2.

Exercise 4.5. Give an elementary proof of the Cauchy-Bunyakowskii-Schwarz inequality as follows. We have already observed that, since X, Y ∈ L2, we have E|XY | < ∞, i.e. XY is integrable. Thus also E(XY ) is finite. Now observe that for all λ ∈ R, we have E(λX + Y )^2 ≥ 0. Express the left side of this inequality as a quadratic in λ, and then use the elementary algebraic condition that such a quadratic should never take negative values to deduce the required result.

Exercise 4.6. Let X, Y ∈ L2 with Var X > 0 and Var Y > 0. Define

    ρ(X, Y ) := Cov(X, Y ) / √(Var X · Var Y ).

Show that

    −1 ≤ ρ(X, Y ) ≤ 1.

Finally, we remark that the following lemma is the triangle inequality which justifies the inter-
pretation of || · || as a norm. For p > 1, it is readily deduced from the Hölder inequality—for
details again see, e.g., [2].

Lemma 4.7 (Minkowski inequality). Let X, Y ∈ Lp for some p ≥ 1. Then

    ||X + Y ||_p ≤ ||X||_p + ||Y ||_p.
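Both the Hölder and Minkowski inequalities can be sanity-checked numerically. The sketch below (illustrative; the samples and the exponents p = 3, q = 3/2, which satisfy p^{−1} + q^{−1} = 1, are arbitrary choices) estimates the relevant norms from simulated data:

    import random, statistics

    random.seed(3)
    n = 100_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [random.expovariate(1.0) for _ in range(n)]

    def r_norm(sample, p):
        # empirical ||X||_p
        return statistics.fmean([abs(v) ** p for v in sample]) ** (1.0 / p)

    p, q = 3.0, 1.5                                          # conjugate exponents
    print(statistics.fmean([abs(x * y) for x, y in zip(xs, ys)]),
          r_norm(xs, p) * r_norm(ys, q))                     # Hölder: first <= second
    print(r_norm([x + y for x, y in zip(xs, ys)], p),
          r_norm(xs, p) + r_norm(ys, p))                     # Minkowski: first <= second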

4.2 Moment generating functions


Let X be a real-valued random variable. Since, for any θ ∈ R, the random variable e^{θX} is nonnegative, its expectation exists (but may be equal to +∞). We define the moment generating function M : R → R ∪ {+∞} by

    M (θ) := E(e^{θX}),   θ ∈ R.

Notice that in particular M (0) = 1 always. The values of M (θ) are also referred to as the
exponential moments of the random variable X and the function L(θ) := M (−θ) is also
known as the Laplace transform of (the distribution of) X.
The function M is useful if M (θ) < ∞ for some θ ≠ 0. (Indeed, there are cases where θ = 0 is
the only point at which M is finite.) If X is a positive random variable, then M (θ) ≤ 1 for all
θ ≤ 0. If X is a negative random variable, then M (θ) ≤ 1 for all θ ≥ 0.
Note that (see Theorem 3.2 of Chapter 3) the moment generating function M depends only on
the distribution, or law, of X. Indeed, from that theorem, in the case where X has an absolutely
continuous distribution function F with density f , we can write
    M (θ) = ∫_R e^{θx} f (x) dx   (Lebesgue integral).

The moment generating function M is so called because of the following result.

Lemma 4.8. Suppose there exist a < 0 < b such that M (θ) < ∞ for all a < θ < b. Then
(i) the r-moment of X exists for all r = 0, 1, 2, . . . and is given by the r-derivative of M at 0:

    E(X^r) = D^r M (0).

(ii)

    M (θ) = Σ_{r=0}^{∞} (E(X^r) / r!) θ^r,   a < θ < b.

(iii) There is only one law such that if a random variable has that law then it has moment
generating function M .

Note that the result (iii) above says that, under the conditions of the lemma, the moment
generating function uniquely determines the corresponding law.
Proof [sketch] Using the Dominated Convergence Theorem (see Chapters 3 and 6), we can see
that M is infinitely differentiable at 0 with r-derivative equal to the r-moment of X. Moreover,
we can see that M is a real analytic function around 0. Hence Taylor’s theorem holds, which
yields the second claim.  □

Exercise 4.7. Suppose that the random variable X has moment generating function MX . Show
that, for any constants a, b, the random variable Y := a + bX has moment generating function
MY given by MY (θ) = e^{aθ} MX (bθ).

Exercise 4.8. Suppose that the random variable X has an exponential distribution with parameter λ, i.e. has distribution function F given by F (x) = 0 for x < 0 and F (x) = 1 − e^{−λx} for x ≥ 0. Show that the corresponding moment generating function is given by M (θ) = λ/(λ − θ) for θ < λ (and by M (θ) = ∞ for θ ≥ λ). Use the above lemma to deduce directly that EX = λ^{−1} and Var X = λ^{−2}.
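For the exponential case the assertions of Lemma 4.8 can also be checked numerically by differentiating M (θ) = λ/(λ − θ) at 0 with finite differences; the following sketch (with the arbitrary choices λ = 2 and step size h = 10^{−4}) recovers EX ≈ 1/λ and Var X ≈ 1/λ^2.

    lam = 2.0

    def M(th):
        # MGF of an Exponential(lam) random variable, finite for th < lam
        return lam / (lam - th)

    h = 1e-4
    m1 = (M(h) - M(-h)) / (2 * h)                # central difference ~ M'(0) = EX
    m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2    # second difference ~ M''(0) = EX^2
    print(m1, 1 / lam)                           # both ~ 0.5
    print(m2 - m1 ** 2, 1 / lam ** 2)            # variance, both ~ 0.25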

Exercise 4.9. Let X be absolutely continuous with density f (x) = π^{−1}(1 + x^2)^{−1}, x ∈ R (i.e. X has a Cauchy distribution). Show that the corresponding moment generating function M (θ) is finite only for θ = 0.

Moment generating functions are particularly useful in regard to the addition of independent
random variables, as we shall see.

4.3 Characteristic functions


Moment generating functions, as the last exercise shows, may not give any information at all. There is an analytic technique bypassing this problem, consisting of replacing the argument of the moment generating function by an imaginary number, i.e. a number of the form it, where i = √−1. First, we define the expectation of any complex random variable Y := Y1 + iY2, where Y1, Y2 are real random variables, by

    EY = E(Y1 + iY2) := EY1 + iEY2.

Then, in an extension of the modulus inequality for real-valued random variables,

|EY | ≤ E|Y |. (4.7)

We then define the characteristic function ϕ of a real-valued random variable X by

    ϕ(t) := Ee^{itX},   t ∈ R.

It is important to note that ϕ is a complex-valued function of a real variable t. By the Taylor expansion of the exponential function (Euler's formula),

    e^{itX} = cos(tX) + i sin(tX),   t ∈ R.



Thus

    ϕ(t) = E cos(tX) + iE sin(tX),   t ∈ R. (4.8)

Further |e^{itX}| = [cos^2(tX) + sin^2(tX)]^{1/2} = 1, and therefore ϕ(t) is defined for all t and, from (4.7), |ϕ(t)| ≤ 1. Note further that ϕ(0) = 1 always.
As in the case of the moment generating function, note that (by, e.g. Theorem 3.2 of Chapter 3)
the characteristic function ϕ depends only on the law of the random variable X.
The following result (for a proof of which see, for example, [2]) is very important.

Theorem 4.1. Given any characteristic function ϕ, there is only one law such that if a random variable has that law then it has characteristic function ϕ.

In other words, the characteristic function uniquely determines (characterises) the law of a
random variable. (This contrasts with the situation for the moment generating function where
we saw that additional conditions are required for it to uniquely characterise the corresponding
law.)
It is further straightforward to show [exercise!] that, for any characteristic function ϕ of a random variable X:
1. ϕ is a (uniformly) continuous function.
2. The complex conjugate of ϕ(t) is ϕ(−t). If X is symmetric around 0 (i.e. −X has the same law as X) then, for all t ∈ R, we have that ϕ(t) is a real number.

It follows from Theorem 3.2 of Chapter 3 (straightforwardly extended to complex functions of a real-valued random variable) that if a real-valued random variable X is absolutely continuous with density f , then its characteristic function is given by

    ϕ(t) = ∫_{−∞}^{∞} e^{itx} f (x) dx.

This is the Fourier transform of the density f . When ϕ is integrable, the inverse Fourier transform determines f from ϕ:

    f (x) = (1/(2π)) ∫_{−∞}^{∞} e^{−itx} ϕ(t) dt.
However, this inversion formula is rarely used in practice, where, given a characteristic function
ϕ, we typically conjecture the corresponding law, check that its characteristic function is indeed
ϕ, and then appeal to Theorem 4.1.

Exercise 4.10. Prove the modulus inequality (4.7) for complex random variables.

Exercise 4.11. Suppose that the (real-valued) random variable X has characteristic function
ϕX . Show that, for any constants a, b, the random variable Y := a + bX has characteristic
function ϕY given by ϕY (t) = e^{iat} ϕX (bt).

Exercise 4.12. Let the random variable U have a uniform distribution on (0, 1). Show that the characteristic function of U is given by

    ϕ(t) = (e^{it} − 1) / (it),   t ≠ 0.

Use also the result of the previous exercise to deduce that if a random variable has a uniform distribution on (a, b), then its characteristic function is given by ϕ_{a,b}(t) = (e^{ibt} − e^{iat}) / (i(b − a)t) for t ≠ 0.
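A Monte Carlo sanity check of this formula (an added illustration; the seed, sample size and values of t are arbitrary) compares the empirical average of e^{itU} with (e^{it} − 1)/(it):

    import cmath, random

    random.seed(4)
    us = [random.random() for _ in range(200_000)]           # U ~ Uniform(0, 1)
    for t in (0.5, 1.0, 3.0):
        empirical = sum(cmath.exp(1j * t * u) for u in us) / len(us)
        exact = (cmath.exp(1j * t) - 1) / (1j * t)
        print(t, empirical, exact)                           # close, up to sampling error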

Exercise 4.13. Show that an exponentially-distributed random variable, with density function f given by f (x) = λe^{−λx} for x ≥ 0 and f (x) = 0 for x < 0, has characteristic function ϕ(t) = λ/(λ − it).

Now recall that the normal distribution with mean 0 and variance 1 has density

    f (z) = (1/√(2π)) e^{−z^2/2},   z ∈ R,

and is usually denoted by N (0, 1). The following result is required for the proof of the central limit theorem in Chapter 7.

Lemma 4.9. The N (0, 1) distribution has characteristic function ϕ given by ϕ(t) = e^{−t^2/2}.

Proof  Let the random variable Z have an N (0, 1) distribution. It follows from (4.8) and the symmetry around 0 of this distribution that, for t ∈ R,

    ϕ(t) = E cos(tZ) = (1/√(2π)) ∫_{−∞}^{∞} cos(tz) e^{−z^2/2} dz.

A little elementary analysis then gives

    ϕ′(t) = −(1/√(2π)) ∫_{−∞}^{∞} z sin(tz) e^{−z^2/2} dz = −t ϕ(t),

where, for the first equality in the above display, a little care is needed in order to justify the interchange of limits involved in differentiation with respect to t and integration with respect to z, and where the second equality follows on integrating by parts. Now, easily,

    d/dt (e^{t^2/2} ϕ(t)) = 0,

and so the result follows on recalling also ϕ(0) = 1.
Alternatively the result is also easily deduced by using contour integration in the complex plane.  □
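Lemma 4.9 can also be checked numerically. The sketch below (an added illustration; the truncation half-width, number of steps and values of t are arbitrary choices) approximates the integral for ϕ(t) by a midpoint rule and compares it with e^{−t^2/2}:

    import math

    def phi_numeric(t, half_width=10.0, steps=100_000):
        # midpoint rule for (1/sqrt(2*pi)) * integral of cos(t z) exp(-z^2/2) dz
        dz = 2.0 * half_width / steps
        total = 0.0
        for k in range(steps):
            z = -half_width + (k + 0.5) * dz
            total += math.cos(t * z) * math.exp(-z * z / 2.0)
        return total * dz / math.sqrt(2.0 * math.pi)

    for t in (0.0, 1.0, 2.0):
        print(t, phi_numeric(t), math.exp(-t * t / 2.0))     # the two columns agree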
4.4 Joint distributions


Chapter 3 studied (real-valued) random variables and their distributions. We now wish to extend
some of these ideas to finite collections of random variables, which we refer to as random vectors.
Recall from Chapter 1 (Exercise 1-10) that an ordered collection X = (X1 , . . . , Xd ) of (real-
valued) random variables may equivalently be regarded as a random vector, i.e. as a measurable
function from the probability space (Ω, F, P) into (Rd, B(Rd)) (the set Rd of d-tuples of real numbers endowed with the corresponding Borel σ-algebra B(Rd)).
Recall further from Chapter 3 that the random vector X = (X1 , . . . , Xd ) then induces a proba-
bility measure PX on (Rd , B(Rd )), given by

    PX (B) = P(X ∈ B),   B ∈ B(Rd). (4.9)

The probability measure PX is the law, or distribution, of X. When we regard X as a collection


(X1 , . . . , Xd ) of random variables, the probability measure PX on (Rd , B(Rd )) is also referred to
as the joint distribution of these random variables.

Note that the joint distribution PX determines the (marginal) laws or distributions PX1 , . . . , PXd
of each of the individual random variables X1 , . . . , Xd : for example, for each (one-dimensional)
Borel set B ∈ B,
PX1 (B) = P(X1 ∈ B) = PX (B × R × · · · × R).
However, PX provides information not just about these individual distributions, but also about
the ways in which the random variables X1 , . . . , Xd are probabilistically associated with each
other.
For simplicity, in the rest of this section we specialise to the case d = 2 and replace the random
vector (or pair of random variables) (X1 , X2 ) with the random vector (X, Y ).

4.4.1 Joint distribution function


The joint distribution function of the random vector (X, Y ) is the function
FX,Y (x, y) := P(X ≤ x, Y ≤ y) = PX,Y ((−∞, x] × (−∞, y]), (x, y) ∈ R2 . (4.10)
Note in particular that the marginal distribution functions FX and FY of the individual random
variables X and Y are given by
    FX (x) = lim_{y→∞} FX,Y (x, y),
    FY (y) = lim_{x→∞} FX,Y (x, y).

(Note also that the choice of ≤ instead of < in (4.10) is an arbitrary convention.)
Analogously to Lemma 3.2 of Chapter 3, the joint distribution function FX,Y has the following
properties.
(i) FX,Y is nondecreasing in each of its arguments, [monotonicity]
(ii) limx→−∞ FX,Y (x, y) = 0 for all y, and also limy→−∞ FX,Y (x, y) = 0 for all x,
(iii) limx,y→+∞ FX,Y (x, y) = 1,
(iv) limn→∞ FX,Y (x + 1/n, y) = FX,Y (x, y) for all y, and also limn→∞ FX,Y (x, y + 1/n) =
FX,Y (x, y) for all x. [right continuity]
However, in the two-dimensional case d = 2 (and in higher-dimensional cases) these properties
are insufficient to characterise joint distribution functions—but see below for the additional
condition required.

4.4.2 Knowledge of FX,Y implies knowledge of PX,Y


Now consider a rectangle (with sides parallel to the axes—please think geometrically)
(a1 , b1 ] × (a2 , b2 ] := {(x, y) ∈ R2 : a1 < x ≤ b1 , a2 < y ≤ b2 }. (4.11)
We allow a1 , a2 to take any value, including −∞. Since (−∞, b1 ] × (−∞, b2 ] is the disjoint union
of four rectangles, using additivity, we obtain
PX,Y ((a1 , b1 ] × (a2 , b2 ]) = FX,Y (b1 , b2 ) − FX,Y (b1 , a2 ) − FX,Y (a1 , b2 ) + FX,Y (a1 , a2 ). (4.12)
Hence the joint distribution function FX,Y determines the probability PX,Y on rectangles, and
so also on countable unions of disjoint rectangles. Analogously to Lemma 3.3 of Chapter 3, and
with some messiness, the Extension Theorem may be used to show that, given any function
FX,Y on R2 satisfying the properties (i)-(iv) above, together with the additional requirement
that the right side of (4.12) is nonnegative for all a1 ≤ b1 and a2 ≤ b2 , there exists a random
vector (X, Y ) with joint distribution function FX,Y , and further that the corresponding joint
distribution PX,Y is unique. Thus in particular FX,Y always determines PX,Y uniquely. (To
establish the uniqueness result alone is simpler—see, e.g. Chapter 1 of [2].)

4.4.3 Discrete distributions


Now suppose that the random vector (X, Y ) takes values in a discrete set (S1 × S2 ) (endowed
with the σ-algebra of all subsets of this set).
As usual, define the joint probability (mass) function of (X, Y ) by

pX,Y (x, y) := P((X, Y ) = (x, y)) = P(X = x, Y = y), (x, y) ∈ S1 × S2 .

Note that this function is nonnegative and that

    Σ_{(x,y)∈S1×S2} pX,Y (x, y) = 1.

We then have that the marginal probability functions of X and Y are given by the nonnegative functions

    pX (x) := P(X = x) = Σ_{y∈S2} pX,Y (x, y)

and

    pY (y) := P(Y = y) = Σ_{x∈S1} pX,Y (x, y).

Define also the conditional probability function of Y given X by^b

    pY|X (y|x) := pX,Y (x, y) / pX (x)   if pX (x) ≠ 0,   and   pY|X (y|x) := 0   otherwise. (4.13)

We then have, for any x, y,

    P(X = x, Y = y) = pX (x) pY|X (y|x)

(since if pX (x) = 0 then also pX,Y (x, y) = 0) and also, if pX (x) > 0,

    P(Y = y | X = x) = pY|X (y|x).
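These definitions are conveniently summarised by a tiny worked example (the joint probability function below is an arbitrary illustrative choice, not taken from the notes): the marginals are obtained by summing out the other variable, and the conditional probability function by the ratio in (4.13).

    from collections import defaultdict

    # a small joint probability function on S1 x S2 = {0, 1} x {0, 1, 2}
    p_xy = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
            (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30}
    assert abs(sum(p_xy.values()) - 1.0) < 1e-12

    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p                        # marginal of X: sum over y
        p_y[y] += p                        # marginal of Y: sum over x

    def p_y_given_x(y, x):
        # conditional probability function as in (4.13)
        return p_xy.get((x, y), 0.0) / p_x[x] if p_x[x] > 0 else 0.0

    print(dict(p_x), dict(p_y))
    print(sum(p_y_given_x(y, 1) for y in (0, 1, 2)))   # a conditional pmf sums to 1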

Exercise 4.14. Let the random variables N1 and N2 be the numbers obtained on two independent rolls of a die. Define X = min(N1 , N2 ) and Y = max(N1 , N2 ). Find the joint probability function of (X, Y ). Use it to calculate the marginal probability functions of X and Y and verify that these are as you would expect by direct determination of each of them. Find also the conditional probability functions and verify that these are as you would expect.

4.4.4 Absolutely continuous distributions and joint density


Now suppose instead that there exists a nonnegative measurable function fX,Y on (R2, B(R2)) such that FX,Y can be written as a Lebesgue integral:^c

    FX,Y (x, y) = ∫_{(−∞,x]×(−∞,y]} fX,Y (s, t) ds dt. (4.14)
^b The notation pY|X is terrible. We only use it out of some respect for the undergraduate probability courses. The reason that the notation is terrible is that the subscript 'Y|X' in pY|X (y|x) plays a merely cosmetic rôle, as opposed to the essential rôle played by the last variable x inside the parentheses.
^c The Lebesgue integral was effectively constructed in Chapter 3. However, note in particular that if (4.14)
holds with the alternative interpretation of its right side as the more familiar Riemann integral, then it holds with
the interpretation of its right side as the Lebesgue integral. The point is that the Lebesgue integral may continue
to exist in circumstances—pathological from the point of view of applications—where the Riemann integral does
not.

In such a case, we say that the random vector (X, Y ) has an absolutely continuous distribu-
tion, and the function fX,Y is referred to as the joint density of this distribution.
It can then be shown (see also Figure 4.2) that, for any Borel set B ∈ B(R2),

    P((X, Y ) ∈ B) = PX,Y (B) = ∫_B fX,Y (s, t) ds dt. (4.15)

Figure 4.2: The probability PX,Y (B) is given by integrating the joint density fX,Y over the shaded Borel set B.

(In the case where B is a rectangle B = (a1, b1] × (a2, b2] as above, this follows from (4.12) [exercise].) We also have of course that

    ∫_{R2} fX,Y (s, t) ds dt = 1.

The marginal distribution function FX of X is given by

    FX (x) = ∫_{(−∞,x]×(−∞,∞)} fX,Y (s, t) ds dt = ∫_{−∞}^{x} fX (s) ds, (4.16)

where the function fX is given by

    fX (s) = ∫_R fX,Y (s, t) dt. (4.17)

(Here we have used Fubini's theorem—again see [2]—which essentially allows the two-dimensional integral to be treated as a succession of two one-dimensional integrals.) It follows that the function fX given by (4.17) is a density for X—and so X is absolutely continuous, as is Y .
We can also, naïvely, define the conditional density of Y given X by

    fY|X (y|x) := fX,Y (x, y) / fX (x)   if fX (x) ≠ 0,   and   fY|X (y|x) := 0   otherwise.

Some justification for this is given by the observation that then (4.15) continues to hold when fX,Y (s, t) is replaced by fX (s) fY|X (t|s). However, we leave a proper treatment of what it means to condition on a random variable until Chapter 8.
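The relations (4.15)–(4.17) are easy to check numerically for a concrete density. The sketch below (an added illustration; it uses the density f(x, y) = x + y on the unit square, chosen because it integrates to 1 and has the simple marginal fX(x) = x + 1/2) approximates the integrals by a midpoint rule.

    def f(x, y):
        # a valid joint density on the unit square: f(x, y) = x + y, integrating to 1
        return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

    n = 400
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]

    # marginal density f_X(x) as in (4.17); here it should equal x + 1/2
    for x in (0.25, 0.5, 0.75):
        print(x, sum(f(x, y) * h for y in grid), x + 0.5)

    # P(X > Y) as in (4.15); by symmetry of this density it should be close to 1/2
    print(sum(f(x, y) * h * h for x in grid for y in grid if x > y))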
Exercise 4.15. Suppose that the random variables X and Y have joint density

    fX,Y (x, y) = a(x + y^2) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,   and   fX,Y (x, y) = 0 otherwise.

(i) Find the marginal densities fX and fY and the marginal distribution functions FX and FY of X and Y . Hence identify also the constant a.
(ii) Find the joint distribution function FX,Y of X and Y .
(iii) Find P(X > Y ) and also P(X^2 > Y ).

4.5 Independence
We continue to consider a random vector (pair of random variables) (X, Y ), but the results of
this section have obvious extensions to d-dimensional random vectors (X1 , . . . , Xd ) or indeed to
random sequences (sequences of random variables) (X1 , X2 , . . . ).
Recall that random variables X and Y are independent if and only if the generated σ-algebras
σ(X) and σ(Y ) are independent.

Lemma 4.10. Suppose that the random variables X and Y are independent. Then
(i) If X and Y have distribution functions FX and FY respectively, then the joint distribution
function FX,Y of the random vector (X, Y ) is given by

FX,Y (x, y) = FX (x)FY (y), for all x, y. (4.18)

(ii) If X and Y are discrete, with probability (mass) functions pX and pY respectively, then
the random vector (X, Y ) is discrete with probability function pX,Y given by

pX,Y (x, y) = pX (x)pY (y) for all x, y. (4.19)

(iii) If X and Y are absolutely continuous, with density functions fX and fY respectively, then
the random vector (X, Y ) is absolutely continuous with density function fX,Y given by

fX,Y (x, y) = fX (x) fY (y) for almost all x, y.^d (4.20)

(iv) If X and Y are positive [respectively integrable], then the random variable XY is positive
[respectively integrable] and
E(XY ) = EX EY. (4.21)

(v) If X and Y have moment generating functions MX and MY respectively, then the random
vector (X, Y ) has joint moment generating function

MX,Y (η, θ) := Ee^{ηX+θY} = MX (η) MY (θ),   η, θ ∈ R, (4.22)

where we allow the possibility that, for appropriate η or θ, MX (η) or MY (θ) may be infinite
(i.e. if the right side of the above identity is infinite, then so is the left).
(vi) If X and Y have characteristic functions ϕX and ϕY respectively, then the random vector
(X, Y ) has joint characteristic function

ϕX,Y (s, t) := Ee^{isX+itY} = ϕX (s) ϕY (t),   s, t ∈ R. (4.23)

Proof The result (i) is immediate from the definitions of the joint distribution function and
of independence, i.e., for any x, y,

FX,Y (x, y) = P(X ≤ x, Y ≤ y)


= P(X ≤ x)P(Y ≤ y) (independence)
= FX (x)FY (y).

The proof of (ii) is entirely similar to that of (i).


^d By "for almost all x, y" we mean "except perhaps on a set in R2 of Lebesgue measure 0", since any density function may be changed on such a set and remain the density function of the same distribution.

For (iii) note that, from (i),

    FX,Y (x, y) = FX (x) FY (y)
               = (∫_{−∞}^{x} fX (s) ds) (∫_{−∞}^{y} fY (t) dt)
               = ∫_{(−∞,x]×(−∞,y]} fX (s) fY (t) ds dt   (Fubini again),

so that the function of s and t given by fX (s)fY (t) is indeed the joint density function of (X, Y ).
For (iv) note that if X and Y are discrete, then the result follows from (ii) and the rearrangement
of a double sum [exercise!]. In the general case the result is yet another application of Fubini’s
Theorem, which, as previously remarked, permits similar rearrangements of double integrals.
The results (v) and (vi) are immediate applications of (iv) on noting that the independence of X and Y implies also that of e^{ηX} and e^{θY} and of e^{isX} and e^{itY}.  □

Corollary 4.3. Suppose that X, Y ∈ L2 are independent. Then

Var(X + Y ) = Var X + Var Y.

Proof Since X and Y are independent so also are X − EX and Y − EY . Hence, from (iv)
of Lemma 4.10, Cov(X, Y ) = E(X − EX)E(Y − EY ) = 0, and the required result now follows
from (4.5).  □
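A quick Monte Carlo illustration of Corollary 4.3 (an added sketch; the two distributions, the seed and the sample size are arbitrary choices):

    import random, statistics

    random.seed(5)
    n = 200_000
    xs = [random.expovariate(1.0) for _ in range(n)]          # Exponential(1): variance 1
    ys = [random.gauss(0.0, 2.0) for _ in range(n)]           # N(0, 4): variance 4, independent of xs
    sums = [x + y for x, y in zip(xs, ys)]
    print(statistics.variance(sums),
          statistics.variance(xs) + statistics.variance(ys))  # both ~ 5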

Exercise 4.16. Let V, W be independent identically distributed random variables with distribution functions FV (t) = FW (t) = 1 − e^{−t}, t > 0. Let X = 2V , Y = V − W . Compute the joint distribution function FX,Y of the random vector (X, Y ). [Hint. Note that since V and W are both nonnegative we necessarily have X ≥ 0 and Y ≤ X/2. Hence it is sufficient to evaluate FX,Y (x, y) for x > 0, y ≤ x/2 [why?]. For such (x, y),

    FX,Y (x, y) = ∫_A e^{−v} e^{−w} dv dw

where A is the region {(v, w) : 0 ≤ v ≤ x/2, w ≥ max(v − y, 0)}. Now sketch this region in (v, w)-space, and integrate first with respect to one variable, and then with respect to the other.]

Exercise 4.17. Let X, Y be independent random variables with common moment generating function MX (θ) = MY (θ) = e^{θ^2}. (This corresponds to the common distribution of X and Y being normal with mean 0 and variance 2.) Let X′ := X + Y , Y′ := X − Y . Compute the joint moment generating function of (X′, Y′) (you will need to use the definition contained in (4.22) above). Show that X′, Y′ are independent.

Sums of independent random variables

Again suppose that the random variables X and Y are independent—with respective laws (dis-
tributions) PX and PY and respective distribution functions FX and FY . Then the law (distribu-
tion) of the random variable X + Y is referred to as the convolution of the laws (distributions)
of X and Y .
However, it is not always easy to calculate this. If, for example, FX+Y denotes the distribution

function of X + Y , then, for all z,

    FX+Y (z) = P(X + Y ≤ z)
             = ∫_{−∞}^{∞} P(Y ≤ z − x) dPX (x)
             = ∫_{−∞}^{∞} FY (z − x) dPX (x)                  (4.24)
             = E FY (z − X),

i.e., from (4.24), FX+Y (z) is the Lebesgue integral of the function of x given by FY (z − x) with respect to the probability measure (law) PX on R. (Of course the roles of X and Y may be interchanged!) When, for example, X is discrete with probability function pX ,

    FX+Y (z) = Σ_{all x} FY (z − x) pX (x),                   (4.25)

while when X is absolutely continuous with density fX ,

    FX+Y (z) = ∫_{−∞}^{∞} FY (z − x) fX (x) dx.               (4.26)
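The formula (4.26) can be checked numerically. In the sketch below (an added illustration; it takes X and Y to be independent Exponential(1) variables and uses an arbitrary integration grid, seed and sample size), the numerical integral is compared with a Monte Carlo estimate of P(X + Y ≤ z):

    import math, random

    lam = 1.0

    def F_Y(y):
        # Exponential(1) distribution function
        return 1.0 - math.exp(-lam * y) if y > 0 else 0.0

    def f_X(x):
        # Exponential(1) density
        return lam * math.exp(-lam * x) if x > 0 else 0.0

    def F_sum(z, steps=20_000, upper=50.0):
        # numerical version of (4.26): integral of F_Y(z - x) f_X(x) dx
        h = upper / steps
        return sum(F_Y(z - (k + 0.5) * h) * f_X((k + 0.5) * h) * h for k in range(steps))

    random.seed(6)
    samples = [random.expovariate(lam) + random.expovariate(lam) for _ in range(200_000)]
    for z in (0.5, 1.0, 2.0):
        monte_carlo = sum(1 for s in samples if s <= z) / len(samples)
        print(z, F_sum(z), monte_carlo)        # the two estimates agree closely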

Fortunately, Lemma 4.10 gives us the following results.


Lemma 4.11. Suppose that the random variables X and Y are independent.
(i) If X and Y have moment generating functions MX and MY respectively, then the random
variable X + Y has moment generating function

MX+Y (θ) = MX (θ)MY (θ), θ ∈ R, (4.27)

where we again allow the possibility that, for appropriate θ, MX (θ) or MY (θ) may be
infinite (i.e. if the right side of the above identity is infinite, then so is the left).
(ii) If X and Y have characteristic functions ϕX and ϕY respectively, then the random variable
X + Y has characteristic function

ϕX+Y (t) = ϕX (t)ϕY (t), t ∈ R. (4.28)

Proof The proof of (i) is immediate from part (v) of Lemma 4.10 on putting η = θ, while
the proof of (ii) follows similarly from part (vi) of Lemma 4.10 on putting s = t. Alternatively,
either of these results may be proved directly by using part (iv) of Lemma 4.10.  □
These results may be used to calculate laws (distributions) of sums of independent random
variables, since (frequently) the moment generating function determines the corresponding law
(distribution) uniquely, while (always) the characteristic function determines the corresponding
law (distribution) uniquely.
Exercise 4.18. Let X and Y be independent random variables having Poisson distributions with parameters λ > 0 and µ > 0 respectively, i.e.

    P(X = n) = e^{−λ} λ^n / n!,   n = 0, 1, 2, . . . ,

and similarly for Y . Show that the moment generating functions MX and MY of X and Y are given by MX (θ) = e^{λ(e^θ−1)} and MY (θ) = e^{µ(e^θ−1)} respectively. Deduce that X + Y has a Poisson distribution with parameter λ + µ.
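The conclusion of this exercise can also be checked directly, without moment generating functions, by convolving the two probability functions as in (4.25); the sketch below (with the arbitrary illustrative parameters λ = 1.5 and µ = 2.5) compares the convolution with the Poisson(λ + µ) probability function.

    import math

    def pois_pmf(k, rate):
        # Poisson probability function
        return math.exp(-rate) * rate ** k / math.factorial(k)

    lam, mu = 1.5, 2.5
    for n in (0, 1, 2, 5, 10):
        conv = sum(pois_pmf(k, lam) * pois_pmf(n - k, mu) for k in range(n + 1))
        print(n, conv, pois_pmf(n, lam + mu))    # the two columns agree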

Further examples of the use of these results will be given in Chapter 5.



Converse results

Recall that, as emphasised in Chapter 3, where independence exists it is usually because it is a given feature of a probability model, reflecting a construction appropriate to some given sequence of physically independent experiments, e.g. independent identically distributed trials. Then random variables measurable with respect to independent sub-σ-algebras are automatically independent.
However, it is sometimes necessary to verify independence, and therefore it is useful to know,
given random variables X and Y , which of the conditions (4.18)– (4.23) featured in Lemma 4.10
are sufficient for their independence. We give in detail the result for joint distribution functions,
and discuss also the remaining conditions.
Lemma 4.12. Suppose that the joint distribution function of random variables X and Y fac-
torises as
FX,Y (x, y) = GX (x)GY (y), for all x, y, (4.29)
for some functions GX and GY . Then X and Y are independent, and further, for some constant
k > 0, their distribution functions FX and FY satisfy

FX (x) = kGX (x) for all x, (4.30)


    FY (y) = k^{−1} GY (y) for all y. (4.31)

Proof  By letting y → ∞ in (4.29), we see that (4.30) holds with k = lim_{y→∞} GY (y). We similarly obtain (4.31)—that the constant there is k^{−1} follows since, from (4.29),

    lim_{x→∞} GX (x) · lim_{y→∞} GY (y) = lim_{x,y→∞} FX,Y (x, y) = 1 = lim_{x→∞} FX (x) · lim_{y→∞} FY (y).

Hence the conditions of the lemma imply that the joint distribution function of X and Y
factorises as the product of the marginal distribution functions:

FX,Y (x, y) = FX (x)FY (y), for all x, y. (4.32)

To show that this implies the independence of X and Y , let X′ and Y′ be independent random variables with the distribution functions FX and FY respectively. Then, from (4.32), the joint distribution function of (X′, Y′) is also FX,Y . Since any joint distribution function uniquely determines the corresponding law, it follows that the law PX,Y of (X, Y ) is the same as that of (X′, Y′) and so X, Y are independent also.  □
In the case where random variables X and Y are discrete, it similarly follows that factorisation
of the joint probability function is sufficient for their independence; in the case where X and
Y are absolutely continuous, factorisation of the joint density function is sufficient for their
independence. For any random variables X and Y , the factorisation of their joint characteristic
function (as in (4.23), but the factors on the right side do not need to be identified a priori as
characteristic functions) is sufficient for their independence. The proof is again entirely similarly
to that of Lemma 4.12 and again relies on the fact that any characteristic function identifies
the corresponding law uniquely. A corresponding result holds for the joint moment generating
function (factorisation implies independence) provided that this function exists in (at least) a
(one-sided) neighbourhood of zero.
Finally we remark that the condition (4.21) (a weak condition not comparable with the remaining
conditions featuring in Lemma 4.10) is not sufficient for the independence of X and Y .

References
[1] B. Fristedt & L. Gray, A Modern Approach to Probability Theory, Birkhäuser, 1997.

[2] D. Williams, Probability with Martingales, Cambridge, 1991.
