
SMSTC (2022/23)
Foundations of Probability
Chapter 4: Random variables and their laws II
The Probability Team^a

www.smstc.ac.uk

Contents
4.1 Moments and inequalities
    4.1.1 Markov and Chernoff inequalities
    4.1.2 Jensen's inequality
    4.1.3 Moments and the spaces Lp
    4.1.4 Hölder and related inequalities
4.2 Moment generating functions
4.3 Characteristic functions
4.4 Joint distributions
    4.4.1 Joint distribution function
    4.4.2 Knowledge of FX,Y implies knowledge of PX,Y
    4.4.3 Discrete distributions
    4.4.4 Absolutely continuous distributions and joint density
4.5 Independence

4.1 Moments and inequalities


4.1.1 Markov and Chernoff inequalities
Lemma 4.1 (Markov inequality). Let X be a nonnegative random variable. Then

    P(X ≥ t) ≤ EX / t,   t > 0.
Proof We have, for t > 0,
X ≥ tI{X≥t} a.s.,
where, as usual, I{X≥t} is the indicator random variable of the event {X ≥ t}, and where “a.s.”
is the usual abbreviation for “almost surely”, i.e. the above identity holds on a set of probability 1
(indeed since X is nonnegative, the above identity here holds for all ω ∈ Ω, but “almost surely”
is all we need). Hence, by the monotonicity property of expectation,
EX ≥ E(tI{X≥t} )
= tP(X ≥ t).

The following lemma gives an important additional “basic” property of expectation, which may
be added to those of the previous chapter.
^a B.Buke@ed.ac.uk


Lemma 4.2. Let X be a nonnegative random variable such that EX = 0. Then P(X = 0) = 1.

Proof  From the Markov inequality, for any integer n > 0,

    P(X > 1/n) ≤ P(X ≥ 1/n) ≤ n EX = 0,

and the result now follows from the continuity property P9 of Chapter 1, since {X > 0} = ∪_n {X > 1/n} and so P(X > 0) = lim_{n→∞} P(X > 1/n) = 0.  □

Lemma 4.3 (Chernoff inequality). Let X be a real-valued random variable. Then, for any positive nondecreasing function g and any t ∈ R,

    P(X ≥ t) ≤ Eg(X) / g(t).

Proof  Since g is nondecreasing,

    {X ≥ t} ⊆ {g(X) ≥ g(t)},

so that P(X ≥ t) ≤ P(g(X) ≥ g(t)). Now apply the Markov inequality to the nonnegative random variable g(X).  □

Exercise 4.1. Let X be a discrete random variable with P(X = k) = (n choose k) 2^{−n}, k = 0, 1, . . . , n (i.e. X has a binomial distribution with parameters n and 1/2). For a > 1/2, use the Chernoff inequality to obtain an upper bound for P(X > na), and show that this bound decays exponentially in n. [Hint: take the function g of Lemma 4.3 to be given by g(x) = e^{θx} for some θ > 0. We then have (from the binomial theorem) that Ee^{θX} = 2^{−n}(1 + e^θ)^n; now choose θ so as to optimise the bound.]
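The exponential decay asked for in the hint can be seen numerically. The following Python sketch (an added illustration, not part of the exercise) compares the exact tail P(X > na) for a = 0.6 with the Chernoff bound Ee^{θX} e^{−θna}, minimised over a grid of θ > 0 rather than in closed form; the grid, the value a = 0.6 and the chosen values of n are arbitrary.

    import math

    def exact_tail(n, a):
        # P(X > n*a) for X ~ Binomial(n, 1/2)
        k0 = math.floor(n * a) + 1
        return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2.0 ** n

    def chernoff_bound(n, a):
        # minimise 2^{-n} (1 + e^theta)^n e^{-theta*n*a} over a grid of theta > 0,
        # working in log space to avoid overflow for large n
        thetas = [k / 200.0 for k in range(1, 1000)]
        log_vals = [n * math.log(0.5 * (1.0 + math.exp(th))) - th * n * a
                    for th in thetas]
        return math.exp(min(log_vals))

    for n in (20, 40, 80, 160):
        print(n, exact_tail(n, 0.6), chernoff_bound(n, 0.6))

Both columns shrink roughly geometrically as n doubles, which is the exponential decay the exercise asks for.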

4.1.2 Jensen’s inequality


A function ϕ : R → R is convex if

ϕ(pa + (1 − p)b) ≤ pϕ(a) + (1 − p)ϕ(b) (4.1)

for all a, b ∈ R and all 0 ≤ p ≤ 1. Geometrically this means that a line joining (a, ϕ(a)) and
(b, ϕ(b)) lies above the function ϕ (not necessarily strictly) at all points intermediate between a
and b. Examples of convex functions are (i) any linear function x ↦ a + bx for constants a and b, (ii) x ↦ |x|^a for any a ≥ 1, and (iii) x ↦ e^x. Further the function given by the (pointwise)
supremum of any family of convex functions is convex. [Exercise: prove this.]
Indeed we have the following useful result: a function ϕ is convex if and only if, for all a ∈ R,
there exists a constant b (depending on a) such that

ϕ(a) + b(x − a) ≤ ϕ(x), x ∈ R, (4.2)

i.e. if and only if, for every a, there exists a straight line of some slope b through (a, ϕ(a)) which
lies below (not necessarily strictly) the function ϕ at all other points x.
The “if” part of this result is immediate from the observation of the preceding paragraph, while
the “only if” part (which is what we need below) is geometrically “obvious”—see Figure 4.1. A
formal proof is a useful exercise, or, failing this, see [2]. (Note that we do not necessarily have
that ϕ is differentiable at a.)
Now notice that if ξ is a random variable with P(ξ = a) = p, P(ξ = b) = 1−p, the definition (4.1)
can be written as
ϕ(Eξ) ≤ Eϕ(ξ).
Jensen’s inequality generalises this observation.

Figure 4.1: Convex function ϕ and "supporting" straight line ϕ(a) + b(x − a), of slope b, at (a, ϕ(a)).

Lemma 4.4 (Jensen’s inequality). Let X be a real-valued integrable random variable (i.e.
E|X| < ∞) and let ϕ be a convex function. Then

ϕ(EX) ≤ Eϕ(X). (4.3)

Proof Let b be such that (4.2) holds with a = EX. Then

ϕ(EX) + b(x − EX) ≤ ϕ(x), x ∈ R,

so that
ϕ(EX) + b(X − EX) ≤ ϕ(X) a.s.
Taking expectations of both sides gives the required result (4.3).  □
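As a quick illustrative check (a minimal sketch with an arbitrary choice of distribution and convex function, not part of the notes), the two sides of (4.3) can be compared by simulation; here X is uniform on (0, 1) and ϕ(x) = e^x, so that ϕ(EX) = e^{1/2} ≈ 1.649 and Eϕ(X) = e − 1 ≈ 1.718.

    import math, random, statistics

    random.seed(0)
    xs = [random.random() for _ in range(100_000)]        # X ~ Uniform(0, 1)
    lhs = math.exp(statistics.fmean(xs))                  # phi(EX), estimated, phi = exp
    rhs = statistics.fmean([math.exp(x) for x in xs])     # E phi(X), estimated
    print(lhs, rhs, lhs <= rhs)                           # roughly 1.649, 1.718, True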

4.1.3 Moments and the spaces Lp


Definition 4.1. For r > 0 and a real-valued random variable X, the r-absolute moment of X is defined as the quantity E|X|^r. Note that this always exists, but may be finite or infinite. We define also the quantity ||X||_r := (E|X|^r)^{1/r}, again allowing the possibility that this may be infinite. (The latter quantity may be thought of as a properly dimensioned version of the r-absolute moment: note in particular that, for any constant a, we have ||aX||_r = |a| ||X||_r. In the case where ||X||_r < ∞ it is referred to as the r-norm of X.)

Note that for integer r ≥ 1, we may also define the r-moment of X by EX^r, whenever this exists. (Recall from Chapter 3 that EX^r = E((X^+)^r) − E((X^−)^r) provided that at least one of the expectations on the right side is finite.)

Lemma 4.5. For any random variable X, we have that ||X||_r is nondecreasing in r.

Proof  We require to show that for any 0 < r < s we have ||X||_r ≤ ||X||_s. In the case where ||X||_r < ∞ (i.e. E|X|^r < ∞) this follows immediately from the application of Jensen's inequality to the random variable |X|^r using the convex function ϕ given by ϕ(x) = x^{s/r} for x ≥ 0 (see above). In the case where ||X||_r = ∞, we may (from the definition of expectation) consider a sequence X_n of positive simple random variables such that X_n ≤ |X| and ||X_n||_r → ∞. Thus, from our result in the finite case, ||X_n||_s → ∞, and so it follows by monotonicity that ||X||_s = ∞ as required.  □

Corollary 4.1. If E|X|^p < ∞ for some p > 0 then E|X|^r < ∞ for all 0 < r < p.
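A small numerical illustration of Lemma 4.5 and Corollary 4.1 (a sketch with an arbitrarily chosen lognormal sample, for which all absolute moments are finite; the seed and sample size are likewise arbitrary):

    import math, random, statistics

    random.seed(1)
    xs = [math.exp(random.gauss(0.0, 1.0)) for _ in range(200_000)]   # lognormal sample

    def r_norm(sample, r):
        # empirical version of ||X||_r = (E|X|^r)^(1/r)
        return statistics.fmean([abs(v) ** r for v in sample]) ** (1.0 / r)

    for r in (0.5, 1.0, 2.0, 3.0):
        print(r, r_norm(xs, r))    # the printed values should be nondecreasing in r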

The space Lp. For any p > 0, define Lp (sometimes identified more carefully as Lp(Ω, F, P)) to be the collection of real-valued random variables X such that E|X|^p < ∞, or equivalently ||X||_p < ∞. It follows easily from the elementary inequality

    (x + y)^p ≤ (2 max(x, y))^p ≤ 2^p max(x^p, y^p),   x ≥ 0, y ≥ 0,

that Lp is a vector space. (In particular, if X, Y ∈ Lp, then, for any constants a, b, we have aX + bY ∈ Lp.) Corollary 4.1 may be restated as Lp ⊆ Lq for all 0 < q < p (note that, other than on finite probability spaces, the inclusion is strict). Of particular interest are the space L1 of integrable random variables, and the smaller space L2, which we now discuss.

The space L2. This is the space of random variables X such that EX^2 < ∞. Note that if X ∈ L2, then (since L2 ⊆ L1) EX exists and is finite, and so also X − EX ∈ L2.

Definition 4.2. The variance of a random variable X ∈ L2 is defined by

    Var X := E(X − EX)^2.

Exercise 4.2. Note (from positivity) that for X ∈ L2 we have Var X ≥ 0 always. Use the other
basic properties of expectation and Lemma 4.2 to show that Var X = 0 if and only if X is almost
surely constant, and to establish the identity

    Var X = EX^2 − (EX)^2.

Exercise 4.3. For X ∈ L2 apply the Markov inequality to (X − EX)^2 to prove the Chebyshev inequality:

    P(|X − EX| ≥ t) ≤ Var X / t^2,   t > 0.
Exercise 4.4. For X, Y ∈ L2, define the covariance between X and Y by

    Cov(X, Y ) := E((X − EX)(Y − EY )). (4.4)

It follows from the elementary inequality 0 ≤ 2|xy| ≤ x^2 + y^2 (applied to the random variables X − EX and Y − EY ) that Cov(X, Y ) is finite. Show that, for any constants a and b,

    Var(aX + bY ) = a^2 Var X + 2ab Cov(X, Y ) + b^2 Var Y. (4.5)
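The identity (4.5) is easy to check on simulated data. The following Python sketch (illustrative only; the correlated sample, the constants a, b, the seed and the sample size are arbitrary choices) estimates both sides empirically:

    import random, statistics

    random.seed(2)
    n = 200_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]     # correlated with xs
    a, b = 2.0, -3.0

    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    var_x = statistics.fmean([(x - mx) ** 2 for x in xs])
    var_y = statistics.fmean([(y - my) ** 2 for y in ys])
    cov_xy = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

    zs = [a * x + b * y for x, y in zip(xs, ys)]
    mz = statistics.fmean(zs)
    var_z = statistics.fmean([(z - mz) ** 2 for z in zs])
    # the two printed numbers agree up to sampling error
    print(var_z, a * a * var_x + 2 * a * b * cov_xy + b * b * var_y)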

4.1.4 Hölder and related inequalities


The following important inequality has many useful consequences. For a proof, see e.g. [2].

Lemma 4.6 (Hölder inequality). Let the random variables X, Y be such that X ∈ Lp and Y ∈ Lq for some p, q > 1 such that p^{−1} + q^{−1} = 1. Then

    |E(XY )| ≤ E|XY | ≤ ||X||_p ||Y ||_q. (4.6)

Remarks. It is really the second of the inequalities in (4.6) which is the Hölder inequality (i.e.
it is in essence an inequality for nonnegative random variables); also this result implies that in
particular (under the given conditions) XY is integrable, i.e. XY ∈ L1 . The first inequality in
(4.6) is just the elementary modulus inequality for expectation—see Chapter 3.
In the special case p = q = 2, the Hölder inequality reduces to the well-known Cauchy-
Bunyakowskii-Schwarz inequality (often just referred to as the Schwarz inequality).

Corollary 4.2 (Cauchy-Bunyakowskii-Schwarz). Let the random variables X, Y ∈ L2. Then

    |E(XY )| ≤ E|XY | ≤ ||X||_2 ||Y ||_2.

Exercise 4.5. Give an elementary proof of the Cauchy-Bunyakowskii-Schwarz inequality as follows. We have already observed that, since X, Y ∈ L2, we have E|XY | < ∞, i.e. XY is integrable. Thus also E(XY ) is finite. Now observe that for all λ ∈ R, we have E(λX + Y )^2 ≥ 0. Express the left side of this inequality as a quadratic in λ, and then use the elementary algebraic condition that such a quadratic should never take negative values to deduce the required result.

Exercise 4.6. Let X, Y ∈ L2 with Var X > 0 and Var Y > 0. Define

    ρ(X, Y ) := Cov(X, Y ) / √(Var X · Var Y ).

Show that

    −1 ≤ ρ(X, Y ) ≤ 1.

Finally, we remark that the following lemma is the triangle inequality which justifies the inter-
pretation of || · || as a norm. For p > 1, it is readily deduced from the Hölder inequality—for
details again see, e.g., [2].

Lemma 4.7 (Minkowski inequality). Let X, Y ∈ Lp for some p ≥ 1. Then

    ||X + Y ||_p ≤ ||X||_p + ||Y ||_p.
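Both the Hölder and Minkowski inequalities can be sanity-checked numerically. The sketch below (illustrative; the samples and the exponents p = 3, q = 3/2, which satisfy p^{−1} + q^{−1} = 1, are arbitrary choices) estimates the relevant norms from simulated data:

    import random, statistics

    random.seed(3)
    n = 100_000
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [random.expovariate(1.0) for _ in range(n)]

    def r_norm(sample, p):
        # empirical ||X||_p
        return statistics.fmean([abs(v) ** p for v in sample]) ** (1.0 / p)

    p, q = 3.0, 1.5                                          # conjugate exponents
    print(statistics.fmean([abs(x * y) for x, y in zip(xs, ys)]),
          r_norm(xs, p) * r_norm(ys, q))                     # Hölder: first <= second
    print(r_norm([x + y for x, y in zip(xs, ys)], p),
          r_norm(xs, p) + r_norm(ys, p))                     # Minkowski: first <= second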

4.2 Moment generating functions


Let X be a real-valued random variable. Since, for any θ ∈ R, the random variable e^{θX} is nonnegative, its expectation exists (but may be equal to +∞). We define the moment generating function M : R → R ∪ {+∞} by

    M (θ) := E(e^{θX}),   θ ∈ R.

Notice that in particular M (0) = 1 always. The values of M (θ) are also referred to as the
exponential moments of the random variable X and the function L(θ) := M (−θ) is also
known as the Laplace transform of (the distribution of) X.
The function M is useful if M (θ) < ∞ for some θ ≠ 0. (Indeed, there are cases where θ = 0 is
the only point at which M is finite.) If X is a positive random variable, then M (θ) ≤ 1 for all
θ ≤ 0. If X is a negative random variable, then M (θ) ≤ 1 for all θ ≥ 0.
Note that (see Theorem 3.2 of Chapter 3) the moment generating function M depends only on
the distribution, or law, of X. Indeed, from that theorem, in the case where X has an absolutely
continuous distribution function F with density f , we can write
    M (θ) = ∫_R e^{θx} f (x) dx   (Lebesgue integral).

The moment generating function M is so called because of the following result.

Lemma 4.8. Suppose there exist a < 0 < b such that M (θ) < ∞ for all a < θ < b. Then
(i) the r-moment of X exists for all r = 0, 1, 2, . . . and is given by the r-derivative of M at 0:

    E(X^r) = D^r M (0).

(ii)

    M (θ) = Σ_{r=0}^{∞} (E(X^r) / r!) θ^r,   a < θ < b.

(iii) There is only one law such that if a random variable has that law then it has moment
generating function M .

Note that the result (iii) above says that, under the conditions of the lemma, the moment
generating function uniquely determines the corresponding law.
Proof [sketch] Using the Dominated Convergence Theorem (see Chapters 3 and 6), we can see
that M is infinitely differentiable at 0 with r-derivative equal to the r-moment of X. Moreover,
we can see that M is a real analytic function around 0. Hence Taylor’s theorem holds, which
yields the second claim.  □

Exercise 4.7. Suppose that the random variable X has moment generating function MX . Show
that, for any constants a, b, the random variable Y := a + bX has moment generating function
MY given by MY (θ) = e^{aθ} MX (bθ).

Exercise 4.8. Suppose that the random variable X has an exponential distribution with parameter λ, i.e. has distribution function F given by F (x) = 0 for x < 0 and F (x) = 1 − e^{−λx} for x ≥ 0. Show that the corresponding moment generating function is given by M (θ) = λ/(λ − θ) for θ < λ (and by M (θ) = ∞ for θ ≥ λ). Use the above lemma to deduce directly that EX = λ^{−1} and Var X = λ^{−2}.
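For the exponential case the assertions of Lemma 4.8 can also be checked numerically by differentiating M (θ) = λ/(λ − θ) at 0 with finite differences; the following sketch (with the arbitrary choices λ = 2 and step size h = 10^{−4}) recovers EX ≈ 1/λ and Var X ≈ 1/λ^2.

    lam = 2.0

    def M(th):
        # MGF of an Exponential(lam) random variable, finite for th < lam
        return lam / (lam - th)

    h = 1e-4
    m1 = (M(h) - M(-h)) / (2 * h)                # central difference ~ M'(0) = EX
    m2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2    # second difference ~ M''(0) = EX^2
    print(m1, 1 / lam)                           # both ~ 0.5
    print(m2 - m1 ** 2, 1 / lam ** 2)            # variance, both ~ 0.25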

Exercise 4.9. Let X be absolutely continuous with density f (x) = π^{−1}(1 + x^2)^{−1}, x ∈ R (i.e. X has a Cauchy distribution). Show that the corresponding moment generating function M (θ) is finite only for θ = 0.

Moment generating functions are particularly useful in regard to the addition of independent
random variables, as we shall see.

4.3 Characteristic functions


Moment generating functions, as the last exercise shows, may not give any information at all. There is an analytic technique bypassing this problem, consisting of replacing the argument of the moment generating function by an imaginary number, i.e. a number of the form it, where i = √−1. First, we define the expectation of any complex random variable Y := Y1 + iY2, where Y1, Y2 are real random variables, by

    EY = E(Y1 + iY2) := EY1 + iEY2.

Then, in an extension of the modulus inequality for real-valued random variables,

|EY | ≤ E|Y |. (4.7)

We then define the characteristic function ϕ of a real-valued random variable X by

    ϕ(t) := Ee^{itX},   t ∈ R.

It is important to note that ϕ is a complex-valued function of a real variable t. By the Taylor expansion of the exponential function (Euler's formula),

    e^{itX} = cos(tX) + i sin(tX),   t ∈ R.



Thus

    ϕ(t) = E cos(tX) + iE sin(tX),   t ∈ R. (4.8)

Further |e^{itX}| = [cos^2(tX) + sin^2(tX)]^{1/2} = 1, and therefore ϕ(t) is defined for all t and, from (4.7), |ϕ(t)| ≤ 1. Note further that ϕ(0) = 1 always.
As in the case of the moment generating function, note that (by, e.g. Theorem 3.2 of Chapter 3)
the characteristic function ϕ depends only on the law of the random variable X.
The following result (for a proof of which see, for example, [2]) is very important.

Theorem 4.1. Given any characteristic function ϕ, there is only one law such that if a random variable has that law then it has characteristic function ϕ.

In other words, the characteristic function uniquely determines (characterises) the law of a
random variable. (This contrasts with the situation for the moment generating function where
we saw that additional conditions are required for it to uniquely characterise the corresponding
law.)
It is further straightforward to show [exercise!] that, for any characteristic function ϕ of a random variable X:
1. ϕ is a (uniformly) continuous function.
2. The complex conjugate of ϕ(t) is ϕ(−t). If X is symmetric around 0 (i.e. −X has the same law as X) then, for all t ∈ R, we have that ϕ(t) is a real number.

It follows from Theorem 3.2 of Chapter 3 (straightforwardly extended to complex functions of a real-valued random variable) that if a real-valued random variable X is absolutely continuous with density f , then its characteristic function is given by

    ϕ(t) = ∫_{−∞}^{∞} e^{itx} f (x) dx.

This is the Fourier transform of the density f . When ϕ is integrable, the inverse Fourier transform determines f from ϕ:

    f (x) = (1/(2π)) ∫_{−∞}^{∞} e^{−itx} ϕ(t) dt.
However, this inversion formula is rarely used in practice, where, given a characteristic function
ϕ, we typically conjecture the corresponding law, check that its characteristic function is indeed
ϕ, and then appeal to Theorem 4.1.

Exercise 4.10. Prove the modulus inequality (4.7) for complex random variables.

Exercise 4.11. Suppose that the (real-valued) random variable X has characteristic function
ϕX . Show that, for any constants a, b, the random variable Y := a + bX has characteristic
function ϕY given by ϕY (t) = e^{iat} ϕX (bt).

Exercise 4.12. Let the random variable U have a uniform distribution on (0, 1). Show that the characteristic function of U is given by

    ϕ(t) = (e^{it} − 1) / (it),   t ≠ 0.

Use also the result of the previous exercise to deduce that if a random variable has a uniform distribution on (a, b), then its characteristic function is given by ϕ_{a,b}(t) = (e^{ibt} − e^{iat}) / (i(b − a)t) for t ≠ 0.
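A Monte Carlo sanity check of this formula (an added illustration; the seed, sample size and values of t are arbitrary) compares the empirical average of e^{itU} with (e^{it} − 1)/(it):

    import cmath, random

    random.seed(4)
    us = [random.random() for _ in range(200_000)]           # U ~ Uniform(0, 1)
    for t in (0.5, 1.0, 3.0):
        empirical = sum(cmath.exp(1j * t * u) for u in us) / len(us)
        exact = (cmath.exp(1j * t) - 1) / (1j * t)
        print(t, empirical, exact)                           # close, up to sampling error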

Exercise 4.13. Show that an exponentially-distributed random variable, with density function f given by f (x) = λe^{−λx} for x ≥ 0 and f (x) = 0 for x < 0, has characteristic function ϕ(t) = λ/(λ − it).

Now recall that the normal distribution with mean 0 and variance 1 has density

    f (z) = (1/√(2π)) e^{−z^2/2},   z ∈ R,

and is usually denoted by N (0, 1). The following result is required for the proof of the central limit theorem in Chapter 7.

Lemma 4.9. The N (0, 1) distribution has characteristic function ϕ given by ϕ(t) = e^{−t^2/2}.

Proof  Let the random variable Z have an N (0, 1) distribution. It follows from (4.8) and the symmetry around 0 of this distribution that, for t ∈ R,

    ϕ(t) = E cos(tZ) = (1/√(2π)) ∫_{−∞}^{∞} cos(tz) e^{−z^2/2} dz.

A little elementary analysis then gives

    ϕ′(t) = −(1/√(2π)) ∫_{−∞}^{∞} z sin(tz) e^{−z^2/2} dz = −t ϕ(t),

where, for the first equality in the above display, a little care is needed in order to justify the interchange of limits involved in differentiation with respect to t and integration with respect to z, and where the second equality follows on integrating by parts. Now, easily,

    d/dt (e^{t^2/2} ϕ(t)) = 0,

and so the result follows on recalling also ϕ(0) = 1.
Alternatively the result is also easily deduced by using contour integration in the complex plane.  □
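Lemma 4.9 can also be checked numerically. The sketch below (an added illustration; the truncation half-width, number of steps and values of t are arbitrary choices) approximates the integral for ϕ(t) by a midpoint rule and compares it with e^{−t^2/2}:

    import math

    def phi_numeric(t, half_width=10.0, steps=100_000):
        # midpoint rule for (1/sqrt(2*pi)) * integral of cos(t z) exp(-z^2/2) dz
        dz = 2.0 * half_width / steps
        total = 0.0
        for k in range(steps):
            z = -half_width + (k + 0.5) * dz
            total += math.cos(t * z) * math.exp(-z * z / 2.0)
        return total * dz / math.sqrt(2.0 * math.pi)

    for t in (0.0, 1.0, 2.0):
        print(t, phi_numeric(t), math.exp(-t * t / 2.0))     # the two columns agree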
4.4 Joint distributions


Chapter 3 studied (real-valued) random variables and their distributions. We now wish to extend
some of these ideas to finite collections of random variables, which we refer to as random vectors.
Recall from Chapter 1 (Exercise 1-10) that an ordered collection X = (X1 , . . . , Xd ) of (real-
valued) random variables may equivalently be regarded as a random vector, i.e. as a measurable
function from the probability space (Ω, F, P) into (Rd, B(Rd)) (the set Rd of d-tuples of real numbers endowed with the corresponding Borel σ-algebra B(Rd)).
Recall further from Chapter 3 that the random vector X = (X1 , . . . , Xd ) then induces a proba-
bility measure PX on (Rd , B(Rd )), given by

    PX (B) = P(X ∈ B),   B ∈ B(Rd). (4.9)

The probability measure PX is the law, or distribution, of X. When we regard X as a collection


(X1 , . . . , Xd ) of random variables, the probability measure PX on (Rd , B(Rd )) is also referred to
as the joint distribution of these random variables.

Note that the joint distribution PX determines the (marginal) laws or distributions PX1 , . . . , PXd
of each of the individual random variables X1 , . . . , Xd : for example, for each (one-dimensional)
Borel set B ∈ B,
PX1 (B) = P(X1 ∈ B) = PX (B × R × · · · × R).
However, PX provides information not just about these individual distributions, but also about
the ways in which the random variables X1 , . . . , Xd are probabilistically associated with each
other.
For simplicity, in the rest of this section we specialise to the case d = 2 and replace the random
vector (or pair of random variables) (X1 , X2 ) with the random vector (X, Y ).

4.4.1 Joint distribution function


The joint distribution function of the random vector (X, Y ) is the function
FX,Y (x, y) := P(X ≤ x, Y ≤ y) = PX,Y ((−∞, x] × (−∞, y]), (x, y) ∈ R2 . (4.10)
Note in particular that the marginal distribution functions FX and FY of the individual random
variables X and Y are given by
    FX (x) = lim_{y→∞} FX,Y (x, y),
    FY (y) = lim_{x→∞} FX,Y (x, y).

(Note also that the choice of ≤ instead of < in (4.10) is an arbitrary convention.)
Analogously to Lemma 3.2 of Chapter 3, the joint distribution function FX,Y has the following
properties.
(i) FX,Y is nondecreasing in each of its arguments, [monotonicity]
(ii) limx→−∞ FX,Y (x, y) = 0 for all y, and also limy→−∞ FX,Y (x, y) = 0 for all x,
(iii) limx,y→+∞ FX,Y (x, y) = 1,
(iv) limn→∞ FX,Y (x + 1/n, y) = FX,Y (x, y) for all y, and also limn→∞ FX,Y (x, y + 1/n) =
FX,Y (x, y) for all x. [right continuity]
However, in the two-dimensional case d = 2 (and in higher-dimensional cases) these properties
are insufficient to characterise joint distribution functions—but see below for the additional
condition required.

4.4.2 Knowledge of FX,Y implies knowledge of PX,Y


Now consider a rectangle (with sides parallel to the axes—please think geometrically)
(a1 , b1 ] × (a2 , b2 ] := {(x, y) ∈ R2 : a1 < x ≤ b1 , a2 < y ≤ b2 }. (4.11)
We allow a1 , a2 to take any value, including −∞. Since (−∞, b1 ] × (−∞, b2 ] is the disjoint union
of four rectangles, using additivity, we obtain
PX,Y ((a1 , b1 ] × (a2 , b2 ]) = FX,Y (b1 , b2 ) − FX,Y (b1 , a2 ) − FX,Y (a1 , b2 ) + FX,Y (a1 , a2 ). (4.12)
Hence the joint distribution function FX,Y determines the probability PX,Y on rectangles, and
so also on countable unions of disjoint rectangles. Analogously to Lemma 3.3 of Chapter 3, and
with some messiness, the Extension Theorem may be used to show that, given any function
FX,Y on R2 satisfying the properties (i)-(iv) above, together with the additional requirement
that the right side of (4.12) is nonnegative for all a1 ≤ b1 and a2 ≤ b2 , there exists a random
vector (X, Y ) with joint distribution function FX,Y , and further that the corresponding joint
distribution PX,Y is unique. Thus in particular FX,Y always determines PX,Y uniquely. (To
establish the uniqueness result alone is simpler—see, e.g. Chapter 1 of [2].)

4.4.3 Discrete distributions


Now suppose that the random vector (X, Y ) takes values in a discrete set (S1 × S2 ) (endowed
with the σ-algebra of all subsets of this set).
As usual, define the joint probability (mass) function of (X, Y ) by

pX,Y (x, y) := P((X, Y ) = (x, y)) = P(X = x, Y = y), (x, y) ∈ S1 × S2 .

Note that this function is nonnegative and that

    Σ_{(x,y)∈S1×S2} pX,Y (x, y) = 1.

We then have that the marginal probability functions of X and Y are given by the nonnegative functions

    pX (x) := P(X = x) = Σ_{y∈S2} pX,Y (x, y)

and

    pY (y) := P(Y = y) = Σ_{x∈S1} pX,Y (x, y).

Define also the conditional probability function of Y given X by^b

    pY|X (y|x) := pX,Y (x, y) / pX (x)   if pX (x) ≠ 0,   and   pY|X (y|x) := 0   otherwise. (4.13)

We then have, for any x, y,

    P(X = x, Y = y) = pX (x) pY|X (y|x)

(since if pX (x) = 0 then also pX,Y (x, y) = 0) and also, if pX (x) > 0,

    P(Y = y | X = x) = pY|X (y|x).
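These definitions are conveniently summarised by a tiny worked example (the joint probability function below is an arbitrary illustrative choice, not taken from the notes): the marginals are obtained by summing out the other variable, and the conditional probability function by the ratio in (4.13).

    from collections import defaultdict

    # a small joint probability function on S1 x S2 = {0, 1} x {0, 1, 2}
    p_xy = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
            (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30}
    assert abs(sum(p_xy.values()) - 1.0) < 1e-12

    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_x[x] += p                        # marginal of X: sum over y
        p_y[y] += p                        # marginal of Y: sum over x

    def p_y_given_x(y, x):
        # conditional probability function as in (4.13)
        return p_xy.get((x, y), 0.0) / p_x[x] if p_x[x] > 0 else 0.0

    print(dict(p_x), dict(p_y))
    print(sum(p_y_given_x(y, 1) for y in (0, 1, 2)))   # a conditional pmf sums to 1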

Exercise 4.14. Let the random variables N1 and N2 be the numbers obtained on two independent rolls of a die. Define X = min(N1 , N2 ) and Y = max(N1 , N2 ). Find the joint probability function of (X, Y ). Use it to calculate the marginal probability functions of X and Y and verify that these are as you would expect by direct determination of each of them. Find also the conditional probability functions and verify that these are as you would expect.

4.4.4 Absolutely continuous distributions and joint density


Now suppose instead that there exists a nonnegative measurable function fX,Y on (R2, B(R2)) such that FX,Y can be written as a Lebesgue integral:^c

    FX,Y (x, y) = ∫_{(−∞,x]×(−∞,y]} fX,Y (s, t) ds dt. (4.14)
^b The notation pY|X is terrible. We only use it out of some respect for the undergraduate probability courses. The reason that the notation is terrible is that the subscript 'Y|X' in pY|X (y|x) plays a merely cosmetic rôle, as opposed to the essential rôle played by the last variable x inside the parentheses.
^c The Lebesgue integral was effectively constructed in Chapter 3. However, note in particular that if (4.14)
holds with the alternative interpretation of its right side as the more familiar Riemann integral, then it holds with
the interpretation of its right side as the Lebesgue integral. The point is that the Lebesgue integral may continue
to exist in circumstances—pathological from the point of view of applications—where the Riemann integral does
not.

In such a case, we say that the random vector (X, Y ) has an absolutely continuous distribu-
tion, and the function fX,Y is referred to as the joint density of this distribution.
It can then be shown (see also Figure 4.2) that, for any Borel set B ∈ B(R2),

    P((X, Y ) ∈ B) = PX,Y (B) = ∫_B fX,Y (s, t) ds dt. (4.15)

Figure 4.2: The probability PX,Y (B) is given by integrating the joint density fX,Y over the shaded Borel set B.

(In the case where B is a rectangle B = (a1, b1] × (a2, b2] as above, this follows from (4.12) [exercise].) We also have of course that

    ∫_{R2} fX,Y (s, t) ds dt = 1.

The marginal distribution function FX of X is given by

    FX (x) = ∫_{(−∞,x]×(−∞,∞)} fX,Y (s, t) ds dt = ∫_{−∞}^{x} fX (s) ds, (4.16)

where the function fX is given by

    fX (s) = ∫_R fX,Y (s, t) dt. (4.17)

(Here we have used Fubini's theorem—again see [2]—which essentially allows the two-dimensional integral to be treated as a succession of two one-dimensional integrals.) It follows that the function fX given by (4.17) is a density for X—and so X is absolutely continuous, as is Y .
We can also, naïvely, define the conditional density of Y given X by

    fY|X (y|x) := fX,Y (x, y) / fX (x)   if fX (x) ≠ 0,   and   fY|X (y|x) := 0   otherwise.

Some justification for this is given by the observation that then (4.15) continues to hold when fX,Y (s, t) is replaced by fX (s) fY|X (t|s). However, we leave a proper treatment of what it means to condition on a random variable until Chapter 8.
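The relations (4.15)–(4.17) are easy to check numerically for a concrete density. The sketch below (an added illustration; it uses the density f(x, y) = x + y on the unit square, chosen because it integrates to 1 and has the simple marginal fX(x) = x + 1/2) approximates the integrals by a midpoint rule.

    def f(x, y):
        # a valid joint density on the unit square: f(x, y) = x + y, integrating to 1
        return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

    n = 400
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]

    # marginal density f_X(x) as in (4.17); here it should equal x + 1/2
    for x in (0.25, 0.5, 0.75):
        print(x, sum(f(x, y) * h for y in grid), x + 0.5)

    # P(X > Y) as in (4.15); by symmetry of this density it should be close to 1/2
    print(sum(f(x, y) * h * h for x in grid for y in grid if x > y))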
Exercise 4.15. Suppose that the random variables X and Y have joint density

    fX,Y (x, y) = a(x + y^2) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,   and   fX,Y (x, y) = 0 otherwise.

(i) Find the marginal densities fX and fY and the marginal distribution functions FX and FY of X and Y . Hence identify also the constant a.
(ii) Find the joint distribution function FX,Y of X and Y .
(iii) Find P(X > Y ) and also P(X^2 > Y ).

4.5 Independence
We continue to consider a random vector (pair of random variables) (X, Y ), but the results of
this section have obvious extensions to d-dimensional random vectors (X1 , . . . , Xd ) or indeed to
random sequences (sequences of random variables) (X1 , X2 , . . . ).
Recall that random variables X and Y are independent if and only if the generated σ-algebras
σ(X) and σ(Y ) are independent.

Lemma 4.10. Suppose that the random variables X and Y are independent. Then
(i) If X and Y have distribution functions FX and FY respectively, then the joint distribution
function FX,Y of the random vector (X, Y ) is given by

FX,Y (x, y) = FX (x)FY (y), for all x, y. (4.18)

(ii) If X and Y are discrete, with probability (mass) functions pX and pY respectively, then
the random vector (X, Y ) is discrete with probability function pX,Y given by

pX,Y (x, y) = pX (x)pY (y) for all x, y. (4.19)

(iii) If X and Y are absolutely continuous, with density functions fX and fY respectively, then
the random vector (X, Y ) is absolutely continuous with density function fX,Y given by

fX,Y (x, y) = fX (x) fY (y) for almost all x, y.^d (4.20)

(iv) If X and Y are positive [respectively integrable], then the random variable XY is positive
[respectively integrable] and
E(XY ) = EX EY. (4.21)

(v) If X and Y have moment generating functions MX and MY respectively, then the random
vector (X, Y ) has joint moment generating function

MX,Y (η, θ) := Ee^{ηX+θY} = MX (η) MY (θ),   η, θ ∈ R, (4.22)

where we allow the possibility that, for appropriate η or θ, MX (η) or MY (θ) may be infinite
(i.e. if the right side of the above identity is infinite, then so is the left).
(vi) If X and Y have characteristic functions ϕX and ϕY respectively, then the random vector
(X, Y ) has joint characteristic function

ϕX,Y (s, t) := Ee^{isX+itY} = ϕX (s) ϕY (t),   s, t ∈ R. (4.23)

Proof The result (i) is immediate from the definitions of the joint distribution function and
of independence, i.e., for any x, y,

FX,Y (x, y) = P(X ≤ x, Y ≤ y)


= P(X ≤ x)P(Y ≤ y) (independence)
= FX (x)FY (y).

The proof of (ii) is entirely similar to that of (i).


^d By "for almost all x, y" we mean "except perhaps on a set in R2 of Lebesgue measure 0", since any density function may be changed on such a set and remain the density function of the same distribution.

For (iii) note that, from (i),

    FX,Y (x, y) = FX (x) FY (y)
               = (∫_{−∞}^{x} fX (s) ds) (∫_{−∞}^{y} fY (t) dt)
               = ∫_{(−∞,x]×(−∞,y]} fX (s) fY (t) ds dt   (Fubini again),

so that the function of s and t given by fX (s)fY (t) is indeed the joint density function of (X, Y ).
For (iv) note that if X and Y are discrete, then the result follows from (ii) and the rearrangement
of a double sum [exercise!]. In the general case the result is yet another application of Fubini’s
Theorem, which, as previously remarked, permits similar rearrangements of double integrals.
The results (v) and (vi) are immediate applications of (iv) on noting that the independence of X and Y implies also that of e^{ηX} and e^{θY} and of e^{isX} and e^{itY}.  □

Corollary 4.3. Suppose that X, Y ∈ L2 are independent. Then

Var(X + Y ) = Var X + Var Y.

Proof Since X and Y are independent so also are X − EX and Y − EY . Hence, from (iv)
of Lemma 4.10, Cov(X, Y ) = E(X − EX)E(Y − EY ) = 0, and the required result now follows
from (4.5).  □
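A quick Monte Carlo illustration of Corollary 4.3 (an added sketch; the two distributions, the seed and the sample size are arbitrary choices):

    import random, statistics

    random.seed(5)
    n = 200_000
    xs = [random.expovariate(1.0) for _ in range(n)]          # Exponential(1): variance 1
    ys = [random.gauss(0.0, 2.0) for _ in range(n)]           # N(0, 4): variance 4, independent of xs
    sums = [x + y for x, y in zip(xs, ys)]
    print(statistics.variance(sums),
          statistics.variance(xs) + statistics.variance(ys))  # both ~ 5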

Exercise 4.16. Let V, W be independent identically distributed random variables with distribution functions FV (t) = FW (t) = 1 − e^{−t}, t > 0. Let X = 2V , Y = V − W . Compute the joint distribution function FX,Y of the random vector (X, Y ). [Hint. Note that since V and W are both nonnegative we necessarily have X ≥ 0 and Y ≤ X/2. Hence it is sufficient to evaluate FX,Y (x, y) for x > 0, y ≤ x/2 [why?]. For such (x, y),

    FX,Y (x, y) = ∫_A e^{−v} e^{−w} dv dw

where A is the region {(v, w) : 0 ≤ v ≤ x/2, w ≥ max(v − y, 0)}. Now sketch this region in (v, w)-space, and integrate first with respect to one variable, and then with respect to the other.]

Exercise 4.17. Let X, Y be independent random variables with common moment generating function MX (θ) = MY (θ) = e^{θ^2}. (This corresponds to the common distribution of X and Y being normal with mean 0 and variance 2.) Let X′ := X + Y , Y′ := X − Y . Compute the joint moment generating function of (X′, Y′) (you will need to use the definition contained in (4.22) above). Show that X′, Y′ are independent.

Sums of independent random variables

Again suppose that the random variables X and Y are independent—with respective laws (dis-
tributions) PX and PY and respective distribution functions FX and FY . Then the law (distribu-
tion) of the random variable X + Y is referred to as the convolution of the laws (distributions)
of X and Y .
However, it is not always easy to calculate this. If, for example, FX+Y denotes the distribution

function of X + Y , then, for all z,

    FX+Y (z) = P(X + Y ≤ z)
             = ∫_{−∞}^{∞} P(Y ≤ z − x) dPX (x)
             = ∫_{−∞}^{∞} FY (z − x) dPX (x)                  (4.24)
             = E FY (z − X),

i.e., from (4.24), FX+Y (z) is the Lebesgue integral of the function of x given by FY (z − x) with respect to the probability measure (law) PX on R. (Of course the roles of X and Y may be interchanged!) When, for example, X is discrete with probability function pX ,

    FX+Y (z) = Σ_{all x} FY (z − x) pX (x),                   (4.25)

while when X is absolutely continuous with density fX ,

    FX+Y (z) = ∫_{−∞}^{∞} FY (z − x) fX (x) dx.               (4.26)
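The formula (4.26) can be checked numerically. In the sketch below (an added illustration; it takes X and Y to be independent Exponential(1) variables and uses an arbitrary integration grid, seed and sample size), the numerical integral is compared with a Monte Carlo estimate of P(X + Y ≤ z):

    import math, random

    lam = 1.0

    def F_Y(y):
        # Exponential(1) distribution function
        return 1.0 - math.exp(-lam * y) if y > 0 else 0.0

    def f_X(x):
        # Exponential(1) density
        return lam * math.exp(-lam * x) if x > 0 else 0.0

    def F_sum(z, steps=20_000, upper=50.0):
        # numerical version of (4.26): integral of F_Y(z - x) f_X(x) dx
        h = upper / steps
        return sum(F_Y(z - (k + 0.5) * h) * f_X((k + 0.5) * h) * h for k in range(steps))

    random.seed(6)
    samples = [random.expovariate(lam) + random.expovariate(lam) for _ in range(200_000)]
    for z in (0.5, 1.0, 2.0):
        monte_carlo = sum(1 for s in samples if s <= z) / len(samples)
        print(z, F_sum(z), monte_carlo)        # the two estimates agree closely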

Fortunately, Lemma 4.10 gives us the following results.


Lemma 4.11. Suppose that the random variables X and Y are independent.
(i) If X and Y have moment generating functions MX and MY respectively, then the random
variable X + Y has moment generating function

MX+Y (θ) = MX (θ)MY (θ), θ ∈ R, (4.27)

where we again allow the possibility that, for appropriate θ, MX (θ) or MY (θ) may be
infinite (i.e. if the right side of the above identity is infinite, then so is the left).
(ii) If X and Y have characteristic functions ϕX and ϕY respectively, then the random variable
X + Y has characteristic function

ϕX+Y (t) = ϕX (t)ϕY (t), t ∈ R. (4.28)

Proof The proof of (i) is immediate from part (v) of Lemma 4.10 on putting η = θ, while
the proof of (ii) follows similarly from part (vi) of Lemma 4.10 on putting s = t. Alternatively,
either of these results may be proved directly by using part (iv) of Lemma 4.10.  □
These results may be used to calculate laws (distributions) of sums of independent random
variables, since (frequently) the moment generating function determines the corresponding law
(distribution) uniquely, while (always) the characteristic function determines the corresponding
law (distribution) uniquely.
Exercise 4.18. Let X and Y be independent random variables having Poisson distributions with parameters λ > 0 and µ > 0 respectively, i.e.

    P(X = n) = e^{−λ} λ^n / n!,   n = 0, 1, 2, . . . ,

and similarly for Y . Show that the moment generating functions MX and MY of X and Y are given by MX (θ) = e^{λ(e^θ−1)} and MY (θ) = e^{µ(e^θ−1)} respectively. Deduce that X + Y has a Poisson distribution with parameter λ + µ.
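The conclusion of this exercise can also be checked directly, without moment generating functions, by convolving the two probability functions as in (4.25); the sketch below (with the arbitrary illustrative parameters λ = 1.5 and µ = 2.5) compares the convolution with the Poisson(λ + µ) probability function.

    import math

    def pois_pmf(k, rate):
        # Poisson probability function
        return math.exp(-rate) * rate ** k / math.factorial(k)

    lam, mu = 1.5, 2.5
    for n in (0, 1, 2, 5, 10):
        conv = sum(pois_pmf(k, lam) * pois_pmf(n - k, mu) for k in range(n + 1))
        print(n, conv, pois_pmf(n, lam + mu))    # the two columns agree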

Further examples of the use of these results will be given in Chapter 5.



Converse results

Recall that, as emphasised in Chapter 3, where independence exists it is usually because it is a given feature of a probability model, reflecting a construction appropriate to some given sequence of physically independent experiments, e.g. independent identically distributed trials. Then random variables measurable with respect to independent sub-σ-algebras are automatically independent.
However, it is sometimes necessary to verify independence, and therefore it is useful to know,
given random variables X and Y , which of the conditions (4.18)– (4.23) featured in Lemma 4.10
are sufficient for their independence. We give in detail the result for joint distribution functions,
and discuss also the remaining conditions.
Lemma 4.12. Suppose that the joint distribution function of random variables X and Y fac-
torises as
FX,Y (x, y) = GX (x)GY (y), for all x, y, (4.29)
for some functions GX and GY . Then X and Y are independent, and further, for some constant
k > 0, their distribution functions FX and FY satisfy

FX (x) = kGX (x) for all x, (4.30)


    FY (y) = k^{−1} GY (y) for all y. (4.31)

Proof  By letting y → ∞ in (4.29), we see that (4.30) holds with k = lim_{y→∞} GY (y). We similarly obtain (4.31)—that the constant there is k^{−1} follows since, from (4.29),

    lim_{x→∞} GX (x) · lim_{y→∞} GY (y) = lim_{x,y→∞} FX,Y (x, y) = 1 = lim_{x→∞} FX (x) · lim_{y→∞} FY (y).

Hence the conditions of the lemma imply that the joint distribution function of X and Y
factorises as the product of the marginal distribution functions:

FX,Y (x, y) = FX (x)FY (y), for all x, y. (4.32)

To show that this implies the independence of X and Y , let X′ and Y′ be independent random variables with the distribution functions FX and FY respectively. Then, from (4.32), the joint distribution function of (X′, Y′) is also FX,Y . Since any joint distribution function uniquely determines the corresponding law, it follows that the law PX,Y of (X, Y ) is the same as that of (X′, Y′) and so X, Y are independent also.  □
In the case where random variables X and Y are discrete, it similarly follows that factorisation
of the joint probability function is sufficient for their independence; in the case where X and
Y are absolutely continuous, factorisation of the joint density function is sufficient for their
independence. For any random variables X and Y , the factorisation of their joint characteristic
function (as in (4.23), but the factors on the right side do not need to be identified a priori as
characteristic functions) is sufficient for their independence. The proof is again entirely similarly
to that of Lemma 4.12 and again relies on the fact that any characteristic function identifies
the corresponding law uniquely. A corresponding result holds for the joint moment generating
function (factorisation implies independence) provided that this function exists in (at least) a
(one-sided) neighbourhood of zero.
Finally we remark that the condition (4.21) (a weak condition not comparable with the remaining
conditions featuring in Lemma 4.10) is not sufficient for the independence of X and Y .

References
[1] B. Fristedt & L. Gray, A Modern Approach to Probability Theory, Birkhäuser, 1997.

[2] D. Williams, Probability with Martingales, Cambridge, 1991.
