Durrett_2019_chap1
978-1-108-47368-2 — Probability, 5th Edition (Excerpt)
Measure Theory
In this chapter, we will recall some definitions and results from measure theory. Our purpose
here is to provide an introduction to readers who have not seen these concepts before and
to review that material for those who have. Harder proofs, especially those that do not
contribute much to one’s intuition, are hidden away in the Appendix. Readers with a solid
background in measure theory can skip Sections 1.4, 1.5, and 1.7, which were previously
part of the Appendix.
A little thought reveals that this is the most general probability measure on this space.
In many cases when Ω is a finite set, we have p(ω) = 1/|Ω|, where |Ω| = the number
of points in Ω.
For a simple concrete example that requires this level of generality, consider the astragali,
dice used in ancient Egypt made from the ankle bones of sheep. This die could come to rest
on the top side of the bone for four points or on the bottom for three points. The side of the
bone was slightly rounded. The die could come to rest on a flat and narrow piece for six
points or somewhere on the rest of the side for one point. There is no reason to think that all
four outcomes are equally likely, so we need probabilities p1 , p3 , p4 , and p6 to describe P .
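A finite probability space like this is easy to simulate. The sketch below is ours, with made-up values for p1, p3, p4, p6; for an actual astragalus these would have to be estimated from throws.

```python
import random

# Hypothetical values for p1, p3, p4, p6; a real astragalus would
# need these estimated empirically.
p = {1: 0.1, 3: 0.4, 4: 0.4, 6: 0.1}

def throw(rng):
    # Sample one point from Omega = {1, 3, 4, 6} with P({k}) = p[k].
    u = rng.random()
    total = 0.0
    for outcome, prob in p.items():
        total += prob
        if u < total:
            return outcome
    return 6  # guard against floating-point round-off

rng = random.Random(0)
counts = {k: 0 for k in p}
for _ in range(100_000):
    counts[throw(rng)] += 1
# The empirical frequencies counts[k] / 100_000 approximate p[k].
```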
To prepare for our next definition, we need to note that it follows easily from the
definition: if Fi, i ∈ I, are σ-fields, then ∩i∈I Fi is as well. Here I ≠ ∅ is an arbitrary index set
(i.e., possibly uncountable). From this it follows that if we are given a set Ω and a collection
A of subsets of Ω, then there is a smallest σ-field containing A. We will call this the σ-field
generated by A and denote it by σ(A).
Let R^d be the set of vectors (x1, . . . , xd) of real numbers and R^d be the Borel sets, the
smallest σ-field containing the open sets. When d = 1, we drop the superscript.
Example 1.1.3 (Measures on the real line) Measures on (R, R) are defined by giving a
Stieltjes measure function with the following properties:
(i) F is nondecreasing.
(ii) F is right continuous, i.e., limy↓x F (y) = F (x).
Theorem 1.1.4 Associated with each Stieltjes measure function F there is a unique measure
μ on (R, R) with μ((a,b]) = F (b) − F (a)
The next definition will explain the choice of “open on the left.”
A collection S of sets is said to be a semialgebra if (i) it is closed under intersection,
i.e., S, T ∈ S implies S ∩ T ∈ S , and (ii) if S ∈ S , then S c is a finite disjoint union of sets
in S . An important example of a semialgebra is
Example 1.1.5 Sd = the empty set plus all sets of the form (a1,b1] × · · · × (ad,bd] ⊂ R^d, where −∞ ≤ ai < bi ≤ ∞.
The definition in (1.1.1) gives the values of μ on the semialgebra S1. To go from semialgebra
to σ-algebra, we use an intermediate step. A collection A of subsets of Ω is called an
algebra (or field) if A, B ∈ A implies Ac and A ∪ B are in A. Since A ∩ B = (Ac ∪ Bc)c,
it follows that A ∩ B ∈ A. Obviously, a σ-algebra is an algebra. An example in which the
converse is false is:
Example 1.1.6 Let Ω = Z = the integers. A = the collection of A ⊂ Z so that A or Ac is
finite is an algebra.
Lemma 1.1.7 If S is a semialgebra, then S̄ = {finite disjoint unions of sets in S } is an
algebra, called the algebra generated by S .
Proof Suppose A = +i Si and B = +j Tj , where + denotes disjoint union and we assume
the index sets are finite. Then A ∩ B = +i,j Si ∩ Tj ∈ S̄ . As for complements, if A = +i Si
then Ac = ∩i Sic . The definition of S implies Sic ∈ S̄ . We have shown that S̄ is closed under
intersection, so it follows by induction that Ac ∈ S̄ .
Example 1.1.8 Let Ω = R and S = S1; then S̄1 = the empty set plus all sets of the form ∪_{i=1}^{k} (ai,bi], where −∞ ≤ ai < bi ≤ ∞.
Proof Observe that it follows from the definition that if A = +i Bi is a finite disjoint union
of sets in S̄ and Bi = +j Si,j, then

μ̄(A) = Σ_{i,j} μ(Si,j) = Σ_i μ̄(Bi)

Writing ∪i Bi = F1 + · · · + Fn as a finite disjoint union, we have

A = A ∩ (∪i Bi) = (A ∩ F1) + · · · + (A ∩ Fn)

so using (a), (b) with n = 1, and (a) again,

μ̄(A) = Σ_{k=1}^{n} μ̄(A ∩ Fk) ≤ Σ_{k=1}^{n} μ̄(Fk) = μ̄(∪i Bi)
Proof of Theorem 1.1.4 Let S be the semialgebra of half-open intervals (a,b] with
−∞ ≤ a < b ≤ ∞. To define μ on S, we begin by observing that

F(∞) = lim_{x↑∞} F(x) and F(−∞) = lim_{x↓−∞} F(x) exist

and μ((a,b]) = F(b) − F(a) makes sense for all −∞ ≤ a < b ≤ ∞ since F(∞) > −∞
and F(−∞) < ∞.
If (a,b] = +_{i=1}^{n} (ai,bi], then after relabeling the intervals we must have a1 = a, bn = b,
and ai = b_{i−1} for 2 ≤ i ≤ n, so condition (i) in Theorem 1.1.9 holds. To check (ii), suppose
first that −∞ < a < b < ∞, and (a,b] ⊂ ∪_{i≥1} (ai,bi] where (without loss of generality)
−∞ < ai < bi < ∞. Pick δ > 0 so that F(a + δ) < F(a) + ε and pick ηi so that

F(bi + ηi) < F(bi) + ε 2^{−i}
The open intervals (ai, bi + ηi) cover [a + δ, b], so there is a finite subcover (αj, βj),
1 ≤ j ≤ J. Since (a + δ, b] ⊂ ∪_{j=1}^{J} (αj, βj], (b) in Lemma 1.1.10 implies

F(b) − F(a + δ) ≤ Σ_{j=1}^{J} (F(βj) − F(αj)) ≤ Σ_{i=1}^{∞} (F(bi + ηi) − F(ai)) ≤ ε + Σ_{i=1}^{∞} (F(bi) − F(ai))

so F(b) − F(a) ≤ 2ε + Σ_{i=1}^{∞} (F(bi) − F(ai)), and since ε is arbitrary, we have proved the result in the case −∞ < a < b < ∞.
To remove the last restriction, observe that if (a,b] ⊂ ∪i (ai,bi] and (A,B] ⊂ (a,b] has
−∞ < A < B < ∞, then we have

F(B) − F(A) ≤ Σ_{i=1}^{∞} (F(bi) − F(ai))

Since the last result holds for any finite (A,B] ⊂ (a,b], the desired result follows.
Measures on R^d
Our next goal is to prove a version of Theorem 1.1.4 for Rd . The first step is to introduce
the assumptions on the defining function F . By analogy with the case d = 1 it is natural to
assume:
[Figure 1.1 The values of F(x1,x2) on the regions of the plane: 0, 2/3, 1 in the top row; 0, 0, 2/3 in the middle row; 0, 0, 0 in the bottom row.]
(i) It is nondecreasing, i.e., if x ≤ y (meaning xi ≤ yi for all i), then F(x) ≤ F(y).
(ii) F is right continuous, i.e., lim_{y↓x} F(y) = F(x) (here y ↓ x means each yi ↓ xi).
(iii) If xn ↓ −∞, i.e., each coordinate does, then F(xn) ↓ 0. If xn ↑ ∞, i.e., each
coordinate does, then F(xn) ↑ 1.
However, this time it is not enough. Consider the following F:

F(x1,x2) = 1     if x1, x2 ≥ 1
         = 2/3   if x1 ≥ 1 and 0 ≤ x2 < 1
         = 2/3   if x2 ≥ 1 and 0 ≤ x1 < 1
         = 0     otherwise
See Figure 1.1 for a picture. A little thought shows that

μ((a1,b1] × (a2,b2]) = μ((−∞,b1] × (−∞,b2]) − μ((−∞,a1] × (−∞,b2])
                     − μ((−∞,b1] × (−∞,a2]) + μ((−∞,a1] × (−∞,a2])
                     = F(b1,b2) − F(a1,b2) − F(b1,a2) + F(a1,a2)

Using this with a1 = a2 = 1 − ε and b1 = b2 = 1 and letting ε → 0 we see that

μ({(1,1)}) = 1 − 2/3 − 2/3 + 0 = −1/3

Similar reasoning shows that μ({(1,0)}) = μ({(0,1)}) = 2/3.
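The arithmetic above is easy to check numerically. In the sketch below (the function names are ours), rect_measure computes the inclusion-exclusion value that any measure consistent with this F would have to assign to a rectangle:

```python
def F(x1, x2):
    # The F from the text: nondecreasing and right continuous in each
    # coordinate, yet not the distribution function of any measure.
    if x1 >= 1 and x2 >= 1:
        return 1.0
    if x1 >= 1 and 0 <= x2 < 1:
        return 2 / 3
    if x2 >= 1 and 0 <= x1 < 1:
        return 2 / 3
    return 0.0

def rect_measure(a1, b1, a2, b2):
    # mu((a1,b1] x (a2,b2]) by inclusion-exclusion.
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

eps = 1e-9
# A shrinking rectangle around (1,1) gets "mass" -1/3 ...
print(rect_measure(1 - eps, 1, 1 - eps, 1))
# ... while shrinking rectangles around (1,0) and (0,1) each get 2/3.
print(rect_measure(1 - eps, 1, -eps, 0))
```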
To formulate the third and final condition for F to define a measure, let
A = (a1,b1 ] × · · · × (ad ,bd ]
V = {a1,b1 } × · · · × {ad ,bd }
where −∞ < ai < bi < ∞. To emphasize that ∞’s are not allowed, we will call A a finite
rectangle. Then V = the vertices of the rectangle A. If v ∈ V , let
Exercises
1.1.1 Let Ω = R, F = all subsets so that A or Ac is countable, P(A) = 0 in the first case
and = 1 in the second. Show that (Ω, F, P) is a probability space.
1.1.2 Recall the definition of Sd from Example 1.1.5. Show that σ (Sd ) = Rd , the Borel
subsets of Rd .
1.1.3 A σ -field F is said to be countably generated if there is a countable collection
C ⊂ F so that σ (C ) = F . Show that Rd is countably generated.
1.1.4 (i) Show that if F1 ⊂ F2 ⊂ . . . are σ -algebras, then ∪i Fi is an algebra. (ii) Give an
example to show that ∪i Fi need not be a σ -algebra.
1.1.5 A set A ⊂ {1, 2, . . .} is said to have asymptotic density θ if

lim_{n→∞} |A ∩ {1, 2, . . . , n}|/n = θ

Let A be the collection of sets for which the asymptotic density exists. Is A a
σ-algebra? an algebra?
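The densities in this exercise can be explored numerically; a small sketch (density_along is our name) computes |A ∩ {1, . . . , n}|/n for a set given by an indicator:

```python
def density_along(indicator, n):
    # |A ∩ {1, ..., n}| / n for the set A = {k : indicator(k)}.
    return sum(1 for k in range(1, n + 1) if indicator(k)) / n

# The even numbers have asymptotic density 1/2 ...
print(density_along(lambda k: k % 2 == 0, 10_000))
# ... while the perfect squares have asymptotic density 0.
print(density_along(lambda k: int(k ** 0.5) ** 2 == k, 10_000))
```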
1.2 Distributions
Probability spaces become a little more interesting when we define random variables on
them. A real-valued function X defined on Ω is said to be a random variable if for every
Borel set B ⊂ R we have X^{-1}(B) = {ω : X(ω) ∈ B} ∈ F. When we need to emphasize the
σ-field, we will say that X is F-measurable or write X ∈ F. If Ω is a discrete probability
space (see Example 1.1.2), then any function X : Ω → R is a random variable. A second
trivial, but useful, type of example of a random variable is the indicator function of a set
A ∈ F:

1_A(ω) = 1   ω ∈ A
       = 0   ω ∉ A
[Figure: X maps (Ω, F, P) to (R, R); a Borel set A pulls back to X^{-1}(A) ∈ F, and μ = P ◦ X^{-1}.]
The notation is supposed to remind you that this function is 1 on A. Analysts call this
object the characteristic function of A. In probability, that term is used for something quite
different. (See Section 3.3.)
If X is a random variable, then X induces a probability measure on R called its
distribution by setting μ(A) = P (X ∈ A) for Borel sets A. Using the notation introduced
previously, the right-hand side can be written as P (X −1 (A)). In words, we pull A ∈ R back
to X −1 (A) ∈ F and then take P of that set.
To check that μ is a probability measure we observe that if the Ai are disjoint, then using
the definition of μ; the fact that X lands in the union if and only if it lands in one of the Ai;
the fact that if the sets Ai ∈ R are disjoint, then the events {X ∈ Ai} are disjoint; and the
definition of μ again; we have:

μ(∪i Ai) = P(X ∈ ∪i Ai) = P(∪i {X ∈ Ai}) = Σ_i P(X ∈ Ai) = Σ_i μ(Ai)
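On a discrete probability space, the pushforward μ = P ◦ X^{-1} can be computed directly by summing P over preimages. A minimal sketch with made-up data (the space and the variable are our invention):

```python
from collections import defaultdict
from fractions import Fraction

# A made-up discrete probability space Omega = {"a", "b", "c"} with
# P({omega}) = p[omega], and a random variable X given by a table.
p = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
X = {"a": 0, "b": 1, "c": 1}

def distribution(p, X):
    # mu({x}) = P(X = x) = sum of P({omega}) over the preimage
    # X^{-1}({x}).
    mu = defaultdict(Fraction)
    for omega, prob in p.items():
        mu[X[omega]] += prob
    return dict(mu)

mu = distribution(p, X)
# mu({0}) = 1/4 and mu({1}) = 1/4 + 1/2 = 3/4; total mass 1.
```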
The distribution of a random variable X is usually described by giving its distribution
function, F (x) = P (X ≤ x).
Theorem 1.2.1 Any distribution function F has the following properties:
(i) F is nondecreasing.
(ii) limx→∞ F (x) = 1, limx→−∞ F (x) = 0.
(iii) F is right continuous, i.e., limy↓x F (y) = F (x).
(iv) If F (x−) = limy↑x F (y), then F (x−) = P (X < x).
(v) P (X = x) = F (x) − F (x−).
Proof To prove (i), note that if x ≤ y, then {X ≤ x} ⊂ {X ≤ y}, and then use (i) in
Theorem 1.1.1 to conclude that P (X ≤ x) ≤ P (X ≤ y).
To prove (ii), we observe that if x ↑ ∞, then {X ≤ x} ↑ , while if x ↓ −∞, then
{X ≤ x} ↓ ∅ and then use (iii) and (iv) of Theorem 1.1.1.
To prove (iii), we observe that if y ↓ x, then {X ≤ y} ↓ {X ≤ x}.
To prove (iv), we observe that if y ↑ x, then {X ≤ y} ↑ {X < x}.
For (v), note P (X = x) = P (X ≤ x) − P (X < x) and use (iii) and (iv).
The next result shows that we have found more than enough properties to characterize
distribution functions.
[Figure 1.4 Picture of the inverse defined in the proof of Theorem 1.2.2.]
Theorem 1.2.2 If F satisfies (i), (ii), and (iii) in Theorem 1.2.1, then it is the distribution
function of some random variable.
Proof Let Ω = (0,1), F = the Borel sets, and P = Lebesgue measure. If ω ∈ (0,1), let

X(ω) = sup{y : F(y) < ω}

Once we show that

(⋆)  {ω : X(ω) ≤ x} = {ω : ω ≤ F(x)}

the desired result follows immediately since P(ω : ω ≤ F(x)) = F(x). (Recall P is
Lebesgue measure.) To check (⋆), we observe that if ω ≤ F(x), then X(ω) ≤ x, since
x ∉ {y : F(y) < ω}. On the other hand, if ω > F(x), then, since F is right continuous, there
is an ε > 0 so that F(x + ε) < ω and X(ω) ≥ x + ε > x.
Even though F may not be 1-1 and onto, we will call X the inverse of F and denote it
by F^{-1}. The scheme in the proof of Theorem 1.2.2 is useful for generating random variables
on a computer: standard algorithms generate random variables U with a uniform distribution
on (0,1), and applying the inverse of the distribution function defined in Theorem 1.2.2 gives
a random variable F^{-1}(U) with distribution function F.
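For example, if F(x) = 1 − e^{−x} (the exponential distribution), the inverse is F^{−1}(u) = −log(1 − u). A minimal sketch of the scheme, using Python's standard uniform generator:

```python
import math
import random

def F_inv(u):
    # Inverse of F(x) = 1 - exp(-x) for u in [0, 1).
    return -math.log(1.0 - u)

rng = random.Random(0)
# F_inv(U) with U uniform on (0, 1) has distribution function F.
sample = [F_inv(rng.random()) for _ in range(100_000)]

# Empirical check: the fraction of the sample with X <= 1 should be
# close to F(1) = 1 - 1/e, roughly 0.632.
frac = sum(x <= 1.0 for x in sample) / len(sample)
```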
If X and Y induce the same distribution μ on (R, R), we say X and Y are equal in
distribution. In view of Theorem 1.1.4, this holds if and only if X and Y have the same
distribution function, i.e., P(X ≤ x) = P(Y ≤ x) for all x. When X and Y have the
same distribution, we like to write X = Y with a small d above the equals sign, but this
is too tall to use in text, so for typographical reasons we will also use X =d Y.
When the distribution function F(x) = P(X ≤ x) has the form

F(x) = ∫_{−∞}^{x} f(y) dy        (1.2.1)

we say that X has density function f. In remembering formulas, it is often useful to think
of f(x) as being P(X = x) although

P(X = x) = lim_{ε→0} ∫_{x−ε}^{x+ε} f(y) dy = 0
By popular demand, we have ceased our previous practice of writing P (X = x) for the
density function. Instead we will use things like the lovely and informative fX (x).
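To make (1.2.1) and the limit above concrete, here is a numerical sketch with the standard exponential density (our choice of example; integrate is a simple midpoint rule):

```python
import math

def f(y):
    # Standard exponential density: f(y) = exp(-y) for y >= 0.
    return math.exp(-y) if y >= 0 else 0.0

def integrate(f, a, b, n=10_000):
    # Midpoint-rule approximation of the integral of f over [a, b].
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# F(1) = integral of f from -infinity to 1 = 1 - 1/e, roughly 0.632.
F1 = integrate(f, 0.0, 1.0)

# P(X = x) = 0: the mass of a shrinking interval around x = 0.5
# shrinks to zero with epsilon.
masses = [integrate(f, 0.5 - e, 0.5 + e) for e in (0.1, 0.01, 0.001)]
```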