Contents
4 Convergence
4.1 Convergence Almost Surely and in Probability
4.2 Laws of Large Numbers
6 Measure Theory
6.1 Lebesgue Measure
6.2 Distribution Functions
6.3 Measurable Functions
6.4 Some Distributions
7 Integration
7.1 Density Function
7.2 Product Measure
7.3 Random Variables
7.4 Independence
7.5 Convolution
7.6 Existence of Independent Sequences
8 Expected Value
8.1 Expected Value
8.2 Characteristic Function and Uniqueness Theorem
8.3 Moment Generating Function
Chapter 1
for any sequence of sets (Aj)j≥1 (where the empty union is defined as ∅), it is also enough to check that A is closed under countable disjoint unions.
Remark. We will use (⋆) from time to time throughout this course. It is called the canonical decompo-
sition of (Aj )j≥1 .
Definition 1.1.2. For a nonempty set Ω, suppose one has a type Θ of set collections. In other words, for
any set Ω, one defines what it means for a collection C ⊆ P(Ω) to be of type Θ. The type Θ is said to be
consistent, if for every set Ω, one has the following conditions:
(1) P(Ω) is of type Θ.
(2) If Ci, i ∈ I, are collections of type Θ, then ⋂_{i∈I} Ci is again of type Θ.
It is clear that if type Θ is consistent, then for any collection C ⊆ P(Ω), there exists the smallest (in the
partial ordering of inclusion) collection Θ(C) of type Θ containing C, which is given by
Θ(C) = ⋂ {θ ⊆ P(Ω) : C ⊆ θ and θ is of type Θ},

called the collection of type Θ generated by C. By (2), Θ(C) is still of type Θ. In particular, if θ is a collection of type Θ generated by some C ⊆ P(Ω), then we say that C is a basis of θ.
Proposition 1.1.3. Suppose Ω is a set and Θ is a consistent type on Ω. Then for any C1 , C2 ⊆ P(Ω),
Θ(Θ(C1 )) = Θ(C1 ), and if C1 ⊆ C2 , Θ(C1 ) ⊆ Θ(C2 ).
Proof. Let C1, C2 ⊆ P(Ω) be given. It is immediate that Θ(C1) ⊆ Θ(Θ(C1)). On the other hand, Θ(C1) is itself a collection of type Θ containing Θ(C1), so Θ(Θ(C1)) ⊆ Θ(C1).
For monotonicity, if C1 ⊆ C2, then C1 ⊆ Θ(C2). Since Θ(C2) is of type Θ, Θ(C1) ⊆ Θ(C2).
Remark. By 1.1.3, if we want to show that a property P holds for some collection θ of type Θ, it is typical
to set θ′ = {S ⊆ Ω : P holds for S} and show that θ′ is a collection of type Θ containing θ (or a basis of θ).
One can show that the σ-algebra and algebra types are both consistent. Therefore, for a collection C, we let σ(C) (resp. α(C)) denote the σ-algebra (resp. algebra) generated by C.
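As an aside, the generating operation is concrete enough to compute by brute force on a tiny finite Ω. The sketch below (illustrative only, not part of the notes; all names are made up) computes σ(C) as the intersection of every σ-algebra containing C, exactly as in the displayed formula; it is feasible only because Ω has three points.

```python
# Brute-force sigma(C) on a tiny Omega via the formula
# Theta(C) = intersection of all collections of type Theta containing C.
from itertools import chain, combinations

Omega = frozenset({1, 2, 3})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(Omega), r) for r in range(len(Omega) + 1))]

def is_sigma_algebra(coll):
    """Closure under complement and (finite = countable here) union, and Omega in it."""
    coll = set(coll)
    if Omega not in coll:
        return False
    if any(Omega - A not in coll for A in coll):
        return False
    return all(A | B in coll for A in coll for B in coll)

def generated_sigma_algebra(C):
    """Intersect every sigma-algebra on Omega containing C (only viable for tiny Omega)."""
    all_colls = chain.from_iterable(combinations(subsets, r) for r in range(len(subsets) + 1))
    result = set(subsets)
    for coll in all_colls:
        if set(C) <= set(coll) and is_sigma_algebra(coll):
            result &= set(coll)
    return result

C = [frozenset({1})]
print(sorted(tuple(sorted(A)) for A in generated_sigma_algebra(C)))
# prints [(), (1,), (1, 2, 3), (2, 3)], i.e. sigma({{1}}) = {∅, {1}, {2,3}, Ω}
```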
The tuple (Ω, F, P) is called a probability space. The elements in F are events.
Here are some straightforward properties of probability spaces.
Proposition 1.2.2. Let (Ω, F, P) be a probability space. Then
(1) P(∅) = 0.
(2) (Finite Additivity) If A1, · · · , Ak ∈ F are pairwise disjoint, then

P(⋃_{j=1}^{k} Aj) = Σ_{j=1}^{k} P(Aj).
(5) Since B = A ⊔ (B\A), by (2), P(B) = P(B\A) + P(A) ≥ P(A).
(6) Suppose Aj ↗ A. We apply the canonical decomposition (with A0 = ∅):

P(A) = Σ_{j=1}^{∞} P(Aj\Aj−1) = lim_{n→∞} Σ_{j=1}^{n} P(Aj\Aj−1) = lim_{n→∞} P(An).
The second part follows by taking complements and applying the first part.
(7) We apply the canonical decomposition:

P(⋃_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P(Aj \ ⋃_{k=1}^{j−1} Ak) ≤ Σ_{j=1}^{∞} P(Aj).
Proof. The case n = 2 follows by 1.2.2. Suppose n ≥ 2 and the equality holds for n. Then

P(⋃_{j=1}^{n+1} Aj) = P(⋃_{j=1}^{n} Aj) + P(An+1) − P(⋃_{j=1}^{n} (Aj ∩ An+1)),

and

−P(⋃_{j=1}^{n} (Aj ∩ An+1)) = Σ_{k=1}^{n} (−1)^{k+2} Σ_{J2⊆{1,··· ,n}, #J2=k} P(⋂_{j∈J2} Aj ∩ An+1)
= Σ_{k=1}^{n} (−1)^{k+2} Σ_{J2⊆{1,··· ,n}, #J2=k} P(⋂_{j∈J2∪{n+1}} Aj).

Since

{J : J ⊆ {1, · · · , n + 1}, #J = k} = {J1 : J1 ⊆ {1, · · · , n}, #J1 = k} ∪ {J2 ∪ {n + 1} : J2 ⊆ {1, · · · , n}, #J2 = k − 1}

is a disjoint union,

Σ_{k=1}^{n+1} (−1)^{k+1} Σ_{J⊆{1,··· ,n+1}, #J=k} P(⋂_{j∈J} Aj)
= Σ_{k=1}^{n+1} (−1)^{k+1} Σ_{J1⊆{1,··· ,n}, #J1=k} P(⋂_{j∈J1} Aj) + Σ_{k=1}^{n+1} (−1)^{k+1} Σ_{J2⊆{1,··· ,n}, #J2=k−1} P(⋂_{j∈J2∪{n+1}} Aj)
= Σ_{k=1}^{n} (−1)^{k+1} Σ_{J1⊆{1,··· ,n}, #J1=k} P(⋂_{j∈J1} Aj) + P(An+1) + Σ_{k=2}^{n+1} (−1)^{k+1} Σ_{J2⊆{1,··· ,n}, #J2=k−1} P(⋂_{j∈J2∪{n+1}} Aj)
= P(⋃_{j=1}^{n} Aj) + P(An+1) + Σ_{k=1}^{n} (−1)^{k+2} Σ_{J2⊆{1,··· ,n}, #J2=k} P(⋂_{j∈J2∪{n+1}} Aj)
= P(⋃_{j=1}^{n} Aj) + P(An+1) − P(⋃_{j=1}^{n} (Aj ∩ An+1)) = P(⋃_{j=1}^{n+1} Aj).
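The inclusion-exclusion identity just proved is easy to sanity-check numerically. A small illustrative sketch (not from the notes), using the uniform measure on a finite set:

```python
# Numeric check of inclusion-exclusion on a finite uniform probability space.
from itertools import combinations

omega = set(range(12))                       # uniform probability space
A = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}]   # three events

def prob(S):
    return len(S) / len(omega)

lhs = prob(set.union(*A))
rhs = 0.0
for k in range(1, len(A) + 1):
    for J in combinations(range(len(A)), k):
        inter = set(omega)
        for j in J:
            inter &= A[j]
        rhs += (-1) ** (k + 1) * prob(inter)

print(lhs, rhs)   # both 0.75: P(union) equals the alternating sum over index sets J
```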
We wish to define a measure on some σ-algebra of subsets of (0, 1] such that the measure of an interval coincides with its length. Generally, we will construct an algebra F0 and define a probability measure P on F0; then we try to extend P to a σ-algebra containing F0.
In this case, we pick F0 to be the collection of finite disjoint unions of half-open intervals (a, b], where 0 < a ≤ b ≤ 1 (note that when a = b, (a, b] = ∅). One can show that F0 is indeed an algebra. Then for A = ⋃_{j=1}^{n} (aj, bj] ∈ F0, it is natural to define λ(A) = Σ_{j=1}^{n} (bj − aj). Since for any interval I = (a, b), (a, b], [a, b) or [a, b] with a ≤ b,

b − a = lim_{m→∞} (1/m) #(I ∩ Z/m)   (prove it!),

we get

λ(A) = Σ_{j=1}^{n} lim_{m→∞} (1/m) #((aj, bj] ∩ Z/m) = lim_{m→∞} (1/m) Σ_{j=1}^{n} #((aj, bj] ∩ Z/m) = lim_{m→∞} (1/m) #(A ∩ Z/m).
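The counting formula for the length of an interval can be checked numerically. A quick illustrative sketch (not part of the notes), with #(I ∩ Z/m) computed by a floor count:

```python
# Check that (1/m) * #((a, b] ∩ Z/m) tends to b - a.
import math

def count_lattice_points(a, b, m):
    """Number of points k/m, k integer, lying in (a, b]."""
    return math.floor(b * m) - math.floor(a * m)

a, b = 0.2, 0.75
for m in (10, 100, 10_000, 1_000_000):
    print(m, count_lattice_points(a, b, m) / m)   # tends to b - a = 0.55
```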
Proof. (1) Since ⋃_{j≥1} Ij ⊆ I, for any n ∈ N,

Σ_{j=1}^{n} λ(Ij) = λ(⋃_{j=1}^{n} Ij) ≤ λ(I),

where the equality and the inequality hold by the argument above. Then we are done by letting n → ∞.
(2) If I ⊆ ⋃_{j=1}^{n} Ij for some n, then we are done. For the general case, let I = (a, b] and Ij = (aj, bj] for each j. Let ϵ ∈ (0, b − a) be given (the case I = ∅ is trivial, thus omitted). Then

(a + ϵ, b] ⊆ [a + ϵ, b] ⊆ ⋃_{j=1}^{∞} (aj, bj + ϵ/2^j).
(4) P∗ |F0 = P.
Proof. (1) Since ∅ ∈ F0, 0 ≤ P∗(∅) ≤ P(∅) = 0.
(2) Suppose (Bj)j≥1 is an F0-covering of B. Then P∗(A) ≤ Σ_{j≥1} P(Bj), as (Bj)j≥1 is also an F0-covering of A. Taking the infimum over all such (Bj)j≥1, we are done.
(3) Let ϵ > 0 be given. Then for each j, there exists an F0-covering (B_k^{(j)})_{k≥1} of Aj such that Σ_{k≥1} P(B_k^{(j)}) ≤ P∗(Aj) + ϵ/2^j. Since ⋃_{k,j≥1} B_k^{(j)} ⊇ ⋃_{j≥1} Aj, we have

P∗(⋃_{j=1}^{∞} Aj) ≤ Σ_{k,j=1}^{∞} P(B_k^{(j)}) ≤ Σ_{j=1}^{∞} (P∗(Aj) + ϵ/2^j) = ϵ + Σ_{j=1}^{∞} P∗(Aj).
where Bj = Aj \ ⋃_{k=1}^{j−1} Ak. Taking the infimum over all such (Aj)j≥1, we are done.
In general, P∗ is not a probability measure on P(Ω). We will restrict our collection of sets to obtain an actual one.
Definition 1.3.3. For A ⊆ Ω, we say that A is P∗ -measurable if
P∗ (E) = P∗ (E ∩ A) + P∗ (E\A), ∀E ⊆ Ω.
P∗ (E) = P∗ (E ∩ A) + P∗ (E\A)
= P∗ (E ∩ A ∩ B) + P∗ ((E ∩ A)\B) + P∗ ((E ∩ B)\A) + P∗ (E\(A ∪ B))
≥ P∗ (E ∩ A ∩ B) + P∗ (E\(A ∩ B)),
showing that A ∩ B ∈ M.
To show that M is a σ-algebra, we claim that if (Aj)j≥1 ⊆ M is pairwise disjoint, then for any E ⊆ Ω,

P∗(E ∩ ⊔_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P∗(E ∩ Aj).
Proof of claim. We first consider the finite case: If there is only one Aj , then there is nothing to prove.
Suppose A1 , A2 ∈ M are disjoint. Then for any E ⊆ Ω,
for any n ∈ N. Letting n → ∞, we are done (the reversed inequality follows from countable subadditivity).
Let (Aj )j≥1 ⊆ M be pairwise disjoint. Then for any n ∈ N and E ⊆ Ω,
P∗(E) = P∗(E ∩ ⊔_{j=1}^{n} Aj) + P∗(E \ ⊔_{j=1}^{n} Aj) ≥ Σ_{j=1}^{n} P∗(E ∩ Aj) + P∗(E \ ⊔_{j=1}^{∞} Aj).
Thus, M is a σ-algebra. Moreover, by the claim and the properties above, P∗ is a probability measure on
M. Finally, we need to show that F0 ⊆ M. Let A ∈ F0 , E ⊆ Ω, and ϵ > 0 be given. Then there exists an
F0 -covering (Bj )j≥1 of E such that
P∗(E) + ϵ ≥ Σ_{j=1}^{∞} P(Bj).
Then
P∗(A ∩ E) ≤ P∗(⋃_{j=1}^{∞} (Bj ∩ A)) ≤ Σ_{j=1}^{∞} P∗(Bj ∩ A) = Σ_{j=1}^{∞} P(Bj ∩ A).
Similarly, P∗(E\A) ≤ Σ_{j≥1} P(Bj\A). Therefore,
P∗(E) + ϵ ≥ Σ_{j=1}^{∞} P(Bj) = Σ_{j=1}^{∞} P(Bj ∩ A) + Σ_{j=1}^{∞} P(Bj\A) ≥ P∗(E ∩ A) + P∗(E\A).
Since ϵ is arbitrary, P∗(E) ≥ P∗(E ∩ A) + P∗(E\A), and therefore A ∈ M.
Remark. The measure P∗ defined on M is called the Lebesgue measure on (0, 1] and M is the collection
of Lebesgue measurable sets.
Such M is unique in some sense. To show this, we need a bit more concepts.
Definition 1.3.5. Let Ω be a set. A collection of subsets of Ω is a monotone class if it is closed under monotone limits, that is, whenever (Aj)j≥1 lies in the collection and Aj ↗ A or Aj ↘ A, then A lies in the collection.
The monotone class type is consistent, and we denote by µ(C) the monotone class generated by C.
Theorem 1.3.6 (Monotone Class Theorem). If F0 is an algebra, then σ(F0 ) = µ(F0 ).
Proof. Since σ(F0 ) is a monotone class, σ(F0 ) ⊇ µ(F0 ). Then it remains to show that µ(F0 ) is a σ-algebra.
Since Ω ∈ F0 , Ω ∈ µ(F0 ).
(1) (Closed under complement) Let M1 = {A ⊆ Ω : Ω\A ∈ µ(F0)}. It is clear that F0 ⊆ M1. Then it remains to show that M1 is a monotone class. Suppose (Aj)j≥1 ⊆ M1 and Aj ↗ A. Then (Ω\Aj)j≥1 ⊆ µ(F0) and Ω\Aj ↘ Ω\A. Therefore, Ω\A ∈ µ(F0) and A ∈ M1. Similarly, if Aj ↘ A, then A ∈ M1. Hence, µ(F0) ⊆ M1 and µ(F0) is closed under complement.
(2) (Closed under finite unions) The desired property is: ∀A, B ∈ µ(F0), A ∪ B ∈ µ(F0). We first prove the intermediate property: ∀A ∈ µ(F0), ∀B ∈ F0, A ∪ B ∈ µ(F0). Thus, we set M2 = {A ⊆ Ω : ∀B ∈ F0, A ∪ B ∈ µ(F0)}. It is clear that F0 ⊆ M2. Therefore, it remains to show that M2 is a monotone class. Suppose (Aj)j≥1 ⊆ M2 with Aj ↗ A. Then for any B ∈ F0, Aj ∪ B ∈ µ(F0) and Aj ∪ B ↗ A ∪ B, so A ∪ B ∈ µ(F0). Thus, A ∈ M2. Similarly, if Aj ↘ A, then A ∈ M2.
Set M3 = {A ⊆ Ω : ∀B ∈ µ(F0), A ∪ B ∈ µ(F0)}. Since µ(F0) ⊆ M2, for any A ∈ F0 and B ∈ µ(F0), A ∪ B ∈ µ(F0), and therefore F0 ⊆ M3. By a similar argument, M3 is a monotone class, and we are done.
(3) (Closed under countable unions) Let (Aj)j≥1 ⊆ µ(F0). Then

⋃_{j=1}^{∞} Aj = ⋃_{j=1}^{∞} ⋃_{k=1}^{j} Ak ∈ µ(F0)

as Bj = ⋃_{k=1}^{j} Ak ∈ µ(F0) by (2) and (Bj)j≥1 is increasing.
By (1), (2), and (3), µ(F0 ) is a σ-algebra and therefore µ(F0 ) = σ(F0 ).
Theorem 1.3.7. Let F0 be an algebra. If P1 and P2 are probability measures on σ(F0) that agree on F0, then they are equal.
Proof. We need to show that for any A ∈ σ(F0 ), P1 (A) = P2 (A). Therefore, set
Theorem 1.3.9 (Dynkin's π-λ Theorem). Let Ω be a set. Suppose Λ and Π are a λ-system and a π-system on Ω, respectively. If Π ⊆ Λ, then σ(Π) ⊆ Λ.
Proof. We claim that λ(Π) is a π-system.
Proof of claim. The desired property is: ∀A, B ∈ λ(Π), A ∩ B ∈ λ(Π). Let L1 = {A ⊆ Ω : ∀B ∈ Π, A ∩ B ∈ λ(Π)}. It is clear that Π ⊆ L1 and Ω ∈ L1. Then it suffices to show that L1 is closed under pairwise disjoint unions and complements. Let (Aj)j≥1 ⊆ L1 be pairwise disjoint. Then for any B ∈ Π, (⊔_{j≥1} Aj) ∩ B = ⊔_{j≥1} (Aj ∩ B) ∈ λ(Π), since each Aj ∩ B ∈ λ(Π).
Let A ∈ L1 and B ∈ Π be given. Then
Chapter 2
(2) Ω \ lim sup_{j→∞} Aj = lim inf_{j→∞} (Ω\Aj) and Ω \ lim inf_{j→∞} Aj = lim sup_{j→∞} (Ω\Aj).
(3) P(lim inf_{j→∞} Aj) ≤ lim inf_{j→∞} P(Aj) ≤ lim sup_{j→∞} P(Aj) ≤ P(lim sup_{j→∞} Aj).
Proposition 2.1.2 (First Borel–Cantelli Lemma). Suppose (Aj)j≥1 ⊆ F. If Σ_{j≥1} P(Aj) < ∞, then

P(lim sup_{j→∞} Aj) = 0.
Example (Coin Tossing). Consider P the Lebesgue measure on (0, 1] with the Borel σ-algebra. For n ∈ N, we define the level-n intervals

I_1^{(n)}, · · · , I_{2^n}^{(n)}

by I_k^{(n)} = ((k−1)/2^n, k/2^n], and define dn(ω) = 0 if ω ∈ I_k^{(n)} with k odd and dn(ω) = 1 otherwise. Then the dn's correspond to fair coin flips since for each ω ∈ (0, 1], the sequence (dj(ω))j≥1 is an infinite sequence of 0's and 1's, which are viewed as tails and heads, respectively.
Define Aj = {ω : dj(ω) = 1} = {j-th toss is heads}. Then P(Aj) = 1/2. Since

lim inf_{j→∞} P(Aj) = 1/2,

we know that P(lim sup_{j→∞} Aj) = P({infinitely many tosses are heads}) ≥ 1/2.
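The digits dn are concrete: dn(ω) is the n-th binary digit of ω. A simulation sketch (illustrative, not from the notes):

```python
# d_n(omega) from the level-n intervals; empirically P(d_j = 1) is about 1/2.
import math
import random

def d(n, omega):
    """d_n(omega) = 1 iff omega lies in an even-indexed level-n interval ((k-1)/2^n, k/2^n]."""
    k = math.ceil(omega * 2 ** n)   # index k of the level-n interval containing omega
    return 1 if k % 2 == 0 else 0

random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]
for j in (1, 2, 5, 10):
    freq = sum(d(j, w) for w in samples) / len(samples)
    print(j, round(freq, 3))   # each frequency is close to 0.5
```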
2.2 Independence
Definition 2.2.1. A1 , · · · , An ∈ F are independent if for any distinct k1 , · · · , km ∈ {1, · · · , n}, P(Ak1 ∩
· · · ∩ Akm ) = P(Ak1 )P(Ak2 ) · · · P(Akm ). A collection C of events is independent if each finite subcollection
is independent.
Remark. The Aj need not be distinct. If A and A are independent, then P(A) = P(A)², so P(A) = 0 or 1.
Definition 2.2.2. If A, B ∈ F with P(B) > 0, then we define the conditional probability

P(A|B) = P(A ∩ B)/P(B).
For a fixed B ∈ F with P(B) > 0, the function A 7→ P(A|B) is a probability measure on F.
Remark. If P(B) > 0, then A, B are independent if and only if P(A|B) = P(A). Moreover, if C is indepen-
dent, then Ω\C = {Ω\A : A ∈ C} is also independent (which can be proved by 1.2.3).
Theorem 2.2.3 (Second Borel–Cantelli Lemma). Suppose (Aj)j≥1 ⊆ F is independent. If Σ_{j≥1} P(Aj) = ∞, then

P(lim sup_{j→∞} Aj) = 1.
Proof. Since

1 − P(lim sup_{j→∞} Aj) = P(lim inf_{j→∞} (Ω\Aj)) = lim_{j→∞} P(⋂_{k=j}^{∞} (Ω\Ak)),

it suffices to show that P(⋂_{k=j}^{∞} (Ω\Ak)) = 0 for each j.
Let M ∈ R be given and fix j ∈ N. Since Σ_{k≥1} P(Ak) = ∞, there exists N ∈ N such that for any n ≥ N, Σ_{k=j}^{n} P(Ak) > M. Then for any n ≥ N, by independence,

P(⋂_{k=j}^{n} (Ω\Ak)) = ∏_{k=j}^{n} P(Ω\Ak) = ∏_{k=j}^{n} (1 − P(Ak)) ≤ ∏_{k=j}^{n} exp(−P(Ak)) = exp(−Σ_{k=j}^{n} P(Ak)) < exp(−M).

Since M is arbitrary,

P(⋂_{k=j}^{∞} (Ω\Ak)) = 0 =⇒ P(lim sup_{j→∞} Aj) = 1.
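The two Borel–Cantelli regimes can be seen along a single simulated sample path. A sketch (illustrative, not from the notes; the choices P(Aj) = 1/j and 1/j² are assumptions for the example):

```python
# With independent A_j and P(A_j) = 1/j the sum diverges and occurrences keep
# accumulating; with P(A_j) = 1/j^2 the sum converges and the count stays bounded.
import random

random.seed(1)
n = 200_000
hits_divergent = 0    # events with P(A_j) = 1/j
hits_convergent = 0   # events with P(A_j) = 1/j**2
for j in range(1, n + 1):
    if random.random() < 1.0 / j:
        hits_divergent += 1
    if random.random() < 1.0 / j ** 2:
        hits_convergent += 1

print("divergent sum, occurrences up to n:", hits_divergent)    # grows (roughly like log n)
print("convergent sum, occurrences up to n:", hits_convergent)  # stays small
```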
Proof of claim. It is clear that Λ is closed under complement. Let (Bj )j≥1 ⊆ Λ be pairwise disjoint. Then
P(Ai1 ∩ · · · ∩ Aik ∩ ⋃_{j=1}^{∞} Bj) = Σ_{j=1}^{∞} P(Ai1 ∩ · · · ∩ Aik ∩ Bj)
= Σ_{j=1}^{∞} P(Ai1) · · · P(Aik) P(Bj)
= P(Ai1) · · · P(Aik) P(⋃_{j=1}^{∞} Bj).
T ⊆ σ(⋃_{n=1}^{∞} σ({A1, · · · , An})), which is independent of T,
and thus T is independent of itself. Taking any A ∈ T, we know that P(A) = P(A ∩ A) = P(A)² and therefore P(A) = 0 or 1, as required.
Example. We have shown that P(lim sup_{j→∞} Aj) = 1 and P(lim inf_{j→∞} Aj) = 0 by 2.2.3. Kolmogorov's zero-one law gives an alternative way: since lim sup_{j→∞} Aj and lim inf_{j→∞} Aj are tail events and

P(lim inf_{j→∞} Aj) ≤ 1/2 ≤ P(lim sup_{j→∞} Aj),
Chapter 3
which is the minimal σ-algebra Σ making X : Ω → R measurable. The σ-algebra generated by the sequence
of random variables (Xj )j≥1 is
Since {(a, b) : a < b} is a basis of Borel σ-algebra, σ(X) = σ({X −1 ((a, b)) : a < b}).
Proposition 3.1.7. Suppose X1 , · · · , Xn are simple random variables on Ω and W = (X1 , · · · , Xn ). Then
σ(X1 , · · · , Xn ) = {W −1 (A) : A ⊆ Rn }.
Proof. For 1 ≤ j ≤ n and Borel E ⊆ R, define Ej = {(x1, · · · , xn) ∈ Rn : xj ∈ E}. Then {Xj ∈ E} = {W ∈ Ej} = W−1(Ej) ∈ {W−1(A) : A ⊆ Rn}. Therefore, σ(X1, · · · , Xn) ⊆ {W−1(A) : A ⊆ Rn}.
Conversely, write R1, · · · , Rn for the ranges of X1, · · · , Xn, respectively. Let A ⊆ Rn. Then

W−1(A) = ⋃_{(ω1,··· ,ωn)∈(R1×···×Rn)∩A} W−1({(ω1, · · · , ωn)}) = ⋃_{(ω1,··· ,ωn)∈(R1×···×Rn)∩A} ⋂_{j=1}^{n} Xj−1({ωj})
(1) (σ(Xj ))j≥1 are independent.
(2) Define

T = ⋂_{n=1}^{∞} σ(Xn, Xn+1, · · · ).
3.3 Expectation
We only talk about expectation for simple random variables at this point.
Proposition 3.3.1. If X : Ω → R is a simple function, then there exists a unique representation

X = Σ_{j=1}^{m} aj 1Aj
Proof. The case when the Bj's are pairwise disjoint is clear by combining equal bj's. For the general case, there exist pairwise disjoint (Ak)_{k=1}^{m} ⊆ F such that each Bj is a union of a subcollection of the Ak's, say Cj. Then

X = Σ_{k=1}^{m} Σ_{j:Ak∈Cj} bj 1Ak.

Therefore,

EX = Σ_{k=1}^{m} Σ_{j:Ak∈Cj} bj P(Ak) = Σ_{j=1}^{n} Σ_{k:Ak∈Cj} bj P(Ak) = Σ_{j=1}^{n} bj P(Bj),
as required.
Proposition 3.3.4. Let X, Y be simple random variables. Then for any a, b ∈ R, aX + bY is simple and
E(aX + bY ) = aEX + bEY .
Proof. It is clear that aX, bY are simple. Thus, it remains to show that X + Y is simple. Let r ∈ R be given.
Then

(X + Y)−1({r}) = ⋃_{λ∈R} X−1({λ}) ∩ Y−1({r − λ}) = ⋃_{λ∈im X} X−1({λ}) ∩ Y−1({r − λ}) ∈ F.
If r = 0, then
{XY = r} = {X = 0} ∪ {Y = 0} ∈ F.
By 3.1.4, XY is a simple random variable.
Proposition 3.3.6 (Law of the Unconscious Statistician). Suppose X is a simple random variable and g : R → R is a function. Then g(X) := g ◦ X is a simple random variable with

Eg(X) = Σ_{x∈im(X)} g(x) P(X = x).
Proof. Since for each r ∈ R, {g(X) = r} = X−1(g−1({r})) ∈ F by 3.1.4, g(X) is a simple random variable. Suppose im(X) = {x1, · · · , xn} and im g(X) = {y1, · · · , ym}, where the xj's and the yk's are pairwise distinct. For k ∈ {1, · · · , m}, let Ck = {g = yk} ∩ im(X). Therefore,

Eg(X) = Σ_{k=1}^{m} yk P(g(X) = yk) = Σ_{k=1}^{m} yk P(X ∈ Ck) = Σ_{k=1}^{m} Σ_{x∈Ck} yk P(X = x) = Σ_{k=1}^{m} Σ_{x∈Ck} g(x) P(X = x) = Σ_{x∈im(X)} g(x) P(X = x),
as required.
Example (Norm of a Vector). Let v1, · · · , vn ∈ Rd be unit vectors. Show that there exist constants a1, · · · , an ∈ {−1, 1} such that ∥a1v1 + · · · + anvn∥ ≤ √n.
Proof. Fix v1, · · · , vn ∈ Rd. Let X1, · · · , Xn be independent random variables on Ω (Xj can be chosen to be dj in the coin tossing example with a slight modification) such that P(Xj = 1) = P(Xj = −1) = 1/2. Define X : Ω → R by

X(ω) = ∥Σ_{j=1}^{n} Xj(ω)vj∥² = n + Σ_{j≠k} Xj(ω)Xk(ω) vj · vk.

Then, by 3.3.5,

EX = n + Σ_{j≠k} vj · vk EXj EXk = n.
Therefore, there exists ω∗ ∈ Ω such that X(ω∗) ≤ n (as otherwise, EX > n, a contradiction). Let aj = Xj(ω∗) ∈ {−1, 1}. Then

X(ω∗) = ∥Σ_{j=1}^{n} aj vj∥² ≤ n ⟺ ∥Σ_{j=1}^{n} aj vj∥ ≤ √n,
as required.
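The computation above is a first-moment (probabilistic method) argument and can be mimicked numerically. A sketch (illustrative, not from the notes; the random unit vectors and the random search for a sign pattern are assumptions of the example):

```python
# E || sum a_j v_j ||^2 = n for random signs, so some sign pattern has norm <= sqrt(n);
# here we estimate the expectation and search random patterns for such a witness.
import math
import random

random.seed(0)
d, n = 3, 20
vecs = []
for _ in range(n):                       # n random unit vectors in R^d
    v = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    vecs.append([x / norm for x in v])

def signed_norm(signs):
    s = [sum(a * v[i] for a, v in zip(signs, vecs)) for i in range(d)]
    return math.sqrt(sum(x * x for x in s))

norms = []
for _ in range(2000):                    # random sign patterns (fair coin flips)
    signs = [random.choice((-1, 1)) for _ in range(n)]
    norms.append(signed_norm(signs))

print("average of norm^2:", sum(x * x for x in norms) / len(norms))   # close to n = 20
print("best norm found:", min(norms), "vs sqrt(n) =", math.sqrt(n))   # a witness <= sqrt(n)
```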
f (EX) ≤ Ef (X).
Therefore, f (EX) is defined. Then the rest follows by regular Jensen’s inequality.
Definition 3.4.2. Suppose X is a simple random variable. Then the variance of X is defined by

Var(X) = E(X − EX)².

Since Var(X) = E(X − EX)² = E(X² − 2XEX + (EX)²) = EX² − (EX)², for any simple random variable X, EX² ≥ (EX)². Also, by 3.3.4, Var(aX) = E(a²X²) − (aEX)² = a²(EX² − (EX)²) = a² Var(X).
Proposition 3.4.3. Suppose X, Y are simple random variables. If X, Y are independent, then Var(X + Y) = Var(X) + Var(Y).
Proof. By 3.3.4 and independence (so that EXY = EX EY),

Var(X + Y) = E(X + Y)² − (EX + EY)² = (EX² − (EX)²) + (EY² − (EY)²) = Var(X) + Var(Y).
Theorem 3.4.4. Let X and (Xj )nj=1 be simple random variables. Then:
(1) (Markov's Inequality) For any λ > 0 and α > 0,

P(|X| ≥ λ) ≤ E|X|^α / λ^α.
(4) If p = 1, then

∥X1 + · · · + Xn∥p = E|X1 + · · · + Xn| ≤ E(|X1| + · · · + |Xn|) = ∥X1∥p + · · · + ∥Xn∥p.
For p > 1, we prove it by induction on n. The case n = 1 is trivial. If n = 2,

|X1 + X2|^p ≤ |X1| |X1 + X2|^{p−1} + |X2| |X1 + X2|^{p−1}.

Then by (3) with r = 1, p1 = p, p2 = p/(p−1), it follows that

E(|X1| |X1 + X2|^{p−1}) ≤ ∥X1∥p ∥|X1 + X2|^{p−1}∥_{p/(p−1)} = ∥X1∥p ∥X1 + X2∥p^{p−1}.

Similarly,

E(|X2| |X1 + X2|^{p−1}) ≤ ∥X2∥p ∥X1 + X2∥p^{p−1}.

Therefore,

∥X1 + X2∥p^p = E|X1 + X2|^p ≤ (∥X1∥p + ∥X2∥p) ∥X1 + X2∥p^{p−1} =⇒ ∥X1 + X2∥p ≤ ∥X1∥p + ∥X2∥p

whenever ∥X1 + X2∥p > 0. If ∥X1 + X2∥p = 0, then ∥X1 + X2∥p = 0 ≤ ∥X1∥p + ∥X2∥p is clear.
Suppose n ≥ 3. Then

∥X1 + · · · + Xn∥p = ∥X1 + · · · + (Xn−1 + Xn)∥p ≤ ∥X1∥p + · · · + ∥Xn−2∥p + ∥Xn−1 + Xn∥p,

where the inequality follows from the induction hypothesis. Since ∥Xn−1 + Xn∥p ≤ ∥Xn−1∥p + ∥Xn∥p, we are done.
Let ϵ > 0 be given. Since g is uniformly continuous on [0, 1], there exists δ > 0 such that for any x, y ∈ [0, 1] with |x − y| < δ, |g(x) − g(y)| < ϵ. Let M = max_{x∈[0,1]} |g(x)|. Then

|gn(p) − g(p)| = |E(g(Sn,p/n) − g(p))|
≤ E|g(Sn,p/n) − g(p)|
≤ ϵ + 2M P(|Sn,p/n − p| ≥ δ)
≤ ϵ + 2M Var(Sn,p/n)/δ²      (by 3.4.4)
= ϵ + 2M p(1 − p)/(nδ²)
≤ ϵ + 2M/(nδ²) → ϵ

as n → ∞, uniformly in p. Therefore, gn → g uniformly. Since each gn is a polynomial in p, we are done.
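The gn here are the Bernstein polynomials, and the uniform convergence is easy to observe directly. A sketch (illustrative, not from the notes; the test function g is an arbitrary choice):

```python
# g_n(p) = E g(S_{n,p}/n) with S_{n,p} ~ Binomial(n, p); sup-norm error shrinks with n.
import math

def bernstein(g, n, p):
    """g_n(p) = sum_k g(k/n) * C(n,k) p^k (1-p)^(n-k)."""
    return sum(g(k / n) * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

g = lambda x: abs(x - 0.3) + math.sin(5 * x)   # any continuous function on [0, 1]
grid = [i / 200 for i in range(201)]
for n in (10, 50, 200, 800):
    err = max(abs(bernstein(g, n, p) - g(p)) for p in grid)
    print(n, round(err, 4))   # the error on the grid decreases as n grows
```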
Proposition 3.4.5 (Paley–Zygmund Inequality). Suppose X ≥ 0 is a simple random variable and 0 ≤ θ ≤ 1. Then

(1 − θ)²(EX)² ≤ P(X > θEX) EX².

Proof. Since X = X1_{X≤θEX} + X1_{X>θEX}, by the Cauchy–Schwarz inequality,

EX = E(X1_{X≤θEX}) + E(X1_{X>θEX}) ≤ θEX + √(EX² · E1_{X>θEX}) = θEX + √(EX² · P(X > θEX)).

Therefore,

(1 − θ)²(EX)² ≤ EX² P(X > θEX).
Example (First and Second Moment Method). Let T be an infinite d-ary tree (d ≥ 2) and p ∈ (0, 1). Each edge e is open (resp. closed) with probability p (resp. 1 − p), independently of all other edges of T. Let C be the open connected component of 0 (the root of T) and E = {#C = ∞}. We claim that:
(1) If pd < 1, then P(E) = 0.
(2) If pd > 1, then P(E) > 0.
Proof of claim. For n ∈ N, let Dn be the n-th layer of vertices. For v ∈ T\{0}, let 0 ↔ v denote that there is an open path from 0 to v. Let En = {∃v ∈ Dn : 0 ↔ v}. Then En ↘ E.
Then, by 3.4.4,

P(En) = P(Xn ≥ 1) ≤ EXn = Σ_{v∈Dn} P(0 ↔ v) = d^n p^n → 0

if pd < 1, where Xn = Σ_{v∈Dn} 1_{0↔v}. Thus, (1) is proved.
For (2), we will use 3.4.5. Writing u ∧ v for the last common ancestor of u and v, note that

EXn² = Σ_{u,v∈Dn} P(0 ↔ u, 0 ↔ v) = Σ_{u,v∈Dn, u≠v} P(0 ↔ u ∧ v) P(u ∧ v ↔ u) P(u ∧ v ↔ v) + Σ_{u∈Dn} P(0 ↔ u),

where

Σ_{u,v∈Dn, u≠v} P(0 ↔ u ∧ v) P(u ∧ v ↔ u) P(u ∧ v ↔ v)
= Σ_{u∈Dn} Σ_{m=0}^{n−1} (d − 1) d^{n−m−1} p^{2n−m}
= Σ_{u∈Dn} (d − 1)/d · p^{2n} d^{n} Σ_{m=0}^{n−1} (dp)^{−m}
≤ Σ_{u∈Dn} (d − 1)/d · p^{2n} d^{n} · 1/(1 − 1/(dp))
= (d − 1)/d · (dp)^{2n} · 1/(1 − 1/(dp)).

Hence, it follows that

EXn² ≤ (d − 1)/d · (dp)^{2n} · 1/(1 − 1/(dp)) + (dp)^n.

By 3.4.5 (with θ = 0),

P(Xn ≥ 1) = P(Xn > 0) ≥ (EXn)²/EXn² ≥ (dp)^{2n} / ((d − 1)/d · (dp)^{2n}/(1 − 1/(dp)) + (dp)^n) → d(dp − 1)/((d − 1)dp) > 0.

Thus,

P(E) = lim_{n→∞} P(En) = lim_{n→∞} P(Xn ≥ 1) ≥ d(dp − 1)/((d − 1)dp) > 0.
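The quantity Xn is the n-th generation size of a branching process with Binomial(d, p) offspring, which makes the two regimes easy to simulate. A sketch (illustrative, not from the notes; the early-stopping cap is a shortcut assumption):

```python
# Estimate P(E_n) on the d-ary tree in the subcritical (pd < 1) and supercritical (pd > 1) regimes.
import random

def survives_to_level(d, p, n, cap=100):
    """True if some level-n vertex is joined to the root by open edges.
    Once the open cluster in a layer exceeds `cap` vertices we declare survival,
    since extinction after that point is extremely unlikely."""
    alive = 1
    for _ in range(n):
        if alive >= cap:
            return True
        alive = sum(1 for _ in range(alive * d) if random.random() < p)
        if alive == 0:
            return False
    return True

random.seed(3)
d, n, trials = 3, 20, 500
for p in (0.25, 0.5):          # pd = 0.75 (subcritical) vs pd = 1.5 (supercritical)
    freq = sum(survives_to_level(d, p, n) for _ in range(trials)) / trials
    print(f"p = {p}: estimated P(E_n) = {freq:.3f}")
```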
Chapter 4
Convergence
since lim sup_{j→∞}(Xj − X) = inf_{j∈N} sup_{k≥j}(Xk − X) and lim inf_{j→∞}(Xj − X) = sup_{j∈N} inf_{k≥j}(Xk − X) are both measurable functions (prove it!).
Proposition 4.1.2. Suppose (Xj)j≥1 and X are random variables. Then

Xj → X almost surely ⟺ ∀ϵ > 0, P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0.
Proof. Suppose Xj → X almost surely. Let ϵ > 0 and ω ∈ Ω such that Xj(ω) → X(ω) be given. There exists N ∈ N such that if j ≥ N, then |Xj(ω) − X(ω)| < ϵ. Thus, |Xj(ω) − X(ω)| ≥ ϵ only finitely often. Therefore, for a fixed ϵ > 0,

1 = P({Xj → X}) ≤ P(Ω \ lim sup_{j→∞}{|Xj − X| ≥ ϵ}) =⇒ P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0.
Conversely, suppose for any ϵ > 0, P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0. For each n ∈ N, take ϵn = 1/n. Consider

A = ⋃_{n=1}^{∞} lim sup_{j→∞}{|Xj − X| ≥ ϵn}.

Then

P(A) = lim_{n→∞} P(lim sup_{j→∞}{|Xj − X| ≥ ϵn}) = 0

as lim sup_{j→∞}{|Xj − X| ≥ ϵn} ↗ A. Suppose Xj(ω) ̸→ X(ω); then there exists ϵ0 > 0 such that

ω ∈ lim sup_{j→∞}{|Xj − X| ≥ ϵ0} ⊆ A.
Therefore,
P(Xj ̸→ X) ≤ P(A) = 0 =⇒ P(Xj → X) = 1.
Proposition 4.1.3. Suppose (Xj)j≥1 and X are random variables. If Xj → X a.s. (almost surely), then Xj → X in probability.
Proof. Let ϵ > 0 be given. By 4.1.2, if Xj → X a.s., then P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0. Therefore,

lim sup_{j→∞} P(|Xj − X| ≥ ϵ) ≤ P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0.
Proof. Since Xj → X a.s., Xj → X in probability by 4.1.3. Let ϵ > 0 be given. There exists N ∈ N such
that for any j ≥ N ,
P(|Xj − X| ≥ ϵ) < ϵ.
Write E|Xj − X| = E(|Xj − X| 1_{|Xj−X|<ϵ}) + E(|Xj − X| 1_{|Xj−X|≥ϵ}). We have

E(|Xj − X| 1_{|Xj−X|<ϵ}) = Σ_{z<ϵ, z∈im|Xj−X|} z P(|Xj − X| = z) < ϵ

and

E(|Xj − X| 1_{|Xj−X|≥ϵ}) ≤ 2C P(|Xj − X| ≥ ϵ) ≤ 2Cϵ,

as P(|Xj| ≤ C) = P(|X| ≤ C) = 1. Thus, E|Xj − X| < ϵ(1 + 2C), hence |EXj − EX| < ϵ(1 + 2C), for any j ≥ N. Since ϵ is arbitrary, EXj → EX.
4.2 Laws of Large Numbers
Theorem 4.2.1 (Weak Law of Large Numbers). Let (Xj)j≥1 be simple random variables which are independent and identically distributed, that is, P(Xj ∈ B) = P(X1 ∈ B) for all Borel sets B ⊆ R. Then, setting µ = EX1 and Sj = Σ_{k=1}^{j} Xk,

Sj/j → µ in probability.
Proof. Without loss of generality, we may assume that µ = 0. Then for any ϵ > 0, by 3.4.4,

P(|Sj/j| ≥ ϵ) = P(|Sj| ≥ jϵ) = P(|Sj − ESj| ≥ jϵ) ≤ Var(Sj)/(j²ϵ²).

Since Var(Sj) = j Var(X1) by 3.4.3,

P(|Sj| ≥ jϵ) ≤ Var(X1)/(jϵ²) → 0
as j → ∞.
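A quick simulation of the weak law for ±1 coin flips (illustrative, not from the notes):

```python
# Sample means S_j / j of iid +-1 flips concentrate around mu = 0 as j grows.
import random

random.seed(7)
def sample_mean(j):
    return sum(random.choice((-1, 1)) for _ in range(j)) / j

for j in (10, 100, 10_000, 100_000):
    means = [sample_mean(j) for _ in range(20)]
    spread = max(abs(m) for m in means)
    print(j, round(spread, 4))   # worst deviation over 20 runs shrinks with j
```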
Theorem 4.2.2 (Strong Law of Large Numbers). Let (Xj)j≥1 be simple iid (independent and identically distributed) random variables. Then, setting µ = EX1 and Sj = Σ_{k=1}^{j} Xk,

Sj/j → µ almost surely.

Proof. We may assume µ = 0. Let ϵ > 0. By 3.4.4,

P(|Sj| ≥ ϵj) ≤ E|Sj|⁴/(ϵ⁴j⁴).

Note that

E|Sj|⁴ = Σ_{k1,k2,k3,k4=1}^{j} E(Xk1 Xk2 Xk3 Xk4).
If one of the indices k1 , · · · , k4 is distinct from the others, then that term is 0 by 3.2.3. The only terms that
remain are these:
Chapter 5
Suppose we are gambling: at each unit time n, we either gain 1 dollar or lose 1 dollar. Represent this with random variables: for each n, our gain is either Xn = 1 or Xn = −1. We will assume that the gains are independent and identically distributed, with P(Xn = 1) = p, P(Xn = −1) = q = 1 − p. Assume that we begin with a dollars, where a ≥ 0. Our cumulative fortune at time n is a + Sn, where S0 = 0 and Sn = Σ_{j=1}^{n} Xj. For some c ≥ a, we are declared the winner if our winnings reach c before they reach 0, and the loser if they reach 0 first.
Lemma 5.0.1. For random variables Xj defined on some space (Ω, F, P) as above and c ≥ a ≥ 0,
{ω : Sn + a = c before Sn + a = 0} ∈ F.
This is the event that for some n, our fortune has not reached 0 by time n − 1, but reaches c at time n.
Define, for 0 < a < c,

Ea,n = ⋂_{k=1}^{n−1} {0 < Sk + a < c} ∩ {Sn + a = c}

and Ea,0 = ∅, Ec,0 = Ω and E0,n = Ec,n = ∅ when n ≥ 1, as the event that we win at time n. Then define

f(a) = P(⋃_{n≥0} Ea,n) = Σ_{n≥0} P(Ea,n).
f (a) = qf (a − 1) + pf (a + 1)
for a = 1, · · · , c − 1.
Proof. We define

S′n = X2 + · · · + Xn+1

for n ≥ 1 and S′0 = 0, and the corresponding events

E′a,n = ⋂_{k=0}^{n−1} {0 < S′k + a < c} ∩ {S′n + a = c}.
Let a be such that 0 < a < c and note that f(a) = Σ_{n≥0} P(Ea,n). Note that Ea,n ∩ {X1 = 1} is exactly the event {X1 = 1} ∩ E′a+1,n−1. Therefore, for n ≥ 1,

P(Ea,n) = P(E′a+1,n−1, X1 = 1) + P(E′a−1,n−1, X1 = −1).
For any x1 , · · · , xn ,
P(X1 = x1 , · · · , Xn = xn ) = P(X2 = x1 , · · · , Xn+1 = xn ).
Summing this over all x1 , · · · , xn such that
Bn = gn (X1 , · · · , Xn−1 ).
We interpret the 1 as “we will bet” and 0 as “we will not bet”.
Note that for any function gn as above, Bn is measurable relative to X1, · · · , Xn−1 by 3.1.9.
Generally, we define
F0 = {∅, Ω}, Fn = σ(X1 , · · · , Xn ).
Let B1, B2, · · · be any sequence of {0, 1}-valued random variables such that
Nn = k if B1 + · · · + Bk = n and B1 + · · · + Bk−1 = n − 1
and Yn = XNn . To see this definition another way, for each ω, we have assumed that B1 (ω), B2 (ω), · · · and
X1 (ω), X2 (ω), · · · are defined. We simply look at the n-th Bk (ω) which is equal to 1 and set Yn (ω) equal to
the Xk (ω) corresponding to this. Strictly speaking this is only defined when there is such a Bk (ω), so we
must define Yn (ω) = −1 if ω ∈ Ω\{Bn = 1 i.o.}.
Theorem 5.1.1. The Yn are simple i.i.d. random variables with P(Yn = 1) = p and P(Yn = −1) = q.
Proof. Note that
For x1, · · · , xn ∈ {±1}, write pj = p if xj = 1 and pj = q otherwise. We would like to show that
P(Y1 = x1 , · · · , Yn = xn ) = p1 · · · pn .
Since x1, · · · , xn are arbitrary, this will show that the Yj are independent (and in fact i.i.d.). We prove it by induction. For n = 1,

P(Y1 = x1) = Σ_{k≥1} P(N1 = k, Xk = x1).
Recall that {N1 = k} ∈ Fk−1, which is independent of σ(Xk). So we can split into a product:

Σ_{k≥1} P(N1 = k) P(Xk = x1) = p1 Σ_{k≥1} P(N1 = k) = p1 P(N1 ≥ 1) = p1.
For n ≥ 2, write

P(Y1 = x1, · · · , Yn = xn) = Σ_{k1<···<kn} P(Xk1 = x1, · · · , Xkn = xn, N1 = k1, · · · , Nn = kn).
By induction, we get p1 · · · pn .
Wn = fn (X1 , · · · , Xn−1 ) ≥ 0.
Fn = Fn−1 + Wn Xn ,
A gambling policy comes with a stopping time. We will assume that we have a rule to determine when we would like to stop betting. This rule will produce a time τ after which we will not bet. If τ = n, this represents our decision to stop playing at time n, so this should depend only on X1, · · · , Xn.
Definition 5.2.1. A stopping time is a function τ : Ω → {0, 1, · · · } ∪ {∞} such that

{τ = n} ∈ Fn.
With the stopping time, along with our betting (Wn)n≥1, we can define our fortune at time n as

F∗n = Fn if n ≤ τ,   F∗n = Fτ if n ≥ τ.
One way to view this gambling policy is that the stopping time forces our wagers to be 0. In other words,
we can define a new wager Wn∗ by
Wn∗ = Wn 1n≤τ .
This is still measurable relative to Fn−1, since {n ≤ τ} = Ω \ ⋃_{k=0}^{n−1} {τ = k} ∈ Fn−1. Therefore, we can recast this in the previous language:

F∗n = F∗n−1 + W∗n Xn.
So again, even with a stopping time in our betting strategy, we still cannot make a fair game be advantageous
to us.
Theorem 5.2.2. With any uniformly bounded gambling policy, that is, |F∗n| ≤ M with probability 1 for some constant M, and an almost surely finite stopping time, our final fortune Fτ satisfies

EFτ = F0 if p = q = 1/2,   EFτ ≤ F0 if p ≤ q,   EFτ ≥ F0 if p ≥ q.
Proof. Note that Fn∗ → Fτ almost surely. We need to assume here that Fτ is simple, although it is not neces-
sary, as we will see later in integration theory. Under this assumption, we can use the bounded convergence
theorem (4.1.5) to get
EFn∗ → EFτ
as n → ∞.
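The conclusion EFτ = F0 in the fair case can be checked by simulation for the simplest bounded policy, namely betting one dollar until the fortune hits 0 or c (a sketch, not from the notes; this specific policy is an assumption of the example):

```python
# Gambler's ruin: with p = q = 1/2 the mean final fortune equals the starting fortune.
import random

random.seed(11)
def final_fortune(a, c, p):
    """Play +-1 bets starting from a until the fortune reaches 0 or c."""
    fortune = a
    while 0 < fortune < c:
        fortune += 1 if random.random() < p else -1
    return fortune

a, c, trials = 3, 10, 20_000
for p in (0.5, 0.45, 0.55):
    avg = sum(final_fortune(a, c, p) for _ in range(trials)) / trials
    print(f"p = {p}: E F_tau ~ {avg:.3f} (F_0 = {a})")
    # equals a for p = 0.5, falls below a for p < 0.5, exceeds a for p > 0.5
```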
Chapter 6
Measure Theory
Proposition 6.0.1. Suppose G ⊆ Rd is open. Then G is a countable disjoint union of half-open cubes.
Proof. Consider the half-open dyadic cubes

(i1/2^n, (i1+1)/2^n] × · · · × (id/2^n, (id+1)/2^n],

where n ∈ N, i1, · · · , id ∈ Z. For each n, the half-open dyadic cubes of side length 2^{−n} are pairwise disjoint and cover Rd. Also observe that each dyadic cube of side length 2^{−n} is contained in exactly one "parent" cube of side length 2^{−n+1}. Hence, given any two half-open dyadic cubes, either they are disjoint, or one of them is contained in the other (this is called the dyadic nesting property).
Suppose x ∈ (a, b) ⊆ R with a < b. Let n ∈ N be such that 1/2^n < min{x − a, b − x}. Take j ∈ Z such that x ∈ (j/2^n, (j+1)/2^n]. Then x − j/2^n ≤ 1/2^n < x − a =⇒ a < j/2^n, and similarly (j+1)/2^n − x < 1/2^n < b − x =⇒ (j+1)/2^n < b. Hence, x ∈ (j/2^n, (j+1)/2^n] ⊆ (a, b).
Hence, for each x ∈ G, there exists a half-open dyadic cube Qx such that x ∈ Qx ⊆ G. Then

⋃_{x∈G} Qx = G.

Let Q be the collection of "maximal" half-open dyadic cubes Qx with respect to set inclusion. Then, by the dyadic nesting property, the maximal cubes in Q are pairwise disjoint. Thus, G = ⋃_{Q∈Q} Q and, since Q is at most countable, we are done.
We can even choose the above rectangles to have rational endpoints, so that B is countably generated.
Definition 6.0.2. If (Ω, F) is a measurable space, then a function µ : F → [0, ∞] is a measure if
(1) µ(∅) = 0;
(2) µ(⊔_{j≥1} Aj) = Σ_{j≥1} µ(Aj) for any pairwise disjoint (Aj)j≥1 ⊆ F.
µ is finite if µ(Ω) < ∞; µ is σ-finite if there is an F-sequence (Aj)j≥1 such that Ω = ⋃_{j≥1} Aj and µ(Aj) < ∞ for each j.
Note that we can make (Aj)j≥1 increasing by setting Bj := ⋃_{k=1}^{j} Ak, or pairwise disjoint by the canonical decomposition. The tuple (Ω, F, µ) is a measure space.
Proposition 6.0.3. If (Ω, F, µ) is a measure space, then
(1) if A, B ∈ F with A ⊆ B, then µ(A) ≤ µ(B);
(2) if (Aj )j≥1 ⊆ F with Aj ↗ A, then µ(Aj ) → µ(A) as j → ∞;
(3) if (Aj )j≥1 ⊆ F with Aj ↘ A and µ(A1 ) < ∞, then µ(Aj ) → µ(A) as j → ∞;
(4) if (Aj)j≥1 ⊆ F, then µ(⋃_{j≥1} Aj) ≤ Σ_{j≥1} µ(Aj).
Theorem 6.0.4. Suppose (Ω, F) is a measurable space and µ, ν are two measures on F. If F = σ(F0 ) for
some algebra F0 , µ|F0 = ν|F0 , and µ, ν are σ-finite on (Ω, F0 ), then µ = ν.
Proof. Write Ω = ⋃_{j≥1} Bj with (Bj)j≥1 ⊆ F0, µ(Bj) < ∞ and Bj ↗ Ω. Define
Then for any ⊔_{j=1}^{N} Rj ∈ F0 with pairwise disjoint half-open rectangles Rj, we define

λn(⊔_{j=1}^{N} Rj) = Σ_{j=1}^{N} λn(Rj).

Note that λn is a measure on (Rn, F0) and λn is σ-finite on (Rn, F0). Thus, there exists a unique extension, namely the Carathéodory extension, λn on σ(F0) = BRn.
Remark. Since x 7→ x + α is a homeomorphism for all α ∈ Rn, BRn is translation invariant. Moreover, if we define λ_n^α(A) = λn(A + α) on BRn, then λ_n^α|F0 = λn|F0. Since λ_n^α, λn are σ-finite on F0, by 6.0.4, λ_n^α = λn, and therefore λn is translation invariant.
Proposition 6.1.1. The Lebesgue measure λn on (Rn, BRn) is unique in the sense that if ν is another translation-invariant measure on BRn that is finite on bounded sets, then ν = αλn for some α ≥ 0.
Proof. Let α = ν((0, 1]n). Then by translation invariance, ν and αλn agree on {(p, q]n : p < q, p, q ∈ Q}, which generates BRn. Then, since ν and αλn are both σ-finite, by 6.0.4, ν = αλn on BRn.
Corollary 6.1.1.1. Suppose T : Rn → Rn is a linear isomorphism. Then

λn(T A) = |det T| λn(A)

for all A ∈ BRn.
Proof. Define ν(A) = λn(T A) for all A ∈ BRn. Since T is invertible, T is a finite product of elementary matrices. Hence, we only need to prove the case when T is an elementary matrix.
Since ν is translation invariant on BRn (prove it by considering each type of elementary matrix separately) and finite on bounded sets, we have ν = αλn. Since ν((0, 1]n) = |det T|, α = |det T| and we are done.
Definition 6.1.2. Let (X, T ) be a topological space and (X, F, µ) be a measure space with F ⊇ T . We say
that µ is inner regular if
µ(A) = sup{µ(K) : K ⊆ A, K compact}
and outer regular if
µ(A) = inf{µ(G) : A ⊆ G, G open}.
µ is regular if µ is both inner and outer regular.
Proposition 6.1.3. λn is a regular measure.
Proof. Let A ∈ BRn be given. If λn(A) < ∞, then by the definition of λn, for any ϵ > 0, there exists an open set G ⊇ A (a countable union of open rectangles) such that λn(G) ≤ λn(A) + ϵ. Then we are done. For the general case, write A = ⋃_{k≥1} Ak with λn(Ak) < ∞ and the Ak pairwise disjoint, as λn is σ-finite. Then for any k ∈ N, there exists an open set Gk ⊇ Ak such that λn(Gk) ≤ λn(Ak) + ϵ/2^k. Thus, ⋃_{k≥1} Gk ⊇ A and λn(⋃_{k≥1} Gk) ≤ Σ_{k≥1} λn(Gk) ≤ Σ_{k≥1} λn(Ak) + ϵ = λn(A) + ϵ. Hence, λn is outer regular.
Let

C = {A ∈ BRn : ∀ϵ > 0, ∃ closed F ⊆ A, λn(A\F) < ϵ}.

Then C contains every closed set and C is closed under countable intersections by the ϵ/2^k-trick. Let A ∈ C. If λn(A) < ∞, for any ϵ > 0, there exist an open set G and a closed set F such that G ⊇ A ⊇ F and λn(G\F) < ϵ. Hence, λn((Rn\A)\(Rn\G)) = λn(G\A) ≤ λn(G\F) < ϵ, showing that Rn\A ∈ C. Therefore, C = BRn.
Finally, suppose A ∈ BRn. Let Ak = A ∩ (−k, k)^n and let ϵ > 0 be given. Then there exists a closed (hence compact) set Fk ⊆ Ak such that λn(Ak\Fk) < ϵ/2^{k+1}. Suppose λn(A) < ∞. Then

λn(A \ ⋃_{k≥1} Fk) ≤ Σ_{k≥1} λn(Ak\Fk) ≤ ϵ/2.

Since A \ ⋃_{k=1}^{N} Fk ↘ A \ ⋃_{k≥1} Fk as N → ∞, for N large enough,

λn(A \ ⋃_{k=1}^{N} Fk) < ϵ ⟺ λn(A) < λn(⋃_{k=1}^{N} Fk) + ϵ.
6.2 Distribution Functions
Definition 6.2.1. If µ is a Borel measure on R, the distribution function F for µ is defined by
It is clear that F is nondecreasing. If µ is finite, then µ((a, b]) = F (b) − F (a) and F is right-continuous.
Theorem 6.2.2. For each F : R → R that is nondecreasing and right-continuous, there exists exactly one Borel measure µ on R such that µ((a, b]) = F(b) − F(a) for all a ≤ b.
Proof. (1): Trivial. (2): Define F : Ω → [−∞, ∞]³ by F(ω) = (lim inf_j fj(ω), lim sup_j fj(ω), f(ω)). Then

{ω : fj(ω) → f(ω)} = F−1({(x, y, z) ∈ [−∞, ∞]³ : x = y = z}) ∈ F.
Theorem 6.3.3. Let (Ω, F) be a measurable space. Suppose f : Ω → R∗ is a measurable function. Then there exists a sequence of measurable simple functions (fk)k≥1 with fk : Ω → R such that |fk| ≤ |fk+1| and fk → f pointwise. Furthermore, if f is bounded, then fk → f uniformly.
Proof. We consider the dyadic partition of R: let

fk(x) = m/2^k,       if 0 ≤ f(x) ≤ k and m ∈ Z is such that f(x) ∈ [m/2^k, (m+1)/2^k);
fk(x) = (m+1)/2^k,   if −k ≤ f(x) < 0 and m ∈ Z is such that f(x) ∈ [m/2^k, (m+1)/2^k);
fk(x) = k,           if f(x) > k;
fk(x) = −k,          if f(x) < −k.
Definition 6.3.4. Suppose (Ω1, F1) and (Ω2, F2) are measurable spaces. If µ is a measure on F1 and f : Ω1 → Ω2 is a measurable function, we define the push-forward measure µf−1 on F2 by

(µf−1)(A) = µ(f−1(A)), A ∈ F2.
Then there exists a random variable X on some probability space (Ω, F, P) such that P(X ≤ x) = F (x).
Proof. Consider (Ω, F, P) = ((0, 1), B(0,1) , λ). If F is strictly increasing, we may define X = F −1 : (0, 1) → R
and therefore
P(X ≤ x) = P({ω ∈ (0, 1) : ω ≤ F (x)}) = P((0, F (x)]) = F (x).
For general F, we may define the "right continuous inverse" F−1 : (0, 1) → R by

F−1(ω) = inf{x ∈ R : ω ≤ F(x)}.

It is well-defined since for any ω ∈ (0, 1), {x ∈ R : ω ≤ F(x)} is nonempty because lim_{x→∞} F(x) = 1, and if inf{x ∈ R : ω ≤ F(x)} = −∞, then ω ≤ F(xn) for some xn → −∞, which implies that ω ≤ 0 since lim_{x→−∞} F(x) = 0, a contradiction.
Moreover, F−1(ω) ≤ x if and only if ω ≤ F(x), for all x ∈ R. Indeed, if ω ≤ F(x), then F−1(ω) = inf{x′ ∈ R : ω ≤ F(x′)} ≤ x. On the other hand, if F−1(ω) ≤ x, then for any n ∈ N, there exists yn ∈ R such that F(yn) ≥ ω and yn < x + 1/n. Since F is nondecreasing, ω ≤ F(yn) ≤ F(x + 1/n). Taking n → ∞, ω ≤ F(x) as F is right continuous. Hence, if we define X = F−1 : (0, 1) → R, then

P(X ≤ x) = P({ω ∈ (0, 1) : ω ≤ F(x)}) = F(x),
as required.
Remark. The “right continuous inverse” is called the quantile of F . In the case when F is strictly increasing,
F −1 coincides with the usual inverse.
Sometimes the quantile function is defined on [0, 1]. In this case, it takes values in R∗ . But the crucial
inequality F −1 (ω) ≤ x ⇐⇒ ω ≤ F (x) still holds.
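The quantile construction is exactly inverse transform sampling. A sketch (illustrative, not from the notes) for the exponential distribution, where F−1 has a closed form:

```python
# Inverse transform sampling: X = F^{-1}(U) with U uniform on (0, 1).
import math
import random

alpha = 2.0
F = lambda x: 1 - math.exp(-alpha * x) if x > 0 else 0.0   # exponential distribution function
quantile = lambda w: -math.log(1 - w) / alpha              # F^{-1} on (0, 1)

random.seed(5)
samples = [quantile(random.random()) for _ in range(200_000)]

# Empirical check that P(X <= x) = F(x) at a few points.
for x in (0.1, 0.5, 1.0, 2.0):
    emp = sum(s <= x for s in samples) / len(samples)
    print(x, round(emp, 3), round(F(x), 3))
```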
Example. Suppose Ω = {0, 1, · · · , n}, F = P(Ω). For p ∈ [0, 1], the Binomial random variable X : Ω → R with parameter p is a random variable defined on a probability space (Ω, F, P) with P({j}) = (n choose j) p^j (1 − p)^{n−j} such that X(j) = j.
Example. Suppose Ω = Z≥0, F = P(Ω). For λ > 0, the Poisson random variable X : Ω → R with parameter λ is a random variable defined on a probability space (Ω, F, P) with P({j}) = e^{−λ} λ^j / j!.
Note that the probability space is not important: if X : Ω → R is a random variable, then we can define X∗ : [0, 1] → R∗ by X∗(ω) = F_X^{−1}(U(ω)), where U : [0, 1] → R is the standard uniform random variable defined on ([0, 1], B[0,1], λ) with U(x) = x, FX is the distribution function of X, and F_X^{−1} is the quantile of FX. Hence,

P(X∗ ≤ x) = P(U ≤ FX(x)) = FX(x) = P(X ≤ x),

as P(U ≤ y) = λ([0, y]) = y for all y ∈ [0, 1].
Hence, we only care about the distribution, and we say that two random variables are equivalent if their
distribution functions are the same, or equivalently, their push forward measures are the same.
Example (exponential distribution). Let X be a random variable which is supposed to represent a waiting time for an event (e.g. a phone call). We assume that the waiting is memoryless, that is, for any x, y ≥ 0,

P(X > x + y | X > y) = P(X > x).

Also, we assume that P(X ≥ 0) = 1 (the waiting time is nonnegative). Then we have

P(X > x + y)/P(X > y) = P(X > x) ⟺ (1 − FX(x + y))/(1 − FX(y)) = 1 − FX(x).

Let G(x) = 1 − FX(x). Then G(x + y) = G(x)G(y) =⇒ G(x) = e^{αx} for some α ∈ R. Hence, FX(x) = 1 − e^{αx} for some α ∈ R, for all x > 0. Since FX(x) → 1 as x → ∞, α < 0, and we may replace α by −α with α > 0 (the case α = 0 is meaningless). Then

F(x) = 1 − e^{−αx} if x > 0;   F(x) = 0 if x ≤ 0.
This is also known as the exponential distribution (with parameter α > 0).
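The memoryless property can be checked empirically (a sketch, not from the notes; the parameter and the points x, y are arbitrary choices):

```python
# Empirical check of P(X > x + y | X > y) = P(X > x) for the exponential distribution.
import math
import random

alpha = 1.5
random.seed(9)
samples = [-math.log(1 - random.random()) / alpha for _ in range(500_000)]

x, y = 0.7, 1.2
tail = lambda t: sum(s > t for s in samples) / len(samples)
conditional = tail(x + y) / tail(y)       # empirical P(X > x + y | X > y)
print(round(conditional, 3), round(tail(x), 3), round(math.exp(-alpha * x), 3))
# all three numbers agree up to sampling error
```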
Chapter 7
Integration
Definition 7.0.3. Let (Ω, F, µ) be a measure space. Let f : Ω → [−∞, ∞] be measurable. Then we define

∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ,

where f^+ := max(f, 0) and f^− := max(−f, 0), as long as one of ∫ f^+ dµ, ∫ f^− dµ is finite. If both ∫ f^+ dµ and ∫ f^− dµ are finite, then we say that f is integrable.
(3) (Fatou's lemma) If (fj)j≥1 is a sequence of nonnegative measurable functions, then

∫ lim inf_{j→∞} fj dµ ≤ lim inf_{j→∞} ∫ fj dµ.

(4) If a, b ≥ 0, then

∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ.
where the limit exists since (∫ fj dµ)_{j≥1} is an increasing sequence. On the other hand, take a simple function 0 ≤ s ≤ f. For α ∈ (0, 1), we define

Ek = {fk ≥ αs}.

Since fk ↗ f, Ek ↗ Ω. Write s = Σ_{j=1}^{n} xj 1Aj with xj ≥ 0, µ(Aj) < ∞ and (Aj)_{j=1}^{n} disjoint (standard representation). Since

fk ≥ αs 1Ek,

by (1), it follows that

∫ fk dµ ≥ α Σ_{j=1}^{n} xj µ(Aj ∩ Ek) =⇒ lim_{k→∞} ∫ fk dµ ≥ α Σ_{j=1}^{n} xj µ(Aj).

Since α is arbitrary,

lim_{k→∞} ∫ fk dµ ≥ Σ_{j=1}^{n} xj µ(Aj) = ∫ s dµ.
We say that a property P holds µ-almost everywhere (µ-a.e.) if {ω ∈ Ω : P does not hold for ω} ∈ F
and µ({ω ∈ Ω : P does not hold for ω}) = 0.
Remark. Note that if f is integrable, then ∫ |f| dµ = ∫ f^+ dµ + ∫ f^− dµ < ∞. Conversely, if ∫ |f| dµ < ∞, then f is integrable.
Proposition 7.0.5. Let f, g be nonnegative and measurable.
(1) ∫ f dµ = 0 if and only if f = 0 µ-a.e.
(2) If ∫ f dµ < ∞, then f < ∞ µ-a.e.
(3) If f ≤ g µ-a.e., then ∫ f dµ ≤ ∫ g dµ.
(4) If f = g µ-a.e., then ∫ f dµ = ∫ g dµ.
Proof. (1): Suppose f = 0 µ-a.e. Let Z = {f > 0}. Then for any simple function s = Σ_{j=1}^{n} xj 1Aj with µ(Aj) < ∞ and xj > 0 such that s ≤ f, we have µ(Aj) = 0, since otherwise {f > 0} ⊇ {s > 0} ⊇ Aj with µ(Aj) > 0, a contradiction. Hence, ∫ s dµ = 0. Since s is arbitrary, ∫ f dµ = 0. Conversely, suppose ∫ f dµ = 0 but µ({f > 0}) > 0. Since {f > 0} = ⋃_{j≥1}{f > j^{−1}}, there exists j ≥ 1 such that µ({f > j^{−1}}) > 0. Hence, s = (1/j) 1_{{f>j^{−1}}} ≤ f and

∫ f dµ ≥ (1/j) µ({f > j^{−1}}) > 0,

a contradiction.
(2): Suppose µ({f = ∞}) > 0. Consider the simple functions sn = n 1_{{f=∞}}. Then f ≥ sn, so ∫ f dµ ≥ ∫ sn dµ = n µ({f = ∞}) → ∞ as n → ∞, a contradiction.
(3): Let Z = {f > g}. Then µ(Z) = 0. Let s = Σ_{j=1}^{n} xj 1Aj ≤ f be a simple function. Then s⋆ := Σ_{j=1}^{n} xj 1_{Aj\Z} ≤ g and

Σ_{j=1}^{n} xj µ(Aj) = Σ_{j=1}^{n} xj µ(Aj\Z) ≤ ∫ g dµ.
Proof. (1): Since f ≤ g µ-a.e., f^+ ≤ g^+ µ-a.e. and f^− ≥ g^− µ-a.e. Then the result follows by 7.0.5.
(2): Since

∫ |af + bg| dµ ≤ |a| ∫ |f| dµ + |b| ∫ |g| dµ < ∞,

af + bg is integrable. Then by 7.0.4,

∫ (f + g)^+ dµ + ∫ f^− dµ + ∫ g^− dµ = ∫ (f + g)^− dµ + ∫ f^+ dµ + ∫ g^+ dµ.

If c < 0, then (cf)^+ = max{cf, 0} = −c max{−f, 0} = −cf^− and (cf)^− = max{−cf, 0} = −c max{f, 0} = −cf^+. Hence,

∫ cf dµ = −c (∫ f^− dµ − ∫ f^+ dµ) = c ∫ f dµ.

(3):

|∫ f dµ| = |∫ f^+ dµ − ∫ f^− dµ| ≤ ∫ f^+ dµ + ∫ f^− dµ = ∫ |f| dµ.
Theorem 7.0.7 (Dominated Convergence Theorem). Suppose (fj)j≥1 and f, g are measurable and fj → f µ-a.e. If |fj| ≤ g and g is integrable, then fj and f are integrable and

∫ fj dµ → ∫ f dµ.
Proof. Since |fj| ≤ g a.e., fj is integrable. Moreover, as fj → f a.e., |f| ≤ g a.e. and f is integrable. Then we apply Fatou's lemma to g + fj and g − fj:

∫ lim inf_{j→∞} (g + fj) dµ ≤ lim inf_{j→∞} ∫ (g + fj) dµ,   ∫ lim inf_{j→∞} (g − fj) dµ ≤ lim inf_{j→∞} ∫ (g − fj) dµ.

Then

∫ (f + g) dµ = ∫ f dµ + ∫ g dµ ≤ ∫ g dµ + lim inf_{j→∞} ∫ fj dµ =⇒ ∫ f dµ ≤ lim inf_{j→∞} ∫ fj dµ

and

∫ (g − f) dµ ≤ ∫ g dµ − lim sup_{j→∞} ∫ fj dµ =⇒ ∫ f dµ ≥ lim sup_{j→∞} ∫ fj dµ.

Hence,

∫ fj dµ → ∫ f dµ.
Proposition 7.1.2. Suppose that f, g : Ω → [0, ∞] are nonnegative and measurable functions. If

∫_A f dµ = ∫_A g dµ
Proof. To get a contradiction, we may assume that µ({f > g}) > 0. Set Aj = {f > g + 1/j} ∩ {g ≤ j}. Then

0 < µ({f > g}) ≤ Σ_{j≥1} µ(Aj).

Then there exists some n such that µ(An) > 0. Since µ is σ-finite, there exists (Bj)j≥1 ⊆ F such that µ(Bj) < ∞ and Ω = ⋃_{j≥1} Bj. Let Cj = Bj ∩ An. Then

0 < (1/n) µ(An) = ∫ (1/n) 1_{An} dµ ≤ ∫ (f − g) 1_{An} dµ ≤ Σ_{j≥1} ∫_{Cj} (f − g) dµ,
a contradiction.
Theorem 7.1.3. If g is nonnegative and measurable, then

∫ g dν = ∫ gf dµ,

where f is the density function of ν relative to µ. For measurable g, g is integrable relative to ν if and only if gf is integrable relative to µ. In this case, the above formula holds.
Proof. We use the standard limiting argument: simple functions → nonnegative measurable function →
integrable functions.
If g = 1A for some A ∈ F, then

∫ g dν = ν(A) = ∫ 1A f dµ = ∫ gf dµ.
By linearity, the formula holds for simple functions. For a nonnegative measurable function g, there exist simple functions ψj ↗ g, and the formula holds by the MCT.
Suppose g is measurable. The formula holds for |g|, and thus gf is µ-integrable ⟺ g is ν-integrable. In this case, write g = g^+ − g^− and use linearity to conclude.
Theorem 7.1.4. Suppose T : Ω1 → Ω2 and f : Ω2 → [0, ∞] are measurable. Then

∫ f ◦ T dµ = ∫ f d(µT−1).
Sx (E) := {y ∈ Ω2 : (x, y) ∈ E} ∈ F2 .
Hence, Tx is measurable and therefore Sx (E) = Tx−1 (E) is measurable. Since fx = f ◦Tx , fx is measurable.
Definition 7.2.3. The product measure µ1 ⊗ µ2 on F1 ⊗ F2 is defined, for E ∈ F1 ⊗ F2, by

(µ1 ⊗ µ2)(E) = ∫ µ2(Sx(E)) dµ1(x).
For the above definition to make sense, we need to know that x 7→ µ2 (Sx (E)) is an F1 -measurable
function.
If E = A × B is a measurable rectangle, then

µ2(Sx(E)) = µ2(B) if x ∈ A, and µ2(Sx(E)) = 0 if x ∉ A; that is, µ2(Sx(E)) = µ2(B) 1A(x).
Then C is a λ-system containing all measurable rectangles, which form a π-system. Hence, by 1.3.9, C = F1 ⊗ F2. Therefore, x 7→ µ2(Sx(E)) is measurable for every E ∈ F1 ⊗ F2.
Also, for pairwise disjoint (Ej)j≥1 ⊆ F1 ⊗ F2, since Sx(Ej) = Tx−1(Ej), (Sx(Ej))j≥1 is pairwise disjoint and

(µ1 ⊗ µ2)(⊔_{j≥1} Ej) = ∫ µ2(Sx(⊔_{j≥1} Ej)) dµ1(x) = ∫ Σ_{j≥1} µ2(Sx(Ej)) dµ1(x) = Σ_{j≥1} (µ1 ⊗ µ2)(Ej).
Also,
Theorem 7.2.6 (Fubini's theorem). Suppose f : Ω1 × Ω2 → [0, ∞] is measurable and µ1, µ2 are σ-finite. If f is integrable, then

∫ f d(µ1 ⊗ µ2) = ∫ (∫ f(x, y) dµ1(x)) dµ2(y) = ∫ (∫ f(x, y) dµ2(y)) dµ1(x).
Proof. (1): Trivial. (2): (⇐) Trivial. (⇒) First assume that Y is simple; then we can write Y = Σ_{j=1}^{n} xj 1Bj with Bj ∈ σ(X) pairwise disjoint. Then by (1), there exist Borel Aj ⊆ Rk such that X−1(Aj) = Bj. Then we define f = Σ_{j=1}^{n} xj 1Aj, and therefore f is Borel measurable. Since

f ◦ X(ω) = Σ_{j=1}^{n} xj 1Aj(X(ω)) = Σ_{j=1}^{n} xj 1Bj(ω) = Y(ω),

we get Y = f ◦ X.
For general Y, we may find simple and measurable functions ψj → Y. For each j, we can find Borel fj : Rk → R such that ψj = fj ◦ X. Define
7.4 Independence
Proposition 7.4.1. Suppose X1, · · · , Xk : Ω → R are random variables. Then X1, · · · , Xk are independent if and only if the law of the random vector X = (X1, · · · , Xk) is the product measure µ = µ1 ⊗ · · · ⊗ µk, where µj is the law of Xj.
Proof. Suppose X1, · · · , Xk are independent. Then for any measurable rectangle A1 × · · · × Ak ⊆ Rk,

PX−1(A1 × · · · × Ak) = P(X ∈ A1 × · · · × Ak) = P(X1 ∈ A1, · · · , Xk ∈ Ak) = ∏_{j=1}^{k} P(Xj ∈ Aj) = ∏_{j=1}^{k} µj(Aj).
So the law of X agrees with µ on all measurable rectangles, and they are equal by 6.0.4.
Suppose the law of X is µ. Then
7.5 Convolution
Definition 7.5.1. If µ1, µ2 are two probability measures on BR, we define the convolution µ1 ∗ µ2 : BR → [0, ∞] by

(µ1 ∗ µ2)(B) = ∫ µ2(B − x) dµ1(x),
Proof of claim. By 6.3.5, there exist random variables X, Y with distributions µ1, µ2, respectively. Now on (R², BR², µ1 ⊗ µ2), we define g : (x, y) 7→ x + y. It is clear that g is measurable, and this defines a random variable X + Y with distribution µ1 ∗ µ2. Hence, µ1 ∗ µ2 is a probability measure.
Example. If X1, X2 are independent and have exponential distributions with the same parameter α, then their density functions are αe^{−αx} 1_{x≥0}. Hence, the density function of X1 + X2 is

h(x) = ∫ α² e^{−αy} e^{−α(x−y)} 1_{y≥0} 1_{x−y≥0} dy = α² e^{−αx} 1_{x≥0} ∫_0^x dy = α² x e^{−αx} 1_{x≥0}.

By induction, one can show that the density for X1 + · · · + Xk with Xj ∼ Exp(α) is

αe^{−αx} (αx)^{k−1}/(k − 1)! · 1_{x≥0}.
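The k-fold convolution formula can be compared against a histogram of simulated sums (a sketch, not from the notes; the parameters are arbitrary choices):

```python
# Sum of k iid Exp(alpha) variables vs the Gamma density alpha e^{-alpha x}(alpha x)^{k-1}/(k-1)!.
import math
import random

alpha, k = 2.0, 3
random.seed(13)
sums = [sum(-math.log(1 - random.random()) / alpha for _ in range(k))
        for _ in range(400_000)]

def gamma_density(x):
    return alpha * math.exp(-alpha * x) * (alpha * x) ** (k - 1) / math.factorial(k - 1)

# Compare a histogram estimate of the density with the formula on a few bins.
width = 0.25
for left in (0.5, 1.0, 1.5, 2.5):
    emp = sum(left <= s < left + width for s in sums) / (len(sums) * width)
    print(left, round(emp, 3), round(gamma_density(left + width / 2), 3))
```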
Then Yi is Fi -measurable. So (Yi )i≥1 is independent by 3.2.2. Let x ∈ [0, 1) be given. We define
Y_i^{(n)} = Σ_{j=1}^{n} X_{ij}/2^j.
Since x is arbitrary, Yi has distribution U ((0, 1]). Hence, there exists a sequence of iid uniform random
variables (Yi )i≥1 .
For µi , let Fi be its distribution function. Define Zi = Fi−1 ◦ Yi , where Fi−1 is the quantile of Fi . Then
Chapter 8
Expected Value
Definition 8.1.2. For k ≥ 0, the k-th moment of X is EX^k and the k-th absolute moment of X is E|X|^k (as long as they exist). We say that X has a k-th moment if E|X|^k < ∞. If X has a k-th moment, we can define the k-th central moment by E|X − EX|^k.
Note that 3.4.4 (and thus 3.4.5) holds for general random variables. Hence, if X has a j-th moment, then for k < j, let r = j/k and let r′ be its conjugate exponent. Then

E|X|^k ≤ ∥|X|^k∥_r = (E|X|^j)^{k/j} < ∞,
Theorem 8.2.2. Let X be a random variable and µ = PX−1. Then for any x1 < x2, we have

µ((x1, x2)) + (1/2) µ({x1}) + (1/2) µ({x2}) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−itx1} − e^{−itx2})/(it)) fX(t) dt.
Proof. Fix x1 < x2. Then for each T > 0, we have

∫_{−T}^{T} ((e^{−itx1} − e^{−itx2})/(it)) fX(t) dt = ∫_{−T}^{T} ((e^{−itx1} − e^{−itx2})/(it)) (∫_{−∞}^{∞} e^{itx} dµ(x)) dt = ∫_{−∞}^{∞} ∫_{−T}^{T} ((e^{−it(x1−x)} − e^{−it(x2−x)})/(it)) dt dµ(x).

Here we have used Fubini's theorem to change the order of integration. This is legal since

|((e^{−itx1} − e^{−itx2})/(it)) e^{itx}| ≤ |x2 − x1|

is integrable on [−T, T] × R (see the remark below).
Define

IT(x; x1, x2) = ∫_{−T}^{T} ((e^{−it(x1−x)} − e^{−it(x2−x)})/(it)) dt = 2 (∫_0^T (sin(t(x2 − x))/t) dt − ∫_0^T (sin(t(x1 − x))/t) dt).

Hence, as T → ∞, we have

lim_{T→∞} IT(x; x1, x2) = π(sgn(x2 − x) − sgn(x1 − x)) = π(sgn(x2 − x) + sgn(x − x1)).
Hence, we are done if we can take T → ∞ under the integral sign, which is valid since

0 ≤ ∫_0^y (sin u/u) du ≤ ∫_0^π (sin u/u) du

for all y ≥ 0 (in this case, |IT(x; x1, x2)| ≤ 4 ∫_0^π (sin u/u) du < ∞ and we can apply the dominated convergence theorem).
Let y ≥ 0. The case y ∈ [0, π] is trivial. If y ∈ [(2k − 1)π, (2k + 1)π] for some k ≥ 1, we have

∫_0^y (sin u/u) du ≥ ∫_0^{2kπ} (sin u/u) du = Σ_{ℓ=1}^{k} ∫_{2(ℓ−1)π}^{2ℓπ} (sin u/u) du = Σ_{ℓ=1}^{k} ∫_{(2ℓ−2)π}^{(2ℓ−1)π} sin u (1/u − 1/(u + π)) du ≥ 0.
Therefore,

|(e^{−ix1t} − e^{−ix2t})/(it)| ≤ |x2t − x1t|/|t| = |x2 − x1|.
Corollary 8.2.2.1. Let X and Y be two random variables. If they have the same characteristic function, then X ∼ Y.
Proof. Let µ1 = PX−1 and µ2 = PY−1. Let Dj = {x ∈ R : µj({x}) > 0} and D = D1 ∪ D2. Since X and Y have the same characteristic function, we have µ1((x1, x2)) = µ2((x1, x2)) for all x1 < x2 in R\D.
On the other hand, D1, D2 are both countable as µj(R) = 1. Hence, R\D is dense in R. Hence, µ1((a, b]) = µ2((a, b]) for all a < b. Therefore, by 6.0.4, µ1 = µ2 on BR and therefore X ∼ Y.
8.3 Moment Generating Function
Definition 8.3.1. For a random variable X, define

MX(t) = Ee^{tX}

as the moment generating function. Note that MX(t) can be ∞ for some t.
Property. The set of values of t such that MX(t) < ∞ is an interval containing 0 (it may be just {0}).
Proof. Suppose that MX(t) < ∞ for some t > 0. We claim that for any 0 ≤ s ≤ t, MX(s) < ∞.
Proof of claim. Note that e^{sX} = e^{sX} 1_{X≥0} + e^{sX} 1_{X<0}. Since

E(e^{sX} 1_{X≥0}) ≤ E(e^{tX} 1_{X≥0}) ≤ Ee^{tX} = MX(t) < ∞,   E(e^{sX} 1_{X<0}) ≤ E1 = 1 < ∞,

we are done.
for all t ∈ (−t0, t0). Therefore, EX^n = M_X^{(n)}(0) for all n ≥ 0.
Proof. We first show that X has all moments. Since et|X| ≤ etX + e−tX , we know that
ΦX (z) := EezX
SI := {z = t + is : t ∈ I, s ∈ R}
Proof. Since E|e^{zX}| = E(e^{tX} |e^{isX}|) = Ee^{tX} = MX(t) < ∞ for all z = t + is ∈ SI, ΦX(z) is well-defined on SI.
Let z0 = t0 + is0 ∈ SI be given. For z = t + is ∈ SI, note that

(ΦX(z) − ΦX(z0))/(z − z0) − E(Xe^{z0X}) = E(e^{z0X} (e^{(z−z0)X} − 1 − (z − z0)X)/(z − z0)).

When |z − z0| < η, we have

|e^{z0X} (e^{(z−z0)X} − 1 − (z − z0)X)/(z − z0)| = |e^{z0X} Σ_{n≥2} (z − z0)^{n−1} X^n / n!|
≤ e^{t0X} |X| Σ_{n≥2} |ηX|^{n−1}/(n − 1)!
≤ |X| (e^{(t0+η)X} + e^{(t0−η)X}).

If we choose η small enough so that t0 ± η ∈ I, by the remark above, E(|X| (e^{(t0+η)X} + e^{(t0−η)X})) < ∞.