Contents
4 Convergence
4.1 Convergence Almost Surely and in Probability
4.2 Laws of Large Numbers
6 Measure Theory
6.1 Lebesgue Measure
6.2 Distribution Functions
6.3 Measurable Functions
6.4 Some Distributions
7 Integration
7.1 Density Function
7.2 Product Measure
7.3 Random Variables
7.4 Independence
7.5 Convolution
7.6 Existence of Independent Sequences
8 Expected Value
8.1 Expected Value
8.2 Characteristic Function and Uniqueness Theorem
8.3 Moment Generating Function
Chapter 1
for any sequence of sets (Aj)j≥1 (where the empty union is defined as ∅), it is also enough to check that A is closed under countable disjoint unions.
Remark. We will use (⋆) from time to time throughout this course. It is called the canonical decompo-
sition of (Aj )j≥1 .
Definition 1.1.2. For a nonempty set Ω, suppose one has a type Θ of set collections. In other words, for
any set Ω, one defines what it means for a collection C ⊆ P(Ω) to be of type Θ. The type Θ is said to be
consistent, if for every set Ω, one has the following conditions:
(1) P(Ω) is of type Θ.
(2) If Ci, i ∈ I, are collections of type Θ, then ⋂_{i∈I} Ci is again of type Θ.
It is clear that if type Θ is consistent, then for any collection C ⊆ P(Ω), there exists the smallest (in the
partial ordering of inclusion) collection Θ(C) of type Θ containing C, which is given by
Θ(C) = ⋂ {θ ⊆ P(Ω) : C ⊆ θ and θ is of type Θ},

called the collection of type Θ generated by C. By (2), Θ(C) is still of type Θ. In particular, if θ is a collection of type Θ generated by some C ⊆ P(Ω), then we say that C is a basis of θ.
Proposition 1.1.3. Suppose Ω is a set and Θ is a consistent type on Ω. Then for any C1 , C2 ⊆ P(Ω),
Θ(Θ(C1 )) = Θ(C1 ), and if C1 ⊆ C2 , Θ(C1 ) ⊆ Θ(C2 ).
Proof. Let C1, C2 ⊆ P(Ω) be given. It is immediate that Θ(C1) ⊆ Θ(Θ(C1)). On the other hand, Θ(C1) is itself a collection of type Θ containing Θ(C1), so Θ(Θ(C1)) ⊆ Θ(C1).
For monotonicity, if C1 ⊆ C2, then C1 ⊆ Θ(C2). Since Θ(C2) is of type Θ, Θ(C1) ⊆ Θ(C2).
Remark. By 1.1.3, if we want to show that a property P holds for some collection θ of type Θ, it is typical
to set θ′ = {S ⊆ Ω : P holds for S} and show that θ′ is a collection of type Θ containing θ (or a basis of θ).
One can show that the σ-algebra and algebra types are both consistent. Therefore, for a collection C, we let σ(C) (resp. α(C)) denote the σ-algebra (resp. algebra) generated by C.
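As an aside, the generating operation is concrete enough to compute by brute force on a tiny finite Ω. The sketch below (illustrative only, not part of the notes; all names are made up) computes σ(C) as the intersection of every σ-algebra containing C, exactly as in the displayed formula; it is feasible only because Ω has three points.

```python
# Brute-force sigma(C) on a tiny Omega via the formula
# Theta(C) = intersection of all collections of type Theta containing C.
from itertools import chain, combinations

Omega = frozenset({1, 2, 3})
subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(Omega), r) for r in range(len(Omega) + 1))]

def is_sigma_algebra(coll):
    """Closure under complement and (finite = countable here) union, and Omega in it."""
    coll = set(coll)
    if Omega not in coll:
        return False
    if any(Omega - A not in coll for A in coll):
        return False
    return all(A | B in coll for A in coll for B in coll)

def generated_sigma_algebra(C):
    """Intersect every sigma-algebra on Omega containing C (only viable for tiny Omega)."""
    all_colls = chain.from_iterable(combinations(subsets, r) for r in range(len(subsets) + 1))
    result = set(subsets)
    for coll in all_colls:
        if set(C) <= set(coll) and is_sigma_algebra(coll):
            result &= set(coll)
    return result

C = [frozenset({1})]
print(sorted(tuple(sorted(A)) for A in generated_sigma_algebra(C)))
# prints [(), (1,), (1, 2, 3), (2, 3)], i.e. sigma({{1}}) = {∅, {1}, {2,3}, Ω}
```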
The tuple (Ω, F, P) is called a probability space. The elements in F are events.
Here are some straightforward properties of probability spaces.
Proposition 1.2.2. Let (Ω, F, P) be a probability space. Then
(1) P(∅) = 0.
(2) (Finite Additivity) If A1, · · · , Ak ∈ F are pairwise disjoint, then

P(⋃_{j=1}^{k} Aj) = Σ_{j=1}^{k} P(Aj).
(5) Since B = A ⊔ (B\A), by (2), P(B) = P(B\A) + P(A) ≥ P(A).
(6) Suppose Aj ↗ A. We apply the canonical decomposition (with A0 = ∅):

P(A) = Σ_{j=1}^{∞} P(Aj\Aj−1) = lim_{n→∞} Σ_{j=1}^{n} P(Aj\Aj−1) = lim_{n→∞} P(An).
The second part follows by taking complements and applying the first part.
(7) We apply the canonical decomposition:

P(⋃_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P(Aj \ ⋃_{k=1}^{j−1} Ak) ≤ Σ_{j=1}^{∞} P(Aj).
Proof. The case n = 2 follows by 1.2.2. Suppose n ≥ 2 and the equality holds for n. Then

P(⋃_{j=1}^{n+1} Aj) = P(⋃_{j=1}^{n} Aj) + P(An+1) − P(⋃_{j=1}^{n} (Aj ∩ An+1)),

and

−P(⋃_{j=1}^{n} (Aj ∩ An+1)) = Σ_{k=1}^{n} (−1)^{k+2} Σ_{J2⊆{1,··· ,n}, #J2=k} P(⋂_{j∈J2} Aj ∩ An+1)
= Σ_{k=1}^{n} (−1)^{k+2} Σ_{J2⊆{1,··· ,n}, #J2=k} P(⋂_{j∈J2∪{n+1}} Aj).

Since

{J : J ⊆ {1, · · · , n + 1}, #J = k} = {J1 : J1 ⊆ {1, · · · , n}, #J1 = k} ∪ {J2 ∪ {n + 1} : J2 ⊆ {1, · · · , n}, #J2 = k − 1}

is a disjoint union,

Σ_{k=1}^{n+1} (−1)^{k+1} Σ_{J⊆{1,··· ,n+1}, #J=k} P(⋂_{j∈J} Aj)
= Σ_{k=1}^{n+1} (−1)^{k+1} Σ_{J1⊆{1,··· ,n}, #J1=k} P(⋂_{j∈J1} Aj) + Σ_{k=1}^{n+1} (−1)^{k+1} Σ_{J2⊆{1,··· ,n}, #J2=k−1} P(⋂_{j∈J2∪{n+1}} Aj)
= Σ_{k=1}^{n} (−1)^{k+1} Σ_{J1⊆{1,··· ,n}, #J1=k} P(⋂_{j∈J1} Aj) + P(An+1) + Σ_{k=2}^{n+1} (−1)^{k+1} Σ_{J2⊆{1,··· ,n}, #J2=k−1} P(⋂_{j∈J2∪{n+1}} Aj)
= P(⋃_{j=1}^{n} Aj) + P(An+1) + Σ_{k=1}^{n} (−1)^{k+2} Σ_{J2⊆{1,··· ,n}, #J2=k} P(⋂_{j∈J2∪{n+1}} Aj)
= P(⋃_{j=1}^{n} Aj) + P(An+1) − P(⋃_{j=1}^{n} (Aj ∩ An+1)) = P(⋃_{j=1}^{n+1} Aj).
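The inclusion-exclusion identity just proved is easy to sanity-check numerically. A small illustrative sketch (not from the notes), using the uniform measure on a finite set:

```python
# Numeric check of inclusion-exclusion on a finite uniform probability space.
from itertools import combinations

omega = set(range(12))                       # uniform probability space
A = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}]   # three events

def prob(S):
    return len(S) / len(omega)

lhs = prob(set.union(*A))
rhs = 0.0
for k in range(1, len(A) + 1):
    for J in combinations(range(len(A)), k):
        inter = set(omega)
        for j in J:
            inter &= A[j]
        rhs += (-1) ** (k + 1) * prob(inter)

print(lhs, rhs)   # both 0.75: P(union) equals the alternating sum over index sets J
```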
We wish to define a measure on some σ-algebra of subsets of (0, 1] such that the measure of an interval coincides with its length. Generally, we will construct an algebra F0 and define a probability measure P on F0; then we try to extend P to a σ-algebra containing F0.
In this case, we pick F0 to be the collection of finite disjoint unions of half-open intervals (a, b], where 0 < a ≤ b ≤ 1 (note that when a = b, (a, b] = ∅). One can show that F0 is indeed an algebra. Then for A = ⋃_{j=1}^{n} (aj, bj] ∈ F0, it is natural to define λ(A) = Σ_{j=1}^{n} (bj − aj). Since for any interval I = (a, b), (a, b], [a, b) or [a, b] with a ≤ b,

b − a = lim_{m→∞} (1/m) #(I ∩ Z/m)   (prove it!),

we get

λ(A) = Σ_{j=1}^{n} lim_{m→∞} (1/m) #((aj, bj] ∩ Z/m) = lim_{m→∞} (1/m) Σ_{j=1}^{n} #((aj, bj] ∩ Z/m) = lim_{m→∞} (1/m) #(A ∩ Z/m).
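The counting formula for the length of an interval can be checked numerically. A quick illustrative sketch (not part of the notes), with #(I ∩ Z/m) computed by a floor count:

```python
# Check that (1/m) * #((a, b] ∩ Z/m) tends to b - a.
import math

def count_lattice_points(a, b, m):
    """Number of points k/m, k integer, lying in (a, b]."""
    return math.floor(b * m) - math.floor(a * m)

a, b = 0.2, 0.75
for m in (10, 100, 10_000, 1_000_000):
    print(m, count_lattice_points(a, b, m) / m)   # tends to b - a = 0.55
```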
Proof. (1) Since ⋃_{j≥1} Ij ⊆ I, for any n ∈ N,

Σ_{j=1}^{n} λ(Ij) = λ(⋃_{j=1}^{n} Ij) ≤ λ(I),

where the equality and the inequality hold by the argument above. Then we are done by letting n → ∞.
(2) If I ⊆ ⋃_{j=1}^{n} Ij for some n, then we are done. For the general case, let I = (a, b] and Ij = (aj, bj] for each j. Let ϵ ∈ (0, b − a) be given (the case I = ∅ is trivial, thus omitted). Then

(a + ϵ, b] ⊆ [a + ϵ, b] ⊆ ⋃_{j=1}^{∞} (aj, bj + ϵ/2^j).
(4) P∗ |F0 = P.
Proof. (1) Since ∅ ∈ F0, 0 ≤ P∗(∅) ≤ P(∅) = 0.
(2) Suppose (Bj)j≥1 is an F0-covering of B. Then P∗(A) ≤ Σ_{j≥1} P(Bj), as (Bj)j≥1 is also an F0-covering of A. Taking the infimum over all such (Bj)j≥1, we are done.
(3) Let ϵ > 0 be given. Then for each j, there exists an F0-covering (B_k^{(j)})_{k≥1} of Aj such that Σ_{k≥1} P(B_k^{(j)}) ≤ P∗(Aj) + ϵ/2^j. Since ⋃_{k,j≥1} B_k^{(j)} ⊇ ⋃_{j≥1} Aj, we have

P∗(⋃_{j=1}^{∞} Aj) ≤ Σ_{k,j=1}^{∞} P(B_k^{(j)}) ≤ Σ_{j=1}^{∞} (P∗(Aj) + ϵ/2^j) = ϵ + Σ_{j=1}^{∞} P∗(Aj).
where Bj = Aj \ ⋃_{k=1}^{j−1} Ak. Taking the infimum over all such (Aj)j≥1, we are done.
In general, P∗ is not a probability measure on P(Ω). We will restrict our collection of sets to obtain an actual one.
Definition 1.3.3. For A ⊆ Ω, we say that A is P∗ -measurable if
P∗ (E) = P∗ (E ∩ A) + P∗ (E\A), ∀E ⊆ Ω.
P∗ (E) = P∗ (E ∩ A) + P∗ (E\A)
= P∗ (E ∩ A ∩ B) + P∗ ((E ∩ A)\B) + P∗ ((E ∩ B)\A) + P∗ (E\(A ∪ B))
≥ P∗ (E ∩ A ∩ B) + P∗ (E\(A ∩ B)),
showing that A ∩ B ∈ M.
To show that M is a σ-algebra, we claim that if (Aj)j≥1 ⊆ M is pairwise disjoint, then for any E ⊆ Ω,

P∗(E ∩ ⊔_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P∗(E ∩ Aj).
Proof of claim. We first consider the finite case: If there is only one Aj , then there is nothing to prove.
Suppose A1 , A2 ∈ M are disjoint. Then for any E ⊆ Ω,
for any n ∈ N. Letting n → ∞, we are done (the reversed inequality follows from countable subadditivity).
Let (Aj )j≥1 ⊆ M be pairwise disjoint. Then for any n ∈ N and E ⊆ Ω,
P∗(E) = P∗(E ∩ ⊔_{j=1}^{n} Aj) + P∗(E \ ⊔_{j=1}^{n} Aj) ≥ Σ_{j=1}^{n} P∗(E ∩ Aj) + P∗(E \ ⊔_{j=1}^{∞} Aj).
Thus, M is a σ-algebra. Moreover, by the claim and the properties above, P∗ is a probability measure on
M. Finally, we need to show that F0 ⊆ M. Let A ∈ F0 , E ⊆ Ω, and ϵ > 0 be given. Then there exists an
F0 -covering (Bj )j≥1 of E such that
P∗(E) + ϵ ≥ Σ_{j=1}^{∞} P(Bj).
Then
P∗(A ∩ E) ≤ P∗(⋃_{j=1}^{∞} (Bj ∩ A)) ≤ Σ_{j=1}^{∞} P∗(Bj ∩ A) = Σ_{j=1}^{∞} P(Bj ∩ A).
Similarly, P∗(E\A) ≤ Σ_{j≥1} P(Bj\A). Therefore,
P∗(E) + ϵ ≥ Σ_{j=1}^{∞} P(Bj) = Σ_{j=1}^{∞} P(Bj ∩ A) + Σ_{j=1}^{∞} P(Bj\A) ≥ P∗(E ∩ A) + P∗(E\A).
Since ϵ is arbitrary, P∗(E) ≥ P∗(E ∩ A) + P∗(E\A), and therefore A ∈ M.
Remark. The measure P∗ defined on M is called the Lebesgue measure on (0, 1] and M is the collection
of Lebesgue measurable sets.
Such M is unique in some sense. To show this, we need a bit more concepts.
Definition 1.3.5. Let Ω be a set. A collection of subsets of Ω is a monotone class if it is closed under monotone limits, that is, whenever (Aj)j≥1 lies in the collection and Aj ↗ A or Aj ↘ A, then A lies in the collection.
The monotone class type is consistent, and we denote by µ(C) the monotone class generated by C.
Theorem 1.3.6 (Monotone Class Theorem). If F0 is an algebra, then σ(F0 ) = µ(F0 ).
Proof. Since σ(F0 ) is a monotone class, σ(F0 ) ⊇ µ(F0 ). Then it remains to show that µ(F0 ) is a σ-algebra.
Since Ω ∈ F0 , Ω ∈ µ(F0 ).
(1) (Closed under complement) Let M1 = {A ⊆ Ω : Ω\A ∈ µ(F0)}. It is clear that F0 ⊆ M1. Then it remains to show that M1 is a monotone class. Suppose (Aj)j≥1 ⊆ M1 and Aj ↗ A. Then (Ω\Aj)j≥1 ⊆ µ(F0) and Ω\Aj ↘ Ω\A. Therefore, Ω\A ∈ µ(F0) and A ∈ M1. Similarly, if Aj ↘ A, then A ∈ M1. Hence, µ(F0) ⊆ M1 and µ(F0) is closed under complement.
(2) (Closed under finite unions) The desired property is: ∀A, B ∈ µ(F0), A ∪ B ∈ µ(F0). We first prove the intermediate property: ∀A ∈ µ(F0), ∀B ∈ F0, A ∪ B ∈ µ(F0). Thus, we set M2 = {A ⊆ Ω : ∀B ∈ F0, A ∪ B ∈ µ(F0)}. It is clear that F0 ⊆ M2. Therefore, it remains to show that M2 is a monotone class. Suppose (Aj)j≥1 ⊆ M2 with Aj ↗ A. Then for any B ∈ F0, Aj ∪ B ∈ µ(F0) and Aj ∪ B ↗ A ∪ B, so A ∪ B ∈ µ(F0). Thus, A ∈ M2. Similarly, if Aj ↘ A, then A ∈ M2.
Set M3 = {A ⊆ Ω : ∀B ∈ µ(F0), A ∪ B ∈ µ(F0)}. Since µ(F0) ⊆ M2, for any A ∈ F0 and B ∈ µ(F0), A ∪ B ∈ µ(F0), and therefore F0 ⊆ M3. By a similar argument, M3 is a monotone class, and we are done.
(3) (Closed under countable unions) Let (Aj)j≥1 ⊆ µ(F0). Then

⋃_{j=1}^{∞} Aj = ⋃_{j=1}^{∞} ⋃_{k=1}^{j} Ak ∈ µ(F0)

as Bj = ⋃_{k=1}^{j} Ak ∈ µ(F0) by (2) and (Bj)j≥1 is increasing.
By (1), (2), and (3), µ(F0 ) is a σ-algebra and therefore µ(F0 ) = σ(F0 ).
Theorem 1.3.7. Let F0 be an algebra. If P1 and P2 are probability measures on σ(F0) that agree on F0, then they are equal.
Proof. We need to show that for any A ∈ σ(F0 ), P1 (A) = P2 (A). Therefore, set
Theorem 1.3.9 (Dynkin's π-λ Theorem). Let Ω be a set. Suppose Λ and Π are a λ-system and a π-system on Ω, respectively. If Π ⊆ Λ, then σ(Π) ⊆ Λ.
Proof. We claim that λ(Π) is a π-system.
Proof of claim. The desired property is: ∀A, B ∈ λ(Π), A ∩ B ∈ λ(Π). Let L1 = {A ⊆ Ω : ∀B ∈ Π, A ∩ B ∈ λ(Π)}. It is clear that Π ⊆ L1 and Ω ∈ L1. Then it suffices to show that L1 is closed under pairwise disjoint unions and complements. Let (Aj)j≥1 ⊆ L1 be pairwise disjoint. Then for any B ∈ Π, (⊔_{j≥1} Aj) ∩ B = ⊔_{j≥1} (Aj ∩ B) ∈ λ(Π), since each Aj ∩ B ∈ λ(Π).
Let A ∈ L1 and B ∈ Π be given. Then
Chapter 2
(2) Ω \ lim sup_{j→∞} Aj = lim inf_{j→∞} (Ω\Aj) and Ω \ lim inf_{j→∞} Aj = lim sup_{j→∞} (Ω\Aj).
(3) P(lim inf_{j→∞} Aj) ≤ lim inf_{j→∞} P(Aj) ≤ lim sup_{j→∞} P(Aj) ≤ P(lim sup_{j→∞} Aj).
Proposition 2.1.2 (First Borel–Cantelli Lemma). Suppose (Aj)j≥1 ⊆ F. If Σ_{j≥1} P(Aj) < ∞, then

P(lim sup_{j→∞} Aj) = 0.
Example (Coin Tossing). Consider P the Lebesgue measure on (0, 1] with the Borel σ-algebra. For n ∈ N, we define the level-n intervals

I_1^{(n)}, · · · , I_{2^n}^{(n)}

by I_k^{(n)} = ((k−1)/2^n, k/2^n], and define dn(ω) = 0 if ω ∈ I_k^{(n)} with k odd and dn(ω) = 1 otherwise. Then the dn's correspond to fair coin flips since for each ω ∈ (0, 1], the sequence (dj(ω))j≥1 is an infinite sequence of 0's and 1's, which are viewed as tails and heads, respectively.
Define Aj = {ω : dj(ω) = 1} = {j-th toss is heads}. Then P(Aj) = 1/2. Since

lim inf_{j→∞} P(Aj) = 1/2,

we know that P(lim sup_{j→∞} Aj) = P({infinitely many tosses are heads}) ≥ 1/2.
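The digits dn are concrete: dn(ω) is the n-th binary digit of ω. A simulation sketch (illustrative, not from the notes):

```python
# d_n(omega) from the level-n intervals; empirically P(d_j = 1) is about 1/2.
import math
import random

def d(n, omega):
    """d_n(omega) = 1 iff omega lies in an even-indexed level-n interval ((k-1)/2^n, k/2^n]."""
    k = math.ceil(omega * 2 ** n)   # index k of the level-n interval containing omega
    return 1 if k % 2 == 0 else 0

random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]
for j in (1, 2, 5, 10):
    freq = sum(d(j, w) for w in samples) / len(samples)
    print(j, round(freq, 3))   # each frequency is close to 0.5
```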
2.2 Independence
Definition 2.2.1. A1 , · · · , An ∈ F are independent if for any distinct k1 , · · · , km ∈ {1, · · · , n}, P(Ak1 ∩
· · · ∩ Akm ) = P(Ak1 )P(Ak2 ) · · · P(Akm ). A collection C of events is independent if each finite subcollection
is independent.
Remark. The Aj need not be distinct. If A and A are independent, then P(A) = P(A)², so P(A) = 0 or 1.
Definition 2.2.2. If A, B ∈ F with P(B) > 0, then we define the conditional probability

P(A|B) = P(A ∩ B)/P(B).
For a fixed B ∈ F with P(B) > 0, the function A 7→ P(A|B) is a probability measure on F.
Remark. If P(B) > 0, then A, B are independent if and only if P(A|B) = P(A). Moreover, if C is indepen-
dent, then Ω\C = {Ω\A : A ∈ C} is also independent (which can be proved by 1.2.3).
Theorem 2.2.3 (Second Borel–Cantelli Lemma). Suppose (Aj)j≥1 ⊆ F is independent. If Σ_{j≥1} P(Aj) = ∞, then

P(lim sup_{j→∞} Aj) = 1.
Proof. Since

1 − P(lim sup_{j→∞} Aj) = P(lim inf_{j→∞} (Ω\Aj)) = lim_{j→∞} P(⋂_{k=j}^{∞} (Ω\Ak)),

it suffices to show that P(⋂_{k=j}^{∞} (Ω\Ak)) = 0 for each j.
Let M ∈ R be given and fix j ∈ N. Since Σ_{k≥1} P(Ak) = ∞, there exists N ∈ N such that for any n ≥ N, Σ_{k=j}^{n} P(Ak) > M. Then for any n ≥ N, by independence,

P(⋂_{k=j}^{n} (Ω\Ak)) = ∏_{k=j}^{n} P(Ω\Ak) = ∏_{k=j}^{n} (1 − P(Ak)) ≤ ∏_{k=j}^{n} exp(−P(Ak)) = exp(−Σ_{k=j}^{n} P(Ak)) < exp(−M).

Since M is arbitrary,

P(⋂_{k=j}^{∞} (Ω\Ak)) = 0 =⇒ P(lim sup_{j→∞} Aj) = 1.
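The two Borel–Cantelli regimes can be seen along a single simulated sample path. A sketch (illustrative, not from the notes; the choices P(Aj) = 1/j and 1/j² are assumptions for the example):

```python
# With independent A_j and P(A_j) = 1/j the sum diverges and occurrences keep
# accumulating; with P(A_j) = 1/j^2 the sum converges and the count stays bounded.
import random

random.seed(1)
n = 200_000
hits_divergent = 0    # events with P(A_j) = 1/j
hits_convergent = 0   # events with P(A_j) = 1/j**2
for j in range(1, n + 1):
    if random.random() < 1.0 / j:
        hits_divergent += 1
    if random.random() < 1.0 / j ** 2:
        hits_convergent += 1

print("divergent sum, occurrences up to n:", hits_divergent)    # grows (roughly like log n)
print("convergent sum, occurrences up to n:", hits_convergent)  # stays small
```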
Proof of claim. It is clear that Λ is closed under complement. Let (Bj )j≥1 ⊆ Λ be pairwise disjoint. Then
P(Ai1 ∩ · · · ∩ Aik ∩ ⋃_{j=1}^{∞} Bj) = Σ_{j=1}^{∞} P(Ai1 ∩ · · · ∩ Aik ∩ Bj)
= Σ_{j=1}^{∞} P(Ai1) · · · P(Aik) P(Bj)
= P(Ai1) · · · P(Aik) P(⋃_{j=1}^{∞} Bj).
T ⊆ σ(⋃_{n=1}^{∞} σ({A1, · · · , An})), which is independent of T,
and thus T is independent of itself. Taking any A ∈ T, we know that P(A) = P(A ∩ A) = P(A)² and therefore P(A) = 0 or 1, as required.
Example. We have shown that P(lim sup_{j→∞} Aj) = 1 and P(lim inf_{j→∞} Aj) = 0 by 2.2.3. Kolmogorov's zero-one law gives an alternative way: since lim sup_{j→∞} Aj and lim inf_{j→∞} Aj are tail events and

P(lim inf_{j→∞} Aj) ≤ 1/2 ≤ P(lim sup_{j→∞} Aj),
Chapter 3
which is the minimal σ-algebra Σ making X : Ω → R measurable. The σ-algebra generated by the sequence
of random variables (Xj )j≥1 is
Since {(a, b) : a < b} is a basis of Borel σ-algebra, σ(X) = σ({X −1 ((a, b)) : a < b}).
Proposition 3.1.7. Suppose X1 , · · · , Xn are simple random variables on Ω and W = (X1 , · · · , Xn ). Then
σ(X1 , · · · , Xn ) = {W −1 (A) : A ⊆ Rn }.
Proof. For 1 ≤ j ≤ n and Borel E ⊆ R, define Ej = {(x1, · · · , xn) ∈ Rn : xj ∈ E}. Then {Xj ∈ E} = {W ∈ Ej} = W−1(Ej) ∈ {W−1(A) : A ⊆ Rn}. Therefore, σ(X1, · · · , Xn) ⊆ {W−1(A) : A ⊆ Rn}.
Conversely, write R1, · · · , Rn for the ranges of X1, · · · , Xn, respectively. Let A ⊆ Rn. Then

W−1(A) = ⋃_{(ω1,··· ,ωn)∈(R1×···×Rn)∩A} W−1({(ω1, · · · , ωn)}) = ⋃_{(ω1,··· ,ωn)∈(R1×···×Rn)∩A} ⋂_{j=1}^{n} Xj−1({ωj})
(1) (σ(Xj ))j≥1 are independent.
(2) Define

T = ⋂_{n=1}^{∞} σ(Xn, Xn+1, · · · ).
3.3 Expectation
We only talk about expectation for simple random variables at this point.
Proposition 3.3.1. If X : Ω → R is a simple function, then there exists a unique representation

X = Σ_{j=1}^{m} aj 1Aj
Proof. The case when the Bj's are pairwise disjoint is clear by combining equal bj's. For the general case, there exist pairwise disjoint (Ak)_{k=1}^{m} ⊆ F such that each Bj is a union of a subcollection of the Ak's, say Cj. Then

X = Σ_{k=1}^{m} Σ_{j:Ak∈Cj} bj 1Ak.

Therefore,

EX = Σ_{k=1}^{m} Σ_{j:Ak∈Cj} bj P(Ak) = Σ_{j=1}^{n} Σ_{k:Ak∈Cj} bj P(Ak) = Σ_{j=1}^{n} bj P(Bj),
as required.
Proposition 3.3.4. Let X, Y be simple random variables. Then for any a, b ∈ R, aX + bY is simple and
E(aX + bY ) = aEX + bEY .
Proof. It is clear that aX, bY are simple. Thus, it remains to show that X + Y is simple. Let r ∈ R be given.
Then

(X + Y)−1({r}) = ⋃_{λ∈R} X−1({λ}) ∩ Y−1({r − λ}) = ⋃_{λ∈im X} X−1({λ}) ∩ Y−1({r − λ}) ∈ F.
If r = 0, then
{XY = r} = {X = 0} ∪ {Y = 0} ∈ F.
By 3.1.4, XY is a simple random variable.
Proposition 3.3.6 (Law of the Unconscious Statistician). Suppose X is a simple random variable and g : R → R is a function. Then g(X) := g ◦ X is a simple random variable with

Eg(X) = Σ_{x∈im(X)} g(x) P(X = x).
Proof. Since for each r ∈ R, {g(X) = r} = X−1(g−1({r})) ∈ F by 3.1.4, g(X) is a simple random variable. Suppose im(X) = {x1, · · · , xn} and im g(X) = {y1, · · · , ym}, where the xj's and the yk's are pairwise distinct. For k ∈ {1, · · · , m}, let Ck = {g = yk} ∩ im(X). Therefore,

Eg(X) = Σ_{k=1}^{m} yk P(g(X) = yk) = Σ_{k=1}^{m} yk P(X ∈ Ck) = Σ_{k=1}^{m} Σ_{x∈Ck} yk P(X = x) = Σ_{k=1}^{m} Σ_{x∈Ck} g(x) P(X = x) = Σ_{x∈im(X)} g(x) P(X = x),
as required.
Example (Norm of a Vector). Let v1, · · · , vn ∈ Rd be unit vectors. Show that there exist constants a1, · · · , an ∈ {−1, 1} such that ∥a1v1 + · · · + anvn∥ ≤ √n.
Proof. Fix v1, · · · , vn ∈ Rd. Let X1, · · · , Xn be independent random variables on Ω (Xj can be chosen to be dj in the coin tossing example with a slight modification) such that P(Xj = 1) = P(Xj = −1) = 1/2. Define X : Ω → R by

X(ω) = ∥Σ_{j=1}^{n} Xj(ω)vj∥² = n + Σ_{j≠k} Xj(ω)Xk(ω) vj · vk.

Then, by 3.3.5,

EX = n + Σ_{j≠k} vj · vk EXj EXk = n.
Therefore, there exists ω∗ ∈ Ω such that X(ω∗) ≤ n (as otherwise, EX > n, a contradiction). Let aj = Xj(ω∗) ∈ {−1, 1}. Then

X(ω∗) = ∥Σ_{j=1}^{n} aj vj∥² ≤ n ⟺ ∥Σ_{j=1}^{n} aj vj∥ ≤ √n,
as required.
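The computation above is a first-moment (probabilistic method) argument and can be mimicked numerically. A sketch (illustrative, not from the notes; the random unit vectors and the random search for a sign pattern are assumptions of the example):

```python
# E || sum a_j v_j ||^2 = n for random signs, so some sign pattern has norm <= sqrt(n);
# here we estimate the expectation and search random patterns for such a witness.
import math
import random

random.seed(0)
d, n = 3, 20
vecs = []
for _ in range(n):                       # n random unit vectors in R^d
    v = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    vecs.append([x / norm for x in v])

def signed_norm(signs):
    s = [sum(a * v[i] for a, v in zip(signs, vecs)) for i in range(d)]
    return math.sqrt(sum(x * x for x in s))

norms = []
for _ in range(2000):                    # random sign patterns (fair coin flips)
    signs = [random.choice((-1, 1)) for _ in range(n)]
    norms.append(signed_norm(signs))

print("average of norm^2:", sum(x * x for x in norms) / len(norms))   # close to n = 20
print("best norm found:", min(norms), "vs sqrt(n) =", math.sqrt(n))   # a witness <= sqrt(n)
```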
f (EX) ≤ Ef (X).
Therefore, f (EX) is defined. Then the rest follows by regular Jensen’s inequality.
Definition 3.4.2. Suppose X is a simple random variable. Then the variance of X is defined by

Var(X) = E(X − EX)².

Since Var(X) = E(X − EX)² = E(X² − 2XEX + (EX)²) = EX² − (EX)², for any simple random variable X, EX² ≥ (EX)². Also, by 3.3.4, Var(aX) = E(a²X²) − (aEX)² = a²(EX² − (EX)²) = a² Var(X).
Proposition 3.4.3. Suppose X, Y are simple random variables. If X, Y are independent, then Var(X + Y) = Var(X) + Var(Y).
Proof. By 3.3.4 and independence (so that EXY = EX EY),

Var(X + Y) = E(X + Y)² − (EX + EY)² = (EX² − (EX)²) + (EY² − (EY)²) = Var(X) + Var(Y).
Theorem 3.4.4. Let X and (Xj )nj=1 be simple random variables. Then:
(1) (Markov's Inequality) For any λ > 0 and α > 0,

P(|X| ≥ λ) ≤ E|X|^α / λ^α.
(4) If p = 1, then

∥X1 + · · · + Xn∥p = E|X1 + · · · + Xn| ≤ E(|X1| + · · · + |Xn|) = ∥X1∥p + · · · + ∥Xn∥p.
For p > 1, we prove it by induction on n. The case n = 1 is trivial. If n = 2,

|X1 + X2|^p ≤ |X1| |X1 + X2|^{p−1} + |X2| |X1 + X2|^{p−1}.

Then by (3) with r = 1, p1 = p, p2 = p/(p−1), it follows that

E(|X1| |X1 + X2|^{p−1}) ≤ ∥X1∥p ∥|X1 + X2|^{p−1}∥_{p/(p−1)} = ∥X1∥p ∥X1 + X2∥p^{p−1}.

Similarly,

E(|X2| |X1 + X2|^{p−1}) ≤ ∥X2∥p ∥X1 + X2∥p^{p−1}.

Therefore,

∥X1 + X2∥p^p = E|X1 + X2|^p ≤ (∥X1∥p + ∥X2∥p) ∥X1 + X2∥p^{p−1} =⇒ ∥X1 + X2∥p ≤ ∥X1∥p + ∥X2∥p

whenever ∥X1 + X2∥p > 0. If ∥X1 + X2∥p = 0, then ∥X1 + X2∥p = 0 ≤ ∥X1∥p + ∥X2∥p is clear.
Suppose n ≥ 3. Then

∥X1 + · · · + Xn∥p = ∥X1 + · · · + (Xn−1 + Xn)∥p ≤ ∥X1∥p + · · · + ∥Xn−2∥p + ∥Xn−1 + Xn∥p,

where the inequality follows from the induction hypothesis. Since ∥Xn−1 + Xn∥p ≤ ∥Xn−1∥p + ∥Xn∥p, we are done.
Let ϵ > 0 be given. Since g is uniformly continuous on [0, 1], there exists δ > 0 such that for any x, y ∈ [0, 1] with |x − y| < δ, |g(x) − g(y)| < ϵ. Let M = max_{x∈[0,1]} |g(x)|. Then

|gn(p) − g(p)| = |E(g(Sn,p/n) − g(p))|
≤ E|g(Sn,p/n) − g(p)|
≤ ϵ + 2M P(|Sn,p/n − p| ≥ δ)
≤ ϵ + 2M Var(Sn,p/n)/δ²      (by 3.4.4)
= ϵ + 2M p(1 − p)/(nδ²)
≤ ϵ + 2M/(nδ²) → ϵ

as n → ∞, uniformly in p. Therefore, gn → g uniformly. Since each gn is a polynomial in p, we are done.
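The gn here are the Bernstein polynomials, and the uniform convergence is easy to observe directly. A sketch (illustrative, not from the notes; the test function g is an arbitrary choice):

```python
# g_n(p) = E g(S_{n,p}/n) with S_{n,p} ~ Binomial(n, p); sup-norm error shrinks with n.
import math

def bernstein(g, n, p):
    """g_n(p) = sum_k g(k/n) * C(n,k) p^k (1-p)^(n-k)."""
    return sum(g(k / n) * math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

g = lambda x: abs(x - 0.3) + math.sin(5 * x)   # any continuous function on [0, 1]
grid = [i / 200 for i in range(201)]
for n in (10, 50, 200, 800):
    err = max(abs(bernstein(g, n, p) - g(p)) for p in grid)
    print(n, round(err, 4))   # the error on the grid decreases as n grows
```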
Proposition 3.4.5 (Paley–Zygmund Inequality). Suppose X ≥ 0 is a simple random variable and 0 ≤ θ ≤ 1. Then

(1 − θ)²(EX)² ≤ P(X > θEX) EX².

Proof. Since X = X1_{X≤θEX} + X1_{X>θEX}, by the Cauchy–Schwarz inequality,

EX = E(X1_{X≤θEX}) + E(X1_{X>θEX}) ≤ θEX + √(EX² · E1_{X>θEX}) = θEX + √(EX² · P(X > θEX)).

Therefore,

(1 − θ)²(EX)² ≤ EX² P(X > θEX).
Example (First and Second Moment Method). Let T be an infinite d-ary tree (d ≥ 2) and p ∈ (0, 1). Each edge e is open (resp. closed) with probability p (resp. 1 − p), independently of all other edges of T. Let C be the open connected component of 0 (the root of T) and E = {#C = ∞}. We claim that:
(1) If pd < 1, then P(E) = 0.
(2) If pd > 1, then P(E) > 0.
Proof of claim. For n ∈ N, let Dn be the n-th layer of vertices. For v ∈ T\{0}, let 0 ↔ v denote that there is an open path from 0 to v. Let En = {∃v ∈ Dn : 0 ↔ v}. Then En ↘ E.
Then, by 3.4.4,

P(En) = P(Xn ≥ 1) ≤ EXn = Σ_{v∈Dn} P(0 ↔ v) = d^n p^n → 0

if pd < 1, where Xn = Σ_{v∈Dn} 1_{0↔v}. Thus, (1) is proved.
For (2), we will use 3.4.5. Writing u ∧ v for the last common ancestor of u and v, note that

EXn² = Σ_{u,v∈Dn} P(0 ↔ u, 0 ↔ v) = Σ_{u,v∈Dn, u≠v} P(0 ↔ u ∧ v) P(u ∧ v ↔ u) P(u ∧ v ↔ v) + Σ_{u∈Dn} P(0 ↔ u),

where

Σ_{u,v∈Dn, u≠v} P(0 ↔ u ∧ v) P(u ∧ v ↔ u) P(u ∧ v ↔ v)
= Σ_{u∈Dn} Σ_{m=0}^{n−1} (d − 1) d^{n−m−1} p^{2n−m}
= Σ_{u∈Dn} (d − 1)/d · p^{2n} d^{n} Σ_{m=0}^{n−1} (dp)^{−m}
≤ Σ_{u∈Dn} (d − 1)/d · p^{2n} d^{n} · 1/(1 − 1/(dp))
= (d − 1)/d · (dp)^{2n} · 1/(1 − 1/(dp)).

Hence, it follows that

EXn² ≤ (d − 1)/d · (dp)^{2n} · 1/(1 − 1/(dp)) + (dp)^n.

By 3.4.5 (with θ = 0),

P(Xn ≥ 1) = P(Xn > 0) ≥ (EXn)²/EXn² ≥ (dp)^{2n} / ((d − 1)/d · (dp)^{2n}/(1 − 1/(dp)) + (dp)^n) → d(dp − 1)/((d − 1)dp) > 0.

Thus,

P(E) = lim_{n→∞} P(En) = lim_{n→∞} P(Xn ≥ 1) ≥ d(dp − 1)/((d − 1)dp) > 0.
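The quantity Xn is the n-th generation size of a branching process with Binomial(d, p) offspring, which makes the two regimes easy to simulate. A sketch (illustrative, not from the notes; the early-stopping cap is a shortcut assumption):

```python
# Estimate P(E_n) on the d-ary tree in the subcritical (pd < 1) and supercritical (pd > 1) regimes.
import random

def survives_to_level(d, p, n, cap=100):
    """True if some level-n vertex is joined to the root by open edges.
    Once the open cluster in a layer exceeds `cap` vertices we declare survival,
    since extinction after that point is extremely unlikely."""
    alive = 1
    for _ in range(n):
        if alive >= cap:
            return True
        alive = sum(1 for _ in range(alive * d) if random.random() < p)
        if alive == 0:
            return False
    return True

random.seed(3)
d, n, trials = 3, 20, 500
for p in (0.25, 0.5):          # pd = 0.75 (subcritical) vs pd = 1.5 (supercritical)
    freq = sum(survives_to_level(d, p, n) for _ in range(trials)) / trials
    print(f"p = {p}: estimated P(E_n) = {freq:.3f}")
```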
Chapter 4
Convergence
since lim sup_{j→∞}(Xj − X) = inf_{j∈N} sup_{k≥j}(Xk − X) and lim inf_{j→∞}(Xj − X) = sup_{j∈N} inf_{k≥j}(Xk − X) are both measurable functions (prove it!).
Proposition 4.1.2. Suppose (Xj)j≥1 and X are random variables. Then

Xj → X almost surely ⟺ ∀ϵ > 0, P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0.
Proof. Suppose Xj → X almost surely. Let ϵ > 0 and ω ∈ Ω such that Xj(ω) → X(ω) be given. There exists N ∈ N such that if j ≥ N, then |Xj(ω) − X(ω)| < ϵ. Thus, |Xj(ω) − X(ω)| ≥ ϵ only finitely often. Therefore, for a fixed ϵ > 0,

1 = P({Xj → X}) ≤ P(Ω \ lim sup_{j→∞}{|Xj − X| ≥ ϵ}) =⇒ P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0.
Conversely, suppose for any ϵ > 0, P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0. For each n ∈ N, take ϵn = 1/n. Consider

A = ⋃_{n=1}^{∞} lim sup_{j→∞}{|Xj − X| ≥ ϵn}.

Then

P(A) = lim_{n→∞} P(lim sup_{j→∞}{|Xj − X| ≥ ϵn}) = 0

as lim sup_{j→∞}{|Xj − X| ≥ ϵn} ↗ A. Suppose Xj(ω) ̸→ X(ω); then there exists ϵ0 > 0 such that

ω ∈ lim sup_{j→∞}{|Xj − X| ≥ ϵ0} ⊆ A.
Therefore,
P(Xj ̸→ X) ≤ P(A) = 0 =⇒ P(Xj → X) = 1.
Proposition 4.1.3. Suppose (Xj)j≥1 and X are random variables. If Xj → X a.s. (almost surely), then Xj → X in probability.
Proof. Let ϵ > 0 be given. By 4.1.2, if Xj → X a.s., then P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0. Therefore,

lim sup_{j→∞} P(|Xj − X| ≥ ϵ) ≤ P(lim sup_{j→∞}{|Xj − X| ≥ ϵ}) = 0.
Proof. Since Xj → X a.s., Xj → X in probability by 4.1.3. Let ϵ > 0 be given. There exists N ∈ N such
that for any j ≥ N ,
P(|Xj − X| ≥ ϵ) < ϵ.
Write E|Xj − X| = E(|Xj − X| 1_{|Xj−X|<ϵ}) + E(|Xj − X| 1_{|Xj−X|≥ϵ}). We have

E(|Xj − X| 1_{|Xj−X|<ϵ}) = Σ_{z<ϵ, z∈im|Xj−X|} z P(|Xj − X| = z) < ϵ

and

E(|Xj − X| 1_{|Xj−X|≥ϵ}) ≤ 2C P(|Xj − X| ≥ ϵ) ≤ 2Cϵ,

as P(|Xj| ≤ C) = P(|X| ≤ C) = 1. Thus, E|Xj − X| < ϵ(1 + 2C), hence |EXj − EX| < ϵ(1 + 2C), for any j ≥ N. Since ϵ is arbitrary, EXj → EX.
4.2 Laws of Large Numbers
Theorem 4.2.1 (Weak Law of Large Numbers). Let (Xj)j≥1 be simple random variables which are independent and identically distributed, that is, P(Xj ∈ B) = P(X1 ∈ B) for all Borel sets B ⊆ R. Then, setting µ = EX1 and Sj = Σ_{k=1}^{j} Xk,

Sj/j → µ in probability.
Proof. Without loss of generality, we may assume that µ = 0. Then for any ϵ > 0, by 3.4.4,

P(|Sj/j| ≥ ϵ) = P(|Sj| ≥ jϵ) = P(|Sj − ESj| ≥ jϵ) ≤ Var(Sj)/(j²ϵ²).

Since Var(Sj) = j Var(X1) by 3.4.3,

P(|Sj| ≥ jϵ) ≤ Var(X1)/(jϵ²) → 0
as j → ∞.
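A quick simulation of the weak law for ±1 coin flips (illustrative, not from the notes):

```python
# Sample means S_j / j of iid +-1 flips concentrate around mu = 0 as j grows.
import random

random.seed(7)
def sample_mean(j):
    return sum(random.choice((-1, 1)) for _ in range(j)) / j

for j in (10, 100, 10_000, 100_000):
    means = [sample_mean(j) for _ in range(20)]
    spread = max(abs(m) for m in means)
    print(j, round(spread, 4))   # worst deviation over 20 runs shrinks with j
```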
Theorem 4.2.2 (Strong Law of Large Numbers). Let (Xj)j≥1 be simple iid (independent and identically distributed) random variables. Then, setting µ = EX1 and Sj = Σ_{k=1}^{j} Xk,

Sj/j → µ almost surely.

Proof. We may assume µ = 0. Let ϵ > 0. By 3.4.4,

P(|Sj| ≥ ϵj) ≤ E|Sj|⁴/(ϵ⁴j⁴).

Note that

E|Sj|⁴ = Σ_{k1,k2,k3,k4=1}^{j} E(Xk1 Xk2 Xk3 Xk4).
If one of the indices k1 , · · · , k4 is distinct from the others, then that term is 0 by 3.2.3. The only terms that
remain are these:
Chapter 5
Suppose we are gambling: at each unit time n, we either gain 1 dollar or lose 1 dollar. Represent this with random variables: for each n, our gain is either Xn = 1 or Xn = −1. We will assume that the gains are independent and identically distributed, with P(Xn = 1) = p, P(Xn = −1) = q = 1 − p. Assume that we begin with a dollars, where a ≥ 0. Our cumulative fortune at time n is a + Sn, where S0 = 0 and Sn = Σ_{j=1}^{n} Xj. For some c ≥ a, we are declared the winner if our winnings reach c before they reach 0, and the loser if they reach 0 first.
Lemma 5.0.1. For random variables Xj defined on some space (Ω, F, P) as above and c ≥ a ≥ 0,
{ω : Sn + a = c before Sn + a = 0} ∈ F.
This is the event that for some n, our fortune has not reached 0 by time n − 1, but reaches c at time n.
Define, for 0 < a < c,

Ea,n = ⋂_{k=1}^{n−1} {0 < Sk + a < c} ∩ {Sn + a = c}

and Ea,0 = ∅, Ec,0 = Ω and E0,n = Ec,n = ∅ when n ≥ 1, as the event that we win at time n. Then define

f(a) = P(⋃_{n≥0} Ea,n) = Σ_{n≥0} P(Ea,n).
f (a) = qf (a − 1) + pf (a + 1)
for a = 1, · · · , c − 1.
Proof. We define

S′n = X2 + · · · + Xn+1

for n ≥ 1 and S′0 = 0, and the corresponding events

E′a,n = ⋂_{k=0}^{n−1} {0 < S′k + a < c} ∩ {S′n + a = c}.
Let a be such that 0 < a < c and note that f(a) = Σ_{n≥0} P(Ea,n). Note that Ea,n ∩ {X1 = 1} is exactly the event {X1 = 1} ∩ E′a+1,n−1. Therefore, for n ≥ 1,

P(Ea,n) = P(E′a+1,n−1, X1 = 1) + P(E′a−1,n−1, X1 = −1).
For any x1 , · · · , xn ,
P(X1 = x1 , · · · , Xn = xn ) = P(X2 = x1 , · · · , Xn+1 = xn ).
Summing this over all x1 , · · · , xn such that
Bn = gn (X1 , · · · , Xn−1 ).
We interpret the 1 as “we will bet” and 0 as “we will not bet”.
Note that for any function gn as above, Bn is measurable relative to X1, · · · , Xn−1 by 3.1.9.
Generally, we define
F0 = {∅, Ω}, Fn = σ(X1 , · · · , Xn ).
Let B1, B2, · · · be any sequence of {0, 1}-valued random variables such that
Nn = k if B1 + · · · + Bk = n and B1 + · · · + Bk−1 = n − 1
and Yn = XNn . To see this definition another way, for each ω, we have assumed that B1 (ω), B2 (ω), · · · and
X1 (ω), X2 (ω), · · · are defined. We simply look at the n-th Bk (ω) which is equal to 1 and set Yn (ω) equal to
the Xk (ω) corresponding to this. Strictly speaking this is only defined when there is such a Bk (ω), so we
must define Yn (ω) = −1 if ω ∈ Ω\{Bn = 1 i.o.}.
Theorem 5.1.1. The Yn are simple i.i.d. random variables with P(Yn = 1) = p and P(Yn = −1) = q.
Proof. Note that
For x1, · · · , xn ∈ {±1}, write pj = p if xj = 1 and pj = q otherwise. We would like to show that
P(Y1 = x1 , · · · , Yn = xn ) = p1 · · · pn .
Since x1, · · · , xn are arbitrary, this will show that the Yj are independent (and in fact i.i.d.). We prove it by induction. For n = 1,

P(Y1 = x1) = Σ_{k≥1} P(N1 = k, Xk = x1).
Recall that {N1 = k} ∈ Fk−1, which is independent of σ(Xk). So we can split into a product:

Σ_{k≥1} P(N1 = k) P(Xk = x1) = p1 Σ_{k≥1} P(N1 = k) = p1 P(N1 ≥ 1) = p1.
For n ≥ 2, write

P(Y1 = x1, · · · , Yn = xn) = Σ_{k1<···<kn} P(Xk1 = x1, · · · , Xkn = xn, N1 = k1, · · · , Nn = kn).
By induction, we get p1 · · · pn .
Wn = fn (X1 , · · · , Xn−1 ) ≥ 0.
Fn = Fn−1 + Wn Xn ,
A gambling policy comes with a stopping time. We will assume that we have a rule to determine when we would like to stop betting. This rule will produce a time τ after which we will not bet. If τ = n, this represents our decision to stop playing at time n, so this should depend only on X1, · · · , Xn.
Definition 5.2.1. A stopping time is a function τ : Ω → {0, 1, · · · } ∪ {∞} such that

{τ = n} ∈ Fn.
With the stopping time, along with our betting (Wn)n≥1, we can define our fortune at time n as

F∗n = Fn if n ≤ τ,   F∗n = Fτ if n ≥ τ.
One way to view this gambling policy is that the stopping time forces our wagers to be 0. In other words,
we can define a new wager Wn∗ by
Wn∗ = Wn 1n≤τ .
This is still measurable relative to Fn−1, since {n ≤ τ} = Ω \ ⋃_{k=0}^{n−1} {τ = k} ∈ Fn−1. Therefore, we can recast this in the previous language:

F∗n = F∗n−1 + W∗n Xn.
So again, even with a stopping time in our betting strategy, we still cannot make a fair game be advantageous
to us.
Theorem 5.2.2. With any uniformly bounded gambling policy, that is, |F∗n| ≤ M with probability 1 for some constant M, and an almost surely finite stopping time, our final fortune Fτ satisfies

EFτ = F0 if p = q = 1/2,   EFτ ≤ F0 if p ≤ q,   EFτ ≥ F0 if p ≥ q.
Proof. Note that Fn∗ → Fτ almost surely. We need to assume here that Fτ is simple, although it is not neces-
sary, as we will see later in integration theory. Under this assumption, we can use the bounded convergence
theorem (4.1.5) to get
EFn∗ → EFτ
as n → ∞.
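The conclusion EFτ = F0 in the fair case can be checked by simulation for the simplest bounded policy, namely betting one dollar until the fortune hits 0 or c (a sketch, not from the notes; this specific policy is an assumption of the example):

```python
# Gambler's ruin: with p = q = 1/2 the mean final fortune equals the starting fortune.
import random

random.seed(11)
def final_fortune(a, c, p):
    """Play +-1 bets starting from a until the fortune reaches 0 or c."""
    fortune = a
    while 0 < fortune < c:
        fortune += 1 if random.random() < p else -1
    return fortune

a, c, trials = 3, 10, 20_000
for p in (0.5, 0.45, 0.55):
    avg = sum(final_fortune(a, c, p) for _ in range(trials)) / trials
    print(f"p = {p}: E F_tau ~ {avg:.3f} (F_0 = {a})")
    # equals a for p = 0.5, falls below a for p < 0.5, exceeds a for p > 0.5
```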
Chapter 6
Measure Theory
Proposition 6.0.1. Suppose G ⊆ Rd is open. Then G is a countable disjoint union of half-open cubes.
Proof. Consider the half-open dyadic cubes

(i1/2^n, (i1+1)/2^n] × · · · × (id/2^n, (id+1)/2^n],

where n ∈ N, i1, · · · , id ∈ Z. For each n, the half-open dyadic cubes of side length 2^{−n} are pairwise disjoint and cover Rd. Also observe that each dyadic cube of side length 2^{−n} is contained in exactly one "parent" cube of side length 2^{−n+1}. Hence, given any two half-open dyadic cubes, either they are disjoint, or one of them is contained in the other (this is called the dyadic nesting property).
Suppose x ∈ (a, b) ⊆ R with a < b. Let n ∈ N be such that 1/2^n < min{x − a, b − x}. Take j ∈ Z such that x ∈ (j/2^n, (j+1)/2^n]. Then x − j/2^n ≤ 1/2^n < x − a =⇒ a < j/2^n, and similarly (j+1)/2^n − x < 1/2^n < b − x =⇒ (j+1)/2^n < b. Hence, x ∈ (j/2^n, (j+1)/2^n] ⊆ (a, b).
Hence, for each x ∈ G, there exists a half-open dyadic cube Qx such that x ∈ Qx ⊆ G. Then

⋃_{x∈G} Qx = G.

Let Q be the collection of "maximal" half-open dyadic cubes Qx with respect to set inclusion. Then, by the dyadic nesting property, the maximal cubes in Q are pairwise disjoint. Thus, G = ⋃_{Q∈Q} Q and, since Q is at most countable, we are done.
We can even choose the above rectangles to have rational endpoints, so that B is countably generated.
Definition 6.0.2. If (Ω, F) is a measurable space, then a function µ : F → [0, ∞] is a measure if
(1) µ(∅) = 0;
(2) µ(⊔_{j≥1} Aj) = Σ_{j≥1} µ(Aj) for any pairwise disjoint (Aj)j≥1 ⊆ F.
µ is finite if µ(Ω) < ∞; µ is σ-finite if there is an F-sequence (Aj)j≥1 such that Ω = ⋃_{j≥1} Aj and µ(Aj) < ∞ for each j.
Note that we can make (Aj)j≥1 increasing by setting Bj := ⋃_{k=1}^{j} Ak, or pairwise disjoint by the canonical decomposition. The tuple (Ω, F, µ) is a measure space.
Proposition 6.0.3. If (Ω, F, µ) is a measure space, then
(1) if A, B ∈ F with A ⊆ B, then µ(A) ≤ µ(B);
(2) if (Aj )j≥1 ⊆ F with Aj ↗ A, then µ(Aj ) → µ(A) as j → ∞;
(3) if (Aj )j≥1 ⊆ F with Aj ↘ A and µ(A1 ) < ∞, then µ(Aj ) → µ(A) as j → ∞;
(4) if (Aj)j≥1 ⊆ F, then µ(⋃_{j≥1} Aj) ≤ Σ_{j≥1} µ(Aj).
Theorem 6.0.4. Suppose (Ω, F) is a measurable space and µ, ν are two measures on F. If F = σ(F0 ) for
some algebra F0 , µ|F0 = ν|F0 , and µ, ν are σ-finite on (Ω, F0 ), then µ = ν.
Proof. Write Ω = ⋃_{j≥1} Bj with (Bj)j≥1 ⊆ F0, µ(Bj) < ∞ and Bj ↗ Ω. Define
Then for any ⊔_{j=1}^{N} Rj ∈ F0 with pairwise disjoint half-open rectangles Rj, we define

λn(⊔_{j=1}^{N} Rj) = Σ_{j=1}^{N} λn(Rj).

Note that λn is a measure on (Rn, F0) and λn is σ-finite on (Rn, F0). Thus, there exists a unique extension, namely the Carathéodory extension, λn on σ(F0) = BRn.
Remark. Since x 7→ x + α is a homeomorphism for all α ∈ Rn, BRn is translation invariant. Moreover, if we define λ_n^α(A) = λn(A + α) on BRn, then λ_n^α|F0 = λn|F0. Since λ_n^α, λn are σ-finite on F0, by 6.0.4, λ_n^α = λn, and therefore λn is translation invariant.
Proposition 6.1.1. The Lebesgue measure λn on (Rn, BRn) is unique in the sense that if ν is another translation-invariant measure on BRn that is finite on bounded sets, then ν = αλn for some α ≥ 0.
Proof. Let α = ν((0, 1]n). Then by translation invariance, ν and αλn agree on {(p, q]n : p < q, p, q ∈ Q}, which generates BRn. Then, since ν and αλn are both σ-finite, by 6.0.4, ν = αλn on BRn.
Corollary 6.1.1.1. Suppose T : Rn → Rn is a linear isomorphism. Then

λn(T A) = |det T| λn(A)

for all A ∈ BRn.
Proof. Define ν(A) = λn(T A) for all A ∈ BRn. Since T is invertible, T is a finite product of elementary matrices. Hence, we only need to prove the case when T is an elementary matrix.
Since ν is translation invariant on BRn (prove it by considering each type of elementary matrix separately) and finite on bounded sets, we have ν = αλn. Since ν((0, 1]n) = |det T|, α = |det T| and we are done.
Definition 6.1.2. Let (X, T ) be a topological space and (X, F, µ) be a measure space with F ⊇ T . We say
that µ is inner regular if
µ(A) = sup{µ(K) : K ⊆ A, K compact}
and outer regular if
µ(A) = inf{µ(G) : A ⊆ G, G open}.
µ is regular if µ is both inner and outer regular.
Proposition 6.1.3. λn is a regular measure.
Proof. Let A ∈ BRn be given. If λn(A) < ∞, then by the definition of λn, for any ϵ > 0, there exists an open set G ⊇ A (a countable union of open rectangles) such that λn(G) ≤ λn(A) + ϵ. Then we are done. For the general case, write A = ⋃_{k≥1} Ak with λn(Ak) < ∞ and the Ak pairwise disjoint, as λn is σ-finite. Then for any k ∈ N, there exists an open set Gk ⊇ Ak such that λn(Gk) ≤ λn(Ak) + ϵ/2^k. Thus, ⋃_{k≥1} Gk ⊇ A and λn(⋃_{k≥1} Gk) ≤ Σ_{k≥1} λn(Gk) ≤ Σ_{k≥1} λn(Ak) + ϵ = λn(A) + ϵ. Hence, λn is outer regular.
Let

C = {A ∈ BRn : ∀ϵ > 0, ∃ closed F ⊆ A, λn(A\F) < ϵ}.

Then C contains every closed set and C is closed under countable intersections by the ϵ/2^k-trick. Let A ∈ C. If λn(A) < ∞, for any ϵ > 0, there exist an open set G and a closed set F such that G ⊇ A ⊇ F and λn(G\F) < ϵ. Hence, λn((Rn\A)\(Rn\G)) = λn(G\A) ≤ λn(G\F) < ϵ, showing that Rn\A ∈ C. Therefore, C = BRn.
Finally, suppose A ∈ BRn. Let Ak = A ∩ (−k, k)^n and let ϵ > 0 be given. Then there exists a closed (hence compact) set Fk ⊆ Ak such that λn(Ak\Fk) < ϵ/2^{k+1}. Suppose λn(A) < ∞. Then

λn(A \ ⋃_{k≥1} Fk) ≤ Σ_{k≥1} λn(Ak\Fk) ≤ ϵ/2.

Since A \ ⋃_{k=1}^{N} Fk ↘ A \ ⋃_{k≥1} Fk as N → ∞, for N large enough,

λn(A \ ⋃_{k=1}^{N} Fk) < ϵ ⟺ λn(A) < λn(⋃_{k=1}^{N} Fk) + ϵ.
6.2 Distribution Functions
Definition 6.2.1. If µ is a Borel measure on R, the distribution function F for µ is defined by
It is clear that F is nondecreasing. If µ is finite, then µ((a, b]) = F (b) − F (a) and F is right-continuous.
Theorem 6.2.2. For each F : R → R that is nondecreasing and right-continuous, there exists exactly one Borel measure µ on R such that µ((a, b]) = F(b) − F(a) for all a ≤ b.
Proof. (1): Trivial. (2): Define F : Ω → [−∞, ∞]³ by F(ω) = (lim inf_j fj(ω), lim sup_j fj(ω), f(ω)). Then

{ω : fj(ω) → f(ω)} = F−1({(x, y, z) ∈ [−∞, ∞]³ : x = y = z}) ∈ F.
Theorem 6.3.3. Let (Ω, F) be a measurable space. Suppose f : Ω → R∗ is a measurable function. Then there exists a sequence of measurable simple functions (fk)k≥1 with fk : Ω → R such that |fk| ≤ |fk+1| and fk → f pointwise. Furthermore, if f is bounded, then fk → f uniformly.
Proof. We consider the dyadic partition of R: let

fk(x) = m/2^k,       if 0 ≤ f(x) ≤ k and m ∈ Z is such that f(x) ∈ [m/2^k, (m+1)/2^k);
fk(x) = (m+1)/2^k,   if −k ≤ f(x) < 0 and m ∈ Z is such that f(x) ∈ [m/2^k, (m+1)/2^k);
fk(x) = k,           if f(x) > k;
fk(x) = −k,          if f(x) < −k.
Definition 6.3.4. Suppose (Ω1, F1) and (Ω2, F2) are measurable spaces. If µ is a measure on F1 and f : Ω1 → Ω2 is a measurable function, we define the push-forward measure µf−1 on F2 by

(µf−1)(A) = µ(f−1(A)), A ∈ F2.
Then there exists a random variable X on some probability space (Ω, F, P) such that P(X ≤ x) = F (x).
Proof. Consider (Ω, F, P) = ((0, 1), B(0,1) , λ). If F is strictly increasing, we may define X = F −1 : (0, 1) → R
and therefore
P(X ≤ x) = P({ω ∈ (0, 1) : ω ≤ F (x)}) = P((0, F (x)]) = F (x).
For general F, we may define the "right continuous inverse" F−1 : (0, 1) → R by

F−1(ω) = inf{x ∈ R : ω ≤ F(x)}.

It is well-defined since for any ω ∈ (0, 1), {x ∈ R : ω ≤ F(x)} is nonempty because lim_{x→∞} F(x) = 1, and if inf{x ∈ R : ω ≤ F(x)} = −∞, then ω ≤ F(xn) for some xn → −∞, which implies that ω ≤ 0 since lim_{x→−∞} F(x) = 0, a contradiction.
Moreover, F−1(ω) ≤ x if and only if ω ≤ F(x), for all x ∈ R. Indeed, if ω ≤ F(x), then F−1(ω) = inf{x′ ∈ R : ω ≤ F(x′)} ≤ x. On the other hand, if F−1(ω) ≤ x, then for any n ∈ N, there exists yn ∈ R such that F(yn) ≥ ω and yn < x + 1/n. Since F is nondecreasing, ω ≤ F(yn) ≤ F(x + 1/n). Taking n → ∞, ω ≤ F(x) as F is right continuous. Hence, if we define X = F−1 : (0, 1) → R, then

P(X ≤ x) = P({ω ∈ (0, 1) : ω ≤ F(x)}) = F(x),
as required.
Remark. The “right continuous inverse” is called the quantile of F . In the case when F is strictly increasing,
F −1 coincides with the usual inverse.
Sometimes the quantile function is defined on [0, 1]. In this case, it takes values in R∗ . But the crucial
inequality F −1 (ω) ≤ x ⇐⇒ ω ≤ F (x) still holds.
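The quantile construction is exactly inverse transform sampling. A sketch (illustrative, not from the notes) for the exponential distribution, where F−1 has a closed form:

```python
# Inverse transform sampling: X = F^{-1}(U) with U uniform on (0, 1).
import math
import random

alpha = 2.0
F = lambda x: 1 - math.exp(-alpha * x) if x > 0 else 0.0   # exponential distribution function
quantile = lambda w: -math.log(1 - w) / alpha              # F^{-1} on (0, 1)

random.seed(5)
samples = [quantile(random.random()) for _ in range(200_000)]

# Empirical check that P(X <= x) = F(x) at a few points.
for x in (0.1, 0.5, 1.0, 2.0):
    emp = sum(s <= x for s in samples) / len(samples)
    print(x, round(emp, 3), round(F(x), 3))
```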
Example. Suppose Ω = {0, 1, · · · , n}, F = P(Ω). For p ∈ [0, 1], the Binomial random variable X : Ω → R with parameter p is a random variable defined on a probability space (Ω, F, P) with P({j}) = (n choose j) p^j (1 − p)^{n−j} such that X(j) = j.
Example. Suppose Ω = Z≥0, F = P(Ω). For λ > 0, the Poisson random variable X : Ω → R with parameter λ is a random variable defined on a probability space (Ω, F, P) with P({j}) = e^{−λ} λ^j / j!.
Note that the probability space is not important: if X : Ω → R is a random variable, then we can define X∗ : [0, 1] → R∗ by X∗(ω) = F_X^{−1}(U(ω)), where U : [0, 1] → R is the standard uniform random variable defined on ([0, 1], B[0,1], λ) with U(x) = x, FX is the distribution function of X, and F_X^{−1} is the quantile of FX. Hence,

P(X∗ ≤ x) = P(U ≤ FX(x)) = FX(x) = P(X ≤ x),

as P(U ≤ y) = λ([0, y]) = y for all y ∈ [0, 1].
Hence, we only care about the distribution, and we say that two random variables are equivalent if their
distribution functions are the same, or equivalently, their push forward measures are the same.
Example (exponential distribution). Let X be a random variable which is supposed to represent a waiting time for an event (e.g. a phone call). We assume that the waiting is memoryless, that is, for any x, y ≥ 0,

P(X > x + y | X > y) = P(X > x).

Also, we assume that P(X ≥ 0) = 1 (the waiting time is nonnegative). Then we have

P(X > x + y)/P(X > y) = P(X > x) ⟺ (1 − FX(x + y))/(1 − FX(y)) = 1 − FX(x).

Let G(x) = 1 − FX(x). Then G(x + y) = G(x)G(y) =⇒ G(x) = e^{αx} for some α ∈ R. Hence, FX(x) = 1 − e^{αx} for some α ∈ R, for all x > 0. Since FX(x) → 1 as x → ∞, α < 0, and we may replace α by −α with α > 0 (the case α = 0 is meaningless). Then

F(x) = 1 − e^{−αx} if x > 0;   F(x) = 0 if x ≤ 0.
This is also known as the exponential distribution (with parameter α > 0).
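The memoryless property can be checked empirically (a sketch, not from the notes; the parameter and the points x, y are arbitrary choices):

```python
# Empirical check of P(X > x + y | X > y) = P(X > x) for the exponential distribution.
import math
import random

alpha = 1.5
random.seed(9)
samples = [-math.log(1 - random.random()) / alpha for _ in range(500_000)]

x, y = 0.7, 1.2
tail = lambda t: sum(s > t for s in samples) / len(samples)
conditional = tail(x + y) / tail(y)       # empirical P(X > x + y | X > y)
print(round(conditional, 3), round(tail(x), 3), round(math.exp(-alpha * x), 3))
# all three numbers agree up to sampling error
```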
Chapter 7
Integration
Definition 7.0.3. Let (Ω, F, µ) be a measure space. Let f : Ω → [−∞, ∞] be measurable. Then we define

∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ,

where f^+ := max(f, 0) and f^− := max(−f, 0), as long as one of ∫ f^+ dµ, ∫ f^− dµ is finite. If both ∫ f^+ dµ and ∫ f^− dµ are finite, then we say that f is integrable.
(3) (Fatou's lemma) If (fj)j≥1 is a sequence of nonnegative measurable functions, then

∫ lim inf_{j→∞} fj dµ ≤ lim inf_{j→∞} ∫ fj dµ.

(4) If a, b ≥ 0, then

∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ.
where the limit exists since (∫ fj dµ)_{j≥1} is an increasing sequence. On the other hand, take a simple function 0 ≤ s ≤ f. For α ∈ (0, 1), we define

Ek = {fk ≥ αs}.

Since fk ↗ f, Ek ↗ Ω. Write s = Σ_{j=1}^{n} xj 1Aj with xj ≥ 0, µ(Aj) < ∞ and (Aj)_{j=1}^{n} disjoint (standard representation). Since

fk ≥ αs 1Ek,

by (1), it follows that

∫ fk dµ ≥ α Σ_{j=1}^{n} xj µ(Aj ∩ Ek) =⇒ lim_{k→∞} ∫ fk dµ ≥ α Σ_{j=1}^{n} xj µ(Aj).

Since α is arbitrary,

lim_{k→∞} ∫ fk dµ ≥ Σ_{j=1}^{n} xj µ(Aj) = ∫ s dµ.
We say that a property P holds µ-almost everywhere (µ-a.e.) if {ω ∈ Ω : P does not hold for ω} ∈ F
and µ({ω ∈ Ω : P does not hold for ω}) = 0.
Remark. Note that if f is integrable, then ∫ |f| dµ = ∫ f^+ dµ + ∫ f^− dµ < ∞. Conversely, if ∫ |f| dµ < ∞, then f is integrable.
Proposition 7.0.5. Let f, g be nonnegative and measurable.
(1) ∫ f dµ = 0 if and only if f = 0 µ-a.e.
(2) If ∫ f dµ < ∞, then f < ∞ µ-a.e.
(3) If f ≤ g µ-a.e., then ∫ f dµ ≤ ∫ g dµ.
(4) If f = g µ-a.e., then ∫ f dµ = ∫ g dµ.
Proof. (1): Suppose f = 0 µ-a.e. Let Z = {f > 0}. Then for any simple function s = Σ_{j=1}^{n} xj 1Aj with µ(Aj) < ∞ and xj > 0 such that s ≤ f, we have µ(Aj) = 0, since otherwise {f > 0} ⊇ {s > 0} ⊇ Aj with µ(Aj) > 0, a contradiction. Hence, ∫ s dµ = 0. Since s is arbitrary, ∫ f dµ = 0. Conversely, suppose ∫ f dµ = 0 but µ({f > 0}) > 0. Since {f > 0} = ⋃_{j≥1}{f > j^{−1}}, there exists j ≥ 1 such that µ({f > j^{−1}}) > 0. Hence, s = (1/j) 1_{{f>j^{−1}}} ≤ f and

∫ f dµ ≥ (1/j) µ({f > j^{−1}}) > 0,

a contradiction.
(2): Suppose µ({f = ∞}) > 0. Consider the simple functions sn = n 1_{{f=∞}}. Then f ≥ sn, so ∫ f dµ ≥ ∫ sn dµ = n µ({f = ∞}) → ∞ as n → ∞, a contradiction.
(3): Let Z = {f > g}. Then µ(Z) = 0. Let s = Σ_{j=1}^{n} xj 1Aj ≤ f be a simple function. Then s⋆ := Σ_{j=1}^{n} xj 1_{Aj\Z} ≤ g and

Σ_{j=1}^{n} xj µ(Aj) = Σ_{j=1}^{n} xj µ(Aj\Z) ≤ ∫ g dµ.
Proof. (1): Since f ≤ g µ-a.e., f^+ ≤ g^+ µ-a.e. and f^− ≥ g^− µ-a.e. Then the result follows by 7.0.5.
(2): Since

∫ |af + bg| dµ ≤ |a| ∫ |f| dµ + |b| ∫ |g| dµ < ∞,

af + bg is integrable. Then by 7.0.4,

∫ (f + g)^+ dµ + ∫ f^− dµ + ∫ g^− dµ = ∫ (f + g)^− dµ + ∫ f^+ dµ + ∫ g^+ dµ.

If c < 0, then (cf)^+ = max{cf, 0} = −c max{−f, 0} = −cf^− and (cf)^− = max{−cf, 0} = −c max{f, 0} = −cf^+. Hence,

∫ cf dµ = −c (∫ f^− dµ − ∫ f^+ dµ) = c ∫ f dµ.

(3):

|∫ f dµ| = |∫ f^+ dµ − ∫ f^− dµ| ≤ ∫ f^+ dµ + ∫ f^− dµ = ∫ |f| dµ.
Theorem 7.0.7 (Dominated Convergence Theorem). Suppose (fj)j≥1 and f, g are measurable and fj → f µ-a.e. If |fj| ≤ g and g is integrable, then fj and f are integrable and

∫ fj dµ → ∫ f dµ.
Proof. Since |fj| ≤ g a.e., fj is integrable. Moreover, as fj → f a.e., |f| ≤ g a.e. and f is integrable. Then we apply Fatou's lemma to g + fj and g − fj:

∫ lim inf_{j→∞} (g + fj) dµ ≤ lim inf_{j→∞} ∫ (g + fj) dµ,   ∫ lim inf_{j→∞} (g − fj) dµ ≤ lim inf_{j→∞} ∫ (g − fj) dµ.

Then

∫ (f + g) dµ = ∫ f dµ + ∫ g dµ ≤ ∫ g dµ + lim inf_{j→∞} ∫ fj dµ =⇒ ∫ f dµ ≤ lim inf_{j→∞} ∫ fj dµ

and

∫ (g − f) dµ ≤ ∫ g dµ − lim sup_{j→∞} ∫ fj dµ =⇒ ∫ f dµ ≥ lim sup_{j→∞} ∫ fj dµ.

Hence,

∫ fj dµ → ∫ f dµ.
Proposition 7.1.2. Suppose that f, g : Ω → [0, ∞] are nonnegative and measurable functions. If

∫_A f dµ = ∫_A g dµ
Proof. To get a contradiction, we may assume that µ({f > g}) > 0. Set Aj = {f > g + 1/j} ∩ {g ≤ j}. Then

0 < µ({f > g}) ≤ Σ_{j≥1} µ(Aj).

Then there exists some n such that µ(An) > 0. Since µ is σ-finite, there exists (Bj)j≥1 ⊆ F such that µ(Bj) < ∞ and Ω = ⋃_{j≥1} Bj. Let Cj = Bj ∩ An. Then

0 < (1/n) µ(An) = ∫ (1/n) 1_{An} dµ ≤ ∫ (f − g) 1_{An} dµ ≤ Σ_{j≥1} ∫_{Cj} (f − g) dµ,
a contradiction.
Theorem 7.1.3. If g is nonnegative and measurable, then

∫ g dν = ∫ gf dµ,

where f is the density function of ν relative to µ. For measurable g, g is integrable relative to ν if and only if gf is integrable relative to µ. In this case, the above formula holds.
Proof. We use the standard limiting argument: simple functions → nonnegative measurable function →
integrable functions.
If g = 1A for some A ∈ F, then

∫ g dν = ν(A) = ∫ 1A f dµ = ∫ gf dµ.
By linearity, the formula holds for simple functions. For a nonnegative measurable function g, there exist simple functions ψj ↗ g, and the formula holds by the MCT.
Suppose g is measurable. The formula holds for |g|, and thus gf is µ-integrable ⟺ g is ν-integrable. In this case, write g = g^+ − g^− and use linearity to conclude.
Theorem 7.1.4. Suppose T : Ω1 → Ω2 and f : Ω2 → [0, ∞] are measurable. Then

∫ f ◦ T dµ = ∫ f d(µT−1).
Sx (E) := {y ∈ Ω2 : (x, y) ∈ E} ∈ F2 .
Hence, Tx is measurable and therefore Sx (E) = Tx−1 (E) is measurable. Since fx = f ◦Tx , fx is measurable.
Definition 7.2.3. The product measure µ1 ⊗ µ2 on F1 ⊗ F2 is defined, for E ∈ F1 ⊗ F2, by

(µ1 ⊗ µ2)(E) = ∫ µ2(Sx(E)) dµ1(x).
For the above definition to make sense, we need to know that x 7→ µ2 (Sx (E)) is an F1 -measurable
function.
If E = A × B is a measurable rectangle, then

µ2(Sx(E)) = µ2(B) if x ∈ A, and µ2(Sx(E)) = 0 if x ∉ A; that is, µ2(Sx(E)) = µ2(B) 1A(x).
Then C is a λ-system containing all measurable rectangles, which form a π-system. Hence, by 1.3.9, C = F1 ⊗ F2. Therefore, x 7→ µ2(Sx(E)) is measurable for every E ∈ F1 ⊗ F2.
Also, for pairwise disjoint (Ej)j≥1 ⊆ F1 ⊗ F2, since Sx(Ej) = Tx−1(Ej), (Sx(Ej))j≥1 is pairwise disjoint and

(µ1 ⊗ µ2)(⊔_{j≥1} Ej) = ∫ µ2(Sx(⊔_{j≥1} Ej)) dµ1(x) = ∫ Σ_{j≥1} µ2(Sx(Ej)) dµ1(x) = Σ_{j≥1} (µ1 ⊗ µ2)(Ej).
Also,
Theorem 7.2.6 (Fubini's theorem). Suppose f : Ω1 × Ω2 → [0, ∞] is measurable and µ1, µ2 are σ-finite. If f is integrable, then

∫ f d(µ1 ⊗ µ2) = ∫ (∫ f(x, y) dµ1(x)) dµ2(y) = ∫ (∫ f(x, y) dµ2(y)) dµ1(x).
Proof. (1): Trivial. (2): (⇐) Trivial. (⇒) First assume that Y is simple; then we can write Y = Σ_{j=1}^{n} xj 1Bj with Bj ∈ σ(X) pairwise disjoint. Then by (1), there exist Borel Aj ⊆ Rk such that X−1(Aj) = Bj. Then we define f = Σ_{j=1}^{n} xj 1Aj, and therefore f is Borel measurable. Since

f ◦ X(ω) = Σ_{j=1}^{n} xj 1Aj(X(ω)) = Σ_{j=1}^{n} xj 1Bj(ω) = Y(ω),

we get Y = f ◦ X.
For general Y, we may find simple and measurable functions ψj → Y. For each j, we can find Borel fj : Rk → R such that ψj = fj ◦ X. Define
7.4 Independence
Proposition 7.4.1. Suppose X1, · · · , Xk : Ω → R are random variables. Then X1, · · · , Xk are independent if and only if the law of the random vector X = (X1, · · · , Xk) is the product measure µ = µ1 ⊗ · · · ⊗ µk, where µj is the law of Xj.
Proof. Suppose X1, · · · , Xk are independent. Then for any measurable rectangle A1 × · · · × Ak ⊆ Rk,

PX−1(A1 × · · · × Ak) = P(X ∈ A1 × · · · × Ak) = P(X1 ∈ A1, · · · , Xk ∈ Ak) = ∏_{j=1}^{k} P(Xj ∈ Aj) = ∏_{j=1}^{k} µj(Aj).
So the law of X agrees with µ on all measurable rectangles, and they are equal by 6.0.4.
Suppose the law of X is µ. Then
7.5 Convolution
Definition 7.5.1. If µ1, µ2 are two probability measures on BR, we define the convolution µ1 ∗ µ2 : BR → [0, ∞] by

(µ1 ∗ µ2)(B) = ∫ µ2(B − x) dµ1(x),
Proof of claim. By 6.3.5, there exist random variables X, Y with distributions µ1, µ2, respectively. Now on (R², BR², µ1 ⊗ µ2), we define g : (x, y) 7→ x + y. It is clear that g is measurable, and this defines a random variable X + Y with distribution µ1 ∗ µ2. Hence, µ1 ∗ µ2 is a probability measure.
Example. If X1, X2 are independent and have exponential distributions with the same parameter α, then their density functions are αe^{−αx} 1_{x≥0}. Hence, the density function of X1 + X2 is

h(x) = ∫ α² e^{−αy} e^{−α(x−y)} 1_{y≥0} 1_{x−y≥0} dy = α² e^{−αx} 1_{x≥0} ∫_0^x dy = α² x e^{−αx} 1_{x≥0}.

By induction, one can show that the density for X1 + · · · + Xk with Xj ∼ Exp(α) is

αe^{−αx} (αx)^{k−1}/(k − 1)! · 1_{x≥0}.
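The k-fold convolution formula can be compared against a histogram of simulated sums (a sketch, not from the notes; the parameters are arbitrary choices):

```python
# Sum of k iid Exp(alpha) variables vs the Gamma density alpha e^{-alpha x}(alpha x)^{k-1}/(k-1)!.
import math
import random

alpha, k = 2.0, 3
random.seed(13)
sums = [sum(-math.log(1 - random.random()) / alpha for _ in range(k))
        for _ in range(400_000)]

def gamma_density(x):
    return alpha * math.exp(-alpha * x) * (alpha * x) ** (k - 1) / math.factorial(k - 1)

# Compare a histogram estimate of the density with the formula on a few bins.
width = 0.25
for left in (0.5, 1.0, 1.5, 2.5):
    emp = sum(left <= s < left + width for s in sums) / (len(sums) * width)
    print(left, round(emp, 3), round(gamma_density(left + width / 2), 3))
```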
Then Yi is Fi -measurable. So (Yi )i≥1 is independent by 3.2.2. Let x ∈ [0, 1) be given. We define
Y_i^{(n)} = Σ_{j=1}^{n} X_{ij}/2^j.
Since x is arbitrary, Yi has distribution U ((0, 1]). Hence, there exists a sequence of iid uniform random
variables (Yi )i≥1 .
For µi , let Fi be its distribution function. Define Zi = Fi−1 ◦ Yi , where Fi−1 is the quantile of Fi . Then
Chapter 8
Expected Value
Definition 8.1.2. For k ≥ 0, the k-th moment of X is EX^k and the k-th absolute moment of X is E|X|^k (as long as they exist). We say that X has a k-th moment if E|X|^k < ∞. If X has a k-th moment, we can define the k-th central moment by E|X − EX|^k.
Note that 3.4.4 (and thus 3.4.5) holds for general random variables. Hence, if X has a j-th moment, then for k < j, let r = j/k and let r′ be its conjugate exponent. Then

E|X|^k ≤ ∥|X|^k∥_r = (E|X|^j)^{k/j} < ∞,
Theorem 8.2.2. Let X be a random variable and µ = PX−1. Then for any x1 < x2, we have

µ((x1, x2)) + (1/2) µ({x1}) + (1/2) µ({x2}) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−itx1} − e^{−itx2})/(it)) fX(t) dt.
Proof. Fix x1 < x2. Then for each T > 0, we have

∫_{−T}^{T} ((e^{−itx1} − e^{−itx2})/(it)) fX(t) dt = ∫_{−T}^{T} ((e^{−itx1} − e^{−itx2})/(it)) (∫_{−∞}^{∞} e^{itx} dµ(x)) dt = ∫_{−∞}^{∞} ∫_{−T}^{T} ((e^{−it(x1−x)} − e^{−it(x2−x)})/(it)) dt dµ(x).

Here we have used Fubini's theorem to change the order of integration. This is legal since

|((e^{−itx1} − e^{−itx2})/(it)) e^{itx}| ≤ |x2 − x1|

is integrable on [−T, T] × R (see the remark below).
Define

IT(x; x1, x2) = ∫_{−T}^{T} ((e^{−it(x1−x)} − e^{−it(x2−x)})/(it)) dt = 2 (∫_0^T (sin(t(x2 − x))/t) dt − ∫_0^T (sin(t(x1 − x))/t) dt).

Hence, as T → ∞, we have

lim_{T→∞} IT(x; x1, x2) = π(sgn(x2 − x) − sgn(x1 − x)) = π(sgn(x2 − x) + sgn(x − x1)).
Hence, we are done if we can take T → ∞ under the integral sign, which is valid since

0 ≤ ∫_0^y (sin u/u) du ≤ ∫_0^π (sin u/u) du

for all y ≥ 0 (in this case, |IT(x; x1, x2)| ≤ 4 ∫_0^π (sin u/u) du < ∞ and we can apply the dominated convergence theorem).
Let y ≥ 0. The case y ∈ [0, π] is trivial. If y ∈ [(2k − 1)π, (2k + 1)π] for some k ≥ 1, we have

∫_0^y (sin u/u) du ≥ ∫_0^{2kπ} (sin u/u) du = Σ_{ℓ=1}^{k} ∫_{2(ℓ−1)π}^{2ℓπ} (sin u/u) du = Σ_{ℓ=1}^{k} ∫_{(2ℓ−2)π}^{(2ℓ−1)π} sin u (1/u − 1/(u + π)) du ≥ 0.
Therefore,

|(e^{−ix1t} − e^{−ix2t})/(it)| ≤ |x2t − x1t|/|t| = |x2 − x1|.
Corollary 8.2.2.1. Let X and Y be two random variables. If they have the same characteristic function, then X ∼ Y.
Proof. Let µ1 = PX−1 and µ2 = PY−1. Let Dj = {x ∈ R : µj({x}) > 0} and D = D1 ∪ D2. Since X and Y have the same characteristic function, we have µ1((x1, x2)) = µ2((x1, x2)) for all x1 < x2 in R\D.
On the other hand, D1, D2 are both countable as µj(R) = 1. Hence, R\D is dense in R. Hence, µ1((a, b]) = µ2((a, b]) for all a < b. Therefore, by 6.0.4, µ1 = µ2 on BR and therefore X ∼ Y.
8.3 Moment Generating Function
Definition 8.3.1. For a random variable X, define

MX(t) = Ee^{tX}

as the moment generating function. Note that MX(t) can be ∞ for some t.
Property. The set of values of t such that MX(t) < ∞ is an interval containing 0 (it may be just {0}).
Proof. Suppose that MX(t) < ∞ for some t > 0. We claim that for any 0 ≤ s ≤ t, MX(s) < ∞.
Proof of claim. Note that e^{sX} = e^{sX} 1_{X≥0} + e^{sX} 1_{X<0}. Since

E(e^{sX} 1_{X≥0}) ≤ E(e^{tX} 1_{X≥0}) ≤ Ee^{tX} = MX(t) < ∞,   E(e^{sX} 1_{X<0}) ≤ E1 = 1 < ∞,

we are done.
for all t ∈ (−t0, t0). Therefore, EX^n = M_X^{(n)}(0) for all n ≥ 0.
Proof. We first show that X has all moments. Since et|X| ≤ etX + e−tX , we know that
ΦX (z) := EezX
SI := {z = t + is : t ∈ I, s ∈ R}
Proof. Since E|e^{zX}| = E(e^{tX} |e^{isX}|) = Ee^{tX} = MX(t) < ∞ for all z = t + is ∈ SI, ΦX(z) is well-defined on SI.
Let z0 = t0 + is0 ∈ SI be given. For z = t + is ∈ SI, note that

(ΦX(z) − ΦX(z0))/(z − z0) − E(Xe^{z0X}) = E(e^{z0X} (e^{(z−z0)X} − 1 − (z − z0)X)/(z − z0)).

When |z − z0| < η, we have

|e^{z0X} (e^{(z−z0)X} − 1 − (z − z0)X)/(z − z0)| = |e^{z0X} Σ_{n≥2} (z − z0)^{n−1} X^n / n!|
≤ e^{t0X} |X| Σ_{n≥2} |ηX|^{n−1}/(n − 1)!
≤ |X| (e^{(t0+η)X} + e^{(t0−η)X}).

If we choose η small enough so that t0 ± η ∈ I, by the remark above, E(|X| (e^{(t0+η)X} + e^{(t0−η)X})) < ∞.