Stochastic Processes: Stat433/833 Lecture Notes
Jiahua Chen
Department of Statistics and Actuarial Science
University of Waterloo
© Jiahua Chen
The textbook for this course is Probability and Random Processes by Grimmett and Stirzaker.
It is NOT essential to purchase the textbook. These course notes will be free to download for all registered students. We may not be able to cover all the material included.
Stat433 students will be required to work on fewer problems in assignments and exams.
The lecture notes reflect the instructor's still-immature understanding of this general topic, which was formulated after reading pieces of the following books. A request to reserve has been sent to the library for books with call numbers.
The references are in random order.
Chapter 1

Review and More
Definition 1.1
2. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\cup_{i=1}^{\infty} A_i \in \mathcal{F}$ (closed under countable union).
Example 1.1
♦
It can be shown that if A and B are two sets in a σ-field, then the resulting
sets of all commonly known operations between A and B are members of the
same σ-field.
2. $P(\Omega) = 1$;
3. $P(\cup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$ for any $A_i \in \mathcal{F}$, $i = 1, 2, \ldots$, which satisfy $A_i A_j = \emptyset$ whenever $i \neq j$.
In general, if $\{A_n\}$ is a monotone sequence of events with limit $A$, then
$$P(A) = \lim_{n\to\infty} P(A_n).$$
F (−∞) = 0, F (∞) = 1.
M (t) = E{exp(tX)}.
$$\phi(t) = E\{\exp(itX)\}.$$
For the hazard function below, we assume $P(X \geq 0) = 1$. Define
$$\lambda(x) = \lim_{\Delta \to 0+} \frac{P\{x \leq X < x + \Delta \mid X \geq x\}}{\Delta}.$$
Then
$$\lambda(x) = \frac{f(x)}{S(x)} = -\frac{d}{dx}\log S(x).$$
Hence, for $x \geq 0$,
$$S(x) = \exp\left\{-\int_0^x \lambda(t)\,dt\right\}.$$
Let us examine a few examples to illustrate the hazard function and its implications.
Example 1.2
$$\lambda(x) = \lambda\alpha(\lambda x)^{\alpha - 1}.$$
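As a quick numerical check (my addition, not from the notes; the parameter values are arbitrary), the Python sketch below integrates this Weibull hazard numerically and compares $\exp\{-\int_0^x \lambda(t)\,dt\}$ with the closed-form Weibull survival function $\exp\{-(\lambda x)^{\alpha}\}$:

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
lam, alpha = 0.5, 1.8

def hazard(x):
    # Weibull hazard: lambda * alpha * (lambda * x)^(alpha - 1)
    return lam * alpha * (lam * x) ** (alpha - 1)

x = 2.0
grid = np.linspace(1e-9, x, 100001)      # avoid 0, where the hazard may blow up
S_from_hazard = np.exp(-np.trapz(hazard(grid), grid))
S_direct = np.exp(-(lam * x) ** alpha)   # closed-form Weibull survival function
print(S_from_hazard, S_direct)           # the two values should agree closely
```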
Example 1.3
$$E\{Z I_A\} = E\{X I_A\}$$
for any functions $g_1$ and $g_2$. Under measure theory, we need to add a cosmetic condition that these two functions are measurable, plus that these expectations exist. More rigorously, the independence of two random variables is built on the independence of their σ-fields. It is easily seen that the σ-field generated by $g_1(X)$ is a sub-σ-field of that of $X$. One may then see why $g_1(X)$ is independent of $g_2(Y)$.
1. If $X$ is $\mathcal{G}$-measurable, then $E\{X \mid \mathcal{G}\} = X$.
2. If $\mathcal{G}_1 \subset \mathcal{G}_2$, then $E[E\{X \mid \mathcal{G}_2\} \mid \mathcal{G}_1] = E\{X \mid \mathcal{G}_1\}$.
3. If $X$ is independent of $\mathcal{G}$, then $E\{X \mid \mathcal{G}\} = E\{X\}$.
$$C \exp(-x A x^{\tau} - 2 b x^{\tau})$$
where $A$ is positive definite, and $C$ is a constant such that the density function has total mass 1. Note that $x$ and $b$ are vectors.
Being positive definite implies that there exists an orthogonal decomposition of $A$ such that
$$A = B \Lambda B^{\tau}$$
where $\Lambda$ is a diagonal matrix with all elements positive, and $B B^{\tau} = I$, the identity matrix.
With this knowledge, it is seen that
$$\int \exp(-\tfrac{1}{2} x A x^{\tau})\,dx = \int \exp(-\tfrac{1}{2} y \Lambda y^{\tau})\,dy = \frac{(2\pi)^{n/2}}{|A|^{1/2}}.$$
Hence when $b = 0$, the density function has the form
$$f(x) = [(2\pi)^{-n} |A|]^{1/2} \exp(-\tfrac{1}{2} x A x^{\tau}).$$
Let $X$ be a random vector with the density function given above, and let $Y = X + \mu$. Then the density function of $Y$ is given by
$$\{(2\pi)^{-n} |A|\}^{1/2} \exp\{-\tfrac{1}{2}(y - \mu) A (y - \mu)^{\tau}\}.$$
It is more convenient to use V = A−1 in most applications. Thus, we
have the definition as follows.
2. If $X$ has the standard normal distribution, then all its odd moments are zero, and its even moments are
$$E X^{2r} = (2r-1)(2r-3)\cdots 3 \cdot 1.$$
3. Let $\Phi(x)$ and $\phi(x)$ be the cumulative distribution function and the density function of the standard normal distribution. It is known that
$$\left(\frac{1}{x} - \frac{1}{x^3}\right)\phi(x) \leq 1 - \Phi(x) \leq \frac{1}{x}\phi(x)$$
for all positive $x$. In particular, when $x$ is large, these provide very accurate bounds.
4. If $X_1, \ldots, X_n$ are i.i.d. standard normal, then $\bar{X}_n$ and $S_n^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2$ are independent. Further, $\bar{X}_n$ has a normal distribution and $S_n^2$ has a chi-square distribution with $n-1$ degrees of freedom.
AΣAΣA = AΣA.
1.5 Summary
We did not intend to give you a full account of the probability theory learned in courses preceding this one. Nor did we try to lead you into the world of measure-theory-based, advanced, and widely regarded as useless, rigorous probability theory. Yet, we did introduce the concept of a probability space, and why a dose of σ-field is needed in doing so.
In the latter half of this course, we need the σ-field based definitions of independence and conditional expectation. It is also very important to get familiar with multivariate normal random variables. These preparations are not sufficient, but will serve as a starting point. Also, you may be happy to know that the pressure is not here yet.
Chapter 2

Simple Random Walk
(iii) Markov: $P(S_{m+n} = j \mid S_0, S_1, \ldots, S_m) = P(S_{m+n} = j \mid S_m)$.
Assume that we start with initial capital $S_0 = a$, and the game stops as soon as $S_n = 0$ or $S_n = N$ for some $n$, where $N \geq a \geq 0$. Under the conditions of the simple random walk, what is the probability that the game stops at $S_n = N$?
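Before deriving the answer, a small simulation (my addition; the parameter values are arbitrary) makes the question concrete. It estimates the probability of stopping at $N$ and compares it with the classical gambler's ruin formula $\{(q/p)^a - 1\}/\{(q/p)^N - 1\}$ for $p \neq q$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, a, N = 0.45, 5, 10          # arbitrary illustrative values, p != 1/2
q = 1 - p

def reaches_N(a, N, p):
    # Run one game until the walk hits 0 or N.
    s = a
    while 0 < s < N:
        s += 1 if rng.random() < p else -1
    return s == N

trials = 20000
estimate = np.mean([reaches_N(a, N, p) for _ in range(trials)])
exact = ((q / p) ** a - 1) / ((q / p) ** N - 1)   # classical gambler's ruin formula
print(estimate, exact)
```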
Property 2.1
Property 2.2
For a simple random walk, when $(n + a - b)/2$ is a positive integer, all sample paths from $(0, a)$ to $(n, b)$ have equal probability $p^{(n+b-a)/2} q^{(n+a-b)/2}$ to occur. In addition,
$$P(S_n = b \mid S_0 = a) = \binom{n}{\frac{n+b-a}{2}} p^{(n+b-a)/2} q^{(n+a-b)/2}.$$
♦
The proof is straightforward.
Example 2.2
It is seen that
$$P(S_{2n} = 0 \mid S_0 = 0) = \binom{2n}{n} p^n q^n$$
for $n = 0, 1, 2, \ldots$. This is the probability that the random walk returns to 0 at trial $2n$.
When $p = q = 1/2$, we have
$$u_{2n} = P(S_{2n} = 0 \mid S_0 = 0) = \binom{2n}{n} (1/2)^{2n}.$$
Using Stirling's approximation
$$n! \approx \sqrt{2\pi n}\,(n/e)^n$$
for large $n$ (the approximation is good even for small $n$), we have
$$u_{2n} \approx \frac{1}{\sqrt{n\pi}}.$$
Thus, $\sum_n u_{2n} = \infty$. By the renewal theorem to be discussed, $S_n = 0$ is a recurrent event.
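One can see numerically how good $u_{2n} \approx 1/\sqrt{n\pi}$ is; the small sketch below (my addition) computes the exact $u_{2n}$ through the log-gamma function (to avoid overflow) and prints both values:

```python
from math import lgamma, exp, log, pi, sqrt

def u(n):
    # u_{2n} = C(2n, n) (1/2)^{2n}, computed in log scale for stability
    return exp(lgamma(2 * n + 1) - 2 * lgamma(n + 1) - 2 * n * log(2))

for n in (1, 5, 10, 100, 1000):
    print(n, u(n), 1 / sqrt(n * pi))   # exact value vs Stirling approximation
```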
The number of paths from $(0,0)$ to $(n,b)$ that do not revisit the axis is
$$\frac{|b|}{n} N_n(0, b).$$
Proof: Notice that such a sample path has to start with a transition from $(0,0)$ to $(1,1)$. The number of paths from $(1,1)$ to $(n,b)$, with $b > 0$, which do not touch the x-axis is
$$N_{n-1}(1, b) - N_{n-1}^0(1, b) = N_{n-1}(1, b) - N_{n-1}(-1, b) = \binom{n-1}{\frac{n-b}{2}} - \binom{n-1}{\frac{n-b-2}{2}} = \frac{|b|}{n} N_n(0, b).$$
♦
What does this name suggest? Suppose that Michael and George are in a competition for some title. In the end, Michael wins by $b$ votes out of $n$ cast. If the votes are counted in a random order, and $A$ = "Michael leads throughout the count", then
$$P(A) = \frac{\frac{b}{n} N_n(0, b)}{N_n(0, b)} = \frac{b}{n}.$$
Theorem 2.1
If $b \neq 0$ and $S_0 = 0$, then
$$f_n(b) = P(S_1 \neq b, \ldots, S_{n-1} \neq b, S_n = b \mid S_0 = 0) = \frac{|b|}{n} P(S_n = b).$$
Proof: If we reverse the time, the sample paths become those that reach $b$ in $n$ trials without touching the x-axis. Hence the result. ♦
The next problem of interest is how high $S_i$ has climbed before it settles at $S_n = b$. We assume $b > 0$. In general, we often encounter problems of finding the distribution of the extreme value of a stochastic process. This is an extremely hard problem, but we have a kind of answer for the simple random walk.
Define $M_n = \max\{S_i, i = 1, \ldots, n\}$.
Theorem 2.3
$$P(M_n \geq r, S_n = b) = \begin{cases} P(S_n = b) & \text{if } b \geq r, \\ (q/p)^{r-b}\, P(S_n = 2r - b) & \text{if } b < r. \end{cases}$$
Hence,
$$P(M_n \geq r) = P(S_n \geq r) + \sum_{b=-\infty}^{r-1} (q/p)^{r-b} P(S_n = 2r - b) = P(S_n = r) + \sum_{c=r+1}^{\infty} [1 + (q/p)^{c-r}]\, P(S_n = c).$$
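The case $b < r$ of Theorem 2.3 is easy to check by brute force; the following sketch (my addition; arbitrary parameter values) simulates many walks and compares $P(M_n \geq r, S_n = b)$ with $(q/p)^{r-b} P(S_n = 2r - b)$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, r, b = 0.4, 20, 4, 2          # arbitrary values with b < r
q = 1 - p

steps = rng.choice([1, -1], size=(200000, n), p=[p, q])
paths = np.cumsum(steps, axis=1)
Sn = paths[:, -1]                   # terminal values S_n
Mn = paths.max(axis=1)              # running maxima M_n

lhs = np.mean((Mn >= r) & (Sn == b))
rhs = (q / p) ** (r - b) * np.mean(Sn == 2 * r - b)
print(lhs, rhs)                     # the two estimates should be close
```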
Thus, in general, the mean number of visits is less than 1. When $p = q = 0.5$, all states are recurrent. Therefore $\mu_b = 1$ for any $b$. ♦
Suppose a perfect coin is tossed until the first equalisation of the accumulated numbers of heads and tails. The gambler receives one dollar every time that the number of heads exceeds the number of tails by $b$. This fact results in comments that the "fair entrance fee" equals 1, independent of $b$.
My remark: how many of us think that this is against their intuition?
Theorem 2.4 Arc sine law for last visit to the origin.
Suppose that $p = q = 1/2$ and $S_0 = 0$. The probability that the last visit to 0 up to time $2n$ occurred at time $2k$ is given by
$$\alpha_{2n}(2k) = u_{2k} u_{2n-2k}.$$
♦
Proof: The probability in question is
♦
The proof shows that $\alpha_{2n}(2k) \approx \{\pi\sqrt{k(n-k)}\}^{-1}$, so that
$$P(T_{2n} \leq 2xn) \approx \sum_{k \leq xn} \frac{1}{\pi\sqrt{k(n-k)}} \approx \int_0^x \frac{1}{\pi[u(1-u)]^{1/2}}\,du = \frac{2}{\pi}\arcsin(\sqrt{x}).$$
That is, the limiting distribution function is $\frac{2}{\pi}\arcsin(\sqrt{x})$, with density $\{\pi[u(1-u)]^{1/2}\}^{-1}$ on $(0, 1)$.
(1) Since $p = q = 1/2$, one may think the balance of heads and tails should occur very often. Is it true? After $2n$ tosses with $n$ large, the chance that the walk never touches 0 after trial $n$ is 50%, which is surprisingly large.
(2) The last time before trial $2n$ when the simple random walk touches 0 should be closer to the end. This result shows that it is symmetric about the midpoint $n$: it is more likely to be at the beginning and near the end.
Why is it more likely at the beginning? Since the walk started at 0, it is likely to touch 0 again soon. Once it has wandered away from 0, touching 0 becomes less and less likely.
Why is it also likely to occur near the end? If for some reason the simple random walk returned to 0 at some point, it becomes more likely to visit 0 again in the near future. Thus, it pushes the most recent visit closer and closer to the end.
(3) If we count the number of $k$ such that $S_k > 0$, what kind of distribution does it have? Should it be more likely to be close to $n$? It turns out that it is either very small or very large.
Thus, if you gamble, you may win all the time, or lose all the time, even though the game is perfectly fair in the sense of probability theory.
Let us see how the result establishes (3).
Suppose that $p = 1/2$ and $S_0 = 0$. The probability that the walk spends exactly $2k$ intervals of time, up to time $2n$, to the right of the origin equals $u_{2k} u_{2n-2k}$. ♦
Proof: Call this probability $\beta_{2n}(2k)$. We are asked to show that $\beta_{2n}(2k) = \alpha_{2n}(2k)$.
We use mathematical induction. The first step is to consider the case when $k = n = m$.
In our previous proofs, we have shown that for the symmetric random walk, these sample paths belong to one of two possible groups: always above 0 or always below 0. Hence,
2.2 Summary
What have we learned in this chapter? One observation is that all sample paths starting and ending at the same locations have equal probability to occur. Making use of this fact, some interesting properties of the simple random walk are revealed.
One such property is that the simple random walk is null recurrent when $p = q$ and transient otherwise. Let us recall some detail: by counting sample paths, it is possible to provide an accurate enough approximation to the probability of entering 0.
The reflection principle allows us to count the number of paths going from one point to another without touching 0. This result is used to establish the Ballot Theorem. In the old days, there were enough nerds who believed that this result is intriguing.
We may not believe that the hitting time problem is a big deal. Yet when it is linked with stock prices, you may change your mind. If you find some neat estimates of such probabilities for very general stochastic processes, you will be famous. In this course, we provide such a result for the simple random walk. There is a similar result for Brownian motion, to be discussed.
Similarly, the arc sine law also has its twin in Brownian motion. Please do not go away and stay tuned.
Chapter 3

Generating Functions

$$G_X(s) = E(s^X)$$
Theorem 3.1
If $S = X_1 + X_2 + \cdots + X_N$, where the $X_i$ are i.i.d. with common generating function $G_X$ and $N \geq 0$ is an integer-valued random variable independent of the $X_i$ with generating function $G_N$, then $G_S(s) = G_N(G_X(s))$.
♦
The proof can be done by conditioning on $N$. When $N = 0$, we take $S = 0$.
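Numerically, the identity $G_S(s) = G_N(G_X(s))$ is easy to sanity check by simulation; in the sketch below (my addition), $N$ is Poisson and the $X_i$ are Bernoulli, both arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
s, mu, p = 0.7, 3.0, 0.4        # arbitrary illustrative values

N = rng.poisson(mu, size=200000)
# S = X_1 + ... + X_N with X_i ~ Bernoulli(p); binomial(N, p) has the same law.
S = rng.binomial(N, p)

lhs = np.mean(s ** S)            # Monte Carlo estimate of E(s^S)
GX = 1 - p + p * s               # Bernoulli generating function G_X(s)
rhs = np.exp(mu * (GX - 1))      # Poisson generating function evaluated at G_X(s)
print(lhs, rhs)                  # the two should be close
```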
Definition 3.2
Theorem 3.2
♦
The idea of a generating function also applies to a sequence of real numbers. If $\{a_n\}_{n=0}^{\infty}$ is a sequence of real numbers, then
$$A(s) = \sum_n a_n s^n$$
is its generating function. Let $c_n = \sum_{k=0}^{n} a_k b_{n-k}$ for $n = 0, 1, \ldots$. Then
$$C(s) = A(s) B(s)$$
where $A(s)$, $B(s)$ and $C(s)$ are the generating functions of $\{a_n\}_{n=0}^{\infty}$, $\{b_n\}_{n=0}^{\infty}$ and $\{c_n\}_{n=0}^{\infty}$ respectively.
because $f$ has the interpretation that $\lambda$ recurs at some time in the sequence. Since the event may not occur at all, it is possible for $f$ to be less than 1. Clearly, $1 - f$ represents the probability that $\lambda$ never recurs in the infinite sequence of trials. When $f < 1$, the probability that $\lambda$ occurs a finite number of times only is 1. Hence, we say that $\lambda$ is transient. Otherwise, it is recurrent.
For a recurrent renewal event, $F(s)$ is a probability generating function. The mean inter-occurrence time is
$$\mu = F'(1) = \sum_{n=0}^{\infty} n f_n.$$
$$u_n = f_0 u_n + f_1 u_{n-1} + \cdots + f_{n-1} u_1 + f_n u_0, \qquad n = 1, 2, \ldots.$$
Hence
$$U(s) = \frac{1}{1 - F(s)} \quad \text{or} \quad F(s) = 1 - \frac{1}{U(s)}.$$
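The relation $U(s) = 1/(1 - F(s))$ is equivalent to computing $u_n$ from $f_n$ by the convolution recursion above; a direct implementation (my sketch, with an arbitrary $f$-sequence) also illustrates the renewal theorem limit $u_n \to \mu^{-1}$:

```python
def renewal_u(f, n_max):
    # f[k] = probability of first occurrence at time k, with f[0] = 0.
    u = [1.0]                      # u_0 = 1 by convention
    for n in range(1, n_max + 1):
        u.append(sum(f[k] * u[n - k] for k in range(1, min(n, len(f) - 1) + 1)))
    return u

# Arbitrary example: first occurrence at time 1 or 2 with equal probability.
f = [0.0, 0.5, 0.5]
u = renewal_u(f, 20)
print(u[-1], 1 / 1.5)   # u_n approaches 1/mu, with mu = 1(0.5) + 2(0.5) = 1.5
```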
Theorem 3.3
The renewal event $\lambda$ is:
1. transient if and only if $u = \sum u_n = U(1) < \infty$;
4. null recurrent if and only if $\sum u_n = \infty$ and $u_n \to 0$ as $n \to \infty$.
For a recurrent $\lambda$, the mean inter-occurrence time is $\mu = \sum n f_n = F'(1)$, and in the aperiodic case
$$\lim_{n\to\infty} u_n = \mu^{-1}.$$
Suppose $\{u_n\}$ and $\{f_n\}$ satisfy
$$u_n = f_0 u_n + f_1 u_{n-1} + \cdots + f_n u_0$$
for all $n \geq 1$. Then $u_n \to \mu^{-1}$ as $n \to \infty$, where $\mu = \sum n f_n$ (and $\mu^{-1} = 0$ when $\mu = \infty$). ♦
Note this relation for $u_n$ does not apply to the case of $n = 0$; otherwise, the $u$-sequence would be a convolution of the $f$-sequence and itself.
The implication of being aperiodic is as follows. Let $A$ be the set of all integers $n$ for which $f_n > 0$, and denote by $A^+$ the set of all positive linear combinations
$$p_1 a_1 + p_2 a_2 + \cdots + p_r a_r$$
of numbers $a_1, \ldots, a_r$ in $A$.
Lemma 3.1
Lemma 3.2
Lemma 3.3
Let $\{w_n\}_{n=-\infty}^{\infty}$ be a doubly infinite sequence of numbers such that $0 \leq w_n \leq 1$ and
$$w_n = \sum_{k=1}^{\infty} f_k w_{n-k},$$
with $w_{-a} = 0$ whenever $a \in A$.
Using
$$w_n = \sum_{k=1}^{\infty} f_k w_{n-k} \leq \sum_k f_k = 1,$$
$$u_N = (1 - \rho_N) u_0 - \rho_{N-1} u_1 - \rho_{N-2} u_2 - \cdots - \rho_1 u_{N-1}.$$
$$\rho_N u_0 + \rho_{N-1} u_1 + \cdots + \rho_1 u_{N-1} + \rho_0 u_N = 1. \qquad (3.1)$$
Recall that
$$\lim_{j\to\infty} u_{n, v_j} = w_n = \eta$$
for each $n$. At the same time, $u_{n, v_j} = u_{v_j + n}$ is practically true for all $n$ (or more precisely for all large $n$). Let $N = v_j$, and let $j \to \infty$; then (3.1) implies
$$\eta \times \Big\{\sum_k \rho_k\Big\} = 1.$$
$$\rho_N u_0 + \rho_{N-1} u_1 + \cdots + \rho_1 u_{N-1} + \rho_0 u_N = 1$$
$$\eta\mu + \rho_0(\eta_0 - \eta) \geq 1.$$
$$S_n = X_1 + X_2 + \cdots + X_n,$$
where
$$P(X \leq 1) = 1.$$
$$f_b(n) = P(T_b = n) = \frac{b}{n} P(S_n = b).$$
Let us see why this is still true for the new random walk we have just
defined.
For this purpose, define
Fb (z) = E(z Tb )
We also define
$$G(z) = E(z^{1 - X_1}).$$
Since $X_1$ can be negative with large absolute value, working with the generating function of $1 - X_1$ makes sense. Since $1 - X_1$ is non-negative, $G$ is well defined for all $|z| \leq 1$. Please note that our $G$ here is a bit different from $G$ in the textbook.
The purpose of using the letter $z$, rather than $s$, is to allow $z$ to be a complex number. It seems that defining a function without a value at $z = 0$ calls for some attention. We will pretend $z$ is just a real number for illustration.
If we ignore the mathematical subtlety, our preparations boil down to having defined generating functions of $T_b$ and $X_1$. Due to the way $S_n$ is defined, the "generating function" of $S_n$ is determined by $[G(z)]^n$. Hence the required hitting time theorem may be obtained by linking $G(z)$ and $F_b(z)$.
This step turns out to be rather simple. It is seen that
$$F_b(z) = [F_1(z)]^b,$$
as the random walk cannot skip a state without landing on it. We take notice that this relationship works even when $b = 0$.
Further,
$$F_1(z) = E[E(z^{T_1} \mid X_1)] = E[z^{1 + T_{1 - X_1}}] = z E\{[F_1(z)]^{1 - X_1}\} = z G(F_1(z)).$$
Let $z = w/f(w)$, where $f(w)$ is an analytic function (has a derivative in the complex analysis sense) in a neighborhood of $w = 0$, with a one-to-one relationship between $z$ and $w$ in this neighborhood. If $g$ is infinitely differentiable (but I had the impression being analytic implies infinitely differentiable), then
$$g(w(z)) = g(0) + \sum_{n=1}^{\infty} \frac{z^n}{n!} \left[\frac{d^{n-1}}{du^{n-1}} \big(g'(u)\{f(u)\}^n\big)\right]_{u=0}.$$
Let us not forget that $[F_1(z)]^b$ is the "probability" generating function of $T_b$, and $G^n(u)/u^n = E(u^{-S_n})$ plays the role of the probability generating function of $S_n$.
Matching the coefficients of $z^n$, we have
$$n!\, P(T_b = n) = \left[\frac{d^{n-1}}{du^{n-1}} \big(b u^{b-1} G^n(u)\big)\right]_{u=0}.$$
Notice the latter is $(n-1)!$ times the coefficient of $u^{n-1}$ in the power expansion of $b u^{b-1} G^n(u)$, which in turn is the coefficient of $u^{n-b}$ in the power series expansion of $b G^n(u)$, which is the coefficient of $u^{-b}$ in the expansion of $b u^{-n} G^n(u)$.
Now, we point out that $u^{-n} G^n(u) = E u^{-S_n}$. That is, $(n-1)!$ times this coefficient is $b (n-1)!\, P(S_n = b)$. Hence we get the result. ♦
Theorem 3.6
Multiply both sides by $s^b t^n$ and sum over $b, n \geq 0$. The left hand side is
$$\sum_{n=0}^{\infty} t^n E(s^{M_n}).$$
Recall $\sum t^n P(T_1 > n) = \frac{1 - F_1(t)}{1 - t}$. The right hand side is, by the convolution relationship,
$$\sum_{b=0}^{\infty} s^b E(t^{T_b}) \frac{1 - F_1(t)}{1 - t} = \frac{1 - F_1(t)}{1 - t} \sum_{b=0}^{\infty} s^b [F_1(t)]^b = \frac{1 - F_1(t)}{(1 - t)(1 - s F_1(t))}.$$
Denote this function as $D(s, t)$.
The rest is mathematical manipulation. By the hitting time theorem,
$$n P(T_1 = n) = P(S_n = 1) = \sum_{j=0}^{n} P(T_1 = j) P(S_{n-j} = 0).$$
Thus, the generating function $[F_1(t)]^k U(t)$ is in fact the generating function for the sequence $P(S_n = k)$ in $n$. That is,
$$[F_1(t)]^k U(t) = \sum_{n=0}^{\infty} t^n P(S_n = k).$$
So, we get
$$\frac{\partial}{\partial t} \log D(s,t) = -\frac{\partial}{\partial t}\log(1-t) + \frac{\partial}{\partial t}\log[1 - F_1(t)] - \frac{\partial}{\partial t}\log[1 - s F_1(t)]$$
$$= \sum_{n=1}^{\infty} t^{n-1}\left(1 - \sum_{k=1}^{\infty} P(S_n = k) + \sum_{k=1}^{\infty} s^k P(S_n = k)\right)$$
$$= \sum_{n=1}^{\infty} t^{n-1}\left(P(S_n \leq 0) + \sum_{k=1}^{\infty} s^k P(S_n = k)\right)$$
$$= \sum_{n=1}^{\infty} t^{n-1} E(s^{S_n^+}).$$
$$H(s, t) = \sum_{n=0}^{\infty} t^{2n} P(S_{2n} = 0)\, G_{2n}(s).$$
Conditioning on $T_0$, we have
$$G_{2n}(s) = \sum_{r=1}^{n} E[s^{L_{2n}} \mid S_{2n} = 0, T_0 = 2r]\, P(T_0 = 2r \mid S_{2n} = 0) = \sum_{r=1}^{n} \frac{1}{2}(1 + s^{2r})\, G_{2n-2r}(s)\, P(T_0 = 2r \mid S_{2n} = 0).$$
Also
$$P(T_0 = 2r \mid S_{2n} = 0) = \frac{P(T_0 = 2r)\, P(S_{2n-2r} = 0)}{P(S_{2n} = 0)}.$$
Hence,
$$H(s,t) - 1 = \frac{1}{2} H(s,t)\,[F_0(t) + F_0(st)].$$
Recall that
$$F_0(t) = 1 - (1 - 4pq t^2)^{1/2};$$
with $p = q = 1/2$, one obtains
$$H(s,t) = \frac{2}{\sqrt{1 - t^2} + \sqrt{1 - s^2 t^2}} = \frac{2[\sqrt{1 - s^2 t^2} - \sqrt{1 - t^2}]}{t^2(1 - s^2)} = \sum_{n=0}^{\infty} t^{2n} P(S_{2n} = 0) \left(\frac{1 - s^{2n+2}}{(n+1)(1 - s^2)}\right).$$
3.4 Branching Process
Consider a branching process in which each family size has generating function $G(s)$. Let $\mu$ and $\sigma^2$ be the mean and variance of the family size. Let the population size of the $n$th generation be called $Z_n$.
It is known that
$$E[Z_n] = \mu^n, \qquad Var(Z_n) = \frac{\sigma^2(\mu^n - 1)}{\mu - 1}\,\mu^{n-1}.$$
These relations can be derived from the identity
$$G_{n+1}(t) = G_n(G(t)) = G(G_n(t)),$$
where $G_n(t) = E(t^{Z_n})$. The same identity also implies that the probability of ultimate extinction $\eta = \lim P(Z_n = 0)$ is the smallest non-negative solution of the equation
$$x = G(x).$$
Because of this, it is known that:
1. if $\mu < 1$, $\eta = 1$;
2. if $\mu > 1$, $\eta < 1$;
3. if $\mu = 1$ and $\sigma^2 > 0$, then $\eta = 1$;
4. if $\mu = 1$ and $\sigma^2 = 0$, then $\eta = 0$.
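Since $P(Z_n = 0) = G_n(0)$ and $G_{n+1}(0) = G(G_n(0))$, the extinction probability can be computed by fixed-point iteration. A sketch (my addition) with a geometric family-size distribution, an arbitrary choice whose root is known in closed form:

```python
def G(x, p=0.3):
    # Generating function of a geometric family size: P(Z = k) = p(1-p)^k.
    return p / (1 - (1 - p) * x)

x = 0.0
for _ in range(1000):
    x = G(x)            # G_{n+1}(0) = G(G_n(0)) increases to eta
print(x)                # extinction probability eta

# Here mu = (1-p)/p = 7/3 > 1, so eta < 1; the smallest root is p/(1-p) = 3/7.
```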
3.5 Summary
The biggest deal of this chapter is the proof of the Renewal Theorem. The proof is not much related to generating functions at all. One may learn a lot from this proof. At the same time, it is okay to choose to ignore the proof completely.
One should be able to see the beauty of generating functions when handling problems related to the simple random walk and the branching process. At the same time, none of what was discussed should be new to you. You may realize that the two very short sections on the simple random walk and the branching process are rich in content. Please take the opportunity to plant this knowledge firmly in your brain if it was not so before.
Chapter 4

Discrete Time Markov Chain
When this transition probability does not depend on $n$, we say that the Markov chain is time homogeneous.
The Markov chains discussed in this course will be assumed time homogeneous unless otherwise specified. In this case, we use the notation $p_{ij} = P(X_{n+1} = j \mid X_n = i)$, and $P$ for the matrix with the $(i,j)$th entry being $p_{ij}$.
It is known that all entries of the transition matrix are non-negative, and its row sums are all 1.
Further, let $P^{(m)}$ be the $m$-step transition matrix defined by
$$p_{ij}^{(m)} = P(X_m = j \mid X_0 = i).$$
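By the Chapman-Kolmogorov equations, $P^{(m)}$ is simply the $m$th matrix power of $P$; a two-line check (my addition, with an arbitrary two-state chain):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # arbitrary transition matrix

P5 = np.linalg.matrix_power(P, 5)     # 5-step transition matrix P^(5)
print(P5)
print(P5.sum(axis=1))                 # row sums remain 1
```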
Since these random variables take only two possible values, having correlation 0 implies independence. All other non-neighbouring pairs of random variables are also independent of each other by definition.
Thus,
$$P(X_{m+n} = j \mid X_n = i) = P(X_{m+n} = j) = \frac{1}{2}$$
for any $i, j = \pm 1$. Thus, the $m$-step transition matrix is
$$P^{(m)} = P = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}.$$
If $X_n \in G$ for some $n$, then the values of $X_{n+m}$ are all in $G$ for $m \geq 0$. A class $G$ is closed when it has this property. Notice that given $X_n \in G$ for some $n$, the Markov chain $\{X_n, X_{n+1}, \ldots\}$ effectively has $G$ as its state space. Thus, the state space is reduced when $G$ is a proper subset of the state space.
In contrast to a closed class, if there exist states $i \in G$ and $j \notin G$ such that $p_{ij} > 0$, then the class is said to be open.
♦
Due to the fact that entering a state is a renewal event, some properties of renewal events can be translated easily here. Let
$$P_{ij}(s) = \sum_n s^n p_{ij}(n),$$
and let $f_{ij}(n)$ be the probability of entering $j$ from $i$ for the first time at time $n$. Let its corresponding generating function be $F_{ij}(s)$. We write
$$f_{ij} = \sum_n f_{ij}(n)$$
for the probability that the chain ever enters state $j$ starting from $i$. Note that $f_{ij} = F_{ij}(1)$.
Lemma 4.1
Proof: The first property is the renewal equation. The second one is the
delayed renewal equation. ♦
Based on these two equations, it is easy to show the following.
Theorem 4.1
♦
Proof: Using the generating function equations.
Theorem 4.2
Since $i \leftrightarrow j$, there exist $m_1$ and $m_2$ such that $p_{ij}(m_1) > 0$ and $p_{ji}(m_2) > 0$. Hence, $p_{jj}(m + m_1 + m_2) \geq p_{ji}(m_2)\, p_{ii}(m)\, p_{ij}(m_1)$. Consequently,
$$\sum_m p_{jj}(m) \geq \sum_m p_{jj}(m + m_1 + m_2) \geq p_{ji}(m_2)\Big\{\sum_m p_{ii}(m)\Big\} p_{ij}(m_1) = \infty.$$
That is, $j$ is also a recurrent state.
The above proof also implies that if $i$ is transient, $j$ cannot be recurrent. Hence, $j$ is also transient.
We delegate the positive recurrence proof to the future.
At last, we show that $i$ and $j$ must have the same period.
Define $T_i = \{n : p_{ii}(n) > 0\}$ and similarly $T_j$. Let $d_i$ and $d_j$ be the periods of states $i$ and $j$. Note that if $n_1, n_2 \in T_i$, then $a n_1 + b n_2 \in T_i$ for any positive integers $a$ and $b$. Since $d_i$ is the greatest common divisor of $T_i$, there exist integers $a_1, \ldots, a_m$ and $n_1, \ldots, n_m \in T_i$ such that
$$a_1 n_1 + a_2 n_2 + \cdots + a_m n_m = d_i.$$
By grouping positive and negative coefficients, it is easy to see that the number of terms $m$ can be reduced to 2. Thus, we assume that it is possible to find $a_1, a_2$ and $n_1, n_2$ such that
$$a_1 n_1 + a_2 n_2 = d_i.$$
We can then further pick non-negative coefficients such that
$$a_{11} n_1 + a_{12} n_2 = k d_i, \qquad a_{21} n_1 + a_{22} n_2 = (k+1) d_i$$
for some positive integer $k$.
Let $m_1$ and $m_2$ be the numbers of steps in which the chain can go from state $i$ to state $j$ and return. Then both
$$m_1 + m_2 + k d_i,\quad m_1 + m_2 + (k+1) d_i \in T_j.$$
Thus, we must have that $d_j$ divides $d_i$. The reverse is also true by symmetry. Hence $d_i = d_j$. ♦
Lemma 4.2
If the state space is finite, then at least one state is recurrent and all recurrent states are positive recurrent. ♦
Proof: Since the rows of $P^n$ sum to 1, we cannot have $p_{ij}(n) \to 0$ for all $j$ when the state space is finite. Hence, at least one of the states is recurrent. ♦
Intuitively, the last conclusion implies that there is always a closed class of recurrent states for a Markov chain with a finite state space. Since the Markov chain cannot escape from the finite class, the average waiting time for the next visit will be finite. Thus, at least one of the states is recurrent and hence positive recurrent.
We skip the rigorous proof for now.
As $n$ increases, it is possible that the distribution stabilizes and hence has a limit. This limit turns out to exist in many cases, and it is related to the so-called stationary distribution.
Definition 4.1
$$\beta(n) = \alpha P^n.$$
Lemma 4.3
If the Markov chain is irreducible and recurrent, there exists a positive root $x$ of the equation $x = xP$, which is unique up to a multiplicative constant. The chain is non-null if $\sum_i x_i < \infty$, and null if $\sum_i x_i = \infty$. ♦
Proof: Assume the chain is recurrent and irreducible. For any states $k, i \in S$, let
$$N_{ik} = \sum_n I(X_n = i, T_k \geq n)$$
with $T_k$ being the time of the first return to state $k$. Note that $T_k$ is a well defined random variable because $P(T_k < \infty) = 1$.
Define ρi (k) = E[Nik |X0 = k] which is the mean number of visits of the
chain to state i between two successive visits of state k.
By the way, if $j$ is recurrent, then $\mu_j = E[T_j]$ is called the mean recurrence time. It is allowed to be infinity when the summation in the definition of the expectation diverges.
It will be seen that the vector $\rho(k)$, with $\rho_i(k)$ as its $i$th component, is a base for finding the stationary distribution.
We first show that the $\rho_i(k)$ are finite for all $k$. Let
$$l_{ki}(n) = P(X_n = i, T_k \geq n \mid X_0 = k),$$
the probability that the chain reaches $i$ in $n$ steps with no intermediate return to the starting point $k$.
With this definition, we see that
$$f_{kk}(m + n) \geq l_{ki}(m) f_{ik}(n),$$
in which the right hand side covers some of the sample paths taking a roundtrip from $k$ back to $k$ in $m + n$ steps, without visiting $k$ in intermediate steps. Because the chain is irreducible, there exists $n$ such that $f_{ik}(n) > 0$. Now sum over $m$ on both sides to get
$$\sum_m f_{kk}(m + n) \geq f_{ik}(n) \sum_m l_{ki}(m).$$
Since $\sum_m f_{kk}(m + n) < \infty$ and $f_{ik}(n) > 0$, we get $\sum_m l_{ki}(m) < \infty$. That is, $\rho_i(k) < \infty$.
Now we move to the next step. Note that $l_{ki}(1) = p_{ki}$, the one-step transition probability. Further,
$$l_{ki}(n) = \sum_{j: j \neq k} P(X_n = i, X_{n-1} = j, T_k \geq n \mid X_0 = k) = \sum_{j: j \neq k} l_{kj}(n-1)\, p_{ji}.$$
The above lemma shows why we did not pay much attention to when a Markov chain is positive recurrent. Once we find a suitable solution to $x = xP$, we know immediately whether it is positive recurrent or not.
The next theorem summarizes these results to give a conclusion on the existence of the stationary distribution.
Theorem 4.3
$$\begin{aligned} P(T_j \geq n, X_0 = j) &= P(X_0 = j, X_m \neq j \text{ for all } 1 \leq m \leq n-1) \\ &= P(X_m \neq j \text{ for } 1 \leq m \leq n-1) - P(X_m \neq j \text{ for } 0 \leq m \leq n-1) \\ &= P(X_m \neq j \text{ for } 0 \leq m \leq n-2) - P(X_m \neq j \text{ for } 0 \leq m \leq n-1) \\ &= a_{n-2} - a_{n-1}. \end{aligned}$$
We show that the sequence $g_{ij}(n)$ satisfies
$$g_{ij}(n) = \frac{x_j}{x_i}\, l_{ij}(n).$$
Let us examine the expressions of $g_{ij}$ and $l_{ji}$. One is the probability of all the paths which start from $j$ and end up at state $i$ before they ever visit $j$ again. The other is the probability of all the paths which start from $i$ and end up at state $j$ without being there in between. If we reverse the time of the second, then we are working with the same set of sample paths.
For each sample path corresponding to $l_{ji}(n)$, let us denote it as
$$j, k_1, k_2, \ldots, k_{n-1}, i.$$
The reversed sample path for $g_{ij}(n)$ is $i, k_{n-1}, \ldots, k_2, k_1, j$, and its probability of occurrence is
Theorem 4.4
Remarks:
1. If the chain is transient or null recurrent, then it is known that $p_{ij}(n) \to 0$ for all $i$ and $j$. Since $\mu_j = \infty$ in this case, the theorem is automatically true.
2. If the chain is positive recurrent, then according to this theorem, $p_{ij}(n) \to \pi_j = \mu_j^{-1}$.
3. This theorem implies that the limit of $p_{ij}(n)$ does not depend on $i$. It further implies that
$$P(X_n = j) \to \mu_j^{-1}$$
as $n \to \infty$.
Proof:
Case I: the Markov chain is transient. In this case, the theorem is true by the renewal theorem.
Case II: the Markov chain is recurrent. We could use the renewal theorem, yet let us see another line of approach.
Let $\{X_n\}$ be the Markov chain with the transition probability matrix $P$ under consideration. Let $\{Y_n\}$ be an independent Markov chain with the same state space $S$ and the same transition matrix $P$ as $\{X_n\}$.
Now we consider the stochastic process $\{Z_n\}$ with $Z_n = (X_n, Y_n)$, $n = 0, 1, 2, \ldots$. Its state space is $S \times S$. Its transition probabilities are simply products of the original transition probabilities. That is,
$$p_{(i,j),(k,l)}(n) = p_{ik}(n)\, p_{jl}(n).$$
The new chain is still irreducible: by aperiodicity, $p_{ik}(n) > 0$ and $p_{jl}(n) > 0$ for all sufficiently large $n$, and so $(k, l)$ is reachable from $(i, j)$.
The new chain is still aperiodic: a standard result states that if the period is one, then $p_{ij}(n) > 0$ for all sufficiently large $n$. Thus, $p_{ij}(n) p_{kl}(n) > 0$ for all large enough $n$ too.
Case II.1: positive recurrent. In this case, $\{X_n\}$ has stationary distribution $\pi$, and hence $\{Z_n\}$ has stationary distribution given by $\{\pi_i \pi_j : i, j \in S\}$. Thus by the lemma in the last section, $\{Z_n\}$ is also positive recurrent.
Assume $(X_0, Y_0) = (i, j)$ for some $(i, j)$. For any state $k$, define the coupling time $T = \min\{n \geq 1 : X_n = Y_n = k\}$. Hence
$$|p_{ik}(n) - p_{jk}(n)| \leq P(T > n) \to 0$$
as $n \to \infty$. That is,
$$p_{ik}(n) - p_{jk}(n) \to 0,$$
so if the limit of $p_{jk}(n)$ exists, it does not depend on $j$. (Remark: $P(T > n) \to 0$ is a consequence of recurrence.)
For the existence, we have
$$\pi_k - p_{jk}(n) = \sum_{i \in S} \pi_i (p_{ik}(n) - p_{jk}(n)) \to 0.$$
$$p_{ij}(n_r) \to \alpha_j.$$
Letting $r \to \infty$, we have
$$\sum_{k \in F} \alpha_k p_{kj} \leq \sum_{k \in S} p_{ik}\, \alpha_j = \alpha_j.$$
Theorem 4.5
Corollary 4.2
Let
$$\tau_{ij}(n) = \frac{1}{n}\sum_{m=1}^{n} p_{ij}(m)$$
be the mean proportion of elapsed time up to the $n$th step during which the chain was in state $j$, starting from $i$. If $j$ is aperiodic, then
$$\tau_{ij}(n) \to \mu_j^{-1}$$
as $n \to \infty$. ♦
4.4 Reversibility
Let $\{X_n : n = 0, 1, 2, \ldots, N\}$ be (part of) a Markov chain whose transition probability matrix $P$ is irreducible and positive recurrent, so that $\pi$ is its stationary distribution.
Now let us define $Y_n = X_{N-n}$ for $n = 0, 1, \ldots, N$. Suppose each $X_n$ has distribution given by the stationary distribution $\pi$. It can be shown that $\{Y_n\}$ is also a Markov chain.
Theorem 4.6
The sequence $Y = \{Y_n : n = 0, 1, \ldots, N\}$ is a Markov chain with transition probabilities
$$q_{ij} = P(Y_{n+1} = j \mid Y_n = i) = \frac{\pi_j}{\pi_i}\, p_{ji}.$$
♦
Proof: We need only verify the Markov property; the other conditions for a Markov chain are obvious.
$$\begin{aligned} P(Y_{n+1} = i_{n+1} \mid Y_k = i_k, k = 0, \ldots, n) &= P(Y_k = i_k, k = 0, \ldots, n+1)/P(Y_k = i_k, k = 0, \ldots, n) \\ &= P(X_{N-k} = i_k, k = 0, \ldots, n+1)/P(X_{N-k} = i_k, k = 0, \ldots, n) \\ &= \frac{P(X_{N-n-1} = i_{n+1})\, P(X_{N-n} = i_n \mid X_{N-n-1} = i_{n+1})}{P(X_{N-n} = i_n)} \\ &= \frac{\pi_{i_{n+1}} p_{i_{n+1}, i_n}}{\pi_{i_n}}. \end{aligned}$$
Since this transition probability does not depend on $i_k$, $k = 0, \ldots, n-1$, the Markov property is verified. ♦
Although Y in the above theorem is a Markov chain, it is not the same
as the original Markov chain.
Definition 4.1
Let $\{X_n : 0 \leq n \leq N\}$ be an irreducible Markov chain such that $X_n$ has stationary distribution $\pi$ for all $n$. The chain is called reversible if the transition matrices of $X$ and its time-reversal $Y$ are the same, which is to say that
$$\pi_i p_{ij} = \pi_j p_{ji}$$
for all $i, j$. ♦
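Reversibility is easy to test numerically: the matrix with entries $\pi_i p_{ij}$ must be symmetric. A sketch (my addition) with an arbitrary birth-death type chain, which is always reversible:

```python
import numpy as np

# Arbitrary reversible (birth-death) transition matrix on {0, 1, 2}.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

D = pi[:, None] * P          # D_{ij} = pi_i * p_{ij}
print(pi @ P - pi)           # ~0: stationarity pi = pi P
print(D - D.T)               # ~0: detailed balance pi_i p_{ij} = pi_j p_{ji}
```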
Theorem 4.7
Chapter 5

Continuous Time Markov Chain

Definition 5.1
(b)
$$P\{N(t + h) = n + m \mid N(t) = n\} = \begin{cases} \lambda h + o(h) & \text{if } m = 1, \\ o(h) & \text{if } m > 1, \\ 1 - \lambda h + o(h) & \text{if } m = 0. \end{cases}$$
(c) if $s < t$, the random variable $N(t) - N(s)$ is independent of $N(s)$.
♦
In general, we call (b) the individuality property, and (c) the independence property of the Poisson process. It is well known that these specifications imply that $N(t) - N(s)$ has a Poisson distribution with mean $\lambda(t - s)$ for $s < t$.
Theorem 5.1
$N(t)$ has the Poisson distribution with parameter $\lambda t$; that is to say,
$$P(N(t) = j) = \frac{(\lambda t)^j}{j!}\exp(-\lambda t), \qquad j = 0, 1, 2, \ldots.$$
♦
Proof: Denote $p_j(t) = P(N(t) = j)$. The properties of the Poisson process lead to the equation
$$p_j'(t) = \lambda p_{j-1}(t) - \lambda p_j(t)$$
for $j \neq 0$; likewise
$$p_0'(t) = -\lambda p_0(t).$$
With the boundary condition $p_j(0) = I(j = 0)$, the equations can be solved and the conclusion proved. ♦
We may also view a counting process by recording the arrival time of the $n$th event. For that purpose, we define $T_n$ as the time of the $n$th arrival, and
$$X_n = T_n - T_{n-1}.$$
Theorem 5.2
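The content of Theorem 5.2 is the classical fact that the interarrival times $X_n$ are i.i.d. exponential with rate $\lambda$. The sketch below (my addition; arbitrary $\lambda$ and $t$) builds $N(t)$ from exponential interarrivals and checks that its mean and variance match the Poisson value $\lambda t$:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, reps = 2.0, 5.0, 100000     # arbitrary rate, horizon, replications

# Interarrival construction: N(t) = number of partial sums T_n <= t.
X = rng.exponential(1 / lam, size=(reps, 40))   # 40 arrivals suffice here w.h.p.
T = np.cumsum(X, axis=1)
N = (T <= t).sum(axis=1)

print(N.mean(), N.var(), lam * t)   # mean and variance should both be near lam*t
```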
Definition 5.2
(c) if $s < t$, the random variable $N(t) - N(s)$ is independent of $N(s)$.
♦
Here is a list of special cases:
The differential equations we derived for the Poisson process can easily be generalized. We can find two basic sets of them. Define $p_{ij}(t) = P(N(s+t) = j \mid N(s) = i)$. The boundary conditions are $p_{ij}(0) = \delta_{ij} = I(i = j)$.
Forward system of equations:
$$p_{ij}'(t) = \lambda_{j-1}\, p_{i,j-1}(t) - \lambda_j\, p_{ij}(t)$$
for $j \geq i$.
Backward system of equations:
$$p_{ij}'(t) = \lambda_i [p_{i+1,j}(t) - p_{ij}(t)]$$
for $j \geq i$.
The forward equation can be obtained by computing the probability of
N (t + h) = j conditioning on N (t) = i. The backward equation is obtained
by computing the probability of N (t + h) = j conditioning on N (h) = i.
Theorem 5.3
The forward system has a unique solution, which satisfies the backward sys-
tem. ♦
Proof: First, it is seen that $p_{ij}(t) = 0$ whenever $j < i$. When $j = i$, $p_{ii}(t) = \exp(-\lambda_i t)$ is the solution. Substituting into the forward equation, we obtain the solution for $p_{i,i+1}(t)$. Repeating this procedure shows that the forward system has a unique solution.
Using the Laplace transformation reveals the structure of the solution better. Define
$$\hat{p}_{ij}(\theta) = \int_0^{\infty} \exp(-\theta t)\, p_{ij}(t)\,dt.$$
Then, the forward system becomes
$$(\theta + \lambda_j)\hat{p}_{ij}(\theta) = \delta_{ij} + \lambda_{j-1}\hat{p}_{i,j-1}(\theta)$$
for $j > i$. The uniqueness is determined by the inversion theorem for Laplace transforms.
If the $\pi_{ij}(t)$'s solve the backward system, their corresponding Laplace transforms will satisfy
$$(\theta + \lambda_i)\hat{\pi}_{ij}(\theta) = \delta_{ij} + \lambda_i \hat{\pi}_{i+1,j}(\theta).$$
It turns out that those satisfying the forward system will also satisfy the backward system here in Laplace transforms. Thus, the solution to the forward equation is also a solution to the backward equation. ♦
An implicit conclusion here is that the backward equation may have many solutions. It turns out that if there are many solutions to the backward equation, the solution given by the forward equation is the minimal solution.
Theorem 5.4
If {pij (t)} is the unique solution of the forward system, then any solution
{πij (t)} of the backward system satisfies pij (t) ≤ πij (t) for all i, j, t. ♦
If $\{p_{ij}(t)\}$ are the transition probabilities of the specified birth and death process, then they must solve both the forward and backward systems. Thus, the solution to the forward system must be the transition probabilities. This could be compared to the problem related to the probability of ultimate extinction in the branching process. Conversely, the solution to the forward system can be shown to satisfy the Chapman-Kolmogorov equations. Thus, it is a relevant solution.
The textbook fails to demonstrate why the proof of the uniqueness for the forward system cannot be applied to the backward system. The key is the assumption that $p_{ij}(t) = 0$ when $j < i$. When this restriction is removed, the backward system may have multiple solutions. This restriction reflects the existence of some continuous time Markov chains which have, to some degree, the same transition probability matrices as the birth process. Consequently, the textbook should not have made use of this restriction in proving the uniqueness of the solution to the forward system.
Intuitively, we may expect that
$$\sum_{j \in S} p_{ij}(t) = 1$$
for any solution. If so, no solution can be larger than another, and hence the uniqueness is automatic. The non-uniqueness is exactly built on this observation: this constraint does not hold for some birth processes. When the birth rate increases fast enough with the population size, the population size may reach infinity in a finite amount of time.
When the solution to the forward equation has
$$\sum_{j \in S} p_{ij}(t) < 1$$
for some $t > 0$ and $i$, it is possible to construct another solution which also satisfies the backward system. See Feller for detailed constructions. In that case, it is possible to design a new stochastic process so that its transition probabilities are given by this solution.
What is the probabilistic interpretation when
$$\sum_{j \in S} p_{ij}(t) < 1$$
for some finite $t$ and $i$? It implies that within a period of length $t$, the population size has jumped, or exploded, all the way to infinity. Consequently, an infinite number of transitions must have occurred. Recall that the waiting time for the next transition when $N(t) = n$ is exponential with rate $\lambda_n$. Let $T_n = \sum_{i=1}^{n} X_i$ and $T_\infty = \lim_{n\to\infty} T_n$.
Lemma 5.1
$$P(T_\infty < \infty) = \begin{cases} 0 & \text{if } \sum_{n=1}^{\infty} \lambda_n^{-1} = \infty, \\ 1 & \text{if } \sum_{n=1}^{\infty} \lambda_n^{-1} < \infty. \end{cases}$$
Hence, when $\sum_{n=1}^{\infty} \lambda_n^{-1} < \infty$, $E[T_\infty] < \infty$ and $P(T_\infty < \infty) = 1$. This implies dishonesty.
If $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$, it does not immediately follow that $T_\infty = \infty$ with probability one. If, however, $P(T_\infty < t) > 0$ for some $t > 0$, then $E[\exp(-T_\infty)] \geq \exp(-t) P(T_\infty < t) > 0$. We show that this is impossible under the current assumption. Note that
$$E[\exp(-T_\infty)] = \lim_{N\to\infty} E\prod_{n=1}^{N} \exp(-X_n) = \lim_{N\to\infty} \prod_{n=1}^{N} E\exp(-X_n) = \lim_{N\to\infty} \prod_{n=1}^{N} (1 + \lambda_n^{-1})^{-1} = \lim_{N\to\infty} \Big[\prod_{n=1}^{N} (1 + \lambda_n^{-1})\Big]^{-1},$$
where the second equality uses independence. Since $\prod (1 + \lambda_n^{-1})$ diverges when $\sum \lambda_n^{-1} = \infty$, we get $E[\exp(-T_\infty)] = 0$, a contradiction.
Theorem 5.5
The process $N$ is honest if and only if $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$.
$$P(A \mid N(T) = i, B) = P(A \mid N(T) = i)$$
for all $i$. ♦
Proof: In fact, this is a simple case, as $N(T)$ is a discrete random variable. The kind of events $B$ which cause the most trouble are those containing all information about the history of the process before and including time $T$. If that is the case, then the value of $T$ is completely determined by $B$. Hence, we may write $T = T(B)$, and the claim becomes
$$P(A \mid N(T) = i, B) = P(A \mid N(T) = i, T = T(B), B).$$
Among the three pieces of information on the right hand side, $T$ is defined based on $\{N(s) : s \leq T(B)\}$ and is a constant when "$N(T) = i$, $T = T(B)$". Hence the (weak) Markov property allows us to ignore $B$ itself. At the same time, since the process is time homogeneous, it depends only on the fact that the chain is in state $i$ now, not on when it first reached state $i$. Hence, the part $T = T(B)$ is also not informative. Hence the conclusion.
The measure theory proof is as follows. Let $\mathcal{H} = \sigma\{N(s) : s \leq T\}$, the σ-field generated by these random variables. An event $B$ containing historic information before $T$ is simply an event in this σ-algebra. Recall the formula $E[E(X \mid Y)] = E(X)$. Letting $\mathcal{H}$ play the role of $Y$, and $E(\cdot \mid N(T) = i, B)$ play the role of the expectation, we have
$$P(A \mid N(T) = i, B) = E(I(A) \mid N(T) = i, B) = E[E(I(A) \mid N(T) = i, B, \mathcal{H}) \mid N(T) = i, B].$$
This result is easy to present for a discrete time Markov chain. Hence, if you cannot understand the above proof, working on the discrete time example in the assignment will help. ♦
Example 5.1
Consider the birth process with N (0) = I > 0. Define pn (t) = P (N (t) = n).
Set up the forward system and solve it when λn = nλ. ♦
Definition 5.1
$$P(X(t_n) = j \mid X(t_1) = i_1, \ldots, X(t_{n-1}) = i_{n-1}) = P(X(t_n) = j \mid X(t_{n-1}) = i_{n-1})$$
for all $j, i_1, \ldots, i_{n-1} \in S$ and any sequence $0 < t_1 < t_2 < \cdots < t_n$ of times.
Definition 5.2
$$p_{ij}(s, t) = P(X(t) = j \mid X(s) = i)$$
for $s < t$.
The chain is called time homogeneous if $p_{ij}(s, t) = p_{ij}(0, t - s)$ for all $i, j, s, t$, and we write $p_{ij}(t - s)$ for $p_{ij}(s, t)$. ♦
We will assume homogeneity for all continuous time Markov chains to be discussed unless otherwise specified. We also use the notation $P_t$ for the corresponding matrix. It turns out that the collection of all transition probability matrices forms a stochastic semi-group. Forming a group would mean that we can define an operation on this set such that the operation is closed and invertible; a semi-group need not have an inverse for each member in the same set.
Theorem 5.7
Definition 5.3
$$p_{ij}(h) = \delta_{ij} + g_{ij} h + o(h), \qquad (5.1)$$
where the $g_{ij}$'s are a set of constants such that $g_{ij} \geq 0$ for $i \neq j$ and $-\infty \leq g_{ii} \leq 0$ for all $i$.
Unless additional restrictions are applied, some $g_{ii}$ may take the value $-\infty$, in which case I guess that the interpretation is
$$\frac{p_{ii}(h) - 1}{h} \to -\infty.$$
When all the needed conditions are in place, we set up a matrix $G$ for the $g_{ij}$ and call it the infinitesimal generator.
Linking this expansion to the transition probability, we must have
$$\sum_j g_{ij} = 0$$
for all $i$. Due to the fact that some $g_{ii}$ can be negative infinity, the above relationship does not apply to all Markov chains. We will discuss the conditions under which this result is guaranteed.
Once we admit the validity of (5.1), then we can easily obtain the forward and backward equations.
Forward equations: $P_t' = P_t G$;
Backward equations: $P_t' = G P_t$.
Example 5.2
Recall that for some Markov chains with standard semi-group transition probability matrices, the instantaneous rates $g_{ii}$ could be negative infinity. In this case, state $i$ is instantaneous; that is, the moment the Markov chain enters state $i$ is also the moment it leaves the state. Barring such possibilities by assuming the $g_{ij}$ have a common upper bound in absolute value, the waiting times for the Markov chain to leave state $i$ will have the memoryless property. Therefore, we have the result as follows.
Theorem 5.8
Example 5.3
A continuous time Markov chain with finite state space is always standard and uniform. Thus, all the conclusions are valid. ♦
Example 5.4
A continuous time Markov chain with only two possible states can have its forward and backward equations solved easily. ♦
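For Example 5.4, both systems are solved by $P_t = \exp(tG)$, and for two states the answer has a well known closed form. A sketch (my addition; the rates are arbitrary) compares scipy's matrix exponential with that closed form:

```python
import numpy as np
from scipy.linalg import expm

alpha, beta, t = 1.5, 0.5, 2.0    # arbitrary rates and time
G = np.array([[-alpha, alpha],
              [beta, -beta]])      # generator of a two-state chain

Pt = expm(t * G)                   # solves P_t' = P_t G = G P_t with P_0 = I

# Known closed form for the two-state chain, s = alpha + beta.
s = alpha + beta
e = np.exp(-s * t)
P_exact = (1 / s) * np.array([[beta + alpha * e, alpha - alpha * e],
                              [beta - beta * e, alpha + beta * e]])
print(Pt)
print(P_exact)                     # the two matrices should agree
```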
Theorem 5.10
$$p_{ij}(t) \to \pi_j$$
♦
Proof: For any $h > 0$, we may define $Y_n = X(nh)$. Then we have a discrete time Markov chain which is irreducible and ergodic. If it is non-null recurrent, then it has a unique stationary distribution $\pi^{(h)}$. In addition,
$$p_{ij}(nh) = P(Y_n = j \mid Y_0 = i) \to \pi_j^{(h)}$$
for all $i, j$.
For any two rational numbers $h_1$ and $h_2$, applying the above result implies that $\pi_j^{(h_1)} = \pi_j^{(h_2)}$. Next, use the continuity of $p_{ij}(t)$ to fill the gap for all real numbers.
Remark: Unlike other results, we are told that the conclusion is true when the class of transition matrices is a standard semi-group.
Consider the situation where the birth rates are $\lambda_n = \lambda$ for all $n$, and $\mu_n = n\mu$ for $n = 1, 2, \ldots$. That is, there is a constant source of immigration, and each individual in the population has the same death rate. Using the transition probability language, the book states that, since the chance of having two or more transitions in a short period of length $h$ is $o(h)$, this reduces to
$$p_{i,i+1}(h) = \lambda h + o(h), \qquad p_{i,i-1}(h) = (i\mu) h + o(h),$$
and $p_{ij}(h) = o(h)$ when $|i - j| \geq 2$.
It is very simple to work out its limiting probabilities.
Theorem 5.11
With $\rho = \lambda/\mu$,
$$P(X(t) = n) \to \frac{\rho^n}{n!}\exp(-\rho), \qquad n = 0, 1, 2, \ldots.$$
The proof is very simple. Let us work out the distribution of $X(t)$ directly. Assume $X(0) = I$.
Let $p_j(t) = P(X(t) = j \mid X(0) = I)$. By the forward equations, we have, with $G(s,t) = \sum_j s^j p_j(t)$,
$$\frac{\partial G}{\partial t} = \sum_{j=0}^{\infty} s^j p_j'(t).$$
When the birth and death rates are given by $\lambda_n = n\lambda$ and $\mu_n = n\mu$, we can work out the problem in exactly the same way.
It turns out that given $X(0) = I$, the generating function of $X(t)$ is given by
$$G(s,t) = \left[\frac{\mu(1-s) - (\mu - \lambda s)\exp\{-t(\lambda - \mu)\}}{\lambda(1-s) - (\mu - \lambda s)\exp\{-t(\lambda - \mu)\}}\right]^I.$$
The corresponding differential equation is given by
$$\frac{\partial G}{\partial t}(s,t) = (\lambda s - \mu)(s - 1)\frac{\partial G}{\partial s}(s,t).$$
Does it look like a convolution of one binomial and one negative binomial distribution?
One can find the mean and variance of $X(t)$ from this expression:
$$E\{X(t)\} = I\exp\{(\lambda - \mu)t\}, \qquad Var(X(t)) = \frac{\lambda + \mu}{\lambda - \mu}\exp\{(\lambda - \mu)t\}[\exp\{(\lambda - \mu)t\} - 1]\, I.$$
The limit of the expectation is either 0 or infinity, depending on whether the ratio $\lambda/\mu$ is smaller or larger than 1.
The probability of extinction is given by the limit of $\eta(t) = P(X(t) = 0)$, which is $\min(\rho^{-1}, 1)$ with $\rho = \lambda/\mu$. ♦
5.5 Embedding
If we ignore the lengths of the times between two consecutive transitions in a continuous time Markov chain, we obtain a discrete time Markov chain. In the example of linear birth and death, the waiting time for the next birth or death has exponential distribution with rate $n(\lambda + \mu)$ given $X(t) = n$. The probability that the transition is a birth is $(n\lambda)/[n(\lambda + \mu)] = \lambda/(\lambda + \mu)$.
Think of the transition as the movement of a particle from the integer $n$ to the new integer $n+1$ or $n-1$. The particle is performing a simple random walk with probability $p = \lambda/(\lambda + \mu)$. The state 0 is an absorbing state. The probability that the random walk will be absorbed at state 0 is given by $\min(1, q/p)$.
We may compute 10,000 values of $s_{2n}$. If 12% of them are smaller than $x = 5$, then $P(s_{2n} \leq 5)$ can be approximated by 12%. The precision improves as we generate more and more sets of such random variables.
A more general problem is to compute
$$\int g(\theta)\pi(\theta)\,d\theta, \quad \text{or} \quad \sum_{\theta} g(\theta)\pi(\theta).$$
$$n^{-1}\sum_{i} g(X_i) \to E_{\pi}\{g(X_1)\}.$$
$$\pi_k p_{kj} = \pi_j p_{jk}$$
for all $k, j \in \Theta$. The following steps will create such a discrete time Markov chain.
Assume we have $X_n = i$ already. We need to generate $X_{n+1}$ according to some transition probability.
(1) First, we pick an arbitrary stochastic matrix $H = (h_{ij} : i, j \in \Theta)$, called the 'proposal matrix'. We generate a random number $Y$ according to the distribution given by $(h_{ik} : k \in \Theta)$; that is, $P(Y = k) = h_{ik}$. Since the $h_{ik}$ are well defined, we consider this feasible.
(2) Select a matrix $A = (a_{ij} : i, j \in \Theta)$ with entries between 0 and 1. The $a_{ij}$ are called 'acceptance probabilities'. We first generate a uniform $[0, 1]$ random variable $Z$ and define
$$X_{n+1} = \begin{cases} Y & \text{if } Z < a_{iY}, \\ X_n & \text{otherwise.} \end{cases}$$
With this choice, the transition to $j$ occurs only if both $Y = j$ and $Z < a_{ij}$ when $X_n \neq j$.
It turns out that the balance equation is satisfied when we choose
$$a_{ij} = \min\left\{1, \frac{\pi_j h_{ji}}{\pi_i h_{ij}}\right\}.$$
This choice results in the algorithm called the Hastings algorithm. Note also that in this algorithm, we do not need knowledge of the $\pi_i$ themselves, but only of the ratios $\pi_i/\pi_j$ for all $i$ and $j$.
Let us verify this result. Consider two states $i \neq j$. We have $\pi_i p_{ij} = \pi_i h_{ij} a_{ij} = \min\{\pi_i h_{ij}, \pi_j h_{ji}\}$. Similarly, $\pi_j p_{ji} = \pi_j h_{ji} a_{ji} = \min\{\pi_i h_{ij}, \pi_j h_{ji}\}$. Hence, $\pi_i p_{ij} = \pi_j p_{ji}$.
When the two states are the same, the balance equation is obviously satisfied.
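A minimal implementation of the Hastings algorithm on a small finite $\Theta$ (my addition; the target $\pi$ and the proposal matrix $H$ are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(4)
pi = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary target distribution
m = len(pi)
H = np.full((m, m), 1.0 / m)          # arbitrary (symmetric) proposal matrix

x = 0
counts = np.zeros(m)
for n in range(200000):
    y = rng.choice(m, p=H[x])         # step (1): propose Y from row x of H
    a = min(1.0, (pi[y] * H[y, x]) / (pi[x] * H[x, y]))   # acceptance probability
    if rng.random() < a:              # step (2): accept with probability a_{xy}
        x = y
    counts[x] += 1

print(counts / counts.sum())          # empirical frequencies, close to pi
```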
When $\Theta$ is multi-dimensional, there is some advantage in generating the random vectors one component at a time. That is, if $X_n$ is the current random vector, we make $X_{n+1}$ equal to $X_n$ except for one of its components. The ultimate goal can be achieved by randomly selecting a component or by rotating through the components.
Gibbs sampler, or heat bath algorithm. Assume $\Theta = S^V$; that is, its dimension is $V$. The state space $S$ is finite and so is the dimension $V$. Each state in $\Theta$ can be written as $i = (i_w : w = 1, 2, \ldots, V)$. Let
$$\Theta_{i,v} = \{j \in \Theta : j_w = i_w \text{ for all } w \neq v\}.$$
That is, it is the set of all states which have the same components as state $i$ other than the $v$th component. Assuming $X_n = i$, we now choose a state from $\Theta_{i,v}$ for $X_{n+1}$. For this purpose, we select
$$h_{ij} = \frac{\pi_j}{\sum_{k \in \Theta_{i,v}} \pi_k}, \qquad j \in \Theta_{i,v},$$
in the first step of the Hastings algorithm. That is, the choice in $\Theta_{i,v}$ is based on the conditional distribution given the components other than the $v$th.
The next step of the Hastings algorithm is the same as before. It turns out that $a_{ij} = 1$ for all $j \in \Theta_{i,v}$; that is, there is no need for the second step.
We need to determine the choice of $v$. We may rotate through the components or randomly pick a component each time we generate the next random vector.
Metropolis algorithm. If the matrix $H$ is symmetric, then $a_{ij} = \min\{1, \pi_j/\pi_i\}$ and hence $p_{ij} = h_{ij}\min\{1, \pi_j/\pi_i\}$ for $i \neq j$.
We find such a matrix by placing a uniform distribution on $\Theta_{i,v}$ as defined in the Gibbs sampler algorithm above; that is, $h_{ij} = |\Theta_{i,v}|^{-1}$ for $j \in \Theta_{i,v}$.
Finally, even though the limiting distribution of $X_n$ is $\pi$, the distribution of $X_n$ may be far from $\pi$ until $n$ is huge. It is also very hard to tell how large $n$ has to be before the distribution of $X_n$ approximates the stationary distribution well.
In applications, researchers often propose to make use of a burn-in period $M$. That is, we throw away $X_1, X_2, \ldots, X_M$ for a large $M$ and start using $Y_1 = X_{M+1}$, $Y_2 = X_{M+2}$ and so on to compute various characteristics of $\pi$. For example, we estimate $E_{\pi} g(\theta)$ by $n^{-1}\sum_{i=1}^{n} g(X_{M+i})$. It is hence very important to have some idea of how large this $M$ must be before the random vectors (numbers) can be used.
Let $P$ be the transition probability matrix of a finite irreducible Markov chain with period $d$. Then,
(a) $\lambda_1 = 1$ is an eigenvalue of $P$;
(b) the $d$ complex roots of unity,
$$\lambda_1 = \omega^0, \lambda_2 = \omega^1, \ldots, \lambda_d = \omega^{d-1},$$
are eigenvalues of $P$;
(c) the remaining eigenvalues $\lambda_{d+1}, \ldots, \lambda_N$ satisfy $|\lambda_j| < 1$.
When all eigenvalues are distinct, then $P = B^{-1}\Lambda B$ for some matrix $B$ and a diagonal matrix $\Lambda$ with entries $\lambda_1, \ldots, \lambda_N$. This decomposition allows us to compute the $n$-step transition probability matrix easily: $P^n = B^{-1}\Lambda^n B$.
When not all eigenvalues are distinct, we can still decompose $P$ as $B^{-1} M B$. However, $M$ cannot be made diagonal any more. The best we can do is block diagonal, $\mathrm{diag}(J_1, J_2, \ldots)$, such that
$$J_i = \begin{pmatrix} \lambda_i & 1 & 0 & 0 & \cdots \\ 0 & \lambda_i & 1 & 0 & \cdots \\ 0 & 0 & \lambda_i & 1 & \cdots \\ 0 & 0 & 0 & \lambda_i & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
Fortunately, $M^n$ still has a very simple form. Hence, once such a decomposition is found, the $n$-step transition probability matrix is easily obtained.
$$p_{ij}(n) = \pi_j + O(n^{m-1}|\lambda_2|^n),$$
where $\lambda_2$ is the eigenvalue with the second largest modulus and $m$ is the multiplicity of this eigenvalue. Note that the transition probability converges to $\pi_j$ at an exponential rate when $m = 1$.
Theorem 5.12
Chapter 6

General Stochastic Process in Continuous Time

where $R$ is the set of all real numbers. That is, at any sample point $\omega \in \Omega$, $S(t)$ takes a real value: $S(\omega, t)$.
It becomes apparent that $S(\omega, t)$ is a bivariate function. One of its variables is time $t$, and the other variable is the sample point $\omega$. Given $t$, we obtain a random variable.
For the probability
$$P(S(t_1) \leq s_1, S(t_2) \leq s_2)$$
to be well defined, the set $\{S(t_1) \leq s_1, S(t_2) \leq s_2\}$ must be a member of $\mathcal{F}$. It is easy to see that this requirement extends from two time points to any finite number of time points, and to any countable number of time points. It can be shown that if we can give a proper probability to all such sets (called cylinder sets), then the probability measure can be extended uniquely to the smallest σ-algebra that contains all cylinder sets.
As a consequence of this discussion, we have the following theorem.
Theorem 6.1
Example 6.1
$$P(Y(t) \neq 0) = P(\tau = t) = 0.$$
That is, $P(Y(t) = 0) = 1$. Hence, for any $t \in [0, 1]$, $X(t)$ and $Y(t)$ have the same distribution. Further, for any set of $0 \leq t_1 < t_2 < \cdots < t_n \leq 1$, the joint distribution of $\{X(t_1), X(t_2), \ldots, X(t_n)\}$ is identical to that of $\{Y(t_1), Y(t_2), \ldots, Y(t_n)\}$. That is, $X(t)$ and $Y(t)$ have the same finite dimensional distributions. ♦
The moral of this example is: even if two stochastic processes have the
same distribution, they can still have very different sample paths. This ob-
servation prompts the following definition.
Definition 6.1
Theorem 6.2
for all $0 \leq t, u \leq T$.
(b) A regular version of $S(t)$ exists if we can find positive constants $\alpha_1$, $\alpha_2$, and $C$ such that the corresponding moment bound holds for all $0 \leq u \leq v \leq t \leq T$. ♦
There is no need to memorize this theorem. What should we make out of it? Under some continuity conditions on expectations, a regular enough version of a stochastic process exists. We hence have a legitimate basis to investigate stochastic processes with this property.
Finally, let us summarize the results we discussed in this section. The distribution of a stochastic process is determined by its finite dimensional distributions, whatever that means. Even if two stochastic processes have the same distribution, their sample paths may have very different properties. However, for each given stochastic process, there often exists a continuous version, or regular version, whether the given one itself has continuous sample paths or not.
Ultimately, we focus on the most convenient version of the given stochastic process in most applications.
Definition 6.1
Theorem 6.3 Strong and Weak Laws of Large Numbers (Ergodic Theo-
rems).
It was the botanist R. Brown who first described the irregular and random motion of a pollen particle suspended in fluid (1828). Hence, such motions are called Brownian motion. Interestingly, the particle theory in physics was not widely accepted until Einstein used Brownian motion to argue that such motions are caused by bombardments of molecules in 1905. The theory was far from complete without a mathematical model, developed later by Wiener (1931). The stochastic model used for Brownian motion is hence also called the Wiener process. We will see that the mathematical model has many desirable properties, and a few that do not fit well with physical laws for Brownian motion. As early as 1931, Brownian motion had been used in mathematical theory for stock prices. It is nowadays fashionable to use stochastic processes to study financial markets.
♦
We should compare this definition with that of the Poisson process: the increment distribution is changed from Poisson to normal. The sample path requirement is needed so that we will work on a definitive version of the process.
Once the distribution of $B(0)$ is given, all the finite dimensional distributions of $B(t)$ are completely determined, which then determines the distribution of $B(t)$ itself. This notion is again analogous to that of a counting process with independent, stationary, Poisson increments.
In some cases, $B(0) = 0$ is part of the definition. We will more or less adopt this convention, but will continue to spell it out.
Example 7.1
2. Find the joint distribution of B(1) and B(2), and compute P (B(1) >
1, B(2) < 0).
♦
Recall that a multivariate normal distribution is completely determined by its mean vector and covariance matrix. Thus, the finite dimensional distributions of a Gaussian process are completely determined by its mean function and correlation function. As Brownian motion is a special Gaussian process, it is useful to calculate its mean and correlation functions.
♦
Based on this calculation, it is clear that the above Brownian motion is not stationary.
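The correlation calculation ($E\{B(s)B(t)\} = \min(s, t)$ when $B(0) = 0$) can be confirmed by simulating Brownian paths from independent normal increments; a sketch (my addition, with arbitrary grid and time points):

```python
import numpy as np

rng = np.random.default_rng(5)
dt, n_steps, reps = 0.01, 200, 50000
# Brownian paths built from independent N(0, dt) increments, B(0) = 0.
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(reps, n_steps)), axis=1)

i, j = 49, 149                 # grid indices for s = 0.5, t = 1.5
s, t = (i + 1) * dt, (j + 1) * dt
cov = np.mean(B[:, i] * B[:, j]) - B[:, i].mean() * B[:, j].mean()
print(cov, min(s, t))          # estimated covariance vs min(s, t) = 0.5
```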
Example 7.3 Distribution of $\int_0^1 B(t)\,dt$ given $B(0) = 0$.
Example 7.4
Brownian motion has the following well known but very peculiar properties.
4. Almost every sample path has infinite variation on any interval, no matter how short it is;
Definition 7.3
where the limit is taken over any sequence of partitions $(x_1, \ldots, x_n)$ such that $\max_i |x_i - x_{i-1}| \to 0$.
Theorem 7.1
If s(x) is continuous and of finite total variation, then its quadratic variation
is zero. ♦
Property 1 is true by definition. Property 2 can be derived from Property 4: if a sample path were monotone over an interval, then its variation there would be finite, which contradicts Property 4. Property 3 fits well with Property 2; if a sample path is nowhere monotone, it is very odd for it to be differentiable. This point can be made rigorous.
Finally, it is seen that Property 4 follows from Property 5 by these arguments. It is hence vital to show that Brownian motion has Property 5.
Let $0 = t_0 < t_1 < t_2 < \cdots < t_n = t$ be a partition of the interval $[0, t]$, where each $t_i$ depends on $n$ though this is not spelled out. Let
$$T_n = \sum_{i=1}^{n} \{B(t_{i-1}) - B(t_i)\}^2.$$
Then
$$E\{T_n\} = \sum_{i=1}^{n} E\{B(t_{i-1}) - B(t_i)\}^2 = \sum_{i=1}^{n} Var\{B(t_{i-1}) - B(t_i)\} = \sum_{i=1}^{n} (t_i - t_{i-1}) = t.$$
That is, the expected sample path quadratic variation is $t$, regardless of how the interval is partitioned.
Property 5, however, means that $\lim T_n = t$ almost surely along any sequence of partitions such that $\delta_n = \max_i |t_i - t_{i-1}| \to 0$. We give a partial proof for the case $\sum_n \delta_n < \infty$. In this case, we have
$$Var\{T_n\} = \sum_{i=1}^{n} Var\{B(t_{i-1}) - B(t_i)\}^2 = \sum_{i=1}^{n} 2(t_i - t_{i-1})^2 \leq 2t \max_i |t_i - t_{i-1}| = 2\delta_n t.$$
Hence $\sum_n Var\{T_n\} < \infty$; by the monotone convergence theorem in measure theory, this implies $E\sum_n \{T_n - E(T_n)\}^2 < \infty$. (Note the change of order between summation and expectation.) The latter implies $T_n - E(T_n) \to 0$ almost surely, which is the same as $T_n \to t$ almost surely.
Despite the beauty of this result, this property also shows that the mathematical Brownian motion does not model the physical Brownian motion at the microscopic level. If Newton's laws hold, the acceleration cannot be instantaneous, and the sample paths of a diffusing particle should be smooth.
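The convergence $T_n \to t$ can be watched directly in simulation: as the partition is refined, the sum of squared increments stabilizes near $t$ while the total variation $\sum |\Delta B|$ blows up, illustrating Properties 4 and 5. A sketch (my addition):

```python
import numpy as np

rng = np.random.default_rng(6)
t = 1.0
for n in (10, 100, 1000, 10000, 100000):
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)   # increments on a uniform partition
    print(n, np.sum(dB ** 2), np.sum(np.abs(dB)))  # quadratic variation -> t; variation grows
```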
$$\mathcal{F}_s \subset \mathcal{F}_t \quad \text{whenever } s \leq t.$$
♦
With this definition, we have implied that $X(t)$ is $\mathcal{F}_t$-measurable. In general, $\mathcal{F}_t$ represents the information about $X(t)$ available to an observer up to time $t$. If any decision is to be made at time $t$ by this observer, the decision is an $\mathcal{F}_t$-measurable function. Unless otherwise specified, we take $\mathcal{F}_t = \sigma\{X(s) : s \leq t\}$ when a stochastic process $X(t)$ and a filtration $\mathcal{F}_t$ are the subjects of some discussion.
One of the defining properties of Brownian motion can be reworded as: Brownian motion is a martingale.
We would like to mention two other martingales.
Theorem 7.2
Definition 7.2
Let $T$ be a stopping time associated with Brownian motion $B(t)$. Then for any $t > 0$,
$$P\{B(T + t) \leq y \mid \mathcal{F}_T\} = P\{B(T + t) \leq y \mid B(T)\}$$
for all $y$. ♦
We cannot afford to spend more time on proving this result. Let us tentatively accept it as a fact and see what the consequences are.
Let us define $\hat{B}(t) = B(T + t) - B(T)$. When $T$ is a constant, it can be easily verified that $\hat{B}(t)$ is also a Brownian motion. When $T$ is a stopping time, this remains true due to the strong Markov property.
Assume B(0) = x and a < x < b. Define T = min{T (a), T (b)}. Hence
T is the time when the Brownian motion escapes the area formed by two
horizontal lines. The question is: will it occur in finite time? If so, what is
the average time it takes?
The first question is answered by computing $P(T < \infty)$, and the second is answered by computing $E\{T\}$ under the assumption that the first answer is affirmative. One may link this problem with the simple random walk and predict the outcome. The probability of the first passage of "1" for a symmetric simple random walk is 1. Since the normal distribution is symmetric, the simple random walk result seems to suggest that $P(T < \infty) = 1$. We do not have similar results for $E\{T\}$, but we do for $E\{T(a)\}$ or $E\{T(b)\}$ for the simple random walk, and the latter are likely finite from the same consideration. The real answers are given as follows.
Theorem 7.5
(b) E{T |B(0) = x} < ∞, under the assumption that a < x < b.
♦
Proof: It is seen that the event $\{T > 1\}$ implies $a < B(t) < b$ for all $t \in [0, 1]$. Hence,
$$p_n = P\{T > n \mid B(0) = x\} = P\{T > n \mid T > n-1, B(0) = x\}\, p_{n-1}.$$
Further,
$$E(T \mid B(0) = x) \leq \sum_{n=0}^{\infty} P\{T > n \mid B(0) = x\} < \infty.$$
♦
I worry slightly that I did not really make use of the strong Markov property, but only the Markov property itself.
The following results are obvious:
Theorem 7.6
Proof: We will omit the condition that $B(0) = 0$. Also, we use $T(a)$ as the stopping time when $B(t)$ first hits $a$.
First, for any $m$, we have
$$P\{M(t) \geq m\} = 2 P\{B(t) \geq m\}.$$
Note that $B(T(m)) = m$ as the sample paths are continuous almost surely, and $\{M(t) > m\}$ is equivalent to $T(m) \leq t$.
Theorem 7.7
$$f(t) = \frac{|x|}{\sqrt{2\pi t^3}}\exp\left\{-\frac{x^2}{2t}\right\}, \qquad t \geq 0.$$
Theorem 7.8
Theorem 7.9
The probability that $B(t)$ has at least one zero in the time interval $(a, b)$ is given by
$$\frac{2}{\pi}\arccos\sqrt{a/b}.$$
Proof: Let $E(a, b)$ be the event that $B(t) = 0$ for at least one $t \in (a, b)$. Hence,
$$P\{E(a,b)\} = E[P\{E(a, b) \mid B(a)\}] = \int_{-\infty}^{\infty} P\{E(a, b) \mid B(a) = x\}\, f_a(x)\,dx = 2\int_0^{\infty} P\{E(a, b) \mid B(a) = -x\}\, f_a(x)\,dx = 2\int_0^{\infty} P\{T(x) < b - a\}\, f_a(x)\,dx,$$
where $f_a(x)$ is the density function of $B(a)$, known to be normal with mean 0 and variance $a$. The density function of $T(x)$ was given earlier. Substituting the two expressions in and finishing the job of integration, we get the result. ♦
When $a \to 0$, so that $a/b \to 0$, this probability approaches 1. Hence, the probability that $B(t) = 0$ somewhere in any small neighborhood of $t = 0$ (not including 0) is 1 given $B(0) = 0$. This conclusion can be further strengthened: there are infinitely many zeros in any neighborhood of $t = 0$, no matter how small this neighborhood is. This result further illustrates that the sample paths of Brownian motion, though continuous, are nowhere smooth.
The famous Arcsine law now follows.
Theorem 7.10