CHAPTER 1. INTRODUCTION (JULY 20, 1999)
from the new point, etc. For a while, the walk is almost always
quickly visiting pixels it hasn’t visited before, so one sees an
irregular pattern that grows in the center of the screen. After
quite a long while, when the screen is perhaps 95% illuminated,
the growth process will have slowed down tremendously, and the
viewer can safely go read War and Peace without missing any
action. After a minor eternity every cell will have been visited.
Any mathematician will want to know how long, on the average,
it takes until each pixel has been visited. Edited from Wilf [336].
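For readers who want to experiment, here is a minimal simulation of the screen-saver walk (not from the text; for simplicity it assumes an n × n torus, so the walk wraps around the screen edges, and the parameters are illustrative):

```python
import random

def grid_cover_time(n, seed=0):
    """Simulate simple random walk on an n x n torus until every cell
    has been visited; return the number of steps taken (the cover time)."""
    rng = random.Random(seed)
    x, y = 0, 0
    visited = {(0, 0)}
    steps = 0
    while len(visited) < n * n:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = (x + dx) % n, (y + dy) % n
        visited.add((x, y))
        steps += 1
    return steps

# Average over a few runs to estimate the mean cover time.
est = sum(grid_cover_time(8, seed=s) for s in range(20)) / 20
```

The sample average estimates the mean cover time that the mathematician's question asks about; the slow endgame described above shows up as a long tail in the individual run lengths.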
For the usual deck with d = 52 these suggest 8 and 205 shuffles respectively.
where f is given by some explicit but maybe complicated formula. How can
you devise a scheme to sample a random point in Rd with the normalized
probability density f (x)/κ?
For d = 1 the elementary “inverse distribution function” trick is avail-
able, and for small d simple acceptance/rejection methods are often prac-
tical. For large d the most popular method is some form of Markov chain
Monte Carlo (MCMC) method, and this specific d-dimensional sampling
problem is a prototype problem for MCMC methods. The scheme is to de-
sign a chain to have stationary distribution f (x)/κ. A simple such chain is
as follows. From a point x, the next point X1 is chosen by a two-step pro-
cedure. First choose Y from some reference distribution (e.g. multivariate
Normal with specified variance, or uniform on a sphere of specified radius)
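The excerpt breaks off before the second step; in the standard random-walk Metropolis version of this scheme, the proposed point Y is accepted with probability min(1, f(Y)/f(x)), and otherwise the chain stays at x. Only the unnormalized density f is ever needed, never the constant κ. A minimal sketch (Normal proposal; all parameters illustrative):

```python
import math, random

def metropolis_sample(f, x0, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sketch: from x, propose Y = x + Normal noise
    (a symmetric reference distribution) and accept with probability
    min(1, f(Y)/f(x)); the normalizing constant kappa never appears."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        y = [xi + rng.gauss(0.0, step_size) for xi in x]
        fy, fx = f(y), f(x)
        if fx == 0 or rng.random() < min(1.0, fy / fx):
            x = y
    return x

# Example target: unnormalized standard Normal density in d = 2.
f = lambda x: math.exp(-0.5 * sum(xi * xi for xi in x))
sample = metropolis_sample(f, [5.0, 5.0], 2000, seed=1)
```

Designing the chain so that f(x)/κ is stationary is exactly what the acceptance rule accomplishes; how long one must run it is the mixing-time question this book studies.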
Sl+1 . As in the previous word problem, one can get an approximately uni-
form random sample by MCMC, i.e. by designing a chain whose stationary
distribution is uniform, and simulating a sufficiently large number of steps
of the chain: in making a rigorous algorithm, the issue again reduces to
bounding the mixing time of the chain. The case of SAWs is outlined in
Chapter 11 section yyy.
But what about larger g? Clearly you didn’t have 2^120 ≈ 10^36 distinct
120’th-generation ancestors! Even taking g = 10, one can argue it’s unlikely
you had 1,024 different 10th-generation ancestors, though the number is
likely only a bit smaller – say 1,000, in round numbers. Whether you are ac-
tually related to these people is a subtle question. At the level of grade-school
genetics, you have 46 chromosomes, each a copy of one parental chromosome,
and hence each a copy of some 10th-generation ancestor’s chromosome. So
you’re genetically related to at most 46 of your 10th-generation ancestors.
Taking account of crossover during chromosome duplication leads to a more
interesting model, in which the issue is to estimate hitting probabilities in
a certain continuous-time reversible Markov chain. It turns out (Chapter
13 yyy) that the number of 10th-generation ancestors who are genetically
related to you is about 340. So you’re unlikely to be related to a particu-
lar 10th-generation ancestor, a fact which presents a curious sidebar to the
principle of hereditary monarchy.
1.2.2 Prerequisites
The reader who has taken a first-year graduate course in mathematical prob-
ability will have no difficulty with the mathematical content of this book.
If the phrase “randomized algorithm” means nothing to you, though, it
would be helpful to look at Motwani–Raghavan [265] to get some feeling
for the algorithmic viewpoint.
We have tried to keep much of the book accessible to readers whose math-
ematical background emphasizes discrete math and algorithms rather than
analysis and probability. The minimal background required is an undergrad-
uate course in probability including classical limit theory for finite Markov
chains. Graduate-level mathematical probability is usually presented within
the framework of measure theory, which (with some justification) is often
regarded as irrelevant “general abstract nonsense” by those interested in
concrete mathematics. We will point out as we go the pieces of graduate-
level probability that we use (e.g. martingale techniques, Wald’s identity,
weak convergence). Advice: if your research involves probability then you
should at some time see what’s taught in a good first-year-graduate course,
and we strongly recommend Durrett [133] for this purpose.
23
CHAPTER 2. GENERAL MARKOV CHAINS (SEPTEMBER 10, 1999)
for probabilities and expectations for the chain started at state i and time
0. More generally, write Pρ (·) and Eρ (·) for probabilities and expectations
for the chain started at time 0 with distribution ρ. Write

Ti := min{t ≥ 0 : Xt = i},   Ti+ := min{t ≥ 1 : Xt = i}.

Of course Ti+ = Ti unless X0 = i, in which case we call Ti+ the first return
time to state i. More generally, a subset A of states has first hitting time
TA = min{t ≥ 0 : Xt ∈ A}.
We shall frequently use without comment “obvious” facts like the fol-
lowing.
Start a chain at state i, wait until it first hits j, then wait until
the time (S, say) at which it next hits k. Then Ei S = Ei Tj +
Ej Tk .
The elementary proof sums over the possible values t of Tj . The sophisti-
cated proof appeals to the strong Markov property ([270] section 1.4) of the
stopping time Tj , which implies
Ei (S|Xt , t ≤ Tj ) = Tj + Ej Tk .
Recall that the symbol | is the probabilist’s shorthand for “conditional on”.
It can then be checked that πi := π̃(i)/∑_j π̃(j) is a stationary distribution.
The point of stationarity is that, if the initial position X0 of the chain is
random with the stationary distribution π, then the position Xt at any
subsequent non-random time t has the same distribution π, and the process
(Xt , t = 0, 1, 2, . . .) is then called the stationary chain.
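As a concrete check (not from the text), a stationary distribution can be computed numerically by iterating the chain: for an irreducible aperiodic chain the iterates converge to π, and the balance equations π = πP can then be verified directly. The transition matrix below is an illustrative choice:

```python
def stationary(P, iters=500):
    """Approximate the stationary distribution by power iteration:
    start from a uniform row vector and repeatedly multiply by P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# A small 3-state chain (rows sum to 1).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = stationary(P)
# Residuals of the balance equations pi P = pi.
balance = [abs(sum(pi[i] * P[i][j] for i in range(3)) - pi[j]) for j in range(3)]
```

For this chain the stationary distribution is (1/4, 1/2, 1/4), and the residuals are zero to machine precision.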
A highlight of elementary theory is that the stationary distribution plays
the main role in asymptotic results, as follows.
Theorem 2.1 (The ergodic theorem: [270] Theorem 1.10.2) Let Ni (t)
be the number of visits to state i during times 0, 1, . . . , t − 1. Then for any
initial distribution,
Theorem 2.1 is the simplest illustration of the ergodic principle “time aver-
ages equal space averages”. Many general identities for Markov chains can
be regarded as aspects of the ergodic principle – in particular, in section
2.2.1 we use it to derive expressions for mean hitting times. Such identities
are important and useful.
The most classical topic in mathematical probability is time-asymptotics
for i.i.d. (independent, identically distributed) random sequences. A vast
number of results are known, and (broadly speaking) have simple analogs
for Markov chains. Thus the analog of the strong law of large numbers is
Theorem 2.1, and the analog of the central limit theorem is Theorem 2.17
below. As mentioned in Chapter 1 section 2.1 (yyy 7/20/99 version) this
book has a different focus, on results which say something about the behavior
of the chain over some specific finite time, rather than what happens in the
indefinite future.
results are simpler in continuous time. See Norris [270] Chapters 2 and 3
for details on what follows.
A continuous-time chain is specified by transition rates (q(i, j) = qij , j ≠ i)
which are required to be non-negative but have no constraint on the sums.
Given the transition rates, define
qi := ∑_{j: j≠i} qij    (2.2)
and extend (qij ) to a matrix Q by putting qii = −qi . The chain (Xt : 0 ≤
t < ∞) has two equivalent descriptions.
1. Infinitesimal description. Given that Xt = i, the chance that
Xt+dt = j is qij dt for each j ≠ i.
2. Jump-and-hold description. Define a transition matrix J by
Jii = 0 and
Jij := qij /qi ,  j ≠ i.    (2.3)
Then the continuous-time chain may be constructed by the two-step proce-
dure
(i) Run a discrete-time chain X J with transition matrix J.
(ii) Given the sequence of states i0 , i1 , i2 , . . . visited by X J , the durations
spent in states im are independent exponential random variables with rates
qim .
The discrete-time chain X J is called the jump chain associated with Xt .
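The jump-and-hold description translates directly into a simulation recipe; here is a sketch for a two-state chain (the generator Q is an illustrative choice):

```python
import random

def simulate_ctmc(Q, i0, t_end, seed=0):
    """Jump-and-hold simulation of a continuous-time chain with generator Q:
    hold in state i for an Exponential(q_i) time, then jump according to
    the jump-chain row J_ij = q_ij / q_i."""
    rng = random.Random(seed)
    i, t = i0, 0.0
    path = [(0.0, i0)]
    while True:
        qi = -Q[i][i]                      # holding rate in state i
        t += rng.expovariate(qi)           # exponential holding time
        if t >= t_end:
            return path
        # choose the next state j != i with probability q_ij / q_i
        r, acc = rng.random() * qi, 0.0
        for j, qij in enumerate(Q[i]):
            if j == i:
                continue
            acc += qij
            if r < acc:
                i = j
                break
        path.append((t, i))

Q = [[-1.0, 1.0], [2.0, -2.0]]   # two-state chain: rates q01 = 1, q10 = 2
path = simulate_ctmc(Q, 0, 50.0, seed=3)
```

The sequence of states in `path` is exactly a realization of the jump chain X^J, and the gaps between successive times are the independent exponential holds of description 2.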
The results in the previous section go over to continuous-time chains
with the following modifications.
(a) Pi (Xt = j) = Q^(t)_ij , where Q^(t) := exp(Qt).
(b) The definition of Ti+ becomes
(c) If the chain is irreducible then there exists a unique stationary dis-
tribution π characterized by
∑_i πi qij = 0 for all j.
Proposition 2.3 Consider the chain started at state i. Let 0 < S < ∞ be
a stopping time such that XS = i and Ei S < ∞. Let j be an arbitrary state.
Then
Ei (number of visits to j before time S) = πj Ei S.
□
2.2. IDENTITIES FOR MEAN HITTING TIMES AND OCCUPATION TIMES
In the periodic case the sum may oscillate, so we use the Cesaro limit or
(equivalently, but more simply) the continuous-time limit (2.9). The matrix
Z is called the fundamental matrix (see Notes for alternate standardizations).
Note that from the definition
∑_j Zij = 0 for all i.    (2.7)
Lemma 2.6
Corollary 2.13 ∑_j πj Ei Tj = ∑_j Zjj for each i.

Corollary 2.14 (The random target lemma) ∑_j πj Ei Tj does not depend on i.
Lemma 2.15

Eπ(number of visits to j before time Ti) = (πj /πi) Zii − Zij .
Lemmas 2.11 and 2.12, which will be used frequently throughout the book,
will both be referred to as the mean hitting time formula. See the Remark
following the proofs for a two-line heuristic derivation of Lemma 2.12. A
consequence of the mean hitting time formula is that knowing the matrix
Z is equivalent to knowing the matrix (Ei Tj ), since we can recover Zij as
πj (Eπ Tj − Ei Tj ).
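These identities are easy to test numerically. The sketch below (the chain is an illustrative choice) computes Z by truncating the defining sum Zij = ∑_t (p^(t)_ij − πj), computes Ei Tj independently by value iteration on the first-step equations, and compares with the mean hitting time formula Ei Tj = (Zjj − Zij)/πj:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]     # stationary distribution of P
n = 3

# Z_ij = sum_t (p_ij^(t) - pi_j); the terms decay geometrically,
# so a truncated sum is accurate.
Z = [[0.0] * n for _ in range(n)]
Pt = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0
for _ in range(200):
    for i in range(n):
        for j in range(n):
            Z[i][j] += Pt[i][j] - pi[j]
    Pt = mat_mul(Pt, P)

def mean_hitting(P, j, sweeps=2000):
    """E_i T_j by value iteration on h = 1 + P h with h(j) fixed at 0."""
    n = len(P)
    h = [0.0] * n
    for _ in range(sweeps):
        h = [0.0 if i == j else
             1.0 + sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
    return h

h0 = mean_hitting(P, 0)                                  # E_i T_0 directly
formula = [(Z[0][0] - Z[i][0]) / pi[0] for i in range(n)]  # via the formula
```

For this chain E_1 T_0 = 6 and E_2 T_0 = 8, and both computations agree.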
Proofs. The simplest choice of S in Proposition 2.3 is of course the first
return time Ti+ . With this choice, the Proposition says
Setting j = i gives 1 = πi Ei Ti+ , which is Lemma 2.5, and then the case of
general j gives Lemma 2.6.
Another choice of S is “the first return to i after the first visit to j”.
Then Ei S = Ei Tj + Ej Ti and the Proposition becomes Lemma 2.7, because
there are no visits to j before time Tj . For the chain started at i, the number
of visits to i (including time 0) before hitting j has geometric distribution,
and so
This is the assertion of Lemma 2.9, and the identity remains true if j = i
(where it becomes Lemma 2.7) or if j = l (where it reduces to 0 = 0). We
deduce Corollary 2.10 by writing
Ei (number of visits to j before time Tl ) =
Pi (Tj < Tl )Ej (number of visits to j before time Tl )
and using Lemma 2.7 to evaluate the final expectation.
We now get slightly more ingenious. Fix a time t0 ≥ 1 and define S as
the time taken by the following 2-stage procedure (for the chain started at
i).
(i) wait time t0
(ii) then wait (if necessary) until the chain next hits i.
Then the Proposition (with j = i) says
∑_{t=0}^{t0−1} p^(t)_ii = πi (t0 + Eρ Ti)    (2.8)

where ρ(·) = Pi (Xt0 = ·). Applying the same argument to the chain started
at an arbitrary state k, then subtracting the equality of Lemma 2.7 and
rearranging, we get
∑_{t=0}^{t0−1} (p^(t)_ki − πi) = πi (Eρ Ti − Ek Ti)

where now ρ(·) = Pk (Xt0 = ·).
Letting t0 → ∞ (so that ρ converges to π) gives Zki = πi (Eπ Ti − Ek Ti ).
Appealing to Lemma 2.11 we get Lemma 2.12. Corollary 2.13 follows from
Lemma 2.12 by using (2.7).
To prove Lemma 2.15, consider again the argument for (2.8), but now
apply the Proposition with j ≠ i. This gives
∑_{t=0}^{t0−1} p^(t)_ij + Eρ(number of visits to j before time Ti) = πj (t0 + Eρ Ti)
∑_{t=0}^{t0−1} (p^(t)_ij − πj) + Eρ(number of visits to j before time Ti) = πj Eρ Ti .
Letting t0 → ∞ gives Zij + Eπ(number of visits to j before time Ti) = πj Eπ Ti ,
which by Lemma 2.11 rearranges to Lemma 2.15.
Remark. The promised two-line heuristic derivation of Lemma 2.12 is to write

∑_{t=0}^{∞} (1_(Xt=j) − πj) = ∑_{t=0}^{Tj−1} (1_(Xt=j) − πj) + ∑_{t=Tj}^{∞} (1_(Xt=j) − πj).
Take Ei (·) of each term to get Zij = −πj Ei Tj + Zjj . Of course this argu-
ment doesn’t make sense because the sums do not converge. Implicit in our
(honest) proof is a justification of this argument by a limiting procedure.
j_u = i_{u+d} , 1 ≤ u ≤ n − d.
From the definition of Z, and the fact that X0 and Xt are independent for
t ≥ n,
Zij = c(i, j) − n 2^{−n} .
So we can read off many facts about patterns in coin-tossing from the general
results of this section. For instance, the mean hitting time formula (Lemma
2.11) says Eπ Ti = 2^n c(i, i) − n. Note that “time 0” for the chain is the
n’th toss, at which point the chain is in its stationary distribution. So the
mean number of tosses until first seeing pattern i equals 2^n c(i, i). For n = 5
and i = HHT HH, the reader may check this mean number is 38. We leave
the interested reader to explore further — in particular, find three patterns
i, j, k such that
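The quantity 2^n c(i, i) can be computed from the overlap structure of the pattern: it equals the sum of 2^k over the lengths k for which the length-k prefix of i equals its length-k suffix (a standard reformulation of the correlation c(i, i)). A sketch:

```python
def mean_tosses(pattern):
    """Mean number of fair-coin tosses until `pattern` first appears,
    computed as the sum of 2^k over all k such that the length-k prefix
    of the pattern equals its length-k suffix (k = n included)."""
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[n - k:])

m = mean_tosses("HHTHH")   # the example in the text
```

For HHTHH the overlaps have lengths 5, 2 and 1, giving 32 + 4 + 2 = 38, agreeing with the mean number asserted in the text.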
to note that results often do not extend simply from singletons to subsets.
For instance, one might guess that Lemma 2.11 could be extended to
Eπ TA = ZAA / π(A) ,   ZAA := ∑_{t=0}^{∞} (Pπ(Xt ∈ A|X0 ∈ A) − π(A)),
Eπ Ni (t) = tπi .
= πi ( ∑_{u=0}^{t−1} 2(t − u)(p^(u)_ii − πi) − t(1 − πi) )
The contribution to the latter double sum from terms r ≤ s equals, putting
u = s − r,
∑_{u=0}^{t−1} πi (t − u)(p^(u)_ij − πj) ∼ t πi Zij .
Collecting the other term and subtracting the twice-counted diagonal leads
to the following result.
t^{−1} Eπ S^f_t S^g_t → f Γg := ∑_i ∑_j f(i) Γij g(j)    (2.11)
So variation distance is just the maximum additive error one can make, in
using the “wrong” distribution to evaluate the probability of an event. In
example 2.18, variation distance is 0.1. Several equivalent definitions are
provided by
the minimum taken over random pairs (V1 , V2 ) such that Vm has distribution
θm (m = 1, 2). So each of these quantities equals the variation distance
||θ1 − θ2 ||.
Proof. The first three equalities are clear. For the fourth, set B = {i :
θ1 (i) > θ2 (i)}. Then
θ1(A) − θ2(A) = ∑_{i∈A} (θ1(i) − θ2(i))
             ≤ ∑_{i∈A∩B} (θ1(i) − θ2(i))
             ≤ ∑_{i∈B} (θ1(i) − θ2(i))
             = ∑_i (θ1(i) − θ2(i))^+
with equality when A = B. This, and the symmetric form, establish the
fourth equality. In the final equality, the “≤” follows from
P(V1 = i, V2 = i) ≤ min(θ1(i), θ2(i)).
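The equivalent definitions are easy to confirm numerically; the sketch below computes variation distance three ways (half the L1 distance, the sum of positive parts, and 1 minus the sum of minima) and checks that they agree, on an illustrative pair of distributions at distance 0.1:

```python
def variation_distance(theta1, theta2):
    """Variation distance of two distributions (given as dicts), computed
    three equivalent ways; the values are cross-checked before returning."""
    keys = set(theta1) | set(theta2)
    half_l1 = 0.5 * sum(abs(theta1.get(i, 0.0) - theta2.get(i, 0.0)) for i in keys)
    pos_part = sum(max(theta1.get(i, 0.0) - theta2.get(i, 0.0), 0.0) for i in keys)
    one_minus_min = 1.0 - sum(min(theta1.get(i, 0.0), theta2.get(i, 0.0)) for i in keys)
    assert abs(half_l1 - pos_part) < 1e-12
    assert abs(half_l1 - one_minus_min) < 1e-12
    return half_l1

d = variation_distance({"a": 0.5, "b": 0.5}, {"a": 0.4, "b": 0.5, "c": 0.1})
```

The remaining characterization, as the minimum of P(V1 ≠ V2) over couplings, is attained by the coupling that puts mass min(θ1(i), θ2(i)) on each diagonal point (i, i).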
Lemma 2.20
(a) d̄(s + t) ≤ d̄(s) d̄(t), s, t ≥ 0 [the submultiplicativity property].
(b) d(s + t) ≤ 2 d(s) d(t), s, t ≥ 0.
(c) d(t) ≤ d̄(t) ≤ 2 d(t), t ≥ 0.
(d) d(t) and d̄(t) decrease as t increases.
the minimum taken over random pairs (V1 , V2 ) such that Vm has distribution
θm (m = 1, 2).
Fix states i1, i2 and times s, t, and let Y^1, Y^2 denote the chains started
at i1, i2 respectively. By (2.16) we can construct a joint distribution for
(Y^1_s, Y^2_s) such that

P(Y^1_s ≠ Y^2_s) = ||Pi1(Xs = ·) − Pi2(Xs = ·)||.

Now for each pair (j1, j2), we can use (2.16) to construct a joint distribution
for (Y^1_{s+t}, Y^2_{s+t}) given (Y^1_s = j1, Y^2_s = j2) with the property that

P(Y^1_{s+t} ≠ Y^2_{s+t} | Y^1_s = j1, Y^2_s = j2) = ||Pj1(Xt = ·) − Pj2(Xt = ·)||.
The right side is 0 if j1 = j2, and otherwise is at most d̄(t). So unconditionally

P(Y^1_{s+t} ≠ Y^2_{s+t}) ≤ d̄(s) d̄(t)
and (2.16) establishes part (a) of the lemma. For part (b), the same argu-
ment (with Y2 now being the stationary chain) shows
d(s + t) ≤ d(s) d̄(t)    (2.17)
so that (b) will follow from the upper bound d̄(t) ≤ 2d(t) in (c). But this
upper bound is clear from the triangle inequality for variation distance.
And the lower bound in (c) follows from the fact that µ → ||θ − µ|| is a
convex function, so that averaging over j with respect to π in (2.15) can
only decrease distance. Finally, the “decreasing” property for d̄(t) follows
from (a), and for d(t) follows from (2.17). □
The assertions of this section hold in either discrete or continuous time.
But note that the numerical value of d(t) changes when we switch from a
discrete-time chain to the continuized chain. In particular, for a discrete-
time chain with period q we have d(t) → (q−1)/q as t → ∞ (which incidentally
implies, taking q = 2, that the factor 2 in Lemma 2.20(b) cannot be reduced)
whereas for the continuized chain d(t) → 0.
One often sees slightly disguised corollaries of the submultiplicativity
property in the literature. The following is a typical one.
2.4.2 L2 distance
Another notion of distance, which is less intuitively natural but often more
mathematically tractable, is L2 distance. This is defined with respect to
This may look confusing, because a signed measure ν and a function f are
in a sense “the same thing”, being determined by values (f (i); i ∈ I) or
(νi ; i ∈ I) which can be chosen arbitrarily. But the measure ν can also be
determined by its density function f (i) = νi /πi , and so (2.18) and (2.19)
say that the L2 norm of a signed measure is defined to be the L2 norm of
its density function.
So ||θ − µ||2 is the “L2 ” measure of distance between probability dis-
tributions θ, µ. In particular, the distance between θ and the reference
distribution π is
||θ − π||2 = √( ∑_i (θi − πi)^2 / πi ) = √( ∑_i θi^2/πi − 1 ).
Example 2.22 Take I = {0, 1, . . . , n − 1}, fix a parameter 0 < a < 1 and
define a transition matrix
pij = a 1_(j = i+1 mod n) + (1 − a)/n .
In this example the t-step transition probabilities are
p^(t)_ij = a^t 1_(j = i+t mod n) + (1 − a^t)/n
and the stationary distribution π is uniform. We calculate (for arbitrary
j 6= i)
d(t) = ||Pi (Xt ∈ ·) − π|| = (1 − n^{−1}) a^t
d̄(t) = ||Pi (Xt ∈ ·) − Pj (Xt ∈ ·)|| = a^t
||Pi (Xt ∈ ·) − π||2 = (n − 1)^{1/2} a^t .
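The three formulas of Example 2.22 can be confirmed numerically from the t-step transition probabilities (n and a below are arbitrary illustrative values):

```python
n, a = 6, 0.7

def p_t(i, j, t):
    """t-step transition probability of Example 2.22."""
    return a ** t * (1.0 if j == (i + t) % n else 0.0) + (1 - a ** t) / n

results = {}
for t in (1, 2, 5):
    # d(t): distance from uniform; dbar(t): distance between two starting states
    d = 0.5 * sum(abs(p_t(0, j, t) - 1.0 / n) for j in range(n))
    dbar = 0.5 * sum(abs(p_t(0, j, t) - p_t(1, j, t)) for j in range(n))
    l2 = sum((p_t(0, j, t) - 1.0 / n) ** 2 / (1.0 / n) for j in range(n)) ** 0.5
    results[t] = (d, dbar, l2)
```

Each computed value matches (1 − n^{−1})a^t, a^t and (n − 1)^{1/2} a^t respectively.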
Pµ(TA > ms | TA > (m − 1)s) = Pθ(TA > s) for some dist. θ
                            ≤ max_i Pi(TA > s)
                            ≤ t*_A / s.
So by induction on m
Pµ(TA > js) ≤ (t*_A / s)^j
implying
Pµ(TA > t) ≤ (t*_A / s)^⌊t/s⌋ ,  t > 0.
In continuous time, a good (asymptotically optimal) choice of s is s = e t*_A,
giving the exponential tail bound

sup_µ Pµ(TA > t) ≤ exp( −⌊ t/(e t*_A) ⌋ ),  0 < t < ∞.    (2.20)
which extends the familiar fact Ei Ti+ = 1/πi . Multiplying the identity of
Lemma 2.23 by t and summing gives
Eπ TA + 1 = ∑_{t≥1} t Pπ(TA = t − 1)
          = π(A) ∑_{t≥1} t PπA(TA+ ≥ t)
          = π(A) ∑_{m≥1} (1/2) m(m + 1) PπA(TA+ = m)
          = (π(A)/2) ( EπA TA+ + EπA (TA+)^2 ).
Appealing to Kac’s formula and rearranging,
EπA (TA+)^2 = (2 Eπ TA + 1)/π(A) ,    (2.21)

varπA (TA+) = (2 Eπ TA + 1)/π(A) − 1/π^2(A) .    (2.22)
More generally, there is a relation between EπA (TA+)^p and Eπ (TA)^{p−1}.
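Identity (2.21) is easy to verify on a small chain. The sketch below takes A = {0} in an illustrative 3-state chain, computes the first two moments of the hitting time T_0 by value iteration, and checks the second moment of the return time against (2 Eπ TA + 1)/π(A):

```python
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]   # stationary distribution of P
n = 3

# First and second moments of T_0 by value iteration:
# m_i = E_i T_0, s_i = E_i T_0^2, with m_0 = s_0 = 0 and
# s_i = 1 + sum_k p_ik (2 m_k + s_k) from expanding E(1 + T')^2.
m = [0.0] * n
s = [0.0] * n
for _ in range(3000):
    m_new = [0.0] + [1 + sum(P[i][k] * m[k] for k in range(n)) for i in range(1, n)]
    s_new = [0.0] + [1 + sum(P[i][k] * (2 * m[k] + s[k]) for k in range(n))
                     for i in range(1, n)]
    m, s = m_new, s_new

# Return time T_0^+ from state 0 (here pi_A is the point mass at 0).
emean = sum(P[0][k] * (1 + m[k]) for k in range(n))             # E (T_0^+)
second_moment = sum(P[0][k] * (1 + 2 * m[k] + s[k]) for k in range(n))
EpiT = sum(pi[i] * m[i] for i in range(n))                      # E_pi T_A
identity_rhs = (2 * EpiT + 1) / pi[0]
```

For this chain Eπ TA = 5 and π(A) = 1/4, so both sides of (2.21) equal 44; the first moment also confirms Kac's formula E(T_0^+) = 1/π_0 = 4.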
In continuous time, the analog of Lemma 2.23 is
Pπ (TA ∈ (t, t + dt)) = Q(A, Ac )PρA (TA > t)dt, t > 0 (2.23)
where

Q(A, Ac) := ∑_{i∈A} ∑_{j∈Ac} πi qij ,   ρA(j) := ∑_{i∈A} πi qij / Q(A, Ac) ,  j ∈ Ac.
Thus Gij (z) = Fij (z)Gjj (z), and the lemma follows. □
Probability proof. Let ζ have geometric(z) law P(ζ > t) = z^t , independent
of the chain. Then
□
Note that, differentiating term by term,

Ei Tj = (d/dz) Fij (z) |_{z=1} .
This and Lemma 2.25 can be used to give an alternative derivation of the
mean hitting time formula, Lemma 2.12.
and then the number Nt of jumps before time t has Poisson(t) distribution.
Now write Sn = ∑_{j=1}^{n} ξj for the time of the n’th jump. Then the hitting
time T̂A for the continuized chain is related to the hitting time TA of the
discrete-time chain by T̂A = STA . Though these two hitting time distribu-
tions are different, their expectations are the same, and their variances are
related in a simple way. To see this, the conditional distribution of T̂A given
TA is the distribution of the sum of TA independent ξ’s, so (using the notion
of conditional expectation given a random variable)
tells us that

Pi( C/(e t*) > x ) ≤ ne · e^{−x} ,   0 ≤ x < ∞

Pi( C/(e t*) − log(en) > x ) ≤ e^{−x} ,   0 ≤ x < ∞.

In words, this says that the distribution of C/(e t*) − log(en) is stochastically
smaller than the exponential(1) distribution, implying Ei( C/(e t*) − log(en) ) ≤ 1
and hence

max_i Ei C ≤ (2 + log n) e t* .
This argument does lead to a bound, but one suspects the factors 2 and e
are artifacts of the proof; also, it seems hard to obtain a lower bound this
way. The following result both “cleans up” the upper bound and gives a
lower bound.
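As a sanity check of the upper bound just derived, one can compare it with simulated cover times. For the n-cycle, Ei Tj = d(n − d) when i and j are at distance d, so t* = ⌊n/2⌋⌈n/2⌉; the sketch below (parameters illustrative) confirms that the simulated mean cover time sits comfortably under (2 + log n) e t*:

```python
import math, random

n = 12
tstar = (n // 2) * ((n + 1) // 2)   # max_{i,j} E_i T_j on the n-cycle
bound = (2 + math.log(n)) * math.e * tstar

def cover_time_cycle(n, seed):
    """One simulated cover time of simple random walk on the n-cycle."""
    rng = random.Random(seed)
    pos, visited, steps = 0, {0}, 0
    while len(visited) < n:
        pos = (pos + rng.choice((-1, 1))) % n
        visited.add(pos)
        steps += 1
    return steps

runs = [cover_time_cycle(n, s) for s in range(200)]
mean_cover = sum(runs) / len(runs)
```

The true mean cover time of the n-cycle is n(n − 1)/2, much smaller than the bound — consistent with the suspicion above that the constants are artifacts of the proof.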
Proof. We’ll prove the lower bound — the upper bound proof is identi-
cal. Let J1 , J2 , . . . , Jn be a uniform random ordering of the states, inde-
pendent of the chain. Define Cm := max_{i≤m} T_{Ji} to be the time until all of
{J1 , J2 , . . . , Jm } have been visited, in some order. The key identity is
To understand what this says, suppose we are told which are the states
{J1 , J2 , . . . , Jm } and told the path of the chain up through time Cm−1 . Then
we know whether or not Lm = Jm : if not, then Cm = Cm−1 , and if so, then
the conditional distribution of Cm − Cm−1 is the distribution of the time to
hit Jm from the state at time Cm−1 , which we are told is state Lm−1 .
2.7. NEW CHAINS FROM OLD
Writing t* := min_{i≠j} t(i, j), the right side of (2.26) is ≥ t* 1_(Lm = Jm) , and
so taking expectations
S0 = TA = min{t ≥ 0 : Xt ∈ A}
Yn = X_{Sn} .
From the ergodic theorem (Theorem 2.1) it is clear that the stationary
distribution πA of (Yt ) is just π conditioned on A, that is
In general there is little connection between this chain and the original chain
(Xt ), and in general it is not true that the stationary distribution is given
by (2.27). However, when the original chain is reversible, it is easy to check
that the restricted chain does have the stationary distribution (2.27).
p*ij = pij ,  i, j ∈ A

p*ia = ∑_{k∈Ac} pik ,  i ∈ A

p*ai = (1/π(Ac)) ∑_{k∈Ac} πk pki ,  i ∈ A

p*aa = (1/π(Ac)) ∑_{k∈Ac} ∑_{l∈Ac} πk pkl .
Obviously the P-chain started at i and run until TAc is the same as the
P∗ -chain started at i and run until Ta . This leads to the general collapsing
principle
For we may apply the singleton result to the P∗ -chain run until time Ta ,
and the same result will hold for the P-chain run until time TAc .
2.8. MISCELLANEOUS METHODS
Proof. If f satisfies the equations above then for any initial distribution the
process Mt := f(X_{min(t,TA)}) is a martingale. So by the optional stopping
theorem
f (i) = Ei f (XTA ) for all i. (2.28)
This establishes uniqueness. Conversely, if we define f by (2.28) then the
desired equations hold by conditioning on the first step.
then h is constant.
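Equation (2.28) is the basis of a practical method: to compute hitting probabilities, solve the harmonic equations with the appropriate boundary values. A sketch for simple random walk on {0, 1, …, N} with A = {0, N} and boundary data f = 1 at N, f = 0 at 0 (so f(i) = Pi(TN < T0), which is i/N — the gambler's ruin probability):

```python
N = 10
# f is harmonic on the interior: f(i) = 0.5 f(i-1) + 0.5 f(i+1),
# with boundary values f(0) = 0 and f(N) = 1.  Solve by relaxation
# (Jacobi iteration); by (2.28) the solution is f(i) = E_i f(X_{T_A}).
f = [0.0] * (N + 1)
f[N] = 1.0
for _ in range(20000):
    f = [0.0] + [0.5 * (f[i - 1] + f[i + 1]) for i in range(1, N)] + [1.0]
```

The relaxation converges geometrically here; for larger problems one would solve the linear system directly, but the harmonic characterization is the same.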
Lemma 2.29 (The random target lemma) The sum ∑_j Ei Tj πj does not
depend on i.
Proof. This repeats Corollary 2.14 with a different argument. The first-step
recurrence for gj (i) := Ei Tj is
X
gj (i) = 1(i6=j) + 1(i6=j) pik gj (k).
k
By Corollary 2.28 it is enough to show that h(i) := ∑_j πj gj(i) is a harmonic
function. We calculate
h(i) = 1 − πi + ∑_{j,k} πj pik gj(k) 1_(i≠j)
     = 1 − πi + ∑_k pik ( h(k) − πi gi(k) )    by definition of h(k)
     = ∑_k pik h(k) + 1 − πi ( 1 + ∑_k pik gi(k) ).

But 1 + ∑_k pik gi(k) = Ei Ti+ = 1/πi by Lemma 2.5, so the final bracket
vanishes and h is harmonic.
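The conclusion is easy to check numerically on any irreducible chain (the matrix below is an arbitrary illustration): compute π and all the Ei Tj, and verify that ∑_j πj Ei Tj is the same for every starting state i:

```python
n = 3
P = [[0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4],
     [0.5, 0.25, 0.25]]

# stationary distribution by power iteration
pi = [1.0 / n] * n
for _ in range(1000):
    pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def mean_hitting(j, sweeps=5000):
    """E_i T_j for all i, by value iteration with h(j) fixed at 0."""
    h = [0.0] * n
    for _ in range(sweeps):
        h = [0.0 if i == j else
             1.0 + sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
    return h

E = [mean_hitting(j) for j in range(n)]        # E[j][i] = E_i T_j
targets = [sum(pi[j] * E[j][i] for j in range(n)) for i in range(n)]
```

All entries of `targets` agree, as the random target lemma asserts, even though this chain is not reversible.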
Proof. Recall that “before” means strictly before. The assertion of the
lemma is intuitively obvious, because each time the chain visits j it has
chance pjk to make a transition j → k, and one can formalize this as in the
proof of Proposition 2.4. A more sophisticated proof is to observe that M (t)
is a martingale, where
Then Ei TA ≤ h(i), i ∈ I.
Ei MTA = Ei M0 = h(i).
Proof. The proof implicitly compares the given chain to the continuous-time
chain with qi,i−1 = m(i). Write h(i) = ∑_{j=1}^{i} 1/m(j), and extend h by linear
interpolation,
where h′ is the left derivative. The result now follows from Lemma 2.31.
E(Yi+1 − Yi |Yj , j ≤ i) ≤ c, i ≥ 0
EYT ≤ cET.
(Eξi )(ET ).
the treatise of Syski [318] on hitting times. Most textbooks leave an exag-
gerated impression of the difference between discrete- and continuous-time
chains.
Section 2.1.2. Continuized is an ugly neologism, but no-one has collected
my $5 prize for suggesting a better name!
Section 2.2. Elementary matrix treatments of results like those in section
2.2.2 for finite state space can be found in [186, 214]. On more general spaces,
this is part of recurrent potential theory: see [96, 215] for the countable-state
setting and Revuz [289] for continuous space. Our treatment, somewhat
novel at the textbook level, follows Pitman [283], who studied occupation
measure identities more general than those in section 2.2.1 and their
applications to hitting time formulas; we follow his approach in the section
on the mean hitting time formula. We are being
slightly dishonest in treating Lemmas 2.5 and 2.6 this way, because these
facts figure in the “right” proof of the ergodic theorems we use. We made
a special effort not to abbreviate “number of visits to j before time S” as
Nj (S), which forces the reader to decode formulas.
Kemeny and Snell [214] call Z + Π the fundamental matrix, and use
(Ei Tj+ ) rather than (Ei Tj ) as the matrix of mean hitting times. Our set-up
seems a little smoother – cf. Meyer [202] who calls Z the group inverse of
I − P.
The name “random target lemma” for Corollary 2.14 was coined by
Lovász and Winkler [241]; the result itself is classical ([214] Theorem 4.4.10).
Section 2.6. Matthews [256, 257] introduced his method (Theorem 2.26)
to study some highly symmetric walks (cf. Chapter 7) and to study some
continuous-space Brownian motion covering problems.
Section 2.7. A more sophisticated notion is “the chain conditioned never
to hit A”, which can be formalized using Perron-Frobenius theory.
Section 2.8.1. Applying the optional stopping theorem involves checking
side conditions (involving integrability of the martingale or the stopping
time), but these are trivially satisfied in our applications.
Numerical methods. In many applications of non-reversible chains, e.g.
to queueing-type processes, one must resort to numerical computations of
the stationary distribution: see Stewart [314]. We don’t discuss such issues
because in the reversible case we have conceptually simple expressions for
the stationary distribution,
Matrix methods. There is a curious dichotomy between textbooks on
Markov chains which use matrix theory almost everywhere and textbooks
which use matrix theory almost nowhere. Our style is close to the latter;
matrix formalism obscures more than it reveals. For our purposes, the one
piece of matrix theory which is really essential is the spectral decomposition
of reversible transition matrices in Chapter 3. Secondarily useful is the the-
ory surrounding the Perron-Frobenius theorem, quoted for reversible chains
in Chapter 3 section 6.5. (yyy 9/2/94 version)
2.10. MOVE TO OTHER CHAPTERS
(d/dα) π = π R Z .
Proof. Write η = (d/dα) π. Differentiating the balance equations π = πP gives
η = ηP + πR, in other words η(I − P) = πR. Right-multiply by Z to get
Chapter 2 (9/10/99 version) reviewed some aspects of the elementary theory
of general finite irreducible Markov chains. In this chapter we specialize to reversible
chains, treating the discrete-time and continuous-time cases in parallel. Af-
ter Section 3.3 we shall assume that we are dealing with reversible chains
without continually repeating this assumption, and shall instead explicitly
say “general” to mean not necessarily reversible.
3.1 Introduction
Recall P denotes the transition matrix and π the stationary distribution of
a finite irreducible discrete-time chain (Xt ). Call the chain reversible if
πi pij = πj pji for all i, j. (3.1)
Equivalently, suppose (for given irreducible P) that π is a probability distri-
bution satisfying (3.1). Then π is the unique stationary distribution and the
chain is reversible. This is true because (3.1), sometimes called the detailed
balance equations, implies
X X
πi pij = πj pji = πj for all j
i i
and therefore π satisfies the balance equations of (1) in Chapter 2 (9/10/99 version).
The name reversible comes from the following fact. If (Xt ) is the sta-
tionary chain, that is, if X0 has distribution π, then
(X0 , X1 , . . . , Xt ) =^d (Xt , Xt−1 , . . . , X0 ).
CHAPTER 3. REVERSIBLE MARKOV CHAINS (SEPTEMBER 10, 2002)
More vividly, given a movie of the chain run forwards and the same movie
run backwards, you cannot tell which is which.
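One can check the movie fact directly on a small reversible chain by enumerating all paths of a given length and comparing each path's probability with that of its reversal (the chain below is an illustrative choice satisfying the detailed balance equations (3.1)):

```python
from itertools import product

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]   # satisfies pi_i p_ij = pi_j p_ji for all i, j
n, t = 3, 4

def path_prob(path):
    """Probability of a path for the stationary chain (X0 ~ pi)."""
    p = pi[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

# For a reversible stationary chain, every path has the same probability
# as its time-reversal.
max_gap = max(abs(path_prob(w) - path_prob(w[::-1]))
              for w in product(range(n), repeat=t + 1))
```

The maximum discrepancy is zero (up to floating-point rounding), which is exactly the statement that the movie and its reversal are indistinguishable in law.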
It is elementary that the same symmetry property (3.1) holds for the
t-step transition matrix Pt :

πi p^(t)_ij = πj p^(t)_ji .
But beware that the symmetry property does not work for mean hitting
times: the assertion
πi Ei Tj = πj Ej Ti
is definitely false in general (see the Notes for one intuitive explanation).
See Chapter 7 (1/31/94 version) for further discussion. The following general
lemma will be useful there. [The lemma has been copied here from Section 1.2
of Chapter 7 (1/31/94 version); reminder: it still needs to be deleted there!]

Lemma 3.1 For an irreducible reversible chain, the following are equivalent.
(a) Pi (Xt = i) = Pj (Xt = j), i, j ∈ I, t ≥ 1
(b) Pi (Tj = t) = Pj (Ti = t), i, j ∈ I, t ≥ 1.
satisfy
Fij = Gij /Gjj . (3.3)
For i 6= j we have seen that Gij = Gji , and hence by (3.3)
Ei0 Ti1 + Ei1 Ti2 + · · · + Eim Ti0 = Ei0 Tim + Eim Tim−1 + · · · + Ei1 Ti0 .
Ei0 Ti1 + Ei1 Ti2 + · · · + Eim Ti0 = E*i0 Tim + E*im Tim−1 + · · · + E*i1 Ti0    (3.6)
This is simple once you see the right picture. Consider the stationary P-
chain (X0 , X1 , X2 , . . .). We can specify the game in terms of that chain by
taking the initial state to be X1 , and the mouse’s jump to be to X0 , and
the cat’s moves to be to X2 , X3 , . . . . So M = T+ − 1 with

T+ := min{t ≥ 1 : Xt = X0 }.
Now let (X̂t , Ŷt ) be the positions of (cat, mouse) after t moves according to
some strategy. Consider
Wt ≡ t + f (X̂t , Ŷt ).
Equalities (3.8) and (3.10) are exactly what is needed to verify
(Wt ; 0 ≤ t ≤ M ) is a martingale.
So the optional stopping theorem says E(x,y) W0 = E(x,y) WM , that is,
where Mij is the matrix whose only nonzero entries are m(i, i) = m(j, j) =
−1 and m(i, j) = m(j, i) = 1. Plainly Mij is negative semidefinite, hence so
is W0 .
3.2. REVERSIBLE CHAINS AND WEIGHTED GRAPHS
where

wv := ∑_x wvx ,   w := ∑_v wv .
Note that w is the total edge-weight, when each edge is counted twice, i.e.,
once in each direction. The fundamental fact is that this chain is automat-
ically reversible with stationary distribution
πv ≡ wv /w (3.14)
because (3.1) is obviously satisfied by πv pvx = πx pxv = wvx /w. Our standing
convention that graphs be connected implies that the chain is irreducible.
Conversely, with our standing convention that chains be irreducible, any
reversible chain can be regarded as a random walk on the weighted graph
with edge-weights wvx := πv pvx . Note also that the “aperiodic” condition for
a Markov chain (occurring in the convergence theorem, Chapter 2 Theorem 2,
9/10/99 version) is just the condition that the graph not be bipartite.
An unweighted graph can be fitted into this setup by simply assigning
weight 1 to each edge. Since we’ll be talking a lot about this case, let’s write
out the specialization explicitly. The transition matrix becomes
pvx = 1/dv   if (v, x) is an edge
    = 0      if not
πv = dv / (2|E|)    (3.15)
In this model the stationary distribution is always uniform (cf. Section 3.2.1).
In the case of an unweighted regular graph the two models are identical up to
a deterministic time rescaling, but for non-regular graphs there are typically
no exact relations between numerical quantities for the two continuous-time
models. Note that, given an arbitrary continuous-time reversible chain, we
can define edge-weights (wij ) via
but the weights (wij ) do not completely determine the chain: we can specify
the πi independently and then solve for the q’s.
Though there’s no point in writing out all the specializations of the
general theory of Chapter 2 (9/10/99 version), let us emphasize the simple expressions for
mean return times of discrete-time walk obtained from Chapter 2 Lemma 5 (9/10/99 version)
and the expressions (3.14)–(3.15) for the stationary distribution.
It’s a good question, because if you don’t know Markov chain theory it looks
too messy to do by hand, whereas using Markov chain theory it becomes very
simple. The knight is performing random walk on a graph (the 64 squares
are the vertices, and the possible knight-moves are the edges). It is not hard
to check that the graph is connected, so by the elementary Lemma 3.5, for
a corner square v the mean return time is
Ev Tv+ = 1/πv = 2|E|/dv = |E| ,
and by drawing a sketch in the margin the reader can count the number of
edges |E| to be 168.
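Rather than drawing the sketch, one can let the computer count: the block below builds the knight-move graph, checks |E| = 168, and redoes the corner-square mean return time calculation:

```python
# Build the knight-move graph on the 8 x 8 board and count its edges.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
edges = set()
deg = {}
for x in range(8):
    for y in range(8):
        deg[(x, y)] = 0
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            if 0 <= nx < 8 and 0 <= ny < 8:
                deg[(x, y)] += 1
                edges.add(frozenset({(x, y), (nx, ny)}))

num_edges = len(edges)                         # the |E| counted in the margin
corner_return = 2 * num_edges / deg[(0, 0)]    # E_v T_v^+ = 2|E| / d_v
```

A corner square has degree 2, so the mean return time 2|E|/d_v = 336/2 = 168 happens to coincide numerically with |E| itself.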
The following cute variation of Lemma 3.5 is sometimes useful. Given
the discrete-time random walk (Xt ), consider the process
Zt = (Xt−1 , Xt )
recording the present position at time t and also the previous position.
Clearly (Zt ) is a Markov chain whose state-space is the set E⃗ of directed
edges, and its stationary distribution (ρ, say) is
ρ(v, x) = wvx / w

in the general weighted case, and hence

ρ(v, x) = 1/|E⃗| ,   (v, x) ∈ E⃗
in the unweighted case. Now given an edge (x, v), we can apply Chapter 2
Lemma 5 (9/10/99 version) to (Zt ) and the state (x, v) to deduce the following.
Then
E_v U = w/w_vx   (weighted)
      = 2|E|    (unweighted).
We shall soon see (Section 3.3.3) this inequality has a natural interpretation
in terms of electrical resistance, but it is worth remembering that the result
is more elementary than that.
Here is another variant of Lemma 3.5.
buckets i and j are pi and pj , the flow rate through the tube should be
proportional to the pressure difference and hence should be wij (pi − pj ) in
the direction i → j, where wij = wji is a parameter. Neglecting the fluid
in the tubes, the quantities of fluid (pi (t)) at time t will evolve according to
the differential equations
dp_j(t)/dt = Σ_{i≠j} w_ij (p_i(t) − p_j(t)).
These of course are the same equations as the forward equations [(4) of
Chapter 2] for p_i(t) (the probability of being in state i at time t) for the
continuous-time chain with transition rates q_ij = w_ij, j ≠ i. Hence we call
this particular way of defining a continuous-time chain in terms of a weighted
graph the fluid model. Our main purpose in mentioning this notion is to
distinguish it from the electrical network analogy in the next section. Our
intuition about fluids says that as t → ∞ the fluid will distribute itself
uniformly amongst buckets, which corresponds to the elementary fact that
the stationary distribution of the “fluid model” chain is always uniform.
Our intuition also says that increasing a “specific flow rate” parameter wij
will make the fluid settle faster, and this corresponds to a true fact about
the “fluid model” Markov chain (in terms of the eigenvalue interpretation
of asymptotic convergence rate—see Corollary 3.28). On the other hand
the same assertion for the usual discrete-time chain or its continuization is
simply false.
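For readers who like to experiment, here is a small numerical sketch (ours, with an arbitrary 4-vertex weighted graph) that Euler-integrates the fluid-model equations and exhibits the convergence of the fluid levels to the uniform distribution:

```python
# Euler-integrate the fluid-model equations dp_j/dt = sum_i w_ij (p_i - p_j)
# on a small weighted graph; the fluid approaches the uniform distribution.
import numpy as np

W = np.array([[0, 2, 0, 1],
              [2, 0, 3, 0],
              [0, 3, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # symmetric edge-weights w_ij

p = np.array([1.0, 0.0, 0.0, 0.0])          # all fluid starts in bucket 0
dt = 0.01
for _ in range(5000):
    # dp_j/dt = (W p)_j - (sum_i w_ij) p_j
    p = p + dt * (W @ p - W.sum(axis=0) * p)

print(np.round(p, 4))   # close to [0.25, 0.25, 0.25, 0.25]
```

Note that the total amount of fluid is conserved at every step, reflecting the fact that the right sides of the differential equations sum to zero.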
which implies Σ_{i∈A} f(i) = −1. Given a Markov chain X (in particular, given
a weighted graph we can use the random walk) we can define a special flow
as follows. Given v_0 ∉ A, define f^{v_0→A} by
f_ij := E_{v_0} Σ_{t=1}^{T_A} [1_{(X_{t−1}=i, X_t=j)} − 1_{(X_{t−1}=j, X_t=i)}].   (3.18)
I(v) = 0,   v ∉ {v_0} ∪ A.   (3.21)
Regarding the above as intuition arising from the study of physical elec-
trical networks, we can define an electrical network mathematically as a
weighted graph together with a function g and a flow I, called voltage and
current, respectively, satisfying (3.20)–(3.21) and the normalization (3.19).
3.3. ELECTRICAL NETWORKS 69
As it turns out, these three conditions specify g [and hence also I, by (3.20)]
uniquely, since (3.20)–(3.21) imply
g(v) = Σ_x p_vx g(x),   v ∉ {v_0} ∪ A   (3.22)
with p_vx defined at (3.13), and Chapter 2 Lemma 27 shows that this equation,
together with the boundary conditions (3.19), has a unique solution.
Conversely, if g is the unique function satisfying (3.22) and (3.19), then I de-
fined by (3.20) satisfies (3.21), as required. Thus a weighted graph uniquely
determines both a random walk and an electrical network.
The point of this subsection is that the voltage and current functions can
be identified in terms of the random walk. Recall the flow f v0 →A defined
at (3.18).
Proposition 3.10 Consider a weighted graph as an electrical network, where
a wire linking v and x has conductance wvx . Suppose that the voltage func-
tion g satisfies (3.19). Then the voltage at any vertex v is given in terms of
the associated random walk by
and the current Ivx along each wire (v, x) is fvx /r, where f = f v0 →A and
r = 1 / (w_{v_0} P_{v_0}(T_A < T_{v_0}^+)) ∈ (0, ∞).   (3.24)
Since f is a unit flow from v0 to A and g(v0 ) = 1, we find I(v0 ) = g(v0 )/r.
Since g = 0 on A, it is thus natural in light of Ohm’s law to regard the entire
network as effectively a single conductor from v0 to A with resistance r; for
this reason r is called the effective resistance between v0 and A. Since (3.19)
and (3.21) are clearly satisfied, to establish Proposition 3.10 it suffices by
our previous comments to prove (3.20), i.e.,
f_vx / r = (g(v) − g(x)) w_vx.   (3.25)
Proof of (3.25). Here is a “brute force” proof, writing everything in
terms of mean hitting times. First, there is no loss of generality in assuming
that A is a singleton {a}, by the collapsing principle (Chapter 2 Section 7.3).
Now by the Markov property
Chapter 2 Lemma 9 gives a formula for the expectations above, and using
π_v p_vx = π_x p_xv = w_vx/w we get
(w/w_vx) f_vx = E_a T_v − E_{v_0} T_v − E_a T_x + E_{v_0} T_x.   (3.26)
And Chapter 2 Corollaries 8 and 10 give a formula for g, which leads to
(g(v) − g(x)) / (π_{v_0} P_{v_0}(T_a < T_{v_0}^+)) = E_v T_a − E_x T_a − E_v T_{v_0} + E_x T_{v_0}.   (3.27)
But the right sides of (3.27) and (3.26) are equal, by the cyclic tour property
(Lemma 3.2) applied to the tour v0 , x, a, v, v0 , and the result (3.25) follows
after rearrangement, using πv0 = wv0 /w.
Remark. Note that, when identifying a reversible chain with an electrical
network, the procedure of collapsing the set A of states of the chain to a
singleton corresponds to the procedure of shorting together the vertices A
of the electrical network.
E_v T_a + E_a T_v = w r_va.
Note that the Corollary takes a simple form in the case of unweighted graphs:
E_v T_a + E_a T_v = 2|E| r_va.   (3.29)
Note also that the Corollary does not hold so simply if a and v are both
replaced by subsets—see Corollary 3.37.
Corollary 3.11 apparently was not stated explicitly or exploited until
a 1989 paper of Chandra et al [85], but then rapidly became popular in
the “randomized algorithms” community. The point is that “cutting or
shorting” arguments can be used to bound mean commute times. As the
simplest example, it is obvious that the effective resistance rvx across an edge
(v, x) is at most the resistance 1/wvx of the edge itself, and so Corollary 3.11
implies the edge-commute inequality (Corollary 3.8). Finally, we can use
Corollary 3.11 to get simple exact expressions for mean commute times in
some special cases, in particular for the birth-and-death processes (i.e., weighted
linear graphs) discussed in Chapter 5.
As with the infinite-space results, the electrical analogy provides a vivid
language for comparison arguments, but the arguments themselves can be
justified via the extremal characterizations of Section 3.7 without explicit
use of the analogy.
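As a concrete check of the unweighted form (3.29), the following sketch (ours; the path graph and the little hitting-time solver are illustrative choices) computes the commute time between the endpoints of a path by solving the hitting-time equations and compares it with 2|E| r:

```python
# On an unweighted path 0-1-...-m, the effective resistance between the
# endpoints is r = m (edges in series), so (3.29) predicts
# E_0 T_m + E_m T_0 = 2|E| r = 2 m^2.
import numpy as np

m = 6
P = np.zeros((m + 1, m + 1))
for v in range(m + 1):
    nbrs = [u for u in (v - 1, v + 1) if 0 <= u <= m]
    for u in nbrs:
        P[v, u] = 1.0 / len(nbrs)

def mean_hitting_time(P, start, target):
    # Solve (I - P restricted to states != target) h = 1 for E_. T_target.
    n = P.shape[0]
    keep = [v for v in range(n) if v != target]
    A = np.eye(n - 1) - P[np.ix_(keep, keep)]
    h = np.linalg.solve(A, np.ones(n - 1))
    return h[keep.index(start)]

commute = mean_hitting_time(P, 0, m) + mean_hitting_time(P, m, 0)
print(commute)   # 72.0, i.e. 2|E| r = 2 * m * m
```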
∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗
CONVENTION.
For the rest of the chapter we make the convention that we are dealing
with a finite-state, irreducible, reversible chain, and we will not repeat the
“reversible” hypothesis in each result. Instead we will say “general chain”
to mean not-necessarily-reversible chain.
∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗∗
S = UΛU^T
The classical fact that |λ_i| ≤ 1 follows easily from the fact that the entries
of S^{(t)} are bounded as t → ∞, by (3.31) below. These λ’s are the eigenvalues
of P, as well as of S. That is, the solutions (λ; x) with x ≢ 0 of
Σ_i x_i p_ij = λ x_j   for all j
(λ = λ_m;  y_i = c_m π_i^{−1/2} u_im,  i = 1, …, n).
u_i1 = π_i^{1/2}.
S^{(t)} = U Λ^{(t)} U^T
and
p_ij^{(t)} = π_i^{−1/2} s_ij^{(t)} π_j^{1/2},   (3.31)
so
P_i(X_t = j) = π_i^{−1/2} π_j^{1/2} Σ_{m=1}^n λ_m^t u_im u_jm.   (3.32)
As before, U is an orthonormal matrix and u_i1 = π_i^{1/2}, and now the λ’s
are the eigenvalues of −Q. In the continuous-time setting, the eigenvalues
satisfy
0 = λ_1 < λ_2 ≤ · · · ≤ λ_n.   (3.34)
Rather than give the general proof, let us consider the effect of continuizing
the discrete-time chain (3.32). The continuized chain (Yt ) can be represented
as Yt = XN (t) where N (t) has Poisson(t) distribution, so by conditioning on
N (t) = ν,
P_i(Y_t = j) = π_i^{−1/2} π_j^{1/2} Σ_{m=1}^n u_im u_jm Σ_{ν=0}^∞ λ_m^ν e^{−t} t^ν / ν!
             = π_i^{−1/2} π_j^{1/2} Σ_{m=1}^n u_im u_jm exp(−(1 − λ_m)t).
λ_m^{(c)} = 1 − λ_m^{(d)}   (3.35)
Σ_i Z_ii = Σ_{m≥2} λ_m^{−1}.
Σ_j π_j E_i T_j = Σ_{m≥2} λ_m^{−1}   (continuous time)
i ⪯ j iff E_π T_i ≤ E_π T_j.
E_i T_j ≥ E_j T_i iff i ⪯ j.
Here is a counterexample. (I haven’t tried to find counterexamples with
more than three states.) Choose 0 < ε < 1/2 arbitrarily and let
P := [ 2ε   1−2ε   0
       ε    1−2ε   ε
       0    1−2ε   2ε ].
Z_ij = π_i^{−1/2} π_j^{1/2} Σ_{m≥2} λ_m^{−1} u_im u_jm.
This implies
the symmetrized matrix π_i^{1/2} Z_ij π_j^{−1/2} is positive semidefinite.   (3.42)
Note that a symmetric positive semidefinite matrix (M_ij) has the property
M_ij² ≤ M_ii M_jj. This gives
Z_ij² ≤ Z_ii Z_jj π_j / π_i,   (3.43)
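A numerical sanity check of (3.43) (a sketch of ours; the 4-vertex weighted graph is arbitrary) builds Z from the spectral formula for the continuized walk and tests the inequality entrywise:

```python
# Build Z from Z_ij = pi_i^{-1/2} pi_j^{1/2} sum_{m>=2} lam_m^{-1} u_im u_jm
# for the continuized walk, then verify (3.43): Z_ij^2 <= Z_ii Z_jj pi_j/pi_i.
import numpy as np

W = np.array([[0, 3, 1, 0],
              [3, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 1, 2, 0]], dtype=float)
w_i = W.sum(axis=1)
P = W / w_i[:, None]
pi = w_i / w_i.sum()

S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
eig, U = np.linalg.eigh(S)          # ascending; eig[-1] = 1
lam = 1.0 - eig                     # eigenvalues of -Q for the continuized chain

n = len(pi)
Z = np.zeros((n, n))
for m in range(n - 1):              # skip the top eigenvalue (lam = 0)
    Z += np.outer(U[:, m], U[:, m]) / lam[m]
Z = Z * np.sqrt(pi)[None, :] / np.sqrt(pi)[:, None]

ok = all(Z[i, j]**2 <= Z[i, i] * Z[j, j] * pi[j] / pi[i] + 1e-12
         for i in range(n) for j in range(n))
print(ok)   # True
```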
Our applications will use only the special case of a finite sum
f(t) = Σ_m a_m e^{−θ_m t},   for some a_m > 0, θ_m ≥ 0,   (3.45)
Finally, from the definition of the spectral gap λ it is clear that f (t)/f (0) ≤
e−λt . But F̄ has the same spectral gap as f .
Returning to the study of continuous-time reversible chains, the spectral
representation (3.40) says that Pi (Xt = i) is a CM function. It is often
convenient to subtract the limit and say
But analogs of Lemma 3.16 and subsequent results (e.g., Proposition 3.22)
become messier—so we prefer to derive discrete-time results by continuiza-
tion.
Write f(t) for the integrand. We know f is CM, and here λ ≥ λ_2 by (3.40),
and f′(0) = −q_i, so the extreme bounds of Lemma 3.16 become, after
multiplying by f(0) = 1 − π_i,
Σ_j π_j E_π T_j = Σ_{m≥2} λ_m^{−1} ≤ (n − 1)τ_2
If the right side were strictly less than 2(n − 1) for all i, then
Σ_i π_i (τ_0 + E_π T_i) < 2(n − 1) Σ_i π_i (1 − π_i),
which implies
2τ_0 < 2(n − 1) (1 − Σ_i π_i²) ≤ 2(n − 1)(1 − 1/n) = 2(n − 1)²/n,
var_i T_i^+ ≥ (1 − π_i)(1 − 2π_i) / π_i².
Lemma 3.20
Σ_j p_ij²(t)/π_j = p_ii(2t)/π_i   (3.58)
| p_ik(t + s)/π_k − 1 | ≤ √( (p_ii(2t)/π_i − 1) (p_kk(2s)/π_k − 1) )   (3.59)
p_ik(t + s)/π_k ≤ √( (p_ii(2t)/π_i) (p_kk(2s)/π_k) ),  and so  max_{i,k} p_ik(2t)/π_k ≤ max_i p_ii(2t)/π_i.   (3.60)
Proof.
p_ik(t + s)/π_k = Σ_j p_ij(t) p_jk(s)/π_k = Σ_j p_ij(t) p_kj(s)/π_j
and applying the Cauchy–Schwarz inequality, we get the bound √(a_i(t) a_k(s)),
where
This proves (3.59). The cruder bound (3.60) is sometimes easier to use
than (3.59) and is proved similarly.
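Identity (3.58) is easy to verify numerically in discrete time; the sketch below (ours, on an arbitrary 3-state reversible chain) compares the two sides using matrix powers:

```python
# Check the discrete-time form of (3.58): for a reversible chain,
# sum_j p_ij(t)^2 / pi_j = p_ii(2t) / pi_i.
import numpy as np

W = np.array([[0, 2, 1],
              [2, 0, 3],
              [1, 3, 0]], dtype=float)
P = W / W.sum(axis=1)[:, None]
pi = W.sum(axis=1) / W.sum()

t = 5
Pt = np.linalg.matrix_power(P, t)
P2t = np.linalg.matrix_power(P, 2 * t)

i = 0
lhs = np.sum(Pt[i, :]**2 / pi)
rhs = P2t[i, i] / pi[i]
print(abs(lhs - rhs))   # essentially zero
```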
Discussion. Recalling from Chapter 2 Section 4.2 the definition of L²
distance between distributions, (3.58) says
||P_i(X_t ∈ ·) − π||_2² = p_ii(2t)/π_i − 1.   (3.61)
In continuous time, we may regard the assertion “kPi (Xt ∈ ·) − πk2 is de-
creasing in t” as a consequence of the equality in (3.61) and the CM property
of pii (t). This assertion in fact holds for general chains, as pointed out in
Chapter 2 Lemma 35. Loosely, the general result of Chapter 2 Lemma 35
says that in a general chain the ratios (P_ρ(X_t = j)/π_j, j ∈ I) considered
as an unordered set tend to smooth out as t increases. For a reversible
chain, much more seems to be true. There is some “intrinsic geometry” on
the state space such that, for the chain started at i, the probability dis-
tribution as time increases from 0 “spreads out smoothly” with respect to
the geometry. It’s hard to formalize that idea convincingly. On the other
hand, (3.61) does say convincingly that the rate of convergence of the single
probability pii (t) to π(i) is connected to a rate of convergence of the entire
distribution Pi (Xt ∈ ·) to π(·). This intimate connection between the local
and the global behavior of reversible chains underlies many of the technical
inequalities concerning mixing times in Chapter 4 and subsequent chapters.
3.5. COMPLETE MONOTONICITY 83
and in particular
π(A^c)/Q(A, A^c) = E_{ρ_A} T_A ≤ E_{π_{A^c}} T_A = E_π T_A / π(A^c) ≤ 1/λ_A.   (3.69)
(iv)
E_π T_A ≤ τ_2 π(A^c) / π(A).   (3.70)
Remarks. (a) In discrete time we can define ρ_A and Q(A, A^c) by replacing
q_ij by p_ij in (3.65)–(3.66), and then (3.69) holds in discrete time. The left
equality of (3.69) is then a reformulation of Kac’s formula (3.62), because
E_{π_A} T_A^+ = 1 + P_{π_A}(X_1 ∈ A^c) E_{π_A}(T_A^+ − 1 | X_1 ∈ A^c)
             = 1 + (Q(A, A^c)/π(A)) E_{ρ_A} T_A.
[Margin note concerning (b): the results cited from later require that the chain
restricted to A^c be irreducible, but I think that requirement can be dropped
using a limiting argument.]
(b) Equation (3.83) and Corollary 3.34 [together with remark (b) following
Theorem 3.33] later show that 1/λA ≤ τ2 /π(A). So (3.70) can be regarded
as a consequence of (3.69). Reverse inequalities will be studied in Chapter 4.
Proof of Proposition 3.21. First consider the case where A is a single-
ton {a}. Then (3.70) is an immediate consequence of Lemma 3.17. The
equalities in (3.69) and in (3.67) are general identities for stationary
processes [(24) and (23) in Chapter 2]. We shall prove below that T_A is CM
under PπI\{a} . Then by (3.63), (3.67), and (3.46), TA is also CM under the
other two initial distributions. Then the second inequality of (3.68) is the
upper bound in Lemma 3.16, and the first is a consequence of (3.67) and
Lemma 3.16. And (3.69) follows from (3.68) by integrating over t.
To prove that T_A is CM under P_{π_{I∖{a}}}, introduce a parameter 0 < ε < 1
and consider the modified chain (X_t^ε) with transition rates
q_ij^ε := q_ij,   i ≠ a
q_aj^ε := ε q_aj.
because the chain gets “stuck” upon hitting a. But the left side is CM
by (3.52), so the right side (which does not depend on ε) is CM, because the
class of CM distributions is closed under pointwise limits. (The last asser-
tion is in general the continuity theorem for Laplace transforms [133] p. 83,
though for our purposes we need only the simpler fact that the set of func-
tions of the form (3.45) with at most n summands is closed.)
This completes the proof when A is a singleton. We now claim that the
case of general A follows from the collapsing principle (Chapter 2 Section 7.3),
i.e., by applying the special case to the chain in which the subset A is col-
lapsed into a single state. This is clear for all the assertions of Proposi-
tion 3.21 except for (3.70), for which we need the fact that the relaxation
time τ2A of the collapsed chain is at most τ2 . This fact is proved as Corol-
lary 3.27 below.
Remark. Note that the CM property implies a supermultiplicativity
property for hitting times from stationarity in a continuous-time reversible
chain:
P_π(T_A > s + t) ≥ P_π(T_A > s) P_π(T_A > t).
Contrast with the general submultiplicativity property (Chapter 2 Section 4.3),
which holds when P_π is replaced by max_i P_i.
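This supermultiplicativity is easy to test numerically. In the sketch below (ours; the weighted graph and the set A are arbitrary), the survival probability P_π(T_A > t) for the continuized walk is computed by exponentiating the generator restricted to A^c, via the symmetrization that reversibility permits:

```python
# P_pi(T_A > t) = sum_{i not in A} pi_i [exp(t Q_A) 1]_i for the continuized
# walk; check P_pi(T_A > s+t) >= P_pi(T_A > s) P_pi(T_A > t).
import numpy as np

W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 1],
              [2, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P = W / W.sum(axis=1)[:, None]
pi = W.sum(axis=1) / W.sum()
Q = P - np.eye(4)                   # continuization rates

A = [3]
keep = [0, 1, 2]
QA = Q[np.ix_(keep, keep)]
piA = pi[keep]

def surv(t):
    # exp(t QA) via the symmetrization D^{1/2} QA D^{-1/2} (reversibility)
    d = np.sqrt(piA)
    Sym = d[:, None] * QA / d[None, :]
    eig, U = np.linalg.eigh(Sym)
    E = U @ np.diag(np.exp(eig * t)) @ U.T
    return float(piA @ ((E / d[:, None] * d[None, :]) @ np.ones(3)))

s, t = 0.8, 1.7
print(surv(s + t) >= surv(s) * surv(t))   # True
```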
and so
ET² / (2(ET)²) = EΘ² / (EΘ)² ≥ 1
with equality iff Θ is constant, i.e., iff T has exponential distribution. This
suggests that the difference ET²/(2(ET)²) − 1 can be used as a measure of
“deviation from exponentiality”. Let us quote a result of Mark Brown ([72]
Theorem 4.1(iii)) which quantifies this idea in a very simple way.
But λ_m^{−2} ≤ λ_2^{−1} λ_m^{−1} = τ_2 λ_m^{−1} for m ≥ 2, so the right side of (3.73) is bounded
by π_j^{−1} τ_2 Σ_{m≥2} u_jm² λ_m^{−1}, which by (3.72) equals τ_2 E_π T_j. Applying Proposition 3.22 gives Proposition 3.23.
We give a straightforward but tedious verification of (3.73) (see also
Notes). The identity x²/2 = ∫_0^∞ (x − t)^+ dt, x ≥ 0, starts the calculation:
(1/2) E_π T_j² = ∫_0^∞ E_π (T_j − t)^+ dt
  = ∫_0^∞ Σ_i P_π(X_t = i, T_j > t) E_i T_j dt
  = Σ_i E_i T_j E_π(time spent at i before T_j)
  = Σ_i [(Z_jj − Z_ij)/π_j] [(Z_jj π_i − Z_ji π_j)/π_j]
      by Chapter 2 Lemmas 12 and 15 (continuous-time version)
  = π_j^{−2} Σ_i π_i (Z_jj − Z_ij)².
3.6. EXTREMAL CHARACTERIZATIONS OF EIGENVALUES
Expanding the square, the cross-term vanishes and the first term becomes
(Z_jj/π_j)² = (E_π T_j)², so
(1/2) E_π T_j² − (E_π T_j)² = π_j^{−2} Σ_i π_i Z_ij².
π_j^{−1} Σ_i π_i Z_ij²
  = π_j^{−1} Σ_i π_i ∫ (p_ij(s) − π_j) ds ∫ (p_ij(t) − π_j) dt
  = Σ_i ∫ (p_ji(s) − π_i) ds ∫ (p_ij(t) − π_j) dt
  = ∫∫ (p_jj(s + t) − π_j) ds dt
  = ∫ t (p_jj(t) − π_j) dt
  = ∫ Σ_{m≥2} u_jm² t e^{−λ_m t} dt
  = Σ_{m≥2} u_jm² λ_m^{−2}.
in discrete time, and substitute q_ij for p_ij in continuous time. One can
immediately check the following equivalent definitions. In discrete time
In continuous time
E(g, g) = (1/2) lim_{t→0} t^{−1} E_π (g(X_t) − g(X_0))²,
where the sum includes j = i. Note also that for random walk on a weighted
graph, (3.74) becomes
E(g, g) := (1/2) Σ_i Σ_{j≠i} (w_ij/w) (g(j) − g(i))².   (3.77)
Recall from Chapter 2 Section 6.2 the discussion of L² norms for functions
and measures. In particular
||g||_2² = Σ_i π_i g²(i) = E_π g²(X_0)
||μ − π||_2² = Σ_i μ_i²/π_i − 1   for a probability distribution μ.
Lemma 3.24 Write ρ(t) = (ρ_j(t)) for the distribution at time t of a continuous-time
chain, with arbitrary initial distribution. Write f_j(t) = ρ_j(t)/π_j. Then
(d/dt) ||ρ(t) − π||_2² = −2 E(f(t), f(t)).
Proof. ||ρ(t) − π||_2² = Σ_j π_j^{−1} ρ_j²(t) − 1, so using the forward equations
dρ_j(t)/dt = Σ_i ρ_i(t) q_ij
we get
(d/dt) ||ρ(t) − π||_2² = Σ_j Σ_i 2π_j^{−1} ρ_j(t) ρ_i(t) q_ij
  = 2 Σ_j Σ_i f_j(t) f_i(t) π_i q_ij
Because the state space is finite, the sups are attained, and there are
theoretical descriptions of the g attaining the extrema in all three cases. An
immediate practical use of these characterizations in concrete examples is
to obtain lower bounds on the parameters by inspired guesswork, that is, by
choosing some simple explicit “test function” g which seems qualitatively
right and computing the right-hand quantity. See Chapter 14 Example 32
for a typical example. Of course we cannot obtain upper bounds this way,
but extremal characterizations can be used as a starting point for further
theoretical work (see in particular the bounds on τ_2 in Chapter 4 Section 4).
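To illustrate the test-function idea for τ_2: in the sketch below (ours; the n-path and the linear test function are illustrative choices), any centered g yields the rigorous lower bound ||g||_2²/E(g, g) ≤ τ_2, which we compare against the exact relaxation time computed from the spectrum:

```python
# tau_2 = sup { ||g||_2^2 / E(g,g) : sum_i pi_i g(i) = 0 }, so any centered
# test function gives a lower bound on the relaxation time.
import numpy as np

n = 10                                   # random walk on the n-path
W = np.zeros((n, n))
for v in range(n - 1):
    W[v, v + 1] = W[v + 1, v] = 1.0
P = W / W.sum(axis=1)[:, None]
pi = W.sum(axis=1) / W.sum()

def dirichlet(g):
    # E(g,g) = (1/2) sum_i sum_j pi_i p_ij (g(j) - g(i))^2
    return 0.5 * sum(pi[i] * P[i, j] * (g[j] - g[i])**2
                     for i in range(n) for j in range(n))

g = np.arange(n, dtype=float)
g -= pi @ g                              # center: sum_i pi_i g(i) = 0
lower = (pi @ g**2) / dirichlet(g)

S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
eig = np.linalg.eigvalsh(S)
tau2 = 1.0 / (1.0 - eig[-2])             # exact relaxation time

print(lower <= tau2 + 1e-9)              # True: a genuine lower bound
```

The linear test function is qualitatively right for the path (the true eigenfunction is a cosine), so the bound is reasonably tight.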
goes as follows ([183] Theorem 4.2.2 and eq. 4.2.7). Let S be a symmetric
matrix with eigenvalues µ_1 ≥ µ_2 ≥ · · ·. Then
µ_1 = sup_x (Σ_i Σ_j x_i s_ij x_j) / (Σ_i x_i²)   (3.78)
θ(t) = e^{−t/τ_2} θ.
For any signed measure ν = ν(0) with Σ_i ν_i(0) = 0 we have
ν(t) ∼ c e^{−t/τ_2} θ;   c = Σ_i ν_i(0) θ_i/π_i = Σ_i ν_i(0) g_0(i).
ρ(t) − π ∼ c e^{−t/τ_2} θ,   c = Σ_i (ρ_i(0) − π_i) g_0(i) = Σ_i ρ_i(0) g_0(i).   (3.80)
P_i(X_t ∈ ·) − π ∼ g_0(i) e^{−t/τ_2} θ.
Thus by (3.51)
||ρ(t) − π||_2² = Σ_{m=2}^n ( Σ_i π_i^{1/2} g(i) u_im exp(−λ_m t) )².
Proof. Any function g on the states of the collapsed chain can be extended to
the original state space by setting g = g(a) on A, and E(g, g) and Σ_i π_i g(i)
and ||g||_2² are unchanged. So consider a g attaining the sup in the extremal
characterization of τ_2^A and use this as a test function in the extremal
characterization of τ_2.
Remark. An extension of Corollary 3.27 will be provided by the contraction
principle (Chapter 4 Proposition 44).
Corollary 3.28 Let τ2 be the relaxation time for a “fluid model” continuous-
time chain associated with a graph with weights (we ) [recall (3.16)] and let τ2∗
be the relaxation time when the weights are (we∗ ). If we∗ ≥ we for all edges e
then τ2∗ ≤ τ2 .
w* ||g||_2*² ≤ w ||g||_2² max_i (w_i*/w_i).
||g||_2*² / E*(g, g) ≤ ||g − b||_2*² / E*(g − b, g − b) ≤ (||g − b||_2² / E(g − b, g − b)) · (max_i (w_i*/w_i) / min_e (w_e*/w_e)).
This is the lower bound in the lemma, and the upper bound follows by
reversing the roles of we and we∗ .
Remarks. Sometimes τ2 is very sensitive to apparently-small changes in
the chain. Consider random walk on an unweighted graph. If we add extra
edges, but keeping the total number of added edges small relative to the
number of original edges, then we might guess that τ2 could not increase or
decrease much. But the examples outlined below show that τ2 may in fact
change substantially in either direction.
Example 3.30 Take two complete graphs on n vertices and join with a
single edge. Then w = 2n(n − 1) + 2 and τ2 ∼ n2 /2. But if we extend the
single join-edge to an n-edge matching of the vertices in the original two
complete graphs, then w∗ = 2n(n − 1) + 2n ∼ w but τ2∗ ∼ n/2.
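The two asymptotics in Example 3.30 can be observed already at modest n. The sketch below (ours) computes both relaxation times by symmetrizing P and taking the second-largest eigenvalue:

```python
# Compare the relaxation times of two n-cliques joined by a single edge
# versus joined by an n-edge matching.
import numpy as np

def relaxation_time(W):
    P = W / W.sum(axis=1)[:, None]
    pi = W.sum(axis=1) / W.sum()
    S = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
    eig = np.linalg.eigvalsh(S)
    return 1.0 / (1.0 - eig[-2])

n = 12
W = np.zeros((2 * n, 2 * n))
W[:n, :n] = 1 - np.eye(n)          # clique on vertices 0..n-1
W[n:, n:] = 1 - np.eye(n)          # clique on vertices n..2n-1

W1 = W.copy()
W1[0, n] = W1[n, 0] = 1.0          # single join-edge
W2 = W.copy()
for v in range(n):                 # n-edge matching
    W2[v, n + v] = W2[n + v, v] = 1.0

t1 = relaxation_time(W1)           # of order n^2/2
t2 = relaxation_time(W2)           # of order n/2
print(t1, t2)
```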
then τ_2 ≤ δ^{−1} τ_2*.
3.6.5 Quasistationarity
Given a subset A of states in a discrete-time chain, let PA be P restricted
to Ac . Then PA will be a substochastic matrix, i.e., the row-sums are
at most 1, and some row-sum is strictly less than 1. Suppose PA is ir-
reducible. As a consequence of the Perron–Frobenius theorem (e.g., [183]
Theorem 8.4.4) for the nonnegative matrix PA , there is a unique 0 < λ < 1
(specifically, the largest eigenvalue of PA ) such that there is a probability
distribution α satisfying
α = 0 on A,   Σ_i α_i p_ij = λ α_j,   j ∈ A^c.   (3.82)
whence
E_{α_A} T_A = 1/(1 − λ_A).
Call αA the quasistationary distribution and EαA TA the quasistationary
mean exit time.
Similarly, for a continuous-time chain let QA be Q restricted to Ac .
Assuming irreducibility of the substochastic chain with generator QA , there
is a unique λ ≡ λA > 0 such that there is a probability distribution α ≡ αA
(called the quasistationary distribution) satisfying
α = 0 on A,   Σ_i α_i q_ij = −λ α_j,   j ∈ A^c.
This implies that under P_{α_A} the hitting time T_A has exponential distribution:
P_{α_A}(T_A > t) = exp(−λ_A t),   t > 0,
whence the quasistationary mean exit time is
E_{α_A} T_A = 1/λ_A.   (3.83)
Note that both αA and EαA TA are unaffected by continuization of a discrete-
time chain.
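In either time scale the quasistationary quantities are computable by linear algebra. The sketch below (ours; the 4-state walk is an arbitrary choice) finds α_A as the left Perron eigenvector of P_A and checks E_{α_A} T_A = 1/(1 − λ_A) against the hitting-time system:

```python
# alpha_A is the left Perron eigenvector of the substochastic matrix P_A;
# check E_{alpha_A} T_A = 1/(1 - lambda_A) via (I - P_A)^{-1} 1.
import numpy as np

W = np.array([[0, 1, 1, 1],
              [1, 0, 2, 0],
              [1, 2, 0, 1],
              [1, 0, 1, 0]], dtype=float)
P = W / W.sum(axis=1)[:, None]

A = [3]
keep = [0, 1, 2]
PA = P[np.ix_(keep, keep)]                  # P restricted to A^c

eig, V = np.linalg.eig(PA.T)                # left eigenvectors of PA
m = np.argmax(eig.real)
lamA = eig[m].real                          # largest eigenvalue, in (0,1)
alpha = np.abs(V[:, m].real)
alpha /= alpha.sum()                        # quasistationary distribution

hit = np.linalg.solve(np.eye(3) - PA, np.ones(3))   # E_i T_A, i in A^c
print(alpha @ hit, 1.0 / (1.0 - lamA))      # the two numbers agree
```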
The facts above do not depend on reversibility, but invoking now our
standing assumption that chains are reversible we will show in remark (c)
following Theorem 3.33 that, for continuous-time chains, λA here agrees
with the spectral gap λA discussed in Proposition 3.21, and we can also now
prove our second extremal characterization.
Proof. As usual, we give the proof in discrete time. The matrix
(s_ij^A = π_i^{1/2} p_ij^A π_j^{−1/2}) is symmetric with largest eigenvalue λ_A. Putting
x_i = π_i^{1/2} g(i) in the characterization (3.78) gives
λ_A = sup_g (Σ_i Σ_j π_i g(i) p_ij^A g(j)) / (Σ_i π_i g²(i)).
Clearly the sup is attained by nonnegative g, and though the sums above
are technically over A^c we can sum over all I by setting g = 0 on A. So
λ_A = sup{ (Σ_i Σ_j π_i g(i) p_ij^A g(j)) / (Σ_i π_i g²(i)) : g ≥ 0, g = 0 on A }.
Lemma 3.35
and the sup is attained by g(j) = P_j(T_i < T_a). In discrete time, for a
subset A and a state i ∉ A,
and the inf is attained by g(j) = P_j(T_i < T_A). Equation (3.91) remains true
in continuous time, with π_i replaced by q_i π_i on the left.
Proof. As noted at (3.28), form (3.90) follows (in either discrete or
continuous time) from form (3.91) with A = {a}. To prove (3.91), consider g
satisfying the specified boundary conditions. Inspecting (3.74), the contribution to E(g, g) involving a fixed state j is
Σ_{k≠j} π_j p_jk (g(k) − g(j))².   (3.92)
For j ∉ A ∪ {i} the factor (g(j) − Σ_k p_jk g(k)) equals zero, and for j ∈ A we
giving (3.91).
The analogous result for two disjoint subsets A and B is a little compli-
cated to state. The argument above shows that
We want to interpret the reciprocal of this quantity as a mean time for the
chain to commute from A to B and back. Consider the stationary chain
(Xt ; −∞ < t < ∞). We can define what is technically called a “marked
point process” which records the times at which A is first visited after a
visit to B and vice versa. Precisely, define Z_t taking values in {α, β, δ} by
Z_t := β   if ∃s < t such that X_s ∈ A, X_t ∈ B, X_u ∉ A ∪ B ∀ s < u < t
       α   if ∃s < t such that X_s ∈ B, X_t ∈ A, X_u ∉ A ∪ B ∀ s < u < t
       δ   otherwise.
So the times t when Zt = β are the times of first return to B after visiting A,
and the times t when Zt = α are the times of first return to A after visiting B.
Now (Zt ) is a stationary process. By considering the time-reversal of X, we
see that for i ∈ B
Corollary 3.37
In particular
r = 1 / (w_{v_0} P_{v_0}(T_A < T_{v_0}^+)).   (3.96)
There is a dual form of the Dirichlet principle, which following Doyle and
Snell [131] we call
3.7. EXTREMAL CHARACTERIZATIONS AND MEAN HITTING TIMES
such flows, by the flow f v0 →A [defined at (3.18)] associated with the random
walk from v0 to A, and the minimum value equals the effective resistance r
appearing in (3.96).
Recall that a flow is required to have f_ij = 0 whenever w_ij = 0, and interpret
sums Σ_i Σ_j as sums over ordered pairs (i, j) with w_ij > 0.
Proof. Write ψ(f) := (1/2) Σ_i Σ_j (f_ij²/w_ij). By formula (3.25) relating the
random walk notions of “flow” and “potential”, the fact that ψ(f^{v_0→A}) = r
is immediate from the corresponding equality in the Dirichlet principle. So
the issue is to prove that for a unit flow f ∗ , say, attaining the minimum
of ψ(f ), we have ψ(f ∗ ) = ψ(f v0 →A ). To prove this, consider two arbitrary
paths (yi ) and (zj ) from v0 to A, and let f ε denote the flow f ∗ modified by
adding flow rates +ε along the edges (yi , yi+1 ) and by adding flow rates −ε
along the edges (zi , zi+1 ). Then f ε is still a unit flow from v0 to A. So the
function ε → ψ(f ε ) must have derivative zero at ε = 0, and this becomes
the condition that
Σ_i f*_{y_i, y_{i+1}} / w_{y_i, y_{i+1}} = Σ_i f*_{z_i, z_{i+1}} / w_{z_i, z_{i+1}}.
So the sum is the same for all paths from v0 to A. Fixing x, the sum must
be the same for all paths from x to A, because two paths from x to A could
be extended to paths from v0 to A by appending a common path from v0
to x. It follows that we can define g*(x) as the sum Σ_i (f*_{x_i, x_{i+1}} / w_{x_i, x_{i+1}})
over some path (x_i) from x to A, and the sum does not depend on the path
chosen. So
g*(x) − g*(z) = f*_{xz} / w_{xz}   for each edge (x, z) not contained within A.   (3.97)
(This is essentially the same argument used in Section 3.3.2.)
The fact that f* is a flow means that, for x ∉ A ∪ {v_0},
0 = Σ_{z: w_xz > 0} f*_{xz} = Σ_z w_xz (g*(x) − g*(z)).
ĥ(x) := (1/2) Σ_i Σ_j f_ij D_ij
has E ĥ(x) = h(x) and minimal variance. It is not hard to see that the
former “unbiased” property holds iff f is a unit flow from v_0 to x. Then
var ĥ(x) = (1/4) Σ_i Σ_j f_ij² var(D_ij) = (1/4) Σ_i Σ_j f_ij²/w_ij
and Proposition 3.39 says this is minimized when we use the flow from v_0
to x obtained from the random walk on the weighted graph. But then
ĥ(x) = E_{v_0} Σ_{t=1}^{T_x} D_{X_{t−1} X_t},
Corollary 3.40 For random walk on a weighted graph and distinct vertices v and a,
E_v T_a + E_a T_v = w inf{ (1/2) Σ_i Σ_j (f_ij²/w_ij) : f is a unit flow from a to v }
and the inf is attained by the flow f^{a→v} associated with the random walk.
Comparing with Theorem 3.36 we have two different extremal characteriza-
tions of mean commute times, as a sup over potential functions and as an
inf over flows. In practice this “flow” form is less easy to use than the “po-
tential” form, because writing down a flow f is harder than writing down a
function g. But, when we can write down and calculate with some plausible
flow, it gives upper bounds on mean commute times.
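As the simplest illustration of such a flow bound (a sketch of ours, not from the text): on an unweighted n-cycle, route a unit flow from vertex 0 to the vertex k steps away, splitting fraction θ along one arc and 1 − θ along the other. Corollary 3.40 then gives an upper bound on the commute time for every θ, with equality at the harmonic split θ = (n − k)/n:

```python
# On an unweighted n-cycle, plug a simple unit flow into Corollary 3.40 to
# upper-bound the commute time E_v T_a + E_a T_v = 2 k (n - k).
n, k = 12, 4
w = 2 * n                                   # total weight = twice the edge count

def flow_bound(theta):
    # fraction theta along the k-edge arc from 0 to k, fraction 1 - theta
    # along the other (n - k)-edge arc; unit edge-weights, so the bound is
    # w * sum_e f_e^2 / w_e
    sum_f2 = k * theta**2 + (n - k) * (1 - theta)**2
    return w * sum_f2

exact = 2 * k * (n - k)                     # = w * effective resistance
print(flow_bound(1.0), flow_bound((n - k) / n), exact)
# any theta gives an upper bound; the harmonic split matches the exact value
```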
One-sided mean hitting times Ei Tj don’t have simple extremal char-
acterizations of the same kind, with the exception of hitting times from
stationarity. To state the result, we need two definitions. First, given a
probability distribution ρ on vertices, a unit flow from a to ρ is a flow f
satisfying
f(i) = 1_{(i=a)} − ρ_i   for all i;   (3.98)
more generally, a unit flow from a set A to ρ is defined to satisfy
Σ_{i∈A} f(i) = 1 − ρ(A)   and   f(i) = −ρ_i for all i ∈ A^c.
with the usual convention in the periodic case. So fij is the mean excess
of transitions i → j compared to transitions j → i, for the chain started
at a and run forever. This is a unit flow from a to π, in the above sense.
Equation (6) (the definition of Z) in Chapter 2 and reversibility give the
first equality, and Chapter 2 Lemma 12 gives the last equality, in
switching to “weighted graphs” notation. Note also that the first-step
recurrence for the function i ↦ Z_ia is
Z_ia = 1_{(i=a)} − π_a + Σ_j p_ij Z_ja.   (3.102)
When A is a singleton {a}, the minimizing flow is the flow f^{a→π} defined
above, and the maximizing function g is g(i) = Z_ia/Z_aa. For general A the
maximizing function g is g(i) = 1 − E_i T_A / E_π T_A.
Proof. Suppose first that A = {a}. We start by showing that the extremizing
flow, f* say, is the asserted f^{a→π}. By considering adding to f* a flow of size ε
along a directed cycle, and copying the argument for (3.97) in the proof of
Proposition 3.39, there must exist a function g* such that
g*(x) − g*(z) = f*_{xz} / w_{xz}   for each edge (x, z).   (3.103)
The fact that f* is a unit flow from a to π says that
1_{(x=a)} − π_x = Σ_z f*_{xz} = Σ_z w_xz (g*(x) − g*(z))
which implies
(1_{(x=a)} − π_x)/w_x = Σ_z p_xz (g*(x) − g*(z)) = g*(x) − Σ_z p_xz g*(z).
Chapter 2 Corollary 28.) On the other hand, a solution is g*(x) = −E_x T_a/w,
by considering the first-step recurrence for E_x T_a^+. So by (3.103)
f*_{xz} = (E_z T_a − E_x T_a) w_xz/w, and so f* = f^{a→π} by (3.101).
Now consider the function g which minimizes E(g, g) under the constraints
Σ_i π_i g(i) = 0 and g(a) = 1. By introducing a Lagrange multiplier γ
we may consider g as minimizing E(g, g) + γ Σ_i π_i g(i) subject to g(a) = 1.
0 = 0 − (γ/2) + βπa ,
w · (1/2) Σ_i Σ_j (f_ij²/w_ij) = E_π T_a   (3.104)
1/E(g, g) = Eπ Ta .
E_a^ε T_z + E_z^ε T_a = w^ε (1/2) Σ_{i∈I^ε} Σ_{j∈I^ε} (f_ij^ε)² / w_ij^ε   (3.105)
where f ε is the special unit flow from a to z associated with the new graph
(which has vertex set I ε := I ∪ {z}). We want to interpret the ingredients
to (3.105) in terms of the original graph. Clearly wε = w(1 + 2ε). The
new walk has chance ε/(1 + ε) to jump to z from each other vertex, so
Eaε Tz = (1 + ε)/ε. Starting from z, after one step the new walk has the
stationary distribution π on the original graph, and it follows easily that
Ezε Ta = 1 + Eπ Ta (1 + O(ε)). We can regard the new walk up to time Tz
as the old walk sent to z at a random time U ε with Geometric(ε/(1 + ε))
distribution, so for i 6= z and j 6= z the flow fijε is the expected net number
of transitions i → j by the old walk up to time U ε . From the spectral
representation it follows easily that f_ij^ε = f_ij + O(ε). Similarly, for i ≠ z we
have −f_zi^ε = f_iz^ε = P_a(X(U^ε − 1) = i) = π_i + O(ε); noting that Σ_{i∈I} f_iz^ε = 1,
the total contribution of such terms to the double sum in (3.105) is
2 Σ_{i∈I} (π_i + f_iz^ε − π_i)² / (ε w_i) = (2/(wε)) Σ_{i∈I} (π_i + f_iz^ε − π_i)² / π_i
  = (2/(wε)) (1 + Σ_{i∈I} (f_iz^ε − π_i)²/π_i) = 2/(wε) + O(ε).
So (3.105) becomes
(1 + ε)/ε + 1 + E_π T_a + O(ε) = w(1 + 2ε) ( (1/2) Σ_{i∈I} Σ_{j∈I} f_ij²/w_ij + 1/(wε) ) + O(ε).
Subtracting (1 + 2ε)/ε from both sides and letting ε → 0 gives the desired
(3.104). This concludes the proof for the case A = {a}, once we use
the mean hitting time formula to verify
g(i) := Z_ia/Z_aa = 1 − E_i T_a / E_π T_a.
∗ ∗ ∗
w* = w.
w*_ij = w_ij,  i, j ∈ A^c;   w*_ia = Σ_{k∈A} w_ik,  i ∈ A^c;   w*_aa = Σ_{k∈A} Σ_{l∈A} w_kl;
where
Ψ(f) := (1/2) Σ_{i∈I} Σ_{j∈I} (f_ij²/w_ij),   Ψ*(f*) := (1/2) Σ_{i∈I*} Σ_{j∈I*} ((f*_ij)²/w*_ij).
Corollary 3.42 For chains with transition matrices P, P̃ and the same sta-
tionary distribution π,
Proof. Plug the minimizing flow f a→π for the P-chain into Proposition 3.41
for the P̃-chain to get the second inequality. The first follows by reversing
the roles of P and P̃.
Section 3.1.1. Though probabilists would regard the “cyclic tour” Lemma 3.2
as obvious, László Lovász pointed out a complication, that with a careful
definition of starts and ends of tours these times are not invariant under time-
reversal. The sophisticated fix is to use doubly-infinite stationary chains and
observe that tours in reversed time just interleave tours in forward time, so
by ergodicity their asymptotic rates are equal. Tetali [326] shows that the
cyclic tour property implies reversibility. Tanushev and Arratia [321] show
that the distributions of forward and reverse tour times are equal.
Cat-and-mouse game 1 is treated more opaquely in Coppersmith et al
[99], whose deeper results are discussed in Chapter 9 Section 4.4. Underlying
the use of the optional sampling theorem in game 2 is a general result about
optimal stopping, but it’s much easier to prove what we want here than to
3.8. NOTES ON CHAPTER 3
By the random target lemma and (3.53), the quantity under consideration
is at least τ_0 ≥ n − 2 + 1/n, so the example is close to optimal.
Section 3.5.4. The simple result quoted as Proposition 3.22 is actually
weaker than the result proved in Brown [72]. The ideas in the proof of
Proposition 3.23 are in Aldous [12] and in Brown [73], the latter containing
a shorter Laplace transform argument for (3.73). Aldous and Brown [20]
give a more detailed account of the exponential approximation, including
the following result which is useful in precisely the situation where Proposi-
tion 3.23 is applicable, that is, when Eπ TA is large compared to τ2 .
P_π(T_A > t) ≥ (1 − τ2/E_{αA} T_A) exp(−t/E_{αA} T_A),   t > 0
E_π T_A ≥ E_{αA} T_A − τ2.
Using this requires only a lower bound on Eα TA , which can often be obtained
using the extremal characterization (3.84). Connections with “interleaving
of eigenvalues” results are discussed in Brown [74].
For general chains, explicit bounds on exponential approximation are
much messier: see Aldous [5] for a bound based upon total variation mixing
and Iscoe and McDonald [192] for a bound involving spectral gaps.
Section 3.6.1. Dirichlet forms were developed for use with continuous-
space continuous-time Markov processes, where existence and uniqueness
questions can be technically difficult—see, e.g., Fukushima [159]. Their
use subsequently trickled down to the discrete world, influenced, e.g., by
the paper of Diaconis and Stroock [124]. Chen [88] is the most accessible
introduction.
Section 3.6.2. Since mean commute times have two dual extremal char-
acterizations, as sups over potential functions and as infs over flows, it is
natural to ask
We will see in Chapter 4 (10/11/94 version) Theorem 32 an inequality giving an upper bound
on the relaxation time in terms of an inf over flows, but it would be more
elegant to derive such inequalities from some exact characterization.
Section 3.6.4. Lemma 3.32 is sometimes used to show, by comparison
with the i.i.d. chain,
The elementary theory of general finite Markov chains (cf. Chapter 2) fo-
cuses on exact formulas and limit theorems. My view is that, to the extent
there is any intermediate-level mathematical theory of reversible chains, it is
a theory of inequalities. Some of these were already seen in Chapter 3. This
chapter is my attempt to impose some order on the subject of inequalities.
We will study the following five parameters of a chain. Recall our standing
assumption that chains are finite, irreducible and reversible, with stationary
distribution π.
(i) The maximal mean commute time
τ* = max_{ij} (E_i T_j + E_j T_i)
CHAPTER 4. HITTING AND CONVERGENCE TIME, AND FLOW RATE, PARAMETERS OF REVERSIBLE MARKOV CHAINS
(ii) The average hitting time
τ0 = Σ_i Σ_j π_i π_j E_i T_j
(iii) The variation threshold τ1, i.e. the time required for the variation distance from stationarity to become small, uniformly in the initial state.
(iv) The relaxation time τ2 , i.e. the time constant in the asymptotic rate
of convergence to the stationary distribution.
(v) A “flow” parameter
τc = sup_A [π(A)π(A^c) / Σ_{i∈A} Σ_{j∈A^c} π_i p_{ij}] = sup_A [π(A^c) / P_π(X_1 ∈ A^c | X_0 ∈ A)]
in discrete time, and
τc = sup_A [π(A)π(A^c) / Σ_{i∈A} Σ_{j∈A^c} π_i q_{ij}] = sup_A [π(A^c) dt / P_π(X_{dt} ∈ A^c | X_0 ∈ A)]
in continuous time.
The following table may be helpful. “Average-case” is intended to indi-
cate essential use of the stationary distribution.
                 worst-case   average-case
hitting times        τ*           τ0
mixing times         τ1           τ2
flow                              τc
The table suggests there should be a sixth parameter, but I don’t have a
candidate.
The ultimate point of this study, as will be seen in following chapters, is
• For many questions about reversible Markov chains, the way in which
the answer depends on the chain is related to one of these parameters
This Chapter deals with relationships between these parameters, simple il-
lustrations of properties of chains which are closely connected to the pa-
rameters, and methods of bounding the parameters. To give a preview,
it turns out that these parameters are essentially decreasing in the order
(τ ∗ , τ0 , τ1 , τ2 , τc ): precisely,
(1/2) τ* ≥ τ0 ≥ τ2 ≥ τc
66 τ0 ≥ τ1 ≥ τ2
4.1. THE MAXIMAL MEAN COMMUTE TIME τ*
τ* ≡ max_{ij} (E_i T_j + E_j T_i)   (4.1)
max_{ij} E_i T_j ≤ τ* ≤ 2 max_{ij} E_i T_j
τ* = w max_{ij} r_{ij}
which by the Monotonicity Law is at least the effective resistance rij . Thus
trivially
τ* ≤ w max_{i,j} min_{paths γ_{ij}} r(γ_{ij}).   (4.3)
the min over proper subsets A. The max-flow min-cut theorem implies that
for any pair a, b there exists a flow f from a to b of size c such that |fij | ≤ wij
for all edges (i, j). So there is a unit flow from a to b such that |fe | ≤ c−1 we
for all edges e. It is clear that by deleting any flows around cycles we may
assume that the flow through any vertex i is at most unity, and so
Σ_j |f_{ij}| ≤ 2 for all i, and = 1 for i = a, b.   (4.5)
So
E_a T_b + E_b T_a ≤ w Σ_e f_e²/w_e   by Thompson’s principle
  ≤ (w/c) Σ_e |f_e|
  ≤ (w/c)(n − 1)   by (4.5).
and we have proved
τ* ≤ w(n − 1)/c
for c defined at (4.4).
4.2. THE AVERAGE HITTING TIME τ0
Lemma 4.1 and the Monotonicity Law also make clear a one-sided bound
on the effect of changing edge-weights monotonically.
Corollary 4.3 Let w̃e ≥ we be edge-weights and let τ̃ ∗ and τ ∗ be the corre-
sponding parameters for the random walks. Then
(E_i T̃_j + E_j T̃_i) / (E_i T_j + E_j T_i) ≤ w̃/w   for all i, j
and so
τ̃*/τ* ≤ w̃/w.
In the case of unweighted graphs the bound in Corollary 4.3 is |Ẽ|/|E|. Ex-
ample yyy of Chapter 3 shows there can be no lower bound of this type, since
in that example w̃/w = 1 + O(1/n) but (by straightforward calculations)
τ̃ ∗ /τ ∗ = O(1/n).
and recalling what we already know. We know (a result not using reversibil-
ity: Chapter 2 Corollary yyy) the random target lemma
Σ_j π_j E_i T_j = τ0 for all i   (4.7)
τ0 = Σ_{m≥2} λ_m^{−1} in continuous time   (4.9)
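As a numerical sanity check (mine, not the text's), both the random target lemma (4.7) and the eigentime identity (4.9) can be verified on a small made-up weighted graph; the eigenvalues used are those of the generator I − P of the continuized chain, which preserves mean hitting times.

```python
import numpy as np

# Made-up reversible chain: random walk on a small weighted graph.
w = np.array([[0, 2, 0, 1],
              [2, 0, 1, 0],
              [0, 1, 0, 3],
              [1, 0, 3, 0]], dtype=float)
P = w / w.sum(axis=1, keepdims=True)          # p_ij = w_ij / w_i
pi = w.sum(axis=1) / w.sum()                  # pi_i = w_i / w
n = len(pi)

def mean_hitting_times(P, j):
    """E_i T_j for all i, solving (I - P) h = 1 off the target state j."""
    m = len(P)
    idx = [i for i in range(m) if i != j]
    h = np.linalg.solve(np.eye(m - 1) - P[np.ix_(idx, idx)], np.ones(m - 1))
    out = np.zeros(m)
    out[idx] = h
    return out

ET = np.column_stack([mean_hitting_times(P, j) for j in range(n)])  # ET[i, j] = E_i T_j

# Random target lemma (4.7): sum_j pi_j E_i T_j does not depend on i.
targets = ET @ pi
assert np.allclose(targets, targets[0])
tau0 = targets[0]

# Eigentime identity (4.9): reversibility lets us symmetrize before
# computing the generator eigenvalues 0 = lam_1 < lam_2 <= ...
S = np.diag(pi ** 0.5) @ P @ np.diag(pi ** -0.5)
lam = np.sort(1.0 - np.linalg.eigvalsh(S))
assert np.isclose(tau0, (1.0 / lam[1:]).sum())
```

The same computation works for any irreducible reversible chain given as symmetric edge-weights.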
The second idea is to consider minimal random times at which the chain has
exactly the stationary distribution. Let
τ_1^{(2)} ≡ max_i min_{U_i} E_i U_i
where the min is over stopping times Ui such that Pi (X(Ui ) ∈ ·) = π(·). As a
variation on this idea, let us temporarily write, for a probability distribution
µ on the state space,
τ(µ) ≡ max_i min_{U_i} E_i U_i
where the min is over stopping times Ui such that Pi (X(Ui ) ∈ ·) = µ(·).
Then define
τ_1^{(3)} = min_µ τ(µ).
µ
where the equality involves the fundamental matrix Z and holds by the mean hitting time formula. Parameter τ_1^{(4)} measures variability of mean hitting times as the starting place varies. The final parameter is
τ_1^{(5)} ≡ max_{i,A} π(A) E_i T_A.
Here we can regard the right side as the ratio of Ei TA , the Markov chain
mean hitting time on A, to 1/π(A), the mean hitting time under independent
sampling from the stationary distribution.
The definitions above make sense in either discrete or continuous time,
but the following notational convention turns out to be convenient. For
a discrete-time chain we define τ1 to be the value obtained by applying
the definition (4.11) to the continuized chain, and write τ1disc for the value
obtained for the discrete-time chain itself. Define similarly τ_1^{(1)} and τ_1^{(1),disc}. But the other parameters τ_1^{(2)} through τ_1^{(5)} are defined directly in terms of the discrete-time chain. We now state the equivalence theorem, from Aldous [6].
Theorem 4.6 (a) In either discrete or continuous time, the parameters τ_1, τ_1^{(1)}, τ_1^{(2)}, τ_1^{(3)}, τ_1^{(4)} and τ_1^{(5)} are equivalent.
(b) In discrete time, τ_1^{disc} and τ_1^{(1),disc} are equivalent, and τ_1^{(2)} ≤ (e/(e−1)) τ_1^{(1),disc}.
4.3. THE VARIATION THRESHOLD τ1
This will be (partially) proved in section 4.3.2, but let us first give a few
remarks and examples. The parameter τ1 and total variation distance are
closely related to the notion of coupling of Markov chains, discussed in Chapter 14. Analogously (see the Notes), the separation s(t) and the parameter τ_1^{(1)} are closely related to the notion of strong stationary times V_i for which
Note also that the definition of s(t) involves lower bounds in the convergence p_{ij}(t)/π_j → 1. One can make a definition involving upper bounds
Proof. Part (b) follows from part (a) and the definitions. Part (a) is essen-
tially just the “|| ||1 ≤ || ||2 ” inequality, but let’s write it out bare-hands.
4||P_i(X_t ∈ ·) − π(·)||² = ( Σ_j |p_{ij}(t) − π_j| )²
  = ( Σ_j π_j^{1/2} · (p_{ij}(t) − π_j)/π_j^{1/2} )²
  ≤ Σ_j (p_{ij}(t) − π_j)²/π_j   by Cauchy-Schwarz
  = −1 + Σ_j p_{ij}²(t)/π_j
  = −1 + p_{ii}(2t)/π_i   by Chapter 3 Lemma yyy.
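The chain of inequalities above can be checked numerically (my sketch, with a made-up reversible chain); in discrete time p_{ij}(t) is just the (i, j) entry of P^t, and reversibility gives the final identity Σ_j p_{ij}²(t)/π_j = p_{ii}(2t)/π_i.

```python
import numpy as np

# Made-up reversible chain: random walk on a weighted graph with loops.
w = np.array([[1, 2, 0],
              [2, 1, 3],
              [0, 3, 1]], dtype=float)
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()

for t in range(1, 8):
    Pt = np.linalg.matrix_power(P, t)
    P2t = np.linalg.matrix_power(P, 2 * t)
    for i in range(len(pi)):
        lhs = np.abs(Pt[i] - pi).sum() ** 2   # 4 ||P_i(X_t in .) - pi||^2
        rhs = -1.0 + P2t[i, i] / pi[i]        # p_ii(2t)/pi_i - 1
        assert lhs <= rhs + 1e-12
```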
Example 4.9 Consider a continuous-time 3-state chain with transition rates
[diagram: states a, b, c in a line, with the indicated transition rates 1 and ε]
Here π_a = ε/(2+ε), π_b = π_c = 1/(2+ε). It is easy to check that τ1 is bounded as ε → 0. But p_{aa}(t) → e^{−t} as ε → 0, and so by considering state a we have d̂(t) → ∞ as ε → 0 for any fixed t.
Remark. In the nice examples discussed in Chapter 5 we can usually
find a pair of states (i0 , j0 ) such that
d̄(t) = ||P_{i0}(X_t ∈ ·) − P_{j0}(X_t ∈ ·)|| for all t.
[diagram: a weighted graph on vertices 0, 1, 2, 3 and a center state ∗, with edge-weights 1 and ε]
Lemma 4.13 τ_1^{(4)} ≤ 4τ_1^{(3)}
Lemma 4.14 τ_1^{(5)} ≤ τ_1^{(4)}
These lemmas hold in discrete and continuous time, interpreting τ_1, τ_1^{(1)} as τ_1^{disc}, τ_1^{(1),disc} in discrete time. Incidentally, Lemmas 4.12, 4.13 and 4.14
do not depend on reversibility. To complete the proof of Theorem 4.6 in continuous time we would need to show
τ_1 ≤ K τ_1^{(5)} in continuous time   (4.15)
for some absolute constant K. The proof I know is too lengthy to repeat here – see [6]. Note that (from its definition) τ_1^{(2)} ≤ τ0, so that (4.15) and the lemmas above imply τ1 ≤ 2Kτ0 in continuous time. We shall instead
Lemma 4.15 τ1 ≤ 66τ0 .
Turning to the assertions of Theorem 4.6 in discrete time, (b) is given by the discrete-time versions of Lemmas 4.11 and 4.12. To prove (a), it is enough to show that the numerical values of the parameters τ_1^{(2)} through τ_1^{(5)} are unchanged by continuizing the discrete-time chain. For τ_1^{(5)} and τ_1^{(4)} this is clear, because continuization doesn't affect mean hitting times. For τ_1^{(3)} and τ_1^{(2)} it reduces to the following lemma.
Proof of Lemma 4.12. The left inequality is immediate from the definitions. For the right inequality, fix i. Write u = τ_1^{(1)}, so that
Writing b(i) for the left sum, the definition of τ_1^{(4)} and the triangle inequality give τ_1^{(4)} ≤ max_{i,k}(b(i) + b(k)), and the Lemma follows.
Proof of Lemma 4.14. Fix a subset A and a starting state i 6∈ A. Then
for any j ∈ A,
Ei Tj = Ei TA + Eρ Tj
where ρ is the hitting place distribution P_i(X_{T_A} ∈ ·). So
π(A) E_i T_A = Σ_{j∈A} π_j E_i T_A = Σ_{j∈A} π_j (E_i T_j − E_ρ T_j)
  ≤ max_k Σ_{j∈A} π_j (E_i T_j − E_k T_j) ≤ τ_1^{(4)}.
A = {j : Eπ Tj ≤ τ0 /δ}.
p_{jj}(t)/π_j − 1 ≤ E_π T_j / t ≤ τ0/(δt)
p_{jk}(t)/π_k ≥ 1 − τ0/(δt),   j, k ∈ A.
Now let i be arbitrary and let k ∈ A. For any 0 ≤ s ≤ u,
and so
p_{ik}(u + t)/π_k ≥ (1 − τ0/(δt))^+ P_i(T_A ≤ u).   (4.18)
Now
P_i(T_A > u) ≤ E_i T_A / u ≤ τ_1^{(5)} / (u π(A))
using Markov's inequality and the definition of τ_1^{(5)}. And τ_1^{(5)} ≤ τ_1^{(4)} ≤ 2τ0,
the first inequality being Lemma 4.14 and the second being an easy conse-
quence of the definitions. Combining (4.18) and the subsequent inequalities
shows that, for k ∈ A and arbitrary i,
p_{ik}(u + t)/π_k ≥ (1 − τ0/(δt))^+ (1 − 2τ0/(uπ(A)))^+ ≡ η, say.
Applying this to arbitrary i and j we get
d̄(u + t) ≤ 1 − ηπ(A) ≤ 1 − (1 − τ0/(δt))^+ (π(A) − 2τ0/u)^+
  ≤ 1 − (1 − τ0/(δt))^+ (1 − δ − 2τ0/u)^+   by (4.17).
Putting t = 49τ0, u = 17τ0, δ = 1/7 makes the bound = 305/833 < e^{−1}.
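The closing arithmetic can be checked in exact rational arithmetic (a check of mine, working in units of τ0); note also that t + u = 66τ0, which is where the constant in Lemma 4.15 comes from.

```python
from fractions import Fraction
import math

# Work in units of tau0: t = 49, u = 17, delta = 1/7.
t, u, delta = Fraction(49), Fraction(17), Fraction(1, 7)
bound = 1 - (1 - 1 / (delta * t)) * (1 - delta - 2 / u)
assert bound == Fraction(305, 833)
assert float(bound) < math.exp(-1)
assert t + u == 66            # matches Lemma 4.15: tau1 <= 66 tau0
```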
Remark. The ingredients of the proof above are complete monotonicity
and conditioning on carefully chosen hitting times. The proof of (4.15) in
[6] uses these ingredients, plus the minimal hitting time construction in the
recurrent balayage theorem (Chapter 2 yyy).
Outline proof of Lemma 4.16. The observant reader will have noticed
(Chapter 2 yyy) that we avoided writing down a careful definition of stopping
times in the continuous setting. The definition involves measure-theoretic is-
sues which I don’t intend to engage, and giving a rigorous proof of the lemma
is a challenging exercise in the measure-theoretic formulation of continuous-
time chains. However, the underlying idea is very simple. Regard the chain Y_t as constructed from the chain (X_0, X_1, X_2, . . .) and exponential(1) holding times (ξ_i). Define T̂ = N(T), where N(t) is the Poisson counting process N(t) = max{m : ξ_1 + . . . + ξ_m ≤ t}. Then X(T̂) = Y(T) by construction and E T̂ = E T by the optional sampling theorem for the martingale N(t) − t. □
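In simulation terms (a sketch of mine, not from the text), the construction reads: run the jump chain, and report Y_t = X_{N(t)} with N the Poisson counting process of the exponential(1) holding times. The transition matrix below is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

def N(t, xi):
    """Poisson counting process: max{m : xi_1 + ... + xi_m <= t}."""
    return int(np.searchsorted(np.cumsum(xi), t, side="right"))

def continuized_state(i0, t, horizon=100):
    """Y_t = X_{N(t)}: the continuized chain at time t."""
    xi = rng.exponential(1.0, size=horizon)   # exponential(1) holding times
    x = i0
    for _ in range(N(t, xi)):                 # run the jump chain N(t) steps
        x = rng.choice(len(P), p=P[x])
    return x

y = continuized_state(0, 2.0)
assert y in (0, 1, 2)

# Sanity check that E N(t) = t (for the deterministic time T = t; the
# optional sampling argument extends this to stopping times T).
t = 5.0
counts = [N(t, rng.exponential(1.0, 100)) for _ in range(20000)]
assert abs(np.mean(counts) - t) < 0.1
```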
where the second inequality is justified below. The second assertion of the
lemma is now clear, and the first holds by averaging over j.
The second inequality is justified by the following martingale result,
which is a simple application of the optional sampling theorem. The “equal-
ity” assertion is sometimes called Wald’s equation for martingales.
E(Y_{i+1} − Y_i | Y_j, j ≤ i) ≤ c,   i ≥ 0
implies
E Y_T ≤ c E T.
So regardless of the mouse’s strategy, the cat has chance 1/n to meet the
mouse at time Vi , independently as i varies, so the meeting time M satisfies
M ≤ VT where T is a stopping time with mean n, and (4.21) follows from
Lemma 4.19. This topic will be pursued in Chapter 6 yyy.
Lemma 4.21 Consider a family f = (f (a) ), where, for each state a, f (a) is
a unit flow from a to the stationary distribution π. Define
ψ(f) = max_{edges (i,j)} (1/(π_i p_{ij})) Σ_a π_a |f^{(a)}_{ij}|
in discrete time, and substitute qij for pij in continuous time. Let f a→π be
the special flow induced by the chain. Then
ψ(f^{a→π}) ≤ τ_1^{(4)} ≤ ∆ ψ(f^{a→π})
where ∆ is the diameter of the transition graph.
Corollary 4.22 τ1 ≥ ((e−1)/(16e)) inf_f ψ(f).
Unfortunately it seems hard to get analogous upper bounds. In particular,
it is not true that
τ1 = O( ∆ inf_f ψ(f) ).
To see why, consider first random walk on the n-cycle (Chapter 5 Example
yyy). Here τ1 = Θ(n2 ) and ψ(f a→π ) = Θ(n), so the upper bound in Lemma
4.21 is the right order of magnitude, since ∆ = Θ(n). Now modify the
chain by allowing transitions between arbitrary pairs (i, j) with equal chance
o(n−3 ). The new chain will still have τ1 = Θ(n2 ), and by considering the
special flow in the original chain we have inf f ψ(f ) = O(n), but now the
diameter ∆ = 1.
The next three lemmas give inequalities between τ2 and the parameters
studied earlier in this chapter. Write π∗ ≡ mini πi .
Lemma 4.23 In continuous time,
τ2 ≤ τ1 ≤ τ2 (1 + (1/2) log(1/π_*)).
In discrete time,
τ_1^{(2)} ≤ (4e/(e−1)) τ2 (1 + (1/2) log(1/π_*)).
Proof of Lemma 4.23. Consider first the continuous time case. By the
spectral representation, as t → ∞ we have pii (t) − πi ∼ ci exp(−t/τ2 ) with
some ci 6= 0. But by Lemma 4.5 we have |pii (t) − πi | = O(exp(−t/τ1 )). This
shows τ2 ≤ τ1 . For the right inequality, the spectral representation gives
But
⌈1/log(1/β)⌉ ≤ τ_1^{disc} ≤ ⌈(1 + (1/2) log(1/π_*)) / log(1/β)⌉
log(1/β) ≤ (2 + log(1/π_*)) / ∆.
This makes sense for general chains (see Notes for further comments), but
under our standing assumption of reversibility we have
ρ(t) = exp(−t/τ2 ), t ≥ 0.
In discrete time,
ρ(t) = β^t, t ≥ 0
where β = max(λ_2, −λ_n).
Recall from Chapter 2 yyy that for general chains there is a limit variance
σ 2 = limt→∞ t−1 var St . Reversibility gives extra qualitative and quantita-
tive information. The first result refers to the stationary chain.
In discrete time,
t^{−1} var_π S_t → σ² ≤ 2τ2 ||g||²_2
σ² t (1 − 2τ2/t) ≤ var_π S_t ≤ σ² t + ||g||²_2
and so in particular
var_π S_t ≤ t ||g||²_2 (2τ2 + 1/t).   (4.24)
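Bound (4.24) can be checked directly in discrete time, since var_π S_t for the stationary chain is an exact finite sum of covariances; the chain and the function g below are made up for illustration.

```python
import numpy as np

w = np.array([[1, 2, 1],
              [2, 2, 3],
              [1, 3, 1]], dtype=float)       # symmetric weights -> reversible
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()

g = np.array([1.0, -2.0, 3.0])
g = g - pi @ g                                # center g so that bar(g) = 0
norm2 = pi @ g**2                             # ||g||_2^2

S = np.diag(pi ** 0.5) @ P @ np.diag(pi ** -0.5)
lam = np.sort(np.linalg.eigvalsh(S))
tau2 = 1.0 / (1.0 - lam[-2])                  # discrete-time relaxation time

def cov(k):
    """E_pi g(X_0) g(X_k) for the stationary chain."""
    return (pi * g) @ np.linalg.matrix_power(P, k) @ g

for t in range(1, 30):
    # var_pi S_t = sum over lags k of (t - |k|) cov(|k|)
    var_St = sum((t - abs(k)) * cov(abs(k)) for k in range(-(t - 1), t))
    assert var_St <= t * norm2 * (2.0 * tau2 + 1.0 / t) + 1e-10
```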
Proof. Consider first the continuous time case. A brief calculation using the
spectral representation (Chapter 3 yyy) gives
E_π g(X_0)g(X_t) = Σ_{m≥2} g_m² e^{−λ_m t}   (4.25)
where g_m = Σ_i π_i^{1/2} u_{im} g(i). So
t^{−1} var_π S_t = t^{−1} ∫_0^t ∫_0^t E_π g(X_u)g(X_s) ds du
  = 2t^{−1} ∫_0^t (t − s) E_π g(X_0)g(X_s) ds
  = 2 ∫_0^t (1 − s/t) Σ_{m≥2} g_m² e^{−λ_m s} ds   (4.26)
  = 2 Σ_{m≥2} (g_m²/λ_m) A(λ_m t)   (4.27)
by change of variables in the integral defining A(u). The right side increases with t to
σ² ≡ 2 Σ_{m≥2} g_m²/λ_m,   (4.28)
In discrete time the arguments are messier, and we will omit details of calculations. The analog of (4.26) becomes
t^{−1} var_π S_t = Σ_{s=−(t−1)}^{t−1} (1 − |s|/t) Σ_{m≥2} g_m² λ_m^{|s|}.
In place of the change of variables argument for (4.27), one needs an elementary calculation to get
t^{−1} var_π S_t = 2 Σ_{m≥2} (g_m²/(1 − λ_m)) B(λ_m, t)   (4.29)
where B(λ, t) = (1 + λ)/2 − λ(1 − λ^t)/(t(1 − λ)).
This shows
t^{−1} var_π S_t → σ² ≡ Σ_{m≥2} g_m² (1 + λ_m)/(1 − λ_m)
and the sum is bounded above by
((1 + λ_2)/(1 − λ_2)) Σ_{m≥2} g_m² = ((1 + λ_2)/(1 − λ_2)) ||g||²_2 ≤ 2τ2 ||g||²_2.
inf_{−1≤λ<1} λ(1 − λ^t)/(1 − λ)² ≥ −1/2.
For the lower bound, one has to verify
sup_{−1≤λ≤λ_2} 2λ(1 − λ^t)/((1 − λ)(1 + λ)) is attained at λ_2 (and equals C, say)
See the Notes for further comments. For the analogous result about large
deviations see Chapter yyy.
defined on the state space. If we could sample i.i.d. from π we would need order ε^{−2} samples to get an estimator with error about ε (var_π g)^{1/2}. Now consider
the setting where we cannot directly sample from π but instead use the
“Markov Chain Monte Carlo” method of setting up a reversible chain with
stationary distribution π. How many steps of the chain do we need to get
the same accuracy? As in section 4.3.3, because we typically can’t quantify
the closeness to π of a feasible initial distribution, we consider bounds which
hold for arbitrary initial states. In assessing the number of steps required,
there are two opposite traps to avoid. The first is to say (cf. Proposition 4.29) that ε^{−2} τ2 steps suffice. This is wrong because the relaxation time bounds apply to the stationary chain and cannot be directly applied to a non-stationary chain. The second trap is to say that because it takes Θ(τ1) steps to obtain one sample from the stationary distribution, we therefore need order ε^{−2} τ1 steps in order to get ε^{−2} independent samples. This is wrong because we don't need independent samples. The correct answer is order (τ1 + ε^{−2} τ2) steps. The conceptual idea (cf. the definition of τ_1^{(2)}) is to find a stopping time achieving distribution π and use it as an initial
state for simulating the stationary chain. More feasible to implement is the
following algorithm.
Algorithm. For a specified real number t1 > 0 and an integer m2 ≥ 1,
generate M (t1 ) with Poisson(t1 ) distribution. Simulate the chain Xt from
arbitrary initial distribution for M (t1 ) + m2 − 1 steps and calculate
A(g, t_1, m_2) ≡ (1/m_2) Σ_{t=M(t_1)}^{M(t_1)+m_2−1} g(X_t).
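A direct implementation of the algorithm is a few lines (my sketch; the chain, function g and parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def mcmc_average(P, g, t1, m2, x0=0):
    """A(g, t1, m2): simulate M ~ Poisson(t1) burn-in steps from an
    arbitrary start, then average g over the next m2 states."""
    M = rng.poisson(t1)
    x = x0
    total = 0.0
    for step in range(M + m2):
        if step >= M:                 # states X_M, ..., X_{M+m2-1}
            total += g[x]
        x = rng.choice(len(P), p=P[x])
    return total / m2

# Reversible test chain with known stationary distribution.
w = np.array([[1, 2, 1],
              [2, 1, 3],
              [1, 3, 1]], dtype=float)
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()
g = np.array([0.0, 1.0, 2.0])
gbar = pi @ g

est = np.mean([mcmc_average(P, g, t1=20, m2=200) for _ in range(200)])
assert abs(est - gbar) < 0.05
```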
Corollary 4.31
P(|A(g, t_1, m_2) − ḡ| > ε||g||_2) ≤ s(t_1) + (2τ2 + 1/m_2)/(ε² m_2)
where s(t) is separation (recall section 4.3.1) for the continuized chain.
t_1 = τ_1^{(1)} ⌈log(2/δ)⌉;   m_2 = ⌈4τ2/(ε² δ)⌉.
Since the mean number of steps is t_1 + m_2 − 1, this formalizes the idea that we can estimate ḡ to within ε||g||_2 in order (τ1 + ε^{−2} τ2) steps.
xxx if don’t know tau‘s
Proof. We may suppose ḡ = 0. Since XM (t1 ) has the distribution of the
continuized chain at time t1 , we may use the definition of s(t1 ) to write
P(|A(g, t_1, m_2)| > ε||g||_2) ≤ s(t_1) + P_π( |(1/m_2) Σ_{t=0}^{m_2−1} g(X_t)| > ε||g||_2 )
  ≤ s(t_1) + (1/(m_2² ε² ||g||²_2)) var_π( Σ_{t=0}^{m_2−1} g(X_t) ).
Apply (4.24).
4.4. THE RELAXATION TIME τ2 139
Theorem 4.32 For each ordered pair (x, y) of vertices in a weighted graph,
let γxy be a path from x to y. Then for discrete-time random walk,
τ2 ≤ w max_e Σ_x Σ_y π_x π_y r(γ_xy) 1_{(e∈γ_xy)}
τ2 ≤ w max_e (1/w_e) Σ_x Σ_y π_x π_y |γ_xy| 1_{(e∈γ_xy)}.
where κ is the max in the first inequality in the statement of the Theorem.
The first inequality now follows from the extremal characterization (4.22).
The second inequality makes a simpler use of the Cauchy-Schwarz inequality,
in which we replace (4.30,4.31,4.32) by
= Σ_x Σ_y π_x π_y ( Σ_{e∈γ_xy} 1 · ∆g(e) )²
≤ Σ_x Σ_y π_x π_y |γ_xy| Σ_{e∈γ_xy} (∆g(e))²   (4.33)
≤ κ′ Σ_e w_e (∆g(e))² = κ′ 2w E(g, g)
where κ′ is the max in the second inequality in the statement of the Theorem.
Remarks. (a) Theorem 4.32 applies to continuous-time (reversible) chains
by setting wij = πi qij .
(b) One can replace the deterministic choice of paths γxy by random
paths Γxy of the form x = V0 , V1 , . . . , VM = y of random length M = |Γxy |.
The second inequality extends in the natural way, by taking expectations in
(4.33) to give
≤ Σ_x Σ_y π_x π_y Σ_e E( |Γ_xy| 1_{(e∈Γ_xy)} ) (∆g(e))²,
(c) Inequalities in the style of Theorem 4.32 are often called Poincaré
inequalities because, to quote [124], they are “the discrete analog of the
classical method of Poincaré for estimating the spectral gap of the Laplacian
on a domain (see e.g. Bandle [39])”. I prefer the descriptive name the
distinguished path method. This method has the same spirit as the coupling
method for bounding τ1 (see Chapter yyy), in that we get to use our skill
and judgement in making wise choices of paths in specific examples. xxx
list examples. Though its main utility is in studying hard examples, we give
some simple illustrations of its use below.
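As one such illustration (a sketch of mine, not the book's own example), on the n-cycle with unit edge-weights and geodesic paths the second bound of Theorem 4.32 can be compared with the exact relaxation time 1/(1 − cos(2π/n)):

```python
import numpy as np
from collections import defaultdict

n = 9                        # odd, so shortest paths are unique
pi = np.full(n, 1.0 / n)
w_total = 2.0 * n            # w = sum over ordered pairs of w_ij (each edge weight 1)

def geodesic(x, y):
    """The unique shortest path from x to y, as a list of directed edges."""
    d = (y - x) % n
    step = 1 if d <= n // 2 else -1
    path, v = [], x
    while v != y:
        path.append((v, (v + step) % n))
        v = (v + step) % n
    return path

# F(e) = sum over ordered pairs (x, y) with e in gamma_xy of pi_x pi_y |gamma_xy|
F = defaultdict(float)
for x in range(n):
    for y in range(n):
        if x != y:
            p = geodesic(x, y)
            for e in p:
                F[e] += pi[x] * pi[y] * len(p)

bound = w_total * max(F.values())             # second bound of Theorem 4.32, w_e = 1
tau2 = 1.0 / (1.0 - np.cos(2.0 * np.pi / n))  # exact relaxation time of the n-cycle
assert tau2 <= bound
```

For n = 9 the bound is within a factor 2 of the truth, consistent with the remark in the text that on the cycle the method gives the right order of magnitude.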
Write the conclusion of Corollary 4.33 as τ2 ≤ w max_e (1/w_e) F(e). Consider
a regular unweighted graph, and let Γx,y be chosen uniformly from the set
of minimum-length paths from x to y. Suppose that F (e) takes the same
value F for every directed edge e. A sufficient condition for this is that the
graph be arc-transitive (see Chapter 8 yyy). Then, summing over edges in
Corollary 4.33,
τ2 |E⃗| ≤ w Σ_e Σ_x Σ_y π_x π_y E|Γ_xy| 1_{(e∈Γ_xy)} = w Σ_x Σ_y π_x π_y E|Γ_xy|²
where |E⃗| is the number of directed edges. Now w = |E⃗|, so we may
reinterpret this inequality as follows.
and where such sups are always over proper subsets A of states. This param-
eter can be calculated exactly in only very special cases, where the following
lemma is helpful.
Lemma 4.36 The sup in (4.34) is attained by some split {A, Ac } in which
both A and Ac are connected (as subsets of the graph of permissible transi-
tions).
Q(A, A^c) = Σ_i Q(B_i, B_i^c)
  ≥ γ Σ_i π(B_i)π(B_i^c)
  = γ Σ_i (π(B_i) − π²(B_i))
  = γ ( π(A) − Σ_i π²(B_i) )
and so
Q(A, A^c)/(π(A)π(A^c)) ≥ γ (π(A) − Σ_i π²(B_i)) / (π(A) − π²(A)).
But for m ≥ 2 we have Σ_i π²(B_i) < (Σ_i π(B_i))² = π²(A), which implies Q(A, A^c)/(π(A)π(A^c)) > γ. □
4.5. THE FLOW PARAMETER τc
π(A)π(A^c)/Q(A, A^c) ≤ τ2
for any subset A. But much more is true: Chapter 3 yyy may be rephrased as follows. For any subset A,
π(A)π(A^c)/Q(A, A^c) ≤ π(A) E_π T_A / π(A^c) ≤ π(A) E_{αA} T_A ≤ τ2
and so
τc ≤ sup_A π(A) E_π T_A / π(A^c) ≤ sup_A π(A) E_{αA} T_A ≤ τ2.
sup_{A: π(A)≤1/2} π(A)/Q(A, A^c)
which has been used in the literature but which would introduce a spurious factor of 2 into the inequality τc ≤ τ2.
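For small chains τc can be computed by brute force over proper subsets, and the inequality τc ≤ τ2 checked directly (my sketch; the weights are made up):

```python
import numpy as np
from itertools import combinations

w = np.array([[2, 1, 0, 1],
              [1, 2, 2, 0],
              [0, 2, 2, 1],
              [1, 0, 1, 2]], dtype=float)    # symmetric -> reversible
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()
n = len(pi)

# tau_c = sup over proper subsets A of pi(A) pi(A^c) / Q(A, A^c).
tau_c = 0.0
for k in range(1, n):
    for A in combinations(range(n), k):
        Ac = [i for i in range(n) if i not in A]
        Q = sum(pi[i] * P[i, j] for i in A for j in Ac)
        tau_c = max(tau_c, pi[list(A)].sum() * pi[Ac].sum() / Q)

# tau_2 = 1/(1 - lambda_2), via the symmetrized matrix (reversibility).
S = np.diag(pi ** 0.5) @ P @ np.diag(pi ** -0.5)
lam = np.sort(np.linalg.eigvalsh(S))
tau_2 = 1.0 / (1.0 - lam[-2])
assert tau_c <= tau_2 + 1e-12
```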
Lemma 4.39 below shows that the final inequality of Corollary 4.37 can
be reversed. In contrast, on the n-cycle τc = Θ(n) whereas the other quan-
tities in Corollary 4.37 are Θ(n2 ). This shows that the “square” in Theorem
4.40 below cannot be omitted in general. It also suggests the following
question (cf. τ1 and τ_1^{(5)}): does
τ2 ≤ K sup_A π(A) E_π T_A / π(A^c)
hold for some universal constant K?
Lemma 4.39
τ2 ≤ sup_{A: π(A)≥1/2} E_{αA} T_A
and so in particular
τ2 ≤ 2 sup_A π(A) E_{αA} T_A.
A = {x : h(x) ≤ 0}
∂_t ⟨f, P_t g⟩ = −E(f, g)
where
E(f, g) = (1/2) Σ_i Σ_j (f(j) − f(i))(g(j) − g(i)) π_i q_ij.
where the inequality follows from the inequality (a+ −b+ )2 ≤ (a+ −b+ )(a−b)
for real a, b. On the other hand, hh+ , hi ≤ hh+ , h+ i = ||h+ ||22 , and the
eigenvector h satisfies ∂(Pt h) = −λ2 h, so
This result follows by combining Lemma 4.39 above with Lemma 4.41 below.
In discrete time these inequalities hold with q ∗ deleted (i.e. replaced by 1),
by continuization. Our treatment of Cheeger’s inequality closely follows
Diaconis and Stroock [124] – see Notes for more history.
E_{αA} T_A ≤ 2 q* τc² / π²(A).
= 4 Σ Σ_{g(x)>g(y)} π_x q_xy ∫_{g(y)}^{g(x)} t dt
= 4 ∫_0^∞ t ( Σ Σ_{g(y)≤t<g(x)} π_x q_xy ) dt
= 4 ∫_0^∞ t Q(B_t, B_t^c) dt   where B_t ≡ {x : g(x) > t}
≥ 4 ∫_0^∞ t π(B_t) π(B_t^c)/τc dt   by definition of τc
≥ 4 ∫_0^∞ t π(B_t) π(A)/τc dt   because g = 0 on A
= 2π(A) ||g||²_2 / τc.
Rearranging,
||g||²_2 / E(g, g) ≤ 2 q* τc² / π²(A)
and the first assertion of the Theorem follows from the extremal character-
ization (4.36) of EαA TA .
τ* ≤ (4(1 + log n) / min_j π_j) τc.
Example 4.43 below will show that the log term cannot be omitted. Compare
with graph-theoretic bounds in Chapter 6 section yyy.
Proof. Fix states a, b. We want to use the extremal characterization
(Chapter 3 yyy). So fix a function 0 ≤ g ≤ 1 with g(a) = 0, g(b) = 1. Order
the states as a = 1, 2, 3, . . . , n = b so that g(·) is increasing.
E(g, g) = Σ_{i<k} π_i q_ik (g(k) − g(i))²
  ≥ Σ_{i≤j<k} π_i q_ik (g(j+1) − g(j))²
  = Σ_j (g(j+1) − g(j))² Q(A_j, A_j^c),   where A_j ≡ [1, j]
  ≥ Σ_j (g(j+1) − g(j))² π(A_j)π(A_j^c)/τc   (4.37)
But
1 = Σ_j (g(j+1) − g(j)) = Σ_j (g(j+1) − g(j)) · (π^{1/2}(A_j) π^{1/2}(A_j^c) / τc^{1/2}) · (τc^{1/2} / (π^{1/2}(A_j) π^{1/2}(A_j^c))).
The same bound holds for the sum over {j : π(Aj ) ≥ 1/2}, so applying
(4.38) we get
1/E(g, g) ≤ (4/π_*) τc (1 + log n)
and the Proposition follows from the extremal characterization.
Example 4.43 Consider the weighted linear graph with loops on vertices
{0, 1, 2, . . . , n − 1}, with edge-weights
Using Lemma 4.36, the value of τc is attained by a split of the form {[0, j], [j+
1, n − 1]}, and a brief calculation shows that the maximizing value is j = 0
and gives
τc = 2(n − 1).
So in this example, the bound in Proposition 4.42 is sharp up to the numer-
ical constant.
Pπ̂ (X̂0 = î, X̂1 = ĵ) = Pπ (f (X0 ) = î, f (X1 ) = ĵ). (4.40)
The reader may check that (4.39) and (4.40) are equivalent. Under our
standing assumption that Xt is reversible, the induced chain is also reversible
(though the construction works for general chains as well). In the electrical
network interpretation, we are shorting together vertices with the same f -
values. It seems intuitively plausible that this “shorting” can only decrease
our parameters describing convergence and mean hitting time behavior.
This establishes the assertion for τ ∗ and τ0 , and the extremal characteri-
zation of relaxation time works similarly for τ2 . The assertion about τc is
immediate from the definition, since a partition of Iˆ pulls back to a partition
of I. 2
4.6. INDUCED AND PRODUCT CHAINS
On the other hand, it is easy to see that shorting may increase a one-sided mean hitting time. For example, random walk on the unweighted graph on the left has E_a T_b = 1, but when we short {a, d} together to form vertex â in the graph on the right, E_â T̂_b = 2.
[diagram: the path a–b–c–d on the left; on the right, the shorted graph with â adjacent to both b and c]
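The induced-chain construction and the path example can be checked in a few lines (my sketch):

```python
import numpy as np

def induced_chain(P, pi, f, m):
    """Collapse states via f : {0..n-1} -> {0..m-1}, as in (4.39)/(4.40)."""
    pi_hat = np.zeros(m)
    flow = np.zeros((m, m))
    for i in range(len(pi)):
        pi_hat[f[i]] += pi[i]
        for j in range(len(pi)):
            flow[f[i], f[j]] += pi[i] * P[i, j]
    return flow / pi_hat[:, None], pi_hat

def hit(P, j):
    """E_i T_j for all i."""
    n = len(P)
    idx = [i for i in range(n) if i != j]
    h = np.linalg.solve(np.eye(n - 1) - P[np.ix_(idx, idx)], np.ones(n - 1))
    out = np.zeros(n)
    out[idx] = h
    return out

# Path a-b-c-d (states 0,1,2,3); short {a, d} together to form vertex a^.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()

P_hat, pi_hat = induced_chain(P, pi, f=[0, 1, 2, 0], m=3)

assert np.allclose(pi_hat @ P_hat, pi_hat)        # pi_hat is stationary
assert np.allclose(pi_hat[:, None] * P_hat,
                   (pi_hat[:, None] * P_hat).T)   # induced chain is reversible
assert abs(hit(P, 1)[0] - 1.0) < 1e-12            # E_a T_b = 1 on the path
assert abs(hit(P_hat, 1)[0] - 2.0) < 1e-12        # E_a^ T_b = 2 after shorting
```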
But it is more natural to define the product chain Xt to be the chain with
transition probabilities
(i_1, i_2) → (j_1, i_2) with probability (1/2) P^{(1)}(i_1, j_1)
(i_1, i_2) → (i_1, j_2) with probability (1/2) P^{(2)}(i_2, j_2).
This is the jump chain derived from the product of the continuized chains, and has relaxation time
τ2 = 2 max(τ_2^{(1)}, τ_2^{(2)}).   (4.43)
Again, this can be seen without need for calculation: the continuized chain
is just the continuous-time product chain run at half speed.
This definition and (4.43) extend to d-fold products in the obvious way. Random walk on Z^d is the product of d copies of random walk on Z^1, and random walk on the d-cube (Chapter 5 yyy) is the product of d copies of random walk on {0, 1}.
Just to make things more confusing, given graphs G(1) and G(2) the
Cartesian product graph is defined to have edges
If both G(1) and G(2) are r-regular then random walk on the product graph
is the product of the random walks on the individual graphs. But in general,
discrete-time random walk on the product graph is the jump chain of the
product of the fluid model (Chapter 3 yyy) continuous-time random walks.
So if the graphs are r1 - and r2 -regular then the discrete-time random walk on
the product graph has the product distribution as its stationary distribution
and has relaxation time
τ2 = (r_1 + r_2) max(τ_2^{(1)}/r_1, τ_2^{(2)}/r_2).
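Identity (4.43) can be checked by building the product transition matrix with Kronecker products (a sketch of mine, with made-up component chains):

```python
import numpy as np

def rw(w):
    """Random walk on a weighted graph: (P, pi)."""
    return w / w.sum(axis=1, keepdims=True), w.sum(axis=1) / w.sum()

def tau2(P, pi):
    S = np.diag(pi ** 0.5) @ P @ np.diag(pi ** -0.5)
    return 1.0 / (1.0 - np.sort(np.linalg.eigvalsh(S))[-2])

P1, pi1 = rw(np.array([[1, 2], [2, 1]], dtype=float))
P2, pi2 = rw(np.array([[2, 1, 0], [1, 2, 2], [0, 2, 2]], dtype=float))

# With probability 1/2 move coordinate 1, with probability 1/2 move coordinate 2.
Pprod = 0.5 * (np.kron(P1, np.eye(len(pi2))) + np.kron(np.eye(len(pi1)), P2))
piprod = np.kron(pi1, pi2)

assert np.allclose(piprod @ Pprod, piprod)
assert np.isclose(tau2(Pprod, piprod), 2.0 * max(tau2(P1, pi1), tau2(P2, pi2)))
```

The factor 2 is exactly the "slowing down" by half of each coordinate.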
Let us briefly discuss the behavior of some other parameters under prod-
ucts. For the continuous-time product (4.41), the total variation distance d¯
of section 4.3 satisfies
where superscripts refer to the graphs G(1) , G(2) and not to the parameters
in section 4.3.1. For the discrete-time chain, there is an extra factor of 2
from “slowing down” (cf. (4.42), (4.43)), leading to
τ1 ≤ 4 max(τ_1^{(1)}, τ_1^{(2)}).
Here our conventions are a bit confusing: this inequality refers to the discrete-
time product chain, but as in section 4.3 we define τ1 via the continuized
chain – we leave the reader to figure out the analogous result for τ1disc
discussed in section 4.3.3.
To state a result for τ0, consider the continuous-time product (X_t^{(1)}, X_t^{(2)}) of independent copies of the same n-state chain. If the underlying chain has eigenvalues (λ_i; 1 ≤ i ≤ n) then the product chain has eigenvalues (λ_i + λ_j; 1 ≤ i, j ≤ n) and so by the eigentime identity
τ_0^{product} = Σ_{i,j≥1; (i,j)≠(1,1)} 1/(λ_i + λ_j)
  = 2τ0 + Σ_{i,j≥2} 1/(λ_i + λ_j)
  ≤ 2τ0 + 2 Σ_{i=2}^n Σ_{j=i}^n 1/(λ_i + λ_j)
  ≤ 2τ0 + Σ_{i=2}^n (n − i + 1)(2/λ_i)
  ≤ 2τ0 + (n − 1) 2τ0 = 2nτ0.
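A numerical check of this chain of inequalities (made-up chain; the λ's are the eigenvalues of the generator I − P of the continuized chain):

```python
import numpy as np

w = np.array([[1, 3, 1, 0],
              [3, 1, 2, 1],
              [1, 2, 1, 2],
              [0, 1, 2, 1]], dtype=float)
P = w / w.sum(axis=1, keepdims=True)
pi = w.sum(axis=1) / w.sum()
n = len(pi)

S = np.diag(pi ** 0.5) @ P @ np.diag(pi ** -0.5)
lam = np.sort(1.0 - np.linalg.eigvalsh(S))    # 0 = lam_1 < lam_2 <= ...
tau0 = (1.0 / lam[1:]).sum()                  # eigentime identity

# Product of two independent copies: eigenvalues lam_i + lam_j.
tau0_prod = sum(1.0 / (lam[i] + lam[j])
                for i in range(n) for j in range(n) if (i, j) != (0, 0))
assert np.isclose(tau0_prod,
                  2 * tau0 + sum(1.0 / (lam[i] + lam[j])
                                 for i in range(1, n) for j in range(1, n)))
assert tau0_prod <= 2 * n * tau0
```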
To appreciate this, consider the “trivial” case where the underlying Markov chain is just an i.i.d. sequence with distribution π on I. Then τ2 = 1 and the 2n random variables (X^{(i)}, Y^{(i)}; 1 ≤ i ≤ n) are i.i.d. with distribution π. And this special case of Corollary 4.46 becomes (4.45) below, because for each i the distribution of Z − Z^{(i)} is unchanged by substituting X_0 for Y^{(i)}.
Corollary 4.47 Let f : I n → R be arbitrary. Let (X0 , X1 , . . . , Xn ) be i.i.d.
with distribution π. Let Z (i) = f (X1 , . . . , Xi−1 , X0 , Xi+1 , . . . , Xn ) and let
Z = f (X1 , . . . , Xn ). Then
var(Z) ≤ (1/2) Σ_{i=1}^n E(Z − Z^{(i)})²   (4.45)
If f is symmetric then
var(Z) ≤ Σ_{i=0}^n E(Z^{(i)} − Z̄)²   (4.46)
where Z^{(0)} = Z and Z̄ = (1/(n+1)) Σ_{i=0}^n Z^{(i)}.
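Inequality (4.45) can be verified exactly by enumeration for a small example (my choice of f and π, not from the text):

```python
import numpy as np
from itertools import product

pi = np.array([0.1, 0.2, 0.3, 0.25, 0.15])    # a distribution on I = {0,...,4}
n = 4                                          # number of coordinates of f

def f(xs):
    return float(max(xs))                      # a (symmetric) test function

EZ = EZ2 = rhs = 0.0
for tup in product(range(5), repeat=n + 1):    # enumerate (X_0, X_1, ..., X_n)
    p = float(np.prod(pi[list(tup)]))
    x0, xs = tup[0], list(tup[1:])
    z = f(xs)
    EZ += p * z                                # marginalizes over X_0
    EZ2 += p * z * z
    for i in range(n):
        xi = xs.copy()
        xi[i] = x0                             # Z^(i): substitute X_0 for X_i
        rhs += p * (z - f(xi)) ** 2

var_Z = EZ2 - EZ ** 2
assert var_Z <= 0.5 * rhs + 1e-12              # (4.45)
```

Since f = max is far from linear, the bound holds with visible slack here.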
τ_1^{(2)} = max_{ij} (−Z_{ij}/π_j).   (4.47)
Curiously, this elegant result doesn’t seem to help much with the inequalities
in Theorem 4.6.
What happens with the τ1 -family of parameters for general chains re-
mains rather obscure. Some counter-examples to equivalence, and weaker
inequalities containing log 1/π_* factors, can be found in [6]. Recently, Lovász and Winkler [241] initiated a detailed study of τ_1^{(2)} for general chains which promises to shed more light on this question.
Our choice of τ1 as the “representative” of the family of τ_1^{(i)}'s is somewhat arbitrary. One motivation was that it gives the constant “1” in the inequality
τ2 ≤ τ1 . It would be interesting to know whether the constants in other basic
inequalities relating the τ1 -family to other parameters could be made “1”:
This follows from Corollary 4.22 and Lemma 4.23 when π is uniform, but
Sinclair posed
Open Problem 4.49 (i) Is there a simple proof of (4.49) in general?
(ii) Does (4.49) hold with the diameter ∆ in place of log n?
Section 4.4. As an example of historical interest, before this topic became
popular Fiedler [146] proved
Proposition 4.50 For random walk on a n-vertex weighted graph where
the stationary distribution is uniform,
τ2 ≤ w / (4 n c sin²(π/(2n))) ∼ w n / (π² c)
ρ(t) ∼ c exp(−λt) as t → ∞
where λ is the “spectral gap”. But we cannot pull back from asymptotia
to the real world so easily: it is not true that ρ(t) can be bounded by
K exp(−λt) for universal K. A dramatic example from Aldous [11] section
4 has for each n an n-state chain with spectral gap bounded away from 0 but
with ρ(n) also bounded away from 0, instead of being exponentially small.
So implicit claims in the literature that estimates of the spectral gap for
general chains have implications for finite-time behavior should be treated
with extreme skepticism.
It is not surprising that the classical Berry-Esseen Theorem for i.i.d.
sums ([133] Thm. 2.4.10) has an analog for chains. Write σ 2 for the asymp-
totic variance rate in Proposition 4.29 and write Z for a standard Normal
r.v.
Proposition 4.51 There is a constant K, depending on the chain, such
that
sup_x |P_π(S_t/(σ t^{1/2}) ≤ x) − P(Z ≤ x)| ≤ K t^{−1/2}
for all t ≥ 1 and all standardized g.
This result is usually stated for infinite-state chains satisfying various mix-
ing conditions, which are automatically satisfied by finite chains. See e.g.
Bolthausen [55]. At first sight the constant K depends on the function g
as well as the chain, but a finiteness argument shows that the dependence
on g can be removed. Unfortunately the usual proofs don’t give any useful
indications of how K depends on the chain, and so don’t help with Open
Problem 4.30.
The variance results in Proposition 4.29 are presumably classical, being
straightforward consequences of the spectral representation. Their use in
algorithmic settings such as Corollary 4.31 goes back at least to [10].
Section 4.4.3. Systematic study of the optimal choice of weights in the
Cauchy-Schwarz argument for Theorem 4.32 may lead to improved bounds
in examples. Alan Sokal has unpublished notes on this subject.
4.7. NOTES ON CHAPTER 4
Section 4.5.1. The quantity 1/τc , or rather this quantity with the al-
ternate definition of τc mentioned in the text, has been called conductance.
I avoid that term, which invites unnecessary confusion with the electrical
network terminology. However, the subscript c can be regarded as standing
for “Cheeger” or “conductance”.
In connection with Open Problem 4.38 we mention the following result.
Suppose that in the definition (section 4.4.1) of the maximal correlation
function ρ(t) we considered only events, i.e. suppose we defined
Then ρ̃(t) ≤ ρ(t), but in fact the two definitions are equivalent in the sense
that there is a universal function ψ(x) ↓ 0 as x ↓ 0 such that ρ(t) ≤ ψ(ρ̃(t)).
This is a result about “measures of dependence” which has nothing to do
with Markovianness – see e.g. Bradley et al [59].
Section 4.5.2. The history of Cheeger-type inequalities up to 1987 is
discussed in [222] section 6. Briefly, Cheeger [87] proved a lower bound for
the eigenvalues of the Laplacian on a compact Riemannian manifold, and
this idea was subsequently adapted to different settings – in particular, by
Alon [26] to the relationship between eigenvalues and expansion properties
of graphs. Lawler and Sokal [222], and independently Jerrum and Sinclair
[307], were the first to discuss the relationship between τc and τ2 at the
level of reversible Markov chains. Their work was modified by Diaconis
and Stroock [124], whose proof we followed for Lemmas 4.39 and 4.41. The
only novelty in my presentation is talking explicitly about quasistationary
distributions, which makes the relationships easier to follow.
xxx give forward pointer to results of [238, 158].
Section 4.6.2. See Efron-Stein [140] for the origin of their inequality.
Inequality (4.45), or rather the variant mentioned above Corollary 4.47 in-
volving the 2n i.i.d. variables
Chapter 5
There are two main settings in which explicit calculations for random walks
on large graphs can be done. One is where the graph is essentially just 1-
dimensional, and the other is where the graph is highly symmetric. The main
purpose of this chapter is to record some (mostly) bare-hands calculations for
simple examples, in order to illuminate the general inequalities of Chapter 4.
Our focus is on natural examples, but there are a few artificial examples
devised to make a mathematical point. A second purpose is to set out some
theory for birth-and-death chains and for trees.
Lemma 5.1 below is useful in various simple examples, so let’s record
it here. An edge (v, x) of a graph is essential (or a bridge) if its removal
would disconnect the graph, into two components A(v, x) and A(x, v), say,
containing v and x respectively. Recall that E is the set of (undirected)
edges, and write E(v, x) for the set of edges of A(v, x).
Lemma 5.1 (essential edge lemma) For random walk on a weighted graph
with essential edge (v, x),
$$E_v T_x = 1 + \frac{2\sum_{(i,j)\in E(v,x)} w_{ij}}{w_{vx}} \qquad (5.1)$$

$$E_v T_x + E_x T_v = \frac{w}{w_{vx}}, \quad \text{where } w = \sum_i \sum_j w_{ij}. \qquad (5.2)$$
CHAPTER 5. EXAMPLES: SPECIAL GRAPHS AND TREES (APRIL 23 1996)
Proof. It is enough to prove (5.1), since (5.2) follows by adding the two
expressions of the form (5.1). Because (v, x) is essential, we may delete all
vertices of A(x, v) except x, and this does not affect the behavior of the
chain up until time Tx , because x must be the first visited vertex of A(x, v).
After this deletion, $\pi_x^{-1} = E_x T_x^+ = 1 + E_v T_x$ by considering the first step from x, and $\pi_x = w_{vx}\big/\big(2w_{vx} + 2\sum_{(i,j)\in E(v,x)} w_{ij}\big)$, giving (5.1).
Remarks. Of course Lemma 5.1 is closely related to the edge-commute in-
equality of Chapter 3 Lemma yyy. We can also regard (5.2), and hence (5.4),
as consequences of the commute interpretation of resistance (Chapter 3 yyy),
because the effective resistance across an essential edge (v, x) is obviously
1/wvx .
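Lemma 5.1 is easy to check numerically. The following sketch (Python with numpy; the particular weighted graph is an invented example, not one from the text) builds a triangle joined to a pendant vertex by a bridge, computes $E_v T_x$ by solving the hitting-time linear system, and compares with (5.1) and (5.2):

```python
import numpy as np

# A weighted graph in which (0, 3) is an essential edge (a bridge):
# triangle {0, 1, 2} plus the pendant vertex 3.
w = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0, (0, 3): 4.0}
n = 4
W = np.zeros((n, n))
for (v, x), wt in w.items():
    W[v, x] = W[x, v] = wt
P = W / W.sum(axis=1, keepdims=True)      # random walk transition matrix

# E_v T_3 for v = 0, 1, 2 by solving (I - P restricted) h = 1
A = np.eye(n - 1) - P[:3, :3]
h = np.linalg.solve(A, np.ones(n - 1))

# (5.1): E_0 T_3 = 1 + 2 * (sum of weights of edges on the 0-side) / w_{03}
side = w[(0, 1)] + w[(0, 2)] + w[(1, 2)]
assert abs(h[0] - (1 + 2 * side / w[(0, 3)])) < 1e-9
# (5.2): E_0 T_3 + E_3 T_0 = w / w_{03}, where w = sum_i sum_j w_ij;
# here E_3 T_0 = 1 since vertex 3 has only the bridge edge.
assert abs((h[0] + 1.0) - W.sum() / w[(0, 3)]) < 1e-9
```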
For the walk started at b, let m(b, x; a, c) be the mean number of visits to
x before the exit time Ta ∧ Tc . (Recall from Chapter 2 our convention that
“before time t” includes time 0 but excludes time t). The number of returns
to b clearly has a Geometric distribution, so by (5.6)
$$m(b, b; a, c) = \frac{2(c-b)(b-a)}{c-a}, \qquad a \le b \le c. \qquad (5.7)$$
To get the analog for visits to x we consider whether or not x is hit at all
before exiting; this gives
m(b, x; a, c) = Pb (Tx < Ta ∧ Tc ) m(x, x; a, c).
Appealing to (5.5) and (5.7) gives the famous mean occupation time formula
$$m(b, x; a, c) = \begin{cases} \dfrac{2(x-a)(c-b)}{c-a}, & a \le x \le b \le c \\[6pt] \dfrac{2(c-x)(b-a)}{c-a}, & a \le b \le x \le c. \end{cases} \qquad (5.8)$$
Now the (random) time to exit must equal the sum of the (random)
times spent at each state. So, taking expectations,
$$E_b(T_a \wedge T_c) = \sum_{x=a}^{c} m(b, x; a, c),$$
$$P_b(T_c < T_a) = \frac{\sum_{i=a+1}^{b} w_i^{-1}}{\sum_{i=a+1}^{c} w_i^{-1}}.$$
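The occupation-time formula (5.8) and the exit probability can be checked against the fundamental matrix of the absorbed walk. A sketch in Python with numpy for the unweighted case (the endpoints a, c and the start b are arbitrary choices, not values from the text):

```python
import numpy as np

a, c, b = 0, 6, 2                       # absorbing barriers a, c; start at b
interior = list(range(a + 1, c))
k = len(interior)
Q = np.zeros((k, k))                    # SRW restricted to the interior
for idx, i in enumerate(interior):
    for j in (i - 1, i + 1):
        if a < j < c:
            Q[idx, interior.index(j)] = 0.5
N = np.linalg.inv(np.eye(k) - Q)        # N[b, x] = mean visits to x before T_a ∧ T_c

def m_formula(b, x):                    # (5.8), unweighted case
    if x <= b:
        return 2 * (x - a) * (c - b) / (c - a)
    return 2 * (c - x) * (b - a) / (c - a)

for x in interior:
    assert abs(N[interior.index(b), interior.index(x)] - m_formula(b, x)) < 1e-9

# exit probability: with unit weights, P_b(T_c < T_a) = (b - a)/(c - a)
r = np.zeros(k)
r[interior.index(c - 1)] = 0.5          # absorption into c happens from c - 1
assert abs((N @ r)[interior.index(b)] - (b - a) / (c - a)) < 1e-9
```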
5.1. ONE-DIMENSIONAL CHAINS
martingale, so
h(b) = Eb h(X(Ta ∧ Tc )) = ph(c) + (1 − p)h(a)
for $p \equiv P_b(T_c < T_a)$. Solving this equation gives $p = \frac{h(b)-h(a)}{h(c)-h(a)}$, which is (a).
The mean hitting time formula (b) has four different proofs! Two that
we will not give are as described below Lemma 5.2: Set up and solve a
recurrence equation, or use a well-chosen martingale. The slick argument is
to use the essential edge lemma (Lemma 5.1) to show
$$E_{j-1} T_j = 1 + \frac{2\sum_{i=1}^{j-1} w_i}{w_j}.$$
Then
$$E_b T_c = \sum_{j=b+1}^{c} E_{j-1} T_j,$$
establishing (b). Let us also write out the non-slick argument, using mean
occupation times. By considering mean time spent at i,
$$E_b T_c = \sum_{i=0}^{b-1} P_b(T_i < T_c)\, m(i, i, c) + \sum_{i=b}^{c-1} m(i, i, c), \qquad (5.11)$$
Substituting this and (a) into (5.11) leads to the formula stated in (b).
Finally, (c) can be deduced from (b), but it is more elegant to use the
essential edge lemma to get
There are several ways to use the preceding results to compute the av-
erage hitting time parameter τ0 . Perhaps the most elegant is
$$\begin{aligned}
\tau_0 &= \sum_i \sum_{j>i} \pi_i \pi_j \left(E_i T_j + E_j T_i\right)\\
&= \sum_{k=1}^{n-1} \pi[0, k-1]\,\pi[k, n-1]\,\left(E_{k-1} T_k + E_k T_{k-1}\right)\\
&= \sum_{k=1}^{n-1} \pi[0, k-1]\,\pi[k, n-1]\, w/w_k \quad \text{by (5.12)}\\
&= w^{-1} \sum_{k=1}^{n-1} w_k^{-1}\Big(w_k + 2\sum_{j=1}^{k-1} w_j\Big)\Big(w_k + 2\sum_{j=k+1}^{n-1} w_j\Big). \qquad (5.15)
\end{aligned}$$
We do not know an explicit formula for $\tau_2$, but we can get an upper bound easily from the "distinguished paths" result Chapter 4 yyy. For x < y the path $\gamma_{xy}$ has $r(\gamma_{xy}) = \sum_{u=x+1}^{y} 1/w_u$ and hence the bound is
$$\tau_2 \le \max_j \frac{1}{w\, w_j} \sum_{x=0}^{j-1} \sum_{y=j}^{n-1} (w_x + w_{x+1})(w_y + w_{y+1}) \sum_{u=x+1}^{y} \frac{1}{w_u}. \qquad (5.17)$$
and then
$$\max_{ij} E_i T_j = \max(E_0 T_1, E_1 T_0) = \frac{1}{\min(p, q)}, \qquad \tau^* = E_0 T_1 + E_1 T_0 = \frac{p+q}{pq},$$
and
τ0 = τ1 = τ2 = τc = 1/(p + q).
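For the two-state chain these identities are immediate to verify numerically. A Python/numpy sketch (the values of p and q are arbitrary choices for illustration):

```python
import numpy as np

p, q = 0.3, 0.5                                     # P(0 -> 1) = p, P(1 -> 0) = q
P = np.array([[1 - p, p], [q, 1 - q]])
lam2 = np.sort(np.linalg.eigvals(P).real)[0]        # second eigenvalue, 1 - p - q
assert abs(lam2 - (1 - p - q)) < 1e-12
assert abs(1 / (1 - lam2) - 1 / (p + q)) < 1e-12    # relaxation time tau_2

E01, E10 = 1 / p, 1 / q                             # geometric mean hitting times
assert abs(E01 + E10 - (p + q) / (p * q)) < 1e-12   # tau^*
assert abs(max(E01, E10) - 1 / min(p, q)) < 1e-12   # max_ij E_i T_j
```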
Specializing the results of Section 5.1.2 to the present example, one can
easily derive the asymptotic results
$$\frac{2\rho^{1/2}}{1+\rho}\cos\theta_m, \quad \text{where } \theta_m \equiv \frac{m\pi}{n-1},$$

$$\rho^{-i/2}\left(2\cos(i\theta_m) - (1-\rho)\,\frac{\sin((i+1)\theta_m)}{\sin\theta_m}\right), \qquad i = 0, \dots, n-1.$$

In particular,

$$\tau_2 = \left[1 - \frac{2\rho^{1/2}}{1+\rho}\cos\frac{\pi}{n-1}\right]^{-1} \to \frac{1+\rho}{(1-\rho^{1/2})^2}. \qquad (5.22)$$
The random walk has drift $p - q = -(1-\rho)/(1+\rho) \equiv -\mu$. It is not hard to show for fixed t > 0 that the distances $\bar d_n(tn)$ and $d_n(tn)$ of Chapter 4 yyy converge to 1 if t < µ and to 0 if t > µ.
jjj include details? In fact, the cutoff occurs at $\mu n + c_\rho n^{1/2}$: cf. (e.g.) Example 4.46 in [115]. Continue same paragraph:
In particular,
$$\tau_1 \sim \frac{1-\rho}{1+\rho}\, n. \qquad (5.23)$$

$$\frac{2\rho^{1/2}}{1+\rho}\cos\theta_m, \quad \text{where now } \theta_m \equiv \frac{m\pi}{n}$$
Example                           τ*       τ0       τ1       τ2       τc
5.7.  cycle                       n^2      n^2      n^2      n^2      n
5.8.  path                        n^2      n^2      n^2      n^2      n
5.9.  complete graph              n        n        1        1        1
5.10. star                        n        n        1        1        1
5.11. barbell                     n^3      n^3      n^3      n^3      n^2
5.12. lollipop                    n^3      n^2      n^2      n^2      n
5.13. necklace                    n^2      n^2      n^2      n^2      n
5.14. balanced r-tree             n log n  n log n  n        n        n
5.15. d-cube (d = log_2 n)        n        n        d log d  d        d
5.16. dense regular graphs        n        n        1        1        1
5.17. d-dimensional torus
      d = 2                       jjj?     n log n  n^{2/d}  n^{2/d}  jjj? n^{1/d}
      d ≥ 3                       jjj?     n        n^{2/d}  n^{2/d}  jjj? n^{1/d}
5.19. rook's walk                 n        n        1        1        1
$$\tau_0 = n^{-1}\sum_j E_0 T_j = (n^2 - 1)/6 \qquad (5.26)$$
$$P_0(X_t = i) = \sum_{j:\, 2j - t = i \bmod n} \frac{t!}{j!\,(t-j)!}\, 2^{-t}.$$
$$P_0(X_t = i) = \frac1n \sum_{m=0}^{n-1} \left(\cos(2\pi m/n)\right)^t \cos(2\pi i m/n),$$
$$\tau_2 = \frac{1}{1 - \cos(2\pi/n)} \sim \frac{n^2}{2\pi^2}.$$
As an aside, note that the eigentime identity (Chapter 3 yyy) gives the
curious identity
$$\frac{n^2 - 1}{6} = \sum_{m=1}^{n-1} \frac{1}{1 - \cos(2\pi m/n)},$$
whose n → ∞ limit is the well-known formula $\sum_{m=1}^{\infty} m^{-2} = \pi^2/6$.
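The curious identity is easy to confirm numerically. A short Python/numpy sketch:

```python
import numpy as np

# eigentime identity on the n-cycle: (n^2 - 1)/6 equals the sum of
# reciprocals of the nonzero eigenvalues 1 - cos(2*pi*m/n)
for n in (3, 8, 15, 40):
    lhs = (n**2 - 1) / 6
    rhs = sum(1 / (1 - np.cos(2 * np.pi * m / n)) for m in range(1, n))
    assert abs(lhs - rhs) < 1e-7
```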
5.2. SPECIAL GRAPHS
$$P_0(X(t) = i) = \frac1n \sum_{m=0}^{n-1} \exp\left(-t(1 - \cos(2\pi m/n))\right) \cos(2\pi i m/n). \qquad (5.27)$$
where the limit is "$\bar d$ for Brownian motion on the circle", which can be written as
$$\bar d_\infty(t) \equiv 1 - 2P\left((t^{1/2} Z) \bmod 1 \in (1/4, 3/4)\right)$$
where Z has the standard Normal distribution. So $\tau_1 \sim c n^2$ for the constant c such that $\bar d_\infty(c) = e^{-1}$, whose numerical value $c \doteq 0.063$ has no real significance.
jjj David: You got 0.054. Please check. Continue same paragraph:
Similarly
$$d_n(t n^2) \to d_\infty(t) \equiv \frac12 \int_0^1 |f_t(u) - 1|\, du,$$
where $f_t$ is the density of $(t^{1/2} Z) \bmod 1$.
As for $\tau_c$, the sup in its definition is attained by some A of the form [0, i − 1], so
$$\tau_c = \max_i \frac{\frac{i}{n}\left(1 - \frac{i}{n}\right)}{1/n} = \frac1n \left\lfloor \frac{n^2}{4} \right\rfloor \sim \frac{n}{4}.$$
As remarked in Chapter 4 yyy, this provides a counter-example to reversing
inequalities in Theorem yyy. But if we consider $\max_A(\pi(A) E_\pi T_A)$, the max is attained with $A = [\frac{n}{2} - \alpha n, \frac{n}{2} + \alpha n]$, say, where $0 \le \alpha < 1/2$. By Lemma 5.2, for $x \in (-\frac12 + \alpha, \frac12 - \alpha)$,
$$E_{\lfloor (x \bmod 1) n \rfloor}\, T_A \sim \left(\tfrac12 - \alpha - x\right)\left(\tfrac12 - \alpha + x\right) n^2,$$
and so
$$E_\pi T_A \sim n^2 \int_{-\frac12+\alpha}^{\frac12-\alpha} \left(\tfrac12 - \alpha - x\right)\left(\tfrac12 - \alpha + x\right) dx = \frac{4\left(\frac12 - \alpha\right)^3 n^2}{3}.$$
Thus
$$\max_A\left(\pi(A) E_\pi T_A\right) \sim n^2 \sup_{0<\alpha<1/2} \frac{4\left(\frac12 - \alpha\right)^3 2\alpha}{3} = \frac{9n^2}{512},$$
consistent with Chapter 4 Open Problem yyy.
xxx level of detail for d¯ results, here and later.
Remark. Parameters τ ∗ , τ0 , τ1 , and τ2 are all Θ(n2 ) in this example, and
in Chapter 6 we’ll see that they are O(n2 ) over the class of regular graphs.
However, the exact maximum values over all n-vertex regular graphs (or the
constants c in the ∼ cn2 asymptotics) are not known. See Chapter 6 for the
natural conjectures.
Example 5.8 The n-path.
This is just the graph 0 – 1 – 2 – · · · – (n − 1) on n vertices. If (X̂t ) is
random walk on (all) the integers, then Xt = φ(X̂t ) is random walk on the
n-path, for the “concertina” map
$$\varphi(i) = \begin{cases} i \bmod 2(n-1) & \text{if } i \bmod 2(n-1) \le n-1 \\ 2(n-1) - (i \bmod 2(n-1)) & \text{otherwise.} \end{cases}$$
We can also regard Xt as being derived from random walk X̃t on the
(2n − 2)-cycle via Xt = min(X̃t , 2n − 2 − X̃t ). So we can deduce the spectral
representation from that in the previous example:
$$P_i(X_t = j) = \sqrt{\pi_j/\pi_i}\, \sum_{m=0}^{n-1} \lambda_m^t\, u_{im} u_{jm}$$
where, for 0 ≤ m ≤ n − 1,
$$\lambda_m = \cos(\pi m/(n-1))$$
and
$$u_{0m} = \sqrt{\pi_m}; \qquad u_{n-1,m} = \sqrt{\pi_m}\,(-1)^m; \qquad u_{im} = \sqrt{2\pi_m}\,\cos(\pi i m/(n-1)), \quad 1 \le i \le n-2.$$
In particular, the relaxation time is
$$\tau_2 = \frac{1}{1 - \cos(\pi/(n-1))} \sim \frac{2n^2}{\pi^2}.$$
Furthermore, $\bar d_n(t) = \bar{\tilde d}_{2n-2}(t)$ and $d_n(t) = \tilde d_{2n-2}(t)$ for all t, so
$$d_n(t(2n)^2) \to d_\infty(t)$$
where the limits are those in the previous example. Thus $\tau_1 \sim c n^2$, where $c \doteq 0.25$ is 4 times the corresponding constant for the n-cycle.
xxx explain: BM on [0, 1] and circle described in Chapter 16.
It is easy to see that
$$\tau_c = \begin{cases} \dfrac{n-1}{2} & \text{if } n \text{ is even} \\[6pt] \dfrac{n-1}{2} - \dfrac{1}{2(n-1)} & \text{if } n \text{ is odd.} \end{cases}$$
In Section 5.3.2 we will see that the n-path attains the exact maximum
values of our parameters over all n-vertex trees.
For the complete graph on n vertices, the t-step probabilities for the chain
started at i can be calculated by considering the induced 2-state chain which
indicates whether or not the walk is at i. This gives, in discrete time,
$$P_i(X_t = i) = \frac1n + \left(1 - \frac1n\right)\left(-\frac{1}{n-1}\right)^t$$
$$P_i(X_t = j) = \frac1n - \frac1n\left(-\frac{1}{n-1}\right)^t, \quad j \ne i \qquad (5.32)$$
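Formula (5.32) can be checked directly against matrix powers. A Python/numpy sketch (n = 6 is an arbitrary choice):

```python
import numpy as np

n = 6
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)     # SRW on the complete graph K_n
Pt = np.eye(n)
for t in range(1, 8):
    Pt = Pt @ P
    # compare the t-step probabilities with (5.32)
    assert abs(Pt[0, 0] - (1/n + (1 - 1/n) * (-1/(n - 1))**t)) < 1e-12
    assert abs(Pt[0, 1] - (1/n - (1/n) * (-1/(n - 1))**t)) < 1e-12
```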
$$P_i(X_t = i) = \frac1n + \left(1 - \frac1n\right)\exp\left(-\frac{nt}{n-1}\right)$$
$$P_i(X_t = j) = \frac1n - \frac1n\exp\left(-\frac{nt}{n-1}\right), \quad j \ne i \qquad (5.33)$$
$$\tau_0 \equiv n^{-1}\sum_j E_i T_j = (n-1)^2/n. \qquad (5.37)$$
$$\tau_2 = (n-1)/n. \qquad (5.38)$$
$$\bar d(t) = \exp\left(-\frac{nt}{n-1}\right), \qquad d(t) = \frac{n-1}{n}\exp\left(-\frac{nt}{n-1}\right),$$
so
$$\tau_1 = (n-1)/n. \qquad (5.39)$$
It is easy to check
τc = (n − 1)/n.
We have already proved (Chapter 3 yyy) that the complete graph attains
the exact minimum of τ ∗ , maxij Ei Tj , τ0 , and τ2 over all n-vertex graphs.
The same holds for τc , by considering (in an arbitrary graph) a vertex of
minimum degree.
This leads to
$$\bar d(t) = e^{-t}, \qquad d(t) = \frac{1}{2(n-1)}\, e^{-2t} + \frac{n-2}{n-1}\, e^{-t},$$
from which
τ1 = 1.
Clearly (isolate a leaf)
$$\tau_c = 1 - \frac{1}{2(n-1)}.$$
We shall see in Section 5.3.2 that the n-star attains the exact minimum
of our parameters over all n-vertex trees.
$$n \to \infty, \qquad m_1/n \to \alpha, \qquad m_2/n \to 1 - 2\alpha$$
$$\max_{ij} E_i T_j \sim \alpha^2 (1 - 2\alpha) n^3 \sim \frac{n^3}{27} \quad \text{for } \alpha = 1/3$$
where α = 1/3 is the asymptotic maximizer here and for the other parame-
ters below. Similarly
$$\tau^* \sim 2\alpha^2 (1 - 2\alpha) n^3 \sim \frac{2n^3}{27} \quad \text{for } \alpha = 1/3.$$
The stationary distribution π puts mass → 1/2 on each bell. Also, by (5.45)–
(5.47) below, Evl Tv = o(Evl Tvr ) uniformly for vertices v in the left bell and
Evl Tv ∼ Evl Tvr ∼ α2 (1 − 2α)n3 uniformly for vertices v in the right bell.
Hence
$$\tau_0 \equiv \sum_v \pi_v E_{v_l} T_v \sim \frac12 E_{v_l} T_{v_r} \sim \frac12 \alpha^2 (1 - 2\alpha) n^3$$
and so we have proved the “τ0 ” part of
$$\text{each of } \{\tau_0, \tau_1, \tau_2\} \sim \frac12 \alpha^2 (1 - 2\alpha) n^3 \sim \frac{n^3}{54} \quad \text{for } \alpha = 1/3. \qquad (5.44)$$
Consider the relaxation time τ2 . For the function g defined to be +1 on the
left bell, −1 on the right bell and linear on the bar, the Dirichlet form gives
$$\mathcal{E}(g, g) = \frac{2}{(m_2 + 1)(m_1(m_1 - 1) + m_2 + 1)} \sim \frac{2}{\alpha^2 (1 - 2\alpha) n^3}.$$
made to work without knowing this “obvious fact” a priori). Couple chains
starting in these states by having them move symmetrically in the obvious
fashion. Certainly these copies will couple by the time T the copy started
at $v_l$ has reached the center vertex $w_{m_2/2}$ of the bar. We claim that the distribution of T is approximately exponential, and its expected value is $\sim \frac12 m_1^2 m_2 \sim \frac12 \alpha^2 (1 - 2\alpha) n^3$ by the first displayed result for the lollipop example, with $m_2$ changed to $m_2/2$ there. (In keeping with this observation, I'll refer to the "half-stick" lollipop in the next paragraph.)
jjj (cont.) To get approximate exponentiality for the distribution of T, first argue easily that it's approximately the same as that of $T_{w_{m_2/2}}$ for the half-stick lollipop started in stationarity. But that distribution is, in
turn, approximately exponential by Chapter 3 Proposition yyy, since τ2 =
Θ(n2 ) = o(n3 ) for the half-stick lollipop.
Proof of (5.43). The mean time in question is the sum of the following
mean times:
obtained from the general formula for mean hitting time across an essential edge of a graph (Lemma 5.1), where $w_0 = v_L$ and $w_{m_2+1} = v_R$. To argue (5.47), we start with the 1-step recurrence
$$E_{v_R} T_{v_r} = 1 + \frac{1}{m_1} E_{w_{m_2}} T_{v_r} + \frac{m_1 - 2}{m_1} E_x T_{v_r}$$
where x denotes a vertex of the right bell distinct from vR and vr . Now
using the formula (5.48) for the mean passage time from wm2 to vR . Starting
from x, the time until a hit on either vR or vr has Geometric(2/(m1 − 1))
distribution, and the two vertices are equally likely to be hit first. So
The last three expressions give an equation for EvR Tvr whose solution is (5.47).
And it is straightforward to check that Evl Tvr does achieve the maximum,
using (5.45)–(5.47) to bound the general Ei Tj .
It is straightforward to check
$$\tau_c \sim \frac{\alpha^2 n^2}{2}.$$
Example 5.12 The lollipop.
xxx picture
This is just the barbell without the right bell. That is, we start with a
complete graph on m1 vertices and add m2 new vertices in a path. So there
are n = m1 + m2 vertices, and wm2 is now a leaf. In this example, by (5.45)
and (5.46), with m2 in place of m2 + 1, we have
n → ∞, m1 /n → α, m2 /n → 1 − α
whence
Because the stationary distribution puts mass Θ(1/n) on the “bar”, (5.50) is
also enough to show that τ0 = O(n2 ). So by the general inequalities between
our parameters, to show
τc ∼ 2(1 − α)n.
Remark. The barbell and lollipop are the natural candidates for the n-
vertex graphs which maximize our parameters. The precise conjectures and
known results will be discussed in Chapter 6.
jjj We need to put somewhere—Chapter 4 on τc ? Chapter 6 on max
parameters over n-vertex graphs? in the barbell example?—the fact that
max τc is attained, when n is even, by the barbell with $m_2 = 0$, the max value being $(n^2 - 2n + 2)/8 \sim n^2/8$. Similarly, when n is odd, the maximizing graph is formed by joining complete graphs on $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$ vertices respectively by a single edge, and the max value is easy to write down (I've kept a record) but not so pretty; however, this value too is $\sim n^2/8$, which is probably all we want to say anyway. Here is the first draft of a proof:
For random walk on an unweighted graph, τc is the maximum over
nonempty proper subsets A of the ratio
$$\frac{(\deg A)(\deg A^c)}{2|E|\,(A, A^c)}, \qquad (5.52)$$
where deg A is defined to be the sum of the degrees of vertices in A and
(A, Ac ) is the number of directed cut edges from A to Ac .
jjj Perhaps it would be better for exposition to stick with undirected
edges and introduce factor 1/2?
Maximizing now over choice of graphs, the max in question is no larger than the maximum M, over all choices of $n_1 > 0$, $n_2 > 0$, $e_1$, $e_2$, and $e_0$ satisfying $n_1 + n_2 = n$ and $0 \le e_i \le \binom{n_i}{2}$ for i = 1, 2 and $1 \le e_0 \le n_1 n_2$, of the ratio
$$\frac{(2e_1 + e_0)(2e_2 + e_0)}{2(e_1 + e_2 + e_0)\, e_0}. \qquad (5.53)$$
(We don’t claim equality because we don’t check that each ni -graph is con-
nected. But we’ll show that M is in fact achieved by the connected graph
claimed above.)
Simple calculus shows that the ratio (5.53) is (as one would expect) increasing in $e_1$ and $e_2$ and decreasing in $e_0$. Thus, for given $n_1$, (5.53) is maximized by considering complete graphs of size $n_1$ and $n_2 = n - n_1$ joined by a single edge. Call the maximum value $M(n_1)$. If n is even, it is then easy to see that $M(n_1)$ is maximized by $n_1 = n/2$, giving $M = (n^2 - 2n + 2)/8$, as desired.
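The monotonicity of the ratio (5.53) invoked by the calculus argument can be spot-checked exactly on a small grid. A Python sketch using exact rational arithmetic (the grid bounds are arbitrary):

```python
from fractions import Fraction

def ratio(e1, e2, e0):
    """The ratio (5.53), computed exactly."""
    return Fraction((2 * e1 + e0) * (2 * e2 + e0), 2 * (e1 + e2 + e0) * e0)

# nondecreasing in e1 and e2, nonincreasing in e0, as the calculus shows
for e1 in range(6):
    for e2 in range(6):
        for e0 in range(1, 6):
            assert ratio(e1 + 1, e2, e0) >= ratio(e1, e2, e0)
            assert ratio(e1, e2 + 1, e0) >= ratio(e1, e2, e0)
            assert ratio(e1, e2, e0 + 1) <= ratio(e1, e2, e0)
```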
For the record, here are the slightly tricky details if n is odd. Write ν = n/2 and $n_1 = \nu - y$ and put $x = y^2$. A short calculation gives $M(n_1) = 1 + g(x)$, where $g(x) \equiv [(a - x)(b - x) - 1]/(2x + c)$ with $a = \nu^2$, $b = (\nu - 1)^2$, and $c = 2\nu(\nu - 1) + 2$. Easy calculus shows that g is U-shaped over [0, ν] and then that $g(1/4) \ge g(\nu^2)$. Thus $M(n_1)$ is maximized when $n_1 = \nu - \frac12 = \lfloor n/2 \rfloor$.
[Figure: the necklace graph, with vertices labeled a, b, c, d, e, f, g, h and m − 2 repeated segments.]
This example affords a nice illustration of use of the commute interpre-
tation of resistance. Applying voltage 1 at vertex a and voltage 0 at e, a
brief calculation gives the potentials at intervening vertices as
and gives the effective resistance rae = 7/8. Since the effective resistance
between f and g equals 1, we see the maximal effective resistance is
$$r_{ah} = \tfrac78 + (2m - 3) + \tfrac78 = 2m - \tfrac54.$$
So
$$\tau^* = E_a T_h + E_h T_a = 3 \times (4m + 2) \times \left(2m - \frac54\right) \sim \frac{3n^2}{2}.$$
Take r ≥ 2 and h ≥ 1. The balanced r-tree is the rooted tree where all leaves
are at distance h from the root, where the root has degree r, and where the
other internal vertices have degree r + 1. Call h the height of the tree. For
h = 1 we have the (r + 1)-star, and for r = 2 we have the balanced binary
tree. The number of vertices is $n = 1 + r + r^2 + \cdots + r^h = (r^{h+1} - 1)/(r - 1)$.
The chain X̂ induced (in the sense of Chapter 4 Section yyy) by the
function
f (i) = h − (distance from i to the root)
is random walk on {0, . . . , h}, biased towards 0, with reflecting barriers, as
in Example 5.5 with
ρ = 1/r.
In fact, the transition probabilities for X can be expressed in terms of X̂
as follows. Given vertices v1 and v2 with f (v1 ) = f1 and f (v2 ) = f2 ,
the paths [root, v1 ] and [root, v2 ] intersect in the path [root, v3 ], say, with
$f(v_3) = f_3 \ge f_1 \vee f_2$. Then
$$P_{v_1}(X_t = v_2) = \sum_{m=f_3}^{h} P_{f_1}\!\left(\max_{0 \le s \le t} \hat X_s = m,\ \hat X_t = f_2\right) r^{-(m - f_2)}.$$
$$E_{v_2} T_{v_1} = 2(r-1)^{-2}\left(r^{f_1+1} - r^{f_2+1}\right) - 2(r-1)^{-1}(f_1 - f_2) - (f_1 - f_2),$$
The maximum value 2h(n − 1) is attained when v1 and v2 are leaves and v3
is the root. So
$$\tfrac12 \tau^* = \max_{v,x} E_v T_x = 2(n-1)h. \qquad (5.56)$$
(The τ∗ part is simpler via (5.88) below.) Another special case is that, for
a leaf v,
τ0 ∼ 2nh.
To sketch the proof, given a vertex w, let v be a leaf such that w lies on
the path from v to the root. Then
Eroot Tw = Eroot Tv − Ew Tv ,
and Ew Tv ≤ 2(n − 1)f (w) by (5.54). But the stationary distribution puts
nearly all its mass on vertices w with f (w) of constant order, and n = o(nh).
We claim next that
$$\tau_1 \sim \tau_2 \sim 2n/(r-1).$$
Since $\tau_2 \le \tau_1$ always, it suffices to prove
$$\tau_1 \le (1 + o(1))\, \frac{2n}{r-1} \qquad (5.59)$$
and
$$\tau_2 \ge (1 + o(1))\, \frac{2n}{r-1}. \qquad (5.60)$$
Proof of (5.59). Put $t_n \equiv \frac{2n}{r-1}$
for brevity. We begin the proof by recalling the results (5.22) and (5.19) for
the induced walk X̂:
$$\hat\tau_2 \to \frac{r+1}{(r^{1/2}-1)^2}, \qquad E_{\hat\pi} \hat T_h \sim \frac{2 r^{h+1}}{(r-1)^2} \sim t_n. \qquad (5.61)$$
By Proposition yyy of Chapter 3,
$$\sup_t \left| P_{\hat\pi}(\hat T_h > t) - \exp\!\left(-\frac{t}{E_{\hat\pi} \hat T_h}\right) \right| \le \frac{\hat\tau_2}{E_{\hat\pi} \hat T_h} = \Theta(n^{-1}) = o(1). \qquad (5.62)$$
For X̂ started at 0, let Ŝ be a stopping time at which the chain has exactly
the stationary distribution. Then, for 0 ≤ s ≤ t,
for all large n. Hence $\tau_1 \le (1 + \varepsilon)t_n$ for all large n, and (5.59) follows.
Proof of (5.60).
jjj [This requires further exposition in both Chapters 3 and 4-1. In Chap-
ter 3, it needs to be made clear that one of the inequalities having to do with
CM hitting time distributions says precisely that EαA TA ≥ Eπ TA /π(Ac ) ≥
Eπ TA . In Chapter 4-1 (2/96 version), it needs to be noted that Lemma 2(a)
(concerning τ2 for the joining of two copies of a graph) extends to the joining
of any finite number of copies.]
Let G denote a balanced r-tree of height h. Let G00 denote a balanced
r-tree of height h − 1 with root y and construct a tree G0 from G00 by adding
an edge from y to an additional vertex z. We can construct G by joining r
copies of G0 at the vertex z, which becomes the root of G. Let π 0 and π 00
denote the respective stationary distributions for the random walks on G0
and G00 , and use the notation T 0 and T 00 , respectively, for hitting times on
these graphs. By Chapter 4 jjj,
But it is easy to see that this last quantity equals (1 + o(1))Eπ Tz , which is
asymptotically equivalent to 2n/(r − 1) by (5.61).
From the discussion at the beginning of Section 5.3.1, it follows that τc
is achieved at any of the r subtrees of the root. This gives
$$\tau_c = \frac{(2r^h - r - 1)(2r^h - 1)}{2r^h(r-1)} \sim \frac{2n}{r}.$$
An extension of the balanced r-tree example is treated in Section 5.2.1
below.
This is a graph with vertex-set $I = \{0, 1\}^d$ and hence with $n = 2^d$ vertices. Write $i = (i_1, \dots, i_d)$ for a vertex, and write $|i - j| = \sum_u |i_u - j_u|$. Then (i, j) is an edge iff $|i - j| = 1$.
It is easier to use the continuized walk X(t) = (X1 (t), . . . , Xd (t)), because
the component processes (Xu (t)) are independent as u varies, and each is
in fact just the continuous-time random walk on the 2-path with transition
rate 1/d. This follows from an elementary fact about the superposition of
(marked) Poisson processes.
Thus, in continuous time,
$$P_i(X(t) = j) = \prod_{u=1}^{d} \frac12\left(1 + (-1)^{|i_u - j_u|} e^{-2t/d}\right) = 2^{-d}\left(1 - e^{-2t/d}\right)^{|i-j|}\left(1 + e^{-2t/d}\right)^{d - |i-j|}. \qquad (5.64)$$
By expanding the right side, we see that the continuous-time eigenvalues are
$$\lambda_k = 2k/d \ \text{ with multiplicity } \binom{d}{k}, \qquad k = 0, 1, \dots, d. \qquad (5.65)$$
(Of course, this is just the general fact that the eigenvalues of a d-fold
product of continuous-time chains are
(λi1 + · · · + λid ; 1 ≤ i1 , . . . , id ≤ n) (5.66)
where (λi ; 1 ≤ i ≤ n) are the eigenvalues of the marginal chain.)
In particular,
τ2 = d/2. (5.67)
By the eigentime identity (Chapter 3 yyy)
$$\tau_0 = \sum_{m \ge 2} \frac{1}{\lambda_m} = \frac{d}{2} \sum_{k=1}^{d} k^{-1} \binom{d}{k} = 2^d \left(1 + d^{-1} + O(d^{-2})\right), \qquad (5.68)$$
the asymptotics being easy analysis.
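The middle equality in (5.68) can be verified for small d by computing mean hitting times directly and invoking the eigentime identity. A Python/numpy sketch:

```python
import numpy as np
from math import comb

def cube_tau0(d):
    """tau0 for the continuized walk on the d-cube, by direct linear solve."""
    n = 2 ** d
    P = np.zeros((n, n))
    for i in range(n):
        for u in range(d):
            P[i, i ^ (1 << u)] = 1.0 / d
    # mean hitting times of vertex 0 (holding times have mean 1,
    # so continuous- and discrete-time mean hitting times agree)
    h = np.linalg.solve(np.eye(n - 1) - P[1:, 1:], np.ones(n - 1))
    # eigentime identity: tau0 = sum_j pi_j E_0 T_j, with pi uniform
    # and E_0 T_j = E_j T_0 by symmetry
    return h.sum() / n

for d in (2, 3, 4):
    formula = (d / 2) * sum(comb(d, k) / k for k in range(1, d + 1))
    assert abs(cube_tau0(d) - formula) < 1e-8
```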
From (5.64) it is also straightforward to derive the discrete-time t-step
transition probabilities:
$$P_i(X_t = j) = 2^{-d} \sum_{m=0}^{d} \left(1 - \frac{2m}{d}\right)^t \sum_r (-1)^r \binom{|i-j|}{r} \binom{d - |i-j|}{m - r}.$$
In particular, writing $T^Y$ for hitting times for Y, symmetry and (5.13) give
$$\tfrac12 \tau^{*Y} = \tfrac12\left(E_0 T_d^Y + E_d T_0^Y\right) = E_0 T_d^Y = 2^{d-1} \sum_{i=1}^{d} \binom{d-1}{i-1}^{-1}.$$
The asymptotics are the same as in (5.68). In fact it is easy to use (5.64) to
show
$$Z_{ii} = 2^{-d} \tau_0 = 1 + d^{-1} + O(d^{-2}), \qquad Z_{ij} = O(d^{-2}) \ \text{ uniformly over } |i-j| \ge 2,$$
and then by Chapter 2 yyy
Since
$$1 + E_1 T_0^Y = E_0 T_1^Y + E_1 T_0^Y = w/w_1 = 2^d,$$
it follows that $E_i T_j = 2^d - 1$ if $|i - j| = 1$.
xxx refrain from write out exact Ei Tj —refs
To discuss total variation convergence, we have by symmetry (and writ-
ing d to distinguish from dimension d)
where the second sum is over j(u) with u < u0 (s). But from (5.64) we can
write this sum as
$$P\!\left(B_{\frac12(1 - d^{-1/2} e^{-2s})} \le |j(u_0(s))|\right) - P\!\left(B_{\frac12} \le |j(u_0(s))|\right)$$
Consider an r-regular n-vertex graph with r > n/2. Of course here we are
considering a class of graphs rather than a specific example. The calculations
below show that these graphs necessarily mimic the complete graph (as far
as smallness of the random walk parameters is concerned) in the asymptotic
setting r/n → c > 1/2.
The basic fact is that, for any pair i, j of vertices, there must be at least
2r − n other vertices k such that i − k − j is a path. To prove this, let a1
(resp., a2 ) be the number of vertices k 6= i, j such that exactly 1 (resp., 2)
of the edges (k, i), (k, j) exist. Then a1 + a2 ≤ n − 2 by counting vertices,
and a1 + 2a2 ≥ 2(r − 1) by counting edges, and these inequalities imply
a2 ≥ 2r − n.
Thus, by Thompson's principle (Chapter 3, yyy) the effective resistance $r_{ij} \le \frac{2}{2r - n}$ and so the commute interpretation of resistance implies
$$\tau^* \le \frac{2rn}{2r - n} \sim \frac{2cn}{2c - 1}. \qquad (5.72)$$
A simple “greedy coupling” argument (Chapter 14, Example yyy) shows
$$\tau_1 \le \frac{r}{2r - n} \sim \frac{c}{2c - 1}. \qquad (5.73)$$
This is also a bound on τ2 and on τc , because τc ≤ τ2 ≤ τ1 always, and special
case 2 below shows that this bound on τc cannot be improved asymptotically
(nor hence can the bound on τ1 or τ2 ). Because Eπ Tj ≤ nτ2 for regular
graphs (Chapter 3 yyy), we get
$$E_\pi T_j \le \frac{nr}{2r - n}.$$
This implies
$$\tau_0 \le \frac{nr}{2r - n} \sim \frac{cn}{2c - 1}$$
which also follows from (5.72) and $\tau_0 \le \tau^*/2$. We can also argue, in the notation of Chapter 4 yyy, that
$$\max_{ij} E_i T_j \le \tau_1^{(2)} + \max_j E_\pi T_j \le \frac{4e}{e-1}\,\tau_1 + n\tau_1 \le (1 + o(1))\,\frac{nr}{2r - n} \sim \frac{cn}{2c - 1}.$$
Special case 1. The orders of magnitude may change for c = 1/2. Take
two complete (n/2)-graphs, break one edge in each (say edges (v1 , v2 ) and
(w1 , w2 )) and add edges (v1 , w1 ) and (v2 , w2 ). This gives an n-vertex ((n/2)−
1)-regular graph for which all our parameters are Θ(n2 ).
jjj I haven’t checked this.
Special case 2. Can the bound τc ≤ r/(2r −n) ∼ c/(2c−1) be asymptoti-
cally improved? Eric Ordentlicht has provided the following natural counter-
example. Again start with two (n/2)-complete graphs on vertices (vi ) and
(wi ). Now add the edges (vi , wj ) for which 0 ≤ (j−i) mod (n/2) ≤ r−(n/2).
This gives an n-vertex r-regular graph. By considering the set A consisting
of the vertices vi , a brief calculation gives
$$\tau_c \ge \frac{r}{2r - n + 2} \sim \frac{c}{2c - 1}.$$
Example 5.17 The d-dimensional torus $Z_m^d$.
The torus is the set of d-dimensional integers i = (i1 , . . . , id ) modulo m,
considered in the natural way as a 2d-regular graph on n = md vertices. It
is much simpler to work with the random walk in continuous time, X(t) =
(X1 (t), . . . , Xd (t)), because its component processes (Xu (t)) are independent
as u varies; and each is just continuous-time random walk on the m-cycle,
slowed down by a factor 1/d. Thus we can immediately write the time-t
transition probabilities for X in terms of the corresponding probabilities
p0,j (t) for continuous-time random walk on the m-cycle (see Example 5.7
above) as
$$p_{0,j}(t) = \prod_{u=1}^{d} p_{0,j_u}(t/d).$$
Since the eigenvalues on the m-cycle are (1 − cos(2πk/m), 0 ≤ k ≤ m − 1),
by (5.66) the eigenvalues of X are
$$\lambda_{(k_1, \dots, k_d)} = \frac1d \sum_{u=1}^{d} \left(1 - \cos(2\pi k_u/m)\right), \qquad 0 \le k_u \le m - 1.$$
In particular, we see that the relaxation time satisfies
$$\tau_2 \sim \frac{d m^2}{2\pi^2} = \frac{d n^{2/d}}{2\pi^2}$$
where here and below asymptotics are as m → ∞ for fixed d. This relaxation
time could more simply be derived from the N -cycle result via the general
“product chain” result of Chapter 4 yyy. But writing out all the eigenvalues
enables us to use the eigentime identity.
$$\tau_0 = \sum_{k_1} \cdots \sum_{k_d} 1/\lambda_{(k_1, \dots, k_d)},$$
the sum excluding $(k_1, \dots, k_d) = (0, \dots, 0)$.
$$\tau_0 \sim m^d R_d \qquad (5.74)$$
where
$$R_d \equiv \int_0^1 \cdots \int_0^1 \frac{dx_1 \cdots dx_d}{\frac1d \sum_{u=1}^{d} (1 - \cos(2\pi x_u))} \qquad (5.75)$$
provided the integral converges. The reader who is a calculus whiz will see
that in fact Rd < ∞ for d ≥ 3 only, but this is seen more easily in the
alternative approach of Chapter 15, Section yyy.
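The eigenvalue formula above is easy to confirm for a small case by diagonalizing the generator directly. A Python/numpy sketch (m = 5, d = 2 is an arbitrary small case):

```python
import numpy as np
from itertools import product

def torus_generator(m, d):
    """Generator Q = P - I of the continuized walk on Z_m^d
    (each coordinate moves +1 or -1 at rate 1/(2d) each)."""
    states = list(product(range(m), repeat=d))
    index = {s: a for a, s in enumerate(states)}
    Q = -np.eye(len(states))
    for s in states:
        for u in range(d):
            for step in (1, -1):
                t = list(s)
                t[u] = (t[u] + step) % m
                Q[index[s], index[tuple(t)]] += 1.0 / (2 * d)
    return Q

m, d = 5, 2
Q = torus_generator(m, d)
computed = np.sort(np.linalg.eigvalsh(-Q))        # Q is symmetric here
predicted = np.sort([sum(1 - np.cos(2 * np.pi * k / m) for k in ks) / d
                     for ks in product(range(m), repeat=d)])
assert np.allclose(computed, predicted)
```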
xxx more stuff: connection to transience, recurrent potential, etc
xxx new copy from lectures
xxx τ1 , τc
jjj David: I will let you develop the rest of this example. Note that τ1
is considered very briefly in Chapter 15, eq. (17) in 3/6/96 version. Here
are a few comments for τc . First suppose that m > 2 is even and d ≥ 2.
Presumably, τc is achieved by the following half-torus:
$$A := \{i = (i_1, \dots, i_d) \in Z_m^d : 0 \le i_d < m/2\},$$
whence
$$\tau(A) = \frac{d}{4}\, n^{1/d}.$$
[By Example 5.15 (the d-cube) this last result is also true for m = 2, and (for even m ≥ 2) it is by Example 5.7 (the n-cycle) also true for d = 1.] If we have correctly conjectured the maximizing A, then
$$\tau_c = \frac{d}{4}\, n^{1/d} \quad \text{if } m \text{ is even},$$
and presumably(??)
$$\tau_c \sim \frac{d}{4}\, n^{1/d}$$
in any case.
In particular,
$$\tau_2 = \frac{2(m-1)}{m}, \qquad (5.77)$$
which equals 1.75 for m = 8 and converges to 2 as m grows. Applying the
eigentime identity, a brief calculation gives
$$\tau_0 = \frac{(m-1)^2 (m+3)}{m}, \qquad (5.78)$$
which equals 67.375 for m = 8 and m2 + m + O(1) for m large.
Starting the walk X at 0 = (0, 0), let Y (t) denote the Hamming distance
H(X(t), 0) of X(t) from 0, i.e., the number of coordinates (0, 1, or 2) in
which X(t) differs from 0. Then Y is a birth-and-death chain with transition
rates
1 1 1
q01 = 1, q10 = , q12 = , q21 = .
2(m − 1) 2 m−1
This is useful for computing mean hitting times. Of course
Ei Tj = 0 if H(i, j) = 0.
Since
$$1 + E_1 T_0^Y = E_0 T_1^Y + E_1 T_0^Y = m^2,$$
it follows that $E_i T_j = m^2 - 1$ if H(i, j) = 1.
Finally, it is clear that $E_2 T_1^Y = m - 1$, so that $E_2 T_0^Y = (m - 1) + (m^2 - 1) = m^2 + m - 2$, whence $E_i T_j = m^2 + m - 2$ if H(i, j) = 2.
These calculations show
1 ∗
2τ = max Ei Tj = m2 + m − 2,
ij
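The mean hitting time values just derived, and (5.77), can be confirmed by direct computation on a small board. A Python/numpy sketch (m = 5 is an arbitrary choice):

```python
import numpy as np
from itertools import product

m = 5
states = list(product(range(m), repeat=2))
index = {s: a for a, s in enumerate(states)}
n = m * m
Q = -np.eye(n)                 # continuized rook's walk: jump at rate 1 to a
for (x, y) in states:          # uniform other square in the same row or column
    for z in range(m):
        if z != x:
            Q[index[(x, y)], index[(z, y)]] += 1.0 / (2 * (m - 1))
        if z != y:
            Q[index[(x, y)], index[(x, z)]] += 1.0 / (2 * (m - 1))

# mean hitting times of (0, 0): solve (Q restricted) h = -1
keep = [a for s, a in index.items() if s != (0, 0)]
h = np.linalg.solve(Q[np.ix_(keep, keep)], -np.ones(n - 1))
ET = dict(zip([s for s in states if s != (0, 0)], h))
assert abs(ET[(1, 0)] - (m**2 - 1)) < 1e-8          # Hamming distance 1
assert abs(ET[(1, 1)] - (m**2 + m - 2)) < 1e-8      # Hamming distance 2

# (5.77): tau_2 = 2(m-1)/m from the spectral gap of -Q
gap = np.sort(np.linalg.eigvalsh(-Q))[1]
assert abs(1.0 / gap - 2 * (m - 1) / m) < 1e-8
```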
$$\bar d_m(t) = \left(2 - \frac2m\right)\exp\left(-\frac{mt}{2(m-1)}\right) - \left(1 - \frac2m\right)\exp\left(-\frac{mt}{m-1}\right)$$
and thence
$$\tau_1 = \frac{m-1}{m}\left[-2\ln\left(1 - \left(1 - \frac{m(m-2)}{(m-1)^2}\, e^{-1}\right)^{1/2}\right) + 2\ln\frac{m-2}{m-1}\right],$$
which rounds to 2.54 for m = 8 and converges to $-2\ln\left(1 - (1 - e^{-1})^{1/2}\right) \doteq 3.17$ as m becomes large.
Any set A of the form {(i1 , i2 ) : iu ∈ J} with either u = 1 or u = 2 and
J a nonempty proper subset of {0, . . . , m − 1} achieves the value
$$\tau_c = \frac{2(m-1)}{m}.$$
A direct proof is messy, but this follows immediately from the general in-
equality τc ≤ τ2 , (5.77), and a brief calculation that the indicated A indeed
gives the indicated value.
xxx other examples left to reader? complete bipartite; ladders
jjj Note: I’ve worked these out and have handwritten notes. How much
do we want to include, if at all? (I could at least put the results in the
table.)
The distribution π̂ concentrates near the root-level if ρ < 1 and near the
leaves-level if ρ > 1; it is nearly uniform on the h levels if ρ = 1. On the
other hand, the weight assigned by the distribution π to an individual vertex
v is a decreasing function of f (v) (thus favoring vertices near the leaves) if
λ < 1 (i.e., ρ < 1/r) and is an increasing function (thus favoring vertices
near the root) if λ > 1; it is uniform on the vertices in the unbiased case
λ = 1.
The mean hitting time calculations of Example 5.14 can all be extended
to the biased case. For example, for λ ≠ 1 the general formula (5.55) becomes [using the same notation as at (5.55)]
$$E_{v_1} T_{v_2} = \hat w r^h\, \frac{\lambda^{-f_3} - \lambda^{-f_2}}{\lambda^{-1} - 1} + 2(\rho^{-1} - 1)^{-2}\left(\rho^{-(f_2+1)} - \rho^{-(f_1+1)}\right) - 2(\rho^{-1} - 1)^{-1}(f_2 - f_1) - (f_2 - f_1) \qquad (5.79)$$
if ρ ≠ 1 and
$$E_{v_1} T_{v_2} = \hat w r^h\, \frac{\lambda^{-f_3} - \lambda^{-f_2}}{\lambda^{-1} - 1} + f_2^2 - f_1^2$$
if ρ = 1. The maximum value is attained when $v_1$ and $v_2$ are leaves and $v_3$ is the root. So if λ ≠ 1,
$$\tfrac12 \tau^* = \max_{v,x} E_v T_x = \hat w r^h\, \frac{\lambda^{-h} - 1}{\lambda^{-1} - 1}. \qquad (5.80)$$
The orders of magnitude for all of the τ -parameters (with r and λ, and
hence ρ, fixed as h, and hence n, becomes large) are summarized on a
case-by-case basis in the next table. Following are some of the highlights
in deriving these results; the details, and derivation of exact formulas and
more detailed asymptotic results, are left to the reader.
Value of ρ                   τ*      τ0      τ1      τ2      τc
ρ < 1/r                      ρ^{-h}  ρ^{-h}  ρ^{-h}  ρ^{-h}  ρ^{-h}
ρ = 1/r (≡ Example 5.14)     nh      nh      n       n       n
1/r < ρ < 1                  n       n       ρ^{-h}  ρ^{-h}  ρ^{-h}
ρ = 1                        nh      n       h       h       h
ρ > 1                        n       n       h       1       1
For τ0 = Σ_x π_x E_root T_x we have τ0 ≤ E_root T_leaf. If ρ < 1/r, this bound is tight:
τ0 ∼ E_root T_leaf ∼ (2ρ^{−h}/((1 − ρ)²(1 − λ))) (λ − ρ);
for ρ > 1/r a more careful calculation is required.
If ρ < 1, then the same arguments as for the unbiased case (ρ = 1/r) show
τ1 ∼ τ2 ∼ 2ρ^{−(h−1)}/(1 − ρ)²,
and in this case it is not hard to show that τc = Θ(ρ^{−h}). If ρ = 1, then
τ1 = Θ(h), τc ∼ 2(1 − 1/r)h, τ2 = Θ(h)
as well. If ρ > 1, then since X̂ has positive drift equal to (ρ − 1)/(ρ + 1), it
follows that
τ1 ∼ ((ρ + 1)/(ρ − 1)) h.
The value τc is achieved by isolating a leaf, giving
τc → 1,
τ2 = Θ(1)
as well.
jjj Limiting value of τ2 when ρ > 1 is that of τ2 for biased infinite tree?
Namely?
5.3 Trees
For random walk on a finite tree, we can develop explicit formulas for means
and variances of first passage times, and for distributions of first hitting
places. We shall only treat the unweighted case, but the formulas can be
extended to the weighted case without difficulty.
xxx notation below —change w to x ? Used i, j, v, w, x haphazardly for
vertices.
In this section we’ll write rv for the degree of a vertex v, and d(v, x)
for the distance between v and x. On a tree we may unambiguously write
[v, x] for the path from v to x. Given vertices j, v1 , v2 , . . . in a tree, the
intersection of the paths [j, v1 ], [j, v2 ], . . . is a (maybe trivial) path; write
d(j, v1 ∧ v2 ∧ · · ·) ≥ 0 for the length of this intersection path.
On an n-vertex tree, the random walk’s stationary distribution is
π_v = r_v / (2(n − 1)).
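As a quick numerical sanity check (our own snippet; the tree below is an arbitrary choice), one can verify that π_v = r_v/(2(n − 1)) is indeed stationary for simple random walk on a small tree:

```python
import numpy as np

# An arbitrary tree on 6 vertices.
edges = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]
n = 6
A = np.zeros((n, n))
for v, x in edges:
    A[v, x] = A[x, v] = 1

deg = A.sum(axis=1)
P = A / deg[:, None]            # simple random walk transition matrix
pi = deg / (2 * (n - 1))        # claimed stationary distribution r_v / (2(n-1))

assert np.allclose(pi @ P, pi)  # pi P = pi, so pi is stationary
assert np.isclose(pi.sum(), 1.0)
```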
Recall from the beginning of this chapter that an edge (v, x) of a graph
is essential if its removal would disconnect the graph into two components
A(v, x) and A(x, v), say, containing v and x respectively. Obviously, in a
tree every edge is essential, so we get a lot of mileage out of the essential
edge lemma (Lemma 5.1).
198CHAPTER 5. EXAMPLES: SPECIAL GRAPHS AND TREES (APRIL 23 1996)
For arbitrary i, j,
var_i T_j = −E_i T_j + Σ_v Σ_w r_v r_w d(j, i ∧ v ∧ w) [2 d(j, v ∧ w) − d(j, i ∧ v ∧ w)].   (5.86)
Remarks. 1. There are several equivalent expressions for the sums above:
we chose the most symmetric-looking ones. We’ve written sums over ver-
tices, but one could rephrase in terms of sums over edges.
2. In continuous time, the terms “−Ei Tj ” disappear from the variance
formulas—see xxx.
Proof of Theorem 5.20. Equations (5.81) and (5.82) are rephrasings
of (5.3) and (5.4) from the essential edge lemma. Equation (5.84) and the
first equality in (5.83) follow from (5.82) and (5.81) by summing over the
edges in the path [i, j]. Note alternatively that (5.84) can be regarded as
a consequence of the commute interpretation of resistance, since the ef-
fective resistance between i and j is d(i, j). To get the second equality
in (5.83), consider the following deterministic identity (whose proof is obvi-
ous), relating sums over vertices to sums over edges.
Lemma 5.21 Let f be a function on the vertices of a tree, and let j be a
distinguished vertex. Then
Σ_v r_v f(v) = Σ_{v≠j} (f(v) + f(v∗))
where v ∗ is the first vertex (other than v) in the path [v, j].
where Σ_l denotes the sum over all 0 ≤ l ≤ m − 1 for which A(i_l, i_{l+1}) contains both v and w. Given vertices v and w, there exist unique smallest values of p and q so that v ∈ A(i_p, i_{p+1}) and w ∈ A(i_q, i_{q+1}). If p ≠ q, then the sum Σ_l in (5.87) equals
Σ_{l=p∨q}^{m−1} (2 d(i_{l+1}, i_{p∨q}) − 1) = Σ_{l=p∨q}^{m−1} (2((l + 1) − (p ∨ q)) − 1)
= (m − (p ∨ q))² = d²(j, v ∧ w)
= d(j, i ∧ v ∧ w) [2 d(j, v ∧ w) − d(j, i ∧ v ∧ w)],
as required by (5.86). If p = q, then the sum Σ_l in (5.87) equals
Σ_{l=p}^{m−1} (2 d(i_{l+1}, i_p) + 2 d(i_p, v ∧ w) − 1)
Also,
Σ_{v≠j} r_v = 2(n − 1) − 1.
But by (5.81), Ei Tj = 2n − 3.
where ∆ is the diameter of the tree. As for τc , it is clear that the sup in its
definition is attained by A(v, w) for some edge (v, w). Note that
π(A(v, w)) = (2|A(v, w)| − 1)/(2(n − 1)).   (5.89)
This leads to
τ_c = max_{(v,w)} [ ((2|A(v, w)| − 1)/(2(n − 1))) · ((2|A(w, v)| − 1)/(2(n − 1))) ] / [ 1/(2(n − 1)) ]
= max_{(v,w)} (4|A(v, w)||A(w, v)| − 2n + 1)/(2(n − 1)).   (5.90)
Obviously the max is attained by an edge for which |A(v, w)| is as close as
possible to n/2. This is one of several notions of “centrality” of vertices
and edges which arise in our discussion—see Buckley and Harary [81] for a
treatment of centrality in the general graph context, and for the standard
graph-theoretic terminology.
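To make (5.90) concrete, here is a small sketch of our own: compute τ_c for a tree by cutting each edge and sizing the two components. On 10 vertices, the balanced central edge of the path gives 81/18 = 4.5, while every edge of the star is maximally unbalanced:

```python
def tau_c_tree(n, edges):
    """tau_c for random walk on a tree via (5.90):
    max over edges (v,x) of (4|A(v,x)||A(x,v)| - 2n + 1) / (2(n-1))."""
    adj = {v: [] for v in range(n)}
    for v, x in edges:
        adj[v].append(x)
        adj[x].append(v)
    best = float("-inf")
    for v, x in edges:
        # size of the component A(v,x) containing v once edge (v,x) is cut;
        # in a tree, blocking the single step v -> x disconnects x's side
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for y in adj[u]:
                if y not in seen and not (u == v and y == x):
                    seen.add(y)
                    stack.append(y)
        a = len(seen)
        best = max(best, (4 * a * (n - a) - 2 * n + 1) / (2 * (n - 1)))
    return best

path = [(i, i + 1) for i in range(9)]      # 10-path
star = [(0, i) for i in range(1, 10)]      # 10-star
assert tau_c_tree(10, path) == 4.5         # balanced edge: a = 5
assert abs(tau_c_tree(10, star) - 17 / 18) < 1e-12
```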
τ_0 = 1/2 + (2/n) Σ_{(v,w)} [ |A(v, w)||A(w, v)| − (|A(v, w)|² + |A(w, v)|²)/(2(n − 1)) ]
where Σ_{(v,w)} denotes the sum over all undirected edges (v, w).
Proof. Using the formula for the stationary distribution, for each i
τ_0 = (1/(2(n − 1))) Σ_j r_j E_i T_j.
By Lemma 5.21 (with f(j) = E_i T_j and distinguished vertex i),
τ_0 = (1/(2(n − 1))) Σ_j (2 E_i T_j − a(i, j))
where a(i, i) = 0 and a(i, j) = Ex Tj , where (j, x) is the first edge of the path
[j, i]. Taking the (unweighted) average over i,
τ_0 = (1/(2n(n − 1))) Σ_i Σ_j (2 E_i T_j − a(i, j)).
Each term Ei Tj is the sum of terms Ev Tw along the edges (v, w) of the path
[i, j]. Counting how many times a directed edge (v, w) appears,
τ_0 = (1/(2n(n − 1))) Σ (2|A(v, w)||A(w, v)| − |A(v, w)|) E_v T_w,
where we sum over directed edges (v, w). Changing to a sum over undirected
edges, using Ev Tw + Ew Tv = 2(n − 1) and Ev Tw = 2|A(v, w)| − 1, gives
2n(n − 1) τ_0 = Σ_{(v,w)} [ 2|A(v, w)||A(w, v)| · 2(n − 1)
− |A(v, w)|(2|A(v, w)| − 1)
− |A(w, v)|(2|A(w, v)| − 1) ].
σ = min_i max_j E_j T_i.
Fix an i attaining the minimum. For arbitrary j we have (the first equality
uses the random target lemma, cf. the proof of Chapter 4 Lemma yyy)
Σ_k π_k |E_j T_k − E_i T_k| = 2 Σ_k π_k (E_j T_k − E_i T_k)^+
≤ 2 Σ_k π_k E_j T_i   because E_j T_k ≤ E_j T_i + E_i T_k
≤ 2σ
and so τ_1^{(3)} ≤ 4σ.
For the converse, it is elementary that we can find a vertex i such that
the size (n∗ , say) of the largest branch from i satisfies n∗ ≤ n/2. (This is
another notion of “centrality”. To be precise, we are excluding i itself from
the branch.) Fix this i, and consider the j which maximizes Ej Ti , so that
Ej Ti ≥ σ by definition. Let B denote the set of vertices in the branch from
i which contains j. Then
E_j T_k = E_j T_i + E_i T_k,   k ∈ B^c
and so
τ_1^{(3)} ≥ Σ_k π_k |E_j T_k − E_i T_k| ≥ π(B^c) E_j T_i ≥ π(B^c) σ.
But by (5.89) π(B) = (2n∗ − 1)/(2(n − 1)) ≤ 1/2, so we have shown τ_1^{(3)} ≥ σ/2.
We do not know whether τ2 has a simple expression “up to equivalence”
analogous to Proposition 5.23. It is natural to apply the “distinguished
paths” bound (Chapter 4 yyy). This gives the inequality
τ_2 ≤ 2(n − 1) max_{(v,w)} Σ_{x∈A(v,w)} Σ_{y∈A(w,v)} π_x π_y d(x, y)
= 2(n − 1) max_{(v,w)} [ π(A(v, w)) E[d(v, V) 1_{(V ∈ A(w,v))}] + π(A(w, v)) E[d(v, V) 1_{(V ∈ A(v,w))}] ]
where V has the stationary distribution π and where we got the equality
by writing d(x, y) = d(v, y) + d(v, x). The edge attaining the max gives yet
another notion of “centrality.”
xxx further remarks on τ2 .
Proof. (a) is obvious from (5.88), because ∆ varies between 2 for the
n-star and (n − 1) for the n-path. The lower bound in (b) follows from the
lower bound in (a). For the upper bound in (b), consider some path i =
v0 , v1 , . . . , vd = j in the tree, where plainly d ≤ (n − 1). Now |A(vd−1 , vd )| ≤
n − 1 and so
|A(vd−i , vd−i+1 )| ≤ n − i for all i
because the left side decreases by at least 1 as i increases. So
E_i T_j = Σ_{m=0}^{d−1} E_{v_m} T_{v_{m+1}}
= Σ_{m=0}^{d−1} (2|A(v_m, v_{m+1})| − 1)   by (5.81)
≤ Σ_{m=0}^{d−1} (2(m + n − d) − 1)
≤ Σ_{l=1}^{n−1} (2l − 1)
= (n − 1)².
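As a check on the path case (our own sketch): on the n-path, summing the edge-crossing times E_{v_m} T_{v_{m+1}} = 2|A(v_m, v_{m+1})| − 1 from (5.81) between the two leaves gives exactly (n − 1)², and this agrees with the exact linear-algebra computation of the hitting time:

```python
import numpy as np

def hit_time_path_formula(n, i, j):
    """E_i T_j on the path 0-1-...-(n-1) via (5.81): crossing edge (m, m+1)
    left-to-right costs 2|A(m, m+1)| - 1 = 2(m+1) - 1 in expectation."""
    if i > j:
        i, j = n - 1 - i, n - 1 - j   # reflect the path so that i < j
    return sum(2 * (m + 1) - 1 for m in range(i, j))

def hit_times_exact(P, j):
    """Mean hitting times of state j: solve (I - Q)h = 1 over states != j."""
    n = P.shape[0]
    keep = [k for k in range(n) if k != j]
    h = np.linalg.solve(np.eye(n - 1) - P[np.ix_(keep, keep)], np.ones(n - 1))
    out = np.zeros(n)
    out[keep] = h
    return out

n = 10
P = np.zeros((n, n))
for v in range(n):
    nbrs = [x for x in (v - 1, v + 1) if 0 <= x < n]
    for x in nbrs:
        P[v, x] = 1 / len(nbrs)

h = hit_times_exact(P, n - 1)
assert hit_time_path_formula(n, 0, n - 1) == (n - 1) ** 2 == 81
assert np.allclose(h[0], 81)
```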
To prove (c), it is enough to show that the sum in Proposition 5.22 is min-
imized by the n-star and maximized by the n-path. For each undirected
edge (v, w), let
[g, g] = πv (1 − πv − πw )2 + (1 − πv − πw )πv2 ,
Similarly, one can get different-looking expressions for τ0 . Wilf [337] lists 54
identities involving binomial coefficients—it would be amusing to see how
many could be derived by calculating a random walk on the d-cube quantity
in two different ways!
Comparing our treatment of dense regular graphs (Example 5.16) with
that in [272] should convince the reader of the value of general theory.
Section 5.3. An early reference to formulas for the mean and variance
of hitting times on a tree (Theorem 5.20) is Moon [264], who used less
intuitive generating function arguments. The formulas for the mean have
been repeatedly rediscovered.
Of course there are many other questions we can ask about random walk
on trees. Some issues treated later are
xxx list.
xxx more sophisticated ideas in Lyons [245].
Chapter 6
207
208 CHAPTER 6. COVER TIMES (OCTOBER 31, 1994)
One can alternatively derive these inequalities from the commute interpre-
tation of resistance (Chapter 3 yyy), since the resistance between x and v
is at most 1/wvx .
Theorem 6.1 For random walk on a weighted graph,
max_v E_v C^+ ≤ w min_T Σ_{e∈T} 1/w_e
This gives the weighted case, and in the unweighted case w = d̄n and each spanning tree has Σ_{e∈T} 1/w_e = n − 1. □
Note that in the unweighted case, the bound is at most n(n − 1)². On the barbell (Chapter 5 Example yyy) it is easy to see that min_i E_i C = Ω(n³), so the maximal value of any formalization of “mean cover time”, over n-vertex graphs, is Θ(n³). Results and conjectures on the optimal numerical constants in the Θ(n³) upper bounds are given in section 6.3.
Corollary 6.2 On an unweighted n-vertex tree, E_v C_v^+ ≤ 2(n − 1)², with equality iff the tree is the n-path and v is a leaf.
Proof. The inequality follows from Theorem 6.1. On the n-path with leaves v, z we have E_v C_v^+ = E_v T_z + E_z T_v = 2(n − 1)². □
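A simulation (our own, with an arbitrary seed) illustrates the equality case of Corollary 6.2: on the 5-path started at a leaf, E_v C_v^+ = 2·4² = 32.

```python
import random

def cover_and_return_time(n, start, rng):
    """One sample of C^+ (cover all vertices, then return to start)
    for simple random walk on the path 0-1-...-(n-1)."""
    pos, unvisited, t = start, set(range(n)) - {start}, 0
    covered = False
    while True:
        if pos == 0:
            pos = 1                       # leaves reflect deterministically
        elif pos == n - 1:
            pos = n - 2
        else:
            pos += rng.choice((-1, 1))
        t += 1
        unvisited.discard(pos)
        if not unvisited:
            covered = True
        if covered and pos == start:
            return t

rng = random.Random(0)
n = 5
est = sum(cover_and_return_time(n, 0, rng) for _ in range(20000)) / 20000
# Corollary 6.2: starting at a leaf of the n-path, E C^+ = 2(n-1)^2 = 32
assert abs(est - 2 * (n - 1) ** 2) < 2.0
```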
It is worth dissecting the proof of Theorem 6.1. Two different inequalities
are used in the proof. Inequality (6.2) is an equality iff the edge is essential,
so the second inequality in the proof is an equality iff the graph is a tree.
But the first inequality in the proof bounds C + by the time to traverse a
spanning tree in a particular order, and is certainly not sharp on a general
tree, but only on a path. This explains Corollary 6.2. More importantly, these remarks suggest that the bound d̄n(n − 1) in Theorem 6.1 will be good iff there is some fixed “essential path” in the graph, and the dominant contribution to C is from the time taken to traverse that path (as happens on the barbell).
There are a number of variations on the theme of Theorem 6.1, and we
will give two. The first (due to Zuckerman [342], whose proof we follow)
provides a nice illustration of probabilistic technique.
Proposition 6.3 Write Ce for the time to cover all edges of an unweighted
graph, i.e. until each edge (v, w) has been traversed in each direction. Then
max_v E_v C_e ≤ 11 d̄ n².
Proof. Fix a vertex v and a time t0 . Define “excursions”, starting and ending
at v, as follows. In each excursion, wait until all vertices have been visited,
then wait t0 longer, then end the excursion at the next visit to v. Writing
Si for the time at which the i’th excursion ends, and N for the (random)
number of excursions required to cover each edge in each direction, we have
SN = min{Si : Si ≥ Ce }
Ev Ce ≤ Ev SN = Ev N × Ev S1 . (6.3)
Clearly
E_v S_1 ≤ E_v C + t_0 + max_w E_w T_v ≤ t_0 + 2 max_i E_i C.
where m ≡ d̄n is the number of directed edges. Fix a directed edge (w, x), say. By Chapter 3 Lemma yyy the mean time, starting at x, until (w, x) is traversed equals m. So the chance, starting at x, that (w, x) is not traversed before time t_0 is at most m/t_0. So using the definition of excursion, the chance that (w, x) is not traversed during the first excursion is at most m/t_0, so the chance it is not traversed during the first two excursions is at most (m/t_0)². Since there are m directed edges, (6.4) follows.
Repeating the argument for (6.4) gives
P_v(N > 2j) ≤ (m³/t_0²)^j,   j ≥ 0
and hence
E_v N ≤ 2/(1 − m³/t_0²).
max_v E_v C_e ≤ (8/3)(⌈2m^{3/2}⌉ + 2 max_v E_v C).
6.1. THE SPANNING TREE ARGUMENT 211
Proof. Replace each edge (i, j) of the graph by two directed edges (i →
j), (j → i). Pick an arbitrary v1 and construct a path v1 → v2 → . . . vq on
distinct vertices, stopping when the path cannot be extended. That is the
first stage of the construction of F1 . For the second stage, pick a vertex vq+1
not used in the first stage and construct a path vq+1 → vq+2 → . . . vr in
which no second-stage vertex is revisited, stopping when a first-stage vertex
is hit or when the path cannot be extended. Continue stages until all vertices
have been touched. This creates a directed spanning forest F1 . Note that
all the neighbors of vq must be amongst {v1 , . . . , vq−1 }, and so the size of
the component of F1 containing v1 is at least d∗ + 1, and similarly for the
other components of F1 .
Now delete from the graph all the directed edges used in F1. Inductively construct forests F2, F3, . . . , F_{⌈d∗/2⌉} in the same way. The same argument shows that each component of F_i has size at least d∗ + 2 − i, because at a “stopping” vertex v at most i − 1 of the directed edges out of v were used in previous forests.
Proof of Theorem 6.4. Write m for the number of (undirected) edges.
For an edge e = (v, x) write b_e = E_v T_x + E_x T_v. Chapter 3 Lemma yyy says Σ_e b_e = 2m(n − 1). Now consider the ⌈d∗/2⌉ forests F_i given by Lemma 6.6.
But each component of F has size at least ⌈d∗/2⌉, so F has at most 2n/d∗ components. So to extend F to a tree T requires adding at most 2n/d∗ − 1 edges (e_j), and for each edge e we have b_e ≤ 2m by (6.2). This creates a spanning tree T with Σ_{e∈T} b_e ≤ 12mn/d∗. As in the proof of Theorem 6.1,
(a) The coupon collector’s problem. Many textbooks discuss this clas-
sical problem, which involves C for the chain (Xt ; t ≥ 0) whose values are
6.2. SIMPLE EXAMPLES OF COVER TIMES 213
EC = Σ_{m=1}^{n−1} E(C^{m+1} − C^m) = n h_{n−1}.   (6.8)
(By symmetry, E_v C is the same for each initial vertex, so we just write EC.) It is also a textbook exercise (e.g. [133] p. 124) to obtain the limit distribution
n^{−1} (C − n log n) →_d ξ   (6.9)
where ξ has the extreme value distribution
We won’t go into the elementary derivations of results like (6.9) here, because
in Chapter 7 yyy we give more general results.
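A direct check of (6.8) (our own snippet): the exact value n h_{n−1} against a simulation of i.i.d. uniform picks started from an already-visited state, here with n = 6:

```python
import random
from fractions import Fraction

def h(k):
    """Harmonic number h_k = 1 + 1/2 + ... + 1/k, exactly."""
    return sum(Fraction(1, i) for i in range(1, k + 1))

# (6.8): starting at a visited state, n-1 new states remain, and the wait for
# the (m+1)-st distinct state is geometric with mean n/(n-m); E C = n h_{n-1}.
n = 6
exact = n * h(n - 1)            # = 137/10 for n = 6

rng = random.Random(1)
trials = 40000
total = 0
for _ in range(trials):
    seen, t = {0}, 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        t += 1
    total += t
assert abs(total / trials - float(exact)) < 0.2
```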
(b) The complete graph. The analysis of C for random walk on the
complete graph (i.e. without self-loops) is just a trivial variation of the
analysis above. Each step following time C^m has chance (n − m)/(n − 1) to hit a new vertex, so
EC = (n − 1) h_{n−1} ∼ n log n.   (6.11)
And the distribution limit (6.9) still holds. Because E_v T_w = n − 1 for w ≠ v,
we also have
(c) The n-star (Chapter 5 Example yyy). Here the visits to the leaves
(every second step) are exactly i.i.d., so we can directly apply the coupon
collector’s problem. For instance, writing v for the central vertex and l for
a leaf,
and C/2 satisfies (6.9). Though we won’t give the details, it turns out that
a clever inductive argument shows these are the minima over all trees.
(d) The n-cycle. Random walk on the n-cycle is also easy to study.
At time C m the walk has visited m distinct vertices, and the set of visited
vertices must form an interval [j, j + m − 1], say, where we add modulo n. At
time C m the walk is at one of the endpoints of that interval, and C m+1 −C m
is the time until the first of {j − 1, j + m} is visited, which by Chapter 5
yyy has expectation 1 × m. So
EC = Σ_{m=1}^{n−1} E(C^{m+1} − C^m) = Σ_{i=1}^{n−1} i = (1/2) n(n − 1).
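The value n(n − 1)/2 is easy to confirm by simulation (our own snippet, arbitrary seed), here with n = 8, so EC = 28:

```python
import random

def cycle_cover_time(n, rng):
    """One sample of the cover time C for simple random walk on the n-cycle."""
    pos, seen, t = 0, {0}, 0
    while len(seen) < n:
        pos = (pos + rng.choice((-1, 1))) % n
        seen.add(pos)
        t += 1
    return t

rng = random.Random(2)
n, trials = 8, 50000
est = sum(cycle_cover_time(n, rng) for _ in range(trials)) / trials
assert abs(est - n * (n - 1) / 2) < 1.0   # E C = n(n-1)/2 = 28
```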
Obviously the complete graph has the same property, by symmetry. Lovász and Winkler [240] gave a short but ingenious proof that these are the only
graphs with that property, a result rediscovered in [179].
such as the barbell and the n-cycle show these bounds are the right order
of magnitude. Quite a lot of attention has been paid to sharpening the
constants in such bounds. We will not go into details, but will merely
record a very simple argument in section 6.3.1 and the best known results
in section 6.3.2.
τ∗ ≤ max_i E_i C^+   (6.13)
and the results of section 6.1 imply upper bounds on τ ∗ . But implicit in
earlier results is a direct bound on τ ∗ . The edge-commute inequality implies
that, for arbitrary v, x at distance ∆(v, x),
E_v T_x + E_x T_v ≤ d̄ n ∆(v, x)   (6.14)
and hence
Corollary 6.8 τ∗ ≤ d̄ n ∆, where ∆ is the diameter of the graph.
(∆ + 1) d∗ ≤ Σ_{i=0}^{∆} d_{v_i} = Σ_{i=0}^{∆} |A_i| ≤ 3n.
max max_v E_v C^+ ∼ (4/27) n³   (6.16)
max min_v E_v C^+ ∼ (3/27) n³   (6.17)
max min_v E_v C ∼ (2/27) n³   (6.18)
The value in (6.16) is asymptotically attained on the lollipop, as in Theo-
rem 6.11. Note that (6.15) and (6.16) imply the same 4n3 /27 behavior for
intermediate quantities such as τ ∗ and maxv Ev C. The values in (6.17) and
(6.18) are asymptotically attained by the graph consisting of an n/3-path with a 2n/3-clique attached at the middle of the path.
The corresponding results for τ0 and τ2 are not known. We have τ2 ≤
τ0 ≤ minv Ev C, the latter inequality from the random target lemma, and so
(6.18) implies
max τ0 and max τ2 ≤ (2/27 + o(1)) n³.   (6.19)
But a natural guess is that the asymptotic behavior is that of the barbell,
giving the values below.
For regular graphs, none of the asymptotic values are known exactly.
A natural candidate for extremality is the necklace graph (Chapter 5 yyy),
where the time parameters are asymptotically 3/4 times the parameters
for the n-path. So the next conjecture uses the numerical values from the
necklace graph.
Open Problem 6.14 Prove the conjectures that, over the class of regular
n-vertex graphs
max max_{i,j} E_i T_j ∼ (3/4) n²
max τ∗ ∼ (3/2) n²
max max_v E_v C^+ ∼ (3/2) n²
max max_v E_v C ∼ (15/16) n²
max min_v E_v C ∼ (3/4) n²
max τ0 ∼ (1/4) n²
max τ2 ∼ (3/(2π²)) n²
The best bounds known are those implied by the following result.
Theorem 6.15 (Feige [144]) On a d-regular graph,
max_v E_v C ≤ 2n²
max_v E_v C_v^+ ≤ 2n² (1 + (d − 2)/(d + 1)²) ≤ 13n²/6.
Remarks. For part (i) we give a slightly fussy argument repeating ingredients
of the proof of Corollary 6.9, since these are needed for (ii). The point of
(iv) is to get a bound for t ≪ Eπ T_i. On the n-cycle, it can be shown that the probability in question really is Θ(min(t^{1/2}/n, 1)), uniformly in n and t.
Proof of Proposition 6.16. Choose a vertex b ∈ Ac at minimum distance
from i, and let i = i0 , i1 , . . . , ij , ij+1 = b be a minimum-length path. Let G∗
be the subgraph on vertex-set A, and let G∗∗ be the subgraph on vertex-set
A together with all the neighbors of ij . Write superscripts ∗ and ∗∗ for the
random walks on G∗ and G∗∗ . Then
The inequality holds because we can specify the walk on G in terms of the
walk on G∗∗ with possibly extra chances of jumping to Ac at each step (this
is a routine stochastic comparison argument, written out as an example in
Chapter 14 yyy). The equality holds because the only routes in G∗∗ from i
to Ac are via ij , by the minimum-length assumption. Now write E, E ∗ , E ∗∗
for the edge-sets. Using the commute interpretation of resistance,
E_{i_j} T^{∗∗}_{A^c} = 2|E^{∗∗}| (1/q) − 1 = 2 |E^∗|/q + 1 ≤ 2|E^∗| + 1 ≤ |A|².
For part (ii), by the electrical network analogy (Chapter 3 yyy) the
quantity in question equals
1/P_i(T_{A^c} < T_i^+) = w_i r(i, A^c) = d r(i, A^c)   (6.22)
where r(i, Ac ) is the effective resistance in G between i and Ac . Clearly this
effective resistance is at most the distance (j + 1, in the argument above)
from i to Ac , which by (6.21) is at most 3|A|/d +1. Thus the quantity (6.22)
is at most 3|A| + d, establishing the desired result in the case d ≤ 2|A|. If d > 2|A| then there are at least d − |A| edges from i to A^c, so r(i, A^c) ≤ 1/(d − |A|) and the quantity (6.22) is at most d/(d − |A|) ≤ 2 ≤ 5|A|.
For part (iii), fix a state i and an integer time t. Write Ni (t) for the
number of visits to i before time t, i.e. during times {0, 1, . . . , t − 1}. Then
t/n = Eπ N_i(t) ≤ Pπ(T_i < t) E_i N_i(t)   (6.23)
the inequality by conditioning on T_i. Now choose real s such that ns ≥ t. Since Σ_j E_i N_j(t) = t, the set
A ≡ {j : E_i N_j(t) > s}
The bound becomes 1 for t_0 = (25n²/K²) log² n. So
EC^{[K]} = Σ_{t=1}^∞ P(C^{[K]} ≥ t)
≤ ⌈t_0⌉ + Σ_{t=⌈t_0⌉+1}^{n²−1} n exp(−K t^{1/2}/(5n)) + Σ_{t=n²}^∞ P(C^{[K]} ≥ t)
= ⌈t_0⌉ + S_1 + S_2, say,
and the issue is to show that S1 and S2 are o(t0 ). To handle S2 , split the set
of K walks into subsets of sizes K − 1 and 1. By independence, for t ≥ n2
we have P (C [K] ≥ t) ≤ P (C [K−1] ≥ n2 )P (C [1] ≥ t). Then
P_i(X_t = j) ≤ 5 t^{−1/2},   t ≤ n²
P_i(X_t = j) ≤ 1/n + (K_1/n) exp(−t/(K_2 n²)),   t ≥ n²
where K_1 and K_2 are absolute constants.
In discrete time one can get essentially the same result, but with the bounds
multiplied by 2, though we shall not give details (see Notes).
Proof. P_i(X_t = i) is decreasing in t, so
P_i(X_t = i) ≤ t^{−1} ∫_0^t P_i(X_s = i) ds = t^{−1} E_i N_i(t) ≤ 5 t^{−1/2}
where the last inequality is Proposition 6.16 (iii), whose proof is unchanged
in continuous time, and which holds for t ≤ n2 . This gives the first inequality
when i = j, and the general case follows from Chapter 3 yyy.
For the second inequality, recall the definition of separation s(t) from
Chapter 4 yyy. Given a vertex i and a time t, there exists a probability
distribution θ such that
Then for u ≥ 0,
P_i(X_{t+u} = j) − 1/n = s(t) (P_θ(X_u = j) − 1/n).
Thus, defining q(t) = max_{i,j} |P_i(X_t = j) − 1/n|, we have proved
q(n² + m τ_1^{(1)}) ≤ (4/n) e^{−m},   m ≥ 1.   (6.27)
n
But by Chapter 4 yyy we have τ_1^{(1)} ≤ K τ∗ for an absolute constant K, and then by Corollary 6.9 we have τ_1^{(1)} ≤ 3K n². The desired inequality now follows from (6.27).
Proof. The proof relies on Proposition 6.18, whose conclusion implies there
exists a constant K such that
p∗(t) ≡ max_{x,v} p_{vx}(t) ≤ 1/n + K t^{−1/2};   0 ≤ t < ∞.
Consider running the process forever. The point is that, regardless of the
initial positions, the chance that the cat and mouse are “together” (i.e. at
the same vertex) at time u is at most p∗ (u). So in the case where the cat
starts with the (uniform) stationary distribution,
P(together at time s) = ∫_0^s f(u) P(together at time s | M = u) du
(where f is the density function of M)
≤ ∫_0^s f(u) p∗(s − u) du
≤ (1/n) P(M ≤ s) + K ∫_0^s f(u) (s − u)^{−1/2} du.
6.5. HITTING TIME BOUNDS AND CONNECTIVITY 223
So
t/n = ∫_0^t P(together at time s) ds   by stationarity
≤ (1/n) ∫_0^t P(M ≤ s) ds + K ∫_0^t f(u) du ∫_u^t (s − u)^{−1/2} ds
= t/n − (1/n) ∫_0^t P(M > s) ds + 2K ∫_0^t f(u) (t − u)^{1/2} du
≤ t/n − (1/n) E min(M, t) + 2K t^{1/2}.
Rearranging, E min(M, t) ≤ 2Knt1/2 . Writing t0 = (4Kn)2 , Markov’s in-
equality gives P (M ≤ t0 ) ≥ 1/2. This inequality assumes the cat starts
with the stationary distribution. When it starts at some arbitrary vertex,
we may use the definition of separation s(u) (recall Chapter 4 yyy) to see
P(M ≤ u + t_0) ≥ (1 − s(u))/2. Then by iteration, EM ≤ 2(u + t_0)/(1 − s(u)). So
appealing to the definition of τ_1^{(1)},
m∗ ≤ (2/(1 − e^{−1})) (t_0 + τ_1^{(1)}).
But results from Chapter 4 and this chapter show τ_1^{(1)} = O(τ∗) = O(n²), establishing the Proposition.
τ ∗ = O(n). (6.28)
6.5.1 Edge-connectivity
At the other end of the spectrum from expanders, we can consider graphs
satisfying only a little more than connectivity.
xxx more details in proofs – see Fill’s comments.
Recall that a graph is r-edge-connected if for each proper subset A of
vertices there are at least r edges linking A with Ac . By a variant of Menger’s
theorem (e.g. [86] Theorem 5.11), for each pair (a, b) of vertices in such a
graph, there exist r paths (a = v_0^i, v_1^i, v_2^i, . . . , v_{m_i}^i = b), i = 1, . . . , r, for which the edges (v_j^i, v_{j+1}^i) are all distinct.
Proposition 6.22 For an r-edge-connected graph,
τ∗ ≤ d̄ n² ψ(r)/r²
where ψ is defined by
ψ(i(i + 1)/2) = i;   ψ(·) is linear on [i(i + 1)/2, (i + 1)(i + 2)/2].
Note ψ(r) ∼ (2r)^{1/2}. So for a d-regular, d-edge-connected graph, the bound
becomes ∼ 21/2 d−1/2 n2 for large d, improving on the bound from Corollary
6.9. Also, the Proposition improves on the bound implied by Chapter 4 yyy
in this setting.
Proof. Given vertices a, b, construct a unit flow from a to b by putting flow 1/r along each of the r paths (a = v_0^i, v_1^i, v_2^i, . . . , v_{m_i}^i = b). By Chapter 3 Theorem yyy
E_a T_b + E_b T_a ≤ d̄ n (1/r)² M
where M = Σ_i m_i is the total number of edges in the r paths. So the issue is bounding M. Consider the digraph of all edges (v_j^i, v_{j+1}^i). If this
Proof.
xxx give proof and picture.
Example 6.24 Take vertices {0, 1, . . . , n − 1} and edges (i, i + u mod n) for
all i and all 1 ≤ u ≤ κ.
τ∗ = 2 E_0 T_{⌊n/2⌋} ∼ (n/2)²/σ² = Θ(n²/κ²).
This is the bound Proposition 6.22 would give if the graph were Θ(κ2 )-
edge-connected. And for a “typical” subset A such as an interval of length
greater than κ there are indeed Ω(κ2 ) edges crossing the boundary of A. But
by considering a singleton A we see that the graph is really only 2κ-edge-connected, and Proposition 6.22 gives only the weaker O(n²/κ^{1/2}) bound.
xxx tie up with similar discussion of τ2 and connectivity being affected
by small sets; better than bound using τc only.
For the chain started at i write C ++ = Γ(C + ) and C +++ = Γ(C ++ ). Since
Tj < C + we have Γ(Tj ) ≤ C ++ . So the chain started at time Tj has covered
all states and returned to j by time C +++ , implying Ej C + ≤ EC +++ =
3Ei C + . For inequality (6.29), recall the random target lemma: the mean
time to hit a π-random state V equals τ0 , regardless of the initial distribu-
tion. The inequality
Ei C + ≤ τ0 + Eπ C + τ0 + Eπ Ti
In Chapter 2 we proved the lower bound in the case where A was the entire
state space, but the result for general A follows by the same proof, taking
the J’s to be a uniform random ordering of the states in A. One obvious
motivation for the more general formulation comes from the case of trees,
where for a leaf l we have minj El Tj = 1, so the lower bound with A being
the entire state space would be just hn−1 . We now illustrate use of the more
general formulation.
Lemma 6.28 The effective resistance r(v, x) between vertices v and x in a weighted graph satisfies
1/r(v, x) ≤ w_{v,x} + 1/( 1/(w_v − w_{v,x}) + 1/(w_x − w_{v,x}) ).
In particular, for an unweighted graph,
r(v, x) ≥ (d_v + d_x − 2)/(d_v d_x − 1)   if (v, x) is an edge
r(v, x) ≥ 1/d_v + 1/d_x   if not.
6.6. LOWER BOUNDS 229
E_v T_x + E_x T_v ≥ 2dn/(d + 1)   if (v, x) is an edge
E_v T_x + E_x T_v ≥ 2n   if not.
Proof. We need only prove the first assertion, since the others follow by
specialization and by the commute interpretation of resistance. Let A be
the set of vertices which are neighbors of either v or x, but exclude v and x
themselves from A. Short the vertices of A together, to form a single vertex
a. In the shorted graph, the only way current can flow from v to x is directly
v → x or indirectly as v → a → x. So, using 0 to denote the shorted graph,
the effective resistance r0 (v, x) in the shorted graph satisfies
1/r′(v, x) = w′_{v,x} + 1/( 1/w′_{v,a} + 1/w′_{x,a} ).
Now w′_{x,v} = w_{x,v}, w′_{v,a} = w_v − w_{v,x} and w′_{x,a} = w_x − w_{v,x}. Since shorting decreases resistance, r′(v, x) ≤ r(v, x), establishing the first inequality.
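A numerical illustration of Lemma 6.28 (our own snippet): compute exact effective resistances via the Laplacian pseudoinverse and check the unweighted bounds on K5 minus one edge. The non-edge case happens to be an equality here, since by symmetry no current flows between the intermediate vertices:

```python
import numpy as np

def effective_resistance(A, v, x):
    """r(v, x) from the Laplacian pseudoinverse: r = L+_vv + L+_xx - 2 L+_vx."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    return Lp[v, v] + Lp[x, x] - 2 * Lp[v, x]

# K5 minus the single edge (3, 4)
n = 5
A = np.ones((n, n)) - np.eye(n)
A[3, 4] = A[4, 3] = 0
deg = A.sum(axis=1)

# edge (0,1): lemma gives r >= (d_v + d_x - 2)/(d_v d_x - 1)
r_edge = effective_resistance(A, 0, 1)
assert r_edge >= (deg[0] + deg[1] - 2) / (deg[0] * deg[1] - 1) - 1e-9

# non-edge (3,4): r >= 1/d_v + 1/d_x, with equality for this graph
r_non = effective_resistance(A, 3, 4)
assert abs(r_non - (1 / deg[3] + 1 / deg[4])) < 1e-9
```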
Example 6.29 Take the complete graph on n vertices, and add an edge
(v, l) to a new leaf l.
Since random walk on the complete graph has mean cover time (n − 1)hn−1 ,
random walk on the enlarged graph has
El C = 1 + (n − 1)hn−1 + 2µ
where µ is the mean number of returns to l before covering. Now after each
visit to v, the walk has chance 1/n to visit l on the next step, and so the
mean number of visits to l before visiting some other vertex of the complete
graph equals 1/(n − 1). We may therefore write µ in terms of expectations
for random walk on the complete graph as
µ = (1/(n − 1)) E_v(number of visits to v before C)
= (1/(n − 1)) E_v(number of visits to v before C^+)
= (1/(n − 1)) (1/n) E_v C^+   by Chapter 2 Proposition yyy
= (1 + h_{n−1})/n   by (6.12).
Open Problem 6.30 Prove that, for any reversible chain on n states,
Eπ C ≥ (n − 1)hn−1
The related asymptotic question was open for many years, and was finally
proved by Feige [142].
min Ev C ≥ cn ,
v
where cn ∼ n log n as n → ∞.
there is non-zero chance that the induced walk on every G covers before
time Kt_0. The crude bound |G_{n,d}| ≤ (nd)^{nd} means we may take K = ⌈nd log(nd)⌉.
6.9. NOTES ON CHAPTER 6 233
It’s clear one can compute mean hitting times on a n-step chain in polynomial
time, but to set up the computation of Ei C as a hitting-time problem one has
to incorporate the subset of already-visited states into the “current state”,
and thus work with hitting times for a n × 2n−1 -state chain.
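For small graphs this augmented-chain computation is easy to write down (a sketch of ours; the state space is exponential in n, as just noted). On the 5-cycle it reproduces E_0 C = n(n − 1)/2 = 10 from section 6.2:

```python
import numpy as np

def exact_mean_cover(adj, start):
    """E_start C computed exactly by making (current vertex, visited set)
    the state of an enlarged chain and solving the hitting-time equations."""
    n = len(adj)
    full = (1 << n) - 1
    states = [(v, S) for v in range(n) for S in range(1 << n)
              if (S >> v) & 1 and S != full]
    idx = {s: k for k, s in enumerate(states)}
    A = np.eye(len(states))
    b = np.ones(len(states))
    for (v, S), k in idx.items():
        for w in adj[v]:
            S2 = S | (1 << w)
            if S2 != full:                   # absorbed once everything is seen
                A[k, idx[(w, S2)]] -= 1 / len(adj[v])
    h = np.linalg.solve(A, b)
    return h[idx[(start, 1 << start)]]

n = 5
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
assert abs(exact_mean_cover(cycle, 0) - n * (n - 1) / 2) < 1e-9
```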
[341, 343], Palacios [276, 273] and the Ph.D. thesis of Sbihi [306], as well as
papers cited elsewhere.
Section 6.1. The conference proceedings paper [25] proving Theorem 6.1
was not widely known, or at least its implications not realized, for some
years. Several papers subsequently appeared proving results which are con-
sequences (either obvious, or via the general relations of Chapter 4) of The-
orem 6.1. I will spare their authors embarrassment by not listing them all
here!
The spanning tree argument shows, writing be for the mean commute
time across an edge e, that
max_v E_v C^+ ≤ min_T Σ_{e∈T} b_e.
Coppersmith et al [100] give a deeper study and show that the right side is
bounded between γ and 10γ/3, where
γ = (Σ_v d_v) (Σ_v 1/(d_v + 1)).
The upper bound is obtained by considering a random spanning tree, cf.
Chapter yyy.
Section 6.2. The calculations in these examples, and the uniformity
property of V on the n-cycle, are essentially classical. For the cover time
C_n on the n-cycle there is a non-degenerate limit distribution n^{−2} C_n →_d C. From the viewpoint of weak convergence (Chapter yyy), C is just the
cover time for Brownian motion on the circle of unit circumference, and its
distribution is known as part of a large family of known distributions for
maximal-like statistics of Brownian motion: Imhof [187] eq. (2.4) gives the
density as
f_C(t) = 2^{3/2} π^{−1/2} t^{−3/2} Σ_{m=1}^∞ (−1)^{m−1} m² exp(−m²/(2t)).
I believe that the recursion set-up in [16] can be used to prove Open
Problem 6.35 on trees, but I haven’t thought carefully about it.
The “shorting” lower bound, Lemma 6.28, was apparently first exploited
by Coppersmith et al [100].
Section 6.7. Corollary 6.32 encompasses a number of exponential limit
results proved in the literature by ad hoc calculations in particular examples.
Section 6.8.1. Proposition 6.34 is one of the neatest instances of “Erdős's Probabilistic Method in Combinatorics”, though surprisingly it isn't in the
recent book [29] on that subject. Constructing explicit universal traversal
sequences is a hard open problem: see Borodin et al [56] for a survey.
Section 6.8.2. See [67] for a more careful discussion of the issues. The
alert reader of our example will have noticed the subtle implication that the
reader has written fewer papers than Paul Erdős, otherwise (why?) it would
be preferable to do the random walk in the other direction.
Miscellaneous. Condon and Hernek [98] study cover times in the follow-
ing setting. The edges of a graph are colored, a sequence (ct ) of colors is
prespecified and the “random walk” at step t picks an edge uniformly at
random from the color-ct edges at the current vertex.
Chapter 7
238CHAPTER 7. SYMMETRIC GRAPHS AND CHAINS (JANUARY 31, 1994)
The set Γ of symmetries forms a group under convolution, and in our (non-
standard) terminology a symmetric Markov transition matrix is one for
which Γ acts transitively, i.e.
p_{ij} = µ(i^{−1} ∗ j)
g ∈ G implies g^{−1} ∈ G
So by Chapter 4 yyy
τ∗ ≤ 4 τ_0.   (7.9)
The formula for E_π T_i in terms of the fundamental matrix (Chapter 2 yyy) can be written as
τ_0/n = Σ_{t=0}^∞ (P_i(X_t = i) − 1/n).   (7.10)
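For intuition, a numerical check of our own, on K4, reading the fundamental-matrix formula as τ_0/n = Σ_{t≥0}(P_i(X_t = i) − 1/n). Here P_t(0,0) = 1/4 + (3/4)(−1/3)^t, and both sides equal 9/16:

```python
import numpy as np

n = 4
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # random walk on K4

# tau0 = E_pi T_i (same for every i by symmetry); hitting times of state 0
Q = P[1:, 1:]
h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
tau0 = np.concatenate(([0.0], h)).mean()      # average over uniform pi

# truncated sum of P_i(X_t = i) - 1/n; the terms decay like (1/3)^t
Pt, s = np.eye(n), 0.0
for _ in range(200):
    s += Pt[0, 0] - 1 / n
    Pt = Pt @ P

assert np.isclose(tau0, 9 / 4)      # E_j T_0 = 3 for each j != 0
assert np.isclose(tau0 / n, s)      # both sides of the identity: 9/16
```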
Approximating τ0 by the first few terms is what we call the local transience
heuristic. See Chapter xxx for rigorous discussion.
Lemma 7.2 (i) E_i T_j ≥ n/(1 + p(i, j)), j ≠ i.
(ii) max_{i,j} E_i T_j ≤ 2 τ_0.
Proof. (i) This is a specialization of Chapter 6 xxx.
(ii) For any i, j, k,
E_i T_j ≤ E_i T_k + E_k T_j = E_i T_k + E_j T_k.
Note that, because τ2 ≤ τ1 + 1 and τ0 ≥ (n − 1)²/n, the hypothesis “τ2/τ0 → 0” is weaker than either “τ2/n → 0” or “τ1/τ0 → 0”.
Part (a) is a specialization of Chapter 3 Proposition yyy and its proof.
Parts (b) and (c) use refinements of the same technique. Part (b) implies
Because this applies in many settings in this Chapter, we shall rarely need
to discuss τ ∗ further.
xxx give proof
In connection with (b), note that

E_v T_w ≤ τ_1^{(2)} + τ_0  (7.11)

by definition of τ_1^{(2)} and vertex-transitivity. So (b) is obvious under the slightly stronger hypothesis τ_1/τ_0 → 0.
Chapter 3 Proposition yyy actually gives information on hitting times
T_A to more general subsets A of vertices. Because (Chapter 3 yyy) E_π T_A ≥ (1 − π(A))²/π(A), we get (in continuous time) a quantification of the fact that T_A has approximately exponential distribution when |A| ≪ n/τ_2 and when the chain starts with the uniform distribution:

sup_t |P_π(T_A > t) − exp(−t/E_π T_A)| ≤ (τ_2 |A|/n)(1 − |A|/n)^{−2}.
EC ≥ (β − o(1))τ0 log n.
Proof. Using the basic form of Matthews method (Chapter 2 yyy), (a)
follows from Lemma 7.2 and (b) from Theorem 7.4. To prove (c), fix a state
j and ε > 0. Using (7.11) and Markov’s inequality,
π{i : E_i T_j ≤ (1 − ε)τ_0} ≤ τ_1^{(2)}/(ετ_0) ≡ α, say.
(x, y) → (x′, y): chance (m_1 − 1)^{−1} (1 − 1/(a m_1 log m_1)), x′ ≠ x
(x, y) → (x, y′): chance (m_2 − 1)^{−1} · 1/(a m_1 log m_1), y′ ≠ y
Then (c.f. Chapter 4 section yyy) we can define a “product chain” with state-space I^{(1)} × … × I^{(d)} and transition probabilities

(x_1, …, x_d) → (x_1, …, x′_i, …, x_d): probability a_i P(X_1^{(i)} = x′_i | X_0^{(i)} = x_i).
7.1. SYMMETRIC REVERSIBLE CHAINS 247
This product chain is also symmetric reversible. But if the underlying chains
have extra symmetry properties, these extra properties are typically lost
when one passes to the product chain. Thus we have a general method of
constructing symmetric reversible chains which lack extra structure. Ex-
ample 7.14 below gives a case with distinct underlying components, and
Example 7.11 gives a case with a non-uniform product. In general, writing
(λ_u^{(i)} : 1 ≤ u ≤ |I^{(i)}|) for the continuous-time eigenvalues of X^{(i)}, we have (Chapter 4 yyy) that the continuous-time eigenvalues of the product chain are

λ_u = a_1 λ_{u_1}^{(1)} + … + a_d λ_{u_d}^{(d)}.

In continuous time we still get the product form for the distribution at time t:
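The eigenvalue formula can be illustrated with the smallest possible example: two 2-state chains in continuous time, whose nonzero eigenvalue and eigenvector are explicit. A minimal sketch (rates and weights a_i chosen arbitrarily):

```python
# Two continuous-time 2-state chains with generators
# Q = [[-q01, q01], [q10, -q10]]; the nonzero eigenvalue is -(q01 + q10),
# with eigenvector (q01, -q10).
def gen(q01, q10):
    return [[-q01, q01], [q10, -q10]]

Q1, Q2 = gen(0.7, 0.3), gen(0.2, 0.5)
lam1, lam2 = 0.7 + 0.3, 0.2 + 0.5
v1, v2 = (0.7, -0.3), (0.2, -0.5)
a1, a2 = 0.6, 0.4  # weights of the product chain

# Product-chain generator on I1 x I2: component i moves at rate a_i.
states = [(x1, x2) for x1 in range(2) for x2 in range(2)]
def Q(s, t):
    (x1, x2), (y1, y2) = s, t
    if x2 == y2 and x1 != y1:
        return a1 * Q1[x1][y1]
    if x1 == y1 and x2 != y2:
        return a2 * Q2[x2][y2]
    if s == t:
        return a1 * Q1[x1][x1] + a2 * Q2[x2][x2]
    return 0.0

# f(x1,x2) = v1[x1] v2[x2] is an eigenfunction with eigenvalue
# -(a1*lam1 + a2*lam2) = -0.88.
f = {s: v1[s[0]] * v2[s[1]] for s in states}
for s in states:
    Qf = sum(Q(s, t) * f[t] for t in states)
    assert abs(Qf + (a1 * lam1 + a2 * lam2) * f[s]) < 1e-12
```

Products of eigenfunctions of the components are eigenfunctions of the product chain, which is exactly why the eigenvalues add with weights a_i.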
We call this the cutoff phenomenon, and when a sequence of chains satisfies (7.20) we say the sequence has “variation cutoff at c_n”. As mentioned at xxx, the general theory of Chapter 4 works smoothly using d̄(t), but in examples it is more natural to use d(t), which we shall do in this chapter. Clearly, (7.20) implies the same result for d̄ and implies τ_1 ∼ c_n. Also, our convention in this chapter is to work in discrete time, whereas the Chapter 4 general theory worked more smoothly in continuous time. (Clearly (7.20) in discrete time implies the same result for the continuized chains, provided c_n → ∞). Note that, in the context of symmetric reversible chains,
We also can discuss separation distance (Chapter 4 yyy) which in this context is

s(t) = 1 − n min_j P_i(X_t = j), for each i,
Proof. The lower bounds are specializations of Lemma 7.2(i), i.e. of Chapter
6 xxx. For the upper bound in (ii),
n − 1 = (1/d) Σ_{y∼x} E_y T_x  (7.21)
  ≥ (1/d) [E_v T_x + (d − 1) · dn/(d + 1)]  by the lower bound in (ii).

Rearrange.
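The averaging identity (7.21) — equivalently E_x T_x^+ = n = 1 + (1/d) Σ_{y∼x} E_y T_x — is easy to confirm numerically. The sketch below does so on the 3-cube, a convenient vertex-transitive example, solving the hitting-time equations by Gaussian elimination:

```python
# Check n - 1 = (1/d) * sum over neighbors y of x of E_y T_x on the
# 3-cube (n = 8, d = 3): solve (I - P restricted) h = 1, h(target) = 0.
n, d = 8, 3
nbrs = {v: [v ^ (1 << b) for b in range(d)] for v in range(n)}

target = 0
others = [v for v in range(n) if v != target]
idx = {v: r for r, v in enumerate(others)}

A = [[0.0] * len(others) for _ in others]
b = [1.0] * len(others)
for v in others:
    A[idx[v]][idx[v]] = 1.0
    for w in nbrs[v]:
        if w != target:
            A[idx[v]][idx[w]] -= 1.0 / d

# Gaussian elimination with partial pivoting, then back-substitution.
m = len(others)
for col in range(m):
    piv = max(range(col, m), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, m):
        fac = A[r][col] / A[col][col]
        for cc in range(col, m):
            A[r][cc] -= fac * A[col][cc]
        b[r] -= fac * b[col]
h = [0.0] * m
for r in range(m - 1, -1, -1):
    h[r] = (b[r] - sum(A[r][cc] * h[cc] for cc in range(r + 1, m))) / A[r][r]

avg = sum(h[idx[w]] for w in nbrs[target]) / d
assert abs(avg - (n - 1)) < 1e-9   # 7, as (7.21) predicts
```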
xxx mention general lower bound τ0 ≥ (1−o(1))nd/(d−2) via tree-cover.
It is known (xxx ref) that a Cayley graph of degree d is d-edge-connected,
and so Chapter 6 Proposition yyy gives
τ* ≤ n² ψ(d)/d

where ψ(d)/d ≈ √(2/d).
250CHAPTER 7. SYMMETRIC GRAPHS AND CHAINS (JANUARY 31, 1994)
Example 7.14 A Cayley graph where Ev Tw is not the same for all edges
(v, w).
Consider Zm ×Z2 with generators (1, 0), (−1, 0), (0, 1). The figure illustrates
the case m = 4.
[Figure: the case m = 4, with vertices 00, 10, 20, 30 and 01, 11, 21, 31, and edges (i, j)–(i ± 1, j) and (i, j)–(i, 1 − j).]
Let’s calculate E_{00}T_{01} using the resistance interpretation. Put unit voltage at 01 and zero voltage at 00, and let a_i be the voltage at i0. By symmetry the voltage at i1 is 1 − a_i, so we get the equations
a_i = (1/3)(a_{i−1} + a_{i+1} + (1 − a_i)),  1 ≤ i ≤ m − 1
with a0 = am = 0. But this is just a linear difference equation, and a brief
calculation gives the solution
a_i = 1/2 − (1/2) · (θ^{m/2−i} + θ^{i−m/2})/(θ^{m/2} + θ^{−m/2})

where θ = 2 − √3. The current flow is 1 + 2a_1, so the effective resistance is
r = (1 + 2a1 )−1 . The commute interpretation of resistance gives 2E00 T01 =
3nr, and so
E_{00}T_{01} = 3n/(2(1 + 2a_1))

where n = 2m is the number of vertices. In particular,

n^{−1} E_{00}T_{01} → γ ≡ 3/(1 + √3)  as n → ∞.
Using the averaging property (7.21)
n^{−1} E_{00}T_{10} → γ′ ≡ 3√3/(2(1 + √3))  as n → ∞.
Turning from hitting times to mixing times, recall the Cheeger constant
τ_c ≡ sup_A c(A)

c(A) ≡ π(A^c) / P_π(X_1 ∈ A^c | X_0 ∈ A).
For random walk on a Cayley graph one can use simple “averaging” ideas
to bound c(A). This is Proposition 7.15 below. The result in fact extends
to vertex-transitive graphs by a covering graph argument - see xxx.
Consider an n-vertex Cayley graph with degree d and generators G = {g_1, …, g_d}, where g ∈ G implies g^{−1} ∈ G. Then

P_π(X_1 ∈ A^c | X_0 ∈ A) = (1/d) Σ_{g∈G} |Ag \ A| / |A|

where Ag = {ag : a ∈ A}. Lower bounding the sum by its maximal term, we get

c(A) ≤ d |A| |A^c| / (n max_{g∈G} |Ag \ A|).  (7.22)
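Both sides of (7.22) are exactly computable on small examples. The sketch below uses the 12-cycle as the Cayley graph of Z_12 with generators {+1, −1}, and A an arc of 6 vertices (choices made for illustration only):

```python
# Cayley graph of Z_12 with generators {+1, -1} (the 12-cycle);
# take A = {0,...,5}, an arc of 6 vertices.
n, d = 12, 2
G = [1, -1]
A = set(range(6))
Ac = set(range(n)) - A

def shift(S, g):
    return {(a + g) % n for a in S}

# P_pi(X_1 in A^c | X_0 in A) = (1/d) * sum over g of |Ag \ A| / |A|
escape = sum(len(shift(A, g) - A) for g in G) / (d * len(A))
cA = (len(Ac) / n) / escape                    # c(A) = 3 here
biggest = max(len(shift(A, g) - A) for g in G)
bound = d * len(A) * len(Ac) / (n * biggest)   # (7.22) gives 6 here
assert abs(cA - 3.0) < 1e-9 and abs(bound - 6.0) < 1e-9
assert cA <= bound + 1e-9
```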
where ρ(A) := min_v max_{w∈A} d(w, v) is the radius of A.
Note that supA ρ(A) is bounded by ∆ but not in general by ∆/2 (consider
the cycle), so that (ii) implies (i) with an extra factor of 2. Part (i) is from
Aldous [10] and (ii) is from Babai [36].
Proof. (i) Fix A. Because

(1/n) Σ_{v∈V} |A ∩ Av| = |A|²/n,

there exists g ∈ G with |Ag \ A| ≥ (1/∆) × |A||A^c|/n, and so (i) follows from
(7.22). For part (ii), fix A with |A| ≤ n/2, write ρ = ρ(A) and suppose
max_{g∈G} |Ag \ A| < (1/(4ρ)) |A|.  (7.24)
Fix v with max_{w∈A} d(w, v) = ρ. Since |Ag \ A| < (1/(4ρ))|A| and
we have by induction

|A \ Ax| < (1/(4ρ)) |A| d(x, v).  (7.25)

|Ax \ A| < (1/2)|A| ≤ |A||A^c|/n.
But this contradicts (7.23). So (7.24) is false, i.e.
max_{g∈G} |Ag \ A| ≥ (1/(4ρ)) |A| ≥ (1/(2ρ)) · |A||A^c|/n.
By complementation the final inequality remains true when |A| > n/2, and
the result follows from (7.22).
x = g_1 g_2 … g_d;  g_i ∈ G.

For each x choose some minimal-length word as above and define N(g, x) to be the number of occurrences of g in the word. Now consider a different reversible random flight on I with some step-distribution µ̃, not necessarily supported on G. If we know τ̃_2, the next result allows us to bound τ_2.
Theorem 7.16

τ_2/τ̃_2 ≤ K ≡ max_{g∈G} (1/µ(g)) Σ_{x∈I} d(x, id) N(g, x) µ̃(x).
τ_2 ≤ ∆² / min_{g∈G} µ(g),
When µ is uniform on G and |G| = d, the Corollary gives the bound d∆², which improves on the bound 8d²∆² which follows from Proposition 7.15 and Cheeger’s inequality (Chapter 4 yyy). The examples of the torus Z_N^d show that ∆² enters naturally, but one could hope for the following variation.
Open Problem 7.18 Write τ∗ = τ∗ (I, G) for the minimum of τ2 over all
symmetric random flights on I with step-distribution supported on G. Is it
true that τ∗ = O(∆2 )?
7.2 Arc-transitivity
Example 7.14 shows that random walk on a Cayley graph does not nec-
essarily have the property that Ev Tw is the same for all edges (v, w). It
is natural to consider some stronger symmetry condition which does im-
ply this property. Call a graph arc-transitive if for each 4-tuple of vertices
(v1 , w1 , v2 , w2 ) such that (v1 , w1 ) and (v2 , w2 ) are edges, there exists an au-
tomorphism γ such that γ(v1 ) = w1 , γ(v2 ) = w2 . Arc-transitivity is stronger
than vertex-transitivity, and immediately implies that Ev Tw is constant over
edges (v, w).
Proof. (i) follows from E_v T_v^+ = n. For (ii), write N(w) for the set of neighbors of w. Then

E_v T_w = E_v T_{N(w)} + (n − 1)
(C − τ_0 log n)/τ_0 →_d η  (7.26)
Note that the lower bound (n − 1)hn−1 is attained on the complete graph.
It is not known whether this exact lower bound remains true for vertex-
transitive graphs, but this would be a consequence of Chapter 6 Open Prob-
lem yyy. Note also that by xxx the hypothesis τ0 /n → 1 can only hold if
the degrees tend to infinity.
Corollary 7.20 provides easily-checkable conditions for the distributional
limit for cover times, in examples with ample symmetry, such as the card-
shuffling examples in the next section. Note that
(7.26) and τ_0 = n(1 + (b + o(1))/log n) imply (C − n log n − bn)/n →_d η.
Thus on the d-cube (Chapter 5 yyy) τ_0 = n(1 + (1 + o(1))/d) = n(1 + (log 2 + o(1))/log n) and so

(C − n log n − n log 2)/n →_d η.
The model is

Choose two cards independently and uniformly at random, and interchange them.
With chance 1/m the same card is chosen twice, so the “interchange” has
no effect. This model was studied by Diaconis and Shahshahani [122], and
more concisely in the book Diaconis [112] Chapter 3D. The chain Yt has
transition probabilities
The model is
With probability 1/(m + 1) do nothing. Otherwise, choose one
pair of adjacent cards (counting the top and bottom cards as ad-
jacent), with probability 1/(m+1) for each pair, and interchange
them.
The chain Yt has transition probabilities
i → i + 1  with probability 1/(m + 1)
i → i − 1  with probability 1/(m + 1)
i → i    with probability (m − 1)/(m + 1)

with i ± 1 counted modulo m. This chain is (in continuous time) just a time-change of random walk on the m-cycle, so has relaxation time

a(m) ≡ ((m + 1)/2) · 1/(1 − cos(2π/m)) ∼ m³/(4π²).
So by the contraction principle xxx the card-shuffling process has τ2 ≥ a(m),
and (xxx unpublished Diaconis work) in fact
τ_2 = a(m) ∼ m³/(4π²).
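The relaxation-time claim is easy to confirm: the chain Y_t is circulant, so the complex exponentials are eigenvectors. A sketch (m = 10 chosen arbitrarily):

```python
import cmath, math

# Y_t on Z_m: move +1 or -1 with prob 1/(m+1) each, hold with prob
# (m-1)/(m+1). Circulant, so v_j = exp(2*pi*i*j/m) is an eigenvector.
m = 10
p = 1.0 / (m + 1)
P = [[0.0] * m for _ in range(m)]
for i in range(m):
    P[i][(i + 1) % m] = p
    P[i][(i - 1) % m] = p
    P[i][i] = (m - 1.0) / (m + 1)

w = cmath.exp(2j * math.pi / m)
v = [w ** j for j in range(m)]
lam = (m - 1.0) / (m + 1) + 2 * p * math.cos(2 * math.pi / m)
for i in range(m):
    Pv = sum(P[i][j] * v[j] for j in range(m))
    assert abs(Pv - lam * v[i]) < 1e-12

# Relaxation time 1/(1 - lam) equals a(m) exactly.
relax = 1.0 / (1.0 - lam)
am = (m + 1) / 2.0 / (1.0 - math.cos(2 * math.pi / m))
assert abs(relax - am) < 1e-9
```

The asymptotic a(m) ∼ m³/(4π²) then follows from 1 − cos(2π/m) ∼ 2π²/m².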
A coupling argument which we shall present in Chapter xxx gives an upper
bound τ1 = O(m3 log m) and (xxx unpublished Diaconis work) in fact
τ1 = Θ(m3 log m).
The local transience heuristic (7.10) again suggests
τ_0 = m!(1 + 1/m + O(1/m²))
but this has not been studied rigorously.
Many variants of these examples have been studied, and we will mention
a generalization of Examples 7.21 and 7.22 in Chapter xxx. Here is another
example, from Diaconis and Saloff-Coste [117], which illustrates the use of
comparison arguments.
EC ∼ R_d n log n.

Example 7.22 satisfies (i) and (ii) but is not a random flight with steps uniform on a conjugacy class.
Thus if the graph has the property that there exists a unique vertex 0* at distance ∆ from 0, then we can pull back to the graph to get

τ*/2 = max_{x≠v} E_x T_v = E_0 T_{0*} = (1/2) Σ_{i=1}^∆ 1/w_i.  (7.35)
If the graph lacks that property, we can use (7.31) to calculate h(∆).
The general identities of Chapter 3 yyy can now be used to give formulas
for quantities such as Px (Ty < Tz ) or Ex (number of visits to y before Tz ).
7.3.2 Examples
Many treatments of random walk on sporadic examples such as regular
polyhedra have been given, e.g. [227, 228, 275, 319, 320, 330, 331], so I shall
not repeat them here. Of infinite families, the complete graph was discussed
in Chapter 5 yyy, and the complete bipartite graph is very similar. The
d-cube also was treated in Chapter 5. Closely related to the d-cube is a
model arising in several contexts under different names,
Example 7.28 c-subsets of a d-set.
The model has parameters (c, d), where 1 ≤ c ≤ d − 1. Formally, we have random walk on the distance-transitive graph whose vertices are the d!/(c!(d−c)!) c-element subsets A ⊂ {1, 2, …, d}, and where (A, A′) is an edge iff |A △ A′| = 2. More vividly, d balls {1, 2, …, d} are distributed between a left
urn and a right urn, with c balls in the left urn, and at each stage one ball is
picked at random from each urn, and the two picked balls are interchanged.
The induced birth-and-death chain is often called the Bernoulli-Laplace
diffusion model. The analysis is very similar to that of the d-cube. See
[123, 127] and [112] Chapter 3F for details on convergence to equilibrium
and [110] for hitting and cover times.
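A quick structural check of the model for the small case c = 2, d = 5 (chosen for illustration): the graph is regular of degree c(d − c), and the walk’s transition matrix is symmetric, so the uniform distribution is stationary.

```python
from itertools import combinations

# States: 2-element subsets of {0,...,4}; one step interchanges a ball
# between the urns, i.e. moves to A' with |A symmetric-difference A'| = 2.
c, d = 2, 5
states = [frozenset(s) for s in combinations(range(d), c)]
deg = c * (d - c)   # ways to pick one ball from each urn

P = {(A, B): (1.0 / deg if len(A ^ B) == 2 else 0.0)
     for A in states for B in states}

for A in states:
    assert abs(sum(P[A, B] for B in states) - 1.0) < 1e-12   # stochastic
    for B in states:
        assert P[A, B] == P[B, A]   # symmetric, so uniform is stationary
```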
Section 7.1.7. The factor of 2 difference between the variation and sep-
aration cutoffs which appears in Lemma 7.12 is the largest possible – see
Aldous and Diaconis [22].
Section 7.1.8. xxx walk-regular example – McKay paper.
Section 7.1.9. Diaconis and Saloff-Coste [117] give many other applica-
tions of Theorem 7.16. We mention some elsewhere; others include
xxx list.
Section 7.2. The name “arc-transitive” isn’t standard: Biggs [48] writes
“symmetric” and Brouwer et al [71] write “flag-transitive”. Arc-transitivity
is not necessary for the property “Ev Tw is constant over edges”. For in-
stance, a graph which is vertex-transitive and edge-transitive (in the sense
of undirected edges) has the property, but is not necessarily arc-transitive
[182]. Gobel and Jagers [168] observed that the property
(equivalently: the effective resistance across each edge is constant) holds for
arc-transitive graphs and for trees.
Section 7.2.2. Sbihi [306] and Zuckerman [343] noted that the subset ver-
sion of Matthews method could be applied to the d-torus to give Corollaries
7.24 and 7.25.
The related topic of the time taken by random walk on the infinite lattice Z^d to cover a ball centered at the origin has been studied independently – see Revesz [288] Chapter 22 and Lawler [221], who observed that similar
arguments could be applied to the d-torus, improving the lower bound in
Corollary 7.25. It is easy to see an informal argument suggesting that,
for random walk on the 2-torus, when nα vertices are unvisited the set of
unvisited vertices has some kind of fractal structure. No rigorous results are
known, but heuristics are given in Brummelhuis and Hilhorst [75].
Section 7.3.1. Deriving these exact formulas is scarcely more than un-
dergraduate mathematics, so I am amazed to see that research papers have
continued to be published in the 1980s and 1990s claiming various special
or general cases as new or noteworthy.
Section 7.3.5. In the setting of isotropic random flight (7.36) with step-
length distribution q, it is natural to ask what conditions on q and q′ imply that τ(q) ≥ τ(q′) for our parameters τ. For certain distributions on the d-
cube, detailed explicit calculations by Karlin et al [207] establish an ordering
of the entire eigenvalue sequences, which in particular implies this inequality
for τ2 and τ0 . Establishing results of this type for general Gelfand pairs seems
an interesting project.
Chapter 8

Advanced L2 Techniques for Bounding Mixing Times (May 19
= π_i^{−1} Σ_{m=2}^n exp(−2λ_m t) u_{im}².
Second, from (8.2) and Chapter 4 yyy:(14) we may also write the maximum
L2 distance appearing in (8.1) using
d̂(2t) ≤ π_*^{−1} e^{−2t/τ_2},  (8.5)

where τ_2 := λ_2^{−1} is the relaxation time and π_* := min_i π_i. Thus if

t ≥ τ_2 ((1/2) log(1/π_*) + c),

then

d(t) ≤ (1/2) √(d̂(2t)) ≤ (1/2) e^{−c},  (8.6)

which is small if c is large; in particular, (8.6) gives the upper bound in

τ_2 ≤ τ̂ ≤ τ_2 ((1/2) log(1/π_*) + 1),  (8.7)
can sometimes further decrease the time t required to guarantee that d̂(2t), and hence also d(t), is small.
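For a two-state chain everything in (8.5)–(8.6) is available in closed form, so the inequalities can be checked on a grid of times. A sketch (rates chosen arbitrarily):

```python
import math

# Two-state chain in continuous time, rates q01, q10.
q01, q10 = 0.7, 0.2
lam = q01 + q10            # spectral gap; tau_2 = 1/lam
pi = (q10 / lam, q01 / lam)
pistar = min(pi)

def p(t, i, j):            # p_ij(t) = pi_j + (delta_ij - pi_j) e^{-lam t}
    return pi[j] + ((1.0 if i == j else 0.0) - pi[j]) * math.exp(-lam * t)

for k in range(1, 60):
    t = 0.1 * k
    # dhat(2t) = max_i (p_ii(2t) - pi_i)/pi_i  (squared L2 distance)
    dhat = max((p(2 * t, i, i) - pi[i]) / pi[i] for i in (0, 1))
    # d(t) = worst-case total-variation distance from pi
    d = max(0.5 * sum(abs(p(t, i, j) - pi[j]) for j in (0, 1)) for i in (0, 1))
    assert dhat <= math.exp(-2 * lam * t) / pistar + 1e-12   # (8.5)
    assert d <= 0.5 * math.sqrt(dhat) + 1e-12                # as in (8.6)
```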
A second set of advanced techniques, encompassing the notions of Nash inequalities, moderate growth, and local Poincaré inequalities, is described in Section 3. The development there springs from the inequality
‖P_i(X_t ∈ ·) − π(·)‖₂ ≤ N(s) e^{−(t−s)/τ_2},  (8.8)
established for all 0 ≤ s ≤ t in Section 2, where
N(t) = max_i ‖P_i(X_t ∈ ·)‖₂ = max_i √(p_{ii}(2t)/π_i),  t ≥ 0.  (8.9)
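The second equality in (8.9) is the reversibility identity Σ_j p_{ij}(t)²/π_j = p_{ii}(2t)/π_i, which holds verbatim in discrete time as well; the sketch below checks it for random walk on a weighted triangle (weights arbitrary):

```python
# Reversible chain: random walk on a weighted triangle.
w = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 3.0}
wsym = {}
for (i, j), x in w.items():
    wsym[i, j] = wsym[j, i] = x
wi = [sum(x for (a, _), x in wsym.items() if a == i) for i in range(3)]
wtot = sum(wi)
pi = [x / wtot for x in wi]
P = [[wsym.get((i, j), 0.0) / wi[i] for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

Pt = P
for _ in range(4):          # Pt = P^5
    Pt = matmul(Pt, P)
P2t = matmul(Pt, Pt)        # P^10

for i in range(3):
    lhs = sum(Pt[i][j] ** 2 / pi[j] for j in range(3))
    rhs = P2t[i][i] / pi[i]
    assert abs(lhs - rhs) < 1e-12
```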
Choosing s = 0 in (8.8) gives

‖P_i(X_t ∈ ·) − π(·)‖₂ ≤ π_i^{−1/2} e^{−t/τ_2},
and maximizing over i recaptures (8.5). The point of Section 3, however,
is that one can sometimes reduce the bound by a better choice of s and
suitable estimates of the decay rate of N (·). Such estimates can be provided
by so-called Nash inequalities, which are implied by (1) moderate growth
conditions and (2) local Poincaré inequalities. Roughly speaking, for chains
satisfying these two conditions, judicious choice of s shows that variation
mixing time and τ̂ are both of order ∆², where ∆ is the diameter of the
graph underlying the chain.
xxx Might not do (1) or (2), so need to modify the above.
To outline a third direction of improvement, we begin by noting that nei-
ther of the bounds in (8.7) can be much improved in general. Indeed, ignor-
ing Θ(1) factors as usual, the lower bound is equality for the n-cycle (Chap-
ter 5, Example yyy:7) and the upper bound is equality for the M/M/1/n
queue (Chapter 5, Example yyy:6) with traffic intensity ρ ∈ (0, 1).
In Section 4 we introduce the log-Sobolev time τ_l defined by

τ_l := sup{L(g)/E(g, g) : g ≢ constant}  (8.10)

where L(g) is the entropy-like quantity

L(g) := Σ_i π_i g²(i) log(|g(i)|/‖g‖₂),

recalling ‖g‖₂² = Σ_i π_i g²(i). Notice the similarity between (8.10) and the
will be to use known results for the random walk with weights (w̃ij ) to derive
corresponding results for the walk of interest. We assume that the graph
is connected under each set of weights. As in Chapter 4, Section yyy:4.3,
we choose (“distinguish”) paths γxy from x to y. Now, however, this need
be done only for those (x, y) with x ≠ y and w̃_{xy} > 0, but we impose the additional constraint w_e > 0 for each edge e in the path. (Here and below,
e denotes a directed edge in the graph of interest.) In other words, roughly
put, we need to construct a (wij )-path to effect each given (w̃xy )-edge. Recall
from Chapter 3 yyy:(71) the definition of Dirichlet form:

E(g, g) = (1/2) Σ_i Σ_{j≠i} (w_{ij}/w) (g(j) − g(i))²,  (8.14)

Ẽ(g, g) = (1/2) Σ_i Σ_{j≠i} (w̃_{ij}/w̃) (g(j) − g(i))².
for every g.
This inequality was established in the proof of the distinguished path the-
orem (Chapter 4 Theorem yyy:32), and that theorem was an immediate
consequence of the inequality. Hence the comparison Theorem 1 may be
regarded as a generalization of the distinguished path theorem.
[xxx For NOTES: We’ve used simple Sinclair weighting. What about
other weighting in use of Cauchy–Schwarz? Hasn’t been considered, as far
as I know.]
(b) When specialized to the setting of reversible random flights on Cay-
ley graphs described in Chapter 7 Section yyy:1.9, Theorem 1 yields The-
orem yyy:14 of Chapter 7. To see this, adopt the setup in Chapter 7 Sec-
tion yyy:1.9, and observe that the word
‖g‖₂² ≤ ‖g‖̃₂² max_i (π_i/π̃_i)  (8.18)
Example 8.3 Consider a card shuffle which transposes the top two cards
in the deck, moves the top card to the bottom, or moves the bottom card
to the top, each with probability 1/3. This example fits the specialized
group framework of Chapter 7 Section yyy:1.9 (see also Remark (b) following
Theorem 8.1 above) with I taken to be the symmetric group on m letters
and
G := {(1 2), (m m − 1 m − 2 · · · 1), (1 2 · · · m)}
in cycle notation. [If the order of the deck is represented by a permutation σ
in such a way that σ(i) is the position of the card with label i, and if
permutations are composed left to right, then σ · (m m − 1 m − 2 · · · 1) is
the order resulting from σ by moving the top card to the bottom.]
We obtain a representation (8.15) for any given permutation x by writing
x = hm hm−1 · · · h2
and the inf in (8.22) is taken over all vectors h1 , . . . , hm−1 that are orthogonal
in L2 (π) (or, equivalently, that are linearly independent). Using (8.19),
Corollary 8.2 now generalizes to
Corollary 8.4 (comparison of eigenvalues) In Theorem 8.1, the eigen-
values λm and λ̃m in the respective spectral representations satisfy
λ_m^{−1} ≤ (A/a) λ̃_m^{−1}
with A and a as defined in Corollary 8.2.
A = w/w̃;

furthermore,

a = min_i (π̃_i/π_i) = (w/w̃) min_i (w̃_i/w_i),

so

A/a = max_i (w_i/w̃_i) ≤ 1.

Thus λ_l^{−1} ≤ λ̃_l^{−1} for 1 ≤ l ≤ n := m_1 m_2; in particular,

τ_2 ≤ τ̃_2.  (8.24)
and in particular

τ_2 ≥ (1/2) τ̃_2.
8.1. THE COMPARISON METHOD FOR EIGENVALUES 277
‖f‖_∞ := max_i |f(i)|

and

‖ν‖_∞ := max_j (|ν_j|/π_j).
The sup in (8.29) is always achieved, and there are many equivalent reex-
pressions, including
where A∗ is the matrix with (i, j) entry πj aji /πi , that is, A∗ is the adjoint
operator to A (with respect to π).
Our applications in this chapter will all have A = A∗ , so we will not need
to distinguish between the two operator norms. In fact, all our applications
will take A to be either Pt or Pt − E for some t ≥ 0, where
P_t := (p_{ij}(t) : i, j ∈ I)

E := (π_j : i, j ∈ I),

and where we assume that the chain for (P_t) is reversible. Note that E operates on functions essentially as expectation with respect to π:

(Ef)(i) = Σ_j π_j f(j),  i ∈ I.

The effect of E on signed measures is to map ν to (Σ_i ν_i)π, and

P_t E = E = E P_t,  t ≥ 0.  (8.31)
(d/dt) ‖P_t f‖₂² = −2E(P_t f, P_t f) ≤ −(2/τ_2) var_π P_t f ≤ 0.

(b)

‖P_t − E‖₂→₂ = e^{−t/τ_2},  t ≥ 0.
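Part (a) can be checked by finite differences on a two-state chain, where P_t is explicit. A sketch (rates, test function, and time chosen arbitrarily; the Dirichlet form is taken as E(g, g) = ½ Σ_{i≠j} π_i q_{ij}(g(j) − g(i))², the Chapter 3 convention):

```python
import math

q01, q10 = 0.7, 0.2
lam = q01 + q10
pi = (q10 / lam, q01 / lam)
Q = [[-q01, q01], [q10, -q10]]

def Ptf(t, f):
    # p_ij(t) = pi_j + (delta_ij - pi_j) e^{-lam t}
    return [sum((pi[j] + ((i == j) - pi[j]) * math.exp(-lam * t)) * f[j]
                for j in (0, 1)) for i in (0, 1)]

def norm2sq(g):
    return sum(pi[i] * g[i] ** 2 for i in (0, 1))

def dirichlet(g):
    return 0.5 * sum(pi[i] * Q[i][j] * (g[j] - g[i]) ** 2
                     for i in (0, 1) for j in (0, 1) if j != i)

f, t, h = [1.0, -2.0], 0.8, 1e-6
deriv = (norm2sq(Ptf(t + h, f)) - norm2sq(Ptf(t - h, f))) / (2 * h)
g = Ptf(t, f)
assert abs(deriv + 2 * dirichlet(g)) < 1e-6          # (d/dt)||P_t f||^2 = -2E
var = norm2sq(g) - sum(pi[i] * g[i] for i in (0, 1)) ** 2
assert 2 * dirichlet(g) >= 2 * lam * var - 1e-12     # Poincare step, tau_2=1/lam
```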
8.2. IMPROVED BOUNDS ON L2 DISTANCE 281
(d/dt) p_{ij}(t) = Σ_k q_{ik} p_{kj}(t)

we find

(d/dt) (P_t f)(i) = Σ_k q_{ik} (P_t f)(k)

and so

(d/dt) ‖P_t f‖₂² = 2 Σ_i Σ_k π_i (P_t f)(i) q_{ik} (P_t f)(k)
  = −2E(P_t f, P_t f)  by Chapter 3 yyy:(70)
  ≤ −(2/τ_2) var_π P_t f  by the extremal characterization of τ_2.
(b) From (a), for any f we have

(d/dt) ‖(P_t − E)f‖₂² = (d/dt) ‖P_t(f − Ef)‖₂² ≤ −(2/τ_2) ‖(P_t − E)f‖₂²,

which yields
Lemma 8.11 For an irreducible reversible chain with arbitrary initial dis-
tribution and any s, t ≥ 0,
xxx For NOTES: By Jensen’s inequality (for 1 ≤ q < ∞), any transition
matrix contracts Lq for any 1 ≤ q ≤ ∞.
and each P_t is contractive on L², i.e., ‖P_t‖₂→₂ ≤ 1 (this follows, for example, from Lemma 8.10(a); and note that ‖P_t‖₂→₂ = 1 by considering constant functions), it follows that
and the decrease is strictly monotone unless P (X0 ∈ ·) = π(·). From (8.32)
follows
and again
The norm ‖P_s‖_{q*→2} decreases in q* (for fixed s) and is identically 1 when q* ≥ 2, but in applications we will want to take q* < 2. The following duality lemma will then often prove useful. Recall that 1 ≤ q, q* ≤ ∞ are said to be (Hölder-)conjugate exponents if

1/q + 1/q* = 1.  (8.36)
and the techniques of later sections are not needed to compute N(s). In particular, in the vertex-transitive case

N²(s) = 1 + Σ_{m=2}^n exp(−2λ_m s).
The norm N (s) clearly behaves nicely under the formation of products:
For the two-state chain, the results of Chapter 5 Example yyy:4 show

N²(s) = 1 + (max(p, q)/min(p, q)) e^{−2(p+q)s}.

In particular, for the continuized walk on the 2-path,

N²(s) = 1 + e^{−4s}.
N²(s) = (1 + e^{−4s/d})^d

for the continuized walk on the d-cube. This result is also easily derived from the results of Chapter 5 Example yyy:15. For d ≥ 2 and t ≥ (1/4) d log(d − 1), the optimal choice of s in Lemma 8.13 is therefore

s = (1/4) d log(d − 1)

τ̂ ≤ (1/4) d (log d + 3).

τ̂ ≤ (1/4)(log 2) d² + (1/2) d

τ̂ ≤ (1 + o(1)) (1/4) d log d
Consider again the benchmark product chain (i.e., the “tilde chain”) in Example 8.5. That chain has relaxation time

τ_2 = d (1 − cos(π/m))^{−1} ≤ (1/2) d m²,

so choosing s = 0 in Lemma 8.13 gives

τ̂ ≤ (d/(1 − cos(π/m))) ((1/2) log n + 1) ≤ (1/4) d m² (log n + 2).  (8.42)
λl = 1 − cos(π(l − 1)/m), 1 ≤ l ≤ m,
and
Σ_{l=2}^{m−1} exp(−4sl²/m²) ≤ ∫_{x=1}^∞ exp(−4sx²/m²) dx
  = m (π/(4s))^{1/2} P(Z ≥ 2(2s)^{1/2}/m)
  ≤ ((π/4)/(4s/m²))^{1/2} exp(−4s/m²)
  ≤ (4s/m²)^{−1/2} exp(−4s/m²)

Thus

N²(s) ≤ 1 + 2[1 + (4s/m²)^{−1/2}] exp(−4s/m²),  s > 0.
Return now to the “tilde chain” of Example 8.5, and assume for sim-
plicity that m1 = · · · = md = m. Since this chain is a slowed-down d-fold
product of the path chain, it has

N²(s) ≤ [1 + 2(1 + (4s/(dm²))^{−1/2}) exp(−4s/(dm²))]^d,  s > 0.  (8.43)
In particular, since d̂(2t) = N²(t) − 1, it is now easy to see that

τ̂ ≤ (3/4) m² d² (log 2 + (3/4) d^{−1}).
xxx Improvement over (8.42) by factor Θ(log m), but display follow-
ing (8.43) shows still off by factor Θ(d/ log d).
that holds for some positive constants C, D, and T and for all functions g.
We connect Nash inequalities to mixing times in Section 8.3.1, and in Sec-
tion 8.3.2 we discuss a comparison method for establishing such inequalities.
appearing in the mixing time Lemma 8.13. This norm is continuous in t and
decreases to ‖E‖₂→∞ = 1 as t ↑ ∞. Here is the main result:
Proof. First note N(t) = ‖P_t‖₁→₂ by Lemma 8.12. Thus we seek a bound on h(t) := ‖P_t g‖₂² independent of g satisfying ‖g‖₁ = 1; the square root of such a bound will also bound N(t).
Substituting P_t g for g in (8.46) and utilizing the identity in Lemma 8.10(a) and the fact that P_t is contractive on L¹, we obtain the differential inequality

h(t)^{1+1/(2D)} ≤ C[−(1/2) h′(t) + (1/T) h(t)],  t ≥ 0.
Writing

H(t) := [(1/2) C h(t) e^{−2t/T}]^{−1/(2D)},

or equivalently

h(t) ≤ [(T/C)(1 − e^{−t/(DT)})]^{−2D},  t ≥ 0.

But

e^{−t/(DT)} ≤ 1 − (t/T)(1 − e^{−1/D})  for 0 < t ≤ T,
8.3. NASH INEQUALITIES 289
h(t) ≤ [(t/C)(1 − e^{−1/D})]^{−2D}
  = e² [(t/C)(e^{1/D} − 1)]^{−2D} ≤ [e (DC/t)^D]²,

as desired.
as desired.
We now return to Lemma 8.13 and, for t ≥ T , set s = T . (Indeed,
using the bound on N (s) in Theorem 8.17, this is the optimal choice of s if
T < Dτ2 .) This gives
xxx NOTE: In next theorem, only need conclusion of Theorem 8.17, not
hypothesis!
t ≥ T + τ_2 (D log(DC/T) + c),

then √(d̂(2t)) ≤ e^{1−c}; in particular,

τ̂ ≤ T + τ_2 (D log(DC/T) + 2).
with

C′ := 2(1 + 1/(2D)) [(1 + 2D)^{1/2} C]^{1/D} ≤ 2^{2+1/(2D)} C^{1/D}.
Proof. As in the proof of Theorem 8.17, we note N (t) = kPt k1→2 . Hence,
for any g and any 0 < t ≤ T ,
‖g‖₂² = ‖P_t g‖₂² − ∫_{s=0}^t (d/ds) ‖P_s g‖₂² ds
  = ‖P_t g‖₂² + 2 ∫_{s=0}^t E(P_s g, P_s g) ds  by Lemma 8.10(a)
  ≤ ‖P_t g‖₂² + 2E(g, g)t  xxx see above
  ≤ 2E(g, g)t + C² t^{−2D} ‖g‖₁².
This gives

‖g‖₂² ≤ t[2E(g, g) + (1/T)‖g‖₂²] + C² t^{−2D} ‖g‖₁²

for any t > 0. The righthand side here is convex in t and minimized (for g ≠ 0) at

t = (2DC² ‖g‖₁² / (2E(g, g) + T^{−1}‖g‖₂²))^{1/(2D+1)}.
Plugging in this value, raising both sides to the power 1 + 1/(2D) and simplifying yields the desired Nash inequality. The upper bound for C′ is derived with a little bit of calculus.
for constants C̃, D̃, T̃ > 0, then any other reversible chain on the same state
space satisfies
N(t) ≤ e (DC/t)^D  for 0 < t ≤ T,
where, with a and A as defined in Corollary 8.2, and with

a_0 := max_i (π̃_i/π_i),

we set

D = D̃,

C = a^{−(2+1/D)} a_0^{1/D} A × 2(1 + 1/(2D))[(1 + 2D)^{1/2} C̃]^{1/D}
  ≤ a^{−(2+1/D)} a_0^{1/D} A × 2^{2+1/(2D)} C̃^{1/D},

T = (2A/a_0) T̃.
xxx Must correct this slightly. Works for any A such that Ẽ ≤ AE, not
just minimal one. This is important since we need a lower bound on T but
generally only have an upper bound on Ẽ/E. The same goes for a0 (only
need upper bound on π̃i /πi ): we also need an upper bound on T .
a ≥ 1/2,  A = 1,  a_0 = 2.
τ̂ ≤ (1/8) m² d² log d + ((19/8) log 2) m² d² + (33/32) m² d,  (8.51)
and the log-Sobolev time τ_l. As in Section 8.3, we again consider the fundamental quantity

N(s) = ‖P_s‖₂→∞

arising in the bound on √(d̂(2t)) in Lemma 8.13, and recall from Section 8.3.1 that N(s) decreases strictly monotonically from π_*^{−1/2} at s = 0 to 1 as s ↑ ∞.
for each q ≥ 2, ‖P_s‖₂→q equals 1 for all sufficiently large s.  (8.56)
then s_2 = 0 < s_q, and we will see presently that s_q < ∞ for q ≥ 2. The following theorem affords a connection with the log-Sobolev time τ_l (and hence with the Dirichlet form E).
8.4. LOGARITHMIC SOBOLEV INEQUALITIES 295
For the first half of the proof we suppose that (8.57) holds and must
prove τl ≤ u, that is, we must establish the log-Sobolev inequality
Plugging the specific formula (8.58) for q(t) into (8.59) and setting t = 0 gives

F′(0) = ‖g‖₂^{−1} (u^{−1} L(g) − E(g, g)).  (8.62)
Moreover, since

L_q(g) = (q/2) L(g^{q/2}) ≤ (q/2) τ_l E(g^{q/2}, g^{q/2}) ≤ (q² τ_l/(4(q − 1))) E(g, g^{q−1})  (8.63)

(q′(t)/q(t)) L_{q(t)}(P_t g) − E(P_t g, (P_t g)^{q(t)−1}) ≤ 0.
From (8.59) we then find F′(t) ≤ 0 for all t ≥ 0. Since F(0) = ‖g‖₂, this implies

‖P_t g‖_{q(t)} ≤ ‖g‖₂.  (8.64)

We have assumed g ≥ 0, but (8.64) now extends trivially to general g, and therefore

‖P_t‖₂→q(t) ≤ 1.

This gives the desired hypercontractivity assertion (8.57).
Here is the technical Dirichlet form lemma that was used in the proof of
Theorem 8.24.
Lemma 8.25 E(g, g^{q−1}) ≥ (4(q − 1)/q²) E(g^{q/2}, g^{q/2})  for g ≥ 0 and 1 < q < ∞.
≤ (q²/(4(b − a))) ∫_a^b t^{q−2} dt = (q²/(4(q − 1))) (b^{q−1} − a^{q−1})/(b − a).

This shows that

(b^{q−1} − a^{q−1})(b − a) ≥ (4(q − 1)/q²) (b^{q/2} − a^{q/2})²
and the lemma follows easily from this and (8.65).
Now we are prepared to bound τ̂ in terms of τl .
(b)

τ̂ ≤ (1/2) τ_l log log(1/π_*) + 2τ_2 ≤ τ_l ((1/2) log log(1/π_*) + 2).
Proof. Part (b) follows immediately from (8.54), part (a), and Lemma 8.22.
To prove part (a), we begin with (8.55):

‖P_i(X_t ∈ ·) − π(·)‖₂ ≤ π_i^{−1/q} ‖P_s‖₂→q e^{−(t−s)/τ_2}.

As in the second half of the proof of Theorem 8.24, let q = q(s) := 1 + e^{2s/τ_l}. Then ‖P_s‖₂→q(s) ≤ 1. Thus

‖P_i(X_t ∈ ·) − π(·)‖₂ ≤ π_i^{−1/q(s)} e^{−(t−s)/τ_2},  0 ≤ s ≤ t.

Choosing s = (1/2) τ_l log log(1/π_i) we have q(s) = 1 + log(1/π_i) and thus

‖P_i(X_t ∈ ·) − π(·)‖₂ ≤ exp(1 − (t − s)/τ_2)  for t ≥ s.
We have established the upper bound in the following corollary; for the
lower bound, see Corollary 3.11 in [119].
Corollary 8.27

τ_l ≤ τ̂ ≤ τ_l ((1/2) log log(1/π_*) + 2).
Eπ g = θg(0) + (1 − θ)g(1) = 1.
x := 1/(g(0) − g(1)),

so that

g(0) = 1 + (1 − θ)/x,  g(1) = 1 − θ/x

and we must consider x ∈ (−∞, −(1 − θ)] ∪ [θ, ∞). We calculate

ℓ(x) := L(g) = θ g²(0) log |g(0)| + (1 − θ) g²(1) log |g(1)| − (1/2)(1 + θ(1 − θ)/x²) log(1 + θ(1 − θ)/x²)
  = [θ(x + 1 − θ)² log(x + 1 − θ)² + (1 − θ)(x − θ)² log(x − θ)² − (x² + θ(1 − θ)) log(x² + θ(1 − θ))]/(2x²),

r(x) := 2θ(1 − θ) L(g)/E(g, g) = 2x² ℓ(x)
  = θ(x + 1 − θ)² log(x + 1 − θ)² + (1 − θ)(x − θ)² log(x − θ)² − (x² + θ(1 − θ)) log(x² + θ(1 − θ)).
Now consider any irreducible chain (automatically reversible) on {0, 1}, with stationary distribution π. Without loss of generality we may suppose π_0 ≤ π_1. We claim that

τ_l = π_1 log(π_1/π_0) / (2p_{01}(1 − 2π_0))  if π_0 ≠ 1/2
τ_l = 1/(2p_{01})  if π_0 = 1/2.
The proof is easy. The functional L(g) depends only on π and so is unchanged from Example 8.28, and the Dirichlet form changes from E(g, g) = π_0 π_1 (g(0) − g(1))² in Example 8.28 to E(g, g) = π_0 p_{01} (g(0) − g(1))² here.
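The claim can be probed numerically: taking the Dirichlet form E(g, g) = π_0 p_{01}(g(0) − g(1))² and maximizing L(g)/E(g, g) over the one-parameter family of Example 8.28 (which sweeps all non-constant g up to scaling) reproduces the displayed τ_l. A sketch, with rates chosen arbitrarily:

```python
import math

p01, p10 = 0.3, 0.2
pi0 = p10 / (p01 + p10)   # = 0.4
pi1 = 1.0 - pi0

def ratio(x):
    # Parametrization g(0) = 1+(1-pi0)/x, g(1) = 1-pi0/x of Example 8.28.
    g0, g1 = 1 + (1 - pi0) / x, 1 - pi0 / x
    nrm = math.sqrt(pi0 * g0 * g0 + pi1 * g1 * g1)
    L = sum(p * g * g * math.log(abs(g) / nrm)
            for p, g in ((pi0, g0), (pi1, g1)) if g != 0.0)
    return L / (pi0 * p01 * (g0 - g1) ** 2)

best = 0.0
for k in range(1, 50000):
    best = max(best, ratio(0.4001 + 0.001 * k))    # branch x >= pi0
    best = max(best, ratio(-0.6001 - 0.001 * k))   # branch x <= -(1-pi0)

formula = pi1 * math.log(pi1 / pi0) / (2 * p01 * (1 - 2 * pi0))
assert best <= formula + 1e-9
assert abs(best - formula) / formula < 0.01
```

The grid maximum sits near x ≈ 2.5 for these rates, comfortably inside the scanned range.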
Remark. Recall from Chapter 5, Example yyy:4 that τ_2 = 1/(p_{01} + p_{10}) = π_1/p_{01}. It follows that

τ_l/τ_2 = log(π_1/π_0)/(2(1 − 2π_0))  if 0 < π_0 < 1/2
τ_l/τ_2 = 1  if π_0 = 1/2
The proof of Lemma 8.22 and the result of Example 8.28 can be combined to prove the following result: For the “trivial” chain with p_{ij} ≡ π_j, the log-Sobolev time τ_l is given (when π_* < 1/2) by

τ_l = log(1/π_* − 1) / (2(1 − 2π_*)).
Corollary 8.31 For any reversible chain (with π_* < 1/2, which is automatic for n ≥ 3),

τ_l ≤ τ_2 · log(1/π_* − 1) / (2(1 − 2π_*)).

L(g) ≤ (var_π g) · log(1/π_* − 1) / (2(1 − 2π_*)),
It follows readily from Example 8.30 that the continuized walk of the complete graph has

τ_l = (n − 1) log(n − 1) / (2(n − 2)) ∼ (1/2) log n.
Since τ2 = (n − 1)/n, equality holds in Corollary 8.31 for this example.
xxx Move the following warning to follow Corollary 8.27, perhaps?
Warning. Although the ratio of the upper bound on τ̂ to lower bound
in Corollary 8.27 is smaller than that in (8.7), the upper bound in Corol-
lary 8.27 is sometimes of larger order of magnitude than the upper bound
in (8.7). For the complete graph, (8.7) says
(n − 1)/n ≤ τ̂ ≤ ((n − 1)/n) ((1/2) log n + 1)
Open Problem 8.33 Calculate τl for the n-cycle (Chapter 5 Example yyy:7)
when n ≥ 4.
(2/π²) m² ≤ τ_l ≤ m².

The lower bound is easy, using Lemma 8.22:

τ_l ≥ τ_2 = (1 − cos(π/m))^{−1} ≥ (2/π²) m².
For the upper bound we use Corollary 8.27 and estimation of τ̂ . Indeed, in
Example 8.16 it was shown that
d̂(2t) = N²(t) − 1 ≤ [1 + (4t/m²)^{−1/2}] exp(−4t/m²),  t > 0.

Substituting t = m² gives √(d̂(2t)) ≤ √(3/2) e^{−2} < e^{−1}, so τ_l ≤ τ̂ ≤ m².
xxx P.S. Persi (98/07/02) points out that H. T. Yau showed τl = Θ(n log n)
for random transpositions by combining τl ≥ τ2 (Lemma 8.22) and τl ≤
L(g0 )/E(g0 , g0 ) with g0 = delta function. I have written notes generalizing
and discussing this and will incorporate them into a later version.
Lemma 8.35

E(g, g) = Σ_{i_2} π_{i_2}^{(2)} E^{(1)}(g(·, i_2), g(·, i_2)) + Σ_{i_1} π_{i_1}^{(1)} E^{(2)}(g(i_1, ·), g(i_1, ·)).
Proof. This follows easily from (8.69) and the definition of E in Chapter 3
Section yyy:6.1 (cf. (68)).
The analogue of (8.68) for the log-Sobolev time is also true:
xxx For NOTES?: Can give analogous proof of (8.68): see my notes,
page 8.4.24A.
Proof. The keys to the proof are Lemma 8.35 and the following “law
of total L-functional.” Given a function g ≢ 0 on the product state space I = I_1 × I_2, define a function G_2 ≢ 0 on I_2 by

G_2(i_2) := ‖g(·, i_2)‖₂ = (Σ_{i_1} π_{i_1}^{(1)} g²(i_1, i_2))^{1/2}.
Then

L(g) = Σ_{i_1,i_2} π_{i_1,i_2} g²(i_1, i_2) [log(|g(i_1, i_2)|/G_2(i_2)) + log(G_2(i_2)/‖g‖₂)]
  = Σ_{i_2} π_{i_2}^{(2)} L^{(1)}(g(·, i_2)) + L^{(2)}(G_2),
But from
|G2(j2) − G2(i2)| = |‖g(·, j2)‖2 − ‖g(·, i2)‖2| ≤ ‖g(·, j2) − g(·, i2)‖2
follows
E^{(2)}(G2, G2) ≤ Σ_{i1} π^{(1)}_{i1} E^{(2)}(g(i1, ·), g(i1, ·)). (8.71)
CHAPTER 8. ADVANCED L2 TECHNIQUES FOR BOUNDING MIXING TIMES (MAY 19
τl = d/2 = τ2 .
From this and the upper bound in Corollary 8.27 we can deduce
with
L(g(i), c) := g²(i) log(|g(i)|/c) − (1/2)(g²(i) − c²) ≥ 0. (8.73)
8.4. LOGARITHMIC SOBOLEV INEQUALITIES 305
Proof. We compute
f(c) := 2 Σ_i πi L(g(i), c^{1/2}) = Eπ(g² log |g|²) − ‖g‖2² log c − ‖g‖2² + c,
f′(c) = 1 − c^{−1}‖g‖2², f″(c) = c^{−2}‖g‖2² > 0.
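The convexity computation is easy to check numerically; the following sketch (with an arbitrary made-up distribution π and function g, not from the text) confirms that f is convex with its minimum at c = ‖g‖2², where f equals twice the L-functional.

```python
import math
import random

# Made-up probability vector pi and positive function g.
random.seed(1)
n = 6
w = [random.random() for _ in range(n)]
pi = [x / sum(w) for x in w]
g = [random.uniform(0.2, 2.0) for _ in range(n)]

norm2sq = sum(p * x * x for p, x in zip(pi, g))   # ||g||_2^2
L = sum(p * x * x * math.log(x / math.sqrt(norm2sq)) for p, x in zip(pi, g))

def f(c):
    # f(c) = E_pi(g^2 log g^2) - ||g||_2^2 log c - ||g||_2^2 + c
    return (sum(p * x * x * math.log(x * x) for p, x in zip(pi, g))
            - norm2sq * math.log(c) - norm2sq + c)

# At c = ||g||_2^2 the function equals 2 L(g), and nearby values of c
# are no smaller, consistent with f'(c) = 0, f''(c) > 0 there.
assert abs(f(norm2sq) - 2 * L) < 1e-9
for c in [0.5 * norm2sq, 0.9 * norm2sq, 1.1 * norm2sq, 2 * norm2sq]:
    assert f(c) >= f(norm2sq) - 1e-12
```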
xxx Remarked in Example 8.34 that τl ≤ m² for the m-path with end self-loops.
xxx So by Theorem 8.36, benchmark product chain has τ̃l ≤ dm².
Recalling A ≤ 1 and a ≥ 1/2 from Example 8.5, we therefore find
τl ≤ 2m²d. (8.74)
with
g = |f|^{(q−2)/(q−1)}, h = |f|^{2/(q−1)}, p = (q − 1)/(q − 2).
Theorem 8.42 Suppose that a continuous-time reversible chain satisfies
for
t ≥ T + (1/2) τl [log log(CT^{−D}) − 1] + cτ2,
where τ2 is the relaxation time and τl is the log-Sobolev time.
Proof. From Lemma 8.11 and a slight extension of (8.34), for any
s, t, u ≥ 0 and any initial distribution we have
Return one last time to the walk of interest in Example 8.5. Example 8.21
showed that (8.75) holds with
Also recall τ2 ≤ (1/2)dm² from (8.49) and τl ≤ 2dm² from Example 8.40.
Plugging these into Theorem 8.42 with c = 2 yields
τ̂ ≤ (49/32) m²d log[(1/4) d log d + (19/4) d log 2], which is ≤ 5m²d log d for d ≥ 2.
CHAPTER 9. A SECOND LOOK AT GENERAL MARKOV CHAINS (APRIL 21, 1995)
Theorem 9.1 Let T be any strong stationary time for the µ-chain. Then
sepµ (t) ≤ Pµ (T > t) for all t ≥ 0. (9.3)
Moreover there exists a minimal strong stationary time T for which
sepµ (t) = Pµ (T > t) for all t ≥ 0. (9.4)
Eµ T ≥ max_j (Eµ Tj − Eπ Tj). (9.5)
Eµ T = max_j (Eµ Tj − Eπ Tj). (9.6)
In each case, the first assertion is immediate from the definitions, and
the issue is to carry out a construction of the required T . Despite the
similar appearance of the results, attempts to place them all in a common
framework have not been fruitful. We will prove Theorems 9.1 and 9.3
below, and illustrate with examples. These two proofs involve only rather
simple “greedy” constructions. We won’t give the proof of Theorem 9.2
(the construction is usually called the maximal coupling: see Lindvall [233])
because the construction is a little more elaborate and the existence of the
minimal coupling time is seldom useful, but on the other hand the coupling
inequality in Theorem 9.2 will be used extensively in Chapter 14. In the
context of Theorems 9.1 and 9.2 the minimal times T are clearly unique
in distribution, but in Theorem 9.3 there will generically be many mean-
minimal stationary times T with different distributions.
Pµ(Xt = j, T = t) = σj(t) = rt πj
Pµ(Xt ∈ ·) = θ(t) + Pµ(T ≤ t − 1) · π
sepµ(t) = 1 − min_j Pµ(Xt = j)/πj = Pµ(T ≥ t) − rt = Pµ(T > t).
Part (c) is rather remarkable, and can be rephrased as follows. Call a state
k with property (9.10) a halting state for the stopping time T . In words, the
chain must stop if and when it hits a halting state. Then part (c) asserts
that, to verify that an admissible time T attains the minimum t̄(µ, ρ), it
suffices to show that there exists some halting state. In the next section we
shall see this is very useful in simple examples.
Proof. The greedy construction used here is called a filling scheme.
Recall from (9.7) the definitions
Write also Σj (t) = Pµ (XT = j, T ≤ t). We now define (θ(t), σ(t); t ≥ 0) and
the associated stopping time T̄ inductively via (9.8) and
σj(t) = 0 if Σj(t − 1) = ρj
      = θj(t) if Σj(t − 1) + θj(t) ≤ ρj
      = ρj − Σj(t − 1) otherwise.
In words, we stop at the current state (j, say) provided our “quota” ρj for
the chance of stopping at j has not yet been filled. Clearly
tj ≡ min{t : Σj (t) = ρj } ≤ ∞.
We shall show
xj + ρj = µj + Σi xi pij ∀j. (9.12)
So summing over i,
Σi xi pij = Eµ(number of visits to j during 1, 2, . . . , T)
Corollary 9.5 The minimal strong stationary time has mean t̄(µ, π), i.e. is
mean-minimal amongst all not-necessarily-strong stationary times, iff there
exists a state k such that
Pµ(Xt = k)/πk = min_j Pµ(Xt = j)/πj ∀t.
Proof. From the construction of the minimal strong stationary time, this is
the condition for k to be a halting state.
9.1.3 Examples
Example 9.6 Patterns in coin-tossing.
Recall Chapter 2 Example yyy: (Xt ) is the chain on the set {H, T }n of n-
tuples i = (i1 , . . . , in ). Start at some arbitrary initial state j = (j1 , . . . , jn ).
Here the deterministic stopping time “T = n” is a strong stationary time.
Now a state k = (k1, . . . , kn) will be a halting state provided it does not
overlap j, that is provided there is no 1 ≤ u ≤ n such that (ju, . . . , jn) =
(k1, . . . , kn−u+1). But the number of overlapping states is at most 1 +
2 + 2² + . . . + 2^{n−1} = 2^n − 1, so there exists a non-overlapping state, i.e. a
halting state. So ET attains the minimum (= n) of t̄(j, π) over all stationary
times (and not just over all strong stationary times).
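The overlap count can be checked by brute force for small n (a quick sketch, not from the text): a state k overlaps j if some suffix of j equals a prefix of k, and the union bound 1 + 2 + · · · + 2^{n−1} = 2^n − 1 guarantees a halting state exists.

```python
from itertools import product

def overlaps(j, k):
    # True if some suffix (j_u,...,j_n) equals the prefix (k_1,...,k_{n-u+1});
    # the shift u = 0 case is k == j itself.
    n = len(j)
    return any(j[u:] == k[: n - u] for u in range(n))

for n in range(1, 8):
    for j in product("HT", repeat=n):
        overlapping = sum(overlaps(j, k) for k in product("HT", repeat=n))
        # so at least one of the 2^n states is non-overlapping, i.e. halting
        assert overlapping <= 2 ** n - 1
```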
9.1. MINIMAL CONSTRUCTIONS AND MIXING TIMES 315
Consider the following scheme for shuffling an n-card deck: the top card is
removed, and inserted in one of the n possible positions, chosen uniformly at
random. Start in some arbitrary order. Let T be the first time that the card
which was originally second-from-bottom has reached the top of the deck.
Then it is not hard to show (Diaconis [112] p. 177) that T + 1 is a strong
stationary time. Now any configuration in which the originally-bottom card
is the top card will be a halting state, and so T + 1 is mean-minimal over
all stationary times. Here E(T + 1) = 1 + Σ_{m=2}^{n−1} n/m = n(hn − 1).
In a series of games which you win or lose independently with chance 0 <
c < 1, let X̂t be your current “winning streak”, i.e. the number of games won
since your last loss. For fixed n, the truncated process Xt = min(X̂t, n − 1)
is the Markov chain on states {0, 1, 2, . . . , n−1} with transition probabilities
Here is another stopping time T which is easily checked to attain the station-
ary distribution, for the chain started at 0. With chance 1 − c stop at time
0. Otherwise, run the chain until either hitting n − 1 (in which case, stop)
or returning to 0. In the latter case, the return to 0 occurs as a transition
to 0 from some state M ≥ 0. Continue until first hitting M + 1, then stop.
Again n − 1 is a halting state, so this stationary time also is mean-minimal.
Of course, the simplest construction is the deterministic time T = n − 1.
This is a strong stationary time (the winning streak chain is a function of
the patterns in coin tossing chain), and again n − 1 is clearly a halting state.
Thus t̄(0, π) = n − 1 without needing the calculation above.
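The finite-time claim can be verified in exact arithmetic (a sketch, not from the text; the transition rule and stationary distribution below are our reading of the truncated winning-streak chain: from i, go to min(i + 1, n − 1) with chance c and to 0 with chance 1 − c, with πj = (1 − c)c^j for j ≤ n − 2 and π_{n−1} = c^{n−1}).

```python
from fractions import Fraction

def step(dist, c, n):
    # One step of the truncated winning-streak chain.
    new = [Fraction(0)] * n
    for i, mass in enumerate(dist):
        new[0] += mass * (1 - c)                 # lose: streak resets
        new[min(i + 1, n - 1)] += mass * c       # win: streak grows (truncated)
    return new

c = Fraction(1, 3)
for n in range(2, 8):
    dist = [Fraction(1)] + [Fraction(0)] * (n - 1)   # start at 0
    for _ in range(n - 1):
        dist = step(dist, c, n)
    pi = [(1 - c) * c ** j for j in range(n - 1)] + [c ** (n - 1)]
    assert dist == pi        # exactly stationary at the deterministic time n-1
```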
Remark. One could alternatively use Corollary 9.5 to show that the
strong stationary times in Examples 9.6 and 9.7 are mean-minimal sta-
tionary times. The previous examples are atypical: here is a more typical
example in which the hypothesis of Corollary 9.5 is not satisfied and so no
mean-optimal stationary time is a strong stationary time.
spanning tree with one vertex distinguished as the root, and with each edge
e = (v, w) of t regarded as being directed towards the root. Write T for the
set of directed spanning trees. For t ∈ T define
Y
ρ̄(t) ≡ pvw .
(v,w)∈t
Now fix n and consider the stationary Markov chain (Xm : −∞ < m ≤ n)
run from time minus infinity to time n. We now use the chain to construct a
random directed spanning tree Tn . The root of Tn is Xn . For each v 6= Xn
there was a final time, Lv say, before n that the chain visited v:
Lv ≡ max{m ≤ n : Xm = v}.
(v, XLv+1), v ≠ Xn (where v = XLv).
So the edges of Tn are the last-exit edges from each vertex (other than the
root Xn ). It is easy to check that Tn is a directed spanning tree.
Now consider what happens as n changes. Clearly the process (Tn :
−∞ < n < ∞) is a stationary Markov chain on T , with a certain transition
matrix Q = (q(t, t′)), say. The figure below indicates a typical transition
t → t′. Here t was constructed by the chain finishing at its root v, and t′ is
the new tree obtained when the chain makes a transition v → w.
[Figure: the tree t, rooted at v, and the tree t′, rooted at w, obtained when the chain makes the transition v → w.]
Theorem 9.10 (The Markov chain tree theorem) The stationary dis-
tribution of (Tn ) is ρ.
Σ_t ρ̄(t) q(t, t′) = ρ̄(t′). (9.13)
Write w for the root of t′. For each vertex x ≠ w there is a tree tx con-
structed from t′ by adding an edge (w, x) and then deleting from the result-
ing cycle the edge (v, w) (say, for some v = v(x)) leading into w. For x = w
set v(x) = x. It is easy to see that the only possible transitions into t′ are
from the trees tx, and that
ρ̄(tx)/ρ̄(t′) = pwx/pv(x)w;  q(tx, t′) = pv(x)w.
Thus Σx ρ̄(tx) q(tx, t′) = Σx ρ̄(t′) pwx = ρ̄(t′), which is (9.13). □
The underlying chain Xn can be recovered from the tree-valued chain
Tn via Xn = root(Tn ), so we can recover the stationary distribution of X
from the stationary distribution of T , as follows.
Corollary 9.11 (The Markov chain tree formula) For each vertex v
define
π̄(v) ≡ Σ_{t: v=root(t)} ρ̄(t),  π(v) ≡ π̄(v) / Σ_w π̄(w).
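The tree formula can be verified directly on a small example (a sketch with a made-up 3-state transition matrix, not from the text): enumerate all directed spanning trees, form π̄, and check that the normalized π is indeed stationary. Exact rational arithmetic avoids rounding issues.

```python
from itertools import product
from fractions import Fraction

P = [[Fraction(0), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
     [Fraction(1, 4), Fraction(3, 4), Fraction(0)]]
n = len(P)

def is_tree(parent, root):
    # parent[v] is the head of the directed edge out of v; every v must
    # reach root without revisiting a vertex.
    for v in range(n):
        seen, w = set(), v
        while w != root:
            if w in seen:
                return False
            seen.add(w)
            w = parent[w]
    return True

pibar = [Fraction(0)] * n
for root in range(n):
    for parent in product(range(n), repeat=n):
        if parent[root] != root or not is_tree(parent, root):
            continue
        weight = Fraction(1)          # rho-bar(t) = product of p_vw over edges
        for v in range(n):
            if v != root:
                weight *= P[v][parent[v]]
        pibar[root] += weight
pi = [x / sum(pibar) for x in pibar]

for j in range(n):                    # stationarity: sum_i pi_i p_ij = pi_j
    assert sum(pi[i] * P[i][j] for i in range(n)) == pi[j]
```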
until the cover time C. Define T to be the directed spanning tree with root
X0 and with edges (v, XTv−1), v ≠ X0 (where v = XTv). If X0 has distribution π
then T has distribution ρ. If X0 is deterministically v0 , say, then T has
distribution ρ conditioned on being rooted at v0 .
Thus T consists of the edges by which each vertex is first visited, directed
backwards.
For a reversible chain, we can of course use the chain itself in Corollary
9.12 above, in place of the time-reversed chain. If the chain is random walk
on an unweighted graph G, then
ρ̄(t) = d(root(t)) Π_v (1/d(v)).
Proof. Consider the random walk started at v and run until the time U
of the first return to v after the first visit to x. Let p be the chance that
XU −1 = x, i.e. that the return to x is along the edge (x, v). We can calculate
p in two ways. In terms of random walk started at x, p is the chance that
the first visit to v is from x, and so by Corollary 9.12 (applied to the walk
started at x) p = P ((x, v) ∈ T). On the other hand, consider the walk
started at v and let S be the first time that the walk traverses (x, v) in that
direction. Then
ES = EU/p.
But by yyy and yyy
ES = w/wvx, EU = w rvx
and hence p = wvx rvx as required. □
The next result indicates the usefulness of the electrical network analogy.
9.3. SELF-VERIFYING ALGORITHMS FOR SAMPLING FROM A STATIONARY DISTRIBUTION321
Proof. Consider the “shorted” graph Gshort in which the end-vertices (x1 , x2 )
of e1 are shorted into a single vertex x, with edge-weights wxv = wx1 v +wx2 v .
The natural 1 − 1 correspondence t ↔ t ∪ {e1 } between spanning trees of
Gshort and spanning trees of G containing e1 maps the distribution ρshort to
the conditional distribution ρ(·|e1 ∈ T). So, writing Tshort for the random
spanning tree associated with Gshort ,
Lemma 9.16 Consider a pure simulation algorithm which, given any irre-
ducible n-state chain, eventually outputs a random state whose distribution
is guaranteed to be within ε of the stationary distribution in variation dis-
tance. Then the algorithm must take Ω(n) steps for every P.
steps, where τ1∗ is the mixing time parameter defined as the smallest t such
that
Pi(XUσ = j) ≥ (1/2) πj for all i, j ∈ I, σ ≥ t (9.15)
where Uσ denotes a random time uniform on {0, 1, . . . , σ − 1}, independent
of the chain.
xxx tie up with Chapter 4 discussion and [241].
The following two facts are the mathematical ingredients of the algo-
rithm. We quote as Lemma 9.17(a) a result of Ross [300] (see also [53]
Theorem XIV.37); part (b) is an immediate consequence.
As the second ingredient, observe that the Markov chain tree formula (Corol-
lary 9.11) can be rephrased as follows.
Corollary 9.18 Let π be the stationary distribution for a transition matrix
P on I. Let J be random, uniform on I. Let (ξi ; i ∈ I) be independent, with
P (ξi = j) = pij . Consider the digraph with edges {(i, ξi ) : i 6= J}. Then,
conditional on the digraph being a tree with edges directed toward the root
J, the probability that J = j equals πj .
So consider the special case of a chain with the property
p∗ij ≥ (1/2)^{1/n} πj ∀i, j. (9.16)
The probability of getting any particular digraph under the procedure of
Corollary 9.18 is at least 1/2 the probability of getting that digraph under
the procedure of Lemma 9.17, and so the probability of getting some tree is
at least 1/2n, by Lemma 9.17(b). So if the procedure of Corollary 9.18 is
repeated r = ⌈2n log 4⌉ times, the chance that some repetition produces a
tree is at least 1 − (1 − 1/2n)^{2n log 4} ≥ 3/4, and then the root J of the tree
has distribution exactly π.
Now for any chain, fix σ > τ1∗ . The submultiplicativity (yyy) property of
separation, applied to the chain with transition probabilities p̃ij = Pi (XUσ =
j), shows that if V denotes the sum of m independent copies of Uσ , and ξi
is the state reached after V steps of the chain started at i, then
P(ξi = j) ≡ Pi(XV = j) ≥ (1 − 2^{−m}) πj ∀i, j.
So putting m = −log2(1 − (1/2)^{1/n}) = Θ(log n), the set of probabilities
(P(ξi = j)) satisfies (9.16).
Combining these procedures, we have (for fixed σ > τ1∗ ) an algorithm
which, in a mean number nmσr = O(σn2 log n) of steps, has chance ≥
3/4 to produce an output, and (if so) the output has distribution exactly
π. Of course we initially don’t know the right σ to use, but we simply
try n, 2n, 4n, 8n, . . . in turn until some output appears, and the mean total
number of steps will satisfy the asserted bound (9.14).
The details are messy, so let us just outline the (simple) underlying idea.
Suppose we can define a procedure which terminates in some random number
Y of steps, where Y is an estimate of τ0 : precisely, suppose that for any P
P (Y ≤ τ0 ) ≤ ε; EY ≤ Kτ0 (9.18)
Simulate Y; then run the chain for U_{Y/ε} steps and output the final state ξ
and so
‖P(ξ ∈ ·) − π‖ ≤ E min(1, τ0/(Y/ε)) ≤ 2ε.
And the mean number of steps is (1 + 1/(2ε)) EY.
So the issue is to define a procedure terminating in Y steps, where Y
satisfies (9.18). Label the states {1, 2, . . . , n} and consider the following
coalescing paths routine.
(i) Pick a uniform random state J.
(ii) Start the chain at state 1, run until hitting state J, and write A1 for
the set of states visited along the path.
(iii) Restart the chain at state min{j : j ∉ A1}, run until hitting some
state in A1 , and write A2 for the union of A1 and the set of states visited
by this second path.
(iv) Restart the chain at state min{j : j ∉ A2}, and continue this
procedure until every state has been visited. Let Y be the total number of
steps.
The random target lemma says that the mean number of steps in (ii)
equals τ0 , making this Y a plausible candidate for a quantity satisfying
(9.18). A slightly more complicated algorithm is in fact needed – see [18].
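The routine above is straightforward to sketch in code (this is only the simple version described in the text, not the refined algorithm of [18]; the transition matrix below is made up).

```python
import random

def coalescing_paths(P, rng):
    # Returns Y, the total number of steps of the coalescing-paths routine.
    n = len(P)
    J = rng.randrange(n)                 # (i) uniform random target state
    covered = {J}
    steps = 0
    while len(covered) < n:
        # (ii)/(iii)/(iv): restart at the lowest-labeled unvisited state
        x = min(j for j in range(n) if j not in covered)
        target = set(covered)
        covered.add(x)
        while x not in target:           # run until hitting an earlier path
            x = rng.choices(range(n), weights=P[x])[0]
            steps += 1
            covered.add(x)
    return steps

rng = random.Random(0)
P = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [0.5, 0.0, 0.5]]
ys = [coalescing_paths(P, rng) for _ in range(200)]
assert min(ys) >= 1                      # every run terminates, taking >= 1 step
```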
C∗ = min{t : X_t^{(i)} = X_t^{(j)} ∀i, j} ≤ ∞.
C = max{t : X_0^{(i,t)} = X_0^{(j,t)} ∀i, j} ≥ −∞.
= s^{−1} trace((Σ_{m=0}^{∞} V^m)(V − V^T)) ≤ 0.
Letting s ↑ 1 gives the Proposition as stated. 2
The proof in [147] of (9.19) has no simple probabilistic interpretation,
and it would be interesting to find a probabilistic proof. It is not clear to
me whether Conjecture 9.22 could be proved in a similar way.
Here is the probabilistic interpretation of Proposition 9.21. Recall the
elementary result (yyy) that in a n-state chain
Σa Σb πa pab Eb Ta = n − 1. (9.20)
Corollary 9.23 Σa Σb πa pab Ea Tb ≤ n − 1.
Corollary 9.24 Assuming Conjecture 9.22 is true, τ0 (λ) ≤ τ0 (1/2) for all
0 ≤ λ ≤ 1.
In other words, making the chain “more reversible” tends to increase mean
hitting times.
Proof. This depends on results about differentiating with respect to
the transition matrix, which we present as slightly informal calculations.
Introduce a “perturbation” matrix Q such that
Σj qij = 0 ∀i; qij = 0 whenever pij = 0. (9.21)
(d/dθ) Ea Tb = Σi Ea Ni(Tb) Σj qij Ej Tb.
This holds because the Σj term gives the effect on ETb of a Q-step from i.
Using general identities from Chapter 2 yyy, and (9.21), this becomes
(d/dθ) Ea Tb = Σi [πi(zab − zbb)/πb + zbi − zai] Σj qij zjb/πb.
Now specialize to the case where π is the stationary distribution for each
P + θQ, that is where
Σi πi qij = 0 ∀j.
9.5. AN EXAMPLE CONCERNING EIGENVALUES AND MIXING TIMES
β = max{|λu | : 2 ≤ u ≤ n}.
Lemma 9.25 There exists a family of n-state chains, with uniform station-
ary distributions, such that supn βn < 1 while inf n αn (n) > 0.
P(Y ≤ j) = (j + 1)/(j + 2), 0 ≤ j ≤ n − 2.
Xt = max(Xt−1 − 1, Yt ).
This chain has the property (cf. the “patterns in coin-tossing” chain) of
attaining the stationary distribution in finite time. Precisely: for any initial
distribution σ, the distribution of Xn−1 is uniform, and hence Xt is uniform
for all t ≥ n − 1. To prove this, we simply observe that for 0 ≤ j ≤ n − 1,
βn = max{|λu | : 2 ≤ u ≤ n}
is bounded away from 1, but we can avoid proving this by considering the
“lazy” chain X̂t with transition matrix P̂ = (I + P)/2, for which by (9.23)
β̂n ≤ sup{|(1 + λ)/2| : |λ| ≤ 1, Re(λ) ≤ 0} = (1/2)^{1/2}.
So the family of lazy chains has the eigenvalue property asserted in Lemma
9.25. But by construction, Xt ≥ X0 − t, and so P (X0 > 3n/4, Xn/2 <
n/4) = 0. For the lazy chains we get
9.6 Miscellany
9.6.1 Mixing times for irreversible chains
In Chapter 4 yyy we discussed equivalences between different definitions of
“mixing time” in the τ1 family. Lovasz and Winkler [241] give a detailed
treatment of analogous results in the non-reversible case.
xxx state some of this ?
p(i, j) = 0, j ≥ i
c(A) = |A||A^c| / (n Σ_{i∈A} Σ_{j∈A^c} p(i, j))
n^{1/2}/(K1 τ2^{1/2} log n) ≤ E∆ ≤ K2 τ2^{1/2} n^{1/2} log n (9.24)
Section 9.3. The first “pure simulation” algorithm for sampling exactly
from the stationary distribution was given by Asmussen et al [35], using a
quite different idea, and lacking explicit time bounds.
Section 9.3.1. In our discussion of these algorithms, we are assuming
that we have a list of all states. Lovasz - Winkler [242] gave the argument in
a slightly different setting, where the algorithm can only “address” a single
state, and their bound involved maxij Ei Tj in place of τ1∗ .
Section 9.3.3. Letac [226] gives a survey of the “backwards coupling”
method for establishing convergence of continuous-space chains: it suffices
to show there exists a r.v. X−∞ such that X0^{(x,s)} → X−∞ a.s. as s → −∞,
for each state x. This method is especially useful in treating matrix-valued
chains of the form Xt = At Xt−1 + Bt , where (At , Bt ), t ≥ 1 are i.i.d. random
matrices. See Barnsley and Elton [43] for a popular application.
Section 9.4.1. One result on spectra and reversibilizations is the follow-
ing. For a transition matrix P write
τ(P) = sup{1/(1 − |λ|) : λ ≠ 1 an eigenvalue of P}.
CHAPTER 10. SOME GRAPH THEORY AND RANDOMIZED ALGORITHMS (SEPTEMB
10.1 Expanders
10.1.1 Definitions
The Cheeger time constant τc discussed in Chapter 4 section 5.1 (yyy 10/11/94
version) becomes, for a r-regular n-vertex graph,
τc = sup_A (r/n) |A||A^c| / |E(A, A^c)|
where E(A, Ac ) is the set of edges from a proper subset A of vertices to its
complement Ac . Our version of Cheeger’s inequality is (Chapter 4 Corollary
37 and Theorem 40) (yyy 10/11/94 version)
τc ≤ τ2 ≤ 8τc². (10.1)
Definition. An expander family is a sequence Gn of r-regular graphs (for
some fixed r > 2), with n → ∞ through some subsequence of integers, such
that
sup_n τc(Gn) < ∞
or equivalently (by Cheeger’s inequality)
sup_n τ2(Gn) < ∞.
One informally says “expander” for a generic graph Gn in the family. The
expander property is stronger than the rapid mixing property exemplified
by the d-cube (Chapter 5 Example 15) (yyy 4/23/96 version). None of the
examples in Chapter 5 is an expander family, and indeed there are no known
elementary examples. Certain random constructions of regular graphs yield
expanders: see Chapter 30 Proposition 1 (yyy 7/9/96 version). Explicit
constructions of expander families, in particular the celebrated Ramanujan
graphs, depend on group- and number-theoretic ideas outside our scope: see
the elegant monograph of Lubotzky [243].
Graph parameters like τc are more commonly presented in inverted form
(i.e. like 1/τc ) as coefficients of expansion such as
h := inf_A |E(A, A^c)| / (r min(|A|, |A^c|)). (10.2)
A more familiar version ([93] page 26) of Cheeger’s inequality in graph theory
becomes, on regular graphs,
h²/2 ≤ 1 − λ2 ≤ 2h. (10.3)
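As a small illustration of (10.3) (not from the text), take the n-cycle with r = 2: h can be found by brute force over all proper vertex subsets, while λ2 = cos(2π/n) is the known second eigenvalue of random walk on the cycle.

```python
import math
from itertools import combinations

n = 8
edges = [(i, (i + 1) % n) for i in range(n)]
# h = inf over proper subsets A of |E(A, A^c)| / (r * min(|A|, |A^c|)), r = 2.
h = min(
    sum((u in A) != (v in A) for (u, v) in edges) / (2 * min(size, n - size))
    for size in range(1, n)
    for A in map(set, combinations(range(n), size))
)
lam2 = math.cos(2 * math.pi / n)         # second eigenvalue of the n-cycle walk
assert h * h / 2 <= 1 - lam2 <= 2 * h    # Cheeger's inequality (10.3)
```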
τ1 = Θ(log n) (10.4)
τ0 = Θ(n) (10.5)
τ∗ = Θ(n) (10.6)
sup_v Ev C = Θ(n log n) (10.7)
This immediately gives the upper bound τ1 = O(log n). For the lower bound,
having bounded degree obviously implies that the diameter ∆ of the graph
satisfies ∆ = Ω(log n). And since the mean distance between an initial
vertex v and the position XT of the walk at a stopping time T is at most
ET, the definition of τ1^{(2)} implies d(v, w) ≤ 2τ1^{(2)} for any pair of vertices,
that is τ1^{(2)} ≥ ∆/2. This establishes (10.4). The general Markov chain fact
τ0 = Ω(n) is Chapter 3 Proposition 14 (yyy 1/26/93 version). Chapter 4
Lemma 25 gives τ0 ≤ 2nτ2 . Combining these and the obvious inequality τ0 ≤
τ ∗ /2 establishes (10.5,10.6). Finally, the lower bound in (10.7) follows from
the general lower bound in Chapter 6 Theorem 31 (yyy 10/31/94 version),
while the upper bound follows from the upper bound on τ ∗ combined with
Chapter 2 Theorem 36 (yyy 8/18/94 version).
In many ways the important aspect of Theorem 10.1 is that τ1 -type
mixing times are of order log n. We spell out some implications below.
These hold for arbitrary regular graphs, though the virtue of expanders is
that τ2 is bounded.
Pv(X̃t = w) ≥ (1/n)(1 − 1/2^j) for all t ≥ jK2 τ2 log n and all vertices v, w.
Proof. Part (i) is just the definition of τ1^{(2)}, combined with (10.8) and the
fact τ1^{(2)} = O(τ1).
yyy relate (ii) to Chapter 4 section 3.3
Repeated use of (i) shows that we can get independent samples from π
by sampling at random times T1 , T2 , T3 , . . . with E(Tj+1 − Tj ) ≤ K1 τ2 log n.
Alternatively, repeated use of (ii) shows that we can get almost indepen-
dent samples from π by examining the lazy chain at deterministic times, as
follows.
P(Y1 = y1, . . . , YL = yL) ≥ n^{−L}(1 − L/2^j) for all L, y1, . . . , yL;
the variation distance between dist(Y1, . . . , YL) and π × · · · × π is at most L/2^j.
Examining the lazy chain at deterministic times means sampling the original
walk at random times, but at bounded random times. Thus we can get L
precisely independent samples using (i) in mean number K1 Lτ2 log n of steps,
but without a deterministic upper bound on the number of steps. Using
Corollary 10.3 we get almost independent samples (up to variation distance
ε) in a number of steps deterministically bounded by K2 L log(L/ε)τ2 log n.
where the log n term is needed for the 2-dimensional torus. We now outline
a counter-example.
Take m copies of the complete graph on m vertices. Distinguish one
vertex vi from each copy i. Add edges to make the (vi) the vertices of an
r-regular expander. For this graph Gm we have, as m → ∞ with fixed r,
n = m², τ2 = Θ(m²), τ0 = Θ(m³),
contradicting conjecture (10.9). We leave the details to the reader: the key
point is that random walk on Gm may be decomposed as random walk on
the expander, with successive steps in the expander separated by sojourns
of times Θ(m2 ) within a clique.
because obviously
if d(v, w) = ∆ and t < ∆/2 then ||Pv (Xt ∈ ·) − Pw (Xt ∈ ·)|| = 1. (10.11)
τ1^{disc} ≤ ⌈(1 + (1/2) log n)/log(1/β)⌉, β := max(λ2, −λn). (10.12)
paths as i varies,
P(N′vw ≥ m) ≤ (Σi pi)^m /m!  (expand the sum)
≤ κ^m/m! ≤ (eκ/m)^m.
Choosing m = ⌈3 max(eκ, log n)⌉ makes the bound less than 1/(2n²) and so
P(max_{(v,w)} N′vw ≥ m) < 1/2.
Now repeat the entire construction to define another copy (Y_t^{(i)}, 0 ≤ t ≤ Vi)
of chain segments with traversal counts N″vw. Since X^{(i)}_{Ui} and Y^{(v(i))}_{Vi} have the
same uniform distribution, for each i we can construct the chain segments
jointly such that X^{(i)}_{Ui} = Y^{(v(i))}_{Vi}. Concatenating paths gives a (non-Markov)
random path from i to v(i). Then
P(max_{(v,w)} (N′vw + N″vw) ≥ 2m) < 1.
It’s fun to say this as a word problem in the spirit of Chapter 1. Suppose
your new cyberpunk novel has been rejected by all publishers, so you have
published it privately, and seek to sell copies by mailing advertisements to
individuals. So you need to buy mailing lists (from e.g. magazines and
specialist bookshops catering to science fiction). Your problem is that such
mailing lists might have much overlap. So before buying L lists A1 , . . . , AL
(where Ai is a set of |Ai | names and addresses) you would like to know
roughly the size |∪i Ai | of their union. How can you do this without knowing
what the sets Ai are (the vendors won’t give them to you for free)? Statistical
sampling can be used here. Suppose the vendors will allow you to randomly
sample a few names (so you can check accuracy) and will allow you to
“probe” whether a few specified names are on their list. Then you can
sample k times from each list, and for each sampled name Xij probe the
other lists to count the number m(Xij ) ≥ 1 of lists containing that name.
Consider the identity
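The display is omitted above; the identity presumably meant (stated here as our assumption) is |A1 ∪ · · · ∪ AL| = Σi Σ_{x∈Ai} 1/m(x), since each x in the union is counted m(x) times with weight 1/m(x). Sampling Xij uniformly from Ai then estimates |Ai| E[1/m(Xij)], summed over i. A sketch with made-up lists:

```python
import random

rng0 = random.Random(7)
lists = [set(rng0.sample(range(1000), 300)) for _ in range(4)]
union = set().union(*lists)

def m(x):
    # Number of lists containing name x (the "probe" count), always >= 1.
    return sum(x in A for A in lists)

# Exact form of the assumed identity.
exact = sum(sum(1.0 / m(x) for x in A) for A in lists)
assert abs(exact - len(union)) < 1e-6

# Monte Carlo version: k sampled names per list.
rng = random.Random(0)
k = 2000
estimate = sum(
    len(A) * sum(1.0 / m(rng.choice(items)) for _ in range(k)) / k
    for A in lists
    for items in [sorted(A)]
)
assert abs(estimate - len(union)) / len(union) < 0.1
```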
As in the previous example, the key point is that the cost of “approximately
counting” ∪i Ai to within a small relative error does not depend on the size
of the sets.
starting vertex and m steps of random walk on the expander, using about
b + m log2 r random bits. For each of the m + 1 vertices of {0, 1}b visited by
the walk (Yi , 0 ≤ i ≤ m), compute A(x, Yi ), and output Yes or No according
to the majority of the m + 1 individual outputs. The error probability is at
most
max_B Pπ (Nm+1(B)/(m + 1) − π(B) ≥ 1/3),
where Nm+1 (B) is the number of visits to B by the walk (Yi , 0 ≤ i ≤ m).
By the large deviation bound for occupation measures (Theorem 10.11, yyy
to be moved to other chapter) this error probability is at most
at a uniform random vertex U and let H(v) be the first hitting time on v.
Then H is a random function satisfying the constraint, minimized at v = U ,
but
Theorem 10.9 ([6]) Every algorithm for locating U by examining values
H(v) requires examining a mean number Ω(2d/2−ε ) of vertices.
The argument is simple in outline. As a preliminary calculation, consider
random walk on the d-cube of length t0 = O(2d/2−ε ), started at 0, and let
Lv be the time of the last visit to v, with Lv = 0 if v is not visited. Then
ELv ≤ Σ_{t=1}^{t0} t P0(X(t) = v) = O(1) (10.13)
t=1
where the O(1) bound holds because the worst-case v for the sum is v = 0
and, switching to continuous time,
∫_0^{2^{d/2}} t P0(X̃(t) = 0) dt = ∫_0^{2^{d/2}} t ((1 + e^{−2t/d})/2)^d dt = O(1).
Now consider an algorithm which has evaluated H(v1 ), . . . , H(vm ) and
write t0 = min_{i≤m} H(vi) = H(v∗), say. It does no harm to suppose t0 =
O(2d/2−ε ). Conditional on the information revealed by H(v1 ), . . . , H(vm ),
the distribution of the walk (X(t); 0 ≤ t ≤ t0 ) is specified by
(a) take a random walk from a uniform random start U , and condition on
X(t0 ) = v ∗ ;
(b) condition further on the walk not hitting {vi } before time t0 .
The key point, which of course is technically hard to deal with, is that the
conditioning in (b) has little effect. If we ignore the conditioning in (b), then
by reversing time we see that the random variables (H(v ∗ ) − H(v))+ have
the same distribution as the random variables Lv (up to vertex relabeling).
So whatever vertex v the algorithm chooses to evaluate next, inequality
(10.13) shows that the mean improvement E(H(v ∗ ) − H(v))+ in objective
value is O(1), and so it takes Ω(2d/2−ε ) steps to reduce the objective value
from 2d/2−ε to 0.
From the analysis of random walk on the d-cube (Chapter 5 Example 15)
(yyy 4/23/96 version) one can show
load is O(d/ log d). However, [47] shows that by locally redistributing tasks
assigned to the same processor, one can reduce the maximal load to O(1)
while maintaining the dilation at O(log d).
cij ≤ αc̃ij
In other words,
Σv Σw πw pwv m(v, w) = (n − 1)c̄. (10.16)
Now apply the definition (10.15) of s(P, C) to the sequence of states visited
by the stationary time-reversed chain P∗ ; by considering the mean of each
step,
Σv Σw πv p∗vw m(v, w) ≤ s(P, C) Σv Σw πv p∗vw cvw. (10.17)
But the left sides of (10.16) and (10.17) are equal by definition of P∗ , and
the sum in the right of (10.17) equals c̄ by symmetry of C, establishing (a).
For (b), first note that the definition (10.15) of stretch is equivalent to
s(P, C) = max_σ (Σi m(vi, vi+1)) / (Σi cvi,vi+1), (10.18)
the max over cycles σ.
Now the hypothesis of (b) is that P is reversible and cvw = t(v, w) + t(w, v).
So the second term of (10.19) equals 2(n − 1) by Chapter 3 Lemma 6 (yyy
9/2/94 version) and the first term equals 1/2 by the cyclic tour property
Chapter 3 Lemma 1 (yyy 9/2/94 version). So for each cycle σ the ratio in
(10.18) equals n − 1, establishing (b).
10.5. APPROXIMATE COUNTING VIA MARKOV CHAINS 351
S = SL ⊃ SL−1 ⊃ . . . ⊃ S2 ⊃ S1 (10.20)
where |S1| is known, the ratios pi := |Si−1|/|Si| are bounded away from 0,
and where we can sample uniformly from each Si. Then take k uniform
random samples from each Si and find the sample proportion Wi which fall
into Si−1. Because |S| = |S1| Π_{i=2}^L |Si|/|Si−1|, we use the estimator
N̂ := |S1| Π_{i=2}^L Wi^{−1}.
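A sketch of the scheme on made-up nested sets (not from the text): take Si = {0, . . . , 2^i − 1}, so every true ratio is 1/2 and |S| = |S1| · 2^{L−1}; the product of inverse sample proportions recovers |S| approximately.

```python
import random

L = 10
S = [list(range(2 ** i)) for i in range(1, L + 1)]   # S[0] = S_1, ..., S[L-1] = S_L
rng = random.Random(3)
k = 8000                                             # samples per level

N_hat = float(len(S[0]))
for i in range(1, L):
    # W = sample proportion of points of S_{i+1} that land in S_i;
    # membership test: x in S_i iff x < |S_i|.
    W = sum(rng.choice(S[i]) < len(S[i - 1]) for _ in range(k)) / k
    N_hat /= W
assert abs(N_hat - len(S[-1])) / len(S[-1]) < 0.3    # true value is 1024
```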
The simplest case is where we know a theoretical lower bound p∗ for the pi .
Then by taking k = O(ε−2 L/p∗ ) we get
var(|S|/N̂) ≤ exp(L/(p∗k)) − 1 = O(ε²).
In other words, with a total number
Two particular examples have been studied in detail, and historically these
examples provided major impetus for the development of technical tools to
estimate mixing times. Though the details are too technical for this book,
we outline these examples in the next two sections, and then consider in
detail the setting of self-avoiding walks.
B(1) = K0 ⊂ K1 ⊂ . . . ⊂ KL = K
harder setting of counting perfect matchings see the Notes). Enumerate the
edges of G0 as e1 , e2 , . . . , eL , where L is the number of edges of G0 . Write
Gi for the graph G with edges e1 , . . . , ei deleted. A matching of Gi can be
identified with a matching of Gi−1 which does not contain ei , so we can
write
M(GL−1 ) ⊂ M(GL−2 ) ⊂ . . . ⊂ M(G1 ) ⊂ M(G0 ).
Since GL−1 has one edge, we know |M(GL−1 )| = 2. The ratio |M(Gi+1 )|/|M(Gi )|
is the probability that a uniform random matching of Gi does not contain
the edge ei+1 . So the issue is to design and analyze a chain on a typical
set M(Gi ) of matchings whose stationary distribution is uniform. Here is a
natural such chain. From a matching M0 , pick a uniform random edge e of
G0 , and construct a new matching M1 from M0 and e as follows.
If e ∈ M0 then set M1 = M0 \ {e}.
If neither end-vertex of e is in an edge of M0 then set M1 = M0 ∪ {e}.
If exactly one end-vertex of e is in an edge (e′ say) of M0 then set
M1 = {e} ∪ M0 \ {e′}.
This construction (the idea goes back to Broder [64]) yields a chain with
symmetric transition matrix, because each possible transition has chance
1/L. An elegant analysis by Jerrum and Sinclair [199], outlined in Jerrum
[197] section 5.1, used the distinguished paths technique to prove that on a
n-vertex L-edge graph
τ2 = O(Ln).
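The chain itself is simple to sketch and check on a tiny made-up graph (the text leaves the case where both end-vertices of e are busy implicit; we assume the chain holds there). Enumerating all matchings confirms the transition matrix is symmetric, so the uniform distribution is stationary.

```python
from itertools import combinations
from fractions import Fraction

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # made-up 4-vertex graph
L = len(edges)

def is_matching(M):
    used = [v for e in M for v in e]
    return len(used) == len(set(used))

matchings = [frozenset(M) for r in range(len(edges) + 1)
             for M in combinations(edges, r) if is_matching(M)]

def move(M, e):
    if e in M:
        return M - {e}
    touching = [f for f in M if set(f) & set(e)]
    if not touching:
        return M | {e}
    if len(touching) == 1:
        return (M - {touching[0]}) | {e}
    return M                      # both end-vertices busy: hold (assumption)

idx = {M: i for i, M in enumerate(matchings)}
P = [[Fraction(0)] * len(matchings) for _ in matchings]
for M in matchings:
    for e in edges:               # each move has probability 1/L
        P[idx[M]][idx[move(M, e)]] += Fraction(1, L)

for i in range(len(matchings)):
    assert sum(P[i]) == 1
    for j in range(len(matchings)):
        assert P[i][j] == P[j][i]    # symmetric => uniform is stationary
```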
Since the number of matchings can be bounded crudely by n!,
Lvw = 1 if w = v,
    = −(dv dw)^{−1/2} for an edge (v, w), (10.23)
    = 0 else.
In the regular case, −L is the Q-matrix of transition rates for the continuous-
time random walk, and so Chung’s eigenvalues are identical to our continuous-
time eigenvalues. In the non-regular case there is no simple probabilistic in-
terpretation of L and hence no simple probabilistic interpretation of results
involving the relaxation time 1/λ2 associated with L.
Section 10.2.1. Chung [93] Chapter 3 gives more detailed results about
diameter and eigenvalues. One can slightly sharpen the argument for Propo-
sition 10.4 by using (10.11) and the analog of Chapter 4 Lemma 26 (yyy
10/11/94 version) in which the threshold for τ1disc is set at 1 − ε. Such argu-
ments give bounds closer to that of [93] Corollary 3.2: if G is not complete
then
∆ ≤ ⌈ log(n − 1) / log( (3 − λ2)/(1 + λ2) ) ⌉.
356CHAPTER 10. SOME GRAPH THEORY AND RANDOMIZED ALGORITHMS (SEPTEMB
Section 10.2.2. Chung [93] section 4.4 analyzes a somewhat related rout-
ing problem. Broder et al [70] analyze a dynamic version of path selection
in expanders.
Section 10.3.1. Example 10.7 (union of sets) and the more general DNF
counting problem were studied systematically by Karp et al [210]; see also
[265] section 11.2.
The Solovay-Strassen test of primality depends on a certain property of
the Jacobi symbol: see [265] section 14.6 for a proof of this property.
Section 10.4.1. Several other uses of random walks on expanders can
be found in Ajtai et al [4], Cohen and Wigderson [97], Impagliazzo and
Zuckerman [188].
Section 10.4.4. Tetali [325] discusses extensions of parts (a,b) of Proposition 10.10 to nonsymmetric cost matrices.
Section 10.5. More extensive treatments of approximate counting are in
Sinclair [309] and Motwani and Raghavan [265] Chapter 12.
Jerrum et al [201] formalize a notion of self-reducibility and show that,
under this condition, approximate counting can be performed in polynomial
time iff approximately uniform sampling can. See Sinclair [309] section 1.4
for a nice exposition.
Abstractly, we are studying randomized algorithms which produce a ran-
dom estimate â(d) of a numerical quantity a(d) (where d measures the “size”
of the problem) together with a rigorous bound of the form
π(M) ∝ λ^{|M|}
for a parameter λ > 1. The distinguished paths technique [199, 197] giving
(10.22) works in this setting to give
τ1 = O(Ln² λ log(nλ)).
in exit times from a set A with π(A) small, i.e. hitting times on Ac where
π(Ac ) is near 1. In this setting one can replace inequalities using τ2 or τc
(which parameters involve the whole chain) by inequalities involving analo-
gous parameters for the chain restricted to A and its boundary. See Babai
[36] for uses of such bounds.
On several occasions we have remarked that for most properties of ran-
dom walk, the possibility of an eigenvalue near −1 (i.e. an almost-bipartite
graph) is irrelevant. An obvious exception arises when we consider lower
bounds for Pπ (TA > t) in terms of |A|, because in a bipartite graph with
bipartition {A, Ac } we have P (TA > 1) = 0. It turns out (Alon et al
[27] Proposition 2.4) that a corresponding lower bound holds in terms of
τn ≡ 1/(λn + 1).
Pπ(TA > t) ≥ ( max(0, 1 − τn |A|/n) )^t .
Chapter 11
362CHAPTER 11. MARKOV CHAIN MONTE CARLO (JANUARY 8 2001)
ḡ := Σ_x π(x)g(x), or := ∫ g dπ, for specified g : S → R), how can one estimate this numerical quantity using Monte Carlo (i.e. randomized algorithm) methods?
Variations of this basic idea include running multiple chains and introducing
auxiliary variables (i.e. defining a chain on some product space S × A). The
basic scheme and variations are what make up the field of MCMC. Though
there is no a priori reason why one must use reversible chains, in practice
the need to achieve a target distribution π as stationary distribution makes
general constructions using reversibility very useful.
MCMC originated in statistical physics, but mathematical analysis of its uses there is too sophisticated for this book, so let us think instead of
Bayesian statistics with high-dimensional data as the prototype setting for
MCMC. So imagine a point x ∈ Rd as recording d numerical characteristics
of an individual. Data on n individuals is then represented as an n × d matrix
x = (xij ). As a model, we first take a parametric family φ(θ, x) of probability
densities; that is, θ ∈ Rp is a p-dimensional parameter and for each θ the
function x → φ(θ, x) is a probability density on Rd . Finally, to make a
Bayes model we take θ to have some probability density h(θ) on Rp . So
the probability model for the data is: first choose θ according to h(·), then
choose (xi· ) i.i.d. with density φ(θ, x). So there is a posterior distribution
11.1. OVERVIEW OF APPLIED MCMC 363
on θ specified by
fx(θ) := h(θ) Π_{i=1}^{n} φ(θ, xi·) / zx   (11.1)
where zx is the normalizing constant. Our goal is to sample from fx (·),
for purposes of e.g. estimating posterior means of real-valued parameters.
An explicit instance of (11.1) is the hierarchical Normal model, but the
general form of (11.1) exhibits features that circumscribe the type of chains
it is feasible to implement in MCMC, as follows.
(i) Though the underlying functions φ(·, ·), h(·) which define the model
may be mathematically simple, our target distribution fx (·) depends on
actual numerical data (the data matrix x), so it is hard to predict, and
dangerous to assume, global regularity properties of fx (·).
(ii) The normalizing constant zx is hard to compute, so we want to define
chains which can be implemented without calculating zx .
The wide range of issues arising in MCMC can loosely be classified as “de-
sign” or “analysis” issues. Here “design” refers to deciding which chain to
simulate, and “analysis” involves the interpretation of results. Let us start
by discussing design issues. The most famous general-purpose method is
the Metropolis scheme, of which the following is a simple implementation in
setting (11.1). Fix a length scale parameter l. Define a step θ → θ(1) of a
chain as follows.
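The display defining the step does not survive legibly in this copy; the following is a minimal sketch of the standard random-walk implementation the surrounding text describes, assuming a uniform proposal of scale l in each coordinate and working with log fx so that the constant zx never enters. All names are illustrative.

```python
import math, random

def metropolis_step(theta, log_f, l):
    # One random-walk Metropolis step theta -> theta^(1); log_f returns the
    # log of the unnormalized target density, so z_x is never needed.
    proposal = [t + random.uniform(-l, l) for t in theta]
    log_ratio = log_f(proposal) - log_f(theta)   # log of f(theta')/f(theta)
    if random.random() < math.exp(min(0.0, log_ratio)):
        return proposal      # accept the proposed move
    return theta             # reject: stay at theta
```

The other proposal distributions mentioned below (Normal, symmetrized exponential, Cauchy, isotropic steps) drop in by replacing the proposal line.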
The target density enters the definition only via the ratios fx(θ′)/fx(θ), so the value of zx is not needed. The essence of a Metropolis scheme is that there is a proposal chain which proposes a move θ → θ′, and then
an acceptance/rejection step which accepts or rejects the proposed move.
See section 11.2.1 for the general definition, and proof that the stationary
distribution is indeed the target distribution. There is considerable flexibility
in the choice of proposal chain. One might replace the uniform proposal
step by a Normal or symmetrized exponential or Cauchy jump; one might
instead choose a random (i.e. isotropic) direction and propose to step some
random distance in that direction (to make an isotropic Normal step, or a
step uniform within a ball, for instance). There is no convincing theory to
So simulate a single run of the chain, from some initial state, for some large
number t of steps. Estimate ḡ by
ĝ = (1/(t − t0)) Σ_{i=t0+1}^{t} g(Xi)   (11.3)
different nodes (if such existed), then one can seek schemes specially adapted
to multimodal target densities. Because it is easy to find local maxima of a
target density f , e.g. by a deterministic hill-climbing algorithm, one can find
modes by repeating such an algorithm from many initial states, to try to
find an exhaustive list of modes with relatively high f -values. This is mode-
hunting; one can then design a chain tailored to jump between the wells
with non-vanishing probabilities. Such methods are highly problem-specific;
more general methods (such as the multi-level or multi-particle schemes of
sections 11.3.3 and 11.3.4) seek to automate the search for relevant modes
within MCMC instead of having a separate mode-hunting stage.
In seeking theoretical analysis of MCMC one faces an intrinsic difficulty:
MCMC is only needed on “hard” problems, but such problems are difficult
to study. In comparing effectiveness of different variants of MCMC it is
natural to say “forget about theory – just see what works best on real
examples”. But such experimental evaluation is itself conceptually difficult:
pragmatism is easier in theory than in practice!
produces an output with density f (·) after mean c steps. By combining these
two methods, libraries of algorithms for often-encountered one-dimensional
distributions can be built, and indeed exist in statistical software packages.
But what about a general density f (x)? If we need to sample many times
from the same density, it is natural to use deterministic numerical methods.
First probe f at many values of x. Then either
(a) build up a numerical approximation to F and thence to F −1 ; or
(b) choose from a library a suitable density g and use rejection sampling.
The remaining case, which is thus the only “hard” aspect of sampling from
one-dimensional distributions, is where we only need one sample from a
general distribution. In other words, where we want many samples which
are all from different distributions. This is exactly the setting of the Gibbs
sampler where the target multidimensional density is complicated, and thus
motivates some of the variants we discuss in section 11.3.
Practical relevance of theoretical mixing time parameters. Standard theory
from Chapter 4 (yyy cross-refs) relates τ2 to the asymptotic variance rate
σ 2 (g) at (11.2) for the “worst-case” g:
τ2 = 1/(1 − λ2) ≈ (1 + λ2)/(1 − λ2) = sup_g σ²(g)/var_π g.   (11.4)
refers to worst-case initial state, requiring a burn-in time of τ1 seems far too
conservative in practice. The bottom line is that one cannot eliminate the
possibility of metastability error; in general, all one gets from multiple-runs
and diagnostics is confidence that one is sampling from a single potential
well, in the imagery below (though section 11.6.2 indicates a special setting
where we can do better).
One can call H a potential function; note that a mode (local maximum) of π
is a local minimum of H. One can envisage a realization of a Markov chain
as a particle moving under the influence of both a potential function (the
particle responds to some “force” pushing it towards lower values of H) and
random noise. Associated with each local minimum y of H is a potential
well, which we envisage as the set of points which under the influence of the
potential only (without noise) the particle would move to y (in terms of π,
states from which a “steepest ascent” path leads to y).
A fundamental intuitive picture is that the main reason why a reversible
chain may relax slowly is that there is more than one potential well, and the
chain takes a long time to move from one well to another. In such a case,
π conditioned to a single potential well will be a metastable (i.e. almost-
stationary) distribution. One expects the chain’s distribution, from any
initial state, to reach fairly quickly one (or a mixture) of these metastable
distributions, and then the actual relaxation time to stationarity is domi-
nated by the times taken to move between wells. In more detail, if there are
w wells then one can consider, as a coarse-grained approximation, a w-state
continuous-time chain where the transition rates w1 → w2 are the rates of
moving from well w1 to well w2 . Then τ2 for the original chain should be
closely approximated by τ2 for the coarse-grained chain.
fx(μ1, . . . , μn, σ1, . . . , σn) = zx^{−1} Π_{i=1}^{n} h(μi, σi) φ(μi, σi², xi).
11.2. THE TWO BASIC SCHEMES 369
g(µ1 , . . . , µn , σ1 , . . . , σn ) := µi .
• accept the move (i.e. set x′ = y) with probability min(1, πy/πx), otherwise stay (x′ = x).
• accept a proposed move x → y with probability min(1, (πy kyx)/(πx kxy)).
The general case is often called Metropolis-Hastings – see Notes for termi-
nological comments.
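That the general rule preserves π can be checked numerically on a toy state space: build the full transition matrix and verify detailed balance. A sketch, in which the small π and proposal matrix k are arbitrary illustrations:

```python
def mh_transition_matrix(pi, k):
    # Metropolis-Hastings transition matrix from target pi and proposal
    # matrix k, with acceptance min(1, pi_y k_yx / (pi_x k_xy)).
    n = len(pi)
    p = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x and k[x][y] > 0:
                accept = min(1.0, pi[y] * k[y][x] / (pi[x] * k[x][y]))
                p[x][y] = k[x][y] * accept
        p[x][x] = 1.0 - sum(p[x])   # holding probability from rejections
    return p
```

Since πx pxy = min(πx kxy, πy kyx) is symmetric in (x, y), detailed balance holds term by term, and the check below is exact up to rounding.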
and then (11.6) makes it clear that πx pxy = πy pyx . For irreducibility, we
need the condition
The term u^{d−1} here arises as a Jacobian; see Liu [235] Chapter 8 for explanation and more examples in Rd.
It is easy to check that the chain has stationary distribution π, and is re-
versible if the K i are reversible, so in particular if the K i are defined by a
Metropolis-type propose-accept scheme. In the simplest setting where the
line sampler is the Gibbs sampler and we use the same one-dimensional proposal step distribution each time, this scheme is Metropolis-within-Gibbs. In that context it seems intuitively natural to use a long-tailed proposal distribution such as the Cauchy distribution. We might encounter wildly different one-dimensional target densities, e.g. one density with s.d. 1/10 and another with two modes separated by 10; a U(−L, L) step proposal would be inefficient in the latter case if L is small, and inefficient in the former case if L is large. Intuitively, a long-tailed distribution avoids these worst cases, at the cost of a smaller acceptance rate in good cases.
pxy = m kxy Σ ( Π_{i=1}^{m−1} k_{x,y_i} ) ( Π_{i=1}^{m−1} k_{y,x_i} ) (πy / Σ_i π_{y_i}) min(1, q)
11.3. VARIANTS OF BASIC MCMC 373
where the first sum is over ordered (2m−2)-tuples (y1 , . . . , ym−1 , x1 , . . . , xm−1 ).
So we can write
πx pxy = m kxy πx πy Σ ( Π_{i=1}^{m−1} k_{x,y_i} Π_{i=1}^{m−1} k_{y,x_i} ) min( 1/Σ_i π_{y_i} , q/Σ_i π_{y_i} ).
The choice of q makes the final term become min( 1/Σ_i π_{y_i} , 1/Σ_i π_{x_i} ). One can
now check πx pxy = πy pyx , by switching the roles of xj and yj .
To compare MTM with single-try Metropolis, consider the m → ∞ limit,
in which the empirical distribution of y1 , . . . , ym will approach k(x, ·), and
so the distribution of the chosen yi will approach k(x, ·)π(·)/ax for ax := Σ_y kxy πy. Thus for large m the transition matrix of MTM will approximate
p∞_{xy} = (kxy πy / ax) min(1, ax/ay),   y ≠ x.
To compare with single-try Metropolis, rewrite both as
p∞_{xy} = kxy πy min(1/ax , 1/ay),   y ≠ x
pxy = kxy πy min(1/πx , 1/πy),   y ≠ x.
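A hedged sketch of one multiple-try step consistent with the formulas above, assuming a symmetric proposal and trial weights proportional to π (one common special case; all names are illustrative):

```python
import random

def mtm_step(x, pi, propose, m):
    # One multiple-try Metropolis step: symmetric proposal `propose`,
    # trial weights proportional to pi (an assumed special case).
    ys = [propose(x) for _ in range(m)]
    wy = [pi(z) for z in ys]
    y = random.choices(ys, weights=wy)[0]          # pick a try, prob prop. to pi
    xs = [propose(y) for _ in range(m - 1)] + [x]  # reference set, including x
    wx = [pi(z) for z in xs]
    if random.random() < min(1.0, sum(wy) / sum(wx)):
        return y                                   # accept the chosen try
    return x                                       # reject: stay at x
```

With m = 1 this reduces to ordinary Metropolis with a symmetric proposal; the acceptance ratio sum(wy)/sum(wx) is the quantity q discussed above.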
distribution and π by making the potential wells grow deeper. Fix a proposal
matrix K, and let Pθ be the transition matrix for the Metropolized chain
(11.5) associated with K and πθ . Now fix L and values 0 = θ1 < θ2 < . . . <
θL = 1. The idea is that for small θ the Pθ -chain should have less difficulty
moving between wells; for θ = 1 we get the correct distribution within each
well; so by varying θ we can somehow sample accurately from all wells.
There are several ways to implement this idea. Simulated tempering [254]
defines a chain on state space S × {1, . . . , L}, where state (x, i) represents
configuration x and parameter θi , and where each step is either of the form
• (x, i) → (x′, i); x → x′ a step of Pθi
or of the form
• (x, i) → (x, i′); where i → i′ is a proposed step of simple random walk on {1, 2, . . . , L}.
However, implementing this idea is slightly intricate, because normalizing
constants zθ enter into the desired acceptance probabilities. A more ele-
gant variation is the multilevel exchange chain suggested by Geyer [163] and
implemented in statistical physics by Hukushima and Nemoto [185]. First consider L independent chains, where the i'th chain X_t^{(i)} has transition matrix Pθi. Then introduce an interaction; propose to switch configurations X^{(i)} and X^{(i+1)}, and accept with the appropriate probability. Precisely,
take state space S L with states x = (x1 , . . . , xL ). Fix a (small) number
0 < α < 1.
• With probability 1 − α pick i uniformly from {1, . . . , L}, pick x′_i according to Pθi(xi, ·) and update x by changing xi to x′_i.
• With probability α, pick uniformly an adjacent pair (i, i + 1), and
propose to update x by replacing (xi , xi+1 ) by (xi+1 , xi ). Accept this
proposed move with probability
min( 1, (π_{θi}(x_{i+1}) π_{θ_{i+1}}(x_i)) / (π_{θi}(x_i) π_{θ_{i+1}}(x_{i+1})) ).
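As a sketch, assume for illustration the tempered form πθ(x) ∝ exp(−θH(x)) for a potential H (this form, and all names below, are assumptions for the example, not from the text); the swap acceptance probability above then reduces to exp((θi − θ_{i+1})(H(xi) − H(x_{i+1}))) capped at 1.

```python
import math, random

def exchange_step(xs, thetas, H, base_step, alpha=0.2):
    # One move of the multilevel exchange chain on S^L; xs[i] is the
    # configuration at level i, targeting pi_theta(x) ~ exp(-thetas[i]*H(x)).
    L = len(xs)
    if random.random() > alpha:
        i = random.randrange(L)            # ordinary update of one level
        xs[i] = base_step(xs[i], thetas[i])
    else:
        i = random.randrange(L - 1)        # propose swapping levels i, i+1
        log_ratio = (thetas[i] - thetas[i + 1]) * (H(xs[i]) - H(xs[i + 1]))
        if random.random() < math.exp(min(0.0, log_ratio)):
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs
```

On a multimodal target the high-temperature (small θ) levels cross between wells freely, and accepted swaps carry those crossings down to the level of interest.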
and hence
pxy = p^{Metro}_{xy} βxy
where βxy = βyx ≤ 1, y ≠ x. So the result follows directly from Peskun's
lemma (yyy Lemma 11.5, to be moved elsewhere). 2
This result can be interpreted as saying that the Metropolis rates (11.5)
are the optimal way of implementing a proposal-rejection scheme. Loosely
speaking, a similar result holds in any natural Metropolis-like construction
of a reversible chain using a max(1, ·) acceptance probability.
It is important to notice that Lemma 11.1 does not answer the following
question, which (except for highly symmetric graphs) seems intractable.
Question. Given a connected graph and a probability distribution π on its
vertices, consider the class of reversible chains with stationary distribution
π and with transitions only across edges of the graph. Within that class,
which chain has smallest relaxation time?
11.4. A LITTLE THEORY 377
Proof. For the chain started at state 1, the time T of the first acceptance of
a proposed step satisfies
P(T > t) = ρ^t .
Recall from (yyy Chapter 4-3 section 1; 10/11/99 version) the notion of
coupling. For this chain a natural coupling is obtained by using the same
U (0, 1) random variable to implement the accept/reject step (accept if U <
P (accept)) in two versions of the chain. It is easy to check this coupling
(Xt, Xt′) respects the ordering: if X0 ≤ X0′ then Xt ≤ Xt′. At time T the fact
that a proposed jump from 1 is accepted implies that a jump from any other
state must be accepted. So T is a coupling time, and the coupling inequality
(yyy Chapter 4-3 section 1.1; 10/11/99 version) implies d̄(t) ≤ P(T > t).
This establishes (a), and the general inequality d̄(t) = Ω(λ_2^t) implies λ2 ≤ ρ.
On the other hand, for the chain started at state 1, on {T = 1} the time-1
distribution is π; in other words
We outline the proof in section 11.5.3. The result may look complicated, so
one piece of background may be helpful. Given a probability distribution on
the integers, there is a Metropolis chain for sampling from it based on the
This result gives rise to the useful heuristic for random walk
Metropolis in practice:
Tune the proposal variance so that the average acceptance
rate is roughly 1/4.
We call this the diffusion heuristic for proposal-step scaling. Intuitively one
might hope that the heuristic would be effective for fairly general unimodal
target densities on Rd , though it clearly has nothing to say about the prob-
lem of passage between modes in a multimodal target. Note also that to
invoke the diffusion heuristic in a combinatorial setting, where the proposal
11.5. THE DIFFUSION HEURISTIC FOR OPTIMAL SCALING OF HIGH DIMENSIONAL METROPOL
chain is random walk on a graph, one needs to assume that the target dis-
tribution is “smooth” in the sense that π(v)/π(w) ≈ 1 for a typical edge
(v, w). In this case one can make a Metropolis chain in which the pro-
posal chain jumps σ edges in one step, and seek to optimize σ. See Roberts
[292] for some analysis in the context of smooth distributions on the d-cube.
However, such smoothness assumptions seem inapplicable to most practical
combinatorial MCMC problems.
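In practice the heuristic is applied empirically: run short pilot chains at several proposal scales and keep the scale whose measured acceptance rate is near 1/4. A one-dimensional sketch (names illustrative, not from the text):

```python
import math, random

def acceptance_rate(log_f, x0, l, steps=4000):
    # Empirical acceptance rate of random-walk Metropolis with U(-l, l)
    # proposal increments; a short one-dimensional pilot run.
    x, accepted = x0, 0
    for _ in range(steps):
        y = x + random.uniform(-l, l)
        if random.random() < math.exp(min(0.0, log_f(y) - log_f(x))):
            x, accepted = y, accepted + 1
    return accepted / steps
```

Scanning over l and keeping the value whose measured rate is closest to 1/4 gives a crude automatic tuner; the rate decreases as l grows, so a bisection on l also works.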
(x1 , x2 , . . . , xd ) → (x1 + ξ1 , x2 + ξ2 , . . . , xd + ξd ).
Write
J = log( f(x1 + ξ1)/f(x1) );   S = log Π_{i=2}^{d} ( f(xi + ξi)/f(xi) ).
This identifies the asymptotic drift and variance rates of Yd (t) with those of
Y (t).
Write h(u) := E min(1, eu+S ). Since
J ≈ log( 1 + (f′(x1)/f(x1)) ξ1 ) ≈ (f′(x1)/f(x1)) ξ1 = 2μ(x1)ξ1,
Since J has approximately Normal(0, 4μ²(x1) var ξ1 = 4μ²(x1)σ²/d) distribution, proving (11.14, 11.15) reduces to proving
h′(0) → θ/(2σ²)   (11.16)
h(0) → θ/σ².   (11.17)
We shall argue
log( f(xi + ξi)/f(xi) ) ≈ (f′(xi)/f(xi)) ξi − (1/2) (f′(xi)/f(xi))² ξi².
Write K(x) = d^{−1} Σ_{i=2}^{d} (f′(xi)/f(xi))². Summing the previous approximation over
i, the first sum on the right has approximately Normal(0, σ 2 K(x)) distri-
bution, and (using the weighted law of large numbers) the second term
is approximately − 12 σ 2 K(x). So the distribution of S is approximately
Normal(−K(x)σ 2 /2, K(x)σ 2 ). But by the law of large numbers, for a typ-
ical x drawn from the product distribution πf we have K(x) ≈ κ2 , giving
(11.18).
To argue (11.17) we pretend S has exactly the Normal distribution at
(11.18). By a standard formula, if S has Normal(α, β²) distribution then
E min(1, e^S) = Φ(α/β) + e^{α+β²/2} Φ(−β − α/β).
This leads to
h(0) = 2Φ(−κσ/2)
which verifies (11.17). From the definition of h(u) we see
Lemma 11.4 In the setting of (yyy Chapter 2 Lemma 37), suppose π does
not depend on α. Then
(d/dα) Z = ZRZ.
xxx JF: I see this from the series expansion for Z – what to do about a
proof, I delegate to you!
σ²(P, f) := lim_t t^{−1} var Σ_{s=1}^{t} f(Xs) = f Γf   (11.19)
(σ²(P, f))′ ≤ 0.
By (11.19),
(σ²(P, f))′ = f Γ′f = 2 Σ_i Σ_j fi πi z′_{ij} fj .
where Mij is the matrix whose only non-zero entries are m(i, i) = m(j, j) = 1; m(i, j) = m(j, i) = −1. Plainly Mij is non-negative definite, hence so is W′.
Chapter 12
388CHAPTER 12. COUPLING THEORY AND EXAMPLES (OCTOBER 11, 1999)
The reader may wish to look at a few of the examples before reading this
section in detail.
In applying coupling methodology there are two issues. First we need
to specify the coupling, then we need to analyze the coupling time. The
most common strategy for constructing couplings is via Markov couplings,
as follows. Suppose the underlying chain has state space I and (to take
12.1. USING COUPLING TO BOUND VARIATION DISTANCE 389
Finally, one does calculations with the integer-valued chain (Z*_t), either bounding the tail probability P(T*_{a0} > t) directly or (what is often simpler) just bounding the expectation ET*_{a0}, so that by Markov's inequality and the
That is, from vertices v and w, if the first chain jumps to x then the sec-
ond chain jumps to θv,w (x), and we maximize the chance of the two chains
meeting after a single step. In general one cannot get useful bounds on the
coupling time. But consider the dense case, where r > n/2. As observed
in Chapter 5 Example 16, here |N (v) ∩ N (w)| ≥ 2r − n and so the coupled
process (Xt , Yt ) has the property that for w 6= v
P(Xt+1 = Yt+1 | Xt = v, Yt = w) = |N(v) ∩ N(w)|/r ≥ (2r − n)/r
implying that the coupling time T (for any initial pair of states) satisfies
P(T > t) ≤ ( (n − r)/r )^t .
where the (ξu ) are independent exponential (rate 2/d) and d0 = |D(i, j)| is
the initial number of unmatched coordinates. So
P(T ≤ t) = (1 − exp(−2t/d))^{d0}
τ1 ≤ (1/2 + o(1)) d log d as d → ∞.
τ1 ∼ (1/4) d log d as d → ∞
Proof. We couple two versions of the chain by simply using the same v and
γ in both chains at each step. Write Dt for the number of vertices at which
the colors in the two chains differ. Then Dt+1 − Dt ∈ {−1, 0, 1} and the key
estimate is the following.
P(Dt+1 = Dt + 1) ≤ 2rDt/(cn)   (12.5)
P(Dt+1 = Dt − 1) ≥ (c − 2r)Dt/(cn)   (12.6)
Proof. In order that Dt+1 = Dt + 1 it is necessary that the chosen pair (v, γ)
is such that
(*) there exists a neighbor (w, say) of v such that w has color γ
in one chain but not in the other chain.
But the total number of pairs (v, γ) equals nc while the number of pairs
satisfying (*) is at most Dt · 2r. This establishes (12.5). Similarly, for
Dt+1 = Dt − 1 it is sufficient that v is currently unmatched and that no
neighbor of v in either chain has color γ; the number of such pairs (v, γ) is
at least Dt · (c − 2r). 2
Lemma 12.3 implies E(Dt+1 − Dt |Dt ) ≤ −(c − 4r)Dt /(cn) and so
EDt+1 ≤ κ EDt;   κ := 1 − (c − 4r)/(cn).
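The coupled chains are easy to simulate directly; a sketch, with the list-of-colors representation and all names my own:

```python
import random

def coupled_coloring_step(col1, col2, neighbors, c):
    # One step of the coupled chains: the same vertex v and colour gamma
    # are used in both copies; v is recoloured gamma in a copy only when
    # no neighbour of v already has colour gamma in that copy.
    v = random.randrange(len(col1))
    gamma = random.randrange(c)
    for col in (col1, col2):
        if all(col[w] != gamma for w in neighbors[v]):
            col[v] = gamma
```

When c > 4r the drift bound above forces the number of disagreements down geometrically, so the two copies coalesce quickly, which a small simulation confirms.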
With chance 1/d the two choices are the same card, so no change results.
To make a coupling analysis, we first give an equivalent reformulation.
This reformulation suggests the coupling in which the same choice of (a, i)
is used for each chain. In the coupled process (with two arbitrary starting
states) let Dt be the number of unmatched cards (that is, cards whose
positions in the two decks are different) after t steps. Then
(i) Dt+1 ≤ Dt .
(ii) P(Dt+1 ≤ j − 1 | Dt = j) ≥ j²/d².
Here (i) is clear, and (ii) holds because whenever the card labeled a and
the card in position i are both unmatched, the step of the coupled chain
creates at least one new match (of the card labeled a).
Noting that Dt cannot take value 1, we can use the decreasing functional
lemma (Lemma 12.1) to show that the coupling time T := min{t : Dt = 0}
satisfies
ET ≤ Σ_{j=2}^{d} d²/j² ≤ d² (π²/6 − 1).
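A sketch of the coupled shuffles, assuming the standard reformulation implicit above: pick a uniform card a and a uniform position i, and transpose card a with the card currently in position i, using the same (a, i) in both decks. The representation and names are my own.

```python
import random

def coupled_transposition_step(deck1, deck2):
    # Same (card a, position i) used in both decks: transpose card a with
    # the card currently occupying position i.
    d = len(deck1)
    a = random.randrange(d)      # a card label
    i = random.randrange(d)      # a position
    for deck in (deck1, deck2):
        j = deck.index(a)        # current position of card a in this deck
        deck[i], deck[j] = deck[j], deck[i]

def unmatched(deck1, deck2):
    # Number of positions holding different cards in the two decks.
    return sum(1 for x, y in zip(deck1, deck2) if x != y)
```

Properties (i) and (ii) above can be checked empirically: the disagreement count never increases, and the decks coalesce well within the O(d²) bound.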
τ ∼ (1/2) d log d   (12.8)
where here and below ±1 is interpreted modulo n. One can define a coupling
by specifying the following transition rates for the bivariate process.
(i, i) → (i + 1, i + 1);   (i, i) → (i − 1, i − 1)
(if |j − i| > 1)   (i, j) → (i + 1, j − 1);   (i, j) → (i − 1, j + 1)
(i, i + 1) → (i, i);   (i, i + 1) → (i + 1, i + 1);   (i, i + 1) → (i − 1, i + 2)
(each transition at rate 1/2)   (12.9)
and symmetrically for (i + 1, i). The joint process ((X_t^{(0)}, X_t^{(k)}), t ≥ 0) started
at (0, k) can be visualized as follows. Let φ(i) := k − i mod n. Picture the
operation of φ as reflection in a mirror which passes through the points
{x1 , x2 } = {k/2, k/2 + n/2 mod n} each of which is either a vertex or the
middle of an edge. In the simplest case, where x1 and x2 are vertices, let (X_t^{(0)}) be the chain started at vertex 0, let T^{0k} = min{t : X_t^{(0)} ∈ {x1, x2}} and define
X_t^{(k)} = φ(X_t^{(0)}), t ≤ T^{0k}
      = X_t^{(0)}, t > T^{0k}.
This constructs a bivariate process with the transition rates specified above,
with coupling time T 0k , and the pre-T 0k path of X (k) is just the reflection
of the pre-T 0k path of X (0) . In the case where a mirror point is the middle
of an edge (j, j + 1) and the two moving particles are at j and j + 1, we
don’t want simultaneous jumps across that edge; instead (12.9) specifies that
attempted jumps occur at independent times, and the process is coupled at
the time of the first such jump.
It’s noteworthy that in this example the coupling inequality
||P0(X_t^{(0)} ∈ ·) − Pk(X_t^{(k)} ∈ ·)|| ≤ P(X_t^{(0)} ≠ X_t^{(k)})
But the support A0 of the first measure is the set of vertices which can be
reached from 0 without meeting or crossing any mirror point (and similarly
for Ak ); and A0 and Ak are indeed disjoint.
It is intuitively clear that the minimum over k of T 0k is attained by
k = bn/2c: we leave the reader to find the simple non-computational proof.
It follows, taking e.g. the simplest case where n is a multiple of 4, that we
can write
d̄(t) = P(T_{−n/4, n/4} > t)   (12.10)
where T{−n/4,n/4} is the hitting time for continuous-time random walk on
the integers.
Parallel results hold in discrete time but only when the chains are suit-
ably lazy. The point is that (12.9) isn’t allowable as transition probabilities.
However, if we fix 0 < a ≤ 1/3 then the chain with transition probabilities
i → i + 1 with probability a;   i → i − 1 with probability a
(and which holds with the remaining probability) permits a coupling of the
form (12.9) with all transition probabilities being a instead of 1/2. The
analysis goes through as above, leading to (12.10) where T refers to the
discrete-time lazy walk on the integers.
Similar results hold for random walk on the n-path (Chapter 5 Example 8) (yyy 4/23/96 version), and we call couplings of this form reflection couplings. They are simpler in the context of continuous-path Brownian motion
– see Chapter 13-4 section 1 (yyy 7/29/99 version).
until D(t) hits 0. By the elementary formula for mean hitting times on the
cycle (Chapter 5 eq. (24)) (yyy 4/23/96 version), the mean time T (a) until
card a becomes matched satisfies
ET(a) ≤ (d/2)(d²/4)
In particular
τ1disc = O(d3 log d).
In this example it turns out that coupling does give the correct order of
magnitude; the corresponding lower bound
P_{(x,y)}(D1 = d + 1) ≤ ((m − d)/m) · (2d(r + 1)/n)   (12.11)
P_{(x,y)}(D1 = d − 1) ≥ (d/m) · ((n − (m + d − 2)(r + 1))/n).   (12.12)
For in order that D1 = d + 1 we must first choose a matched particle a (chance (m − d)/m) and then choose a vertex v which is a neighbor of (or the
same as) some vertex v 0 which is in exactly one of {x, y}: there are 2d such
vertices v 0 and hence at most 2d(r + 1) possibilities for v. This establishes
(12.11). Similarly, in order that D1 = d − 1 it is sufficient that we pick an
unmatched particle a (chance d/m) and then choose a vertex v which is not
a neighbor of (or the same as) any vertex v 0 which is occupied in one or
both realizations by some particle other than a: there are m + d − 2 such
forbidden vertices v 0 and hence at most (m+d−2)(r +1) forbidden positions
for v. This establishes (12.12).
From (12.11, 12.12) a brief calculation gives
E_{(x,y)}(D1 − d) ≤ (−d/(mn)) (n − (3m − d − 2)(r + 1))
              ≤ (−d/(mn)) (n − 3(m − 1)(r + 1)).
In other words
E_{(x,y)} D1 ≤ κd;   κ := 1 − (n − 3(m − 1)(r + 1))/(mn).
If m < 1 + n/(3(r + 1)) then κ < 1. In this case, by copying the end of the analysis of the graph-coloring chain (section 12.1.5)
d̄(t) ≤ mκ^t;   τ1 = O( (log m)/(1 − κ) ).
To clarify the size-asymptotics, suppose m, n → ∞ with m/n → ρ < 1/(3(r + 1)). Then for fixed ρ
τ1 = O(n log n).
words (x_l^{π(2i−1)}, x_l^{π(2i)}) use the same choice (forwards or backwards) in both chains, except when (x_l^{π(2i−1)}, x_l^{π(2i)}) = (1, 0) for one chain and = (0, 1) for the other chain, in which case use opposite choices of (forwards, backwards) in the two chains.
To study the coupled processes (X(t), X̂(t)), fix l and consider the number W(t) := Σ_{k=1}^{K} |X_l^k(t) − X̂_l^k(t)| of words in which the l'th letter is not matched. Then
E(W(1) | W(0) = w) = w − w²/(2(K − 1)).
We can now apply a comparison lemma (Chapter 2 Lemma 32) (yyy 9/10/99
version) which concludes that the hitting time T^l of W(t) to 0 satisfies
ET^l ≤ Σ_{w=2}^{K} 2(K − 1)/w² ≤ 2K.
and so
τ1disc = O(K log L).
Proof. Just take (Vi ) to be the non-homogeneous Markov chain whose tran-
sition probabilities P (Vi+1 ∈ ·|Vi = v) are the conditional probabilities de-
termined by the specified joint distribution µi,i+1 .
Lemma 12.7 (Path-coupling lemma) Take a discrete-time Markov chain (Xt) with finite state space I. Write X_1^{(i)} for the time-1 value of the chain started at state i. Let ρ be a pre-metric defined on some subset E ⊂ I × I. Suppose that for each pair (i, j) in E we can construct a joint law (X_1^{(i)}, X_1^{(j)}) such that
E ρ̄(X_1^{(i)}, X_1^{(j)}) ≤ κρ(i, j)   (12.16)
for some constant 0 < κ < 1. Then
d̄(t) ≤ ∆_ρ κ^t   (12.17)
where ∆_ρ := max_{i,j∈I} ρ̄(i, j).
See the Notes for comments on the case κ = 1.
Proof. Fix states i, j and consider a path (i_u) attaining the minimum in (12.15). For each u let (X_1^{(i_u)}, X_1^{(i_{u+1})}) have a joint distribution satisfying (12.16). By Lemma 12.6 there exists a random sequence (X_1^{(i)} = X_1^{(i_0)}, X_1^{(i_1)}, . . . , X_1^{(j)}) consistent with these bivariate distributions. In particular, there is a joint distribution (X_1^{(i)}, X_1^{(j)}) such that
E ρ̄(X_1^{(i)}, X_1^{(j)}) ≤ Σ_u E ρ̄(X_1^{(i_u)}, X_1^{(i_{u+1})}) ≤ κ Σ_u ρ(i_u, i_{u+1}) = κρ̄(i, j).
This construction gives one step of a coupling of two copies of the chain started at arbitrary states, and so extends to a coupling ((X_t^{(i)}, X_t^{(j)}), t = 0, 1, 2, . . .) of two copies of the entire processes. The inequality above implies
E( ρ̄(X_{t+1}^{(i)}, X_{t+1}^{(j)}) | X_t^{(i)}, X_t^{(j)} ) ≤ κ ρ̄(X_t^{(i)}, X_t^{(j)})
and hence
P(X_t^{(i)} ≠ X_t^{(j)}) ≤ E ρ̄(X_t^{(i)}, X_t^{(j)}) ≤ κ^t ρ̄(i, j) ≤ κ^t ∆_ρ
establishing (12.17). 2
Bubley and Dyer [77] introduced Lemma 12.7 and the name path-coupling.
It has proved useful in extending the range of applicability of coupling meth-
ods in settings such as graph-coloring (Bubley et al [79], Vigoda [333]) and
independent sets (Luby and Vigoda [244]). These are too intricate for pre-
sentation here, but the following example will serve to illustrate the use of
path-coupling.
Make the same choice of i in both chains. Also make the same
choice of { move, stay }, except in the case where the elements
in positions i and i + 1 are the same elements in opposite order
in the two realizations, in which case use the opposite choices of
{ stay, move }.
The coupling is similar (but not identical) to that in section 12.1.9, where
the underlying chain is that corresponding to the “null” partial order. For
a general partial order, the coupling started from an arbitrary pair of states
seems hard to analyze directly. For instance, an element in the same position
in both realizations at time t may not remain so at time t + 1. Instead we
use path-coupling, following an argument of Bubley and Dyer [78]. Call two
states x and y adjacent if they differ by only one (not necessarily adjacent)
transposition; if the transposed cards are in positions i < j then let ρ(x, y) =
j−i. We want to study the increment Φ := ρ(X1 , Y1 )−ρ(x, y) where (X1 , Y1 )
is the coupled chain after one step from (x, y). The diagram shows a typical
pair of adjacent states.
a b c α d e f β g h
a b c β d e f α g h
position · · · i · · · j · ·
Observe first that any choice of position other than i − 1, i, j − 1, j will
have no effect on Φ. If position i and “move” are chosen, then {α, d} are
interchanged in the first chain and {β, d} in the second; both lead to feasible
12.2. NOTES ON CHAPTER 4-3
This estimate remains true if j = i+1 because in that case choosing position
i (chance w(i)) always creates a match. Now specify
w(i) := i(n − i)/w_n,   w_n := Σ_{j=1}^{n−1} j(n − j)
for adjacent (x, y). We are thus in the setting of Lemma 12.7, which shows

d̄(t) ≤ Δ_n exp(−t/w_n).
Section 12.1.2. There may exist Markov couplings which are not of the
natural form (12.4), but examples typically rely on very special symmetry
properties. For the theoretically-interesting notion of (non-Markov) maxi-
mal coupling see Chapter 9 section 1 (yyy 4/21/95 version).
The coupling inequality is often presented using a first chain started from an arbitrary point and a second chain started with the stationary distribution, leading to a bound on d(t) instead of d̄(t). See Chapter 13-4 yyy for an example where this is used in order to exploit distributional properties of the stationary chain.
Section 12.1.5. This chain was first studied by Jerrum [200], who proved
rapid mixing under the weaker assumption c ≥ 2r. His proof involved
a somewhat more careful analysis of the coupling, exploiting the fact that
“bad” configurations for the inequalities (12.5,12.6) are different. This prob-
lem attracted interest because the same constraint c ≥ 2r appears in proofs
of the absence of phase transition in the zero-temperature anti-ferromagnetic
Potts model in statistical physics. Proving rapid mixing under weaker hy-
potheses was first done by Bubley et al [79] in special settings and using
computer assistance. Vigoda [333] then showed that rapid mixing still holds
when c > (11/6) r: the proof first studies a different chain (still reversible with
uniform stationary distribution) and then uses a comparison theorem.
Section 12.1.6. The chain here was suggested by Jerrum [196] in the
context of a general question of counting the number of orbits of a per-
mutation group acting on words. More general cases (using a subgroup of
permutations instead of the whole permutation group) remain unanalyzed.
Section 12.1.10. See Luby and Vigoda [244] for more detailed study and
references.
Section 12.1.11. Conceptually, the states in these examples are unordered
families of words. In genetic algorithms for optimization one has an objec-
tive function f : {0, 1}L → R and accepts or rejects offspring words with
probabilities depending on their f -values.
Interesting discussion of some different approaches to genetics and com-
putation is in Rabani et al [287].
Hint for (12.14). First match the L’th letters in each word, using the
occasions when Ui = L or L + 1. This takes O(LK) time.
Section 12.1.12. Another setting where path-coupling has been used is
contingency tables: Dyer and Greenhill [137].
In the case where (12.16) holds for κ = 1, one might expect a bound of the form

d̄(t) = O(Δ_ρ² / α)   (12.18)

where

α := min_{(i,j)∈E} P( ρ(X_1^{(i)}, X_1^{(j)}) ≤ ρ(i, j) − 1 ),

by arguing that, for arbitrary (i, k), the process ρ(X_t^{(i)}, X_t^{(k)}) can be compared to a mean-zero random walk with chance α of making a negative step. Formalizing this idea seems subtle. Consider three states i, j, k with (i, j) ∈ E and (j, k) ∈ E. Suppose

P( ρ(X_1^{(i)}, X_1^{(j)}) = ρ(i, j) + 1 ) = P( ρ(X_1^{(i)}, X_1^{(j)}) = ρ(i, j) − 1 ) = α

and otherwise ρ(·, ·) is unchanged; similarly for (j, k). The changes for the (i, j) process and for the (j, k) process will typically be dependent, and in the extreme case we might have

ρ(X_1^{(i)}, X_1^{(j)}) = ρ(i, j) + 1  iff  ρ(X_1^{(j)}, X_1^{(k)}) = ρ(j, k) − 1

and symmetrically, in which case ρ(X_t^{(i)}, X_t^{(k)}) might not change at all. Thus proving a result like (12.18) must require further assumptions.
Section 12.1.13. The Markov chain here (with uniform weights) was first
studied by Karzanov and Khachiyan [211].
Chapter 13
We have said several times that the theory in this book is fundamentally
a theory of inequalities. “Universal” or “a priori” inequalities for reversible
chains on finite state space, such as those in Chapter 4, should extend un-
changed to the continuous space setting. Giving proofs of this, or giving
the rigorous setup for continuous-space chains, is outside the scope of our
intermediate-level treatment. Instead we just mention a few specific pro-
cesses which parallel or give insight into topics treated earlier.
CHAPTER 13. CONTINUOUS STATE, INFINITE STATE AND RANDOM ENVIRONMENTS
where

G^{−1}(1/e) = 1.006.   (13.2)
One can also regard BM as a limit of rescaled random walk, a result which generalizes the classical central limit theorem. If (X_m, m = 0, 1, 2, . . .) is simple symmetric random walk on Z, then the central limit theorem implies m^{−1/2} X_m →d B_1, and the generalized result is

(m^{−1/2} X_{⌊mt⌋}, 0 ≤ t < ∞) →d (B_t, 0 ≤ t < ∞)   (13.3)

where the convergence here is weak convergence of processes (see e.g. Ethier and Kurtz [141] for detailed treatment). For more general random flights on Z, that is X_m = Σ_{j=1}^m ξ_j with ξ_1, ξ_2, . . . independent and Eξ = 0 and var ξ = σ² < ∞, we have Donsker's theorem ([133] Theorem 7.6.6)

(m^{−1/2} X_{⌊mt⌋}, 0 ≤ t < ∞) →d (σB_t, 0 ≤ t < ∞).   (13.4)
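As a quick numerical sanity check on (13.3), one can simulate the rescaled walk and compare the variance of m^{−1/2} X_{⌊mt⌋} with Var B_t = t. A minimal sketch (function name ours):

```python
import random

def rescaled_endpoint(m, t, rng):
    """Run simple symmetric random walk on Z for floor(m*t) steps and
    return the Donsker rescaling m**(-1/2) * X_{floor(m*t)}."""
    x = 0
    for _ in range(int(m * t)):
        x += 1 if rng.random() < 0.5 else -1
    return x / m ** 0.5

rng = random.Random(1)
t = 0.7
samples = [rescaled_endpoint(1000, t, rng) for _ in range(1500)]
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean ** 2
# var should be close to Var(B_t) = t = 0.7
```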
Many asymptotic results for random walk on the integers or on the n-cycle
or on the n-path, and their d-dimensional counterparts, can be explained
in terms of Brownian motion or its variants. The variants of interest to us
take values in compact sets and have uniform stationary distributions.
Brownian motion on the circle can be defined by

B°_t := B_t mod 1

and then random walk (X_m^{(n)}, m = 0, 1, 2, . . .) on the n-cycle {0, 1, 2, . . . , n − 1} satisfies, by (13.3),

(n^{−1} X^{(n)}_{⌊n²t⌋}, 0 ≤ t < ∞) →d (B°_t, 0 ≤ t < ∞)  as n → ∞.   (13.5)
That is, the segment of B^{(2)} over 0 ≤ t ≤ T_{x/2} is the image of the corresponding segment of B^{(1)} under the reflection which takes 0 to x. It is easy to see that B^{(2)} is indeed Brownian motion started at x. This is the reflection coupling for Brownian motion. We shall study analogous couplings for variant processes. Given Brownian motion on the circle B°¹ started at 0, we can construct another Brownian motion on the circle B°² started at 0 < x ≤ 1/2 via

B°²_t = (x − B°¹_t) mod 1,   0 ≤ t ≤ T_{x/2, x/2+1/2}
B°²_t = B°¹_t,   T_{x/2, x/2+1/2} ≤ t < ∞

where

T_{x/2, x/2+1/2} := inf{ t : B°¹_t = x/2 or x/2 + 1/2 }.

The worst starting point is x = 1/2, and the hitting time in question can be written as the hitting time T_{−1/4, 1/4} for standard Brownian motion, so

d̄(t) = P(T_{−1/4, 1/4} > t) = G(16t)   (13.6)
412CHAPTER 13. CONTINUOUS STATE, INFINITE STATE AND RANDOM ENVIRONMEN
(B_{c²t}, 0 ≤ t < ∞) =d (cB_t, 0 ≤ t < ∞).   (13.7)
See the Notes for an alternative formula. Thus for Brownian motion on the circle

τ₁ = (1/16) G^{−1}(1/e) = 0.063.   (13.8)

τ₂ = 2/π².

τ₂ ∼ 2n²/π²  as n → ∞

(B̄_t, 0 ≤ t < ∞) =d (2 min(B°_{t/4}, 1 − B°_{t/4}), 0 ≤ t < ∞)
where the factor d^{−1/2} arises because the components of the random walk have variance 1/d — see (13.4). Analogous to (13.5), random walk (X_m^{(n)}, m = 0, 1, 2, . . .) on the discrete torus Z_n^d converges to Brownian motion B° on the continuous torus [0, 1)^d:

(n^{−1} X^{(n)}_{⌊n²t⌋}, 0 ≤ t < ∞) →d (d^{−1/2} B°_t, 0 ≤ t < ∞)  as n → ∞.   (13.11)
where the final equality holds because T_0 has the same distribution as the time for Brownian motion started at 1 to exit the interval (0, 2). This establishes (i) for r = 1. Then from the t → ∞ asymptotics of G(t) in (13.1) we have d̄(t) = O(exp(−π²t/8)), implying τ₂ ≤ 8/π² by Lemma ?? and establishing (ii).
Sketch proof of Lemma. Details require familiarity with stochastic calculus, but this outline provides the idea. For two Brownian motions in R^d started from (0, 0, . . . , 0) and from (x, 0, . . . , 0), one can define the reflection coupling by making the first coordinates evolve as the one-dimensional reflection coupling, and making the other coordinate processes be identical in the two motions. Use isotropy to extend the definition of reflection coupling to arbitrary starting points. Note that the distance between the processes evolves as 2 times one-dimensional Brownian motion, until they meet. The desired joint distribution of ((B_t^{(1)}, B_t^{(2)}), 0 ≤ t < ∞) is obtained by specifying that while both processes are in the interior of K, they evolve as the reflection coupling (and each process reflects orthogonally at faces). As the figure illustrates, the effect of reflection can only be to decrease distance between the two Brownian particles.
[Figure: reflection at the boundary of the convex domain K brings the two coupled Brownian particles closer together.]
From state (x, y), choose the same random pair {i, j} for each process, and link the new values x′_i and y′_i (which are uniform
13.1. CONTINUOUS STATE SPACE
establishing (13.13).
Consider the effect on the l¹ distance ||x − y|| := Σ_i |x_i − y_i| of a step of the

|U(x_i + x_j) − U(y_i + y_j)| + |(1 − U)(x_i + x_j) − (1 − U)(y_i + y_j)| − |x_i − y_i| − |x_j − y_j|

= −4/(d(d−1)) Σ_{i∈A} Σ_{j∈B} min(c_i, d_j)
= −4/(d(d−1)) Σ_{i∈A} Σ_{j∈B} c_i d_j / max(c_i, d_j)
≤ −4/(d(d−1)) Σ_{i∈A} Σ_{j∈B} c_i d_j / (||x − y||/2)
= −2/(d(d−1)) ||x − y||

because Σ_{i∈A} c_i = Σ_{j∈B} d_j = ||x − y||/2. So

E_{(x,y)} ||X(1) − Y(1)|| ≤ (1 − 2/(d(d−1))) ||x − y||.
418CHAPTER 13. CONTINUOUS STATE, INFINITE STATE AND RANDOM ENVIRONMEN
Because ||X(0) − Y(0)|| ≤ 2, it follows that after t steps using the scaling coupling,

E||X(t) − Y(t)|| ≤ 2 (1 − 2/(d(d−1)))^t.
So by taking t₁ ∼ 3d² log d, after t₁ steps we have
So unconditionally

P_{(x,y)}(greedy coupling works on first step) ≥ 1 − ||x − y|| / min_k y_k.   (13.16)
Now the uniform distribution (Y_1^{(d)}, . . . , Y_d^{(d)}) on the simplex has the property (use [133] Exercise 2.6.10 and the fact that the uniform distribution on the simplex is the joint distribution of spacings between d − 1 uniform(0, 1) variables and the endpoint 1)

if constants a_d > 0 satisfy d a_d → 0 then P(Y_1^{(d)} ≤ a_d) ∼ d a_d.

Since (Y(t)) is the stationary chain and Y_i^{(d)} =d Y_1^{(d)},

P( min_{1≤k≤d} Y_k(t) ≤ d^{−4.5} for some t₁ < t ≤ t₁ + t₂ ) ≤ t₂ d P(Y_1^{(d)} < d^{−4.5})
[Figure: two graphs, G1 and G2, each on vertices 0, a1, a2, b1, b2, b3.]
(X^{(d)}_{⌊5^d t⌋}, 0 ≤ t < ∞) →d (X^{(∞)}_t, 0 ≤ t < ∞)

where (M; W_1, W_2, . . .) are independent, M =d M_1 and W_i =d W. Now in the topological setting, the vertices of G_d are a subset of G. Let X̃^{(d)}_t be the process on G_d ⊂ G whose sequence of jumps is as the jumps of the discrete-time walk X^{(d)} but where the times between jumps are independent with distribution 5^{−d}W. Using (13.18) we can construct the processes X̃^{(d)} jointly for all d such that the process X̃^{(d)}, watched only at the times of hitting (successively distinct) points of G_{d−1}, is exactly the process X̃^{(d−1)}. These coupled processes specify a process X^{(∞)}_t on G at a random subset of times t. It can be shown that this random subset is dense and that sample paths extend continuously to all t, and it is natural to call X^{(∞)} Brownian motion on the Sierpinski gasket.
• The infinite degree-r tree Tr and bounds for r-regular expander graphs
of large size (section 13.2.6).
13.2.1 Set-up
We assume the reader has some acquaintance with classical theory (e.g., [133]
Chapter 5) for a countable-state irreducible Markov chain, which emphasizes
the trichotomy transient or null-recurrent or positive-recurrent. We use the
phrase general chain to refer to the case of an arbitrary irreducible transition
matrix P, without any reversibility assumption.
Recall from Chapter 3 section 2 the identification, in the finite-state
setting, of reversible chains and random walks on weighted graphs. Given
a reversible chain we defined edge-weights w_{ij} = π_i p_{ij} = π_j p_{ji}; conversely,
given edge-weights we defined random walk as the reversible chain
p_{vx} = w_{vx}/w_v;   w_v = Σ_x w_{vx}.   (13.19)
and we study the associated random walk (Xt ), i.e., the discrete-time chain
with pvx = wvx /wv . So in the unweighted setting (we ≡ 1), we have nearest-
neighbor random walk on a locally finite, infinite graph.
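In code, the definition (13.19) is just a normalization of symmetric edge weights. A sketch (the dict representation of the weighted graph is ours):

```python
def walk_transition(weights):
    """Transition probabilities p(v,x) = w_vx / w_v of the random walk on
    a weighted graph, given a dict {(v, x): w} containing both (v, x)
    and (x, v) for each edge."""
    w_v = {}
    for (v, _x), w in weights.items():
        w_v[v] = w_v.get(v, 0.0) + w          # w_v = sum_x w_vx
    return {(v, x): w / w_v[v] for (v, x), w in weights.items()}

# unweighted triangle (w_e = 1): nearest-neighbour walk, each step prob 1/2
edges = {}
for a, b in [(0, 1), (1, 2), (0, 2)]:
    edges[(a, b)] = edges[(b, a)] = 1.0
P = walk_transition(edges)
```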
To explain why we adopt this set-up, say π is invariant for P if

Σ_i π_i p_{ij} = π_j ∀j;   π_j > 0 ∀j.

One easily verifies that each of the two measures π_i = 1 and π_i = 2^i is invariant. Such nonuniqueness makes it awkward to seek to define reversibility of P via the detailed balance equations
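For concreteness: with the weights w_{i,i+1} = 2^i discussed below, the walk steps right with probability 2/3 and left with probability 1/3, and both invariant measures can be checked numerically. The transition rule in this sketch is inferred from those weights:

```python
def is_invariant(pi, lo, hi, p_right=2.0 / 3.0):
    """Check sum_i pi(i) p(i,j) == pi(j) at interior sites j for the chain
    on Z with p(i, i+1) = 2/3 and p(i, i-1) = 1/3, i.e. the random walk
    on the weighted graph with w_{i,i+1} = 2**i."""
    for j in range(lo + 1, hi):
        flow = pi(j - 1) * p_right + pi(j + 1) * (1.0 - p_right)
        if abs(flow - pi(j)) > 1e-9 * max(1.0, abs(pi(j))):
            return False
    return True

# both pi_i = 1 and pi_i = 2**i satisfy the invariance equations
flat_ok = is_invariant(lambda i: 1.0, -20, 20)
geom_ok = is_invariant(lambda i: 2.0 ** i, -20, 20)
```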
recurrent case (see Theorem 13.4 below); because in that case the questions
one asks, such as whether the relaxation time τ2 is finite, can be analyzed
by the same techniques as in the finite-state setting.
Our intuitive interpretation of “reversible” in Chapter 3 was “a movie
of the chain looks the same run forwards or run backwards”. But the chain
corresponding to the weighted graph with weights w_{i,i+1} = 2^i, which is the chain (13.21) with π_i = 2^i, has a particle moving towards +∞ and so certainly doesn't satisfy this intuitive notion. On the other hand, a probabilistic
interpretation of an infinite invariant measure π is that if we start at time
0 with independent Poisson(πv ) numbers of particles at vertices v, and let
the particles move independently according to P, then the particle process
is stationary in time. So the detailed balance equations (13.22) correspond
to the intuitive “movie” notion of reversible for the infinite particle process,
rather than for a single chain.
Theorem 13.4 For a general chain, one of the following alternatives holds.
Recurrent. ρ_v = 1 and E_v N_v(∞) = ∞ and P_v(N_w(∞) = ∞) = 1 for all v, w.
Transient. ρ_v < 1 and E_v N_v(∞) < ∞ and P_v(N_w(∞) < ∞) = 1 for all v, w.
In the recurrent case there exists an invariant measure π, unique up to
constant multiples, and the chain is either
positive-recurrent: E_v T_v⁺ < ∞ ∀v and Σ_v π_v < ∞; or
By analogy with the finite setting, we can regard the inf as the effective
resistance between v and infinity, although (see section ??) we shall not
attempt an axiomatic treatment of infinite electrical networks.
Theorem 13.5 has the following immediate corollary: of course (a) and
(b) are logically equivalent.
Corollary 13.6 (a) If a weighted graph is recurrent, then so is any sub-
graph.
(b) To show that a weighted graph is transient, it suffices to find a transient
subgraph.
Thus the classical fact that Z² is recurrent implies that a subgraph of Z² is recurrent, a fact which is hard to prove by bounding t-step transition probabilities. In the other direction, it is possible (but not trivial) to prove that Z³ is transient by exhibiting a flow: indeed Doyle and Snell [131] construct a transient tree-like subgraph of Z³.
Here is a different formulation of the same idea.
Corollary 13.7 The return probability ρ_v = P_v(T_v⁺ < ∞) cannot increase if a new edge (not incident at v) is added, or the weight of an existing edge (not incident at v) is increased.
13.2. INFINITE GRAPHS
ρ(d) = (R_d − 1)/R_d,   d ≥ 3.
Textbooks sometimes give the impression that calculating ρ(d) is hard, but
one can just calculate numerically the integral (13.31). Or see [174] for a
table.
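Indeed, for d = 3 a crude midpoint rule on the lattice Green-function integral (the same integrand as in (13.34) below) already recovers Pólya's constant; a sketch (function name ours):

```python
import math

def R3_midpoint(n):
    """Midpoint-rule estimate of the d = 3 integral
    ∫_[0,1]^3 dx / ( (1/3) Σ_u (1 - cos 2π x_u) ), i.e. R_3.
    Midpoints avoid the (integrable) singularity at the origin."""
    pts = [1.0 - math.cos(2.0 * math.pi * (k + 0.5) / n) for k in range(n)]
    total = 0.0
    for a in pts:
        for b in pts:
            for c in pts:
                total += 3.0 / (a + b + c)
    return total / n ** 3

Rd = R3_midpoint(40)
rho = (Rd - 1.0) / Rd   # Pólya's return probability on Z^3, about 0.34
```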
The quantity ρ(d) has the following sample path interpretation. Let Vt
be the number of distinct vertices visited by the walk before time t. Then
The proof of this result is a textbook application of the ergodic theorem for
stationary processes: see [133] Theorem 6.3.1.
13.2.5 The torus Z_m^d

We now discuss how random walk on Z^d relates to m → ∞ asymptotics for random walk on the finite torus Z_m^d, discussed in Chapter 5. We now use the convergence of rescaled random walk to Brownian motion on the d-dimensional torus, for which (cf. sections 13.1.1 and 13.1.2) τ₂ = 2π^{−2}. At (74)–(75) of Chapter 5 we saw that the eigentime identity gave an exact formula for the mean hitting time
that the eigentime identity gave an exact formula for the mean hitting time
(m)
parameter τ0 , whose asymptotics are, for d ≥ 3,
Z 1 Z 1
(m) 1
m−d τ0 → R̂d ≡ ... 1 Pddx1 . . . dxd < ∞.
0 0 u=1 (1 − cos(2πxu ))
d
(13.34)
Here we give an independent analysis of this result, and the case d = 2.

Proposition 13.8

(d = 1)  τ_0^{(n)} ∼ (1/6) n²   (13.35)
(d = 2)  τ_0^{(m)} ∼ 2π^{−1} m² log m   (13.36)
(d ≥ 3)  τ_0^{(m)} ∼ R_d m^d   (13.37)

The d = 1 result is from Chapter 5 (26). We now prove the other cases.
Proof. We may construct continuized random walk X̃_t^{(m)} on Z_m^d from continuized random walk X̃_t on Z^d by

X̃_t^{(m)} = X̃_t mod m   (13.38)

and then P_0(X̃_t^{(m)} = 0) ≥ P_0(X̃_t = 0). So

m^{−d} τ_0^{(m)} = ∫_0^∞ ( P_0(X̃_t^{(m)} = 0) − m^{−d} ) dt   (Chapter 2, Corollary 12 and (8))
= ∫_0^∞ ( P_0(X̃_t^{(m)} = 0) − m^{−d} )⁺ dt   by complete monotonicity
≥ ∫_0^∞ ( P_0(X̃_t = 0) − m^{−d} )⁺ dt   (13.39)
→ ∫_0^∞ P_0(X̃_t = 0) dt = R_d.
where T is the first time that X̃^{(m)} goes distance ⌊m/2⌋ from 0. And by considering the construction (13.38)

P(X̃_t^{(m)} = 0) ≤ P(X̃_t = 0) + P(X̃_t^{(m)} = 0, T ≤ t)

and (13.41) follows, since P(Ỹ_t^{(m)} = 0) = 1/m.
Since the d-dimensional probabilities relate to the 1-dimensional probabilities via P_0(X̃_t^{(m)} = 0) = (p̃^{(m)}(t/d))^d, and similarly on the infinite lattice, we can use inequality (13.41) to bound the integrand in (13.40) as follows.

P_0(X̃_t^{(m)} = 0) − m^{−d} − P_0(X̃_t = 0)
≤ (m^{−1} + p̃(t/d))^d − m^{−d} − (p̃(t/d))^d
= Σ_{j=1}^{d−1} C(d, j) (p̃(t/d))^j (1/m)^{d−j}   (C(d, j) the binomial coefficient)
= (p̃(t/d)/m) Σ_{j=1}^{d−1} C(d, j) (p̃(t/d))^{j−1} (1/m)^{d−1−j}
≤ (p̃(t/d)/m) ( Σ_{j=1}^{d−1} C(d, j) ) max( (p̃(t/d))^{d−2}, (1/m)^{d−2} )
= (2^d − 2) (p̃(t/d)/m) max( (p̃(t/d))^{d−2}, (1/m)^{d−2} )
≤ (2^d − 2) (p̃(t/d)/m) [ (p̃(t/d))^{d−2} + (1/m)^{d−2} ]
= (2^d − 2) [ (p̃(t/d))^{d−1}/m + p̃(t/d)/m^{d−1} ].
The fact (13.24) that p̃(t) = Θ(t^{−1/2}) for large t easily implies that the integral in (13.40) over 0 ≤ t ≤ m³ tends to zero. But by (13.33) and submultiplicativity of d̄(t),

0 ≤ P_0(X̃_t^{(m)} = 0) − m^{−d} ≤ d(t) ≤ d̄(t) ≤ B_1 exp(−t/(B_2 m²))   (13.42)

where B_1, B_2 depend only on d. This easily implies that the integral in (13.40) over m³ ≤ t < ∞ tends to zero, completing the proof of (13.37).
In the case d = 2, we fix b > 0 and truncate the integral in (13.39) at bm² to get

m^{−2} τ_0^{(m)} ≥ −b + ∫_0^{bm²} P_0(X̃_t = 0) dt
= −b + (1 + o(1)) π^{−1} log(bm²)   by (13.30)
= (1 + o(1)) (2/π) log m.

Therefore

τ_0^{(m)} ≥ (1 + o(1)) (2/π) m² log m.
For the corresponding upper bound, since ∫_0^{m²} P_0(X̃_t = 0) dt ∼ (2/π) log m by (13.30), and m^{−2} τ_0^{(m)} = ∫_0^∞ ( P_0(X̃_t^{(m)} = 0) − m^{−2} ) dt, it suffices to show that

∫_0^∞ ( P_0(X̃_t^{(m)} = 0) − m^{−2} − P_0(X̃_t = 0) )⁺ dt + ∫_{m²}^∞ ( P_0(X̃_t^{(m)} = 0) − m^{−2} )⁺ dt = o(log m).   (13.43)

To bound the first of these two integrals, we observe from (13.41) that P_0(X̃_t^{(m)} = 0) ≤ (m^{−1} + p̃(t/2))², and so the integrand is bounded by (2/m) p̃(t/2). Using (13.24), the first integral is O(1) = o(log m). To analyze the second integral in (13.43) we consider separately the ranges m² ≤ t ≤ m² log^{3/2} m and m² log^{3/2} m ≤ t < ∞. Over the first range, we again use (13.41) to bound the integrand by (2/m) p̃(t/2) + (p̃(t/2))². Again using (13.24), the integral is bounded by

(1 + o(1)) (2/(π^{1/2} m)) ∫_{m²}^{m² log^{3/2} m} t^{−1/2} dt + (1 + o(1)) π^{−1} ∫_{m²}^{m² log^{3/2} m} t^{−1} dt
t^{−1} V_t → 1 − ρ = (r−2)/(r−1)  a.s. as t → ∞.
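This limit is easy to see in simulation: run the walk on the infinite r-regular tree, creating vertices lazily so that only the visited part of the tree is stored (encoding and function name ours):

```python
import random

def distinct_fraction(r, steps, seed):
    """Fraction of distinct vertices visited by SRW on the infinite
    r-regular tree.  Vertices get integer ids, created lazily via a
    parent pointer and child table, so memory is O(V_t)."""
    rng = random.Random(seed)
    parent = {0: None}          # id -> parent id (root has none)
    child = {}                  # (id, child label) -> child id
    v, next_id = 0, 1
    for _ in range(steps):
        k = rng.randrange(r)    # pick one of the r neighbours
        if parent[v] is not None and k == 0:
            v = parent[v]
        else:
            lab = k if parent[v] is None else k - 1
            if (v, lab) not in child:
                child[(v, lab)] = next_id
                parent[next_id] = v
                next_id += 1
            v = child[(v, lab)]
    return next_id / steps      # distinct vertices visited, per step

frac = distinct_fraction(4, 30000, seed=2)
# should be near (r-2)/(r-1) = 2/3 for r = 4
```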
G_i = 1 + F_i G_i   (13.47)
P_φ(T_φ⁺ = 2t) / P_0(T_0⁺ = 2t) = ( r/(2(r−1)) ) ( 4(r−1)/r² )^t

where the numerator refers to simple RW on the tree, and the denominator refers to simple symmetric reflecting RW on Z⁺. So on the tree,

F_φ(z) = ( r/(2(r−1)) ) F_0( z √(4(r−1)) / r ) = ( r/(2(r−1)) ) ( 1 − √(1 − 4(r−1)z²/r²) ).

G_φ(z) = 2(r−1) / ( r − 2 + √(r² − 4(r−1)z²) ).   (13.48)

In particular, G_φ has radius of convergence 1/β, where

β = 2r^{−1} √(r−1) < 1.   (13.49)
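The generating-function algebra above is easy to check numerically: at z = 1 the first-return probability F_φ(1) should equal 1/(r − 1), and G_φ should equal 1/(1 − F_φ). A sketch (function names ours):

```python
import math

def F_phi(z, r):
    """First-return generating function for SRW on the r-regular tree,
    from the display above."""
    return r * (1.0 - math.sqrt(1.0 - 4.0 * (r - 1) * z * z / r ** 2)) / (2.0 * (r - 1))

def G_phi(z, r):
    """Green-function form (13.48)."""
    return 2.0 * (r - 1) / (r - 2.0 + math.sqrt(r ** 2 - 4.0 * (r - 1) * z * z))

r = 5
rho = F_phi(1.0, r)                    # return probability, equals 1/(r-1)
beta = 2.0 * math.sqrt(r - 1.0) / r    # (13.49); strictly less than 1
```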
Without going into details, one can now use standard Tauberian arguments
to show
Pφ (Xt = φ) ∼ αt−3/2 β t , t even (13.50)
for a computable constant α, and this format (for different values of α and
β) remains true for more general radially symmetric random flights on Tr
([339] Theorem 19.30). One can also in principle expand (13.48) as a power
series to obtain Pφ (Xt = φ). Again we shall not give details, but according
to Giacometti [164] one obtains
P_φ(X_t = φ) = ((r−1)/r) ( √(r−1)/r )^t · Γ(1 + t) / ( Γ(2 + t/2) Γ(1 + t/2) )
× ₂F₁( (t+1)/2, 1, 2 + t/2, 4(r−1)/r² ),   t even   (13.51)
f_2(i) := (1 + ((r−2)/r) i)(r − 1)^{−i/2}.   (13.53)
P_v(X_t^n = v) ≤ 1/n + n β(n)^t.

t^{−3/2} β^t (α − o(1)) ≤ 1/n + n β(n)^t.   (13.55)
n + nβ t (n). (13.55)
For (b), the argument for (13.54) gives a coupling between the process X n
started at v and the process X ∞ started at φ such that
dn (Xtn , v) ≤ d∞ (Xt∞ , φ)
where dn and d∞ denote graph distance. Fix ε > 0 and write γ = r−2r + ε.
By the coupling and (13.44), P (dn (Xtn , v) ≥ γt) → 0 as n, t → ∞. This
For these two limit results to be consistent we must have γτ₁(n) ≥ (1 − ε) log n / log(r−1) for all large n, establishing (b).
For (c), fix a vertex v_0 of G_n and use the function f_2 at (13.53) to define f(v) := f_2(d(v, v_0)) for all vertices v of G_n. The equality (13.52) for f_2 on the infinite tree easily implies the inequality Pf ≥ βf on G_n. Set f̄ := n^{−1} Σ_v f(v) and write 1 for the unit function. By the Rayleigh-Ritz
[Figure: a hierarchical tree (binary case shown), with leaves labeled 0, 1, 10, 11, 100, 101, 110, 111.]
Fix a parameter 0 < λ < r. Consider biased random walk Xt on the tree
T_r^hier, where from each non-leaf vertex the transition goes to the parent with
probability λ/(λ + r) and to each child with probability 1/(λ + r). Then
consider Y = “X watched only on L”, that is the sequence of (not-necessarily
distinct) successive leaves visited by X. The group L is distance-transitive
(for Hamming distance on L) and Y is a certain isotropic random flight on
L. A nice feature of this example is that without calculation we can see that
Y is recurrent if and only if λ ≤ 1. For consider the path of ancestors of 0.
The chain X must spend an infinite time on that path (side-branches are
finite); on that path X behaves as asymmetric simple random walk on Z⁺,
which is recurrent if and only if λ ≤ 1; so X and thence Y visits 0 infinitely
often if and only if λ ≤ 1.
Another nice feature is that we can give a fairly explicit expression for
the t-step transition probabilities of Y . Writing H for the maximum height
reached by X in an excursion from the leaves, then
P(H ≥ h) = P_1(T̂_h < T̂_0) = (r/λ − 1) / ( (r/λ)^h − 1 ),   h ≥ 1
where T̂ denotes hitting time for the height process. Writing Mt for the
maximum height reached in t excursions,
P(M_t < h) = (P(H < h))^t = ( 1 − (r/λ − 1) / ((r/λ)^h − 1) )^t.
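Differencing this product formula in h gives the distribution of M_t explicitly; a short tabulation (parameter values below are arbitrary illustrations):

```python
def M_cdf(h, t, lam, r):
    """P(M_t < h) = (1 - (r/lam - 1)/((r/lam)**h - 1))**t, from the
    display above; valid for h >= 1 and 0 < lam < r."""
    q = r / lam
    return (1.0 - (q - 1.0) / (q ** h - 1.0)) ** t

def M_pmf(h, t, lam, r):
    """P(M_t = h) = P(M_t < h+1) - P(M_t < h)."""
    return M_cdf(h + 1, t, lam, r) - M_cdf(h, t, lam, r)

# e.g. r = 2, lam = 1/2, t = 50 excursions
probs = [M_pmf(h, 50, 0.5, 2.0) for h in range(1, 80)]
```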
Since P(M_t = h) = P(M_t < h + 1) − P(M_t < h), we have found the "fairly explicit expression" promised above. A brief calculation gives the following time-asymptotics. Fix s > 0 and consider t ∼ s(r/λ)^j with j → ∞; then
• non-trivial boundary.
Can these be related to properties for sequences of finite chains? We already
mentioned (section 13.2.3) that the property τ0 (n) = O(n) seems to be the
analog of transience. In this speculative section we propose definitions of
three other properties for sequences of finite chains, which we name
• compactness
• infinite-dimensionality
• expander-like.
Future research will show whether these are useful definitions! Intuitively
we expect that every reasonably “natural” sequence should fall into one of
these three classes.
For simplicity we consider reversible random walks on Cayley graphs.
It is also convenient to continuize. The resulting chains are special cases
of (reversible) Lévy processes. We define the general Lévy process to be a
continuous-time process with stationary independent increments on a (con-
tinuous or discrete) group. Thus the setting for the rest of this section is a
sequence (X_t^{(n)}) of reversible Lévy processes on finite groups G^{(n)} of size n → ∞ through some subsequence. Because we work in continuous time, the eigenvalues satisfy 0 = λ_1^{(n)} < λ_2^{(n)} ≤ · · ·.
(A): Compactness. Say the sequence (X_t^{(n)}) is compact if there exists a (discrete or continuous) compact set S and a reversible Lévy process X̃_t on S such that

(i) d̃(t) ≡ ||P_π(X̃_t ∈ ·) − π|| → 0 as t → ∞;

(ii) λ_j(n)/λ_2(n) → λ̃_j as n → ∞, j ≥ 2; where 1 = λ̃_2 ≤ λ̃_3 ≤ · · · are the eigenvalues of (X̃_t);

(iii) d^{(n)}(t τ_2(n)) → d̃(t) as n → ∞; t > 0.
These properties formalize the idea that the sequence of random walks form discrete approximations to a limit Lévy process on a compact group, at least as far as mixing times are concerned. Simple random walk on Z_m^d, and the
(Chapter 7) and τ_1^Y(n) ∼ τ_2^X(n). Note that by Chapter 7-1 Lemma 1 we have τ_2^Y(n) = o(τ_1^Y(n)). If the product chain had a subsequential limit, then its total variation function at (i), say d_0(t), must satisfy

d_0(t) = d̃(t),   t > 1
       = 1,   t < 1.
But it seems intuitively clear (though we do not know a proof) that ev-
ery Lévy process on a compact set has continuous d(·). This suggests the
following conjecture.
Conjecture 13.10 For any sequence of reversible Lévy processes satisfying
(13.57), there exists a subsequence satisfying the definition of compact except
that condition (ii) is replaced by
(iv): ∃ t_0 ≥ 0:

d^{(n)}(t τ_2(n)) → 1,   t < t_0
                 → d̃(t),   t > t_0.
Before describing the other two classes of chains, we need a definition and
some motivating background. In the present setting, the property “trivial
boundary” is equivalent (see Notes) to the property
lim ||Pv (Xt ∈ ·) − Pw (Xt ∈ ·)|| = 0, ∀v, w. (13.58)
t→∞
if d(v, w) = b then

||P_v(X(t) ∈ ·) − P_w(X(t) ∈ ·)|| = ||P_0(X*(tb/d) ∈ ·) − P_1(X*(tb/d) ∈ ·)||.

Take d → ∞ with

b(d) ∼ d^α,   t(d) ∼ (1/4) ε d log d

for some 0 < α, ε < 1, so that

[ t(d) b(d)/d ] / [ (1/4) b(d) log b(d) ] → ε/α.

Since the variation cut-off for the b-cube is at (1/4) b log b, we see that for vertices v and w at distance b(d),
as required.
Proof. We first give a construction of the random walk jointly with the
random set S. Write A = {a, b, . . .} for a set of k symbols, and write
Ā = {a, a−1 , b, b−1 , . . .}. Fix t ≥ 1 and let (ξs , 1 ≤ s ≤ t) be independent uni-
form on Ā. Choose (g(a), a ∈ A) by uniform sampling without replacement
from G, and set g(a−1 ) = (g(a))−1 . Then the process (Xs ; 1 ≤ s ≤ t) con-
structed via X_s = g(ξ_1)g(ξ_2) . . . g(ξ_s) is distributed as the random walk on the random Cayley graph, started at the identity ι. So P(X_t = ι) = E p_{ιι}(t) where p_{ιι}(t) is the t-step transition probability in the random environment,
and by (13.61) it suffices to take t = t1 (for t1 defined in the statement of
the Proposition) and show
|G| P(X_{2t} = ι) − 1 → 0.   (13.62)

To start the argument, let J(2t) be the number of distinct values taken by (⟨ξ_s⟩, 1 ≤ s ≤ 2t), where we define ⟨a⟩ = ⟨a^{−1}⟩ = a. Fix j ≤ t and 1 ≤ s_1 < s_2 < . . . < s_j ≤ 2t. Then

P( J(2t) = j | ⟨ξ_{s_i}⟩ distinct for 1 ≤ i ≤ j ) = (j/k)^{2t−j} ≤ (t/k)^t.

By considering the possible choices of (s_i),

P(J(2t) = j) ≤ C(2t, j) (t/k)^t

where C(2t, j) is the binomial coefficient. Since Σ_j C(2t, j) = 2^{2t} we deduce
13.3. RANDOM WALKS IN RANDOM ENVIRONMENTS
where the (R_i^{(k)}) are independent of each other and of (ξ; W_1, W_2, . . . , W_ξ), and R_i^{(k)} =d R^{(k)}.
Consider first the special case ξ ≡ 3. Choose x such that P(W_i^{−1} > x for some i) ≤ 1/16. Suppose inductively that P(R^{(k)} > x) ≤ 1/4 (which holds for k = 0 since R^{(0)} = 0). Then

P( R_i^{(k)} + W_i^{−1} > 2x for at least 2 i's ) ≤ 1/16 + 3(1/4)² ≤ 1/4.

This implies P(R^{(k+1)} > x) ≤ 1/4, and the induction goes through. Thus P(R > x) ≤ 1/4. By (13.66), p := P(R = ∞) satisfies p = p³, so p = 0 or 1, and we just eliminated the possibility p = 1. So R < ∞ a.s..
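The distributional recursion for R^{(k)} is also easy to sample. The sketch below assumes the series-parallel form behind (13.66), which is not displayed in this excerpt: each of the three subtree branches contributes edge resistance W_i^{−1} in series with an independent copy of R^{(k−1)}, and the branches combine in parallel; the conductance law used is hypothetical.

```python
import random

def sample_R(depth, rng, conductance):
    """One Monte-Carlo draw of R^(depth) for the ternary tree (xi = 3):
    three branches, each 1/W_i in series with an independent copy of
    R^(depth-1), combined in parallel.  R^(0) = 0."""
    if depth == 0:
        return 0.0
    inv_total = 0.0
    for _ in range(3):
        branch = 1.0 / conductance(rng) + sample_R(depth - 1, rng, conductance)
        inv_total += 1.0 / branch
    return 1.0 / inv_total

rng = random.Random(3)
cond = lambda r: r.uniform(0.5, 2.0)    # hypothetical law for W
samples = [sample_R(7, rng, cond) for _ in range(100)]
# with W >= 1/2 each branch resistance is at most 2 + R, so R^(k) <= 1
```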
Reducing the general case to the special case involves a comparison idea,
illustrated by the figure.
[Figure: two networks linking φ through vertices a, b, c and s(1), s(2), s(3) to ∞; in one, the internal resistances r(1), . . . , r(5) are replaced by direct edges.]
and hence the resistance R_{φ∞} can indeed only be less in the left network.
In the general case, the fact P (ξ ≥ 2) > 0 implies that the number of
individuals in generation g tends to ∞ a.s. as g → ∞. So in particular we
can find 3 distinct individuals {A, B, C} in some generation G. Retain the
edges linking φ with these 3 individuals, and cut all other edges within the
first G generations. Repeat recursively for descendants of {A, B, C}. This
procedure constructs an infinite subtree, and it suffices to show that the
resistance between φ and ∞ in the subtree is a.s. finite. By the comparison
argument above, we may replace the network linking φ to {A, B, C} by three
direct edges with the same (random) resistance, and similarly for each stage
of the construction of the subtree; this gives another tree T, and it suffices to show its resistance is finite a.s.. But T fits the special case ξ ≡ 3.
It is not difficult (we won’t give details) to show that the distribution
of R is the unique distribution on (0, ∞) satisfying (13.66). It does seem
difficult to say anything explicit about the distribution of R in Proposition
13.13. One can get a little from comparison arguments. On the binary
tree (ξ ≡ 2), by using the exact potential function and the exact flows
from the unweighted case as “test functions” in the Dirichlet principle and
Thompson’s principle, one obtains
ER ≤ E W^{−1};   E R^{−1} ≤ EW.
Proposition 13.14 The biased random walk is a.s. recurrent for λ ≥ Eξ.
where q_i is the probability (on this realization) that the walk started at the i'th child of φ never hits φ. Rearrange to see q = (Σ_i q_i)/(λ + Σ_i q_i). So on the random tree we have

q =d ( Σ_{i=1}^ξ q_i ) / ( λ + Σ_{i=1}^ξ q_i )

where the (q_i) are independent of each other and ξ, and q_i =d q. Applying Jensen's inequality to the concave function x → x/(x + λ) shows

Eq ≤ (Eξ)(Eq) / ( λ + (Eξ)(Eq) ).
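Note this bound already explains Proposition 13.14: any fixed point Eq > 0 of e ↦ me/(λ + me), with m = Eξ, forces λ < m. Iterating the scalar map makes this visible (a sketch; the function name is ours):

```python
def iterate_bound(m, lam, steps=300, e0=1.0):
    """Iterate e -> m*e/(lam + m*e), the Jensen upper bound on the
    escape probability Eq (m stands for E xi)."""
    e = e0
    for _ in range(steps):
        e = m * e / (lam + m * e)
    return e

critical = iterate_bound(2.0, 2.0)   # lam = E xi: the bound tends to 0
sub = iterate_bound(2.0, 1.0)        # lam < E xi: fixed point 1 - lam/m = 0.5
```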
Lyons et al [248] show that, when Eξ < ∞, (13.67) is indeed true and that
s(λ, µ) > 0 for all 1 ≤ λ < Eξ. Moreover in the unbiased (λ = 1) case there
is a simple formula [247]
s(1, µ) = E[ (ξ − 1)/(ξ + 1) ].
There is apparently no such simple formula for s(λ, µ) in general. See Lyons
et al [249] for several open problems in this area.
Now on any n-vertex tree, the mean hitting time t(i, j) = Ei Tj satisfies
R_{1,2}^{(n)} →d R^{(1)} + R^{(2)},  where R^{(1)} and R^{(2)} are independent copies of R.
The lower bound is clear by shorting, but the upper bound requires a compli-
cated construction to connect the two sets of vertices at distances d(n) from
vertices 1 and 2 in such a way that the effective resistance of this connecting
network tends to zero.
The number of edges of the random graph is asymptotic to nµ/2. So the total edge weight Σ_i Σ_j W_{ij} is asymptotic to nµ EW, and by the commute interpretation of resistance the mean commute time C_{1,2}^{(n)} for random walk on a realization of the graph satisfies

n^{−1} C_{1,2}^{(n)} →d µ EW ( R^{(1)} + R^{(2)} ).
Case (ii): p(n) = o(1) = Ω(n^{ε−1}), some ε > 0. Here the degree of vertex 1 tends to ∞, and it is easy to see that the (random) stationary probability π_1 and the (random) transition probabilities and stationary
So for fixed k ≥ 1, the k-step transition probabilities satisfy p_{11}^{(k)} →p 0 as n → ∞. This suggests, but it is technically hard to prove, that the (random) fundamental matrix Z satisfies

Z_{11} →p 1 as n → ∞.   (13.72)

Granted (13.72), we can apply Lemma 11 of Chapter 2 and deduce that the mean hitting times t(π, 1) = E_π T_1 on a realization of the random graph satisfies

n^{−1} t(π, 1) = Z_{11}/(n π_1) →p 1, as n → ∞.   (13.73)
Theorem 13.15 Assign random conductances (we ) to the edges of the two-
dimensional lattice Z2 , where
(i) the process (we ) is stationary ergodic.
(ii) c1 ≤ we ≤ c2 a.s., for some constants 0 < c1 < c2 < ∞.
Let (Xt ; t ≥ 0) be the associated random walk on this weighted graph, X0 = 0.
Then t^{−1/2} X_t →d Z where Z is a certain two-dimensional Normal distribution, and moreover this convergence holds for the conditional distribution of X_t given the environment, for almost all environments.
13.4. NOTES ON CHAPTER 13 451
d̄(t) = P_0(B_t° ∈ (−1/4, 1/4)) − P_{1/2}(B_t° ∈ (−1/4, 1/4))
 = P_0(B_t° ∈ (−1/4, 1/4)) − P_0(B_t° ∈ (1/4, 3/4))
 = 2P_0(B_t° ∈ (−1/4, 1/4)) − 1
 = 2P((t^{1/2} Z) mod 1 ∈ (−1/4, 1/4)) − 1
where B_t is standard Brownian motion and µ(·) and σ(·) are suitably regular specified functions. See Karlin and Taylor [209] for a non-technical introduction. Theoretical treatments standardize (via a one-to-one transformation
R → R) to the case µ(·) = 0, though for our purposes the standardization
to σ(·) = 1 is perhaps more natural. In this case, if the formula
f(x) ∝ exp(∫^x 2µ(y) dy)
can give a density function f (x) then f is the stationary density. Such
diffusions relate to two of our topics.
(i) For MCMC, to estimate a density f (x) ∝ exp(−H(x)), one can in
principle simulate the diffusion with σ(x) = 1 and µ(x) = −H′(x)/2. This
idea was used in Chapter MCMC section 5.
(ii) Techniques for bounding the relaxation time for one-dimensional
diffusions parallel techniques for birth-and-death chains [90].
Section 13.2. We again refer to Woess [339] for systematic treatment of
random walks on infinite graphs.
Our general theme of using the infinite case to obtain limits for finite
chains goes back at least to [8], in the case of Z^d; similar ideas occur in
the study of interacting particle systems, relating properties of finite and
infinite-site models.
Section 13.2.2. There is a remarkable connection between recurrence of
reversible chains and a topic in Bayesian statistics: see Eaton [138]. Prop-
erties of random walk on fractal-like infinite subsets of Z^d are studied by
Telcs [322, 323].
Section 13.2.9. One view of (Yt ) is as one of several “toy models” for
the notion of random walk on fractional-dimensional lattice. Also, when we
seek to study complicated variations of random walk, it is often simpler to
use the hierarchical lattice than Z^d itself. See for instance the sophisticated
study of self-avoiding walks by Brydges et al [76]; it would be interesting to
see whether direct combinatorial methods could reproduce their results.
Section 13.2.10. Another class of sequences could be defined as follows.
There are certain continuous-time, continuous-space reversible processes on
compact spaces which “hit points” and for which τ0 < ∞; for example
13.4. NOTES ON CHAPTER 13 453
as the finite analog of “diffusions which hit points”. This property holds for
the discrete approximations to the examples above: (i) random walk on the n-cycle; (ii) random walk on graphs approximating fractals (section 13.1.6); (iii) random walk on random n-vertex trees (section 13.3.4).
Equivalence (13.58) is hard to find in textbooks. The property “trivial
boundary” is equivalent to “no non-constant bounded harmonic functions”
([339] Corollary 24.13), which is equivalent ([328] Theorem 6.5.1) to ex-
istence of successful shift-coupling of two versions of the chain started at
arbitrary points. The property (13.58) is equivalent ([328] Theorem 4.9.4)
to existence of successful couplings. In the setting of interest to us (con-
tinuized chains on countable space), existence of a shift-coupling (a priori
weaker than existence of a coupling) for the discrete-time chain implies ex-
istence of a coupling for the continuous-time chain, by using independence
of jump chain and hold times.
Section 13.3. Grimmett [175] surveys “random graphical networks” from
a somewhat different viewpoint, emphasising connections with statistical
physics models.
Section 13.3.1. More precise variants of Proposition 3.35 were developed
in the 1970s, e.g. [281, 92]. Lubotzky [243], who attributes this method of
proof of Proposition 13.11 to Sarnak [305], asserts the result for k ≥ 5 but
our own calculations give only k ≥ 7. Note that Proposition 13.11 uses
the permutation model of a 2k-regular random graph. In the alternative
uniform model we put 2k balls labeled 1, 2k balls labeled 2, ..., and 2k
balls labeled n into a box; then draw without replacement two balls at a
time, and put an edge between the two vertices. In both models the graphs
may be improper (multiple edges or self-loops) and unconnected, but are in
fact proper with probability Ω(1) and connected with probability 1 − o(1)
as n → ∞ for fixed k. Behavior of τc in the uniform model is implicitly
studied in Bollobás [54]. The L2 ideas underlying the proof of Proposition
13.12 were used by Broder and Shamir [68], Friedman [155] and Kahn and
Szemerédi [157] in the setting of the permutation model of random r-regular graphs. One result is that β ≡ max(λ_2, −λ_n) = O(2√(2r−1)/r) with probability
Chapter 14

Interacting Particles on Finite Graphs (March 10, 1994)
We have already encountered results whose natural proofs were “by cou-
pling”, and this is a convenient place to discuss couplings in general.
14.1 Coupling
If X and Y are random variables with Binomial (n, p_1) and (n, p_2) distributions respectively, and if p_1 ≤ p_2, then it is intuitively obvious that

P(X ≥ k) ≤ P(Y ≥ k) for all k. (14.1)
One could verify this from the exact formulas, but there is a more elegant
non-computational proof. For 1 ≤ i ≤ n define events (Ai , Bi , Ci ), indepen-
dent as i varies, with P (Ai ) = p1 , P (Bi ) = p2 − p1 , P (Ci ) = 1 − p2 . And
define
X′ = Σ_i 1_{A_i} = number of A's which occur
Y′ = Σ_i 1_{A_i ∪ B_i} = number of A's and B's which occur.
Then X′ ≤ Y′, so (14.1) holds for X′ and Y′; but then because X′ =_d X and Y′ =_d Y we have proved that (14.1) holds for X and Y. This is the
prototype of a coupling argument, which (in its wide sense) means
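The construction above can be packaged with a single uniform variable per trial: the event {U < p_1} plays the role of A_i and {U < p_2} the role of A_i ∪ B_i, so the coupled counts are ordered pointwise while keeping the right marginals. A quick simulation sketch (ours, not from the text):

```python
import random

def coupled_binomials(n, p1, p2, rng):
    # one uniform per trial drives both indicators:
    # {U < p1} is contained in {U < p2}, so X' <= Y' on every sample point,
    # while X' and Y' have Binomial(n, p1) and Binomial(n, p2) marginals
    x = y = 0
    for _ in range(n):
        u = rng.random()
        x += u < p1
        y += u < p2
    return x, y
```

Every sample satisfies X′ ≤ Y′, which is exactly the non-computational proof of (14.1).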
Then
Pi1 (Xt = 0) = P (Yt = 0) ≥ P (Zt = 0) = Pi2 (Xt = 0).
But by reversibility

P_{i_1}(X_t = 0) = (π_0/π_{i_1}) P_0(X_t = i_1)
and similarly for i2 , establishing (a).
Existence of processes satisfying (14.2) is a consequence of the Doeblin
coupling discussed below. The proof of part (b) involves a different technique
and is deferred to section 14.1.3.
Take (X_t^{(i)}, X_t^{(j)}) to be the chain on I × I with transition rate matrix Q̃ and initial position (i, j). Then (14.3) must hold, and T ≡ min{t : X_t^{(i)} = X_t^{(j)}} is a coupling time. This construction gives a Markov coupling, and all the
examples where we use the coupling inequality will be of this form. In
practice it is much more understandable to define the joint process in words
xxx red and black particles.
A particular choice of Q̃ is
in which case the joint process is called the Doeblin coupling. In words, the
Doeblin coupling consists of starting one particle at i and the other particle
at j, and letting the two particles move independently until they meet, at
time Mi,j say, and thereafter letting them stick together. In the particular
case of a birth-and-death process, the particles cannot cross without meeting (in continuous time), and so if i < j then X_t^{(i)} ≤ X_t^{(j)} for all t, the property we used at (14.2).
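A discrete-time caricature of the Doeblin coupling (our own sketch; the text works in continuous time): two lazy walks on the n-cycle move independently until they first meet, and are glued together afterwards, so only the meeting time matters.

```python
import random

def doeblin_meeting_time(n, i, j, t_max, rng):
    # lazy walks on the n-cycle started at i and j, stepping independently;
    # returns the first time they occupy the same vertex (None if never,
    # within the time budget); after that time the coupled pair would move
    # together, so the simulation can stop at the meeting
    x, y = i, j
    for t in range(1, t_max + 1):
        x = (x + rng.choice((-1, 0, 1))) % n
        y = (y + rng.choice((-1, 0, 1))) % n
        if x == y:
            return t
    return None
```

On a small cycle the walks meet quickly, giving an empirical estimate of EM_{i,j}.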
q̃(i, j; i^u, j^u) = 1/d if i_u = j_u
q̃(i, j; i^u, j) = 1/d if i_u ≠ j_u
q̃(i, j; i, j^u) = 1/d if i_u ≠ j_u.
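These rates couple two walks on a product space coordinate by coordinate: a matched coordinate is refreshed jointly and stays matched, while a mismatched one is refreshed in one chain at a time and so locks in once addressed. A discrete caricature for the d-cube (our own sketch, which only tracks this locking mechanism and is not marginal-exact) shows the coupling time behaving like a coupon-collector time:

```python
import random

def cube_coupling_time(d, rng):
    # start maximally mismatched on {0,1}^d; pick a coordinate u uniformly:
    # matched  -> flip it in both chains (they stay matched),
    # mismatched -> copy it from one chain to the other (they become matched)
    x, y = [0] * d, [1] * d
    t = 0
    while x != y:
        t += 1
        u = rng.randrange(d)
        if x[u] == y[u]:
            x[u] ^= 1
            y[u] ^= 1
        elif rng.random() < 0.5:
            x[u] = y[u]
        else:
            y[u] = x[u]
    return t
```

In this scheme the coupling time is exactly the time to select every coordinate at least once, a coupon-collector time with mean d H_d ≈ d log d.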
X_s^{(0,0)} ≤ X_s^{(0,j)}, 0 ≤ s ≤ t.
It follows that EM_{i,j} = ½ E_i T_j. The next result shows this equality holds under less symmetry, and (more importantly) that an inequality holds without any symmetry.
Proof. This is really just a special case of the cat and mouse game of Chapter
3 section yyy, where the player is using a random strategy to decide which
animal to move. Write Xt and Yt for the chains started at i and j. Write
f (x, y) = Ex Ty − Eπ Ty . Follow the argument in Chapter 3 yyy to verify
Then

E_i T_j − E_π T_j = ES_0
 = ES_{M_{i,j}} by the optional sampling theorem
 = 2EM_{i,j} + Ef(X_{M_{i,j}}, Y_{M_{i,j}})
 = 2EM_{i,j} − E t̄(X_{M_{i,j}}), where t̄(k) ≡ E_π T_k.
In the symmetric case we have t̄(k) = τ0 for all k, establishing the desired
equality. In general we have t̄(k) ≤ maxi,j Ei Tj and the stated inequality
follows.
Remarks. Intuitively the bound in Proposition 14.5 should be reasonable
for “not too asymmetric” graphs. But on the n-star (Chapter 5 yyy), for ex-
ample, we have maxi,j EMi,j = Θ(1) while maxi,j Ei Tj = Θ(n). The “Θ(1)”
in that example comes from concentration of the stationary distribution,
and on a regular graph we can use Chapter 3 yyy to obtain

Σ_i Σ_j π_i π_j EM_{i,j} ≥ (n − 1)²/(2n).
But we can construct regular graphs which mimic the n-star in the sense
that maxi,j EMi,j = o(τ0 ). A more elaborate result, which gives the correct
order of magnitude on the n-star, was given in Aldous [15].
14.2. MEETING TIMES 463
The proof is too lengthy to reproduce, but let us observe as a corollary that
we can replace the maxi,j Ei Tj bound in Proposition 14.5 by the a priori
smaller quantity τ0 , at the expense of some multiplicative constant.
Proof of Corollary 14.7. First recall from Chapter 4 yyy the inequality
τ1 ≤ 66τ0 . (14.10)
Open Problem 14.8 In the setting of Proposition 14.6, does there exist
an absolute constant K such that
K max_{i,j} EM_{i,j} ≥ ( Σ_i π_i / max(E_π T_i, τ_1) )^{−1} ?
The other open problem is whether some modification of the proof of Propo-
sition 14.5 would give a small constant K in Corollary 14.7. To motivate
this question, note that the coupling inequality applied to the Doeblin coupling shows that for any chain d̄(t) ≤ max_{i,j} P(M_{i,j} > t). Then Markov's inequality shows that the variation threshold satisfies τ_1 ≤ e max_{i,j} EM_{i,j}.
In the reversible setting, Proposition 14.5 now implies τ1 ≤ eKτ0 where K
is the constant in Corollary 14.7. So a direct proof of Corollary 14.7 with
small K would improve the numerical constant in inequality (14.10).
each edge e and each direction on e, create a Poisson process of rate 1/r. In
the figure, G is the 8-cycle, “time” is horizontal and an event of the Poisson
process for edge (i, j) at time t is indicated by a vertical arrow i → j at time
t.
[Figure: the graphical construction on the 8-cycle. Arrows mark Poisson events on directed edges; the horizontal axis is time, read left to right (0 to t_0) for the voter model and right to left for the coalescing random walk.]
The horizontal lines in the figure indicate part of the trajectories. In terms
of the coalescing random walks, the particles starting at 5 and 7 coalesce,
and the cluster is at 4 at time t0 . In terms of the voter model, the opinion
initially held by person 4 is held by persons 5 and 7 at time t0 . The reader
may (provided this is not a library book) draw in the remaining trajectories,
and will find that exactly 3 of the initial opinions survive, i.e. that the
random walks coalesce into 3 clusters.
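The duality can be checked numerically: driven by the same law of directed-edge events, the number of distinct opinions in the voter model after m events agrees in distribution with the number of clusters of coalescing walks after m events. A simulation sketch on the n-cycle (our own; iid discrete events stand in for the Poisson process, which is legitimate here because the reversed event stream has the same law):

```python
import random

def voter_distinct(n, m, rng):
    # voter model on the n-cycle: at each event a uniform directed edge
    # (i -> j) fires and person j adopts person i's opinion
    op = list(range(n))
    for _ in range(m):
        i = rng.randrange(n)
        j = (i + rng.choice((-1, 1))) % n
        op[j] = op[i]
    return len(set(op))

def coalescing_clusters(n, m, rng):
    # coalescing walks, one particle per vertex: at an event (i -> j)
    # every particle at i jumps to j, merging with anything already there
    pos = list(range(n))
    for _ in range(m):
        i = rng.randrange(n)
        j = (i + rng.choice((-1, 1))) % n
        pos = [j if p == i else p for p in pos]
    return len(set(pos))
```

Averaging each count over many runs, the two means agree up to simulation error, which is the duality statement in expectation.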
In particular, the event (for the voter model)
is exactly the same as the event (for the coalescing random walk process)
number of edges linking A and A^c ≥ r|A|(n − |A|)/(n τ_c). (14.11)
Proposition 14.9 (a) If G is s-edge-connected then EC ≤ rn²/(4s).
(b) EC ≤ (2 log 2) τ_c n.
14.3. COALESCING RANDOM WALKS AND THE VOTER MODEL 467
Proof. The proof uses two ideas. The first is a straightforward compari-
son lemma.
where E ∗ T ∗ refers to mean hitting time for the chain X ∗ on states {0, 1, . . . , n}
with transition rates
qi,i+1 = qi,i−1 = a(i).
The second idea is that our voter model can be used to define a less-
informative “two-party” model. Fix an initial set B of vertices, and group
the opinions of the individuals in B into one political party (“Blues”) and
group the remaining opinions into a second party (“Reds”). Let NtB be
the number of Blues at time t and let C B ≤ C be the first time at which
everyone belongs to the same party. Then
P(N_{t+dt}^B = N_t^B + 1 | configuration at time t)
 = P(N_{t+dt}^B = N_t^B − 1 | configuration at time t)
 = ((number of edges linking Blue-Red vertices at time t)/r) dt. (14.12)
Cases (a) and (b) now use Lemma 14.10 with different comparison chains.
For (a), while both parties coexist, the number of edges being counted in
(14.12) is at least s. To see this, fix two vertices v, x of different parties,
and consider (c.f. Chapter 6 yyy) a collection of s edge-disjoint paths from
v to x. Each path must contain at least one edge linking Blue to Red. Thus
the quantity (14.12) is at least (s/r) dt. If that quantity were exactly ½ dt then N_t^B would be continuous-time simple random walk on {0, . . . , n} and the quantity EC^B would be the mean time, starting at |B|, for simple random walk to hit 0 or n, which by Chapter 5 yyy we know equals |B|(n − |B|). So using Lemma
14.10
EC^B ≤ (r/(2s)) |B|(n − |B|) ≤ rn²/(8s). (14.13)
For (b), use (14.11) to see that the quantity (14.12) must be at least (N_t^B(n − N_t^B)/(nτ_c)) dt. Consider for comparison the chain on {0, . . . , n} with transition rates q_{i,i+1} = q_{i,i−1} = i(n − i)/n. For this chain
E_i^* T^*_{{0,n}} = Σ_{j=1}^{n−1} E_i^*(time spent in j before T^*_{{0,n}})
 = Σ_{j=1}^{n−1} m_i(j) · (1/2)/(j(n − j)/n)
where m_i(j) is the mean occupation time for simple symmetric random walk and the second factor is the speed-up for the comparison chain under consideration. Using the formula for m_i(j) from Chapter 5 yyy,
E_i^* T^*_{{0,n}} = i Σ_{j=i}^{n−1} 1/j + (n − i) Σ_{j=1}^{i−1} 1/(n − j) ≤ n log 2.
Proof. We can construct the coalescing random walk process in two steps.
Order the vertices arbitrarily as i1 , . . . , in . First let the n particles perform
independent random walks for ever, with the particles starting at i, j first
meeting at time Mi,j , say. Then when two particles meet, let them cluster
and follow the future path of the lower-labeled particle. Similarly, when
two clusters meet, let them cluster and follow the future path of the lowest-
labeled particle in the combined cluster. Using this construction, we see
In the general setting, there is good reason to believe that the log term in
Proposition 14.11 can be removed.
Open Problem 14.13 Prove there exists an absolute constant K such that
on any graph
EC ≤ K max Ev Tw .
v,w
The assertion of Open Problem 14.12 in the case of the torus Z_m^d for
Lemma 14.14

γ = 2EE/(λrn²) + 1/n,

where E is the number of edges with endpoints in different components, under the stationary distribution.
Proof. Run the stationary process, and let A(t) and E(t) be the partition and the number of edges linking distinct components, at time t, and let S(t) = Σ_i |A_i(t)|². Then
Proof. Clearly EE is at most the total number of edges, nr/2, so the upper
bound follows from the lemma. For the lower bound, (14.11) implies
and hence

EE ≥ (r/(2nτ_c))(n² − n²γ)
and the bound follows from the lemma after brief manipulation.
We now consider bounds on γ obtainable by working with the dual pro-
cess. Consider the meeting time M of two independent random walks started
with the stationary distribution. Then by duality (xxx explain)
γ = P (M < ξ(2λ) )
where ξ(2λ) denotes a random variable with exponential (2λ) distribution in-
dependent of the random walks. Now M is the hitting time of the stationary
“product chain” (i.e. two independent continuous-time random walks) on
the diagonal A = {(v, v)}, so by Chapter 3 yyy M has completely monotone
distribution, and we shall use properties of complete monotonicity to get
Corollary 14.16

1/(1 + 2λEM) ≤ γ ≤ 1/(1 + 2λEM) + τ_2/EM.
Proof. We can write M =_d R ξ(1), where ξ(1) has exponential(1) distribution and R is independent of ξ(1). Then
(recall that τ2 is the same for the product chain as for the underlying random
walk). So
1 − γ = P(M ≥ ξ(2λ))
 = ∫_0^∞ P(M ≥ t) 2λ e^{−2λt} dt
 ≥ 2λEM/(1 + 2λEM) − τ_2/EM
and the upper bound follows after rearrangement.
Note that on a vertex-transitive graph Proposition 14.5 implies EM = τ_0/2. So on a sequence of vertex-transitive graphs with τ_2/τ_0 → 0 and with λτ_0 → θ, say, Corollary 14.16 implies γ → 1/(1 + θ). But in this setting we can say much more, as the next section will show.
For each person i and each time interval [t, t + dt], with chance
dt the person chooses uniformly at random a neighbor (j, say)
and changes (if necessary) their opinion to the opposite of the
opinion of person j.
The essential difference from the voter model is that opinions don’t disap-
pear. Writing ηv (t) for the opinion of individual v at time t, the process
η(t) = (ηv (t), v ∈ G) is a continuous-time Markov chain on state-space
{−1, 1}G . So, provided this chain is irreducible, there is a unique stationary
distribution (ηv , v ∈ G) for the antivoter model.
This model on infinite lattices was studied in the “interacting particle
systems” literature [172, 231], and again the key idea is duality. In this
model the dual process consists of annihilating random walks. We will not
go into details about the duality relation, beyond the following definition we
need later. For vertices v, w, consider independent continuous-time random
walks started at v and at w. We have previously studied Mv,w , the time
at which the two walks first meet, but now we define Nv,w to be the total
number of jumps made by the two walks, up to and including the time Mv,w .
Set Nv,v = 0.
Donnelly and Welsh [129] considered our setting of a finite graph, and
showed that Proposition 14.19 is a simple consequence of the duality relation.
In particular, defining

S ≡ Σ_v η_v
so that S or −S is the "margin of victory" in an election, we have ES = 0 and

var S = Σ_v Σ_w c(v, w). (14.19)
On a bipartite graph with bipartition (A, Ac ) the stationary distribution
is
and c(v, w) = −1 for each edge. Otherwise c(v, w) > −1 for every edge.
The antivoter process is in general a non-reversible Markov chain, be-
cause it can transition from a configuration in which v has the same opinion
as all its neighbors to the configuration where v has the opposite opinion,
but the reverse transition is impossible. Nevertheless we could use duality
to discuss convergence time. But, following [129], the spatial structure of
the stationary distribution is a more novel and hence more interesting ques-
tion. Intuitively we expect neighboring vertices to be negatively correlated
and the variance of S to be smaller than n (the variance if opinions were
independent). In the case of the complete graph on n vertices, Nv,w has (for
w ≠ v) the geometric distribution

P(N_{v,w} > m) = (1 − 1/(n − 1))^m, m ≥ 0
from which we calculate c(v, w) = −1/(2n − 3) and var S = n(n − 2)/(2n − 3) < n/2.
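The complete-graph formula is easy to check by simulation (our own sketch). The discrete jump chain below has the same stationary law as the continuous-time model, because every person updates at rate 1 and so all holding rates are equal.

```python
import random

def antivoter_var_S(n, steps, burn, seed=0):
    # antivoter model on the complete graph K_n: a uniform person i adopts
    # the OPPOSITE of a uniform other person j's opinion; estimates var S
    # for S = sum of opinions, from a long stationary run
    rng = random.Random(seed)
    eta = [rng.choice((-1, 1)) for _ in range(n)]
    s = sum(eta)
    tot = tot2 = 0
    for t in range(steps):
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1                     # uniform j != i
        new = -eta[j]
        s += new - eta[i]
        eta[i] = new
        if t >= burn:
            tot += s
            tot2 += s * s
    m = steps - burn
    mean = tot / m
    return tot2 / m - mean * mean
```

For n = 10 the estimate should be close to n(n − 2)/(2n − 3) = 80/17 ≈ 4.7.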
We next investigate var S in general.
Proof. Writing (ηt ) for the stationary process and dSt = S(ηt+dt ) − S(ηt ),
we have
and so
Corollary 14.21 Let κ = κ(G) be the largest integer such that, for any
subset A of vertices, the number of edges with both ends in A or both ends
in Ac is at least κ. Then
2κ/r ≤ var S ≤ n.
Here κ is a natural measure of “non-bipartiteness” of G. We now show how
to improve the upper bound by exploiting duality. One might expect some
better upper bound for “almost-bipartite” graphs, but Examples 14.27 and
14.28 indicate this may be difficult.
Now consider

T^− = min{t ≥ 0 : X_{−t}^{(1)} = X_{−t}^{(2)}}.
If successive events occur at t_0 and t_1, then there are t_1 − t_0 − 1 times s with t_0 < s < t_1, and another ergodic argument shows

P(T + T^− = l) = (l − 1)P(L = l)/EL, l ≥ 2.
So

n^{−1}(P(L is even) − P(L is odd)) = (1/EL) Σ_{l≥2} (−1)^l P(L = l)   since EL = n
 = Σ_{l≥2} ((−1)^l/(l − 1)) P(T + T^− = l). (14.21)
Conditional on (X_0^{(1)}, X_0^{(2)}) = (v, w) with w ≠ v, we have that T and T^− are independent and identically distributed. So the sum in (14.21) is positive, implying P(L is odd) < 1/2, so the Proposition follows from the Lemma.
Implicit in the proof are a corollary and an open problem. The open
problem is to show that var S is in fact maximized on the complete graph.
This might perhaps be provable by sharpening the inequality in (14.22).
c_edge < 0.
From the explicit form of the stationary distribution we can deduce that
as n → ∞ the asymptotic distribution of S is Normal. As an exercise in
technique (see Notes) we ask
where (c.f. Chapter 7 yyy) the pi,j are the transition probabilities for the
birth-and-death chain associated with the discrete-time random walk. In
principle we can solve these equations to determine f (1) = cedge . Note
that the bipartite case is the case where pi,i ≡ 0, which is the case where
f (i) ≡ (−1)i and cedge = −1. A simple example of a non-bipartite distance-
regular graph is the “2-subsets of a d-set” example (Chapter 7 yyy) for d ≥ 4.
Here ∆ = 2 and

p_{1,0} = 1/(2(d−2))   p_{1,1} = (d−3)/(2(d−2))   p_{1,2} = (d−2)/(2(d−2))
p_{2,1} = 4/(2(d−2))   p_{2,2} = (2d−8)/(2(d−2)).
Q ≡ min{j : M_j is odd},

we have

P(walks meet before Q) = 1/5 + O(m^{−1}).
Writing Q_1 = Q, Q_2, Q_3, . . . for the successive j's at which M_j changes parity,
and

L ≡ max{k : M_{Q_k} < meeting time}

for the number of parity changes before meeting,

P(L = l) = (1/5)(4/5)^l + O(m^{−1}), l ≥ 0.
So P(N_{v_i,w_j} is odd) = P(L is even) → 5/9 and (14.24) follows easily.
Consider the torus Z_m^d with d ≥ 2 and with even m ≥ 4, and make the
that the modification had only “local” effect, in that c(vm , wm ) → −1. In
fact,
c(vm , wm ) → −1, d = 2
→ β(d) > −1, d ≥ 3.
We don’t give details, but the key observation is that in d ≥ 3 there is a
bounded-below chance that independent random walks started from vm and
wm will traverse one of the modified edges before meeting.
If the answer is "yes", then the general bound of Chapter 4 yyy will give

τ_1 ≤ τ̃_2 (1 + ½ log n!) = O(τ̃_2 n log n)
but the following bound is typically better.
Now U_i is distributed as M_{v(i),w(i)}, where v(i) and w(i) are the initial positions of particle i in the two versions and where M_{v,w} denotes the meeting time for independent copies of the underlying random walk X̃_t. Writing m* = max_{v,w} EM_{v,w}, we have by subexponentiality (as at (14.16))

P(M_{v,w} > t) ≤ exp(1 − t/(e m*))

and so

d̄(t) ≤ n exp(1 − t/(e m*)).
This leads to τ_1 ≤ (2 + log n) e m* and the result follows from Proposition 14.5.
14.6. OTHER INTERACTING PARTICLE MODELS 483
There are many ways to set up such transition rates. Here is one way,
observed by Neuhauser and Sudbury [269]. For each edge (w, v) at time t
with w occupied,
if v is occupied at time t, then with chance dt it becomes unoccupied by
time t + dt
if v is unoccupied at time t, then with chance θdt it becomes occupied
by time t + dt.
If we exclude the empty configuration (which cannot be reached from other
configurations) the state space is irreducible and the stationary distribution
is given by (14.25) conditioned on being non-empty.
Convergence times for this model have not been studied, so we ask
Open Problem 14.31 Give bounds on the relaxation time τ2 in this model.
Picture one particle with charge +1/2 and the other particle with charge −1/2, and then γ_i^{(u)} has units "charge × time". Clearly Eγ_i^{(u)} = 0 and it is
14.7. OTHER COUPLING EXAMPLES 485
easy to calculate

Eγ_i^{(u)} γ_j^{(u)} = (1/2) E ∫_{−u}^0 ∫_{−u}^0 (1_{(X_s = i, X_t = j)} − π_i π_j) ds dt
 = (1/2) ∫_{−u}^0 ∫_{−u}^0 π_i (P(X_t = j | X_s = i) − π_j) ds dt
 = π_i u ∫_0^u (1 − r/u)(p_{ij}(r) − π_j) dr
and hence

u^{−1} Eγ_i^{(u)} γ_j^{(u)} → π_i Z_{ij} as u → ∞. (14.27)
The central limit theorem for Markov chains (Chapter 2 yyy) implies that the u → ∞ distributional limit of (u^{−1/2} γ_i^{(u)}) is some mean-zero Gaussian family (γ_i), and so (14.27) identifies the limit as the family with covariances (14.26).
As presented here the construction may seem an isolated curiosity, but
in fact it relates to deep ideas developed in the context of continuous-time-
and-space reversible Markov processes. In that context, the Dynkin iso-
morphism theorem relates continuity of local times to continuity of sample
paths of a certain Gaussian process. See [253] for a detailed account. And
various interesting Gaussian processes can be constructed via “charged par-
ticle” models – see [2] for a readable account of such constructions. Whether
these sophisticated ideas can be brought to bear upon the kinds of finite-
state problems in this book is a fascinating open problem.
τ_2 ≥ m³/(8π²)
Fix a finite alphabet A of size |A|. Fix m, and consider the set A^m of "words" x = (x_1, . . . , x_m) with each x_i ∈ A. Consider the Markov chain on A^m in which a step x → y is specified by the following two-stage procedure.
Stage 1. Pick a permutation σ of {1, 2, . . . , m} uniformly at random from the set of permutations σ satisfying x_{σ(i)} = x_i ∀i.
Stage 2. Let (c_j(σ); j ≥ 1) be the cycles of σ. For each j, and independently as j varies, pick uniformly an element α_j of A, and define y_i = α_j for every i ∈ c_j(σ).
Here is an alternative description. Write Π for the set of permutations of
{1, . . . , m}. Consider the bipartite graph on vertices Am ∪ Π with edge-set
{(x, σ) : xσ(i) = xi ∀i}. Then the chain is random walk on this bipartite
graph, watched every second step when it is in Am .
From the second description, it is clear that the stationary probabilities π(x) are proportional to the degree of x in the bipartite graph, giving

π(x) ∝ Π_a n_a(x)!

where n_a(x) is the number of coordinates i with x_i = a.
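A short simulation sketch of the two-stage step (our own code). Empirical frequencies can be compared with π(x) ∝ Π_a n_a(x)!: for A = {0, 1} and m = 3, each constant word gets probability 6/24 = 1/4 and each mixed word 2/24 = 1/12.

```python
import random
from collections import defaultdict

def step(x, alphabet, rng):
    m = len(x)
    # Stage 1: a uniform permutation sigma with x[sigma(i)] == x[i] for all i
    # is a product of independent uniform shuffles of each letter class
    groups = defaultdict(list)
    for i, a in enumerate(x):
        groups[a].append(i)
    sigma = [0] * m
    for positions in groups.values():
        image = positions[:]
        rng.shuffle(image)
        for i, j in zip(positions, image):
            sigma[i] = j
    # Stage 2: recolour each cycle of sigma with an independent uniform letter
    y, seen = list(x), [False] * m
    for start in range(m):
        if not seen[start]:
            a = rng.choice(alphabet)
            j = start
            while not seen[j]:
                seen[j] = True
                y[j] = a
                j = sigma[j]
    return tuple(y)
```

Running the chain for many steps and tallying visited words gives long-run frequencies close to these stationary probabilities.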
In particular P(X_1(t) ≠ X_2(t)) ≤ m(1 − 1/|A|)^t and the coupling inequality (14.5) gives (14.28).
xxx proof of Lemma – tie up with earlier discussion.
ET ≥ (1 − (r − 1)^{−d(i,j)/2}) (r − 1)^{g/4 − 1/2} / (r − 2).
Proof. We quote a simple lemma, whose proof is left to the reader.
Eθ^{ξ_1+ξ_2} ≤ ((r − 1)/r) θ² + (1/r) θ^{−2}, 0 < θ < 1.
In particular, setting θ = (r − 1)^{−1/2}, we have Eθ^{ξ_1+ξ_2} ≤ 1.
Now consider the distance D_t ≡ d(X_t^{(i)}, X_t^{(j)}) between the two particles. The key idea is

E(θ^{D_{t+1}} − θ^{D_t} | X_t^{(i)}, X_t^{(j)}) ≤ 0 if D_t ≤ ⌊g/2⌋ − 1
 ≤ (θ^{−2} − 1)θ^{⌊g/2⌋} else. (14.29)
The second inequality follows from the fact D_{t+1} − D_t ≥ −2. For the first inequality, if D_t ≤ ⌊g/2⌋ − 1 then the incremental distance D_{t+1} − D_t is distributed as ξ_1 + ξ_2 in the lemma, so the conditional expectation of θ^{D_{t+1} − D_t} is ≤ 1. Now define a martingale (M_t) via M_0 = 0 and

M_{t+1} − M_t = θ^{D_{t+1}} − θ^{D_t} − E(θ^{D_{t+1}} − θ^{D_t} | X_t^{(i)}, X_t^{(j)}).
14.8. NOTES ON CHAPTER 14 489
Rearranging,

θ^{D_t} − θ^{D_0} = M_t + Σ_{s=0}^{t−1} E(θ^{D_{s+1}} − θ^{D_s} | X_s^{(i)}, X_s^{(j)})
 ≤ M_t + (θ^{−2} − 1)θ^{⌊g/2⌋} t by (14.29).
Apply this inequality at the coupling time T and take expectations: we have
EMT = 0 by the optional sampling theorem (Chapter 2 yyy) and DT = 0,
so
1 − θ^{d(i,j)} ≤ (θ^{−2} − 1)θ^{⌊g/2⌋} ET
[2] R.J. Adler and R. Epstein. Some central limit theorems for Markov
paths and some properties of Gaussian random fields. Stochastic Pro-
cess. Appl., 24:157–202, 1987.
[5] D.J. Aldous. Markov chains with almost exponential hitting times.
Stochastic Process. Appl., 13:305–310, 1982.
[6] D.J. Aldous. Some inequalities for reversible Markov chains. J. London
Math. Soc. (2), 25:564–576, 1982.
[7] D.J. Aldous. Minimization algorithms and random walk on the d-cube.
Ann. Probab., 11:403–413, 1983.
[8] D.J. Aldous. On the time taken by random walks on finite groups to
visit every state. Z. Wahrsch. Verw. Gebiete, 62:361–374, 1983.
[9] D.J. Aldous. Random walks on finite groups and rapidly mixing
Markov chains. In Seminaire de Probabilites XVII, pages 243–297.
Springer-Verlag, 1983. Lecture Notes in Math. 986.
[10] D.J. Aldous. On the Markov chain simulation method for uniform
combinatorial distributions and simulated annealing. Probab. Engi-
neering Inform. Sci., 1:33–46, 1987.
BIBLIOGRAPHY
[24] D.J. Aldous, L. Lovász, and P. Winkler. Mixing times for uniformly
ergodic Markov chains. Stochastic Process. Appl., 71:165–185, 1997.
[37] L. Babai. Probably true theorems, cry wolf? Notices Amer. Math.
Soc., 41:453–454, 1994.
[40] M.T. Barlow. Random walks and diffusions on fractals. In Proc. ICM
Kyoto 1990, pages 1025–1035. Springer–Verlag, 1991.
[43] M.F. Barnsley and J.H. Elton. A new class of Markov processes for
image encoding. Adv. in Appl. Probab., 20:14–32, 1988.
[44] J.R. Baxter and R.V. Chacon. Stopping times for recurrent Markov
processes. Illinois J. Math., 20:467–475, 1976.
[46] J. Besag and P.J. Greene. Spatial statistics and Bayesian computation.
J. Royal Statist. Soc. (B), 55:25–37, 1993. Followed by discussion.
[47] S. Bhatt and J-Y Cai. Taking random walks to grow trees in hyper-
cubes. J. Assoc. Comput. Mach., 40:741–764, 1993.
[56] A. Borodin, W.L. Ruzzo, and M. Tompa. Lower bounds on the length
of universal traversal sequences. J. Computer Systems Sci., 45:180–
203, 1992.
[60] L.A. Breyer and G.O. Roberts. From Metropolis to diffusions: Gibbs
states and optimal scaling. Technical report, Statistical Lab., Cam-
bridge U.K., 1998.
[61] G. Brightwell and P. Winkler. Extremal cover times for random walks
on trees. J. Graph Theory, 14:547–554, 1990.
[63] P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods.
Springer–Verlag, 1987.
[66] A. Broder and A.R. Karlin. Bounds on the cover time. J. Theoretical
Probab., 2:101–120, 1989.
[77] R. Bubley and M. Dyer. Path coupling: a technique for proving rapid
mixing in Markov chains. In Proc. 38’th IEEE Symp. Found. Comp.
Sci., pages 223–231, 1997.
[84] E.A. Carlen, S. Kusuoka, and D.W. Stroock. Upper bounds for sym-
metric Markov transition functions. Ann. Inst. H. Poincaré Probab.
Statist., Suppl. 2:245–287, 1987.
[87] J. Cheeger. A lower bound for the lowest eigenvalue for the Laplacian.
In R. C. Gunning, editor, A Symposium in Honor of S. Bochner, pages
195–199. Princeton Univ. Press, 1970.
[89] M. F. Chen. Trilogy of couplings and general formulas for lower bound
of spectral gap. In L. Accardi and C. Heyde, editors, Probability To-
wards 2000, number 128 in Lecture Notes in Statistics, pages 123–136.
Springer–Verlag, 1996.
[91] M.-H. Chen, Q.-M. Shao, and J.G. Ibrahim. Monte Carlo Methods in
Bayesian Computation. Springer–Verlag, 2000.
[95] F.R.K. Chung and S.-T. Yau. Eigenvalues of graphs and Sobolev
inequalities. Combin. Probab. Comput., 4:11–25, 1995.
[102] M.K. Cowles and B.P. Carlin. Markov chain Monte Carlo convergence
diagnostics: A comparative review. J. Amer. Statist. Assoc., 91:883–
904, 1996.
[103] J.T. Cox. Coalescing random walks and voter model consensus times
on the torus in Z d . Ann. Probab., 17:1333–1366, 1989.
[104] J.T. Cox and D. Griffeath. Mean field asymptotics for the planar
stepping stone model. Proc. London Math. Soc., 61:189–208, 1990.
[114] P. Diaconis and J.A. Fill. Examples for the theory of strong stationary
duality with countable state spaces. Prob. Engineering Inform. Sci.,
4:157–180, 1990.
[115] P. Diaconis and J.A. Fill. Strong stationary times via a new form of
duality. Ann. Probab., 18:1483–1522, 1990.
[128] P. Donnelly and D. Welsh. Finite particle systems and infection mod-
els. Math. Proc. Cambridge Philos. Soc., 94:167–182, 1983.
[131] P.G. Doyle and J.L. Snell. Random Walks and Electrical Networks.
Mathematical Association of America, Washington DC, 1984.
[132] R. Durrett. Lecture Notes on Particle Systems and Percolation.
Wadsworth, Pacific Grove CA, 1988.
[133] R. Durrett. Probability: Theory and Examples. Wadsworth, Pacific
Grove CA, 1991.
[134] M. Dyer and A. Frieze. Computing the volume of convex bodies: A
case where randomness provably helps. In B. Bollobás, editor, Proba-
bilistic Combinatorics And Its Applications, volume 44 of Proc. Symp.
Applied Math., pages 123–170. American Math. Soc., 1991.
[135] M. Dyer, A. Frieze, and R. Kannan. A random polynomial time algo-
rithm for approximating the volume of convex bodies. In Proc. 21st
ACM Symp. Theory of Computing, pages 375–381, 1989.
[136] M. Dyer, A. Frieze, and R. Kannan. A random polynomial time al-
gorithm for approximating the volume of convex bodies. J. Assoc.
Comput. Mach., 38:1–17, 1991.
[137] M. Dyer and C. Greenhill. A genuinely polynomial-time algorithm for
sampling two-rowed contingency tables. Technical report, University
of Leeds, U.K., 1998. Unpublished.
[138] M.L. Eaton. Admissibility in quadratically regular problems and recurrence of symmetric Markov chains: Why the connection? J. Statist. Plann. Inference, 64:231–247, 1997.
[139] R.G. Edwards and A.D. Sokal. Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D, 38:2009–2012, 1988.
[140] B. Efron and C. Stein. The jackknife estimate of variance. Ann. Statist., 9:586–596, 1981.
[141] S.N. Ethier and T. G. Kurtz. Markov Processes: Characterization and
Convergence. Wiley, New York, 1986.
[142] U. Feige. A tight lower bound on the cover time for random walks on
graphs. Random Struct. Alg., 6:433–438, 1995.
[143] U. Feige. A tight upper bound on the cover time for random walks on graphs. Random Struct. Alg., 6:51–54, 1995.
[144] U. Feige. Collecting coupons on trees, and the cover time of random
walks. Comput. Complexity, 6:341–356, 1996/7.
[151] J.A. Fill. Strong stationary duality for continuous-time Markov chains. Part I: Theory. J. Theoretical Probab., 5:45–70, 1992.
[152] L. Flatto, A.M. Odlyzko, and D.B. Wales. Random shuffles and group
representations. Ann. Probab., 13:154–178, 1985.
[156] J. Friedman, editor. Expanding Graphs. Amer. Math. Soc., 1993. DIMACS volume 10.
[160] A. Gelman, G.O. Roberts, and W.R. Gilks. Efficient Metropolis jumping rules. In Bayesian Statistics, volume 5, pages 599–608. Oxford University Press, 1996.
[161] A. Gelman and D.B. Rubin. Inference from iterative simulation using multiple sequences. Statistical Science, 7:457–472, 1992. With discussion.
[166] W.R. Gilks, G.O. Roberts, and E.I. George. Adaptive direction sampling. The Statistician, 43:179–189, 1994.
[168] F. Gobel and A.A. Jagers. Random walks on graphs. Stochastic Process. Appl., 2:311–336, 1974.
[179] K.J. Harrison and M.W. Short. The last vertex visited in a random
walk on a graph. Technical report, Murdoch University, Australia,
1992.
[180] W.K. Hastings. Monte Carlo sampling methods using Markov chains
and their applications. Biometrika, 57:97–109, 1970.
[183] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University
Press, 1985.
[187] J. P. Imhof. On the range of Brownian motion and its inverse process.
Ann. Probab., 13:1011–1017, 1985.
[192] I. Iscoe and D. McDonald. Asymptotics of exit times for Markov jump
processes I. Ann. Probab., 22:372–397, 1994.
[195] E. Janvresse. Spectral gap for Kac’s model of the Boltzmann equation.
Ann. Probab., 29:288–304, 2001.
[199] M. Jerrum and A. Sinclair. The Markov chain Monte Carlo method: an approach to approximate counting and integration. In D. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems, pages 482–520, Boston MA, 1996. PWS.
[200] M.R. Jerrum. A very simple algorithm for estimating the number of
k-colorings of a low-degree graph. Random Struct. Alg., 7:157–165,
1995.
[201] M.R. Jerrum, L.G. Valiant, and V.V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theor. Computer Sci., 43:169–188, 1986.
[202] C.D. Meyer Jr. The role of the group generalized inverse in the theory
of finite Markov chains. SIAM Review, 17:443–464, 1975.
[205] J.D. Kahn, N. Linial, N. Nisan, and M.E. Saks. On the cover time of
random walks on graphs. J. Theoretical Probab., 2:121–128, 1989.
[210] R.M. Karp, M. Luby, and N. Madras. Monte Carlo approximation algorithms for enumeration problems. J. Algorithms, 10:429–448, 1989.
[214] J.G. Kemeny and J.L. Snell. Finite Markov Chains. Van Nostrand,
1960.
[215] J.G. Kemeny, J.L. Snell, and A.W. Knapp. Denumerable Markov Chains. Springer–Verlag, 2nd edition, 1976.
[217] C. Kipnis and S.R.S. Varadhan. Central limit theorem for additive
functionals of reversible Markov processes and applications to simple
exclusions. Comm. Math. Phys., 104:1–19, 1986.
[218] W.B. Krebs. Brownian motion on the continuum tree. Probab. Th.
Rel. Fields, 101:421–433, 1995.
[219] H.J. Landau and A.M. Odlyzko. Bounds for eigenvalues of certain
stochastic matrices. Linear Algebra Appl., 38:5–15, 1981.
[222] G.F. Lawler and A.D. Sokal. Bounds on the L^2 spectrum for Markov chains and Markov processes. Trans. Amer. Math. Soc., 309:557–580, 1988.
[226] G. Letac. A contraction principle for certain Markov chains and its
applications. In Random Matrices and Their Applications, volume 50
of Contemp. Math., pages 263–273. American Math. Soc., 1986.
[229] P. Lezaud. Chernoff-type bound for finite Markov chains. Ann. Appl.
Probab., 8:849–867, 1998.
[230] T.M. Liggett. Coupling the simple exclusion process. Ann. Probab.,
4:339–356, 1976.
[236] J.S. Liu, F. Liang, and W.H. Wong. The use of multiple-try method
and local optimization in Metropolis sampling. JASA, xxx:xxx, xxx.
[240] L. Lovász and P. Winkler. A note on the last new vertex visited by a
random walk. J. Graph Theory, 17:593–596, 1993.
[241] L. Lovász and P. Winkler. Efficient stopping rules for Markov chains.
In Proc. 27th ACM Symp. Theory of Computing, pages 76–82, 1995.
[253] M. B. Marcus and J. Rosen. Sample path properties of the local times
of strongly symmetric Markov processes via Gaussian processes. Ann.
Probab., 20:1603–1684, 1992.
[255] P.C. Matthews. Covering Problems for Random Walks on Spheres and
Finite Groups. PhD thesis, Statistics, Stanford, 1985.
[257] P.C. Matthews. Covering problems for Markov chains. Ann. Probab.,
16:1215–1228, 1988.
[259] P.C. Matthews. Mixing rates for Brownian motion in a convex polyhedron. J. Appl. Probab., 27:259–268, 1990.
[261] J.E. Mazo. Some extremal Markov chains. Bell System Tech. J.,
61:2065–2080, 1982.
[263] S.P. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability.
Springer–Verlag, 1993.
[264] J.W. Moon. Random walks on random trees. J. Austral. Math. Soc.,
15:42–53, 1973.
[283] J.W. Pitman. Occupation measures for Markov chains. Adv. in Appl.
Probab., 9:69–86, 1977.
[284] U. Porod. L^2 lower bounds for a special class of random walks. Probab. Th. Rel. Fields, 101:277–289, 1995.
[286] J. Propp and D. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Alg., 9:223–252, 1996.
[293] G.O. Roberts, A. Gelman, and W.R. Gilks. Weak convergence and
optimal scaling of random walk Metropolis algorithms. Ann. Appl.
Probab., 7:110–120, 1997.
[294] G.O. Roberts and R.L. Tweedie. Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika, 83:95–110, 1996.
[303] W. Rudin. Real and Complex Analysis. McGraw–Hill Book Co., New
York, 3rd edition, 1987.
[306] A.M. Sbihi. Covering Times for Random Walks on Graphs. PhD
thesis, McGill University, 1990.
[308] A. J. Sinclair. Improved bounds for mixing rates of Markov chains and
multicommodity flow. Combin. Probab. Comput., 1:351–370, 1992.
[310] R.L. Smith. Efficient Monte Carlo procedures for generating points
uniformly distributed over bounded regions. Operations Research,
32:1296–1308, 1984.
[317] D.E. Symer. Expanded ergodic Markov chains and cycling systems.
Senior thesis, Dartmouth College, 1984.
[318] R. Syski. Passage Times for Markov Chains. IOS Press, Amsterdam,
1992.
[322] A. Telcs. Spectra of graphs and fractal dimensions I. Probab. Th. Rel.
Fields, 85:489–497, 1990.
[325] P. Tetali. Design of on-line algorithms using hitting times. Bell Labs,
1994.
[330] A.R.D. van Slijpe. Random walks on regular polyhedra and other
distance-regular graphs. Statist. Neerlandica, 38:273–292, 1984.
[331] A.R.D. van Slijpe. Random walks on the triangular prism and other
vertex-transitive graphs. J. Comput. Appl. Math., 15:383–394, 1986.
[333] E. Vigoda. Improved bounds for sampling colorings. C.S. Dept., U.C.
Berkeley, 1999.
[334] I.C. Walters. The ever expanding expander coefficients. Bull. Inst.
Combin. Appl., 17:79–86, 1996.
[336] H.S. Wilf. The editor’s corner: The white screen problem. Amer.
Math. Monthly, 96:704–707, 1989.
[338] D.B. Wilson. Mixing times of lozenge tiling and card shuffling Markov
chains. Ann. Appl. Probab., 14:274–325, 2004.
[343] D. Zuckerman. A technique for lower bounding the cover time. SIAM
J. Discrete Math., 5:81–87, 1992.