Grundlehren der mathematischen Wissenschaften 351

A Series of Comprehensive Studies in Mathematics

Anton Bovier
Frank den Hollander

Metastability
A Potential-Theoretic Approach

Series editors

M. Berger P. de la Harpe N.J. Hitchin

A. Kupiainen G. Lebeau F.-H. Lin
S. Mori B.C. Ngô M. Ratner D. Serre
N.J.A. Sloane A.M. Vershik M. Waldschmidt

Editors-in-Chief
A. Chenciner J. Coates S.R.S. Varadhan

For further volumes:
www.springer.com/series/138
Anton Bovier
Institut für Angewandte Mathematik
Rheinische Friedrich-Wilhelms-Universität
Bonn, Germany

Frank den Hollander
Mathematisch Instituut
Universiteit Leiden
Leiden, The Netherlands

ISSN 0072-7830 ISSN 2196-9701 (electronic)

Grundlehren der mathematischen Wissenschaften
ISBN 978-3-319-24775-5 ISBN 978-3-319-24777-9 (eBook)
DOI 10.1007/978-3-319-24777-9

Library of Congress Control Number: 2015959720

Mathematics Subject Classification (2010): 60K35, 60J45, 82C26

Springer Cham Heidelberg New York Dordrecht London

© Springer International Publishing Switzerland 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

C’est une grande folie de vouloir être sage tout seul.

(François de La Rochefoucauld, Réflexions)

Metastability is a widespread phenomenon in the dynamics of non-linear systems (physical,
chemical, biological or economic) subject to the action of temporal random forces, typically
referred to as noise. In the narrower perspective of statistical physics, metastable behaviour
can be seen as the dynamical manifestation of a first-order phase transition, i.e., a crossover
that involves a jump in some intrinsic physical parameter such as the energy density or the
magnetisation. Attempts to understand and model metastable systems mathematically go back to
the early 20th century, notably through the work of H. Eyring and H.A. Kramers, who were
concerned with metastable phenomena occurring in chemical reactions.
The modern mathematical approach to metastability was pioneered by M.I. Freidlin and
A.D. Wentzell in the late 1960s and early 1970s. They introduced the theory of large
deviations on path-space in order to analyse the long-term behaviour of dynamical systems
under the influence of weak random perturbations. Their realisation that metastable behaviour
is controlled by large deviations of the random processes driving the dynamics has permeated
most of the mathematical literature on the subject since. A comprehensive account of this
development, referred to as the pathwise approach to metastability, is given in their 1984
monograph Random Perturbations of Dynamical Systems [115]. At around the same time the
application of these ideas in a statistical physical context was initiated in a paper by
M. Cassandro, A. Galves, E. Olivieri and M.E. Vares [51], which in turn triggered a whole
series of papers on metastability of Markovian lattice models. This further development is
treated at length in the 2005 monograph Large Deviations and Metastability by E. Olivieri and
M.E. Vares [198], which provides the key elements of the symbiosis between statistical
physics, large deviation theory and metastability.
The present book is concerned with an alternative way to tackle metastability, initiated
around 2000 by A. Bovier, M. Eckhoff, V. Gayrard and M. Klein [33], referred to now as the
potential-theoretic approach to metastability. Here, the pathwise view taken in the
Freidlin-Wentzell theory is largely discarded. Instead of aiming at identifying the most
likely paths and estimating their probabilities, it interprets the metastability phenomenon
as a sequence of visits of the path to different metastable sets, and focuses on the precise
analysis of the respective hitting probabilities and hitting times of these sets with the
help of potential theory. The fact that this requires the solution of Dirichlet problems in
typically high-dimensional spaces has probably acted as a deterrent for a long time, and has
prevented an efficient use of the ensuing methods at a much earlier stage. The key point in
the potential-theoretic approach is the realisation that, in the specific setting related to
metastability, most questions of interest can be reduced to the computation of capacities,
and that these capacities in turn can be estimated by exploiting powerful variational
principles. In this way, the metastable dynamics of the system can essentially be understood
via an analysis of its statics. This constitutes a major simplification, and acts as a
guiding principle. In addition, potential theory also allows one to deduce detailed
information on the spectral characteristics of the generator of the dynamics, which are
typically assumed in the so-called spectral approach to metastability initiated by Davies
[73, 74] in the 1980s.
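The reduction of hitting times to capacities can be illustrated on a toy model. The sketch below is not taken from the book. It assumes a nearest-neighbour Metropolis chain on {0, ..., N} with a hypothetical energy profile `energies` and inverse temperature `beta`, and it relies on the standard identity for reversible chains, E_0[τ_N] = (1/cap(0, N)) Σ_{y<N} μ(y) h(y), where h(y) is the probability of hitting 0 before N when starting from y. In one dimension both the capacity and h follow from the edge conductances μ(x) p(x, x+1), as in a series electrical network. The result is compared with a direct first-step recursion.

```python
import math


def metropolis_chain(energies, beta):
    """Nearest-neighbour Metropolis chain on {0, ..., N} (illustrative assumption).

    Returns the reversible weights mu(x) = exp(-beta * H(x)) and the
    up/down jump probabilities p(x, x+1) and p(x, x-1).
    """
    n = len(energies)
    mu = [math.exp(-beta * h) for h in energies]
    up = [0.5 * math.exp(-beta * max(energies[x + 1] - energies[x], 0.0))
          for x in range(n - 1)]
    down = [0.0] + [0.5 * math.exp(-beta * max(energies[x - 1] - energies[x], 0.0))
                    for x in range(1, n)]
    return mu, up, down


def hitting_time_via_capacity(mu, up):
    """Mean time to reach N from 0, computed from the capacity cap(0, N)."""
    n = len(mu)
    # Resistance of edge (x, x+1) is the inverse conductance 1 / (mu(x) p(x, x+1)).
    resistance = [1.0 / (mu[x] * up[x]) for x in range(n - 1)]
    total = sum(resistance)
    cap = 1.0 / total  # resistances in series add up
    # Equilibrium potential h(y) = P_y(hit 0 before N): resistance from y to N
    # divided by the total resistance between 0 and N.
    h = [sum(resistance[y:]) / total for y in range(n - 1)]
    mean_time = sum(mu[y] * h[y] for y in range(n - 1)) / cap
    return mean_time, cap


def hitting_time_direct(up, down):
    """Same mean time from the first-step recursion for d(x) = E_x[tau_{x+1}]."""
    total, d_prev = 0.0, 0.0
    for x in range(len(up)):
        d = (1.0 + down[x] * d_prev) / up[x]
        total += d
        d_prev = d
    return total


# Hypothetical double-well profile: shallow well at 0, barrier at 3, deep well at 6.
H = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, -1.0]
mu, up, down = metropolis_chain(H, beta=2.0)
t_cap, cap = hitting_time_via_capacity(mu, up)
t_direct = hitting_time_direct(up, down)
print(t_cap, t_direct)  # the two values should agree up to rounding
```

The capacity route needs only static quantities (the weights μ and the conductances), which is the sense in which the dynamics is understood through the statics. In higher dimensions no closed formula is available, and the capacity would have to be bounded through variational principles instead.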
The setting of this book is the theory of Markov processes, for the most part, reversible
Markov processes. Within this limitation, however, there is a wide range of models that are
adequate to describe a variety of different real-world systems. The models we aim at range
from finite-state Markov chains, finite-dimensional diffusions and stochastic partial
differential equations, via mean-field dynamics with and without disorder, to stochastic
spin-flip and particle-hopping dynamics and probabilistic cellular automata. Our main aim is
to unveil the common universal features of these systems with respect to their metastable
behaviour.
The book is divided into nine parts:
• Part I presents the metastability phenomenon in its various manifestations, with
emphasis on its universal aspects. A brief overview of the history of the subject is
given, including a comparison of the pathwise, the spectral, the potential-theoretic
and the computational approach. Two paradigmatic models are presented: the
Kramers model of Brownian motion in a double-well potential and the two-state
Markov chain. These models serve as a common thread through the book, in the sense
that the much more complex and real-world models treated later still exhibit a
metastable behaviour that is in many respects similar. An outline of which models
will be treated in the book and which main techniques will be used to analyse
them is provided, as well as a brief perspective on metastability in areas other
than interacting particle systems.
• Part II provides the necessary background on Markov processes (and can be
skipped by readers with a basic knowledge of probability theory). Here, the central
theme is the relation between Markov processes, martingales, and Dirichlet
problems. A brief outline of large deviation theory is provided, as well as a
description of three variational principles for capacities that play a key role in
the study of metastability: the Dirichlet principle, the Thomson principle and the
Berman-Konsowa principle.
• Part III contains the core of the theory. Here, we give the definition of metastable
systems and metastable sets in terms of properties of capacities, and we describe
the consequences of these definitions for the distribution of metastable hitting
times and for the spectral properties of the associated Markov generators. We also
introduce and discuss the basic techniques that can be used to compute capacities
and equilibrium potentials, and to estimate harmonic functions.
Parts IV–VIII highlight the key models that can be treated with the help of these
techniques. It is here that the potential-theoretic approach to metastability fully
comes to life.
• Part IV studies diffusions with small noise, both finite-dimensional (random walks
and stochastic differential equations) and infinite-dimensional (stochastic partial
differential equations).
• Part V describes coarse-graining techniques applied to the Curie-Weiss model
in large volumes at positive temperatures, both for a non-random and a random
magnetic field.
• Part VI focusses on lattice systems in small volumes at low temperatures. In this
setting, energy dominates entropy. An abstract set-up is put forward, and universal
metastability theorems are derived under general hypotheses. These hypotheses
are subsequently proved for Ising spins subject to Glauber dynamics and lattice
gases subject to Kawasaki dynamics.
• Part VII extends the results in Part VI to lattice systems in large volumes at low
temperatures. In large volumes, spatial entropy comes into play, which complicates
the analysis. Both for Glauber dynamics and Kawasaki dynamics the key
quantities controlling metastable behaviour can be identified, but at the cost of a
severe restriction on the starting measure of the dynamics.
• Part VIII looks at metastable behaviour of lattice systems in small volumes at high
densities, in particular the zero-range process.
• Part IX lists a number of challenges for future research, both within metastability
and beyond. It describes systems that are presently too hard to deal with in detail,
but are expected to come within reach in the next few years. In particular, we
look at post-nuclear growth for Ising spins subject to Glauber dynamics (limiting
shape of large droplets) and at continuum particle systems with pair interactions
(crystallisation), for which a number of results are already available.
Along the way we will encounter a variety of ideas and techniques from probability
theory, analysis and combinatorics, including martingale theory, variational
calculus and isoperimetric inequalities. It is the combination of physical insight and
mathematical tools that allows for making progress, in the best tradition of
mathematical physics.
Throughout the book we only consider classical stochastic dynamics. It would be
interesting to consider quantum stochastic dynamics as well, but this is beyond the
scope of the book. We also do not address issues related to numerical simulation,
which is rather delicate due to the extremely long time scales involved.
It is a pleasure to thank the colleagues with whom we have worked on metastability
over the past 15 years: Florent Barret, Alessandra Bianchi, Michael Eckhoff,
Alessandra Faggionato, Alexandre Gaudillière, Véronique Gayrard, Dmitry Ioffe,
Sabine Jansen, Oliver Jovanovski, Markus Klein, Roman Kotecký, Francesco Manzo,
Sylvie Méléard, Patrick Müller, Francesca Nardi, Rebecca Neukirch, Enzo Olivieri,
Elena Pulvirenti, Elisabetta Scoppola, Martin Slowik, Cristian Spitoni, Siamak Taati
and Alessio Troiani. Special thanks are due to Aernout van Enter for
reading the entire text and providing a host of valuable comments.

Bonn, Germany Anton Bovier

Leiden, The Netherlands Frank den Hollander
June 4, 2015
[Figure: Logical structure of the monograph]

Acknowledgements

Anton Bovier was supported by the German Research Foundation (DFG) through
the Collaborative Research Centers 611 Singular Phenomena and Scaling in
Mathematical Models and 1060 The Mathematics of Emergent Effects, by the
Hausdorff Center for Mathematics (HCM) in Bonn, by the German-Israeli Foundation
(GIF), and by the Lady Davis Fellowship Trust (Haifa and Jerusalem). Frank den
Hollander was supported by the Netherlands Organisation for Scientific Research
(NWO) through Gravitation Grant 024.002.003-NETWORKS, and by the European
Research Council (ERC) through Advanced Grant 267356-VARIS Variational
Approach to Random Interacting Systems.
The writing of this book started in 2011 while Frank den Hollander held a Bonn
Research Chair at the HCM, and continued in 2012 while Anton Bovier held a
Kloosterman Chair at the Mathematical Institute of Leiden University. Regular visits
back and forth took place in 2013–2015. The authors thank their home institutions
for hospitality.

Contents

Part I Introduction
1 Background and Motivation
1.1 Phenomenology
1.2 Condensation and magnetisation: from gases to ferromagnets
1.3 Historical perspective
1.3.1 Early achievements
1.3.2 The pathwise approach
1.3.3 The spectral approach
1.3.4 The potential-theoretic approach
1.3.5 The computational approach
2 Aims and Scopes
2.1 Two paradigmatic models
2.1.1 Kramers model: Brownian motion in a double-well
2.1.2 Finite-state Markov processes
2.2 Model reduction
2.3 Variational point of view
2.4 Specific models
2.5 Related topics
Part II Markov Processes
3 Some Basic Notions from Probability Theory
3.1 Probability and measures
3.1.1 Probability spaces
3.1.2 Random variables
3.1.3 Integrals
3.1.4 Spaces of integrable functions
3.1.5 Convergence
3.1.6 Radon-Nikodým derivative
3.2 Stochastic processes
3.2.1 Definition of stochastic processes
3.2.2 The Daniell-Kolmogorov extension theorem
3.3 Conditional expectations
3.3.1 Definition of conditional expectations
3.3.2 Elementary properties of conditional expectations
3.3.3 Conditional probability measures
3.4 Martingales in discrete time
3.4.1 Definitions
3.4.2 Upcrossings and convergence
3.4.3 Maximum inequalities
3.4.4 Stopping times and stopped martingales
3.5 Martingales in continuous time
3.5.1 Càdlàg functions
3.5.2 Filtrations, supermartingales and càdlàg processes
3.5.3 The Doob regularity theorem
3.5.4 Convergence theorems and martingale inequalities
3.5.5 Stopping times
3.5.6 First hitting time and first entrance time
3.5.7 Optional stopping and optional sampling
3.6 Bibliographical notes
4 Markov Processes in Discrete Time
4.1 Markov processes: main definitions and key facts
4.1.1 Definition and elementary properties
4.1.2 Markov processes with stationary transition probabilities
4.1.3 The strong Markov property
4.2 Markov processes and martingales
4.2.1 Semigroups
4.2.2 The martingale problem
4.2.3 Harmonic functions and martingales
4.2.4 The Doob transform
4.3 Markov processes with countable state space
4.4 Bibliographical notes
5 Markov Processes in Continuous Time
5.1 Markov jump processes
5.2 Brownian motion
5.2.1 Definition of Brownian motion
5.2.2 Martingale and Markov properties
5.3 General Markov processes
5.3.1 Semigroups and generators
5.3.2 Feller-Dynkin processes
5.3.3 The strong Markov property
5.4 The martingale problem
5.4.1 Generators and cores
5.4.2 The martingale problem
5.4.3 Uniqueness
5.4.4 Existence
5.5 Itō calculus
5.5.1 Square-integrable continuous martingales
5.5.2 Stochastic integrals for simple processes
5.5.3 Itō formula
5.6 Stochastic differential equations
5.6.1 Strong solutions
5.6.2 Existence and uniqueness of strong solutions
5.6.3 The Doob transform
5.6.4 The Girsanov theorem
5.7 Stochastic partial differential equations
5.7.1 The stochastic Allen-Cahn equation
5.7.2 Discretisation
5.8 Bibliographical notes
6 Large Deviations
6.1 Large deviation principles
6.2 Path large deviations for diffusion processes
6.2.1 Brownian motion
6.2.2 Brownian motion with drift
6.2.3 Diffusion processes
6.3 Path large deviations for stochastic partial differential equations
6.4 Path large deviations for Markov processes
6.5 Freidlin-Wentzell theory
6.5.1 Properties of action functionals
6.5.2 Crossing and exit problems
6.5.3 Metastability
6.6 Bibliographical notes
7 Potential Theory
7.1 The Dirichlet problem: discrete time
7.1.1 Definition
7.1.2 Green function, equilibrium potential and measure
7.1.3 Reversibility
7.1.4 One-dimensional nearest-neighbour random walks
7.2 The Dirichlet problem: continuous time
7.2.1 Definition
7.2.2 Countable state space
7.2.3 Diffusion processes
7.2.4 Reversible Markov processes
7.2.5 One-dimensional diffusions
7.3 Variational principles
7.3.1 The Dirichlet principle
7.3.2 The Thomson principle
7.3.3 The Berman-Konsowa principle
7.4 Variational principles in the non-reversible setting
7.5 Bibliographical notes
Part III Metastability
8 Key Definitions and Basic Properties
8.1 Characterisation of metastability
8.2 Renewal estimates and ultrametricity
8.3 Estimates on mean hitting times
8.3.1 Rough bounds
8.3.2 Sharp bounds
8.4 Spectral characterisation of metastability
8.4.1 A priori bounds
8.4.2 Characterisation of small eigenvalues
8.4.3 Computation of small eigenvalues
8.4.4 Exponential law of the metastable exit times
8.5 Metastability in uncountable state spaces
8.6 Bibliographical notes
9 Basic Techniques
9.1 Capacity estimates
9.1.1 General strategies
9.1.2 Lower bounds via flows
9.2 Coarse-graining
9.3 Lumping
9.4 Regularity estimates
9.4.1 Elliptic regularity theory
9.4.2 Coupling methods
9.5 Bibliographical notes
Part IV Applications: Diffusions with Small Noise
10 Discrete Reversible Diffusions
10.1 Definitions
10.2 Upper bounds on capacities
10.2.1 Cleaning of the Dirichlet form
10.2.2 Construction of an approximate harmonic function
10.2.3 Final estimate
10.3 Lower bounds on capacities
10.4 Bibliographical notes
11 Diffusion Processes with Gradient Drift
11.1 The setting
11.2 Capacity estimates and mean hitting times
11.2.1 Main results
11.2.2 Rough estimates on capacities and harmonic functions
11.2.3 Sharp estimates on capacities
11.2.4 Metastable exit times and capacities
11.3 Spectral theory
11.3.1 Main results
11.3.2 A priori spectral estimates
11.3.3 Principal Dirichlet eigenvalues
11.3.4 Exponentially small eigenvalues and their eigenfunctions
11.3.5 Improved error estimates
11.3.6 Exponential distribution of metastable exit times
11.4 Bibliographical notes
12 Stochastic Partial Differential Equations
12.1 Definitions, main theorem and outline of proof
12.2 Approximation properties of the potential
12.3 Estimate of the capacity
12.3.1 Properties of the potential
12.3.2 Upper bound
12.3.3 Lower bound
12.4 Estimate of the equilibrium potential
12.5 Proof of the main theorem
12.6 Bibliographical notes
Part V Applications: Coarse-Graining in Large Volumes at Positive Temperatures
13 The Curie-Weiss Model
13.1 The Curie-Weiss model
13.2 Metastable behaviour
13.3 Bibliographical notes
14 The Curie-Weiss Model with a Random Magnetic Field: Discrete Distributions
14.1 The model
14.2 Gibbs measure and order parameter
14.3 Glauber dynamics
14.4 Coarse-graining
14.5 The landscape near critical points
14.6 Eigenvalues of the Hessian
14.7 Topology of the landscape
14.8 Bibliographical notes
15 The Curie-Weiss Model with Random Magnetic Field: Continuous Distributions
15.1 Main results
15.2 Coarse-graining and the mesoscopic approximation
15.2.1 Coarse-graining
15.2.2 The energy landscape near critical points
15.3 Upper bounds on capacities
15.4 Lower bounds on capacities
15.4.1 Two-scale flows
15.4.2 Propagation of errors along microscopic paths
15.5 Estimates on mean hitting times
15.5.1 Mean hitting time and equilibrium potential
15.5.2 Upper bounds on harmonic functions
15.6 Bibliographical notes

Part VI Applications: Lattice Systems in Small Volumes at Low Temperatures
16 Abstract Set-Up and Metastability in the Zero-Temperature Limit
16.1 Hypotheses and universal metastability theorems
16.1.1 Metropolis dynamics and geometric definitions
16.1.2 Metastability theorems and hypotheses
16.1.3 Discussion
16.1.4 Consequences of the hypotheses
16.2 Preliminaries
16.2.1 Dirichlet form and capacity
16.2.2 A priori estimates on the capacity
16.2.3 Graph structure of the energy landscape
16.2.4 Metastable pair
16.3 Proof of the metastability theorems
16.3.1 Exponential distribution of the crossover time
16.3.2 Average crossover time
16.3.3 Gate for the crossover and uniform entrance distribution
16.4 Beyond Metropolis dynamics
16.4.1 Heat-bath dynamics
16.4.2 Probabilistic cellular automata
16.5 Bibliographical notes
17 Glauber Dynamics
17.1 Introduction and main results
17.1.1 Model
17.1.2 Metastable regime and critical droplet size
17.1.3 Main theorems
17.1.4 Discussion
17.2 Geometric definitions
17.3 Verification of the two hypotheses
17.3.1 First hypothesis
17.3.2 Second hypothesis
17.4 Structure of the communication level set
17.5 Computation of the prefactor
17.6 Extension to three dimensions
17.7 Bibliographical notes
18 Kawasaki Dynamics
18.1 Introduction and main results
18.1.1 Model
18.1.2 Metastable regime and critical droplet size
18.1.3 Main theorems
18.1.4 Discussion
18.2 Geometric definitions
18.3 Verification of the two hypotheses
18.3.1 First hypothesis
18.3.2 Second hypothesis
18.4 Structure of the communication level set
18.4.1 Canonical protocritical droplets
18.4.2 Protocritical and critical droplets
18.4.3 Identification of the protocritical and the critical set
18.4.4 Motion on the plateau
18.4.5 Cardinality of the set of protocritical droplets
18.5 Asymptotics of the prefactor for large volumes
18.5.1 Geometry of critical droplets and wells
18.5.2 Capacity bounds on the prefactor
18.5.3 Capacity asymptotics
18.6 Extension to three dimensions
18.7 Bibliographical notes

Part VII Applications: Lattice Systems in Large Volumes at Low Temperatures
19 Glauber Dynamics
19.1 Introduction and main results
19.1.1 Glauber dynamics in large volumes
19.1.2 Main theorem
19.1.3 Discussion
19.2 Average time to create a critical droplet
19.2.1 Estimate of the equilibrium potential
19.2.2 Estimate of the capacity
19.3 Average time to go beyond the critical droplet
19.3.1 Estimate of the equilibrium potential
19.3.2 Estimate of the capacity
19.4 Average time to grow a droplet twice the critical size
19.4.1 Estimate of the equilibrium potential
19.4.2 Estimate of the capacity
19.5 Sparseness of subcritical droplets
19.6 Typicality of starting configurations
19.7 Bibliographical notes
20 Kawasaki Dynamics
20.1 Introduction and main results
20.1.1 Kawasaki dynamics in large volumes
20.1.2 Main theorem
20.1.3 Discussion
20.2 Average time to create a critical droplet
20.2.1 Estimate of the equilibrium potential
20.2.2 Estimate of the capacity
20.3 Average time to grow a droplet twice the critical size
20.4 Equivalence of ensembles
20.4.1 Partition functions for different numbers of particles
20.4.2 Partition functions for different volumes
20.4.3 Atypicality of critical droplets
20.4.4 Typicality of starting configurations
20.5 The critical droplet is the threshold
20.6 Bibliographical notes
Part VIII Applications: Lattice Systems in Small Volumes at High Densities
21 The Zero-Range Process
21.1 Model and basic properties
21.2 Metastable behaviour
21.2.1 Finite system size
21.2.2 Diverging system size
21.3 Capacity estimates
21.3.1 Lower bound
21.3.2 Upper bound
21.4 Proof of the main theorems
21.4.1 Finite system size
21.4.2 Diverging system size
21.5 Proof that the condensate configurations form a metastable set
21.6 Bibliographical notes
Part IX Challenges
22 Challenges Within Metastability
22.1 Glauber dynamics in large volumes at small magnetic fields
22.1.1 Metastable crossover time
22.1.2 Wulff construction
22.1.3 Heuristics
22.2 Crystallisation in small volumes at low temperatures
22.2.1 Static model
22.2.2 Dynamic model
22.2.3 Metastability theorems for the soft disk potential
22.2.4 Extension to other pair potentials
22.3 Bibliographical notes
23 Challenges Beyond Metastability
23.1 Low temperatures
23.2 Small magnetic fields
23.3 Bibliographical notes
Glossary
References
Index
Part I
Introduction

Part I contains an introduction to the concept of metastability. Chapter 1 provides
background and motivation, including a brief historical account and a brief description
of the main ideas driving the pathwise, the spectral and the potential-theoretic
approach to metastability. Chapter 2 sketches the aims and scope of the book,
including a description of two paradigmatic models and an outline of the various
models to be considered later in the book, organised according to the conceptual
and technical challenges they involve.
Chapter 1
Background and Motivation

In Science—in fact, in most things—it is usually best to begin at
the beginning. In some things, of course, it’s better to begin at
the other end. For instance, if you wanted to paint a dog green, it
might be best to begin with the tail, as it doesn’t bite at that end.
(Lewis Carroll, Sylvie and Bruno Concluded)

We begin with a brief description of the phenomenon of metastability (Sects. 1.1–1.2)
and a brief historical perspective of the mathematical theories that were developed
to obtain a quantitative understanding of this phenomenon (Sect. 1.3).

1.1 Phenomenology

Metastability is a widespread phenomenon that arises in a large variety of systems:
physical, chemical, biological or economic. A simple experiment anyone can do at
home goes as follows. Fill a plastic bottle with distilled water and put it into the
freezer. After an hour or so, carefully take it out of the freezer. If you are lucky, then
the water is still liquid, but the temperature is down to somewhere between minus
5 and minus 10 degrees centigrade. Now slowly pour the water out of the bottle
and into a bowl. When done carefully, you should see a very fast freezing of the
water as it hits the bowl. What happens is that the water is undercooled, i.e., the
stable state of the water would have been ice, but you found it in the freezer in a
metastable state. This state is very sensitive to perturbations, and the shaking you
subject it to when pouring it into the bowl triggers an immediate transition to the
stable state, which is ice. Should you have left the bottle in the freezer unperturbed,
such a transition would eventually have happened spontaneously. In fact, if you
were to watch the bottle in the freezer over a long time, then you would eventually
see this quick freezing happen (you may need a lot of patience). Moreover, if you
repeat this experiment many times, then you will observe that the time until freezing
is rather variable, and is much longer than the time of the actual freezing itself. It
is reported that similar phenomena occur in very still and clean mountain lakes in
winter. The water cools well below the freezing point until suddenly the lake freezes
over.

© Springer International Publishing Switzerland 2015

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_1

Similarly, in chemistry the mixing of two reactive compounds (like oxygen and
hydrogen) may lead to a metastable state that can persist for a very long time,
but when triggered (by a spark) transits very rapidly to the stable state (water).
In economics, stock prices may persist for a long time on high levels, in spite of
economists’ warnings that the market is “overheated”, until a “crash” occurs and
prices drop within days or hours to much lower levels. In economics jargon, there
was a “bubble” that has collapsed. Such phenomena are ubiquitous. The common
features are: a large variability in the moment of the onset of some dramatic change
in the properties of the system, a much shorter time for the actual transition (i.e.,
between the onset of a noticeable change and the moment a new state is reached),
and unpredictability of the time of the onset of the transition.
What is behind all of this? A simple thought experiment reveals a possible mech-
anism. Suppose that in a mountain range there are two valleys, A and B. Reaching
valley B from valley A requires climbing up 1 km on a steep slope. An experienced
mountaineer will easily make this journey in a fairly predictable time, say, 4 hours.
Now suppose that there is a tourist visiting valley A, a drunken tourist who wanders
around in the valley without any particular purpose and occasionally climbs up the
slope towards valley B. However, as he does so, he will encounter certain obstacles
or will get tired, and just slides back to the base of the valley (where, for the sake of
the argument, he will have some drinks to retain his confused state of mind). As our
tourist is not terribly interested in getting to the second valley anyway, we may as-
sume that he does not learn anything from his excursions up the slope and after each
visit to the local pub finds himself in just the same condition as before. Let us now
assume that, after many days, we find our tourist in valley B. What has happened?
Well, after many failed excursions uphill, and equally many returns to the pub, on
a lucky day he just happened to climb straight up the slope and then tumbled down
into valley B. Should anyone have observed this final successful climb, he might
not have been able to distinguish the tourist from the experienced mountaineer, who
would have taken the same path on purpose in the first place. A rough estimate reveals
how long it took the tourist to get to valley B: the number of attempts (returns
to the pub) will be on average 1/p, where p is the probability of getting over the edge
before returning to the pub. If the average time Δ of such an unsuccessful excursion is
not too tiny (say, 30 minutes), then the average time Δ/p until the final crossing can
run into years when p is small (as is to be expected). Moreover, since our tourist
is not learning anything (and no other conditions, such as the weather, are changing
over time), each return to the pub puts him back exactly where he started.
Consequently, the number of failed attempts, and hence the total time until the
final crossing, is essentially unpredictable. Thus, in this
simple example we recognise and understand all the features of the metastability
phenomenon mentioned above. As we will see later, this little thought experiment
indeed captures all the crucial features behind metastability.
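The rough estimate above can be checked with a short simulation. The following is a minimal sketch and not part of the original argument: it assumes that each excursion succeeds independently with probability p and that every excursion, including the final one, takes the same time Δ. The function name crossing_time and the values p = 0.02 and Δ = 0.5 hours are hypothetical choices made for illustration.

```python
import random
import statistics

def crossing_time(p, delta, rng):
    """Time until the first successful excursion, each attempt costing delta."""
    attempts = 1
    while rng.random() >= p:      # this attempt failed: back to the pub
        attempts += 1
    return attempts * delta

p, delta = 0.02, 0.5              # hypothetical success probability and hours per excursion
rng = random.Random(12345)        # fixed seed so that the run is reproducible
times = [crossing_time(p, delta, rng) for _ in range(20000)]
mean_time = statistics.mean(times)
spread = statistics.pstdev(times)
print(f"mean = {mean_time:.1f} h (compare delta / p = {delta / p:.1f} h), std = {spread:.1f} h")
```

Under these assumptions the sample mean should lie close to Δ/p, while the standard deviation is of the same order as the mean. The latter is a numerical expression of the unpredictability of the crossing time.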
The first main challenge of mathematics is of a qualitative nature, namely, to
explain why in a large variety of systems the same type of metastable behaviour is
observed. Many such systems can be described from first principles as many-body
systems subject to classical or quantum dynamics. While the corresponding equa-
tions of motion are known, they are typically very hard to analyse, in particular,
over the extremely long time intervals in which metastable behaviour occurs. Also,
metastability manifestly exhibits randomness (the unpredictable time of the occur-
rence of the transition), the source of which may be difficult to extract from the
underlying deterministic dynamics. It may be due to quantum effects, or external
perturbations of a (non-closed) system, or the effect of unresolved high-frequency
degrees of freedom. A first simplification is to pass to a description of the system
as a stochastic dynamics. The justification of such a description is an interesting
topic in itself, which will not be addressed in the present book. Rather, a stochastic
model of the dynamics of the systems we are interested in will be the starting point
of our analysis of metastability. Even more restrictively, we will limit our analysis
to Markov processes. Still, even within this restricted setting, there is a wide variety
of different models where metastability emerges and where the explanation of the
underlying universality is possible.
The second main challenge is of a quantitative nature. Given the parameters of
some underlying model, we would like to be able to compute as precisely as possible
the quantities controlling the metastable phenomena, in particular, the distribution
of the times of the transitions between metastable and stable states. Again,
this is hard because most metastable systems of practical relevance are many-body
systems whose dynamics is not easy to capture, either analytically or numerically,
and because extremely long time scales may be involved. (See Newman and
Barkema [194] for an overview of Monte Carlo methods in statistical physics.)
Understanding metastability on the quantitative level is of considerable practical
interest, as it affects the behaviour and functioning of many systems in nature.

1.2 Condensation and magnetisation: from gases to ferromagnets

From the point of view of statistical mechanics, metastability is the dynamical sig-
nature of a first-order phase transition. In equilibrium statistical physics, a first-
order phase transition is said to occur if a system is sensitive to the change of a
parameter (or a boundary condition), in the sense that certain extensive variables
(such as density or magnetisation) show a discontinuity as functions of certain in-
tensive variables (such as pressure or magnetic field). Dynamically, this sensitivity
manifests itself in the fact that, as the parameter is varied across the phase transition
curve, the system remains for a considerable amount of time (typically random) in
the “old phase” before it suddenly changes to the “new phase”, the true equilib-
rium phase. In other words, the extensive variables change their value as a function
of time with a random delay. Thus, the study of metastability can be seen as part
of non-equilibrium statistical physics. Let us discuss this in a bit more detail in an
example.
Fig. 1.1 Effective free energy ΔG(r) of a droplet as a function of its radius r (middle curve). The
threshold for condensation is the critical radius r∗.

The most commonly observed occurrence of metastability is the phenomenon of
condensation of over-saturated water vapour (rainfall). The common explanation of
what is going on can be found in elementary physics textbooks. If water vapour is
cooled below the critical temperature, then the free energy of the gas-phase is larger
than that of the liquid-phase. Therefore, thermodynamics predicts a transition from
the gas-phase to the liquid-phase. However, this transition can only be achieved
by an aggregation of water molecules. This aggregation has to start somewhere in
the system with the formation of small droplets of liquid. The key point is that the
effective free energy of such a droplet is made up of two terms: (1) the difference
between the bulk free energies of the two phases; (2) the interfacial energy between
the two phases. This leads to a formula of the type (see Fig. 1.1)

ΔG(r) = (difference of bulk free energies) + (interfacial energy) = −Δ r^d + σ r^{d−1},    (1.2.1)

where Δ, σ > 0 represent the effect of (1) per unit volume and of (2) per unit surface,
respectively, r is the radius of the droplet, and d is the spatial dimension.
The function in (1.2.1) is increasing and positive up to a value r∗, the critical radius,
and is decreasing afterwards. This means that it is unfavourable for the system
to have small droplets and favourable to have large droplets: indeed droplets with
a radius smaller than r∗ tend to evaporate while droplets with a radius larger than
r∗ tend to grow. But how is it possible to create a large enough droplet of liquid-phase
within the gas-phase? The answer is: by thermal fluctuations, i.e., the system
on some small scale temporarily violates the laws of thermodynamics and evolves
in directions that locally increase the free energy. In this way it can form critical
droplets of liquid, and once this is done these droplets can continue to grow in full
agreement with the laws of thermodynamics. If the parameters of the system are
such that r∗ is large, then the fluctuations that produce such supercritical droplets
are very rare, which leads to a long lifetime of the metastable gas-phase. Thus, the
crossover is triggered by the appearance of a critical droplet of the new phase in-
side the old phase, which subsequently grows and invades the system. Just as in the
example of our drunken tourist, the transition from the metastable state is charac-
terised by many unsuccessful attempts of the system to create a critical droplet.
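For a formula of the type (1.2.1), the critical radius follows from setting the derivative of ΔG to zero, which gives r∗ = (d − 1)σ/(dΔ). The height of the barrier is then ΔG(r∗). The sketch below evaluates both for hypothetical parameter values (Δ = 0.5, σ = 3, d = 3, chosen only for illustration) and compares the critical radius with a brute-force search on a grid. The function names are assumptions of this sketch.

```python
def droplet_free_energy(r, bulk, surface, dim):
    """Right-hand side of (1.2.1): -bulk * r^dim + surface * r^(dim - 1)."""
    return -bulk * r ** dim + surface * r ** (dim - 1)

def critical_radius(bulk, surface, dim):
    """Zero of the derivative of (1.2.1): r* = (dim - 1) * surface / (dim * bulk)."""
    return (dim - 1) * surface / (dim * bulk)

bulk, surface, dim = 0.5, 3.0, 3          # hypothetical values of Δ, σ and d
r_star = critical_radius(bulk, surface, dim)
barrier = droplet_free_energy(r_star, bulk, surface, dim)

# Brute-force check: locate the maximum of the free energy on a fine grid.
grid = [0.01 * k for k in range(1, 1001)]
r_grid = max(grid, key=lambda r: droplet_free_energy(r, bulk, surface, dim))
print(r_star, barrier, r_grid)
```

Increasing σ or decreasing Δ in this sketch makes both r∗ and the barrier larger, in line with the statement that a large critical radius goes together with a long lifetime of the metastable phase.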
Fig. 1.2 Hysteresis in ferromagnets: plot of the magnetisation m versus the magnetic field h. The
dotted pieces refer to the magnetisation of the metastable states. The arrows refer to the metastable
crossovers. The symbols G and K stand for Glauber and Kawasaki dynamics (to be treated in
Part VI), for which the magnetisation is not preserved and is preserved, respectively.

The ad hoc notion of “thermal fluctuations” may appear rather mysterious, and
does require explanation and theory. We will see in Chaps. 18 and 20, in the context
of stochastic models for the dynamics of lattice gases, that the above picture arises
very naturally and can be fully quantified. In this context, the excess free energy of
a critical droplet is called the free energy barrier for the onset of the phase transition.
The presence of such a barrier is the reason for the metastable behaviour, and
tion. The presence of such a barrier is the reason for the metastable behaviour, and
thermal fluctuations are the driving force for transitions out of the metastable state.
The formation of a critical droplet (i.e., a droplet of critical radius) is the minimal
effort these fluctuations have to make to initiate the phase transition dynamically.
Of course, the same explanation applies when a liquid freezes and—reversed in
time—when a liquid evaporates or a solid melts. The fine details are different, but
the overall picture is the same.
Another situation where the same principles are at work is magnetic hysteresis,
which is treated in Chaps. 13, 17 and 19 (see Fig. 1.2). When a ferromagnetic mate-
rial is placed in a magnetic field h it magnetises, i.e., the atomic magnetic moments
(“spins”) tend to align with the field. At temperatures below the so-called Curie
temperature, this magnetised state persists (forever) even when the field is turned
off: the spontaneous magnetisation is m . This persistence is the sign of a first-order
phase transition. Moreover, even when afterwards the direction of the field is in-
verted, the magnetisation will remain in the old direction and will only align with
the new direction after some time. The reason is the same as for the supersaturation
of a gas: the ferromagnetic material has to create local droplets with the opposite
magnetisation, and these droplets become energetically favourable and hence start
to grow only after they have acquired some minimal size. The creation of such
critical droplets is again the work of thermal fluctuations.
Ferromagnets are particularly easy to manipulate and very precise measurements
are possible. Figure 1.2 is the paradigmatic figure for metastable behaviour, and is
held to be ubiquitous in all situations of metastability.

1.3 Historical perspective

The study of metastability has a long and rich history. In this section we give a brief
summary of the most important developments.

Fig. 1.3 Chemical reaction from state S1 to state S2 via transition state S∗ with reaction rates k∗,
k1 and k2

1.3.1 Early achievements

The earliest attempt at a quantitative description of metastability dates back to the
work of van ’t Hoff [229], within the context of chemical reaction-rate theory (see
Fig. 1.3). In 1884 he proposed a formula for the temperature dependence of the rate
constant R associated with a chemical reaction, of the form

R = exp[−E/kT ]. (1.3.1)

Here, E is the activation energy associated with the reaction (in joules per
molecule), T is the absolute temperature (in degrees Kelvin), and k is the Boltz-
mann constant. (If the activation energy is measured in joules per mole, then k is to
be replaced by what is called the gas constant.) In 1889 Arrhenius [8] proposed a
refinement of (1.3.1), namely,

R = A exp[−E/kT ], (1.3.2)

where the prefactor A is called the amplitude. He also provided the following
physical interpretation of (1.3.2). For molecules to react they must first acquire
a minimum amount of energy, say E. At absolute temperature T , the fraction of
molecules that have a kinetic energy larger than E is proportional to exp[−E/kT ],
according to the Maxwell-Boltzmann distribution of statistical mechanics. Hence,
exp[−E/kT ] is the probability that a single collision causes a reaction. If A is in-
terpreted as the average number of collisions per time unit, then R is the average
number of collisions per time unit that cause a reaction, and the inverse 1/R is the
average reaction time.
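The strong temperature dependence expressed by (1.3.2) is easy to see numerically. The following sketch evaluates the rate R and the average reaction time 1/R at a few temperatures. The amplitude A and the activation energy E used here are hypothetical values chosen for illustration, and the function name arrhenius_rate is an assumption of this sketch.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in joules per kelvin

def arrhenius_rate(amplitude, energy, temperature):
    """Rate R = A exp[-E/kT] from (1.3.2); energy in joules per molecule."""
    return amplitude * math.exp(-energy / (BOLTZMANN * temperature))

amplitude = 1.0e13   # hypothetical number of collisions per second
energy = 1.0e-19     # hypothetical activation energy in joules per molecule
for temperature in (250.0, 300.0, 350.0):
    rate = arrhenius_rate(amplitude, energy, temperature)
    print(f"T = {temperature:5.1f} K  rate = {rate:.3e} per s  "
          f"mean reaction time = {1.0 / rate:.3e} s")
```

Since the temperature enters through the exponent, a moderate change in T changes the rate by several orders of magnitude for these values, while the rate always stays below the amplitude A.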
Equation (1.3.2) goes under the name of Arrhenius equation or Arrhenius law.
The same equation applies to other situations where an energy barrier is involved,
such as the phenomena of condensation and magnetisation mentioned in Sect. 1.2.
Still other examples are the motion of dislocations in crystals, the ageing of spin
glasses and the folding of proteins, which underlines the universal character of
the Arrhenius formula. In Part VI we will see that (1.3.2) provides an excellent
approximation of the average metastable crossover time for a large class of models
with a stochastic dynamics in small volumes at low temperatures.
Several modifications of the Arrhenius equation have been proposed over the
years. One modification is a temperature-dependence of the prefactor of the form
A(T/T0)^α, with T0 a reference temperature and α ∈ R a dimensionless exponent. In
Part VII we will encounter a model with a stochastic dynamics in large volume at
low temperature where this form of the prefactor is needed, with A proportional to
the volume and α = 1. In general, however, this form of the prefactor is neither easy
to explain theoretically nor easy to verify experimentally. Another modification is a
stretched exponential of the form

R = A exp[−(E/kT)^ᾱ], (1.3.3)

where ᾱ ∈ (0, 1) is a dimensionless exponent. Such an equation appears when the
reaction is controlled by a range of activation energies (occurring e.g. in disordered
systems) or a range of space-time scales (occurring e.g. in Mott multi-range random
hopping). The system is said to be “ageing” as it explores larger and larger activa-
tion energies and space-time scales. In this book we will not deal with models that
require this modification.
In 1940 Kramers proposed a toy model of a chemical reaction based on Brown-
ian motion in a double-well potential [157]. Using this model, he was able to derive
explicit expressions for E and A in (1.3.2) in terms of the shape of the potential (see
Sect. 2.1.1 for details). This work was the first to provide a mathematical verifica-
tion of the Arrhenius equation based on a mesoscopic model that replaces the micro-
scopic collisions of the molecules involved in the chemical reaction by a Brownian
motion, in the spirit of Einstein’s explanation of Brownian motion. Various refine-
ments of the Kramers formula, e.g. to higher dimensions and to different choices of
the noise, were obtained in the 1960’s and 1970’s. See the 1981 monograph Stochas-
tic Processes in Physics and Chemistry by van Kampen [228] for an overview. These
refinements in turn led to the theory of random perturbations of dynamical systems
developed by Freidlin and Wentzell (see Chap. 2 for details), in which explicit ex-
pressions for E and A were derived in much greater generality. This line of research
eventually led to the so-called pathwise approach to metastability, which will be
discussed in more detail in Sect. 1.3.2 below.
In 1966, Lebowitz and Penrose [162] provided a mathematical explanation of
the gas-liquid phase transition within the context of the so-called van der Waals-Maxwell
theory. They proposed a spin-model with a long-range interaction called the
“Kac-potential” and showed that the free energy of this model correctly predicts
the pressure-versus-volume phase diagram, including the line of coexistence constructed
via “Maxwell’s equal area rule”. In 1971, Penrose and Lebowitz [199]
proposed a framework for a rigorous theory of metastability for particle systems.
They characterised metastable states via three conditions: (1) the system has only
one stable state (the thermodynamic phase); (2) the lifetime of the metastable state
is very long; (3) the crossover from the metastable state to the stable state is an
“irreversible” process, in the sense that the return time is much longer than the
decay time. The main tool in [199] is the restricted ensemble, which is defined
to be the Gibbs measure conditioned on the particle configuration lying in a suit-
able subset R of the configuration space, representing the metastable state, e.g.
corresponding to a supersaturated vapour whose density is conditioned to lie be-
low the density of the liquid. The rate at which the stochastic dynamics brings
the system outside R is maximal at time zero. This incipient rate plays the role
of an escape rate λ. The lifetime of the metastable state is identified with 1/λ, and
is an inherently dynamical quantity. The choice of R must be such that: (1) the
Gibbs measure conditioned on R describes a pure phase; (2) λ is very small; (3) R
has a very small weight under the unconditional Gibbs measure. For the spin-
model with Kac-potential, Penrose and Lebowitz were able to compute λ explic-
itly (on a rough scale) and show that 1/λ coincides with the activation free en-
ergy needed to move out of R. Based on these results, an early attempt to ax-
iomatise metastability was made by Sewell [218, Chap. 6]. For further details we
refer the reader to Penrose and Lebowitz [200] and to Olivieri and Vares [198,
Sect. 4.1].

1.3.2 The pathwise approach

The pathwise approach to metastability was initiated in the late 1960’s and early
1970’s by Freidlin and Wentzell. They introduced the theory of large deviations on
path space in order to analyse the long-term behaviour of dynamical systems un-
der the influence of weak random perturbations. Their realisation that metastable
behaviour is controlled by large deviations of the random processes driving the
dynamics has permeated most of the mathematical literature on the subject since.
A comprehensive account of this development is given in their 1984 monograph
Random Perturbations of Dynamical Systems [115]. The application of these ideas
in a statistical physics context was pioneered in 1984 by Cassandro, Galves, Olivieri
and Vares [51]. They realised that the theory put forward by Freidlin and Wentzell
could be applied to study metastable behaviour of interacting particle systems. This
paper led to a flurry of results for a variety of Markovian lattice models, which
are described at length in the 2005 monograph Large Deviations and Metastabil-
ity [198] by Olivieri and Vares. This work provides the key elements of the symbio-
sis between statistical physics, large deviation theory and metastability.
The advantage of the pathwise approach is that it gives very detailed informa-
tion on the metastable behaviour of the system. By identifying the most likely path
between metastable states (typically, the global minimiser of some “action inte-
gral” that constitutes the large deviation rate function in path space), the time of the
crossover can be determined and information can be obtained on what the system
does before and after the crossover (“tube of typical trajectories”). The drawback
of the pathwise approach is that it is generally hard to identify and control the rate
function, especially for systems with a spatial interaction, for which the dynamics is
non-local. Consequently, the pathwise approach typically leads to relatively crude
results on the crossover time.

1.3.3 The spectral approach

In the 1980’s, Davies [70–74] proposed an axiomatic approach to metastability
based on spectral properties of generators of reversible Markov processes (in some
L²-space). He showed that metastable behaviour arises when the spectrum of the
generator consists of a cluster of very small real eigenvalues, separated by a com-
paratively wide gap from the rest of the spectrum. Under additional assumptions
on boundedness of the corresponding eigenfunctions, he showed that the eigenfunc-
tions allow for a decomposition of the state space into “metastable” sets, and that
the motion of the Markov process between these sets is slow, with time-scales that
are given by the inverses of the corresponding eigenvalues. In the 1990’s, these
results were developed further by Gaveau and Schulman [124], and Gaveau and
Moreau [123]. While the spectral approach to metastability is conceptually nice and
natural, it is typically very difficult to verify the assumptions made on the spectrum.

1.3.4 The potential-theoretic approach

The potential-theoretic approach to metastability was initiated in 2001 in a paper
by Bovier, Eckhoff, Gayrard and Klein [33]. Here, the pathwise view is largely discarded.
Instead of aiming at identifying the most likely paths realising a metastable
crossover and estimating their probabilities, it interprets the metastability phe-
nomenon as a sequence of visits of the path to different metastable sets, and focuses
on a precise analysis of the respective hitting probabilities and hitting times of these
sets with the help of potential theory. Phrased differently, it translates the prob-
lem of understanding the metastable behaviour of Markov processes to the study of
equilibrium potentials and capacities of electric networks.
More precisely, the configurations of the system are viewed as the vertices of
the network and the transitions between pairs of configurations as the edges of the
network. The transition probabilities are represented by the conductances of the
associated edges. In this language, the hitting probability of a set of configurations
as a function of the starting configuration of the Markov process can be expressed
in terms of the equilibrium potential on the network when the potential is set to 1
on the vertices of the target set and to 0 on the starting vertex. The average hitting
time of the set can then be expressed in terms of the equilibrium potential and the
capacity associated with the target set and the starting vertex. For metastable sets it
turns out that the average hitting time is essentially the inverse of the capacity.
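The network dictionary described above can be checked numerically on a toy example. The following Python sketch is illustrative only: the conductance matrices, the function names and the use of NumPy are our assumptions and do not come from the text. It solves the discrete Dirichlet problem for the equilibrium potential h (equal to 1 on the starting vertex a and 0 on the target set B), evaluates the capacity as the Dirichlet form of h, and compares the standard representation E_a[τ_B] = Σ_y μ(y) h(y) / cap(a, B) of the mean hitting time with a direct linear solve.

```python
import numpy as np

def path_network(n):
    """Conductance matrix of a path 0-1-...-(n-1) with unit conductances (toy example)."""
    C = np.zeros((n, n))
    for i in range(n - 1):
        C[i, i + 1] = C[i + 1, i] = 1.0
    return C

def equilibrium_potential(C, a, B):
    """h(x) = P_x(hit a before B) for the walk with p(x, y) = C[x, y] / mu(x)."""
    n = C.shape[0]
    laplacian = np.diag(C.sum(axis=1)) - C
    free = [x for x in range(n) if x != a and x not in B]
    h = np.zeros(n)
    h[a] = 1.0                      # boundary values: 1 on a, 0 on B
    if free:
        # h is harmonic on the free vertices: L_ff h_f = C[f, a]
        h[free] = np.linalg.solve(laplacian[np.ix_(free, free)], C[free, a])
    return h

def capacity(C, h):
    """Dirichlet form (1/2) sum_{x,y} c(x,y) (h(x) - h(y))^2 evaluated at h."""
    diff = h[:, None] - h[None, :]
    return 0.5 * np.sum(C * diff**2)

def mean_hitting_time_via_capacity(C, a, B):
    """E_a[tau_B] = sum_y mu(y) h(y) / cap(a, B), with mu(y) = sum_z c(y, z)."""
    mu = C.sum(axis=1)
    h = equilibrium_potential(C, a, B)
    return np.sum(mu * h) / capacity(C, h)

def mean_hitting_time_direct(C, a, B):
    """Solve (I - P) w = 1 on the complement of B for the discrete-time walk."""
    n = C.shape[0]
    P = C / C.sum(axis=1, keepdims=True)
    keep = [x for x in range(n) if x not in B]
    w = np.linalg.solve(np.eye(len(keep)) - P[np.ix_(keep, keep)], np.ones(len(keep)))
    return w[keep.index(a)]

C = path_network(6)
t_capacity = mean_hitting_time_via_capacity(C, 0, {5})
t_direct = mean_hitting_time_direct(C, 0, {5})
```

For the path with six vertices both numbers should agree with the classical value 5² = 25 for the walk reflected at 0, up to rounding. The point of the comparison is that the capacity route never needs the full solution of the hitting-time equation, only h and one quadratic form.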
A key observation in the potential-theoretic approach is the fact that capacities
can be estimated by exploiting powerful variational principles. In fact, dual variational
principles are available that express the capacity both as an infimum (over
potentials) and as a supremum (over flows). This opens up the possibility to derive
sharp lower bounds and upper bounds on the capacity via a judicious choice of
test functions. In fact, with the proper physical insight, test functions can be found
for which the lower bounds and the upper bounds are asymptotically equivalent (in
an appropriate limit corresponding to a metastable regime). A second key obser-
vation is that the relevant equilibrium potentials can, to the extent necessary, be in
turn bounded from above and below by capacities with the help of renewal equa-
tions. This is absolutely crucial, as it avoids the formidable problem of solving the
boundary value problems through which the equilibrium potentials are defined. Ef-
fectively, it means that estimates of the average crossover time can be derived that
are much sharper than those obtained via the pathwise approach.
Capacities are expressed with the help of Dirichlet forms, which are functionals
of the space of potentials, respectively, flows, and correspond to the energy associ-
ated with the network. These Dirichlet forms have the dimension of the configura-
tion space, and thus are typically very high-dimensional. However, it turns out that
the ensuing high-dimensional variational principles for the capacity often can be
reduced to low-dimensional variational principles when the system is metastable.
This comes from the fact that metastable crossovers occur near saddles connect-
ing metastable sets of configurations and, consequently, the equilibrium potential
is very close to 1 or to 0 away from these saddles. As a result, the full variational
principle reduces to a simpler variational principle, which only lives on the config-
urations close to the saddle and captures the fine details of the dynamics when it
makes the crossover. In Parts IV–VIII we will see plenty of examples of this reduc-
tion. In some cases the simpler variational problem is so low-dimensional that it can
be solved explicitly.
The quantitative success of the potential-theoretic approach, relying on tractable
variational principles for capacities, also entails its effective limitation to the case of
reversible Markov processes. While variational characterisations of capacities are
known also for non-reversible Markov processes (see Sect. 7.3), they are far more
complicated and difficult to use than their reversible counterparts. Some attempts in
this direction have been made by Eckhoff [101, 102], and more recently by Gaudil-
lière and Landim [121] and Slowik [220]. This area is wide open for future research.
Historically, the potential-theoretic approach has its roots in the early work by
Kramers [157], who performed precise computations of metastable crossover times
in the context of a Brownian motion in a double-well potential. Such explicit so-
lutions of the Dirichlet problems involved are, however, possible only in the one-
dimensional setting. There have been numerous computations in higher-dimensional
settings, based on formal perturbation theory, which can be seen as precursors of the
potential-theoretic approach (see e.g. Matkowsky and Schuss [181], Matkowsky,
Schuss and Tier [182], Knessl, Matkowsky, Schuss and Tier [153], and the discus-
sion in Maier and Stein [169]).
The potential-theoretic approach also connects nicely to the spectral approach.
As we will see in Chaps. 8 and 11, in many cases the spectral assumptions of Davies
are a consequence of metastability as characterised by capacities.

1.3.5 The computational approach

As mentioned above, there is great interest in quantitative numerical computations
for specific systems that exhibit metastable phenomena. Since metastability is driven
by rare events and involves excessively long time-scales, doing a simulation is
extremely challenging and requires highly sophisticated techniques. Some of the
methods developed so far have relations to or are motivated by theoretical work,
in particular, the spectral approach (see e.g. Schütte, Huisinga and Meyn [216]).
The so-called transition path theory, developed in the 2000’s, uses ideas similar to
those appearing in the potential-theoretic approach, but relies on numerical meth-
ods to compute harmonic functions (see e.g. E and Vanden-Eijnden [100], Ren and
Vanden-Eijnden [99], Metzner, Schütte and Vanden-Eijnden [184]). Covering this
huge field is beyond the scope of the present book.
Chapter 2
Aims and Scopes

What a convenient thing it would be if all thieves had the same

shape! It’s so confusing to have some of them quadrupeds and
others bipeds! (Lewis Carroll, Sylvie and Bruno)

While classical mechanics is concerned with deterministic equations of motion, statistical
mechanics adopts the view that many-particle systems can best be described
with the help of probabilistic techniques that do justice to the intrinsic complexity
of these systems and to our incomplete knowledge of the precise microscopic state
they are in. The aim of equilibrium statistical mechanics is to describe many-particle
systems through Gibbs distributions, i.e., probability distributions on configuration
spaces given by Boltzmann weight factors based on interaction Hamiltonians.
A key target is the computation of the free energy of the system as a function
of macroscopic or mesoscopic parameters. (Gibbs distributions minimise the free
energy according to the Gibbs variational principle.) The idea is that the free en-
ergy, which captures the equilibrium (= static) properties of the system, implicitly
contains information that is pertinent to the non-equilibrium (= dynamic) proper-
ties of the system as well, since both depend on the energy landscape encoded in
the interaction Hamiltonian, with the Gibbs distribution being invariant under the
dynamics. A guiding principle of the present book is to make this idea as precise
as possible within the context of metastability. On the intuitive level, we deal with
Gibbs distributions that put relatively large weight on disjoint sets with different
macroscopic properties (“metastable states”), separated by regions of small weight
through which the system cannot move easily (“saddles”). To make this intuition
precise we must make certain assumptions on the dynamics. Typically we must as-
sume that the dynamics is either local or diffusive, meaning that long-range jumps
are either rare or are excluded.
In Sect. 2.1 we describe two paradigmatic models for metastability that serve as
a red thread through much of the book. In Sect. 2.2 we explain the importance of
model reduction, linking complex realistic models to simple toy models that cap-
ture the metastable behaviour on an aggregate level. This reduction is necessary to
understand the universality behind metastable phenomena. In Sect. 2.3, we give a
brief outline of the variational point of view that is central to the potential-theoretic
approach to metastability—the subject of the present monograph—thereby expand-
ing further on what was already written in Sect. 1.3.4. In Sect. 2.4 we provide a list

© Springer International Publishing Switzerland 2015 15

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_2
of the models to be considered in Parts IV–VIII, with a brief indication of what we
prove about them and how we organise them. In Sect. 2.5 we mention a number of
related topics that are not treated in this book but are slowly coming within reach of
mathematical theory.

2.1 Two paradigmatic models

In Sects. 2.1.1–2.1.2 we describe two paradigmatic models for metastability: the
Kramers model for Brownian motion in a double-well potential and finite-state
Markov chains with exponentially small transition probabilities.

2.1.1 Kramers model: Brownian motion in a double-well

One of the first mathematical models for metastability was proposed in 1940 by
Kramers [157]. It consists of the one-dimensional diffusion equation (or Langevin
equation in physics terminology)
\[
dX_t = b(X_t)\,dt + \sqrt{2\varepsilon}\,dB_t, \tag{2.1.1}
\]

where $X_t$ denotes the position at time $t$ of a “particle” diffusing in a drift field
$b = -W'$, with $W\colon \mathbb{R} \to \mathbb{R}$ a double-well potential, i.e., a function with two local
minima and two steep walls (see Fig. 2.1), and $B_t$ denotes the position at time $t$ of
a standard Brownian motion.1
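To get a feeling for the dynamics (2.1.1), one can discretise it with the Euler-Maruyama scheme. The sketch below is purely illustrative: the quartic potential W(x) = x⁴/4 − x²/2, the step size, the noise strength and all function names are our assumptions, not part of the Kramers model as stated in the text.

```python
import numpy as np

def drift(x):
    """b(x) = -W'(x) for the assumed double well W(x) = x**4 / 4 - x**2 / 2."""
    return x - x**3

def euler_maruyama(x0, eps, dt, n_steps, seed=0):
    """Approximate dX = b(X) dt + sqrt(2 eps) dB on a grid of mesh dt."""
    rng = np.random.default_rng(seed)
    # Brownian increments over a step of length dt, scaled by sqrt(2 eps)
    noise = rng.normal(0.0, np.sqrt(2.0 * eps * dt), size=n_steps)
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        path[k + 1] = path[k] + drift(path[k]) * dt + noise[k]
    return path

# Start in the left well (assumed minimum at x = -1) and run for 20 time units.
path = euler_maruyama(x0=-1.0, eps=0.1, dt=1e-3, n_steps=20_000)
```

For small ε a typical run of this kind fluctuates around the starting well for a long time. Observing a crossover by brute-force simulation requires a time horizon that grows exponentially in 1/ε, which is one reason why analytic formulas for the crossover time are valuable.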
Equation (2.1.1) has become the paradigm of metastability. Kramers was able to
settle essentially all the interesting questions related to this model. In particular, he
derived the so-called Kramers formula for the average transition time from a local
minimum at u to a global minimum at v via a saddle point at z∗ (see Fig. 2.1):
\[
E_u[\tau_v] = [1 + o(1)]\, \frac{2\pi}{\sqrt{[-W''(z^*)]\, W''(u)}}\, \exp\big\{[W(z^*) - W(u)]/\varepsilon\big\}. \tag{2.1.2}
\]

This formula fits the classical Arrhenius law with activation energy $E = W(z^*) - W(u)$,
amplitude $A = 2\pi/\sqrt{[-W''(z^*)]\,W''(u)}$ and inverse temperature $\beta = 1/kT = 1/\varepsilon$.
Note that the flatter $W$ is near $z^*$ and $u$, the larger is the amplitude:
flatness slows down the crossover at $z^*$ and increases the number of returns to $u$.
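The Kramers formula can be compared with the exact one-dimensional expression for the mean hitting time, E_u[τ_v] = ε⁻¹ ∫_u^v dy e^{W(y)/ε} ∫_{−∞}^y dz e^{−W(z)/ε}, which follows from solving the corresponding Dirichlet problem by two integrations. The Python sketch below is a numerical illustration under our own assumptions: the symmetric quartic potential W(x) = x⁴/4 − x²/2 (so that u = −1, z* = 0, v = 1, and both minima have the same depth), the truncation of the inner integral at x = −4, and the grid size are chosen for convenience only.

```python
import numpy as np

def W(x):
    """Assumed double-well potential with minima at -1 and +1 and a saddle at 0."""
    return 0.25 * x**4 - 0.5 * x**2

def W2(x):
    """Second derivative W''(x) of the assumed potential."""
    return 3.0 * x**2 - 1.0

def kramers_formula(eps, u=-1.0, z_star=0.0):
    """Right-hand side of the Kramers formula without the 1 + o(1) factor."""
    prefactor = 2.0 * np.pi / np.sqrt(-W2(z_star) * W2(u))
    return prefactor * np.exp((W(z_star) - W(u)) / eps)

def mean_crossover_time(eps, u=-1.0, v=1.0, left=-4.0, n=40001):
    """Trapezoidal evaluation of the exact double integral (inner range cut at `left`)."""
    x = np.linspace(left, v, n)
    dx = x[1] - x[0]
    g = np.exp(-W(x) / eps)
    # inner[i] approximates the integral of exp(-W/eps) from `left` to x[i]
    inner = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * dx)))
    sel = x >= u - 0.5 * dx          # outer integration range [u, v]
    f = np.exp(W(x[sel]) / eps) * inner[sel] / eps
    return np.sum(0.5 * (f[1:] + f[:-1]) * dx)

eps = 0.05
ratio = mean_crossover_time(eps) / kramers_formula(eps)
```

For small ε the ratio should be close to 1, with a deviation of order ε, in line with the factor 1 + o(1) in the formula.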

1 In fact, (2.1.1) emerges as a special case of the more general equation considered by Kramers,
namely, the Ornstein-Uhlenbeck equation $dX_t = V_t\,dt$, $\mu^{-1}\,dV_t = -dX_t + b(X_t)\,dt + \sqrt{2\varepsilon}\,dB_t$,
where Vt denotes the velocity at time t and μ is a friction parameter. This equation gives rise to
(2.1.1) in the limit as μ → ∞. Thus, (2.1.1) can be seen as the equation of motion of a particle
moving under the influence of a friction force, a gradient force and a random force in the limit
where the friction becomes infinitely strong.
Fig. 2.1 A double-well potential with a local minimum at u, a global minimum at v and a saddle
point at z∗

Formula (2.1.2) exhibits a structure that is typical for metastable systems. There
is an exponential term, here given by $\exp\{[W(z^*) - W(u)]/\varepsilon\}$, which provides the
leading asymptotic behaviour. The pathwise approach to metastability, which is
based on large deviation theory, is typically capable of identifying this term, sometimes
referred to as the exponential asymptotics, by showing that
\[
\varepsilon \ln E_u[\tau_v] = [1 + o(1)]\,[W(z^*) - W(u)], \qquad \varepsilon \downarrow 0. \tag{2.1.3}
\]
However, identifying the prefactor, here given by $2\pi/\sqrt{[-W''(z^*)]\,W''(u)}$, is in
general a far more subtle problem. It is the ambition of the potential-theoretic approach
exhibited in this book to provide a unified framework that allows one to obtain
rigorous asymptotic formulas as in (2.1.2) with an explicit prefactor for a wide class
of metastable systems. We will see plenty of examples in Parts VI–VII.
The multi-dimensional generalisation of (2.1.2) is attributed to Eyring and is
called the Eyring-Kramers formula (see Glasstone, Laidler and Eyring [127], Wei-
denmüller and Zhang [236], Maier and Stein [170]). Actually, Eyring’s so-called
transition-state theory [106] is based on quantum-mechanical considerations and is
different from the classical theory of Kramers. It interprets the potential as a re-
stricted quantum-mechanical free energy. For a historical discussion, see Pollak and
Talkner [201].

2.1.2 Finite-state Markov processes

The model of Kramers describes the evolution of an effective order parameter of a
metastable system driven by diffusive noise. It is clear that in this model the “particle”
spends most of its time close to the two local minima of the double-well
potential. This suggests a further simplification of the picture, namely, a reduction
to a two-state system, with the two states u, v representing the two wells of the po-
tential. The time at which this system jumps from state u to state v and backwards
Fig. 2.2 Transition rates for the two-state Markov chain

can be reasonably approximated by the first hitting time τv of the local minimum v
starting from the local minimum u, and vice versa for τu . As we will see in Parts IV–
V, in the limit as ε ↓ 0 the times τv and τu normalised by their expectations tend
to exponentially distributed random variables. This means that a rough approxima-
tion of the long-term behaviour of the Kramers model is given by a continuous-time
Markov chain with state space {u, v} and transition rates (see Fig. 2.2)

\[
\begin{aligned}
c(u,v) &= e^{-r(u,v)/\varepsilon}, & r(u,v) &= W(z^*) - W(u),\\
c(v,u) &= e^{-r(v,u)/\varepsilon}, & r(v,u) &= W(z^*) - W(v).
\end{aligned} \tag{2.1.4}
\]

The average crossover times Eu [τv ] = 1/c(u, v) and Ev [τu ] = 1/c(v, u) capture the
leading order asymptotics of (2.1.1), as expressed in (2.1.3).
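The two-state reduction is simple enough to spell out in a few lines. In the Python sketch below the numerical values of W(u), W(v), W(z*) and ε are hypothetical, as are the function names. The code evaluates the rates (2.1.4), the mean crossover times, and the invariant distribution of the two-state chain, which reproduces the Gibbs ratio exp[−(W(u) − W(v))/ε].

```python
import math

def two_state_rates(W_u, W_v, W_saddle, eps):
    """Rates c(u,v) = exp(-r(u,v)/eps) and c(v,u) = exp(-r(v,u)/eps) as in (2.1.4)."""
    c_uv = math.exp(-(W_saddle - W_u) / eps)
    c_vu = math.exp(-(W_saddle - W_v) / eps)
    return c_uv, c_vu

def stationary_distribution(c_uv, c_vu):
    """Invariant distribution (pi_u, pi_v) of the two-state chain."""
    total = c_uv + c_vu
    return c_vu / total, c_uv / total

# Hypothetical energy landscape: local minimum u, global minimum v, saddle z*.
W_u, W_v, W_saddle, eps = -1.0, -2.0, 0.0, 0.1
c_uv, c_vu = two_state_rates(W_u, W_v, W_saddle, eps)

mean_time_u_to_v = 1.0 / c_uv      # E_u[tau_v] for an exponential holding time
mean_time_v_to_u = 1.0 / c_vu      # E_v[tau_u]
pi_u, pi_v = stationary_distribution(c_uv, c_vu)
```

With these values ε ln E_u[τ_v] equals the barrier height W(z*) − W(u) exactly, which is the content of (2.1.3) without the o(1) correction. The prefactor of (2.1.2) is not captured by this reduction.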
The above setting can be easily generalised to systems with multiple metastable
states. An effective model for such systems would be a continuous-time Markov
chain with a finite state space $M = \{m_1, \dots, m_n\}$ and transition rates $c(m_i, m_j) =
\exp[-r(m_i, m_j)/\varepsilon]$, $i, j = 1, \dots, n$. The basic task of a theory of metastability is
to determine these transition rates from first principles. This idea was properly for-
malised by Freidlin and Wentzell [115] in the context of small random perturbations
of dynamical systems. In their theory the coefficients r(mi , mj ) are computed with
the help of the theory of large deviations on path space (see Chap. 6).
Finite-state Markov chains with exponentially small transition rates have become
a subject of interest by themselves. By allowing the transition rates to be either ex-
ponentially small or equal to one, the above picture is capable of describing models
from statistical physics, in particular, spin-flip systems and lattice gases in finite
volumes at low temperatures, with ε playing the role of temperature (see Part VI).
The analysis of the metastability properties of finite-state Markov chains is a
non-trivial problem in itself. In the early 1990’s an intense activity in this direction
started with the work of Catoni and Cerf [53] and Olivieri and Scoppola [196, 197].
The methods used were, once again, large deviations on the path space of these
Markov chains. The difficulties that arise in the analysis of specific models are
essentially of a combinatorial nature: the optimal paths for transitions between
metastable states need to be identified and to be counted. This leads to interest-
ing problems, such as the discrete isoperimetric inequalities studied in Alonso and
Cerf [4], which we will encounter in Part VI. Only later, in the 2000’s, was it noted
that potential theory is very well suited to simplify the analysis and to draw sharper
results from the same input, as first pointed out by Bovier and Manzo [39] and later
amplified in Bovier, den Hollander and Nardi [31].
2.2 Model reduction

The Kramers model and finite-state Markov chains can both be seen as simple toy models
that ought to be derivable from more complex realistic models of interest. Ideally,
we would like to start with many-body systems of interacting quantum particles.
This, however, is beyond present-day technology. The most complex models we will
consider in this book are classical interacting particle systems, in particular spin-
flip systems and lattice gases. These are Markov processes with a high-dimensional
(sometimes even infinite-dimensional) state space. Typically, the noise on the level
of the microscopic dynamics is not small, and the large-scale dynamics of the system
depends on the interplay between energetic and entropic effects.
It is generally accepted in the physics and chemistry literature that reduced models,
describing the time evolution of the system on an intermediate aggregate level
of mesoscopic variables, provide a good description of metastable behaviour. Examples
of such models are stochastic differential or partial differential equations
with small noise. Ideally, such effective dynamics should be derived with the help
of coarse-graining techniques, in the spirit of the renormalisation group theory in
equilibrium statistical mechanics (see the monograph by Presutti [202]). However,
this derivation is quite problematic, partly because renormalisation maps typically
do not preserve the Markovian nature of the dynamics. An even more serious issue is
that, while at least formally deterministic evolution equations (like the Allen-Cahn
equation [3] treated in Chap. 12) can be derived as scaling limits (i.e., laws of large
numbers in probabilistic language), a proper understanding of metastability requires
that we move beyond the deterministic limit and retain at least part of the random
perturbations of the dynamics. In the literature this goes by the name of diffusion
limits. However, there are subtle and poorly understood issues regarding the proper
choice of the noise term. In this book we will treat diffusion processes with small
noise as interesting models in their own right in Part IV. The issue of the derivation
of mesoscopic dynamics from microscopic dynamics in the mean-field setting will
be touched upon in Part V.

2.3 Variational point of view

The focus of this book is on the potential-theoretic approach to metastability. The
basic ideas are classical: many probabilistic quantities can be represented as solutions
of Dirichlet problems. The usefulness of this observation may appear to be
limited, as it amounts to having to solve partial differential equations or discrete ana-
logues thereof. In general, no explicit analytic solutions of such problems are avail-
able. Two notable exceptions are one-dimensional diffusions (which is the reason
for the solvability of the Kramers model) and one-dimensional nearest-neighbour
random walks.
The power of the potential-theoretic approach arises from the fact that it avoids
solving the Dirichlet problem. Instead, it makes use of a representation formula
for the Green function in terms of capacities, the invariant measure and harmonic
functions. Since renewal arguments can be used to control harmonic functions by ca-
pacities, the key objects of the theory are capacities and the invariant measure. The
great advantage of this approach materialises in the context of reversible Markov
processes, i.e., Markov processes whose semi-groups are self-adjoint operators in
an L2 -space with respect to an invariant measure. This provides the main weapon
of the method: the Dirichlet principle expresses capacities as infima of the Dirichlet
form over classes of functions that are constrained by boundary conditions. The use-
fulness of this variational principle has long been recognised, e.g. in the analysis of
finite-state Markov chains. The book by Doyle and Snell [96] is an excellent source
for this material. For a more recent exposition, see Levin, Peres and Wilmer [163].
Part II of the book provides the background on potential theory of reversible Markov
processes that is necessary to deal with problems of metastability.
As a variational problem, the Dirichlet principle is a simple instrument to turn
physical intuition into upper bounds, and the sharpness of these upper bounds is
limited by diligence and imagination only. A particularly nice aspect of the Dirichlet
problem is that it satisfies certain monotonicity properties with respect to underlying
parameters. In fact, on this basis Berman and Konsowa [23] derived a dual varia-
tional principle that expresses capacities (in the case of a discrete state space) in
terms of suprema over flows (similar to, but different from the better known Thom-
son principle), which we call the Berman-Konsowa principle. As an upshot, the
latter allows for the derivation of lower bounds that complement the upper bounds
obtained via the Dirichlet principle. It is a rather remarkable fact that in many ex-
amples upper and lower bounds can be obtained for the metastable crossover time
that differ by a multiplicative factor of the form 1 + o(1) only, where o(1) tends to
zero as the time scale of the metastable system tends to infinity. We will see these
ideas at work in a variety of examples throughout the book. Part III outlines the
basic techniques that are needed to implement these ideas.
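The interplay of upper and lower bounds can be illustrated on a small network. The Python sketch below is a toy example under our own assumptions: the four-vertex conductance matrix, the test function and the test flows are hypothetical, and for the lower bound we use the better known Thomson principle (capacity = supremum over unit flows of the inverse flow energy) rather than the Berman-Konsowa principle itself.

```python
import numpy as np

def dirichlet_form(C, h):
    """(1/2) sum_{x,y} c(x,y) (h(x) - h(y))^2."""
    diff = h[:, None] - h[None, :]
    return 0.5 * np.sum(C * diff**2)

def harmonic_potential(C, a, b):
    """Minimiser of the Dirichlet form subject to h(a) = 1, h(b) = 0."""
    n = C.shape[0]
    laplacian = np.diag(C.sum(axis=1)) - C
    free = [x for x in range(n) if x not in (a, b)]
    h = np.zeros(n)
    h[a] = 1.0
    h[free] = np.linalg.solve(laplacian[np.ix_(free, free)], C[free, a])
    return h

def is_unit_flow(flow, n, a, b):
    """Check Kirchhoff's law: net outflow 1 at a, -1 at b, 0 elsewhere."""
    net = np.zeros(n)
    for (x, y), value in flow.items():
        net[x] += value
        net[y] -= value
    expected = np.zeros(n)
    expected[a], expected[b] = 1.0, -1.0
    return np.allclose(net, expected)

def flow_energy(C, flow):
    """Energy sum_e phi(e)^2 / c(e) of a flow given as {(x, y): value}."""
    return sum(value**2 / C[x, y] for (x, y), value in flow.items())

# Hypothetical network: two parallel routes 0-1-3 and 0-2-3 plus a cross edge 1-2.
C = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 0.5, 1.0],
              [2.0, 0.5, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
a, b = 0, 3

cap = dirichlet_form(C, harmonic_potential(C, a, b))           # exact capacity
upper = dirichlet_form(C, np.array([1.0, 0.6, 0.4, 0.0]))      # Dirichlet principle
single_path = {(0, 2): 1.0, (2, 3): 1.0}
split = {(0, 1): 1/3, (1, 3): 1/3, (0, 2): 2/3, (2, 3): 2/3}
lower_crude = 1.0 / flow_energy(C, single_path)                # Thomson principle
lower_sharp = 1.0 / flow_energy(C, split)
```

Any admissible test function gives an upper bound and any unit flow gives a lower bound, so a good guess on each side brackets the capacity without solving the Dirichlet problem. In this example the flow that splits in proportion to the route conductances already attains the exact value.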
A key observation is that the analysis of the Dirichlet principle and the Berman-
Konsowa principle in essence is part of equilibrium statistical physics, since it deals
with acquiring the relevant knowledge of the free energy landscape of the system.
Potential theory links this knowledge to the metastable dynamics of the system,
which is part of non-equilibrium statistical physics.

2.4 Specific models

The following models will be considered in Parts IV–VIII.
• In Part IV we study diffusions with small noise. Chapter 10 deals with diffu-
sions on lattices with small spacings, the simplest setting in which the potential-
theoretic approach to metastability can be applied. Under certain regularity as-
sumptions on the transition probabilities (for discrete time) or transition rates
(for continuous time), we carry out a detailed calculation of metastable crossover
times. Chapter 11 considers finite-dimensional diffusions on subsets of Rd and
sharpens the classical results of Freidlin-Wentzell theory by using the potential-theoretic
approach. The Kramers formula is generalised to a d-dimensional diffusion
in a general potential satisfying minimal structural assumptions, a link is
made with the principal eigenvalue of the generator of the diffusion, and the ex-
ponential distribution of the crossover time is established. Chapter 12 looks at
stochastic partial differential equations, which are the infinite-dimensional ana-
logues of the diffusions dealt with in Chap. 11, and shows that similar results
apply for a particular example called the Allen-Cahn equation. The theory is com-
plete in one dimension, but suffers from difficulties in higher dimensions, where
the noise has to be “truncated properly”.
• In Part V we deal with models that allow for coarse-graining, i.e., a lumping of
states that leads to a simpler Markov process on a reduced state space. Chapter 13
analyses the Curie-Weiss model (the archetype model for ferromagnetism) sub-
ject to Glauber spin-flip dynamics. The metastable behaviour of the magnetisation
can be fully computed in the limit as the volume tends to infinity, at any subcrit-
ical temperature, and turns out to be similar to that of the Kramers model. Chap-
ters 14–15 extend the analysis to the random-field Curie-Weiss model. If the sup-
port of the distribution of the magnetic field is finite (Chap. 14), then this model
behaves similarly to the Curie-Weiss model, with a large-volume metastable behaviour
that is like the Kramers model when the dimension is equal to the size of
the support. The computations become much more complicated when the support
is infinite (Chap. 15), in which case delicate coupling techniques are required.
• Part VI looks at lattice models subject to a Metropolis dynamics in a finite vol-
ume in the limit as the temperature tends to zero. Chapter 16 explains how the
potential-theoretic approach can be used to prove that these models have the same
metastable behaviour as the two-state Markov chain, provided a number of min-
imal hypotheses are satisfied. Two other dynamics are briefly discussed as well,
namely, heat-bath dynamics and probabilistic cellular automata, for which the
same universal metastable behaviour can be derived under similar hypotheses.
Chapter 17 settles the hypotheses in the case of Glauber dynamics for Ising spins,
Chapter 18 in the case of Kawasaki dynamics for lattice gas particles.
• Part VII looks at nucleation in lattice systems that grow to infinity as the temper-
ature tends to zero. Spatial entropy comes into play: in large volumes, even at
low temperatures, entropy is competing with energy because the metastable state
and the states that evolve from it under the dynamics have a non-trivial spatial
structure. The main idea is that the system exhibits “homogeneous nucleation”,
i.e., after the large volume is divided up into smaller (but still large) subvolumes,
the system is found to behave more or less independently in different subvol-
umes. Chapter 19 looks at Glauber dynamics, Chap. 20 at Kawasaki dynamics.
For the latter, the computations are delicate and require a proof of “equivalence
of ensembles” in a dynamical setting.
• Part VIII describes lattice systems at high densities. The focus in Chap. 21 is
on the zero-range process, which consists of a collection of particles performing
continuous-time simple random walks with on-site attraction and no on-site re-
pulsion. We consider the limit where the particle density is high, show that the
process spends most of its time in a “condensed state”, i.e., a configuration where
most of the particles pile up on a single site, and prove that the process evolves
via a “metastable hopping” of this pile from one site to another. Both the hopping
time and the hopping distribution are computed.
The different parts on applications can essentially be read independently and
have the following substructure:
• Part IV: (diffusions with small noise)
(10) Discrete diffusions
(11) Continuous diffusions ∗
(12) Stochastic partial differential equations ∗
• Part V: (coarse-graining in large volumes at positive temperatures)
(13) Curie-Weiss mean-field model
(14) Curie-Weiss in discrete random magnetic field
(15) Curie-Weiss in continuous random magnetic field ∗
• Part VI: (lattice systems in small volumes at low temperatures)
(16) General theory
(17) Glauber dynamics
(18) Kawasaki dynamics
• Part VII: (lattice systems in large volumes at low temperatures)
(19) Glauber dynamics ∗
(20) Kawasaki dynamics ∗
• Part VIII: (lattice systems in small volumes at high densities)
(21) Zero-range dynamics ∗
The chapters without ∗ concern models where the state space is simple (e.g. discrete
and finite) and a complete description of the metastable behaviour is achieved. The
chapters with ∗ concern models where the state space is not simple (e.g. continuous
and infinite) and only partial results are obtained.

2.5 Related topics

Apart from being manifest in interacting particle systems, metastability is an im-
portant feature of complex systems in general. Topics within reach that will not be
considered in this monograph include:
• Ageing: A random dynamics goes through a cascade of metastable states in which
it gets trapped on increasingly larger space-time scales. As a consequence, the de-
cay of space-time correlation functions depends on the age of the system. Exam-
ples are random walks in random environments, used to describe spin glass dy-
namics (Bouchaud, Cugliandolo, Kurchan and Mézard [30], Ben Arous, Bovier
and Gayrard [18]).
• Conformational dynamics: Large (bio)-molecules undergo transitions between
metastable states (= conformations) under the influence of thermal noise. There is
a strong application-driven interest in the numerical identification of these states
and of their lifetimes (Grassberger, Barkema, Nadler [129], E, Ren and Vanden-
Eijnden [99], Schütte and Sarich [217]). It is a challenge to develop a rigorous
mathematical framework to analyse such systems (Caputo, Lacoin, Martinelli,
Simenhaus and Toninelli [50]).
• Population dynamics: Selective sweeps in genetic populations, triggered by mu-
tations that drive the population from one dominant trait to another, can be viewed
as transitions between metastable states (Dawson and Greven [75]). Viruses mov-
ing through a complex network may cause an epidemic. The epidemic may be
interpreted as a metastable state of the network, which lasts until the virus disap-
pears. This metastable state depends sensitively on the size and the architecture
of the network (Chatterjee and Durrett [56], Mourrat and Valesin [187]).
• Gene regulatory networks: The genetic information encoded in DNA fixes the
topology of the network. Transitions between the various phenotypic states can
be understood as crossovers between metastable states of the corresponding dy-
namical system subject to noise (Kauffmann [149], Huang [141]).
Part II
Markov Processes

The playground for our expedition into metastability is the theory of Markov pro-
cesses. Part II presents a summary introduction to this subject, with special emphasis
on what will be needed in Part III to describe metastability. The simplest examples
are Markov processes in discrete time and discrete space. The general theory is de-
veloped from there.
Chapter 3 recalls some basic notions from probability theory. Chapters 4–5 look
at Markov processes in discrete, respectively, continuous time, with the focus on
generators and semigroups, martingales, and Itō calculus. Chapter 6 gives a brief
introduction to large deviations, and looks at path large deviations for finite- and
infinite-dimensional diffusion processes via action integrals. Chapter 7 collects the
main ingredients from potential theory that are needed in the rest of the book, with
capacity playing a central role in the study of metastable transition times, and vari-
ational principles for capacities being the main vehicles to estimate capacities.
Readers with a background in probability theory can skip Chaps. 3–6.
Chapter 3
Some Basic Notions from Probability Theory

One may even say, strictly speaking, that almost all our knowledge is only
probable; and in the small number of things that we are able to know with
certainty, in the mathematical sciences themselves, the principal means of
arriving at the truth, induction and analogy, are based on probabilities,
so that the entire system of human knowledge is tied to the theory set
out in this essay.
(Pierre Simon de Laplace, Théorie Analytique des Probabilités)

In this chapter we recall some basic notions from probability theory in order to
set notation and to have easy references for later use. Proofs are mostly omitted.
Readers who are unfamiliar with the concepts appearing below should consult basic
textbooks on probability theory and stochastic processes (see Sect. 3.6 for a list of
possible references). Readers who are familiar may skip to Chap. 4.
Section 3.1 defines key ingredients such as probability spaces, random variables,
integrals and Radon-Nikodým derivative. Section 3.2 defines stochastic processes
and states the Daniell-Kolmogorov extension theorem. Section 3.3 defines condi-
tional expectation, conditional probability and conditional probability measure. Sec-
tions 3.4–3.5 list the main properties of martingales in discrete time, respectively
continuous time.

3.1 Probability and measures

3.1.1 Probability spaces

A space Ω is an arbitrary non-empty set. Elements of Ω are denoted by ω. If A ⊂ Ω
is a subset of Ω, then we denote by $1_A$ the indicator function of the set A, i.e.,
\[
1_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A,\\ 0, & \text{if } \omega \in A^c = \Omega \setminus A. \end{cases} \tag{3.1.1}
\]

© Springer International Publishing Switzerland 2015 27

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_3

Definition 3.1 Let Ω be a space. A family A = {Aλ }λ∈I , with Aλ ⊂ Ω for all λ ∈ I
with I an arbitrary set, is called a class of Ω. A non-empty class of Ω is called an
algebra if:
(i) Ω ∈ A .
(ii) For all A ∈ A , Ac ∈ A .
(iii) For all A, B ∈ A , A ∪ B ∈ A .
If A is an algebra and
(iv) $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{A}$ whenever $A_n \in \mathcal{A}$ for all $n \in \mathbb{N}$,
then A is called a σ-algebra.
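On a finite space the axioms (i)–(iii) can be checked mechanically, and every algebra is automatically a σ-algebra because only finitely many distinct sets are available. The Python sketch below is an illustration under that finiteness assumption. The function names and the example space are hypothetical. It closes a class of subsets under complements and pairwise unions until nothing new appears, which yields the smallest algebra containing the class.

```python
from itertools import combinations

def generate_algebra(omega, generators):
    """Smallest algebra on the finite set `omega` containing all `generators`."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:                       # close under complements
            if omega - A not in family:
                family.add(omega - A)
                changed = True
        for A, B in combinations(current, 2):   # close under pairwise unions
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family

def is_algebra(omega, family):
    """Check axioms (i)-(iii) of Definition 3.1 for a class on a finite set."""
    omega = frozenset(omega)
    return (omega in family
            and all(omega - A in family for A in family)
            and all(A | B in family for A in family for B in family))

omega = {1, 2, 3, 4}
algebra = generate_algebra(omega, [{1}, {2}])
```

In this example the generated algebra has the three atoms {1}, {2} and {3, 4}, hence 2³ = 8 elements. The set {3} is not in it, since the generators cannot separate 3 from 4.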

Definition 3.2 A space Ω, together with a σ-algebra F of subsets of Ω, is called
a measurable space (Ω, F ).

Definition 3.3 Let (Ω, F ) be a measurable space. A map μ : F → [0, ∞] is

called a (positive) measure if
(i) μ(∅) = 0.
(ii) For any countable family {An }n∈N of mutually disjoint elements of F ,

\[
\mu\Bigl(\bigcup_{n\in\mathbb{N}} A_n\Bigr) = \sum_{n\in\mathbb{N}} \mu(A_n). \tag{3.1.2}
\]

A measure μ is called finite if μ(Ω) < ∞. A measure is called σ -finite if

there exists a sequence (Ωn )n∈N of subsets of Ω such that Ω = ⋃_{n∈N} Ω_n and
μ(Ωn ) < ∞ for all n ∈ N.
A triple (Ω, F , μ) is called a measure space.

Definition 3.4 Let (Ω, F ) be a measurable space. A positive measure P on (Ω, F )

that satisfies P(Ω) = 1 is called a probability measure. A triple (Ω, F , P), with Ω
a set, F a σ -algebra of subsets of Ω and P a probability measure on (Ω, F ), is
called a probability space.

In most instances one is concerned with the canonical setting where Ω is a topo-
logical space and F = B(Ω) is the Borel-σ -algebra of Ω.

Definition 3.5 Let E be a topological space. The Borel-σ -algebra B(E) of E is

the smallest σ -algebra that contains all the open sets of E.

One says that the Borel-σ -algebra is generated by the open sets of E.
A topological space endowed with a metric and its metric topology is called
a metric space. A metric space E is called complete if any Cauchy sequence in
E converges in E. E is called separable if it contains a countable subset that is
dense in E. The standard setting of probability theory is a separable topological
space whose topology is generated by some complete metric. Such a
space is called a Polish space.
The crucial theorem permitting the construction of measures is Carathéodory’s
theorem:

Theorem 3.6 (Carathéodory’s theorem) Let Ω be a set and let A be an algebra

on Ω. Let μ0 : A → [0, ∞] be a countably additive map. Then there exists a mea-
sure μ on (Ω, σ (A )) such that μ = μ0 on A . If μ0 is σ -finite, then μ is unique.

A measure defined on a Borel-σ -algebra is sometimes called a Borel measure.

The most important issue that arises in applications is to characterise a measure

with the minimal amount of information possible. The basic tool here is Dynkin’s
theorem.

Definition 3.7 Let Ω be a space. A class T of Ω is called a Π -system if it is closed

under finite intersections. A class G of Ω is called a λ-system if
(i) Ω ∈ G .
(ii) If A, B ∈ G , and A ⊃ B, then A \ B ∈ G .
(iii) If An ∈ G and An ⊂ An+1 , then limn→∞ An ∈ G .

Theorem 3.8 (Dynkin’s theorem) If T is a Π -system and G is a λ-system, then

G ⊃ T implies that G contains the smallest σ -algebra containing T .

The most useful application of Dynkin’s theorem is the observation that if two
probability measures are equal on a Π -system that generates the σ -algebra, then
they are equal on the σ -algebra (since the set on which the two measures coincide
forms a λ-system containing T ).
Dynkin’s theorem has a sometimes useful analogue for so-called monotone classes
of functions.

Theorem 3.9 (Monotone class theorem) Let H be a class of bounded, measurable

functions from Ω to R. Assume that
(i) H is a vector space over R.
(ii) 1 ∈ H .
(iii) If fn ≥ 0 are in H and fn ↑ f , where f is bounded, then f ∈ H .
If H contains the indicator functions of every element of a Π -system S , then
H contains any bounded σ (S )-measurable function.

3.1.2 Random variables

Definition 3.10 Let (Ω, F ) and (E, G ) be two measurable spaces. A map
X : Ω → E is called measurable from (Ω, F ) to (E, G ) if X −1 (A) = {ω ∈
Ω : X(ω) ∈ A} ∈ F for all A ∈ G .

Fig. 3.1 A random variable X is a measurable map from Ω to E

The notion of measurability implies that a measurable map is capable of transporting
a measure from one space to another. Namely, if (Ω, F , P) is a probability
space and X is a measurable map from (Ω, F ) to (E, G ), then

\[
\mathbb{P}_X = \mathbb{P} \circ X^{-1} \tag{3.1.3}
\]

defines a probability measure on (E, G ), called the induced measure. Indeed, for
any B ∈ G , by definition

\[
\mathbb{P}_X(B) = \mathbb{P}\bigl(X^{-1}(B)\bigr) \tag{3.1.4}
\]

is well defined because X^{-1}(B) ∈ F .
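
The induced measure in (3.1.3) can be made concrete on a finite sample space. The following is a minimal sketch, assuming two fair dice as the underlying probability space and the sum of the faces as the random variable; the names `pushforward`, `total` and `P_sum` are illustrative choices and do not appear in the text.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(P, X):
    """Return the induced measure P_X = P o X^{-1} as a dict on the image of X.

    P is a dict mapping sample points to their probabilities, X is a function
    on the sample points (every function is measurable on a finite space).
    """
    PX = defaultdict(Fraction)  # Fraction() is 0, so masses accumulate exactly
    for omega, mass in P.items():
        PX[X(omega)] += mass
    return dict(PX)


# Assumed example: two fair dice, X = sum of the two faces.
P = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}


def total(omega):
    return omega[0] + omega[1]


P_sum = pushforward(P, total)
```

Here `P_sum[B]` for a single value B is exactly P(X^{-1}({B})), i.e., the mass of all outcomes whose faces add up to B.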

The standard notion of a random variable refers to a measurable function from
some measurable space to the Borel space (R, B(R)). One generally extends this
notion by calling any measurable map from a measurable space (Ω, F ) to a measur-
able space (E, B(E)), with E a topological space or a metric space, an E-valued
random variable or an E-valued Borel function (see Fig. 3.1). Thus, one has an
abstract probability space (Ω, F , P) on which all kinds of random variables—be
it real numbers, infinite sequences, functions or measures—are defined simultane-
ously.
An important notion is that of the σ -algebra generated by random variables.

Definition 3.11 Let (Ω, F ) be a measurable space, and let (E, B(E)) be a topo-
logical space equipped with its Borel-σ -algebra. Let X be an E-valued random
variable. Then σ (X) is the smallest σ -algebra such that X is measurable from
(Ω, σ (X)) to (E, B(E)).

3.1.3 Integrals

We next recall the notion of the integral of a measurable function (respectively, the
expectation value of a random variable). To do so we first introduce the notion of
simple functions:

Definition 3.12 A measurable function g : Ω → R is called simple if it takes only
finitely many values, i.e., if there are numbers k ∈ N and w1 , . . . , wk ∈ R, and a
partition A1 , . . . , Ak ∈ F of Ω (i.e., ⋃_{i=1}^{k} A_i = Ω and Ai ∩ Aj = ∅ for all 1 ≤ i <
j ≤ k) such that Ai = {ω ∈ Ω : g(ω) = wi } for 1 ≤ i ≤ k. In that case we can write

\[
g(\omega) = \sum_{i=1}^{k} w_i 1_{A_i}(\omega). \tag{3.1.5}
\]

The space of simple measurable functions is denoted by E+ .

It is obvious what the integral of a simple function should be.

Definition 3.13 Let (Ω, F , μ) be a measure space and g = ∑_{i=1}^{k} w_i 1_{A_i} a simple
function. Then

\[
\int_\Omega g \, d\mu = \sum_{i=1}^{k} w_i \, \mu(A_i). \tag{3.1.6}
\]

The integral of a general measurable function is defined via approximation with

simple functions.

Definition 3.14
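(i) Let f be non-negative and measurable. Then

\[
\int_\Omega f \, d\mu = \sup_{g \in \mathcal{E}_+,\; g \le f} \int_\Omega g \, d\mu \in \mathbb{R} \cup \{\infty\}. \tag{3.1.7}
\]

(ii) For f measurable, put

\[
f(\omega) = f(\omega) 1_{f(\omega) \ge 0} + f(\omega) 1_{f(\omega) < 0} = f_+(\omega) - f_-(\omega). \tag{3.1.8}
\]

If either ∫_Ω f_+ dμ < ∞ or ∫_Ω f_− dμ < ∞, then define

\[
\int_\Omega f \, d\mu = \int_\Omega f_+ \, d\mu - \int_\Omega f_- \, d\mu. \tag{3.1.9}
\]

(iii) A function f is called integrable, or absolutely integrable, if

\[
\int_\Omega |f| \, d\mu < \infty. \tag{3.1.10}
\]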

We next state the key properties of the integral. The most fundamental property
is the monotone convergence theorem, which justifies the definition above.

Theorem 3.15 (Monotone convergence) Let (Ω, F , μ) be a measure space and f
a real-valued non-negative measurable function. Let (fn )n∈N be a non-decreasing
sequence of non-negative measurable functions that converge pointwise to f .
Then

\[
\int_\Omega f \, d\mu = \lim_{n\to\infty} \int_\Omega f_n \, d\mu. \tag{3.1.11}
\]

The monotone convergence theorem allows us to provide an “explicit construction”
of the integral originally used by Lebesgue as a definition.

Lemma 3.16 Let f be a non-negative measurable function. Then

\[
\int_\Omega f \, d\mu = \lim_{n\to\infty} \Biggl[ \sum_{k=0}^{n 2^n - 1} 2^{-n} k \, \mu\bigl(\{\omega \in \Omega : 2^{-n} k \le f(\omega) < 2^{-n}(k+1)\}\bigr)
+ n \, \mu\bigl(\{\omega \in \Omega : f(\omega) \ge n\}\bigr) \Biggr]. \tag{3.1.12}
\]
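
The dyadic sums in (3.1.12) can be evaluated by hand in simple cases. The sketch below assumes the example f(x) = x² on Ω = [0, 1] with Lebesgue measure, for which the level sets are intervals whose length is known in closed form; the function names are illustrative only.

```python
import math


def level_set_measure(a, b):
    """Lebesgue measure of {x in [0, 1] : a <= x**2 < b} for 0 <= a <= b."""
    return math.sqrt(min(b, 1.0)) - math.sqrt(min(a, 1.0))


def dyadic_integral(n):
    """n-th approximation in (3.1.12) for the assumed example f(x) = x**2 on [0, 1]."""
    step = 2.0 ** -n
    total = 0.0
    for k in range(n * 2 ** n):
        total += step * k * level_set_measure(step * k, step * (k + 1))
    # Tail term n * mu{f >= n}: since f <= 1 on [0, 1], the set {f >= n} is
    # either empty or the single point {1}, so it has measure zero for n >= 1.
    total += n * 0.0
    return total
```

For this example the approximations increase with n towards the exact value 1/3, and the error at level n is at most 2^{-n}, because the approximating simple function differs from f by less than 2^{-n} everywhere.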

The following lemma is known as Fatou’s lemma:

Lemma 3.17 (Fatou’s lemma) Let (fn )n∈N be a sequence of measurable non-negative
functions. Then

\[
\int_\Omega \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_\Omega f_n \, d\mu. \tag{3.1.13}
\]

Equally important is Lebesgue’s dominated convergence theorem:

Theorem 3.18 (Lebesgue’s dominated convergence) Let (fn )n∈N be a sequence of
absolutely integrable functions, and let f be a measurable function such that

\[
\lim_{n\to\infty} f_n(\omega) = f(\omega) \quad \text{for } \mu\text{-almost all } \omega. \tag{3.1.14}
\]

Let g ≥ 0 be a measurable function such that ∫_Ω g dμ < ∞ and, for all n ∈ N,

\[
|f_n(\omega)| \le g(\omega) \quad \text{for } \mu\text{-almost all } \omega. \tag{3.1.15}
\]

Then f is absolutely integrable with respect to μ and

\[
\lim_{n\to\infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu. \tag{3.1.16}
\]

3.1.4 Spaces of integrable functions

We briefly summarise some frequently used notions concerning spaces of integrable
functions. Given a measure space (Ω, F , μ) and a p ∈ [1, ∞], we define for a
measurable function X : Ω → R,

\[
\|X\|_{p,\mu} = \|X\|_p = \bigl(\mathbb{E}\bigl[|X|^p\bigr]\bigr)^{1/p} = \Bigl(\int_\Omega |X|^p \, d\mu\Bigr)^{1/p}. \tag{3.1.17}
\]

The set of functions X such that ‖X‖_{p,μ} < ∞ is denoted by ℒ^p(Ω, F , μ) = ℒ^p.

Fig. 3.2 A convex function

Theorem 3.19 (Minkowski’s inequality) For all X, Y ∈ ℒ^p and p ∈ [1, ∞],

\[
\|X + Y\|_p \le \|X\|_p + \|Y\|_p. \tag{3.1.18}
\]

Theorem 3.20 (Hölder’s inequality) For all measurable functions X, Y and p, q ∈
[1, ∞] such that 1/p + 1/q = 1,

\[
\mathbb{E}[XY] \le \|X\|_p \, \|Y\|_q. \tag{3.1.19}
\]

Both inequalities follow from one of the most important inequalities in integra-
tion theory: Jensen’s inequality (see Fig. 3.2).

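Theorem 3.21 (Jensen’s inequality) Let (Ω, F , μ) be a probability space, X an
absolutely integrable random variable, and ϕ : R → R a convex function. Then, for
any c ∈ R,

\[
\mathbb{E}\bigl[\varphi\bigl(X - \mathbb{E}[X] + c\bigr)\bigr] \ge \varphi(c). \tag{3.1.20}
\]

In particular,

\[
\mathbb{E}\bigl[\varphi(X)\bigr] \ge \varphi\bigl(\mathbb{E}[X]\bigr). \tag{3.1.21}
\]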

Proof If ϕ is convex then, for every y, there is a straight line below ϕ that touches
ϕ at (y, ϕ(y)), i.e., there exists an m ∈ R such that ϕ(x) ≥ ϕ(y) + (x − y)m. Choos-
ing x = X − E[X] + c and y = c, and taking expectations on both sides, we get
(3.1.20).

We leave it as an exercise to deduce Minkowski’s inequality and Hölder’s inequality
from Jensen’s inequality.
Since Minkowski’s inequality is really a triangle inequality and linearity is trivial,
we would be inclined to think that ‖·‖_p is a norm and ℒ^p is a normed space.
In fact, the only problem is that ‖X‖_p = 0 does not imply X = 0, since X may
be non-zero on sets of μ-measure zero. Therefore, to define a normed space, we
consider equivalence classes of functions in ℒ^p by calling two functions X, X′
equivalent when X − X′ is non-zero only on a set of measure zero. The space of
these equivalence classes is called L^p = L^p(Ω, F , μ).
The following fact about Lp -spaces will be useful.

Lemma 3.22 Lp (Ω, F , μ) is a Banach space (i.e., a complete normed vector

space).

The case p = 2 is particularly nice, in that L^2 is not only a Banach space but also
a Hilbert space. The point is that Hölder’s inequality for p = 2 yields

\[
\mathbb{E}[XY] \le \sqrt{\mathbb{E}\bigl[X^2\bigr] \, \mathbb{E}\bigl[Y^2\bigr]} = \|X\|_2 \, \|Y\|_2. \tag{3.1.22}
\]

This means that on L^2 there exists a quadratic form (·, ·)μ ,

\[
(X, Y)_\mu = \int_\Omega XY \, d\mu = \mathbb{E}[XY], \tag{3.1.23}
\]

that has the properties of a scalar product. The L^2-norm being the derived norm, we
have ‖X‖_2 = ((X, X)μ)^{1/2}.

3.1.5 Convergence

A central issue in probability theory is the notion of convergence of probability

measures and random variables. The most commonly used concept of convergence
of probability measures is that of weak convergence.

Definition 3.23 Let (μn )n∈N be a sequence of probability measures defined on
some topological space S, and let μ be a probability measure defined on the same
space. The sequence (μn )n∈N is said to converge weakly to μ if and only if, for all
bounded continuous functions f : S → R,

\[
\lim_{n\to\infty} \mu_n(f) = \mu(f), \tag{3.1.24}
\]

where μ(f ) = ∫_S f dμ. If S = R, then this is equivalent to saying that the sequence of
probability distribution functions (Fn )n∈N defined by Fn (x) = μn ((−∞, x]) converges
to the probability distribution function F defined by F (x) = μ((−∞, x]), at every
point x ∈ R of continuity of F .

As to convergence of random variables, the standard concepts of pointwise or

even uniform convergence are often too rigid and need to be replaced by weaker
notions.
The following notion of convergence in law is natural when we do not think
of random variables as functions on sample spaces, but rather only care about the
probability distributions they induce.

Definition 3.24 Let (Xn )n∈N be a sequence of random variables with values in
some topological space, and let X be a random variable on the same space. The
sequence (Xn )n∈N is said to converge in law to X,
\[
X_n \xrightarrow{\;\mathcal{D}\;} X, \tag{3.1.25}
\]

if and only if the induced probability measures (PXn )n∈N converge weakly to PX .

The following notion of convergence in probability takes the functional aspect of

random variables more seriously.

Definition 3.25 Let (Xn )n∈N be a sequence of random variables with values in
some topological space, and let X be a random variable on the same space. The
sequence (Xn )n∈N is said to converge in probability to X if and only if, for any
ε > 0,

\[
\lim_{n\to\infty} \mathbb{P}\bigl(|X_n - X| \ge \varepsilon\bigr) = 0. \tag{3.1.26}
\]

A stronger notion, which comes closer to pointwise convergence, is that of almost

sure convergence.

Definition 3.26 Let (Xn )n∈N be a sequence of random variables with values in
some topological space, and let X be a random variable on the same space. The
sequence (Xn )n∈N is said to converge almost surely to X if and only if

\[
\mathbb{P}\Bigl(\lim_{n\to\infty} X_n = X\Bigr) = 1. \tag{3.1.27}
\]

Clearly, almost sure convergence implies convergence in probability, and conver-

gence in probability implies convergence in law.
Finally, there is the notion of convergence in Lp .

Definition 3.27 Let (Xn )n∈N be a sequence of random variables with values in
some normed space, and let X be a random variable on the same space. Let
p ∈ (0, ∞). The sequence (Xn )n∈N is said to converge to X in L^p if and only if

\[
\lim_{n\to\infty} \mathbb{E}\bigl[|X_n - X|^p\bigr] = 0. \tag{3.1.28}
\]

Even almost sure convergence does not imply convergence of the integral of
the random variable without extra conditions. Lebesgue’s dominated convergence

theorem provides a sufficient condition. There exists a useful improvement of the

dominated convergence theorem that leads us to the important notion of uniform
integrability.

Definition 3.28 Let (Ω, F , P) be a probability space. A class C of real-valued

random variables X is called uniformly integrable when for every ε > 0 there exists
a K = K(ε) < ∞ such that

\[
\mathbb{E}\bigl[|X| \, 1_{|X| > K}\bigr] < \varepsilon \qquad \forall\, X \in \mathcal{C}. \tag{3.1.29}
\]

Note, in particular, that if C is uniformly integrable, then there exists a constant

C < ∞ such that E[|X|] ≤ C for all X ∈ C .

Theorem 3.29 Let Xn , n ∈ N, and X be integrable random variables on a proba-

bility space (Ω, F , P). Then limn→∞ E[|Xn − X|] = 0 if and only if
(i) Xn → X in probability as n → ∞.
(ii) The family Xn , n ∈ N, is uniformly integrable.

A simple criterion for uniform integrability is Lp -boundedness for p > 1.

Lemma 3.30 Let C be a class of random variables. Assume that, for some p > 1,

\[
\sup_{X \in \mathcal{C}} \mathbb{E}\bigl[|X|^p\bigr] = c < \infty. \tag{3.1.30}
\]

Then C is uniformly integrable.

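Proof By Hölder’s inequality and Chebychev’s inequality, for all X ∈ C and q > 1
such that 1/p + 1/q = 1,

\[
\mathbb{E}\bigl[|X| \, 1_{|X| > K}\bigr] \le \bigl(\mathbb{E}\bigl[|X|^p\bigr]\bigr)^{1/p} \bigl(\mathbb{P}(|X| > K)\bigr)^{1/q} \le c^{1/p} \, c^{1/q} \, K^{-p/q} = c \, K^{-p/q}, \tag{3.1.31}
\]

where the second inequality uses P(|X| > K) ≤ K^{−p} E[|X|^p] ≤ c K^{−p}. The right-hand
side tends to zero uniformly in C as K → ∞.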

3.1.6 Radon-Nikodým derivative

It is possible to modify a measure μ on a measurable space (Ω, F ) with the help

of a measurable function X. Indeed, set

\[
\mu_X(A) = \int_A X \, d\mu, \qquad A \in \mathcal{F}. \tag{3.1.32}
\]

If μ is the Lebesgue measure, then μX is the absolutely continuous measure with

density X. In general, if O is a measurable set with μ(O) = 0, then also μX (O) = 0.
In words, a μ-null set is also a μX -null set. The latter property leads us to the notion
of absolute continuity between general measures.

Definition 3.31 Let μ, ν be two measures on a measurable space (Ω, F ).

(i) ν is absolutely continuous with respect to μ, written ν ≪ μ, if and only if all
μ-null sets are ν-null sets.
(ii) Two measures μ, ν are equivalent if μ ≪ ν and ν ≪ μ.
(iii) A measure ν is singular with respect to μ if there exists a set O ∈ F such that
μ(O) = 0 and ν(O^c) = 0.

The following important theorem, called the Radon-Nikodým theorem, asserts

that relative absolute continuity is equivalent to the existence of a density.

Theorem 3.32 (Radon-Nikodým theorem) Let μ, ν be two σ -finite measures on a

measurable space (Ω, F ). Then the following two statements are equivalent:
(i) ν ≪ μ.
(ii) There exists a non-negative measurable function X such that ν = μX .
Moreover, X is unique up to null sets.

Definition 3.33 If ν ≪ μ, then a positive measurable function X such that ν = μX
is called the Radon-Nikodým derivative of ν with respect to μ, denoted by

\[
X = \frac{d\nu}{d\mu}. \tag{3.1.33}
\]

The following property of the Radon-Nikodým derivative is very important.

Lemma 3.34 Let μ, ν be two σ -finite measures on (Ω, F ) such that ν ≪ μ. If X
is F -measurable and ν-integrable, then

\[
\int_A X \, d\nu = \int_A X \, \frac{d\nu}{d\mu} \, d\mu, \qquad A \in \mathcal{F}. \tag{3.1.34}
\]
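
On a finite space the Radon-Nikodým derivative is simply a ratio of point masses, which makes (3.1.34) easy to check. The following sketch assumes two hypothetical measures `mu` and `nu` on a four-point set and a test function `f` playing the role of X in Lemma 3.34; all names and numbers are illustrative.

```python
from fractions import Fraction

# Assumed example: two measures on the four-point space {a, b, c, d}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
nu = {"a": Fraction(1, 8), "b": Fraction(5, 8), "c": Fraction(1, 4), "d": Fraction(0)}


def is_absolutely_continuous(nu, mu):
    """nu << mu: every mu-null point is also a nu-null point."""
    return all(nu[w] == 0 for w in mu if mu[w] == 0)


def density(nu, mu):
    """A version of d(nu)/d(mu); the value on mu-null points is arbitrary (0 here)."""
    return {w: (nu[w] / mu[w] if mu[w] != 0 else Fraction(0)) for w in mu}


def integrate(f, measure, A):
    """Integral of f over the set A with respect to the given measure."""
    return sum(f[w] * measure[w] for w in A)


dnu_dmu = density(nu, mu)
f = {"a": 3, "b": -1, "c": 2, "d": 7}
A = {"a", "b", "d"}

lhs = integrate(f, nu, A)                                    # left side of (3.1.34)
rhs = integrate({w: f[w] * dnu_dmu[w] for w in f}, mu, A)    # right side of (3.1.34)
```

The freedom in choosing the density on the μ-null point d reflects the statement that the derivative is unique only up to null sets.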

3.2 Stochastic processes

Stochastic processes are the main models for systems that exhibit metastability. In
this section we recall some basic facts.

3.2.1 Definition of stochastic processes

There are various equivalent ways in which stochastic processes can be defined, and
it is useful to keep these in mind. The standard way is as follows. We begin with an
abstract probability space (Ω, F , P). Next, we need a measurable space (S, B(S))
(typically a Polish space together with its Borel σ -algebra), where S is called the
state space. Finally, we need a set I , called the index set. A stochastic process with
state space S and index set I is a collection of (S, B(S))-valued random variables
(Xt )t∈I defined on (Ω, F , P). If I is either N0 = N ∪ {0}, Z, R+ = [0, ∞) or R,
then we may think of it as time. Depending on whether I is discrete or continuous,
we refer to (Xt )t∈I as a stochastic process with discrete or continuous time (see
Fig. 3.3).

Fig. 3.3 The path of a stochastic process X taking values in S

Given a stochastic process as defined above, we can take a different perspective
and, for each ω ∈ Ω, view X(ω) as a map from I to S,

\[
X(\omega) : I \to S, \qquad t \mapsto X_t(\omega). \tag{3.2.1}
\]

We call such a function a sample path of X, or a realisation of X. Here we want
to view the stochastic process as a random variable taking values in the space of
functions,

\[
X : \Omega \to S^I, \qquad \omega \mapsto X(\omega), \tag{3.2.2}
\]

where we view S^I as the space of functions from I to S. To complete this image,
we need to endow S^I with a σ -algebra, called B^I. How should we choose this
σ -algebra? Our picture will be that X maps (Ω, F ) to (S^I, B^I).

Lemma 3.35 Let B^I be the smallest σ -algebra that contains all subsets of S^I of
the form

\[
C(A, t) = \bigl\{x \in S^I : x_t \in A\bigr\} \tag{3.2.3}
\]

with A ∈ B = B(S), t ∈ I . Then B^I is the smallest σ -algebra such that all functions
Xt : Ω → S, t ∈ I , are measurable, i.e., B^I = σ (Xt , t ∈ I ).

Definition 3.36 If J ⊂ I is finite and B ∈ B^J, then we call

\[
C(B, J) = \bigl\{x \in S^I : x_J = \{x_t\}_{t \in J} \in B\bigr\} \tag{3.2.4}
\]

a cylinder set or, more precisely, a finite-dimensional cylinder set. If B is of the form
B = ×_{t∈J} A_t , At ∈ B, then we call it a special cylinder.

It is clear that B^I contains all finite-dimensional cylinder sets. But, of course, it
contains much more. We call B^I the product σ -algebra, or the σ -algebra generated by
the cylinder sets.

Fig. 3.4 A stochastic process is determined by its finite-dimensional distributions

If we view X as a map from Ω to the set of all S-valued functions on I , then we
can define the probability distribution induced by P on the space (S^I, B^I),

\[
\mathbb{P}_X = \mathbb{P} \circ X^{-1}, \tag{3.2.5}
\]

as the distribution of the random variable X.

3.2.2 The Daniell-Kolmogorov extension theorem

The most fundamental observation is that the law of a stochastic process is determined
by its observations at finitely many points in time. For J ⊂ I we will denote by πJ
the canonical projection from S^I to S^J, i.e., πJ X ∈ S^J such that (πJ X)t = Xt for
all t ∈ J . Naturally, on S^J we can define the distributions

\[
\mathbb{P}_X^J = \mathbb{P} \circ (\pi_J X)^{-1}. \tag{3.2.6}
\]

Definition 3.37 Let F (I ) denote the set of finite non-empty subsets of I . Then the
collection of probability measures
\[
\bigl\{\mathbb{P}_X^J : J \in F(I)\bigr\} \tag{3.2.7}
\]

is called the collection of finite-dimensional distributions of X (or finite-dimensional

marginal distributions, or finite-dimensional marginals).

Note that the finite-dimensional distributions determine PX on the algebra of

finite-dimensional cylinder sets. Hence, by Dynkin’s theorem, they determine the
distribution on the σ -algebra B^I, which is nice. What is even nicer is that we can
also go the other way and construct the law of a stochastic process from specified
finite-dimensional distributions (see Fig. 3.4). This is the content of the following
fundamental theorem due to Kolmogorov and Daniell.

Theorem 3.38 (Daniell-Kolmogorov extension theorem) Let S be a Polish space,
and let B = B(S) be its Borel-σ -algebra. Let I be a set. Suppose that, for each
J ∈ F (I ), there exists a probability measure P^J on (S^J, B^J) such that, for any
J1 ⊂ J2 ∈ F (I ),

\[
\mathbb{P}^{J_1} = \mathbb{P}^{J_2} \circ \pi_{J_1}^{-1}, \tag{3.2.8}
\]

where π_{J1} denotes the canonical projection from S^{J2} to S^{J1}. Then there exists a
unique measure P on (S^I, B^I) such that, for all J ∈ F (I ),

\[
\mathbb{P} \circ \pi_J^{-1} = \mathbb{P}^J. \tag{3.2.9}
\]

Note that we need not distinguish cases according to the nature of the set I .
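
The consistency condition (3.2.8) can be checked by hand for a concrete family of finite-dimensional distributions. The sketch below assumes a two-state Markov chain on S = {0, 1} with a hypothetical initial law and transition kernel (neither is taken from the text), builds P^J for finite sets J of times, and projects from J2 to J1.

```python
from fractions import Fraction
from itertools import product

# Assumed example: a two-state Markov chain; initial law and kernel are illustrative.
S = (0, 1)
initial = {0: Fraction(1, 3), 1: Fraction(2, 3)}
kernel = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
          1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}


def law_up_to(n):
    """Joint law of (X_0, ..., X_n) as a dict mapping paths to probabilities."""
    law = {}
    for path in product(S, repeat=n + 1):
        p = initial[path[0]]
        for a, b in zip(path, path[1:]):
            p *= kernel[a][b]
        law[path] = p
    return law


def fdd(J):
    """Finite-dimensional distribution P^J for a sorted tuple J of times."""
    out = {}
    for path, p in law_up_to(max(J)).items():
        key = tuple(path[t] for t in J)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def project(PJ2, J2, J1):
    """Image measure P^{J2} o pi_{J1}^{-1} for J1 a subset of J2."""
    idx = [J2.index(t) for t in J1]
    out = {}
    for x, p in PJ2.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out
```

For instance, `project(fdd((0, 2, 5)), (0, 2, 5), (0, 2))` agrees with `fdd((0, 2))`, because each row of the kernel sums to one, so the coordinates at later times can be summed out.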

3.3 Conditional expectations

The notions of conditional expectations and conditional probabilities are central to

the theory of Markov processes and we collect them here.

3.3.1 Definition of conditional expectations

Definition 3.39 Consider a probability space (Ω, F , P). Let G ⊂ F be a sub-σ -

algebra of F . Let X be a random variable, i.e., an F -measurable (real-valued) func-
tion on Ω such that E[|X|] < ∞. We say that a function Y is a conditional expecta-
tion of X given G , written Y = E[X|G ], if
(i) Y is G -measurable.
(ii) For all A ∈ G ,
E[1A Y ] = E[1A X]. (3.3.1)

If two functions Y, Y′ both satisfy the conditions of a conditional expectation,
then they can differ only on sets of probability zero, i.e., P(Y = Y′) = 1. Such different
realisations of a conditional expectation are called “versions”. The following
theorem guarantees the existence of conditional expectations.

Theorem 3.40 Let (Ω, F , P) be a probability space, let X be a random variable

such that E[|X|] < ∞, and let G ⊂ F be a sub-σ -algebra of F . Then
(i) There exists a G -measurable function E[X|G ], unique up to sets of measure
zero, called the conditional expectation of X given G , such that for all A ∈ G ,

\[
\int_A \mathbb{E}[X|\mathcal{G}] \, d\mathbb{P} = \int_A X \, d\mathbb{P}. \tag{3.3.2}
\]

(ii) If X is absolutely integrable, and Z is an absolutely integrable G -measurable
random variable such that, for some Π -system D with σ (D) = G ,

\[
\mathbb{E}[Z] = \mathbb{E}[X], \quad \text{and} \quad \int_A Z \, d\mathbb{P} = \int_A X \, d\mathbb{P} \quad \forall\, A \in \mathcal{D}, \tag{3.3.3}
\]

then Z = E[X|G ] a.s.

(Recall from Definition 3.7 that a Π -system is a set that is closed under finite
intersections.)
In many cases the σ -algebra G with respect to which we are conditioning is the
σ -algebra σ (Y ) generated by some other random variable Y . In those cases we will
write

\[
\mathbb{E}\bigl[X \,|\, \sigma(Y)\bigr] = \mathbb{E}[X|Y] \tag{3.3.4}
\]
and call this the conditional expectation of X given Y .

3.3.2 Elementary properties of conditional expectations

Conditional expectations share most of the properties of ordinary expectations. The

following is a list of elementary properties:

Lemma 3.41 Let (Ω, F , P) be a probability space and let G ⊂ F be a sub-σ -

algebra. Then:
(i) If X is G -measurable, then E[X|G ] = X a.s.
(ii) The map X → E[X|G ] is linear.
(iii) E[E[X|G ]] = E[X].
(iv) If B ⊂ G is a σ -algebra, then E[E[X|G ]|B] = E[X|B] a.s.
(v) |E[X|G ]| ≤ E[|X| |G ] a.s.
(vi) If X ≤ Y , then E[X|G ] ≤ E[Y |G ] a.s.
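
When G is generated by a finite partition, E[X|G] is obtained by averaging X over each block, and several of the properties above can be verified directly. The sketch below assumes a fair die as the probability space, with hypothetical partitions `G` and `B` (B coarser than G); it checks the defining relation (3.3.1) and properties (iii) and (iv).

```python
from fractions import Fraction

# Assumed example: a fair die.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}


def cond_exp(X, partition):
    """E[X | G] for G generated by a finite partition into blocks of positive probability."""
    Y = {}
    for block in partition:
        p_block = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / p_block
        for w in block:
            Y[w] = average  # constant on each block, hence G-measurable
    return Y


X = {w: Fraction(w * w) for w in omega}
G = [{1, 2}, {3, 4}, {5, 6}]   # generates the finer sigma-algebra
B = [{1, 2, 3, 4}, {5, 6}]     # generates a coarser sigma-algebra contained in it
Y = cond_exp(X, G)
```

In this finite setting every identity holds exactly, so no almost-sure qualifier is needed; in general the equalities in Lemma 3.41 hold only up to null sets.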

The following theorem summarises the most important properties of conditional

expectations with regard to taking limits.

Theorem 3.42 Let Xn , n ∈ N, and Y be absolutely integrable random variables on

a probability space (Ω, F , P), and let G ⊂ F be a sub-σ -algebra. Then:
(i) If 0 ≤ Xn ↑ X a.s. as n → ∞, then E[Xn |G ] ↑ E[X|G ] a.s. as n → ∞.
(ii) If Xn ≥ 0 a.s. for all n, then

\[
\mathbb{E}\Bigl[\liminf_{n\to\infty} X_n \,\Big|\, \mathcal{G}\Bigr] \le \liminf_{n\to\infty} \mathbb{E}[X_n | \mathcal{G}]. \tag{3.3.5}
\]

(iii) If Xn → X a.s. as n → ∞ and |Xn | ≤ |Y | for all n, then E[Xn |G ] → E[X|G ]

a.s. as n → ∞.

Of course, these are just the analogues of the three basic convergence theorems
for ordinary expectations. A useful further property is the following lemma.

Lemma 3.43 Let X be integrable and let Y be bounded and G -measurable. Then

E[XY |G ] = Y E[X|G ] a.s. (3.3.6)

There is a natural connection between independence and conditional expectation.

Lemma 3.44 Two σ -algebras G1 , G2 are independent if and only if, for all G2 -
measurable integrable random variables X,

E[X|G1 ] = E[X]. (3.3.7)

By choosing X = 1A , A ∈ G2 , we see that (3.3.7) reduces to the independence of

events.

3.3.3 Conditional probability measures

From conditional expectations we want to construct conditional probability mea-

sures. As before, we consider a probability space (Ω, F , P) and a sub-σ -algebra
G . For any A ∈ F , we can define

P(A|G ) = E[1A |G ], (3.3.8)

and call it the conditional probability of A given G . This is a G -measurable function

that satisfies (see Fig. 3.5)

\[
\int_G \mathbb{P}(A|\mathcal{G}) \, d\mathbb{P} = \int_G 1_A \, d\mathbb{P} = \mathbb{P}(A \cap G), \qquad G \in \mathcal{G}. \tag{3.3.9}
\]

It clearly inherits from the conditional expectation the following properties:

(iv) If An ∈ F , n ∈ N, such that limn→∞ An = A then

\[
\lim_{n\to\infty} \mathbb{P}(A_n|\mathcal{G}) = \mathbb{P}(A|\mathcal{G}) \quad \text{a.s.} \tag{3.3.11}
\]

Fig. 3.5 Conditional probability of A given G , as defined in (3.3.9)

These observations bring us close to viewing conditional probabilities as G -

measurable functions taking values in the set of probability measures, at least for
almost all ω. The problem, however, is that the requirement of σ -additivity, which
seems to be satisfied because of (iii), is in fact problematic: (iii) says that for any
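sequence (An )n∈N of mutually disjoint sets there exists a set Ω′ of full measure such that

\[
\mathbb{P}\Bigl(\bigcup_{n\in\mathbb{N}} A_n \,\Big|\, \mathcal{G}\Bigr)(\omega) = \sum_{n\in\mathbb{N}} \mathbb{P}(A_n|\mathcal{G})(\omega) \qquad \forall\, \omega \in \Omega'. \tag{3.3.12}
\]

However, Ω′ may depend on (An )n∈N and, since the space is not countable, it is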
unclear whether there exists a set of full measure on which (3.3.12) holds for all
sequences. These considerations lead us to the definition of so-called regular con-
ditional probabilities.

Definition 3.45 Let (Ω, F , P) be a probability space and let G be a sub-σ -algebra.
A regular conditional probability measure or regular conditional probability on F
given G is a function P (ω, A), defined for all A ∈ F and all ω ∈ Ω, such that:
(i) For each ω ∈ Ω, P (ω, ·) is a probability measure on (Ω, F ).
(ii) For each A ∈ F , P (·, A) is a G -measurable function coinciding with the con-
ditional probability P(A|G ) almost everywhere.

The point is that if we have a regular conditional probability, then we can express
conditional expectations as expectations with respect to ordinary probability measures.

Theorem 3.46 With the notation above, if Pω [A] = P (ω, A) is a regular condi-
tional probability on F given G , then for an F -measurable integrable random
variable X,

\[
\mathbb{E}[X|\mathcal{G}](\omega) = \int_\Omega X \, dP_\omega \quad \text{a.s.} \tag{3.3.13}
\]

The question remains when regular conditional probabilities exist. A central re-
sult is the existence when Ω is a Polish space.

Theorem 3.47 Let (Ω, B(Ω), P) be a probability space with Ω a Polish space and
B(Ω) its Borel-σ -algebra. Let G ⊂ B(Ω) be a sub-σ -algebra. Then there exists a
regular conditional probability P (ω, A) given G .

3.4 Martingales in discrete time

One of the most fundamental and useful concepts in the theory of stochastic pro-
cesses is that of a martingale. Since this will play a major rôle in the remainder of
the book, we will spend some time to expose its main properties. As some subtleties
arise in continuous time, we begin with the simpler case of discrete time.

3.4.1 Definitions

Definition 3.48 Let (Ω, F ) be a measurable space. A family of sub-σ -algebras

(Fn )n∈N0 of F that satisfies

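\[
\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_\infty = \sigma\Bigl(\bigcup_{n\in\mathbb{N}_0} \mathcal{F}_n\Bigr) \subset \mathcal{F}, \tag{3.4.1}
\]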

is called a filtration of the σ -algebra F . A quadruple (Ω, F , P, (Fn )n∈N0 ) is called

a filtered (probability) space.

Filtrations and stochastic processes are closely linked together, in two ways.

Definition 3.49 A stochastic process X = (Xn )n∈N0 is called adapted to the filtra-
tion (Fn )n∈N0 if Xn is Fn -measurable for every n.

Definition 3.50 Let X = (Xn )n∈N0 be a stochastic process on (Ω, F , P). The nat-
ural filtration (Wn )n∈N0 with respect to X is the smallest filtration such that X is
adapted to it, i.e.,
Wn = σ (X0 , . . . , Xn ). (3.4.2)

Definition 3.51 A stochastic process X on a filtered space is called a martingale if

and only if the following hold:
(i) The process X is adapted to the filtration (Fn )n∈N0 .
(ii) For all n ∈ N0 , E[|Xn |] < ∞.
(iii) For all n ∈ N,
E[Xn |Fn−1 ] = Xn−1 a.s. (3.4.3)

If (i) and (ii) hold but, instead of (iii), it is only true that E[Xn |Fn−1 ] ≥ Xn−1 ,
respectively, E[Xn |Fn−1 ] ≤ Xn−1 , then the process X is called a submartingale,
respectively, a supermartingale.

We will next head for the fundamental theorem stating the impossibility of “win-
ning games” built on martingales.

Definition 3.52 A stochastic process C = (Cn )n∈N0 is called previsible if Cn is

Fn−1 -measurable for all n ∈ N.

Given an adapted stochastic process X and a previsible process C, we can define

the discrete stochastic integral

\[
W_n = \sum_{k=1}^{n} C_k (X_k - X_{k-1}) = (C \bullet X)_n. \tag{3.4.4}
\]

The most pertinent fact about martingales is their stability under the martingale
transform:

Theorem 3.53 The martingale transform has the following properties:

(i) Let C be a uniformly bounded non-negative previsible process and let X be a
supermartingale. Then C • X is a supermartingale that vanishes at zero.
(ii) Let C be a uniformly bounded previsible process and let X be a martingale.
Then C • X is a martingale that vanishes at zero.
(iii) Both in (i) and (ii) the condition of uniform boundedness of C can be replaced
by boundedness in L 2 of C and X.
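
Part (ii) of Theorem 3.53 can be checked exhaustively for a small example. The sketch below assumes X is a simple symmetric random walk started at 0 and uses a hypothetical capped doubling rule as the previsible stake C; it enumerates all paths of length N and verifies the martingale property of C • X along every history. The names `stake`, `transform` and `is_martingale` are illustrative.

```python
from fractions import Fraction
from itertools import product

N = 6  # time horizon (assumption for the example)


def stake(history):
    """Previsible stake C_k: a function of the increments before time k only.

    Assumed rule: double the stake after a loss (capped at 8), reset to 1 after a win.
    """
    c = 1
    for step in history:
        c = 1 if step == +1 else min(2 * c, 8)
    return c


def transform(increments):
    """Values (C . X)_0, ..., (C . X)_n along one path of walk increments."""
    W = [0]
    for k in range(1, len(increments) + 1):
        C_k = stake(increments[:k - 1])          # uses information up to time k-1 only
        W.append(W[-1] + C_k * increments[k - 1])
    return W


def is_martingale():
    """Check E[(C.X)_n | F_{n-1}] = (C.X)_{n-1} on every history of length n-1."""
    for n in range(1, N + 1):
        for past in product((+1, -1), repeat=n - 1):
            up = transform(past + (+1,))[-1]
            down = transform(past + (-1,))[-1]
            if Fraction(up + down, 2) != transform(past)[-1]:
                return False
    return True


paths = list(product((+1, -1), repeat=N))
weight = Fraction(1, 2 ** N)
expected = [sum(weight * transform(p)[n] for p in paths) for n in range(N + 1)]
```

The list `expected` contains E[(C • X)_n] for n = 0, ..., N; all entries vanish, which is the precise sense in which a bounded previsible betting rule cannot produce a winning game from a martingale.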

3.4.2 Upcrossings and convergence

It is essentially a consequence of Theorem 3.53 that uniformly integrable martin-

gales converge almost surely. This is the content of Doob’s supermartingale conver-
gence theorem:

Theorem 3.54 (Supermartingale convergence) Let X be an L 1 -bounded super-

martingale (i.e., supn∈N0 E[|Xn |] < ∞). Then a.s. X∞ = limn→∞ Xn exists and is
a finite random variable.

The Doob convergence theorem implies that non-negative supermartingales con-

verge a.s. This is because the supermartingale property ensures that E[|Xn |] =
E[Xn ] ≤ E[X0 ], so the uniform boundedness in L 1 is always guaranteed.
The reason behind the (super)martingale convergence theorem is Doob’s upcrossing
lemma, which states that in (super)martingales oscillations are necessarily
linked to growth. Let a < b, and let UN (X, [a, b]) be the number of times X crosses
the interval [a, b] from below up to time N (see Fig. 3.6).

Fig. 3.6 An upcrossing

Lemma 3.55 (Doob’s upcrossing lemma) Let X be a supermartingale. Then

\[
(b - a) \, \mathbb{E}\bigl[U_N(X, [a, b])\bigr] \le \mathbb{E}\bigl[|X_N - a| \, 1_{X_N < a}\bigr]. \tag{3.4.5}
\]

The following is an immediate consequence of Lemma 3.55.

Corollary 3.56 Let X be an L 1 -bounded supermartingale. For [a, b] an interval,

define U∞ (X, [a, b]) = limn→∞ Un (X, [a, b]). Then

\[
(b - a) \, \mathbb{E}\bigl[U_\infty(X, [a, b])\bigr] \le |a| + \sup_{n\in\mathbb{N}_0} \mathbb{E}\bigl[|X_n|\bigr] < \infty. \tag{3.4.6}
\]

In particular, P(U∞ (X, [a, b]) = ∞) = 0.

For proofs of the above results and further details, see Rogers and Williams [208].

3.4.3 Maximum inequalities

In this section we derive some fundamental inequalities for martingales. One of the
most useful ones is the following maximum inequality.

Theorem 3.57 (Doob’s maximum inequality) Let Z be a non-negative submartin-

gale. Then, for c > 0 and n ∈ N0 ,

\[
c \, \mathbb{P}\Bigl(\max_{0 \le k \le n} Z_k \ge c\Bigr) \le \mathbb{E}\bigl[Z_n 1_{\max_{0 \le k \le n} Z_k \ge c}\bigr] \le \mathbb{E}[Z_n]. \tag{3.4.7}
\]

Proof Since this inequality is fundamental for many applications, we will include
its proof. The second inequality in (3.4.7) is trivial. To prove the first inequality,
define the sequence of disjoint events given by F0 = {Z0 ≥ c} and

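\[
F_k = \bigcap_{0 \le \ell < k} \{Z_\ell < c\} \cap \{Z_k \ge c\} = \bigl\{\omega \in \Omega : \min(0 \le \ell \le n : Z_\ell(\omega) \ge c) = k\bigr\}. \tag{3.4.8}
\]

Then

\[
F = \Bigl\{\max_{0 \le k \le n} Z_k \ge c\Bigr\} = \bigcup_{k=0}^{n} F_k. \tag{3.4.9}
\]

Clearly, the event F_k belongs to the σ -algebra ℱ_k. Moreover, on F_k we know that Zk ≥ c. Thus

\[
\mathbb{E}[Z_n 1_{F_k}] \ge \mathbb{E}[Z_k 1_{F_k}] \ge c \, \mathbb{P}(F_k), \tag{3.4.10}
\]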

where the first inequality uses the submartingale property of Z. Hence

\[
\mathbb{E}[Z_n 1_F] = \sum_{k=0}^{n} \mathbb{E}[Z_n 1_{F_k}] \ge c \sum_{k=0}^{n} \mathbb{P}(F_k) = c \, \mathbb{P}(F), \tag{3.4.11}
\]

which implies the claim.
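
The chain of inequalities (3.4.7) can be verified exactly on a small example. The sketch below assumes Z_k = |S_k| with S a simple symmetric random walk started at 0, which is a non-negative submartingale, and enumerates all paths of length n; the function name `doob_terms` is an illustrative choice.

```python
from fractions import Fraction
from itertools import accumulate, product


def doob_terms(n, c):
    """Return the three quantities in (3.4.7) for the assumed example Z_k = |S_k|.

    The result is (c * P(max Z_k >= c), E[Z_n 1_{max Z_k >= c}], E[Z_n]).
    """
    weight = Fraction(1, 2 ** n)          # each path of n fair steps has this probability
    lhs = mid = rhs = Fraction(0)
    for steps in product((+1, -1), repeat=n):
        Z = [0] + [abs(s) for s in accumulate(steps)]   # Z_0, ..., Z_n along the path
        rhs += weight * Z[n]
        if max(Z) >= c:
            lhs += weight * c
            mid += weight * Z[n]
    return lhs, mid, rhs
```

Calling `doob_terms(8, 3)` returns three exact fractions in non-decreasing order, as the theorem predicts; the gap between the first two reflects paths that reach level c and then move further away from the origin.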

If $(M_n)_{n \in \mathbb{N}_0}$ is a martingale and $f$ is a convex function with $f(M_n) \in L^1$ for all $n$,
then $(f(M_n))_{n \in \mathbb{N}_0}$ is a submartingale. This observation allows us to obtain useful
inequalities from the one in Theorem 3.57. In particular, the so-called Kolmogorov
inequality follows by choosing $f(x) = x^2$. Another useful choice is the exponential
function $f(x) = e^x$.
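As a sanity check of Theorem 3.57, the following sketch evaluates all three terms in (3.4.7) exactly for the non-negative submartingale $Z_k = |S_k|$, where $S$ is the simple symmetric random walk started at 0. The choice of example, the function name `doob_maximum_terms` and the brute-force enumeration over all $2^n$ paths are our own illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product


def doob_maximum_terms(n, c):
    """Return (c*P(max_k Z_k >= c), E[Z_n 1_{max >= c}], E[Z_n]) for Z_k = |S_k|.

    S is the simple symmetric random walk with S_0 = 0; all 2**n paths of
    length n are enumerated, so the three numbers are exact.
    """
    weight = Fraction(1, 2 ** n)          # probability of each path
    lhs = mid = rhs = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0             # Z_0 = 0 is included in the maximum
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        z_n = abs(s)
        rhs += weight * z_n
        if running_max >= c:
            lhs += weight * c
            mid += weight * z_n
    return lhs, mid, rhs


if __name__ == "__main__":
    for c in range(1, 6):
        print(c, doob_maximum_terms(10, c))
```

Because exact rational arithmetic is used, the two inequalities in (3.4.7) can be compared without rounding error, at the price of a running time that grows like $2^n$.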
A useful trick when dealing with stochastic processes is to “extract the martin-
gale part”. There are several such decompositions. The following, called the Doob
decomposition, is very important and its continuous-time analogue is fundamental
for the theory of stochastic integration.

Theorem 3.58 (Doob decomposition)

(i) Let $X = (X_n)_{n \in \mathbb{N}_0}$ be an adapted process on a filtered space
$(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_n)_{n \in \mathbb{N}_0})$ with $X_n \in L^1$ for all $n \in \mathbb{N}_0$. Then $X$ can be written in the form

\[ X = X_0 + M + A, \tag{3.4.12} \]

where $M$ is a martingale with $M_0 = 0$ and $A$ is a previsible process with $A_0 = 0$.
(Here, (3.4.12) is to be understood as $X_0 = X_0$ and $X_n = X_0 + M_n + A_n$,
$n \in \mathbb{N}$.) This decomposition is unique modulo indistinguishability, i.e., if
$X = X_0 + M' + A'$ for some other $M'$, $A'$, then

\[ \mathbb{P}\bigl(M_n = M'_n \text{ and } A_n = A'_n \ \forall\, n \in \mathbb{N}_0\bigr) = 1. \tag{3.4.13} \]

(ii) The process $X$ is a submartingale if and only if $A$ is an increasing process, in
the sense that

\[ \mathbb{P}(A_n \le A_{n+1} \ \forall\, n \in \mathbb{N}_0) = 1. \tag{3.4.14} \]

Proof The proof is easy. All we need to do is to derive explicit formulas for $M$ and
$A$. Assume that a decomposition of the claimed form exists. Then

\[ \mathbb{E}[X_n - X_{n-1} | \mathcal{F}_{n-1}] = \mathbb{E}[M_n - M_{n-1} | \mathcal{F}_{n-1}] + \mathbb{E}[A_n - A_{n-1} | \mathcal{F}_{n-1}] = 0 + A_n - A_{n-1} \tag{3.4.15} \]

by the martingale property of $M$ and the previsibility of $A$. Therefore

\[ A_n = \sum_{k=1}^{n} \mathbb{E}[X_k - X_{k-1} | \mathcal{F}_{k-1}] \quad \text{a.s.} \tag{3.4.16} \]

Now simply define $A_n$ by (3.4.16) and $M_n$ by $M_n = X_n - X_0 - A_n$. Then, clearly,
$M$ is a martingale, and $A$ is by construction previsible. This completes the proof
of assertion (i). Assertion (ii) is obvious from (3.4.15).
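The formula (3.4.16) can be evaluated explicitly in simple cases. The sketch below does so along a single path for $X_n = f(S_n)$, where $S$ is a nearest-neighbour random walk that steps up with probability $p$; here the conditional increment depends only on $S_{k-1}$. The example process, the helper name `doob_decomposition` and the sample path are illustrative assumptions of ours.

```python
from fractions import Fraction


def doob_decomposition(path, f, p=Fraction(1, 2)):
    """Split X_n = f(S_n) along one walk path into X_0 + M_n + A_n.

    `path` lists S_0, S_1, ..., and the walk steps +1 with probability p
    and -1 with probability 1 - p. Returns the lists (M, A).
    """
    A = [Fraction(0)]
    M = [Fraction(0)]
    for k in range(1, len(path)):
        prev = path[k - 1]
        # E[X_k - X_{k-1} | F_{k-1}] is a function of S_{k-1} only.
        drift = p * f(prev + 1) + (1 - p) * f(prev - 1) - f(prev)
        A.append(A[-1] + drift)                      # previsible part, (3.4.16)
        M.append(f(path[k]) - f(path[0]) - A[-1])    # martingale part
    return M, A


if __name__ == "__main__":
    path = [0, 1, 2, 1, 0, -1]
    M, A = doob_decomposition(path, lambda x: x * x)
    print("A =", A)   # for f(x) = x^2 and p = 1/2 the drift is 1 per step
    print("M =", M)
```

For $f(x) = x^2$ and $p = 1/2$ this recovers the familiar statement that $S_n^2 - n$ is a martingale.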

An immediate application of the decomposition theorem is a maximum inequality
without positivity assumption.

Lemma 3.59 If $X$ is either a submartingale or a supermartingale, then

\[ c\, \mathbb{P}\Bigl(\sup_{0 \le k \le n} |X_k| \ge 3c\Bigr) \le 4\, \mathbb{E}\bigl[|X_0|\bigr] + 3\, \mathbb{E}\bigl[|X_n|\bigr], \qquad n \in \mathbb{N}_0,\ c > 0. \tag{3.4.17} \]

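Proof We consider the case where $X$ is a submartingale (the case of a supermartingale
follows by passing to $-X$). Then there is a Doob decomposition

\[ X = X_0 + M + A \tag{3.4.18} \]

with $A$ an increasing process. Hence

\[ \sup_{0 \le k \le n} |X_k| \le |X_0| + \sup_{0 \le k \le n} |M_k| + \sup_{0 \le k \le n} |A_k| = |X_0| + \sup_{0 \le k \le n} |M_k| + A_n. \tag{3.4.19} \]

Note that $|M|$ is a non-negative submartingale, so for the supremum of $|M_k|$ we can
use Theorem 3.57. We use the simple observation that if $x + y + z \ge 3c$, then at
least one of $x, y, z$ must be at least $c$. Therefore, by Markov's inequality for the
first and third term,

\[ c\, \mathbb{P}\Bigl(\sup_{0 \le k \le n} |X_k| \ge 3c\Bigr) \le c\, \mathbb{P}\bigl(|X_0| \ge c\bigr) + c\, \mathbb{P}\Bigl(\sup_{0 \le k \le n} |M_k| \ge c\Bigr) + c\, \mathbb{P}(A_n \ge c) \le \mathbb{E}\bigl[|X_0|\bigr] + \mathbb{E}\bigl[|M_n|\bigr] + \mathbb{E}[A_n]. \tag{3.4.20} \]

We have

\[ \mathbb{E}\bigl[|M_n|\bigr] = \mathbb{E}\bigl[|X_n - X_0 - A_n|\bigr] \le \mathbb{E}\bigl[|X_n|\bigr] + \mathbb{E}\bigl[|X_0|\bigr] + \mathbb{E}[A_n] \tag{3.4.21} \]

and

\[ \mathbb{E}[A_n] = \mathbb{E}[X_n - X_0 - M_n] = \mathbb{E}[X_n - X_0] \le \mathbb{E}\bigl[|X_n|\bigr] + \mathbb{E}\bigl[|X_0|\bigr]. \tag{3.4.22} \]

Inserting these two bounds into (3.4.20), we get the claim.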

The Doob decomposition gives rise to two important processes associated with
a martingale $M$, namely, the bracket process $\langle M \rangle$ and the quadratic variation
process $[M]$.

Definition 3.60 Let $M$ be a martingale in $L^2$ with $M_0 = 0$. Then $M^2$ is a
submartingale with Doob decomposition

\[ M^2 = N + \langle M \rangle, \tag{3.4.23} \]

where $N$ is a martingale that vanishes at zero and $\langle M \rangle$ is a previsible process that
vanishes at zero.

Note that boundedness in $L^1$ of $\langle M \rangle$ is equivalent to boundedness in $L^2$ of $M$.

From the formulas associated with the Doob decomposition we deduce that

\[ \langle M \rangle_n - \langle M \rangle_{n-1} = \mathbb{E}\bigl[M_n^2 - M_{n-1}^2 \,\big|\, \mathcal{F}_{n-1}\bigr] = \mathbb{E}\bigl[(M_n - M_{n-1})^2 \,\big|\, \mathcal{F}_{n-1}\bigr]. \tag{3.4.24} \]

Definition 3.61 For $M$ as before, define

\[ [M]_n = \sum_{k=1}^{n} (M_k - M_{k-1})^2. \tag{3.4.25} \]

Lemma 3.62 If $M$ is as before, then

\[ M^2 - [M] = V = (C \bullet M), \tag{3.4.26} \]

where $V$ is a martingale, and $C_n = 2M_{n-1}$. If $M$ is bounded in $L^2$, then $V$ is
bounded in $L^1$.
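The identity (3.4.26) holds path by path, since $M_k^2 - M_{k-1}^2 = 2M_{k-1}(M_k - M_{k-1}) + (M_k - M_{k-1})^2$. The following sketch checks this on an arbitrary sequence of numbers with $M_0 = 0$; the function names and the sample values are hypothetical choices made for illustration.

```python
def quadratic_variation(M):
    """Return the list [M]_0, [M]_1, ... defined in (3.4.25)."""
    qv = [0]
    for k in range(1, len(M)):
        qv.append(qv[-1] + (M[k] - M[k - 1]) ** 2)
    return qv


def martingale_transform(C, M):
    """Return (C . M)_n = sum_{k=1}^n C_k (M_k - M_{k-1}) for n = 0, 1, ..."""
    out = [0]
    for k in range(1, len(M)):
        out.append(out[-1] + C[k] * (M[k] - M[k - 1]))
    return out


if __name__ == "__main__":
    M = [0, 1, -1, 2, 2, 5]                                # any path with M_0 = 0
    C = [0] + [2 * M[k - 1] for k in range(1, len(M))]     # C_n = 2 M_{n-1}
    qv = quadratic_variation(M)
    V = martingale_transform(C, M)
    for n in range(len(M)):
        print(n, M[n] ** 2 - qv[n], V[n])                  # the two columns agree
```

Note that the check is purely algebraic: the martingale property of $M$ is only needed to conclude that $V$ is a martingale, not for the identity itself.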

3.4.4 Stopping times and stopped martingales

Our analysis of metastability in Part III relies crucially on the analysis of the first
times when a Markov process hits certain sets. These first hitting times are special
cases of so-called stopping times: a time whose occurrence can be determined based
on the outcome of the process until that time alone.

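Definition 3.63 A map $T : \Omega \to \mathbb{N}_0 \cup \{\infty\}$ is called a stopping time (with respect
to a filtration $(\mathcal{F}_n)_{n \in \mathbb{N}_0}$) if

\[ \{T = n\} \in \mathcal{F}_n \quad \forall\, n \in \mathbb{N}_0 \cup \{\infty\}. \tag{3.4.27} \]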

The most important examples of stopping times are hitting times. Let $X$ be an
adapted process, and let $B \in \mathcal{B}$. Define

\[ \tau_B = \inf\{n \in \mathbb{N} : X_n \in B\}, \tag{3.4.28} \]

i.e., the first hitting time of $B$ when the location of $X_0$ is not counted (see Fig. 3.7).
Then $\tau_B$ is a stopping time. To see this, note that if $n \in \mathbb{N}$, then

\[ \{\tau_B = n\} = \bigl\{\omega \in \Omega : X_n(\omega) \in B,\ X_k(\omega) \notin B \ \forall\, 1 \le k < n\bigr\}. \tag{3.4.29} \]

Fig. 3.7 First hitting locations in B (indicated by ∗) when X0 (indicated by •) is not in B, respec-
tively, is in B

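This event is manifestly in $\mathcal{F}_n$. The event $\{\tau_B = \infty\}$ coincides with
$\{X_n \notin B \ \forall\, n \in \mathbb{N}\}$, which lies in $\mathcal{F}_\infty$.
In principle all stopping times can be realised as first hitting times of some
process. To do so, simply define

\[ I_{[T,\infty)}(n, \omega) = \begin{cases} 1, & \text{if } n \ge T(\omega), \\ 0, & \text{otherwise.} \end{cases} \tag{3.4.30} \]

This process is adapted, and $T = \tau_{\{1\}}$.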
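The definition (3.4.28) translates directly into a scan along an observed path. The sketch below is a minimal illustration under our own conventions: the function name `first_hitting_time` is hypothetical, and since only finitely many steps of a path can be observed, the function returns `None` when $B$ is not hit within the observed horizon (which does not decide whether $\tau_B = \infty$).

```python
def first_hitting_time(path, B):
    """Return tau_B = inf{n >= 1 : X_n in B} for the observed path X_0, X_1, ...

    The position at time 0 is deliberately not counted. Returns None if B
    is not hit within the observed horizon.
    """
    for n in range(1, len(path)):
        if path[n] in B:
            return n
    return None


if __name__ == "__main__":
    path = [0, 1, 2, 0, 1]
    print(first_hitting_time(path, {0}))   # the start in B at time 0 is ignored
    print(first_hitting_time(path, {1}))
    print(first_hitting_time(path, {5}))
```

The loop only inspects $X_1, \ldots, X_n$ before deciding whether $\tau_B = n$, which is exactly the measurability statement in (3.4.29).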

It is sometimes convenient to have the notion of a σ -algebra of events that take
place before a stopping time.

Definition 3.64 The pre-T -σ -algebra FT is the set of events F ⊂ Ω such that

F ∩ {T ≤ n} ∈ Fn ∀ n ∈ N0 ∪ {∞}. (3.4.31)

Pre-T -σ -algebras will play an important rôle in the formulation of the strong
Markov property. There are some useful elementary facts associated with this con-
cept.

Lemma 3.65 Let S, T be stopping times.

(i) If X is an adapted process, then XT is FT -measurable.
(ii) If S < T , then FS ⊂ FT .
(iii) FT ∧S = FT ∩ FS .
(iv) If F ∈ FS∨T , then F ∩ {S ≤ T } ∈ FT .
(v) FS∨T = σ (FT , FS ).

The interplay of stopping times and martingale properties is of fundamental
importance in potential theory, to be described in Chap. 7. We next discuss this in some
detail via an example taken from finance. Consider a supermartingale $X$. We want to
play a strategy that depends on a stopping time $T$, say, we keep one “unit of stock”
until the random time $T$:

\[ C_n = C_n^T = 1_{\{n \le T\}}. \tag{3.4.32} \]

Note that $C^T = (C_n^T)_{n \in \mathbb{N}_0}$ is a previsible process, namely,

\[ \{C_n^T = 0\} = \{T \le n - 1\} \in \mathcal{F}_{n-1}, \tag{3.4.33} \]

and, since $C_n^T$ only takes the values 0 and 1, this suffices to show that
$C_n^T$ is $\mathcal{F}_{n-1}$-measurable. The “wealth process” associated with this strategy is
$C^T \bullet X = ((C^T \bullet X)_n)_{n \in \mathbb{N}_0}$ with

\[ (C^T \bullet X)_n = X_{T \wedge n} - X_0. \tag{3.4.34} \]

If we define the stopped process $X^T = (X_n^T)_{n \in \mathbb{N}_0}$ via

\[ X_n^T(\omega) = X_{T(\omega) \wedge n}(\omega), \tag{3.4.35} \]

then we have alternatively

\[ C^T \bullet X = X^T - X_0. \tag{3.4.36} \]
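The identity (3.4.36) can be checked on a single path by summing the gains $C_k^T (X_k - X_{k-1})$ and comparing with the stopped path. The sketch below does this for an arbitrary sequence of prices and a fixed value of $T$; the function names and the numbers are illustrative assumptions.

```python
def wealth_process(X, T):
    """Return (C^T . X)_n for n = 0, 1, ..., with C_n = 1 if n <= T else 0."""
    wealth = [0]
    for n in range(1, len(X)):
        holding = 1 if n <= T else 0          # one unit of stock until time T
        wealth.append(wealth[-1] + holding * (X[n] - X[n - 1]))
    return wealth


def stopped_path(X, T):
    """Return the stopped path X_{T ^ n} for n = 0, 1, ..."""
    return [X[min(T, n)] for n in range(len(X))]


if __name__ == "__main__":
    X = [3, 4, 2, 5, 1, 6]     # an arbitrary price path
    T = 3                      # value of the stopping time on this path
    print(wealth_process(X, T))
    print([x - X[0] for x in stopped_path(X, T)])   # same list, cf. (3.4.36)
```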
Since $C^T$ is non-negative and bounded, Theorem 3.53 leads us to the following statement.

Theorem 3.66
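(i) If $X$ is a supermartingale and $T$ is a stopping time, then the stopped process
$X^T = (X_{T \wedge n})_{n \in \mathbb{N}_0}$ is a supermartingale. In particular,

\[ \mathbb{E}[X_{T \wedge n}] \le \mathbb{E}[X_0] \quad \forall\, n \in \mathbb{N}_0. \tag{3.4.37} \]

(ii) If $X$ is a martingale and $T$ is a stopping time, then $X^T$ is a martingale. In
particular,

\[ \mathbb{E}[X_{T \wedge n}] = \mathbb{E}[X_0] \quad \forall\, n \in \mathbb{N}_0. \tag{3.4.38} \]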

Note that Theorem 3.66 does not assert that E[XT ] ≤ E[X0 ]. The following the-
orem gives conditions under which this inequality holds.

Theorem 3.67 (Doob’s optional stopping theorem)

(i) Let $T$ be a stopping time, and let $X$ be a supermartingale. Then $X_T$ is integrable
and

\[ \mathbb{E}[X_T] \le \mathbb{E}[X_0], \tag{3.4.39} \]

provided one of the following conditions holds:
(a) $T$ is bounded (i.e., there exists an $N \in \mathbb{N}$ such that $T(\omega) \le N$ for all
$\omega \in \Omega$).
(b) $X$ is bounded and $T$ is a.s. finite.
(c) $\mathbb{E}[T] < \infty$ and, for some $K < \infty$,

\[ |X_n(\omega) - X_{n-1}(\omega)| \le K \quad \forall\, n \in \mathbb{N},\ \omega \in \Omega. \tag{3.4.40} \]

(ii) If $X$ is a martingale, then $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ in any of the situations above.

Proof We already know that $\mathbb{E}[X_{n \wedge T}] - \mathbb{E}[X_0] \le 0$ for all $n \in \mathbb{N}$. In case (a), we
know that $T \wedge N = T$, and so $\mathbb{E}[X_T] = \mathbb{E}[X_{T \wedge N}] \le \mathbb{E}[X_0]$, as claimed. In case (b),
we start from $\mathbb{E}[X_{n \wedge T}] - \mathbb{E}[X_0] \le 0$ and let $n \to \infty$. Since $T$ is almost surely finite,
we have $\lim_{n \to \infty} X_{n \wedge T} = X_T$ a.s., and since $X$ is uniformly bounded, we get

\[ \lim_{n \to \infty} \mathbb{E}[X_{T \wedge n}] = \mathbb{E}\Bigl[\lim_{n \to \infty} X_{T \wedge n}\Bigr] = \mathbb{E}[X_T], \tag{3.4.41} \]

which implies the result. In case (c), we observe that

\[ |X_{T \wedge n} - X_0| = \Bigl|\sum_{k=1}^{T \wedge n} (X_k - X_{k-1})\Bigr| \le KT, \tag{3.4.42} \]

and $\mathbb{E}[KT] < \infty$ by assumption. Thus, we can again take the limit $n \to \infty$ and use
Lebesgue’s dominated convergence theorem to justify that the inequality survives.
Finally, to justify (ii), use that if $X$ is a martingale, then both $X$ and $-X$ are
supermartingales. The ensuing two inequalities imply the desired equality.

Theorem 3.67 may look strange, since it seems to contradict the “no winning
strategy” principle. Indeed, take the simple random walk $(S_n)_{n \in \mathbb{N}_0}$ starting from $S_0 = 0$ and
define the stopping time $T = \inf\{n : S_n = 10\}$. Since the walk is recurrent, $T$ is a.s.
finite, and so, clearly, $\mathbb{E}[S_T] = 10 \ne \mathbb{E}[S_0] = 0$. So, using (c) we must conclude that
$\mathbb{E}[T] = \infty$. In fact, the “sure” gain when we
achieve our goal is offset by the fact that on average it takes infinitely long to reach
this goal (of course, most games will end quickly, but chances are that some may
take very long).
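The random walk example can be made concrete by computing the exact law of the stopped walk $S_{T \wedge n}$. The sketch below propagates this law step by step with rational arithmetic; the function names are hypothetical and the horizon of 200 steps is an arbitrary choice. It illustrates (3.4.38): the mean of $S_{T \wedge n}$ is exactly 0 for every finite $n$, although a substantial part of the mass already sits at the level 10.

```python
from fractions import Fraction


def stopped_walk_law(n_steps, absorbing):
    """Return the exact law of S_{T ^ n} as a dict {state: probability}.

    S is the simple symmetric random walk with S_0 = 0 and T is the first
    time the walk is in the set `absorbing`.
    """
    law = {0: Fraction(1)}
    for _ in range(n_steps):
        new_law = {}
        for x, prob in law.items():
            if x in absorbing:                       # stopped mass stays put
                new_law[x] = new_law.get(x, 0) + prob
            else:                                    # otherwise step +-1
                for y in (x - 1, x + 1):
                    new_law[y] = new_law.get(y, 0) + prob / 2
        law = new_law
    return law


def mean(law):
    return sum(x * prob for x, prob in law.items())


if __name__ == "__main__":
    law = stopped_walk_law(200, {10})
    print("E[S_{T^n}] =", mean(law))                 # exactly 0
    print("P(T <= n)  =", float(law[10]))            # strictly between 0 and 1
```

The missing mass $\mathbb{P}(T > n)$ is spread over increasingly negative values of the walk, which is what keeps the mean at 0 and what prevents an interchange of limit and expectation.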
Case (c) in Theorem 3.67 is the situation we hope to have the most often. The
following lemma states that E[T ] < ∞ whenever the probability of the event leading
to T is eventually sufficiently large.

Lemma 3.68 Suppose that T is a stopping time and that there exist N ∈ N and
ε > 0 such that
P(T ≤ n + N | Fn ) > ε a.s. ∀ n ∈ N0 . (3.4.43)
Then E[T ] < ∞.

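Proof For $k \in \mathbb{N}$ we can write, by iteration,

\[ \begin{aligned} \mathbb{P}(T > kN) &= \mathbb{E}\bigl[1_{\{T > (k-1)N\}} 1_{\{T > kN\}}\bigr] \\ &= \mathbb{E}\bigl[\mathbb{E}[1_{\{T > (k-1)N\}} 1_{\{T > kN\}} | \mathcal{F}_{(k-1)N}]\bigr] \\ &= \mathbb{E}\bigl[1_{\{T > (k-1)N\}} \mathbb{E}[1_{\{T > kN\}} | \mathcal{F}_{(k-1)N}]\bigr] \\ &\le (1 - \varepsilon)\, \mathbb{E}\bigl[1_{\{T > (k-1)N\}}\bigr] \\ &\le (1 - \varepsilon)^k. \end{aligned} \tag{3.4.44} \]

Since $\mathbb{E}[T] = \sum_{m \in \mathbb{N}_0} \mathbb{P}(T > m) \le N \sum_{k \in \mathbb{N}_0} \mathbb{P}(T > kN) \le N/\varepsilon$, the exponential
decay of this probability implies the claim.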


Finally, we state Doob’s supermartingale inequality for non-negative super-

martingales.

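Theorem 3.69 (Doob’s supermartingale inequality) Let $X$ be a non-negative
supermartingale and $T$ a stopping time. Then

\[ \mathbb{E}[X_T] \le \mathbb{E}[X_0]. \tag{3.4.45} \]

Moreover, for any $c > 0$,

\[ c\, \mathbb{P}\Bigl(\sup_{k \in \mathbb{N}_0} X_k > c\Bigr) \le \mathbb{E}[X_0]. \tag{3.4.46} \]

Proof We know that $\mathbb{E}[X_{T \wedge n}] \le \mathbb{E}[X_0]$. Using Fatou’s lemma, we may pass to
the limit $n \to \infty$, which gives (3.4.45). For (3.4.46), set $T = \inf\{n \in \mathbb{N}_0 : X_n > c\}$.
Clearly, $X_T \ge c$ if $\sup_{k \in \mathbb{N}_0} X_k > c$, and $X_T \ge 0$ otherwise. Thus,
$\mathbb{E}[X_T] \ge c\, \mathbb{P}(\sup_{k \in \mathbb{N}_0} X_k > c)$, and (3.4.46) follows from (3.4.45).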

3.5 Martingales in continuous time

In principle, most of the results that hold for martingales in discrete time carry over
to continuous time. There are, however, a number of subtleties that need to be taken
care of.

3.5.1 Càdlàg functions

The first question we need to settle is the choice of function space in which the
processes live. Often this is the set of continuous functions, but in general this set is
too restrictive. It turns out that a good choice is the set of so-called càdlàg functions.

Definition 3.70 A function f : R+ → R is called a càdlàg function (“continue à

droite, limites à gauche”) if
(i) For every t ∈ R+ , f (t) = lims↓t f (s).
(ii) For every t > 0, f (t−) = lims↑t f (s) exists.

It will be important to be able to extend functions defined on countable sets to

càdlàg functions. Abbreviate Q+ = Q ∩ R+ .

Definition 3.71 A function $y : \mathbb{Q}_+ \to \mathbb{R}$ is called regularisable if

(i) For every $t \in \mathbb{R}_+$, $\lim_{q \downarrow t} y(q)$ exists and is finite.
(ii) For every $t > 0$, $y(t-) = \lim_{q \uparrow t} y(q)$ exists and is finite.

Without going into further details, we state the fact that regularisability is a mea-
surable property.

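Lemma 3.72 Let $(Y_q)_{q \in \mathbb{Q}_+}$ be a stochastic process defined on $(\Omega, \mathcal{F}, \mathbb{P})$, and let

\[ G = \bigl\{\omega \in \Omega : q \mapsto Y_q(\omega) \text{ is regularisable}\bigr\}. \tag{3.5.1} \]

Then $G \in \mathcal{F}$.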

Next, we observe that from a regularisable function we can readily obtain a

càdlàg function by taking limits from the right.

Theorem 3.73 Let $y : \mathbb{Q}_+ \to \mathbb{R}$ be a regularisable function. Define, for $t \in \mathbb{R}_+$,

\[ f(t) = \lim_{q \downarrow t} y(q). \tag{3.5.2} \]

Then $f$ is càdlàg.

3.5.2 Filtrations, supermartingales and càdlàg processes

We begin with a probability space (Ω, G , P). We define a continuous-time filtration

(Gt )t∈R+ , similarly as in the discrete-time setting.

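Definition 3.74 A filtration $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$ of $(\Omega, \mathcal{G}, \mathbb{P})$ is an increasing family of
sub-$\sigma$-algebras such that, for $0 \le s < t$,

\[ \mathcal{G}_s \subset \mathcal{G}_t \subset \mathcal{G}_\infty = \sigma\Bigl(\bigcup_{r \in \mathbb{R}_+} \mathcal{G}_r\Bigr) \subset \mathcal{G}. \tag{3.5.3} \]

A quadruple $(\Omega, \mathcal{G}, \mathbb{P}, (\mathcal{G}_t)_{t \in \mathbb{R}_+})$ is called a filtered space.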

Definition 3.75 A stochastic process (Xt )t∈R+ is called adapted to the filtration
(Gt )t∈R+ if Xt is Gt -measurable for every t ∈ R+ .

Definition 3.76 A stochastic process X on a filtered space is called a martingale if

the following hold:
(i) The process X is adapted to the filtration (Gt )t∈R+ .
(ii) For all t ∈ R+ , E[|Xt |] < ∞.
(iii) For all 0 ≤ s ≤ t,
E[Xt |Gs ] = Xs a.s. (3.5.4)
Sub- and supermartingales are defined in the same way, with = in (3.5.4) replaced
by ≥, respectively, ≤.

So far almost nothing has changed with respect to the discrete-time setting.
Note, in particular, that if we take a monotone sequence of times (tn )n∈N0 , then
(Yn )n∈N0 = (Xtn )n∈N0 is a discrete-time (sub/super)martingale whenever (Xt )t∈R+
is a continuous-time (sub/super)martingale.
The next lemma is important because it connects martingale properties to càdlàg
properties.

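Lemma 3.77 Let $Y$ be a supermartingale on a filtered space $(\Omega, \mathcal{G}, \mathbb{P}, (\mathcal{G}_t)_{t \in \mathbb{R}_+})$.
Let $t \in \mathbb{R}_+$, and let $q(-n)$, $n \in \mathbb{N}$, be such that $q(-n) \downarrow t$ as $n \to \infty$. Then

\[ \lim_{q(-n) \downarrow t} Y_{q(-n)} \tag{3.5.5} \]

exists a.s. and in $L^1$.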

Proof This is an application of the Lévy-Doob downward theorem (see Rogers and
Williams [208, Chaps. II.51 and II.63]).

Spaces of càdlàg functions are the natural setting for stochastic processes.

Definition 3.78 A stochastic process is called a càdlàg process if all its sample paths
are càdlàg functions; càdlàg processes that are (sub/super)martingales are called
càdlàg (sub/super)martingales.

Note that we do not just require that almost all sample paths are càdlàg.

3.5.3 The Doob regularity theorem

We will now show that the setting of càdlàg functions is suitable for the theory of
martingales.

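Theorem 3.79 (Doob’s regularity theorem) Let $Y = (Y_t)_{t \in \mathbb{R}_+}$ be a supermartingale
defined on a filtered space $(\Omega, \mathcal{G}, \mathbb{P}, (\mathcal{G}_t)_{t \in \mathbb{R}_+})$. Define the set

\[ G = \bigl\{\omega \in \Omega : \text{the map } \mathbb{Q}_+ \ni q \mapsto Y_q(\omega) \in \mathbb{R} \text{ is regularisable}\bigr\}. \tag{3.5.6} \]

Then $G \in \mathcal{G}$ and $\mathbb{P}(G) = 1$. The process $X$ defined by

\[ X_t(\omega) = \begin{cases} \lim_{q \downarrow t} Y_q(\omega), & \text{if } \omega \in G, \\ 0, & \text{otherwise,} \end{cases} \tag{3.5.7} \]

is a càdlàg process.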

One might hope that Theorem 3.79 settles all problems related to continuous-time
martingales. Simply start with any supermartingale and pass to the càdlàg
regularisation. However, a problem of measurability arises. This can be seen in the
trivial example of a process with a single jump. Let $Y_t$ be defined for $\omega \in \Omega$ as

\[ Y_t(\omega) = \begin{cases} 0, & \text{if } t \le 1, \\ q(\omega), & \text{if } t > 1, \end{cases} \tag{3.5.8} \]

where $q$ is a random variable with $\mathbb{E}[q] = 0$ that is not almost surely zero. Let
$(\mathcal{G}_t)_{t \in \mathbb{R}_+}$ be the natural filtration associated with this process.
Clearly, $\mathcal{G}_t = \{\emptyset, \Omega\}$ for $t \le 1$ and $Y$ is a martingale with respect to this filtration.
The càdlàg version of this process is

\[ X_t(\omega) = \begin{cases} 0, & \text{if } t < 1, \\ q(\omega), & \text{if } t \ge 1. \end{cases} \tag{3.5.9} \]

Now, $X = (X_t)_{t \in \mathbb{R}_+}$ is not adapted to the filtration $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$, since $X_1$ is not
measurable with respect to $\mathcal{G}_1$. This problem cannot be remedied by a simple modification
on sets of measure zero because $\mathbb{P}(X_1 = Y_1) < 1$. In particular, $X$ is not a martingale
with respect to the filtration $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$, because

\[ \mathbb{E}[X_{1+\varepsilon} | \mathcal{G}_1] = 0 \ne X_1 \quad \forall\, \varepsilon > 0. \tag{3.5.10} \]

We thus see that the right-continuous regularisation of Y at the point of the jump
anticipates information from the future. If we want to develop a theory on càdlàg
processes, then we must take this into account and introduce a richer filtration that
contains this information.

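Definition 3.80 Let $(\Omega, \mathcal{G}, \mathbb{P}, (\mathcal{G}_t)_{t \in \mathbb{R}_+})$ be a filtered space. Define, for $t \in \mathbb{R}_+$,

\[ \mathcal{G}_{t+} = \bigcap_{s > t} \mathcal{G}_s = \bigcap_{\mathbb{Q} \ni q > t} \mathcal{G}_q, \tag{3.5.11} \]

and let

\[ \mathcal{N}(\mathcal{G}_\infty) = \bigl\{G \in \mathcal{G}_\infty : \mathbb{P}(G) \in \{0, 1\}\bigr\}. \tag{3.5.12} \]

Then the partial augmentation $(\mathcal{H}_t)_{t \in \mathbb{R}_+}$ of the filtration $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$ is defined as

\[ \mathcal{H}_t = \sigma\bigl(\mathcal{G}_{t+}, \mathcal{N}(\mathcal{G}_\infty)\bigr). \tag{3.5.13} \]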

The following lemma, which is obvious from the construction of càdlàg versions,
justifies this definition.

Lemma 3.81 If Y is a supermartingale with respect to the filtration (Gt )t∈R+ and
X is its càdlàg version defined in Theorem 3.79, then X is adapted to the partially
augmented filtration (Ht )t∈R+ .

A natural question is whether in this setting X is a supermartingale. The next

theorem answers this question in the affirmative and is to be seen as the completion
of Theorem 3.79.

Theorem 3.82 With the assumptions and notations of Lemma 3.81, $X$ is a
supermartingale with respect to the filtration $(\mathcal{H}_t)_{t \in \mathbb{R}_+}$. Moreover, $X$ is a modification of
$Y$ if and only if $Y$ is right-continuous, in the sense that

\[ \lim_{s \downarrow t} \mathbb{E}\bigl[|Y_t - Y_s|\bigr] = 0 \quad \forall\, t \in \mathbb{R}_+. \tag{3.5.14} \]

Henceforth we will work on filtered spaces that are already partially augmented,
i.e., our standard setting (called “the usual setting” in Rogers and Williams [208]) is
as follows.

Definition 3.83 A filtered càdlàg space is a quadruple (Ω, F , P, (Ft )t∈R+ ), where
(Ω, F , P) is a probability space and (Ft )t∈R+ is a filtration that satisfies the fol-
lowing properties:
(i) F is P-complete (i.e., contains all sets of P-measure zero).
(ii) F0 contains all sets of P-measure 0.
(iii) Ft = Ft+ , i.e., t → Ft is right-continuous.
If (Ω, G , P, (Gt )t∈R+ ) is a filtered space, then the minimal enlargement of this space
satisfying conditions (i), (ii) and (iii) is called the right-continuous regularisation of
this space.

Theorem 3.84 The process X constructed in Theorem 3.79 is a supermartingale

with respect to the filtration (Ft )t∈R+ .

We finally give a version of Doob’s regularity theorem for processes defined on

càdlàg spaces.

Theorem 3.85 Let (Ω, F , P, (Ft )t∈R+ ) be a filtered càdlàg space. Let Y be an
adapted supermartingale. Then Y has a càdlàg modification Z if and only if the
map t → E[Yt ] is right-continuous, in which case Z is a càdlàg supermartingale.

3.5.4 Convergence theorems and martingale inequalities

Key results on discrete-time martingale theory were Doob’s forward and backward
convergence theorems and the maximum inequalities. We will now consider the
corresponding results in continuous time.

Theorem 3.86 (Supermartingale convergence) Let $X$ be a càdlàg supermartingale
with respect to a filtered space $(\Omega, \mathcal{G}, \mathbb{P}, (\mathcal{G}_t)_{t \in \mathbb{R}_+})$. Assume that
$\sup_{t \in \mathbb{R}_+} \mathbb{E}[|X_t|] < \infty$. Then

\[ \lim_{t \to \infty} X_t = X_\infty \tag{3.5.15} \]

exists almost surely in $\mathbb{R}$.

In a similar way the maximum inequalities for càdlàg submartingales can be

inferred from their discrete-time counterparts.

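Theorem 3.87 (Doob’s maximum inequality) Let $Z$ be a non-negative càdlàg
submartingale on a filtered space. Then, for any $c > 0$ and $t \in \mathbb{R}_+$,

\[ \mathbb{P}\Bigl(\sup_{0 \le s \le t} Z_s \ge c\Bigr) \le c^{-1}\, \mathbb{E}\bigl[Z_t 1_{\{\sup_{0 \le s \le t} Z_s \ge c\}}\bigr] \le c^{-1}\, \mathbb{E}[Z_t]. \tag{3.5.16} \]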

3.5.5 Stopping times

The notions around stopping times introduced in this section are important in the
theory of Markov processes. We need to be careful in the continuous-time setting,
even though we closely follow the discrete-time setting.
We consider a filtered space (Ω, G , P, (Gt )t∈R+ ).

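Definition 3.88 A map $T : \Omega \to [0, \infty]$ is called a $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$-stopping time if

\[ \{T \le t\} = \{\omega \in \Omega : T(\omega) \le t\} \in \mathcal{G}_t \quad \forall\, 0 \le t \le \infty. \tag{3.5.17} \]

If $T$ is a stopping time, then the pre-$T$-$\sigma$-algebra $\mathcal{G}_T$ is the set of all $\Lambda \in \mathcal{G}$ such
that

\[ \Lambda \cap \{T \le t\} \in \mathcal{G}_t \quad \forall\, 0 \le t \le \infty. \tag{3.5.18} \]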

With this definition we have the usual properties of pre-T -σ -algebras:

Lemma 3.89 Let S, T be stopping times.

(i) If S ≤ T , then GS ⊂ GT .
(ii) GT ∧S = GT ∩ GS .
(iii) If F ∈ GS∨T , then F ∩ {S ≤ T } ∈ GT .
(iv) GS∨T = σ (GT , GS ).

It will be useful to also talk about stopping times with respect to the filtration
(Gt+ )t∈R+ .

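Definition 3.90 A map $T : \Omega \to [0, \infty]$ is called a $(\mathcal{G}_{t+})_{t \in \mathbb{R}_+}$-stopping time if

\[ \{T < t\} = \{\omega \in \Omega : T(\omega) < t\} \in \mathcal{G}_t \quad \forall\, 0 \le t \le \infty. \tag{3.5.19} \]

If $T$ is a $(\mathcal{G}_{t+})_{t \in \mathbb{R}_+}$-stopping time, then the pre-$T$-$\sigma$-algebra $\mathcal{G}_{T+}$ is the set of all
$\Lambda \in \mathcal{G}$ such that

\[ \Lambda \cap \{T < t\} \in \mathcal{G}_t \quad \forall\, 0 \le t \le \infty. \tag{3.5.20} \]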

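Lemma 3.91 Let $(S_n)_{n \in \mathbb{N}}$ be a sequence of $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$-stopping times.
(i) If $S_n \uparrow S$, then $S$ is a $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$-stopping time.
(ii) If $S_n \downarrow S$, then $S$ is a $(\mathcal{G}_{t+})_{t \in \mathbb{R}_+}$-stopping time and $\mathcal{G}_{S+} = \bigcap_{n \in \mathbb{N}} \mathcal{G}_{S_n+}$.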

Definition 3.92 A process (Xt )t∈R+ is called (Gt )t∈R+ -progressive if for every
t ∈ R+ the restriction of the map (s, ω) → Xs (ω) to [0, t] × Ω is B([0, t]) × Gt -
measurable.

The notion of a progressive process is stronger than that of an adapted process. Its
importance arises from the fact that progressive processes evaluated at a stopping
time $T$ are measurable with respect to the corresponding pre-$T$-$\sigma$-algebra. The nice
fact is that in the càdlàg setting everything works well.

Lemma 3.93 An adapted càdlàg process in a metrisable space (S, B(S)) is pro-
gressive.

Lemma 3.94 If X is progressive with respect to the filtration (Gt )t∈R+ and T is a
(Gt )t∈R+ -stopping time, then XT is GT -measurable.

3.5.6 First hitting time and first entrance time

In the case of discrete-time Markov processes we have seen that hitting times of
certain sets provide particularly important examples of stopping times. We will now
extend this discussion to the continuous-time setting. It is important to distinguish
between the notions of hitting time and entrance time. These differ in the way the
position of the process at time 0 is treated.

Definition 3.95 Let $X$ be a stochastic process with values in a measurable space
$(E, \mathcal{E})$. Let $\Gamma \in \mathcal{E}$. We call

\[ \Delta_\Gamma(\omega) = \inf\{t \in \mathbb{R}_+ : X_t(\omega) \in \Gamma\} \tag{3.5.21} \]

the first entrance time of the set $\Gamma$, and

\[ \tau_\Gamma(\omega) = \inf\{t \in \mathbb{R}_+ \setminus \{0\} : X_t(\omega) \in \Gamma\} \tag{3.5.22} \]

the first hitting time of the set $\Gamma$. In both cases the infimum is understood to be $\infty$
if the process never enters $\Gamma$.

Recall that in the discrete-time setting we only worked with τΓ , which is in fact
the more important notion. Here is an example of a stopping time.

Lemma 3.96 Let E be a metric space and let F be a closed set. Let X be a contin-
uous adapted process. Then ΔF is a (Gt )t∈R+ -stopping time and τF is a (Gt+ )t∈R+ -
stopping time.

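Proof Let $\rho$ denote the metric on $E$. Then the map $x \mapsto \rho(x, F)$ is continuous,
and hence the map $\omega \mapsto \rho(X_q(\omega), F)$ is $\mathcal{G}_q$-measurable for $q \in \mathbb{Q}_+$. Since the paths
$t \mapsto X_t(\omega)$ are continuous and $F$ is closed, we have $\Delta_F(\omega) \le t$ if and only if

\[ \inf_{q \in \mathbb{Q} \cap [0, t]} \rho\bigl(X_q(\omega), F\bigr) = 0, \tag{3.5.23} \]

and so $\{\Delta_F \le t\} \in \mathcal{G}_t$, i.e., $\Delta_F$ is a $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$-stopping time. For $\tau_F$ the situation is
slightly different at time zero. Indeed, let $\Delta_F^r = \inf\{t \ge r : X_t \in F\}$, $r > 0$. Obviously, from
the previous result we have that $\Delta_F^r$ is a $(\mathcal{G}_t)_{t \in \mathbb{R}_+}$-stopping time. On the other hand,
$\tau_F > 0$ if and only if there exists a $\delta > 0$ such that $\Delta_F^r > \delta$ for all $\mathbb{Q} \ni r > 0$. But,
clearly, the event

\[ A_\delta = \bigcap_{\mathbb{Q} \ni r > 0} \{\Delta_F^r > \delta\} \tag{3.5.24} \]

is $\mathcal{G}_\delta$-measurable, and so the event

\[ \{\tau_F = 0\} = \{\tau_F > 0\}^c = \bigcap_{\delta > 0} A_\delta^c \tag{3.5.25} \]

is $\mathcal{G}_{0+}$-measurable, and so $\tau_F$ is a $(\mathcal{G}_{t+})_{t \in \mathbb{R}_+}$-stopping time.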

To see where the difference between ΔF and τF comes from, consider the pro-
cess starting at the boundary of F . Then ΔF = 0, while τF may or may not be
zero: it could be that the process immediately leaves F and only returns after some
positive time t, in which case τF > 0, or it may stay for a while in F , in which case
τF = 0. To distinguish between the two cases, we must look a little bit into the future
(recall Fig. 3.7).

3.5.7 Optional stopping and optional sampling

We have seen in the theory of discrete-time Markov processes that martingale prop-
erties of processes stopped at stopping times are important. We need similar results
for càdlàg processes. We again work on a filtered càdlàg space (Ω, F , P, (Ft )t∈R+ )
on which all the processes will be defined and adapted. The key result is the follow-
ing optional sampling theorem.

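Theorem 3.97 (Optional sampling theorem) Let $X$ be a càdlàg submartingale and
let $T$, $S$ be $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$-stopping times. Then, for each $M < \infty$,

\[ \mathbb{E}\bigl[X(T \wedge M) \,\big|\, \mathcal{F}_S\bigr] \ge X(S \wedge T \wedge M) \quad \text{a.s.} \tag{3.5.26} \]

If, in addition,
(i) $T$ is finite a.s.,
(ii) $\mathbb{E}[|X(T)|] < \infty$,
(iii) $\lim_{M \to \infty} \mathbb{E}[X(M) 1_{\{T > M\}}] = 0$,
then

\[ \mathbb{E}\bigl[X(T) \,\big|\, \mathcal{F}_S\bigr] \ge X(S \wedge T) \quad \text{a.s.} \tag{3.5.27} \]

Equality holds for martingales.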

A special case of Theorem 3.97 implies the following corollary.

Corollary 3.98 Let X be a càdlàg (sub/super)martingale, and let T be a stopping

time. Then X T = (XT ∧t )t∈R+ is a (sub/super)martingale.

In the case of uniformly integrable supermartingales, we get the Doob optional

sampling theorem:

Theorem 3.99 Let $X$ be a uniformly integrable or a non-negative càdlàg
supermartingale. Let $S$ and $T$ be stopping times with $S \le T$. Then $X_T \in L^1$ and

\[ \mathbb{E}[X_\infty | \mathcal{F}_T] \le X_T \quad \text{a.s.} \tag{3.5.28} \]

and

\[ \mathbb{E}[X_T | \mathcal{F}_S] \le X_S \quad \text{a.s.,} \tag{3.5.29} \]

with equality when $X$ is a uniformly integrable martingale.

An important concept connecting martingales and stopping times is that of a local

martingale.

Definition 3.100 A stochastic process M is called a local martingale if there ex-

ists a sequence of stopping times (τn )n∈N , with τn ≤ τn+1 and limn→∞ τn = ∞,
such that the processes M τn = (Mt∧τn )t∈R+ are martingales. The same terminology
applies to sub- and super-martingales, as well as to various integrability properties.

3.6 Bibliographical notes

1. Standard textbooks on probability theory are Feller [108, 109], Billingsley [28],
Chow and Teicher [58], Bauer [13], Kallenberg [146]. In these books the proofs can
be found that were omitted in this chapter.

2. The presentation in Sects. 3.4–3.5 largely follows the book of Rogers and
Williams [208].
Chapter 4
Markov Processes in Discrete Time

I think that method of government ought to answer well. You

see, the Kings would be sure to make Laws contradicting each
other: so the Subject could never be punished, because,
whatever he did he’d be obeying some Law.
(Lewis Carroll, Sylvie and Bruno Concluded)

Markov processes are the basic class of stochastic processes that we will use to
model metastable systems. Similarly to what we saw in Chap. 3, there is a substantial
difference in the mathematical difficulties involved in dealing with discrete time
and continuous time. In this chapter we give an outline of the theory of discrete-
time Markov processes (also called Markov chains). In Chap. 5 we will deal with
continuous-time Markov processes.
Section 4.1 gives the main definitions and lists some key facts. Section 4.2 looks
at the link between Markov processes and martingales. Section 4.3 lists a few prop-
erties that are specific to the setting where the state space is countable.

4.1 Markov processes: main definitions and key facts

4.1.1 Definition and elementary properties

Markov processes X = (Xt )t∈I with I = N0 or I = R+ are stochastic analogues of

dynamical systems. As such they must satisfy two basic properties: (1) they must be
causal, i.e., we want to be able to write down an expression for the law of Xt given
the σ -algebra Ft− = σ (Xs , 0 ≤ s < t); (2) they must be forgetful of the past, i.e.,
given the value of Xs at some time 0 ≤ s < t, the law of Xt is independent of the
values of Xu at all times 0 ≤ u < s (see Fig. 4.1).
The basic definition is in fact independent of the nature of the time parameter.

Definition 4.1 A stochastic process with state space $S$ and index set $I = \mathbb{N}_0$ or
$I = \mathbb{R}_+$ is called a Markov process if the following holds. For any $t \ge s \ge 0$, there
exists a probability kernel $P_{s,t} : S \times \mathcal{B} \to [0, 1]$, satisfying:

(i) For any $x \in S$, $P_{s,t}(x, \cdot)$ is a probability measure on $(S, \mathcal{B})$.
(ii) For any $A \in \mathcal{B}$, $P_{s,t}(\cdot, A)$ is a measurable function on $S$, such that, for any
$t \ge s \ge 0$,

\[ \mathbb{P}(X_t \in A | \mathcal{F}_s)(\omega) = P_{s,t}\bigl(X_s(\omega), A\bigr). \tag{4.1.1} \]

Fig. 4.1 Illustration of the Markov property: the future depends on the present not the past

© Springer International Publishing Switzerland 2015
A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_4

In the case of discrete time, i.e., for index set $I = \mathbb{N}_0$, the compatibility conditions
impose severe restrictions on the kernels $P_{s,t}$ that allow us to consider only the
one-step transition kernel $P_{t-1} = P_{t-1,t}$. Indeed, a stochastic process $X$ with state
space $S$ and index set $\mathbb{N}_0$ is a discrete-time Markov process with one-step transition
kernels $(P_t)_{t \in \mathbb{N}}$ if, for all $A \in \mathcal{B}$ and $t \in \mathbb{N}$,

\[ \mathbb{P}(X_t \in A | \mathcal{F}_{t-1})(\omega) = P_{t-1}\bigl(X_{t-1}(\omega), A\bigr), \quad \mathbb{P}\text{-a.s.} \tag{4.1.2} \]

This requirement fixes the law $\mathbb{P}$ up to one more probability measure on $(S, \mathcal{B})$,
namely, the initial distribution $p_0$.

Theorem 4.2 Let (S, B) be a Polish space, let P be a transition kernel on N0 ×

N0 × S × B, and let p0 be a probability measure on (S, B). Then there exists a
unique stochastic process satisfying (4.1.2) such that P(X0 ∈ A) = p0 (A) for all A.

Proof In view of the Daniell-Kolmogorov extension theorem (Theorem 3.38), we

have to show that our requirements fix all finite-dimensional distributions, and that
these satisfy the compatibility conditions. This is essentially a problem of notation.
We will need to be able to derive formulas for

P(Xtn ∈ An , . . . , Xt1 ∈ A1 ) (4.1.3)

for 0 ≤ t1 < t2 < · · · < tn , A1 , . . . , An ∈ B and n ∈ N. To get started, we consider

P(Xt ∈ A|Fs ), 0 ≤ s < t, (4.1.4)

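and use that, by the elementary properties of conditional expectations,

\[ \begin{aligned} \mathbb{P}(X_t \in A | \mathcal{F}_s) &= \mathbb{E}\bigl[\mathbb{P}(X_t \in A | \mathcal{F}_{t-1}) \,\big|\, \mathcal{F}_s\bigr] \\ &= \mathbb{E}\bigl[P_{t-1}\bigl(X_{t-1}(\omega), A\bigr) \,\big|\, \mathcal{F}_s\bigr] \\ &= \mathbb{E}\Bigl[\mathbb{E}\Bigl[\int_S P_{t-1}(x_{t-1}, A)\, P_{t-2}\bigl(X_{t-2}(\omega), dx_{t-1}\bigr) \,\Big|\, \mathcal{F}_{t-2}\Bigr] \,\Big|\, \mathcal{F}_s\Bigr] \\ &= \int_S P_{t-1}(x_{t-1}, A) \int_S P_{t-2}(x_{t-2}, dx_{t-1}) \cdots \int_S P_{s+1}(x_{s+1}, dx_{s+2})\, P_s\bigl(X_s(\omega), dx_{s+1}\bigr), \end{aligned} \tag{4.1.5} \]

where we refrain from writing “a.s.”, which applies to all equations relating to
conditional expectations. We thus have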

\[ P_{s,t}(x, A) = \int_S P_{t-1}(x_{t-1}, A) \int_S P_{t-2}(x_{t-2}, dx_{t-1}) \cdots \int_S P_{s+1}(x_{s+1}, dx_{s+2})\, P_s(x, dx_{s+1}). \tag{4.1.6} \]

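With this object, we can proceed to more complicated expressions:

\[ \begin{aligned} &\mathbb{P}(X_{t_n} \in A_n, \ldots, X_{t_1} \in A_1) \\ &\quad = \mathbb{E}\bigl[\mathbb{P}(X_{t_n} \in A_n | \mathcal{F}_{t_{n-1}})\, 1_{A_{n-1}}(X_{t_{n-1}}) \cdots 1_{A_1}(X_{t_1})\bigr] \\ &\quad = \mathbb{E}\bigl[\mathbb{E}\bigl[P_{t_{n-1},t_n}(X_{t_{n-1}}, A_n) \,\big|\, \mathcal{F}_{t_{n-1}}\bigr]\, 1_{A_{n-1}}(X_{t_{n-1}}) \cdots 1_{A_1}(X_{t_1})\bigr] \\ &\quad = \mathbb{E}\Bigl[\mathbb{E}\Bigl[\int_{A_{n-1}} P_{t_{n-1},t_n}(x_{n-1}, A_n)\, P_{t_{n-2},t_{n-1}}\bigl(X_{t_{n-2}}(\omega), dx_{n-1}\bigr) \,\Big|\, \mathcal{F}_{t_{n-2}}\Bigr] \\ &\qquad\qquad \times 1_{A_{n-2}}(X_{t_{n-2}}) \cdots 1_{A_1}(X_{t_1})\Bigr] \\ &\quad = \int_{A_{n-1}} P_{t_{n-1},t_n}(x_{n-1}, A_n) \int_{A_{n-2}} P_{t_{n-2},t_{n-1}}(x_{n-2}, dx_{n-1}) \cdots \int_{A_1} P_{t_1,t_2}(x_1, dx_2) \int_S P_{0,t_1}(x_0, dx_1)\, p_0(dx_0). \end{aligned} \tag{4.1.7} \]

Thus, we get the desired expression for the marginal distributions in terms of the
transition kernel $P$ and the initial distribution $p_0$. The compatibility relations follow
from the following obvious, but important, property of the transition kernels.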

Lemma 4.3 The transition kernels $P_{s,t}$ satisfy the Chapman-Kolmogorov equations:

\[ P_{s,t}(x, A) = \int_S P_{r,t}(y, A)\, P_{s,r}(x, dy), \qquad t > r > s. \tag{4.1.8} \]

Proof This is obvious from the definition.

The proof of the compatibility relations is now also obvious; if some of the $A_i$,
$1 \le i \le n$, are equal to $S$, then we can use (4.1.8) and recover the expressions for
the lower-dimensional marginals.
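On a finite state space the kernels are stochastic matrices, (4.1.6) becomes the matrix product $P_{s,t} = P_s P_{s+1} \cdots P_{t-1}$, and the Chapman-Kolmogorov equations (4.1.8) reduce to associativity of matrix multiplication. The following sketch verifies this for a time-inhomogeneous chain on two states; the three one-step matrices and all function names are arbitrary choices made for illustration.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transition_kernel(one_step, s, t):
    """Return P_{s,t} = P_s P_{s+1} ... P_{t-1} (the identity when s == t)."""
    size = len(one_step[0])
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for u in range(s, t):
        result = mat_mul(result, one_step[u])
    return result


# Hypothetical one-step kernels P_0, P_1, P_2 on the state space {0, 1}.
ONE_STEP = [
    [[F(1, 2), F(1, 2)], [F(1, 3), F(2, 3)]],
    [[F(1, 4), F(3, 4)], [F(1), F(0)]],
    [[F(0), F(1)], [F(1, 5), F(4, 5)]],
]

if __name__ == "__main__":
    p_03 = transition_kernel(ONE_STEP, 0, 3)
    via_r = mat_mul(transition_kernel(ONE_STEP, 0, 1),
                    transition_kernel(ONE_STEP, 1, 3))
    print(p_03 == via_r)    # Chapman-Kolmogorov with s = 0, r = 1, t = 3
```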

4.1.2 Markov processes with stationary transition probabilities

In general, we call a stochastic process whose index set supports the action of a
group (or semigroup) stationary with respect to the action of this group (or
semigroup) if all finite-dimensional distributions are invariant under the simultaneous
shift of all time indices. Specifically, if our index set $I$ is $\mathbb{R}_+$ or $\mathbb{Z}$ or $\mathbb{N}_0$, then a
stochastic process is stationary if, for all $\ell \in \mathbb{N}$, $s_1, \ldots, s_\ell \in I$, $A_1, \ldots, A_\ell \in \mathcal{B}$ and
$t \in I$,

\[ \mathbb{P}(X_{s_1} \in A_1, \ldots, X_{s_\ell} \in A_\ell) = \mathbb{P}(X_{s_1 + t} \in A_1, \ldots, X_{s_\ell + t} \in A_\ell). \tag{4.1.9} \]

We can express this property also as follows. For $t \in I$, define the process $X \circ \theta_t$ by
$(X \circ \theta_t)_s = X_{t+s}$. Then $X$ is stationary if and only if, for all $t \in I$, the processes $X$
and $X \circ \theta_t$ have the same finite-dimensional distributions.
In the case of Markov processes, a necessary (but not sufficient) condition for
stationarity is the stationarity of the transition kernels.

Definition 4.4 A Markov process with discrete time I = N0 and state space S is
said to have stationary transition probabilities if its one-step transition kernel Pt is
independent of t, i.e., there exists a probability kernel P (x, A) such that

Pt (x, A) = P (x, A) (4.1.10)

for all t ∈ N0 , x ∈ S and A ∈ B.

With the notation Ps,t for the transition kernel from time s to time t, we can
alternatively state that a Markov process has stationary transition probabilities if
there exists a family of transition kernels Pt (x, A) such that

Ps,t (x, A) = Pt−s (x, A) (4.1.11)

for all s, t ∈ N0 with 0 ≤ s < t, x ∈ S and A ∈ B. Note that the one-step kernel Pt
at time t in Definition 4.4 and the t-step kernel Pt in (4.1.11) are different objects
and should not be confused.
A key concept for Markov processes with stationary transition probabilities is
that of an invariant distribution and invariant measure.

Definition 4.5 Let $P$ be the transition kernel of a Markov process with stationary
transition probabilities. Then a probability measure $\pi$ on $(S,\mathcal{B})$ is called an invariant
distribution if

\[
\int_S \pi(dx)\,P(x,A) = \pi(A) \tag{4.1.12}
\]

for all $A \in \mathcal{B}$. More generally, a positive and $\sigma$-finite measure $\pi$ satisfying (4.1.12)
is called an invariant measure.

Lemma 4.6 A Markov process with stationary transition probabilities and initial
distribution p0 = π is a stationary stochastic process if and only if π is an invariant
distribution.
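
As an illustration of Definition 4.5 and Lemma 4.6, the sketch below checks invariance for a two-state chain and confirms that the law of the pair $(X_t, X_{t+1})$ does not depend on $t$ when the chain is started in $\pi$. The jump probabilities `a` and `b` are hypothetical values chosen for the example.

```python
from fractions import Fraction as F

a, b = F(1, 3), F(1, 6)            # hypothetical jump probabilities
P = [[1 - a, a], [b, 1 - b]]       # transition matrix on S = {0, 1}
pi = [b / (a + b), a / (a + b)]    # candidate invariant distribution

def push(mu, P):
    """Return the measure mu P, i.e. (mu P)(j) = sum_i mu(i) P(i, j)."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(P))]

is_invariant = push(pi, P) == pi

def pair_law(t):
    """Joint law of (X_t, X_{t+1}) when X_0 is distributed according to pi."""
    mu = pi
    for _ in range(t):
        mu = push(mu, P)
    return [[mu[i] * P[i][j] for j in range(2)] for i in range(2)]
```

Comparing `pair_law(0)` with `pair_law(t)` for several `t` checks only two-dimensional marginals; the full statement of Lemma 4.6 concerns all finite-dimensional distributions.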

Fig. 4.2 Illustration of the strong Markov property: the future depends on the present not the past,
even when the present occurs at a random stopping time T . Recall Fig. 4.1

There always is at least one invariant measure. When S is finite, this invariant
measure can be chosen to be a probability measure. However, when S is infinite, it
may not be possible to do so.

4.1.3 The strong Markov property

The setting of Markov processes is highly suitable for the application of the notion
of stopping times introduced in Chap. 3. In fact, one of the important properties of
Markov processes is that we can split the past and the future also at random times
(see Fig. 4.2).

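Theorem 4.7 Let $X$ be a Markov process with stationary transition probabilities.
Then $X$ satisfies the strong Markov property: Let $(\mathcal{F}_n)_{n\in\mathbb{N}_0}$ be a filtration to which $X$
is adapted, and let $T$ be a stopping time. Let $F$ and $G$ be $\mathcal{F}$-measurable functions,
and in addition let $F$ be measurable with respect to the pre-$T$-$\sigma$-algebra $\mathcal{F}_T$. Then

\[
\mathbb{E}\big[1_{T<\infty}\,F\,G\circ\theta_T \,\big|\, \mathcal{F}_0\big] = \mathbb{E}\big[1_{T<\infty}\,F\,\mathbb{E}'[G \mid \mathcal{F}'_0](X_T) \,\big|\, \mathcal{F}_0\big], \tag{4.1.13}
\]

where $\mathbb{E}'$ and $\mathcal{F}'_0$ refer to an independent copy $X'$ of the Markov chain $X$.

Proof We have

\[
\mathbb{E}\big[1_{T<\infty}\,F\,G\circ\theta_T \,\big|\, \mathcal{F}_0\big]
= \mathbb{E}\big[\mathbb{E}[1_{T<\infty}\,F\,G\circ\theta_T \mid \mathcal{F}_T] \,\big|\, \mathcal{F}_0\big]
= \mathbb{E}\big[1_{T<\infty}\,F\,\mathbb{E}[G\circ\theta_T \mid \mathcal{F}_T] \,\big|\, \mathcal{F}_0\big]. \tag{4.1.14}
\]

But $\mathbb{E}[G\circ\theta_T \mid \mathcal{F}_T]$ depends only on $X_T$. Moreover, by stationarity,
$\mathbb{E}[G\circ\theta_T \mid \mathcal{F}_T] = \mathbb{E}'[G \mid \mathcal{F}'_0](X_T)$, and so the claim of the theorem follows.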

4.2 Markov processes and martingales

In this section we show how Markov transition kernels can be seen as operators
acting on spaces of measures, respectively, spaces of functions.

4.2.1 Semigroups

For $\mu$ a $\sigma$-finite measure on $S$ and $P$ a Markov transition kernel, we define the
measure $\mu P$ as

\[
(\mu P)(A) = \int_S P(x,A)\,d\mu(x), \tag{4.2.1}
\]

and similarly for the $t$-step transition kernel

\[
(\mu P_t)(A) = \int_S P_t(x,A)\,d\mu(x). \tag{4.2.2}
\]

By the Markov property, we have

\[
(\mu P_t)(A) = (\mu P^t)(A). \tag{4.2.3}
\]

The action on measures has the following natural interpretation in terms of the
process: if $\mathbb{P}(X_0 \in A) = \mu(A)$, then

\[
\mathbb{P}(X_t \in A) = (\mu P_t)(A). \tag{4.2.4}
\]

Alternatively, for $f$ a bounded and measurable function on $S$, we define

\[
(Pf)(x) = \int_S f(y)\,P(x,dy) \tag{4.2.5}
\]

and

\[
(P_t f)(x) = \int_S f(y)\,P_t(x,dy), \tag{4.2.6}
\]

where again

\[
P_t f = P^t f. \tag{4.2.7}
\]

We say that $(P_t)_{t\in\mathbb{N}_0}$ is a semigroup acting on the space of measures, respectively,
on the space of bounded measurable functions. The interpretation of the action on
functions is as follows.

Lemma 4.8 Let $(P_t)_{t\in\mathbb{N}_0}$ be a Markov semigroup acting on bounded measurable
functions $f$. Then

\[
(P_t f)(x) = \mathbb{E}\big[f(X_t) \mid \mathcal{F}_0\big](x) = \mathbb{E}_x\big[f(X_t)\big]. \tag{4.2.8}
\]

Proof We need to prove (4.2.8) only for $t = 1$. But, by definition,

\[
\mathbb{E}_x\big[f(X_1)\big] = \int_S f(y)\,\mathbb{P}(X_1 \in dy \mid \mathcal{F}_0)(x) = \int_S f(y)\,P(x,dy), \tag{4.2.9}
\]

which proves the claim.
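
For a finite state space the two actions of the semigroup reduce to right and left multiplication by the transition matrix. The sketch below, with a hypothetical three-state matrix `P`, computes $E[f(X_t)]$ for $X_0 \sim \mu$ once as $\mu(P^t f)$ and once as $(\mu P^t)f$; the two must agree.

```python
from fractions import Fraction as F

# Hypothetical transition matrix on S = {0, 1, 2}.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]

def act_on_function(P, f):
    """(P f)(x) = sum_y P(x, y) f(y), cf. (4.2.5)."""
    return [sum(P[x][y] * f[y] for y in range(len(f))) for x in range(len(P))]

def act_on_measure(mu, P):
    """(mu P)(y) = sum_x mu(x) P(x, y), cf. (4.2.1)."""
    return [sum(mu[x] * P[x][y] for x in range(len(mu))) for y in range(len(P))]

def expect_after(mu, f, t):
    """E[f(X_t)] for X_0 ~ mu, computed as mu(P^t f)."""
    for _ in range(t):
        f = act_on_function(P, f)
    return sum(m * v for m, v in zip(mu, f))

def expect_after_dual(mu, f, t):
    """The same expectation computed as (mu P^t) f."""
    for _ in range(t):
        mu = act_on_measure(mu, P)
    return sum(m * v for m, v in zip(mu, f))
```

Taking `mu` to be a point mass at `x` recovers $(P_t f)(x) = E_x[f(X_t)]$ from (4.2.8).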


4.2.2 The martingale problem

Note that, by telescopic expansion, we have the elementary formula

\[
P_t f - f = \sum_{s=0}^{t-1} P_s (P - 1) f = \sum_{s=0}^{t-1} P_s (Lf), \qquad t \in \mathbb{N}_0, \tag{4.2.10}
\]

where we call $L = P - 1$ the (discrete-time) generator of our Markov process (this
formula will turn out to have a complete analogue in the continuous-time setting).
An interesting consequence is the following observation, referred to as the martingale
problem in discrete time.

Lemma 4.9 Let $L$ be the generator of a Markov process $X$, and let $f$ be a bounded
measurable function. Then

\[
M_t = f(X_t) - f(X_0) - \sum_{s=0}^{t-1} (Lf)(X_s), \qquad t \in \mathbb{N}_0, \tag{4.2.11}
\]

is a martingale.

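Proof Let $t, r \in \mathbb{N}_0$. Then, by the Markov property and Lemma 4.8,

\[
\begin{aligned}
\mathbb{E}[M_{t+r} \mid \mathcal{F}_t]
&= f(X_t) - f(X_0) - \sum_{s=0}^{t-1} (Lf)(X_s) \\
&\quad + (P_r f)(X_t) - f(X_t) - \sum_{s=0}^{r-1} (P_s Lf)(X_t) \\
&= M_t + 0,
\end{aligned} \tag{4.2.12}
\]

by (4.2.10), which proves the lemma.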
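
The cancellation used in the last step is exactly the telescoping identity (4.2.10). A minimal sketch that checks it in exact arithmetic for a hypothetical two-state kernel `P` and test function `f` (both chosen only for illustration):

```python
from fractions import Fraction as F

P = [[F(1, 2), F(1, 2)], [F(1, 5), F(4, 5)]]   # hypothetical kernel
f = [F(3), F(-1)]                              # hypothetical test function

def apply(P, g):
    """(P g)(x) = sum_y P(x, y) g(y)."""
    return [sum(P[x][y] * g[y] for y in range(len(g))) for x in range(len(P))]

def generator(g):
    """(L g)(x) = (P g)(x) - g(x), with L = P - 1."""
    return [pg - gx for pg, gx in zip(apply(P, g), g)]

def power_apply(g, s):
    """P^s g."""
    for _ in range(s):
        g = apply(P, g)
    return g

def telescoping_gap(t):
    """P^t f - f - sum_{s<t} P^s (L f), which (4.2.10) says is zero."""
    lhs = [u - v for u, v in zip(power_apply(f, t), f)]
    rhs = [F(0)] * len(f)
    Lf = generator(f)
    for s in range(t):
        rhs = [r + v for r, v in zip(rhs, power_apply(Lf, s))]
    return [l - r for l, r in zip(lhs, rhs)]
```

A vanishing gap for every `t` and every starting point is what makes the conditional increment of $M$ equal to zero.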

Note that (4.2.11) is the Doob decomposition of the process $(f(X_t))_{t\in\mathbb{N}_0}$ because
$\big(\sum_{s=0}^{t-1} (Lf)(X_s)\big)_{t\in\mathbb{N}_0}$ is a previsible process. Check that this fact follows directly
from (3.4.16).

What is important about the latter observation is that it gives rise to a characterisa-
tion of the generator that will turn out to be extremely useful in the continuous-time
setting. Namely, we can ask whether the requirement that (Mt )t∈N0 be a martingale
given a family of pairs (f, Lf ) fully characterises a Markov process.

Theorem 4.10 (Martingale problem) Let X be a discrete-time stochastic process

on a filtered space such that X is adapted. Then X is a Markov process with tran-
sition kernel P = 1 + L if and only if for all bounded measurable functions f the
expression on the right-hand side of (4.2.11) is a martingale.

Proof Lemma 4.9 already provides the “only if” part, so it remains to show the “if”
part. First, if we assume that $X$ is a Markov process, then setting $r = 1$ and $t = 0$
in (4.2.12) and taking conditional expectations given $\mathcal{F}_0$, we see from Lemma 4.8
that $\mathbb{E}[f(X_1) \mid \mathcal{F}_0] = f(X_0) + (Lf)(X_0)$, which implies that the transition kernel must
be $1 + L$.
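It remains to show that $X$ is indeed a Markov process. For this we want to show
that

\[
\mathbb{E}\big[f(X_{t+s}) \mid \mathcal{F}_t\big] = \big((1+L)^s f\big)(X_t) = (P^s f)(X_t), \tag{4.2.13}
\]

from the martingale problem formulation. To see this, we just use the calculation in
(4.2.12) to see that

\[
\begin{aligned}
\mathbb{E}\big[f(X_{t+r}) \mid \mathcal{F}_t\big]
&= \mathbb{E}[M_{t+r} \mid \mathcal{F}_t] + f(X_0) + \sum_{s=0}^{t-1} (Lf)(X_s) + \mathbb{E}\Big[\sum_{s=t}^{t+r-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] \\
&= M_t + f(X_0) + \sum_{s=0}^{t-1} (Lf)(X_s) + \mathbb{E}\Big[\sum_{s=t}^{t+r-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] \\
&= f(X_t) + \sum_{s=0}^{r-1} \mathbb{E}\big[(Lf)(X_{t+s}) \mid \mathcal{F}_t\big].
\end{aligned} \tag{4.2.14}
\]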

For $r = 1$,

\[
\mathbb{E}\big[f(X_{t+1}) \mid \mathcal{F}_t\big] = f(X_t) + (Lf)(X_t) = \big((1+L)f\big)(X_t) = (Pf)(X_t), \tag{4.2.15}
\]

which is (4.2.13) for $s = 1$. Now proceed by induction: assume that (4.2.13) holds
for all bounded measurable functions for $s \le r - 1$. We must show that it then also
holds for $s = r$. To do this, we apply the induction hypothesis (4.2.13) to the function
$Lf$ in each term of the last sum in (4.2.14),

\[
\sum_{s=0}^{r-1} \mathbb{E}\big[(Lf)(X_{t+s}) \mid \mathcal{F}_t\big] = \sum_{s=0}^{r-1} \big(P^s (Lf)\big)(X_t) = (P^r f)(X_t) - f(X_t), \tag{4.2.16}
\]

where we undid the telescoping sum in (4.2.10). Insertion into (4.2.14) yields
(4.2.13) for $s = r$. Hence (4.2.13) holds for all $s$, by induction.

The full strength of Theorem 4.10 will become apparent in the continuous-time
setting, for which it remains valid. A crucial point is that it will not even be necessary
to consider all bounded measurable functions: a sufficiently rich class will do. This
allows us to formulate martingale problems even when we cannot write down the
generator in an explicit form.

4.2.3 Harmonic functions and martingales

We have seen that measures μ satisfying μL = 0 are of special importance in the

theory of Markov processes. Also of central importance are functions f that satisfy
Lf = 0. In this section we will assume that the transition kernels of our Markov
processes have bounded support, i.e., there is a K < ∞ such that |Xt+1 − Xt | ≤
K < ∞ for all t ∈ N0 , a.s.

Definition 4.11 Let $L$ be the generator of a Markov process. A measurable function
$f$ satisfying

\[
(Lf)(x) = 0 \qquad \forall x \in S \tag{4.2.17}
\]

is called a harmonic function. A function is called subharmonic or superharmonic
if $Lf \ge 0$, respectively, $Lf \le 0$.

Theorem 4.12 Let X be a Markov process with generator L. Then a non-negative

function f is:
(i) harmonic when (f (Xt ))t∈N0 is a martingale;
(ii) subharmonic when (f (Xt ))t∈N0 is a submartingale;
(iii) superharmonic when (f (Xt ))t∈N0 is a supermartingale.

Proof Simply use Lemma 4.9.

Theorem 4.12 establishes a profound relationship between potential theory and

martingales. This link will be developed further in Chap. 5.
A nice application of Theorem 4.12 is the maximum principle.

Theorem 4.13 (Maximum principle) Let $X$ be a Markov process and let $D$ be a
bounded open domain such that $\mathbb{E}[\tau_{D^c}] < \infty$. Assume that $f$ is a non-negative
subharmonic function on $D$. Then

\[
\sup_{x \in D} f(x) \le \sup_{x \in D^c} f(x). \tag{4.2.18}
\]

Proof Define $T = \tau_{D^c}$. Then

\[
\mathbb{E}\big[f(X_T) \mid \mathcal{F}_0\big](x) \ge f(x). \tag{4.2.19}
\]

Since $X_T \in D^c$, it must be true that

\[
\sup_{y \in D^c} f(y) \ge \mathbb{E}\big[f(X_T) \mid \mathcal{F}_0\big](x) \ge f(x) \qquad \forall x \in D, \tag{4.2.20}
\]

which proves the claim. In (4.2.19) we again used the Doob optional stopping
theorem (Theorem 3.67(i, b)).

Theorem 4.13 can be phrased as saying that (sub)harmonic functions take their
maximum on the boundary (since the set D c in (4.2.18) can be replaced by the subset
∂D ⊂ D c such that Px (XT ∈ ∂D) = 1 for all x ∈ D). The above proof is an example
of how intrinsically analytic results can be proven with the help of probabilistic
arguments. The next section will further develop this theme.
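
A small sanity check of (4.2.18), assuming simple random walk on the integers with $D = \{1,\dots,N-1\}$ and boundary $\{0, N\}$ (a hypothetical example, not from the text). The function $f(x) = x^2$ is subharmonic for this walk, since $(Lf)(x) = 1$.

```python
N = 10                                  # hypothetical interval {0, ..., N}
D = range(1, N)                         # interior; the walk exits D at 0 or N

def generator(f, x):
    """(L f)(x) for simple random walk: average of the neighbours minus f(x)."""
    return 0.5 * (f(x + 1) + f(x - 1)) - f(x)

def f(x):
    return x * x                        # subharmonic: (L f)(x) = 1 on D

subharmonic = all(generator(f, x) >= 0 for x in D)
interior_sup = max(f(x) for x in D)     # sup of f over D
boundary_sup = max(f(0), f(N))          # sup of f over the exit set
```

Here the supremum over the exit set is attained at $N$, in line with the remark that $D^c$ can be replaced by the part of the boundary that the process actually hits.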

4.2.4 The Doob transform

Let us consider a discrete-time Markov process X with generator P − 1. We may

want to consider modifications of the process obtained by conditioning on certain
events to occur. One type of conditioning is to reach some specific set in a particular
location. For instance, consider a random walk on a finite interval conditioned to
exit the interval on a specific side. When and how can we do this, and what is the
nature of the resulting process? In particular, is the resulting process again a Markov
process and, if so, what is its generator?
As an example, let us condition a Markov process to hit a domain $B$ for the first
time in a subset $A \subset B$. We may assume that $\mathbb{E}[\tau_B] < \infty$. Define $h(x) = \mathbb{P}_x[\tau_A = \tau_B]$,
$x \notin B$, and note that $h$ is harmonic. Let $\mathbb{P}$ be the law of $X$. Let us define a new
law $\mathbb{P}^h$ on the space of paths as follows: if $Y$ is a $\mathcal{F}_t$-measurable random variable,
then

\[
\mathbb{E}^h[Y \mid \mathcal{F}_0] = \frac{1}{h(X_0)}\,\mathbb{E}\big[h(X_t)\,Y \mid \mathcal{F}_0\big]. \tag{4.2.21}
\]

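Lemma 4.14 With the notation above, if $Y$ is a $\mathcal{F}_{\tau_B - 1}$-measurable function, then

\[
\mathbb{E}^h_x[Y] = \mathbb{E}_x[Y \mid \tau_A = \tau_B]. \tag{4.2.22}
\]

Proof This is an application of the strong Markov property. We have

\[
\begin{aligned}
\mathbb{E}^h_x[Y]
&= \frac{1}{h(x)}\,\mathbb{E}_x\big[Y\,h(X_{\tau_B - 1})\big]
 = \frac{1}{h(x)}\,\mathbb{E}_x\big[Y\,\mathbb{E}[1_{\tau_A = \tau_B} \mid \mathcal{F}_0](X_{\tau_B - 1})\big] \\
&= \frac{1}{h(x)}\,\mathbb{E}_x\big[Y\,\mathbb{E}[1_{\tau_A = \tau_B} \mid \mathcal{F}_{\tau_B - 1}]\big]
 = \frac{1}{\mathbb{P}_x(\tau_A = \tau_B)}\,\mathbb{E}_x\big[Y\,1_{\tau_A = \tau_B}\big] \\
&= \mathbb{E}_x[Y \mid \tau_A = \tau_B].
\end{aligned} \tag{4.2.23}
\]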

Here, the first equality is just the definition of $h$ and reproduces the form of the
right-hand side of the strong Markov property, the second equality is the strong
Markov property (recall Theorem 4.7), while the fourth equality uses the fact that the
event $\{\tau_A = \tau_B\}$ depends only on what happens after $\tau_B - 1$, and so
$1_{\tau_A = \tau_B} \circ \theta_{\tau_B - 1} = 1_{\tau_A = \tau_B}$.

Let us next look at the transformed law Ph in the general case. The first property
to check is whether Ph is defined in a consistent way. Some thought shows that it
suffices to prove the following lemma.

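Lemma 4.15 Let $Y$ be $\mathcal{F}_s$-measurable. Then, for any $t \ge s \ge 0$,

\[
\mathbb{E}^h[Y \mid \mathcal{F}_0] = \frac{1}{h(X_0)}\,\mathbb{E}\big[h(X_s)\,Y \mid \mathcal{F}_0\big] = \frac{1}{h(X_0)}\,\mathbb{E}\big[h(X_t)\,Y \mid \mathcal{F}_0\big]. \tag{4.2.24}
\]

In particular, $\mathbb{P}^h[\Omega \mid \mathcal{F}_0] = 1$.

Proof Just introduce the conditional expectation

\[
\mathbb{E}\big[h(X_t)\,Y \mid \mathcal{F}_0\big] = \mathbb{E}\big[\mathbb{E}[h(X_t)\,Y \mid \mathcal{F}_s] \,\big|\, \mathcal{F}_0\big] = \mathbb{E}\big[Y\,\mathbb{E}[h(X_t) \mid \mathcal{F}_s] \,\big|\, \mathcal{F}_0\big], \tag{4.2.25}
\]

and use that $(h(X_t))_{t\in\mathbb{N}_0}$ is a martingale by Theorem 4.12, to get

\[
\mathbb{E}\big[h(X_t)\,Y \mid \mathcal{F}_0\big] = \mathbb{E}\big[h(X_s)\,Y \mid \mathcal{F}_0\big], \tag{4.2.26}
\]

from which the claim follows.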

Lemma 4.15 shows, in particular, why it is important that h be a harmonic func-

tion.
Next, we ask the question whether the law $\mathbb{P}^h$ is a Markov chain. To this end we
turn to the martingale problem. We will show that there exists a generator $L^h$ such
that

\[
M^h_t = f(X_t) - f(X_0) - \sum_{s=0}^{t-1} (L^h f)(X_s), \qquad t \in \mathbb{N}_0, \tag{4.2.27}
\]

is a martingale under the law $\mathbb{P}^h$, i.e., we show that

\[
\mathbb{E}^h\big[M^h_t \mid \mathcal{F}_r\big] = M^h_r, \qquad t > r \ge 0. \tag{4.2.28}
\]

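First note that, by (4.2.21),

\[
\begin{aligned}
\mathbb{E}^h\big[M^h_t \mid \mathcal{F}_r\big]
&= \frac{1}{h(X_r)}\,\mathbb{E}\big[h(X_t) f(X_t) \mid \mathcal{F}_r\big] - f(X_0) - \sum_{s=0}^{r-1} (L^h f)(X_s) \\
&\quad - \sum_{s=r}^{t-1} \frac{1}{h(X_r)}\,\mathbb{E}\big[h(X_s)\,(L^h f)(X_s) \mid \mathcal{F}_r\big].
\end{aligned} \tag{4.2.29}
\]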

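The two middle terms are part of $M^h_r$ and so we must compute $\mathbb{E}[f(X_t)h(X_t) \mid \mathcal{F}_r]$.
This is done by applying Lemma 4.9 for the law $\mathbb{P}$ and the function $fh$, which yields

\[
\mathbb{E}\big[f(X_t)h(X_t) \mid \mathcal{F}_r\big] = f(X_r)h(X_r) + \sum_{s=r}^{t-1} \mathbb{E}\big[(L(fh))(X_s) \mid \mathcal{F}_r\big]. \tag{4.2.30}
\]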

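Inserting this into (4.2.29), we obtain

\[
\begin{aligned}
\mathbb{E}^h\big[M^h_t \mid \mathcal{F}_r\big]
&= f(X_r) - f(X_0) - \sum_{s=0}^{r-1} (L^h f)(X_s) \\
&\quad + \frac{1}{h(X_r)} \sum_{s=r}^{t-1} \Big( \mathbb{E}\big[(L(fh))(X_s) \mid \mathcal{F}_r\big] - \mathbb{E}\big[h(X_s)\,(L^h f)(X_s) \mid \mathcal{F}_r\big] \Big) \\
&= M^h_r + \frac{1}{h(X_r)} \sum_{s=r}^{t-1} \Big( \mathbb{E}\big[(L(fh))(X_s) \mid \mathcal{F}_r\big] - \mathbb{E}\big[h(X_s)\,(L^h f)(X_s) \mid \mathcal{F}_r\big] \Big).
\end{aligned} \tag{4.2.31}
\]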

The second term will vanish if we choose $L^h$ such that

\[
(L^h f)(x) = h(x)^{-1}\,\big(L(hf)\big)(x),
\]

i.e.,

\[
(L^h f)(x) = \frac{1}{h(x)} \int_S P(x,dy)\,h(y) f(y) - f(x). \tag{4.2.32}
\]

Thus we see that, under Ph , X solves the martingale problem corresponding to the
generator Lh , and hence is a Markov chain with transition kernel P h = Lh + 1.
The process X under Ph is called the (Doob) h-transform of the original Markov
process.
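
To make the h-transform concrete, the sketch below takes the example mentioned at the start of this subsection: simple random walk on $\{0,\dots,N\}$ conditioned to exit at $N$. In this hypothetical setup $h(x) = x/N$ is the probability of hitting $N$ before $0$, and (4.2.32) corresponds to the kernel $P^h(x,y) = p(x,y)\,h(y)/h(x)$.

```python
from fractions import Fraction as F

N = 5                                    # hypothetical interval {0, ..., N}

def p(x, y):
    """Simple random walk kernel, used at interior points 0 < x < N."""
    return F(1, 2) if abs(x - y) == 1 else F(0)

def h(x):
    """P_x(hit N before 0) = x / N, harmonic on the interior."""
    return F(x, N)

def p_h(x, y):
    """Transformed kernel P^h(x, y) = p(x, y) h(y) / h(x)."""
    return p(x, y) * h(y) / h(x)

interior = range(1, N)
harmonic = all(sum(p(x, y) * h(y) for y in range(N + 1)) == h(x) for x in interior)
rows_sum_to_one = all(sum(p_h(x, y) for y in range(N + 1)) == 1 for x in interior)
```

The transformed walk steps right from `x` with probability `(x + 1) / (2 x)`, so it never jumps from 1 to 0. The rows of `p_h` sum to one precisely because `h` is harmonic, which is the point of Lemma 4.15.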

4.3 Markov processes with countable state space

In this section we provide some results on Markov processes with countable state
space, in particular, we introduce the notions of recurrence and transience, and dis-
cuss the existence and uniqueness of invariant distributions.
It will be useful to adopt a notation that is close to the matrix notation of finite
Markov processes, namely,

\[
P\big(i, \{j\}\big) = p(i,j), \tag{4.3.1}
\]

which is called the transition matrix. We start with the following elementary
concepts.

Definition 4.16 Two states $i, j \in S$ of the state space of a Markov process communicate
if there exist $k, k' \ge 0$ such that $(p^k)_{ij} > 0$ and $(p^{k'})_{ji} > 0$.

It is easy to see that communication is an equivalence relation. We call the equiv-

alence classes communicating classes.

Definition 4.17 A Markov process with countable state space is called irreducible
if and only if its state space forms a single communicating class.

Another concept of importance is periodicity.

Definition 4.18 A state $i \in S$ of a Markov chain has period $d(i)$ if and only if $d(i)$
is the greatest common divisor of all numbers $n \in \mathbb{N}$ such that $(P^n)_{i,i} > 0$. A state of
period 1 is called aperiodic.
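
For a finite transition matrix the period can be estimated directly from Definition 4.18. The sketch below is an illustration only: the cutoff `max_n` is an assumption of the sketch, whereas the definition takes the greatest common divisor over all $n$.

```python
from math import gcd

def period(P, i, max_n=50):
    """gcd of all n <= max_n with (P^n)(i, i) > 0 (0 if there is none)."""
    size = len(P)
    reach = [1 if k == i else 0 for k in range(size)]   # indicator of state i
    d = 0
    for n in range(1, max_n + 1):
        # After this update, reach[j] == 1 iff (P^n)(i, j) > 0.
        reach = [1 if any(reach[k] and P[k][j] > 0 for k in range(size)) else 0
                 for j in range(size)]
        if reach[i]:
            d = gcd(d, n)
    return d
```

For example, a deterministic cycle on three states has period 3 at every state, while any chain with a positive holding probability at `i` has period 1 there.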

It is easy to see that the function d is constant on communicating classes. We

will henceforth place ourselves in the setting of irreducible Markov processes. In a
countably infinite state space the following notions are fundamental.

Definition 4.19 Let $X$ be an irreducible Markov process with countable state
space $S$.
(i) $X$ is called transient if

\[
\mathbb{P}_i(\tau_i < \infty) < 1 \qquad \forall i \in S. \tag{4.3.2}
\]

(ii) $X$ is called recurrent if it is not transient.
(iii) $X$ is called positive recurrent if

\[
\mathbb{E}_i[\tau_i] < \infty \qquad \forall i \in S. \tag{4.3.3}
\]

(iv) $X$ is called ergodic if it is irreducible, positive recurrent and aperiodic.

The notions of recurrence and transience can be defined for single states rather
than for the entire process. In the case of irreducible Markov processes, all states
have the same characteristics.
Some simple consequences of the above definition are the following.

Lemma 4.20 Let $X$ be an irreducible Markov process with a countable state
space $S$. Then $X$ is transient if and only if

\[
\mathbb{P}_\ell(X_t = \ell \text{ infinitely often}) = 0 \qquad \forall \ell \in S. \tag{4.3.4}
\]

Positive recurrent Markov processes possess a unique invariant probability dis-

tribution.

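Lemma 4.21 Let $X$ be an irreducible positive recurrent Markov process with a
countable state space $S$. Then

\[
\mu(j) = \frac{\mathbb{E}_\ell\big[\sum_{t=1}^{\tau_\ell} 1_{X_t = j}\big]}{\mathbb{E}_\ell[\tau_\ell]} \qquad \forall j, \ell \in S, \tag{4.3.5}
\]

is the unique invariant probability distribution of $X$.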

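Proof Define $\nu_\ell(j) = \mathbb{E}_\ell\big[\sum_{t=1}^{\tau_\ell} 1_{X_t = j}\big]$. We first show that $\nu_\ell$ is an invariant
measure. Indeed, $1 = \sum_{m \in S} 1_{X_{t-1} = m}$, and hence, by the strong Markov property,

\[
\begin{aligned}
\nu_\ell(j)
&= \mathbb{E}_\ell\Big[\sum_{t=1}^{\tau_\ell} 1_{X_t = j}\Big]
 = \sum_{m \in S} \mathbb{E}_\ell\Big[\sum_{t=1}^{\tau_\ell} 1_{X_t = j}\,1_{X_{t-1} = m}\Big] \\
&= \sum_{m \in S} \mathbb{E}_\ell\Big[\sum_{t=1}^{\tau_\ell} 1_{X_{t-1} = m}\,\mathbb{P}(X_t = j \mid \mathcal{F}_{t-1})(m)\Big] \\
&= \sum_{m \in S} \mathbb{E}_\ell\Big[\sum_{t=1}^{\tau_\ell} 1_{X_{t-1} = m}\Big]\,p(m,j) \\
&= \sum_{m \in S} \mathbb{E}_\ell\Big[\sum_{t=1}^{\tau_\ell} 1_{X_t = m}\Big]\,p(m,j) \\
&= \sum_{m \in S} \nu_\ell(m)\,p(m,j).
\end{aligned} \tag{4.3.6}
\]

Thus, $\nu_\ell$ solves the equation for the invariant measure. It remains to show that $\nu_\ell$ is
normalisable. However,

\[
\sum_{j \in S} \nu_\ell(j) = \mathbb{E}_\ell[\tau_\ell] < \infty, \tag{4.3.7}
\]

by assumption. Hence $\nu_\ell(j) / \sum_{i \in S} \nu_\ell(i) = \mu(j)$ is an invariant probability
distribution.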
We next show uniqueness. First, note that for any irreducible Markov process
with countable state space, if $\nu$ is an invariant measure and $\nu(i) = 0$ for some
$i \in S$, then $\nu = 0$. Indeed, if $\nu(j) > 0$ for some $j$, then there exists a finite $t$ such
that $(P^t)_{ji} > 0$, and so $\nu(i) \ge \nu(j)(P^t)_{ji} > 0$, which is a contradiction. Next, note that
$\nu_\ell(\ell) = 1$. We can actually show that $\nu_\ell$ is the only invariant measure such that
$\nu_\ell(\ell) = 1$, which implies the desired uniqueness result as follows. Below we show
that $\nu(j) \ge \nu_\ell(j)$ for all $j \in S$ for any other invariant measure $\nu$ such that $\nu(\ell) = 1$.
Since $\nu - \nu_\ell$ is a positive invariant measure as well and is zero at $\ell$, it must vanish
identically, which implies that $\nu = \nu_\ell$.
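We have

\[
\nu(i) = \sum_{j_1 \ne \ell} p(j_1, i)\,\nu(j_1) + p(\ell, i), \tag{4.3.8}
\]

since $\nu(\ell) = 1$ by hypothesis. Write

\[
p(\ell, i) = \mathbb{E}_\ell\big[1_{\tau_\ell \ge 1}\,1_{X_1 = i}\big]. \tag{4.3.9}
\]

Iterate (4.3.8) to get

\[
\begin{aligned}
\nu(i)
&= \sum_{j_1, j_2 \ne \ell} p(j_2, j_1)\,p(j_1, i)\,\nu(j_2) + \sum_{j_1 \ne \ell} p(\ell, j_1)\,p(j_1, i) + \mathbb{E}_\ell\big[1_{\tau_\ell \ge 1}\,1_{X_1 = i}\big] \\
&= \sum_{j_1, j_2 \ne \ell} p(j_2, j_1)\,p(j_1, i)\,\nu(j_2) + \mathbb{E}_\ell\Big[\sum_{t=1}^{2 \wedge \tau_\ell} 1_{X_t = i}\Big].
\end{aligned} \tag{4.3.10}
\]

Further iteration yields, for any $n \in \mathbb{N}$,

\[
\begin{aligned}
\nu(i)
&= \sum_{j_1, j_2, \dots, j_n \ne \ell} p(j_n, j_{n-1}) \cdots p(j_2, j_1)\,p(j_1, i)\,\nu(j_n) + \mathbb{E}_\ell\Big[\sum_{t=1}^{n \wedge \tau_\ell} 1_{X_t = i}\Big] \\
&\ge \mathbb{E}_\ell\Big[\sum_{t=1}^{n \wedge \tau_\ell} 1_{X_t = i}\Big].
\end{aligned} \tag{4.3.11}
\]

Let $n \to \infty$ and use (4.3.7) to get $\nu(i) \ge \nu_\ell(i)$.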

Corollary 4.22 For an irreducible positive recurrent Markov process with countable
state space $S$,

\[
\mu(j) = \frac{1}{\mathbb{E}_j[\tau_j]}, \qquad j \in S. \tag{4.3.12}
\]

Proof Just set $\ell = j$ in the definition of $\mu(j)$, and note that $\nu_j(j) = \mathbb{E}_j\big[\sum_{t=1}^{\tau_j} 1_{X_t = j}\big] = 1$.
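
Formula (4.3.12) can be checked on a small example. The sketch below assumes a hypothetical three-state chain and computes the mean return times by first-step analysis, that is, by solving the linear system for the expected hitting times of $j$ from the other states. The helper `solve` is a plain Gauss-Jordan elimination written for this illustration.

```python
from fractions import Fraction as F

# Hypothetical irreducible transition matrix on S = {0, 1, 2}.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def mean_return_time(j):
    """E_j[tau_j] = 1 + sum_{m != j} p(j, m) k(m), where k(i) is the expected
    time to hit j from i and solves k(i) = 1 + sum_{m != j} p(i, m) k(m)."""
    others = [i for i in range(len(P)) if i != j]
    A = [[(F(1) if i == m else F(0)) - P[i][m] for m in others] for i in others]
    k = dict(zip(others, solve(A, [F(1)] * len(others))))
    return 1 + sum(P[j][m] * k[m] for m in others)

mu = [1 / mean_return_time(j) for j in range(len(P))]
```

The resulting vector `mu` can then be tested for invariance, `mu P = mu`, without solving the invariance equation separately.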

The invariant distribution determines the long-time behaviour of the Markov pro-
cess. We state the following two ergodic theorems without proof.

Theorem 4.23 (Ergodic theorem; strong form) Let $X$ be an irreducible positive
recurrent Markov process with invariant probability distribution $\mu$. Then, for any
bounded measurable function $f : S \to \mathbb{R}$,

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \int_S f\,d\mu \qquad \text{a.s.} \tag{4.3.13}
\]

Theorem 4.24 (Ergodic theorem; weak form) Let $X$ be an irreducible aperiodic
and positive recurrent Markov process with transition matrix $P$ and invariant
probability distribution $\mu$. Then, for any initial law $\pi_0$,

\[
\lim_{n \to \infty} \big(\pi_0 P^n\big)_i = \mu(i), \qquad i \in S. \tag{4.3.14}
\]
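
The convergence in (4.3.14) is easy to observe numerically. The following sketch uses a hypothetical irreducible and aperiodic three-state matrix `P` whose invariant distribution is `mu = (2/9, 4/9, 1/3)`; this can be verified by checking `mu P = mu`.

```python
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 1 / 3, 2 / 3]]          # hypothetical, irreducible and aperiodic
mu = [2 / 9, 4 / 9, 1 / 3]         # its invariant distribution

def evolve(pi0, n):
    """Return pi0 P^n, the law of X_n when X_0 has law pi0."""
    law = list(pi0)
    for _ in range(n):
        law = [sum(law[i] * P[i][j] for i in range(3)) for j in range(3)]
    return law

def distance_to_mu(pi0, n):
    """Largest coordinate-wise gap between pi0 P^n and mu."""
    return max(abs(a - b) for a, b in zip(evolve(pi0, n), mu))
```

Calling `distance_to_mu` with different point masses as `pi0` shows that the limit does not depend on the initial law.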

4.4 Bibliographical notes

1. There are many good textbooks on Markov processes. Nice modern treatments
can be found in Norris [195], Stroock [222], or Levin, Peres and Wilmer [163].
Two classic texts are those by Kemeny and Snell [151], or Kemeny, Snell and
Knapp [150].

2. The idea of characterising Markov processes by an associated martingale problem

goes back to Stroock and Varadhan [223].
Chapter 5
Markov Processes in Continuous Time

“Why cannot you explain the process?” he inquired. Mein Herr

was ready with a quite unanswerable reason. “Because you
have no words, in your language, to convey the ideas that are
needed. I could explain it in – in – but you would not understand
it!” (Lewis Carroll, Sylvie and Bruno Concluded)

In this chapter we review some important aspects of Markov processes in continu-

ous time. Their study involves much more analytic work than in the discrete-time
setting. However, there is also a lot of common structure. The basic definition of a
Markov process was already given in Chap. 4 (see Definition 4.1). Clearly, one-step
transition kernels no longer make sense and we need to look for the appropriate ana-
logue of the generator of the process. In this chapter we will only consider the case
of Markov processes with stationary transition kernels.
Section 5.1 takes a brief look at Markov jump processes. Section 5.2 lists a few
basic properties of Brownian motion. Section 5.3 gives the definition of general
Markov processes via generators and semigroups, and focusses on a special class
called Feller-Dynkin processes, emphasising the central rôle of the strong Markov
property. Section 5.4 introduces and studies the so-called martingale problem, which
is a powerful way to construct general Markov processes, and addresses the issues
of existence and uniqueness. Section 5.5 gives a brief summary on Itō calculus,
preparing for Sect. 5.6 that introduces and studies stochastic differential equations,
states existence and uniqueness criteria for strong solutions, and presents the Doob
h-transform in this setting. Section 5.7 looks at stochastic partial differential equa-
tions.

5.1 Markov jump processes

The simplest class of continuous-time Markov processes are Markov jump pro-
cesses. They are constructed “explicitly” from Markov processes in discrete time.
The idea is simple: take a discrete-time Markov process and randomise the waiting
times between the successive moves in such a way as to obtain a continuous-time
Markov process.

© Springer International Publishing Switzerland 2015 79

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_5

Fig. 5.1 Simulation of simple random walk on $\mathbb{Z}^2$ with $n = 10^3$, $10^4$ and $10^5$ steps. The circles
have radius $n^{1/2}$ in units of the step size. Brownian motion on $\mathbb{R}^2$ is the continuum limit of simple
random walk on $\mathbb{Z}^2$. (Courtesy Bill Casselman and Gordon Slade)

To be more precise, let $(Y_n)_{n\in\mathbb{N}_0}$ be a discrete-time Markov process with state
space $S$, transition kernel $P$ (also called jump distribution) and initial distribution $\mu$.
Let $m : S \to \mathbb{R}_+$ be a uniformly bounded and measurable function. Let $e_{i,x}$, $i \in \mathbb{N}_0$,
$x \in S$, be a family of independent exponential random variables with mean $m(x)$,
defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$ as $(Y_n)_{n\in\mathbb{N}_0}$, and assume that
the $Y_n$ and the $e_{i,x}$ are mutually independent. Define

\[
S(n) = \sum_{i=0}^{n-1} e_{i,Y_i}, \qquad n \in \mathbb{N}_0, \tag{5.1.1}
\]

which is called the clock process: $S(n)$ represents the time at which the $n$-th jump
takes place. Define the inverse function

\[
S^{-1}(t) = \sup\big\{n \in \mathbb{N}_0 : S(n) \le t\big\}, \qquad t \in \mathbb{R}_+, \tag{5.1.2}
\]

and set

\[
X(t) = Y_{S^{-1}(t)}. \tag{5.1.3}
\]

Theorem 5.1 The process (X(t))t∈R+ defined through (5.1.3) is a continuous-time

Markov process with càdlàg paths.
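
The construction (5.1.1)-(5.1.3) translates directly into a simulation. The sketch below is an illustration under stated assumptions: `step` and `mean_wait` are hypothetical callables supplied by the user (a sampler for the jump chain and the function $m$, assumed strictly positive), and the path is only defined up to the last simulated jump time.

```python
import random

def simulate_jump_process(step, mean_wait, y0, n_jumps, rng):
    """Return the jump times S(0), ..., S(n_jumps) and the states Y_0, ..., Y_{n_jumps}.

    step(y, rng) samples the next state of the discrete-time chain and
    mean_wait(y) is m(y), the mean of the exponential waiting time at y.
    """
    times, states = [0.0], [y0]
    for _ in range(n_jumps):
        y = states[-1]
        # expovariate takes the rate, which is 1 / m(y).
        times.append(times[-1] + rng.expovariate(1.0 / mean_wait(y)))
        states.append(step(y, rng))
    return times, states

def value_at(t, times, states):
    """X(t) = Y_{S^{-1}(t)} with S^{-1}(t) = sup{n : S(n) <= t}.

    Only meaningful for t below the last simulated jump time.
    """
    n = max(k for k, s in enumerate(times) if s <= t)
    return states[n]
```

Because `value_at` takes the largest index with `S(n) <= t`, the resulting path is right-continuous with left limits, as stated in Theorem 5.1.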

5.2 Brownian motion

Markov jump processes are not the most general continuous-time Markov processes.
For instance, stochastic processes with continuous paths, such as diffusions, are
excluded. The simplest and most important example is Brownian motion. In this
section we give a brief recapitulation of its basic properties.
The main reason why Brownian motion is so important is that it arises as the universal
limit of a large class of discrete-time processes, in particular, random walks
(see Fig. 5.1)

\[
S_n = \sum_{k=1}^{n} X_k, \qquad n \in \mathbb{N}_0, \tag{5.2.1}
\]

with $X_k$, $k \in \mathbb{N}$, i.i.d. random variables. Let us focus on the centred case: $\mathbb{E}(X_1) = 0$.
The central limit theorem states that $Z_n = n^{-1/2} S_n$ converges in distribution to a
Gaussian random variable, provided $\mathbb{E}[X_1^2] = \sigma^2 < \infty$. A natural question that goes
beyond this observation is to ask whether the entire path $\{S_n, n \in \mathbb{N}\}$ converges to a
limiting object. It is clear that if we rescale like

\[
Z_n(t) = (n\sigma^2)^{-1/2} \sum_{k=1}^{\lfloor tn \rfloor} X_k, \qquad t \in (0,1], \tag{5.2.2}
\]

then

\[
Z_n(t) \xrightarrow{\;D\;} B_t, \qquad t \in (0,1], \tag{5.2.3}
\]

where $B_t$ is a centred Gaussian random variable with variance $t$. Moreover, for $\ell \in \mathbb{N}$
and a finite collection of indices $0 = t_0 < t_1 < \cdots < t_\ell$, define $Y_n(i) = Z_n(t_i) - Z_n(t_{i-1})$.
Then the random variables $Y_n(i)$ are independent, and it is easy to see
that they jointly converge, as $n \to \infty$, to a family of independent, centred Gaussian
random variables with variances $t_i - t_{i-1}$. This implies that the finite-dimensional
distributions of the processes $(Z_n(t))_{t\in[0,1]}$ converge to the finite-dimensional
distributions of the Gaussian process $(B_t)_{t\in[0,1]}$ with covariance $\mathbb{E}[B_s B_t] = s \wedge t$. The
latter is called Brownian motion and has very interesting properties.
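
The rescaling (5.2.2) and the limiting covariance $s \wedge t$ can be illustrated as follows. For i.i.d. centred steps with variance $\sigma^2$, independence gives $\mathrm{Cov}(Z_n(s), Z_n(t)) = \lfloor n (s \wedge t) \rfloor / n$ exactly, which the sketch evaluates; the function names are hypothetical.

```python
import math

def rescaled_walk(steps, sigma2, t):
    """Z_n(t) = (n sigma^2)^{-1/2} * sum_{k <= floor(t n)} X_k for steps X_1, ..., X_n."""
    n = len(steps)
    return sum(steps[: math.floor(t * n)]) / math.sqrt(n * sigma2)

def exact_covariance(n, s, t):
    """Cov(Z_n(s), Z_n(t)) = floor(n min(s, t)) / n for i.i.d. centred steps."""
    return math.floor(n * min(s, t)) / n
```

As `n` grows, `exact_covariance(n, s, t)` approaches `min(s, t)` with an error of at most `1 / n`.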

5.2.1 Definition of Brownian motion

Definition 5.2 An R-valued stochastic process (Bt )t∈R+ defined on a probability

space (Ω, F , P) is called a 1-dimensional Brownian motion starting in 0 if and
only if
(o) B0 = 0 a.s.
(i) For any p ∈ N and any 0 = t0 < t1 < · · · < tp < ∞, the random variables
Bt1 , Bt2 − Bt1 , . . . , Btp − Btp−1 are independent and each Bti − Bti−1 is a centred
Gaussian random variable with variance ti − ti−1 .
(ii) For any ω ∈ Ω, the map t → Bt (ω) is continuous (i.e., B(ω) : R+ → R is a
continuous function).

Alternatively, we can describe Brownian motion as follows.

Lemma 5.3 Brownian motion in 1 dimension is the Gaussian process (Bt )t∈R+ with
values in R such that
(o) B0 = 0.
(i) For any t ∈ R+ , E[Bt ] = 0.
(ii) For any t, s ≥ 0, E[Bt Bs ] = t ∧ s.
(iii) For any ω ∈ Ω, the map t → Bt (ω) is continuous.

Fig. 5.2 A sample of a Brownian motion path in 1 dimension

Proof Let B be Brownian motion as defined in Definition 5.2. Then properties (o),
(i) and (iii) are obviously satisfied. To show that (ii) holds, assume without loss of
generality that t > s. Then

\[
\mathbb{E}[B_t B_s] = \mathbb{E}\big[(B_t - B_s) B_s + B_s^2\big] = 0 + s = t \wedge s, \tag{5.2.4}
\]

where we use that Bt − Bs and Bs are independent and centred, and Bs has vari-
ance s.
To prove the converse, i.e., to prove that any stochastic process with the above
properties is Brownian motion, we can simply use the fact that the law of a Gaus-
sian process is uniquely determined by its mean and its covariance. Therefore the
stochastic process has the same law as Brownian motion (see Fig. 5.2), and has
continuous paths by property (iii), so it is Brownian motion.

Once we have Brownian motion in dimension 1, we can trivially define Brownian

motion in dimension d.

Definition 5.4 Brownian motion in d dimensions, written B = (B (1) , . . . , B (d) ), is

a stochastic process indexed by R+ with values in Rd such that the components B (i)
are mutually independent Brownian motions in R.

If B is Brownian motion in Rd and a ∈ Rd , then the process a + B is called

Brownian motion started in a.

Theorem 5.5 Brownian motion exists.


There are a number of ways to prove the existence of Brownian motion, and we
refer the reader to the literature for proofs. In a way, the most appealing proof is via
Donsker’s theorem, which constructs Brownian motion as the limit of sums of i.i.d.
random variables via an interpolated version of (5.2.2).
Having constructed the random variable (Bt )t∈R+ in C(R+ , Rd ), we can define
its distribution, the so-called Wiener measure. We want to construct this as a measure
on the space of continuous functions equipped with its Borel σ -algebra. For this it
is useful to observe the following.

Lemma 5.6 The smallest σ -algebra C on C(R+ , Rd ) that makes all the coor-
dinate functions t → w(t) measurable coincides with the Borel-σ -algebra B =
B(C(R+ , Rd )) of the metrisable space C(R+ , Rd ) equipped with the topology of
uniform convergence on compact sets.

Proof First, $\mathcal{C} \subset \mathcal{B}$ because all functions $t \to w(t)$ are continuous and hence
measurable with respect to the Borel-$\sigma$-algebra $\mathcal{B}$. To prove that $\mathcal{B} \subset \mathcal{C}$, note that the
topology of uniform convergence is equivalent to the metric topology relative to the
metric

\[
d(w, w') = \sum_{n \in \mathbb{N}} 2^{-n} \Big( \sup_{0 \le t \le n} |w(t) - w'(t)| \wedge 1 \Big), \qquad w, w' \in C(\mathbb{R}_+, \mathbb{R}^d). \tag{5.2.5}
\]

We thus have to show that any ball with respect to this distance is measurable with
respect to $\mathcal{C}$. But since $w, w'$ are continuous functions, we have

\[
\sup_{t \in [0,n]} |w(t) - w'(t)| \wedge 1 = \sup_{t \in [0,n] \cap \mathbb{Q}} |w(t) - w'(t)| \wedge 1, \tag{5.2.6}
\]

and so we see that balls are indeed in $\mathcal{C}$.

Note that, by construction, the map ω → B(ω) is measurable because the maps
ω → Bt (ω) are measurable for all t, and by the definition of C the coordinate maps
B → Bt are measurable for all t. Therefore the following definition makes sense.

Definition 5.7 Let (Bt )t∈R+ be a Brownian motion in Rd defined on a probability

space (Ω, F , P). The probability measure on (C(R+ , Rd ), B(C(R+ , Rd ))) given
as the image of P under the map ω → (Bt (ω))t∈R+ is called the d-dimensional
Wiener measure.

Note that uniqueness of the Wiener measure is an immediate consequence of

the Daniell-Kolmogorov theorem (Theorem 3.38), since we already know that the
finite-dimensional distributions are fixed by the prescription of the covariances.
An important property of Brownian motion is the following scale invariance.

Lemma 5.8 For any $a \in \mathbb{R}_+$, the processes $B = (B_t)_{t\in\mathbb{R}_+}$ and $A = (A_t)_{t\in\mathbb{R}_+}$ with
$A_t = a^{-1} B_{ta^2}$ have the same distribution.

Proof Obviously, A is a Gaussian process. It is also obvious that the time change
and the multiplication by a constant preserve the continuity of the paths, the starting
position 0 and the fact that the process has mean zero. Thus, it suffices to show that
B and A have the same covariance. But

\[
\mathbb{E}[A_t A_s] = a^{-2}\,\mathbb{E}[B_{a^2 t} B_{a^2 s}] = a^{-2}\big(a^2 t \wedge a^2 s\big) = s \wedge t, \tag{5.2.7}
\]

which is the covariance of B.

5.2.2 Martingale and Markov properties

Brownian motion is a martingale. The proof of this fact is elementary.

Theorem 5.9 Brownian motion defined on a probability space (Ω, F , P) is a

continuous-time martingale, in the sense that, if (Ft )t∈R+ is a filtration of F such
that B is adapted, then

E[Bt |Fs ] = Bs , 0 ≤ s < t. (5.2.8)

Next, we show that Brownian motion is also a Markov process. For the definition
of a continuous-time Markov process, we use the obvious generalisation of the
definition of a discrete-time Markov process.

Definition 5.10 A stochastic process $X$ with state space $S$ and index set $\mathbb{R}_+$ is
called a continuous-time Markov process if there exists a two-parameter family of
probability kernels $P_{s,t}$ satisfying the Chapman-Kolmogorov equations

\[
P_{s,t}(x, A) = \int_S P_{r,t}(y, A)\,P_{s,r}(x, dy), \qquad \forall r \in (s,t),\ A \in \mathcal{B}, \tag{5.2.9}
\]

such that, for all $A \in \mathcal{B}$ and $0 \le s < t$,

\[
\mathbb{P}(X_t \in A \mid \mathcal{F}_s)(\omega) = P_{s,t}\big(X_s(\omega), A\big) \qquad \text{a.s.} \tag{5.2.10}
\]

Theorem 5.11 Brownian motion in dimension $d$ is a continuous-time Markov process
with transition kernel

\[
P_{s,t}(x, A) = \frac{1}{(2\pi(t-s))^{d/2}} \int_A \exp\Big(-\frac{\|y - x\|^2}{2(t-s)}\Big)\,dy. \tag{5.2.11}
\]

The proof is left as an exercise.
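
One ingredient of the exercise is that the Gaussian kernel (5.2.11) satisfies the Chapman-Kolmogorov equations (5.2.9). The sketch below checks this numerically for $d = 1$ with a midpoint Riemann sum; the integration window `half_width` and the number of cells `n` are assumptions of the sketch.

```python
import math

def density(s, t, x, y):
    """Density of P_{s,t}(x, dy) for d = 1: Gaussian with mean x and variance t - s."""
    v = t - s
    return math.exp(-(y - x) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def chapman_kolmogorov_rhs(s, r, t, x, z, half_width=12.0, n=4000):
    """Midpoint sum for the integral over y of density(r, t, y, z) * density(s, r, x, y)."""
    dy = 2.0 * half_width / n
    ys = (x - half_width + (k + 0.5) * dy for k in range(n))
    return sum(density(r, t, y, z) * density(s, r, x, y) for y in ys) * dy
```

Comparing `chapman_kolmogorov_rhs(s, r, t, x, z)` with `density(s, t, x, z)` for some `s < r < t` reproduces the identity up to quadrature error.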

We now come, again somewhat informally, to the martingale problem associated
with Brownian motion.

Theorem 5.12 (Martingale problem) Let $f \in C^2(\mathbb{R}^d, \mathbb{R})$ with bounded second
derivatives. Let $B$ be Brownian motion and $\Delta$ the Laplacian. Then

\[
M_t = f(B_t) - f(B_0) - \frac{1}{2} \int_0^t (\Delta f)(B_s)\,ds \tag{5.2.12}
\]

is a martingale.

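Proof For simplicity we only consider the case $d = 1$ (the general case works the
same way). We proceed as in the discrete-time case:

\[
\begin{aligned}
\mathbb{E}[M_{t+r} \mid \mathcal{F}_t]
&= f(B_t) - f(B_0) - \frac{1}{2} \int_0^t f''(B_s)\,ds \\
&\quad + \mathbb{E}\big[f(B_{t+r}) - f(B_t) \mid \mathcal{F}_t\big] - \frac{1}{2} \int_0^r \mathbb{E}\big[f''(B_{t+s}) \mid \mathcal{F}_t\big]\,ds \\
&= M_t + \frac{1}{\sqrt{2\pi r}} \int_{\mathbb{R}} f(y) \exp\Big(-\frac{(y - B_t)^2}{2r}\Big)\,dy - f(B_t) \\
&\quad - \frac{1}{2} \int_0^r \frac{1}{\sqrt{2\pi s}} \int_{\mathbb{R}} f''(y) \exp\Big(-\frac{(y - B_t)^2}{2s}\Big)\,dy\,ds.
\end{aligned} \tag{5.2.13}
\]

The terms following $M_t$ cancel. Indeed, via integration by parts,

\[
\begin{aligned}
\frac{1}{\sqrt{2\pi s}} \int_{\mathbb{R}} f''(y) \exp\Big(-\frac{(y - x)^2}{2s}\Big)\,dy
&= \int_{\mathbb{R}} f(y)\,\frac{d^2}{dy^2}\,\frac{1}{\sqrt{2\pi s}} \exp\Big(-\frac{(y - x)^2}{2s}\Big)\,dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(y) \big(-s^{-3/2} + (y - x)^2 s^{-5/2}\big) \exp\Big(-\frac{(y - x)^2}{2s}\Big)\,dy \\
&= 2 \int_{\mathbb{R}} f(y)\,\frac{d}{ds}\,\frac{1}{\sqrt{2\pi s}} \exp\Big(-\frac{(y - x)^2}{2s}\Big)\,dy.
\end{aligned} \tag{5.2.14}
\]

Integrating the last expression in (5.2.14) over $s$, from $0$ to $r$, we get

\[
\frac{2}{\sqrt{2\pi r}} \int_{\mathbb{R}} f(y) \exp\Big(-\frac{(x - y)^2}{2r}\Big)\,dy - 2 f(x), \tag{5.2.15}
\]

where we use that

\[
\lim_{h \downarrow 0} \frac{1}{\sqrt{2\pi h}} \int_{\mathbb{R}} f(y) \exp\Big(-\frac{(x - y)^2}{2h}\Big)\,dy = f(x). \tag{5.2.16}
\]

Inserting (5.2.14)–(5.2.15) into (5.2.13), we get $\mathbb{E}[M_{t+r} \mid \mathcal{F}_t] = M_t$, which
concludes the proof.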

Note that we really only used that the function

\[
  e(x, t) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{x^2}{2t}\right) \tag{5.2.17}
\]
satisfies the (parabolic) partial differential equation
\[
  \frac{\partial}{\partial t} e(x, t) = \frac{1}{2} \Delta e(x, t), \tag{5.2.18}
\]
with the (singular) initial condition

\[
  e(x, 0) = \delta(x), \tag{5.2.19}
\]

where δ denotes the Dirac delta function, defined by $\int_{\mathbb{R}^d} \delta(x) f(x)\, dx = f(0)$ for
any bounded measurable function f. The function e(x, t) is called the heat kernel
associated with Brownian motion.
Theorem 5.12 suggests calling $\tfrac{1}{2}\Delta$ the generator of Brownian motion, and
thinking of (5.2.13) as the associated martingale problem. We will put this on firm
ground in the next sections.
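
As a quick numerical sanity check of (5.2.18) in d = 1, the following Python sketch compares a central-difference estimate of the time derivative of e(x, t) with one half of a central-difference estimate of its second space derivative. The function names, the step size h and the sample points are illustrative choices of ours and do not come from the text.

```python
import math


def heat_kernel(x, t):
    """One-dimensional heat kernel e(x, t) from (5.2.17)."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def pde_residual(x, t, h=1e-3):
    """Finite-difference estimate of d/dt e - (1/2) d^2/dx^2 e at (x, t)."""
    dt = (heat_kernel(x, t + h) - heat_kernel(x, t - h)) / (2.0 * h)
    dxx = (heat_kernel(x + h, t) - 2.0 * heat_kernel(x, t)
           + heat_kernel(x - h, t)) / (h * h)
    return dt - 0.5 * dxx


# Sample points chosen arbitrarily, away from the singularity at t = 0.
residuals = [abs(pde_residual(x, t))
             for x in (-1.0, 0.0, 0.5, 2.0)
             for t in (0.5, 1.0, 3.0)]
max_residual = max(residuals)
```

The residual is not exactly zero because of discretisation error, but it should be small compared with the size of the derivatives themselves.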

5.3 General Markov processes

We briefly review the most important aspects of the general theory. For S a metric
space, let B(S, R) = B(S) denote the space of real-valued, bounded and measurable
functions on S, C(S, R) = C(S) the space of continuous functions on S, Cb (S, R) =
Cb (S) the space of bounded continuous functions on S, and C0 (S, R) = C0 (S) the
space of bounded continuous functions on S that vanish at infinity. Clearly, C0 (S) ⊂
Cb (S) ⊂ C(S) ⊂ B(S).

5.3.1 Semigroups and generators

The main building block for a time-homogeneous Markov process is the so-called
transition kernel P : R+ × S × B → [0, 1]. As in discrete time, the compatibility
conditions impose the Chapman-Kolmogorov equations for transition kernels.

Definition 5.13 A Markov transition function (Pt )t∈R+ is a family of kernels

Pt : S × B → [0, 1] with the following properties:
(i) For each t ∈ R+ and x ∈ S, Pt (x, ·) is a measure on (S, B) with Pt (x, S) ≤ 1.
(ii) For each t ∈ R+ and A ∈ B, Pt (·, A) is a B-measurable function on S.
(iii) For each s, t ∈ R+ ,

\[
  P_{s+t}(x, A) = \int_S P_t(y, A)\, P_s(x, dy). \tag{5.3.1}
\]

It is useful to shift from measures of sets to integrals of functions. This motivates

the following equivalent definition of a time-homogeneous Markov process.

Definition 5.14 A stochastic process X with state space S and index set R+ is
a continuous-time homogeneous Markov process on a filtered space (Ω, F , P,
(Ft )t∈R+ ) with transition function (Pt )t∈R+ if it is adapted to (Ft )t∈R+ and, for
all bounded B-measurable functions f and all s, t ∈ R+ ,

\[
  \mathbb{E}\big[f(X_{t+s}) \mid \mathcal{F}_s\big](\omega) = (P_t f)\big(X_s(\omega)\big) \quad \text{a.s.} \tag{5.3.2}
\]

It is convenient to think of the transition kernels as bounded linear operators on

B(S, R), acting as

\[
  (P_t f)(x) = \int_S P_t(x, dy)\, f(y). \tag{5.3.3}
\]
The Chapman-Kolmogorov equations in Definition 5.13(iii) then take the simple
form $P_s P_t = P_{t+s}$, and $(P_t)_{t \in \mathbb{R}_+}$ can be seen as a semigroup of bounded linear
operators. Note that we also have the dual action of $P_t$ on the space of probability
measures via

\[
  (\mu P_t)(A) = \int_S \mu(dx)\, P_t(x, A). \tag{5.3.4}
\]
This gives the duality relation

\[
  (\mu P_t)(f) = \int_S \mu(dx)\, (P_t f)(x) = \mu(P_t f), \qquad f \in B(S, \mathbb{R}). \tag{5.3.5}
\]

The condition Pt (x, S) ≤ 1 may look surprising, since we would expect Pt (x, S) =
1. However, it is sometimes convenient to consider the more general situation where
the process may leave the state space, i.e., may “die”.
Equation (5.3.1) allows us to think of Markov transition functions as operators
on the Banach space of bounded measurable functions.

Definition 5.15 A family (Pt )t∈R+ of bounded linear operators on B(S, R) is called
a sub-Markov semigroup if, for all t ∈ R+ ,
(i) Pt : B(S, R) → B(S, R).
(ii) If 0 ≤ f ≤ 1, then 0 ≤ Pt f ≤ 1.
(iii) For all s ∈ R+ , Pt+s = Pt Ps .
(iv) If fn ↓ 0, then Pt fn ↓ 0.
A sub-Markov semigroup is called normal if P0 = 1, and is called honest if Pt 1 = 1
for all t ∈ R+ .

Our aim will be to construct the generator of the semigroup. We are looking for
an operator L such that Pt = exp(tL ), where “exp” is the exponential map, defined
through its Taylor expansion. This is a good enough way to construct a semigroup

from a bounded generator L . We will see shortly that this works well for Markov
jump processes with bounded jump rates m(x), x ∈ S. The general case, however, is
a bit more involved. The proper setting in which the relation between semigroup and
generator can be generalised is that of a so-called strongly continuous contraction
semigroup.

Definition 5.16 Let B0 be a Banach space. A family (Pt )t∈R+ of bounded linear
operators from B0 to B0 is called a strongly continuous contraction semigroup if the
following conditions are verified:
(i) For all $f \in B_0$, $\lim_{t \downarrow 0} \|P_t f - f\| = 0$.
(ii) $\|P_t\| \le 1$ for all $t \in \mathbb{R}_+$.
(iii) $P_t P_s = P_{t+s}$, for all $s, t \in \mathbb{R}_+$.
Here $\|\cdot\|$ denotes the operator norm corresponding to the norm on $B_0$.

We can now define the notion of infinitesimal generator.

Definition 5.17 Let B0 be a Banach space and let (Pt )t∈R+ be a strongly continuous
contraction semigroup. We say that f is in the domain D(L ) of L , if there exists
a function g ∈ B0 , such that

\[
  \lim_{t \downarrow 0} \big\| t^{-1}(P_t f - f) - g \big\| = 0. \tag{5.3.6}
\]

For such f we set L f = g, where g is the function that satisfies (5.3.6).

Note that we define D(L ) at the same time as L . In general, L will be an

unbounded operator (e.g. a differential operator) whose domain is strictly smaller
than B0 . Some authors describe the generator of a Markov process as a collection of
pairs of functions (f, g) satisfying (5.3.6).
In the case of Markov jump processes, we can identify the generator easily.

Lemma 5.18 Let X be a Markov jump process with jump distribution P and jump
rates m(x), x ∈ S. Then X has a generator with domain B(0) given by

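\[
  (\mathcal{L} f)(x) = m(x) \int_S \big[f(y) - f(x)\big]\, P(x, dy). \tag{5.3.7}
\]

Conversely, the semigroup $(P_t)_{t \in \mathbb{R}_+}$ is given by

\[
  (P_t f)(x) = \big(\exp(t\mathcal{L}) f\big)(x) = \sum_{n \in \mathbb{N}_0} \frac{t^n}{n!}\, (\mathcal{L}^n f)(x). \tag{5.3.8}
\]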

For a proof see Ethier and Kurtz [104].
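
For a finite state space, (5.3.7) and (5.3.8) reduce to matrix identities that are easy to check numerically. The following Python sketch builds the generator from jump rates and a jump distribution, sums the Taylor series for exp(tL), and tests the Chapman-Kolmogorov equations and honesty of the resulting semigroup. The three-state example, the truncation order and all function names are hypothetical choices of ours.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def generator(rates, jump):
    """Matrix of (L f)(x) = m(x) * sum_y (f(y) - f(x)) P(x, y), cf. (5.3.7)."""
    n = len(rates)
    return [[rates[x] * (jump[x][y] - (1.0 if x == y else 0.0))
             for y in range(n)] for x in range(n)]


def semigroup(gen, t, terms=60):
    """Truncated Taylor series sum_{k < terms} t^k L^k / k!, cf. (5.3.8)."""
    n = len(gen)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds t^k L^k / k!
        term = [[t * v / k for v in row] for row in mat_mul(term, gen)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Hypothetical jump rates m(x) and jump distribution P(x, y) on three states.
rates = [1.0, 2.0, 0.5]
jump = [[0.0, 0.5, 0.5],
        [1.0, 0.0, 0.0],
        [0.25, 0.75, 0.0]]
L = generator(rates, jump)

s, t = 0.3, 0.7
P_s, P_t, P_st = semigroup(L, s), semigroup(L, t), semigroup(L, s + t)
product = mat_mul(P_s, P_t)

# Defect in the semigroup property P_s P_t = P_{s+t}.
ck_defect = max(abs(product[i][j] - P_st[i][j])
                for i in range(3) for j in range(3))
# Defect in honesty P_t 1 = 1 (rows of a stochastic matrix sum to one).
row_defect = max(abs(sum(row) - 1.0) for row in P_st)
```

The series is only a reasonable construction here because the jump rates are bounded; this is the situation described in the text just before Definition 5.16.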

The classical link between generators and strongly continuous contraction semi-
groups is the Hille-Yosida theorem.

Theorem 5.19 (Hille-Yosida theorem) A linear operator L on a Banach space B0

is the generator of a strongly continuous contraction semigroup if and only if the
following hold:
(i) The domain D(L ) of L is dense in B0 .
(ii) L is dissipative, i.e., for all λ > 0 and all f ∈ D(L ),

\[
  \|(\lambda - \mathcal{L}) f\| \ge \lambda \|f\|. \tag{5.3.9}
\]

(iii) There exists a λ > 0 such that range(λ − L ) = B0 .

The proof of the Hille-Yosida theorem is quite involved and functional analytic
in nature. It makes use of the concept of resolvent, which provides the constructive
link between generator and semigroup. The proof of the Hille-Yosida theorem also
provides a construction of the semigroup from the generator, but we will not need
this here.
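
In finite dimensions the conditions of Theorem 5.19 can be tested directly. The following Python sketch takes a hypothetical generator matrix of a three-state jump process, checks the dissipativity bound (5.3.9) in the supremum norm on randomly drawn functions, and checks that (λ − L)h = f can be solved for each such f with λ‖h‖ ≤ ‖f‖, which is the resolvent bound behind condition (iii). The matrix, the value of λ, the sampling scheme and the helper names are assumptions of ours.

```python
import random


def sup_norm(f):
    return max(abs(v) for v in f)


def apply(matrix, f):
    """Apply a matrix (list of rows) to a function on the state space."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in matrix]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [matrix[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Hypothetical generator: non-negative off-diagonal entries, rows sum to zero.
L = [[-1.0, 0.5, 0.5],
     [2.0, -2.0, 0.0],
     [0.125, 0.375, -0.5]]
lam = 0.7
lam_minus_L = [[(lam if i == j else 0.0) - L[i][j] for j in range(3)]
               for i in range(3)]

rng = random.Random(0)
dissipative = True
resolvent_bound = True
for _ in range(1000):
    f = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    # Dissipativity: ||(lam - L) f|| >= lam ||f||.
    if sup_norm(apply(lam_minus_L, f)) < lam * sup_norm(f) - 1e-12:
        dissipative = False
    # Range condition: (lam - L) h = f is solvable, with lam ||h|| <= ||f||.
    h = solve(lam_minus_L, f)
    if lam * sup_norm(h) > sup_norm(f) + 1e-12:
        resolvent_bound = False
```

Random sampling is of course no proof; it only illustrates what the two inequalities say in a case where everything is computable.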

5.3.2 Feller-Dynkin processes

We next turn to a class of Markov semigroups that will be seen to have nice proper-
ties. Our state space is a locally compact Hausdorff space S with a countable basis
(e.g. S = Rd ). We do not need to assume compactness, but we will need to consider
the one-point compactification of S obtained by adding a “coffin state” ∂, making
$S^\partial = S \cup \{\partial\}$ into a compact metrisable space.
We will place ourselves in the setting where the Hille-Yosida theorem works, and
make a specific choice for the underlying Banach space, namely, we will work on
the space C0 (S) of continuous functions vanishing at infinity. This requires that we
put a restriction on the semigroups so that they preserve C0 (S). The latter is known
as the Feller property.

Definition 5.20 A Feller-Dynkin semigroup is a strongly continuous sub-Markov

semigroup (Pt )t∈R+ acting on the space C0 (S), in particular, for all t ∈ R+ ,

Pt : C0 (S) → C0 (S). (5.3.10)

It is an analytic fact (coming from the Riesz representation theorem) that to any
strongly continuous contraction semigroup there corresponds a sub-Markov kernel
$P_t(x, dy)$ such that $(P_t f)(x) = \int_S P_t(x, dy)\, f(y)$ for all $f \in C_0(S)$. The key result
is the existence theorem for Feller-Dynkin processes.

Theorem 5.21 Let (Pt )t∈R+ be a Feller-Dynkin semigroup on C0 (S). Then there
exists a strong Markov process with values in S ∂ , càdlàg paths and transition ker-
nels (Pt )t∈R+ . (The unique existence of the Markov process on the level of finite-
dimensional distributions does not require the Feller property.)

Proof The Daniell-Kolmogorov extension theorem guarantees the existence of
a unique process on the product space $(S^\partial)^{\mathbb{R}_+}$, provided the finite-dimensional
marginals satisfy the compatibility conditions. This is easily verified, just as in the
discrete-time setting, by using the Chapman-Kolmogorov equations.
We want to show that the paths of the process are regularisable. For this we need
to bring martingales into the game, and also need the notion of the resolvent $R_\lambda$,
associated with a strongly continuous contraction semigroup, defined through
\[
  R_\lambda = \int_0^\infty e^{-\lambda t} P_t\, dt, \qquad \lambda \in \mathbb{R}_+. \tag{5.3.11}
\]

Lemma 5.22 Let $g \in C_0(S)$ and $g \ge 0$. Set $h = R_1 g$. Then

\[
  0 \le e^{-t} P_t h \le h. \tag{5.3.12}
\]

If $Y = (Y_t)_{t \in \mathbb{R}_+}$ is the corresponding Markov process, then $(e^{-t} h(Y_t))_{t \in \mathbb{R}_+}$ is a
supermartingale.

Proof The lower bound in (5.3.12) is clear, since Pt maps non-negative functions
to non-negative functions. To get the upper bound, write
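\[
  e^{-s} P_s h = e^{-s} P_s R_1 g = e^{-s} P_s \int_0^\infty e^{-u} P_u g\, du = \int_s^\infty e^{-u} P_u g\, du \le R_1 g = h. \tag{5.3.13}
\]
Now, $(e^{-t} h(Y_t))_{t \in \mathbb{R}_+}$ is a supermartingale, since

\[
  \mathbb{E}\big[e^{-s-t} h(Y_{t+s}) \mid \mathcal{F}_t\big] = e^{-s-t} (P_s h)(Y_t) \le e^{-t} h(Y_t), \tag{5.3.14}
\]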

where in the last step we use (5.3.12).

As a consequence of Lemma 5.22, the functions $e^{-q} h(Y_q)$ are regularisable, i.e.,
$\lim_{q \downarrow t} e^{-q} h(Y_q)$ exists for all t almost surely. We can therefore take a countable
dense subset $\{g_i\}_{i \in \mathbb{N}}$ of elements of $C_0(S)$, set $h_i = R_1 g_i$, and observe that the set
$H = \{h_i\}_{i \in \mathbb{N}}$ separates points in $S^\partial$, while almost surely $e^{-q} h_i(Y_q)$ is regularisable
for all $i \in \mathbb{N}$. But then $X_t = \lim_{q \downarrow t} Y_q$ exists for all t almost surely and is a càdlàg
process.
Finally, we establish that X is a modification of Y . To do this, let f, g ∈ C0 (S).
Then

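\[
  \mathbb{E}\big[f(Y_t) g(X_t)\big] = \lim_{q \downarrow t} \mathbb{E}\big[f(Y_t) g(Y_q)\big] = \lim_{q \downarrow t} \mathbb{E}\big[f(Y_t) (P_{q-t} g)(Y_t)\big] = \mathbb{E}\big[f(Y_t) g(Y_t)\big], \tag{5.3.15}
\]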

where the first equality uses the definition of Xt and the third equality uses the strong
continuity of Pt . By an application of the monotone class theorem (Theorem 3.9)
this implies that E[f (Yt , Xt )] = E[f (Yt , Yt )] for any bounded measurable function
on S ∂ × S ∂ , and hence that P(Xt = Yt ) = 1.

Theorem 5.21 allows us to consider Feller-Dynkin Markov processes defined on

the space of càdlàg functions with values in S ∂ (with the additional property that, if
Xt = ∂ or Xt− = ∂, then Xs = ∂, for all s ≥ t). We will henceforth do so (with the
usual right-continuous filtration).

5.3.3 The strong Markov property

Our Feller-Dynkin processes have the Markov property. In particular, if ζ is an $\mathcal{F}_t$-measurable function and $f \in C_0(S)$, then

\[
  \mathbb{E}\big[\zeta f(X_{t+s})\big] = \mathbb{E}\big[\zeta\, (P_s f)(X_t)\big]. \tag{5.3.16}
\]

However, we want more, namely, like in the case of discrete-time Markov processes,
we want to be able to split past and future at stopping times. Let θt be the shift acting
on Ω via
X(θt ω)s = (θt X)(ω)s = X(ω)s+t . (5.3.17)
Then we have the following strong Markov property:

Theorem 5.23 Let T be an $\mathcal{F}_{t+}$-stopping time, and let P be the law of a Feller-Dynkin
Markov process X. Then, for all bounded random variables η,
\[
  \mathbb{E}\big[\theta_T \eta \mid \mathcal{F}_{T+}\big] = \mathbb{E}_{X_T}[\eta], \tag{5.3.18}
\]
or equivalently, for all $\mathcal{F}_{T+}$-measurable bounded random variables ξ,

\[
  \mathbb{E}\big[\xi\, \theta_T \eta\big] = \mathbb{E}\big[\xi\, \mathbb{E}_{X_T}[\eta]\big]. \tag{5.3.19}
\]

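Proof Consider the dyadic approximation of the stopping time T defined as

\[
  T^{(n)}(\omega) =
  \begin{cases}
    k 2^{-n}, & \text{if } (k-1) 2^{-n} \le T(\omega) < k 2^{-n},\ k \in \mathbb{N}, \\
    \infty, & \text{if } T(\omega) = \infty.
  \end{cases} \tag{5.3.20}
\]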

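For $\Lambda \in \mathcal{F}_{T+}$, set

\[
  \Lambda_{n,k} = \big\{\omega \in \Omega : T^{(n)}(\omega) = 2^{-n} k\big\} \cap \Lambda \in \mathcal{F}_{k 2^{-n}}. \tag{5.3.21}
\]

Let f be a continuous function on S. Then

\[
\begin{aligned}
\mathbb{E}\big[f(X_{T^{(n)}+s})\, 1_\Lambda\big]
&= \sum_{k \in \mathbb{N} \cup \{\infty\}} \mathbb{E}\big[f(X_{k 2^{-n}+s})\, 1_{\Lambda_{n,k}}\big] \\
&= \sum_{k \in \mathbb{N} \cup \{\infty\}} \mathbb{E}\big[(P_s f)(X_{k 2^{-n}})\, 1_{\Lambda_{n,k}}\big] \\
&= \mathbb{E}\big[(P_s f)(X_{T^{(n)}})\, 1_\Lambda\big].
\end{aligned} \tag{5.3.22}
\]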

Finally let n → ∞. By the right-continuity of paths, we have

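\[
  \lim_{n \to \infty} X_{T^{(n)}+s} = X_{T+s}, \qquad s \in \mathbb{R}_+. \tag{5.3.23}
\]

Since f is continuous, it also follows that

\[
  \lim_{n \to \infty} f(X_{T^{(n)}+s}) = f(X_{T+s}), \qquad s \in \mathbb{R}_+. \tag{5.3.24}
\]

Since, by the Feller property, $P_s f$ is also continuous, it further follows that

\[
  \lim_{n \to \infty} (P_s f)(X_{T^{(n)}}) = (P_s f)(X_T), \qquad s \in \mathbb{R}_+, \tag{5.3.25}
\]

and so from (5.3.22), by dominated convergence,

\[
  \mathbb{E}\big[f(X_{T+s})\, 1_\Lambda\big] = \mathbb{E}\big[(P_s f)(X_T)\, 1_\Lambda\big], \qquad s \in \mathbb{R}_+. \tag{5.3.26}
\]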

To conclude the proof we need only generalise (5.3.26) to more general functions.
This can be done in the usual manner via the monotone class theorem, and
presents no particular difficulties. Indeed, we first check that $1_\Lambda$ can be replaced by
any bounded $\mathcal{F}_{T+}$-measurable function. Next, we show through explicit computation
that instead of $f(X_{T+s})$ we can put $\prod_{i=1}^n f_i(X_{T+s_i})$, and finally we can again
use the monotone class theorem to conclude the proof for the general case.

Note that working with Feller semigroups has paid off!

5.4 The martingale problem

For discrete-time Markov processes we have encountered a characterisation in terms
of the martingale problem. While this proved to be quite handy, there was nothing
profoundly important about its use. This very much changes in the continuous-time
setting. In fact, the martingale problem characterisation of Markov processes, orig-
inally proposed by Stroock and Varadhan, turns out to be the “proper” way to deal
with the theory in many respects.

5.4.1 Generators and cores

In principle, the Hille-Yosida theorem gives us precise criteria for recognising when
a given linear operator generates a strongly continuous contraction semigroup and
hence a Markov process. However, if we look at the conditions more carefully, then
we realise that in many situations it will be impractical to verify them. The domain
of a generator is usually far too large to allow for a description of the action of
the generator on all of its elements. For instance, for Brownian motion we want to

think of the generator as $\tfrac{1}{2}$ times the Laplacian on the space $C^2(\mathbb{R}_+, \mathbb{R}^d)$. This
operator is closed only in d = 1 and not in d ≥ 2, so already in this case we enter
into subtle issues that we would rather avoid.
Let us first discuss this issue from a functional analytic point of view. To that end
we need to recall a few notions from operator theory.

Definition 5.24 Let G, C be two linear operators with domains D(G), D(C), re-
spectively. We say that C is an extension of G if
(i) D(G) ⊂ D(C).
(ii) Gf = Cf for all f in D(G).

Definition 5.25 A linear operator G on a Banach space $B_0$ is called closed if its
graph, which is the set

\[
  \Gamma(G) = \big\{(f, Gf) : f \in D(G)\big\} \subset B_0 \times B_0, \tag{5.4.1}
\]

is closed in $B_0 \times B_0$. Equivalently, G is closed if, for any sequence $(f_n)_{n \in \mathbb{N}}$ in
D(G) such that $\lim_{n \to \infty} f_n = f$ and $\lim_{n \to \infty} G f_n = g$, it is true that $f \in D(G)$
and $g = Gf$.

The closure $\overline{\mathcal{L}}$ of a linear operator $\mathcal{L}$ is the minimal extension of $\mathcal{L}$ that
is closed. An operator that has a closed linear extension is called closable.

Lemma 5.26 A dissipative linear operator $\mathcal{L}$ on $B_0$ whose domain $D(\mathcal{L})$ is dense
in $B_0$ is closable and the closure of $\mathrm{range}(\lambda - \mathcal{L})$ is equal to $\mathrm{range}(\lambda - \overline{\mathcal{L}})$ for all
λ > 0.

Proof Let $(f_n)_{n \in \mathbb{N}}$ be a sequence in $D(\mathcal{L})$ such that $f_n \to f$ and $\mathcal{L} f_n \to g$ as
n → ∞. We would like to associate with any such f the value g and then define
$\overline{\mathcal{L}} f = g$ for all achievable f, which would then be the desired closed extension
of $\mathcal{L}$. So, all we need to show is that, if $f_n' \to f$ and $\mathcal{L} f_n' \to g'$, then $g' = g$. In fact,
it suffices to show that, if $f_n \to 0$ and $\mathcal{L} f_n \to g$, then g = 0. To do this, consider a
sequence of functions $(g_n)_{n \in \mathbb{N}}$ in $D(\mathcal{L})$ such that $g_n \to g$. Such a sequence exists
because $D(\mathcal{L})$ is dense in $B_0$. Using the dissipativity of $\mathcal{L}$, we get

\[
  \big\|(\lambda - \mathcal{L}) g_n - \lambda g\big\| = \lim_{k \to \infty} \big\|(\lambda - \mathcal{L})(g_n + \lambda f_k)\big\| \ge \lim_{k \to \infty} \lambda \|g_n + \lambda f_k\| = \lambda \|g_n\|, \tag{5.4.2}
\]
where in the first equality we use that $0 = \lim_{k \to \infty} f_k$ and $g = \lim_{k \to \infty} \mathcal{L} f_k$.
Dividing by λ and taking the limit λ → ∞, we obtain

\[
  \|g_n\| \le \|g_n - g\|. \tag{5.4.3}
\]

Since $\|g_n - g\| \to 0$, this implies $g_n \to 0$, and hence g = 0.

The identification of the closure of the range with the range of the closure follows
from the observation made in Definition 5.25 that the range of a dissipative operator
is closed if and only if the operator is closed.

As a consequence of Lemma 5.26, if a dissipative linear operator L on B0 is

closable and the range of λ − L is dense in B0 , then its closure is the generator of a
strongly continuous contraction semigroup on B0 . These observations motivate the
definition of a core of a linear operator.

Definition 5.27 Let L be a linear operator on a Banach space B0 . A subspace

D ⊂ D(L ) is called a core for L if the closure of the restriction of L to D,
written LD , is equal to L .

Lemma 5.28 Let L be the generator of a strongly continuous contraction semi-

group on B0 . Then a subspace D ⊂ D(L ) is a core for L if and only if D is dense
in B0 and, for some λ > 0, range(λ − LD ) is dense in B0 .

Proof The claim follows from the preceding observations.

The following is a very useful characterisation of a core in our context.

Lemma 5.29 Let L be the generator of a strongly continuous contraction semi-

group (Pt )t∈R+ on B0 . Let D be a dense subset of D(L ). If, for all t ∈ R+ ,
Pt : D → D, then D is a core. In fact, it suffices that there is a dense subset D0 ⊂ D
such that Pt maps D0 into D.

Proof Let f ∈ D0 and set

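\[
  f_n = \frac{1}{n} \sum_{k=0}^{n^2} e^{-\lambda k/n} P_{k/n} f. \tag{5.4.4}
\]

We have $f_n \in D$. By strong continuity,

\[
\begin{aligned}
\lim_{n \to \infty} (\lambda - \mathcal{L}) f_n
&= \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n^2} e^{-\lambda k/n} P_{k/n} (\lambda - \mathcal{L}) f \\
&= \int_0^\infty dt\, e^{-\lambda t} P_t (\lambda - \mathcal{L}) f = R_\lambda (\lambda - \mathcal{L}) f = f.
\end{aligned} \tag{5.4.5}
\]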

Thus, for any f ∈ D0 , there exists a sequence of functions ((λ − L )fn )n∈N in
range(λ − LD ) that converges to f . Hence the closure of the range of (λ − LD )
contains D0 . Since D0 is dense in B0 , the assertion follows from Lemma 5.28.

Example Let $\mathcal{L}$ be the generator of Brownian motion. We claim that $C^\infty = C^\infty(\mathbb{R}^d)$
is a core for $\mathcal{L}$ and $\mathcal{L}$ is the closure of $\tfrac{1}{2}\Delta$ on this core. Indeed, $C^\infty$
is dense in the space of continuous functions. To show that $C^\infty$ is a core, by
Lemma 5.29 we need only show that $P_t$ maps $C^\infty$ to $C^\infty$. But this is obvious from
the explicit formula for the transition kernel of Brownian motion in Theorem 5.11.

To check that the restriction of $\mathcal{L}$ to $C^\infty$ is $\tfrac{1}{2}\Delta$ is a simple calculation. Hence $\mathcal{L}$ is
the closure of $\tfrac{1}{2}\Delta$.

The above results are nice when we already know the semigroup. In more com-
plicated situations we may only be able to write down the action of what we want
to be the generator of the Markov process on some small subspace of functions.
The question is: How can we find out whether this specifies a (unique) strongly
continuous contraction semigroup on the full space of functions, e.g. C0 (S)? We
may be able to show that it is dissipative, but is range(λ − L ) dense in C0 (S)? The
martingale problem formulation is a powerful tool to address this question.

5.4.2 The martingale problem

We begin with a relatively simple observation.

Lemma 5.30 Let X be a Feller-Dynkin process with transition functions (Pt )t∈R+
and generator L . Define, for f, g ∈ B(S),
\[
  M_t = f(X_t) - \int_0^t g(X_s)\, ds. \tag{5.4.6}
\]

If f ∈ D(L ) and g = L f , then (Mt )t∈R+ is an (Ft )t∈R+ -martingale.

Proof The proof runs as in the discrete-time setting. Write

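\[
\begin{aligned}
\mathbb{E}[M_{t+u} \mid \mathcal{F}_t]
&= \mathbb{E}\big[f(X_{t+u}) \mid \mathcal{F}_t\big] - \int_0^t (\mathcal{L} f)(X_s)\, ds - \int_t^{t+u} \mathbb{E}\big[(\mathcal{L} f)(X_s) \mid \mathcal{F}_t\big]\, ds \\
&= \int_S P_u(X_t, dy) f(y) - \int_0^t (\mathcal{L} f)(X_s)\, ds - \int_0^u \int_S P_s(X_t, dy) (\mathcal{L} f)(y)\, ds \\
&= f(X_t) - \int_0^t (\mathcal{L} f)(X_s)\, ds \\
&\quad + \int_S P_u(X_t, dy) f(y) - f(X_t) - \int_0^u \int_S P_s(X_t, dy) (\mathcal{L} f)(y)\, ds \\
&= M_t + \int_S P_u(X_t, dy) f(y) - f(X_t) - \int_0^u (P_s \mathcal{L} f)(X_t)\, ds.
\end{aligned} \tag{5.4.7}
\]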

But
\[
  (P_s \mathcal{L} f)(x) = \frac{d}{ds} (P_s f)(x), \tag{5.4.8}
\]

and so
\[
  \int_S P_u(X_t, dy) f(y) - f(X_t) - \int_0^u (P_s \mathcal{L} f)(X_t)\, ds = 0, \tag{5.4.9}
\]
from which the claim follows.
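
The identity (5.4.9) can be checked numerically for a finite-state jump process, where $P_s$ is a matrix exponential. The following Python sketch evaluates $(P_u f)(x) - f(x) - \int_0^u (P_s \mathcal{L} f)(x)\, ds$ with Simpson's rule for every starting state x. The generator matrix, the test function f, the horizon u and the number of quadrature steps are hypothetical choices of ours.

```python
def mat_vec(a, v):
    """Apply a matrix (list of rows) to a function on the state space."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def semigroup(gen, t, terms=80):
    """Truncated Taylor series for exp(t L); adequate for bounded generators."""
    n = len(gen)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t * v / k for v in row] for row in mat_mul(term, gen)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Hypothetical generator of a three-state jump process and test function f.
L = [[-1.0, 0.5, 0.5],
     [2.0, -2.0, 0.0],
     [0.125, 0.375, -0.5]]
f = [1.0, -2.0, 0.5]
Lf = mat_vec(L, f)

u = 1.5
steps = 200  # must be even for Simpson's rule
h = u / steps
integral = [0.0, 0.0, 0.0]
for k in range(steps + 1):
    weight = 1.0 if k in (0, steps) else (4.0 if k % 2 == 1 else 2.0)
    values = mat_vec(semigroup(L, k * h), Lf)  # (P_s L f)(x) at s = k h
    for x in range(3):
        integral[x] += weight * values[x] * h / 3.0

P_u_f = mat_vec(semigroup(L, u), f)
# Left-hand side of (5.4.9), one entry per starting state x.
defect = max(abs(P_u_f[x] - f[x] - integral[x]) for x in range(3))
```

Up to quadrature and round-off error the defect vanishes, which is the finite-state instance of the cancellation that makes $(M_t)_{t \in \mathbb{R}_+}$ a martingale.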

By “the martingale problem” we will mean the inverse problem associated with
the above observation.

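Definition 5.31 Given a linear operator $\mathcal{L}$ with domain $D(\mathcal{L})$ and $\mathrm{range}(\mathcal{L}) \subset C_b(S)$,
an S-valued càdlàg process on a filtered space $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t \in \mathbb{R}_+})$
is called a solution of the martingale problem associated with the operator $\mathcal{L}$ if, for
any $f \in D(\mathcal{L})$, $(M_t)_{t \in \mathbb{R}_+}$ defined by (5.4.6) with $g = \mathcal{L} f$ is an $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$-martingale.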

Before we continue, we need some additional notions for convergence in Banach

spaces.

Definition 5.32 A sequence $(f_n)_{n \in \mathbb{N}}$ in B(S) is said to converge bounded pointwise
(bp) to a function $f \in B(S)$ if and only if
(i) $\sup_{n \in \mathbb{N}} \|f_n\|_\infty < \infty$.
(ii) For every $x \in S$, $\lim_{n \to \infty} f_n(x) = f(x)$.
A set $M \subset B(S)$ is called bp-closed if, for any sequence $(f_n)_{n \in \mathbb{N}}$ in M such that
$\text{bp-}\lim_{n \to \infty} f_n = f \in B(S)$, it is true that $f \in M$. The bp-closure of a set $D \subset B(S)$
is the smallest bp-closed set in B(S) that contains D. A set M is called bp-dense
if its bp-closure is B(S).

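Lemma 5.33 Let $(f_n)_{n \in \mathbb{N}}$ be such that $\text{bp-}\lim_{n \to \infty} f_n = f$ and $\text{bp-}\lim_{n \to \infty} \mathcal{L} f_n = \mathcal{L} f$. If

\[
  \left( f_n(X_t) - \int_0^t (\mathcal{L} f_n)(X_s)\, ds \right)_{t \in \mathbb{R}_+} \tag{5.4.10}
\]

is a martingale for all $n \in \mathbb{N}$, then

\[
  \left( f(X_t) - \int_0^t (\mathcal{L} f)(X_s)\, ds \right)_{t \in \mathbb{R}_+} \tag{5.4.11}
\]

is a martingale.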

Proof Straightforward.

The implication of Lemma 5.33 is that in order to find a unique solution of the
martingale problem it suffices to know the generator on a core.

Theorem 5.34 Let L1 be an operator with domain D(L1 ) and range range(L1 ),
and let L be an extension of L1 . Suppose that the bp-closures of the graphs of L1

and L are the same. Then a stochastic process X is a solution of the martingale
problem for L if and only if it is a solution of the martingale problem for L1 .

Proof This follows from Lemma 5.33.

The strategy will be to understand when the martingale problem has a unique
solution, and to show that this solution is a Markov process. It will be comforting
to see that only dissipative operators can give rise to the solution of a martingale
problem.
We first prove a result that gives an equivalent characterisation of the martingale
problem.

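Lemma 5.35 Let $\mathcal{L}$ be the generator of a continuous-time Markov process X. Suppose
that $f \in D(\mathcal{L})$ and $k \colon S \to \mathbb{R}$ is continuous and bounded from below. Then

\[
  \left( f(X_t) - \int_0^t (\mathcal{L} f)(X_s)\, ds \right)_{t \in \mathbb{R}_+} \tag{5.4.12}
\]

is a martingale if and only if

\[
  \left( e^{-\int_0^t k(X_s)\, ds} f(X_t) + \int_0^t e^{-\int_0^s k(X_r)\, dr} \big[ k(X_s) f(X_s) - (\mathcal{L} f)(X_s) \big]\, ds \right)_{t \in \mathbb{R}_+} \tag{5.4.13}
\]
is a martingale.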

Proof To prove this lemma we need the following theorem.

Theorem 5.36 Let M be a càdlàg local martingale (recall Definition 3.100), and
let V be a continuous and adapted process that is locally of bounded variation. Then
W = (Wt )t∈R+ with
\[
  W_t = \int_0^t V_s\, dM_s = V_t M_t - V_0 M_0 - \int_0^t M_s\, dV_s \tag{5.4.14}
\]

is a càdlàg local martingale as well.

Proof By the definition of local martingales, we can a.s. find an increasing sequence
of stopping times $(\tau_n)_{n \in \mathbb{N}}$ with $\lim_{n \to \infty} \tau_n = \infty$ such that $M^{\tau_n}$ is a martingale for
each $n \in \mathbb{N}$, where $M^{\tau_n}$ is the martingale M stopped at time $\tau_n$. We may, moreover,
assume that $|M^{\tau_n}| \le n$ and

\[
  R_V(t) = \sup_{m \in \mathbb{N}}\ \sup_{0 \le u_0 \le \dots \le u_m \le t}\ \sum_{k=0}^{m-1} |V_{u_{k+1}} - V_{u_k}| \le n. \tag{5.4.15}
\]

We have

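\[
  \int_0^t V_s\, dM_s^{\tau_n} = \lim_{n \to \infty} \sum_{k=0}^{m-1} V_{u_k^n} \left( M^{\tau_n}_{u_{k+1}^n} - M^{\tau_n}_{u_k^n} \right), \tag{5.4.16}
\]

where $(u_k^n)_{n \in \mathbb{N}}$ is any sequence of partitions of [0, t] such that

\[
  \lim_{n \to \infty} \max_{0 \le k \le n} \big| u_{k+1}^n - u_k^n \big| = 0. \tag{5.4.17}
\]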

This limit exists since, by elementary reshuffling,

\[
  \sum_{k=0}^{m-1} V_{u_k^n} \left( M^{\tau_n}_{u_{k+1}^n} - M^{\tau_n}_{u_k^n} \right) = V_t M_t^{\tau_n} - V_0 M_0^{\tau_n} - \sum_{k=0}^{m-1} M^{\tau_n}_{u_{k+1}^n} \left( V_{u_{k+1}^n} - V_{u_k^n} \right). \tag{5.4.18}
\]

Since V is of bounded variation and $M^{\tau_n}$ is bounded, the latter sum converges to
$\int_0^t M_s\, dV_s$, both a.s. and in $L^1$, as n → ∞. As a consequence, the same is true
for the left-hand side. Since, for any $n \in \mathbb{N}$, the left-hand side is a martingale, this
property remains true in the limit as n → ∞. The limit as n → ∞ exists because
$\lim_{n \to \infty} \tau_n = \infty$ a.s.

The proof of Lemma 5.35 follows from Theorem 5.36. Indeed, choose
$M_t = f(X_t) - \int_0^t (\mathcal{L} f)(X_s)\, ds$ and $V_t = \exp(-\int_0^t k(X_s)\, ds)$. A tedious but straightforward
computation (which uses Fubini’s theorem) shows that the expression in
(5.4.13) is of the form $V_t M_t - \int_0^t M_s\, dV_s$ and hence defines a martingale.

Corollary 5.37 Let (Ft )t∈R+ be a filtration and X an adapted process. Let f, g ∈
B(S). Then, for λ > 0, (5.4.6) is a martingale if and only if
\[
  \left( e^{-\lambda t} f(X_t) + \int_0^t e^{-\lambda s} \big[ \lambda f(X_s) - g(X_s) \big]\, ds \right)_{t \in \mathbb{R}_+} \tag{5.4.19}
\]

is a martingale.

We use this corollary to establish the following.

Lemma 5.38 Let L be a linear operator with domain and range in B(S). If a
solution of the martingale problem for L exists for any initial condition X0 = x ∈ S,
then L is dissipative.

Proof Let $f \in D(\mathcal{L})$ and $g = \mathcal{L} f$. Use that (5.4.19) is a martingale with λ > 0.
Taking expectations and letting t → ∞, we get

\[
  f(X_0) = f(x) = \mathbb{E}\left[ \int_0^\infty e^{-\lambda s} \big( \lambda f(X_s) - g(X_s) \big)\, ds \right] \tag{5.4.20}
\]

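and hence

\[
  |f(x)| \le \int_0^\infty e^{-\lambda s}\, \mathbb{E}\big|\lambda f(X_s) - g(X_s)\big|\, ds \le \int_0^\infty e^{-\lambda s} \|\lambda f - g\|\, ds = \lambda^{-1} \|\lambda f - g\|, \tag{5.4.21}
\]
which shows that $\lambda \|f\| \le \|(\lambda - \mathcal{L}) f\|$ and proves that $\mathcal{L}$ is dissipative.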

We already know that martingales typically have a càdlàg modification. Provided

the set of functions on which we have defined our martingale problem is sufficiently
rich, this property ought to carry over to the solution of the martingale problem as
well. The following theorem shows when this is true.

Theorem 5.39 Suppose that S is separable, D(L ) ⊂ Cb (S), D(L ) is separating

and contains a countable subset that separates points. If X is a solution of the
associated martingale problem, and if for any ε > 0 and T < ∞ there exists a
compact set Kε,T ⊂ S such that

\[
  \mathbb{P}\big( \forall\, t \in [0, T] \cap \mathbb{Q} : X_t \in K_{\varepsilon,T} \big) > 1 - \varepsilon, \tag{5.4.22}
\]

then X has a càdlàg modification.

Proof See Ethier and Kurtz [104, Chap. 4, Theorem 3.6].

5.4.3 Uniqueness

We have seen that solutions of the martingale problem provide candidates for nice
Markov processes. The two main issues to understand are when a martingale prob-
lem has a unique solution and whether this solution represents a Markov process.
When talking about uniqueness we will always assume that an initial distribution μ
is given. Thus, the data for the martingale problem is a pair (L , μ), where L is a
linear operator with its domain D(L ) and μ is a probability measure on S.
The following result is hardly surprising.

Theorem 5.40 Let S be separable and let $\mathcal{L}$ be a linear dissipative operator on
B(S) with domain $D(\mathcal{L}) \subset B(S)$. Suppose there exists an $\mathcal{L}'$ with domain $D(\mathcal{L}') \subset D(\mathcal{L})$
such that $\mathcal{L}$ is an extension of $\mathcal{L}'$. Let $\overline{D(\mathcal{L}')} = \overline{\mathrm{range}(\lambda - \mathcal{L}')} = D$, and
let D be separating. Let X be a solution of the martingale problem for $(\mathcal{L}, \mu)$. Then
X is a Markov process whose semigroup on D is generated by the closure of $\mathcal{L}'$,
and the martingale problem for $(\mathcal{L}, \mu)$ has a unique solution.

Proof See Ethier and Kurtz [104, Sect. 4.4].

Finally we can establish a uniqueness criterion and the strong Markov property
for solutions of martingale problems.

Theorem 5.41 Let S be a separable space and let L be a linear operator on B(S).
Suppose that for any initial distribution μ, any two solutions X, Y of the martingale
problem for (L , μ) have the same one-dimensional distributions, i.e., P(Xt ∈ A) =
P(Yt ∈ A) for any t ∈ R+ and any Borel set A. Then the following hold:
(i) Any solution of the martingale problem for L is a Markov process and any
two solutions of the martingale problem with the same initial distribution have
the same finite-dimensional distributions (i.e., uniqueness holds).
(ii) If D(L ) ⊂ Cb (S) and X is a solution of the martingale problem with càdlàg
sample paths, then for any a.s. finite stopping time τ ,

\[
  \mathbb{E}\big[f(X_{t+\tau}) \mid \mathcal{F}_\tau\big] = \mathbb{E}\big[f(X_{t+\tau}) \mid X_\tau\big] \qquad \forall\, f \in B(S). \tag{5.4.23}
\]

(iii) If, in addition to the assumptions in (ii), there exists a càdlàg solution of the
martingale problem for any initial measure μ = δx , x ∈ S, then the strong
Markov property holds, i.e.,

\[
  \mathbb{E}\big[f(X_{t+\tau}) \mid \mathcal{F}_\tau\big] = (P_t f)(X_\tau). \tag{5.4.24}
\]

Proof See Ethier and Kurtz [104, Sect. 4.4.].

Note that in Theorem 5.41 we made no assumptions on the choice of D(L ).

In particular, it need not separate points, as in Theorem 5.40. The latter is in
fact implicit in the requirement that uniqueness of the one-dimensional marginals
holds. This leads us to the following observation: for a martingale problem unique-
ness of the one-dimensional marginals implies uniqueness of the finite-dimensional
marginals.

5.4.4 Existence

We have seen that a uniquely solvable martingale problem provides a way to con-
struct a Markov process. We therefore need to find ways to produce solutions of
martingale problems. The best way to do this is through approximations and weak
convergence.

Lemma 5.42 Let $\mathcal{L}$ be a linear operator with domain and range in $C_b(S)$. Let
$(\mathcal{L}_n)_{n \in \mathbb{N}}$ be a sequence of linear operators with domain and range in B(S). Assume
that for any $f \in D(\mathcal{L})$ there exists a sequence $(f_n)_{n \in \mathbb{N}}$ with $f_n \in D(\mathcal{L}_n)$ such that

\[
  \lim_{n \to \infty} \|f_n - f\| = 0 \quad \text{and} \quad \lim_{n \to \infty} \|\mathcal{L}_n f_n - \mathcal{L} f\| = 0. \tag{5.4.25}
\]

If, for each $n \in \mathbb{N}$, $X^n$ is a solution of the martingale problem for $\mathcal{L}_n$ with càdlàg
sample paths and $X^n$ converges to X weakly, then X is a càdlàg solution of the
martingale problem for $\mathcal{L}$.

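Proof Let $k \in \mathbb{N}$, and let $0 \le t_1 < \dots < t_k \le t < s$ be elements of the set
$C(X) = \{u \in \mathbb{R}_+ : \mathbb{P}(X_u = X_{u-}) = 1\}$. Let $h_1, \dots, h_k \in C_b(S)$, and let $f, f_n$ be as in the
hypothesis of the lemma. Then

\[
\begin{aligned}
&\mathbb{E}\left[ \left( f(X_s) - f(X_t) - \int_t^s (\mathcal{L} f)(X_u)\, du \right) \prod_{i=1}^k h_i(X_{t_i}) \right] \\
&\qquad = \lim_{n \to \infty} \mathbb{E}\left[ \left( f_n(X_s^n) - f_n(X_t^n) - \int_t^s (\mathcal{L}_n f_n)(X_u^n)\, du \right) \prod_{i=1}^k h_i(X_{t_i}^n) \right] = 0.
\end{aligned} \tag{5.4.26}
\]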
The complement of the set C (X) is at most countable, and hence (5.4.26) carries
over to all points 0 ≤ t1 < · · · < tk ≤ t < s. But this implies that X solves the mar-
tingale problem for L .

The usefulness of Lemma 5.42 is based on the following lemma, which implies
that we can use Markov jump processes as approximations.

Lemma 5.43 Let S be compact and let L be a dissipative operator on C(S) with
dense domain and L 1 = 0. Then there exists a sequence of positive contraction
operators (Tn )n∈N on B(S) given by transition kernels such that, for f ∈ D(L ),
\[
  \lim_{n \to \infty} n (T_n - 1) f = \mathcal{L} f. \tag{5.4.27}
\]

Proof Here is a rough sketch of the proof, which is closely related to the Hille-Yosida
theorem. From $\mathcal{L}$ we construct the resolvent $(n - \mathcal{L})^{-1}$ on the range of
$(n - \mathcal{L})$. For a dissipative $\mathcal{L}$, the operators $n(n - \mathcal{L})^{-1}$ are bounded (by 1) on
$\mathrm{range}(n - \mathcal{L})$. Hence, by the Hahn-Banach theorem, they can be extended to C(S)
as bounded operators. Using the Riesz representation theorem, we can then associate
with $n(n - \mathcal{L})^{-1}$ a probability measure $\mu_n$ via

\[
  n (n - \mathcal{L})^{-1} f(x) = \int_S f(y)\, \mu_n(x, dy), \tag{5.4.28}
\]

and so $n(n - \mathcal{L})^{-1} = T_n$ defines a Markov transition kernel. Finally, it remains
to show that $n(T_n - 1) f = n \mathcal{L} (n - \mathcal{L})^{-1} f = T_n \mathcal{L} f$ converges to $\mathcal{L} f$ for $f \in D(\mathcal{L})$,
which is straightforward.

The point of Lemma 5.43 is that it shows that the martingale problem for L can
be approximated by martingale problems with bounded generators of the form

\[
  (\mathcal{L}_n f)(x) = n \int_S \big[ f(y) - f(x) \big]\, \mu_n(x, dy), \tag{5.4.29}
\]

where Ln is the generator of a Markov jump process. For such a generator, the
construction of a solution can be done explicitly in various ways, e.g. by letting the
transition kernel be the convergent series for exp(tLn ).
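
The approximation in Lemma 5.43 can be made concrete on a finite state space, where $T_n = n(n - \mathcal{L})^{-1}$ is obtained by solving a linear system. The following Python sketch checks, for a hypothetical three-state generator, that each $T_n$ is a stochastic matrix and that $n(T_n - 1)f$ approaches $\mathcal{L} f$ with an error of at most $\|\mathcal{L}^2 f\|/n$ (this bound follows from $n(T_n - 1)f - \mathcal{L} f = n^{-1} T_n \mathcal{L}^2 f$ and $\|T_n\| \le 1$). The matrix, the test function and the helper names are assumptions of ours.

```python
def sup_norm(f):
    return max(abs(v) for v in f)


def apply(matrix, f):
    return [sum(row[j] * f[j] for j in range(len(f))) for row in matrix]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [matrix[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def yosida_kernel(gen, n):
    """T_n = n (n - L)^{-1}, computed column by column."""
    size = len(gen)
    n_minus_L = [[(n if i == j else 0.0) - gen[i][j] for j in range(size)]
                 for i in range(size)]
    columns = [solve(n_minus_L, [1.0 if i == j else 0.0 for i in range(size)])
               for j in range(size)]
    return [[n * columns[j][i] for j in range(size)] for i in range(size)]


# Hypothetical generator of a three-state jump process and test function f.
L = [[-1.0, 0.5, 0.5],
     [2.0, -2.0, 0.0],
     [0.125, 0.375, -0.5]]
f = [1.0, -2.0, 0.5]
Lf = apply(L, f)
LLf = apply(L, Lf)

errors = {}
kernels_ok = True
for n in (10, 100, 1000):
    T = yosida_kernel(L, float(n))
    # T_n should have non-negative entries and rows summing to one.
    if min(min(row) for row in T) < -1e-12:
        kernels_ok = False
    if max(abs(sum(row) - 1.0) for row in T) > 1e-9:
        kernels_ok = False
    Tf = apply(T, f)
    approx = [n * (Tf[x] - f[x]) for x in range(3)]
    errors[n] = max(abs(approx[x] - Lf[x]) for x in range(3))

bound_ok = all(errors[n] <= sup_norm(LLf) / n + 1e-9 for n in errors)
```

In this finite setting no Hahn-Banach extension is needed, since $n - \mathcal{L}$ is invertible on the whole space; the sketch only illustrates the convergence statement (5.4.27).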

5.5 Itō calculus

An important class of stochastic processes that exhibit metastability are solutions of
stochastic differential equations, to be treated in Sect. 5.6. This requires the notion
of stochastic integrals. In this section we give a brief outline of the main concepts.
For further reading, see e.g. the monographs by Karatzas and Shreve [148] and by
Itō and McKean [142].
In this section we will work on a filtered space (Ω, F , P, (Ft )t∈R+ ) that satisfies
the conditions of the “usual setting” of Definition 3.83. We are interested in
defining stochastic integrals of the form
\[
  \int_0^t X_s\, dM_s, \tag{5.5.1}
\]

where M is a martingale and X is a progressive process. In fact, the full ambition

of stochastic analysis is to find the largest class of pairs of processes M and X
for which such an integral can be reasonably defined, which leads to the notion of
semi-martingale, but here we will limit our ambition to the considerably simpler
case when M is a continuous square-integrable martingale, i.e., when M has a.s.
continuous paths and $\mathbb{E}[M_t^2] < \infty$ for all $t \in \mathbb{R}_+$. This includes the important case
where M is a Brownian motion.
In the sequel we will sometimes state results only for martingales. But these can
all be extended to local martingales.

5.5.1 Square-integrable continuous martingales

The definition of the stochastic integral was already provided in Theorem 5.36, where the integral $\int_0^t V_s \, dM_s$ was defined through a Stieltjes integral in the case where $V_t$ has (locally) bounded variation. The challenge is therefore to define stochastic integrals when the integrand, too, is not of bounded variation. Before doing so we need to return briefly to the theory of martingales.
Let M be a càdlàg martingale. We want to define its quadratic variation process
[M] in analogy with the discrete-time setting. This will be contained in the following
fundamental result.

Theorem 5.44 Let M be a continuous square-integrable martingale. Then there exists a unique increasing process [M] such that the process $M^2 - [M]$ is a uniformly integrable continuous martingale.

Proof We will only consider the case where M is continuous. We may also assume
that M is bounded: otherwise we consider the martingale stopped when it exceeds a
finite value N . Define stopping times

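\[ T_0^n = 0, \qquad T_{k+1}^n = \inf\bigl\{ t > T_k^n : |M_t - M_{T_k^n}| \ge 2^{-n} \bigr\}. \tag{5.5.2} \]

Set $t_k^n = t \wedge T_k^n$. Assuming that $M_0 = 0$, we can write (by telescopic expansions)

\[ M_t^2 = 2 \sum_{k\in\mathbb{N}} M_{t_{k-1}^n}\bigl(M_{t_k^n} - M_{t_{k-1}^n}\bigr) + \sum_{k\in\mathbb{N}} \bigl(M_{t_k^n} - M_{t_{k-1}^n}\bigr)^2. \tag{5.5.3} \]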

Let

\[ H_t^n = \sum_{k\in\mathbb{N}} M_{T_{k-1}^n} \, 1_{T_{k-1}^n < t \le T_k^n}. \tag{5.5.4} \]

Note that H n = (Htn )t∈R+ is left-continuous, which makes it previsible. The first
term in the right-hand side of (5.5.3) is (H n • M)t , the so-called discrete stochastic
integral (see (3.4.4)), and we know from Theorem 3.53 that this is an L2 -bounded
martingale. We define

\[ A_t^n = \sum_{k\in\mathbb{N}} \bigl(M_{t_k^n} - M_{t_{k-1}^n}\bigr)^2. \tag{5.5.5} \]

Then

\[ M_t^2 = 2\,(H^n \bullet M)_t + A_t^n. \tag{5.5.6} \]
By construction, H n approximates M well:

\[ \sup_{t\in\mathbb{R}_+} \bigl|H_t^n - H_t^{n+1}\bigr| \le 2^{-n-1}, \qquad \sup_{t\in\mathbb{R}_+} \bigl|H_t^n - M_t\bigr| \le 2^{-n}. \tag{5.5.7} \]

The sets $J_n(\omega) = \{T_k^n(\omega), k \in \mathbb{N}\}$ refine each other, i.e., $J_n(\omega) \subset J_{n+1}(\omega)$, and

\[ A^n_{T_k^n} \le A^n_{T_{k+1}^n}. \tag{5.5.8} \]

Now it is elementary to see that

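\begin{align*}
\mathbb{E}\Bigl[\bigl((H^n - H^{n+1}) \bullet M\bigr)_\infty^2\Bigr]
&= \mathbb{E}\Bigl[\sum_{k\in\mathbb{N}} \bigl(H^n_{t_{k-1}} - H^{n+1}_{t_{k-1}}\bigr)^2 (M_{t_k} - M_{t_{k-1}})^2\Bigr] \\
&\le 2^{-2n-2}\, \mathbb{E}\Bigl[\sum_{k\in\mathbb{N}} (M_{t_k} - M_{t_{k-1}})^2\Bigr] \\
&= 2^{-2n-2}\, \mathbb{E}\bigl[M_\infty^2\bigr]. \tag{5.5.9}
\end{align*}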

Thus, the continuous martingales $(H^n \bullet M)$ converge uniformly as $n \to \infty$ to a continuous martingale, say N. This implies that the processes $A^n$ converge as $n \to \infty$ to some continuous process, say A, and

\[ M_t^2 = 2N_t + A_t. \tag{5.5.10} \]

Since the sets $J_n(\omega)$ form refinements and $A^n$ increases along the stopping times $T_k^n$, it follows that

\[ A_{T_k^n} \le A_{T_{k+1}^n} \tag{5.5.11} \]

for all k, n. So A is increasing on the closure of $J(\omega) = \bigcup_{n\in\mathbb{N}} J_n(\omega)$. Thus, if $J(\omega)$ is dense, then A is increasing. The remaining option is that the complement of $J(\omega)$ contains some open interval I. But in that case, since no $T_k^n$ is in I, M must be constant on I, and so must A. Thus, A is a continuous increasing process such that $M^2 - A$ is a continuous martingale, and hence A = [M].
It remains to show the uniqueness of the process [M]. For this we use the follow-
ing lemma.

Lemma 5.45 Suppose that M is a continuous local martingale that has paths of
finite variation. If M0 = 0, then Mt = 0 for all t.

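Proof Again, by stopping M at $\tau_n = \inf\{t \in \mathbb{R}_+ : V_M(t) > n\}$, where

\[ V_M(t) = \lim_{n\to\infty} \sum_{k\in\mathbb{N}} \bigl|M_{u_k^n} - M_{u_{k-1}^n}\bigr| \tag{5.5.12} \]

is the total variation process, we may assume that M has bounded total variation. Then, obviously,

\[ A_t^n = \sum_{k\in\mathbb{N}} \bigl(M_{t_k^n} - M_{t_{k-1}^n}\bigr)^2 \le 2^{-n} \sum_{k\in\mathbb{N}} \bigl|M_{t_k^n} - M_{t_{k-1}^n}\bigr| \le 2^{-n}\, V_M(t), \tag{5.5.13} \]

which tends to zero as $n \to \infty$. Thus, by (5.5.6), $M^2$ is a martingale. So $\mathbb{E}[M_t^2] = \mathbb{E}[M_0^2] = 0$ for all t, and a non-negative random variable of zero mean is zero a.s.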

Uniqueness of [M] follows from the above observations. Indeed, assume that there are two processes $A$, $A'$ with the desired properties. Then $A - A' = (M^2 - A') - (M^2 - A)$ is the difference of two uniformly integrable martingales, and hence is itself a uniformly integrable martingale. On the other hand, since $A$ and $A'$ are increasing and hence of finite variation, their difference is of finite variation, and thus is identically zero by Lemma 5.45.

It will be convenient to note the following fact.

Theorem 5.46 Let M be a càdlàg martingale. Then, for any $t \in \mathbb{R}_+$ and any sequence of partitions $\{u_k^n\}$ of the interval [0, t] such that $\lim_{n\to\infty} \max_{k\in\mathbb{N}} |u_k^n - u_{k-1}^n| = 0$,

\[ \sum_{k\in\mathbb{N}} \bigl(M_{u_{k+1}^n} - M_{u_k^n}\bigr)^2 \xrightarrow{\;D\;} [M]_t. \tag{5.5.14} \]

Moreover, if M is square integrable, then the convergence also holds in $L^1$.

The proof of this theorem is somewhat technical and will not be included. See
e.g. Ethier and Kurtz [104].
For the case where M is Brownian motion, we have the following fact.

Lemma 5.47 If B is standard Brownian motion, then [B]t = t.

Recall from the discrete-time theory that there were two brackets associated with a martingale: $\langle M \rangle$ and $[M]$. The first corresponds to the process given by Theorem 5.44, the second is the quadratic variation process. In the case of continuous martingales, they are the same.
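As a numerical illustration of Lemma 5.47 and Theorem 5.46, the following minimal sketch samples a Brownian path on a uniform grid and sums the squared increments. The helper names `brownian_path` and `quadratic_variation`, the grid sizes and the seed are our own choices and do not come from the text.

```python
import numpy as np

def brownian_path(t, n_steps, rng):
    """Sample standard Brownian motion on [0, t] at n_steps + 1 equally spaced times."""
    dt = t / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

def quadratic_variation(path):
    """Sum of squared increments of a discretely sampled path."""
    return float(np.sum(np.diff(path) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)  # arbitrary fixed seed
    t = 2.0
    for n_steps in (100, 10_000, 100_000):
        # The printed values should approach t as the mesh is refined.
        print(n_steps, quadratic_variation(brownian_path(t, n_steps, rng)))
```

The sum fluctuates around t with a standard deviation of order $t\sqrt{2/n}$ for n grid intervals, so refining the partition tightens the approximation.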

5.5.2 Stochastic integrals for simple processes

We have already seen that the stochastic integral can be defined as a Stieltjes integral
for integrators of bounded variation. We will now show the crucial connection be-
tween the quadratic variation process of the stochastic integral and the process [M].
We begin with the case where the integrand X is a step function.

Definition 5.48 A stochastic process is called a simple process if its sample paths are step functions of the form

\[ X_t(\omega) = \sum_{i=1}^{\infty} x_i(\omega)\, 1_{t_{i-1} < t \le t_i}, \tag{5.5.15} \]

for some increasing sequence $t_i \in \mathbb{R}_+$ and random variables $x_i \in \mathbb{R}$ that are $\mathcal{F}_{t_{i-1}}$-measurable. The space of simple processes is denoted by $\mathcal{E}_b$.

Clearly, the stochastic integral for such a function is defined and equals

\[ \int_0^t X_s \, dM_s = \sum_{t_i \le t} x_i \bigl(M(t_i) - M(t_{i-1})\bigr) + x_{m(t)+1} \bigl(M(t) - M(t_{m(t)})\bigr), \]

where $m(t) = \max\{m \in \mathbb{N} : t_m \le t\}$.

The following lemma states the crucial properties of stochastic integrals.

Lemma 5.49 On some filtered probability space, let M be a continuous square-integrable martingale and $X \in \mathcal{E}_b$ a simple process. Then $\int_0^t X_s \, dM_s$ as defined above is a continuous square-integrable martingale and

\[ \Bigl[\int_0^{\cdot} X \, dM\Bigr]_t = \int_0^t X^2 \, d[M]. \tag{5.5.16} \]

Proof The proof is straightforward and will be skipped. See Ethier and Kurtz [104,
Sect. 5.2.].

The stochastic integral for more general integrands should share the properties stated in Lemma 5.49. The natural goal is to extend the integral to integrands X for which the objects characterising it make sense. Note, in particular, that it follows from (5.5.16) that

\[ \mathbb{E}\Bigl[\Bigl(\int_0^t X_s \, dM_s\Bigr)^2\Bigr] = \mathbb{E}\Bigl[\int_0^t X_s^2 \, d[M]_s\Bigr]. \tag{5.5.17} \]

This means that the map $X \mapsto \int_0^t X_s \, dM_s$, from the space of left-continuous step-functions equipped with the norm

\[ \|X\|_{2,d[M]} = \Bigl(\mathbb{E}\Bigl[\int_0^t X_s^2 \, d[M]_s\Bigr]\Bigr)^{1/2} \tag{5.5.18} \]

to the space of local square-integrable martingales with the norm $L^2(\mathbb{P})$, is an isometry, called the Itō isometry. We will extend this isometry to all of $L^2(d[M])$ to define the Itō integral.
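The isometry (5.5.17) can be checked by simulation in the simplest case. The sketch below assumes M = B is standard Brownian motion (so that $d[M]_s = ds$) and takes the simple integrand $x_i = B_{t_{i-1}}$ on a uniform grid; the function name `simple_integral_moments` and all parameter values are illustrative choices of ours.

```python
import numpy as np

def simple_integral_moments(n_paths, m, T, rng):
    """Monte Carlo estimates of both sides of the Ito isometry for M = B.

    The integrand is the simple process with value B_{t_{i-1}} on (t_{i-1}, t_i],
    where t_i = i * T / m, which is F_{t_{i-1}}-measurable as required.
    """
    dt = T / m
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, m))
    B = np.cumsum(dB, axis=1)
    # Left-endpoint values B_{t_0}, ..., B_{t_{m-1}} with B_{t_0} = 0.
    X = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])
    integral = np.sum(X * dB, axis=1)                   # int_0^T X dB, path by path
    lhs = float(np.mean(integral ** 2))                 # E[(int X dB)^2]
    rhs = float(np.mean(np.sum(X ** 2, axis=1) * dt))   # E[int X^2 ds]
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = simple_integral_moments(200_000, 10, 1.0, np.random.default_rng(0))
    print(lhs, rhs)  # both estimate T^2 (m - 1) / (2 m)
```

For this integrand both sides equal $\sum_i t_{i-1}(t_i - t_{i-1}) = T^2(m-1)/(2m)$, which gives an exact value to compare against.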

Theorem 5.50 Let M be a continuous square-integrable local martingale, and let $X \in L^2(d[M])$. Then there exists a unique continuous square-integrable local martingale $(\int_0^t X_s \, dM_s)_{t\in\mathbb{R}_+}$ such that for any sequence of left-continuous step-functions $(X_n)_{n\in\mathbb{N}}$ that satisfies

\[ \sum_{n\in\mathbb{N}} \Bigl(\mathbb{E}\Bigl[\int_0^n (X_n - X)_s^2 \, d[M]_s\Bigr]\Bigr)^{1/2} < \infty, \tag{5.5.19} \]

it is true that

\[ \lim_{n\to\infty} \sup_{0\le t\le T} \Bigl|\int_0^t (X_n - X)_s \, dM_s\Bigr| = 0, \tag{5.5.20} \]

a.s. and in $L^2$. Moreover,

\[ \Bigl[\int_0^{\cdot} X_s \, dM_s\Bigr]_t = \int_0^t X_s^2 \, d[M]_s. \tag{5.5.21} \]

Proof See Ethier and Kurtz [104, Sect. 5.2, Theorem 2.3].

Remark 5.51 Theorem 5.50 extends the isometry $X \mapsto \int X \, dM$ from the dense set of left-continuous bounded step-functions to the full space $L^2(d[M])$.

Remark 5.52 Theorem 5.50 is not the end of the possible extensions of the defini-
tion of stochastic integrals. Using localisation arguments as indicated in the defini-
tion of the bracket [M], we can extend the space of integrators to continuous local
martingales without the assumption of square integrability.

5.5.3 Itō formula

We now come to a most useful formula involving the notion of stochastic integrals,
the celebrated Itō formula. This formula is the analogue for functions of stochastic
processes with unbounded variation of the fundamental theorem of calculus.
We consider a stochastic process X of the form

X = X0 + V + M, (5.5.22)

where V is a continuous adapted process of bounded variation, M is a local martingale (which we may assume to be in $L^2$ by Remark 5.51), and $V_0 = M_0 = 0$. Let $f : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ be continuously differentiable in the first argument and twice continuously differentiable in the second argument.

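Theorem 5.53 (Itō formula) With the assumptions stated above, the following holds:

\begin{align*}
f(t, X_t) - f(0, X_0) ={}& \int_0^t \frac{\partial}{\partial s} f(s, X_s)\, ds + \int_0^t \frac{\partial}{\partial x} f(s, X_s)\, dV_s \\
&+ \int_0^t \frac{\partial}{\partial x} f(s, X_s)\, dM_s + \frac12 \int_0^t \frac{\partial^2}{\partial x^2} f(s, X_s)\, d[M]_s. \tag{5.5.23}
\end{align*}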

Remark 5.54 The Itō formula can be stated more conveniently in differential form as

\[ df(t, X_t) = \frac{\partial}{\partial t} f(t, X_t)\, dt + \frac{\partial}{\partial x} f(t, X_t)\, dX_t + \frac12 \frac{\partial^2}{\partial x^2} f(t, X_t)\, d[X]_t, \tag{5.5.24} \]

with the understanding that $d[X] = d[M]$, since the quadratic variation of the finite variation process V is zero.

Proof See Ethier and Kurtz [104, Sect. 5.2].
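A short worked instance may help to see how (5.5.23) is used. The choice $f(t,x) = x^2$ and $X = B$ (standard Brownian motion, so $V = 0$, $M = B$ and $[B]_t = t$ by Lemma 5.47) is ours and serves only as an illustration.

```latex
% Worked example (illustrative choice): f(t,x) = x^2, X = B, V = 0, M = B, [B]_t = t.
\begin{align*}
\frac{\partial f}{\partial t} &= 0, \qquad
\frac{\partial f}{\partial x} = 2x, \qquad
\frac{\partial^2 f}{\partial x^2} = 2, \\
B_t^2 &= \int_0^t 2 B_s \, dB_s + \frac12 \int_0^t 2 \, d[B]_s
       = 2 \int_0^t B_s \, dB_s + t .
\end{align*}
```

In particular $B_t^2 - t = 2\int_0^t B_s\,dB_s$ is a stochastic integral with respect to B, in line with Theorem 5.44, and the extra term t is exactly the correction to the classical chain rule.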

5.6 Stochastic differential equations

In this section we construct stochastic processes $X = (X_t)_{t\in\mathbb{R}_+}$ in $\mathbb{R}^d$ that solve differential equations of the form

\[ dX_t = b(t, X_t)\, dt + \sigma(t, X_t)\, dB_t \tag{5.6.1} \]

with a prescribed initial condition $X_0 = x_0$, where B is Brownian motion. The interpretation of (5.6.1) is not entirely straightforward due to the term $\sigma(t, X_t)\, dB_t$. We will therefore interpret it as the integral equation

\[ X_t = x_0 + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s, \tag{5.6.2} \]

where the integral with respect to B is understood as the Itō stochastic integral.
In the most general setting the drift functions b and the diffusion functions σ are
assumed to be locally bounded and measurable.
The questions we are interested in are existence and uniqueness of solutions of
(5.6.2), as well as properties of these solutions. In Sects. 5.6.1–5.6.2 below we dis-
cuss the notion of strong solution. For stochastic differential equations there is also
the notion of weak solution, but we will skip that here.

5.6.1 Strong solutions

Abbreviate $W = C(\mathbb{R}_+, \mathbb{R}^n)$, and let $\mathcal{H} = \mathcal{B}(W)$ be its Borel σ-algebra. Consider the filtration $(\mathcal{H}_t)_{t\in\mathbb{R}_+}$ with $\mathcal{H}_t$ the σ-algebra generated by the paths up to time t. We denote by $\bar{\mathcal{H}}_t$ the augmentation of $\mathcal{H}_t$ with respect to the Wiener measure. The formal set-up for a stochastic differential equation involves an initial condition and a Brownian motion, both of which require a probability space. We denote the set-up by

\[ \bigl(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t\in\mathbb{R}_+}, \xi, B\bigr), \tag{5.6.3} \]

where
(i) $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t\in\mathbb{R}_+})$ is a filtered space satisfying the usual conditions.
(ii) B is a Brownian motion on $\mathbb{R}^d$ adapted to $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$.
(iii) ξ is an $\mathcal{F}_0$-measurable random variable.
The canonical set-up has $\Omega = \mathbb{R}^n \times W$ and $\mathbb{P} = \mu \times Q$, where μ is the law of ξ, Q is the Wiener measure, and $\mathcal{F}_t$ is the usual augmentation of $\mathcal{F}_t^0 = \sigma\{\xi, B_s, 0 \le s \le t\}$.
The precise definition of pathwise uniqueness of an SDE is as follows.

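Definition 5.55 For an SDE pathwise uniqueness holds if the following is true. For any set-up $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t\in\mathbb{R}_+}, \xi, B)$ and any two continuous semi-martingales $X$ and $X'$ such that

\[ \int_0^t \bigl( |b(s, X_s)| + |\sigma(s, X_s)|^2 \bigr)\, ds < \infty \tag{5.6.4} \]

(and likewise for $X'$), both solving the SDE with initial condition ξ and Brownian motion B,

\[ \mathbb{P}\bigl(X_t = X'_t \;\; \forall\, t \in \mathbb{R}_+\bigr) = 1. \tag{5.6.5} \]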

If an SDE admits for any set-up (Ω, F , P, (Ft )t∈R+ , ξ, B) exactly one continuous
semi-martingale as solution, then we say that it is exact.

The notion of strong solution is naturally associated with the setting of exact
SDEs.

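Definition 5.56 A strong solution of an SDE is a function

\[ F : \mathbb{R}^n \times W \to W \tag{5.6.6} \]

such that

\[ F^{-1}(\mathcal{H}_t) \subset \mathcal{B}(\mathbb{R}^n) \times \bar{\mathcal{H}}_t \quad \forall\, t \in \mathbb{R}_+ \tag{5.6.7} \]

and on any set-up $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t\in\mathbb{R}_+}, \xi, B)$ the process $X = F(\xi, B)$ solves the SDE.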

5.6.2 Existence and uniqueness of strong solutions

Existence and uniqueness results in the strong sense can be proven in a similar
way as for the case of ordinary differential equations, with the help of Gronwall’s
inequality and Picard’s iteration scheme. We recall Gronwall’s lemma, whose proof
is elementary.

Lemma 5.57 Let $f, g : [0, b) \to \mathbb{R}_+$ be continuous nonnegative maps. Suppose that, for some $A \ge 0$, f satisfies

\[ f(t) \le A + \int_0^t f(s) g(s)\, ds, \qquad t \in [0, b). \tag{5.6.8} \]

Then

\[ f(t) \le A \exp\Bigl(\int_0^t g(s)\, ds\Bigr), \qquad t \in [0, b). \tag{5.6.9} \]

In particular, if (5.6.8) holds with A = 0, then f = 0.

Lemma 5.57 is at the heart of all uniqueness proofs for differential equations. The idea is always the same: given a differential equation $x' = F(s, x)$, try to find a norm $\|x\|$ and a Lipschitz bound $\|F(s, x) - F(s, y)\| \le g(s)\|x - y\|$, at least for small times s and small $\|x - y\|$. For two solutions x, y with initial values $x_0$, $y_0$, set $f(t) = \|x_t - y_t\|$. Then

\[ f(t) \le \|x_0 - y_0\| + \int_0^t g(s) f(s)\, ds, \tag{5.6.10} \]

so that we can apply Gronwall’s lemma to get (at least locally)

\[ f(t) \le \|x_0 - y_0\| \exp\Bigl(\int_0^t g(s)\, ds\Bigr). \tag{5.6.11} \]

This implies uniqueness of solutions with the same initial conditions.

The general approach is to assume local Lipschitz conditions to prove existence of solutions for finite times, and then glue solutions together until a possible explosion. Let us give the basic uniqueness and existence results due to Itō.

Theorem 5.58 Assume that b and σ are bounded and measurable, and that there exist an open set $U \subset \mathbb{R}$, $T > 0$ and $K < \infty$ such that

\[ |b(t, x) - b(t, y)| + |\sigma(t, x) - \sigma(t, y)| \le K|x - y|, \qquad x, y \in U,\; 0 \le t \le T. \tag{5.6.12} \]

Let X, Y be two solutions of (5.6.2) (with the same Brownian motion B), and set

\[ \tau = \inf\{t \in \mathbb{R}_+ : X_t \notin U \text{ or } Y_t \notin U\}. \tag{5.6.13} \]

If $\mathbb{E}[(X_0 - Y_0)^2] = 0$, then

\[ \mathbb{P}(X_{t\wedge\tau} = Y_{t\wedge\tau} \;\; \forall\, 0 \le t \le T) = 1. \tag{5.6.14} \]

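Proof The proof is based on Gronwall’s lemma and runs very much like its deterministic analogue. As norm we choose a uniform $L^2$-bound:

\begin{align*}
\mathbb{E}\Bigl[\max_{0\le s\le t} (X_{s\wedge\tau} - Y_{s\wedge\tau})^2\Bigr]
&\le 2\, \mathbb{E}\bigl[(X_0 - Y_0)^2\bigr] \\
&\quad + 4\, \mathbb{E}\Bigl[\max_{0\le s\le t} \Bigl(\int_0^{s\wedge\tau} \bigl(\sigma(u, X_u) - \sigma(u, Y_u)\bigr)\, dB_u\Bigr)^2\Bigr] \\
&\quad + 4\, \mathbb{E}\Bigl[\max_{0\le s\le t} \Bigl(\int_0^{s\wedge\tau} \bigl(b(u, X_u) - b(u, Y_u)\bigr)\, du\Bigr)^2\Bigr] \\
&\le 16\, \mathbb{E}\Bigl[\int_0^{t\wedge\tau} \bigl(\sigma(u, X_u) - \sigma(u, Y_u)\bigr)^2\, du\Bigr] \\
&\quad + 4t\, \mathbb{E}\Bigl[\int_0^{t\wedge\tau} \bigl(b(u, X_u) - b(u, Y_u)\bigr)^2\, du\Bigr] \\
&\le 4K^2 (t + 4)\, \mathbb{E}\Bigl[\int_0^{t\wedge\tau} (X_u - Y_u)^2\, du\Bigr] \\
&\le 4K^2 (t + 4) \int_0^t \mathbb{E}\Bigl[\max_{0\le u\le s} (X_{u\wedge\tau} - Y_{u\wedge\tau})^2\Bigr]\, ds. \tag{5.6.15}
\end{align*}

The first inequality uses that $(a + b)^2 \le 2a^2 + 2b^2$, the second inequality uses the assumption $\mathbb{E}[(X_0 - Y_0)^2] = 0$, the Cauchy-Schwarz inequality for the drift term and the Doob $L^2$-maximum inequality for the diffusion term, the third inequality uses the Lipschitz condition, while the fourth inequality uses Fubini’s theorem.

We see that $f(t) = \mathbb{E}[\max_{0\le s\le t} (X_{s\wedge\tau} - Y_{s\wedge\tau})^2]$ satisfies the hypothesis of Gronwall’s lemma with A = 0, so that

\[ \mathbb{E}\Bigl[\max_{0\le t\le T} (X_{t\wedge\tau} - Y_{t\wedge\tau})^2\Bigr] = 0. \tag{5.6.16} \]

By Chebyshev’s inequality this implies that $\mathbb{P}(\max_{0\le t\le T} |X_{t\wedge\tau} - Y_{t\wedge\tau}| = 0) = 1$, as claimed.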

Finally, existence of solutions (for finite times) can be proved via the usual Picard
iteration scheme under Lipschitz and growth conditions.

Theorem 5.59 Let b, σ satisfy the Lipschitz conditions in (5.6.12) and assume that

\[ \max_{0\le t\le T} \bigl( |b(t, x)|^2 + |\sigma(t, x)|^2 \bigr) \le K^2 \bigl(1 + |x|^2\bigr). \tag{5.6.17} \]

Let ξ be a random vector with finite second moment, independent of B, and let $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ be the usual augmentation of the filtration associated with B and ξ. Then there exists a continuous $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$-adapted process X that is a strong solution of the SDE with initial condition ξ. Moreover, X is square-integrable, i.e., for any $T > 0$ there exists a $C(K, T)$ such that, for all $0 \le t \le T$,

\[ \mathbb{E}\bigl[\|X_t\|^2\bigr] \le C(K, T) \bigl(1 + \mathbb{E}\bigl[\|\xi\|^2\bigr]\bigr)\, e^{C(K,T)t}. \tag{5.6.18} \]

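Proof We define a map F from the space of continuous adapted processes X that are uniformly square-integrable on [0, T] to itself via

\[ F(X)_t = \xi + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s. \tag{5.6.19} \]

Note that the square-integrability of F(X) needs the growth conditions in (5.6.17). As in (5.6.15),

\begin{align*}
\mathbb{E}\Bigl[\sup_{0\le t\le T} \|F(X)_t - F(Y)_t\|^2\Bigr]
&\le 2\, \mathbb{E}\Bigl[\sup_{0\le t\le T} \Bigl(\int_0^t \bigl(\sigma(X_s) - \sigma(Y_s)\bigr)\, dB_s\Bigr)^2\Bigr] \\
&\quad + 2\, \mathbb{E}\Bigl[\sup_{0\le t\le T} \Bigl(\int_0^t \bigl(b(X_s) - b(Y_s)\bigr)\, ds\Bigr)^2\Bigr] \\
&\le 2K^2 (1 + T) \int_0^T \mathbb{E}\Bigl[\sup_{0\le s\le t} \|X_s - Y_s\|^2\Bigr]\, dt, \tag{5.6.20}
\end{align*}

and, by iteration of this inequality, there exists a C (depending on K, T) such that

\[ \mathbb{E}\Bigl[\sup_{0\le t\le T} \|F^k(X)_t - F^k(Y)_t\|^2\Bigr] \le \frac{C^k T^{2k}}{k!}\, \mathbb{E}\Bigl[\sup_{0\le t\le T} \|X_t - Y_t\|^2\Bigr], \tag{5.6.21} \]

where $F^k$ is the k-th iterate of F. Thus, for k sufficiently large, $F^k$ is a contraction, and hence has a unique fixed point that solves the SDE. We can construct this solution as follows. Choose $X_t^{(0)} = \xi$, $X^{(k)} = F(X^{(k-1)})$, $k \in \mathbb{N}$. From the preceding inequality

\[ \mathbb{E}\Bigl[\sup_{0\le t\le T} \|X_t^{(k+1)} - X_t^{(k)}\|^2\Bigr] \le \frac{C^k T^{2k}}{k!}\, \mathbb{E}\bigl[1 + \|\xi\|^2\bigr]. \tag{5.6.22} \]

Apply the same arguments to estimate

\[ \mathbb{E}\bigl[\|X_t^{(k+1)}\|^2\bigr] \le K\, \mathbb{E}\bigl[\|\xi\|^2\bigr] + KT \int_0^t \bigl(1 + \mathbb{E}\bigl[\|X_s^{(k)}\|^2\bigr]\bigr)\, ds. \tag{5.6.23} \]

Iterate this inequality,

\[ \mathbb{E}\bigl[\|X_t^{(k+1)}\|^2\bigr] \le \mathbb{E}\bigl[\|\xi\|^2\bigr] + \bigl(1 + \mathbb{E}\bigl[\|\xi\|^2\bigr]\bigr) \sum_{i=1}^{k} \frac{(KT)^{i+1}}{i!} \le KT \bigl(1 + \mathbb{E}\bigl[\|\xi\|^2\bigr]\bigr)\, e^{KTt}, \tag{5.6.24} \]

to get the growth bound in (5.6.18) with $C(K, T) = KT$.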
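For numerical experiments with (5.6.1) one usually does not implement the Picard iteration itself. The sketch below uses the Euler-Maruyama time-stepping scheme instead, which is a different technique from the fixed-point construction in the proof. It treats the one-dimensional case; the function name `euler_maruyama` and its arguments are our own illustrative choices.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Approximate one path of dX = b(t, X) dt + sigma(t, X) dB on [0, T].

    b and sigma are callables (t, x) -> float. This is a time discretisation,
    not the Picard scheme used in the existence proof.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        t = k * dt
        dB = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over [t, t + dt]
        x[k + 1] = x[k] + b(t, x[k]) * dt + sigma(t, x[k]) * dB
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative coefficients: linear drift towards 0 and constant noise.
    path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.5, 1.0, 1.0, 1000, rng)
    print(path[-1])
```

With σ = 0 the scheme reduces to the explicit Euler method for the ordinary differential equation x' = b(t, x), which gives a simple deterministic sanity check.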

5.6.3 The Doob transform

An important way in which a drift can be produced is via conditioning. We have

already seen this in the case of discrete-time Markov processes. We will again see
that the martingale formulation plays a useful rôle. As in the discrete-time setting,
the key result is the following.

Theorem 5.60 Let X be a Markov process, i.e., a solution of the martingale problem for an operator $\mathcal{L}$, and let h be a strictly positive harmonic function. Define the measure $\mathbb{P}^h$ such that, for any $\mathcal{F}_t$-measurable random variable Y,

\[ \mathbb{E}^h_x[Y] = \frac{1}{h(x)}\, \mathbb{E}_x\bigl[h(X_t)\, Y\bigr]. \tag{5.6.25} \]

Then $\mathbb{P}^h$ is the law of a solution of the martingale problem for the operator $\mathcal{L}^h$ defined by

\[ \mathcal{L}^h f(x) = \frac{1}{h(x)}\, (\mathcal{L} h f)(x). \tag{5.6.26} \]

As an important example, let us consider the case of Brownian motion in a regular domain $D \subset \mathbb{R}^d$ killed at the boundary ∂D. We assume that h is a harmonic function on D, and we let $\tau_D$ be the first exit time of D. Then

\[ \mathcal{L}^h = \frac12 \Delta + \frac{\nabla h}{h} \cdot \nabla, \tag{5.6.27} \]

and hence, under the law $\mathbb{P}^h$, the Brownian motion becomes the solution of the SDE

\[ dX_t = \frac{\nabla h(X_t)}{h(X_t)}\, dt + dB_t. \tag{5.6.28} \]

On the other hand, we have seen that if h(x) is the probability of some event, e.g. $h(x) = \mathbb{P}_x(X_{\tau_D} \in A)$ for some $A \subset \partial D$, then

\[ \mathbb{P}^h(\cdot) = \mathbb{P}(\,\cdot \mid X_{\tau_D} \in A). \tag{5.6.29} \]

This means that the Brownian motion conditioned to exit D at a given location can be represented as the solution of an SDE with a specific drift. For instance, let d = 1 and D = (0, R). Consider the Brownian motion conditioned to leave D at R. It is well known that

\[ \mathbb{P}_x(X_{\tau_D} = R) = x/R, \qquad x \in D. \tag{5.6.30} \]

Thus, the conditioned Brownian motion solves

\[ dX_t = \frac{1}{X_t}\, dt + dB_t. \tag{5.6.31} \]

We can let $R \to \infty$ without changing the SDE. Hence, the solution of (5.6.31) is Brownian motion conditioned to never return to the origin (see Fig. 5.3). This is reasonable, because the strength of the drift away from zero goes to infinity near 0. Still, it is remarkable that the conditioning can be reproduced by the application of a proper drift.

Fig. 5.3 Drift function b : R → R for Brownian motion conditioned to never hit the origin: b(x) = 1/x
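The step from (5.6.30) to (5.6.31) is a one-line computation, which we spell out as a sketch. It only uses the formula (5.6.27) for d = 1 with the harmonic function h(x) = x/R on D = (0, R).

```latex
% Drift of the Doob transform for h(x) = x/R on D = (0, R), d = 1.
\begin{align*}
h(x) &= \frac{x}{R}, \qquad h'(x) = \frac{1}{R}, \qquad h''(x) = 0
        \quad (\text{so } h \text{ is harmonic on } D), \\
\frac{h'(x)}{h(x)} &= \frac{1/R}{x/R} = \frac{1}{x}, \\
\mathcal{L}^h &= \frac12 \frac{d^2}{dx^2} + \frac{1}{x} \frac{d}{dx}.
\end{align*}
```

The factor 1/R cancels in the ratio h'/h, which is why the SDE (5.6.31) does not depend on R.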

5.6.4 The Girsanov theorem

The Girsanov theorem is a particularly useful tool to study properties of stochastic processes that can be seen as modifications of Brownian motions. For simplicity we consider only the one-dimensional setting, but the obvious extension to the multi-dimensional setting holds as well.

Suppose that we are given a filtered space $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t\in\mathbb{R}_+})$ satisfying the usual assumptions, a Brownian motion B and an adapted process X that is square-integrable with respect to dt, i.e., X is an integrand for B. Suppose that we want to study the process

\[ W_t = B_t - \int_0^t X_s\, ds. \tag{5.6.32} \]

We may think of $X_s = b(s, B_s)$ for some bounded measurable function b, the simplest example being $b(s, B_s) \equiv b$ for a constant b, in which case

\[ W_t = B_t - bt, \]

which is Brownian motion with a constant drift $-b$.

How can we compute properties of W? In particular, can we find a new probability measure, written $\tilde{\mathbb{P}}$, such that under this new measure W becomes simple? The Girsanov theorem is a striking affirmative answer to this question.

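Theorem 5.61 (Girsanov theorem) Let B, X, W be as above. Define

\[ Z_t = Z_t(X) = \exp\Bigl( \int_0^t X_s\, dB_s - \frac12 \int_0^t X_s^2\, ds \Bigr) \tag{5.6.33} \]

and let $\tilde{\mathbb{P}}_T$ be defined by

\[ \tilde{\mathbb{P}}_T(A) = \mathbb{E}\bigl[Z_T(X)\, 1_A\bigr]. \tag{5.6.34} \]

If $(Z_t)_{0\le t\le T}$ is a martingale, then the process $(W_t)_{0\le t\le T}$ is a Brownian motion under $\tilde{\mathbb{P}}_T$.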

Remark 5.62 We may check, using the Itō formula, that $Z_t$ solves

\[ dZ_t = Z_t X_t\, dB_t. \tag{5.6.35} \]

To see why, let $f(t, x) = e^{x - \frac12 \int_0^t X_s^2\, ds}$ and $Y_t = \int_0^t X_s\, dB_s$. Then Y is a martingale with bracket $[Y]_t = \int_0^t X_s^2\, ds$, and $Z_t = f(t, Y_t)$. By the Itō formula,

\[ df(t, Y_t) = f(t, Y_t) \bigl( -\tfrac12 X_t^2\, dt + dY_t + \tfrac12\, d[Y]_t \bigr) = Z_t X_t\, dB_t. \tag{5.6.36} \]

Hence $(Z_t)_{0\le t\le T}$ is a positive local martingale and so, by Fatou’s lemma, a supermartingale. It is a martingale whenever $\mathbb{E}[Z_t] = 1$ for all t.
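In the simplest case of a constant integrand $X_s \equiv b$ the statements of Theorem 5.61 and Remark 5.62 can be checked by direct sampling, since then $Z_T = \exp(bB_T - b^2T/2)$ and $W_T = B_T - bT$ depend on $B_T$ only. The sketch below estimates $\mathbb{E}[Z_T]$, $\mathbb{E}[Z_T W_T]$ and $\mathbb{E}[Z_T W_T^2]$, which should be close to 1, 0 and T if $W_T$ is centred Gaussian with variance T under the new measure. The function name `girsanov_check` and the parameter values are our own.

```python
import numpy as np

def girsanov_check(b, T, n_samples, rng):
    """Monte Carlo check of the Girsanov change of measure for constant X = b.

    Returns estimates of E[Z_T], E[Z_T * W_T] and E[Z_T * W_T^2], i.e. the total
    mass, mean and second moment of W_T under the tilted measure.
    """
    B_T = rng.normal(0.0, np.sqrt(T), size=n_samples)
    Z_T = np.exp(b * B_T - 0.5 * b ** 2 * T)   # density (5.6.33) for X = b
    W_T = B_T - b * T                          # process (5.6.32) at time T
    return (float(np.mean(Z_T)),
            float(np.mean(Z_T * W_T)),
            float(np.mean(Z_T * W_T ** 2)))

if __name__ == "__main__":
    print(girsanov_check(0.5, 1.0, 400_000, np.random.default_rng(0)))
```

This only tests one-dimensional marginals at time T; it does not verify the full Brownian path property asserted in the theorem.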

Proofs of the Girsanov theorem can be found in most standard textbooks on

stochastic analysis, e.g. in Karatzas and Shreve [148].

5.7 Stochastic partial differential equations

The theory of stochastic partial differential equations (SPDEs) is substantially more involved than that of stochastic differential equations (SDEs), and in many respects is not yet finalised. For recent developments, see Hairer [135]. In Chap. 12 we will discuss metastability for one example system, namely, the stochastic Allen-Cahn equation (see Fig. 5.4 for a visualisation). In this section we present the relevant background.

Fig. 5.4 A space-time plot (x, t) → u(x, t) of the stochastic Allen-Cahn equation

5.7.1 The stochastic Allen-Cahn equation

Formally, the stochastic Allen-Cahn equation is a partial differential equation of the form

\[ \frac{\partial}{\partial t} u(x, t) = \frac12 D \frac{\partial^2}{\partial x^2} u(x, t) - V'\bigl(u(x, t)\bigr) + \sqrt{2\varepsilon}\, \frac{\partial^2}{\partial x \partial t} W(x, t). \tag{5.7.1} \]

Here, $x \in [0, 1]$ is the space-coordinate, $t \in \mathbb{R}_+$ is the time-coordinate, $D > 0$ is the coupling constant, $V : \mathbb{R} \to \mathbb{R}$ is the potential, $\varepsilon > 0$ is the noise-strength, and W is the space-time Brownian sheet, i.e., the centred Gaussian process indexed by $[0, 1] \times \mathbb{R}_+$ with covariance

\[ \mathbb{E}\bigl[W(x, t) W(y, s)\bigr] = (x \wedge y)(t \wedge s). \tag{5.7.2} \]

The initial condition is given by $u(x, 0) = u_0(x)$, $x \in [0, 1]$, with $u_0$ a continuous function. We also need to choose boundary conditions, e.g. periodic boundary conditions $u(0, t) = u(1, t)$, $t \in \mathbb{R}_+$, or Neumann boundary conditions $\partial_x u(0, t) = \partial_x u(1, t) = 0$, $t \in \mathbb{R}_+$.
We need some further assumptions on the potential V .
Assumption 5.63
• V is $C^3$ on $\mathbb{R}$.
• V is convex at infinity, i.e., there exist R, c > 0 such that

\[ V''(u) > c > 0, \qquad |u| > R. \tag{5.7.3} \]

• V is polynomial of finite degree.

The SPDE in (5.7.1) can be seen as the stochastic perturbation of an infinite-dimensional gradient system,

\[ \frac{\partial}{\partial t} u = -D_\phi F, \tag{5.7.4} \]

where, for φ a differentiable function,

\[ F(\phi) = \int_0^1 \Bigl( \frac12 D\, \phi'(x)^2 + V\bigl(\phi(x)\bigr) \Bigr)\, dx, \tag{5.7.5} \]

and $D_\phi F$ is the Fréchet derivative of F. Of course, (5.7.1) is an informal expression because the derivatives of the Brownian sheet do not exist. Formally, we think of the noise term as a Gaussian process such that

\[ \mathbb{E}\Bigl[ \frac{\partial^2 W(x, t)}{\partial x \partial t}\, \frac{\partial^2 W(y, s)}{\partial y \partial s} \Bigr] = \delta(x - y)\, \delta(t - s), \tag{5.7.6} \]

where δ(·) is the Dirac delta function, but it is clear that (5.7.1) requires even more interpretation than an SDE.
To get an idea of what is at stake, consider first the linear equation

\[ \frac{\partial v(x, t)}{\partial t} = \frac12 D \frac{\partial^2 v(x, t)}{\partial x^2} + \sqrt{2\varepsilon}\, \frac{\partial^2 W(x, t)}{\partial x \partial t}, \tag{5.7.7} \]

with initial condition v(x, 0) = 0. Naturally, we expect to be able to solve this equation with the help of the Fourier transform. Indeed, space-time white noise can be constructed as a Fourier series as follows. Let $B_n(t)$, $n \in \mathbb{Z}$, be i.i.d. Brownian motions. Set

\[ \frac{\partial W(x, t)}{\partial x} = \sum_{n\in\mathbb{Z}} e^{(2\pi i) n x} B_n(t). \tag{5.7.8} \]

On the level of formal computations, this process has the desired correlation structure. Therefore, denoting by

\[ \hat{v}(n, t) = \frac{1}{2\pi} \int_0^1 e^{-(2\pi i) n x}\, v(x, t)\, dx, \qquad n \in \mathbb{Z}, \tag{5.7.9} \]

the spatial Fourier coefficients of v, we find that these satisfy the stochastic ordinary differential equations

\[ d\hat{v}(n, t) = -\tfrac12 D (2\pi n)^2\, \hat{v}(n, t)\, dt + \sqrt{2\varepsilon}\, dB_n(t), \tag{5.7.10} \]

with initial condition $\hat{v}(n, 0) = 0$. Note that these equations are uncoupled for different n. The equations in (5.7.10) are interpreted as Itō-SDEs, so we are on firm ground. The solution of (5.7.10) is readily found to be

\[ \hat{v}(n, t) = \sqrt{2\varepsilon}\, e^{-\frac12 D (2\pi n)^2 t} \int_0^t e^{\frac12 D (2\pi n)^2 s}\, dB_n(s). \tag{5.7.11} \]

This suggests the representation of the solution v in the form

\[ v(x, t) = \sum_{n\in\mathbb{Z}} e^{(2\pi i) n x}\, \hat{v}(n, t). \tag{5.7.12} \]
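A quick check whether this series represents a bona fide stochastic process is the computation of the variance of the spatial $L^2$-norm:

\begin{align*}
\mathbb{E}\bigl[\|v(\cdot, t)\|_2^2\bigr]
&= 2\varepsilon \sum_{n\in\mathbb{Z}} e^{-D(2\pi n)^2 t}\, \mathbb{E}\Bigl[\Bigl(\int_0^t e^{\frac12 D (2\pi n)^2 s}\, dB_n(s)\Bigr)^2\Bigr] \tag{5.7.13} \\
&= 2\varepsilon \sum_{n\in\mathbb{Z}} e^{-D(2\pi n)^2 t} \int_0^t e^{D (2\pi n)^2 s}\, ds \\
&= 2\varepsilon \sum_{n\in\mathbb{Z}} \frac{1}{D(2\pi n)^2} \bigl(1 - e^{-D(2\pi n)^2 t}\bigr).
\end{align*}

Here the n = 0 term is read as its limiting value t. The sum converges, since for $n \ne 0$ the terms are bounded by $1/(D(2\pi n)^2)$, which is summable over n.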
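The last line of (5.7.13) is easy to evaluate numerically. The following sketch truncates the sum at |n| ≤ n_max, reads the n = 0 term as its limit t, and uses the symmetry in ±n. The function name `l2_variance` and the parameter values in the example are hypothetical choices of ours.

```python
import numpy as np

def l2_variance(t, D, eps, n_max):
    """Truncation of the last line of (5.7.13) to modes |n| <= n_max.

    The n = 0 term (1 - exp(-lam * t)) / lam with lam -> 0 is taken to be t,
    and the modes n and -n contribute equally.
    """
    n = np.arange(1, n_max + 1)
    lam = D * (2.0 * np.pi * n) ** 2
    tail = np.sum((1.0 - np.exp(-lam * t)) / lam)
    return 2.0 * eps * (t + 2.0 * tail)

if __name__ == "__main__":
    for n_max in (10, 1_000, 100_000):
        print(n_max, l2_variance(1.0, 1.0, 0.5, n_max))
```

Since $\sum_{n\ge 1} 1/(2\pi n)^2 = 1/24$, every truncation is bounded above by $2\varepsilon\,(t + 1/(12D))$, which gives a quick consistency check on the output.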

Remark 5.64 If we do the same analysis in dimension d > 1, then we obtain

\[ \sum_{n\in\mathbb{Z}^d} \frac{1}{D(2\pi n)^2} \bigl(1 - e^{-D(2\pi n)^2 t}\bigr), \tag{5.7.14} \]

which diverges.
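The divergence of (5.7.14) can be seen from a standard comparison with a radial integral. The sketch below is our own short justification and is not taken from the text; we write $|n|$ for the Euclidean norm of $n \in \mathbb{Z}^d$.

```latex
% Why (5.7.14) diverges for d >= 2: for fixed t > 0 the factor
% 1 - e^{-D(2\pi|n|)^2 t} tends to 1 as |n| -> infinity, so the sum behaves like
\begin{align*}
\sum_{n \in \mathbb{Z}^d \setminus \{0\}} \frac{1}{|n|^2}
\;\asymp\; \int_1^\infty \frac{r^{d-1}}{r^2}\, dr
\;=\; \int_1^\infty r^{d-3}\, dr ,
\end{align*}
% which is finite if and only if d - 3 < -1, i.e. d = 1.
```

For d = 2 the divergence is only logarithmic, but it is already enough to prevent v from taking values in $L^2$.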

So, at least in dimension d = 1, we can construct a proper stochastic process taking values in $L^2$ that can be reasonably considered a solution of the linear SPDE in (5.7.7). This process has much nicer properties than the space-time white noise itself, which suggests that it is better to define solutions of the non-linear equation through solutions of the linear equation. This goes as follows. Denote by $g_t(x, y)$ the density of the semi-group generated by $\frac12 D\, \partial^2/\partial x^2$ on [0, 1] (the heat kernel with suitable boundary conditions). Then a solution of the inhomogeneous linear equation

\[ du(x, t) - \tfrac12 D \frac{\partial^2 u(x, t)}{\partial x^2}\, dt = r(x, t)\, dt \tag{5.7.15} \]

with initial condition $u_0(x)$ can be written as

\[ u(x, t) = \int_0^1 dy\, g_t(x, y)\, u_0(y) + \int_0^t ds \int_0^1 dy\, g_{t-s}(x, y)\, r(y, s). \tag{5.7.16} \]

Write the non-linear equation in (5.7.1) as

\[ \frac{\partial u(x, t)}{\partial t} - \frac12 D \frac{\partial^2 u(x, t)}{\partial x^2} = -V'\bigl(u(x, t)\bigr) + \sqrt{2\varepsilon}\, \frac{\partial^2 W(x, t)}{\partial x \partial t}. \tag{5.7.17} \]

Next, think of the entire right-hand side as an inhomogeneous term (i.e., ignore the fact that the right-hand side involves the solution itself). Then we can represent the solution of (5.7.17) as

\begin{align*}
u(x, t) ={}& \int_0^1 dy\, g_t(x, y)\, u_0(y) - \int_0^t ds \int_0^1 dy\, g_{t-s}(x, y)\, V'\bigl(u(y, s)\bigr) \\
&+ \sqrt{2\varepsilon} \int_0^t \int_0^1 g_{t-s}(x, y)\, dW(y, s) \\
={}& \int_0^1 dy\, g_t(x, y)\, u_0(y) - \int_0^t ds \int_0^1 dy\, g_{t-s}(x, y)\, V'\bigl(u(y, s)\bigr) + v(x, t). \tag{5.7.18}
\end{align*}

Here, the last term is taken as the solution of the linear equation in (5.7.7) we just constructed. The other terms in the right-hand side are ordinary integrals, so all the expressions make sense. A so-called mild solution is a process that satisfies this equation, i.e., instead of the ill-defined SPDE driven by space-time white noise, we now have a non-linear integral equation driven by the more regular noise v(x, t).

Definition 5.65 A random field u is a mild solution of (5.7.1) if:

(i) Almost surely u is continuous on $[0, 1] \times \mathbb{R}_+$ and predictable.
(ii) For all $(x, t) \in [0, 1] \times \mathbb{R}_+$,

\[ u(x, t) = \int_0^1 dy\, g_t(x, y)\, u_0(y) - \int_0^t ds \int_0^1 dy\, g_{t-s}(x, y)\, V'\bigl(u(y, s)\bigr) + v(x, t), \tag{5.7.19} \]

with v(x, t) given by (5.7.12).

Existence, uniqueness and regularity of the solution of (5.7.1) are contained in the following theorem, where $C_{bc}([0, 1])$ denotes the set of continuous functions on [0, 1] respecting the chosen boundary conditions.

Theorem 5.66 For every initial condition $u_0 \in C_{bc}([0, 1])$, the SPDE in (5.7.1) has a unique mild solution. Moreover, for all T > 0 and p ≥ 1,

\[ \mathbb{E}\Bigl[ \sup_{(t,x)\in[0,T]\times[0,1]} |u(x, t)|^p \Bigr] \le C(T, p). \tag{5.7.20} \]

The random field u is 2α-Hölder in space and α-Hölder in time for every $\alpha \in (0, \frac14)$.

The only complication that arises comes from the fact that $V'$ is not globally Lipschitz. However, due to Assumption 5.63, we have

\[ -x V'(x) < C, \tag{5.7.21} \]

which allows us to prove global existence via a localisation argument, as in the analogous SDE cases.

Remark 5.67 As noted earlier, v(x, t) is regular only in dimension d = 1. In dimensions d > 1, the construction breaks down. The source of the problem is the strong spatial irregularity of the noise. For this reason, applications often consider SPDEs driven by noise with stronger spatial correlations. The task is to choose the right noise for the application at hand.

5.7.2 Discretisation

From our perspective, the stochastic Allen-Cahn equation should arise as the limit of a spatially discrete system. This works out well in dimension d = 1. The discrete system consists of $N \in \mathbb{N}$ coupled stochastic differential equations of the form

\[ dX_j(t) = -\frac{\partial}{\partial X_j} F_{D,N}\bigl(X(t)\bigr)\, dt + \sqrt{2\varepsilon}\, dB_j(t), \qquad j \in \Lambda, \tag{5.7.22} \]

where $\Lambda = \mathbb{Z}/N\mathbb{Z} = \{1, \ldots, N\}$, $X_j(t)$ are the components of $X(t) \in \mathbb{R}^N$, and

\[ F_{D,N}(x) = \sum_{j\in\Lambda} V(x_j) + \tfrac14 D \sum_{j\in\Lambda} (x_j - x_{j+1})^2, \tag{5.7.23} \]

with $B_j$, $j \in \Lambda$, independent Brownian motions.
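To make the system (5.7.22)–(5.7.23) concrete, the sketch below codes the energy $F_{D,N}$, its gradient on the periodic lattice, and one explicit Euler-Maruyama step. The double-well potential $V(u) = u^4/4 - u^2/2$ is a hypothetical choice of ours (the text only imposes Assumption 5.63), as are all function names and the time-stepping scheme.

```python
import numpy as np

def V(u):
    """Hypothetical double-well potential, chosen only for illustration."""
    return 0.25 * u ** 4 - 0.5 * u ** 2

def V_prime(u):
    """Derivative of the illustrative potential V."""
    return u ** 3 - u

def F(x, D):
    """Energy (5.7.23) on the periodic lattice: x_{N+1} is identified with x_1."""
    return float(np.sum(V(x)) + 0.25 * D * np.sum((x - np.roll(x, -1)) ** 2))

def grad_F(x, D):
    """Gradient of F: V'(x_j) + (D/2) * (2 x_j - x_{j+1} - x_{j-1})."""
    return V_prime(x) + 0.5 * D * (2.0 * x - np.roll(x, -1) - np.roll(x, 1))

def euler_step(x, D, eps, dt, rng):
    """One Euler-Maruyama step for dX = -grad F(X) dt + sqrt(2 eps) dB."""
    noise = rng.normal(0.0, np.sqrt(dt), size=x.shape)
    return x - grad_F(x, D) * dt + np.sqrt(2.0 * eps) * noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.full(16, -1.0)          # start in the left well of the illustrative V
    for _ in range(1000):
        x = euler_step(x, D=1.0, eps=0.05, dt=1e-3, rng=rng)
    print(x.mean())
```

Each bond $(x_j - x_{j+1})^2$ involves two sites, which is why the prefactor 1/4 in the energy turns into 1/2 in front of the discrete Laplacian in the gradient.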

In order to obtain a limit as $N \to \infty$, we need to rescale the potential and the coupling constant. To that end we replace $F_{D,N}(x)$ by

\[ F_{D,N}(x) = N^{-1} F_{N^2 D, N}(x). \tag{5.7.24} \]

If we replace the unit lattice by a lattice of spacing 1/N, i.e., $(x_j)_{j\in\Lambda}$ is the discretisation of a real-valued function x on [0, 1] given by $x_j = x(j/N)$, then the resulting potential converges formally to

\[ \lim_{N\to\infty} F_{D,N}(x) = \int_0^1 V\bigl(x(s)\bigr)\, ds + \tfrac12 D \int_0^1 x'(s)^2\, ds \tag{5.7.25} \]

with x(0) = x(1). We need to rescale the Brownian noise by a factor $1/\sqrt{N}$. We may relate this to a space-time white noise by setting formally

\[ N^{-1/2} B_j(t) = \int_{(j-1)/N}^{j/N} W(x, t)\, dx, \qquad j \in \Lambda. \tag{5.7.26} \]

Finally, we must accelerate time by a factor N, i.e., we set $X^N(t) = X(tN)$. The resulting discrete equations take the form

\[ dX_j^N(t) = -\tfrac12 D N^2 \bigl[ X_{j+1}^N(t) + X_{j-1}^N(t) - 2X_j^N(t) \bigr]\, dt - V'\bigl(X_j^N(t)\bigr)\, dt + \sqrt{2\varepsilon N}\, d\tilde{B}_j(t), \tag{5.7.27} \]

with $\tilde{B}_j$, $j \in \Lambda$, independent Brownian motions.
We now define the function $u^N(x, t) : [0, 1] \times \mathbb{R}_+ \to \mathbb{R}$ such that, for any given $t \in \mathbb{R}_+$, $u^N(\cdot, t)$ is the linear interpolation between the points $(j/N, X_j^N(t))$, $j \in \Lambda$. Then $u^N$ can also be represented as a mild solution of the discrete system, which allows us to prove convergence to a mild solution of the SPDE. To do this, we proceed again by solving the linear equations

\[ dY_j^N(t) = -\tfrac12 D N^2 \bigl[ Y_{j+1}^N(t) + Y_{j-1}^N(t) - 2Y_j^N(t) \bigr]\, dt + \sqrt{2\varepsilon N}\, d\tilde{B}_j(t) \tag{5.7.28} \]

with the help of Fourier series. Set $\hat{Y}_n^N = N^{-1} \sum_{j\in\Lambda} e^{-(2\pi i) n j/N}\, Y_j^N$ and, consequently, $Y_j^N = \sum_{n\in\Lambda} e^{(2\pi i) n j/N}\, \hat{Y}_n^N$. Note that also

\[ \sqrt{N}\, \tilde{B}_j(t) = \sum_{n\in\Lambda} e^{(2\pi i) n j/N} B_n(t), \tag{5.7.29} \]

where the $B_n$, $n \in \Lambda$, are the independent Brownian motions from (5.7.8). A simple computation yields that the Fourier modes satisfy the SDEs (with zero initial condition)

\[ d\hat{Y}_n^N(t) = -\tfrac12 D N^2 \bigl[ 2\cos(2\pi n/N) - 2 \bigr]\, \hat{Y}_n^N(t)\, dt + \sqrt{2\varepsilon}\, dB_n(t). \tag{5.7.30} \]

Abbreviate $\hat{\Delta}_n^N = N^2 [2\cos(2\pi n/N) - 2]$. Then these equations can be solved as

\[ \hat{Y}_n^N(t) = \sqrt{2\varepsilon}\, e^{-\frac12 D \hat{\Delta}_n^N t} \int_0^t e^{\frac12 D \hat{\Delta}_n^N s}\, dB_n(s). \tag{5.7.31} \]

Define

\[ v_j^N(t) = \sum_{n\in\Lambda} e^{(2\pi i) n j/N}\, \hat{Y}_n^N(t). \tag{5.7.32} \]
As in the continuous case, we get the $\ell^2$-bound

\[ \mathbb{E}\Bigl[ N^{-1} \sum_{j\in\Lambda} |v_j^N(t)|^2 \Bigr] = \sum_{n\in\Lambda} \frac{1 - e^{-D \hat{\Delta}_n^N t}}{D \hat{\Delta}_n^N}. \tag{5.7.33} \]

It is easy to see that this expression converges to the right-hand side of (5.7.13) as $N \to \infty$. In fact, denoting by $v^N(x, t)$ the linear interpolation of the points $(j/N, v_j^N(t))$, $j \in \Lambda$, we can show that this process converges to v(x, t) in $L^2$. Finally we can write the discrete equations in their mild form as

\[ X_j^N(t) = \sum_{k\in\Lambda} p_t^N(j, k)\, X_k^N(0) - \sum_{k\in\Lambda} \int_0^t p_{t-s}^N(j, k)\, V'\bigl(X_k^N(s)\bigr)\, ds + v_j^N(t), \tag{5.7.34} \]

where $p^N$ is the semi-group associated with the discrete Laplacian. Note that the noise is coupled to the mild form of the SPDE in that both are driven by the same Brownian motions.

The above formulation can be further embellished by writing it for the interpolations $u^N$ defined by putting $u^N(j/N, t) = X_j^N(t)$ for $j \in \Lambda$ and using linear interpolation. Define (for Neumann boundary conditions)

\[ \kappa_N(x) = \frac{\lfloor N x \rfloor}{N} + \frac{1}{2N}. \tag{5.7.35} \]

Let $g^N$ be the linear interpolation of $p^N$ on $[0, 1] \times [0, 1]$ along the discretisation points.

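Lemma 5.68 For every $u_0 \in C_{bc}([0, 1])$ and $N \in \mathbb{N}$ the function $u^N$ defined on $[0, 1] \times \mathbb{R}_+$ satisfies the equation

\begin{align*}
u^N(x, t) ={}& \int_0^1 dy\, g_t^N\bigl(x, \kappa_N(y)\bigr)\, u_0\bigl(\kappa_N(y)\bigr) \\
&- \int_0^t ds \int_0^1 dy\, g_{t-s}^N\bigl(x, \kappa_N(y)\bigr)\, V'\bigl(u^N(\kappa_N(y), s)\bigr) + v^N(x, t). \tag{5.7.36}
\end{align*}

For all T > 0 and p ≥ 1,

\[ \sup_{N\in\mathbb{N}} \mathbb{E}\Bigl[ \sup_{(x,t)\in[0,1]\times[0,T]} |u^N(x, t)|^p \Bigr] \le C(T, p). \tag{5.7.37} \]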

The following theorem asserts the convergence of the solution of (5.7.36) to the
solution of (5.7.1).

Theorem 5.69 For all $u_0 \in C_{bc}^3([0,1])$, $T > 0$ and $p \ge 1$,
\[ \lim_{N\to\infty} u^N = u, \tag{5.7.38} \]

where convergence holds in the following senses:

• In $L^p$, i.e., $\lim_{N\to\infty} E\big[ \|u^N - u\|_{\infty,T}^p \big]^{1/p} = 0$.
• Almost surely in $C([0,1] \times [0,T])$, i.e., for every $\eta \in (0,\tfrac12)$ there exists an almost surely finite random variable $\Xi$ such that
\[ \|u^N - u\|_{\infty,T} \le \frac{\Xi}{N^\eta}, \tag{5.7.39} \]
where $\|w\|_{\infty,T} = \sup_{t\in[0,T]} \sup_{x\in[0,1]} |w(x,t)|$.

The convergence of the solutions also implies the convergence of the hitting times of the discrete approximations to those of the SPDE. A precise statement is as follows. Let $u_0$ be the initial condition of the solution of (5.7.1) and $\phi$ a continuous function. For $\rho > 0$, define the hitting times
\[ \tau(\rho) = \inf\big\{ t > 0 \colon \|u(t) - \phi\|_\infty < \rho \big\}, \qquad \tau^N(\rho) = \inf\big\{ t > 0 \colon \|u^N(t) - \phi^N\|_\infty < \rho \big\}, \tag{5.7.40} \]
where $\phi^N$ is the linear approximation of $\phi$.

Theorem 5.70 Suppose that $\lim_{N\to\infty} \|\phi^N - \phi\|_\infty = 0$ and that there exists a $\rho_0$ such that, for every $0 < \rho < \rho_0$,
\[ E_{u_0}\big[ \tau(\rho) \big] < \infty. \tag{5.7.41} \]
Then, for every $0 < \rho < \rho_0$,
\[ \lim_{N\to\infty} \tau^N(\rho) = \tau(\rho) \ \text{a.s.}, \qquad \lim_{N\to\infty} E_{u_0^N}\big[ \tau^N(\rho) \big] = E_{u_0}\big[ \tau(\rho) \big]. \tag{5.7.42} \]

The proof of this theorem is straightforward and can be found in Barret [11].

5.8 Bibliographical notes

1. Much of the exposition in this chapter is taken from Ethier and Kurtz [104], Rogers
and Williams [207, 208] and Karatzas and Shreve [148]. The martingale problem
formulation is due to Stroock and Varadhan [223].

2. The conditions for existence stated in Theorem 5.59 are not necessary. In particular,
growth conditions are important only when the solutions can reach regions
where the coefficients become too large. Formulations of weaker hypotheses for ex-
istence and uniqueness can be found in Jacod and Shiryaev [143, Chap. 14]. Their
verification in concrete cases can be tricky.

3. Examples of introductory textbooks on SPDEs are Da Prato [68], Holden [139]
and Röckner [203]. A classical treatise is the St. Flour lecture notes by Walsh [235].

4. The Allen-Cahn (or Ginzburg-Landau) equation models the behaviour of an elastic
string in a potential with viscous stochastic forcing (see e.g. Funaki [117]). It also
has interpretations in quantum field theory (see Fajona [107], Cassandro, Olivieri
and Picco [52]), and in statistical physics as a reaction-diffusion equation modelling
phase transitions and evolution of interfaces (see Brassesco [41], Brassesco and
Buttà [42]).

5. The existence, uniqueness and regularity of the solution of the stochastic Allen-
Cahn equation stated in Proposition 5.66 was proved in Gyöngy and Pardoux [134].
The convergence of the finite discretisation in Theorem 5.69 was proved in Funaki [117]
and Gyöngy [133] for V with V' globally Lipschitz. Barret [11] extended
this result to the setting of Assumption 5.63.

6. Existence of strong solutions via renormalisation for a class of SPDEs containing
the Allen-Cahn equation on the two-dimensional torus was shown by Da Prato
and Debussche [67]. There are interesting recent developments. Hairer [135] proposes
a renormalisation strategy that makes it possible to give sense to the white noise case in
dimensions d = 2, 3.
Chapter 6
Large Deviations

“A large cage!” the Professor promptly replied. “Bring a large

cage”, he said to the people generally, “with strong bars of
steel, and a portcullis made to go up and down like a
mouse-trap! Does anyone happen to have such a thing about
him?” (Lewis Carroll, Sylvie and Bruno Concluded)

This chapter gives a summary introduction to large deviations. Although large devi-
ation theory is not our main interest in this monograph, it is an essential element in
our conceptual understanding of metastability. Moreover, it provides tools to obtain
estimates, which often serve as preliminary steps towards more refined estimates.
Section 6.1 recalls the main ingredients of large deviation theory in a general set-
ting (without proofs). Section 6.2 gives a full derivation of path large deviations for
diffusion processes (under strong regularity assumptions). Section 6.3 takes a brief
look at path large deviations for stochastic partial differential equations. Section 6.4
formulates the extension to path large deviations for Markov processes (without
proofs). Section 6.5 gives a brief outline of the Freidlin-Wentzell theory of metasta-
bility, collects some properties of associated action integrals, and looks at crossing
and exit problems that are crucial for a proper understanding of metastability.

6.1 Large deviation principles

Definition 6.1 A family of probability measures (με )ε>0 on a Polish space X
is said to satisfy the large deviation principle (LDP) with rate function I : X →
[0, ∞] if
(i) I has compact level sets and is not identically infinite,
(ii) lim infε↓0 ε ln με (O) ≥ −I (O) for all O ⊆ X open,
(iii) lim supε↓0 ε ln με (C) ≤ −I (C) for all C ⊆ X closed,
where I (S) = infx∈S I (x), S ⊆ X .

Informally, the LDP says that if $B_\delta(x)$ is a ball of radius $\delta > 0$ centred at $x \in \mathcal{X}$, then
\[ \mu_\varepsilon\big( B_\delta(x) \big) = e^{-[1+o(1)]\, I(x)/\varepsilon} \tag{6.1.1} \]
when $\varepsilon \downarrow 0$ followed by $\delta \downarrow 0$ (see Fig. 6.1).

Fig. 6.1 Paradigmatic picture of a rate function with a unique zero

© Springer International Publishing Switzerland 2015
A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_6

If in (i) the level sets of I are assumed to be closed only and in (iii) the inequality
is assumed to hold for compact sets only, then it is said that the weak LDP holds.
Strengthening a weak LDP to an LDP amounts to establishing exponential tightness,
i.e., to proving that for every N < ∞ there exists a compact set KN ⊆ X such that
lim supε↓0 ε ln με ([KN ]c ) ≤ −N .
The LDP is the workhorse for the computation of averages of exponential func-
tionals, as contained in the following lemma.

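Lemma 6.2 (Varadhan’s lemma) If $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $\mathcal{X}$ with rate function $I$, then
\[ \lim_{\varepsilon\downarrow 0} \varepsilon \ln \int_{\mathcal{X}} e^{F(x)/\varepsilon}\, \mu_\varepsilon(dx) = \Lambda_F, \qquad \forall\, F \in C_b(\mathcal{X}), \tag{6.1.2} \]
where $C_b(\mathcal{X})$ is the space of bounded continuous functions on $\mathcal{X}$, and
\[ \Lambda_F = \sup_{x\in\mathcal{X}} \big[ F(x) - I(x) \big]. \tag{6.1.3} \]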
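Varadhan's lemma can be checked numerically in a simple case. The sketch below is an illustration only, not part of the text: it assumes the Gaussian family $\mu_\varepsilon = N(0,\varepsilon)$ on $\mathbb{R}$, which satisfies the LDP with $I(x) = x^2/2$, and the hypothetical test function $F = \sin$. The function name `varadhan_check`, the integration window and the grid size are our own choices.

```python
import math

def varadhan_check(F, eps, lo=-6.0, hi=6.0, n=120001):
    """Compare eps * ln E[exp(F(X)/eps)] for X ~ N(0, eps) with
    sup_x [F(x) - x^2/2], both evaluated on a uniform grid.

    Assumption: the mass of the integrand outside [lo, hi] is negligible,
    which holds for bounded F and small eps.
    """
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    h = [F(x) - 0.5 * x * x for x in xs]   # F - I on the grid
    h_max = max(h)                          # grid approximation of Lambda_F
    # Factor out exp(h_max / eps) so that the sum cannot overflow.
    s = sum(math.exp((v - h_max) / eps) for v in h) * dx
    log_integral = h_max / eps + math.log(s) - 0.5 * math.log(2.0 * math.pi * eps)
    return eps * log_integral, h_max

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        lhs, lam = varadhan_check(math.sin, eps)
        print(f"eps={eps:g}  eps*ln(integral)={lhs:.6f}  Lambda_F={lam:.6f}")
```

The gap between the two numbers shrinks linearly in $\varepsilon$, as expected from a Laplace-type expansion around the maximiser of $F - I$.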

The result in Lemma 6.2 can be extended to include F that are unbounded
and/or discontinuous, provided certain tail estimates on με are available. Varadhan’s
lemma has the following inverse.

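Lemma 6.3 (Bryc’s lemma) Suppose that $(\mu_\varepsilon)_{\varepsilon>0}$ is exponentially tight and the limit in (6.1.2) exists for all $F \in C_b(\mathcal{X})$. Then $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP with rate function $I$ given by
\[ I(x) = \sup_{F\in C_b(\mathcal{X})} \big[ F(x) - \Lambda_F \big], \qquad x \in \mathcal{X}. \tag{6.1.4} \]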

There are several “forward principles” that allow LDPs to be generated from one
another. A key example is the contraction principle (see Fig. 6.2).

Fig. 6.2 Illustration of the contraction principle

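Lemma 6.4 (Contraction principle) Let $(\mu_\varepsilon)_{\varepsilon>0}$ satisfy the LDP on $\mathcal{X}$ with rate function $I$. Let $\mathcal{Y}$ be a second Polish space, and let $T\colon \mathcal{X} \to \mathcal{Y}$ be a continuous map. Then the family of probability measures $(\nu_\varepsilon)_{\varepsilon>0}$ on $\mathcal{Y}$ defined by $\nu_\varepsilon = \mu_\varepsilon \circ T^{-1}$ satisfies the LDP on $\mathcal{Y}$ with rate function $J$ given by
\[ J(y) = \inf_{x\in\mathcal{X}\colon\, T(x)=y} I(x), \qquad y \in \mathcal{Y}. \tag{6.1.5} \]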
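A small worked example, not taken from the text, may make the contraction principle concrete. Assume $\mu_\varepsilon = N(0,\varepsilon)$ on $\mathbb{R}$, which satisfies the LDP with $I(x) = x^2/2$, and take the continuous map $T(x) = x^2$ (both choices are ours, for illustration):

```latex
\[
J(y) \;=\; \inf_{x\in\mathbb{R}\colon\, x^2 = y} \tfrac12 x^2
     \;=\; \begin{cases} \tfrac12\, y, & y \ge 0, \\[2pt] \infty, & y < 0, \end{cases}
\]
% so the law \nu_\varepsilon of \varepsilon Z^2, with Z standard normal,
% satisfies the LDP on \mathbb{R} with rate function J.
```

The infimum over the empty set is $\infty$, which is how negative values of $y$ are excluded.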

Another example is via exponential tilting:

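Lemma 6.5 Let $(\mu_\varepsilon)_{\varepsilon>0}$ satisfy the LDP on $\mathcal{X}$ with rate function $I$, and let $F \in C_b(\mathcal{X})$. Then the family of probability measures $(\nu_\varepsilon)_{\varepsilon>0}$ on $\mathcal{X}$ defined by
\[ \nu_\varepsilon(dx) = \frac{1}{N_\varepsilon}\, e^{F(x)/\varepsilon}\, \mu_\varepsilon(dx), \qquad N_\varepsilon = \int_{\mathcal{X}} e^{F(x)/\varepsilon}\, \mu_\varepsilon(dx), \tag{6.1.6} \]
satisfies the LDP on $\mathcal{X}$ with rate function $J$ given by
\[ J(x) = \Lambda_F - \big[ F(x) - I(x) \big], \qquad x \in \mathcal{X}. \tag{6.1.7} \]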

A final example is the Dawson-Gärtner projective limit LDP:

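Theorem 6.6 (Dawson-Gärtner projective limit LDP) Let $(\mu_\varepsilon)_{\varepsilon>0}$ be a family of probability measures on $\mathcal{X}$. Let $(\pi^N)_{N\in\mathbb{N}}$ be a nested family of projections acting on $\mathcal{X}$, such that $\bigcup_{N\in\mathbb{N}} \pi^N \mathcal{X} = \mathcal{X}$, and let
\[ \mathcal{X}^N = \pi^N \mathcal{X}, \qquad \mu_\varepsilon^N = \mu_\varepsilon \circ (\pi^N)^{-1}, \qquad N \in \mathbb{N}. \tag{6.1.8} \]
If, for each $N \in \mathbb{N}$, the family $(\mu_\varepsilon^N)_{\varepsilon>0}$ satisfies the LDP on $\mathcal{X}^N$ with rate function $I^N$, then $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $\mathcal{X}$ with rate function $I$ given by
\[ I(x) = \sup_{N\in\mathbb{N}} I^N\big( \pi^N x \big), \qquad x \in \mathcal{X}. \tag{6.1.9} \]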

Since
\[ I^N(y) = \inf_{\{x\in\mathcal{X}\colon\, \pi^N(x)=y\}} I(x), \qquad y \in \mathcal{X}^N, \tag{6.1.10} \]
the supremum in (6.1.9) is monotone in $N$ by the nestedness property. The projective limit LDP can, for instance, be used to extend a suitably nested sequence of LDPs on finite-dimensional spaces to an LDP on an infinite-dimensional space.

LDPs can be formulated on general topological spaces X , although typically
this comes at the cost of more technicalities. Conversely, more can be said when
X has more structure. For instance, if X is a vector space, then the rate function
can be identified as the Legendre transform of a (generalised) cumulant generating
function. When X = Rd , this is known as the Gärtner-Ellis theorem:

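Theorem 6.7 (Gärtner-Ellis theorem) Let $(\mu_\varepsilon)_{\varepsilon>0}$ be a family of probability measures on $\mathbb{R}^d$, $d \ge 1$, with the following properties:

(i) $\phi(u) = \lim_{\varepsilon\downarrow 0} \varepsilon \ln \int_{\mathbb{R}^d} e^{\langle u, x\rangle/\varepsilon}\, \mu_\varepsilon(dx)$ exists in $\mathbb{R}$ for all $u \in \mathbb{R}^d$, where $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb{R}^d$.
(ii) $u \mapsto \phi(u)$ is differentiable on $\mathbb{R}^d$.
Then $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $\mathbb{R}^d$ with a convex rate function $\phi^*$ given by
\[ \phi^*(x) = \sup_{u\in\mathbb{R}^d} \big[ \langle u, x\rangle - \phi(u) \big], \qquad x \in \mathbb{R}^d. \tag{6.1.11} \]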

There is a version of Theorem 6.7 where the domain of $\phi$ is not all of $\mathbb{R}^d$, in which case additional assumptions must be made.
Two special cases of Theorem 6.7 deserve to be mentioned. Let $(X_i)_{i\in\mathbb{N}}$ be i.i.d. $\mathbb{R}$-valued random variables with common law $\rho$. Let $\mathcal{M}_1(\mathbb{R})$ denote the space of probability measures on $\mathbb{R}$ (which is a subset of the vector space of signed measures on $\mathbb{R}$).

• Cramér’s Theorem: Let $\mu_n$ denote the law of the empirical average $n^{-1} \sum_{i=1}^n X_i$. If $M(\lambda) = \int_{\mathbb{R}} e^{\lambda x}\, \rho(dx) < \infty$ for all $\lambda \in \mathbb{R}$, then $(\mu_n)_{n\in\mathbb{N}}$ satisfies the LDP on $\mathbb{R}$ with rate $\varepsilon = n^{-1}$ and rate function
\[ I(x) = \sup_{\lambda\in\mathbb{R}} \big[ \lambda x - \ln M(\lambda) \big], \qquad x \in \mathbb{R}. \tag{6.1.12} \]

• Sanov’s Theorem: Let $\mu_n$ denote the law of the empirical distribution $n^{-1} \sum_{i=1}^n \delta_{X_i}$. Then $(\mu_n)_{n\in\mathbb{N}}$ satisfies the LDP on $\mathcal{M}_1(\mathbb{R})$ with rate $\varepsilon = n^{-1}$ and rate function
\[ I(\nu) = \int_{\mathbb{R}} \nu(dx)\, \ln\Big[ \frac{d\nu}{d\rho}(x) \Big], \qquad \nu \in \mathcal{M}_1(\mathbb{R}), \tag{6.1.13} \]
with the right-hand side infinite when $\nu$ is not absolutely continuous with respect to $\rho$.
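The Legendre transform in Cramér's theorem is easy to evaluate numerically. The following sketch is an illustration under our own assumptions: it takes $\rho$ to be the Bernoulli($p$) law, for which $M(\lambda) = 1 - p + p e^{\lambda}$ is finite for all $\lambda$, and compares the numerical supremum in (6.1.12) with the known closed form, the relative entropy of Bernoulli($x$) with respect to Bernoulli($p$). The function names and the search bracket are hypothetical choices, not taken from the text.

```python
import math

def log_mgf_bernoulli(lam: float, p: float) -> float:
    """ln M(lambda) for a Bernoulli(p) variable."""
    return math.log(1.0 - p + p * math.exp(lam))

def cramer_rate(x: float, p: float, lo: float = -50.0, hi: float = 50.0) -> float:
    """Evaluate I(x) = sup_lambda [lambda * x - ln M(lambda)] by ternary search.

    The objective is concave in lambda, so ternary search converges to the
    maximiser, provided it lies inside [lo, hi] (true for x in (0, 1) not
    too close to the endpoints).
    """
    def objective(lam: float) -> float:
        return lam * x - log_mgf_bernoulli(lam, p)

    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

def relative_entropy(x: float, p: float) -> float:
    """Closed form of the Bernoulli rate function for 0 < x < 1."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))

if __name__ == "__main__":
    for x in (0.5, 0.6, 0.7, 0.9):
        print(x, cramer_rate(x, 0.5), relative_entropy(x, 0.5))
```

The rate function vanishes at the mean $x = p$ and is strictly positive elsewhere, which is the picture of Fig. 6.1.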

Fig. 6.3 A path γ over the time interval [0, T ]

6.2 Path large deviations for diffusion processes

The general theory in Sect. 6.1 serves as the framework for many concrete exam-
ples. In this section we take a look at large deviations on path space for diffusion
processes. In Sect. 6.2.1 we derive Schilder’s theorem for Brownian motion. The
proof is written out in detail in order to convey the background of this theorem. In
Sects. 6.2.2–6.2.3 we show how to extend Schilder’s theorem to diffusions.

6.2.1 Brownian motion

Brownian motion $B = (B_t)_{t\in\mathbb{R}_+}$ on $\mathbb{R}^d$ starting at the origin will typically be at a distance of order $\sqrt{t}$ from the origin at time $t$; in particular, $B_t/t$ converges to zero a.s. as $t \to \infty$. We are interested in computing the probability that $B$ follows an exceptional path for which $B_t$ lives on space-scale $t$. To formalise this idea, we fix a time-horizon $T > 0$ and a smooth path $\gamma\colon [0,T] \to \mathbb{R}^d$ starting at the origin (see Fig. 6.3), and we estimate the probability
\[ P\Big( \sup_{s\in[0,T]} \big\| \varepsilon B_{s/\varepsilon} - \gamma(s) \big\| \le \delta \Big), \qquad \varepsilon \downarrow 0, \tag{6.2.1} \]
where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^d$. The following result is known as Schilder’s theorem. Let $C_0([0,T])$ be the space of continuous functions $f\colon [0,T] \to \mathbb{R}^d$ such that $f(0) = 0$, equipped with the supremum norm $\|f\|_\infty = \sup_{s\in[0,T]} \|f(s)\|$.

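Theorem 6.8 (Schilder’s theorem) Set $B^\varepsilon = (B^\varepsilon_s)_{s\in[0,T]}$ with $B^\varepsilon_s = \varepsilon B_{s/\varepsilon}$. Then $(B^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $I$ given by
\[ I(\gamma) = \begin{cases} \dfrac12 \displaystyle\int_0^T \|\dot\gamma(s)\|^2\, ds, & \text{if } \gamma \in H_1, \\[4pt] \infty, & \text{otherwise,} \end{cases} \tag{6.2.2} \]
where $H_1$ is the space of absolutely continuous functions with square-integrable derivative equipped with the norm $\|f\|_{H_1} = \big[ \int_0^T \|\dot f(s)\|^2\, ds \big]^{1/2}$.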
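To get a feeling for the rate function (6.2.2), it can help to evaluate it on polygonal paths, for which the integral reduces to a finite sum of squared increments divided by twice the time step. The sketch below is an illustration with our own function name `polygonal_action` and hypothetical test paths; it is not code from the text.

```python
def polygonal_action(path, T, n):
    """Schilder action of the piecewise-linear interpolation of `path`
    on a uniform grid of n intervals of [0, T].

    `path` maps a time s in [0, T] to a tuple of d floats. For a polygonal
    path, (1/2) * integral of |derivative|^2 equals the sum over grid
    intervals of |increment|^2 / (2 * dt).
    """
    dt = T / n
    total = 0.0
    prev = path(0.0)
    for k in range(1, n + 1):
        cur = path(k * dt)
        total += sum((c - p) ** 2 for c, p in zip(cur, prev)) / (2.0 * dt)
        prev = cur
    return total

if __name__ == "__main__":
    # Straight line with velocity (2, -1) on [0, 3]: action = 0.5 * 5 * 3.
    print(polygonal_action(lambda s: (2.0 * s, -1.0 * s), 3.0, 10))
    # Parabola s -> s^2 on [0, 1]: the exact action is 2/3.
    for n in (10, 100, 1000):
        print(n, polygonal_action(lambda s: (s * s,), 1.0, n))
```

For a smooth path the polygonal action increases to $I(\gamma)$ as the grid is refined, since replacing a path segment by its chord can only lower the action (Jensen's inequality).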

Proof First, we prove a lower bound for the probability in (6.2.1).

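Lemma 6.9 For every $\gamma \in H_1$,
\[ \liminf_{\varepsilon\downarrow 0} \varepsilon \ln P\big( \|B^\varepsilon - \gamma\|_\infty < \delta \big) \ge -I(\gamma) \qquad \forall\, \delta > 0. \tag{6.2.3} \]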

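Proof Fix $\delta > 0$. Note that $(\varepsilon B_{s/\varepsilon})_{s\in[0,T]}$ has the same distribution as $(\varepsilon^{1/2} B_s)_{s\in[0,T]}$. Hence
\[ P\big( \|B^\varepsilon - \gamma\|_\infty < \delta \big) = P\big( \|B - \varepsilon^{-1/2}\gamma\|_\infty < \varepsilon^{-1/2}\delta \big). \tag{6.2.4} \]
To estimate the probability in the right-hand side, we observe that, by the Girsanov theorem (Theorem 5.61), the process $\tilde B = (\tilde B_s)_{s\in[0,T]}$ defined by
\[ \tilde B_s = B_s - \varepsilon^{-1/2} \gamma(s) \tag{6.2.5} \]
is a Brownian motion under the measure $Q$ defined through the Radon-Nikodým derivative
\[ \frac{dQ}{dP} = \exp\Big( \varepsilon^{-1/2} \int_0^T \dot\gamma(s)\, dB_s - \varepsilon^{-1}\, \tfrac12 \int_0^T \|\dot\gamma(s)\|^2\, ds \Big). \tag{6.2.6} \]
Hence, abbreviating
\[ Z(B,\gamma) = \int_0^T \dot\gamma(s)\, dB_s, \tag{6.2.7} \]
we get
\[ \begin{aligned}
P\big( \|B - \varepsilon^{-1/2}\gamma\|_\infty < \varepsilon^{-1/2}\delta \big)
&= P\big( \|\tilde B\|_\infty < \varepsilon^{-1/2}\delta \big) \\
&= E_Q\Big[ \exp\big( -\varepsilon^{-1/2} Z(\tilde B,\gamma) - \varepsilon^{-1} I(\gamma) \big)\, 1_{\{\|\tilde B\|_\infty < \varepsilon^{-1/2}\delta\}} \Big] \\
&= \exp\big( -\varepsilon^{-1} I(\gamma) \big)\, Q\big( \|\tilde B\|_\infty < \varepsilon^{-1/2}\delta \big)\, E_Q\Big[ \exp\big( -\varepsilon^{-1/2} Z(\tilde B,\gamma) \big) \,\Big|\, \|\tilde B\|_\infty < \varepsilon^{-1/2}\delta \Big] \\
&= \exp\big( -\varepsilon^{-1} I(\gamma) \big)\, P\big( \|B\|_\infty < \varepsilon^{-1/2}\delta \big)\, E_P\Big[ \exp\big( -\varepsilon^{-1/2} Z(B,\gamma) \big) \,\Big|\, \|B\|_\infty < \varepsilon^{-1/2}\delta \Big].
\end{aligned} \tag{6.2.8} \]
From Jensen’s inequality we have that
\[ E_P\Big[ \exp\big( -\varepsilon^{-1/2} Z(B,\gamma) \big) \,\Big|\, \|B\|_\infty < \varepsilon^{-1/2}\delta \Big] \ge \exp\Big( -\varepsilon^{-1/2}\, E_P\big[ Z(B,\gamma) \,\big|\, \|B\|_\infty < \varepsilon^{-1/2}\delta \big] \Big) = 1, \tag{6.2.9} \]
where the conditional expectation vanishes because the conditioning event is invariant under $B \to -B$ while $Z(B,\gamma)$ changes sign. On the other hand, it is easy to see (applying Doob’s maximum inequality for submartingales in Theorem 3.57) that
\[ \lim_{\varepsilon\downarrow 0} P\big( \|B\|_\infty < \varepsilon^{-1/2}\delta \big) = 1 \tag{6.2.10} \]
and hence
\[ \liminf_{\varepsilon\downarrow 0} \varepsilon \ln P\big( \|B - \varepsilon^{-1/2}\gamma\|_\infty < \varepsilon^{-1/2}\delta \big) \ge -I(\gamma). \tag{6.2.11} \]
Together with (6.2.4) this gives the desired lower bound.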

Next, we prove an upper bound for the probability in (6.2.1), which is stated in a
somewhat particular form.

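Lemma 6.10 Let $K_\lambda = \{\gamma \in H_1 \colon I(\gamma) \le \lambda\}$, $\delta, \lambda \in [0,\infty)$. Then
\[ \limsup_{\varepsilon\downarrow 0} \varepsilon \ln P\Big( \inf_{\gamma\in K_\lambda} \|B^\varepsilon - \gamma\|_\infty > \delta \Big) \le -\lambda. \tag{6.2.12} \]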

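Proof Fix $\lambda \in [0,\infty)$. For $n \in \mathbb{N}$, set $t_k = (k/n)T$, $k = 0,\dots,n$. Let $L^\varepsilon = (L^\varepsilon_s)_{s\in[0,T]}$ be the linear interpolation of $B^\varepsilon = (B^\varepsilon_s)_{s\in[0,T]}$ such that $B^\varepsilon_{t_k} = L^\varepsilon_{t_k}$ for $k = 0,\dots,n$. Then
\[ \begin{aligned}
P\big( \|B^\varepsilon - L^\varepsilon\|_\infty > \delta \big)
&\le \sum_{k=1}^n P\Big( \max_{s\in[t_{k-1},t_k]} \|B^\varepsilon_s - L^\varepsilon_s\| > \delta \Big) \\
&= n\, P\Big( \max_{s\in[0,T/n]} \big\| B^\varepsilon_s - (sn/T)\, B^\varepsilon_{T/n} \big\| > \delta \Big) \\
&= n\, P\Big( \max_{s\in[0,1]} \|B_s - sB_1\| > \delta (n/T\varepsilon)^{1/2} \Big) \\
&\le n\, P\Big( \max_{s\in[0,1]} \|B_s\| > \tfrac12 \delta (n/T\varepsilon)^{1/2} \Big),
\end{aligned} \tag{6.2.13} \]
where we use that $\max_{s\in[0,1]} \|B_s\| \le \tfrac12 x$ implies $\max_{s\in[0,1]} \|B_s - sB_1\| \le x$. The last probability can be estimated by using the following exponential inequality for one-dimensional Brownian motion:
\[ P_{d=1}\Big( \max_{s\in[0,t]} |B_s| > xt \Big) \le 2 \exp\big( -\tfrac12 x^2 t \big), \qquad t \in \mathbb{R}_+. \tag{6.2.14} \]
This is easily obtained by using that $Z = (Z_t)_{t\in\mathbb{R}_+}$ with $Z_t = \exp(B_t - \tfrac12 t)$ is a martingale and by applying the Doob maximum inequality for submartingales in Theorem 3.57. Inserting (6.2.14) into (6.2.13), we get
\[ P\Big( \max_{s\in[0,1]} \|B_s\| > \tfrac12 \delta (n/\varepsilon T)^{1/2} \Big) \le d\, P_{d=1}\Big( \max_{s\in[0,1]} |B_s| > \tfrac12 \delta (n/Td\varepsilon)^{1/2} \Big) \le 2d\, e^{-\frac{\delta^2 n}{8Td\varepsilon}}, \tag{6.2.15} \]
and so
\[ \limsup_{\varepsilon\downarrow 0} \varepsilon \ln P\big( \|B^\varepsilon - L^\varepsilon\|_\infty > \delta \big) \le -\frac{\delta^2 n}{8Td}. \tag{6.2.16} \]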
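The exponential inequality (6.2.14) is only sketched in the proof. A possible derivation runs as follows. It assumes that Doob's maximal inequality is applied to the exponential martingale $Z^\theta_t = \exp(\theta B_t - \tfrac12 \theta^2 t)$, where the free parameter $\theta > 0$ is our own notation (the text writes the case $\theta = 1$):

```latex
% One-dimensional Brownian motion B, level xt with x > 0, free parameter \theta > 0.
\begin{align*}
P\Big( \max_{s\in[0,t]} B_s > xt \Big)
 &\le P\Big( \max_{s\in[0,t]} Z^\theta_s \ge e^{\theta x t - \frac12 \theta^2 t} \Big)
 && \text{since } Z^\theta_s \ge e^{\theta B_s - \frac12 \theta^2 t} \text{ for } s \le t, \\
 &\le e^{-\theta x t + \frac12 \theta^2 t}\, E\big[ Z^\theta_t \big]
   = e^{-\theta x t + \frac12 \theta^2 t}
 && \text{(Doob's maximal inequality, } E[Z^\theta_t] = 1\text{)}, \\
 &= e^{-\frac12 x^2 t}
 && \text{for the optimal choice } \theta = x .
\end{align*}
% Applying the same bound to -B and adding the two estimates
% produces the factor 2 in (6.2.14).
```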

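On the other hand,
\[ \limsup_{\varepsilon\downarrow 0} \varepsilon \ln P\big( I(L^\varepsilon) > \lambda \big) \le -\lambda. \tag{6.2.17} \]
Indeed, we have
\[ I(L^\varepsilon) = \frac{n}{2T} \sum_{k=1}^n \big\| B^\varepsilon_{t_k} - B^\varepsilon_{t_{k-1}} \big\|^2 = \tfrac12 \varepsilon \sum_{i=1}^{dn} \eta_i^2, \tag{6.2.18} \]
where $\eta_i$, $i \in \mathbb{N}$, are i.i.d. standard normal random variables. Since
\[ E\big[ e^{\rho \frac12 \eta_i^2} \big] \le C_\rho < \infty \qquad \forall\, 0 < \rho < 1, \tag{6.2.19} \]
it follows that
\[ P\Big( \tfrac12 \varepsilon \sum_{i=1}^{dn} \eta_i^2 > \lambda \Big) \le e^{-\rho\lambda/\varepsilon}\, E\Big[ e^{\rho \sum_{i=1}^{dn} \frac12 \eta_i^2} \Big] \le e^{-\rho\lambda/\varepsilon}\, (C_\rho)^{dn}, \tag{6.2.20} \]
which yields (6.2.17) after letting $\varepsilon \downarrow 0$ followed by $\rho \uparrow 1$. Combining (6.2.16)–(6.2.17), and using that
\[ P\Big( \inf_{\gamma\in K_\lambda} \|B^\varepsilon - \gamma\|_\infty > \delta \Big) \le P\big( \|B^\varepsilon - L^\varepsilon\|_\infty > \delta \big) + P\big( I(L^\varepsilon) > \lambda \big), \tag{6.2.21} \]
we get
\[ \limsup_{\varepsilon\downarrow 0} \varepsilon \ln P\Big( \inf_{\gamma\in K_\lambda} \|B^\varepsilon - \gamma\|_\infty > \delta \Big) \le \Big( -\frac{\delta^2 n}{8Td} \Big) \vee (-\lambda), \tag{6.2.22} \]
which yields the claim after we let $n \to \infty$.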

Finally, we show that I has compact level sets.

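Lemma 6.11 $K_\lambda$, $\lambda \in [0,\infty)$, are compact.

Proof We have
\[ \gamma \in H_1 \iff \sup_{N\in\mathbb{N}}\ \sup_{0\le t_1<\cdots<t_N\le T}\ \tfrac12 \sum_{i=1}^{N-1} \frac{|\gamma(t_{i+1}) - \gamma(t_i)|^2}{|t_{i+1} - t_i|} < \infty, \tag{6.2.23} \]
with the supremum being equal to $I(\gamma)$. Let $(\gamma_n)_{n\in\mathbb{N}}$ be a convergent sequence in $C_0([0,T])$ with limit $\gamma$. Then
\[ \begin{aligned}
I(\gamma) &= \sup_{N\in\mathbb{N}}\ \sup_{0\le t_1<\cdots<t_N\le T}\ \tfrac12 \sum_{i=1}^{N-1} \frac{|\gamma(t_{i+1}) - \gamma(t_i)|^2}{|t_{i+1} - t_i|} \\
&= \sup_{N\in\mathbb{N}}\ \sup_{0\le t_1<\cdots<t_N\le T}\ \tfrac12 \sum_{i=1}^{N-1} \lim_{n\to\infty} \frac{|\gamma_n(t_{i+1}) - \gamma_n(t_i)|^2}{|t_{i+1} - t_i|} \\
&\le \liminf_{n\to\infty}\ \sup_{N\in\mathbb{N}}\ \sup_{0\le t_1<\cdots<t_N\le T}\ \tfrac12 \sum_{i=1}^{N-1} \frac{|\gamma_n(t_{i+1}) - \gamma_n(t_i)|^2}{|t_{i+1} - t_i|} \\
&= \liminf_{n\to\infty} I(\gamma_n),
\end{aligned} \tag{6.2.24} \]
i.e., $\gamma \mapsto I(\gamma)$ is lower semi-continuous, which implies that $K_\lambda$, $\lambda \in [0,\infty)$, are closed. Next, for any $\lambda \in [0,\infty)$, $\gamma \in K_\lambda$, and $0 \le u < v \le T$,
\[ \|\gamma(v) - \gamma(u)\| = \Big\| \int_u^v \dot\gamma(s)\, ds \Big\| \le (v-u)^{1/2} \Big( \int_u^v \|\dot\gamma(s)\|^2\, ds \Big)^{1/2} \le \big[ 2(v-u)\lambda \big]^{1/2}. \tag{6.2.25} \]
Hence $K_\lambda$ is uniformly bounded and uniformly equi-continuous. Therefore, by the Arzelà-Ascoli theorem, $K_\lambda$ is compact.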

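We are now ready to prove the LDP in Theorem 6.8. Since $I$ is not identically infinite, Part (i) in Definition 6.1 follows from Lemma 6.11. To get Part (ii), note that for every open set $O \subset C_0([0,T])$ and every $\gamma \in O$, there exists a $\delta = \delta(O,\gamma) > 0$ such that $\{\gamma' \colon \|\gamma' - \gamma\|_\infty < \delta\} \subseteq O$. Hence, Lemma 6.9 implies that
\[ \liminf_{\varepsilon\downarrow 0} \varepsilon \ln P(B^\varepsilon \in O) \ge \liminf_{\varepsilon\downarrow 0} \varepsilon \ln P\big( \|B^\varepsilon - \gamma\|_\infty < \delta \big) \ge -I(\gamma), \tag{6.2.26} \]
which yields Part (ii) after we take the supremum over $\gamma \in O$. Part (iii) is derived as follows. Since $K_\lambda$, $\lambda \in [0,\infty)$, are compact, we know that
\[ \forall\, \lambda, \delta > 0 \ \ \exists\, \delta' = \delta'(\lambda,\delta) > 0 \colon \quad K^{\delta'}_{\lambda-2\delta} \subseteq K_{\lambda-\delta}, \tag{6.2.27} \]
where $K^\delta_\lambda = \{\gamma \colon \inf_{\gamma'\in K_\lambda} \|\gamma - \gamma'\|_\infty \le \delta\}$ is the $\delta$-blow-up of $K_\lambda$. Pick $C \subseteq C_0([0,T])$ closed, and let $\lambda_C = \inf_{\gamma\in C} I(\gamma)$. Then $C \subseteq [K_{\lambda_C-\delta}]^c$ for all $\delta > 0$. Hence (6.2.27) gives
\[ P(B^\varepsilon \in C) \le P\big( B^\varepsilon \in [K_{\lambda_C-\delta}]^c \big) \le P\big( B^\varepsilon \in [K^{\delta'}_{\lambda_C-2\delta}]^c \big), \qquad \delta' = \delta'(\lambda_C,\delta). \tag{6.2.28} \]
By applying Lemma 6.10, we get
\[ \limsup_{\varepsilon\downarrow 0} \varepsilon \ln P(B^\varepsilon \in C) \le \limsup_{\varepsilon\downarrow 0} \varepsilon \ln P\Big( \inf_{\gamma\in K_{\lambda_C-2\delta}} \|B^\varepsilon - \gamma\|_\infty > \delta' \Big) \le -(\lambda_C - 2\delta). \tag{6.2.29} \]
Letting $\delta \downarrow 0$, we get Part (iii).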

6.2.2 Brownian motion with drift

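We next show how to pass to the analogous result for a Brownian motion with a drift, namely, we consider the SDE
\[ X^\varepsilon_t = B^\varepsilon_t + \int_0^t b\big( X^\varepsilon_s \big)\, ds, \qquad t \in \mathbb{R}_+, \tag{6.2.30} \]
where $b\colon \mathbb{R}^d \to \mathbb{R}^d$ is assumed to be globally Lipschitz.

Theorem 6.12 Set $X^\varepsilon = (X^\varepsilon_s)_{s\in[0,T]}$. Then $(X^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $\tilde I$ given by
\[ \tilde I(\gamma) = \tfrac12 \int_0^T \big\| \dot\gamma(s) - b\big( \gamma(s) \big) \big\|^2\, ds. \tag{6.2.31} \]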

Proof The easiest way to set up the proof is to consider the map $F\colon C_0([0,T]) \to C_0([0,T])$ given by
\[ F(\gamma) = f, \tag{6.2.32} \]
where $f$ is the solution of the integral equation
\[ f(t) = \gamma(t) + \int_0^t b\big( f(s) \big)\, ds, \qquad t \in [0,T]. \tag{6.2.33} \]
We may use Gronwall’s lemma to show that $F$ is continuous. Moreover, $X^\varepsilon = F(B^\varepsilon)$ in distribution, and hence
\[ P\big( X^\varepsilon \in A \big) = P\big( B^\varepsilon \in F^{-1}(A) \big), \qquad A \subseteq C_0([0,T]). \tag{6.2.34} \]
Since under a continuous map the inverse image of an open (closed) set is again an open (closed) set, we can use Schilder’s theorem (Theorem 6.8) and the contraction principle (Lemma 6.4) to obtain that $(X^\varepsilon)_{\varepsilon>0}$ satisfies the LDP with rate function $\tilde I = I \circ F^{-1}$, i.e.,
\[ \tilde I(f) = \inf\big\{ I(\gamma) \colon F(\gamma) = f \big\}. \tag{6.2.35} \]
Since
\[ F^{-1}(f)(t) = \gamma(t) = f(t) - \int_0^t b\big( f(s) \big)\, ds, \qquad t \in [0,T], \tag{6.2.36} \]
we get the claim.
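The final step of the proof can be spelled out. The following is a sketch under the assumption that $F$ is a bijection of $C_0([0,T])$ with inverse (6.2.36), so that the infimum in the contraction principle runs over a single path; we write $f$ for the path at which the rate function of $X^\varepsilon$ is evaluated:

```latex
\begin{align*}
\tilde I(f) &= I\big( F^{-1}(f) \big)
  = \tfrac12 \int_0^T \Big\| \frac{d}{ds} \Big[ f(s) - \int_0^s b\big( f(r) \big)\, dr \Big] \Big\|^2 ds \\
 &= \tfrac12 \int_0^T \big\| \dot f(s) - b\big( f(s) \big) \big\|^2\, ds ,
\end{align*}
% valid when F^{-1}(f) \in H_1. Since s \mapsto \int_0^s b(f(r)) dr is continuously
% differentiable, this holds if and only if f \in H_1; otherwise \tilde I(f) = \infty.
```

This is (6.2.31) with $\gamma$ replaced by $f$.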

6.2.3 Diffusion processes

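It is possible to push the argument in Sect. 6.2.2 and consider the SDE
\[ X^\varepsilon_t = \int_0^t b\big( X^\varepsilon_s \big)\, ds + \int_0^t \sigma\big( X^\varepsilon_s \big)\, dB^\varepsilon_s, \qquad t \in \mathbb{R}_+, \tag{6.2.37} \]
where $b\colon \mathbb{R}^d \to \mathbb{R}^d$ is assumed to be bounded and globally Lipschitz, and $\sigma\colon \mathbb{R}^d \to M_d$ (with $M_d$ the space of $d \times d$ $\mathbb{R}$-valued matrices) is assumed to have entries $\sigma_{ij}\colon \mathbb{R}^d \to \mathbb{R}$, $1 \le i,j \le d$, that are bounded and globally Lipschitz as well. Moreover, the diffusion matrix $a = \sigma\sigma^\dagger$ (the product of $\sigma$ and its transpose $\sigma^\dagger$) is assumed to be uniformly elliptic, i.e., there exists a $\delta > 0$ such that $\langle a(x)y, y\rangle \ge \delta \|y\|^2$ for all $x, y \in \mathbb{R}^d$.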

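Theorem 6.13 Set $X^\varepsilon = (X^\varepsilon_s)_{s\in[0,T]}$. Then $(X^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $\hat I$ given by
\[ \hat I(\gamma) = \begin{cases} \dfrac12 \displaystyle\int_0^T \big\langle [\dot\gamma - (b\circ\gamma)],\, a^{-1}(\gamma)\, [\dot\gamma - (b\circ\gamma)] \big\rangle(s)\, ds, & \text{if } \gamma \in H_1, \\[4pt] \infty, & \text{otherwise,} \end{cases} \tag{6.2.38} \]
where $a^{-1}$ is the inverse of $a$, and $\langle\cdot,\cdot\rangle$ is the standard inner product on $\mathbb{R}^d$.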

Theorem 6.13 can be deduced from Theorem 6.12 with the help of a time-change argument. To see how, first suppose that $b = 0$ and for simplicity take $d = 1$. Then $[B^\varepsilon]_{[X^\varepsilon]_t} = [X^\varepsilon]_t$, where $[\cdot]$ is the quadratic variation (recall Theorem 5.44). Let $i(t)$ be such that $(X^\varepsilon_{i(t)})_{t\in\mathbb{R}_+}$ is Brownian motion. Then $[X^\varepsilon]_{i(t)} = t$. On the other hand,
\[ [X^\varepsilon]_{i(t)} = \int_0^{i(t)} \sigma\big( X^\varepsilon_s \big)^2\, ds. \tag{6.2.39} \]
Hence differentiation with respect to $t$ gives $\sigma(X^\varepsilon_{i(t)})^2\, i'(t) = 1$, which together with $X^\varepsilon_{i(t)} = B^\varepsilon_t$ shows that
\[ i(t) = \int_0^t \sigma\big( B^\varepsilon_s \big)^{-2}\, ds. \tag{6.2.40} \]
Since $i(t)$ is measurable with respect to the filtration generated by $(B^\varepsilon_s)_{s\in[0,t]}$, it follows that $B^\varepsilon_{i^{-1}(t)}$ has the same distribution as $X^\varepsilon_t$ and hence is a weak solution of (6.2.37) with $b = 0$. Schilder’s theorem for this time-changed Brownian motion yields the claim in (6.2.38) with $b = 0$ (and $a = \sigma^2$). After adding the drift, we get (6.2.38).
Finally, it is possible to allow b, σ to be time-dependent. For the rate functions
this simply amounts to writing b(γ (s), s) and σ (γ (s), s) in the formulas. We refer
to the literature for assumptions and proofs.

6.3 Path large deviations for stochastic partial differential equations

The LDPs in Sect. 6.2 can be extended from SDEs to SPDEs. We focus on the class
of SPDEs that was described in Sect. 5.7.
Return to (5.7.1). For $(x,t) \in [0,1] \times \mathbb{R}_+$, consider the family of SPDEs
\[ \frac{\partial}{\partial t} u^\varepsilon(x,t) = \tfrac12 D\, \frac{\partial^2}{\partial x^2} u^\varepsilon(x,t) - V'\big( u^\varepsilon(x,t) \big) + \sqrt{2\varepsilon}\, \frac{\partial^2}{\partial x\, \partial t} W(x,t), \tag{6.3.1} \]
where $\varepsilon > 0$ is a parameter that scales the strength of the noise. We want to think of a mild solution as a random variable taking values in a Banach space. To do so, define, for $\alpha \in (0,\infty)$, the Banach space
\[ B_\alpha = \big\{ f \in C([0,1]) \colon \|f\|_\alpha < \infty \big\} \tag{6.3.2} \]
equipped with the norm
\[ \|f\|_\alpha = \sup_{x\in[0,1]} |f(x)| + \sup_{x,y\in[0,1]} \frac{|f(x) - f(y)|}{|x-y|^\alpha}. \tag{6.3.3} \]
As initial condition for (6.3.1) we take $u^\varepsilon(\cdot,0) = \xi(\cdot)$ with $\xi \in B_\alpha$. Fix $T > 0$. Define the space $W_2^{1,2}$ as
\[ W_2^{1,2} = \Big\{ \gamma\colon [0,1] \times [0,T] \to \mathbb{R} \colon \int_0^T dt \int_0^1 dx\, \Big| \frac{\partial \gamma(x,t)}{\partial t} \Big|^2 < \infty \Big\}. \tag{6.3.4} \]
Recall that, by Definition 5.65 and Theorem 5.66, a mild solution of our SPDE is a continuous map with values in $B_\alpha$, for any $\alpha \in (0,\tfrac14)$. Thus, a family of mild solutions $(u^\varepsilon)_{\varepsilon>0}$ is a family of random variables with values in the Banach space $C([0,T], B_\alpha)$. The following theorem asserts that these satisfy an LDP.

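Theorem 6.14 The family $(u^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C([0,T], B_\alpha)$ with rate function $I$ given by
\[ I(\gamma) = \begin{cases} \dfrac12 \displaystyle\int_0^T dt \int_0^1 dx\, \Big| \frac{\partial}{\partial t}\gamma(x,t) - \tfrac12 D\, \frac{\partial^2}{\partial x^2}\gamma(x,t) + V'\big( \gamma(x,t) \big) \Big|^2, & \text{if } \gamma \in W_2^{1,2},\ \gamma(\cdot,0) = \xi(\cdot), \\[4pt] \infty, & \text{otherwise.} \end{cases} \tag{6.3.5} \]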

Here is a sketch of how Theorem 6.14 comes about. The starting point is the LDP for the Brownian sheet, which is the analogue of Schilder’s theorem for Brownian motion (Theorem 6.8). To state this LDP, let $H$ be the space of all $h \in C([0,1] \times [0,T])$ such that there exists an $\dot h \in L^2([0,1] \times [0,T])$ with
\[ h(x,t) = \int_0^x dy \int_0^t ds\, \dot h(y,s), \qquad x \in [0,1],\ t \in [0,T]. \tag{6.3.6} \]

Theorem 6.15 $(\sqrt{2\varepsilon}\, W)_{\varepsilon>0}$ satisfies the LDP on $C([0,1] \times [0,T])$ with rate function $I_0$ given by
\[ I_0(h) = \begin{cases} \dfrac12 \displaystyle\int_0^T dt \int_0^1 dx\, |\dot h(x,t)|^2, & \text{if } h \in H, \\[4pt] \infty, & \text{otherwise.} \end{cases} \tag{6.3.7} \]

The LDP in Theorem 6.14 follows from Theorem 6.15 via the contraction principle (Lemma 6.4), and identifies the rate function as
\[ I(\gamma) = \inf\big\{ I_0(h) \colon h \in H,\ \mathcal{T}(h) = \gamma \big\}, \tag{6.3.8} \]
where $\mathcal{T}$ is the map from $H$ into $C([0,T], B_\alpha)$ such that $\mathcal{T}(h) = \gamma$ is the solution of
\[ \gamma(x,t) = \int_0^1 dy\, g_t(x,y)\, \xi(y) + \int_0^t ds \int_0^1 dy\, g_{t-s}(x,y) \big[ -V'\big( \gamma(y,s) \big) + \dot h(y,s) \big] \tag{6.3.9} \]
with $g_t(x,y)$ the density of the semi-group generated by $\tfrac12 D\, \partial^2/\partial x^2$ on $[0,1]$ (the heat kernel). Here, (6.3.8) and (6.3.9) are the infinite-dimensional analogues of (6.2.35) and (6.2.33), respectively. The fact that (6.3.8) is the same as (6.3.5) follows from the same type of inversion argument as in (6.2.36).

6.4 Path large deviations for Markov processes

In this section we state a path LDP for a general class of discrete-time Markov
processes subject to certain regularity conditions. This LDP will turn out to be useful
in Chap. 10.
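Fix $d \ge 1$. For $\varepsilon > 0$, let $Z^\varepsilon = (Z^\varepsilon_n)_{n\in\mathbb{N}_0}$ be the time-inhomogeneous Markov process on $\varepsilon\mathbb{Z}^d$, starting at the origin, with transition kernel
\[ p^\varepsilon(x,y;n) = P\big( Z^\varepsilon_{n+1} = y \mid Z^\varepsilon_n = x \big) = \begin{cases} \exp\big[ -q^\varepsilon\big( x, \varepsilon^{-1}(y-x); \varepsilon n \big) \big], & \text{if } \varepsilon^{-1}(y-x) \in \Delta, \\ 0, & \text{otherwise,} \end{cases} \qquad x,y \in \varepsilon\mathbb{Z}^d,\ n \in \mathbb{N}_0, \tag{6.4.1} \]
where $\Delta$ is a finite subset of $\mathbb{Z}^d$, and $q^\varepsilon\colon \mathbb{R}^d \times \Delta \times [0,\infty) \to (0,\infty)$, $\varepsilon > 0$, is a family of functions that are assumed to be bounded away from $0$ and $\infty$, to be globally Lipschitz in the first and in the third coordinate, uniformly in the second coordinate and in $\varepsilon > 0$, and to be such that
\[ \lim_{\varepsilon\downarrow 0} q^\varepsilon = q \quad \text{for some } q\colon \mathbb{R}^d \times \Delta \times [0,\infty) \to (0,\infty), \tag{6.4.2} \]
with the convergence uniform in all three coordinates.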

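For $u, v, v^* \in \mathbb{R}^d$ and $t \in \mathbb{R}_+$, define
\[ \begin{aligned} \mathcal{L}(u,v;t) &= \ln \sum_{w\in\Delta} e^{-q(u,w;t) + \langle v, w\rangle}, \\ \mathcal{L}^*(u,v^*;t) &= \sup_{v\in\mathbb{R}^d} \big[ \langle v, v^*\rangle - \mathcal{L}(u,v;t) \big]. \end{aligned} \tag{6.4.3} \]

Fix $T > 0$. Let $\bar Z^\varepsilon = (\bar Z^\varepsilon(s))_{s\in[0,T]}$ denote the linear interpolation of
\[ \big( Z^\varepsilon_{\lfloor \varepsilon^{-1} s \rfloor} \big)_{s\in[0,T]}. \tag{6.4.4} \]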

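Theorem 6.16 For every $T > 0$, $(\bar Z^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $I$ given by
\[ I(\gamma) = \begin{cases} \displaystyle\int_0^T \mathcal{L}^*\big( \gamma(s), \dot\gamma(s); s \big)\, ds, & \text{if } \gamma \in D_0([0,T]), \\[4pt] \infty, & \text{otherwise,} \end{cases} \tag{6.4.5} \]
where $D_0([0,T]) \subseteq C_0([0,T])$ is the space of absolutely continuous functions with integrable derivative equipped with the norm $\|f\|_{D_0([0,T])} = \int_0^T \|\dot f(s)\|\, ds$.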

A simple example to which the above setting applies is simple random walk on $\varepsilon\mathbb{Z}^d$, for which we choose $\Delta = \{x \in \mathbb{Z}^d \colon \|x\| = 1\}$ and $q^\varepsilon = \ln(2d)$. For this case Theorem 6.16 reduces to Mogul’skiĭ’s theorem [186] for simple random walk, the analogue of Schilder’s theorem for Brownian motion (see Dembo and Zeitouni [79]).
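For simple random walk the Lagrangian in (6.4.3) can be computed by hand. The following worked instance is an illustration for $d = 1$, where $\Delta = \{-1,+1\}$ and $q \equiv \ln 2$, so that the dependence on $u$ and $t$ drops out:

```latex
\begin{align*}
\mathcal{L}(v) &= \ln\Big( \tfrac12 e^{v} + \tfrac12 e^{-v} \Big) = \ln\cosh v, \\
\mathcal{L}^*(v^*) &= \sup_{v\in\mathbb{R}} \big[ v v^* - \ln\cosh v \big]
  = \tfrac12 (1+v^*) \ln(1+v^*) + \tfrac12 (1-v^*) \ln(1-v^*), \qquad |v^*| \le 1,
\end{align*}
% with \mathcal{L}^*(v^*) = \infty for |v^*| > 1 and the convention 0 \ln 0 = 0.
% For |v^*| < 1 the supremum is attained at v = \operatorname{artanh}(v^*).
```

In particular $\mathcal{L}^*(0) = 0$, so the only zero-cost path is the constant path at the origin, in line with the law of large numbers, and paths with slope larger than $1$ in absolute value have infinite cost because the walk cannot move faster than one lattice step per time step.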

6.5 Freidlin-Wentzell theory

In this section we give a brief indication of how large deviations on path space
are used in the pathwise approach to metastability of Freidlin and Wentzell (recall
Sect. 1.3.2).

6.5.1 Properties of action functionals

The rate functions in Sects. 6.2–6.4 have the form of a classical action functional in Newtonian mechanics, i.e., they are of the form
\[ I(\gamma) = \int_0^T \mathcal{L}\big( \gamma(s), \dot\gamma(s), s \big)\, ds, \tag{6.5.1} \]
for some Lagrangian $\mathcal{L}$. In Theorem 6.12, for instance, $\mathcal{L}$ takes on the special form
\[ \mathcal{L}\big( \gamma(s), \dot\gamma(s), s \big) = \tfrac12 \big\| \dot\gamma(s) - b\big( \gamma(s) \big) \big\|_2^2. \tag{6.5.2} \]
The principle of least action in classical mechanics states that the system follows the trajectory of minimal action subject to boundary conditions. This leads to the Euler-Lagrange equations, which in the case of (6.5.2) read (for $d = 1$)
\[ \frac{d^2}{ds^2}\gamma(s) - \dot\gamma(s)\, \frac{db(\gamma(s))}{d\gamma(s)} = -\big[ \dot\gamma(s) - b\big( \gamma(s) \big) \big]\, \frac{db(\gamma(s))}{d\gamma(s)} \tag{6.5.3} \]
and hence take the form
\[ \frac{d^2}{ds^2}\gamma(s) = b\big( \gamma(s) \big)\, \frac{d}{d\gamma(s)}\, b\big( \gamma(s) \big). \tag{6.5.4} \]
We can readily identify a special class of solutions of this second-order differential equation, namely, solutions of the first-order differential equation
\[ \dot\gamma(s) = b\big( \gamma(s) \big). \tag{6.5.5} \]
These solutions have the property that they yield absolute minima of the action functional, since they satisfy
\[ \mathcal{L}\big( \gamma(s), \dot\gamma(s) \big) = 0. \tag{6.5.6} \]
Of course, being first-order, this equation admits only one boundary (or initial) condition.

6.5.2 Crossing and exit problems

A typical question we may ask is the following: What is the probability of a solution connecting two points $u, v \in \mathbb{R}^d$ in time $T$? The LDP in Theorem 6.12 provides the answer, namely,
\[ \lim_{\delta\downarrow 0} \lim_{\varepsilon\downarrow 0} \varepsilon \ln P\big( X^\varepsilon_T \in B_\delta(v) \mid X^\varepsilon_0 \in B_\delta(u) \big) = -\inf_{\gamma\colon\, \gamma(0)=u,\ \gamma(T)=v} I(\gamma), \tag{6.5.7} \]
where $B_\delta(x)$ is the ball of radius $\delta > 0$ around $x \in \mathbb{R}^d$. This leads us to solve (6.5.4) subject to the boundary conditions $\gamma(0) = u$ and $\gamma(T) = v$. Unfortunately, not all solutions of (6.5.4) also solve (6.5.5) as they can have positive action, meaning that the event under consideration has an exponentially small probability. However, under certain conditions we may find a zero-action solution, for instance, when we do not fix the time of arrival at $v$:
\[ \lim_{\delta\downarrow 0} \lim_{\varepsilon\downarrow 0} \varepsilon \ln P\big( X^\varepsilon_s \in B_\delta(v) \text{ for some } s \in [0,T] \mid X^\varepsilon_0 \in B_\delta(u) \big) = -\inf_{\gamma\colon\, \gamma(0)=u,\ \gamma(s)=v \text{ for some } s\in[0,T]} I(\gamma). \tag{6.5.8} \]
Clearly, the infimum will be zero if the solution of (6.5.5) with $\gamma(0) = u$ has the property that $\gamma(s) = v$ for some $s \in [0,T]$.
Suppose that we consider an event as in (6.5.8) that admits a zero-action path $\gamma$ with $\gamma(0) = u$ and $\gamma(T) = v$. Define the time-reversed path $\bar\gamma(s) = \gamma(T-s)$, $s \in [0,T]$. Clearly, $\dot{\bar\gamma}(s) = -\dot\gamma(T-s)$. Hence a simple calculation, via (6.5.1)–(6.5.2), shows that
\[ I(\gamma) - I(\bar\gamma) = -2 \int_0^T \big\langle b\big( \gamma(s) \big), \dot\gamma(s) \big\rangle\, ds = -2 \int_\gamma b(x)\, dx. \tag{6.5.9} \]
Let us now specialise to the case where $b$ is minus the gradient of a potential $F$, i.e., $b(x) = -\nabla F(x)$, $x \in \mathbb{R}^d$, so that the zero-action paths of (6.5.5) run downhill in $F$. Then
\[ \int_\gamma b(x)\, dx = -\big[ F\big( \gamma(T) \big) - F\big( \gamma(0) \big) \big] = F(u) - F(v). \tag{6.5.10} \]
Hence
\[ I(\gamma) - I(\bar\gamma) = 2 \big[ F(v) - F(u) \big]. \tag{6.5.11} \]
If $I(\gamma) = 0$, then $I(\bar\gamma) = 2[F(u) - F(v)]$, and this is the minimal possible value for any path going from $v$ to $u$. Thus, there is the remarkable fact that the most likely path going uphill in a potential is the time-reversal of the solution of the gradient flow.
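The time-reversal identity $I(\bar\gamma) = 2[F(u) - F(v)]$ can be checked numerically in one dimension. The sketch below makes several assumptions that are ours and not the text's: the sign convention $b = -F'$ (so that the zero-action path runs downhill), the hypothetical double-well potential $F(x) = x^4/4 - x^2/2$, an explicit Euler scheme for the gradient flow, and a midpoint discretisation of the action in (6.5.1)–(6.5.2).

```python
def F(x):
    """Hypothetical double-well potential with minima at -1 and +1."""
    return 0.25 * x ** 4 - 0.5 * x ** 2

def b(x):
    """Drift b = -F' (sign convention assumed for this illustration)."""
    return x - x ** 3

def gradient_flow(x0, T, n):
    """Explicit Euler approximation of gamma' = b(gamma) on [0, T]."""
    dt = T / n
    path = [x0]
    for _ in range(n):
        path.append(path[-1] + dt * b(path[-1]))
    return path

def action(path, T):
    """Midpoint discretisation of (1/2) * integral |gamma' - b(gamma)|^2 ds."""
    n = len(path) - 1
    dt = T / n
    total = 0.0
    for k in range(n):
        velocity = (path[k + 1] - path[k]) / dt
        mid = 0.5 * (path[k] + path[k + 1])
        total += 0.5 * (velocity - b(mid)) ** 2 * dt
    return total

T, n = 10.0, 10000
down = gradient_flow(0.1, T, n)   # zero-action path from u = 0.1 towards v = 1
up = down[::-1]                   # its time reversal, going uphill

if __name__ == "__main__":
    print("action downhill:", action(down, T))
    print("action uphill  :", action(up, T))
    print("2[F(u) - F(v)] :", 2.0 * (F(down[0]) - F(down[-1])))
```

The downhill action is zero up to discretisation error, while the uphill action agrees with twice the potential difference between the two endpoints.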
So far we have considered paths of a fixed time length $T$. Freidlin and Wentzell allowed paths of arbitrary length and introduced the notion of quasi-potential:
\[ V(u,v) = \inf_{T<\infty}\ \inf_{\gamma\colon\, \gamma(0)=u,\ \gamma(T)=v} I(\gamma), \qquad u, v \in \mathbb{R}^d. \tag{6.5.12} \]
They showed that, uniformly in $w_\delta \in B_{\delta/2}(u)$,
\[ -V(u,v) = \lim_{\delta\downarrow 0} \lim_{\varepsilon\downarrow 0} \varepsilon \ln P\Big( \inf\big\{ t > \sigma_{B_\delta(u)} \colon X^\varepsilon_t \in B_\delta(v) \big\} \le \inf\big\{ t > \sigma_{B_\delta(u)} \colon X^\varepsilon_t \in B_{\delta/2}(u) \big\} \,\Big|\, X^\varepsilon_0 = w_\delta \Big), \tag{6.5.13} \]
where $\sigma_{B_\delta(u)} = \inf\{t \in \mathbb{R}_+ \colon X^\varepsilon_t \notin B_\delta(u)\}$ is the first exit time of the ball $B_\delta(u)$. The probability in (6.5.13) is the proper version of the escape probability from $u$ to $v$. In the setting of Fig. 6.4, we have
\[ V(u,v) = V(u,z^*) + V(z^*,v), \tag{6.5.14} \]
where, by (6.5.11),
\[ V(u,z^*) = V(z^*,u) + 2 \big[ F(z^*) - F(u) \big], \tag{6.5.15} \]
while
\[ V(z^*,u) = V(z^*,v) = 0. \tag{6.5.16} \]
Hence $V(u,v) = 2[F(z^*) - F(u)]$, i.e., the exponential asymptotics of the escape probability from $u$ to $v$ is given by twice the height of the potential barrier from $u$ to $v$.

Fig. 6.4 A one-dimensional example of a potential F (recall Fig. 2.1)

Let $\tau_{B_\delta(v)}$ be the first hitting time of the ball $B_\delta(v)$. With the help of a simple renewal argument, (6.5.13) can be shown to imply that, for every $\rho > 0$ and uniformly in $w_\delta \in B_{\delta/2}(u)$,
\[ \lim_{\delta\downarrow 0} \lim_{\varepsilon\downarrow 0} P\Big( e^{2[F(z^*) - F(u) - \rho]/\varepsilon} \le \tau_{B_\delta(v)} \le e^{2[F(z^*) - F(u) + \rho]/\varepsilon} \,\Big|\, X^\varepsilon_0 = w_\delta \Big) = 1. \tag{6.5.17} \]
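As a concrete instance of the barrier formula, consider the hypothetical double-well potential $F(x) = x^4/4 - x^2/2$ (our choice, not taken from the text), with minima $u = -1$ and $v = +1$ and saddle $z^* = 0$. A short worked computation gives:

```latex
\begin{align*}
F(u) = F(-1) &= -\tfrac14, \qquad F(z^*) = F(0) = 0, \\
V(u,v) &= 2 \big[ F(z^*) - F(u) \big] = \tfrac12, \\
\tau_{B_\delta(v)} &\approx e^{1/(2\varepsilon)} \quad \text{on the exponential scale of (6.5.17),}
  \text{ e.g. } e^{5} \approx 148 \text{ for } \varepsilon = 0.1 .
\end{align*}
```

Note that (6.5.17) controls only the exponent: the prefactor in front of $e^{1/(2\varepsilon)}$ is not determined at this level of precision.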

6.5.3 Metastability

The discussion in Sects. 6.5.1–6.5.2 forms the basis of the treatment of metastability in Freidlin-Wentzell theory. In this theory, any constant or periodic solution of (6.5.6) is a candidate for a metastable state. If $\gamma$ is such a solution, then it is called unstable when there exists another solution $\tilde\gamma$ and a family of functions $(\gamma_n)_{n\in\mathbb{N}}$ such that
\[ \gamma_n(t) = \begin{cases} \gamma(t), & t \le -n, \\ \tilde\gamma(t), & t \ge n, \end{cases} \tag{6.5.18} \]
while $\inf_{n\in\mathbb{N}} I(\gamma_n) = 0$. In other words, a solution is unstable when it can be deformed into another solution at an arbitrarily small cost. Otherwise, the solution is called stable. In the context of the Markov process, a stable solution is interpreted as a metastable state, also called a cycle. For us the most interesting situations correspond to fixed points, i.e., solutions of (6.5.6) that are constant in time. In the case of a reversible Markov process these are the only possible solutions.
A system is called metastable when it has at least two metastable states. In
the presence of noise there exist (exponentially unlikely) trajectories that consti-
tute transitions between these states. The variational problem in (6.5.12) with u, v
metastable states (respectively, its obvious extension when u, v are not fixed points),
provides the asymptotics of the transition probabilities between them, while (6.5.17)
provides control over the transition times between them. This, in a nutshell, is the
basis of the Freidlin-Wentzell theory of metastability. The strong point of this theory
is its great versatility. In particular, no assumption of reversibility needs to be made.
The weak point is the poor level of precision, i.e., only the exponential asymptotics
of characteristic quantities such as hitting times is obtained.
Freidlin-Wentzell theory does not offer the tools to go beyond the exponential
asymptotics. The goal of the present monograph is to position potential theory as
the key mathematical framework for obtaining sharper asymptotics, and to outline
the main ideas and techniques that are available to tackle concrete models.

6.6 Bibliographical notes

1. Section 6.1 is a crash course on large deviation theory. Definition 6.1 is due to
Varadhan. Lemmas 6.2–6.5 and Theorems 6.7–6.6 are key instruments, and are easy
to prove. For further reading we refer to the monographs by Varadhan [232], El-
lis [103], Deuschel and Stroock [91], Dembo and Zeitouni [79], and den Hollan-
der [80].

2. Theorem 6.13 in Sect. 6.2 lies at the heart of Freidlin-Wentzell theory. For further
reading we refer to the monographs by Freidlin and Wentzell [115], Dupuis and
Ellis [98], and Feng and Kurtz [110].

3. The LDP in Theorem 6.14 is derived in Sowers [221]. Extensions to larger classes
of SPDEs were obtained by Kallianpur and Xiong [147] and by Chenal and Mil-
let [57].

4. Theorem 6.16 in Sect. 6.4 is taken from Bovier and Gayrard [37] and will be
needed in Chap. 10. For extensions to general dynamical systems, see the mono-
graph by Kifer [152].

5. In Theorem 6.16 it is possible to restrict the Markov process to $\varepsilon\mathbb{Z}^d \cap D$ with
$D$ a convex subset of $\mathbb{R}^d$ and allow for singular behaviour near the boundary of $D$.
This will be a natural setting for the application to mean-field models, which will be
treated in Part IV. It is further straightforward to extend Theorem 6.16 to continuous
space and/or time under certain additional regularity conditions.

6. The application of large deviation theory to the problem of metastability in the
work of Freidlin and Wentzell [115] initiated the rigorous mathematical treatment of
metastability. This development was picked up by many authors. A pivotal paper is
Cassandro, Galves, Olivieri and Vares [51], which established the link to interacting
particle systems. A comprehensive account of metastability from this point of view
is given in the monograph by Olivieri and Vares [198].
Chapter 7
Potential Theory

But the most general and most direct method for solving questions of
probability consists in making them depend on difference equations.
(Pierre Simon de Laplace, Théorie Analytique des Probabilités)

The martingale problem and the stopping times that were described in Chaps. 4–5
provide the key link between Markov processes and Dirichlet problems. This chap-
ter gives a detailed account of this connection. Although, once again, the basic
principles are the same in discrete and in continuous time, we split the presenta-
tion: discrete time and countable state space (Sect. 7.1), continuous time and gen-
eral state space (Sect. 7.2). The mixed cases are similar and are left to the reader.
Once we have built up the necessary tools, we provide three variational formulas
for the capacity referred to as the Dirichlet principle, the Thomson principle and
the Berman-Konsowa principle (Sect. 7.3). These will be crucial for the metastable
analyses carried out in Parts IV–VIII. The variational principles can be extended to
the non-reversible setting, but become harder to work with (Sect. 7.4).

7.1 The Dirichlet problem: discrete time

7.1.1 Definition

In this section we place ourselves in the setting of a discrete-time Markov process
$X = (X_n)_{n\in\mathbb{N}_0}$ on a countable state space $S$ with transition kernel $P$ and generator
$L = P - 1$. To avoid complications we will always assume that $X$ is irreducible.
We will use the notation of Chap. 4. Let $D \subset S$, $g\colon D \to \mathbb{R}$, $\bar g\colon D^c \to \mathbb{R}$ and
$k\colon D \to [-K,\infty)$ with $-\infty < K < 1$, where $D^c = S\setminus D$. We call the following
pair of equations for an unknown function $f$ a Dirichlet problem (see Fig. 7.1):
$$
\begin{aligned}
(-Lf)(x) + k(x)f(x) &= g(x), & \forall\, x \in D,\\
f(x) &= \bar g(x), & \forall\, x \in D^c.
\end{aligned}
\tag{7.1.1}
$$
The following theorem provides a stochastic representation for the solution of such
Dirichlet problems.

© Springer International Publishing Switzerland 2015 145

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_7

Fig. 7.1 Dirichlet problem for f : S → R with source k : D → [−K, ∞) and boundary condi-
tions g : D → R and ḡ : D c → R

Theorem 7.1 Let $X$ be a discrete-time Markov process with generator $L$. Assume
that $D$ is such that
$$
E_x\big[\tau_{D^c}\,(1-K)^{-\tau_{D^c}}\big] < \infty \qquad \forall\, x \in D,
\tag{7.1.2}
$$
where $\tau_{D^c} = \inf\{t \in \mathbb{N}\colon X(t) \in D^c\}$. Then the Dirichlet problem (7.1.1) has a
unique solution given by
$$
f(x) = E_x\Bigg[\bar g(X_{\tau_{D^c}}) \prod_{u=0}^{\tau_{D^c}-1} \frac{1}{1+k(X_u)}
+ \sum_{s=0}^{\tau_{D^c}-1} g(X_s) \prod_{u=0}^{s} \frac{1}{1+k(X_u)}\Bigg],
\qquad x \in D,
\tag{7.1.3}
$$
with the convention that the empty product equals 1.

Proof The most convenient way to prove Theorem 7.1 is via the martingale problem
characterisation of Markov processes. Indeed, as in Lemma 4.9, we check that, for
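any $k\colon S \to \mathbb{R}$ bounded from below,
$$
M_t = f(X_t) \prod_{u=0}^{t-1} \frac{1}{1+k(X_u)} - f(X_0)
+ \sum_{s=0}^{t-1} \big[(-Lf)(X_s) + k(X_s)f(X_s)\big] \prod_{u=0}^{s} \frac{1}{1+k(X_u)}
\tag{7.1.4}
$$
is a martingale. Moreover, Doob's optional stopping theorem applies to $M_{\tau_{D^c}}$ under
condition (7.1.2) (recall Theorem 3.67(ii)), and so $E_x[M_{\tau_{D^c}}] = M_0 = 0$.
Substituting (7.1.1) into this identity and solving for $f(x)$ gives (7.1.3).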

Note that the solution of the Dirichlet problem is unique, unless the homogeneous
problem
$$
\begin{aligned}
(-Lf)(x) + k(x)f(x) &= 0, & \forall\, x \in D,\\
f(x) &= 0, & \forall\, x \in D^c,
\end{aligned}
\tag{7.1.5}
$$
admits a non-zero solution. The most interesting case for us is when k = λ is con-
stant. In that case, if (7.1.5) admits a non-zero solution, then λ is called an eigenvalue
and the corresponding solution an eigenfunction of the Dirichlet problem. A solution
of the homogeneous Dirichlet boundary value problem with k = 0,

$$
\begin{aligned}
(-Lf)(x) &= 0, & \forall\, x \in D,\\
f(x) &= \bar g(x), & \forall\, x \in D^c,
\end{aligned}
\tag{7.1.6}
$$

is called a harmonic function (see Sect. 4.2.3).

One of the most important applications of Theorem 7.1 concerns the case k = 0,
g = 1 and ḡ = 0. This yields the following characterisation of the mean exit time.

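Corollary 7.2 Let $D \subset S$ and set
$$
w(x) =
\begin{cases}
E_x[\tau_{D^c}], & x \in D,\\
0, & x \in D^c.
\end{cases}
\tag{7.1.7}
$$
Then $w$ is the unique solution of the Dirichlet problem
$$
\begin{aligned}
(-Lw)(x) &= 1, & x \in D,\\
w(x) &= 0, & x \in D^c.
\end{aligned}
\tag{7.1.8}
$$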
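
As a sanity check of Corollary 7.2, the following sketch (not part of the original text) takes the symmetric nearest-neighbour walk on {0, ..., N} with D = {1, ..., N-1}. The closed form w(x) = x(N - x) is the classical gambler's-ruin answer and is brought in here as an assumption; the choice N = 6 and the helper names `minus_L` and `mean_exit_time_iter` are hypothetical.

```python
from fractions import Fraction

N = 6                      # illustrative choice; D = {1, ..., N-1}, D^c = {0, N}
D = range(1, N)
half = Fraction(1, 2)

def minus_L(w, x):
    """(-Lw)(x) = w(x) - sum_y p(x, y) w(y) for the symmetric walk, x in D."""
    return w[x] - half * (w[x - 1] + w[x + 1])

# Candidate solution of (7.1.8): w(x) = x (N - x), which vanishes on D^c.
w_exact = [Fraction(x * (N - x)) for x in range(N + 1)]
residuals = [minus_L(w_exact, x) for x in D]      # each entry should equal 1

def mean_exit_time_iter(sweeps=2000):
    """Iterate w <- 1 + P w on D with w = 0 on D^c.

    Starting from w = 0, the n-th iterate equals E_x[min(tau_{D^c}, n)],
    so the iterates increase to the mean exit time.
    """
    w = [0.0] * (N + 1)
    for _ in range(sweeps):
        w = [0.0] + [1.0 + 0.5 * (w[x - 1] + w[x + 1]) for x in D] + [0.0]
    return w

w_iter = mean_exit_time_iter()
print(residuals)
print([round(v, 6) for v in w_iter])
```

The exact check uses rational arithmetic, so the residuals are compared without rounding; the iteration is only a numerical cross-check of the probabilistic representation.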

7.1.2 Green function, equilibrium potential and measure

The objects we introduce now will turn out to be fundamental in the study of
metastability. We consider the case where the solution of the Dirichlet problem
in (7.1.1) is unique. For simplicity, we restrict ourselves to the case where k = λ
is constant. Then the solution to (7.1.1) can be written in the form

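$$
f(x) = \sum_{z \in D} G^{\lambda}_{D^c}(x,z)\, g(z) + \sum_{z \in D^c} H^{\lambda}_{D^c}(x,z)\, \bar g(z), \qquad x \in D,
\tag{7.1.9}
$$
where
$$
G^{\lambda}_{D^c}(x,z) = E_x\Bigg[\sum_{s=0}^{\tau_{D^c}-1} (1+\lambda)^{-s-1}\, 1_{X_s = z}\Bigg], \qquad x, z \in D,
\tag{7.1.10}
$$
is called the Green function, and
$$
\begin{aligned}
H^{\lambda}_{D^c}(x,z) &= E_x\Big[(1+\lambda)^{-\tau_{D^c}}\, 1_{X_{\tau_{D^c}} = z}\Big] \\
&= \sum_{s \in \mathbb{N}_0} (1+\lambda)^{-s}\, P_x(\tau_{D^c} = s,\, X_s = z), \qquad x \in D,\ z \in D^c,
\end{aligned}
\tag{7.1.11}
$$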

Fig. 7.2 Dirichlet problem for h : S → [0, 1] with boundary conditions h = 1 on A and h = 0
on B

is called the Poisson kernel. Clearly, the Green function can also be characterised
as the solution of the problem

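$$
\begin{aligned}
(-L G^{\lambda}_{D^c})(x,z) + \lambda\, G^{\lambda}_{D^c}(x,z) &= 1_z(x), & \forall\, x \in D,\\
G^{\lambda}_{D^c}(x,z) &= 0, & \forall\, x \in D^c.
\end{aligned}
\tag{7.1.12}
$$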

In the special case $\lambda = 0$ we have the following appealing representation of the
Green function $G^{0}_{D^c} = G_{D^c}$ and the Poisson kernel $H^{0}_{D^c} = H_{D^c}$:
$$
G_{D^c}(x,z) = E_x\Bigg[\sum_{s=0}^{\tau_{D^c}-1} 1_{X_s = z}\Bigg], \qquad x, z \in D,
\tag{7.1.13}
$$
$$
H_{D^c}(x,z) = P_x(X_{\tau_{D^c}} = z), \qquad x \in D,\ z \in D^c,
\tag{7.1.14}
$$

i.e., for the Markov process starting at x ∈ D, GD c (x, z) represents the average
number of visits to z ∈ D before it exits from D, while HD c (x, z) represents the
probability that it enters D c at z.
The following object will be absolutely central in our study of metastability. Let
A, B ⊂ S be two non-empty disjoint subsets. Consider the Dirichlet problem (see
Fig. 7.2)
$$
\begin{aligned}
(-Lh)(x) &= 0, & \forall\, x \in S\setminus(A \cup B),\\
h(x) &= 1, & \forall\, x \in A,\\
h(x) &= 0, & \forall\, x \in B.
\end{aligned}
\tag{7.1.15}
$$
Suppose that (7.1.15) has a unique solution, e.g. because Ex [τA∪B ] < ∞ for all
x ∈ S. The harmonic function that solves (7.1.15) is denoted by hA,B (x) and is
called the equilibrium potential. The representation of the solution given in (7.1.9)
and (7.1.13)–(7.1.14), with D = S\(A ∪ B), D c = A ∪ B, g = 0, ḡ(x) = 1, x ∈ A,
and ḡ(x) = 0, x ∈ B, implies that

$$
h_{A,B}(x) = E_x\big[1_A(X_{\tau_{A\cup B}})\big] = P_x(\tau_A < \tau_B), \qquad x \in S\setminus(A \cup B).
\tag{7.1.16}
$$

This equation gives an analytic representation for the probability in the right-hand
side when x ∈ S\(A ∪ B). Using the Markov property, we can get a similar expres-
sion when x ∈ A ∪ B. Namely, for x ∈ A ∪ B,

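$$
\begin{aligned}
P_x(\tau_A < \tau_B) &= \sum_{y \in S\setminus(A\cup B)} p(x,y)\, P_y(\tau_A < \tau_B) + \sum_{y \in A} p(x,y) \\
&= \sum_{y \in S} p(x,y)\, h_{A,B}(y) = (P h_{A,B})(x) \\
&= (L h_{A,B})(x) + h_{A,B}(x).
\end{aligned}
\tag{7.1.17}
$$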

The latter can be written for x ∈ B as

(LhA,B )(x) = Px (τA < τB ) − 0 = Px (τA < τB ), (7.1.18)

and for x ∈ A as

(−LhA,B )(x) = 1 − Px (τA < τB ) = Px (τB < τA ). (7.1.19)

The quantity
eA,B (x) = (−LhA,B )(x), x ∈ A, (7.1.20)
is called the equilibrium measure on A, and is the second central object in our study
of metastability.
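
The relations (7.1.15)-(7.1.20) can be checked by hand on a small example. The sketch below is an illustration under assumptions, not the authors' construction: it uses a biased nearest-neighbour walk on {0, ..., N} that holds at the two end points, takes A = {N} and B = {0}, and plugs in the classical gambler's-ruin expression for the hitting probability. The parameters `N`, `p`, `q` and the helper names are hypothetical.

```python
from fractions import Fraction

N = 5
p, q = Fraction(2, 3), Fraction(1, 3)        # illustrative up/down probabilities
r = q / p

def kernel(x):
    """Transition probabilities p(x, .) of the walk on {0, ..., N} (holding at the ends)."""
    up, down = min(x + 1, N), max(x - 1, 0)
    row = {}
    row[up] = row.get(up, 0) + p
    row[down] = row.get(down, 0) + q
    return row

# Gambler's-ruin form of h_{A,B}(x) = P_x(tau_A < tau_B) with A = {N}, B = {0}.
h = [(1 - r**x) / (1 - r**N) for x in range(N + 1)]

def minus_L(f, x):
    """(-Lf)(x) = f(x) - sum_y p(x, y) f(y)."""
    return f[x] - sum(pxy * f[y] for y, pxy in kernel(x).items())

harmonic = [minus_L(h, x) for x in range(1, N)]   # (7.1.15): zero off A and B
e_A = minus_L(h, N)                               # (7.1.20): equilibrium measure at N
escape_from_A = q * (1 - h[N - 1])                # step down, then reach 0 before N
entry_from_B = p * h[1]                           # step up, then reach N before 0

print(harmonic, e_A, escape_from_A, entry_from_B)
```

Here `e_A` agrees with the escape probability in (7.1.19), and `-minus_L(h, 0)` agrees with the probability in (7.1.18); both identities hold exactly in rational arithmetic.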
The following simple observation provides a fundamental connection between
the objects we have introduced so far, and leads to a different representation of the
equilibrium potential. Pretend that the equilibrium measure eA,B is already known.
Then the equilibrium potential solves the inhomogeneous Dirichlet problem

$$
\begin{aligned}
(-Lh)(x) &= e_{A,B}(x), & \forall\, x \in S\setminus B,\\
h(x) &= 0, & \forall\, x \in B.
\end{aligned}
\tag{7.1.21}
$$

The solution of (7.1.21) can be written in terms of the Green function.

Theorem 7.3 With the notation introduced above,

$$
h_{A,B}(x) = \sum_{y \in A} G_B(x,y)\, e_{A,B}(y), \qquad x \in S.
\tag{7.1.22}
$$

Relation (7.1.22) can be used to express the Green function in terms of the equi-
librium measure and the equilibrium potential: simply choose A = {a}, to get
$$
G_B(x,a) = \frac{h_{a,B}(x)}{e_{a,B}(a)}, \qquad x \in S.
\tag{7.1.23}
$$
Note that ea,B (a) = Pa (τB < τa ) has the meaning of an escape probability from a
to B. The full power of Theorem 7.3 will come out in the reversible case, which we
discuss next.
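
Relation (7.1.23) can be tested numerically by computing the Green function as the inverse of 1 - P restricted to S\B. The following is a minimal sketch under assumptions: the 4-state weighted graph, the choice B = {3}, a = 0, and the helper `inverse` (a plain Gauss-Jordan elimination over rationals) are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical 4-state example: symmetric weights c(x, y) on a small graph.
weights = {(0, 1): 1, (0, 2): 2, (1, 2): 1, (1, 3): 3, (2, 3): 1}
S = [0, 1, 2, 3]
c = {(x, y): 0 for x in S for y in S}
for (x, y), w in weights.items():
    c[x, y] = c[y, x] = w
mu = {x: sum(c[x, y] for y in S) for x in S}
p = {(x, y): Fraction(c[x, y], mu[x]) for x in S for y in S}

B, a = {3}, 0
D = [x for x in S if x not in B]                 # here D = S \ B

def inverse(M):
    """Gauss-Jordan inverse of a small square matrix with Fraction entries."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

# G_B = (1 - P restricted to S \ B)^{-1}: expected number of visits before hitting B.
G = inverse([[Fraction(int(x == y)) - p[x, y] for y in D] for x in D])
G_B = {(x, y): G[i][j] for i, x in enumerate(D) for j, y in enumerate(D)}

# Read (7.1.23) backwards: h_{a,B}(x) = G_B(x, a) / G_B(a, a), and h = 0 on B.
h = {x: (G_B[x, a] / G_B[a, a] if x in D else Fraction(0)) for x in S}

def minus_L(f, x):
    """(-Lf)(x) = f(x) - sum_y p(x, y) f(y)."""
    return f[x] - sum(p[x, y] * f[y] for y in S)

e_a = minus_L(h, a)                              # equilibrium measure e_{a,B}(a)
print(h, e_a, 1 / G_B[a, a])
```

The function `h` built this way is harmonic off {a} and B, and `e_a` equals `1 / G_B[a, a]`, which is the escape probability interpretation mentioned after (7.1.23).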

7.1.3 Reversibility

Considerable simplifications occur when we assume a certain symmetry property
of the transition kernels known as reversibility or, in physics terminology, detailed
balance.

Definition 7.4 A Markov process with countable state space S and transition kernel
P = {p(x, y), x, y ∈ S}, is called reversible if there exists a non-zero μ : S → R+
such that
μ(x)p(x, y) = μ(y)p(y, x) ∀ x, y ∈ S. (7.1.24)
The function μ is called the reversible measure of the Markov process.

The function space L2 (S, μ) is a natural space to work on when the Markov
process is reversible with respect to μ.

Lemma 7.5 Let f ∈ L2 (S, μ), where μ is invariant with respect to P . Then Pf ∈
L2 (S, μ).

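Proof The claim follows from the fact that $P$ is a contraction in the $L^2$-norm:
$$
\begin{aligned}
\sum_{x\in S} \mu(x)\big((Pf)(x)\big)^2
&= \sum_{x\in S} \mu(x)\Big(\sum_{y\in S} p(x,y) f(y)\Big)^2 \\
&\le \sum_{x\in S} \mu(x)\Big(\sum_{y'\in S} p(x,y')\, f(y')^2\Big)\Big(\sum_{y''\in S} p(x,y'')\Big) \\
&\le \sum_{x\in S} \mu(x) \sum_{y'\in S} p(x,y')\, f(y')^2 = \sum_{y'\in S} \mu(y')\, f(y')^2,
\end{aligned}
\tag{7.1.25}
$$
where we use the Cauchy-Schwarz inequality and the invariance of $\mu$, i.e., $\mu P = \mu$.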

Reversibility can be expressed by saying that the transition kernel $P$ acts as a
self-adjoint operator on the Hilbert space $L^2(S,\mu)$.

Lemma 7.6 If $\mu$ is a reversible probability measure for $P$, then $\mu$ is an invariant
probability measure for $P$.

Proof Clearly, $f = 1$ is in $L^2(S,\mu)$. Hence, for all bounded measurable functions $g$,
$$
\sum_{x,y\in S} \mu(x)\, p(x,y)\, g(y) = \sum_{x,y\in S} p(y,x)\, \mu(y)\, g(y) = \sum_{y\in S} \mu(y)\, g(y),
\tag{7.1.26}
$$
and so $\mu$ is invariant.

We next come to the definition of the Dirichlet form, which plays a central rôle
in the potential-theoretic approach to metastability.

Lemma 7.7 Let L be the generator of a Markov process with reversible measure μ.
Then L defines a non-negative-definite quadratic form

$$
\mathcal{E}(f,g) = \sum_{x\in S} \mu(x)\, f(x)\, (-Lg)(x), \qquad f, g \in L^2(S,\mu),
\tag{7.1.27}
$$

called the Dirichlet form.

Proof In the discrete case it suffices to write out E (f, g) explicitly. Namely, by
reversibility,

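$$
\begin{aligned}
\mathcal{E}(f,g) &= \sum_{x,y\in S} \mu(x)\, p(x,y)\, f(x)\,[g(x) - g(y)] \\
&= \sum_{x,y\in S} \mu(x)\, p(x,y)\, f(y)\,[g(y) - g(x)].
\end{aligned}
\tag{7.1.28}
$$
Symmetrising between the first and the second expression, we get
$$
\begin{aligned}
\mathcal{E}(f,g) &= \tfrac12 \sum_{x,y\in S} \mu(x)\, p(x,y)\, \big\{ f(x)\,[g(x) - g(y)] + f(y)\,[g(y) - g(x)] \big\} \\
&= \tfrac12 \sum_{x,y\in S} \mu(x)\, p(x,y)\, [f(x) - f(y)]\,[g(x) - g(y)].
\end{aligned}
\tag{7.1.29}
$$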

This expression manifestly is a non-negative-definite quadratic form.
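
The equality between (7.1.27) and (7.1.29) is easy to confirm on a finite example. The sketch below assumes a hypothetical weighted graph on four states (random walk with p(x, y) = c(x, y)/mu(x), which is reversible with respect to mu) and two arbitrary test functions `f` and `g`; none of these choices come from the text.

```python
from fractions import Fraction
from itertools import product

S = range(4)
# Hypothetical symmetric edge weights; mu(x) p(x, y) = c(x, y) gives detailed balance.
weights = {(0, 1): 2, (1, 2): 1, (2, 3): 3, (0, 3): 1, (0, 2): 1}
c = {(x, y): 0 for x, y in product(S, S)}
for (x, y), w in weights.items():
    c[x, y] = c[y, x] = w
mu = {x: sum(c[x, y] for y in S) for x in S}
p = {(x, y): Fraction(c[x, y], mu[x]) for x, y in product(S, S)}

f = {0: Fraction(1), 1: Fraction(-2), 2: Fraction(1, 3), 3: Fraction(5)}
g = {0: Fraction(0), 1: Fraction(4), 2: Fraction(-1), 3: Fraction(1, 2)}

def minus_L(u, x):
    """(-Lu)(x) = u(x) - sum_y p(x, y) u(y)."""
    return u[x] - sum(p[x, y] * u[y] for y in S)

def form_generator(u, v):
    """Dirichlet form as in (7.1.27): sum_x mu(x) u(x) (-Lv)(x)."""
    return sum(mu[x] * u[x] * minus_L(v, x) for x in S)

def form_gradient(u, v):
    """Dirichlet form as in (7.1.29): half the weighted sum of products of increments."""
    return Fraction(1, 2) * sum(
        mu[x] * p[x, y] * (u[x] - u[y]) * (v[x] - v[y]) for x, y in product(S, S)
    )

print(form_generator(f, g), form_gradient(f, g), form_gradient(f, f))
```

The second representation makes both the symmetry in (f, g) and the non-negativity of the diagonal immediate, which is the point of the proof above.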

An important rôle will be played by the analogue of the two Green identities for
sums.

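Lemma 7.8 Let $f, g \in L^2(S,\mu)$ and $D \subset S$. Assume that $P$ is reversible with respect
to $\mu$. Then
(i) (first Green identity)
$$
\begin{aligned}
&\tfrac12 \sum_{x,y\in D} \mu(x)\, p(x,y)\, [f(x) - f(y)]\,[g(x) - g(y)] \\
&\qquad = \sum_{x\in D} \mu(x)\, f(x)\, (-Lg)(x) - \sum_{x\in D,\, y\in D^c} \mu(x)\, p(x,y)\, f(x)\,[g(x) - g(y)],
\end{aligned}
\tag{7.1.30}
$$
(ii) (second Green identity)
$$
\begin{aligned}
\sum_{x\in D} \mu(x)\,\big[f(x)(Lg)(x) - g(x)(Lf)(x)\big]
&= \sum_{x\in D,\, y\in D^c} \mu(x)\, p(x,y)\,\big[f(x)g(y) - g(x)f(y)\big] \\
&= \sum_{y\in D^c} \mu(y)\,\big[g(y)(Lf)(y) - f(y)(Lg)(y)\big].
\end{aligned}
\tag{7.1.31}
$$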

Proof To prove the first Green identity, we proceed as in the proof of Lemma 7.7. If
$D = S$, then the proof gives (7.1.30) without the last term. If $D \subsetneq S$, then in order
to produce the full action of $L$ we must add the terms that involve $y \in D^c$.
The first equality in the second Green identity is a trivial consequence of the first
Green identity. To get the second equality, use reversibility, add terms that involve
x ∈ D c to produce the full action of L, and use that these terms add up to zero. Note
that the equality between the first and the last line is just the statement that L is
symmetric in L2 (S, μ).

An illustration of what can be done with the Green identities is the following
formula for the Poisson kernel in terms of the Green function.

Lemma 7.9 (Poisson kernel and Green function) If $P$ is reversible with respect to
$\mu$ and $D \subset S$, then the Poisson kernel defined in (7.1.11) satisfies
$$
H_{D^c}(z,y) = \sum_{x\in D} \frac{\mu(x)}{\mu(z)}\, p(x,y)\,\big[G_{D^c}(x,z) - G_{D^c}(y,z)\big], \qquad z \in D,\ y \in D^c.
\tag{7.1.32}
$$

Proof Fix z ∈ D. In (7.1.31), choose for f the solution of the Dirichlet problem
in (7.1.6), and choose g(x) = GD c (x, z), x ∈ S. With this choice, by (7.1.12) with
λ = 0, the first line in (7.1.31) simply becomes −μ(z)f (z). The second line reads

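$$
\begin{aligned}
&\sum_{x\in D,\, y\in D^c} \mu(x)\, p(x,y)\,\big[f(x)\, G_{D^c}(y,z) - G_{D^c}(x,z)\, f(y)\big] \\
&\qquad = -\sum_{x\in D,\, y\in D^c} \mu(x)\, p(x,y)\, G_{D^c}(x,z)\, \bar g(y),
\end{aligned}
\tag{7.1.33}
$$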

where we use that GD c (y, z) = 0 and f (y) = ḡ(y) for y ∈ D c , again by (7.1.6) and
(7.1.12) with $\lambda = 0$. Hence
$$
f(z) = \sum_{y\in D^c} \Bigg[\sum_{x\in D} \frac{\mu(x)}{\mu(z)}\, p(x,y)\, G_{D^c}(x,z)\Bigg]\, \bar g(y).
\tag{7.1.34}
$$

From this expression the Poisson kernel HD c (z, y) can be read off as the sum
between the brackets, where we recall (7.1.9) and use that ḡ is arbitrary. Since
GD c (y, z) = 0 for y ∈ D c , we thus obtain (7.1.32).

A nice aspect of (7.1.32) is that by reversibility it correctly extends to $D$, namely,
$H_{D^c}(z,y) = 0$ for $z, y \in D$.

As a second application of the Green identities, we obtain the following alternative
to Theorem 7.3.

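Theorem 7.10 If $P$ is reversible with respect to $\mu$, then for all non-empty disjoint
sets $A, B \subset S$,
$$
h_{A,B}(x) = \sum_{y\in A} \frac{\mu(y)}{\mu(x)}\, G_B(y,x)\, e_{A,B}(y), \qquad x \in S.
\tag{7.1.35}
$$
In particular, if $f$ is a solution of the Dirichlet problem
$$
\begin{aligned}
(-Lf)(x) &= g(x), & \forall\, x \in B^c,\\
f(x) &= 0, & \forall\, x \in B,
\end{aligned}
\tag{7.1.36}
$$
then
$$
\sum_{y\in A} \nu_{A,B}(y)\, f(y) = \frac{1}{\mathrm{cap}(A,B)} \sum_{x\in S} \mu(x)\, h_{A,B}(x)\, g(x),
\tag{7.1.37}
$$
where $\nu_{A,B}$ is the probability measure on $A$ given by
$$
\nu_{A,B}(y) = \frac{\mu(y)\, e_{A,B}(y)}{\mathrm{cap}(A,B)}, \qquad y \in A,
\tag{7.1.38}
$$
with normalisation factor
$$
\mathrm{cap}(A,B) = \sum_{x\in A} \mu(x)\, e_{A,B}(x).
\tag{7.1.39}
$$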

Proof The key observation is that not only $L$ but also its inverse $L^{-1}$ is symmetric
in $L^2(S,\mu)$. This implies that
$$
\mu(x)\, G_B(x,y) = \mu(y)\, G_B(y,x), \qquad x, y \in S,
\tag{7.1.40}
$$
and yields (7.1.35) via (7.1.22). Multiplying both sides of (7.1.35) by $\mu(x)g(x)$,
summing over $x \in S$, and noting that $\sum_{x\in S} G_B(y,x)\, g(x) = f(y)$, we get (7.1.37)
apart from the normalisation factor $\mathrm{cap}(A,B)$. Dividing by this quantity, we obtain
(7.1.37).

The measure νA,B is called the last-exit biased distribution on A for the transition
from A to B. The number cap(A, B) is called the capacity of the pair (A, B).
The following corollary of Theorem 7.10 provides a formula for mean hitting
times, which plays a crucial rôle in our study of metastability.

Corollary 7.11 Let $A, B \subset S$ be non-empty and disjoint. Then, for reversible
Markov processes,
$$
\sum_{x\in A} \nu_{A,B}(x)\, E_x[\tau_B] = \frac{1}{\mathrm{cap}(A,B)} \sum_{y\in S} \mu(y)\, h_{A,B}(y).
\tag{7.1.41}
$$
In particular, for $A = \{x\}$,
$$
E_x[\tau_B] = \frac{1}{\mathrm{cap}(x,B)} \sum_{y\in S} \mu(y)\, h_{x,B}(y).
\tag{7.1.42}
$$

Proof Note that the representation in (7.1.3) shows that the solution f of (7.1.1)
with k = 0, ḡ = 0 and g = 1 is

f (x) = Ex [τB ], x ∈ S. (7.1.43)

Inserting this into (7.1.37), we get (7.1.41).
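
Formula (7.1.42) can be checked numerically on a chain that is not covered by the one-dimensional computation of Sect. 7.1.4. The sketch below is only an illustration under assumptions: it uses a random walk on a cycle of five sites with hypothetical edge conductances `cond`, and a plain fixed-point iteration (`relax`, a hypothetical helper) in floating point instead of an exact solver.

```python
n = 5
cond = [1.0, 2.0, 0.5, 3.0, 1.5]     # hypothetical conductance of edge (i, i+1 mod n)
mu = [cond[x] + cond[x - 1] for x in range(n)]          # reversible measure
# P[x] lists (neighbour, probability); mu(x) p(x, x+1) = cond[x] is symmetric.
P = [[((x + 1) % n, cond[x] / mu[x]), ((x - 1) % n, cond[x - 1] / mu[x])]
     for x in range(n)]

def relax(source, fixed, sweeps=4000):
    """Fixed-point iteration for u = source + P u off `fixed`, u prescribed on `fixed`."""
    u = [fixed.get(x, 0.0) for x in range(n)]
    for _ in range(sweeps):
        u = [fixed[x] if x in fixed else source + sum(pxy * u[y] for y, pxy in P[x])
             for x in range(n)]
    return u

x0, B = 2, 0
w = relax(1.0, {B: 0.0})               # w(x) = E_x[tau_B], solves (-Lw) = 1 off B
h = relax(0.0, {B: 0.0, x0: 1.0})      # equilibrium potential h_{x0,B}

e_x0 = h[x0] - sum(pxy * h[y] for y, pxy in P[x0])      # (7.1.20) at x0
cap = mu[x0] * e_x0                                     # (7.1.39) with A = {x0}

lhs = w[x0]
rhs = sum(mu[y] * h[y] for y in range(n)) / cap         # right-hand side of (7.1.42)
print(lhs, rhs)
```

The two numbers agree up to rounding. The iteration converges because the walk killed on B (respectively on B and x0) is strictly sub-stochastic; for larger chains a direct linear solve would be the more natural choice.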

In Theorem 7.10 capacity made its first appearance. The first Green identity pro-
vides an important alternative representation of capacity in terms of the Dirichlet
form.

Lemma 7.12 Let $A, B \subset S$ be non-empty and disjoint. Then $\mathrm{cap}(A,B)$ defined
in (7.1.39) can be expressed as
$$
\mathrm{cap}(A,B) = \mathcal{E}(h_{A,B}, h_{A,B}).
\tag{7.1.44}
$$

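Proof This is obvious from the definition of the Dirichlet form in Lemma 7.7,
the definition of the equilibrium measure in (7.1.20), the definition of the capacity
in (7.1.39) and the Dirichlet problem (7.1.15) defining the equilibrium potential
$h_{A,B}$. Indeed, in $\mathcal{E}(h_{A,B}, h_{A,B}) = \sum_{x\in S} \mu(x)\, h_{A,B}(x)\, (-Lh_{A,B})(x)$ the terms with
$x \notin A \cup B$ vanish because $h_{A,B}$ is harmonic there, the terms with $x \in B$ vanish
because $h_{A,B} = 0$ there, and on $A$ we have $h_{A,B} = 1$ and $(-Lh_{A,B}) = e_{A,B}$.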

Note that Lemma 7.12 becomes useful through the alternative representation of
the Dirichlet form given in (7.1.29).
We close by listing a few relations, linking hitting probabilities and capacities,
that will be needed in Chap. 8.

Lemma 7.13
(i) μ(x)Px (τB < τx ) = cap(x, B) for x ∈ S, B ⊂ S\{x}.
(ii) Py (τx < τB )/Px (τy < τB ) = cap(x, B)/cap(y, B) for x, y ∈ S, B ⊂ S\{x, y}.
(iii) Py (τB < τx ) ≤ cap(x, B)/cap(x, y) for x, y ∈ S, B ⊂ S\{x, y}.

Proof (i) It follows from (7.1.19)–(7.1.20) with A = {x} that ex,B (x) = (−Lhx,B )(x)
= Px (τB < τx ). It follows from (7.1.39) with A = {x} that cap(x, B) = μ(x)ex,B (x).
(ii) Use the second Green identity in (7.1.31), with D = {x}, g = hx,B , f = hy,B
and x, y ∈ S\B, to get

μ(x)(Lhx,B )(x)hy,B (x) = μ(y)(Lhy,B )(y)hx,B (y), x, y ∈ S\B. (7.1.45)

Since hy,B (x) = Px (τy < τB ) by (7.1.16) with A = {y}, we get the claim.
(iii) Again use the second Green identity, this time with $D = \{x\}$, $g = h_{x,y}$, $f = h_{B,x}$
and $x, y \in S\setminus B$, to get
$$
\mu(y)\, (Lh_{y,x})(y)\, h_{B,x}(y) = \sum_{z\in B} \mu(z)\, (Lh_{B,x})(z)\, h_{y,x}(z), \qquad x, y \in S\setminus B.
\tag{7.1.46}
$$
The left-hand side equals $\mathrm{cap}(y,x)\, P_y(\tau_B < \tau_x)$, the right-hand side is bounded from
above by $\sum_{z\in B} \mu(z)\, (Lh_{B,x})(z)$, which equals $\mathrm{cap}(B,x)$ by (7.1.39).

7.1.4 One-dimensional nearest-neighbour random walks

An important example where explicit computations are possible is that of a Markov
process with state space $S \subseteq \mathbb{Z}$ for which transitions are allowed between nearest-
neighbour sites only. Such Markov processes are referred to as birth-death pro-
cesses. We denote the transition rates by p(x, y), y = x ± 1. In this case, there is
a strictly positive invariant measure μ such that μ(x)p(x, y) = μ(y)p(y, x) for all
x, y ∈ S.

Equilibrium potential

Due to the one-dimensional nature of our Markov process, the only equilibrium
potentials we have to compute are of the form

hb,a (x) = Px (τb < τa ), a < x < b. (7.1.47)

This satisfies the one-dimensional discrete boundary-value problem

$$
\begin{aligned}
p(x,x+1)\,[h(x+1) - h(x)] + p(x,x-1)\,[h(x-1) - h(x)] &= 0, \qquad a < x < b,\\
h(a) &= 0,\\
h(b) &= 1.
\end{aligned}
\tag{7.1.48}
$$

Note that the first equation can be conveniently rewritten as

$$
p(x,x+1)\,[h(x+1) - h(x)] = p(x,x-1)\,[h(x) - h(x-1)].
\tag{7.1.49}
$$

Setting d(x) = h(x) − h(x − 1), we get

$$
d(x+1) = \frac{p(x,x-1)}{p(x,x+1)}\, d(x),
\tag{7.1.50}
$$
so that
$$
d(x) = \prod_{z=a+1}^{x-1} \frac{p(z,z-1)}{p(z,z+1)}\; d(a+1).
\tag{7.1.51}
$$

Using reversibility, we can write the product as
$$
\prod_{z=a+1}^{x-1} \frac{p(z,z-1)}{p(z,z+1)}
= \prod_{z=a+1}^{x-1} \frac{\mu(z)}{\mu(z+1)}\, \frac{p(z,z-1)}{p(z+1,z)}
= \frac{\mu(a+1)}{\mu(x)}\, \frac{p(a+1,a)}{p(x,x-1)}.
\tag{7.1.52}
$$
But

h(x) = d(x) + d(x − 1) + d(x − 2) + · · · + d(a + 1) + h(a), (7.1.53)

so that

h(x) = R(a, x) μ(a + 1) p(a + 1, a) d(a + 1) + h(a), (7.1.54)

where we abbreviate
$$
R(u,v) = \sum_{y=u+1}^{v} \frac{1}{\mu(y)}\, \frac{1}{p(y,y-1)}, \qquad u < v.
\tag{7.1.55}
$$

Now h(a) = 0, and so it remains to determine d(a + 1) from the condition h(b) = 1,
i.e.,
1 = R(a, b) μ(a + 1) p(a + 1, a) d(a + 1). (7.1.56)
Combining this with (7.1.54), we get

$$
h_{b,a}(x) = \frac{R(a,x)}{R(a,b)}, \qquad a < x < b.
\tag{7.1.57}
$$

Capacity

We continue by computing capacities. The equilibrium measure is given by the formula
$$
e_{b,a}(a) = p(a,a+1)\, h_{b,a}(a+1) + p(a,a-1)\, h_{b,a}(a-1) = p(a,a+1)\, h_{b,a}(a+1),
\tag{7.1.58}
$$
since $h_{b,a}(a-1) = 0$. Inserting (7.1.57), we get
$$
e_{b,a}(a) = p(a,a+1)\, \frac{R(a,a+1)}{R(a,b)}
= \frac{\dfrac{p(a,a+1)}{\mu(a+1)\, p(a+1,a)}}{R(a,b)}
= \frac{1}{\mu(a)\, R(a,b)}.
\tag{7.1.59}
$$
Consequently, for the capacity we get
$$
\mathrm{cap}(a,b) = \frac{1}{R(a,b)}.
\tag{7.1.60}
$$

Remark 7.14 Formula (7.1.60) suggests another common electrostatic interpretation
of capacities, namely, as conductances. In fact, if we interpret $\mu(x)p(x,x-1) =
\mu(x-1)p(x-1,x)$ as the conductance of the resistor $(x-1,x)$, then, by Ohm's
law, (7.1.60) represents the conductance of the chain of resistors from $a$ to $b$.

Mean hitting time

Inserting (7.1.57) and (7.1.60) into (7.1.42) (with $A = \{x\}$ and $B = \{a\}$), we get
$$
E_x[\tau_a] = R(a,x)\Bigg[\sum_{y=a+1}^{x-1} \mu(y)\, \frac{R(a,y)}{R(a,x)} + \sum_{y=x}^{\infty} \mu(y)\Bigg], \qquad a < x.
\tag{7.1.61}
$$

This formula will be used in Chap. 13 to compute the metastable crossover time
for the Curie-Weiss model. The latter will be shown to link up nicely with Kramers
formula for Brownian motion in a double-well potential, as discussed in Sect. 2.1.1.
See also Sect. 7.2.5.
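
The explicit formulas (7.1.57), (7.1.60) and (7.1.61) lend themselves to an exact check. The following sketch is an illustration under assumptions: the state space is truncated to {0, ..., N}, so the infinite sum in (7.1.61) stops at N, and the transition probabilities `up` and `down` are hypothetical values chosen only so that each row sums to at most 1 (the remainder is holding probability).

```python
from fractions import Fraction as F

N = 6
# Hypothetical nearest-neighbour probabilities p(x, x+1) and p(x, x-1) on {0, ..., N}.
up = [F(1, 2), F(1, 3), F(1, 4), F(1, 2), F(2, 5), F(1, 3), F(0)]
down = [F(0), F(1, 3), F(1, 2), F(1, 4), F(1, 2), F(1, 5), F(1, 2)]

# Reversible measure from detailed balance mu(x) p(x, x+1) = mu(x+1) p(x+1, x).
mu = [F(1)]
for x in range(N):
    mu.append(mu[x] * up[x] / down[x + 1])

def R(u, v):
    """(7.1.55): sum of the resistances 1 / (mu(y) p(y, y-1)) over u < y <= v."""
    return sum(1 / (mu[y] * down[y]) for y in range(u + 1, v + 1))

def minus_L(f, x):
    """(-Lf)(x) for the birth-death chain; moves outside {0, ..., N} have zero probability."""
    total = F(0)
    if x < N:
        total += up[x] * (f[x] - f[x + 1])
    if x > 0:
        total += down[x] * (f[x] - f[x - 1])
    return total

a, b = 0, 4
h = {x: R(a, x) / R(a, b) for x in range(a, b + 1)}     # (7.1.57), extended to x = a, b
cap = 1 / R(a, b)                                       # (7.1.60)

def mean_hitting_time(x):
    """(7.1.61) with the infinite sum cut at N, since the state space here is finite."""
    inner = sum(mu[y] * R(a, y) for y in range(a + 1, x))
    outer = R(a, x) * sum(mu[y] for y in range(x, N + 1))
    return inner + outer

w = [mean_hitting_time(x) for x in range(N + 1)]
print(h, cap, w)
```

In this example `h` solves the boundary-value problem (7.1.48), `cap` coincides with mu(a) p(a, a+1) h(a+1) as in (7.1.58)-(7.1.60), and `w` solves (-Lw)(x) = 1 for x > a with w(a) = 0, which by Corollary 7.2 identifies it as the mean hitting time of a.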

7.2 The Dirichlet problem: continuous time

In the case of continuous time a number of technical problems arise that make the
theory a bit more delicate. Structurally, however, all remains the same.

7.2.1 Definition

Much of what we discussed in Sect. 7.1 carries over to continuous-time Markov
processes. The basic representation theorem for solutions of Dirichlet problems is
provided through the martingale problem characterisation of general Markov processes,
as discussed in Sect. 5.4.
We consider a continuous-time Markov process $X = (X_t)_{t\in\mathbb{R}_+}$ with state space
$S$ and generator $\mathcal{L}$. Let $D \subset S$ be an open set and specify continuous functions
$g, k\colon D \to \mathbb{R}$ and $\bar g\colon D^c \to \mathbb{R}$. The question is whether we can find a continuous
function $f\colon S \to \mathbb{R}$ such that
$$
\begin{aligned}
(-\mathcal{L}f)(x) + k(x)f(x) &= g(x), & \forall\, x \in D,\\
f(x) &= \bar g(x), & \forall\, x \in D^c.
\end{aligned}
\tag{7.2.1}
$$

The Dirichlet problem in (7.2.1) can also be posed when ḡ is not a continuous
function. In that case the continuity requirement must be replaced by the condition
that, for all x ∈ ∂D, if limn→∞ xn = x in D, then limn→∞ f (xn ) = ḡ(x).
The analogue of Theorem 7.1 is the following basic representation theorem.

Theorem 7.15 Let $\mathcal{L}$ be the generator of a continuous-time Markov process $X$,
and assume that the associated martingale problem has a unique solution. Assume
that $f \in D(\mathcal{L})$, the domain of $\mathcal{L}$, solves the Dirichlet problem in (7.2.1), and let
$X$ be a solution of the martingale problem associated with $\mathcal{L}$. Let
$\tau_{D^c} = \inf\{t > 0\colon X_t \notin D\}$. If
$$
E_x\Big[\tau_{D^c} \exp\Big(\inf_{x\in D} k(x)\, \tau_{D^c}\Big)\Big] < \infty, \qquad x \in D,
\tag{7.2.2}
$$
then
$$
f(x) = E_x\Bigg[\bar g(X_{\tau_{D^c}}) \exp\Big(-\int_0^{\tau_{D^c}} k(X_s)\, ds\Big)
+ \int_0^{\tau_{D^c}} g(X_t) \exp\Big(-\int_0^{t} k(X_s)\, ds\Big)\, dt\Bigg], \qquad x \in D.
\tag{7.2.3}
$$

Proof We use Lemma 5.35 with g = L f , where f solves the Dirichlet problem.
Condition (7.2.2) is, like its analogue (7.1.2) in the discrete-time case, sufficient to
imply that the optional sampling theorem holds.

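As in (7.1.9) for the discrete case, we rewrite (7.2.3) for the case $k(x) = \lambda$ as
$$
f(x) = \int_D G^{\lambda}_{D^c}(x,dz)\, g(z) + \int_{D^c} H^{\lambda}_{D^c}(x,dz)\, \bar g(z), \qquad x \in D,
\tag{7.2.4}
$$
where
$$
G^{\lambda}_{D^c}(x,dz) = E_x\Bigg[\int_0^{\tau_{D^c}} e^{-\lambda t}\, 1_{X_t \in dz}\, dt\Bigg], \qquad x, z \in D,
\tag{7.2.5}
$$
is called the Green function and
$$
\begin{aligned}
H^{\lambda}_{D^c}(x,dz) &= E_x\Big[e^{-\lambda \tau_{D^c}}\, 1_{X_{\tau_{D^c}} \in dz}\Big] \\
&= \int_0^{\infty} e^{-\lambda t}\, P_x(\tau_{D^c} \in dt,\, X_t \in dz), \qquad x \in D,\ z \in D^c,
\end{aligned}
\tag{7.2.6}
$$
is called the Poisson kernel.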

For the sequel it is useful to separate the discussion into two parts, dealing with
the two main classes of Markov processes we need later on: Markov processes with
countable state space (Sect. 7.2.2), and diffusion processes (Sect. 7.2.3).

7.2.2 Countable state space

In a countable state space $S$, all continuous-time Markov processes are essentially
time changes of discrete-time Markov processes (see Sect. 5.1). The generator $\mathcal{L}$
can be written in terms of jump rates $c(x,y)$, $x, y \in S$, with $c(x,x) = 0$, $x \in S$, in
the form
$$
(\mathcal{L}f)(x) = \sum_{y\in S} c(x,y)\,[f(y) - f(x)], \qquad x \in S.
\tag{7.2.7}
$$

As long as the total jump rates

$$
c(x) = \sum_{y\in S} c(x,y), \qquad x \in S,
\tag{7.2.8}
$$

are bounded from above, very little changes from the discrete-time setting, and most
formulas remain unaltered. We need the hitting times

$$
\tau_A = \inf\{t > 0\colon X_t \in A,\ \exists\, 0 < s < t\colon X_s \neq X_0\}, \qquad A \subset S.
\tag{7.2.9}
$$

Note that this definition makes sure that if X0 ∈ A, then τA is not identically zero.
The Green function and the Poisson kernel take the form
$$
G^{\lambda}_{D^c}(x,y) = E_x\Bigg[\int_0^{\tau_{D^c}} e^{-\lambda t}\, 1_{X_t = y}\, dt\Bigg], \qquad x, y \in D,
\tag{7.2.10}
$$

HDλ c (x, y) = Ex e−λτDc 1X(τDc )=y , x ∈ D, y ∈ D c . (7.2.11)

The solution $h_{A,B}$ of the Dirichlet problem in (7.1.15) with $L$ replaced by $\mathcal{L}$ is
again the equilibrium potential with the probabilistic interpretation in (7.1.16). We
define the equilibrium measure on $A$ as
$$
e_{A,B}(x) = (-\mathcal{L}h_{A,B})(x), \qquad x \in A,
\tag{7.2.12}
$$
but its probabilistic interpretation is slightly altered: a moment of reflection shows
that
$$
\begin{aligned}
P_x(\tau_A < \tau_B) &= \frac{1}{c(x)}\, (\mathcal{L}h_{A,B})(x), & x \in B,\\
P_x(\tau_B < \tau_A) &= \frac{1}{c(x)}\, (-\mathcal{L}h_{A,B})(x), & x \in A.
\end{aligned}
\tag{7.2.13}
$$
Indeed, $\frac{1}{c(x)}\mathcal{L}$ is the generator of the underlying discrete-time Markov process.
Apart from the factor $\frac{1}{c(x)}$, all formulas derived for the reversible discrete-time case
remain unaltered.
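
The role of the factor 1/c(x) in (7.2.13) can be seen on a small example. The sketch below is an illustration under assumptions: a four-state chain with hypothetical jump rates `rates`, A = {3}, B = {0}, and the equilibrium potential obtained by solving the two harmonic equations at the interior states by hand.

```python
from fractions import Fraction

S = [0, 1, 2, 3]
# Hypothetical jump rates c(x, y); the diagonal is zero as in (7.2.7).
rates = {(0, 1): Fraction(2), (1, 0): Fraction(1), (1, 2): Fraction(3),
         (2, 1): Fraction(1, 2), (2, 3): Fraction(4), (3, 2): Fraction(5, 2)}
c = {(x, y): rates.get((x, y), Fraction(0)) for x in S for y in S}
total = {x: sum(c[x, y] for y in S) for x in S}           # c(x) in (7.2.8)
p = {(x, y): c[x, y] / total[x] for x in S for y in S}     # embedded jump chain

def gen_ct(f, x):
    """Continuous-time generator (7.2.7): sum_y c(x, y) [f(y) - f(x)]."""
    return sum(c[x, y] * (f[y] - f[x]) for y in S)

def gen_dt(f, x):
    """Generator P - 1 of the embedded discrete-time chain."""
    return sum(p[x, y] * f[y] for y in S) - f[x]

# Equilibrium potential for A = {3}, B = {0}: harmonic at the states 1 and 2.
h2 = p[2, 3] / (1 - p[2, 1] * p[1, 2])
h = {0: Fraction(0), 1: p[1, 2] * h2, 2: h2, 3: Fraction(1)}

escape_from_A = -gen_ct(h, 3) / total[3]     # second line of (7.2.13)
entry_from_B = gen_ct(h, 0) / total[0]       # first line of (7.2.13)
print(h, escape_from_A, entry_from_B)
```

Since the hitting order of A and B depends only on the sequence of visited states, the probabilities computed from the rates agree with those of the jump chain: from state 3 the only move is to 2, so the escape probability is 1 - h(2), and from state 0 the only move is to 1, so the entry probability is h(1).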

7.2.3 Diffusion processes

Matters become more involved for an uncountable state space. We will restrict our
discussion to the case of elliptic diffusion processes in $\mathbb{R}^d$,
$$
dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t,
\tag{7.2.14}
$$
where $b$ is a time-independent drift vector and $\sigma$ is a time-independent dispersion
matrix. We have seen in Chap. 5 that solutions of this equation are strong Markov
processes with a generator whose restriction to $C^2(\mathbb{R}^d)$ is given by
$$
(\mathcal{L}f)(x) = \frac12 \sum_{i,j=1}^{d} a_{ij}(x)\, \frac{\partial^2 f(x)}{\partial x_i\, \partial x_j}
+ \sum_{i=1}^{d} b_i(x)\, \frac{\partial f(x)}{\partial x_i},
\tag{7.2.15}
$$
where the diffusion matrix $a$ is given by
$$
a_{ij}(x) = \sum_{k=1}^{d} \sigma_{ik}(x)\, \sigma_{kj}(x).
\tag{7.2.16}
$$

In the sequel we will always assume that the dispersion matrix σ is non-degenerate
and hence the diffusion matrix a is strictly positive, i.e., for all x ∈ Rd , a(x) defines
a strictly positive quadratic form. For this case the operator L is called elliptic. If,
for some open domain D ⊂ Rd ,

$$
\sum_{i,j} a_{ij}(x)\, \xi_i\, \xi_j \ge \delta\, \|\xi\|_2^2, \qquad x \in D,
\tag{7.2.17}
$$

then we call L uniformly elliptic in D.

The classical Dirichlet problem associated with an elliptic operator L and an
open domain D is described as follows (where we assume that D is bounded). Let
g, k : D̄ → R and ḡ : ∂D → R be continuous functions. We want to find a continu-
ous function f : D̄ → R such that

$$
\begin{aligned}
(-\mathcal{L}f)(x) + k(x)f(x) &= g(x), & \forall\, x \in D,\\
f(x) &= \bar g(x), & \forall\, x \in \partial D.
\end{aligned}
\tag{7.2.18}
$$

Theorem 7.15 applies to this situation, but it is somewhat delicate to check when the
assumptions are satisfied.
For the case k(x) ≤ 0, condition (7.2.2) is ensured (for bounded domains) by a
rather weak ellipticity condition.

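Lemma 7.16 Let $D \subset \mathbb{R}^d$ be open and bounded. Assume that, for some $1 \le \ell \le d$,
$$
\min_{x\in\bar D} a_{\ell\ell}(x) > 0.
\tag{7.2.19}
$$
Then $E_x[\tau_{D^c}] < \infty$ for all $x \in D$.

Proof Set $a = \min_{x\in\bar D} a_{\ell\ell}(x)$, $b = \max_{x\in\bar D} \|b(x)\|$ and $q = \min_{x\in\bar D} x_\ell$. Let $\nu >
2b/a$. Consider the smooth function $h(x) = -\mu e^{\nu x_\ell}$, $x \in \mathbb{R}^d$, with $\mu > 0$. Clearly,
$$
(-\mathcal{L}h)(x) = \mu e^{\nu x_\ell}\big[\tfrac12 \nu^2 a_{\ell\ell}(x) + \nu\, b_\ell(x)\big] \ge \tfrac12\, \mu \nu a\, e^{\nu q}\, (\nu - 2b/a).
\tag{7.2.20}
$$
Choose $\mu$ such that the right-hand side is larger than 1, so that $(-\mathcal{L}h)(x) \ge 1$ for
all $x \in D$. Since
$$
h(X_{t\wedge\tau_{D^c}}) + \int_0^{t\wedge\tau_{D^c}} (-\mathcal{L}h)(X_s)\, ds
\tag{7.2.21}
$$
is a martingale, it follows that
$$
E_x\Bigg[\int_0^{t\wedge\tau_{D^c}} (-\mathcal{L}h)(X_s)\, ds\Bigg] = h(x) - E_x\big[h(X_{t\wedge\tau_{D^c}})\big],
\tag{7.2.22}
$$
hence
$$
E_x[t \wedge \tau_{D^c}] \le h(x) - E_x\big[h(X_{t\wedge\tau_{D^c}})\big],
\tag{7.2.23}
$$
and so, because $h \le 0$,
$$
E_x[t \wedge \tau_{D^c}] \le \max_{y\in\bar D} |h(y)| < \infty.
\tag{7.2.24}
$$
Passing to the limit $t \to \infty$, we get $E_x[\tau_{D^c}] < \infty$.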

Theorem 7.15 gives us a stochastic representation formula for solutions of the
Dirichlet problem, under the assumption that such solutions exist and also a weak
solution of the SDE in (7.2.14) exists. We may ask whether we can actually use
this formula to prove the existence of solutions of the Dirichlet problem. This is indeed
the case under certain regularity conditions on the boundary of $D$. A sufficient
criterion is given in the following theorem.

Theorem 7.17 A point z ∈ ∂D is regular if there exists a cone A with tip z such that
A ∩ Br (z) ⊂ D c for some r > 0, where Br (z) is the ball of radius r centred at z.
If all points of ∂D are regular, then existence and uniqueness of the solution of the
Dirichlet problem holds.

7.2.4 Reversible Markov processes

We now return to reversible Markov processes in the general continuous-time setting.
Matters are similar to those in discrete time, but formulations are slightly different.

Reversibility

Let (Pt )t∈R+ be a strongly continuous contraction semigroup acting on the space
B(S) of bounded measurable functions on S. Assume that a measure μ on S is
invariant with respect to (Pt )t∈R+ . Then the action of (Pt )t∈R+ can be extended to
L2 (S, μ). The following lemmas are the analogues of Lemmas 7.5–7.7 for discrete
time, and their proofs can be copied.

Lemma 7.18 Let f ∈ L2 (S, μ), where μ is invariant with respect to (Pt )t∈R+ . Then
Pt f ∈ L2 (S, μ) for all t ∈ R+ .

Having an $L^2$-action of $P_t$, we can define its adjoint $P_t^*$ via
$$
\int_S \mu(dx)\, f(x)\, (P_t g)(x) = \int_S \mu(dx)\, (P_t^* f)(x)\, g(x), \qquad f, g \in L^2(S,\mu).
\tag{7.2.25}
$$

We may check that (Pt∗ )t∈R+ is itself a Markov semigroup that generates the time-
reversal of X, in the sense that (Pt∗ f )(Xt ) = f (X0 ).

Definition 7.19 A measure $\mu$ on $S$ is called reversible with respect to $(P_t)_{t\in\mathbb{R}_+}$ if
$P_t^* = P_t$ for some $t > 0$ (and hence for all $t > 0$).

Lemma 7.20 If $\mu$ is a reversible probability measure for $(P_t)_{t\in\mathbb{R}_+}$, then $\mu$ is an
invariant probability measure for $(P_t)_{t\in\mathbb{R}_+}$.

The notions that were introduced above extend from the semigroup to the generator.
Thus, for an invariant measure $\mu$, we can define the adjoint $\mathcal{L}^*$ of a generator
$\mathcal{L}$ via
$$
\int_S \mu(dx)\, (\mathcal{L}^* g)(x)\, f(x) = \int_S \mu(dx)\, (\mathcal{L}f)(x)\, g(x),
\qquad \forall\, f, g \in D(\mathcal{L})\colon\ \mathcal{L}f,\, \mathcal{L}g \in L^2(S,\mu).
\tag{7.2.26}
$$
If $\mu$ is a probability measure, then the second condition is automatically verified.
A reversible Markov process is therefore characterised by the fact that its generator
is self-adjoint in $L^2(S,\mu)$ for some invariant measure $\mu$.

Lemma 7.21 Let $\mu$ be a reversible measure. Then the generator $\mathcal{L}$ defines a
non-negative-definite quadratic form,
$$
\mathcal{E}(f,g) = \int_S \mu(dx)\, g(x)\, (-\mathcal{L}f)(x),
\tag{7.2.27}
$$
called the Dirichlet form.

Proof By the fact that $\mathcal{L}$ is self-adjoint, $\mathcal{E}(f,f)$ is real for all $f \in D(\mathcal{L})$. Moreover,
if $\mathcal{E}(f,f) < \infty$, then
$$
\mathcal{E}(f,f) = \lim_{t\downarrow 0} t^{-1} \int_S \mu(dx)\, f(x)\,\big[f(x) - (P_t f)(x)\big].
\tag{7.2.28}
$$
But
$$
\begin{aligned}
\int_S \mu(dx)\, f(x)\,\big[f(x) - (P_t f)(x)\big] &= \|f\|_{2,\mu}^2 - \int_S \mu(dx)\, f(x)\, (P_t f)(x) \\
&\ge \|f\|_{2,\mu}^2 - \|f\|_{2,\mu}\, \|P_t f\|_{2,\mu} \ge \|f\|_{2,\mu}^2 - \|f\|_{2,\mu}\, \|f\|_{2,\mu} = 0,
\end{aligned}
\tag{7.2.29}
$$
where we use Cauchy-Schwarz and Lemma 7.18. Combining (7.2.28)–(7.2.29), we
get $\mathcal{E}(f,f) \ge 0$. Since $-\mathcal{L}$ is positive and self-adjoint, it can be written in the form
$-\mathcal{L} = A^{t}A$ with $A$ positive. Hence the Dirichlet form has the form
$$
\mathcal{E}(f,g) = \int_S \mu(dx)\, (Af)(x)\, (Ag)(x),
\tag{7.2.30}
$$
and is manifestly non-negative definite.

Reversible diffusions

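First, we note that the formal adjoint in $L^2(dx)$ of the operator $\mathcal{L}$ given in (7.2.15)
is
$$
\begin{aligned}
(\mathcal{L}^* g)(x) &= \frac12 \sum_{i,j} \frac{\partial^2}{\partial x_i\, \partial x_j}\big(a_{ij}(x)\, g(x)\big) - \sum_{i} \frac{\partial}{\partial x_i}\big(b_i(x)\, g(x)\big) \\
&= \frac12 \sum_{i,j} a_{ij}(x)\, \frac{\partial^2 g(x)}{\partial x_i\, \partial x_j}
+ \sum_{i} \Bigg[\sum_{j} \frac{\partial a_{ij}(x)}{\partial x_j} - b_i(x)\Bigg] \frac{\partial g(x)}{\partial x_i} \\
&\quad + \Bigg[\frac12 \sum_{i,j} \frac{\partial^2 a_{ij}(x)}{\partial x_i\, \partial x_j} - \sum_{i} \frac{\partial b_i(x)}{\partial x_i}\Bigg] g(x).
\end{aligned}
\tag{7.2.31}
$$
Hence $\mathcal{L}^* = \mathcal{L}$ if and only if
$$
\sum_{j} \frac{\partial a_{ij}(x)}{\partial x_j} = 2\, b_i(x), \qquad i = 1, \ldots, d,
\tag{7.2.32}
$$
which thus is the condition for the diffusion to be reversible with respect to Lebesgue
measure.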
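Next, we look for a reversible measure of the form $\mu(dx) = e^{-F(x)}\, dx$. Then $\mu$
is reversible if and only if, for all $g \in D(\mathcal{L})$,
$$
\big(\mathcal{L}^*(g e^{-F})\big)(x) = e^{-F(x)}\, (\mathcal{L}g)(x).
\tag{7.2.33}
$$
A simple computation via integration by parts shows that
$$
\begin{aligned}
\big(\mathcal{L}^*(g e^{-F})\big)(x) &= e^{-F(x)}\, \frac12 \sum_{i,j} a_{ij}(x)\, \frac{\partial^2 g(x)}{\partial x_i\, \partial x_j}
- e^{-F(x)} \sum_{i,j} a_{ij}(x)\, \frac{\partial F(x)}{\partial x_i}\, \frac{\partial g(x)}{\partial x_j} \\
&\quad + \frac12\, e^{-F(x)} \sum_{i,j} a_{ij}(x) \Bigg[-\frac{\partial^2 F(x)}{\partial x_i\, \partial x_j} + \frac{\partial F(x)}{\partial x_i}\, \frac{\partial F(x)}{\partial x_j}\Bigg] g(x) \\
&\quad + e^{-F(x)} \sum_{i} \Bigg[\sum_{j} \frac{\partial a_{ij}(x)}{\partial x_j} - b_i(x)\Bigg] \Bigg[-\frac{\partial F(x)}{\partial x_i}\, g(x) + \frac{\partial g(x)}{\partial x_i}\Bigg] \\
&\quad + e^{-F(x)} \Bigg[\frac12 \sum_{i,j} \frac{\partial^2 a_{ij}(x)}{\partial x_i\, \partial x_j} - \sum_{i} \frac{\partial b_i(x)}{\partial x_i}\Bigg] g(x).
\end{aligned}
\tag{7.2.34}
$$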
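The condition for reversibility is therefore

\[ \sum_j \Bigl( -a_{ij}(x) \frac{\partial F(x)}{\partial x_j} + \frac{\partial a_{ij}(x)}{\partial x_j} \Bigr) = 2 b_i(x), \qquad i = 1, \dots, d, \tag{7.2.35} \]

or

\[ b_i(x) = \frac12\, e^{F(x)} \sum_j \frac{\partial}{\partial x_j} \bigl( a_{ij}(x) e^{-F(x)} \bigr), \qquad i = 1, \dots, d. \tag{7.2.36} \]

Inserting this relation into (7.2.15), we see that the operator L can be written in the form

\[ (\mathcal{L} g)(x) = \frac12\, e^{F(x)} \sum_{i,j} \frac{\partial}{\partial x_i} \Bigl( a_{ij}(x) e^{-F(x)} \frac{\partial}{\partial x_j} g(x) \Bigr). \tag{7.2.37} \]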

In the simplest case where a_ij = δ_ij, (7.2.36) reads

\[ b_i(x) = -\frac12 \frac{\partial}{\partial x_i} F(x), \qquad i = 1, \dots, d, \tag{7.2.38} \]

i.e., the drift b is the gradient of the potential −F (up to the factor ½). In that case the generator L takes the suggestive form

\[ (\mathcal{L} g)(x) = \frac12\, e^{F(x)}\, \nabla \cdot \bigl( e^{-F(x)} \nabla g(x) \bigr). \tag{7.2.39} \]

The corresponding Dirichlet form can be written as

\[ \mathcal{E}(f,g) = -\int_S \mu(dx)\, f(x) (\mathcal{L} g)(x) = \frac12 \int_S \mu(dx)\, \bigl\langle \nabla f(x), \nabla g(x) \bigr\rangle, \tag{7.2.40} \]

where ⟨·, ·⟩ denotes the standard inner product in R^d. In the case of general a we just need to use the inner product relative to a, i.e.,

\[ \mathcal{E}(f,g) = -\int_S \mu(dx)\, f(x) (\mathcal{L} g)(x) = \frac12 \int_S \mu(dx) \sum_{i,j} a_{ij}(x) \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_j}. \tag{7.2.41} \]
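The equivalence between the drift form of the generator and the divergence form (7.2.39) is easy to test numerically in one dimension. The sketch below assumes that (7.2.15), which is not reproduced in this section, reads (L g) = ½ g'' + b g' when d = 1 and a = 1, and it uses a hypothetical double-well potential F and test function g chosen only for illustration. The outer derivative in the divergence form is approximated by a central difference.

```python
import math


def F(x):
    return x**4 / 4 - x**2 / 2      # hypothetical double-well potential


def dF(x):
    return x**3 - x                 # F'


def dg(x):
    return math.cos(x)              # g' for the test function g(x) = sin(x)


def d2g(x):
    return -math.sin(x)             # g''


def generator_drift_form(x):
    """(1/2) g'' + b g' with b = -(1/2) F', cf. (7.2.38)."""
    return 0.5 * d2g(x) - 0.5 * dF(x) * dg(x)


def generator_divergence_form(x, step=1e-5):
    """(1/2) e^{F} d/dx ( e^{-F} g' ), cf. (7.2.39), via a central difference."""
    def w(y):
        return math.exp(-F(y)) * dg(y)
    return 0.5 * math.exp(F(x)) * (w(x + step) - w(x - step)) / (2 * step)
```

At any fixed point the two functions should agree up to the discretisation error of the central difference, which is of order `step**2`.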

Fig. 7.3 Capacitor D between A and B

Equilibrium measure, equilibrium potential and capacity

In the following we return to the general case of an SDE corresponding to a generator that is a uniformly elliptic differential operator L with coefficients satisfying Lipschitz conditions (so that unique strong solutions of the SDE exist).
Let D be an open regular domain in Rd with ∂D = A ∪ B, where A, B are non-
empty and disjoint (see Fig. 7.3). Then the solution of the Dirichlet problem

(−L h)(x) = 0, x ∈ D,
h(x) = 1, x ∈ A, (7.2.42)
h(x) = 0, x ∈ B,

is denoted by hA,B and is called the equilibrium potential of the capacitor (A, B).
As in the discrete-time case,

hA,B (x) = Px (τA < τB ), x ∈ D. (7.2.43)

Remark 7.22 The above names come from the classical case where L = ½Δ, for
which the Dirichlet problem is a problem of electrostatics. The sets A and B cor-
respond to two metal plates attached to a battery that imposes a constant voltage
(potential difference) between the plates. The solution of this problem describes the
electrostatic potential, whose gradient is the electrostatic field.

Next, we consider the inhomogeneous Dirichlet problem,

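\[
\begin{aligned}
(-\mathcal{L} f)(x) &= g(x), & x &\in D, \\
f(x) &= 0, & x &\in \partial D.
\end{aligned} \tag{7.2.44}
\]

We have seen in Theorem 7.15 that if (7.2.44) has a unique solution, then this solution has the probabilistic representation

\[ f(x) = \mathbb{E}_x \Bigl[ \int_0^{\tau_{D^c}} g(X_t)\, dt \Bigr], \qquad x \in D. \tag{7.2.45} \]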

The Green kernel will often have a density with respect to Lebesgue measure, i.e.,

\[ G_{D^c}(x, dy) = G_{D^c}(x, y)\, dy, \qquad x, y \in D. \tag{7.2.46} \]

In that case G_{D^c}(x, y) is called the Green function. For the special case of (7.2.4) with λ = 0, g = 1 and ḡ = 0, (7.2.45) yields the relation

\[ \mathbb{E}_x[\tau_{D^c}] = \int_D G_{D^c}(x, y)\, dy. \tag{7.2.47} \]

Let us next look at the relation between the equilibrium potential and the Dirich-
let form in the case of a reversible diffusion. We want to compute E (hA,B , hA,B ).
We might be tempted to think that E (hA,B , hA,B ) = 0, because (L hA,B )(x) = 0
except on the sets ∂A and ∂B. But on these sets L hA,B is singular because hA,B is
not differentiable. Therefore we may interpret L hA,B as a measure that is concen-
trated on A and B. Since hA,B vanishes on ∂B, we get

\[ \mathcal{E}(h_{A,B}, h_{A,B}) = \int_{\partial A} \mu(x)\, (-\mathcal{L} h_{A,B})(dx). \tag{7.2.48} \]

The measure eA,B (dx) = (−L hA,B )(dx) is called the equilibrium measure associ-
ated with the capacitor (A, B).
To understand the above observation better, let us return to the case aij = δij .
We then have the following integral formulas known as the Green identities, which
constitute the analogue of Lemma 7.8.

Lemma 7.23 Let D be a regular domain, let f, g ∈ C²(D), and let L be the reversible operator given by (7.2.37). Then
(i) (first Green identity)

\[ \int_D dx\, e^{-F(x)} \Bigl[ \bigl\langle \nabla f(x), \nabla g(x) \bigr\rangle - g(x) (2 \mathcal{L} f)(x) \Bigr] = \int_{\partial D} e^{-F(x)} g(x)\, \partial_{n(x)} f(x)\, d\sigma_D(x) \tag{7.2.49} \]

(ii) (second Green identity)

\[ \int_D dx\, e^{-F(x)} \Bigl[ f(x) (2 \mathcal{L} g)(x) - g(x) (2 \mathcal{L} f)(x) \Bigr] = \int_{\partial D} e^{-F(x)} \Bigl[ g(x)\, \partial_{n(x)} f(x) - f(x)\, \partial_{n(x)} g(x) \Bigr] d\sigma_D(x) \tag{7.2.50} \]

hold with

\[ \partial_{n(x)} = \sum_{i,j} n_i(x)\, a_{ij}(x) \frac{\partial}{\partial x_j}, \tag{7.2.51} \]

where n(x) denotes the inner normal unit vector at x ∈ ∂D. In the case a_ij = δ_ij, ∂_{n(x)} is the usual normal derivative at x.

Proof For the case F = 0 and aij = δij , both formulas are classical and can be found
in any standard textbook on potential theory. The extension to the general case is by
straightforward computation.

As in the discrete case, the Green identities give rise to a representation of the
Poisson kernel in terms of the Green function.

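Lemma 7.24 If L is reversible with respect to μ and D ⊂ R^d is open and regular, then the solution of the Dirichlet boundary value problem

\[
\begin{aligned}
-(\mathcal{L} f)(x) &= 0, & x &\in D, \\
f(x) &= \bar g(x), & x &\in \partial D,
\end{aligned} \tag{7.2.52}
\]

is given by

\[ f(x) = \int_{\partial D} \bar g(y)\, e^{F(x) - F(y)}\, \partial_{n(y)} G_{D^c}(y, x)\, d\sigma_D(y), \qquad x \in D, \tag{7.2.53} \]

i.e.,

\[ H_{D^c}(x, dy) = e^{F(x) - F(y)}\, \partial_{n(y)} G_{D^c}(y, x)\, d\sigma_D(y), \qquad x, y \in D. \tag{7.2.54} \]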

Using the first Green identity, we can state a precise relation between the equilib-
rium potential and the capacity. Namely, setting f = g = hA,B in (7.2.49), we see
that

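\[
\begin{aligned}
\mathcal{E}(h_{A,B}, h_{A,B}) &= \int_{\partial A} dx\, e^{-F(x)} h_{A,B}(x) (-\mathcal{L} h_{A,B})(x) \\
&= \int_{\partial A} e^{-F(x)}\, \partial_{n(x)} h_{A,B}(x)\, d\sigma_A(x),
\end{aligned} \tag{7.2.55}
\]

i.e., on A the equilibrium measure e_{A,B} is given by

\[ e_{A,B}(dx) = \partial_{n(x)} h_{A,B}(x)\, d\sigma_A(x). \tag{7.2.56} \]

Remark 7.25 The quantity

\[ \mathrm{cap}(A, B) = \int_A e^{-F(x)}\, \partial_{n(x)} h_{A,B}(x)\, d\sigma_A(x) = \int_A e^{-F(x)}\, e_{A,B}(dx) \tag{7.2.57} \]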

is called the capacity of the capacitor (A, B), which in electrical language is the
total charge on the plate A. Using (7.2.55), we see that, alternatively, the capacity is
the total energy of the potential hA,B .

We have the analogue of Lemma 7.12.


Lemma 7.26 Let A, B ⊂ S be non-empty and disjoint. Then

cap(A, B) = E (hA,B , hA,B ). (7.2.58)

Last-exit distribution and equilibrium measure

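It will be nice to have a probabilistic interpretation of the equilibrium measure that explains why −L h_{A,B} really is a surface measure.
By the definition of the generator L, we formally have

\[
\begin{aligned}
(-\mathcal{L} h_{A,B})(x) &= \lim_{t \downarrow 0} t^{-1} (1 - P_t) h_{A,B}(x) \\
&= \lim_{t \downarrow 0} t^{-1}\, \mathbb{E}_x \bigl[ 1 - \mathbb{P}_{X_t}(\tau_A < \tau_B) \bigr] \\
&= \lim_{t \downarrow 0} t^{-1}\, \mathbb{E}_x \bigl[ \mathbb{P}_{X_t}(\tau_B < \tau_A) \bigr].
\end{aligned} \tag{7.2.59}
\]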

For x ∈ D the limit exists and equals zero. For x ∈ A, however, the limit does not
exist, but we will make sense of it in a weak sense. To that end, let us define the last
exit time TA from A prior to arrival in B as

TA = sup{0 < t < τB : Xt ∈ A}, (7.2.60)

with the convention that sup ∅ = 0. This is not a stopping time, and

Px (TA > 0) = Px (τA < τB ) = hA,B (x), x ∈ D. (7.2.61)

Note that we can write the expectation in the last line of (7.2.59) as

\[ \mathbb{E}_x \bigl[ \mathbb{P}_{X_t}(\tau_B < \tau_A) \bigr] = \mathbb{P}_x(0 < T_A < t), \qquad x \in D \cup A. \tag{7.2.62} \]

Set
ψt (x) = t −1 Px (0 < TA < t), x ∈ D ∪ A. (7.2.63)
Define the last exit distribution (x, dy) on A by

(x, dy) = Px (XTA − ∈ dy, TA > 0), x ∈ D ∪ A, y ∈ A. (7.2.64)

Lemma 7.27 Let f be continuous on D̄ = D ∪ ∂D. Then

lim GB (x, y)ψt (y)f (y)dy = (x, dy)f (y), x ∈ D ∪ A. (7.2.65)
t↓0 D∪A A

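Proof Without loss of generality, let f ≥ 0. Fix x ∈ D ∪ A. Using the integral representation of the Green function in (7.2.5), we get

\[
\begin{aligned}
\int_{D \cup A} G_B(x, y)\, \psi_t(y) f(y)\, dy
&= \mathbb{E}_x \Bigl[ \int_0^{\tau_B} \psi_t(X_s) f(X_s)\, ds \Bigr] \\
&= t^{-1}\, \mathbb{E}_x \Bigl[ \int_0^\infty f(X_s)\, \mathbb{P}_{X_s}(0 < T_A < t)\, ds \Bigr]
 = t^{-1}\, \mathbb{E}_x \Bigl[ \int_0^\infty f(X_s)\, 1_{s < T_A < s + t}\, ds \Bigr] \\
&= \mathbb{E}_x \Bigl[ 1_{0 < T_A \le t}\, t^{-1} \int_0^{T_A} f(X_s)\, ds \Bigr]
 + \mathbb{E}_x \Bigl[ 1_{T_A > t}\, t^{-1} \int_{T_A - t}^{T_A} f(X_s)\, ds \Bigr].
\end{aligned} \tag{7.2.66}
\]

Both terms in the last line are obviously uniformly bounded as t ↓ 0. Moreover,

\[ \mathbb{E}_x \Bigl[ 1_{0 < T_A \le t}\, t^{-1} \int_0^{T_A} f(X_s)\, ds \Bigr] \le C\, \mathbb{P}_x[0 < T_A \le t] \downarrow 0, \qquad t \downarrow 0, \tag{7.2.67} \]

and, by the continuity of f,

\[ \lim_{t \downarrow 0} \mathbb{E}_x \Bigl[ 1_{T_A > t}\, t^{-1} \int_{T_A - t}^{T_A} f(X_s)\, ds \Bigr] = \mathbb{E}_x \bigl[ 1_{T_A > 0}\, f(X_{T_A -}) \bigr]. \tag{7.2.68} \]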

The right-hand side equals A (x, dy)f (y).

From Lemma 7.27 we deduce that the family of measures ψt (y)dy, t > 0, con-
verges as t ↓ 0 to a measure e(dy) on A, which satisfies

GB (x, y)e(dy) = (x, dy), x ∈ D ∪ A, y ∈ A. (7.2.69)

Integrating this formula over A, we arrive at the expression

GB (x, y)e(dy) = (x, dy) = hA,B (x), x ∈ D ∪ A. (7.2.70)
A A

Hence e(dy) = eA,B (dy), the equilibrium measure that was introduced in (7.2.56).
In conclusion, we have proven the following analogue of Theorem 7.3.

Theorem 7.28 For D and A, B as before,

\[ h_{A,B}(x) = \int_A G_B(x, y)\, e_{A,B}(dy), \qquad x \in D \cup A \cup B. \tag{7.2.71} \]

Note that (7.2.71) holds on A because A (x, dy) = 1, x ∈ A, and on B because
GB (x, y) = 0, x ∈ B.
It is instructive to view Theorem 7.28 in the following way. We have already seen
that we may think of −L hA,B as a measure. The solution of the formal Dirichlet
problem
(−L h)(dx) = eA,B (dx), x ∈ D ∪ A,
(7.2.72)
h(x) = 0, x∈B
in terms of the Green function is precisely the expression in (7.2.71).
Using reversibility, we obtain from Theorem 7.28 the following analogue of The-
orem 7.10.

Theorem 7.29 For D and A, B as before,

\[ h_{A,B}(x) = \int_A \frac{\mu(y)}{\mu(x)}\, G_B(y, x)\, e_{A,B}(dy), \qquad x \in D \cup A \cup B. \tag{7.2.73} \]

The formula for the Green function gives corresponding formulas for solutions
of Dirichlet problems. For instance, if for some function g we consider the Dirichlet
problem
(−L f )(x) = g(x), x ∈ D ∪ A,
(7.2.74)
f (x) = 0, x ∈ B,

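then f(x) = ∫_{D∪A} dy G_B(x, y) g(y). By reversibility,

\[ G_B(x, y) = e^{F(x) - F(y)}\, G_B(y, x), \]

and so

\[
\begin{aligned}
\int_{D \cup A} e^{-F(x)} h_{A,B}(x) g(x)\, dx
&= \int_{D \cup A} dx\, e^{-F(x)} g(x) \int_A G_B(y, x)\, e^{F(x) - F(y)}\, e_{A,B}(dy) \\
&= \int_A e^{-F(y)}\, e_{A,B}(dy) \int_{D \cup A} G_B(y, x) g(x)\, dx \\
&= \int_A e^{-F(y)}\, e_{A,B}(dy)\, f(y).
\end{aligned} \tag{7.2.75}
\]

Introducing the probability measure

\[ \nu_{A,B}(dy) = \frac{e^{-F(y)}\, e_{A,B}(dy)}{\mathrm{cap}(A, B)}, \qquad y \in A, \tag{7.2.76} \]

we get

\[ \int_A \nu_{A,B}(dy)\, f(y) = \frac{1}{\mathrm{cap}(A, B)} \int_{D \cup A} e^{-F(x)} h_{A,B}(x) g(x)\, dx. \tag{7.2.77} \]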

By picking g = 1, we get the following analogue of Corollary 7.11 linking crossover times to capacity.

Corollary 7.30 For D and A, B as before,

\[ \int_A \nu_{A,B}(dy)\, \mathbb{E}_y[\tau_B] = \frac{1}{\mathrm{cap}(A, B)} \int_{D \cup A} dx\, e^{-F(x)} h_{A,B}(x). \tag{7.2.78} \]

Proof Setting w(y) = E_y[τ_B], for y ∉ B, w solves the Dirichlet problem (7.2.74) with g = 1. Thus (7.2.78) is immediate from (7.2.77).
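The link between mean hitting times and capacity is easiest to verify in a discrete analogue of Corollary 7.30. The following is a minimal sketch, assuming a discrete-time nearest-neighbour chain on {0, ..., N} with hypothetical edge conductances `c[k]` on the edge {k, k+1}, invariant weights μ(x) equal to the sum of the conductances at x, A = {0} and B = {N}. The capacity is normalised here as the effective conductance between 0 and N, which is an assumption of this sketch. The first function evaluates the right-hand side of the capacity formula, the second solves the first-step equations for the mean hitting times directly.

```python
def invariant_weights(c):
    """mu(x) = sum of the conductances of the edges at x, for x in {0, ..., N}."""
    N = len(c)
    return [(c[x - 1] if x > 0 else 0.0) + (c[x] if x < N else 0.0)
            for x in range(N + 1)]


def hitting_time_via_capacity(c):
    """E_0[tau_N] computed as (1 / cap) * sum_{x < N} mu(x) h(x)."""
    N = len(c)
    mu = invariant_weights(c)
    resistance = sum(1.0 / ck for ck in c)        # effective resistance 0 <-> N
    cap = 1.0 / resistance                        # assumed normalisation of cap(0, N)
    # Equilibrium potential h(x) = P_x(tau_0 < tau_N) of the chain.
    h = [sum(1.0 / c[k] for k in range(x, N)) / resistance for x in range(N + 1)]
    return sum(mu[x] * h[x] for x in range(N)) / cap


def hitting_times_direct(c):
    """Solve w(x) = 1 + sum_y p(x, y) w(y), w(N) = 0, by a forward recursion."""
    N = len(c)
    mu = invariant_weights(c)
    increments, prev = [], 0.0                    # increments[x] = w(x) - w(x + 1)
    for x in range(N):
        back = c[x - 1] * prev if x > 0 else 0.0
        prev = (mu[x] + back) / c[x]
        increments.append(prev)
    return [sum(increments[x:]) for x in range(N)] + [0.0]
```

For the simple random walk with `c = [1.0, 1.0]` both routes give a mean hitting time of 4 steps from 0 to 2, and the two functions should agree for any positive conductances.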

7.2.5 One-dimensional diffusions

As in the case of nearest-neighbour random walks on Z, diffusions on R allow for explicit solutions. In fact, the continuous case is even easier than the discrete case, which was explained in Sect. 7.1.4.
All homogeneous boundary value problems can, by linearity, be reduced to a
computation of the equilibrium potential hc,{a,b} for c ∈ (a, b), which is the solution
of the Dirichlet problem

(−L h)(x) = 0, x ∈ (a, b)\c,

h(x) = 0, x ∈ {a, b},
h(c) = 1, (7.2.79)

where for later reference we choose the generator to be of the form

\[ (-\mathcal{L} h)(x) = -\varepsilon a(x) h''(x) - b(x) h'(x) \tag{7.2.80} \]

for ε > 0, with a(x) > 0 and b(x) ∈ R. The case a(x) = 1 corresponds to the classical Kramers equation (2.1.1). It follows from the general formula in (7.2.35) that the invariant measure for this diffusion is given by

\[ \mu(dx) = \frac{1}{a(x)} \exp\Bigl( \int_0^x \frac{b(z)}{a(z)}\, dz \Big/ \varepsilon \Bigr)\, dx, \tag{7.2.81} \]

up to normalisation. Set

\[ \int_0^x \frac{b(z)}{a(z)}\, dz = -\tilde F(x). \tag{7.2.82} \]

Then it is easy to verify that the Dirichlet form is given by

\[ \mathcal{E}(f, g) = \tfrac12 \varepsilon \int_{\mathbb{R}} e^{-\tilde F(x)/\varepsilon} f'(x) g'(x)\, dx. \tag{7.2.83} \]

To compute the equilibrium potential h_{c,{a,b}}, we must solve the second-order differential equation

\[ \varepsilon a(x) h''(x) + b(x) h'(x) = 0, \tag{7.2.84} \]

which reduces to the first-order differential equation

\[ \varepsilon a(x) u'(x) + b(x) u(x) = 0 \tag{7.2.85} \]

after we set u = h′. Clearly, (7.2.85) has the general solution

\[ u(x) = C_1 e^{\tilde F(x)/\varepsilon}, \tag{7.2.86} \]

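and so the general solution of (7.2.84) is

\[ h(x) = C_1 \int_0^x e^{\tilde F(r)/\varepsilon}\, dr + C_2 \tag{7.2.87} \]

with C_1 and C_2 integration constants to be determined from the boundary conditions. In particular, for the equilibrium potential we have

\[
h_{c,\{a,b\}}(x) =
\begin{cases}
\dfrac{\int_a^x e^{\tilde F(r)/\varepsilon}\, dr}{\int_a^c e^{\tilde F(r)/\varepsilon}\, dr}, & a < x < c, \\[2ex]
\dfrac{\int_x^b e^{\tilde F(r)/\varepsilon}\, dr}{\int_c^b e^{\tilde F(r)/\varepsilon}\, dr}, & c < x < b.
\end{cases} \tag{7.2.88}
\]

Hence the capacity cap(c, {a, b}) is readily computed as

\[ \mathrm{cap}\bigl(c, \{a, b\}\bigr) = \mathcal{E}(h_{c,\{a,b\}}, h_{c,\{a,b\}}) = \frac{\varepsilon}{2 \int_a^c e^{\tilde F(r)/\varepsilon}\, dr} + \frac{\varepsilon}{2 \int_c^b e^{\tilde F(r)/\varepsilon}\, dr}. \tag{7.2.89} \]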
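The explicit formulas (7.2.88) and (7.2.89) lend themselves to a quick numerical check. The sketch below is an illustration only: it assumes a(x) = 1, takes a hypothetical double-well function for F̃, approximates all integrals by the trapezoid rule, and evaluates the Dirichlet form (7.2.83) of the computed equilibrium potential with midpoint finite differences. The function names and the grid size are choices made for this example.

```python
import math


def trapezoid_cumulative(fn, lo, hi, n):
    """Grid on [lo, hi] with n intervals and the running trapezoid integral of fn."""
    dx = (hi - lo) / n
    xs = [lo + i * dx for i in range(n + 1)]
    vals = [fn(x) for x in xs]
    cum = [0.0]
    for i in range(n):
        cum.append(cum[-1] + 0.5 * dx * (vals[i] + vals[i + 1]))
    return xs, cum


def equilibrium_potential(F, eps, a, c, b, n=2000):
    """Return h on [a, c], h on [c, b] as in (7.2.88), and cap from (7.2.89)."""
    def weight(r):
        return math.exp(F(r) / eps)
    xs_left, cum_left = trapezoid_cumulative(weight, a, c, n)
    xs_right, cum_right = trapezoid_cumulative(weight, c, b, n)
    I_ac, I_cb = cum_left[-1], cum_right[-1]
    h_left = [v / I_ac for v in cum_left]               # int_a^x / int_a^c
    h_right = [(I_cb - v) / I_cb for v in cum_right]    # int_x^b / int_c^b
    cap = eps / (2 * I_ac) + eps / (2 * I_cb)
    return (xs_left, h_left), (xs_right, h_right), cap


def dirichlet_form(F, eps, xs, h):
    """(eps / 2) * int e^{-F/eps} h'(x)^2 dx, cf. (7.2.83), on the given grid."""
    total = 0.0
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        mid = 0.5 * (xs[i] + xs[i + 1])
        slope = (h[i + 1] - h[i]) / dx
        total += math.exp(-F(mid) / eps) * slope**2 * dx
    return 0.5 * eps * total
```

Summing `dirichlet_form` over the two subintervals should reproduce the value of `cap` returned by `equilibrium_potential` up to discretisation error, which illustrates the identity cap = E(h, h) in this one-dimensional setting.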

From Theorem 7.28 we get the following formula for the Green function on (a, b):

\[ G_{\{a,b\}}(y, x) = \frac{e^{-\tilde F(x)/\varepsilon}\, h_{y,\{a,b\}}(x)}{a(x)\, \mathrm{cap}(y, \{a, b\})}, \tag{7.2.90} \]
where the second equality uses (7.2.57). Note that this computation is a nice alterna-
tive to the usual method of variation of constants used to obtain the Green function.
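Now, if lim_{x↓−∞} F̃(x) = ∞, then lim_{a↓−∞} ∫_a^x e^{F̃(r)/ε} dr = ∞, and we get

\[
\lim_{a \downarrow -\infty} h_{c,\{a,b\}}(x) =
\begin{cases}
1, & -\infty < x < c, \\[1ex]
\dfrac{\int_x^b e^{\tilde F(r)/\varepsilon}\, dr}{\int_c^b e^{\tilde F(r)/\varepsilon}\, dr}, & c < x < b,
\end{cases} \tag{7.2.91}
\]

and

\[ \lim_{a \downarrow -\infty} \mathrm{cap}\bigl(c, \{a, b\}\bigr) = \frac{\varepsilon}{2 \int_c^b e^{\tilde F(r)/\varepsilon}\, dr}. \tag{7.2.92} \]

Hence

\[
\lim_{a \downarrow -\infty} G_{\{a,b\}}(y, x) =
\begin{cases}
2 (\varepsilon a(x))^{-1} e^{-\tilde F(x)/\varepsilon} \int_x^b e^{\tilde F(r)/\varepsilon}\, dr, & y < x < b, \\[1ex]
2 (\varepsilon a(x))^{-1} e^{-\tilde F(x)/\varepsilon} \int_y^b e^{\tilde F(r)/\varepsilon}\, dr, & x < y < b.
\end{cases} \tag{7.2.93}
\]

Integrating over x ∈ (−∞, b), we get

\[ \mathbb{E}_y[\tau_b] = 2 \int_{-\infty}^b e^{-\tilde F(x)/\varepsilon}\, \frac{1}{\varepsilon a(x)} \int_{x \vee y}^b e^{\tilde F(r)/\varepsilon}\, dr\, dx, \qquad y \in (-\infty, b). \tag{7.2.94} \]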

Note that, for a(x) = 1 as in (2.1.1), the definition of F̃ in (7.2.82) reduces to b = −F̃′, i.e., F̃ = F with F the potential. If F is chosen to be a double-well potential, then (7.2.94) yields, in the limit as ε ↓ 0 and with the help of elementary Laplace asymptotics, the Kramers formula in (2.1.2) advertised in Sect. 2.1.

Fig. 7.4 Example of the setting in Remark 7.32 with a = −3 and b = 3: a potential x → F(x) on [−3, 3] and its associated equilibrium potential x → h_{3,−3}(x) = P_x(τ_3 < τ_{−3})

Remark 7.31 Note that we chose an arbitrary normalisation for the invariant mea-
sure, which influences the value of the capacity. It does not, however, affect the
value of physical quantities, in particular, the Green function and the mean hitting
time.

Remark 7.32 If instead of (7.2.79) we take the Dirichlet problem

(−L h)(x) = 0, x ∈ (a, b),

h(a) = 0,
h(b) = 1, (7.2.95)

then (7.2.94) becomes

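\[ \mathbb{E}_y[\tau_b] = 2 \int_y^b e^{-\tilde F(x)/\varepsilon}\, \frac{1}{\varepsilon a(x)} \int_y^x e^{\tilde F(r)/\varepsilon}\, dr\, dx, \qquad y \in (a, b). \tag{7.2.96} \]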

See Fig. 7.4 for an example.

7.3 Variational principles

As was pointed out earlier, variational principles are at the heart of our endeavor
to obtain sharp estimates on key quantities in metastable systems. We have already
seen that such quantities can be expressed as solutions to PDEs (or discrete analogues of PDEs), but finding these is hard. Variational principles provide tools to
get good estimates without an explicit solution. In this section we discuss three vari-
ational principles: the Dirichlet principle, the Thomson principle and the Berman-
Konsowa principle.

7.3.1 The Dirichlet principle

In Sects. 7.1–7.2 we have seen that the Dirichlet form computed on the equilibrium
potential gives the capacity. We will now show that the equilibrium potential is the
solution of a variational problem.

Theorem 7.33 (Dirichlet principle) Let D and A, B be as in the definition of the Dirichlet problem in (7.1.15) and (7.2.42) (see Fig. 7.3). Let H_{A,B} be the space of continuous functions f on D̄ such that
(i) E(f, f) < ∞.
(ii) f ≥ 1 on A and f ≤ 0 on B.
Assume that the corresponding Dirichlet problem has a unique solution, the equilibrium potential h_{A,B}. Then

\[ \mathrm{cap}(A, B) = \inf_{f \in \mathcal{H}_{A,B}} \mathcal{E}(f, f). \tag{7.3.1} \]

Moreover, if H_{A,B} ≠ ∅, then the infimum in (7.3.1) is attained uniquely at the equilibrium potential, i.e., cap(A, B) = E(h_{A,B}, h_{A,B}).

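Proof We write the proof in the diffusion setting, but the same arguments work in general. Suppose that H_{A,B} ≠ ∅. Let g be a function with E(g, g) < ∞ such that g ≥ 0 on A and g ≤ 0 on B. Then, for h ∈ H_{A,B} and ε > 0 (recall (7.2.55)), using the second Green identity (7.2.50),

\[
\begin{aligned}
\mathcal{E}(h + \varepsilon g, h + \varepsilon g) - \mathcal{E}(h, h)
&= \varepsilon \bigl[ \mathcal{E}(h, g) + \mathcal{E}(g, h) \bigr] + \varepsilon^2 \mathcal{E}(g, g) \\
&= \varepsilon \int_{\partial A} e^{-F(x)} g(x)\, \partial_{n(x)} h(x)\, d\sigma_A(x)
 + \varepsilon \int_{\partial B} e^{-F(x)} g(x)\, \partial_{n(x)} h(x)\, d\sigma_B(x) \\
&\quad + 2 \varepsilon \int_D \mu(dx)\, g(x) (\mathcal{L} h)(x) + \varepsilon^2 \mathcal{E}(g, g).
\end{aligned} \tag{7.3.2}
\]

If h = h_{A,B} is the equilibrium potential, then the boundary integrals are non-negative and the first term in the last line vanishes. Since the second term in the last line is non-negative, it follows that h is a global minimum of E in H_{A,B}. Finally, suppose that there is another function f such that E(f, f) = E(h, h). Then the identity

\[ \mathcal{E}\bigl( \tfrac{f+h}{2}, \tfrac{f+h}{2} \bigr) + \mathcal{E}\bigl( \tfrac{f-h}{2}, \tfrac{f-h}{2} \bigr) = \tfrac12 \mathcal{E}(f, f) + \tfrac12 \mathcal{E}(h, h) \tag{7.3.3} \]

implies that

\[ \mathcal{E}\bigl( \tfrac{f+h}{2}, \tfrac{f+h}{2} \bigr) = \mathcal{E}(h, h) - \mathcal{E}\bigl( \tfrac{f-h}{2}, \tfrac{f-h}{2} \bigr). \tag{7.3.4} \]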
Since h is a global minimum, this equality can only hold if

E (f − h, f − h) = 0. (7.3.5)

But, by (7.2.40) (recall that we are in the diffusion setting), the latter means that ‖∇(f − h)‖² = 0 μ-a.s., i.e., f − h is constant μ-a.s. Because of condition (ii), it follows that f = h μ-a.s.

The Dirichlet principle is a powerful tool for asymptotic computations of capacities via upper and lower bounds. An elementary upper bound is the following.

Corollary 7.34 For any f ∈ HA,B ,

cap(A, B) ≤ E (f, f ). (7.3.6)

Since E(f, f) is a sum (or an integral) of non-negative terms (recall (7.1.29), (7.2.30) and (7.2.40)), a lower bound can be obtained by dropping some of these terms. Upper and lower estimates of this type, which are flexible, will turn out to be very important in Parts IV–VIII.
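The upper bound of Corollary 7.34 can be tried out on a toy example. The sketch below assumes a nearest-neighbour chain on {0, ..., N} with hypothetical conductances `c[k]` on the edge {k, k+1}, A = {0}, B = {N}, and the Dirichlet form written as a sum over edges, so that the capacity equals the effective conductance. These conventions are assumptions of the example. Any test function with value 1 at 0 and value 0 at N then gives an upper bound on the capacity, and the equilibrium potential gives equality.

```python
def dirichlet_form(c, f):
    """E(f, f) = sum_k c[k] * (f(k+1) - f(k))^2 for a nearest-neighbour chain."""
    return sum(ck * (f[k + 1] - f[k]) ** 2 for k, ck in enumerate(c))


def capacity(c):
    """Exact capacity between 0 and N: the effective conductance of the chain."""
    return 1.0 / sum(1.0 / ck for ck in c)


def equilibrium_potential(c):
    """h(x) = P_x(tau_0 < tau_N), which is harmonic on {1, ..., N-1}."""
    N = len(c)
    resistance = sum(1.0 / ck for ck in c)
    return [sum(1.0 / c[k] for k in range(x, N)) / resistance for x in range(N + 1)]


def linear_test_function(N):
    """A naive guess in H_{A,B}: linear interpolation between 1 at 0 and 0 at N."""
    return [1.0 - x / N for x in range(N + 1)]
```

With one very weak edge, for instance `c = [1.0, 0.01, 1.0, 5.0]`, the linear guess overestimates the capacity by a large factor, while a test function that concentrates its drop on the weak edge comes much closer. This is the kind of flexibility the Dirichlet principle offers when choosing test functions.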

7.3.2 The Thomson principle

A classical reverse variational principle is due to Thomson.

Theorem 7.35 (Thomson principle, Version 1) Assume that A, B are such that the corresponding Dirichlet problem has a unique solution h_{A,B}. Let T_{A,B} denote the space of super-harmonic functions on D^c that take values in [0, 1], i.e.,

\[ \mathcal{T}_{A,B} = \bigl\{ h \colon S \to [0, 1],\ h \in L^2(S, \mu) \colon (\mathcal{L} h)(x) \le 0\ \ \forall\, x \in S \backslash D \bigr\}. \tag{7.3.7} \]

Then

\[ \mathrm{cap}(A, B) = \sup_{h \in \mathcal{T}_{A,B}} \frac{\mathcal{E}(1_A, h)^2}{\mathcal{E}(h, h)}, \tag{7.3.8} \]

and the supremum is attained at h = h_{A,B}.

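Proof The proof is simple. By (7.2.26–7.2.27), for all h ∈ T_{A,B},

\[ \mathcal{E}(h_{A,B}, h) = \int_S \mu(dx)\, h_{A,B}(x) (-\mathcal{L} h)(x) \ge \int_A \mu(dx)\, (-\mathcal{L} h)(x) = \mathcal{E}(1_A, h). \tag{7.3.9} \]

On the other hand, by the Cauchy-Schwarz inequality,

\[ \mathcal{E}(h_{A,B}, h)^2 \le \mathcal{E}(h, h)\, \mathcal{E}(h_{A,B}, h_{A,B}), \tag{7.3.10} \]

and hence, for all h ∈ T_{A,B},

\[ \mathrm{cap}(A, B) = \mathcal{E}(h_{A,B}, h_{A,B}) \ge \frac{\mathcal{E}(1_A, h)^2}{\mathcal{E}(h, h)}. \tag{7.3.11} \]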

Fig. 7.5 Kirchhoff’s law says that the in-flow and the out-flow are the same for all vertices that
are not wired to the outside

Thus, the right-hand side of (7.3.8) is a lower bound for cap(A, B). Since, by defi-
nition (see (7.2.55–7.2.57)), cap(A, B) = E (1A , hA,B ), the lower bound in (7.3.11)
is attained for h = hA,B .

The Thomson principle is much more difficult to exploit than the Dirichlet prin-
ciple, since it imposes the constraint of super-harmonicity on the test functions.
Guessing good super-harmonic functions is not easy.
In the setting of Markov processes with countable state space, there is an alterna-
tive (and better known) formulation of the Thomson principle in terms of flows (see
Fig. 7.5).

Definition 7.36 Let Γ = (S, E) be a graph with edge set E and vertex set S. Let
A, B ⊂ S be non-empty and disjoint. A map f : E → R is called a unit flow from
A to B when
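(i) Kirchhoff’s law holds: the flows into and out of vertices in S\(A ∪ B) are the same, i.e.,

\[ \sum_{y \in S \colon (y,x) \in E} f(y, x) = \sum_{z \in S \colon (x,z) \in E} f(x, z) \qquad \forall\, x \in S \backslash (A \cup B). \tag{7.3.12} \]

(ii) The total flow out of A and into B is one, i.e.,

\[ \sum_{x \in A}\ \sum_{z \in S \colon (x,z) \in E} f(x, z) = 1 = \sum_{x \in B}\ \sum_{y \in S \colon (y,x) \in E} f(y, x). \tag{7.3.13} \]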

Note that in the discrete case,

\[ (\mathcal{L} h)(x) = \sum_{y \in S} p(x, y) \bigl[ h(y) - h(x) \bigr], \qquad x \in S, \tag{7.3.14} \]

is a sum over edge-functions on the graph of the Markov process. The Dirichlet form can therefore be written as

\[
\begin{aligned}
\mathcal{E}(h, g) &= \tfrac12 \sum_{(x,y) \in E} \mu(x) p(x, y) \bigl[ h(y) - h(x) \bigr] \bigl[ g(y) - g(x) \bigr] \\
&= \tfrac12 \sum_{(x,y) \in E} \frac{\{ \mu(x) p(x, y) [h(y) - h(x)] \}\, \{ \mu(x) p(x, y) [g(y) - g(x)] \}}{\mu(x) p(x, y)}.
\end{aligned} \tag{7.3.15}
\]

Defining the functional D on pairs of edge functions u, v by

\[ \mathcal{D}(u, v) = \tfrac12 \sum_{(x,y) \in E} \frac{1}{\mu(x) p(x, y)}\, u(x, y)\, v(x, y), \tag{7.3.16} \]

we have

\[ \mathcal{E}(h, g) = \mathcal{D}(\mu p \nabla h, \mu p \nabla g), \tag{7.3.17} \]

with the obvious definition of μp∇. In particular,

\[ \mathrm{cap}(A, B) = \mathcal{D}(\mu p \nabla h_{A,B}, \mu p \nabla h_{A,B}). \tag{7.3.18} \]

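On the other hand, for any unit flow f we have

\[ \mathcal{D}(\mu p \nabla h_{A,B}, f) = \sum_{(x,y) \in E} \bigl[ h_{A,B}(x) - h_{A,B}(y) \bigr] f(x, y) = \sum_{x \in A}\ \sum_{y \in S \colon (x,y) \in E} f(x, y) = 1. \tag{7.3.19} \]

Applying the Cauchy-Schwarz inequality as in (7.3.10) again, we get

\[ \mathrm{cap}(A, B) \ge \frac{\mathcal{D}(\mu p \nabla h_{A,B}, f)^2}{\mathcal{D}(f, f)} = \frac{1}{\mathcal{D}(f, f)} \tag{7.3.20} \]

for any unit flow f.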

Theorem 7.37 (Thomson principle, Version 2) For Markov processes with countable state space, with the notation above,

\[ \mathrm{cap}(A, B) = \sup_{f \in \mathcal{U}_{A,B}} \frac{1}{\mathcal{D}(f, f)}, \tag{7.3.21} \]

where U_{A,B} is the space of all unit flows from A to B. The supremum is attained for the harmonic unit flow

\[ f_{h_{A,B}}(x, y) = \frac{\mu(x) p(x, y) \bigl[ h_{A,B}(y) - h_{A,B}(x) \bigr]_+}{\mathrm{cap}(A, B)}. \tag{7.3.22} \]

Proof In view of (7.3.20), we only need to verify that equality holds for the particu-
lar choice of the harmonic unit flow. To check that D(fhA,B , fhA,B ) = 1/cap(A, B)
is immediate. We only need to verify that fhA,B is a unit flow from A to B.

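Lemma 7.38 Let h be a harmonic function with respect to L. Then φ_h defined by

\[ \varphi_h(x, y) = \mu(x) p(x, y) \bigl[ h(y) - h(x) \bigr]_+ \tag{7.3.23} \]

is a flow.

Proof For y ∈ S, compute

\[
\begin{aligned}
\sum_{x \in S} \bigl[ \varphi_h(x, y) - \varphi_h(y, x) \bigr]
&= \sum_{x \in S} \Bigl\{ \mu(x) p(x, y) \bigl[ h(y) - h(x) \bigr]_+ - \mu(y) p(y, x) \bigl[ h(x) - h(y) \bigr]_+ \Bigr\} \\
&= \sum_{x \in S} \mu(y) p(y, x) \Bigl\{ \bigl[ h(y) - h(x) \bigr]_+ - \bigl[ h(x) - h(y) \bigr]_+ \Bigr\} \\
&= -\sum_{x \in S} \mu(y) p(y, x) \bigl[ h(x) - h(y) \bigr] = \mu(y) (-\mathcal{L} h)(y) = 0,
\end{aligned} \tag{7.3.24}
\]

which says that φ_h is a flow.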

By Lemma 7.38 and since Σ_{x∈A} Σ_{y∈S} φ_{h_{A,B}}(x, y) = cap(A, B), f_{h_{A,B}} is a unit flow from A to B. This proves the theorem.
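A small network makes the flow version of the Thomson principle concrete. The sketch below assumes a hypothetical "diamond" graph with A = {0}, B = {3} and two disjoint paths 0-1-3 and 0-2-3 whose edges carry conductances c(x, y) = μ(x)p(x, y). The energy of a flow is normalised here as a sum over undirected edges of f(e)²/c(e), so that the capacity equals the effective conductance. This normalisation is an assumption of the example and may differ from (7.3.16) by a constant factor. Every unit flow is then parametrised by the fraction `theta` sent along the upper path.

```python
def path_resistance(conductances):
    """Resistance of a path: the sum of the inverse conductances of its edges."""
    return sum(1.0 / c for c in conductances)


def capacity(r_upper, r_lower):
    """Effective conductance of two paths in parallel."""
    return 1.0 / r_upper + 1.0 / r_lower


def flow_energy(theta, r_upper, r_lower):
    """Energy sum_e f(e)^2 / c(e) of the unit flow sending theta along the upper path."""
    return theta**2 * r_upper + (1.0 - theta) ** 2 * r_lower


def thomson_lower_bound(theta, r_upper, r_lower):
    """Lower bound 1 / energy on the capacity, cf. (7.3.20)."""
    return 1.0 / flow_energy(theta, r_upper, r_lower)


def harmonic_split(r_upper, r_lower):
    """Fraction of the harmonic unit flow on the upper path."""
    return (1.0 / r_upper) / capacity(r_upper, r_lower)
```

For every `theta` in [0, 1] the value of `thomson_lower_bound` should stay below `capacity`, with equality exactly at `harmonic_split`, in line with Theorem 7.37.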

Remark 7.39 Note that the proof of Lemma 7.38 implies that for any function g the edge-function φ_g defined in (7.3.23) satisfies

\[ \sum_{x \in S} \bigl[ \varphi_g(x, y) - \varphi_g(y, x) \bigr] = \mu(y) (-\mathcal{L} g)(y), \qquad y \in S. \tag{7.3.25} \]

7.3.3 The Berman-Konsowa principle

Berman and Konsowa [23] obtained another variational principle, for the case of
discrete-time Markov processes, which generates lower bounds that improve on
those obtained from the Thomson principle. Its derivation is quite different, and
actually starts from the Dirichlet principle.
We work in the same setting as in Sect. 7.3.2.

Definition 7.40 Given a graph Γ = (S, E) and non-empty disjoint subsets A, B ⊂ S, an edge function f : E → [0, ∞) is called a loop-free unit flow from A to B when:
(i) f is a unit flow from A to B.
(ii) Any path γ of edges from A to B such that f(e) > 0 for all e ∈ γ is self-avoiding. In particular, if f((x, y)) > 0, then f((y, x)) = 0.

First we observe that a loop-free unit flow gives rise to a directed Markov process. For x ∈ S, let F(x) = Σ_{y∈S : (x,y)∈E} f((x, y)) be the total flow out of x, and assume that F(x) > 0 for all x ∈ S. For (x, y) ∈ E and x ∈ S\B, let

\[ q^f(x, y) = \frac{f((x, y))}{F(x)}, \tag{7.3.26} \]

and put q^f((x, y)) = 0 for x ∈ B. We construct a Markov process with law P^f, initial distribution P^f(X_0 = x) = F(x) 1_{x∈A} and transition matrix q^f that is killed in B. P^f can also be seen as a probability distribution on self-avoiding paths from A to B, with

\[ \mathbb{P}^f(\gamma) = F(\gamma_0) \prod_{i=0}^{|\gamma| - 1} q^f(\gamma_i, \gamma_{i+1}). \tag{7.3.27} \]

Lemma 7.41 Let e ∈ E. Then

Pf (e ∈ γ ) = f (e). (7.3.28)

Proof Let e = (x, y). Then, by the Markov property and the fact that the paths are
self-avoiding, the probability in question equals the probability that a path hits x
and immediately moves to y:

Pf (e ∈ γ ) = Pf (τx < τB )q f (x, y) . (7.3.29)

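Use (7.3.27) to write

\[ \mathbb{P}^f(\tau_x < \tau_B) = \sum_{\gamma \colon A \to x} F(\gamma_0) \prod_{i=0}^{|\gamma| - 1} q^f(\gamma_i, \gamma_{i+1}). \tag{7.3.30} \]

The summation over paths has to be carried out carefully. To that end, recursively define the sets

\[
\begin{aligned}
A_0 &= A, \\
A_n &= \bigl\{ z \in S \backslash A \colon \exists\, y \in A_{n-1} \colon f(y, z) > 0,\ \ \forall\, y \notin A_0 \cup \dots \cup A_{n-1} \colon f(y, z) = 0 \bigr\}.
\end{aligned} \tag{7.3.31}
\]

Note that, due to the loop-freeness of the flow, for any z ∈ S there exists a unique n*(z) such that z ∈ A_{n*(z)}. Set

\[ G(z) = \mathbb{P}^f(\tau_z < \tau_B). \tag{7.3.32} \]

Then the Markov property implies the recursive identity

\[ G(z) = \sum_{y \in A_0 \cup \dots \cup A_{n^*(z) - 1}} G(y)\, q^f(y, z). \tag{7.3.33} \]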

We prove by induction that G(z) = F (z) for all z ∈ S. Indeed, for z ∈ A we have
G(z) = F (z) by our choice of the initial condition. It therefore suffices to show that
if G(z) = F (z) holds for all z ∈ Ak , 0 ≤ k ≤ n, then it also holds for z ∈ An+1 .
Now, by (7.3.31–7.3.33), for z ∈ An+1 we have

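\[ G(z) = \sum_{y \in A_0 \cup \dots \cup A_n} G(y)\, q^f(y, z) = \sum_{y \in A_0 \cup \dots \cup A_n} f(y, z), \tag{7.3.34} \]

where we use the induction hypothesis. However, for z ∈ A_{n+1} we also have

\[ \sum_{y \in A_0 \cup \dots \cup A_n} f(y, z) = \sum_{y \in S} f(y, z) = \sum_{w \in S} f(z, w) = F(z), \tag{7.3.35} \]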

where we use that the flow satisfies Kirchhoff’s law. Thus, we have completed the
induction step and have proven that

Pf (τx < τB ) = F (x), x ∈ S. (7.3.36)

Combine (7.3.26), (7.3.29) and (7.3.36) to get the claim.
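Lemma 7.41 can be verified by brute force on a small example. The sketch below assumes a hypothetical loop-free unit flow from A = {0} to B = {3} on a directed graph with five edges (the numbers are made up, chosen only so that Kirchhoff's law holds at the interior vertices 1 and 2). It enumerates all paths with their probabilities from (7.3.27) and then sums the probabilities of the paths through each edge.

```python
# Hypothetical loop-free unit flow from A = {0} to B = {3}.
flow = {(0, 1): 0.6, (0, 2): 0.4, (1, 2): 0.2, (1, 3): 0.4, (2, 3): 0.6}
A, B = {0}, {3}


def out_flow(x):
    """F(x): total flow out of vertex x."""
    return sum(value for (u, _), value in flow.items() if u == x)


def path_law():
    """All paths from A to B with their probabilities P^f(gamma), cf. (7.3.27)."""
    law = {}

    def extend(path, prob):
        x = path[-1]
        if x in B:                      # the process is killed in B
            law[tuple(path)] = prob
            return
        for (u, v), value in flow.items():
            if u == x and value > 0:
                # Transition probability q^f(x, v) = f((x, v)) / F(x), cf. (7.3.26).
                extend(path + [v], prob * value / out_flow(x))

    for x in A:
        extend([x], out_flow(x))        # initial distribution F(x) on A
    return law


def edge_marginals(law):
    """P^f(e in gamma) for every edge e."""
    marginals = {e: 0.0 for e in flow}
    for path, prob in law.items():
        for edge in zip(path, path[1:]):
            marginals[edge] += prob
    return marginals
```

In this example there are three paths, 0-1-3, 0-1-2-3 and 0-2-3, and the edge marginals returned by `edge_marginals(path_law())` should coincide with the entries of `flow`, as (7.3.28) asserts.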

Remark 7.42 Lemma 7.41 is the only place where the flow property of f is used.
In terms of flows the observation in (7.3.36) is the probabilistic interpretation of the
fact that F (x) is the total flow into x.

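Provided f(e) > 0, we can divide (7.3.28) by f(e), to obtain

\[ 1_{f(e) > 0} = \sum_\gamma \mathbb{P}^f(\gamma)\, \frac{1_{e \in \gamma}}{f(e)}. \tag{7.3.37} \]

Now pick any function h ∈ H_{A,B}. Then

\[
\begin{aligned}
\mathcal{E}(h, h) &\ge \sum_{(x,y) \in E} \mu(x) p(x, y) \bigl[ h(x) - h(y) \bigr]^2\, 1_{f((x,y)) > 0} \\
&= \sum_{(x,y) \in E} \mu(x) p(x, y) \bigl[ h(x) - h(y) \bigr]^2 \sum_\gamma \mathbb{P}^f(\gamma)\, \frac{1_{(x,y) \in \gamma}}{f((x, y))} \\
&= \sum_\gamma \mathbb{P}^f(\gamma) \sum_{(x,y) \in \gamma} \frac{\mu(x) p(x, y)}{f((x, y))} \bigl[ h(x) - h(y) \bigr]^2.
\end{aligned} \tag{7.3.38}
\]

From this we can derive a lower bound on the capacity, namely, we take the infimum over h and interchange the sum over γ with the infimum over h:

\[
\begin{aligned}
\mathrm{cap}(A, B) &\ge \inf_{h \in \mathcal{H}_{A,B}} \sum_\gamma \mathbb{P}^f(\gamma) \sum_{(x,y) \in \gamma} \frac{\mu(x) p(x, y)}{f((x, y))} \bigl[ h(x) - h(y) \bigr]^2 \\
&\ge \sum_\gamma \mathbb{P}^f(\gamma) \inf_{h \in \mathcal{H}_{A,B}} \sum_{(x,y) \in \gamma} \frac{\mu(x) p(x, y)}{f((x, y))} \bigl[ h(x) - h(y) \bigr]^2 \\
&= \sum_\gamma \mathbb{P}^f(\gamma) \Bigl[ \sum_{(x,y) \in \gamma} \frac{f((x, y))}{\mu(x) p(x, y)} \Bigr]^{-1}.
\end{aligned} \tag{7.3.39}
\]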

In the last step we use the explicit solution of the Dirichlet problem on the one-
dimensional path γ . We readily see that equality holds when we insert the harmonic
unit flow (see (7.3.22)). Thus, we have proved the following theorem.

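Theorem 7.43 (Berman-Konsowa principle) Let U_{A,B} denote the set of loop-free unit flows from A to B. Then

\[ \mathrm{cap}(A, B) = \sup_{f \in \mathcal{U}_{A,B}} \mathbb{E}^f \Bigl[ \Bigl( \sum_{(x,y) \in \gamma} \frac{f((x, y))}{\mu(x) p(x, y)} \Bigr)^{-1} \Bigr]. \tag{7.3.40} \]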

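Remark 7.44 The Berman-Konsowa principle improves the Thomson principle. Namely, by Jensen’s inequality and Lemma 7.41,

\[
\mathbb{E}^f \Bigl[ \Bigl( \sum_{(x,y) \in \gamma} \frac{f((x, y))}{\mu(x) p(x, y)} \Bigr)^{-1} \Bigr]
\ge \Bigl( \mathbb{E}^f \Bigl[ \sum_{(x,y) \in \gamma} \frac{f((x, y))}{\mu(x) p(x, y)} \Bigr] \Bigr)^{-1}
= \Bigl( \tfrac12 \sum_{(x,y)} \frac{f((x, y))^2}{\mu(x) p(x, y)} \Bigr)^{-1}
= \frac{1}{\mathcal{D}(f, f)}. \tag{7.3.41}
\]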

Hence, every choice of f yields a better lower bound via the Berman-Konsowa
principle than via the Thomson principle.
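The comparison in Remark 7.44 can be made explicit on a toy network. The sketch below assumes the same kind of hypothetical "diamond" graph as one would use for the Thomson principle: A = {0}, B = {3}, two disjoint paths with total resistances `r_upper` and `r_lower` (sums of 1/(μ(x)p(x, y)) along each path), and a loop-free unit flow sending a fraction `theta` along the upper path. Energies are normalised as sums over undirected edges, which is an assumption of this example and may differ from (7.3.16) by a constant factor.

```python
def capacity(r_upper, r_lower):
    """Effective conductance of the two parallel paths."""
    return 1.0 / r_upper + 1.0 / r_lower


def thomson_bound(theta, r_upper, r_lower):
    """1 / (flow energy): the lower bound from the Thomson principle."""
    return 1.0 / (theta**2 * r_upper + (1.0 - theta) ** 2 * r_lower)


def berman_konsowa_bound(theta, r_upper, r_lower):
    """E^f[(sum_{e in gamma} f(e)/c(e))^{-1}] for 0 < theta < 1, cf. (7.3.40).

    The path law puts mass theta on the upper path and 1 - theta on the lower
    path; along each path the sum equals (flow on the path) * (path resistance).
    """
    upper_term = theta / (theta * r_upper)
    lower_term = (1.0 - theta) / ((1.0 - theta) * r_lower)
    return upper_term + lower_term
```

On this particular graph the Berman-Konsowa bound equals the capacity for every `theta` strictly between 0 and 1, whereas the Thomson bound is sharp only for the harmonic split. This special feature comes from the two paths being edge-disjoint. On general graphs the Berman-Konsowa bound also depends on the flow.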

The more serious advantage of the Berman-Konsowa principle is the fact that
the bounds can often be evaluated explicitly. The sums appearing in the right-hand
side of (7.3.40) are straightforward, live on the flow realising the supremum, and
are independent of the realisation of the Markov chain, so that the expectation
over Ef becomes trivial. This will be explained in the examples that are treated
in Parts IV–VIII.

7.4 Variational principles in the non-reversible setting

Variational representations for capacities are known also in the non-reversible set-
ting, but they are much more involved and therefore far less useful. Here is a brief
account.

We assume that there exists a unique ergodic invariant measure μ. We denote by

\[ p^*(x, y) = \frac{\mu(y)}{\mu(x)}\, p(y, x), \qquad x, y \in S, \tag{7.4.1} \]

the transition probabilities of the time-reversed Markov process, and by

\[ p^s(x, y) = \tfrac12 \bigl[ p(x, y) + p^*(x, y) \bigr] \tag{7.4.2} \]

the transition probabilities of the symmetrised Markov process. Analogously, we write L* and L^s for the generators of these processes. Note that from the definition of capacity (see (7.1.39)) we get that, for any f ∈ H_{A,B},

\[ \mathrm{cap}(A, B) = (f, -\mathcal{L} h_{A,B})_\mu. \tag{7.4.3} \]

In particular,

\[ \mathrm{cap}(A, B) = \bigl( h^*_{A,B}, -\mathcal{L} h_{A,B} \bigr)_\mu = \bigl( -\mathcal{L}^* h^*_{A,B}, h_{A,B} \bigr)_\mu = \mathrm{cap}^*(A, B). \tag{7.4.4} \]
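The objects in (7.4.1), (7.4.2) and (7.4.4) can be computed explicitly for a small non-reversible chain. The sketch below assumes a hypothetical three-state transition matrix, A = {0} and B = {2}, and uses the escape-probability form of the capacity, cap(A, B) = μ(0) P_0(τ_B < return time to A), which is an assumption about the definition (7.1.39) that is not reproduced in this section. The invariant measure is obtained by plain power iteration.

```python
def stationary(p, iterations=5000):
    """Invariant probability vector of an irreducible aperiodic chain (power iteration)."""
    n = len(p)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * p[x][y] for x in range(n)) for y in range(n)]
    return mu


def reversed_chain(p, mu):
    """p*(x, y) = mu(y) p(y, x) / mu(x), cf. (7.4.1)."""
    n = len(p)
    return [[mu[y] * p[y][x] / mu[x] for y in range(n)] for x in range(n)]


def symmetrised_chain(p, p_star):
    """p^s(x, y) = (p(x, y) + p*(x, y)) / 2, cf. (7.4.2)."""
    n = len(p)
    return [[0.5 * (p[x][y] + p_star[x][y]) for y in range(n)] for x in range(n)]


def capacity_0_to_2(p, mu):
    """mu(0) * P_0(hit 2 before returning to 0) for a chain on {0, 1, 2}."""
    # From 0 the chain either jumps to 2 directly, or jumps to 1 and later
    # leaves 1 towards 2 (holding steps at 1 are accounted for by 1 - p[1][1]).
    via_1 = p[0][1] * p[1][2] / (1.0 - p[1][1])
    return mu[0] * (p[0][2] + via_1)
```

For a matrix such as `[[0.1, 0.6, 0.3], [0.2, 0.1, 0.7], [0.5, 0.4, 0.1]]` the rows of the reversed chain sum to one, the symmetrised chain satisfies detailed balance with respect to μ, and `capacity_0_to_2` should return the same value for p and for p*, in agreement with (7.4.4).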

Define the norms

\[ \|f\|_{H^1}^2 = \mathcal{E}(f, f), \tag{7.4.5} \]

and

\[ \|f\|_{H^{-1}}^2 = \sup_{g \in H^1} \bigl[ 2 (f, g)_\mu - \mathcal{E}(g, g) \bigr]. \tag{7.4.6} \]

Note that on the space of functions with zero mean we have ‖f‖²_{H^{-1}} = (f, L^{-1} f), while otherwise the H^{-1}-norm is infinite. An application of the Cauchy-Schwarz inequality yields the bound

\[ |(f, g)_\mu| \le \|f\|_{H^1}\, \|g\|_{H^{-1}}. \tag{7.4.7} \]

Using this bound with f replaced by −L* f and g by h_{A,B}, we can show that

\[ \bigl( -\mathcal{L}^* f, h_{A,B} \bigr)_\mu^2 \le \mathrm{cap}(A, B) \sup_{h \in \mathcal{G}_{A,B}} \bigl[ 2 \bigl( -\mathcal{L}^* f, h \bigr)_\mu - \|h\|_{H^1}^2 \bigr], \tag{7.4.8} \]

where G_{A,B} denotes the space of functions that are constant on the sets A and B. Furthermore, for any f ∈ H_{A,B} we have, via (7.4.3),

\[ \bigl( -\mathcal{L}^* f, h_{A,B} \bigr)_\mu = (f, -\mathcal{L} h_{A,B})_\mu = \mathrm{cap}(A, B), \tag{7.4.9} \]

so that we obtain the bound

\[ \mathrm{cap}(A, B) \le \inf_{f \in \mathcal{H}_{A,B}} \sup_{h \in \mathcal{G}_{A,B}} \bigl[ 2 \bigl( -\mathcal{L}^* f, h \bigr)_\mu - \|h\|_{H^1}^2 \bigr]. \tag{7.4.10} \]

It now suffices to choose f = ½(h_{A,B} + h*_{A,B}), where h*_{A,B} is the equilibrium potential for the adjoint generator L*, to verify that the infimum is attained at f. This yields the following Dirichlet principle for the non-reversible case.

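Theorem 7.45 (Dirichlet principle: non-reversible case) For non-empty disjoint sets A, B ⊂ S,

\[ \mathrm{cap}(A, B) = \inf_{f \in \mathcal{H}_{A,B}} \sup_{h \in \mathcal{G}_{A,B}} \bigl[ 2 \bigl( -\mathcal{L}^* f, h \bigr)_\mu - \|h\|_{H^1}^2 \bigr]. \tag{7.4.11} \]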

The Thomson principle in the form of Theorem 7.35 carries over to the non-
reversible case.
Another version of both the Dirichlet principle and the Thomson principle is
the following theorem, whose proof can be found in Slowik [220]. We need the
following notations: for g : S → R, set

\[ \Psi_g(x, y) = \mu(x)\, p^s(x, y) \bigl[ g(x) - g(y) \bigr] \tag{7.4.12} \]

and

Φg (x, y) = μ(x)p(x, y)g(x) − μ(y)p(y, x)g(y). (7.4.13)

Theorem 7.46 (Dirichlet and Thomson principles: non-reversible case) Consider a Markov process with a countable state space S. Let A, B ⊂ S be non-empty and disjoint. Then:
(i) The Dirichlet principle holds, in the sense that

\[ \mathrm{cap}(A, B) = \inf_{f \in \mathcal{H}_{A,B}}\ \inf_{\psi \in \mathcal{U}^0_{A,B}} \mathcal{D}(\Phi_f - \psi, \Phi_f - \psi), \tag{7.4.14} \]

where H_{A,B} is the space of functions defined in Theorem 7.33 and U^0_{A,B} is the space of zero-flows. The infima are attained at f = ½(h_{A,B} + h*_{A,B}) and ψ = Φ_f − Ψ_{h_{A,B}}.
(ii) The Thomson principle holds, in the sense that

\[ \mathrm{cap}(A, B) = \sup_{g \in \mathcal{G}^0_{A,B}}\ \sup_{\varphi \in \mathcal{U}^1_{A,B}} \frac{1}{\mathcal{D}(\varphi - \Phi_g, \varphi - \Phi_g)}, \tag{7.4.15} \]

where G^0_{A,B} is the space of functions that vanish on A ∪ B and U^1_{A,B} is the space of unit flows. The suprema are attained for φ = φ_{A,B} + Φ_g and g = ½(h_{A,B} − h*_{A,B})/cap(A, B), where φ_{A,B} is the harmonic flow.

Whether or not these variational principles are useful in connection with metasta-
bility remains to be seen. They are substantially more involved than their analogues
in the reversible case, where the minimiser has a transparent probabilistic interpre-
tation that makes it easy to come up with good guesses for test functions.

7.5 Bibliographical notes

1. The connection between Markov processes and potential theory goes back to
Kakutani [144, 145]. A fundamental treatment is given in the monograph by
Doob [95]. For a presentation of the theory in the context of discrete Markov pro-
cesses, see e.g. the book by Doyle and Snell [96].

2. There is a formula for the mean hitting time that does not require reversibility, as was noted by Gaveau and Moreau [123]. It suffices to recall that (7.1.23) holds in general, to get

\[ \mathbb{E}_a[\tau_B] = \sum_{x \in S} \frac{\mu(x)\, h_{x,B}(a)}{\mathrm{cap}(x, B)}. \tag{7.5.1} \]
Note that this formula is not quite as nice as (7.1.41), but in principle it constitutes
an alternative. For more details, see the PhD thesis of Eckhoff [101, 102]. Fernán-
dez, Manzo, Nardi, Scoppola and Sohier [111] and Fernández, Manzo, Nardi and
Scoppola [112] develop a theory of metastability without reversibility, based on cer-
tain assumptions involving slow escape, fast thermalisation and fast recurrence, and
provide examples of dynamics for which these assumptions can be verified.

3. The Dirichlet form E can be extended to the set {f : E (f, f ) < ∞}, which typ-
ically is larger than the domain of L . An entire theory is available that allows us
to use this fact to construct a Markov process from a Dirichlet form. For a detailed
treatment, see e.g. the monograph by Fukushima, Oshima and Takeda [116].

4. In textbooks, Green identities are given for the case L = ½Δ only. We have not
been able to find a reference where they are stated in general in explicit form.

5. The derivation in Sect. 7.2.4 is taken from the monograph by Sznitman [226].

6. For irregular domains, existence and uniqueness issues are more delicate. For
further reading, we refer the reader to the monograph by Karatzas and Shreve [148].

7. The approach in Sect. 7.3.3 was developed by Bianchi, Bovier and Ioffe [24]
following the original paper by Berman and Konsowa [23]. In den Hollander
and Jansen [82] the Berman-Konsowa principle is extended to arbitrary reversible
Markov jump processes on Polish spaces. The latter paper contains an appendix in
which the physical interpretations of the Dirichlet, Thomson and Berman-Konsowa
variational principles are elaborated.

8. The connection between the Berman-Konsowa principle and the Thomson prin-
ciple has been worked out by Slowik [219]. Remark 7.44 comes from that paper.

9. The history of variational principles in the non-reversible case appears to be
a little obscure. The Dirichlet principle in the form of (7.4.11) was obtained by
Doyle [97]. Our presentation follows the exposition given by Slowik [220]. The
Dirichlet principle in the form of (7.4.14) is given by Landim [159] and Gaudillière
and Landim [121], while the Thomson principle in (7.4.15) apparently appears for
the first time in Slowik [220].

10. For a detailed discussion of Theorem 7.17, see Karatzas and Shreve [148].

11. A host of material on reversible Markov processes with countable state space is
presented in the online-book by Aldous and Fill [2].
Part III
Metastability

Part III develops the theory of metastability for Markov processes.

Chapter 8 provides key definitions and basic properties. The starting point is a
definition of a metastable Markov process based on capacities. After that, renewal
estimates are used to derive bounds on harmonic functions in terms of capacities,
and it is shown that capacities are approximately ultrametric. This in turn leads to
estimates on mean hitting times. Finally, metastability is linked to the spectrum of
the generator of the Markov process, which is shown to decompose into a cluster
of small real eigenvalues that are separated from the rest of the spectrum by a gap.
This in turn leads to the exponential law for the metastable crossover time.
Chapter 9 collects basic techniques. Upper and lower bounds on capacity are
derived with the help of the Dirichlet principle. Coarse-graining techniques are used
to describe metastability of Markov chains with a high degree of symmetry, like
the Curie-Weiss model. Regularity estimates on harmonic functions are derived for
Markov processes with uncountable state spaces, like elliptic diffusions. These in
turn are linked to coupling methods.
Chapter 8
Key Definitions and Basic Properties

True eloquence consists in saying all that is necessary, and in saying only what is
necessary.
(François de La Rochefoucauld, Réflexions)

In this chapter we introduce the basic setup for our approach to metastability.
The guiding principle is to provide a definition of metastable sets, representing
metastable states in model systems, that is verifiable in concrete models and implies
the type of behaviour that is associated with metastability. The intuitive picture we
have in mind comes from the paradigmatic Brownian motion in a double-well (or
a multi-well) potential in one dimension. Here, the metastable states correspond to
“valleys” of the potential, labeled by the local minima of the potential. Our aim is
to give a definition that applies in far more general situations.
Section 8.1 defines metastable sets and provides the characterisation of metasta-
bility in terms of capacities. Section 8.2 shows how renewal estimates can be used
to obtain upper and lower bounds on the equilibrium potential in terms of capacity
and establishes the approximate ultrametricity of capacity. Section 8.3 uses these
results to obtain sharp bounds on mean hitting times. Section 8.4 makes the link
with spectral theory. Section 8.5, finally, mentions some problems that come up for
uncountable state spaces.

8.1 Characterisation of metastability

Consider a Markov process X with state space S and discrete or continuous time.
Let P denote the law of X and Px the law of X conditioned on X0 = x. We will
typically assume that X is uniquely ergodic with invariant measure μ. For D ⊂ S,
let τD denote the first hitting time of X in D, i.e.,

\[
\tau_D = \inf\{t > 0 : X(t) \in D\}. \tag{8.1.1}
\]

The fundamental feature we would like to associate with metastability is the existence
of two well-separated time scales and the partition of the state space into
disjoint sets $S_i$, $i \in I$, such that, when $X$ starts in $S_i$, on a short time scale it reaches
some sort of local equilibrium concentrated on $S_i$, while on a long time scale it exits
$S_i$ and moves to some $S_j$ with $j \neq i$, where it again reaches local equilibrium,
etc. We may think of the dynamics as “hopping” between quasi-invariant sets (see
Fig. 8.1). To capture this picture, an appealing way is to characterise the rapid approach
to local equilibrium by saying that, in a suitable sense, $X$ is locally recurrent
or Harris recurrent: each $S_i$ contains a small set $B_i \subset S_i$ that is revisited by $X$ very
frequently before it moves out of $S_i$.

Fig. 8.1 Picture of a metastable set (dots) labelling the metastable valleys, and transitions between
these valleys (arrows)

© Springer International Publishing Switzerland 2015
A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_8

On this basis, an intuitively appealing definition of metastability could be the
following:
• A family of Markov processes is called metastable if there exists a collection of
disjoint sets Bi ⊂ S, i ∈ I , such that

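\[
\frac{\sup_{x \notin \cup_{i\in I} B_i} \mathbb{E}_x[\tau_{\cup_{i\in I} B_i}]}
     {\inf_{i\in I}\inf_{x\in B_i} \mathbb{E}_x[\tau_{\cup_{j\in I\setminus i} B_j}]} = o(1). \tag{8.1.2}
\]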

Here, o(1) should be thought of as a small intrinsic parameter that characterises the
“degree” of metastability, since typically we deal with a family of Markov processes
indexed by a parameter (like temperature, system size, etc.) that allows us to make
(8.1.2) as small as we like.
The definition in (8.1.2) characterises metastability in terms of a physical prop-
erty, namely, hitting times of the system. Certainly we would want such a property to
hold for a system to be called metastable. However, the problem is that (8.1.2) is not
immediately verifiable, since mean hitting times are generally difficult to compute.
Indeed, one of our goals is to compute mean hitting times, and so (8.1.2) would put
us in a circular set-up. It is thus desirable to have an equivalent definition involving
more manageable quantities.
The relations in Corollaries 7.11 and 7.30 between mean hitting times and capac-
ities suggest an alternative characterisation of metastability through capacities. We
will see that this characterisation entails many advantages. We first give a tentative
definition of metastable sets.

• A family of Markov processes is called metastable if there exists a collection of
disjoint sets $B_i \subset S$, $i \in I$, such that
\[
\frac{\sup_{i\in I}\inf_{x\in B_i} \mathbb{P}_x(\tau_{\cup_{j\in I\setminus i} B_j} < \hat\tau_{B_i})}
     {\inf_{x \notin \cup_{i\in I} B_i} \mathbb{P}_x(\tau_{\cup_{i\in I} B_i} < \hat\tau_{B(x)})} = o(1), \tag{8.1.3}
\]
where $B(x)$ is a sufficiently large neighbourhood of $x$, and
$\hat\tau_{B(x)} = \inf\{t > \tau_{B(x)^c} : X_t \in B(x)\}$ is the first time $X$ returns to $B(x)$ after having left it.

Remark 8.1 Note that here and in the sequel we always include the stable set in
the collection of metastable sets, contrary to what is common practice. In particu-
lar, if there is only one metastable set, then it is stable, and the system exhibits no
metastable behaviour on this level of resolution.

This definition leaves some questions open. Should we take the supremum over
x ∈ Bi in the numerator rather than the infimum? What should be the choice for
B(x)? How can we relate the probabilities appearing in the definition to capacities,
as advertised?
It will emerge that the usefulness of the definition depends crucially on further
properties of the sets Bi , i ∈ I , and on local mixing properties of the process. Be-
fore we continue this discussion, we turn to the simplest case from which we can
derive much of our intuition: Markov processes in discrete time with countable state
spaces.
An important goal will be to derive general properties of metastable systems.
Since (8.1.3) implies frequent returns to the small starting set Bi before the transition
to a set $B_j$, $j \neq i$, we expect an exponential law for the transition times. We further
expect that the process of successive visits to the sets Bi , i ∈ I , asymptotically is a
Markov process on I .
Everything becomes easy and transparent when the state space S is finite, and we
can replace the sets Bi , i ∈ I , and B(x), x ∈ S, in (8.1.3) by single points. It will be
useful to understand this simple setting first.
The following definition of a set of metastable points applies (see Fig. 8.1).

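Definition 8.2 (Metastable points) Suppose that $|S| < \infty$. A Markov process $X$
is said to be $\rho$-metastable with respect to a set of points $\mathcal{M} \subset S$ if
\[
|S|\,\frac{\sup_{x\in\mathcal{M}} [\mathrm{cap}(x,\mathcal{M}\setminus x)/\mu(x)]}
          {\inf_{y\notin\mathcal{M}} [\mathrm{cap}(y,\mathcal{M})/\mu(y)]} \le \rho \ll 1. \tag{8.1.4}
\]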

Remark 8.3 Definition 8.2 is useful because, as we will see later, it involves quantities
that are either known or are controllable. It becomes intuitively even more
appealing after we note that (8.1.4) can be written alternatively as
\[
|S|\,\frac{\sup_{x\in\mathcal{M}} \mathbb{P}_x(\tau_{\mathcal{M}\setminus x} < \tau_x)}
          {\inf_{y\notin\mathcal{M}} \mathbb{P}_y(\tau_{\mathcal{M}} < \tau_y)} \le \rho \ll 1, \tag{8.1.5}
\]
where to go from (8.1.4) to (8.1.5) we use Lemma 7.13(i). Note the appearance
of the cardinality of the state space in (8.1.4)–(8.1.5). The definition makes sense
when we have a sequence of processes where the cardinality of the state space is
either fixed or increases slowly. If $|S| = \infty$, but there exists a subset $S_0 \subset S$ with
$\mathrm{cap}(\mathcal{M}, S_0^c) \ll \max_{x\in\mathcal{M}} \mathrm{cap}(x,\mathcal{M}\setminus x)$, then $|S|$ can be replaced by $|S_0|$ in Definition 8.2.
The reader may verify this fact in the proofs below. The intuitive reason is
that, under this assumption, the process will have visited all metastable points long
before it leaves the set $S_0$.
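
To make Definition 8.2 concrete, the following minimal sketch evaluates the left-hand side of (8.1.5) for a toy chain. Everything specific here is an assumption made for illustration and does not come from the text: the five-state Metropolis chain with weights `w`, discrete time (so that $\tau_x$ under $\mathbb{P}_x$ is the first return time), exact rational arithmetic, and the helper names `solve`, `potential` and `escape`.

```python
from fractions import Fraction as F

# Hypothetical example (not from the text): Metropolis chain on {0,...,4}
# with weights w, i.e. wells at 0 and 4 separated by a barrier at 2.
w = [64, 8, 1, 8, 32]
n = len(w)
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = F(1, 2) * min(F(1), F(w[y], w[x]))
    P[x][x] = 1 - sum(P[x])            # rejected proposals stay at x


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        p = next(r for r in range(c, m) if T[r][c] != 0)
        T[c], T[p] = T[p], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for r in range(m):
            if r != c:
                f = T[r][c]
                T[r] = [vr - f * vc for vr, vc in zip(T[r], T[c])]
    return [T[i][m] for i in range(m)]


def potential(A, B):
    """h(x) = P_x(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    free = [x for x in range(n) if x not in A and x not in B]
    mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
    rhs = [sum((P[x][a] for a in A), F(0)) for x in free]
    h = [F(int(x in A)) for x in range(n)]
    for x, v in zip(free, solve(mat, rhs)):
        h[x] = v
    return h


def escape(x, A):
    """P_x(tau_A < tau_x) for return times, i.e. cap(x, A) / mu(x)."""
    h = potential(A, {x})
    return sum(P[x][y] * h[y] for y in range(n))


M = {0, 4}                              # candidate metastable points
num = max(escape(x, M - {x}) for x in M)
den = min(escape(y, M) for y in range(n) if y not in M)
rho = n * num / den                     # left-hand side of (8.1.5)
print(rho, float(rho))
```

For this particular chain the ratio is small (roughly 0.07), so the two wells qualify as metastable points with a modest $\rho$; making the wells deeper relative to the barrier decreases it further.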

We want to show that if a process is metastable in the sense of Definition 8.2,
then we can express mean hitting times of subsets of metastable points in terms of
capacities and the invariant measure alone. This is based on the key formula in
(7.1.41), which here reads, for $J \subset \mathcal{M}$, $n \in \mathcal{M}$,
\[
\mathbb{E}_n[\tau_J] = \frac{1}{\mathrm{cap}(n,J)} \sum_{y\in S} \mu(y)\,h_{n,J}(y). \tag{8.1.6}
\]

The main work is to control the sum over the equilibrium potential in (8.1.6). To
do this, we show in Sect. 8.2 how to control the equilibrium potential in terms of
capacities. In Sect. 8.3 we use these estimates to derive bounds on mean hitting
times.
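
Formula (8.1.6) can be checked numerically before any estimates are made. The sketch below compares a direct solution of the linear system for $\mathbb{E}_x[\tau_J]$ with the right-hand side of (8.1.6). The toy chain, the discrete-time convention $\mathrm{cap}(n,J) = \mu(n)\,\mathbb{P}_n(\tau_J < \tau_n)$ and all helper names are assumptions made for illustration, not taken from the text.

```python
from fractions import Fraction as F

# Hypothetical example (not from the text): Metropolis chain on {0,...,4}.
w = [64, 8, 1, 8, 32]
n = len(w)
mu = [F(v, sum(w)) for v in w]          # reversible invariant measure
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = F(1, 2) * min(F(1), F(w[y], w[x]))
    P[x][x] = 1 - sum(P[x])


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        p = next(r for r in range(c, m) if T[r][c] != 0)
        T[c], T[p] = T[p], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for r in range(m):
            if r != c:
                f = T[r][c]
                T[r] = [vr - f * vc for vr, vc in zip(T[r], T[c])]
    return [T[i][m] for i in range(m)]


def potential(A, B):
    """h(x) = P_x(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    free = [x for x in range(n) if x not in A and x not in B]
    mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
    rhs = [sum((P[x][a] for a in A), F(0)) for x in free]
    h = [F(int(x in A)) for x in range(n)]
    for x, v in zip(free, solve(mat, rhs)):
        h[x] = v
    return h


def mean_hit(J):
    """E_x[tau_J] for x outside J, from (1 - P) u = 1 off J, u = 0 on J."""
    free = [x for x in range(n) if x not in J]
    mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
    return dict(zip(free, solve(mat, [F(1)] * len(free))))


start, J = 4, {0}                       # from the shallow well to the deep one
h = potential({start}, J)               # h_{n,J}
# cap(n, J) = mu(n) P_n(tau_J < tau_n), using h_{J,n} = 1 - h_{n,J}
cap_nJ = mu[start] * sum(P[start][y] * (1 - h[y]) for y in range(n))
lhs = mean_hit(J)[start]
rhs = sum(mu[y] * h[y] for y in range(n)) / cap_nJ
print(lhs, rhs)
```

Both sides agree exactly in rational arithmetic, which is the content of (8.1.6) for this example.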

8.2 Renewal estimates and ultrametricity

The estimation of the equilibrium potential through capacities is based on a renewal
argument that is simple in the case of a discrete state space.

Lemma 8.4 (Renewal estimate on equilibrium potential) Let $A, B \subset S$ be non-empty
disjoint sets, and let $x \notin A \cup B$. Then
\[
\max\left\{1 - \frac{\mathrm{cap}(x,B)}{\mathrm{cap}(x,A)},\, 0\right\} \le h_{A,B}(x) \le \min\left\{\frac{\mathrm{cap}(x,A)}{\mathrm{cap}(x,B)},\, 1\right\}. \tag{8.2.1}
\]

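Proof The upper bound follows from the estimate
\[
\begin{aligned}
h_{A,B}(x) = \mathbb{P}_x(\tau_A < \tau_B)
 &= \frac{\mathbb{P}_x(\tau_A < \tau_{B\cup x})}{1 - \mathbb{P}_x(\tau_x < \tau_{A\cup B})} \\
 &= \frac{\mathbb{P}_x(\tau_A < \tau_{B\cup x})}{\mathbb{P}_x(\tau_{A\cup B} < \tau_x)}
 \le \frac{\mathbb{P}_x(\tau_A < \tau_x)}{\mathbb{P}_x(\tau_B < \tau_x)}
 = \frac{\mathrm{cap}(x,A)}{\mathrm{cap}(x,B)},
\end{aligned} \tag{8.2.2}
\]
where the second equality comes from counting the returns to $x$ without a hit of
$A$ or $B$. The lower bound follows from the upper bound via the symmetry relation
$h_{A,B}(x) = 1 - h_{B,A}(x)$.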
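
The renewal bounds in (8.2.1) can be checked on a small example. The sketch below computes $h_{A,B}(x)$ exactly and compares it with both bounds at every $x \notin A \cup B$. The toy chain, the discrete-time convention for return times and the helper names are assumptions made for illustration and are not part of the text.

```python
from fractions import Fraction as F

# Hypothetical example (not from the text): Metropolis chain on {0,...,4}.
w = [64, 8, 1, 8, 32]
n = len(w)
mu = [F(v, sum(w)) for v in w]
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = F(1, 2) * min(F(1), F(w[y], w[x]))
    P[x][x] = 1 - sum(P[x])


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        p = next(r for r in range(c, m) if T[r][c] != 0)
        T[c], T[p] = T[p], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for r in range(m):
            if r != c:
                f = T[r][c]
                T[r] = [vr - f * vc for vr, vc in zip(T[r], T[c])]
    return [T[i][m] for i in range(m)]


def potential(A, B):
    """h(x) = P_x(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    free = [x for x in range(n) if x not in A and x not in B]
    mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
    rhs = [sum((P[x][a] for a in A), F(0)) for x in free]
    h = [F(int(x in A)) for x in range(n)]
    for x, v in zip(free, solve(mat, rhs)):
        h[x] = v
    return h


def cap(x, A):
    """cap(x, A) = mu(x) P_x(tau_A < tau_x), with tau_x the return time."""
    h = potential(A, {x})
    return mu[x] * sum(P[x][y] * h[y] for y in range(n))


A, B = {0}, {4}
h = potential(A, B)
bounds = {}
for x in range(n):
    if x in A or x in B:
        continue
    lower = max(1 - cap(x, B) / cap(x, A), 0)
    upper = min(cap(x, A) / cap(x, B), 1)
    bounds[x] = (lower, h[x], upper)    # (8.2.1): lower <= h <= upper
    print(x, lower, h[x], upper)
```

Near the well at 0 the lower bound is almost sharp, while at the barrier both bounds are trivial; this is the regime addressed later in Remark 8.12.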

An important consequence of the renewal estimate in Lemma 8.4 is the approximate
ultrametricity of capacities.

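Lemma 8.5 Let $D \subset S$ and $x, y \in S\setminus D$. If $\mathrm{cap}(x,D) \le \delta\,\mathrm{cap}(x,y)$ for $0 < \delta < 1$,
then
\[
1 - \delta \le \frac{\mathrm{cap}(x,D)}{\mathrm{cap}(y,D)} \le \frac{1}{1-\delta}. \tag{8.2.3}
\]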

Proof By Lemma 7.13(ii), we have
\[
\frac{\mathrm{cap}(x,D)}{\mathrm{cap}(y,D)} = \frac{\mathbb{P}_y(\tau_x < \tau_D)}{\mathbb{P}_x(\tau_y < \tau_D)}. \tag{8.2.4}
\]
Trivially, the right-hand side can be sandwiched as
\[
1 - \mathbb{P}_y(\tau_D < \tau_x) \le \frac{\mathbb{P}_y(\tau_x < \tau_D)}{\mathbb{P}_x(\tau_y < \tau_D)} \le \frac{1}{1 - \mathbb{P}_x(\tau_D < \tau_y)}. \tag{8.2.5}
\]
But from the renewal bound in Lemma 8.4 we have
\[
\mathbb{P}_x(\tau_D < \tau_y) = h_{D,y}(x) \le \frac{\mathrm{cap}(x,D)}{\mathrm{cap}(x,y)} \le \delta. \tag{8.2.6}
\]
Substitution into the right-hand side of (8.2.5) yields the upper bound in (8.2.3). On
the other hand, by Lemma 7.13(iii) we also have
\[
\mathbb{P}_y(\tau_D < \tau_x) \le \frac{\mathrm{cap}(x,D)}{\mathrm{cap}(x,y)} \le \delta. \tag{8.2.7}
\]
Substitution into the left-hand side of (8.2.5) yields the lower bound in (8.2.3).

Lemma 8.5 has the following corollary, which is the version of the approximate
ultrametric triangle inequality we are looking for.

Corollary 8.6 (Approximate ultrametricity of capacities) For all distinct $x, y, z \in S$,
\[
\mathrm{cap}(x,y) \ge \tfrac{1}{2} \min\{\mathrm{cap}(x,z),\, \mathrm{cap}(y,z)\}. \tag{8.2.8}
\]

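Proof Suppose that the claim is false. Then there exist distinct $x, y, z \in S$ with
$\mathrm{cap}(x,y) < \tfrac12 \mathrm{cap}(x,z)$ and $\mathrm{cap}(x,y) < \tfrac12 \mathrm{cap}(y,z)$. Lemma 8.5 with $\delta = \tfrac12$ therefore
implies that
\[
\tfrac12 \le \frac{\mathrm{cap}(x,y)}{\mathrm{cap}(y,z)} \le 2, \qquad
\tfrac12 \le \frac{\mathrm{cap}(x,y)}{\mathrm{cap}(x,z)} \le 2, \tag{8.2.9}
\]
which yields a contradiction.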
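
Corollary 8.6 can also be checked by brute force on a small chain. The sketch below runs over all ordered triples of distinct points and records the smallest value of $\mathrm{cap}(x,y)/\min\{\mathrm{cap}(x,z),\mathrm{cap}(y,z)\}$. The toy chain, the discrete-time capacity convention and the helper names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction as F
from itertools import permutations

# Hypothetical example (not from the text): Metropolis chain on {0,...,4}.
w = [64, 8, 1, 8, 32]
n = len(w)
mu = [F(v, sum(w)) for v in w]
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = F(1, 2) * min(F(1), F(w[y], w[x]))
    P[x][x] = 1 - sum(P[x])


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        p = next(r for r in range(c, m) if T[r][c] != 0)
        T[c], T[p] = T[p], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for r in range(m):
            if r != c:
                f = T[r][c]
                T[r] = [vr - f * vc for vr, vc in zip(T[r], T[c])]
    return [T[i][m] for i in range(m)]


def cap(x, y):
    """cap(x, y) = mu(x) P_x(tau_y < tau_x) for single points x != y."""
    free = [z for z in range(n) if z not in (x, y)]
    mat = [[F(int(a == b)) - P[a][b] for b in free] for a in free]
    sol = dict(zip(free, solve(mat, [P[a][y] for a in free])))
    sol[y], sol[x] = F(1), F(0)         # boundary values of h_{y,x}
    return mu[x] * sum(P[x][z] * sol[z] for z in range(n))


# Smallest ratio cap(x,y) / min(cap(x,z), cap(y,z)) over distinct triples.
worst = min(cap(x, y) / min(cap(x, z), cap(y, z))
            for x, y, z in permutations(range(n), 3))
print(worst)
```

In this example the worst ratio is attained for the two wells with the barrier as third point, which suggests that the factor $\tfrac12$ in (8.2.8) cannot be improved in general.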


Fig. 8.2 Ultrametricity of valleys

It is useful to have the notion of a valley around a point in M , which will serve
as an attractor for the dynamics (see Fig. 8.2). For m ∈ M , let

\[
A(m) = \Bigl\{ z \in S : \mathbb{P}_z[\tau_m = \tau_{\mathcal{M}}] = \sup_{n\in\mathcal{M}} \mathbb{P}_z[\tau_n = \tau_{\mathcal{M}}] \Bigr\}. \tag{8.2.10}
\]

Note that valleys may overlap, but from Lemma 8.5 it follows that their intersection
has a negligible mass under the invariant distribution. The following estimate holds.

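Lemma 8.7 Let $m, n \in \mathcal{M}$ and $x \notin \{m, n\}$. If
\[
\mathbb{P}_x(\tau_m < \tau_x) \ge \varepsilon \quad\text{and}\quad \mathbb{P}_x(\tau_n < \tau_x) \ge \varepsilon, \tag{8.2.11}
\]
then
\[
\mu(x) \le 2\varepsilon^{-1}\,\mathrm{cap}(m,n). \tag{8.2.12}
\]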

Proof It follows from (7.1.18)–(7.1.20) and (7.1.39) that (8.2.11) implies $\mathrm{cap}(n,x) \ge \varepsilon\mu(x)$
and $\mathrm{cap}(m,x) \ge \varepsilon\mu(x)$. Hence
\[
\mathrm{cap}(m,n) \ge \tfrac12\,\varepsilon\mu(x) \tag{8.2.13}
\]
by Corollary 8.6.

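Corollary 8.8 Assume that $x \in S\setminus\mathcal{M}$ has the property
$\mathbb{P}_x(\tau_m = \tau_{\mathcal{M}}) = \mathbb{P}_x(\tau_n = \tau_{\mathcal{M}}) = \max_{\ell\in\mathcal{M}} \mathbb{P}_x(\tau_\ell = \tau_{\mathcal{M}})$. Then
\[
\mu(x) \le 2\rho \min\{\mu(m),\, \mu(n)\}, \tag{8.2.14}
\]
with $\rho$ from Definition 8.2.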

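Proof Since $\sum_{\ell\in\mathcal{M}} \mathbb{P}_x(\tau_\ell = \tau_{\mathcal{M}}) = 1$, we have
\[
\mathbb{P}_x(\tau_m = \tau_{\mathcal{M}}) \ge 1/|\mathcal{M}| \quad\text{and}\quad \mathbb{P}_x(\tau_n = \tau_{\mathcal{M}}) \ge 1/|\mathcal{M}|. \tag{8.2.15}
\]
Moreover, by the renewal estimate,
\[
\mathbb{P}_x(\tau_m = \tau_{\mathcal{M}}) \le \frac{\mathbb{P}_x(\tau_m < \tau_x)}{\mathbb{P}_x(\tau_{\mathcal{M}} < \tau_x)}, \tag{8.2.16}
\]
and similarly with $m$ replaced by $n$. Hence the hypotheses of Lemma 8.7 are satisfied
with $\varepsilon = \mathbb{P}_x(\tau_{\mathcal{M}} < \tau_x)/|\mathcal{M}|$, and so
\[
\mu(x) \le \frac{2|\mathcal{M}|\,\mathrm{cap}(m,n)}{\mathbb{P}_x(\tau_{\mathcal{M}} < \tau_x)} \le 2\rho \min\{\mu(m),\, \mu(n)\}, \tag{8.2.17}
\]
where the last inequality follows from Lemma 7.13(i) and Definition 8.2.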

In view of Corollary 8.8 we may modify the definition of the valleys A(m),
m ∈ M , by reassigning their overlaps in an arbitrary fashion so that they become
disjoint.
We will make frequent use of the following corollary as well.

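Corollary 8.9 Let $J \subset \mathcal{M}$, $m \in \mathcal{M}\setminus J$ and $y \in A(m)\setminus m$. Then either
\[
\tfrac12 \le \frac{\mathrm{cap}(m,J)}{\mathrm{cap}(y,J)} \le 2, \tag{8.2.18}
\]
or
\[
1 \le 2|\mathcal{M}|\,\frac{\mathrm{cap}(m,J)}{\mathrm{cap}(y,\mathcal{M})}. \tag{8.2.19}
\]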

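Proof By Lemma 8.5 with $D = J$ and $\delta = \tfrac12$, if $\mathrm{cap}(m,J) \le \tfrac12 \mathrm{cap}(m,y)$, then
(8.2.18) holds. To get (8.2.19), use that
\[
\mathrm{cap}(y,\mathcal{M}) \le \sum_{n\in\mathcal{M}} \mathrm{cap}(y,n) \le |\mathcal{M}| \max_{n\in\mathcal{M}} \mathrm{cap}(y,n). \tag{8.2.20}
\]
Since $y \in A(m)$, the maximum must be achieved for $n = m$, which gives
$\mathrm{cap}(y,\mathcal{M}) \le |\mathcal{M}|\,\mathrm{cap}(y,m)$. Combining this with $\mathrm{cap}(m,J) > \tfrac12 \mathrm{cap}(m,y)$, we get the
claim.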

8.3 Estimates on mean hitting times

A pleasant feature of the definition of metastability in terms of capacities is that it
allows us to exploit Corollary 7.11 and link mean hitting times to capacities in a
simple way. We derive rough bounds in Sect. 8.3.1 and subsequently sharpen them
in Sect. 8.3.2.

8.3.1 Rough bounds

We begin with an a priori bound on the mean hitting time of M .


Fig. 8.3 Choice of J ⊂ M and n ∈ M \J

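Lemma 8.10 For reversible Markov processes with $|S| < \infty$,
\[
\sup_{z\notin\mathcal{M}} \mathbb{E}_z[\tau_{\mathcal{M}}] \le |S| \sup_{y\notin\mathcal{M}} \frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}. \tag{8.3.1}
\]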

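Proof Recall from (7.1.13) that $\mathbb{E}_z[\tau_{\mathcal{M}}] = \sum_{y\in S\setminus\mathcal{M}} G_{\mathcal{M}}(z,y)$. Using the representation
in (7.1.23) for the Green function $G_{\mathcal{M}}(z,y)$, we get, for $z \notin \mathcal{M}$,
\[
\mathbb{E}_z[\tau_{\mathcal{M}}] = \sum_{y\in S\setminus\mathcal{M}} \frac{h_{y,\mathcal{M}}(z)}{e_{y,\mathcal{M}}(y)} \le |S| \sup_{y\in S\setminus\mathcal{M}} \frac{1}{e_{y,\mathcal{M}}(y)}, \tag{8.3.2}
\]
which yields (8.3.1) after we recall from (7.1.39) that $e_{y,\mathcal{M}}(y) = \mathrm{cap}(y,\mathcal{M})/\mu(y)$.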

In the proof we used the trivial bound hy,M (z) ≤ 1. This explains the remark
made after Definition 8.2: with additional work the term |S| can be replaced by the
cardinality of a smaller set of y’s where hy,M (z) is close to 1.

8.3.2 Sharp bounds

We next turn to the computation of mean hitting times from a point $n \in \mathcal{M}$ to some
subset $J \subset \mathcal{M}$ (see Fig. 8.3). Return to (8.1.6). Decompose the sum in the right-hand
side as
\[
\sum_{y\in S} \mu(y)\,h_{n,J}(y) = \sum_{m\in\mathcal{M}} \mu(m)\,W_{n,J}(m), \qquad
W_{n,J}(m) = \sum_{y\in A(m)} \frac{\mu(y)}{\mu(m)}\,h_{n,J}(y), \tag{8.3.3}
\]
where $A(m)$ is the set defined in (8.2.10), modified so that $\cup_{m\in\mathcal{M}} A(m)$ becomes a
disjoint union (recall the remarks made below (8.2.10) and (8.2.17)). Lemmas 8.11
and 8.13 below provide technical estimates of the quantities in (8.3.3). After the
statement and the proof of these lemmas we will explain how these estimates must
be read, and in what regimes they reduce to simpler estimates.

The first technical lemma gives bounds on $h_{n,J}(y)$ and $\mu(y)/\mu(m)$ in the different
sets $A(m)$, $m \in \mathcal{M}$. Abbreviate
\[
a = a(m) = \inf_{y\in A(m)} \bigl[\mathrm{cap}(y,\mathcal{M})/\mu(y)\bigr]. \tag{8.3.4}
\]

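Lemma 8.11 Let $J \subset \mathcal{M}$ and $n \in \mathcal{M}\setminus J$.

(i) If $m = n$, then $h_{m,J}(m) = 1$, and for $y \in A(m)\setminus m$ either
\[
h_{n,J}(y) \ge 1 - 2|\mathcal{M}|\,\frac{\mathrm{cap}(m,J)}{\mathrm{cap}(y,\mathcal{M})} \tag{8.3.5}
\]
or
\[
\mu(y) \le 2|\mathcal{M}|\,\frac{1}{a}\,\mathrm{cap}(m,J). \tag{8.3.6}
\]
(ii) If $m \in J$, then $h_{n,J}(m) = 0$, and for $y \in A(m)\setminus m$ either
\[
\mu(y)\,h_{n,J}(y) \le 2|\mathcal{M}|\,\frac{1}{a}\,\mathrm{cap}(m,n) \tag{8.3.7}
\]
or
\[
\mu(y) \le 2|\mathcal{M}|\,\frac{1}{a}\,\mathrm{cap}(m,n). \tag{8.3.8}
\]
(iii) If $m \notin J \cup n$, then for $y \in A(m)$ either
\[
1 - 4\,\frac{\mathrm{cap}(m,J)}{\mathrm{cap}(m,n)} \le h_{n,J}(y) \le 4\,\frac{\mathrm{cap}(m,n)}{\mathrm{cap}(m,J)} \tag{8.3.9}
\]
or
\[
\mu(y) \le 2|\mathcal{M}|\,\frac{1}{a}\,\max\{\mathrm{cap}(m,J),\, \mathrm{cap}(m,n)\}. \tag{8.3.10}
\]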
Proof The values of $h_{n,J}(y)$ for $y \in J \cup n$ are trivial from the definition of the
equilibrium potential. By Lemma 8.4, for $J \subset \mathcal{M}$, $n \in \mathcal{M}$ and $y \notin \mathcal{M}$,
\[
1 - \frac{\mathrm{cap}(y,J)}{\mathrm{cap}(y,n)} \le h_{n,J}(y) \le \frac{\mathrm{cap}(y,n)}{\mathrm{cap}(y,J)}. \tag{8.3.11}
\]
(i) To get the first assertion, use Corollary 8.9. In the first case, this yields
\[
h_{m,J}(y) \ge 1 - 2\,\frac{\mathrm{cap}(m,J)}{\mathrm{cap}(y,m)} \ge 1 - 2|\mathcal{M}|\,\frac{\mathrm{cap}(m,J)}{\mathrm{cap}(y,\mathcal{M})}, \tag{8.3.12}
\]
where we use (8.2.20). In the second case, we get (8.3.6) via the definition of $a$.
(ii) Use the upper bound in (8.3.11) to get
\[
h_{n,J}(y) \le \frac{\mathrm{cap}(y,n)}{\mathrm{cap}(y,J)}, \tag{8.3.13}
\]
and use Corollary 8.9 with $J = n$. In the first case, $\mathrm{cap}(n,y) \le 2\,\mathrm{cap}(m,n)$, and
hence
\[
h_{n,J}(y) \le 2\,\frac{\mathrm{cap}(n,m)}{\mathrm{cap}(y,J)}. \tag{8.3.14}
\]
From here (8.3.7) follows as in (i). In the second case, (8.3.8) is again straightforward.
(iii) Write the two renewal bounds from Lemma 8.4,
\[
1 - \frac{\mathrm{cap}(y,J)}{\mathrm{cap}(y,n)} \le h_{n,J}(y) \le \frac{\mathrm{cap}(y,n)}{\mathrm{cap}(y,J)}, \tag{8.3.15}
\]
and again use Corollary 8.9. If (8.2.18) holds both for $J = J$ and $J = n$, then we
can replace $y$ by $m$ in the numerators and denominators of (8.3.15) at the cost of a
factor 4 to get (8.3.9). If, on the other hand, (8.2.19) holds, then we get (8.3.10) just
as in the previous cases.

Remark 8.12 Case (iii) is special in as much as it does not give sharp estimates
when cap(m, J ) ≈ cap(m, n). If this situation occurs and the corresponding terms
contribute to leading order, then we cannot get sharp estimates with the tools ex-
ploited above, and better estimates on the equilibrium potential are needed.

The second technical lemma uses the estimates in Lemma 8.11 to obtain esti-
mates of Wn,J (m) in (8.3.3).

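Lemma 8.13 Let $J \subset \mathcal{M}$ and $n \in \mathcal{M}\setminus J$.

(i) If $m = n$, then
\[
W_{n,J}(n) \le \frac{\mu(A(n))}{\mu(n)} \tag{8.3.16}
\]
and
\[
W_{n,J}(n) \ge \frac{\mu(A(n))}{\mu(n)} \left[ 1 - 4|\mathcal{M}|\,|A(n)|\,\frac{\mathrm{cap}(n,J)}{\mu(A(n))}\,\frac{1}{a} \right]. \tag{8.3.17}
\]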

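(ii) If $m \in J$, then
\[
W_{n,J}(m) \le C|\mathcal{M}|\,\frac{1}{a}\,\frac{\mathrm{cap}(m,n)}{\mu(m)}\,
\left| \left\{ y \in A(m) : \mu(y) \ge |\mathcal{M}|\,\frac{1}{a}\,\frac{\mathrm{cap}(m,n)}{\mu(n)} \right\} \right| \tag{8.3.18}
\]
for some $C \in (0,\infty)$ independent of $\rho$ in Definition 8.2.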
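(iii) If $m \notin J \cup n$, then
\[
W_{n,J}(m) \le \frac{\mu(A(m))}{\mu(m)}. \tag{8.3.19}
\]
Moreover:
(iii1) If $\mathrm{cap}(m,J) \le \tfrac12 \mathrm{cap}(m,n)$, then
\[
W_{n,J}(m) \ge \left( 1 - 4\,\frac{\mathrm{cap}(m,J)}{\mathrm{cap}(m,n)} \right) \frac{\mu(A(m))}{\mu(m)} - C|\mathcal{M}|\,\frac{1}{a}\,\frac{\mathrm{cap}(m,J)}{\mu(m)}. \tag{8.3.20}
\]
(iii2) If $\mathrm{cap}(m,J) \ge \tfrac12 \mathrm{cap}(m,n)$, then
\[
W_{n,J}(m) \le \frac{\mu(A(m))}{\mu(m)}\,\frac{\mathrm{cap}(m,n)}{\mathrm{cap}(m,J)} + 2|\mathcal{M}|\,|A(m)|\,\frac{1}{a}\,\frac{\mathrm{cap}(m,J)}{\mu(m)}. \tag{8.3.21}
\]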

Proof The proof consists of just inserting the bounds from Lemma 8.11.

Lemma 8.13 looks complicated. Ignoring small terms, we see that the statement
boils down to the following:
(i) The starting valley always contributes
\[
W_{n,J}(n) \approx \frac{\mu(A(n))}{\mu(n)}, \tag{8.3.22}
\]
which gives a contribution of $\mu(A(n))/\mathrm{cap}(n,J)$ to the mean hitting time.
(ii) For $m \in J$,
\[
W_{n,J}(m) \lesssim \frac{\mathrm{cap}(m,n)}{\mu(m)}\,\frac{1}{a}\,|S|. \tag{8.3.23}
\]
This gives a contribution to the mean hitting time of order at most $|S|/a$, which
by assumption is small compared to that coming from (i).
(iii) For $m \notin J \cup n$,
\[
W_{n,J}(m) \lesssim \frac{\mu(A(m))}{\mu(m)}. \tag{8.3.24}
\]
(iii1) This bound is achieved when $\mathrm{cap}(m,J) \ll \mathrm{cap}(m,n)$. In this case the
contribution to the hitting time is $\mu(A(m))/\mathrm{cap}(n,J)$, which is small compared to
the one from (i) only if $\mu(m) \ll \mu(n)$.
(iii2) If $\mathrm{cap}(m,J) \gg \mathrm{cap}(m,n)$, then the bound can be improved to
\[
W_{n,J}(m) \le \frac{\mu(A(m))}{\mu(m)}\,\frac{\mathrm{cap}(m,n)}{\mathrm{cap}(m,J)} + \frac{2|\mathcal{M}|\,|A(m)|}{a}\,\frac{\mathrm{cap}(m,J)}{\mu(m)}. \tag{8.3.25}
\]
The second term is always harmless, while the first can contribute more
to the mean hitting time than the one from (i), unless
$\mu(m)/\mathrm{cap}(m,J) \ll \mu(n)/\mathrm{cap}(m,n)$.

Remark 8.14 The above arguments use that quantities like $\mu(m)/\mu(A(m))$ are not
too small, i.e., the most massive points in a metastable set have a fairly large mass
(compared to, say, $\rho$ in Definition 8.2). The most restrictive contribution comes
from case (iii), which is small only when $\rho|S|$ is small. Physically speaking, the
latter avoids the situation where the time it takes for the dynamics to hit a target
point in $J$ after crossing the respective saddle is much longer than the time it takes
to escape from the starting well.
Taking into account that Wn,J (m) appears with the prefactor μ(m)/μ(n) in
(8.3.3), we see that contributions from case (ii) are always subdominant. In particu-
lar, when J = M \n, the term m = n always gives the main contribution. The terms
from case (iii) have a chance to contribute only when μ(m) ≥ μ(n). In subcase
(iii1) they indeed contribute, and potentially dominate the sum, while in subcase
(iii2) they may or may not contribute.

The estimates obtained in Lemma 8.13 can now be inserted into the sum in (8.3.3)
and then into (8.1.6), to provide estimates on the mean hitting times En [τJ ]. We state
the outcome in the special case when only the term involving the starting minimum
contributes.

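Theorem 8.15 (Mean metastable exit time) Let $n \in \mathcal{M}$ and $J \subset \mathcal{M}\setminus n$ be such that
for all $m \notin J \cup n$, $\mu(m) \ll \mu(n)$ or $\mathrm{cap}(m,J)/\mu(m) \gg \mathrm{cap}(m,n)/\mu(n)$. Then
\[
\mathbb{E}_n[\tau_J] = \frac{\mu(A(n))}{\mathrm{cap}(n,J)}\,[1 + \mathrm{error}], \qquad 0 < \mathrm{error} \ll 1. \tag{8.3.26}
\]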
cap(n, J )

Proof The proof is straightforward from (8.3.3) and Lemmas 8.11 and 8.13. See the
discussion before Remark 8.14.
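
To see how (8.3.26) behaves in practice, the sketch below compares the exact mean hitting time with $\mu(A(n))/\mathrm{cap}(n,J)$ for a toy chain with two wells. The chain, the tie-breaking rule for the valleys (ties go to the first point of `M`, one instance of the arbitrary reassignment allowed after Corollary 8.8) and all helper names are assumptions made for illustration; discrete time is assumed throughout.

```python
from fractions import Fraction as F

# Hypothetical example (not from the text): Metropolis chain on {0,...,4}.
w = [64, 8, 1, 8, 32]
n = len(w)
mu = [F(v, sum(w)) for v in w]
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = F(1, 2) * min(F(1), F(w[y], w[x]))
    P[x][x] = 1 - sum(P[x])


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        p = next(r for r in range(c, m) if T[r][c] != 0)
        T[c], T[p] = T[p], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for r in range(m):
            if r != c:
                f = T[r][c]
                T[r] = [vr - f * vc for vr, vc in zip(T[r], T[c])]
    return [T[i][m] for i in range(m)]


def potential(A, B):
    """h(x) = P_x(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    free = [x for x in range(n) if x not in A and x not in B]
    mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
    rhs = [sum((P[x][a] for a in A), F(0)) for x in free]
    h = [F(int(x in A)) for x in range(n)]
    for x, v in zip(free, solve(mat, rhs)):
        h[x] = v
    return h


M = [0, 4]
# Valleys as in (8.2.10); a tie is assigned to the first maximiser in M.
valley = {m: [] for m in M}
for z in range(n):
    probs = [potential({m}, set(M) - {m})[z] for m in M]
    valley[M[probs.index(max(probs))]].append(z)

start, J = 4, {0}
# Exact mean hitting time from (1 - P) u = 1 off J.
free = [x for x in range(n) if x not in J]
mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
exact = dict(zip(free, solve(mat, [F(1)] * len(free))))[start]
# Leading term of (8.3.26), with cap(n, J) = mu(n) P_n(tau_J < tau_n).
hJ = potential(J, {start})
cap_nJ = mu[start] * sum(P[start][y] * hJ[y] for y in range(n))
approx = sum(mu[z] for z in valley[start]) / cap_nJ
print(exact, approx, float(exact / approx))
```

For these weights the relative error is of the order of one percent. Assigning the tied barrier point to the other valley changes the leading term slightly, consistent with Corollary 8.8, which says that such overlap points carry little mass.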

Note that the theorem covers in particular the case $J = \mathcal{M}_n$, where
\[
\mathcal{M}_n = \{ m \in \mathcal{M} : \mu(m) \ge \mu(n) \}. \tag{8.3.27}
\]

We call En [τMn ] the mean metastable exit time from the metastable point n. This
quantity plays an important rôle in Sect. 8.4 as well.

We will see in Parts IV–VIII that (8.3.26) is the key formula for the computation
of mean crossover times in metastable systems.

8.4 Spectral characterisation of metastability

We now turn to the characterisation of metastability through spectral data. We will
show that Definition 8.2 implies that the spectrum of the generator decomposes
into a cluster of $|\mathcal{M}|$ small real eigenvalues that are separated from the rest of the
spectrum by a gap (see Fig. 8.4). The associated eigenfunctions each live essentially
in their own valley (see Fig. 8.5).

Fig. 8.4 Schematic picture of eigenvalues and Dirichlet eigenvalues when $|\mathcal{M}| = k$, $k \in \mathbb{N}\setminus\{1\}$.
The eigenvalues $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_{k-1} < \lambda_k$ are indicated by dots, the Dirichlet eigenvalues
$\lambda_0^{\mathcal{M}_1} < \cdots < \lambda_0^{\mathcal{M}_{k-2}} < \lambda_0^{\mathcal{M}_{k-1}} < \lambda_0^{\mathcal{M}_k}$ are indicated by stars. The latter correspond to a nested
sequence of subsets of $\mathcal{M}$, namely, $\mathcal{M}_\ell = \{x_1, \ldots, x_\ell\}$, $\ell = 1, \ldots, k$, with $\mathcal{M}_k = \mathcal{M}$, ordered according
to the depths of the valleys (see (8.4.61) below). The distance between the two spectra is
much smaller than the gaps within each of the two spectra

Fig. 8.5 Each eigenfunction essentially lives in its own valley

For the generator of a Markov process with countable state space, the eigenvalues
of −L are those values of λ for which

(−L − λ)ψλ (x) = 0, x ∈ S, (8.4.1)

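has a solution $\psi_\lambda \in \mathbb{R}^S$, called the eigenvector. For a subset $I \subset S$, we denote by $\mathcal{L}^I$
the Dirichlet operator with Dirichlet boundary conditions on $I$. The eigenvalues of
$-\mathcal{L}^I$ are denoted by $\lambda^I$ and are those values of $\lambda$ for which
\[
\begin{aligned}
(-\mathcal{L} - \lambda)\psi_\lambda(x) &= 0, & x &\in S\setminus I, \\
\psi_\lambda(x) &= 0, & x &\in I,
\end{aligned} \tag{8.4.2}
\]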

has a solution. The smallest eigenvalue is called the principal eigenvalue. The comparison
of eigenvalues and Dirichlet eigenvalues will be an important tool in the
analysis of the spectra of metastable Markov processes.
In Sect. 8.4.1 we derive rough bounds on the smallest eigenvalue $\lambda_0^{\mathcal{M}}$ of $-\mathcal{L}^{\mathcal{M}}$.
In Sect. 8.4.2 we characterise the eigenvalues smaller than $\lambda_0^{\mathcal{M}}$, in Sect. 8.4.3 we find
the asymptotics of these eigenvalues, while in Sect. 8.4.4 we use this asymptotics to
prove the exponential limit law of metastable exit times.

8.4.1 A priori bounds

The first step consists in getting a rough bound on the principal eigenvalue of $-\mathcal{L}^{\mathcal{M}}$,
with $\mathcal{M}$ the set of metastable points. This uses an important tool that is due to
Donsker and Varadhan [94].

Lemma 8.16 (Lower bound on principal Dirichlet eigenvalues) Let $I \subset S$, and let
$\lambda_0^I$ be the smallest eigenvalue of $-\mathcal{L}^I$. Then
\[
\lambda_0^I \ge \frac{1}{\sup_{z\in S} \mathbb{E}_z[\tau_I]}. \tag{8.4.3}
\]

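Proof For any $\phi \in \mathbb{R}^S$, any $C > 0$ and any $x, y \in S$, we have
\[
\phi(x)\phi(y) \le \tfrac12\,C\phi(x)^2 + \tfrac12\,\frac{1}{C}\,\phi(y)^2. \tag{8.4.4}
\]
Let $w \in \mathbb{R}^S$ be such that $w(x) > 0$ whenever $\phi(x) \neq 0$. Using (8.4.4) with
$C = w(y)/w(x)$ within the Dirichlet form, we get
\[
\mathcal{E}(\phi,\phi) \ge \sum_{x\in S} \mu(x)\,\phi(x)^2\,\frac{(-\mathcal{L}w)(x)}{w(x)}. \tag{8.4.5}
\]
Let $w(x) = \mathbb{E}_x[\tau_I]$, $x \in S\setminus I$, and let $\phi$ be an eigenvector of $-\mathcal{L}^I$ with eigenvalue $\lambda$.
Recalling that $w$ solves the Dirichlet problem in (7.1.8), we get
\[
\lambda \|\phi\|_{2,\mu}^2 \ge \sum_{x\in S\setminus I} \mu(x)\,\phi(x)^2\,\frac{1}{w(x)} \ge \frac{1}{\sup_{x\in S\setminus I} w(x)}\,\|\phi\|_{2,\mu}^2. \tag{8.4.6}
\]
Since this holds for all eigenvalues of $-\mathcal{L}^I$, it implies the assertion in (8.4.3).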
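
The bound (8.4.3) is easy to test numerically. The sketch below takes a toy discrete-time chain, for which we assume $-\mathcal{L} = 1 - P$, restricts $P$ to $S\setminus I$, obtains the principal Dirichlet eigenvalue by power iteration, and compares it with the reciprocal of the largest mean hitting time. The chain, the choice $I = \{0,4\}$ and the helper names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction as F

# Hypothetical example (not from the text): Metropolis chain on {0,...,4}.
w = [64, 8, 1, 8, 32]
n = len(w)
P = [[F(0)] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        if 0 <= y < n:
            P[x][y] = F(1, 2) * min(F(1), F(w[y], w[x]))
    P[x][x] = 1 - sum(P[x])


def solve(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination."""
    m = len(A)
    T = [A[i][:] + [b[i]] for i in range(m)]
    for c in range(m):
        p = next(r for r in range(c, m) if T[r][c] != 0)
        T[c], T[p] = T[p], T[c]
        piv = T[c][c]
        T[c] = [v / piv for v in T[c]]
        for r in range(m):
            if r != c:
                f = T[r][c]
                T[r] = [vr - f * vc for vr, vc in zip(T[r], T[c])]
    return [T[i][m] for i in range(m)]


I = {0, 4}                                   # Dirichlet boundary set
free = [x for x in range(n) if x not in I]

# Mean hitting times E_x[tau_I], x outside I, from (1 - P) u = 1.
mat = [[F(int(x == y)) - P[x][y] for y in free] for x in free]
hit = solve(mat, [F(1)] * len(free))
bound = 1 / float(max(hit))                  # right-hand side of (8.4.3)

# Principal eigenvalue of 1 - Q, with Q the restriction of P to S \ I:
# power iteration gives the largest eigenvalue theta of Q.
Q = [[float(P[x][y]) for y in free] for x in free]
v = [1.0] * len(free)
theta = 0.0
for _ in range(200):
    u = [sum(q * vj for q, vj in zip(row, v)) for row in Q]
    theta = max(u)                           # v is normalised to max(v) = 1
    v = [ui / theta for ui in u]
lam0 = 1.0 - theta
print(lam0, bound)
```

Here the eigenvalue exceeds the bound by a moderate factor, in line with the remark following the lemma that the estimate is not very precise.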

Lemma 8.16 links the time scale of the metastable dynamics to the smallest
eigenvalue of the Dirichlet operator in a way that is intuitively plausible. The es-
timate sometimes needs improvement, but at least it shows the basic twist. Note that
the bound is not very precise. We later derive a more precise relation for the cluster
of |M | small real eigenvalues alluded to above.
Combining Lemma 8.16 with Lemma 8.10, we obtain the following.

Corollary 8.17 (Lower bound on principal Dirichlet eigenvalues) For a reversible
Markov process with a finite state space $S$ and a set of metastable points $\mathcal{M}$,
\[
\lambda_0^{\mathcal{M}} \ge \inf_{y\notin\mathcal{M}} \frac{\mathrm{cap}(y,\mathcal{M})}{3\,|\mathcal{M}|\,|S|\,\mu(y)}. \tag{8.4.7}
\]

8.4.2 Characterisation of small eigenvalues

We next obtain a representation formula for the eigenvalues that are smaller than
$\lambda_0^{\mathcal{M}}$. We show that there are precisely $|\mathcal{M}|$ such eigenvalues. The idea is to use the
fact that the solution of the Dirichlet problem
\[
\begin{aligned}
(-\mathcal{L}f)(x) - \lambda f(x) &= 0, & x &\in \mathcal{M}^c, \\
f(x) &= \phi_x, & x &\in \mathcal{M},
\end{aligned} \tag{8.4.8}
\]
already solves the eigenvalue equation $-\mathcal{L}f = \lambda f$ on $\mathcal{M}^c$. The question is whether
an appropriate choice of boundary conditions and of $\lambda$ leads to a solution. The following
observation is elementary.

Lemma 8.18 Assume that $\lambda < \lambda_0^{\mathcal{M}}$ is an eigenvalue of $-\mathcal{L}$ and that $\phi$ is the corresponding
eigenfunction. Then the unique solution of (8.4.8) with $\phi_x = \phi(x)$,
$x \in \mathcal{M}$, satisfies $f(y) = \phi(y)$ for all $y \in S$.

Proof Inserting $f = \phi$ into (8.4.8), we see that the first line in (8.4.8) is satisfied
because $\phi$ is an eigenfunction with eigenvalue $\lambda$. The second line holds by assumption.

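For any $\lambda < \lambda_0^{\mathcal{M}}$, the boundary value problem in (8.4.8) has a unique solution for
any choice of boundary condition $\phi$. Denote by $h^\lambda_{x,\mathcal{M}\setminus x} = h^\lambda_x$, $x \in \mathcal{M}$, the solutions
for the special case
\[
\begin{aligned}
(-\mathcal{L} - \lambda)h^\lambda_x(y) &= 0, & y &\in \mathcal{M}^c, \\
h^\lambda_x(x) &= 1, \\
h^\lambda_x(y) &= 0, & y &\in \mathcal{M}\setminus x.
\end{aligned} \tag{8.4.9}
\]
Then the unique solution of (8.4.8) can be represented as
\[
f(y) = \sum_{x\in\mathcal{M}} \phi_x\,h^\lambda_x(y), \qquad y \in S. \tag{8.4.10}
\]
Asking for $f$ to be an eigenfunction therefore amounts to imposing the following
$|\mathcal{M}|$ additional equations:
\[
\sum_{x\in\mathcal{M}} \bigl((-\mathcal{L} - \lambda)h^\lambda_x\bigr)(y)\,\phi_x = \sum_{x\in\mathcal{M}} e^\lambda_{x,\mathcal{M}\setminus x}(y)\,\phi_x = 0, \qquad y \in \mathcal{M}. \tag{8.4.11}
\]
Here, $e^\lambda_{x,\mathcal{M}\setminus x}(y) = ((-\mathcal{L} - \lambda)h^\lambda_{x,\mathcal{M}\setminus x})(y)$ is the $\lambda$-analogue of the equilibrium measure.
Thus, denoting by $\mathcal{E}_{\mathcal{M}}(\lambda)$ the $(|\mathcal{M}| \times |\mathcal{M}|)$-matrix with elements
\[
\bigl(\mathcal{E}_{\mathcal{M}}(\lambda)\bigr)_{xy} = e^\lambda_{x,\mathcal{M}\setminus x}(y), \qquad x, y \in \mathcal{M}, \tag{8.4.12}
\]
we get the following lemma.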

Lemma 8.19 A number $\lambda < \lambda_0^{\mathcal{M}}$ is an eigenvalue of the matrix $-\mathcal{L}$ if and only if
\[
\det \mathcal{E}_{\mathcal{M}}(\lambda) = 0. \tag{8.4.13}
\]

Remark 8.20 Equation (8.4.13) can be seen as a non-linear generalisation of the
characteristic equation for eigenvalues. It in fact coincides with that equation if we
replace $\mathcal{M}$ by $S$.

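Anticipating that we are interested in small $\lambda$, we want to rewrite the matrix
$\mathcal{E}_{\mathcal{M}}(\lambda)$ in a more convenient form. In order to do so, let us set
\[
h^\lambda_x(y) = h_x(y) + \psi^\lambda_x(y), \tag{8.4.14}
\]
where $h_x = h^0_x$ is the equilibrium potential $h_x = h_{x,\mathcal{M}\setminus x}$. Since $h_y(y) = 1$ for
$y \in \mathcal{M}$, we have
\[
\bigl((-\mathcal{L} - \lambda)h^\lambda_x\bigr)(y) = h_y(y)\,\bigl((-\mathcal{L} - \lambda)h^\lambda_x\bigr)(y), \qquad x, y \in \mathcal{M}, \tag{8.4.15}
\]
and since $h^\lambda_y(z) = 0$ for $z \neq y \in \mathcal{M}$, we have
\[
h_y(z)\,\bigl((-\mathcal{L} - \lambda)h^\lambda_x\bigr)(z) = 0, \qquad z \neq y. \tag{8.4.16}
\]
Hence, we get that
\[
\begin{aligned}
e^\lambda_{x,\mathcal{M}\setminus x}(y) &= \bigl((-\mathcal{L} - \lambda)h^\lambda_x\bigr)(y) \\
 &= \frac{1}{\mu(y)} \sum_{z\in S} \mu(z)\,h_y(z)\,\bigl((-\mathcal{L} - \lambda)h^\lambda_x\bigr)(z) \\
 &= \frac{1}{\mu(y)} \sum_{z\in S} \mu(z)\,h_y(z)\,\bigl((-\mathcal{L} - \lambda)h_x\bigr)(z) \\
 &\quad + \frac{1}{\mu(y)} \sum_{z\in S} \mu(z)\,h_y(z)\,\bigl((-\mathcal{L} - \lambda)\psi^\lambda_x\bigr)(z).
\end{aligned} \tag{8.4.17}
\]
The first sum equals
\[
\sum_{z\in S} \mu(z)\,h_y(z)\,\bigl((-\mathcal{L} - \lambda)h_x\bigr)(z) = \mathcal{E}(h_y,h_x) - \lambda\,(h_y,h_x)_\mu. \tag{8.4.18}
\]
The matrix with elements $\mathcal{E}(h_x,h_y)$ is referred to as the capacity matrix. It will turn
out that the small eigenvalues of the generator are very close to the eigenvalues of
the capacity matrix.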

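For the second sum, we use the symmetry of $\mathcal{L}$, plus the fact that $\psi^\lambda_x$ vanishes on
$\mathcal{M}$ and $-\mathcal{L}h_y$ vanishes on $\mathcal{M}^c$, to write it as
\[
\begin{aligned}
\sum_{z\in S} \mu(z)\,h_y(z)\,\bigl((-\mathcal{L} - \lambda)\psi^\lambda_x\bigr)(z)
 &= -\lambda\,(h_y,\psi^\lambda_x)_\mu + \sum_{z\in S} \mu(z)\,(-\mathcal{L}h_y)(z)\,\psi^\lambda_x(z) \\
 &= -\lambda\,(h_y,\psi^\lambda_x)_\mu.
\end{aligned} \tag{8.4.19}
\]
Hence $e^\lambda_{x,\mathcal{M}\setminus x}(y)$ can be written in the form
\[
e^\lambda_{x,\mathcal{M}\setminus x}(y) = \frac{1}{\mu(y)} \Bigl[ \mathcal{E}(h_y,h_x) - \lambda\,(h_y,h_x)_\mu - \lambda\,(h_y,\psi^\lambda_x)_\mu \Bigr]. \tag{8.4.20}
\]
The term involving $\psi^\lambda$ is of order $\lambda^2$, and hence is a small perturbation. This is
implied by the following lemma.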

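Lemma 8.21 ($\ell^2$-bounds) Let $\lambda_0^{\mathcal{M}}$ denote the principal eigenvalue of the operator
$-\mathcal{L}$ with Dirichlet boundary conditions in $\mathcal{M}$.
(i) If $\lambda < \lambda_0^{\mathcal{M}}$, then for all $x \in \mathcal{M}$,
\[
\|\psi^\lambda_x\|_{2,\mu} \le \frac{\lambda}{\lambda_0^{\mathcal{M}} - \lambda}\,\|h_x\|_{2,\mu}. \tag{8.4.21}
\]
(ii) For all $x, y \in \mathcal{M}$,
\[
\bigl|(h_y,\psi^\lambda_x)_\mu\bigr| \le \frac{\lambda}{\lambda_0^{\mathcal{M}} - \lambda}\,\|h_x\|_{2,\mu}\,\|h_y\|_{2,\mu}. \tag{8.4.22}
\]
(iii) For all $\lambda, \lambda' < \lambda_0^{\mathcal{M}}$ and $x \in \mathcal{M}$,
\[
\|\psi^\lambda_x - \psi^{\lambda'}_x\|_{2,\mu} \le \frac{|\lambda - \lambda'|}{\lambda_0^{\mathcal{M}} - \lambda}\,\|h^{\lambda'}_x\|_{2,\mu}. \tag{8.4.23}
\]
(iv) For all $x, y \in \mathcal{M}$,
\[
\bigl|(\psi^\lambda_x - \psi^{\lambda'}_x, h_y)_\mu\bigr| \le \frac{|\lambda - \lambda'|}{\lambda_0^{\mathcal{M}} - \lambda}\,\|h^{\lambda'}_x\|_{2,\mu}\,\|h_y\|_{2,\mu}. \tag{8.4.24}
\]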

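Proof It is readily verified that $\psi^\lambda_x$ solves the Dirichlet problem
\[
\begin{aligned}
(-\mathcal{L} - \lambda)\psi^\lambda_x(y) &= \lambda h_x(y), & y &\in \mathcal{M}^c, \\
\psi^\lambda_x(y) &= 0, & y &\in \mathcal{M}.
\end{aligned} \tag{8.4.25}
\]
But the Dirichlet operator $-\mathcal{L}^{\mathcal{M}} - \lambda$ is invertible for $\lambda < \lambda_0^{\mathcal{M}}$, and the norm of its inverse as an
operator on $\ell^2(S,\mu)$ is bounded by $1/(\lambda_0^{\mathcal{M}} - \lambda)$. Hence
\[
\|\psi^\lambda_x\|_{2,\mu} \le \frac{\lambda}{\lambda_0^{\mathcal{M}} - \lambda}\,\|h_x\|_{2,\mu}, \tag{8.4.26}
\]
which proves (i). The assertion in (ii) follows from (i) together with the Cauchy-Schwarz
inequality. Finally,
\[
-\mathcal{L}\bigl(\psi^\lambda_x - \psi^{\lambda'}_x\bigr)(z) = (\lambda - \lambda')\,h^{\lambda'}_x(z) + \lambda\,\bigl(\psi^\lambda_x - \psi^{\lambda'}_x\bigr)(z). \tag{8.4.27}
\]
Hence
\[
(-\mathcal{L} - \lambda)\bigl(\psi^\lambda_x - \psi^{\lambda'}_x\bigr)(z) = (\lambda - \lambda')\,h^{\lambda'}_x(z), \tag{8.4.28}
\]
and so (8.4.23) follows in the same way as (8.4.21). Assertion (iv) follows again via
the Cauchy-Schwarz inequality.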

The $\ell^2$-bounds in Lemma 8.21 can be improved to $\ell^\infty$-bounds.

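Lemma 8.22 ($\ell^\infty$-bounds) With the notation of Lemma 8.21, the following bounds
hold.
(i) For all $\lambda < \lambda_0^{\mathcal{M}}$ and $x \in \mathcal{M}$,
\[
\left\| \frac{\psi^\lambda_x}{h_{x,\mathcal{M}}} \right\|_\infty \le
\frac{\lambda |S| \sup_{y\notin\mathcal{M}} \frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}
     {1 - \lambda |S| \sup_{y\notin\mathcal{M}} \frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}. \tag{8.4.29}
\]
(ii) For all $\lambda, \lambda' < \lambda_0^{\mathcal{M}}$ and $x \in \mathcal{M}$,
\[
\left\| \frac{\psi^\lambda_x - \psi^{\lambda'}_x}{h_{x,\mathcal{M}}} \right\|_\infty \le
\frac{|\lambda - \lambda'|\,|S| \sup_{y\notin\mathcal{M}} \frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}
     {1 - |\lambda - \lambda'|\,|S| \sup_{y\notin\mathcal{M}} \frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}. \tag{8.4.30}
\]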

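Proof Note that $\psi_x^\lambda$ satisfies

\[
\begin{aligned}
-L\psi_x^\lambda(y) &= \lambda h_x^\lambda(y), & y&\notin\mathcal{M},\\
\psi_x^\lambda(z) &= 0, & z&\in\mathcal{M}.
\end{aligned} \tag{8.4.31}
\]

Thus, with $G_{\mathcal{M}}$ the Green function with Dirichlet boundary conditions in $\mathcal{M}$, $\psi_x^\lambda$ satisfies

\[
\psi_x^\lambda(y) = \lambda\sum_{a\notin\mathcal{M}} G_{\mathcal{M}}(y,a)\,h_x^\lambda(a). \tag{8.4.32}
\]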
8.4 Spectral characterisation of metastability 207

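Dividing by $h_{x,\mathcal{M}\setminus x}(y)$, we get

\[
\begin{aligned}
\frac{\psi_x^\lambda(y)}{h_{x,\mathcal{M}\setminus x}(y)}
&= \lambda\sum_{a\notin\mathcal{M}} \frac{1}{h_{x,\mathcal{M}\setminus x}(y)}\,G_{\mathcal{M}}(y,a)\,h_{x,\mathcal{M}\setminus x}(a)\,\frac{h_x^\lambda(a)}{h_{x,\mathcal{M}\setminus x}(a)}\\
&= \lambda\sum_{a\notin\mathcal{M}} G_{\mathcal{M}}^{h_x}(y,a) + \lambda\sum_{a\notin\mathcal{M}} G_{\mathcal{M}}^{h_x}(y,a)\,\frac{\psi_x^\lambda(a)}{h_{x,\mathcal{M}\setminus x}(a)}.
\end{aligned} \tag{8.4.33}
\]

Here, $G_{\mathcal{M}}^{h_x}(y,a) = \frac{1}{h_{x,\mathcal{M}\setminus x}(y)}\,G_{\mathcal{M}}(y,a)\,h_{x,\mathcal{M}\setminus x}(a)$ is the Green function of the Doob-transformed generator $L^{h_x}$, which is the generator of the process conditioned to hit $\mathcal{M}$ in $x$ (see Sect. 4.2.4). In particular,
\[
\sum_{a\notin\mathcal{M}} G_{\mathcal{M}}^{h_x}(y,a) = \mathbb{E}_y[\tau_{\mathcal{M}} \mid \tau_x=\tau_{\mathcal{M}}]. \tag{8.4.34}
\]

Using the representation of the Green function given in (7.1.23), we see that

\[
G_{\mathcal{M}}^{h_x}(y,a) = \frac{1}{h_{x,\mathcal{M}\setminus x}(y)}\,\frac{h_{a,\mathcal{M}}(y)}{e_{a,\mathcal{M}}(a)}\,h_{x,\mathcal{M}\setminus x}(a). \tag{8.4.35}
\]

But
\[
\frac{h_{a,\mathcal{M}}(y)\,h_{x,\mathcal{M}\setminus x}(a)}{h_{x,\mathcal{M}\setminus x}(y)}
= \frac{\mathbb{P}_y(\tau_a<\tau_{\mathcal{M}} \wedge \tau_x=\tau_{\mathcal{M}})}{\mathbb{P}_y(\tau_x=\tau_{\mathcal{M}})}
= \mathbb{P}_y(\tau_a<\tau_{\mathcal{M}} \mid \tau_x=\tau_{\mathcal{M}}). \tag{8.4.36}
\]
Hence
\[
\mathbb{E}_y[\tau_{\mathcal{M}} \mid \tau_x=\tau_{\mathcal{M}}] \le \sup_{a\in S\setminus\mathcal{M}} \frac{\mu(a)}{\mathrm{cap}(a,\mathcal{M})}\,|S|. \tag{8.4.37}
\]
From (8.4.33)-(8.4.36) it follows that, for all $y\in S\setminus\mathcal{M}$,

\[
\frac{\psi_x^\lambda(y)}{h_{x,\mathcal{M}\setminus x}(y)} \le \lambda\,\mathbb{E}_y[\tau_{\mathcal{M}} \mid \tau_x=\tau_{\mathcal{M}}]\Bigl(1+\sup_{a\notin\mathcal{M}}\frac{\psi_x^\lambda(a)}{h_{x,\mathcal{M}\setminus x}(a)}\Bigr). \tag{8.4.38}
\]

Via the bound in (8.4.37) this implies (8.4.29). The bound in (8.4.30) is proven analogously.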

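The main application of the bounds in Lemma 8.22 is the following improvement of (8.4.22) and (8.4.24).

Corollary 8.23 For all $x,y\in\mathcal{M}$,

\[
\bigl|(\psi_x^\lambda,h_y)_\mu\bigr| \le C\,\frac{\lambda|S|\sup_{y\notin\mathcal{M}}\frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}{1-\lambda|S|\sup_{y\notin\mathcal{M}}\frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}\,(h_x,h_y)_\mu, \tag{8.4.39}
\]

and

\[
\bigl|(\psi_x^\lambda-\psi_x^{\lambda'},h_y)_\mu\bigr| \le C\,\frac{|\lambda-\lambda'||S|\sup_{y\notin\mathcal{M}}\frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}{1-|\lambda-\lambda'||S|\sup_{y\notin\mathcal{M}}\frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M})}}\,(h_x,h_y)_\mu. \tag{8.4.40}
\]

Proof Follows from the estimates in Lemma 8.22.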

The next lemma controls the off-diagonal elements $(h_x,h_y)_\mu$.

Lemma 8.24 There is a $C<\infty$ such that

\[
\sup_{\substack{x,y\in\mathcal{M}\\ x\neq y}} \frac{(h_x,h_y)_\mu}{\|h_x\|_{2,\mu}\,\|h_y\|_{2,\mu}} \le C\rho. \tag{8.4.41}
\]

Proof To bound the numerator we can use the computations from the proof of Theorem 8.15. Analogously to (8.3.3), we write
\[
\sum_{z\in S}\mu(z)h_x(z)h_y(z) = \sum_{m\in\mathcal{M}}\mu(m)\,\widehat{W}_{x,y}(m),
\qquad
\widehat{W}_{x,y}(m) = \sum_{z\in A(m)}\frac{\mu(z)}{\mu(m)}\,h_x(z)h_y(z). \tag{8.4.42}
\]
We need to distinguish the terms $m\in\{x,y\}$ from the rest. First,
\[
\widehat{W}_{x,y}(x) \le \sum_{z\in A(x)}\frac{\mu(z)}{\mu(x)}\,h_y(z) = W_{y,\mathcal{M}\setminus y}(x). \tag{8.4.43}
\]

Hence, by Lemma 8.13(ii),

\[
\widehat{W}_{x,y}(x) \le C|\mathcal{M}|\,a^{-1}\,\frac{\mathrm{cap}(x,y)}{\mu(x)}\,|A(x)|, \tag{8.4.44}
\]

and analogously

\[
\widehat{W}_{x,y}(y) \le C|\mathcal{M}|\,a^{-1}\,\frac{\mathrm{cap}(x,y)}{\mu(y)}\,|A(y)|. \tag{8.4.45}
\]
For $m\notin\{x,y\}$, we use the bounds from Lemma 8.11(ii). This yields
\[
\begin{aligned}
\widehat{W}_{x,y}(m) &= \sum_{z\in A(m)}\frac{\mu(z)}{\mu(m)}\,h_x(z)h_y(z)\\
&\le \sum_{z\in A(m)} 2|\mathcal{M}|\,a^{-1}\,\frac{\min(\mathrm{cap}(x,m),\mathrm{cap}(y,m))}{\mu(m)}\\
&\le |A(m)|\,4|\mathcal{M}|\,a^{-1}\,\frac{\mathrm{cap}(x,y)}{\mu(m)}.
\end{aligned} \tag{8.4.46}
\]

This yields

\[
\sum_{z\in S}\mu(z)h_x(z)h_y(z) \le C|\mathcal{M}|\,a^{-1}|S|\,\mathrm{cap}(x,y). \tag{8.4.47}
\]
The denominator in (8.4.41) is trivially bounded from below by $\sqrt{\mu(x)\mu(y)}$. Thus, the left-hand side of (8.4.41) is bounded from above by

\[
\frac{\mathrm{cap}(x,y)}{\sqrt{\mu(x)\mu(y)}} \le \max\Bigl(\frac{\mathrm{cap}(x,y)}{\mu(x)},\frac{\mathrm{cap}(x,y)}{\mu(y)}\Bigr) \le a\rho/|S|, \tag{8.4.48}
\]

where the last inequality uses Definition 8.2. The assertion of the lemma now follows with $C=a/|S|$.

Remark 8.25 The bounds can be improved. For instance, with a little more care we get

\[
\|h_x\|_{2,\mu}\,\|h_y\|_{2,\mu} \ge \sqrt{\mu(A(x))\,\mu(A(y))}\,\bigl(1-O(\rho)\bigr). \tag{8.4.49}
\]

We are now in a position to relate the small eigenvalues of −L to capacities.

8.4.3 Computation of small eigenvalues

We have a bound on $\lambda_0^{\mathcal{M}}$ and a characterisation of the eigenvalues of $-L$ that are smaller than $\lambda_0^{\mathcal{M}}$. We will show that there are $|\mathcal{M}|$ such eigenvalues. The strategy to compute them is as follows. First, compute the largest eigenvalue that is smaller than $\lambda_0^{\mathcal{M}}$. It will turn out that this eigenvalue is slightly larger than, but very close to, $\lambda_0^{\mathcal{M}\setminus x}$ for some $x\in\mathcal{M}$. Next, start all over with $\mathcal{M}$ replaced by $\mathcal{M}\setminus x$, i.e., compute the largest eigenvalue of $-L$ that is smaller than $\lambda_0^{\mathcal{M}\setminus x}$. Finally, repeat this procedure until the set $\mathcal{M}$ is exhausted and all $|\mathcal{M}|$ eigenvalues are computed. For this strategy to work, we need some non-degeneracy assumptions that will be stated below.
Let us note that principal eigenvalues are also characterised through the Rayleigh-Ritz variational principle. Namely, for any $I\subset S$,

\[
\lambda_0^{I} = \inf_{\substack{f\colon f(x)=0,\ x\in I\\ \|f\|_{2,\mu}=1}} \mathcal{E}(f,f). \tag{8.4.50}
\]

This immediately implies that, for $I\subset J\subset S$,

\[
\lambda_0^{I} \le \lambda_0^{J}. \tag{8.4.51}
\]

We start by deriving a precise estimate on the principal eigenvalue $\lambda_0^{\mathcal{M}\setminus x}$ for $x\in\mathcal{M}$. The next theorem gives two characterisations of principal eigenvalues.

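Theorem 8.26 (Capacity bounds for principal eigenvalues) Let $\mathcal{N}\subset\mathcal{M}$ be non-empty, and let $x\in\mathcal{N}$. Then $-L^{\mathcal{N}\setminus x}$ has a unique eigenvalue $\lambda=\lambda_0^{\mathcal{N}\setminus x}$ smaller than $\lambda_0^{\mathcal{N}}$, given by the solution of the equation

\[
e^\lambda_{x,\mathcal{N}\setminus x}(x) = 0. \tag{8.4.52}
\]

Moreover,
\[
\frac{\mathrm{cap}(x,\mathcal{N}\setminus x)}{\|h_{x,\mathcal{N}\setminus x}\|_{2,\mu}^2}\,\bigl(1-O(\delta)\bigr) \le \lambda \le \frac{\mathrm{cap}(x,\mathcal{N}\setminus x)}{\|h_{x,\mathcal{N}\setminus x}\|_{2,\mu}^2}, \tag{8.4.53}
\]
where $\delta = \dfrac{\mathrm{cap}(x,\mathcal{N}\setminus x)}{\|h_{x,\mathcal{N}\setminus x}\|_{2,\mu}^2}\Big/\lambda_0^{\mathcal{N}}$.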

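Proof The same argument leading to Lemma 8.19 shows that any eigenvalue of $-L^{\mathcal{N}\setminus x}$ smaller than $\lambda_0^{\mathcal{N}}$ must satisfy (8.4.52). To show (8.4.53), note that the principal eigenvector of $-L^{\mathcal{N}\setminus x}$ must be strictly positive on $(\mathcal{N}\setminus x)^c$. In particular, it must be positive at $x$. Hence we can reformulate the Rayleigh-Ritz variational principle in (8.4.50) as

\[
\lambda_0^{\mathcal{N}\setminus x} = \inf_{f\colon f(z)=0,\ z\in\mathcal{N}\setminus x,\ f(x)=1} \frac{\mathcal{E}(f,f)}{\|f\|_{2,\mu}^2} \le \frac{\mathcal{E}(h_{x,\mathcal{N}\setminus x},h_{x,\mathcal{N}\setminus x})}{\|h_{x,\mathcal{N}\setminus x}\|_{2,\mu}^2}. \tag{8.4.54}
\]

On the other hand, using (8.4.20) to write out (8.4.52), we see that this equation implies
\[
\lambda_0^{\mathcal{N}\setminus x} = \frac{\mathcal{E}(h_{x,\mathcal{N}\setminus x},h_{x,\mathcal{N}\setminus x})}{\|h_{x,\mathcal{N}\setminus x}\|_{2,\mu}^2 + (h_{x,\mathcal{N}\setminus x},\psi_x^\lambda)_\mu}. \tag{8.4.55}
\]
Via the bound in (8.4.22) from Lemma 8.21, this implies that

\[
\lambda_0^{\mathcal{N}\setminus x} \ge \frac{\mathcal{E}(h_{x,\mathcal{N}\setminus x},h_{x,\mathcal{N}\setminus x})}{\|h_{x,\mathcal{N}\setminus x}\|_{2,\mu}^2}\,\frac{1}{1+\lambda_0^{\mathcal{N}\setminus x}/(\lambda_0^{\mathcal{N}}-\lambda_0^{\mathcal{N}\setminus x})}. \tag{8.4.56}
\]

Finally, using the upper bound on $\lambda_0^{\mathcal{N}\setminus x}$ from (8.4.54) and the definition of $\delta$, we get
\[
\frac{1}{1+\lambda_0^{\mathcal{N}\setminus x}/(\lambda_0^{\mathcal{N}}-\lambda_0^{\mathcal{N}\setminus x})} \ge \frac{1}{1+\frac{\delta}{1-\delta}} \ge 1-O(\delta). \tag{8.4.57}
\]

This concludes the proof of Theorem 8.26.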
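
The two-sided bound (8.4.53) can be checked numerically on a toy example. The sketch below is an illustration only and is not taken from the text: it assumes a lazy Metropolis chain on a path with wells at both ends, takes $\mathcal{N}=\{x,y\}$ with $x$ the left and $y$ the right endpoint, and uses a hypothetical energy profile `H` and inverse temperature `beta`. On a path the equilibrium potential and the capacity are explicit (series resistances), and the principal Dirichlet eigenvalue is approximated by power iteration on the symmetrised killed transition matrix.

```python
import math

def rayleigh_ritz_check(H, beta, iters=500):
    # Toy model (assumption): lazy Metropolis chain on the path 0..n-1,
    # N = {x, y} with x = 0 and y = n - 1, Dirichlet condition at y.
    n = len(H)
    mu = [math.exp(-beta * e) for e in H]            # unnormalised reversible measure
    # conductances mu(i) p(i, i+1), with p(i, i+1) = min(1, mu[i+1] / mu[i]) / 4
    c = [min(mu[i], mu[i + 1]) / 4.0 for i in range(n - 1)]
    R = sum(1.0 / ci for ci in c)                    # effective resistance between x and y
    cap = 1.0 / R                                    # cap(x, y) = E(h, h)
    # equilibrium potential h(i) = P_i(tau_x < tau_y), linear in the resistance
    h, acc = [], 0.0
    for i in range(n):
        h.append(1.0 - acc / R)
        if i < n - 1:
            acc += 1.0 / c[i]
    bound = cap / sum(mu[i] * h[i] ** 2 for i in range(n))   # right-hand side of (8.4.53)

    # Symmetrised transition matrix D^{1/2} P D^{-1/2}, restricted to states 0..n-2.
    m = n - 1
    off = [c[i] / math.sqrt(mu[i] * mu[i + 1]) for i in range(m - 1)]
    diag = [1.0 - ((c[i - 1] if i > 0 else 0.0) + c[i]) / mu[i] for i in range(m)]

    def apply(v):
        out = [diag[i] * v[i] for i in range(m)]
        for i in range(m - 1):
            out[i] += off[i] * v[i + 1]
            out[i + 1] += off[i] * v[i]
        return out

    # Power iteration started from sqrt(mu) * h, whose Rayleigh quotient is 1 - bound.
    v = [math.sqrt(mu[i]) * h[i] for i in range(m)]
    for _ in range(iters):
        w = apply(v)
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    w = apply(v)
    lam = sum(v[i] * (v[i] - w[i]) for i in range(m)) / sum(t * t for t in v)
    return lam, bound

lam, bound = rayleigh_ritz_check([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.5], beta=3.0)
```

Because the chain is lazy, the symmetrised matrix is positive semi-definite, so the Rayleigh quotient can only increase along the power iteration. The returned `lam` therefore lies between the true eigenvalue and `bound`, which mirrors the upper bound in (8.4.54); for this profile the two numbers should agree up to a small relative error, in line with the factor $1-O(\delta)$.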

We define a sequence of nested subsets of $\mathcal{M}$ as follows.

Definition 8.27 (Nested subsets of metastable sets) Let $\mathcal{M}$ be a set of metastable points, and let $|\mathcal{M}|=k$. Set
\[
\mathcal{M}_k = \mathcal{M}. \tag{8.4.58}
\]
For $\ell=k,\dots,1$, set

\[
x_\ell = \operatorname*{argmax}_{x\in\mathcal{M}_\ell}\ \frac{\mathrm{cap}(x,\mathcal{M}_\ell\setminus x)}{\mu(x)} \tag{8.4.59}
\]
and
\[
\mathcal{M}_{\ell-1} = \mathcal{M}_\ell\setminus x_\ell. \tag{8.4.60}
\]
We call the set $\mathcal{M}$ non-degenerate if for any $\ell=k,\dots,2$ the set $\mathcal{M}_\ell$ is itself a set of metastable points in the sense of Definition 8.2.

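What the recursive construction in Definition 8.27 does is to look for the minimum with the smallest stability level and remove it from the set of minima that are left over. The sequence of sets thus obtained has the form

\[
\mathcal{M} = \mathcal{M}_k \supset \mathcal{M}_{k-1} = \mathcal{M}_k\setminus x_k \supset \mathcal{M}_{k-2} = \mathcal{M}_{k-1}\setminus x_{k-1} \supset \cdots \supset \mathcal{M}_1 = \mathcal{M}_2\setminus x_2 = x_1 \supset \emptyset. \tag{8.4.61}
\]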
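
The recursive construction of Definition 8.27 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than anything from the text: the chain is a Metropolis nearest-neighbour walk on a path with a hypothetical energy profile `H`, for which $\mathrm{cap}(x,A)$ is given exactly by the parallel connection of the two series resistances from $x$ to the nearest points of $A$ on either side. The names `path_capacity` and `nested_subsets` are invented for this example.

```python
import math

def path_capacity(x, A, r):
    # Exact capacity cap(x, A) on a path: the nearest points of A to the left
    # and to the right of x act as two resistors in parallel; r[i] is the
    # resistance 1 / (mu(i) p(i, i+1)) of the edge (i, i+1).
    total = 0.0
    left = [a for a in A if a < x]
    right = [a for a in A if a > x]
    if left:
        total += 1.0 / sum(r[max(left):x])
    if right:
        total += 1.0 / sum(r[x:min(right)])
    return total

def nested_subsets(M, H, beta):
    # Returns [x_k, ..., x_1], the points of M in their order of removal.
    mu = [math.exp(-beta * e) for e in H]            # unnormalised reversible measure
    # Metropolis rates p(i, i+1) = min(1, mu[i+1] / mu[i]) / 2 (assumption)
    r = [2.0 / min(mu[i], mu[i + 1]) for i in range(len(H) - 1)]
    current, removed = set(M), []
    while len(current) > 1:
        # x_l maximises cap(x, M_l \ x) / mu(x), cf. (8.4.59)
        x = max(current, key=lambda z: path_capacity(z, current - {z}, r) / mu[z])
        removed.append(x)
        current.remove(x)                            # M_{l-1} = M_l \ x_l, cf. (8.4.60)
    removed.append(current.pop())                    # the last point is x_1
    return removed

order = nested_subsets([0, 2, 4, 6], [0.0, 3.0, 1.0, 2.0, 1.5, 4.0, 0.5], beta=4.0)
# order == [4, 2, 6, 0]
```

In this example the shallowest well (site 4) is removed first and the deepest minimum (site 0) survives as $x_1$, which is the behaviour described after Definition 8.27.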

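Note that the non-degeneracy condition implies that

\[
\frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\mu(x_\ell)} \le \delta\,\frac{\mathrm{cap}(x_{\ell+1},\mathcal{M}_\ell)}{\mu(x_{\ell+1})}, \tag{8.4.62}
\]
with
\[
|S|\delta \le \rho. \tag{8.4.63}
\]
We next establish some properties of the sets $\mathcal{M}_\ell$. We first obtain some information on the relation between the sets $\mathcal{M}_\ell$ and $\mathcal{M}_{x_\ell}$.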
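
Lemma 8.28 Decompose disjointly $\mathcal{M}_{x_\ell} = \bigcup_j \mathcal{M}_{x_\ell}^{(j)}$ such that $\mathrm{cap}(x_\ell,z)\sim\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}^{(j)})$ for all $z\in\mathcal{M}_{x_\ell}^{(j)}$. Let $x_*^{(j)}$ be such that $\mu(x_*^{(j)}) = \max(\mu(z)\colon z\in\mathcal{M}_{x_\ell}^{(j)})$. Then
(i) For all $j$, $x_*^{(j)}\in\mathcal{M}_{\ell-1}$.
(ii) For all $z\in\mathcal{M}_{x_\ell}^{(j)}\setminus\mathcal{M}_{\ell-1}$,
\[
\frac{\mathrm{cap}(z,\mathcal{M}_{x_\ell}^{(j)})}{\mu(z)} > \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\mu(x_\ell)}. \tag{8.4.64}
\]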

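Proof Throughout we use the approximate ultrametricity of capacity and the non-degeneracy assumptions. Consider a single set $\mathcal{M}_{x_\ell}^{(j)}$. If all points in this set are in $\mathcal{M}_{\ell-1}$, then there is nothing to prove. Otherwise, there is some $\ell'$ with $k\ge\ell'>\ell$ for which a first point $z\in\mathcal{M}_{x_\ell}^{(j)}$ is selected to be removed as $x_{\ell'}$. But then it must be that
\[
\frac{\mathrm{cap}(z,\mathcal{M}_{\ell'}\setminus z)}{\mu(z)} \ge \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell'}\setminus x_\ell)}{\mu(x_\ell)} \ge \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\mu(x_\ell)}. \tag{8.4.65}
\]

Assume first that $\{z\}=\mathcal{M}_{x_\ell}^{(j)}$. Then
\[
\frac{\mathrm{cap}(z,\mathcal{M}_{\ell'}\setminus z)}{\mu(z)} < \frac{\mathrm{cap}(z,x_\ell)}{\mu(z)} < \frac{\mathrm{cap}(z,x_\ell)}{\mu(x_\ell)} \le \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell'}\setminus x_\ell)}{\mu(x_\ell)}, \tag{8.4.66}
\]
which contradicts (8.4.65). Hence, $z$ is not selected before the $\ell$-th step, and so $z\in\mathcal{M}_{\ell-1}$. Now let $\mathcal{M}_{x_\ell}^{(j)}$ contain several points. Then, for (8.4.65) to be satisfied, there must be a point $y\in\mathcal{M}_{x_\ell}^{(j)}$ such that $\mathrm{cap}(z,y)\sim\mathrm{cap}(z,\mathcal{M}_{\ell'}\setminus z)$, and then $z$ is selected only if $\mu(y)>\mu(z)$. Otherwise, $z$ cannot be selected and must be in $\mathcal{M}_{\ell-1}$. Continuing in this way, we must arrive at a point $x_*^{(j)}\in\mathcal{M}_{x_\ell}^{(j)}$ with maximal invariant mass that cannot be removed at any step $\ell'>\ell$, and thus must be in $\mathcal{M}_{\ell-1}$. This proves (i). Since now for any removed point $z$ in $\mathcal{M}_{x_\ell}^{(j)}$ there is a point $y$ with $\mu(y)>\mu(z)$ and $\mathrm{cap}(z,y)/\mu(z) > \mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})/\mu(x_\ell)$, it follows that $\mathrm{cap}(z,y)/\mu(z) > \mathrm{cap}(z,x_*^{(j)})/\mu(z)$, which implies (ii).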

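Corollary 8.29 Under the assumptions of Lemma 8.28,

\[
\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}) = \mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})\,\bigl(1+O(\delta)\bigr). \tag{8.4.67}
\]

Proof We have shown that each component $\mathcal{M}_{x_\ell}^{(j)}$ contains one point from $\mathcal{M}_{\ell-1}$. We need the following lemma.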

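Lemma 8.30 Let $Y\subset X\subset S$, and let $x\in S\setminus X$. Then

\[
\mathrm{cap}(x,Y) \le \mathrm{cap}(x,X) \le \frac{\mathrm{cap}(x,Y)}{1-\sup_{z\in X\setminus Y}\frac{\mathrm{cap}(z,x)}{\mathrm{cap}(z,Y)}}. \tag{8.4.68}
\]

Proof The first inequality in (8.4.68) is trivial. For the second, note that
\[
\begin{aligned}
\mathbb{P}_x(\tau_X<\tau_x) &= \mathbb{P}_x(\tau_X<\tau_x \wedge \tau_Y<\tau_x) + \mathbb{P}_x(\tau_X<\tau_x \wedge \tau_Y>\tau_x)\\
&\le \mathbb{P}_x(\tau_Y<\tau_x) + \sum_{z\in X\setminus Y}\mathbb{P}_x(\tau_z<\tau_{x\cup X\setminus z})\,\mathbb{P}_z(\tau_x<\tau_Y)\\
&\le \mathbb{P}_x(\tau_Y<\tau_x) + \sup_{z\in X\setminus Y}\mathbb{P}_z(\tau_x<\tau_Y)\,\mathbb{P}_x(\tau_{X\setminus Y}<\tau_{x\cup Y})\\
&\le \mathbb{P}_x(\tau_Y<\tau_x) + \sup_{z\in X\setminus Y}\frac{\mathrm{cap}(z,x)}{\mathrm{cap}(z,Y)}\,\mathbb{P}_x(\tau_X<\tau_x).
\end{aligned} \tag{8.4.69}
\]

Using that $\mathrm{cap}(x,X)=\mu(x)\mathbb{P}_x(\tau_X<\tau_x)$ and $\mathrm{cap}(x,Y)=\mu(x)\mathbb{P}_x(\tau_Y<\tau_x)$ (see (7.1.19)), we get the upper bound in (8.4.68).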

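At this point we have

\[
\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}\cap\mathcal{M}_{\ell-1}) \le \mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}) \le \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}\cap\mathcal{M}_{\ell-1})}{1-\sup_{z\in\mathcal{M}_{x_\ell}\setminus\mathcal{M}_{\ell-1}}\frac{\mathrm{cap}(z,x_\ell)}{\mathrm{cap}(z,\mathcal{M}_{x_\ell}\cap\mathcal{M}_{\ell-1})}}. \tag{8.4.70}
\]

It follows from Lemma 8.28 and the non-degeneracy conditions that the ratios of the capacities in the denominators are at most $\delta$. Next, we use the same reasoning to show that
\[
\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}\cap\mathcal{M}_{\ell-1}) \le \mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1}) \le \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell}\cap\mathcal{M}_{\ell-1})}{1-\sup_{z\in\mathcal{M}_{\ell-1}\setminus\mathcal{M}_{x_\ell}}\frac{\mathrm{cap}(z,x_\ell)}{\mathrm{cap}(z,\mathcal{M}_{x_\ell}\cap\mathcal{M}_{\ell-1})}}. \tag{8.4.71}
\]
Here, again the ratios of the capacities in the denominator must all be smaller than $\delta$, since if for some $z$ in the supremum this is not true, then it leads to a contradiction with the definition of $x_\ell$.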

The first important consequence is the following estimate on the $\ell^2$-norms of the corresponding equilibrium potentials.

Lemma 8.31 Let $\mathcal{M}$ be a non-degenerate set of metastable points and let $\mathcal{M}_\ell$, $\ell=k,\dots,1$ be defined in Definition 8.27. Then, for $\ell=k,\dots,2$,

\[
\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}^2 = \mu\bigl(A(x_\ell)\bigr), \tag{8.4.72}
\]

where the valley $A(x_\ell)$ is defined with respect to the original set $\mathcal{M}$.

Proof Use the estimates on equilibrium potentials in Lemma 8.11 and Lemma 8.28.

Corollary 8.32 Let $\mathcal{M}$ be a non-degenerate set of metastable points. Then

\[
\frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}^2} \le \delta\,\frac{\mathrm{cap}(x_{\ell+1},\mathcal{M}_\ell)}{\|h_{x_{\ell+1},\mathcal{M}_\ell}\|_{2,\mu}^2}, \tag{8.4.73}
\]

with $0<\delta\ll 1$ as in (8.4.63).

Theorem 8.33 (Sharp bounds on principal eigenvalues) Assume that $\mathcal{M}$ is a non-degenerate set of metastable points and let $|\mathcal{M}|=k$. Define the sequence of points $x_k,\dots,x_1$ and the sequence of sets $\mathcal{M}_\ell$, $\ell=k,\dots,1$ as in Definition 8.27. Then, for all $\ell=1,\dots,k-1$,

\[
\frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}^2}\,\bigl(1-O(\delta)\bigr) \le \lambda_0^{\mathcal{M}_{\ell-1}} \le \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}^2}\,\bigl(1+O(\delta)\bigr) \tag{8.4.74}
\]

and $\lambda_0^{\mathcal{M}_{\ell-1}} \le O(\delta)\,\lambda_0^{\mathcal{M}_\ell}$, and the sequence $\mathcal{M}_\ell$, $\ell=k,\dots,1$, realises the sequence defined in (8.4.61).

We will show that each of these principal Dirichlet eigenvalues is very close to one of the small eigenvalues of $-L$.

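Theorem 8.34 (Sharp asymptotics of principal eigenvalues) Assume that there exists an $x\in\mathcal{M}$ such that, for some $0<\delta\ll 1$,

\[
\delta\,\frac{\mathrm{cap}(x,\mathcal{M}\setminus x)}{\|h_x\|_{2,\mu}^2} \ge \max_{z\in\mathcal{M}\setminus x}\frac{\mathrm{cap}(z,\mathcal{M}\setminus z)}{\|h_z\|_{2,\mu}^2}. \tag{8.4.75}
\]

Then the largest eigenvalue of $-L$ smaller than $\lambda_0^{\mathcal{M}}$ is given by

\[
\lambda_x = \frac{\mathrm{cap}(x,\mathcal{M}\setminus x)}{\|h_x\|_{2,\mu}^2}\,\bigl(1+O(\delta)\bigr), \tag{8.4.76}
\]

and all other eigenvalues $\lambda$ of $-L$ satisfy

\[
\lambda \le C\delta\lambda_x. \tag{8.4.77}
\]

Moreover, the eigenvector $\phi^{(x)}$ corresponding to $\lambda_x$, normalised such that $\phi^{(x)}(x)=1$, satisfies $\phi^{(x)}(z)\le C\delta$, $z\neq x$, for some constant $C<\infty$.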

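Proof Let $x=x_k\in\mathcal{M}$ be defined in Definition 8.27. We know from Theorem 8.26 that $\lambda_0^{\mathcal{M}\setminus x}\sim\mathrm{cap}(x,\mathcal{M}\setminus x)/\|h_x\|_{2,\mu}^2$.
Assume that there is an eigenvalue $\lambda_x$ smaller than $\lambda_0^{\mathcal{M}}$. We try to compute the precise value of this eigenvalue, i.e., we look for a root of the determinant of $\mathcal{E}_{\mathcal{M}}(\lambda)$ that is of order $\mathrm{cap}(x,\mathcal{M}\setminus x)/\|h_x\|_{2,\mu}^2$.
The determinant of $\mathcal{E}_{\mathcal{M}}(\lambda)$ vanishes together with that of the matrix $\widehat{K}^\lambda$ with elements
\[
\widehat{K}^\lambda_{xy} = \frac{\mu(x)}{\|h_x\|_{2,\mu}\|h_y\|_{2,\mu}}\bigl(\mathcal{E}_{\mathcal{M}}(\lambda)\bigr)_{xy}
= \frac{\mathcal{E}(h_x,h_y)}{\|h_x\|_{2,\mu}\|h_y\|_{2,\mu}} - \lambda\,\frac{(h_x,h_y)_\mu + (\psi_x^\lambda,h_y)_\mu}{\|h_x\|_{2,\mu}\|h_y\|_{2,\mu}}. \tag{8.4.78}
\]

Lemma 8.21, Corollary 8.23 and Lemma 8.24 already control the term involving $\psi_x^\lambda$ and the scalar products $(h_x,h_y)_\mu$. The terms involving $\mathcal{E}(h_x,h_y)$, $x\neq y$, can be bounded using the Cauchy-Schwarz inequality,

\[
\mathcal{E}(h_x,h_y) = \frac12\sum_{z,z'\in S}\mu(z')p(z',z)\bigl[h_x(z')-h_x(z)\bigr]\bigl[h_y(z')-h_y(z)\bigr]
\le \sqrt{\mathcal{E}(h_x,h_x)\,\mathcal{E}(h_y,h_y)}, \tag{8.4.79}
\]

and hence
\[
\frac{|\mathcal{E}(h_x,h_y)|}{\|h_x\|_{2,\mu}\|h_y\|_{2,\mu}} \le \sqrt{\frac{\mathcal{E}(h_x,h_x)}{\|h_x\|_{2,\mu}^2}}\,\sqrt{\frac{\mathcal{E}(h_y,h_y)}{\|h_y\|_{2,\mu}^2}}. \tag{8.4.80}
\]

Therefore, by the assumption in Theorem 8.34, there exists an $x\in\mathcal{M}$ such that

\[
\frac{|\mathcal{E}(h_x,h_y)|}{\|h_x\|_{2,\mu}\|h_y\|_{2,\mu}} \le \sqrt{\delta}\,\frac{\mathcal{E}(h_x,h_x)}{\|h_x\|_{2,\mu}^2}. \tag{8.4.81}
\]

Collecting estimates, we have the following, where we abbreviate

\[
A_x = \frac{\mathcal{E}(h_x,h_x)}{\|h_x\|_{2,\mu}^2}. \tag{8.4.82}
\]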

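Lemma 8.35
(i) Let $x$ be the point specified in the assumptions in Theorem 8.34. Then

\[
\widehat{K}^\lambda_{xx} = A_x - \lambda\bigl(1+O(\lambda)\bigr). \tag{8.4.83}
\]

(ii) For $y\neq x$, the diagonal elements satisfy

\[
\widehat{K}^\lambda_{yy} = A_x\,O(\delta) - \lambda\bigl(1+O(\lambda)\bigr), \qquad y\neq x. \tag{8.4.84}
\]

(iii) All off-diagonal elements satisfy

\[
|\widehat{K}^\lambda_{uv}| \le C\bigl(\sqrt{\delta}A_x + \lambda\rho\bigr), \qquad u\neq v. \tag{8.4.85}
\]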

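Recall that $\lambda<\lambda_0^{\mathcal{M}}$ is an eigenvalue of $-L$ if there is a non-zero solution to the equations

\[
\sum_{y\in\mathcal{M}}\widehat{K}^\lambda_{zy}c_y = 0, \qquad z\in\mathcal{M}. \tag{8.4.86}
\]

Trivially, we may choose the vector $c$ in such a way that $\max_{z\in\mathcal{M}}|c_z|=1$, and the component realising the maximum is equal to 1. Assume that, with this normalisation, $c_z=1$ for $z\neq x$. Then the $z$-line of (8.4.86) reads

\[
-\widehat{K}^\lambda_{zz} = \sum_{y\neq z}\widehat{K}^\lambda_{zy}c_y, \tag{8.4.87}
\]

and inserting the estimates on the matrix elements, we find

\[
\lambda \le A_x C|\mathcal{M}|\sqrt{\delta} + \lambda C\rho|\mathcal{M}|, \tag{8.4.88}
\]

which implies that $\lambda$ must be much smaller than $A_x$. Thus, such a $c$ would not correspond to an eigenvalue that is larger than $\lambda_0^{\mathcal{M}\setminus x}$. Hence we may assume that $c_x=1\ge|c_y|$ for all $y\neq x$. Now, (8.4.86) with $z=x$,

\[
-\widehat{K}^\lambda_{xx} = \sum_{y\neq x}\widehat{K}^\lambda_{xy}c_y, \tag{8.4.89}
\]

implies, in view of the bounds on $\widehat{K}^\lambda_{xy}$ and the fact that $|c_y|\le 1$,
\[
|\widehat{K}^\lambda_{xx}| \le C|\mathcal{M}|\bigl(\sqrt{\delta}A_x + \lambda^2/\lambda_0^{\mathcal{M}}\bigr), \tag{8.4.90}
\]

i.e.,
\[
|A_x-\lambda| \le C|\mathcal{M}|\bigl(\sqrt{\delta}A_x + \lambda^2/\lambda_0^{\mathcal{M}}\bigr), \tag{8.4.91}
\]
which in turn implies that
\[
\lambda = A_x\bigl(1+O(\sqrt{\delta}+\rho)\bigr). \tag{8.4.92}
\]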

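This bound can be improved if we consider the remaining equations in (8.4.86). Namely, for $z\neq x$,

\[
-\widehat{K}^\lambda_{zz}c_z = \sum_{y\neq z}\widehat{K}^\lambda_{zy}c_y. \tag{8.4.93}
\]

Solving for $c_z$, using (ii) and (iii), and employing $\lambda\sim A_x$, we see that
\[
|c_z| \le C|\mathcal{M}|(\sqrt{\delta}+\rho). \tag{8.4.94}
\]

Inserting (8.4.94) and the off-diagonal bound (8.4.85) into (8.4.89) allows us to improve (8.4.91) to

\[
|A_x-\lambda| \le C^2|\mathcal{M}|^2(\sqrt{\delta}+\rho)^2A_x, \tag{8.4.95}
\]

which is the first claim in Theorem 8.34. The assertion on the eigenvector follows from our estimates on the vector $c$.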
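It remains to show that a solution of (8.4.86) as specified above exists. This can be shown with the help of a fixed-point argument. Rearranging terms, we can cast (8.4.86) into the form

\[
\begin{aligned}
\lambda &= \Lambda(\lambda,c_1,\dots,c_{k-1}),\\
c_\ell &= C_\ell(\lambda,c_1,\dots,c_{k-1}), \qquad \ell=1,\dots,k-1.
\end{aligned} \tag{8.4.96}
\]

Explicitly, the maps $\Lambda$ and $C_\ell$ read (we abbreviate $c=(c_1,\dots,c_{k-1})$)

\[
\begin{aligned}
\Lambda(\lambda,c) &= \frac{\mathcal{E}(h_x,h_x)}{\|h_x\|_{2,\mu}^2} - \lambda\,\frac{(\psi_x^\lambda,h_x)_\mu}{\|h_x\|_{2,\mu}^2} + \sum_{j=1}^{k-1}H^\lambda_{xx_j}c_j,\\
C_\ell(\lambda,c) &= \lambda^{-1}\Bigl(\frac{\mathcal{E}(h_{x_\ell},h_{x_\ell})}{\|h_{x_\ell}\|_{2,\mu}^2}\,c_\ell - \lambda\,\frac{(\psi_{x_\ell}^\lambda,h_{x_\ell})_\mu}{\|h_{x_\ell}\|_{2,\mu}^2}\,c_\ell + \sum_{j\neq\ell}H^\lambda_{x_\ell x_j}c_j\Bigr)
\end{aligned} \tag{8.4.97}
\]

for $\ell=1,\dots,k-1$. We want to construct a solution by the following iteration scheme.

Let $\lambda^{(0)}=A_x$ and $c_\ell^{(0)}=0$, $\ell=1,\dots,k-1$. For $n\in\mathbb{N}$, let $\lambda^{(n)}$ be the solution of $\lambda^{(n)}=\Lambda(\lambda^{(n)},c^{(n-1)})$, and let $c^{(n)}$ be the solution of $c^{(n)}=C(\lambda^{(n)},c^{(n)})$. We want to prove that the sequence $(\lambda^{(n)},c^{(n)})_{n\in\mathbb{N}}$ converges. To do this, we need the following facts.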

(i) For $c$ in a small neighbourhood of 0, the map $\Lambda(\cdot,c)\colon\mathbb{R}\to\mathbb{R}$ is a contraction on a neighbourhood of $A_x$, and hence the steps $\lambda^{(n-1)}\to\lambda^{(n)}$ are well defined.
(ii) For $\lambda$ in a small neighbourhood of $A_x$, the map $C(\lambda,\cdot)\colon\mathbb{R}^{k-1}\to\mathbb{R}^{k-1}$ is a contraction on a neighbourhood of 0, and hence the steps $c^{(n-1)}\to c^{(n)}$ are well defined.
(iii) On the respective sets, the solutions of $\lambda=\Lambda(\lambda,c)$ are Lipschitz in $c$, and the solutions of $c=C(\lambda,c)$ are Lipschitz in $\lambda$, with Lipschitz constants such that the composition of these maps yields a contraction.
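
The alternating scheme can be seen at work on a toy problem. The following Python sketch is an assumption-laden illustration, not the construction of the text: it replaces the $k\times k$ system (8.4.86) by a symmetric $2\times2$ matrix with one dominant diagonal entry `a` (in the role of $A_x$), a much smaller diagonal entry `b`, and a small off-diagonal entry `eps`, and normalises the eigenvector as $(1,c)$. In this toy case the inner equations are explicit, so each "solve" is a single assignment.

```python
import math

def alternating_fixed_point(a, b, eps, steps=50):
    # Eigen-equations of K = [[a, eps], [eps, b]] with eigenvector (1, c):
    #   row x:  lam = a + eps * c        (role of lam = Lambda(lam, c))
    #   row z:  c   = eps / (lam - b)    (role of c = C(lam, c))
    lam, c = a, 0.0                      # start at lam = A_x, c = 0
    for _ in range(steps):
        lam = a + eps * c                # update lam with the previous c
        c = eps / (lam - b)              # update c with the new lam
    return lam, c

lam, c = alternating_fixed_point(a=1.0, b=0.01, eps=0.05)
# Closed-form largest eigenvalue of the 2x2 matrix, for comparison.
exact = 0.5 * (1.0 + 0.01) + math.sqrt((0.5 * (1.0 - 0.01)) ** 2 + 0.05 ** 2)
```

The composed map $\lambda\mapsto a+\varepsilon^2/(\lambda-b)$ has derivative of order $\varepsilon^2/(a-b)^2$ near $a$, so the iteration contracts quickly towards the eigenvalue close to `a`, and the component `c` stays small, as in the statement on the eigenvector in Theorem 8.34.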
In the following statements the assumptions of Theorem 8.34 are in place.

Lemma 8.36 For any $c$ with $\|c\|_\infty\le 1$, the map $\Lambda(\cdot,c)\colon(A_x/2,3A_x/2)\to\mathbb{R}$ is a contraction. More precisely,

\[
\bigl|\Lambda(\lambda,c)-\Lambda(\lambda',c)\bigr| \le |\lambda-\lambda'|\,C\rho, \tag{8.4.98}
\]

where $C<\infty$ is independent of $\rho$.

Proof The estimate in (8.4.98) is straightforward from Lemma 8.21, Corollary 8.23 and Lemma 8.24, together with the assumption that all $\mathcal{M}_\ell$ are $\rho$-metastable sets.

Corollary 8.37 For any $c$ with $\|c\|_\infty\le 1$,

\[
\lambda = \Lambda(\lambda,c) \tag{8.4.99}
\]

has a unique solution $\lambda(c)\in(A_x/2,3A_x/2)$.

Proof Set $\lambda^{(0)}(c)=A_x$, and $\lambda^{(n)}(c)=\Lambda(\lambda^{(n-1)}(c),c)$. Then, as $n\to\infty$, $\lambda^{(n)}(c)$ converges to the unique fixed point of the map $\Lambda(\cdot,c)$ on $(A_x/2,3A_x/2)$, which is the solution of (8.4.99).

Lemma 8.38 The solution of (8.4.99) from Corollary 8.37 is Lipschitz continuous with respect to the $\ell^1$-norm in $c$ with Lipschitz constant $C\sqrt{\delta}A_x$.

Proof We show that, for fixed $\lambda\in(A_x/2,3A_x/2)$, $\Lambda(\lambda,c)$ is Lipschitz in $c$. Namely,

\[
\bigl|\Lambda(\lambda,c)-\Lambda(\lambda,c')\bigr| \le \sum_{\ell=1}^{k-1}|c_\ell-c'_\ell|\,\bigl|H^\lambda_{xx_\ell}\bigr|
\le \sum_{\ell=1}^{k-1}|c_\ell-c'_\ell|\,\bigl(\sqrt{\delta}A_x+3A_xC\rho\bigr), \tag{8.4.100}
\]
which is dominated by the $\sqrt{\delta}A_x$-term. This gives the Lipschitz bound in $\ell^1$ and in $\ell^\infty$. Combining this with the bound (8.4.98), we get
\[
\bigl|\lambda(c)-\lambda(c')\bigr| \le \frac{C\sqrt{\delta}A_x}{1-C\rho}\,\|c-c'\|_1, \tag{8.4.101}
\]

which proves the lemma.

Lemma 8.39 For $\lambda\in(A_x/2,3A_x/2)$, the map $C(\lambda,\cdot)\colon[-1,1]^{k-1}\to[-1,1]^{k-1}$ is a contraction. More precisely,

\[
\bigl\|C(\lambda,c)-C(\lambda,c')\bigr\|_1 \le C\delta\,\|c-c'\|_1. \tag{8.4.102}
\]

Proof This is again elementary from what is already proven.

Corollary 8.40 For any $\lambda\in(A_x/2,3A_x/2)$, the equation

\[
c = C(\lambda,c) \tag{8.4.103}
\]

has a unique solution in $[-1,1]^{k-1}$.

Proof Same as the proof of Corollary 8.37.

Lemma 8.41 Let $c(\lambda)$ denote the solution of (8.4.103) from Corollary 8.40. Then $c(\lambda)$ is Lipschitz in $\lambda$. More precisely,
\[
\bigl\|c(\lambda)-c(\lambda')\bigr\|_1 \le C\,\frac{\sqrt{\delta}}{A_x}\,|\lambda-\lambda'|. \tag{8.4.104}
\]

Proof The proof goes like the proof of Lemma 8.38. The fairly large bound on the Lipschitz constant comes from the term involving $H^\lambda_{x_\ell x}$ that gives rise to a term

\[
\bigl(\lambda^{-1}-\lambda'^{-1}\bigr)\,\frac{\mathcal{E}(h_{x_{k-1}},h_x)}{\|h_{x_{k-1}}\|_{2,\mu}\|h_x\|_{2,\mu}}, \tag{8.4.105}
\]

which cannot be bounded by less than what is claimed.

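Corollary 8.42 The map $T\colon(A_x/2,3A_x/2)\to(A_x/2,3A_x/2)$ defined by $T(\lambda)=\lambda(c(\lambda))$, where $\lambda(c)$ is the unique solution of $\lambda=\Lambda(\lambda,c)$ and $c(\lambda)$ is the unique solution of $c=C(\lambda,c)$, is a contraction. More precisely,
\[
\bigl|T(\lambda)-T(\lambda')\bigr| \le C\sqrt{\delta}\,|\lambda-\lambda'|, \tag{8.4.106}
\]

where $C<\infty$ is independent of $\delta$.

Proof Using Lemmas 8.38 and 8.41, we have that

\[
\begin{aligned}
\bigl|\lambda(c(\lambda))-\lambda(c(\lambda'))\bigr| &\le C\sqrt{\delta}A_x\,\bigl\|c(\lambda)-c(\lambda')\bigr\|_1\\
&\le C\sqrt{\delta}\,C\sqrt{\delta}\,A_xA_x^{-1}\,|\lambda-\lambda'| = C^2\delta\,|\lambda-\lambda'|,
\end{aligned} \tag{8.4.107}
\]

which proves the claim.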

Corollary 8.42 implies that there exists a unique $\lambda\in(A_x/2,3A_x/2)$ such that $\lambda=\lambda(c(\lambda))$. Hence $(\lambda,c(\lambda))$ is the unique solution of $\lambda=\Lambda(\lambda,c)$ and $c=C(\lambda,c)$ with $\lambda$ near $A_x$. This proves the existence of the solution to (8.4.86) and concludes the proof of Theorem 8.34.

At this point we have shown that $\lambda_0^{\mathcal{M}}>\lambda_x>\lambda_0^{\mathcal{M}\setminus x}$, where the last two eigenvalues are almost the same and are smaller by a factor at least $\delta$ than the first. This procedure can now be repeated with $\mathcal{M}$ replaced by $\mathcal{M}\setminus x$, provided $\mathcal{M}\setminus x$ satisfies the hypothesis of a set of metastable points.

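Theorem 8.43 (Asymptotics of the spectrum and mean metastable exit times) Let $|\mathcal{M}|=k\ge 2$, and let $\mathcal{M}_\ell$, $\ell=k,\dots,1$ be the sequence of sets defined in Theorem 8.33. Assume further that, for each $\ell=1,\dots,k$, $\mathcal{M}_\ell$ is a set of metastable points in the sense of Definition 8.2 (with the same parameter $\rho$). Then $-L$ has $k$ eigenvalues $\lambda_1<\lambda_2<\cdots<\lambda_k<\lambda_0^{\mathcal{M}}$, where

\[
\lambda_1 = 0, \tag{8.4.108}
\]

and
\[
\lambda_\ell = \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\mu(A(x_\ell))}\,\bigl(1+O(\delta)\bigr), \qquad \ell=2,\dots,k. \tag{8.4.109}
\]
Consequently,

\[
\lambda_\ell = \frac{1}{\mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{\ell-1}}]}\,\bigl(1+O(\delta)\bigr) = \frac{1}{\mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{x_\ell}}]}\,\bigl(1+O(\delta)\bigr). \tag{8.4.110}
\]

The corresponding normalised eigenfunction has the form

\[
\psi_\ell(y) = \frac{h_{x_\ell,\mathcal{M}_{\ell-1}}(y)}{\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}} + \sum_{j=1}^{\ell-1}O(\delta)\,\frac{h_{x_\ell,\mathcal{M}_{j-1}}(y)}{\|h_{x_\ell,\mathcal{M}_{j-1}}\|_{2,\mu}}. \tag{8.4.111}
\]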

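Proof Applying Theorem 8.34 to the sets $\mathcal{M}_\ell$, we successively show that $-L$ has $k-1$ eigenvalues below $\lambda_0^{\mathcal{M}}$ that satisfy

\[
\lambda_{x_\ell} = \frac{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}{\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}^2}\,\bigl(1+O(\delta)\bigr), \qquad \ell=2,\dots,k. \tag{8.4.112}
\]

Using the same arguments as in the proof of Lemma 8.24, we show that

\[
\|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{2,\mu}^2 = \|h_{x_\ell,\mathcal{M}_{\ell-1}}\|_{1,\mu}\,\bigl(1+O(\rho)\bigr) = \|\mathbf{1}_{A(x_\ell)}\|_{1,\mu}\,\bigl(1+O(\rho)\bigr). \tag{8.4.113}
\]

It remains to identify the right-hand side with the inverse mean hitting time of the set $\mathcal{M}_{x_\ell}=\{z\in\mathcal{M}\colon\mu(z)>\mu(x_\ell)\}$.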

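Lemma 8.44 Under the assumptions of Lemma 8.28,

\[
\mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{x_\ell}}] = \mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{\ell-1}}]\,\bigl(1+O(\delta)\bigr). \tag{8.4.114}
\]

Proof First, from Theorem 8.15 we have that

\[
\mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{x_\ell}}] = \frac{\mu(A(x_\ell))}{\mathrm{cap}(x_\ell,\mathcal{M}_{x_\ell})}\,\bigl(1+o(1)\bigr). \tag{8.4.115}
\]

To estimate $\mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{\ell-1}}]$, we use that, by assumption, $\mathcal{M}_\ell$ is a set of metastable points and so, by Theorem 8.15,

\[
\mathbb{E}_{x_\ell}[\tau_{\mathcal{M}_{\ell-1}}] = \frac{1}{\mathrm{cap}(x_\ell,\mathcal{M}_{\ell-1})}\sum_{z\in\widetilde{A}(x_\ell)}\mu(z)\,h_{x_\ell,\mathcal{M}_{\ell-1}}(z)\,\bigl(1+o(1)\bigr), \tag{8.4.116}
\]

where $\widetilde{A}(x_\ell)$ now refers to the set $\mathcal{M}_\ell$. However, by the construction of the sequence of sets $\mathcal{M}_\ell$, $\ell=k,\dots,1$, we have $\mu(\widetilde{A}(x_\ell))=\mu(A(x_\ell))$, $\ell=k,\dots,1$. By Corollary 8.29, the capacities in the two formulas are also equal up to a factor $1+O(\delta)$, which proves the lemma.

This observation allows us to conclude that the $k$ smallest eigenvalues of $-L$ are, up to a factor $1+O(\delta)$, the inverses of the mean exit times from the metastable points in $\mathcal{M}$. The estimate on the eigenvectors is inherited from (8.4.94).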

8.4.4 Exponential law of the metastable exit times

There are different ways to prove that the distribution of metastable exit times is asymptotically exponential. The most robust one is based on a renewal argument: Since the probability to reach the set $\mathcal{M}_x$ starting from $x\in\mathcal{M}$ without returning to $x$ is very small, the process returns many times to $x$ before a successful excursion happens. The number of such excursions is geometrically distributed, and the mean duration of an unsuccessful excursion is approximately $\mu(A(x))/\mu(x)$, by the ergodic theorem. Since this time is small compared to the number of excursions, the rescaled time until a successful excursion converges to an exponential distribution. Finally, the time of the last excursion is negligible compared to this time.
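
The core of the renewal heuristic is the classical fact that a geometric random variable, rescaled by its mean, is close to an exponential one when the success probability is small. The short Python check below illustrates only this step; the function name and the value of `p` are choices made for this example, and the durations of the individual excursions are ignored.

```python
import math

def rescaled_geometric_tail(p, t):
    # P(N > t / p) for N geometric on {1, 2, ...} with success probability p,
    # i.e. the tail of N rescaled by its mean E[N] = 1 / p.
    return (1.0 - p) ** math.floor(t / p)

p = 2.0 ** -20   # small probability of a successful excursion (illustrative value)
tails = [(t, rescaled_geometric_tail(p, t), math.exp(-t)) for t in (0.5, 1.0, 2.5)]
```

Each entry of `tails` pairs the geometric tail with $e^{-t}$; the difference is of order $p$, which is the elementary counterpart of the limit in Theorem 8.45.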

Theorem 8.45 (Exponential law of metastable exit times) Under the non-degeneracy hypothesis of Theorem 8.34 with $\delta$ satisfying (8.4.63), for all $t>0$,

\[
\lim_{\rho\downarrow 0}\mathbb{P}_x\bigl(\tau_{\mathcal{M}_x}>t\,\mathbb{E}_x[\tau_{\mathcal{M}_x}]\bigr) = e^{-t}. \tag{8.4.117}
\]

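Proof To exploit the renewal structure it is convenient to use Laplace transforms. We set $\hat\tau_{\mathcal{M}_x}=\tau_{\mathcal{M}_x}/\mathbb{E}_x[\tau_{\mathcal{M}_x}]$ and $\hat\tau_x=\tau_x/\mathbb{E}_x[\tau_{\mathcal{M}_x}]$ and

\[
R_x(\lambda) = \mathbb{E}_x\bigl[\exp(\lambda\hat\tau_{\mathcal{M}_x})\bigr]. \tag{8.4.118}
\]

Note that $R_x(\lambda)<\infty$ for all $\lambda<1$, due to the fact that the principal eigenvalue of the Dirichlet generator with Dirichlet conditions in $\mathcal{M}_x$ is essentially $1/\mathbb{E}_x[\tau_{\mathcal{M}_x}]$. Moreover, $R_x(\lambda)$ satisfies the following renewal equation.

Lemma 8.46 (Renewal equation for Laplace transforms) For all $\lambda<1$,

\[
R_x(\lambda) = \frac{\mathbb{E}_x\bigl[e^{\lambda\hat\tau_{\mathcal{M}_x}}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr]}{1-\mathbb{E}_x\bigl[e^{\lambda\hat\tau_x}\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]}. \tag{8.4.119}
\]

Proof Noting that $1=\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}+\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}$ and using the strong Markov property, we see that

\[
R_x(\lambda) = \mathbb{E}_x\bigl[e^{\lambda\hat\tau_{\mathcal{M}_x}}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] + \mathbb{E}_x\bigl[e^{\lambda\hat\tau_x}\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]\,\mathbb{E}_x\bigl[e^{\lambda\hat\tau_{\mathcal{M}_x}}\bigr]. \tag{8.4.120}
\]

Equation (8.4.119) is now immediate.

As a result of the representation in (8.4.119), Theorem 8.45 is a consequence of the following lemma.

Lemma 8.47 With the notation in Lemma 8.46, for all $\lambda<1$,

\[
\lim_{\rho\downarrow 0}\frac{\mathbb{E}_x\bigl[e^{\lambda\hat\tau_{\mathcal{M}_x}}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr]}{1-\mathbb{E}_x\bigl[e^{\lambda\hat\tau_x}\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]} = \frac{1}{1-\lambda}. \tag{8.4.121}
\]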

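Proof The first step in the proof is the following crucial pointwise bound.

Lemma 8.48 There exists a $1<C<\infty$ such that

\[
\mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] \le \frac{C\rho}{1-C\rho}\,\mathbb{E}_x\bigl[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]. \tag{8.4.122}
\]

Proof Instead of proving (8.4.122) directly, we first show the (more natural) estimate
\[
\mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] \le C\rho. \tag{8.4.123}
\]

To do so, we use the fact that

\[
\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}] = \mathbb{E}_x\bigl[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr] + \mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr]. \tag{8.4.124}
\]

Define the function

\[
w_{x,\mathcal{M}_x}(y) =
\begin{cases}
\mathbb{E}_y\bigl[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr], & \text{if } y\notin x\cup\mathcal{M}_x,\\
0, & \text{else.}
\end{cases} \tag{8.4.125}
\]

Then $w_{x,\mathcal{M}_x}$ solves the Dirichlet problem

\[
\begin{aligned}
(-Lw_{x,\mathcal{M}_x})(y) &= h_{x,\mathcal{M}_x}(y), & y&\notin x\cup\mathcal{M}_x,\\
w_{x,\mathcal{M}_x}(y) &= 0, & y&\in x\cup\mathcal{M}_x.
\end{aligned} \tag{8.4.126}
\]

Note that

\[
\mathbb{E}_x\bigl[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr] = \mathbb{P}_x(\tau_x<\tau_{\mathcal{M}_x}) - (-Lw_{x,\mathcal{M}_x})(x). \tag{8.4.127}
\]

Next, using reversibility, we have

\[
\sum_y\mu(y)\,h_{x,\mathcal{M}_x}(y)\,(-Lw_{x,\mathcal{M}_x})(y) = \sum_y\mu(y)\,(-Lh_{x,\mathcal{M}_x})(y)\,w_{x,\mathcal{M}_x}(y). \tag{8.4.128}
\]
By the properties of the functions $h_{x,\mathcal{M}_x}$ and $w_{x,\mathcal{M}_x}$, the right-hand side of (8.4.128) vanishes identically, while the left-hand side is equal to

\[
\mu(x)(-Lw_{x,\mathcal{M}_x})(x) + \sum_{y\notin x\cup\mathcal{M}_x}\mu(y)\,h_{x,\mathcal{M}_x}(y)^2 = 0. \tag{8.4.129}
\]

Hence, inserting (8.4.127), we get

\[
\mu(x)\,\mathbb{E}_x\bigl[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr] = \mu(x)\,\mathbb{P}_x(\tau_x<\tau_{\mathcal{M}_x}) + \sum_{y\notin x\cup\mathcal{M}_x}\mu(y)\,h_{x,\mathcal{M}_x}(y)^2. \tag{8.4.130}
\]

With a similar procedure we show that

\[
\mu(x)\,\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}] = \mu(x) + \sum_{y\notin x\cup\mathcal{M}_x}\mu(y)\,h_{x,\mathcal{M}_x}(y). \tag{8.4.131}
\]

Therefore, combining (8.4.124), (8.4.130) and (8.4.131), we get

\[
\mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] = \mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x) + \frac{1}{\mu(x)}\sum_{y\notin x\cup\mathcal{M}_x}\mu(y)\,h_{x,\mathcal{M}_x}(y)\,h_{\mathcal{M}_x,x}(y). \tag{8.4.132}
\]
The first term in the right-hand side is exponentially small. The second term is controlled in the same way as in Lemma 8.24 and is bounded by $C\rho$. Thus, (8.4.123) holds. Finally, $\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]\ge 1$ and so, via (8.4.124), it follows that $\mathbb{E}_x[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]\ge 1-C\rho$, and we deduce (8.4.122).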

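Next, define $T=\mathbb{E}_x[\tau_{\mathcal{M}_x}]$. Then (8.4.122) implies that

\[
\begin{aligned}
T &= \mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] + \mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]\\
&= \mathbb{E}_x\bigl[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] + \mathbb{E}_x\bigl[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr] + \mathbb{P}_x(\tau_x<\tau_{\mathcal{M}_x})\,\mathbb{E}_x[\tau_{\mathcal{M}_x}]\\
&= \mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}] + \mathbb{P}_x(\tau_x<\tau_{\mathcal{M}_x})\,T.
\end{aligned} \tag{8.4.133}
\]

Hence we have
\[
T = \frac{\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]}{\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}. \tag{8.4.134}
\]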
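
The renewal identity (8.4.134) is exact for any finite chain and can be verified by solving three small Dirichlet problems. The Python sketch below is an illustration under assumptions that are not in the text: a lazy Metropolis chain on a short path with a hypothetical energy profile, a single target state standing in for $\mathcal{M}_x$, and a plain Gauss-Jordan solver; all function names are invented for this example.

```python
import math

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                f = M[i][col] / M[col][col]
                for j in range(col, n + 1):
                    M[i][j] -= f * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def metropolis_path(H, beta):
    # Lazy Metropolis transition matrix on the path 0, ..., len(H) - 1 (toy model).
    n = len(H)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                P[i][j] = 0.25 * min(1.0, math.exp(-beta * (H[j] - H[i])))
        P[i][i] = 1.0 - sum(P[i])
    return P

def renewal_quantities(P, x, target):
    n = len(P)
    free = [i for i in range(n) if i not in target]          # states outside the target
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in free] for i in free]
    T = solve(A, [1.0] * len(free))[free.index(x)]           # T = E_x[tau_target]
    inner = [i for i in free if i != x]                      # now x is a boundary point too
    B = [[(1.0 if i == j else 0.0) - P[i][j] for j in inner] for i in inner]
    w = solve(B, [1.0] * len(inner))                         # mean time to hit x or the target
    g = solve(B, [sum(P[i][t] for t in target) for i in inner])  # P_y(tau_target < tau_x)
    # One-step analysis from x: first return time to x or first hit of the target.
    mean_excursion = 1.0 + sum(P[x][i] * w[k] for k, i in enumerate(inner))
    p_success = sum(P[x][t] for t in target) + sum(P[x][i] * g[k] for k, i in enumerate(inner))
    return T, mean_excursion, p_success

P = metropolis_path([0.0, 1.0, 2.0, 1.0, 0.5], beta=2.0)
T, mean_excursion, p_success = renewal_quantities(P, x=0, target={4})
```

Here `mean_excursion` plays the role of $\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]$ and `p_success` that of $\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)$, so `T` should coincide with `mean_excursion / p_success` up to rounding errors.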
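Because $1\ge e^{-x}\ge 1-x$, $x\ge 0$, it follows that the numerator in (8.4.119) satisfies, for $\lambda\le 0$,

\[
\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x) \ge \mathbb{E}_x\bigl[e^{\lambda\hat\tau_{\mathcal{M}_x}}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}\bigr] \ge \mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)\Bigl(1+\lambda\,\frac{\mathbb{E}_x[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}]}{\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]}\Bigr). \tag{8.4.135}
\]
Let us now turn to the denominator in (8.4.119), which we rewrite as

\[
\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)\Bigl(1+\lambda\,\frac{\mathbb{E}_x[(1-e^{\lambda\hat\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\lambda\,\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}\Bigr). \tag{8.4.136}
\]

Combining this with (8.4.135) we get that, for $\lambda\le 0$,

\[
\frac{1}{1+\lambda\,\frac{\mathbb{E}_x[(1-e^{\lambda\hat\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\lambda\,\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}}
\ \ge\ R_x(\lambda)\ \ge\
\frac{1+\lambda\,\frac{\mathbb{E}_x[\tau_{\mathcal{M}_x}\mathbf{1}_{\tau_{\mathcal{M}_x}<\tau_x}]}{\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]}}{1+\lambda\,\frac{\mathbb{E}_x[(1-e^{\lambda\hat\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\lambda\,\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}}. \tag{8.4.137}
\]

What is left to control is

\[
\frac{\mathbb{E}_x[(1-e^{\lambda\hat\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\lambda\,\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}
= -\frac{\mathbb{E}_x[\hat\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}
+ \frac{\mathbb{E}_x[(1+\lambda\hat\tau_x-e^{\lambda\hat\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\lambda\,\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)}. \tag{8.4.138}
\]
The first term is fine because

\[
\frac{\mathbb{E}_x[\hat\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\mathbb{P}_x(\tau_{\mathcal{M}_x}<\tau_x)} = \frac{\mathbb{E}_x[\tau_x\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}]}{\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]}, \tag{8.4.139}
\]

which tends to one rapidly. To deal with the second term, we use that, for $u\le 0$,

\[
0 \ge 1+u\tau_x-e^{u\tau_x} \ge -\tfrac12u^2\tau_x^2. \tag{8.4.140}
\]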

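
Next, we note that, for $u\le 0$,

\[
\begin{aligned}
0 &\ge \mathbb{E}_x\bigl[(1+u\tau_x-e^{u\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]\\
&= \mathbb{E}_x\bigl[(1+u\tau_{x\cup\mathcal{M}_x}-e^{u\tau_{x\cup\mathcal{M}_x}})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr] \ge \mathbb{E}_x\bigl[1+u\tau_{x\cup\mathcal{M}_x}-e^{u\tau_{x\cup\mathcal{M}_x}}\bigr].
\end{aligned} \tag{8.4.141}
\]

Thus we need to control the non-negative function

\[
v_u(y) =
\begin{cases}
\mathbb{E}_y\bigl[e^{u\tau_{x\cup\mathcal{M}_x}}-1-u\tau_{x\cup\mathcal{M}_x}\bigr], & y\notin x\cup\mathcal{M}_x,\\
0, & y\in x\cup\mathcal{M}_x,
\end{cases} \tag{8.4.142}
\]

for $u\sim 1/\mathbb{E}_x[\tau_{\mathcal{M}_x}]$. This goes in a similar fashion as the derivation of the $\ell^\infty$-bounds in Lemma 8.22.
Set $w(y)=\mathbb{E}_y[\tau_{x\cup\mathcal{M}_x}]$. We easily verify that $v_u$ solves the Dirichlet problem

\[
\begin{aligned}
(-L-\bar u)v_u(y) &= u\bar u\,w(y) + (\bar u-u), & y&\notin x\cup\mathcal{M}_x,\\
v_u(y) &= 0, & y&\in x\cup\mathcal{M}_x,
\end{aligned} \tag{8.4.143}
\]

where we set $\bar u=1-e^{-u}$. Note that $\bar u-u\le 0$ for $u\le 0$, and is $O(u^2)$ for small $u$. The Dirichlet problem in (8.4.143) has a unique solution for $\bar u<\lambda_0^{x\cup\mathcal{M}_x}$. Note also that

\[
\mathbb{E}_x\bigl[e^{u\tau_{x\cup\mathcal{M}_x}}\bigr] = e^u\bigl((Lw_u)(x)+1\bigr), \tag{8.4.144}
\]
and
\[
\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}] = (Lw)(x)+1. \tag{8.4.145}
\]
Proceeding as in the proof of Lemma 8.22, we obtain the following estimates.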

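Lemma 8.49 Assume that $\bar u<\sup_{z\notin x\cup\mathcal{M}_x}\mathbb{E}_z[\tau_{x\cup\mathcal{M}_x}]$. Then, for any $y$ and $u\le 0$,

\[
0 \le v_u(y) \le u\bar u\,\mathbb{E}_y[\tau_{x\cup\mathcal{M}_x}]\sup_{z\notin x\cup\mathcal{M}_x}\mathbb{E}_z[\tau_{x\cup\mathcal{M}_x}] + (\bar u-u)\,\mathbb{E}_y[\tau_{x\cup\mathcal{M}_x}]. \tag{8.4.146}
\]

Proof Rewrite the first line of (8.4.143) as

\[
(-Lv_u)(y) = \bar u\,v_u(y) + u\bar u\,w(y) + (\bar u-u), \tag{8.4.147}
\]

and solve for $v_u$. This gives, for $u\le 0$,

\[
\begin{aligned}
v_u(y) &= \bar u\sum_{z\notin x\cup\mathcal{M}_x}G_{x\cup\mathcal{M}_x}(y,z)\bigl(v_u(z)+u\,w(z)+(1-u/\bar u)\bigr)\\
&\le u\bar u\sup_{z\notin x\cup\mathcal{M}_x}w(z)\,\mathbb{E}_y[\tau_{x\cup\mathcal{M}_x}] + (\bar u-u)\,\mathbb{E}_y[\tau_{x\cup\mathcal{M}_x}],
\end{aligned} \tag{8.4.148}
\]

which yields the claim.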

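A rough estimate gives

\[
\sup_{y\notin x\cup\mathcal{M}_x}\mathbb{E}_y[\tau_{x\cup\mathcal{M}_x}] \le |S|\sup_{y\notin x\cup\mathcal{M}_x}\frac{\mu(y)}{\mathrm{cap}(y,\mathcal{M}_x\cup x)} \le |S|/\lambda_0^{x\cup\mathcal{M}_x} \le \rho/\lambda_0^{\mathcal{M}_x}, \tag{8.4.149}
\]
where the last inequality uses the assumption that $\lambda_0^{x\cup\mathcal{M}_x}\ge\delta^{-1}\lambda_0^{\mathcal{M}_x}$ with $\delta|S|\le\rho$. Finally, we recall that $\lambda_0^{\mathcal{M}_x}\sim 1/\mathbb{E}_x[\tau_{\mathcal{M}_x}]$. Hence, for $u=\lambda/\mathbb{E}_x[\tau_{\mathcal{M}_x}]$, $\lambda\le 0$,

\[
\Bigl|\mathbb{E}_x\bigl[(1+\lambda\hat\tau_x-e^{\lambda\hat\tau_x})\mathbf{1}_{\tau_x<\tau_{\mathcal{M}_x}}\bigr]\Bigr| \le C\lambda^2\rho\,\frac{\mathbb{E}_x[\tau_{x\cup\mathcal{M}_x}]}{\mathbb{E}_x[\tau_{\mathcal{M}_x}]}. \tag{8.4.150}
\]

Inserting this bound into (8.4.138), we see that the second term in the right-hand side is bounded in absolute value by $|\lambda|C\rho$, which tends to zero as desired. But this implies the assertion of Lemma 8.47.

This concludes the proof of Theorem 8.45.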

8.5 Metastability in uncountable state spaces

In Sects. 8.2–8.4 we have seen that the definition of metastability in Definition 8.2
through capacities leads to precise predictions on the connection between capacities,
metastable exit times and small eigenvalues of the generator. These results relied on
renewal arguments that are only available in the context of countable state spaces.
For uncountable state spaces, however, renewal arguments are not possible because
capacities of single points are either zero or are much smaller than capacities of
small neighbourhoods around them. This means that the process may take a very
long time to hit a point, even when it has already reached its neighbourhood. A
proper definition of metastable sets must look like (8.1.3), but with a suitable and
model-dependent choice of the sets Bi , i ∈ I .
In Chap. 11 we explain in detail, within the context of diffusion processes, how
the theory developed above carries over, provided the sets Bi , i ∈ I , can be chosen in
such a way that solutions of the Dirichlet problems we have encountered are almost
constant on these sets. At present this is the only example where the full picture
established in the present chapter has been carried over. For this reason, we do not
formulate an abstract model-independent result, but rather provide the details in the
specific context of Chap. 11.
An interesting setting where we would like to establish similar results is that of interacting particle systems with state spaces like $S=\{-1,+1\}^\Lambda$ with $\Lambda\subset\mathbb{Z}^d$. In this situation, only partial results in specific models are presently available. These are the random-field Curie-Weiss model (Chap. 15), when $\Lambda=\{1,\dots,N\}$ with $N\to\infty$, as well as the Ising model with Glauber dynamics (Chap. 19) and the lattice-gas model with Kawasaki dynamics (Chap. 20), both at low temperature, when the size of $\Lambda$ diverges as the temperature tends to zero.

8.6 Bibliographical notes

1. The material in this chapter was developed by Bovier, Eckhoff, Gayrard and
Klein [34]. The exposition here is streamlined.

2. A somewhat opposite approach, connecting metastability to spectral data, was

initiated in a series of papers by Davies [70–74] and developed further by Gaveau
and Schulman [124] and Gaveau and Moreau [123]. Here, the starting point is a set
of assumptions on the properties of eigenvalues and eigenfunctions of the generator
of the Markov process that imply certain metastable behaviour. Low-lying eigenval-
ues are related to metastable time scales and the corresponding eigenfunctions are
related to metastable states. The problem is, however, that the computation of the
spectrum of a generator and of its eigenfunctions is difficult, and even numerically
prohibitive for large systems.

3. In much the same way as in Theorem 8.15, conditional mean hitting times such as
Ex [τJ |τJ ≤ τI ] can be computed (for an example see (8.4.37)). Formulas are given
in Bovier, Eckhoff, Gayrard and Klein [33, 34].

4. The characterisation of eigenvalues given in Lemma 8.19 was first exploited in

Bovier, Eckhoff, Gayrard and Klein [34], but similar ideas were put forward earlier
in Wentzel [233, 234].

5. The non-degeneracy conditions of Theorem 8.34, ensuring the simplicity of

eigenvalues, can be lifted at the expense of more work. Some examples have been
worked out in the context of finite-state Markov processes with exponentially small
transition probabilities by Berglund and Dutercq [20] and by Cirillo and Nardi [65].

6. A similar approach to metastability for continuous-time Markov processes, based

on potential theory combined with martingale convergence theory, has been taken
by Beltrán and Landim [16, 17].

7. A probabilistic proof of Lemma 8.5 (with slightly different bounds) was given
in Bovier, Eckhoff, Gayrard and Klein [34]. An analytic proof can be found in
Slowik [219].

8. The transition path theory championed by E and Vanden-Eijnden [100] analyses

path properties of metastable systems via Doob transforms based on the assumption
that the equilibrium potentials (“committors” in their terminology) are known. They
use numerical methods to compute equilibrium potentials in concrete models.

9. The proof of the asymptotic exponential law for metastable exit times given in
Sect. 8.4.4 can be extended to settings where only approximate renewal properties
hold. See, for instance, Bianchi, Bovier and Ioffe [25]. An alternative proof in the
discrete setting can be found in Bovier, Eckhoff, Gayrard and Klein [34]. An elegant
earlier proof was given by Martinelli and Scoppola [177] and Martinelli, Olivieri
and Scoppola [174, 175].
Chapter 9
Basic Techniques

Il y a peu de choses impossibles d’elles-mêmes; et l’application

pour les faire réussir nous manque plus que les moyens.
(François de La Rochefoucauld, Réflexions)

This chapter collects techniques that are basic for the study of metastability and that
will be used throughout Parts IV–VIII. Section 9.1 focusses on capacity estimates,
and derives upper and lower bounds on capacities with the help of variational princi-
ples, namely, the Dirichlet principle and the Berman-Konsowa principle introduced
in Sect. 7.3. We outline the strategies that are used to exploit these variational prin-
ciples in an efficient way to derive matching upper and lower bounds. Section 9.2
introduces the notion of coarse-graining, which is particularly useful for mean-field
systems. Section 9.3 states conditions under which the Markov property is preserved
when states are lumped. Section 9.4 deals with regularity estimates for harmonic
functions with the help of elliptic regularity theory and coupling.

9.1 Capacity estimates

We have seen in Chap. 8 that in metastable dynamics the computation of key quan-
tities such as hitting probabilities and mean hitting times can be largely reduced
to the computation of capacities. The usefulness of this reduction depends on how
well we can compute capacities. While clearly the universality of our approach ends
here, and model-specific properties have to come into the game, it is both surprising
and rewarding that precise computations of capacities are possible in a multitude of
specific systems, as we will come to appreciate as we go along.
In Sect. 9.1.1 we outline general strategies to obtain upper and lower bounds on
capacities via the Dirichlet principle. In Sect. 9.1.2 we concentrate on lower bounds
on capacities via the Berman-Konsowa principle.

© Springer International Publishing Switzerland 2015 227

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_9

9.1.1 General strategies

The key to success is the variational representation of capacities through the Dirich-
let principle stated in Theorem 7.33. The Dirichlet principle immediately yields two
avenues towards bounds:
(i) Upper bounds via judiciously chosen test potentials.
(ii) Lower bounds via monotonicity of the Dirichlet form in the transition probabil-
ities.
These two avenues are well-known and give rise to what are called “Rayleigh’s
short-cut rules” in the language of electric networks (see e.g. Doyle and Snell [92]
and references therein). In the context of metastable systems, their usefulness can
be enhanced by an iterative method. Moreover, the lower bound can be tackled with
the help of the Thomson principle or the Berman-Konsowa principle, of which the
latter will turn out to be especially powerful.
The key idea of iteration is to get control of the minimiser in the Dirichlet
principle, i.e., the equilibrium potential. In metastable systems, when we are interested
in computing for instance the capacity $\mathrm{cap}(B_x, B_y)$ for a pair of metastable sets
$B_x, B_y \subset S$, our initial goal will always be to identify domains where $h_{B_x,B_y}$ is
close to 0 or close to 1. This can be done with the help of the renewal estimate in
Lemma 8.4. While this approach looks circular at first glance (we need to
know capacities to estimate equilibrium potentials, which we in turn need to estimate
capacities), it turns out to yield a tool that turns “poor” bounds into “less poor” bounds.
Thus, the first steps in the program are:
(i) Guess a test potential that produces a good upper bound based on proper intu-
ition.
(ii) Simplify the state space to obtain a system that can be solved exactly for the
lower bound. In many examples, this leads to a one-dimensional or quasi-one-
dimensional system.
(iii) Insert the resulting bounds into (8.2.1) to obtain bounds on the harmonic func-
tion.
Using the bounds on the harmonic function, we can identify the sets

\[
D_x = \bigl\{ z \in S \colon h_{B_x,B_y}(z) < \delta \bigr\}
\quad\text{and}\quad
D_y = \bigl\{ z \in S \colon h_{B_x,B_y}(z) > 1 - \delta \bigr\},
\tag{9.1.1}
\]

for $0 < \delta \ll 1$ suitably chosen. If $S \setminus (D_x \cup D_y)$ contains no further metastable set,
then we define

\[
I = \Bigl\{ z \in S \setminus (D_x \cup D_y) \colon \mu(z) < \rho \sup_{w \in S \setminus (D_x \cup D_y)} \mu(w) \Bigr\}
\tag{9.1.2}
\]

for $0 < \rho \ll 1$ suitably chosen, with $\mu$ the invariant measure on $S$. The idea is
that the set $I$ will be irrelevant for the value of the capacity, no matter what values
$h_{B_x,B_y}$ takes on $I$, and that the sets $D_x$ and $D_y$ give no contribution to the capacity
to leading order. The only problem therefore is to find the equilibrium potential,
or a reasonably good approximation of it, on the set $S^\star = S \setminus (D_x \cup D_y \cup I)$ (see
Fig. 9.1). We return to this problem shortly.

Fig. 9.1 Schematic picture of the harmonic function $h_{B_x,B_y}$: trivial on $D_x$ and $D_y$, irrelevant on
$I$, and nontrivial on $S^\star$. In many applications, $S^\star$ is small and well structured

Remark 9.1 Of course, the above approach can only make sense when the sets $D_x$
and $D_y$ are connected through $S^\star$. If that is not the case, then we will have to analyse
the set $S \setminus (D_x \cup D_y)$ more carefully. If $S \setminus (D_x \cup D_y)$ contains further metastable sets
$B_w$, then it will be possible to identify domains $D_w \supset B_w$ on which $h_{B_x,B_y}$ takes
a constant value $c_w$ (to be determined later). Note that this can be done again with
the help of the renewal bounds we encountered in Chap. 8. The starting point is the
observation that

\[
\begin{aligned}
h_{B_x,B_y}(z) &= P_z(\tau_{B_x} < \tau_{B_y}) \\
&= P_z(\tau_{B_x} < \tau_{B_y},\, \tau_{B_w} < \tau_{B_x}) + P_z(\tau_{B_x} < \tau_{B_y},\, \tau_{B_w} \ge \tau_{B_x}) \\
&\sim P_z(\tau_{B_w} < \tau_{B_x \cup B_y})\, P_w(\tau_{B_x} < \tau_{B_y}) \\
&= P_z(\tau_{B_w} < \tau_{B_x \cup B_y})\, c_w,
\end{aligned}
\tag{9.1.3}
\]

where we anticipate that $P_w(\tau_{B_x} < \tau_{B_y}) \sim P_z(\tau_{B_x} < \tau_{B_y}) = c_w$ for all $z \in B_w$
(and later for all $z \in D_w$). Thus, the problem (to be solved with the help of
the a priori bounds on the equilibrium potential in Lemma 8.4 and the a priori
bounds on capacities in Lemma 8.5) is to determine the set of points $z$ for which
$P_z(\tau_{B_w} < \tau_{B_x \cup B_y}) > 1 - \delta$. Once this is done, we proceed as before, after increasing
the set $D_x \cup D_y$ in the definition of $I$ to $D = D_x \cup D_y \cup D_{w_1} \cup \dots \cup D_{w_k}$ when
$k$ such sets can be identified. It should then be the case that the set $D \cup S^\star$ is
connected. The remaining problem consists in determining the equilibrium potential on
the set $S^\star$ and the values $c_{w_1}, \dots, c_{w_k}$. At this stage we can obtain upper and lower
bounds in terms of variational formulas that involve only the set $S^\star$. To what extent
these variational formulas can be solved depends on the situation at hand.

To obtain an upper bound, we choose a test function $h^+$ with the properties

\[
\begin{aligned}
h^+(z) &= 1, & z &\in D_x, \\
h^+(z) &= 0, & z &\in D_y, \\
h^+(z) &= c_{w_i}, & z &\in D_{w_i}, \quad i = 1, \dots, k,
\end{aligned}
\tag{9.1.4}
\]

where the constants $c_{w_i}$ are determined later. On $I$ the function $h^+$ can be chosen
essentially arbitrarily, while on $S^\star$ it must be chosen such that it optimises the
restriction of the Dirichlet form to $S^\star$ with boundary conditions implied by (9.1.4).
Finally, the constants $c_{w_i}$, $i = 1, \dots, k$, are determined by minimising the outcome
as a function of these constants.
A first strategy to obtain matching lower bounds that works well in many
situations goes as follows. If $h^*$ denotes the true minimiser in the Dirichlet form on $S$,
then

\[
\mathcal{E}(h^*, h^*) \ge \mathcal{E}_{S^\star}(h^*, h^*),
\tag{9.1.5}
\]

where $\mathcal{E}_{S^\star}$ is the restriction of the Dirichlet form to $S^\star$, i.e.,

\[
\mathcal{E}_{S^\star}(h, h) = \tfrac12 \sum_{x \in S^\star \text{ and/or } y \in S^\star} \mu(x)\, p(x, y)\, \bigl[h(x) - h(y)\bigr]^2,
\tag{9.1.6}
\]

where for the sake of exposition we focus on the case of countable $S$. We minorise
$\mathcal{E}_{S^\star}(h^*, h^*)$ by taking the infimum over all $h$ on $S^\star$, with boundary conditions
imposed by what we know a priori about the equilibrium potential. In particular, we
know that these boundary conditions are close to constants on the different com-
ponents of D. Of course, we do not know the constants cwi , i = 1, . . . , k, but by
minimising the result over their possible values at the end we get a lower bound.
Thus, if we can show that the minimisers in the lower bound differ little from the
minimisers with constant boundary conditions, then we get upper and lower bounds
that coincide up to small error terms. In general it may be difficult to compute these
minimisers. However, in metastable systems typically the problem reduces in com-
plexity compared to the original problem, and in many instances can be solved ex-
plicitly.
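
The interplay between test potentials and the true minimiser can be sketched numerically in the simplest setting, a nearest-neighbour chain on {0, ..., n}, where the capacity between the end points is given by a series formula (Sect. 7.1). The Python sketch below is an illustration only: the Metropolis transition probabilities, the double-well energy V and all function names are assumptions made for this example and do not come from the text.

```python
import math

def conductances(V, beta):
    """c(x, x+1) = mu(x) p(x, x+1) for a Metropolis chain with mu = exp(-beta V) (unnormalised)."""
    mu = [math.exp(-beta * v) for v in V]
    # Metropolis: p(x, x+1) = 1/2 min(1, mu(x+1)/mu(x)), so mu(x) p(x, x+1) = 1/2 min(mu(x), mu(x+1)).
    return [0.5 * min(mu[x], mu[x + 1]) for x in range(len(V) - 1)]

def exact_capacity(c):
    """Capacity between the end points of a path graph: conductances in series."""
    return 1.0 / sum(1.0 / cx for cx in c)

def dirichlet_form(c, h):
    """E(h, h) = sum over edges of c(x, x+1) [h(x) - h(x+1)]^2."""
    return sum(cx * (h[x] - h[x + 1]) ** 2 for x, cx in enumerate(c))

def equilibrium_potential(c):
    """Harmonic function with h(0) = 1 and h(n) = 0."""
    total = sum(1.0 / cx for cx in c)
    h, acc = [1.0], 0.0
    for cx in c:
        acc += 1.0 / cx
        h.append(1.0 - acc / total)
    return h

n, beta = 20, 8.0
# Hypothetical double-well energy with wells at x = 0 and x = n and a barrier at x = n/2.
V = [((2.0 * x / n - 1.0) ** 2 - 1.0) ** 2 for x in range(n + 1)]
c = conductances(V, beta)
cap = exact_capacity(c)

linear = [1.0 - x / n for x in range(n + 1)]                # crude test potential
step = [1.0 if x < n // 2 else 0.0 for x in range(n + 1)]   # drops across one edge at the barrier

upper_linear = dirichlet_form(c, linear)
upper_step = dirichlet_form(c, step)
upper_exact = dirichlet_form(c, equilibrium_potential(c))
```

Every test potential with the correct boundary values gives an upper bound on the capacity. In this example the potential that is constant in the wells and drops at the barrier does far better than the linear interpolation, and the equilibrium potential reproduces the capacity itself.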

9.1.2 Lower bounds via flows

The method to get lower bounds described above leaves open the problem of how
to obtain a lower bound for the reduced Dirichlet form $\mathcal{E}_{S^\star}$. Sometimes this can be
done by setting sufficiently many transition probabilities p(x, y) equal to 0 until the
remaining Dirichlet problem can be solved exactly by hand (for instance, because it
corresponds to the one-dimensional chains discussed in Sects. 7.1 and 7.2).
A more versatile and systematic tool is provided by the variational principles
of Thomson and Berman-Konsowa presented in Sect. 7.3. The latter appears to be
particularly suitable, and we describe now the strategies used in exploiting it.
The Berman-Konsowa principle yields a lower bound whenever we insert some
flow. However, guessing a flow is much harder than guessing a potential, due to the
local constraints imposed by Kirchhoff’s law. In principle we would like to guess a
good approximation of the harmonic flow, but this is not easy in practice. A natural
idea is to use the approximate harmonic function that was guessed in the derivation

of the upper bound to produce an approximation of the harmonic flow. But since an
approximate harmonic function is not harmonic, it is not straightforward how to get
a flow out of it. It is useful to inspect the proof of the Berman-Konsowa principle in
Theorem 7.43 to see what flexibility we have in playing with the test flow. The flow
property is used only in the proof of Lemma 7.41, more precisely, in the derivation
of (7.3.28). We therefore define the notion of a defective flow.

Definition 9.2 (Defective loop-free unit flow) Let $\Gamma = (S, E)$ be a graph with
vertex set $S$ and edge set $E$. Let $A, B \subset S$ be non-empty and disjoint. A map
$f \colon E \to \mathbb{R}$ is called a defective loop-free unit flow from $A$ to $B$ if:
(i) There exists a defect function $\delta \colon S \to \mathbb{R}$ such that

\[
\sum_{y \in S \colon (y,x) \in E} f(y, x) = \sum_{z \in S \colon (x,z) \in E} f(x, z) + \delta(x),
\qquad x \in S \setminus (A \cup B).
\tag{9.1.7}
\]

(ii) The total flow out of $A$ is equal to 1, i.e.,

\[
\sum_{x \in A} \sum_{z \in S \colon (x,z) \in E} f(x, z) = 1.
\tag{9.1.8}
\]

(iii) Any path $\gamma$ from $A$ to $B$ such that $f(e) > 0$ for all $e \in \gamma$ is self-avoiding. In
particular, if $f((x, y)) > 0$, then $f((y, x)) = 0$.

Given a defective loop-free unit flow $f$, we can construct a Markov chain with
transition rates

\[
q^f(x, y) = \frac{f((x, y))}{F(x)},
\tag{9.1.9}
\]

where $F(x) = \sum_{y \in S} f((x, y))$, and with initial distribution

\[
P^f(X_0 = x) = F(x)\, 1_{x \in A}.
\tag{9.1.10}
\]

Here, without loss of generality, we assume that $F(x) > 0$ for all $x \in S$. We denote
the law of this Markov chain by $P^f$, and define the sets $A_n$, $n \in \mathbb{N}_0$, with $A_0 = A$, as
in (7.3.31) in Sect. 7.3. Recall that $n^*(z)$ is the unique value of $n$ such that $z \in A_n$.
The elementary estimate that results is the following.

Lemma 9.3 Let $f$ be a defective loop-free unit flow from $A$ to $B$. Then

\[
P^f(\tau_z < \tau_B) \le \prod_{k=1}^{n^*(z)} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr) F(z),
\qquad z \in S,
\tag{9.1.11}
\]

where $a_+ = \max(a, 0)$.


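Proof The proof of this estimate proceeds by induction on $n$ in exactly the same
way as the proof of Lemma 7.41. We know that (9.1.11) holds for all $z \in A_0$ because
$F = 1$ on $A_0$. We assume that (9.1.11) holds for all $z \in A_k$ with $0 \le k \le n$. Then
the recursion in (7.3.33) yields, for $z \in A_{n+1}$,

\[
\begin{aligned}
P^f(\tau_z < \tau_B)
&= \sum_{y \in A_0 \cup \dots \cup A_n} P^f(\tau_y < \tau_B)\, q^f(y, z) \\
&\le \sum_{k=0}^{n} \prod_{\ell=1}^{k-1} \Bigl( 1 + \max_{y \in A_\ell} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr) \sum_{y \in A_k} f(y, z) \\
&\le \prod_{k=1}^{n-1} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr) \sum_{y \in S} f(y, z) \\
&\le \prod_{k=1}^{n} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr) F(z),
\end{aligned}
\tag{9.1.12}
\]

where the first inequality uses (9.1.9) and the induction hypothesis, and the last
inequality uses (9.1.7).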

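Recalling (7.3.29) and using (9.1.9) and (9.1.11), we have

\[
P^f\bigl((x, y) \in \gamma\bigr) \le \prod_{k=1}^{n^*(x)-1} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr) f(x, y),
\tag{9.1.13}
\]

and hence

\[
\frac{f((x, y))}{P^f((x, y) \in \gamma)}\, 1_{f((x,y)) > 0}
\ge \prod_{k=1}^{n^*(x)-1} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr)^{-1}.
\tag{9.1.14}
\]

Inserting this lower bound into the Dirichlet form in (7.3.38), we get

\[
\mathcal{E}(h, h) \ge \prod_{k=1}^{M} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr)^{-1}
E^f \Bigl[ \sum_{(x,y) \in \gamma} \frac{\mu(x)\, p(x, y)}{f((x, y))} \bigl[h(x) - h(y)\bigr]^2 \Bigr]
\tag{9.1.15}
\]

with $M = \max_{x \in S} n^*(x)$. Taking the infimum over $h$, we get the following lower
bound on the capacity (recall (7.3.39)).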

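Lemma 9.4 (Berman-Konsowa defective flow bound) Let $f$ be a defective loop-free
unit flow from $A$ to $B$ with defect function $\delta$. Then

\[
\mathrm{cap}(A, B) \ge \prod_{k=1}^{M} \Bigl( 1 + \max_{y \in A_k} \Bigl[ \frac{\delta(y)}{F(y)} \Bigr]_+ \Bigr)^{-1}
E^f \Bigl[ \Bigl( \sum_{(x,y) \in \gamma} \frac{f((x, y))}{\mu(x)\, p(x, y)} \Bigr)^{-1} \Bigr].
\tag{9.1.16}
\]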
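
For a flow without defect (δ ≡ 0) the prefactor in the bound is 1 and only the path expectation remains. The following Python sketch evaluates that expectation on a small network and compares it with the Thomson bound and with the exact capacity. The network, its conductances c(e) = μ(x)p(x, y) and all function names are hypothetical choices made for this illustration.

```python
def thomson_bound(flow, cond):
    """Thomson principle: cap >= 1 / sum_e f(e)^2 / c(e) for a unit flow f."""
    return 1.0 / sum(f * f / cond[e] for e, f in flow.items())

def berman_konsowa_bound(flow, cond, source, sink):
    """E^f[(sum_{e in gamma} f(e)/c(e))^{-1}] for a loop-free unit flow without defect."""
    out = {}
    for (x, y), fxy in flow.items():
        if fxy > 0.0:
            out.setdefault(x, []).append((y, fxy))

    def walk(x, prob, weight):
        # prob: probability of the path so far under q^f; weight: sum of f(e)/c(e) along it.
        if x == sink:
            return prob / weight
        F = sum(fxy for _, fxy in out[x])
        return sum(walk(y, prob * fxy / F, weight + fxy / cond[(x, y)]) for y, fxy in out[x])

    return walk(source, 1.0, 0.0)

# Hypothetical network: two branches a-u-w and a-v-w that merge before the common edge w-b.
cond = {("a", "u"): 1.0, ("u", "w"): 2.0, ("a", "v"): 0.5, ("v", "w"): 1.0, ("w", "b"): 3.0}
R1 = 1.0 / cond[("a", "u")] + 1.0 / cond[("u", "w")]   # resistance of the upper branch
R2 = 1.0 / cond[("a", "v")] + 1.0 / cond[("v", "w")]   # resistance of the lower branch
exact = 1.0 / (R1 * R2 / (R1 + R2) + 1.0 / cond[("w", "b")])

def unit_flow(theta):
    """Send a fraction theta through the upper branch and 1 - theta through the lower one."""
    return {("a", "u"): theta, ("u", "w"): theta,
            ("a", "v"): 1.0 - theta, ("v", "w"): 1.0 - theta, ("w", "b"): 1.0}

harmonic_theta = R2 / (R1 + R2)   # split of the harmonic flow
```

For any split θ the path bound lies between the Thomson bound and the exact capacity (by Jensen's inequality), and it becomes exact when θ is the split of the harmonic flow.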

Situations where we may want to apply Lemma 9.4 arise when an approximate
harmonic function has been guessed in the upper bound. Given such a function, say
$g$, we may define

\[
f(x, y) = N(g)^{-1} \mu(x)\, p(x, y)\, \bigl[g(y) - g(x)\bigr]_+,
\tag{9.1.17}
\]

where $N(g)$ is a normalising constant that fixes the total outgoing flow from $A$ to
be 1, and we may write, using reversibility,

\[
\begin{aligned}
\sum_{z \in S} f(z, w)
&= N(g)^{-1} \sum_{z \in S,\, (z,w) \in E,\, g(w) \ge g(z)} \mu(z)\, p(z, w)\, \bigl[g(w) - g(z)\bigr] \\
&= N(g)^{-1} \sum_{z \in S,\, (z,w) \in E} \mu(w)\, p(w, z)\, \bigl[g(w) - g(z)\bigr] \\
&\quad + N(g)^{-1} \sum_{z \in S,\, (z,w) \in E,\, g(w) < g(z)} \mu(w)\, p(w, z)\, \bigl[g(z) - g(w)\bigr] \\
&= N(g)^{-1} \mu(w)\, (\mathcal{L} g)(w) + F(w),
\end{aligned}
\tag{9.1.18}
\]

i.e., the defect function is given by $\delta(w) = N(g)^{-1} \mu(w) (\mathcal{L} g)(w)$. In Chap. 10 we
will encounter a set-up where this strategy works nicely. In general, however, there is
quite a bit of artistry involved in working out good test flows. We will see examples
in Chaps. 15, 19 and 20.
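
The construction of a test flow from a guessed potential can be tried out on a path graph {0, ..., n} with A = {0} and B = {n}, where there is a single path γ and A_k = {k}. The Python sketch below is a toy illustration under these assumptions: the conductances are hypothetical, g is taken increasing from A to B so that the positive part in the definition of the flow is non-zero, the defect is read off directly as inflow minus outflow, and the product in the bound of Lemma 9.4 is taken over the interior sites, where the defect is defined.

```python
def flow_from_potential(c, g):
    """f(x, x+1) proportional to c(x, x+1) [g(x+1) - g(x)]_+, normalised to unit flow out of 0."""
    raw = [cx * max(g[x + 1] - g[x], 0.0) for x, cx in enumerate(c)]
    return [fx / raw[0] for fx in raw]

def defect(f):
    """delta(x) = inflow - outflow at the interior sites x = 1, ..., n-1."""
    return [f[x - 1] - f[x] for x in range(1, len(f))]

def defective_flow_bound(c, f):
    """Lower bound of Lemma 9.4 on a path graph, where gamma is the only path."""
    prefactor = 1.0
    for x, d in enumerate(defect(f), start=1):
        prefactor *= 1.0 + max(d / f[x], 0.0)   # F(x) = f(x, x+1) on a path graph
    return 1.0 / (prefactor * sum(fx / cx for fx, cx in zip(f, c)))

c = [1.0, 0.2, 0.05, 0.3, 2.0]   # hypothetical conductances mu(x) p(x, x+1)
n = len(c)
total = sum(1.0 / cx for cx in c)
exact = 1.0 / total              # series formula for the capacity

g_linear = [x / n for x in range(n + 1)]
g_harmonic = [sum(1.0 / cx for cx in c[:x]) / total for x in range(n + 1)]

bound_linear = defective_flow_bound(c, flow_from_potential(c, g_linear))
bound_harmonic = defective_flow_bound(c, flow_from_potential(c, g_harmonic))
```

With the linear guess the flow has a sizeable defect and the bound loses a factor of about three against the exact capacity. With the harmonic potential the defect vanishes and the bound is sharp.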

9.2 Coarse-graining

One of the heuristic ideas in physics is that of renormalisation or coarse-graining.

Here, the hope is that a system defined on the level of microscopic variables (such
as particles or spins), can be effectively described by a lower-dimensional sys-
tem that captures only the mesoscopic variables (such as block-densities or block-
magnetisations). If we want to understand the behaviour of such systems at mod-
erate temperatures, then a coarse-grained description becomes imperative, since on
the microscopic level the competition between energy and entropy does not allow
for a proper understanding of metastable states and their transition paths.
The paradigmatic example is the Curie-Weiss model (to be discussed in
Chap. 13). This offers the first link between the dynamics of a spin system and
the Kramers diffusion mentioned in Sect. 2.1.1. The microscopic model is a spin
system with state space $\{-1, +1\}^N$ whose state $\sigma(t) = \{\sigma_i(t)\}_{i=1}^N$ at time $t$ evolves
as a Markov process with transition rates that depend only on the total magnetisation
$m_N(\sigma(t)) = \frac{1}{N} \sum_{i=1}^N \sigma_i(t)$. The dynamics is chosen to be reversible with respect to
a Gibbs measure $\mu_{N,\beta}(\sigma) = Z_N^{-1} \exp[-\beta H_N(\sigma)]$, where the Hamiltonian

\[
H_N(\sigma) = -\tfrac12 N m_N(\sigma)^2 - h N m_N(\sigma)
\tag{9.2.1}
\]

Fig. 9.2 Coarse-graining from spin configuration to magnetisation

is a function of the total magnetisation. In this case it is easy to verify that

\[
m_N(t) = m_N\bigl(\sigma(t)\bigr)
\tag{9.2.2}
\]

is again a Markov process, this time on the much smaller state space
$\{-1, -1 + 2N^{-1}, \dots, 1 - 2N^{-1}, 1\}$ (see Fig. 9.2), and is reversible with respect to the measure
$\exp[-N f_{\beta,N}(m)]$, where $f_{\beta,N}$ is a double-well potential for $\beta > 1$ and $h$ small
enough (for more details, see Fig. 13.1). This Markov process is a nearest-neighbour
random walk that is attracted to the local minima of the function fβ,N and behaves
similarly to the Kramers diffusion. The key point here is that the effective inverse
temperature in the coarse-grained model is of order N , i.e., entropic effects on the
mesoscopic scale have become marginal compared to the original model, and can
be ignored in the limit as N → ∞.
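
The double-well shape of the coarse-grained landscape can be checked directly for moderate N. The Python sketch below is a rough illustration: it takes f(m) to be minus 1/N times the logarithm of the unnormalised Gibbs weight of all configurations with magnetisation m, which is an assumption about the normalisation of the potential, and the function names are invented for this example.

```python
import math

def f_beta_N(N, beta, h):
    """Grid of magnetisations m and f(m) = -(1/N) log[ #{sigma: m_N(sigma) = m} exp(-beta H_N) ]."""
    ms, fs = [], []
    for k in range(N + 1):                      # k = number of +1 spins
        m = -1.0 + 2.0 * k / N
        energy = -0.5 * N * m * m - h * N * m   # Curie-Weiss Hamiltonian as a function of m
        log_weight = math.log(math.comb(N, k)) - beta * energy
        ms.append(m)
        fs.append(-log_weight / N)
    return ms, fs

def local_minima(N, beta, h):
    """Interior grid points at which f is strictly smaller than at both neighbours."""
    ms, fs = f_beta_N(N, beta, h)
    return [ms[k] for k in range(1, N) if fs[k] < fs[k - 1] and fs[k] < fs[k + 1]]

wells_low_temperature = local_minima(100, 1.5, 0.0)    # beta > 1: two symmetric wells
wells_high_temperature = local_minima(100, 0.5, 0.0)   # beta < 1: a single well at m = 0
```

For β = 1.5 and h = 0 the two wells sit close to the non-zero solutions of the mean-field equation m = tanh(βm), while for β = 0.5 the only minimum is at m = 0.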
We would like to consider a similar mapping down to mesoscopic variables in
similar situations. This is called lumping in the theory of Markov chains, and is ex-
tensively discussed in the monograph by Kemeny and Snell [151]. Let us briefly
state the main results. The technique is still on the level of an art, and will be illus-
trated in the example of the random-field Curie-Weiss model treated in Chaps. 14
and 15.

9.3 Lumping

Consider a Markov process X = (Xt )t∈R+ on some state space S. Let T be some
other state space, and let f be a map from S to T . Then Y = (Yt )t∈R+ with Yt =
f (Xt ) is again a stochastic process. Typically, Y is not Markov, but there are easy
conditions under which it is (see Burke and Rosenblatt [43]).

Theorem 9.5 (Preservation of Markov property under lumping) Let $P$ be the law of
a Markov process $X = (X_t)_{t \in \mathbb{R}_+}$ with state space $(S, \mathcal{B}(S))$ and stationary
transition kernels $P_t$, $t \ge 0$. Let $\mathcal{F}_t = \sigma(X_s, 0 \le s \le t)$, $t \in \mathbb{R}_+$, be the $\sigma$-algebra
generated by $X$ up to time $t$. Let $(T, \mathcal{B}(T))$ be a measurable space and $f$ a measurable
map from $S$ to $T$. Then $Y = (Y_t)_{t \in \mathbb{R}_+}$ with $Y_t = f(X_t)$ is a Markov process when
for every $t \ge 0$ and $B \in \mathcal{B}(T)$ the maps

\[
P\bigl(f(X_t) \in B \mid \mathcal{F}_s\bigr), \qquad 0 \le s < t,
\tag{9.3.1}
\]

are measurable with respect to the $\sigma$-algebra generated by $Y$ up to time $s$. In that
case, the transition kernels $R_t$ of the Markov process $Y$ are given by

\[
R_t(B, y) = P_t\bigl(f^{-1}(B), x\bigr), \qquad B \in \mathcal{B}(T),\ y \in T,\ \forall\, x \in S \colon f(x) = y.
\tag{9.3.2}
\]

Fig. 9.3 Lumping: map with symmetry

In words, the image process Y is Markov when the original process X is Markov
and has a high degree of symmetry (see Fig. 9.3).
In the case of a countable state space, the conditions of Theorem 9.5 can be
restated as saying that, if $p(x, x')$, $x, x' \in S$, are the transition probabilities (or
transition rates) of $X$, then for any $y, y' \in T$ and $x \in S$ such that $f(x) = y$ the formula

\[
r(y, y') = \sum_{x' \in S \colon f(x') = y'} p(x, x')
\tag{9.3.3}
\]

is independent of the specific choice of $x$. Clearly, if $X$ has invariant measure $\mu$,
then $Y$ has invariant measure $\nu = \mu \circ f^{-1}$. Also, if $X$ is reversible with respect to
$\mu$, then $Y$ is reversible with respect to $\nu$.
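
The countable-state criterion is easy to test by brute force on a small example. The Python sketch below builds a single-spin-flip Metropolis chain on four spins (a hypothetical choice of dynamics and parameters made for this illustration) and checks whether the row sums defining r(y, y') are independent of the representative x, with f the sum of the spins.

```python
import itertools
import math

def metropolis_kernel(N, beta, fields):
    """Single-spin-flip Metropolis chain for H(s) = -(sum s)^2 / (2N) - sum_i h_i s_i."""
    def H(s):
        return -sum(s) ** 2 / (2 * N) - sum(h * si for h, si in zip(fields, s))

    states = list(itertools.product((-1, 1), repeat=N))
    p = {}
    for s in states:
        stay = 1.0
        for i in range(N):
            t = s[:i] + (-s[i],) + s[i + 1:]   # flip spin i
            p[(s, t)] = min(1.0, math.exp(-beta * (H(t) - H(s)))) / N
            stay -= p[(s, t)]
        p[(s, s)] = stay
    return states, p

def lump(states, p, f, tol=1e-12):
    """Return the lumped kernel r(y, y') as nested dicts, or None if it depends on x."""
    rows = {x: {} for x in states}
    for (x, x2), pxx in p.items():
        rows[x][f(x2)] = rows[x].get(f(x2), 0.0) + pxx
    r = {}
    for x in states:
        y = f(x)
        if y not in r:
            r[y] = rows[x]
        else:
            keys = set(r[y]) | set(rows[x])
            if any(abs(r[y].get(k, 0.0) - rows[x].get(k, 0.0)) > tol for k in keys):
                return None
    return r

# Homogeneous field: the rates depend on the configuration only through sum(s).
r_homogeneous = lump(*metropolis_kernel(4, 1.0, (0.1, 0.1, 0.1, 0.1)), sum)
# Site-dependent field: the symmetry is broken and the image process is not Markov.
r_random_field = lump(*metropolis_kernel(4, 1.0, (0.5, -0.5, 0.0, 0.0)), sum)
```

With a homogeneous field the chain lumps to a nearest-neighbour chain on {-4, -2, 0, 2, 4}. With a site-dependent field the check fails, which is why the random-field model requires a finer coarse-graining than the total magnetisation.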

Remark 9.6 Note that we can alternatively define transition probabilities

\[
r(y, y') = \sum_{x \in S \colon f(x) = y} \frac{\mu(x)}{\nu(y)} \sum_{x' \in S \colon f(x') = y'} p(x, x').
\tag{9.3.4}
\]

If $p$ is reversible with respect to $\mu$, then $r$ is reversible with respect to $\nu$, and we
may hope that $r$ generates a Markov chain that is a good approximation of $Y$.

The following theorem states some consequences of lumpability.

Theorem 9.7 (Consequences of lumpability) Let $X$, $Y$ and $f$ be as in Theorem 9.5.
Let $A, B \subset S$ be such that there exist $a, b \subset T$ such that $A = f^{-1}(a)$, $B = f^{-1}(b)$.
Then

(i)
\[
h^X_{A,B}(x) = h^Y_{a,b}\bigl(f(x)\bigr), \qquad x \in S,
\tag{9.3.5}
\]
where $h^X$, $h^Y$ denote the equilibrium potentials with respect to $X$ and $Y$.

(ii)
\[
\mathrm{cap}^X(A, B) = \mathrm{cap}^Y(a, b),
\tag{9.3.6}
\]
where $\mathrm{cap}^X$, $\mathrm{cap}^Y$ denote the capacities with respect to $X$ and $Y$.

Proof Property (i) is immediate from (9.3.1) in combination with the representations
$h^X_{A,B}(x) = P(\tau^X_A < \tau^X_B \mid X_0 = x)$, $x \in S$, and $h^Y_{a,b}(y) = P(\tau^Y_a < \tau^Y_b \mid Y_0 = y)$,
$y \in T$, where $\tau^X_A$ is the first hitting time of $A$ for $X$ and $\tau^Y_a$ is the first hitting
time of $a$ for $Y$. Property (ii) is immediate from (9.3.1) and the representations
$\mathrm{cap}^X(A, B) = \mathcal{E}^X(h^X_{A,B}, h^X_{A,B})$ and $\mathrm{cap}^Y(a, b) = \mathcal{E}^Y(h^Y_{a,b}, h^Y_{a,b})$, where $\mathcal{E}^X$, $\mathcal{E}^Y$
are the Dirichlet forms associated with $X$, $Y$ (recall Lemmas 7.12 and 7.26).

9.4 Regularity estimates

For models with an uncountable state space, in order to carry over the general for-
malism discussed in Chap. 8 we need some a priori control on the behaviour of
harmonic functions and other solutions of relevant Dirichlet problems. There are
two methods that can work in different cases: elliptic regularity theory (Sect. 9.4.1)
and coupling methods (Sect. 9.4.2).

9.4.1 Elliptic regularity theory

In the case of Markov processes with a state space that is a subset of Rd and with
a generator given by (the closure of) an elliptic operator of the form (7.2.15), there
is a well developed analytic theory that provides quantitative control on the regular-
ity of solutions of homogeneous and inhomogeneous Dirichlet problems associated
with these operators. The following two key lemmas are taken from Gilbarg and
Trudinger [126, Corollaries 9.24–9.25], and concern second-order elliptic operators

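\[
\mathcal{L} = \sum_{i,j} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i} b_i(x) \frac{\partial}{\partial x_i} + d(x)
\tag{9.4.1}
\]

defined on some domain $\Omega \subset \mathbb{R}^d$, where $a_{ij} \in C^0(\Omega)$, $b_i, d \in L^\infty(\Omega)$. Assume
that, for two numbers $0 < c \le C < \infty$,

\[
C(\xi, \xi) \ge \bigl(\xi, a(x)\xi\bigr) \ge c(\xi, \xi) > 0 \qquad \forall\, \xi \in \mathbb{R}^d.
\tag{9.4.2}
\]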

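Let $\gamma = C/c$, and choose $\nu$ such that $(\|b\|_\infty / c)^2 \le \nu$ and $\|b\|_\infty \le \nu$. For $n \in \mathbb{N}$,
let $W^{2,n}(\Omega)$ denote the Sobolev spaces of twice (weakly) differentiable functions
on $\Omega$ whose derivatives of order $\le 2$ are in $L^n(\Omega)$. Let $B_R(x)$ denote the ball of
radius $R$ centred at $x$.

Lemma 9.8 If $u \in W^{2,n}(\Omega)$ is positive and satisfies $\mathcal{L} u = 0$ in $\Omega$, then for any
$x \in \Omega$ and $R > 0$ such that $B_{2R}(x) \subset \Omega$,

\[
\sup_{z \in B_R(x)} u(z) \le C \inf_{z \in B_R(x)} u(z),
\tag{9.4.3}
\]

where $C = C(n, \gamma, \nu R^2) < \infty$.

Lemma 9.9 If $u \in W^{2,n}(\Omega)$ is positive and satisfies $\mathcal{L} u = f$ in $B_{R_0}(x)$, then for
any $0 < R \le R_0$,

\[
\mathrm{osc}_{B_R(x)}\, u \le C \Bigl( \frac{R}{R_0} \Bigr)^{\alpha} \Bigl( \mathrm{osc}_{B_{R_0}(x)}\, u + R_0\, \| f - c u \|_{n, B_{R_0}(x)} \Bigr),
\tag{9.4.4}
\]

where $\mathrm{osc}_A u = \sup_A u - \inf_A u$, $\alpha = \alpha(n, \gamma, \nu R_0^2) > 0$ and $C = C(n, \gamma, \nu R_0^2) < \infty$.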

In the context of reversible diffusions with small noise, the term involving second
derivatives is scaled by the small parameter ε, i.e., we deal with operators of the form
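\[
\begin{aligned}
\mathcal{L}_\varepsilon
&= \varepsilon\, e^{F(x)/\varepsilon} \sum_{i,j} \frac{\partial}{\partial x_i}\, a_{ij}(x)\, e^{-F(x)/\varepsilon} \frac{\partial}{\partial x_j} \\
&= \varepsilon \sum_{i,j} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j}
+ \sum_{i,j} \Bigl( \varepsilon \frac{\partial a_{ij}(x)}{\partial x_i} - a_{ij}(x) \frac{\partial F(x)}{\partial x_i} \Bigr) \frac{\partial}{\partial x_j}.
\end{aligned}
\tag{9.4.5}
\]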

This means, in particular, that the ellipticity constant scales with ε. The way we
will use Lemmas 9.8–9.9 is to consider a family of domains depending on ε, chosen
in such a way that the numerical constants C and α are independent of ε. For the
operator $\mathcal{L}_\varepsilon$, both $c$ and $C$ are proportional to $\varepsilon$, $\gamma = O(1)$, and we can choose
$\nu \sim \varepsilon^{-2} \sup_{y \in \Omega} \|\nabla F(y)\|_\infty^2$.
An important application of the regularity estimates in Lemmas 9.8–9.9 is to
obtain bounds on harmonic functions. The basic tool used throughout Chap. 8 for
countable state spaces was Lemma 8.4, which was based on a simple renewal
argument contained in the renewal equation (recall (8.2.2))

\[
P_x(\tau_A < \tau_B) = \frac{P_x(\tau_A < \tau_{B \cup x})}{P_x(\tau_{A \cup B} < \tau_x)}.
\tag{9.4.6}
\]
While this formula remains true in the diffusion setting, it is useless for d > 1 be-
cause the denominator equals 1 and so the numerator equals the left-hand side. For-
tunately, it is easy to obtain a useful analogue of (9.4.6) by purely analytic con-
siderations, contained in the following theorem for operators of the form (9.4.5).

Theorem 9.10 (Upper bound on harmonic function) Let $A, B \subset \mathbb{R}^d$ be disjoint
closed sets whose complement is regular, and let $x \in (A \cup B)^c$ be such that
$\mathrm{dist}(x, A \cup B) > c\varepsilon$. Then, for any $\rho \le c\varepsilon$, with $c \in \mathbb{R}_+$, there exists a $C \in (0, \infty)$
(depending only on $c$ and on the value of $\|\nabla F(x)\|_\infty$) such that

\[
h_{A,B}(x) \le C\, \frac{\mathrm{cap}(B_\rho(x), A)}{\mathrm{cap}(B_\rho(x), B)}.
\tag{9.4.7}
\]

Proof We begin by proving the following lemma.

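Lemma 9.11 With the notation of Theorem 9.10,

\[
\begin{aligned}
&\sup_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{B \cup B_\rho(x), A}(dy) \\
&\qquad \ge h_{A,B}(x) \\
&\qquad \ge \inf_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{B \cup B_\rho(x), A}(dy),
\end{aligned}
\tag{9.4.8}
\]

where $e_{B \cup B_\rho(x), A}$ is the equilibrium measure defined in (7.2.56).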

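Proof Let $\Omega \subset \mathbb{R}^d$ be a regular domain, and let $f$ be a function defined on $\partial\Omega$.
Recall that the Poisson kernel $H_\Omega = H_\Omega^{\lambda=0}$ defined in (7.2.6) maps a function $f$
defined on $\partial\Omega$ to a harmonic function on $\Omega$, called its harmonic extension. Choosing
$\Omega = (A \cup B)^c$, we see that the equilibrium potential $h_{A,B}$ satisfies the mean-value
property

\[
h_{A,B}(x) = (H_{(A \cup B)^c} h_{A,B})(x).
\tag{9.4.9}
\]

Let $C \subset (A \cup B)^c$ be a regular neighbourhood of $x$. Since $h_{A,B \cup C}$ and $h_{A,B}$ coincide
on $\partial(A \cup B)$, it is obvious that

\[
h_{A,B}(z) = (H_{(A \cup B)^c} h_{A,B \cup C})(z) \qquad \forall\, z \in (A \cup B \cup C)^c.
\tag{9.4.10}
\]

Using the first Green identity in (7.2.49) for $\Omega = \Gamma = (A \cup B \cup C)^c$,
$f = G_{(A \cup B)^c}(x, \cdot)$ and $g = h_{A,B \cup C}$, we get

\[
\begin{aligned}
(H_{(A \cup B)^c} h_{A,B \cup C})(x)
&= -\varepsilon \int_{\partial(A \cup B)} e^{[F(x) - F(y)]/\varepsilon}\, h_{A,B \cup C}(y)\, \partial_{n(y)} G_{(A \cup B)^c}(y, x)\, d\sigma_{A \cup B}(y) \\
&= -\varepsilon \int_{\partial C} e^{[F(x) - F(y)]/\varepsilon}\, G_{(A \cup B)^c}(y, x)\, \partial_{n(y)} h_{A,B \cup C}(y)\, d\sigma_C(y) \\
&= - \int_{\partial C} e^{[F(x) - F(y)]/\varepsilon}\, G_{(A \cup B)^c}(y, x)\, e_{A,B \cup C}(dy),
\end{aligned}
\tag{9.4.11}
\]

where $\partial_{n(y)}$ is the normal derivative defined in (7.2.51). We use that $h_{A,B \cup C}(y) = 0$
when $y \in \partial C$ and $G_{(A \cup B)^c}(y, x) = 0$ when $y \in \partial(A \cup B)$. The last equality follows
from (7.2.56). (Note that the factor $\varepsilon$ appears because the definition of the normal
derivative does not include the factor $\varepsilon$.)
Now choose $C = B_\rho(x)$. If we could replace $G_{(A \cup B)^c}(y, x)$ by a constant for
$y \in \partial B_\rho(x)$, then we could extract this constant from the integral, and the remaining
integral would be some partial capacity. In fact, on a countable state space instead
of the ball $B_\rho(x)$ we could choose the point $x$, in which case the problem would
be absent and we would readily get (9.4.6). In the present setting, by combining
(9.4.10)–(9.4.11), we still get two bounds, namely,

\[
\begin{aligned}
h_{A,B}(x) &\le - \sup_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{A, B \cup B_\rho(x)}(dy), \\
h_{A,B}(x) &\ge - \inf_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{A, B \cup B_\rho(x)}(dy).
\end{aligned}
\tag{9.4.12}
\]

Here the direction of the inequalities reflects the fact that the Green function is
non-negative and that $-e_{A, B \cup B_\rho(x)}$ is a non-negative measure on $\partial B_\rho(x)$.
But, trivially, $-e_{A \cup B, B_\rho(x)} = e_{B_\rho(x), A \cup B}$, which implies (9.4.8).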

At this point it is clear that we need to be able to control the Green function near
the diagonal. Before turning to estimates, we bring (9.4.12) into a more suitable
form.

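Lemma 9.12 Within the setting of Lemma 9.11,

\[
h_{A,B}(x) \le \sup_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon}\, \mathrm{cap}\bigl(B_\rho(x), A\bigr).
\tag{9.4.13}
\]

Proof By the representation in (7.2.56), we have $e_{B \cup B_\rho(x), A}(dy) \le e_{B_\rho(x), A}(dy)$.
Hence, by (7.2.57),

\[
\int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{B \cup B_\rho(x), A}(dy)
\le \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{B_\rho(x), A}(dy)
= \mathrm{cap}\bigl(B_\rho(x), A\bigr).
\tag{9.4.14}
\]

Thus, the upper bound in (9.4.8) implies the upper bound in (9.4.13).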

We want to express the Green function in Lemma 9.11 in terms of capacity. Using
the symmetry of the Green function and the fundamental relation between the Green
function, the equilibrium measure and the equilibrium potential in Theorem 7.28,
we get

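\[
\begin{aligned}
&e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(z)/\varepsilon}\, G_{(A \cup B)^c}(x, z)\, e_{B_\rho(x), A \cup B}(dz) \\
&\qquad = \int_{\partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e_{B_\rho(x), A \cup B}(dz) \\
&\qquad = h_{B_\rho(x), A \cup B}(x) = 1.
\end{aligned}
\tag{9.4.15}
\]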


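This implies that

\[
\begin{aligned}
1 &\ge e^{F(x)/\varepsilon} \inf_{z \in B_\rho(x)} G_{(A \cup B)^c}(x, z) \int_{\partial B_\rho(x)} e^{-F(z)/\varepsilon}\, e_{B_\rho(x), A \cup B}(dz) \\
&= e^{F(x)/\varepsilon} \inf_{z \in B_\rho(x)} G_{(A \cup B)^c}(x, z)\, \mathrm{cap}\bigl(B_\rho(x), A \cup B\bigr),
\end{aligned}
\tag{9.4.16}
\]

i.e.,

\[
e^{F(x)/\varepsilon} \inf_{z \in B_\rho(x)} G_{(A \cup B)^c}(x, z) \le \frac{1}{\mathrm{cap}(B_\rho(x), A \cup B)}.
\tag{9.4.17}
\]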
It is clear at this point that we cannot continue unless we can compare the infimum
and the supremum of G(A∪B)c (z, x) with z ∈ Bρ (x). Such a comparison is provided
by the Harnack inequalities.

Lemma 9.13 (Harnack inequality for the Green function) If $\rho = c\varepsilon$ for some
$c < \infty$, then there exists a constant $C$, depending on $c$ only, such that

\[
\sup_{z \in B_\rho(x)} G_{(A \cup B)^c}(z, x) \le C \inf_{z \in B_\rho(x)} G_{(A \cup B)^c}(z, x), \qquad x \in (A \cup B)^c.
\tag{9.4.18}
\]

Proof We will apply Lemma 9.8. If we choose $R \le \varepsilon$, then we can use (9.4.3)
with a constant that does not depend on $\varepsilon$. (If $x$ is a quadratic critical point of $F$,
then we can even choose $R = \varepsilon^{1/2}$.) Let $u(z) = G_{(A \cup B)^c}(z, x)$, $z \in B_\rho(x)$. Then $u$
is harmonic in $(A \cup B)^c \setminus x$. Therefore, if $\rho > 2R$, then $u$ is harmonic in $B_{2R}(y)$
for every $y \in \partial B_\rho(x)$. Let $a, b \in \partial B_\rho(x)$ be such that $\sup_{z \in \partial B_\rho(x)} u(z) = u(a)$ and
$\inf_{z \in \partial B_\rho(x)} u(z) = u(b)$. Then we can find $k$ points $x_1, \dots, x_k \in \partial B_\rho(x)$, with
$k \le \pi\rho/R$, such that $x_1 = a$, $b \in B_R(x_k)$ and $B_R(x_i) \cap B_R(x_{i+1}) \ne \emptyset$. Clearly,

\[
\begin{aligned}
u(a) &\le C \inf_{z \in B_R(a)} u(z) \le C \inf_{z \in B_R(a) \cap B_R(x_2)} u(z) \le C \sup_{z \in B_R(x_2)} u(z) \\
&\le C^2 \inf_{z \in B_R(x_2)} u(z) \le \dots \le C^{k-1} \sup_{z \in B_R(x_k)} u(z) \le C^k \inf_{z \in B_R(x_k)} u(z) \le C^k u(b).
\end{aligned}
\tag{9.4.19}
\]

Thus, $u(a) \le C^{\pi\rho/R} u(b)$, and so if $\rho = c\varepsilon$ and $R = \varepsilon$, then the supremum and the
infimum are related by at most a finite $\varepsilon$-independent constant.

Combining (9.4.16) with Lemmas 9.12–9.13, we arrive at the assertion in (9.4.7).

9.4.2 Coupling methods

An alternative way to obtain regularity estimates that are suitable for metastable
systems is via coupling. The basic idea is as follows. Take some function depending

Fig. 9.4 Coupling of two trajectories starting from two different initial values in a certain neigh-
bourhood

on the initial value x of the process, for instance, the expected hitting time x →
Ex [τD ] of a set D. We want to show that in a certain uniform sense this function
is continuous in x on a certain neighbourhood. To do so, we start two copies of the
process in two different initial values, say x and y, coupled in such a way as to
favour convergence of the trajectories over time. If the trajectories meet before D is
hit, then the processes realise the hit together. If this happens with large probability
after a small time, then the difference between Ex [τD ] and Ey [τD ] must be small.
See Fig. 9.4 for an illustration.
To exemplify this technique, we present its application in finite- and infinite-
dimensional diffusion processes first used by Martinelli et al. [174–177]. To that
end we place ourselves in the setting of the SDEs and SPDEs discussed in Sects. 5.6
and 5.7, respectively, 6.2 and 6.3, where the noise is scaled by a small parameter ε.
As pointed out in Sect. 6.5, there may be several minima of the action functional that
are metastable states in the sense of Freidlin-Wentzell theory. The theory therefore
yields the exponential asymptotics of transition times between such states.
In the potential-theoretic approach, Corollary 7.30 gives us a formula for mean
transition times when the system is started in a specific initial distribution on a
ball around a metastable state, namely, the last-exit biased distribution. The desired
result, however, is that the mean transition times do not really depend on the initial
distribution, i.e., are essentially the same no matter where in the ball the system
starts.
We limit our discussion to the reversible setting, although the results of Martinelli
et al. apply also to non-reversible processes. The main conditions formulated below
guarantee that the deterministic dynamics is attractive in the proper sense. If we are
looking at an S(P)DE of the form
\[
dX_t = -\nabla F(X_t)\, dt + \sqrt{2\varepsilon}\, dB_t,
\tag{9.4.20}
\]

(where $X$ is a finite-dimensional vector or a function, and $B$ is $d$-dimensional
Brownian motion or the derivative of a Brownian sheet), then we assume that
(i) $F$ is a Morse function with finitely many local minima, and the Hessian matrix
of $F$ is non-degenerate at all critical points.
(ii) For some $R < \infty$ and all $x \notin B_R(0)$, the gradient $\nabla F(x)$ is pointing inward,
i.e., $(\nabla F(x), n(x)) \le b < 0$, where $n(x)$ is the normal vector at $x$.
(iii) The second derivatives of F are locally bounded.
(See Chap. 12 for more precise statements for SPDEs.)

The finitely many local minima of F correspond to metastable states in the
Freidlin-Wentzell theory. We denote this set by $\mathcal{M}$. For $m \in \mathcal{M}$ we denote by
$\mathcal{M}_m \subset \mathcal{M}$ the set of local minima of F below F(m).

Theorem 9.14 For any $m \in \mathcal{M}$ there exist ρ0 > 0, η > 0 and ρ0 > ρ > 0 such
that, for any δ > 0 and for ε small enough,
\[
\sup_{\|z-m\|_\infty<\rho_0}
\frac{\bigl|E_m[\tau(B_\delta(\mathcal{M}_m))] - E_z[\tau(B_\delta(\mathcal{M}_m))]\bigr|}
     {E_m[\tau(B_\delta(\mathcal{M}_m))]}
\le e^{-\eta/\varepsilon}. \tag{9.4.21}
\]

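Moreover, for any $m' \in \mathcal{M}_m$,
\[
\sup_{\|z-m\|_\infty<\rho_0}
\Bigl| P_m\bigl[\tau_{B_\delta(m')} < \tau_{B_\delta(\mathcal{M}_m)}\bigr]
     - P_z\bigl[\tau_{B_\delta(m')} < \tau_{B_\delta(\mathcal{M}_m)}\bigr] \Bigr|
\le e^{-\eta/\varepsilon}. \tag{9.4.22}
\]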

Remark 9.15 This result applies both for finite-dimensional diffusions and for
SPDEs under the conditions stated in Sect. 5.7. For sequences of N -dimensional
discretisations of SPDEs as described in Sect. 5.7, the corresponding estimates hold
uniformly in N , as was shown in Barret [11].

The proof makes essential use of the attractive nature of the deterministic (ε = 0)
equation. The key estimate is the following bound on solutions of (9.4.20). This
holds both for SDEs and SPDEs.

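Lemma 9.16 Denote by $X_z^\varepsilon$ the solution of (9.4.20) starting in z. Let m be a minimum
of F. Then there exist k, C > 0 and ε0, ρ0 > 0 such that, for ε0 > ε > 0,
\[
P\Bigl[\,\sup_{\|z-m\|_\infty<\rho_0}
\|X_z^\varepsilon(t) - X_m^\varepsilon(t)\|_\infty
\le e^{-kt}\,\|X_z^\varepsilon(0) - X_m^\varepsilon(0)\|_\infty,\ \forall t > 0 \Bigr]
\ge 1 - e^{-C/\varepsilon}. \tag{9.4.23}
\]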

The proof of this contraction result is tedious but relies on large deviation esti-
mates only. These are used to show that solutions cannot spend a substantial fraction
of time away from local minima. Two solutions driven by the same Brownian mo-
tion approach each other when they are in the neighbourhood of a minimum. Careful
book-keeping yields the result. For details see, in particular, the elegant proof given
in Martinelli, Sbano and Scoppola [176].
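The mechanism behind Lemma 9.16 (two solutions driven by the same Brownian motion approach each other near a minimum) can be seen in a one-dimensional toy computation. The following is only a sketch under assumptions that are not in the text: the double-well potential F(x) = x^4/4 − x^2/2, an Euler-Maruyama discretisation with step h, and the function names are all chosen for illustration.

```python
import math
import random

def grad_F(x):
    """Gradient of the illustrative double-well potential F(x) = x**4/4 - x**2/2."""
    return x**3 - x

def coupled_paths(z, m, eps, h, n_steps, seed=0):
    """Euler-Maruyama for dX = -F'(X) dt + sqrt(2*eps) dB from two starting points.

    Both paths use the same Brownian increments, so the noise cancels in the
    difference and only the drift acts on the gap |X_z - X_m|.
    """
    rng = random.Random(seed)
    x, y = z, m
    gaps = [abs(x - y)]
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(h))      # shared Brownian increment
        noise = math.sqrt(2.0 * eps) * dB
        x = x - grad_F(x) * h + noise
        y = y - grad_F(y) * h + noise
        gaps.append(abs(x - y))
    return gaps

# Start one path at the minimum m = 1 and one nearby at z = 0.8.
gaps = coupled_paths(z=0.8, m=1.0, eps=0.005, h=0.01, n_steps=500)
print("initial gap:", gaps[0], "final gap:", gaps[-1])
```

As long as both paths stay in the region where F is convex, the gap shrinks at a rate governed by F″, which is the one-dimensional analogue of the factor e^{−kt} in (9.4.23).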

9.5 Bibliographical notes

1. The first reference to lumping appears to be Burke and Rosenblatt [43]. In this pa-
per, a necessary and sufficient criterion is given for a function of a Markov process
to be Markovian. Kemeny, Snell and Laurie [151] introduced the notion of lump-
ing and of a lumpable Markov process. A systematic presentation of conditions for
lumpability in terms of symmetries of the transition rates is given in Baake, Baake,
Bovier and Klein [9]. Liggett [166] discusses lumping in connection with capaci-
ties. In the context of metastability, lumping was used heavily in Bovier, Eckhoff,
Gayrard and Klein [33]. Sharp bounds through coarse-graining in non-lumpable
models have been obtained in Bianchi, Bovier and Ioffe [24] for the random-field
Curie-Weiss model, and by Slowik [219] for the Potts version of this model.

2. Coupling methods to prove regularity were used by Martinelli, Olivieri and Scop-
pola [174, 175], and in an improved form by Martinelli, Sbano and Scoppola [176],
to prove exponential convergence of exit times for finite- and infinite-dimensional
diffusions. They were applied to discretisations of SPDEs by Barret [11].

3. Coupling techniques were used to obtain bounds similar to those in Theorem 9.14 for
Glauber dynamics of the random-field Curie-Weiss model by Bianchi, Bovier and
Ioffe in [25]. The coupling used was an extension of a coupling constructed for the
Glauber dynamics of the Curie-Weiss model by Levin, Luczak and Peres in [164].

4. Bianchi and Gaudillière [26] consider families of finite-state Markov processes
in a setting where the size of the state space tends to infinity. With the help of
potential theory they are able to prove that metastable transition times depend only
weakly on the starting distribution, provided this has a support that lies in a small
neighbourhood of a metastable state. An immediate consequence of their result is
that metastable transition times are asymptotically exponential. The key idea is to
use Corollary 7.11, not for S, but for S ∪ (A′ ∪ B′) with A′, B′ copies of A, B, and
to allow transitions A ↔ A′ and B ↔ B′ between mirror sites at non-zero rates. The
effect of this extension is that the Markov process can start in A′, move to A, run
around A for a while so as to approach a quasi-stationary distribution on A before
exiting A, then move to B, run around B for a while so as to approach a quasi-
stationary distribution on B before exiting B, and finally enter B′. It is shown that,
under certain conditions, the mean metastable transition time from A to B starting
from the quasi-stationary distribution on A is close to the mean metastable transition
time from A′ to B′ starting from the last-exit biased distribution on A′. For the latter
the formula in Corollary 7.11 is available and the techniques outlined in Chaps. 8–9
can be used.
Part IV
Applications: Diffusions with Small Noise

Parts IV–VIII bring the general theory outlined in Part III to bear on a number of
selected examples.
In Part IV we study diffusions with small noise. Chapter 10 deals with diffusions
on a lattice with a vanishing spacing. Chapter 11 looks at finite-dimensional diffusions
on subsets of $\mathbb{R}^d$ and sharpens the results of Freidlin and Wentzell by using
the potential-theoretic tools introduced in Part III. Chapter 12 looks at stochastic
partial differential equations, which are the infinite-dimensional analogues of the
diffusions dealt with in Chap. 11.
Chapter 10
Discrete Reversible Diffusions

I noticed an unlighted cigar and an open box of cigar-lights: all
things betokened that the Doctor, usually so methodical and so
self-contained, had been trying every form of occupation and
could settle to none.
(Lewis Carroll, Sylvie and Bruno Concluded)

One of the simplest settings in which the general theory of metastability outlined in
Part III can be applied is that of discrete diffusions. By this we understand discrete-
time or continuous-time (nearest-neighbour) random walks on d-dimensional lat-
tices of spacing ε > 0 subject to a drift field derived from a potential F that may
have several local minima. One of the motivations for studying discrete reversible
diffusions is that they appear as coarse-grained versions of mean-field spin systems.
The results of this chapter will be used in Part V.
In Sect. 10.1 we define the setting and state the necessary assumptions on the
potential. In Sects. 10.2 and 10.3 we derive upper and lower bounds on the relevant
capacities.

10.1 Definitions

For simplicity, we focus on the discrete-time setting (the extension to continuous
time is trivial). The transitions take place in subsets $S_\varepsilon$ of the lattice $(\varepsilon\mathbb{Z})^d$, ε > 0.
In particular, we assume that $S_\varepsilon = \Omega \cap (\varepsilon\mathbb{Z})^d$ for some fixed connected open set
$\Omega \subset \mathbb{R}^d$. We write x ∼ y when x and y are nearest-neighbour sites in $(\varepsilon\mathbb{Z})^d$.
We denote by $X^\varepsilon$ the time-homogeneous nearest-neighbour random walk on $S_\varepsilon$
with transition matrix $p_\varepsilon(x, y)$, $x, y \in S_\varepsilon$, and assume reversibility with respect to
a probability distribution $\mu_\varepsilon$, i.e., $\mu_\varepsilon(x)p_\varepsilon(x, y) = \mu_\varepsilon(y)p_\varepsilon(y, x)$ for all $x, y \in S_\varepsilon$.
We write $L_\varepsilon$ for the generator of $X^\varepsilon$. To avoid notational complications, we assume
that
\[
\mu_\varepsilon(x) = \exp\bigl(-F(x)/\varepsilon\bigr) \tag{10.1.1}
\]
for some $F\colon \mathbb{R}^d \to \mathbb{R}$. In many applications it is necessary to allow F to depend on ε
as well, but this poses no additional difficulties. We ignore the issue of normalisation

© Springer International Publishing Switzerland 2015 247

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_10

of $\mu_\varepsilon$, which is of no consequence for what follows. Metastability occurs when F
has at least two local minima.
We also assume that the transition probabilities $p_\varepsilon(x, y)$ depend smoothly on x.
A possible choice is
\[
p_\varepsilon(x, y) =
\begin{cases}
r_\ell\, e^{-[F(y)-F(x)]_+/\varepsilon}, & \text{if } y = x \pm \varepsilon e_\ell,\ \ell = 1, \dots, d,\\
1 - \Sigma(x), & \text{if } y = x,\\
0, & \text{otherwise},
\end{cases}
\tag{10.1.2}
\]
where $e_\ell$ denotes the ℓ-th basis vector of the lattice $\mathbb{Z}^d$, $r_\ell \in (0, 1)$ is an isotropy
in the direction of $e_\ell$ such that $\sum_{\ell=1}^d r_\ell \le 1$, and Σ(x) is chosen such that
$\sum_{y\in S_\varepsilon} p_\varepsilon(x, y) = 1$. We assume that:

Assumption 10.1 $F \in C^3(\Omega, \mathbb{R})$.
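The choice (10.1.2) can be checked on a small example. The sketch below builds the transition matrix for a one-dimensional chain and verifies reversibility with respect to (10.1.1). The potential, the lattice window, the value r = 0.25 and the helper name `build_chain` are illustrative assumptions, not part of the text; r is kept at most 1/2 here so that the holding probability 1 − Σ(x) stays non-negative even where both neighbours lie downhill.

```python
import math

def F(x):
    """Illustrative double-well potential (an assumption for this sketch)."""
    return x**4 / 4 - x**2 / 2

def build_chain(F, eps, n_sites, x_min, r=0.25):
    """One-dimensional instance of (10.1.2) on the sites x_min + i*eps."""
    xs = [x_min + i * eps for i in range(n_sites)]
    P = [[0.0] * n_sites for _ in range(n_sites)]
    for i, x in enumerate(xs):
        for j in (i - 1, i + 1):
            if 0 <= j < n_sites:
                uphill = max(F(xs[j]) - F(x), 0.0)    # [F(y) - F(x)]_+
                P[i][j] = r * math.exp(-uphill / eps)
        P[i][i] = 1.0 - sum(P[i])                     # holding probability 1 - Sigma(x)
    mu = [math.exp(-F(x) / eps) for x in xs]          # unnormalised measure (10.1.1)
    return xs, P, mu

xs, P, mu = build_chain(F, eps=0.05, n_sites=81, x_min=-2.0)

# Relative defect in detailed balance mu(x) p(x, y) = mu(y) p(y, x) over all edges.
worst = max(
    abs(mu[i] * P[i][i + 1] - mu[i + 1] * P[i + 1][i]) / (mu[i] * P[i][i + 1])
    for i in range(len(xs) - 1)
)
print("largest relative detailed-balance defect:", worst)
```

Reversibility holds because μ_ε(x) p_ε(x, y) = r exp(−max{F(x), F(y)}/ε) is symmetric in x and y.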

Definition 10.2 (Communication heights, communication level sets, optimal paths,
gates) Given two non-empty subsets A, B ⊂ Ω:
(a) The communication height between A and B is
\[
\Phi(A, B) = \inf_{\substack{\gamma \in C([0,1],\Omega)\\ \gamma(0)\in A,\ \gamma(1)\in B}}\ \sup_{t\in[0,1]} F(\gamma(t)), \tag{10.1.3}
\]
where the infimum runs over all continuous paths γ in Ω. The communication
level set between A and B is
\[
\mathcal{S}(A, B) = \bigl\{ z \in \mathbb{R}^d : F(z) = \Phi(A, B) \bigr\}. \tag{10.1.4}
\]
(b) The set of optimal paths from A to B is
\[
(A \to B)_{\mathrm{opt}}
= \Bigl\{ \gamma \in C([0, 1], \Omega) : \gamma(0) \in A,\ \gamma(1) \in B,\ \sup_{t\in[0,1]} F(\gamma(t)) = \Phi(A, B) \Bigr\}. \tag{10.1.5}
\]
(c) A subset $W \subseteq \mathcal{S}(A, B)$ is a gate if it is a minimal subset with the property that
all optimal paths intersect W. A priori there may be several (not necessarily
disjoint) gates. Their union is denoted by $\mathcal{G}(A, B)$ and is called the essential
gate.
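On a lattice, the communication height in (10.1.3) becomes a minimax problem over nearest-neighbour paths, which can be solved by a Dijkstra-type search in which the cost of a path is the maximum of F along it rather than a sum. The following sketch illustrates this on a small grid; the function name and the grid values are made up for illustration, and the lattice version is only a stand-in for the continuum definition.

```python
import heapq

def communication_height(F, sources, targets):
    """Smallest achievable maximum of F along nearest-neighbour paths on a 2D grid.

    F is a list of rows; sources and targets are collections of (row, col) cells.
    """
    rows, cols = len(F), len(F[0])
    targets = set(targets)
    best = {}                                   # cell -> minimal path maximum
    heap = [(F[i][j], (i, j)) for (i, j) in sources]
    heapq.heapify(heap)
    while heap:
        height, (i, j) = heapq.heappop(heap)
        if (i, j) in best:
            continue                            # already settled with a lower maximum
        best[(i, j)] = height
        if (i, j) in targets:
            return height                       # first settled target is optimal
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < rows and 0 <= b < cols and (a, b) not in best:
                heapq.heappush(heap, (max(height, F[a][b]), (a, b)))
    return float("inf")                         # targets not reachable

# Two wells (value 0) in the top corners, separated by a ridge (5 and 9)
# that can be bypassed through a low pass of height 2 at the bottom.
grid = [[0, 5, 0],
        [1, 9, 1],
        [1, 2, 1]]
phi = communication_height(grid, sources=[(0, 0)], targets=[(0, 2)])
print(phi)  # 2
```

The cell of height 2 on the bottom row plays the rôle of the gate: every optimal path must pass through it.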

We make the following simplifying assumptions on F :

Assumption 10.3
(i) The set M of local minima of F is finite, and for all pairs x, y ∈ M there is a
unique essential gate G(x, y) consisting of a finite collection of isolated saddle
points $z_k^*(x, y)$, $k \in I(x, y)$, with I(x, y) an index set.
(ii) At all local minima x ∈ M and all saddle points $z_k^*(x, y)$, x, y ∈ M, $k \in I(x, y)$,
the Hessian matrix of F, denoted by A(x) and $A(z_k^*(x, y))$, is non-degenerate
(i.e., has only non-zero eigenvalues).

Remark 10.4 Assumption 10.3 amounts to saying that F is a Morse function. Under
this assumption, the saddle points $z_k^*(x, y)$ are the critical points where $A(z_k^*(x, y))$
has exactly one negative eigenvalue.

We may encounter situations where saddle points in ∂Ω are relevant. While this
does not necessarily lead to problems, there are many instances where the formu-
lation of general results becomes somewhat cumbersome. In order to avoid these
complications we exclusively deal with situations where ∂Ω is never reached by X ε :

Assumption 10.5 $\lim_{i\to\infty} F(x_i) = \infty$ for any sequence of points $(x_i)_{i\in\mathbb{N}}$ in Ω
such that $\lim_{i\to\infty} x_i = x \in \partial\Omega$.

In the setting described above, the general theory of metastability for Markov
chains on countable state spaces described in Chap. 8 applies.

Theorem 10.6 (Metastable set) Let $\mathcal{M}_\varepsilon \subset S_\varepsilon$ be a set of best lattice approximations
of the points in $\mathcal{M}$. Then $\mathcal{M}_\varepsilon$ is a set of metastable points in the sense of
Definition 8.2, with ρ = exp(−c/ε) for some c > 0 depending on F.

Proof The proof of this fact is easy. As we will see later in full detail, if $x, y \in \mathcal{M}$,
then $\mathrm{cap}(x, \mathcal{M}\setminus x)/\mu_\varepsilon(x) \sim \exp(\varepsilon^{-1}[F(x) - F(z^*(x, y))])$, which is of order
exp(−C/ε) with C > 0. If $z \notin \mathcal{M}$, then there is a path $\gamma = (\gamma_0, \dots, \gamma_n)$ from z
to $\mathcal{M}$ along which F is decreasing. As pointed out in Sect. 9.1.2, it follows that
$\mathrm{cap}(z, \mathcal{M}) \ge \mathrm{cap}_\gamma(\gamma(0), \gamma(n))$, where $\mathrm{cap}_\gamma$ is the capacity of the Markov process
in which all connections except those on γ are removed. The lower bound can be
computed explicitly (see Sect. 7.1.4). This leads to the estimate
$\mathrm{cap}(z, \mathcal{M})/\mu_\varepsilon(z) \ge O(\varepsilon^p)$ for some dimension-dependent p < ∞, which in turn implies
that the conditions of Definition 8.2 are satisfied.

Remark 10.7 Note that when Sε is infinite, our assumptions on F imply that there
exists a subset Sε,0 = Λ ∩ (εZ)d , with Λ some finite box, satisfying the hypothesis
on the subset S0 mentioned in the remark following Definition 8.2. We just need to
take Λ such that it contains all local minima of F and such that, outside Λ, F is
large enough.

In view of Theorem 10.6, all that is required is to compute capacities between the
single points in Mε . To further simplify the presentation, we assume that all gates
consist of a single saddle point. The general case is obtained simply by adding up
the contributions to the capacity coming from the different saddle points. Our goal is
to apply Theorem 8.15, which expresses the metastable crossover times between the
local minima in $\mathcal{M}_\varepsilon$ in terms of capacities and the invariant measure. Theorem 8.43
automatically yields the associated spectral estimates.
Let A, B be disjoint non-empty subsets of $\mathcal{M}_\varepsilon$ connected through a unique saddle
point $z^*(A, B)$, i.e., A and B are contained in two different connected components
of the level set $\{y \in S_\varepsilon : F(y) < \Phi(A, B)\}$. For $z^* \in \mathcal{M}_\varepsilon$, let $B(z^*)$ be the matrix
with elements
\[
B(z^*)_{\ell,k} = \sqrt{r_\ell}\, A(z^*)_{\ell,k}\, \sqrt{r_k}, \tag{10.1.6}
\]
and let $\hat\gamma_1(z^*)$ be the unique negative eigenvalue of $B(z^*)$. For $x \in \mathcal{M}_\varepsilon$, write
$\mathcal{M}_{\varepsilon,x} = \{y \in \mathcal{M}_\varepsilon : F(y) < F(x)\}$ and let $z^*(x, \mathcal{M}_{\varepsilon,x})$ be the unique saddle point
connecting x and $\mathcal{M}_{\varepsilon,x}$.
Our main results in this chapter are the following.

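Theorem 10.8 (Sharp asymptotics for capacities) For A, B as above, as ε ↓ 0,
\[
\mathrm{cap}(A, B) = e^{-\Phi(A,B)/\varepsilon}\,
\frac{\varepsilon[-\hat\gamma_1(z^*)]}{2\pi\sqrt{-\det A(z^*)}}
\left(\frac{2\pi}{\varepsilon}\right)^{d/2}
\Bigl[1 + O\Bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr)\Bigr], \tag{10.1.7}
\]
where $z^* = z^*(A, B)$.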

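Theorem 10.9 (Mean metastable exit time) For every $x \in \mathcal{M}_\varepsilon$, as ε ↓ 0,
\[
E_x[\tau_{\mathcal{M}_{\varepsilon,x}}]
= \frac{2\pi}{\varepsilon[-\hat\gamma_1(z^*)]}
\sqrt{\frac{-\det A(z^*)}{\det A(x)}}\;
e^{[\Phi(x,\mathcal{M}_{\varepsilon,x})-F(x)]/\varepsilon}
\Bigl[1 + O\Bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr)\Bigr], \tag{10.1.8}
\]
where $z^* = z^*(x, \mathcal{M}_{\varepsilon,x})$.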
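To get a feeling for the size of the quantities in (10.1.8), the leading term can be evaluated directly. The sketch below simply codes the formula with the error factor dropped; the function name and all numerical inputs (barrier height, Hessian determinants, r_1) are hypothetical values chosen for illustration. In one dimension, B(z*) = r_1 A(z*), so the negative eigenvalue is r_1 F″(z*).

```python
import math

def mean_exit_time(eps, barrier, gamma1, det_saddle, det_min):
    """Leading term of (10.1.8); the factor 1 + O(...) is omitted.

    barrier    : Phi(x, M_{eps,x}) - F(x)
    gamma1     : negative eigenvalue of B(z*)
    det_saddle : det A(z*) (negative), det_min : det A(x) (positive)
    """
    if gamma1 >= 0 or det_saddle >= 0 or det_min <= 0:
        raise ValueError("need gamma1 < 0, det A(z*) < 0 and det A(x) > 0")
    prefactor = 2.0 * math.pi / (eps * (-gamma1)) * math.sqrt(-det_saddle / det_min)
    return prefactor * math.exp(barrier / eps)

# Hypothetical one-dimensional data: F''(z*) = -1, F''(x) = 2, barrier 1/4, r_1 = 1/4.
r1 = 0.25
t = mean_exit_time(eps=0.05, barrier=0.25, gamma1=r1 * (-1.0),
                   det_saddle=-1.0, det_min=2.0)
print("predicted mean number of steps:", t)
```

The exponential factor dominates: halving ε squares e^{barrier/ε}, while the prefactor only doubles.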

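Theorem 10.10 (Link between spectrum and metastable exit times) Suppose that
there exists a θ > 0 such that the elements of $\mathcal{M}_\varepsilon$ can be labeled in such a way that
\[
\Phi(x_k, \mathcal{M}_{k-1}) - F(x_k) \le \min_{1\le l<k}\bigl[\Phi(x_l, \mathcal{M}_k\setminus x_l) - F(x_l)\bigr] - \theta,
\qquad k = 2, \dots, |\mathcal{M}_\varepsilon|, \tag{10.1.9}
\]
where $\mathcal{M}_k = \{x_1, \dots, x_k\}$, $k = 1, \dots, |\mathcal{M}_\varepsilon|$. As ε ↓ 0,
\[
\lambda_k = \frac{1}{E_{x_k}[\tau_{\mathcal{M}_k}]}\bigl[1 + O(e^{-\delta/\varepsilon})\bigr],
\qquad k = 2, \dots, |\mathcal{M}_\varepsilon|, \tag{10.1.10}
\]
for some δ = δ(θ) > 0, where $\lambda_k$ is the k-th eigenvalue of $-L_\varepsilon$ (in increasing order),
and $\lambda_1 = 0$.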

Theorem 10.11 (Exponential law of the metastable exit time) Under the assumptions
of Theorem 10.10, for $k = 1, \dots, |\mathcal{M}_\varepsilon|$,
\[
\lim_{\varepsilon\downarrow 0} P_{x_k}\bigl[\tau_{\mathcal{M}_k}/E_{x_k}[\tau_{\mathcal{M}_k}] > t\bigr] = e^{-t}, \qquad t \ge 0. \tag{10.1.11}
\]

The proof of Theorem 10.8 is given in Sects. 10.2–10.3. Theorems 10.9–10.11
follow from Theorems 8.15, 8.43 and 8.45, together with Theorem 10.8 and the
following lemma.

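Lemma 10.12 For every $x \in \mathcal{M}_\varepsilon$, as ε ↓ 0,
\[
\mu_\varepsilon(A(x)) = e^{-F(x)/\varepsilon}\left(\frac{2\pi}{\varepsilon}\right)^{d/2}
\frac{1}{\sqrt{\det A(x)}}
\Bigl[1 + O\Bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr)\Bigr], \tag{10.1.12}
\]
where A(x) is the valley around x defined in (8.2.10).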

Proof The lemma states that, up to the error given, the mass of A(x) is the same
as if the potential were of the form $F(y) = F(x) + \tfrac12 ((y - x), A(x)(y - x))$. In
that case the discrete sum over lattice points with spacing ε is well approximated
by the corresponding Gaussian integral. Furthermore, the contributions to both the
Gaussian integral and the original sum coming from the region where
$\|y - x\|_\infty \ge C\sqrt{\varepsilon \ln(1/\varepsilon)}$ are by a factor of order $\varepsilon^C$ smaller than the main
contribution and can be neglected. On the remaining set, by Taylor expansion,
\[
\varepsilon^{-1}\bigl| F(y) - F(x) - \tfrac12 ((y - x), A(x)(y - x)) \bigr| \le C\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}. \tag{10.1.13}
\]
This results in the error term in (10.1.12). More details can be found in the computation
of capacities carried out in Sects. 10.2–10.3, which uses very similar approximations.
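The claim that the lattice sum is well approximated by the Gaussian integral is easy to test in the purely quadratic case. The sketch below does this in dimension d = 1 for F(y) = a y²/2 (so F(x) = 0 at the minimum x = 0 and det A(x) = a); the values of a and ε and the function names are illustrative assumptions.

```python
import math

def lattice_gaussian_sum(a, eps, cutoff=40.0):
    """Sum of exp(-a*y**2/(2*eps)) over the lattice points y = k*eps, k integer.

    Terms with exponent below -cutoff are dropped; they are smaller than exp(-cutoff).
    """
    k_max = int(math.sqrt(2.0 * cutoff / (a * eps))) + 1
    return sum(math.exp(-a * (k * eps) ** 2 / (2.0 * eps))
               for k in range(-k_max, k_max + 1))

def gaussian_prediction(a, eps):
    """Leading term of (10.1.12) for d = 1 and F(x) = 0: sqrt(2*pi/eps)/sqrt(a)."""
    return math.sqrt(2.0 * math.pi / eps) / math.sqrt(a)

ratio = lattice_gaussian_sum(2.0, 0.01) / gaussian_prediction(2.0, 0.01)
print("lattice sum / Gaussian prediction:", ratio)
```

For an exactly quadratic potential the agreement is far better than the error term in (10.1.12) suggests; that error comes from the cubic Taylor remainder in (10.1.13), not from replacing the sum by an integral.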

10.2 Upper bounds on capacities

In this section we derive upper bounds on capacities between two local minima. For
this we use the Dirichlet principle. We only need to produce a good test function.
We want to estimate
\[
\mathrm{cap}(A, B) = \inf_{h\in\mathcal{H}_{A,B}} \mathcal{E}(h, h), \tag{10.2.1}
\]
where the Dirichlet form is given by
\[
\mathcal{E}(h, h) = \tfrac12 \sum_{x,y\in S_\varepsilon} e^{-F(x)/\varepsilon}\, p_\varepsilon(x, y)\,[h(x) - h(y)]^2, \tag{10.2.2}
\]
and $\mathcal{H}_{A,B} = \{h\colon S_\varepsilon \to \mathbb{R} : \mathcal{E}(h, h) < \infty,\ h|_A \ge 1,\ h|_B \le 0\}$. The general strategy
to construct a test function is the following. We choose a strip $W_0$ of width
$C\sqrt{\varepsilon \ln(1/\varepsilon)}$ with the following properties (see Fig. 10.1):
(i) The complement of $W_0$ in $S_\varepsilon$ consists of two parts: $W_1$ containing A and $W_2$
containing B.
(ii) $W_0$ contains $z^*$, and for a cube $D_\varepsilon$ of linear size $C\sqrt{\varepsilon \ln(1/\varepsilon)}$ centered at $z^*$,
with C large enough, $W_0 \cap D_\varepsilon$ is contained in the set
$\{x \in S_\varepsilon : F(x) > F(z^*) + c\varepsilon \ln(1/\varepsilon)\}$ for a suitably chosen c > 1.

Fig. 10.1 Domains for the construction of the test function in (10.2.3), with $m_1^* \in A$ and $m_2^* \in B$

A test function $\tilde g$ is taken of the form
\[
\tilde g(x) =
\begin{cases}
0, & \text{if } x \in W_1,\\
1, & \text{if } x \in W_2,\\
g(x), & \text{if } x \in W_0 \cap D_\varepsilon = W_0^{\mathrm{in}},\\
0, & \text{if } x \in W_0 \cap D_\varepsilon^c = W_0^{\mathrm{out}},
\end{cases}
\tag{10.2.3}
\]
where g(x) has to be chosen carefully as an approximately harmonic function. It
should be clear that the non-negligible contributions to the Dirichlet form come
from the region $W_0 \cap D_\varepsilon$. The advantage is that, within the small region $D_\varepsilon$, the
Dirichlet form can be approximated by a simplified form for which a harmonic
function is readily found.
The proof proceeds in three steps: cleaning of the Dirichlet form (Sect. 10.2.1);
construction of approximate harmonic functions (Sect. 10.2.2); final estimate
(Sect. 10.2.3).
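The Dirichlet principle (10.2.1) can be made concrete in one dimension, where the capacity between the endpoints of a chain is known exactly from the series formula for conductances. The sketch below compares the exact capacity with the Dirichlet form (10.2.2) evaluated at the harmonic function and at a crude linear test function. The potential, ε, r and all function names are assumptions made for this illustration only.

```python
import math

def F(x):
    """Illustrative double-well potential (an assumption for this sketch)."""
    return x**4 / 4 - x**2 / 2

def edge_conductances(F, eps, xs, r=0.25):
    """c_i = mu(x_i) p(x_i, x_{i+1}) for the walk (10.1.2) on a 1D chain."""
    return [r * math.exp(-max(F(xs[i]), F(xs[i + 1])) / eps)
            for i in range(len(xs) - 1)]

def dirichlet_form(c, h):
    """(10.2.2) on a chain; counting each edge once absorbs the factor 1/2."""
    return sum(ci * (h[i] - h[i + 1]) ** 2 for i, ci in enumerate(c))

eps = 0.05
xs = [-1.0 + i * eps for i in range(41)]         # from the minimum -1 to the minimum +1
c = edge_conductances(F, eps, xs)
cap = 1.0 / sum(1.0 / ci for ci in c)            # exact capacity (conductances in series)

# Harmonic function with h = 1 at A = {-1}: increments proportional to 1/c_i.
h_harm = [1.0]
for ci in c:
    h_harm.append(h_harm[-1] - cap / ci)
h_lin = [1.0 - i / 40 for i in range(41)]        # crude admissible test function

upper_harm = dirichlet_form(c, h_harm)
upper_lin = dirichlet_form(c, h_lin)
print("capacity:", cap, "harmonic:", upper_harm, "linear:", upper_lin)
```

Any admissible test function gives an upper bound; the art in this section is to find one, namely $\tilde g$, whose Dirichlet form matches the capacity up to the stated error.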

10.2.1 Cleaning of the Dirichlet form

The following lemma provides a continuity property for capacities.

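Lemma 10.13 Let $\mathcal{E}$, $\tilde{\mathcal{E}}$ be two Dirichlet forms defined for the same state space
$S_\varepsilon$, corresponding to reversible measures $\mu_\varepsilon$, $\tilde\mu_\varepsilon$ and transition matrices $p_\varepsilon$, $\tilde p_\varepsilon$,
respectively. Assume that, for all $x, x' \in S_\varepsilon$,
\[
\left|\frac{\mu_\varepsilon(x)}{\tilde\mu_\varepsilon(x)} - 1\right| \le \delta, \qquad
\left|\frac{p_\varepsilon(x, x')}{\tilde p_\varepsilon(x, x')} - 1\right| \le \delta. \tag{10.2.4}
\]
Then for any disjoint and non-empty $A, B \subset S_\varepsilon$,
\[
(1 - \delta)^2 \le \frac{\mathrm{cap}(A, B)}{\widetilde{\mathrm{cap}}(A, B)} \le (1 - \delta)^{-2}, \tag{10.2.5}
\]
where $\mathrm{cap}(A, B) = \inf_{u\in\mathcal{H}_{A,B}} \mathcal{E}(u, u)$ and $\widetilde{\mathrm{cap}}(A, B) = \inf_{u\in\mathcal{H}_{A,B}} \tilde{\mathcal{E}}(u, u)$.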

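Proof Note that there exists an $h^*$ such that
\begin{align*}
\mathrm{cap}(A, B) = \mathcal{E}(h^*, h^*)
&= \tfrac12 \sum_{x,x'\in S_\varepsilon} \tilde\mu_\varepsilon(x)\,\frac{\mu_\varepsilon(x)}{\tilde\mu_\varepsilon(x)}\,
   \tilde p_\varepsilon(x, x')\,\frac{p_\varepsilon(x, x')}{\tilde p_\varepsilon(x, x')}\,[h^*(x) - h^*(x')]^2 \\
&\ge \tfrac12 \sum_{x,x'\in S_\varepsilon} \tilde\mu_\varepsilon(x)(1 - \delta)\,\tilde p_\varepsilon(x, x')(1 - \delta)\,[h^*(x) - h^*(x')]^2 \\
&\ge (1 - \delta)^2 \inf_{h\in\mathcal{H}_{A,B}} \tfrac12 \sum_{x,x'\in S_\varepsilon} \tilde\mu_\varepsilon(x)\,\tilde p_\varepsilon(x, x')\,[h(x) - h(x')]^2 \\
&= (1 - \delta)^2\, \widetilde{\mathrm{cap}}(A, B). \tag{10.2.6}
\end{align*}
Reversing the rôles of $\mathcal{E}$ and $\tilde{\mathcal{E}}$, we also get
\[
\widetilde{\mathrm{cap}}(A, B) \ge (1 - \delta)^2\, \mathrm{cap}(A, B), \tag{10.2.7}
\]
and the claim follows.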
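The two-sided bound (10.2.5) can be illustrated on a chain, where the capacity between the endpoints is given by the series formula for the edge conductances μ(x)p(x, x′). In the sketch below the reference weights, the perturbation size δ and the function name are made-up values for illustration; each factor is perturbed by a relative amount of at most δ, as in (10.2.4).

```python
import random

def series_capacity(mu, p):
    """Capacity between the endpoints of a chain with edge conductances mu[i]*p[i]."""
    return 1.0 / sum(1.0 / (m * q) for m, q in zip(mu, p))

rng = random.Random(1)
delta = 0.1
n_edges = 30
mu_tilde = [rng.uniform(0.5, 2.0) for _ in range(n_edges)]    # reference weights (made up)
p_tilde = [rng.uniform(0.05, 0.25) for _ in range(n_edges)]   # reference jump probabilities

# Perturb every weight and every jump probability by a relative amount <= delta.
mu = [m * (1.0 + rng.uniform(-delta, delta)) for m in mu_tilde]
p = [q * (1.0 + rng.uniform(-delta, delta)) for q in p_tilde]

ratio = series_capacity(mu, p) / series_capacity(mu_tilde, p_tilde)
lower, upper = (1.0 - delta) ** 2, (1.0 - delta) ** -2
print(lower, "<=", ratio, "<=", upper)
```

Since every conductance changes by a factor between (1 − δ)² and (1 + δ)², and (1 + δ)² ≤ (1 − δ)^{−2}, the ratio of capacities must lie in the stated window.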

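We use Lemma 10.13 for our Dirichlet forms restricted to neighbourhoods of the
saddle point $z^*$. For $\rho = \rho(\varepsilon) = C\sqrt{\varepsilon \ln(1/\varepsilon)}$ with C < ∞, define
\[
D_\varepsilon(\rho) = \bigl\{ x \in S_\varepsilon : |z^*_\ell - x_\ell| \le \rho \ \ \forall\, \ell = 1, \dots, d \bigr\}. \tag{10.2.8}
\]
We need to control the transition probabilities $p_\varepsilon(x, x')$ and the reversible measure
$\mu_\varepsilon(x) = \exp(-F(x)/\varepsilon)$ in terms of suitable modifications. Let $A(z^*) = A$ be the
Hessian matrix of F at the saddle point $z^*$, and set
\[
\tilde\mu_\varepsilon(x) = \exp\bigl(-\tfrac12 ((x - z^*), A(x - z^*))/\varepsilon\bigr). \tag{10.2.9}
\]
Then, by Taylor expansion, for some K < ∞,
\[
\bigl|\mu_\varepsilon(x)/\tilde\mu_\varepsilon(x) - 1\bigr| \le K\rho^3/\varepsilon, \qquad x \in D_\varepsilon(\rho). \tag{10.2.10}
\]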
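By (10.1.2) and Assumption 10.1, the transition probabilities are $C^1$-functions so
that, for some C < ∞,
\[
\left|\frac{p_\varepsilon(x, x + \varepsilon e_\ell)}{r_\ell} - 1\right| \le C\sqrt{\varepsilon \ln(1/\varepsilon)}, \qquad x \in D_\varepsilon(\rho). \tag{10.2.11}
\]
With this in mind, we let $\tilde L_\varepsilon$ be the generator of the dynamics on $D_\varepsilon(\rho)$ with
transition probabilities $\tilde r(x, y)$ given by
\[
\tilde r(x, x + \varepsilon e_\ell) = r_\ell, \qquad
\tilde r(x + \varepsilon e_\ell, x) = r_\ell\, \tilde\mu_\varepsilon(x)/\tilde\mu_\varepsilon(x + \varepsilon e_\ell). \tag{10.2.12}
\]
The fact that we choose the transition probabilities in the directions $+e_\ell$ to be constant
is arbitrary. In the directions $-e_\ell$ we must choose them such that reversibility
with respect to the modified reversible measure $\tilde\mu_\varepsilon$ is satisfied.
For $u \in \mathcal{H}_{A,B}$, we write the corresponding Dirichlet form as
\[
\tilde{\mathcal{E}}_{D_\varepsilon}(u, u) = \sum_{x\in D_\varepsilon(\rho)} \sum_{\ell=1}^d r_\ell\, \tilde\mu_\varepsilon(x)\,[u(x) - u(x + \varepsilon e_\ell)]^2, \tag{10.2.13}
\]
where we note that $\tilde\mu_\varepsilon(z^*) = 1$.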

10.2.2 Construction of an approximate harmonic function

In this section we construct a function that is almost harmonic with respect to the
Dirichlet form $\tilde{\mathcal{E}}_{D_\varepsilon}$.
Recall the matrix $B(z^*) = B$ defined in (10.1.6). Let $\hat v^{(i)}$, i = 1, . . . , d, be the
normalized eigenvectors of B, and $\hat\gamma_i$ the corresponding eigenvalues. Denote by $\hat\gamma_1$
the unique negative eigenvalue of B. Define vectors $v^{(i)}$ by
\[
v^{(i)}_\ell = \hat v^{(i)}_\ell / \sqrt{r_\ell}, \qquad \ell = 1, \dots, d, \tag{10.2.14}
\]
and vectors $\check v^{(i)}$ by
\[
\check v^{(i)}_\ell = \hat v^{(i)}_\ell \sqrt{r_\ell} = r_\ell\, v^{(i)}_\ell, \qquad \ell = 1, \dots, d. \tag{10.2.15}
\]
The important fact about these vectors is that
\[
A \check v^{(i)} = \hat\gamma_i\, v^{(i)} \tag{10.2.16}
\]
and
\[
(\check v^{(i)}, v^{(j)}) = \delta_{ij}. \tag{10.2.17}
\]
This implies the following non-orthogonal decomposition of the quadratic form A:
\[
(y, Ax) = \sum_{i=1}^d \hat\gamma_i\, (y, v^{(i)})\,(x, v^{(i)}). \tag{10.2.18}
\]
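The identities (10.2.16)–(10.2.18) follow from B = R^{1/2} A R^{1/2} with R = diag(r_1, . . . , r_d), and they can be verified numerically in a small case. The sketch below does so for d = 2, using the closed-form eigen-decomposition of a symmetric 2×2 matrix; the Hessian A, the values r_1, r_2 and the helper names are hypothetical choices for illustration.

```python
import math

def sym_eig_2x2(B):
    """Eigenpairs (increasing order) of a symmetric 2x2 matrix with B[0][1] != 0."""
    a, b, c = B[0][0], B[0][1], B[1][1]
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (mean - rad, mean + rad):
        vec = (b, lam - a)                     # solves (B - lam) vec = 0 when b != 0
        norm = math.hypot(vec[0], vec[1])
        pairs.append((lam, (vec[0] / norm, vec[1] / norm)))
    return pairs

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def mat_vec(M, u):
    return [dot(row, u) for row in M]

A = [[-1.0, 0.5], [0.5, 2.0]]                  # hypothetical saddle Hessian, det A < 0
r = (0.3, 0.2)                                 # hypothetical r_1, r_2
B = [[math.sqrt(r[i]) * A[i][j] * math.sqrt(r[j]) for j in range(2)]
     for i in range(2)]                        # (10.1.6)

eig = sym_eig_2x2(B)
gammas = [lam for lam, _ in eig]
v = [[vh[l] / math.sqrt(r[l]) for l in range(2)] for _, vh in eig]        # (10.2.14)
v_check = [[vh[l] * math.sqrt(r[l]) for l in range(2)] for _, vh in eig]  # (10.2.15)

for i in range(2):
    lhs = mat_vec(A, v_check[i])               # A v_check^(i)
    rhs = [gammas[i] * comp for comp in v[i]]  # gamma_i v^(i)
    print(i, lhs, rhs, [dot(v_check[i], v[j]) for j in range(2)])
```

Note that the vectors v^{(i)} are in general not orthogonal to each other; it is the pairing between the two families v̌^{(i)} and v^{(j)} that is orthonormal.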

Define the function f : R → [0, 1] by
\[
f(a) = \frac{\int_{-\infty}^a e^{-|\hat\gamma_1|u^2/2\varepsilon}\,du}{\int_{-\infty}^\infty e^{-|\hat\gamma_1|u^2/2\varepsilon}\,du}
= \sqrt{\frac{|\hat\gamma_1|}{2\pi\varepsilon}} \int_{-\infty}^a e^{-|\hat\gamma_1|u^2/2\varepsilon}\,du. \tag{10.2.19}
\]
Finally, we single out the vectors $v = v^{(1)}$, $\check v = \check v^{(1)}$, $\hat v = \hat v^{(1)}$ and set
\[
g(x) = f((v, x)), \tag{10.2.20}
\]
which is our choice for the approximately harmonic function in the definition of
$\tilde g$ in (10.2.3). Note that g(x) only varies in the direction of the vector v, and that
it is close to 0 when (v, x) ≤ −ρ, and close to 1 when (v, x) ≥ ρ. Moreover, the
following estimate holds.

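Lemma 10.14 Let g be as in (10.2.20), and let $\tilde L_\varepsilon$ be the generator defined after
(10.2.11). Then, for all $x \in D_\varepsilon(\rho)$, there exists a constant c < ∞ such that
\[
\bigl|(\tilde L_\varepsilon g)(x)\bigr| \le
\left( \sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon} \sum_{\ell=1}^d r_\ell\, v_\ell \right) O(\rho^2). \tag{10.2.21}
\]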

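Proof We choose coordinates such that $z^* = 0$, and set $A = A(z^*)$. Using reversibility,
we get
\begin{align*}
\tilde r(x, x - \varepsilon e_\ell)
&= \exp\Bigl(-\tfrac12 \varepsilon^{-1}\bigl[(x, Ax) - ((x - \varepsilon e_\ell), A(x - \varepsilon e_\ell))\bigr]\Bigr)\, r_\ell \\
&= \exp\bigl(-(e_\ell, Ax)\bigr)\,[1 + O(\varepsilon)]\, r_\ell. \tag{10.2.22}
\end{align*}
Therefore
\[
(\tilde L_\varepsilon g)(x) = \sum_{\ell=1}^d r_\ell\, [g(x + \varepsilon e_\ell) - g(x)]
\left( 1 - \exp\bigl(-(e_\ell, Ax)\bigr)\,[1 + O(\varepsilon)]\,
\frac{g(x) - g(x - \varepsilon e_\ell)}{g(x + \varepsilon e_\ell) - g(x)} \right). \tag{10.2.23}
\]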

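Next, we use the explicit form of g given in (10.2.20) to obtain, by Taylor expansion,
that for some $\tilde x \in [x, x + \varepsilon e_\ell]$,
\begin{align*}
g(x + \varepsilon e_\ell) - g(x)
&= f((v, x) + \varepsilon v_\ell) - f((v, x)) \tag{10.2.24} \\
&= v_\ell \varepsilon f'((v, x)) + \tfrac12 v_\ell^2 \varepsilon^2 f''((v, x)) + \tfrac16 v_\ell^3 \varepsilon^3 f'''((v, \tilde x)) \\
&= v_\ell \sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon}\,
   \bigl[1 - v_\ell |\hat\gamma_1|(v, x) + O(\rho^2)\bigr],
\end{align*}
where we use (10.2.19) and $\rho = C\sqrt{\varepsilon \ln(1/\varepsilon)}$. In particular, we get that
\begin{align*}
\frac{g(x) - g(x - \varepsilon e_\ell)}{g(x + \varepsilon e_\ell) - g(x)}
&= \exp\bigl(-|\hat\gamma_1|\,[(v, x - \varepsilon e_\ell)^2 - (v, x)^2]/2\varepsilon\bigr) \tag{10.2.25} \\
&\qquad \times \frac{1 - v_\ell |\hat\gamma_1|[(v, x) - v_\ell \varepsilon] + O(\rho^2)}{1 - v_\ell |\hat\gamma_1|(v, x) + O(\rho^2)} \\
&= \exp\bigl(-|\hat\gamma_1| v_\ell (v, x)\bigr)
   \left( 1 + \frac{v_\ell^2 \varepsilon|\hat\gamma_1| + O(\rho^2)}{1 - v_\ell |\hat\gamma_1|(v, x) + O(\rho^2)} \right) \\
&= \exp\bigl(-|\hat\gamma_1| v_\ell (v, x)\bigr)\,[1 + O(\rho^2)].
\end{align*}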

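Inserting these equations into (10.2.23), we get
\begin{align*}
(\tilde L_\varepsilon g)(x) ={}& \sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon}
\sum_{\ell=1}^d r_\ell v_\ell \bigl[1 - v_\ell |\hat\gamma_1|(v, x) + O(\rho^2)\bigr] \\
& \times \Bigl( 1 - \exp\bigl(-(e_\ell, Ax) - |\hat\gamma_1| v_\ell (v, x)\bigr)\,[1 + O(\rho^2)] \Bigr). \tag{10.2.26}
\end{align*}
Now
\[
1 - \exp\bigl(-(e_\ell, Ax) - |\hat\gamma_1| v_\ell (v, x)\bigr)\,[1 + O(\rho^2)]
= (e_\ell, Ax) + |\hat\gamma_1| v_\ell (v, x) + O(\rho^2). \tag{10.2.27}
\]
Using this fact, and collecting the leading order terms, we get
\[
(\tilde L_\varepsilon g)(x) = \sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon}
\sum_{\ell=1}^d r_\ell v_\ell \bigl( (e_\ell, Ax) + |\hat\gamma_1| v_\ell (v, x) + O(\rho^2) \bigr). \tag{10.2.28}
\]
Thus, since $\hat\gamma_1 < 0$, we have proved the claim, provided
\[
\sum_{\ell=1}^d r_\ell v_\ell \bigl[ (e_\ell, Ax) - \hat\gamma_1 v_\ell (v, x) \bigr] = 0. \tag{10.2.29}
\]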
But from (10.2.18) we obtain
\[
(e_\ell, Ax) - \hat\gamma_1 v_\ell (v, x) = \sum_{j=2}^d \hat\gamma_j\, v^{(j)}_\ell\, (v^{(j)}, x). \tag{10.2.30}
\]
Hence, recalling that $r_\ell v_\ell = \check v^{(1)}_\ell$ by (10.2.15) and that $\check v^{(1)}$ is orthogonal to $v^{(j)}$ for
j ≥ 2 by (10.2.17), we see that (10.2.29) holds.
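The profile f in (10.2.19), on which the whole proof rests, is a Gaussian distribution function with variance ε/|γ̂_1|, so it can be written through the complementary error function. The sketch below evaluates it at ±ρ with ρ = C√(ε ln(1/ε)) to show how quickly g saturates; the values of ε, γ̂_1 and C are illustrative assumptions.

```python
import math

def f(a, gamma1, eps):
    """(10.2.19): Gaussian distribution function with variance eps/|gamma1|.

    Written with erfc so that the small left tail is computed without cancellation.
    """
    return 0.5 * math.erfc(-a * math.sqrt(abs(gamma1) / (2.0 * eps)))

eps, gamma1, C = 0.01, -1.0, 3.0                # hypothetical parameters
rho = C * math.sqrt(eps * math.log(1.0 / eps))
left, mid, right = f(-rho, gamma1, eps), f(0.0, gamma1, eps), f(rho, gamma1, eps)
print("f(-rho) =", left, " f(0) =", mid, " f(rho) =", right)
```

With |γ̂_1| = 1 the standard Gaussian tail bound gives f(−ρ) ≤ ε^{C²/2}/2, which is the kind of polynomially small boundary value used when the contributions outside D_ε are discarded.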

10.2.3 Final estimate

Lemma 10.14 justifies the choice of g and will play an important rôle also in the
derivation of the lower bound in Sect. 10.3. But first we state the upper bound that
follows from it.

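Proposition 10.15 With the notation introduced above,
\[
\mathrm{cap}(A, B) \le \mu_\varepsilon(z^*)\,\frac{|\hat\gamma_1|\varepsilon}{2\pi}
\left(\frac{2\pi}{\varepsilon}\right)^{d/2} \frac{1}{\sqrt{-\det A(z^*)}}
\Bigl[1 + O\Bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr)\Bigr]. \tag{10.2.31}
\]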

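Proof Return to Fig. 10.1. We first estimate the contribution of the set $D_\varepsilon \cap W_0$. By
Lemma 10.13, this can be controlled in terms of the modified Dirichlet form $\tilde{\mathcal{E}}_{D_\varepsilon}$ in
(10.2.13). Thus, let g be the function defined in (10.2.20), and choose coordinates
such that $z^* = 0$ and $F(z^*) = 0$. Then, by (10.2.9) and (10.2.24),
\begin{align*}
\tilde{\mathcal{E}}_{D_\varepsilon}(g, g)
&= \sum_{x\in D_\varepsilon} e^{-(x,Ax)/2\varepsilon} \sum_{\ell=1}^d r_\ell\, [g(x + \varepsilon e_\ell) - g(x)]^2 \tag{10.2.32} \\
&= \frac{|\hat\gamma_1|\varepsilon}{2\pi} \sum_{x\in D_\varepsilon} e^{-|\hat\gamma_1|(v,x)^2/\varepsilon}\, e^{-(x,Ax)/2\varepsilon}
   \sum_{\ell=1}^d r_\ell v_\ell^2\, \bigl[1 - v_\ell |\hat\gamma_1|(v, x) + O(\rho^2)\bigr]^2 \\
&= \Bigl[1 + O\Bigl(\sqrt{\varepsilon \ln(1/\varepsilon)}\Bigr)\Bigr]\, \frac{|\hat\gamma_1|\varepsilon}{2\pi}
   \sum_{x\in D_\varepsilon} e^{-|\hat\gamma_1|(v,x)^2/\varepsilon}\, e^{-(x,Ax)/2\varepsilon},
\end{align*}
where we use that $\sum_{\ell=1}^d r_\ell v_\ell^2 = \sum_{\ell=1}^d \hat v_\ell^2 = 1$. It remains to compute the sum over x.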
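Via a standard approximation of the sum by an integral we get, by (10.1.6), (10.2.14)
and (10.2.19),
\begin{align*}
\sum_{x\in D_\varepsilon} & \exp\bigl(-|\hat\gamma_1|(v, x)^2/\varepsilon - (x, Ax)/2\varepsilon\bigr) \\
&= \varepsilon^{-d}\,[1 + O(\rho)] \int_{D_\varepsilon} dx\, \exp\bigl(-|\hat\gamma_1|(v, x)^2/\varepsilon - (x, Ax)/2\varepsilon\bigr) \\
&= [1 + O(\rho)] \prod_{\ell=1}^d \frac{\sqrt{r_\ell}}{\varepsilon} \int_{\bar D_\varepsilon} dy\, \exp\bigl(-|\hat\gamma_1|(y, \hat v)^2/\varepsilon - (y, By)/2\varepsilon\bigr) \\
&= [1 + O(\rho)] \prod_{\ell=1}^d \frac{\sqrt{r_\ell}}{\varepsilon} \int_{\bar D_\varepsilon} dy\, \exp\Bigl(-|\hat\gamma_1|(y, \hat v)^2/\varepsilon - \sum_{j=1}^d \hat\gamma_j (\hat v^{(j)}, y)^2/2\varepsilon\Bigr) \\
&= [1 + O(\rho)] \prod_{\ell=1}^d \frac{\sqrt{r_\ell}}{\varepsilon} \int_{\bar D_\varepsilon} dy\, \exp\Bigl(-\sum_{j=1}^d |\hat\gamma_j| (\hat v^{(j)}, y)^2/2\varepsilon\Bigr) \\
&= [1 + O(\rho)] \left(\frac{2\pi}{\varepsilon}\right)^{d/2} \prod_{\ell=1}^d \sqrt{\frac{r_\ell}{|\hat\gamma_\ell|}} \\
&= [1 + O(\rho)] \left(\frac{2\pi}{\varepsilon}\right)^{d/2} \bigl(-\det A\bigr)^{-1/2}. \tag{10.2.33}
\end{align*}
Here, $\bar D_\varepsilon$ is the image of $D_\varepsilon$ under the change of variables $y_\ell = x_\ell/\sqrt{r_\ell}$, and in
the last equality we use the fact that Gaussian integrals over intervals of length
$\rho = C\sqrt{\varepsilon \ln(1/\varepsilon)}$ are equal to integrals over R up to errors of order $\varepsilon^C$.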
Inserting (10.2.33) into (10.2.32), we see that the left-hand side of (10.2.32) is
equal to the right-hand side of (10.2.31) up to error terms. It therefore remains to
show that the sum outside $D_\varepsilon$ in the Dirichlet form does not contribute significantly
to the capacity. But we can always choose $D_\varepsilon$ and $W_0$ in such a way that the following
hold:
(i) For $x \in W_0 \cap D_\varepsilon^c$, $\mu_\varepsilon(x) \le \mu_\varepsilon(z^*)\varepsilon^K$ with K as large as desired.
(ii) If $x \in W_0 \cap D_\varepsilon$ and $y \in W_1$, with $p_\varepsilon(x, y) > 0$, then $g(x)^2 \mu_\varepsilon(x)/\mu_\varepsilon(z^*) \le \varepsilon^K$.
Similarly, if $x \in W_0 \cap D_\varepsilon$ and $y \in W_2$, with $p_\varepsilon(x, y) > 0$, then
$(g(x) - 1)^2 \mu_\varepsilon(x)/\mu_\varepsilon(z^*) \le \varepsilon^K$. Both facts follow from the explicit form of
g and the fact that F is close to its quadratic approximation on $D_\varepsilon$.
From these observations we easily derive that the contribution to the Dirichlet form
from $(W_0 \cap D_\varepsilon)^c$ is negligible compared to the contribution from $W_0 \cap D_\varepsilon$. This
yields Proposition 10.15.

10.3 Lower bounds on capacities

To obtain sharp lower bounds on capacities we use the Berman-Konsowa principle
from Sect. 7.3 and the ideas presented in Sect. 9.1, in particular, Lemma 9.4. We
prove the following counterpart of Proposition 10.15.

Fig. 10.2 The defective flow from A to B

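Proposition 10.16 With the notation from Proposition 10.15,
\[
\mathrm{cap}(A, B) \ge \mu_\varepsilon(z^*)\,\frac{|\hat\gamma_1|\varepsilon}{2\pi}
\left(\frac{2\pi}{\varepsilon}\right)^{d/2} \frac{1}{\sqrt{-\det A(z^*)}}
\Bigl[1 + O\Bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr)\Bigr]. \tag{10.3.1}
\]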

Proof We have to construct a defective unit flow $f_{A,B}$ from A to B that reproduces
the upper bound from Proposition 10.15. The construction of the flow is a bit more
artistic than the construction of the approximate harmonic function. The idea is to
channel the flow through a certain neighbourhood $G_\varepsilon$ of $z^*$ that plays a rôle similar
to that of the cube $D_\varepsilon$ in Sect. 10.2 (recall Fig. 10.2). We will construct the flow
from three pieces, $f_A$, $f$, $f_B$. Here, $f_A$ is a unit flow from A to $\partial_A G_\varepsilon$, $f_B$ is a unit
flow from $\partial_B G_\varepsilon$ to B, and f is a defective unit flow from $\partial_A G_\varepsilon$ to $\partial_B G_\varepsilon$ associated
with the approximate harmonic function g that we used in the upper bound (recall
Fig. 10.1 and see Fig. 10.2). In fact, for $x \in G_\varepsilon$ we set
\[
f(x, x + \varepsilon e_\ell) = \frac{\tilde\mu_\varepsilon(x)\, r_\ell\, [g(x + \varepsilon e_\ell) - g(x)]}{N(g)}, \tag{10.3.2}
\]
where N(g) is the normalising constant so that the total flow out of $\partial_A G_\varepsilon$ equals 1,
which is given by
\[
N(g) = \sum_{x\in\partial_A G_\varepsilon}\ \sum_{\substack{1\le \ell\le d:\\ x+\varepsilon e_\ell\in G_\varepsilon}}
\tilde\mu_\varepsilon(x)\, r_\ell\, [g(x + \varepsilon e_\ell) - g(x)]. \tag{10.3.3}
\]
Recall from (10.2.24) that
\[
g(x + \varepsilon e_\ell) - g(x) = v_\ell \sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\,
\exp\bigl(-|\hat\gamma_1|(v, x)^2/2\varepsilon\bigr)\,[1 + O(\rho)] \tag{10.3.4}
\]
uniformly in $G_\varepsilon$. Recall the definition of the transition probabilities $q^f$ in (7.3.26).
Substitution of (10.3.4) into (10.3.2) yields, because $r_\ell v_\ell = \check v_\ell$,
\[
q^f(x, x + \varepsilon e_\ell) = \frac{f((x, x + \varepsilon e_\ell))}{\sum_{k=1}^d f((x, x + \varepsilon e_k))}
= \frac{\check v_\ell}{\sum_{k=1}^d \check v_k}\,[1 + O(\rho)]. \tag{10.3.5}
\]

This is essentially a directed nearest-neighbour random walk with drift in the direction
of $\check v$. Recall that $\check v$ is the direction of steepest descent of F at the saddle
point $z^*$. This implies, in particular, that with probability tending to one paths starting
in $\partial_A G_\varepsilon$ stay within a larger cylinder in the direction of $\check v$ with base containing
$\partial_A G_\varepsilon$ before leaving $D_\varepsilon$.
The choice of the flow into ∂A Gε is rather arbitrary. Ideally, we would like to
take disjoint paths from A to each point in ∂A Gε and send the flow through this
path. Of course, this is not possible when e.g. A is a single point, since in that case
paths need to merge. However, this is not really a problem because these parts of the
paths will not give any relevant contributions to the capacity anyway. Likewise, the
flow arriving in ∂B Gε will be channeled into B along coalescing paths. Figure 10.2
depicts this choice.
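We will only consider paths from A to B that enter $G_\varepsilon$ through $\partial_A G_\varepsilon$ and exit $G_\varepsilon$
through $\partial_B G_\varepsilon$. By construction, this set of paths has $P^f$-probability at least 1 − o(1).
Any such path consists of three pieces: $\gamma_1\colon A \to \partial_A G_\varepsilon$, $\gamma_2\colon \partial_A G_\varepsilon \to \partial_B G_\varepsilon$ and
$\gamma_3\colon \partial_B G_\varepsilon \to B$. Consequently,
\[
\sum_{(x,y)\in\gamma} \frac{f_{A,B}((x, y))}{\tilde\mu_\varepsilon(x)p_\varepsilon(x, y)}
= \sum_{i=1}^3 \sum_{(x,y)\in\gamma_i} \frac{f_{A,B}((x, y))}{\tilde\mu_\varepsilon(x)p_\varepsilon(x, y)}
= K_1 + K_2 + K_3. \tag{10.3.6}
\]
The term $K_2$ gives the desired contribution (recall (10.3.2)). Using (10.2.10–
10.2.11) and the explicit form of g in (10.2.19–10.2.20), we get (recall (10.1.2))
\begin{align*}
K_2 = \sum_{(x,y)\in\gamma_2} \frac{f((x, y))}{\tilde\mu_\varepsilon(x)p_\varepsilon(x, y)}
&= \frac{1}{N(g)} \sum_{(x,y)\in\gamma_2} [g(y) - g(x)] \\
&= \frac{1}{N(g)} \bigl[ g(\gamma_2(|\gamma_2|)) - g(\gamma_2(0)) \bigr] \\
&= \frac{1}{N(g)} \sqrt{\frac{|\hat\gamma_1|}{2\pi\varepsilon}}
   \int_{-C\sqrt{\varepsilon \ln(1/\varepsilon)}}^{C\sqrt{\varepsilon \ln(1/\varepsilon)}} e^{-|\hat\gamma_1|u^2/2\varepsilon}\,du \\
&= \frac{1}{N(g)} \bigl[ 1 - O(\varepsilon^C) \bigr]. \tag{10.3.7}
\end{align*}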

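The terms $K_1$ and $K_3$ are negligible, provided the paths $\gamma_1$, $\gamma_3$ stay within the
level set $\{x \in S_\varepsilon : F(x) \le F(z^*) - C\varepsilon \ln(1/\varepsilon)\}$, which can always be achieved due
to our assumptions on the function F. Namely, even the crudest possible bound
f((x, y)) ≤ 1 implies that (recall (10.1.1)–(10.1.2))
\begin{align*}
K_1 = \sum_{(x,y)\in\gamma_1} \frac{f_A((x, y))}{\mu_\varepsilon(x)p_\varepsilon(x, y)}
&\le \sum_{(x,y)\in\gamma_1} \frac{1}{\mu_\varepsilon(x)p_\varepsilon(x, y)} \\
&\le C|\gamma_1|\, e^{F(z^*)/2\varepsilon - C \ln(1/\varepsilon)} \ll \frac{1}{N(g)}, \tag{10.3.8}
\end{align*}
provided C is large enough. The last estimate in (10.3.8) will be shown in (10.3.18–
10.3.19) below. The same argument applies to $K_3$ and $f_B$. Hence, we obtain with
the help of Lemma 10.13 that
\[
E^f\left[ \Biggl( \sum_{(x,y)\in\gamma} \frac{f_{A,B}((x, y))}{\mu_\varepsilon(x)p_\varepsilon(x, y)} \Biggr)^{-1} \right]
\ge N(g)\,[1 - o(1)]\, P^f\bigl(\tau_{\partial_B G_\varepsilon} < \tau_{\partial G_\varepsilon \setminus \partial_B G_\varepsilon}\bigr)
= N(g)\,[1 - o(1)]. \tag{10.3.9}
\]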

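In order to apply Lemma 9.4, we must ensure that the accumulated defect is
negligible. Recall from Lemma 9.4 that the error factor is bounded by
\[
\prod_{k=1}^M \left( 1 + \max_{y\in A_k} \frac{\delta(y)}{F(y)} \right)^{-1}, \tag{10.3.10}
\]
where M is the length of the path and $A_k$, k = 1, . . . , M are the sets defined in
(7.3.31). For our choice of the flow,
\[
\frac{\delta(y)}{F(y)} = \frac{(\tilde L_\varepsilon g)(y)}{\sum_{z\in G_\varepsilon:\, g(y)<g(z)} \mu_\varepsilon(y)p_\varepsilon(y, z)[g(z) - g(y)]}. \tag{10.3.11}
\]
By Lemma 10.14 and (10.2.24), for all $y \in G_\varepsilon$,
\[
\frac{(\tilde L_\varepsilon g)(y)}{\sum_{z\in G_\varepsilon:\, g(y)<g(z)} \mu_\varepsilon(y)p_\varepsilon(y, z)[g(z) - g(y)]} \le O(\rho^2). \tag{10.3.12}
\]
On the other hand, the paths in $G_\varepsilon$ have length at most ρ/ε, so that
\[
\prod_{k=1}^M \left( 1 + \max_{y\in A_k} \frac{\delta(y)}{F(y)} \right)^{-1}
\ge \bigl[1 + O(\rho^2)\bigr]^{-\rho/\varepsilon}
\ge 1 - O\Bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr), \tag{10.3.13}
\]
which controls the error factor.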
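The last inequality in (10.3.13) can be spelled out as follows. This is only a sketch of the elementary step, in which c denotes a hypothetical constant bounding the O(ρ²) term in (10.3.12) and ρ = C√(ε ln(1/ε)) as before; it uses ln(1 + u) ≤ u and e^{−u} ≥ 1 − u for u ≥ 0.

```latex
\begin{align*}
\bigl(1 + c\rho^2\bigr)^{-\rho/\varepsilon}
  &= \exp\Bigl(-\frac{\rho}{\varepsilon}\,\ln\bigl(1 + c\rho^2\bigr)\Bigr)
   \ge \exp\Bigl(-\frac{c\rho^3}{\varepsilon}\Bigr)
   \ge 1 - \frac{c\rho^3}{\varepsilon}, \\
\frac{\rho^3}{\varepsilon}
  &= \frac{C^3\,[\varepsilon \ln(1/\varepsilon)]^{3/2}}{\varepsilon}
   = C^3 \sqrt{\varepsilon\,[\ln(1/\varepsilon)]^3}.
\end{align*}
```

So a defect of relative size ρ² per step, accumulated over at most ρ/ε steps, costs exactly the error term that already appears in the upper bound.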

Finally, we must show that, with the right choice of $G_\varepsilon$, the normalisation N(g)
is essentially equal to $\tilde{\mathcal{E}}_{D_\varepsilon}(g, g)$. Let $G_\varepsilon$ be the cylinder with axis $\hat v$, radius ρ and
length $\rho' = C'\sqrt{\varepsilon \ln(1/\varepsilon)}$, centred at $z^*$. The constants C, C′ will be chosen such
that, for all x in the front and end bases of the cylinder, $F(x) \le F(z^*) - c\varepsilon \ln(1/\varepsilon)$
for some c > 0. This is possible because, to leading order for $x = \sum_{i=1}^d a_i v^{(i)}$,
\[
F(x) = -\tfrac12 |\check\gamma_1| a_1^2 + \tfrac12 \sum_{i=2}^d \check\gamma_i a_i^2 + O(\varepsilon^{3/2}), \tag{10.3.14}
\]
where we recall (10.1.13) and (10.2.16). For points in the front base,
$a_1 = C'\sqrt{\varepsilon \ln(1/\varepsilon)}$ and $a_i \le C\sqrt{\varepsilon \ln(1/\varepsilon)}$, i = 2, . . . , d, so by making C′ large we can
achieve our objective.
Fig. 10.3 The cylinder $G_\varepsilon$, and the pieces $\partial_A G_\varepsilon$ and $\partial_B G_\varepsilon$ of $\partial G_\varepsilon$ at the front and end base of
the cylinder

Let $\partial_B G_\varepsilon$ be the end base of the cylinder, in the direction of B. Let $\partial_A G_\varepsilon$ be the
central part of radius $C''\sqrt{\varepsilon \ln(1/\varepsilon)}$ with $C'' < C$ of the front base of the cylinder,
in the direction of A. Any choice of $C'' < C$ will actually be fine. The cylinder $G_\varepsilon$
is depicted in Fig. 10.3.
By Lemma 10.13, inside $G_\varepsilon$ we may work with the modified Dirichlet form
given in (10.2.13). The boundary $\partial G_\varepsilon$ of $G_\varepsilon$ consists of three disjoint pieces,
$\partial G_\varepsilon = \partial_A G_\varepsilon \cup \partial_B G_\varepsilon \cup \partial_r G_\varepsilon$, where $\partial_r G_\varepsilon$ is simply what is left over after the other two
pieces are removed. Let g be the approximate harmonic function defined in (10.2.3)
and (10.2.20). Proceeding along the lines of (10.2.32)–(10.2.33), we see that
\[
\mathcal{E}_{G_\varepsilon}(g, g) = [1 + o(1)] \sum_{x\in G_\varepsilon} \tilde\mu_\varepsilon(x) \sum_{\ell=1}^d r_\ell\, [g(x + \varepsilon e_\ell) - g(x)]^2. \tag{10.3.15}
\]

Now use the first Green identity in (10.3.15), to get

d
2
μ̃ε (x) r g(x + εe ) − g(x)
x∈Gε =1

=− %ε g)(x)
μ̃ε (x)g(x)(L
x∈Gε

+ r μ̃ε (x)g(x) g(x) − g(x + εe ) . (10.3.16)
x∈∂Gε 1≤≤d :
x+εe ∈Gε

Using the bound in (10.2.21) from Lemma 10.14, we get

√ −1+d/2
μ̃ε (x)g(x)(Lε g)(x) ≤ O (ρ/ε)d ερ 2 = O ε 3/2−d/2 ln(1/ε)
% .

x∈Gε
(10.3.17)
But this is much smaller than E˜Gε (g, g), which we know to be of order ε 1−d/2 by
(10.2.32–10.2.33). Finally, in view of the fact that g(x) ≤ ε C∗ , x ∈ ∂B Gε (where C∗
can be taken as large as desired by taking C large enough) and that μ̃ε (x) decays as
10.4 Bibliographical notes 263

(x, v̂)2 increases (see (10.3.14), we get that

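\[
\begin{aligned}
&\sum_{x \in \partial G_\varepsilon} \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} r_\ell\, \tilde\mu_\varepsilon(x)\, g(x) \big[g(x) - g(x + \varepsilon e_\ell)\big] \\
&\qquad = \big(1 + o(1)\big) \sum_{x \in \partial_A G_\varepsilon} \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} r_\ell\, \tilde\mu_\varepsilon(x) \big[g(x) - g(x + \varepsilon e_\ell)\big].
\end{aligned} \tag{10.3.18}
\]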

Hence

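\[
\tilde{\mathcal{E}}_{G_\varepsilon}(g, g) = \big(1 + o(1)\big) \sum_{x \in \partial_A G_\varepsilon} \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} r_\ell\, \tilde\mu_\varepsilon(x) \big[g(x) - g(x + \varepsilon e_\ell)\big]
= \big(1 + o(1)\big)\, N(g), \tag{10.3.19}
\]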

where we recall (10.3.3). This completes the proof of Proposition 10.16.

In conclusion, we have achieved our goal to derive asymptotically coinciding upper
and lower bounds for capacities of disjoint sets that are neighbourhoods of two
local minima of F . In the current formulation, we assumed that there is only one relevant
saddle point connecting these minima, but the extension to more complicated
situations is straightforward (see the general discussion in Sect. 9.1).

10.4 Bibliographical notes

1. The class of models treated in this chapter served as the starting point of the
potential-theoretic approach to metastability initiated in Bovier, Eckhoff, Gayrard
and Klein [33]. This paper was leaning heavily on renewal ideas, and identified
prefactors of capacities and mean hitting times only up to constants.

2. The method that allows us to identify the prefactors up to a multiplicative error
1 + o(1) is the Berman-Konsowa principle for lower bounds on capacities, introduced
in Bianchi, Bovier and Ioffe [24] for the analysis of the random-field Curie-Weiss
model with a continuous distribution of the random magnetic field, which is
treated in Chap. 15. The proof using defective flows given here is new.
Chapter 11
Diffusion Processes with Gradient Drift

“Ignorance of Axioms”, the Lecturer continued, “is a great
drawback in life. It wastes so much time to have to say them
over and over again. For instance, take the Axiom “Nothing is
greater than itself”, that is, “Nothing can contain itself”. How
often do you hear people say “He was so excited, he was quite
unable to contain himself”. Why, of course he was unable! The
excitement had nothing to do with it”.
(Lewis Carroll, Sylvie and Bruno Concluded)

The first steps towards describing metastability for models with a non-discrete state
space lead to finite-dimensional diffusions. These are the processes originally stud-
ied by Freidlin and Wentzell [115]. In the case of gradient drifts we are able to
recover the heuristic predictions by Eyring and Kramers explained in Sect. 2.1.1.
The presentation below contains two main parts. After describing the setting in
Sect. 11.1, we derive sharp estimates on average hitting times in Sect. 11.2. This
requires the use of sharp estimates on capacities, together with the regularity es-
timates that were presented in Sect. 9.4. In Sect. 11.3 we compute the low-lying
spectrum of the generator of the diffusion. The result is completely analogous to
the spectral result described in Chap. 8 for Markov processes with a discrete state
space. The main message of this chapter is that all the results about metastable sys-
tems that were obtained in Chap. 8 in the setting of a discrete state space carry over
to diffusions with the help of some regularity theory.

© Springer International Publishing Switzerland 2015
A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_11

11.1 The setting

In this chapter we investigate metastability in the context of reversible diffusion
processes $X_\varepsilon = (X_\varepsilon(t))_{t \ge 0}$ that were discussed in Sect. 5.6. We limit ourselves to
the simplest case of SDEs of the form

\[
dX_\varepsilon(t) = -\nabla F\big(X_\varepsilon(t)\big)\, dt + \sqrt{2\varepsilon}\, dB(t) \tag{11.1.1}
\]

on a regular domain $\Omega \subseteq \mathbb{R}^d$, where the drift ∇F is generated by a potential function
F that is sufficiently regular. The parameter ε scales the strength of the noise.
Metastability occurs when F has one or more local minima that are not global minima,
and when ε is small. We assume that Xε is killed as soon as it exits Ω. This
is the natural extension of the paradigmatic double-well model of Kramers to the
multi-dimensional setting. We will show that the formalism outlined in Chap. 8 is
well suited to study the metastable behaviour of Xε in the limit as ε ↓ 0, and yields
sharp results in a relatively simple manner. Recall that the process Xε is reversible
with respect to the measure

\[
\mu_\varepsilon(dx) = \exp\big(-F(x)/\varepsilon\big)\, dx. \tag{11.1.2}
\]
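To make the dynamics (11.1.1) concrete, here is a minimal Euler–Maruyama sketch in one dimension. The double-well potential F(x) = x⁴/4 − x²/2, the step size, and all function names are assumptions chosen for this illustration; they do not come from the text, and the scheme is the standard explicit discretisation rather than a method used in this chapter.

```python
import math
import random


def grad_F(x):
    """Gradient of the illustrative double-well potential F(x) = x**4/4 - x**2/2."""
    return x**3 - x


def euler_maruyama(x0, eps, dt, n_steps, seed=0):
    """Discretise dX = -F'(X) dt + sqrt(2*eps) dB with time step dt; return the path."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * eps * dt)  # standard deviation of one noise increment
    path = [x0]
    x = x0
    for _ in range(n_steps):
        x = x - grad_F(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Without noise (eps = 0) the path relaxes to the nearest local minimum of F.
relaxed = euler_maruyama(x0=0.5, eps=0.0, dt=0.01, n_steps=5000)
# With small noise the path fluctuates around a minimum and only rarely crosses the barrier.
noisy = euler_maruyama(x0=1.0, eps=0.05, dt=0.01, n_steps=5000)
```

For small ε the simulated path spends most of its time near the minima ±1 of this hypothetical F, which is the metastable regime studied in the rest of the chapter.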

Recall that the generator Lε of this process acts on smooth functions f as

\[
(L_\varepsilon f)(x) = \varepsilon \Delta f(x) - \nabla F(x) \cdot \nabla f(x). \tag{11.1.3}
\]

The Dirichlet form is given by

\[
\mathcal{E}(f, g) = \varepsilon \int_\Omega e^{-F(x)/\varepsilon} \big(\nabla f(x), \nabla g(x)\big)\, dx. \tag{11.1.4}
\]

We begin by formulating assumptions on Ω and F .

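Assumption 11.1
(i) $\Omega \subseteq \mathbb{R}^d$ is open and connected, and $F \in C^3(\Omega)$.
(ii) If Ω is unbounded, then
(ii.1) $\lim_{\|x\| \to \infty} \big|\big(\tfrac{x}{\|x\|}, \nabla F(x)\big)\big| = \infty$,
(ii.2) $\lim_{\|x\| \to \infty} \big[\|\nabla F(x)\| - 2\Delta F(x)\big] = \infty$.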

Assumption 11.1 ensures that the resolvent of the generator −Lε of Xε is com-
pact for ε sufficiently small. Moreover, it implies that F has exponentially tight level
sets, i.e.,

\[
\int_{\{y \in \mathbb{R}^d :\, F(y) \ge a\}} e^{-F(y)/\varepsilon}\, dy \le C e^{-a/\varepsilon} \quad \forall\, a > 0, \tag{11.1.5}
\]

where C = C(a) < ∞ is uniform in 0 < ε ≤ 1.

Recall Definition 10.2. Throughout the sequel, Assumption 11.1 is in force, as
well as Assumptions 10.3 and 10.5 in Chap. 10. In this situation, it can be shown
easily that (11.1.1) has a global strong solution (Bauer [14]).

11.2 Capacity estimates and mean hitting times

Our main interest in this section is the mean of the hitting time

\[
\tau_A = \inf\{t > 0 : X(t) \in A\}, \qquad A \subset \mathbb{R}^d, \tag{11.2.1}
\]

for X starting in a minimum of F , say $x \in \mathcal{M}$, when $A = B_\rho(y)$ is a small ball
of radius ρ around another minimum of F , say $y \in \mathcal{M}$. It will become apparent
that the precise choice of the hitting set is not important, and that the problem of
computing $\tau_A$ is virtually equivalent to computing the escape time from a suitably
chosen neighbourhood of x, provided this neighbourhood contains the relevant saddle
points connecting x and y. Figure 11.1 schematically depicts the motion of a
particle in a two-well potential.

Fig. 11.1 Motion in a potential with two wells

11.2.1 Main results

The basis for the success of the potential-theoretic approach to metastability is the
fact that capacities can be estimated sharply. Recall the definition of the communication
height Φ(A, B) and the essential gate G (A, B) between two disjoint sets
A, B, as introduced in Definition 10.2.

Theorem 11.2 (Capacity asymptotics) Assume that $A, B \subset \mathbb{R}^d$ are closed and disjoint
such that
(i) $\mathrm{dist}(\mathcal{G}(A, B), A \cup B) \ge \delta > 0$ for some δ independent of ε.
(ii) Both A and B contain a closed ball of radius at least ε.
If $\mathcal{G}(A, B) = \{z_1^*, \dots, z_n^*\}$, then

\[
\frac{\mathrm{cap}(A, B)}{(2\pi\varepsilon)^{d/2}}
= e^{-\Phi(A,B)/\varepsilon} \sum_{i=1}^{n} \frac{[-\lambda_1^*(z_i^*)]}{2\pi \sqrt{-\det(\nabla^2 F(z_i^*))}} \Big(1 + O\big(\sqrt{\varepsilon [\ln(1/\varepsilon)]^3}\big)\Big), \tag{11.2.2}
\]

where $\lambda_1^*(z_i^*)$ denotes the negative eigenvalue of the Hessian of F at $z_i^*$.

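Theorem 11.3 (Mean metastable exit times) Let $x_i \in \mathcal{M}$ be a minimum of F and
let $D \subset \mathbb{R}^d$ be a closed set such that:
(i) $\bigcup_{j=1}^{k(i)} B_\varepsilon(y_j) \subset D$, where $\mathcal{M}_i = \{y_1, \dots, y_{k(i)}\} \subset \mathcal{M}$ enumerates all the minima
of F with $F(y_j) \le F(x_i)$, $j = 1, \dots, k(i)$.
(ii) $\mathrm{dist}(\mathcal{G}(x_i, \mathcal{M}_i), D) \ge \delta > 0$ for some δ independent of ε.
Then

\[
E_{x_i}[\tau_D]
= \frac{2\pi\, e^{[\Phi(x_i, \mathcal{M}_i) - F(x_i)]/\varepsilon}}{\sqrt{\det(\nabla^2 F(x_i))}}
\Bigg( \sum_{j=1}^{k(i)} \frac{[-\lambda_1^*(z_j^*)]}{\sqrt{-\det(\nabla^2 F(z_j^*))}} \Bigg)^{-1}
\Big(1 + O\big(\sqrt{\varepsilon [\ln(1/\varepsilon)]^3}\big)\Big). \tag{11.2.3}
\]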

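In the special case where there is only one saddle point $z^*$, (11.2.3) reduces to
the classical Eyring-Kramers formula in (2.1.2):

\[
E_{x_i}[\tau_D] = \frac{2\pi\, e^{[F(z^*) - F(x_i)]/\varepsilon}}{[-\lambda_1^*(z^*)]} \sqrt{\frac{-\det(\nabla^2 F(z^*))}{\det(\nabla^2 F(x_i))}}\; \Big(1 + O\big(\sqrt{\varepsilon [\ln(1/\varepsilon)]^3}\big)\Big). \tag{11.2.4}
\]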
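As a sanity check on (11.2.4), the following sketch compares the Eyring-Kramers prediction with a numerically integrated mean hitting time in dimension one. Everything specific here is an assumption made for the illustration: the potential F(x) = x⁴/4 − x²/2 (minima at ±1, saddle at 0), the truncation of the state space at −3, the grid size, and the classical one-dimensional formula E_x[τ_y] = ε⁻¹ ∫ₓʸ e^{F(s)/ε} ∫₋∞ˢ e^{−F(u)/ε} du ds, which is standard but is not derived in this section.

```python
import math


def F(x):
    """Illustrative double-well potential with minima at -1, +1 and a saddle at 0."""
    return x**4 / 4 - x**2 / 2


def eyring_kramers_1d(eps):
    """Leading term of (11.2.4) for d = 1, starting at x = -1 with saddle z* = 0."""
    barrier = F(0.0) - F(-1.0)   # F(z*) - F(x) = 1/4
    curv_min = 2.0               # F''(-1) = 3 - 1
    curv_saddle = 1.0            # -F''(0) = [-lambda_1*]
    return 2 * math.pi / math.sqrt(curv_min * curv_saddle) * math.exp(barrier / eps)


def exact_mean_hitting_time(eps, start=-1.0, target=1.0, left=-3.0, n=8000):
    """Trapezoidal evaluation of (1/eps) * int_start^target e^{F(s)/eps} int_left^s e^{-F(u)/eps} du ds."""
    h = (target - left) / n
    xs = [left + i * h for i in range(n + 1)]
    weight = [math.exp(-F(x) / eps) for x in xs]
    inner = [0.0] * (n + 1)      # inner[i] approximates int_left^{xs[i]} e^{-F/eps}
    for i in range(1, n + 1):
        inner[i] = inner[i - 1] + 0.5 * h * (weight[i - 1] + weight[i])
    total, prev = 0.0, None
    for i, x in enumerate(xs):
        if x < start - 1e-12:
            continue
        g = inner[i] / weight[i]  # e^{F(s)/eps} times the inner integral
        if prev is not None:
            total += 0.5 * h * (prev + g)
        prev = g
    return total / eps


eps = 0.05
ratio = exact_mean_hitting_time(eps) / eyring_kramers_1d(eps)
```

Under these assumptions the ratio should be close to 1 and approach 1 as ε decreases, in line with the multiplicative error term in (11.2.4).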

11.2.2 Rough estimates on capacities and harmonic functions

To prove Theorems 11.2–11.3, we follow the general strategy outlined in Sect. 9.1.
In the present section we derive rough estimates on capacities. Via the renewal es-
timate in (9.4.6) these lead to rough estimates on harmonic functions. These will in
turn lead to sharp estimates on capacities and the equilibrium potential.

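Lemma 11.4 Let $D \subset \mathbb{R}^d$ be closed, and let $x \in D^c$ be such that $d(x, D) \ge \rho$ for
some $0 < \rho \le \varepsilon$. Let $z^* = z^*(x, D)$ be any point in $\mathcal{G}(x, D)$. Then there are constants
$C_\ell, C_u > 0$ independent of ε such that, for ε small enough,

\[
C_\ell\, \rho^{d-1} e^{-F(z^*)/\varepsilon} \le \mathrm{cap}\big(B_\rho(x), D\big) \le C_u\, \varepsilon \rho^{-1} e^{-F(z^*)/\varepsilon}. \tag{11.2.5}
\]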

Proof To prove the lower bound, we use the Dirichlet principle (recall Theo-
rem 7.33) and monotonicity. We begin by choosing a smooth path ω from x to D re-
maining in the level set {z ∈ Rd : F (z) ≤ F (z∗ )} and reaching the value F (z∗ ) only
when passing through z∗ . (The canonical path can be constructed by using pieces of
the deterministic trajectory of the unperturbed equation dXε (t) = −∇F (Xε (t))dt in
a rather obvious manner, but this is not important.) Given this path, we parametrise
it by arc-length, so that $\|\dot\omega(t)\|_2 = 1$ for all t.
Given ω(t), we consider the tube of width ρ around ω(t):

\[
\omega_\rho = \big\{ z \in \mathbb{R}^d : \exists\, t \in [0, |\omega|] \text{ such that } \|\omega(t) - z\|_2 \le \rho \big\}. \tag{11.2.6}
\]

Let $D_\rho$ denote the (d − 1)-dimensional disk of radius ρ centred at the origin. The
important fact to note is that, for any h,

\[
\big\|\nabla h\big(\omega(t) + z_\perp\big)\big\|_2^2 \ge \Big| \frac{d}{dt} h\big(\omega(t) + z_\perp\big) \Big|^2. \tag{11.2.7}
\]

Here $z_\perp$ denotes a vector in the subspace orthogonal to ω at ω(t).

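We may therefore bound the Dirichlet form in (11.1.4) as

\[
\mathcal{E}(h, h) \ge \varepsilon \int_{D_\rho} dz_\perp \int_0^{|\omega|} dt\; e^{-F(\omega(t) + z_\perp)/\varepsilon}\, \Big| \frac{d}{dt} h\big(\omega(t) + z_\perp\big) \Big|^2. \tag{11.2.8}
\]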

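The minimisation problem is now trivial, i.e., it decomposes for each fixed $z_\perp$ into
a one-dimensional problem whose solution is well known. In fact, the minimiser
$h_{z_\perp}(t)$ is the solution of the 1-dimensional Dirichlet problem

\[
\begin{aligned}
\Big( -\varepsilon \frac{d}{dt} + \frac{d}{dt} F\big(\omega(t) + z_\perp\big) \Big) \frac{d}{dt} h_{z_\perp}(t) &= 0, \\
h_{z_\perp}(0) &= 1, \\
h_{z_\perp}(|\omega|) &= 0,
\end{aligned} \tag{11.2.9}
\]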

whose solution is readily found to be

\[
h_{z_\perp}(t) = \frac{\int_t^{|\omega|} ds\; e^{F(\omega(s) + z_\perp)/\varepsilon}}{\int_0^{|\omega|} ds\; e^{F(\omega(s) + z_\perp)/\varepsilon}}. \tag{11.2.10}
\]

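Inserting this solution into the lower bound (11.2.8), we get

\[
\mathrm{cap}\big(B_\rho(x), D\big) \ge \varepsilon \int_{D_\rho} dz_\perp \Bigg( \int_0^{|\omega|} dt\; e^{F(\omega(t) + z_\perp)/\varepsilon} \Bigg)^{-1}. \tag{11.2.11}
\]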

From here the lower bound in (11.2.5) follows from simple saddle point evaluations
of the integral in the denominator.
The upper bound is obtained by choosing a suitable test function that changes
from zero to one over a distance ρ along a plane separating Bρ (x) and D passing at
distance ρ from x.
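The one-dimensional step behind (11.2.9)–(11.2.11) can be illustrated on a grid. The sketch below is a discrete analogue under stated assumptions: the potential profile V along the path (a single sine-shaped barrier), the grid, and the function names are hypothetical, and the Dirichlet energy is discretised with midpoint values of V. It checks that the discrete counterpart of (11.2.10) has energy ε / ∫ e^{V/ε} dt and that a linear interpolation between the boundary values costs more.

```python
import math


def dirichlet_energy(h_vals, V_mid, eps, dt):
    """Discrete version of eps * int e^{-V/eps} (h')^2 dt, with V sampled at cell midpoints."""
    return eps * sum(
        math.exp(-v / eps) * (h1 - h0) ** 2 / dt
        for v, h0, h1 in zip(V_mid, h_vals[:-1], h_vals[1:])
    )


def one_dim_minimiser(V_mid, eps, dt):
    """Discrete analogue of (11.2.10): increments of h proportional to e^{V/eps}."""
    weights = [math.exp(v / eps) * dt for v in V_mid]
    Z = sum(weights)             # approximates int_0^{|omega|} e^{V/eps} dt
    h = [1.0]
    for w in weights:
        h.append(h[-1] - w / Z)  # h decreases from 1 to 0
    return h, Z


eps, length, n = 0.1, 2.0, 400
dt = length / n
# Hypothetical potential along the path: one barrier of height 1 in the middle.
V_mid = [math.sin(math.pi * (i + 0.5) * dt / length) for i in range(n)]

h_opt, Z = one_dim_minimiser(V_mid, eps, dt)
h_lin = [1.0 - i / n for i in range(n + 1)]

energy_opt = dirichlet_energy(h_opt, V_mid, eps, dt)
energy_lin = dirichlet_energy(h_lin, V_mid, eps, dt)
```

The minimiser concentrates its variation where e^{V/ε} is largest, that is, near the top of the barrier, which is why the integral in the denominator of (11.2.11) is governed by a neighbourhood of the saddle.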

Corollary 11.5 Let $A, D \subset \mathbb{R}^d$ be disjoint sets. Let $x \in (A \cup D)^c$. Then, for $0 < \rho \le \varepsilon$,
there exists a constant C such that, for ε small enough,

\[
h_{A,D}(x) \le C \rho^{-d} e^{F(z^*)/\varepsilon}\, \mathrm{cap}\big(B_\rho(x), A\big) \tag{11.2.12}
\]

with $z^*$ the same as in Lemma 11.4.

Proof Combine Lemma 11.4 with Lemma 9.4.7.


11.2.3 Sharp estimates on capacities

In this section we prove Theorem 11.2. The proof is similar to the one for discrete
diffusions in Chap. 10.

Proof The capacity cap(A, B) satisfies the Dirichlet principle in Theorem 7.33,

\[
\mathrm{cap}(A, B) = \inf_{h \in \mathcal{H}^A_B} \mathcal{E}(h, h), \tag{11.2.13}
\]

where

\[
\mathcal{H}^A_B = \big\{ h \in W^{1,2}\big(\mathbb{R}^d, Q(dx)\big) : h(z) \in [0, 1]\ \forall z \in \mathbb{R}^d,\ h|_A = 1,\ h|_B = 0 \big\} \tag{11.2.14}
\]

with $Q(dx) = e^{-F(x)/\varepsilon} dx$. For simplicity we only consider the case of a single
saddle point $z^*$ (i.e., n = 1 in (11.2.2)). We may assume without loss of generality
that $F(z^*) = 0$ and $z^* = 0$. We choose coordinates that diagonalise the Hessian of
F at $z^*$, so that (with $\lambda_i^* = \lambda_i^*(z^*)$)

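\[
F(z) = \tfrac12 \lambda_1^* z_1^2 + \tfrac12 \sum_{i=2}^{d} \lambda_i^* z_i^2 + O\big(\|z\|_2^3\big), \qquad \|z\|_2 \downarrow 0. \tag{11.2.15}
\]

Define a neighbourhood of zero

\[
C_\delta = \Big[ -\delta/\sqrt{-\lambda_1^*},\ \delta/\sqrt{-\lambda_1^*} \Big] \times \prod_{i=2}^{d} \Big[ -2\delta/\sqrt{\lambda_i^*},\ 2\delta/\sqrt{\lambda_i^*} \Big]. \tag{11.2.16}
\]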

Since we have assumed that there is a single saddle point at the communication
height between A and B, it is possible to choose δ > 0 so small that there exists a
strip $S_\delta$ of width $2\delta/\sqrt{[-\lambda_1^*]}$, containing 0, separating A and B in the sense that any
path connecting them must cross $S_\delta$, and such that $F(z) \ge \delta^2$ for all $z \in S_\delta \setminus C_\delta$.
Let $D_A$ and $D_B$ be the connected components of $\mathbb{R}^d \setminus S_\delta$ containing A and B,
respectively.

Upper bound

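To prove an upper bound on the capacity we simply choose a convenient function $h^+$
(see Fig. 11.2):

\[
h^+(z) =
\begin{cases}
1, & \text{if } z \in D_A, \\
0, & \text{if } z \in D_B, \\
\text{arbitrary in } [0, 1], & \text{if } z \in S_\delta \setminus C_\delta \text{ with } \|\nabla h^+\|_{2,\mu_\varepsilon} \le c \sqrt{[-\lambda_1^*]}/\delta, \\
f(z_1), & \text{if } z \in C_\delta,
\end{cases} \tag{11.2.17}
\]

where f is the solution of the one-dimensional Dirichlet problem

\[
\begin{aligned}
\Big( -\varepsilon \frac{d}{dz_1} + \frac{d}{dz_1} F(z_1, 0) \Big) \frac{d}{dz_1} f(z_1) &= 0, \\
f\big(-\delta/\sqrt{-\lambda_1^*}\big) &= 1, \\
f\big(+\delta/\sqrt{-\lambda_1^*}\big) &= 0.
\end{aligned} \tag{11.2.18}
\]

Fig. 11.2 Schematic construction of the test function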

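The solution of this problem is obviously

\[
f(z_1) = \frac{\int_{z_1}^{\delta/\sqrt{[-\lambda_1^*]}} e^{F(t,0)/\varepsilon}\, dt}{\int_{-\delta/\sqrt{[-\lambda_1^*]}}^{\delta/\sqrt{[-\lambda_1^*]}} e^{F(t,0)/\varepsilon}\, dt}. \tag{11.2.19}
\]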

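Inserting (11.2.17) into (11.2.13), we see that

\[
\begin{aligned}
\mathrm{cap}(A, B) \le{}& \varepsilon \int_{-2\delta/\sqrt{\lambda_2^*}}^{2\delta/\sqrt{\lambda_2^*}} dz_2 \cdots \int_{-2\delta/\sqrt{\lambda_d^*}}^{2\delta/\sqrt{\lambda_d^*}} dz_d
\int_{-\delta/\sqrt{[-\lambda_1^*]}}^{\delta/\sqrt{[-\lambda_1^*]}} dz_1\; e^{-F(z)/\varepsilon} \big[f'(z_1)\big]^2 \\
&+ \varepsilon c^2 [-\lambda_1^*]\, \delta^{-2} \int_{S_\delta \setminus C_\delta} dz\; e^{-F(z)/\varepsilon}.
\end{aligned} \tag{11.2.20}
\]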

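The second term in (11.2.20) is bounded by a constant times $\varepsilon \delta^{-2} e^{-\delta^2/\varepsilon}$, because
of our assumption on F . The first term equals

\[
\mathcal{E}_{C_\delta}\big(h^+, h^+\big) = \varepsilon\, \frac{\int_{C_\delta} dz\; e^{-F(z)/\varepsilon}\, e^{2F(z_1,0)/\varepsilon}}{\Big( \int_{-\delta/\sqrt{[-\lambda_1^*]}}^{\delta/\sqrt{[-\lambda_1^*]}} dt\; e^{F(t,0)/\varepsilon} \Big)^2}. \tag{11.2.21}
\]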

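Now, on the set $C_\delta$ we have

\[
F(z) = F(0) + \frac{\lambda_1^* z_1^2 + \lambda_2^* z_2^2 + \cdots + \lambda_d^* z_d^2}{2} + O\big(\|z\|_2^3\big) \tag{11.2.22}
\]

and hence

\[
F(z) - 2F(z_1, 0) = -F(0) + \frac{-\lambda_1^* z_1^2 + \lambda_2^* z_2^2 + \cdots + \lambda_d^* z_d^2}{2} + O\big(\|z\|_2^3\big). \tag{11.2.23}
\]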
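But $\|z\|_2 \le C'\delta$ on $C_\delta$ for some $C' < \infty$, so if we choose $\delta = \sqrt{C \varepsilon \ln(1/\varepsilon)}$ for
some $C < \infty$, then the numerator in (11.2.21) satisfies the bound

\[
\begin{aligned}
\int_{C_\delta} dz\; e^{-F(z)/\varepsilon}\, e^{2F(z_1,0)/\varepsilon}
&\le e^{-F(0)/\varepsilon}\, e^{O(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2})} \int_{\mathbb{R}^d} dz\, \exp\Big( -\frac{-\lambda_1^* z_1^2 + \cdots + \lambda_d^* z_d^2}{2\varepsilon} \Big) \\
&= e^{-F(0)/\varepsilon}\, \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{[-\lambda_1^*] \prod_{i=2}^{d} \lambda_i^*}}\, \Big(1 + O\big(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2}\big)\Big).
\end{aligned} \tag{11.2.24}
\]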
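Similarly, the integral in the denominator in (11.2.21) is bounded from below by

\[
\begin{aligned}
\int_{-\delta/\sqrt{[-\lambda_1^*]}}^{\delta/\sqrt{[-\lambda_1^*]}} dt\; e^{F(t,0)/\varepsilon}
&\ge e^{O(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2})}\, e^{F(0)/\varepsilon} \Bigg( \frac{(2\pi\varepsilon)^{1/2}}{\sqrt{[-\lambda_1^*]}} - 2 \int_{\delta/\sqrt{[-\lambda_1^*]}}^{\infty} dt\; e^{\lambda_1^* t^2/\varepsilon} \Bigg) \\
&\ge e^{O(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2})}\, e^{F(0)/\varepsilon} \Bigg( \frac{(2\pi\varepsilon)^{1/2}}{\sqrt{[-\lambda_1^*]}} - \frac{e^{-\delta^2/\varepsilon}}{\delta \varepsilon^{-1/2}} \Bigg) \\
&= e^{F(0)/\varepsilon}\, \frac{(2\pi\varepsilon)^{1/2}}{\sqrt{[-\lambda_1^*]}}\, \Big(1 + O\big(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2}\big)\Big).
\end{aligned} \tag{11.2.25}
\]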
Combining the estimates in (11.2.20) and (11.2.24)–(11.2.25), we arrive at the upper
bound

\[
\mathcal{E}_{C_\delta}\big(h^+, h^+\big) \le e^{-F(0)/\varepsilon} (2\pi\varepsilon)^{d/2}\, \frac{[-\lambda_1^*]}{2\pi \sqrt{-\det(\nabla^2 F(0))}}\, \Big(1 + O\big(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2}\big)\Big). \tag{11.2.26}
\]

Lower bound

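For the lower bound we consider a different domain, namely,

\[
\begin{aligned}
\widetilde{C}_\delta &= \Big[ -2\delta/\sqrt{-\lambda_1^*},\ 2\delta/\sqrt{-\lambda_1^*} \Big] \times \prod_{i=2}^{d} \Big[ -\delta/\sqrt{(d-1)\lambda_i^*},\ \delta/\sqrt{(d-1)\lambda_i^*} \Big] \\
&= \Big[ -2\delta/\sqrt{-\lambda_1^*},\ 2\delta/\sqrt{-\lambda_1^*} \Big] \otimes \widetilde{C}_\delta^\perp.
\end{aligned} \tag{11.2.27}
\]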

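Let $h^*$ denote the minimiser of the variational problem in (11.2.13), i.e., the equilibrium
potential of the capacitor (A, B). Then

\[
\inf_{h \in \mathcal{H}^A_B} \mathcal{E}(h, h) = \mathcal{E}\big(h^*, h^*\big) \ge \mathcal{E}_{\widetilde{C}_\delta}\big(h^*, h^*\big), \tag{11.2.28}
\]

where $\mathcal{E}_{\widetilde{C}_\delta}$ is the restriction of the Dirichlet form to the domain $\widetilde{C}_\delta$. Obviously,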

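\[
\begin{aligned}
\mathcal{E}_{\widetilde{C}_\delta}(h, h) \ge \bar{\mathcal{E}}_{\widetilde{C}_\delta}(h, h)
&= \varepsilon \int_{\widetilde{C}_\delta} dz\; e^{-F(z)/\varepsilon} \Big( \frac{\partial h(z)}{\partial z_1} \Big)^2 \\
&= \varepsilon \int_{\widetilde{C}_\delta^\perp} dz_\perp \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dz_1\; e^{-F(z)/\varepsilon} \Big( \frac{\partial h(z_1, z_\perp)}{\partial z_1} \Big)^2 \\
&\ge \varepsilon \int_{\widetilde{C}_\delta^\perp} dz_\perp \inf_{f :\, f(\pm 2\delta/\sqrt{[-\lambda_1^*]}) = h^*(\pm 2\delta/\sqrt{[-\lambda_1^*]},\, z_\perp)}
\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dz_1\; e^{-F(z)/\varepsilon} \big[f'(z_1)\big]^2.
\end{aligned} \tag{11.2.29}
\]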

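For fixed values of $z_\perp$, the minimiser is the solution of the Dirichlet
problem

\[
\begin{aligned}
\Big( -\varepsilon \frac{d}{dz_1} + \frac{d}{dz_1} F(z_1, z_\perp) \Big) \frac{d}{dz_1} f(z_1) &= 0, \\
f\big(-2\delta/\sqrt{-\lambda_1^*}\big) &= h^*\big(-2\delta/\sqrt{-\lambda_1^*},\, z_\perp\big), \\
f\big(+2\delta/\sqrt{-\lambda_1^*}\big) &= h^*\big(2\delta/\sqrt{-\lambda_1^*},\, z_\perp\big).
\end{aligned} \tag{11.2.30}
\]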

Set $a = h^*(-2\delta/\sqrt{[-\lambda_1^*]}, z_\perp)$, $b = h^*(2\delta/\sqrt{[-\lambda_1^*]}, z_\perp)$ and $g(z_1) = F(z_1, z_\perp)$.
Then the general solution of the differential equation in (11.2.30) is

\[
f(z_1) = c \int_{z_1}^{s} e^{g(t)/\varepsilon}\, dt, \tag{11.2.31}
\]

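where the constants c and s are determined by the boundary conditions, i.e.,

\[
c \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{s} e^{g(t)/\varepsilon}\, dt = a, \qquad
c \int_{2\delta/\sqrt{[-\lambda_1^*]}}^{s} e^{g(t)/\varepsilon}\, dt = b, \tag{11.2.32}
\]

from which we get

\[
c = \frac{a}{\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{s} e^{g(t)/\varepsilon}\, dt}. \tag{11.2.33}
\]

The value of s is determined through the equation

\[
\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{s} e^{g(t)/\varepsilon}\, dt = \frac{a}{a - b} \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} e^{g(t)/\varepsilon}\, dt. \tag{11.2.34}
\]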

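Inserting this solution into (11.2.29) and recalling the definition of g, we obtain

\[
\begin{aligned}
\mathcal{E}_{\widetilde{C}_\delta}\big(h^*, h^*\big)
&\ge \varepsilon \int_{\widetilde{C}_\delta^\perp} dz_\perp \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dz_1\;
\frac{e^{-F(z_1, z_\perp)/\varepsilon}\, \big[h^*\big(-2\delta/\sqrt{[-\lambda_1^*]}, z_\perp\big)\big]^2\, e^{2F(z_1, z_\perp)/\varepsilon}}{\Big( \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{s(z_\perp)} e^{F(t, z_\perp)/\varepsilon}\, dt \Big)^2} \\
&= \varepsilon \int_{\widetilde{C}_\delta^\perp} dz_\perp\;
\frac{\big[h^*\big(-2\delta/\sqrt{[-\lambda_1^*]}, z_\perp\big) - h^*\big(2\delta/\sqrt{[-\lambda_1^*]}, z_\perp\big)\big]^2}{\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} e^{F(t, z_\perp)/\varepsilon}\, dt}.
\end{aligned} \tag{11.2.35}
\]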
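The passage from the first to the second line of (11.2.35) can be spelled out as follows. This is a sketch of the intermediate algebra, written with the constants a, b, c and the function g from (11.2.31)–(11.2.34); the shorthand $I_\pm$ for the endpoints is introduced here only for readability and is not notation from the text.

```latex
% Shorthand (assumption of this sketch): I_\pm = \pm 2\delta/\sqrt{[-\lambda_1^*]}.
\begin{align*}
f'(z_1) &= -c\, e^{g(z_1)/\varepsilon}
  && \text{by (11.2.31)},\\
\varepsilon \int_{I_-}^{I_+} dz_1\, e^{-g(z_1)/\varepsilon}\, [f'(z_1)]^2
  &= \varepsilon c^2 \int_{I_-}^{I_+} dz_1\, e^{g(z_1)/\varepsilon},\\
a - b &= c \int_{I_-}^{I_+} e^{g(t)/\varepsilon}\, dt
  && \text{by subtracting the two lines of (11.2.32)},\\
\varepsilon c^2 \int_{I_-}^{I_+} dz_1\, e^{g(z_1)/\varepsilon}
  &= \frac{\varepsilon\, (a-b)^2}{\int_{I_-}^{I_+} e^{g(t)/\varepsilon}\, dt}.
\end{align*}
```

Substituting $c$ from (11.2.33) in the second line gives the first expression in (11.2.35), while the last line gives the second.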

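Using (11.2.15) again, we see that

\[
\begin{aligned}
\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} e^{F(t, z_\perp)/\varepsilon}\, dt
&= e^{\sum_{i=2}^{d} \frac{\lambda_i^* z_i^2}{2\varepsilon} + O(\delta^3/\varepsilon)}
\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dt\; e^{-\frac{[-\lambda_1^*] t^2}{2\varepsilon}} \\
&\le \frac{\sqrt{2\pi\varepsilon}}{\sqrt{[-\lambda_1^*]}}\; e^{\sum_{i=2}^{d} \frac{\lambda_i^* z_i^2}{2\varepsilon} + O(\delta^3/\varepsilon)},
\end{aligned} \tag{11.2.36}
\]

and so

\[
\begin{aligned}
\mathcal{E}_{\widetilde{C}_\delta}\big(h^*, h^*\big) \ge{}& \frac{\sqrt{\varepsilon [-\lambda_1^*]}}{\sqrt{2\pi}} \int_{\widetilde{C}_\delta^\perp} dz_\perp\, \exp\Big( -\sum_{i=2}^{d} \frac{\lambda_i^* z_i^2}{2\varepsilon} + O\big(\delta^3/\varepsilon\big) \Big) \\
&\times \Big[ h^*\big(-2\delta/\sqrt{-\lambda_1^*}, z_\perp\big) - h^*\big(2\delta/\sqrt{-\lambda_1^*}, z_\perp\big) \Big]^2.
\end{aligned} \tag{11.2.37}
\]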

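The following lemma shows how close the values of $h^*$ appearing in (11.2.37)
are to 1 and 0, respectively.

Lemma 11.6 Uniformly in $z_\perp \in \widetilde{C}_\delta^\perp$,

\[
\begin{aligned}
1 - h^*\big(-2\delta/\sqrt{-\lambda_1^*},\, z_\perp\big) &\le C \varepsilon^{-d/2} e^{-\delta^2/4\varepsilon}, \\
h^*\big(2\delta/\sqrt{-\lambda_1^*},\, z_\perp\big) &\le C \varepsilon^{-d/2} e^{-\delta^2/4\varepsilon}.
\end{aligned} \tag{11.2.38}
\]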

Proof Use that h∗ (z) = Pz [τA < τB ] = hA,B (z) in combination with Corol-
lary 11.5.

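Inserting these bounds into (11.2.37), we arrive at

\[
\begin{aligned}
\mathcal{E}_{\widetilde{C}_\delta}\big(h^*, h^*\big) \ge{}& e^{-F(0)/\varepsilon} (2\pi\varepsilon)^{d/2}\, \frac{\sqrt{[-\lambda_1^*]}}{2\pi \sqrt{\prod_{i=2}^{d} \lambda_i^*}} \\
&\times \Big(1 - C \varepsilon^{-d/2} e^{-\delta^2/4\varepsilon}\Big)^2 e^{-O(\delta^3/\varepsilon)} \Bigg(1 - \frac{\sqrt{\varepsilon(d-1)}}{\delta}\, e^{-\frac{\delta^2}{2(d-1)\varepsilon}}\Bigg)^{d-1}.
\end{aligned} \tag{11.2.39}
\]

Choosing, as before, $\delta^2 = C \varepsilon \ln(1/\varepsilon)$, we see that to leading order (11.2.39) coincides
with the upper bound (11.2.26), which proves Theorem 11.2 for the case of n = 1
saddle point in the essential gate.
The generalisation to the case of several saddle points in the essential gate is
straightforward and is left to the reader.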

11.2.4 Metastable exit times and capacities

In Propositions 11.7–11.8 below we compute the mean value of certain metastable
exit times in terms of capacities. At the end we use these propositions to prove
Theorem 11.3.

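Proposition 11.7 Let x be a (non-degenerate quadratic) critical point of F , and let
$A, D \subset \mathbb{R}^d$ be disjoint closed sets. Then there exists an α > 0 such that

\[
E_x[\tau_D] = \frac{\int_{D^c} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), D}(y)}{\mathrm{cap}(B_\varepsilon(x), D)}\, \Big(1 + O\big(\varepsilon^{\alpha/2}\big)\Big) \tag{11.2.40}
\]

and

\[
E_x\big[\tau_D 1_{\tau_D < \tau_A}\big] = \frac{\int_{(A \cup D)^c} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), A \cup D}(y)\, h_{D,A}(y)}{\mathrm{cap}(B_\varepsilon(x), A \cup D)}\, \Big(1 + O\big(\varepsilon^{\alpha/2}\big)\Big). \tag{11.2.41}
\]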

Proof The proofs of (11.2.40) and (11.2.41) are analogous, and therefore we will
only give the former. The strategy is to first use Corollary 7.30 with A a small ball
around x, and then to use that the average exit time does not vary much as a function
of the starting point on this ball. In fact, by Corollary 7.30,

\[
\int \nu_{B_\varepsilon(x), D}(dy)\, E_y\big[\tau_D 1_{\tau_D < \tau_A}\big] = \frac{\int_{(A \cup D)^c} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), A \cup D}(y)\, h_{D,A}(y)}{\mathrm{cap}(B_\varepsilon(x), A \cup D)}. \tag{11.2.42}
\]

Recall that $w_D(y) = E_y[\tau_D]$, $y \notin D$, solves the Dirichlet problem in (7.2.74) with
g = 1. Let $B_{R_0}(x)$ be the ball of radius $R_0$ centred at x, where x is a critical point
of F . Then there is a K < ∞ such that $\sup_{y \in B_{R_0}(x)} \|\nabla F(y)\|_\infty \le K R_0$ for $R_0$ small
enough. Hence Lemmas 9.8–9.9 applied to this function have uniform constants
when $R_0 \le \sqrt{\varepsilon}$. Thus, by Lemma 9.13, $w_D$ inherits from Lemma 9.8 the uniform
Harnack bound

\[
\sup_{y \in B_{\sqrt{\varepsilon}}(x)} w_D(y) \le C \inf_{y \in B_{\sqrt{\varepsilon}}(x)} w_D(y). \tag{11.2.43}
\]

Now use Lemma 9.9 with R = ε. This yields

\[
\mathrm{osc}_{B_\varepsilon(x)}\, w_D \le C \varepsilon^{\alpha/2} \Big( \sup_{y \in B_{R_0}(x)} w_D(y) + R_0 \Big) \tag{11.2.44}
\]

and implies

\[
\begin{aligned}
\sup_{y \in B_\varepsilon(x)} w_D(y) &\le w_D(x) + C^2 \varepsilon^{\alpha/2} w_D(x) + C \varepsilon^{1/2 + \alpha/2}, \\
\inf_{y \in B_\varepsilon(x)} w_D(y) &\ge w_D(x) - C^2 \varepsilon^{\alpha/2} w_D(x) - C \varepsilon^{1/2 + \alpha/2}.
\end{aligned} \tag{11.2.45}
\]

Using these estimates in (7.2.78) of Corollary 7.30 with ρ = ε and $A = B_\varepsilon(x)$, we
get the claim in (11.2.40).

Proposition 11.8 Let $x_j$, j = 1, . . . , n, be the local minima of F . Let $S_k = \bigcup_{i=1}^{k} B_\varepsilon(x_i)$,
k = 1, . . . , n, be the collection of disjoint balls around the first k minima
such that no ball contains a saddle point of F . Assume that, for all j > k,
$F(x_i) > F(x_j)$ for all i > k with $i \ne j$. Then, for all j > k,

\[
E_{x_j}[\tau_{S_k}] = \frac{1}{\mathrm{cap}(B_\varepsilon(x_j), S_k)}\, \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{\det(\nabla^2 F(x_j))}}\, e^{-F(x_j)/\varepsilon}
\Big(1 + O\big(\sqrt{\varepsilon [\ln(1/\varepsilon)]^3},\ \varepsilon^{\alpha/2}\big)\Big). \tag{11.2.46}
\]

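Proof Fix j > k. Consider the set $\Gamma_j = \{y \in \Omega : F(y) \le \Phi(x_j, S_k) + \delta\}$ for δ > 0
sufficiently small. Decompose $\Gamma_j$ into its connected components: $\Gamma_j = \bigcup_{\tilde\iota} \Gamma_j(\tilde\iota)$.
Write

\[
\int_\Omega dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y)
= \int_{\Gamma_j^c} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y) + \sum_{\tilde\iota} \int_{\Gamma_j(\tilde\iota)} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y). \tag{11.2.47}
\]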

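The first integral is bounded from above by $C \exp(-[\Phi(x_j, S_k) + \delta]/\varepsilon)$ and is
therefore negligible. The sum over $\tilde\iota$ can be split into $\tilde\iota \in L$ and $\tilde\iota \in R$ with
$L = \{\tilde\iota : \Phi(x_i, S_k) > \Phi(x_i, x_j)\ \forall\, x_i \in \Gamma_j(\tilde\iota)\}$ and R the remaining $\tilde\iota$'s. The point
of this decomposition is that $h_{B_\varepsilon(x_j), S_k}(y)$ is close to 1 for $\tilde\iota \in L$ and $y \in \Gamma_j(\tilde\iota)$,
while $h_{B_\varepsilon(x_j), S_k}(y)$ is close to 0 otherwise. (Here we make use of the fact that if
$y, x_i \in \Gamma_j(\tilde\iota)$ and $\Phi(x_i, S_k) > \Phi(x_i, x_j)$, then $\Phi(y, S_k) = \Phi(\tilde\iota, S_k)$.) We have

\[
\sum_{\tilde\iota \in L} \int_{\Gamma_j(\tilde\iota)} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y)
= \sum_{\tilde\iota \in L} \int_{\Gamma_j(\tilde\iota)} dy\; e^{-F(y)/\varepsilon} \big[1 - h_{S_k, B_\varepsilon(x_j)}(y)\big]. \tag{11.2.48}
\]

By Corollary 11.5 and the upper bound on $\mathrm{cap}(B_\varepsilon(x_i), S_k)$ provided by Theorem 11.2,
we get, for $\tilde\iota \in L$ and $y \in \Gamma_j(\tilde\iota)$,

\[
0 \le h_{S_k, B_\varepsilon(x_j)}(y) \le C \varepsilon^{-d/2} e^{-[\Phi(x_{\tilde\iota}, S_k) - \Phi(x_{\tilde\iota}, x_j)]/\varepsilon}, \tag{11.2.49}
\]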

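which is exponentially small. On the other hand, if $x_{\tilde\iota}$ denotes the absolute minimum
of F within $\Gamma_j(\tilde\iota)$ and the Hessian $\nabla^2 F(x_{\tilde\iota})$ at this minimum is non-degenerate, then

\[
\int_{\Gamma_j(\tilde\iota) \setminus S_k} dy\; e^{-F(y)/\varepsilon} = \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{\det(\nabla^2 F(x_{\tilde\iota}))}}\, e^{-F(x_{\tilde\iota})/\varepsilon}\, \Big(1 + O\big(\sqrt{\varepsilon [\ln(1/\varepsilon)]^3}\big)\Big) \tag{11.2.50}
\]

by standard Laplace asymptotics. Combining (11.2.48)–(11.2.50), we get

\[
\sum_{\tilde\iota \in L} \int_{\Gamma_j(\tilde\iota)} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y)
= \sum_{\tilde\iota \in L} \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{\det(\nabla^2 F(x_{\tilde\iota}))}}\, e^{-F(x_{\tilde\iota})/\varepsilon}\, \Big(1 + O\big(\sqrt{\varepsilon [\ln(1/\varepsilon)]^3}\big)\Big). \tag{11.2.51}
\]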

Note that, under our assumptions on F , xj is the unique value in the sum over ι̃ for
which F (xι̃ ) takes its minimal value, and hence the sum is dominated by this single
term.
The terms with ι̃ ∈ R cannot be computed as precisely, but they are negligible.
Indeed, note that, under our assumptions, all components Γj (ι̃) that do not intersect
Sk give a contribution that is smaller than exp(−F (xj )/ε) times an exponentially
small factor, and hence are negligible compared to what we get from (11.2.51).
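Again using Corollary 11.5 when $\tilde\iota \in R$, we get

\[
\int_{\Gamma_j(\tilde\iota)} dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y) \le C \varepsilon^{-d} \int_{\Gamma_j(\tilde\iota) \setminus S_k} dy\; e^{-F(y)/\varepsilon}\, e^{-[\Phi(y, x_j) - \Phi(y, S_k)]/\varepsilon}. \tag{11.2.52}
\]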
There are two possibilities. Either y is such that $\Phi(y, S_k) = F(y)$, in which case
the integrand is bounded from above by $\exp(-\Phi(y, x_j)/\varepsilon) = \exp(-\Phi(S_k, x_j)/\varepsilon)$,
which is exponentially small. The integral over all such y is therefore bounded by
this small factor times the cardinality of the set Γj (ι̃). All other y must lie in the
valley Ai of a minimum xi with i > k, and the contribution of such a valley can be
at most of order exp(−F (xj )/ε), which again under our assumptions is smaller than
the main contribution from (11.2.51) times an exponentially small factor. Thus, in
fact, all contributions coming from the terms with ι̃ ∈ R are smaller than the main
term in (11.2.51) by an exponentially small factor. Hence (11.2.46) holds.

We are finally in a position to prove Theorem 11.3.

Proof The proof is immediate by inserting the formula for the capacity in Theorem 11.2
into (11.2.46), except for the error terms of order $\varepsilon^{\alpha/2}$, which we will show
can be removed. Namely, note that nothing changes in the proof of Proposition 11.8
when we replace the starting point $x_j$ by some point $x \in B_{\sqrt{\varepsilon}}(y)$. Also, inspection
of the proof of Theorem 11.2 shows that the difference between $\mathrm{cap}(B_\varepsilon(x_j), S_k)$
and $\mathrm{cap}(B_\varepsilon(x), S_k)$ for $x \in B_{\sqrt{\varepsilon}}(y)$ is in fact much smaller than the error terms.
Thus, we get

\[
\mathrm{osc}_{x \in B_{\sqrt{\varepsilon}}(x_j)}\, E_x[\tau_{S_k}] \le C \Big( \varepsilon^{\alpha/2} + \sqrt{\varepsilon [\ln(1/\varepsilon)]^3} \Big) E_{x_j}[\tau_{S_k}], \tag{11.2.53}
\]

which improves the input in the Hölder estimate by a factor $\varepsilon^{\alpha/2}$, which in turn
allows us to improve the error estimates in Proposition 11.8 from $\varepsilon^{\alpha/2}$ to $\varepsilon^{\alpha}$. Iterating
this procedure, we can reduce these errors until they are of the same order as
$\sqrt{\varepsilon [\ln(1/\varepsilon)]^3}$.

11.3 Spectral theory

In this section we turn to the analysis of the low-lying spectrum of the generator
(11.1.3) with Dirichlet boundary conditions on $\Omega^c$ (when $\Omega \ne \mathbb{R}^d$). The strategy
we follow is similar to that outlined in Sect. 8.4 in the context of discrete state
spaces. The additional input that is needed is again the regularity estimates.
Assumption 11.1 on F ensures that the spectrum of Lε is discrete. Moreover, it
is well known from Wentzell-Freidlin theory [115] that the spectrum has precisely
one exponentially small eigenvalue for each local minimum of the function F . We
show how to get sharp estimates as ε ↓ 0.

In Sect. 11.3.1 we state our main results. In Sect. 11.3.2 we derive a priori lower
bounds on the spectrum. In Sect. 11.3.3 we look at the principal Dirichlet eigen-
value, in Sect. 11.3.4 at the small eigenvalues. In Sect. 11.3.5 we derive improved
error estimates. In Sect. 11.3.6 we show that the exit times are asymptotically expo-
nentially distributed.

11.3.1 Main results

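Theorem 11.9 (Small eigenvalues) Suppose that F has n local minima $x_1, \dots, x_n$,
and that for some θ > 0 these minima can be labeled in such a way that

\[
\Phi(x_k, \mathcal{M}_{k-1}) - F(x_k) \le \min_{1 \le l < k} \big[ \Phi(x_l, \mathcal{M}_k \setminus x_l) - F(x_l) \big] - \theta, \qquad k = 1, \dots, n, \tag{11.3.1}
\]

where $\mathcal{M}_0 = \Omega^c$ and $\mathcal{M}_k = \{x_1, \dots, x_k\}$, k = 1, . . . , n. Set $B_k = B_\varepsilon(x_k)$ and
$S_k = \bigcup_{l=1}^{k} B_l$ and $h_k(y) = h_{B_k, S_{k-1}}(y)$. Suppose that all essential gates $\mathcal{G}(x_k, \mathcal{M}_{k-1})$
consist of a single saddle point $z^*(x_k, \mathcal{M}_{k-1})$, and that the Hessian of F is non-degenerate
at all the saddle points and all the local minima. Then there exists a
δ > 0 such that the n exponentially small eigenvalues $0 \le \lambda_1 < \lambda_2 < \cdots < \lambda_n$ of
$-L_\varepsilon$ satisfy

\[
\lambda_1 = 0 \tag{11.3.2}
\]

and

\[
\begin{aligned}
\lambda_k &= \frac{\mathrm{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}\, \Big(1 + O\big(e^{-\delta/\varepsilon}\big)\Big) \\
&= \frac{1}{E_{x_k}[\tau_{S_{k-1}}]}\, \Big(1 + O\big(e^{-\delta/\varepsilon}\big)\Big) \\
&= \frac{[-\lambda_1^*(z^*(x_k, \mathcal{M}_{k-1}))]}{2\pi} \sqrt{\frac{\det(\nabla^2 F(x_k))}{-\det(\nabla^2 F(z^*(x_k, \mathcal{M}_{k-1})))}}\;
e^{-[\Phi(x_k, \mathcal{M}_{k-1}) - F(x_k)]/\varepsilon} \\
&\qquad \times \Big(1 + O\big(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2}\big)\Big), \qquad k = 2, \dots, n,
\end{aligned} \tag{11.3.3}
\]

where $\lambda_1^*(z^*)$ denotes the unique negative eigenvalue of the Hessian of F at the
saddle point $z^*$.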

The conditions in (11.3.1) state that “all valleys of F have different depth”,
which is the generic situation. This is analogous to the condition that M be a regular
set of metastable points made in Chap. 8.

Fig. 11.3 First and second eigenfunction in a two-well potential

Remark 11.10 The Wentzell-Freidlin theory of metastability also provides estimates
on the small eigenvalues, but only on the logarithmic scale, namely,

\[
\lim_{\varepsilon \downarrow 0} \varepsilon \ln \lambda_k = F(x_k) - \Phi(x_k, \mathcal{M}_{k-1}). \tag{11.3.4}
\]

In the course of the proof of Theorem 11.9 we also obtain detailed control on the
eigenfunctions of −Lε corresponding to the small eigenvalues (see Fig. 11.3 for a
schematic representation of the first two eigenfunctions in a double-well potential
in one dimension).

Theorem 11.11 (Properties of eigenfunctions) Under the assumptions of Theorem 11.9,
if $\varphi_k$ denotes the normalised eigenfunction corresponding to the eigenvalue
$\lambda_k$, then there exists a δ > 0 such that

\[
\varphi_k(y) = \frac{h_k(y)}{\|h_k\|_{2,\mu_\varepsilon}} + O\big(e^{-\delta/\varepsilon}\big), \qquad k = 1, \dots, n. \tag{11.3.5}
\]

Finally, metastable exit times are asymptotically exponentially distributed when
appropriate non-degeneracy conditions are met.

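Theorem 11.12 (Exponential law of metastable exit times) Suppose that the assumptions
of Theorem 11.9 are satisfied. Let $D \subset \mathbb{R}^d$ be a closed subset such that:
(i) If $\mathcal{M}_k = \{y_1, \dots, y_l\} \subset \mathcal{M}$ enumerates all the minima of F such that $F(y_l) \le F(x_k)$,
then $\bigcup_{l=1}^{k} B_\varepsilon(y_l) \subset D$.
(ii) $\mathrm{dist}(z^*(x_i, \mathcal{M}_i), D) \ge \delta > 0$ for some δ > 0 independent of ε.
Then there exists a δ > 0, independent of ε and t, such that, for all t > 0,

\[
\begin{aligned}
P_{x_k}\big[\tau_D > t\, E_{x_k}[\tau_D]\big] ={}& \Big(1 + O\big(e^{-\delta/\varepsilon}\big)\Big)\, e^{-t[1 + O(e^{-\delta/\varepsilon})]} \\
&+ \sum_{l > k} O\big(e^{-\delta/\varepsilon}\big)\, e^{-t \lambda_l E_{x_k}[\tau_D]} + O(1)\, e^{-t\, O(\varepsilon^{d-1})\, E_{x_k}[\tau_D]}.
\end{aligned} \tag{11.3.6}
\]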

11.3.2 A priori spectral estimates

In this section we derive a priori lower bounds on principal eigenvalues for the
Dirichlet problem in regular open sets $D \subset \Omega \subseteq \mathbb{R}^d$. The closure of D is denoted
by $\bar D$, the complement by $D^c$, and the boundary by ∂D. We denote by $\lambda_0^{D^c}$ the
principal (= smallest) eigenvalue of the Dirichlet problem

\[
\begin{aligned}
(-L_\varepsilon - \lambda) f(x) &= 0, & x &\in D, \\
f(x) &= 0, & x &\in D^c.
\end{aligned} \tag{11.3.7}
\]

We sometimes use the notation $L_\varepsilon^D$ to indicate the Dirichlet operator corresponding
to (11.3.7).
The following lemma improves the Donsker-Varadhan estimate in Lemma 8.16
when D is unbounded.

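Lemma 11.13 Let $\varphi_D$ denote the normalised eigenfunction corresponding to the
principal eigenvalue of $-L_\varepsilon^D$, and let $A \subset D$ be a compact set. Then

\[
\lambda_0^{D^c} \ge \frac{1}{\sup_{x \in A} E_x[\tau_{D^c}]} \Big( 1 - \int_{D \setminus A} dy\; e^{-F(y)/\varepsilon}\, \varphi_D(y)^2 \Big). \tag{11.3.8}
\]

Moreover, for any δ > 0, there exists a bounded set $A \subset D$, independent of ε, such
that

\[
\lambda_0^{D^c} \ge \frac{1 - \delta}{\sup_{x \in A} E_x[\tau_{D^c}]}. \tag{11.3.9}
\]

For $B \subset D$,

\[
\lambda_0^{D^c \cup B} \ge \frac{1}{\sup_{x \in A} E_x[\tau_B \mid \tau_B \le \tau_{D^c}]} \Big( 1 - \int_{D \setminus A} dy\; e^{-F(y)/\varepsilon}\, \varphi_{D \setminus B}(y)^2 \Big). \tag{11.3.10}
\]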

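Proof Let w(x) denote the solution of the Dirichlet problem

\[
\begin{aligned}
(-L_\varepsilon w)(x) &= 1, & x &\in D, \\
w(x) &= 0, & x &\in D^c.
\end{aligned} \tag{11.3.11}
\]

Note that $w(x) = E_x[\tau_{D^c}]$. Moreover,

\[
\begin{aligned}
\int_D dx\; e^{-F(x)/\varepsilon}\, \varphi(x) (-L_\varepsilon \varphi)(x)
&= \int_D dx\; e^{-F(x)/\varepsilon}\, \nabla\varphi(x) \cdot \nabla\varphi(x) \\
&= \int_D dx\; e^{-F(x)/\varepsilon} \sum_{i=1}^{d} \lim_{h \downarrow 0} h^{-2} \big[\varphi(x + h e_i) - \varphi(x)\big]^2.
\end{aligned} \tag{11.3.12}
\]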

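Using that, for any $a, b \in \mathbb{R}$ and C > 0, $ab \le \tfrac12 (C a^2 + b^2/C)$, and picking
$a = \varphi(x + h e_i)$, $b = \varphi(x)$ and $C = w(x)/w(x + h e_i)$, we have

\[
\big[\varphi(x + h e_i) - \varphi(x)\big]^2 \ge \Big[ \frac{\varphi(x + h e_i)^2}{w(x + h e_i)} - \frac{\varphi(x)^2}{w(x)} \Big] \big[ w(x + h e_i) - w(x) \big]. \tag{11.3.13}
\]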
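The elementary inequality (11.3.13) can be sanity-checked numerically. In the sketch below the values of φ and w at the two neighbouring points are drawn at random; the sampling ranges, the seed, and the function names are assumptions made for this illustration, with w kept strictly positive as it is for a mean exit time in the interior of D.

```python
import random


def lhs(phi_shift, phi):
    """Left-hand side of (11.3.13): squared increment of phi."""
    return (phi_shift - phi) ** 2


def rhs(phi_shift, phi, w_shift, w):
    """Right-hand side of (11.3.13): increment of phi^2/w times increment of w."""
    return (phi_shift ** 2 / w_shift - phi ** 2 / w) * (w_shift - w)


rng = random.Random(1)
samples = [
    (rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(0.1, 3), rng.uniform(0.1, 3))
    for _ in range(10000)
]
# Smallest observed value of LHS - RHS; it should be non-negative up to rounding.
worst_gap = min(lhs(a, b) - rhs(a, b, ws, w) for a, b, ws, w in samples)
```

Algebraically the gap equals $\big(a\sqrt{w/w'} - b\sqrt{w'/w}\big)^2$ with $w' = w(x + h e_i)$, which is the weighted inequality $2ab \le Ca^2 + b^2/C$ in disguise.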
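Inserting this inequality into (11.3.12), we obtain

\[
\begin{aligned}
\int_D dx\; e^{-F(x)/\varepsilon}\, \varphi(x) (-L_\varepsilon \varphi)(x)
&\ge \int_D dx\; e^{-F(x)/\varepsilon}\, \frac{\varphi(x)^2}{w(x)}\, (-L_\varepsilon w)(x) \\
&= \int_D dx\; e^{-F(x)/\varepsilon}\, \varphi(x)\, \frac{\varphi(x)}{w(x)} \\
&\ge \frac{1}{\sup_{x \in A} w(x)} \int_A dx\; e^{-F(x)/\varepsilon}\, \varphi(x)^2.
\end{aligned} \tag{11.3.14}
\]

Choosing φ as the normalised eigenfunction of $-L_\varepsilon^D$ corresponding to the principal eigenvalue, we
arrive at (11.3.8).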
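Next we claim that, for any γ > 0,

\[
\int_D dy\; e^{-\gamma \tilde F(y)/\varepsilon}\, \varphi_D(y)^2 < C_\gamma < \infty, \tag{11.3.15}
\]

where $\tilde F(y) = \min_{x \in \mathcal{M}} [F(y) - F(x)]$. Clearly this implies (11.3.9). To see why
(11.3.15) is true, set $v(y) = e^{-F(y)/2\varepsilon} \varphi_D(y)$, which is the corresponding ground-state
eigenfunction of the operator

\[
-\big( e^{-F/2\varepsilon} L_\varepsilon\, e^{F/2\varepsilon} \big)(x) = -\varepsilon \Delta + \frac{1}{4\varepsilon} \|\nabla F(x)\|_2^2 - \frac12 \Delta F(x), \tag{11.3.16}
\]

which is a symmetric operator on $L^2(D, dy)$. A semi-classical Agmon estimate for
the ground-state eigenfunction v that can be found in Helffer and Sjöstrand [138]
yields

\[
\int_D dy\; e^{(1-\gamma) \tilde F(y)/\varepsilon}\, v(y)^2 < C_\gamma < \infty, \tag{11.3.17}
\]

which in turn implies (11.3.15). To obtain (11.3.10), note that $w_{B,D}(x) = E_x[\tau_B \mid \tau_B \le \tau_{D^c}]$,
$x \in D \setminus B$, solves the Dirichlet problem

\[
\begin{aligned}
(-L_\varepsilon w_{B,D})(x) &= h_{B, D^c}(x), & x &\in D \setminus B, \\
w_{B,D}(x) &= 0, & x &\in B \cup D^c.
\end{aligned} \tag{11.3.18}
\]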

Rerunning the proof of (11.3.8) with w replaced by wB,D , we obtain (11.3.10).
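The conjugation identity (11.3.16) used in this proof follows from a direct computation. The sketch below assumes the convention $L_\varepsilon u = \varepsilon \Delta u - \nabla F \cdot \nabla u$, which is the sign convention under which (11.3.11) and (11.3.16) hold; $u$ denotes an arbitrary smooth function and is notation introduced only for this computation.

```latex
% Assumption of this sketch: L_\varepsilon u = \varepsilon\Delta u - \nabla F\cdot\nabla u.
\begin{align*}
\nabla\big(e^{F/2\varepsilon} u\big)
  &= e^{F/2\varepsilon}\Big(\nabla u + \tfrac{1}{2\varepsilon}\, u\, \nabla F\Big),\\
\Delta\big(e^{F/2\varepsilon} u\big)
  &= e^{F/2\varepsilon}\Big(\Delta u + \tfrac{1}{\varepsilon}\, \nabla F\cdot\nabla u
     + \tfrac{1}{2\varepsilon}\, u\, \Delta F + \tfrac{1}{4\varepsilon^{2}}\, u\, \|\nabla F\|_2^{2}\Big),\\
e^{-F/2\varepsilon} L_\varepsilon\big(e^{F/2\varepsilon} u\big)
  &= \varepsilon\Delta u + \tfrac12\, u\, \Delta F - \tfrac{1}{4\varepsilon}\, u\, \|\nabla F\|_2^{2}.
\end{align*}
```

Changing the overall sign gives the right-hand side of (11.3.16): the first-order terms cancel, so the conjugated operator is a Schrödinger operator with potential $\frac{1}{4\varepsilon}\|\nabla F\|_2^2 - \frac12 \Delta F$, to which semi-classical estimates apply.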

We will next establish that $\lambda_0^{D^c}$ is at most polynomially small in ε when D does
not contain local minima. Define

\[
\mathcal{M}_\varepsilon = \big\{ z \in \Omega : \mathrm{dist}(z, \mathcal{M}) \le \varepsilon \big\}. \tag{11.3.19}
\]

Lemma 11.14 Assume that $D \cap \mathcal{M}_{2\varepsilon} = \emptyset$. Then there is a finite positive constant C,
independent of ε, such that

\[
\sup_{x \in D} E_x[\tau_{D^c}] \le C \varepsilon^{-2d+2} \sup_{x \in D} \int_\Omega 1_{F(y) \le F(x)}\, dy. \tag{11.3.20}
\]

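Proof We start from the relation

\[
\int_D dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), D^c}(y) \ge \inf_{z \in \partial B_\varepsilon(x)} E_z[\tau_{D^c}]\; \mathrm{cap}\big(B_\varepsilon(x), D^c\big), \tag{11.3.21}
\]

which is an immediate consequence of Corollary 7.30. The Harnack inequality in
Lemma 9.13 and the representation formula (7.2.47) give

\[
\sup_{z \in \partial B_\varepsilon(x)} E_z[\tau_{D^c}] \le C \inf_{z \in \partial B_\varepsilon(x)} E_z[\tau_{D^c}]. \tag{11.3.22}
\]

Combining this with (11.3.21), we get

\[
\sup_{z \in \partial B_\varepsilon(x)} E_z[\tau_{D^c}] \le C\, \frac{\int_D dy\; e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), D^c}(y)}{\mathrm{cap}(B_\varepsilon(x), D^c)}. \tag{11.3.23}
\]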

We distinguish between the regions $\{y \in D : F(y) > F(x)\}$ and $\{y \in D : F(y) \le F(x)\}$
in the integral. In the former, we use that $h_{B_\varepsilon(x), D^c}(y) \le 1$, while in the latter
we use the upper bound in Theorem 9.10. This gives

\[
\begin{aligned}
\sup_{z \in \partial B_\varepsilon(x)} E_z[\tau_{D^c}]
\le{}& C\, \frac{\int_{\{y \in D :\, F(y) > F(x)\}} dy\; e^{-F(y)/\varepsilon}}{\mathrm{cap}(B_\varepsilon(x), D^c)} \\
&+ C\, \frac{1}{\mathrm{cap}(B_\varepsilon(x), D^c)} \int_{\{y \in D :\, F(y) \le F(x)\}} dy\; e^{-F(y)/\varepsilon}\, \frac{\mathrm{cap}(B_\varepsilon(y), B_\varepsilon(x))}{\mathrm{cap}(B_\varepsilon(y), D^c)}.
\end{aligned} \tag{11.3.24}
\]

Using the bounds on capacities given in Lemma 11.4, with ρ = ε, we get

\[
\sup_{z \in \partial B_\varepsilon(x)} E_z[\tau_{D^c}] \le C \varepsilon^{-d+1} e^{F(x)/\varepsilon} \int_{\{y \in D :\, F(y) > F(x)\}} dy\; e^{-F(y)/\varepsilon}
+ C \varepsilon^{-2d+2} \int_{\{y \in D :\, F(y) \le F(x)\}} dy. \tag{11.3.25}
\]

By our assumption on F , the first integral is bounded by a constant times $e^{-F(x)/\varepsilon}$
and the second integral is equal to the volume of the level set $\{y \in D : F(y) \le F(x)\}$.
The second term in (11.3.25) is dominant.

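Corollary 11.15 If $D \cap \mathcal{M}_{2\varepsilon} = \emptyset$, then there exists a finite positive constant C,
independent of ε, such that

\[
\lambda_0^{D^c} \ge C \varepsilon^{2d-2}. \tag{11.3.26}
\]

We can generalise the bounds obtained so far to sets D containing some of the
local minima of F . Let $\mathcal{N} \subset \mathcal{M}$ be non-empty, and let

\[
\mathcal{N}_\varepsilon = \big\{ y \in \mathbb{R}^d : \mathrm{dist}(y, \mathcal{N}) \le \varepsilon \big\}. \tag{11.3.27}
\]

Assume that $D \supset \mathcal{N}_\varepsilon$, and set

\[
A(x) = \Big\{ y \in D : h_{B_\varepsilon(x), D^c \setminus B_\varepsilon(x)}(y) = \max_{z \in \mathcal{M}} h_{B_\varepsilon(x), D^c \setminus B_\varepsilon(x)}(z) \Big\}. \tag{11.3.28}
\]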

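Lemma 11.16 Under the assumptions of Lemma 11.13,

$$\frac{1}{\lambda_0^{D^c}} \le \sum_{k :\, x_k\in\mathcal{N}_\varepsilon} \frac{\int_{A(x_k)} e^{-F(y)/\varepsilon}\, dy}{\operatorname{cap}(B_\varepsilon(x_k), D\setminus B_\varepsilon(x_k))}. \tag{11.3.29}$$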

Proof The proof is similar to that of Lemma 11.14 when combined with the estimate
on mean exit times given in Proposition 11.8. We leave the details to the reader.

Lemma 11.16 and Theorem 11.2 imply, under the assumptions of Theorem 11.9, that

$$\lambda_0^{D^c} \ge C_\varepsilon \min_{k :\, x_k\in\mathcal{N}_\varepsilon} e^{-[\Phi(x_k,\mathcal{M}_k)-F(x_k)]/\varepsilon}, \tag{11.3.30}$$

where $C_\varepsilon$ is polynomially bounded in $\varepsilon$. This rough bound will be made more precise in the next section.

11.3.3 Principal Dirichlet eigenvalues

We now give a precise characterisation of Dirichlet eigenvalues.

General strategy

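Let $x_i$, $i = 1, \dots, n$, be the local minima of $F$ labelled as in Theorem 11.9. Let $B_i = B_\varepsilon(x_i) \subset \Omega$, $i = 1, \dots, n$, be $\varepsilon$-balls around them. For $n \ge k \ge 1$, set $S_k = \bigcup_{i=1}^k B_i$ and let $\bar\lambda_k = \lambda_0^{S_k}$ denote the principal eigenvalue of the operator $-L_\varepsilon$ with Dirichlet boundary conditions on $\partial S_k = \bigcup_{i=1}^k \partial B_i$ (and on $\partial\Omega$ when this is not empty). For $\lambda < \bar\lambda_k$, consider the Dirichlet problem

$$\begin{aligned}
(-L_\varepsilon - \lambda) f^\lambda(x) &= 0, & x &\in \Omega\setminus\partial S_k, \\
f^\lambda(x) &= \phi(x), & x &\in \partial S_k, \\
f^\lambda(x) &= 0, & x &\in \partial\Omega.
\end{aligned} \tag{11.3.31}$$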
This is the Dirichlet problem in the exterior and the interior of the balls simulta-
neously (note that the principal eigenvalue of −Lε within a ball is larger than λ̄k ,
and therefore plays no rôle). In the sequel, when we specify Dirichlet problems, the
vanishing of the solution on the boundary of Ω will always be understood and will
not be mentioned anymore.
The basic idea is to construct an eigenfunction of the full operator $-L_\varepsilon$ as a solution of the problem in (11.3.31) with a suitably chosen $\phi$. Namely, if $\lambda < \bar\lambda_k$, then (11.3.31) has a unique solution. Suppose that there is an eigenfunction $\phi^\lambda$ with eigenvalue $\lambda$. If we choose $\phi(x) = \phi^\lambda(x)$ as boundary condition, then $\phi^\lambda$ solves (11.3.31), which shows that eigenfunctions can be obtained as solutions of this Dirichlet problem for suitable boundary conditions. On the other hand, when we want to check whether $\lambda$ is an eigenvalue, we just need to verify whether there is a function $\phi$ on $\partial S_k$ such that the corresponding solution $f^\lambda$ of (11.3.31) also satisfies

$$(-L_\varepsilon - \lambda) f^\lambda(x) = 0, \qquad \forall x \in \partial S_k. \tag{11.3.32}$$

In fact, $(-L_\varepsilon - \lambda) f^\lambda$ in general is a measure concentrated on the surface $\partial S_k$. Demanding that this surface measure be zero, we are led to an integral equation for $\phi$ on $\partial S_k$ that is not particularly easy to handle.
This procedure is completely analogous to that in Chap. 8 for Markov processes
with countable state space. There, instead of balls Bi we just had points xi . The
equations in (11.3.32) were just k equations, and the boundary condition reduced to
the k numbers φ(xi ). This led to a set of linear equations for the unknown vector
φ(xi ), i = 1, . . . , k. The condition for λ to be an eigenvalue reduced to the vanishing
of a certain determinant. It would be nice if in the present setting we could reduce
the computation to a similarly simple condition. Indeed, this will be almost the case,
due to the fact that the eigenfunctions are very close to constants on the balls Bi .
We begin the program in this section with the somewhat simpler problem of
the computation of the principal eigenvalues in domains D ⊂ Ω. This problem is
considerably simpler because principal eigenfunctions are positive. Later the main
application will be to the case where D equals Ω with some small balls around local
minima of F removed.

Regularity properties of eigenfunctions

We first state a simple application of the Harnack and Hölder inequalities in Lemmas 9.8–9.9.

Lemma 11.17 Assume that $x$ is a local minimum of $F$. Let $\phi$ be a positive strong solution of $(-L_\varepsilon - \lambda)\phi = 0$, $|\lambda| \le 1$, on the ball $B_{4\sqrt{\varepsilon}}(x)$. Then there exist $0 < C < \infty$ and $\alpha > 0$, both independent of $\varepsilon$, such that

$$\operatorname{osc}_{y\in B_\varepsilon(x)} \phi(y) \le C\varepsilon^{\alpha/2} \min_{y\in B_\varepsilon(x)} \phi(y). \tag{11.3.33}$$

Sharp estimates on principal eigenvalues

We want to improve the estimates on principal eigenvalues $\lambda_0^{D^c}$ obtained in Sect. 11.3.2 when $D$ contains a local minimum of $F$.

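Proposition 11.18 Assume that $D$ contains $l$ local minima of the function $F$ and that there is a single minimum $x \in D$ that realises

$$\Phi(x, D^c) - F(x) = \max_{1\le i\le l} \bigl[\Phi(x_i, D^c) - F(x_i)\bigr]. \tag{11.3.34}$$

Write $B = B_\varepsilon(x)$. Then there exist $\alpha > 0$, $C < \infty$ and $\delta > 0$, independent of $\varepsilon$, such that the principal eigenvalue $\lambda_0^{D^c}$ of the Dirichlet problem on $D$ satisfies

$$\frac{\operatorname{cap}(B, D^c)}{\|h_{B,D^c}\|_{2,\mu_\varepsilon}^2} \bigl(1 - C\varepsilon^{\alpha/2}\bigr)\bigl(1 - e^{-\delta/\varepsilon}\bigr) \le \lambda_0^{D^c} \le \frac{\operatorname{cap}(B, D^c)}{\|h_{B,D^c}\|_{2,\mu_\varepsilon}^2} \bigl(1 + C\varepsilon^{\alpha/2}\bigr)\bigl(1 + e^{-\delta/\varepsilon}\bigr), \tag{11.3.35}$$

where $\|\cdot\|_{2,\mu_\varepsilon}$ denotes the $L^2$-norm with respect to the measure $\mu_\varepsilon(dy) = e^{-F(y)/\varepsilon}\,dy$. In particular,

$$\frac{\operatorname{cap}(B_k, S_{k-1})}{\|h_{B_k,S_{k-1}}\|_{2,\mu_\varepsilon}^2} \bigl(1 - C\varepsilon^{\alpha/2}\bigr)\bigl(1 - e^{-\delta/\varepsilon}\bigr) \le \bar\lambda_k \le \frac{\operatorname{cap}(B_k, S_{k-1})}{\|h_{B_k,S_{k-1}}\|_{2,\mu_\varepsilon}^2} \bigl(1 + C\varepsilon^{\alpha/2}\bigr)\bigl(1 + e^{-\delta/\varepsilon}\bigr). \tag{11.3.36}$$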

Proof We know by Lemma 11.16 that

$$\lambda_0^{D^c\cup B} \ge e^{-[F(z^*(x,D^c))-F(x)]/\varepsilon}\, e^{\delta/\varepsilon}. \tag{11.3.37}$$

We also know that $\lambda_0^{D^c} < \lambda_0^{D^c\cup B}$ (and expect $\lambda_0^{D^c} \approx e^{-[F(z^*(x,D^c))-F(x)]/\varepsilon}$, which is much smaller than the lower bound in (11.3.37)). By the philosophy outlined above, we know that the principal eigenfunction can be represented as the solution of the Dirichlet problem (both inside $B$ and outside $B$)

$$\begin{aligned}
(-L_\varepsilon - \lambda) f^\lambda(y) &= 0, & y &\in D\setminus\partial B, \\
f^\lambda(y) &= \phi_D(y), & y &\in \partial B, \\
f^\lambda(y) &= 0, & y &\in D^c,
\end{aligned} \tag{11.3.38}$$

where the boundary conditions $\phi_D$ are given by the actual principal eigenfunction. We assume that $\operatorname{dist}(x, D^c) \ge \delta > 0$, with $\delta$ independent of $\varepsilon$. Then $B_{4\sqrt{\varepsilon}}(x) \subset D$, and since $\phi_D$ is the principal eigenfunction, it may be chosen positive on $D$. Therefore Lemma 11.17 applies and shows that

$$\inf_{y\in\partial B} \phi_D(y) = c \le \sup_{y\in\partial B} \phi_D(y) \le \bigl(1 + C\varepsilon^{\alpha/2}\bigr) c. \tag{11.3.39}$$

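We normalise the eigenfunction such that $c = 1$. Then $f^\lambda(x) = h^\lambda_{B,D^c}(x) + \chi^\lambda_{B,D^c}(x)$, where $h^\lambda_{B,D^c}$ is the $\lambda$-equilibrium potential that solves

$$\begin{aligned}
(-L_\varepsilon - \lambda) h^\lambda_{B,D^c}(y) &= 0, & y &\in D\setminus\partial B, \\
h^\lambda_{B,D^c}(y) &= 1, & y &\in \partial B, \\
h^\lambda_{B,D^c}(y) &= 0, & y &\in D^c,
\end{aligned} \tag{11.3.40}$$

while $\chi^\lambda_{B,D^c}$ solves

$$\begin{aligned}
(-L_\varepsilon - \lambda) \chi^\lambda_{B,D^c}(y) &= 0, & y &\in D\setminus\partial B, \\
\chi^\lambda_{B,D^c}(y) &= \phi_D(y) - 1, & y &\in \partial B, \\
\chi^\lambda_{B,D^c}(y) &= 0, & y &\in D^c.
\end{aligned} \tag{11.3.41}$$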

We want $(-L_\varepsilon - \lambda) f^\lambda$ to vanish also as a surface measure on $\partial B$. This requires that there is no discontinuity in the derivative of $f^\lambda$ normal to $\partial B$, which we can express as saying that, for $g$ a smooth test function that vanishes on $D^c$,

$$\int_{\partial B} e^{-F(y)/\varepsilon} \bigl[g(y)\,\partial_{n(y)} f^\lambda(y) + g(y)\,\partial_{-n(y)} f^\lambda(y)\bigr]\, d\sigma_B(y) = 0, \tag{11.3.42}$$

where $d\sigma_B(y)$ denotes the Euclidean surface measure on $\partial B$, and $\partial_{\pm n(y)}$ denotes the normal derivative at $y \in \partial B$ from the exterior and interior of $B$, respectively. As we will see, it already suffices to require that this equation hold for functions $g$ that are equal to 1 on $\partial B$. In fact, we will choose $g = h_{B,D^c}$. To evaluate this expression, it will be convenient to observe that $h_{B,D^c}(y) = 1$ for $y \in \partial B$. Moreover, $h_{B,D^c}(y) = 1$ on $B$, so that $\partial_{-n(y)} h_{B,D^c}(y)$ vanishes on $\partial B$. Using these facts, together with the second Green identity, we get from (11.3.42) the condition

$$\begin{aligned}
0 &= \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, f^\lambda(y)\, d\sigma_B(y) - \frac{\lambda}{\varepsilon} \int_D dy\, e^{-F(y)/\varepsilon}\, h_{B,D^c}(y)\, f^\lambda(y) \\
&= \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, d\sigma_B(y) - \frac{\lambda}{\varepsilon} \int_D dy\, e^{-F(y)/\varepsilon}\, h_{B,D^c}(y)\, h^\lambda_{B,D^c}(y) \\
&\quad + \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, \chi^\lambda_{B,D^c}(y)\, d\sigma_B(y) - \frac{\lambda}{\varepsilon} \int_D dy\, e^{-F(y)/\varepsilon}\, h_{B,D^c}(y)\, \chi^\lambda_{B,D^c}(y).
\end{aligned} \tag{11.3.43}$$

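(Note that the derivative $\partial_{n(y)}$ is in the direction of the interior of $B$.) The two terms involving $\chi^\lambda_{B,D^c}$ will be naturally treated as error terms. Since $\partial_{n(y)} h_{B,D^c}(y) > 0$, we get via Lemma 11.17 that

$$0 \le \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, \chi^\lambda_{B,D^c}(y)\, d\sigma_B(y) \le C\varepsilon^{\alpha/2} \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, d\sigma_B(y). \tag{11.3.44}$$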
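Defining $\delta\chi^\lambda_{B,D^c} = \chi^\lambda_{B,D^c} - \chi^0_{B,D^c}$, we see that $\delta\chi^\lambda_{B,D^c}$ solves the Dirichlet problem

$$\begin{aligned}
(-L_\varepsilon - \lambda)\, \delta\chi^\lambda_{B,D^c}(y) &= \lambda\, \chi^0_{B,D^c}(y), & y &\in D\setminus\partial B, \\
\delta\chi^\lambda_{B,D^c}(y) &= 0, & y &\in \partial B, \\
\delta\chi^\lambda_{B,D^c}(y) &= 0, & y &\in D^c.
\end{aligned} \tag{11.3.45}$$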

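In complete analogy with Lemma 8.21, we get the following $L^2(\mu_\varepsilon)$-estimates.

Lemma 11.19
(i)
$$\|\delta\chi^\lambda_{B,D^c}\|_{2,\mu_\varepsilon} \le \frac{\lambda}{\lambda_0^{D^c\cup B} - \lambda}\, \|\chi^0_{B,D^c}\|_{2,\mu_\varepsilon}. \tag{11.3.46}$$
(ii)
$$\|h^\lambda_{B,D^c} - h_{B,D^c}\|_{2,\mu_\varepsilon} \le \frac{\lambda}{\lambda_0^{D^c\cup B} - \lambda}\, \|h_{B,D^c}\|_{2,\mu_\varepsilon}. \tag{11.3.47}$$
(iii) For all $z \in D\setminus B$,
$$0 \le \chi^0_{B,D^c}(z) \le C\varepsilon^{\alpha/2}\, h_{B,D^c}(z). \tag{11.3.48}$$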

Proof Items (i) and (ii) are the standard $L^2$-bounds as used in Lemma 8.21. Item (iii) follows from the Poisson kernel representation of $\chi^0_{B,D^c}$,

$$\chi^0_{B,D^c}(x) = -\varepsilon \int_{\partial B} \bigl[\phi_D(y) - 1\bigr]\, \partial_{n(y)} G_{D\setminus B}(x, y)\, d\sigma_B(y). \tag{11.3.49}$$

Since the normal derivative of the Green function $G_{D\setminus B}(x, y)$ is negative on $\partial B$ and $\phi_D(y) \ge 1$ on $\partial B$, we get (11.3.48).

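Using the estimates above, together with

$$\varepsilon \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, d\sigma_B(y) = \operatorname{cap}(B, D^c), \tag{11.3.50}$$

we see that (11.3.43) implies that

$$\left|\lambda - \frac{\operatorname{cap}(B, D^c)}{\|h_{B,D^c}\|_{2,\mu_\varepsilon}^2}\right| \le C\varepsilon^{\alpha/2} \left(\frac{\operatorname{cap}(B, D^c)}{\|h_{B,D^c}\|_{2,\mu_\varepsilon}^2} + \frac{\lambda}{\lambda_0^{D^c\cup B} - \lambda}\right). \tag{11.3.51}$$

This yields the bounds on $\lambda_0^{D^c}$ in (11.3.35). Note that, while we have only used a necessary condition for $\lambda_0^{D^c}$, the fact that there must be such an eigenvalue implies that it actually lies between the bounds given by (11.3.51).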

Uniform estimates on principal eigenfunctions

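In complete analogy with Lemma 8.22 we can improve the $L^2$-estimates to uniform estimates.

Lemma 11.20 With the notation above, the following estimates hold for all $\varepsilon$ small enough:
(i) For all $z \in D$,
$$\bigl|\chi^\lambda_{B,D^c}(z)\bigr| \le 2\, \frac{\lambda}{\lambda_0^{D^c\cup B} - \lambda}\, \bigl|\chi^0_{B,D^c}(z)\bigr|. \tag{11.3.52}$$
(ii) For all $z \in D\setminus B$,
$$\bigl|h^\lambda_{B,D^c}(z) - h_{B,D^c}(z)\bigr| \le \frac{\lambda}{a(D, B) - \lambda}\, h_{B,D^c}(z), \tag{11.3.53}$$
where $a(D, B) = \inf_{y\in D\setminus B} \dfrac{1}{\mathbb{E}_y[\tau_B \mid \tau_B \le \tau_{D^c}]}$.
(iii) For all $z \in B$, $\chi^0_{B,D^c}(z) \le C\varepsilon^{\alpha/2}$.
(iv) Consequently, the eigenfunction $\phi_D$, normalised such that $\inf_{y\in\partial B} \phi_D(y) = 1$, satisfies, for all $y \in D$,
$$h_{B,D^c}(y) \le \phi_D(y) \le h_{B,D^c}(y)\, \bigl(1 + C\varepsilon^{\alpha/2}\bigr)\bigl(1 + e^{-\delta/\varepsilon}\bigr). \tag{11.3.54}$$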

Proof Items (i) and (ii) follow from the same arguments that were used in the proof
of Lemma 8.4.21. Item (iii) follows from the maximum principle. Combine these
estimates to get (iv).
Remark 11.21 Note that $a(D, B) = 1/\lambda_0^{D^c\cup B}\,[1 + o(1)]$ for sets $D\setminus B$ that do contain a local minimum of $F$.

11.3.4 Exponentially small eigenvalues and their eigenfunctions

The goal of this section is to generalise the analysis in Sect. 11.3.3 to all small
eigenvalues of −Lε . To do this, we need to first establish some a priori estimates
on the behaviour of eigenfunctions near the local minima of F .

A priori estimates on eigenfunctions near local minima

For the analysis of harmonic functions that are not necessarily positive, we need an
estimate for sub-harmonic functions that allows us to relate the oscillation to the
L2 -norm.

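Lemma 11.22 Let $\phi$ be a strong solution of $(-L_\varepsilon - \lambda)\phi = 0$ on the ball $B_{c\sqrt{\varepsilon}}(x)$. Then there exists a $C < \infty$, independent of $\varepsilon$, such that

$$\operatorname{osc}_{B_{c\sqrt{\varepsilon}}} \phi \le C\varepsilon^{-d/4} \left(\int_{B_{2c\sqrt{\varepsilon}}} \phi(x)^2\, dx\right)^{1/2}. \tag{11.3.55}$$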

Proof This is just a specialisation of Gilbarg and Trudinger [126, Theorem 9.20] (which gives upper bounds on suprema of sub-harmonic functions in terms of $L^p$-norms), and is obtained after choosing the balls in such a way that the constants are uniform in $\varepsilon$.

We want to show that in $\sqrt{\varepsilon}$-neighbourhoods the eigenfunctions corresponding to the exponentially small eigenvalues of $-L_\varepsilon$ either have a constant sign or are irrelevantly small. This property is suggested by the following result.

Lemma 11.23 Let $\phi$ be a normalised eigenfunction of $-L_\varepsilon$ corresponding to one of the $|\mathcal{M}|$ smallest eigenvalues. Let $\gamma < \hat\gamma = \min_{x,y\in\mathcal{M}} [\Phi(x, y) - F(y)]$. For $i = 1, \dots, n$, let $D_i$ be the set of points $y \in \Omega$ such that the solution of the differential equation $\frac{d}{dt} y(t) = -\nabla F(y(t))$ with initial condition $y(0) = y$ converges to $x_i \in \mathcal{M}$. Then there exist constants $c_i$, $i = 1, \dots, n$, such that

$$\Bigl\|\phi - \sum_{i=1}^n c_i 1_{D_i}\Bigr\|_{2,\mu_\varepsilon} \le C e^{-\gamma/\varepsilon}, \tag{11.3.56}$$

for some $C = C_\gamma < \infty$.

Proof This proposition is stated and proved in Kolokoltsov [154] for smooth $F$, but it is easy to check that the proof carries through for $F \in C^3(\Omega)$.

Unfortunately Lemma 11.23 is not quite enough to conclude that $\phi$ does not change sign near any minimum. We will, however, show that this is the case when the contribution of $\phi$ coming from a neighbourhood of a given minimum is significant. To that end, for $D \subset \Omega$ set

$$\|f\|_{2,\mu_\varepsilon,D} = \left(\int_D f(x)^2\, \mu_\varepsilon(dx)\right)^{1/2}. \tag{11.3.57}$$

For a given eigenfunction $\phi$, define the set

$$J = \bigl\{1 \le j \le n : \|\phi\|_{2,\mu_\varepsilon,D_j} \ge e^{-\gamma/2\varepsilon}\bigr\}, \tag{11.3.58}$$

where $D_j$ are the sets defined in Lemma 11.23.

Lemma 11.24 If $\phi$ is one of the eigenfunctions of Lemma 11.23 and $j \in J$, then there exist positive and finite constants $c_j$, $C$, $\alpha$, independent of $\varepsilon$, such that $|\phi(x) - c_j| \le C\varepsilon^{\alpha/2} c_j$, for all $x \in B_{\sqrt{\varepsilon}}(x_j)$.

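Proof We will first show that the weighted $L^2$-estimate on the deviation of $\phi$ from a constant implies a local unweighted $L^2$-estimate on balls of radius $r = \sqrt{\varepsilon}$ near the minima $x_j$, $j \in J$. To that end, note that (11.3.56) implies that

$$\|\phi - c_j\|_{2,\mu_\varepsilon,D_j} \le C e^{-\gamma/\varepsilon}. \tag{11.3.59}$$

Set $\tilde\phi(x) = \phi(x)/\|\phi\|_{2,\mu_\varepsilon,D_j}$ and $\hat c_j = c_j/\|\phi\|_{2,\mu_\varepsilon,D_j}$. Then, by the definition of $J$, this locally normalised function satisfies the estimate

$$\|\tilde\phi - \hat c_j\|_{2,\mu_\varepsilon,D_j} \le C e^{-\gamma/2\varepsilon}. \tag{11.3.60}$$

This estimate does not change if we add a constant to $F(x)$. Thus, we can pretend that $F(x_j) = 0$. Let $R > 0$ be such that $B_R(x_j) \subset D_j$. Since $x_j$ is a quadratic minimum, there exists a positive and finite constant $b$ such that $F(x) \le b(x - x_j)^2$ for $x \in B_R(x_j)$. Hence (11.3.60) implies, in particular, that

$$\int_{B_R(x_j)} \bigl(\tilde\phi(x) - \hat c_j\bigr)^2\, dx \le C e^{bR^2/\varepsilon} e^{-\gamma/2\varepsilon}. \tag{11.3.61}$$

Note that also

$$\int_{B_R(x_j)} \tilde\phi(x)^2\, dx \le e^{bR^2/\varepsilon}\, \|\tilde\phi\|_{2,\mu_\varepsilon,D_j}^2 = e^{bR^2/\varepsilon}. \tag{11.3.62}$$

Let $x \in B_{\sqrt{\varepsilon}}(x_j)$. Then Lemma 11.22 implies

$$\operatorname{osc}_{B_{2\sqrt{\varepsilon}}} \tilde\phi \le C\varepsilon^{-d/4} \tag{11.3.63}$$

for a new positive, finite and $\varepsilon$-independent constant $C$. Now we can use the Hölder estimate in Lemma 9.9 to obtain that, for $r < \sqrt{\varepsilon}$,

$$\operatorname{osc}_{B_r} \tilde\phi \le C\varepsilon^{-d/4} \left(\frac{r}{\varepsilon^{1/2}}\right)^{\alpha}, \tag{11.3.64}$$

for a new constant $C$ and $\alpha > 0$ independent of $\varepsilon$. If we choose $r = \varepsilon^{\frac{d}{4\alpha}+1}$, then we can achieve that $\operatorname{osc}_{B_r(x)} \tilde\phi \le C\varepsilon^{\alpha/2} < \hat c_j/2$ for $\varepsilon$ small enough. By the estimate (11.3.61), it then follows that $\tilde\phi$ must be close to $\hat c_j$, uniformly on $B_r(x)$. Since this argument holds for all $x \in B_{\sqrt{\varepsilon}}(x_j)$, we have $|\tilde\phi - \hat c_j| \le C\varepsilon^{\alpha/2}$ on this ball.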
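The choice of radius in the Hölder step of the proof of Lemma 11.24 can be checked by simple exponent bookkeeping: inserting $r = \varepsilon^{d/(4\alpha)+1}$ into the right-hand side of (11.3.64) should leave exactly the power $\varepsilon^{\alpha/2}$. The following minimal sketch does this arithmetic with exact fractions. The function name and the sample value of $\alpha$ are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def oscillation_exponent(d, alpha):
    """Power of eps in eps**(-d/4) * (r / eps**(1/2))**alpha for r = eps**(d/(4*alpha) + 1)."""
    r_exponent = Fraction(d) / (4 * alpha) + 1
    return -Fraction(d, 4) + alpha * (r_exponent - Fraction(1, 2))

alpha = Fraction(1, 3)  # hypothetical Hölder exponent, chosen only for illustration
for d in (1, 2, 3):
    # The resulting exponent equals alpha/2, independently of the dimension d.
    print(d, oscillation_exponent(d, alpha))
```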

We will later see that Lemma 11.24 overestimates the fluctuations of φ.

Lemma 11.22 is also the appropriate tool to show that near the minima where the
L2 -norm is small a similar estimate holds uniformly.

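Lemma 11.25 Let $x_i \in \mathcal{M}$, $i \notin J$. Then any eigenfunction $\phi$ of $-L_\varepsilon$ corresponding to one of the $|\mathcal{M}|$ smallest eigenvalues satisfies

$$\sup_{x\in B_{\sqrt{\varepsilon}}(x_i)} |\phi(x)| \le C\varepsilon^{-d/4} e^{-\gamma/2\varepsilon} e^{F(x_i)/2\varepsilon}. \tag{11.3.65}$$

Proof Since $i \notin J$ we may assume that $\phi$ changes sign on $B_{\sqrt{\varepsilon}}(x_i)$. Hence its absolute value is bounded by its oscillation, and so, by Lemma 11.22,

$$\begin{aligned}
\sup_{x\in B_{\sqrt{\varepsilon}}(x_i)} |\phi(x)| &\le C\varepsilon^{-d/4}\, \|\phi\|_{2,dx,B_{2\sqrt{\varepsilon}}(x_i)} \\
&\le C\varepsilon^{-d/4} e^{F(x_i)/2\varepsilon}\, \|\phi\|_{2,\mu_\varepsilon,B_{2\sqrt{\varepsilon}}(x_i)} \\
&\le C\varepsilon^{-d/4} e^{F(x_i)/2\varepsilon}\, \|\phi\|_{2,\mu_\varepsilon,D_i} \\
&\le C\varepsilon^{-d/4} e^{-\gamma/2\varepsilon} e^{F(x_i)/2\varepsilon}.
\end{aligned} \tag{11.3.66}$$

This is the claimed bound.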

Characterisation of the eigenvalues

Recall that we are working under the assumption stated in Theorem 11.9. Suppose that we want to compute eigenvalues below $\lambda_0^{S_k} = \bar\lambda_k$. We know that if $\phi^\lambda$ is an eigenfunction with $\lambda < \bar\lambda_k$, then it can be represented as the solution of the Dirichlet problem

$$\begin{aligned}
(-L_\varepsilon - \lambda) f^\lambda(y) &= 0, & y &\in \Omega\setminus\partial S_k, \\
f^\lambda(y) &= \phi^\lambda(y), & y &\in \partial S_k.
\end{aligned} \tag{11.3.67}$$

As in the analysis of principal eigenvalues, the condition on $\lambda$ will be the existence of a non-trivial $\phi^\lambda$ on $\partial S_k$ such that the surface measure

$$dy\, e^{-F(y)/\varepsilon} (-L_\varepsilon - \lambda) f^\lambda(y) = e^{-F(y)/\varepsilon} \bigl[\partial_{n(y)} f^\lambda(y) + \partial_{-n(y)} f^\lambda(y)\bigr]\, d\sigma_{S_k}(y) \tag{11.3.68}$$

vanishes. A necessary condition for this to happen is the vanishing of the total mass on each of the surfaces $\partial B_i$, $1 \le i \le k$, i.e.,

$$\int_{\partial B_i} e^{-F(y)/\varepsilon} \bigl[\partial_{n(y)} f^\lambda(y) + \partial_{-n(y)} f^\lambda(y)\bigr]\, d\sigma_{S_k}(y) = 0. \tag{11.3.69}$$

Let $c_i = \inf_{y\in B_i} \phi^\lambda(y)$. In view of Lemmas 11.24 and 11.25, either of the following two properties holds:
(i) $\sup_{y\in B_i} |\phi^\lambda(y)/c_i - 1| \le C\varepsilon^{\alpha/2}$.
(ii) $\sup_{y\in B_i} |\phi^\lambda(y)| \le C\varepsilon^{-d/4} e^{-\gamma/2\varepsilon} e^{F(x_i)/2\varepsilon}$.

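In what follows we analyse all possible cases. Let $J \subset \{1, \dots, k\}$ be the set of indices where (i) holds and $J^c = \{1, \dots, k\}\setminus J$ the set of indices where (ii) holds. Given this partition, set

$$f^\lambda = \sum_{j\in J} c_j \bigl(h^\lambda_{B_j,S_k\setminus B_j} + \chi^\lambda_{B_j,S_k\setminus B_j}\bigr) + \sum_{j\in J^c} \chi^\lambda_{B_j,S_k\setminus B_j}. \tag{11.3.70}$$

To lighten the notation we set $h^\lambda_j = h^\lambda_{B_j,S_k\setminus B_j}$ and $\chi^\lambda_j = \chi^\lambda_{B_j,S_k\setminus B_j}$, etc. in the sequel. For $j \in J$, $\chi^\lambda_j$ is the solution of

$$\begin{aligned}
(-L_\varepsilon - \lambda) \chi^\lambda_j(y) &= 0, & y &\in \Omega\setminus\partial S_k, \\
\chi^\lambda_j(y) &= \phi^\lambda(y)/c_j - 1, & y &\in \partial B_j, \\
\chi^\lambda_j(y) &= 0, & y &\in \partial B_i,\ i \ne j,
\end{aligned} \tag{11.3.71}$$

whereas, for $j \in J^c$, $\chi^\lambda_j$ is the solution of

$$\begin{aligned}
(-L_\varepsilon - \lambda) \chi^\lambda_j(y) &= 0, & y &\in \Omega\setminus\partial S_k, \\
\chi^\lambda_j(y) &= \phi^\lambda(y), & y &\in \partial B_j, \\
\chi^\lambda_j(y) &= 0, & y &\in \partial B_i,\ i \ne j.
\end{aligned} \tag{11.3.72}$$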

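We now proceed as in the analysis of principal eigenvalues, i.e., as necessary condition for $\lambda$ to be an eigenvalue we require that, for all $i = 1, \dots, k$,

$$\begin{aligned}
0 &= \int_{\partial B_i} e^{-F(y)/\varepsilon}\, h_i(y) \bigl[\partial_{n(y)} f^\lambda(y) + \partial_{-n(y)} f^\lambda(y)\bigr]\, d\sigma_{\partial S_k}(y) \\
&= \int_{\partial S_k} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, f^\lambda(y)\, d\sigma_{\partial S_k}(y) - \frac{\lambda}{\varepsilon} \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, f^\lambda(y) \\
&= \sum_{j\in J} c_j \left[\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y) \bigl(h^\lambda_j(y) + \chi^\lambda_j(y)\bigr)\, d\sigma_{\partial S_k}(y) - \frac{\lambda}{\varepsilon} \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y) \bigl(h^\lambda_j(y) + \chi^\lambda_j(y)\bigr)\right] \\
&\quad + \sum_{j\in J^c} \left[\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j(y)\, d\sigma_{\partial S_k}(y) - \frac{\lambda}{\varepsilon} \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, \chi^\lambda_j(y)\right].
\end{aligned} \tag{11.3.73}$$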

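By the bounds in (i) and (ii) we have, for $j \in J$,

$$\left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j(y)\, d\sigma_{\partial S_k}(y)\right| \le C\varepsilon^{\alpha/2} \left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, d\sigma_{\partial S_k}(y)\right|, \tag{11.3.74}$$

and, for $j \in J^c$,

$$\left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j(y)\, d\sigma_{\partial S_k}(y)\right| \le C\varepsilon^{-d/4} e^{-\gamma/2\varepsilon} e^{F(x_j)/2\varepsilon} \left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, d\sigma_{\partial S_k}(y)\right|. \tag{11.3.75}$$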

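Since the $h_i$ are harmonic, the first Green formula (7.23) implies that, for $i \ne j$,

$$\begin{aligned}
\left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, d\sigma_{B_j}(y)\right| &= \left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, h_j(y)\, \partial_{n(y)} h_i(y)\, d\sigma_{B_j}(y)\right| \\
&= \varepsilon^{-1} \left|\int_{S_k^c} dy\, e^{-F(y)/\varepsilon}\, \bigl\langle \nabla h_j(y), \nabla h_i(y) \bigr\rangle\right| \\
&\le \varepsilon^{-1} \sqrt{\operatorname{cap}(B_i, S_k\setminus B_i)\, \operatorname{cap}(B_j, S_k\setminus B_j)},
\end{aligned} \tag{11.3.76}$$

where the last inequality uses the Cauchy-Schwarz inequality. Thus, for $j \in J\setminus i$,

$$\left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j\, d\sigma_{\partial S_k}(y)\right| \le C\varepsilon^{\alpha/2} \varepsilon^{-1} \sqrt{\operatorname{cap}(B_i, S_k\setminus B_i)\, \operatorname{cap}(B_j, S_k\setminus B_j)}, \tag{11.3.77}$$

and, for $j \in J^c\setminus i$,

$$\left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j\, d\sigma_{\partial S_k}(y)\right| \le C\varepsilon^{-d/4} e^{-\gamma/2\varepsilon} e^{F(x_j)/2\varepsilon} \varepsilon^{-1} \sqrt{\operatorname{cap}(B_i, S_k\setminus B_i)\, \operatorname{cap}(B_j, S_k\setminus B_j)}. \tag{11.3.78}$$

For the diagonal terms $i = j \in J$ this simplifies to

$$\left|\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j\, d\sigma_{\partial S_k}(y)\right| \le C\varepsilon^{\alpha/2} \operatorname{cap}(B_j, S_k\setminus B_j). \tag{11.3.79}$$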

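For the remaining terms involving $\chi^\lambda$ in (11.3.73) we obtain, in complete analogy with the derivation of the bounds in Lemma 11.19, that, for $j \in J$,

$$\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y) \bigl[h^\lambda_j(y) - h_j(y) + \chi^\lambda_j(y)\bigr] = O\bigl(\varepsilon^{\alpha/2}\bigr) \bigl(1 + O\bigl(e^{-\delta/\varepsilon}\bigr)\bigr) \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, h_j(y) \tag{11.3.80}$$

and, for $j \in J^c$,

$$\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, \chi^\lambda_j(y) = O\bigl(\varepsilon^{-d/4} e^{-\gamma/2\varepsilon} e^{F(x_j)/2\varepsilon}\bigr) \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, h_j(y). \tag{11.3.81}$$

To control the off-diagonal terms we need to show that the normalised functions $h_i$ and $h_j$ are almost orthogonal.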

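Lemma 11.26

(i) There is a constant $C < \infty$ such that, for $i \ne j$,

$$(h_i, h_j)_{\mu_\varepsilon} = \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_j(y)\, h_i(y) \le C\varepsilon^{-2d} e^{-\Phi(x_i,x_j)/\varepsilon}. \tag{11.3.82}$$

(ii) For all $j$,

$$\|h_j\|_{2,\mu_\varepsilon}^2 \ge C\varepsilon^{d/2} e^{-F(x_j)/\varepsilon}. \tag{11.3.83}$$

(iii) There is a constant $C < \infty$ such that, for $i \ne j$,

$$\frac{(h_i, h_j)_{\mu_\varepsilon}}{\|h_i\|_{2,\mu_\varepsilon}\, \|h_j\|_{2,\mu_\varepsilon}} \le C\varepsilon^{-3d} \max\bigl\{e^{-(\Phi(x_i,x_j)-F(x_i))/\varepsilon},\ e^{-(\Phi(x_i,x_j)-F(x_j))/\varepsilon}\bigr\}. \tag{11.3.84}$$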

Proof The proof goes in the same way as that of Lemma 8.24 in Chap. 8, and
uses the bounds in (11.2.12) on harmonic functions and the bounds in (11.2.5) on
capacities.

Finally, we note that by Lemma 11.25 and Lemma 11.20, for $j \notin J$,

$$\bigl|\chi^\lambda_j(z)\bigr| \le C\varepsilon^{-d/4} e^{-\gamma/2\varepsilon}\, \frac{h_j(z)}{\|h_j\|_{2,\mu_\varepsilon}}. \tag{11.3.85}$$

Computation of small eigenvalues

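The matrix $\mathcal{C}$ with elements given by

$$C_{ij} = C^{(k)}_{ij} = \varepsilon \int_{\partial B_j} e^{-F(y)/\varepsilon}\, h_j(y)\, \partial_{n(y)} h_i(y)\, d\sigma_{B_j}(y), \qquad i, j = 1, \dots, k, \tag{11.3.86}$$

is the analogue of the capacity matrix that we encountered in Chap. 8. We also use its normalised version

$$K_{ij} = K^{(k)}_{ij} = \frac{C^{(k)}_{ij}}{\|h_i\|_{2,\mu_\varepsilon}\, \|h_j\|_{2,\mu_\varepsilon}}. \tag{11.3.87}$$

Note that this matrix is symmetric and, by (11.3.76), satisfies

$$K_{ij} \le \sqrt{K_{ii} K_{jj}}. \tag{11.3.88}$$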

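Also introduce the matrices

$$A_{ij} = \frac{\varepsilon \int_{\partial B_i} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi^\lambda_j(y)\, d\sigma_{\partial S_k}(y)}{\|h_i\|_{2,\mu_\varepsilon}\, \|h_j\|_{2,\mu_\varepsilon}}, \tag{11.3.89}$$

$$B_{ij} = \begin{cases}
(1 - \delta_{ij})\, \dfrac{\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y) [h^\lambda_j(y) - h_j(y) + \chi^\lambda_j(y)]}{\|h_i\|_{2,\mu_\varepsilon}\, \|h_j\|_{2,\mu_\varepsilon}}, & j \in J, \\[2ex]
(1 - \delta_{ij})\, \dfrac{\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, \chi^\lambda_j(y)}{\|h_i\|_{2,\mu_\varepsilon}\, \|h_j\|_{2,\mu_\varepsilon}}, & j \in J^c,
\end{cases} \tag{11.3.90}$$

$$D_{ij} = \delta_{ij}\, \frac{\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_j(y) [h^\lambda_j(y) - h_j(y) + \chi^\lambda_j(y)]}{\|h_j\|_{2,\mu_\varepsilon}^2}. \tag{11.3.91}$$

Then (11.3.73) can be rewritten, for $i \in J$,

$$\sum_{j\in J} \hat c_j \bigl[K_{ij} - \lambda\delta_{ij} + A_{ij} - \lambda(D_{ij} + B_{ij})\bigr] + \sum_{j\in J^c} (A_{ij} + \lambda B_{ij})\, \|h_j\|_{2,\mu_\varepsilon} = 0, \tag{11.3.92}$$

where $\hat c_j = \|h_j\|_{2,\mu_\varepsilon}\, c_j$, $j = 1, \dots, k$. These equations are the analogues of (8.4.86) for countable state space. The following lemma, which is the analogue of Lemma 8.35, collects the estimates needed to analyse the solution of these equations.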

Lemma 11.27 The following bounds hold:

(i) For all $i, j \in \{1, \dots, k\}$,
$$|A_{ij}| \le |K_{ij}|\, C\varepsilon^{\alpha/2}. \tag{11.3.93}$$
(ii) For $i \ne j \in J$,
$$|B_{ij}| \le O\bigl(\varepsilon^{\alpha} + e^{-\delta/\varepsilon}\bigr) \sqrt{K_{ii} K_{jj}}. \tag{11.3.94}$$
(iii) For $j \in J$,
$$|D_{jj}| \le C\varepsilon^{\alpha/2}. \tag{11.3.95}$$
(iv) For $i \ne j \in J^c$,
$$\|h_j\|_{2,\mu_\varepsilon}\, |A_{ij}| \le C\varepsilon^{-3d/4} e^{-\gamma/2\varepsilon}\, |K_{ii}|, \tag{11.3.96}$$
and
$$\|h_j\|_{2,\mu_\varepsilon}\, |B_{ij}| \le C\varepsilon^{-d} e^{-\gamma/\varepsilon} \sqrt{K_{ii} K_{jj}}. \tag{11.3.97}$$

Proof The bound in (11.3.93) is (11.3.77). The bounds in (11.3.94) and (11.3.95) follow from (11.3.80) and (11.3.88). The bound in (11.3.96) is a consequence of (11.3.78), while (11.3.97) follows from (11.3.81).

From here on the analysis of the solutions of (11.3.92) is very similar to that of
(8.4.86). Let us summarise the situation so far.

Theorem 11.28 Let $S_k = \bigcup_{i=1}^k B_\varepsilon(x_i)$, and let $\bar\lambda_k$ denote the principal eigenvalue of the operator $-L_\varepsilon$ with Dirichlet boundary conditions on $\partial S_k$ (and $\partial\Omega$). Then a necessary condition for a number $\lambda < \bar\lambda_k$ to be an eigenvalue of the operator $-L_\varepsilon$ is that there exist a non-empty set $J \subset \{1, \dots, k\}$ and constants $\hat c_j$, $j \in J$, with $\sum_{j\in J} \hat c_j^2 = 1$, such that (11.3.92) holds for all $i \in J$.

We expect that all the solutions of (11.3.92) are close to an eigenvalue of K .

Lemma 11.29 Let $(K_{ij})_{1\le i,j\le n}$ be the normalised capacity matrix and assume that

$$\max_{1\le i<k} K_{ii} \le e^{-\delta/\varepsilon} K_{kk}. \tag{11.3.98}$$

Then $J \ni k$, and the largest eigenvalue $\mu_k$ of $\mathcal{K}$ satisfies

$$\mu_k = K_{kk} \bigl(1 + O\bigl(e^{-\delta/2\varepsilon}\bigr)\bigr), \tag{11.3.99}$$

while all other eigenvalues are smaller than $Ce^{-\delta/\varepsilon}\lambda_k$. Moreover, the eigenvector $v = (v_1, \dots, v_k)$ corresponding to the largest eigenvalue normalised such that $v_k = 1$ satisfies $|v_i| \le Ce^{-\delta/\varepsilon}$ for $1 \le i < k$.

Proof The proof is a simple perturbation argument. Note that we can write

$$\mathcal{K} = \hat{\mathcal{K}} + \check{\mathcal{K}}, \tag{11.3.100}$$

where $\hat K_{ij} = K_{kk}\delta_{ik}\delta_{jk}$. Estimate the norm of $\check{\mathcal{K}}$ as in the proof of Lemma 11.26. Recall that

$$|K_{ij}| \le \sqrt{K_{ii} K_{jj}}. \tag{11.3.101}$$

Hence, by assumption (11.3.98),

$$\|\check{\mathcal{K}}\|_2^2 \le K_{kk}^2 \bigl(e^{-\delta/\varepsilon} k + e^{-\delta/\varepsilon} k^2\bigr). \tag{11.3.102}$$

Since $\hat{\mathcal{K}}$ has one eigenvalue $K_{kk}$ with the obvious eigenvector and all other eigenvalues are zero, the claim follows from standard perturbation theory.
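The perturbation argument can be made concrete in the smallest non-trivial case $k = 2$, where the eigenvalues of a symmetric matrix are available in closed form. The sketch below uses a hypothetical matrix that is not taken from the text: $K_{22} = 1$, $K_{11} = \eta$ with $\eta$ playing the role of $e^{-\delta/\varepsilon}$, and an off-diagonal entry equal to a fraction `rho` of the Cauchy-Schwarz bound (11.3.101). It prints the relative deviation of the top eigenvalue from $K_{22}$, the second eigenvalue, and the first component of the top eigenvector normalised by $v_2 = 1$.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues (small, large) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def toy_capacity_matrix(eta, rho=0.5):
    """Hypothetical 2x2 normalised capacity matrix with K_22 = 1, K_11 = eta and
    off-diagonal entry rho*sqrt(K_11*K_22), so that |K_12| <= sqrt(K_11*K_22)."""
    k11, k22 = eta, 1.0
    k12 = rho * math.sqrt(k11 * k22)
    return k11, k12, k22

eta = math.exp(-10.0)  # plays the role of exp(-delta/eps)
k11, k12, k22 = toy_capacity_matrix(eta)
small, large = eig_sym_2x2(k11, k12, k22)
v1 = k12 / (large - k11)  # first component of the top eigenvector when v_2 = 1
print(large / k22 - 1.0, small / k22, v1)
```

In this example the top eigenvalue differs from $K_{22}$ by a relative amount of order $\eta$, the second eigenvalue is of order $\eta K_{22}$, and the eigenvector component is of order $\sqrt{\eta}$, i.e., exponentially small with $\delta$ replaced by $\delta/2$.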

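Since $K_{kk} = \operatorname{cap}(B_k, S_{k-1})/\|h_k\|_{2,\mu_\varepsilon}^2 \approx \bar\lambda_{k-1}$ (i.e., is equal to $\bar\lambda_{k-1}$ up to polynomial terms in $\varepsilon$), Lemma 11.29 tells us that $\mu_k \approx \bar\lambda_{k-1}$, which is precisely the value we expect.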

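Corollary 11.30 Under the hypothesis in (11.3.98) the following hold:

(i) If there exists an eigenvalue $\lambda_k$ of $-L_\varepsilon$ in the interval $(\bar\lambda_k, \bar\lambda_{k-1}]$, then

$$\lambda_k = \frac{\operatorname{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2} \bigl(1 + O\bigl(\varepsilon^{\alpha/2}, e^{-\delta/\varepsilon}\bigr)\bigr). \tag{11.3.103}$$

(ii) The eigenvalue $\lambda_k$ is simple and the corresponding eigenfunction $f^\lambda_k$ can be written as

$$\phi^\lambda_k(y) = \frac{h_k(y)}{\|h_k\|_{2,\mu_\varepsilon}} \bigl(1 + O\bigl(\varepsilon^{\alpha/2}\bigr)\bigr) + \sum_{j=1}^{k-1} d_j(y)\, \frac{h_j(y)}{\|h_j\|_{2,\mu_\varepsilon}}, \tag{11.3.104}$$

where $|d_j(y)| \le e^{-\delta/\varepsilon}$ for some $\delta > 0$ (uniformly on compact subsets when $\Omega$ is unbounded).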

Proof (i) We already know (see Remark 11.3.4) that $\lambda = \frac{\operatorname{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}\, e^{o(1)/\varepsilon}$. But the only coefficient in (11.3.92) that is of this order is $K_{kk}$, if $J \ni k$. So if $k \notin J$, then all $\hat c_i$, $i \in J$, would need to be exponentially small, contradicting the normalisation condition. Hence we may assume $k \in J$. By considering all the equations with $i \ne k$, we see that the same argument as before shows that $|\hat c_i| \le Ce^{-\delta/2\varepsilon}$, and hence $\hat c_k \approx 1$, so the equation labelled $k$ implies that

$$|K_{kk} + A_{kk} - \lambda_k| \le C |K_{kk}|\, e^{-\delta/\varepsilon} \varepsilon^{\alpha/2}, \tag{11.3.105}$$

and since also $|A_{kk}| \le C\varepsilon^{\alpha/2} K_{kk}$, (11.3.103) follows.

(ii) We have just seen that a solution of (11.3.92) with $\hat c_k = 1$ must satisfy $|\hat c_j| \le e^{-\delta/\varepsilon}$ for all $j \ne k$. Hence, by (11.3.70),

$$\phi^\lambda_k(y) = \frac{h^\lambda_k(y) + \chi^\lambda_k(y)}{\|h_k\|_{2,\mu_\varepsilon}} + \sum_{j\in J\setminus k} \hat c_j\, \frac{h^\lambda_j(y) + \chi^\lambda_j(y)}{\|h_j\|_{2,\mu_\varepsilon}} + \sum_{j\in J^c} \chi^\lambda_j(y). \tag{11.3.106}$$

Using the same arguments as in the proof of Lemma 11.20, and the bounds on $\phi^\lambda - c_j$ on the boundaries $\partial B_j$, we get that, for $j \in J$,

$$\frac{|\chi^\lambda_j(y)|}{\|h_j\|_{2,\mu_\varepsilon}} \le C\varepsilon^{\alpha/2}\, \frac{h_j(y)}{\|h_j\|_{2,\mu_\varepsilon}}. \tag{11.3.107}$$

Combining these estimates we arrive at (11.3.104). Note that this final estimate does not depend on the choice of $J$.

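At this point we can further explore the eigenvalues below $\bar\lambda_{k-1}$, etc., with the same result. Thus, at the end of the procedure, we arrive at the conclusion that $-L_\varepsilon$ can have at most the $n$ simple eigenvalues given by the values in Corollary 11.30 below $C\varepsilon^{d-1}$. However, since we know that there must be $n$ such eigenvalues, we conclude that all these candidate eigenvalues are in fact the true eigenvalues, which yields the following proposition.

Proposition 11.31 Under the assumptions of Theorem 11.9, the spectrum of $-L_\varepsilon$ below $C\varepsilon^{d-1}$ consists of $n$ simple eigenvalues that satisfy:

$$\begin{aligned}
\lambda_k &= \frac{\operatorname{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2} \bigl(1 + O\bigl(\varepsilon^{\alpha/2}, e^{-\delta/\varepsilon}\bigr)\bigr) \\
&= \operatorname{cap}(B_k, S_{k-1})\, \frac{\sqrt{\det(\nabla^2 F(x_k))}}{(\sqrt{2\pi\varepsilon})^d}\, e^{F(x_k)/\varepsilon} \bigl(1 + O\bigl(\varepsilon^{1/2}\ln(1/\varepsilon), \varepsilon^{\alpha/2}, e^{-\delta/\varepsilon}\bigr)\bigr) \\
&= \frac{1}{\mathbb{E}_{x_k}[\tau_{S_{k-1}}]} \bigl(1 + O\bigl(\varepsilon^{\alpha/2}, e^{-\delta/\varepsilon}\bigr)\bigr), \qquad k = 1, \dots, n.
\end{aligned} \tag{11.3.108}$$

The corresponding eigenfunctions satisfy (11.3.104).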
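The last line of (11.3.108) identifies each small eigenvalue with an inverse mean hitting time. In dimension one this mean time has an explicit double-integral representation, so the sharp prefactor can be checked numerically. The sketch below rests on assumptions that are not taken from the text: the generator is taken to be $L_\varepsilon = \varepsilon\,d^2/dx^2 - F'(x)\,d/dx$, the potential is the hypothetical double well $F(y) = y^4/4 - y^2/2$, the process starts at the minimum $a = -1$, and the target set is replaced by the single point $b = +1$. The numerical value is compared with the one-dimensional Eyring-Kramers expression $2\pi\,[F''(-1)\,|F''(0)|]^{-1/2}\,e^{[F(0)-F(-1)]/\varepsilon}$.

```python
import math

def F(y):
    """Hypothetical double-well potential with minima at -1, +1 and a saddle at 0."""
    return 0.25 * y ** 4 - 0.5 * y ** 2

def mean_hitting_time(eps, a=-1.0, b=1.0, left=-3.0, step=1e-3):
    """Trapezoidal evaluation of
    E_a[tau_b] = (1/eps) * int_a^b dy e^{F(y)/eps} int_{-inf}^y dz e^{-F(z)/eps},
    with the lower limit -inf truncated at `left`."""
    n_start = round((a - left) / step)
    n_total = round((b - left) / step)
    inner = 0.0   # running value of the inner integral up to the current y
    outer = 0.0   # running value of the outer integral from a to the current y
    prev_weight = math.exp(-F(left) / eps)
    prev_outer = None
    for i in range(1, n_total + 1):
        y = left + i * step
        weight = math.exp(-F(y) / eps)
        inner += 0.5 * step * (prev_weight + weight)
        prev_weight = weight
        if i >= n_start:
            value = math.exp(F(y) / eps) * inner
            if prev_outer is not None:
                outer += 0.5 * step * (prev_outer + value)
            prev_outer = value
    return outer / eps

def eyring_kramers_time(eps):
    """2*pi / sqrt(F''(-1) * |F''(0)|) * exp((F(0) - F(-1)) / eps), with F''(-1) = 2, F''(0) = -1."""
    return 2.0 * math.pi / math.sqrt(2.0 * 1.0) * math.exp(0.25 / eps)

eps = 0.03
ratio = mean_hitting_time(eps) / eyring_kramers_time(eps)
print(f"eps={eps}: numerical / Eyring-Kramers = {ratio:.4f}")
```

The ratio is expected to approach 1 as $\varepsilon \downarrow 0$, with a correction of order $\varepsilon$ in this smooth one-dimensional example.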

Proof We have seen that $\lambda_k = K^{(k)}_{kk}\bigl(1 + O\bigl(e^{-\theta/\varepsilon}, \varepsilon^{\alpha/2}\bigr)\bigr)$, which proves the first assertion. It remains to identify the eigenvalues with the inverse mean times. The argument is essentially the same as in the proof of Theorem 8.43.

By virtue of Theorem 11.2 we need to show that

$$\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_k^2(y) \sim \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_k(y). \tag{11.3.109}$$

In fact, we will show more, namely, that both sides of (11.3.109) are asymptotically equal to

$$\frac{(\sqrt{2\pi\varepsilon})^d}{\sqrt{\det(\nabla^2 F(x_k))}}\, e^{-F(x_k)/\varepsilon}. \tag{11.3.110}$$
We must show that the main contribution of the integrals comes from a small neighbourhood of $x_k$, which yields the contribution in (11.3.110). It is clear that all contributions from the set $\{y \in \Omega : F(y) > F(x_k) + \varepsilon\ln(1/\varepsilon)\}$ give only sub-leading corrections. To treat the complement of this set, we use the bounds on the equilibrium potential in (11.2.12). Up to polynomial factors in $\varepsilon$, these imply that, on the connected component of the level set that does not contain $x_k$, the integrand in the right-hand side of (11.3.109) (and a fortiori in the left-hand side) is smaller than

$$e^{-[F(y)+\Phi(y,B_k)-\Phi(y,S_{k-1})]/\varepsilon}. \tag{11.3.111}$$

If $y$ is in the component of the level set that contains the minimum $x_j$, $1 \le j \le k$, then we see that the latter is equal to

$$e^{-\Phi(x_j,B_k)/\varepsilon}, \tag{11.3.112}$$

which is exponentially smaller than $\exp(-F(x_k)/\varepsilon)$, independently of $y$. If $j > k$, then we still get the same result when $F(y) \ge \Phi(x_j, S_{k-1})$. Otherwise, we can write (11.3.111) as

$$e^{-[F(y)-F(x_j)]/\varepsilon}\, e^{-\{F(x_k)+[\Phi(x_j,B_k)-F(x_k)]-[\Phi(x_j,S_{k-1})-F(x_j)]\}/\varepsilon}. \tag{11.3.113}$$

We will argue that

$$\Phi(x_j, B_k) - F(x_k) > \Phi(x_j, S_{k-1}) - F(x_j). \tag{11.3.114}$$

Suppose that the contrary holds. Trivially,

$$\Phi(x_j, S_{k-1}) \ge \Phi(x_j, S_{j-1}), \tag{11.3.115}$$

while

$$\Phi(x_j, B_k) = \Phi(x_k, B_j) \le \Phi(x_k, S_j\setminus B_k). \tag{11.3.116}$$

Therefore, our supposition implies that

$$\Phi(x_j, S_{j-1}) - F(x_j) \le \Phi(x_k, S_j\setminus B_k) - F(x_k), \tag{11.3.117}$$

which contradicts the conditions in (11.3.1) at stage $j$. In other words, if our supposition were true, then the set $B_k$ would have to yield the largest eigenvalue at stage $j$, i.e., it would have to be labelled $B_j$. Hence (11.3.114) must hold.

Since, by assumption, the inequalities are strict (which is more than we need), it indeed follows that

$$\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_k(y) = e^{-F(x_k)/\varepsilon}\, \frac{(\sqrt{2\pi\varepsilon})^d}{\sqrt{\det(\nabla^2 F(x_k))}} \Bigl(1 + O\bigl(\varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2}\bigr)\Bigr), \tag{11.3.118}$$

and the same bound holds when $h_k$ is replaced by $h_k^2$.
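The Laplace-type asymptotics in (11.3.109), (11.3.110) and (11.3.118) can be illustrated in dimension one, where the equilibrium potential is explicit. The sketch below makes assumptions that are not taken from the text: the potential is the hypothetical double well $F(y) = y^4/4 - y^2/2$, the ball around the minimum is replaced by the point $a = -1$, the target set by the point $b = +1$, and $h$ is the corresponding one-dimensional equilibrium potential ($h = 1$ to the left of $a$, $h(b) = 0$). It compares $\int e^{-F/\varepsilon} h$ and $\int e^{-F/\varepsilon} h^2$ with the one-dimensional version of (11.3.110).

```python
import math

def F(y):
    """Hypothetical double-well potential (minima at -1 and +1, saddle at 0)."""
    return 0.25 * y ** 4 - 0.5 * y ** 2

def weighted_integrals(eps, a=-1.0, b=1.0, left=-3.0, n=4000):
    """Return (int e^{-F/eps} h dy, int e^{-F/eps} h^2 dy) over [left, b], where
    h = 1 on (-inf, a], h(b) = 0, and
    h(y) = int_y^b e^{F/eps} / int_a^b e^{F/eps} for a <= y <= b."""
    step = (b - left) / n
    ys = [left + i * step for i in range(n + 1)]
    i_a = round((a - left) / step)
    # tail[i] approximates int_{ys[i]}^{b} e^{F/eps} dy (trapezoidal rule), for ys[i] >= a
    tail = [0.0] * (n + 1)
    for i in range(n - 1, i_a - 1, -1):
        tail[i] = tail[i + 1] + 0.5 * step * (
            math.exp(F(ys[i]) / eps) + math.exp(F(ys[i + 1]) / eps))
    h = [1.0 if i <= i_a else tail[i] / tail[i_a] for i in range(n + 1)]
    int_h = int_h2 = 0.0
    for i in range(n):
        w0 = math.exp(-F(ys[i]) / eps)
        w1 = math.exp(-F(ys[i + 1]) / eps)
        int_h += 0.5 * step * (w0 * h[i] + w1 * h[i + 1])
        int_h2 += 0.5 * step * (w0 * h[i] ** 2 + w1 * h[i + 1] ** 2)
    return int_h, int_h2

def laplace_value(eps, x_min=-1.0, second_derivative=2.0):
    """One-dimensional version of (11.3.110): sqrt(2*pi*eps / F''(x)) * e^{-F(x)/eps}."""
    return math.sqrt(2.0 * math.pi * eps / second_derivative) * math.exp(-F(x_min) / eps)

eps = 0.03
int_h, int_h2 = weighted_integrals(eps)
ref = laplace_value(eps)
print(int_h / ref, int_h2 / ref)
```

Both ratios should be close to 1 for small $\varepsilon$: the mass of $e^{-F/\varepsilon}$ in the second well is suppressed because $h$ is exponentially small there, which is the one-dimensional counterpart of the bound (11.3.111).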

11.3.5 Improved error estimates

To conclude the proofs of Theorems 11.9 and 11.11, we only need to improve the error estimates. So far the proofs have produced error terms from two sources: the exponentially small errors resulting from the perturbation around $\lambda = 0$ and from the imperfect orthogonality of the functions $h_i$, $i = 1, \dots, n$, and the much larger errors of order $\varepsilon^{\alpha/2}$ resulting from the a priori control on the regularity of the eigenfunctions obtained from the Hölder estimate of Lemma 11.24. In the light of the estimates obtained on the eigenfunctions, these can now be improved successively.

First, note that the eigenfunction corresponding to the minimum $x_k$ is small enough at all the minima $x_l$, $1 \le l < k$, so that we can actually take $J = \{k\}$ and $J^c = \{1, \dots, k-1\}$ in (11.3.71) and (11.3.73). Then we know from Corollary 11.30 that

$$\operatorname{osc}_{y\in B_{4\sqrt{\varepsilon}}(x_k)} \phi_k(y) \le C\varepsilon^{\alpha/2} \sup_{y\in B_{4\sqrt{\varepsilon}}(x_k)} \phi_k(y), \tag{11.3.119}$$

which improves the a priori estimate in (11.3.33).

Next, the Hölder estimate in Lemma 9.9 gives the improvement

$$\begin{aligned}
\operatorname{osc}_{y\in B_\varepsilon(x_k)} \phi_k(y) &\le C\varepsilon^{\alpha/2} \bigl[C\varepsilon^{\alpha/2} + \lambda_k \varepsilon^{(d+1)/2}\bigr] \sup_{y\in B_{4\sqrt{\varepsilon}}(x_k)} \phi_k(y) \\
&\le C\varepsilon^{\alpha} \sup_{y\in B_{4\sqrt{\varepsilon}}(x_k)} \phi_k(y)
\end{aligned} \tag{11.3.120}$$

over the estimate in (11.3.33). This allows us to replace all errors of order $\varepsilon^{\alpha/2}$ by errors of order $\varepsilon^{\alpha}$. This procedure can be iterated $m$ times to get errors of order $\varepsilon^{m\alpha/2}$, which for $m$ of order $\ln(1/\varepsilon)$ is as small as the exponentially small errors.
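Finally, we want to improve the precision with which we relate the eigenvalues to the inverse of the mean exit times. This precision is so far limited by the precision with which

$$\frac{1}{\mathbb{E}_{x_k}[\tau_{S_{k-1}}]} \approx \frac{\operatorname{cap}(B_\varepsilon(x_k), S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}. \tag{11.3.121}$$

From Proposition 11.7 we know that this precision is limited only by the variation of $\mathbb{E}_x[\tau_{S_{k-1}}]$ on $B_\varepsilon(x_k)$. To improve (11.3.121), we need to control (with $h_k = h_{B_\varepsilon(x_k),S_{k-1}}$)

$$\frac{\operatorname{cap}(B_\varepsilon(x_k), S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2} - \frac{\operatorname{cap}(B_\varepsilon(x), S_{k-1})}{\|h_{B_\varepsilon(x),S_{k-1}}\|_{2,\mu_\varepsilon}^2}, \qquad x \in B_{\sqrt{\varepsilon}}(x_k). \tag{11.3.122}$$

Now, it is easy to see that if $x \in B_{\sqrt{\varepsilon}}(x_k)$, then

$$\bigl|h_{B_\varepsilon(x),S_{k-1}}(y) - h_k(y)\bigr| \le e^{-\delta/\varepsilon}\, h_k(y). \tag{11.3.123}$$

Namely,

$$\bigl|h_{B_\varepsilon(x),S_{k-1}}(y) - h_k(y)\bigr| \le \mathbb{P}_y\bigl(\tau_{B_\varepsilon(x_k)} < \tau_{S_{k-1}} < \tau_{B_\varepsilon(x)}\bigr) + \mathbb{P}_y\bigl(\tau_{B_\varepsilon(x)} < \tau_{S_{k-1}} < \tau_{B_\varepsilon(x_k)}\bigr). \tag{11.3.124}$$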

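By the Markov property, the first term in (11.3.124) is bounded as

$$\begin{aligned}
\mathbb{P}_y\bigl(\tau_{B_\varepsilon(x_k)} < \tau_{S_{k-1}} < \tau_{B_\varepsilon(x)}\bigr) &\le \mathbb{P}_y\bigl(\tau_{B_\varepsilon(x_k)} < \tau_{S_{k-1}}\bigr) \max_{z\in B_\varepsilon(x_k)} \mathbb{P}_z\bigl(\tau_{S_{k-1}} < \tau_{B_\varepsilon(x)}\bigr) \\
&\le e^{-\delta/\varepsilon}\, \mathbb{P}_y\bigl(\tau_{B_\varepsilon(x_k)} < \tau_{S_{k-1}}\bigr).
\end{aligned} \tag{11.3.125}$$

The second term in (11.3.124) is bounded in the same way. This in turn implies that

$$\Bigl|\|h_{B_\varepsilon(x),S_{k-1}}\|_{2,\mu_\varepsilon} - \|h_k\|_{2,\mu_\varepsilon}\Bigr| \le e^{-\delta/\varepsilon}\, \|h_k\|_{2,\mu_\varepsilon}. \tag{11.3.126}$$

To get an analogous estimate for capacities, we take advantage of the fact that, as long as $\lambda_0^{B_\varepsilon(x)\cup S_{k-1}} \gg \lambda_k$, we can replace $B_\varepsilon(x_k)$ by $B_\varepsilon(x)$ in the proof of Proposition 11.31 without further changes. Thus

$$\lambda_k = \frac{\operatorname{cap}(B_\varepsilon(x), S_{k-1})}{\|h_{B_\varepsilon(x),S_{k-1}}\|_{2,\mu_\varepsilon}^2} \bigl(1 + O\bigl(e^{-\delta/\varepsilon}\bigr)\bigr) = \frac{\operatorname{cap}(B_\varepsilon(x_k), S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2} \bigl(1 + O\bigl(e^{-\delta/\varepsilon}\bigr)\bigr), \tag{11.3.127}$$

which together with (11.3.126) implies that

$$\bigl|\operatorname{cap}(B_\varepsilon(x), S_{k-1}) - \operatorname{cap}(B_\varepsilon(x_k), S_{k-1})\bigr| \le e^{-\delta/\varepsilon}\, \operatorname{cap}(B_\varepsilon(x_k), S_{k-1}). \tag{11.3.128}$$

Based on (11.3.123) and (11.3.128), we can improve Proposition 11.7 iteratively as above to obtain

$$\frac{1}{\mathbb{E}_{x_k}[\tau_{S_{k-1}}]} = \frac{\operatorname{cap}(B_\varepsilon(x_k), S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2} \bigl(1 + O\bigl(e^{-\delta/\varepsilon}\bigr)\bigr), \tag{11.3.129}$$

which, together with the capacity estimate given in Theorem 11.2, implies the first equality in Theorem 11.9. Thus, all error terms of order $\varepsilon^{\alpha/2}$ can be removed from (11.3.104) and (11.3.108), which completes the proofs of Theorems 11.9 and 11.11.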

11.3.6 Exponential distribution of metastable exit times

The last assertion of Theorem 11.12, the asymptotic exponential distribution of the
metastable exit time, follows from the spectral estimates above exactly as in the
discrete case (see the proof of Theorem 8.45). This result can also be obtained via
the coupling method of Martinelli et al. [174, 177].

11.4 Bibliographical notes

1. The material presented in this chapter is based on Bovier, Eckhoff, Gayrard and
Klein [35] and Bovier, Gayrard and Klein [38], with some corrections taken from

the Diploma Thesis of Erich Bauer [14]. Assumptions 10.3, 10.5 and 11.1 can be
relaxed. In particular, we may take F = Fε depending on ε, or F with infinitely
many local minima. See e.g. Berglund and Gentz [21].

2. A proof of the Eyring-Kramers formula for the special case when all minima of
the potential are at the same level was given in two little-noticed papers by Sug-
iura [224, 225]. The approach used there runs via a direct variational control on
principal eigenvalues.

3. If Assumption 10.3 fails, then the asymptotics in Theorem 11.2 becomes more
complicated. Berglund and Gentz [21] classify various cases where the saddle point
is not quadratic.

4. Rough estimates of the small eigenvalues $\lambda_i$ associated with the local minima $x_i$
of $F$ were derived in Freidlin and Wentzell [115], Mathieu [179] and Miclo [185].
Wentzell [234] and Freidlin and Wentzell [115] obtained estimates for the exponential
rate $\lim_{\varepsilon\downarrow 0} \varepsilon \ln \lambda_k(\varepsilon)$ with the help of large deviation methods. Sharper estimates,
with multiplicative errors of order $\varepsilon^{\pm kd}$, were obtained for principal eigenvalues
by Holley, Kusuoka and Stroock [140] with the help of variational principles.
These methods were extended to the full set of exponentially small eigenvalues in
Miclo [185] and Mathieu [179].

5. For a long time sharp spectral estimates were known only in the one-dimensional
case (see e.g. Buslov and Makarov [44, 45] and references therein), whereas in the
multi-dimensional case only heuristic results based on formal power series expansions
of so-called WKB type existed (see e.g. Kolokoltsov [154]). The proof
in Sect. 11.3, which is based on potential theory, follows Bovier, Gayrard and
Klein [38] and uses ideas that appeared already in Wentzell [233, 234]. More re-
cently, a full analytic proof of the asymptotic expansion for these eigenvalues was
given by Helffer, Klein and Nier [136], and Helffer and Nier [137], using a micro-
local analysis of the so-called Witten complex. They show, in particular, that the
error bounds in Theorem 11.9 can be improved to $O(\varepsilon)$. Moreover, they show that,
under the assumption that $F$ is $C^\infty$, a full asymptotic expansion in $\varepsilon$ for the
eigenvalues can be computed.

6. There is considerable interest in the knowledge of eigenfunctions in the context
of numerical schemes designed to recover metastable sets from the computation
of eigenfunctions (see, in particular, Schütte, Huisinga and Meyn [216]). Using
the bounds on equilibrium potentials obtained in Bovier, Eckhoff, Gayrard and
Klein [35, Corollary 4.8], we can show that the result in Theorem 11.11 implies that
the eigenfunction $\phi_k$ corresponding to the local minimum $x_k$ of $F$ is exponentially
close to a constant, i.e., $\sim e^{F(x_k)/2\varepsilon}$, in the connected component of the level set
$\{y \in D : F(y) < F(z^*(x_k, \mathcal{M}_{k-1}))\}$ that contains $x_k$ (i.e., in the valley below the
saddle point that connects $x_k$ to the set below the level of $x_k$), while it drops exponentially
fast in the other connected components of the level set of this saddle, and

below the level of $x_k$ is exponentially small in absolute terms. Note that this implies
that the zeros of $\phi_k$ are generally not in the neighbourhood of the saddle points, but
close to the minima in $\mathcal{M}_{k-1}$. This fact was observed in Schütte, Huisinga and Meyn
[216]. We would like to stress that the fact that the eigenfunctions drop sharply at the
saddle points makes them very good indicators of the actual valley structure of $F$,
i.e., they are excellent approximations of the indicator functions of the metastable
sets corresponding to the metastable exit time $1/\lambda_k$.

7. An interesting approach to the characterisation of sharp Poincaré inequalities
that allows for a derivation of the Kramers formula based on the theory of optimal
transport was developed by Menz and Schlichting [183, 210].
Chapter 12
Stochastic Partial Differential Equations

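Il y a des faussetés déguisées qui représentent si bien la vérité
que ce serait mal juger que de ne s’y pas laisser tromper.
[There are disguised falsehoods that represent the truth so well
that it would be poor judgement not to let oneself be deceived by them.]
(François de La Rochefoucauld, Réflexions)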

A natural generalisation of the finite-dimensional diffusions considered in Chap. 11
are stochastic partial differential equations. In this chapter we focus on the
Allen-Cahn equation introduced in Sect. 5.7. Section 12.1 gives the main theorem and
a rough outline of its proof. Section 12.2 lists some approximation properties for
the potential that are crucial for the proof. Section 12.3 provides estimates on the
relevant capacities, Sect. 12.4 on the equilibrium potential. The results are collected
in Sect. 12.5 to complete the argument.

12.1 Definitions, main theorem and outline of proof

We return to the SPDE in (6.3.1). Let $F$ be the functional defined in (5.7.5). The
first and second Fréchet derivatives $D_\phi F$ and $D_\phi^2 F$ are defined through the requirement
that $F$ has a Taylor expansion up to second order in $h$,

$$F(\phi + h) = F(\phi) + (D_\phi F)(h) + \tfrac12 (D_\phi^2 F)(h, h) + o\big(\|h\|^2_{C^2}\big), \tag{12.1.1}$$

where $\|h\|_{C^2} = \|h\|_\infty + \|h'\|_\infty + \|h''\|_\infty$. The differentials $D_\phi F$ and $D_\phi^2 F$ can be
computed explicitly, namely,

(Dφ F )(h)(x) = −Dh (x) + V φ(x) h(x), (12.1.2)

while $(D_\phi^2 F)(h, h)$ is the quadratic form associated with the Hessian operator $H_\phi F$
given by

$$(H_\phi F)(h)(x) = -D h''(x) + V''(\phi(x))\, h(x). \tag{12.1.3}$$

Note that $H_\phi F$ is a Sturm-Liouville operator (see Coddington and Levinson [66]).
We say that $\phi$ is a stationary point of $F$ when $\phi$ is a solution of the non-linear
differential equation

$$-D\phi'' + V'(\phi) = 0. \tag{12.1.4}$$

© Springer International Publishing Switzerland 2015 305

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_12

The notions of saddle point, communication height and gate are defined as in the
finite-dimensional setting.
The theory can be developed under assumptions that are analogous to those used
in the finite-dimensional setting:

Assumption 12.1
(i) F has finitely many local minima and saddle points.
(ii) All local minima and saddle points of F are non-degenerate: at each point the
Hessian operator has only non-zero eigenvalues.

However, in this chapter we will only do computations for the simplest non-trivial
case, namely,

$$V(s) = -\tfrac12 s^2 + \tfrac14 s^4 + bs, \qquad s \in \mathbb{R}, \tag{12.1.5}$$

with $b \ge 0$ small enough so that the equation

$$s - s^3 - b = 0 \tag{12.1.6}$$

has three roots,

$$z_b^{-,*} < 0 < z_b^{*} < z_b^{+,*}, \tag{12.1.7}$$

corresponding to two minima and one saddle point of $V$. In this case, there are only
two local minima of $F$, namely, the constant functions $I^\pm$ given by $I^\pm(x) = \pm z_b^{\pm,*}$,
$x \in [0, 1]$.
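The three roots of $s - s^3 - b = 0$ can be made explicit numerically. The following is a minimal sketch, not part of the original argument: the function name `stationary_points` and the sample value $b = 0.1$ are our own illustrative choices, and the trigonometric form of Cardano's formula is used as a convenient solver. Three distinct real roots exist exactly when $|b| < 2/(3\sqrt{3})$, which quantifies "b small enough".

```python
import math

def stationary_points(b):
    """Real roots of s - s^3 - b = 0, sorted increasingly.

    Uses the trigonometric form of Cardano's formula for the depressed
    cubic s^3 - s + b = 0, valid when |b| < 2 / (3 * sqrt(3)).
    (Hypothetical helper, introduced here for illustration only.)
    """
    b_max = 2.0 / (3.0 * math.sqrt(3.0))
    if abs(b) >= b_max:
        raise ValueError("three distinct real roots require |b| < 2/(3*sqrt(3))")
    theta = math.acos(-1.5 * math.sqrt(3.0) * b) / 3.0
    roots = [2.0 / math.sqrt(3.0) * math.cos(theta - 2.0 * math.pi * j / 3.0)
             for j in range(3)]
    return sorted(roots)

def V_second_derivative(s):
    """V''(s) = -1 + 3 s^2 for V(s) = -s^2/2 + s^4/4 + b s."""
    return -1.0 + 3.0 * s * s

# Sample value of b (an assumption for the illustration).
z_minus, z_star, z_plus = stationary_points(0.1)
```

In this sketch the middle root has $V'' < 0$ and the two outer roots have $V'' > 0$, in line with the ordering in (12.1.7).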

Assumption 12.2 $\mathcal{S}(I^-, I^+)$, the communication level set of $F$ between $I^-$ and
$I^+$ (recall Definition 10.2), consists of a single saddle point $O = O_b$.

This assumption holds when $D > \pi^{-2}$. The saddle point is the function $O_b(x) = z_b^*$,
$x \in [0, 1]$. If $b = 0$, then $z_b^* = 0$.
For Sturm-Liouville operators, the notion of a determinant can be defined in the
following way. For $\phi \in C([0, 1])$, let $f$ be the solution of the initial value problem

$$(H_\phi F)(f) = 0, \qquad f(0) = 1, \qquad f'(0) = 0. \tag{12.1.8}$$

Define Det(Hφ F ) = f (1). Note that, as a regular Sturm-Liouville operator,

Hφ F has a countable number of real eigenvalues (λk (φ))k∈N . The definition of
Det(Hφ F ) is justified by the following standard result from Sturm-Liouville theory
(see Levit and Smilansky [165]).

Lemma 12.3 For any $\phi$ and $\psi$ with non-degenerate Hessian operator, the infinite
product

$$\prod_{k\in\mathbb{N}} \frac{\lambda_k(\phi)}{\lambda_k(\psi)} = \frac{\mathrm{Det}(H_\phi F)}{\mathrm{Det}(H_\psi F)} \tag{12.1.9}$$

is convergent.

For $\phi \in H^1_{bc}$, let

$$B_\rho(\phi) = \big\{\sigma \in H^1_{bc} : \|\sigma - \phi\|_{L^2} \le \rho\big\}. \tag{12.1.10}$$

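Theorem 12.4 (Mean metastable exit time) Suppose that Assumptions 12.1 and
12.2 are satisfied. Then there exists a $\rho_0 \in (0, \infty)$ such that, for any $\rho \in (0, \rho_0)$,

$$\mathbb{E}_{I^+}\big[\tau(B_\rho(I^-))\big] = \frac{2\pi}{[-\lambda^-(O)]} \sqrt{\frac{[-\mathrm{Det}(H_O F)]}{\mathrm{Det}(H_{I^+} F)}}\; e^{[F(O)-F(I^+)]/\varepsilon}\, \big(1 + \Psi(\varepsilon)\big), \tag{12.1.11}$$

where $\lambda^-(O)$ is the unique negative eigenvalue of $H_O F$, and the error term satisfies
$\Psi(\varepsilon) = O\big(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\big)$.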

The main idea behind the proof of Theorem 12.4 is the use of the space-discretisation
introduced in Sect. 5.7.2. The proof comes in three steps:
(1) Let $F_N$ be the space discretisation of $F$ defined in (5.7.24). According to Theorem 5.70,
given $\varepsilon > 0$ and sequences $I^{\pm,N}$, $N \in \mathbb{N}$, converging to $I^\pm$, we have

$$\lim_{N\to\infty} \mathbb{E}_{I^{+,N}}\big[\tau^N_\varepsilon(B_\rho(I^{-,N}))\big] = \mathbb{E}_{I^+}\big[\tau(B_\rho(I^-))\big]. \tag{12.1.12}$$

(2) For fixed $N$, we compute the asymptotics of the transition time. This produces
a prefactor $a_N(\varepsilon)$ such that

$$\frac{1}{a_N(\varepsilon)}\, \mathbb{E}_{I^{+,N}}\big[\tau^N_\varepsilon(B_\rho(I^{-,N}))\big] - 1 = \psi(\varepsilon, N). \tag{12.1.13}$$

We show that $\psi(\varepsilon, N) \le \Psi(\varepsilon) = O\big(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\big)$ for all $N$. This estimate is
first shown for the process starting in the last-exit biased distribution and then
transferred to a pointwise estimate with the help of a coupling argument as
explained in Sect. 9.4.2.
(3) We show that $a_N(\varepsilon)$ converges to the explicit expression given in (12.1.11) as
$N \to \infty$.

12.2 Approximation properties of the potential

In this section we collect some approximation properties of the potential and related
quantities.
Recall Sect. 5.7.2. We identify $u^N = (u^N_1, \dots, u^N_N) \in \mathbb{R}^N$ with the linear interpolation
between the points $(i/N, u^N_i)$. We say that $u^N \in \mathbb{R}^N$ converges to $u \in H^1$
when the linear interpolation associated with $u^N$ converges to $u$ in the $H^1$-norm.

The proof of the following lemma is elementary.

Lemma 12.5 For any $u^N \in \mathbb{R}^N$, $N \in \mathbb{N}$, converging to $u \in H^1$ the following hold:


(a) $\lim_{N\to\infty} F_N(u^N) = F(u) < \infty$.
(b) $\lim_{N\to\infty} \nabla F_N(u^N) \cdot h^N = (D_u F)(h)$ for any $h^N$, $N \in \mathbb{N}$, converging to $h$.
(c) $\lim_{N\to\infty} (H F_N)(u^N)(h^N, k^N) = (D_u^2 F)(h, k)$ for any $h^N, k^N$, $N \in \mathbb{N}$, converging to $h, k$.

Let $\Delta^N$ denote the discrete Laplacian defined by

$$(\Delta^N u)(x) = N^2 \big[u(x + N^{-1}) - 2u(x) + u(x - N^{-1})\big]. \tag{12.2.1}$$

Let $(\lambda^0_{k,N})_{1\le k\le N}$ be the eigenvalues of $D\Delta^N$ and $(\lambda^0_k)_{k\in\mathbb{N}}$ the eigenvalues of $D\Delta$,
in increasing order. In the case of periodic boundary conditions on $[0, 1]$, we have

$$\lambda^0_{k,N} = D\,(2N)^2 \sin^2\!\Big(\frac{k\pi}{2N}\Big), \qquad \lambda^0_k = D k^2 \pi^2, \qquad k \in \mathbb{N}. \tag{12.2.2}$$

Set

$$e_{k,N} = \lambda^0_{k,N} - \lambda^0_k. \tag{12.2.3}$$

Note that $\lim_{N\to\infty} e_{k,N} = 0$ for fixed $k$, but there is no convergence uniformly in $k$.
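The behaviour of the discretisation error can be checked numerically. The sketch below is illustrative only: it assumes the reading $\lambda^0_{k,N} = D(2N)^2\sin^2(k\pi/2N)$ and $\lambda^0_k = Dk^2\pi^2$ of (12.2.2), takes $D = 1$, and uses function names of our own choosing. It shows that the error decays like $N^{-2}$ for fixed $k$, stays of order $N^2$ for $k = N$, and satisfies the elementary bound $|e_{k,N}| \le D\pi^4 k^4/(12N^2)$ that follows from $t^2 - t^4/3 \le \sin^2 t \le t^2$.

```python
import math

def lam0_discrete(k, N, D=1.0):
    """lambda^0_{k,N} = D (2N)^2 sin^2(k pi / (2N)); assumed reading of (12.2.2)."""
    return D * (2 * N) ** 2 * math.sin(k * math.pi / (2 * N)) ** 2

def lam0_continuum(k, D=1.0):
    """lambda^0_k = D k^2 pi^2; assumed reading of (12.2.2)."""
    return D * (k * math.pi) ** 2

def e(k, N, D=1.0):
    """Discretisation error e_{k,N} = lambda^0_{k,N} - lambda^0_k, cf. (12.2.3)."""
    return lam0_discrete(k, N, D) - lam0_continuum(k, D)

# Fixed k = 3: the error shrinks like N^{-2}.
fixed_k = [abs(e(3, N)) for N in (10, 100, 1000)]

# Top mode k = N: |e_{N,N}| / N^2 equals pi^2 - 4, so there is no uniform convergence.
top_mode = [abs(e(N, N)) / N ** 2 for N in (10, 100, 1000)]
```

The constant $D\pi^4/12$ obtained this way is one admissible choice for $C_2$ in a bound of the form $|e_{k,N}| \le C_2 k^4/N^2$.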
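Fix $u^N \in \mathbb{R}^N$, $N \in \mathbb{N}$, converging to $u \in H^1$. Let $(\lambda_{k,N}(u^N))_{1\le k\le N}$ be the eigenvalues
of $N(H F_N)(u^N)$ and $(\lambda_k(u))_{k\in\mathbb{N}}$ the eigenvalues of $(H F)(u)$. We would
like to show that $(\lambda_{k,N}(u^N))_{1\le k\le N}$ converges to $(\lambda_k(u))_{k\in\mathbb{N}}$ in some appropriate
sense. Since (recall (5.7.24))

$$N (H F_N)(u^N) = -\tfrac12 D\Delta^N + V''(u^N) \tag{12.2.4}$$

and $V''(u)$ is bounded for any $u$ fixed, we have the following estimates.

Lemma 12.6 There is a constant $C$ such that, for all $k, N, \phi^N, \psi^N, \phi, \psi$,

$$\begin{aligned}
|\lambda_{k,N}(\phi^N) - \lambda_{k,N}(\psi^N)| &\le C, & |\lambda_k(\phi) - \lambda_k(\psi)| &\le C, \\
|\lambda_{k,N}(\phi^N) - \lambda^0_{k,N}| &\le C, & |\lambda_k(\phi) - \lambda^0_k| &\le C.
\end{aligned} \tag{12.2.5}$$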

The following lemma, adapted from de Hoog and Anderssen [76], gives us tighter
control under stronger assumptions.

Lemma 12.7 Consider a sequence $u^N \in \mathbb{R}^N$, $N \in \mathbb{N}$, converging to $u \in C^2([0, 1])$
such that $\|u^N - u\|_\infty = O(N^{-2})$.
(a) For every $\alpha \in (0, 1)$ there is a constant $C_1$ such that, for all $N$ and all $1 \le k \le \alpha N$,

$$|\lambda_{k,N}(u^N) - \lambda_k(u) - e_{k,N}| \le \frac{C_1}{N^2}. \tag{12.2.6}$$

(b) There exists a constant $C_2$ such that

$$|e_{k,N}| \le \frac{C_2 k^4}{N^2}. \tag{12.2.7}$$

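(c) For fixed $1 \le k \le N$, the $H^1$-normalised eigenvector $\phi_{k,N}$ of $(H F_N)(u^N)$ associated
with $\lambda_{k,N}(u^N)$ converges in $H^1$ to the eigenvector $\phi_k(u)$ of $(H F)(u)$
associated with $\lambda_k(u)$, and

$$\frac{\|\phi_{k,N}\|_\infty}{\|\phi_{k,N}\|_2} \le \frac{C}{\sqrt{N}}. \tag{12.2.8}$$

Lemmas 12.6–12.7 imply that

$$\lim_{N\to\infty} \prod_{k=0}^{N-1} \frac{\lambda_k(\phi)}{\lambda_k(\psi)} = \prod_{k\in\mathbb{N}_0} \frac{\lambda_k(\phi)}{\lambda_k(\psi)}. \tag{12.2.9}$$

Indeed, this convergence holds because

$$\frac{|\lambda_k(\phi) - \lambda_k(\psi)|}{\lambda_k(\psi)} \le \frac{C}{k^2}. \tag{12.2.10}$$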
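To see how a bound of the form $C/k^2$ forces convergence of such a product, here is a toy computation. It is a sketch under explicit assumptions that are not taken from the text: the eigenvalues are modelled as $\lambda_k = Dk^2\pi^2 + c$, $k \ge 1$, with constants $c_\phi, c_\psi > 0$ (as for a constant potential term), and the names `partial_product` and `closed_form` are ours. In this model the limit is known from Euler's product $\sinh(a)/a = \prod_{k\ge1}(1 + a^2/(k^2\pi^2))$.

```python
import math

def partial_product(c_phi, c_psi, D, K):
    """prod_{k=1}^{K} (D k^2 pi^2 + c_phi) / (D k^2 pi^2 + c_psi) (toy model)."""
    log_sum = 0.0
    for k in range(1, K + 1):
        base = D * (k * math.pi) ** 2
        # Each factor is 1 + O(1/k^2), so the logarithms are summable.
        log_sum += math.log((base + c_phi) / (base + c_psi))
    return math.exp(log_sum)

def closed_form(c_phi, c_psi, D):
    """Limit of the partial products, via Euler's product for sinh(a)/a."""
    a = math.sqrt(c_phi / D)
    b = math.sqrt(c_psi / D)
    return (math.sinh(a) / a) / (math.sinh(b) / b)

# Illustrative parameter values (assumptions, not from the text).
approx = partial_product(2.0, 1.0, 1.0, 20000)
exact = closed_form(2.0, 1.0, 1.0)
```

The tail of the logarithmic sum beyond $K$ is of order $|c_\phi - c_\psi|/(D\pi^2 K)$, which is the same mechanism that gives the $1/(\alpha N)$ error for large indices in the proof of Proposition 12.8.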

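Proposition 12.8 For any $\phi^N, \psi^N$, $N \in \mathbb{N}$, converging in $H^1$ to $\phi, \psi$ such that
$(H F)(\psi)$ and $(H F)(\phi)$ do not have a zero eigenvalue and

$$\|\phi^N - \phi\|_\infty \vee \|\psi^N - \psi\|_\infty \le \frac{C}{N^2}, \tag{12.2.11}$$

the following convergence holds:

$$\lim_{N\to\infty} \frac{\det[(H F_N)(\phi^N)]}{\det[(H F_N)(\psi^N)]} = \prod_{k\in\mathbb{N}} \frac{\lambda_k(\phi)}{\lambda_k(\psi)}. \tag{12.2.12}$$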

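Proof For $1 \le k \le \alpha N$ we proceed as follows. Put

$$\theta_{k,N}(\phi) = \lambda_k(\phi)^{-1} \big[\lambda_{k,N}(\phi^N) - \lambda_k(\phi)\big]. \tag{12.2.13}$$

Then

$$\frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)} = \frac{1 + \theta_{k,N}(\phi)}{1 + \theta_{k,N}(\psi)} = 1 + \frac{\theta_{k,N}(\phi) - \theta_{k,N}(\psi)}{1 + \theta_{k,N}(\psi)}. \tag{12.2.14}$$

By Lemmas 12.6–12.7, we have

$$|\theta_{k,N}(\psi)| \le \frac{C}{k^2} \Big(\frac{C_1}{N^2} + \frac{C_2 k^4}{N^2}\Big) \le C\Big(\alpha^2 + \frac{1}{N^2}\Big). \tag{12.2.15}$$

For $\alpha$ small enough and $N$ large enough this gives $|\theta_{k,N}(\psi)| \le \tfrac12$, and hence

$$\Big|\ln \prod_{k=0}^{\alpha N} \frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)}\Big| \le 2 \sum_{k=0}^{\alpha N} |\theta_{k,N}(\phi) - \theta_{k,N}(\psi)| \le \frac{2C\alpha}{N}, \tag{12.2.16}$$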

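where we use Lemma 12.6 to estimate $|\theta_{k,N}(\phi) - \theta_{k,N}(\psi)| \le C/N^2$.
For $k > \alpha N$ we proceed similarly. Put

$$\begin{aligned}
\theta_{k,N} &= \lambda_{k,N}(\psi^N)^{-1} \big[\lambda_{k,N}(\phi^N) - \lambda_{k,N}(\psi^N)\big], \\
\theta_k &= \lambda_k(\psi)^{-1} \big[\lambda_k(\phi) - \lambda_k(\psi)\big].
\end{aligned} \tag{12.2.17}$$

Then

$$\frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)} = \frac{1 + \theta_{k,N}}{1 + \theta_k} = 1 + \frac{\theta_{k,N} - \theta_k}{1 + \theta_k}, \tag{12.2.18}$$

and similarly for $\theta_k$. By Lemma 12.6, we have

$$|\theta_k| \le \frac{C}{k^2}. \tag{12.2.19}$$

For $\alpha$ fixed and $N$ large enough this gives $|\theta_k| \le \tfrac12$, and hence

$$\Big|\ln \prod_{k=\alpha N}^{N-1} \frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)}\Big| \le 2 \sum_{k=\alpha N}^{N-1} |\theta_{k,N} - \theta_k| \le \sum_{k=\alpha N}^{N-1} \frac{2C}{k^2} \le \frac{2C}{\alpha N}, \tag{12.2.20}$$

where we use Lemma 12.6 to estimate $|\theta_{k,N} - \theta_k| \le C/k^2$. Combining (12.2.16)
and (12.2.20), and recalling that

$$\frac{\det[(H F_N)(\phi^N)]}{\det[(H F_N)(\psi^N)]} = \prod_{k=0}^{N-1} \frac{\lambda_{k,N}(\phi^N)}{\lambda_{k,N}(\psi^N)}, \tag{12.2.21}$$

we get the claim.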

It can be shown that the conclusion of Proposition 12.8 holds when the condition
in (12.2.11) is replaced by

$$\|\phi^N - \phi\|_{L^2} \vee \|\psi^N - \psi\|_{L^2} \le \frac{C}{N}. \tag{12.2.22}$$

The next lemma shows that every stationary point of $F$ can be approximated by
a sequence of stationary points of $F_N$, $N \in \mathbb{N}$, in the sense of (12.2.22). The proof
is elementary.

Lemma 12.9 There exist $C, N_0$ such that for all $N > N_0$ and all stationary points
$\phi$ of $F$ there is a stationary point $\phi^N$ of $F_N$ such that

$$\|\phi - \phi^N\|_{L^2} \le \frac{C}{N}. \tag{12.2.23}$$

12.3 Estimate of the capacity

In this section we compute the relevant capacities for the discretised process. This
can be taken from Chap. 11, except that we have to take care of the $N$-dependence
of the error terms.
Recall from (5.7.27) that, after proper rescaling, we are considering the $N$-dimensional
diffusion

$$dX^N_t = -N \nabla F_N(X^N_t)\, dt + \sqrt{2\varepsilon N}\, dB_t. \tag{12.3.1}$$

Denote by $\mu^N_\varepsilon$ the invariant measure for the process $X^N = (X^N_t)_{t\in\mathbb{R}_+}$:

$$\mu^N_\varepsilon(dx) = e^{-F_N(x)/\varepsilon}\, dx. \tag{12.3.2}$$

The Dirichlet form for this process is given by

$$\mathcal{E}^N(h, h) = \varepsilon N \int_{\mathbb{R}^N} \|\nabla h(x)\|_2^2\, \mu^N_\varepsilon(dx). \tag{12.3.3}$$

Let $B^N_\rho(x)$ denote the Euclidean ball of radius $\rho$ around $x \in \mathbb{R}^N$. Write $I^{+,N}$, $I^{-,N}$,
$O^N$, $\lambda^-_N(O^N)$ to denote the analogues of $I^+$, $I^-$, $O$ and $\lambda^-(O)$ introduced in Sect. 12.1.
The following proposition is the desired estimate for the capacity with an error term
that is uniform in $N$.

Proposition 12.10 For all $0 < \varepsilon < \varepsilon_0$ and $\rho > 0$,

$$\mathrm{cap}\big(B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})\big) = \varepsilon (2\pi\varepsilon)^{(N-2)/2}\, \frac{[-\lambda^-_N(O^N)]\, e^{-F_N(O^N)/\varepsilon}}{\sqrt{-\det[(H F_N)(O^N)]}}\, \big(1 + \psi_1(\varepsilon, N)\big), \tag{12.3.4}$$

where $\limsup_{N\to\infty} |\psi_1(\varepsilon, N)| \le C \sqrt{\varepsilon[\ln(1/\varepsilon)]^3}$ for some constant $C$.

The proof of this proposition is given in Sects. 12.3.1–12.3.3.

12.3.1 Properties of the potential

We need to control the potentials $F_N$ globally and near their critical points. It is
very convenient that in our setting the Hessians at all the three stationary points are
diagonal in the same basis, namely,

$$v^k_l = \omega^{kl}/\sqrt{N}, \qquad k \in \{0, \dots, N-1\}, \quad l \in \{1, \dots, N\}, \tag{12.3.5}$$

with $\omega = e^{2\pi i/N}$. This allows us to choose global coordinates for which all relevant
Hessians are diagonal. For $y \in \mathbb{R}^N$, define

$$\hat y_k = \hat y_k(y) = \frac{1}{\sqrt{N}} \sum_{l=1}^{N} \bar v^k_l\, y_l, \tag{12.3.6}$$

and the inverse

$$y_l = y_l(\hat y) = \sqrt{N} \sum_{k=0}^{N-1} v^k_l\, \hat y_k. \tag{12.3.7}$$

Recall that the explicit form of $F_N$ in the old coordinates is (recall (5.7.23))

$$F_N(y) = N^{-1} \sum_{l=1}^{N} V(y_l) + \tfrac14 N D \sum_{l=1}^{N} (y_l - y_{l+1})^2. \tag{12.3.8}$$

In the new coordinates this takes the form (recall (12.2.2))

$$\begin{aligned}
F_N(y(\hat y)) &= \tfrac12 \sum_{k=0}^{N-1} \lambda^0_{k,N}\, \hat y_k^2 - \tfrac12 \hat y_0^2 + \frac{1}{4N} \|y(\hat y)\|_4^4 \\
&= \tfrac12 \sum_{k=0}^{N-1} \lambda_{k,N}\, \hat y_k^2 + \frac{1}{4N} \|y(\hat y)\|_4^4,
\end{aligned} \tag{12.3.9}$$

where $\lambda^0_{k,N}$, $k = 0, \dots, N-1$, are the eigenvalues of $D$ times the discrete Laplacian
($\lambda^0_{0,N} = 0$), and we put

$$\lambda_{0,N} = -1, \qquad \lambda_{k,N} = \lambda^0_{k,N}, \quad k = 1, \dots, N-1. \tag{12.3.10}$$

The critical points in the new coordinates are

$$I^{\pm,N}(\hat y) = (\pm 1, 0, \dots, 0), \qquad O^N(\hat y) = (0, 0, \dots, 0). \tag{12.3.11}$$

Since these all lie on the line $\hat y_1 = \hat y_2 = \dots = \hat y_{N-1} = 0$, it is useful to single out the
0-th coordinate. Note that

$$\frac{1}{N} \sum_{l=1}^{N} y_l(\hat y)^4 = \frac{1}{N} \sum_{l=1}^{N} \Big(\hat y_0 + \sum_{k=1}^{N-1} \omega^{lk} \hat y_k\Big)^4 = \frac{1}{N} \sum_{l=1}^{N} \big(\hat y_0 + w_l(\hat y)\big)^4, \tag{12.3.12}$$

where $w_l = w_l(\hat y) = \sum_{k=1}^{N-1} \omega^{kl} \hat y_k$. The important point is that $\sum_{l=1}^{N} \omega^{kl} = 0$ for all
$k \ne 0$. Hence, expanding to fourth power, we get the following.

Lemma 12.11 The potential FN expressed in the new coordinates ŷ satisfies:


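(i)

$$\Big|F_N(y(\hat y)) + \tfrac12 \hat y_0^2 \big(1 + 3\|w\|_2^2\big) - \tfrac14 \hat y_0^4 - \tfrac12 \sum_{k=1}^{N-1} \lambda_{k,N}\, \hat y_k^2\Big| \le \frac{1}{4N} \big(4|\hat y_0|\, \|w\|_3^3 + \|w\|_4^4\big). \tag{12.3.13}$$

(ii)

$$F_N(y(\hat y)) \ge \tfrac12 \sum_{k=1}^{N-1} \lambda_{k,N}\, \hat y_k^2 - \tfrac12 \hat y_0^2. \tag{12.3.14}$$

Proof Item (i) follows from (12.3.9) and (12.3.12). Item (ii) follows trivially because
the quartic term in (12.3.9) is non-negative.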

The following facts are crucial.

Lemma 12.12 With the norms and maps defined above:

(i) The Parseval identity holds, i.e.,

$$N^{-1/2} \|y(\hat y)\|_2 = \|\hat y\|_2. \tag{12.3.15}$$

(ii) The Hausdorff-Young inequality holds, i.e., for any $p \ge 2$ and for $q = p/(p-1)$
there exists a constant $C_q$ such that

$$N^{-1/p} \|y(\hat y)\|_p \le C_q \|\hat y\|_q. \tag{12.3.16}$$

Proof The Parseval identity is checked easily. The Hausdorff-Young inequality is
proven as a consequence of the Riesz-Thorin interpolation theorem. Namely, since
the components of the vectors $v^k$ are bounded in absolute value by $C/\sqrt{N}$, $\hat y \mapsto y(\hat y)$
is bounded as a map from $L^1$ to $L^\infty$, i.e.,

$$\|y(\hat y)\|_\infty \le C \|\hat y\|_1. \tag{12.3.17}$$

Together with the Parseval identity, this provides the input to obtain (12.3.16) from
the Riesz-Thorin interpolation theorem. See Reed and Simon [204, p. 328].
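The two endpoint estimates and the interpolated inequality can be tested numerically on a small example. The sketch below is illustrative and rests on our own assumptions: we read (12.3.6) with the complex conjugate of $v^k_l$, so that $\hat y_k = N^{-1}\sum_l \omega^{-kl} y_l$ and $y_l = \sum_k \omega^{kl}\hat y_k$; the helper names and the random test vector are hypothetical. With this normalisation the constants can be taken equal to 1.

```python
import cmath
import random

def to_fourier(y):
    """hat y_k = (1/N) sum_{l=1}^{N} omega^{-kl} y_l (assumed reading of (12.3.6))."""
    N = len(y)
    omega = cmath.exp(2j * cmath.pi / N)
    return [sum(omega ** (-k * l) * y[l - 1] for l in range(1, N + 1)) / N
            for k in range(N)]

def from_fourier(y_hat):
    """y_l = sum_{k=0}^{N-1} omega^{kl} hat y_k, cf. (12.3.7)."""
    N = len(y_hat)
    omega = cmath.exp(2j * cmath.pi / N)
    return [sum(omega ** (k * l) * y_hat[k] for k in range(N))
            for l in range(1, N + 1)]

def norm(v, p):
    """Plain l^p norm (no normalisation by N)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
N = 16
y = [random.uniform(-1.0, 1.0) for _ in range(N)]  # arbitrary test vector
y_hat = to_fourier(y)

parseval_gap = abs(norm(y, 2) / N ** 0.5 - norm(y_hat, 2))
sup_bound_ok = max(abs(v) for v in y) <= norm(y_hat, 1) + 1e-12
```

For intermediate exponents one can compare `norm(y, p) / N ** (1 / p)` with `norm(y_hat, q)` for $q = p/(p-1)$; the first never exceeds the second, which is the content of (12.3.16).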

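Define, for $\delta > 0$ and for constants $r_{k,N}$, $k = 1, \dots, N-1$, to be chosen later,
the sets

$$C_\delta^{N,\perp} = \big\{\hat y \in \mathbb{R}^{N-1} : |\hat y_k| \le \delta r_{k,N}/\sqrt{|\lambda_{k,N}|},\ 1 \le k \le N-1\big\}. \tag{12.3.18}$$

Then, for $\hat y \in C_\delta^{N,\perp}$,

$$\|w(\hat y)\|_p^p \le \delta^p \Big(\sum_{k=1}^{N-1} \frac{r_{k,N}^q}{(\lambda_{k,N})^{q/2}}\Big)^{p/q}. \tag{12.3.19}$$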

From the explicit form of the eigenvalues of the discrete Laplacian in (12.2.2) and
the relation in (12.3.10), we see that $\lambda_{k,N} = \lambda_{N-k,N}$, $k = 1, \dots, N-1$. Using that,
for $0 \le t \le \pi/2$,

$$0 < t^2 \big(1 - \tfrac13 t^2\big) \le \sin^2 t \le t^2, \tag{12.3.20}$$
we see that, for 1 ≤ k ≤ N2 ,

λk,N ≥ k 2 81 Dπ 2 1 − 1 2
12 π . (12.3.21)

The constants $r_{k,N}$ are constructed as follows. For an increasing sequence $(\rho_k)_{k\in\mathbb{N}}$
set

$$r_{k,N} = r_{N-k,N} = \rho_k, \qquad 1 \le k \le \lfloor N/2 \rfloor. \tag{12.3.22}$$

Pick $\rho_k = k^\alpha$ with $\alpha > 0$ such that, for $q = \tfrac32, \tfrac43$,

$$\sum_{k\in\mathbb{N}} \frac{\rho_k^q}{k^q} = B_q < \infty. \tag{12.3.23}$$

This yields the following estimates.

Lemma 12.13 There is a choice of $\alpha > 0$ such that for $\rho_k = k^\alpha$ and $p = 2, 3, 4$
there are constants $C_p < \infty$ (independent of $N$) such that
(i) For $\hat y \in C_\delta^{N,\perp}$,

$$N^{-1} \|w(\hat y)\|_p^p \le \delta^p C_p. \tag{12.3.24}$$

(ii) For $\hat y \in C_\delta^{N,\perp}$ with $|\hat y_0| \le \delta$,

$$N^{-1} \|y(\hat y)\|_4^4 \le \delta^4 C_4. \tag{12.3.25}$$

Proof Collect the estimates above.

This uniform control on the quadratic approximation of FN is the main ingredient

needed to extend the analysis of capacities in Chap. 11 to the SPDE setting.

12.3.2 Upper bound

The strategy for the upper bound is the same as in the proof of Theorem 11.2.

Proof Define the following neighbourhood of the saddle point $O^N$:

$$C_\delta^N = C_\delta^N(O^N) = \big\{y(\hat y) \in \mathbb{R}^N : |\hat y_0| \le c_0\delta,\ \hat y \in C_\delta^{N,\perp}\big\}, \tag{12.3.26}$$

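where $C_\delta^{N,\perp}$ is defined in (12.3.18) and $c_0 < \infty$ is a constant to be chosen. For the
upper bound it is enough to replace $F_N$ by its lower bound in (12.3.14). Define the
set

$$U_\delta^N = \big\{y(\hat y) \in \mathbb{R}^N : |\hat y_0| \le c_0\delta\big\}. \tag{12.3.27}$$

Choose a test function $h^+$ in the Dirichlet principle for the Dirichlet form in (12.3.3)
to obtain an upper bound on the capacity of interest. The set $(U_\delta^N)^c$ decomposes into
two disjoint connected components, one of which contains $I^{+,N}$. We set $h^+(y) = 1$
on the latter component and $h^+(y) = 0$ on the other component. On $U_\delta^N$ we choose
$h^+$ as $h^+(y) = f(\hat y_0)$, where (recall that $\lambda_{0,N} = -1$)

$$f(s) = \frac{\int_s^{c_0\delta} e^{\lambda_{0,N} t^2/2\varepsilon}\, dt}{\int_{-c_0\delta}^{c_0\delta} e^{\lambda_{0,N} t^2/2\varepsilon}\, dt}. \tag{12.3.28}$$

The Dirichlet form evaluated on this test function then reduces to

$$\begin{aligned}
\mathcal{E}^N(h^+, h^+)
&= \varepsilon N \sqrt{N}^{\,N} \int_{U_\delta^N} d\hat y\, \|\nabla h^+(y(\hat y))\|_2^2\, e^{-F_N(y(\hat y))/\varepsilon} \\
&\le \varepsilon \sqrt{N}^{\,N} \int_{-c_0\delta}^{c_0\delta} d\hat y_0\, e^{-\lambda_{0,N} \hat y_0^2/2\varepsilon}\, |f'(\hat y_0)|^2 \int_{\mathbb{R}^{N-1}} d\hat y_1 \dots d\hat y_{N-1}\, e^{-\sum_{k=1}^{N-1} \lambda_{k,N} \hat y_k^2/2\varepsilon} \\
&= \varepsilon \sqrt{N}^{\,N}\, \frac{1}{\int_{-c_0\delta}^{c_0\delta} e^{\lambda_{0,N} s^2/2\varepsilon}\, ds}\, \prod_{k=1}^{N-1} \sqrt{\frac{2\pi\varepsilon}{\lambda_{k,N}}} \\
&= \varepsilon \sqrt{N}^{\,N} \sqrt{\frac{-\lambda_{0,N}}{2\pi\varepsilon}}\, \prod_{k=1}^{N-1} \sqrt{\frac{2\pi\varepsilon}{\lambda_{k,N}}}\, \Big(1 + O\big(e^{c_0^2\delta^2\lambda_{0,N}/2\varepsilon}\big)\Big).
\end{aligned} \tag{12.3.29}$$

In the first and second equality, the change of variable $y \to \hat y$ gives rise to the factor
$\sqrt{N}^{\,N}$ and the relation $\|\nabla h^+(y(\hat y))\|_2^2 = N^{-1} |f'(\hat y_0)|^2$. Taking $\delta = K\sqrt{\varepsilon \ln(1/\varepsilon)}$,
as in Chap. 11, we see that the right-hand side has the desired asymptotics. Thus we
obtain that, for $N$ large enough,

$$\mathrm{cap}\big(B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})\big) \le \varepsilon (2\pi\varepsilon)^{(N-2)/2}\, \frac{[-\lambda_{0,N}]\, e^{-F_N(O^N)/\varepsilon}}{\sqrt{-\det[(H F_N)(O^N)]}}\, \big(1 + \varepsilon^{c_0^2 K^2/2}\big), \tag{12.3.30}$$

where we use that $\det[(H F_N)(O^N)] = \prod_{k=0}^{N-1} \lambda_{k,N}$, and recall that $F_N(O^N) = 0$
(see (12.1.5), (12.3.8) and (12.3.11)). This is the upper bound with a better error
estimate than in (12.3.4).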

Remark 12.14 Note that, due to the particularly simple form of the potential in
(12.1.5), we did not need to use the fact that FN is well approximated by a quadratic

function in a suitable neighbourhood of the saddle point. In more general settings,

however, this would be needed, together with an estimate showing that the contri-
butions coming from outside this neighbourhood are negligible, as in Chap. 11.

12.3.3 Lower bound

We next turn to the proof of the complementing lower bound.

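Proof Around the saddle point $O^N$ we take a narrow corridor from one local minimum
to the other, and minimise the Dirichlet form on this corridor. We use the same
notation as in the proof of the upper bound.
We bound the capacity from below by

$$\begin{aligned}
\mathrm{cap}\big(B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})\big)
&\ge \inf_{\substack{h\colon\, h(x)=1\ \forall\, x\in B^N_\rho(I^{+,N}) \\ h(x)=0\ \forall\, x\in B^N_\rho(I^{-,N})}} \varepsilon N \sqrt{N}^{\,N} \int_{C_\delta^{N,\perp}} \|\nabla h(y(\hat y))\|_2^2\, e^{-F_N(y(\hat y))/\varepsilon}\, d\hat y \\
&\ge \inf_{\substack{h\colon\, h(x)=1\ \forall\, x\in B^N_\rho(I^{+,N}) \\ h(x)=0\ \forall\, x\in B^N_\rho(I^{-,N})}} \varepsilon N \sqrt{N}^{\,N} \int_{C_\delta^{N,\perp}} \Big|\frac{d}{d\hat y_0} h(y(\hat y))\Big|^2 e^{-F_N(y(\hat y))/\varepsilon}\, d\hat y.
\end{aligned} \tag{12.3.31}$$

The infimum can now be performed for each value of the orthogonal coordinates
$\hat y^\perp = (\hat y_1, \dots, \hat y_{N-1})$ separately, i.e., the right-hand side of (12.3.31) is larger than
or equal to

$$\begin{aligned}
&\varepsilon \sqrt{N}^{\,N} \int_{C_\delta^{N,\perp}} d\hat y^\perp \inf_{f\colon f(1)=1,\, f(-1)=0} \int_{-1}^{1} d\hat y_0\, |f'(\hat y_0)|^2\, e^{-F_N(y(\hat y_0, \hat y^\perp))/\varepsilon} \\
&\qquad = \varepsilon \sqrt{N}^{\,N} \int_{C_\delta^{N,\perp}} d\hat y^\perp \Big(\int_{-1}^{1} d\hat y_0\, e^{F_N(y(\hat y_0, \hat y^\perp))/\varepsilon}\Big)^{-1},
\end{aligned} \tag{12.3.32}$$

where we use that we already know how to solve the one-dimensional variational
problem.
To conclude, we need to bound the second integral in (12.3.32) from above. Using
the upper bound from Lemma 12.11 and bounding the norms of $w$ appearing
there with the help of Lemma 12.13, we obtain

$$\int_{-1}^{1} d\hat y_0\, e^{F_N(y(\hat y_0, \hat y^\perp))/\varepsilon} \le e^{\frac12 \sum_{k=1}^{N-1} \lambda_{k,N} \hat y_k^2/\varepsilon + O(\delta^3)/\varepsilon} \int_{-1}^{1} d\hat y_0\, e^{(\frac12 \lambda_{0,N} \hat y_0^2 [1 + O(\delta^2)] + \frac14 \hat y_0^4)/\varepsilon} \tag{12.3.33}$$

when $\hat y^\perp \in C_\delta^{N,\perp}(O^N)$. We again choose $\delta = K\sqrt{\varepsilon \ln(1/\varepsilon)}$ for some sufficiently
large $K$, and recall that $\lambda_{0,N} = -1$. Hence the exponent in the integrand in the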

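right-hand side of (12.3.33) without the error term achieves its unique maximum at
$-1/4\varepsilon$. It is therefore easy to see that

$$\int_{-1}^{1} d\hat y_0\, e^{(\frac12 \lambda_{0,N} \hat y_0^2 [1 + O(\delta^2)] + \frac14 \hat y_0^4)/\varepsilon} = \sqrt{2\pi\varepsilon}\, (-\lambda_{0,N})^{-1/2}\, \big(1 + O(\varepsilon \ln(1/\varepsilon))\big). \tag{12.3.34}$$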
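The Gaussian asymptotics of this one-dimensional integral can be checked by direct quadrature. The sketch below is a simplified illustration under our own assumptions: we drop the $O(\delta^2)$ correction, set $\lambda_{0,N} = -1$, and use a hand-written composite Simpson rule (the helper names are ours). A formal expansion of $e^{y^4/4\varepsilon}$ under the Gaussian weight suggests that the ratio to $\sqrt{2\pi\varepsilon}$ behaves like $1 + \tfrac34\varepsilon$ for small $\varepsilon$, well within the error allowed in (12.3.34).

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def saddle_integral(eps):
    """int_{-1}^{1} exp((-y^2/2 + y^4/4)/eps) dy, i.e. the case lambda_{0,N} = -1."""
    return simpson(lambda y: math.exp((-0.5 * y ** 2 + 0.25 * y ** 4) / eps), -1.0, 1.0)

# Ratio of the integral to its Gaussian approximation sqrt(2 pi eps).
ratios = {eps: saddle_integral(eps) / math.sqrt(2.0 * math.pi * eps)
          for eps in (0.05, 0.02, 0.01)}
```

In this sketch the ratios decrease towards 1 as $\varepsilon$ decreases, which is the qualitative content of the estimate.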
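Inserting this bound into (12.3.32), we can now carry out all the integrals over the
$\hat y_k$, $1 \le k \le N-1$. It is again elementary to show that

$$\begin{aligned}
\int_{C_\delta^{N,\perp}} d\hat y^\perp\, e^{-\frac12 \sum_{k=1}^{N-1} \lambda_{k,N} \hat y_k^2/\varepsilon}
&\ge \int_{\mathbb{R}^{N-1}} d\hat y^\perp\, e^{-\frac12 \sum_{k=1}^{N-1} \lambda_{k,N} \hat y_k^2/\varepsilon} \Big(1 - \sum_{k=1}^{N-1} \sqrt{\frac{\lambda_{k,N}}{2\pi\varepsilon}} \int_{|\hat y_k| \ge \delta r_{k,N}/\sqrt{\lambda_{k,N}}} d\hat y_k\, e^{-\frac12 \lambda_{k,N} \hat y_k^2/\varepsilon}\Big) \\
&\ge \sqrt{2\pi\varepsilon}^{\,N-1} \prod_{k=1}^{N-1} \frac{1}{\sqrt{\lambda_{k,N}}}\, \Big(1 - \sum_{k=1}^{N-1} r_{k,N}^{-1}\, e^{-\frac12 K \ln(1/\varepsilon)\, r_{k,N}^2}\Big).
\end{aligned} \tag{12.3.35}$$

If we choose $r_{k,N}$ as in (12.3.22), with $\rho_k = k^\alpha$ for some $\alpha > 0$, and choose $K$ large
enough, then we can arrange that

$$1 - \sum_{k=1}^{N-1} r_{k,N}^{-1}\, e^{-\frac12 K \ln(1/\varepsilon)\, r_{k,N}^2} \ge 1 - \widetilde K \varepsilon^{\widehat K} \tag{12.3.36}$$

for some $\widetilde K < \infty$ for $\widehat K > 1$ as large as desired (uniformly in $N$). From here we get
the desired lower bound

$$\mathrm{cap}\big(B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})\big) \ge \varepsilon \sqrt{2\pi\varepsilon}^{\,N-2}\, \frac{[-\lambda_{0,N}]\, e^{-F_N(O^N)/\varepsilon}}{\sqrt{-\det[(H F_N)(O^N)]}}\, \Big(1 - C\sqrt{\varepsilon \ln(1/\varepsilon)^3}\Big) \tag{12.3.37}$$

for some constant $C$ that is independent of $N$. This is the claimed lower bound and
concludes the proof of Proposition 12.10.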

12.4 Estimate of the equilibrium potential

Recall from Corollary 7.30 that we have the formula

$$\mathbb{E}_{\nu^N_{B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})}}\big[\tau_{B^N_\rho(I^{-,N})}\big] = \frac{\int_{\mathbb{R}^N} h_{B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})}(x)\, d\mu^N_\varepsilon(x)}{\mathrm{cap}\big(B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})\big)}. \tag{12.4.1}$$

In Sect. 12.3 we derived upper and lower bounds on the denominator in (12.4.1). We
next derive estimates on the numerator of (12.4.1). The point is to show that this is
essentially the mass of a small neighbourhood of the starting minimum $I^{+,N}$.

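Proposition 12.15 For all $0 < \varepsilon < \varepsilon_0$ and $\rho > 0$ small enough,

$$\int_{\mathbb{R}^N} h_{B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})}(x)\, d\mu^N_\varepsilon(x) = \frac{\sqrt{(2\pi\varepsilon)^N}\; e^{-F_N(I^{+,N})/\varepsilon}}{\sqrt{\det[(H F_N)(I^{+,N})]}}\, \big(1 + \psi_2(\varepsilon, N)\big), \tag{12.4.2}$$

where $\limsup_{N\to\infty} |\psi_2(\varepsilon, N)| \le C \sqrt{\varepsilon[\ln(1/\varepsilon)]^3}$.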

Proof We first consider the symmetric case $b = 0$ in (12.1.5). As in the previous
section, we define around the minimum $I^{\pm,N} \in \mathbb{R}^N$ a neighbourhood $C_\delta^N(I^{\pm,N})$ by

$$C_\delta^N(I^{\pm,N}) = \big\{y(\hat y) \in \mathbb{R}^N : |\hat y_0 \mp 1| \le \delta,\ \hat y \in C_\delta^{N,\perp}\big\}. \tag{12.4.3}$$

To estimate the left-hand side of (12.4.2) we need yet another lower bound on the
non-quadratic terms in $F_N$. This time we write

$$\|y(\hat y)\|_{4,N}^4 = \hat y_0^4 + \|y(\hat y)\|_{4,N}^4 - \hat y_0^4 \ge \hat y_0^4, \tag{12.4.4}$$

where in the last inequality we use that, by the Cauchy-Schwarz inequality,

$$\hat y_0^4 = \Big(N^{-1} \sum_{l=1}^{N} y_l\Big)^4 \le N^{-2} \Big(\sum_{l=1}^{N} y_l^2\Big)^2 \le N^{-1} \sum_{l=1}^{N} y_l^4. \tag{12.4.5}$$
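The chain of inequalities in (12.4.5) is two applications of Cauchy-Schwarz (equivalently, monotonicity of power means), and it is easy to test on random vectors. The following sketch is purely illustrative: the function name `power_mean_chain` and the Gaussian test vectors are our own choices.

```python
import random

def power_mean_chain(y):
    """Return ((N^-1 sum y_l)^4, (N^-1 sum y_l^2)^2, N^-1 sum y_l^4) for a real vector y."""
    N = len(y)
    mean = sum(y) / N                      # equals hat y_0 in the new coordinates
    second = sum(v * v for v in y) / N     # N^{-1} sum_l y_l^2
    fourth = sum(v ** 4 for v in y) / N    # N^{-1} sum_l y_l^4
    return mean ** 4, second ** 2, fourth

random.seed(1)
samples = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(100)]
chains = [power_mean_chain(y) for y in samples]
```

For a constant vector all three quantities coincide, so the bound $\hat y_0^4 \le N^{-1}\sum_l y_l^4$ is sharp exactly on the line through the constant configurations.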

Inserting (12.4.4) into (12.3.9), we get

$$F_N(y(\hat y)) \ge \tfrac12 \sum_{k=0}^{N-1} \lambda_{k,N}\, \hat y_k^2 + \tfrac14 \hat y_0^4. \tag{12.4.6}$$

Note, moreover, that the coordinates of the two local minima are

$$\hat y(I^{\pm,N})_k = \pm\delta_{k,0}, \tag{12.4.7}$$

and in the $C_\delta^N$-neighbourhoods of these local minima the quadratic approximation
is good. Finally, the sets $C_\delta^N(I^{\pm,N})$ are subsets of $B^N_\rho(I^{\pm,N})$, so that the integrand
is equal to 1 on the set $C_\delta^N(I^{+,N})$ and equal to 0 on the set $C_\delta^N(I^{-,N})$. The claimed
estimate on the integral is now straightforward.
Most of the analysis above carries over unchanged when b > 0. The saddle points
remain the same, while the positions of the minima are shifted. More importantly,
the value of FN is now smaller by bs on the negative side. To show that nonetheless
there is no contribution from the target valley, we need a bound on the equilibrium
potential. Let

$$A = \big\{x \in \mathbb{R}^N : F_N(x) \le F_N(I^{+,N}) + \delta\big\} \tag{12.4.8}$$

for some $\delta > 0$ small enough.

Lemma 12.16 For all η > 0 there exist ρ0 > 0, δ0 > 0 and ε0 > 0 such that for all
0 < ρ < ρ0 , 0 < δ < δ0 , 0 < ε < ε0 and x ∈ A ,

hBρN (I +,N ),BρN (I −,N l) (x) ≤ e−(FN (O

N )−F 2 η)/ε
N (x)−cδ . (12.4.9)

Proof By the definition of the set $A$, all paths from $x \in A$ to $I^{+,N}$ must attain a
height at least $F_N(O^N)$. Therefore it follows from the large deviation principle and
the discussion on the exit problem (see Sect. 6.5.2) that for any $T < \infty$ fixed and all
$x \in A$,

$$\mathbb{P}_x\big(\tau_{B^N_\rho(I^{+,N})} < T\big) \le e^{-(F_N(O^N) - F_N(x) - \eta)/\varepsilon}. \tag{12.4.10}$$

On the other hand, for all $x \in A$ there is a zero-action path from $x$ to one of the
minima in $B^N_\rho(I^{-,N})$ that takes only a finite time $T_0$. All zero-action paths must
lead to $B^N_\rho(I^{-,N})$ in finite time. Therefore, to stay away from this set for a time $T$
requires the path not to follow a minimiser of the action integral for time $T - T_0$.
This costs a total action of order $Ta$ for some $a > 0$, and thus the probability of this
event is of order $\exp(-Ta/\varepsilon)$, which can be made as small as desired by choosing
$T$ large enough. In particular, it can be made much smaller than the probability in
(12.4.10). Now the simple bound

$$\mathbb{P}_x\big(\tau_{B^N_\rho(I^{+,N})} < \tau_{B^N_\rho(I^{-,N})}\big) \le \mathbb{P}_x\big(\tau_{B^N_\rho(I^{+,N})} < T\big) + \mathbb{P}_x\big(\tau_{B^N_\rho(I^{-,N})} > T\big) \tag{12.4.11}$$

yields the desired estimate.

Using the bound in Lemma 12.16, we see that the results for the symmetric case
b = 0 carry over to b > 0. This completes the proof of Proposition 12.15.

Remark 12.17 In more complicated situations, i.e., in the presence of multiple sta-
tionary points, the argument gets a little more involved. In that case, the process may
reach a small neighbourhood of some other stationary point before reaching its final
destination, and in this neighbourhood it could spend a large amount of time with-
out penalty. The probabilities to first reach the various stationary points are easily
computed with the help of large deviations, and by continuing the analysis step for
step from these new points as starting points we can show that this does not affect
the ultimate estimate on the harmonic function. This type of analysis is the basis
of the Freidlin-Wentzell theory [115]. All estimates involve only the potentials FN ,
and since these converge to F as discussed earlier, the control that is obtained in this
way is uniform in N .

12.5 Proof of the main theorem

Proof By putting all the estimates together, we obtain the following result on the
mean metastable exit time.

Proposition 12.18 Uniformly in $N$,

$$\mathbb{E}_{I^{+,N}}\big[\tau^N(B^N_\rho(I^{-,N}))\big] = \frac{2\pi\, e^{(F_N(O^N) - F_N(I^{+,N}))/\varepsilon}}{[-\lambda_{0,N}]} \sqrt{\frac{-\det[(H F_N)(O^N)]}{\det[(H F_N)(I^{+,N})]}}\, \big(1 + \Psi(\varepsilon, N)\big), \tag{12.5.1}$$

where the error term satisfies

$$\limsup_{N\to\infty} |\Psi(\varepsilon, N)| \le C \sqrt{\varepsilon [\ln(1/\varepsilon)]^3}. \tag{12.5.2}$$

Proof Inserting the estimates for the denominator (Proposition 12.10) and the numerator
(Proposition 12.15) into (12.4.1), we get that $\mathbb{E}_{\nu^N}[\tau^N_\varepsilon]$ is equal to the right-hand
side of (12.5.1), where

$$\nu^N = \nu^N_{B^N_\rho(I^{+,N}), B^N_\rho(I^{-,N})} \tag{12.5.3}$$

is the last-exit biased distribution on $B^N_\rho(I^{+,N})$. Then use Theorem 9.14 to replace
$\nu^N$ by the point $I^{+,N} \in B^N_\rho(I^{+,N})$.

The assertion of Theorem 12.4 follows from Proposition 12.18 and the conver-
gence results established in Sect. 5.7, in particular, Theorem 5.70.

12.6 Bibliographical notes

1. The system in (5.7.1) and its metastable behaviour have been studied for thirty
years. The main techniques employed in the literature are based on large deviation
principles and comparison estimates between the deterministic process ((5.7.1) with
ε = 0) and the stochastic process ((5.7.1) with ε > 0). Faris and Jona-Lasinio [107]
analysed (5.7.1) for the quartic double-well potential we considered here. Cassan-
dro, Olivieri and Picco [52] obtained similar asymptotics as in [107] when the space
interval [0, 1] is not fixed but tends to infinity as ε ↓ 0 (sufficiently slowly). These re-
sults established the existence of a suitable exponential time scale on which the pro-
cess undergoes a transition. For (6.3.1), Martinelli, Olivieri and Scoppola [175] ob-
tained the asymptotic exponential law of the transition times. Brassesco [41] proved
that the trajectories exhibit characteristics of metastable behaviour: the escape from
the basin of attraction of the minimum occurs through the lowest saddle points and
the process starting from this minimum spends most of its time before the transition
near this minimum.

2. As in the finite-dimensional setting, local minima and saddle points play a key
role in understanding metastability. In the infinite-dimensional setting, identifying

the critical points is already a difficult task in itself. Fortunately, elegant methods
are available to do so: see e.g. Fiedler and Rocha [113] and Wolfrum [237].

3. To analyse metastability for the infinite-dimensional diffusion, we used a spa-

tial discretisation that brings us back to the case of finite-dimensional diffusions
studied in Chap. 11. The use of spatial finite-difference approximation is natural.
Berglund and Gentz [22] use a Galerkin approximation and obtain analogous results
in a more general setting. Our main objective has been to derive the infinite-dimensional
analogue of Kramers' formula for average metastable exit times. Such
a formula was conjectured by Maier and Stein [170] (see also Vanden-Eijnden and
Westdickenberg [230]) as a formal limit of the finite-dimensional systems. For the
setting described in this chapter, this limit was justified rigorously in Barret, Bovier
and Méléard [12]. Extensions to more general settings were studied by Barret [11],
and Berglund and Gentz [22]. Berglund and Gentz also consider cases where the
Hessian matrix is degenerate.
Part V
Applications: Coarse-Graining in Large
Volumes at Positive Temperatures

Part V deals with Markov processes that allow for coarse-graining, i.e., a lumping of
states that leads to a simpler Markov process on a reduced state space. For instance,
the reduction of the state space of a high-dimensional spin system to that of a low-
dimensional spin system, whenever possible, is a powerful tool for the analysis of
its dynamics. Some mean-field models allow for such a reduction.
Chapter 13 looks at the Curie-Weiss model, Chaps. 14–15 at the random-field
Curie-Weiss model.
Chapter 13
The Curie-Weiss Model

La simplicité affectée est une imposture délicate.

(François de La Rochefoucauld, Réflexions)

Most systems of interest in statistical physics are extremely high-dimensional, and
become infinite-dimensional in the thermodynamic limit. Unlike in the diffusion-
type models discussed in Part IV, their metastable behaviour cannot be read off
from the energy of paths alone, because a true interplay between energy and entropy
of paths takes place. This makes the analysis of such systems hard. A promising
strategy is the reduction of this complexity via a mapping to a low-dimensional state
space in the spirit of the coarse-graining and lumping explained in Sects. 9.2–9.3.
In this chapter we deal with the Curie-Weiss model. Section 13.1 defines the model
and introduces the coarse-graining. Section 13.2 solves the coarse-grained model
and proves the theorems describing its metastable behaviour.

13.1 The Curie-Weiss model

The toy model where coarse-graining works perfectly well is the Curie-Weiss model
of a ferromagnet. The state space is $S_\Lambda = \{-1,+1\}^\Lambda$ with $\Lambda = \{1,\dots,N\}$, $N\in\mathbb{N}$.
The Hamiltonian is given by

$$H_N(\sigma) = -\frac{1}{2N}\sum_{i,j\in\Lambda}\sigma_i\sigma_j - h\sum_{i\in\Lambda}\sigma_i, \qquad \sigma\in S_\Lambda, \tag{13.1.1}$$

with $h\in\mathbb{R}$ the magnetic field. The fact that this is a mean-field model is expressed
by the fact that $H_N(\sigma)$ depends on $\sigma$ only through the empirical magnetisation

$$m_N(\sigma) = \frac1N\sum_{i\in\Lambda}\sigma_i, \tag{13.1.2}$$

namely,

$$H_N(\sigma) = -N\Big[\tfrac12 m_N(\sigma)^2 + h\,m_N(\sigma)\Big] = N E\big(m_N(\sigma)\big), \tag{13.1.3}$$

where $E(m) = -\tfrac12 m^2 - hm$.
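The identity (13.1.3) can be checked by brute force for small $N$. The following Python sketch is an illustration only and not part of the original argument; the function names `hamiltonian`, `energy` and `max_discrepancy` are our own choices, and the form $E(m) = -\tfrac12 m^2 - hm$ is read off from (13.1.3).

```python
import itertools


def hamiltonian(sigma, h):
    """Curie-Weiss Hamiltonian (13.1.1): -(1/2N) sum_{i,j} s_i s_j - h sum_i s_i."""
    n = len(sigma)
    pair_sum = sum(si * sj for si in sigma for sj in sigma)
    return -pair_sum / (2 * n) - h * sum(sigma)


def energy(m, h):
    """E(m) = -m^2/2 - h*m, so that H_N = N * E(m_N) as in (13.1.3)."""
    return -0.5 * m * m - h * m


def max_discrepancy(n, h):
    """Largest |H_N(sigma) - N E(m_N(sigma))| over all 2^n configurations."""
    worst = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        m = sum(sigma) / n  # empirical magnetisation (13.1.2)
        worst = max(worst, abs(hamiltonian(sigma, h) - n * energy(m, h)))
    return worst
```

Calling `max_discrepancy(6, -0.3)` enumerates all 64 configurations and should return a value at the level of floating-point rounding.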

© Springer International Publishing Switzerland 2015 325

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_13

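We choose a discrete-time dynamics $\sigma_N = (\sigma(n))_{n\in\mathbb{N}_0}$ on $S_\Lambda$ with Metropolis
transition probabilities

$$p(\sigma,\sigma') = \begin{cases} N^{-1}\exp\big[-\beta[H_N(\sigma')-H_N(\sigma)]_+\big], & \text{if } \|\sigma-\sigma'\|_1 = 2,\\ 0, & \text{if } \|\sigma-\sigma'\|_1 > 2,\\ 1-\sum_{\eta\neq\sigma}p(\sigma,\eta), & \text{if } \sigma=\sigma', \end{cases} \tag{13.1.4}$$

where $\|\cdot\|_1$ is the $\ell_1$-norm on $S_\Lambda$, and the last line is put in to obtain a proper
normalisation. This dynamics is reversible w.r.t. the Gibbs measure

$$\mu_{\beta,N}(\sigma) = \frac{1}{Z_{\beta,N}}\,e^{-\beta H_N(\sigma)}\,2^{-N}, \qquad \sigma\in S_\Lambda, \tag{13.1.5}$$

with $Z_{\beta,N}$ the normalising partition function and $\beta$ the inverse temperature.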
Let us look at the evolution of the magnetisation $m_N(n) = m_N(\sigma(n))$ at time
$n\in\mathbb{N}_0$. Clearly, this quantity can only increase or decrease by $2N^{-1}$, and the probability
of doing so only depends on the number of $-1$'s and $+1$'s present in the configuration
$\sigma(n)$, i.e., on $m_N(\sigma(n))$. In other words, with $\mathcal{F}_n$ denoting the $\sigma$-algebra
up to time $n$,

$$\mathbb{P}\big[m_N(n+1) = m' \mid \mathcal{F}_n\big] = r_{\beta,N}\big(m_N(n), m'\big), \qquad n\in\mathbb{N}_0, \tag{13.1.6}$$

is a function of $m_N(n)$ only, so that Theorem 9.5 applies and the image Markov
process has transition probabilities (recall (9.3.3))

$$r_{\beta,N}(m,m') = \begin{cases} \frac{1-m}{2}\exp\big[-\beta N[E(m')-E(m)]_+\big], & \text{if } m' = m+2N^{-1},\\[2pt] \frac{1+m}{2}\exp\big[-\beta N[E(m')-E(m)]_+\big], & \text{if } m' = m-2N^{-1}, \end{cases} \tag{13.1.7}$$

on the state space

$$\Gamma_N = \big\{-1,\,-1+2N^{-1},\,\dots,\,1-2N^{-1},\,1\big\}. \tag{13.1.8}$$

Moreover, this Markov process is reversible with respect to the image Gibbs measure

$$\nu_{\beta,N}(m) = \frac{1}{Z_{\beta,N}}\,e^{-\beta N E(m)}\binom{N}{\frac{1+m}{2}N}2^{-N}, \qquad m\in\Gamma_N. \tag{13.1.9}$$

In exponential form the latter can be written as

$$\nu_{\beta,N}(m) = \frac{1}{Z_{\beta,N}}\,e^{-\beta N f_{\beta,N}(m)}, \tag{13.1.10}$$

where

$$f_{\beta,N}(m) = -\tfrac12 m^2 - hm + \beta^{-1}I_N(m), \tag{13.1.11}$$

with

$$-I_N(m) = \frac1N\ln\left[\binom{N}{\frac{1+m}{2}N}2^{-N}\right]. \tag{13.1.12}$$

Fig. 13.1 Plot of $m\mapsto f_\beta(m)$ on $[-1,1]$ when $\beta>1$ and $h<0$
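The reversibility of the image chain can be verified numerically through detailed balance, $\nu_{\beta,N}(m)\,r_{\beta,N}(m,m') = \nu_{\beta,N}(m')\,r_{\beta,N}(m',m)$. The sketch below is a hedged illustration: the indexing of states by $k = N(1+m)/2$ (the number of $+1$ spins) and all function names are assumptions made for this example, and the normalisation $Z_{\beta,N}$ is dropped because it cancels.

```python
import math


def energy(m, h):
    """E(m) = -m^2/2 - h*m, read off from (13.1.3)."""
    return -0.5 * m * m - h * m


def rate(k, k_new, n, beta, h):
    """Transition probability (13.1.7); states are indexed by k = N(1+m)/2."""
    m, m_new = -1 + 2 * k / n, -1 + 2 * k_new / n
    cost = max(n * (energy(m_new, h) - energy(m, h)), 0.0)
    if k_new == k + 1:
        return (1 - m) / 2 * math.exp(-beta * cost)
    if k_new == k - 1:
        return (1 + m) / 2 * math.exp(-beta * cost)
    raise ValueError("only nearest-neighbour moves are covered")


def weight(k, n, beta, h):
    """Unnormalised image Gibbs weight (13.1.9)."""
    m = -1 + 2 * k / n
    return math.exp(-beta * n * energy(m, h)) * math.comb(n, k) * 2.0 ** (-n)


def detailed_balance_gap(n, beta, h):
    """Largest relative violation of nu(m) r(m, m') = nu(m') r(m', m)."""
    gap = 0.0
    for k in range(n):
        forward = weight(k, n, beta, h) * rate(k, k + 1, n, beta, h)
        backward = weight(k + 1, n, beta, h) * rate(k + 1, k, n, beta, h)
        gap = max(gap, abs(forward - backward) / max(forward, backward))
    return gap
```

The check works because $\binom{N}{k}(N-k) = \binom{N}{k+1}(k+1)$ and because the Metropolis factor turns both sides into $e^{-\beta N\max\{E(m),E(m')\}}$.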

In the limit as $N\to\infty$, $\Gamma_N$ lies dense in $[-1,1]$ and

$$\lim_{N\to\infty} f_{\beta,N}(m) = f_\beta(m), \qquad \lim_{N\to\infty} I_N(m) = I(m), \tag{13.1.13}$$

with

$$f_\beta(m) = -\tfrac12 m^2 - hm + \beta^{-1}I(m), \qquad I(m) = \tfrac12(1+m)\ln(1+m) + \tfrac12(1-m)\ln(1-m). \tag{13.1.14}$$

The latter is the Cramér rate function for coin tossing (recall Sect. 6.1). Since
$I(m) = I(-m)$ and $I(m)\sim\tfrac12 m^2$ as $m\to0$, we see from (13.1.11) that $m\mapsto f_\beta(m)$
is a double well when $\beta>1$ and $|h|$ is small enough (see Fig. 13.1). The stationary
points of $f_\beta$ are the solutions of the equation

$$m = \tanh\big[\beta(m+h)\big]. \tag{13.1.15}$$
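To see the double-well picture concretely, one can locate the solutions of (13.1.15) numerically and classify them through the sign of $f_\beta''(m) = -1 + [\beta(1-m^2)]^{-1}$. The following sketch is illustrative only: the grid-scan-plus-bisection strategy, the parameter values $\beta = 1.5$, $h = -0.02$ used in the usage note, and all function names are assumptions of this example.

```python
import math


def cramer(m):
    """Cramér rate function I(m) from (13.1.14), with the convention 0*ln(0) = 0."""
    return sum(0.5 * x * math.log(x) for x in (1 + m, 1 - m) if x > 0)


def free_energy(m, beta, h):
    """f_beta(m) from (13.1.14)."""
    return -0.5 * m * m - h * m + cramer(m) / beta


def bisect(func, lo, hi, steps=100):
    """Bisection on [lo, hi]; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def stationary_points(beta, h, grid=2000):
    """Solutions of m = tanh(beta (m + h)) in (-1, 1), labelled via the sign of f''."""
    g = lambda m: math.tanh(beta * (m + h)) - m
    xs = [-1 + 2 * i / grid for i in range(grid + 1)]
    points = []
    for a, b in zip(xs, xs[1:]):
        if g(a) * g(b) < 0:  # a sign change brackets a root
            m = bisect(g, a, b)
            curvature = -1 + 1 / (beta * (1 - m * m))  # f''_beta(m)
            points.append((m, "minimum" if curvature > 0 else "maximum"))
    return points
```

For instance, `stationary_points(1.5, -0.02)` is expected to return two minima separated by a maximum located at a small positive magnetisation, in line with the situation of Fig. 13.1. A root that happens to sit exactly on a grid point would be missed by the sign-change scan; a finer or shifted grid avoids this.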

The above observations show that $m_N = (m_N(n))_{n\in\mathbb{N}_0}$ is a random walk on $\Gamma_N\subset[-1,1]$
with a reversible invariant measure that is close to $\exp[-\beta N f_\beta(m)]$ (modulo
normalisation) for large $N$. Clearly, this brings us to a situation where we can obtain
an exact solution, as was explained in Sect. 7.1.4. Moreover, since $\Gamma_N$ is a lattice
with spacing $2N^{-1}$, in the limit as $N\to\infty$ sums appearing in the exact solution
can be approximated by integrals with the help of saddle-point techniques.

13.2 Metastable behaviour

The random walk $m_N$ is close to a diffusion on $[-1,1]$ given by the Kramers diffusion
equation in (2.1.1) with $W(x) = \beta f_\beta(x)$ and $\varepsilon = N^{-1}$. In other words, for large
$N$ the dynamics of the magnetisation in the Curie-Weiss model can be approximated
by a Brownian motion in a potential as encountered in Sect. 2.1. For $\beta>1$ and $|h|$
small enough this potential is a double well and the diffusion exhibits metastable
behaviour.
Let $m_-^* < m_+^*$ be the two local minima of $m\mapsto f_\beta(m)$, and $z^*$ the saddle point in
between. Let $m_-^*(N)$, $m_+^*(N)$ denote the points in $\Gamma_N$ that are closest to $m_-^*$, $m_+^*$.
These points form a metastable set in the sense of Definition 8.2. In the setting of
Fig. 13.1, we have $f_\beta(m_+^*) > f_\beta(m_-^*)$, so $m_+^*(N)$ is the metastable state and $m_-^*(N)$
is the stable state. Let $\mathbb{E}_{m_+^*(N)}$ denote expectation w.r.t. the Markov process starting
in $m_+^*(N)$ and $\tau_{m_-^*(N)}$ the first hitting time of $m_-^*(N)$.

Theorem 13.1 (Mean metastable crossover time) As $N\to\infty$,

$$\mathbb{E}_{m_+^*(N)}\big[\tau_{m_-^*(N)}\big] = \exp\big(\beta N\big[f_\beta(z^*) - f_\beta(m_+^*)\big]\big)\;\frac{2}{1-z^*}\sqrt{\frac{1-z^{*2}}{1-m_+^{*2}}}\;\frac{2\pi N/4}{\beta\sqrt{[-f_\beta''(z^*)]\,f_\beta''(m_+^*)}}\;\big[1+o(1)\big]. \tag{13.2.1}$$

Proof Since our Markov process $m_N$ is a nearest-neighbour random walk on $\Gamma_N$,
we can use the computations in Sect. 7.1.4. According to (7.1.61), we have

$$\mathbb{E}_{m_+^*(N)}\big[\tau_{m_-^*(N)}\big] = \sum_{\substack{m,m'\in\Gamma_N,\ m\le m'\\ m_-^*(N)<m\le m_+^*(N)}} \frac{\nu_{\beta,N}(m')}{\nu_{\beta,N}(m)}\,\frac{1}{r_{\beta,N}(m,m-2N^{-1})}. \tag{13.2.2}$$

By (13.1.7) and (13.1.10), we have

$$\begin{aligned} r_{\beta,N}\big(m,m-2N^{-1}\big) &= \frac{1+m}{2}\,e^{-\beta N[E(m-2N^{-1})-E(m)]_+},\\ \frac{\nu_{\beta,N}(m')}{\nu_{\beta,N}(m)} &= e^{\beta N[f_{\beta,N}(m)-f_{\beta,N}(m')]}. \end{aligned} \tag{13.2.3}$$

In the limit as $N\to\infty$, the sums in (13.2.2) are dominated by the terms with
$m\to z^*$ and $m'\to m_+^*$, since for these terms $f_{\beta,N}(m)-f_{\beta,N}(m')$ is maximal. This
explains the exponential factor in (13.2.1). To get the prefactor in (13.2.1), we need
to look a bit more closely.
Note that $[E(m-2N^{-1})-E(m)]_+ = 2N^{-1}[(m+h)-N^{-1}]_+$. For $m\to z^*$, the
first line of (13.2.3) converges to $\frac{1+z^*}{2}\exp(-2\beta[z^*+h]_+)$. In the situation depicted
in Fig. 13.1, we have $z^*>0$. But $z^*$ is a solution of (13.1.15), and so we have
$\exp(2\beta[z^*+h]) = (1+z^*)/(1-z^*) > 1$. Therefore (13.2.2)–(13.2.3) imply that,
for any $\varepsilon>0$,

$$\mathbb{E}_{m_+^*(N)}\big[\tau_{m_-^*(N)}\big] = e^{\beta N[f_{\beta,N}(z^*)-f_{\beta,N}(m_+^*)]}\,\frac{2}{1-z^*}\,\big[1+o(1)\big] \sum_{\substack{m,m'\in\Gamma_N\\ |m-z^*|<\varepsilon,\ |m'-m_+^*|<\varepsilon}} e^{\beta N[f_{\beta,N}(m)-f_{\beta,N}(z^*)]-\beta N[f_{\beta,N}(m')-f_{\beta,N}(m_+^*)]}. \tag{13.2.4}$$

It follows from (13.1.11)–(13.1.14) and Stirling's formula that

$$I_N(m) - I(m) = \big[1+o(1)\big]\,\frac{1}{2N}\ln\frac{\pi N(1-m^2)}{2} \tag{13.2.5}$$

and hence

$$e^{\beta N[f_{\beta,N}(m)-f_\beta(m)]} = \big[1+o(1)\big]\sqrt{\frac{\pi N(1-m^2)}{2}}. \tag{13.2.6}$$
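The Stirling estimate (13.2.5) is easy to probe numerically. The sketch below, which is our own illustration with hypothetical function names, evaluates $N(I_N(m) - I(m))$ through log-gamma functions and subtracts the leading term $\tfrac12\ln(\pi N(1-m^2)/2)$; the remainder should shrink as $N$ grows.

```python
import math


def cramer(m):
    """Cramér rate function I(m) from (13.1.14), for |m| < 1."""
    return 0.5 * (1 + m) * math.log(1 + m) + 0.5 * (1 - m) * math.log(1 - m)


def finite_rate(n, k):
    """I_N(m) of (13.1.12) for m = 2k/N - 1, via log-gamma to avoid huge binomials."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -(log_binom - n * math.log(2)) / n


def stirling_gap(n, k):
    """N (I_N(m) - I(m)) minus the leading term (1/2) ln(pi N (1 - m^2) / 2)."""
    m = 2 * k / n - 1
    leading = 0.5 * math.log(math.pi * n * (1 - m * m) / 2)
    return n * (finite_rate(n, k) - cramer(m)) - leading
```

For example, `stirling_gap(200, 130)` and `stirling_gap(2000, 1300)` both correspond to $m = 0.3$, and the second should be roughly ten times smaller than the first, consistent with a remainder of order $N^{-1}$.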

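Consequently,

$$e^{\beta N[f_{\beta,N}(z^*)-f_{\beta,N}(m_+^*)]} = \big[1+o(1)\big]\,e^{\beta N[f_\beta(z^*)-f_\beta(m_+^*)]}\sqrt{\frac{1-z^{*2}}{1-m_+^{*2}}}. \tag{13.2.7}$$

Inserting this into (13.2.4), we get

$$\mathbb{E}_{m_+^*(N)}\big[\tau_{m_-^*(N)}\big] = e^{\beta N[f_\beta(z^*)-f_\beta(m_+^*)]}\,\frac{2}{1-z^*}\sqrt{\frac{1-z^{*2}}{1-m_+^{*2}}}\,\big[1+o(1)\big] \sum_{\substack{m,m'\in\Gamma_N\\ |m-z^*|<\varepsilon,\ |m'-m_+^*|<\varepsilon}} e^{\beta N[f_\beta(m)-f_\beta(z^*)]-\beta N[f_\beta(m')-f_\beta(m_+^*)]}. \tag{13.2.8}$$

To evaluate the sum we write the Taylor expansions

$$\begin{aligned} f_\beta(m)-f_\beta(z^*) &= \tfrac12(m-z^*)^2 f_\beta''(z^*) + O\big((m-z^*)^3\big),\\ f_\beta(m')-f_\beta(m_+^*) &= \tfrac12(m'-m_+^*)^2 f_\beta''(m_+^*) + O\big((m'-m_+^*)^3\big), \end{aligned} \tag{13.2.9}$$

where we use that $f_\beta'(z^*) = 0$ and $f_\beta'(m_+^*) = 0$. Changing to new variables
$u = \sqrt N(m-z^*)$ and $u' = \sqrt N(m'-m_+^*)$ and recalling (13.1.8), we see that the sum in
(13.2.8) equals

$$\big[1+o(1)\big]\,\frac N4\int_{\mathbb{R}}du\int_{\mathbb{R}}du'\,\exp\Big(\tfrac12\beta f_\beta''(z^*)\,u^2 - \tfrac12\beta f_\beta''(m_+^*)\,u'^2\Big). \tag{13.2.10}$$

Since $f_\beta''(z^*)<0$ and $f_\beta''(m_+^*)>0$, the integral converges and equals

$$\frac{2\pi}{\beta\sqrt{[-f_\beta''(z^*)]\,f_\beta''(m_+^*)}}. \tag{13.2.11}$$

Combining (13.2.8) and (13.2.10)–(13.2.11), we end up with (13.2.1).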
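The asymptotics in Theorem 13.1 can be compared with a direct evaluation of the double sum (13.2.2). The following Python sketch is a numerical illustration under stated assumptions, not part of the proof: states are indexed by $k = N(1+m)/2$, the critical points are found by a simple grid scan with bisection, the parameters $\beta = 1.5$, $h = -0.02$, $N = 400$ in the usage note are arbitrary choices, and all function names are hypothetical.

```python
import math


def energy(m, h):
    return -0.5 * m * m - h * m


def free_energy(m, beta, h):
    """Limiting f_beta(m) of (13.1.14), for |m| < 1."""
    entropy = 0.5 * (1 + m) * math.log(1 + m) + 0.5 * (1 - m) * math.log(1 - m)
    return energy(m, h) + entropy / beta


def stationary_points(beta, h, grid=2000):
    """Roots of m = tanh(beta (m + h)), located by sign changes plus bisection."""
    g = lambda m: math.tanh(beta * (m + h)) - m
    xs = [-1 + 2 * i / grid for i in range(grid + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) < 0:
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


def exact_mean_time(n, beta, h, m_start, m_target):
    """Evaluate the double sum (13.2.2); states are indexed by k = N(1+m)/2."""
    log_nu = [-beta * n * energy(2 * k / n - 1, h)
              + math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
              for k in range(n + 1)]
    shift = max(log_nu)  # the normalisation cancels in (13.2.2)
    nu = [math.exp(v - shift) for v in log_nu]
    tail = [0.0] * (n + 2)  # tail[k] = sum of nu[j] over j >= k
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + nu[k]
    k_start = round(n * (1 + m_start) / 2)
    k_target = round(n * (1 + m_target) / 2)
    total = 0.0
    for k in range(k_target + 1, k_start + 1):
        m = 2 * k / n - 1
        cost = max(n * (energy(m - 2 / n, h) - energy(m, h)), 0.0)
        down = (1 + m) / 2 * math.exp(-beta * cost)  # r(m, m - 2/N) from (13.1.7)
        total += tail[k] / (nu[k] * down)
    return total


def asymptotic_mean_time(n, beta, h, z, m_plus):
    """Right-hand side of (13.2.1) without the 1 + o(1) factor."""
    curvature = lambda m: -1 + 1 / (beta * (1 - m * m))  # f''_beta(m)
    barrier = beta * n * (free_energy(z, beta, h) - free_energy(m_plus, beta, h))
    prefactor = (2 / (1 - z)) * math.sqrt((1 - z * z) / (1 - m_plus ** 2))
    prefactor *= (2 * math.pi * n / 4) / (beta * math.sqrt(-curvature(z) * curvature(m_plus)))
    return math.exp(barrier) * prefactor
```

With `m_minus, z, m_plus = stationary_points(1.5, -0.02)`, the ratio `exact_mean_time(400, 1.5, -0.02, m_plus, m_minus) / asymptotic_mean_time(400, 1.5, -0.02, z, m_plus)` should be close to 1, with a deviation that reflects the $o(1)$ term at finite $N$.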

The result in Theorem 13.1 fits the classical Arrhenius law with activation energy
$\beta[f_\beta(z^*) - f_\beta(m_+^*)]$ and amplitude given by the prefactor. The former coincides
with what was found in (2.1.2) for the Kramers model, with $W = \beta f_\beta$ and $\varepsilon = N^{-1}$,
while the latter differs by a factor

$$\frac N2\,\frac{1}{1-z^*}\sqrt{\frac{1-z^{*2}}{1-m_+^{*2}}}. \tag{13.2.12}$$

This discrepancy comes from the discrete nature of the Curie-Weiss model. In particular,
the factor $N$ is due to the fact that time is discrete and only one spin is flipped
per time step. In a continuous-time version, we would speed up time by a factor $N$,
after which $N$ would disappear from the last term in the right-hand side of (13.2.1).
As a corollary of Theorems 8.43 and 8.45 we get the exponential law of the
metastable crossover time.

Theorem 13.2 (Exponential law) As $N\to\infty$,

$$\lim_{N\to\infty}\mathbb{P}_{m_+^*(N)}\left[\frac{\tau_{m_-^*(N)}}{\mathbb{E}_{m_+^*(N)}[\tau_{m_-^*(N)}]} > t\right] = e^{-t} \qquad \forall\, t\ge0. \tag{13.2.13}$$

13.3 Bibliographical notes

1. The knowledge of the behaviour of mN = (mN (n))n∈N0 does not answer all rel-
evant questions about the dynamics of σN = (σ (n))n∈N0 . This issue was addressed
by Levin, Luczak and Peres [164].

2. There are a number of generalised mean-field models that allow for a similar re-
duction to a multi-dimensional diffusive Markov process. See e.g. Bovier, Eckhoff,
Gayrard and Klein [33] and Chap. 14 of this book.

3. The calculations in this chapter are not robust against small modifications. Indeed,
we are using the full permutation symmetry of the Hamiltonian, which is necessary
to ensure that mN = (mN (n))n∈N0 is a Markov process. Even when we merely re-
place the discrete spin variables by continuous spin variables (which leads to the
model of mean-field interacting diffusions), the Markov property fails and we are
required to consider the empirical measure rather than the empirical magnetisation
as the macroscopic variable in order to obtain a Markovian dynamics.
Chapter 14
The Curie-Weiss Model with a Random
Magnetic Field: Discrete Distributions

He’s a wonderfully clever man, you know. Sometimes he says

things that only the Other Professor can understand. Sometimes
he says things that nobody can understand.
(Lewis Carroll, Sylvie and Bruno)

In Sect. 14.1 we introduce the model. In Sect. 14.2 we define the associated Gibbs
measure and the relevant order parameter. In Sect. 14.3 we define the Glauber
dynamics and state the main metastability result. Section 14.4 deals with coarse-
graining, which works because of the mean-field interaction and because the ran-
dom fields take finitely many values. We construct the effective Dirichlet form that
is obtained after the coarse-graining. Section 14.5 studies the energy landscape near
the critical points. Section 14.6 analyses the eigenvalues of the Hessian at the critical
points, while Sect. 14.7 looks at the overall topology of the energy landscape
and indicates how the metastability results follow from those in Chap. 10.

14.1 The model

The random-field Curie-Weiss (RFCW) model is one of the simplest examples of a
disordered mean-field model. The state space is $S_N = \{-1,+1\}^N$, where $N$ is the
number of spins in the system, and the Hamiltonian is given by

$$H_N[\omega](\sigma) = -\frac{1}{2N}\sum_{i,j\in\Lambda}\sigma_i\sigma_j - \sum_{i\in\Lambda}h_i[\omega]\sigma_i, \tag{14.1.1}$$

where $\Lambda = \{1,\dots,N\}$ and $h_i$, $i\in\mathbb{N}$, are i.i.d. random variables on some probability
space $(\Omega,\mathcal{F},\mathbb{P}_h)$.
We will consider two versions of the model that differ substantially in their com-
plexity. If the magnetic field has a discrete distribution, then the approach taken for
the standard Curie-Weiss model treated in Chap. 13 can be easily extended. This
version will be treated in the present chapter. However, if the magnetic field has a
continuous distribution, then computations become seriously more complex. This
version will be treated in Chap. 15.

© Springer International Publishing Switzerland 2015 331

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_14

14.2 Gibbs measure and order parameter

We briefly review some key features of the equilibrium behaviour of the RFCW-model,
for which we do not need any assumption on the distribution of the random
magnetic field.
The Gibbs measure of the RFCW-model is the random probability measure

$$\mu_{\beta,N}[\omega](\sigma) = \frac{2^{-N}e^{-\beta H_N[\omega](\sigma)}}{Z_{\beta,N}[\omega]}, \tag{14.2.1}$$

where the partition function is defined as

$$Z_{\beta,N}[\omega] = \sum_{\sigma\in S_N}2^{-N}e^{-\beta H_N[\omega](\sigma)}. \tag{14.2.2}$$

As in Chap. 13, the empirical magnetisation

$$m_N(\sigma) = \frac1N\sum_{i\in\Lambda}\sigma_i \tag{14.2.3}$$

serves as the order parameter of the model, and we define its distribution under the
Gibbs measure in (14.2.1) as the induced measure

$$Q_{\beta,N} = \mu_{\beta,N}\circ m_N^{-1} \tag{14.2.4}$$

on the set of possible values of $m_N$, $\Gamma_N = \{-1,-1+2/N,\dots,1-2/N,1\}$.

We write

$$Z_{\beta,N}[\omega]\,Q_{\beta,N}[\omega](m) = e^{\frac12N\beta m^2}\,Z^1_{\beta,N}[\omega](m), \tag{14.2.5}$$

where

$$Z^1_{\beta,N}[\omega](m) = \sum_{\sigma\in S_N}2^{-N}e^{\beta\sum_{i\in\Lambda}h_i[\omega]\sigma_i}\,1_{\{N^{-1}\sum_{i\in\Lambda}\sigma_i = m\}}. \tag{14.2.6}$$

For simplicity, we identify functions $f$ defined on the discrete set $\Gamma_N$ with functions
$f$ defined on the interval $[-1,1]$ by setting $f(m) = f(\lfloor 2Nm\rfloor/2N)$. Then, by using
sharp large deviation estimates (see Chaganty and Sethuraman [55]), $Z^1_{\beta,N}[\omega](m)$, $m\in(-1,1)$,
can be expressed as

$$Z^1_{\beta,N}[\omega](m) = \frac{\exp[-N I_N[\omega](m)]}{\sqrt{\frac{N\pi}{2}\big/I_N''[\omega](m)}}\,\big[1+o(1)\big], \tag{14.2.7}$$

where $o(1)$ tends to zero as $N\to\infty$, and $I_N[\omega](y)$ is the Legendre-Fenchel transform
of the log-moment-generating function

$$U_N[\omega](t) = \frac1N\ln\sum_{\sigma\in S_N}2^{-N}e^{\beta\sum_{i\in\Lambda}h_i[\omega]\sigma_i}\,e^{t\sum_{i\in\Lambda}\sigma_i} = \frac1N\sum_{i\in\Lambda}\ln\cosh\big(t+\beta h_i[\omega]\big). \tag{14.2.8}$$

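Hence we can rewrite (14.2.5) as

$$Z_{\beta,N}[\omega]\,Q_{\beta,N}[\omega](m) = \sqrt{\frac{2I_N''[\omega](m)}{N\pi}}\,\exp\big[-N\beta F_{\beta,N}[\omega](m)\big]\,\big[1+o(1)\big], \tag{14.2.9}$$

where

$$F_{\beta,N}[\omega](m) = -\frac12m^2 + \frac1\beta I_N[\omega](m). \tag{14.2.10}$$

We are interested in the behaviour of $Z_{\beta,N}[\omega]\,Q_{\beta,N}[\omega]$ near the critical points
of $F_{\beta,N}$. These satisfy the equation

$$m^* = \beta^{-1}I_N'(m^*) = \beta^{-1}t^*, \tag{14.2.11}$$

or

$$\beta m^* = I_N'(m^*) = t^*. \tag{14.2.12}$$

Since $I_N$ is the Legendre-Fenchel transform of $U_N$, we have $I_N'(x) = [U_N']^{-1}(x)$,
so that

$$m^* = U_N'(\beta m^*) = \frac1N\sum_{i\in\Lambda}\tanh\big(\beta(m^*+h_i[\omega])\big). \tag{14.2.13}$$

Moreover,

$$a(m^*) = F_{\beta,N}''(m^*) = -1+\beta^{-1}I_N''(m^*). \tag{14.2.14}$$

Finally, using that $I_N''(m^*) = 1/U_N''(t^*)$, we get the alternative expression

$$a(m^*) = -1+\frac{1}{\beta U_N''(\beta m^*)} = -1+\frac{1}{\frac{\beta}{N}\sum_{i\in\Lambda}\big[1-\tanh^2\big(\beta(m^*+h_i[\omega])\big)\big]}. \tag{14.2.15}$$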
Thus, we see that, by the law of large numbers, the set of critical points converges
$\mathbb{P}_h$-a.s. to the set of solutions of the equation

$$m^* = \mathbb{E}_h\tanh\big(\beta(m^*+h)\big), \tag{14.2.16}$$

and the second derivative of $F_{\beta,N}$ at $m^*$ converges to

$$\lim_{N\to\infty}F_{\beta,N}''(m^*) = -1+\frac{1}{\beta\,\mathbb{E}_h\big[1-\tanh^2(\beta(m^*+h))\big]}. \tag{14.2.17}$$

Hence $m^*$ is a local minimum when

$$\beta\,\mathbb{E}_h\big[1-\tanh^2\big(\beta(m^*+h)\big)\big] < 1 \tag{14.2.18}$$

and a local maximum when

$$\beta\,\mathbb{E}_h\big[1-\tanh^2\big(\beta(m^*+h)\big)\big] > 1. \tag{14.2.19}$$

(The case $\beta\,\mathbb{E}_h[1-\tanh^2(\beta(m^*+h))] = 1$ corresponds to a second-order phase
transition and will not be considered here.)
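For a given realisation of the fields, the critical-point equation (14.2.13) and the classification (14.2.18)–(14.2.19) (with the empirical average over $i\in\Lambda$ in place of $\mathbb{E}_h$) can be explored numerically. The sketch below is illustrative only: the two-valued field sample used in the usage note, the grid-scan-plus-bisection strategy and the function names are assumptions of this example.

```python
import math


def mean_tanh(m, beta, fields):
    """(1/N) sum_i tanh(beta (m + h_i)), the right-hand side of (14.2.13)."""
    return sum(math.tanh(beta * (m + h)) for h in fields) / len(fields)


def mean_sech2(m, beta, fields):
    """(1/N) sum_i [1 - tanh^2(beta (m + h_i))], as it appears in (14.2.15)."""
    return sum(1 - math.tanh(beta * (m + h)) ** 2 for h in fields) / len(fields)


def critical_points(beta, fields, grid=2000):
    """Solutions of (14.2.13) in (-1, 1), labelled by the sign of a(m*) in (14.2.15)."""
    g = lambda m: mean_tanh(m, beta, fields) - m
    xs = [-1 + 2 * i / grid for i in range(grid + 1)]
    points = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) < 0:  # a sign change brackets a root
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            m = 0.5 * (lo + hi)
            label = "minimum" if beta * mean_sech2(m, beta, fields) < 1 else "maximum"
            points.append((m, label))
    return points
```

As a usage example, `critical_points(2.0, [-0.2] * 40 + [0.1] * 60)` treats a hypothetical sample of 100 fields taking two values and is expected to return two minima separated by a maximum.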

Proposition 14.1 Let $m^*$ be a critical point of $F_{\beta,N}$. Then $\mathbb{P}_h$-a.s., for all but finitely
many values of $N$,

$$Z_{\beta,N}\,Q_{\beta,N}(m^*) = \frac{\exp[-\beta N F_{\beta,N}(m^*)]\,[1+o(1)]}{\sqrt{\frac{N\pi}{2}\,\big|\mathbb{E}_h\big[1-\tanh^2(\beta(m^*+h))\big]\big|}} \tag{14.2.20}$$

with

$$F_{\beta,N}(m^*) = \frac12(m^*)^2 - \frac{1}{\beta N}\sum_{i\in\Lambda}\ln\cosh\big(\beta(m^*+h_i[\omega])\big). \tag{14.2.21}$$

The above observations provide a detailed picture of the distribution of the order
parameter. Note that $m^*$ depends on $\omega$.

14.3 Glauber dynamics

Next we add dynamics. As in Chap. 13, we consider the discrete-time Glauber dynamics
with Metropolis transition probabilities (compare with (13.1.4))

$$p_N[\omega](\sigma,\sigma') = \begin{cases} N^{-1}\exp\big[-\beta[H_N[\omega](\sigma')-H_N[\omega](\sigma)]_+\big], & \text{if } \|\sigma-\sigma'\|_1 = 2,\\ 0, & \text{if } \|\sigma-\sigma'\|_1 > 2,\\ 1-\sum_{\eta\neq\sigma}p_N[\omega](\sigma,\eta), & \text{if } \sigma=\sigma'. \end{cases} \tag{14.3.1}$$

We write $\mathbb{P}_\sigma[\omega] = \mathbb{P}_\sigma$ for the law of this Markov process (for a given realisation of
the magnetic fields) starting in $\sigma$. Note that this dynamics is ergodic and reversible
with respect to the Gibbs measure $\mu_{\beta,N}[\omega]$ for each $\omega$.
A heuristic picture for the metastable behaviour of systems like the random-field
Curie-Weiss model is based on replacing the full Markov process on $S_N$ by
an effective Markov process for the order parameter, i.e., by a nearest-neighbour
random walk on $\Gamma_N$ with transition probabilities that are reversible with respect to
the induced measure $Q_{\beta,N}$. The ensuing model can be solved exactly. A natural
choice for the transition rates of the effective dynamics is

$$r_N[\omega](m,m') = \frac{1}{Q_{\beta,N}[\omega](m)}\sum_{\sigma\in S_N:\,m_N(\sigma)=m}\mu_{\beta,N}[\omega](\sigma)\sum_{\sigma'\in S_N:\,m_N(\sigma')=m'}p_N[\omega](\sigma,\sigma'), \tag{14.3.2}$$

which are different from zero only when $m' = m-2/N,\,m,\,m+2/N$. The ensuing
Markov process is a one-dimensional nearest-neighbour random walk, for which
most quantities of interest can be computed explicitly by elementary means, as in
Chap. 13. In particular, it is easy to show that if $M$ is the global minimum of $F_{\beta,N}$
and $m^*$ is a local minimum, then, as in Theorem 13.1,

$$\mathbb{E}_{m^*}[\tau_M] = \exp\big(\beta N\big[F_{\beta,N}(z^*)-F_{\beta,N}(m^*)\big]\big)\;\frac{2}{1-z^*}\,\frac{2\pi N/4}{\beta[-a(z^*)]}\sqrt{\frac{\beta\,\mathbb{E}_h[1-\tanh^2(\beta(z^*+h))]-1}{1-\beta\,\mathbb{E}_h[1-\tanh^2(\beta(m^*+h))]}}\;\big[1+o(1)\big], \tag{14.3.3}$$

where $z^*$ is the saddle point between $M$ and $m^*$, and $a(z^*)$ is defined in (14.2.15).
However, the prediction of this naive approximation produces the wrong prefactor,
as is shown in our main theorem below.
To obtain precise results, we will need to introduce an exact lumping in the sense
of Sect. 9.2.

14.4 Coarse-graining

So far we did not need any assumption on the distribution of the random field. Now
we assume that the random field takes values in the finite set $I = \{b_1,\dots,b_n\}$.
Each realisation of the random field $\{h_i[\omega]\}_{i\in\Lambda}$ induces a random partition of the
set $\Lambda = \{1,\dots,N\}$ into subsets (see Fig. 14.1)

$$\Lambda_k[\omega] = \{i\in\Lambda:\ h_i[\omega] = b_k\}, \qquad k = 1,\dots,n. \tag{14.4.1}$$

Accordingly, we introduce $n$ order parameters

$$\boldsymbol m_k[\omega](\sigma) = \frac1N\sum_{i\in\Lambda_k[\omega]}\sigma_i, \qquad k = 1,\dots,n, \tag{14.4.2}$$

and we denote by $\boldsymbol m[\omega]$ the $n$-dimensional vector $(\boldsymbol m_1[\omega],\dots,\boldsymbol m_n[\omega])$. In the sequel
we will use the convention that boldface symbols denote $n$-dimensional vectors and
their components, while the sum of the components is denoted by the corresponding
plain symbol, e.g. $m[\omega] = \sum_{k=1}^n\boldsymbol m_k[\omega]$. The vector $\boldsymbol m$ takes values in the set

$$\Gamma_N^n[\omega] = \mathop{\times}_{k=1}^{n}\Big\{-\rho_{N,k}[\omega],\,-\rho_{N,k}[\omega]+\tfrac2N,\,\dots,\,\rho_{N,k}[\omega]-\tfrac2N,\,\rho_{N,k}[\omega]\Big\}, \tag{14.4.3}$$

where

$$\rho_k = \rho_{N,k}[\omega] = \frac{|\Lambda_k[\omega]|}{N}. \tag{14.4.4}$$

Fig. 14.1 Coarse-graining: $\Lambda$ is partitioned into sets where the magnetic field takes the same value

We denote by $\boldsymbol e_k$, $k = 1,\dots,n$, the lattice vectors of the set $\Gamma_N^n[\omega]$, i.e., the vectors
of length $2/N$ parallel to the unit vectors. Note that the random variables $\rho_{N,k}[\omega]$
concentrate exponentially fast in $N$ around their mean values $\mathbb{E}_h[\rho_{N,k}] = \mathbb{P}_h(h_1 = b_k) = p_k$.
In particular, we have the following lemma.

Lemma 14.2 For all $n\in\mathbb{N}$,

$$\mathbb{P}\Big[\exists\,N_n\ \forall\,N\ge N_n\ \forall\,1\le k\le n:\ |\rho_{N,k}-p_k|\le\tfrac12p_k\Big] = 1. \tag{14.4.5}$$

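The Hamiltonian takes the form

$$H_N[\omega](\sigma) = -N E\big(\boldsymbol m[\omega](\sigma)\big), \tag{14.4.6}$$

where $E:\mathbb{R}^n\to\mathbb{R}$ is the function

$$E(\boldsymbol x) = \frac12\Big(\sum_{k=1}^n\boldsymbol x_k\Big)^2 + \sum_{k=1}^n b_k\boldsymbol x_k. \tag{14.4.7}$$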
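The exact lumping behind (14.4.6)–(14.4.7) can be checked by enumeration for a small system. In the sketch below, which is our own illustration, the field values and their assignment to sites are hypothetical, and the function names are chosen for this example only.

```python
import itertools


def hamiltonian(sigma, fields):
    """RFCW Hamiltonian (14.1.1); note that sum_{i,j} s_i s_j = (sum_i s_i)^2."""
    n = len(sigma)
    return -sum(sigma) ** 2 / (2 * n) - sum(h * s for h, s in zip(fields, sigma))


def block_magnetisations(sigma, fields, values):
    """m_k(sigma) of (14.4.2): (1/N) times the spin sum over sites with field values[k]."""
    n = len(sigma)
    return [sum(s for s, h in zip(sigma, fields) if h == b) / n for b in values]


def mesoscopic_energy(x, values):
    """E(x) of (14.4.7)."""
    return 0.5 * sum(x) ** 2 + sum(b * xk for b, xk in zip(values, x))


def max_discrepancy(fields, values):
    """Largest |H_N(sigma) + N E(m(sigma))| over all configurations."""
    n = len(fields)
    worst = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        x = block_magnetisations(sigma, fields, values)
        worst = max(worst, abs(hamiltonian(sigma, fields) + n * mesoscopic_energy(x, values)))
    return worst
```

For instance, with `values = (-0.5, 0.2, 0.7)` and `fields = [values[i % 3] for i in range(8)]`, the call `max_discrepancy(fields, values)` should return a value at the level of rounding error.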

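The equilibrium distribution of the random variables $\boldsymbol m[\omega](\sigma)$ is given by

$$Q_{\beta,N}[\omega](\boldsymbol x) = \mu_{\beta,N}[\omega]\big(\boldsymbol m[\omega](\sigma) = \boldsymbol x\big) = \frac{1}{Z_N[\omega]}\,e^{\beta N E(\boldsymbol x)}\sum_{\sigma\in S_N}2^{-N}1_{\{\boldsymbol m[\omega](\sigma)=\boldsymbol x\}}, \qquad \boldsymbol x\in\Gamma_N^n[\omega], \tag{14.4.8}$$

where $Z_N[\omega]$ is the normalising partition function. We use the same symbols $Q_{\beta,N}$,
$F_{\beta,N}$ for functions defined on the $n$-dimensional variables $\boldsymbol x$. Since we distinguish
vectors from scalars by using boldface type, there should be no confusion possible.
Similarly, for a mesoscopic subset $\boldsymbol A\subseteq\Gamma_N^n[\omega]$, we define its microscopic counterpart,

$$A = S_N[\boldsymbol A] = \{\sigma\in S_N:\ \boldsymbol m(\sigma)\in\boldsymbol A\}. \tag{14.4.9}$$

The vectors $(\boldsymbol m[\omega](\sigma(t)))_{t\in\mathbb{R}_+}$ form a Markov process with transition rates

$$r_N[\omega](\boldsymbol x,\boldsymbol x') = \frac{1}{Q_{\beta,N}[\omega](\boldsymbol x)}\sum_{\sigma\in S_N[\boldsymbol x]}\mu_{\beta,N}[\omega](\sigma)\sum_{\sigma'\in S_N[\boldsymbol x']}p[\omega](\sigma,\sigma'). \tag{14.4.10}$$

This can be easily inferred by checking the conditions of Theorem 9.5 in Sect. 9.2.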
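We can also check that the capacities of these processes are related. Let the sets
$A, B\subset S_N$ be defined in terms of the block variables $\boldsymbol m$. This means that, for some
$\boldsymbol A,\boldsymbol B\subseteq\Gamma_N^n$, $A = S_N[\boldsymbol A]$ and $B = S_N[\boldsymbol B]$. By symmetry under permutations that
leave the partition $\Lambda_k[\omega]$ invariant, we have

$$\begin{aligned} \mathrm{cap}(A,B) &= \inf_{h\in\mathcal{H}_{A,B}}\frac12\sum_{\sigma,\sigma'\in S_N}\mu_{\beta,N}[\omega](\sigma)\,p(\sigma,\sigma')\,\big[h(\sigma)-h(\sigma')\big]^2\\ &= \inf_{u\in\mathcal{G}_{\boldsymbol A,\boldsymbol B}}\frac12\sum_{\sigma,\sigma'\in S_N}\mu_{\beta,N}[\omega](\sigma)\,p(\sigma,\sigma')\,\big[u(\boldsymbol m(\sigma))-u(\boldsymbol m(\sigma'))\big]^2\\ &= \inf_{u\in\mathcal{G}_{\boldsymbol A,\boldsymbol B}}\frac12\sum_{\boldsymbol x,\boldsymbol x'\in\Gamma_N^n}\big[u(\boldsymbol x)-u(\boldsymbol x')\big]^2\sum_{\sigma\in S_N[\boldsymbol x]}\mu_{\beta,N}[\omega](\sigma)\sum_{\sigma'\in S_N[\boldsymbol x']}p(\sigma,\sigma')\\ &= \inf_{u\in\mathcal{G}_{\boldsymbol A,\boldsymbol B}}\frac12\sum_{\boldsymbol x,\boldsymbol x'\in\Gamma_N^n}Q_{\beta,N}[\omega](\boldsymbol x)\,r_N(\boldsymbol x,\boldsymbol x')\,\big[u(\boldsymbol x)-u(\boldsymbol x')\big]^2\\ &= \mathrm{CAP}(\boldsymbol A,\boldsymbol B), \end{aligned} \tag{14.4.11}$$

where

$$\begin{aligned} \mathcal{H}_{A,B} &= \big\{h: S_N\to[0,1]:\ h(\sigma) = 1\ \forall\,\sigma\in A,\ h(\sigma) = 0\ \forall\,\sigma\in B\big\},\\ \mathcal{G}_{\boldsymbol A,\boldsymbol B} &= \big\{u:\Gamma_N^n\to[0,1]:\ u(\boldsymbol x) = 1\ \forall\,\boldsymbol x\in\boldsymbol A,\ u(\boldsymbol x) = 0\ \forall\,\boldsymbol x\in\boldsymbol B\big\}, \end{aligned} \tag{14.4.12}$$

and we use the symbol CAP for capacity on $\Gamma_N^n$.

Most of the interesting issues on the dynamics of the model can now be derived
directly from the dynamics on the mesoscopic variables. But for the latter we are
now in the setting of Chap. 10 and can harvest the results obtained there. All that is
left to do is to analyse the specific energy landscape for the present models.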

Theorem 14.3 (Metastable sets) Let $\mathcal{M}_N$ be the set of (best lattice approximations
of) the local minima of the function $F_{\beta,N}$. Then $\mathcal{M}_N$ is a metastable set in the sense
of Definition 8.2 for the induced dynamics with transition rates given by $r_N$.

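Theorem 14.4 (Mean metastable exit times) Let $\boldsymbol x\in\mathcal{M}_N$. Let $\mathcal{M}_{\boldsymbol x}$ be the set of
local minima where $F_{\beta,N}$ is smaller than or equal to $F_{\beta,N}(\boldsymbol x)$. For every $\sigma\in S[\boldsymbol x]$
and $\boldsymbol x\in\mathcal{M}_N$, $\mathbb{P}_h$-a.s. for all but finitely many values of $N$,

$$\mathbb{E}_\sigma\big[\tau_{S[\mathcal{M}_{\boldsymbol x}]}\big] = \exp\big(\beta N\big[F_{\beta,N}(z^*)-F_{\beta,N}(x^*)\big]\big)\;\frac{\pi N}{2\beta[-\bar\gamma_1]}\sqrt{\frac{\beta\,\mathbb{E}_h[1-\tanh^2(\beta(z^*+h))]-1}{1-\beta\,\mathbb{E}_h[1-\tanh^2(\beta(x^*+h))]}}\;\big[1+o(1)\big], \tag{14.4.13}$$

where $x^* = \sum_{\ell=1}^n\boldsymbol x_\ell$, $z^* = \sum_{\ell=1}^n\boldsymbol z_\ell$, $\boldsymbol z$ is the saddle point between $\boldsymbol x$ and $\mathcal{M}_{\boldsymbol x}$, and
$\bar\gamma_1$ is the unique negative solution of the equation

$$\mathbb{E}_h\left[\frac{[1-\tanh(\beta(z^*+h))]\exp[-2\beta(z^*+h)_+]}{\dfrac{\exp[-2\beta(z^*+h)_+]}{\beta[1+\tanh(\beta(z^*+h))]}-2\gamma}\right] = 1. \tag{14.4.14}$$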

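Theorem 14.5 (Exponential law) With the notation of Theorem 14.4,

$$\lim_{N\to\infty}\mathbb{P}_\sigma\big[\tau_{S[\mathcal{M}_{\boldsymbol x}]}/\mathbb{E}_\sigma[\tau_{S[\mathcal{M}_{\boldsymbol x}]}] > t\big] = e^{-t}, \qquad t\ge0,\ \text{a.s.} \tag{14.4.15}$$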

The proofs of these theorems are given in Sect. 14.7.

Note that, by (14.2.21),

$$F_{\beta,N}(z^*)-F_{\beta,N}(m^*) = \frac{(z^*)^2-(m^*)^2}{2} - \frac{1}{\beta N}\sum_{i\in\Lambda}\Big[\ln\cosh\big(\beta(z^*+h_i)\big)-\ln\cosh\big(\beta(m^*+h_i)\big)\Big] \tag{14.4.16}$$

has random fluctuations of order $N^{-1/2}$, which lead to strong fluctuations in the
metastable crossover time with respect to the disorder variables $h_i$, $i\in\Lambda$.

14.5 The landscape near critical points

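We are very close to the setting of Chap. 10. To complete the connection we need
to analyse the measures $Q_{\beta,N}[\omega](\boldsymbol x)$. We henceforth suppress $\omega$ from the notation.
Note that

$$Z_{\beta,N}\,Q_{\beta,N}(\boldsymbol x) = \exp\Big(N\beta\Big[\frac12\Big(\sum_{\ell=1}^n\boldsymbol x_\ell\Big)^2+\sum_{\ell=1}^n\boldsymbol x_\ell b_\ell\Big]\Big)\prod_{\ell=1}^nZ_N^\ell(\boldsymbol x_\ell/\rho_\ell), \tag{14.5.1}$$

where

$$Z_N^\ell(y) = \sum_{\sigma\in S_{\Lambda_\ell}}2^{-|\Lambda_\ell|}\,1_{\{|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\sigma_i = y\}}. \tag{14.5.2}$$

For $y\in(-1,1)$, $Z_N^\ell(y)$ can be expressed, via an elementary asymptotics of binomial
coefficients, as

$$Z_N^\ell(y) = \frac{\exp[-|\Lambda_\ell|\,I(y)]}{\sqrt{\frac\pi2\,|\Lambda_\ell|\big/I''(y)}}\,\big[1+o(1)\big], \tag{14.5.3}$$

where $o(1)$ tends to zero as $|\Lambda_\ell|\to\infty$ and $I$ is Cramér's rate function (13.1.13)
(again we identify functions on $\Gamma_N^n$ with their natural extensions to $\mathbb{R}^n$). This means
that we can express the right-hand side of (14.5.1) as

$$Z_{\beta,N}\,Q_{\beta,N}(\boldsymbol x) = \prod_{\ell=1}^n\sqrt{\frac{I''(\boldsymbol x_\ell/\rho_\ell)/\rho_\ell}{N\pi/2}}\;\exp\big[-N\beta F_{\beta,N}(\boldsymbol x)\big]\,\big[1+o(1)\big], \tag{14.5.4}$$

where

$$F_{\beta,N}(\boldsymbol x) = -\frac12\Big(\sum_{\ell=1}^n\boldsymbol x_\ell\Big)^2-\sum_{\ell=1}^n\boldsymbol x_\ell b_\ell+\frac1\beta\sum_{\ell=1}^n\rho_\ell\,I(\boldsymbol x_\ell/\rho_\ell). \tag{14.5.5}$$

The critical points $\boldsymbol z^*$ of $F_{\beta,N}$ are solutions of the equation

$$\sum_{j=1}^n\boldsymbol z_j^*+b_\ell = \beta^{-1}I'(\boldsymbol z_\ell^*/\rho_\ell) = \beta^{-1}t_\ell^*, \tag{14.5.6}$$

or, with $z^* = \sum_{j=1}^n\boldsymbol z_j^*$,

$$\beta(z^*+b_\ell) = I'(\boldsymbol z_\ell^*/\rho_\ell) = t_\ell^*, \tag{14.5.7}$$

which implies

$$\boldsymbol z_\ell^*/\rho_\ell = \tanh\big(\beta(z^*+b_\ell)\big). \tag{14.5.8}$$

Summing over $\ell$, we see that $z^*$ must satisfy the equation

$$z^* = \frac1N\sum_{i\in\Lambda}\tanh\big(\beta(z^*+h_i)\big), \tag{14.5.9}$$

which coincides with (14.2.13) for the one-dimensional order parameter $m$.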

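The Hessian matrix $A(\boldsymbol z^*)$ at a critical point $\boldsymbol z^*$ has elements

$$\big(A(\boldsymbol z^*)\big)_{k\ell} = \frac{\partial^2F_{\beta,N}(\boldsymbol z^*)}{\partial\boldsymbol z_k\,\partial\boldsymbol z_\ell} = -1+\delta_{k,\ell}\,\beta^{-1}\rho_\ell^{-1}I''(\boldsymbol z_\ell^*/\rho_\ell) = -1+\delta_{\ell,k}\hat\lambda_\ell, \tag{14.5.10}$$

where the random numbers $\hat\lambda_\ell$ are given by

$$\hat\lambda_\ell = \frac{1}{\beta\rho_\ell\big[1-\tanh^2(\beta(z^*+b_\ell))\big]}. \tag{14.5.11}$$

The determinant of the matrix $A(\boldsymbol z^*)$ has a simple expression, namely,

$$\begin{aligned} \det A(\boldsymbol z^*) &= \Big(1-\sum_{\ell=1}^n\frac{1}{\hat\lambda_\ell}\Big)\prod_{\ell=1}^n\hat\lambda_\ell\\ &= \Big(1-\frac\beta N\sum_{i\in\Lambda}\big[1-\tanh^2\big(\beta(z^*+h_i)\big)\big]\Big)\prod_{\ell=1}^n\hat\lambda_\ell\\ &= \Big(1-\beta\,\mathbb{E}_h\big[1-\tanh^2\big(\beta(z^*+h)\big)\big]\Big)\prod_{\ell=1}^n\hat\lambda_\ell\;\big[1+o(1)\big]. \end{aligned} \tag{14.5.12}$$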
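The first line of (14.5.12) is an instance of the matrix determinant lemma for a diagonal matrix minus the all-ones matrix. The short sketch below, offered as an illustration with hypothetical function names and arbitrary test values for the $\hat\lambda_\ell$, compares the closed form with a determinant computed by Gaussian elimination.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting (pure Python)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def hessian(lam):
    """A_{kl} = -1 + delta_{kl} * lam[k], as in (14.5.10)."""
    n = len(lam)
    return [[-1.0 + (lam[k] if k == l else 0.0) for l in range(n)] for k in range(n)]


def closed_form(lam):
    """(1 - sum_l 1/lam_l) * prod_l lam_l, the first line of (14.5.12)."""
    product = 1.0
    for value in lam:
        product *= value
    return (1.0 - sum(1.0 / value for value in lam)) * product
```

With `lam = [0.8, 1.7, 2.5, 4.0]`, both `determinant(hessian(lam))` and `closed_form(lam)` should agree up to rounding, and the common value is negative because $\sum_\ell 1/\hat\lambda_\ell > 1$ for this choice.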

Combining these observations, we arrive at the following proposition.

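Proposition 14.6 Let $\boldsymbol z^*$ be a critical point of $F_{\beta,N}$. Then $\boldsymbol z^*$ is given by (14.5.8)
with $z^*$ a solution of (14.5.9). Moreover,

$$Z_{\beta,N}\,Q_{\beta,N}(\boldsymbol z^*) = \sqrt{\frac{-\det(A(\boldsymbol z^*))}{\big(\frac{N\pi}{2\beta}\big)^n\big(\beta\,\mathbb{E}_h[1-\tanh^2(\beta(z^*+h))]-1\big)}}\;\exp\Big(\beta N\Big[-\frac12(z^*)^2+\frac{1}{\beta N}\sum_{i\in\Lambda}\ln\cosh\big(\beta(z^*+h_i)\big)\Big]\Big)\,\big[1+o(1)\big]. \tag{14.5.13}$$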

Proof The proof of the analogous result is given in Chap. 15.

14.6 Eigenvalues of the Hessian

We next describe the eigenvalues of the Hessian matrix A(z∗ ).

Lemma 14.7 Let $z^*$ be a solution of (14.5.9). In addition, assume that all numbers
$\hat\lambda_k$ are distinct. Then $\gamma$ is an eigenvalue of $A(\boldsymbol z^*)$ if and only if it is a solution of the
equation

$$\sum_{\ell=1}^n\frac{1}{\dfrac{1}{\beta\rho_\ell[1-\tanh^2(\beta(z^*+b_\ell))]}-\gamma} = 1. \tag{14.6.1}$$

Moreover, (14.6.1) has at most one negative solution, and it has such a solution if
and only if

$$\beta\,\mathbb{E}_h\big[1-\tanh^2\big(\beta(z^*+h)\big)\big] > 1. \tag{14.6.2}$$

Fig. 14.2 Correspondence of the 1-dimensional and $n$-dimensional landscapes

Proof To find the eigenvalues of $A$, simply replace $\hat\lambda_k$ by $\hat\lambda_k-\gamma$ in the first line of
(14.5.12). This gives

$$\det\big(A(\boldsymbol z^*)-\gamma\big) = \Big(1-\sum_{\ell=1}^n\frac{1}{\hat\lambda_\ell-\gamma}\Big)\prod_{\ell=1}^n(\hat\lambda_\ell-\gamma), \tag{14.6.3}$$

provided none of the $\hat\lambda_\ell-\gamma$ is zero. Then (14.6.1) is just the requirement that the
first factor in the right-hand side of (14.6.3) vanishes. It is easy to see that, under the
hypothesis of the lemma, this equation has $n$ solutions, and that exactly one of them
is negative under the hypothesis in (14.6.2).
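The root structure of the secular equation (14.6.1) can be made explicit numerically. The sketch below is an illustration under assumptions: it takes distinct positive test values for the $\hat\lambda_\ell$, brackets one root below the smallest $\hat\lambda_\ell$ and one between each pair of consecutive values, and checks each root $\gamma$ against the candidate eigenvector $u_k = 1/(\hat\lambda_k-\gamma)$. The function names and the bracketing margin are choices made for this example.

```python
def secular(gamma, lam):
    """Left-hand side of (14.6.1) minus 1, written with lam[l] = lambda-hat_l."""
    return sum(1.0 / (value - gamma) for value in lam) - 1.0


def bisect(func, lo, hi, steps=200):
    """Bisection on [lo, hi]; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def eigenvalues(lam, margin=1e-9):
    """Roots of the secular equation; assumes the entries of lam are distinct and positive."""
    lam = sorted(lam)
    n = len(lam)
    f = lambda gamma: secular(gamma, lam)
    # Below lam[0] the function increases from -1 (at -infinity) to +infinity.
    brackets = [(lam[0] - n - 1.0, lam[0] - margin)]
    # Between consecutive poles it increases from -infinity to +infinity.
    brackets += [(a + margin, b - margin) for a, b in zip(lam, lam[1:])]
    return [bisect(f, lo, hi) for lo, hi in brackets]


def residual(gamma, lam):
    """Max-norm of A u - gamma u for u_k = 1/(lam_k - gamma), A_{kl} = -1 + delta_{kl} lam_k."""
    u = [1.0 / (value - gamma) for value in lam]
    total = sum(u)
    return max(abs(-total + value * uk - gamma * uk) for value, uk in zip(lam, u))
```

For `lam = [0.8, 1.7, 2.5, 4.0]`, where $\sum_\ell 1/\hat\lambda_\ell > 1$, `eigenvalues(lam)` should contain exactly one negative entry, and the roots should sum to the trace $\sum_\ell\hat\lambda_\ell - n$.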

14.7 Topology of the landscape

From the analysis of the critical points of $F_{\beta,N}$ it follows that the landscape of this
function is closely linked to the one-dimensional landscape described in Sect. 11.1
(see Fig. 14.2). We collect the following features:
(i) Let $m_1^*<z_1^*<m_2^*<z_2^*<\dots<z_k^*<m_{k+1}^*$ be the sequence of minima, respectively,
maxima of the one-dimensional function $F_{\beta,N}$ defined in (14.2.10). To
each minimum $m_i^*$ corresponds a minimum $\boldsymbol m_i^*$ of $F_{\beta,N}$ such that $\sum_{\ell=1}^n\boldsymbol m_{i,\ell}^* = m_i^*$,
and to each maximum $z_i^*$ corresponds a saddle point $\boldsymbol z_i^*$ of $F_{\beta,N}$ such that
$\sum_{\ell=1}^n\boldsymbol z_{i,\ell}^* = z_i^*$.
(ii) For any value $m$ of the total magnetisation, the function $F_{\beta,N}(\boldsymbol x)$ takes its
relative minimum on the set $\{\boldsymbol y:\ \sum_\ell\boldsymbol y_\ell = m\}$ at the point $\hat{\boldsymbol x}\in\mathbb{R}^n$ determined
(coordinate-wise) by the equation

$$\hat{\boldsymbol x}_\ell(m) = \frac1N\sum_{i\in\Lambda_\ell}\tanh\big(\beta(m+a+h_i)\big) = \rho_\ell\tanh\big(\beta(m+a+b_\ell)\big), \tag{14.7.1}$$

where $a = a(m)$ is determined by the equation

$$m = \frac1N\sum_{i\in\Lambda}\tanh\big(\beta(m+a+h_i)\big) = \sum_{\ell=1}^n\rho_\ell\tanh\big(\beta(m+a+b_\ell)\big). \tag{14.7.2}$$

Moreover,

$$F_{\beta,N}(m)\le F_{\beta,N}(\hat{\boldsymbol x})\le F_{\beta,N}(m)+O(n\ln N/N). \tag{14.7.3}$$

Remark 14.8 Note that the minimal energy curves $\hat{\boldsymbol x}(\cdot)$ defined by (14.7.1) pass
through the minima and the saddle points, but in general are not integral curves of
the gradient flow connecting them. Also note that, since we assume that the random
fields $h_i$ have bounded support, for every $\delta>0$ there exist two universal constants
$0<c_1\le c_2<\infty$ such that

$$c_1\rho_\ell\le\frac{d\hat{\boldsymbol x}_\ell(m)}{dm}\le c_2\rho_\ell, \tag{14.7.4}$$

uniformly in $N$, $m\in[-1+\delta,1-\delta]$ and $\ell = 1,\dots,n$.

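Finally, in order to apply the results from Chap. 10, we need the form of the
transition rates $r$ near a saddle point $\boldsymbol z^*$. For $\sigma\in S_N$, put

$$\Lambda_k^\pm(\sigma) = \{i\in\Lambda_k:\ \sigma(i) = \pm1\}. \tag{14.7.5}$$

Note that $|\Lambda_k^\pm(\sigma)| = \frac N2\big(\rho_k\pm\boldsymbol x_k(\sigma)\big)$ depends on $\sigma$ only through $\boldsymbol x = \boldsymbol m(\sigma)$. For
all $\boldsymbol x\in\Gamma_N^n$, we have

$$r_N(\boldsymbol x,\boldsymbol x+\boldsymbol e_\ell) = Q_{\beta,N}(\boldsymbol x)^{-1}\sum_{\sigma\in S_N[\boldsymbol x]}\mu_{\beta,N}[\omega](\sigma)\sum_{i\in\Lambda_\ell^-(\sigma)}p(\sigma,\sigma^i) = \frac{|\Lambda_\ell^-(\boldsymbol x)|}{N}\,e^{-2\beta[x-\frac1N+b_\ell]_+}. \tag{14.7.6}$$

Define, as in Eq. (10.2.8), the sets

$$D_N(\rho) = \big\{\boldsymbol x\in\Gamma_N^n:\ |\boldsymbol z_\ell^*-\boldsymbol x_\ell|<\rho\ \ \forall\,\ell = 1,\dots,n\big\},$$

with $\rho = C\sqrt{N^{-1}\ln N}$.
It follows easily that, for all $\boldsymbol x\in D_N(\rho)$,

$$\left|\frac{r_N(\boldsymbol x,\boldsymbol x+\boldsymbol e_\ell)}{r_N(\boldsymbol z^*,\boldsymbol z^*+\boldsymbol e_\ell)}-1\right|\le c\beta n\rho, \tag{14.7.7}$$

for some finite constant $c>0$. Thus, as in Chap. 10, we replace the Dirichlet form
near the saddle point by a simplified one, where

$$\tilde r(\boldsymbol x,\boldsymbol x+\boldsymbol e_\ell) = r_N(\boldsymbol z^*,\boldsymbol z^*+\boldsymbol e_\ell) = r_\ell, \qquad \tilde r(\boldsymbol x+\boldsymbol e_\ell,\boldsymbol x) = r_\ell\,\frac{\tilde Q_{\beta,N}(\boldsymbol x)}{\tilde Q_{\beta,N}(\boldsymbol x+\boldsymbol e_\ell)}, \tag{14.7.8}$$

are the modified rates of a dynamics on $D_N(\rho)$ that is reversible w.r.t. the measure
$\tilde Q_{\beta,N}(\boldsymbol x)$. Let $\tilde L_N$ denote the corresponding generator. For $u\in\mathcal{G}_{\boldsymbol A,\boldsymbol B}$, we write the
corresponding Dirichlet form as

$$\tilde{\mathcal{E}}_{D_N}(u,u) = Q_{\beta,N}(\boldsymbol z^*)\sum_{\boldsymbol x\in D_N(\rho)}\sum_{\ell=1}^nr_\ell\,e^{-\beta N((\boldsymbol x-\boldsymbol z^*),A(\boldsymbol z^*)(\boldsymbol x-\boldsymbol z^*))}\big[u(\boldsymbol x)-u(\boldsymbol x+\boldsymbol e_\ell)\big]^2. \tag{14.7.9}$$

We now have all the ingredients needed to apply the results of Chap. 10. The
only difference is that the free energy functional $F_{\beta,N}$ is random and depends on $N$.
But this presents no obstacle. What is still needed is the computation of the relevant
eigenvalues and eigenfunctions of the matrix $B$ defined in (10.1.6).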

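Lemma 14.9 Let $z^*$ be a solution of (14.5.9) and assume in addition that

$$\beta\,\mathbb{E}_h\big[1-\tanh^2\big(\beta(z^*+h)\big)\big] > 1. \tag{14.7.10}$$

Then $\boldsymbol z^*$ defined through (14.5.8) is a saddle point, and the unique negative eigenvalue
of $B(\boldsymbol z^*)$ is the unique negative solution $\hat\gamma_1 = \hat\gamma_1(N,n)$ of the equation

$$\mathbb{E}\left[\frac{\beta\,[1-\tanh^2(\beta(z^*+h))]}{1-2\gamma\exp(2\beta[z^*+h]_+)\,\beta[1+\tanh(\beta(z^*+h))]}\right] = 1, \tag{14.7.11}$$

which is (14.4.14) with numerator and denominator multiplied by $\beta[1+\tanh(\beta(z^*+h))]\exp(2\beta[z^*+h]_+)$.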

Proof The particular form of the matrix $B$ allows us to obtain a simple characterisation
of all the eigenvalues and eigenvectors. The eigenvalue equations can be written
as

$$-\sum_{\ell=1}^n\sqrt{r_\ell r_k}\,u_\ell+(r_k\hat\lambda_k-\gamma)u_k = 0 \qquad \forall\,1\le k\le n. \tag{14.7.12}$$

Assume, for simplicity, that the $r_k\hat\lambda_k$ take distinct values. Then there is no non-trivial
solution of these equations with $\gamma = r_k\hat\lambda_k$, and so $\sum_{\ell=1}^n\sqrt{r_\ell}\,u_\ell\neq0$. Thus,

$$u_k = \frac{\sqrt{r_k}\,\sum_{\ell=1}^n\sqrt{r_\ell}\,u_\ell}{r_k\hat\lambda_k-\gamma}. \tag{14.7.13}$$

Multiplying by $\sqrt{r_k}$ and summing over $k$, we find that $u_k$ is a solution if and only if
$\gamma$ satisfies the equation

$$\sum_{k=1}^n\frac{r_k}{r_k\hat\lambda_k-\gamma} = 1. \tag{14.7.14}$$

Inserting the expressions for $r_\ell$ from (14.7.6), $\boldsymbol z_k^*/\rho_k$ from (14.5.8) and $\hat\lambda_k$ from
(14.5.11) into (14.7.14), we obtain (14.7.11). Since the left-hand side of (14.7.14)
is monotone increasing in $\gamma$ as long as $\gamma\le0$ and tends to zero as $\gamma\to-\infty$, it follows
that there can be at most one negative solution of this equation, and such a solution
exists if and only if the left-hand side is larger than 1 for $\gamma = 0$.
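The criterion at the end of this proof translates directly into a small numerical routine for the negative root of (14.7.14). The sketch below is an illustration only: the rates $r_k$ and the numbers $\hat\lambda_k$ are arbitrary positive test values rather than quantities computed from the model, and the function names are hypothetical. The residual check uses the eigenvector $u_k = \sqrt{r_k}/(r_k\hat\lambda_k-\gamma)$ suggested by (14.7.13), for the matrix with entries $-\sqrt{r_kr_\ell}+\delta_{k\ell}r_k\hat\lambda_k$ read off from (14.7.12).

```python
import math


def negative_eigenvalue(r, lam):
    """Negative root of sum_k r_k / (r_k lam_k - gamma) = 1, see (14.7.14), or None."""
    f = lambda gamma: sum(rk / (rk * lk - gamma) for rk, lk in zip(r, lam)) - 1.0
    if f(0.0) <= 0.0:
        return None  # left-hand side at gamma = 0 does not exceed 1
    lo = -1.0
    while f(lo) > 0.0:  # f increases in gamma on (-inf, 0] and tends to -1 at -inf
        lo *= 2.0
    hi = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def residual(gamma, r, lam):
    """Max-norm of B u - gamma u with B_{kl} = -sqrt(r_k r_l) + delta_{kl} r_k lam_k."""
    n = len(r)
    u = [math.sqrt(rk) / (rk * lk - gamma) for rk, lk in zip(r, lam)]
    worst = 0.0
    for k in range(n):
        row = sum((-math.sqrt(r[k] * r[l]) + (r[k] * lam[k] if k == l else 0.0)) * u[l]
                  for l in range(n))
        worst = max(worst, abs(row - gamma * u[k]))
    return worst
```

For example, `negative_eigenvalue([0.2, 0.35, 0.1], [0.8, 1.7, 2.5])` should return a negative number, since $\sum_k 1/\hat\lambda_k > 1$ for these values, while replacing the second argument by `[4.0, 5.0, 6.0]` should return `None`.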

We can now give the proof of Theorems 14.3–14.5.

Proof The induced dynamics on the block-magnetisations is essentially of the form
of the discrete diffusions treated in Chap. 10. To obtain Theorem 14.4, it remains
to insert the particular expressions obtained above into the general form of Theorem 10.9.
Taking into account that the lattice spacing is $\varepsilon = 2/N$, while the $\varepsilon$ in the
exponent in the invariant measure is to be replaced by $1/N$, we get

$$\mathrm{cap}(A,B) = Q_{\beta,N}(\boldsymbol z^*)\,\frac{\beta|\hat\gamma_1|}{2\pi N}\Big(\frac{\pi N}{2\beta}\Big)^{n/2}\prod_{j=1}^n\sqrt{\frac{r_j}{|\hat\gamma_j|}}\;\Big[1+O\big((\ln N)^3/N\big)\Big]. \tag{14.7.15}$$

Using Proposition 14.6 and the formula for the mean hitting time in Theorem 8.15,
we get (14.4.13). Theorem 14.5 follows from Theorem 8.45.

14.8 Bibliographical notes

1. The random-field Curie-Weiss model was one of the original motivations in
Bovier, Eckhoff, Gayrard and Klein [33] (together with the so-called Hopfield
model) for the development of the potential-theoretic approach to metastability. It
was the main example given in that paper for the case where the distribution of the
random field is discrete. Earlier work on the dynamics of this model was done by
Matthieu and Picco [180] and Fontes, Matthieu and Picco [114].

2. The equilibrium behaviour of the RFCW-model was analysed first by Salinas and
Wereszinski [238], and later in more detail by Amaro de Matos, Baêta Segundo and
Perez [5] and Külske [158].

3. For solutions of (14.5.6), see Bovier, Eckhoff, Gayrard and Klein [33] or Bovier,
Bianchi and Ioffe [24]. It is straightforward to analyse the case where some of the
λ̂k ’s in Lemma 14.7 coincide.

4. Another model that can be analysed with the methods of this chapter is the
Glauber dynamics of the Hopfield model of neural networks (see Bovier and
Gayrard [36] for a review) with finitely many stored patterns. This was done in
the thesis of an der Heiden [6] under somewhat restrictive conditions.
Chapter 15
The Curie-Weiss Model with Random Magnetic
Field: Continuous Distributions

“Which contain the greatest amount of Science, do you think,

the books, or the minds?” . . . And I considered a minute before
replying: “If you mean living minds, I don’t think it’s possible to
decide. There is so much written Science that no living person
has ever read: and there is so much thought-out Science that
hasn’t yet been written”.
(Lewis Carroll, Sylvie and Bruno)

The random-field Curie-Weiss model with general distributions of the magnetic
fields is a key example where non-exact coarse-graining methods can be shown to
work efficiently in the context of the potential-theoretic approach. In Sect. 15.1 we
state the main results of this chapter. In Sect. 15.2 we set up the coarse-graining and
look at the energy landscape near critical points. In Sect. 15.3 we prove the upper
bound in Theorem 15.3, which is relatively easy. In Sect. 15.4 we prove the lower
bound, which is much harder. In Sect. 15.5 we combine Theorem 15.3 with estimates
on the harmonic function to complete the proof of Theorem 15.1.

15.1 Main results

We consider the same model with the same dynamics as in Chap. 14, but we drop
the assumption made in Sect. 14.4 that the random magnetic fields take on only
finitely many values. Instead we will only assume that the common distribution of
the random magnetic fields has bounded support. All the results from Sects. 14.1–
14.3 remain unchanged. What fails is the exact lumping procedure that allowed
us to realise the mesoscopic image of our Markov process as a discrete diffusion
process.
Our task is to obtain sharp estimates on metastable exit times. The main result
is formulated in the following theorem, whose proof is the content of the present
chapter.

Theorem 15.1 (Mean metastable exit times) Assume that β and the distribution of
the magnetic fields are such that there exist more than one local minimum of Fβ,N .


Let $m^*$ be a local minimum of $F_{\beta,N}$, $M=M(m^*)$ the set of minima of $F_{\beta,N}$ such
that $F_{\beta,N}(m)<F_{\beta,N}(m^*)$, and $z^*$ the minimax between $m^*$ and $M$, i.e., the lower
of the highest maxima separating $m^*$ from $M$ to the left, respectively, right. Then,
$\mathbb P_h$-a.s. and for all but finitely many values of $N$,
\[
E_{\nu_{S[m^*],S[M]}}\big[\tau_{S[M]}\big]=\exp\Big(\beta N\big[F_{\beta,N}(z^*)-F_{\beta,N}(m^*)\big]\Big)
\times\frac{\pi N}{2\beta[-\bar\gamma_1]}\sqrt{\frac{\beta E_h\big(1-\tanh^2(\beta(z^*+h))\big)-1}{1-\beta E_h\big(1-\tanh^2(\beta(m^*+h))\big)}}\;\big(1+o(1)\big), \tag{15.1.1}
\]
where $\bar\gamma_1$ is the unique negative solution of the equation
\[
E_h\left[\frac{\big(1-\tanh(\beta(z^*+h))\big)\exp\big(-2\beta[z^*+h]_+\big)}{\dfrac{\exp(-2\beta[z^*+h]_+)}{\beta\big(1+\tanh(\beta(z^*+h))\big)}-2\gamma}\right]=1. \tag{15.1.2}
\]

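Note that
\[
\exp\Big(\beta N\big[F_{\beta,N}(z^*)-F_{\beta,N}(m^*)\big]\Big)
=\exp\Big(\beta N\,\frac{(z^*)^2-(m^*)^2}{2}-\sum_{i\in\Lambda}\Big[\ln\cosh\big(\beta(z^*+h_i)\big)-\ln\cosh\big(\beta(m^*+h_i)\big)\Big]\Big). \tag{15.1.3}
\]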

Remark 15.2 Theorem 15.1 can be improved with the help of coupling techniques
in two ways. First, the starting measure νS[m∗ ],S[M] can be replaced by any configu-
ration σ in a suitably defined subset of S[m∗ ]. Second, the law of the transition time
can be shown to be asymptotically exponential. Both these results rely on rather
intricate and technical coupling arguments (see Sect. 15.6).

The proof of Theorem 15.1 relies on the following estimate for capacities.

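Theorem 15.3 (Capacity asymptotics) With the same notation as in Theorem 15.1,
\[
Z_{\beta,N}\,\mathrm{cap}\big(S[m^*],S[M]\big)=\frac{\beta|\bar\gamma_1|}{2\pi N}\,\frac{\exp\big[-\beta N F_{\beta,N}(z^*)\big]\,[1+o(1)]}{\sqrt{\beta E_h\big(1-\tanh^2(\beta(z^*+h))\big)-1}}. \tag{15.1.4}
\]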

15.2 Coarse-graining and the mesoscopic approximation

As in Chap. 14, we want to pass to a coarse-grained description. As exact lumping

is not possible, we use a sequence of approximate coarse-grainings.

Fig. 15.1 Coarse-graining: Λ is partitioned into sets where the magnetic field takes values in a
narrow interval. Compare with Fig. 14.1

15.2.1 Coarse-graining

Let $I$ denote the support of the common distribution of the random fields $h_i$. Let $I_\ell$,
$\ell\in\{1,\dots,n\}$, be a partition of $I$ such that $|I_\ell|\le C/n=\varepsilon$ for all $\ell$ and some
$C<\infty$. Each realisation of the random fields $\{h_i[\omega]\}_{i\in\mathbb N}$ induces a random partition of the set $\Lambda=\{1,\dots,N\}$ into subsets (see Fig. 15.1)
\[
\Lambda_\ell[\omega]=\big\{i\in\Lambda\colon h_i[\omega]\in I_\ell\big\},\qquad \ell=1,\dots,n. \tag{15.2.1}
\]
In complete analogy with the case of discrete distributions, we introduce $n$ order parameters
\[
m_\ell[\omega](\sigma)=\frac1N\sum_{i\in\Lambda_\ell[\omega]}\sigma_i,\qquad \ell=1,\dots,n. \tag{15.2.2}
\]

All notations from Sect. 14.4 carry over.

Remark 15.4 To simplify the presentation in this chapter, all statements involving
random variables on (Ω, F , Ph ) are understood to be true with Ph -probability one,
for all but finitely many values of N .

We define
\[
\bar h_\ell=\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}h_i,\qquad \tilde h_i=h_i-\bar h_\ell,\quad i\in\Lambda_\ell. \tag{15.2.3}
\]
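The coarse-graining in (15.2.1)–(15.2.3) is easy to mimic on a sample. The sketch below is only an illustration under stated assumptions: the fields are drawn uniformly from $[-1,1]$, the support is cut into $n$ intervals of equal width (one possible choice of the $I_\ell$), and all function names are hypothetical. It computes the blocks $\Lambda_\ell$, the order parameters $m_\ell$, and the centred fields $\tilde h_i$, whose size is bounded by the interval width.

```python
import random

def coarse_grain(h, n):
    """Partition the indices 0..N-1 into n blocks according to the value of h_i.

    The support [min h, max h] is cut into n intervals of equal width; this
    equal-width choice is an assumption made for the illustration.
    """
    h_min, h_max = min(h), max(h)
    width = (h_max - h_min) / n            # plays the role of eps = C/n
    blocks = [[] for _ in range(n)]
    for i, hi in enumerate(h):
        ell = min(int((hi - h_min) / width), n - 1)   # last interval is closed
        blocks[ell].append(i)
    return blocks, width

def block_magnetisations(sigma, blocks):
    """Order parameters m_ell = (1/N) * sum of sigma_i over the block, cf. (15.2.2)."""
    N = len(sigma)
    return [sum(sigma[i] for i in block) / N for block in blocks]

def centred_fields(h, blocks):
    """h_tilde_i = h_i - (block average of h), cf. (15.2.3)."""
    h_tilde = [0.0] * len(h)
    for block in blocks:
        if not block:
            continue                        # an interval may contain no field
        h_bar = sum(h[i] for i in block) / len(block)
        for i in block:
            h_tilde[i] = h[i] - h_bar
    return h_tilde

random.seed(0)
N, n = 200, 8
h = [random.uniform(-1.0, 1.0) for _ in range(N)]
sigma = [random.choice((-1, 1)) for _ in range(N)]
blocks, eps = coarse_grain(h, n)
m = block_magnetisations(sigma, blocks)
h_tilde = centred_fields(h, blocks)
```

The block magnetisations add up to the total magnetisation, and every centred field satisfies $|\tilde h_i|\le\varepsilon$, which is the only property of the partition used later.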

The Hamiltonian can then be written in the form
\[
H_N[\omega](\sigma)=-N E\big(m[\omega](\sigma)\big)-\sum_{\ell=1}^{n}\sum_{i\in\Lambda_\ell}\sigma_i\tilde h_i[\omega], \tag{15.2.4}
\]
where $E\colon\mathbb R^n\to\mathbb R$ is as in (14.4.7), but with $\bar h_\ell$ replacing $b_\ell$. We define the equilibrium distribution of the variables $m[\sigma]$ as in (14.4.8), but write it in a slightly different form:
\[
Q_{\beta,N}[\omega](x)=\frac{1}{Z_N[\omega]}\,e^{\beta N E(x)}\,E_\sigma\Big[1_{\{m[\omega](\sigma)=x\}}\,e^{\beta\sum_{\ell=1}^{n}\sum_{i\in\Lambda_\ell}\sigma_i(h_i-\bar h_\ell)}\Big]. \tag{15.2.5}
\]

15.2.2 The energy landscape near critical points

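We now turn to the precise computation of the measures $Q_{\beta,N}[\omega](x)$ in the neighbourhood of the critical points of $F_{\beta,N}[\omega](x)$. We will see that this goes very much along the lines of the analysis for discrete distributions. We get the same expression for $Z_{\beta,N}[\omega]Q_{\beta,N}[\omega](x)$ as in (14.5.1), again with $b_\ell$ replaced by $\bar h_\ell$, and
\[
Z^{\ell}_{\beta,N}[\omega](y)=E_{\sigma_{\Lambda_\ell}}\Big[\exp\Big(\beta\sum_{i\in\Lambda_\ell}\tilde h_i\sigma_i\Big)1_{\{|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\sigma_i=y\}}\Big]
=E^{\tilde h}_{\sigma_{\Lambda_\ell}}\Big[1_{\{|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\sigma_i=y\}}\Big]. \tag{15.2.6}
\]
As in Sect. 14.5, we can express $Z_{\beta,N}[\omega]Q_{\beta,N}[\omega](x)$ in the form of (14.5.4) with $F_{\beta,N}$ given by (14.5.5), but $b_\ell$ replaced by $\bar h_\ell$, where the entropy function $I_{N,\ell}[\omega](y)$ is now defined as the Legendre-Fenchel transform of the log-moment-generating function,
\[
U_{N,\ell}[\omega](t)=\frac{1}{|\Lambda_\ell|}\ln E^{\tilde h}_{\sigma_{\Lambda_\ell}}\exp\Big(t\sum_{i\in\Lambda_\ell}\sigma_i\Big)
=\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\ln\cosh\big(t+\beta\tilde h_i\big). \tag{15.2.7}
\]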

The analysis of the free energy functions near critical points $\mathbf z^*$ of $F_{\beta,N}$ goes very much as in Sect. 14.5, with the obvious replacements. Using that, by standard properties of Legendre-Fenchel transforms, $I'_{N,\ell}(x)=(U'_{N,\ell})^{-1}(x)$, we see that (14.5.8) becomes
\[
z^*_\ell/\rho_\ell=U'_{N,\ell}\big(\beta(z^*+\bar h_\ell)\big)=\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\tanh\big(\beta(z^*+h_i)\big), \tag{15.2.8}
\]
where $z^*$ again solves (14.5.9), which is independent of $n$.

Finally, using that $I''_{N,\ell}(z^*_\ell/\rho_\ell)=1/U''_{N,\ell}(t^*_\ell)$ at a critical point, we find that the Hessian matrix $A(\mathbf z^*)$ at a critical point $\mathbf z^*$ has elements
\[
\big(A(\mathbf z^*)\big)_{k\ell}=-1+\delta_{\ell,k}\hat\lambda_\ell, \tag{15.2.9}
\]
where
\[
\hat\lambda_\ell=\frac{I''_{N,\ell}(z^*_\ell/\rho_\ell)}{\beta\rho_\ell}
=\frac{1}{\beta\rho_\ell\,U''_{N,\ell}\big(\beta(z^*+\bar h_\ell)\big)}
=\frac{1}{\frac{\beta}{N}\sum_{i\in\Lambda_\ell}\big(1-\tanh^2(\beta(z^*+h_i))\big)}, \tag{15.2.10}
\]
which replaces (14.5.11).
The following is the analogue of Proposition 14.6.

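Proposition 15.5 Let $\mathbf z^*$ be a critical point of $F_{\beta,N}$. Then $\mathbf z^*$ is given by (15.2.8),
where $z^*$ is a solution of (14.5.9). Moreover,
\[
Z_{\beta,N}Q_{\beta,N}(\mathbf z^*)
=\frac{\sqrt{|\det(A(\mathbf z^*))|}}{\sqrt{\big(\frac{N\pi}{2\beta}\big)^{n}\,\big|\beta E_h\big(1-\tanh^2(\beta(z^*+h))\big)-1\big|}}
\times\exp\Big(\beta N\Big[-\frac{(z^*)^2}{2}+\frac{1}{\beta N}\sum_{i\in\Lambda}\ln\cosh\big(\beta(z^*+h_i)\big)\Big]\Big)\big(1+o(1)\big). \tag{15.2.11}
\]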

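Proof We start with the representation of $Z_{\beta,N}[\omega]Q_{\beta,N}[\omega](x)$ given in (14.5.4).
Using (15.2.10) and the formula for the determinant of $A(\mathbf z^*)$ given in (14.5.12), we
get the prefactor. For the exponential term $F_{\beta,N}$, note that by convex duality
\[
I_{N,\ell}(z^*_\ell/\rho_\ell)=t^*_\ell\,z^*_\ell/\rho_\ell-U_{N,\ell}(t^*_\ell)
=\beta(z^*+\bar h_\ell)\,z^*_\ell/\rho_\ell-U_{N,\ell}\big(\beta(z^*+\bar h_\ell)\big). \tag{15.2.12}
\]
Hence
\[
\begin{aligned}
F_{\beta,N}(\mathbf z^*)
&=-\frac12(z^*)^2-\sum_{\ell=1}^{n}z^*_\ell\bar h_\ell+\frac1\beta\sum_{\ell=1}^{n}\Big[\rho_\ell\,\beta(z^*+\bar h_\ell)\,z^*_\ell/\rho_\ell-\rho_\ell\,U_{N,\ell}\big(\beta(z^*+\bar h_\ell)\big)\Big]\\
&=-\frac12(z^*)^2-\sum_{\ell=1}^{n}\Big[z^*_\ell\bar h_\ell-z^*z^*_\ell-\bar h_\ell z^*_\ell+\frac{1}{\beta N}\sum_{i\in\Lambda_\ell}\ln\cosh\big(\beta(z^*+h_i)\big)\Big]\\
&=\frac12(z^*)^2-\frac{1}{\beta N}\sum_{i\in\Lambda}\ln\cosh\big(\beta(z^*+h_i)\big),
\end{aligned} \tag{15.2.13}
\]
which is the desired exponent.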

Remark 15.6 The form given in Proposition 15.5 is highly suitable for our purposes,
as the dependence on $n$ appears only in the denominator of the prefactor. We will
see that this is just what we need to get a formula for capacities that is independent
of the choice of the partition $I=\bigcup_{1\le\ell\le n}I_\ell$ and has a limit as $n\to\infty$.

The eigenvalues of the Hessian are characterised in the next lemma, which is the
analogue of Lemma 14.7.

Lemma 15.7 Let $z^*$ be a solution of (14.5.9). In addition, assume that the distribution of $(h_i)_{i\in\mathbb N}$ is such that all numbers $\hat\lambda_k$ are $\mathbb P_h$-a.s. distinct. Then $\gamma$ is an
eigenvalue of $A(\mathbf z^*)$ if and only if it is a solution of the equation
\[
\sum_{\ell=1}^{n}\frac{1}{\dfrac{1}{\frac{\beta}{N}\sum_{i\in\Lambda_\ell}\big(1-\tanh^2(\beta(z^*+h_i))\big)}-\gamma}=1. \tag{15.2.14}
\]
Moreover, (15.2.14) has at most one negative solution, and it has such a negative
solution if and only if
\[
\frac{\beta}{N}\sum_{i=1}^{N}\Big(1-\tanh^2\big(\beta(z^*+h_i)\big)\Big)>1. \tag{15.2.15}
\]

The proof of Lemma 15.7 is identical to that of Lemma 14.7.

The discussion of the energy landscape carries over unchanged from Chap. 14.
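As a hedged numerical illustration of the landscape analysis, the sketch below assumes that the one-dimensional free energy has the form appearing in (15.2.13), $F_{\beta,N}(z)=\tfrac12 z^2-\tfrac{1}{\beta N}\sum_i\ln\cosh(\beta(z+h_i))$, so that its critical points solve $z=\tfrac1N\sum_i\tanh(\beta(z+h_i))$ (obtained here by summing (15.2.8) over $\ell$; the exact form of (14.5.9) is not reproduced in this chapter and is therefore an assumption). Critical points are found by scanning for sign changes of $F'$ and are classified with the criterion (15.2.15). The field values are hypothetical.

```python
import math

def dF(z, beta, h):
    """Derivative of the assumed free energy: z - (1/N) sum_i tanh(beta (z + h_i))."""
    return z - sum(math.tanh(beta * (z + hi)) for hi in h) / len(h)

def curvature_sum(z, beta, h):
    """Left-hand side of (15.2.15): (beta/N) sum_i (1 - tanh^2(beta (z + h_i)))."""
    return beta * sum(1.0 - math.tanh(beta * (z + hi)) ** 2 for hi in h) / len(h)

def critical_points(beta, h, steps=2001, tol=1e-12):
    """Locate zeros of dF on [-1, 1] by a grid scan followed by bisection."""
    grid = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        fa, fb = dF(a, beta, h), dF(b, beta, h)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            while b - a > tol:
                mid = 0.5 * (a + b)
                fm = dF(mid, beta, h)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    return roots

# Hypothetical symmetric two-valued field sample, for illustration only.
beta = 2.0
h = [0.2, -0.2] * 50
roots = critical_points(beta, h)
is_saddle = [curvature_sum(z, beta, h) > 1.0 for z in roots]
```

For this symmetric sample the scan finds two minima and one critical point at the origin, and only the latter satisfies (15.2.15).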

15.3 Upper bounds on capacities

Sections 15.3–15.4 are devoted to the proof of Theorem 15.3. In this section we
prove the upper bound. Obtaining upper bounds on capacities just requires guessing
a test function. Basically, we may ignore the fact that our coarse-graining is not an
exact lumping, since (14.4.11) holds as an upper bound (only the second equality
has to be replaced by an inequality).
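Let $A=S_N[\mathbf A]$ and $B=S_N[\mathbf B]$, for some $\mathbf A,\mathbf B\subseteq\Gamma_N^n$. Then
\[
\begin{aligned}
\mathrm{cap}(A,B)&=\inf_{h\in H_{A,B}}\frac12\sum_{\sigma,\sigma'\in S_N}\mu_{\beta,N}[\omega](\sigma)\,p(\sigma,\sigma')\big[h(\sigma)-h(\sigma')\big]^2\\
&\le\inf_{u\in G_{A,B}}\frac12\sum_{\sigma,\sigma'\in S_N}\mu_{\beta,N}[\omega](\sigma)\,p(\sigma,\sigma')\big[u(m(\sigma))-u(m(\sigma'))\big]^2\\
&=\inf_{u\in G_{A,B}}\sum_{x,x'\in\Gamma_N^n}Q_{\beta,N}[\omega](x)\,r_N(x,x')\big[u(x)-u(x')\big]^2\\
&=\mathrm{CAP}^n_N(\mathbf A,\mathbf B)
\end{aligned} \tag{15.3.1}
\]
with $r_N(x,x')$, $H_{A,B}$ and $G_{A,B}$ defined precisely as in (14.4.10) and (14.4.12).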
We proceed from here as in Chap. 10. For this we need the form of the transition
rates in the neighbourhood of a critical point. The formulas from Chap. 14 have to
be changed only slightly: for all $x\in\Gamma_N^n$, we now have
\[
\begin{aligned}
r_N(x,x+e_\ell)&=Q_{\beta,N}(x)^{-1}\sum_{\sigma\in S_N[x]}\mu_{\beta,N}[\omega](\sigma)\sum_{i\in\Lambda^-_\ell(\sigma)}p(\sigma,\sigma^i)\\
&=Q_{\beta,N}(x)^{-1}\sum_{\sigma\in S_N[x]}\mu_{\beta,N}[\omega](\sigma)\sum_{i\in\Lambda^-_\ell(\sigma)}\frac1N\,e^{-2\beta[m(\sigma)-N^{-1}+h_i]_+}.
\end{aligned} \tag{15.3.2}
\]
Note that, for all $\sigma\in S_N(x)$, $|\Lambda^-_\ell(\sigma)|$ is a constant depending on $x$ only. Using that
$h_i=\bar h_\ell+\tilde h_i$, with $\tilde h_i\in[-\varepsilon,\varepsilon]$, we get the bounds
\[
r_N(x,x+e_\ell)=\frac{|\Lambda^-_\ell(x)|}{N}\,e^{-2\beta[m(\sigma)+\bar h_\ell]_+}\big(1+O(\varepsilon)\big). \tag{15.3.3}
\]
It follows that, for all $x\in D_N(\rho)$,
\[
\left|\frac{r_N(x,x+e_\ell)}{r_N(\mathbf z^*,\mathbf z^*+e_\ell)}-1\right|\le c\beta(\varepsilon+n\rho) \tag{15.3.4}
\]
for some finite constant $c>0$. With these minimal changes we arrive at the same
form of the effective Dirichlet form $\widetilde{\mathcal E}_{D_N}(u,u)$ as in (14.7.9).
From now on the upper bound follows as in the case of discrete magnetic fields.
There is just a slight change in that Lemma 14.9 needs to be replaced by the follow-
ing.

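Lemma 15.8 Let $z^*$ be a solution of (14.5.9). In addition, assume that
\[
\frac{\beta}{N}\sum_{i=1}^{N}\Big(1-\tanh^2\big(\beta(z^*+h_i)\big)\Big)>1. \tag{15.3.5}
\]
Then $\mathbf z^*$ defined through (15.2.8) is a saddle point, and the unique negative eigenvalue of $B(\mathbf z^*)$ is the unique negative solution $\hat\gamma_1=\hat\gamma_1(N,n)$ of the equation
\[
\sum_{\ell=1}^{n}\rho_\ell\,
\frac{\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\big(1-\tanh(\beta(z^*+h_i))\big)\exp\big(-2\beta[z^*+\bar h_\ell]_+\big)}
{\dfrac{\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\big(1-\tanh(\beta(z^*+h_i))\big)\exp\big(-2\beta[z^*+\bar h_\ell]_+\big)}{\frac{\beta}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\big(1-\tanh^2(\beta(z^*+h_i))\big)}-2\gamma}=1. \tag{15.3.6}
\]
Moreover,
\[
\lim_{n\to\infty}\lim_{N\to\infty}\hat\gamma_1(N,n)=\bar\gamma_1, \tag{15.3.7}
\]
where $\bar\gamma_1$ is the unique negative solution of the equation
\[
E_h\left[\frac{\big(1-\tanh(\beta(z^*+h))\big)\exp\big(-2\beta[z^*+h]_+\big)}{\dfrac{\exp(-2\beta[z^*+h]_+)}{\beta\big(1+\tanh(\beta(z^*+h))\big)}-2\gamma}\right]=1. \tag{15.3.8}
\]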

Proof The proof of (15.3.6) is identical to that of Lemma 14.9. The assertion on the
convergence follows from the fact that the size of the small fields tends to zero as
n → ∞.
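The limiting equation (15.3.8) can be solved numerically once the expectation $E_h$ is replaced by an average over a finite list of field values. The sketch below is an illustration under that assumption; the function names are hypothetical, and the argument $\beta(z^*+h)$ is assumed moderate so that $1+\tanh$ does not underflow. For $\gamma\le 0$ the left-hand side increases in $\gamma$, tends to zero as $\gamma\to-\infty$, and equals $\beta E_h(1-\tanh^2(\beta(z^*+h)))$ at $\gamma=0$, so a negative root exists exactly under the condition (15.3.5) in its averaged form.

```python
import math

def lhs_15_3_8(gamma, z, beta, h):
    """Empirical version of the left-hand side of (15.3.8), averaging over h."""
    total = 0.0
    for hi in h:
        x = beta * (z + hi)
        t = math.tanh(x)
        e = math.exp(-2.0 * max(x, 0.0))        # exp(-2 beta [z + h]_+)
        total += (1.0 - t) * e / (e / (beta * (1.0 + t)) - 2.0 * gamma)
    return total / len(h)

def gamma_bar(z, beta, h, tol=1e-12):
    """Negative solution of the empirical equation, or None if none exists."""
    if lhs_15_3_8(0.0, z, beta, h) <= 1.0:
        return None
    lo, hi = -1.0, 0.0
    while lhs_15_3_8(lo, z, beta, h) > 1.0:     # push lo below the root
        lo *= 2.0
    while hi - lo > tol:                        # bisection on gamma < 0
        mid = 0.5 * (lo + hi)
        if lhs_15_3_8(mid, z, beta, h) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Sanity check without random field: the equation reduces to
# 1 / (1/beta - 2 gamma) = 1, i.e. gamma = (1 - beta) / (2 beta).
g_no_field = gamma_bar(0.0, 2.0, [0.0])
g_two_point = gamma_bar(0.0, 2.0, [0.2, -0.2])
```

The field-free case serves as a check: at $z^*=0$ the root is $(1-\beta)/(2\beta)$, which is negative precisely when $\beta>1$.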

This result yields the upper bound given in the next proposition.

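Proposition 15.9 With the notation above, for every $n\in\mathbb N$,
\[
\mathrm{cap}(A,B)\le Q_{\beta,N}(\mathbf z^*)\,\frac{\beta|\hat\gamma_1|}{2\pi N}\Big(\frac{\pi N}{2\beta}\Big)^{n/2}\prod_{\ell=1}^{n}\sqrt{\frac{r_\ell}{|\hat\gamma_\ell|}}\;\Big(1+O\big(\varepsilon+\sqrt{(\ln N)^3/N}\big)\Big). \tag{15.3.9}
\]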

Combining Proposition 15.9 with Proposition 14.6, we get (after some computa-
tions) the following more explicit representation of the upper bound.

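Corollary 15.10 With the same notation as in Proposition 15.9,
\[
Z_{\beta,N}\,\mathrm{cap}(A,B)\le\frac{\beta|\bar\gamma_1|}{2\pi N}\,\frac{\exp\big(-\beta N F_{\beta,N}(z^*)\big)\,[1+o(1)]}{\sqrt{\beta E_h\big(1-\tanh^2(\beta(z^*+h))\big)-1}}, \tag{15.3.10}
\]
where $\bar\gamma_1$ is defined through (15.3.8).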

Corollary 15.10 concludes the upper bound in the proof of Theorem 15.3.

15.4 Lower bounds on capacities

To prove the matching lower bound is technically involved, because it requires a
more sophisticated use of the Berman-Konsowa principle in Theorem 7.43 that was
so successfully used in Sect. 10.3. We have seen in Chap. 14 that we can construct a
defective flow on mesoscopic variables that produces a good lower bound for capac-
ities (along the lines explained in Sect. 9.1.2). The strategy in the present situation
is to try to reproduce this mesoscopic flow on the microscopic level.

15.4.1 Two-scale flows

Let $\mathbf A$ and $\mathbf B$ be mesoscopic neighbourhoods of two minima of $F_{\beta,N}$, and let $\mathbf z^*$ be
the corresponding saddle point. Let $A=S_N[\mathbf A]$ and $B=S_N[\mathbf B]$ be as before. Let
$\mathbf f_{A,B}=\{\mathbf f_A,\mathbf f,\mathbf f_B\}$ be the defective mesoscopic flow from $\mathbf A$ to $\mathbf B$. In this section we
are going to construct a subordinate microscopic flow $f_{A,B}$ from $A$ to $B$. In the
sequel, given a microscopic bond $b=(\sigma,\sigma')$, we write $e(b)=(m(\sigma),m(\sigma'))$ to
denote its mesoscopic image.

We label the realisations of the mesoscopic Markov chain $X_{A,B}$ associated with
the mesoscopic flow $\mathbf f_{A,B}$ as $x=(x_{-\ell_A},\dots,x_{\ell_B})$ in such a way that $x_{-\ell_A}\in\mathbf A$,
$x_{\ell_B}\in\mathbf B$, and $m(x_0)=m(\mathbf z^*)$. We denote by $\mathbb P^{\mathbf f_{A,B}}$ the corresponding law on the
mesoscopic paths. If $e$ is a mesoscopic bond, then we write $e\in x$ when $e=(x_\ell,x_{\ell+1})$ for some $\ell=-\ell_A,\dots,\ell_B-1$. With each path $x$ of positive probability we associate a subordinate microscopic unit flow $f^x$ such that
\[
f^x(b)>0\quad\text{if and only if}\quad e(b)\in x, \tag{15.4.1}
\]
and
\[
\sum_{b\colon e(b)=e}f^x(b)=1\qquad\forall\,x,\ e\in x. \tag{15.4.2}
\]
Finally, since $f^x$ is a unit flow on $x$, it defines a Markov chain $\mathbb P^x$ on the microscopic
paths $\Sigma$ whose image is $x$, such that
\[
f^x(b)=\mathbb P^x(b\in\Sigma). \tag{15.4.3}
\]
We think of $\mathbb P^x$ as the conditional law, given $x$, of a Markov chain on the microscopic
paths, namely,
\[
\mathbb P^{f_{A,B}}(\Sigma=\sigma)=\sum_{x}\mathbb P^{\mathbf f_{A,B}}(X_{A,B}=x)\,\mathbb P^x(\Sigma=\sigma). \tag{15.4.4}
\]
Therefore we have
\[
\mathbb P^{f_{A,B}}(b\in\Sigma)=\sum_{x}\mathbb P^{\mathbf f_{A,B}}(X_{A,B}=x)\,\mathbb P^x(b\in\Sigma). \tag{15.4.5}
\]
Summing over $b$ giving rise to a mesoscopic bond $e$, we get
\[
\sum_{b\colon e(b)=e}\mathbb P^{f_{A,B}}(b\in\Sigma)=\sum_{x}\mathbb P^{\mathbf f_{A,B}}(X_{A,B}=x)\,1_{\{e\in x\}}=\mathbb P^{\mathbf f_{A,B}}(e\in X_{A,B}). \tag{15.4.6}
\]
This provides the decomposition of unity
\[
1_{\{f_{A,B}(b)>0\}}=\sum_{x\ni e(b)}\sum_{\sigma\ni b}\frac{\mathbb P^{\mathbf f_{A,B}}(X_{A,B}=x)\,\mathbb P^x(\Sigma=\sigma)}{\mathbb P^{\mathbf f_{A,B}}(e(b)\in X_{A,B})\,f^x(b)}. \tag{15.4.7}
\]
If $\mathbf f_{A,B}$ were non-defective, then we could replace $\mathbb P^{\mathbf f_{A,B}}(e\in X_{A,B})$ by
$\mathbf f_{A,B}(e(b))$. Since we will choose a defective flow, we assume that
\[
\mathbb P^{\mathbf f_{A,B}}(e\in X_{A,B})\le\mathbf f_{A,B}\big(e(b)\big)\big(1+d(e)\big), \tag{15.4.8}
\]
where $d(e)$ depends only on the initial point of the bond $e$.

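As in Lemma 9.4 we get
\[
\begin{aligned}
\mathrm{cap}(A,B)
&\ge\sum_{x}\mathbb P_N^{\mathbf f_{A,B}}(X_{A,B}=x)\,\mathbb E^x\Bigg[\Bigg(\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})\,f^x(\sigma_\ell,\sigma_{\ell+1})}{(1+d(x_\ell))\,\mu_{\beta,N}(\sigma_\ell)\,p_N(\sigma_\ell,\sigma_{\ell+1})}\Bigg)^{-1}\Bigg]\\
&\ge\mathbb E_N^{\mathbf f_{A,B}}\Bigg[\Bigg(\mathbb E^x\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})\,f^x(\sigma_\ell,\sigma_{\ell+1})}{(1+d(x_\ell))\,\mu_{\beta,N}(\sigma_\ell)\,p_N(\sigma_\ell,\sigma_{\ell+1})}\Bigg)^{-1}\Bigg],
\end{aligned} \tag{15.4.9}
\]
where we use Jensen's inequality to obtain the second inequality. We set
\[
\phi^x_\ell(\sigma_\ell,\sigma_{\ell+1})=\frac{Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})\,f^x(\sigma_\ell,\sigma_{\ell+1})}{\mu_{\beta,N}(\sigma_\ell)\,p_N(\sigma_\ell,\sigma_{\ell+1})}. \tag{15.4.10}
\]
Then (15.4.9) reads
\[
\mathrm{cap}(A,B)\ge\mathbb E_N^{\mathbf f_{A,B}}\Bigg[\Bigg(\mathbb E^x\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}\,\phi^x_\ell(\sigma_\ell,\sigma_{\ell+1})\Bigg)^{-1}\Bigg]. \tag{15.4.11}
\]
The point of the above rewrite is that if $\phi^x_\ell(\sigma_\ell,\sigma_{\ell+1})$ were equal to one, then we
would be in the same situation as in the case of discrete distributions of the magnetic
fields, and the right-hand side would be equal to the upper bound in (15.3.9), up to
errors of order $N^{-1/2}(\ln N)^{3/2}+\varepsilon$.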
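Recall from Sect. 10.3 that we can restrict the expectation to a subset of good
realisations $x$ of the mesoscopic Markov chain $X_{A,B}$ whose probability under $\mathbb P_N^{\mathbf f_{A,B}}$
is close to one. It remains to construct $f^x$ such that $\phi^x_\ell$ in (15.4.10) is close to
one (in a weak sense). This requires some additional notation. Given a mesoscopic
trajectory $x=(x_{-\ell_A},\dots,x_{\ell_B})$, define $k=k(\ell)$ as the direction of the increment
of the $\ell$-th jump, i.e., $x_{\ell+1}=x_\ell+e_k$. On the microscopic level such a transition corresponds to a flip of a spin from the $\Lambda_k$-slot. Thus, recalling the notation
$\Lambda^\pm_k(\sigma)=\{i\in\Lambda_k\colon\sigma(i)=\pm1\}$, we have that if $\sigma_\ell\in S_N[x_\ell]$ and $\sigma_{\ell+1}\in S_N[x_{\ell+1}]$,
then $\sigma_{\ell+1}=\theta^+_i\sigma_\ell$ for some $i\in\Lambda^-_{k(\ell)}(\sigma_\ell)$. By our choice of $p_N$ and $r_N$,
\[
\frac{r_N(x_\ell,x_{\ell+1})}{p_N(\sigma_\ell,\sigma_{\ell+1})}=\big|\Lambda^-_{k(\ell)}(\sigma_\ell)\big|\big(1+O(\varepsilon)\big), \tag{15.4.12}
\]
uniformly in $\ell$ and in all pairs of neighbours $\sigma_\ell,\sigma_{\ell+1}$. Note that the cardinality of
$\Lambda^-_{k(\ell)}(\sigma_\ell)$ is the same for all $\sigma_\ell\in S_N[x_\ell]$.
For $x\in\Gamma_N^n$, define the measure
\[
\begin{aligned}
\mu^x_{\beta,N}(\sigma)&=\frac{1_{\{\sigma\in S_N[x]\}}\,\mu_{\beta,N}(\sigma)}{Q_{\beta,N}(x)}
=1_{\{\sigma\in S_N[x]\}}\frac{e^{\beta\sum_{\ell=1}^{n}\sum_{i\in\Lambda_\ell}\sigma_i\tilde h_i}}{\sum_{\sigma'\colon x(\sigma')=x}e^{\beta\sum_{\ell=1}^{n}\sum_{i\in\Lambda_\ell}\sigma'_i\tilde h_i}}\\
&=1_{\{\sigma\in S_N[x]\}}\prod_{\ell=1}^{n}\frac{e^{\beta\sum_{i\in\Lambda_\ell}\sigma_i\tilde h_i}}{\sum_{\sigma'_{\Lambda_\ell}\colon x_\ell(\sigma'_{\Lambda_\ell})=x_\ell}e^{\beta\sum_{i\in\Lambda_\ell}\sigma'_i\tilde h_i}}
=\prod_{\ell=1}^{n}\mu^{x,\ell}_{\beta,N}(\sigma).
\end{aligned} \tag{15.4.13}
\]
Then we can write $\phi^x_\ell$ as
\[
\phi^x_\ell(\sigma_\ell,\sigma_{\ell+1})=\frac{r_N(x_\ell,x_{\ell+1})\,f^x(\sigma_\ell,\sigma_{\ell+1})}{p_N(\sigma_\ell,\sigma_{\ell+1})\,\mu^{x_\ell}_{\beta,N}(\sigma_\ell)}. \tag{15.4.14}
\]
If the magnetic fields were constant on each set $I_k$, then we could choose
\[
f^x(\sigma_\ell,\sigma_{\ell+1})=\mu^{x_\ell}_{\beta,N}(\sigma_\ell)\big/\big|\Lambda^-_{k(\ell)}(\sigma_\ell)\big|, \tag{15.4.15}
\]
and we would be done. But this is not possible here.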

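Construction of $f^x$

We construct a Markov chain $\mathbb P^x$ on microscopic trajectories $\Sigma=\{\sigma_0,\dots,\sigma_{\ell_B}\}$
from $S_N[x_0]$ to $B$ such that $\sigma_\ell\in S_N[x_\ell]$ for all $\ell=0,\dots,\ell_B$. The construction of
a microscopic flow from $A$ to $S_N[x_0]$ is just the reversal of the above and we will
omit it.

STEP 1. The transition probabilities $q_\ell(\sigma_\ell,\sigma_{\ell+1})$ in (15.4.18) are defined in the
following way: all the microscopic jumps are of the form $\sigma_\ell\to\theta^+_j\sigma_\ell$ for some
$j\in\Lambda^-_{k(\ell)}(\sigma_\ell)$, where $\theta^+_j$ flips the $j$-th spin from $-1$ to $1$. For such a flip define
\[
q_\ell\big(\sigma_\ell,\theta^+_j\sigma_\ell\big)=\frac{e^{2\beta\tilde h_j}}{\sum_{i\in\Lambda^-_{k(\ell)}(\sigma_\ell)}e^{2\beta\tilde h_i}}. \tag{15.4.16}
\]
Clearly, these ratios sum up to one. Note also that they satisfy
\[
q_\ell\big(\sigma_\ell,\theta^+_j\sigma_\ell\big)=\frac{1+O(\varepsilon)}{|\Lambda^-_{k(\ell)}(\sigma_\ell)|}. \tag{15.4.17}
\]

STEP 2. As initial measure $\nu^x_0$, choose $\mu^{x_0}_{\beta,N}$. For $\ell=0,\dots,\ell_B$, set
\[
\nu^x_{\ell+1}(\sigma_{\ell+1})=\sum_{\sigma_\ell\in S_N[x_\ell]}\nu^x_\ell(\sigma_\ell)\,q_\ell(\sigma_\ell,\sigma_{\ell+1}). \tag{15.4.18}
\]
Note that the measure $\nu^x_\ell$ is concentrated on $S_N[x_\ell]$ and is the marginal of $\mathbb P^x$ at
time $\ell$.

STEP 3. Define the microscopic flow through an admissible bond $b=(\sigma_\ell,\sigma_{\ell+1})$ as
\[
f^x(\sigma_\ell,\sigma_{\ell+1})=\mathbb P^x(b\in\Sigma)=\nu^x_\ell(\sigma_\ell)\,q_\ell(\sigma_\ell,\sigma_{\ell+1}). \tag{15.4.19}
\]
Note that the fact that the $q_\ell$ are probabilities, together with the definition in
(15.4.18), ensures that $f^x$ is a unit flow. Consequently,
\[
\phi^x_\ell(\sigma_\ell,\sigma_{\ell+1})=\frac{\nu^x_\ell(\sigma_\ell)}{\mu^{x_\ell}_{\beta,N}(\sigma_\ell)}\,\frac{r_N(x_\ell,x_{\ell+1})}{p_N(\sigma_\ell,\sigma_{\ell+1})}\,q_\ell(\sigma_\ell,\sigma_{\ell+1}). \tag{15.4.20}
\]
Using the observations in (15.4.12) and (15.4.17), we see that
\[
\phi^x_\ell(\sigma_\ell,\sigma_{\ell+1})=\frac{\nu^x_\ell(\sigma_\ell)}{\mu^{x_\ell}_{\beta,N}(\sigma_\ell)}\big(1+O(\varepsilon)\big)=\Psi_\ell(\sigma_\ell)\big(1+O(\varepsilon)\big). \tag{15.4.21}
\]
Note that $\Psi_0(\sigma_0)=1$. We need to control the evolution of this quantity in time.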

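Proposition 15.11 There exists a set $T_{A,B}$ of good mesoscopic trajectories from $\mathbf A$
to $\mathbf B$ such that
\[
\mathbb P_N^{\mathbf f_{A,B}}(X_{A,B}\in T_{A,B})=1-o(1), \tag{15.4.22}
\]
and, uniformly in $x\in T_{A,B}$,
\[
\mathbb E^x\Bigg[\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}\,\Psi_\ell(\sigma_\ell)\Bigg]\le\big(1+O(\varepsilon)\big)\frac{1}{\mathcal E_N(\tilde g)}. \tag{15.4.23}
\]
Proposition 15.11 implies that
\[
\mathrm{cap}(A,B)\ge\mathcal E_N(\tilde g)\big(1-O(\varepsilon)\big), \tag{15.4.24}
\]
which is the lower bound necessary to prove Theorem 15.3.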

The rest of this section is devoted to the proof of (15.4.23). First of all, we derive
recursive estimates on Ψ for a given realisation x of the mesoscopic Markov chain.
After that it will be obvious how to define TA,B .

15.4.2 Propagation of errors along microscopic paths

Let $x$ be given. We have seen in (15.4.13) that $\mu^{x_\ell}_{\beta,N}$ is a product measure. On the
other hand, according to (15.4.16), the large microscopic Markov chain $\Sigma$ splits
into a direct product of $n$ small microscopic Markov chains $\Sigma^{(1)},\dots,\Sigma^{(n)}$, which
independently evolve on $S_N^{(1)},\dots,S_N^{(n)}$. Thus, $k(\ell)=k$ means that the $\ell$-th step of
the mesoscopic Markov chain induces a step of the $k$-th small microscopic Markov
chain $\Sigma^{(k)}$. Let $\tau_1[\ell],\dots,\tau_n[\ell]$ be the numbers of steps performed by each of the
small microscopic Markov chains after $\ell$ steps of the mesoscopic Markov chain
or, equivalently, after $\ell$ steps of the large microscopic Markov chain $\Sigma$. Then the
corrector $\Psi_\ell$ in (15.4.21) also factorises and can be written as
\[
\Psi_\ell(\sigma_\ell)=\prod_{j=1}^{n}\psi^{(j)}_{\tau_j[\ell]}\big(\sigma^{(j)}_\ell\big). \tag{15.4.25}
\]
Therefore we are left with two separate tasks: On the microscopic level we need to
control the propagation of errors along small Markov chains, while on the mesoscopic level we need to control the statistics of $\tau_1[\ell],\dots,\tau_n[\ell]$.

Small microscopic Markov chains

To simplify notation we consider the error propagation along the small Markov
chains in a more abstract setting. Fix $1\ll M\in\mathbb N$ and $0\le\varepsilon\ll1$. Let $g_1,\dots,g_M\in[-1,1]$. Consider spin configurations $\xi\in S_M=\{-1,1\}^M$ with product weights
\[
w(\xi)=e^{\varepsilon\sum_i g_i\xi(i)}. \tag{15.4.26}
\]
As before, let $\Lambda^\pm(\xi)=\{i\colon\xi(i)=\pm1\}$. Define layers of fixed magnetisation
$S_M[K]=\{\xi\in S_M\colon|\Lambda^+(\xi)|=K\}$. Finally, fix $\delta_0,\delta_1\in(0,1)$ such that $\delta_0<\delta_1$.
Set $K_0=\delta_0M$ and $r=(\delta_1-\delta_0)M$. We consider a Markov chain $\Xi=\{\Xi_0,\Xi_1,\dots,\Xi_r\}$ on $S_M$ such that $\Xi_\tau\in S_M[K_0+\tau]=S_M^\tau$ for $\tau=0,1,\dots,r$.
Let $\mu_\tau$ be the probability measure
\[
\mu_\tau(\xi)=\frac{w(\xi)\,1_{\{\xi\in S_M^\tau\}}}{Z_\tau}. \tag{15.4.27}
\]
We take $\nu_0=\mu_0$ as the initial distribution of $\Xi_0$ and, following (15.4.16), define
transition probabilities
\[
q_\tau\big(\xi_\tau,\theta^+_j\xi_\tau\big)=\frac{e^{2\varepsilon g_j}}{\sum_{i\in\Lambda^-(\xi_\tau)}e^{2\varepsilon g_i}}. \tag{15.4.28}
\]
We denote by $\mathbb P$ the law of this Markov chain and let $\nu_\tau$ be the distribution of $\Xi_\tau$
(which is concentrated on $S_M^\tau$), i.e., $\nu_\tau(\xi)=\mathbb P(\Xi_\tau=\xi)$. The propagation of errors
along paths of our Markov chain is then quantified in terms of $\psi_\tau(\cdot)=\nu_\tau(\cdot)/\mu_\tau(\cdot)$.
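For very small $M$ the corrector $\psi_\tau=\nu_\tau/\mu_\tau$ can be computed by exact enumeration, which gives a feeling for how slowly the error grows. The sketch below is an illustration only: the values of $M$, $K_0$, $r$, $g_i$ and $\varepsilon$ are hypothetical, and the function names are not from the text. It builds the layer measures (15.4.27), pushes $\nu_0=\mu_0$ forward with the transition probabilities (15.4.28), and returns $\psi_\tau$ on every layer.

```python
import itertools
import math

def layer(M, K):
    """All configurations in {-1, 1}^M with exactly K plus spins."""
    for plus in itertools.combinations(range(M), K):
        xi = [-1] * M
        for i in plus:
            xi[i] = 1
        yield tuple(xi)

def layer_measure(M, K, g, eps):
    """mu_tau of (15.4.27): weights exp(eps * sum_i g_i xi_i), normalised on the layer."""
    w = {xi: math.exp(eps * sum(gi * x for gi, x in zip(g, xi))) for xi in layer(M, K)}
    Z = sum(w.values())
    return {xi: v / Z for xi, v in w.items()}

def push_forward(nu, g, eps):
    """One step of the chain: flip a minus spin j with probability (15.4.28)."""
    out = {}
    for xi, p in nu.items():
        minus = [i for i, x in enumerate(xi) if x == -1]
        norm = sum(math.exp(2.0 * eps * g[i]) for i in minus)
        for j in minus:
            new = xi[:j] + (1,) + xi[j + 1:]
            out[new] = out.get(new, 0.0) + p * math.exp(2.0 * eps * g[j]) / norm
    return out

def correctors(M, K0, r, g, eps):
    """Return [psi_0, ..., psi_r] as dictionaries xi -> nu_tau(xi) / mu_tau(xi)."""
    nu = layer_measure(M, K0, g, eps)
    result = []
    for tau in range(r + 1):
        mu = layer_measure(M, K0 + tau, g, eps)
        result.append({xi: nu[xi] / mu[xi] for xi in mu})
        if tau < r:
            nu = push_forward(nu, g, eps)
    return result

# Hypothetical small example.
g_example = [0.5, -1.0, 0.3, 1.0, -0.2, 0.7]
psi = correctors(6, 1, 3, g_example, 0.1)
psi_flat = correctors(6, 1, 3, [0.0] * 6, 0.1)
```

When all $g_i$ vanish the chain maps the uniform measure on one layer to the uniform measure on the next, so $\psi_\tau\equiv1$; for non-constant $g_i$ the deviation of $\psi_\tau$ from one is the error that the following proposition controls.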

Proposition 15.12 For $\tau=1,\dots,r$ and $\xi\in S_M^\tau$, set
\[
B_\tau(\xi)=\sum_{i=1}^{M}e^{2\varepsilon g_i}1_{\{i\in\Lambda^-(\xi)\}},\qquad
A_\tau=E_{\mu_\tau}\big[B_\tau(\cdot)\big]=\sum_{i=1}^{M}e^{2\varepsilon g_i}\mu_\tau\big(i\in\Lambda^-(\cdot)\big). \tag{15.4.29}
\]
Then there exists a $c=c(\delta_0,\delta_1)$ such that, for any trajectory $\xi=(\xi_0,\dots,\xi_r)$,
\[
\psi_\tau(\xi_\tau)\le\Big(\frac{A_0}{B_0(\xi_0)}\Big)^{\tau}e^{c\varepsilon\tau^2/M} \tag{15.4.30}
\]
for all $\tau=0,1,\dots,r$.

Remark 15.13 The second factor in the bound in (15.4.30) will be seen to be what
we want, since it grows much slower than $\phi_{A,B}(x_\ell,x_{\ell+1})$ decays. The first factor
involves the ratio of $A_0$ and $B_0$, which is more delicate. To control it we require
a concentration estimate showing that $A_0/B_0(\xi_0)\le1+O(1/\sqrt M)$, which will be
done later.

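Proof By construction, $\psi_0=1$. Let $\xi_{\tau+1}\in S_M^{\tau+1}$. Since $\nu_\tau$ satisfies the recursion
\[
\nu_{\tau+1}(\xi_{\tau+1})=\sum_{j\in\Lambda^+(\xi_{\tau+1})}\nu_\tau\big(\theta^-_j\xi_{\tau+1}\big)\,q_\tau\big(\theta^-_j\xi_{\tau+1},\xi_{\tau+1}\big), \tag{15.4.31}
\]
it follows that $\psi_\tau$ satisfies
\[
\begin{aligned}
\psi_{\tau+1}(\xi_{\tau+1})&=\sum_{j\in\Lambda^+(\xi_{\tau+1})}\frac{\nu_\tau(\theta^-_j\xi_{\tau+1})\,q_\tau(\theta^-_j\xi_{\tau+1},\xi_{\tau+1})}{\mu_{\tau+1}(\xi_{\tau+1})}\\
&=\sum_{j\in\Lambda^+(\xi_{\tau+1})}\frac{\mu_\tau(\theta^-_j\xi_{\tau+1})\,q_\tau(\theta^-_j\xi_{\tau+1},\xi_{\tau+1})}{\mu_{\tau+1}(\xi_{\tau+1})}\,\psi_\tau\big(\theta^-_j\xi_{\tau+1}\big).
\end{aligned} \tag{15.4.32}
\]
By our choice of transition probabilities in (15.4.28),
\[
\frac{\mu_\tau(\theta^-_j\xi_{\tau+1})\,q_\tau(\theta^-_j\xi_{\tau+1},\xi_{\tau+1})}{\mu_{\tau+1}(\xi_{\tau+1})}
=\frac{Z_{\tau+1}}{Z_\tau}\Bigg[\sum_{i\in\Lambda^-(\theta^-_j\xi_{\tau+1})}e^{2\varepsilon g_i}\Bigg]^{-1}. \tag{15.4.33}
\]
Recalling that $|\Lambda^+(\xi_\tau)|=|\Lambda^+_\tau|=K_0+\tau$ does not depend on the particular value
of $\xi_\tau$, we have
\[
\begin{aligned}
\frac{Z_{\tau+1}}{Z_\tau}&=\frac{1}{Z_\tau}\sum_{\xi\in S_M^{\tau+1}}w(\xi)
=\frac{1}{Z_\tau}\sum_{\xi\in S_M^{\tau+1}}\frac{1}{|\Lambda^+(\xi)|}\sum_{j\in\Lambda^+(\xi)}w\big(\theta^-_j\xi\big)e^{2\varepsilon g_j}\\
&=\frac{1}{Z_\tau}\frac{1}{|\Lambda^+_{\tau+1}|}\sum_{\xi\in S_M^{\tau}}w(\xi)\sum_{j\in\Lambda^-(\xi)}e^{2\varepsilon g_j}
=\frac{1}{|\Lambda^+(\xi_{\tau+1})|}\,\mu_\tau\Big(\sum_{j\in\Lambda^-(\cdot)}e^{2\varepsilon g_j}\Big).
\end{aligned} \tag{15.4.34}
\]
We conclude that the right-hand side of (15.4.33) equals
\[
\frac{1}{|\Lambda^+(\xi_{\tau+1})|}\,\frac{\mu_\tau\big(\sum_{i\in\Lambda^-(\cdot)}e^{2\varepsilon g_i}\big)}{\sum_{i\in\Lambda^-(\theta^-_j\xi_{\tau+1})}e^{2\varepsilon g_i}}
=\frac{1}{|\Lambda^+(\xi_{\tau+1})|}\,\frac{A_\tau}{B_\tau(\theta^-_j\xi_{\tau+1})}. \tag{15.4.35}
\]
Consequently,
\[
\psi_{\tau+1}(\xi_{\tau+1})=\frac{1}{|\Lambda^+(\xi_{\tau+1})|}\sum_{j\in\Lambda^+(\xi_{\tau+1})}\frac{A_\tau}{B_\tau(\theta^-_j\xi_{\tau+1})}\,\psi_\tau\big(\theta^-_j\xi_{\tau+1}\big). \tag{15.4.36}
\]
Iterating the above procedure, we arrive at the following conclusion. Consider the
set $D(\xi_{\tau+1})$ of all paths $\xi=(\xi_0,\dots,\xi_\tau,\xi_{\tau+1})$ of positive probability from $S_M^0$ to
$\xi_{\tau+1}\in S_M^{\tau+1}$. The number $D_{\tau+1}=|D(\xi_{\tau+1})|$ of such paths does not depend on
$\xi_{\tau+1}$. Therefore, since $\psi_0=1$, we have
\[
\psi_{\tau+1}(\xi_{\tau+1})=\frac{1}{D_{\tau+1}}\sum_{\xi\in D(\xi_{\tau+1})}\prod_{s=0}^{\tau}\frac{A_s}{B_s(\xi_s)}. \tag{15.4.37}
\]
To conclude the proof, we need the following lemma.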

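Lemma 15.14
\[
\frac{A_s}{B_s(\xi_s)}=\frac{A_{s-1}}{B_{s-1}(\xi_{s-1})}\Big(1+\frac{O(\varepsilon)}{M}\Big), \tag{15.4.38}
\]
where $O(\varepsilon)$ is uniform in all parameters.

Once (15.4.38) is verified, we have
\[
\psi_\tau(\xi_\tau)\le e^{O(\varepsilon)\tau^2/M}\max_{\xi_0\sim\xi_\tau}\Big(\frac{A_0}{B_0(\xi_0)}\Big)^{\tau}, \tag{15.4.39}
\]
where for $\xi_0\in S_M^0$ the relation $\xi_0\sim\xi_\tau$ means that there is a path of positive probability from $\xi_0$ to $\xi_\tau$. But all such $\xi_0$ differ in at most $2\tau$ coordinates. It is straightforward to see that if $\xi_0\sim\xi_\tau$ and $\xi_0'\sim\xi_\tau$, then
\[
\frac{B_0(\xi_0)}{B_0(\xi_0')}\le e^{O(\varepsilon)\tau/M}, \tag{15.4.40}
\]
and (15.4.30) follows.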

It remains to prove Lemma 15.14.

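Proof of Lemma 15.14 Let $\xi\in S_M^s$ and $\xi'=\theta^-_j\xi\in S_M^{s-1}$. Note that
\[
B_{s-1}(\xi')-B_s(\xi)=e^{2\varepsilon g_j}=1+O(\varepsilon). \tag{15.4.41}
\]
Similarly,
\[
\begin{aligned}
A_{s-1}-A_s&=\sum_{i=1}^{M}e^{2\varepsilon g_i}\big[\mu_{s-1}(i\in\Lambda^-)-\mu_s(i\in\Lambda^-)\big]\\
&=1+\sum_{i=1}^{M}\big(e^{2\varepsilon g_i}-1\big)\big[\mu_{s-1}(i\in\Lambda^-)-\mu_s(i\in\Lambda^-)\big].
\end{aligned} \tag{15.4.42}
\]
By standard local limit results for independent Bernoulli variables,
\[
\mu_{s-1}(i\in\Lambda^-)-\mu_s(i\in\Lambda^-)=O\Big(\frac1M\Big) \tag{15.4.43}
\]
uniformly in $s=1,\dots,r-1$ and $i=1,\dots,M$. Hence $A_{s-1}-A_s=1+O(\varepsilon)$.
Finally, both $A_{s-1}$ and $B_{s-1}(\xi')$ are uniformly $O(M)$, whereas
\[
A_{s-1}-B_{s-1}(\xi')=\sum_{i=1}^{M}\big(e^{2\varepsilon g_i}-1\big)\big[\mu_{s-1}(i\in\Lambda^-)-1_{\{i\in\Lambda^-(\xi')\}}\big]=O(\varepsilon)M. \tag{15.4.44}
\]
Hence
\[
\frac{A_s}{B_s(\xi)}=\frac{A_{s-1}-1+O(\varepsilon)}{B_{s-1}(\xi')-1+O(\varepsilon)}=\frac{A_{s-1}}{B_{s-1}(\xi')}\Big(1+\frac{O(\varepsilon)}{M}\Big), \tag{15.4.45}
\]
which is (15.4.38).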

Lemma 15.15 Assume that $\tau/M\le C$. Then
\[
E_{\mu_0}\Big[\Big(\frac{A_0}{B_0(\xi_0)}\Big)^{\tau}\Big]\le\exp\Big(\max\Big\{O(\varepsilon)\frac{\tau}{\sqrt M},\,O(\varepsilon^2)\frac{\tau^2}{M}\Big\}\Big). \tag{15.4.46}
\]
Proof First note that
\[
\Big(\frac{A_0}{B_0(\xi_0)}\Big)^{\tau}=\Big(1+\frac{B_0(\xi_0)-A_0}{A_0}\Big)^{-\tau}. \tag{15.4.47}
\]
Now define the random variable
\[
Y=\frac{B_0(\xi_0)-A_0}{A_0}. \tag{15.4.48}
\]
Due to the fact that $|\Lambda^-(\xi)|$ only depends on the magnetisation of $\xi$, we can rewrite
$Y$ as
\[
Y=\frac{1}{A_0}\sum_{i=1}^{M}\big(e^{2\varepsilon g_i}-1\big)\big[1_{\{i\in\Lambda^-(\xi)\}}-\mu_0(i\in\Lambda^-)\big]. \tag{15.4.49}
\]
Since $A_0=|\Lambda^-(\xi_0)|\,[1+O(\varepsilon)]$, it follows that
\[
|Y|\le5\varepsilon, \tag{15.4.50}
\]
where 5 is an arbitrary choice for a number larger than 4. This ensures that when we
compute $E_{\mu_0}[(1+Y)^{-\tau}]$, we stay away from the singularity at zero. Note that $Y$ is
a centred random variable and is a sum of bounded random variables that are almost
independent (their dependence arises only from conditioning on the value of their
sum). Moreover, the variance of the summands is of order $\varepsilon^2/M^2$. In this situation,
the following lemma holds.

Lemma 15.16 There exist finite positive constants $c$, $C$ such that, for any $r>\varepsilon/\sqrt M$,
\[
\mu_0\big(|Y|>r\big)\le Ce^{-cMr^2/\varepsilon^2}. \tag{15.4.51}
\]

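Next we use that, for $|x|\le\frac{1}{10}$ say, there is a finite constant $d>0$ such that
$\ln(1+x)\ge x-dx^2$. Hence
\[
\begin{aligned}
E_{\mu_0}\big[(1+Y)^{-\tau}\big]&\le E_{\mu_0}\big[e^{-\tau Y+d\tau Y^2}\big]\\
&\le e^{\varepsilon\tau/\sqrt M+d\varepsilon^2\tau/M}+\tau C\int e^{-\tau r+d\tau r^2}\,e^{-cMr^2/\varepsilon^2}\,dr\\
&=e^{\varepsilon\tau/\sqrt M+d\varepsilon^2\tau/M}+\frac{\sqrt{2\pi}\,\tau}{\sqrt{2(cM/\varepsilon^2-d\tau)}}\,e^{\tau^2/(cM/\varepsilon^2-d\tau)}.
\end{aligned} \tag{15.4.52}
\]
Since we have assumed that $\tau\le CM$, and $\varepsilon$ is small, the right-hand side of (15.4.52)
is as claimed in (15.4.46).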

Back to the large microscopic Markov chain

Going back to (15.4.25), we infer that the corrector of the large Markov chain $\Sigma$
satisfies the following upper bound. Let $\sigma=(\sigma_0,\sigma_1,\dots)$ be a trajectory of $\Sigma$ (as
sampled from $\mathbb P^x$). Then, for every $\ell=0,1,\dots,\ell_B-1$,
\[
\Psi_\ell(\sigma_\ell)\le\exp\Big\{c\varepsilon\sum_{j=1}^{n}\frac{\tau_j[\ell]^2}{M_j}\Big\}\prod_{j=1}^{n}\Bigg(\frac{A_0^{(j)}}{B_0^{(j)}\big(\sigma_0^{(j)}\big)}\Bigg)^{\tau_j[\ell]}, \tag{15.4.53}
\]
where $M_j=|\Lambda_j|=\rho_jN$, and $A_0^{(j)}$, $B_0^{(j)}$ are defined as in (15.4.29) with respect
to the corresponding small microscopic Markov chains. We need to check that when
this bound is inserted into the left-hand side of (15.4.23), we recover the right-hand
side as upper bound.
By the construction of the mesoscopic Markov chain $\mathbb P_N^{\mathbf f_{A,B}}$, and in view of
(15.2.8) and (14.5.9), the step frequencies $\tau_j[\ell]/\ell$ are on average proportional to $\rho_j$.
Therefore there exists a constant $C_1$ such that, up to exponentially negligible $\mathbb P_N^{\mathbf f_{A,B}}$-probabilities,
\[
\max_{1\le j\le n}\frac{\tau_j[\ell_B]}{M_j}\le C_1. \tag{15.4.54}
\]
Our mesoscopic trajectories are constructed such that the assumptions of
Sect. 15.4.2 hold for each of them. Thus Lemma 15.15 together with Proposition 15.12 imply that
\[
\begin{aligned}
\Psi_\ell(\sigma_\ell)&\le\exp\Bigg[O(\varepsilon)\sum_{j=1}^{n}\max\Big\{\frac{\tau_j[\ell]}{\sqrt{M_j}},\frac{\tau_j[\ell]^2}{M_j}\Big\}\Bigg]\\
&\le\exp\Big(\max\Big\{O(\sqrt\varepsilon)\frac{|\ell|}{\sqrt N},\,O(\varepsilon)\frac{\ell^2}{N}\Big\}\Big),
\end{aligned} \tag{15.4.55}
\]
uniformly in $\ell=0,\dots,\ell_B$. Note that to obtain the second line we use the Cauchy-Schwarz inequality and the fact that $\sum_{j=1}^{n}M_j=N$ and $\varepsilon=O(1/n)$. Inserting this
into the bound (15.4.9), we have now proved that
\[
\mathrm{cap}(A,B)\ge\mathbb E_N^{\mathbf f_{A,B}}\Bigg[1_{T_{A,B}}\Bigg(\mathbb E^x\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})\exp\big(\max\big\{O(\sqrt\varepsilon)\frac{|\ell|}{\sqrt N},O(\varepsilon)\frac{\ell^2}{N}\big\}\big)}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}\Bigg)^{-1}\Bigg]. \tag{15.4.56}
\]
Let us now set
\[
\phi_{A,B}(x_\ell,x_{\ell+1})=\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}. \tag{15.4.57}
\]
Just as in the proof of the lower bound in Sect. 10.3, from the fact that the free
energy is quadratic with a negative eigenvalue in the direction of our paths, we
obtain that there exists a $C>0$ such that, for all $x$ under consideration and for all
$\ell=-\ell_A,\dots,\ell_B-1$,
\[
\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}\le e^{-C\ell^2/N}\,\frac{\mathbf f_{A,B}(x_0,x_1)}{(1+d(x_0))\,Q_{\beta,N}(x_0)\,r_N(x_0,x_1)}. \tag{15.4.58}
\]
From this fact it is elementary to deduce that
\[
\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}\Big|1-\exp\Big(\max\Big\{O(\sqrt\varepsilon)\frac{|\ell|}{\sqrt N},O(\varepsilon)\frac{\ell^2}{N}\Big\}\Big)\Big|
\le O(\sqrt\varepsilon)\sum_{\ell=-\ell_A}^{\ell_B-1}\frac{\mathbf f_{A,B}(x_\ell,x_{\ell+1})}{(1+d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell,x_{\ell+1})}. \tag{15.4.59}
\]
Thus, we have established that, up to an error of order $\sqrt\varepsilon$, the lower bound on the
capacity for the coarse-grained model is also a lower bound for the full model. This
leads to the inequality in (15.4.24) which, together with the upper bound given in
(15.3.10), concludes the proof of Theorem 15.3.

15.5 Estimates on mean hitting times

In this section we conclude the proof of Theorem 15.1. The capacity in the denom-
inator in the right-hand side of (7.1.41) is controlled by Theorem 15.3. It therefore
remains to control the equilibrium potential hA,B (σ ). We are in a situation where
the renewal inequality hA,B (σ ) ≤ cap(A, σ )/cap(B, σ ) cannot be used because ca-
pacities of single configurations are too small. We will need another method to cope
with this problem, explained in Sects. 15.5.1–15.5.2.

15.5.1 Mean hitting time and equilibrium potential

Let us start by considering a local minimum $m^*_0$ of the one-dimensional function
$F_{\beta,N}$, and denote by $M$ the set of minima $m$ such that $F_{\beta,N}(m)<F_{\beta,N}(m^*_0)$. We
consider the disjoint subsets $A=S_N[m^*_0]$ and $B=S_N[M]$, and write (7.1.41) as
\[
\sum_{\sigma\in A}\nu_{A,B}(\sigma)\,E_\sigma\tau_B=\frac{1}{\mathrm{cap}(A,B)}\sum_{m\in[-1,1]}\sum_{\sigma\in S_N[m]}\mu_{\beta,N}(\sigma)\,h_{A,B}(\sigma). \tag{15.5.1}
\]
We expect the right-hand side of (15.5.1) to be of order $Q_{\beta,N}(m^*_0)$, so that all terms
in the sum over $m$ with $Q_{\beta,N}(m)$ much smaller than $Q_{\beta,N}(m^*_0)$ can be ignored.
More precisely, we choose $\delta>0$ in such a way that, for all $N$ large enough, there
is no critical point $z$ of $F_{\beta,N}$ with $F_{\beta,N}(z)\in[F_{\beta,N}(m^*_0),F_{\beta,N}(m^*_0)+\delta]$, and we
define
\[
U_\delta=\big\{m\in[-1,1]\colon F_{\beta,N}(m)\le F_{\beta,N}(m^*_0)+\delta\big\}. \tag{15.5.2}
\]

Lemma 15.17 With $U_\delta^c$ the complement of $U_\delta$,
\[
\sum_{m\in U_\delta^c}\sum_{\sigma\in S_N[m]}\mu_{\beta,N}(\sigma)\,h_{A,B}(\sigma)\le Ne^{-\beta N\delta}\,Q_{\beta,N}(m^*_0). \tag{15.5.3}
\]
The main problem is to control the equilibrium potential $h_{A,B}(\sigma)$ for configurations $\sigma\in S_N[U_\delta]$. To do so, first note that
\[
U_\delta=U_\delta(m^*_0)\cup\bigcup_{m\in M}U_\delta(m), \tag{15.5.4}
\]
where $U_\delta(m)$ is the connected component of $U_\delta$ containing $m$ (see Fig. 15.2). Note
that it may happen that $U_\delta(m)=U_\delta(m')$ for two different minima $m,m'\in M$.
With this notation we have the following lemma.

Fig. 15.2 Decomposition of $[-1,1]$: $U_\delta^c$ is represented by dotted lines, $U_\delta=U_\delta(m^*_0)\cup\bigcup_{m\in M}U_\delta(m)$ by continuous lines

Lemma 15.18 There exists a constant c > 0 such that:

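(i)
\[
\sum_{\sigma\in S_N[U_\delta(m)]} \mu_{\beta,N}(\sigma)\,h_{A,B}(\sigma) \le e^{-\beta N c}\, Q_{\beta,N}(m_0^*), \qquad m\in M, \tag{15.5.5}
\]

(ii)
\[
\sum_{\sigma\in S_N[U_\delta(m_0^*)]} \mu_{\beta,N}(\sigma)\,\bigl[1-h_{A,B}(\sigma)\bigr] \le e^{-\beta N c}\, Q_{\beta,N}(m_0^*). \tag{15.5.6}
\]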

The treatment of (i) and (ii) is completely similar, as both rely on a rough estimate
of the probability of leaving the starting valley before visiting its minimum, which
will be discussed below.
Assuming Lemma 15.18, we can readily conclude the proof of Theorem 15.1.
Indeed, using (15.5.5) together with (15.5.3), we obtain the upper bound

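\[
\sum_{\sigma\in S_N} \mu_{\beta,N}(\sigma)\,h_{A,B}(\sigma) \le \sum_{m\in U_\delta(m_0^*)} Q_{\beta,N}(m) + O\bigl(Q_{\beta,N}(m_0^*)\,e^{-\beta N c}\bigr)
= Q_{\beta,N}(m_0^*)\,\sqrt{\frac{\pi N}{2\beta a(m_0^*)}}\;\bigl(1+o(1)\bigr), \tag{15.5.7}
\]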

where a(m∗0 ) is given in (14.2.17). On the other hand, using (15.5.6) we get the
corresponding lower bound

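\begin{align*}
\sum_{\sigma\in S_N} \mu_{\beta,N}(\sigma)\,h_{A,B}(\sigma)
&\ge \sum_{m\in U_\delta(m_0^*)}\ \sum_{\sigma\in S_N[m]} \mu_{\beta,N}(\sigma)\,\bigl[1-\bigl(1-h_{A,B}(\sigma)\bigr)\bigr] \\
&\ge \sum_{m\in U_\delta(m_0^*)} Q_{\beta,N}(m) - O\bigl(Q_{\beta,N}(m_0^*)\,e^{-\beta N c}\bigr) \\
&= Q_{\beta,N}(m_0^*)\,\sqrt{\frac{\pi N}{2\beta a(m_0^*)}}\;\bigl(1+o(1)\bigr). \tag{15.5.8}
\end{align*}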
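The Gaussian evaluation behind the last equality in (15.5.7) and (15.5.8) can be checked numerically. The following is a minimal sketch under an assumption that is not spelled out in this section: near the minimum, $Q_{\beta,N}(m)/Q_{\beta,N}(m_0^*)$ is taken to behave like $\exp(-\beta N a\,(m-m_0^*)^2/2)$, with $m$ running over the lattice of spacing $2/N$ in $[-1,1]$. The precise definition of $a(m_0^*)$ is the one in (14.2.17); the function name and the parameter values below are hypothetical.

```python
import math

def lattice_gaussian_sum(N, beta, a, m0=0.0):
    """Sum exp(-beta*N*a*(m-m0)^2/2) over m in {-1, -1+2/N, ..., 1}.

    Assumption: this Gaussian profile stands in for Q(m)/Q(m0) near m0.
    """
    total = 0.0
    for k in range(N + 1):
        m = -1.0 + 2.0 * k / N
        total += math.exp(-0.5 * beta * N * a * (m - m0) ** 2)
    return total

# Hypothetical parameter values, chosen only for illustration.
N, beta, a = 4000, 1.5, 0.8
predicted = math.sqrt(math.pi * N / (2.0 * beta * a))
ratio = lattice_gaussian_sum(N, beta, a) / predicted
print(ratio)
```

Under the stated assumption the ratio is close to 1, which is the prefactor $\sqrt{\pi N/(2\beta a)}$ appearing in the two bounds.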

From (15.2.11) for Qβ,N (m∗0 ) and (15.1.4) for cap(A, B), we finally obtain
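\begin{align*}
E_{\nu_{A,B}}[\tau_B] &= \frac{1}{\mathrm{cap}(A,B)} \sum_{\sigma\in S_N} \mu_{\beta,N}(\sigma)\,h_{A,B}(\sigma) \\
&= \exp\bigl(\beta N\bigl[F_{\beta,N}(z^*) - F_{\beta,N}(m_0^*)\bigr]\bigr) \\
&\quad\times \frac{2\pi N}{\beta|\hat\gamma_1|}\,
\sqrt{\frac{\beta E_h\bigl(1-\tanh^2(\beta(z^*+h))\bigr) - 1}{1 - \beta E_h\bigl(1-\tanh^2(\beta(m_0^*+h))\bigr)}}\;\bigl(1+o(1)\bigr), \tag{15.5.9}
\end{align*}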

which proves Theorem 15.1.

15.5.2 Upper bounds on harmonic functions

We next prove Lemma 15.18, giving a detailed proof only for (i) because the proof
of (ii) is completely analogous. This requires us to get an estimate on the minimiser
of the Dirichlet form, the harmonic function hA,B (σ ).
First note that, since $h_{A,B}(\sigma) = P_\sigma(\tau_A < \tau_B)$ for all $\sigma \notin A\cup B$, the only non-zero contributions to the sum in (i) come from those sets $U_\delta(m)$ (at most two) whose corresponding $m$ is such that there are no minima of $M$ between $m_0^*$ and $m$. By symmetry, we can just as well analyse one of these two sets, denoted by $U_\delta(m^*)$, assuming for definiteness that $m_0^* < m^*$. Next note that, since $h_{A,B}(\sigma) = 0$ for all $\sigma$ such that $m^* \le m(\sigma)$, the problem can be reduced further to the set
\[
U_\delta^- = U_\delta(m^*) \cap \bigl\{ m\in[0,1] : m < m^* \bigr\}. \tag{15.5.10}
\]

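Define the mesoscopic counterpart of $U_\delta^-$, namely, for fixed $m^*\in M$ and $n\in\mathbb{N}$, let $\boldsymbol m^*\in\Gamma_N^n$ be the minimum of $F_{\beta,N}(\boldsymbol x)$ corresponding to $m^*$, and define
\[
\boldsymbol U_\delta = \boldsymbol U_\delta(\boldsymbol m^*) = \bigl\{ \boldsymbol x\in\Gamma_N^n : m(\boldsymbol x)\in U_\delta^- \bigr\}. \tag{15.5.11}
\]
We write the boundary of $\boldsymbol U_\delta$ as $\partial\boldsymbol U_\delta = \partial_A\boldsymbol U_\delta \sqcup \partial_B\boldsymbol U_\delta$, where $\partial_B\boldsymbol U_\delta = \partial\boldsymbol U_\delta\cap B$, and observe that, for all $\sigma\in S_N[\boldsymbol U_\delta]$,
\[
h_{A,B}(\sigma) = P_\sigma[\tau_A<\tau_B] \le P_\sigma\bigl[\tau_{S_N[\partial_A\boldsymbol U_\delta]} < \tau_{S_N[\partial_B\boldsymbol U_\delta]}\bigr]. \tag{15.5.12}
\]
Let $\max_{1\le\ell\le n}\rho_\ell \ll \theta(\varepsilon) \ll 1$, and for $\theta=\theta(\varepsilon)$ define
\[
\boldsymbol G_\theta = \Bigl\{ \boldsymbol m\in\boldsymbol U_\delta : \sum_{\ell=1}^n \frac{(m_\ell - m^*_\ell)^2}{\rho_\ell} \le \frac{\varepsilon^2}{\theta} \Bigr\}. \tag{15.5.13}
\]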

Fig. 15.3 Neighbourhoods of m∗0 and m∗ in the space ΓNn , where Uδ (m∗0 ) denotes the mesoscopic
counterpart of U (m∗0 )

As before, we denote by ∂Gθ the boundary of Gθ , and write ∂Gθ = ∂A Gθ ∪ ∂B Gθ ,

where ∂B Gθ = ∂Gθ ∩ B (see Fig. 15.3).
The strategy to control the equilibrium potential Pσ (τA < τB ) consists in
estimating the probabilities Pσ [τA < τSN [∂A Gθ ]∪B ] for σ ∈ S [Uδ \ Gθ ] and
Pσ [τSN [∂A Gθ ] < τB ] for σ ∈ Gθ , in order to apply a renewal argument and draw
from these estimates a bound on the probability of the original event. Proceeding
along this line, we state the following.

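Proposition 15.19 For any $\alpha\in(0,1)$ there exists an $n_0\in\mathbb{N}$ such that the inequality
\[
P_\sigma\bigl(\tau_A < \tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) \le e^{-(1-\alpha)\beta N\,[F_{\beta,N}(m_0^*)+\delta-F_{\beta,N}(\boldsymbol m(\sigma))]} \tag{15.5.14}
\]
holds for all $\sigma\in S_N[\boldsymbol U_\delta\setminus\boldsymbol G_\theta]$ and $n\ge n_0$, for all $N$ sufficiently large.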

Proof of Proposition 15.19: Super-harmonic barrier functions

Throughout the next computations $c$, $c'$ and $c''$ will denote positive constants that are independent of $n$ but may depend on $\beta$ and on the distribution of $h$. The value of $c$ and $c'$ may change from line to line.
We first observe that, for all σ ∈ SN [Uδ \ Gθ ],

Pσ [τA < τSN [∂A Gθ ]∪B ] ≤ Pσ [τSN [∂A Uδ ] < τSN [∂A Gθ ]∪B ]. (15.5.15)

The probability in the right-hand side of (15.5.15) is the main object of investigation.
The idea behind the proof of bound (15.5.14) is simple. Suppose that ψ is a bounded
super-harmonic function defined on SN [Uδ \ Gθ ], with L = LN the generator of the
Markov process defined in Sect. 14.3, i.e.,

(Lψ)(σ ) ≤ 0 ∀ σ ∈ SN [Uδ \ Gθ ]. (15.5.16)

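Then $\psi(\sigma_t)$ is a supermartingale, and $T = \tau_{S_N[\partial_A\boldsymbol U_\delta]} \wedge \tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}$ is an integrable stopping time, so that, by Doob's optional stopping theorem,
\[
E_\sigma\,\psi(\sigma_T) \le \psi(\sigma) \qquad \forall\,\sigma\in S_N[\boldsymbol U_\delta\setminus\boldsymbol G_\theta]. \tag{15.5.17}
\]
On the other hand,
\[
E_\sigma\,\psi(\sigma_T) \ge \min_{\sigma'\in S_N[\partial_A\boldsymbol U_\delta]} \psi(\sigma')\; P_\sigma\bigl(\tau_{S_N[\partial_A\boldsymbol U_\delta]} < \tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr), \tag{15.5.18}
\]
and hence
\[
P_\sigma\bigl(\tau_{S_N[\partial_A\boldsymbol U_\delta]} < \tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) \le \max_{\sigma'\in S_N[\partial_A\boldsymbol U_\delta]} \frac{\psi(\sigma)}{\psi(\sigma')}. \tag{15.5.19}
\]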

The problem is to find a super-harmonic function in order to get a suitable bound in

(15.5.19).
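The barrier-function argument above can be illustrated on a toy example. The sketch below is not the model of this chapter: it assumes a one-dimensional discrete-time birth-death chain with Metropolis-type jump probabilities on a linear energy profile (all names and parameter values are hypothetical). It checks that $\psi = e^{(1-\alpha)\beta F}$ is super-harmonic in the interior and that the resulting ratio bound of the type (15.5.19) dominates the exact probability of reaching the top before the bottom.

```python
import math

def rates(F, beta, i):
    """Metropolis-type up/down jump probabilities at interior site i (toy chain)."""
    up = 0.5 * math.exp(-beta * max(F[i + 1] - F[i], 0.0))
    down = 0.5 * math.exp(-beta * max(F[i - 1] - F[i], 0.0))
    return up, down

def generator_on(psi, F, beta, i):
    """(L psi)(i) for the toy chain."""
    up, down = rates(F, beta, i)
    return up * (psi[i + 1] - psi[i]) + down * (psi[i - 1] - psi[i])

def hit_top_first(F, beta):
    """Exact P_i(tau_L < tau_0) via the gambler's-ruin formula for birth-death chains."""
    L = len(F) - 1
    weights = [1.0]  # weights[k] = prod_{j=1..k} down_j / up_j
    for j in range(1, L):
        up, down = rates(F, beta, j)
        weights.append(weights[-1] * down / up)
    total = sum(weights)
    h, partial = [0.0], 0.0
    for k in range(L):
        partial += weights[k]
        h.append(partial / total)
    return h

# Hypothetical toy landscape: energy increasing linearly from the valley (0) to the top (L).
L, slope, beta, alpha = 20, 0.3, 2.0, 0.2
F = [slope * i for i in range(L + 1)]
psi = [math.exp((1.0 - alpha) * beta * f) for f in F]

super_harmonic = all(generator_on(psi, F, beta, i) <= 1e-12 for i in range(1, L))
h = hit_top_first(F, beta)
bound_holds = all(h[i] <= psi[i] / psi[L] + 1e-12 for i in range(L + 1))
print(super_harmonic, bound_holds)
```

The factor $1-\alpha$ is what leaves room for super-harmonicity: with $\alpha=0$ the function $e^{\beta F}$ is exactly harmonic for this toy chain, so there is no slack to absorb error terms.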

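Proposition 15.20 For any $\alpha\in(0,1)$ there exists $n_0\in\mathbb{N}$ such that the function $\psi(\sigma) = \phi(\boldsymbol m(\sigma))$ with $\phi\colon\mathbb{R}^n\to\mathbb{R}$ defined by
\[
\phi(\boldsymbol x) = e^{(1-\alpha)\beta N F_{\beta,N}(\boldsymbol x)} \tag{15.5.20}
\]
is super-harmonic in $S_N[\boldsymbol U_\delta\setminus\boldsymbol G_\theta]$ for all $n\ge n_0$, for $N$ sufficiently large.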

The proof of Proposition 15.20 will involve computations with differences of the
function Fβ,N . We collect some necessary properties that will be needed along the
way. First we need some control on the second derivative of this function. A simple
computation shows that

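\[
\frac{\partial^2 F_{\beta,N}(\boldsymbol x)}{\partial x_\ell^2} = \frac{2}{N}\Bigl(-1 + \frac{1}{\beta\rho_\ell}\, I''_{N,\ell}(x_\ell/\rho_\ell)\Bigr). \tag{15.5.21}
\]

Thus, we need to estimate the function $I''_{N,\ell}$.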

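Lemma 15.21 For any $y\in(-1,1)$,
\[
\tanh^{-1}(y) - \beta\varepsilon \le I'_{N,\ell}(y) \le \tanh^{-1}(y) + \beta\varepsilon. \tag{15.5.22}
\]
In particular, $\lim_{y\to\pm1} I'_{N,\ell}(y) = \pm\infty$.

Proof Recall that $I'_{N,\ell}(y) = (U'_{N,\ell})^{-1}(y)$. Set $I'_{N,\ell}(y) = t$. Then
\[
y = \frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell} \tanh(t+\beta\tilde h_i), \tag{15.5.23}
\]
and hence
\[
\tanh(t-\beta\varepsilon) \le y \le \tanh(t+\beta\varepsilon), \tag{15.5.24}
\]
or, equivalently, (15.5.22).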
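The sandwich bound (15.5.22) is easy to test numerically. The sketch below assumes, as the proof does implicitly, that the centred fields within a block satisfy $|\tilde h_i|\le\varepsilon$; the block size, the uniform sampling of the fields and the helper names `U_prime` and `I_prime` are hypothetical choices made for illustration.

```python
import math
import random

def U_prime(t, beta, h_tilde):
    """Block average of tanh(t + beta * h_i), as in (15.5.23)."""
    return sum(math.tanh(t + beta * h) for h in h_tilde) / len(h_tilde)

def I_prime(y, beta, h_tilde, lo=-50.0, hi=50.0):
    """Invert the increasing map t -> U_prime(t) by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if U_prime(mid, beta, h_tilde) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(0)
beta, eps = 1.2, 0.05
# Assumption: centred fields in one block, all within eps of zero.
h_tilde = [random.uniform(-eps, eps) for _ in range(500)]

ys = [-0.99, -0.5, 0.0, 0.3, 0.9, 0.99]
gaps = [abs(I_prime(y, beta, h_tilde) - math.atanh(y)) for y in ys]
print(max(gaps), beta * eps)
```

Each gap stays below $\beta\varepsilon$, in line with the lemma, because $\tanh(t-\beta\varepsilon)\le U'(t)\le\tanh(t+\beta\varepsilon)$ term by term.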

Lemma 15.22 For any $y\in(-1,1)$,
\[
0 \le I''_{N,\ell}(y) \le \frac{1}{1-\bigl[|y|+\varepsilon\beta(1-y^2)\bigr]^2}. \tag{15.5.25}
\]
In particular, for all $y\in[-1+\nu,1-\nu]$ with $\nu\in(0,1/2)$,
\[
0 \le I''_{N,\ell}(y) \le \frac{1}{2\nu+\nu^2+O(\varepsilon)} \le c, \tag{15.5.26}
\]
and, for all $y\in(-1,-1+\nu]\cup[1-\nu,1)$,
\[
0 \le I''_{N,\ell}(y) \le \frac{1}{1-|y|}. \tag{15.5.27}
\]

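Proof We only consider the case $y\ge0$, the case $y<0$ being completely analogous. Using the relation $I''_{N,\ell}(x) = \bigl(U''_{N,\ell}(I'_{N,\ell}(x))\bigr)^{-1}$, setting $t_\ell = I'_{N,\ell}(y)$, which lies within $\beta\varepsilon$ of $\operatorname{arctanh}(y)$ by Lemma 15.21, we obtain
\begin{align*}
I''_{N,\ell}(y) &= \frac{1}{\frac{1}{|\Lambda_\ell(x)|}\sum_{i\in\Lambda_\ell(x)}\bigl[1-\tanh^2(\beta\tilde h_i+t_\ell)\bigr]} \\
&\le \frac{1}{1-\tanh^2(\varepsilon\beta+t_\ell)} \\
&\le \frac{1}{1-\tanh^2\bigl(\tanh^{-1}(y)+2\varepsilon\beta\bigr)} \\
&\le \frac{1}{1-\bigl[y+2\varepsilon\beta\tanh'\bigl(\tanh^{-1}(y)\bigr)\bigr]^2} \\
&= \frac{1}{1-\bigl[y+2\varepsilon\beta(1-y^2)\bigr]^2}, \tag{15.5.28}
\end{align*}
where we use that $\tanh$ is monotone increasing. The remainder of the proof is elementary algebra.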

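Let us define, for all $\boldsymbol x$ such that $x_\ell/\rho_\ell\in[-1,1-2N^{-1}]$,
\[
g_\ell(\boldsymbol x) = \frac{N}{2}\bigl(F_{\beta,N}(\boldsymbol x+e_\ell) - F_{\beta,N}(\boldsymbol x)\bigr). \tag{15.5.29}
\]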
Lemma 15.22 has the following corollary.

Corollary 15.23
(i) If $x_\ell/\rho_\ell\in[-1+\nu,1-\nu]$ with $\nu>0$, then
\[
g_\ell(\boldsymbol x) = -x - \bar h_\ell + \frac{1}{\beta}\,I'_{N,\ell}(x_\ell/\rho_\ell) + O(1/N). \tag{15.5.30}
\]
(ii) If $x_\ell/\rho_\ell\in[-1,-1+\nu]\cup[1-\nu,1-2N^{-1}]$, then
\[
g_\ell(\boldsymbol x) = -x - \bar h_\ell + \frac{1}{\beta}\,I'_{N,\ell}(x_\ell/\rho_\ell) + O(1), \tag{15.5.31}
\]
where $O(1)$ is independent of $N$, $n$ and $\nu$.
(iii) If $x_\ell/\rho_\ell\in[-1+\nu,1-\nu]$ with $\nu>0$, then there exists a $c<\infty$ independent of $N$ such that
\[
g_\ell(\boldsymbol x) - g_\ell(\boldsymbol x-e_\ell) \le \frac{c}{N}. \tag{15.5.32}
\]
(iv) If $x_\ell/\rho_\ell\in[-1,-1+\nu]\cup[1-\nu,1-2N^{-1}]$, then
\[
g_\ell(\boldsymbol x) - g_\ell(\boldsymbol x-e_\ell) \le C, \tag{15.5.33}
\]
where $C$ is a constant independent of $N$, $n$ and $\nu$.

The proof of this corollary is elementary and will not be detailed. The usefulness of (ii) results from the fact that $|I'_{N,\ell}|$ is large on the relevant domain. More precisely, we have the following lemma.

Lemma 15.24 There exists a $\nu>0$ independent of $N$ and $n$ such that if $x_\ell/\rho_\ell > 1-\nu$, then $g_\ell(\boldsymbol x)$ is strictly increasing in $x_\ell$ and tends to $\infty$ as $x_\ell/\rho_\ell\uparrow1$. Similarly, if $x_\ell/\rho_\ell < -1+\nu$, then $g_\ell(\boldsymbol x)$ is strictly decreasing in $x_\ell$ and tends to $-\infty$ as $x_\ell/\rho_\ell\downarrow-1$.

Proof Combine Corollary 15.23(ii) with Lemma 15.21 and note that $\bar h_\ell$ is bounded by hypothesis.

The next step towards the proof of Proposition 15.20 is the following lemma.

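Lemma 15.25 Let $\boldsymbol m\in\boldsymbol U_\delta\setminus\boldsymbol G_\theta$ and put $S(\boldsymbol m) = \{1\le\ell\le n : |m_\ell/\rho_\ell| \ne 1\}$. Then there exists a constant $c = c(\beta,h) > 0$ independent of $N$ and $n$ such that the following holds. If
\[
\sum_{\ell\notin S(\boldsymbol m)} \rho_\ell \le \frac{\varepsilon^2}{8\theta}, \tag{15.5.34}
\]
then
\[
\sum_{\ell\in S(\boldsymbol m)} \rho_\ell\, g_\ell^2(\boldsymbol m) \ge c\,\frac{\varepsilon^2}{\theta}. \tag{15.5.35}
\]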

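Proof From the relation $I'_{N,\ell}(x) = (U'_{N,\ell})^{-1}(x)$ we get that, for all $\ell\in S(\boldsymbol m)$,
\[
m_\ell = \frac{1}{N}\sum_{i\in\Lambda_\ell} \tanh\Bigl(\beta\bigl[g_\ell(\boldsymbol m)\bigl(1+o(1)\bigr) + m + h_i\bigr]\Bigr), \tag{15.5.36}
\]
where $o(1)$ tends to zero as $N\to\infty$.

We need to be concerned about small $g_\ell(\boldsymbol m)$. Subtracting $\frac{1}{N}\sum_{i\in\Lambda_\ell}\tanh(\beta(m+h_i))$ on both sides of (15.5.36) and expanding the right-hand side to first order in $g_\ell(\boldsymbol m)$, and afterwards summing over $\ell\in S(\boldsymbol m)$, we obtain
\begin{align*}
&\Bigl| m - \frac{1}{N}\sum_{i=1}^N \tanh\bigl(\beta(m+h_i)\bigr) - \sum_{\ell\notin S(\boldsymbol m)} \Bigl( m_\ell - \frac{1}{N}\sum_{i\in\Lambda_\ell}\tanh\bigl(\beta(m+h_i)\bigr) \Bigr) \Bigr| \\
&\qquad \le c\sum_{\ell\in S(\boldsymbol m)} \rho_\ell\,|g_\ell(\boldsymbol m)| \le c\,\Bigl(\sum_{\ell\in S(\boldsymbol m)} \rho_\ell\, g_\ell^2(\boldsymbol m)\Bigr)^{1/2}. \tag{15.5.37}
\end{align*}
Note that the function $m\mapsto m - \frac{1}{N}\sum_{i=1}^N\tanh(\beta(m+h_i))$ has, by (14.2.18), a non-zero derivative at $m^*$. Moreover, by construction, $m^*$ is the only zero of this function in $U_\delta^-(m^*)$. From this observation, together with (15.5.37), we conclude that
\[
\Bigl(\sum_{\ell=1}^n \rho_\ell\, g_\ell^2(\boldsymbol m)\Bigr)^{1/2} \ge c\,|m-m^*| - 2\sum_{\ell\notin S(\boldsymbol m)} \rho_\ell \tag{15.5.38}
\]
for some constant $c<\infty$, where we use the triangle inequality and the fact that $\bigl|m_\ell - \frac{1}{N}\sum_{i\in\Lambda_\ell}\tanh(\beta(m+h_i))\bigr| \le 2\rho_\ell$. Under the hypothesis of the lemma, this gives the desired bound when $|m-m^*| \ge c'\varepsilon/\sqrt{\theta}$ for some constant $c'<\infty$. On the other hand, we can write for $\ell\in S(\boldsymbol m)$,
\begin{align*}
|m_\ell - m^*_\ell| &\le \frac{1}{N}\sum_{i\in\Lambda_\ell} \Bigl|\tanh\Bigl(\beta\bigl[g_\ell(\boldsymbol m)\bigl(1+o(1)\bigr) + m + h_i\bigr]\Bigr) - \tanh\bigl(\beta(m+h_i)\bigr)\Bigr| \\
&\quad + \frac{1}{N}\sum_{i\in\Lambda_\ell} \Bigl|\tanh\bigl(\beta(m+h_i)\bigr) - \tanh\bigl(\beta(m^*+h_i)\bigr)\Bigr| \\
&\le c\,\rho_\ell\,|m-m^*| + c'\rho_\ell\,|g_\ell(\boldsymbol m)|. \tag{15.5.39}
\end{align*}
Hence we get the bound
\begin{align*}
\Bigl(\sum_{\ell\in S(\boldsymbol m)} \rho_\ell\, g_\ell^2(\boldsymbol m)\Bigr)^{1/2}
&\ge c\,\Bigl(\sum_{\ell\in S(\boldsymbol m)} \frac{(m_\ell-m^*_\ell)^2}{\rho_\ell}\Bigr)^{1/2} - c'\,|m-m^*| \\
&= c\,\Bigl(\sum_{\ell=1}^n \frac{(m_\ell-m^*_\ell)^2}{\rho_\ell} - \sum_{\ell\notin S(\boldsymbol m)} \frac{(m_\ell-m^*_\ell)^2}{\rho_\ell}\Bigr)^{1/2} - c'\,|m-m^*| \\
&\ge c\,\Bigl(\varepsilon^2/\theta - 4\sum_{\ell\notin S(\boldsymbol m)} \rho_\ell\Bigr)^{1/2} - c'\,|m-m^*| \\
&\ge c\,\varepsilon/\sqrt{2\theta} - c'\,|m-m^*|, \tag{15.5.40}
\end{align*}
where in the last line we use that $\boldsymbol m\notin\boldsymbol G_\theta$. The inequalities in (15.5.38) and (15.5.40) yield (15.5.35).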

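Proof of Proposition 15.20 Let $\sigma\in S_N[\boldsymbol U_\delta\setminus\boldsymbol G_\theta]$ and set $\boldsymbol x = \boldsymbol m(\sigma)$, so that $\psi(\sigma) = \phi(\boldsymbol x)$ for $\psi$ as in Proposition 15.20, and abbreviate $(L\psi)(\sigma) = (L\phi)(\boldsymbol x)$. Let $\sigma^i$ be the configuration obtained from $\sigma$ after a spin-flip at $i$, and introduce the notation
\[
(L\phi)(\boldsymbol x) = \sum_{\ell=1}^n (L_\ell\phi)(\boldsymbol x), \tag{15.5.41}
\]
where
\[
(L_\ell\phi)(\boldsymbol x) = \sum_{i\in\Lambda_\ell^-(\boldsymbol x)} p_N(\sigma,\sigma^i)\bigl[\phi(\boldsymbol x+e_\ell)-\phi(\boldsymbol x)\bigr]
+ \sum_{i\in\Lambda_\ell^+(\boldsymbol x)} p_N(\sigma,\sigma^i)\bigl[\phi(\boldsymbol x-e_\ell)-\phi(\boldsymbol x)\bigr]. \tag{15.5.42}
\]
Note that if $x_\ell/\rho_\ell = \pm1$, then $\Lambda_\ell^\pm(\boldsymbol x) = \emptyset$ and the summation over $\Lambda_\ell^\pm(\boldsymbol x)$ in (15.5.42) disappears.
We define the probabilities
\[
P^\sigma_{\pm,\ell} = \sum_{i\in\Lambda_\ell^\mp(\boldsymbol x)} p_N(\sigma,\sigma^i), \tag{15.5.43}
\]
and observe that they are uniformly close to the mesoscopic rates $r_N$, namely,
\[
e^{-c\varepsilon} \le \frac{P^\sigma_{\pm,\ell}}{r_N(\boldsymbol x,\boldsymbol x\pm e_\ell)} \le e^{c\varepsilon} \tag{15.5.44}
\]
for some $c>0$ and $\varepsilon = 1/n$. Note also that
\[
c\,\rho_\ell \le P^\sigma_{+,\ell} + P^\sigma_{-,\ell} \le c'\rho_\ell. \tag{15.5.45}
\]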
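With the above notation and using the convention $0/0=0$, we get
\begin{align*}
(L_\ell\phi)(\boldsymbol x) &= \phi(\boldsymbol x)\,P^\sigma_{+,\ell}\bigl[\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1\bigr]
+ \phi(\boldsymbol x)\,P^\sigma_{-,\ell}\bigl[\exp\bigl(-2\beta(1-\alpha)g_\ell(\boldsymbol x-e_\ell)\bigr)-1\bigr] \\
&= \phi(\boldsymbol x)\Bigl( 1_{\{P^\sigma_{+,\ell}\ge P^\sigma_{-,\ell}\}}\,P^\sigma_{+,\ell}\,G_\ell^+(\boldsymbol x) + 1_{\{P^\sigma_{-,\ell}>P^\sigma_{+,\ell}\}}\,P^\sigma_{-,\ell}\,G_\ell^-(\boldsymbol x) \Bigr), \tag{15.5.46}
\end{align*}
where we introduce the functions
\[
G_\ell^+(\boldsymbol x) = \exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1 + \frac{P^\sigma_{-,\ell}}{P^\sigma_{+,\ell}}\bigl[\exp\bigl(-2\beta(1-\alpha)g_\ell(\boldsymbol x-e_\ell)\bigr)-1\bigr] \tag{15.5.47}
\]
and
\[
G_\ell^-(\boldsymbol x) = \exp\bigl(-2\beta(1-\alpha)g_\ell(\boldsymbol x-e_\ell)\bigr)-1 + \frac{P^\sigma_{+,\ell}}{P^\sigma_{-,\ell}}\bigl[\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1\bigr]. \tag{15.5.48}
\]
If $x_\ell/\rho_\ell=\pm1$, then the local generator takes the simpler form
\[
(L_\ell\phi)(\boldsymbol x) =
\begin{cases}
\phi(\boldsymbol x)\,P^\sigma_{-,\ell}\bigl[\exp\bigl(-2\beta(1-\alpha)g_\ell(\boldsymbol x-e_\ell)\bigr)-1\bigr], & \text{if } x_\ell/\rho_\ell = 1,\\[2pt]
\phi(\boldsymbol x)\,P^\sigma_{+,\ell}\bigl[\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1\bigr], & \text{if } x_\ell/\rho_\ell = -1.
\end{cases} \tag{15.5.49}
\]
From Lemma 15.24 and inequalities (15.5.45), it follows that, for all $\ell$ such that $x_\ell/\rho_\ell=\pm1$,
\[
(L_\ell\phi)(\boldsymbol x) \le -\bigl(1+o(1)\bigr)\rho_\ell\,\phi(\boldsymbol x). \tag{15.5.50}
\]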
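Let us now return to the case when $\boldsymbol x$ is not a boundary point. By the reversibility conditions,
\begin{align*}
r_N(\boldsymbol x,\boldsymbol x+e_\ell) &= \exp\bigl(-2\beta g_\ell(\boldsymbol x)\bigr)\, r_N(\boldsymbol x+e_\ell,\boldsymbol x),\\
r_N(\boldsymbol x,\boldsymbol x-e_\ell) &= \exp\bigl(2\beta g_\ell(\boldsymbol x-e_\ell)\bigr)\, r_N(\boldsymbol x-e_\ell,\boldsymbol x), \tag{15.5.51}
\end{align*}
which implies, together with (15.5.44), that
\begin{align*}
\exp\bigl(-2\beta g_\ell(\boldsymbol x)-c\varepsilon\bigr) &\le \frac{P^\sigma_{+,\ell}}{P^\sigma_{-,\ell}} \le \exp\bigl(-2\beta g_\ell(\boldsymbol x)+c\varepsilon\bigr),\\
\exp\bigl(2\beta g_\ell(\boldsymbol x-e_\ell)-c\varepsilon\bigr) &\le \frac{P^\sigma_{-,\ell}}{P^\sigma_{+,\ell}} \le \exp\bigl(2\beta g_\ell(\boldsymbol x-e_\ell)+c\varepsilon\bigr). \tag{15.5.52}
\end{align*}
Inserting the last bounds into (15.5.47) and (15.5.48), we obtain, after some computations,
\begin{align*}
G_\ell^+(\boldsymbol x) &\le \bigl[\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1\bigr]\bigl[1-\exp\bigl(2\beta\alpha g_\ell(\boldsymbol x-e_\ell)\mp c\varepsilon\bigr)\bigr] \\
&\quad + \exp\bigl(2\beta g_\ell(\boldsymbol x-e_\ell)\mp c\varepsilon\bigr)\bigl[\exp\bigl(2\beta(1-\alpha)\bigl(g_\ell(\boldsymbol x)-g_\ell(\boldsymbol x-e_\ell)\bigr)\bigr)-1\bigr] \tag{15.5.53}
\end{align*}
and
\begin{align*}
G_\ell^-(\boldsymbol x) &\le \bigl[\exp\bigl(-2\beta(1-\alpha)g_\ell(\boldsymbol x-e_\ell)\bigr)-1\bigr]\bigl[1-\exp\bigl(-2\beta\alpha g_\ell(\boldsymbol x)\mp c\varepsilon\bigr)\bigr] \\
&\quad + \exp\bigl(-2\beta g_\ell(\boldsymbol x)\mp c\varepsilon\bigr)\bigl[\exp\bigl(2\beta(1-\alpha)\bigl(g_\ell(\boldsymbol x)-g_\ell(\boldsymbol x-e_\ell)\bigr)\bigr)-1\bigr], \tag{15.5.54}
\end{align*}
where $\mp = -\mathrm{sign}(g_\ell(\boldsymbol x)) = -\mathrm{sign}(g_\ell(\boldsymbol x-e_\ell))$. For all $\ell$ such that $x_\ell/\rho_\ell\in[-1+\nu,1-\nu]$, we can use (15.5.32) to get
\[
G_\ell^+(\boldsymbol x) \le \bigl[\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1\bigr]\bigl[1-\exp\bigl(2\alpha\beta g_\ell(\boldsymbol x)\mp c\varepsilon\bigr)\bigr] + c/N \tag{15.5.55}
\]
and
\[
G_\ell^-(\boldsymbol x) \le \bigl[\exp\bigl(-2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1\bigr]\bigl[1-\exp\bigl(-2\alpha\beta g_\ell(\boldsymbol x)\mp c\varepsilon\bigr)\bigr] + c/N. \tag{15.5.56}
\]
The right-hand sides of (15.5.55) and (15.5.56) are negative if and only if $|g_\ell| > \frac{c\varepsilon}{2\alpha\beta}$.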
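Let us define the index sets
\begin{align*}
S^< &= \Bigl\{ \ell : x_\ell/\rho_\ell\in[-1+\nu,1-\nu],\ |g_\ell(\boldsymbol x)| \le \frac{c\varepsilon}{\alpha\beta} \Bigr\}, \tag{15.5.57}\\
S^> &= \Bigl\{ \ell : x_\ell/\rho_\ell\in[-1+\nu,1-\nu],\ |g_\ell(\boldsymbol x)| > \frac{c\varepsilon}{\alpha\beta} \Bigr\}. \tag{15.5.58}
\end{align*}
If $\ell\in S^<$, then we immediately get that
\[
\max\bigl\{G_\ell^+(\boldsymbol x),\,G_\ell^-(\boldsymbol x)\bigr\} \le \frac{c}{\alpha}\,\varepsilon^2, \tag{15.5.59}
\]
and hence from (15.5.45) and (15.5.46) that
\[
(L_\ell\phi)(\boldsymbol x) \le \frac{c}{\alpha}\,\varepsilon^2\rho_\ell\,\phi(\boldsymbol x). \tag{15.5.60}
\]
To control the right-hand side of (15.5.55) and (15.5.56) when $\ell\in S^>$, we set
\[
y_\ell = \min\bigl\{\beta|g_\ell(\boldsymbol x)|,\ \tfrac12\bigr\} \le \beta|g_\ell(\boldsymbol x)|. \tag{15.5.61}
\]
If $g_\ell(\boldsymbol x) > \frac{c\varepsilon}{\alpha\beta}$, then
\[
\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1 \ge \exp\bigl(2(1-\alpha)y_\ell\bigr)-1 \ge 2(1-\alpha)y_\ell \tag{15.5.62}
\]
and
\[
1-\exp\bigl(2\beta\alpha g_\ell(\boldsymbol x)-c\varepsilon\bigr) \le 1-\exp(\alpha y_\ell) \le -\alpha y_\ell, \tag{15.5.63}
\]
so that the product in the right-hand side of (15.5.55) is bounded from above by $-\tfrac34(1-\alpha)\alpha y_\ell^2$. On the other hand, if $g_\ell(\boldsymbol x) < -\frac{c\varepsilon}{\alpha\beta}$, then
\[
\exp\bigl(2\beta(1-\alpha)g_\ell(\boldsymbol x)\bigr)-1 \le \exp\bigl(-2(1-\alpha)y_\ell\bigr)-1 \le -(1-\alpha)y_\ell \tag{15.5.64}
\]
and
\[
1-\exp\bigl(2\beta\alpha g_\ell(\boldsymbol x)+c\varepsilon\bigr) \ge 1-\exp(-\alpha y_\ell) \ge \tfrac34\alpha y_\ell, \tag{15.5.65}
\]
and the product in the right-hand side of (15.5.55) is bounded from above by $-\tfrac34(1-\alpha)\alpha y_\ell^2$. Altogether this proves that, for all $\ell\in S^>$,
\[
G_\ell^+(\boldsymbol x) \le -\tfrac34(1-\alpha)\alpha y_\ell^2, \tag{15.5.66}
\]
and with a similar computation that
\[
G_\ell^-(\boldsymbol x) \le -\tfrac34(1-\alpha)\alpha y_\ell^2. \tag{15.5.67}
\]
If $\ell\in S^>$, then we have
\[
(L_\ell\phi)(\boldsymbol x) \le -c\,\alpha\rho_\ell\, y_\ell^2\,\phi(\boldsymbol x). \tag{15.5.68}
\]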

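It remains to control the case $x_\ell/\rho_\ell\in(-1,-1+\nu]\cup[1-\nu,1)$. It follows from Lemma 15.24 that, while the positive contribution to $G_\ell^+(\boldsymbol x)$ and $G_\ell^-(\boldsymbol x)$ is bounded by a constant, the negative contribution becomes large when $\nu$ gets small. More explicitly, for $\nu$ small enough we have
\begin{align*}
G_\ell^+(\boldsymbol x) &\le -\bigl(e^{\pm C}-1\bigr)^2 + e^{\pm C'}\bigl(e^{2\beta(1-\alpha)c}-1\bigr) \le -\bigl(1+o(1)\bigr),\\
G_\ell^-(\boldsymbol x) &\le -\bigl(1-e^{\mp C}\bigr)^2 + e^{\mp C'}\bigl(e^{2\beta(1-\alpha)c}-1\bigr) \le -\bigl(1+o(1)\bigr), \tag{15.5.69}
\end{align*}
where $C$ and $C'$ are positive constants tending to $\infty$ as $\nu\downarrow0$, and the sign $\pm$ is equal to the sign of $x_\ell$. Together with (15.5.45) and (15.5.46), we finally get
\[
(L_\ell\phi)(\boldsymbol x) \le -\bigl(1+o(1)\bigr)\rho_\ell\,\phi(\boldsymbol x). \tag{15.5.70}
\]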

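From (15.5.50), (15.5.60), (15.5.68) and (15.5.70), it turns out that the positive contribution to the generator $(L\phi)(\boldsymbol x) = \sum_{\ell=1}^n (L_\ell\phi)(\boldsymbol x)$ comes at most from the indices $\ell\in S^<$, and can be estimated by
\[
\frac{c}{\alpha}\,\varepsilon^2 \sum_{\ell\in S^<} \rho_\ell \le \frac{c}{\alpha}\,\varepsilon^2. \tag{15.5.71}
\]
We must distinguish two cases, according to whether the hypothesis of Lemma 15.25 is satisfied or not.
Case 1: $\sum_{\ell\notin S(\boldsymbol x)} \rho_\ell > \frac{\varepsilon^2}{8\theta}$. By (15.5.50), we get
\begin{align*}
\sum_{\ell=1}^n (L_\ell\phi)(\boldsymbol x) &\le \sum_{\ell\notin S(\boldsymbol x)} (L_\ell\phi)(\boldsymbol x) + \sum_{\ell\in S^<} (L_\ell\phi)(\boldsymbol x) \tag{15.5.72}\\
&\le -\frac{\varepsilon^2}{8\theta}\bigl(1+o(1)\bigr)\phi(\boldsymbol x) + \frac{c}{\alpha}\,\varepsilon^2,
\end{align*}
which is as negative as desired when $\theta$ is small enough, i.e., when $\varepsilon$ is small enough.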
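Case 2: $\sum_{\ell\notin S(\boldsymbol x)} \rho_\ell \le \frac{\varepsilon^2}{8\theta}$. In this case, the assertion of Lemma 15.25 holds. By (15.5.50), (15.5.68), and (15.5.70), we have that, for all $\ell\in S(\boldsymbol x)\setminus S^<$,
\[
(L_\ell\phi)(\boldsymbol x) \le -\rho_\ell\,\phi(\boldsymbol x)\,\min\bigl\{c\,\alpha y_\ell^2,\,1\bigr\} \le -c\,\alpha\rho_\ell\, y_\ell^2\,\phi(\boldsymbol x), \tag{15.5.73}
\]
where the last inequality holds for $\alpha<4/c$. Let us write the generator as
\[
(L\phi)(\boldsymbol x) \le \sum_{\ell\in S(\boldsymbol x)\setminus S^<} (L_\ell\phi)(\boldsymbol x) + \sum_{\ell\in S^<} (L_\ell\phi)(\boldsymbol x). \tag{15.5.74}
\]
The first sum in (15.5.74) is bounded from above by
\begin{align*}
-c\,\alpha\,\phi(\boldsymbol x) \sum_{\ell\in S(\boldsymbol x)\setminus S^<} \rho_\ell\, y_\ell^2
&\le -c\,\alpha\,\phi(\boldsymbol x) \sum_{\ell\in S(\boldsymbol x)\setminus S^<} \rho_\ell \min\bigl\{\beta^2 g_\ell^2(\boldsymbol x),\ \tfrac14\bigr\} \\
&\le -c\,\alpha\,\phi(\boldsymbol x)\,\min\Bigl\{\beta^2 \sum_{\ell\in S(\boldsymbol x)\setminus S^<} \rho_\ell\, g_\ell^2(\boldsymbol x),\ \tfrac14\Bigr\}. \tag{15.5.75}
\end{align*}
But from Lemma 15.25 we know that, for all $\boldsymbol x\in\boldsymbol U_\delta\setminus\boldsymbol G_\theta$,
\[
\sum_{\ell\in S(\boldsymbol x)\setminus S^<} \rho_\ell\, g_\ell^2(\boldsymbol x) \ge c\,\frac{\varepsilon^2}{\theta} - \frac{c'}{\alpha^2}\,\varepsilon^2 \ge c''\,\frac{\varepsilon^2}{\theta}, \tag{15.5.76}
\]
where $c''$ is a positive constant, provided that $\alpha\ge c\,\theta$. Taking $n$ large enough, we find that
\[
\min\Bigl\{\beta^2 \sum_{\ell\in S(\boldsymbol x)\setminus S^<} \rho_\ell\, g_\ell^2(\boldsymbol x),\ \tfrac14\Bigr\} \ge \min\Bigl\{c''\,\frac{\varepsilon^2}{\theta},\ \tfrac14\Bigr\} = c''\,\frac{\varepsilon^2}{\theta}, \tag{15.5.77}
\]
and then, from (15.5.71) and (15.5.75), we get
\[
(L\psi)(\sigma) \le -\varepsilon^2(1-\alpha)\,\phi(\boldsymbol x)\,\bigl(c''\alpha\theta^{-1} - c'\alpha^{-1}\bigr). \tag{15.5.78}
\]
By our choice of $\theta$, taking $n$ large enough we see that the condition $c''\alpha\theta^{-1} - c'\alpha^{-1} > 0$, or $\alpha > c\,\theta$, is satisfied for any $\alpha\in(0,1)$. Hence, for such $n$ and for $N$ large enough, we get that $(L\psi)(\sigma)\le0$, which concludes the proof of Proposition 15.20.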

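Substituting the expression for the super-harmonic function in (15.5.20) into (15.5.19), and using (15.5.15), we obtain that, for all $\sigma\in S_N[\boldsymbol U_\delta\setminus\boldsymbol G_\theta]$,
\begin{align*}
P_\sigma\bigl[\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr]
&\le \max_{\sigma'\in S_N[\partial_A\boldsymbol U_\delta]} e^{-(1-\alpha)\beta N\,[F_{\beta,N}(\boldsymbol m(\sigma'))-F_{\beta,N}(\boldsymbol m(\sigma))]} \\
&\le e^{-(1-\alpha)\beta N\,[F_{\beta,N}(m_0^*)+\delta-F_{\beta,N}(\boldsymbol m(\sigma))]}, \tag{15.5.79}
\end{align*}
where the last inequality follows from the definition of $\boldsymbol U_\delta$, together with the bounds in (14.7.3). This concludes the proof of Proposition 15.19.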

Renewal estimates on escape probabilities

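Let us now return to the proof of Lemma 15.18. An easy consequence of (15.5.14) is that, for all $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$,
\[
P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) \le e^{-(1-\alpha)\beta N\,(F_{\beta,N}(m_0^*)+\delta)} \max_{\boldsymbol m\in\partial_A\boldsymbol G_\theta} e^{(1-\alpha)\beta N F_{\beta,N}(\boldsymbol m)}, \tag{15.5.80}
\]
while obviously $P_\sigma(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}) = 0$ for all $\sigma\in S_N[\boldsymbol G_\theta\setminus\partial_A\boldsymbol G_\theta]$. To control the right-hand side of (15.5.80), we need the following lemma.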

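Lemma 15.26 There exists a constant $c<\infty$ independent of $n$ such that, for all $\boldsymbol m\in\boldsymbol G_\theta$,
\[
F_{\beta,N}(\boldsymbol m) \le F_{\beta,N}(\boldsymbol m^*) + c\,\varepsilon. \tag{15.5.81}
\]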

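Proof Fix $\boldsymbol m\in\boldsymbol G_\theta$ and set $\boldsymbol m-\boldsymbol m^* = \boldsymbol v$. Note that, from the definition of $\boldsymbol G_\theta$, we have
\[
\|\boldsymbol v\|_2^2 \le \max_{1\le\ell\le n}\rho_\ell \sum_{\ell=1}^n \frac{(m_\ell-m^*_\ell)^2}{\rho_\ell} \le \varepsilon^2. \tag{15.5.82}
\]
Using Taylor's formula, we have
\[
F_{\beta,N}(\boldsymbol m) = F_{\beta,N}(\boldsymbol m^*) + \tfrac12\bigl(\boldsymbol v, A(\boldsymbol m^*)\boldsymbol v\bigr) + \tfrac16\, D^3F_{\beta,N}(\boldsymbol x)\,\boldsymbol v^3, \tag{15.5.83}
\]
where $A(\boldsymbol m^*)$ is the positive-definite matrix described in Sect. 15.2.2 (see (15.2.9)) and $\boldsymbol x$ is a suitable element of the ball around $\boldsymbol m^*$. From the explicit representation of the eigenvalues of $A(\boldsymbol m^*)$, we see that $\|A(\boldsymbol m^*)\| \le c\,\varepsilon^{-1}$, and hence
\[
\bigl(\boldsymbol v, A(\boldsymbol m^*)\boldsymbol v\bigr) \le c\,\varepsilon^{-1}\|\boldsymbol v\|_2^2 \le c\,\varepsilon. \tag{15.5.84}
\]
The remainder is given in explicit form as
\begin{align*}
D^3F_{\beta,N}(\boldsymbol x)\,\boldsymbol v^3 &= \sum_{\ell=1}^n \frac{\partial^3F_{\beta,N}}{\partial x_\ell^3}(\boldsymbol x)\,v_\ell^3 = \frac{1}{\beta}\sum_{\ell=1}^n \frac{1}{\rho_\ell^2}\, I'''_{N,\ell}(x_\ell/\rho_\ell)\,v_\ell^3 \tag{15.5.85}\\
&= -\frac{1}{\beta}\sum_{\ell=1}^n \frac{1}{\rho_\ell^2}\, \frac{U'''_{N,\ell}(t_\ell)}{[U''_{N,\ell}(t_\ell)]^3}\,v_\ell^3 \\
&= -\frac{1}{\beta}\sum_{\ell=1}^n \frac{1}{\rho_\ell^2}\, \frac{|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\tanh(t_\ell+\beta\tilde h_i)\bigl[1-\tanh^2(t_\ell+\beta\tilde h_i)\bigr]}{\bigl(|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\bigl[1-\tanh^2(t_\ell+\beta\tilde h_i)\bigr]\bigr)^3}\,v_\ell^3,
\end{align*}
where $t_\ell = I'_{N,\ell}(x_\ell/\rho_\ell)$. Thus,
\[
\bigl|D^3F_{\beta,N}(\boldsymbol x)\,\boldsymbol v^3\bigr| \le c\sum_{\ell=1}^n \frac{1}{\rho_\ell^2}\,|v_\ell|^3 \le c'\varepsilon^{-1}\|\boldsymbol v\|_2^2 \le c''\varepsilon, \tag{15.5.86}
\]
where we use that $|v_\ell/\rho_\ell|\le1$. Hence, for some $c<\infty$ independent of $n$,
\[
F_{\beta,N}(\boldsymbol m) \le F_{\beta,N}(\boldsymbol m^*) + c\,\varepsilon. \tag{15.5.87}
\]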

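Inserting the result of Lemma 15.26 into (15.5.80), and recalling that $F_{\beta,N}(\boldsymbol m^*) = F_{\beta,N}(m^*)$, we get that, for all $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$,
\[
P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) \le e^{-(1-\alpha)\beta N\,(F_{\beta,N}(m_0^*)+\delta-F_{\beta,N}(m^*)-c\varepsilon)}. \tag{15.5.88}
\]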

The last ingredient in order to get a suitable estimate on Pσ (τA < τB ) is stated in
the following lemma.

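Lemma 15.27 For any $\delta_2>0$ there exists an $n_0\in\mathbb{N}$ such that for all $n\ge n_0$, all $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$ and all $N$ large enough,
\[
P_\sigma\bigl(\tau_B<\tau_{S_N[\partial_A\boldsymbol G_\theta]}\bigr) \ge e^{-N\beta\delta_2}. \tag{15.5.89}
\]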

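Proof Fix $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$ and set $\boldsymbol m(0) = \boldsymbol m(\sigma)$. As pointed out in the proof of Lemma 15.26, every $\boldsymbol m(0)\in\partial_A\boldsymbol G_\theta$ can be written in the form $\boldsymbol m(0) = \boldsymbol m^*+\boldsymbol v$ with $\boldsymbol v\in\Gamma_N^n$ such that $\|\boldsymbol v\|_2\le\varepsilon$. Let $\boldsymbol m = (\boldsymbol m(0),\boldsymbol m(1),\dots,\boldsymbol m(\|\boldsymbol v\|_1N) = \boldsymbol m^*)$ be a nearest-neighbour path in $\Gamma_N^n$ from $\boldsymbol m(0)$ to $\boldsymbol m^*$ of length $N\|\boldsymbol v\|_1$ with the following property: with $\ell_t$ the unique index in $\{1,\dots,n\}$ such that $m_{\ell_t}(t)\ne m_{\ell_t}(t-1)$,
\[
m_{\ell_t}(t) = m_{\ell_t}(t-1) + s_t\,\frac{2}{N}, \qquad \forall\,t\ge1, \tag{15.5.90}
\]
where we define
\[
s_t = \mathrm{sign}\bigl(m^*_{\ell_t} - m_{\ell_t}(t-1)\bigr). \tag{15.5.91}
\]
Note that, by property (15.5.90), $\boldsymbol m(t)\in\boldsymbol G_\theta$ for all $t\ge0$. Thus, all microscopic paths $(\sigma(t))_{t\ge0}$ such that $\sigma(0)=\sigma$ and $\boldsymbol m(\sigma(t)) = \boldsymbol m(t)$ for all $t\ge1$ are contained in the event $\{\tau_B<\tau_{S_N[\partial_A\boldsymbol G_\theta]}\}$. Therefore we get
\begin{align*}
P_\sigma\bigl(\tau_B<\tau_{S_N[\partial_A\boldsymbol G_\theta]}\bigr)
&\ge P_\sigma\bigl(\boldsymbol m(\sigma(t)) = \boldsymbol m(t)\ \ \forall\,t=1,\dots,\|\boldsymbol v\|_1N\bigr) \\
&= \prod_{t=1}^{\|\boldsymbol v\|_1N} P_\sigma\bigl(\boldsymbol m(\sigma(t)) = \boldsymbol m(t) \mid \boldsymbol m(\sigma(t-1)) = \boldsymbol m(t-1)\bigr) \\
&= \prod_{t=1}^{\|\boldsymbol v\|_1N}\ \sum_{i\in\Lambda_{\ell_t}^{s_t}} p_N\bigl(\sigma(t-1),\sigma^i(t-1)\bigr). \tag{15.5.92}
\end{align*}
Note that $\Lambda_{\ell_t}^{s_t}$ is the set of sites in which a spin-flip corresponds to a step from $\boldsymbol m(t-1)$ to $\boldsymbol m(t)$.

The sum of the probabilities in the right-hand side of (15.5.92) corresponds to the quantity $P^{\sigma(t-1)}_{s_t,\ell_t}$ defined in (15.5.43). From the inequalities in (15.5.44) and (15.3.3) it follows that, for some constant $c>0$ depending on $\beta$ and on the distribution of the magnetic field,
\[
P^{\sigma(t-1)}_{s_t,\ell_t} \ge c\,\bigl|\Lambda_{\ell_t}^{s_t}(\boldsymbol m(t-1))\bigr|/N \ge c\,\bigl|\Lambda_{\ell_t}^{s_t}(\boldsymbol m^*)\bigr|/N, \tag{15.5.93}
\]
where the second inequality follows by our choice of the path $\boldsymbol m$. Now, since $|\Lambda_\ell^\pm(\boldsymbol m^*)|/N = \tfrac12(\rho_\ell\pm m^*_\ell)$, we can use the expression in (15.2.8) for $m^*_{\ell_t}$ and continue from (15.5.93), to obtain
\[
P^{\sigma(t-1)}_{s_t,\ell_t} \ge c'\rho_{\ell_t}. \tag{15.5.94}
\]
Inserting the last inequality into (15.5.92) and using that, by the definition of the path $\boldsymbol m$, the number of steps corresponding to a spin-flip in $\Lambda_\ell$ is equal to $|v_\ell|N$ for all $\ell\in\{1,\dots,n\}$, we get
\begin{align*}
P_\sigma\bigl(\tau_B<\tau_{S_N[\partial_A\boldsymbol G_\theta]}\bigr)
&\ge \prod_{t=1}^{\|\boldsymbol v\|_1N} c'\rho_{\ell_t} = e^{\|\boldsymbol v\|_1N\ln(c')}\prod_{\ell=1}^n \rho_\ell^{|v_\ell|N} \tag{15.5.95}\\
&\ge e^{N\sqrt{\varepsilon}\ln(c')}\,e^{-N\sum_{\ell=1}^n|v_\ell|\ln(1/\rho_\ell)} \ge e^{N\sqrt{\varepsilon}\ln(c')}\,e^{-N\sum_{\ell=1}^n|v_\ell|/\sqrt{\rho_\ell}} \\
&\ge e^{N\sqrt{\varepsilon}\ln(c')}\,e^{-N\left(\sum_{\ell=1}^n v_\ell^2/\rho_\ell\right)^{1/2}\varepsilon^{-1/2}} \ge e^{-N\left(\sqrt{\varepsilon/\theta}-\sqrt{\varepsilon}\ln(c')\right)},
\end{align*}
where in the third line we use the inequality $\|\boldsymbol v\|_1\le\varepsilon^{-1/2}\|\boldsymbol v\|_2\le\sqrt{\varepsilon}$, and in the last line we use that $\boldsymbol m(0) = \boldsymbol m^*+\boldsymbol v\in\boldsymbol G_\theta$. By our choice of $\theta\gg\varepsilon$, there exists an $n_0\in\mathbb{N}$ such that, for all $n\ge n_0$, $\sqrt{\varepsilon/\theta}-\sqrt{\varepsilon}\ln(c')\le\beta\delta_2$. For such $n$, the inequality in (15.5.95) yields the bound in (15.5.89).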

We finally state the following proposition.

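Proposition 15.28 For all $\sigma\in S_N[\boldsymbol U_\delta]$,
\[
P_\sigma(\tau_A<\tau_B) \le e^{-\beta N\,[(1-\alpha)(F_{\beta,N}(m_0^*)+\delta-F_{\beta,N}(m^*)-c\varepsilon)-\delta_2]}\,\bigl(1+o(1)\bigr). \tag{15.5.96}
\]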

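Proof We first consider a configuration $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$. Then
\begin{align*}
P_\sigma(\tau_A<\tau_B)
&\le P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) + \sum_{\eta\in S_N[\partial_A\boldsymbol G_\theta]} P_\sigma\bigl(\tau_A<\tau_B,\ \tau_\eta\le\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup A\cup B}\bigr) \\
&\le P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) + \max_{\eta\in S_N[\partial_A\boldsymbol G_\theta]} P_\eta(\tau_A<\tau_B)\; P_\sigma\bigl(\tau_{S_N[\partial_A\boldsymbol G_\theta]}<\tau_B\bigr) \\
&\le P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) + \max_{\eta\in S_N[\partial_A\boldsymbol G_\theta]} P_\eta(\tau_A<\tau_B)\,\bigl(1-e^{-\beta N\delta_2}\bigr), \tag{15.5.97}
\end{align*}
where in the second line we use the Markov property, and in the last line we insert the result in (15.5.89). Taking the maximum over $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$ on both sides of (15.5.97), and rearranging the summation, we get
\begin{align*}
\max_{\sigma\in S_N[\partial_A\boldsymbol G_\theta]} P_\sigma(\tau_A<\tau_B)
&\le \max_{\sigma\in S_N[\partial_A\boldsymbol G_\theta]} P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr)\,e^{\beta N\delta_2} \\
&\le e^{-\beta N\,((1-\alpha)(F_{\beta,N}(m_0^*)+\delta-F_{\beta,N}(m^*)-c\varepsilon)-\delta_2)}, \tag{15.5.98}
\end{align*}
where in the last line we use the bound in (15.5.88). This concludes the proof of (15.5.96) for $\sigma\in S_N[\partial_A\boldsymbol G_\theta]$.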
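The rearrangement behind the first inequality in (15.5.98) can be spelled out as a short worked step. The abbreviations $p^*$ and $q^*$ are introduced here for illustration only and are not notation from the text.

```latex
% p^* and q^* are hypothetical shorthands, not notation from the text.
\begin{align*}
p^* &= \max_{\sigma\in S_N[\partial_A G_\theta]} P_\sigma(\tau_A<\tau_B), \qquad
q^* = \max_{\sigma\in S_N[\partial_A G_\theta]} P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A G_\theta]\cup B}\bigr),\\
p^* &\le q^* + p^*\bigl(1-e^{-\beta N\delta_2}\bigr)
\;\Longrightarrow\; e^{-\beta N\delta_2}\,p^* \le q^*
\;\Longrightarrow\; p^* \le q^*\,e^{\beta N\delta_2}.
\end{align*}
```

The price of the renewal step is therefore only the factor $e^{\beta N\delta_2}$, which can be made negligible on the exponential scale because $\delta_2$ is arbitrary in Lemma 15.27.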
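Next we consider $\sigma\in S_N[\boldsymbol U_\delta\setminus\partial_A\boldsymbol G_\theta]$. As before,
\begin{align*}
P_\sigma(\tau_A<\tau_B)
&\le P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) + \sum_{\eta\in S_N[\partial_A\boldsymbol G_\theta]} P_\sigma\bigl(\tau_A<\tau_B,\ \tau_\eta\le\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup A\cup B}\bigr) \\
&\le P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) + \max_{\eta\in S_N[\partial_A\boldsymbol G_\theta]} P_\eta(\tau_A<\tau_B)\; P_\sigma\bigl(\tau_{S_N[\partial_A\boldsymbol G_\theta]}<\tau_B\bigr) \\
&\le P_\sigma\bigl(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B}\bigr) + \max_{\eta\in S_N[\partial_A\boldsymbol G_\theta]} P_\eta(\tau_A<\tau_B), \tag{15.5.99}
\end{align*}
where $P_\sigma(\tau_A<\tau_{S_N[\partial_A\boldsymbol G_\theta]\cup B})$ is zero for all $\sigma\in S_N[\boldsymbol G_\theta\setminus\partial_A\boldsymbol G_\theta]$, and is exponentially small in $N$ for all $\sigma\in S_N[\boldsymbol U_\delta\setminus\boldsymbol G_\theta]$ (due to Proposition 15.19). Inserting the bound in (15.5.98) into the last equation, we get (15.5.96) for $\sigma\in S_N[\boldsymbol U_\delta\setminus\partial_A\boldsymbol G_\theta]$.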

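The proof of (15.5.5) now follows straightforwardly. From (15.5.96) we get
\begin{align*}
\sum_{\sigma\in S_N[U_\delta(m^*)]} \mu_{\beta,N}(\sigma)\,P_\sigma(\tau_A<\tau_B)
&\le e^{-\beta N\,[(1-\alpha)(F_{\beta,N}(m_0^*)+\delta-F_{\beta,N}(m^*)-c\varepsilon)-\delta_2]} \sum_{\boldsymbol m\in\boldsymbol U_\delta} Q_{\beta,N}(\boldsymbol m) \\
&= Q_{\beta,N}(m_0^*)\,e^{\beta N\,[\alpha F_{\beta,N}(m_0^*)-(1-\alpha)(\delta-F_{\beta,N}(m^*)-c\varepsilon)+\delta_2]} \sum_{\boldsymbol m\in\boldsymbol U_\delta} e^{-\beta N F_{\beta,N}(\boldsymbol m)} \\
&\le Q_{\beta,N}(m_0^*)\,N^n\,e^{\beta N\,[\alpha(F_{\beta,N}(m_0^*)-F_{\beta,N}(m^*))-(1-\alpha)(\delta-c\varepsilon)+\delta_2]}, \tag{15.5.100}
\end{align*}
where in the second line we use the expression in (14.2.9) for $Q_{\beta,N}(m_0^*)$, and in the last line we use the bounds $F_{\beta,N}(\boldsymbol m)\ge F_{\beta,N}(\boldsymbol m^*) = F_{\beta,N}(m^*)$ and $|\boldsymbol U_\delta|\le N^n$. Finally, choosing $\alpha$ small enough, namely,
\[
\alpha < \frac{\delta-c\varepsilon-\delta_2}{F_{\beta,N}(m_0^*)-F_{\beta,N}(m^*)+\delta-c\varepsilon}, \tag{15.5.101}
\]
we can easily ensure that (15.5.100) implies (15.5.5).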
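The threshold in (15.5.101) comes from requiring the exponent in the last line of (15.5.100) to be negative. As a worked step, with the shorthand $\Delta F$ introduced here for illustration only:

```latex
% \Delta F is a hypothetical shorthand for F_{\beta,N}(m_0^*) - F_{\beta,N}(m^*) > 0.
\begin{align*}
\alpha\,\Delta F - (1-\alpha)(\delta - c\varepsilon) + \delta_2 < 0
&\iff \alpha\,\bigl(\Delta F + \delta - c\varepsilon\bigr) < \delta - c\varepsilon - \delta_2 \\
&\iff \alpha < \frac{\delta - c\varepsilon - \delta_2}{\Delta F + \delta - c\varepsilon}.
\end{align*}
```

The right-hand side is positive once $c\varepsilon+\delta_2<\delta$, which holds for $n$ large and $\delta_2$ small, and the prefactor $N^n$ does not affect the exponential decay for fixed $n$.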

In exactly the same way we can prove (15.5.6). This concludes the proof of
Lemma 15.18 and hence of Theorem 15.1.

15.6 Bibliographical notes

1. The results of this chapter have been obtained by Bianchi, Bovier and Ioffe
in [24]. The proof given here is streamlined, and the use of deficient flows on the
mesoscopic level leads to a considerable simplification. Coupling methods have
been used by the same authors in [25] to prove that transition times are asymptotically
exponentially distributed and independent of the initial law. We have decided
that this proof is too technical and too model-specific to be reproduced here.
The results have been generalised to the Potts version of the model in the thesis of
Slowik [219].

2. The dynamics of the random field Curie-Weiss model has been studied before. Dai
Pra and den Hollander [69] studied the short-time dynamics using large deviation
results and obtained the analogue of the McKean-Vlasov equations. Mathieu and
Picco [180] considered convergence to equilibrium in the particularly simple case
where the random field takes only the two values ±ε (with further restrictions on
the parameters that exclude the presence of more than two minima).

3. The computations in this chapter should be paradigmatic for a wider class of

Glauber dynamics for spin systems at low but finite temperatures: through a se-
quence of approximate coarse-grainings, the problem is mapped to a discrete dif-
fusion in a potential given by a free energy functional, which in turn behaves like
a diffusion process. The quality of the approximation is improved by refining the
coarse-graining, leading effectively to a limiting SPDE. The random-field Curie-
Weiss model is the only case where this program has been carried out in full so far.
A natural next candidate would be Glauber dynamics for Ising models with Kac-
type interactions (see the monograph by Presutti [202]).
Part VI
Applications: Lattice Systems in Small
Volumes at Low Temperatures

In Chaps. 13–15 we studied models with a mean-field interaction at a fixed (sub-

critical) temperature in the limit of large volume. The parameter controlling the
metastable behaviour in these models was the volume. We will now consider situa-
tions where the relevant parameter is the temperature.
Part VI looks at lattice models with a short-range interaction in a finite volume
in the limit as the temperature tends to zero. These models have been a key success
of the large-deviation approach, and they are important targets for the potential-
theoretic approach as well. In Chap. 16 we show how to use the theory developed
in Part III to establish the universal metastable behaviour of these models under
a number of general hypotheses. These hypotheses will be proved for two specific
choices of the dynamics: in Chap. 17 we look at Glauber dynamics for Ising spins,
in Chap. 18 at Kawasaki dynamics for lattice gas particles.
In most of Chap. 16 we focus on Metropolis dynamics. At the end we briefly
discuss two other dynamics, namely, heat-bath dynamics and probabilistic cellular
automata, for which the same universal metastable behaviour can be derived under
similar hypotheses.
Chapter 16
Abstract Set-Up and Metastability
in the Zero-Temperature Limit

Talking is a wonderful smoother-over of difficulties. When I

come upon anything—in Logic or in any other hard
subject—that entirely puzzles me, I find it a capital plan to talk it
over, aloud, even when I am all alone. One can explain things so
clearly to one’s self! And then, you know, one is so patient with
one’s self: one never gets irritated at one’s own stupidity!
(Lewis Carroll, A Selection from Symbolic Logic)

This chapter describes the metastable behaviour of lattice systems in small volumes
at low temperatures subject to a Metropolis dynamics. These theorems are derived
under two hypotheses on the energy landscape, i.e., on the interaction Hamiltonian.
These hypotheses, in turn, will be checked for Glauber dynamics in Chap. 17 and for
Kawasaki dynamics in Chap. 18. The theorems themselves are model-independent,
and therefore amplify the universal nature of metastability (in the setting considered
here). However, they involve a number of quantities that are model-dependent. The
identification of these quantities will be carried out in Chaps. 17 and 18 as well.
The outline is as follows. In Sect. 16.1 we define Metropolis dynamics on a
general configuration space with respect to a general Hamiltonian and for a general
set of allowed moves, we state the theorems subject to the hypotheses, and we place
the results in their proper context. Section 16.2 explains how this abstract set-up
fits into the potential-theoretic framework developed in Part III. The proofs of the
theorems are given in Sect. 16.3. In Sect. 16.4 we take a brief look at two other
dynamics, namely, heat-bath dynamics and probabilistic cellular automata, and we
indicate how these can be included into the same abstract set-up with only minor
modifications.

16.1 Hypotheses and universal metastability theorems

In this section we state three metastability theorems under two hypotheses.

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_16

16.1.1 Metropolis dynamics and geometric definitions

We begin by defining the abstract set-up of lattice models with short-range interac-
tion in finite volume from statistical physics.
Let $\Lambda$ be a finite set (e.g. $\Lambda\subseteq\mathbb{Z}^d$, $d\ge1$). We refer to elements of $\Lambda$ as sites. With each site $x\in\Lambda$ we associate a variable $\xi(x)\in\Upsilon$, where $\Upsilon$ is a finite set of spin-values. A configuration $\xi = \{\xi(x) : x\in\Lambda\}$ is an element of $S = \Upsilon^\Lambda$. To each configuration $\xi$ we associate an energy given by a Hamiltonian $H\colon S\to\mathbb{R}$, which in general depends on one or more parameters. The Gibbs measure associated with $H$ is
\[
\mu_\beta(\xi) = \frac{1}{Z_\beta}\,e^{-\beta H(\xi)}, \qquad \xi\in S, \tag{16.1.1}
\]
where $\beta\in(0,\infty)$ is the inverse temperature, and $Z_\beta$ is the normalising partition sum.
Equip $S$ with a set of undirected edges $E$, connecting pairs of elements of $S$, such that $(S,E)$ is a connected graph. Write $\xi\sim\xi'$ when $(\xi,\xi')\in E$. As dynamics we consider the continuous-time Markov process $(\xi_t)_{t\ge0}$ with state space $S$ whose transition rates are given by
\[
c_\beta(\xi,\xi') =
\begin{cases}
e^{-\beta[H(\xi')-H(\xi)]_+}, & \xi\sim\xi',\\
0, & \text{otherwise},
\end{cases} \tag{16.1.2}
\]
i.e., transitions occur along edges only. This dynamics is called the Metropolis dynamics with respect to $H$ at inverse temperature $\beta$. It is ergodic and reversible with respect to $\mu_\beta$:
\[
\mu_\beta(\xi)\,c_\beta(\xi,\xi') = \mu_\beta(\xi')\,c_\beta(\xi',\xi) \qquad \forall\,\xi,\xi'\in S. \tag{16.1.3}
\]
Choosing a particular model amounts to choosing $\Lambda$, $S$, $E$, $H$ and $\beta$. The generator $L_\beta$ of the dynamics is
\[
(L_\beta f)(\xi) = \sum_{\xi'\sim\xi} c_\beta(\xi,\xi')\,\bigl[f(\xi')-f(\xi)\bigr], \qquad f\colon S\to\mathbb{R}. \tag{16.1.4}
\]
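The reversibility relation (16.1.3) can be verified directly on a small example. The sketch below assumes a hypothetical instance of the abstract set-up: Ising spins on a ring of four sites with single-spin-flip edges, where the Hamiltonian, the coupling `J`, the field `h` and the value of `beta` are illustrative choices rather than anything fixed by the text.

```python
import itertools
import math

def hamiltonian(xi, J=1.0, h=0.3):
    """Hypothetical nearest-neighbour Ising energy on a ring with external field h."""
    n = len(xi)
    pair = sum(xi[i] * xi[(i + 1) % n] for i in range(n))
    return -J * pair - h * sum(xi)

def neighbours(xi):
    """Allowed moves: configurations differing from xi by a single spin-flip."""
    for i in range(len(xi)):
        yield xi[:i] + (-xi[i],) + xi[i + 1:]

def rate(xi, eta, beta):
    """Metropolis rate exp(-beta * [H(eta) - H(xi)]_+) for neighbouring xi, eta."""
    return math.exp(-beta * max(hamiltonian(eta) - hamiltonian(xi), 0.0))

beta = 0.7
states = list(itertools.product((-1, 1), repeat=4))
Z = sum(math.exp(-beta * hamiltonian(x)) for x in states)
mu = {x: math.exp(-beta * hamiltonian(x)) / Z for x in states}

# Largest violation of mu(x) c(x,y) = mu(y) c(y,x) over all edges.
max_violation = max(
    abs(mu[x] * rate(x, y, beta) - mu[y] * rate(y, x, beta))
    for x in states
    for y in neighbours(x)
)
print(max_violation)
```

Detailed balance holds because both sides equal $e^{-\beta\max\{H(\xi),H(\xi')\}}/Z_\beta$, which is symmetric in the two configurations.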

In order to formulate our metastability theorems, we need some general defini-

tions (compare with Definition 10.2).

Definition 16.1 (Communication heights, communication level sets, stability levels, sets of stable and metastable configurations)
(a) $\Phi(\xi,\xi')$ is the communication height between $\xi,\xi'\in S$ defined by
\[
\Phi(\xi,\xi') = \min_{\gamma\colon\xi\to\xi'}\ \max_{\sigma\in\gamma} H(\sigma), \tag{16.1.5}
\]
where $\gamma\colon\xi\to\xi'$ is any path of allowed moves from $\xi$ to $\xi'$. For non-empty sets $A,B\subseteq S$ put
\[
\Phi(A,B) = \min_{\xi\in A,\ \xi'\in B} \Phi(\xi,\xi'). \tag{16.1.6}
\]
(b) $\mathcal{S}(\xi,\xi')$ is the communication level set between $\xi,\xi'\in S$ defined by
\[
\mathcal{S}(\xi,\xi') = \Bigl\{ \zeta\in S : \exists\,\gamma\colon\xi\to\xi',\ \gamma\ni\zeta : \max_{\eta\in\gamma} H(\eta) = H(\zeta) = \Phi(\xi,\xi') \Bigr\}. \tag{16.1.7}
\]
(c) $V_\xi$ is the stability level of $\xi\in S$ defined by
\[
V_\xi = \Phi(\xi,I_\xi) - H(\xi), \tag{16.1.8}
\]
where
\[
I_\xi = \{ \zeta\in S : H(\zeta) < H(\xi) \} \tag{16.1.9}
\]
is the set of configurations with energy lower than $\xi$.
(d) $S_{\mathrm{stab}}$ is the set of configurations with minimal energy, called stable configurations, defined by
\[
S_{\mathrm{stab}} = \Bigl\{ \xi\in S : H(\xi) = \min_{\zeta\in S} H(\zeta) \Bigr\}. \tag{16.1.10}
\]
(e) $S_{\mathrm{meta}}$ is the set of non-minimal configurations with maximal stability, called metastable configurations, defined by
\[
S_{\mathrm{meta}} = \Bigl\{ \xi\in S : V_\xi = \max_{\zeta\in S\setminus S_{\mathrm{stab}}} V_\zeta \Bigr\}. \tag{16.1.11}
\]
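The quantities in Definition 16.1 are straightforward to compute on a small energy landscape. The sketch below is an illustration under assumptions: the landscape is a hypothetical path graph with seven states and made-up energies, the minimax in (16.1.5) is computed by a Dijkstra-type search, and the stability level of a stable configuration (for which $I_\xi$ is empty) is set to infinity as a convention of this sketch.

```python
import heapq

def communication_height(H, edges, source):
    """Phi(source, x) for every x: minimum over paths of the maximal energy on the path."""
    phi = {source: H[source]}
    heap = [(H[source], source)]
    while heap:
        level, x = heapq.heappop(heap)
        if level > phi.get(x, float("inf")):
            continue  # stale heap entry
        for y in edges[x]:
            new = max(level, H[y])
            if new < phi.get(y, float("inf")):
                phi[y] = new
                heapq.heappush(heap, (new, y))
    return phi

def stability_level(H, edges, x):
    """V_x = Phi(x, I_x) - H(x); infinite (by convention here) when I_x is empty."""
    phi = communication_height(H, edges, x)
    lower = [y for y in H if H[y] < H[x]]
    if not lower:
        return float("inf")
    return min(phi[y] for y in lower) - H[x]

# Hypothetical landscape: states 0..6 on a line, nearest-neighbour moves only.
H = {0: 1.0, 1: 3.0, 2: 2.0, 3: 4.0, 4: 0.0, 5: 2.5, 6: 0.5}
edges = {i: [j for j in (i - 1, i + 1) if j in H] for i in H}

h_min = min(H.values())
stable = [x for x in H if H[x] == h_min]
levels = {x: stability_level(H, edges, x) for x in H if x not in stable}
v_max = max(levels.values())
metastable = [x for x in levels if levels[x] == v_max]
print(stable, metastable, levels)
```

In this toy landscape the stable configuration is state 4 and the metastable one is state 0, whose stability level is the barrier $\Phi(0,4)-H(0)$ that has to be crossed to reach lower energy.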

Definition 16.2 (Optimal paths, gates, dead-ends)

(a) $(\xi\to\xi')_{\mathrm{opt}}$ is the set of paths realising the minimax in $\Phi(\xi,\xi')$.
(b) A set $W\subseteq S$ is called a gate for $\xi\to\xi'$ if $W\subseteq\mathcal{S}(\xi,\xi')$ and $\gamma\cap W\ne\emptyset$ for all $\gamma\in(\xi\to\xi')_{\mathrm{opt}}$.
(c) A set $W\subseteq S$ is called a minimal gate for $\xi\to\xi'$ if it is a gate for $\xi\to\xi'$ and for any $W'\subsetneq W$ there exists a $\gamma'\in(\xi\to\xi')_{\mathrm{opt}}$ such that $\gamma'\cap W'=\emptyset$.
(d) A priori there may be several (not necessarily disjoint) minimal gates. Their union is denoted by $\mathcal{G}(\xi,\xi')$ and is called the essential gate for $(\xi\to\xi')_{\mathrm{opt}}$.
(e) The configurations in $\mathcal{S}(\xi,\xi')\setminus\mathcal{G}(\xi,\xi')$ are called dead-ends for $(\xi\to\xi')_{\mathrm{opt}}$.

Armed with these definitions we are ready to state our metastability theorems
(see Fig. 16.1).

16.1.2 Metastability theorems and hypotheses

Theorems 16.4–16.6 below involve a pair of configurations $(m,s) \in S_{\mathrm{meta}} \times S_{\mathrm{stab}}$, which will be referred to as the metastable configuration, respectively, the stable configuration. Associated with $(m,s)$ is a pair of sets $(\mathcal{P}(m,s), \mathcal{C}(m,s))$, which will be referred to as the protocritical set, respectively, the critical set, defined as follows.

Fig. 16.1 Schematic picture of $H$, $S_{\mathrm{meta}}$, $S_{\mathrm{stab}}$ and $\mathcal{G}$, the essential gate between $S_{\mathrm{meta}}$ and $S_{\mathrm{stab}}$

Fig. 16.2 Schematic picture of the protocritical set and the critical set

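Definition 16.3 (Protocritical and critical sets) (See Fig. 16.2.) Let

\[
\Gamma = \Phi(m,s) - H(m). \tag{16.1.12}
\]

Then $(\mathcal{P}(m,s), \mathcal{C}(m,s))$ is the maximal subset of $S \times S$ such that:
(1) $\forall\, \xi \in \mathcal{P}(m,s)\ \exists\, \xi' \in \mathcal{C}(m,s)\colon \xi \sim \xi'$ and $\forall\, \xi' \in \mathcal{C}(m,s)\ \exists\, \xi \in \mathcal{P}(m,s)\colon \xi' \sim \xi$.
(2) $\forall\, \xi \in \mathcal{P}(m,s)\colon \Phi(\xi,m) < \Phi(\xi,s)$.
(3) $\forall\, \xi' \in \mathcal{C}(m,s)\ \exists\, \gamma\colon \xi' \to s\colon \max_{\zeta \in \gamma} H(\zeta) - H(m) \leq \Gamma$, $\gamma \cap \{\zeta \in S\colon \Phi(\zeta,m) < \Phi(\zeta,s)\} = \emptyset$.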

Think of $\mathcal{P}(m,s)$ as the set of configurations where the dynamics starting from $m$ is “almost on top of the hill”, and of $\mathcal{C}(m,s)$ as the set of configurations where the dynamics “has reached the top of the hill” and is “capable of crossing over” to $s$ without returning to “the valley around $m$”. The latter restriction is put in to remove the dead-ends. Note that

\[
H(\xi) - H(m) = \Gamma \qquad \forall\, \xi \in \mathcal{C}(m,s). \tag{16.1.13}
\]

Also note that $\mathcal{C}(m,s) \subseteq \mathcal{G}(m,s)$, where the inclusion may be strict.
Theorems 16.4–16.6 below will be proved subject to two hypotheses:
(H1) Smeta = {m}, Sstab = {s}.
(H2) $\xi' \mapsto |\{\xi \in \mathcal{P}(m,s)\colon \xi \sim \xi'\}|$ is constant on $\mathcal{C}(m,s)$.
(H1) says that Smeta and Sstab are singletons, while (H2) says that all configurations
in C (m, s) have the same number of configurations in P (m, s) from which they
can be reached via an allowed move. Any pair of configurations (m, s) satisfying
(H1) is referred to as a metastable pair. Without loss of generality we may assume
that
H (m) = 0. (16.1.14)
We write Pξ to denote the law of (ξt )t≥0 given ξ0 = ξ ∈ S, and

\[
\tau_A = \inf\{t \geq 0\colon \xi_t \in A,\ \exists\, 0 < s < t\colon \xi_s \neq \xi_0\} \tag{16.1.15}
\]

to denote the first hitting time of A ⊆ S after the starting configuration has been
left. In what follows we abbreviate P = P (m, s) and C = C (m, s).

Theorem 16.4 (Critical gate and uniform entrance distribution)

(a) limβ→∞ Pm (τC < τs | τs < τm ) = 1.
(b) limβ→∞ Pm (ξτC = χ) = 1/|C | for all χ ∈ C .

Theorem 16.5 (Mean crossover time) There exists a constant K ∈ (0, ∞) such that

\[
\lim_{\beta \to \infty} e^{-\beta\Gamma}\, E_m(\tau_s) = K. \tag{16.1.16}
\]

Theorem 16.6 (Spectrum and exponential law of crossover time)

(a) $\lim_{\beta\to\infty} \lambda_\beta E_m(\tau_s) = 1$, where $\lambda_\beta$ is the second eigenvalue of $-L_\beta$, with $L_\beta$ the generator of the Metropolis dynamics.
(b) $\lim_{\beta\to\infty} P_m(\tau_s/E_m(\tau_s) > t) = e^{-t}$ for all $t \geq 0$.

It turns out that typically Γ is independent of Λ (provided Λ is large enough)

and is relatively robust against variations of the dynamics, while K depends on Λ
and is rather sensitive to the details of the dynamics.
In Sect. 16.3 we will see that $K$ is given by a non-trivial variational formula involving the set of all configurations where the dynamics can enter and exit $\mathcal{S}(m,s)$ (see Lemma 16.17 below). This set includes the border of the “valleys around $m$ and $s$”, and possibly the border of “wells in $\mathcal{S}(m,s)$”, i.e., configurations with energy $< \Gamma$ but communication height $\Gamma$ towards both $m$ and $s$. We will see in Chap. 17 that for Glauber dynamics there are no wells and $K$ can be computed explicitly. We will see in Chap. 18 that for Kawasaki dynamics there are wells, but they are sometimes harmless, e.g. when $\Lambda$ is a large box in $\mathbb{Z}^2$ whose size tends to infinity (after the limit $\beta \to \infty$ has been taken).
While (H1) plays a central role in the derivation of Theorems 16.4–16.6, (H2) is
needed for Theorem 16.4(b) only.

16.1.3 Discussion

1. Theorem 16.4(a) says that $\mathcal{C}$ is a gate for the crossover, i.e., on its way from $m$ to $s$ the dynamics passes through $\mathcal{C}$ with a probability tending to one in the limit of low temperature. Theorem 16.4(b) says that, in this limit, all critical configurations are equally likely to be seen upon first entrance in $\mathcal{C}$. Theorem 16.5 says that the average crossover time is asymptotic to $K e^{\Gamma\beta}$, which is the classical Arrhenius law (see Sect. 1.3.1). Theorem 16.6(a) says that the spectral gap of $-L_\beta$ (the first eigenvalue of $-L_\beta$ is zero) scales like the inverse of the average crossover time, while Theorem 16.6(b) says that asymptotically the crossover time is exponentially distributed on the scale of its average.
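
The Arrhenius scaling in Theorem 16.5 can be checked by hand on a one-dimensional toy landscape. The sketch below is an illustration under stated assumptions, not part of the text: the energies are hypothetical, the dynamics is assumed to be Metropolis on the path 0 - 1 - ... - n (so that $\mu_\beta(k)c_\beta(k,k+1) = e^{-\beta[H(k)\vee H(k+1)]}/Z_\beta$), and the mean hitting time is evaluated with the standard birth-death formula rather than with the potential-theoretic machinery of this chapter.

```python
import math

def mean_crossover_time(H, beta):
    """E_0(tau_n) for Metropolis dynamics on the path 0 - 1 - ... - n.

    Birth-death formula:
        E_0(tau_n) = sum_k [sum_{j<=k} mu(j)] / [mu(k) c(k, k+1)],
    with mu(k) c(k, k+1) = exp(-beta * max(H[k], H[k+1])) / Z; Z cancels.
    """
    total = 0.0
    for k in range(len(H) - 1):
        mass_left = sum(math.exp(-beta * H[j]) for j in range(k + 1))
        total += mass_left * math.exp(beta * max(H[k], H[k + 1]))
    return total

H = [0.0, 1.0, 2.0, 1.0, -1.0]  # hypothetical energies, m = 0 and s = 4
Gamma = 2.0                      # Phi(m, s) - H(m) for this landscape
for beta in (2.0, 5.0, 10.0):
    print(beta, math.exp(-beta * Gamma) * mean_crossover_time(H, beta))
```

Expanding the sum shows that the rescaled mean time equals $2 + 4e^{-\beta} + O(e^{-2\beta})$ for this landscape, so the printed values settle at the prefactor $K = 2$ as $\beta$ grows.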

2. Theorems 16.4–16.6 are model-independent, i.e., they hold in the same form for
all stochastic dynamics in a finite volume in the limit of low temperature and for
any pair (m, s) satisfying hypotheses (H1–H2). In fact, we will see that (H1–H2)
are essentially the minimal hypotheses needed to prove Theorems 16.4–16.6. The
model-dependent ingredients of Theorems 16.4–16.6 are the pair (m, s) and the
triple (Γ , C , K). In Chaps. 17 and 18 we will identify these for Glauber dynamics
and Kawasaki dynamics, and prove (H1–H2).

3. There is some flexibility in letting our dynamics start and end at configurations
that are different from m and s. For instance, we will see that the same results apply
when the initial configuration is drawn from the “valley around m”, and the target
configuration is drawn near the bottom of the “valley around s” (see Sect. 16.2.3,
Eq. (16.1.17) for precise definitions).

4. Hypothesis (H1) can be relaxed. The Hamiltonian may have valleys that are
deeper than Γ (the energy barrier between m and s), but are shielded away from
m by an energy barrier that is higher than Γ . In that case the dynamics has a neg-
ligible probability to enter these valleys, and (H1) is required to hold only on the
subset of S obtained by removing all the configurations with energy > Γ + H (m).
The average crossover time on this subset is the relevant time scale, not the average
crossover time on S, which is much longer. See also Item 3 in Sect. 16.5.

16.1.4 Consequences of the hypotheses

Lemmas 16.7–16.10 below are immediate consequences of (H1) and will be needed
in Sect. 16.2. Recall that H (m) = 0 by (16.1.14). Recall also Figs. 16.1–16.2.

Lemma 16.7 (H1) implies that Vm = Γ .

Proof By Definition 16.1(c–e), s ∈ Im and hence Vm ≤ Γ . We show that (H1)

implies Vm = Γ . The proof is by contradiction. Suppose that Vm < Γ . Then there
exists a ξ0 ∈ Im such that

Φ(m, ξ0 ) = Φ(m, ξ0 ) − H (m) = Vm < Γ . (16.1.17)

Since (H1) tells us that m has the largest stability level, we can proceed to reduce
the energy further until we hit s. Indeed, the finiteness of S guarantees that there
exists an m ∈ N0 and a sequence ξ1 , . . . , ξm ∈ S\m with ξm = s such that ξi+1 ∈ Iξi
and Φ(ξi , ξi+1 ) − H (ξi ) < Vm for i = 0, . . . , m − 1. Therefore we have

\[
\Phi(\xi_0,s) \leq \max_{i=0,\dots,m-1} \Phi(\xi_i,\xi_{i+1}) < \max_{i=0,\dots,m-1} H(\xi_i) + V_m = H(\xi_0) + V_m < H(m) + \Gamma = \Gamma, \tag{16.1.18}
\]

where in the first inequality we use the ultrametricity of the communication height,

\[
\Phi(\xi,\chi) \leq \max\bigl\{\Phi(\xi,\zeta), \Phi(\zeta,\chi)\bigr\} \qquad \forall\, \xi,\chi,\zeta \in S \tag{16.1.19}
\]

(a property that is closely related to the approximate ultrametricity of capacity encountered in Corollary 8.6), and in the last inequality we use that $V_m \leq \Gamma$ and $H(\xi_0) < H(m)$ because $\xi_0 \in I_m$. It follows from (16.1.17)–(16.1.19) that

\[
\Gamma = \Phi(m,s) \leq \max\bigl\{\Phi(m,\xi_0), \Phi(\xi_0,s)\bigr\} < \Gamma, \tag{16.1.20}
\]

which is a contradiction.

Lemma 16.8 (H1) implies that H (ξ ) > 0 for all ξ ∈ S\m with Φ(ξ, m) ≤ Φ(ξ, s).

Proof The proof is again by contradiction. Fix $\xi_0 \in S\setminus m$ with $\Phi(\xi_0,m) \leq \Phi(\xi_0,s)$ and suppose that $H(\xi_0) \leq 0$. Then $m \notin I_{\xi_0}$. As in the proof of Lemma 16.7, there exist an $m \in \mathbb{N}_0$ and a sequence $\xi_0,\dots,\xi_m \in S$ with $\xi_m = s$ such that $\xi_{i+1} \in I_{\xi_i}$ and $\Phi(\xi_i,\xi_{i+1}) - H(\xi_i) < V_m = \Gamma$ for $i = 0,\dots,m-1$. Therefore, as in (16.1.18), we get $\Phi(\xi_0,s) - H(\xi_0) < V_m = \Gamma$. Hence

\[
\Gamma = \Phi(m,s) \leq \max\bigl\{\Phi(m,\xi_0), \Phi(\xi_0,s)\bigr\} = \Phi(\xi_0,s) \leq \Phi(\xi_0,s) - H(\xi_0) < \Gamma, \tag{16.1.21}
\]

which is a contradiction.

Lemma 16.9 (H1) implies that there exists a V < Γ such that Φ(ξ, {m, s}) −
H (ξ ) ≤ V for all ξ ∈ S\{m, s}.

Proof In the proof of Lemma 16.8 we have shown that Φ(ξ0 , s) − H (ξ0 ) < Γ for
all ξ0 ∈ S\m. But

\[
\Phi(\xi_0,\{m,s\}) = \min\bigl\{\Phi(\xi_0,m), \Phi(\xi_0,s)\bigr\} \leq \Phi(\xi_0,s), \tag{16.1.22}
\]

while Φ(m, {m, s}) − H (m) = 0, and so the claim follows.

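Lemma 16.10 Let $\bar{\mathcal{C}} = \{\xi \in S\setminus[\mathcal{P} \cup \mathcal{C}]\colon H(\xi) \leq \Gamma,\ \exists\, \xi' \in \mathcal{C}\colon \xi \sim \xi'\}$. Then for every $\xi \in \bar{\mathcal{C}}$ every path in $(\xi \to m)_{\mathrm{opt}}$ passes through $\mathcal{P}$.

Proof Pick any $\xi \in \bar{\mathcal{C}}$, any $\gamma \in (\xi \to m)_{\mathrm{opt}}$ and any $\xi' \in \mathcal{C}$ such that $\xi \sim \xi'$. We have $\max_{\zeta \in \gamma} H(\zeta) \leq \Gamma$, because $H(\xi) \leq \Gamma$ and $\Phi(m,\xi') \leq \Gamma$ by Definition 16.3. The reverse of $\gamma$ can be extended by the single move from $\xi$ to $\xi'$ to obtain a path $\gamma'\colon m \to \xi'$ such that $\max_{\zeta \in \gamma'} H(\zeta) \leq \Gamma$. Moreover, by Definition 16.3(3), this path can be further extended by a path $\gamma''\colon \xi' \to s$ such that $\max_{\zeta \in \gamma''} H(\zeta) \leq \Gamma$ and $\gamma'' \cap \mathcal{P} = \emptyset$. The concatenation $\gamma' \cup \gamma''$ is an optimal path, i.e., $\gamma' \cup \gamma'' \in (m \to s)_{\mathrm{opt}}$. However, by the maximality in Definition 16.3, any path in $(m \to s)_{\mathrm{opt}}$ must hit $\mathcal{P}$. Since $\gamma''$ does not hit $\mathcal{P}$, it follows that $\gamma'$ hits $\mathcal{P}$. But $\xi' \in \mathcal{C}$ and $\mathcal{P} \cap \mathcal{C} = \emptyset$, and so the piece of $\gamma'$ between $m$ and $\xi$ hits $\mathcal{P}$.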

Lemmas 16.7–16.8 say that m lies at the bottom of a valley of depth Γ ,

Lemma 16.9 says that there are no deeper valleys anywhere else, while Lemma 16.10
says that once an optimal path from m to s is over the hill it cannot go back to m
without passing through the protocritical set (see Fig. 16.2).

16.2 Preliminaries
In this section we recall some facts from Part III, adapted to the present context,
and use them to derive a few lemmas that are needed in Sect. 16.3 to prove Theo-
rems 16.4–16.6.

16.2.1 Dirichlet form and capacity

As we have seen in Chap. 8, the key object in the potential-theoretic approach to metastability is the Dirichlet form

\[
\mathcal{E}_\beta(h,h) = \tfrac12 \sum_{\xi,\xi' \in S} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h(\xi) - h(\xi')\bigr]^2, \qquad h\colon S \to [0,1], \tag{16.2.1}
\]

where μβ is the Gibbs measure defined in (16.1.1) and cβ is the kernel of transition
rates defined in (16.1.2). Given a pair of non-empty disjoint sets A, B ⊆ S, the
capacity of the pair A, B is given by the Dirichlet principle,

\[
\mathrm{cap}_\beta(A,B) = \min_{\substack{h\colon S \to [0,1] \\ h|_A = 1,\ h|_B = 0}} \mathcal{E}_\beta(h,h), \tag{16.2.2}
\]

where h|A = 1 means that h(ξ ) = 1 for all ξ ∈ A and h|B = 0 means that h(ξ ) = 0
for all ξ ∈ B. The unique minimizer hA,B of (16.2.2) is called the equilibrium po-
tential of the pair A, B, and is the solution of the equation

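\[
\begin{aligned}
(-L_\beta h)(\xi) &= 0, & \xi &\in S\setminus(A \cup B),\\
h(\xi) &= 1, & \xi &\in A,\\
h(\xi) &= 0, & \xi &\in B,
\end{aligned}
\tag{16.2.3}
\]

which is given by

\[
\begin{aligned}
h_{A,B}(\xi) &= P_\xi(\tau_A < \tau_B), & \xi &\in S\setminus(A \cup B),\\
h_{A,B}(\xi) &= 1, & \xi &\in A,\\
h_{A,B}(\xi) &= 0, & \xi &\in B.
\end{aligned}
\tag{16.2.4}
\]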

An alternative expression for the capacity is

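\[
\mathrm{cap}_\beta(A,B) = \sum_{\xi \in A} \mu_\beta(\xi)\, c_\beta(\xi)\, P_\xi(\tau_B < \tau_A) \tag{16.2.5}
\]

with $c_\beta(\xi) = \sum_{\xi' \in S\setminus\xi} c_\beta(\xi,\xi')$ the rate of moving out of $\xi$ (recall (7.1.19)–(7.1.20), (7.1.39) and (16.1.15)).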
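
The two expressions for the capacity, the Dirichlet principle (16.2.2) and the escape formula (16.2.5), can be compared numerically on a small graph. The following is a rough sketch under stated assumptions: the energies, the edge list and all function names are hypothetical, the weights are taken to be the Metropolis weights $Z_\beta\mu_\beta(\xi)c_\beta(\xi,\xi') = e^{-\beta[H(\xi)\vee H(\xi')]}$, and the boundary value problem (16.2.3) is solved by plain Gauss-Seidel relaxation rather than by any method used in the book.

```python
import math

def metropolis_weights(H, edges, beta):
    """Edge weights Z * mu(x) c(x, y) = exp(-beta * max(H[x], H[y]))."""
    return {(x, y): math.exp(-beta * max(H[x], H[y])) for x, y in edges}

def equilibrium_potential(n, w, A, B, sweeps=5000):
    """Gauss-Seidel solution of (16.2.3): h = 1 on A, h = 0 on B, harmonic elsewhere."""
    nbrs = {x: [] for x in range(n)}
    for (x, y), a in w.items():
        nbrs[x].append((y, a))
        nbrs[y].append((x, a))
    h = [1.0 if x in A else 0.0 for x in range(n)]
    free = [x for x in range(n) if x not in A and x not in B]
    for _ in range(sweeps):
        for x in free:
            h[x] = sum(a * h[y] for y, a in nbrs[x]) / sum(a for _, a in nbrs[x])
    return h

def dirichlet_form(w, h):
    """Z * E(h, h); each unordered edge is counted once (the 1/2 in (16.2.1))."""
    return sum(a * (h[x] - h[y]) ** 2 for (x, y), a in w.items())

def escape_flux(w, h, A):
    """Z * sum_{x in A} mu(x) c(x) P_x(tau_B < tau_A), cf. (16.2.5)."""
    flux = 0.0
    for (x, y), a in w.items():
        if x in A:
            flux += a * (1.0 - h[y])
        if y in A:
            flux += a * (1.0 - h[x])
    return flux

# Hypothetical six-configuration landscape with two routes from 0 to 4.
H = [0.0, 1.0, 2.0, 1.0, -1.0, 1.5]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 3)]
w = metropolis_weights(H, edges, beta=1.0)
h = equilibrium_potential(len(H), w, A={0}, B={4})
cap_dirichlet = dirichlet_form(w, h)
cap_flux = escape_flux(w, h, A={0})
print(cap_dirichlet, cap_flux)
```

Both numbers are $Z_\beta\,\mathrm{cap}_\beta(A,B)$ and should agree up to the tolerance of the relaxation.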

16.2.2 A priori estimates on the capacity

The following estimates on capacity will be needed later on.

Lemma 16.11 For every pair of non-empty disjoint sets $A, B \subseteq S$ there exist constants $0 < C_1 \leq C_2 < \infty$ (depending on $A, B$) such that

\[
C_1 \leq e^{\beta\Phi(A,B)}\, Z_\beta\, \mathrm{cap}_\beta(A,B) \leq C_2 \qquad \forall\, \beta \in (0,\infty). \tag{16.2.6}
\]

Proof The proof uses basic properties of communication heights.

Upper bound: Suppose that $A, B$ are such that

\[
\Phi(\zeta,A) > H(\zeta) \qquad \forall\, \zeta \in B. \tag{16.2.7}
\]

Then, picking $h = 1_{K(A,B)}$ in (16.2.2) with

\[
K(A,B) = \{\xi \in S\colon \Phi(\xi,A) \leq \Phi(\xi,B)\}, \tag{16.2.8}
\]

we get

\[
\mathrm{cap}_\beta(A,B) \leq \mathcal{E}_\beta(1_{K(A,B)}, 1_{K(A,B)}). \tag{16.2.9}
\]

Here note that $A \subset K(A,B)$, while (16.2.7) guarantees that $B \subset S\setminus K(A,B)$, so that the boundary conditions on $A$ and $B$ are met.
To estimate $\mathcal{E}_\beta(1_{K(A,B)}, 1_{K(A,B)})$, the key observation is that if $\xi \sim \xi'$ with $\xi \in K(A,B)$ and $\xi' \in S\setminus K(A,B)$, then

\[
\begin{aligned}
&(1)\quad H(\xi') < H(\xi),\\
&(2)\quad H(\xi) \geq \Phi(A,B).
\end{aligned}
\tag{16.2.10}
\]

To prove (1), we argue by contradiction. Suppose that $H(\xi') \geq H(\xi)$. Then, because $\xi \sim \xi'$, we have

\[
\Phi(\xi',C) = \Phi(\xi,C) \vee H(\xi') \qquad \forall\, C \subseteq S. \tag{16.2.11}
\]

But $\xi \in K(A,B)$ tells us that $\Phi(\xi,A) \leq \Phi(\xi,B)$, and so (16.2.11) gives

\[
\Phi(\xi',A) = \Phi(\xi,A) \vee H(\xi') \leq \Phi(\xi,B) \vee H(\xi') = \Phi(\xi',B). \tag{16.2.12}
\]

Therefore $\xi' \in K(A,B)$, which is a contradiction. To see (2), note that (1) implies

\[
\Phi(\xi,C) = \Phi(\xi',C) \vee H(\xi) \qquad \forall\, C \subseteq S. \tag{16.2.13}
\]

Trivially, $H(\xi) \leq \Phi(\xi,B)$. We argue by contradiction that equality holds. Suppose that $H(\xi) < \Phi(\xi,B)$. Then (16.2.13) gives

\[
H(\xi) < \Phi(\xi,B) = \Phi(\xi',B) \vee H(\xi) = \Phi(\xi',B) < \Phi(\xi',A) = \Phi(\xi',A) \vee H(\xi) = \Phi(\xi,A), \tag{16.2.14}
\]

where the second inequality uses that $\xi' \in S\setminus K(A,B)$. Thus, we have $\Phi(\xi,A) > \Phi(\xi,B)$, which contradicts $\xi \in K(A,B)$. From the equality $H(\xi) = \Phi(\xi,B)$ and (16.1.19) we obtain $\Phi(A,B) \leq \Phi(A,\xi) \vee \Phi(\xi,B) = \Phi(\xi,B) = H(\xi)$, which proves (2).
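Combining (16.2.10) with (16.1.1)–(16.1.3), we find that

\[
\mu_\beta(\xi)\, c_\beta(\xi,\xi') = \frac{1}{Z_\beta}\, e^{-\beta[H(\xi) \vee H(\xi')]} \leq \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,B)} \qquad \forall\, \xi \in K(A,B),\ \xi' \in S\setminus K(A,B). \tag{16.2.15}
\]

Hence

\[
\mathcal{E}_\beta(1_{K(A,B)}, 1_{K(A,B)}) \leq C_2\, \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,B)} \tag{16.2.16}
\]

with $C_2 = |\{(\xi,\xi') \in K(A,B) \times S\setminus K(A,B)\colon \xi \sim \xi'\}|$. Together with (16.2.9) this completes the proof subject to (16.2.7).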
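Reversing the roles of $A$ and $B$, we see that the same bound holds when

\[
\Phi(\zeta',B) > H(\zeta') \qquad \forall\, \zeta' \in A. \tag{16.2.17}
\]

Thus it remains to consider $A, B$ such that

\[
\begin{aligned}
&\exists\, \zeta \in B\colon\ \Phi(\zeta,A) = H(\zeta),\\
&\exists\, \zeta' \in A\colon\ \Phi(\zeta',B) = H(\zeta').
\end{aligned}
\tag{16.2.18}
\]

Estimating

\[
\mathrm{cap}_\beta(A,B) \leq \mathcal{E}_\beta(1_A,1_A) = \sum_{\xi \in A,\, \xi' \in S\setminus A} \mu_\beta(\xi)\, c_\beta(\xi,\xi') = \frac{1}{Z_\beta} \sum_{\substack{\xi \in A,\, \xi' \in S\setminus A \\ \xi \sim \xi'}} e^{-\beta[H(\xi) \vee H(\xi')]} \leq C_2\, \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,S\setminus A)} \tag{16.2.19}
\]

with $C_2 = |\{(\xi,\xi')\colon \xi \sim \xi',\ \xi \in A,\ \xi' \in S\setminus A\}|$, and using that $\Phi(A,S\setminus A) = \Phi(A,B)$ by (16.2.18), we get the claim.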

Lower bound: The lower bound is obtained by picking any self-avoiding path

\[
\gamma = (\gamma_0,\gamma_1,\dots,\gamma_L) \tag{16.2.20}
\]

that realizes the minimax in $\Phi(A,B)$ and ignoring all the transitions that are not in this path, i.e.,

\[
\mathrm{cap}_\beta(A,B) \geq \min_{\substack{h\colon \gamma \to [0,1] \\ h(\gamma_0) = 1,\ h(\gamma_L) = 0}} \mathcal{E}^\gamma_\beta(h,h), \tag{16.2.21}
\]

where the Dirichlet form $\mathcal{E}^\gamma_\beta$ is defined as $\mathcal{E}_\beta$ in (16.2.1) but with $S$ replaced by $\gamma$. Due to the one-dimensional nature of the set $\gamma$, the variational problem in the right-hand side can be solved explicitly by elementary computations (recall Sect. 7.1.4). We find that the minimum equals

\[
M = \Biggl[\sum_{l=0}^{L-1} \frac{1}{\mu_\beta(\gamma_l)\, c_\beta(\gamma_l,\gamma_{l+1})}\Biggr]^{-1}, \tag{16.2.22}
\]

and is uniquely attained at $h$ given by

\[
h(\gamma_l) = 1 - M \sum_{k=0}^{l-1} \frac{1}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})}, \qquad l = 0,1,\dots,L. \tag{16.2.23}
\]
Fig. 16.3 Schematic picture of the subgraphs $S^*$ (on or below the top line) and $S^{**}$ (below the top line) and of the connected components $S_m$ and $S_s$. The four vertical lines represent dead-ends

We thus have

\[
\mathrm{cap}_\beta(A,B) \geq M \geq \frac{1}{L} \min_{l=0,1,\dots,L-1} \mu_\beta(\gamma_l)\, c_\beta(\gamma_l,\gamma_{l+1}) = \frac{1}{L}\, \frac{1}{Z_\beta} \min_{l=0,1,\dots,L-1} e^{-\beta[H(\gamma_l) \vee H(\gamma_{l+1})]} = C_1\, \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,B)} \tag{16.2.24}
\]

with $C_1 = 1/L$.
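
The one-dimensional computation behind the lower bound is easy to reproduce. The sketch below is illustrative only: the energies along the path are hypothetical, the weights are assumed to be the Metropolis weights $Z_\beta\mu_\beta(\gamma_l)c_\beta(\gamma_l,\gamma_{l+1}) = e^{-\beta[H(\gamma_l)\vee H(\gamma_{l+1})]}$, and the function names are not from the text. It evaluates $Z_\beta M$ from (16.2.22), builds the minimising profile with $h(\gamma_0) = 1$ and $h(\gamma_L) = 0$, and prints $e^{\beta\Phi} Z_\beta M$, which for a single path lies between $1/L$ and $1$.

```python
import math

def path_resistances(energies, beta):
    """1 / (Z mu(g_l) c(g_l, g_{l+1})) = exp(beta * max(H_l, H_{l+1})) per step."""
    return [math.exp(beta * max(a, b)) for a, b in zip(energies, energies[1:])]

def path_minimum(energies, beta):
    """Z * M with M as in (16.2.22)."""
    return 1.0 / sum(path_resistances(energies, beta))

def path_minimiser(energies, beta):
    """Profile along the path with h = 1 at gamma_0 and h = 0 at gamma_L."""
    r = path_resistances(energies, beta)
    M = 1.0 / sum(r)
    h = [1.0]
    for r_l in r:
        h.append(h[-1] - M * r_l)  # each step drops by M times its resistance
    return h

energies = [0.0, 1.0, 2.0, 1.0, -1.0]  # hypothetical H(gamma_0), ..., H(gamma_L)
L = len(energies) - 1
Phi = max(energies)                     # highest energy reached by this path
for beta in (0.5, 1.0, 5.0, 20.0):
    scaled = math.exp(beta * Phi) * path_minimum(energies, beta)
    print(beta, scaled, 1.0 / L)
```

As $\beta$ grows, the scaled value approaches one over the number of steps that reach the maximal height $\Phi$, which is why only the order of magnitude $e^{-\beta\Phi}$ is robust at this stage.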

16.2.3 Graph structure of the energy landscape

In this section we have a closer look at the geometric structure of the set S.

Theorem 16.12 (Graph structure of the energy landscape) View $S$ as a graph whose vertices are the configurations and whose edges connect configurations that can be obtained from each other via an allowed move, i.e., $(\xi,\xi')$ is an edge if and only if $\xi \sim \xi'$. Define (see Fig. 16.3)
– $S^*$ is the subgraph of $S$ obtained by removing all vertices $\xi$ with $H(\xi) > \Gamma$ and all edges incident to these vertices;
– $S^{**}$ is the subgraph of $S^*$ obtained by removing all vertices $\xi$ with $H(\xi) = \Gamma$ and all edges incident to these vertices;
– $S_m$ and $S_s$ are the connected components of $S^{**}$ containing $m$ and $s$, respectively.
Then

\[
\begin{aligned}
S_m &= \{\xi \in S\colon \Phi(\xi,m) < \Phi(\xi,s) = \Gamma\},\\
S_s &= \{\xi \in S\colon \Phi(\xi,s) < \Phi(\xi,m) = \Gamma\}.
\end{aligned}
\tag{16.2.25}
\]

Moreover, $S_m$ and $S_s$ are disconnected in $S^{**}$, and

\[
\begin{aligned}
&\mathcal{P} \subseteq S_m, \qquad \mathcal{C} \subseteq S^*\setminus S_m,\\
&\forall\, \xi \in \mathcal{C}\ \exists\, \gamma\colon \xi \to S_s \text{ such that } \gamma\setminus\xi \subseteq S^*\setminus S_m.
\end{aligned}
\tag{16.2.26}
\]

Proof All paths connecting $m$ and $s$ reach energy level $\geq \Gamma$ (recall that $H(m) = 0$ by (16.1.14)). Therefore $S_m$ and $S_s$ are disconnected in $S^{**}$ (because $S^{**}$ does not contain vertices with energy $\geq \Gamma$). The claims in (16.2.25) are immediate from the definition of $S_m$ and $S_s$. The claims in (16.2.26) are immediate consequences of Definition 16.3.
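
The decomposition in Theorem 16.12 amounts to two energy cut-offs followed by a connected-component search. A minimal sketch, assuming $H(m) = 0$ and a hypothetical seven-configuration landscape (the names `decompose`, `at_most`, `below` and `others` are illustrative, not from the text): the set of configurations with energy at most $\Gamma$, the set with energy strictly below $\Gamma$, the components of $m$ and $s$ in the latter, and whatever components remain.

```python
from collections import deque

def component(allowed, nbrs, start):
    """Connected component of `start` in the subgraph induced by `allowed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in nbrs[x]:
            if y in allowed and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def decompose(H, nbrs, m, s, Gamma):
    """Energy cut-offs and components in the spirit of Theorem 16.12 (H(m) = 0)."""
    at_most = {x for x in H if H[x] <= Gamma}  # energies at most Gamma
    below = {x for x in H if H[x] < Gamma}     # energies strictly below Gamma
    S_m = component(below, nbrs, m)
    S_s = component(below, nbrs, s)
    others, rest = [], below - S_m - S_s
    while rest:
        comp = component(below, nbrs, next(iter(rest)))
        others.append(comp)
        rest -= comp
    return at_most, below, S_m, S_s, others

# Hypothetical landscape: 0 - 1 - 2 - 3 - 4 - 5 plus a high detour 1 - 6 - 5.
H = {0: 0.0, 1: 1.0, 2: 2.0, 3: 1.0, 4: 2.0, 5: -1.0, 6: 3.0}
nbrs = {0: [1], 1: [0, 2, 6], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [1, 5]}
at_most, below, S_m, S_s, others = decompose(H, nbrs, m=0, s=5, Gamma=2.0)
print(S_m, S_s, others)
```

Here configuration 6 is discarded by the first cut-off, configurations 2 and 4 sit exactly at height $\Gamma$, and configuration 3 forms a separate component between them.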

16.2.4 Metastable pair

An important consequence of (H1) and Lemma 16.11 is the following.

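Lemma 16.13 (Metastable pair) The pair $\{m,s\}$ is a metastable set in the sense of Definition 8.2:

\[
\lim_{\beta \to \infty} \frac{\max_{\xi \notin \{m,s\}} \mu_\beta(\xi)/\mathrm{cap}_\beta(\xi,\{m,s\})}{\min_{\xi \in \{m,s\}} \mu_\beta(\xi)/\mathrm{cap}_\beta(\xi,\{m,s\}\setminus\xi)} = 0. \tag{16.2.27}
\]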

Proof Note that (16.1.1), Lemma 16.9 and the lower bound in (16.2.6) give that the numerator is bounded from above by $e^{\beta(V - H(m))}/C_1 = e^{\beta(\Gamma - \delta)}/C_1$ for some $\delta > 0$, while (16.1.1), the definition of $\Gamma$ and the upper bound in (16.2.6) give that the denominator is bounded from below by $e^{\Gamma\beta}/C_2$ (the minimum being attained at $m$).

The property in (16.2.27) has an important consequence.

Lemma 16.14 (Mean crossover time asymptotics) $E_m(\tau_s) = [Z_\beta\, \mathrm{cap}_\beta(m,s)]^{-1}\,[1 + o(1)]$ as $\beta \to \infty$.

Proof According to (8.2.10) and Theorem 8.15,

\[
E_m(\tau_s) = \frac{\mu_\beta(A(m))}{\mathrm{cap}_\beta(m,s)}\, [1 + o(1)], \qquad \beta \to \infty, \tag{16.2.28}
\]

where

\[
A(m) = \{\xi \in S\colon P_\xi(\tau_m < \tau_s) \geq P_\xi(\tau_s < \tau_m)\} = \{\xi \in S\colon h_{m,s}(\xi) \geq \tfrac12\}. \tag{16.2.29}
\]

It follows from Lemma 16.15 below that

\[
\lim_{\beta \to \infty} \min_{\xi \in S_m} h_{m,s}(\xi) = 1, \qquad \lim_{\beta \to \infty} \max_{\xi \in S_s} h_{m,s}(\xi) = 0. \tag{16.2.30}
\]

Hence, for large enough $\beta$,

\[
S_m \subseteq A(m) \subseteq S\setminus S_s. \tag{16.2.31}
\]

By Lemma 16.8, we have $H(\xi) > 0 = H(m)$ for all $\xi \neq m$ such that $\Phi(\xi,m) \leq \Phi(\xi,s)$. Therefore, by the second inclusion in (16.2.31),

\[
\min_{\xi \in A(m)\setminus m} H(\xi) > 0. \tag{16.2.32}
\]

The latter in turn implies that $\mu_\beta(A(m))/\mu_\beta(m) = 1 + o(1)$. Since $\mu_\beta(m) = 1/Z_\beta$, we get the claim.

What Lemma 16.14 shows is that the proof of Theorem 16.5 revolves around getting sharp bounds on $Z_\beta\, \mathrm{cap}_\beta(m,s)$. The a priori estimates in (16.2.6) serve as a jump board, because together with Lemma 16.14 they already yield, up to a factor $1 + o(1)$, the estimate

\[
\frac{1}{C_2} \leq e^{-\beta\Gamma}\, E_m(\tau_s) \leq \frac{1}{C_1}. \tag{16.2.33}
\]
Thus, our task is to narrow down the constants leading to the identification of the
prefactor K. The strategy in Sect. 16.3 to do so is the following:
– Note that all terms in the Dirichlet form in (16.2.1) involving configurations $\xi$ with $H(\xi) > \Gamma$, i.e., $\xi \in S\setminus S^*$, contribute at most $Ce^{-\beta(\Gamma + \delta)}$ for some $\delta > 0$ and can be neglected. Thus, effectively we can replace $S$ by $S^*$.
– Show that $h_{m,s} = 1 - O(e^{-\beta\delta})$ on $S_m$ and $h_{m,s} = O(e^{-\beta\delta})$ on $S_s$ for some $\delta > 0$. Thus, effectively we can replace $h_{m,s}$ by 1 on $S_m$ and by 0 on $S_s$.
– Derive sharp estimates for $h_{m,s}$ on $S^*\setminus(S_m \cup S_s)$ in terms of a variational formula involving only the vertices and the edges that are on or incident to $S^*\setminus(S_m \cup S_s)$. Use this variational formula to identify $K$.

16.3 Proof of the metastability theorems

With the preparations done in Sect. 16.2, we are now ready to prove Theorems 16.4–
16.6. This will be done in Sects. 16.3.1–16.3.3, in reverse order.

16.3.1 Exponential distribution of the crossover time

Proof Theorem 16.6 follows from the general theory in Sect. 8.4. The intuition
behind the exponential distribution of the crossover time is simple: each time the
dynamics reaches C (m, s) but fails to enter Ss and instead falls back into Sm , it has
a probability exponentially close to 1 to return to m because m lies at the bottom of
Sm (recall Lemma 16.8). Each time the dynamics returns to m, it starts from scratch.

Thus, the dynamics manages to reach a critical configuration and go over the hill
only after a number of unsuccessful attempts that tends to infinity as β → ∞, each
having a small probability that tends to zero as β → ∞. Consequently, the time to
go over the hill is exponentially distributed on the scale of its average.

16.3.2 Average crossover time

In this section we prove Theorem 16.5.

Proof Our starting point is Lemma 16.14. Recalling (16.2.1)–(16.2.4), our task is to show that

\[
Z_\beta\, \mathrm{cap}_\beta(m,s) = \tfrac12\, Z_\beta \sum_{\xi,\xi' \in S} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h_{m,s}(\xi) - h_{m,s}(\xi')\bigr]^2 = [1 + o(1)]\, \Theta\, e^{-\beta\Gamma}, \qquad \beta \to \infty, \tag{16.3.1}
\]

and to identify the constant $\Theta$, since (16.3.1) will imply (16.1.16) with $\Theta = 1/K$. This is done in three steps: in the first two steps we derive sharp estimates on $h_{m,s}$, in the third step we use these estimates to derive a variational formula for $\Theta$.

1. For all $\xi \in S\setminus S^*$ we have $H(\xi) > \Gamma$, and so there exists a $\delta > 0$ such that $Z_\beta\, \mu_\beta(\xi) \leq e^{-\beta(\Gamma + \delta)}$. Since $c_\beta(\xi,\xi') \leq 1$ for all $\xi,\xi' \in S$, we can therefore replace $S$ by $S^*$ in the sum in (16.3.1) at the cost of a prefactor $1 + O(e^{-\beta\delta})$ (for details, see the proof of Lemma 16.17 below).

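Lemma 16.15 There exist $C < \infty$ and $\delta > 0$ such that

\[
\min_{\xi \in S_m} h_{m,s}(\xi) \geq 1 - Ce^{-\beta\delta}, \qquad \max_{\xi \in S_s} h_{m,s}(\xi) \leq Ce^{-\beta\delta}, \qquad \forall\, \beta \in (0,\infty). \tag{16.3.2}
\]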

Proof Combine Lemma 8.4 with Lemma 16.11.

2. Because of Lemma 16.15, on the set $S_m \cup S_s$, $h_{m,s}$ is trivial and its contribution to the sum in (16.3.1) can be put into the prefactor $1 + o(1)$ (for details, see the proof of Lemma 16.17 below). Consequently, all that is needed is to understand what $h_{m,s}$ looks like on the set

\[
S^*\setminus(S_m \cup S_s) = \{\xi \in S^*\colon \Phi(\xi,m) = \Phi(\xi,s) = \Gamma\}. \tag{16.3.3}
\]

However, Lemma 16.16 below shows that $h_{m,s}$ is also trivial on the set

\[
S^{**}\setminus(S_m \cup S_s) = \bigcup_{i=1}^{I} S_i, \tag{16.3.4}
\]

which is a union of wells $S_i$, $i = 1,\dots,I$, in $\mathcal{S}(m,s)$ for some $I \in \mathbb{N}$. Each $S_i$ is a maximal set of communicating configurations with energy $< \Gamma$ and with communication height $\Gamma$ towards both $m$ and $s$ (recall Fig. 16.3 and see Fig. 16.4).

Fig. 16.4 Schematic picture of the wells $S_i$

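Lemma 16.16 There exist $C < \infty$ and $\delta > 0$ such that

\[
\max_{\xi,\xi' \in S_i} \bigl|h_{m,s}(\xi) - h_{m,s}(\xi')\bigr| \leq Ce^{-\beta\delta}, \qquad \forall\, i = 1,\dots,I,\ \beta \in (0,\infty). \tag{16.3.5}
\]

Proof Fix $i \in \{1,\dots,I\}$ and $\xi,\xi' \in S_i$. Estimate

\[
h_{m,s}(\xi) = P_\xi(\tau_m < \tau_s) \leq P_\xi(\tau_m < \tau_{\xi'}) + P_\xi(\tau_{\xi'} < \tau_m < \tau_s). \tag{16.3.6}
\]

Combining Lemma 8.4 with Lemma 16.11, we have

\[
P_\xi(\tau_m < \tau_{\xi'}) \leq \frac{\mathrm{cap}_\beta(\xi,m)}{\mathrm{cap}_\beta(\xi,\xi')} \leq C\, e^{-\beta[\Phi(\xi,m) - \Phi(\xi,\xi')]} \leq C\, e^{-\beta\delta}, \tag{16.3.7}
\]

where we use that $\Phi(\xi,m) = \Gamma$ and $\Phi(\xi,\xi') < \Gamma$. But

\[
P_\xi(\tau_{\xi'} < \tau_m < \tau_s) = P_\xi(\tau_{\xi'} < \tau_{m \cup s})\, P_{\xi'}(\tau_m < \tau_s) \leq P_{\xi'}(\tau_m < \tau_s) = h_{m,s}(\xi'). \tag{16.3.8}
\]

Combining (16.3.6)–(16.3.8), we therefore get

\[
h_{m,s}(\xi) \leq C\, e^{-\beta\delta} + h_{m,s}(\xi'). \tag{16.3.9}
\]

Interchange $\xi$ and $\xi'$ to get the claim.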

Lemma 16.16 shows that the contribution to the sum in (16.3.1) of the transitions
inside a well can also be put into the prefactor 1 + o(1) (for details, see the proof of
Lemma 16.17 below). Thus, only the transitions in and out of wells contribute.

3. In view of the above observations, the estimation of Zβ capβ (m, s) reduces to the
study of a simpler variational problem.

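Lemma 16.17 (Variational formula for the prefactor) As $\beta \to \infty$,

\[
Z_\beta\, \mathrm{cap}_\beta(m,s) = [1 + o(1)]\, \Theta\, e^{-\beta\Gamma} \tag{16.3.10}
\]

with

\[
\Theta = \min_{C_1,\dots,C_I}\ \min_{\substack{h\colon S^* \to [0,1] \\ h|_{S_m} = 1,\ h|_{S_s} = 0,\ h|_{S_i} = C_i\ \forall\, i = 1,\dots,I}} \tfrac12 \sum_{\xi,\xi' \in S^*} 1_{\{\xi \sim \xi'\}}\, \bigl[h(\xi) - h(\xi')\bigr]^2. \tag{16.3.11}
\]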

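Proof First, recalling (16.1.1)–(16.1.2) and (16.2.1)–(16.2.2), we have

\[
\begin{aligned}
Z_\beta\, \mathrm{cap}_\beta(m,s)
&= Z_\beta \min_{\substack{h\colon S \to [0,1] \\ h(m) = 1,\ h(s) = 0}} \tfrac12 \sum_{\xi,\xi' \in S} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h(\xi) - h(\xi')\bigr]^2 \\
&= O\bigl(e^{-(\Gamma + \delta)\beta}\bigr) + Z_\beta \min_{\substack{h\colon S^* \to [0,1] \\ h(m) = 1,\ h(s) = 0}} \tfrac12 \sum_{\xi,\xi' \in S^*} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h(\xi) - h(\xi')\bigr]^2.
\end{aligned}
\tag{16.3.12}
\]

Next, with the help of Lemmas 16.11–16.15, we get

\[
\begin{aligned}
&\min_{\substack{h\colon S^* \to [0,1] \\ h(m) = 1,\ h(s) = 0}} \tfrac12 \sum_{\xi,\xi' \in S^*} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h(\xi) - h(\xi')\bigr]^2 \\
&\qquad = \min_{\substack{h\colon S^* \to [0,1] \\ h = h_{m,s} \text{ on } S_m \cup S_s \cup (S_1,\dots,S_I)}} \tfrac12 \sum_{\xi,\xi' \in S^*} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h(\xi) - h(\xi')\bigr]^2 \\
&\qquad = \bigl[1 + O(e^{-\delta\beta})\bigr] \min_{C_1,\dots,C_I}\ \min_{\substack{h\colon S^* \to [0,1] \\ h|_{S_m} = 1,\ h|_{S_s} = 0,\ h|_{S_i} = C_i\ \forall\, i = 1,\dots,I}} \tfrac12 \sum_{\xi,\xi' \in S^*} \mu_\beta(\xi)\, c_\beta(\xi,\xi')\, \bigl[h(\xi) - h(\xi')\bigr]^2,
\end{aligned}
\tag{16.3.13}
\]

where the error term $O(e^{-\delta\beta})$ arises after we replace the approximate boundary conditions

\[
h = \begin{cases}
1 - O(e^{-\beta\delta}) & \text{on } S_m,\\
O(e^{-\beta\delta}) & \text{on } S_s,\\
C_i + O(e^{-\beta\delta}) & \text{on } S_i,\ i = 1,\dots,I,
\end{cases}
\tag{16.3.14}
\]

coming from Lemmas 16.15–16.16 by the sharp boundary conditions

\[
h = \begin{cases}
1 & \text{on } S_m,\\
0 & \text{on } S_s,\\
C_i & \text{on } S_i,\ i = 1,\dots,I.
\end{cases}
\tag{16.3.15}
\]

The minimum with the sharp boundary conditions is an upper bound for the minimum with the approximate boundary conditions. Conversely, removal from the minimum with the approximate boundary conditions of all the transitions that stay inside $S_m$, $S_s$ or $S_i$ for some $i = 1,\dots,I$ yields a lower bound that is within a factor $1 - O(e^{-\beta\delta})$ of the minimum with the sharp boundary conditions.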
Finally, by (16.1.1)–(16.1.3) we have

\[
Z_\beta\, \mu_\beta(\xi)\, c_\beta(\xi,\xi') = 1_{\{\xi \sim \xi'\}}\, e^{-\beta\Gamma} \tag{16.3.16}
\]

for all $\xi,\xi' \in S^*$ that are not both in $S_m$ or both in $S_s$ or both in $S_i$ for some $i = 1,\dots,I$. Indeed, by Theorem 16.12 and the decomposition in (16.3.4), in each of these cases either $H(\xi) = \Gamma > H(\xi')$ or $H(\xi) < \Gamma = H(\xi')$, because there are no allowed moves between $S_m$, $S_s$ and $S_i$, $i = 1,\dots,I$. Combining (16.3.12)–(16.3.13) and (16.3.16), we arrive at the claim.

Combining Lemma 16.14 with (16.3.10)–(16.3.11), we see that we have completed the proof of (16.1.16) with $K = 1/\Theta$.

The variational formula for $\Theta$ is non-trivial because it depends on the geometry of the wells $S_i$, $i = 1,\dots,I$. In Chaps. 17 and 18 we will see how to compute $K$ for Glauber dynamics and Kawasaki dynamics.
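
For small landscapes the variational problem for $\Theta$ can be solved directly. The sketch below is a hedged illustration, not the method of Chaps. 17 and 18: the edge lists and labels are hypothetical, the valley of $m$, the valley of $s$ and each well are contracted to single nodes (their internal edges contribute nothing because $h$ is constant there), and the remaining quadratic form is minimised by coordinate-wise averaging, which jointly optimises the well constants $C_i$ and the values of $h$ at the configurations of height $\Gamma$.

```python
def theta(edges, label, sweeps=2000):
    """Minimise the sum over edges of (h(x) - h(y))^2 with h = 1 on 'm', h = 0 on 's'.

    edges: allowed moves between configurations of energy at most Gamma.
    label: maps each configuration to 'm', 's', a well name, or to itself
           (for configurations at height Gamma, which stay free).
    """
    # Contract the valleys and the wells; drop edges internal to one of them.
    contracted = [(label[x], label[y]) for x, y in edges if label[x] != label[y]]
    nodes = {u for e in contracted for u in e}
    h = {u: 0.5 for u in nodes}
    h["m"], h["s"] = 1.0, 0.0
    free = [u for u in nodes if u not in ("m", "s")]
    for _ in range(sweeps):
        for u in free:
            vals = [h[b] if a == u else h[a] for a, b in contracted if u in (a, b)]
            h[u] = sum(vals) / len(vals)  # optimal value given the neighbours
    return sum((h[a] - h[b]) ** 2 for a, b in contracted)

# Hypothetical series landscape: valley {0, 1}, saddle 2, well {3}, saddle 4, valley {5}.
series_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
series_label = {0: "m", 1: "m", 2: 2, 3: "well1", 4: 4, 5: "s"}
theta_series = theta(series_edges, series_label)

# Hypothetical parallel landscape: two saddles 1 and 2, each joining m = 0 to s = 3.
parallel_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
parallel_label = {0: "m", 1: 1, 2: 2, 3: "s"}
theta_parallel = theta(parallel_edges, parallel_label)
print(theta_series, theta_parallel)
```

The series example gives $\Theta = 1/4$ (four unit steps in a row) and the parallel example gives $\Theta = 1$ (two routes of two steps each), so $K = 1/\Theta$ equals 4 and 1, respectively, under these assumptions.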

16.3.3 Gate for the crossover and uniform entrance distribution

In this section we prove Theorem 16.4.

Proof (a) We will show that there exist $\delta > 0$ and $C < \infty$ such that for all $\beta$,

\[
P_m(\tau_{\mathcal{C}} < \tau_s \mid \tau_s < \tau_m) \geq 1 - Ce^{-\beta\delta}, \tag{16.3.17}
\]

which implies the claim. The proof goes as follows.

By (16.2.5), $\mathrm{cap}_\beta(m,s) = \mu_\beta(m)\, c_\beta(m)\, P_m(\tau_s < \tau_m)$ with $\mu_\beta(m) = 1/Z_\beta$. From the lower bound in Lemma 16.11 it therefore follows that

\[
P_m(\tau_s < \tau_m) \geq C_1\, e^{-\beta\Gamma}\, \frac{1}{c_\beta(m)}. \tag{16.3.18}
\]

We will show that

\[
P_m\bigl(\{\tau_{\mathcal{C}} < \tau_s\}^c,\ \tau_s < \tau_m\bigr) \leq C_2\, e^{-\beta(\Gamma + \delta)}\, \frac{1}{c_\beta(m)}. \tag{16.3.19}
\]

Combining (16.3.18)–(16.3.19), we get (16.3.17) with $C = C_2/C_1$.

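Because $\mathcal{C} \subseteq \mathcal{G}(m,s)$, any path from $m$ to $s$ that does not pass through $\mathcal{C}$ must hit a configuration $\xi$ with $H(\xi) > \Gamma$. Therefore there exists a set $U$, with $H(\xi) \geq \Gamma + \delta$ for all $\xi \in U$ and some $\delta > 0$, such that

\[
P_m\bigl(\{\tau_{\mathcal{C}} < \tau_s\}^c,\ \tau_s < \tau_m\bigr) \leq P_m(\tau_U < \tau_m). \tag{16.3.20}
\]

Now estimate, with the help of reversibility,

\[
\begin{aligned}
P_m(\tau_U < \tau_m)
&\leq \sum_{\xi \in U} P_m(\tau_\xi < \tau_m)
= \sum_{\xi \in U} \frac{\mu_\beta(\xi)\, c_\beta(\xi)}{\mu_\beta(m)\, c_\beta(m)}\, P_\xi(\tau_m < \tau_\xi) \\
&\leq \frac{1}{c_\beta(m)} \sum_{\xi \in U} \bigl|\{\xi' \in S\setminus\xi\colon \xi \sim \xi'\}\bigr|\, e^{-\beta H(\xi)}
\leq \frac{1}{c_\beta(m)}\, C_2\, e^{-\beta(\Gamma + \delta)}
\end{aligned}
\tag{16.3.21}
\]

with $C_2 = |\{(\xi,\xi') \in U \times S\setminus\xi\colon \xi \sim \xi'\}|$, where we use that $H(m) = 0$ and $c_\beta(\xi,\xi') \leq 1$. Combine (16.3.20)–(16.3.21) to get the claim in (16.3.19).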

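(b) Write

\[
P_m(\xi_{\tau_{\mathcal{C}}} = \xi \mid \tau_{\mathcal{C}} < \tau_m) = \frac{P_m(\xi_{\tau_{\mathcal{C}}} = \xi,\ \tau_{\mathcal{C}} < \tau_m)}{P_m(\tau_{\mathcal{C}} < \tau_m)}, \qquad \xi \in \mathcal{C}. \tag{16.3.22}
\]

By reversibility,

\[
P_m(\xi_{\tau_{\mathcal{C}}} = \xi,\ \tau_{\mathcal{C}} < \tau_m) = \frac{\mu_\beta(\xi)\, c_\beta(\xi)}{\mu_\beta(m)\, c_\beta(m)}\, P_\xi(\tau_m < \tau_{\mathcal{C}}) = e^{-\Gamma\beta}\, \frac{c_\beta(\xi)}{c_\beta(m)}\, P_\xi(\tau_m < \tau_{\mathcal{C}}), \qquad \xi \in \mathcal{C}. \tag{16.3.23}
\]

Moreover (recall (16.2.3)–(16.2.4)),

\[
P_\xi(\tau_m < \tau_{\mathcal{C}}) = \sum_{\substack{\xi' \in S\setminus\mathcal{C} \\ \xi \sim \xi'}} \frac{c_\beta(\xi,\xi')}{c_\beta(\xi)}\, h_{m,\mathcal{C}}(\xi'), \qquad \xi \in \mathcal{C}, \tag{16.3.24}
\]

where

\[
h_{m,\mathcal{C}}(\xi') = \begin{cases}
0 & \text{if } \xi' \in \mathcal{C},\\
1 & \text{if } \xi' = m,\\
P_{\xi'}(\tau_m < \tau_{\mathcal{C}}) & \text{otherwise.}
\end{cases}
\tag{16.3.25}
\]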

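Because $\mathcal{P} \subseteq S_m$ by Theorem 16.12 and $\mathcal{C} \subseteq \mathcal{G}(m,s)$ by Definition 16.3, we have $\Phi(\xi',\mathcal{C}) - \Phi(\xi',m) = \Gamma - \Phi(\xi',m) \geq \delta$ for all $\xi' \in \mathcal{P}$ and some $\delta > 0$. Therefore, as in the proof of Lemma 8.4, it follows that

\[
\min_{\xi' \in \mathcal{P}} h_{m,\mathcal{C}}(\xi') \geq 1 - Ce^{-\beta\delta}. \tag{16.3.26}
\]

Moreover, let

\[
\bar{\mathcal{C}} = \bigl\{\xi' \in S\setminus[\mathcal{P} \cup \mathcal{C}]\colon H(\xi') \leq \Gamma,\ \exists\, \xi \in \mathcal{C}\colon \xi \sim \xi'\bigr\}. \tag{16.3.27}
\]

By Lemma 16.10, any path from $\bar{\mathcal{C}}$ to $m$ that avoids $\mathcal{C}$ must reach an energy level above $\Gamma$, and so $h_{m,\mathcal{C}}(\xi') \leq h_{S\setminus S^*,\mathcal{C}}(\xi')$ for all $\xi' \in \bar{\mathcal{C}}$. But $\Phi(\xi',S\setminus S^*) - \Phi(\xi',\mathcal{C}) = \Phi(\xi',S\setminus S^*) - \Gamma \geq \delta$ for all $\xi' \in \bar{\mathcal{C}} \cap S^*$ and some $\delta > 0$. Therefore, again as in the proof of Lemma 8.4, it follows that

\[
\max_{\xi' \in \bar{\mathcal{C}} \cap S^*} h_{m,\mathcal{C}}(\xi') \leq Ce^{-\beta\delta}. \tag{16.3.28}
\]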

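The estimates in (16.3.26)–(16.3.28) can be used as follows. By restricting the sum in (16.3.24) to $\xi' \in \mathcal{P}$ and using (16.3.26), we get the lower bound

\[
P_\xi(\tau_m < \tau_{\mathcal{C}}) \geq \frac{c_\beta(\xi,\mathcal{P})}{c_\beta(\xi)}\, \bigl(1 - Ce^{-\beta\delta}\bigr), \qquad \xi \in \mathcal{C}. \tag{16.3.29}
\]

On the other hand, by using (16.3.28) in combination with the fact that $c_\beta(\xi,S\setminus[\mathcal{C} \cup \bar{\mathcal{C}}]) = c_\beta(\xi,\mathcal{P})$ for all $\xi \in \mathcal{C}$ (recall Fig. 16.2) and $c_\beta(\xi,\xi') \leq e^{-\beta\delta}$ for all $\xi \in \mathcal{C}$ and $\xi' \in S\setminus S^*$, we get the upper bound

\[
P_\xi(\tau_m < \tau_{\mathcal{C}}) \leq \frac{c_\beta(\xi,\mathcal{P})}{c_\beta(\xi)} + Ce^{-\beta\delta}\, |\bar{\mathcal{C}}| + e^{-\beta\delta}\, |S\setminus S^*|, \qquad \xi \in \mathcal{C}. \tag{16.3.30}
\]

Because $H(\xi') < H(\xi) = \Gamma$ for all $\xi \in \mathcal{C}$ and $\xi' \in \mathcal{P}$, we have

\[
c_\beta(\xi,\mathcal{P}) = \sum_{\xi' \in \mathcal{P}} c_\beta(\xi,\xi') = \bigl|\{\xi' \in \mathcal{P}\colon \xi \sim \xi'\}\bigr|, \qquad \xi \in \mathcal{C}, \tag{16.3.31}
\]

and, since $c_\beta(\xi) \leq |S|$, it follows that $c_\beta(\xi,\mathcal{P})/c_\beta(\xi) \geq C > 0$ for all $\xi \in \mathcal{C}$. Combine this observation with (16.3.29)–(16.3.30), to get

\[
P_\xi(\tau_m < \tau_{\mathcal{C}}) = \frac{c_\beta(\xi,\mathcal{P})}{c_\beta(\xi)}\, \bigl[1 + O(e^{-\beta\delta})\bigr], \qquad \xi \in \mathcal{C}. \tag{16.3.32}
\]

Combine this in turn with (16.3.22)–(16.3.23), to arrive at

\[
P_m(\xi_{\tau_{\mathcal{C}}} = \xi \mid \tau_{\mathcal{C}} < \tau_m) = \frac{c_\beta(\xi)\, P_\xi(\tau_m < \tau_{\mathcal{C}})}{\sum_{\xi' \in \mathcal{C}} c_\beta(\xi')\, P_{\xi'}(\tau_m < \tau_{\mathcal{C}})} = \bigl[1 + O(e^{-\beta\delta})\bigr]\, \frac{c_\beta(\xi,\mathcal{P})}{\sum_{\xi' \in \mathcal{C}} c_\beta(\xi',\mathcal{P})}, \qquad \xi \in \mathcal{C}. \tag{16.3.33}
\]

Finally, by (H2) and (16.3.31), $\xi \mapsto c_\beta(\xi,\mathcal{P})$ is constant on $\mathcal{C}$. Together with (16.3.33) this proves the claim.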

16.4 Beyond Metropolis dynamics

There is nothing that prevents us from choosing a dynamics that is different from
the Metropolis dynamics in (16.1.2). We take a brief look at two examples, namely,
heat-bath dynamics (Sect. 16.4.1) and probabilistic cellular automata (Sect. 16.4.2).
We show that Theorems 16.4–16.6 in Sect. 16.1 carry over provided we modify
hypothesis (H2).

16.4.1 Heat-bath dynamics

Return to the setting of Sect. 16.1.1. The heat-bath dynamics is the continuous-time Markov process with state space $S = \Upsilon^\Lambda$ and transition rates

\[
c_\beta(\xi,\xi') = \begin{cases}
\bigl[1 + e^{\beta[H(\xi') - H(\xi)]}\bigr]^{-1} & \text{if } \xi \sim \xi',\\
0 & \text{otherwise.}
\end{cases}
\tag{16.4.1}
\]

This Markov process is reversible with respect to $\mu_\beta$, the Gibbs measure associated with $H$. Note that for large $\beta$ the transition rates of the heat-bath dynamics and the Metropolis dynamics are close to each other, except when $H(\xi) = H(\xi')$, in which case the former gives $c_\beta(\xi,\xi') = \tfrac12$ while the latter gives $c_\beta(\xi,\xi') = 1$.

Theorem 16.18 (Metastability for heat-bath dynamics) Theorems 16.4–16.6 are valid for heat-bath dynamics subject to (H1) and
(H2′) (H2) holds and $\xi \mapsto H(\xi)$ is constant on $\mathcal{P}$.

Proof The same proofs as in Sects. 16.2–16.3 apply, except for minor modifications
in a few spots:

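1. In Sect. 16.2.2, the only modification is that, because

\[
Z_\beta\, \mu_\beta(\xi)\, c_\beta(\xi,\xi') = \bigl[e^{\beta H(\xi)} + e^{\beta H(\xi')}\bigr]^{-1} \geq \tfrac12\, e^{-\beta[H(\xi) \vee H(\xi')]} \qquad \text{for } \xi \sim \xi', \tag{16.4.2}
\]

the lower bound in (16.2.24) holds with $C_1 = 1/(2L)$ instead of $C_1 = 1/L$. This does not affect the a priori estimates in Lemma 16.11.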

2. In Sect. 16.3.2, the only modification is that instead of (16.3.16) we have

\[
Z_\beta\, \mu_\beta(\xi)\, c_\beta(\xi,\xi') = 1_{\{\xi \sim \xi'\}}\, e^{-\beta\Gamma}\, \bigl[1 + O(e^{-\beta\delta})\bigr] \tag{16.4.3}
\]

for all $\xi,\xi' \in S^*$ that are not both in $S_m$ or both in $S_s$ or both in $S_i$ for some $i = 1,\dots,I$. This does not affect Lemma 16.17, and so the same variational formula for $\Theta = 1/K$ as in (16.3.11) holds.

3. In Sect. 16.3.3, no modification is needed all the way up to and including (16.3.30) (the last term in the right-hand side of (16.3.30) comes with a factor $[1 + e^{\beta\delta}]^{-1} \leq e^{-\beta\delta}$). Also, instead of (16.3.31), we can estimate $c_\beta(\xi,\mathcal{P}) \geq \tfrac12\, |\{\xi' \in \mathcal{P}\colon \xi \sim \xi'\}|$, so that also (16.3.32) and (16.3.33) carry over. Finally, $\xi \mapsto c_\beta(\xi,\mathcal{P})$ is not constant on $\mathcal{C}$. However,

\[
c_\beta(\xi',\xi) = \bigl[1 + e^{\beta[H(\xi) - H(\xi')]}\bigr]^{-1} = e^{-\beta[H(\xi) - H(\xi')]}\, \bigl[1 + O(e^{-\beta\delta})\bigr], \qquad \xi \in \mathcal{C},\ \xi' \in \mathcal{P}. \tag{16.4.4}
\]

Consequently, if we strengthen (H2) to (H2′), then (16.3.33) again gives the uniform entrance distribution (use that $H(\xi) = \Gamma$ for all $\xi \in \mathcal{C}$).

Since the triple (Γ , C , K) only depends on H , ∼, m and s, even this is not

affected by the choice of (16.4.1) over Metropolis.
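
The relation between the two dynamics can be checked pairwise. The sketch below is illustrative: the function names and the sample energies are hypothetical, and the Metropolis rate is assumed to have the form $e^{-\beta[H(\xi') - H(\xi)]_+}$, which is consistent with (16.2.15). It verifies detailed balance for the heat-bath rate (16.4.1) and the two-sided comparison $\tfrac12\, e^{-\beta[H(\xi)\vee H(\xi')]} \leq e^{-\beta H(\xi)} c_\beta(\xi,\xi') \leq e^{-\beta[H(\xi)\vee H(\xi')]}$ between the heat-bath and Metropolis edge weights.

```python
import math

def metropolis_rate(Hx, Hy, beta):
    """Assumed Metropolis rate exp(-beta * [H(y) - H(x)]_+) for an allowed move."""
    return math.exp(-beta * max(Hy - Hx, 0.0))

def heat_bath_rate(Hx, Hy, beta):
    """Heat-bath rate (16.4.1) for an allowed move x -> y."""
    return 1.0 / (1.0 + math.exp(beta * (Hy - Hx)))

def edge_weight(Hx, Hy, beta, rate):
    """Z * mu(x) c(x, y) = exp(-beta * H(x)) * c(x, y)."""
    return math.exp(-beta * Hx) * rate(Hx, Hy, beta)

pairs = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5), (2.0, -1.0)]  # hypothetical energies
for Hx, Hy in pairs:
    for beta in (0.5, 2.0, 8.0):
        hb = edge_weight(Hx, Hy, beta, heat_bath_rate)
        mt = edge_weight(Hx, Hy, beta, metropolis_rate)
        print(Hx, Hy, beta, hb / mt)  # ratio stays between 1/2 and 1
```

The ratio equals $1/2$ exactly when the two energies coincide and tends to $1$ otherwise as $\beta$ grows, matching the remark after (16.4.1).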

16.4.2 Probabilistic cellular automata

Again return to the setting of Sect. 16.1.1. A probabilistic cellular automaton (PCA) is a discrete-time Markov chain with state space $S = \Upsilon^\Lambda$ and transition matrix

\[
p(\xi,\xi') = \prod_{x \in \Lambda} p_{x,\xi}\bigl(\xi'(x)\bigr), \qquad \xi,\xi' \in S, \tag{16.4.5}
\]

where, for each x ∈ Λ and ξ ∈ S, px,ξ (·) is a probability measure on S with full
support. This transition matrix corresponds to independent updates of all the spins
simultaneously at each unit of time (“parallel dynamics”), according to local updat-
ing rules that take into account both the location of the spin and the values of the
spins in its surroundings. Typically, px,ξ (·) is assumed to depend on ξ only through
the spins ξ(y), y ∈ N (x), in some small neighbourhood N (x) of x. If Λ has a lattice
structure (e.g. a torus in Zd , d ≥ 1), then typically N (x) = x + N , x ∈ Λ, for some
small N ⊆ Λ.
What makes PCAs challenging objects is that they evolve via global moves
rather than local moves: all transitions (between any pair of configurations in S)
have positive probability, and therefore all transitions are allowed. This means that
∼ loses the role it played for Metropolis dynamics.
For β > 0, the PCA in (16.4.5) is reversible with respect to the Gibbs measure
μβ (ξ ) = e−βH (ξ ) /Zβ , ξ ∈ S, associated with the Hamiltonian H : S → R if

\[ \mu_\beta(\xi)\,p(\xi,\xi') = \mu_\beta(\xi')\,p(\xi',\xi) \qquad \forall\, \xi,\xi'\in S, \tag{16.4.6} \]

i.e., if the “dynamic Hamiltonian” defined by

\[ H(\xi,\xi') = H(\xi) - \frac{1}{\beta} \ln p(\xi,\xi') \tag{16.4.7} \]
is a symmetric function on S × S. For a given choice of H and β, this condition puts
a constraint on the choice of PCA in (16.4.5).
The communication height between two configurations ξ, ξ′ ∈ S with ξ ≠ ξ′ is
defined to be

\[ \Phi(\xi,\xi') = \min_{\gamma\colon \xi\to\xi'}\ \max_{e\in\gamma} H(e), \tag{16.4.8} \]

where the maximum runs over all edges e in γ, i.e., over all pairs of successive configurations visited by the path, and H(e) = H(ξ, ξ′) for an edge e = (ξ, ξ′). This is different from Definition 16.1(a), where the maximum runs over the single configurations in γ, and the Hamiltonian H(·) was used instead of the dynamic Hamiltonian H(·, ·) (note that Φ(ξ, ξ) = H(ξ) by convention). The definition of the communication level set S(ξ, ξ′) in Definition 16.1(b) must be adapted accordingly: this becomes a set of pairs of configurations rather than single configurations (S(ξ, ξ) = ξ by convention). A similar change applies to the definition of gates and dead-ends in Definition 16.2. What makes (16.4.8) non-trivial to compute is the fact that the Hamiltonian and the transition probabilities compete with each other: to make H(·, ·) small we must make H(·) small and p(·, ·) large simultaneously.
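
On a small state space, the min-max in (16.4.8) can be computed with a Dijkstra-type search in which path cost is the largest edge value rather than the sum. The sketch below is illustrative only: the function name and the toy edge costs are assumptions, the edge costs stand in for the values H(e) of the dynamic Hamiltonian (symmetric under reversibility), and the convention Φ(ξ, ξ) = H(ξ) is not handled.

```python
import heapq


def communication_height(edges, start, goal):
    """Minimum over paths from start to goal of the maximal edge cost.

    edges: dict mapping unordered pairs (u, v) to a symmetric edge cost.
    Returns None when goal cannot be reached from start.
    """
    adjacency = {}
    for (u, v), cost in edges.items():
        adjacency.setdefault(u, []).append((v, cost))
        adjacency.setdefault(v, []).append((u, cost))

    best = {start: float("-inf")}
    heap = [(float("-inf"), start)]
    while heap:
        height, u = heapq.heappop(heap)
        if u == goal:
            return height
        if height > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adjacency.get(u, []):
            new_height = max(height, cost)
            if new_height < best.get(v, float("inf")):
                best[v] = new_height
                heapq.heappush(heap, (new_height, v))
    return None


# Toy example with hypothetical edge costs.
toy_edges = {("a", "b"): 3.0, ("b", "c"): 1.0, ("a", "c"): 5.0, ("c", "d"): 2.0}
print(communication_height(toy_edges, "a", "d"))
```

For a genuine PCA the graph is complete, so this brute-force search is only feasible for very small Λ.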
Definition 16.3 must be changed into the following.

Definition 16.19 (Protocritical and critical sets for PCA dynamics) Let

Γ = Φ(m, s) − H (m). (16.4.9)

Then (P (m, s), C (m, s)) is the maximal subset of S × S such that:
(1) ∀ ξ ∈ P (m, s) ∃ γ : ξ → m : maxe∈γ H (e) − H (m) < Γ .
(2) ∀ ξ ∈ C (m, s) ∃ γ : ξ → s : maxe∈γ H (e) − H (m) ≤ Γ , γ ∩ {ζ ∈ S :
Φ(ζ, m) < Φ(ζ, s)} = ∅.

With this change we can now state the following.

Theorem 16.20 (Metastability for PCA dynamics) Theorems 16.4–16.6 are valid
for PCA dynamics subject to (H1) and
(H2 ) (ξ, ξ′) → p(ξ, ξ′) is constant on C × P.

Proof We will again go through the proofs in Sects. 16.2–16.3 to see what needs to
be modified.

1. In Sect. 16.2.1, the definition of the Dirichlet form in (16.2.1) becomes

\[ \mathcal{E}(h,h) = \tfrac12 \sum_{\xi,\xi'\in S} \mu_\beta(\xi)\,p(\xi,\xi')\,\bigl[h(\xi) - h(\xi')\bigr]^2, \qquad h\colon S\to[0,1], \tag{16.4.10} \]

while the definition of the capacity in (16.2.2) remains the same. Note that

\[ \mu_\beta(\xi)\,p(\xi,\xi') = e^{-\beta H(\xi,\xi')}/Z_\beta. \tag{16.4.11} \]

Replace (16.2.5) by

capβ (A, B) = μβ (ξ )p ξ, ξ Pξ (τB < τA ), (16.4.12)
ξ ∈A

which is simpler than (16.2.5) because the PCA dynamics evolves in discrete rather
than continuous time.

2. Throughout Sect. 16.2.2, the new definition of communication height in (16.4.8) must be used, and μ_β(ξ)c_β(ξ, ξ′) must be replaced by μ_β(ξ)p(ξ, ξ′). Otherwise there are no changes.

3. In Sect. 16.2.3, Theorem 16.12 needs to be adapted as follows: S is the graph consisting of all vertices and all edges; S^⋆ is the subgraph of S obtained by removing all edges e with H(e) > Γ; S^{⋆⋆} is the subgraph of S^⋆ obtained by removing all edges e with H(e) = Γ; S_m and S_s are the connected components of S^{⋆⋆} containing m and s, respectively. With these modifications the claim in Theorem 16.12 stays the same.

4. There are no changes in Sects. 16.2.4 and 16.3.1. Throughout Sect. 16.3.2,
μ_β(ξ)c_β(ξ, ξ′) must be replaced by μ_β(ξ)p(ξ, ξ′), while the indicator 1{ξ ∼ ξ′}
must be removed from (16.3.11) and (16.3.16). The definition of S_i, i = 1, . . . , I
in (16.3.4) stays the same, and so does the variational formula for Θ = 1/K in
(16.3.11).

5. Throughout Sect. 16.3.3, remove the terms c_β(ξ) and c_β(m) (in view of (16.4.12))
and replace c_β(ξ, ξ′) by p(ξ, ξ′). Finally, if we replace (H2) by (H2 ), then
(16.3.33) again gives the uniform entrance distribution.

16.5 Bibliographical notes

1. The line of reasoning pursued in Sects. 16.2–16.3 was put forward in Bovier and
Manzo [39] for Glauber dynamics and in Bovier, Nardi and den Hollander [31] for
Kawasaki dynamics, both in their model-specific context (see Chaps. 17 and 18).
The two hypotheses in Sect. 16.1.2 have been stripped of this context in order to
capitalise as much as possible on the general theory developed in Part III: they are
the essentially minimal hypotheses that are needed to obtain the universal metastable
behaviour expressed in Theorems 16.4–16.6. Part of this stripping was already done
in den Hollander, Nardi and Troiani [87], where Kawasaki dynamics with two types
of particles was considered (see Sect. 18.7).

Fig. 16.5 Example where S_meta = {m1, m2}, yet for both these metastable configurations the same results apply as in Theorems 16.4–16.6

Fig. 16.6 Example where there is a well of depth > Γ. The presence of this well does not influence the typical crossover time from m to s, but enlarges its average by a factor e^{βΔ}

Fig. 16.7 Example where S_meta = {m1, m2}, with s separated from m1 by m2. In this case the distribution of the crossover time from m1 to s divided by its average is one half times the convolution of two unit exponentials

2. An “axiomatisation” of the essential features of metastability in the context of the pathwise approach to metastability can be found in Manzo, Nardi, Olivieri and Scoppola [171]. Here, hypotheses similar to (H1) are formulated, and are used to derive the results in Theorem 16.4(a) and Theorem 16.5 without the prefactor (i.e., the average crossover time is identified up to a multiplicative factor e^{o(β)}). This paper also contains a careful analysis of the role of minimal gates and essential gates for the metastable pair.

3. The non-degeneracy assumption in (H1) can be relaxed. For instance, if S_meta is not a singleton, then the same results as in Theorems 16.4–16.6 apply for each choice of m ∈ S_meta as long as (m → s)_opt does not need to cross S_meta \ m. An
example is given in Fig. 16.5. See Cirillo and Nardi [65] for an analysis of what
may happen in degenerate situations.
Figures 16.6–16.7 exhibit two examples where (H1) fails and the metastable be-
haviour is different.
Chapter 17
Glauber Dynamics

“You have no right to grow here,” said the Dormouse. “Don’t

talk nonsense,” said Alice more boldly: “you know you’re
growing too.” “Yes, but I grow at a reasonable pace,” said the
Dormouse: “not in a ridiculous fashion. . . ”
(Lewis Carroll, Alice’s Adventures in Wonderland)

In this chapter we apply the results obtained in Chap. 16 to Ising spins in two and
three dimensions subject to Glauber dynamics. Spins live in a finite box, flip up
and down, want to align when they sit next to each other, and want to align with an
external magnetic field. We are interested in how the system magnetises, i.e., how
the dynamics aligns the spins with the magnetic field when initially all the spins are
pointing in the opposite direction. Our goal will be to prove hypotheses (H1–H2)
in Sect. 16.1.2, implying that Theorems 16.4–16.6 are valid. In two dimensions we
will identify (Γ , C , K). In three dimensions we will also identify Γ , but we will
obtain only partial information on C and K.

17.1 Introduction and main results

17.1.1 Model

Let Λ ⊂ ℤ^2 be a large square torus, centred at the origin. With each site x ∈ Λ we associate a spin variable σ(x) assuming the values −1 or +1, indicating whether the spin at x is pointing down or up (see Fig. 17.1). A configuration is denoted by σ ∈ S = {−1, +1}^Λ. Each configuration σ ∈ S has an energy given by the Hamiltonian

\[ H(\sigma) = -\frac{J}{2} \sum_{\{x,y\}\in\Lambda^*} \sigma(x)\sigma(y) - \frac{h}{2} \sum_{x\in\Lambda} \sigma(x), \tag{17.1.1} \]

where

\[ \Lambda^* = \bigl\{\{x,y\}\colon x,y\in\Lambda,\ \|x-y\| = 1\bigr\} \tag{17.1.2} \]

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_17

Fig. 17.1 An Ising-spin configuration

is the set of non-oriented nearest-neighbour bonds in Λ. The interaction consists of a ferromagnetic pair potential J > 0 for each pair of neighbouring spins in Λ and a magnetic field h > 0 for each spin in Λ.
The Hamiltonian in (17.1.1) models Ising spins in Λ that want to align with
neighbouring spins and with an external magnetic field that is pointing upwards.
We are interested in Glauber dynamics on Λ. This is the Metropolis dynamics with
respect to H at inverse temperature β defined in (16.1.2) with single-spin flips as
allowed moves, i.e., −1 changes to +1 or +1 changes to −1 at single sites in Λ.
Clearly, this dynamics is a finite-state Markov process, and hence fits into the gen-
eral theory described in Chap. 16.
The Gibbs measure μβ defined in (16.1.1) is the equilibrium of the dynamics
with transition rates cβ defined in (16.1.2) and satisfies the reversibility property in
(16.1.3).

17.1.2 Metastable regime and critical droplet size

Throughout the sequel we assume that

h ∈ (0, 2J ). (17.1.3)

This parameter range will be seen to correspond to metastable behaviour in the limit
as β → ∞ (see Fig. 17.4 below). A key role will be played by what we call the
critical droplet size:
\[ \ell_c = \left\lceil \frac{2J}{h} \right\rceil \tag{17.1.4} \]
(⌈·⌉ denotes the upper integer part). For reasons that will become clear later on, we will assume that
\[ \frac{2J}{h} \notin \mathbb{N}. \tag{17.1.5} \]
Thus, an (ℓ_c − 1) × (ℓ_c − 1) droplet will be “subcritical” while an ℓ_c × ℓ_c droplet will be “supercritical”. Moreover, we will assume that Λ is large enough so that it contains a 2ℓ_c × 2ℓ_c square, which is necessary for (H1) and will also prevent the critical droplet from being a ring that wraps around Λ.
Analogous assumptions are needed in three dimensions (see Sect. 17.6).

Fig. 17.2 Configurations in Q, Q^{1pr} and Q^{2pr}. Inside the contours sit the up-spins, outside the
contours sit the down-spins

17.1.3 Main theorems

Each configuration can be decomposed into maximally connected components, called clusters.

Definition 17.1
(a) Let
\[ \boxminus = \{\sigma\in S\colon \sigma(x) = -1\ \forall\, x\in\Lambda\}, \qquad \boxplus = \{\sigma\in S\colon \sigma(x) = +1\ \forall\, x\in\Lambda\}, \tag{17.1.6} \]
denote the configurations where all spins in Λ are down, respectively, up.
(b) Let Q be the set of configurations where the up-spins form a single (ℓ_c − 1) × ℓ_c quasi-square anywhere in Λ.
(c) Let Q^{1pr} be the set of configurations where the up-spins form a single (ℓ_c − 1) × ℓ_c quasi-square anywhere in Λ with a single protuberance attached anywhere to one of its longest sides.
(d) Let Q^{2pr} be the set of configurations where the up-spins form a single (ℓ_c − 1) × ℓ_c quasi-square anywhere in Λ with a double protuberance attached anywhere to one of its longest sides.

See Fig. 17.2 for a picture of the configurations in Q, Q^{1pr} and Q^{2pr}.
The main metastability theorems for Glauber dynamics are the following. Recall
Definition 16.3.

Theorem 17.2 The pair (⊟, ⊞) satisfies hypotheses (H1–H2) in Sect. 16.1.2 and hence Theorems 16.4–16.6 hold.

Theorem 17.3 The pair (⊟, ⊞) has protocritical set P(⊟, ⊞) = Q, critical set C(⊟, ⊞) = Q^{1pr}, and communication height

\[ \Gamma = \Gamma(\boxminus,\boxplus) = H\bigl(Q^{\mathrm{1pr}}\bigr) - H(\boxminus) = J[4\ell_c] - h[\ell_c(\ell_c-1) + 1]. \tag{17.1.7} \]

Theorem 17.4 The prefactor K = K(Λ) equals

\[ K(\Lambda) = \frac{3}{4(2\ell_c - 1)}\,\frac{1}{|\Lambda|}. \]
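
For concrete parameter values, the quantities in (17.1.4), (17.1.7) and Theorem 17.4 are easy to evaluate exactly. The sketch below is only an illustration under the assumptions (17.1.3) and (17.1.5); the function names and the sample values J = 1, h = 3/5, |Λ| = 400 are not taken from the text.

```python
import math
from fractions import Fraction


def critical_droplet_size(J, h):
    """l_c = ceil(2J/h), assuming h in (0, 2J) and 2J/h not an integer."""
    if h <= 0:
        raise ValueError("h must be positive")
    ratio = Fraction(2 * J) / Fraction(h)
    if ratio <= 1:
        raise ValueError("h must lie in (0, 2J)")
    if ratio.denominator == 1:
        raise ValueError("2J/h must not be an integer")
    return math.ceil(ratio)


def energy_barrier(J, h):
    """Gamma = J * 4 * l_c - h * (l_c * (l_c - 1) + 1), as in (17.1.7)."""
    l_c = critical_droplet_size(J, h)
    return 4 * J * l_c - h * (l_c * (l_c - 1) + 1)


def prefactor(J, h, volume):
    """K = 3 / (4 * (2 * l_c - 1) * |Lambda|), as in Theorem 17.4."""
    l_c = critical_droplet_size(J, h)
    return Fraction(3, 4 * (2 * l_c - 1) * volume)


J, h = 1, Fraction(3, 5)
print(critical_droplet_size(J, h), energy_barrier(J, h), prefactor(J, h, 400))
```

Exact rational arithmetic is used so that the non-degeneracy condition (17.1.5) can be tested without rounding issues.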

Fig. 17.3 Transitions over the hill: Q → Q^{1pr} → Q^{2pr}

In addition, we have the following geometric description of the configurations in the valleys S_⊟, S_⊞ around ⊟, ⊞ defined in (16.2.25). Let

\[ V_{\le Q} = \{\sigma\in S\colon \sigma\le\sigma' \text{ for some } \sigma'\in Q\}, \qquad V_{\ge Q^{\mathrm{2pr}}} = \{\sigma\in S\colon \sigma\ge\sigma' \text{ for some } \sigma'\in Q^{\mathrm{2pr}}\}, \tag{17.1.8} \]

where we write σ ≤ σ′ when σ(x) ≤ σ′(x) for all x ∈ Λ, and vice versa.

Theorem 17.5 S_⊟ ⊇ V_{≤Q}, S_⊞ ⊇ V_{≥Q^{2pr}}.

17.1.4 Discussion

1. The proof of Theorem 17.2 is given in Sect. 17.3. (H2) is easy to check, (H1) is
more involved and relies on certain isoperimetric inequalities.

2. The heuristics behind Theorem 17.3 is as follows. In Sect. 17.4 we will see that Q^{1pr} ⊆ S(⊟, ⊞), the communication level set of the pair (⊟, ⊞). We will see that on its way from ⊟ to ⊞ the dynamics passes through S(⊟, ⊞) in three steps: (1) first it creates a quasi-square of up-spins; (2) next it attaches a single protuberance; (3) finally it turns this single protuberance into a double protuberance (see Fig. 17.3). After these three steps are completed, the dynamics is “over the hill” and proceeds downwards in energy to fill the box with up-spins. This also explains where Theorem 17.5 comes from.

3. The heuristics behind Theorem 17.4 is as follows. The average time it takes for the dynamics to enter C(⊟, ⊞) = Q^{1pr} when starting from ⊟ is
\[ \frac{1}{|Q^{\mathrm{1pr}}|}\, e^{\beta\Gamma}\,[1 + o(1)], \qquad \beta\to\infty, \tag{17.1.9} \]
where |Q^{1pr}| counts the number of critical droplets. Let π(ℓ_c) be the probability that the single protuberance is turned into a double protuberance rather than being removed. Then
\[ \frac{1}{\pi(\ell_c)}\,[1 + o(1)], \qquad \beta\to\infty, \tag{17.1.10} \]
is the average number of times a critical droplet just created attempts to move over the hill before it finally manages to do so. The average nucleation time is the product of (17.1.9) and (17.1.10), and so we conclude that
\[ K = \frac{1}{|Q^{\mathrm{1pr}}|\,\pi(\ell_c)}. \tag{17.1.11} \]

To compute |Q^{1pr}|, note that

\[ |Q^{\mathrm{1pr}}| = |\Lambda|\, N(\ell_c) \quad\text{with}\quad N(\ell_c) = 4\ell_c. \tag{17.1.12} \]

Indeed, the (ℓ_c − 1) × ℓ_c quasi-square can be located anywhere in Λ (which is a torus) in two possible orientations, while the single protuberance can be attached in any of the 2ℓ_c possible locations on one of the sides of length ℓ_c (see Fig. 17.2). To compute π(ℓ_c), note that if the protuberance sits at one of the two extreme ends of the side it is attached to, then the probability is 1/2 that its one neighbouring spin flips up before the spin itself flips down. On the other hand, if the protuberance sits somewhere else, then the probability is 2/3 that one of its two neighbouring spins on the same side flips up before the spin itself flips down. Since the location of the protuberance is uniform (because of the uniform exit distribution stated in Theorem 16.4(b)), we therefore get
\[ \pi(\ell_c) = \frac{1}{\ell_c}\left[2\cdot\frac12 + (\ell_c - 2)\,\frac23\right] = \frac{2\ell_c - 1}{3\ell_c}. \tag{17.1.13} \]
Combine (17.1.11)–(17.1.13) to get the formula for K in Theorem 17.4.
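
The arithmetic of this heuristic can be reproduced with exact fractions. The following sketch simply averages the two growth probabilities over the ℓ_c positions on a side and multiplies out (17.1.11)–(17.1.12); the function names and sample values are illustrative assumptions.

```python
from fractions import Fraction


def growth_probability(l_c):
    """Average over l_c positions: 1/2 at the two ends, 2/3 elsewhere (17.1.13)."""
    at_end = Fraction(1, 2)
    elsewhere = Fraction(2, 3)
    return (2 * at_end + (l_c - 2) * elsewhere) / l_c


def heuristic_prefactor(l_c, volume):
    """K = 1 / (|Q^1pr| * pi(l_c)) with |Q^1pr| = 4 * l_c * |Lambda|."""
    number_of_critical_droplets = 4 * l_c * volume
    return Fraction(1) / (number_of_critical_droplets * growth_probability(l_c))


print(growth_probability(4), heuristic_prefactor(4, 400))
```

For ℓ_c = 4 and |Λ| = 400 the two printed values agree with (2ℓ_c − 1)/(3ℓ_c) and with the closed form 3/(4(2ℓ_c − 1)|Λ|) of Theorem 17.4.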

Outline The outline of the remainder of this chapter is as follows. In Sect. 17.2
we introduce some geometric definitions that are needed for the proof of Theo-
rems 17.2–17.5. These theorems are proved in Sects. 17.3–17.5. Section 17.6 looks
at the extension from two to three dimensions.

17.2 Geometric definitions

In order to prove Theorems 17.2–17.5, we need some further definitions.

1. Throughout the sequel, we identify a configuration σ ∈ S with the set of locations of its up-spins supp(σ) = {x ∈ Λ : σ(x) = +1}, and write x ∈ σ to indicate that σ has an up-spin at x.

2. Given a configuration σ ∈ S, consider the set C(σ) ⊆ ℝ^2 defined as the union of the closed unit squares centred at the sites of supp(σ). The maximal connected components C_1, . . . , C_m, m ∈ ℕ, of C(σ) are called clusters of σ (two unit squares touching only at the corners are not connected). There is a one-to-one correspondence between configurations σ ∈ S and sets C(σ).

3. For σ ∈ S, let |σ| be the volume of C(σ), ∂(σ) the Euclidean boundary of C(σ), called the contour of σ, and |∂(σ)| the length of ∂(σ). Then the energy associated with σ is given by

\[ H(\sigma) = J\,|\partial(\sigma)| - h\,|\sigma| + H(\boxminus). \tag{17.2.1} \]
4. To describe the shape of clusters, we need the following:
– An ℓ_1 × ℓ_2 rectangle is a union of closed unit squares centred at the sites in Λ
with side lengths ℓ_1, ℓ_2 ≥ 1. We use the convention ℓ_1 ≤ ℓ_2 and collect rectangles
in equivalence classes modulo translations and rotations.
– A quasi-square is an ℓ × (ℓ + δ) rectangle with ℓ ≥ 1 and δ ∈ {0, 1}. A square is
a quasi-square with δ = 0.
– A bar is a 1 × k rectangle with k ≥ 1. A bar is called a row or a column if it fills
a side of a rectangle.
– A corner of a rectangle is an intersection of two bars attached to the rectangle.
– A 1-protuberance is a 1 × 1 bar attached to one side of a rectangle.
– A 2-protuberance is a 1 × 2 bar attached to one side of a rectangle.
5. The configuration space S can be partitioned as

\[ S = \bigcup_{n=0}^{|\Lambda|} V_n, \tag{17.2.2} \]

where
\[ V_n = \{\sigma\in S\colon |\sigma| = n\} \tag{17.2.3} \]
is the set of configurations with n up-spins.
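
The contour representation (17.2.1) in item 3 above can be checked against the Hamiltonian (17.1.1) on a small torus. The sketch below is illustrative only: configurations are stored as nested lists of ±1 on an L × L torus with L ≥ 3 (an assumption that avoids double bonds), the contour length is counted as the number of nearest-neighbour bonds joining unequal spins, and all names are hypothetical.

```python
import random
from fractions import Fraction


def hamiltonian(sigma, J, h):
    """Energy (17.1.1) on an L x L torus; each nearest-neighbour bond is counted once."""
    L = len(sigma)
    bond_sum = 0
    for x in range(L):
        for y in range(L):
            s = sigma[x][y]
            bond_sum += s * sigma[(x + 1) % L][y] + s * sigma[x][(y + 1) % L]
    spin_sum = sum(sum(row) for row in sigma)
    return -(J * bond_sum + h * spin_sum) / 2


def contour_length(sigma):
    """Number of bonds between an up-spin and a down-spin (length of the contour)."""
    L = len(sigma)
    count = 0
    for x in range(L):
        for y in range(L):
            s = sigma[x][y]
            count += (s != sigma[(x + 1) % L][y]) + (s != sigma[x][(y + 1) % L])
    return count


def volume(sigma):
    """Number of up-spins."""
    return sum(s == 1 for row in sigma for s in row)


def random_configuration(L, rng):
    return [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]


rng = random.Random(0)
J, h, L = Fraction(1), Fraction(3, 5), 6
all_down = [[-1] * L for _ in range(L)]
sigma = random_configuration(L, rng)
lhs = hamiltonian(sigma, J, h) - hamiltonian(all_down, J, h)
rhs = J * contour_length(sigma) - h * volume(sigma)
assert lhs == rhs
```

Each unequal bond costs J and each up-spin gains h relative to the all-down configuration, which is exactly the content of (17.2.1).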

17.3 Verification of the two hypotheses

In this section we verify (H1) and (H2) for Glauber dynamics and thereby prove
Theorem 17.2.

17.3.1 First hypothesis

Proof Let D_ℓ denote the set of configurations where the up-spins form a single ℓ × ℓ square anywhere in Λ. The energy of the configurations in D_ℓ equals (recall (17.1.1) and see Fig. 17.4)

\[ E(\ell) = H(D_\ell) - H(\boxminus) = J[4\ell] - h[\ell]^2, \tag{17.3.1} \]

which is maximal at ℓ = 2J/h and is negative for ℓ > 4J/h. Since Λ is chosen large enough so that it contains a 2ℓ_c × 2ℓ_c square, it follows that H(⊟) = H(0 × 0) > H(⊞). It is obvious from (17.2.1) that ⊞ is the global minimum of H, while ⊟ is a local minimum of H. Thus, to settle (H1) it remains to show that ⊟ has the unique maximal stability level on S\⊞.

Fig. 17.4 ℓ → E(ℓ) (compare with Fig. 1.1)

Let γ = (γ_0, . . . , γ_{|Λ|}) : ⊟ → ⊞ be any path that grows a droplet of up-spins by successively adding rows and bars to a quasi-square or square. We refer to this as the reference path. In Sect. 17.4 we will show that:
\[ \gamma \in (\boxminus\to\boxplus)_{\mathrm{opt}}, \qquad H(\gamma_k) = \min_{\sigma\in V_k} H(\sigma). \tag{17.3.2} \]

Let

\[ k^\star = \min\{k\in\mathbb{N}\colon H(\gamma_k) \le H(\boxminus)\} \ge 2 \tag{17.3.3} \]
be the first time the reference path, after it has left ⊟, hits an energy not exceeding that of ⊟.
For σ, σ′ ∈ S, let σ ∨ σ′ and σ ∧ σ′ denote the componentwise maximum, respectively, minimum of σ and σ′. An easy computation shows that, for all σ, σ′ ∈ S,

\[ |\partial(\sigma\vee\sigma')| + |\partial(\sigma\wedge\sigma')| \le |\partial(\sigma)| + |\partial(\sigma')|, \qquad |\sigma\vee\sigma'| + |\sigma\wedge\sigma'| = |\sigma| + |\sigma'|. \tag{17.3.4} \]

Pick any σ ∈ S \ [⊟ ∪ ⊞]. Then there exists at least one pair of neighbouring sites x and y in Λ such that σ(x) = −1 and σ(y) = +1. By translation invariance we may assume without loss of generality that the first two spins that are flipped up in γ are located at x and y, respectively. Then
\[ \sigma\wedge\gamma_1 = \boxminus, \qquad 1 \le |\sigma\wedge\gamma_k| < k \quad \forall\, k\ge 2. \tag{17.3.5} \]

In what follows we will consider the path σ ∨ γ_k for 0 ≤ k ≤ k^⋆. We have

\[ H(\sigma\vee\gamma_1) - H(\sigma) < H(\boxminus\vee\gamma_1) - H(\boxminus) = H(\gamma_1) - H(\boxminus), \tag{17.3.6} \]
where the inequality comes from the interaction between the up-spins at x and y.

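Moreover, for 2 ≤ k ≤ k^⋆ we can estimate

\[ \begin{aligned} H(\sigma\vee\gamma_k) - H(\sigma) &= J\bigl[|\partial(\sigma\vee\gamma_k)| - |\partial(\sigma)|\bigr] - h\bigl[|\sigma\vee\gamma_k| - |\sigma|\bigr] \\ &\le J\bigl[|\partial(\gamma_k)| - |\partial(\sigma\wedge\gamma_k)|\bigr] - h\bigl[|\gamma_k| - |\sigma\wedge\gamma_k|\bigr] \\ &= H(\gamma_k) - H(\sigma\wedge\gamma_k) \\ &< H(\gamma_k) - H(\boxminus), \end{aligned} \tag{17.3.7} \]

where we use (17.2.1) and (17.3.2)–(17.3.5). (Note that the second lines of (17.3.2) and (17.3.5) imply that H(σ ∧ γ_k) > H(⊟) for 2 ≤ k ≤ k^⋆.) By picking k = k^⋆ in (17.3.7), we get

\[ H(\sigma\vee\gamma_{k^\star}) - H(\sigma) < H(\gamma_{k^\star}) - H(\boxminus) \le 0. \tag{17.3.8} \]

Combining (17.3.6)–(17.3.8), we find that

\[ H(\sigma\vee\gamma_{k^\star}) < H(\sigma), \qquad \Phi(\sigma, \sigma\vee\gamma_{k^\star}) - H(\sigma) < 0 \vee \max_{1\le k\le k^\star} \bigl[H(\gamma_k) - H(\boxminus)\bigr] \le \Gamma, \tag{17.3.9} \]
where the second inequality uses that σ = σ ∨ γ_0 because γ_0 = ⊟ and the third inequality uses that γ ∈ (⊟ → ⊞)_opt. Because of Definition 16.1(c), what (17.3.9) says is that the stability level of σ is < Γ.
Since Φ(⊟, ⊞) − H(⊟) = Γ, it follows that ⊟ has the unique maximal stability level on S\⊞ (recall Lemma 16.7).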

17.3.2 Second hypothesis

Proof It is obvious from Definitions 17.1(b–c) and Theorem 17.3 that (H2) is satisfied. Indeed, each configuration in C(⊟, ⊞) = Q^{1pr} has exactly one configuration in P(⊟, ⊞) = Q from which it can be reached via an allowed move, namely, the configuration that is obtained from it by removing the single protuberance.

17.4 Structure of the communication level set

In this section we prove Theorems 17.3 and 17.5.

Proposition 17.6
(i) Φ(⊟, ⊞) = Γ.
(ii) S(⊟, ⊞) ⊇ Q^{1pr}.

Proof The proof is based on four lemmas (Lemmas 17.7–17.10 below).

(i) We prove that Φ(⊟, ⊞) ≤ Γ and Φ(⊟, ⊞) ≥ Γ.

• Φ(⊟, ⊞) ≤ Γ: All we need to do is to construct a path that connects ⊟ and ⊞ without exceeding energy Γ. The proof comes in three steps.

1. We first show that the configurations in Q are connected to ⊟ by a path that stays below Γ.

Lemma 17.7 For any σ ∈ Q there exists a γ : σ → ⊟ such that max_{ξ∈γ} H(ξ) < Γ.

Proof Fix σ ∈ Q. Note that, by (17.1.7), we have

\[ H(\sigma) = \Gamma - (2J - h). \tag{17.4.1} \]

First, we flip down a spin at a corner of the quasi-square, which increases the energy by h. Next, we repeat this operation another ℓ_c − 3 times, each time picking a spin from a corner on the same shortest side. To guarantee that we never reach energy Γ, we must have that
\[ h(\ell_c - 2) < 2J - h, \tag{17.4.2} \]
or
\[ \ell_c < \frac{2J}{h} + 1. \tag{17.4.3} \]
But this inequality holds by the definition of ℓ_c in (17.1.4) and the non-degeneracy hypothesis in (17.1.5). Finally, we flip down the last spin, which lowers the energy by 2J − h, so that we arrive at energy

\[ \Gamma - (2J - h) - \bigl[2J - h(\ell_c - 1)\bigr], \tag{17.4.4} \]

which is strictly smaller than (17.4.1) by (17.4.3). Thus, the removal of a row of length ℓ_c − 1 from the (ℓ_c − 1) × ℓ_c quasi-square in σ lowers the energy (see Fig. 17.5). We now have a square of side length ℓ_c − 1. It is obvious that we can remove further rows without encountering new conditions, until we reach ⊟.
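
The energy bookkeeping in this proof can be traced step by step. The sketch below records the energy relative to Γ along the removal of one row of length ℓ_c − 1, using only the increments stated in the proof (h per corner spin, then −(2J − h) for the last spin); the function name and the sample values J = 1, h = 3/5 are illustrative assumptions.

```python
import math
from fractions import Fraction


def row_removal_energies(J, h):
    """Energies minus Gamma along the removal of a row of length l_c - 1.

    Starts from the quasi-square at Gamma - (2J - h), see (17.4.1),
    and ends at the value in (17.4.4).
    """
    l_c = math.ceil(Fraction(2 * J) / Fraction(h))
    energy = -(2 * J - h)
    energies = [energy]
    for _ in range(l_c - 2):  # corner spins of the row: each flip costs h
        energy += h
        energies.append(energy)
    energy -= 2 * J - h  # last spin of the row
    energies.append(energy)
    return l_c, energies


l_c, energies = row_removal_energies(1, Fraction(3, 5))
assert max(energies) < 0             # the path stays strictly below Gamma
assert energies[-1] < energies[0]    # removing the row lowers the energy
print(l_c, energies)
```

Both assertions are the numerical counterparts of (17.4.2) and of the comparison between (17.4.4) and (17.4.1).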

2. We next show that the configurations in Q^{2pr} are connected to ⊞ by a path that stays below Γ.

Lemma 17.8 For any σ ∈ Q^{2pr} there exists a γ : σ → ⊞ such that max_{ξ∈γ} H(ξ) < Γ.

Proof Fix σ ∈ Q^{2pr}. Note that

\[ H(\sigma) = \Gamma - h. \tag{17.4.5} \]

First, we flip up a spin next to the 2-protuberance. This lowers the energy by h. We can repeat this operation another ℓ_c − 3 times until the row is filled. By that time we have a square of side length ℓ_c and energy

\[ \Gamma - h(\ell_c - 1). \tag{17.4.6} \]

Next, we flip up a spin to form a new 1-protuberance. This raises the energy by 2J − h. To make sure that we do not reach energy Γ, we must have

\[ h(\ell_c - 1) > 2J - h, \tag{17.4.7} \]

or
\[ \ell_c > \frac{2J}{h}, \tag{17.4.8} \]
which holds by the definition of ℓ_c and the non-degeneracy hypothesis in (17.1.5). We now have a square of side length ℓ_c with a 1-protuberance. By flipping up a spin next to this 1-protuberance, we get a 2-protuberance and reach energy

\[ \Gamma - h(\ell_c - 1) + (2J - h) - h, \tag{17.4.9} \]

which is strictly smaller than (17.4.5) by (17.4.8). Thus, the completion of a row of length ℓ_c with a 2-protuberance and the creation of a new 2-protuberance lowers the energy (see Fig. 17.5). It is obvious that we can complete further rows and create further 2-protuberances without encountering new conditions, until we reach ⊞.

Fig. 17.5 Cost of adding or removing a row of length ℓ

3. We can now conclude the proof of Φ(⊟, ⊞) ≤ Γ as follows. The desired path γ : ⊟ → ⊞ is realized by tracing the path in Lemma 17.7 in the reverse direction, from ⊟ to σ ∈ Q, then going from σ to σ′ ∈ Q^{1pr} by adding a 1-protuberance and from σ′ to σ″ ∈ Q^{2pr} by extending this 1-protuberance to a 2-protuberance, and finally following the path in Lemma 17.8 from σ″ to ⊞. This γ will be called the reference path for the magnetisation.

• Φ(⊟, ⊞) ≥ Γ: The proof comes in two more steps.

4. The first crucial ingredient in the proof is the following observation:

Lemma 17.9 Any γ ∈ (⊟ → ⊞)_opt must pass through Q.

Proof Any path γ : ⊟ → ⊞ must cross the set V_{ℓ_c(ℓ_c−1)}. As shown in Alonso and Cerf [4], Theorem 2.6, the following isoperimetric inequality holds as a consequence of (17.2.1): in V_{ℓ_c(ℓ_c−1)} the unique (modulo translations and rotations) configuration of minimal energy is the (ℓ_c − 1) × ℓ_c quasi-square, which has energy H(σ) = Γ − (2J − h). All other configurations in V_{ℓ_c(ℓ_c−1)} have energy at least Γ + h, and thus any path not hitting Q exceeds energy Γ.

5. The second crucial ingredient in the proof is the following observation:

Lemma 17.10 Any γ ∈ (⊟ → ⊞)_opt must pass through Q^{1pr}.

Proof Follow the path until it hits the set V_{ℓ_c(ℓ_c−1)}. According to Lemma 17.9, the configuration in this set must be an (ℓ_c − 1) × ℓ_c quasi-square. Since we need not consider any paths that return to the set V_{ℓ_c(ℓ_c−1)} afterwards, a first step beyond the quasi-square must be the creation of a 1-protuberance. This brings us to energy Γ. If the 1-protuberance is created on the side of length ℓ_c, then we have a configuration in Q^{1pr}. If, on the other hand, it is created on the side of length ℓ_c − 1, then completion of the row leads to an (ℓ_c − 1) × (ℓ_c + 1) rectangle with energy Γ − h(ℓ_c − 2). After that the creation of a 1-protuberance brings us to energy Γ − h(ℓ_c − 2) + (2J − h), which exceeds energy Γ because of (17.4.2). Since (ℓ_c − 1) × (ℓ_c + 1) + 1 = ℓ_c × ℓ_c, any other path that proceeds from the (ℓ_c − 1) × ℓ_c quasi-square with a 1-protuberance on the side of length ℓ_c − 1 to the set V_{ℓ_c^2} without returning to the set V_{ℓ_c(ℓ_c−1)} also exceeds energy Γ. Indeed, according to Alonso and Cerf [4], Theorem 2.6, the unique configuration with minimal energy in the set V_{ℓ_c^2} is the ℓ_c × ℓ_c square (modulo rotations and translations).

Lemmas 17.9–17.10 imply that Φ(⊟, ⊞) ≥ Γ, and together with Steps 1–3 complete the proof of Proposition 17.6(i).

(ii) Proposition 17.6(ii) follows from Lemma 17.10 because H(Q^{1pr}) = Γ.

The relations P(⊟, ⊞) = Q and C(⊟, ⊞) = Q^{1pr} and the formula for Γ claimed in Theorem 17.3 are an immediate consequence of Definition 16.3 and Lemmas 17.7–17.10.
The claim in Theorem 17.5 is immediate from Lemmas 17.7–17.8 in combination with Proposition 16.12, Lemma 16.15, (16.2.4) and (17.3.6)–(17.3.7).

17.5 Computation of the prefactor

In this section we prove Theorem 17.4.


Proof Our starting point is the variational formula for Θ = 1/K in Lemma 16.17. This variational problem simplifies considerably because of the following two facts that are specific to our Glauber dynamics (abbreviate C = C(⊟, ⊞)):
• S^⋆ \ [S_⊟ ∪ S_⊞] = C, i.e., there are no wells inside C.
• There are no allowed moves within C, i.e., critical droplets cannot transform into each other via single spin-flips.
Consequently, (16.3.11) reduces to
\[ \Theta = \min_{h\colon Q^{\mathrm{1pr}}\to[0,1]}\ \sum_{\sigma\in Q^{\mathrm{1pr}}} \Bigl\{ [1 - h(\sigma)]^2\, N^-(\sigma) + [0 - h(\sigma)]^2\, N^+(\sigma) \Bigr\} = \sum_{\sigma\in Q^{\mathrm{1pr}}} \frac{N^-(\sigma)\,N^+(\sigma)}{N^-(\sigma) + N^+(\sigma)}, \tag{17.5.1} \]

where

\[ N^-(\sigma) = \bigl|\{\sigma'\in Q\colon \sigma\sim\sigma'\}\bigr|, \qquad N^+(\sigma) = \bigl|\{\sigma'\in Q^{\mathrm{2pr}}\colon \sigma\sim\sigma'\}\bigr|, \tag{17.5.2} \]

is the number of configurations in Q, respectively, Q^{2pr} that can be reached from σ ∈ Q^{1pr} by a single spin-flip (use that Q ⊆ S_⊟ and Q^{2pr} ⊆ S_⊞). For all σ ∈ Q^{1pr} we have N^−(σ) = 1, N^+(σ) = 1 when the 1-protuberance in σ sits at a corner, and N^+(σ) = 2 when it does not. Hence

\[ \Theta = 2|\Lambda| \left[ 2(\ell_c - 2)\,\tfrac23 + 4\cdot\tfrac12 \right] = |\Lambda|\,\tfrac43\,(2\ell_c - 1), \tag{17.5.3} \]

where 2|Λ| counts the number of locations and rotations of the protocritical droplet.
Since K = 1/Θ, this completes the proof of Theorem 17.4.
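
The reduction in (17.5.1)–(17.5.3) can be verified with exact arithmetic. In the sketch below, each term of the sum is minimised separately (the minimiser of (1 − t)²N⁻ + t²N⁺ is t = N⁻/(N⁻ + N⁺)), and the sum over Q^{1pr} is organised by the position of the protuberance; the function names and the test values are illustrative assumptions, and ℓ_c ≥ 2 is assumed.

```python
from fractions import Fraction


def optimal_value(n_minus, n_plus):
    """Minimise (1 - t)^2 * n_minus + t^2 * n_plus over t in [0, 1]."""
    t = Fraction(n_minus, n_minus + n_plus)
    value = (1 - t) ** 2 * n_minus + t ** 2 * n_plus
    return t, value


def theta(l_c, volume):
    """Sum of N^- N^+ / (N^- + N^+) over all critical droplets, see (17.5.1)."""
    per_side = Fraction(0)
    for position in range(l_c):
        at_corner = position in (0, l_c - 1)
        n_plus = 1 if at_corner else 2
        per_side += optimal_value(1, n_plus)[1]
    sides = 2              # two sides of length l_c per quasi-square
    placements = 2 * volume  # locations and orientations of the quasi-square
    return placements * sides * per_side


assert theta(4, 400) == 400 * Fraction(4, 3) * (2 * 4 - 1)
print(theta(4, 400), 1 / theta(4, 400))
```

The second printed value is K = 1/Θ, which for ℓ_c = 4 and |Λ| = 400 equals 3/11200, in agreement with Theorem 17.4.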

17.6 Extension to three dimensions

In this section we briefly indicate how to extend the main definitions and results
from two to three dimensions. No proofs are given. See Sect. 17.7 for references.
Let Λ ⊂ ℤ^3 be a large cubic box, centred at the origin. The metastable parameter
range replacing (17.1.3) is
h ∈ (0, 3J ), (17.6.1)
and, similarly as in (17.1.5), we assume that

\[ \frac{2J}{h} \notin \mathbb{N}, \qquad \frac{4J}{h} \notin \mathbb{N}. \tag{17.6.2} \]
The analogue of Definitions 17.1(b–c) reads:

Fig. 17.6 An element of Q^{1pr} for ℓ_c = 10, m_c = 20 and δ_c = 0

Definition 17.11
(a) Let Q be the set of configurations where the up-spins form an (m_c − 1) × (m_c − δ_c) × m_c quasi-cube with, attached to one of its faces, an (ℓ_c − 1) × ℓ_c quasi-square, anywhere in Λ. Here, δ_c ∈ {0, 1} depends on the arithmetic properties of J and h, while
\[ \ell_c = \left\lceil \frac{2J}{h} \right\rceil, \qquad m_c = \left\lceil \frac{4J}{h} \right\rceil, \tag{17.6.3} \]
are the two-dimensional critical droplet size on a face, respectively, the three-dimensional critical droplet size, replacing (17.1.4). Note that m_c ∈ {2ℓ_c − 1, 2ℓ_c}.
(b) Let Q^{1pr} be the set of configurations obtained from Q by adding a single protuberance anywhere to one of the longest sides of the quasi-square (see Fig. 17.6).
(c) Let

\[ \begin{aligned} \Gamma = \Gamma(\boxminus,\boxplus) &= H\bigl(Q^{\mathrm{1pr}}\bigr) - H(\boxminus) \\ &= J\bigl[2m_c(m_c - \delta_c) + 2m_c(m_c - 1) + 2(m_c - \delta_c)(m_c - 1) + 4\ell_c\bigr] \\ &\quad - h\bigl[m_c(m_c - \delta_c)(m_c - 1) + \ell_c(\ell_c - 1) + 1\bigr]. \end{aligned} \tag{17.6.4} \]

Theorem 17.3 carries over: P(⊟, ⊞) = Q and C(⊟, ⊞) = Q^{1pr}. Also Theorem 17.2 carries over: the proof of (H1–H2) is the same as in Sects. 17.3.1–17.3.2. As to Theorem 17.4, the prefactor K can be computed explicitly, namely,
\[ K = K_{d=3} = \frac{K_{d=2}}{M_{d=3}} \tag{17.6.5} \]
with K_{d=2} the prefactor in two dimensions and M_{d=3} the number of quasi-cubes in three dimensions that are contained in a three-dimensional critical droplet. The rationale behind (17.6.5) is that a three-dimensional critical droplet is obtained by first growing a quasi-cube with the appropriate side lengths and then growing a two-dimensional critical droplet on one side of this quasi-cube.

17.7 Bibliographical notes

1. The results in this chapter are taken from Bovier and Manzo [39]. Cruder versions
of the main results in Chap. 16 for Glauber dynamics, derived with the help of the
pathwise approach to metastability, were obtained by Neves and Schonmann [193]
in two dimensions and by Ben Arous and Cerf [19] in three dimensions.

2. The formula for K claimed in [39] contains a small error. This is corrected in
Theorem 17.4. The argument in Sect. 17.3.1 first appeared in den Hollander, Nardi,
Olivieri and Scoppola [84].

3. It is possible to extend the analysis in Sect. 17.6 to arbitrary dimension. As shown in Neves [192], Γ can be computed in a recursive manner, based on the observation that a critical droplet in dimension d can be obtained by attaching a critical droplet in dimension d − 1 to the appropriate side of a quasi-hypercube with the appropriate side lengths. The main difficulty is to show that all the configurations in C can be obtained in this way, which remains open. For the computation of K the simple structure of C, exploited in Sect. 17.5 for the case of two dimensions, prevails in higher dimensions, provided we assume that 2J/h, . . . , (d − 1)J/h ∉ ℕ.
In particular, we have K = Kd = Kd−1 /Md with Md the number of quasi-cubes in
d dimensions that are contained in a d-dimensional critical droplet. For details and
relevant formulas, see [39].

4. Detailed results are known about the tube of typical trajectories, i.e., the set of paths within which the crossover from ⊟ to ⊞ takes place, also referred to as the nucleation pattern. The identification of this tube requires an analysis of the dynamics on shorter time scales, in particular, the typical time scales on which rows and columns are grown. This is the realm of the pathwise approach to metastability.
Such a refined analysis is also necessary to improve on the result in Theorem 17.5.
See Olivieri and Vares [198, Sects. 7.3–7.4], and Gaudillière, Olivieri and Scop-
pola [122].

5. An anisotropic version of Glauber dynamics, in which the Hamiltonian in (18.1.2) is modified by allowing for different pair potentials J_h > J_v > 0 in the horizontal and the vertical direction, was studied in Kotecký and Olivieri [155]. Surprisingly, despite the anisotropy the critical droplet still is a quasi-square with a single protuberance with side length ℓ_c = ⌈2J_v/h⌉ under the assumption that 0 < h < 2J_v ∧ 2(J_h − J_v). Only after this critical droplet has been grown does the nucleation proceed in an anisotropic manner, by first fully expanding in the horizontal direction and afterwards fully expanding in the vertical direction, after which the box is filled with plus-spins. For details, see [198, Sect. 7.7].

6. A version where a next-to-nearest-neighbour interaction with pair potential J̄ > 0 is added to the Hamiltonian in (18.1.2) was considered by Kotecký and Olivieri [156]. Here the critical droplet is expected to have an octagonal shape, with side lengths that depend on the values of J, J̄. However, the situation turns out to be different. Under the assumption that h < (1/7) J̄ < (1/70) J, initially the nucleation pattern follows a sequence of regular octagons whose sides are equal, up to length ℓ̄_c = ⌈2J̄/h⌉. After that the oblique sides remain of fixed length ℓ̄_c while the horizontal and the vertical sides continue to grow longer, until they reach length ℓ_c − 2(ℓ̄_c − 1) with ℓ_c = ⌈2J/h⌉, at which stage the critical droplet is reached. After that the horizontal and the vertical sides continue to grow longer until the box is filled with plus-spins. The analysis is rather delicate, because standard isoperimetric inequalities can no longer be used. For details, see [198, Sect. 7.9].

7. A staggered version of Glauber dynamics, in which the Hamiltonian in (17.1.1)
is modified by allowing for opposite magnetic fields h_even > 0 > h_odd at the even
and the odd numbered sites, was studied in Nardi and Olivieri [188]. Once again the
nucleation pattern is unusual. There are three regimes, corresponding to the three
equilibrium phases of the model (plus-phase, minus-phase and staggered phase).
See [198, Sect. 7.10].

8. A version of the Glauber dynamics with three spin-values called the Blume-Capel
model, namely, Υ = {−1, 0, +1} and ∼ allowing for single-site changes of the
spins, was considered by Manzo and Olivieri [172, 173]. There are three regimes,
corresponding to the three equilibrium phases of the model (plus-phase, zero-phase
and minus-phase). The nucleation pattern is fairly complex. See [198, Sect. 7.11].

9. Metastability for Ising spins subject to a PCA spin-flip dynamics (defined in
Sect. 16.4.2) was considered in a series of papers by Bigelis, Cirillo, Lebowitz and
Speer [27], Cirillo, Nardi and Polosa [61], Cirillo [59], Cirillo and Nardi [60], Cirillo,
Nardi and Spitoni [62–64], and Nardi and Spitoni [190]. The model studied
most closely is the one where Λ is a torus in Z² and Υ = {−1, +1} is the spin
space, as in the present chapter, but the Hamiltonian is

\[
H(\sigma) = -h \sum_{x\in\Lambda} \sigma(x) - \frac{1}{\beta} \sum_{x\in\Lambda} \ln\cosh\bigl[\beta\bigl(U_\sigma(x)+h\bigr)\bigr],
\qquad \sigma \in S = \Upsilon^\Lambda, \tag{17.7.1}
\]

where β, h > 0 and U_σ(x) = Σ_{y∈N} σ(x + y) with N = {z ∈ Λ : ‖z‖ ≤ 1} and ‖·‖
the lattice norm on Z², and the single-spin transition probabilities are

\[
p_{x,\sigma}(s) = \tfrac12 \bigl\{1 + s \tanh\bigl[\beta\bigl(U_\sigma(x)+h\bigr)\bigr]\bigr\},
\qquad x\in\Lambda,\ \sigma\in S,\ s\in\Upsilon. \tag{17.7.2}
\]

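The choice in (17.7.1)–(17.7.2) matches the reversibility condition in (16.4.6). Note
that H depends on β, but

\[
\lim_{\beta\to\infty} H(\sigma) = \bar H(\sigma)
= -h \sum_{x\in\Lambda} \sigma(x) - \sum_{x\in\Lambda} \bigl|U_\sigma(x) + h\bigr|. \tag{17.7.3}
\]

As shown in [190] (recall (16.4.7)),

\[
H(\sigma,\sigma') = \bar H(\sigma,\sigma') + \frac{|\Lambda| \ln 2}{\beta} \tag{17.7.4}
\]

with

\[
\bar H(\sigma,\sigma') \geq \bar H(\sigma) \vee \bar H(\sigma'). \tag{17.7.5}
\]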
Therefore, in the limit as β → ∞, the PCA behaves like a discrete-time Markov
chain driven by the Hamiltonian H̄, similar to the Metropolis dynamics. A full
identification of the triple (Γ⋆, C⋆, K) was achieved for the choice where m = ⊟ and
s = ⊞ in the metastable regime where 0 < h < 1 and β → ∞. The results are similar
to what we found in Sect. 17.1. However, the proofs are more difficult, because there
are many configurations in which the dynamics stays trapped for a long time, e.g.
when the plus-spins form a rectangle. There are also many pairs of configurations
between which the dynamics oscillates for a long time, e.g. when the plus-spins
form two alternating checkerboards in a rectangle. This complexity hampers the
geometric analysis of H in (16.4.7) that is needed to identify (Γ⋆, C⋆, K). The model
in which 0 is removed from N (“no self-interaction”) is even harder to analyse. It
turns out that on its way from ⊟ to ⊞ the PCA visits the two configurations where
the spins form a checkerboard in Λ (provided Λ has even side length). This does
not happen for the model where 0 is included in N.
Chapter 18
Kawasaki Dynamics

“All right,” said the Cat; and this time it vanished quite slowly,
beginning with the end of the tail, and ending with the grin,
which remained some time after the rest of it had gone.
(Lewis Carroll, Alice’s Adventures in Wonderland)

In this chapter we apply the results obtained in Chap. 16 to the lattice gas in two and
three dimensions subject to Kawasaki dynamics. Particles live in a finite box, hop
between nearest-neighbour sites, feel an attractive interaction when they sit next to
each other, and are created, respectively, annihilated at the boundary of the box in a
way that reflects the presence of an infinite gas reservoir. We are interested in how
the system nucleates, i.e., how the box fills up when it is initially empty. Our goal
will be to prove hypotheses (H1–H2) in Sect. 16.1.2, implying that Theorems 16.4–16.6
are valid. In two dimensions we will further identify (Γ⋆, C⋆) and obtain the
asymptotics of K in the limit as the size of the box tends to infinity. In three
dimensions we will also identify Γ⋆, but we will obtain only partial information on C⋆
and K.
Kawasaki differs from Glauber, treated in Chap. 17, in that it is a conservative
dynamics: particles are conserved in the interior of the box. Consequently, during
the growing and the shrinking of droplets, particles must travel between the droplet
and the boundary of the box, which causes several complications. Moreover, it turns
out that in the metastable regime particles move along the border of a droplet more
rapidly than they arrive from the boundary of the box. This leads to a shape of the
critical droplet that is more complicated than the one for Glauber dynamics. This
complexity needs to be handled in order to obtain information on C⋆ and K.

18.1 Introduction and main results

18.1.1 Model

Let Λ ⊂ Z² be a large square box, centered at the origin. Let

\[
\partial^-\Lambda = \bigl\{x\in\Lambda\colon\, \exists\, y\notin\Lambda\colon\, \|y-x\| = 1\bigr\} \tag{18.1.1}
\]

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_18

Fig. 18.1 A lattice-gas configuration

be the internal boundary of Λ, and put Λ⁻ = Λ \ ∂⁻Λ. With each site x ∈ Λ we
associate an occupation variable η(x) assuming the values 0 or 1, indicating the
absence or presence of a particle at x (see Fig. 18.1). A configuration is denoted by
η ∈ S = {0, 1}^Λ. Each configuration η ∈ S has an energy given by the Hamiltonian

\[
H(\eta) = -U \sum_{\{x,y\}\in(\Lambda^-)^*} \eta(x)\eta(y) + \Delta \sum_{x\in\Lambda} \eta(x), \tag{18.1.2}
\]

where

\[
(\Lambda^-)^* = \bigl\{\{x,y\}\colon\, x,y\in\Lambda^-,\ \|x-y\| = 1\bigr\} \tag{18.1.3}
\]

is the set of non-oriented nearest-neighbour bonds in Λ− . The interaction consists

of a binding energy −U < 0 for each neighbouring pair of particles in Λ− and an
activation energy Δ > 0 for each particle in Λ. Note that particles in ∂ − Λ do not
interact with particles anywhere in Λ.
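
As a concrete illustration of (18.1.2)–(18.1.3), the following minimal Python sketch evaluates H(η) for a configuration given as a set of occupied sites. The coordinates (a box {0, ..., L−1}² rather than one centred at the origin), the function names and the numerical values of U and Δ are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def interior(L):
    """Sites of the interior box for Lambda = {0, ..., L-1}^2 (illustrative coordinates)."""
    return {(i, j) for i in range(1, L - 1) for j in range(1, L - 1)}

def hamiltonian(eta, L, U, Delta):
    """Evaluate (18.1.2): -U per occupied bond inside the interior, +Delta per particle."""
    occupied_inner = eta & interior(L)
    bonds = 0
    for (i, j) in occupied_inner:
        # Look only right and up, so each non-oriented bond is counted once.
        for neighbour in ((i + 1, j), (i, j + 1)):
            if neighbour in occupied_inner:
                bonds += 1
    return -U * bonds + Delta * len(eta)

U, Delta = Fraction(1), Fraction(8, 5)   # hypothetical values with Delta in (U, 2U)
L = 8
pair_inside = {(3, 3), (3, 4)}           # neighbouring particles in the interior
pair_at_boundary = {(0, 3), (1, 3)}      # one particle sits in the boundary layer
print(hamiltonian(pair_inside, L, U, Delta))       # equals -U + 2*Delta
print(hamiltonian(pair_at_boundary, L, U, Delta))  # equals 2*Delta (no bond counted)
```

The second example reflects the remark above that particles in ∂⁻Λ do not interact: the pair touches, but no binding energy is gained.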
The Hamiltonian in (18.1.2) models a lattice gas in Λ. We are interested in
Kawasaki dynamics on Λ with an open boundary. This is the Metropolis dynamics
with respect to H at inverse temperature β defined in (16.1.2) with two types of
allowed moves: (1) particle hop: 0 and 1 interchange at a pair of neighbouring sites
in Λ; (2) particle creation or annihilation: 0 changes to 1 or 1 changes to 0 at a sin-
gle site in ∂ − Λ. Clearly, this dynamics is a finite-state Markov process, and hence
fits into the general theory described in Chap. 16.
Kawasaki dynamics models the behaviour inside Λ of a lattice gas in Z2 , con-
sisting of particles subject to random hopping with hard core repulsion inside Λ,
neighbouring attraction inside Λ− , and creation and annihilation in ∂ − Λ. We may
think of Z² \ Λ as an infinite reservoir that keeps the particle density inside Λ fixed
at e^{−βΔ}. In our model this reservoir is replaced by an open boundary ∂⁻Λ. Note
that a move of particles inside ∂ − Λ does not involve a change of energy because the
interaction acts only inside Λ− .
The Gibbs measure μβ defined in (16.1.1) is the equilibrium of the dynamics
with transition rates cβ defined in (16.1.2) and satisfies the reversibility property in
(16.1.3).

18.1.2 Metastable regime and critical droplet size

Throughout the sequel we assume that

Δ ∈ (U, 2U ). (18.1.4)

We will see that this parameter range corresponds to metastable behaviour in the
limit as β → ∞ (see Fig. 18.3 below). A key role will be played by what we call
the critical droplet size:
\[
\ell_c = \left\lceil \frac{U}{2U-\Delta} \right\rceil. \tag{18.1.5}
\]
For reasons that will become clear later on, we will assume that
\[
\frac{U}{2U-\Delta} \notin \mathbb{N}. \tag{18.1.6}
\]
Thus, an (ℓ_c − 1) × (ℓ_c − 1) droplet will be “subcritical” while an ℓ_c × ℓ_c droplet
will be “supercritical”. Moreover, we will assume that Λ is large enough so that Λ⁻
contains a 2ℓ_c × 2ℓ_c square.
Analogous assumptions are needed in three dimensions (see Sect. 18.6).
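
To make (18.1.4)–(18.1.6) concrete, here is a small Python sketch that computes the critical droplet size with exact rational arithmetic and rejects parameters outside the metastable regime or violating the non-degeneracy assumption. The function name and the sample parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import ceil

def critical_droplet_size(U, Delta):
    """Return l_c = ceil(U / (2U - Delta)), enforcing (18.1.4) and (18.1.6)."""
    U, Delta = Fraction(U), Fraction(Delta)
    if not (U < Delta < 2 * U):
        raise ValueError("need Delta in (U, 2U), see (18.1.4)")
    ratio = U / (2 * U - Delta)
    if ratio.denominator == 1:
        # U / (2U - Delta) is an integer: excluded by (18.1.6).
        raise ValueError("degenerate case: U / (2U - Delta) is an integer")
    return ceil(ratio)

# Hypothetical parameter choices (U = 1 throughout).
print(critical_droplet_size(1, Fraction(8, 5)))    # ratio 5/2
print(critical_droplet_size(1, Fraction(17, 10)))  # ratio 10/3
```

Exact fractions are used so that the integrality test in (18.1.6) is not blurred by floating-point rounding.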

18.1.3 Main theorems

Each configuration can be decomposed into maximally connected components,
called clusters. A free particle is a particle in Λ not interacting with other particles.

Definition 18.1
(a) Let

\[
\begin{aligned}
\square &= \{\eta \in S\colon\, \eta(x) = 0\ \forall\, x\in\Lambda\},\\
\blacksquare &= \{\eta \in S\colon\, \eta(x) = 1\ \forall\, x\in\Lambda^-,\ \eta(x) = 0\ \forall\, x\in\partial^-\Lambda\},
\end{aligned} \tag{18.1.7}
\]

denote the configurations where Λ is empty, respectively, Λ⁻ is full and ∂⁻Λ
is empty.
(b) Let
\[
\mathcal{D} = \bar{\mathcal{D}} \cup \tilde{\mathcal{D}}, \tag{18.1.8}
\]
where
– D̄ is the set of configurations with a single cluster anywhere in Λ⁻ consisting
of an (ℓ_c − 2) × (ℓ_c − 2) square with four bars of lengths k̄_i, i = 1, 2, 3, 4,
attached to its four sides satisfying

\[
1 \le \bar k_i \le \ell_c - 1, \qquad \sum_i \bar k_i = 3\ell_c - 3. \tag{18.1.9}
\]

– D̃ is the set of configurations with a single cluster anywhere in Λ⁻ consisting
of an (ℓ_c − 3) × (ℓ_c − 1) rectangle with four bars of lengths k̃_i, i = 1, 2, 3, 4,
attached to its four sides satisfying

\[
1 \le \tilde k_i \le \ell_c - 1, \qquad \sum_i \tilde k_i = 3\ell_c - 2. \tag{18.1.10}
\]

Fig. 18.2 A configuration in D̄ with an (ℓ_c − 2) × (ℓ_c − 2) square in the center and four bars
attached to it. A similar picture applies for D̃ with an (ℓ_c − 3) × (ℓ_c − 1) rectangle in the center

In the definition of D̄, the four bars may be placed anywhere in the ring around
the square, i.e., anywhere in the union of the two rows and the two columns forming
the outer layer of the square (see Fig. 18.2). A total of 3ℓ_c − 3 particles must be
accommodated in this ring in such a way that each side of the ring, i.e., each row or
column, contains precisely one bar. A bar may include a corner of the ring provided
the neighbouring bar also includes this corner. Similarly for D̃.
In Sect. 18.1.4, item 2, we will see that the configurations in D arise from each
other via motion of particles along the border of the droplet, a phenomenon that is
specific to Kawasaki dynamics.
The main metastability theorems for Kawasaki dynamics are the following. Re-
call Definition 16.3.

Theorem 18.2 The pair (□, ■) satisfies hypotheses (H1–H2) in Sect. 16.1.2, and
hence Theorems 16.4–16.6 hold.

Theorem 18.3 The pair (□, ■) has protocritical set P⋆(□, ■) = D, critical set
C⋆(□, ■) = D^fp, and communication height

\[
\begin{aligned}
\Gamma^\star = \Gamma^\star(\square,\blacksquare)
&= H(\mathcal{D}^{\mathrm{fp}}) - H(\square) = H(\mathcal{D}) + \Delta\\
&= -U\bigl[(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1\bigr] + \Delta\bigl[\ell_c(\ell_c-1) + 2\bigr]\\
&= 2U[\ell_c+1] - (2U-\Delta)\bigl[\ell_c(\ell_c-1) + 2\bigr].
\end{aligned} \tag{18.1.11}
\]
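
The last equality in (18.1.11) can be checked by direct expansion. The following worked step is a sketch added for verification; it uses only the two bracketed expressions appearing in the theorem.

```latex
\begin{aligned}
-U\bigl[(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1\bigr]
  &= -U\bigl[2\ell_c^2 - 4\ell_c + 2\bigr] \\
  &= 2U[\ell_c+1] - 2U\bigl[\ell_c(\ell_c-1) + 2\bigr], \\[4pt]
\text{hence}\quad
-U\bigl[(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1\bigr] + \Delta\bigl[\ell_c(\ell_c-1)+2\bigr]
  &= 2U[\ell_c+1] - (2U-\Delta)\bigl[\ell_c(\ell_c-1)+2\bigr].
\end{aligned}
```

For instance, setting ℓ_c = 2 in either expression gives −2U + 4Δ.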

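Theorem 18.4 For large Λ the prefactor K = K(Λ) scales like

\[
\lim_{\Lambda\to\mathbb{Z}^2} \frac{|\Lambda|}{\ln|\Lambda|}\, K(\Lambda) = \frac{1}{4\pi N(\ell_c)} \tag{18.1.12}
\]
with

\[
N(\ell_c) = \sum_{k=1,2,3,4} \binom{4}{k} \left[ \binom{\ell_c+k-2}{2k-1} + 2\binom{\ell_c+k-3}{2k-1} \right] \tag{18.1.13}
\]

the cardinality of D modulo shifts.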

Remark The asymptotics in (18.1.12) does not depend on the shape of Λ, e.g. it
would be the same if Λ were a large circle rather than a large square.
In addition, we have the following geometric description of the configurations in
the valleys S_□, S_■ around □, ■, defined in (16.2.25). Let

\[
\begin{aligned}
\mathcal{V}_{\le \mathcal{D}} &= \{\eta \in S\colon\, \eta \le \eta' \text{ for some } \eta' \in \mathcal{D}\},\\
\mathcal{V}_{\ge \mathcal{C}^G} &= \{\eta \in S\colon\, \eta \ge \eta' \text{ for some } \eta' \in \mathcal{C}^G\},
\end{aligned} \tag{18.1.14}
\]

where C^G is the set of configurations obtained from C⋆ = D^fp by moving the free
particle from ∂⁻Λ to the cluster and attaching it at a “good” site in the outer layer
of the cluster (i.e., next to two other particles; see Fig. 18.10 below).

Theorem 18.5 S_□ ⊇ V_{≤D}, S_■ ⊇ V_{≥C^G}.

18.1.4 Discussion

1. The proof of Theorem 18.2 is given in Sect. 18.3. (H2) is easy to check, (H1) is
more involved and relies on certain isoperimetric inequalities.

2. The heuristics behind Theorem 18.3 is as follows. In Sect. 18.4 we will see that
D^fp ⊆ S(□, ■), the communication level set of the pair (□, ■). We will see that
the dynamics passes through S(□, ■) in four steps: (1) first it creates a “canonical
protocritical droplet”, namely, a configuration in D with the property that three bars
have full length and one bar consists of a single protuberance; (2) next it allows
particles to “move along the border of the droplet”, thereby forming all the other
“protocritical droplets” in D; (3) after that it brings in a free particle, thereby
forming a “critical droplet”; (4) finally it attaches this free particle to the boundary of the
protocritical droplet. After these four steps are completed, the dynamics is “over the
hill” and proceeds downwards in energy to fill up the box. This also explains where
Theorem 18.5 comes from.

Note: If the free particle attaches itself at a “bad site” in the outer layer of the proto-
critical droplet (i.e., next to one other particle), then either it may again detach itself
or it may cause a motion of particles along the border of the droplet, after which
another particle may detach itself, possibly leaving behind a different protocritical
droplet. However, since for large Λ a free particle has a small probability to escape
from the protocritical droplet and return to ∂ − Λ, it must eventually attach itself at a
“good site”. See Sect. 18.4.4 for more details.

3. The heuristics behind Theorem 18.4 is as follows. The average time it takes for
the dynamics to enter C⋆(□, ■) = D^fp when starting from □ is

\[
\frac{1}{|\mathcal{D}|}\,\frac{1}{|\partial^-\Lambda|}\; e^{\beta\Gamma^\star}\, [1 + o(1)], \qquad \beta\to\infty, \tag{18.1.15}
\]

where |D| counts the number of protocritical droplets and |∂⁻Λ| counts the number
of locations where the free particle can be created. Let π(Λ, ℓ_c) be the probability
that the free particle moves from ∂⁻Λ to the protocritical droplet and attaches itself
at a good site, i.e., the probability that the dynamics after it enters C⋆(□, ■) moves
onwards to ■ rather than returns to □. Then
\[
\frac{1}{\pi(\Lambda,\ell_c)}\, [1 + o(1)], \qquad \beta\to\infty, \tag{18.1.16}
\]

is the average number of times a free particle just created in ∂⁻Λ attempts to move
to the protocritical droplet and attach itself at a good site before it finally manages
to do so. The average nucleation time is the product of (18.1.15) and (18.1.16), and
so we conclude that
\[
K = \frac{1}{|\mathcal{D}|\, |\partial^-\Lambda|\, \pi(\Lambda,\ell_c)}. \tag{18.1.17}
\]
To compute |D|, note that

\[
|\mathcal{D}| = [1 + o(1)]\, |\Lambda|\, N(\ell_c), \qquad \Lambda\to\mathbb{Z}^2. \tag{18.1.18}
\]

To compute π(Λ, ℓ_c), note that

\[
|\partial^-\Lambda|\, \pi(\Lambda,\ell_c) = [1 + o(1)]\, \frac{4\pi}{\ln|\Lambda|}, \qquad \Lambda\to\mathbb{Z}^2. \tag{18.1.19}
\]

Indeed, as we will see in Sect. 18.5, the right-hand side of (18.1.19) is the probability
for large Λ that a particle detaching itself from the protocritical droplet reaches
∂⁻Λ before re-attaching itself. Due to the recurrence of simple random walk in
two dimensions, for large Λ this probability is independent of the shape and the
location of the protocritical droplet, as long as it is far from ∂⁻Λ. By reversibility,
the reverse motion has the same probability, which explains (18.1.19). Combine
(18.1.17)–(18.1.19) to get (18.1.12).
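
Spelled out, the combination of (18.1.17)–(18.1.19) is the following one-line computation. It is a sketch of the bookkeeping only; the error terms are the o(1) factors from (18.1.18) and (18.1.19).

```latex
K(\Lambda)
= \frac{1}{|\mathcal{D}|\,\bigl(|\partial^-\Lambda|\,\pi(\Lambda,\ell_c)\bigr)}
= \frac{1}{[1+o(1)]\,|\Lambda|\,N(\ell_c)} \cdot \frac{\ln|\Lambda|}{[1+o(1)]\,4\pi}
\quad\Longrightarrow\quad
\lim_{\Lambda\to\mathbb{Z}^2} \frac{|\Lambda|}{\ln|\Lambda|}\,K(\Lambda) = \frac{1}{4\pi N(\ell_c)} .
```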

4. In the limit of weak supersaturation, Δ ↑ 2U, we have ℓ_c → ∞. In this limit, the
formula in (18.1.13) gives N(ℓ_c) ∼ ℓ_c^7/2520.

5. In Sect. 18.5 we will derive a representation for the prefactor K in terms of
certain capacities associated with two-dimensional simple random walk on Λ in the
presence of a protocritical droplet. We will see that this representation is non-trivial
because of the presence of so-called “good sites” and “bad sites” on the border of
the protocritical droplet. Consequently, no easily computable expression is available
for K for finite Λ, only bounds. Theorem 18.4 shows that these bounds merge in
the limit as Λ → Z².

Outline The outline of the remainder of this chapter is as follows. In Sect. 18.2
we introduce some key geometric definitions that are needed for the proof of Theo-
rems 18.2–18.5. These theorems are proved in Sects. 18.3–18.5. Section 18.6 looks
at the extension from two to three dimensions.

Throughout the sequel we assume that ℓ_c ≥ 3. The case ℓ_c = 2 is trivial:
P⋆(□, ■) = D is the set of configurations consisting of three particles forming a
cluster anywhere in Λ⁻, C⋆(□, ■) = D^fp is the set of configurations obtained from
these by adding a free particle anywhere in ∂⁻Λ, and Γ⋆(□, ■) = −2U + 4Δ.

18.2 Geometric definitions

In order to prove Theorems 18.2–18.4, we need some further definitions.

1. Throughout the sequel, we identify a configuration η ∈ S with its support
supp(η) = {x ∈ Λ : η(x) = 1}, and write x ∈ η to indicate that η has a particle at x.
Free particles, 1-protuberances and corners are defined as follows:
– For x ∈ Λ⁻, let N(x) = {y ∈ Λ⁻ : |y − x| = 1} be the set of nearest-neighbour
sites of x in Λ⁻.
– A free particle in η ∈ S is a site x ∈ η ∩ ∂⁻Λ or a site x ∈ η ∩ Λ⁻ such that
Σ_{y∈N(x)} η(y) = 0, i.e., a particle not in interaction with any other particle
(remember from (18.1.2) that particles in the interior boundary ∂⁻Λ have no
interaction with particles in the interior Λ⁻).
– A 1-protuberance in η ∈ S is a site x ∈ η ∩ Λ⁻ such that Σ_{y∈N(x)} η(y) = 1.
– A corner in η ∈ S is a site x ∈ Λ⁻ such that Σ_{y∈N(x)} η(y) ≥ 2. A corner in η
can be either occupied or vacant.

2. Given a configuration η ∈ S, consider the set C(η) ⊆ R2 defined as the union of

the closed unit squares centred at the sites inside Λ− where η has a particle. The
maximal connected components C1 , . . . , Cm , m ∈ N, of C(η) are called clusters of η
(two unit squares touching only at the corners are not connected). There is a one-to-
one correspondence between configurations η ⊆ Λ− and sets C(η). A configuration
η ∈ S is characterised by a set C(η), depending only on η ∩ Λ− , plus possibly a
set of particles in ∂ − Λ, namely, η ∩ ∂ − Λ. Thus, we are actually identifying two
different objects: a configuration η ∈ S and the pair (C(η), η ∩ ∂ − Λ).

3. For η ∈ S, let |η| be the number of particles in η, ∂(η) the Euclidean boundary
of C(η), called the contour of η, and |∂(η)| the length of ∂(η). Then the energy
associated with η is given by
\[
H(\eta) = \frac{U}{2}\,|\partial(\eta)| - (2U-\Delta)\,|\eta\cap\Lambda^-| + \Delta\,|\eta\cap\partial^-\Lambda|. \tag{18.2.1}
\]
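
The contour representation (18.2.1) can be checked against the bond representation (18.1.2) numerically. The Python sketch below does this for one example, using the elementary fact that each particle contributes four unit edges to the contour and each nearest-neighbour bond hides two of them. The function names, the example droplet and the parameter values are illustrative assumptions.

```python
from fractions import Fraction

def bonds(sites):
    """Number of non-oriented nearest-neighbour bonds between occupied sites."""
    return sum(((i + 1, j) in sites) + ((i, j + 1) in sites) for (i, j) in sites)

def contour_length(sites):
    """Boundary length of the union of closed unit squares centred at the sites."""
    # Each particle contributes 4 unit edges; each bond hides 2 of them.
    return 4 * len(sites) - 2 * bonds(sites)

def energy_from_bonds(inner, boundary, U, Delta):
    """(18.1.2): inner = particles in the interior, boundary = particles in the boundary layer."""
    return -U * bonds(inner) + Delta * (len(inner) + len(boundary))

def energy_from_contour(inner, boundary, U, Delta):
    """(18.2.1): contour term, volume term and boundary-layer term."""
    return (Fraction(U) / 2 * contour_length(inner)
            - (2 * U - Delta) * len(inner) + Delta * len(boundary))

U, Delta = Fraction(1), Fraction(8, 5)                     # hypothetical values
droplet = {(i, j) for i in range(3) for j in range(2)} | {(1, 2)}  # 3 x 2 rectangle + protuberance
free = {(10, 10)}                                          # one particle assumed to lie in the boundary layer
print(energy_from_bonds(droplet, free, U, Delta) == energy_from_contour(droplet, free, U, Delta))
```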
4. To describe the shape of clusters, we need the following:
– An ℓ1 × ℓ2 rectangle is a union of closed unit squares centred at the sites
inside Λ⁻ with side lengths ℓ1, ℓ2 ≥ 1. We use the convention ℓ1 ≤ ℓ2 and collect
rectangles in equivalence classes modulo translations and rotations.
– A bar is a 1 × k rectangle with k ≥ 1. A bar is called a row or a column if it fills
a side of a rectangle.
– A corner of a rectangle is an intersection of two bars attached to the rectangle.
– A quasi-square is an ℓ × (ℓ + δ) rectangle with ℓ ≥ 1 and δ ∈ {0, 1}. A square is
a quasi-square with δ = 0.
– If η is a configuration with a single contour, then we denote by CR(η) the
rectangle circumscribing η, i.e., the smallest rectangle containing η. We write

\[
\begin{aligned}
\partial^- \mathrm{CR}(\eta) &= \{x \in \mathrm{CR}(\eta)\colon\, \exists\, y \notin \mathrm{CR}(\eta)\colon\, \|y-x\| = 1\},\\
\partial^+ \mathrm{CR}(\eta) &= \{x \notin \mathrm{CR}(\eta)\colon\, \exists\, y \in \mathrm{CR}(\eta)\colon\, \|y-x\| = 1\},
\end{aligned} \tag{18.2.2}
\]

to denote the interior, respectively, external boundary of CR(η), and put

\[
\begin{aligned}
\mathrm{CR}^-(\eta) &= \mathrm{CR}(\eta) \setminus \partial^- \mathrm{CR}(\eta),\\
\mathrm{CR}^+(\eta) &= \mathrm{CR}(\eta) \cup \partial^+ \mathrm{CR}(\eta).
\end{aligned} \tag{18.2.3}
\]

Note that here we identify particles with unit squares.

– Given η such that η ⊇ CR⁻(η), we say that it is possible to move a particle from
row r_α(η) ⊆ ∂⁻CR(η) to row r_α′(η) ⊆ ∂⁻CR(η) via corner c_αα′(η) ∈ ∂⁻CR(η) if
(see Figs. 18.5–18.6 in Sect. 18.4.1)

\[
|c_{\alpha\alpha'}(\eta) \cap \eta| = 0, \qquad |r_\alpha(\eta) \cap \eta| \ge 1, \qquad
1 \le |r_{\alpha'}(\eta) \cap \eta| \le |r_{\alpha'}(\eta)|, \tag{18.2.4}
\]

where αα′ ∈ {ne, nw, se, sw} with n = north, s = south, etc. By convention, corners
are not part of rows. If equality holds in the last inequality, then we need to
place the bar in the row opposite to r_α(η), say r_α″(η), a distance 1 away from
c_α′α″(η) in order to be able to accommodate the shift of a bar in r_α′(η) that is
necessary to accommodate the particle that moves around the corner.

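5. For η, η′ ∈ S, a path γ : η → η′ of allowed moves is called a U-path if

\[
\begin{aligned}
&\text{(i)}\quad H(\eta) = H(\eta'),\\
&\text{(ii)}\quad \max_i H(\gamma_i) \le H(\eta) + U,\\
&\text{(iii)}\quad |\gamma_i| = |\eta| \text{ for all } i.
\end{aligned} \tag{18.2.5}
\]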
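
The three conditions in (18.2.5) translate directly into a predicate on the energies and particle numbers along a path. The Python sketch below is a hypothetical helper for illustration; it does not check that consecutive configurations differ by an allowed move, which would require the full configurations.

```python
def is_U_path(energies, sizes, U):
    """Check conditions (i)-(iii) of (18.2.5) for a path gamma_0, ..., gamma_n.

    energies[i] = H(gamma_i), sizes[i] = |gamma_i| (number of particles).
    Whether each step is an allowed move is NOT checked here.
    """
    same_endpoint_energy = energies[0] == energies[-1]      # (i)
    below_barrier = max(energies) <= energies[0] + U        # (ii)
    particles_conserved = all(n == sizes[0] for n in sizes) # (iii)
    return same_endpoint_energy and below_barrier and particles_conserved

E, U = 10, 1   # hypothetical energy level and binding energy
# Detach a 1-protuberance (cost U), move it at zero cost, re-attach it (gain U).
print(is_U_path([E, E + U, E + U, E], [7, 7, 7, 7], U))   # True
# A step that costs 2U violates condition (ii).
print(is_U_path([E, E + 2 * U, E], [7, 7, 7], U))         # False
```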
6. The configuration space S can be partitioned as

\[
S = \bigcup_{n=0}^{|\Lambda|} \mathcal{V}_n, \tag{18.2.6}
\]

where

\[
\mathcal{V}_n = \{\eta \in S\colon\, |\eta| = n\} \tag{18.2.7}
\]
is the set of configurations with n particles.

18.3 Verification of the two hypotheses

18.3.1 First hypothesis

Proof Let D_ℓ denote the set of configurations where the particles form a single
ℓ × ℓ square anywhere inside Λ⁻. The energy E(ℓ) of the configurations in D_ℓ
equals (recall (18.1.2) and see Fig. 18.3)

\[
E(\ell) = H(\mathcal{D}_\ell) - H(\square) = -U\, 2\ell(\ell-1) + \Delta \ell^2 = 2U\ell - (2U-\Delta)\ell^2, \tag{18.3.1}
\]

which is maximal at ℓ = U/(2U − Δ) and is negative for ℓ > 2U/(2U − Δ). Since
Λ is chosen large enough so that Λ⁻ contains a 2ℓ_c × 2ℓ_c square, it follows that
H(□) = H(0 × 0) > H(■). It is obvious from (18.2.1) that ■ is the global
minimum of H, while □ is a local minimum of H. Thus, to settle (H1) it remains to
show that □ has the unique maximal stability level on S \ ■.
We can repeat the argument for Glauber dynamics in Sect. 17.3.1 by thinking
of up-spins as particles and down-spins as vacancies. The additional obstacle under
Kawasaki dynamics is that, when we are growing the configuration by considering
the union of η with the droplets in the reference path, particles cannot be created
where needed but have to arrive from ∂ − Λ. We have to make sure that at any stage
the configuration is such that a particle coming from ∂ − Λ can be moved to where
it is needed. This requires a technical construction with “pistons enclosing η”, for
which we refer to the literature (see the reference in Sect. 18.7).
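
The shape of the energy profile ℓ ↦ E(ℓ) in (18.3.1) can be tabulated directly. The Python sketch below evaluates E(ℓ) for square droplets and checks, for one hypothetical choice of U and Δ, the properties used in the proof: the two forms of (18.3.1) agree, the integer maximiser is one of the two integers bracketing U/(2U − Δ), and E(ℓ) is negative for ℓ > 2U/(2U − Δ).

```python
from fractions import Fraction
from math import ceil

def square_energy(ell, U, Delta):
    """E(l) = -U * 2l(l-1) + Delta * l^2: energy of an l x l square droplet, cf. (18.3.1)."""
    return -U * 2 * ell * (ell - 1) + Delta * ell * ell

U, Delta = Fraction(1), Fraction(17, 10)   # hypothetical values with Delta in (U, 2U)
ell_c = ceil(U / (2 * U - Delta))          # critical droplet size (18.1.5)
energies = [square_energy(l, U, Delta) for l in range(2 * ell_c + 1)]
argmax = max(range(len(energies)), key=lambda l: energies[l])

for l, e in enumerate(energies):
    print(l, e)
print("l_c =", ell_c, "integer maximiser =", argmax)
```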

Fig. 18.3 ℓ ↦ E(ℓ) (compare with Fig. 1.1)

18.3.2 Second hypothesis

Proof It is obvious from Definitions 18.1(b–c) and Theorem 18.3 that (H2) is
satisfied. Indeed, each configuration in C⋆(□, ■) = D^fp has exactly one configuration
in P⋆(□, ■) = D from which it can be reached via an allowed move, namely, the
configuration that is obtained from it by removing the free particle in ∂⁻Λ.

18.4 Structure of the communication level set

In this section we prove Theorems 18.3 and 18.5. In Sect. 18.4.1 we consider the set
Q consisting of those configurations in D where the single cluster is an (ℓ_c − 1) × ℓ_c
quasi-square with a protuberance attached to one of its sides. We show that D, our
target protocritical set, coincides with Q^U, the set of all configurations that can be
obtained from Q via a U-path. In Sect. 18.4.2 we use the identity D = Q^U to show
that Φ(□, ■) = Γ⋆ with Γ⋆ given by (18.1.11) and S(□, ■) ⊇ D^fp. In Sect. 18.4.3
we combine the results obtained in Sect. 18.4.2 to show that P⋆(□, ■) = D and
C⋆(□, ■) = D^fp, thereby completing the proof of Theorem 18.3. In Sect. 18.4.4 we
take a closer look at what happens when the free particle in D^fp attaches itself to
the single cluster, where we distinguish between “good sites” and “bad sites” on the
border of the single cluster. The latter distinction will be needed in Sect. 18.5 for
the proof of Theorem 18.4. In Sect. 18.4.5, finally, we compute the cardinality of D
modulo shifts, which will also be needed in Sect. 18.5 for the proof of Theorem 18.4.

18.4.1 Canonical protocritical droplets

The following definition formalises the notion of canonical protocritical droplet and
protocritical droplet mentioned in Item 2 of Sect. 18.1.4.

Fig. 18.4 A canonical critical droplet: an element of Q^fp ⊆ D^fp

Definition 18.6
(a) Let Q ⊆ D be the set of configurations consisting of an (ℓ_c − 1) × ℓ_c
quasi-square anywhere in Λ⁻ with a protuberance attached to one of its sides (see
Fig. 18.4). These configurations are called canonical protocritical droplets.
(b) Let Q^U be the set of configurations that can be reached from some configuration
in Q via a U-path, i.e.,

\[
\mathcal{Q}^U = \bigl\{\eta \in \mathcal{V}_{n_c}\colon\, \exists\, \eta' \in \mathcal{Q}\colon\,
H(\eta) = H(\eta'),\ \Phi_{\mathcal{V}_{n_c}}(\eta,\eta') \le H(\eta) + U\bigr\}, \tag{18.4.1}
\]
where n_c = ℓ_c(ℓ_c − 1) + 1 is the volume of the clusters in Q and Φ_{V_{n_c}} is the
communication height within V_{n_c}. These configurations are called protocritical
droplets.

Note that Q = Q̄ ∪ Q̃, where

– Q̄ are those configurations where the single particle is attached to one of the
longest sides of the (ℓ_c − 1) × ℓ_c quasi-square.
– Q̃ are those configurations where the single particle is attached to one of the
shortest sides of the (ℓ_c − 1) × ℓ_c quasi-square.
Thus, Q̄ consists of precisely those configurations in D̄ where in (18.1.9) one k̄_i
equals 1 and the others are maximal. Similarly, Q̃ consists of precisely those
configurations in D̃ where in (18.1.10) one k̃_i equals 1 and the others are maximal.
We will see in Sect. 18.4 that the configurations in D̄, D̃ arise from those in Q̄, Q̃
via a motion of particles along the border of the droplet (see Figs. 18.5–18.6). This
property is special for Kawasaki dynamics.
Our main result in this section is the following relation, which will be needed in
Sect. 18.4.2.

Proposition 18.7 D = Q^U.

Fig. 18.5 Translation of a bar on a side of a rectangle at cost U

Fig. 18.6 Motion of a particle around a corner of a rectangle at cost U

Proof The proof is split into two parts:

(i) D ⊆ QU ,
(18.4.2)
(ii) D ⊇ QU .

• Proof of (i): Recall the definition of U-path in (18.2.5) and of the protocritical set
D = D̄ ∪ D̃ in Definition 18.1(b). To prove (i) we must show that for all η ∈ D,

\[
\begin{aligned}
&\text{(i1)}\quad H(\eta) = H(\mathcal{Q}),\\
&\text{(i2)}\quad \exists\, \gamma\colon \mathcal{Q} \to \eta\colon\ \max_i H(\gamma_i) \le H(\mathcal{Q}) + U,\ |\gamma_i| = n_c \text{ for all } i.
\end{aligned} \tag{18.4.3}
\]

• Proof of (i1): Any η ∈ D has a single contour ∂(η) inside Λ⁻ of length |∂(η)| =
4ℓ_c and volume |η ∩ Λ⁻| = ℓ_c(ℓ_c − 1) + 1 = n_c, while |η ∩ ∂⁻Λ| = 0 (see Fig. 18.2).
Thus, by (18.2.1), H is constant on D. Since Q ⊆ D, this completes the proof
of (i1).

• Proof of (i2): Note that, because Q̄ and Q̃ are connected via a U-path (disconnect
the 1-protuberance and re-attach it to one of the neighboring sides of the (ℓ_c − 1) ×
ℓ_c quasi-square), we have

\[
\mathcal{Q}^U = \{\eta \in S\colon\, \exists\, U\text{-path from } \bar{\mathcal{Q}} \text{ to } \eta\}
= \{\eta \in S\colon\, \exists\, U\text{-path from } \tilde{\mathcal{Q}} \text{ to } \eta\}. \tag{18.4.4}
\]
First we prove that for any η ∈ D̄ there exists a γ : Q̄ → η such that
max_i H(γ_i) ≤ H(Q̄) + U and |γ_i| = n_c for all i. We start the path from some
ζ ∈ Q̄. Then, recalling the labelling in Definition 18.1(b), we have
– k̄_1(ζ) = 1 contained in r_e(ζ);
– k̄_2(ζ) = ℓ_c − 2 contained in r_n(ζ);
– k̄_3(ζ) = k̄_4(ζ) = ℓ_c − 1 contained in r_w(ζ) ∪ c_nw(ζ) and r_s(ζ) ∪ c_sw(ζ),
respectively.
Here, without loss of generality, we assume that the 1-protuberance is attached to
r_e(ζ) and proceed anti-clockwise. Using the mechanism described in Figs. 18.5–18.6,
we move k̄_2(ζ) − k̄_2(η) particles from r_n(ζ) to r_e(ζ), one by one. After that
we move k̄_3(ζ) − k̄_3(η) + k̄_4(ζ) − k̄_4(η) particles from r_s(ζ) ∪ c_sw(ζ) to r_e(ζ).
Finally, we move k̄_3(ζ) − k̄_3(η) particles from r_w(ζ) ∪ c_nw(ζ) to r_s(ζ) ∪ c_sw(ζ).
The result is a configuration η ∈ D̄.

Next we prove that for any η ∈ D̃ there exists a γ : Q̃ → η such that
max_i H(γ_i) ≤ H(Q̃) + U and |γ_i ∩ Λ| = n_c for all i. We start the path from some
ζ ∈ Q̃. We have
– k̃_1(ζ) = 1 contained in r_e(ζ);
– k̃_2(ζ) = k̃_4(ζ) = ℓ_c − 1 contained in r_n(ζ) and r_s(ζ);
– k̃_3(ζ) = ℓ_c − 1 contained in r_w(ζ) ∪ c_nw(ζ) ∪ c_sw(ζ).
We move k̃_2(ζ) − k̃_2(η) particles from r_n(ζ) to r_e(ζ). After that we move
k̃_3(ζ) − k̃_3(η) + k̃_4(ζ) − k̃_4(η) particles from r_s(ζ) ∪ c_sw(ζ) to r_e(ζ). Finally, we move
k̃_3(ζ) − k̃_3(η) particles from r_w(ζ) ∪ c_nw(ζ) to r_s(ζ) ∪ c_sw(ζ). The result is a
configuration η ∈ D̃. This completes the proof of (i2).

• Proof of (ii): By (18.4.3), all configurations in D are connected via a U-path.
Since Q ⊆ Q^U ∩ D, in order to prove (ii) it suffices to show that D cannot be
exited via a U-path (recall (18.4.4)).
Call a path clustering if all the configurations in the path consist of a single cluster
and no free particles. Below we will prove that for any η ∈ D and any η′ connected
to η by a clustering U-path,

\[
\begin{aligned}
&\text{(a)}\quad \mathrm{CR}(\eta') = \mathrm{CR}(\eta),\\
&\text{(b)}\quad \eta' \supseteq \mathrm{CR}^-(\eta).
\end{aligned} \tag{18.4.5}
\]

What (18.4.5) says is that neither D̄ nor D̃ can be exited via a clustering U-path.
From this in turn we deduce that for any η ∈ D and any η′ connected to η by a
U-path we must have that η′ ∈ D, which is what we want to prove. The argument
for the latter goes as follows. Detaching a particle costs 2U unless the particle is a
1-protuberance, in which case the cost is U. The only configurations in D having a
1-protuberance are those in Q. If we detach the 1-protuberance from a configuration
in Q, at cost U, then we obtain an (ℓ_c − 1) × ℓ_c quasi-square plus a free particle.
Since now only moves at zero cost are allowed, only the free particle can move.
Since in a U-path the particle number is conserved, the only way to regain U and
complete the U-path is to re-attach the free particle to the quasi-square, in which
case we return to Q.

Remark Note that the motion of particles along the border of a droplet may shift
the droplet. Indeed, from any configuration in Q the 1-protuberance may detach
itself and re-attach itself to a different side of the quasi-square or rectangle. Thus,
the U -path may shift the protocritical droplet to anywhere in Λ− .

• Proof of (a): Starting from any η ∈ S, it is geometrically impossible to modify

CR(η) without detaching a particle.

• Proof of (b): Fix η ∈ D . The proof is done in two steps.

1. Let us first consider clustering U-paths along which we do not move a particle
from CR⁻(η). Along such paths we only encounter configurations in D or
configurations obtained from D by breaking one of the bars in ∂⁻CR(η) into two pieces, at
cost U (because there is no particle outside CR(η) that can help to lower the cost).
From the latter only moves at zero cost are possible, so no particle can be detached,
and the only way to regain U and complete the U-path is to restore a bar.

Fig. 18.7 Creation and motion of the hole at cost 0

Fig. 18.8 Dumb-bell shape of D = D̄ ∪ D̃ for U-paths: the canonical protocritical droplets Q̄ and
Q̃ are the gateways between the sets of protocritical droplets D̄ and D̃

2. Let us next consider clustering U -paths along which we move a particle from a
corner of CR− (η). This move costs 2U , which exceeds U . The overshoot U must be
regained by letting the particle slide next to a bar that is attached to a side of CR− (η)
(see Fig. 18.7). Since there are never two bars attached to the same side, we can at
most gain U . This is why it is not possible to move a particle from CR− (η) other
than from a corner.
From here only moves at zero cost are allowed. There are no 1-protuberances
present anymore, because only the configurations in Q have a 1-protuberance. Thus,
no particle outside CR− (η) can move, except the one that just detached itself from
CR− (η). This particle can move back, in which case we return to the same config-
uration η. In fact, all possible moves at zero cost consist in moving the “hole” just
created in CR− (η) along the side of CR− (η), until it reaches the height of the top of
the bar attached to this side of CR− (η), after which it cannot advance anymore at
zero cost (see Fig. 18.7). All these moves do not change the energy, except the one
that returns the particle to its original position and regains U .
This proves our claim in (18.4.5), completes the proof of (ii) in (18.4.2), and
hence of Proposition 18.7.

We saw above that U-paths cannot exit D = D̄ ∪ D̃, but can make a crossover
between D̄ and D̃. This crossover can, however, only occur between Q̄ and Q̃.
A schematic picture of D therefore is as in Fig. 18.8.

18.4.2 Protocritical and critical droplets

Most of this section revolves around getting a precise description of ( → )opt ,

the set of optimal paths for the nucleation (recall Definition 16.2(a)).

Proposition 18.8
(i) Φ(, ) = Γ .
(ii) S (, ) ⊇ D fp .

Proof The proof is based on five lemmas (Lemmas 18.9–18.13 below).

(i) We prove that Φ(□, ■) ≤ Γ and Φ(□, ■) ≥ Γ.

• Φ(□, ■) ≤ Γ: All we need to do is to construct a path that connects □ and ■
without exceeding energy Γ. The proof comes in three steps.

1. We first show that the configurations in Q are connected to □ by a path that stays
below Γ.

Lemma 18.9 For any η1pr ∈ Q there exists a γ : η1pr → □ such that max_{ξ∈γ} H(ξ)
< Γ.

Proof Fix η1pr ∈ Q. Note that, by (18.1.11), we have H(η1pr) = Γ − Δ. First, we
detach the 1-protuberance from the (ℓ_c − 1) × ℓ_c quasi-square, which costs U and
raises the energy to Γ − Δ + U (< Γ), move the particle to the boundary of the
box, which costs nothing, and move it out of the box, which pays Δ. We are then
left with a quasi-square of energy

Γ − (2Δ − U ). (18.4.6)

Second, we detach a particle from a corner of the quasi-square, which costs 2U, and
move it out of the box, which pays Δ. Thus, the energy increases by 2U − Δ when
detaching and removing a particle from a corner of the quasi-square. We repeat this
operation another ℓ_c − 3 times, each time picking particles from the bar on the same
shortest side. To guarantee that we never reach energy Γ, we have the condition
that

    (2U − Δ)k + 2U < 2Δ − U for 0 ≤ k ≤ ℓ_c − 3, (18.4.7)

or

    3 ≤ ℓ_c < U/(2U − Δ) + 1. (18.4.8)

The second inequality holds by the definition of ℓ_c in (18.1.5) and the
non-degeneracy assumption in (18.1.6), the first inequality by our exclusion of ℓ_c = 2
(recall the statement made at the end of Sect. 18.3).

440 18 Kawasaki Dynamics

Fig. 18.9 Cost of adding or removing a row of length ℓ

Third, detaching the last particle costs U instead of 2U. To guarantee that we
still do not reach energy Γ, we have the condition that

    (2U − Δ)(ℓ_c − 2) + U < 2Δ − U, (18.4.9)

which is weaker than (18.4.7) because 2U − Δ < U. Removal of the last particle
pays Δ, so that we arrive at energy

    Γ − (2Δ − U) + (2U − Δ)(ℓ_c − 2) + U − Δ = Γ − 2Δ + (2U − Δ)(ℓ_c − 1), (18.4.10)

which is strictly smaller than (18.4.6) by the second inequality in (18.4.8). Thus,
removal of a row of length ℓ_c − 1 from the (ℓ_c − 1) × ℓ_c quasi-square in η1pr ∈ Q
lowers the energy (see Fig. 18.9). We now have a square of side length ℓ_c − 1. It
is obvious that we can remove further rows without encountering new conditions,
until we reach □.
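The energy bookkeeping in the proof of Lemma 18.9 can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes that (18.1.5) reads ℓ_c = ⌈U/(2U − Δ)⌉ with U < Δ < 2U, measures all energies relative to Γ, and uses the hypothetical helper names `critical_length` and `row_removal_profile`.

```python
from fractions import Fraction
from math import ceil


def critical_length(U, Delta):
    # Assumed form of (18.1.5): l_c = ceil(U / (2U - Delta)), for U < Delta < 2U.
    U, Delta = Fraction(U), Fraction(Delta)
    return ceil(U / (2 * U - Delta))


def row_removal_profile(U, Delta):
    """Energies relative to Gamma along the path of Lemma 18.9, from a
    configuration in Q down to the square of side length l_c - 1."""
    U, Delta = Fraction(U), Fraction(Delta)
    lc = critical_length(U, Delta)
    e = -Delta                       # H(eta_1pr) = Gamma - Delta
    profile = [e]
    e += U; profile.append(e)        # detach the 1-protuberance
    e -= Delta; profile.append(e)    # particle leaves the box: quasi-square, (18.4.6)
    for _ in range(lc - 2):          # corner particles: 1 + (l_c - 3) removals
        e += 2 * U; profile.append(e)
        e -= Delta; profile.append(e)
    e += U; profile.append(e)        # the last particle of the row costs only U
    e -= Delta; profile.append(e)    # it leaves the box: energy (18.4.10)
    return lc, profile
```

For instance, `row_removal_profile(1, Fraction(17, 10))` can be used to confirm that the maximum of the profile stays strictly below 0 (that is, below Γ) and that the final value agrees with the right-hand side of (18.4.10).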

2. For η1pr ∈ Q, let η2pr be the configuration obtained from η1pr by attaching an
extra particle next to the 1-protuberance, thereby forming a 2-protuberance. We next
show that η2pr is connected to ■ by a path that stays below Γ.

Lemma 18.10 For any η1pr ∈ Q there exists a γ : η2pr → ■ such that max_{ξ∈γ} H(ξ)
< Γ.

Proof Without loss of generality we may assume that η1pr ∈ D̄ because of
Proposition 18.7. Fix η1pr ∈ Q. Note that H(η2pr) = Γ − 2U. First, we create a
particle, which costs Δ and raises the energy to Γ − (2U − Δ) (< Γ), move it to
the droplet, which costs nothing, and attach it next to the 2-protuberance, which
pays 2U, thereby forming a bar of length 3. This operation pays 2U − Δ. We can
repeat this operation another ℓ_c − 3 times until the row is filled. By that time we
have a square of side length ℓ_c and energy

    Γ − 2U − (2U − Δ)(ℓ_c − 2). (18.4.11)

Second, we create another particle and attach it anywhere to the square to form a
new 1-protuberance. This operation costs Δ − U. We must make sure that we can
still create a particle without reaching energy Γ, which gives us the condition

    Δ − U + Δ < 2U + (2U − Δ)(ℓ_c − 2), (18.4.12)

or

    ℓ_c > U/(2U − Δ), (18.4.13)

which holds by the definition of ℓ_c and the non-degeneracy assumption in (18.1.6).
Third, we create another particle and attach it next to the new 1-protuberance. This
brings us to energy

    Γ − U − (2U − Δ)ℓ_c, (18.4.14)

which is below the energy of η2pr by (18.4.13). It is obvious that we can add further
rows without encountering new conditions, until we reach ■.

3. We can now conclude the proof of Φ(□, ■) ≤ Γ by constructing a bridge between
η1pr and η2pr that does not exceed Γ. Namely, create a particle at the boundary,
which costs Δ and raises the energy to Γ, move it to the droplet, which costs
nothing, and place it next to the 1-protuberance, which pays 2U. The desired path
γ : □ → ■ is realized by tracing the path in Lemma 18.9 in the reverse direction,
back from □ to η1pr, going over the bridge from η1pr to η2pr, and then following
the path in Lemma 18.10 from η2pr to ■. This γ will be called the reference path
through η for the nucleation.
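The uphill bridge and the first growth steps of Lemma 18.10 can be traced in the same way. The sketch below is an illustration under stated assumptions: it again takes ℓ_c = ⌈U/(2U − Δ)⌉ as the form of (18.1.5), measures energies relative to Γ, and the function name `growth_profile` is hypothetical. It records the energies from η1pr over the bridge to η2pr, through the completion of the row (18.4.11), and up to the configuration with energy (18.4.14).

```python
from fractions import Fraction
from math import ceil


def growth_profile(U, Delta):
    """Energies relative to Gamma from eta_1pr, over the bridge to eta_2pr,
    and along the first part of the path in Lemma 18.10."""
    U, Delta = Fraction(U), Fraction(Delta)
    lc = ceil(U / (2 * U - Delta))   # assumed form of (18.1.5)
    e = -Delta                       # eta_1pr has energy Gamma - Delta
    profile = [e]
    e += Delta; profile.append(e)    # bridge: create a particle, energy Gamma
    e -= 2 * U; profile.append(e)    # attach next to the 1-protuberance: eta_2pr
    for _ in range(lc - 2):          # fill the row: 1 + (l_c - 3) particles
        e += Delta; profile.append(e)
        e -= 2 * U; profile.append(e)
    square = e                       # square of side l_c, energy (18.4.11)
    e += Delta; profile.append(e)    # create a particle
    e -= U; profile.append(e)        # attach it as a new 1-protuberance
    e += Delta; profile.append(e)    # create another particle, cf. (18.4.12)
    e -= 2 * U; profile.append(e)    # attach it next to the 1-protuberance, (18.4.14)
    return lc, square, profile
```

With exact rational inputs the value 0 (energy Γ) should appear exactly once in the profile, namely at the bridge, which is the content of Step 3.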

• Φ(□, ■) ≥ Γ: The proof comes in three more steps.

4. The first crucial ingredient in the proof is the following observation:

Lemma 18.11 Any γ ∈ (□ → ■)_opt must pass through a configuration consisting
of a single (ℓ_c − 1) × ℓ_c quasi-square somewhere in Λ^−.

Proof Any path γ : □ → ■ must cross the set V_{ℓ_c(ℓ_c−1)}. As shown in Alonso and
Cerf [4], Theorem 2.6, in V_{ℓ_c(ℓ_c−1)} the unique (modulo translations and rotations)
configuration of minimal energy is the (ℓ_c − 1) × ℓ_c quasi-square, which we denote
by η and which has energy

    H(η) = Γ − (2Δ − U). (18.4.15)

All other configurations in V_{ℓ_c(ℓ_c−1)} have energy at least Γ − 2Δ + 2U. To increase
the particle number starting from any such configuration, we must create a particle
at cost Δ. But the resulting configuration would have energy Γ − Δ + 2U (> Γ)
and thus would lead to a path exceeding energy Γ.

5. The second crucial ingredient in the proof is the following observation:

Lemma 18.12 Any γ ∈ (□ → ■)_opt must pass through Q.

Proof Follow the path until it hits the set V_{ℓ_c(ℓ_c−1)}. According to Lemma 18.11, the
configuration in this set must be an (ℓ_c − 1) × ℓ_c quasi-square. Since we need not
consider any paths that return to the set V_{ℓ_c(ℓ_c−1)} afterwards, a first step beyond the
quasi-square must be the creation of a new particle. This brings us to energy

    Γ − Δ + U. (18.4.16)

Before any new particle is created, we must lower the energy by at least U. The
only possible way to do this is to move the particle to the quasi-square
and attach it to one of its sides, which reduces the energy to

    Γ − Δ (18.4.17)

and gives us a configuration in Q.

6. It now suffices to show that to reach ■ from Q we must reach energy Γ. This
goes as follows. Starting from Q, it is impossible to reduce the energy without
lowering the particle number. Indeed, this follows from Alonso and Cerf [4],
Theorem 2.6, which asserts that the minimal energy in V_{ℓ_c(ℓ_c−1)+1} is realised (although
not uniquely) by the configurations in Q. Since any further move to increase the
particle number involves the creation of a new particle, the energy must reach Γ.

Lemmas 18.11–18.12 imply that Φ(□, ■) = Γ, and together with Steps 1–3
complete the proof of Proposition 18.8(i).

(ii) Our final observation is the following:

Lemma 18.13 The set of configurations in V_{ℓ_c(ℓ_c−1)+1} that can be reached from □
by a path that stays below Γ and for which it is possible to add a particle without
exceeding Γ coincides with the set Q^U defined in Definition 18.6(b).

Proof From step 2 above it is clear that the definition of Q^U precisely assures that
the assertion holds true. Indeed, by Lemma 18.12, any γ ∈ (□ → ■)_opt crosses
V_{ℓ_c(ℓ_c−1)+1} in Q. Once it is in Q, before the arrival of the next particle, which
costs Δ, it can reach all configurations that have the same energy, the same particle
number, and can be reached at cost ≤ U < Δ.

We know from Proposition 18.7 that Q^U = D. By adding a free particle in ∂^−Λ
to a configuration in D we obtain a configuration in D^fp. Hence Lemma 18.13
implies that any optimal path passes through D^fp. This completes the proof of
Proposition 18.8(ii).

18.4.3 Identification of the protocritical and the critical set

The relations P(□ → ■) = D and C(□ → ■) = D^fp and the formula for Γ
claimed in Theorem 18.3 are an immediate consequence of Definition 16.3,
Lemmas 18.9–18.10 and the following proposition:

Proposition 18.14 Any γ ∈ (□ → ■)_opt passes first through Q, then (possibly)
through D \ Q, and finally through D^fp.

Proof Combine Lemmas 18.12–18.13 and Proposition 18.8(i).

The claim in Theorem 18.5 is immediate from Lemmas 18.9–18.10 in combination
with Proposition 16.12, Lemma 16.15, (16.2.4) and (17.3.6)–(17.3.7). As
argued in Sect. 18.3.1, the latter two equations (which were derived for Glauber
dynamics) continue to be valid for Kawasaki dynamics as well. In Sect. 18.4.4 below
we will see why it is important to attach the free particle at a “good site”, i.e., next
to two other particles in the protocritical droplet.
Think of Q as the set of canonical protocritical droplets: D, the set of protocritical
droplets, is the set of all configurations the dynamics can reach after hitting Q
before the creation of the next free particle in ∂^−Λ. This particle completes the
formation of a critical droplet (= a protocritical droplet + a free particle at the
boundary) that triggers the nucleation. If subsequently the free particle moves to
the protocritical droplet and attaches itself at a “good site”, then the dynamics has
“moved over the hill” and proceeds to fill up Λ^−.

18.4.4 Motion on the plateau

The following observations, which constitute a refinement of what the dynamics
does when it is close to forming a critical droplet, will be needed in Sect. 18.5.
(1) Starting from D^fp \ Q^fp, the only transitions that do not raise the energy are
motions of the free particle, as long as the free particle is at lattice distance ≥ 3
from the protocritical droplet.
(2) Starting from Q^fp, the only transitions that do not raise the energy are motions
of the free particle and motions of the 1-protuberance along the side of the
quasi-square where it is attached, as long as the free particle is at lattice distance ≥ 3
from the protocritical droplet. When the lattice distance is 2, either the free
particle can be attached to the protocritical droplet or the 1-protuberance can be
detached from the protocritical droplet and attached to the free particle, to form
a quasi-square plus a dimer. From the latter configuration the only transition
that does not raise the energy is the reverse move.
(3) Starting from D^fp, the only configurations that can be reached by a path that
lowers the energy and does not decrease the particle number are those where
the free particle is attached to the protocritical droplet.

Fig. 18.10 Good sites (G) and bad sites (B)

The restriction in observation (1) that the free particle must be at lattice distance ≥ 3
from the protocritical droplet is needed for the following reason: If the protocritical
droplet is a configuration in D \ Q and the free particle sits at lattice distance 2
from a corner of a bar, diagonally opposite the particle that sits in the corner of the
bar, then at zero cost this particle may detach itself from the bar and slide in between
the quasi-square and the free particle. For observation (3) note the following: if we
start from the configuration described above and slide the remaining particles in the
bar one by one, all at zero cost except the last one, which pays U , then we reach a
configuration where the free particle is attached to the protocritical droplet with the
bar shifted.
The following definition introduces the notion of good sites (G) and bad sites (B)
on the border of protocritical droplets (see Fig. 18.10).

Definition 18.15
(a) For η ∈ D^fp, write η = (η̂, x) with η̂ ∈ D the protocritical droplet and x ∈ ∂^−Λ
the position of the free particle.
(b) Let the configurations that can be reached from η = (η̂, x) ∈ D^fp according to
observation (3) be denoted by

    C^G(η̂) if the particle is attached in ∂^−CR(η̂),
    C^B(η̂) if the particle is attached in ∂^+CR(η̂). (18.4.18)

The next proposition, which is the main result of this section, shows that when
the dynamics reaches C^G it has gone “over the hill”, while when it reaches C^B it
has not.

Fig. 18.11 An example of a path from C^B to ■

Proposition 18.16
(i) If η ∈ C^G, then there exists a γ : η → ■ such that max_{ξ∈γ} H(ξ) < Γ∗.
(ii) If η ∈ C^B, then there are no γ : η → □ or γ : η → ■ such that max_{ξ∈γ} H(ξ) < Γ∗.

Proof (i) If η ∈ C^G, then its energy is either Γ − 2U or Γ − U, depending on
whether the particle was attached in a corner or as a 1-protuberance. In the latter case
we can move the particle at no cost into a corner and gain an extra U. After that it
is possible to create a new particle and re-attach it, which leads to energy Γ − 2U −
(2U − Δ). We can continue in this way, filling up all rows in ∂^−CR(η), until we reach
either an ℓ_c × ℓ_c square or an (ℓ_c − 1) × (ℓ_c + 1) rectangle, depending on whether η
arose from D̄ or D̃ (recall Definition 18.1(b)). In the first case we can proceed along
the reference path for the nucleation constructed in the proof of Proposition 18.8.
In the latter case, however, we can connect to this reference path as follows. The
energy of the (ℓ_c − 1) × (ℓ_c + 1) rectangle is Γ − 2U − (2U − Δ)(ℓ_c − 3). This is
lower than Γ − Δ, because ℓ_c ≥ 3. Create a particle, which costs Δ, and attach it
to one of the longest sides of the rectangle, which pays U. Now slide particles along
the corner of the rectangle, following the mechanism described in Figs. 18.5–18.6,
until an ℓ_c × ℓ_c square is reached. This costs U and keeps the energy below Γ.
From there again proceed along the reference path for the nucleation.
(ii) If η ∈ C^B, then H(η) = Γ − U, so as long as the energy stays below Γ it
is impossible to create a new particle before further lowering the energy. But there
are no moves available to lower the energy. The only moves available are those
where the particle that was last attached is moving along the side or is detached
again, which brings us back to D^fp, or those that start a motion of particles along
the border of the droplet (as in Fig. 18.6), which may or may not bring us back to
D^fp. In both cases the cost is U and the energy returns to Γ.
An example of a path from C^B to ■ that does not return to a protocritical droplet
plus a free particle is obtained as follows (see Fig. 18.11). Suppose that η̂ ∈ D is
such that one bar completes one side of ∂^−CR(η̂), and suppose that the free particle
attaches itself on top of that bar, forming a 1-protuberance. Then the energy is
Γ − U. Slide this bar to the end of the side it is attached to (at cost and gain U) and
slide the two bars on the neighboring sides to the end as well (at cost and gain U).
Then the energy is again Γ − U. Next move the shorter bar on top of the longer
bar via a motion as in Fig. 18.6. When the last particle of the bar is moved, it can
be detached (at cost U) and re-attached (at gain 2U). Then the energy is Γ − 2U.
Now create a free particle (at cost Δ), move it to the droplet (at cost 0), and attach it
in a corner of the droplet (at gain 2U). Continue “downhill” in this way, adding on
successive rows as in the reference path that was used above, until ■ is reached.

Proposition 18.16(ii) shows that the configurations in C^B are wells, i.e., their
energy is < Γ, but to move to either □ or ■ the energy must return to Γ. The
configurations of the form “quasi-square plus dimer” described in observation (2)
are elements of S(□, ■) but not of C(□, ■). Indeed, the only possible move at
zero cost is the one where the free particle jumps back to the quasi-square.

Summarizing the above, we have the following:

– The set of configurations through which all optimal paths must pass is a union of
plateaus, indexed by η̂ ∈ D.
– Each plateau consists of a protocritical droplet η̂ and a collection of positions of
the free particle, indexed by Λ \ (η̂ ∪ ∂^+η̂).
– Each plateau has wells and dead-ends when the free particle is close to the
protocritical droplet.
This geometric structure is special for Kawasaki dynamics. We will not attempt to
describe the wells and the dead-ends in detail. For the proof of Theorem 18.4 in
Sect. 18.5 this will not be needed.

18.4.5 Cardinality of the set of protocritical droplets

In this section we show that the cardinality of D modulo shifts of the protocritical
droplet equals the formula given in (18.1.13).

Proof First we consider D̄. We have to count the number of different shapes of the
clusters in D̄ (recall Fig. 18.2). We do this by counting in how many ways ℓ_c − 1
particles can be removed from the four bars of an ℓ_c × ℓ_c square starting from
the four corners (recall Definition 18.1(b)). We split the counting according to the
number k = 1, 2, 3, 4 of corners from which particles are removed. The number of
ways in which we can choose k corners is \binom{4}{k}. After we have removed the particles
at these corners, we need to remove ℓ_c − 1 − k more particles from either side of
each corner. The number of ways in which this can be done is

\[
\begin{aligned}
&\bigl|\{(m_1,\dots,m_{2k}) \in \mathbb{N}_0^{2k}\colon\, m_1 + \dots + m_{2k} = \ell_c - 1 - k\}\bigr| \\
&\qquad = \bigl|\{(m_1,\dots,m_{2k}) \in \mathbb{N}^{2k}\colon\, m_1 + \dots + m_{2k} = \ell_c - 1 + k\}\bigr|
 = \binom{\ell_c - 2 + k}{2k - 1}.
\end{aligned}
\tag{18.4.20}
\]

The counting for D̃ is the same, except that we start from an (ℓ_c − 1) × (ℓ_c + 1)
rectangle and count in how many ways ℓ_c − 2 particles can be removed from the
four bars. The answer is the same as in (18.4.20) with ℓ_c − 1 replaced by ℓ_c − 2,
except for an extra factor 2 that counts the two orientations of the rectangle.
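The stars-and-bars identity in (18.4.20) is easy to confirm by brute force for small values. The sketch below is only an illustration: the helper names `count_removals` and `shapes_from_square` are hypothetical, and the second function merely evaluates the sum over k of \binom{4}{k} times the binomial coefficient in (18.4.20) that the counting for D̄ suggests. Whether that sum matches the corresponding term of (18.1.13) should be checked against that formula.

```python
from itertools import product
from math import comb


def count_removals(total, parts):
    """Brute-force count of tuples (m_1, ..., m_parts) of non-negative
    integers with m_1 + ... + m_parts == total."""
    return sum(1 for m in product(range(total + 1), repeat=parts)
               if sum(m) == total)


def shapes_from_square(lc):
    # Sum over the number k of corners from which particles are removed.
    return sum(comb(4, k) * comb(lc - 2 + k, 2 * k - 1) for k in range(1, 5))


def check_identity(lc):
    # Compare the brute-force count with the closed form in (18.4.20).
    return all(count_removals(lc - 1 - k, 2 * k) == comb(lc - 2 + k, 2 * k - 1)
               for k in range(1, 5))
```

Calling `check_identity(lc)` for a few small `lc` exercises every k, including the cases in which fewer than k particles remain to be removed and both sides vanish.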

18.5 Asymptotics of the prefactor for large volumes

In this section we prove Theorem 18.4. Our starting point is the variational formula
for Θ = 1/K given in Lemma 16.17. In Sect. 18.5.1 we define certain objects that
capture the geometry of critical droplets and wells. In Sect. 18.5.2 we derive upper
and lower bounds for Θ in terms of certain capacities of simple random walk on
Λ+ restricted not to enter the support of a protocritical droplet. In Sect. 18.5.3 we
compute the asymptotics of these capacities in the limit as Λ → Z2 , and show that
the upper and lower bounds merge because of the recurrence of simple random walk
on Z2 .

18.5.1 Geometry of critical droplets and wells

In the proof we need one more definition, which relies on the geometric structure
outlined in Sect. 18.4.4. Recall the definition of S , S and Si , i = 1, . . . , I , from
(16.2.25) and (16.3.3)–(16.3.4). Abbreviate supp+ (η̂) = supp(η̂) ∪ ∂ + supp(η̂).

Definition 18.17
(a) Let D^fp_Λ = {η = (η̂, x) : η̂ ∈ D, x ∈ Λ \ supp^+(η̂)}.

(b) For η̂ ∈ D, let

\[
\begin{aligned}
G(\hat\eta) &= \{x \in \partial^+ \mathrm{supp}(\hat\eta)\colon\, (\hat\eta, x) \in S_\blacksquare\}, \\
B(\hat\eta) &= \{x \in \partial^+ \mathrm{supp}(\hat\eta)\colon\, \exists\, i = 1,\dots,I\colon\, (\hat\eta, x) \in S_i\},
\end{aligned}
\tag{18.5.1}
\]

be the set of good sites, respectively, bad sites for η̂. Note that (η̂, x) may be in
the same S_i for different x ∈ B(η̂).

(c) For η̂ ∈ D, let

\[
I(\hat\eta) = \{i \in \{1,\dots,I\}\colon\, \exists\, x \in B(\hat\eta)\colon\, (\hat\eta, x) \in S_i\}. \tag{18.5.2}
\]

Note that B(η̂) can be partitioned into disjoint sets B_1(η̂), …, B_{|I(η̂)|}(η̂)
according to which S_i the configuration (η̂, x) belongs to.
(d) Write

\[
\begin{aligned}
CS(\hat\eta) &= \mathrm{supp}(\hat\eta) \cup G(\hat\eta), \qquad
CS^{+}(\hat\eta) = CS(\hat\eta) \cup \partial^+ CS(\hat\eta), \\
CS^{++}(\hat\eta) &= CS^{+}(\hat\eta) \cup \partial^+ CS^{+}(\hat\eta).
\end{aligned}
\tag{18.5.3}
\]

By Proposition 18.16, the link between the sets in Definitions 18.15(b) and
18.17(b) is

\[
C^{G}(\hat\eta) = \bigcup_{x \in G(\hat\eta)} (\hat\eta, x), \qquad
C^{B}(\hat\eta) = \bigcup_{x \in B(\hat\eta)} (\hat\eta, x). \tag{18.5.4}
\]

For the argument below it is important that G(η̂) ≠ ∅ for all η̂ ∈ D. On the other
hand, the sets B(η̂), η̂ ∈ D, will turn out to play no role for the asymptotics of K as
Λ → Z².

18.5.2 Capacity bounds on the prefactor

We have the following bounds on Θ = 1/K.

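Lemma 18.18 Θ ∈ [Θ_1, Θ_2] with

\[
\begin{aligned}
\Theta_1 &= [1 + o(1)] \sum_{\hat\eta \in D} \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, CS(\hat\eta)\bigr), \\
\Theta_2 &= \sum_{\hat\eta \in D} \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, CS^{++}(\hat\eta)\bigr),
\end{aligned}
\tag{18.5.5}
\]

where

\[
\mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, F\bigr)
= \min_{\substack{g\colon \Lambda^+ \to [0,1] \\ g|_{\partial^+\Lambda} = 1,\; g|_F = 0}}
\; \sum_{(x,x') \in (\Lambda^+)^\star} \tfrac12 \bigl[g(x) - g(x')\bigr]^2,
\qquad F \subset \Lambda, \tag{18.5.6}
\]

with (Λ^+)^⋆ = {(x, y) : x, y ∈ Λ^+, |x − y| = 1}, is the capacity of simple random
walk on Λ modulo normalisation, and o(1) is an error term that tends to zero as
Λ → Z².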

Proof The variational problem in (16.3.11) decomposes into disjoint variational
problems for the maximally connected components of S^⋆. Only those components
that contain S_□ or S_■ contribute, since for the other components the minimum is
achieved by picking h constant.

• Θ ≥ Θ_1: A lower bound is obtained from (16.3.11) by removing all transitions that
do not involve a fixed protocritical droplet and a move of the free/attached particle.
This removal gives

\[
\Theta \ge \sum_{\hat\eta \in D} \; \min_{C_i(\hat\eta),\, i \in I(\hat\eta)} \;
\min_{\substack{g\colon \Lambda^+ \to [0,1] \\ g|_{G(\hat\eta)} = 0,\; g|_{B_i(\hat\eta)} = C_i(\hat\eta),\, i \in I(\hat\eta),\; g|_{\partial^+\Lambda} = 1}}
\; \sum_{(x,x') \in [\Lambda^+ \setminus \mathrm{supp}(\hat\eta)]^\star} \tfrac12 \bigl[g(x) - g(x')\bigr]^2. \tag{18.5.7}
\]

To see how this bound arises from (16.3.11), pick h in (16.3.11) and g in (18.5.7)
such that

    h(η) = h(η̂, x) = g(x), η̂ ∈ D, x ∈ Λ^+ \ supp(η̂), (18.5.8)

and use that, by Definitions 18.17(b–c),

    (η̂, x) ∈ S_■, x ∈ G(η̂),
    (η̂, x) ∈ S_i, x ∈ B_i(η̂), i ∈ I(η̂), (18.5.9)
    (η̂, x) ∈ D ⊂ S_□, x ∈ ∂^+Λ.

A further lower bound is obtained by removing from the right-hand side of (18.5.7)
the boundary condition on the sets B_i(η̂), i ∈ I(η̂). This gives

\[
\begin{aligned}
\Theta &\ge \sum_{\hat\eta \in D} \;
\min_{\substack{g\colon \Lambda^+ \to [0,1] \\ g|_{G(\hat\eta)} = 0,\; g|_{\partial^+\Lambda} = 1}}
\; \sum_{(x,x') \in [\Lambda^+ \setminus \mathrm{supp}(\hat\eta)]^\star} \tfrac12 \bigl[g(x) - g(x')\bigr]^2 \\
&= \sum_{\hat\eta \in D} \mathrm{cap}^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}\bigl(\partial^+\Lambda, G(\hat\eta)\bigr),
\end{aligned}
\tag{18.5.10}
\]

where the upper index Λ^+ \ supp(η̂) refers to the fact that no moves in and out of
supp(η̂) are allowed (i.e., this set acts as an obstacle for the free particle). To
complete the proof we show that, in the limit as Λ → Z²,

\[
\begin{aligned}
\mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, \mathrm{supp}(\hat\eta) \cup G(\hat\eta)\bigr)
&\ge \mathrm{cap}^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}\bigl(\partial^+\Lambda, G(\hat\eta)\bigr) \\
&\ge \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, \mathrm{supp}(\hat\eta) \cup G(\hat\eta)\bigr) - O\bigl([1/\ln|\Lambda|]^2\bigr).
\end{aligned}
\tag{18.5.11}
\]

We will show in Sect. 18.5.3 that cap^{Λ^+}(∂^+Λ, CS(η̂)) decays like 1/ln|Λ|. Since
CS(η̂) = supp(η̂) ∪ G(η̂) by Definition 18.17(d), the lower bound Θ ≥ Θ_1 follows.

Remark 18.19 Before we prove (18.5.11), note that the capacity in the right-hand
side of (18.5.11) includes more transitions than the capacity in the left-hand side,
namely, all transitions from supp(η̂) to B(η̂). Let

\[
g^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}_{\partial^+\Lambda, G(\hat\eta)}(x)
= \text{equilibrium potential for } \mathrm{cap}^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}\bigl(\partial^+\Lambda, G(\hat\eta)\bigr) \text{ at } x. \tag{18.5.12}
\]

Below we will show that

\[
g^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}_{\partial^+\Lambda, G(\hat\eta)}(x) \le C/\ln|\Lambda|
\quad \forall\, x \in B(\hat\eta) \text{ for some } C < \infty. \tag{18.5.13}
\]

Since in the Dirichlet form in (18.5.6) the equilibrium potential appears squared, the
error made by adding to the capacity in the left-hand side of (18.5.11) the transitions
from supp(η̂) to B(η̂) is of order [1/ln|Λ|]² times |B(η̂)|, which explains how
(18.5.11) arises.

Formally, let P^η̂_x be the law of the simple random walk that starts at x ∈ B(η̂) and
is forbidden to visit the sites in supp(η̂). Let y ∈ G(η̂). As in the proof of Lemma 8.4,
we have

\[
\begin{aligned}
g^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}_{\partial^+\Lambda, G(\hat\eta)}(x)
&= P^{\hat\eta}_x\bigl(\tau_{\partial^+\Lambda} < \tau_{G(\hat\eta)}\bigr)
 = \frac{P^{\hat\eta}_x\bigl(\tau_{\partial^+\Lambda} < \tau_{G(\hat\eta) \cup x}\bigr)}
        {P^{\hat\eta}_x\bigl(\tau_{G(\hat\eta) \cup \partial^+\Lambda} < \tau_x\bigr)} \\
&\le \frac{P^{\hat\eta}_x\bigl(\tau_{\partial^+\Lambda} < \tau_x\bigr)}
          {P^{\hat\eta}_x\bigl(\tau_y < \tau_x\bigr)}
 \le \frac{\mathrm{cap}^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}(x, \partial^+\Lambda)}
          {\mathrm{cap}^{\Lambda^+ \setminus \mathrm{supp}(\hat\eta)}(x, y)}.
\end{aligned}
\tag{18.5.14}
\]

The denominator of (18.5.14) can be bounded from below by some C″ > 0 that is
independent of x, y and supp(η̂). To see why, pick a path from x to y that avoids
supp(η̂) but stays inside a layer around supp(η̂), and argue as in the proof of the
lower bound of Lemma 6.11. On the other hand, the numerator is bounded from
above by cap^{Λ^+}(x, ∂^+Λ), i.e., by the capacity of the same pair of sets for a random
walk that is not forbidden to visit supp(η̂), since the Dirichlet problem associated
to the latter has the same boundary conditions but includes more transitions.
In the proof of Lemma 18.20 below, we will see that cap^{Λ^+}(x, ∂^+Λ) decays like
C′/ln|Λ| for some C′ < ∞ (see (18.5.21)–(18.5.22) below). We therefore conclude
that indeed (18.5.13) holds with C = C′/C″.

• Θ ≤ Θ_2: The upper bound is obtained from (16.3.11) by picking C_i = 0, i =
1, …, I, and

\[
h(\eta) =
\begin{cases}
1 & \text{for } \eta \in S_\square, \\
g(x) & \text{for } \eta = (\hat\eta, x) \in C^{++}, \\
0 & \text{for } \eta \in S^\star \setminus [S_\square \cup C^{++}],
\end{cases}
\tag{18.5.15}
\]

where

\[
C^{++} = \bigl\{\eta = (\hat\eta, x)\colon\, \hat\eta \in D,\; x \in \Lambda \setminus CS^{++}(\hat\eta)\bigr\} \tag{18.5.16}
\]

consists of those configurations in D^fp_Λ for which the free particle is at distance ≥ 2
of the protocritical droplet and the set of good sites. The choice in (18.5.15) gives

\[
\Theta \le \sum_{\hat\eta \in D} \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, CS^{++}(\hat\eta)\bigr). \tag{18.5.17}
\]

To see how this upper bound arises, note that:

– The choice in (18.5.15) satisfies the boundary conditions in (16.3.11) because
(recall (16.3.3)–(16.3.4))

\[
C^{++} \subseteq D^{fp}_\Lambda, \quad
\bigl[S_\square \cup D^{fp}_\Lambda\bigr] \cap \Bigl[S_\blacksquare \cup \bigcup_{i=1}^{I} S_i\Bigr] = \emptyset
\;\Longrightarrow\;
S^\star \setminus \bigl[S_\square \cup C^{++}\bigr] \supset S_\blacksquare \cup \bigcup_{i=1}^{I} S_i. \tag{18.5.18}
\]

– Since D ⊂ S_□, the first line of (18.5.15) implies that h(η) = 1 for η = (η̂, x) with
η̂ ∈ D and x ∈ ∂^+Λ, which is consistent with the boundary condition g|_{∂^+Λ} = 1
in (18.5.6).
– The third line of (18.5.15) implies that h(η) = 0 for η = (η̂, x) with η̂ ∈ D and
x ∈ CS^{++}(η̂), which is consistent with the boundary condition g|_F = 0 in (18.5.6)
for F = CS^{++}(η̂).
Note further that:
– The only transitions in S^⋆ between S_□ and C^{++} are those where a free particle
enters ∂^−Λ.
– The only transitions in S^⋆ between C^{++} and S^⋆ \ [S_□ ∪ C^{++}] are those where the
free particle moves from distance 2 to distance 1 of the protocritical droplet. All
other transitions either involve a detachment of a particle from the protocritical
droplet (which raises the number of droplets) or an increase in the number of
particles in Λ. Such transitions lead to energy > Γ, which is not possible in S^⋆.
– There are no transitions between S_□ and S^⋆ \ [S_□ ∪ C^{++}].
The latter arguments show that (18.5.6) includes all the transitions in (16.3.11).

18.5.3 Capacity asymptotics

With Lemma 18.18 we have obtained upper and lower bounds on Θ in terms of
capacities for simple random walk on Z2 of the pairs of sets ∂ + Λ and CS(η̂), re-
spectively, CS++ (η̂), with η̂ summed over D . We use these bounds to prove Theo-
rem 18.4. The transition rates of the simple random walk are 1 between neighbour-
ing pairs of sites.

Fig. 18.12 Simple random walk of a free particle moving from ∂ + BM to CS(η̂), respectively,
CS++ (η̂)

Proof Lemma 18.20 below shows that, in the limit as Λ → Z², each of the
capacities in the upper and lower bound on Θ has the same asymptotic behaviour,
namely, [1 + o(1)] 4π/ln|Λ|, irrespective of the location and shape of the
protocritical droplet (provided it is not too close to ∂^+Λ, which is a negligible fraction of the
possible locations). In what follows we take Λ = B_M = [−M, +M]² ∩ Z² for some
M ∈ N large enough (M > 2ℓ_c).

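Lemma 18.20 For any ε > 0 (see Fig. 18.12),

\[
\begin{aligned}
\lim_{M \to \infty} \; \max_{\substack{\hat\eta \in D \\ d(\partial^+ B_M, \mathrm{supp}(\hat\eta)) \ge \varepsilon M}}
\left| \frac{\ln M}{2\pi}\, \mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, CS(\hat\eta)\bigr) - 1 \right| &= 0, \\
\lim_{M \to \infty} \; \max_{\substack{\hat\eta \in D \\ d(\partial^+ B_M, \mathrm{supp}(\hat\eta)) \ge \varepsilon M}}
\left| \frac{\ln M}{2\pi}\, \mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, CS^{++}(\hat\eta)\bigr) - 1 \right| &= 0,
\end{aligned}
\tag{18.5.19}
\]

where d(∂^+B_M, supp(η̂)) = min{|x − y| : x ∈ ∂^+B_M, y ∈ supp(η̂)}.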

Proof We only prove the first line of (18.5.19). The proof of the second line is
similar.

• Lower bound: For η̂ ∈ D, let y ∈ CS(η̂) ⊂ B_M denote the site closest to the center
of CS(η̂). The capacity decreases when we enlarge the set over which the Dirichlet
form is minimised. Therefore we have

\[
\begin{aligned}
\mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, CS(\hat\eta)\bigr)
&\ge \mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, y\bigr) \\
&= \mathrm{cap}^{(B_M - y)^+}\bigl(\partial^+ (B_M - y), 0\bigr)
 \ge \mathrm{cap}^{B_{2M}^+}\bigl(\partial^+ B_{2M}, 0\bigr),
\end{aligned}
\tag{18.5.20}
\]

where the last inequality uses that (B_M − y)^+ ⊂ B_{2M}^+ because y ∈ B_M. By the
analogue of (16.2.5) for simple random walk, we have (compare (18.5.6) with
(16.2.1)–(16.2.2))

\[
\mathrm{cap}^{B_{2M}^+}\bigl(\partial^+ B_{2M}, 0\bigr)
= \mathrm{cap}^{B_{2M}^+}\bigl(0, \partial^+ B_{2M}\bigr)
= 4\, P_0\bigl(\tau_{\partial^+ B_{2M}} < \tau_0\bigr), \tag{18.5.21}
\]

where P_0 is the law on path space of the discrete-time simple random walk on Z²
starting at 0. It is a standard fact (see e.g. Révész [205], Lemma 22.1) that

\[
P_0\bigl(\tau_{\partial^+ B_{2M}} < \tau_0\bigr) = [1 + o(1)]\, \frac{\pi}{2 \ln(2M)}, \qquad M \to \infty. \tag{18.5.22}
\]

Combining (18.5.20)–(18.5.22), we get the desired lower bound.
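The escape probability in (18.5.21) can be computed numerically for small boxes by solving the discrete Dirichlet problem. The sketch below is an illustration only: the function name `escape_probability` is hypothetical, the solver is plain Gauss-Seidel iteration (chosen for brevity, not efficiency), and the box half-width `n` plays the role of 2M. The convergence in (18.5.22) is logarithmically slow, so small boxes only give a rough comparison with π/(2 ln n).

```python
def escape_probability(n, tol=1e-12, max_sweeps=100_000):
    """P_0(tau_boundary < tau_0) for discrete-time simple random walk on Z^2,
    where the boundary is the outer boundary of the box [-n, n]^2."""
    size = 2 * n + 3                      # the box plus one layer of boundary sites
    c = n + 1                             # grid index of the origin
    h = [[0.0] * size for _ in range(size)]
    for i in range(size):                 # h = 1 on the outer layer
        h[0][i] = h[size - 1][i] = h[i][0] = h[i][size - 1] = 1.0
    for _ in range(max_sweeps):           # Gauss-Seidel sweeps for the discrete Laplace equation
        change = 0.0
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                if i == c and j == c:
                    continue              # h(0) = 0 is a boundary condition
                new = 0.25 * (h[i - 1][j] + h[i + 1][j] + h[i][j - 1] + h[i][j + 1])
                change = max(change, abs(new - h[i][j]))
                h[i][j] = new
        if change < tol:
            break
    # First step from the origin, then the harmonic function h takes over.
    return 0.25 * (h[c - 1][c] + h[c + 1][c] + h[c][c - 1] + h[c][c + 1])
```

By (18.5.21), four times the returned value is the corresponding capacity. For the 3 × 3 box (`n = 1`) the linear system can be solved by hand and gives escape probability 2/3, which is a convenient sanity check.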

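• Upper bound: As in (18.5.20), we have

\[
\begin{aligned}
\mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, CS(\hat\eta)\bigr)
&\le \mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, S_y(\hat\eta)\bigr) \\
&= \mathrm{cap}^{(B_M - y)^+}\bigl(\partial^+ (B_M - y), S_y(\hat\eta) - y\bigr)
 \le \mathrm{cap}^{B_{\varepsilon M}^+}\bigl(\partial^+ B_{\varepsilon M}, S_y(\hat\eta) - y\bigr),
\end{aligned}
\tag{18.5.23}
\]

where S_y(η̂) is the smallest square centered at y containing CS(η̂), and the last
inequality uses that (B_M − y)^+ ⊃ B_{εM}^+ when d(∂^+B_M, supp(η̂)) ≥ εM. By the
recurrence of simple random walk, we have

\[
\mathrm{cap}^{B_{\varepsilon M}^+}\bigl(\partial^+ B_{\varepsilon M}, S_y(\hat\eta) - y\bigr)
= [1 + o(1)]\, \mathrm{cap}^{B_{\varepsilon M}^+}\bigl(\partial^+ B_{\varepsilon M}, 0\bigr), \qquad M \to \infty. \tag{18.5.24}
\]

Combining (18.5.22)–(18.5.24), we get the desired upper bound.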

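We are now ready to complete the proof of Theorem 18.4. Combining
Lemmas 18.18–18.20, we find that Θ ∈ [Θ_1, Θ_2] with

\[
\begin{aligned}
\Theta_1 &= O(\varepsilon M) + \sum_{\substack{\hat\eta \in D \\ d(\partial^+ B_M, \mathrm{supp}(\hat\eta)) \ge \varepsilon M}}
\mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, CS(\hat\eta)\bigr) \\
&= O(\varepsilon M) + \bigl|\{\hat\eta \in D\colon\, d(\partial^+ B_M, \mathrm{supp}(\hat\eta)) \ge \varepsilon M\}\bigr|\, \frac{2\pi}{\ln M}\, [1 + o(1)] \\
&= O(\varepsilon M) + N(\ell_c)\, [2(1 - \varepsilon)M]^2\, \frac{2\pi}{\ln M}\, [1 + o(1)], \qquad M \to \infty,
\end{aligned}
\tag{18.5.25}
\]

for any ε > 0 and the same expression for Θ_2, where we use that

\[
\mathrm{cap}^{B_M^+}\bigl(\partial^+ B_M, CS(\hat\eta)\bigr)
\le \mathrm{cap}^{B_M^+}\bigl(B_M^+ \setminus CS(\hat\eta), CS(\hat\eta)\bigr)
= \tfrac12 \bigl|CS^+(\hat\eta)\bigr| \le \tfrac12 (\ell_c + 2)^2, \tag{18.5.26}
\]

and we recall that N(ℓ_c) is the cardinality of D modulo shifts of the
protocritical droplets. Let M → ∞ followed by ε ↓ 0, to conclude that Θ ∼
2πN(ℓ_c)(2M)²/ln M. Since |Λ| = (2M + 1)² and K = 1/Θ, this proves (18.1.12)
in Theorem 18.4.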
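The last step converts an asymptotic in M into one in |Λ|. A small numerical sketch of this conversion follows. It rests on the observation that ln|Λ| = 2 ln(2M + 1) ∼ 2 ln M, so that 2πN(2M)²/ln M ∼ 4πN|Λ|/ln|Λ|, which matches the per-droplet behaviour 4π/ln|Λ| quoted at the start of the proof. The function names are hypothetical, and the second expression is our rewriting, to be compared with (18.1.12), which is not reproduced in this section.

```python
from math import log, pi


def theta_in_M(M, N):
    # Asymptotic form obtained at the end of the proof, in terms of M.
    return 2 * pi * N * (2 * M) ** 2 / log(M)


def theta_in_volume(M, N):
    # The same quantity rewritten in terms of |Lambda| = (2M + 1)^2.
    vol = (2 * M + 1) ** 2
    return 4 * pi * N * vol / log(vol)


def ratio(M, N=1):
    # Tends to 1 as M grows, but only at a logarithmic rate.
    return theta_in_volume(M, N) / theta_in_M(M, N)
```

Evaluating `ratio` at increasingly large `M` shows the slow logarithmic approach to 1, which is why the two forms agree only asymptotically.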

18.6 Extension to three dimensions

In this section we briefly indicate how to extend the main definitions and results
from two to three dimensions.

Let Λ ⊂ Z³ be a large cubic box, centred at the origin. The metastable parameter
range replacing (18.1.4) is

    Δ ∈ (U, 3U), (18.6.1)

and, similarly as in (18.1.6), we assume that

    U/(3U − Δ) ∉ N, 2U/(3U − Δ) ∉ N. (18.6.2)

The analogue of Definitions 18.1(b–c) and 18.6 reads:

Definition 18.21
(a) Let Q denote the set of configurations having one cluster anywhere in Λ^−
consisting of an (m_c − 1) × (m_c − δ_c) × m_c quasi-cube with, attached to one of
its faces, an (ℓ_c − 1) × ℓ_c quasi-square with, attached to one of its sides, a single
particle. Here, δ_c ∈ {0, 1} depends on the arithmetic properties of U and Δ,
while

    ℓ_c = ⌈U/(3U − Δ)⌉, m_c = ⌈2U/(3U − Δ)⌉, (18.6.3)

are the two-dimensional critical droplet size on a face, respectively, the
three-dimensional critical droplet size, replacing (18.1.5). Note that m_c ∈ {2ℓ_c − 1,
2ℓ_c}.
(b) For Δ ∈ (2U, 3U), let Q^{2U} denote the set of configurations that can be reached
from some configuration in Q via a 2U-path, i.e.,

    Q^{2U} = {η ∈ V_{n_c} : ∃ η′ ∈ Q : H(η) = H(η′), Φ_{V_{n_c}}(η, η′) ≤ H(η) + 2U}, (18.6.4)

where n_c = m_c(m_c − δ_c)(m_c − 1) + ℓ_c(ℓ_c − 1) + 1 is the volume of the clusters
in Q. For Δ ∈ (U, 2U), use U instead of 2U in (18.6.4).
(c) Let [Q^{2U}]^fp denote the set of configurations obtained from Q^{2U} by adding a
free particle anywhere in ∂^−Λ (see Fig. 18.13).
(d) Let

\[
\begin{aligned}
\Gamma = \Gamma(\square, \blacksquare) &= H\bigl([Q^{2U}]^{fp}\bigr) = H\bigl(Q^{2U}\bigr) + \Delta = H(Q) + \Delta \\
&= U\bigl[m_c(m_c - \delta_c) + m_c(m_c - 1) + (m_c - \delta_c)(m_c - 1) + 2\ell_c + 3\bigr] \\
&\quad - (3U - \Delta)\bigl[m_c(m_c - \delta_c)(m_c - 1) + \ell_c(\ell_c - 1) + 2\bigr].
\end{aligned}
\tag{18.6.5}
\]

Theorem 18.3 carries over: P(□, ■) = Q^{2U} and C(□, ■) = [Q^{2U}]^fp.
Unfortunately, we are not able to fully identify the geometry of Q^{2U}, i.e., the analogue
of Fig. 18.2 is missing. This is due to the fact that the motion of particles along the
border of the droplet is much more complex in three than in two dimensions (see
Fig. 18.14 for an example).

Fig. 18.13 An element of Q^fp ⊆ D^fp for ℓ_c = 10, m_c = 20 and δ_c = 0

Fig. 18.14 An example of motion of particles along the border of the droplet

Also Theorem 18.2 carries over: the proof of (H1–H2) is the same as in
Sects. 18.3.1–18.3.2, except that for (H1) a little extra care is needed to handle the
geometry in three dimensions.
As in two dimensions, no easily computable formula for K is available. Similarly
as in Sect. 18.5, however, the prefactor K can be estimated in terms of capacities
associated with three-dimensional simple random walk. Since the latter is transient,
the large volume scaling of these capacities is no longer independent of the shape
and the location of the protocritical droplet. Therefore Theorem 18.4 carries over in
a somewhat weaker form.

Theorem 18.22 For large Λ,

\[
\lim_{\Lambda \to \mathbb{Z}^3} |\Lambda|\, K(\Lambda, \ell_c, m_c, \delta_c)
= \frac{1}{M(\ell_c, m_c, \delta_c)\, N(\ell_c, m_c, \delta_c)}, \tag{18.6.6}
\]

where N(ℓ_c, m_c, δ_c) is the cardinality of D modulo shifts, and M(ℓ_c, m_c, δ_c)
satisfies the bounds

\[
\kappa\bigl(m_c - \lfloor \sqrt{m_c} \rfloor\bigr) \le M(\ell_c, m_c, \delta_c) \le \kappa(m_c + 3) \tag{18.6.7}
\]

with κ(m) the capacity of the m × m × m cube for simple random walk on Z³.

Proof The extension of the proof in Sect. 18.5 from two to three dimensions is in
principle straightforward and involves no new ideas. The geometry of the commu-
nication level set is less explicit, but no detail is needed for the proof.
By the transience of simple random walk in three dimensions, we know that

\[
\lim_{\Lambda\to\mathbb{Z}^3} \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, F\bigr) = \mathrm{cap}^{\mathbb{Z}^3}(F)
\tag{18.6.8}
\]

exists for any finite non-empty $F \subset \mathbb{Z}^3$. The limit, which is positive and finite, is the
capacity of F. Let $\kappa(m) = \mathrm{cap}^{\mathbb{Z}^3}(m \times m \times m)$ be the capacity of the m × m × m
cube for simple random walk on $\mathbb{Z}^3$. Then we know that

\[
\lim_{m\to\infty} \kappa(m)/m = \kappa
\tag{18.6.9}
\]

with κ the capacity of the unit cube for standard Brownian motion on $\mathbb{R}^3$. Since
2πR is the capacity of the ball with radius R for standard Brownian motion on $\mathbb{R}^3$,
we have that $\kappa \in (2\pi, 2\pi\sqrt{3})$.
The lower bound in (18.6.7) comes from the fact that all protocritical droplets
contain a cube of side length $m_c - \sqrt{m_c}$. The upper bound comes from the fact that
all protocritical droplets are contained in a cube of side length $m_c + 1$, and that as
long as the free particle is at distance ≥ 2 from the protocritical droplet no border
motion is possible. Both these facts are easy to establish.

With the help of (18.6.7) and (18.6.9), we have good control over $M(\ell_c,m_c,\delta_c)$
for $m_c$ large, i.e., for Δ close to 3U. We have no formula for $N(\ell_c,m_c,\delta_c)$ analogous
to (18.1.13). It would be nice to know its asymptotics for $m_c$ large.

18.7 Bibliographical notes

1. The results in this chapter are taken from Bovier, den Hollander and Nardi [31],
with geometric input from den Hollander, Nardi, Olivieri and Scoppola [84]. Cruder
versions of the main results in Chap. 16 for Kawasaki dynamics, derived with the
help of the pathwise approach to metastability, were obtained by den Hollander,
Olivieri and Scoppola [88–90] in two dimensions and by den Hollander, Nardi,
Olivieri and Scoppola [84] in three dimensions. The latter paper contains the “piston
construction” mentioned in Sect. 18.3.1.

2. The formula for the number of protocritical droplets modulo shifts claimed in [31]
is wrong. The correct formula is (18.1.13), as shown in Sect. 18.4.5. The authors are
grateful to Markus Mayer for pointing out the error.

3. For details of the argument needed in Sect. 18.3.1 to extend the proof of (H1) from
Glauber dynamics to Kawasaki dynamics, see [84]. For a comparison of Glauber
dynamics and Kawasaki dynamics, see den Hollander [81].

4. The results in this chapter extend to arbitrary shapes of Λ (instead of a square

or a cube), provided |∂Λ|/|Λ| tends to zero as Λ → Z2 , respectively, Λ → Z3 . For
the relevant capacity asymptotics, needed in Sects. 18.5.2 and 18.6, see van den
Berg [227].

5. For more information on the tube of typical trajectories, or nucleation pattern, see
Olivieri and Vares [198], Sect. 7.13.

6. It would appear that the analysis in Sect. 18.6 could be extended to arbitrary di-
mension, like for Glauber dynamics (recall Sect. 17.6, Item 3). However, this exten-
sion has never been written out in detail. The set of critical droplets is quite complex
due to the motion of particles along the border of droplets. In two dimensions we
have a full understanding of this motion, in three dimensions a partial understanding
(see [84]), while in higher dimensions we know very little. It is clear that the critical
droplets for Glauber dynamics all are protocritical droplets for Kawasaki dynam-
ics. But the border motion can create many additional shapes, all via V -paths with
V < Δ.

7. An anisotropic version of Kawasaki dynamics, in which the Hamiltonian in

(18.1.2) is modified by allowing for different binding energies Uh < 0 and Uv < 0
in the horizontal and the vertical direction, was studied in Nardi, Olivieri and Scop-
pola [189]. Different nucleation patterns occur for weak and strong anisotropy. In
both cases the critical droplets are different from what is naively expected, similarly
as for the anisotropic Glauber dynamics described in Item 5 of Chap. 17.

8. Kawasaki dynamics with two types of particles, with binding energy −U < 0
between particles of different types (and no binding energy between particles of the
same type) and with different activation energies Δ1 > 0 and Δ2 > 0, was studied in
den Hollander, Nardi and Troiani [85–87]. There are several regimes, with critical
droplets being either square-shaped or rhombus-shaped. The proof of (H1)–(H2)
is quite involved, and is hampered by the fact that droplets with fixed volume and
minimal surface change shape when they come close to ∂Λ.
Part VII
Applications: Lattice Systems in Large
Volumes at Low Temperatures

Part VII looks at nucleation in lattice systems that grow to infinity as the tempera-
ture tends to zero. Spatial entropy comes into play: in large volumes, even at low
temperatures, entropy is competing with energy because the metastable state and
the states that evolve from it under the dynamics have a non-trivial spatial structure.
Chapter 19 looks at Glauber dynamics, Chap. 20 at Kawasaki dynamics.
The transition from the metastable state (with only subcritical droplets) to the
stable state (with one or more supercritical droplets) is triggered by the appearance
of a single critical droplet somewhere in the system. The main property driving the
results in Chaps. 19–20 is that the average time until this appearance is inversely
proportional to the volume. This property is referred to as homogeneous nucleation,
because it says that the critical droplet for the transition appears essentially inde-
pendently in small volumes that partition the large volume.
No information will be obtained about what happens to the system after the criti-
cal droplet has appeared. This belongs to the post-nucleation regime, which is much
harder than the pre-nucleation regime considered here, and which will be briefly ad-
dressed in Chap. 23. Our results are further limited in the sense that we need to draw
the initial configuration according to a specific distribution on the set of subcritical
configurations, namely, the last-exit biased distribution introduced in Chap. 8. To
show that the same results hold for more general initial distributions we would need
to establish strong recurrence properties of the dynamics within the metastable state.
Another limitation is that there will be no proof that the nucleation time divided by
its average converges to the exponential distribution.
Contrary to Chap. 16, where for small volumes we were able to deal with a gen-
eral dynamics under a general set of hypotheses, the situation for large volumes is
significantly more difficult. This is why we can so far offer results only for Glauber
and Kawasaki. It remains a challenge to develop a more abstract set-up.
Chapter 19
Glauber Dynamics

La complexion qui fait le talent pour les petites choses est

contraire à celle qu’il faut pour le talent des grandes.
(François de La Rochefoucauld, Réflexions)

The goal of this chapter is to extend the analysis of Chap. 17 to volumes that grow
moderately fast as the temperature decreases. Let Λβ ⊂ Z2 be a square box with
periodic boundary conditions such that limβ→∞ |Λβ | = ∞. We run the Glauber dy-
namics on Λβ starting from a random initial configuration where all the droplets
(= clusters of plus-spins) are small. For large β, and in the parameter range cor-
responding to the metastable regime (recall Sect. 17.1.2), the transition from the
metastable state (with only subcritical droplets) to the stable state (with one or
more supercritical droplets) is triggered by the appearance of a single critical droplet
somewhere in $\Lambda_\beta$. We will show that the average time until this happens scales like
$e^{\Gamma\beta}/N(\ell_c)|\Lambda_\beta|$, where Γ and $N(\ell_c)$ are the same quantities as for small volumes (recall
Sect. 17.1.3). This scaling is valid as long as the average nucleation time tends to
infinity.

19.1 Introduction and main results

19.1.1 Glauber dynamics in large volumes

We retain the setting of Sect. 17.1.1, except that we replace the torus Λ ⊂ Z2 by
a β-dependent torus Λβ ⊂ Z2 . Accordingly, we write Sβ , Hβ instead of S, H to
indicate that the configuration space and the Hamiltonian also depend on β.

Subcritical, protocritical and critical configurations We want to start our

Glauber dynamics on Λβ from an initial configuration in which all droplets are
sufficiently small. To make this notion precise, we need the following definitions.

Definition 19.1
(a) Let $C_B(\sigma)$, $\sigma \in S_\beta$, be the configuration that is obtained from σ by a “bootstrap
percolation map”, i.e., by circumscribing all the droplets in σ with rectangles,
and continuing to do so in an iterative manner until a union of disjoint rectangles
is obtained.
(b) Call $C_B(\sigma)$ subcritical if all its rectangles fit inside the protocritical droplets for
Glauber dynamics in Chap. 17, and are at distance ≥ 2 from each other (i.e., are
non-interacting).

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_19
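The iterative circumscription in Definition 19.1(a) can be sketched in code. This is only a minimal illustration under stated assumptions, not the construction used in the proofs: it works on the plane with free boundary rather than on the torus, it takes droplets to be nearest-neighbour connected clusters of plus-spins, and the helper names `clusters`, `bounding_rectangle` and `bootstrap_map` are hypothetical.

```python
def clusters(sites):
    """Split a set of lattice sites into nearest-neighbour connected components."""
    remaining, components = set(sites), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            x, y = stack.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


def bounding_rectangle(component):
    """All sites of the smallest rectangle that circumscribes a cluster."""
    xs = [x for x, _ in component]
    ys = [y for _, y in component]
    return {(x, y)
            for x in range(min(xs), max(xs) + 1)
            for y in range(min(ys), max(ys) + 1)}


def bootstrap_map(plus_sites):
    """Circumscribe all clusters with rectangles and repeat until nothing changes."""
    current = set(plus_sites)
    while True:
        filled = set().union(*(bounding_rectangle(c) for c in clusters(current)))
        if filled == current:      # every cluster is already a full rectangle
            return current
        current = filled


# An L-shaped droplet is filled up to a 3 x 3 square; a far-away single spin is untouched.
sigma = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (5, 5)}
image = bootstrap_map(sigma)
```

Since each pass can only add plus-spins inside the bounding box of the initial support, the loop terminates. On the torus the bounding rectangle of a cluster that wraps around would need separate treatment, which this sketch does not attempt.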

Definition 19.2
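(a) $\mathcal{S} = \{\sigma \in S_\beta \colon\, C_B(\sigma) \text{ is subcritical}\}$.
(b) $\mathcal{P} = \{\sigma \in \mathcal{S} \colon\, c_\beta(\sigma,\sigma') > 0 \text{ for some } \sigma' \in \mathcal{S}^c\}$.
(c) $\mathcal{C} = \{\sigma' \in \mathcal{S}^c \colon\, c_\beta(\sigma,\sigma') > 0 \text{ for some } \sigma \in \mathcal{S}\}$.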

We refer to S , P and C as the set of subcritical, protocritical, respectively,

critical configurations. Note that, for every σ ∈ Sβ , each step in the bootstrap per-
colation map σ → CB (σ ) decreases the energy, and therefore the Glauber dynamics
moves from σ to CB (σ ) in a time of order one. This is why CB (σ ) appears in the
definition of S . The subcritical configurations therefore are the analogues of the
subcritical droplets we encountered in Sect. 17.1.

Remark 19.3 The sets $\mathcal{P}$, $\mathcal{C}$ will play a similar rôle as, but are not directly comparable
with, the sets $\mathcal{P}^\star$, $\mathcal{C}^\star$ in Chap. 17.

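Sets of starting configurations For $\ell_1, \ell_2 \in \mathbb{N}$, let $R_{\ell_1,\ell_2}(x) \subset \Lambda_\beta$ be the $\ell_1 \times \ell_2$
rectangle whose lower-left corner is x. (We always take $\ell_1 \le \ell_2$ and allow for both
orientations of the rectangle, i.e., $R_{\ell_1,\ell_2}(x)$ actually represents two rectangles.) For
$L = 1, \ldots, 2\ell_c - 3$, let $Q_L(x)$ denote the L-th element in the canonical sequence
of growing squares and quasi-squares

\[
R_{1,2}(x),\ R_{2,2}(x),\ R_{2,3}(x),\ R_{3,3}(x),\ \ldots,\ R_{\ell_c-1,\ell_c-1}(x),\ R_{\ell_c-1,\ell_c}(x).
\tag{19.1.1}
\]

Our starting configurations will be drawn from one of the sets $\mathcal{S}_L \subset \mathcal{S}$ defined by

\[
\mathcal{S}_L = \bigl\{\sigma \in \mathcal{S} \colon\, \text{each rectangle in } C_B(\sigma) \text{ fits inside } Q_L(x) \text{ for some } x \in \Lambda_\beta\bigr\},
\tag{19.1.2}
\]

for any $L \in \mathbb{N}$ that satisfies $L^* \le L \le 2\ell_c - 3$ with

\[
L^* = \min\Bigl\{1 \le L \le 2\ell_c - 3 \colon\, \lim_{\beta\to\infty} \mu_\beta(\mathcal{S}_L)/\mu_\beta(\mathcal{S}) = 1\Bigr\}.
\tag{19.1.3}
\]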

In words, SL is the subset of those subcritical configurations whose droplets fit

inside a square or quasi-square labelled by L, with L chosen large enough so that
SL is typical within S under the Gibbs measure μβ associated with Hβ in the limit
as β → ∞. (Our main theorem in Sect. 19.1.2 turns out not to depend on the choice
of L subject to these restrictions.)
Note that $\mathcal{S}_{2\ell_c-3} = \mathcal{S}$. The value of $L^*$ depends on how fast $\Lambda_\beta$ grows with β.
In Sect. 19.6 we show that, for every $1 \le L \le 2\ell_c - 4$,

\[
\lim_{\beta\to\infty} \mu_\beta(\mathcal{S}_L)/\mu_\beta(\mathcal{S}) = 1
\quad\text{if and only if}\quad
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma_{L+1}} = 0
\tag{19.1.4}
\]

with $\Gamma_{L+1}$ the energy needed to create a droplet $Q_{L+1}(0)$ at the origin. Thus, if
$|\Lambda_\beta| = e^{\theta\beta}$, then $L^* = L^*(\theta) = (2\ell_c - 3) \wedge \min\{L \in \mathbb{N} \colon\, \Gamma_{L+1} > \theta\}$, which
increases stepwise from 1 to $2\ell_c - 3$ as θ increases from 0 to Γ, with Γ the
communication height in Chap. 17.
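The stepwise dependence of $L^*(\theta)$ on θ can be made concrete with a short numerical sketch. It rests on two assumptions recalled from the standard droplet picture of Chap. 17 and not restated in this section: the energy of an $\ell_1 \times \ell_2$ rectangle of plus-spins relative to the all-minus configuration is $2J(\ell_1+\ell_2) - h\ell_1\ell_2$, and $\ell_c = \lceil 2J/h \rceil$ with 2J/h not an integer. The function names are hypothetical.

```python
import math


def critical_length(J, h):
    """l_c = ceil(2J/h), assuming 2J/h is not an integer."""
    return math.ceil(2 * J / h)


def quasi_square(L):
    """Side lengths (l1, l2) of the L-th element of the sequence (19.1.1)."""
    return (L + 2) // 2, (L + 3) // 2


def droplet_energy(l1, l2, J, h):
    """Assumed energy of an l1 x l2 rectangle: J * perimeter - h * area."""
    return 2 * J * (l1 + l2) - h * l1 * l2


def l_star(theta, J, h):
    """L*(theta) = (2 l_c - 3) ^ min{L : Gamma_{L+1} > theta} for |Lambda_beta| = exp(theta * beta)."""
    l_max = 2 * critical_length(J, h) - 3
    for L in range(1, l_max):
        if droplet_energy(*quasi_square(L + 1), J, h) > theta:
            return L
    return l_max


# Example with J = 1, h = 0.3, so that l_c = 7 and 2 l_c - 3 = 11.
steps = [(theta, l_star(theta, 1.0, 0.3)) for theta in (0.1, 7.0, 9.0, 100.0)]
```

In this example the energies of $Q_2$ and $Q_3$ are roughly 6.8 and 8.2, so $L^*$ jumps from 1 to 2 when θ crosses the first value and from 2 to 3 when it crosses the second.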

Initial distribution For non-empty disjoint sets $A, B \subset S_\beta$, we recall that $\nu_{A,B}$
denotes the last-exit biased distribution on A for the crossover to B, defined in
(7.1.38) as

\[
\nu_{A,B}(\sigma) = \frac{\mu_\beta(\sigma)\, e_{A,B}(\sigma)}{\mathrm{cap}(A,B)}, \qquad \sigma \in A,
\tag{19.1.5}
\]

where $e_{A,B}$ is the equilibrium measure defined in (7.1.21).
We choose the initial distribution to be biased according to the last exit of $\mathcal{S}_L$
for the transition from $\mathcal{S}_L$ to a target set in $\mathcal{S}^c$. Three choices for this target set are
made in Sect. 19.1.2, namely, $\mathcal{S}^c$, $\mathcal{S}^c\setminus\mathcal{C}$ and $\mathcal{D}_M$, $M \in \mathbb{N}$, $M \ge \ell_c$, defined by

\[
\mathcal{D}_M = \bigl\{\sigma \in S_\beta \colon\, \exists\, x \in \Lambda_\beta \text{ such that } \mathrm{supp}[C_B(\sigma)] \supset R_{M,M}(x)\bigr\},
\tag{19.1.6}
\]

which is the set of configurations containing a supercritical droplet of size M.

19.1.2 Main theorem

Throughout this chapter we assume that we are in the metastable regime where
h ∈ (0, 2J) (recall Sect. 17.1.2). We further assume that

\[
\lim_{\beta\to\infty} |\Lambda_\beta| = \infty, \qquad \lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma} = 0.
\tag{19.1.7}
\]

The second condition ensures that the existence of a critical droplet anywhere in the
box is still a rare event, which occurs only after a long time. If this condition were
violated, then the metastable transition would no longer be dominated by the time
of nucleation, but by the growth of supercritical droplets that exist somewhere far
away.
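For $\sigma \in S_\beta$, let $P_\sigma$ denote the law of the Glauber dynamics starting from σ. For
ν a probability distribution on $S_\beta$, write

\[
P_\nu(\cdot) = \sum_{\sigma \in S_\beta} \nu(\sigma)\, P_\sigma(\cdot).
\tag{19.1.8}
\]

Abbreviate

\[
N_1 = N_1(\ell_c) = 4\ell_c, \qquad N_2 = N_2(\ell_c) = \tfrac{4}{3}(2\ell_c - 1).
\tag{19.1.9}
\]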

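Theorem 19.4 (Mean crossover time) Subject to (19.1.3) and (19.1.7), the following hold:

(a)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma}\, E_{\nu_{\mathcal{S}_L,\mathcal{S}^c}}(\tau_{\mathcal{S}^c}) = \frac{1}{N_1}.
\tag{19.1.10}
\]
(b)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma}\, E_{\nu_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}}(\tau_{\mathcal{S}^c\setminus\mathcal{C}}) = \frac{1}{N_2}.
\tag{19.1.11}
\]
(c)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma}\, E_{\nu_{\mathcal{S}_L,\mathcal{D}_M}}(\tau_{\mathcal{D}_M}) = \frac{1}{N_2}, \qquad \forall\, \ell_c \le M \le 2\ell_c - 1.
\tag{19.1.12}
\]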

19.1.3 Discussion

1. Theorem 19.4(a) says that the average time to create a critical droplet is
$[1 + o(1)]\,e^{\beta\Gamma}/N_1|\Lambda_\beta|$. Theorems 19.4(b–c) say that the average time to go beyond this
critical droplet and to grow a droplet that is twice as large is $[1 + o(1)]\,e^{\beta\Gamma}/N_2|\Lambda_\beta|$.
The factor $N_1$ counts the number of shapes of the critical droplet, while $|\Lambda_\beta|$ counts
the number of locations. The average times to create a critical, respectively, a
supercritical droplet differ by a factor $N_2/N_1 < 1$. This is because, as we saw in
Sect. 17.1.4, item 3, once the dynamics is “on top of the hill” $\mathcal{C}$ it has a positive
probability to “fall back” to $\mathcal{S}$. On average the dynamics makes $N_1/N_2 > 1$
attempts to reach the top $\mathcal{C}$ before it finally “falls over” to $\mathcal{S}^c\setminus\mathcal{C}$. After that, it
rapidly grows a large droplet.

2. If the second condition in (19.1.7) fails, then there is a positive probability to see a
protocritical droplet in Λβ under the starting measure νSL ,S c , and nucleation sets in
immediately. In that situation different questions about the system become relevant,
which are no longer nucleation-driven but are growth-driven (see Chap. 23). Theo-
rem 19.4(a) continues to be true, but it no longer describes metastable behaviour.

3. The average probability under the Gibbs measure μβ of destroying a supercritical

droplet and returning to a configuration in SL is exponentially small in β. We defer
the proof of this fact to Chap. 20, where we consider Kawasaki dynamics. The proof
for Glauber is easily read off from the one for Kawasaki. Thus, the crossover from
SL to S c \C truly represents the threshold for nucleation, and Theorem 19.4(b)
truly represents the nucleation time.
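To get a feeling for the orders of magnitude in item 1, the leading-order expressions can be evaluated numerically. The sketch below is illustrative only. It assumes $|\Lambda_\beta| = e^{\theta\beta}$ with 0 < θ < Γ, and it uses the small-volume formulas $\ell_c = \lceil 2J/h \rceil$ and $\Gamma = 4J\ell_c - h(\ell_c(\ell_c-1)+1)$ recalled from Chap. 17, which are assumptions here since this section does not restate them. The function name is hypothetical.

```python
import math


def mean_crossover_times(J, h, beta, theta):
    """Leading-order mean times of Theorem 19.4 for |Lambda_beta| = exp(theta * beta)."""
    l_c = math.ceil(2 * J / h)                       # assumed critical side length
    gamma = 4 * J * l_c - h * (l_c * (l_c - 1) + 1)  # assumed communication height
    if not 0 < theta < gamma:
        raise ValueError("need 0 < theta < Gamma so that (19.1.7) holds")
    n1 = 4 * l_c
    n2 = 4 * (2 * l_c - 1) / 3
    scale = math.exp(beta * (gamma - theta))         # e^{beta Gamma} / |Lambda_beta|
    return {
        "l_c": l_c,
        "Gamma": gamma,
        "t_critical": scale / n1,   # time to create a critical droplet
        "t_beyond": scale / n2,     # time to go beyond the critical droplet
        "attempts": n1 / n2,        # average number of visits to the top of the hill
    }


result = mean_crossover_times(J=1.0, h=0.3, beta=2.0, theta=5.0)
```

For J = 1 and h = 0.3 this gives $\ell_c = 7$, so the dynamics needs on average $N_1/N_2 = 21/13$ visits to the critical set before it crosses over.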

Outline Theorem 19.4 is proved in Sects. 19.2–19.4. Along the way we need
two technical facts whose proofs are deferred to Sects. 19.5–19.6. These deal with
sparseness of subcritical droplets and typicality of starting configurations, respec-
tively.

19.2 Average time to create a critical droplet

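To estimate the average crossover time from $\mathcal{S}_L \subset \mathcal{S}$ to $\mathcal{S}^c$ in Theorem 19.4(a),
we will use (7.1.41) in Corollary 7.11. With $A = \mathcal{S}_L$ and $B = \mathcal{S}^c$, this relation
reads

\[
\sum_{\sigma \in \mathcal{S}_L} \nu_{\mathcal{S}_L,\mathcal{S}^c}(\sigma)\, E_\sigma(\tau_{\mathcal{S}^c})
= \frac{1}{\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c)} \sum_{\sigma \in \mathcal{S}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c}(\sigma).
\tag{19.2.1}
\]

The left-hand side is the quantity of interest in (19.1.10). In Sects. 19.2.1–19.2.2 we
estimate $\sum_{\sigma \in \mathcal{S}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c}(\sigma)$ and $\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c)$. The estimates will show that

\[
\text{r.h.s. (19.2.1)} = \frac{1}{N_1|\Lambda_\beta|}\, e^{\beta\Gamma}\,[1 + o(1)], \qquad \beta \to \infty.
\tag{19.2.2}
\]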
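The identity (19.2.1) holds for any reversible Markov chain, so it can be checked by hand on a toy example. The sketch below does this exactly, with rational arithmetic, for a discrete-time random walk on a small weighted graph. Everything in it is an illustrative assumption: the five-state network, the choice A = {0, 1}, B = {4}, and the helper names. It is not the Glauber dynamics of this chapter.

```python
from fractions import Fraction

# Hypothetical toy network: symmetric edge weights on the states 0..4.
WEIGHTS = {(0, 1): 3, (0, 2): 1, (1, 2): 1, (2, 3): 2, (3, 4): 1}
STATES = range(5)


def conductance(x, y):
    return Fraction(WEIGHTS.get((min(x, y), max(x, y)), 0))


# Reversible measure and transition probabilities of the weighted random walk.
MU = {x: sum(conductance(x, y) for y in STATES) for x in STATES}


def p(x, y):
    return conductance(x, y) / MU[x]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (exact for Fractions)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def equilibrium_potential(A, B):
    """h(x) = P_x(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    inner = [x for x in STATES if x not in A and x not in B]
    a = [[(1 if x == y else 0) - p(x, y) for y in inner] for x in inner]
    b = [sum(p(x, y) for y in A) for x in inner]
    h = dict(zip(inner, solve(a, b)))
    h.update({x: Fraction(1) for x in A})
    h.update({x: Fraction(0) for x in B})
    return h


def mean_hitting_times(B):
    """w(x) = E_x(tau_B) for x outside B."""
    outside = [x for x in STATES if x not in B]
    a = [[(1 if x == y else 0) - p(x, y) for y in outside] for x in outside]
    return dict(zip(outside, solve(a, [Fraction(1)] * len(outside))))


A, B = {0, 1}, {4}
h = equilibrium_potential(A, B)
# Equilibrium measure on A: probability to reach B before returning to A.
e = {x: sum(p(x, y) * (1 - h[y]) for y in STATES) for x in A}
cap = sum(MU[x] * e[x] for x in A)
nu = {x: MU[x] * e[x] / cap for x in A}      # last-exit biased distribution
w = mean_hitting_times(B)
lhs = sum(nu[x] * w[x] for x in A)           # left-hand side of the identity
rhs = sum(MU[y] * h[y] for y in STATES if y not in B) / cap
```

Both sides come out equal to 25 in this example, and the capacity 1/2 agrees with the series reduction of the network. The point of the identity is that the mean crossover time is reduced to two static quantities, a weighted sum of the equilibrium potential and a capacity, which is what Sects. 19.2.1–19.2.2 estimate.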

19.2.1 Estimate of the equilibrium potential

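Lemma 19.5 $\sum_{\sigma \in \mathcal{S}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c}(\sigma) = \mu_\beta(\mathcal{S})[1 + o(1)]$ as β → ∞.

Proof Write, using (7.1.16),

\[
\begin{aligned}
\sum_{\sigma \in \mathcal{S}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c}(\sigma)
&= \sum_{\sigma \in \mathcal{S}_L} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c}(\sigma) + \sum_{\sigma \in \mathcal{S}\setminus\mathcal{S}_L} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c}(\sigma) \\
&= \mu_\beta(\mathcal{S}_L) + \sum_{\sigma \in \mathcal{S}\setminus\mathcal{S}_L} \mu_\beta(\sigma)\, P_\sigma(\tau_{\mathcal{S}_L} < \tau_{\mathcal{S}^c}).
\end{aligned}
\tag{19.2.3}
\]

The last sum is bounded above by $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L)$. But $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) = o(\mu_\beta(\mathcal{S}))$ as
β → ∞ by our choice of L in (19.1.3).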

19.2.2 Estimate of the capacity

Lemma 19.6 $\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c) = N_1|\Lambda_\beta|\,e^{-\beta\Gamma}\mu_\beta(\mathcal{S})[1 + o(1)]$ as β → ∞ with
$N_1 = 4\ell_c$.

Proof The proof proceeds via upper and lower bounds, which are written out below.

Fig. 19.1 $R_{\ell_c-1,\ell_c}(x)$ (shaded box) and $[R_{\ell_c+1,\ell_c+2}(x - (1,1))]^c$ (complement of dotted box)

Upper bound

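Proof We use the Dirichlet principle and a test function that is equal to 1 on $\mathcal{S}$ to
get the upper bound

\[
\begin{aligned}
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c) \le \mathrm{CAP}(\mathcal{S},\mathcal{S}^c)
&= \sum_{\substack{\sigma \in \mathcal{S},\, \sigma' \in \mathcal{S}^c \\ c_\beta(\sigma,\sigma') > 0}} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma') \\
&= \sum_{\substack{\sigma \in \mathcal{S},\, \sigma' \in \mathcal{S}^c \\ c_\beta(\sigma,\sigma') > 0}} \bigl[\mu_\beta(\sigma) \wedge \mu_\beta(\sigma')\bigr] \le \mu_\beta(\mathcal{C}),
\end{aligned}
\tag{19.2.4}
\]

where the second equality uses reversibility in combination with the fact that
$c_\beta(\sigma,\sigma') \vee c_\beta(\sigma',\sigma) = 1$. Thus, it suffices to show that

\[
\mu_\beta(\mathcal{C}) \le N_1|\Lambda_\beta|\, e^{-\beta\Gamma}\mu_\beta(\mathcal{S})[1 + o(1)] \qquad \text{as } \beta \to \infty.
\tag{19.2.5}
\]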

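For every $\sigma \in \mathcal{P}$ there are one or more rectangles $R_{\ell_c-1,\ell_c}(x)$, $x = x(\sigma) \in \Lambda_\beta$, that
are filled by (+1)-spins in $C_B(\sigma)$. If $\sigma' \in \mathcal{C}$ is such that $\sigma' = \sigma^y$ for some $y \in \Lambda_\beta$,
then σ' has a (+1)-spin at y situated on the boundary of one of these rectangles
(recall Definition 19.2). Let

\[
\begin{aligned}
\hat{\mathcal{S}}(x) &= \bigl\{\sigma \in \mathcal{S} \colon\, \mathrm{supp}[\sigma] \subseteq R_{\ell_c-1,\ell_c}(x)\bigr\}, \\
\check{\mathcal{S}}(x) &= \bigl\{\sigma \in \mathcal{S} \colon\, \mathrm{supp}[\sigma] \subseteq [R_{\ell_c+1,\ell_c+2}(x - (1,1))]^c\bigr\}.
\end{aligned}
\tag{19.2.6}
\]

For every $\sigma \in \mathcal{P}$, we have $\sigma = \hat\sigma \vee \check\sigma$ for some $\hat\sigma \in \hat{\mathcal{S}}(x)$ and $\check\sigma \in \check{\mathcal{S}}(x)$ with
x = x(σ), uniquely decomposing the configuration into two non-interacting parts
inside $R_{\ell_c-1,\ell_c}(x)$ and $[R_{\ell_c+1,\ell_c+2}(x - (1,1))]^c$ (see Fig. 19.1). We have

\[
H_\beta(\sigma) - H_\beta(\boxminus) = \bigl[H_\beta(\hat\sigma) - H_\beta(\boxminus)\bigr] + \bigl[H_\beta(\check\sigma) - H_\beta(\boxminus)\bigr].
\tag{19.2.7}
\]

Moreover, for any $y \notin \mathrm{supp}[C_B(\sigma)]$, we have

\[
H_\beta(\sigma^y) \ge H_\beta(\sigma) + 2J - h.
\tag{19.2.8}
\]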

Fig. 19.2 Canonical order to break down a critical droplet

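Hence

\[
\begin{aligned}
\mu_\beta(\mathcal{C}) &= \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{P}} \sum_{\substack{x \in \Lambda_\beta \\ \sigma^x \in \mathcal{C}}} e^{-\beta H_\beta(\sigma^x)} \\
&\le N_1\, \frac{1}{Z_\beta}\, e^{-\beta[2J - h - H_\beta(\boxminus)]} \sum_{x \in \Lambda_\beta} \sum_{\check\sigma \in \check{\mathcal{S}}(x)} e^{-\beta H_\beta(\check\sigma)} \sum_{\substack{\hat\sigma \in \hat{\mathcal{S}}(x) \\ \hat\sigma \vee \check\sigma \in \mathcal{P}}} e^{-\beta H_\beta(\hat\sigma)} \\
&\le [1 + o(1)]\, \frac{1}{Z_\beta}\, N_1 |\Lambda_\beta|\, e^{-\beta\Gamma} \sum_{\check\sigma \in \check{\mathcal{S}}(0)} e^{-\beta H_\beta(\check\sigma)} \\
&= [1 + o(1)]\, N_1 |\Lambda_\beta|\, e^{-\beta\Gamma}\, \mu_\beta\bigl(\check{\mathcal{S}}(0)\bigr),
\end{aligned}
\tag{19.2.9}
\]

where the first inequality uses (19.2.7)–(19.2.8), with $N_1 = 2 \times 2\ell_c = 4\ell_c$ counting
the number of critical droplets that can arise from a protocritical droplet via a spin
flip, and the second inequality uses that

\[
\hat\sigma \in \hat{\mathcal{S}}(0),\ \hat\sigma \vee \check\sigma \in \mathcal{P} \implies H_\beta(\hat\sigma) \ge H_\beta\bigl(R_{\ell_c-1,\ell_c}(0)\bigr) = \Gamma - (2J - h) + H_\beta(\boxminus)
\tag{19.2.10}
\]

with equality in the right-hand side if and only if $\mathrm{supp}[\hat\sigma] = R_{\ell_c-1,\ell_c}(0)$. Combining
(19.2.4) and (19.2.9) with the inclusion $\check{\mathcal{S}}(0) \subset \mathcal{S}$, we get the upper bound in
(19.2.5).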
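The mechanism behind the first step of the upper bound, namely that any test function with the right boundary values gives an upper bound on the capacity, can be illustrated on a one-dimensional toy network. The sketch below is a hypothetical example and not the Glauber state space: a path with four edges whose conductances play the rôle of $\mu_\beta(\sigma)c_\beta(\sigma,\sigma')$, with indicator test functions that jump from 1 to 0 across a single edge.

```python
from fractions import Fraction


def path_capacity(conductances):
    """Capacity between the two ends of a path: conductances in series."""
    return 1 / sum(Fraction(1) / c for c in conductances)


def dirichlet_form(conductances, f):
    """E(f) = sum over edges (k, k+1) of c_k * (f(k) - f(k+1))^2."""
    return sum(c * (f[k] - f[k + 1]) ** 2 for k, c in enumerate(conductances))


def indicator(n_sites, cut):
    """Test function equal to 1 on the sites 0..cut and 0 beyond."""
    return [Fraction(1 if k <= cut else 0) for k in range(n_sites)]


# Hypothetical conductances; the second edge is the bottleneck.
conductances = [Fraction(5), Fraction(1, 10), Fraction(3), Fraction(2)]
cap = path_capacity(conductances)
bounds = [dirichlet_form(conductances, indicator(5, cut)) for cut in range(4)]
```

Each indicator gives the conductance of the edge it cuts, so the best of these bounds is the bottleneck conductance 1/10, which is close to the true capacity 30/331. In the proof above the cut is placed between $\mathcal{S}$ and $\mathcal{S}^c$, and the rest of the argument shows that the resulting bound is sharp to leading order.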

Lower bound

Proof We exploit Theorem 7.43 by making a judicious choice for the flow f . In fact,
for Glauber dynamics this choice will be simple: with each configuration σ ∈ SL
we associate a configuration in C ⊂ S c containing a unique critical droplet and a
flow that, from each such configuration, follows a unique deterministic path along
which this droplet is broken down in the canonical order (see Fig. 19.2) until the set
SL is reached, i.e., a square or quasi-square droplet with label L is left over (recall
(19.1.1)–(19.1.2)).
The proof comes in 5 steps.

1. Let w(β) be such that

\[
\lim_{\beta\to\infty} w(\beta) = \infty, \qquad \lim_{\beta\to\infty} \frac{1}{\beta} \ln w(\beta) = 0, \qquad \lim_{\beta\to\infty} |\Lambda_\beta|/w(\beta) = \infty,
\tag{19.2.11}
\]

and define

\[
\mathcal{W} = \bigl\{\sigma \in \mathcal{S} \colon\, |\mathrm{supp}[\sigma]| \le |\Lambda_\beta|/w(\beta)\bigr\}.
\tag{19.2.12}
\]

Let $\mathcal{C}_L \subset \mathcal{C} \subset \mathcal{S}^c$ be the set of configurations obtained by picking any $\sigma \in \mathcal{S}_L \cap \mathcal{W}$
and adding somewhere in $\Lambda_\beta$ a critical droplet at distance ≥ 2 from supp[σ]. Note
that the density restriction imposed on $\mathcal{W}$ guarantees that adding such a droplet
is possible almost everywhere in $\Lambda_\beta$ for β large enough. Denoting by $P_{(y)}(x)$ the
critical droplet obtained by adding a protuberance at y along the longest side of the
rectangle $R_{\ell_c-1,\ell_c}(x)$, we may write

\[
\mathcal{C}_L = \bigl\{\sigma \cup P_{(y)}(x) \colon\, \sigma \in \mathcal{S}_L \cap \mathcal{W},\ x, y \in \Lambda_\beta,\ (x,y) \perp \sigma\bigr\},
\tag{19.2.13}
\]

where (x, y)⊥σ stands for the restriction that the critical droplet $P_{(y)}(x)$ is not
interacting with supp[σ], which implies that $H_\beta(\sigma \cup P_{(y)}(x)) = H_\beta(\sigma) + \Gamma$ (see
Figs. 19.3 and 19.4).

Fig. 19.3 The critical droplet $P_{(y)}(x)$

Fig. 19.4 Going from $\mathcal{S}_L$ to $\mathcal{C}_L$ by adding a critical droplet $P_{(y)}(x)$ somewhere in $\Lambda_\beta$

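2. For each $\sigma \in \mathcal{C}_L$, we let $\gamma_\sigma = (\gamma_\sigma(0), \gamma_\sigma(1), \ldots, \gamma_\sigma(K))$ be the canonical path
from $\sigma = \gamma_\sigma(0)$ to $\mathcal{S}_L$ along which the critical droplet is broken down ($\gamma_\sigma(k) = \sigma_k$
in Fig. 19.2), where $K = v(2\ell_c - 3) - v(L)$ with

\[
v(L) = |Q_L(0)|
\tag{19.2.14}
\]

(recall (19.1.1)). We will choose our flow such that

\[
f(\sigma',\sigma'') =
\begin{cases}
\nu_0(\sigma), & \text{if } \sigma' = \sigma,\ \sigma'' = \gamma_\sigma(1) \text{ for some } \sigma \in \mathcal{C}_L, \\[2pt]
\sum_{\tilde\sigma \in \mathcal{C}_L} f(\gamma_{\tilde\sigma}(k-1), \gamma_\sigma(k)), & \text{if } \sigma' = \gamma_\sigma(k),\ \sigma'' = \gamma_\sigma(k+1) \text{ for some } k \ge 1,\ \sigma \in \mathcal{C}_L, \\[2pt]
0, & \text{otherwise.}
\end{cases}
\tag{19.2.15}
\]

Here, $\nu_0$ is some initial distribution on $\mathcal{C}_L$ that will turn out to be arbitrary as long
as its support is all of $\mathcal{C}_L$.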

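3. We see from (19.2.15) that the flow increases whenever paths merge. In our case
this happens only after the first step, when the protuberance at y is removed.
Therefore we get the explicit form

\[
f(\sigma',\sigma'') =
\begin{cases}
\nu_0(\sigma), & \text{if } \sigma' = \sigma,\ \sigma'' = \gamma_\sigma(1) \text{ for some } \sigma \in \mathcal{C}_L, \\[2pt]
C\nu_0(\sigma), & \text{if } \sigma' = \gamma_\sigma(k),\ \sigma'' = \gamma_\sigma(k+1) \text{ for some } k \ge 1,\ \sigma \in \mathcal{C}_L, \\[2pt]
0, & \text{otherwise,}
\end{cases}
\tag{19.2.16}
\]

where $C = 2\ell_c$ is the number of possible positions of the protuberance on the
protocritical droplet (see Fig. 19.2). Using Theorem 7.43, we therefore have

\[
\begin{aligned}
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c) &= \mathrm{CAP}(\mathcal{S}^c,\mathcal{S}_L) \ge \mathrm{CAP}(\mathcal{C}_L,\mathcal{S}_L) \\
&\ge \sum_{\sigma \in \mathcal{C}_L} \nu_0(\sigma) \left[\sum_{k=0}^{K-1} \frac{f(\gamma_\sigma(k),\gamma_\sigma(k+1))}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))}\right]^{-1} \\
&= \sum_{\sigma \in \mathcal{C}_L} \left[\frac{1}{\mu_\beta(\sigma)\, c_\beta(\gamma_\sigma(0),\gamma_\sigma(1))} + \sum_{k=1}^{K-1} \frac{C}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))}\right]^{-1}.
\end{aligned}
\tag{19.2.17}
\]

Thus, all we have to do is to control the sum between square brackets.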

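4. Because $c_\beta(\gamma_\sigma(0),\gamma_\sigma(1)) = 1$ (removing the protuberance lowers the energy),
the term with k = 0 equals $1/\mu_\beta(\sigma)$. To show that the terms with k ≥ 1 are of
higher order, we argue as follows. Abbreviate $\Xi = h(\ell_c - 2)$. For every k ≥ 1 and
$\sigma = \gamma_\sigma(0) \in \mathcal{C}_L$, we have (see Fig. 19.5)

\[
\begin{aligned}
\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1)) &= \frac{1}{Z_\beta}\, e^{-\beta[H_\beta(\gamma_\sigma(k)) \vee H_\beta(\gamma_\sigma(k+1))]} \\
&\ge \mu_\beta(\sigma_0)\, e^{\beta[2J - h - \Xi]} = \mu_\beta(\sigma)\, e^{\delta\beta},
\end{aligned}
\tag{19.2.18}
\]

where $\delta = 2J - h - \Xi = 2J - h(\ell_c - 1) > 0$. Therefore

\[
\sum_{k=1}^{K-1} \frac{C}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))} \le \frac{1}{\mu_\beta(\sigma)}\, CK e^{-\delta\beta},
\tag{19.2.19}
\]

and so from (19.2.17) we get

\[
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c) \ge \sum_{\sigma \in \mathcal{C}_L} \frac{\mu_\beta(\sigma)}{1 + CK e^{-\beta\delta}} = \frac{\mu_\beta(\mathcal{C}_L)}{1 + CK e^{-\beta\delta}} = [1 + o(1)]\, \mu_\beta(\mathcal{C}_L).
\tag{19.2.20}
\]

Fig. 19.5 Visualization of (19.2.18)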

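5. Finally, we estimate, with the help of (19.2.13),

\[
\begin{aligned}
\mu_\beta(\mathcal{C}_L) &= \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{C}_L} e^{-\beta H_\beta(\sigma)} = \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{S}_L \cap \mathcal{W}} \sum_{\substack{x, y \in \Lambda_\beta \\ (x,y) \perp \sigma}} e^{-\beta H_\beta(\sigma \cup P_{(y)}(x))} \\
&= e^{-\beta\Gamma}\, \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{S}_L \cap \mathcal{W}} e^{-\beta H_\beta(\sigma)} \sum_{\substack{x, y \in \Lambda_\beta \\ (x,y) \perp \sigma}} 1 \\
&\ge e^{-\beta\Gamma}\, \mu_\beta(\mathcal{S}_L \cap \mathcal{W})\, N_1 |\Lambda_\beta| \bigl[1 - (\ell_c + 1)^2/w(\beta)\bigr].
\end{aligned}
\tag{19.2.21}
\]

The last inequality uses that $|\Lambda_\beta|(\ell_c + 1)^2/w(\beta)$ is the maximal number of sites
in $\Lambda_\beta$ where it is not possible to insert a non-interacting critical droplet (recall
(19.2.12) and note that a critical droplet fits inside an $\ell_c \times \ell_c$ square). Finally,
according to Lemma 19.9 in Sect. 19.5, we have

\[
\mu_\beta(\mathcal{S}_L \cap \mathcal{W}) = \mu_\beta(\mathcal{S}_L)[1 + o(1)],
\tag{19.2.22}
\]

while conditions (19.1.2)–(19.1.3) imply that $\mu_\beta(\mathcal{S}_L) = \mu_\beta(\mathcal{S})[1 + o(1)]$. Combining
the latter with (19.2.20)–(19.2.21), we obtain the desired lower bound.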
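The structure of the flow bound in steps 2–4, where C paths merge after their first step and the remaining steps are cheap, can be mimicked on a toy electrical network. In the sketch below, which is a hypothetical illustration and not the Glauber dynamics, C sources are each joined by an edge of conductance a to a common node, and a chain with conductances $b_1, \ldots, b_{K-1}$ leads from that node to the target. The per-path expression is the analogue of the last line of (19.2.17) when all first edges have the same conductance.

```python
from fractions import Fraction


def flow_lower_bound(a, b_chain, n_sources):
    """Sum over the C sources of [1/a + C * sum_k 1/b_k]^(-1)."""
    c = n_sources
    per_path = 1 / (Fraction(1) / a + c * sum(Fraction(1) / b for b in b_chain))
    return c * per_path


def exact_capacity(a, b_chain, n_sources):
    """Series/parallel reduction: C parallel first edges, followed by the chain."""
    resistance = Fraction(1) / (n_sources * a) + sum(Fraction(1) / b for b in b_chain)
    return 1 / resistance


# Hypothetical numbers: C = 14 sources, chain conductances 1000 times larger than a,
# mimicking the factor exp(delta * beta) gained along the downhill part of the path.
a, b_chain, n_sources = Fraction(1), [Fraction(1000)] * 5, 14
bound = flow_lower_bound(a, b_chain, n_sources)
exact = exact_capacity(a, b_chain, n_sources)
```

For this tree-like network the merged flow is the harmonic one, so the bound coincides with the exact capacity 1400/107, which is close to Ca = 14. This is the toy version of the statement that the k ≥ 1 terms only contribute a factor $1 + CKe^{-\beta\delta}$.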

19.3 Average time to go beyond the critical droplet

To prove Theorem 19.4(b) we use the same technique as in Sect. 19.2. Therefore we
only give a sketch of the proof.
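To estimate the average crossover time from $\mathcal{S}_L \subset \mathcal{S}$ to $\mathcal{S}^c\setminus\mathcal{C}$, we again use
Corollary 7.11, this time with $A = \mathcal{S}_L$ and $B = \mathcal{S}^c\setminus\mathcal{C}$:

\[
\sum_{\sigma \in \mathcal{S}_L} \nu_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma)\, E_\sigma(\tau_{\mathcal{S}^c\setminus\mathcal{C}})
= \frac{1}{\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C})} \sum_{\sigma \in \mathcal{S} \cup \mathcal{C}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma).
\tag{19.3.1}
\]

The left-hand side is the quantity of interest in (19.1.11). In Sects. 19.3.1–19.3.2 we
estimate both $\sum_{\sigma \in \mathcal{S} \cup \mathcal{C}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma)$ and $\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C})$. The estimates
will show that

\[
\text{r.h.s. (19.3.1)} = \frac{1}{N_2|\Lambda_\beta|}\, e^{\beta\Gamma}\,[1 + o(1)], \qquad \beta \to \infty.
\tag{19.3.2}
\]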

19.3.1 Estimate of the equilibrium potential

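Lemma 19.7 $\sum_{\sigma \in \mathcal{S} \cup \mathcal{C}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma) = \mu_\beta(\mathcal{S})[1 + o(1)]$ as β → ∞.

Proof Write, using (7.1.16),

\[
\sum_{\sigma \in \mathcal{S} \cup \mathcal{C}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma) = \mu_\beta(\mathcal{S}_L) + \sum_{\sigma \in (\mathcal{S}\setminus\mathcal{S}_L) \cup \mathcal{C}} \mu_\beta(\sigma)\, P_\sigma(\tau_{\mathcal{S}_L} < \tau_{\mathcal{S}^c\setminus\mathcal{C}}).
\tag{19.3.3}
\]

The last sum is bounded above by $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) + \mu_\beta(\mathcal{C})$. As before, $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) = o(\mu_\beta(\mathcal{S}))$
as β → ∞. But (19.1.7) and (19.2.9) imply that $\mu_\beta(\mathcal{C}) = o(\mu_\beta(\mathcal{S}))$ as
β → ∞.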

19.3.2 Estimate of the capacity

Lemma 19.8 $\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}) = N_2|\Lambda_\beta|\,e^{-\beta\Gamma}\mu_\beta(\mathcal{S})[1 + o(1)]$ as β → ∞ with
$N_2 = \tfrac{4}{3}(2\ell_c - 1)$.

Proof The proof is similar to that of Lemma 19.6, except that it takes care of the
transition probabilities away from the critical droplet (see Fig. 19.6, where σ is the
configuration that is reached through these transitions). The proof again proceeds
via upper and lower bounds, which are written out below.

Fig. 19.6 Canonical order to break down a proto-critical droplet plus a double protuberance. In
the first step, the double protuberance has probability $\tfrac12$ to be broken down in either of the two
possible ways. The subsequent steps are deterministic as in Fig. 19.2

Upper bound

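Proof Recalling (7.1.35) and Lemma 7.12, and noting that Glauber dynamics does
not allow transitions within $\mathcal{C}$, we have, for all $h \colon \mathcal{C} \to [0,1]$,

\[
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}) \le \mathrm{CAP}(\mathcal{S},\mathcal{S}^c\setminus\mathcal{C})
\le \sum_{\sigma \in \mathcal{C}} \mu_\beta(\sigma) \bigl[\hat{c}_\sigma (h(\sigma) - 1)^2 + \check{c}_\sigma (h(\sigma) - 0)^2\bigr],
\tag{19.3.4}
\]

where $\hat{c}_\sigma = \sum_{\eta \in \mathcal{S}} c_\beta(\sigma,\eta)$ and $\check{c}_\sigma = \sum_{\eta \in \mathcal{S}^c\setminus\mathcal{C}} c_\beta(\sigma,\eta)$. The quadratic form in the
right-hand side of (19.3.4) achieves its minimum for $h(\sigma) = \hat{c}_\sigma/(\hat{c}_\sigma + \check{c}_\sigma)$, so

\[
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}) \le \sum_{\sigma \in \mathcal{C}} C_\sigma\, \mu_\beta(\sigma)
\tag{19.3.5}
\]

with $C_\sigma = \hat{c}_\sigma \check{c}_\sigma/(\hat{c}_\sigma + \check{c}_\sigma)$. We have

\[
\begin{aligned}
\sum_{\sigma \in \mathcal{C}} C_\sigma\, \mu_\beta(\sigma) &= \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{P}} \sum_{\substack{x \in \Lambda_\beta \\ \sigma^x \in \mathcal{C}}} C_{\sigma^x}\, e^{-\beta H_\beta(\sigma^x)} \\
&= e^{-\beta(2J - h)}\, \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{P}} e^{-\beta H_\beta(\sigma)}\; 2\bigl[\tfrac12 \cdot 4 + \tfrac23 (2\ell_c - 4)\bigr] \\
&= e^{-\beta(2J - h)}\, \mu_\beta(\mathcal{P})\, N_2 = \mu_\beta(\mathcal{C})\, \frac{N_2}{N_1},
\end{aligned}
\tag{19.3.6}
\]

where in the second line we use that $C_\sigma = \tfrac12$ if σ has a protuberance in a corner
(2 × 4 choices) and $C_\sigma = \tfrac23$ otherwise ($2 \times (2\ell_c - 4)$ choices).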
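The arithmetic behind $N_2$ can be checked in a few lines. The sketch below assumes, consistently with the values $C_\sigma = \tfrac12$ and $C_\sigma = \tfrac23$ quoted above, that a critical configuration has one rate-1 move back to $\mathcal{S}$ (removing the protuberance) and either one or two rate-1 moves forward to $\mathcal{S}^c\setminus\mathcal{C}$ (one if the protuberance sits in a corner, two otherwise). These rates are an inference from the stated constants, and the function names are hypothetical.

```python
from fractions import Fraction


def escape_weight(c_back, c_forward):
    """min over h of c_back*(h-1)^2 + c_forward*h^2, attained at h = c_back/(c_back+c_forward)."""
    return Fraction(c_back * c_forward, c_back + c_forward)


def prefactors(l_c):
    """Return (N1, N2) by counting positions of the protuberance (needs l_c >= 2)."""
    corner = 2 * 4                      # two orientations, four ends of the long sides
    interior = 2 * (2 * l_c - 4)        # remaining positions on the two long sides
    n1 = corner + interior
    n2 = corner * escape_weight(1, 1) + interior * escape_weight(1, 2)
    return n1, n2


n1, n2 = prefactors(7)                  # l_c = 7 gives N1 = 28 and N2 = 52/3
```

The count reproduces $N_1 = 4\ell_c$ and $N_2 = \tfrac43(2\ell_c - 1)$, and the weight $\hat{c}\check{c}/(\hat{c}+\check{c})$ is the familiar formula for two conductances in series.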

Lower bound
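Proof In analogy with (19.2.13), denoting by $P^2_{(y)}(x)$ the droplet obtained by adding
a double protuberance at y along the longest side of the rectangle $R_{\ell_c-1,\ell_c}(x)$, we
define the set $\mathcal{D}_L \subset \mathcal{S}^c\setminus\mathcal{C}$ by

\[
\mathcal{D}_L = \bigl\{\sigma \cup P^2_{(y)}(x) \colon\, \sigma \in \mathcal{S}_L \cap \mathcal{W},\ x, y \in \Lambda_\beta,\ (x,y) \perp \sigma\bigr\}.
\tag{19.3.7}
\]

As in (19.2.15), we may choose any starting measure $\nu_0$ on $\mathcal{D}_L$. We choose the flow
as follows. For the first step we choose

\[
f(\sigma',\sigma) = \tfrac12 \nu_0(\sigma'), \qquad \sigma' \in \mathcal{D}_L,\ \sigma \in \mathcal{C}_L,
\tag{19.3.8}
\]

which reduces the double protuberance to a single protuberance (compare (19.2.13)
and (19.3.7)). For all subsequent steps we follow the deterministic paths $\gamma_\sigma$ used
in Sect. 19.2.2, which start from $\gamma_\sigma(0) = \sigma$. Note, however, that we get different
values for the flows $f(\gamma_\sigma(0),\gamma_\sigma(1))$ depending on whether the protuberance sits in
a corner or not. In the former case, it has only one possible antecedent, and so

\[
f(\gamma_\sigma(0),\gamma_\sigma(1)) = \tfrac12 \nu_0(\sigma'),
\tag{19.3.9}
\]

while in the latter case it has two antecedents, and so

\[
f(\gamma_\sigma(0),\gamma_\sigma(1)) = \nu_0(\sigma').
\tag{19.3.10}
\]

This time the terms k = 0 and k = 1 are of the same order while, as in (19.2.19), all
the subsequent terms give a contribution that is a factor $O(e^{-\delta\beta})$ smaller. Indeed, in
analogy with (19.2.17) we obtain, writing σ' ∼ σ when $c_\beta(\sigma',\sigma) > 0$,

\[
\begin{aligned}
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}) &= \mathrm{CAP}(\mathcal{S}^c\setminus\mathcal{C},\mathcal{S}_L) \ge \mathrm{CAP}(\mathcal{D}_L,\mathcal{S}_L) \\
&\ge \sum_{\sigma' \in \mathcal{D}_L} \sum_{\substack{\sigma \in \mathcal{C}_L \\ \sigma \sim \sigma'}} \tfrac12 \left[\frac{f(\sigma',\sigma)}{\mu_\beta(\sigma)} + \frac{f(\sigma,\gamma_\sigma(1))}{\mu_\beta(\sigma)} + \sum_{k=1}^{K-1} \frac{f(\gamma_\sigma(k),\gamma_\sigma(k+1))}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))}\right]^{-1} \\
&\ge \sum_{\sigma' \in \mathcal{D}_L} \sum_{\substack{\sigma \in \mathcal{C}_L \\ \sigma \sim \sigma'}} \tfrac12\, \mu_\beta(\sigma) \bigl[f(\sigma',\sigma) + f(\sigma,\gamma_\sigma(1)) + CK e^{-\beta\delta}\bigr]^{-1} \\
&= [1 + o(1)]\, \mu_\beta(\mathcal{C}_L) \left[\frac{2\ell_c - 4}{2\ell_c}\, \frac{1}{1 + \tfrac12} + \frac12\, \frac{4}{2\ell_c}\, \frac{1}{\tfrac12 + \tfrac12}\right] \\
&= [1 + o(1)]\, \mu_\beta(\mathcal{C}_L)\, \frac{N_2}{N_1}.
\end{aligned}
\tag{19.3.11}
\]

Using (19.2.21) and the remarks following it, we get the desired lower bound.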

Figure 19.6 depicts the sequence of steps taken to break a protocritical droplet
down.

19.4 Average time to grow a droplet twice the critical size

The proof of Theorem 19.4(c) follows along the same lines as that of
Theorem 19.4(a–b) in Sects. 19.2–19.3. The starting point is the analogue of (19.3.1)
with $\mathcal{S}^c\setminus\mathcal{C}$ replaced by $\mathcal{D}_M$ and $\mathcal{S} \cup \mathcal{C}$ by $\mathcal{D}_M^c$.

19.4.1 Estimate of the equilibrium potential

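Proof Write

\[
\begin{aligned}
\sum_{\sigma \in \mathcal{D}_M^c} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{D}_M}(\sigma)
&= \sum_{\sigma \in \mathcal{S}_L} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{D}_M}(\sigma) + \sum_{\sigma \in \mathcal{D}_M^c\setminus\mathcal{S}_L} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{D}_M}(\sigma) \\
&= \mu_\beta(\mathcal{S}_L) + \sum_{\sigma \in \mathcal{D}_M^c\setminus\mathcal{S}_L} \mu_\beta(\sigma)\, P_\sigma(\tau_{\mathcal{S}_L} < \tau_{\mathcal{D}_M}).
\end{aligned}
\tag{19.4.1}
\]

The last sum is bounded above by $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) + \mu_\beta(\mathcal{D}_M^c\setminus\mathcal{S})$. But $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) = o(\mu_\beta(\mathcal{S}))$
as β → ∞ by our choice of L in (19.1.3), while $\mu_\beta(\mathcal{D}_M^c\setminus\mathcal{S}) = o(\mu_\beta(\mathcal{S}))$
as β → ∞ because of the restriction $\ell_c \le M \le 2\ell_c - 1$. Indeed, under
that restriction the energy of a square droplet of size M is strictly larger than the
energy of a critical droplet.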

19.4.2 Estimate of the capacity

Proof The main point is to prove that $\mathrm{CAP}(\mathcal{S}_L,\mathcal{D}_M) = [1 + o(1)]\,\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C})$.
But $\mathrm{CAP}(\mathcal{S}_L,\mathcal{D}_M) \le \mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C})$. The latter was estimated in Sect. 19.3, and
so we need only prove a lower bound on $\mathrm{CAP}(\mathcal{S}_L,\mathcal{D}_M)$. This is done by using a
flow that breaks down an M × M droplet to a square or quasi-square droplet $Q_L$
in the canonical way, which takes $M^2 - v(L)$ steps (recall Fig. 19.2 and (19.2.14)).
The leading terms are still the protocritical droplet with a single and a double
protuberance. To each M × M droplet is associated a unique critical droplet, so that the
prefactor in the lower bound is the same as in the proof of Theorem 19.4(b).
Note that we can even allow M to grow with β as $M = e^{o(\beta)}$. Indeed, (19.2.11)–
(19.2.12) imply that there is room enough to add a droplet of size $e^{o(\beta)}$ almost
everywhere in $\Lambda_\beta$, and the factor $M^2 e^{-\delta\beta}$ replacing $K e^{-\delta\beta}$ in (19.2.20) still is o(1).

19.5 Sparseness of subcritical droplets

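Recall Definition 19.2(a) and (19.2.11)–(19.2.12). In this section we prove the claim
made in (19.2.22).

Lemma 19.9 $\lim_{\beta\to\infty} \frac{1}{\beta} \ln \frac{\mu_\beta(\mathcal{S}\setminus\mathcal{W})}{\mu_\beta(\mathcal{S})} = -\infty$.

Proof We will prove that $\lim_{\beta\to\infty} \frac{1}{\beta} \ln [\mu_\beta(\mathcal{S}\setminus\mathcal{W})/\mu_\beta(\boxminus)] = -\infty$. Since $\boxminus \in \mathcal{S}$,
this will prove the claim.
Let w(β) be the function satisfying (19.2.11). We begin by noting that

\[
\mu_\beta(\mathcal{S}\setminus\mathcal{W}) \le \mu_\beta(\mathcal{I}) \quad\text{with}\quad \mathcal{I} = \bigl\{\sigma \in \mathcal{S} \colon\, |\mathrm{supp}[C_B(\sigma)]| > |\Lambda_\beta|/w(\beta)\bigr\},
\tag{19.5.1}
\]

because the bootstrap percolation map increases the number of (+1)-spins. Let $\mathcal{D}(k)$
denote the set of configurations whose support consists of k non-interacting subcritical
rectangles. Put $C_1 = (\ell_c + 2)(\ell_c + 1)$. Since the union of a subcritical rectangle
and its exterior boundary has at most $C_1$ sites, it follows that in $\mathcal{I}$ there are at least
$|\Lambda_\beta|/C_1 w(\beta)$ non-interacting rectangles. Thus, we have

\[
\mu_\beta(\mathcal{I}) \le \sum_{k = |\Lambda_\beta|/C_1 w(\beta)}^{K_{\max}} F(k) \quad\text{with}\quad F(k) = \frac{1}{Z_\beta} \sum_{\substack{\sigma \in S_\beta \colon \\ C_B(\sigma) \in \mathcal{D}(k)}} e^{-\beta H_\beta(\sigma)},
\tag{19.5.2}
\]

where $K_{\max} \le |\Lambda_\beta|$.
Next, note that

\[
F(k) \le 2^{C_1 k}\, \frac{1}{Z_\beta} \sum_{\sigma \in \mathcal{D}(k)} e^{-\beta H_\beta(\sigma)}.
\tag{19.5.3}
\]

Since the bootstrap percolation map is downhill, the energy of a subcritical rectangle
is bounded below by $C_2 = 2J - h$ (recall Fig. 19.5), and the number of ways to place
k rectangles in $\Lambda_\beta$ is at most $\binom{|\Lambda_\beta|}{k}$, it follows that for k large enough

\[
\begin{aligned}
F(k) &\le 2^{C_1 k} \binom{|\Lambda_\beta|}{k}\, \mu_\beta(\boxminus)\, e^{-C_2 \beta k} \\
&\le \bigl(2^{C_1} C_1\, e\, w(\beta)\bigr)^k\, \mu_\beta(\boxminus)\, e^{-C_2 \beta k} \le \mu_\beta(\boxminus) \exp\bigl[-\tfrac12 C_2 \beta k\bigr],
\end{aligned}
\tag{19.5.4}
\]

where the second inequality uses that $k! \ge k^k e^{-k}$, $k \in \mathbb{N}$, and the third inequality
uses that $w(\beta) = e^{o(\beta)}$. We thus have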

K
max
|Λβ | C2 |Λβ |
F (k) ≤ 2μβ () w(β) exp − 12 β , (19.5.5)
|Λβ |
w(β) C1 w(β)
k= C w(β)
1

which is the desired estimate because |Λβ |/w(β) tends to infinity as β → ∞.
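The two elementary inequalities used in (19.5.4) can be checked numerically. The sketch below verifies, for a few values of n and all 1 ≤ k ≤ n, that $\binom{n}{k} \le (en/k)^k$, which follows from $k! \ge k^k e^{-k}$, and tests for which β the factor $2^{C_1} C_1\, e\, w(\beta)\, e^{-C_2\beta}$ drops below $e^{-C_2\beta/2}$. The choice w(β) = β and the parameter values J = 1, h = 0.3, $\ell_c = 7$ are assumptions made only for this illustration, and the function names are hypothetical.

```python
import math


def log_binomial_gap(n, k):
    """log[(e*n/k)^k] - log C(n, k); non-negative because k! >= k^k e^{-k}."""
    return k * (1 + math.log(n / k)) - math.log(math.comb(n, k))


def third_inequality_holds(beta, c1, c2, w):
    """Check 2^{C1} * C1 * e * w(beta) * exp(-C2*beta) <= exp(-C2*beta/2) on a log scale."""
    return c1 * math.log(2) + math.log(c1) + 1 + math.log(w(beta)) <= 0.5 * c2 * beta


# Assumed illustration values: l_c = 7, so C1 = (l_c + 2)(l_c + 1) = 72, and C2 = 2J - h = 1.7.
C1, C2 = 72, 1.7
gaps = [log_binomial_gap(n, k) for n in (10, 100, 1000) for k in range(1, n + 1)]
small_beta = third_inequality_holds(10.0, C1, C2, w=lambda b: b)
large_beta = third_inequality_holds(100.0, C1, C2, w=lambda b: b)
```

With these values the third inequality fails at β = 10 and holds at β = 100, which illustrates that the entropy factor $2^{C_1} C_1\, e\, w(\beta)$ is only beaten by the energy cost per rectangle once β is large.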

19.6 Typicality of starting configurations

In this section we prove the claim made in (19.1.4).

Proof Split
S = SL ∪ (S \ SL ) = SL ∪ U>L , (19.6.1)
476 19 Glauber Dynamics

where U>L ⊂ S are those configurations σ for which CB (σ ) has at least one rect-
angle that is larger than QL (0). We have

CB (σ ) = R1 (x),2 (x) (x), (19.6.2)
x∈X(σ )

where X(σ ) is the set of lower-left corners of the rectangles in CB (σ ), which in turn
can be split as
X(σ ) = X >L (σ ) ∪ X ≤L (σ ), (19.6.3)
where X >L (σ )
labels the rectangles that are larger than QL (0) and labels X ≤L (σ )
the rest.
Let $\sigma|_A$ denote the restriction of $\sigma$ to the set $A\subset\mathbb Z^2$. Then, for any $x\in X(\sigma)$,
we have
\[
H(\sigma) = H\bigl(\sigma|_{R_{\ell_1(x),\ell_2(x)}(x)}\bigr) + H\bigl(\sigma|_{[R_{\ell_1(x),\ell_2(x)}(x)]^c}\bigr),
\tag{19.6.4}
\]

because the rectangles in $C_B(\sigma)$ are non-interacting. Since for $\sigma\in\mathcal U_{>L}$ there is at
least one rectangle with lower-left corner in $X^{>L}(\sigma)$, we have

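\[
\begin{aligned}
\mu_\beta(\mathcal U_{>L})
&\le \sum_{x\in\Lambda_\beta} \sum_{\sigma\in\mathcal S} 1_{\{x\in X^{>L}(\sigma)\}}\, \mu_\beta(\sigma)\\
&= \frac{1}{Z_\beta} \sum_{x\in\Lambda_\beta} \sum_{\sigma\in\mathcal S} 1_{\{x\in X^{>L}(\sigma)\}}
\exp\Bigl(-\beta\Bigl[H\bigl(\sigma|_{R_{\ell_1(x),\ell_2(x)}(x)}\bigr) + H\bigl(\sigma|_{[R_{\ell_1(x),\ell_2(x)}(x)]^c}\bigr)\Bigr]\Bigr)\\
&\le e^{-\beta\Gamma_{L+1}}\, \frac{1}{Z_\beta} \sum_{x\in\Lambda_\beta} \sum_{\sigma\in\mathcal S} 1_{\{x\in X^{>L}(\sigma)\}}\,
e^{-\beta H(\sigma|_{[R_{\ell_1(x),\ell_2(x)}(x)]^c})},
\end{aligned}
\tag{19.6.5}
\]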

where ΓL+1 is the energy of QL+1 (0). In the last step we use the fact that the
bootstrap map is downhill and that the energy of QL (0) is increasing with L. Since
the energy of a subcritical rectangle is non-negative, we get

μβ (U>L ) ≤ NL+1 e−βΓL+1 |Λβ | μβ (S ) (19.6.6)

with NL+1 counting the number of configurations with support in QL+1 (0).
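On the other hand, by considering only those configurations in $\mathcal U_{>L}$ that have a
$Q_{L+1}(0)$ droplet, we get
\[
\mu_\beta(\mathcal U_{>L}) \ge N_{L+1}\, e^{-\beta\Gamma_{L+1}}\, |\Lambda_\beta|\, \mu_\beta^{[Q_{L+1}(0)]^c}(\mathcal S),
\tag{19.6.7}
\]

where the last factor is the Gibbs weight of the configurations in $\mathcal S$ with support
in $[Q_{L+1}(0)]^c$. It is easy to show that $\mu_\beta^{[Q_{L+1}(0)]^c}(\mathcal S) = \mu_\beta(\mathcal S)[1+o(1)]$ as
$\beta\to\infty$ and so

\[
\mu_\beta(\mathcal U_{>L}) \ge N_{L+1}\, e^{-\beta\Gamma_{L+1}}\, |\Lambda_\beta|\, \mu_\beta(\mathcal S)\,[1+o(1)], \qquad \beta\to\infty.
\tag{19.6.8}
\]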

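Combining (19.6.6) and (19.6.8), we conclude that $\lim_{\beta\to\infty} \mu_\beta(\mathcal U_{>L})/\mu_\beta(\mathcal S) = 0$
if and only if
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma_{L+1}} = 0,
\tag{19.6.9}
\]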

which proves the claim.

19.7 Bibliographical notes

1. The results in this chapter are taken from Bovier, den Hollander and Spitoni [32].
The “bootstrap percolation map” in Definition 19.1 is taken from Kotecký and
Olivieri [155].

2. If we draw the starting configuration from some subset of S that has a strong
recurrence property under the dynamics, then the choice of initial distribution on
this subset should not matter. This issue remains to be resolved. Gaudillière, den
Hollander, Nardi, Olivieri and Scoppola [118–120] provide a partial answer within
the pathwise approach to metastability, i.e., up to exponential order in β.

3. We expect Theorem 19.4(c) to hold for values of $M$ that grow with $\beta$ as
$M = e^{o(\beta)}$. As we saw in Sect. 19.4, the necessary capacity estimates carry over, but
the necessary equilibrium potential estimates do not. Also this issue remains to be
resolved.

4. The extension of the main theorem in Sect. 19.1.2 from two to three (and higher)
dimensions is straightforward. See also Sect. 17.6.

5. Theorem 19.4 identifies the first time when a critical droplet appears somewhere
in Λβ . It is a different issue to compute the first time when the plus-phase appears
near the origin. Two regimes have been studied: (1) |Λ| = ∞, h ∈ (0, 2J ), β → ∞;
(2) |Λ| = ∞, J > 0, β > 0 large enough, h ↓ 0. Regime (1) was considered in
two dimensions by Dehghanpour and Schonmann [77, 78], and in three and higher
dimensions by Cerf and Manzo [54]. Regime (2) was considered in two dimensions
by Schonmann [211–214], and Shlosman and Schonmann [215]. The invasion time
is identified up to errors that are subexponential in β, respectively, 1/ h. Proofs are
hard because the invasion time depends on where critical droplets appear for the
first time, how they grow and diffuse, how they meet other droplets along the way
and possibly merge with them, and how they eventually invade the origin. We will
return to this problem in Chap. 23.

6. The analogue of regime (1) in item 5 for the Blume-Capel model (recall
Sect. 17.7, item 8), was studied in Manzo and Olivieri [173].
Chapter 20
Kawasaki Dynamics

Tout le monde trouve à redire en autrui ce qu’on trouve à redire

en lui. (François de La Rochefoucauld, Réflexions)

The goal of this chapter is to extend the analysis in Chap. 19 to Kawasaki dynamics.
We will see that, again, the average time until the appearance of a critical droplet
somewhere is inversely proportional to the volume, and is driven by the same quan-
tities Γ and K as for small volumes. However, in the proof we encounter several
difficult issues, all coming from the fact that Kawasaki dynamics is conservative.
The first is to understand why Γ , representing the energetic cost to create a critical
droplet in a small box with an open boundary, i.e., in a grand-canonical setting,
reappears even though we choose our box to have a closed boundary, i.e., we work
in a canonical setting. This “mystery” will be resolved by the observation that the
formation of a critical droplet reduces the entropy of the system: the precise compu-
tation of this entropy loss yields Γ via dynamical equivalence of ensembles. The
second problem is to control the probability of a particle moving from the gas to
the protocritical droplet at the last stage of the nucleation, which plays a key role
in understanding how K comes up. This non-locality issue will be dealt with via
upper and lower estimates. As we will see, the latter in fact causes the scaling to be
slightly different than for small volumes.

20.1 Introduction and main results

20.1.1 Kawasaki dynamics in large volumes

We retain the setting of Sect. 18.1.1, and again let Λβ , Sβ and Hβ depend on β.
The main difference with the small volume situation described in Chap. 18 is that
we consider the dynamics on a torus rather than on a box with an open boundary,
and do not allow particles to be created or annihilated. Indeed, as Hamiltonian we
choose

\[
H_\beta(\sigma) = -U \sum_{\{x,y\}\in(\Lambda_\beta)^*} \sigma(x)\sigma(y), \qquad \sigma\in\mathcal S_\beta,
\tag{20.1.1}
\]

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_20

Fig. 20.1 An example of a configuration in S : no box BLβ (·) of size Lβ contains more than a
protocritical number of particles

and we work in the canonical ensemble, i.e., the second term in (18.1.2) is removed.
The number of particles in Λβ is taken to be
\[
n_\beta = \bigl\lceil \rho_\beta\, |\Lambda_\beta| \bigr\rceil,
\tag{20.1.2}
\]

where ρβ is the particle density, which is chosen to be

ρβ = e−βΔ , Δ > 0. (20.1.3)

Here, the activity parameter Δ that was removed from the Hamiltonian resurfaces
via the density in Λβ , i.e., we view Λβ as a gas reservoir surrounding local volumes.
Because of particle conservation, the state space of our dynamics is the set

\[
\mathcal S_\beta^{(n_\beta)} = \bigl\{\sigma\in\mathcal S_\beta\colon\, |\mathrm{supp}[\sigma]| = n_\beta\bigr\},
\tag{20.1.4}
\]

where supp[σ ] = {x ∈ Λβ : σ (x) = 1}.

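Subcritical, protocritical and critical configurations Let $L_\beta$ be a reference
distance, defined as
\[
L_\beta^2 = e^{\beta(\Delta-\delta_\beta)} = \frac{1}{\rho_\beta}\, e^{-\beta\delta_\beta}
\tag{20.1.5}
\]
with $\delta_\beta$ chosen such that

\[
\lim_{\beta\to\infty} \delta_\beta = 0, \qquad \lim_{\beta\to\infty} \beta\delta_\beta = \infty,
\tag{20.1.6}
\]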

and such that Lβ is odd. What this says is that Lβ is marginally below the typical
interparticle distance.
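To see how (20.1.5)–(20.1.6) place $L_\beta$ marginally below the typical interparticle distance $1/\sqrt{\rho_\beta}$, one can evaluate $\rho_\beta L_\beta^2 = e^{-\beta\delta_\beta}$ for a concrete sequence. The sketch below uses the hypothetical choice $\delta_\beta = \beta^{-1/2}$ and an illustrative value $\Delta = 1.5$; neither is prescribed by the text, and the requirement that $L_\beta$ be odd is ignored.

```python
import math

def reference_scale(beta, Delta, delta_beta):
    """Return (L_beta**2, rho_beta * L_beta**2) for L_beta**2 = exp(beta*(Delta - delta_beta))."""
    rho = math.exp(-beta * Delta)                    # particle density (20.1.3)
    L_sq = math.exp(beta * (Delta - delta_beta))     # reference distance squared (20.1.5)
    return L_sq, rho * L_sq

def delta_choice(beta):
    # Hypothetical choice satisfying (20.1.6): delta -> 0 while beta * delta -> infinity.
    return beta ** -0.5

# rho_beta * L_beta**2 = exp(-sqrt(beta)) tends to 0, but only subexponentially in beta.
ratios = [reference_scale(b, 1.5, delta_choice(b))[1] for b in (4.0, 16.0, 64.0)]
```

With this choice the expected number of particles in a box of size $L_\beta$ vanishes as $\beta\to\infty$, while $L_\beta^2$ still equals $1/\rho_\beta$ up to a factor that is subexponential in $\beta$.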

Definition 20.1 Let BLβ (x), x ∈ Λβ , be the square box with side length Lβ centred
at x (see Fig. 20.1).

Fig. 20.2 Schematic picture of the sets S , C − , C + defined in Definition 20.1 and the set C˜
interpolating between C − and C +

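(a) $\mathcal S = \{\sigma\in\mathcal S_\beta^{(n_\beta)}\colon\, |\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)| \le \ell_c(\ell_c-1)+1\ \ \forall\, x\in\Lambda_\beta\}$.
(b) $\mathcal P = \{\sigma\in\mathcal S\colon\, c_\beta(\sigma,\sigma') > 0 \text{ for some } \sigma'\in\mathcal S^c\}$.
(c) $\mathcal C = \{\sigma'\in\mathcal S^c\colon\, c_\beta(\sigma,\sigma') > 0 \text{ for some } \sigma\in\mathcal S\}$.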
(d) C − = {σ ∈ C : ∃ x ∈ Λβ such that BLβ (x) contains a protocritical droplet
whose lower-left corner is at x plus a free particle}.
(e) C + = the set of configurations obtained from C − by moving the free particle
to a site at distance 2 from the protocritical droplet, i.e., next to its boundary.
(f) C˜ = the set of configurations “interpolating” between C − and C + , i.e., the free
particle is somewhere between the boundary of the protocritical droplet and the
boundary of the box of size Lβ around it (see Fig. 20.2).

As in Chap. 19, we refer to $\mathcal S$, $\mathcal P$ and $\mathcal C$ as the set of subcritical, protocritical,
respectively, critical configurations. Note that, for every $\sigma\in\mathcal S$, the number of particles
in a box of size $L_\beta$ does not exceed the number of particles in a protocritical
droplet. These particles do not have to form a cluster or to be near to each other, because
the Kawasaki dynamics brings them together in a time of order $L_\beta^2 = o(1/\rho_\beta)$.

Remark 20.2 The sets P, C will play a similar rôle as, but are not directly compa-
rable with, the sets P , C in Chap. 18.

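Sets of starting configurations The initial distribution will again be concentrated
on sets $\mathcal S_L\subset\mathcal S$, this time defined by
\[
\mathcal S_L = \bigl\{\sigma\in\mathcal S_\beta^{(n_\beta)}\colon\, |\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)| \le L\ \ \forall\, x\in\Lambda_\beta\bigr\},
\tag{20.1.7}
\]

for any $L\in\mathbb N$ that satisfies $L^* \le L \le \ell_c(\ell_c-1)+1$ with

\[
L^* = \min\Bigl\{1 \le L \le \ell_c(\ell_c-1)+1\colon\, \lim_{\beta\to\infty} \frac{\mu_\beta(\mathcal S_L)}{\mu_\beta(\mathcal S)} = 1\Bigr\},
\tag{20.1.8}
\]

where $\mu_\beta$ is the canonical Gibbs measure associated with $H_\beta$ living on $\mathcal S_\beta^{(n_\beta)}$. In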
words, SL is the subset of those subcritical configurations for which no box of size
Lβ carries more than L particles, with L chosen such that SL is typical within S
under the Gibbs measure μβ as β → ∞.
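Note that $\mathcal S_{\ell_c(\ell_c-1)+1} = \mathcal S$. As for Glauber, the value of $L^*$ depends on how fast
$\Lambda_\beta$ grows with $\beta$. In Sect. 20.4.4 we will show that, for every $1 \le L \le \ell_c(\ell_c-1)$,

\[
\lim_{\beta\to\infty} \mu_\beta(\mathcal S_L)/\mu_\beta(\mathcal S) = 1
\quad\text{if and only if}\quad
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta(\Gamma_{L+1}-(\Delta-\delta_\beta))} = 0
\tag{20.1.9}
\]
with $\Gamma_{L+1}$ the energy needed to create a droplet of $L+1$ particles (closest in shape to
a square or quasi-square) in $B_{L_\beta}(0)$ under the grand-canonical Hamiltonian on this
box. Thus, if $|\Lambda_\beta| = e^{\theta\beta}$, then $L^* = L^*(\theta) = [\ell_c(\ell_c-1)+1] \wedge \min\{L\in\mathbb N\colon\, \Gamma_{L+1}-\Delta > \theta\}$,
which increases stepwise from 1 to $\ell_c(\ell_c-1)+1$ as $\theta$ increases from $\Delta$ to
$\Gamma$, the communication height in Chap. 18.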

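Initial distribution We choose the initial distribution to be the last-exit biased
distribution on $\mathcal S$ for the crossover to $\mathcal S^c\setminus\tilde{\mathcal C}$, respectively, $\mathcal D_M$, $M\in\mathbb N$, $M \ge \ell_c$,
defined by

\[
\mathcal D_M = \bigl\{\sigma\in\mathcal S_\beta\colon\, \exists\, x\in\Lambda_\beta \text{ such that } \mathrm{supp}[\sigma] \supset R_{M,M}(x)\bigr\},
\tag{20.1.10}
\]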

i.e., the set of configurations containing a supercritical droplet of size M.

20.1.2 Main theorem

Throughout this chapter we assume that we are in the metastable regime where $\Delta\in(U,2U)$
(recall Sect. 18.1.2). We further assume that

\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, \rho_\beta = \infty, \qquad
\lim_{\beta\to\infty} |\Lambda_\beta|\, L_\beta^2\, e^{-\beta\Gamma} = 0.
\tag{20.1.11}
\]

The first condition says that the number of particles tends to infinity, and ensures
that the formation of a critical droplet somewhere does not globally deplete the
surrounding gas. The second condition ensures that the set of configurations with
a protocritical droplet and a free particle within distance Lβ is atypical compared
to S .
Write $N = N(\ell_c)$ to denote the number of protocritical droplets modulo shifts
for Kawasaki dynamics in small volumes, which was identified in (18.1.13).

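Theorem 20.3 (Mean crossover time) Subject to (20.1.8) and (20.1.11), the following
hold:
(a)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\,
E_{\nu_{\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+}}\bigl(\tau_{(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+}\bigr) = \frac{1}{N}.
\tag{20.1.12}
\]
(b)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\,
E_{\nu_{\mathcal S_L,\mathcal D_M}}\bigl(\tau_{\mathcal D_M}\bigr) = \frac{1}{N}, \qquad \forall\, \ell_c \le M \le 2\ell_c-1.
\tag{20.1.13}
\]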

20.1.3 Discussion

1. Theorem 20.3(a) says that the average time to create a critical droplet is
$[1+o(1)]\,(\beta\Delta/4\pi)\, e^{\beta\Gamma}/(N|\Lambda_\beta|)$. The factor $\beta\Delta/4\pi$ comes from the simple random
walk that is performed by the free particle “from the gas to the protocritical
droplet” (i.e., as the dynamics goes from C − to C + ), while the factor N counts the
number of shapes of the protocritical droplet. Theorem 20.3(b) says that, once the
critical droplet is created, it rapidly grows to a droplet that has twice the size.

2. In Sect. 20.5 we will show that the average probability under the Gibbs measure
μβ of destroying a supercritical droplet and returning to a configuration in SL is
exponentially small in β. Hence, the crossover from SL to S c \C˜ ∪ C + represents
the threshold for nucleation, and Theorem 20.3(a) represents the nucleation time.

3. The Λβ -dependence in Theorem 20.3(a) matches the Λ-dependence in Theo-

rem 18.4, with the logarithmic factor in (18.1.12) being linked to the extra factor
βΔ in (20.1.12). Note that this factor is particularly interesting, since it says that the
effective box size responsible for the formation of a critical droplet is Lβ .
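The leading-order expression in item 1 can be turned into a small calculator, which makes the inverse proportionality to the volume explicit. This is a sketch only: the function name and all numerical parameter values are hypothetical, and the formula is meaningful only in the regime (20.1.11) and as $\beta\to\infty$.

```python
import math

def mean_nucleation_time(beta, Delta, Gamma, N, volume):
    """Leading-order value (beta*Delta / (4*pi)) * exp(beta*Gamma) / (N * volume)."""
    return beta * Delta / (4.0 * math.pi) * math.exp(beta * Gamma) / (N * volume)

# Hypothetical parameters, chosen only to illustrate the volume dependence.
t_small = mean_nucleation_time(beta=5.0, Delta=1.5, Gamma=4.0, N=8, volume=1.0e6)
t_large = mean_nucleation_time(beta=5.0, Delta=1.5, Gamma=4.0, N=8, volume=2.0e6)
```

Doubling the volume halves the leading-order nucleation time, in line with the prefactor $1/|\Lambda_\beta|$ in Theorem 20.3(a).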

Outline Theorem 20.3 is proved in Sects. 20.2–20.3. Along the way we need sev-
eral technical facts whose proofs are deferred to Sects. 20.4–20.5. These are all
related to the difficult issues mentioned in the opening of this chapter.

20.2 Average time to create a critical droplet

In this section we prove Theorem 20.3(a). Our starting point is the analogue of
(19.3.1) with S ∪ C and S c \C replaced by S ∪ (C˜\C + ) and (S c \C˜) ∪ C + .

20.2.1 Estimate of the equilibrium potential

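Lemma 20.4 $\sum_{\sigma\in\mathcal S\cup(\tilde{\mathcal C}\setminus\mathcal C^+)} \mu_\beta(\sigma)\, h_{\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+}(\sigma) = \mu_\beta(\mathcal S)[1+o(1)]$ as
$\beta\to\infty$.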

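Proof Write, using (7.1.16),

\[
\sum_{\sigma\in\mathcal S\cup(\tilde{\mathcal C}\setminus\mathcal C^+)} \mu_\beta(\sigma)\, h_{\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+}(\sigma)
= \mu_\beta(\mathcal S_L) + \sum_{\sigma\in(\mathcal S\setminus\mathcal S_L)\cup(\tilde{\mathcal C}\setminus\mathcal C^+)} \mu_\beta(\sigma)\,
P_\sigma\bigl(\tau_{\mathcal S_L} < \tau_{(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+}\bigr).
\tag{20.2.1}
\]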

The last sum is bounded above by μβ (S \SL ) + μβ (C˜\C + ). But μβ (S \SL ) =

o(μβ (S )) as β → ∞ by our choice of L in (20.1.8). In Lemma 20.11 in Sect. 20.4.3
we will show that μβ (C˜\C + ) = o(μβ (S )) as β → ∞.

20.2.2 Estimate of the capacity

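Lemma 20.5 $\mathrm{cap}(\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+) = N\,|\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\, \mu_\beta(\mathcal S)[1+o(1)]$ as
$\beta\to\infty$.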

Proof The argument is in the same spirit as that in Sect. 19.2.2. However, a number
of additional hurdles need to be taken that come from the conservative nature of
Kawasaki dynamics. The proof proceeds via upper and lower bounds, written out
below. Both take up quite a bit of space.

Upper bound

Proof The proof comes in 7 steps.

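1. Protocritical droplet and free particle. We have

\[
\begin{aligned}
\mathrm{cap}\bigl(\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+\bigr)
&\le \mathrm{cap}\bigl(\mathcal S\cup\mathcal C^-,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+\bigr)\\
&= \min_{\substack{h\colon\,\mathcal S_\beta^{(n_\beta)}\to[0,1]\\ h|_{\mathcal S\cup\mathcal C^-}=1,\ h|_{(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+}=0}}
\tfrac12 \sum_{\sigma,\sigma'\in\mathcal S_\beta^{(n_\beta)}} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma')\, \bigl[h(\sigma)-h(\sigma')\bigr]^2.
\end{aligned}
\tag{20.2.2}
\]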
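Split the right-hand side into a contribution coming from $\sigma,\sigma'\in\tilde{\mathcal C}$ and the rest, i.e.,

\[
\text{r.h.s. (20.2.2)} = I + \gamma_1(\beta),
\tag{20.2.3}
\]

where
\[
I = \min_{\substack{h\colon\,\tilde{\mathcal C}\to[0,1]\\ h|_{\mathcal C^-}=1,\ h|_{\mathcal C^+}=0}}
\tfrac12 \sum_{\sigma,\sigma'\in\tilde{\mathcal C}} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma')\, \bigl[h(\sigma)-h(\sigma')\bigr]^2
\tag{20.2.4}
\]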
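and $\gamma_1(\beta)$ is an error term that will be estimated in Step 7. This term will turn
out to be small because $\mu_\beta(\sigma)c_\beta(\sigma,\sigma')$ is small when either $\sigma\in\mathcal S_\beta^{(n_\beta)}\setminus\tilde{\mathcal C}$ or
$\sigma'\in\mathcal S_\beta^{(n_\beta)}\setminus\tilde{\mathcal C}$. Next, partition $\tilde{\mathcal C}$, $\mathcal C^-$, $\mathcal C^+$ into sets $\tilde{\mathcal C}(x)$, $\mathcal C^-(x)$, $\mathcal C^+(x)$, $x\in\Lambda_\beta$, by
requiring that the lower-left corner of the protocritical droplet is in the center of the
box $B_{L_\beta}(x)$. Then, because $c_\beta(\sigma,\sigma') = 0$ when $\sigma\in\tilde{\mathcal C}(x)$ and $\sigma'\in\tilde{\mathcal C}(x')$ for some
$x \ne x'$, we may write

\[
I = |\Lambda_\beta| \min_{\substack{h\colon\,\tilde{\mathcal C}(0)\to[0,1]\\ h|_{\mathcal C^-(0)}=1,\ h|_{\mathcal C^+(0)}=0}}
\tfrac12 \sum_{\sigma,\sigma'\in\tilde{\mathcal C}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma')\, \bigl[h(\sigma)-h(\sigma')\bigr]^2.
\tag{20.2.5}
\]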
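2. Decomposition of configurations. Define (compare with (19.2.6))

\[
\begin{aligned}
\hat{\mathcal C}(0) &= \bigl\{\sigma\, 1_{B_{L_\beta}(0)}\colon\, \sigma\in\tilde{\mathcal C}(0)\bigr\},\\
\check{\mathcal C}(0) &= \bigl\{\sigma\, 1_{[B_{L_\beta}(0)]^c}\colon\, \sigma\in\tilde{\mathcal C}(0)\bigr\}.
\end{aligned}
\tag{20.2.6}
\]

Then every $\sigma\in\tilde{\mathcal C}(0)$ can be uniquely decomposed as $\sigma = \hat\sigma\vee\check\sigma$ for some $\hat\sigma\in\hat{\mathcal C}(0)$
and $\check\sigma\in\check{\mathcal C}(0)$. Note that $\hat{\mathcal C}(0)$ has $K = \ell_c(\ell_c-1)+2$ particles and $\check{\mathcal C}(0)$ has $n_\beta-K$
particles (and recall that, by the first half of (20.1.11), $n_\beta\to\infty$ as $\beta\to\infty$). Define

\[
\mathcal C^{\mathrm{fp}}(0) = \bigl\{\sigma\in\tilde{\mathcal C}(0)\colon\, H_\beta(\sigma) = H_\beta(\hat\sigma) + H_\beta(\check\sigma)\bigr\},
\tag{20.2.7}
\]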

i.e., the set of configurations consisting of a protocritical droplet and a free particle
inside $B_{L_\beta}(0)$ not interacting with the particles outside $B_{L_\beta}(0)$. Write $\mathcal C^{\mathrm{fp},-}(0)$ and
$\mathcal C^{\mathrm{fp},+}(0)$ to denote the subsets of $\mathcal C^{\mathrm{fp}}(0)$ where the free particle is at distance $L_\beta$,
respectively, 2 from the protocritical droplet. Split the right-hand side of (20.2.5)
into a contribution coming from $\sigma,\sigma'\in\mathcal C^{\mathrm{fp}}(0)$ and the rest, i.e.,

\[
\text{r.h.s. (20.2.5)} = |\Lambda_\beta|\, \bigl[II + \gamma_2(\beta)\bigr],
\tag{20.2.8}
\]

where

\[
II = \min_{\substack{h\colon\,\mathcal C^{\mathrm{fp}}(0)\to[0,1]\\ h|_{\mathcal C^{\mathrm{fp},-}(0)}=1,\ h|_{\mathcal C^{\mathrm{fp},+}(0)}=0}}
\tfrac12 \sum_{\sigma,\sigma'\in\mathcal C^{\mathrm{fp}}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma')\, \bigl[h(\sigma)-h(\sigma')\bigr]^2
\tag{20.2.9}
\]

and $\gamma_2(\beta)$ is an error term that will be estimated in Step 6. This term will turn out
to be small because of loss of entropy when the particle is at the boundary.

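3. Reduction to capacity of simple random walk. Estimate

\[
\begin{aligned}
II &= \min_{\substack{h\colon\,\mathcal C^{\mathrm{fp}}(0)\to[0,1]\\ h|_{\mathcal C^{\mathrm{fp},-}(0)}=1,\ h|_{\mathcal C^{\mathrm{fp},+}(0)}=0}}
\tfrac12 \sum_{\check\sigma,\check\sigma'\in\check{\mathcal C}(0)}\
\sum_{\substack{\hat\sigma,\hat\sigma'\in\hat{\mathcal C}(0)\colon\\ \hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma'\in\mathcal C^{\mathrm{fp}}(0)}}
\mu_\beta(\hat\sigma\vee\check\sigma)\, c_\beta(\hat\sigma\vee\check\sigma,\hat\sigma'\vee\check\sigma')\,
\bigl[h(\hat\sigma\vee\check\sigma)-h(\hat\sigma'\vee\check\sigma')\bigr]^2\\
&\le \min_{\substack{g\colon\,\hat{\mathcal C}(0)\to[0,1]\\ g|_{\hat{\mathcal C}^-(0)}=1,\ g|_{\hat{\mathcal C}^+(0)}=0}}
\tfrac12 \sum_{\check\sigma\in\check{\mathcal C}(0)}\
\sum_{\substack{\hat\sigma,\hat\sigma'\in\hat{\mathcal C}(0)\colon\\ \hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma\in\mathcal C^{\mathrm{fp}}(0)}}
\mu_\beta(\hat\sigma\vee\check\sigma)\, c_\beta(\hat\sigma\vee\check\sigma,\hat\sigma'\vee\check\sigma)\,
\bigl[g(\hat\sigma)-g(\hat\sigma')\bigr]^2,
\end{aligned}
\tag{20.2.10}
\]

where $\hat{\mathcal C}^-(0)$, $\hat{\mathcal C}^+(0)$ denote the subsets of $\hat{\mathcal C}(0)$ where the free particle is at distance
$L_\beta$, respectively, 2 from the protocritical droplet, and the inequality comes
from substituting

\[
h(\hat\sigma\vee\check\sigma) = g(\hat\sigma), \qquad \hat\sigma\in\hat{\mathcal C}(0),\ \check\sigma\in\check{\mathcal C}(0),
\tag{20.2.11}
\]

and afterwards replacing the double sum over $\check\sigma,\check\sigma'\in\check{\mathcal C}(0)$ by the single sum over
$\check\sigma\in\check{\mathcal C}(0)$ because $c_\beta(\hat\sigma\vee\check\sigma,\hat\sigma'\vee\check\sigma') > 0$ only if either $\hat\sigma = \hat\sigma'$ or $\check\sigma = \check\sigma'$ (the
dynamics updates one pair of neighbouring sites at a time). Next, estimate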

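\[
\text{r.h.s. (20.2.10)} \le \sum_{\check\sigma\in\check{\mathcal C}(0)} \frac{1}{Z_\beta^{(n_\beta)}}\, e^{-\beta H_\beta(\check\sigma)}
\min_{\substack{g\colon\,\hat{\mathcal C}(0)\to[0,1]\\ g|_{\hat{\mathcal C}^-(0)}=1,\ g|_{\hat{\mathcal C}^+(0)}=0}}
\tfrac12 \sum_{\substack{\hat\sigma,\hat\sigma'\in\hat{\mathcal C}(0)\colon\\ \hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma\in\mathcal C^{\mathrm{fp}}(0)}}
e^{-\beta H_\beta(\hat\sigma)}\, c_\beta(\hat\sigma,\hat\sigma')\, \bigl[g(\hat\sigma)-g(\hat\sigma')\bigr]^2,
\tag{20.2.12}
\]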

where we used $H_\beta(\sigma) = H_\beta(\hat\sigma) + H_\beta(\check\sigma)$ from (20.2.7) and write $c_\beta(\hat\sigma,\hat\sigma')$ to
denote the transition rate associated with the Kawasaki dynamics restricted to
$B_{L_\beta}(0)$, which clearly equals $c_\beta(\hat\sigma\vee\check\sigma,\hat\sigma'\vee\check\sigma)$ for every $\check\sigma\in\check{\mathcal C}(0)$ such that
$\hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma\in\mathcal C^{\mathrm{fp}}(0)$ because there is no interaction between the particles inside
and outside $B_{L_\beta}(0)$. The minimum in the r.h.s. of (20.2.12) can be estimated
from above by

\[
\text{minimum in (20.2.12)} \le \sum_{\sigma\in\mathcal P(0)} V_\beta(\sigma)
\tag{20.2.13}
\]

with $\mathcal P(0)$ the set of protocritical droplets with lower-left corner at 0, and
\[
V_\beta(\sigma) = \min_{\substack{f\colon\,\mathbb Z^2\to[0,1]\\ f|_{P_\sigma(0)}=1,\ f|_{[B_{L_\beta}(0)]^c}=0}}
\tfrac12 \sum_{\substack{x,x'\in\mathbb Z^2\\ x\sim x'}} \bigl[f(x)-f(x')\bigr]^2,
\tag{20.2.14}
\]

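where $P_\sigma(0)$ is the support of the protocritical droplet in $\sigma$, and $x\sim x'$ means that
$x$ and $x'$ are neighbouring sites. Indeed, (20.2.13) is obtained from the expression
in (20.2.12) by dropping the restriction $\hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma\in\mathcal C^{\mathrm{fp}}(0)$, substituting

\[
g\bigl(P_\sigma(0)\cup\{x\}\bigr) = f(x), \qquad \sigma\in\mathcal P(0),\ x\in B_{L_\beta}(0)\setminus P_\sigma(0),
\tag{20.2.15}
\]

and noting that $c_\beta(P_\sigma(0)\cup\{x\}, P_\sigma(0)\cup\{x'\}) = 1$ when $x\sim x'$ and zero otherwise.
What (20.2.14) says is that
\[
V_\beta(\sigma) = \mathrm{cap}\bigl(P_\sigma(0), [B_{L_\beta}(0)]^c\bigr)
\tag{20.2.16}
\]

is the capacity of simple random walk between the protocritical droplet $P_\sigma(0)$ in $\sigma$
and the exterior of $B_{L_\beta}(0)$. Now, define

\[
\check Z_\beta^{(n_\beta-K)}(0) = \sum_{\check\sigma\in\check{\mathcal C}(0)} e^{-\beta H_\beta(\check\sigma)}.
\tag{20.2.17}
\]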

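Then we obtain from (20.2.12)–(20.2.13) that

\[
\text{r.h.s. (20.2.12)} \le e^{-\beta\bar\Gamma}\, \frac{\check Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}} \sum_{\sigma\in\mathcal P(0)} V_\beta(\sigma),
\tag{20.2.18}
\]

where $\bar\Gamma = -U[(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1]$ is the binding energy of the protocritical
droplet.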
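The bracket in the binding energy $\bar\Gamma$ is simply the number of nearest-neighbour bonds in a protocritical droplet, and this count can be verified by brute force. The sketch below assumes that the protocritical droplet is an $\ell_c \times (\ell_c-1)$ quasi-square with a single protuberance attached to one of its longest sides; that description, the placement of the protuberance and all function names are assumptions made for illustration.

```python
def protocritical_sites(l_c):
    """Sites of an l_c x (l_c - 1) quasi-square plus one protuberance on a longest side."""
    sites = {(i, j) for i in range(l_c) for j in range(l_c - 1)}
    sites.add((0, l_c - 1))  # protuberance on top of the first column (one admissible position)
    return sites

def count_bonds(sites):
    """Number of unordered nearest-neighbour pairs inside the site set."""
    return sum(((i + 1, j) in sites) + ((i, j + 1) in sites) for (i, j) in sites)

def binding_energy(l_c, U):
    """Energy -U per bond, matching the sign convention of (20.1.1)."""
    return -U * count_bonds(protocritical_sites(l_c))
```

For every $\ell_c \ge 2$ the bond count equals $(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1$ and the droplet has $\ell_c(\ell_c-1)+1 = K-1$ particles, so that adding the free particle gives the $K$ particles inside $B_{L_\beta}(0)$.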

4. Capacity estimate. For future reference we state the following estimate on ca-
pacities for simple random walk.

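Lemma 20.6 Let $U\subset\mathbb Z^2$ be any set such that $\{0\}\subset U\subset B_k(0)$, with $k\in\mathbb N_0$ independent
of $\beta$. Let $V\subset\mathbb Z^2$ be any set such that $[B_{KL_\beta}(0)]^c\subset V\subset[B_{L_\beta}(0)]^c$, with
$K\in\mathbb N$ independent of $\beta$. Then
\[
\mathrm{cap}\bigl(\{0\},[B_{KL_\beta}(0)]^c\bigr) \le \mathrm{cap}(U,V) \le \mathrm{cap}\bigl(B_k(0),[B_{L_\beta}(0)]^c\bigr).
\tag{20.2.19}
\]

Moreover, via (20.1.5)–(20.1.6),

\[
\mathrm{cap}\bigl(B_k(0),[B_{KL_\beta}(0)]^c\bigr) = \frac{2\pi}{\ln(KL_\beta)-\ln k}\,[1+o(1)]
= \frac{4\pi}{\beta\Delta}\,[1+o(1)], \qquad \beta\to\infty.
\tag{20.2.20}
\]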

Proof The inequalities in (20.2.19) follow from standard monotonicity properties of
capacities. The asymptotic estimate in (20.2.20) for capacities of concentric boxes
is standard (see e.g. Lawler [160], Sect. 2.3), and also follows by comparison to
Brownian motion.
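The capacities in Lemma 20.6 are Dirichlet energies of harmonic functions, so for small boxes they can be computed by solving the discrete Dirichlet problem directly. The sketch below does this for the pair $\{0\}$ and the complement of the box $[-r,r]^2$, using plain Gauss-Seidel iteration; the function name, the box radii and the number of sweeps are illustrative assumptions, and the agreement with the leading-order term $2\pi/\ln r$ is only logarithmic, hence poor at these sizes.

```python
import math

def srw_capacity(radius, sweeps=2000):
    """Capacity between {0} and the complement of [-radius, radius]^2 for simple
    random walk on Z^2 with unit conductances, via Gauss-Seidel iteration."""
    n = radius + 1                      # index of the origin; the outer layer is absorbing
    size = 2 * n + 1
    f = [[0.0] * size for _ in range(size)]
    f[n][n] = 1.0                       # boundary condition f = 1 at the origin
    for _ in range(sweeps):
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                if (i, j) != (n, n):    # make f harmonic away from the origin
                    f[i][j] = 0.25 * (f[i - 1][j] + f[i + 1][j] + f[i][j - 1] + f[i][j + 1])
    # Dirichlet energy of the harmonic f equals the total flow out of the origin.
    return sum(1.0 - f[n + di][n + dj] for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))

cap_small = srw_capacity(5)
cap_large = srw_capacity(10)
leading_order = 2.0 * math.pi / math.log(10)   # leading term only; lower-order corrections are sizeable
```

The two computed values illustrate the monotonicity used in (20.2.19): enlarging the outer box decreases the capacity, and both values stay below 4, the total conductance of the edges at the origin.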

We can apply Lemma 20.6 to estimate $V_\beta(\sigma)$ in (20.2.16), since the protocritical
droplet with lower-left corner in 0 fits inside the box $B_{2\ell_c}(0)$. This gives

\[
V_\beta(\sigma) = \frac{4\pi}{\beta\Delta}\,[1+o(1)], \qquad \forall\, \sigma\in\mathcal P(0),\ \beta\to\infty.
\tag{20.2.21}
\]

Moreover, from Theorem 18.4 we know that $|\mathcal P(0)|$, the number of shapes of the
protocritical droplet, equals $N$.

5. Equivalence of ensembles. According to Lemma 20.8 in Sect. 20.4.1, we have

\[
\frac{\check Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}} = (\rho_\beta)^K\, \mu_\beta(\mathcal S)\,[1+o(1)], \qquad \beta\to\infty.
\tag{20.2.22}
\]

This is an “equivalence of ensembles” property relating the probabilities to find
$n_\beta-K$, respectively, $n_\beta$ particles inside $[B_{L_\beta}(0)]^c$ (recall (20.2.6)). Combining
(20.2.2)–(20.2.3), (20.2.5), (20.2.8), (20.2.10), (20.2.12), (20.2.18) and (20.2.21)–
(20.2.22), we get

\[
\mathrm{cap}(\mathcal S,\mathcal C^+) \le \gamma_1(\beta) + |\Lambda_\beta|\,\gamma_2(\beta) + N\,|\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\, \mu_\beta(\mathcal S)\,[1+o(1)],
\qquad \beta\to\infty,
\tag{20.2.23}
\]

where we use that $\bar\Gamma + \Delta K = \Gamma$. This completes the proof of the upper bound,
provided that the error terms $\gamma_1(\beta)$ and $\gamma_2(\beta)$ are negligible.

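6. Second error term. To estimate the error term $\gamma_2(\beta)$, note that the configurations
in $\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0)$ are those for which inside $B_{L_\beta}(0)$ there is a protocritical droplet
whose lower-left corner is at 0, and at the boundary of $B_{L_\beta}(0)$ there is a particle that
is attached to some cluster outside $B_{L_\beta}(0)$. Recalling (20.2.5)–(20.2.9), we therefore
have
\[
\gamma_2(\beta) \le \sum_{\sigma\in\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0)}\ \sum_{\sigma'\in\tilde{\mathcal C}(0)}
\mu_\beta(\sigma)\, c_\beta(\sigma,\sigma')\, \bigl[h(\sigma)-h(\sigma')\bigr]^2
\le 6\,\mu_\beta\bigl(\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0)\bigr),
\tag{20.2.24}
\]

where we use that $h\colon\tilde{\mathcal C}(0)\to[0,1]$, $\mu_\beta(\sigma)c_\beta(\sigma,\sigma') = \mu_\beta(\sigma)\wedge\mu_\beta(\sigma')$, and there
are at most 6 possible transitions from $\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0)$ to $\tilde{\mathcal C}(0)$: 3 through a move by
the particle at the boundary of $B_{L_\beta}(0)$ and 3 through a move by a particle in the
cluster outside $B_{L_\beta}(0)$. Since

\[
H_\beta(\sigma) \ge H_\beta(\hat\sigma) + H_\beta(\check\sigma) - U, \qquad \sigma\in\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0),
\tag{20.2.25}
\]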

it follows from the same argument as in Steps 3 and 5 that

\[
\mu_\beta\bigl(\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0)\bigr) \le N\, e^{-\beta\bar\Gamma}\, (\rho_\beta)^{K+1}\, \mu_\beta(\mathcal S)\, e^{\beta U}\, 4(K-1)\,[1+o(1)],
\tag{20.2.26}
\]
where $(\rho_\beta)^{K+1}$ comes from the fact that there are $n_\beta-(K+1)$ particles outside
$B_{L_\beta+1}(0)$ (once more use Lemma 20.8 in Sect. 20.4.1), $e^{\beta U}$ comes from the gap in
(20.2.25), and $4(K-1)$ counts the maximal number of places at the boundary of
$B_{L_\beta}(0)$ where the particle can interact with particles outside $B_{L_\beta}(0)$ due to the constraint
that defines $\mathcal S$ (recall Definition 20.1(a)). Since $\rho_\beta e^{\beta U} = o(1)$, we therefore
see that $\gamma_2(\beta)$ indeed is small compared to the main term of (20.2.23).

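7. First error term. To estimate the error term $\gamma_1(\beta)$, we define the sets of pairs of
configurations
\[
\begin{aligned}
I_1 &= \bigl\{(\sigma,\eta)\in[\mathcal S_\beta^{(n_\beta)}]^2\colon\, \sigma\in\mathcal S,\ \eta\in\mathcal S^c\setminus\tilde{\mathcal C}\bigr\},\\
I_2 &= \bigl\{(\sigma,\eta)\in[\mathcal S_\beta^{(n_\beta)}]^2\colon\, \sigma\in\tilde{\mathcal C},\ \eta\in\mathcal S^c\setminus\tilde{\mathcal C}\bigr\},
\end{aligned}
\tag{20.2.27}
\]

and estimate

\[
\gamma_1(\beta) \le \tfrac12 \sum_{i=1}^{2}\ \sum_{(\sigma,\eta)\in I_i} \mu_\beta(\sigma)\, c_\beta(\sigma,\eta) = \tfrac12\,\Sigma(I_1) + \tfrac12\,\Sigma(I_2).
\tag{20.2.28}
\]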

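The sum $\Sigma(I_1)$ can be written as

\[
\Sigma(I_1) = |\Lambda_\beta| \sum_{\sigma\in\mathcal P}\ \sum_{\eta\in\mathcal S^c\setminus\tilde{\mathcal C}} c_\beta(\eta,\sigma)\,
1\bigl\{|\mathrm{supp}[\eta]\cap B_{L_\beta}(0)| = K\bigr\}\, \frac{1}{Z_\beta^{(n_\beta)}}\, e^{-\beta H_\beta(\eta)},
\tag{20.2.29}
\]
where we use that $\mu_\beta(\sigma)c_\beta(\sigma,\eta) = \mu_\beta(\eta)c_\beta(\eta,\sigma)$, $\sigma,\eta\in\mathcal S_\beta^{(n_\beta)}$, and $c_\beta(\eta,\sigma) = 0$,
$\eta\in\mathcal S^c\setminus\tilde{\mathcal C}$, $\sigma\notin\mathcal P$ (recall Definition 20.1(b)). We have

\[
H_\beta(\eta) \ge H_\beta(\hat\eta) + H_\beta(\check\eta) - kU, \qquad \eta\in\mathcal S^c\setminus\tilde{\mathcal C},
\tag{20.2.30}
\]

where $k$ counts the number of pairs of particles interacting across the boundary of
$B_{L_\beta}(0)$. Moreover, since $\eta\notin\tilde{\mathcal C}$, we have

\[
H_\beta(\hat\eta) \ge \bar\Gamma + U.
\tag{20.2.31}
\]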

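Inserting (20.2.30)–(20.2.31) into (20.2.29), we obtain

\[
\begin{aligned}
\Sigma(I_1) &\le |\Lambda_\beta|\, e^{-\beta\bar\Gamma}\, \mu_\beta(\mathcal S)\,[1+o(1)] \sum_{k=0}^{K} (\rho_\beta)^{K+k}\, \bigl[4(K-1)\bigr]^{k}\, e^{\beta(k-1)U}\\
&= |\Lambda_\beta|\, e^{-\beta\Gamma}\, \mu_\beta(\mathcal S)\,[1+o(1)]\, e^{-\beta U},
\end{aligned}
\tag{20.2.32}
\]

where $(\rho_\beta)^{K+k}$ comes from the fact that there are $n_\beta-(K+k)$ particles outside
$B_{L_\beta+1}(0)$ (once more use Lemma 20.8 in Sect. 20.4.1), the inequality again
uses an argument similar to that in Steps 3 and 5, and the equality uses that
$\rho_\beta e^{\beta U} = o(1)$ and $\bar\Gamma + K\Delta = \Gamma$. Therefore $\Sigma(I_1)$ is small compared
to the main term of (20.2.23). The sum $\Sigma(I_2)$ can be estimated as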

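\[
\begin{aligned}
\Sigma(I_2) &= \sum_{\sigma\in\tilde{\mathcal C}}\ \sum_{\eta\in\mathcal S^c\setminus\tilde{\mathcal C}} \mu_\beta(\sigma)\, c_\beta(\sigma,\eta)\\
&= |\Lambda_\beta| \sum_{\sigma\in\tilde{\mathcal C}(0)}\ \sum_{\eta\in\mathcal S^c\setminus\tilde{\mathcal C}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\eta)\\
&\le |\Lambda_\beta|\, \mu_\beta\bigl(\tilde{\mathcal C}(0)\bigr)\, \bigl\{e^{-\beta U} + (4L_\beta)\,\rho_\beta\bigr\}\,[1+o(1)],
\end{aligned}
\tag{20.2.33}
\]

where the first term comes from detaching a particle from the critical droplet and
the second term from an extra particle entering $B_{L_\beta}(0)$. The term between braces
is $o(1)$. Moreover, $\mu_\beta(\tilde{\mathcal C}(0)) = \mu_\beta(\mathcal C^{\mathrm{fp}}(0)) + \mu_\beta(\tilde{\mathcal C}(0)\setminus\mathcal C^{\mathrm{fp}}(0))$. The second term
was estimated in (20.2.26), the first term can again be estimated as in Steps 3 and 5:

\[
\mu_\beta\bigl(\mathcal C^{\mathrm{fp}}(0)\bigr) = \sum_{\hat\sigma\in\hat{\mathcal C}(0)}\ \sum_{\substack{\check\sigma\in\check{\mathcal C}(0)\\ \hat\sigma\vee\check\sigma\in\mathcal C^{\mathrm{fp}}(0)}} \mu_\beta(\hat\sigma\vee\check\sigma)
= N\, e^{-\beta\bar\Gamma}\, \frac{\check Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}}
= N\, e^{-\beta\Gamma}\, \mu_\beta(\mathcal S)\,[1+o(1)].
\tag{20.2.34}
\]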

Therefore also Σ(I2 ) is small compared to the main term of (20.2.23).

Having completed the proof of the upper bound in Lemma 20.5, we next turn to
the proof of the lower bound.

Lower bound

For future reference we state the following property of the harmonic function for
simple random walk on Z2 .

Lemma 20.7 Let $g$ be the harmonic function of simple random walk on $B_{2L_\beta}(0)$
(which is equal to 1 on $\{0\}$ and 0 on $[B_{2L_\beta}(0)]^c$). Then there exists a constant $C < \infty$
such that
\[
\sum_{e} \bigl[g(z)-g(z+e)\bigr]_+ \le C/L_\beta \qquad \forall\, z\in[B_{L_\beta}(0)]^c.
\tag{20.2.35}
\]

Proof See e.g. Lawler, Schramm and Werner [161], Lemma 5.1. The proof can be
given via the estimates in Lawler [160], Sect. 1.7, or via a coupling argument.

The proof of the lower bound follows the same line of argument as for Glauber
dynamics in that it relies on the construction of a suitable unit flow. This flow will,
however, be considerably more difficult. In particular, we will no longer be able
to get away with choosing a deterministic flow, and the full power of the Berman-
Konsowa variational principle has to be brought to bear.

Proof The proof comes in 5 steps.

1. Starting configurations. We start our flow on a subset of the configurations in

C + that is sufficiently large and sufficiently convenient. Let C2+ ⊂ C + denote the
set of configurations having a protocritical droplet with lower-left corner at some
site x ∈ Λβ , a free particle at distance 2 from this protocritical droplet, no other
particles in the box B2Lβ (x), and satisfying the constraints in SL , i.e., all other
boxes of size 2Lβ carry no more particles than there are in a protocritical droplet.
This is the same as C + , except that the box around the protocritical droplet has size
2Lβ rather than Lβ .
Let $K = \ell_c(\ell_c-1)+2$ be the volume of the critical droplet, and let $\mathcal S_2^{(n_\beta-K)}$ be
the analogue of $\mathcal S$ when the total number of particles is $n_\beta-K$ and the boxes in
which we count particles have size $2L_\beta$ (compare with Definition 20.1). Similarly
as in (19.2.17), our task is to derive a lower bound for $\mathrm{cap}(\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+) =
\mathrm{cap}((\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+,\mathcal S_L) \ge \mathrm{cap}(\mathcal C_L,\mathcal S_L)$, where $\mathcal C_L\subset\mathcal C_2^+\subset\mathcal C^+$ defined by

\[
\mathcal C_L = \bigl\{\sigma\cup P_{(y)}(x,z)\colon\, \sigma\in\mathcal S_2^{(n_\beta-K)},\ x,y\in\Lambda_\beta,\ (x,y,z)\perp\sigma\bigr\}
\tag{20.2.36}
\]

is the analogue of (19.2.13), namely, the set of configurations obtained from
$\mathcal S_2^{(n_\beta-K)}$ by adding a critical droplet somewhere in $\Lambda_\beta$ (lower-left corner at $x$,
protuberance at $y$, free particle at $z$) such that it does not interact with the particles
in $\sigma$ and has an empty box of size $2L_\beta$ around it. Note that the $n_\beta-K$ particles can
block at most $n_\beta(2L_\beta)^2 = o(|\Lambda_\beta|)$ sites from being the center of an empty box of
size $2L_\beta$, and so the critical droplet can be added at $|\Lambda_\beta| - o(|\Lambda_\beta|)$ locations.
We partition CL into sets CL (x), x ∈ Λβ , according to the location of the proto-
critical droplet. It suffices to consider the case where the critical droplet is added at
x = 0, because the union over x trivially produces a factor |Λβ |.

2. Overall strategy. Starting from a configuration in CL (0), we will successively

pick K − L particles from the critical droplet (starting with the free particle at z
at distance 2) and move them out of the box BLβ (0), placing them essentially uni-
formly in the annulus B2Lβ (0)\BLβ (0). Once this has been achieved, the configu-
ration is in SL . Each such move will produce an entropy of order L2β , which will
be enough to compensate for the loss of energy in tearing down the droplet. The
order in which the particles are removed follows the canonical order employed in
the lower bound for Glauber dynamics (recall Fig. 19.2). As for Glauber, we will
use Theorem 7.43 to estimate

\[
\mathrm{cap}(\mathcal C_L,\mathcal S_L) \ge |\Lambda_\beta| \sum_{\sigma\in\mathcal C_L(0)}\ \sum_{\gamma\colon\,\gamma_0=\sigma} P^f(\gamma)
\Biggl[\,\sum_{k=0}^{\tau(\gamma)} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})}\Biggr]^{-1}
\tag{20.2.37}
\]
for a suitably constructed flow $f$ and associated path measure $P^f$, starting from
some initial distribution on $\mathcal C_L(0)$ (which as for Glauber will be irrelevant), and
$\tau(\gamma)$ the time at which the last of the $K-L$ particles exits the box $B_{L_\beta}(0)$.
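The structure of the path-wise lower bound (20.2.37) is easiest to see on a toy network in which the source and the target are joined by disjoint parallel paths: each path $\gamma$ receives a fraction $P^f(\gamma)$ of the unit flow, and the bound sums $P^f(\gamma)$ times the inverse of $\sum_k f/(\mu c)$ along $\gamma$. The sketch below is such a toy example and not the flow constructed in the proof; the function names and the edge weights are hypothetical.

```python
def path_bound(flow_split, path_conductances):
    """Path-wise lower bound on the capacity for disjoint parallel paths.

    flow_split[i] is the fraction of the unit flow sent down path i and
    path_conductances[i] lists the edge weights mu(x) * c(x, y) along that path.
    """
    total = 0.0
    for p, conductances in zip(flow_split, path_conductances):
        if p == 0.0:
            continue                                   # a path without flow contributes nothing
        weighted_sum = sum(p / c for c in conductances)  # sum_k f / (mu * c) along the path
        total += p / weighted_sum                        # P^f(gamma) times the inverse of that sum
    return total

def exact_capacity(path_conductances):
    """Series/parallel law: sum over paths of 1 / (sum of edge resistances)."""
    return sum(1.0 / sum(1.0 / c for c in cs) for cs in path_conductances)

# Hypothetical edge weights on two disjoint paths.
paths = [[1.0, 2.0, 4.0], [0.5, 0.5]]
bounds = [path_bound(split, paths) for split in ((0.3, 0.7), (0.5, 0.5), (0.9, 0.1))]
```

For disjoint paths every split with strictly positive weights already gives the exact capacity, whereas a split that ignores a path gives a strictly smaller bound. In the proof the paths overlap and the flow has to be chosen with care, which is where the difficulty lies.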
The difference between Glauber and Kawasaki is that, while in Glauber the
droplet can be torn down via single spin-flips, in Kawasaki after we have detached
a particle from the droplet we need to move it out of the box BLβ (0), which takes
a large number of steps. Thus, τ (γ ) is the sum of K − L stopping times, each of
which, except the first, is a sum of two stopping times itself, one to detach the parti-
cle and one to move it out of the box BLβ (0). With each motion of a single particle
we need to gain an entropy factor of order close to 1/ρβ . This will be done by con-
structing a flow that involves only the motion of this single particle, based on the
harmonic function of the simple random walk in the box B2Lβ (0) up to the boundary
of the box BLβ (0). Outside BLβ (0) the flow becomes more complex: we modify it
in such a way that a small fraction of the flow, of order $L_\beta^{-1+\varepsilon}$ for some $\varepsilon > 0$ small
enough, is going into the direction of removing the next particle from the droplet.
The reason for this choice is that we want to make sure that the flow becomes sufficiently
small, of order $L_\beta^{-2+\varepsilon}$, so that this can compensate for the fact that the Gibbs
weight in the denominator of the lower bound in Theorem 7.43 is reduced by a factor
$e^{-\beta U}$ when the protuberance is detached. The reason for the extra $\varepsilon$ is that we
want to make sure that, along most of the paths, the protuberance is detached before
the first particle leaves the box B2Lβ (0).
Once the protuberance detaches itself from the protocritical droplet, the first par-
ticle stops and the second particle moves in the same way as the first particle did
when it moved away from the protocritical droplet, and so on. This is repeated until
no more than L particles remain in BLβ (0), by which time we have reached SL .
As we will see, the only significant contribution to the lower bound comes from the
motion of the first particle (as for Glauber), and this coincides with the upper bound
established earlier. The details of the construction are to some extent arbitrary and
there are many other choices imaginable.

3. First particle. We first construct the flow that moves the particle at distance 2
from the protocritical droplet to the boundary of the box BLβ (0). This flow will
consist of independent flows for each fixed shape and location of the critical droplet,
and will be seen to produce the essential contribution to the lower bound.
We label the configurations in CL (0) by σ , describing the shape of the critical
droplet, as well as the configuration outside the box B2Lβ (0), and we label the posi-
tion of the free particle in σ by z1 (σ ).
Let $g$ be the harmonic function for simple random walk with boundary conditions
0 on $[B_{2L_\beta}(0)]^c$ and 1 on the critical droplet. Then we choose our flow to be

\[
f\bigl(\sigma(z),\sigma(z')\bigr) =
\begin{cases}
C_1\,[g(z)-g(z+e)]_+, & \text{if } z' = z+e,\ \|e\| = 1,\\
0, & \text{otherwise},
\end{cases}
\tag{20.2.38}
\]

where $\sigma(z)$ is the configuration obtained from $\sigma$ by placing the first particle at site $z$.
The constant $C_1$ is chosen to ensure that $f$ defines a unit flow, i.e.,

\[
C_1 \sum_{\sigma\in\mathcal C_L(0)}\ \sum_{z_1(\sigma),\,e} \bigl[g(z_1(\sigma)) - g(z_1(\sigma)+e)\bigr]
= C_1 \sum_{\sigma\in\mathcal C_L(0)} \mathrm{cap}\bigl(P_\sigma(0),[B_{2L_\beta}(0)]^c\bigr) = 1,
\tag{20.2.39}
\]

where $P_\sigma(0)$ denotes the support of the protocritical droplet in $\sigma$, and the capacity
refers to the simple random walk.
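The normalisation (20.2.39) and the fact that the positive parts of the increments of a harmonic function form a flow can be checked numerically on a small box. In the sketch below a single site stands in for the droplet and only one droplet shape is considered, so that $C_1$ is the inverse of a single capacity; the box radius, the number of Gauss-Seidel sweeps and all names are illustrative assumptions.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def harmonic_function(radius, sweeps=2000):
    """g = 1 at the origin, g = 0 outside [-radius, radius]^2, harmonic in between."""
    n = radius + 1
    size = 2 * n + 1
    g = [[0.0] * size for _ in range(size)]
    g[n][n] = 1.0
    for _ in range(sweeps):                      # Gauss-Seidel iteration
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                if (i, j) != (n, n):
                    g[i][j] = 0.25 * (g[i - 1][j] + g[i + 1][j] + g[i][j - 1] + g[i][j + 1])
    return g, n, size

def check_unit_flow(radius):
    """Return (flow out of the source, largest violation of Kirchhoff's law)."""
    g, n, size = harmonic_function(radius)
    capacity = sum(g[n][n] - g[n + di][n + dj] for di, dj in STEPS)
    C1 = 1.0 / capacity                          # normalisation as in (20.2.39), one shape only

    def f(i, j, di, dj):
        # Flow along the edge from (i, j) to (i + di, j + dj), as in (20.2.38).
        return C1 * max(g[i][j] - g[i + di][j + dj], 0.0)

    outflow = sum(f(n, n, di, dj) for di, dj in STEPS)
    defect = 0.0
    for i in range(1, size - 1):
        for j in range(1, size - 1):
            if (i, j) == (n, n):
                continue
            out = sum(f(i, j, di, dj) for di, dj in STEPS)
            inn = sum(f(i + di, j + dj, -di, -dj) for di, dj in STEPS)
            defect = max(defect, abs(out - inn))
    return outflow, defect

outflow, defect = check_unit_flow(6)
```

The total flow leaving the source equals 1, and at every other interior site the inflow matches the outflow up to rounding, because $g$ is harmonic there.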
Now, let $z_1(k)$ be the location of the first particle at time $k$, and
\[
\tau^1 = \inf\bigl\{k\in\mathbb N\colon\, z_1(k)\in[B_{L_\beta}(0)]^c\bigr\}
\tag{20.2.40}
\]

be the first time when, under the Markov chain associated to the flow $f$, it exits
$B_{L_\beta}(0)$. Let $\gamma$ be a path of this Markov chain. Then, by (20.2.38)–(20.2.39), we
have
\[
\sum_{k=0}^{\tau^1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})}
= \frac{C_1\,[g(z_1(0)) - g(z_1(\tau^1))]}{\mu_\beta(\gamma_0)},
\tag{20.2.41}
\]

where the sum over the $g$’s is telescoping because only paths along which the
$g$-function decreases carry positive probability, and $c_\beta(\gamma_k,\gamma_{k+1}) = 1$ for all $0 \le k \le \tau^1$
because the first particle is free. We have $g(z_1(0)) = 1$, while, by Lemma 20.7,
there exists a $C < \infty$ such that
\[
g(x) \le C/\ln L_\beta, \qquad x\in[B_{L_\beta}(0)]^c.
\tag{20.2.42}
\]

Therefore
\[
\sum_{k=0}^{\tau^1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})}
= \frac{C_1}{\mu_\beta(\gamma_0)}\,[1+o(1)].
\tag{20.2.43}
\]

Next, by Lemma 20.6, we have

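\[
\mathrm{cap}\bigl(P_\sigma(0),[B_{2L_\beta}(0)]^c\bigr) = \frac{4\pi}{\beta\Delta}\,[1+o(1)],
\qquad \sigma\in\mathcal{C}_L(0),\ \beta\to\infty,
\tag{20.2.44}
\]
(because $\{0\}\subset P_\sigma(0)\subset B_{2\ell_c}(0)$ for all $\sigma\in\mathcal{C}_L(0)$). Since $N=|\mathcal{C}_L(0)|$, it follows from (20.2.39) that
\[
\frac{1}{C_1} = N\,\frac{4\pi}{\beta\Delta}\,[1+o(1)],
\tag{20.2.45}
\]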

and so (20.2.43) becomes

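\[
\Biggl(\sum_{k=0}^{\tau^1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}\Biggr)^{-1}
= \mu_\beta(\gamma_0)\,N\,\frac{4\pi}{\beta\Delta}\,[1+o(1)], \qquad \beta\to\infty.
\tag{20.2.46}
\]
This is the contribution we want, because when we sum (20.2.46) over $\gamma_0=\sigma\in\mathcal{C}_L(0)$ (recall (20.2.37)), we get a factor
\[
\mu_\beta\bigl(\mathcal{C}_L(0)\bigr) = e^{-\beta\Gamma}\,\mu_\beta(\mathcal{S})\,[1+o(1)].
\tag{20.2.47}
\]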

To see why (20.2.47) is true, recall from (20.2.36) that $\mathcal{C}_L(0)$ is obtained from $\mathcal{S}_2^{(n_\beta-K)}$ by adding a critical droplet with lower-left corner at the origin that does not interact with the $n_\beta-K$ particles elsewhere in $\Lambda_\beta$. Hence
\[
\mu_\beta\bigl(\mathcal{C}_L(0)\bigr) = e^{-\beta\bar\Gamma}\,\frac{\tilde Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}},
\tag{20.2.48}
\]
where $\tilde Z_\beta^{(n_\beta-K)}(0)$ is the analogue of $\check Z_\beta^{(n_\beta-K)}(0)$ (defined in (20.2.17)) obtained by requiring that the $n_\beta-K$ particles are in $[R_{\ell_c,\ell_c}(0)]^c$ instead of $[B_{L_\beta}(0)]^c$. However, it will follow from the proofs of Lemmas 20.8–20.10 in Sect. 20.4 that, similarly as in (20.2.22),
\[
\frac{\tilde Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}} = (\rho_\beta)^K\,\mu_\beta(\mathcal{S})\,[1+o(1)], \qquad \beta\to\infty,
\tag{20.2.49}
\]

which yields (20.2.47) because Γ = Γ¯ + KΔ. For the remaining part of the con-
struction of the flow it therefore suffices to ensure that the sum beyond τ 1 gives a
smaller contribution.

4. Second particle. Once the first particle (i.e., the free particle) has left the box $B_{L_\beta}(0)$, we need to allow the second particle (i.e., the protuberance) to detach itself from the protocritical droplet and to move out of $B_{L_\beta}(0)$ as well. The problem is that detaching the second particle reduces the Gibbs weight appearing in the denominator by $e^{-U\beta}$, while the increments of the flow are reduced only to about $1/L_\beta$. Thus, we cannot immediately detach the second particle. Instead, we do this with probability $L_\beta^{-1+\varepsilon}$ only. The idea is that, once the first particle is outside $B_{L_\beta}(0)$, we leak some of the flow that drives the motion of the first particle into a flow that detaches the second particle. To do this, we have to first construct a leaky flow in $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$ for simple random walk. This goes as follows.
Let $p(z,z+e)$ denote the transition probabilities of simple random walk driven by the harmonic function $g$ on $B_{2L_\beta}(0)$. Put
\[
\tilde p(z,z+e) =
\begin{cases}
p(z,z+e), & \text{if } z\in B_{L_\beta}(0),\\
(1-L_\beta^{-1+\varepsilon})\,p(z,z+e), & \text{if } z\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0).
\end{cases}
\tag{20.2.50}
\]

Use the transition probabilities $\tilde p(z,z+e)$ to define a path measure $\tilde P$. This path measure describes simple random walk driven by $g$, but with a killing probability $L_\beta^{-1+\varepsilon}$ inside the annulus $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$. Put
\[
k(z,z+e) = \sum_{\gamma} \tilde P(\gamma)\,1_{(z,z+e)\in\gamma}, \qquad z\in B_{2L_\beta}(0).
\tag{20.2.51}
\]

This edge function satisfies the following equations:

• $k(z,z+e) = [g(z)-g(z+e)]_+$, if $z\in B_{L_\beta}(0)$,
• $k(z,z+e) = 0$, if $z\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$ and $[g(z)-g(z+e)]_+ = 0$,
• $(1-L_\beta^{-1+\varepsilon})\sum_e k(z+e,z)\,1_{g(z+e)-g(z)>0} = \sum_e k(z,z+e)\,1_{g(z)-g(z+e)>0}$, if $z\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$. (20.2.52)

Note that inside the annulus $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$ at each site the flow out is less than the flow in by a leaking factor $1-L_\beta^{-1+\varepsilon}$. We pick $\varepsilon>0$ so small that
\[
e^{\beta U} \text{ is exponentially smaller in } \beta \text{ than } L_\beta^{2-\varepsilon}
\tag{20.2.53}
\]

(which is possible by (20.1.5)–(20.1.6)). The important fact for us is that this leaky flow is dominated by the harmonic flow associated with $g$, in particular, the flow in satisfies
\[
\sum_e k(z+e,z) \le \sum_e [g(z+e)-g(z)]_+ \qquad \forall\, z\in B_{2L_\beta}(0)
\tag{20.2.54}
\]
(and the same applies for the flow out). This inequality holds because $g$ satisfies the same equations as in (20.2.50)–(20.2.51) but without the leaking factor $1-L_\beta^{-1+\varepsilon}$.
Using this leaky flow, we can now construct a flow involving the first two particles, as follows:

• $f\bigl(\sigma(z_1,a),\sigma(z_1+e,a)\bigr) = C_1\,k(z_1,z_1+e)$, if $z_1\in B_{2L_\beta}(0)$,
• $f\bigl(\sigma(z_1,a),\sigma(z_1,b)\bigr) = C_1\,L_\beta^{-1+\varepsilon}\sum_e k(z_1,z_1+e)$, if $z_1\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$,
• $f\bigl(\sigma(z_1,z_2),\sigma(z_1,z_2+e)\bigr) = C_1\,L_\beta^{-1+\varepsilon}\bigl\{\sum_{e'} k(z_1,z_1+e')\bigr\}\,[g(z_2)-g(z_2+e)]_+$, if $z_1\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$, $z_2\in B_{L_\beta}(0)\setminus P_\sigma(0)$. (20.2.55)

Here, we write $a$ and $b$ for the locations of the second particle prior and after it detaches itself from the protocritical droplet, and $\sigma(z_1,z_2)$ for the configuration obtained from $\sigma$ by placing the first particle (that was at distance 2 from the protocritical droplet) at site $z_1$ and the second particle (that was the protuberance) at site $z_2$. The flow for other motions is zero, and the constant $C_1$ is the same as in (20.2.38)–(20.2.39).
We next define two further stopping times, namely,
\[
\zeta^2 = \inf\bigl\{k\in\mathbb{N}\colon z_2(\gamma_k) = b\bigr\},
\tag{20.2.56}
\]
i.e., the first time the second particle (the protuberance) detaches itself from the protocritical droplet, and
\[
\tau^2 = \inf\bigl\{k\in\mathbb{N}\colon z_2(\gamma_k)\in[B_{L_\beta}(0)]^c\bigr\},
\tag{20.2.57}
\]
i.e., the first time the second particle exits the box $B_{L_\beta}(0)$. Note that, since we choose the leaking probability to be $L_\beta^{-1+\varepsilon}$, the probability that $\zeta^2$ is larger than the first time the first particle exits $B_{2L_\beta}(0)$ is of order $\exp[-L_\beta^{\varepsilon}]$ and hence is negligible. We will disregard the contributions of such paths in the lower bound. The remaining paths will be called good.
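The order $\exp[-L_\beta^{\varepsilon}]$ can be made plausible with a one-line computation. The sketch below rests on a heuristic assumption that is not spelled out in the text: to cross the annulus $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$ the first particle needs at least of order $L_\beta$ steps, and each step leaks independently with probability $L_\beta^{-1+\varepsilon}$. The function names are illustrative, and logarithms are used to avoid floating-point underflow.

```python
import math


def log_no_leak_probability(L, eps):
    """log of (1 - L**(-1+eps))**L, the chance of no leak in L steps."""
    return L * math.log1p(-L ** (-1.0 + eps))


def leading_order_exponent(L, eps):
    """The exponent -L**eps appearing in exp[-L**eps]."""
    return -float(L) ** eps


if __name__ == "__main__":
    for L in (10**2, 10**4, 10**6):
        ratio = log_no_leak_probability(L, 0.5) / leading_order_exponent(L, 0.5)
        print(L, ratio)  # the ratio approaches 1 as L grows
```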
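We will next show that (20.2.41) also holds if we extend the sum along any path of positive probability up to $\zeta^2$. The reason for this lies in Lemma 20.7. Let $\gamma$ be a path that has a positive probability under the path measure $P^f$ associated with $f$ stopped at $\tau^2$. We will assume that this path is good in the sense described above. To that end we decompose
\[
\begin{aligned}
\sum_{k=0}^{\tau^2} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}
&= \sum_{k=0}^{\tau^1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}
+ \sum_{k=\tau^1+1}^{\zeta^2-2} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}\\
&\quad + \sum_{k=\zeta^2-1}^{\tau^2} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}
= I + II + III.
\end{aligned}
\tag{20.2.58}
\]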

The term $I$ was already estimated in (20.2.41)–(20.2.47). To estimate $II$, we use (20.2.42) and (20.2.54)–(20.2.55) to bound (compare with (20.2.41))
\[
II \le C_1\,\frac{g(z_1(\zeta^2)) - g(z_1(\tau^1))}{\mu_\beta(\gamma_0)} \le C_1\,\frac{[C/\ln L_\beta]}{\mu_\beta(\gamma_0)},
\tag{20.2.59}
\]
which is negligible compared to $I$ due to the factor $C/\ln L_\beta$. It remains to estimate $III$. Note that
\[
III = \frac{f(\gamma_{\zeta^2-1},\gamma_{\zeta^2})}{\mu_\beta(\gamma_{\zeta^2-1})\,c_\beta(\gamma_{\zeta^2-1},\gamma_{\zeta^2})}
+ \sum_{k=\zeta^2}^{\tau^2} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}.
\tag{20.2.60}
\]

The first term corresponds to the move when the protuberance detaches itself from the protocritical droplet. Its numerator is given by $f(\sigma(z_1,a),\sigma(z_1,b))$ (for some $z_1\in[B_{L_\beta}(0)]^c$) which, by Lemma 20.7 and (20.2.54)–(20.2.55), is smaller than $C_1 L_\beta^{-1+\varepsilon}\,C L_\beta^{-1} = C_1 C L_\beta^{-2+\varepsilon}$. On the other hand, its denominator is given by
\[
\mu_\beta(\gamma_{\zeta^2-1})\,c_\beta(\gamma_{\zeta^2-1},\gamma_{\zeta^2}) = \mu_\beta(\gamma_0)\,e^{-U\beta}.
\tag{20.2.61}
\]
The same holds for the denominators in all the other terms in $III$, while the numerators in these terms satisfy the bound
\[
f(\gamma_k,\gamma_{k+1}) \le C_1 C\,L_\beta^{-2+\varepsilon}\,\bigl[g(z_2(\gamma_k)) - g(z_2(\gamma_{k+1}))\bigr].
\tag{20.2.62}
\]
Adding up the various terms, we get that
\[
III \le \frac{C_1}{\mu_\beta(\gamma_0)}\,L_\beta^{-2+\varepsilon}\,e^{\beta U}\Bigl(1 + \bigl[g(z_2(\zeta^2)) - g(z_2(\tau^2))\bigr]\Bigr)
\le \frac{2C_1}{\mu_\beta(\gamma_0)}\,L_\beta^{-2+\varepsilon}\,e^{\beta U}.
\tag{20.2.63}
\]
The right-hand side is smaller than $I$ by a factor $L_\beta^{-2+\varepsilon}e^{\beta U}$, which, by (20.2.53), is exponentially small in $\beta$.

5. Remaining particles. The lesson from the previous steps is that we can construct a flow with the property that each time we remove a particle from the droplet we gain a factor $L_\beta^{-2+\varepsilon}$, i.e., almost $e^{-\Delta\beta}$. (This entropy gain corresponds to the gain from the magnetic field in Glauber dynamics, or from the activity in Kawasaki dynamics on a finite open box.) We can continue our flow by tearing down the critical droplet
in the same order as we did for Glauber dynamics. Each removal corresponds to a
flow that is built in the same way as described in Step 4 for the second particle. There
will be some minor modifications involving a negligible fraction of paths where a
particle hits a particle that was moved out earlier, but this is of no consequence. As
a result of the construction, the sums along the remainders of these paths will give
only negligible contributions.
Thus, we have shown that the lower bound coincides, up to a factor 1 + o(1),
with the upper bound and the lemma is proven.

Combining the upper bound obtained in Sect. 20.2.2 with the lower bound ob-
tained in Sect. 20.2.2, we have finally completed the proof of Lemma 20.5, and
therefore of Theorem 20.3(a).

20.3 Average time to grow a droplet twice the critical size

In this section we prove Theorem 20.3(b). The starting point is again the analogue of (19.3.1) with $\mathcal{S}^c\setminus\mathcal{C}$ replaced by $\mathcal{D}_M$ and $\mathcal{S}\cup\mathcal{C}$ by $\mathcal{D}_M^c$.

Proof The same observation holds as in (19.4.1). Therefore the proof follows along the same lines as that of Theorem 20.3(a). The main point is to prove $\mathrm{cap}(\mathcal{D}_M,\mathcal{S}_L) = [1+o(1)]\,\mathrm{cap}(\mathcal{C}^+,\mathcal{S}_L)$. Since $\mathrm{cap}(\mathcal{S}_L,\mathcal{D}_M)\le\mathrm{cap}(\mathcal{S}_L,\mathcal{C}^+)$, all we need to do is prove a lower bound on $\mathrm{cap}(\mathcal{D}_M,\mathcal{S}_L)$. This is done in almost exactly the same way as for Glauber, by using the construction given there and substituting each Glauber move by a flow involving the motion of just two particles. Note that, as long as $M=e^{o(\beta)}$, an $M\times M$ droplet can be added at $|\Lambda_\beta|-o(|\Lambda_\beta|)$ locations to a configuration $\sigma\in\mathcal{S}$ (compare with (20.2.36)). The only novelty is that we have to eventually remove the cloud of particles that is produced in the annulus $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$. This is done in much the same way as before. As long as only $e^{o(\beta)}$ particles have to be removed, potential collisions between particles can be ignored as they are sufficiently unlikely.

20.4 Equivalence of ensembles

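Recall that $K=\ell_c(\ell_c-1)+2$ is the number of particles in a critical droplet. For $m\in\mathbb{N}_0$, let
\[
\mathcal{S}^{(n_\beta-m)} = \bigl\{\sigma\in\mathcal{S}_\beta^{(n_\beta-m)}\colon |\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)| < K\ \ \forall\, x\in\Lambda_\beta\bigr\}
\tag{20.4.1}
\]
and
\[
\begin{aligned}
Z_\beta^{(n_\beta-m)} &= \sum_{\sigma\in\mathcal{S}^{(n_\beta-m)}} e^{-\beta H_\beta(\sigma)},\\
\check Z_\beta^{(n_\beta-m)} &= \sum_{\sigma\in\mathcal{S}^{(n_\beta-m)}} e^{-\beta H_\beta(\sigma)}\,1_{\{\mathrm{supp}[\sigma]\subset\Lambda_\beta\setminus B_{L_\beta}(0)\}}.
\end{aligned}
\tag{20.4.2}
\]
The first is the partition function with $n_\beta-m$ particles restricted such that no box of size $L_\beta$ has $\ge K$ particles. The second is the same partition function but with the additional restriction that no particle falls in $B_{L_\beta}(0)$.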
The following lemma was used in (20.2.22), (20.2.26), (20.2.32) and (20.2.49).

Lemma 20.8 $\check Z_\beta^{(n_\beta-m)}/Z_\beta^{(n_\beta)} = (\rho_\beta)^m\,\mu_\beta(\mathcal{S}^{(n_\beta)})\,[1+o(1)]$ as $\beta\to\infty$ for all $m\in\mathbb{N}_0$.

In Sects. 20.4.1–20.4.2 two lemmas are proved that combine to yield Lemma 20.8.
Sections 20.4.3–20.4.4 prove atypicality of critical droplets and typicality of starting
configurations.

20.4.1 Partition functions for different numbers of particles

Lemma 20.9 $Z_\beta^{(n_\beta-m)}/Z_\beta^{(n_\beta)} = (\rho_\beta)^m\,[1+o(1)]$ as $\beta\to\infty$ for all $m\in\mathbb{N}$.

Proof The proof comes in 5 steps.

1. It suffices to give the proof for m = 1. The same proof works for m ≥ 2 after we
replace nβ by nβ − m + 1. Write

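\[
\begin{aligned}
Z_\beta^{(n_\beta)} &= \frac{1}{n_\beta}\sum_{\substack{\mathrm{supp}[\sigma]\subset\Lambda_\beta\\ |\sigma|=n_\beta-1}}\ \sum_{x\in\Lambda_\beta\setminus\mathrm{supp}[\sigma]} e^{-\beta H_\beta(\sigma\vee 1_x)}\,1_{\{\sigma\vee 1_x\in\mathcal{S}^{(n_\beta)}\}}\\
&= \sum_{\substack{\mathrm{supp}[\sigma]\subset\Lambda_\beta\\ |\sigma|=n_\beta-1}} e^{-\beta H_\beta(\sigma)}\,\bigl[I(\sigma)+II(\sigma)\bigr] = I + II,
\end{aligned}
\tag{20.4.3}
\]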

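where
\[
\begin{aligned}
I(\sigma) &= \frac{1}{n_\beta}\sum_{\substack{x\in\Lambda_\beta\\ \mathrm{dist}(x,\mathrm{supp}[\sigma])>1}} 1_{\{\sigma\vee 1_x\in\mathcal{S}^{(n_\beta)}\}},\\
II(\sigma) &= \frac{1}{n_\beta}\sum_{\substack{x\in\Lambda_\beta\\ \mathrm{dist}(x,\mathrm{supp}[\sigma])=1}} e^{-\beta[H_\beta(\sigma\vee 1_x)-H_\beta(\sigma)]}\,1_{\{\sigma\vee 1_x\in\mathcal{S}^{(n_\beta)}\}}.
\end{aligned}
\tag{20.4.4}
\]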

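In the first sum the particle at $x$ is free and $H_\beta(\sigma\vee 1_x)=H_\beta(\sigma)$, while in the second sum it is not free and $H_\beta(\sigma\vee 1_x)<H_\beta(\sigma)$. For every $\sigma\in\mathcal{S}^{(n_\beta-1)}$, we have (recall (20.4.1))
\[
|\Lambda_\beta| - (2L_\beta+1)^2(n_\beta-1) \le \sum_{\substack{x\in\Lambda_\beta\\ \mathrm{dist}(x,\mathrm{supp}[\sigma])>1}} 1_{\{\sigma\vee 1_x\in\mathcal{S}^{(n_\beta)}\}} \le |\Lambda_\beta|.
\tag{20.4.5}
\]
Moreover, by (20.1.2)–(20.1.3) and (20.1.5)–(20.1.6), we have $L_\beta^2\,n_\beta = o(|\Lambda_\beta|)$, and so it follows that
\[
I = \frac{|\Lambda_\beta|}{n_\beta}\,[1+o(1)]\sum_{\sigma\in\mathcal{S}^{(n_\beta-1)}} e^{-\beta H_\beta(\sigma)}
= \frac{|\Lambda_\beta|}{n_\beta}\,[1+o(1)]\,Z_\beta^{(n_\beta-1)}
= \frac{1}{\rho_\beta}\,[1+o(1)]\,Z_\beta^{(n_\beta-1)}.
\tag{20.4.6}
\]
We will show that $II$ is exponentially smaller than $I$, which will prove the claim.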

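2. Let us define a 1-cluster as a maximal set of particles such that for each particle in the cluster there is another particle in the cluster at distance $\le 2$. Write
\[
II = \frac{1}{n_\beta}\sum_{N=1}^{n_\beta-1}\ \sum_{\substack{C_1,\dots,C_N\\ \sum_{m=1}^N |C_m| = n_\beta-1}}\ \sum_{a=1}^{N}\sum_{x\in\partial C_a}
e^{-\beta\sum_{b\neq a}H_\beta(C_b) - \beta H_\beta(C_a\vee 1_x)}\,
1_{\{x\cup[\cup_{a=1}^N C_a]\in\mathcal{S}^{(n_\beta)}\}},
\tag{20.4.7}
\]
where $N$ counts the number of 1-clusters, labelled $C_1,\dots,C_N$. Order the 1-clusters according to the number of particles they contain, by writing
\[
\begin{aligned}
II = {}& \frac{1}{n_\beta}\sum_{N_1,\dots,N_{K-1}=0}^{n_\beta-1} 1_{\{\sum_{k=1}^{K-1} kN_k = n_\beta-1\}}
\Biggl[\prod_{k=1}^{K-1}\frac{1}{N_k!}\sum_{\substack{C_1^k,\dots,C_{N_k}^k\\ |C_1^k|=\dots=|C_{N_k}^k|=k}}\Biggr]\\
& \sum_{k=1}^{K-2}\sum_{l=1}^{N_k}\sum_{x\in\partial C_l^k}
\exp\Biggl[-\beta\sum_{k'=1}^{K-1}\sum_{l'=1}^{N_{k'}} H_\beta\bigl(C_{l'}^{k'}\bigr)\,1_{\{(k',l')\neq(k,l)\}} - \beta H_\beta\bigl(C_l^k\vee 1_x\bigr)\Biggr]\\
& \times 1_{\{x\cup[\cup_{k=1}^{K-1}\cup_{l=1}^{N_k} C_l^k]\in\mathcal{S}^{(n_\beta)}\}},
\end{aligned}
\tag{20.4.8}
\]
where $N_k$ counts the number of 1-clusters of size $k$ labelled $C_1^k,\dots,C_{N_k}^k$, and the sum over $k$ in the second line does not include the term with $k=K-1$ because $C_l^{K-1}\vee 1_x$ contains $K$ particles, making it a supercritical cluster that is excluded by the indicator in the third line (recall (20.4.1)).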

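3. By a standard isoperimetric inequality, we have that
\[
H_\beta\bigl(C_l^k\vee 1_x\bigr) \ge H_{k+1} \qquad \forall\, x\in\partial C_l^k
\tag{20.4.9}
\]
with $H_k$ denoting the energy of a droplet of $k$ particles that is closest to a square or quasi-square. Therefore we may estimate
\[
\begin{aligned}
&\sum_{x\in\partial C_l^k}\exp\Biggl[-\beta\sum_{k'=1}^{K-1}\sum_{l'=1}^{N_{k'}} H_\beta\bigl(C_{l'}^{k'}\bigr)\,1_{\{(k',l')\neq(k,l)\}} - \beta H_\beta\bigl(C_l^k\vee 1_x\bigr)\Biggr]\\
&\qquad \le 4k\,e^{-\beta H_{k+1}}\exp\Biggl[-\beta\sum_{k'=1}^{K-1}\sum_{l'=1}^{N_{k'}} H_\beta\bigl(C_{l'}^{k'}\bigr)\,1_{\{(k',l')\neq(k,l)\}}\Biggr].
\end{aligned}
\tag{20.4.10}
\]
The last sum no longer contains the 1-cluster $C_l^k$ that $x$ is attached to. Since $|C_l^k|=k$, the other 1-clusters contain a total of $n_\beta-(k+1)$ particles. Hence, inserting (20.4.10) into (20.4.8), we arrive at the estimate
\[
II \le \frac{|\Lambda_\beta|}{n_\beta}\sum_{k=1}^{K-2} 4k\,e^{-\beta H_{k+1}}\,Z_\beta^{(n_\beta-k-1)},
\tag{20.4.11}
\]
where $|\Lambda_\beta|$ counts the possible locations of the 1-cluster that has been removed, we trace back the decomposition in (20.4.7)–(20.4.8), and we use that if $x\in\partial C_l^k$, then
\[
\Biggl\{x\cup\Biggl[\bigcup_{k'=1}^{K-1}\bigcup_{l'=1}^{N_{k'}} C_{l'}^{k'}\Biggr]\in\mathcal{S}^{(n_\beta)}\Biggr\}
\subset
\Biggl\{\Biggl[\bigcup_{k'=1}^{K-1}\bigcup_{l'=1}^{N_{k'}} C_{l'}^{k'}\Biggr]\setminus C_l^k\in\mathcal{S}^{(n_\beta-k-1)}\Biggr\}.
\tag{20.4.12}
\]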

However, the same argument as in Step 1 yields
\[
Z_\beta^{(n_\beta-k-1)} \le (\rho_\beta)^k\,Z_\beta^{(n_\beta-1)}\,[1+o(1)].
\tag{20.4.13}
\]
Combining (20.4.6), (20.4.11) and (20.4.13), we get
\[
II \le \sum_{k=1}^{K-2} 4k\,e^{-\beta H_{k+1}}\,(\rho_\beta)^k\ I\,[1+o(1)].
\tag{20.4.14}
\]

4. In Step 5 below we will prove that, for $\ell_c\ge 3$,
\[
H_k + (k-1)\Delta > 0 \qquad \forall\, 2\le k\le (2\ell_c-3)^2.
\tag{20.4.15}
\]
Inserting this bound into (20.4.14) and using (20.1.3), we see that the sum in (20.4.14) is $O(e^{-\beta\varepsilon})$ for some $\varepsilon>0$, and so $II$ indeed is exponentially smaller than $I$. Here, note that if $\ell_c\ge 3$, then $K\le(2\ell_c-3)^2$, which means that (20.4.15) covers the range of $k$-values needed in (20.4.14). For $\ell_c=2$ we have $K=4$, but $H_2+\Delta=-U+\Delta>0$ and $H_3+2\Delta=-2U+2\Delta>0$ because $\Delta>U$, and so we are done as well.

5. It remains to prove (20.4.15). Let $|\sigma|$ denote the volume of $\sigma$ (the number of particles) and $\gamma(\sigma)$ the perimeter of $\sigma$ (the number of holes next to a particle). Then the energy of $\sigma$ equals (recall (18.2.1))
\[
H_\beta(\sigma) = -U\bigl[2|\sigma| - \tfrac12\gamma(\sigma)\bigr], \qquad \sigma\in\mathcal{S}_\beta.
\tag{20.4.16}
\]
By the standard isoperimetric inequality, we have
\[
|\sigma| \le \bigl[\tfrac14\gamma(\sigma)\bigr]^2 \qquad \forall\,\sigma\in\mathcal{S}_\beta.
\tag{20.4.17}
\]
Hence
\[
H_k + (k-1)\Delta \ge -2U\bigl[k-\sqrt{k}\bigr] + (k-1)\Delta = -(2U-\Delta)k + 2U\sqrt{k} - \Delta.
\tag{20.4.18}
\]
Let $\ell_* = U/(2U-\Delta)$. Then the right-hand side equals $(2U-\Delta)\bigl[-k + 2\ell_*\sqrt{k} - (2\ell_*-1)\bigr]$, which is $=0$ for $k=1$ and $>0$ for $2\le k<(2\ell_*-1)^2$. Since $\ell_c=\lceil\ell_*\rceil$ and $\ell_*\notin\mathbb{N}$ (recall (18.1.5)–(18.1.6)), we have $2\ell_*-1>2\ell_c-3$, which proves the claim.
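The positivity range in Step 5 is easy to check numerically. The sketch below evaluates the right-hand side of (20.4.18) for sample parameter values chosen for illustration only (any $U<\Delta<2U$ with $U/(2U-\Delta)$ non-integer would do); the function names are not from the text.

```python
import math


def critical_lengths(U, Delta):
    """Return (ell_star, ell_c) with ell_star = U/(2U - Delta) and
    ell_c = ceil(ell_star); assumes U < Delta < 2U."""
    ell_star = U / (2.0 * U - Delta)
    return ell_star, math.ceil(ell_star)


def lower_bound(k, U, Delta):
    """Right-hand side of (20.4.18): -(2U - Delta) k + 2U sqrt(k) - Delta."""
    return -(2.0 * U - Delta) * k + 2.0 * U * math.sqrt(k) - Delta


def check_range(U, Delta):
    """True if the bound is positive for all 2 <= k <= (2 ell_c - 3)^2."""
    _, ell_c = critical_lengths(U, Delta)
    k_max = (2 * ell_c - 3) ** 2
    return all(lower_bound(k, U, Delta) > 0.0 for k in range(2, k_max + 1))


if __name__ == "__main__":
    U, Delta = 1.0, 1.6  # illustrative values, giving ell_star = 2.5
    print(critical_lengths(U, Delta), check_range(U, Delta))
    print([round(lower_bound(k, U, Delta), 3) for k in range(1, 17)])
```

For these values the bound vanishes at $k=1$ and again at $k=(2\ell_*-1)^2=16$, and is strictly positive in between, in line with the factorisation $-(\sqrt{k}-1)(\sqrt{k}-(2\ell_*-1))$ of the bracket.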

20.4.2 Partition functions for different volumes

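Lemma 20.10 $\lim_{\beta\to\infty}\check Z_\beta^{(n_\beta-m)}/Z_\beta^{(n_\beta-m)} = 1$ for all $m\in\mathbb{N}_0$.

Proof It suffices to give the proof for $m=0$. The same proof works for $m\ge 1$ after we replace $n_\beta$ by $n_\beta-m$. Since $\check Z_\beta^{(n_\beta)}\le Z_\beta^{(n_\beta)}$, it suffices to prove the lower bound. Write
\[
\begin{aligned}
Z_\beta^{(n_\beta)} &= \check Z_\beta^{(n_\beta)} + \sum_{m=1}^{K}\ \sum_{\substack{\eta\in\mathcal{S}_\beta^{(m)},\ \zeta\in\mathcal{S}_\beta^{(n_\beta-m)}\\ \eta\vee\zeta\in\mathcal{S}^{(n_\beta)}}} e^{-\beta H_\beta(\eta\vee\zeta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,1_{\{\mathrm{supp}[\zeta]\subset[B_{L_\beta}(0)]^c\}}\\
&\le \check Z_\beta^{(n_\beta)} + \gamma_1(\beta) + \gamma_2(\beta),
\end{aligned}
\tag{20.4.19}
\]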
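where
\[
\gamma_1(\beta) = \sum_{m=1}^{K}\ \sum_{\substack{\eta\in\mathcal{S}_\beta^{(m)},\ \zeta\in\mathcal{S}_\beta^{(n_\beta-m)}\\ \eta\vee\zeta\in\mathcal{S}^{(n_\beta)}}} e^{-\beta[H_\beta(\eta)+H_\beta(\zeta)]}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,1_{\{\mathrm{supp}[\zeta]\subset[B_{L_\beta}(0)]^c\}}
\tag{20.4.20}
\]
and $\gamma_2(\beta)$ is a term that arises from particles interacting across the boundary of $B_{L_\beta}(0)$. We will show that both $\gamma_1(\beta)$ and $\gamma_2(\beta)$ are negligible.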
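Estimate
\[
\begin{aligned}
\gamma_1(\beta) &\le \sum_{m=1}^{K}\check Z_\beta^{(n_\beta-m)}\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\\
&= [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\\
&= [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^j k_i=m}}\ \sum_{\substack{C=\cup_{i=1}^j C_i\subset B_{L_\beta}(0)\\ |C_i|=k_i\ \forall\, i}} e^{-\beta\sum_{i=1}^j H_\beta(C_i)},
\end{aligned}
\tag{20.4.21}
\]
where the first equality uses Lemma 20.9 with $\Lambda_\beta$ replaced by $\Lambda_\beta\setminus B_{L_\beta}(0)$, while the second equality is an expansion in terms of clusters. Using once more the isoperimetric inequality in (20.4.15), we get (recall (20.1.5))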

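\[
\begin{aligned}
\gamma_1(\beta) &\le [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^j k_i=m}} e^{-\beta\sum_{i=1}^j H_{k_i}}\sum_{\substack{C=\cup_{i=1}^j C_i\\ |C_i|=k_i\ \forall\, i}} 1\\
&\le A\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^j k_i=m}} \bigl(L_\beta^2\bigr)^j\,e^{-\beta\sum_{i=1}^j H_{k_i}}\\
&= A\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^j k_i=m}} e^{-\beta\sum_{i=1}^j [H_{k_i}+k_i\Delta-(\Delta-\delta_\beta)]}\\
&\le B\,\check Z_\beta^{(n_\beta)}\,e^{-\beta\varepsilon}
\end{aligned}
\tag{20.4.22}
\]
for some $\varepsilon>0$ and some constants $A,B<\infty$ that are independent of $\beta$, i.e., $\gamma_1(\beta)$ is negligible. Estimate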
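\[
\begin{aligned}
\gamma_2(\beta) &\le \sum_{m=1}^{K}\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\sum_{k=1}^{m} e^{\beta kU}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,\check Z_\beta^{(n_\beta-m-k)}\\
&\le \sum_{m=1}^{K}\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\sum_{k=1}^{m} e^{\beta kU}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,(\rho_\beta)^{m+k}\,\check Z_\beta^{(n_\beta)}\,[1+o(1)]\\
&\le [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\sum_{k=1}^{m} e^{-\beta k(\Delta-U)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}},
\end{aligned}
\tag{20.4.23}
\]
and we can proceed as in (20.4.21)–(20.4.22) to show that
\[
\gamma_2(\beta) \le C\,\check Z_\beta^{(n_\beta)}\,e^{-\beta\varepsilon}
\tag{20.4.24}
\]
for some $\varepsilon>0$ and some constant $C<\infty$ that is independent of $\beta$, i.e., $\gamma_2(\beta)$ is negligible.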

20.4.3 Atypicality of critical droplets

The following lemma was used in Sect. 20.2.1. Recall Definition 20.1, and note that $\mathcal{S}=\mathcal{S}^{(n_\beta)}$.

Lemma 20.11 $\lim_{\beta\to\infty}\mu_\beta(\tilde{\mathcal{C}}\setminus\mathcal{C}^+)/\mu_\beta(\mathcal{S}) = 0$.

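Proof Similarly as in (20.4.19), we first write
\[
\begin{aligned}
\mu_\beta\bigl(\tilde{\mathcal{C}}\setminus\mathcal{C}^+\bigr) \le \mu_\beta(\tilde{\mathcal{C}}) = {}& \gamma_3(\beta) + |\Lambda_\beta|\,[1+o(1)]\ \sideset{}{^*}\sum_{\eta\in\mathcal{S}_\beta^{(K)}}\ \sum_{\zeta\in\mathcal{S}_\beta^{(n_\beta-K)}}\\
&\times \frac{e^{-\beta[H_\beta(\eta)+H_\beta(\zeta)]}}{Z_\beta^{(n_\beta)}}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,1_{\{\mathrm{supp}[\zeta]\subset[B_{L_\beta}(0)]^c\}},
\end{aligned}
\tag{20.4.25}
\]
where the first sum runs over all configurations in $B_{L_\beta}(0)$ consisting of a protocritical droplet centred at the origin and a free particle elsewhere, and $\gamma_3(\beta)$ is a negligible term that arises from particles interacting across the boundary of $B_{L_\beta}(0)$, similar to the term $\gamma_2(\beta)$ in Sect. 20.4.2. The double sum in the right-hand side of (20.4.25) equals
\[
\frac{\check Z_\beta^{(n_\beta-K)}}{Z_\beta^{(n_\beta)}}\,|B_{L_\beta}(0)|\,[1+o(1)]\ N\,e^{-\beta\bar\Gamma}\,[1+o(1)],
\tag{20.4.26}
\]
where $\bar\Gamma$ $(=\Gamma-K\Delta)$ is the energy of a critical droplet and $N$ is the number of shapes of a critical droplet. By Lemma 20.8, $\check Z_\beta^{(n_\beta-K)}/Z_\beta^{(n_\beta)} = \mu_\beta(\mathcal{S}^{(n_\beta)})\,(\rho_\beta)^K\,[1+o(1)]$. Hence
\[
\begin{aligned}
\text{r.h.s. (20.4.25)} &= N\,|\Lambda_\beta|\,L_\beta^2\,e^{-\beta\bar\Gamma}\,(\rho_\beta)^K\,\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr)\,[1+o(1)]\\
&= N\,|\Lambda_\beta|\,L_\beta^2\,e^{-\beta\Gamma}\,\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr)\,[1+o(1)], \qquad \beta\to\infty,
\end{aligned}
\tag{20.4.27}
\]
which is $o(\mu_\beta(\mathcal{S}^{(n_\beta)}))$ by (20.1.11).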

20.4.4 Typicality of starting configurations

In this section we prove the claim made in (20.1.9).

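Proof Split
\[
\mathcal{S}^{(n_\beta)} = \mathcal{S} = \mathcal{S}_L\cup(\mathcal{S}\setminus\mathcal{S}_L) = \mathcal{S}_L\cup\mathcal{U}_{>L},
\tag{20.4.28}
\]
where $\mathcal{U}_{>L}\subset\mathcal{S}$ are those configurations $\sigma$ for which there exists an $x$ such that $|\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)|>L$. Then
\[
\mu_\beta(\mathcal{U}_{>L}) = \sum_{x\in\Lambda_\beta}\sum_{\sigma\in\mathcal{S}^{(n_\beta)}}\mu_\beta(\sigma)\sum_{m=L+1}^{K} 1_{\{|\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)|=m\}} = |\Lambda_\beta|\,\bigl[\varphi(\beta)+\gamma(\beta)\bigr],
\tag{20.4.29}
\]
where
\[
\varphi(\beta) = \sum_{m=L+1}^{K}\ \sum_{\substack{\eta\in\mathcal{S}_\beta^{(m)},\ \zeta\in\mathcal{S}_\beta^{(n_\beta-m)}\\ \eta\vee\zeta\in\mathcal{S}^{(n_\beta)}}} \frac{e^{-\beta[H_\beta(\eta)+H_\beta(\zeta)]}}{Z_\beta^{(n_\beta)}}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,1_{\{\mathrm{supp}[\zeta]\subset[B_{L_\beta}(0)]^c\}}
\tag{20.4.30}
\]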
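and $\gamma(\beta)$ is an error term arising from particles interacting across the boundary of $B_{L_\beta}(0)$. By the same argument as in (20.4.23), this term is negligible. Moreover,
\[
\begin{aligned}
\varphi(\beta) &\le \sum_{m=L+1}^{K}\frac{\check Z_\beta^{(n_\beta-m)}}{Z_\beta^{(n_\beta)}}\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\\
&\le [1+o(1)]\,\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr)\sum_{m=L+1}^{K}(\rho_\beta)^m\sum_{\eta\in\mathcal{S}^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}},
\end{aligned}
\tag{20.4.31}
\]
where in the last inequality we use Lemmas 20.8–20.10. Now proceed as in (20.4.21)–(20.4.22), via the cluster expansion, to get
\[
\begin{aligned}
\varphi(\beta) &\le [1+o(1)]\,A\,\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr)\sum_{m=L+1}^{K}\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^j k_i=m}} e^{-\beta\sum_{i=1}^j[H_{k_i}+k_i\Delta-(\Delta-\delta_\beta)]}\\
&\le [1+o(1)]\,B\,\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr)\,e^{-\beta[\Gamma_{L+1}-(\Delta-\delta_\beta)]},
\end{aligned}
\tag{20.4.32}
\]
where $H_k$ is the energy of a droplet with $k$ particles that is closest to a square or quasi-square, $\Gamma_{L+1}=H_{L+1}+(L+1)\Delta$, and the second inequality uses the isoperimetric inequality together with the fact that $H_k+k\Delta$ is increasing in $k$ for subcritical droplets.
On the other hand, by considering only those configurations in $\mathcal{U}_{>L}$ that have a droplet with $L+1$ particles, we get
\[
\varphi(\beta) \ge [1+o(1)]\,A\,\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr)\,e^{-\beta[\Gamma_{L+1}-(\Delta-\delta_\beta)]}.
\tag{20.4.33}
\]
Combining (20.4.29) and (20.4.32)–(20.4.33), we conclude that
\[
\lim_{\beta\to\infty}\mu_\beta(\mathcal{U}_{>L})/\mu_\beta\bigl(\mathcal{S}^{(n_\beta)}\bigr) = 0
\quad\text{if and only if}\quad
\lim_{\beta\to\infty}|\Lambda_\beta|\,e^{-\beta[\Gamma_{L+1}-(\Delta-\delta_\beta)]} = 0.
\tag{20.4.34}
\]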


20.5 The critical droplet is the threshold

We show that our estimates on capacities imply that the average probability under the Gibbs measure $\mu_\beta$ of destroying a supercritical droplet and returning to a configuration in $\mathcal{S}_L$ is exponentially small in $\beta$. The proof for Glauber dynamics can be read off immediately, and fulfils the promise made in Sect. 19.1.3, Item 3.

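Proof Pick $M\ge\ell_c$. Recall (7.1.19)–(7.1.20) and (7.1.39). Summing over $\sigma\in\partial\mathcal{D}_M$, the internal boundary of $\mathcal{D}_M$, we get that
\[
\frac{\sum_{\sigma\in\partial\mathcal{D}_M}\mu_\beta(\sigma)\,c_\beta(\sigma)\,P_\sigma(\tau_{\mathcal{S}_L}<\tau_{\mathcal{D}_M})}{\sum_{\sigma\in\partial\mathcal{D}_M}\mu_\beta(\sigma)\,c_\beta(\sigma)}
= \frac{\mathrm{cap}(\mathcal{S}_L,\mathcal{D}_M)}{\sum_{\sigma\in\partial\mathcal{D}_M}\mu_\beta(\sigma)\,c_\beta(\sigma)}.
\tag{20.5.1}
\]
Clearly, the left-hand side of (20.5.1) is the escape probability to $\mathcal{S}_L$ from $\partial\mathcal{D}_M$ averaged with respect to the canonical Gibbs measure $\mu_\beta$ conditioned on $\partial\mathcal{D}_M$ and weighted by the outgoing rate $c_\beta$. To show that this quantity is exponentially small in $\beta$, which is our goal, it suffices to show that in the right-hand side of (20.5.1) the denominator is large compared to the numerator.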
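By Lemma 20.5,
\[
\mathrm{cap}(\mathcal{S}_L,\mathcal{D}_M) \le \mathrm{cap}\bigl(\mathcal{S}_L,(\mathcal{S}^c\setminus\tilde{\mathcal{C}})\cup\mathcal{C}^+\bigr) = N\,|\Lambda_\beta|\,\frac{4\pi}{\Delta\beta}\,e^{-\beta\Gamma}\,\mu_\beta(\mathcal{S})\,[1+o(1)].
\tag{20.5.2}
\]
On the other hand, note that $\partial\mathcal{D}_M$ contains all configurations $\sigma$ for which there is an $M\times M$ droplet somewhere in $\Lambda_\beta$, all $L_\beta$-boxes not containing this droplet carry at most $K$ particles, and there is a free particle somewhere in $\Lambda_\beta$. The last condition ensures that $c_\beta(\sigma)\ge 1$. Therefore we can use Lemma 20.8 to estimate
\[
\sum_{\sigma\in\partial\mathcal{D}_M}\mu_\beta(\sigma)\,c_\beta(\sigma) \ge |\Lambda_\beta|\,e^{-\beta H_{M^2}}\,\frac{\check Z_\beta^{(n_\beta-M^2)}}{Z_\beta^{(n_\beta)}}
= |\Lambda_\beta|\,e^{-\beta H_{M^2}}\,(\rho_\beta)^{M^2}\,\mu_\beta(\mathcal{S})\,[1+o(1)],
\tag{20.5.3}
\]
where $H_{M^2}$ is the energy of an $M\times M$ droplet. Combining (20.5.2)–(20.5.3) we find that the right-hand side of (20.5.1) is bounded from above by
\[
N\,\frac{4\pi}{\Delta\beta}\,\frac{\exp[-\beta\Gamma]}{\exp[-\beta(H_{M^2}+\Delta M^2)]}\,[1+o(1)],
\tag{20.5.4}
\]
which is exponentially small in $\beta$ because $\Gamma>H_{M^2}+\Delta M^2$ for all $M\ge\ell_c$.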

20.6 Bibliographical notes

1. The results in this chapter are taken from Bovier, den Hollander and Spitoni [32].
Section 20.4 corrects flaws in the original proofs of Lemmas 20.8 and 20.11.

2. In Gaudillière, den Hollander, Nardi, Olivieri, and Scoppola [118–120] the same
nucleation problem as in Sect. 20.1 is studied with the help of the pathwise approach
to metastability. Only the exponential asymptotics of the nucleation time is obtained,
but for a much wider class of initial distributions than we can presently handle with
the potential-theoretic approach. The techniques developed in these papers center
around the idea of approximating the low-temperature and low-density Kawasaki
lattice gas by an ideal gas (without interaction) and showing that this ideal gas
stays close to equilibrium while exchanging particles with droplets that are growing
and shrinking. In this way, the large system is shown to behave essentially like
the union of many small independent systems, leading to homogeneous nucleation.
The proofs are long and complicated, but they provide considerable detail about the
typical trajectory of the system prior to and shortly after the onset of nucleation,
something the potential-theoretic approach cannot offer.

3. If we worked in the grand-canonical ensemble, then we would have to consider Kawasaki dynamics on $\Lambda_\beta$ with an open boundary and with Hamiltonian
\[
H^{\mathrm{gc}}(\sigma) = -U\sum_{\substack{(x,y)\in\Lambda_\beta\\ x\sim y}}\sigma(x)\sigma(y) + \Delta\sum_{x\in\Lambda_\beta}\sigma(x), \qquad \sigma\in\mathcal{S}_\beta,
\tag{20.6.1}
\]
where $\Delta>0$ is the usual activity parameter mimicking the presence of an infinite gas reservoir around $\Lambda_\beta$. This was the setting of Chap. 17 for small volumes. For large volumes, however, even with this Hamiltonian we still have to face all the difficult issues of non-locality we struggled with in Sects. 20.2–20.5.

4. As for Theorem 19.4(c), we expect Theorem 20.3(b) to hold for values of $M$ that grow with $\beta$ as $M=e^{o(\beta)}$.

5. Gois and Landim [128] consider Kawasaki dynamics at inverse temperature $\beta$ on a two-dimensional torus of size $L(\beta)$ in the zero-temperature limit $\beta\to\infty$ when $L(\beta)\to\infty$. Initially the particles form a square of size $n$. It is shown that, under certain growth restrictions for $L(\beta)$, most of the time the particles form a square of size $n$, and that there is a time scale $L^2(\beta)\theta(\beta)$ on which the centre of the square performs a Brownian motion. It is shown that $C_1\,(n/L^2(\beta))\,e^{2\beta}\le\theta(\beta)\le C_2\,n^2e^{2\beta}$ for some $C_1,C_2\in(0,\infty)$.

6. The extension of our results to higher dimensions is limited only by the com-
binatorial problems involved in the computation of the number of critical droplets
(which is hard in the case of Kawasaki dynamics) and of the probability for simple
random walk to hit a critical droplet of a given shape when coming from far. Recall
Sect. 18.6.

7. There appears to be no work that deals with metastability of Kawasaki dynamics

in infinite volume, contrary to Glauber dynamics (recall Sect. 19.7, Item 5).
Part VIII
Applications: Lattice Systems in Small
Volumes at High Densities

Part VIII describes lattice systems in small volumes at high densities. The focus is
on the zero-range process, which consists of a collection of continuous-time simple
random walks with on-site attraction and no on-site repulsion. We consider the limit
where the particle density is high, show that the process spends most of its time in
a “condensed state”, i.e., a configuration where most of the particles pile up on a
single site, and prove that the process evolves via a “metastable hopping” of this
pile from one site to another. Both the hopping time and the hopping distribution
are computed.
Chapter 21
The Zero-Range Process

The shop seemed to be full of all manner of curious things—but

the oddest part of it all was that, whenever she looked at any
shelf, to make out exactly what it had on it, that particular shelf
was always quite empty, though the others round it were
crowded as full as they could hold.
(Lewis Carroll, Through the Looking-Glass, and what Alice
found there)

The zero-range process offers yet another example of a system for which potential-theoretic methods can be used to describe metastable behaviour. The free energy landscape is of a different nature than what we encountered in the models treated so far. In particular, there is no temperature parameter, and the key quantity to control is entropy. This necessitates a different approach to the choice of test functions to estimate capacities, which is worthwhile to expose.

21.1 Model and basic properties

Let $N\in\mathbb{N}$ and $S=\{1,\dots,L\}$, $L\in\mathbb{N}$. The zero-range process $Y=(\eta(t))_{t\ge 0}$ on $S$ with $N$ particles is the continuous-time Markov process with state space
\[
E_{N,S} = \Bigl\{\eta=(\eta_x)_{x\in S}\in\mathbb{N}_0^S\colon \sum_{x\in S}\eta_x = N\Bigr\}
\tag{21.1.1}
\]
such that in configuration $\eta$ a particle jumps from site $x$ to site $y$ at rate $g(\eta_x)\,r(x,y)$. Here, $\eta_x\in\mathbb{N}_0$ represents the number of particles at site $x\in S$, $r(\cdot,\cdot)$ is an irreducible probability transition kernel associated with a reversible random walk $X=(X(t))_{t\ge 0}$ on $S$, and $g$ is chosen as
\[
g(0)=0, \qquad g(1)=1, \qquad g(n)=\frac{a(n)}{a(n-1)}, \quad n\in\mathbb{N}\setminus\{1\},
\tag{21.1.2}
\]
with
\[
a(0)=1, \qquad a(n)=n^{\alpha} \quad\text{for some } \alpha\in(1,\infty).
\tag{21.1.3}
\]

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_21

Formally, $Y$ is defined through its generator $L_{N,S}$ acting on functions $F\in C(E_{N,S},\mathbb{R})$ as
\[
(L_{N,S}F)(\eta) = \sum_{x,y\in S} g(\eta_x)\,r(x,y)\,\bigl[F(\eta^{x,y})-F(\eta)\bigr],
\tag{21.1.4}
\]
where $\eta^{x,y}$ is the configuration obtained from $\eta$ by moving a particle from site $x$ to site $y$. Note that the zero-range dynamics preserves the number of particles.
Lemmas 21.1–21.2 and Theorem 21.3 below are well-known results for the zero-
range process in equilibrium. For references, see the bibliography in Sect. 21.6.

Lemma 21.1 $Y$ is irreducible, and is reversible with respect to the unique invariant probability measure $\mu_N$ given by
\[
\mu_{N,S}(\eta) = \frac{N^{\alpha}}{Z_{N,S}}\,\frac{m_*^{\eta}}{a(\eta)}, \qquad \eta\in E_{N,S},
\tag{21.1.5}
\]
with
\[
m_*^{\eta} = \prod_{x\in S} m_*(x)^{\eta_x}, \qquad a(\eta) = \prod_{x\in S} a(\eta_x),
\tag{21.1.6}
\]
where
\[
m_*(x) = \frac{m(x)}{M_*}, \qquad M_* = \max_{x\in S} m(x),
\tag{21.1.7}
\]
with $m$ the invariant measure of the random walk $X$, and $Z_{N,S}$ denotes the normalising partition function
\[
Z_{N,S} = N^{\alpha}\sum_{\zeta\in E_{N,S}}\frac{m_*^{\zeta}}{a(\zeta)}.
\tag{21.1.8}
\]
Note that $m_*(x)=1$ for all $x\in S_*$ with
\[
S_* = \bigl\{y\in S\colon m(y)=M_*\bigr\}.
\tag{21.1.9}
\]
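Reversibility in Lemma 21.1 amounts to the detailed balance relation $\mu(\eta)\,g(\eta_x)\,r(x,y) = \mu(\eta^{x,y})\,g(\eta^{x,y}_y)\,r(y,x)$, which can be verified by brute force on a small system. The sketch below makes several illustrative assumptions: the walk $X$ is specified through symmetric conductances `cond` (so that $m(x)=\sum_y \mathrm{cond}(x,y)$ is reversible for $r(x,y)=\mathrm{cond}(x,y)/m(x)$), and the unnormalised weight uses $m$ instead of $m_*$, which only changes $\mu$ by the constant factor $M_*^{-N}$. All function names are hypothetical.

```python
from itertools import product


def configurations(N, L):
    """All eta in N_0^L with total particle number N."""
    return [eta for eta in product(range(N + 1), repeat=L) if sum(eta) == N]


def a(n, alpha):
    """a(0) = 1 and a(n) = n**alpha, as in (21.1.3)."""
    return 1.0 if n == 0 else float(n) ** alpha


def g(n, alpha):
    """g(0) = 0 and g(n) = a(n)/a(n-1), as in (21.1.2)."""
    return 0.0 if n == 0 else a(n, alpha) / a(n - 1, alpha)


def weight(eta, m, alpha):
    """Unnormalised invariant weight prod_x m(x)**eta_x / a(eta_x)."""
    w = 1.0
    for mx, n in zip(m, eta):
        w *= mx ** n / a(n, alpha)
    return w


def max_detailed_balance_defect(N, cond, alpha):
    """Largest relative violation of detailed balance over all jumps."""
    L = len(cond)
    m = [sum(row) for row in cond]  # reversible measure of the walk
    r = [[cond[x][y] / m[x] for y in range(L)] for x in range(L)]
    worst = 0.0
    for eta in configurations(N, L):
        for x in range(L):
            for y in range(L):
                if x == y or eta[x] == 0 or r[x][y] == 0.0:
                    continue
                zeta = list(eta)
                zeta[x] -= 1
                zeta[y] += 1
                lhs = weight(eta, m, alpha) * g(eta[x], alpha) * r[x][y]
                rhs = weight(zeta, m, alpha) * g(zeta[y], alpha) * r[y][x]
                worst = max(worst, abs(lhs - rhs) / max(lhs, rhs))
    return worst


if __name__ == "__main__":
    cond = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
    print(max_detailed_balance_defect(4, cond, alpha=2.0))
```

The defect is at the level of floating-point rounding, reflecting that the factor $a(\eta_x)$ in $g(\eta_x)$ cancels against $1/a(\eta_x)$ in the weight, leaving only the reversibility $m(x)r(x,y)=m(y)r(y,x)$ of the underlying walk.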

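Lemma 21.2 For $L$ fixed,
\[
\lim_{N\to\infty} Z_{N,S} = Z_S = \frac{|S_*|}{\Gamma(\alpha)}\prod_{x\in S}\Gamma_x = |S_*|\,\Gamma(\alpha)^{|S_*|-1}\prod_{y\notin S_*}\Gamma_y,
\tag{21.1.10}
\]
where
\[
\Gamma_x = \sum_{j\in\mathbb{N}_0}\frac{m_*(x)^j}{a(j)}, \qquad \Gamma(\alpha) = \sum_{j\in\mathbb{N}_0}\frac{1}{a(j)}.
\tag{21.1.11}
\]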
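Note that $\Gamma(\alpha)$ in (21.1.11) is a series, not Euler's Gamma function: with $a(0)=1$ and $a(j)=j^{\alpha}$ it equals $1+\sum_{j\ge 1}j^{-\alpha}$. The sketch below approximates the quantities in (21.1.10)–(21.1.11) by truncated sums; the truncation level `terms` and the function names are illustrative assumptions, and the values of `m_star` in the usage example are made up.

```python
import math


def gamma_alpha(alpha, terms=10**5):
    """Truncated series for Gamma(alpha) = sum_{j>=0} 1/a(j)."""
    return 1.0 + sum(j ** (-alpha) for j in range(1, terms + 1))


def gamma_x(m_star_x, alpha, terms=10**5):
    """Truncated series for Gamma_x = sum_{j>=0} m_*(x)**j / a(j)."""
    return 1.0 + sum(m_star_x ** j * j ** (-alpha) for j in range(1, terms + 1))


def limit_partition_function(m_star, alpha, terms=10**5):
    """Both expressions for Z_S in (21.1.10), for a list of m_*(x) values."""
    gam = gamma_alpha(alpha, terms)
    gammas = [gamma_x(v, alpha, terms) for v in m_star]
    n_star = sum(1 for v in m_star if v == 1.0)  # |S_*|
    first = n_star / gam * math.prod(gammas)
    second = n_star * gam ** (n_star - 1) * math.prod(
        gx for v, gx in zip(m_star, gammas) if v != 1.0)
    return first, second


if __name__ == "__main__":
    print(gamma_alpha(2.0), 1.0 + math.pi ** 2 / 6.0)
    print(limit_partition_function([1.0, 0.5, 1.0], alpha=2.0))
```

The two expressions agree because $\Gamma_x=\Gamma(\alpha)$ whenever $m_*(x)=1$, i.e., for $x\in S_*$.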

The interesting feature of the zero-range process with a $g$-function given by (21.1.2) is that it exhibits a condensation phenomenon. Namely, for large $N$ the invariant measure $\mu_N$ concentrates on disjoint sets of configurations $E_N^x$, $x\in S_*$, which are defined as follows. Given a sequence $(\ell_N)_{N\in\mathbb{N}}$ such that
\[
\lim_{N\to\infty}\ell_N = \infty, \qquad \lim_{N\to\infty}\frac{\ell_N}{N} = 0,
\tag{21.1.12}
\]
we say that a configuration has a condensate at site $x\in S_*$ when it belongs to the set
\[
E_N^x = \{\eta\in E_{N,S}\colon \eta_x\ge N-\ell_N\}.
\tag{21.1.13}
\]

Theorem 21.3 Suppose that $L=L(N)$ is such that $\lim_{N\to\infty}L(N)/N=0$. Then there exists a sequence $(\ell_N)_{N\in\mathbb{N}}$ satisfying (21.1.12) such that
\[
\lim_{N\to\infty}\mu_{N,S}\Bigl(\bigcup_{x\in S_*}E_N^x\Bigr) = 1.
\tag{21.1.14}
\]

Note that the configurations $\eta^x$, $x\in S_*$, given by
\[
\eta^x_x = N, \qquad \eta^x_y = 0, \quad y\in S\setminus\{x\},
\tag{21.1.15}
\]
have maximal measure. When $L$ is independent of $N$, this maximal measure is bounded away from zero uniformly in $N$.
The condensation phenomenon in Theorem 21.3 already occurs when the particle density $\rho=N/L$ exceeds a certain critical particle density $\rho_c\in(0,\infty)$, which depends on the parameters of the model.

21.2 Metastable behaviour

The question that will be addressed in this section is how Y moves between the
different condensate configurations.

21.2.1 Finite system size

First we state our results when $L$ is kept fixed and $N\to\infty$. Recall Definition 8.1.4.

Theorem 21.4 $Y$ is metastable with respect to the set $\mathcal{M}=\bigcup_{x\in S_*}\eta^x$ with $\rho = O\bigl(L(L^2+N)/(N^{\alpha+1}r)\bigr)$, where $r=\inf_{u\in S}r(u,u\pm 1)$.

The proof of Theorem 21.4 will be given in Sect. 21.5 and is based on a computation of capacities $\mathrm{cap}_{N,S}(\eta,\zeta)$ between configurations $\eta,\zeta\in E_{N,S}$, where $\mathrm{cap}_{N,S}$ refers to capacity associated with $Y$.
Define
\[
I_\alpha(s) = \int_s^{1-s} u^{\alpha}(1-u)^{\alpha}\,du, \qquad 0\le s\le\tfrac12,
\tag{21.2.1}
\]
and
\[
\eta^U = \bigcup_{x\in U}\eta^x, \qquad \emptyset\neq U\subseteq S_*.
\tag{21.2.2}
\]
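The constant $I_\alpha(0)$ that enters the capacity asymptotics below is a Beta integral, $I_\alpha(0)=B(\alpha+1,\alpha+1)$. The following sketch compares a midpoint-rule evaluation of (21.2.1) with the closed form written in terms of Euler's Gamma function `math.gamma`, which should not be confused with the series $\Gamma(\alpha)$ of (21.1.11). The function names and the number of quadrature nodes are illustrative choices.

```python
import math


def I_alpha(alpha, s=0.0, n=20000):
    """Midpoint rule for the integral of u**alpha (1-u)**alpha over [s, 1-s]."""
    h = (1.0 - 2.0 * s) / n
    total = 0.0
    for i in range(n):
        u = s + (i + 0.5) * h
        total += (u * (1.0 - u)) ** alpha
    return h * total


def beta_closed_form(alpha):
    """B(alpha+1, alpha+1) expressed through Euler's Gamma function."""
    return math.gamma(alpha + 1.0) ** 2 / math.gamma(2.0 * alpha + 2.0)


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 3.0):
        print(alpha, I_alpha(alpha), beta_closed_form(alpha))
```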

Theorem 21.5 (Sharp asymptotics of capacities) Let $S_*^1,S_*^2\subset S_*$ be non-empty disjoint sets. Then
\[
\mathrm{cap}_{N,S}\bigl(\eta^{S_*^1},\eta^{S_*^2}\bigr) = \frac{1+o(1)}{2N^{\alpha+1}M_*\,|S_*|\,I_\alpha(0)\,\Gamma(\alpha)}
\ \inf_{W\in\mathcal{W}(S_*^1,S_*^2)}\ \sum_{x,y\in S_*}\mathrm{cap}_S(x,y)\,[W_y-W_x]^2,
\tag{21.2.3}
\]
where $\mathrm{cap}_{N,S}$ denotes capacity for $Y$, $\mathrm{cap}_S$ denotes capacity for $X$, and
\[
\mathcal{W}(S_*^1,S_*^2) = \bigl\{W=(W_z)_{z\in S_*}\in[0,1]^{S_*}\colon W|_{S_*^1}=1,\ W|_{S_*^2}=0\bigr\}.
\tag{21.2.4}
\]

Remark 21.6 Note that the infimum in (21.2.3) is the conductance between $S_*^1$ and $S_*^2$ of a resistor network on $S_*$ with conductances $\mathrm{cap}_S(x,y)$ between sites $x,y\in S_*$.

Theorem 21.5 allows us to use Corollary 7.11 and Theorem 8.45 to obtain the following result for the metastable exit times $\tau_{\mathcal{M}\setminus\eta^x}$, $x\in S_*$.

Corollary 21.7 (Mean and exponential law of metastable exit times) For every $x\in S_*$ the metastable exit time $\tau_{\mathcal{M}\setminus\eta^x}$
(i) has asymptotic mean
\[
E_{\eta^x}\bigl[\tau_{\mathcal{M}\setminus\eta^x}\bigr] = \frac{N^{\alpha+1}M_*\,I_\alpha(0)\,\Gamma(\alpha)}{\sum_{y\in S_*\setminus\{x\}}\mathrm{cap}_S(x,y)}\,[1+o(1)], \qquad N\to\infty,
\tag{21.2.5}
\]
(ii) on the scale of its mean has asymptotic exponential distribution
\[
P_{\eta^x}\bigl(\tau_{\mathcal{M}\setminus\eta^x} > t\,E_{\eta^x}[\tau_{\mathcal{M}\setminus\eta^x}]\bigr) = [1+o(1)]\,e^{-t[1+o(1)]}, \qquad N\to\infty.
\tag{21.2.6}
\]

Remark 21.8 Combining the previous remark with Corollary 21.7 we see that, in the limit as $N\to\infty$, on the time scale $N^{\alpha+1}$ the zero-range process observed when it hits the set $\mathcal{M}=\bigcup_{x\in S_*}\eta^x$ behaves like a continuous-time random walk with transition rates $\bar r(x,y)$ given by $\bar r(x,y) = M_*\,I_\alpha(0)\,\Gamma(\alpha)\,\mathrm{cap}_S(x,y)/\sum_{z\in S_*\setminus\{x\}}\mathrm{cap}_S(x,z)$.

21.2.2 Diverging system size

Next we state our results when $L=L(N)$ and $N\to\infty$ with
\[
\lim_{N\to\infty}L(N) = \infty, \qquad \lim_{N\to\infty}\frac{L(N)}{N} = 0.
\tag{21.2.7}
\]
In this case the transition rates $r(x,y)$ and the set $S_*$ will typically depend on $N$. We suppress this dependence to lighten the notation.
Define
\[
E_N = E_N(S_*) = \bigcup_{x\in S_*}E_N^x.
\tag{21.2.8}
\]
For general disjoint non-empty sets $S_*^1,S_*^2$ we can only derive a lower bound and an upper bound for $\mathrm{cap}_{N,S}(E_N(S_*^1),E_N(S_*^2))$ that coincide up to a constant. But for partitions of $S_*$ we can get more.

Theorem 21.9 (Sharp asymptotics of capacities) Suppose that $L(N)$ satisfies (21.2.7). Let $S_*^1, S_*^2$ be a partition of $S_*$. Then
\[
\mathrm{cap}_{N,S}\bigl(E_N(S_*^1), E_N(S_*^2)\bigr) = \frac{1 + o(1)}{N^{\alpha+1} M_* |S_*| \, I_\alpha(0) \, \Gamma(\alpha)} \sum_{x \in S_*^1, \, y \in S_*^2} \mathrm{cap}_S(x,y). \tag{21.2.9}
\]

As before, Theorem 21.9 allows us to use Corollary 7.11 to obtain the following result for the metastable exit times $\tau_{E_N \setminus E_N^x}$, $x \in S_*$, where we recall that $\nu_{A,B}$ denotes the last-exit biased distribution on $A$ for the transition from $A$ to $B$.

Corollary 21.10 (Mean metastable exit times) Suppose that $L(N)$ satisfies condition (21.2.7). For every $x \in S_*$ the metastable exit time $\tau_{E_N \setminus E_N^x}$ has asymptotic mean
\[
E_{\nu_{E_N^x, \, E_N \setminus E_N^x}}\bigl[\tau_{E_N \setminus E_N^x}\bigr] = \frac{N^{\alpha+1} M_* \, I_\alpha(0) \, \Gamma(\alpha)}{\sum_{y \in S_* \setminus \{x\}} \mathrm{cap}_S(x,y)} \, [1 + o(1)], \qquad N \to \infty. \tag{21.2.10}
\]

Remark 21.11 We would like to show that the assertion in Corollary 21.10 also
holds for the process starting in a single configuration ηx ∈ M , and that the law of
the exit time is exponential. In Bovier, Bianchi and Ioffe [25] such results were ob-
tained for the Curie-Weiss model with random magnetic field described in Chap. 15,
through the use of coupling techniques. Such techniques, however, seem difficult to
implement for the zero-range model.

21.3 Capacity estimates

In this section we derive lower bounds and upper bounds on capacities that coincide
in the limit as N → ∞ with L fixed. These bounds will be used in Sect. 21.4 to
prove Theorem 21.5.

21.3.1 Lower bound

We begin by proving an a priori bound showing that the equilibrium potential is

almost constant on the sets ENx , x ∈ S∗ .

Lemma 21.12 Let $S_*^1, S_*^2 \subset S_*$ be non-empty disjoint sets, and let $W$ denote the equilibrium potential for the capacitor $(\eta^{S_*^1}, \eta^{S_*^2})$. Then there is a constant $K_\alpha$ such that
\[
|W(\xi) - W(\xi')| \le K_\alpha \, \frac{L(L^2 + N)}{N^{\alpha+1} r}, \qquad \xi, \xi' \in E_N^z, \ z \in S_*, \tag{21.3.1}
\]
where $r = \inf_{u \in S} r(u, u \pm 1)$.

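Proof Clearly,
\[
\begin{aligned}
W(\xi) &= P_\xi\bigl[\tau_{\eta^{S_*^1}} < \tau_{\eta^{S_*^2}}\bigr] \\
&= P_\xi\bigl[\tau_{\xi'} < \tau_{\eta^{S_*^1}}, \ \tau_{\eta^{S_*^1}} < \tau_{\eta^{S_*^2}}\bigr] + P_\xi\bigl[\tau_{\eta^{S_*^1}} < \tau_{\eta^{S_*^2}}, \ \tau_{\xi'} > \tau_{\eta^{S_*^1}}\bigr] \\
&= P_\xi\bigl[\tau_{\xi'} < \tau_{\eta^{S_*^1}}\bigr] \, P_{\xi'}\bigl[\tau_{\eta^{S_*^1}} < \tau_{\eta^{S_*^2}}\bigr] + P_\xi\bigl[\tau_{\eta^{S_*^1}} < \tau_{\xi' \cup \eta^{S_*^2}}\bigr] \\
&= \bigl(1 - P_\xi\bigl[\tau_{\eta^{S_*^1}} < \tau_{\xi'}\bigr]\bigr) \, W(\xi') + P_\xi\bigl[\tau_{\eta^{S_*^1}} < \tau_{\xi' \cup \eta^{S_*^2}}\bigr].
\end{aligned}
\tag{21.3.2}
\]
With the help of the renewal equation in (8.2.2), this yields
\[
\biggl(1 - \frac{P_\xi[\tau_{\eta^{S_*^1}} < \tau_\xi]}{P_\xi[\tau_{\xi'} < \tau_\xi]}\biggr) W(\xi') \le W(\xi) \le W(\xi') + \frac{P_\xi[\tau_{\eta^{S_*^1}} < \tau_\xi]}{P_\xi[\tau_{\xi'} < \tau_\xi]}. \tag{21.3.3}
\]
In Proposition 21.24 below we show that
\[
\frac{P_\xi[\tau_{\eta^{S_*^1}} < \tau_\xi]}{P_\xi[\tau_{\xi'} < \tau_\xi]} \le K_\alpha \, \frac{L(L^2 + N)}{N^{\alpha+1} r}. \tag{21.3.4}
\]
Combining (21.3.3)–(21.3.4), we get the claim.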

Remark 21.13 Lemma 21.12 is most useful when L is fixed and N → ∞, but it also
allows us to include cases with slowly growing L = L(N ).

We first prove a lower bound on capacities for arbitrary L and N .

Proposition 21.14 Let S∗1 , S∗2 S∗ be non-empty disjoint sets. Put

# $
L(L2 + N ) N
δ = max , , (21.3.5)
N α+1 r N
and assume that 0 < δ 1. Then

capN,S EN S∗1 , EN S∗2 ≥ 1 + O(δ) inf capS (x, y)[Wy − Wx ]2
W ∈W (S∗1 ,S∗2 )
x,y∈S∗

1
N ξ
m∗
× , (21.3.6)
2N α+1 M∗ Iα (0) ZN,S a(ξ )
k=0 ξ ∈Ek,S0

where S0 = S\{u, v} for any u, v ∈ S∗ .

Proof The proof comes in 4 Steps.

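1. As usual, a lower bound is obtained by using the monotonicity of the Dirichlet form
\[
E_N(h) = \tfrac12 \sum_{\eta \in E_{N,S}} \sum_{z,w \in S} \mu_{N,S}(\eta) \, g(\eta_z) \, r(z,w) \, \bigl[h(\eta^{z,w}) - h(\eta)\bigr]^2, \qquad h \in \mathcal{H}_N\bigl(E_N(S_*^1), E_N(S_*^2)\bigr), \tag{21.3.7}
\]
where
\[
\mathcal{H}_N(A, B) = \{h \colon E_{N,S} \to \mathbb{R}_+ : h|_A = 1, \ h|_B = 0\}. \tag{21.3.8}
\]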
The strategy is to set rates to zero so that disjoint one-dimensional paths are ob-
tained. Afterwards we can use that the sum of the Dirichlet forms over the one-
dimensional paths, which are computable, yields a lower bound for the Dirichlet
form in (21.3.7), and hence for the capacity.
The construction goes as follows (see Figs. 21.1, 21.2, 21.3):
• For each ξ ∈ Ek,S , k ∈ {0, . . . , N }, we obtain a one-dimensional path as fol-
lows. Let {ξ, px,y } ∈ EN −1,S be the configuration given by {ξ, px,y }z = ξz for
z ∈ S\{x, y}, and {ξ, px,y }x = ξx + p and {ξ, px,y }y = ξy + N − k − p − 1 for
p ∈ {0, . . . , N − k − 1}. For each pair x, y ∈ S∗ , the one-dimensional path consists
of the path-segments {ξ, px,y }, where the excess N − k particles on site x jump
one by one until they reach site y (only one particle is jumping at any time).
• The path-segments are disjoint for the following reason. Let {ξ, px,y }, {ξ′, px,y } ∈
EN −1,S be two different path-segments. Suppose that at some time t these path-segments
coincide in a single configuration due to a jump of the jumping particle.
However, since {ξ, px,y } and {ξ′, px,y } are different, the sites at which this particle
is at time t in these segments must differ. In the next step particles from

Fig. 21.1 Restriction of the transition rates

Fig. 21.2 Jump of a particle in a path-segment

Fig. 21.3 Disjoint paths

different sites jump in such a way that the resulting configurations are different,
and hence the path-segments cannot merge.
With the construction described above, we obtain one-dimensional paths that consist
of a Dirichlet form of a zero-range process on two sites multiplied by a term that we
can estimate by the capacity of the underlying random walk.
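
The path-segment construction in Step 1 can be made concrete with a short sketch. It is only an illustration under assumptions: configurations are stored as dictionaries from (hypothetical) site names to occupation numbers, and `path_segment` is an illustrative helper name. For a background configuration $\xi$ with $k$ particles it lists the configurations $\{\xi, p_{x,y}\}$, $p = 0, \dots, N-k-1$, each of which carries $N-1$ particles (the jumping particle is added separately).

```python
def path_segment(xi, x, y, N):
    """Yield the configurations {xi, p_{x,y}} for p = 0, ..., N - k - 1,
    where k is the number of particles in the background configuration xi."""
    k = sum(xi.values())
    for p in range(N - k):
        config = dict(xi)
        config[x] = xi[x] + p                  # particles already moved to x
        config[y] = xi[y] + N - k - p - 1      # particles still waiting on y
        yield config


# Hypothetical example: three sites, N = 6 particles, k = 2 background particles on site "z".
background = {"x": 0, "y": 0, "z": 2}
segment = list(path_segment(background, "x", "y", 6))
```

Each configuration in `segment` differs from the next one by moving a single particle between the two condensate sites, which is why the restricted dynamics is effectively one-dimensional.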

2. Let dz ∈ E1,S be the configuration with exactly one particle at site z ∈ S (the
jumping particle). Let h∗ ∈ HN (EN (S∗1 ), EN (S∗2 )) be the minimiser of (21.3.7).
Then

capN,S EN S∗1 , EN S∗2 ≥ ENlb h∗

N −k−1
N
= 1
4 μN,S {ξ, px,y } + dz g {ξ, px,y }z + 1 r(z, w)
k=0 ξ ∈Ek,S x,y∈S∗ p=0 z,w∈S
2
× h∗ {ξ, px,y } + dw − h∗ {ξ, px,y } + dz . (21.3.9)

Here an extra factor 12 arises because the sum over x, y counts the configurations
−k−1
{ξ, px,y }N
p=0 , ξ ∈ Ek,S , 0 ≤ k ≤ N , twice. Inserting the definition of g in (21.1.2)
and μN in (21.1.5) into (21.3.9), we get

ENlb h∗

Nα
N m∗
ξ
=
4ZN,S M∗ a(ξ \{x, y})
k=0 ξ ∈Ek,S x,y∈S∗

−k−1
N
1
×
a(ξx + p)a(ξy + N − k − p − 1)
p=0
2
× m(z)r(z, w) h∗ {ξ, px,y } + dw − h∗ {ξ, px,y } + dz , (21.3.10)
z,w∈S

where $\xi \setminus \{x, y\}$ is the configuration without the sites $x, y$. Next, fix $x, y \in S_*$ and $\xi \in E_{k,S}$, and let $f_{x,y} \colon S \to \mathbb{R}$ be given by
\[
f_{x,y}(v) = \frac{h^*(\{\xi, p_{x,y}\} + d_v) - h^*(\{\xi, p_{x,y}\} + d_y)}{h^*(\{\xi, p_{x,y}\} + d_x) - h^*(\{\xi, p_{x,y}\} + d_y)}. \tag{21.3.11}
\]
Note that
\[
f_{x,y} \in B(x, y) = \{f \colon S \to \mathbb{R}_+ : f(x) = 1, \ f(y) = 0\}. \tag{21.3.12}
\]
Inserting $f_{x,y}$ into (21.3.10), we see that the sum over $z, w \in S$ equals
\[
2 E_S(f_{x,y}) \, \bigl[h^*(\{\xi, p_{x,y}\} + d_x) - h^*(\{\xi, p_{x,y}\} + d_y)\bigr]^2, \tag{21.3.13}
\]
where $E_S$ is the Dirichlet form associated with the random walk on $S$. Since $f_{x,y} \in B(x, y)$ and $E_S(f_{x,y}) \ge \mathrm{cap}_S(x, y)$, we get from (21.3.10) that

ENlb h∗

Nα
N m∗
ξ
≥ capS (x, y)
2ZN,S M∗ a(ξ \{x, y})
k=0 ξ ∈Ek,S x,y∈S∗

−k−1
N
1
×
a(ξx + p)a(ξy + N − k − p − 1)
p=0
2
× h∗ {ξ, px,y } + dx − h∗ {ξ, px,y } + dy

Nα
N ξ
m∗
≥ inf capS (x, y)
2ZN,S M∗ W (ξ )∈W (S∗1 ,S∗2 ) a(ξ \{x, y})
k=0 ξ ∈Ek,S x,y∈S∗

−k−1
N
[hW (ξ ) ({ξ, px,y } + dx ) − hW (ξ ) ({ξ, px,y } + dy )]2
× inf
hW (ξ ) ∈HN (EN (S∗1 ),EN (S∗2 )) a(ξx + p)a(ξy + N − k − p − 1)
p=0
hW (ξ ) (η)=Wx (ξ ) ∀ η∈ENx
y
hW (ξ ) (η)=Wy (ξ ) ∀ η∈EN

Nα
N m∗
ξ
= inf capS (x, y)
2ZN,S M∗ W (ξ )∈W (S∗1 ,S∗2 ) a(ξ \{x, y})
k=0 ξ ∈Ek,S x,y∈S∗
2
× Wx (ξ ) − Wy (ξ )
−k−1
N
[hξ (ξx + p + 1) − hξ (ξx + p)]2
× inf .
hξ ∈HN (EN (S∗1 ∪x),EN (S∗2 ∪y)) a(ξx + p)a(ξy + N − k − p − 1)
p=0
(21.3.14)
3. Due to the boundary conditions on the function hξ (recall (21.3.8)), the last factor
in (21.3.14) reduces to
N −ξx −1
N −
[hξ (ξx + p + 1) − hξ (ξx + p)]2
inf .
hξ ∈HN (EN (S∗1 ∪x),EN (S∗2 ∪y)) a(ξx + p)a(ξy + N − k − p − 1)
p=N −k+ξy
(21.3.15)
This is just the Dirichlet form of a zero-range process living on the two sites x and
y only, which is minimized by the function
x
q=N −k+ξx +ξy +1 a(q − 1)a(ξx + ξy + N − k − q)
H (x) = N − . (21.3.16)
q=N −k+ξx +ξy +1 a(q − 1)a(ξx + ξy + N − k − q)
N

For p ∈ [N − k + ξy , N − N − ξx − 1] we have

a(ξx + p)a(ξy + N − k − p − 1)
H (ξx + p + 1) − H (ξx + p) = N − −1 .
q=N −k+ξx +ξy a(q)a(ξx + ξy + N − k − q − 1)
N

(21.3.17)
Inserting (21.3.17) into (21.3.15), we obtain
1
N −N −1 . (21.3.18)
q=N −k+ξx +ξy a(q)a(ξx + ξy + N − k − q − 1)

Since this expression depends on the configuration ξ only through the number of
particles k, for fixed k it is bounded from below by [1 − O(N /N )]/N 2α+1 Iα (0)
(recall (21.2.1)).
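
The mechanism behind (21.3.16)–(21.3.18) is the series law for a one-dimensional chain: a quadratic form $\sum_p [h(p+1) - h(p)]^2 / w_p$ with $h = 0$ at one end and $h = 1$ at the other is minimised by increments proportional to the weights $w_p$, and the minimum equals $1 / \sum_p w_p$. The sketch below checks this on a small example. It assumes the convention $a(0) = 1$, $a(n) = n^\alpha$ for $n \ge 1$, takes an integer $\alpha$ so that exact fractions can be used, and ignores the boundary layers that appear in the actual proof; all names are illustrative.

```python
from fractions import Fraction


def a(n, alpha):
    # Assumed convention: a(0) = 1 and a(n) = n**alpha for n >= 1.
    return 1 if n == 0 else n ** alpha


def two_site_minimiser(weights):
    """Function h with h(0) = 0, h(n) = 1 and increments proportional to the weights."""
    total = sum(weights)
    h = [Fraction(0)]
    for w in weights:
        h.append(h[-1] + Fraction(w, total))
    return h


def dirichlet_energy(h, weights):
    """Sum over p of (h(p+1) - h(p))**2 / w_p."""
    return sum((h[p + 1] - h[p]) ** 2 / w for p, w in enumerate(weights))


# Hypothetical example: M = 5 jumps between two sites, alpha = 2.
M, alpha = 5, 2
weights = [a(q, alpha) * a(M - q - 1, alpha) for q in range(M)]
h_min = two_site_minimiser(weights)
energy_min = dirichlet_energy(h_min, weights)

# A competitor with the same boundary values: the linear interpolation.
h_lin = [Fraction(p, M) for p in range(M + 1)]
energy_lin = dirichlet_energy(h_lin, weights)
```

Here `energy_min` equals `1 / sum(weights)`, the analogue of (21.3.18), while the linear competitor has strictly larger energy.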

4. Combining (21.3.14) with Lemma 21.12, we arrive at

1
N
ENlb h∗ =
2ZN,S N α+1 M∗ Iα (0)
k=0 ξ ∈Ek,S

ξ
m∗ 2
inf capS (x, y) Wx − Wy + O(δ)
W ∈W (S∗1 ,S∗2 ) a(ξ \{x, y})
x,y∈S∗
2
× 1 − O(δ)
1
=
2ZN,S N α+1 M ∗ Iα (0)

N
m∗
ξ
× 1 + O(δ) inf capS (x, y)[Wx − Wy ]2 ,
W ∈W (S∗1 ,S∗2 ) a(ξ )
x,y∈S∗ k=0 ξ ∈Ek,S0
(21.3.19)

where we recall (21.3.5). By (21.3.9), this gives the claim.

Remark 21.15
(1) Note that if S∗2 = S∗ \S∗1 , then Lemma 21.12 is not needed in the proof of Propo-
sition 21.14 and the bound in (21.3.6) holds with δ = N /N . This fact estab-
lishes the lower bound in Theorem 21.9.
(2) If δ ↓ 0, which occurs when L is independent of N , then the same bound as in
1 2
(21.3.6) holds for capN,S (ηS∗ , ηS∗ ). This is because Lemma 21.12 implies that
on the sets ENx , x ∈ S∗1 ∪ S∗2 , the equilibrium potential W is close to 1 or to 0.
This fact establishes the lower bound in Theorem 21.5.

21.3.2 Upper bound

Let ρc ∈ (0, ∞) denote the critical particle density mentioned below Theorem 21.3.

Proposition 21.16 Let S∗1 , S∗2 ⊂ S∗ be non-empty disjoint sets. Then, for ε > 0,

1 2 LCε N −α−1
capN,S ηS∗ , ηS∗ ≤ capN,S EN S∗1 , EN S∗2 ≤
(N − Lρc )α

+ inf capS (x, y)[Wy − Wx ]2
W ∈W (S∗1 ,S∗2 )
x,y∈S∗

N −α−1
N mξ∗
N

× 1+O ,
ZN,S M∗ Iα (3ε) a(ξ ) εN
m=0 ξ ∈Em,S0
(21.3.20)

where Cε is a constant that depends on ε.

Proof The first inequality in (21.3.20) is obvious. The proof of the second inequality
comes in 8 Steps.

Fig. 21.4 Sharing configurations of I x , I y

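1. Let $U = \{u \in \mathbb{R}_+^S : \sum_{x \in S} u_x = 1\}$. Define the sets
• $F^{x,y} = \{u \in U : u_x + u_y \ge 1 - \varepsilon\}$,
• $L^x = \{u \in U : u_x > 1 - 3\varepsilon\}$,
• $D^{x,y} = F^{x,y} \setminus L^x$,
• $I^{x,y} = F^{x,y} \setminus (L^x \cup L^y)$, and
• $I^x = \bigcup_{y \in S_*} I^{x,y}$, for $x, y \in S_*$.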
We need the following facts.

Lemma 21.17 The sets $D^{x,y}$ and $I^{x,y}$, $x \neq y \in S_*$, are mutually disjoint.

Proof Since $I^{x,y} \cup L^y = D^{x,y}$, it is enough to prove the assertion for the sets $D^{x,y}$, $x \neq y \in S_*$. Let $x \neq y \neq z \in S_*$. Assume that $\eta/N \in D^{x,y} \cap D^{x,z}$. Then $\eta_y/N, \eta_z/N \ge 2\varepsilon$, and we get a contradiction because
\[
1 \ge \frac{\eta_x + \eta_y + \eta_z}{N} \ge 1 - \varepsilon + \frac{\eta_z}{N} \ge 1 - \varepsilon + 2\varepsilon = 1 + \varepsilon.
\]

Note that the sets $I^{x,y}$ are symmetric for $x \neq y \in S_*$. From Lemma 21.17 we know that if $\eta/N \in I^x$, then there exists a unique $y \in S_* \setminus \{x\}$ such that $\eta/N \in I^{x,y}$, and therefore $\eta/N \in I^y$ (see Fig. 21.4).
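
The counting argument in the proof of Lemma 21.17 can be checked by brute force on a small system. The sketch below is illustrative only: it assumes three sites, a hypothetical particle number $N = 60$ and $\varepsilon = 1/10$ (written as a ratio so that all comparisons are exact integer comparisons), and it tests the case treated in the proof, namely that no configuration lies in both $D^{x,y}$ and $D^{x,z}$.

```python
def in_D(eta, x, y, N, eps_num, eps_den):
    """Check eta/N in D^{x,y} = F^{x,y} minus L^x, with eps = eps_num / eps_den."""
    in_F = eps_den * (eta[x] + eta[y]) >= (eps_den - eps_num) * N   # u_x + u_y >= 1 - eps
    in_L = eps_den * eta[x] > (eps_den - 3 * eps_num) * N           # u_x > 1 - 3 eps
    return in_F and not in_L


def configurations(N):
    """All occupation vectors of N particles on three sites."""
    for n0 in range(N + 1):
        for n1 in range(N + 1 - n0):
            yield (n0, n1, N - n0 - n1)


N = 60
overlap = [eta for eta in configurations(N)
           if in_D(eta, 0, 1, N, 1, 10) and in_D(eta, 0, 2, N, 1, 10)]
size_D01 = sum(1 for eta in configurations(N) if in_D(eta, 0, 1, N, 1, 10))
```

The list `overlap` is empty although $D^{0,1}$ itself is not, in line with the inequality chain $1 \ge 1 - \varepsilon + 2\varepsilon$ used in the proof.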

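2. For the construction of the test function we define a smooth function $h^{x,y} \colon U \to [0, 1]$, $y \in S_* \setminus \{x\}$, such that
\[
\sum_{y \in S_* \setminus \{x\}} h^{x,y}(u) = 1, \quad u \in U, \qquad h^{x,y}(u) = 1, \quad u \in D^{x,y}, \tag{21.3.21}
\]
and a smooth function $g^x \colon U \to [0, 1]$, for $x \in S_*$, such that
\[
\sum_{x \in S_*} g^x(u) = 1, \quad u \in U, \qquad g^x(u) = \tfrac12, \quad u \in I^x, \qquad g^x(u) = 1, \quad u \in L^x. \tag{21.3.22}
\]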
Let ε > 0 such that N < εN . We start by choosing a test function for the capacity
y
between two sites x, y ∈ S∗ out of the set HN (ENx , EN ). This test function depends
on the function that solves the variational problem for the capacity of the underlying

random walk and on the harmonic function of the zero-range process on two sites,
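\[
G^{x,y}(\eta) = \sum_{k=1}^{L-1} \bigl[f_{xy}(z_k) - f_{xy}(z_{k+1})\bigr] \, H\biggl(\frac{\eta_x}{N} + \min\biggl\{\frac{1}{N} \sum_{l=2}^{k} \eta_{z_l}, \, \varepsilon\biggr\}\biggr), \tag{21.3.23}
\]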
where x = z1 , z2 , . . . , zL = y is an enumeration of S such that fxy (zi ) ≥ fxy (zj )

for all 1 ≤ i < j ≤ L, and fxy is the harmonic function in B(x, y). The function
−m
H : {0, . . . , NN } → R+ is the harmonic function of the zero-range process on two
sites,
\[
H(z) = \frac{\sum_{q=3\varepsilon N}^{zN} a(q-1) \, a(N-m-q)}{\sum_{q=3\varepsilon N}^{N-3\varepsilon N-1} a(q) \, a(N-m-q-1)}, \tag{21.3.24}
\]
with boundary conditions

H (z) = 0, z ∈ 0, . . . , 3εN ,
(21.3.25)
H (z) = 1, z ∈ N − 3εN , . . . , N .

Lemma 21.18 $G^{x,y}$ belongs to the set $\mathcal{H}_N(E_N^x, E_N^y)$.

Proof Let $\eta \in E_N^x$. Then, for $N$ large enough, $\eta_x/N > 1 - 3\varepsilon$. Due to the boundary condition in (21.3.25) the harmonic function $H$ in (21.3.23) takes the value 1 for each $k$. Hence
\[
G^{x,y}(\eta) = \sum_{k=1}^{L-1} \bigl[f_{xy}(z_k) - f_{xy}(z_{k+1})\bigr] = 1. \tag{21.3.26}
\]
Let $\eta \in E_N^y$. Then, for $N$ large enough, $\sum_{z \in S \setminus \{y\}} \eta_z/N < 2\varepsilon$, and again, through the boundary condition (21.3.25), the harmonic function $H$ is always 0, which implies $G^{x,y}(\eta) = 0$.

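3. We are now ready to choose the test function on $\eta \in E_{N,S}$, namely,
\[
G_W^S(\eta) = \sum_{x \in S_*} g^x(\eta/N) \, G_W^x(\eta), \tag{21.3.27}
\]
where, for $x, y \in S_*$ and $\eta \in E_{N,S}$,
\[
G_W^{x,y}(\eta) = W_y + G^{x,y}(\eta)(W_x - W_y), \qquad G_W^x(\eta) = \sum_{y \in S_* \setminus \{x\}} h^{x,y}(\eta/N) \, G_W^{x,y}(\eta), \tag{21.3.28}
\]
with $W \in \mathcal{W}(S_*^1, S_*^2)$.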

Lemma 21.19 $G_W^S \in \mathcal{H}_N(E_N(S_*^1), E_N(S_*^2))$.

Proof Let $\eta \in E_N^x$, $x \in S_*^1$. Then $\eta/N \in L^x$ and $g^{x'}(\eta/N) = 1_{\{x' = x\}}$. Moreover, $G^{x,y}(\eta) = 1$ for all $y \in S_* \setminus \{x\}$ because $\eta \in E_N^x$, and $W_x = 1$ because $x \in S_*^1$. Therefore $G_W^{x,y}(\eta) = W_y + G^{x,y}(\eta)(W_x - W_y) = W_y + 1 - W_y = 1$. Hence
\[
G_W^S(\eta) = \sum_{x' \in S_*} g^{x'}(\eta/N) \, G_W^{x'}(\eta) = \sum_{y \in S_* \setminus \{x\}} h^{x,y}(\eta/N) \, G_W^{x,y}(\eta) = \sum_{y \in S_* \setminus \{x\}} h^{x,y}(\eta/N) = 1. \tag{21.3.29}
\]
Let $\eta \in E_N^x$, $x \in S_*^2$. Then $\eta/N \in L^x$ and $g^{x'}(\eta/N) = 1_{\{x' = x\}}$. Moreover, $W_x = 0$ because $x \in S_*^2$. Therefore $G^{x,y}(\eta) = 1$ because $\eta \in E_N^x$, and $G_W^{x,y}(\eta) = W_y + 0 - W_y = 0$ for all $y \in S_* \setminus \{x\}$. Hence
\[
G_W^S(\eta) = \sum_{x' \in S_*} g^{x'}(\eta/N) \, G_W^{x'}(\eta) = \sum_{y \in S_* \setminus \{x\}} h^{x,y}(\eta/N) \, G_W^{x,y}(\eta) = 0, \tag{21.3.30}
\]
which settles the claim.

x,y
Lemma 21.20 Let η ∈ FN = {η ∈ EN : ηx + ηy ≥ N − N }, x, y ∈ S∗ . Then

GSW (η) = Wx , ∀η ∈ ENx ,

y
GSW (η) = Wy , ∀η ∈ EN , (21.3.31)
x,y y,x x,y y
GSW (η) = GW (η) = GW (η), ∀η ∈ FN \ ENx ∪ EN .

Proof We start with the last equation. For this we show that Gx,y (η) + Gy,x (η) = 1
for Nη ∈ F x,y . If Nη ∈ L x , then this equality holds because Gx,y (η) = 1 and
Gy,x (η) = 0. The same holds for Nη ∈ L y . For Nη ∈ I x,y , let z1 , . . . , zL be the enu-
meration obtained from fx,y and w1 , . . . , wL the enumeration obtained from fy,x .
Since fx,y + fy,x = 1, we can choose wk+1 = zL−k and get
' k (

L−1
ηz
G x,y
(η) + Gy,x
(η) = fx,y (zk ) − fx,y (zk+1 ) H n
N
k=1 n=1
' k (

L−1
ηwz
+ fy,x (wk ) − fy,x (wk+1 ) H n
N
k=1 n=1
' (

L−1

k
ηzn
= fx,y (zk ) − fx,y (zk+1 ) H
N
k=1 n=1
' k (

L−1
ηzL−n+1
+ fy,x (zL−k+1 ) − fy,x (zL−k ) H
N
k=1 n=1
' k (

L−1
ηz
= fx,y (zk ) − fx,y (zk+1 ) H n
N
k=1 n=1
' (

L−1

L
ηzn
+ fx,y (zk ) − fx,y (zk+1 ) H
N
k=1 n=k+1
' ' k (

L−1
ηz
= fx,y (zk ) − fx,y (zk+1 ) H n
N
k=1 n=1
' ((

L
ηzn
+H
N
n=k+1

L−1

= fx,y (zk ) − fx,y (zk+1 ) H N − 3εN = 1. (21.3.32)
k=1

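With this equation, we get
\[
\begin{aligned}
G_W^{x,y}(\eta) &= W_y + G^{x,y}(\eta)(W_x - W_y) = W_y + \bigl[1 - G^{y,x}(\eta)\bigr](W_x - W_y) \\
&= W_y + W_x - W_y + G^{y,x}(\eta)(W_y - W_x) = G_W^{y,x}(\eta).
\end{aligned}
\tag{21.3.33}
\]
Now observe that, for $N$ large enough, $F_N^{x,y}/N = I^{x,y} \cup L^x \cup L^y$. For $\eta/N \in \tfrac1N E_N^x \subseteq L^x$ and $\eta/N \in \tfrac1N E_N^y \subseteq L^y$ the assertion follows from Lemma 21.19. Let $\eta/N \in I^{x,y}$. Then $\eta/N \in I^x$ and $\eta/N \in I^y$. Hence $g^{x'}(\eta/N) = \tfrac12 1_{\{x' \in \{x,y\}\}}$, and we get
\[
G_W^S(\eta) = \sum_{x' \in S_*} g^{x'}(\eta/N) \, G_W^{x'}(\eta) = \tfrac12 G_W^x(\eta) + \tfrac12 G_W^y(\eta). \tag{21.3.34}
\]
Moreover, $\eta/N \in D^{x,y}$ and $\eta/N \in D^{y,x}$. Hence $h^{x,y'}(\eta/N) = 1_{\{y' = y\}}$ and $h^{y,y'}(\eta/N) = 1_{\{y' = x\}}$. Thus, (21.3.34) equals
\[
\begin{aligned}
&\tfrac12 \sum_{y' \in S_* \setminus \{x\}} h^{x,y'}(\eta/N) \, G_W^{x,y'}(\eta) + \tfrac12 \sum_{y' \in S_* \setminus \{y\}} h^{y,y'}(\eta/N) \, G_W^{y,y'}(\eta) \\
&\qquad = \tfrac12 G_W^{x,y}(\eta) + \tfrac12 G_W^{y,x}(\eta) = \tfrac12 G_W^{x,y}(\eta) + \tfrac12 G_W^{x,y}(\eta) = G_W^{x,y}(\eta),
\end{aligned}
\tag{21.3.35}
\]
which settles the claim.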

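4. Using the test function $G_W^S$, we can now derive an upper bound for the desired capacity. For $x \in S_*$, let $F_N^x = \bigcup_{y \in S_* \setminus \{x\}} F_N^{x,y}$ and $F_N = \bigcup_{x \in S_*} F_N^x$. The Dirichlet form $E_N(G_W^S)$ of $G_W^S$ (not to be confused with the set $E_N$ in (21.2.8)) is
\[
\begin{aligned}
E_N(G_W^S) &= \tfrac12 \sum_{\eta \in E_{N,S}} \sum_{z,w \in S} \mu_{N,S}(\eta) \, g(\eta_z) \, r(z,w) \, \bigl[G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr]^2 \\
&= \tfrac12 \sum_{x,y \in S_*} \sum_{\eta \in F_N^{x,y}} \sum_{z,w \in S} \mu_{N,S}(\eta) \, g(\eta_z) \, r(z,w) \, \bigl[G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr]^2 \\
&\quad + \tfrac12 \sum_{\eta \in F_N^c} \sum_{z,w \in S} \mu_{N,S}(\eta) \, g(\eta_z) \, r(z,w) \, \bigl[G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr]^2 \\
&= \sum_{x,y \in S_*} E_N\bigl(G_W^S \mid F_N^{x,y}\bigr) + E_N\bigl(G_W^S \mid F_N^c\bigr).
\end{aligned}
\tag{21.3.36}
\]
Thus, we have to estimate the Dirichlet form on the set of configurations $F_N^c$ and the set of configurations $F_N^{x,y}$, $x, y \in S_*$.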

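5. We start with the set of configurations $F_N^{x,y}$ with fixed $x \neq y \in S_*$. It follows from Lemma 21.20 that
\[
\begin{aligned}
E_N\bigl(G_W^S \mid F_N^{x,y}\bigr)
&= \tfrac12 \sum_{1 \le i,j \le L} \sum_{\eta \in F_N^{x,y}} \mu_{N,S}(\eta) \, g(\eta_{z_i}) \, r(z_i, z_j) \, \bigl[G_W^S(\eta^{z_i,z_j}) - G_W^S(\eta)\bigr]^2 \\
&= \tfrac12 \sum_{1 \le i,j \le L} \sum_{\eta \in F_N^{x,y}} \mu_{N,S}(\eta) \, g(\eta_{z_i}) \, r(z_i, z_j) \, \bigl[G_W^{x,y}(\eta^{z_i,z_j}) - G_W^{x,y}(\eta)\bigr]^2 \\
&= \tfrac12 \sum_{1 \le i,j \le L} \sum_{\eta \in F_N^{x,y}} \mu_{N,S}(\eta) \, g(\eta_{z_i}) \, r(z_i, z_j) \, \bigl[(W_x - W_y)\bigl(G^{x,y}(\eta^{z_i,z_j}) - G^{x,y}(\eta)\bigr)\bigr]^2 \\
&= (W_x - W_y)^2 \, E_N\bigl(G^{x,y} \mid F_N^{x,y}\bigr).
\end{aligned}
\tag{21.3.37}
\]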

Furthermore,
x,y
EN Gx,y | FN
2
= 12 μN,S (η)g(ηzi )r(zi , zj ) Gx,y ηzi ,zj − Gx,y (η)
1≤i,j ≤L η∈F x,y
N

Nα η
m∗ a(ηzi ) 2
= r(zi , zj ) Gx,y ηzi ,zj − Gx,y (η)
2ZN,S a(η) a(ηzi − 1)
1≤i,j ≤L η∈F x,y
N

Nα
≤ m(zi )r(zi , zj )
2ZN,S M∗
1≤i,j ≤L

m∗ x,y
ξ
2
× G (ξ + dzj ) − Gx,y (ξ + dzi ) , (21.3.38)
ξ ∈EN−1,S
a(ξ )
ξx +ξy ≥N−N −1

where we set η = ξ + dzi . By the definition of Gx,y , we obtain

x,y
EN Gx,y | FN
L−1
Nα m∗
ξ

≤ m(zi )r(zi , zj ) fxy (zk ) − fxy (zk+1 )
2ZN,S M∗ a(ξ )
1≤i,j ≤L ξ ∈EN−1,S k=1
ξx +ξy ≥N−N −1

' ' ( ' ((2

ξx ξzl ξx ξzl
k k
1{k≥i} 1{k≥j }
× H + + −H + + .
N N N N N N
l=2 l=2
(21.3.39)

Fix two sites zi = zj ∈ S with i < j . Since m∗ (x) = m∗ (y) = 1, by setting mk =

k
l=2 ξzl we get for the sum over the configurations ξ in (21.3.39) the upper bound

N ζ N −3εN
−1
m∗ 1
a(ζ ) a(p)a(N − m − p − 1)
m=0 ζ ∈Em,S\{x,y} p=2εN
j −1

2
p + mk + 1 p + mk
× fxy (zk ) − fxy (zk+1 ) H −H .
N N
k=i
(21.3.40)
The sum over p only runs from 2εN to N − 3εN − 1, due to the boundary
conditions in (21.3.25). Inserting the explicit form (21.3.24), we get

N ζ N −3εN−1

m∗ 1
a(ζ ) a(p)a(N − m − p − 1)
m=0 ζ ∈Em,S\{x,y} p=2εN
j −1 2
a(p + mk )a(N − m − mk − p − 1)
× fxy (zk ) − fxy (zk+1 ) N −3εN −1 .
k=i q=3εN a(q)a(N − m − q − 1)
(21.3.41)
Since mk ≤ N and p ≥ 2εN , we can estimate

α

mk α N N
a(p + mk ) = a(p) 1 + ≤ a(p) 1 + ≤ a(p) 1 + O ,
p 2εN εN
(21.3.42)
and a(N − m − mk − p − 1) ≤ a(N − m − p − 1). Inserting these estimates into
(21.3.41) we get the upper bound

N ζ N −3εN−1

m∗ 1
a(ζ ) a(p)a(N − m − p − 1)
m=0 ζ ∈Em,S\{x,y} p=2εN
j −1
a(p)a(N − m − p − 1)
× fxy (zk ) − fxy (zk+1 ) N −3εN −1
k=i q=3εN a(q)a(N − m − q − 1)

2
N
× 1+O
εN
j −1 2

N m∗
ζ

= fxy (zi ) − fxy (zj )
a(ζ )
m=0 ζ ∈Em,S\{x,y} k=i
N −3εN −1

p=2εN a(p)a(N − m − p − 1) N
× N −3εN −1 1+O . (21.3.43)
( q=3εN a(q)a(N − m − q − 1))2 εN

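For $j < i$ we get the same bound. Now note that
\[
\frac{\sum_{p=2\varepsilon N}^{N-3\varepsilon N-1} a(p) \, a(N-m-p-1)}{\bigl(\sum_{q=3\varepsilon N}^{N-3\varepsilon N-1} a(q) \, a(N-m-q-1)\bigr)^2} = \frac{1 + R}{\sum_{p=3\varepsilon N}^{N-3\varepsilon N-1} a(p) \, a(N-m-p-1)} \tag{21.3.44}
\]
with
\[
R = \frac{\sum_{p=2\varepsilon N}^{3\varepsilon N-1} a(p) \, a(N-m-p-1)}{\sum_{p=3\varepsilon N}^{N-3\varepsilon N-1} a(p) \, a(N-m-p-1)}. \tag{21.3.45}
\]
It is easy to verify that (recall (21.2.1))
\[
R = O\biggl(\frac{\varepsilon N}{N - \varepsilon N}\biggr). \tag{21.3.46}
\]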

Thus, we obtain for (21.3.39) the upper bound

Nα 2
m(zi )r(zi , zj ) fxy (zi ) − fxy (zj )
2ZN,S M∗
1≤i,j ≤L

N ζ

m∗ 1 N
× 1 + O .
a(ζ ) N −3εN −1 a(p)a(N − m − p − 1) εN
m=0 ζ ∈Em,S\{x,y} p=3εN
(21.3.47)

The sum over i, j in (21.3.47) is just the capacity of the underlying random walk
between the two sites x and y. Since

N −3εN−1

N
a(p)a(N − m − p − 1) ≥ N 2α+1 Iα (3ε) 1 − O , (21.3.48)
εN
p=3εN

we get for (21.3.47)

N ζ

x,y capS (x, y) m∗ N
EN Gx,y | FN ≤ 1+O .
N α+1 ZN,S M∗ Iα (3ε) a(ζ ) εN
m=0 ζ ∈Em,S0
(21.3.49)
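
The comparison between the two-site sum and the integral $I_\alpha(3\varepsilon)$ used in (21.3.48) is a Riemann-sum statement, which the following sketch illustrates numerically. It rests on assumptions not taken from the text: the convention $a(0) = 1$, $a(n) = n^\alpha$, the simplification $m = 0$, and hypothetical values $M = 1000$, $s = 3\varepsilon = 0.3$, $\alpha = 2$. The integral is approximated by a plain midpoint rule.

```python
def a(n, alpha):
    # Assumed convention: a(0) = 1 and a(n) = n**alpha for n >= 1.
    return 1.0 if n == 0 else float(n) ** alpha


def I_alpha(s, alpha, steps=100_000):
    """Midpoint-rule approximation of the integral of u^alpha (1-u)^alpha over [s, 1-s]."""
    width = (1.0 - 2.0 * s) / steps
    return width * sum(
        (s + (i + 0.5) * width) ** alpha * (1.0 - s - (i + 0.5) * width) ** alpha
        for i in range(steps)
    )


def two_site_sum(M, s, alpha):
    """Sum of a(p) * a(M - p - 1) over p from s*M to M - s*M - 1 (s*M assumed integer)."""
    lo = round(s * M)
    return sum(a(p, alpha) * a(M - p - 1, alpha) for p in range(lo, M - lo))


M, s, alpha = 1000, 0.3, 2.0
ratio = two_site_sum(M, s, alpha) / (M ** (2 * alpha + 1) * I_alpha(s, alpha))
```

For these values `ratio` is close to 1, and the deviation shrinks as `M` grows, consistent with a correction term that vanishes in the limit.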
6. Next we do the computation of EN (GSW | FNc ).

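Lemma 21.21 There exists an $\varepsilon$-dependent constant $C_\varepsilon$ such that
\[
\max_{\eta \in E_{N,S} \setminus F_N} \bigl|G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr| \le \frac{C_\varepsilon}{N}. \tag{21.3.50}
\]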

Proof Write
\[
\bigl|G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr| = \biggl|\sum_{x \in S_*} \Bigl[g^x(\eta^{z,w}/N) \, G_W^x(\eta^{z,w}) - g^x(\eta/N) \, G_W^x(\eta)\Bigr]\biggr|. \tag{21.3.51}
\]
Since $g^x$ is a smooth function, there exists a constant $C$ such that $|g^x(\eta^{z,w}/N) - g^x(\eta/N)| \le C/N$, and hence (21.3.51) can be bounded from above by

x z,w C
x
g (η/N) GW η 1+ − GW (η)
x
N
x∈S∗

x x,y z,w C
= g (η/N) hx,y z,w
η /N GW η 1+
N
x∈S∗ y∈S∗ \{x}

−h x,y x,y
(η/N )GW (η) . (21.3.52)

Since also hx,y is a smooth function, there exist a constant C such that |hx,y (ηz,w /
N) − hx,y (η/N )| ≤ N C
. Hence (21.3.52) is at most

x x,y C 2
≤ − GW (η)
x,y
g (η/N ) hx,y (η/N) GW ηz,w 1 +
N
x∈S∗ y∈S∗ \{x}

= g x (η/N ) hx,y (η/N)
x∈S∗ y∈S∗ \{x}

x,y 2C C2 x,y z,w
×
x,y
GW ηz,w − GW (η) + + 2 GW η
N N

x
≤ g (η/N )
x∈S∗

C
× hx,y (η/N) (Wx − Wy ) Gx,y ηz,w − Gx,y (η) +
N
y∈S∗ \{x}

x,y
≤ g x (η/N ) h (η/N)|Wx − Wy |Gx,y ηz,w − Gx,y (η) + C ,
N
x∈S∗ y∈S∗ \{x}
(21.3.53)
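where we use that $G_W^{x,y}(\eta) \le 1$ for all $\eta \in E_{N,S}$. It remains to estimate
\[
\max_{\eta \in F_N^c} \bigl|G^{x,y}(\eta^{z,w}) - G^{x,y}(\eta)\bigr|. \tag{21.3.54}
\]
We may assume that $z = z_j$ and $w = z_i$ with $i < j$. Then
\[
\begin{aligned}
\max_{\eta \in F_N^c} \bigl|G^{x,y}(\eta^{z_i,z_j}) - G^{x,y}(\eta)\bigr|
&= \max_{\eta \in F_N^c} \sum_{k=i}^{j-1} \bigl[f_{x,y}(z_k) - f_{x,y}(z_{k+1})\bigr] \biggl[H\Bigl(\frac{\eta_x}{N} + \frac{m_k + 1}{N}\Bigr) - H\Bigl(\frac{\eta_x}{N} + \frac{m_k}{N}\Bigr)\biggr] \\
&= \max_{\eta \in F_N^c} \sum_{k=i}^{j-1} \bigl[f_{x,y}(z_k) - f_{x,y}(z_{k+1})\bigr] \frac{a(\eta_x + m_k) \, a(N - \eta_x - m_k - 1)}{\sum_{p=3\varepsilon N+1}^{N-3\varepsilon N} a(p-1) \, a(N-p)} \\
&\le \max_{\eta \in F_N^c} \sum_{k=i}^{j-1} \bigl[f_{x,y}(z_k) - f_{x,y}(z_{k+1})\bigr] \frac{a(N/2) \, a(N/2)}{\sum_{p=3\varepsilon N+1}^{N-3\varepsilon N} a(p-1) \, a(N-p)} \\
&\le \frac{(N/2)^{2\alpha}}{\sum_{p=3\varepsilon N+1}^{N-3\varepsilon N} a(p-1) \, a(N-p)}.
\end{aligned}
\tag{21.3.55}
\]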

N −3εN N
Since p=3εN+1 a(p − 1)a(N − p) ≥ N 2α+1 Iα (3ε)[1 − O( εN )], we get that
there exists a constant Cε such that

(N/2)2α N Cε
r.h.s. (21.3.55) ≤ 2α+1
1 + O = . (21.3.56)
N Iα (3ε) εN N

Thus
Cε x Cε
r.h.s. (21.3.53) ≤ g (η/N) hx,y (η/N ) = , (21.3.57)
N N
x∈S∗ y∈S∗ \{x}

because g x , hx,y , x = y ∈ S∗ are positive functions and |Wx − Wy | ≤ 1.


7. In order to proceed with the computation, we need a technical result that follows
from Großkinsky and Spohn [132]. The first statement says that all excess parti-
cles accumulate on a single site in S∗ . The second statement says that if there is a
constraint on the maximal occupation number of a single site, then as many excess
particles as possible accumulate on a single site.

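Proposition 21.22 Let $Z_{N,S}(k)$ be the constrained partition function with the condition $\eta_z < k$ for all $\eta \in E_{N,S}$ and $z \in S$. Then:
(i) $Z_{N,S}(k) = \sum_{x \in S_*} \dfrac{N^\alpha}{(N - \rho_c L)^\alpha (\rho_c L)^\alpha} \, Z_{\rho_c L, S \setminus \{x\}} \, [1 + o(1)]$ for $k \ge N - \rho_c L$.
(ii) $Z_{N,S}(k) = \sum_{x \in S_*} \dfrac{N^\alpha}{(N - k)^\alpha k^\alpha} \, Z_{N-k, S \setminus \{x\}}(k) \, [1 + o(1)]$ for $k < N - \rho_c L$.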

Proposition 21.22 is needed for the following lemma.

Lemma 21.23 Let Ax = {η ∈ EN,S : εN ≤ ηx ≤ N − εN , ηx + ηy < N −

N , ∀y ∈ S\{x}}, x ∈ S∗ . Then there is a constant C∗ (depending on α and |S∗ |)
such that
C∗ 1
μN,S (η) ≤ . (21.3.58)
(N − ρc L)α N α−1
η∈Ax

Proof Write

N −εN
ξ
Nα 1 m∗
μN,S (η) =
ZN,S kα a(ξ )
η∈A x k=εN ξ ∈EN−k,S\{x}
ξy <N−k−N ∀y∈S\{x}

N −εN

Nα 1 1
= ZN −k,S\{x} (N − k − N ). (21.3.59)
ZN,S k (N − k)α
α
k=εN

Now we apply Proposition 21.22(ii) to ZN −k,S\{x} (N −k −N ). Since N −k −N <

N − ρc (L − 1) for all k ∈ {3εN, . . . , N − 3εN }, we find that (21.3.59) equals

N −εN

1 (N − k)α ZN ,S\{x,y} (N − k − N )
=N α
k α (N − k)α αN (N − k − N )α ZN,S
εN y∈S∗ \{x}

× 1 + o(1) . (21.3.60)

Since N − k − N ≥ N − ρc (L − 2) for all k ∈ {3εN, . . . , N − 3εN }, we can apply

Proposition 21.22(i) to ZN ,S\{x,y,} . Hence (21.3.60) equals

N −εN

1 1
Nα
kα αN (N − k − N )α
k=εN y∈S∗ \{x}
αN [1 + o(1)] Z(L−2)ρc ,S\{x,y,z}
× . (21.3.61)
(N − (L − 2)ρc )α (ρc (L − 2))α ZN,S
z∈S∗ \{x,y}

Note that ZN,S ≥ ZN,S\{x,y} . We apply Proposition 21.22(i) with k = N for

ZN,S\{x,y} ,
N α Z(L−2)ρc ,S\{x,y,z}
ZN,S\{x,y} = 1 + o(1)
(N − ρc (L − 2)) (ρc (L − 2))
α α
z∈S∗ \{x,y}

N α |S∗ − 2|(1 + o(1))

= Z(L−2)ρc ,S\{x∗ ,y∗ ,z∗ } , (21.3.62)
(N − (L − 2)ρc )α (ρc (L − 2))α
where x∗ = y∗ = z∗ ∈ S∗ are arbitrary condensate sites. Inserting (21.3.62), we see
that

r.h.s. (21.3.61)
N −εN
1 |S∗ − 1| |S∗ − 2| (N − (L − 2)ρc )α
≤ 1 + o(1)
k α (N − k − N )α (N − (L − 2)ρc )α |S∗ − 2|
k=εN

N −εN

N − (L − 2)ρc α 1 1
= |S∗ − 1| 1 + o(1)
N − (L − 2)ρc k (N − k − N )
α α
k=εN

(N −
N )/2
N − (L − 2)ρc α 1 1
≤ 2α+1 |S∗ − 1| 1 + o(1)
N − N (N − (L − 2)ρc )α k α
k=εN

α
2α+1 |S∗ − 1| N − (L − 2)ρc 1 1
≤ 1+
α−1 N − N (N − (L − 2)ρc )α ε α−1 N α−1

× 1 + o(1)
C∗ 1
≤ . (21.3.63)
(N − Lρc ) ε
α α−1 N α−1
This settles the claim.

8. Observe that configurations out of the set z∈S\S∗ {η ∈ EN,S : ηz ≥ N − N } do
not contribute to the Dirichlet form, because for these configurations Gx,y = 0 for
all x, y ∈ S∗ . Combining Lemmas 21.21–21.23, we get
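\[
\begin{aligned}
E_N\bigl(G_W^S \mid F_N^c\bigr)
&= \frac12 \sum_{\eta \in F_N^c} \sum_{z,w \in S} \mu_{N,S}(\eta) \, g(\eta_z) \, r(z,w) \, \bigl[G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr]^2 \\
&\le \frac12 \sum_{x \in S_*} \sum_{\eta \in E_N \setminus F_N^x} \sum_{z,w \in S} \mu_{N,S}(\eta) \, r(z,w) \, \bigl[G_W^S(\eta^{z,w}) - G_W^S(\eta)\bigr]^2 \\
&\le \frac{C_\varepsilon^2}{2N^2} \sum_{x \in S_*} \sum_{\eta \in A^x} \mu_{N,S}(\eta) \sum_{z,w \in S} r(z,w) \\
&\le \frac{L \, C_* \, C_\varepsilon^2}{(N - L\rho_c)^\alpha} \, \frac{1}{N^{\alpha+1}}.
\end{aligned}
\tag{21.3.64}
\]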
Combine (21.3.36), (21.3.37), (21.3.49) and (21.3.64) to arrive at

capN,S EN S∗1 , EN S∗2
= inf EN (G)
G∈HN (EN (S∗1 ),EN (S∗2 ))

≤ inf EN GSW
W ∈W (S∗1 ,S∗2 )
x,y
= inf EN GS | FN [Wy − Wx ]2 + EN GSW | FNc
W ∈W (S∗1 ,S∗2 )
x,y∈S∗

≤ inf capS (x, y)[Wy − Wx ]2
W ∈W (S∗1 ,S∗2 )
x,y∈S∗

N −α−1
N mξ∗
N

× 1+O
2ZN,S M∗ Iα (3ε) a(ξ ) εN
m=0 ξ ∈Em,S0

LC∗ Cε2 1
+ . (21.3.65)
(N − Lρc )α N α+1
This completes the proof.

21.4 Proof of the main theorems

21.4.1 Finite system size

Proof To prove Theorem 21.5, recall the remark made after the proof of Proposi-
tion 21.14. Using Lemma 21.2, we get for L fixed and N → ∞ the lower bound
1 2 N −α−1 [1 + o(1)]
capN,S ηS∗ , ηS∗ ≥
2ZS M∗ Iα (0)

N
m∗
ξ
× inf capS (x, y)[Wx − Wy ]
2
.
W ∈W (S∗1 ,S∗2 ) a(ξ )
x,y∈S∗ k=0 ξ ∈Ek,S0
(21.4.1)

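With the definitions in (21.1.10)–(21.1.11), the sum over $k$ in (21.4.1) converges as $N \to \infty$ to
\[
\sum_{k \in \mathbb{N}_0} \sum_{\zeta \in E_{k,S_0}} \frac{m_*^\zeta}{a(\zeta)} = \prod_{z \in S_0} \sum_{j \in \mathbb{N}_0} \frac{m_*(z)^j}{a(j)} = \prod_{z \in S_0} \Gamma_z = \frac{Z_S}{|S_*| \, \Gamma(\alpha)}. \tag{21.4.2}
\]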

This gives the lower bound for (21.2.3).
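
The first equality in (21.4.2) is the usual factorisation of a sum over configurations into a product over sites. The sketch below checks it on truncated sums with exact fractions. It is only an illustration: the values of $m_*(z)$ are hypothetical, $\alpha$ is taken to be an integer, the convention $a(0) = 1$, $a(n) = n^\alpha$ is assumed, and the helper names are not from the text.

```python
from fractions import Fraction
from itertools import product
from math import prod


def a(n, alpha):
    # Assumed convention: a(0) = 1 and a(n) = n**alpha for n >= 1.
    return 1 if n == 0 else n ** alpha


def weight(zeta, m, alpha):
    """Product over sites of m(z)**zeta_z / a(zeta_z)."""
    return prod(Fraction(mz) ** n / a(n, alpha) for mz, n in zip(m, zeta))


def sum_by_particle_number(m, alpha, K):
    """Sum of the weights of all configurations with at most K particles in total."""
    return sum(weight(z, m, alpha)
               for z in product(range(K + 1), repeat=len(m)) if sum(z) <= K)


def product_of_site_sums(m, alpha, J):
    """Product over sites of the single-site sums truncated at occupation J."""
    return prod(sum(Fraction(mz) ** j / a(j, alpha) for j in range(J + 1)) for mz in m)


m = [Fraction(1, 2), Fraction(1, 3)]   # hypothetical values of m_*(z) on two sites
alpha = 2
box_sum = sum(weight(z, m, alpha) for z in product(range(5), repeat=len(m)))
partial = sum_by_particle_number(m, alpha, 6)
```

Truncating every site at the same occupation gives an exact product, while truncating the total particle number, as in (21.4.1), is squeezed between two such products and converges to the same limit.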

Insert the bound

N m∗
ξ
ZS
≤ (21.4.3)
a(ξ ) |S∗ |Γ (α)
m=0 ξ ∈Em,S0

into (21.3.20) and use Lemma 21.2. This gives the upper bound for (21.2.3).

21.4.2 Diverging system size

Proof To prove Theorem 21.9, suppose that L(N) satisfies (21.2.7).

1. In the case where S∗1 and S∗2 are a partition of S∗ , the lower bound in Proposi-
tion 21.14 takes the form
1
capN,S EN S∗1 , EN S∗2 ≥ capS (x, y)
N α+1 M ∗ Iα (0)
x∈S∗1 ,y∈S∗2

1
N
m∗
ξ

× 1 − O(δ) . (21.4.4)
ZN,S a(ξ )
k=0 ξ ∈Ek,S0

From Proposition 21.22 we know that

N
1 m∗
ξ

ZN,S = N α 1 + o(1)
(N − k)α a(ξ )
x∈S∗ k=0 ξ ∈Ek,S\{x}

α
N m∗
ξ
N
≤ 1+ 1 + o(1)
N − N a(ξ )
x∈S∗ k=0 ξ ∈Ek,S\{x}

N m∗
ξ

= |S∗ | 1 + o(1)
a(ξ )
k=0 ξ ∈Ek,S\{x∗ }

" m∗ (z)k
≤ |S∗ | 1 + o(1)
a(k)
z∈S\{x∗ } k≥0
"
= |S∗ | Γz 1 + o(1)
z∈S\{x∗ }
"
= |S∗ |Γ (α) Γz 1 + o(1) . (21.4.5)
z∈S0

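Hence we get
\[
\mathrm{cap}_{N,S}\bigl(E_N(S_*^1), E_N(S_*^2)\bigr) \ge \frac{1 + o(1)}{N^{\alpha+1} M_* |S_*| \, I_\alpha(0) \, \Gamma(\alpha)} \sum_{x \in S_*^1, \, y \in S_*^2} \mathrm{cap}_S(x, y), \tag{21.4.6}
\]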

which is the desired lower bound.

2. In the case where S∗1 and S∗2 are a partition of S∗ , we get from (21.3.20), together
with the estimate in (21.4.3), the upper bound
ZS 1
capN,S EN S∗1 , EN S∗2 ≤
ZN,S N α+1 M∗ |S∗ | Iα (0) Γ (α)

N C N
× capS (x, y) 1 + O . (21.4.7)
1 2
N
x∈S∗ ,y∈S∗

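Since $Z_S / Z_{N,S} = 1 + o(1)$, we obtain
\[
\mathrm{cap}_{N,S}\bigl(E_N(S_*^1), E_N(S_*^2)\bigr) \le \frac{1 + o(1)}{N^{\alpha+1} M_* |S_*| \, I_\alpha(0) \, \Gamma(\alpha)} \sum_{x \in S_*^1, \, y \in S_*^2} \mathrm{cap}_S(x, y), \tag{21.4.8}
\]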

which is the desired upper bound.

3. Combining (21.4.6) and (21.4.8), we get Theorem 21.9.

21.5 Proof that the condensate configurations form a metastable set

We now prove Proposition 21.4 for L fixed and N → ∞.

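Proposition 21.24
\[
\frac{\sup_{\eta \in \mathcal{M}} \bigl[\mathrm{cap}_{N,S}(\eta, \mathcal{M} \setminus \eta) / \mu_{N,S}(\eta)\bigr]}{\inf_{\xi \in \mathcal{M}^c} \bigl[\mathrm{cap}_{N,S}(\xi, \mathcal{M}) / \mu_{N,S}(\xi)\bigr]} \le O\biggl(\frac{L(L^2 + N)}{N^{\alpha+1} r}\biggr). \tag{21.5.1}
\]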

Fig. 21.5 Motion of the particles in the chosen path

Proof The following lemma bounds the denominator in (21.5.1) from below.

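Lemma 21.25 Let $r = \min_{u \in S} r(u, u \pm 1) > 0$. Then
\[
\frac{\mathrm{cap}_{N,S}(\xi, \mathcal{M})}{\mu_{N,S}(\xi)} \ge \frac{r}{L(L^2 + N) \, |S_*| \, K(\alpha)}, \qquad \xi \in \mathcal{M}^c, \tag{21.5.2}
\]
where $K(\alpha)$ is a constant that depends on $\alpha$.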

Proof First we give the proof for |S∗ | = 3. After that we extend the proof to |S∗ | > 3
with the help of an algorithm.

• |S∗ | = 3. Let S∗ = {x, y, z} and r(x, y) ≥ r(x, z) ≥ r(y, z). To estimate the ca-
pacity from below we only need one path from ξ to M . First, we let all particles
of a valley (recall (8.2.10) for the definition of valleys) jump onto its attractor. The
resulting configuration, where only the three attractors x, y, z in the three valleys
A(x), A(y), A(z) are occupied, is called η. Assume that on y there are more parti-
cles than on x (see Fig. 21.5). Next, since r(x, y) ≥ r(x, z), r(y, z), and since y is
occupied by more particles than x, we let all particles from x jump to y. The result-
ing configuration with only the sites y and z occupied, is called σ x,y (see Fig. 21.5).
Finally, we pick the site y or z where σ xy has the highest occupation number and
move all particles to that site.
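Since
\[
\mathrm{cap}_{N,S}(\xi, \mathcal{M}) \ge \frac{1}{\mathrm{cap}_{N,S}^{-1}(\xi, \eta) + \mathrm{cap}_{N,S}^{-1}(\eta, \sigma^{xy}) + \mathrm{cap}_{N,S}^{-1}(\sigma^{xy}, \mathcal{M})}, \tag{21.5.3}
\]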

we must calculate a lower bound for each of the three capacities in the right-hand
side of (21.5.3).
For $w \in S_*$, let $(w)_n$ be a distance-to-$w$-increasing enumeration of $A(w) \setminus \{w\}$, and let $\xi_w^i = \xi_w + \sum_{j=1}^{i-1} \xi_{w_j}$ be the number of particles on site $w$ before the particles on site $w_i$ jump onto $w$. Let $|w| = \max\{\mathrm{dist}(w, w_i) : w \in S_*, \ w_i \in (w)_n\}$, where $\mathrm{dist}(w, w_i)$ is the Euclidean distance. For each transition of the particles from site $w_i$ to site $w$ via nearest-neighbour jumps, we use the explicit formula for the capacity

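of the one-dimensional chain. Doing so, we get for the capacity between $\xi$ and $\eta$ the lower bound
\[
\mathrm{cap}_{N,S}^{-1}(\xi, \eta) \le \sum_{w \in S_*} \sum_{i=1}^{|(w)_n|} \sum_{k=0}^{\xi_{w_i} - 1} \frac{|w| \, (N-1)^\alpha}{\mu_{N,S}(\{\xi_{w_i} - k, \xi_w^i + k\}) \, r \, N^\alpha}, \tag{21.5.4}
\]
where $\{\xi_{w_i} - k, \xi_w^i + k\}$ is the configuration
\[
\{\xi_{w_i} - k, \xi_w^i + k\}_v =
\begin{cases}
\xi_v, & v \in S \setminus A(w) \text{ and } v \in \{w_n : n > i\}, \\
0, & v \in \{w_n : n < i\}, \\
\xi_{w_i} - k, & v = w_i, \\
\xi_w^i + k, & v = w.
\end{cases}
\tag{21.5.5}
\]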

Note that the μN -measure of the configuration increases after each transition of a
particle, because there are more particles on condensate-sites. Therefore (21.5.4)
equals

−1
|(w)
n |
wi ξ
ZN,S |w| a({ξwi − k, ξwi + k}\{wi , w})
Nα {ξw −k,ξwi +k}\{wi }
w∈S∗ i=1 k=0 m i ∗

a(ξwi − k)a(ξwi + k)(N − 1)α

×
m∗ (wi )ξwi −k N α r
|(w)
n | ZN,S |w| a({ξw − k, ξ i + k}\{wi , w})a(ξ i )a(ξw )
= i w w i
Nα {ξwi −k,ξwi +k}\{wi } ξwi
w∈S∗ i=1 m ∗ m (w ) r ∗ i
ξwi −1

k α k α 1 α
× 1− 1+ i 1− m∗ (wi )k . (21.5.6)
ξwi ξw N
k=0

The sum over k in (21.5.6) is bounded by L times an α-dependent constant K (α).

Observe that a(ξw1 ) = 1 when ξw1 = 0, and we can use the same estimation. We
therefore obtain

|(w)
n | |w|LK (α) |(w)
n | L2 K (α)
cap−1
N,S (ξ, η) ≤ ≤ .
μN,S ({ξwi , ξwi })r μN,S ({ξwi , ξwi })r
w∈S∗ i=1 w∈S∗ i=1
(21.5.7)
If ηx = N or ηy = N , then we can stop here, because we already have a configura-
tion in M . Otherwise we continue the estimation of the capacities.
Without loss of generality, let ηx ≤ ηy . A similar estimation of the formula for
the one-dimensional chain yields

cap−1
N,S η, σ
xy

x −1
η
dist(x, y)K (α)(N − 1)α
≤
μN,S ({ηx − k, ηy + k})r N α
k=0
ηx −1
ZN,S dist(x, y)K (α) a(η\{x, y}) a(ηx − k)a(ηy + k)(N − 1)α
= η
Nα r m∗ k=0
Nα

x −1
η
α
α
α
dist(x, y)K (α) k k 1
= 1− 1+ 1− , (21.5.8)
μN,S (η)r ηx ηy N
k=0

where {ηx − k, ηy + k} ∈ EN,S is the configuration where all sites are empty except
for site x with ηx − k particles, site y with ηy + k particles and site z with ηz
particles. Observe that ηx /ηy ≤ 1. For all k ∈ {1, . . . , ηx − 1}, we have

1 α k α k α
1− ≤ 1, 1− ≤ 1, 1+ ≤ 2α . (21.5.9)
N ηx ηy

Since ηx , ηy < N , we can estimate the sum in (21.5.8) from above by 2α N , and get

2α K (α)N dist(x, y) 2α K (α)N L

cap−1 η, σ xy
≤ ≤ . (21.5.10)
N,S
μN,S (η)r μN,S (η)r
xy
If σy = N , then we can stop here. Otherwise we continue the estimation of the
capacities.
xy xy
Without loss of generality, let σz ≤ σy . A similar estimation yields

xy 2α K (α)N dist(y, z) 2α K (α)N L

cap−1
N,S σ , M ≤ ≤ . (21.5.11)
μN (σ xy )r μN,S (σ xy )r

Combining (21.5.7), (21.5.10) and (21.5.11), we get the lower bound

|(w) |
capN,S (ξ, M ) L2 K (α) n μN,S (ξ )
≥
μN,S (ξ ) r μN,S ({ξwi , ξwi })
w∈S∗ i=1

−1
2α K (α)N L μN,S (ξ ) μN,S (ξ )
+ + . (21.5.12)
r μN,S (η) μN,S (σ xy )

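Since there are more particles on the sites of $S_*$ in the configurations $\{\xi_{w_i}, \xi_w^i\}$, $\eta$ and $\sigma^{xy}$ than in the configuration $\xi$, we have
\[
\frac{\mu_{N,S}(\xi)}{\mu_{N,S}(\{\xi_{w_i}, \xi_w^i\})}, \ \frac{\mu_{N,S}(\xi)}{\mu_{N,S}(\eta)}, \ \frac{\mu_{N,S}(\xi)}{\mu_{N,S}(\sigma^{xy})} \le 1. \tag{21.5.13}
\]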

Fig. 21.6 Construction of the tree

Thus, we can continue by estimating

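\[
\frac{\operatorname{cap}_{N,S}(\xi,\mathcal{M})}{\mu_{N,S}(\xi)}
\ge \Bigl[\frac{L^{3} K(\alpha)\,|S_*|}{r} + \frac{2^{\alpha} K(\alpha)\,(|S_*|-1)\,N L}{r}\Bigr]^{-1}
\ge \frac{r}{L\,(L^{2}+N)\,|S_*|\,K(\alpha)},
\tag{21.5.14}
\]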

and we are done.

• |S∗| > 3. The following algorithm generalizes the argument to |S∗| > 3. Fix a
configuration ξ ∉ M. First, we let all particles in a valley jump onto its attractor. From
the resulting configuration we construct a labeled tree. Each attractor corresponds
to a leaf of the tree that is labeled with its occupation number. The local maxima of
the potential of the random walk on S are the vertices of the tree, where the largest
local maximum is the root of the tree. The root is connected with the vertices of the
next two largest local maxima, and so on, as illustrated in Fig. 21.6.
The algorithm works as follows. For each pair of leaves we calculate the length
of the shortest path between them and choose the pair of leaves with the shortest
path. If several pairs attain the shortest path length, then we choose the pair containing
the lowest-labeled leaf. Next, we increase the label of the highest-labeled leaf in the pair by
the value of the label of the lowest-labeled leaf, and delete the latter. We continue
until we obtain a tree with only one leaf. This algorithm describes a path from the
configuration ξ ∉ M to a configuration in M (because the final tree corresponds to a
configuration in M). Thus, for the general case we have to calculate at most |S∗| − 1
transitions between condensate-sites, i.e., we have to estimate at most |S∗| − 1
capacities. Hence (21.5.14) also holds in the general case. Figure 21.7 illustrates the
algorithm for the case |S∗| = 3.
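The merging procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is represented by a fixed table of leaf-to-leaf path lengths (the hypothetical argument `path_length`), ties are broken by the smaller of the two labels in a pair, and the names `merge_leaves`, `labels` and `moves` are not taken from the text.

```python
from itertools import combinations


def merge_leaves(labels, path_length):
    """Merge leaves pairwise until a single leaf is left.

    labels      : dict mapping leaf name -> occupation number
    path_length : dict mapping frozenset({a, b}) -> tree distance between leaves a, b
    Returns the final labels and the list of moves (source, target, particles moved).
    """
    labels = dict(labels)  # work on a copy
    moves = []
    while len(labels) > 1:
        # Shortest path first; ties are broken by the smaller label in the pair.
        a, b = min(
            combinations(sorted(labels), 2),
            key=lambda pair: (
                path_length[frozenset(pair)],
                min(labels[pair[0]], labels[pair[1]]),
            ),
        )
        low, high = (a, b) if labels[a] <= labels[b] else (b, a)
        moves.append((low, high, labels[low]))
        labels[high] += labels[low]  # the higher-labeled leaf absorbs the other one
        del labels[low]
    return labels, moves
```

With three leaves the procedure performs two merges, in line with the count |S∗| − 1 of transitions between condensate-sites, and the total number of particles is conserved.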

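Fig. 21.7 Algorithm for |S∗| = 3

Having concluded the proof of Lemma 21.25, we can now conclude the proof of
Proposition 21.24. Using Theorem 21.5 and Lemma 21.2, we obtain

\[
\begin{aligned}
\sup_{w\in S_*} \frac{\operatorname{cap}_{N,S}(\eta_w,\mathcal{M}\setminus\eta_w)}{\mu_{N,S}(\eta_w)}
&= \sup_{w\in S_*} \frac{Z_{N,S}\,[1+o(1)]}{N^{\alpha+1}\,|S_*|\,M_*\,I_\alpha(0)\,\Gamma(\alpha)}
   \sum_{v\in S_*\setminus\{w\}} \operatorname{cap}_S(w,v) \\
&= [1+o(1)]\,\frac{Z_S}{N^{\alpha+1}\,|S_*|\,M_*\,I_\alpha(0)\,\Gamma(\alpha)}
   \,\sup_{w\in S_*} \sum_{v\in S_*\setminus\{w\}} \operatorname{cap}_S(w,v),
\end{aligned}
\tag{21.5.15}
\]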

where μN,S(ηw) = 1/ZN,S is the equilibrium weight of the configuration ηw where
all N particles are at site w. The expression in (21.5.15) gives us control of the
numerator in (21.5.1) (recall (21.1.15) and the fact that \(\mathcal{M} = \bigcup_{w\in S_*} \eta_w\)). For the
denominator in (21.5.1) we use Lemma 21.25. For the quotient we therefore get

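\[
\begin{aligned}
\frac{\sup_{\eta\in\mathcal{M}}\,[\operatorname{cap}_{N,S}(\eta,\mathcal{M}\setminus\eta)/\mu_{N,S}(\eta)]}
     {\inf_{\xi\in\mathcal{M}^c}\,[\operatorname{cap}_{N,S}(\xi,\mathcal{M})/\mu_{N,S}(\xi)]}
&\le [1+o(1)]\,\frac{Z_S\,L\,(L^2+N)\,K(\alpha)}{N^{\alpha+1}\,M_*\,I_\alpha(0)\,\Gamma(\alpha)\,r}
   \,\sup_{w\in S_*} \sum_{v\in S_*\setminus\{w\}} \operatorname{cap}_S(w,v) \\
&= O\bigl(L\,(L^2+N)/(N^{\alpha+1}\, r)\bigr),
\end{aligned}
\tag{21.5.16}
\]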

which is the desired result.

21.6 Bibliographical notes

1. Metastability for the zero-range process in the case where the random walk is
reversible was first investigated by Beltrán and Landim [16]. Convergence of the
time-scaled process, observed when it visits the metastable set, to a continuous-
time random walk was proved with the help of martingale techniques (recall Re-
mark 21.8). The results were recovered in the framework of the potential-theoretic
approach to metastability in the Diploma Thesis of Rebecca Neukirch [191], and
were published in Bovier and Neukirch [40]. The presentation in this chapter is
based on this work.

2. Lemma 21.1 was proved, in increasing generality, in Andjel [7], Evans [105],
Großkinsky and Spohn [132]. Lemma 21.2 was proved in Großkinsky and Spohn
[132], Beltrán and Landim [16]. The condensation phenomenon in Theorem 21.3
was proved in Großkinsky and Spohn [132], Großkinsky, Schütz and Spohn [131],
Großkinsky and Schütz [130].

3. Landim [159] deals with the zero-range process in the case where the random
walk is totally asymmetric (so that reversibility fails).
Part IX
Challenges

Part IX describes a number of models whose metastable behaviour is in principle

understood but still presents major technical challenges. Some of these challenges
may eventually become tractable with the potential-theoretic methods outlined in
this monograph.
In Chap. 22 we list some challenges within metastability. In particular, we look at
Glauber dynamics in large volumes of Ising spins with a nearest-neighbour interac-
tion in small magnetic fields, respectively, at birth-death dynamics of point particles
with a spatial interaction in small volumes at low temperatures. The former is a
variant of the model treated in Chaps. 17 and 19 where the critical droplets are so
large that they take on a macroscopic shape, called the Wulff shape. The latter is a
continuum variant of the model treated in Chaps. 18 and 20, and is linked to the
phenomenon of crystallisation. For both we state a few theorems and point to a few
open problems.
In Chap. 23 we list some challenges outside metastability. In particular, we look
at Glauber dynamics in infinite volume and study what happens after the system
has created a critical droplet. The focus is on how long it takes for the origin to be
invaded by the plus-phase when the system starts in the minus-phase. It turns out
that this crossover is caused by a critical droplet that is created far away from the
origin and subsequently grows until it reaches the origin.
Chapter 22
Challenges Within Metastability

Under this modification, the process of division was repeated,

but with the old negative result: the attempt was therefore
abandoned, though not without hope that future
mathematicians, by introducing a number of hitherto
undetermined constants, raised to the second degree, might
succeed in obtaining a positive result.
(Lewis Carroll, A New Method of Evaluation)

There are several challenges within metastability that as yet remain unsolved, but
are potentially within reach of the conceptual and technical machinery described in
the present monograph. This chapter is devoted to two models representing some
of these challenges. We state a few theorems—without proofs—and point to a few
open problems.
Section 22.1 looks at Ising spins in a small magnetic field subject to Glauber
dynamics. This is the same model as treated in Chaps. 17 and 19, but in a different
metastable regime, namely, where the critical droplet is very large. In Sect. 22.2 we
discuss a model of an interacting particle system in the continuum, to which the
methods described in the present monograph apply after appropriate modifications.

22.1 Glauber dynamics in large volumes at small magnetic fields

In Part VII we already dealt with large volumes for Glauber dynamics and Kawasaki
dynamics at low temperatures. In this section we again look at Glauber dynamics in
large volumes, but now at positive temperatures and small magnetic fields. The tem-
perature is chosen strictly below the critical temperature of the Ising model on the
infinite lattice Z2 in zero magnetic field. In the limit as the magnetic field tends to
zero the size of the critical droplet tends to infinity. The main idea is that its asymp-
totic shape is the Wulff shape from equilibrium statistical physics, i.e., the shape
that minimises the integrated surface tension between the minus-phase outside the
droplet and the plus-phase inside the droplet. In what follows we consider volumes
that are comparable to the volume of the critical droplet. In Chap. 23 we will see
what happens in larger volumes.

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_22

Return to the setting of Sect. 17.1.1. The Ising-spin Hamiltonian on a finite square
box Λ ⊂ Z2 reads (recall (17.1.1)–(17.1.2))
\[
H(\sigma) = -\frac{J}{2} \sum_{\{x,y\}\in\Lambda^*} \sigma(x)\sigma(y) - \frac{h}{2} \sum_{x\in\Lambda} \sigma(x),
\qquad \sigma \in S = \{-1,+1\}^{\Lambda},
\tag{22.1.1}
\]

with J, h > 0, where we use periodic boundary conditions. The system follows a
Metropolis dynamics (σt )t≥0 with spin-flip rates
\[
c_\beta(\sigma,\sigma^x) =
\begin{cases}
e^{-\beta[H(\sigma^x)-H(\sigma)]_+}, & \sigma\in S,\ x\in\Lambda,\\
0, & \text{otherwise},
\end{cases}
\tag{22.1.2}
\]

where σ^x is the configuration obtained from σ by flipping the spin at site x. We
write Eσ to denote expectation w.r.t. the law of this dynamics starting from σ0 = σ.
We are interested in the metastable behaviour of the system in the limit as h ↓ 0
for fixed β ∈ (βc, ∞), where \(\beta_c = \frac{1}{2J}\ln(1+\sqrt{2})\) is the critical inverse temperature
of the model on Z² at h = 0 (see e.g. Georgii [125], p. 451). In this limit the critical
droplet will be large, namely, it has a linear size of order 1/h (recall (17.1.4)). In
order to accommodate this droplet, we pick Λ = Λh with

\[
\Lambda_h = \Bigl[-\frac{C}{h},\frac{C}{h}\Bigr]^2 \cap \mathbb{Z}^2
\quad \text{with } C \in (0,\infty) \text{ large enough}.
\tag{22.1.3}
\]

As initial configuration we take σ0 = ⊟h, i.e., all spins in Λh are pointing downwards.
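To make (22.1.1)–(22.1.2) concrete, here is a minimal Python sketch that evaluates the Hamiltonian on a small periodic box and the Metropolis rate of a single spin flip. It assumes that Λ∗ is the set of unordered nearest-neighbour pairs and that the box has side at least 3 (so that the four neighbours of a site are distinct); the function names and the parameter values in the usage note are illustrative, not taken from the text.

```python
import math
from itertools import product


def hamiltonian(sigma, side, J, h):
    """H(sigma) from (22.1.1) on a periodic side x side box; sigma maps (i, j) -> +1/-1."""
    bond_sum = 0
    for i, j in product(range(side), repeat=2):
        # Each unordered nearest-neighbour pair is counted once (right and up neighbour).
        bond_sum += sigma[i, j] * sigma[(i + 1) % side, j]
        bond_sum += sigma[i, j] * sigma[i, (j + 1) % side]
    return -0.5 * J * bond_sum - 0.5 * h * sum(sigma.values())


def flip(sigma, x):
    """Return the configuration sigma^x obtained by flipping the spin at site x."""
    flipped = dict(sigma)
    flipped[x] = -flipped[x]
    return flipped


def metropolis_rate(sigma, x, side, J, h, beta):
    """Spin-flip rate exp(-beta [H(sigma^x) - H(sigma)]_+) from (22.1.2)."""
    delta = hamiltonian(flip(sigma, x), side, J, h) - hamiltonian(sigma, side, J, h)
    return math.exp(-beta * max(delta, 0.0))
```

Starting from the all-minus configuration, flipping one spin breaks four bonds and gains the field term, so the energy cost is 4J − h; the reverse flip lowers the energy and therefore has rate 1.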
In Sect. 22.1.1 we state the main theorem for the crossover time, in Sect. 22.1.2
we explain the Wulff construction, while in Sect. 22.1.3 we give the heuristics be-
hind the theorem.

22.1.1 Metastable crossover time

Intuitively, we expect the system to quickly relax to a distribution that is close to

the minus-phase of the Ising model on the infinite lattice Z2 in zero magnetic field
h = 0. Indeed, since h is small the barrier for tunnelling to a distribution that is close
to the plus-phase is very high. We are interested in the crossover time. To that end,
let f : S → R be called local if it depends on finitely many spin variables only. Let
f − and f + be the average of f in the minus-phase, respectively, the plus-phase.

Theorem 22.1 (Mean crossover time from minus-phase to plus-phase) Fix β ∈
(βc, ∞). If f is a local function, then

\[
\lim_{h\downarrow 0} E_{\boxminus_h} f\bigl(\sigma_{\tau(h;\kappa)}\bigr) =
\begin{cases}
f^{-}, & \text{if } \kappa < \kappa_\beta,\\
f^{+}, & \text{if } \kappa > \kappa_\beta,
\end{cases}
\tag{22.1.4}
\]

where τ(h; κ) = exp[κ/h] and

\[
\kappa_\beta = \frac{\beta\, w^*(\beta)^2}{4\, m^*(\beta)},
\tag{22.1.5}
\]

with m∗(β) the spontaneous magnetisation of the plus-phase and w∗(β) the
integrated surface tension of the Wulff droplet of unit volume.

Theorem 22.1 says that the crossover from the minus-phase to the plus-phase
occurs at time exp[(κβ / h)[1 + o(1)]]. What is remarkable about (22.1.4) is that
it relates the crossover, which is a non-equilibrium quantity, to the spontaneous
magnetisation and the integrated surface tension, which are equilibrium quanti-
ties. A priori there is no reason why the critical droplet should have an equilibrium
shape (= Wulff shape). In fact, in Sect. 17.7, Items 5–8, we saw examples where
(sub)critical droplets do not take on an equilibrium shape.

22.1.2 Wulff construction

Here is a brief description of the construction of the Wulff droplet (see Fig. 22.2).
Let \(S^1 = \{x \in \mathbb{R}^2 : \|x\|_2 = 1\}\) be the surface of the Euclidean ball of radius 1. The
surface tension in the Ising model on Z² at h = 0 in the direction perpendicular to
n ∈ S¹ is defined as

\[
T_\beta(n) = -\lim_{\ell\to\infty} \frac{1}{2\beta\,\|y(\ell)\|_2} \ln \frac{Z_{\ell,\sigma(n)}}{Z_{\ell,+}}.
\tag{22.1.6}
\]

Here, y(ℓ) and −y(ℓ) are the points where the straight line {x ∈ R² : (x, n) = 0}
intersects the boundary of the box \(\Lambda_\ell = [-\ell,\ell]^2\), \(Z_{\ell,\sigma(n)}\) is the partition sum on
\(\Lambda_\ell \cap \mathbb{Z}^2\) with the boundary condition σ(n) given by

\[
\sigma(n)(x) =
\begin{cases}
+1 & \text{if } (x,n) \ge 0,\\
-1 & \text{if } (x,n) < 0,
\end{cases}
\qquad x \in \partial\Lambda_\ell,
\tag{22.1.7}
\]

and \(Z_{\ell,+}\) is the partition sum with the plus boundary condition.
Let D denote the set of closed self-avoiding rectifiable curves in R² that are the
boundary of a bounded region in R². For γ ∈ D, define the surface tension along γ
as (see Fig. 22.1)

\[
I_\beta(\gamma) = \int_\gamma T_\beta(n_s)\, dn_s,
\tag{22.1.8}
\]

where s parametrises γ according to the Euclidean length measure, and n_s is the
unit outward normal vector at the point s ∈ γ (which exists for almost every s ∈ γ).
For n ∈ S¹ and λ ∈ (0, ∞), define the region

\[
W_\beta^\lambda(n) = \bigl\{x \in \mathbb{R}^2 : (x,n) \le \lambda\, T_\beta(n)\bigr\}.
\tag{22.1.9}
\]

Fig. 22.1 The surface tension of a droplet equals the integral of the local surface tension over the
boundary of the droplet. The local surface tension depends on the direction perpendicular to the
boundary

Fig. 22.2 Wulff construction. Left: Polar plot of the function n → Tβ (n), with three outward
directions and three orthogonal tangent lines demarking three inward half-spaces (of which only
one has been shaded). Right: The intersection of all the half-spaces (= the inner envelope of the
tangent lines) gives rise to the Wulff shape. The Wulff droplet is the scaling of the Wulff shape that
has unit volume

For λ ∈ (0, ∞), define the intersection

\[
W_\beta^\lambda = \bigcap_{n\in S^1} W_\beta^\lambda(n).
\tag{22.1.10}
\]

The latter region satisfies the scaling relation \(W_\beta^\lambda = \lambda W_\beta^1\), i.e., its shape stays the
same as λ is varied. The Wulff droplet is defined as the region

\[
W_\beta = W_\beta^{\lambda(\beta)},
\tag{22.1.11}
\]

where λ(β) is chosen such that Wβ has volume 1 (see Fig. 22.2). Clearly, Wβ is
convex and hence ∂Wβ ∈ D. The integrated surface tension of the Wulff droplet, the
quantity that appears in (22.1.5), reads

\[
w^*(\beta) = I_\beta(\partial W_\beta).
\tag{22.1.12}
\]

It is known that the Wulff droplet satisfies the variational principle

\[
w^*(\beta) \le I_\beta(\gamma) \qquad \forall\, \gamma \in \mathcal{D} \text{ with interior volume } 1,
\tag{22.1.13}
\]

with equality if and only if γ is a translation of ∂Wβ .
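The half-space construction (22.1.9)–(22.1.10) is easy to try out numerically. The sketch below is only an illustration: it replaces the true Ising surface tension Tβ by the toy function |cos θ| + |sin θ| (an assumption made here so that the answer is known in closed form), and it approximates the intersection over all n ∈ S¹ by finitely many directions. For this toy choice the region in (22.1.10) is the square [−λ, λ]², so the unit-volume scaling is λ = 1/2.

```python
import math


def toy_surface_tension(theta):
    """Toy direction-dependent surface tension (an assumption, not the Ising T_beta)."""
    return abs(math.cos(theta)) + abs(math.sin(theta))


def in_wulff_region(x, lam=1.0, directions=3600, tol=1e-12):
    """Check (x, n) <= lam * T(n) for a finite grid of unit vectors n, cf. (22.1.9)-(22.1.10)."""
    for k in range(directions):
        theta = 2.0 * math.pi * k / directions
        n = (math.cos(theta), math.sin(theta))
        if x[0] * n[0] + x[1] * n[1] > lam * toy_surface_tension(theta) + tol:
            return False  # x violates the half-space constraint in direction n
    return True
```

Calling `in_wulff_region((0.49, 0.49), lam=0.5)` accepts a point just inside the unit-volume square, while `in_wulff_region((0.51, 0.0), lam=0.5)` rejects a point just outside; the scaling relation is visible in the fact that doubling λ doubles the accepted region.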

22.1.3 Heuristics

The heuristics behind Theorem 22.1 is as follows. Consider a droplet of the plus-
phase inside the minus-phase. Let S be the shape of this droplet and ℓ² its volume
(i.e., the number of sites inside). For large ℓ, the free energy of this droplet is roughly

\[
\Phi_S(\ell) = -m^*(\beta)\, h\, \ell^2 + w_S(\beta)\, \ell.
\tag{22.1.14}
\]

Here, −m∗(β)hℓ² is the change of the free energy due to the fact that inside the
droplet the minus-phase is replaced by the plus-phase, and w_S(β)ℓ is the change of
the free energy due to the surface tension along the border of the droplet. The two
terms are of the same order of magnitude when ℓ is of order 1/h. Therefore, putting
ℓ = b/h and Φ_S(ℓ) = φ_S(b)/h, we get

\[
\phi_S(b) = -m^*(\beta)\, b^2 + w_S(\beta)\, b.
\tag{22.1.15}
\]

This function takes its maximal value at b_c = w_S(β)/2m∗(β), reaching the value
φ_S(b_c) = w_S(β)²/4m∗(β). The height of this barrier is minimised by the Wulff
shape, i.e., for S with w_S(β) = w∗(β).
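The maximisation of (22.1.15) is a one-line calculus exercise, and a small numerical sketch makes the formulas for b_c and the barrier height easy to check. The numbers used in the usage note (a magnetisation of 0.9 and surface tensions 3.2 and 3.6) are purely hypothetical placeholders, not values for the Ising model.

```python
def phi(b, m_star, w_shape):
    """Rescaled droplet free energy phi_S(b) from (22.1.15)."""
    return -m_star * b ** 2 + w_shape * b


def critical_scale(m_star, w_shape):
    """Maximiser b_c = w_S / (2 m*)."""
    return w_shape / (2.0 * m_star)


def barrier(m_star, w_shape):
    """Barrier height phi_S(b_c) = w_S^2 / (4 m*)."""
    return w_shape ** 2 / (4.0 * m_star)
```

With m∗ = 0.9, a shape with w_S = 3.2 gives a lower barrier than a shape with w_S = 3.6, which illustrates why the shape with the smallest integrated surface tension at unit volume determines the crossover time.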

22.2 Crystallisation in small volumes at low temperatures

When we want to analyse the metastable behaviour of interacting particle systems

in the continuum rather than on a lattice, a number of technical difficulties arise that
need to be handled. In this section we describe a system of point particles living on a
finite torus in Rd , d ≥ 2, and interacting with each other through a pair potential that
has a hard-core repulsion, a finite range, and a unique attractive minimum. Particles
are randomly created and annihilated according to a Metropolis dynamics, with the
outside of the torus acting as an infinite particle reservoir with a fixed chemical
potential. Once on the torus, particles cannot move.
Our goal will be to derive the metastable behaviour of this system in a way that
is analogous to what was done in Chaps. 16–18. The chemical potential is chosen
such that the system is metastable: starting from the vacuum configuration where
the torus is empty, the system wants to nucleate (i.e., fill up the torus with particles
in close packing). However, in order to do so it has to create a critical droplet that is
large enough to trigger the nucleation. We are again interested in the nucleation time
and in the size and shape of the critical droplets in the limit as the temperature tends
to zero. We will show that this can be achieved subject to a number of hypotheses
on the energy landscape, which replace the hypotheses (H1) and (H2) in Chap. 16.
It will turn out that the prefactor depends in a delicate way on the temperature, the
chemical potential, and the shape of the pair potential near its minimum, which is
different from what we found in Chaps. 16–18 for lattice systems.
The problem with working in the continuum is that it is hard to control the en-
ergy landscape, especially in the vicinity of the set of critical droplets. We rely on
properties derived in the literature for minimal-energy configurations at fixed parti-
cle numbers. Our assumptions on the energy landscape are expected to be true for
a large class of pair potentials, but as yet can be proven only for a particular pair
potential in d = 2, called the soft disk potential.

22.2.1 Static model

Let

\[
\Omega_* = \bigl\{\omega \subset \mathbb{R}^d : \operatorname{card}(\omega) \in \mathbb{N}_0\bigr\}
\tag{22.2.1}
\]
with card(ω) the cardinality of ω. The set Ω∗ represents all the finite-particle config-
urations in Rd (particles have locations, do not overlap, and are indistinguishable),
and is endowed with the Hausdorff metric dH : Ω∗ × Ω∗ → [0, ∞).
Particles interact with each other through a pair potential v : [0, ∞) → R ∪ {∞}.
Let U : Ω∗ → R ∪ {∞} be the energy function defined by

\[
U(\omega) = U\bigl(\{x_1,\dots,x_N\}\bigr) = \sum_{1\le i<j\le N} v\bigl(|x_i - x_j|\bigr),
\tag{22.2.2}
\]

i.e., U (ω) is the energy of the configuration ω = {x1 , . . . , xN } ∈ Ω∗ with N ∈ N0

particles. Note that U (∅) = 0 and U ({x}) = 0, x ∈ Rd .
We will be interested in particle configurations restricted to a finite torus. To that
end, fix L ∈ (0, ∞) large, let Λ = [−L, L]d , give Λ periodic boundary conditions,
and put

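\[
\Omega = \Omega_\Lambda = \bigl\{\omega \subset \Lambda : \operatorname{card}(\omega) \in \mathbb{N}_0\bigr\} \subset \Omega_*.
\tag{22.2.3}
\]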
For A ⊂ Λ and ω ∈ Ω, let nA (ω) = card(ω ∩A) ∈ N0 denote the number of particles
in A. Let F be the smallest σ -algebra on Ω such that for every Borel measurable
A ⊂ Λ the map nA : Ω → N0 is measurable with respect to F . Write Q = QΛ for
the Poisson point process on Λ with intensity 1, which is a probability measure on
(Ω, F ). For later purpose, note the formula

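\[
\int_\Omega Q(d\omega) \int_\Lambda F(\omega, \omega\cup x)\, dx
= \int_\Omega Q(d\omega) \sum_{x\in\omega} F(\omega\setminus x, \omega)
\qquad \forall\, F \in C_b(\Omega),
\tag{22.2.4}
\]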
with the usual convention that the sum over ∅ is zero.

Let β ∈ (0, ∞) denote the inverse temperature and μ ∈ R the chemical potential.
The grand-canonical Hamiltonian H = Hμ,Λ on Λ with chemical potential μ ∈ R
is the function on Ω defined by

H (ω) = U (ω) − μ n(ω), (22.2.5)

where n(ω) = nΛ(ω). The grand-canonical Gibbs measure Pβ = Pβ,μ,Λ is the
probability measure on (Ω, F) defined by

\[
\frac{dP_\beta}{dQ}(\omega) = \frac{1}{\Xi_\beta}\, e^{-\beta H(\omega)},
\tag{22.2.6}
\]

where Ξβ = Ξβ,μ,Λ is the grand-canonical partition function

\[
\Xi_\beta = \int_\Omega e^{-\beta H(\omega)}\, Q(d\omega).
\tag{22.2.7}
\]

Throughout this section the labels μ, Λ will be suppressed from the notation.

22.2.2 Dynamic model

The particle configuration evolves in time according to a Markov process

\[
X = \bigl(X(t)\bigr)_{t\ge 0}
\tag{22.2.8}
\]

on Ω with càdlàg paths. Particles are randomly created and annihilated inside Λ
as if the outside of Λ were an infinite gas reservoir with chemical potential μ. The
dynamics is Metropolis with grand-canonical Hamiltonian H in (22.2.5). Once in-
side Λ, particles cannot move.
Our dynamics is the Markov process with generator Lβ given by

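\[
(L_\beta f)(\omega) = \int_\Lambda b_\beta(x,\omega)\,\bigl[f(\omega\cup x) - f(\omega)\bigr]\, dx
+ \sum_{x\in\omega} d_\beta(x,\omega)\,\bigl[f(\omega\setminus x) - f(\omega)\bigr],
\qquad f \in C_b(\Omega),\ \omega \in \Omega,
\tag{22.2.9}
\]

with Metropolis rates

\[
b_\beta(x,\omega) = e^{-\beta[H(\omega\cup x) - H(\omega)]_+},
\qquad
d_\beta(x,\omega) = e^{-\beta[H(\omega\setminus x) - H(\omega)]_+}.
\tag{22.2.10}
\]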

Under this dynamics, in configuration ω a particle is added to an infinitesimal
neighbourhood dx of a site x ∉ ω with rate bβ(x, ω)dx, while a particle is removed at
site x ∈ ω with rate dβ(x, ω). Apart from this creation (= birth) and annihilation
(= death), no particle motion takes place. We write Pω to denote the law of (Xt)t≥0
given X0 = ω.

The grand-canonical Gibbs measure in (22.2.6) is reversible w.r.t. this dynamics

because bβ , dβ satisfy the relation

\[
e^{-\beta H(\omega)}\, b_\beta(x,\omega) = e^{-\beta H(\omega\cup x)}\, d_\beta(x,\omega\cup x),
\qquad x \notin \omega \in \Omega.
\tag{22.2.11}
\]
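The reversibility relation (22.2.11) can be verified numerically for concrete configurations. The sketch below is an illustration under several assumptions: it uses the soft disk potential of (22.2.14), takes the death rate in the Metropolis form exp(−β[H(ω\x) − H(ω)]₊), ignores the periodic boundary (the configuration is assumed to sit well inside the torus), and treats distances within a small tolerance of 1 as touching to absorb floating-point error. All function names are hypothetical.

```python
import math


def soft_disk(r, tol=1e-9):
    """Soft disk potential (22.2.14); distances within tol of 1 count as touching."""
    if r < 1.0 - tol:
        return math.inf  # hard core
    if r < 25.0 / 24.0:
        return 24.0 * max(r, 1.0) - 25.0
    return 0.0


def energy(omega):
    """Pair energy U(omega) of (22.2.2) for a frozenset of points in the plane."""
    pts = list(omega)
    return sum(
        soft_disk(math.dist(p, q)) for i, p in enumerate(pts) for q in pts[i + 1:]
    )


def hamiltonian(omega, mu):
    """Grand-canonical Hamiltonian H = U - mu * n of (22.2.5)."""
    return energy(omega) - mu * len(omega)


def birth_rate(x, omega, beta, mu):
    """Metropolis rate for adding a particle at x to omega."""
    delta = hamiltonian(omega | {x}, mu) - hamiltonian(omega, mu)
    return math.exp(-beta * max(delta, 0.0))


def death_rate(x, omega, beta, mu):
    """Metropolis rate for removing the particle x from omega."""
    delta = hamiltonian(omega - {x}, mu) - hamiltonian(omega, mu)
    return math.exp(-beta * max(delta, 0.0))
```

For two touching particles and a third point completing an equilateral triangle, both sides of (22.2.11) agree up to rounding, and a point inside the hard core of an existing particle has birth rate 0.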

22.2.3 Metastability theorems for the soft disk potential

The set of ground states will be denoted by

= argmin H, (22.2.12)

the vacuum by ∅. In what follows we will be interested in the metastable crossover

from ∅ to N , with N representing a state where the system has sufficiently
nucleated. This set must satisfy maxω∈N H (ω) < H (∅) and hypothesis (H4) in
Sect. 22.2.4, but is otherwise general. An example is the set of all configurations
with a sufficiently large number of particles. The set of protocritical and critical
configurations for the transition from ∅ to N will be denoted by P and C .
Pick any β → ε(β) such that limβ→∞ ε(β) = 0 and limβ→∞ βε(β) = ∞, and
define

\[
\mathcal{C}(\beta) = \bigl\{\omega \in \Omega : d_H(\omega, \mathcal{C}) \le \varepsilon(\beta)\bigr\}.
\tag{22.2.13}
\]
This set of near-critical configurations turns out to play an important role in our
analysis of the nucleation time.
Theorems 22.4–22.5 below are valid for the special case where the pair potential
is the soft disk potential given by (see Fig. 22.3)
\[
v(r) =
\begin{cases}
\infty, & 0 \le r < 1,\\
24r - 25, & 1 \le r < \tfrac{25}{24},\\
0, & \tfrac{25}{24} \le r < \infty.
\end{cases}
\tag{22.2.14}
\]

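Put h = μ + 3, and assume that h ∈ (0, 1) with \(h^{-1} \notin \tfrac12\mathbb{N}\). Let

\[
\ell_c = \lfloor h^{-1} \rfloor,
\tag{22.2.15}
\]

note that \((\ell_c+1)^{-1} < h < \ell_c^{-1}\), and define

\[
k_c =
\begin{cases}
3\ell_c^2 + 4\ell_c + 2, & h \in \bigl((\ell_c+1)^{-1}, (\ell_c+\tfrac12)^{-1}\bigr),\\
3\ell_c^2 + 2\ell_c + 1, & h \in \bigl((\ell_c+\tfrac12)^{-1}, \ell_c^{-1}\bigr).
\end{cases}
\tag{22.2.16}
\]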

Theorem 22.2 (Geometry of critical droplets)

(i) If \(h \in ((\ell_c+1)^{-1}, (\ell_c+\tfrac12)^{-1})\), then C is the set of all configurations consisting
of a single hexagon of side length ℓc located anywhere in Λ, with a bar of ℓc
particles added on any side.
(ii) If \(h \in ((\ell_c+\tfrac12)^{-1}, \ell_c^{-1})\), then C is the set of all configurations consisting of
a single hexagon of side length ℓc located anywhere in Λ, with a bar of ℓc + 1
particles removed from any side.

Fig. 22.3 Soft disk potential

Theorem 22.3 (Critical gate) limβ→∞ P∅ (τC (β) < τN ) = 1.

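Theorem 22.4 (Mean crossover time)

(i) There exists a \(K_{\mathcal{N}}(\mu) \in (0,\infty)\) such that

\[
\lim_{\beta\to\infty} (24\beta)^{-(2k_c-3)}\, e^{-\beta\Gamma}\, E_\emptyset(\tau_{\mathcal{N}}) = \frac{K_{\mathcal{N}}(\mu)}{2\pi|\Lambda|},
\tag{22.2.17}
\]

where

\[
\Gamma = E_{k_c} - k_c\,\mu
\tag{22.2.18}
\]

with

\[
E_{k_c} = -3k_c + \bigl\lceil \sqrt{12k_c - 3}\, \bigr\rceil.
\tag{22.2.19}
\]

(ii) There exists a χ ∈ R such that

\[
\lim_{\mu\downarrow -3} (\mu+3)^2 \ln K_{\mathcal{N}}(\mu) = \chi.
\tag{22.2.20}
\]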

Theorem 22.5 (Exponential distribution of crossover time)

\[
\lim_{\beta\to\infty} P_\emptyset\bigl(\tau_{\mathcal{N}}/E_\emptyset(\tau_{\mathcal{N}}) > t\bigr) = e^{-t} \qquad \text{for all } t \ge 0.
\]
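The closed formula (22.2.16) for kc can be compared with a brute-force maximisation of k → Ek − kμ. The sketch below is an illustration under one explicit assumption: the expression in (22.2.19), stated in the text for k = kc, is used for every k ≥ 1 (with E0 = 0). Exact rational arithmetic avoids rounding issues at the interval endpoints; the function names and the cut-off `k_max` are hypothetical.

```python
from fractions import Fraction
from math import floor, isqrt


def ground_energy(k):
    """E_k = -3k + ceil(sqrt(12k - 3)) for k >= 1 (assumed for all k), and E_0 = 0."""
    if k == 0:
        return 0
    return -3 * k + isqrt(12 * k - 4) + 1  # isqrt(n - 1) + 1 equals ceil(sqrt(n))


def critical_size_formula(h):
    """k_c according to (22.2.15)-(22.2.16); h must be a Fraction in (0, 1)."""
    ell = floor(1 / h)
    if h < Fraction(2, 2 * ell + 1):  # h below (ell + 1/2)^(-1)
        return 3 * ell ** 2 + 4 * ell + 2
    return 3 * ell ** 2 + 2 * ell + 1


def critical_size_bruteforce(h, k_max=500):
    """Maximiser of k -> E_k - k*mu with mu = h - 3, searched over 0 <= k < k_max."""
    mu = h - 3
    return max(range(k_max), key=lambda k: ground_energy(k) - k * mu)
```

For h = 9/20 both functions return 17, and the corresponding value of Γ in (22.2.18) is E17 − 17μ = 147/20; for h = 7/20 both return 22.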

Theorem 22.2 identifies the critical droplets as quasi-hexagons with kc particles.

These take over the rôle of the quasi-squares for the lattice dynamics studied in
Chap. 17.
Theorem 22.3 shows that nucleation takes place through the set C (β) of near-
critical configurations, where the particles are located in regions with a linear size
of order 1/β around the set of locations corresponding to the critical droplets. The
volume of order β 2(kc −2) of these regions in (R2 )kc −2 is large enough so that the
dynamics is sufficiently likely to create particles inside (two particles drop out be-
cause of the arbitrary location and orientation of the critical droplet captured by the
factor 2π|Λ| in the right-hand side of (22.2.17)).
Theorem 22.4(i) identifies the average nucleation time. The factor (24β)2kc −3
in the left-hand side of (22.2.17) is proportional to the inverse of the volume just
described, with the factor 24 coming from the slope of the soft disk potential at its
minimum. The factor KN (μ) turns out to be a complicated object, since it incorpo-
rates all the possible shapes of the critical droplet, and depends in a delicate manner
on the fine details of the soft disk potential near its minimum.
Theorem 22.4(ii) shows that scaling of KN (μ) for μ ↓ e∞ , which corresponds
to the regime of weak metastability where the size of the critical droplet tends to
infinity, leads to a limit χ that has the interpretation of the free energy per site in a
triangular crystal of particles interacting through the soft disk potential.
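As a sanity check on the energies entering these theorems, one can place particles on a triangular lattice with spacing 1 and evaluate the soft disk energy directly. The sketch below is illustrative only: the hexagon is built as all lattice points within graph distance `side` of the origin (an assumed convention, giving 3·side² + 3·side + 1 particles), distances within a small tolerance of 1 are treated as touching, and the comparison formula is (22.2.19) applied at these particle numbers, which is an assumption of the sketch.

```python
import math
from itertools import combinations


def soft_disk(r, tol=1e-9):
    """Soft disk potential (22.2.14); distances within tol of 1 count as touching."""
    if r < 1.0 - tol:
        return math.inf
    if r < 25.0 / 24.0:
        return 24.0 * max(r, 1.0) - 25.0
    return 0.0


def energy(points):
    """Pair energy U of (22.2.2)."""
    return sum(soft_disk(math.dist(p, q)) for p, q in combinations(points, 2))


def hexagon(side):
    """Triangular-lattice points i*a1 + j*a2 with max(|i|, |j|, |i + j|) <= side."""
    pts = []
    for i in range(-side, side + 1):
        for j in range(-side, side + 1):
            if abs(i + j) <= side:
                pts.append((i + 0.5 * j, 0.5 * math.sqrt(3.0) * j))
    return pts


def formula_energy(k):
    """-3k + ceil(sqrt(12k - 3)), the expression in (22.2.19), for k >= 1."""
    return -3 * k + math.isqrt(12 * k - 4) + 1
```

The hexagons with 7 and 19 particles have energies −12 and −42, matching the formula, and the energy per particle of large configurations approaches −3, the value of e∞ quoted for the soft disk potential.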

22.2.4 Extension to other pair potentials

In this section we state four hypotheses on the energy landscape under which the re-
sults in Sect. 22.2.3 carry over to other pair potentials, modulo minor modifications.
The class of pair potentials we are interested in is the following.

Definition 22.6 (Class of pair potentials) The pair potential v : [0, ∞) → (−∞, ∞]
is assumed to be lower-semicontinuous, to be non-positive, continuous and Lipschitz
where finite, to have a hard-core repulsion, a finite range, and a unique strictly
negative minimum, i.e., there exist 0 < r1 ≤ r2 < r3 < ∞ such that (see Fig. 22.4)

\[
v(r)\
\begin{cases}
= \infty, & \text{for } 0 \le r < r_1,\\
\le 0, & \text{for } r_1 \le r < \infty,\\
> v(r_2), & \text{for } r \ne r_2,\\
= 0, & \text{for } r \ge r_3.
\end{cases}
\tag{22.2.21}
\]

The conditions in (22.2.21) imply that v is stable, i.e., there exists a C ∈ (0, ∞)
such that U(ω) ≥ −C card(ω) for all ω ∈ Ω∗ (see Ruelle [209, Sect. 3.2]).
For k ∈ N, define

\[
E_k = \inf_{\substack{\omega\in\Omega_* \\ \operatorname{card}(\omega)=k}} U(\omega).
\tag{22.2.22}
\]

Fig. 22.4 Shape of the pair potential r ↦ v(r). The soft disk potential has r1 = r2 = 1, r3 = 25/24
and v(r1) = v(r2) = −1 (see Fig. 22.3)

Any configuration achieving the infimum is called a k-particle ground state in R^d.
By subadditivity, we have the existence of

\[
e_\infty = \lim_{k\to\infty} E_k/k = \inf_{k\in\mathbb{N}} E_k/k.
\tag{22.2.23}
\]

The analogue of (22.2.22) on Λ reads

\[
E_{k,\Lambda} = \inf_{\substack{\omega\in\Omega \\ \operatorname{card}(\omega)=k}} U(\omega).
\tag{22.2.24}
\]

Our first three hypotheses read:

Assumption 22.7 (Ground state properties)

(H1) k → Ek − kμ has a unique maximizer kc on N0 that satisfies 2 ≤ kc < ∞.
(H2) limk→∞ (Ek − ke∞ ) = ∞.
(H3) There exists an ε > 0 such that Ek,Λ = Ek for all 0 ≤ k ≤ ε|Λ|.

Hypothesis (H1) precludes certain special values of the chemical potential, and
guarantees that our system is in the metastable regime: the threshold for nucleation
is neither one particle nor infinitely many particles (Λ needs to be chosen large
enough so that the kc -particle ground states fit inside). The restrictions on the pair
potential in Definition 22.6 imply that e∞ ∈ (−∞, 0). Since E0 = 0 and Ek > ke∞
for all k ∈ N, it follows that kc = ∞ for all μ < e∞ . Hypothesis (H2) implies that
kc = ∞ also for μ = e∞ . On the other hand, kc < ∞ for all μ > e∞ . In order to
have kc ≥ 2, as required in Hypothesis (H1), we need to constrain μ from above.
Under Hypothesis (H2) there exists an hc ∈ (0, ∞) (depending on the fine details of
the pair potential) such that

2 ≤ kc < ∞ if and only if μ ∈ (e∞ , e∞ + hc ). (22.2.25)

Hypothesis (H3), finally, says that the k-particle ground states in Λ coincide with
those in Rd when k is not too large. (Λ needs to be chosen large enough so that no
interaction around the torus occurs.)
Our fourth hypothesis is analogous to what we had in Lemma 16.9.

Assumption 22.8 (No-deep-well property)

(H4) There exists a V∗ < Γ such that
\(\max_{\omega\in\Omega\setminus\{\emptyset,\mathcal{N}\}:\ H(\omega)\le\Gamma}\,[\Phi(\omega,\{\emptyset,\mathcal{N}\}) - H(\omega)] \le V^*\).

Hypothesis (H4) says that the deepest wells in the energy landscape are those
containing ∅ and N , which makes (∅, N ) into a metastable pair in the sense of
Definition 8.2.
The following theorem states that the results in Sect. 22.2.3 indeed carry over.

Theorem 22.9 (Metastability for other pair potentials) Theorems 22.3–22.5 hold
under Hypotheses (H1)–(H4), with kc and Ekc as defined above, and with an appro-
priate modification of the scaling factors in (22.2.17) and (22.2.20).

Different pair potentials will have different scaling in β. E.g. when the pair
potential is twice differentiable near its minimum, the scaling factor in the left-hand side
of (22.2.17) becomes \([\sqrt{2\pi v''(1)\beta}\,]^{-(2k_c-3)}\) and the formula for \(K_{\mathcal{N}}\) in the
right-hand side of (22.2.17) involves the determinant of a certain (2kc − 3)-dimensional
quadratic form. In the near-critical configurations the regions where the particles are
located have a linear size of order \(1/\sqrt{\beta}\).
Hypotheses (H1)–(H4) are satisfied for the soft disk potential in d = 2, with
e∞ = −3 and hc = 1. We expect that they are in fact satisfied for a large class of
pair potentials. For instance, (H1)–(H3) should hold under the conditions on the pair
potential stated in Definition 22.6, with \(E_k - k e_\infty \asymp k^{(d-1)/d}\) as k → ∞. Moreover,
if the well of the pair potential is narrow and deep enough (see Fig. 22.4), then
Theorem 22.2 should carry over as well, with the spacing in the triangular lattice
equal to r2 and with Ek multiplied by −v(r2 ) > 0. Hypothesis (H4) is more delicate,
and will no doubt require stronger conditions on the pair potential, e.g. unimodality.
Settling (H1)–(H4) beyond the soft disk potential in d = 2 represents a hard open
problem in the analytic theory of crystallisation. In d = 3, for instance, it is believed
that the ground states consist of stacked layers of triangular lattices, but the relative
position of these layers is a matter of debate.

22.3 Bibliographical notes

1. Theorem 22.1 was proved by Shlosman and Schonmann [215]. The proof is long
and technical. To obtain control on the growing and shrinking of large droplets,
delicate coupling and coarse-graining techniques are needed in which the micro-
scopic regions where the system changes from the minus-phase to the plus-phase
are approximated on a mesoscopic scale by local pieces of a continuum interface,
and the cost of these pieces is related to the direction-dependent surface tension, as
explained in Sects. 22.1.2–22.1.3.

2. Vanheuverzwijn [231] proved the existence of metastable states for the Ising
model on Z2 . Numerical studies by Rikvold, Tomita, Miyashita and Sides [206]
confirm Theorem 22.1.

3. The extension of Theorem 22.1 to d ≥ 3 was obtained in Bodineau, Graham and

Wouts [29] (see Item 7 below). A weaker version of this extension was conjectured
in Aizenman and Lebowitz [1] and was proved in Schonmann [212].

4. The proof of Theorem 22.1 given in [215] in fact applies to spin-flip rates that
are more general than (22.1.1)–(22.1.2): translation invariant, finite range, attrac-
tive, monotone in the magnetic field, uniformly bounded away from zero and infin-
ity. Also the initial condition can be more general: any starting distribution that is
stochastically below the minus-phase.

5. It is shown in [215] that the metastable state, i.e., the state at time τ (h; κ) with
κ < κβ , is “infinitesimally larger” than the minus-phase. An asymptotic expansion in
powers of h is derived for the difference of a local average under the metastable state
and the minus-phase, which can be interpreted as describing the C ∞ -continuation in
h of the family of Gibbs distributions with negative h into the region of positive h.
This continuation is expected not to be analytic, a situation that should be typical
for metastable states. It is known that there is no analytic continuation of the minus-
phase across h = 0.

6. The extension of Theorem 22.1 to Kawasaki dynamics, where the limit of small
magnetic field h ↓ 0 is taken over by the limit of weak supersaturation Δ ↑ 2U , is
also still open. The main difficulty is that Kawasaki dynamics is conservative: the
growing of large droplets is hampered because the gas around the droplet gets de-
pleted. None of the techniques developed for Glauber dynamics seems easily trans-
portable.

7. Bodineau, Graham and Wouts [29] look at a diluted version of the model studied
in Sect. 22.1, where the pair interaction is switched off on a random set of sites with
density p ∈ (0, 1). The relaxation time is shown to be exp[κβ(p)/h^{d−1}], h ↓ 0, with
κβ (p) = w ∗ (p, β)2 /m∗ (p, β), where m∗ (p, β) and w ∗ (p, β) are the analogues of
m∗ (β) and w ∗ (β) in the non-diluted model, i.e., the spontaneous magnetisation of
the plus-phase and the integrated surface tension of the Wulff droplet of unit vol-
ume. Intuitively, dilution enhances relaxation because there is no surface tension in
diluted areas, and so it is expected that p → κβ (p) is non-increasing. However, no
proof is available, even though both p → m∗ (p, β) and p → w ∗ (p, β) are known
to be non-increasing. It is shown that limβ→∞ κβ (p)/κβ = 0 for all p ∈ (0, 1).

8. The results in Sect. 22.2 are taken from den Hollander and Jansen [83].

9. A substantial body of literature is concerned with the speed of convergence to

equilibrium in both Glauber dynamics and Kawasaki dynamics at finite temper-
ature, both for high temperature and for low temperature. See e.g. Cancrini and
Martinelli [47], Cancrini, Martinelli and Roberto [48, 49], Lubetzky and Sly [168],
Lubetzky, Martinelli, Sly and Toninelli [167]. The slow mixing at low temperatures
found in Cancrini, Cesi and Martinelli [46], Martinelli and Toninelli [178] is cer-
tainly a signature of metastable behaviour.

10. A major challenge consists in understanding metastable behaviour of stochastic
dynamics on large random graphs. The theory developed in Chap. 16 applies to
finite graphs, but assumption (H1) needs verification and may fail for certain real-
isations of the graph. (Assumption (H2) typically fails, which means we lose The-
orem 16.4(b).) The key quantities in Theorems 16.4–16.6, i.e., the critical set, the
communication height and the prefactor, all depend on the realisation of the graph.
Understanding the scaling of these quantities for large graphs is interesting. Dom-
mers [93] contains rough results for Glauber dynamics on random regular graphs.
Chapter 23
Challenges Beyond Metastability

Learning, undigested by thought, is labor lost;

thought, unassisted by learning, is perilous.
(Kong-tze, Lun Yu)

Parts VI and VII dealt with nucleation in small and large volumes at low tempera-
tures. In the former, where the volume was kept fixed, we were able to arrive at a full
description of metastability, with detailed computations for both Glauber dynamics
and Kawasaki dynamics. In the latter, where the volume grew exponentially with
the inverse temperature, we restricted ourselves to the computation of the time of
first appearance of a critical droplet somewhere in the large volume. We were un-
able to follow the subsequent growth of this droplet beyond twice its initial size. In
particular, it remained open how this droplet eventually invaded the large volume.
Especially for Kawasaki dynamics this is a formidable challenge, because a large
droplet tends to deplete the surrounding gas.
What happens in infinite volume? In that case a new mechanism of nucleation
becomes possible: the critical droplet is created somewhere far from the origin and
invades the neighbourhood of the origin by growing. The key question reads: Is this
mechanism more efficient than nucleation close to the origin? It turns out that the
answer is yes.
In this chapter we look at Glauber dynamics in infinite volume in two metastable
regimes. In Sect. 23.1 we consider the limit of low temperature at positive magnetic
field, while in Sect. 23.2 we turn to the limit of small magnetic field at positive tem-
perature. We restrict ourselves to presenting the main ideas only, omitting proofs.
For references we refer the reader to the bibliographical notes.
Post-nuclear growth is not part of metastability theory, the latter being concerned
with pre-nucleation and nucleation phenomena only. Key features, such as the re-
newal structure created by repeated unsuccessful trials to form a critical droplet, are
lost. In fact, so far potential theory has rather little to say about post-nuclear growth.
Consequently, sharp results are hard to get, and fully rely on ad hoc methods.

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9_23

23.1 Low temperatures

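Return to the setting of Sect. 17.1.1. Replace Λ by Z^d, d ≥ 2. The infinite-volume
Ising-spin Hamiltonian reads

\[
H(\sigma) = -\frac{J}{2} \sum_{\{x,y\} \in (\mathbb{Z}^d)^*} \sigma(x)\sigma(y)
            - \frac{h}{2} \sum_{x \in \mathbb{Z}^d} \sigma(x),
\qquad \sigma \in S = \{-1,+1\}^{\mathbb{Z}^d},
\tag{23.1.1}
\]
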
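with J, h > 0. The system follows a Metropolis dynamics (σ_t)_{t≥0} with spin-flip
rates given by

\[
c(\sigma,\sigma') =
\begin{cases}
e^{-\beta[\Delta_x H(\sigma)]_+}, & \sigma \in S,\ \sigma' = \sigma^x,\ x \in \mathbb{Z}^d, \\
0, & \text{otherwise},
\end{cases}
\tag{23.1.2}
\]

where

\[
\Delta_x H(\sigma) = \sigma(x) \Bigl[ J \sum_{y \in \mathbb{Z}^d:\ (x,y) \in (\mathbb{Z}^d)^*} \sigma(y) + h \Bigr].
\tag{23.1.3}
\]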

The reason for writing Δ_x H(σ) instead of H(σ^x) − H(σ) is that H is infinite (see
Liggett [166], Chapter III, for a precise definition). We again write Eσ to denote
expectation w.r.t. the law of this dynamics starting from σ0 = σ .
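
The local energy cost (23.1.3) and the Metropolis rate (23.1.2) only involve the spin at x and its nearest neighbours, so they can be evaluated on any finite window. The following is a minimal sketch, assuming a finite torus (Z/LZ)^d in place of Z^d; the function names, the torus and the parameter values are illustrative choices, not part of the text.

```python
import itertools
import math

def neighbours(x, L):
    """Nearest neighbours of site x on the torus (Z/LZ)^d (assumes L >= 3)."""
    result = []
    for i in range(len(x)):
        for step in (-1, 1):
            y = list(x)
            y[i] = (y[i] + step) % L
            result.append(tuple(y))
    return result

def delta_H(sigma, x, L, J, h):
    """Energy cost of flipping the spin at x: sigma(x) * (J * sum_y sigma(y) + h)."""
    return sigma[x] * (J * sum(sigma[y] for y in neighbours(x, L)) + h)

def flip_rate(sigma, x, L, J, h, beta):
    """Metropolis rate exp(-beta * [Delta_x H(sigma)]_+) for flipping the spin at x."""
    return math.exp(-beta * max(delta_H(sigma, x, L, J, h), 0.0))

# Illustrative parameters (assumptions): 5 x 5 torus, J = 1, h = 0.5, beta = 2.
L, d = 5, 2
J, h, beta = 1.0, 0.5, 2.0
sigma = {x: -1 for x in itertools.product(range(L), repeat=d)}  # all spins down
origin = (0,) * d

# Flipping one spin up in the all-minus configuration costs 2dJ - h.
cost_up = delta_H(sigma, origin, L, J, h)
rate_up = flip_rate(sigma, origin, L, J, h, beta)

# Flipping it back down lowers the energy, so the rate equals 1.
sigma[origin] = +1
cost_down = delta_H(sigma, origin, L, J, h)
rate_down = flip_rate(sigma, origin, L, J, h, beta)
```

In the all-minus configuration the cost of a single up-flip is 2dJ − h, which is positive under the assumption h ∈ (0, dJ) made below, so isolated plus-spins appear at an exponentially small rate and disappear at rate 1.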
As in Sects. 17.1.2 and 17.6, we assume that h ∈ (0, dJ ) with dJ / h non-integer.
As in Sect. 17.1.3, we let ⊟ and ⊞ denote the configurations where all the spins are
down, respectively, up. We say that a function f : S → R is local if it depends on
finitely many spin variables only. The main result of this section is the following.

Theorem 23.1 (Mean crossover time from minus to plus) If f is a local function,
then

\[
\lim_{\beta\to\infty} E_{\boxminus}\bigl[ f(\sigma_{\tau(\beta;\kappa)}) \bigr] =
\begin{cases}
f(\boxminus), & \text{if } \kappa < \kappa_d, \\
f(\boxplus), & \text{if } \kappa > \kappa_d,
\end{cases}
\tag{23.1.4}
\]

where τ(β; κ) = exp(βκ) and

\[
\kappa_d = \frac{1}{d+1} \sum_{k=1}^{d} \Gamma_k,
\tag{23.1.5}
\]

with Γ_k the energy of the critical droplet in k dimensions.

Recall from Sect. 17.7, Item 3, that an explicit formula is available for Γk .
The heuristics behind Theorem 23.1 is as follows. The most efficient mechanism
for relaxation from minus to plus near the origin is for the system to create a
critical droplet of plus-spins somewhere far away from the origin and let this droplet
grow and invade the origin. Suppose that nucleation in a finite box occurs at rate
exp(−βΓ_d) (which we know from Chap. 19 is true if we ignore terms of order 1).
Suppose further that the speed of growth of a large supercritical droplet is v_d (i.e.,
the speed at which the faces move outwards). Then, to invade the origin at time t,
the droplet must be born inside the space-time cone whose basis is a d-dimensional
hypercube with side length v_d t and whose height is t. The critical space-time cone
is such that the nucleation rate is of order 1. Therefore, writing τ_d for the time when
the origin is invaded, we have

\[
\tau_d \, (v_d \tau_d)^d \exp(-\beta\Gamma_d) = 1,
\tag{23.1.6}
\]

where we ignore terms of order exp(o(β)). Since large droplets are approximately
parallelepipeds, the dynamics on a face behaves like a (d − 1)-dimensional Glauber
dynamics, and so the time needed to fill a face is τ_{d−1}. Hence

\[
v_d = 1/\tau_{d-1}.
\tag{23.1.7}
\]

Combining (23.1.6)–(23.1.7), and putting τ_d = exp(βκ_d), we obtain the recursion
relation

\[
(d+1)\,\kappa_d = \Gamma_d + d\,\kappa_{d-1}.
\tag{23.1.8}
\]

Since κ_0 = 0, this yields the formula for κ_d in (23.1.5). Figure 23.1 (which is the
same as Fig. 17.6 in Chap. 17) illustrates (23.1.8) for d = 3.

Fig. 23.1 Recursion in the dimension: A three-dimensional critical droplet, consisting of a
quasi-cube with a two-dimensional critical droplet attached to one of its faces
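
The step from the recursion (23.1.8) to the closed formula (23.1.5) can be checked mechanically. The sketch below iterates the recursion from κ_0 = 0 in exact rational arithmetic and compares the result with the closed form; the function names and the droplet energies Γ_1, Γ_2, Γ_3 used as input are hypothetical placeholders chosen for illustration, not values from the text.

```python
from fractions import Fraction

def kappa_by_recursion(gammas):
    """Iterate (d+1) * kappa_d = Gamma_d + d * kappa_{d-1}, starting from kappa_0 = 0.

    gammas[k-1] plays the role of Gamma_k; returns [kappa_1, ..., kappa_d].
    """
    kappas = []
    previous = Fraction(0)  # kappa_0
    for d, gamma in enumerate(gammas, start=1):
        previous = (Fraction(gamma) + d * previous) / (d + 1)
        kappas.append(previous)
    return kappas

def kappa_closed_form(gammas):
    """Closed form kappa_d = (Gamma_1 + ... + Gamma_d) / (d + 1)."""
    d = len(gammas)
    return sum(Fraction(g) for g in gammas) / (d + 1)

# Hypothetical droplet energies, for illustration only.
gammas = [3, 10, 40]
kappas = kappa_by_recursion(gammas)

# The recursion and the closed form agree in every dimension up to d = 3.
agree = all(
    kappas[d - 1] == kappa_closed_form(gammas[:d])
    for d in range(1, len(gammas) + 1)
)
```

Multiplying the recursion by 1 shows why this works: (d + 1)κ_d − dκ_{d−1} = Γ_d telescopes, so (d + 1)κ_d is the sum of Γ_1, ..., Γ_d.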
In order to turn the above heuristics into the statement in (23.1.4), three major
obstacles have to be overcome:
(1) Speed of growth: To control the speed of growth v_d, a comparison is made
with a simpler nucleation-and-growth model in the spirit of bootstrap percolation,
namely, all sites are initially unoccupied, a site becomes occupied at rate
e^{−βΓ_d} if it has no occupied neighbours, at rate ε = e^{−βγ} if it has one occupied
neighbour, and at rate 1 if it has two or more occupied neighbours, and occupied
sites stay occupied forever. For this model it is shown that the speed is
ε^{1/d} and the nucleation time is exp[βκ_c] with κ_c = max{γ, (Γ_d + γ)/(d + 1)},
provided Γ_d ≥ γ. By making the appropriate choice for γ as a function of J, h
(e.g. γ = 2J − h in d = 2), it is shown that the nucleation time in the original
model is close to that of the nucleation-and-growth model. The proof requires
delicate coupling techniques.
(2) Energy landscape: A detailed study of the energy landscape is necessary in
order to show that the dynamics does not get caught in a deep well. For d =
2, 3 this can be done with the help of the combinatorial techniques mentioned
and used in Chap. 17, but for higher dimensions no analogous techniques are
available. The necessary estimates are obtained via rougher arguments.
(3) Space-time clusters: Some control on the size of space-time clusters is needed,
e.g. to show that it is very unlikely for large space-time clusters to be formed
prior to nucleation or for subcritical clusters to move over long distances. This
requires estimates on recurrence times as well as an analysis of “cycle com-
pounds”.
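
The exponent κ_c in obstacle (1) follows from the same space-time balance as (23.1.6): with speed ε^{1/d} = exp(−βγ/d) and nucleation rate exp(−βΓ_d), the condition τ (vτ)^d exp(−βΓ_d) = 1 with τ = exp(βκ) gives (d + 1)κ = Γ_d + γ. A small sketch of this arithmetic, with hypothetical function names and input values:

```python
from fractions import Fraction

def kappa_c(gamma_d, gamma, d):
    """Exponent max{gamma, (Gamma_d + gamma)/(d + 1)} of the nucleation time.

    gamma_d is the nucleation cost Gamma_d, gamma the cost of attaching a site
    next to one occupied neighbour; the formula is stated for Gamma_d >= gamma.
    """
    gamma_d, gamma = Fraction(gamma_d), Fraction(gamma)
    if gamma_d < gamma:
        raise ValueError("the formula is stated only for Gamma_d >= gamma")
    return max(gamma, (gamma_d + gamma) / (d + 1))

def cone_balance(kappa, gamma_d, gamma, d):
    """Exponent of tau * (v * tau)^d * exp(-beta * Gamma_d), in units of beta,
    for tau = exp(beta * kappa) and v = exp(-beta * gamma / d)."""
    return (d + 1) * Fraction(kappa) - Fraction(gamma) - Fraction(gamma_d)

# Hypothetical values, for illustration only.
k_growth = kappa_c(10, 3, 2)   # (10 + 3)/3: the space-time cone balance decides
k_slow = kappa_c(4, 3, 2)      # gamma = 3 exceeds (4 + 3)/3: gamma decides
balance = cone_balance(k_growth, 10, 3, 2)
```

In d = 2 the value (Γ_2 + γ)/3 coincides with κ_2 = (Γ_1 + Γ_2)/3 from (23.1.5) exactly when γ plays the role of Γ_1.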
The factor 1/(d + 1) in (23.1.5) shows that the mechanism of far-away nucleation
followed by invasion is faster than the mechanism of close-by nucleation alone.
Thus, space-time entropy plays a crucial role in infinite volume.

23.2 Small magnetic fields

Return to the model in Sect. 22.1 with Λ replaced by Z^2 and again consider the
metastable regime β ∈ (β_c, ∞) and h ↓ 0.

Theorem 23.2 (Mean crossover time from minus to plus) The same result as in
Theorem 22.1 holds with κ_β replaced by (1/3)κ_β.

The heuristics behind this theorem is the same as for the model in Sect. 23.1. The
extra factor 1/(d + 1) = 1/3 in d = 2 again comes from space-time entropy: invasion of a
growing droplet that is created somewhere in a space-time cone of the appropriate
size. The same three obstacles have to be overcome to build a proof.

23.3 Bibliographical notes

1. Aizenman and Lebowitz [1] argued that a nucleation center in bootstrap perco-
lation plays a similar rôle as a growing Wulff droplet in the Ising model subject to
Glauber dynamics, and that therefore bootstrap percolation can serve as a paradigm
for the description of metastable behaviour. They sketched a program for Glauber
dynamics, which was subsequently carried out in Schonmann and Shlosman [215].

2. Theorem 23.1 was proved for d = 2 in Dehghanpour and Schonmann [77, 78],
and for d ≥ 3 in Cerf and Manzo [54]. Theorem 23.2 was proved for d = 2 in Schon-
mann and Shlosman [215]. The extension to d ≥ 3 is still open. It is conjectured in
Schonmann [213] that v2 ∼ Ch, h ↓ 0, which is much stronger than the scaling
property v2 = exp[o(1/ h)], h ↓ 0, that is needed for the proof of Theorem 23.2.

3. It remains a challenge to obtain a sharper estimate of the crossover time, i.e., to
find the function β ↦ T_d(β) such that f(σ_t) ≈ f(⊟) for t ≪ T_d(β) exp(βκ_d) and
f(σ_t) ≈ f(⊞) for t ≫ T_d(β) exp(βκ_d) as β → ∞. This function takes over the role
of the prefactor K in Chaps. 16–18.

4. No analogues of Theorems 23.1–23.2 for Kawasaki dynamics have been proved
yet. This is a formidable challenge because Kawasaki dynamics is conservative. In
the post-nuclear phase there are droplets of varying sizes. Small droplets tend to
shrink and be absorbed by large droplets, a phenomenon referred to as Ostwald
ripening. Becker-Döring theory [15] is a phenomenological theory that tries to cap-
ture the size distribution of the droplets as a function of time. It turns out that, at
low densities, to a good approximation the average radius of droplets grows like a
fractional power of time (exponent 1/3 in d = 3), which is supported by simulations.
For mathematical background, see Ball, Carr and Penrose [10].
Glossary

(A → B)_opt optimal path between sets A and B
β inverse temperature
(Bt )t∈R+ Brownian motion
B Borel σ -algebra
Bρ (x) ball of radius ρ centred at site x
cap(A, B) capacity between sets A and B
C ∗ critical set
e(x, t) heat kernel at site x at time t
E expectation with respect to a probability measure P
Eμ expectation with respect to a Markov process with initial law μ
Ex expectation with respect to a Markov process starting at site x
E (f, g) Dirichlet form for functions f, g
eA,B equilibrium measure on A for a pair of sets A, B
F potential function
F, G σ-algebras
(Fn )n∈N , (Ft )t∈R+ filtration in discrete, respectively, continuous time
Φ(A, B) communication height between sets A and B
G (A, B) essential gate between sets A and B
G_{D^c}(x, y) Green function at x, y ∈ D with Dirichlet boundary conditions
H(ξ) Hamiltonian at configuration ξ
H_{D^c}(x, y) Poisson kernel at x, y ∈ D with Dirichlet boundary conditions
hA,B equilibrium potential between sets A and B
I (x), I (γ ) large deviation rate functions at site x, respectively, path γ
L Markov generator, discrete time
L Markov generator, continuous time
ℒ^p(μ) space of functions whose p-th power is integrable w.r.t. μ
L^p(μ) space of equivalence classes of functions whose p-th power is integrable
w.r.t. μ
L (γ , γ̇ , s) Lagrangian in large deviation theory at time s for path γ
λx , λk eigenvalue of generator associated with x ∈ M , respectively, xk ∈ M
λN0 principal eigenvalue of generator with Dirichlet boundary conditions in N
(Mn )n∈N , (Mt )t∈R+ martingales in discrete, respectively, continuous time
M set of metastable points
Mx set of metastable points that are more stable than metastable point x
μ measure, often invariant measure or Gibbs measure
νA,B last-exit biased distribution on A for Markov process killed in B
(Ω, F, P) probability space, consisting of a set, a sigma-algebra and a probability
measure
(Ω, F, P, (F_t)_{t∈I}) filtered probability space, consisting of a set, a sigma-algebra,
a probability measure and a filtration
P ∗ protocritical set
P (x, A) one-step transition kernel from site x to set A
Pt (x, A) Markov semigroup at time t from site x to set A
Ph , Eh Doob transformed law and expectation with harmonic function h
Rλ resolvent of a semigroup
S often state space of a stochastic process
S (A, B) communication level set between sets A and B
σ, η configurations (spin variables, occupation variables, etc.)
τ stopping time
W (x, t) space-time white noise
(Xt )t∈R+ , (Yt )t∈R+ stochastic processes
ZN partition function for system size N
References

1. Aizenman, M., Lebowitz, J.: Metastability effects in bootstrap percolation. J. Phys. A 21,
3801–3813 (1988)
2. Aldous, D., Fill, J.A.: Reversible Markov Chains and Random Walks on Graphs.
https://www.stat.berkeley.edu/~aldous/RWG/book.pdf (2002/2014)
3. Allen, S., Cahn, J.: Ground state structures in ordered binary alloys with second neighbor
interactions. Acta Metall. 20, 423–433 (1972)
4. Alonso, L., Cerf, R.: The three-dimensional polyominoes of minimal area. Electron. J. Comb.
3, 1–39 (1996)
5. Amaro de Matos, J.M.G., Baêta Segundo, J.A., Perez, J.F.: Fluctuations in dilute antiferro-
magnets: Curie-Weiss models. J. Phys. A 25, 2819–2830 (1992)
6. an der Heiden, M.: Metastability of Markov chains and in the Hopfield model. Ph.D. thesis,
Technische Universität Berlin (2007)
7. Andjel, E.D.: Invariant measures for the zero range processes. Ann. Probab. 10, 525–547
(1982)
8. Arrhenius, S.: On the reaction rate of the inversion of non-refined sugar upon souring. Z.
Phys. Chem. 4, 226–248 (1889)
9. Baake, E., Baake, M., Bovier, A., Klein, M.: An asymptotic maximum principle for essen-
tially linear evolution models. J. Math. Biol. 50, 83–114 (2005)
10. Ball, J.M., Carr, J., Penrose, O.: The Becker-Döring cluster equations: basic properties and
asymptotic behaviour of solutions. Commun. Math. Phys. 104, 657–692 (1986)
11. Barret, F.: Sharp asymptotics of metastable transition times for one-dimensional SPDEs.
Ann. Inst. Henri Poincaré Probab. Stat. 51, 129–166 (2015)
12. Barret, F., Bovier, A., Méléard, S.: Uniform estimates for metastable transition times in a
coupled bistable system. Electron. J. Probab. 15, 323–345 (2010)
13. Bauer, H.: Probability Theory and Elements of Measure Theory. Academic Press Inc. [Har-
court Brace Jovanovich Publishers], London (1981)
14. Baur, E.: Metastabilität von reversiblen Diffusionsprozessen. Diploma thesis, Bonn Univer-
sity (2011)
15. Becker, R., Döring, W.: Kinetische Behandlung der Keimbildung in übersättigten Dämpfen.
Ann. Phys. (Leipz.) 24, 719–752 (1935)
16. Beltrán, J., Landim, C.: Tunneling and metastability of continuous time Markov chains II,
the nonreversible case. J. Stat. Phys. 149, 598–618 (2012)
17. Beltrán, J., Landim, C.: A martingale approach to metastability. Probab. Theory Relat. Fields
161, 267–307 (2015)
18. Ben Arous, G., Bovier, A., Gayrard, V.: Glauber dynamics of the random energy model. I.
Metastable motion on the extreme states. Commun. Math. Phys. 235, 379–425 (2003)
19. Ben Arous, G., Cerf, R.: Metastability of the three-dimensional Ising model on a torus at
very low temperatures. Electron. J. Probab. 1, 1–55 (1996)
20. Berglund, N., Dutercq, S.: The Eyring–Kramers law for Markovian jump processes with
symmetries. J. Theor. Probab. (2015). doi:10.1007/s10959-015-0617-9
21. Berglund, N., Gentz, B.: The Eyring-Kramers law for potentials with nonquadratic saddles.
Markov Process. Relat. Fields 16, 549–598 (2010)
22. Berglund, N., Gentz, B.: Sharp estimates for metastable lifetimes in parabolic SPDEs:
Kramers’ law and beyond. Electron. J. Probab. 18, 1–58 (2013)
23. Berman, K.A., Konsowa, M.H.: Random paths and cuts, electrical networks, and reversible
Markov chains. SIAM J. Discrete Math. 3, 311–319 (1990)
24. Bianchi, A., Bovier, A., Ioffe, D.: Sharp asymptotics for metastability in the random field
Curie-Weiss model. Electron. J. Probab. 14, 1541–1603 (2009)
25. Bianchi, A., Bovier, A., Ioffe, D.: Pointwise estimates and exponential laws in metastable
systems via coupling methods. Ann. Probab. 40, 339–379 (2012)
26. Bianchi, A., Gaudillière, A.: Metastable states, quasi-stationary and soft measures, mixing
time asymptotics via variational principles. arXiv:1103.1143v1, to appear in Stoch. Proc.
Appl. (2011)
27. Bigelis, S., Cirillo, E., Lebowitz, J., Speer, E.: Critical droplets in metastable probabilistic
cellular automata. Phys. Rev. E 59, 3935–3941 (1999)
28. Billingsley, P.: Probability and Measure. Wiley Series in Probability and Mathematical Statis-
tics. Wiley, New York (1995)
29. Bodineau, T., Graham, B., Wouts, M.: Metastability in the dilute Ising model. Probab. Theory
Relat. Fields 157, 955–1009 (2013)
30. Bouchaud, J.-P., Cugliandolo, L., Kurchan, J., Mézard, M.: Out of equilibrium dynamics in
spin-glasses and other glassy systems. In: Young, A.P. (ed.) Spin Glasses and Random Fields.
World Scientific, Singapore (1998)
31. Bovier, A., den Hollander, F., Nardi, F.R.: Sharp asymptotics for Kawasaki dynamics on a
finite box with open boundary. Probab. Theory Relat. Fields 135, 265–310 (2006)
32. Bovier, A., den Hollander, F., Spitoni, C.: Homogeneous nucleation for Glauber and
Kawasaki dynamics in large volumes and low temperature. Ann. Probab. 38, 661–713 (2010)
33. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability in stochastic dynamics of
disordered mean-field models. Probab. Theory Relat. Fields 119, 99–161 (2001)
34. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability and low lying spectra in re-
versible Markov chains. Commun. Math. Phys. 228, 219–255 (2002)
35. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability in reversible diffusion pro-
cesses. I. Sharp asymptotics for capacities and exit times. J. Eur. Math. Soc. (JEMS) 6, 399–
424 (2004)
36. Bovier, A., Gayrard, V.: Hopfield models as generalized random mean field models. In: Math-
ematical Aspects of Spin Glasses and Neural Networks. Progr. Probab., vol. 41, pp. 3–89.
Birkhäuser, Boston (1998)
37. Bovier, A., Gayrard, V.: Sample path large deviations for a class of Markov chains related to
disordered mean field models. Preprint available at arXiv:math/9905022, 487, WIAS Berlin
(1999)
38. Bovier, A., Gayrard, V., Klein, M.: Metastability in reversible diffusion processes. II. Precise
asymptotics for small eigenvalues. J. Eur. Math. Soc. (JEMS) 7, 69–99 (2005)
39. Bovier, A., Manzo, F.: Metastability in Glauber dynamics in the low-temperature limit: be-
yond exponential asymptotics. J. Stat. Phys. 107, 757–779 (2002)
40. Bovier, A., Neukirch, R.: A note on metastable behaviour in the zero-range process. In:
Griebel, M. (ed.) Singular Phenomena and Scaling in Mathematical Models, pp. 365–376.
Springer, Berlin (2013)
41. Brassesco, S.: Some results on small random perturbations of an infinite-dimensional dy-
namical system. Stoch. Process. Appl. 38, 33–53 (1991)
42. Brassesco, S., Buttà, P.: Interface fluctuations for the D = 1 stochastic Ginzburg-Landau
equation with nonsymmetric reaction term. J. Stat. Phys. 93, 1111–1142 (1998)
43. Burke, C.J., Rosenblatt, M.: A Markovian function of a Markov chain. Ann. Math. Stat. 29,
1112–1122 (1958)
44. Buslov, V.A., Makarov, K.A.: A time-scale hierarchy with small diffusion. Teor. Mat. Fiz.
76, 219–230 (1988)
45. Buslov, V.A., Makarov, K.A.: Life spans and least eigenvalues of an operator of small diffu-
sion. Mat. Zametki 51, 160 (1992)
46. Cancrini, N., Cesi, F., Martinelli, F.: The spectral gap for the Kawasaki dynamics at low
temperature. J. Stat. Phys. 95, 215–271 (1999)
47. Cancrini, N., Martinelli, F.: On the spectral gap of Kawasaki dynamics under a mixing con-
dition revisited. J. Math. Phys. 41, 1391–1423 (2000)
48. Cancrini, N., Martinelli, F., Roberto, C.: The logarithmic Sobolev constant of Kawasaki dy-
namics under a mixing condition revisited. Ann. Inst. Henri Poincaré Probab. Stat. 38, 385–
436 (2002)
49. Cancrini, N., Martinelli, F., Roberto, C.: Spectral gap and logarithmic Sobolev constant of
Kawasaki dynamics under a mixing condition revisited. In: In and out of Equilibrium (Mam-
bucaba, 2000). Progr. Probab., vol. 51, pp. 259–271. Birkhäuser, Boston (2002)
50. Caputo, P., Lacoin, H., Martinelli, F., Simenhaus, F., Toninelli, F.L.: Polymer dynamics in
the depinned phase: metastability with logarithmic barriers. Probab. Theory Relat. Fields
153, 587–641 (2012)
51. Cassandro, M., Galves, A., Olivieri, E., Vares, M.E.: Metastable behavior of stochastic dy-
namics: a pathwise approach. J. Stat. Phys. 35, 603–634 (1984)
52. Cassandro, M., Olivieri, E., Picco, P.: Small random perturbations of infinite-dimensional
dynamical systems and nucleation theory. Ann. Inst. Henri Poincaré, Phys. Théor. 44, 343–
396 (1986)
53. Catoni, O., Cerf, R.: The exit path of a Markov chain with rare transitions. ESAIM Probab.
Stat. 1, 95–144 (1995/97)
54. Cerf, R., Manzo, F.: Nucleation and growth for the Ising model in d dimensions at very low
temperatures. Ann. Probab. 41, 3697–3785 (2013)
55. Chaganty, N.R., Sethuraman, J.: Strong large deviation and local limit theorems. Ann.
Probab. 21, 1671–1690 (1993)
56. Chatterjee, S., Durrett, R.: Contact processes on random graphs with power law degree dis-
tributions have critical value 0. Ann. Probab. 37, 2332–2356 (2009)
57. Chenal, F., Millet, A.: Uniform large deviations for parabolic SPDEs and applications. Stoch.
Process. Appl. 72, 161–186 (1997)
58. Chow, Y.S., Teicher, H.: Probability Theory, 3rd edn. Springer Texts in Statistics. Springer,
New York (1997)
59. Cirillo, E.: A note on the metastability of the Ising model: the alternate updating case. J. Stat.
Phys. 106, 335–390 (2002)
60. Cirillo, E., Nardi, F.: Metastability for the Ising model with a parallel dynamics. J. Stat. Phys.
110, 183–217 (2003)
61. Cirillo, E., Nardi, F., Polosa, A.: Magnetic order in the Ising model with parallel dynamics.
Phys. Rev. E 64, 57103 (2001)
62. Cirillo, E., Nardi, F., Spitoni, C.: Competitive nucleation in reversible probabilistic cellular
automata. Phys. Rev. E 78, 040601 (2008)
63. Cirillo, E., Nardi, F., Spitoni, C.: Metastability for reversible probabilistic cellular automata
with self-interaction. J. Stat. Phys. 132, 431–471 (2008)
64. Cirillo, E., Nardi, F., Spitoni, C.: Competitive nucleation in metastable systems. Commun.
SIMAI Congr. 3, 040601(R) (2009)
65. Cirillo, E.N.M., Nardi, F.R.: Relaxation height in energy landscapes: an application to mul-
tiple metastable states. J. Stat. Phys. 150, 1080–1114 (2013)
66. Coddington, E., Levinson, N.: Theory of Ordinary Differential Equations. McGraw-Hill,
New York-Toronto-London (1955)
67. Da Prato, G., Debussche, A.: Strong solutions to the stochastic quantization equations. Ann.
Probab. 31, 1900–1916 (2003)
68. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Encyclopedia of
Mathematics and Its Applications, vol. 44. Cambridge University Press, Cambridge (1992)
69. Dai Pra, P., den Hollander, F.: McKean-Vlasov limit for interacting random processes in
random media. J. Stat. Phys. 84, 735–772 (1996)
70. Davies, E.: Dynamical stability of metastable states. J. Funct. Anal. 46, 373–386 (1982)
71. Davies, E.: Spectral properties of metastable Markov semigroups. J. Funct. Anal. 52, 315–
329 (1983)
72. Davies, E.B.: Metastability and the Ising model. J. Stat. Phys. 27, 657–675 (1982)
73. Davies, E.B.: Metastable states of symmetric Markov semigroups. I. Proc. Lond. Math. Soc.
45, 133–150 (1982)
74. Davies, E.B.: Metastable states of symmetric Markov semigroups. II. J. Lond. Math. Soc. 26,
541–556 (1982)
75. Dawson, D., Greven, A.: Spatial Fleming-Viot Models with Selection and Mutation. Lecture
Notes in Mathematics, vol. 2092. Springer, Berlin (2014)
76. de Hoog, F.R., Anderssen, R.S.: Asymptotic formulas for discrete eigenvalue problems in
Liouville normal form. Math. Models Methods Appl. Sci. 11, 43–56 (2001)
77. Dehghanpour, P., Schonmann, R.H.: Metropolis dynamics relaxation via nucleation and
growth. Commun. Math. Phys. 188, 89–119 (1997)
78. Dehghanpour, P., Schonmann, R.H.: A nucleation-and-growth model. Probab. Theory Relat.
Fields 107, 123–135 (1997)
79. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Applica-
tions of Mathematics (New York), vol. 38. Springer, New York (1998)
80. den Hollander, F.: Large Deviations. Fields Institute Monographs, vol. 14. American Mathe-
matical Society, Providence (2000)
81. den Hollander, F.: Metastability under stochastic dynamics. Stoch. Process. Appl. 114, 1–26
(2004)
82. den Hollander, F., Jansen, S.: Berman-Konsowa principle for reversible Markov jump pro-
cesses. arXiv:1309.1305, to appear in Markov Process. Relat. Fields (2015)
83. den Hollander, F., Jansen, S.: Metastability at low temperature for continuum interacting
particle systems (2015, in preparation)
84. den Hollander, F., Nardi, F., Olivieri, E., Scoppola, E.: Droplet growth for three-dimensional
Kawasaki dynamics. Probab. Theory Relat. Fields 125, 153–194 (2003)
85. den Hollander, F., Nardi, F.R., Troiani, A.: Kawasaki dynamics with two types of particles:
stable/metastable configurations and communication heights. J. Stat. Phys. 145, 1423–1457
(2011)
86. den Hollander, F., Nardi, F.R., Troiani, A.: Kawasaki dynamics with two types of particles:
critical droplets. J. Stat. Phys. 149, 1013–1057 (2012)
87. den Hollander, F., Nardi, F.R., Troiani, A.: Metastability for Kawasaki dynamics at low tem-
perature with two types of particles. Electron. J. Probab. 17, 26 (2012)
88. den Hollander, F., Olivieri, E., Scoppola, E.: Metastability and nucleation for conservative
dynamics. J. Math. Phys. 41, 1424–1498 (2000)
89. den Hollander, F., Olivieri, E., Scoppola, E.: Nucleation in fluids: some rigorous results.
Physica A 279, 110–122 (2000)
90. den Hollander, F., Olivieri, E., Scoppola, E.: Metastability and nucleation for conservative
dynamics. Markov Process. Relat. Fields 7, 51–53 (2001)
91. Deuschel, J.-D., Stroock, D.: Large Deviations. Pure and Applied Mathematics, vol. 137.
Academic Press, Boston (1989)
92. Dobrushin, R., Shlosman, S.: “Non-Gibbsian” states and their Gibbs description. Commun.
Math. Phys. 200, 125–179 (1999)
93. Dommers, S.: Metastability of the Ising model on random regular graphs at zero temperature.
arXiv:1411.6802 (2014)
94. Donsker, M., Varadhan, S.: A law of the iterated logarithm for total occupation times of
transient Brownian motion. Commun. Pure Appl. Math. 33, 365–393 (1980)
95. Doob, J.L.: Classical Potential Theory and Its Probabilistic Counterpart. Grundlehren
der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences],
vol. 262. Springer, New York (1984)
96. Doyle, P., Snell, J.: Random Walks and Electric Networks. Carus Mathematical Monographs,
vol. 22. Mathematical Association of America, Washington (1984)
97. Doyle, P.G.: Energy for Markov chains. Preprint available at
http://math.dartmouth.edu/~doyle/docs/energy/energy.pdf (1994)
98. Dupuis, P., Ellis, R.: A Weak Convergence Approach to the Theory of Large Deviations.
Wiley Series in Probability and Statistics: Probability and Statistics. Wiley, New York (1997)
99. E, W., Ren, W., Vanden-Eijnden, E.: Energy landscapes and rare events. In: Proceedings of
the International Congress of Mathematicians, Vol. I (Beijing, 2002), pp. 621–630. Higher
Ed. Press, Beijing (2002)
100. E, W., Vanden-Eijnden, E.: Towards a theory of transition paths. J. Stat. Phys. 123, 503–523
(2006)
101. Eckhoff, M.: Capacity and the Low Lying Spectrum in Attractive Markov Chains. Ph.D.
thesis, Universität Potsdam (2000)
102. Eckhoff, M.: The low lying spectrum of irreversible, infinite state Markov chains in
the metastable regime. Technical report. Preprint, available at
http://www.math.uzh.ch/fileadmin/user/eckhoff/publikation/specirrev.pdf (2002)
103. Ellis, R.: Entropy, Large Deviations, and Statistical Mechanics. Grundlehren der Mathema-
tischen Wissenschaften, vol. 271. Springer, New York (1985)
104. Ethier, S., Kurtz, T.: Markov Processes. Characterization and Convergence. Wiley Series in
Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, New
York (1986)
105. Evans, M.: Phase transitions in one-dimensional nonequilibrium systems. Braz. J. Phys. 30,
42–57 (2000)
106. Eyring, H.: The activated complex in chemical reactions. J. Chem. Phys. 3, 107–115 (1935)
107. Faris, W., Jona-Lasinio, G.: Large fluctuations for a nonlinear heat equation with noise. J.
Phys. A 15, 3025–3055 (1982)
108. Feller, W.: An Introduction to Probability Theory and Its Applications. Vol. I, 3rd edn. Wiley,
New York (1968)
109. Feller, W.: An Introduction to Probability Theory and Its Applications. Vol. II, 2nd edn.
Wiley, New York (1971)
110. Feng, J., Kurtz, T.: Large Deviations for Stochastic Processes. Mathematical Surveys and
Monographs, vol. 131. American Mathematical Society, Providence (2006)
111. Fernandez, R., Manzo, F., Nardi, F., Scoppola, E., Sohier, J.: Conditioned, quasi-stationary,
restricted measures and escape from metastable states. ArXiv e-prints (Oct. 2014)
112. Fernandez, R., Manzo, F., Nardi, F.R., Scoppola, E.: Asymptotically exponential hitting times
and metastability. arXiv:1406.2637 (2014)
113. Fiedler, B., Rocha, C.: Heteroclinic orbits of semilinear parabolic equations. J. Differ. Equ.
125, 239–281 (1996)
114. Fontes, L.R., Mathieu, P., Picco, P.: On the averaged dynamics of the random field Curie-
Weiss model. Ann. Appl. Probab. 10, 1212–1245 (2000)
115. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Grundlehren
der Mathematischen Wissenschaften, vol. 260. Springer, New York (1984)
116. Fukushima, M., Oshima, Y., Takeda, M.: Dirichlet Forms and Symmetric Markov Processes.
de Gruyter Studies in Mathematics, vol. 19. de Gruyter, Berlin (2011), extended edition
117. Funaki, T.: Random motion of strings and related stochastic evolution equations. Nagoya
Math. J. 89, 129–193 (1983)
118. Gaudillière, A., den Hollander, F., Nardi, F.R., Olivieri, E., Scoppola, E.: Ideal gas approxi-
mation for a two-dimensional rarefied gas under Kawasaki dynamics. Stoch. Process. Appl.
119, 737–774 (2009)
119. Gaudillière, A., den Hollander, F., Nardi, F.R., Olivieri, E., Scoppola, E.: Droplet dynamics
in a two-dimensional rarified gas under Kawasaki dynamics (2015, in preparation)
120. Gaudillière, A., den Hollander, F., Nardi, F.R., Olivieri, E., Scoppola, E.: Homogeneous nu-
cleation for two-dimensional Kawasaki dynamics (2015, in preparation)
121. Gaudillière, A., Landim, C.: A Dirichlet principle for non reversible Markov chains and some
recurrence theorems. Probab. Theory Relat. Fields 158, 55–89 (2014)
122. Gaudillière, A., Olivieri, E., Scoppola, E.: Nucleation pattern at low temperature for local
Kawasaki dynamics in two dimensions. Markov Process. Relat. Fields 11, 553–628 (2005)
123. Gaveau, B., Moreau, M.: Metastable relaxation times and absorption probabilities for multi-
dimensional stochastic systems. J. Phys. A 33, 4837–4850 (2000)
124. Gaveau, B., Schulman, L.S.: Theory of nonequilibrium first-order phase transitions for
stochastic dynamics. J. Math. Phys. 39, 1517–1533 (1998)
125. Georgii, H.-O.: Gibbs Measures and Phase Transitions, 2nd edn. de Gruyter Studies in Math-
ematics, vol. 9. de Gruyter, Berlin (1988)
126. Gilbarg, D., Trudinger, N.: Elliptic Partial Differential Equations of Second Order, 2nd edn.
Grundlehren der Mathematischen Wissenschaften, vol. 224. Springer, Berlin (1983)
127. Glasstone, S., Laidler, K., Eyring, H.: The Theory of Rate Processes. McGraw-Hill, New
York (1941)
128. Gois, B., Landim, C.: Zero-temperature limit of the Kawasaki dynamics for the Ising lattice
gas in a large two-dimensional torus. Ann. Probab. 43, 2151–2203 (2015)
129. Grassberger, P., Barkema, G., Nadler, W.: Monte Carlo Approach to Biopolymers and Protein
Folding. World Scientific, Singapore (1998)
130. Großkinsky, S., Schütz, G.M.: Discontinuous condensation transition and nonequivalence of
ensembles in a zero-range process. J. Stat. Phys. 132, 77–108 (2008)
131. Großkinsky, S., Schütz, G.M., Spohn, H.: Condensation in the zero range process: stationary
and dynamical properties. J. Stat. Phys. 113, 389–410 (2003)
132. Großkinsky, S., Spohn, H.: Stationary measures and hydrodynamics of zero range processes
with several species of particles. Bull. Braz. Math. Soc. (N.S.), 489–507 (2003)
133. Gyöngy, I.: Lattice approximations for stochastic quasi-linear parabolic partial differential
equations driven by space-time white noise. I. Potential Anal. 9, 1–25 (1998)
134. Gyöngy, I., Pardoux, É.: On quasi-linear stochastic partial differential equations. Probab.
Theory Relat. Fields 94, 413–425 (1993)
135. Hairer, M.: A theory of regularity structures. Invent. Math. 198, 269–504 (2014)
136. Helffer, B., Klein, M., Nier, F.: Quantitative analysis of metastability in reversible diffusion
processes via a Witten complex approach. Mat. Contemp. 26, 41–85 (2004)
137. Helffer, B., Nier, F.: Quantitative analysis of metastability in reversible diffusion processes
via a Witten complex approach: the case with boundary. Mém. Soc. Math. Fr. (N.S.) 105,
vi+89 (2006)
138. Helffer, B., Sjöstrand, J.: Multiple wells in the semiclassical limit. I. Commun. Partial Differ.
Equ. 9, 337–408 (1984)
139. Holden, H., Oksendal, B., Uboe, J., Zhang, T.: Stochastic Partial Differential Equations.
Probability and Its Applications. Birkhäuser, Boston (1996)
140. Holley, R.A., Kusuoka, S., Stroock, D.W.: Asymptotics of the spectral gap with applications
to the theory of simulated annealing. J. Funct. Anal. 83, 333–347 (1989)
141. Huang, S.: The molecular and mathematical basis of Waddington’s epigenetic landscape:
A framework for post-Darwinian biology? BioEssays 34, 149–157 (2011)
142. Itō, K., McKean, H.: Diffusion Processes and Their Sample Paths. Springer, New York
(1965)
143. Jacod, J., Shiryaev, A.: Limit Theorems for Stochastic Processes, 2nd edn. Grundlehren der
Mathematischen Wissenschaften, vol. 288. Springer, Berlin (2003)
144. Kakutani, S.: Two-dimensional Brownian motion and harmonic functions. Proc. Imp. Acad.
(Tokyo) 20, 706–714 (1944)
145. Kakutani, S.: Markov process and the Dirichlet problem. Proc. Jpn. Acad. 21, 227–233
(1949), 1945
146. Kallenberg, O.: Foundations of Modern Probability, 2nd edn. Probability and Its Applications
(New York). Springer, New York (2002)
147. Kallianpur, G., Xiong, J.: Large deviations for a class of stochastic partial differential equa-
tions. Ann. Probab. 24, 320–345 (1996)
148. Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus. Graduate Texts in Math-
ematics. Springer, New York (1988)
149. Kauffman, S.: The Origins of Order. Oxford University Press, Oxford (1993)
150. Kemeny, J., Snell, J., Knapp, A.: Denumerable Markov Chains, 2nd edn. Springer, New York
(1976)
151. Kemeny, J.G., Snell, J.L.: Finite Markov Chains. The University Series in Undergraduate
Mathematics. D. Van Nostrand Co., Inc., Princeton-Toronto-London-New York (1960)
152. Kifer, Y.: Random Perturbations of Dynamical Systems. Progress in Probability and Statis-
tics, vol. 16. Birkhäuser, Boston (1988)
153. Knessl, C., Matkowsky, B.J., Schuss, Z., Tier, C.: An asymptotic theory of large deviations
for Markov jump processes. SIAM J. Appl. Math. 45, 1006–1028 (1985)
154. Kolokoltsov, V.: Semiclassical Analysis for Diffusions and Stochastic Processes. Lecture
Notes in Mathematics, vol. 1724. Springer, Berlin (2000)
155. Kotecký, R., Olivieri, E.: Droplet dynamics for asymmetric Ising model. J. Stat. Phys. 70,
1121–1148 (1993)
156. Kotecký, R., Olivieri, E.: Shapes of growing droplets—a model of escape from a metastable
phase. J. Stat. Phys. 75, 409–506 (1994)
157. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of chemical
reactions. Physica 7, 284–304 (1940)
158. Külske, C.: Metastates in disordered mean-field models: random field and Hopfield models.
J. Stat. Phys. 88, 1257–1293 (1997)
159. Landim, C.: Metastability for a non-reversible dynamics: The evolution of the condensate in
totally asymmetric zero range processes. Commun. Math. Phys. 330, 1–32 (2014)
160. Lawler, G.F.: Intersections of Random Walks. Probability and Its Applications. Birkhäuser,
Boston (1991)
161. Lawler, G.F., Schramm, O., Werner, W.: Conformal invariance of planar loop-erased random
walks and uniform spanning trees. Ann. Probab. 32, 939–995 (2004)
162. Lebowitz, J.L., Penrose, O.: Rigorous treatment of the van der Waals-Maxwell theory of the
liquid-vapor transition. J. Math. Phys. 7, 98–113 (1966)
163. Levin, D., Peres, Y., Wilmer, E.: Markov Chains and Mixing Times. American Mathematical
Society, Providence (2009)
164. Levin, D.A., Luczak, M.J., Peres, Y.: Glauber dynamics for the mean-field Ising model: cut-
off, critical power law, and metastability. Probab. Theory Relat. Fields 146, 223–265 (2010)
165. Levit, S., Smilansky, U.: A theorem on infinite products of eigenvalues of Sturm-Liouville
type operators. Proc. Am. Math. Soc. 65, 299–302 (1977)
166. Liggett, T.: Interacting Particle Systems. Grundlehren der Mathematischen Wissenschaften,
vol. 276. Springer, New York (1985)
167. Lubetzky, E., Martinelli, F., Sly, A., Toninelli, F.L.: Quasi-polynomial mixing of the 2D
stochastic Ising model with “plus” boundary up to criticality. J. Eur. Math. Soc. (JEMS) 15,
339–386 (2013)
168. Lubetzky, E., Sly, A.: Critical Ising on the square lattice mixes in polynomial time. Commun.
Math. Phys. 313, 815–836 (2012)
169. Maier, R.S., Stein, D.L.: Limiting exit location distributions in the stochastic exit problem.
SIAM J. Appl. Math. 57, 752–790 (1997)
170. Maier, R.S., Stein, D.L.: Droplet nucleation and domain wall motion in a bounded interval.
Phys. Rev. Lett. 87, 270601 (2001)
171. Manzo, F., Nardi, F.R., Olivieri, E., Scoppola, E.: On the essential features of metastability:
tunnelling time and critical configurations. J. Stat. Phys. 115, 591–642 (2004)
172. Manzo, F., Olivieri, E.: Relaxation patterns for competing metastable states: a nucleation and
growth model. Markov Process. Relat. Fields 4, 549–570 (1998)
173. Manzo, F., Olivieri, E.: Dynamical Blume-Capel model: competing metastable states at infi-
nite volume. J. Stat. Phys. 104, 1029–1090 (2001)
174. Martinelli, F., Olivieri, E., Scoppola, E.: Small random perturbations of finite- and infinite-
dimensional dynamical systems: unpredictability of exit times. J. Stat. Phys. 55, 477–504
(1989)
175. Martinelli, F., Olivieri, E., Scoppola, E.: Metastability and exponential approach to equilib-
rium for low-temperature stochastic Ising models. J. Stat. Phys. 61, 1105–1119 (1990)
176. Martinelli, F., Sbano, L., Scoppola, E.: Small random perturbation of dynamical systems:
recursive multiscale analysis. Stoch. Stoch. Rep. 49, 253–272 (1994)
177. Martinelli, F., Scoppola, E.: Small random perturbations of dynamical systems: exponential
loss of memory of the initial condition. Commun. Math. Phys. 120, 25–69 (1988)
178. Martinelli, F., Toninelli, F.L.: On the mixing time of the 2D stochastic Ising model with
“plus” boundary conditions at low temperature. Commun. Math. Phys. 296, 175–213 (2010)
179. Mathieu, P.: Spectra, exit times and long time asymptotics in the zero-white-noise limit.
Stoch. Stoch. Rep. 55, 1–20 (1995)
180. Mathieu, P., Picco, P.: Metastability and convergence to equilibrium for the random field
Curie-Weiss model. J. Stat. Phys. 91, 679–732 (1998)
181. Matkowsky, B.J., Schuss, Z.: On the lifetime of a metastable state at low noise. Phys. Lett.
A 95, 213–215 (1983)
182. Matkowsky, B.J., Schuss, Z., Tier, C.: Uniform expansion of the transition rate in Kramers’
problem. J. Stat. Phys. 35, 443–456 (1984)
183. Menz, G., Schlichting, A.: Poincaré and logarithmic Sobolev inequalities by decomposition
of the energy landscape. Ann. Probab. 42, 1809–1884 (2014)
184. Metzner, P., Schütte, C., Vanden-Eijnden, E.: Transition path theory for Markov jump pro-
cesses. Multiscale Model. Simul. 7, 1192–1219 (2008)
185. Miclo, L.: Comportement de spectres d’opérateurs de Schrödinger à basse température. Bull.
Sci. Math. 119, 529–553 (1995)
186. Mogul’skiĭ, A.A.: Large deviations for the trajectories of multidimensional random walks.
Teor. Verojatn. Primen. 21, 309–323 (1976)
187. Mourrat, J.-C., Valesin, D.: Phase transition of the contact process on random regular graphs.
arXiv:1405.0865 (2014)
188. Nardi, F.R., Olivieri, E.: Low temperature stochastic dynamics for an Ising model with alter-
nating field. Markov Process. Relat. Fields 2, 117–166 (1996)
189. Nardi, F.R., Olivieri, E., Scoppola, E.: Anisotropy effects in nucleation for conservative dy-
namics. J. Stat. Phys. 119, 539–595 (2005)
190. Nardi, F.R., Spitoni, C.: Sharp asymptotics for stochastic dynamics with parallel updating
rule. J. Stat. Phys. 146, 701–718 (2012)
191. Neukirch, R.: Metastability in the Zero-range Process. Diploma thesis, Bonn University
(2011)
192. Neves, E.: A discrete variational problem related to Ising droplets at low temperatures. J.
Stat. Phys. 80 (1995)
193. Neves, E.J., Schonmann, R.H.: Critical droplets and metastability for a Glauber dynamics at
very low temperatures. Commun. Math. Phys. 137, 209–230 (1991)
194. Newman, M., Barkema, G.: Monte Carlo Methods in Statistical Physics. Oxford University
Press, Oxford (1999)
195. Norris, J.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics,
vol. 2. Cambridge University Press, Cambridge (1998)
196. Olivieri, E., Scoppola, E.: Markov chains with exponentially small transition probabilities:
first exit problem from a general domain. I. The reversible case. J. Stat. Phys. 79, 613–647
(1995)
197. Olivieri, E., Scoppola, E.: Markov chains with exponentially small transition probabilities:
first exit problem from a general domain. II. The general case. J. Stat. Phys. 84, 987–1041
(1996)
198. Olivieri, E., Vares, M.E.: Large Deviations and Metastability. Encyclopedia of Mathematics
and Its Applications, vol. 100. Cambridge University Press, Cambridge (2005)
199. Penrose, O., Lebowitz, J.: Rigorous treatment of metastable states in the van der Waals-
Maxwell theory. J. Stat. Phys. 3, 211–236 (1971)
200. Penrose, O., Lebowitz, J.: Towards a rigorous molecular theory of metastability. In: Fluctua-
tion Phenomena, 2nd edn. North-Holland, Amsterdam (1987)
201. Pollak, E., Talkner, P.: Reaction rate theory: What it was, where is it today, and where is it
going? Chaos 15, 026116 (2005)
202. Presutti, E.: Scaling Limits in Statistical Mechanics and Microstructures in Continuum Me-
chanics. Theoretical and Mathematical Physics. Springer, Berlin (2009)
203. Prévôt, C., Röckner, M.: A Concise Course on Stochastic Partial Differential Equations.
Lecture Notes in Mathematics, vol. 1905. Springer, Berlin (2007)
204. Reed, M., Simon, B.: Methods of Modern Mathematical Physics. I. Functional Analysis, 2nd
edn. Academic Press [Harcourt Brace Jovanovich, Publishers], New York (1980)
205. Révész, P.: Random Walk in Random and Nonrandom Environments. World Scientific, Tea-
neck (1990)
206. Rikvold, P., Tomita, H., Miyashita, S., Sides, S.: Metastable lifetimes in a kinetic Ising model:
dependence on field and system size. Phys. Rev. E 49, 5080–5090 (1994)
207. Rogers, L., Williams, D.: Diffusions, Markov Processes, and Martingales. Vol. 2. Wiley Se-
ries in Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wi-
ley, New York (1987)
208. Rogers, L., Williams, D.: Diffusions, Markov Processes, and Martingales. Vol. 1, 2nd edn.
Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statis-
tics. Wiley, New York (1994)
209. Ruelle, D.: Statistical Mechanics: Rigorous Results. Benjamin, New York-Amsterdam
(1969)
210. Schlichting, A.: The Eyring-Kramers Formula for Poincaré and Logarithmic Sobolev In-
equalities. Ph.D. thesis, Leipzig University (2012)
211. Schonmann, R.H.: The pattern of escape from metastability of a stochastic Ising model.
Commun. Math. Phys. 147, 231–240 (1992)
212. Schonmann, R.H.: Slow droplet-driven relaxation of stochastic Ising models in the vicinity
of the phase coexistence region. Commun. Math. Phys. 161, 1–49 (1994)
213. Schonmann, R.H.: Theorems and conjectures on the droplet-driven relaxation of stochastic
Ising models. In: Probability and Phase Transition (Cambridge, 1993). NATO Adv. Sci. Inst.
Ser. C Math. Phys. Sci., vol. 420, pp. 265–301. Kluwer Acad., Dordrecht (1994)
214. Schonmann, R.H.: Metastability and the Ising model. In: Proceedings of the International
Congress of Mathematicians (Berlin, 1998), vol. III, pp. 173–181 (1998)
215. Schonmann, R.H., Shlosman, S.B.: Wulff droplets and the metastable relaxation of kinetic
Ising models. Commun. Math. Phys. 194, 389–462 (1998)
216. Schütte, C., Huisinga, W., Meyn, S.: Metastability of diffusion processes. In: IUTAM Sym-
posium on Nonlinear Stochastic Dynamics. Solid Mech. Appl., vol. 110, pp. 71–81. Kluwer
Acad., Dordrecht (2003)
217. Schütte, C., Sarich, M.: Metastability and Markov State Models in Molecular Dynamics.
Courant Lecture Notes in Mathematics, vol. 24. Courant Institute of Mathematical Sciences/
American Mathematical Society, Providence/New York (2013)
218. Sewell, G.: Quantum Theory of Collective Phenomena. Oxford University Press, Oxford
(1986)
219. Slowik, M.: Contributions to the Potential Theoretic Approach to Metastability with Appli-
cations to the Random Field Curie-Weiss-Potts Model. Ph.D. thesis, Technische Universität
Berlin (2012)
220. Slowik, M.: A note on variational representations of capacities for reversible and non-
reversible Markov chains. Unpublished, Technische Universität Berlin (2012)
221. Sowers, R.B.: Large deviations for a reaction-diffusion equation with non-Gaussian pertur-
bations. Ann. Probab. 20, 504–537 (1992)
222. Stroock, D.: An Introduction to Markov Processes. Graduate Texts in Mathematics, vol. 230.
Springer, Berlin (2005)
223. Stroock, D., Varadhan, S.: Multidimensional Diffusion Processes. Grundlehren der Mathe-
matischen Wissenschaften, vol. 233. Springer, Berlin (1979)
224. Sugiura, M.: Metastable behaviors of diffusion processes with small parameter. J. Math. Soc.
Jpn. 47, 755–788 (1995)
225. Sugiura, M.: Exponential asymptotics in the small parameter exit problem. Nagoya Math. J.
144, 137–154 (1996)
226. Sznitman, A.-S.: Brownian Motion, Obstacles and Random Media. Springer Monographs in
Mathematics. Springer, Berlin (1998)
227. van den Berg, M.: Exit and return of a simple random walk. Potential Anal. 23, 45–53 (2005)
228. van Kampen, N.: Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam
(1981)
229. van ’t Hoff, J.: Études de Dynamiques Chimiques. F. Muller, Amsterdam (1884)
230. Vanden-Eijnden, E., Westdickenberg, M.G.: Rare events in stochastic partial differential
equations on large spatial domains. J. Stat. Phys. 131, 1023–1038 (2008)
231. Vanheuverzwijn, P.: Metastable states in the infinite Ising model. J. Math. Phys. 20, 2665–
2670 (1979)
232. Varadhan, S.R.S.: Large Deviations and Applications. CBMS-NSF Regional Conference
Series in Applied Mathematics, vol. 46. Society for Industrial and Applied Mathematics
(SIAM), Philadelphia (1984)
233. Ventcel’, A.D.: The asymptotic behavior of the first eigenvalue of a second order differential
operator with a small parameter multiplying the highest derivatives. Teor. Verojatn. Primen.
20, 610–613 (1975)
234. Ventcel’, A.D.: Formulas for eigenfunctions and eigenmeasures that are connected with a
Markov process. Teor. Verojatn. Primen. 18, 3–29 (1973)
235. Walsh, J.B.: An introduction to stochastic partial differential equations. In: École d’été de
probabilités de Saint-Flour, XIV—1984. Lecture Notes in Math., vol. 1180, pp. 265–439.
Springer, Berlin (1986)
236. Weidenmüller, H.A., Zhang, J.S.: Stationary diffusion over a multidimensional potential bar-
rier: a generalization of Kramers’ formula. J. Stat. Phys. 34, 191–201 (1984)
237. Wolfrum, M.: A sequence of order relations: encoding heteroclinic connections in scalar
parabolic PDE. J. Differ. Equ. 183, 56–78 (2002)
238. Wreszinski, W.F., Salinas, S.R.A.: The mean field Ising model in a random external magnetic
field. J. Stat. Phys. 41, 299–313 (1985)
Index

Symbols
σ-algebra, 28
    Borel, 28
    generated, 28

A
Absolute continuity, 37
Action functional, 138
Activation energy, 8, 16
Adapted process, 44, 54
Allen-Cahn equation, 115, 305
Amplitude, 8, 16
Arrhenius law, 8, 16, 388

B
Berman-Konsowa principle, 20, 145, 173, 178, 258, 352
Birth-death dynamics, 551
Borel measure, 29
Borel-σ-algebra, 28
Bracket process, 48
Brownian motion, 16, 80
Brownian sheet, 115
Bryc’s lemma, 126

C
Càdlàg process, 54
Capacitor, 165
Capacity, 20, 153, 154, 167, 182, 191, 275, 391
    1-dim continuous, 172
    1-dim discrete, 156
    matrix, 204, 296
Carathéodory’s theorem, 29
Cauchy sequence, 28
Chapman-Kolmogorov equations, 65, 84, 86
Characterisation of metastability, 189
Closable, 93
Closed operator, 93
Coarse-graining, 233, 234, 335, 347
Communicating class, 75
Communication
    height, 248, 384
    level set, 248, 385
Conditional
    expectation, 40
    probability, 42
Conductance, 157
Contraction principle, 126
Contraction semigroup, 88
Convergence
    almost sure, 35
    in Lp, 35
    in law, 35
    in probability, 35
    weak, 34
Core, 94
Coupling, 240
Cramér’s theorem, 128
Critical
    configurations, 462, 480
    gate, 387
    set, 386, 411, 421, 429
Critical droplet, 6, 397, 421
    Glauber dynamics, 410, 421
    Kawasaki dynamics, 427, 454
Critical space-time cone, 561
Crystallisation, 549
Curie-Weiss model, 325
    random-field, 331, 345
Cycle, 142

D
Daniell-Kolmogorov extension theorem, 39
Detailed balance, 150

A. Bovier, F. den Hollander, Metastability,
Grundlehren der mathematischen Wissenschaften 351,
DOI 10.1007/978-3-319-24777-9
Diffusion matrix, 160
Diffusion process, 265
    1-dim, 171
    discrete, 247
    with gradient drift, 265
Dirichlet form, 151, 162, 390
    gradient diffusion, 266
Dirichlet principle, 20, 145, 173, 174
    non-reversible, 182
Dirichlet problem, 19, 281
    1-dim discrete, 155
    continuous-time, 157
    diffusion, 165
    discrete-time, 145
    principal eigenvalue, 284
    uniqueness, 146
Discrete stochastic integral, 45
Discretisation of spdes, 119
Dissipative operator, 89, 98
Doob
    decomposition, 47, 69
    maximum inequality, 46, 58
    optional stopping theorem, 51
    regularity theorem, 55
    supermartingale inequality, 53
    transform, 112
Double-well potential, 16
Dynkin’s theorem, 29

E
Eigenfunction, 147
Eigenvalue, 147, 201
    Dirichlet, 201
    principal, 201
Electrostatic interpretation, 165
Elliptic operators, 160, 236
Elliptic regularity theory, 236
Equilibrium measure, 149, 166
Equilibrium potential, 148, 165, 391
Equivalence of ensembles, 488, 498, 499
Equivalence of measures, 37
Ergodic
    process, 75
    theorem, 77
Escape probability, 141, 149
Essential gate, 248, 385
Euler-Lagrange equations, 139
Exit problem, 139
Exit time, 275
    exponential distribution, 220, 302, 396, 514
Exponential asymptotics, 17, 142
Exponential tightness, 126
Eyring-Kramers formula, 17, 268

F
Fatou’s lemma, 32
Feller property, 89
Feller-Dynkin
    process, 89
    semigroup, 89
Filtered space, 44
Filtration, 44
Finite-state Markov process, 17
First hitting time, 49, 59
Flow, 176, 230
    defective, 230, 352
    harmonic, 177, 230
    loop-free, 179
    unit, 176
Free energy, 15
Freidlin-Wentzell theory, 138, 141

G
Gärtner-Ellis theorem, 128
Generator, 79, 88
    Brownian motion, 86
    continuous-time, 88
    diffusion process, 160
    discrete-time, 69
    gradient diffusion, 266
Gibbs measure, 15
    canonical, 479
    general, 384
    grand-canonical, 551
Ginzburg-Landau equation, 115
Girsanov theorem, 113, 130
Glauber dynamics, 410
Green formula
    continuous, 166
    discrete, 151
Green function, 148, 158, 166
    1-dim diffusion, 172
    diffusion, 167
Gronwall lemma, 109

H
Hamiltonian, 336, 347
    canonical, 479
    continuum, 551
    Curie-Weiss, 325
    general, 384
    grand-canonical, 551
    Ising spins, 409, 545, 560
    lattice gas, 426
    random-field Curie-Weiss, 331
Harmonic extension, 238
Harmonic flow, 230
Harmonic function, 71, 147
Heat kernel, 86
Heat-bath dynamics, 403
Hessian, 249, 267, 279, 280, 340
    eigenvalue, 267
Hille-Yosida theorem, 88, 92
Hitting time, 49, 200
Hysteresis, 7

I
Inequality
    Harnack, 240
    Hausdorff-Young, 313
    Hölder, 33
    Jensen, 33
    maximum, 48, 58
    Minkowski, 33
Initial distribution, 64
Inverse temperature, 326, 384
Itō
    calculus, 102
    formula, 107
    integral, 106
    isometry, 106

J
Jump process, 101

K
Kawasaki dynamics, 426
Kirchhoff’s law, 176
Kramers
    equation, 171
    formula, 9, 16, 173
    model, 9, 16, 265

L
Lagrangian, 138
Langevin equation, 16
Large deviation principle, 125, 129
    Dawson-Gärtner projective limit, 127
    exponential tightness, 126
    Gärtner-Ellis, 128
    path space, 129
    weak, 126
Large deviations
    for spdes, 136
    on path space, 129
Last-exit biased distribution, 153, 463, 482
Lebesgue integral, 31
Lebesgue’s dominated convergence theorem, 32
Lebowitz-Penrose theory, 9
Legendre transform, 128
Local martingale, 61
Lumping, 234

M
Markov jump process, 79, 101
Markov process, 66, 86
    ρ-metastable, 191
    aperiodic, 75
    continuous-time, 84
    discrete-time, 63
    irreducible, 75
    metastable family, 190
    stationary, 66
    transition function, 86
Markov property, 65
    strong, 67, 91
Markov semigroup, 68
Martingale, 44
    convergence theorem, 45, 57
    local, 61
    sub, 44, 54
    super, 44, 54
Martingale problem, 69, 92, 96
    continuous-time, 84
    discrete-time, 69
    existence, 100
    Markov, 70, 96
    uniqueness, 99
Martingale transform, 45
Maximum inequality, 46, 48, 58
Maximum principle, 71
Mean hitting time, 195
    1-dim diffusion, 172
    1-dim discrete, 157
    basic formula, 154
Mean-field model, 325
Measurable map, 29
Measure, 28
    σ-finite, 28
    Borel, 29
    equilibrium, 149
    finite, 28
    induced, 30
    invariant, 66, 150, 162
    measurable space, 28
    measure space, 28
    reversible, 162
    Wiener, 83
Metastability
    characterisation, 189
    computational approach, 12
    pathwise approach, 10
    potential-theoretic approach, 11
    spectral approach, 11
    spectral characterisation, 200
Metastable
    configurations, 385
    exit time, 200
    points, 191
    set, 210, 249, 395
    state, 6, 15, 142
Metastable regime
    Glauber dynamics, 410, 420, 546
    Kawasaki dynamics, 427, 454
Metropolis dynamics, 384
Microscopic flow, 355
Minimal gate, 385
Model reduction, 19
Mogul’skiĭ’s theorem, 138
Monotone class theorem, 29
Monotone convergence theorem, 31
Morse function, 249

N
Normal semigroup, 87
Nucleation path, 462

O
Ohm’s law, 157
Optimal path, 248, 385
Optimal transport, 304
Optional sampling theorem, 60

P
Path large deviations
    Markov processes, 137
Phase transition, 5
Poisson kernel, 148, 152, 158, 167, 238
Polish space, 29
Positive recurrent, 75
Previsible process, 45
Principal Dirichlet eigenvalue, 281
Principal eigenvalue, 209
Probabilistic cellular automata, 404
Probability
    density, 37
    measure, 28
    regular conditional, 43
    space, 28
Process
    adapted, 44, 54
    ergodic, 75
    non-reversible, 181
    previsible, 45
    progressive, 59
    quadratic variation, 49
    reversible, 162
Progressive process, 59
Protocritical set, 386, 411, 421, 429

Q
Quadratic variation, 49
Quasi-invariant set, 190
Quasi-potential, 140

R
Radon-Nikodým
    derivative, 37
    theorem, 37
Random variable, 30
Random walk, 155
Random-field Curie-Weiss model, 331, 345
Rate function, 125
Rayleigh-Ritz variational principle, 209
Recurrence, 74
Regular conditional probability, 43
    existence, 43
Regular domain, 161
Regularisable function, 53
Renewal estimate, 192, 268
    approximate, 237
Renormalisation, 233
Reversed process, 182
Reversibility, 150, 162
Reversible
    measure, 162
    process, 162

S
Saddle points, 248
Sample path, 38
Sanov’s theorem, 128
Schilder’s theorem, 129
Semigroup, 87
Separable, 28
Simple function, 30
Simple process, 105
Simple random walk, 490
Small eigenvalues, 203, 289
Sobolev space, 237
Solution
    mild, 118
    strong, 108, 109
    weak, 108
Space
    L p, 32
    complete, 28
    filtered, 44
    Lp, 32
    metric, 28
    Polish, 29
    Sobolev, 237
Spectrum, 200, 278, 396
Stability level, 385
Stable configurations, 385
State space, 38
Stationary process, 66
Stochastic
    differential equation, 107
    integral equation, 107
    partial differential equation, 115, 305
Stochastic integral, 102
    discrete, 45
Stopping time, 49, 58
Strong Markov property, 67
Sub-Markovian, 87
Subcritical configurations, 462, 480
Submartingale, 45
Supercritical configurations, 462, 480
Supermartingale, 45
Symmetrised process, 182

T
Thomson principle, 20, 145, 173, 175
    non-reversible, 183
Tightness
    exponential, 126
Time
    continuous, 38
    discrete, 38
Transience, 74
Transition kernel, 63, 86
    stationary, 66
Transition matrix, 74

U
Ultrametric triangle inequality, 193
Ultrametricity, 193
Uniform entrance distribution, 387
Uniform integrability, 36
Uniqueness
    pathwise, 108
Unit flow, 176
Universality, 5
Upcrossing inequality, 46

V
Varadhan’s lemma, 126
Variational principle
    Berman-Konsowa, 178
    Dirichlet, 174
    Thomson, 175

W
Weak convergence, 34
Wiener measure, 83
    1-dim, 83
    d-dim, 83
Wulff construction, 547
Wulff droplet, 548

Z
Zero-range process, 511
    condensation phenomenon, 513

No ratings yet
A Comparison of Numerical Techniques For American Option Pricing
8 pages
EC 744 Lecture Notes: Incomplete Markets and Bewley Models: Jianjun Miao
No ratings yet
EC 744 Lecture Notes: Incomplete Markets and Bewley Models: Jianjun Miao
39 pages
EC744 Lecture Note 6 Stochastic Models: Mathematical Preliminaries
No ratings yet
EC744 Lecture Note 6 Stochastic Models: Mathematical Preliminaries
18 pages
EC744 Lecture Notes: Economic Dynamics: Prof. Jianjun Miao
No ratings yet
EC744 Lecture Notes: Economic Dynamics: Prof. Jianjun Miao
13 pages
Optimizing the Aging, Retirement, and Pensions Dilemma
From Everand
Optimizing the Aging, Retirement, and Pensions Dilemma
William T. Ziemba
No ratings yet
Theoretical Numerical Analysis: An Introduction to Advanced Techniques
From Everand
Theoretical Numerical Analysis: An Introduction to Advanced Techniques
Peter Linz
No ratings yet
Stochastic Processes and Filtering Theory
From Everand
Stochastic Processes and Filtering Theory
Andrew H. Jazwinski
No ratings yet
Manufacturing and Managing Customer-Driven Derivatives
From Everand
Manufacturing and Managing Customer-Driven Derivatives
Dong Qu
No ratings yet
How to Implement Market Models Using VBA
From Everand
How to Implement Market Models Using VBA
Francois Goossens
No ratings yet
Fourier Transform Methods in Finance
From Everand
Fourier Transform Methods in Finance
Umberto Cherubini
No ratings yet
Exercises of Stochastic Processes
From Everand
Exercises of Stochastic Processes
Simone Malacrida
No ratings yet

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

C’est une grande folie de vouloir être sage tout seul.

Metastability is a wide-spread phenomenon in the dynamics of non-linear systems—

Bonn, Germany Anton Bovier

Logical structure of the monograph

3.2.1 Definition of stochastic processes

7.4 Variational principles in the non-reversible setting

11.3 Spectral theory

15.4.2 Propagation of errors along microscopic paths

Part VI Applications: Lattice Systems in Small Volumes at Low

18 Kawasaki Dynamics

Part VII Applications: Lattice Systems in Large Volumes at Low

20 Kawasaki Dynamics

23 Challenges Beyond Metastability

Part I contains an introduction to the concept of metastability. Chapter 1 provides

In Science—in fact, in most things—it is usually best to begin at

We begin with a brief description of the phenomenon of metastability (Sects. 1.1–

Metastability is a widespread phenomenon that arises in a large variety of systems—

1.2 Condensation and magnetisation: from gases

$\Delta G(r) = \text{difference bulk free energies} + \text{interfacial energy} = -\Delta r^{d} + \sigma r^{d-1},$

1.3 Historical perspective

1.3.1 Early achievements

The earliest attempt at a quantitative description of metastability dates back to the

where ᾱ ∈ (0, 1) is a dimensionless exponent. Such an equation appears when the

1.3.2 The pathwise approach

1.3.3 The spectral approach

In the 1980’s, Davies [70–74] proposed an axiomatic approach to metastability

1.3.4 The potential-theoretic approach

The potential-theoretic approach to metastability was initiated in 2001 in a paper

1.3.5 The computational approach

As mentioned above, there is great interest in quantitative numerical computations

by rare events and involves excessively long time-scales, doing a simulation is

What a convenient thing it would be if all thieves had the same

While classical mechanics is concerned with deterministic equations of motion, sta-

of the models to be considered in Parts IV–VIII, with a brief indication of what we

2.1 Two paradigmatic models

In Sects. 2.1.1–2.1.2 we describe two paradigmatic models for metastability: the

2.1.1 Kramers model: Brownian motion in a double-well

where $X_t$ denotes the position at time $t$ of a “particle” diffusing in a drift field

2.1.2 Finite-state Markov processes

The model of Kramers describes the evolution of an effective order parameter of a

Fig. 2.2 Transition rates for the two-state Markov chain

2.2 Model reduction

2.3 Variational point of view

The focus of this book is on the potential-theoretic approach to metastability. The

2.4 Specific models

sharpens the classical results of Freidlin-Wentzell theory by using the potential-

2.5 Related topics

• Conformational dynamics: Large (bio)-molecules undergo transitions between

On peut même dire, à parler en rigueur, que presque toutes nos

3.1 Probability and measures

3.1.1 Probability spaces

A space Ω is an arbitrary non-empty set. Elements of Ω are denoted by ω. If A ⊂ Ω

Definition 3.2 A space Ω, together with a σ-algebra F of subsets of Ω, is called

Definition 3.3 Let (Ω, F ) be a measurable space. A map μ : F → [0, ∞] is

A measure μ is called finite if μ(Ω) < ∞. A measure is called σ-finite if

Definition 3.4 Let (Ω, F ) be a measurable space. A positive measure P on (Ω, F )

Definition 3.5 Let E be a topological space. The Borel-σ-algebra B(E) of E is

topological space whose topology is equivalent to some metric topology. Such a

Theorem 3.6 (Carathéodory’s theorem) Let Ω be a set and let A be an algebra

A measure defined on a Borel-σ-algebra is sometimes called a Borel measure.

The most important issue that arises in applications is to characterise a measure

Definition 3.7 Let Ω be a space. A class T of Ω is called a Π-system if it is closed

Theorem 3.8 (Dynkin’s theorem) If T is a Π-system and G is a λ-system, then

Theorem 3.9 (Monotone class theorem) Let H be a class of bounded, measurable

The set of functions $X$ such that $\|X\|_{p,\mu} < \infty$ is denoted by $L^p(\Omega, \mathcal{F}, \mu) = L^p$.

Definition 3.33 If $\nu \ll \mu$, then a positive measurable function $X$ such that $\nu = \mu_X$

Lemma 3.34 Let $\mu, \nu$ be two σ-finite measures on $(\Omega, \mathcal{F})$ such that $\nu \ll \mu$. If $X$

$X(\omega)\colon I \to S, \quad t \mapsto X_t(\omega). \qquad (3.2.1)$