Bovier & Den Hollander - Metastability
Anton Bovier
Frank den Hollander
Metastability
A Potential-Theoretic Approach
Grundlehren der
mathematischen Wissenschaften 351
A Series of Comprehensive Studies in Mathematics
Series editors
Editor-in-Chief
A. Chenciner J. Coates S.R.S. Varadhan
For further volumes:
www.springer.com/series/138
Anton Bovier
Institut für Angewandte Mathematik
Rheinische Friedrich-Wilhelms-Universität
Bonn, Germany

Frank den Hollander
Mathematisch Instituut
Universiteit Leiden
Leiden, The Netherlands
Preface
metastable sets, and focuses on the precise analysis of the respective hitting prob-
abilities and hitting times of these sets with the help of potential theory. The fact
that this requires the solution of Dirichlet problems in typically high-dimensional
spaces has probably acted as a deterrent for a long time, and has prevented an ef-
ficient use of the ensuing methods at a much earlier stage. The key point in the
potential-theoretic approach is the realisation that, in the specific setting related to
metastability, most questions of interest can be reduced to the computation of capac-
ities, and that these capacities in turn can be estimated by exploiting powerful varia-
tional principles. In this way, the metastable dynamics of the system can essentially
be understood via an analysis of its statics. This constitutes a major simplification,
and acts as a guiding principle. In addition, potential theory allows us to deduce
detailed information on the spectral characteristics of the generator of the dynam-
ics, which are typically assumed in the so-called spectral approach to metastability
initiated by Davies [73, 74] in the 1980’s.
The setting of this book is the theory of Markov processes, for the most part, re-
versible Markov processes. Within this limitation, however, there is a wide range of
models that are adequate to describe a variety of different real-world systems. The
models we aim at range from finite-state Markov chains, finite-dimensional diffu-
sions and stochastic partial differential equations, via mean-field dynamics with and
without disorder, to stochastic spin-flip and particle-hopping dynamics and proba-
bilistic cellular automata. Our main aim is to unveil the common universal features
of these systems with respect to their metastable behaviour.
The book is divided into nine parts:
• Part I presents the metastability phenomenon in its various manifestations, with
emphasis on its universal aspects. A brief overview of the history of the subject is
given, including a comparison of the pathwise, the spectral, the potential-theoretic
and the computational approach. Two paradigmatic models are presented: the
Kramers model of Brownian motion in a double-well potential and the two-state
Markov chain. These models serve as a common thread through the book, in the sense
that the much more complex and real-world models treated later still exhibit a
metastable behaviour that is in many respects similar. An outline of which models
will be treated in the book and which main techniques will be used to analyse
them is provided, as well as a brief perspective on metastability in areas other
than interacting particle systems.
• Part II provides the necessary background on Markov processes (and can be
skipped by readers with a basic knowledge of probability theory). Here, the cen-
tral theme is the relation between Markov processes, martingales, and Dirich-
let problems. A brief outline of large deviation theory is provided, as well as a
description of three variational principles for capacities that play a key role in
the study of metastability: the Dirichlet principle, the Thomson principle and the
Berman-Konsowa principle.
• Part III contains the core of the theory. Here, we give the definition of metastable
systems and metastable sets in terms of properties of capacities, and we describe
the consequences of these definitions for the distribution of metastable hitting
times and for the spectral properties of the associated Markov generators. We also
introduce and discuss the basic techniques that can be used to compute capacities
and equilibrium potentials, and to estimate harmonic functions.
Parts IV–VIII highlight the key models that can be treated with the help of these
techniques. It is here that the potential-theoretic approach to metastability fully
comes to life.
• Part IV studies diffusions with small noise, both finite-dimensional (random walks
and stochastic differential equations) and infinite-dimensional (stochastic partial
differential equations).
• Part V describes coarse-graining techniques applied to the Curie-Weiss model
in large volumes at positive temperatures, both for a non-random and a random
magnetic field.
• Part VI focusses on lattice systems in small volumes at low temperatures. In this
setting, energy dominates entropy. An abstract set-up is put forward, and universal
metastability theorems are derived under general hypotheses. These hypotheses
are subsequently proved for Ising spins subject to Glauber dynamics and lattice
gases subject to Kawasaki dynamics.
• Part VII extends the results in Part VI to lattice systems in large volumes at low
temperatures. In large volumes, spatial entropy comes into play, which compli-
cates the analysis. Both for Glauber dynamics and Kawasaki dynamics the key
quantities controlling metastable behaviour can be identified, but at the cost of a
severe restriction on the starting measure of the dynamics.
• Part VIII looks at metastable behaviour of lattice systems in small volumes at high
densities, in particular the zero-range process.
• Part IX lists a number of challenges for future research, both within metastability
and beyond. It describes systems that are presently too hard to deal with in detail,
but are expected to come within reach in the next few years. In particular, we
look at post-nucleation growth for Ising spins subject to Glauber dynamics (limiting
shape of large droplets) and at continuum particle systems with pair interactions
(crystallisation), for which a number of results are already available.
Along the way we will encounter a variety of ideas and techniques from proba-
bility theory, analysis and combinatorics, including martingale theory, variational
calculus and isoperimetric inequalities. It is the combination of physical insight and
mathematical tools that allows us to make progress, in the best tradition of
mathematical physics.
Throughout the book we only consider classical stochastic dynamics. It would be
interesting to consider quantum stochastic dynamics as well, but this is beyond the
scope of the book. We also do not address issues related to numerical simulation,
which is rather delicate due to the extremely long time scales involved.
It is a pleasure to thank the colleagues with whom we have worked on metasta-
bility over the past 15 years: Florent Barret, Alessandra Bianchi, Michael Eck-
hoff, Alessandra Faggionato, Alexandre Gaudillière, Véronique Gayrard, Dmitry
Ioffe, Sabine Jansen, Oliver Jovanovski, Markus Klein, Roman Kotecký, Francesco
Manzo, Sylvie Méléard, Patrick Müller, Francesca Nardi, Rebecca Neukirch, Enzo
Olivieri, Elena Pulvirenti, Elisabetta Scoppola, Martin Slowik, Cristian Spitoni, Sia-
mak Taati and Alessio Troiani. Special thanks are due to Aernout van Enter for
reading the entire text and providing a host of valuable comments.
Anton Bovier was supported by the German Research Foundation (DFG) through
the Collaborative Research Centers 611 Singular Phenomena and Scaling in Math-
ematical Models and 1060 The Mathematics of Emergent Effects, by the Haus-
dorff Center for Mathematics (HCM) in Bonn, by the German-Israeli Foundation
(GIF), and by the Lady Davis Fellowship Trust (Haifa and Jerusalem). Frank den
Hollander was supported by the Netherlands Organisation for Scientific Research
(NWO) through Gravitation Grant 024.002.003-NETWORKS, and by the European
Research Council (ERC) through Advanced Grant 267356-VARIS Variational Ap-
proach to Random Interacting Systems.
The writing of this book started in 2011 while Frank den Hollander held a Bonn
Research Chair at the HCM, and continued in 2012 while Anton Bovier held a
Kloosterman Chair at the Mathematical Institute of Leiden University. Regular visits
back and forth took place in 2013–2015. The authors thank their home institutions
for hospitality.
Contents
Part I Introduction
1 Background and Motivation
1.1 Phenomenology
1.2 Condensation and magnetisation: from gases to ferromagnets
1.3 Historical perspective
1.3.1 Early achievements
1.3.2 The pathwise approach
1.3.3 The spectral approach
1.3.4 The potential-theoretic approach
1.3.5 The computational approach
2 Aims and Scopes
2.1 Two paradigmatic models
2.1.1 Kramers model: Brownian motion in a double-well
2.1.2 Finite-state Markov processes
2.2 Model reduction
2.3 Variational point of view
2.4 Specific models
2.5 Related topics
Part II Markov Processes
3 Some Basic Notions from Probability Theory
3.1 Probability and measures
3.1.1 Probability spaces
3.1.2 Random variables
3.1.3 Integrals
3.1.4 Spaces of integrable functions
3.1.5 Convergence
3.1.6 Radon-Nikodým derivative
3.2 Stochastic processes
5.4.3 Uniqueness
5.4.4 Existence
5.5 Itō calculus
5.5.1 Square-integrable continuous martingales
5.5.2 Stochastic integrals for simple processes
5.5.3 Itō formula
5.6 Stochastic differential equations
5.6.1 Strong solutions
5.6.2 Existence and uniqueness of strong solutions
5.6.3 The Doob transform
5.6.4 The Girsanov theorem
5.7 Stochastic partial differential equations
5.7.1 The stochastic Allen-Cahn equation
5.7.2 Discretisation
5.8 Bibliographical notes
6 Large Deviations
6.1 Large deviation principles
6.2 Path large deviations for diffusion processes
6.2.1 Brownian motion
6.2.2 Brownian motion with drift
6.2.3 Diffusion processes
6.3 Path large deviations for stochastic partial differential equations
6.4 Path large deviations for Markov processes
6.5 Freidlin-Wentzell theory
6.5.1 Properties of action functionals
6.5.2 Crossing and exit problems
6.5.3 Metastability
6.6 Bibliographical notes
7 Potential Theory
7.1 The Dirichlet problem: discrete time
7.1.1 Definition
7.1.2 Green function, equilibrium potential and measure
7.1.3 Reversibility
7.1.4 One-dimensional nearest-neighbour random walks
7.2 The Dirichlet problem: continuous time
7.2.1 Definition
7.2.2 Countable state space
7.2.3 Diffusion processes
7.2.4 Reversible Markov processes
7.2.5 One-dimensional diffusions
7.3 Variational principles
7.3.1 The Dirichlet principle
7.3.2 The Thomson principle
7.3.3 The Berman-Konsowa principle
1.1 Phenomenology
Similarly, in chemistry the mixing of two reactive compounds (like oxygen and
hydrogen) may lead to a metastable state that can persist for a very long time,
but when triggered (by a spark) transits very rapidly to the stable state (water).
In economics, stock prices may persist for a long time at high levels, in spite of
economists' warnings that the market is “overheated”, until a “crash” occurs and
prices drop within days or hours to much lower levels. In economics jargon, there
was a “bubble” that has collapsed. Such phenomena are ubiquitous. The common
features are: a large variability in the moment of the onset of some dramatic change
in the properties of the system, a much shorter time for the actual transition (i.e.,
between the onset of a noticeable change and the moment a new state is reached),
and unpredictability of the time of the onset of the transition.
What is behind all of this? A simple thought experiment reveals a possible mech-
anism. Suppose that in a mountain range there are two valleys, A and B. Reaching
valley B from valley A requires climbing up 1 km on a steep slope. An experienced
mountaineer will easily make this journey in a fairly predictable time, say, 4 hours.
Now suppose that there is a tourist visiting valley A, a drunken tourist who wanders
around in the valley without any particular purpose and occasionally climbs up the
slope towards valley B. However, as he does so, he will encounter certain obstacles
or will get tired, and just slips back to the base of the valley (where, for the sake of
the argument, he will have some drinks to retain his confused state of mind). As our
tourist is not terribly interested in getting to the second valley anyway, we may as-
sume that he does not learn anything from his excursions up the slope and after each
visit to the local pub finds himself in just the same condition as before. Let us now
assume that, after many days, we find our tourist in valley B. What has happened?
Well, after many failed excursions uphill, and equally many returns to the pub, on
a lucky day he just happened to climb straight up the slope and then tumbled down
into valley B. Should anyone have observed this final successful climb, he might
not have been able to distinguish the tourist from the experienced mountaineer, who
would have taken the same path on purpose in the first place. A rough estimate re-
veals how long it took the tourist to get to valley B: the number of attempts (returns
to the pub) will be on average 1/p, where p is the probability of getting over the edge before
returning to the pub. If the average time Δ of such an unsuccessful excursion is
not too tiny (say, 30 minutes), then the average time Δ/p until the final crossing can
run into years when p is small (as is to be expected). Moreover, given the fact that
our tourist is not learning anything (and given that no other conditions are changing
over time, such as the weather), the fact that after each return to the pub the tourist
is back to where he started implies that the number of failed attempts, and hence
the total time until the final crossing, are essentially unpredictable. Thus, in this
simple example we recognise and understand all the features of the metastability
phenomenon mentioned above. As we will see later, this little thought experiment
indeed captures all the crucial features behind metastability.
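The rough estimate in this thought experiment can be checked numerically. The following is a minimal sketch under simplifying assumptions that are ours and not part of the argument above: every excursion, including the final successful one, takes exactly the time Δ, and excursions are independent with success probability p. The function names and the values p = 0.01 and Δ = 0.5 hours are purely illustrative.

```python
import random
import statistics


def attempts_until_success(p, rng):
    """Number of excursions up to and including the first successful one."""
    attempts = 1
    while rng.random() >= p:  # each excursion succeeds with probability p
        attempts += 1
    return attempts


def crossing_times(p, delta, runs, seed=0):
    """Sample the total time until the crossing, assuming each excursion lasts delta."""
    rng = random.Random(seed)
    return [delta * attempts_until_success(p, rng) for _ in range(runs)]


if __name__ == "__main__":
    p, delta = 0.01, 0.5  # hypothetical values: 1% success chance, 30-minute excursions
    times = crossing_times(p, delta, runs=20000)
    print("empirical mean time:", statistics.mean(times))
    print("predicted mean delta/p:", delta / p)
    # A standard deviation comparable to the mean reflects the unpredictability
    # of the moment of the crossing.
    print("empirical standard deviation:", statistics.pstdev(times))
```

In this sketch the number of attempts is geometrically distributed, so the empirical mean should be close to Δ/p while the standard deviation is of the same order as the mean, which is one way to see why the time of the crossing is essentially unpredictable.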
The first main challenge of mathematics is of a qualitative nature, namely, to
explain why in a large variety of systems the same type of metastable behaviour is
observed. Many such systems can be described from first principles as many-body
systems subject to classical or quantum dynamics. While the corresponding equa-
tions of motion are known, they are typically very hard to analyse, in particular,
over the extremely long time intervals in which metastable behaviour occurs. Also,
metastability manifestly exhibits randomness (the unpredictable time of the occur-
rence of the transition), the source of which may be difficult to extract from the
underlying deterministic dynamics. It may be due to quantum effects, or external
perturbations of a (non-closed) system, or the effect of unresolved high-frequency
degrees of freedom. A first simplification is to pass to a description of the system
as a stochastic dynamics. The justification of such a description is an interesting
topic in itself, which will not be addressed in the present book. Rather, a stochastic
model of the dynamics of the systems we are interested in will be the starting point
of our analysis of metastability. Even more restrictively, we will limit our analysis
to Markov processes. Still, even within this restricted setting, there is a wide variety
of different models where metastability emerges and where the explanation of the
underlying universality is possible.
The second main challenge is of a quantitative nature. Given the parameters of
some underlying model, we would like to be able to compute as precisely as possible
the quantities controlling the metastable phenomena, in particular, the distribution
of the times of the transitions between metastable and stable states. Again,
this is hard because most metastable systems of practical relevance are many-body
systems whose dynamics is not easy to capture, either analytically or numerically,
and because extremely long time scales may be involved. (See Newman and
Barkema [194] for an overview of Monte Carlo methods in statistical physics.)
Understanding metastability on the quantitative level is of considerable practical
interest, as it affects the behaviour and functioning of many systems in nature.
1.2 Condensation and magnetisation: from gases to ferromagnets
From the point of view of statistical mechanics, metastability is the dynamical sig-
nature of a first-order phase transition. In equilibrium statistical physics, a first-
order phase transition is said to occur if a system is sensitive to the change of a
parameter (or a boundary condition), in the sense that certain extensive variables
(such as density or magnetisation) show a discontinuity as functions of certain in-
tensive variables (such as pressure or magnetic field). Dynamically, this sensitivity
manifests itself in the fact that, as the parameter is varied across the phase transition
curve, the system remains for a considerable amount of time (typically random) in
the “old phase” before it suddenly changes to the “new phase”, the true equilib-
rium phase. In other words, the extensive variables change their value as a function
of time with a random delay. Thus, the study of metastability can be seen as part
of non-equilibrium statistical physics. Let us discuss this in a bit more detail in an
example.
The most commonly observed occurrence of metastability is the phenomenon of
condensation of over-saturated water vapour (rainfall). The common explanation of
what is going on can be found in elementary physics textbooks. If water vapour is
Fig. 1.1 Effective free energy ΔG(r) of a droplet as a function of its radius r (middle curve). The
threshold for condensation is the critical radius r∗.
cooled below the critical temperature, then the free energy of the gas-phase is larger
than that of the liquid-phase. Therefore, thermodynamics predicts a transition from
the gas-phase to the liquid-phase. However, this transition can only be achieved
by an aggregation of water molecules. This aggregation has to start somewhere in
the system with the formation of small droplets of liquid. The key point is that the
effective free energy of such a droplet is made up of two terms: (1) the difference
between the bulk free energies of the two phases; (2) the interfacial energy between
the two phases. This leads to a formula of the type (see Fig. 1.1)
Fig. 1.2 Hysteresis in ferromagnets: plot of the magnetisation m versus the magnetic field h. The
dotted pieces refer to the magnetisation of the metastable states. The arrows refer to the metastable
crossovers. The symbols G and K stand for Glauber and Kawasaki dynamics (to be treated in
Part VI), for which the magnetisation is, respectively, not preserved and preserved.
very naturally and can be fully quantified. In this context, the excess free energy of
a critical droplet is called the free energy barrier for the onset of the phase transi-
tion. The presence of such a barrier is the reason for the metastable behaviour, and
thermal fluctuations are the driving force for transitions out of the metastable state.
The formation of a critical droplet (i.e., a droplet of critical radius) is the minimal
effort these fluctuations have to make to initiate the phase transition dynamically.
Of course, the same explanation applies when a liquid freezes and—reversed in
time—when a liquid evaporates or a solid melts. The fine details are different, but
the overall picture is the same.
Another situation where the same principles are at work is magnetic hysteresis,
which is treated in Chaps. 13, 17 and 19 (see Fig. 1.2). When a ferromagnetic mate-
rial is placed in a magnetic field h it magnetises, i.e., the atomic magnetic moments
(“spins”) tend to align with the field. At temperatures below the so-called Curie
temperature, this magnetised state persists (forever) even when the field is turned
off: the spontaneous magnetisation is m . This persistence is the sign of a first-order
phase transition. Moreover, even when afterwards the direction of the field is in-
verted, the magnetisation will remain in the old direction and will only align with
the new direction after some time. The reason is the same as for the supersaturation
of a gas: the ferromagnetic material has to create local droplets with the opposite
magnetisation, and these droplets become energetically favourable and hence start
to grow only after they have acquired some minimal size. The creation of such
critical droplets is again the work of thermal fluctuations.
Ferromagnets are particularly easy to manipulate and very precise measurements
are possible. Figure 1.2 is the paradigmatic figure for metastable behaviour, and is
held to be ubiquitous in all situations of metastability.
Fig. 1.3 Chemical reaction from state S1 to state S2 via transition state S∗ with reaction rates k∗,
k1 and k2
R = exp[−E/kT ]. (1.3.1)
Here, E is the activation energy associated with the reaction (in joules per
molecule), T is the absolute temperature (in degrees Kelvin), and k is the Boltz-
mann constant. (If the activation energy is measured in joules per mole, then k is to
be replaced by what is called the gas constant.) In 1889 Arrhenius [8] proposed a
refinement of (1.3.1), namely,
R = A exp[−E/kT ], (1.3.2)
where the prefactor A is called the amplitude. He also provided the following
physical interpretation of (1.3.2). For molecules to react they must first acquire
a minimum amount of energy, say E. At absolute temperature T , the fraction of
molecules that have a kinetic energy larger than E is proportional to exp[−E/kT ],
according to the Maxwell-Boltzmann distribution of statistical mechanics. Hence,
exp[−E/kT ] is the probability that a single collision causes a reaction. If A is in-
terpreted as the average number of collisions per time unit, then R is the average
number of collisions per time unit that cause a reaction, and the inverse 1/R is the
average reaction time.
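To get a feeling for how strongly the rate in (1.3.2) depends on temperature, the formula can be evaluated directly. The snippet below is only an illustrative sketch: the function names are ours, and the amplitude, activation energy and temperatures are hypothetical numbers chosen for the example, not values taken from the text.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant in joules per kelvin


def arrhenius_rate(amplitude, energy, temperature, k=BOLTZMANN_J_PER_K):
    """Reaction rate R = A exp[-E/kT] as in (1.3.2)."""
    return amplitude * math.exp(-energy / (k * temperature))


def mean_reaction_time(amplitude, energy, temperature, k=BOLTZMANN_J_PER_K):
    """Average reaction time 1/R."""
    return 1.0 / arrhenius_rate(amplitude, energy, temperature, k)


if __name__ == "__main__":
    amplitude = 1.0e13  # hypothetical number of collisions per second
    energy = 1.0e-19    # hypothetical activation energy in joules per molecule
    for temperature in (250.0, 300.0, 350.0):
        print(temperature, mean_reaction_time(amplitude, energy, temperature))
```

Because the temperature enters through the exponent, a moderate change in T changes the average reaction time by orders of magnitude, which is the qualitative content of the Arrhenius law.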
Equation (1.3.2) goes under the name of Arrhenius equation or Arrhenius law.
The same equation applies to other situations where an energy barrier is involved,
such as the phenomena of condensation and magnetisation mentioned in Sect. 1.2.
Still other examples are the motion of dislocations in crystals, the ageing of spin
glasses and the folding of proteins, which underlines the universal character of
the Arrhenius formula. In Part VI we will see that (1.3.2) provides an excellent
approximation of the average metastable crossover time for a large class of models
with a stochastic dynamics in small volumes at low temperatures.
Several modifications of the Arrhenius equation have been proposed over the
years. One modification is a temperature-dependence of the prefactor of the form
A(T/T0)^α, with T0 a reference temperature and α ∈ ℝ a dimensionless exponent. In
Part VII we will encounter a model with a stochastic dynamics in large volume at
low temperature where this form of the prefactor is needed, with A proportional to
the volume and α = 1. In general, however, this form of the prefactor is neither easy
to explain theoretically nor easy to verify experimentally. Another modification is a
stretched exponential of the form
R = A exp[−(E/kT)^ᾱ], (1.3.3)
decay time. The main tool in [199] is the restricted ensemble, which is defined
to be the Gibbs measure conditioned on the particle configuration lying in a suit-
able subset R of the configuration space, representing the metastable state, e.g.
corresponding to a supersaturated vapour whose density is conditioned to lie be-
low the density of the liquid. The rate at which the stochastic dynamics brings
the system outside R is maximal at time zero. This incipient rate plays the role
of an escape rate λ. The lifetime of the metastable state is identified with 1/λ, and
is an inherently dynamical quantity. The choice of R must be such that: (1) the
Gibbs measure conditioned on R describes a pure phase; (2) λ is very small; (3) R
has a very small weight under the unconditional Gibbs measure. For the spin-
model with Kac-potential, Penrose and Lebowitz were able to compute λ explic-
itly (on a rough scale) and show that 1/λ coincides with the activation free en-
ergy needed to move out of R. Based on these results, an early attempt to ax-
iomatise metastability was made by Sewell [218, Chap. 6]. For further details we
refer the reader to Penrose and Lebowitz [200] and to Olivieri and Vares [198,
Sect. 4.1].
The pathwise approach to metastability was initiated in the late 1960’s and early
1970’s by Freidlin and Wentzell. They introduced the theory of large deviations on
path space in order to analyse the long-term behaviour of dynamical systems un-
der the influence of weak random perturbations. Their realisation that metastable
behaviour is controlled by large deviations of the random processes driving the
dynamics has permeated most of the mathematical literature on the subject since.
A comprehensive account of this development is given in their 1984 monograph
Random Perturbations of Dynamical Systems [115]. The application of these ideas
in a statistical physics context was pioneered in 1984 by Cassandro, Galves, Olivieri
and Vares [51]. They realised that the theory put forward by Freidlin and Wentzell
could be applied to study metastable behaviour of interacting particle systems. This
paper led to a flurry of results for a variety of Markovian lattice models, which
are described at length in the 2005 monograph Large Deviations and Metastabil-
ity [198] by Olivieri and Vares. This work provides the key elements of the symbio-
sis between statistical physics, large deviation theory and metastability.
The advantage of the pathwise approach is that it gives very detailed informa-
tion on the metastable behaviour of the system. By identifying the most likely path
between metastable states (typically, the global minimiser of some “action inte-
gral” that constitutes the large deviation rate function in path space), the time of the
crossover can be determined and information can be obtained on what the system
does before and after the crossover (“tube of typical trajectories”). The drawback
of the pathwise approach is that it is generally hard to identify and control the rate
function, especially for systems with a spatial interaction, for which the dynamics is
non-local. Consequently, the pathwise approach typically leads to relatively crude
results on the crossover time.
vation is that the relevant equilibrium potentials can, to the extent necessary, be in
turn bounded from above and below by capacities with the help of renewal equa-
tions. This is absolutely crucial, as it avoids the formidable problem of solving the
boundary value problems through which the equilibrium potentials are defined. Ef-
fectively, it means that estimates of the average crossover time can be derived that
are much sharper than those obtained via the pathwise approach.
Capacities are expressed with the help of Dirichlet forms, which are functionals
of the space of potentials, respectively, flows, and correspond to the energy associ-
ated with the network. These Dirichlet forms have the dimension of the configura-
tion space, and thus are typically very high-dimensional. However, it turns out that
the ensuing high-dimensional variational principles for the capacity often can be
reduced to low-dimensional variational principles when the system is metastable.
This comes from the fact that metastable crossovers occur near saddles connect-
ing metastable sets of configurations and, consequently, the equilibrium potential
is very close to 1 or to 0 away from these saddles. As a result, the full variational
principle reduces to a simpler variational principle, which only lives on the config-
urations close to the saddle and captures the fine details of the dynamics when it
makes the crossover. In Parts IV–VIII we will see plenty of examples of this reduc-
tion. In some cases the simpler variational problem is so low-dimensional that it can
be solved explicitly.
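The reduction to the neighbourhood of the saddle can be illustrated in the simplest possible setting. The sketch below rests on assumptions that are not spelled out in this passage: we take a nearest-neighbour Metropolis walk on the points 0, ..., N with a hypothetical energy landscape, we use the standard Dirichlet form written as a sum over edges of conductance times the squared increment of the test function, and we use the standard characterisation of the capacity between the two end points as the minimum of this form over functions equal to 1 at one end and 0 at the other. All names and numbers are illustrative.

```python
import math


def conductances(energy, beta):
    """Edge weights c_x = mu(x) p(x, x+1) for a Metropolis walk on 0..N.

    mu(x) = exp(-beta * energy[x]) is left unnormalised; normalising it
    would rescale every quantity below by the same constant.
    """
    mu = [math.exp(-beta * e) for e in energy]
    return [0.5 * min(mu[x], mu[x + 1]) for x in range(len(mu) - 1)]


def dirichlet_form(c, h):
    """Energy of the test function h: sum over edges of c_x (h(x+1) - h(x))^2."""
    return sum(cx * (h[x + 1] - h[x]) ** 2 for x, cx in enumerate(c))


def equilibrium_potential(c):
    """Harmonic function with h(0) = 1 and h(N) = 0 (resistances in series)."""
    r = [1.0 / cx for cx in c]
    total = sum(r)
    h, tail = [], total
    for rx in r:
        h.append(tail / total)
        tail -= rx
    h.append(0.0)
    return h


def capacity(c):
    """Capacity between the end points: inverse of the total resistance."""
    return 1.0 / sum(1.0 / cx for cx in c)


if __name__ == "__main__":
    energy = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, -1.0]  # hypothetical landscape, saddle at x = 3
    c = conductances(energy, beta=4.0)
    h = equilibrium_potential(c)
    n = len(energy) - 1
    step = [1.0 if x <= 2 else 0.0 for x in range(n + 1)]  # jumps only next to the saddle
    print("capacity:", capacity(c))
    print("Dirichlet form of equilibrium potential:", dirichlet_form(c, h))
    print("upper bound from step function at the saddle:", dirichlet_form(c, step))
    print("equilibrium potential:", [round(v, 4) for v in h])
```

In this toy example the equilibrium potential is close to 1 on one side of the saddle and close to 0 on the other, and a crude test function that only changes next to the saddle already gives an upper bound of the same order as the capacity.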
The quantitative success of the potential-theoretic approach, relying on tractable
variational principles for capacities, also entails its effective limitation to the case of
reversible Markov processes. While variational characterisations of capacities are
known also for non-reversible Markov processes (see Sect. 7.3), they are far more
complicated and difficult to use than their reversible counterparts. Some attempts in
this direction have been made by Eckhoff [101, 102], and more recently by Gaudil-
lière and Landim [121] and Slowik [220]. This area is wide open for future research.
Historically, the potential-theoretic approach has its roots in the early work by
Kramers [157], who performed precise computations of metastable crossover times
in the context of a Brownian motion in a double-well potential. Such explicit so-
lutions of the Dirichlet problems involved are, however, possible only in the one-
dimensional setting. There have been numerous computations in higher-dimensional
settings, based on formal perturbation theory, which can be seen as precursors of the
potential-theoretic approach (see e.g. Matkowsky and Schuss [181], Matkowsky,
Schuss and Tier [182], Knessl, Matkowsky, Schuss and Tier [153], and the discus-
sion in Maier and Stein [169]).
The potential-theoretic approach also connects nicely to the spectral approach.
As we will see in Chaps. 8 and 11, in many cases the spectral assumptions of Davies
are a consequence of metastability as characterised by capacities.
One of the first mathematical models for metastability was proposed in 1940 by
Kramers [157]. It consists of the one-dimensional diffusion equation (or Langevin
equation in physics terminology)
\[ dX_t = b(X_t)\,dt + \sqrt{2\varepsilon}\,dB_t, \tag{2.1.1} \]
This formula fits the classical Arrhenius law with activation energy $E = W(z^*) - W(u)$, amplitude $A = 2\pi/\sqrt{[-W''(z^*)]\,W''(u)}$ and inverse temperature $\beta = 1/kT = 1/\varepsilon$. Note that the flatter $W$ is near $z^*$ and $u$, the larger is the amplitude:
flatness slows down the crossover at $z^*$ and increases the number of returns to $u$.
1 In fact, (2.1.1) emerges as a special case of the more general equation considered by Kramers,
namely, the Ornstein-Uhlenbeck equation $dX_t = V_t\,dt$, $\mu^{-1}\,dV_t = -dX_t + b(X_t)\,dt + \sqrt{2\varepsilon}\,dB_t$,
where Vt denotes the velocity at time t and μ is a friction parameter. This equation gives rise to
(2.1.1) in the limit as μ → ∞. Thus, (2.1.1) can be seen as the equation of motion of a particle
moving under the influence of a friction force, a gradient force and a random force in the limit
where the friction becomes infinitely strong.
2.1 Two paradigmatic models 17
Fig. 2.1 A double-well potential with a local minimum at u, a global minimum at v and a saddle
point at z∗
Formula (2.1.2) exhibits a structure that is typical for metastable systems. There
is an exponential term, here given by exp[(W (z∗ ) − W (u))/ε], which provides the
leading asymptotic behaviour. The pathwise approach to metastability, which is
based on large deviation theory, is typically able to identify this term, sometimes
referred to as the exponential asymptotics, by showing that
\[ \varepsilon \ln E_u[\tau_v] = [1 + o(1)]\,[W(z^*) - W(u)], \qquad \varepsilon \downarrow 0. \tag{2.1.3} \]
However, identifying the prefactor, here given by $2\pi/\sqrt{[-W''(z^*)]\,W''(u)}$, is in
general a far more subtle problem. It is the ambition of the potential-theoretic approach
exhibited in this book to provide a unified framework that allows one to obtain
rigorous asymptotic formulas as in (2.1.2) with an explicit prefactor for a wide class
of metastable systems. We will see plenty of examples in Parts VI–VII.
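The role of the prefactor can be checked numerically in the one-dimensional case. The following is a minimal sketch, not part of the original text, under the assumptions that the drift is b = -W' and that W is the hypothetical double-well potential W(x) = x^4/4 - x^2/2 (minima at u = -1 and v = 1, saddle at z* = 0). It compares the classical one-dimensional integral representation of the mean hitting time with the asymptotic value 2π/√([-W''(z*)]W''(u)) exp[(W(z*) - W(u))/ε]. The grid size and the truncation of the lower integration limit are illustrative choices.

```python
import numpy as np

def W(x):
    """Hypothetical double-well potential: minima at -1 and +1, saddle at 0."""
    return x**4 / 4 - x**2 / 2

def exact_mean_time(eps, u=-1.0, v=1.0, lower=-4.0, n=5001):
    """E_u[tau_v] = (1/eps) int_u^v e^{W(y)/eps} int_{-inf}^y e^{-W(z)/eps} dz dy.

    The lower limit -inf is truncated at `lower`, where the integrand is negligible.
    """
    x = np.linspace(lower, v, n)
    h = x[1] - x[0]
    g = np.exp(-W(x) / eps)
    # Cumulative trapezoidal rule for the inner integral.
    inner = np.concatenate(([0.0], np.cumsum(0.5 * h * (g[1:] + g[:-1]))))
    keep = x >= u - h / 2            # outer integral runs over [u, v] only
    f = np.exp(W(x[keep]) / eps) * inner[keep]
    return 0.5 * h * np.sum(f[1:] + f[:-1]) / eps

def kramers_time(eps):
    """Asymptotic value; for W above: W(z*) - W(u) = 1/4, W''(z*) = -1, W''(u) = 2."""
    return 2 * np.pi / np.sqrt(1.0 * 2.0) * np.exp(0.25 / eps)

# Ratio of the numerically integrated mean time to the asymptotic formula.
ratio = exact_mean_time(0.05) / kramers_time(0.05)
```

For small ε the ratio should be close to 1, which is what the 1 + o(1) factor in front of the prefactor expresses.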
The multi-dimensional generalisation of (2.1.2) is attributed to Eyring and is
called the Eyring-Kramers formula (see Glasstone, Laidler and Eyring [127], Wei-
denmüller and Zhang [236], Maier and Stein [170]). Actually, Eyring’s so-called
transition-state theory [106] is based on quantum-mechanical considerations and is
different from the classical theory of Kramers. It interprets the potential as a re-
stricted quantum-mechanical free energy. For a historical discussion, see Pollak and
Talkner [201].
can be reasonably approximated by the first hitting time τv of the local minimum v
starting from the local minimum u, and vice versa for τu . As we will see in Parts IV–
V, in the limit as ε ↓ 0 the times τv and τu normalised by their expectations tend
to exponentially distributed random variables. This means that a rough approxima-
tion of the long-term behaviour of the Kramers model is given by a continuous-time
Markov chain with state space {u, v} and transition rates (see Fig. 2.2)
\[ \begin{aligned} c(u,v) &= e^{-r(u,v)/\varepsilon}, & r(u,v) &= W(z^*) - W(u), \\ c(v,u) &= e^{-r(v,u)/\varepsilon}, & r(v,u) &= W(z^*) - W(v). \end{aligned} \tag{2.1.4} \]
The average crossover times Eu [τv ] = 1/c(u, v) and Ev [τu ] = 1/c(v, u) capture the
leading order asymptotics of (2.1.1), as expressed in (2.1.3).
The above setting can be easily generalised to systems with multiple metastable
states. An effective model for such systems would be a continuous-time Markov
chain with a finite state space M = {m1 , . . . , mn } and transition rates c(mi , mj ) =
exp[−r(mi , mj )/ε], i, j = 1, . . . , n. The basic task of a theory of metastability is
to determine these transition rates from first principles. This idea was properly for-
malised by Freidlin and Wentzell [115] in the context of small random perturbations
of dynamical systems. In their theory the coefficients r(mi , mj ) are computed with
the help of the theory of large deviations on path space (see Chap. 6).
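To make the effective model concrete, here is a minimal sketch (our own illustration, with hypothetical barrier heights) of a finite-state continuous-time Markov chain with rates c(i, j) = exp[-r(i, j)/ε]. The helper names `generator` and `mean_hitting_times` are assumptions introduced for this example. Mean hitting times are obtained by solving the linear system (Qw)(i) = -1 off the target state.

```python
import numpy as np

def generator(r, eps):
    """Generator matrix Q with off-diagonal rates c(i, j) = exp(-r[i, j] / eps)."""
    Q = np.exp(-np.asarray(r, dtype=float) / eps)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))   # rows of a generator sum to zero
    return Q

def mean_hitting_times(Q, target):
    """Solve (Q w)(i) = -1 for i != target with w(target) = 0."""
    n = Q.shape[0]
    others = [i for i in range(n) if i != target]
    w = np.zeros(n)
    w[others] = np.linalg.solve(Q[np.ix_(others, others)], -np.ones(n - 1))
    return w

# Two-state example: state 0 plays the role of u, state 1 the role of v.
eps = 0.5
r = np.array([[0.0, 1.0],     # hypothetical barrier r(u, v) = 1
              [2.0, 0.0]])    # hypothetical barrier r(v, u) = 2
w = mean_hitting_times(generator(r, eps), target=1)
```

In the two-state case this reproduces the mean crossover time 1/c(u, v) = exp[r(u, v)/ε]; with more states the same linear system accounts for intermediate visits.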
Finite-state Markov chains with exponentially small transition rates have become
a subject of interest by themselves. By allowing the transition rates to be either ex-
ponentially small or equal to one, the above picture is capable of describing models
from statistical physics, in particular, spin-flip systems and lattice gases in finite
volumes at low temperatures, with ε playing the role of temperature (see Part VI).
The analysis of the metastability properties of finite-state Markov chains is a
non-trivial problem in itself. In the early 1990’s an intense activity in this direction
started with the work of Catoni and Cerf [53] and Olivieri and Scoppola [196, 197].
The methods used were, once again, large deviations on the path space of these
Markov chains. The difficulties that arise in the analysis of specific models are
essentially of a combinatorial nature: the optimal paths for transitions between
metastable states need to be identified and to be counted. This leads to interest-
ing problems, such as the discrete isoperimetric inequalities studied in Alonso and
Cerf [4], which we will encounter in Part VI. Only later, in the 2000’s, was it noted
that potential theory is very well suited to simplify the analysis and to draw sharper
results from the same input, as first pointed out by Bovier and Manzo [39] and later
amplified in Bovier, den Hollander and Nardi [31].
2.2 Model reduction 19
Kramers model and finite-state Markov chains can both be seen as simple toy models
that ought to be derivable from more complex realistic models of interest. Ideally,
we would like to start with many-body systems of interacting quantum particles.
This, however, is beyond present-day technology. The most complex models we will
consider in this book are classical interacting particle systems, in particular spin-
flip systems and lattice gases. These are Markov processes with a high-dimensional
(sometimes even infinite-dimensional) state space. Typically, the noise on the level
of the microscopic dynamics is not small, and the large-scale dynamics of the system
depends on the interplay between energetic and entropic effects.
It is generally accepted in the physics and chemistry literature that reduced models,
describing the time evolution of the system on an intermediate aggregate level
of mesoscopic variables, provide a good description of metastable behaviour.
Examples of such models are stochastic differential or partial differential equations
with small noise. Ideally, such effective dynamics should be derived with the help
of coarse-graining techniques, in the spirit of the renormalisation group theory in
equilibrium statistical mechanics (see the monograph by Presutti [202]). However,
this derivation is quite problematic, partly because renormalisation maps typically
do not preserve the Markovian nature of the dynamics. An even more serious issue is
that, while at least formally deterministic evolution equations (like the Allen-Cahn
equation [3] treated in Chap. 12) can be derived as scaling limits (i.e., laws of large
numbers in probabilistic language), a proper understanding of metastability requires
that we move beyond the deterministic limit and retain at least part of the random
perturbations of the dynamics. In the literature this goes by the name of diffusion
limits. However, there are subtle and poorly understood issues regarding the proper
choice of the noise term. In this book we will treat diffusion processes with small
noise as interesting models in their own right in Part IV. The issue of the derivation
of mesoscopic dynamics from microscopic dynamics in the mean-field setting will
be touched upon in Part V.
for the Green function in terms of capacities, the invariant measure and harmonic
functions. Since renewal arguments can be used to control harmonic functions by ca-
pacities, the key objects of the theory are capacities and the invariant measure. The
great advantage of this approach materialises in the context of reversible Markov
processes, i.e., Markov processes whose semi-groups are self-adjoint operators in
an L2 -space with respect to an invariant measure. This provides the main weapon
of the method: the Dirichlet principle expresses capacities as infima of the Dirichlet
form over classes of functions that are constrained by boundary conditions. The use-
fulness of this variational principle has long been recognised, e.g. in the analysis of
finite-state Markov chains. The book by Doyle and Snell [96] is an excellent source
for this material. For a more recent exposition, see Levin, Peres and Wilmer [163].
Part II of the book provides the background on potential theory of reversible Markov
processes that is necessary to deal with problems of metastability.
As a variational problem, the Dirichlet principle is a simple instrument to turn
physical intuition into upper bounds, and the sharpness of these upper bounds is
limited by diligence and imagination only. A particularly nice aspect of the Dirichlet
problem is that it satisfies certain monotonicity properties with respect to underlying
parameters. In fact, on this basis Berman and Konsowa [23] derived a dual varia-
tional principle that expresses capacities (in the case of a discrete state space) in
terms of suprema over flows (similar to, but different from the better known Thom-
son principle), which we call the Berman-Konsowa principle. As an upshot, the
latter allows for the derivation of lower bounds that complement the upper bounds
obtained via the Dirichlet principle. It is a rather remarkable fact that in many ex-
amples upper and lower bounds can be obtained for the metastable crossover time
that differ by a multiplicative factor of the form 1 + o(1) only, where o(1) tends to
zero as the time scale of the metastable system tends to infinity. We will see these
ideas at work in a variety of examples throughout the book. Part III outlines the
basic techniques that are needed to implement these ideas.
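The interplay of upper and lower bounds can be illustrated on a small network. The sketch below is our own example, not taken from the book: it assumes symmetric conductances C[x, y] (in the reversible setting these would be μ(x)p(x, y)) on a hypothetical four-node graph with two parallel routes from a = 0 to b = 3. The capacity is the Dirichlet form of the equilibrium potential, any test function with the right boundary values gives an upper bound, and for the lower bound we use the better known Thomson principle with a unit flow along a single path rather than the Berman-Konsowa principle.

```python
import numpy as np

def dirichlet_form(C, f):
    """E(f) = 1/2 sum_{x,y} C[x, y] (f(x) - f(y))^2 for symmetric conductances C."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(C * diff**2)

def equilibrium_potential(C, a, b):
    """Harmonic function on the network with h(a) = 1 and h(b) = 0."""
    n = C.shape[0]
    L = np.diag(C.sum(axis=1)) - C            # weighted graph Laplacian
    free = [x for x in range(n) if x not in (a, b)]
    h = np.zeros(n)
    h[a] = 1.0
    # Harmonicity on the free nodes: L[free, free] h_free = -L[free, a].
    h[free] = np.linalg.solve(L[np.ix_(free, free)], -L[free, a])
    return h

# Hypothetical network: routes 0-1-3 and 0-2-3, all conductances equal to 1.
C = np.zeros((4, 4))
for x, y in [(0, 1), (1, 3), (0, 2), (2, 3)]:
    C[x, y] = C[y, x] = 1.0

h = equilibrium_potential(C, a=0, b=3)
cap = dirichlet_form(C, h)                                  # exact capacity
upper = dirichlet_form(C, np.array([1.0, 1.0, 1.0, 0.0]))   # crude test function
lower = 1.0 / (1.0 / C[0, 1] + 1.0 / C[1, 3])               # unit flow along 0-1-3
```

Here the crude choices give 1/2 ≤ cap ≤ 2 while the true value is 1. Better test functions and flows tighten both sides.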
A key observation is that the analysis of the Dirichlet principle and the Berman-
Konsowa principle in essence is part of equilibrium statistical physics, since it deals
with acquiring the relevant knowledge of the free energy landscape of the system.
Potential theory links this knowledge to the metastable dynamics of the system,
which is part of non-equilibrium statistical physics.
process spends most of its time in a “condensed state”, i.e., a configuration where
most of the particles pile up on a single site, and prove that the process evolves
via a “metastable hopping” of this pile from one site to another. Both the hopping
time and the hopping distribution are computed.
The different parts on applications can essentially be read independently and
have the following substructure:
• Part IV: (diffusions with small noise)
(10) Discrete diffusions
(11) Continuous diffusions ∗
(12) Stochastic partial differential equations ∗
• Part V: (coarse-graining in large volumes at positive temperatures)
(13) Curie-Weiss mean-field model
(14) Curie-Weiss in discrete random magnetic field
(15) Curie-Weiss in continuous random magnetic field ∗
• Part VI: (lattice systems in small volumes at low temperatures)
(16) General theory
(17) Glauber dynamics
(18) Kawasaki dynamics
• Part VII: (lattice systems in large volumes at low temperatures)
(19) Glauber dynamics ∗
(20) Kawasaki dynamics ∗
• Part VIII: (lattice systems in small volumes at high densities)
(21) Zero-range dynamics ∗
The chapters without ∗ concern models where the state space is simple (e.g. discrete
and finite) and a complete description of the metastable behaviour is achieved. The
chapters with ∗ concern models where the state space is not simple (e.g. continuous
and infinite) and only partial results are obtained.
The playground for our expedition into metastability is the theory of Markov pro-
cesses. Part II presents a summary introduction to this subject, with special emphasis
on what will be needed in Part III to describe metastability. The simplest examples
are Markov processes in discrete time and discrete space. The general theory is de-
veloped from there.
Chapter 3 recalls some basic notions from probability theory. Chapters 4–5 look
at Markov processes in discrete, respectively, continuous time, with the focus on
generators and semigroups, martingales, and Itō calculus. Chapter 6 gives a brief
introduction to large deviations, and looks at path large deviations for finite- and
infinite-dimensional diffusion processes via action integrals. Chapter 7 collects the
main ingredients from potential theory that are needed in the rest of the book, with
capacity playing a central role in the study of metastable transition times, and
variational principles for capacities being the main vehicles to estimate capacities.
Readers with a background in probability theory can skip Chaps. 3–6.
Chapter 3
Some Basic Notions from Probability Theory
In this chapter we recall some basic notions from probability theory in order to
set notation and to have easy references for later use. Proofs are mostly omitted.
Readers who are unfamiliar with the concepts appearing below should consult basic
textbooks on probability theory and stochastic processes (see Sect. 3.6 for a list of
possible references). Readers who are familiar may skip to Chap. 4.
Section 3.1 defines key ingredients such as probability spaces, random variables,
integrals and Radon-Nikodým derivative. Section 3.2 defines stochastic processes
and states the Daniell-Kolmogorov extension theorem. Section 3.3 defines condi-
tional expectation, conditional probability and conditional probability measure. Sec-
tions 3.4–3.5 list the main properties of martingales in discrete time, respectively
continuous time.
Definition 3.1 Let Ω be a space. A family A = {Aλ }λ∈I , with Aλ ⊂ Ω for all λ ∈ I
with I an arbitrary set, is called a class of Ω. A non-empty class of Ω is called an
algebra if:
(i) Ω ∈ A .
(ii) For all A ∈ A , Ac ∈ A .
(iii) For all A, B ∈ A , A ∪ B ∈ A .
If A is an algebra and
(iv) $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{A}$ whenever $A_n \in \mathcal{A}$ for all $n \in \mathbb{N}$,
then A is called a σ -algebra.
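On a finite space Ω the conditions (i)-(iii) can be checked mechanically, and every algebra is automatically a σ-algebra because countable unions reduce to finite ones. The following small sketch is our own illustration (the function name `is_algebra` is hypothetical) and tests the three conditions for a family of subsets.

```python
from itertools import combinations

def is_algebra(omega, family):
    """Check conditions (i)-(iii) of Definition 3.1 for subsets of a finite set."""
    omega = frozenset(omega)
    fam = {frozenset(a) for a in family}
    if omega not in fam:                              # (i) Omega belongs to the class
        return False
    if any(omega - a not in fam for a in fam):        # (ii) closed under complements
        return False
    # (iii) closed under pairwise (hence finite) unions
    return all(a | b in fam for a, b in combinations(fam, 2))

good = is_algebra({1, 2, 3, 4}, [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}])
bad = is_algebra({1, 2, 3, 4}, [{1}, {1, 2, 3, 4}])
```

The second family fails because the complement {2, 3, 4} of {1} is missing.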
In most instances one is concerned with the canonical setting where Ω is a topo-
logical space and F = B(Ω) is the Borel-σ -algebra of Ω.
One says that the Borel-σ -algebra is generated by the open sets of E.
A topological space endowed with a metric and its metric topology is called
a metric space. A metric space E is called complete if any Cauchy sequence in
E converges in E. E is called separable if it contains a countable subset that is
dense in E. The standard setting of probability theory is a complete, separable and
3.1 Probability and measures 29
The most useful application of Dynkin’s theorem is the observation that if two
probability measures are equal on a Π -system that generates the σ -algebra, then
they are equal on the σ -algebra (since the set on which the two measures coincide
forms a λ-system containing T ).
Dynkin’s lemma has a sometimes useful analogue for so-called monotone classes
of functions.
\[ P_X = P \circ X^{-1} \tag{3.1.3} \]
defines a probability measure on (E, G ), called the induced measure. Indeed, for
any B ∈ G , by definition
\[ P_X(B) = P\bigl(X^{-1}(B)\bigr) \tag{3.1.4} \]
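For a finite probability space the induced measure can be computed by summing the weights of all ω with the same image. A minimal sketch, assuming a discrete measure stored as a dictionary (the names `induced_measure`, `P_die` and `parity_law` are ours):

```python
from collections import defaultdict
from fractions import Fraction

def induced_measure(P, X):
    """Return P_X = P o X^{-1} for a finite measure P given as {omega: weight}."""
    PX = defaultdict(Fraction)        # missing keys start at Fraction(0)
    for omega, p in P.items():
        PX[X(omega)] += p             # the mass of omega is moved to X(omega)
    return dict(PX)

# A fair die and the random variable "parity of the outcome".
P_die = {omega: Fraction(1, 6) for omega in range(1, 7)}
parity_law = induced_measure(P_die, lambda omega: omega % 2)
```

The result assigns mass 1/2 to each of the values 0 and 1, in agreement with P_X(B) = P(X^{-1}(B)).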
Definition 3.11 Let (Ω, F ) be a measurable space, and let (E, B(E)) be a topo-
logical space equipped with its Borel-σ -algebra. Let X be an E-valued random
variable. Then σ (X) is the smallest σ -algebra such that X is measurable from
(Ω, σ (X)) to (E, B(E)).
3.1.3 Integrals
We next recall the notion of the integral of a measurable function (respectively, the
expectation value of a random variable). To do so we first introduce the notion of
simple functions:
\[ g(\omega) = \sum_{i=1}^{k} w_i\, 1_{A_i}(\omega). \tag{3.1.5} \]
Definition 3.14
(i) Let f be non-negative and measurable. Then
\[ \int_\Omega f\,d\mu = \sup_{g \in E_+,\; g \le f} \int_\Omega g\,d\mu \in \mathbb{R} \cup \{\infty\}. \tag{3.1.7} \]
We next state the key properties of the integral. The most fundamental property
is the monotone convergence theorem, which justifies the definition above.
Lemma 3.17 (Fatou’s lemma) Let (fn )n∈N be a sequence of measurable non-
negative functions. Then
\[ \int_\Omega \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_\Omega f_n\,d\mu. \tag{3.1.13} \]
measurable function X : Ω → R,
\[ \|X\|_{p,\mu} = \|X\|_p = \bigl(E[|X|^p]\bigr)^{1/p} = \Bigl(\int_\Omega |X|^p\,d\mu\Bigr)^{1/p}. \tag{3.1.17} \]
\[ \|X + Y\|_p \le \|X\|_p + \|Y\|_p. \tag{3.1.18} \]
Both inequalities follow from one of the most important inequalities in integra-
tion theory: Jensen’s inequality (see Fig. 3.2).
Proof If ϕ is convex then, for every y, there is a straight line below ϕ that touches
ϕ at (y, ϕ(y)), i.e., there exists an m ∈ R such that ϕ(x) ≥ ϕ(y) + (x − y)m. Choos-
ing x = X − E[X] + c and y = c, and taking expectations on both sides, we get
(3.1.20).
The case p = 2 is particularly nice, in that L2 is not only a Banach space but also
a Hilbert space. The point is that Hölder’s inequality for p = 2 yields
\[ E[XY] \le \sqrt{E[X^2]\,E[Y^2]} = \|X\|_2\,\|Y\|_2. \tag{3.1.22} \]
3.1.5 Convergence
Definition 3.24 Let (Xn )n∈N be a sequence of random variables with values in
some topological space, and let X be a random variable on the same space. The
sequence (Xn )n∈N is said to converge in law to X,
\[ X_n \xrightarrow{D} X, \tag{3.1.25} \]
if and only if the induced probability measures (PXn )n∈N converge weakly to PX .
Definition 3.25 Let (Xn )n∈N be a sequence of random variables with values in
some topological space, and let X be a random variable on the same space. The
sequence (Xn )n∈N is said to converge in probability to X if and only if, for any
ε > 0,
\[ \lim_{n\to\infty} P\bigl(|X_n - X| \ge \varepsilon\bigr) = 0. \tag{3.1.26} \]
Definition 3.26 Let (Xn )n∈N be a sequence of random variables with values in
some topological space, and let X be a random variable on the same space. The
sequence (Xn )n∈N is said to converge almost surely to X if and only if
\[ P\Bigl(\lim_{n\to\infty} X_n = X\Bigr) = 1. \tag{3.1.27} \]
Definition 3.27 Let (Xn )n∈N be a sequence of random variables with values in
some normed space, and let X be a random variable on the same space. Let
p ∈ (0, ∞). The sequence (Xn )n∈N is said to converge to X in Lp if and only if
\[ \lim_{n\to\infty} E\bigl[|X_n - X|^p\bigr] = 0. \tag{3.1.28} \]
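The different modes of convergence are genuinely different. A standard example, added here as an illustration and not taken from the text, is X_n = n 1_{[0,1/n]}(U) with U uniform on [0, 1]: it converges to 0 in probability, but not in L^p for any p ≥ 1. The sketch below evaluates both quantities exactly. The function names are ours, and p is restricted to positive integers to keep the arithmetic rational.

```python
from fractions import Fraction

def prob_exceeds(n, eps):
    """P(|X_n - 0| >= eps) for X_n = n * 1_{[0, 1/n]}(U), U uniform on [0, 1], eps > 0."""
    return Fraction(1, n) if eps <= n else Fraction(0)

def lp_distance(n, p):
    """E[|X_n - 0|^p] = n^p * P(U <= 1/n) = n^(p-1), for a positive integer p."""
    return Fraction(n) ** p * Fraction(1, n)

tail = [prob_exceeds(n, 0.5) for n in (10, 100, 1000)]   # tends to 0
l1 = [lp_distance(n, 1) for n in (10, 100, 1000)]        # stays equal to 1
```

The tail probabilities decrease like 1/n, whereas the L^1-distance to the limit stays equal to 1.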
Even almost sure convergence does not imply convergence of the integral of
the random variable without extra conditions. Lebesgue’s dominated convergence
Lemma 3.30 Let C be a class of random variables. Assume that, for some p > 1,
\[ \sup_{X \in C} E\bigl[|X|^p\bigr] = c < \infty. \tag{3.1.30} \]
Proof By Hölder’s inequality and Chebychev’s inequality, for all X ∈ C and p, q >
1 such that 1/p + 1/q = 1,
\[ E\bigl[|X|\,1_{|X|>K}\bigr] \le \bigl(E[|X|^p]\bigr)^{1/p} \bigl(P(|X| > K)\bigr)^{1/q} \le c^{1/p}\, c^{1/q}\, K^{-p/q}, \tag{3.1.31} \]
There are various equivalent ways in which stochastic processes can be defined, and
it is useful to keep these in mind. The standard way is as follows. We begin with an
abstract probability space (Ω, F , P). Next, we need a measurable space (S, B(S))
(typically a Polish space together with its Borel σ -algebra), where S is called the
state space. Finally, we need a set I , called the index set. A stochastic process with
state space S and index set I is a collection of (S, B(S))-valued random variables
(Xt )t∈I defined on (Ω, F , P). If I is either N0 = N ∪ {0}, Z, R+ = [0, ∞) or R,
then we may think of it as time. Depending on whether I is discrete or continuous,
we refer to (Xt )t∈I as a stochastic process with discrete or continuous time (see
Fig. 3.3).
Given a stochastic process as defined above, we can take a different perspective
and, for each ω ∈ Ω, view X(ω) as a map from I to S,
Lemma 3.35 Let $\mathcal{B}^I$ be the smallest σ-algebra that contains all subsets of $S^I$ of
the form
\[ C(A, t) = \{ x \in S^I : x_t \in A \} \tag{3.2.3} \]
with $A \in \mathcal{B} = \mathcal{B}(S)$, $t \in I$. Then $\mathcal{B}^I$ is the smallest σ-algebra such that all functions
$X_t : \Omega \to S$, $t \in I$, are measurable, i.e., $\mathcal{B}^I = \sigma(X_t, t \in I)$.
a cylinder set or, more precisely, a finite-dimensional cylinder set. If B is of the form
$B = \times_{t \in J} A_t$, $A_t \in \mathcal{B}$, then we call it a special cylinder.
\[ P_X = P \circ X^{-1}, \tag{3.2.5} \]
Definition 3.37 Let F (I ) denote the set of finite non-empty subsets of I . Then the
collection of probability measures
\[ \bigl\{ P_X^J : J \in F(I) \bigr\} \tag{3.2.7} \]
where $\pi_{J_1}$ denotes the canonical projection from $S^{J_2}$ to $S^{J_1}$. Then there exists a
unique measure $P$ on $(S^I, \mathcal{B}^I)$ such that, for all $J \in F(I)$,
\[ P \circ \pi_J^{-1} = P^J. \tag{3.2.9} \]
Note that we need not distinguish cases according to the nature of the set I .
(Recall from Definition 3.7 that a Π -system is a set that is closed under finite
intersections.)
In many cases the σ -algebra G with respect to which we are conditioning is the
σ -algebra σ (Y ) generated by some other random variable Y . In those cases we will
write
\[ E[X \mid \sigma(Y)] = E[X \mid Y] \tag{3.3.4} \]
and call this the conditional expectation of X given Y .
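On a finite probability space, E[X|Y] is obtained by averaging X over each level set of Y. The following sketch is our own illustration (the function `cond_exp` and the two-coin example are hypothetical). It computes this average and can be used to check the tower property E[E[X|Y]] = E[X].

```python
from collections import defaultdict
from fractions import Fraction

def cond_exp(P, X, Y):
    """Return omega -> E[X | Y](omega) for a finite measure P = {omega: weight}."""
    num, den = defaultdict(Fraction), defaultdict(Fraction)
    for omega, p in P.items():
        num[Y(omega)] += p * X(omega)     # integral of X over {Y = y}
        den[Y(omega)] += p                # probability of {Y = y}
    return {omega: num[Y(omega)] / den[Y(omega)] for omega in P}

# Two fair coin flips; X is the number of heads, Y is the first flip.
P = {(i, j): Fraction(1, 4) for i in (0, 1) for j in (0, 1)}
X = lambda omega: omega[0] + omega[1]
Y = lambda omega: omega[0]
ce = cond_exp(P, X, Y)
tower = sum(P[omega] * ce[omega] for omega in P)    # E[E[X | Y]]
```

By construction the result is constant on the level sets of Y, i.e., it is σ(Y)-measurable.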
Of course, these are just the analogues of the three basic convergence theorems
for ordinary expectations. A useful further property is the following lemma.
Lemma 3.43 Let X be integrable and let Y be bounded and G -measurable. Then
Lemma 3.44 Two σ -algebras G1 , G2 are independent if and only if, for all G2 -
measurable integrable random variables X,
However, Ω may depend on (An )n∈N and, since the space is not countable, it is
unclear whether there exists a set of full measure on which (3.3.12) holds for all
sequences. These considerations lead us to the definition of so-called regular con-
ditional probabilities.
Definition 3.45 Let (Ω, F , P) be a probability space and let G be a sub-σ -algebra.
A regular conditional probability measure or regular conditional probability on F
given G is a function P (ω, A), defined for all A ∈ F and all ω ∈ Ω, such that:
(i) For each ω ∈ Ω, P (ω, ·) is a probability measure on (Ω, F ).
(ii) For each A ∈ F , P (·, A) is a G -measurable function coinciding with the con-
ditional probability P(A|G ) almost everywhere.
The point is that if we have a regular conditional probability, then we can express
conditional expectations as expectations with respect to normal probability measures.
Theorem 3.46 With the notation above, if Pω [A] = P (ω, A) is a regular condi-
tional probability on F given G , then for an F -measurable integrable random
variable X,
\[ E[X \mid \mathcal{G}](\omega) = \int_\Omega X\,dP_\omega \quad \text{a.s.} \tag{3.3.13} \]
The question remains when regular conditional probabilities exist. A central re-
sult is the existence when Ω is a Polish space.
Theorem 3.47 Let (Ω, B(Ω), P) be a probability space with Ω a Polish space and
B(Ω) its Borel-σ -algebra. Let G ⊂ B(Ω) be a sub-σ -algebra. Then there exists a
regular conditional probability P (ω, A) given G .
One of the most fundamental and useful concepts in the theory of stochastic pro-
cesses is that of a martingale. Since this will play a major rôle in the remainder of
the book, we will spend some time to expose its main properties. As some subtleties
arise in continuous time, we begin with the simpler case of discrete time.
3.4.1 Definitions
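\[ \mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \dots \subset \mathcal{F}_\infty = \sigma\Bigl(\bigcup_{n \in \mathbb{N}_0} \mathcal{F}_n\Bigr) \subset \mathcal{F}, \tag{3.4.1} \]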
Filtrations and stochastic processes are closely linked together, in two ways.
Definition 3.49 A stochastic process X = (Xn )n∈N0 is called adapted to the filtra-
tion (Fn )n∈N0 if Xn is Fn -measurable for every n.
Definition 3.50 Let X = (Xn )n∈N0 be a stochastic process on (Ω, F , P). The nat-
ural filtration (Wn )n∈N0 with respect to X is the smallest filtration such that X is
adapted to it, i.e.,
Wn = σ (X0 , . . . , Xn ). (3.4.2)
If (i) and (ii) hold but, instead of (iii), it is only true that E[Xn |Fn−1 ] ≥ Xn−1 ,
respectively, E[Xn |Fn−1 ] ≤ Xn−1 , then the process X is called a submartingale,
respectively, a supermartingale.
We will next head for the fundamental theorem stating the impossibility of “win-
ning games” built on martingales.
The most pertinent fact about martingales is their stability under the martingale
transform:
For proofs of the above results and further details, see Rogers and Williams [208].
In this section we derive some fundamental inequalities for martingales. One of the
most useful ones is the following maximum inequality.
Proof Since this inequality is fundamental for many applications, we will include
its proof. The second inequality in (3.4.7) is trivial. To prove the first inequality,
define the sequence of disjoint events given by F0 = {Z0 ≥ c} and
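\[ F_k = \bigcap_{0 \le \ell < k} \{Z_\ell < c\} \cap \{Z_k \ge c\} = \bigl\{ \omega \in \Omega : \min(0 \le \ell \le n : Z_\ell \ge c) = k \bigr\}. \tag{3.4.8} \]
Then
\[ F = \Bigl\{ \sup_{0 \le k \le n} Z_k \ge c \Bigr\} = \bigcup_{k=0}^{n} F_k. \tag{3.4.9} \]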
3.4 Martingales in discrete time 47
\[ E[Z_n 1_F] = \sum_{k=0}^{n} E[Z_n 1_{F_k}] \ge c \sum_{k=0}^{n} P(F_k) = c\,P(F), \tag{3.4.11} \]
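The chain of inequalities c P(F) ≤ E[Z_n 1_F] ≤ E[Z_n] can be verified by exact enumeration in a small case. In the sketch below, which is our own example, Z_k = |S_k| with (S_k) the simple random walk started at 0. This is a non-negative submartingale, and all 2^n paths are enumerated with exact rational weights.

```python
from fractions import Fraction
from itertools import product

def doob_terms(n, c):
    """Return (c * P(F), E[Z_n 1_F], E[Z_n]) for Z_k = |S_k|, F = {max_{k<=n} Z_k >= c}."""
    lhs = mid = rhs = Fraction(0)
    weight = Fraction(1, 2**n)              # each path has probability 2^{-n}
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0               # Z_0 = 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        z_n = abs(s)
        rhs += weight * z_n
        if running_max >= c:                # the path belongs to F
            lhs += weight * c
            mid += weight * z_n
    return lhs, mid, rhs

lhs, mid, rhs = doob_terms(n=10, c=3)
```

The first inequality is the non-trivial one. It reflects that, on each event F_k, the submartingale property prevents Z_n from dropping below c on average.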
X = X0 + M + A, (3.4.12)
Proof The proof is easy. All we need to do is to derive explicit formulas for M and
A. Assume that a decomposition of the claimed form exists. Then
\[ A_n = \sum_{k=1}^{n} E[X_k - X_{k-1} \mid \mathcal{F}_{k-1}] \quad \text{a.s.} \tag{3.4.16} \]
X = X0 + M + A (3.4.18)
\[ \sup_{0 \le k \le n} |X_k| \le |X_0| + \sup_{0 \le k \le n} |M_k| + \sup_{0 \le k \le n} |A_k| = |X_0| + \sup_{0 \le k \le n} |M_k| + A_n. \tag{3.4.19} \]
Note that |M| is a non-negative submartingale, so for the supremum of |Mk | we can
use Theorem 3.57. We use the simple observation that if x + y + z > 3c, then at
least one of the x, y, z must exceed c. Therefore
\[ c\,P\Bigl(\sup_{0 \le k \le n} |X_k| \ge 3c\Bigr) \le c\,P\bigl(|X_0| \ge c\bigr) + c\,P\Bigl(\sup_{0 \le k \le n} |M_k| \ge c\Bigr) + c\,P(A_n \ge c) \le E[|X_0|] + E[|M_n|] + E[A_n]. \tag{3.4.20} \]
We have
\[ E[|M_n|] = E[|X_n - X_0 - A_n|] \le E[|X_n|] + E[|X_0|] + E[A_n] \tag{3.4.21} \]
and
\[ E[A_n] = E[X_n - X_0 - M_n] = E[X_n - X_0] \le E[|X_n|] + E[|X_0|]. \tag{3.4.22} \]
The Doob decomposition gives rise to two important processes associated with
a martingale M, namely, the bracket process ⟨M⟩ and the quadratic variation
process [M].
\[ M^2 = N + \langle M \rangle, \tag{3.4.23} \]
where N is a martingale that vanishes at zero and ⟨M⟩ is a previsible process that
vanishes at zero.
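As a worked example (ours, not the book's), take M = S, the simple random walk started at 0. The previsible part in the Doob decomposition of X = S^2 is obtained from the formula A_n = Σ_{k≤n} E[X_k - X_{k-1} | F_{k-1}], and one finds A_n = n, so that S_n^2 - n is a martingale. The sketch below evaluates the conditional expectations by averaging over the two possible next steps. The helper `compensator` is a hypothetical name.

```python
from fractions import Fraction

def compensator(X, path):
    """A_n = sum_{k=1}^n E[X_k - X_{k-1} | F_{k-1}] along a path of +-1 coin flips.

    X maps a tuple of past steps to the value of the process.
    """
    A = [Fraction(0)]
    for k in range(1, len(path) + 1):
        prefix = tuple(path[:k - 1])
        # Conditional expected increment: average over the next step -1 or +1.
        drift = sum(Fraction(1, 2) * (X(prefix + (s,)) - X(prefix)) for s in (-1, 1))
        A.append(A[-1] + drift)
    return A

square_of_walk = lambda steps: sum(steps) ** 2          # X_n = S_n^2
bracket = compensator(square_of_walk, (1, -1, -1, 1, 1))
```

The computed values do not depend on the path, which reflects that the increments of S have conditional variance 1.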
Our analysis of metastability in Part III relies crucially on the analysis of the first
times when a Markov process hits certain sets. These first hitting times are special
cases of so-called stopping times: a time whose occurrence can be determined based
on the outcome of the process until that time alone.
{T = n} ∈ Fn ∀ n ∈ N0 ∪ {∞}. (3.4.27)
The most important examples of stopping times are hitting times. Let X be an
adapted process, and let B ∈ B. Define
i.e., the first hitting time of B when the location of X0 is not counted (see Fig. 3.7).
Then τB is a stopping time. To see this, note that if n ∈ N, then
\[ \{\tau_B = n\} = \bigl\{ \omega \in \Omega : X_n(\omega) \in B,\ X_k(\omega) \notin B \ \ \forall\, 1 \le k < n \bigr\}. \tag{3.4.29} \]
Fig. 3.7 First hitting locations in B (indicated by ∗) when X0 (indicated by •) is not in B, respec-
tively, is in B
This event is manifestly in Fn . The event {τB = ∞} coincides with the event
{Xn ∉ B ∀ n ∈ N}, which lies in F∞ .
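For a recorded sample path the first hitting time can be read off directly. The sketch below is a minimal illustration. The function name is ours, and a finite observation window is assumed, so that "infinity" only means "not hit within the window". It follows the convention that the position X_0 is not counted.

```python
import math

def first_hitting_time(path, B):
    """tau_B = inf{n >= 1 : X_n in B} for path = [X_0, X_1, ...]; X_0 is not counted."""
    for n in range(1, len(path)):
        if path[n] in B:
            return n          # decided by X_1, ..., X_n alone: a stopping time
    return math.inf           # B is not hit within the observed window

path = [0, 1, 2, 1, 0]
return_time = first_hitting_time(path, {0})    # start in B, first return at n = 4
hit_two = first_hitting_time(path, {2})
never = first_hitting_time(path, {5})
```

The loop only inspects the path up to the current time, mirroring the fact that {τ_B = n} is determined by X_1, ..., X_n.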
In principle all stopping times can be realised as first hitting times of some pro-
cess. To do so, simply define
\[ I_{[T,\infty)}(n, \omega) = \begin{cases} 1, & \text{if } n \ge T(\omega), \\ 0, & \text{otherwise.} \end{cases} \tag{3.4.30} \]
Definition 3.64 The pre-T -σ -algebra FT is the set of events F ⊂ Ω such that
F ∩ {T ≤ n} ∈ Fn ∀ n ∈ N0 ∪ {∞}. (3.4.31)
Pre-T -σ -algebras will play an important rôle in the formulation of the strong
Markov property. There are some useful elementary facts associated with this con-
cept.
and, since $C_n^T$ only takes the values 0 and 1, this inclusion suffices to show that
$C_n^T \in \mathcal{F}_{n-1}$. The “wealth process” associated with this strategy is
$C^T \bullet X = ((C^T \bullet X)_n)_{n \in \mathbb{N}_0}$ with
\[ (C^T \bullet X)_n = X_{T \wedge n} - X_0. \tag{3.4.34} \]
If we define the stopped process $X^T = (X_n^T)_{n \in \mathbb{N}_0}$ via
Theorem 3.66
(i) If X is a supermartingale and T is a stopping time, then the stopped process
$X^T = (X_{T \wedge n})_{n \in \mathbb{N}_0}$ is a supermartingale. In particular,
Note that Theorem 3.66 does not assert that E[XT ] ≤ E[X0 ]. The following the-
orem gives conditions under which this inequality holds.
Proof We already know that E[Xn∧T ] − E[X0 ] ≤ 0 for all n ∈ N. In case (a), we
know that T ∧ N = T , and so E[XT ] = E[XT ∧N ] ≤ E[X0 ], as claimed. In case (b),
we start from E[Xn∧T ] − E[X0 ] ≤ 0 and let n → ∞. Since T is almost surely finite,
we have limn→∞ Xn∧T = XT a.s., and since Xn is uniformly bounded, we get
\[ \lim_{n\to\infty} E[X_{T \wedge n}] = E\Bigl[\lim_{n\to\infty} X_{T \wedge n}\Bigr] = E[X_T], \tag{3.4.41} \]
and E[KT ] < ∞ by assumption. Thus, we can again take the limit n → ∞ and use
Lebesgue’s dominated convergence theorem to justify that the inequality survives.
Finally, to justify (ii), use that if X is a martingale, then both X and −X are
supermartingales. The ensuing two inequalities imply the desired equality.
Theorem 3.67 may look strange, since it seems to contradict the “no winning
strategy”. Indeed, take the simple random walk (Sn )n∈N0 starting from S0 = 0 and
define the stopping time T = inf{n : Sn = 10}. Then, clearly, ST = 10 ≠ E[S0 ] = 0.
So, using (c) we must conclude that E[T ] = ∞. In fact, the “sure” gain when we
achieve our goal is offset by the fact that on average it takes infinitely long to reach
this goal (of course, most games will end quickly, but chances are that some may
take very long).
Case (c) in Theorem 3.67 is the situation we hope to have the most often. The
following lemma states that E[T ] < ∞ whenever the probability of the event leading
to T is eventually sufficiently large.
Lemma 3.68 Suppose that T is a stopping time and that there exist N ∈ N and
ε > 0 such that
P(T ≤ n + N | Fn ) > ε a.s. ∀ n ∈ N0 . (3.4.43)
Then E[T ] < ∞.
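A situation where Lemma 3.68 applies is the exit time T of the simple random walk from an interval (-a, b): from any interior point, a + b consecutive upward steps lead out of the interval, so (3.4.43) holds with N = a + b and any ε < 2^{-(a+b)}. The sketch below is our own illustration with hypothetical values a = 3, b = 7. It computes P(S_T = b) and E[T] by solving the corresponding linear systems, and the results can be compared with the optional-stopping identity E[S_T] = 0.

```python
import numpy as np

def exit_statistics(a, b):
    """Simple random walk from 0, T = first exit time from (-a, b), with a, b >= 1.

    Returns (P(S_T = b), E[T]).
    """
    states = list(range(-a + 1, b))            # interior states
    n = len(states)
    A = np.zeros((n, n))                       # matrix of (identity - transition kernel)
    for i in range(n):
        A[i, i] = 1.0
        if i > 0:
            A[i, i - 1] = -0.5
        if i < n - 1:
            A[i, i + 1] = -0.5
    hit_rhs = np.zeros(n)
    hit_rhs[-1] = 0.5                          # boundary value 1 at b, 0 at -a
    p = np.linalg.solve(A, hit_rhs)            # p(x) = P_x(S_T = b)
    t = np.linalg.solve(A, np.ones(n))         # t(x) = E_x[T]
    i0 = states.index(0)
    return p[i0], t[i0]

p_up, mean_T = exit_statistics(a=3, b=7)
expected_gain = 7 * p_up - 3 * (1 - p_up)      # E[S_T]
```

In contrast to the one-sided stopping time T = inf{n : S_n = 10}, here E[T] is finite and the expected gain vanishes.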
Proof We know that E[XT ∧n ] ≤ E[X0 ]. Using Fatou’s lemma, we may pass to
the limit n → ∞. For (3.4.45), set T = inf{n ∈ N0 : Xn > c}. Clearly, XT ≥ c
if supk∈N0 Xk > c, and zero otherwise. Thus, E[XT ] ≥ c P(supk∈N0 Xk > c), and
(3.4.46) follows from (3.4.45).
The first question we need to settle is the choice of the function space in which the
processes live. Often this is the set of continuous functions, but in general this set is
too restrictive. It turns out that a good choice is the set of so-called càdlàg functions.
Without going into further details, we state the fact that regularisability is a mea-
surable property.
Lemma 3.72 Let (Yq )q∈Q+ be a stochastic process defined on (Ω, F , P), and let
\[ G = \bigl\{ \omega \in \Omega : q \mapsto Y_q(\omega) \text{ is regularisable} \bigr\}. \tag{3.5.1} \]
Then G ∈ F .
Then f is càdlàg.
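\[ \mathcal{G}_s \subset \mathcal{G}_t \subset \mathcal{G}_\infty = \sigma\Bigl(\bigcup_{r \in \mathbb{R}_+} \mathcal{G}_r\Bigr) \subset \mathcal{G}. \tag{3.5.3} \]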
Definition 3.75 A stochastic process (Xt )t∈R+ is called adapted to the filtration
(Gt )t∈R+ if Xt is Gt -measurable for every t ∈ R+ .
So far almost nothing has changed with respect to the discrete-time setting.
Note, in particular, that if we take a monotone sequence of times (tn )n∈N0 , then
(Yn )n∈N0 = (Xtn )n∈N0 is a discrete-time (sub/super)martingale whenever (Xt )t∈R+
is a continuous-time (sub/super)martingale.
The next lemma is important because it connects martingale properties to càdlàg
properties.
Proof This is an application of the Lévy-Doob downward theorem (see Rogers and
Williams [208, Chaps. II.51 and II.63]).
Spaces of càdlàg functions are the natural setting for stochastic processes.
Definition 3.78 A stochastic process is called a càdlàg process if all its sample paths
are càdlàg functions; càdlàg processes that are (sub/super)martingales are called
càdlàg (sub/super)martingales.
Note that we do not just require that almost all sample paths are càdlàg.
We will now show that the setting of càdlàg functions is suitable for the theory of
martingales.
is a càdlàg process.
One might hope that Theorem 3.79 settles all problems related to continuous-
time martingales. Simply start with any supermartingale and pass to the càdlàg reg-
where E[q] = 0. Let (Gt )t∈R+ be the natural filtration associated with this process.
Clearly, Gt = {∅, Ω} for t ≤ 1 and Y is a martingale with respect to this filtration.
The càdlàg version of this process is
\[ X_t(\omega) = \begin{cases} 0, & \text{if } t < 1, \\ q(\omega), & \text{if } t \geq 1. \end{cases} \tag{3.5.9} \]
Now, X = (Xt )t∈R+ is not adapted to the filtration (Gt )t∈R+ , since X1 is not measur-
able with respect to G1 . This problem cannot be remedied by a simple modification
on sets of measure zero because P(X1 = Y1 ) < 1. In particular, X is not a martingale
with respect to the filtration (Gt )t∈R+ , because
We thus see that the right-continuous regularisation of Y at the point of the jump
anticipates information from the future. If we want to develop a theory on càdlàg
processes, then we must take this into account and introduce a richer filtration that
contains this information.
Definition 3.80 Let (Ω, G , P, (Gt )t∈R+ ) be a filtered space. Define, for t ∈ R+ ,
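\[ \mathcal{G}_{t+} = \bigcap_{s>t} \mathcal{G}_s = \bigcap_{\mathbb{Q}\ni q>t} \mathcal{G}_q, \tag{3.5.11} \]
and let
\[ \mathcal{N}(\mathcal{G}_\infty) = \{G \in \mathcal{G}_\infty : \mathbb{P}(G) \in \{0,1\}\}. \tag{3.5.12} \]
Then the partial augmentation (Ht )t∈R+ of the filtration (Gt )t∈R+ is defined as
\[ \mathcal{H}_t = \sigma\big(\mathcal{G}_{t+}, \mathcal{N}(\mathcal{G}_\infty)\big). \tag{3.5.13} \]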
The following lemma, which is obvious from the construction of càdlàg versions,
justifies this definition.
Lemma 3.81 If Y is a supermartingale with respect to the filtration (Gt )t∈R+ and
X is its càdlàg version defined in Theorem 3.79, then X is adapted to the partially
augmented filtration (Ht )t∈R+ .
Theorem 3.82 With the assumptions and notations of Lemma 3.81, X is a super-
martingale with respect to the filtration (Ht )t∈R+ . Moreover, X is a modification of
Y if and only if Y is right-continuous, in the sense that
\[ \lim_{s\downarrow t} \mathbb{E}\big[|Y_t - Y_s|\big] = 0 \quad \forall\, t \in \mathbb{R}_+. \tag{3.5.14} \]
Henceforth we will work on filtered spaces that are already partially augmented,
i.e., our standard setting (called “the usual setting” in Rogers and Williams [208]) is
as follows.
Definition 3.83 A filtered càdlàg space is a quadruple (Ω, F , P, (Ft )t∈R+ ), where
(Ω, F , P) is a probability space and (Ft )t∈R+ is a filtration that satisfies the fol-
lowing properties:
(i) F is P-complete (i.e., contains all sets of P-measure zero).
(ii) F0 contains all sets of P-measure 0.
(iii) Ft = Ft+ , i.e., t → Ft is right-continuous.
If (Ω, G , P, (Gt )t∈R+ ) is a filtered space, then the minimal enlargement of this space
satisfying conditions (i), (ii) and (iii) is called the right-continuous regularisation of
this space.
Theorem 3.85 Let (Ω, F , P, (Ft )t∈R+ ) be a filtered càdlàg space. Let Y be an
adapted supermartingale. Then Y has a càdlàg modification Z if and only if the
map t → E[Yt ] is right-continuous, in which case Z is a càdlàg supermartingale.
Key results on discrete-time martingale theory were Doob’s forward and backward
convergence theorems and the maximum inequalities. We will now consider the
corresponding results in continuous time.
The notions around stopping times introduced in this section are important in the
theory of Markov processes. We need to be careful in the continuous-time setting,
even though we closely follow the discrete-time setting.
We consider a filtered space (Ω, G , P, (Gt )t∈R+ ).
If T is a stopping time, then the pre-T -σ -algebra GT is the set of all Λ ∈ G such
that
Λ ∩ {T ≤ t} ∈ Gt ∀ 0 ≤ t ≤ ∞. (3.5.18)
It will be useful to also talk about stopping times with respect to the filtration
(Gt+ )t∈R+ .
If T is a (Gt+ )t∈R+ -stopping time, then the pre-T -σ -algebra GT + is the set of all
Λ ∈ G such that
Λ ∩ {T < t} ∈ Gt ∀ 0 ≤ t ≤ ∞. (3.5.20)
3.5 Martingales in continuous time 59
Lemma 3.91 Let (Sn )n∈N be a sequence of (Gt )t∈R+ -stopping times.
(i) If Sn ↑ S, then S is a (Gt )t∈R+ -stopping time.
(ii) If Sn ↓ S, then S is a (Gt+ )t∈R+ -stopping time and $\mathcal{G}_{S+} = \bigcap_{n\in\mathbb{N}} \mathcal{G}_{S_n+}$.
Definition 3.92 A process (Xt )t∈R+ is called (Gt )t∈R+ -progressive if for every
t ∈ R+ the restriction of the map (s, ω) → Xs (ω) to [0, t] × Ω is B([0, t]) × Gt -
measurable.
The notion of a progressive process is stronger than that of an adapted process. Its
importance arises from the fact that T -stopped progressive processes are measurable
with respect to their respective pre-T -σ -algebra. The nice fact is that in the càdlàg
setting all works well.
Lemma 3.93 An adapted càdlàg process in a metrisable space (S, B(S)) is pro-
gressive.
Lemma 3.94 If X is progressive with respect to the filtration (Gt )t∈R+ and T is a
(Gt )t∈R+ -stopping time, then XT is GT -measurable.
In the case of discrete-time Markov processes we have seen that hitting times of
certain sets provide particularly important examples of stopping times. We will now
extend this discussion to the continuous-time setting. It is important to distinguish
between the notions of hitting time and entrance time. These differ in the way the
position of the process at time 0 is treated.
the first hitting time of the set Γ . In both cases the infimum is understood to be ∞
if the process never enters Γ .
Recall that in the discrete-time setting we only worked with τΓ , which is in fact
the more important notion. Here is an example of a stopping time.
Lemma 3.96 Let E be a metric space and let F be a closed set. Let X be a contin-
uous adapted process. Then ΔF is a (Gt )t∈R+ -stopping time and τF is a (Gt+ )t∈R+ -
stopping time.
Proof Let ρ denote the metric on E. Then the map x → ρ(x, F ) is continuous,
and hence the map ω → ρ(Xq (ω), F ) is Gq -measurable for q ∈ Q+ . Since the paths
t → Xt (ω) are continuous, we have ΔF (ω) ≤ t if and only if
\[ \inf_{q\in\mathbb{Q}\cap[0,t]} \rho\big(X_q(\omega), F\big) = 0, \tag{3.5.23} \]
and so ΔF is measurable with respect to (Gt )t∈R+ . For τF the situation is slightly
different at time zero. Indeed, let $\Delta_F^r = \inf\{t \geq r : X_t \in F\}$, r > 0. Obviously, from
the previous result we have that $\Delta_F^r$ is a (Gt )t∈R+ -stopping time. On the other hand,
{τF > 0} if and only if there exists a δ > 0 such that $\Delta_F^r > \delta$ for all rational r > 0. But,
clearly, the event
\[ A_\delta = \bigcap_{\mathbb{Q}\ni r>0} \{\Delta_F^r > \delta\} \tag{3.5.24} \]
To see where the difference between ΔF and τF comes from, consider the pro-
cess starting at the boundary of F . Then ΔF = 0, while τF may or may not be
zero: it could be that the process immediately leaves F and only returns after some
positive time t, in which case τF > 0, or it may stay for a while in F , in which case
τF = 0. To distinguish between the two cases, we must look a little bit into the future
(recall Fig. 3.7).
We have seen in the theory of discrete-time Markov processes that martingale prop-
erties of processes stopped at stopping times are important. We need similar results
for càdlàg processes. We again work on a filtered càdlàg space (Ω, F , P, (Ft )t∈R+ )
on which all the processes will be defined and adapted. The key result is the follow-
ing optional sampling theorem.
If, in addition,
(i) T is finite a.s.,
3.6 Bibliographical notes 61
and
E[XT |FS ] ≤ XS a.s., (3.5.29)
with equality when X is a uniformly integrable martingale.
2. The presentation in Sects. 3.4–3.5 largely follows the book of Rogers and
Williams [208].
Chapter 4
Markov Processes in Discrete Time
Markov processes are the basic class of stochastic processes that we will use to
model metastable systems. Similarly to what we saw in Chap. 3, there is a substantial
difference in the mathematical difficulties involved in dealing with discrete time
and continuous time. In this chapter we give an outline of the theory of discrete-
time Markov processes (also called Markov chains). In Chap. 5 we will deal with
continuous-time Markov processes.
Section 4.1 gives the main definitions and lists some key facts. Section 4.2 looks
at the link between Markov processes and martingales. Section 4.3 lists a few prop-
erties that are specific to the setting where the state space is countable.
Definition 4.1 A stochastic process with state space S and index set I = N0 or
I = R+ is called a Markov process if the following holds. For any t ≥ s ≥ 0, there
exists a probability kernel Ps,t : S × B → [0, 1], satisfying:
Fig. 4.1 Illustration of the Markov property: the future depends on the present not the past
In the case of discrete time, i.e., for index set I = N0 , the compatibility conditions
impose severe restrictions on the kernels Ps,t that allow us to consider only the
one-step transition kernel Pt−1 = Pt−1,t . Indeed, a stochastic process X with state
space S and index set N0 is a discrete-time Markov process with one-step transition
kernels (Pt )t∈N if, for all A ∈ B and t ∈ N,
\[ \mathbb{P}(X_t \in A \mid \mathcal{F}_{t-1})(\omega) = P_{t-1}\big(X_{t-1}(\omega), A\big), \quad \mathbb{P}\text{-a.s.} \tag{4.1.2} \]
This requirement fixes the law P up to one more probability measure on (S, B),
namely, the initial distribution p0 .
where we refrain from writing “a.s.”, which applies to all equations relating to con-
ditional expectations. We thus have
\[ P_{s,t}(x,A) = \int_S P_{t-1}(x_{t-1},A) \int_S P_{t-2}(x_{t-2},dx_{t-1}) \cdots \int_S P_{s+1}(x_{s+1},dx_{s+2})\, P_s(x,dx_{s+1}). \tag{4.1.6} \]
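\[
\begin{aligned}
&\mathbb{P}(X_{t_n}\in A_n,\dots,X_{t_1}\in A_1) \\
&\quad= \mathbb{E}\big[\mathbb{P}(X_{t_n}\in A_n\mid\mathcal{F}_{t_{n-1}})\,1_{A_{n-1}}(X_{t_{n-1}})\cdots 1_{A_1}(X_{t_1})\big] \\
&\quad= \mathbb{E}\big[\mathbb{E}[P_{t_{n-1},t_n}(X_{t_{n-1}},A_n)\mid\mathcal{F}_{t_{n-1}}]\,1_{A_{n-1}}(X_{t_{n-1}})\cdots 1_{A_1}(X_{t_1})\big] \\
&\quad= \mathbb{E}\Big[\mathbb{E}\Big[\int_{A_{n-1}} P_{t_{n-1},t_n}(x_{n-1},A_n)\,P_{t_{n-2},t_{n-1}}\big(X_{t_{n-2}}(\omega),dx_{n-1}\big)\,\Big|\,\mathcal{F}_{t_{n-2}}\Big] \\
&\qquad\qquad\times 1_{A_{n-2}}(X_{t_{n-2}})\cdots 1_{A_1}(X_{t_1})\Big] \\
&\quad= \int_{A_{n-1}} P_{t_{n-1},t_n}(x_{n-1},A_n)\int_{A_{n-2}} P_{t_{n-2},t_{n-1}}(x_{n-2},dx_{n-1})\cdots \\
&\qquad\qquad\cdots\int_{A_1} P_{t_1,t_2}(x_1,dx_2)\int_S P_{0,t_1}(x_0,dx_1)\,P_0(dx_0).
\end{aligned}
\tag{4.1.7}
\]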
Thus, we get the desired expression for the marginal distributions in terms of the
transition kernel P and the initial distribution p0 . The compatibility relations follow
from the following obvious, but important, property of the transition kernels.
Lemma 4.3 The transition kernels Ps,t satisfy the Chapman-Kolmogorov equa-
tions:
\[ P_{s,t}(x,A) = \int_S P_{r,t}(y,A)\,P_{s,r}(x,dy), \quad t > r > s. \tag{4.1.8} \]
The proof of the compatibility relations is now also obvious; if some of the Ai ,
1 ≤ i ≤ n, are equal to S, then we can use (4.1.8) and recover the expressions for
the lower-dimensional marginals.
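On a finite state space the kernels are row-stochastic matrices, and the Chapman-Kolmogorov equations (4.1.8) reduce to a statement about matrix products. The following is a minimal sketch under that assumption; the three one-step kernels in ONE_STEP and all function names are hypothetical and chosen only for illustration.

```python
# Finite-state sketch: a kernel is a row-stochastic matrix (list of rows).

def mat_mul(a, b):
    """Matrix product: (a b)(x, z) = sum_y a(x, y) b(y, z)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kernel(one_step, s, t):
    """P_{s,t} = P_s P_{s+1} ... P_{t-1}, the matrix form of (4.1.6)."""
    out = identity(len(one_step[0]))
    for u in range(s, t):
        out = mat_mul(out, one_step[u])
    return out


# Hypothetical time-dependent one-step kernels P_0, P_1, P_2 on S = {0, 1}.
ONE_STEP = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9]],
]


def max_ck_defect(one_step, s, r, t):
    """Largest entry of |P_{s,t} - P_{s,r} P_{r,t}|; zero up to rounding."""
    lhs = kernel(one_step, s, t)
    rhs = mat_mul(kernel(one_step, s, r), kernel(one_step, r, t))
    return max(abs(lhs[i][j] - rhs[i][j])
               for i in range(len(lhs)) for j in range(len(lhs[0])))
```

Calling max_ck_defect(ONE_STEP, 0, 1, 3) compares the two sides of (4.1.8) for s = 0, r = 1, t = 3.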
66 4 Markov Processes in Discrete Time
In general, we call a stochastic process whose index set supports the action of a
group (or semigroup) stationary with respect to the action of this group (or semi-
group) if all finite-dimensional distributions are invariant under the simultaneous
shift of all time indices. Specifically, if our index set I is R+ or Z or N0 , then a
stochastic process is stationary if, for all ℓ ∈ N, s1 , . . . , sℓ ∈ I , A1 , . . . , Aℓ ∈ B and
t ∈ I,
We can express this property also as follows. For t ∈ I , define the process X ◦ θt by
(X ◦ θt )s = Xt+s . Then X is stationary if and only if, for all t ∈ I , the processes X
and X ◦ θt have the same finite-dimensional distributions.
In the case of Markov processes, a necessary (but not sufficient) condition for
stationarity is the stationarity of the transition kernels.
Definition 4.4 A Markov process with discrete time I = N0 and state space S is
said to have stationary transition probabilities if its one-step transition kernel Pt is
independent of t, i.e., there exists a probability kernel P (x, A) such that
With the notation Ps,t for the transition kernel from time s to time t, we can
alternatively state that a Markov process has stationary transition probabilities if
there exists a family of transition kernels Pt (x, A) such that
for all s, t ∈ N0 with 0 ≤ s < t, x ∈ S and A ∈ B. Note that the one-step kernel Pt of
Definition 4.4 and the member Pt of this family are different objects and should not be confused.
A key concept for Markov processes with stationary transition probabilities is
that of an invariant distribution and invariant measure.
Definition 4.5 Let P be the transition kernel of a Markov process with stationary
transition probabilities. Then a probability measure π on (S, B) is called an invari-
ant distribution if
\[ \int_S \pi(dx)\,P(x,A) = \pi(A) \tag{4.1.12} \]
for all A ∈ B. More generally, a positive and σ -finite measure π satisfying (4.1.12)
is called an invariant measure.
Lemma 4.6 A Markov process with stationary transition probabilities and initial
distribution p0 = π is a stationary stochastic process if and only if π is an invariant
distribution.
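For a finite state space, condition (4.1.12) says that π is a left fixed point of the transition matrix. The sketch below checks this for a two-state chain; the kernel P, the numbers A and B, and the function names are hypothetical.

```python
def step(mu, p):
    """Action of the kernel on a measure: (mu P)(y) = sum_x mu(x) P(x, y)."""
    n = len(p)
    return [sum(mu[x] * p[x][y] for x in range(n)) for y in range(n)]


# Hypothetical two-state kernel with P(0, 1) = A and P(1, 0) = B.
A, B = 0.3, 0.1
P = [[1 - A, A], [B, 1 - B]]
PI = [B / (A + B), A / (A + B)]  # candidate invariant distribution


def invariance_defect(pi, p):
    """max_y |(pi P)(y) - pi(y)|; zero exactly when (4.1.12) holds."""
    return max(abs(u - v) for u, v in zip(step(pi, p), pi))
```

Started from PI the law of the chain does not move, whereas a point mass such as [1.0, 0.0] has a strictly positive defect.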
4.2 Markov processes and martingales 67
Fig. 4.2 Illustration of the strong Markov property: the future depends on the present not the past,
even when the present occurs at a random stopping time T . Recall Fig. 4.1
There always is at least one invariant measure. When S is finite, this invariant
measure can be chosen to be a probability measure. However, when S is infinite, it
may not be possible to do so.
The setting of Markov processes is highly suitable for the application of the notion
of stopping times introduced in Chap. 3. In fact, one of the important properties of
Markov processes is that we can split the past and the future also at random times
(see Fig. 4.2).
Proof We have
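\[
\begin{aligned}
\mathbb{E}[1_{T<\infty}\, F\, G\circ\theta_T \mid \mathcal{F}_0] &= \mathbb{E}\big[\mathbb{E}[1_{T<\infty}\, F\, G\circ\theta_T \mid \mathcal{F}_T] \mid \mathcal{F}_0\big] \\
&= \mathbb{E}\big[1_{T<\infty}\, F\, \mathbb{E}[G\circ\theta_T \mid \mathcal{F}_T] \mid \mathcal{F}_0\big].
\end{aligned}
\tag{4.1.14}
\]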
In this section we show how Markov transition kernels can be seen as operators
acting on spaces of measures, respectively, spaces of functions.
4.2.1 Semigroups
The action on measures has the following natural interpretation in terms of the pro-
cess: if P(X0 ∈ A) = μ(A), then
and
\[ (P_t f)(x) = \int_S f(y)\,P_t(x,dy), \tag{4.2.6} \]
where again
\[ P_t f = P^t f. \tag{4.2.7} \]
We say that (Pt )t∈N0 is a semigroup acting on the space of measures, respectively,
on the space of bounded measurable functions. The interpretation of the action on
functions is as follows.
Lemma 4.8 Let (Pt )t∈N0 be a Markov semigroup acting on bounded measurable
functions f . Then
\[ (P_t f)(x) = \mathbb{E}[f(X_t) \mid \mathcal{F}_0](x) = \mathbb{E}_x[f(X_t)]. \tag{4.2.8} \]
\[ P_t f - f = \sum_{s=0}^{t-1} P_s (P-1) f = \sum_{s=0}^{t-1} P_s (Lf), \quad t \in \mathbb{N}_0, \tag{4.2.10} \]
Lemma 4.9 Let L be the generator of a Markov process X, and let f be a bounded
measurable function. Then
\[ M_t = f(X_t) - f(X_0) - \sum_{s=0}^{t-1} (Lf)(X_s), \quad t \in \mathbb{N}_0, \tag{4.2.11} \]
is a martingale.
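\[
\begin{aligned}
\mathbb{E}[M_{t+r} \mid \mathcal{F}_t] &= \mathbb{E}[f(X_{t+r}) \mid \mathcal{F}_t] - \mathbb{E}[f(X_0) \mid \mathcal{F}_t] - \mathbb{E}\Big[\sum_{s=0}^{t+r-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] \\
&= (P_r f)(X_t) - f(X_t) + f(X_t) - f(X_0) \\
&\quad - \mathbb{E}\Big[\sum_{s=t}^{t+r-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] - \mathbb{E}\Big[\sum_{s=0}^{t-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] \\
&= f(X_t) - f(X_0) - \sum_{s=0}^{t-1} (Lf)(X_s) \\
&\quad + (P_r f)(X_t) - f(X_t) - \sum_{s=0}^{r-1} (P_s Lf)(X_t) \\
&= M_t + 0,
\end{aligned}
\tag{4.2.12}
\]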
Note that (4.2.11) is the Doob decomposition of the process (f (Xt ))t∈N0 because
$\big(\sum_{s=0}^{t-1} (Lf)(X_s)\big)_{t\in\mathbb{N}_0}$ is a previsible process. Check that this fact follows directly
from (3.4.16).
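A necessary consequence of Lemma 4.9 is that $\mathbb{E}_x[M_t] = 0$ for every starting point x, which by (4.2.8) amounts to the telescoping identity (4.2.10). The sketch below evaluates $\mathbb{E}_x[M_t]$ exactly on a finite state space, where L = P − 1 is a matrix; the function names are hypothetical, and the check covers only this consequence, not the full martingale property.

```python
def apply(p, f):
    """(P f)(x) = sum_y P(x, y) f(y)."""
    return [sum(p[x][y] * f[y] for y in range(len(f))) for x in range(len(p))]


def generator(p, f):
    """(L f)(x) = ((P - 1) f)(x)."""
    return [pf - fx for pf, fx in zip(apply(p, f), f)]


def expected_m(p, f, t):
    """E_x[M_t] = (P^t f)(x) - f(x) - sum_{s<t} (P^s L f)(x), for every x."""
    pt_f = list(f)              # holds P^s f, starting at s = 0
    ps_lf = generator(p, f)     # holds P^s (L f), starting at s = 0
    comp = [0.0] * len(f)       # running sum over s of (P^s L f)(x)
    for _ in range(t):
        comp = [c + v for c, v in zip(comp, ps_lf)]
        pt_f = apply(p, pt_f)
        ps_lf = apply(p, ps_lf)
    return [a - b - c for a, b, c in zip(pt_f, f, comp)]
```

For any row-stochastic matrix p and any function f, every entry of expected_m(p, f, t) should vanish up to rounding error.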
What is important about the latter observation is that it gives rise to a characterisa-
tion of the generator that will turn out to be extremely useful in the continuous-time
setting. Namely, we can ask whether the requirement that (Mt )t∈N0 be a martingale
given a family of pairs (f, Lf ) fully characterises a Markov process.
Proof Lemma 4.9 already provides the “only if” part, so it remains to show the “if”
part. First, if we assume that X is a Markov process, then setting r = 1 and t = 0
in (4.2.12) and taking conditional expectations given F0 , we see from Lemma 4.8
that E[f (X1 )|F0 ] = f (X0 ) + (Lf )(X0 ), which implies that the transition kernel must
be 1 + L.
It remains to show that X is indeed a Markov process. For this we want to show
that
\[ \mathbb{E}[f(X_{t+s}) \mid \mathcal{F}_t] = \big((1+L)^s f\big)(X_t) = (P^s f)(X_t), \tag{4.2.13} \]
from the martingale problem formulation. To see this, we just use the calculation in
(4.2.12) to see that
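\[
\begin{aligned}
\mathbb{E}[f(X_{t+r}) \mid \mathcal{F}_t] &= \mathbb{E}[M_{t+r} \mid \mathcal{F}_t] + f(X_0) + \sum_{s=0}^{t-1} (Lf)(X_s) + \mathbb{E}\Big[\sum_{s=t}^{t+r-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] \\
&= M_t + f(X_0) + \sum_{s=0}^{t-1} (Lf)(X_s) + \mathbb{E}\Big[\sum_{s=t}^{t+r-1} (Lf)(X_s) \,\Big|\, \mathcal{F}_t\Big] \\
&= f(X_t) + \sum_{s=0}^{r-1} \mathbb{E}[(Lf)(X_{t+s}) \mid \mathcal{F}_t].
\end{aligned}
\tag{4.2.14}
\]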
For r = 1,
\[ \mathbb{E}[f(X_{t+1}) \mid \mathcal{F}_t] = f(X_t) + (Lf)(X_t) = \big((1+L)f\big)(X_t) = (Pf)(X_t), \tag{4.2.15} \]
which is (4.2.13) for s = 1. Now proceed by induction: assume that (4.2.13) holds
for all bounded measurable functions for s ≤ r − 1. We must show that it then also
holds for s = r. To do this, we use (4.2.13) for the terms of the last sum in (4.2.14),
\[ \sum_{s=0}^{r-1} \mathbb{E}[(Lf)(X_{t+s}) \mid \mathcal{F}_t] = \sum_{s=0}^{r-1} \big(P_s (Lf)\big)(X_t) = (P^r f)(X_t) - f(X_t), \tag{4.2.16} \]
where we undid the telescoping sum in (4.2.10). Insertion into (4.2.14) yields
(4.2.13) for s = r. Hence (4.2.13) holds for all r, by induction.
The full strength of Theorem 4.10 will become apparent in the continuous-time
setting, for which it remains valid. A crucial point is that it will not even be necessary
to consider all bounded measurable functions: a sufficiently rich class will do. This
allows us to formulate martingale problems even when we cannot write down the
generator in an explicit form.
which proves the claim. In (4.2.19) we again used the Doob optional stopping theo-
rem, (Theorem 3.67(i, b)).
Theorem 4.13 can be phrased as saying that (sub)harmonic functions take their
maximum on the boundary (since the set D c in (4.2.18) can be replaced by the subset
∂D ⊂ D c such that Px (XT ∈ ∂D) = 1 for all x ∈ D). The above proof is an example
of how intrinsically analytic results can be proven with the help of probabilistic
arguments. The next section will further develop this theme.
Lemma 4.14 With the notation above, if Y is a FτB −1 -measurable function, then
\[ \mathbb{E}^h_x[Y] = \mathbb{E}_x[Y \mid \tau_A = \tau_B]. \tag{4.2.22} \]
Here, the first equality is just the definition of h and reproduces the form of the
right-hand side of the strong Markov property, the second equality is the strong
Markov property (recall Theorem 4.7), while the fourth equality uses the fact that the
event {τA = τB } depends only on what happens after τB − 1, and so
$1_{\tau_A=\tau_B} \circ \theta_{\tau_B-1} = 1_{\tau_A=\tau_B}$.
Let us next look at the transformed law Ph in the general case. The first property
to check is whether Ph is defined in a consistent way. Some thought shows that it
suffices to prove the following lemma.
In particular, Ph [Ω|F0 ] = 1.
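\[
\begin{aligned}
\mathbb{E}^h[M^h_t \mid \mathcal{F}_r] &= \frac{1}{h(X_r)}\,\mathbb{E}[h(X_t) f(X_t) \mid \mathcal{F}_r] - f(X_0) - \sum_{s=0}^{r-1} (L^h f)(X_s) \\
&\quad - \sum_{s=r}^{t-1} \frac{1}{h(X_r)}\,\mathbb{E}[h(X_s)\,(L^h f)(X_s) \mid \mathcal{F}_r].
\end{aligned}
\tag{4.2.29}
\]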
The two middle terms are part of Mrh and so we must compute E[f (Xt )h(Xt )|Fr ].
This is done by applying Lemma 4.9 for the law P and the function f h, which yields
\[ \mathbb{E}[f(X_t) h(X_t) \mid \mathcal{F}_r] = f(X_r) h(X_r) + \sum_{s=r}^{t-1} \mathbb{E}\big[(L(fh))(X_s) \mid \mathcal{F}_r\big]. \tag{4.2.30} \]
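\[
\begin{aligned}
\mathbb{E}^h[M^h_t \mid \mathcal{F}_r] &= f(X_r) - f(X_0) - \sum_{s=0}^{r-1} (L^h f)(X_s) \\
&\quad + \frac{1}{h(X_r)} \sum_{s=r}^{t-1} \Big( \mathbb{E}\big[(L(fh))(X_s) \mid \mathcal{F}_r\big] - \mathbb{E}\big[h(X_s)\,(L^h f)(X_s) \mid \mathcal{F}_r\big] \Big) \\
&= M^h_r + \frac{1}{h(X_r)} \sum_{s=r}^{t-1} \Big( \mathbb{E}\big[(L(fh))(X_s) \mid \mathcal{F}_r\big] - \mathbb{E}\big[h(X_s)\,(L^h f)(X_s) \mid \mathcal{F}_r\big] \Big).
\end{aligned}
\tag{4.2.31}
\]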
i.e.,
\[ (L^h f)(x) = \frac{1}{h(x)} \int_S P(x,dy)\,h(y)\,f(y) - f(x). \tag{4.2.32} \]
Thus we see that, under $\mathbb{P}^h$, X solves the martingale problem corresponding to the
generator $L^h$, and hence is a Markov chain with transition kernel $P^h = L^h + 1$.
The process X under Ph is called the (Doob) h-transform of the original Markov
process.
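Reading off (4.2.32), the transformed kernel acts as $P^h(x,dy) = P(x,dy)\,h(y)/h(x)$ on the set where h > 0. The sketch below builds this kernel for a simple random walk on {0, ..., N} absorbed at both ends, with h(x) = x/N, the probability of reaching N before 0. The example, the assumption that h is harmonic for P, and all names are illustrative choices rather than the construction used in the text.

```python
N = 5


def srw_kernel(n):
    """Simple random walk on {0, ..., n}, absorbed at 0 and at n."""
    p = [[0.0] * (n + 1) for _ in range(n + 1)]
    p[0][0] = p[n][n] = 1.0
    for x in range(1, n):
        p[x][x - 1] = p[x][x + 1] = 0.5
    return p


def h_transform(p, h):
    """P^h(x, y) = P(x, y) h(y) / h(x), restricted to the set {h > 0}."""
    states = [x for x in range(len(h)) if h[x] > 0]
    return {x: {y: p[x][y] * h[y] / h[x] for y in states} for x in states}


P = srw_kernel(N)
H = [x / N for x in range(N + 1)]  # harmonic for P, with h(0) = 0, h(N) = 1
PH = h_transform(P, H)
```

Because h is harmonic, every row of PH sums to one, and state 0 (where h vanishes) is no longer reachable: from state 1 the transformed walk steps to 2 with probability one.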
which is called the transition matrix. We start with the following elementary con-
cepts.
4.3 Markov processes with countable state space 75
Definition 4.16 Two states i, j in the state space S of a Markov process communicate
if there exist k, k′ ≥ 0 such that $(p^k)_{ij} > 0$ and $(p^{k'})_{ji} > 0$.
Definition 4.17 A Markov process with countable state space is called irreducible
if and only if its state space forms a single communicating class.
Definition 4.18 A state i ∈ S of a Markov chain has period d(i) if and only if d(i)
is the greatest common divisor of all numbers n ∈ N such that $(P^n)_{i,i} > 0$. A state of
period 1 is called aperiodic.
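For a finite transition matrix, the period in Definition 4.18 can be approximated by taking the gcd over return times up to a cut-off. The sketch below does this; the cut-off max_n and the function name are assumptions of the example, and the result agrees with d(i) only when max_n is large enough to capture the relevant return times.

```python
from math import gcd


def period(p, i, max_n=50):
    """gcd of all n <= max_n with (P^n)_{i,i} > 0; returns 0 if there is none."""
    size = len(p)
    row = [1.0 if j == i else 0.0 for j in range(size)]  # row i of P^0
    d = 0
    for n in range(1, max_n + 1):
        # Advance to row i of P^n.
        row = [sum(row[k] * p[k][j] for k in range(size)) for j in range(size)]
        if row[i] > 0:
            d = gcd(d, n)
    return d
```

A deterministic cycle through three states has period 3 at every state, while any chain with a positive holding probability at i has period 1 there.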
The notions of recurrence and transience can be defined for single states rather
than for the entire process. In the case of irreducible Markov processes, all states
have the same characteristics.
Some simple consequences of the above definition are the following.
Thus, $\nu_\ell$ solves the equation for the invariant measure. It remains to show that $\nu_\ell$ is
normalisable. However,
\[ \sum_{j\in S} \nu_\ell(j) = \mathbb{E}_\ell[\tau_\ell] < \infty, \tag{4.3.7} \]
by assumption. Hence $\nu_\ell(j) / \sum_{i\in S} \nu_\ell(i) = \mu(j)$ is an invariant probability distribution.
We next show uniqueness. First, note that for any irreducible Markov process
with countable state space, if ν is an invariant measure and ν(i) = 0 for some
i ∈ S, then ν = 0. Indeed, if ν(j ) > 0 for some j , then there exists a finite t such
that $P^t_{ji} > 0$, and so $\nu(i) \geq \nu(j) P^t_{ji} > 0$, which is a contradiction. Next, note that
$\nu_\ell(\ell) = 1$. We can actually show that $\nu_\ell$ is the only invariant measure such that
$\nu_\ell(\ell) = 1$, which implies the desired uniqueness result as follows. Below we show
that $\nu(j) \geq \nu_\ell(j)$ for all j ∈ S for any other invariant measure ν such that $\nu(\ell) = 1$.
Since $\nu - \nu_\ell$ is a positive invariant measure as well and is zero at ℓ, it must vanish
identically, which implies that $\nu = \nu_\ell$.
We have
\[ \nu(i) = \sum_{j_1 \neq \ell} p(j_1, i)\,\nu(j_1) + p(\ell, i), \tag{4.3.8} \]
Corollary 4.22 For an irreducible positive recurrent Markov process with countable state space S,
\[ \mu(j) = \frac{1}{\mathbb{E}_j[\tau_j]}, \quad j \in S. \tag{4.3.12} \]
Proof Just set ℓ = j in the definition of μ(j ), and note that
$\nu_j(j) = \mathbb{E}_j\big[\sum_{t=1}^{\tau_j} 1_{X_t = j}\big] = 1$.
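Formula (4.3.12) can be checked numerically on a small chain. The sketch below computes the mean return times by fixed-point iteration of the first-step equations and leaves it to the caller to verify that their reciprocals form an invariant distribution. The iteration scheme, the number of sweeps and all names are assumptions of the example, which presupposes a finite irreducible chain.

```python
def expected_return_time(p, j, sweeps=10_000):
    """Approximate E_j[tau_j] for a finite irreducible chain.

    k[x] approximates the mean time to reach j from x (k[j] = 0), obtained by
    iterating k(x) = 1 + sum_{y != j} P(x, y) k(y) starting from k = 0.
    """
    n = len(p)
    k = [0.0] * n
    for _ in range(sweeps):
        k = [0.0 if x == j else
             1.0 + sum(p[x][y] * k[y] for y in range(n) if y != j)
             for x in range(n)]
    return 1.0 + sum(p[j][y] * k[y] for y in range(n) if y != j)


def return_time_distribution(p):
    """mu(j) = 1 / E_j[tau_j], as in (4.3.12)."""
    return [1.0 / expected_return_time(p, j) for j in range(len(p))]
```

For a lazy nearest-neighbour walk on three states with kernel rows (1/2, 1/2, 0), (1/4, 1/2, 1/4), (0, 1/2, 1/2), the resulting vector sums to one and is left invariant by the kernel.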
The invariant distribution determines the long-time behaviour of the Markov pro-
cess. We state the following two ergodic theorems without proof.
1. There are many good textbooks on Markov processes. Nice modern treatments
can be found in Norris [195], Stroock [222], or Levin, Peres and Wilmer [163].
Two classic texts are those by Kemeny and Snell [151], or Kemeny, Snell and
Knapp [150].
The simplest class of continuous-time Markov processes are Markov jump pro-
cesses. They are constructed “explicitly” from Markov processes in discrete time.
The idea is simple: take a discrete-time Markov process and randomise the waiting
times between the successive moves in such a way as to obtain a continuous-time
Markov process.
Fig. 5.1 Simulation of simple random walk on $\mathbb{Z}^2$ with $n = 10^3$, $10^4$ and $10^5$ steps. The circles
have radius $n^{1/2}$ in units of the step size. Brownian motion on $\mathbb{R}^2$ is the continuum limit of simple
random walk on $\mathbb{Z}^2$. (Courtesy Bill Casselman and Gordon Slade)
To be more precise, let (Yn )n∈N0 be a discrete-time Markov process with state
space S, transition kernel P (also called jump distribution) and initial distribution μ.
Let m : S → R+ be a uniformly bounded and measurable function. Let ei,x , i ∈ N0 ,
x ∈ S, be a family of independent exponential random variables with mean m(x),
defined on the same probability space (Ω, F , P) as (Yn )n∈N0 , and assume that
the Yn and the ei,x are mutually independent. Define
\[ S(n) = \sum_{i=0}^{n-1} e_{i,Y_i}, \quad n \in \mathbb{N}_0, \tag{5.1.1} \]
which is called the clock process: S(n) represents the time at which the n-th jump
takes place. Define the inverse function
\[ S^{-1}(t) = \sup\{n \in \mathbb{N}_0 : S(n) \leq t\}, \quad t \in \mathbb{R}_+, \tag{5.1.2} \]
and set
\[ X(t) = Y_{S^{-1}(t)}. \tag{5.1.3} \]
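The construction (5.1.1)–(5.1.3) translates directly into a simulation: run the discrete-time chain, draw an exponential waiting time with mean m(y) at each visited state y, and read off X(t) from the clock process. The sketch below is one possible rendering; the function names, the callback signatures and the time horizon t_max are assumptions of the example, and m(y) is taken to be strictly positive.

```python
import random


def simulate_jump_process(step_chain, mean_holding, y0, t_max, rng):
    """Return the jump times S(0), S(1), ... <= t_max and the states Y_0, Y_1, ...

    step_chain(y, rng) samples Y_{n+1} given Y_n = y;
    mean_holding(y) is m(y) > 0, the mean of the exponential waiting time at y.
    """
    times, states = [0.0], [y0]          # S(0) = 0 and Y_0 = y0
    while True:
        y = states[-1]
        wait = rng.expovariate(1.0 / mean_holding(y))  # exponential, mean m(y)
        if times[-1] + wait > t_max:
            return times, states
        times.append(times[-1] + wait)   # S(n+1) = S(n) + e_{n, Y_n}
        states.append(step_chain(y, rng))


def value_at(times, states, t):
    """X(t) = Y_{S^{-1}(t)} with S^{-1}(t) = sup{n : S(n) <= t}."""
    n = max(i for i, s in enumerate(times) if s <= t)
    return states[n]
```

For instance, a continuous-time simple random walk on the integers is obtained with step_chain = lambda y, rng: y + rng.choice((-1, 1)) and a constant mean_holding.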
with Xk , k ∈ N, i.i.d. random variables. Let us focus on the centred case: E(X1 ) = 0.
The central limit theorem states that $Z_n = n^{-1/2} S_n$ converges in distribution to a
Gaussian random variable, provided $\mathbb{E}[X_1^2] = \sigma^2 < \infty$. A natural question that goes
beyond this observation is to ask whether the entire path {Sn , n ∈ N} converges to a
limiting object. It is clear that if we rescale like
\[ Z_n(t) = (n\sigma^2)^{-1/2} \sum_{k=1}^{\lfloor tn \rfloor} X_k, \quad t \in (0,1], \tag{5.2.2} \]
then
\[ Z_n(t) \xrightarrow{\;D\;} B_t, \quad t \in (0,1], \tag{5.2.3} \]
where Bt is a centred Gaussian random variable with variance t. Moreover, for ℓ ∈ N
and a finite collection of indices 0 = t0 < t1 < · · · < tℓ , define Yn (i) = Zn (ti ) −
Zn (ti−1 ). Then the random variables Yn (i) are independent, and it is easy to see
that they jointly converge, as n → ∞, to a family of independent, centred Gaussian
random variables with variances ti − ti−1 . This implies that the finite-dimensional
distributions of the processes (Zn (t))t∈[0,1] converge to the finite-dimensional
distributions of the Gaussian process (Bt )t∈[0,1] with covariance E[Bs Bt ] = s ∧ t. The
latter is called Brownian motion and has very interesting properties.
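The covariance structure s ∧ t can already be seen at finite n in the rescaled walk (5.2.2). The sketch below estimates $\mathbb{E}[Z_n(s) Z_n(t)]$ by Monte Carlo for ±1 steps (so that σ² = 1); the step distribution, the sample sizes and the function names are assumptions of the example.

```python
import math
import random


def rescaled_walk(steps, t, sigma2=1.0):
    """Z_n(t) = (n sigma^2)^{-1/2} * sum_{k <= floor(t n)} X_k for one path."""
    n = len(steps)
    return sum(steps[: math.floor(t * n)]) / math.sqrt(n * sigma2)


def sample_covariance(n, s, t, samples, rng):
    """Monte Carlo estimate of E[Z_n(s) Z_n(t)] for i.i.d. +-1 steps."""
    total = 0.0
    for _ in range(samples):
        steps = [rng.choice((-1, 1)) for _ in range(n)]
        total += rescaled_walk(steps, s) * rescaled_walk(steps, t)
    return total / samples
```

With n = 100, s = 0.25 and t = 0.75 the estimate should be close to s ∧ t = 0.25, up to a Monte Carlo error that shrinks like the inverse square root of the number of samples.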
Lemma 5.3 Brownian motion in 1 dimension is the Gaussian process (Bt )t∈R+ with
values in R such that
(o) B0 = 0.
(i) For any t ∈ R+ , E[Bt ] = 0.
(ii) For any t, s ≥ 0, E[Bt Bs ] = t ∧ s.
(iii) For any ω ∈ Ω, the map t → Bt (ω) is continuous.
82 5 Markov Processes in Continuous Time
Proof Let B be Brownian motion as defined in Definition 5.2. Then properties (o),
(i) and (iii) are obviously satisfied. To show that (ii) holds, assume without loss of
generality that t > s. Then
\[ \mathbb{E}[B_t B_s] = \mathbb{E}\big[(B_t - B_s) B_s + B_s^2\big] = 0 + s = t \wedge s, \tag{5.2.4} \]
where we use that Bt − Bs and Bs are independent and centred, and Bs has vari-
ance s.
To prove the converse, i.e., to prove that any stochastic process with the above
properties is Brownian motion, we can simply use the fact that the law of a Gaus-
sian process is uniquely determined by its mean and its covariance. Therefore the
stochastic process has the same law as Brownian motion (see Fig. 5.2), and has
continuous paths by property (iii), so it is Brownian motion.
There are a number of ways to prove the existence of Brownian motion, and we
refer the reader to the literature for proofs. In a way, the most appealing proof is via
Donsker’s theorem, which constructs Brownian motion as the limit of sums of i.i.d.
random variables via an interpolated version of (5.2.2).
Having constructed the random variable (Bt )t∈R+ in C(R+ , Rd ), we can define
its distribution, the so-called Wiener measure. We want to construct this as a measure
on the space of continuous functions equipped with its Borel σ -algebra. For this it
is useful to observe the following.
Lemma 5.6 The smallest σ -algebra C on C(R+ , Rd ) that makes all the coor-
dinate functions t → w(t) measurable coincides with the Borel-σ -algebra B =
B(C(R+ , Rd )) of the metrisable space C(R+ , Rd ) equipped with the topology of
uniform convergence on compact sets.
Proof First, C ⊂ B because all functions t → w(t) are continuous and hence measurable
with respect to the Borel-σ -algebra B. To prove that B ⊂ C , note that the
topology of uniform convergence is equivalent to the metric topology relative to the
metric
\[ d(w, w') = \sum_{n\in\mathbb{N}} 2^{-n} \Big( \sup_{0\leq t\leq n} \|w(t) - w'(t)\| \wedge 1 \Big), \quad w, w' \in C(\mathbb{R}_+, \mathbb{R}^d). \tag{5.2.5} \]
We thus have to show that any ball with respect to this distance is measurable with
respect to C . But since w, w′ are continuous functions, we have
\[ \sup_{t\in[0,n]} \big( \|w(t) - w'(t)\| \wedge 1 \big) = \sup_{t\in[0,n]\cap\mathbb{Q}} \big( \|w(t) - w'(t)\| \wedge 1 \big), \tag{5.2.6} \]
Note that, by construction, the map ω → B(ω) is measurable because the maps
ω → Bt (ω) are measurable for all t, and by the definition of C the coordinate maps
B → Bt are measurable for all t. Therefore the following definition makes sense.
Lemma 5.8 For any a ∈ R+ , the processes B = (Bt )t∈R+ and A = (At )t∈R+ with
$A_t = a^{-1} B_{a^2 t}$ have the same distribution.
Proof Obviously, A is a Gaussian process. It is also obvious that the time change
and the multiplication by a constant preserve the continuity of the paths, the starting
position 0 and the fact that the process has mean zero. Thus, it suffices to show that
B and A have the same covariance. But
\[ \mathbb{E}[A_t A_s] = a^{-2}\,\mathbb{E}[B_{a^2 t} B_{a^2 s}] = a^{-2} \big( a^2 t \wedge a^2 s \big) = s \wedge t, \tag{5.2.7} \]
Next, we show that Brownian motion is also a Markov process. For the definition
of a continuous-time Markov process, we use the obvious generalisation of the
definition of a discrete-time Markov process.
Definition 5.10 A stochastic process X with state space S and index set R+ is
called a continuous-time Markov process if there exists a two-parameter family of
probability kernels Ps,t satisfying the Chapman-Kolmogorov equations
\[ P_{s,t}(x,A) = \int_S P_{r,t}(y,A)\,P_{s,r}(x,dy), \quad \forall\, r \in (s,t),\ A \in \mathcal{B}, \tag{5.2.9} \]
\[ P_{s,t}(x,A) = \frac{1}{(2\pi(t-s))^{d/2}} \int_A \exp\Big( -\frac{\|y-x\|^2}{2(t-s)} \Big)\,dy. \tag{5.2.11} \]
Proof For simplicity we only consider the case d = 1 (the general case works the
same way). We proceed as in the discrete-time case:
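\[
\begin{aligned}
\mathbb{E}[M_{t+r} \mid \mathcal{F}_t] &= f(B_t) - f(B_0) - \frac{1}{2} \int_0^t f''(B_s)\,ds \\
&\quad + \mathbb{E}[f(B_{t+r}) - f(B_t) \mid \mathcal{F}_t] - \frac{1}{2} \int_0^r \mathbb{E}[f''(B_{t+s}) \mid \mathcal{F}_t]\,ds \\
&= M_t + \frac{1}{\sqrt{2\pi r}} \int_{\mathbb{R}} f(y) \exp\Big( -\frac{(y-B_t)^2}{2r} \Big)\,dy - f(B_t) \\
&\quad - \frac{1}{2} \int_0^r \frac{1}{\sqrt{2\pi s}} \int_{\mathbb{R}} f''(y) \exp\Big( -\frac{(y-B_t)^2}{2s} \Big)\,dy\,ds.
\end{aligned}
\tag{5.2.13}
\]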
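\[
\begin{aligned}
\frac{1}{\sqrt{2\pi s}} \int_{\mathbb{R}} f''(y) \exp\Big( -\frac{(y-x)^2}{2s} \Big)\,dy
&= \int_{\mathbb{R}} f(y)\, \frac{d^2}{dy^2} \Big[ \frac{1}{\sqrt{2\pi s}} \exp\Big( -\frac{(y-x)^2}{2s} \Big) \Big]\,dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(y) \big[ -s^{-3/2} + (y-x)^2 s^{-5/2} \big] \exp\Big( -\frac{(y-x)^2}{2s} \Big)\,dy \\
&= 2 \int_{\mathbb{R}} f(y)\, \frac{d}{ds} \Big[ \frac{1}{\sqrt{2\pi s}} \exp\Big( -\frac{(y-x)^2}{2s} \Big) \Big]\,dy.
\end{aligned}
\tag{5.2.14}
\]
\[ \frac{2}{\sqrt{2\pi r}} \int_{\mathbb{R}} f(y) \exp\Big( -\frac{(x-y)^2}{2r} \Big)\,dy - 2 f(x), \tag{5.2.15} \]
\[ \lim_{h\downarrow 0} \frac{1}{\sqrt{2\pi h}} \int_{\mathbb{R}} f(y) \exp\Big( -\frac{(x-y)^2}{2h} \Big)\,dy = f(x). \tag{5.2.16} \]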
\[ e(x,t) = \frac{1}{\sqrt{2\pi t}} \exp\Big( -\frac{x^2}{2t} \Big) \tag{5.2.17} \]
satisfies the (parabolic) partial differential equation
\[ \frac{\partial}{\partial t} e(x,t) = \frac{1}{2} \Delta e(x,t), \tag{5.2.18} \]
with the (singular) initial condition
The main building block for a time-homogeneous Markov process is the so-called
transition kernel P : R+ × S × B → [0, 1]. As in discrete time, the compatibility
conditions impose the Chapman-Kolmogorov equations for transition kernels.
Definition 5.14 A stochastic process X with state space S and index set R+ is
a continuous-time homogeneous Markov process on a filtered space (Ω, F , P,
(Ft )t∈R+ ) with transition function (Pt )t∈R+ if it is adapted to (Ft )t∈R+ and, for
all bounded B-measurable functions f and all s, t ∈ R+ ,
\[ \mathbb{E}[f(X_{t+s}) \mid \mathcal{F}_s](\omega) = (P_t f)\big(X_s(\omega)\big) \quad \text{a.s.} \tag{5.3.2} \]
The condition Pt (x, S) ≤ 1 may look surprising, since we would expect Pt (x, S) =
1. However, it is sometimes convenient to consider the more general situation where
the process may leave the state space, i.e., may “die”.
Equation (5.3.1) allows us to think of Markov transition functions as operators
on the Banach space of bounded measurable functions.
Definition 5.15 A family (Pt )t∈R+ of bounded linear operators on B(S, R) is called
a sub-Markov semigroup if, for all t ∈ R+ ,
(i) Pt : B(S, R) → B(S, R).
(ii) If 0 ≤ f ≤ 1, then 0 ≤ Pt f ≤ 1.
(iii) For all s ∈ R+ , Pt+s = Pt Ps .
(iv) If fn ↓ 0, then Pt fn ↓ 0.
A sub-Markov semigroup is called normal if P0 = 1, and is called honest if Pt 1 = 1
for all t ∈ R+ .
Our aim will be to construct the generator of the semigroup. We are looking for
an operator L such that Pt = exp(tL ), where “exp” is the exponential map, defined
through its Taylor expansion. This is a good enough way to construct a semigroup
from a bounded generator L . We will see shortly that this works well for Markov
jump processes with bounded jump rates m(x), x ∈ S. The general case, however, is
a bit more involved. The proper setting in which the relation between semigroup and
generator can be generalised is that of a so-called strongly continuous contraction
semigroup.
Definition 5.16 Let B0 be a Banach space. A family (Pt )t∈R+ of bounded linear
operators from B0 to B0 is called a strongly continuous contraction semigroup if the
following conditions are verified:
(i) For all f ∈ B0 , $\lim_{t\downarrow 0} \|P_t f - f\| = 0$.
(ii) $\|P_t\| \leq 1$ for all t ∈ R+ .
(iii) Pt Ps = Pt+s , for all s, t ∈ R+ .
Here ‖ · ‖ denotes the operator norm corresponding to the norm on B0 .
Definition 5.17 Let B0 be a Banach space and let (Pt )t∈R+ be a strongly continuous
contraction semigroup. We say that f is in the domain D(L ) of L , if there exists
a function g ∈ B0 , such that
Lemma 5.18 Let X be a Markov jump process with jump distribution P and jump
rates m(x), x ∈ S. Then X has a generator with domain B(0) given by
\[ (\mathcal{L} f)(x) = m(x) \int_S \big( f(y) - f(x) \big)\,P(x,dy). \tag{5.3.7} \]
\[ (P_t f)(x) = \big( \exp(t\mathcal{L}) f \big)(x) = \sum_{n\in\mathbb{N}_0} \frac{t^n}{n!}\, (\mathcal{L}^n f)(x). \tag{5.3.8} \]
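On a finite state space a bounded generator is a matrix, and the series (5.3.8) can be summed numerically. The sketch below does this for a two-state generator and compares the result with the known closed form for two-state chains; the off-diagonal entries A and B of the generator matrix, the truncation order and all names are assumptions of the example.

```python
import math


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def semigroup(gen, t, terms=60):
    """Partial sum of the Taylor series sum_n t^n L^n / n! for a matrix L."""
    n = len(gen)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        # term now becomes t^k L^k / k!
        term = [[t * v / k for v in row] for row in mat_mul(term, gen)]
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total


# Hypothetical two-state generator: leave state 0 at rate A, state 1 at rate B.
A, B = 2.0, 1.0
GEN = [[-A, A], [B, -B]]


def closed_form_00(t):
    """P_t(0, 0) for the two-state generator above."""
    return B / (A + B) + A / (A + B) * math.exp(-(A + B) * t)
```

The matrices semigroup(GEN, t) should be row-stochastic, match closed_form_00(t) in the upper-left entry, and satisfy the semigroup property $P_{s+t} = P_s P_t$ up to rounding.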
\[ \|(\lambda - \mathcal{L}) f\| \geq \lambda \|f\|. \tag{5.3.9} \]
The proof of the Hille-Yosida theorem is quite involved and functional analytic
in nature. It makes use of the concept of resolvent, which provides the constructive
link between generator and semigroup. The proof of the Hille-Yosida theorem also
provides a construction of the semigroup from the generator, but we will not need
this here.
We next turn to a class of Markov semigroups that will be seen to have nice proper-
ties. Our state space is a locally compact Hausdorff space S with a countable basis
(e.g. S = Rd ). We do not need to assume compactness, but we will need to consider
the one-point compactification of S obtained by adding a “coffin state” ∂, making
$S^\partial = S \cup \{\partial\}$ into a compact metrisable space.
We will place ourselves in the setting where the Hille-Yosida theorem works, and
make a specific choice for the underlying Banach space, namely, we will work on
the space C0 (S) of continuous functions vanishing at infinity. This requires that we
put a restriction on the semigroups so that they preserve C0 (S). The latter is known
as the Feller property.
It is an analytic fact (coming from the Riesz representation theorem) that to any
strongly continuous contraction semigroup there corresponds a sub-Markov kernel
Pt (x, dy) such that $(P_t f)(x) = \int_S P_t(x,dy)\,f(y)$ for all f ∈ C0 (S). The key result
is the existence theorem for Feller-Dynkin processes.
Theorem 5.21 Let (Pt )t∈R+ be a Feller-Dynkin semigroup on C0 (S). Then there
exists a strong Markov process with values in S ∂ , càdlàg paths and transition ker-
nels (Pt )t∈R+ . (The unique existence of the Markov process on the level of finite-
dimensional distributions does not require the Feller property.)
90 5 Markov Processes in Continuous Time
$$0 \le e^{-t} P_t h \le h. \tag{5.3.12}$$
If Y = (Yt )t∈R+ is the corresponding Markov process, then (e−t h(Yt ))t∈R+ is a
supermartingale.
Proof The lower bound in (5.3.12) is clear, since Pt maps non-negative functions
to non-negative functions. To get the upper bound, write
$$e^{-s} P_s h = e^{-s} P_s R_1 g = e^{-s} P_s \int_0^\infty e^{-u} P_u g\, du = \int_s^\infty e^{-u} P_u g\, du \le R_1 g = h. \tag{5.3.13}$$
Now, (e−t h(Yt ))t∈R+ is a supermartingale, since
$$\mathbb{E}\big[e^{-s-t} h(Y_{t+s}) \mid \mathcal{F}_t\big] = e^{-s-t} (P_s h)(Y_t) \le e^{-t} h(Y_t), \tag{5.3.14}$$
As a consequence of Lemma 5.22, the functions e−q h(Yq ) are regularisable, i.e.,
limq↓t e−q h(Yq ) exists for all t almost surely. We can therefore take a countable
dense subset {gi }i∈N of elements of C0 (S), set hi = R1 gi , and observe that the set
H = {hi }i∈N separates points in S ∂ , while almost surely e−q hi (Yq ) is regularisable
for all i ∈ N. But then Xt = limq↓t Yq exists for all t almost surely and is a càdlàg
process.
Finally, we establish that X is a modification of Y . To do this, let f, g ∈ C0 (S).
Then
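$$\mathbb{E}\big[f(Y_t) g(X_t)\big] = \lim_{q \downarrow t} \mathbb{E}\big[f(Y_t) g(Y_q)\big] = \lim_{q \downarrow t} \mathbb{E}\big[f(Y_t) (P_{q-t} g)(Y_t)\big] = \mathbb{E}\big[f(Y_t) g(Y_t)\big], \tag{5.3.15}$$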
where the first equality uses the definition of Xt and the third equality uses the strong
continuity of Pt . By an application of the monotone class theorem (Theorem 3.9)
this implies that $\mathbb{E}[f(Y_t, X_t)] = \mathbb{E}[f(Y_t, Y_t)]$ for any bounded measurable function $f$
on $S^\partial \times S^\partial$, and hence that $\mathbb{P}(X_t = Y_t) = 1$.
5.3 General Markov processes 91
However, we want more, namely, like in the case of discrete-time Markov processes,
we want to be able to split past and future at stopping times. Let θt be the shift acting
on Ω via
$$X(\theta_t \omega)_s = (\theta_t X)(\omega)_s = X(\omega)_{s+t}. \tag{5.3.17}$$
Then we have the following strong Markov property:
Theorem 5.23 Let $T$ be an $(\mathcal{F}_{t+})$-stopping time, and let $\mathbb{P}$ be the law of a Feller-Dynkin Markov process $X$. Then, for all bounded random variables $\eta$,
$$\mathbb{E}[\theta_T \eta \mid \mathcal{F}_{T+}] = \mathbb{E}_{X_T}[\eta], \tag{5.3.18}$$
or equivalently, for all $\mathcal{F}_{T+}$-measurable bounded random variables $\xi$,
$$\mathbb{E}[\xi\, \theta_T \eta] = \mathbb{E}\big[\xi\, \mathbb{E}_{X_T}[\eta]\big]. \tag{5.3.19}$$
For $\Lambda \in \mathcal{F}_{T+}$, set
$$\Lambda_{n,k} = \big\{\omega \in \Omega : T^{(n)}(\omega) = 2^{-n} k\big\} \cap \Lambda \in \mathcal{F}_{k 2^{-n}}. \tag{5.3.21}$$
To conclude the proof we need only generalise (5.3.26) to more general functions. This can be done in the usual manner via the monotone class theorem, and presents no particular difficulties. Indeed, we first check that $1_\Lambda$ can be replaced by any bounded $\mathcal{F}_{T+}$-measurable function. Next, we show through explicit computation that instead of $f(X_{T+s})$ we can put $\prod_{i=1}^n f_i(X_{T+s_i})$, and finally we can again use the monotone class theorem to conclude the proof for the general case.
In principle, the Hille-Yosida theorem gives us precise criteria for recognising when
a given linear operator generates a strongly continuous contraction semigroup and
hence a Markov process. However, if we look at the conditions more carefully, then
we realise that in many situations it will be impractical to verify them. The domain
of a generator is usually far too large to allow for a description of the action of
the generator on all of its elements. For instance, for Brownian motion we want to
5.4 The martingale problem 93
think of the generator as $\tfrac{1}{2}$ times the Laplacian on the space $C^2(\mathbb{R}_+, \mathbb{R}^d)$. But this
operator is closed only in d = 1, but not in d ≥ 2, so already in this case we enter
into subtle issues we would rather like to avoid.
Let us first discuss this issue from a functional analytic point of view. To that end
we need to recall a few notions from operator theory.
Definition 5.24 Let G, C be two linear operators with domains D(G), D(C), re-
spectively. We say that C is an extension of G if
(i) D(G) ⊂ D(C).
(ii) Gf = Cf for all f in D(G).
$$f_n = \frac{1}{n} \sum_{k=0}^{n^2} e^{-\lambda k/n} P_{k/n} f. \tag{5.4.4}$$
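$$\lim_{n\to\infty} (\lambda - \mathcal{L}) f_n = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n^2} e^{-\lambda k/n} P_{k/n} (\lambda - \mathcal{L}) f = \int_0^\infty dt\, e^{-\lambda t} P_t (\lambda - \mathcal{L}) f = R_\lambda (\lambda - \mathcal{L}) f = f. \tag{5.4.5}$$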
Thus, for any f ∈ D0 , there exists a sequence of functions ((λ − L )fn )n∈N in
range(λ − LD ) that converges to f . Hence the closure of the range of (λ − LD )
contains D0 . Since D0 is dense in B0 , the assertion follows from Lemma 5.28.
The above results are nice when we already know the semigroup. In more com-
plicated situations we may only be able to write down the action of what we want
to be the generator of the Markov process on some small subspace of functions.
The question is: How can we find out whether this specifies a (unique) strongly
continuous contraction semigroup on the full space of functions, e.g. C0 (S)? We
may be able to show that it is dissipative, but is range(λ − L ) dense in C0 (S)? The
martingale problem formulation is a powerful tool to address this question.
Lemma 5.30 Let X be a Feller-Dynkin process with transition functions (Pt )t∈R+
and generator L . Define, for f, g ∈ B(S),
$$M_t = f(X_t) - \int_0^t g(X_s)\, ds. \tag{5.4.6}$$
But
$$(P_s \mathcal{L} f)(x) = \frac{d}{ds} (P_s f)(x), \tag{5.4.8}$$
and so
$$\int_S P_u(X_t, dy) f(y) - f(X_t) - \int_0^u (P_s \mathcal{L} f)(X_t)\, ds = 0, \tag{5.4.9}$$
from which the claim follows.
By “the martingale problem” we will mean the inverse problem associated with
the above observation.
Definition 5.31 Given a linear operator L with domain D(L ) and range(L ) ⊂
Cb (S), an S-valued càdlàg process on a filtered càdlàg space (Ω, F , P, (Ft )t∈R+ )
is called a solution of the martingale problem associated with the operator L if, for
any f ∈ D(L ), (Mt )t∈R+ defined by (5.4.6) is an (Ft )t∈R+ -martingale.
Definition 5.32 A sequence $(f_n)_{n\in\mathbb{N}}$ in $B(S)$ is said to converge bounded pointwise
(bp) to a function $f \in B(S)$ if and only if
(i) $\sup_{n\in\mathbb{N}} \|f_n\|_\infty < \infty$.
(ii) For every $x \in S$, $\lim_{n\to\infty} f_n(x) = f(x)$.
A set $M \subset B(S)$ is called bp-closed if, for any sequence $(f_n)_{n\in\mathbb{N}}$ in $M$ such that
bp-$\lim_{n\to\infty} f_n = f \in B(S)$, it is true that $f \in M$. The bp-closure of a set $D \subset B(S)$ is the smallest bp-closed set in $B(S)$ that contains $D$. A set $M$ is called bp-dense if its bp-closure is $B(S)$.
Lemma 5.33 Let (fn )n∈N be such that bp−limn→∞ fn = f and bp−limn→∞ L fn
= L f . If
t
is a martingale.
Proof Straightforward.
The implication of Lemma 5.33 is that in order to find a unique solution of the
martingale problem it suffices to know the generator on a core.
Theorem 5.34 Let L1 be an operator with domain D(L1 ) and range range(L1 ),
and let L be an extension of L1 . Suppose that the bp-closures of the graphs of L1
and L are the same. Then a stochastic process X is a solution of the martingale
problem for L if and only if it is a solution of the martingale problem for L1 .
The strategy will be to understand when the martingale problem has a unique
solution, and to show that this solution is a Markov process. It will be comforting
to see that only dissipative operators can give rise to the solution of a martingale
problem.
We first prove a result that gives an equivalent characterisation of the martingale
problem.
$$\left( e^{-\int_0^t k(X_s)\,ds} f(X_t) + \int_0^t e^{-\int_0^s k(X_r)\,dr} \big[ k(X_s) f(X_s) - (\mathcal{L} f)(X_s) \big]\, ds \right)_{t \in \mathbb{R}_+} \tag{5.4.13}$$
is a martingale.
Theorem 5.36 Let M be a càdlàg local martingale (recall Definition 3.100), and
let V be a continuous and adapted process that is locally of bounded variation. Then
W = (Wt )t∈R+ with
$$W_t = \int_0^t V_s\, dM_s = V_t M_t - V_0 M_0 - \int_0^t M_s\, dV_s \tag{5.4.14}$$
Proof By the definition of local martingales, a.s. we can find an increasing sequence
of stopping times (τn )n∈N with limn→∞ τn = ∞ such that M τn are martingales for
each n ∈ N, where M τn is the martingale M stopped at time τn . We may, moreover,
assume that |M τn | ≤ n and
$$R_V(t) = \sup_{m\in\mathbb{N}}\ \sup_{0 \le u_0 \le \cdots \le u_m \le t}\ \sum_{k=0}^{m-1} |V_{u_{k+1}} - V_{u_k}| \le n. \tag{5.4.15}$$
We have
$$\int_0^t V_s\, dM^{\tau_n}_s = \lim_{n\to\infty} \sum_{k=0}^{m-1} V_{u^n_k} \big( M^{\tau_n}_{u^n_{k+1}} - M^{\tau_n}_{u^n_k} \big), \tag{5.4.16}$$
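$$\sum_{k=0}^{m-1} V_{u^n_k} \big( M^{\tau_n}_{u^n_{k+1}} - M^{\tau_n}_{u^n_k} \big) = V^{\tau_n}_t M^{\tau_n}_t - V_0 M^{\tau_n}_0 - \sum_{k=0}^{m-1} M^{\tau_n}_{u^n_{k+1}} \big( V^{\tau_n}_{u^n_{k+1}} - V^{\tau_n}_{u^n_k} \big). \tag{5.4.18}$$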
for the left-hand side. Since, for any n ∈ N, the left-hand side is a martingale, this
property remains true in the limit as n → ∞. The limit as n → ∞ exists because
limn→∞ τn = ∞ a.s.
Corollary 5.37 Let (Ft )t∈R+ be a filtration and X an adapted process. Let f, g ∈
B(S). Then, for λ > 0, (5.4.6) is a martingale if and only if
$$\left( e^{-\lambda t} f(X_t) + \int_0^t e^{-\lambda s} \big[ \lambda f(X_s) - g(X_s) \big]\, ds \right)_{t\in\mathbb{R}_+} \tag{5.4.19}$$
is a martingale.
Lemma 5.38 Let L be a linear operator with domain and range in B(S). If a
solution of the martingale problem for L exists for any initial condition X0 = x ∈ S,
then L is dissipative.
Proof Let f ∈ D(L ) and g = L f . Use that (5.4.19) is a martingale with λ > 0.
Taking expectations and letting t → ∞, we get
$$f(X_0) = f(x) = \mathbb{E}\left[ \int_0^\infty e^{-\lambda s} \big[ \lambda f(X_s) - g(X_s) \big]\, ds \right] \tag{5.4.20}$$
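and hence
$$|f(x)| \le \int_0^\infty e^{-\lambda s}\, \mathbb{E}\big|\lambda f(X_s) - g(X_s)\big|\, ds \le \int_0^\infty e^{-\lambda s}\, \|\lambda f - g\|\, ds = \lambda^{-1} \|\lambda f - g\|, \tag{5.4.21}$$
which shows that $\lambda \|f\| \le \|(\lambda - \mathcal{L}) f\|$ and proves that $\mathcal{L}$ is dissipative.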
5.4.3 Uniqueness
We have seen that solutions of the martingale problem provide candidates for nice
Markov processes. The two main issues to understand are when a martingale prob-
lem has a unique solution and whether this solution represents a Markov process.
When talking about uniqueness we will always assume that an initial distribution μ
is given. Thus, the data for the martingale problem is a pair (L , μ), where L is a
linear operator with its domain D(L ) and μ is a probability measure on S.
The following result is hardly surprising.
Finally we can establish a uniqueness criterion and the strong Markov property
for solutions of martingale problems.
Theorem 5.41 Let S be a separable space and let L be a linear operator on B(S).
Suppose that for any initial distribution μ, any two solutions X, Y of the martingale
problem for (L , μ) have the same one-dimensional distributions, i.e., P(Xt ∈ A) =
P(Yt ∈ A) for any t ∈ R+ and any Borel set A. Then the following hold:
(i) Any solution of the martingale problem for L is a Markov process and any
two solutions of the martingale problem with the same initial distribution have
the same finite-dimensional distributions (i.e., uniqueness holds).
(ii) If D(L ) ⊂ Cb (S) and X is a solution of the martingale problem with càdlàg
sample paths, then for any a.s. finite stopping time τ ,
$$\mathbb{E}\big[f(X_{t+\tau}) \mid \mathcal{F}_\tau\big] = \mathbb{E}\big[f(X_{t+\tau}) \mid X_\tau\big] \quad \forall\, f \in B(S). \tag{5.4.23}$$
(iii) If, in addition to the assumptions in (ii), there exists a càdlàg solution of the
martingale problem for any initial measure μ = δx , x ∈ S, then the strong
Markov property holds, i.e.,
$$\mathbb{E}\big[f(X_{t+\tau}) \mid \mathcal{F}_\tau\big] = (P_t f)(X_\tau). \tag{5.4.24}$$
5.4.4 Existence
We have seen that a uniquely solvable martingale problem provides a way to con-
struct a Markov process. We therefore need to find ways to produce solutions of
martingale problems. The best way to do this is through approximations and weak
convergence.
Lemma 5.42 Let L be a linear operator with domain and range in Cb (S). Let
(Ln )n∈N be a sequence of linear operators with domain and range in B(S). Assume
that for any $f \in D(\mathcal{L})$ there exists a sequence $(f_n)_{n\in\mathbb{N}}$ with $f_n \in D(\mathcal{L}_n)$ such that
If, for each n ∈ N, X n is a solution of the martingale problem for Ln with càdlàg
sample paths and X n converges to X weakly, then X is a càdlàg solution of the
martingale problem for L .
Proof Let k ∈ N, and let 0 ≤ t1 < · · · < tk ≤ t < s be elements of the set C (X) =
{u ∈ R+ : P(Xu = Xu− ) = 1}. Let h1 , . . . , hk ∈ Cb (S), and let f, fn be as in the
hypothesis of the lemma. Then
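$$\begin{aligned}
&\mathbb{E}\left[ \left( f(X_s) - f(X_t) - \int_t^s (\mathcal{L} f)(X_u)\, du \right) \prod_{i=1}^k h_i(X_{t_i}) \right] \\
&\qquad = \lim_{n\to\infty} \mathbb{E}\left[ \left( f_n(X^n_s) - f_n(X^n_t) - \int_t^s (\mathcal{L}_n f_n)(X^n_u)\, du \right) \prod_{i=1}^k h_i(X^n_{t_i}) \right] = 0.
\end{aligned} \tag{5.4.26}$$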
The complement of the set C (X) is at most countable, and hence (5.4.26) carries
over to all points 0 ≤ t1 < · · · < tk ≤ t < s. But this implies that X solves the mar-
tingale problem for L .
The usefulness of Lemma 5.42 is based on the following lemma, which implies
that we can use Markov jump processes as approximations.
Lemma 5.43 Let S be compact and let L be a dissipative operator on C(S) with
dense domain and L 1 = 0. Then there exists a sequence of positive contraction
operators (Tn )n∈N on B(S) given by transition kernels such that, for f ∈ D(L ),
$$\lim_{n\to\infty} n (T_n - 1) f = \mathcal{L} f. \tag{5.4.27}$$
Proof Here is a rough sketch of the proof, which is closely related to the Hille-
Yosida theorem. From L we construct the resolvent (n − L )−1 on the range of
(n − L ). For a dissipative L , the operators n(n − L )−1 are bounded (by 1) on
range(n − L ). Hence, by the Hahn-Banach theorem, they can be extended to C(S)
as bounded operators. Using the Riesz representation theorem, we can then associate
with n(n − L )−1 a probability measure μn via
$$n (n - \mathcal{L})^{-1} f(x) = \int_S f(y)\, \mu_n(x, dy), \tag{5.4.28}$$
The point of Lemma 5.43 is that it shows that the martingale problem for L can
be approximated by martingale problems with bounded generators of the form
$$(\mathcal{L}_n f)(x) = n \int_S \big(f(y) - f(x)\big)\, \mu_n(x, dy), \tag{5.4.29}$$
where Ln is the generator of a Markov jump process. For such a generator, the
construction of a solution can be done explicitly in various ways, e.g. by letting the
transition kernel be the convergent series for exp(tLn ).
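The series construction can be illustrated on a finite state space, where the kernel $\mu_n$ is a stochastic matrix and the generator is a matrix. The following is a minimal sketch under that assumption; the three-state kernel, the rate 2.0 and all function names are hypothetical choices made for illustration, not taken from the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def jump_generator(rate, kernel):
    """Matrix of (L f)(x) = rate * sum_y (f(y) - f(x)) kernel(x, y)."""
    size = len(kernel)
    return [[rate * (kernel[i][j] - (1.0 if i == j else 0.0)) for j in range(size)]
            for i in range(size)]


def exp_series(gen, t, terms=60):
    """Partial sum of sum_k t^k gen^k / k!, the series for exp(t * gen)."""
    size = len(gen)
    result = identity(size)
    term = identity(size)
    for k in range(1, terms):
        # term becomes (t * gen)^k / k!
        term = [[t * x / k for x in row] for row in mat_mul(term, gen)]
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    return result


# Hypothetical jump kernel on three states (rows sum to one).
kernel = [[0.0, 1.0, 0.0],
          [0.5, 0.0, 0.5],
          [0.0, 1.0, 0.0]]
gen = jump_generator(2.0, kernel)

p_short = exp_series(gen, 0.3)
p_long = exp_series(gen, 0.7)
p_total = exp_series(gen, 1.0)
```

In this sketch each row of `p_short` is a probability vector, and `mat_mul(p_short, p_long)` agrees with `p_total` up to rounding, which is the semigroup property for the times 0.3 and 0.7.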
The definition of the stochastic integral was already provided in Theorem 5.36,
where the integral $\int_0^t V_s\, dM_s$ was defined through a Stieltjes integral in the case
where $V$ has (locally) bounded variation. We thus see that the challenge is to define
stochastic integrals when the integrand, too, is not of bounded variation. Before
doing so we need to return briefly to the theory of martingales.
Let M be a càdlàg martingale. We want to define its quadratic variation process
[M] in analogy with the discrete-time setting. This will be contained in the following
fundamental result.
Proof We will only consider the case where M is continuous. We may also assume
that M is bounded: otherwise we consider the martingale stopped when it exceeds a
finite value N . Define stopping times
$$T^n_0 = 0, \qquad T^n_{k+1} = \inf\big\{ t > T^n_k : |M_t - M_{T^n_k}| \ge 2^{-n} \big\}. \tag{5.5.2}$$
5.5 Itō calculus 103
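Set $t^n_k = t \wedge T^n_k$. Assuming that $M_0 = 0$, we can write (by telescopic expansions)
$$M_t^2 = 2 \sum_{k\in\mathbb{N}} M_{t^n_{k-1}} \big( M_{t^n_k} - M_{t^n_{k-1}} \big) + \sum_{k\in\mathbb{N}} \big( M_{t^n_k} - M_{t^n_{k-1}} \big)^2. \tag{5.5.3}$$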
Let
$$H^n_t = \sum_{k\in\mathbb{N}} M_{T^n_{k-1}}\, 1_{\{T^n_{k-1} < t \le T^n_k\}}. \tag{5.5.4}$$
Note that H n = (Htn )t∈R+ is left-continuous, which makes it previsible. The first
term in the right-hand side of (5.5.3) is (H n • M)t , the so-called discrete stochastic
integral (see (3.4.4)), and we know from Theorem 3.53 that this is an L2 -bounded
martingale. We define
$$A^n_t = \sum_{k\in\mathbb{N}} \big( M_{t^n_k} - M_{t^n_{k-1}} \big)^2. \tag{5.5.5}$$
Then
$$M_t^2 = 2 (H^n \bullet M)_t + A^n_t. \tag{5.5.6}$$
By construction, H n approximates M well:
$$\sup_{t\in\mathbb{R}_+} \big| H^n_t - H^{n+1}_t \big| \le 2^{-n-1}, \qquad \sup_{t\in\mathbb{R}_+} \big| H^n_t - M_t \big| \le 2^{-n}. \tag{5.5.7}$$
The sets Jn (ω) = {Tkn (ω), k ∈ N} refine each other, i.e., Jn (ω) ⊂ Jn+1 (ω), and
$$\le 2^{-2n-2}\, \mathbb{E}\Big[ \sum_{k\in\mathbb{N}} (M_{t_k} - M_{t_{k-1}})^2 \Big] = 2^{-2n-2}\, \mathbb{E}\big[ M_\infty^2 \big]. \tag{5.5.9}$$
Due to the fact that the sets Jn (ω) form refinements and that An increases on the
stopping times Tkn , it follows that
$$A_{T^n_k} \le A_{T^n_{k+1}} \tag{5.5.11}$$
for all $k, n$. So $A$ is increasing on the closure of $J(\omega) = \bigcup_{n\in\mathbb{N}} J_n(\omega)$. Thus, if $J(\omega)$
is dense, then A is increasing. The remaining option is that the complement of J (ω)
contains some open interval I . But in that case, since no Tkn is in I , M must be
constant on I , and so must A. Thus, A is a continuous increasing process such that
M 2 − A is a continuous martingale, and hence A = [M].
It remains to show the uniqueness of the process [M]. For this we use the follow-
ing lemma.
Lemma 5.45 Suppose that M is a continuous local martingale that has paths of
finite variation. If M0 = 0, then Mt = 0 for all t.
is the total variation process, we may assume that M has bounded total variation.
Then, obviously,
$$A^n_t = \sum_{k\in\mathbb{N}} \big( M_{t^n_k} - M_{t^n_{k-1}} \big)^2 \le 2^{-n} \sum_{k\in\mathbb{N}} \big| M_{t^n_k} - M_{t^n_{k-1}} \big| \le 2^{-n} V_M(t), \tag{5.5.13}$$
Uniqueness of $[M]$ follows from the above observations. Indeed, assume that
there are two processes $A, A'$ with the desired properties. Then $A - A'$ is the difference of the two uniformly integrable martingales $M^2 - A'$ and $M^2 - A$, and hence is itself a uniformly
integrable martingale. On the other hand, since $A$ and $A'$ are increasing and hence
are of finite variation, their difference is of finite variation, and thus is identically
zero by Lemma 5.45.
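Theorem 5.46 Let $M$ be a càdlàg martingale. Then, for any $t \in \mathbb{R}_+$ and any sequence of partitions $\{u^n_k\}$ of the interval $[0, t]$ such that $\lim_{n\to\infty} \max_{k\in\mathbb{N}} |u^n_k - u^n_{k-1}| = 0$,
$$\sum_{k\in\mathbb{N}} \big( M_{u^n_{k+1}} - M_{u^n_k} \big)^2 \xrightarrow{\ D\ } [M]_t. \tag{5.5.14}$$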
The proof of this theorem is somewhat technical and will not be included. See
e.g. Ethier and Kurtz [104].
For the case where M is Brownian motion, we have the following fact.
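For standard Brownian motion the quadratic variation is $[B]_t = t$, so the sums of squared increments in (5.5.14) should concentrate around $t$ as the mesh shrinks. The following is a minimal numerical sketch of this, assuming equidistant partitions and independently sampled Gaussian increments; the seed, the horizon `t = 2.0` and the function names are hypothetical choices for illustration.

```python
import math
import random


def brownian_increments(t, n, rng):
    """n independent N(0, t/n) increments of Brownian motion on [0, t]."""
    dt = t / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]


def sum_of_squares(increments):
    """Sum of squared increments over the partition, as in (5.5.14)."""
    return sum(d * d for d in increments)


rng = random.Random(12345)
t = 2.0

# Two independent samples with different mesh sizes (not refinements of one path).
qv_coarse = sum_of_squares(brownian_increments(t, 100, rng))
qv_fine = sum_of_squares(brownian_increments(t, 100_000, rng))
```

The sum has mean $t$ and standard deviation $t\sqrt{2/n}$ for $n$ equidistant increments, so `qv_fine` lies much closer to `t` than `qv_coarse` typically does.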
Recall from the discrete-time theory that there were two brackets associated with
a martingale: $\langle M \rangle$ and $[M]$. The first corresponds to the process given by Theorem 5.44, the second is the quadratic variation process. In the case of continuous
martingales, they are the same.
We have already seen that the stochastic integral can be defined as a Stieltjes integral
for integrators of bounded variation. We will now show the crucial connection be-
tween the quadratic variation process of the stochastic integral and the process [M].
We begin with the case where the integrand $X$ is a step function.
Definition 5.48 A stochastic process is called a simple process if its sample paths
are step functions of the form
$$X_t(\omega) = \sum_{i=1}^\infty x_i(\omega)\, 1_{\{t_{i-1} < t \le t_i\}}, \tag{5.5.15}$$
Clearly, the stochastic integral for such a function is defined and equals
$$\int_0^t X_s\, dM_s = \sum_{t_i \le t} x_i \big( M(t_i) - M(t_{i-1}) \big) + x_{m(t)+1} \big( M(t) - M(t_{m(t)}) \big),$$
Proof The proof is straightforward and will be skipped. See Ethier and Kurtz [104,
Sect. 5.2.].
The stochastic integral for more general integrands should share the properties
stated in Lemma 5.49. The natural goal is to extend the integral to integrands X for
which the objects characterising it make sense. Note, in particular, that it follows
from (5.5.16) that
$$\mathbb{E}\left[ \left( \int_0^t X_s\, dM_s \right)^2 \right] = \mathbb{E}\left[ \int_0^t X_s^2\, d[M]_s \right]. \tag{5.5.17}$$
This means that the map $X \mapsto \int_0^t X_s\, dM_s$, from the space of left-continuous step functions equipped with the norm
$$\|X\|_{2,d[M]} = \left( \mathbb{E}\left[ \int_0^t X_s^2\, d[M]_s \right] \right)^{1/2} \tag{5.5.18}$$
to the space of local square-integrable martingales with the norm L 2 (P), is an isom-
etry, called the Itō isometry. We will extend this isometry to all of L 2 (d[M]) to
define the Itō integral.
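$$\sum_{n\in\mathbb{N}} \left( \mathbb{E}\left[ \int_0^n (X_n - X)_s^2\, d[M]_s \right] \right)^{1/2} < \infty, \tag{5.5.19}$$
it is true that
$$\lim_{n\to\infty}\ \sup_{0\le t\le T} \left| \int_0^t (X_n - X)_s\, dM_s \right| = 0, \tag{5.5.20}$$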
Proof See Ethier and Kurtz [104, Sect. 5.2, Theorem 2.3].
Remark 5.51 Theorem 5.50 extends the isometry $X \mapsto \int X\, dM$ from the dense set
of left-continuous bounded step functions to the full space $L^2(d[M])$.
Remark 5.52 Theorem 5.50 is not the end of the possible extensions of the defini-
tion of stochastic integrals. Using localisation arguments as indicated in the defini-
tion of the bracket [M], we can extend the space of integrators to continuous local
martingales without the assumption of square integrability.
5.6 Stochastic differential equations 107
We now come to a most useful formula involving the notion of stochastic integrals,
the celebrated Itō formula. This formula is the analogue for functions of stochastic
processes with unbounded variation of the fundamental theorem of calculus.
We consider a stochastic process X of the form
X = X0 + V + M, (5.5.22)
Theorem 5.53 (Itō formula) With the assumptions stated above, the following
holds:
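$$\begin{aligned}
f(t, X_t) - f(0, X_0) ={}& \int_0^t \frac{\partial}{\partial s} f(s, X_s)\, ds + \int_0^t \frac{\partial}{\partial x} f(s, X_s)\, dV_s \\
&+ \int_0^t \frac{\partial}{\partial x} f(s, X_s)\, dM_s + \frac{1}{2} \int_0^t \frac{\partial^2}{\partial x^2} f(s, X_s)\, d[M]_s.
\end{aligned} \tag{5.5.23}$$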
Remark 5.54 The Itō formula can be stated more conveniently in differential form
as
$$df(t, X_t) = \frac{\partial}{\partial t} f(t, X_t)\, dt + \frac{\partial}{\partial x} f(t, X_t)\, dX_t + \frac{1}{2} \frac{\partial^2}{\partial x^2} f(t, X_t)\, d[X]_t, \tag{5.5.24}$$
with the understanding that d[X] = d[M], since the quadratic variation of the finite
variation process V is zero.
where the integral with respect to B is understood as the Itō stochastic integral.
In the most general setting the drift functions b and the diffusion functions σ are
assumed to be locally bounded and measurable.
The questions we are interested in are existence and uniqueness of solutions of
(5.6.2), as well as properties of these solutions. In Sects. 5.6.1–5.6.2 below we dis-
cuss the notion of strong solution. For stochastic differential equations there is also
the notion of weak solution, but we will skip that here.
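To get a feeling for what a solution looks like, an SDE with drift $b$ and diffusion coefficient $\sigma$ can be discretised in time. The following is a minimal sketch using the Euler-Maruyama scheme, which is a standard numerical method and not a construction used in this chapter; it assumes a scalar equation of the form $dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t$, and the function name and the two test equations are hypothetical.

```python
import math
import random


def euler_maruyama(b, sigma, x0, t_end, n_steps, rng):
    """Approximate a path of dX = b(t, X) dt + sigma(t, X) dB on [0, t_end]."""
    dt = t_end / n_steps
    t, x = 0.0, x0
    path = [x0]
    for _ in range(n_steps):
        # Brownian increment over a time step of length dt.
        d_b = rng.gauss(0.0, math.sqrt(dt))
        x = x + b(t, x) * dt + sigma(t, x) * d_b
        t += dt
        path.append(x)
    return path


# Sanity check without noise: dX = -X dt has the solution exp(-t).
det_path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.0,
                          1.0, 1.0, 10_000, random.Random(0))

# A noisy example (Ornstein-Uhlenbeck type) with constant diffusion coefficient.
ou_path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.5,
                         1.0, 1.0, 1_000, random.Random(1))
```

With the noise switched off the scheme reduces to the explicit Euler method, so `det_path[-1]` is close to $e^{-1}$. The scheme says nothing about existence or uniqueness; it only approximates a solution when one exists.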
Definition 5.55 For an SDE pathwise uniqueness holds if the following is true. For
any set-up $(\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}_t)_{t\in\mathbb{R}_+}, \xi, B)$ and any two continuous semi-martingales $X$
and $X'$ such that
$$\int_0^t \big( |b(s, X_s)| + |\sigma(s, X_s)|^2 \big)\, ds < \infty \tag{5.6.4}$$
solving the SDE with initial condition $\xi$ and Brownian motion $B$,
$$\mathbb{P}\big( X_t = X'_t \ \ \forall\, t \in \mathbb{R}_+ \big) = 1. \tag{5.6.5}$$
If an SDE admits for any set-up (Ω, F , P, (Ft )t∈R+ , ξ, B) exactly one continuous
semi-martingale as solution, then we say that it is exact.
The notion of strong solution is naturally associated with the setting of exact
SDEs.
F : Rn × W → W (5.6.6)
such that
F −1 (Ht ) ⊂ B Rn × H¯t ∀ t ∈ R+ (5.6.7)
and on any set-up (Ω, F , P, (Ft )t∈R+ , ξ, B) the process X = F (ξ, B) solves the
SDE.
Existence and uniqueness results in the strong sense can be proven in a similar
way as for the case of ordinary differential equations, with the help of Gronwall’s
inequality and Picard’s iteration scheme. We recall Gronwall’s lemma, whose proof
is elementary.
Then
t
Lemma 5.57 is at the heart of all uniqueness proofs for differential equations.
The idea is always the same: given a differential equation $x' = F(s, x)$, try to find
a norm $\|x\|$ and a Lipschitz bound $\|F(s, x) - F(s, y)\| \le g(s) \|x - y\|$, at least for
small times $s$ and small $\|x - y\|$. For two solutions $x, y$ with initial values $x_0, y_0$,
set $f(t) = \|x_t - y_t\|$. Then
$$f(t) \le \|x_0 - y_0\| + \int_0^t g(s) f(s)\, ds, \tag{5.6.10}$$
Theorem 5.58 Assume that b and σ are bounded and measurable, and that there
exists an open set U ⊂ R, T > 0 and K < ∞ such that
$$|b(t, x) - b(t, y)| + |\sigma(t, x) - \sigma(t, y)| \le K |x - y|, \qquad x, y \in U,\ 0 \le t \le T. \tag{5.6.12}$$
Let X, Y be two solutions of (5.6.2) (with the same Brownian motion B), and set
$$\tau = \inf\{ t \in \mathbb{R}_+ : X_t \notin U \text{ or } Y_t \notin U \}. \tag{5.6.13}$$
If E[(X0 − Y0 )2 ] = 0, then
Proof The proof is based on Gronwall’s lemma and runs very much like its deter-
ministic analogue. As norm we choose a uniform L 2 -bound:
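$$\begin{aligned}
\mathbb{E}\Big[ \max_{0\le s\le t} (X_{s\wedge\tau} - Y_{s\wedge\tau})^2 \Big]
\le{}& 2\, \mathbb{E}\big[ (X_0 - Y_0)^2 \big]
+ 4\, \mathbb{E}\left[ \max_{0\le s\le t} \left( \int_0^{s\wedge\tau} \big( \sigma(u, X_u) - \sigma(u, Y_u) \big)\, dB_u \right)^2 \right] \\
&+ 4\, \mathbb{E}\left[ \max_{0\le s\le t} \left( \int_0^{s\wedge\tau} \big( b(u, X_u) - b(u, Y_u) \big)\, du \right)^2 \right] \\
\le{}& 16\, \mathbb{E}\left[ \int_0^{t\wedge\tau} \big( \sigma(u, X_u) - \sigma(u, Y_u) \big)^2\, du \right]
+ 4t\, \mathbb{E}\left[ \int_0^{t\wedge\tau} \big( b(u, X_u) - b(u, Y_u) \big)^2\, du \right] \\
\le{}& 4K^2 (t + 4)\, \mathbb{E}\left[ \int_0^{t\wedge\tau} (X_u - Y_u)^2\, du \right] \\
\le{}& 4K^2 (t + 4) \int_0^t \mathbb{E}\Big[ \max_{0\le u\le s} (X_{u\wedge\tau} - Y_{u\wedge\tau})^2 \Big]\, ds.
\end{aligned} \tag{5.6.15}$$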
The first inequality uses that (a + b)2 ≤ 2a 2 + 2b2 , the second inequality uses the
Cauchy-Schwarz inequality for the drift term and the Doob L2 -maximum inequality
for the diffusion term, the third inequality uses the Lipschitz condition, while the
fourth inequality uses Fubini’s theorem.
We see that $f(t) = \mathbb{E}[\max_{0\le s\le t} (X_{s\wedge\tau} - Y_{s\wedge\tau})^2]$ satisfies the hypothesis of Gronwall's lemma with $A = 0$, so that
$$\mathbb{E}\Big[ \max_{0\le t\le T} (X_{t\wedge\tau} - Y_{t\wedge\tau})^2 \Big] = 0. \tag{5.6.16}$$
Finally, existence of solutions (for finite times) can be proved via the usual Picard
iteration scheme under Lipschitz and growth conditions.
Theorem 5.59 Let b, σ satisfy the Lipschitz conditions in (5.6.12) and assume that
$$\max_{0\le t\le T} \big( |b(t, x)|^2 + |\sigma(t, x)|^2 \big) \le K^2 \big( 1 + |x|^2 \big). \tag{5.6.17}$$
Let ξ be a random vector with finite second moment, independent of B, and let
(Ft )t∈R+ be the usual augmentation of the filtration associated with B and ξ . Then
there exists a continuous (Ft )t∈R+ -adapted process X that is a strong solution of
the SDE with initial condition $\xi$. Moreover, $X$ is square-integrable, i.e., for any
$T > 0$ there exists a $C(K, T)$ such that, for all $0 \le t \le T$,
$$\mathbb{E}\big[ \|X_t\|^2 \big] \le C(K, T) \big( 1 + \mathbb{E}\big[ \|\xi\|^2 \big] \big)\, e^{C(K,T) t}. \tag{5.6.18}$$
Proof We define a map F from the space of continuous adapted processes X that
are uniformly square-integrable on [0, T ] to itself via
$$F(X)_t = \xi + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s. \tag{5.6.19}$$
Note that the square-integrability of F (X) needs the growth conditions in (5.6.17).
As in (5.6.15),
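$$\begin{aligned}
\mathbb{E}\Big[ \sup_{0\le t\le T} \big\| F(X)_t - F(Y)_t \big\|^2 \Big]
\le{}& 2\, \mathbb{E}\left[ \sup_{0\le t\le T} \left( \int_0^t \big( \sigma(X_s) - \sigma(Y_s) \big)\, dB_s \right)^2 \right] \\
&+ 2\, \mathbb{E}\left[ \sup_{0\le t\le T} \left( \int_0^t \big( b(X_s) - b(Y_s) \big)\, ds \right)^2 \right] \\
\le{}& 2K^2 (1 + T) \int_0^T \mathbb{E}\Big[ \sup_{0\le s\le t} \|X_s - Y_s\|^2 \Big]\, dt,
\end{aligned} \tag{5.6.20}$$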
inequality
$$\mathbb{E}\Big[ \sup_{0\le t\le T} \big\| X^{(k+1)}_t - X^{(k)}_t \big\|^2 \Big] \le \frac{C^k T^{2k}}{k!}\, \mathbb{E}\big[ 1 + \|\xi\|^2 \big]. \tag{5.6.22}$$
Apply the same arguments to estimate
(k+1) 2 t 2
E Xt ≤ K E ξ 2 + KT 1 + E Xs(k) ds. (5.6.23)
0
(k+1) 2 k
KT i+1
E Xt ≤ E ξ 2 + 1 + E ξ 2 ≤ KT 1 + E ξ 2 eKT t ,
i!
i=1
(5.6.24)
to get the growth bound in (5.6.18) with C(K, T ) = KT .
Theorem 5.60 Let X be a Markov process, i.e., a solution of the martingale prob-
lem for an operator L , and let h be a strictly positive harmonic function. Define
the measure Ph such that, for any Ft -measurable random variable Y ,
$$\mathbb{E}^h_x[Y] = \frac{1}{h(x)}\, \mathbb{E}_x\big[ h(X_t)\, Y \big]. \tag{5.6.25}$$
Then Ph is the law of a solution of the martingale problem for the operator L h
defined by
$$(\mathcal{L}^h f)(x) = \frac{1}{h(x)}\, \big( \mathcal{L}(h f) \big)(x). \tag{5.6.26}$$
$$\mathcal{L}^h = \frac{1}{2} \Delta + \frac{\nabla h}{h} \cdot \nabla, \tag{5.6.27}$$
and hence, under the law $\mathbb{P}^h$, the Brownian motion becomes the solution of the SDE
$$dX_t = \frac{\nabla h(X_t)}{h(X_t)}\, dt + dB_t. \tag{5.6.28}$$
Fig. 5.3 Drift function $b : \mathbb{R} \to \mathbb{R}$ for Brownian motion conditioned to never hit the origin: $b(x) = 1/x$
On the other hand, we have seen that if h(x) is the probability of some event, e.g.
$h(x) = \mathbb{P}_x(X_{\tau_D} \in A)$ for some $A \subset \partial D$, then
This means that the Brownian motion conditioned to exit D at a given location can
be represented as the solution of an SDE with a specific drift. For instance, let d = 1
and D = (0, R). Consider the Brownian motion conditioned to leave D at R. It is
well known that
Px (XτD = R) = x/R, x ∈ D. (5.6.30)
Thus, the conditioned Brownian motion solves
$$dX_t = \frac{1}{X_t}\, dt + dB_t. \tag{5.6.31}$$
We can let R → ∞ without changing the SDE. Hence, the solution of (5.6.31) is
Brownian motion conditioned to never return to the origin (see Fig. 5.3). This is
reasonable, because the strength of the drift away from zero goes to infinity near 0.
Still, it is remarkable that the conditioning can be reproduced by the application of
a proper drift.
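The claim that $R$ drops out of the SDE can be checked directly: for $h(x) = x/R$ from (5.6.30), the drift $\nabla h / h$ in (5.6.28) equals $1/x$ for every $R$. The following is a minimal numerical sketch of this check, assuming a central finite difference for the derivative; the function names, the step size and the evaluation point $x = 0.5$ are hypothetical choices for illustration.

```python
def h_transform_drift(h, x, step=1e-6):
    """Drift h'(x) / h(x) of the h-transformed Brownian motion in d = 1,
    with the derivative approximated by a central difference."""
    grad = (h(x + step) - h(x - step)) / (2.0 * step)
    return grad / h(x)


def exit_probability(radius):
    """h(x) = x / R, the probability of leaving (0, R) at R, as in (5.6.30)."""
    return lambda x: x / radius


# The drift at x = 0.5 for several interval lengths R.
drifts = {radius: h_transform_drift(exit_probability(radius), 0.5)
          for radius in (1.0, 10.0, 1000.0)}
```

All three values in `drifts` agree with $1/x = 2$ up to rounding, which is why letting $R \to \infty$ leaves (5.6.31) unchanged.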
integrable with respect to dt, i.e., X is an integrand for B. Suppose that we want to
study the process
$$W_t = B_t - \int_0^t X_s\, ds. \tag{5.6.32}$$
We may think of Xs = b(s, Bs ) for some bounded measurable function b, the sim-
plest example being b(s, Xs ) = b, in which case
Wt = Bt − bt,
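and let $\tilde{\mathbb{P}}_T$ be defined by
$$\tilde{\mathbb{P}}_T(A) = \mathbb{E}\big[ Z_T(X)\, 1_A \big]. \tag{5.6.34}$$
If $(Z_t)_{0\le t\le T}$ is a martingale, then the process $(W_t)_{0\le t\le T}$ is a Brownian motion
under $\tilde{\mathbb{P}}_T$.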
Remark 5.62 We may check, using the Itō formula, that Zt solves
Hence (Zt )0≤t≤T is a positive local martingale and so, by Fatou’s lemma, a super-
martingale. It is a martingale whenever E[Zt ] = 1 for all t.
Fig. 5.4 A space-time plot (x, t) → u(x, t) of the stochastic Allen-Cahn equation
is not yet finalised. For recent developments, see Hairer [135]. In Chap. 12 we will
discuss metastability for one example system, namely, the stochastic Allen-Cahn
equation (see Fig. 5.4 for a visualisation). In this section we present the relevant
background.
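$$\frac{\partial}{\partial t} u(x, t) = \frac{1}{2} D \frac{\partial^2}{\partial x^2} u(x, t) - V'\big( u(x, t) \big) + \sqrt{2\varepsilon}\, \frac{\partial^2}{\partial x\, \partial t} W(x, t). \tag{5.7.1}$$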
The initial condition is given by $u(x, 0) = u_0(x)$, $x \in [0, 1]$, with $u_0$ a continuous function. We also need to choose boundary conditions, e.g. periodic boundary conditions $u(0, t) = u(1, t)$, $t \in \mathbb{R}_+$, or Neumann boundary conditions
$\partial_x u(0, t) = \partial_x u(1, t) = 0$, $t \in \mathbb{R}_+$.
We need some further assumptions on the potential V .
Assumption 5.63
• V is C 3 on R.
• V is convex at infinity, i.e., there exist R, c > 0 such that
$$\frac{\partial}{\partial t} u = -D_\phi F, \tag{5.7.4}$$
where, for $\phi$ a differentiable function,
$$F(\phi) = \int_0^1 \Big( \tfrac{1}{2} D\, \phi'(x)^2 + V\big( \phi(x) \big) \Big)\, dx, \tag{5.7.5}$$
where δ(·) is the Dirac function, but it is clear that (5.7.1) requires even more inter-
pretation than an SDE.
To get an idea of what is at stake, consider first the linear equation
On the level of formal computations, this process has the desired correlation struc-
ture. Therefore, denoting by
$$\hat v(n, t) = \frac{1}{2\pi} \int_0^1 e^{-(2\pi i) n x}\, v(x, t)\, dx, \qquad n \in \mathbb{Z}, \tag{5.7.9}$$
5.7 Stochastic partial differential equations 117
the spatial Fourier coefficients of v, we find that these satisfy the stochastic ordinary
differential equations
$$d\hat v(n, t) = -\tfrac{1}{2} D (2\pi n)^2\, \hat v(n, t)\, dt + \sqrt{2\varepsilon}\, dB_n(t), \tag{5.7.10}$$
with initial condition v̂(n, 0) = 0. Note that these equations are uncoupled for dif-
ferent n. The equations in (5.7.10) are interpreted as Itō-SDEs, so we are on firm
ground. The solution of (5.7.10) is readily found to be
$$\hat v(n, t) = \sqrt{2\varepsilon}\, e^{-\frac{1}{2} D (2\pi n)^2 t} \int_0^t e^{\frac{1}{2} D (2\pi n)^2 s}\, dB_n(s). \tag{5.7.11}$$
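By the Itō isometry, the second moment of the mode in (5.7.11) is $2\varepsilon \int_0^t e^{-D(2\pi n)^2 (t-s)}\, ds = 2\varepsilon\, (1 - e^{-D(2\pi n)^2 t}) / (D (2\pi n)^2)$, with the value $2\varepsilon t$ for $n = 0$. The following is a minimal sketch comparing this closed form with a midpoint-rule quadrature of the integral; it assumes that the isometry applies to each $B_n$ with unit rate, and the parameter values and function names are hypothetical.

```python
import math


def mode_rate(n, diffusion):
    """Decay rate D (2 pi n)^2 of the n-th squared Fourier mode."""
    return diffusion * (2.0 * math.pi * n) ** 2


def variance_closed_form(n, t, diffusion, eps):
    """2 eps (1 - exp(-a t)) / a with a = D (2 pi n)^2, and 2 eps t when a = 0."""
    a = mode_rate(n, diffusion)
    if a == 0.0:
        return 2.0 * eps * t
    return 2.0 * eps * (1.0 - math.exp(-a * t)) / a


def variance_by_quadrature(n, t, diffusion, eps, n_cells=20_000):
    """Midpoint rule for 2 eps * int_0^t exp(-a (t - s)) ds."""
    a = mode_rate(n, diffusion)
    ds = t / n_cells
    total = 0.0
    for k in range(n_cells):
        s = (k + 0.5) * ds
        total += math.exp(-a * (t - s)) * ds
    return 2.0 * eps * total


# Hypothetical parameters: D = 1, eps = 0.1, t = 0.5.
gaps = {n: abs(variance_closed_form(n, 0.5, 1.0, 0.1)
               - variance_by_quadrature(n, 0.5, 1.0, 0.1))
        for n in (0, 1, 3)}
```

The entries of `gaps` are at the level of the quadrature error, and the closed form decays like $1/n^2$ in $n$, which is what makes sums over the modes tractable in one space dimension.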
A quick check whether this series represents a bona fide stochastic process is the
computation of the variance of the spatial L2 -norm:
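$$\begin{aligned}
\mathbb{E}\big[ \|v(x, t)\|_2^2 \big]
&= 2\varepsilon \sum_{n\in\mathbb{Z}} e^{-D(2\pi n)^2 t}\, \mathbb{E}\left[ \left( \int_0^t e^{\frac{1}{2} D (2\pi n)^2 s}\, dB_n(s) \right)^2 \right] \\
&= 2\varepsilon \sum_{n\in\mathbb{Z}} e^{-D(2\pi n)^2 t} \int_0^t e^{D(2\pi n)^2 s}\, ds \\
&= 2\varepsilon \sum_{n\in\mathbb{Z}} \frac{1}{D(2\pi n)^2} \Big( 1 - e^{-D(2\pi n)^2 t} \Big).
\end{aligned} \tag{5.7.13}$$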
which diverges.
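$$\frac{\partial u(x, t)}{\partial t} - \frac{1}{2} D \frac{\partial^2 u(x, t)}{\partial x^2} = -V'\big( u(x, t) \big) + \sqrt{2\varepsilon}\, \frac{\partial^2 W(x, t)}{\partial x\, \partial t}. \tag{5.7.17}$$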
Next, think of the entire right-hand side as an inhomogeneous term (i.e., ignore the
fact that the right-hand side involves the solution itself). Then we can represent the
solution of (5.7.17) as
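$$\begin{aligned}
u(x, t) ={}& \int_0^1 dy\, g_t(x, y)\, u_0(y) - \int_0^t ds \int_0^1 dy\, g_{t-s}(x, y)\, V'\big( u(y, s) \big) \\
&+ \sqrt{2\varepsilon} \int_0^t \int_0^1 g_{t-s}(x, y)\, dW(y, s) \\
={}& \int_0^1 dy\, g_t(x, y)\, u_0(y) - \int_0^t ds \int_0^1 dy\, g_{t-s}(x, y)\, V'\big( u(y, s) \big) + v(x, t).
\end{aligned} \tag{5.7.18}$$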
Here, the last term is taken as the solution of the linear equation in (5.7.7) we just
constructed. The other terms in the right-hand side are ordinary integrals, so all the
expressions make sense. A so-called mild solution is a process that satisfies this
equation, i.e., instead of the ill-defined SPDE driven by space-time white noise, we
now have a non-linear integral equation driven by the more regular noise v(x, t).
Theorem 5.66 For every initial condition u0 ∈ Cbc ([0, 1]), the SPDE in (5.7.1) has
a unique mild solution. Moreover, for all T > 0 and p ≥ 1,
$$\mathbb{E}\Big[ \sup_{(t, x) \in [0,T]\times[0,1]} |u(x, t)|^p \Big] \le C(T, p). \tag{5.7.20}$$
The random field $u$ is $2\alpha$-Hölder in space and $\alpha$-Hölder in time for every $\alpha \in (0, \tfrac{1}{4})$.
The only complication that arises comes from the fact that V is not globally
Lipschitz. However, due to Assumption 5.63, we have
5.7.2 Discretisation
From our perspective, the stochastic Allen-Cahn equation should arise as the limit
of a spatially discrete system. This works out well in dimension d = 1. The discrete
system consists of N ∈ N coupled stochastic differential equations of the form
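$$dX_j(t) = -\frac{\partial}{\partial X_j} F_{D,N}\big( X(t) \big)\, dt + \sqrt{2\varepsilon}\, dB_j(t), \qquad j \in \Lambda, \tag{5.7.22}$$
where $\Lambda = \mathbb{Z}/N\mathbb{Z} = \{1, \ldots, N\}$, $X_j(t)$ are the components of $X(t) \in \mathbb{R}^N$, and
$$F_{D,N}(x) = \sum_{j\in\Lambda} V(x_j) + \tfrac{1}{4} D \sum_{j\in\Lambda} (x_j - x_{j+1})^2, \tag{5.7.23}$$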
If we replace the unit lattice by a lattice of spacing 1/N , i.e., (xj )j ∈Λ is the discreti-
sation of a real-valued function x on [0, 1] given by xj = x(j/N ), then the resulting
Finally, we must accelerate time by a factor $N$, i.e., we set $X^N(t) = X(tN)$. The
resulting discrete equations take the form
$$dX^N_j(t) = \tfrac{1}{2} D N^2 \big[ X^N_{j+1}(t) + X^N_{j-1}(t) - 2 X^N_j(t) \big]\, dt - V'\big( X^N_j(t) \big)\, dt + \sqrt{2\varepsilon N}\, d\tilde B_j(t), \tag{5.7.27}$$
with $\tilde B_j$, $j \in \Lambda$, independent Brownian motions.
We now define the function uN (x, t) : [0, 1] × R+ → R such that, for any given
t ∈ R+ , uN (·, t) is the linear interpolation between the points (j/N, XjN (t)), j ∈ Λ.
Then uN can also be represented as a mild solution of the discrete system, which
allows us to prove convergence to a mild solution of the SPDE. To do this, we
proceed again by solving the linear equations
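$$dY^N_j(t) = \tfrac{1}{2} D N^2 \big[ Y^N_{j+1}(t) + Y^N_{j-1}(t) - 2 Y^N_j(t) \big]\, dt + \sqrt{2\varepsilon N}\, d\tilde B_j(t) \tag{5.7.28}$$
with the help of Fourier series. Set $\hat Y^N_n = N^{-1} \sum_{j\in\Lambda} e^{-(2\pi i) n j/N}\, Y^N_j$ and, consequently, $Y^N_j = \sum_{n\in\Lambda} e^{(2\pi i) n j/N}\, \hat Y^N_n$. Note that also
$$\sqrt{N}\, \tilde B_j(t) = \sum_{n\in\Lambda} e^{(2\pi i) n j/N}\, B_n(t), \tag{5.7.29}$$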
where the Bn , n ∈ Λ, are the independent Brownian motions from (5.7.8). A sim-
ple computation yields that the Fourier modes satisfy the SDEs (with zero initial
condition)
N √
dY &n (t) dt + 2ε dBn (t).
&nN (t) = − 1 DN 2 2 cos(2πn/N ) − 2 Y (5.7.30)
2
&N
Abbreviate Δn = N 2[cos(2πn/N ) − 2]. Then these equations can be solved as
2
√
t
&nN (t) =
1 &N 1 &N
Y 2ε e− 2 D Δn t e 2 D Δn s dBn (s). (5.7.31)
0
Define
vjN (t) = &nN (t).
e(2πi)nj/N Y (5.7.32)
n∈Λ
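The Fourier step above rests on the fact that the discrete Fourier transform diagonalises the periodic discrete Laplacian, with symbol N^2 [2 cos(2πn/N) - 2]. The following is a minimal numerical sketch of that fact only; the lattice size and the test vector are arbitrary illustrative choices, and the function names are hypothetical.

```python
import cmath
import math

def discrete_laplacian(y):
    """N^2 * (y[j+1] + y[j-1] - 2*y[j]) on the periodic lattice Z/NZ."""
    N = len(y)
    return [N**2 * (y[(j + 1) % N] + y[(j - 1) % N] - 2 * y[j]) for j in range(N)]

def fourier_coefficient(y, n):
    """hat y_n = N^{-1} * sum_j exp(-2 pi i n j / N) * y_j."""
    N = len(y)
    return sum(cmath.exp(-2j * math.pi * n * j / N) * y[j] for j in range(N)) / N

def symbol(n, N):
    """Fourier symbol N^2 * [2 cos(2 pi n / N) - 2] of the discrete Laplacian."""
    return N**2 * (2 * math.cos(2 * math.pi * n / N) - 2)

N = 8
y = [math.sin(0.7 * j) + 0.1 * j * j for j in range(N)]  # arbitrary test vector
lap = discrete_laplacian(y)

# Largest deviation between the transform of the Laplacian and symbol * transform.
max_error = max(
    abs(fourier_coefficient(lap, n) - symbol(n, N) * fourier_coefficient(y, n))
    for n in range(N)
)
print(max_error)
```

The printed deviation should be at the level of floating-point rounding, which is why each Fourier mode evolves independently of the others.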
5.7 Stochastic partial differential equations 121
It is easy to see that this expression converges to the right-hand side of (5.7.13)
as N → ∞. In fact, denoting by v N (x, t) the linear interpolation of the points
(j/N, vjN (t)), j ∈ Λ, we can show that this process converges to v(x, t) in L2 .
Finally we can write the discrete equations in their mild form as
\[ X_j^N(t) = \sum_{k\in\Lambda} p_t^N(j,k)\,X_k^N(0) - \int_0^t \sum_{k\in\Lambda} p_{t-s}^N(j,k)\,V'\big(X_k^N(s)\big)\,ds + v_j^N(t), \tag{5.7.34} \]
where $p^N$ is the semi-group associated with the discrete Laplacian. Note that the
noise is coupled to the mild form of the SPDE in that both are driven by the same
Brownian motions.
The above formulation can be further embellished by writing it for the interpolations $u^N$ defined by putting $u^N(j/N,t) = X_j^N(t)$ for $j \in \Lambda$ and using linear interpolation. Define (for Neumann boundary conditions)
\[ \kappa_N(x) = \frac{\lfloor Nx \rfloor}{N} + \frac{1}{2N}. \tag{5.7.35} \]
Let $g^N$ be the linear interpolation of $p^N$ on $[0,1]\times[0,1]$ along the discretisation points.
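Lemma 5.68 For every $u_0 \in C_{bc}([0,1])$ and $N \in \mathbb{N}$ the function $u^N$ defined on $[0,1]\times\mathbb{R}_+$ satisfies the equation
\[ u^N(x,t) = \int_0^1 dy\, g_t^N\big(x,\kappa_N(y)\big)\,u_0\big(\kappa_N(y)\big) - \int_0^t ds \int_0^1 dy\, g_{t-s}^N\big(x,\kappa_N(y)\big)\,V'\big(u^N(\kappa_N(y),s)\big) + v^N(x,t). \tag{5.7.36} \]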
The following theorem asserts the convergence of the solution of (5.7.36) to the
solution of (5.7.1).
\[ \lim_{N\to\infty} u^N = u, \tag{5.7.38} \]
\[ \|u^N - u\|_{\infty,T} \le \frac{\Xi}{N^{\eta}}, \tag{5.7.39} \]
where $\|w\|_{\infty,T} = \sup_{t\in[0,T]}\sup_{x\in[0,1]} |w(x,t)|$.
The convergence of the solutions also implies the convergence of the hitting times
of the discrete approximations to those of the SPDE. A precise statement is as fol-
lows. Let u0 be the initial condition of the solution of (5.7.1) and φ a continuous
function. For ρ > 0, define the hitting times
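\[ \tau(\rho) = \inf\{t > 0 \colon \|u(t) - \phi\|_\infty < \rho\}, \qquad \tau^N(\rho) = \inf\{t > 0 \colon \|u^N(t) - \phi^N\|_\infty < \rho\}. \tag{5.7.40} \]
Theorem 5.70 Suppose that $\lim_{N\to\infty} \|\phi^N - \phi\|_\infty = 0$ and that there exists a $\rho_0$ such that, for every $0 < \rho < \rho_0$,
\[ \mathbb{E}_{u_0}[\tau(\rho)] < \infty. \tag{5.7.41} \]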
The proof of this theorem is straightforward and can be found in Barret [11].
1. Much of the exposition in this chapter is taken from Ethier and Kurtz [104], Rogers and Williams [207, 208] and Karatzas and Shreve [148]. The martingale problem formulation is due to Stroock and Varadhan [223].
2. The conditions for existence stated in Theorem 5.59 are not necessary. In particular, growth conditions are important only when the solutions can reach regions where the coefficients become too large. Formulations of weaker hypotheses for existence and uniqueness can be found in Jacod and Shiryaev [143, Chap. 14]. Their verification in concrete cases can be tricky.
5.8 Bibliographical notes 123
5. The existence, uniqueness and regularity of the solution of the stochastic Allen-Cahn equation stated in Theorem 5.66 were proved in Gyöngy and Pardoux [134]. The convergence of the finite discretisation in Theorem 5.69 was proved in Funaki [117] and Gyöngy [133] for $V$ with $V'$ globally Lipschitz. Barret [11] extended this result to the setting of Assumption 5.63.
This chapter gives a summary introduction to large deviations. Although large devi-
ation theory is not our main interest in this monograph, it is an essential element in
our conceptual understanding of metastability. Moreover, it provides tools to obtain
estimates, which often serve as preliminary steps towards more refined estimates.
Section 6.1 recalls the main ingredients of large deviation theory in a general set-
ting (without proofs). Section 6.2 gives a full derivation of path large deviations for
diffusion processes (under strong regularity assumptions). Section 6.3 takes a brief
look at path large deviations for stochastic partial differential equations. Section 6.4
formulates the extension to path large deviations for Markov processes (without
proofs). Section 6.5 gives a brief outline of the Freidlin-Wentzell theory of metasta-
bility, collects some properties of associated action integrals, and looks at crossing
and exit problems that are crucial for a proper understanding of metastability.
Informally, the LDP says that if Bδ (x) is a ball of radius δ > 0 centred at x ∈ X ,
then
\[ \mu_\varepsilon\big(B_\delta(x)\big) = e^{-[1+o(1)]\,I(x)/\varepsilon} \tag{6.1.1} \]
Lemma 6.2 (Varadhan’s lemma) If (με )ε>0 satisfies the LDP on X with rate func-
tion I , then
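\[ \lim_{\varepsilon\downarrow 0} \varepsilon \ln \int_{\mathcal X} e^{F(x)/\varepsilon}\,\mu_\varepsilon(dx) = \Lambda_F, \qquad \forall\, F \in C_b(\mathcal X), \tag{6.1.2} \]
with
\[ \Lambda_F = \sup_{x\in\mathcal X}\big[F(x) - I(x)\big]. \tag{6.1.3} \]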
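As a rough numerical illustration of Varadhan's lemma, the sketch below takes the Gaussian family N(0, ε) on the real line, whose rate function is I(x) = x^2/2, and compares ε ln ∫ e^{F/ε} dμ_ε with Λ_F = sup_x [F(x) - I(x)], the standard form of the limit. The test function F, the quadrature grid and the truncation to [-10, 10] are assumptions made for illustration only.

```python
import math

def F(x):
    # Bounded continuous test function (an arbitrary illustrative choice).
    return 2.0 * math.tanh(x)

def rate(x):
    # Rate function of the Gaussian family N(0, eps): I(x) = x^2 / 2.
    return 0.5 * x * x

STEP = 0.001
GRID = [-10.0 + STEP * i for i in range(20001)]  # quadrature grid on [-10, 10]

def scaled_log_integral(eps):
    """eps * ln of the integral of exp(F/eps) against N(0, eps), via a Riemann sum."""
    exponents = [(F(x) - rate(x)) / eps for x in GRID]
    m = max(exponents)  # subtract the maximum before exponentiating (log-sum-exp)
    riemann = sum(math.exp(e - m) for e in exponents) * STEP
    return eps * (m + math.log(riemann) - 0.5 * math.log(2.0 * math.pi * eps))

Lambda_F = max(F(x) - rate(x) for x in GRID)

for eps in (0.5, 0.1, 0.01):
    print(eps, scaled_log_integral(eps), Lambda_F)
```

By Laplace's method the gap between the two printed columns is expected to shrink linearly in ε.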
The result in Lemma 6.2 can be extended to include F that are unbounded
and/or discontinuous, provided certain tail estimates on με are available. Varadhan’s
lemma has the following inverse.
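Lemma 6.3 (Bryc’s lemma) Suppose that $(\mu_\varepsilon)_{\varepsilon>0}$ is exponentially tight and the limit in (6.1.2) exists for all $F \in C_b(\mathcal X)$. Then $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP with rate function $I$ given by
\[ I(x) = \sup_{F\in C_b(\mathcal X)} \big[F(x) - \Lambda_F\big], \qquad x\in\mathcal X. \tag{6.1.4} \]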
There are several “forward principles” that allow LDP’s to be generated from one
another. A key example is the contraction principle (see Fig. 6.2).
6.1 Large deviation principles 127
Lemma 6.4 (Contraction principle) Let $(\mu_\varepsilon)_{\varepsilon>0}$ satisfy the LDP on $\mathcal X$ with rate function $I$. Let $\mathcal Y$ be a second Polish space, and let $T\colon \mathcal X \to \mathcal Y$ be a continuous map. Then the family of probability measures $(\nu_\varepsilon)_{\varepsilon>0}$ on $\mathcal Y$ defined by $\nu_\varepsilon = \mu_\varepsilon \circ T^{-1}$ satisfies the LDP on $\mathcal Y$ with rate function $J$ given by
Lemma 6.5 Let (με )ε>0 satisfy the LDP on X with rate function I , and let F ∈
Cb (X ). Then the family of probability measures (νε )ε>0 on X defined by
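\[ \nu_\varepsilon(dx) = \frac{1}{N_\varepsilon}\,e^{F(x)/\varepsilon}\,\mu_\varepsilon(dx), \qquad N_\varepsilon = \int_{\mathcal X} e^{F(x)/\varepsilon}\,\mu_\varepsilon(dx), \tag{6.1.6} \]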
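Theorem 6.6 (Dawson-Gärtner projective limit LDP) Let $(\mu_\varepsilon)_{\varepsilon>0}$ be a family of probability measures on $\mathcal X$. Let $(\pi^N)_{N\in\mathbb N}$ be a nested family of projections acting on $\mathcal X$, such that $\bigcup_{N\in\mathbb N}\pi^N\mathcal X = \mathcal X$, and let
\[ \mathcal X^N = \pi^N\mathcal X, \qquad \mu_\varepsilon^N = \mu_\varepsilon\circ(\pi^N)^{-1}, \qquad N\in\mathbb N. \tag{6.1.8} \]
If, for each $N\in\mathbb N$, the family $(\mu_\varepsilon^N)_{\varepsilon>0}$ satisfies the LDP on $\mathcal X^N$ with rate function $I^N$, then $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $\mathcal X$ with rate function $I$ given by
\[ I(x) = \sup_{N\in\mathbb N} I^N(\pi^N x), \qquad x\in\mathcal X. \tag{6.1.9} \]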
128 6 Large Deviations
Since
\[ I^N(y) = \inf_{\{x\in\mathcal X\colon\, \pi^N(x) = y\}} I(x), \qquad y\in\mathcal X^N, \tag{6.1.10} \]
Theorem 6.7 (Gärtner-Ellis theorem) Let (με )ε>0 be a family of probability mea-
sures on Rd , d ≥ 1, with the following properties:
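(i) $\phi(u) = \lim_{\varepsilon\downarrow 0} \varepsilon \ln \int_{\mathbb R^d} e^{\langle u,x\rangle/\varepsilon}\,\mu_\varepsilon(dx)$ exists in $\mathbb R$ for all $u\in\mathbb R^d$, where $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb R^d$.
(ii) $u \mapsto \phi(u)$ is differentiable on $\mathbb R^d$.
Then $(\mu_\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $\mathbb R^d$ with a convex rate function $\phi^*$ given by
\[ \phi^*(x) = \sup_{u\in\mathbb R^d}\big[\langle u,x\rangle - \phi(u)\big], \qquad x\in\mathbb R^d. \tag{6.1.11} \]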
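The Legendre transform in (6.1.11) can be checked numerically in a case where it is known in closed form. The sketch below assumes, purely for illustration, that μ_ε is the Gaussian law N(m, ε) in dimension d = 1, for which φ(u) = m u + u^2/2 and φ*(x) = (x - m)^2/2; the grid over u and the helper name are hypothetical.

```python
def legendre_transform(phi, x, u_grid):
    """Approximate phi*(x) = sup_u [u * x - phi(u)] by a maximum over a finite grid."""
    return max(u * x - phi(u) for u in u_grid)

m = 1.0  # assumed mean of the Gaussian family N(m, eps)

def phi(u):
    # Scaled cumulant generating function of N(m, eps): phi(u) = m*u + u^2/2.
    return m * u + 0.5 * u * u

u_grid = [-20.0 + 0.001 * i for i in range(40001)]

for x in (-2.0, 0.0, 1.0, 3.5):
    # Second and third columns: numerical transform and exact value (x - m)^2 / 2.
    print(x, legendre_transform(phi, x, u_grid), 0.5 * (x - m) ** 2)
```

The rate function vanishes exactly at x = m, the law-of-large-numbers limit, and grows quadratically away from it.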
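• Sanov’s Theorem: Let $\mu_n$ denote the law of the empirical distribution $n^{-1}\sum_{i=1}^n \delta_{X_i}$. Then $(\mu_n)_{n\in\mathbb N}$ satisfies the LDP on $\mathcal M_1(\mathbb R)$ with rate $\varepsilon = n^{-1}$ and rate function
\[ I(\nu) = \int_{\mathbb R} \nu(dx)\,\ln\Big(\frac{d\nu}{d\rho}(x)\Big), \qquad \nu\in\mathcal M_1(\mathbb R), \tag{6.1.13} \]
with the right-hand side infinite when $\nu$ is not absolutely continuous with respect to $\rho$.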
6.2 Path large deviations for diffusion processes 129
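Theorem 6.8 (Schilder’s theorem) Set $B^\varepsilon = (B^\varepsilon_s)_{s\in[0,T]}$ with $B^\varepsilon_s = \varepsilon B_{s/\varepsilon}$. Then $(B^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $I$ given by
\[ I(\gamma) = \begin{cases} \tfrac12\int_0^T \|\dot\gamma(s)\|^2\,ds, & \text{if } \gamma\in H_1,\\ \infty, & \text{otherwise,} \end{cases} \tag{6.2.2} \]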
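Proof Fix $\delta > 0$. Note that $(\varepsilon B_{s/\varepsilon})_{s\in[0,T]}$ has the same distribution as $(\varepsilon^{1/2}B_s)_{s\in[0,T]}$. Hence
\[ \mathbb P\big(\|B^\varepsilon - \gamma\|_\infty < \delta\big) = \mathbb P\big(\|B - \varepsilon^{-1/2}\gamma\|_\infty < \varepsilon^{-1/2}\delta\big). \tag{6.2.4} \]
To estimate the probability in the right-hand side, we observe that, by the Girsanov theorem (Theorem 5.61), the process $\tilde B = (\tilde B_s)_{s\in[0,T]}$ defined by
\[ \tilde B_s = B_s - \varepsilon^{-1/2}\gamma(s) \tag{6.2.5} \]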
Hence, abbreviating
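\[ Z(B,\gamma) = \int_0^T \dot\gamma(s)\,dB_s, \tag{6.2.7} \]
we get
\[ \begin{aligned}
\mathbb P\big(\|B - \varepsilon^{-1/2}\gamma\|_\infty < \varepsilon^{-1/2}\delta\big)
&= \mathbb P\big(\|\tilde B\|_\infty < \varepsilon^{-1/2}\delta\big)\\
&= \mathbb E_{\mathbb Q}\Big[\exp\big(-\varepsilon^{-1/2}Z(\tilde B,\gamma) - \varepsilon^{-1}I(\gamma)\big)\,1_{\{\|\tilde B\|_\infty<\varepsilon^{-1/2}\delta\}}\Big]\\
&= \exp\big(-\varepsilon^{-1}I(\gamma)\big)\,\mathbb Q\big(\|\tilde B\|_\infty < \varepsilon^{-1/2}\delta\big)\\
&\qquad\times \mathbb E_{\mathbb Q}\Big[\exp\big(-\varepsilon^{-1/2}Z(\tilde B,\gamma)\big) \,\Big|\, \|\tilde B\|_\infty < \varepsilon^{-1/2}\delta\Big]\\
&= \exp\big(-\varepsilon^{-1}I(\gamma)\big)\,\mathbb P\big(\|B\|_\infty < \varepsilon^{-1/2}\delta\big)\\
&\qquad\times \mathbb E_{\mathbb P}\Big[\exp\big(-\varepsilon^{-1/2}Z(B,\gamma)\big) \,\Big|\, \|B\|_\infty < \varepsilon^{-1/2}\delta\Big].
\end{aligned} \tag{6.2.8} \]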
On the other hand, it is easy to see (applying Doob’s maximal inequality for submartingales in Theorem 3.57) that
\[ \lim_{\varepsilon\downarrow 0} \mathbb P\big(\|B\|_\infty < \varepsilon^{-1/2}\delta\big) = 1 \tag{6.2.10} \]
and hence
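\[ \liminf_{\varepsilon\downarrow 0} \varepsilon \ln \mathbb P\big(\|B - \varepsilon^{-1/2}\gamma\|_\infty < \varepsilon^{-1/2}\delta\big) \ge -I(\gamma). \tag{6.2.11} \]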
Next, we prove an upper bound for the probability in (6.2.1), which is stated in a
somewhat particular form.
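\[ \begin{aligned}
\mathbb P\big(\|B^\varepsilon - L^\varepsilon\|_\infty > \delta\big)
&\le \sum_{k=1}^n \mathbb P\Big(\max_{s\in[t_{k-1},t_k]} \|B^\varepsilon_s - L^\varepsilon_s\| > \delta\Big)\\
&= n\,\mathbb P\Big(\max_{s\in[0,T/n]} \|B^\varepsilon_s - (sn/T)B^\varepsilon_{T/n}\| > \delta\Big)\\
&= n\,\mathbb P\Big(\max_{s\in[0,1]} \|B_s - sB_1\| > \delta(n/T\varepsilon)^{1/2}\Big)\\
&\le n\,\mathbb P\Big(\max_{s\in[0,1]} \|B_s\| > \tfrac12\delta(n/T\varepsilon)^{1/2}\Big),
\end{aligned} \tag{6.2.13} \]
where we use that $\max_{s\in[0,1]}\|B_s\| \le \tfrac12 x$ implies $\max_{s\in[0,1]}\|B_s - sB_1\| \le x$. The last probability can be estimated by using the following exponential inequality for one-dimensional Brownian motion:
\[ \mathbb P_{d=1}\Big(\max_{s\in[0,t]} |B_s| > xt\Big) \le 2\exp\big(-\tfrac12 x^2 t\big), \qquad t\in\mathbb R_+. \tag{6.2.14} \]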
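\[ {}\le 2d\,e^{-\frac{\delta^2 n}{8Td\varepsilon}}, \tag{6.2.15} \]
and so
\[ \limsup_{\varepsilon\downarrow 0} \varepsilon \ln \mathbb P\big(\|B^\varepsilon - L^\varepsilon\|_\infty > \delta\big) \le -\frac{\delta^2 n}{8Td}. \tag{6.2.16} \]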
Indeed, we have
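\[ I(L^\varepsilon) = \frac{n}{2T}\sum_{k=1}^n \big\|B^\varepsilon_{t_k} - B^\varepsilon_{t_{k-1}}\big\|^2 = \tfrac12\varepsilon\sum_{i=1}^{dn}\eta_i^2, \tag{6.2.18} \]
\[ \mathbb E\big[e^{\rho\frac12\eta_i^2}\big] \le C_\rho < \infty \qquad \forall\, 0<\rho<1, \tag{6.2.19} \]
it follows that
\[ \mathbb P\Big(\tfrac12\varepsilon\sum_{i=1}^{dn}\eta_i^2 > \lambda\Big) \le e^{-\rho\lambda/\varepsilon}\,\mathbb E\Big[e^{\rho\sum_{i=1}^{dn}\frac12\eta_i^2}\Big] \le e^{-\rho\lambda/\varepsilon}(C_\rho)^{dn}, \tag{6.2.20} \]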
we get
δ2n
Proof We have
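\[ \gamma\in H_1 \iff \sup_{N\in\mathbb N}\ \sup_{0\le t_1<\dots<t_N\le T}\ \tfrac12\sum_{i=1}^{N-1}\frac{|\gamma(t_{i+1})-\gamma(t_i)|^2}{|t_{i+1}-t_i|} < \infty, \tag{6.2.23} \]
with the supremum being equal to $I(\gamma)$. Let $(\gamma_n)_{n\in\mathbb N}$ be a convergent sequence in $C_0([0,T])$ with limit $\gamma$. Then
\[ \begin{aligned}
I(\gamma) &= \sup_{N\in\mathbb N}\ \sup_{0\le t_1<\dots<t_N\le T}\ \tfrac12\sum_{i=1}^{N-1}\frac{|\gamma(t_{i+1})-\gamma(t_i)|^2}{|t_{i+1}-t_i|}\\
&= \sup_{N\in\mathbb N}\ \sup_{0\le t_1<\dots<t_N\le T}\ \tfrac12\sum_{i=1}^{N-1}\lim_{n\to\infty}\frac{|\gamma_n(t_{i+1})-\gamma_n(t_i)|^2}{|t_{i+1}-t_i|}\\
&\le \liminf_{n\to\infty}\ \sup_{N\in\mathbb N}\ \sup_{0\le t_1<\dots<t_N\le T}\ \tfrac12\sum_{i=1}^{N-1}\frac{|\gamma_n(t_{i+1})-\gamma_n(t_i)|^2}{|t_{i+1}-t_i|}
\end{aligned} \]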
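We are now ready to prove the LDP in Theorem 6.8. Since $I$ is not identically infinite, Part (i) in Definition 6.1 follows from Lemma 6.11. To get Part (ii), note that for every open set $O \subset C_0([0,T])$ and every $\gamma\in O$, there exists a $\delta = \delta(O,\gamma) > 0$ such that $\{\gamma'\colon \|\gamma'-\gamma\|_\infty < \delta\} \subseteq O$. Hence, Lemma 6.9 implies that
\[ \liminf_{\varepsilon\downarrow 0} \varepsilon\ln\mathbb P\big(B^\varepsilon\in O\big) \ge \liminf_{\varepsilon\downarrow 0} \varepsilon\ln\mathbb P\big(\|B^\varepsilon-\gamma\|_\infty<\delta\big) \ge -I(\gamma), \tag{6.2.26} \]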
which yields Part (ii) after we take the supremum over γ ∈ O. Part (iii) is derived
as follows. Since Kλ , λ ∈ [0, ∞), are compact, we know that
∀ λ, δ > 0 ∃ δ = δ λ, δ > 0 : Kλ−2δ
δ
⊆ Kλ−δ , (6.2.27)
We next show how to pass to the analogous result for a Brownian motion with a
drift, namely, we consider the SDE
\[ X_t^\varepsilon = B_t^\varepsilon + \int_0^t b(X_s^\varepsilon)\,ds, \qquad t\in\mathbb R_+, \tag{6.2.30} \]
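Theorem 6.12 Set $X^\varepsilon = (X_s^\varepsilon)_{s\in[0,T]}$. Then $(X^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $\tilde I$ given by
\[ \tilde I(\gamma) = \tfrac12\int_0^T \big\|\dot\gamma(s) - b(\gamma(s))\big\|^2\,ds. \tag{6.2.31} \]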
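To make the rate function of Theorem 6.12 concrete, the following sketch discretises the action (1/2) ∫ |γ'(s) - b(γ(s))|^2 ds for scalar paths. The drift b(x) = -x, the time horizon and the two test paths are illustrative assumptions; the function name `action` is hypothetical.

```python
import math

def action(path, b, T):
    """Riemann-sum approximation of (1/2) * int_0^T |gamma'(s) - b(gamma(s))|^2 ds
    for a scalar path sampled at equally spaced times in [0, T]."""
    n = len(path) - 1
    dt = T / n
    total = 0.0
    for k in range(n):
        velocity = (path[k + 1] - path[k]) / dt      # finite-difference derivative
        midpoint = 0.5 * (path[k] + path[k + 1])     # drift evaluated at the midpoint
        total += 0.5 * (velocity - b(midpoint)) ** 2 * dt
    return total

def b(x):
    return -x  # assumed linear drift

T, n = 2.0, 2000
times = [T * k / n for k in range(n + 1)]
flow_path = [math.exp(-s) for s in times]   # solves gamma' = b(gamma), gamma(0) = 1
constant_path = [1.0 for _ in times]        # stays at 1 against the drift

print(action(flow_path, b, T))      # close to 0: the path follows the drift
print(action(constant_path, b, T))  # close to T/2: cost of resisting the drift
```

Paths that solve the noiseless equation have (numerically) zero action, while every deviation from the drift is penalised quadratically.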
Proof The easiest way to set up the proof is to consider the map F : C0 ([0, T ]) →
C0 ([0, T ]) given by
F (γ ) = f, (6.2.32)
where f is the solution of the integral equation
\[ f(t) = \gamma(t) + \int_0^t b(f(s))\,ds, \qquad t\in[0,T]. \tag{6.2.33} \]
Since under a continuous map the inverse image of an open (closed) set is again an
open (closed) set, we can use Schilder’s theorem (Theorem 6.8) and the contraction
principle (Lemma 6.4) to obtain that (X ε )ε>0 satisfies the LDP with rate function
$\tilde I = I\circ F^{-1}$, i.e.,
\[ \tilde I(f) = \inf\{I(\gamma)\colon F(\gamma) = f\}. \tag{6.2.35} \]
Since
\[ F^{-1}(f)(t) = \gamma(t) = f(t) - \int_0^t b(f(s))\,ds, \qquad t\in[0,T], \tag{6.2.36} \]
It is possible to push the argument in Sect. 6.2.2 and consider the SDE
\[ X_t^\varepsilon = \int_0^t b(X_s^\varepsilon)\,ds + \int_0^t \sigma(X_s^\varepsilon)\,dB_s^\varepsilon, \qquad t\in\mathbb R_+, \tag{6.2.37} \]
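Theorem 6.13 Set $X^\varepsilon = (X_s^\varepsilon)_{s\in[0,T]}$. Then $(X^\varepsilon)_{\varepsilon>0}$ satisfies the LDP on $C_0([0,T])$ with rate function $\hat I$ given by
\[ \hat I(\gamma) = \begin{cases} \tfrac12\int_0^T \big\langle[\dot\gamma-(b\circ\gamma)],\,a^{-1}(\gamma)[\dot\gamma-(b\circ\gamma)]\big\rangle(s)\,ds, & \text{if } \gamma\in H_1,\\ \infty, & \text{otherwise,} \end{cases} \tag{6.2.38} \]
where $a^{-1}$ is the inverse of $a$, and $\langle\cdot,\cdot\rangle$ is the standard inner product on $\mathbb R^d$.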
Theorem 6.13 can be deduced from Theorem 6.12 with the help of a time-change argument. To see how, first suppose that $b = 0$ and for simplicity take $d = 1$. Then $[B^\varepsilon]_{[X^\varepsilon]_t} = [X^\varepsilon]_t$, where $[\cdot]$ is the quadratic variation (recall Theorem 5.44). Let $i(t)$ be such that $(X^\varepsilon_{i(t)})_{t\in\mathbb R_+}$ is Brownian motion. Then $[X^\varepsilon]_{i(t)} = t$. On the other hand,
\[ [X^\varepsilon]_{i(t)} = \int_0^{i(t)} \sigma(X_s^\varepsilon)^2\,ds. \tag{6.2.39} \]
Hence differentiation gives $i'(t)\,\sigma(X^\varepsilon_{i(t)})^2 = 1$, which together with $X^\varepsilon_{i(t)} = B^\varepsilon_t$ shows that
\[ i(t) = \int_0^t \sigma(B_s^\varepsilon)^{-2}\,ds. \tag{6.2.40} \]
Since $i(t)$ is measurable with respect to the filtration generated by $(B_s^\varepsilon)_{s\in[0,t]}$, it follows that $B^\varepsilon_{i^{-1}(t)}$ has the same distribution as $X_t^\varepsilon$ and hence is a weak solution
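\[ \frac{\partial}{\partial t}u^\varepsilon(x,t) = \tfrac12 D\frac{\partial^2}{\partial x^2}u^\varepsilon(x,t) - V'\big(u^\varepsilon(x,t)\big) + \sqrt{2\varepsilon}\,\frac{\partial^2}{\partial x\,\partial t}W(x,t), \tag{6.3.1} \]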
where ε > 0 is a parameter that scales the strength of the noise. We want to think
of a mild solution as a random variable taking values in a Banach space. To do so,
define, for α ∈ (0, ∞), the Banach space
\[ \mathcal B_\alpha = \big\{f\in C([0,1])\colon \|f\|_\alpha < \infty\big\} \tag{6.3.2} \]
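As initial condition for (6.3.1) we take $u(\cdot,0) = \xi(\cdot)$ with $\xi\in\mathcal B_\alpha$. Fix $T > 0$. Define the space $W_2^{1,2}$ as
\[ W_2^{1,2} = \Big\{\gamma\colon [0,1]\times[0,T]\to\mathbb R \colon\ \int_0^T dt\int_0^1 dx\,\Big(\frac{\partial\gamma(x,t)}{\partial t}\Big)^2 < \infty\Big\}. \tag{6.3.4} \]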
Recall that, by Definition 5.65 and Theorem 5.66, a mild solution of our SPDE is a continuous map with values in $\mathcal B_\alpha$, for any $\alpha\in(0,\tfrac14)$. Thus, a family of mild solutions $(u^\varepsilon)_{\varepsilon>0}$ is a family of random variables with values in the Banach space $C([0,T],\mathcal B_\alpha)$. The following theorem asserts that these satisfy an LDP.
Theorem 6.14 The family (uε )ε>0 satisfies the LDP on C([0, T ], Bα ) with rate
function I given by
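\[ I(\gamma) = \begin{cases} \tfrac12\int_0^T dt\int_0^1 dx\,\Big|\frac{\partial}{\partial t}\gamma(x,t) - \tfrac12 D\frac{\partial^2}{\partial x^2}\gamma(x,t) + V'\big(\gamma(x,t)\big)\Big|^2, & \text{if } \gamma\in W_2^{1,2},\ \gamma(\cdot,0)=\xi(\cdot),\\ \infty, & \text{otherwise.} \end{cases} \tag{6.3.5} \]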
6.4 Path large deviations for Markov processes 137
Here is a sketch of how Theorem 6.14 comes about. The starting point is the LDP
for the Brownian sheet, which is the analogue of Schilder’s theorem for Brownian
motion (Theorem 6.8). To state this LDP, let H be the space of all h ∈ C([0, 1] ×
[0, T ]) such that there exists an ḣ ∈ L2 ([0, 1] × [0, T ]) with
\[ h(x,t) = \int_0^x dy\int_0^t ds\,\dot h(y,s), \qquad x\in[0,1],\ t\in[0,T]. \tag{6.3.6} \]
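Theorem 6.15 $(\sqrt{2\varepsilon}\,W)_{\varepsilon>0}$ satisfies the LDP on $C([0,1]\times[0,T])$ with rate function $I_0$ given by
\[ I_0(h) = \begin{cases} \tfrac12\int_0^T dt\int_0^1 dx\,|\dot h(x,t)|^2, & \text{if } h\in\mathcal H,\\ \infty, & \text{otherwise.} \end{cases} \tag{6.3.7} \]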
The LDP in Theorem 6.14 follows from Theorem 6.15 via the contraction prin-
ciple (Lemma 6.4), and identifies the rate function as
\[ I(\gamma) = \inf\{I_0(h)\colon h\in\mathcal H,\ T(h) = \gamma\}, \tag{6.3.8} \]
where T is the map from H into C([0, T ], Bα ) such that T (h) = γ is the solution
of
\[ \gamma(x,t) = \int_0^1 dy\,g_t(x,y)\,\xi(y) + \int_0^t ds\int_0^1 dy\,g_{t-s}(x,y)\big[-V'\big(\gamma(y,s)\big) + \dot h(y,s)\big] \tag{6.3.9} \]
with $g_t(x,y)$ the density of the semi-group generated by $\tfrac12 D\,\partial^2/\partial x^2$ on $[0,1]$ (the
heat kernel). Here, (6.3.8) and (6.3.9) are the infinite-dimensional analogues of
(6.2.33) and (6.2.35). The fact that (6.3.8) is the same as (6.3.5) follows from the
same type of inversion argument as in (6.2.36).
globally Lipschitz in the first and in the third coordinate, uniformly in the second
coordinate and in ε > 0, and to be such that
Theorem 6.16 For every T > 0, (Z̄ ε )ε>0 satisfies the LDP on C0 ([0, T ]) with rate
function I given by
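\[ I(\gamma) = \begin{cases} \int_0^T \mathcal L^*\big(\gamma(s),\dot\gamma(s);s\big)\,ds, & \text{if } \gamma\in D_0([0,T]),\\ \infty, & \text{otherwise,} \end{cases} \tag{6.4.5} \]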
A simple example to which the above setting applies is simple random walk
on $\varepsilon\mathbb Z^d$, for which we choose $\Delta = \{x\in\mathbb Z^d\colon \|x\| = 1\}$ and $q^\varepsilon = \ln(2d)$. For
this case Theorem 6.16 reduces to Mogul’skiı̆’s theorem [186] for simple random
walk, the analogue of Schilder’s theorem for Brownian motion (see Dembo and
Zeitouni [79]).
The rate functions in Sects. 6.2–6.4 have the form of a classical action functional
in Newtonian mechanics, i.e., they are of the form
6.5 Freidlin-Wentzell theory 139
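\[ I(\gamma) = \int_0^T \mathcal L\big(\gamma(s),\dot\gamma(s),s\big)\,ds, \tag{6.5.1} \]
for some Lagrangian $\mathcal L$. In Theorem 6.12, for instance, $\mathcal L$ takes on the special form
\[ \mathcal L\big(\gamma(s),\dot\gamma(s),s\big) = \tfrac12\big\|\dot\gamma(s) - b(\gamma(s))\big\|_2^2. \tag{6.5.2} \]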
The principle of least action in classical mechanics states that the system follows
the trajectory of minimal action subject to boundary conditions. This leads to the
Euler-Lagrange equations
\[ \frac{d^2}{ds^2}\gamma(s) = b\big(\gamma(s)\big)\,\frac{d}{d\gamma(s)}\,b\big(\gamma(s)\big). \tag{6.5.4} \]
These solutions have the property that they yield absolute minima of the action
functional, since they satisfy
\[ \mathcal L\big(\gamma(s),\dot\gamma(s)\big) = 0. \tag{6.5.6} \]
Of course, being first-order, this equation admits only one boundary (or initial) con-
dition.
A typical question we may ask is the following: What is the probability of a solution
connecting two points u, v ∈ Rd in time T ? The LDP in Theorem 6.12 provides the
answer, namely,
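\[ \lim_{\delta\downarrow 0}\lim_{\varepsilon\downarrow 0} \varepsilon\ln\mathbb P\big(X_T^\varepsilon\in B_\delta(v) \mid X_0^\varepsilon\in B_\delta(u)\big) = -\inf_{\gamma\colon\, \gamma(0)=u,\ \gamma(T)=v} I(\gamma), \tag{6.5.7} \]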
where Bδ (x) is the ball of radius δ > 0 around x ∈ Rd . This leads us to solve (6.5.4)
subject to the boundary conditions γ (0) = u and γ (T ) = v. Unfortunately, not all
solutions of (6.5.4) also solve (6.5.5) as they can have positive action, meaning
that the event under consideration has an exponentially small probability. However,
under certain conditions we may find a zero-action solution, for instance, when we
do not fix the time of arrival at v:
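\[ \lim_{\delta\downarrow 0}\lim_{\varepsilon\downarrow 0} \varepsilon\ln\mathbb P\big(X_s^\varepsilon\in B_\delta(v) \text{ for some } s\in[0,T] \mid X_0^\varepsilon\in B_\delta(u)\big) = -\inf_{\gamma\colon\, \gamma(0)=u,\ \gamma(s)=v \text{ for some } s\in[0,T]} I(\gamma). \tag{6.5.8} \]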
Clearly, the infimum will be zero if the solution of (6.5.5) with γ (0) = u has the
property that γ (s) = v for some s ∈ [0, T ].
Suppose that we consider an event as in (6.5.8) that admits a zero-action path
γ with γ (0) = u and γ (T ) = v. Define the time-reversed path γ̄ (s) = γ (T − s),
s ∈ [0, T ]. Clearly, γ̄˙ (s) = −γ̇ (T − s). Hence a simple calculation, via (6.5.1)–
(6.5.2), shows that
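\[ I(\gamma) - I(\bar\gamma) = -2\int_0^T b\big(\gamma(s)\big)\,\dot\gamma(s)\,ds = -2\int_\gamma b(x)\,dx. \tag{6.5.9} \]
Let us now specialise to the case where $b$ is minus the gradient of a potential $F$, i.e., $b(x) = -\nabla F(x)$, $x\in\mathbb R^d$. Then
\[ -\int_\gamma b(x)\,dx = F\big(\gamma(T)\big) - F\big(\gamma(0)\big) = F(v) - F(u). \tag{6.5.10} \]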
Hence
I (γ ) − I (γ̄ ) = 2 F (v) − F (u) . (6.5.11)
If I (γ ) = 0, then I (γ̄ ) = 2[F (u) − F (v)], and this is the minimal possible value for
any path going from v to u. Thus, there is the remarkable fact that the most likely
path going uphill in a potential is the time-reversal of the solution of the gradient
flow.
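The time-reversal statement can be checked numerically. The sketch below assumes the sign convention b = -F' (the convention under which the downhill gradient flow has zero action) with the double-well potential F(x) = x^4/4 - x^2/2 in d = 1; the potential, the starting point u = 0.1, the horizon and the RK4 integrator are all illustrative choices rather than anything prescribed by the text.

```python
def F(x):
    return 0.25 * x**4 - 0.5 * x**2   # assumed double-well potential

def b(x):
    return x - x**3                   # b = -F'(x)

def action(path, dt):
    """Midpoint-rule approximation of (1/2) * int |gamma' - b(gamma)|^2 ds."""
    total = 0.0
    for k in range(len(path) - 1):
        velocity = (path[k + 1] - path[k]) / dt
        midpoint = 0.5 * (path[k] + path[k + 1])
        total += 0.5 * (velocity - b(midpoint)) ** 2 * dt
    return total

def gradient_flow(u, T, n):
    """Classical RK4 integration of gamma' = b(gamma) started at u."""
    dt = T / n
    x, path = u, [u]
    for _ in range(n):
        k1 = b(x)
        k2 = b(x + 0.5 * dt * k1)
        k3 = b(x + 0.5 * dt * k2)
        k4 = b(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        path.append(x)
    return path

T, n = 10.0, 4000
gamma = gradient_flow(0.1, T, n)       # downhill path from u = 0.1 towards the minimum at 1
gamma_reversed = gamma[::-1]           # uphill path from v back to u

I_forward = action(gamma, T / n)
I_reverse = action(gamma_reversed, T / n)
predicted = 2 * (F(gamma[0]) - F(gamma[-1]))   # 2 [F(u) - F(v)]
print(I_forward, I_reverse, predicted)
```

The downhill path has numerically vanishing action, while its time reversal costs twice the potential difference, in line with (6.5.11).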
So far we have considered paths of a fixed time length T . Freidlin and Wentzell
allowed paths of arbitrary length and introduced the notion of quasi-potential:
where $\sigma_{B_\delta(u)} = \inf\{t\in\mathbb R_+\colon X_t^\varepsilon\notin B_\delta(u)\}$ is the first exit time of the ball $B_\delta(u)$. The probability in (6.5.13) is the proper version of the escape probability from $u$ to $v$. In the setting of Fig. 6.4, we have
\[ V(u,v) = V(u,z^*) + V(z^*,v), \tag{6.5.14} \]
where, by (6.5.11),
\[ V(u,z^*) = V(z^*,u) + 2\big[F(z^*) - F(u)\big], \tag{6.5.15} \]
while
\[ V(z^*,u) = V(z^*,v) = 0. \tag{6.5.16} \]
Hence V (u, v) = 2[F (z∗ ) − F (u)], i.e., the exponential asymptotics of the escape
probability from u to v is given by twice the height of the potential barrier from u
to v.
Let τBδ (v) be the first hitting time of the ball Bδ (v). With the help of a simple re-
newal argument, (6.5.13) can be shown to imply that, for every ρ > 0 and uniformly
in wδ ∈ Bδ/2 (u),
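\[ \lim_{\delta\downarrow 0}\lim_{\varepsilon\downarrow 0} \mathbb P\Big(e^{2[F(z^*)-F(u)-\rho]/\varepsilon} \le \tau_{B_\delta(v)} \le e^{2[F(z^*)-F(u)+\rho]/\varepsilon} \,\Big|\, X_0^\varepsilon = w_\delta\Big) = 1. \tag{6.5.17} \]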
6.5.3 Metastability
The discussion in Sects. 6.5.1–6.5.2 forms the basis of the treatment of metastabil-
ity in Freidlin-Wentzell theory. In this theory, any constant or periodic solution of
(6.5.6) is a candidate for a metastable state. If γ is such a solution, then it is called
unstable when there exists another solution γ̃ and a family of functions (γn )n∈N
such that
\[ \gamma_n(t) = \begin{cases} \gamma(t), & t\le -n,\\ \tilde\gamma(t), & t\ge n, \end{cases} \tag{6.5.18} \]
while infn∈N I (γn ) = 0. In other words, a solution is unstable when it can be de-
formed into another solution at an arbitrarily small cost. Otherwise, the solution is
called stable. In the context of the Markov process, a stable solution is interpreted
as a metastable state, also called a cycle. For us the most interesting situations cor-
respond to fixed points, i.e., solutions of (6.5.6) that are constant in time. In the case
of a reversible Markov process these are the only possible solutions.
A system is called metastable when it has at least two metastable states. In
the presence of noise there exist (exponentially unlikely) trajectories that consti-
tute transitions between these states. The variational problem in (6.5.12) with u, v
metastable states (respectively, its obvious extension when u, v are not fixed points),
provides the asymptotics of the transition probabilities between them, while (6.5.17)
provides control over the transition times between them. This, in a nutshell, is the
basis of the Freidlin-Wentzell theory of metastability. The strong point of this theory
is its great versatility. In particular, no assumption of reversibility needs to be made.
The weak point is the poor level of precision, i.e., only the exponential asymptotics
of characteristic quantities such as hitting times is obtained.
Freidlin-Wentzell theory does not offer the tools to go beyond the exponential
asymptotics. The goal of the present monograph is to position potential theory as
the key mathematical framework for obtaining sharper asymptotics, and to outline
the main ideas and techniques that are available to tackle concrete models.
2. Theorem 6.13 in Sect. 6.2 lies at the heart of Freidlin-Wentzell theory. For further
reading we refer to the monographs by Freidlin and Wentzell [115], Dupuis and
Ellis [98], and Feng and Kurtz [110].
3. The LDP in Theorem 6.14 is derived in Sowers [221]. Extensions to larger classes
of SPDEs were obtained by Kallianpur and Xiong [147] and by Chenal and Mil-
let [57].
4. Theorem 6.16 in Sect. 6.4 is taken from Bovier and Gayrard [37] and will be
needed in Chap. 10. For extensions to general dynamical systems, see the mono-
graph by Kifer [152].
The martingale problem and the stopping times that were described in Chaps. 4–5
provide the key link between Markov processes and Dirichlet problems. This chap-
ter gives a detailed account of this connection. Although, once again, the basic
principles are the same in discrete and in continuous time, we split the presenta-
tion: discrete time and countable state space (Sect. 7.1), continuous time and gen-
eral state space (Sect. 7.2). The mixed cases are similar and are left to the reader.
Once we have built up the necessary tools, we provide three variational formulas
for the capacity referred to as the Dirichlet principle, the Thomson principle and
the Berman-Konsowa principle (Sect. 7.3). These will be crucial for the metastable
analyses carried out in Parts IV–VIII. The variational principles can be extended to
the non-reversible setting, but become harder to work with (Sect. 7.4).
Fig. 7.1 Dirichlet problem for f : S → R with source k : D → [−K, ∞) and boundary condi-
tions g : D → R and ḡ : D c → R
Proof The most convenient way to prove Theorem 7.1 is via the martingale problem
characterisation of Markov processes. Indeed, as in Lemma 4.9, we check that, for
any k : S → R bounded from below,
\[ M_t = f(X_t)\Big[\prod_{u=0}^{t-1}\frac{1}{1+k(X_u)}\Big] - f(X_0) + \sum_{s=0}^{t-1}\Big[\prod_{u=0}^{s}\frac{1}{1+k(X_u)}\Big]\big[(-Lf)(X_s) + k(X_s)f(X_s)\big] \tag{7.1.4} \]
Note that the solution of the Dirichlet problem is unique, unless the homogeneous
problem
(−Lf )(x) + k(x)f (x) = 0, ∀ x ∈ D,
(7.1.5)
f (x) = 0, ∀ x ∈ Dc ,
7.1 The Dirichlet problem: discrete time 147
admits a non-zero solution. The most interesting case for us is when k = λ is con-
stant. In that case, if (7.1.5) admits a non-zero solution, then λ is called an eigenvalue
and the corresponding solution an eigenfunction of the Dirichlet problem. A solution
of the homogeneous Dirichlet boundary value problem with k = 0,
(−Lf )(x) = 0, ∀ x ∈ D,
(7.1.6)
f (x) = ḡ(x), ∀ x ∈ Dc ,
(−Lw)(x) = 1, x ∈ D,
(7.1.8)
w(x) = 0, x ∈ Dc .
The objects we introduce now will turn out to be fundamental in the study of
metastability. We consider the case where the solution of the Dirichlet problem
in (7.1.1) is unique. For simplicity, we restrict ourselves to the case where k = λ
is constant. Then the solution to (7.1.1) can be written in the form
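\[ f(x) = \sum_{z\in D} G^\lambda_{D^c}(x,z)\,g(z) + \sum_{z\in D^c} H^\lambda_{D^c}(x,z)\,\bar g(z), \qquad x\in D, \tag{7.1.9} \]
where
\[ G^\lambda_{D^c}(x,z) = \mathbb E_x\Big[\sum_{s=0}^{\tau_{D^c}-1} (1+\lambda)^{-s-1}\,1_{X_s=z}\Big], \qquad x,z\in D, \tag{7.1.10} \]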
Fig. 7.2 Dirichlet problem for h : S → [0, 1] with boundary conditions h = 1 on A and h = 0
on B
is called the Poisson kernel. Clearly, the Green function can also be characterised
as the solution of the problem
i.e., for the Markov process starting at x ∈ D, GD c (x, z) represents the average
number of visits to z ∈ D before it exits from D, while HD c (x, z) represents the
probability that it enters D c at z.
The following object will be absolutely central in our study of metastability. Let
A, B ⊂ S be two non-empty disjoint subsets. Consider the Dirichlet problem (see
Fig. 7.2)
(−Lh)(x) = 0, ∀ x ∈ S\(A ∪ B),
h(x) = 1, ∀ x ∈ A, (7.1.15)
h(x) = 0, ∀ x ∈ B.
Suppose that (7.1.15) has a unique solution, e.g. because Ex [τA∪B ] < ∞ for all
x ∈ S. The harmonic function that solves (7.1.15) is denoted by hA,B (x) and is
called the equilibrium potential. The representation of the solution given in (7.1.9)
and (7.1.13)–(7.1.14), with D = S\(A ∪ B), D c = A ∪ B, g = 0, ḡ(x) = 1, x ∈ A,
and ḡ(x) = 0, x ∈ B, implies that
\[ h_{A,B}(x) = \mathbb E_x\big[1_A(X_{\tau_{A\cup B}})\big] = \mathbb P_x(\tau_A < \tau_B), \qquad x\in S\setminus(A\cup B). \tag{7.1.16} \]
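The Dirichlet problem (7.1.15) is a finite linear system when S is finite, so the equilibrium potential can be computed directly. The sketch below uses a plain Gauss-Seidel iteration on simple random walk on {0, ..., N} with A = {N} and B = {0}, where the classical gambler's-ruin answer is h(x) = x/N. The chain, the solver and the function name are illustrative assumptions, not a method taken from the text.

```python
def equilibrium_potential(P, A, B, sweeps=2000):
    """Gauss-Seidel iteration for (-L h)(x) = 0 off A and B, h = 1 on A, h = 0 on B."""
    n = len(P)
    h = [1.0 if x in A else 0.0 for x in range(n)]
    interior = [x for x in range(n) if x not in A and x not in B]
    for _ in range(sweeps):
        for x in interior:
            # (-L h)(x) = h(x) - sum_y p(x, y) h(y) = 0, solved for h(x)
            off_diagonal = sum(P[x][y] * h[y] for y in range(n) if y != x)
            h[x] = off_diagonal / (1.0 - P[x][x])
    return h

N = 10
P = [[0.0] * (N + 1) for _ in range(N + 1)]
for x in range(1, N):
    P[x][x - 1] = P[x][x + 1] = 0.5   # simple random walk in the interior
P[0][1] = P[N][N - 1] = 1.0           # reflecting ends (do not affect h off A and B)

h = equilibrium_potential(P, A={N}, B={0})
print([round(value, 6) for value in h])
```

The computed values agree with P_x(τ_A < τ_B) = x/N, illustrating the probabilistic representation (7.1.16).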
This equation gives an analytic representation for the probability in the right-hand
side when x ∈ S\(A ∪ B). Using the Markov property, we can get a similar expres-
sion when x ∈ A ∪ B. Namely, for x ∈ A ∪ B,
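\[ \begin{aligned}
\mathbb P_x(\tau_A<\tau_B) &= \sum_{y\in S\setminus(A\cup B)} p(x,y)\,\mathbb P_y(\tau_A<\tau_B) + \sum_{y\in A} p(x,y)\\
&= \sum_{y\in S} p(x,y)\,h_{A,B}(y) = (Ph_{A,B})(x)
\end{aligned} \]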
and for x ∈ A as
The quantity
eA,B (x) = (−LhA,B )(x), x ∈ A, (7.1.20)
is called the equilibrium measure on A, and is the second central object in our study
of metastability.
The following simple observation provides a fundamental connection between
the objects we have introduced so far, and leads to a different representation of the
equilibrium potential. Pretend that the equilibrium measure eA,B is already known.
Then the equilibrium potential solves the inhomogeneous Dirichlet problem
Relation (7.1.22) can be used to express the Green function in terms of the equi-
librium measure and the equilibrium potential: simply choose A = {a}, to get
\[ G_B(x,a) = \frac{h_{a,B}(x)}{e_{a,B}(a)}, \qquad x\in S. \tag{7.1.23} \]
Note that ea,B (a) = Pa (τB < τa ) has the meaning of an escape probability from a
to B. The full power of Theorem 7.3 will come out in the reversible case, which we
discuss next.
150 7 Potential Theory
7.1.3 Reversibility
Definition 7.4 A Markov process with countable state space S and transition kernel
P = {p(x, y), x, y ∈ S}, is called reversible if there exists a non-zero μ : S → R+
such that
μ(x)p(x, y) = μ(y)p(y, x) ∀ x, y ∈ S. (7.1.24)
The function μ is called the reversible measure of the Markov process.
The function space L2 (S, μ) is a natural space to work on when the Markov
process is reversible with respect to μ.
Lemma 7.5 Let f ∈ L2 (S, μ), where μ is invariant with respect to P . Then Pf ∈
L2 (S, μ).
Proof The claim follows from the fact that P is a contraction in the L2 -norm:
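\[ \begin{aligned}
\sum_{x\in S}\mu(x)\big((Pf)(x)\big)^2 &= \sum_{x\in S}\mu(x)\Big(\sum_{y\in S}p(x,y)f(y)\Big)^2\\
&\le \sum_{x\in S}\mu(x)\sum_{y'\in S}p(x,y')f(y')^2\sum_{y''\in S}p(x,y'')\\
&\le \sum_{x\in S}\mu(x)\sum_{y'\in S}p(x,y')f(y')^2 = \sum_{y'\in S}\mu(y')f(y')^2,
\end{aligned} \tag{7.1.25} \]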
Proof Clearly, f = 1 is in L2 (S, μ). Hence, for all bounded measurable functions g,
\[ \sum_{x,y\in S}\mu(x)p(x,y)g(y) = \sum_{x,y\in S}p(y,x)\mu(y)g(y) = \sum_{y\in S}\mu(y)g(y), \tag{7.1.26} \]
and so μ is invariant.
We next come to the definition of the Dirichlet form, which plays a central rôle
in the potential-theoretic approach to metastability.
Lemma 7.7 Let L be the generator of a Markov process with reversible measure μ.
Then L defines a non-negative-definite quadratic form
\[ \mathcal E(f,g) = \sum_{x\in S}\mu(x)f(x)(-Lg)(x), \qquad f,g\in L^2(S,\mu), \tag{7.1.27} \]
Proof In the discrete case it suffices to write out E (f, g) explicitly. Namely, by
reversibility,
\[ \begin{aligned}
\mathcal E(f,g) &= \sum_{x,y\in S}\mu(x)p(x,y)f(x)\big[g(x)-g(y)\big]\\
&= \sum_{x,y\in S}\mu(x)p(x,y)f(y)\big[g(y)-g(x)\big].
\end{aligned} \tag{7.1.28} \]
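Averaging the two expressions in (7.1.28) gives the symmetric form (1/2) Σ μ(x) p(x,y) [f(x) - f(y)] [g(x) - g(y)], which makes non-negativity evident. The sketch below checks this on a small reversible chain built from symmetric conductances; the conductance matrix and the test functions are arbitrary illustrative choices, and the helper names are hypothetical.

```python
# Symmetric conductances c(x, y) = c(y, x) define a reversible chain with
# mu(x) = sum_y c(x, y) and p(x, y) = c(x, y) / mu(x).
c = [[0, 2, 1, 0],
     [2, 0, 3, 1],
     [1, 3, 0, 4],
     [0, 1, 4, 0]]
n = len(c)
mu = [float(sum(row)) for row in c]
p = [[c[x][y] / mu[x] for y in range(n)] for x in range(n)]

def minus_L(g):
    """(-L g)(x) = sum_y p(x, y) * (g(x) - g(y))."""
    return [sum(p[x][y] * (g[x] - g[y]) for y in range(n)) for x in range(n)]

def dirichlet_form(f, g):
    """E(f, g) = sum_x mu(x) f(x) (-L g)(x), as in (7.1.27)."""
    Lg = minus_L(g)
    return sum(mu[x] * f[x] * Lg[x] for x in range(n))

def symmetric_form(f, g):
    """(1/2) sum_{x,y} mu(x) p(x, y) (f(x) - f(y)) (g(x) - g(y))."""
    return 0.5 * sum(
        mu[x] * p[x][y] * (f[x] - f[y]) * (g[x] - g[y])
        for x in range(n) for y in range(n)
    )

f = [1.0, -2.0, 0.5, 3.0]
g = [0.0, 1.0, 4.0, -1.0]
print(dirichlet_form(f, g), symmetric_form(f, g), dirichlet_form(g, f))
print(dirichlet_form(f, f))
```

All three numbers on the first printed line coincide, and the last value is non-negative, as expected for a reversible chain.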
An important rôle will be played by the analogue of the two Green identities for
sums.
Lemma 7.8 Let f, g ∈ L2 (S, μ) and D ⊂ S. Assume that P is reversible with re-
spect to μ. Then
(i) (first Green identity)
\[ \tfrac12\sum_{x,y\in D}\mu(x)p(x,y)\big[f(x)-f(y)\big]\big[g(x)-g(y)\big] = \sum_{x\in D}\mu(x)f(x)(-Lg)(x) - \sum_{x\in D,\,y\in D^c}\mu(x)p(x,y)f(x)\big[g(x)-g(y)\big] \tag{7.1.30} \]
Proof To prove the first Green identity, we proceed as in the proof of Lemma 7.7. If $D = S$, then the proof gives (7.1.30) without the last term. If $D \subsetneq S$, then in order to produce the full action of $L$ we must add the terms that involve $y\in D^c$.
The first equality in the second Green identity is a trivial consequence of the first
Green identity. To get the second equality, use reversibility, add terms that involve
x ∈ D c to produce the full action of L, and use that these terms add up to zero. Note
that the equality between the first and the last line is just the statement that L is
symmetric in L2 (S, μ).
An illustration of what can be done with the Green identities is the following
formula for the Poisson kernel in terms of the Green function.
Lemma 7.9 (Poisson kernel and Green function) If P is reversible with respect to
μ and D ⊂ S, then the Poisson kernel defined in (7.1.11) satisfies
\[ H_{D^c}(z,y) = \sum_{x\in D}\frac{\mu(x)}{\mu(z)}\,p(x,y)\big[G_{D^c}(x,z) - G_{D^c}(y,z)\big], \qquad z\in D,\ y\in D^c. \tag{7.1.32} \]
Proof Fix z ∈ D. In (7.1.31), choose for f the solution of the Dirichlet problem
in (7.1.6), and choose g(x) = GD c (x, z), x ∈ S. With this choice, by (7.1.12) with
λ = 0, the first line in (7.1.31) simply becomes −μ(z)f (z). The second line reads
\[ \sum_{x\in D,\,y\in D^c}\mu(x)p(x,y)\big[f(x)G_{D^c}(y,z) - G_{D^c}(x,z)f(y)\big] = -\sum_{x\in D,\,y\in D^c}\mu(x)p(x,y)G_{D^c}(x,z)\,\bar g(y), \tag{7.1.33} \]
where we use that GD c (y, z) = 0 and f (y) = ḡ(y) for y ∈ D c , again by (7.1.6) and
(7.1.12) with λ = 0. Hence
μ(x)
From this expression the Poisson kernel HD c (z, y) can be read off as the sum
between the brackets, where we recall (7.1.9) and use that ḡ is arbitrary. Since
GD c (y, z) = 0 for y ∈ D c , we thus obtain (7.1.32).
Theorem 7.10 If P is reversible with respect to μ, then for all non-empty disjoint
sets A, B ⊂ S,
\[ h_{A,B}(x) = \sum_{y\in A}\frac{\mu(y)}{\mu(x)}\,G_B(y,x)\,e_{A,B}(y), \qquad x\in S. \tag{7.1.35} \]
Proof The key observation is that not only L but also its inverse L−1 is symmetric
in L2 (S, μ). This implies that
μ(x)GB (x, y) = μ(y)GB (y, x), x, y ∈ S, (7.1.40)
and yields (7.1.35) via (7.1.22). Multiplying both sides of (7.1.35) by $\mu(x)g(x)$, summing over $x\in S$, and noting that $\sum_{x\in S}G_B(y,x)g(x) = f(y)$, we get (7.1.37) apart from the normalisation factor cap(A, B). Dividing by this quantity, we obtain (7.1.37).
The measure νA,B is called the last-exit biased distribution on A for the transition
from A to B. The number cap(A, B) is called the capacity of the pair (A, B).
The following corollary of Theorem 7.10 provides a formula for mean hitting
times, which plays a crucial rôle in our study of metastability.
\[ \mathbb E_x[\tau_B] = \frac{1}{\mathrm{cap}(x,B)}\sum_{y\in S}\mu(y)\,h_{x,B}(y). \tag{7.1.42} \]
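Formula (7.1.42) can be tested on a small reversible chain by computing the mean hitting time in two ways: from the Dirichlet problem (-Lw) = 1 off B with w = 0 on B, and from the capacity, using cap(x, B) = μ(x) e_{x,B}(x). The chain, the choice x = 3, B = {0} and the Gauss-Seidel solver in this sketch are illustrative assumptions.

```python
# Reversible chain from symmetric conductances: mu(x) = sum_y c(x, y), p = c / mu.
c = [[0, 2, 1, 0],
     [2, 0, 3, 1],
     [1, 3, 0, 4],
     [0, 1, 4, 0]]
n = len(c)
mu = [float(sum(row)) for row in c]
p = [[c[x][y] / mu[x] for y in range(n)] for x in range(n)]

def solve_dirichlet(interior, boundary, source, sweeps=5000):
    """Gauss-Seidel for (-L f)(y) = source on `interior`, f = boundary values elsewhere."""
    f = [boundary.get(y, 0.0) for y in range(n)]
    for _ in range(sweeps):
        for y in interior:
            off_diagonal = sum(p[y][z] * f[z] for z in range(n) if z != y)
            f[y] = (source + off_diagonal) / (1.0 - p[y][y])
    return f

x = 3  # starting point; the target set is B = {0}

# Direct route: w(y) = E_y[tau_B] solves (-L w) = 1 off B, w = 0 on B.
w = solve_dirichlet([1, 2, 3], {0: 0.0}, source=1.0)

# Potential-theoretic route: equilibrium potential h_{x,B}, then the capacity.
h = solve_dirichlet([1, 2], {0: 0.0, 3: 1.0}, source=0.0)
escape_probability = h[x] - sum(p[x][y] * h[y] for y in range(n))  # e_{x,B}(x) = (-L h)(x)
capacity = mu[x] * escape_probability
via_capacity = sum(mu[y] * h[y] for y in range(n)) / capacity

print(w[x], via_capacity)
```

Both routes give the same mean hitting time (about 7.925 for this chain), which is the content of (7.1.42).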
Proof Note that the representation in (7.1.3) shows that the solution f of (7.1.1)
with k = 0, ḡ = 0 and g = 1 is
In Theorem 7.10 capacity made its first appearance. The first Green identity pro-
vides an important alternative representation of capacity in terms of the Dirichlet
form.
Proof This is obvious from the definition of the Dirichlet form in Lemma 7.7, the definition of the equilibrium measure in (7.1.20), the definition of the capacity in (7.1.39) and the Dirichlet problem (7.1.15) defining the equilibrium potential $h_{A,B}$.
Note that Lemma 7.12 becomes useful through the alternative representation of
the Dirichlet form given in (7.1.29).
We close by listing a few relations, linking hitting probabilities and capacities,
that will be needed in Chap. 8.
Lemma 7.13
(i) μ(x)Px (τB < τx ) = cap(x, B) for x ∈ S, B ⊂ S\{x}.
(ii) Py (τx < τB )/Px (τy < τB ) = cap(x, B)/cap(y, B) for x, y ∈ S, B ⊂ S\{x, y}.
(iii) Py (τB < τx ) ≤ cap(x, B)/cap(x, y) for x, y ∈ S, B ⊂ S\{x, y}.
Proof (i) It follows from (7.1.19)–(7.1.20) with A = {x} that ex,B (x) = (−Lhx,B )(x)
= Px (τB < τx ). It follows from (7.1.39) with A = {x} that cap(x, B) = μ(x)ex,B (x).
(ii) Use the second Green identity in (7.1.31), with D = {x}, g = hx,B , f = hy,B
and x, y ∈ S\B, to get
Since hy,B (x) = Px (τy < τB ) by (7.1.16) with A = {y}, we get the claim.
(iii) Again use the second Green identity, this time with D = {x}, g = hx,y , f =
hB,x and x, y ∈ S\B, to get
\[ \mu(y)(Lh_{y,x})(y)\, h_{B,x}(y) = \sum_{z\in B} \mu(z)(Lh_{B,x})(z)\, h_{y,x}(z), \qquad x, y \in S\setminus B. \tag{7.1.46} \]
The left-hand side equals cap(y, x)Py (τB < τx ), the right-hand side is bounded from
above by $\sum_{z\in B} \mu(z)(Lh_{B,x})(z)$, which equals cap(B, x) by (7.1.39).
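The three relations of Lemma 7.13 can be checked numerically on a small chain. The following is a minimal sketch, not part of the original text: it assumes a randomly generated reversible chain built from symmetric weights c(x, y) = μ(x)p(x, y), takes L = P − 1 as the discrete-time generator, and computes cap(A, B) as the Dirichlet form of the equilibrium potential. The helper names (`reversible_chain`, `equilibrium_potential`, `capacity`) are hypothetical.

```python
import numpy as np

def reversible_chain(n, seed=0):
    """Random reversible chain from symmetric weights c(x, y) = mu(x) p(x, y)."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.1, 1.0, size=(n, n))
    c = (c + c.T) / 2.0
    mu = c.sum(axis=1)                 # unnormalised reversible measure
    return mu, c / mu[:, None]

def equilibrium_potential(P, A, B):
    """h_{A,B}(z) = P_z(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    n = P.shape[0]
    h = np.zeros(n)
    h[A] = 1.0
    D = [z for z in range(n) if z not in A and z not in B]
    if D:
        lhs = np.eye(len(D)) - P[np.ix_(D, D)]
        rhs = P[np.ix_(D, A)].sum(axis=1)
        h[D] = np.linalg.solve(lhs, rhs)
    return h

def capacity(mu, P, A, B):
    """cap(A, B) computed as the Dirichlet form of h_{A,B}."""
    h = equilibrium_potential(P, A, B)
    diff = h[:, None] - h[None, :]
    return 0.5 * float(np.sum(mu[:, None] * P * diff ** 2))

mu, P = reversible_chain(6)
x, y, B = 0, 1, [4, 5]

# (i)  mu(x) P_x(tau_B < tau_x) = cap(x, B)
escape = 1.0 - P[x] @ equilibrium_potential(P, [x], B)
lhs_i, rhs_i = mu[x] * escape, capacity(mu, P, [x], B)

# (ii) P_y(tau_x < tau_B) / P_x(tau_y < tau_B) = cap(x, B) / cap(y, B)
ratio_prob = equilibrium_potential(P, [x], B)[y] / equilibrium_potential(P, [y], B)[x]
ratio_cap = capacity(mu, P, [x], B) / capacity(mu, P, [y], B)

# (iii) P_y(tau_B < tau_x) <= cap(x, B) / cap(x, y)
p_iii = equilibrium_potential(P, B, [x])[y]
bound_iii = capacity(mu, P, [x], B) / capacity(mu, P, [x], [y])
```

All three relations are homogeneous in μ, so the normalisation of μ plays no role in this check.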
Equilibrium potential
Due to the one-dimensional nature of our Markov process, the only equilibrium
potentials we have to compute are of the form
\[ \prod_{z=a+1}^{x-1} \frac{p(z,z-1)}{p(z,z+1)} = \prod_{z=a+1}^{x-1} \frac{\mu(z)\, p(z,z-1)}{\mu(z+1)\, p(z+1,z)} = \frac{\mu(a+1)\, p(a+1,a)}{\mu(x)\, p(x,x-1)}. \tag{7.1.52} \]
But
so that
where we abbreviate
\[ R(u,v) = \sum_{y=u+1}^{v} \frac{1}{\mu(y)}\, \frac{1}{p(y,y-1)}, \qquad u < v. \tag{7.1.55} \]
Now h(a) = 0, and so it remains to determine d(a + 1) from the condition h(b) = 1,
i.e.,
1 = R(a, b) μ(a + 1) p(a + 1, a) d(a + 1). (7.1.56)
Combining this with (7.1.54), we get
\[ h_{b,a}(x) = \frac{R(a,x)}{R(a,b)}, \qquad a < x < b. \tag{7.1.57} \]
Capacity
\[ \mathrm{cap}(a,b) = \frac{1}{R(a,b)}. \tag{7.1.60} \]
Inserting (7.1.57) and (7.1.60) into (7.1.42) (with A = {x} and B = {a}), we get
\[ \mathbb{E}_x[\tau_a] = R(a,x) \left[ \sum_{y=a+1}^{x-1} \mu(y)\, \frac{R(a,y)}{R(a,x)} + \sum_{y=x}^{\infty} \mu(y) \right], \qquad a < x. \tag{7.1.61} \]
This formula will be used in Chap. 13 to compute the metastable crossover time
for the Curie-Weiss model. The latter will be shown to link up nicely with Kramers
formula for Brownian motion in a double-well potential, as discussed in Sect. 2.1.1.
See also Sect. 7.2.5.
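Formula (7.1.61) can be compared with a direct linear-algebra computation of the mean hitting time. The sketch below is illustrative only: it assumes a finite nearest-neighbour chain on {0, ..., N} with hypothetical transition probabilities `up[x] = p(x, x+1)` and `down[x] = p(x, x-1)`, so the infinite sum in (7.1.61) is truncated at N, and it builds μ from detailed balance.

```python
import numpy as np

def reversible_measure(up, down):
    """mu from detailed balance: mu(x-1) p(x-1, x) = mu(x) p(x, x-1)."""
    mu = np.ones(len(up))
    for x in range(1, len(up)):
        mu[x] = mu[x - 1] * up[x - 1] / down[x]
    return mu

def R(mu, down, u, v):
    """R(u, v) = sum_{y=u+1}^{v} 1 / (mu(y) p(y, y-1)), as in (7.1.55)."""
    ys = np.arange(u + 1, v + 1)
    return float(np.sum(1.0 / (mu[ys] * down[ys])))

def mean_hitting_time_formula(mu, down, a, x):
    """E_x[tau_a] from (7.1.61), with the sum over y >= x truncated at N."""
    inner = sum(mu[y] * R(mu, down, a, y) for y in range(a + 1, x))
    return inner + R(mu, down, a, x) * float(mu[x:].sum())

def mean_hitting_time_linear(up, down, a, x):
    """E_x[tau_a] by solving (1 - Q) t = 1 on the states to the right of a."""
    n = len(up)
    P = np.zeros((n, n))
    for z in range(n):
        if z + 1 < n:
            P[z, z + 1] = up[z]
        if z - 1 >= 0:
            P[z, z - 1] = down[z]
        P[z, z] = 1.0 - P[z].sum()     # holding probability
    states = list(range(a + 1, n))
    Q = P[np.ix_(states, states)]
    t = np.linalg.solve(np.eye(len(states)) - Q, np.ones(len(states)))
    return float(t[states.index(x)])

# Hypothetical rates; up[N] = 0 makes N a reflecting end point.
up = np.array([0.5, 0.3, 0.3, 0.4, 0.2, 0.3, 0.0])
down = np.array([0.0, 0.4, 0.2, 0.3, 0.5, 0.3, 0.6])
mu = reversible_measure(up, down)

t_formula = mean_hitting_time_formula(mu, down, a=0, x=3)
t_linear = mean_hitting_time_linear(up, down, a=0, x=3)
```

The two values should agree up to rounding. The normalisation of μ cancels because R scales inversely with μ.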
7.2.1 Definition
The Dirichlet problem in (7.2.1) can also be posed when ḡ is not a continuous
function. In that case the continuity requirement must be replaced by the condition
that, for all x ∈ ∂D, if limn→∞ xn = x in D, then limn→∞ f (xn ) = ḡ(x).
The analogue of Theorem 7.1 is the following basic representation theorem.
then
τD c
Proof We use Lemma 5.35 with g = L f , where f solves the Dirichlet problem.
Condition (7.2.2) is, like its analogue (7.1.2) in the discrete-time case, sufficient to
imply that the optional sampling theorem holds.
As in (7.1.9) for the discrete case, we rewrite (7.2.3) for the case k(x) = λ as
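\[ f(x) = \int_D G^{\lambda}_{D^c}(x,dz)\, g(z) + \int_{D^c} H^{\lambda}_{D^c}(x,dz)\, \bar g(z), \qquad x \in D, \tag{7.2.4} \]
where
\[ G^{\lambda}_{D^c}(x,dz) = \mathbb{E}_x\left[ \int_0^{\tau_{D^c}} e^{-\lambda t}\, 1_{X_t\in dz}\, dt \right], \qquad x, z \in D, \tag{7.2.5} \]
is called the Green function and
\[ H^{\lambda}_{D^c}(x,dz) = \mathbb{E}_x\left[ e^{-\lambda\tau_{D^c}}\, 1_{X_{\tau_{D^c}}\in dz} \right] = \int_0^{\infty} e^{-\lambda t}\, \mathbb{P}_x(\tau_{D^c}\in dt,\, X_t\in dz), \qquad x \in D,\ z \in D^c, \tag{7.2.6} \]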
are bounded from above, very little changes from the discrete-time setting, and most
formulas remain unaltered. We need the hitting times
Note that this definition makes sure that if X0 ∈ A, then τA is not identically zero.
The Green function and the Poisson kernel take the form
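\[ G^{\lambda}_{D^c}(x,y) = \mathbb{E}_x\left[ \int_0^{\tau_{D^c}} e^{-\lambda t}\, 1_{X_t=y}\, dt \right], \qquad x, y \in D, \tag{7.2.10} \]
\[ H^{\lambda}_{D^c}(x,y) = \mathbb{E}_x\left[ e^{-\lambda\tau_{D^c}}\, 1_{X(\tau_{D^c})=y} \right], \qquad x \in D,\ y \in D^c. \tag{7.2.11} \]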
Indeed, $\frac{1}{c(x)} L$ is the generator of the underlying discrete-time Markov process.
Apart from the factor $\frac{1}{c(x)}$, all formulas derived for the reversible discrete-time case
remain unaltered.
Matters become more involved for an uncountable state space. We will restrict our
discussion to the case of elliptic diffusion processes in Rd ,
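\[ (\mathcal{L}f)(x) = \frac12 \sum_{i,j=1}^{d} a_{ij}(x)\, \frac{\partial^2 f(x)}{\partial x_i\, \partial x_j} + \sum_{i=1}^{d} b_i(x)\, \frac{\partial f(x)}{\partial x_i}, \tag{7.2.15} \]
\[ a_{ij}(x) = \sum_{k=1}^{d} \sigma_{ik}(x)\, \sigma_{kj}(x). \tag{7.2.16} \]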
In the sequel we will always assume that the dispersion matrix σ is non-degenerate
and hence the diffusion matrix a is strictly positive, i.e., for all x ∈ Rd , a(x) defines
a strictly positive quadratic form. For this case the operator L is called elliptic. If,
for some open domain D ⊂ Rd ,
\[ \sum_{i,j} a_{ij}(x)\, \xi_i \xi_j \ge \delta\, \|\xi\|_2^2, \qquad x \in D, \tag{7.2.17} \]
Theorem 7.15 applies to this situation, but it is somewhat delicate to check when the
assumptions are satisfied.
For the case k(x) ≤ 0, condition (7.2.2) is ensured (for bounded domains) by a
rather weak ellipticity condition.
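Lemma 7.16 Let D ⊂ Rd be open and bounded. Assume that, for some 1 ≤ ℓ ≤ d,
Proof Set $a = \min_{x\in\bar D} a_{\ell\ell}(x)$, $b = \max_{x\in\bar D} |b_\ell(x)|$ and $q = \min_{x\in\bar D} x_\ell$. Let ν >
2b/a. Consider the smooth function $h(x) = -\mu e^{\nu x_\ell}$, x ∈ Rd , with μ > 0. Clearly,
\[ (-\mathcal{L}h)(x) = \mu e^{\nu x_\ell} \left[ \tfrac12 \nu^2 a_{\ell\ell}(x) + \nu\, b_\ell(x) \right] \ge \tfrac12\, \mu\nu a\, e^{\nu q} (\nu - 2b/a). \tag{7.2.20} \]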
Choose μ such that the right-hand side is larger than 1, so that (−L h)(x) ≥ 1 for
all x ∈ D. Since
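\[ h(X_{t\wedge\tau_{D^c}}) + \int_0^{t\wedge\tau_{D^c}} (-\mathcal{L}h)(X_s)\, ds \tag{7.2.21} \]
is a martingale, it follows that
\[ \mathbb{E}_x\left[ \int_0^{t\wedge\tau_{D^c}} (-\mathcal{L}h)(X_s)\, ds \right] = h(x) - \mathbb{E}_x\left[ h(X_{t\wedge\tau_{D^c}}) \right], \tag{7.2.22} \]
hence
\[ \mathbb{E}_x[t\wedge\tau_{D^c}] \le h(x) - \mathbb{E}_x\left[ h(X_{t\wedge\tau_{D^c}}) \right], \tag{7.2.23} \]
and so
\[ \mathbb{E}_x[t\wedge\tau_{D^c}] \le \max_{y\in\bar D} |h(y)| < \infty. \tag{7.2.24} \]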
Theorem 7.17 A point z ∈ ∂D is regular if there exists a cone A with tip z such that
A ∩ Br (z) ⊂ D c for some r > 0, where Br (z) is the ball of radius r centred at z.
If all points of ∂D are regular, then existence and uniqueness of the solution of the
Dirichlet problem holds.
Reversibility
Let (Pt )t∈R+ be a strongly continuous contraction semigroup acting on the space
B(S) of bounded measurable functions on S. Assume that a measure μ on S is
invariant with respect to (Pt )t∈R+ . Then the action of (Pt )t∈R+ can be extended to
L2 (S, μ). The following lemmas are the analogues of Lemmas 7.5–7.7 for discrete
time, and their proofs can be copied.
Lemma 7.18 Let f ∈ L2 (S, μ), where μ is invariant with respect to (Pt )t∈R+ . Then
Pt f ∈ L2 (S, μ) for all t ∈ R+ .
We may check that (Pt∗ )t∈R+ is itself a Markov semigroup that generates the time-
reversal of X, in the sense that (Pt∗ f )(Xt ) = f (X0 ).
The notions that were introduced above extend from the semigroup to the gener-
ator. Thus, for an invariant measure μ, we can define the adjoint L ∗ of a generator
L via
\[ \int_S \mu(dx)\, (\mathcal{L}^* g)(x)\, f(x) = \int_S \mu(dx)\, (\mathcal{L}f)(x)\, g(x), \]
Lemma 7.21 Let μ be a reversible measure. Then the generator L defines a non-
negative-definite quadratic form,
\[ \mathcal{E}(f,g) = \int_S \mu(dx)\, g(x)\, (-\mathcal{L}f)(x), \tag{7.2.27} \]
Proof By the fact that L is self-adjoint, E (f, f ) is real for all f ∈ D(L ). More-
over, if E (f, f ) < ∞, then
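\[ \mathcal{E}(f,f) = \lim_{t\downarrow 0} t^{-1} \int_S \mu(dx)\, f(x) \left[ f(x) - (P_t f)(x) \right]. \tag{7.2.28} \]
But
\[ \int_S \mu(dx)\, f(x) \left[ f(x) - (P_t f)(x) \right] = \|f\|_{2,\mu}^2 - \int_S \mu(dx)\, f(x)\, (P_t f)(x) \]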
Reversible diffusions
First, we note that the formal adjoint in L2 (dx) of the operator L given in (7.2.15)
is
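\[ (\mathcal{L}^* g)(x) = \frac12 \sum_{i,j} \frac{\partial^2}{\partial x_i\, \partial x_j} \big( a_{ij}(x)\, g(x) \big) - \sum_i \frac{\partial}{\partial x_i} \big( b_i(x)\, g(x) \big) \]
\[ = \frac12 \sum_{i,j} a_{ij}(x)\, \frac{\partial^2 g(x)}{\partial x_i\, \partial x_j} + \sum_i \Big( \sum_j \frac{\partial a_{ij}(x)}{\partial x_j} - b_i(x) \Big) \frac{\partial g(x)}{\partial x_i} + \cdots \]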
2
which thus is the condition for the diffusion to be reversible with respect to Lebesgue
measure.
Next, we look for a reversible measure of the form μ(dx) = e−F (x) dx. Then μ
is reversible if and only if, for all g ∈ D(L ),
\[ \big( \mathcal{L}^* (g\, e^{-F}) \big)(x) = e^{-F(x)}\, (\mathcal{L}g)(x). \tag{7.2.33} \]
∗ −F 1 ∂ 2 g(x)
L ge (x) = e−F (x) aij (x)
2 ∂xi ∂xj
i,j
∂F (x) ∂g(x)
− e−F (x) aij (x)
∂xi ∂xj
i,j
2
1 ∂ F (x) ∂F (x) ∂F (x)
+ e−F (x) aij (x) + g(x)
2 ∂xi ∂xj ∂xi ∂xj
i,j
∂aij (x)
∂F (x) ∂g(x)
or
\[ b_i(x) = \frac12\, e^{F(x)} \sum_j \frac{\partial}{\partial x_j} \big( a_{ij}(x)\, e^{-F(x)} \big), \qquad i = 1, \dots, d. \tag{7.2.36} \]
Inserting this relation into (7.2.15), we see that the operator L can be written in the
form
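\[ (\mathcal{L}g)(x) = \frac12\, e^{F(x)} \sum_{i,j} \frac{\partial}{\partial x_j} \Big( e^{-F(x)}\, a_{ij}(x)\, \frac{\partial g(x)}{\partial x_i} \Big). \tag{7.2.37} \]
In the special case aij = δij , (7.2.36) reduces to
\[ b_i(x) = -\frac12\, \frac{\partial}{\partial x_i} F(x), \qquad i = 1, \dots, d, \tag{7.2.38} \]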
i.e., the drift b is the gradient of the potential −F (up to the factor ½). In that case
the generator L takes the suggestive form
\[ (\mathcal{L}g)(x) = \frac12\, e^{F(x)}\, \nabla e^{-F(x)}\, \nabla g(x). \tag{7.2.39} \]
The corresponding Dirichlet form can be written as
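\[ \mathcal{E}(f,g) = -\int_S \mu(dx)\, f(x)\, (\mathcal{L}g)(x) = \frac12 \int_S \mu(dx)\, \langle \nabla f(x), \nabla g(x) \rangle, \tag{7.2.40} \]
where ⟨·, ·⟩ denotes the standard inner product in Rd . In the case of general a we
just need to use the inner product relative to a, i.e.,
\[ \mathcal{E}(f,g) = -\int_S \mu(dx)\, f(x)\, (\mathcal{L}g)(x) = \frac12 \int_S \mu(dx) \sum_{i,j} a_{ij}(x)\, \frac{\partial f(x)}{\partial x_i}\, \frac{\partial g(x)}{\partial x_j}. \tag{7.2.41} \]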
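The identity (7.2.40) is an integration by parts and can be sanity-checked numerically in dimension one. The sketch below is an illustration under stated assumptions, not part of the original text: it takes aij = δij, S = R, a double-well F(x) = x⁴/4 − x²/2, and two Gaussian test functions f and g whose derivatives are entered analytically. The integrals are approximated by plain Riemann sums on a truncated grid.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

F = x**4 / 4 - x**2 / 2            # illustrative potential (an assumption)
dF = x**3 - x
weight = np.exp(-F)                # density of mu(dx) = exp(-F(x)) dx

f = np.exp(-(x - 0.3) ** 2)        # test function f and its derivative
df = -2 * (x - 0.3) * f
g = np.exp(-(x + 0.2) ** 2)        # test function g and its derivatives
dg = -2 * (x + 0.2) * g
d2g = (4 * (x + 0.2) ** 2 - 2) * g

# (7.2.39) in d = 1: (L g) = (1/2) e^F (e^{-F} g')' = (1/2) (g'' - F' g')
Lg = 0.5 * (d2g - dF * dg)

lhs = -float(np.sum(weight * f * Lg)) * dx         # -int mu(dx) f (L g)
rhs = 0.5 * float(np.sum(weight * df * dg)) * dx   # (1/2) int mu(dx) f' g'
```

Both numbers approximate the same Dirichlet form value, so they should agree to high accuracy.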
(−L h)(x) = 0, x ∈ D,
h(x) = 1, x ∈ A, (7.2.42)
h(x) = 0, x ∈ B,
is denoted by hA,B and is called the equilibrium potential of the capacitor (A, B).
As in the discrete-time case,
Remark 7.22 The above names come from the classical case where L = 12 Δ, for
which the Dirichlet problem is a problem of electrostatics. The sets A and B cor-
respond to two metal plates attached to a battery that imposes a constant voltage
(potential difference) between the plates. The solution of this problem describes the
electrostatic potential, whose gradient is the electrostatic field.
We have seen in Theorem 7.15 that if (7.2.44) has a unique solution, then this solu-
tion has the probabilistic representation
\[ f(x) = \mathbb{E}_x\left[ \int_0^{\tau_{D^c}} g(X_t)\, dt \right], \qquad x \in D. \tag{7.2.45} \]
The Green kernel will often have a density with respect to Lebesgue measure, i.e.,
In that case GD c (x, y) is called the Green function. For the special case of (7.2.4)
with λ = 0, g = 1 and ḡ = 0, (7.2.45) yields the relation
\[ \mathbb{E}_x[\tau_{D^c}] = \int_D G_{D^c}(x,y)\, dy. \tag{7.2.47} \]
Let us next look at the relation between the equilibrium potential and the Dirich-
let form in the case of a reversible diffusion. We want to compute E (hA,B , hA,B ).
We might be tempted to think that E (hA,B , hA,B ) = 0, because (L hA,B )(x) = 0
except on the sets ∂A and ∂B. But on these sets L hA,B is singular because hA,B is
not differentiable. Therefore we may interpret L hA,B as a measure that is concen-
trated on A and B. Since hA,B vanishes on ∂B, we get
\[ \mathcal{E}(h_{A,B}, h_{A,B}) = \int_{\partial A} \mu(x)\, (-\mathcal{L}h_{A,B})(dx). \tag{7.2.48} \]
The measure eA,B (dx) = (−L hA,B )(dx) is called the equilibrium measure associ-
ated with the capacitor (A, B).
To understand the above observation better, let us return to the case aij = δij .
We then have the following integral formulas known as the Green identities, which
constitute the analogue of Lemma 7.8.
Lemma 7.23 Let D be a regular domain, let f, g ∈ C 2 (D), and let L be the re-
versible operator given by (7.2.37). Then
(i) (first Green identity)
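\[ \int_D dx\, e^{-F(x)} \left[ \langle \nabla f(x), \nabla g(x) \rangle - g(x)\, (2\mathcal{L}f)(x) \right] = \int_{\partial D} e^{-F(x)}\, g(x)\, \partial_{n(x)} f(x)\, d\sigma_D(x) \tag{7.2.49} \]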
hold with
\[ \partial_{n(x)} = \sum_{i,j} n_i(x)\, a_{ij}(x)\, \frac{\partial}{\partial x_j}, \tag{7.2.51} \]
where n(x) denotes the inner normal unit vector at x ∈ ∂D. In the case aij = δij ,
∂n(x) is the usual normal derivative at x.
Proof For the case F = 0 and aij = δij , both formulas are classical and can be found
in any standard textbook on potential theory. The extension to the general case is by
straightforward computation.
As in the discrete case, the Green identities give rise to a representation of the
Poisson kernel in terms of the Green function.
−(L f )(x) = 0, x ∈ D,
(7.2.52)
f (x) = ḡ(x), x ∈ ∂D,
is given by
\[ f(x) = \int_{\partial D} \bar g(y)\, e^{F(x)-F(y)}\, \partial_{n(y)} G_{D^c}(y,x)\, d\sigma_D(y), \qquad x \in D, \tag{7.2.53} \]
i.e.,
Using the first Green identity, we can state a precise relation between the equilib-
rium potential and the capacity. Namely, setting f = g = hA,B in (7.2.49), we see
that
\[ \mathcal{E}(h_{A,B}, h_{A,B}) = \int_{\partial A} dx\, e^{-F(x)}\, h_{A,B}(x)\, (-\mathcal{L}h_{A,B})(x) = \int_{\partial A} e^{-F(x)}\, \partial_{n(x)} h_{A,B}(x)\, d\sigma_A(x), \tag{7.2.55} \]
i.e., on A the equilibrium measure eA,B is given by
is called the capacity of the capacitor (A, B), which in electrical language is the
total charge on the plate A. Using (7.2.55), we see that, alternatively, the capacity is
the total energy of the potential hA,B .
For x ∈ D the limit exists and equals zero. For x ∈ A, however, the limit does not
exist, but we will make sense of it in a weak sense. To that end, let us define the last
exit time TA from A prior to arrival in B as
with the convention that sup ∅ = 0. This is not a stopping time, and
Note that we can write the expectation in the last line of (7.2.59) as
Ex PXt (τB < τA ) = Px (0 < TA < t), x ∈ D ∪ A. (7.2.62)
Set
ψt (x) = t −1 Px (0 < TA < t), x ∈ D ∪ A. (7.2.63)
Define the last exit distribution (x, dy) on A by
Proof Without loss of generality, let f ≥ 0. Fix x ∈ D ∪ A. Using the integral rep-
resentation of the Green function in (7.2.5), we get
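\[ \int_{D\cup A} G_B(x,y)\, \psi_t(y)\, f(y)\, dy = \mathbb{E}_x\left[ \int_0^{\tau_B} \psi_t(X_s)\, f(X_s)\, ds \right] \tag{7.2.66} \]
\[ = t^{-1}\, \mathbb{E}_x\left[ \int_0^{\infty} f(X_s)\, \mathbb{P}_{X_s}(0 < T_A < t)\, ds \right] = t^{-1}\, \mathbb{E}_x\left[ \int_0^{\infty} f(X_s)\, 1_{s<T_A<s+t}\, ds \right] \]
\[ = \mathbb{E}_x\left[ 1_{0<T_A\le t}\; t^{-1} \int_0^{T_A} f(X_s)\, ds \right] + \mathbb{E}_x\left[ 1_{T_A>t}\; t^{-1} \int_{T_A-t}^{T_A} f(X_s)\, ds \right]. \]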
Both terms in the last line are obviously uniformly bounded as t ↓ 0. Moreover,
\[ \mathbb{E}_x\left[ 1_{0<T_A\le t}\; t^{-1} \int_0^{T_A} f(X_s)\, ds \right] \le C\, \mathbb{P}_x[0 < T_A \le t] \downarrow 0, \qquad t \downarrow 0, \tag{7.2.67} \]
From Lemma 7.27 we deduce that the family of measures ψt (y)dy, t > 0, con-
verges as t ↓ 0 to a measure e(dy) on A, which satisfies
Hence e(dy) = eA,B (dy), the equilibrium measure that was introduced in (7.2.56).
In conclusion, we have proven the following analogue of Theorem 7.3.
The formula for the Green function gives corresponding formulas for solutions
of Dirichlet problems. For instance, if for some function g we consider the Dirichlet
problem
(−L f )(x) = g(x), x ∈ D ∪ A,
(7.2.74)
f (x) = 0, x ∈ B,
then $f(x) = \int_{D\cup A} dy\, G_B(x,y)\, g(y)$. By reversibility,
and so
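\[ \int_{D\cup A} e^{-F(x)}\, h_{A,B}(x)\, g(x)\, dx = \int_{D\cup A} dx\, e^{-F(x)}\, g(x) \int_A G_B(y,x)\, e^{F(x)-F(y)}\, e_{A,B}(dy) \]
\[ = \int_A e^{-F(y)}\, e_{A,B}(dy) \int_{D\cup A} G_B(y,x)\, g(x)\, dx = \int_A e^{-F(y)}\, e_{A,B}(dy)\, f(y). \tag{7.2.75} \]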
for ε > 0, with a(x) > 0 and b(x) ∈ R. The case a(x) = 1 corresponds to the clas-
sical Kramers equation (2.1.1). It follows from the general formula in (7.2.35) that
the invariant measure for this diffusion is given by
\[ \mu(dx) = \frac{1}{a(x)} \exp\left( \int_0^x \frac{b(z)}{a(z)}\, dz \Big/ \varepsilon \right) dx, \tag{7.2.81} \]
up to normalisation. Set
\[ \int_0^x \frac{b(z)}{a(z)}\, dz = -\tilde F(x). \tag{7.2.82} \]
Then it is easy to verify that the Dirichlet form is given by
\[ \mathcal{E}(f,g) = \tfrac12\, \varepsilon \int_{\mathbb{R}} e^{-\tilde F(x)/\varepsilon}\, f'(x)\, g'(x)\, dx. \tag{7.2.83} \]
To compute the equilibrium potential hc,{a,b} , we must solve the second-order dif-
ferential equation
\[ \varepsilon\, a(x)\, h''(x) + b(x)\, h'(x) = 0, \tag{7.2.84} \]
which reduces to the first-order differential equation
From Lemma 7.28 we get the following formula for the Green function on (a, b):
\[ G_{\{a,b\}}(y,x) = \frac{e^{-\tilde F(x)/\varepsilon}\, h_{y,\{a,b\}}(x)}{a(x)\, \mathrm{cap}(y,\{a,b\})}, \tag{7.2.90} \]
where the second equality uses (7.2.57). Note that this computation is a nice alterna-
tive to the usual method of variation of constants used to obtain the Green function.
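Now, if $\lim_{x\downarrow-\infty} \tilde F(x) = \infty$, then $\lim_{a\downarrow-\infty} \int_a^x e^{\tilde F(r)/\varepsilon}\, dr = \infty$, and we get
\[ \lim_{a\downarrow-\infty} h_{c,\{a,b\}}(x) = \begin{cases} 1, & -\infty < x < c, \\[4pt] \dfrac{\int_x^b e^{\tilde F(r)/\varepsilon}\, dr}{\int_c^b e^{\tilde F(r)/\varepsilon}\, dr}, & c < x < b, \end{cases} \tag{7.2.91} \]
and
\[ \lim_{a\downarrow-\infty} \mathrm{cap}\big(c, \{a,b\}\big) = \frac{\varepsilon}{2 \int_c^b e^{\tilde F(r)/\varepsilon}\, dr}. \tag{7.2.92} \]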
Hence
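\[ \lim_{a\downarrow-\infty} G_{\{a,b\}}(y,x) = \begin{cases} 2(\varepsilon a(x))^{-1}\, e^{-\tilde F(x)/\varepsilon} \int_x^b e^{\tilde F(r)/\varepsilon}\, dr, & y < x < b, \\[4pt] 2(\varepsilon a(x))^{-1}\, e^{-\tilde F(x)/\varepsilon} \int_y^b e^{\tilde F(r)/\varepsilon}\, dr, & x < y < b. \end{cases} \tag{7.2.93} \]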
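\[ \mathbb{E}_y[\tau_b] = 2 \int_{-\infty}^{b} e^{-\tilde F(x)/\varepsilon}\, \frac{1}{\varepsilon a(x)} \int_{x\vee y}^{b} e^{\tilde F(r)/\varepsilon}\, dr\, dx, \qquad y \in (-\infty, b). \tag{7.2.94} \]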
Note that, for a(x) = 1 as in (2.1.1), the definition of $\tilde F$ in (7.2.82) reduces to
$b = -\tilde F'$, i.e., $\tilde F = F$ with F the potential. If F is chosen to be a double-well
potential, then (7.2.94) yields, in the limit as ε ↓ 0 and with the help of elementary
Laplace asymptotics, the Kramers formula in (2.1.2) advertised in Sect. 2.1.
Fig. 7.4 Example of the setting in Remark 7.32 with a = −3 and b = 3: a potential x → F (x) on
[−3, 3] and its associated equilibrium potential x → h3,−3 (x) = Px (τ3 < τ−3 )
Remark 7.31 Note that we chose an arbitrary normalisation for the invariant mea-
sure, which influences the value of the capacity. It does not, however, affect the
value of physical quantities, in particular, the Green function and the mean hitting
time.
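The mean hitting time formula (7.2.94) can be checked numerically by verifying that it solves the associated Poisson equation. The sketch below rests on several assumptions that are not stated in the text above: a(x) = 1, the double-well potential F(x) = x⁴/4 − x²/2, ε = 0.5, the exit point b = 1.5, a grid truncated at −4 in place of −∞, and the generator taken as L = ½(ε a ∂² + b ∂), which is the form consistent with the Dirichlet form (7.2.83). Under these assumptions u(y) = E_y[τ_b] should satisfy (L u)(y) = −1.

```python
import numpy as np

eps = 0.5
b_end = 1.5                                # exit point b (assumed value)
x = np.linspace(-4.0, b_end, 5501)         # -4 stands in for -infinity
dx = x[1] - x[0]

F = x**4 / 4 - x**2 / 2                    # a(x) = 1, so F-tilde = F
drift = -(x**3 - x)                        # b(x) = -F'(x)

def cumtrapz_left(vals, h):
    """Cumulative trapezoidal integral starting at the left end of the grid."""
    out = np.zeros_like(vals)
    out[1:] = np.cumsum(0.5 * (vals[1:] + vals[:-1])) * h
    return out

def cumtrapz_right(vals, h):
    """Integral from each grid point up to the right end of the grid."""
    return cumtrapz_left(vals[::-1], h)[::-1]

w = np.exp(-F / eps) / eps                 # e^{-F(x)/eps} / (eps a(x))
M = cumtrapz_left(w, dx)                   # int_{-inf}^{y} w(x) dx
I = cumtrapz_right(np.exp(F / eps), dx)    # int_{y}^{b} e^{F(r)/eps} dr
J = cumtrapz_right(w * I, dx)              # int_{y}^{b} w(x) I(x) dx

# (7.2.94): for x < y the inner integral is I(y), for x > y it is I(x)
u = 2.0 * (I * M + J)

du = np.gradient(u, dx)
d2u = np.gradient(du, dx)
residual = 0.5 * (eps * d2u + drift * du)  # should be close to -1
window = (x > -1.5) & (x < 1.0)            # stay away from the grid ends
max_err = float(np.max(np.abs(residual[window] + 1.0)))
```

Integrating from the right end avoids subtracting two very large numbers, which would otherwise destroy the accuracy of I and J.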
%(x)/ε
−F 1 %(r)/ε
Ey [τb ] = 2 e e F
dr dx, y ∈ (a, b). (7.2.96)
y εa(x) y
As was pointed out earlier, variational principles are at the heart of our endeavor
to obtain sharp estimates on key quantities in metastable systems. We have already
seen that such quantities can be expressed as solutions to PDE’s (or discrete ana-
logues of PDE’s), but finding these is hard. Variational principles provide tools to
get good estimates without an explicit solution. In this section we discuss three vari-
ational principles: the Dirichlet principle, the Thomson principle and the Berman-
Konsowa principle.
In Sects. 7.1–7.2 we have seen that the Dirichlet form computed on the equilibrium
potential gives the capacity. We will now show that the equilibrium potential is the
solution of a variational problem.
Moreover, if HA,B ≠ ∅, then the infimum in (7.3.1) is attained uniquely at the
equilibrium potential, i.e., cap(A, B) = E (hA,B , hA,B ).
Proof We write the proof in the diffusion setting, but the same arguments work in
general. Suppose that HA,B ≠ ∅. Let g be a function with E (g, g) < ∞ such that
g ≥ 0 on A and g ≤ 0 on B. Then, for h ∈ HA,B and ε > 0 (recall (7.2.55)), using
the second Green identity (7.2.50),
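\[ \mathcal{E}(h + \varepsilon g, h + \varepsilon g) - \mathcal{E}(h,h) = \varepsilon \left[ \mathcal{E}(h,g) + \mathcal{E}(g,h) \right] + \varepsilon^2\, \mathcal{E}(g,g) \]
\[ = \varepsilon \int_{\partial A} e^{-F(x)}\, g(x)\, \partial_{n(x)} h(x)\, d\sigma_A(x) + \varepsilon \int_{\partial B} e^{-F(x)}\, g(x)\, \partial_{n(x)} h(x)\, d\sigma_B(x) \]
\[ \quad + 2\varepsilon \int_D \mu(dx)\, g(x)\, (\mathcal{L}h)(x) + \varepsilon^2\, \mathcal{E}(g,g). \tag{7.3.2} \]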
If h = hA,B is the equilibrium potential, then the boundary integrals are non-
negative and the first term in the last line vanishes. Since the second term in the
last line is non-negative, it follows that h is a global minimum of E in HA,B . Fi-
nally, suppose that there is another function f such that E (f, f ) = E (h, h). Then
the identity
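\[ \mathcal{E}\left( \tfrac{f+h}{2}, \tfrac{f+h}{2} \right) + \mathcal{E}\left( \tfrac{f-h}{2}, \tfrac{f-h}{2} \right) = \tfrac12\, \mathcal{E}(f,f) + \tfrac12\, \mathcal{E}(h,h) \tag{7.3.3} \]
implies that
\[ \mathcal{E}\left( \tfrac{f+h}{2}, \tfrac{f+h}{2} \right) = \mathcal{E}(h,h) - \mathcal{E}\left( \tfrac{f-h}{2}, \tfrac{f-h}{2} \right). \tag{7.3.4} \]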
Since h is a global minimum, this equality can only hold if
E (f − h, f − h) = 0. (7.3.5)
But, by (7.2.40) (recall that we are in the diffusion setting), the latter means that
$\|\nabla(f - h)\|_2 = 0$ μ-a.s., i.e., f − h is constant μ-a.s. Because of condition (ii), it
follows that f = h μ-a.s.
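In the countable-state setting the Dirichlet principle says that every admissible test function gives an upper bound on the capacity, with equality only at the equilibrium potential. The following sketch illustrates this on a small chain. It assumes symmetric weights c(x, y) = μ(x)p(x, y) drawn at random, single-point sets A and B, and test functions with values in [0, 1] that equal 1 on A and 0 on B (an assumption about the class HA,B, whose definition is not reproduced above).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7
c = rng.uniform(0.1, 1.0, size=(n, n))
c = (c + c.T) / 2.0                  # c(x, y) = mu(x) p(x, y), symmetric by reversibility
mu = c.sum(axis=1)
P = c / mu[:, None]

A, B = [0], [n - 1]
D = list(range(1, n - 1))

def dirichlet_form(h):
    """E(h, h) = (1/2) sum_{x,y} mu(x) p(x, y) (h(x) - h(y))^2."""
    diff = h[:, None] - h[None, :]
    return 0.5 * float(np.sum(c * diff ** 2))

# Equilibrium potential: 1 on A, 0 on B, harmonic on D.
h_eq = np.zeros(n)
h_eq[A] = 1.0
h_eq[D] = np.linalg.solve(np.eye(len(D)) - P[np.ix_(D, D)],
                          P[np.ix_(D, A)].sum(axis=1))
cap = dirichlet_form(h_eq)

# Random admissible test functions can only give larger values.
trial_values = []
for _ in range(200):
    h = rng.uniform(0.0, 1.0, size=n)
    h[A], h[B] = 1.0, 0.0
    trial_values.append(dirichlet_form(h))
```

The gap between `min(trial_values)` and `cap` shows how crude a random guess is; in applications the test function is chosen using the probabilistic interpretation of hA,B.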
Theorem 7.35 (Thomson principle, Version 1) Assume that A, B are such that the
corresponding Dirichlet problem has a unique solution hA,B . Let TA,B denote the
space of super-harmonic functions on D c that take values in [0, 1], i.e.,
\[ T_{A,B} = \left\{ h\colon S \to [0,1],\ h \in L^2(S,\mu) \colon (\mathcal{L}h)(x) \le 0\ \ \forall\, x \in S\setminus D \right\}. \tag{7.3.7} \]
Then
\[ \mathrm{cap}(A,B) = \sup_{h\in T_{A,B}} \frac{\mathcal{E}(1_A, h)^2}{\mathcal{E}(h,h)}, \tag{7.3.8} \]
and the supremum is attained at h = hA,B .
\[ \mathrm{cap}(A,B) = \mathcal{E}(h_{A,B}, h_{A,B}) \ge \frac{\mathcal{E}(1_A, h)^2}{\mathcal{E}(h,h)}. \tag{7.3.11} \]
Fig. 7.5 Kirchhoff’s law says that the in-flow and the out-flow are the same for all vertices that
are not wired to the outside
Thus, the right-hand side of (7.3.8) is a lower bound for cap(A, B). Since, by defi-
nition (see (7.2.55–7.2.57)), cap(A, B) = E (1A , hA,B ), the lower bound in (7.3.11)
is attained for h = hA,B .
The Thomson principle is much more difficult to exploit than the Dirichlet prin-
ciple, since it imposes the constraint of super-harmonicity on the test functions.
Guessing good super-harmonic functions is not easy.
In the setting of Markov processes with countable state space, there is an alterna-
tive (and better known) formulation of the Thomson principle in terms of flows (see
Fig. 7.5).
Definition 7.36 Let Γ = (S, E) be a graph with edge set E and vertex set S. Let
A, B ⊂ S be non-empty and disjoint. A map f : E → R is called a unit flow from
A to B when
(i) Kirchhoff’s law holds: the flows into and out of vertices in S\(A ∪ B) are the
same, i.e.,
\[ \sum_{y\in S\colon (y,x)\in E} f(y,x) = \sum_{z\in S\colon (x,z)\in E} f(x,z) \qquad \forall\, x \in S\setminus(A\cup B). \tag{7.3.12} \]
is a sum over edge-functions on the graph of the Markov process. The Dirichlet
form can therefore be written as
\[ \mathcal{E}(h,g) = \frac12 \sum_{(x,y)\in E} \mu(x)\, p(x,y)\, [h(y) - h(x)]\, [g(y) - g(x)] \tag{7.3.15} \]
\[ = \frac12 \sum_{(x,y)\in E} \frac{\{\mu(x)\, p(x,y)\, [h(y) - h(x)]\}\, \{\mu(x)\, p(x,y)\, [g(y) - g(x)]\}}{\mu(x)\, p(x,y)}. \]
we have
E (h, g) = D(μp∇h, μp∇g), (7.3.17)
with the obvious definition of μp∇. In particular,
\[ \mathrm{cap}(A,B) \ge \frac{D(\mu p \nabla h_{A,B}, f)^2}{D(f,f)} = \frac{1}{D(f,f)} \tag{7.3.20} \]
for any unit flow f .
Theorem 7.37 (Thomson principle, Version 2) For Markov processes with coun-
table state space, with the notation above,
\[ \mathrm{cap}(A,B) = \sup_{f\in U_{A,B}} \frac{1}{D(f,f)}, \tag{7.3.21} \]
where UA,B is the space of all unit flows from A to B. The supremum is attained for
the harmonic unit flow
\[ f_{h_{A,B}}(x,y) = \frac{\mu(x)\, p(x,y)\, [h_{A,B}(y) - h_{A,B}(x)]_+}{\mathrm{cap}(A,B)}. \tag{7.3.22} \]
Proof In view of (7.3.20), we only need to verify that equality holds for the particu-
lar choice of the harmonic unit flow. To check that D(fhA,B , fhA,B ) = 1/cap(A, B)
is immediate. We only need to verify that fhA,B is a unit flow from A to B.
is a flow.
Remark 7.39 Note that the proof of Lemma 7.38 implies that for any function g the
edge-function φg defined in (7.3.23) satisfies
\[ \sum_{x\in S} \left[ \phi_g(x,y) - \phi_g(y,x) \right] = \mu(y)\, (-\mathcal{L}g)(y), \qquad y \in S. \tag{7.3.25} \]
Berman and Konsowa [23] obtained another variational principle, for the case of
discrete-time Markov processes, which generates lower bounds that improve on
those obtained from the Thomson principle. Its derivation is quite different, and
actually starts from the Dirichlet principle.
We work in the same setting as in Sect. 7.3.2.
and put $q^f((x,y)) = 0$ for x ∈ B. We construct a Markov process with law $P^f$,
initial distribution $P^f(X_0 = x) = F(x)\, 1_{x\in A}$ and transition matrix $q^f$ that is killed
in B. $P^f$ can also be seen as a probability distribution on self-avoiding paths from
A to B, with
\[ P^f(\gamma) = F(\gamma_0) \prod_{i=0}^{|\gamma|-1} q^f(\gamma_i, \gamma_{i+1}). \tag{7.3.27} \]
Pf (e ∈ γ ) = f (e). (7.3.28)
Proof Let e = (x, y). Then, by the Markov property and the fact that the paths are
self-avoiding, the probability in question equals the probability that a path hits x
and immediately moves to y:
\[ P^f(e \in \gamma) = P^f(\tau_x < \tau_B)\, q^f(x,y). \tag{7.3.29} \]
\[ P^f(\tau_x < \tau_B) = \sum_{\gamma\colon A\to x} F(\gamma_0) \prod_{i=0}^{|\gamma|-1} q^f(\gamma_i, \gamma_{i+1}). \tag{7.3.30} \]
The summation over paths has to be carried out carefully. To that end, recursively
define the sets
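\[ A_0 = A, \qquad A_n = \left\{ z \in S\setminus A \colon \exists\, y \in A_{n-1}\colon f(y,z) > 0,\ \ \forall\, y \notin A_0\cup\dots\cup A_{n-1}\colon f(y,z) = 0 \right\}. \tag{7.3.31} \]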
Note that, due to the loop-freeness of the flow, for any z ∈ S there exists a unique
n∗ (z) such that z ∈ An∗ (z) . Set
We prove by induction that G(z) = F (z) for all z ∈ S. Indeed, for z ∈ A we have
G(z) = F (z) by our choice of the initial condition. It therefore suffices to show that
if G(z) = F (z) holds for all z ∈ Ak , 0 ≤ k ≤ n, then it also holds for z ∈ An+1 .
Now, by (7.3.31–7.3.33), for z ∈ An+1 we have
\[ G(z) = \sum_{y\in A_0\cup\dots\cup A_n} G(y)\, q^f(y,z) = \sum_{y\in A_0\cup\dots\cup A_n} f(y,z), \tag{7.3.34} \]
where we use the induction hypothesis. However, for z ∈ An+1 we also have
\[ \sum_{y\in A_0\cup\dots\cup A_n} f(y,z) = \sum_{y\in S} f(y,z) = \sum_{w\in S} f(z,w) = F(z), \tag{7.3.35} \]
where we use that the flow satisfies Kirchhoff’s law. Thus, we have completed the
induction step and have proven that
Remark 7.42 Lemma 7.41 is the only place where the flow property of f is used.
In terms of flows the observation in (7.3.36) is the probabilistic interpretation of the
fact that F (x) is the total flow into x.
From this we can derive a lower bound on the capacity, namely, we take the infimum
over h and interchange the sum over γ with the infimum over h:
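\[ \mathrm{cap}(A,B) \ge \inf_{h\in H_{A,B}} \sum_{\gamma} P^f(\gamma) \sum_{(x,y)\in\gamma} \frac{\mu(x)\, p(x,y)}{f((x,y))}\, \big[ h(x) - h(y) \big]^2 \]
\[ \ge \sum_{\gamma} P^f(\gamma) \inf_{h\in H_{A,B}} \sum_{(x,y)\in\gamma} \frac{\mu(x)\, p(x,y)}{f((x,y))}\, \big[ h(x) - h(y) \big]^2 \]
\[ = \sum_{\gamma} P^f(\gamma) \left[ \sum_{(x,y)\in\gamma} \frac{f((x,y))}{\mu(x)\, p(x,y)} \right]^{-1}. \tag{7.3.39} \]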
In the last step we use the explicit solution of the Dirichlet problem on the one-
dimensional path γ . We readily see that equality holds when we insert the harmonic
unit flow (see (7.3.22)). Thus, we have proved the following theorem.
Theorem 7.43 (Berman-Konsowa principle) Let UA,B denote the set of loop-free
unit flows from A to B. Then
\[ \mathrm{cap}(A,B) = \sup_{f\in U_{A,B}} E^f\left[ \left( \sum_{(x,y)\in\gamma} \frac{f((x,y))}{\mu(x)\, p(x,y)} \right)^{-1} \right]. \tag{7.3.40} \]
Hence, every choice of f yields a better lower bound via the Berman-Konsowa
principle than via the Thomson principle.
The more serious advantage of the Berman-Konsowa principle is the fact that
the bounds can often be evaluated explicitly. The sums appearing in the right-hand
side of (7.3.40) are straightforward, live on the flow realising the supremum, and
are independent of the realisation of the Markov chain, so that the expectation
over Ef becomes trivial. This will be explained in the examples that are treated
in Parts IV–VIII.
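A toy network makes the comparison between the two lower bounds concrete. The sketch below is an illustration, not part of the original text. It assumes a graph consisting of two edge-disjoint paths from a single point a to a single point b, with hypothetical edge weights c(e) = μ(x)p(x, y), and it uses the convention D(f, f) = Σ_e f(e)²/c(e), summed over the edges that carry flow. A unit flow is then determined by the fraction θ routed through the first path.

```python
# Edge weights c(e) = mu(x) p(x, y) along two edge-disjoint paths from a to b.
path1 = [0.8, 0.5, 1.2]
path2 = [0.3, 0.9]
r1 = sum(1.0 / c for c in path1)      # sum of 1/c(e) along path 1
r2 = sum(1.0 / c for c in path2)      # sum of 1/c(e) along path 2

# Capacity of the two paths in parallel (minimum of the Dirichlet form).
cap = 1.0 / r1 + 1.0 / r2

def thomson_bound(theta):
    """1 / D(f, f) for the unit flow sending theta through path 1."""
    return 1.0 / (theta ** 2 * r1 + (1.0 - theta) ** 2 * r2)

def berman_konsowa_bound(theta):
    """E^f[(sum_{e in gamma} f(e)/c(e))^{-1}]: path 1 is chosen with probability theta."""
    return theta / (theta * r1) + (1.0 - theta) / ((1.0 - theta) * r2)

theta_opt = r2 / (r1 + r2)            # flow split that maximises the Thomson bound
```

In this example the Berman-Konsowa bound equals the capacity for every θ in (0, 1), because the paths are disjoint, whereas the Thomson bound reaches the capacity only at `theta_opt`.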
Variational representations for capacities are known also in the non-reversible set-
ting, but they are much more involved and therefore far less useful. Here is a brief
account.
In particular,
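\[ \mathrm{cap}(A,B) = \langle h^*_{A,B}, -\mathcal{L}h_{A,B} \rangle_\mu = \langle -\mathcal{L}^* h^*_{A,B}, h_{A,B} \rangle_\mu = \mathrm{cap}^*(A,B). \tag{7.4.4} \]
Note that on the space of functions with zero mean we have $\|f\|^2_{H^{-1}} = (f, \mathcal{L}^{-1} f)$,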
while otherwise the H −1 -norm is infinite. An application of the Cauchy-Schwarz
inequality yields the bound
Using this bound with f replaced by −L ∗ f and g by hA,B , we can show that
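\[ \langle -\mathcal{L}^* f, h_{A,B} \rangle_\mu^2 \le \mathrm{cap}(A,B) \sup_{h\in G_{A,B}} \left\{ 2\, \langle -\mathcal{L}^* f, h \rangle_\mu - \|h\|^2_{H^1} \right\}, \tag{7.4.8} \]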
where GA,B denotes the space of functions that are constant on the sets A and B.
Furthermore, for any f ∈ HA,B we have, via (7.4.3),
\[ \langle -\mathcal{L}^* f, h_{A,B} \rangle_\mu = (f, -\mathcal{L}h_{A,B})_\mu = \mathrm{cap}(A,B), \tag{7.4.9} \]
It now suffices to choose $f = \tfrac12 (h_{A,B} + h^*_{A,B})$, where $h^*_{A,B}$ is the equilibrium
potential for the adjoint generator L ∗ , to verify that the infimum is attained at f . This
yields the following Dirichlet principle for the non-reversible case.
The Thomson principle in the form of Theorem 7.35 carries over to the non-
reversible case.
Another version of both the Dirichlet principle and the Thomson principle is
the following theorem, whose proof can be found in Slowik [220]. We need the
following notations: for g : S → R, set
\[ \Psi_g(x,y) = \mu(x)\, p^s(x,y)\, [g(x) - g(y)] \tag{7.4.12} \]
and
Φg (x, y) = μ(x)p(x, y)g(x) − μ(y)p(y, x)g(y). (7.4.13)
where HA,B is the space of functions defined in Theorem 7.33 and $U^0_{A,B}$ is
the space of zero-flows. The infima are attained at $f = \tfrac12 (h_{A,B} + h^*_{A,B})$ and
$\psi = \Phi_f - \Psi_{h_{A,B}}$.
(ii) The Thomson principle holds, in the sense that
\[ \mathrm{cap}(A,B) = \sup_{g\in G^0_{A,B}}\ \sup_{\phi\in U^1_{A,B}} \frac{1}{D(\phi - \Phi_g, \phi - \Phi_g)}, \tag{7.4.15} \]
Whether or not these variational principles are useful in connection with metasta-
bility remains to be seen. They are substantially more involved than their analogues
in the reversible case, where the minimiser has a transparent probabilistic interpre-
tation that makes it easy to come up with good guesses for test functions.
2. There is a formula for the mean hitting time that does not require reversibility, as
was noted by Gaveau and Moreau [123]. It suffices to recall that (7.1.23) holds in
general, to get
\[ \mathbb{E}_a[\tau_B] = \sum_{x\in S} \frac{\mu(x)\, h_{x,B}(a)}{\mathrm{cap}(x,B)}. \tag{7.5.1} \]
Note that this formula is not quite as nice as (7.1.41), but in principle it constitutes
an alternative. For more details, see the PhD thesis of Eckhoff [101, 102]. Fernán-
dez, Manzo, Nardi, Scoppola and Sohier [111] and Fernández, Manzo, Nardi and
Scoppola [112] develop a theory of metastability without reversibility, based on cer-
tain assumptions involving slow escape, fast thermalisation and fast recurrence, and
provide examples of dynamics for which these assumptions can be verified.
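Formula (7.5.1) can be tested on a chain that is not reversible. The sketch below is illustrative and makes the following assumptions: a randomly generated transition matrix, the invariant measure μ obtained as a left eigenvector, cap(x, B) taken to be μ(x)Px (τB < τx ) as in Lemma 7.13(i), and the sum restricted to x ∈ S\B, where hx,B and cap(x, B) are defined. The helper name `hitting_probability` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
P = rng.uniform(0.05, 1.0, size=(n, n))
P /= P.sum(axis=1, keepdims=True)          # generic chain, not reversible

# Invariant measure: left eigenvector of P for the eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
mu = np.real(vecs[:, np.argmax(np.real(vals))])
mu = mu / mu.sum()

B = [n - 1]
inside = [z for z in range(n) if z not in B]

def hitting_probability(P, x, B):
    """h_{x,B}(z) = P_z(tau_x < tau_B), with value 1 at x and 0 on B."""
    h = np.zeros(P.shape[0])
    h[x] = 1.0
    D = [z for z in range(P.shape[0]) if z != x and z not in B]
    lhs = np.eye(len(D)) - P[np.ix_(D, D)]
    h[D] = np.linalg.solve(lhs, P[D, x])
    return h

a = 0
total = 0.0
for x in inside:
    h = hitting_probability(P, x, B)
    cap_xB = mu[x] * (1.0 - P[x] @ h)      # mu(x) P_x(tau_B < tau_x)
    total += mu[x] * h[a] / cap_xB         # summand of (7.5.1)

# Direct computation: solve (1 - Q) t = 1 on S \ B.
Q = P[np.ix_(inside, inside)]
t = np.linalg.solve(np.eye(len(inside)) - Q, np.ones(len(inside)))
direct = float(t[inside.index(a)])
```

Each summand is the expected number of visits to x before τB when starting from a, which is why the sum reproduces the mean hitting time.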
3. The Dirichlet form E can be extended to the set {f : E (f, f ) < ∞}, which typ-
ically is larger than the domain of L . An entire theory is available that allows us
to use this fact to construct a Markov process from a Dirichlet form. For a detailed
treatment, see e.g. the monograph by Fukushima, Oshima and Takeda [116].
4. In textbooks, Green identities are given for the case L = 12 Δ only. We have not
been able to find a reference where they are stated in general in explicit form.
5. The derivation in Sect. 7.2.4 is taken from the monograph by Sznitman [226].
6. For irregular domains, existence and uniqueness issues are more delicate. For
further reading, we refer the reader to the monograph by Karatzas and Shreve [148].
7. The approach in Sect. 7.3.3 was developed by Bianchi, Bovier and Ioffe [24]
following the original paper by Berman and Konsowa [23]. In den Hollander
and Jansen [82] the Berman-Konsowa principle is extended to arbitrary reversible
Markov jump processes on Polish spaces. The latter paper contains an appendix in
which the physical interpretations of the Dirichlet, Thomson and Berman-Konsowa
variational principles are elaborated.
8. The connection between the Berman-Konsowa principle and the Thomson prin-
ciple has been worked out by Slowik [219]. Remark 7.44 comes from that paper.
Doyle [97]. Our presentation follows the exposition given by Slowik [220]. The
Dirichlet principle in the form of (7.4.14) is given by Landim [159] and Gaudillière
and Landim [121], while the Thomson principle in (7.4.15) apparently appears for
the first time in Slowik [220].
10. For a detailed discussion of Theorem 7.17, see Karatzas and Shreve [148].
11. A host of material on reversible Markov processes with countable state space is
presented in the online-book by Aldous and Fill [2].
Part III
Metastability
In this chapter we introduce the basic setup for our approach to metastability.
The guiding principle is to provide a definition of metastable sets, representing
metastable states in model systems, that is verifiable in concrete models and implies
the type of behaviour that is associated with metastability. The intuitive picture we
have in mind comes from the paradigmatic Brownian motion in a double-well (or
a multi-well) potential in one dimension. Here, the metastable states correspond to
“valleys” of the potential, labeled by the local minima of the potential. Our aim is
to give a definition that applies in far more general situations.
Section 8.1 defines metastable sets and provides the characterisation of metasta-
bility in terms of capacities. Section 8.2 shows how renewal estimates can be used
to obtain upper and lower bounds on the equilibrium potential in terms of capacity
and establishes the approximate ultrametricity of capacity. Section 8.3 uses these
results to obtain sharp bounds on mean hitting times. Section 8.4 makes the link
with spectral theory. Section 8.5, finally, mentions some problems that come up for
uncountable state spaces.
Consider a Markov process X with state space S and discrete or continuous time.
Let P denote the law of X and Px the law of X conditioned on X0 = x. We will
typically assume that X is uniquely ergodic with invariant measure μ. For D ⊂ S,
let τD denote the first hitting time of X in D, i.e.,
\[ \tau_D = \inf\{ t > 0 \colon X(t) \in D \}. \tag{8.1.1} \]
The fundamental feature we would like to associate with metastability is the
existence of two well-separated time scales and the partition of the state space into
disjoint sets Si , i ∈ I , such that, when X starts in Si , on a short time scale it reaches
some sort of local equilibrium concentrated on Si , while on a long time scale it
exits Si and moves to some Sj with j ≠ i, where it again reaches local equilibrium,
etc. We may think of the dynamics as “hopping” between quasi-invariant sets (see
Fig. 8.1). To capture this picture, an appealing way is to characterise the rapid
approach to local equilibrium by saying that, in a suitable sense, X is locally recurrent
or Harris recurrent: each Si contains a small set Bi ⊂ Si that is revisited by X very
frequently before it moves out of Si .
Fig. 8.1 Picture of a metastable set (dots) labelling the metastable valleys, and transitions between
these valleys (arrows)
On this basis, an intuitively appealing definition of metastability could be the
following:
• A family of Markov processes is called metastable if there exists a collection of
disjoint sets Bi ⊂ S, i ∈ I , such that
Here, o(1) should be thought of as a small intrinsic parameter that characterises the
“degree” of metastability, since typically we deal with a family of Markov processes
indexed by a parameter (like temperature, system size, etc.) that allows us to make
(8.1.2) as small as we like.
The definition in (8.1.2) characterises metastability in terms of a physical prop-
erty, namely, hitting times of the system. Certainly we would want such a property to
hold for a system to be called metastable. However, the problem is that (8.1.2) is not
immediately verifiable, since mean hitting times are generally difficult to compute.
Indeed, one of our goals is to compute mean hitting times, and so (8.1.2) would put
us in a circular set-up. It is thus desirable to have an equivalent definition involving
more manageable quantities.
The relations in Corollaries 7.11 and 7.30 between mean hitting times and capac-
ities suggest an alternative characterisation of metastability through capacities. We
will see that this characterisation entails many advantages. We first give a tentative
definition of metastable sets.
8.1 Characterisation of metastability 191
Remark 8.1 Note that here and in the sequel we always include the stable set in
the collection of metastable sets, contrary to what is common practice. In particu-
lar, if there is only one metastable set, then it is stable, and the system exhibits no
metastable behaviour on this level of resolution.
This definition leaves some questions open. Should we take the supremum over
x ∈ Bi in the numerator rather than the infimum? What should be the choice for
B(x)? How can we relate the probabilities appearing in the definition to capacities,
as advertised?
It will emerge that the usefulness of the definition depends crucially on further
properties of the sets Bi , i ∈ I , and on local mixing properties of the process. Be-
fore we continue this discussion, we turn to the simplest case from which we can
derive much of our intuition: Markov processes in discrete time with countable state
spaces.
An important goal will be to derive general properties of metastable systems.
Since (8.1.3) implies frequent returns to the small starting set Bi before the transition
to a set Bj, j ≠ i, we expect an exponential law for the transition times. We further
expect that the process of successive visits to the sets Bi , i ∈ I , asymptotically is a
Markov process on I .
Everything becomes easy and transparent when the state space S is finite, and we
can replace the sets Bi , i ∈ I , and B(x), x ∈ S, in (8.1.3) by single points. It will be
useful to understand this simple setting first.
The following definition of a set of metastable points applies (see Fig. 8.1).
Definition 8.2 (Metastable points) Suppose that |S| < ∞. A Markov process X
is said to be ρ-metastable with respect to a set of points M ⊂ S if
Remark 8.3 Definition 8.2 is useful because, as we will see later, it involves quan-
tities that are either known or are controllable. It becomes intuitively even more
appealing after we note that (8.1.4) can be written alternatively as
where to go from (8.1.4) to (8.1.5) we use Lemma 7.13(i). Note the appearance
of the cardinality of the state space in (8.1.4)–(8.1.5). The definition makes sense
when we have a sequence of processes where the cardinality of the state space is
either fixed or increases slowly. If |S| = ∞, but there exists a subset S0 ⊂ S with
$\operatorname{cap}(M, S_0^c) \ll \max_{x\in M} \operatorname{cap}(x, M\setminus x)$, then |S| can be replaced by |S0| in Definition 8.2.
The reader may verify this fact in the proofs below. The intuitive reason is
that, under this assumption, the process will have visited all metastable points long
before it leaves the set S0 .
\[ E_n[\tau_J] = \frac{1}{\operatorname{cap}(n,J)} \sum_{y\in S} \mu(y)\, h_{n,J}(y). \tag{8.1.6} \]
The main work is to control the sum over the equilibrium potential in (8.1.6). To
do this, we show in Sect. 8.2 how to control the equilibrium potential in terms of
capacities. In Sect. 8.3 we use these estimates to derive bounds on mean hitting
times.
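The identity (8.1.6) can be checked numerically on a small example. The sketch below is an illustration under stated assumptions and not part of the text: it builds a reversible discrete-time chain from random symmetric edge weights, computes cap(n, J) as the Dirichlet form of the equilibrium potential, and compares the right-hand side of (8.1.6) with the mean hitting time obtained by solving the linear system for z ↦ Ez[τJ] directly. The helper names `random_reversible_chain`, `equilibrium_potential` and `capacity` are hypothetical.

```python
import numpy as np

def random_reversible_chain(n, seed=0):
    """Random walk on a weighted complete graph (assumed toy model).
    It is reversible with respect to mu proportional to the vertex weight."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.1, 1.0, size=(n, n))
    c = (c + c.T) / 2.0            # symmetric edge weights
    np.fill_diagonal(c, 0.0)
    deg = c.sum(axis=1)
    return c / deg[:, None], deg / deg.sum()

def equilibrium_potential(P, A, B):
    """h_{A,B}(x) = P_x(tau_A < tau_B): 1 on A, 0 on B, harmonic elsewhere."""
    n = P.shape[0]
    free = [i for i in range(n) if i not in A and i not in B]
    h = np.zeros(n)
    h[list(A)] = 1.0
    if free:
        lhs = np.eye(len(free)) - P[np.ix_(free, free)]
        rhs = P[np.ix_(free, list(A))].sum(axis=1)
        h[free] = np.linalg.solve(lhs, rhs)
    return h

def capacity(P, mu, A, B):
    """cap(A, B) as the Dirichlet form of h_{A,B}."""
    h = equilibrium_potential(P, A, B)
    d = h[:, None] - h[None, :]
    return 0.5 * float(np.sum(mu[:, None] * P * d * d))

P, mu = random_reversible_chain(8, seed=1)
n, J = 0, [5, 6]

# Right-hand side of (8.1.6).
h = equilibrium_potential(P, [n], J)
via_capacity = float(np.sum(mu * h)) / capacity(P, mu, [n], J)

# Direct computation: w = 1 + P w off J, w = 0 on J.
free = [i for i in range(8) if i not in J]
w = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)], np.ones(len(free)))
direct = float(w[free.index(n)])

print(via_capacity, direct)
```

Both numbers should agree up to round-off, which is a convenient regression test for any implementation of capacities and equilibrium potentials.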
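\[ \max\Bigl\{1 - \frac{\operatorname{cap}(x,B)}{\operatorname{cap}(x,A)},\, 0\Bigr\} \le h_{A,B}(x) \le \min\Bigl\{\frac{\operatorname{cap}(x,A)}{\operatorname{cap}(x,B)},\, 1\Bigr\}. \tag{8.2.1} \]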
where the second equality comes from counting the returns to x without a hit of
A or B. The lower bound follows from the upper bound via the symmetry relation
hA,B (x) = 1 − hB,A (x).
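To see the renewal bounds (8.2.1) at work, the following sketch evaluates both sides for every point outside A ∪ B in a small reversible chain. The chain (a random walk on a randomly weighted complete graph) and all function names are assumptions made for illustration only.

```python
import numpy as np

def random_reversible_chain(n, seed=0):
    """Assumed toy model: random walk on a weighted complete graph."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.1, 1.0, size=(n, n))
    c = (c + c.T) / 2.0
    np.fill_diagonal(c, 0.0)
    deg = c.sum(axis=1)
    return c / deg[:, None], deg / deg.sum()

def equilibrium_potential(P, A, B):
    """h_{A,B}(x) = P_x(tau_A < tau_B)."""
    n = P.shape[0]
    free = [i for i in range(n) if i not in A and i not in B]
    h = np.zeros(n)
    h[list(A)] = 1.0
    if free:
        lhs = np.eye(len(free)) - P[np.ix_(free, free)]
        rhs = P[np.ix_(free, list(A))].sum(axis=1)
        h[free] = np.linalg.solve(lhs, rhs)
    return h

def capacity(P, mu, A, B):
    """cap(A, B) as the Dirichlet form of h_{A,B}."""
    h = equilibrium_potential(P, A, B)
    d = h[:, None] - h[None, :]
    return 0.5 * float(np.sum(mu[:, None] * P * d * d))

P, mu = random_reversible_chain(9, seed=2)
A, B = [0, 1], [7, 8]
h_AB = equilibrium_potential(P, A, B)

checks = []  # triples (lower bound, h_{A,B}(x), upper bound)
for x in range(9):
    if x in A or x in B:
        continue
    cap_xA = capacity(P, mu, [x], A)
    cap_xB = capacity(P, mu, [x], B)
    lower = max(1.0 - cap_xB / cap_xA, 0.0)
    upper = min(cap_xA / cap_xB, 1.0)
    checks.append((lower, float(h_AB[x]), upper))

for lower, value, upper in checks:
    print(f"{lower:.4f} <= {value:.4f} <= {upper:.4f}")
```

In a chain without separation of scales, as here, the bounds are typically far from sharp; they become informative when one of the two capacities dominates the other.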
8.2 Renewal estimates and ultrametricity 193
Lemma 8.5 Let D ⊂ S and x, y ∈ S\D. If cap(x, D) ≤ δ cap(x, y) for 0 < δ < 1,
then
\[ 1-\delta \le \frac{\operatorname{cap}(x,D)}{\operatorname{cap}(y,D)} \le \frac{1}{1-\delta}. \tag{8.2.3} \]
\[ 1 - P_y(\tau_D < \tau_x) \le \frac{P_y(\tau_x < \tau_D)}{P_x(\tau_y < \tau_D)} \le \frac{1}{1 - P_x(\tau_D < \tau_y)}. \tag{8.2.5} \]
Substitution into the right-hand side of (8.2.5) yields the upper bound in (8.2.3). On
the other hand, by Lemma 7.13(iii) we also have
\[ P_y(\tau_D < \tau_x) \le \frac{\operatorname{cap}(x,D)}{\operatorname{cap}(x,y)} \le \delta. \tag{8.2.7} \]
Substitution into the left-hand side of (8.2.5) yields the lower bound in (8.2.3).
Lemma 8.5 has the following corollary, which is the version of the approximate
ultrametric triangle inequality we are looking for.
Proof Suppose that the claim is false. Then there exist distinct x, y, z ∈ S with
cap(x, y) < ½ cap(x, z) and cap(x, y) < ½ cap(y, z). Lemma 8.5 with δ = ½ therefore
implies that
\[ \tfrac12 \le \frac{\operatorname{cap}(x,y)}{\operatorname{cap}(y,z)} \le 2, \qquad \tfrac12 \le \frac{\operatorname{cap}(x,y)}{\operatorname{cap}(x,z)} \le 2, \tag{8.2.9} \]
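The claim that the proof above argues for by contradiction, namely cap(x, y) ≥ ½ min{cap(x, z), cap(y, z)} for distinct x, y, z, can be tested numerically. The sketch below is a hedged illustration: the random network, the weight spread and the names `random_network` and `point_capacity` are assumptions, not taken from the text.

```python
import itertools
import numpy as np

def random_network(n, spread, seed):
    """Assumed toy model: random walk on a complete graph whose symmetric
    edge weights span several orders of magnitude."""
    rng = np.random.default_rng(seed)
    c = np.triu(np.exp(rng.uniform(-spread, spread, size=(n, n))), k=1)
    c = c + c.T
    deg = c.sum(axis=1)
    return c / deg[:, None], deg / deg.sum()

def point_capacity(P, mu, x, y):
    """cap(x, y) as the Dirichlet form of h_{x,y}."""
    n = P.shape[0]
    free = [i for i in range(n) if i != x and i != y]
    h = np.zeros(n)
    h[x] = 1.0
    lhs = np.eye(len(free)) - P[np.ix_(free, free)]
    h[free] = np.linalg.solve(lhs, P[free, x])
    d = h[:, None] - h[None, :]
    return 0.5 * float(np.sum(mu[:, None] * P * d * d))

n = 7
P, mu = random_network(n, spread=3.0, seed=3)

cap = np.zeros((n, n))
for x in range(n):
    for y in range(x + 1, n):
        cap[x, y] = cap[y, x] = point_capacity(P, mu, x, y)

# Smallest value of cap(x,y) / min(cap(x,z), cap(y,z)) over distinct triples.
worst_ratio = min(
    cap[x, y] / min(cap[x, z], cap[y, z])
    for x, y, z in itertools.permutations(range(n), 3)
)
print(worst_ratio)
```

The printed ratio should never drop below ½, in line with the approximate ultrametric triangle inequality.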
It is useful to have the notion of a valley around a point in M , which will serve
as an attractor for the dynamics (see Fig. 8.2). For m ∈ M , let
\[ A(m) = \Bigl\{ z \in S : P_z[\tau_m = \tau_M] = \sup_{n\in M} P_z[\tau_n = \tau_M] \Bigr\}. \tag{8.2.10} \]
Note that valleys may overlap, but from Lemma 8.5 it follows that their intersection
has a negligible mass under the invariant distribution. The following estimate holds.
then
\[ \mu(x) \le 2\varepsilon^{-1} \operatorname{cap}(m,n). \tag{8.2.12} \]
Proof It follows from (7.1.18)–(7.1.20) and (7.1.39) that (8.2.11) implies cap(n, x)
≥ εμ(x) and cap(m, x) ≥ εμ(x). Hence
by Corollary 8.6.
Corollary 8.8 Assume that x ∈ S\M has the property
$P_x(\tau_m = \tau_M) = P_x(\tau_n = \tau_M) = \max_{\ell\in M} P_x(\tau_\ell = \tau_M)$. Then
\[ \mu(x) \le 2\rho \min\{\mu(m), \mu(n)\}. \tag{8.2.14} \]
and similarly with m replaced by n. Hence the hypotheses of Lemma 8.7 are satis-
fied with ε = Px (τM < τx )/|M |, and so
\[ \mu(x) \le \frac{2|M| \operatorname{cap}(m,n)}{P_x(\tau_M < \tau_x)} \le 2\rho \min\{\mu(m), \mu(n)\}, \tag{8.2.17} \]
where the last inequality follows from Lemma 7.13(i) and Definition 8.2.
In view of Corollary 8.8 we may modify the definition of the valleys A(m),
m ∈ M , by reassigning their overlaps in an arbitrary fashion so that they become
disjoint.
We will make frequent use of the following corollary as well.
\[ \tfrac12 \le \frac{\operatorname{cap}(m,J)}{\operatorname{cap}(y,J)} \le 2, \tag{8.2.18} \]
or
\[ 1 \le 2|M|\, \frac{\operatorname{cap}(m,J)}{\operatorname{cap}(y,M)}. \tag{8.2.19} \]
Since y ∈ A(m), the maximum must be achieved for n = m, which gives cap(y, M)
≤ |M| cap(y, m). Combining this with cap(m, J) > ½ cap(m, y), we get the
claim.
\[ \sup_{z\notin M} E_z[\tau_M] \le |S| \sup_{y\notin M} \frac{\mu(y)}{\operatorname{cap}(y,M)}. \tag{8.3.1} \]
Proof Recall from (7.1.13) that $E_z[\tau_M] = \sum_{y\in S\setminus M} G_M(z,y)$. Using the representation
in (7.1.23) for the Green function $G_M(z,y)$, we get, for z ∉ M,
\[ E_z[\tau_M] = \sum_{y\in S\setminus M} \frac{h_{y,M}(z)}{e_{y,M}(y)} \le |S| \sup_{y\in S\setminus M} \frac{1}{e_{y,M}(y)}, \tag{8.3.2} \]
which yields (8.3.1) after we recall from (7.1.39) that $e_{y,M}(y) = \operatorname{cap}(y,M)/\mu(y)$.
In the proof we used the trivial bound hy,M (z) ≤ 1. This explains the remark
made after Definition 8.2: with additional work the term |S| can be replaced by the
cardinality of a smaller set of y’s where hy,M (z) is close to 1.
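The bound (8.3.1) is easy to test on a toy chain. In the sketch below, which is an illustration under stated assumptions (random weighted complete graph, hypothetical helper names, an arbitrary choice of M), the left-hand side is obtained from the linear system for z ↦ Ez[τM] and the right-hand side from capacities.

```python
import numpy as np

def random_reversible_chain(n, seed=0):
    """Assumed toy model: random walk on a weighted complete graph."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.1, 1.0, size=(n, n))
    c = (c + c.T) / 2.0
    np.fill_diagonal(c, 0.0)
    deg = c.sum(axis=1)
    return c / deg[:, None], deg / deg.sum()

def capacity(P, mu, A, B):
    """cap(A, B) as the Dirichlet form of h_{A,B}."""
    n = P.shape[0]
    free = [i for i in range(n) if i not in A and i not in B]
    h = np.zeros(n)
    h[list(A)] = 1.0
    if free:
        lhs = np.eye(len(free)) - P[np.ix_(free, free)]
        h[free] = np.linalg.solve(lhs, P[np.ix_(free, list(A))].sum(axis=1))
    d = h[:, None] - h[None, :]
    return 0.5 * float(np.sum(mu[:, None] * P * d * d))

size = 10
P, mu = random_reversible_chain(size, seed=4)
M = [0, 9]                       # arbitrary target set for the illustration
outside = [z for z in range(size) if z not in M]

# Left-hand side of (8.3.1): sup over z outside M of E_z[tau_M].
w = np.linalg.solve(
    np.eye(len(outside)) - P[np.ix_(outside, outside)], np.ones(len(outside))
)
lhs = float(w.max())

# Right-hand side of (8.3.1): |S| sup_y mu(y) / cap(y, M).
rhs = size * max(float(mu[y]) / capacity(P, mu, [y], M) for y in outside)

print(lhs, rhs)
```

The gap between the two numbers reflects the trivial bound on the equilibrium potential used in the proof.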
We next turn to the computation of mean hitting times from a point n ∈ M to some
subset J ⊂ M (see Fig. 8.3). Return to (8.1.6). Decompose the sum in the right-hand
side as
\[ \sum_{y\in S} \mu(y) h_{n,J}(y) = \sum_{m\in M} \mu(m)\, W_{n,J}(m), \qquad W_{n,J}(m) = \sum_{y\in A(m)} \frac{\mu(y)}{\mu(m)}\, h_{n,J}(y), \tag{8.3.3} \]
where A(m) is the set defined in (8.2.10), modified so that $\bigcup_{m\in M} A(m)$ becomes a
disjoint union (recall the remarks made below (8.2.10) and (8.2.17)). Lemmas 8.11
and 8.13 below provide technical estimates of the quantities in (8.3.3). After the
statement and the proof of these lemmas we will explain how these estimates must
be read, and in what regimes they reduce to simpler estimates.
8.3 Estimates on mean hitting times 197
The first technical lemma gives bounds on hn,J (y) and μ(y)/μ(m) in the differ-
ent sets A(m), m ∈ M . Abbreviate
\[ a = a(m) = \inf_{y\in A(m)} \bigl[\operatorname{cap}(y,M)/\mu(y)\bigr]. \tag{8.3.4} \]
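\[ h_{n,J}(y) \ge 1 - 2|M|\, \frac{\operatorname{cap}(m,J)}{\operatorname{cap}(y,M)} \tag{8.3.5} \]
or
\[ \mu(y) \le 2|M|\, \frac{1}{a} \operatorname{cap}(m,J). \tag{8.3.6} \]
(ii) If m ∈ J, then $h_{n,J}(m) = 0$, and for y ∈ A(m)\m either
\[ \mu(y) h_{n,J}(y) \le 2|M|\, \frac{1}{a} \operatorname{cap}(m,n) \tag{8.3.7} \]
or
\[ \mu(y) \le 2|M|\, \frac{1}{a} \operatorname{cap}(m,n). \tag{8.3.8} \]
(iii) If m ∉ J ∪ n, then for y ∈ A(m) either
\[ 1 - 4\, \frac{\operatorname{cap}(m,J)}{\operatorname{cap}(m,n)} \le h_{n,J}(y) \le 4\, \frac{\operatorname{cap}(m,n)}{\operatorname{cap}(m,J)} \tag{8.3.9} \]
or
\[ \mu(y) \le 2|M|\, \frac{1}{a} \max\{\operatorname{cap}(m,J), \operatorname{cap}(m,n)\}. \tag{8.3.10} \]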
Proof The values of $h_{n,J}(y)$ for y ∈ J ∪ n are trivial from the definition of the
equilibrium potential. By Lemma 8.4, for J ⊂ M, n ∈ M and y ∉ M,
\[ 1 - \frac{\operatorname{cap}(y,J)}{\operatorname{cap}(y,n)} \le h_{n,J}(y) \le \frac{\operatorname{cap}(y,n)}{\operatorname{cap}(y,J)}. \tag{8.3.11} \]
(i) To get the first assertion, use Corollary 8.9. In the first case, this yields
\[ h_{m,J}(y) \ge 1 - 2\, \frac{\operatorname{cap}(m,J)}{\operatorname{cap}(y,m)} \ge 1 - 2|M|\, \frac{\operatorname{cap}(m,J)}{\operatorname{cap}(y,M)}, \tag{8.3.12} \]
where we use (8.2.20). In the second case, we get (8.3.6) via the definition of a.
(ii) Use the upper bound in (8.3.11) to get
\[ h_{n,J}(y) \le \frac{\operatorname{cap}(y,n)}{\operatorname{cap}(y,J)}, \tag{8.3.13} \]
198 8 Key Definitions and Basic Properties
and use Corollary 8.9 with J = n. In the first case, cap(n, y) ≤ 2 cap(m, n), and
hence
\[ h_{n,J}(y) \le 2\, \frac{\operatorname{cap}(n,m)}{\operatorname{cap}(y,J)}. \tag{8.3.14} \]
From here (8.3.7) follows as in (i). In the second case, (8.3.8) is again straightfor-
ward.
(iii) Write the two renewal bounds from Lemma 8.4,
\[ 1 - \frac{\operatorname{cap}(y,J)}{\operatorname{cap}(y,n)} \le h_{n,J}(y) \le \frac{\operatorname{cap}(y,n)}{\operatorname{cap}(y,J)}, \tag{8.3.15} \]
and again use Corollary 8.9. If (8.2.18) holds both for J = J and J = n, then we
can replace y by m in the numerators and denominators of (8.3.15) at the cost of a
factor 4 to get (8.3.9). If, on the other hand, (8.2.19) holds, then we get (8.3.10) just
as in the previous cases.
Remark 8.12 Case (iii) is special in as much as it does not give sharp estimates
when cap(m, J ) ≈ cap(m, n). If this situation occurs and the corresponding terms
contribute to leading order, then we cannot get sharp estimates with the tools ex-
ploited above, and better estimates on the equilibrium potential are needed.
The second technical lemma uses the estimates in Lemma 8.11 to obtain esti-
mates of Wn,J (m) in (8.3.3).
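\[ W_{n,J}(n) \ge \frac{\mu(A(n))}{\mu(n)} \Bigl( 1 - 4|M|\, \frac{\operatorname{cap}(n,J)}{\mu(A(n))}\, \frac{1}{a}\, |A(n)| \Bigr). \tag{8.3.17} \]
(ii) If m ∈ J, then
\[ W_{n,J}(m) \le C|M|\, \frac{1}{a}\, \frac{\operatorname{cap}(m,n)}{\mu(m)}\, \Bigl| \Bigl\{ y \in A(m) : \mu(y) \ge |M|\, \frac{1}{a}\, \frac{\operatorname{cap}(m,n)}{\mu(n)} \Bigr\} \Bigr| \tag{8.3.18} \]
for some C ∈ (0, ∞) independent of ρ in Definition 8.2.
(iii) If m ∉ J ∪ n, then
\[ W_{n,J}(m) \le \frac{\mu(A(m))}{\mu(m)}. \tag{8.3.19} \]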
Moreover:
Proof The proof consists of just inserting the bounds from Lemma 8.11.
Lemma 8.13 looks complicated. Ignoring small terms, we see that the statement
boils down to the following:
(i) The starting valley always contributes
\[ W_{n,J}(n) \approx \frac{\mu(A(n))}{\mu(n)}, \tag{8.3.22} \]
which gives a contribution of μ(A(n))/cap(n, J) to the mean hitting time.
(ii) For m ∈ J,
\[ W_{n,J}(m) \lesssim \frac{\operatorname{cap}(m,n)}{\mu(m)}\, \frac{1}{a}\, |S|. \tag{8.3.23} \]
This gives a contribution to the mean hitting time of order at most |S|/a, which
by assumption is small compared to that coming from (i).
(iii) For m ∉ J ∪ n,
\[ W_{n,J}(m) \lesssim \frac{\mu(A(m))}{\mu(m)}. \tag{8.3.24} \]
(iii1) This bound is achieved when cap(m, J) ≪ cap(m, n). In this case the
contribution to the hitting time is μ(A(m))/cap(n, J), which is small compared to
the one from (i) only if μ(m) ≪ μ(n).
(iii2) If cap(m, J) ≫ cap(m, n), then the bound can be improved to
The second term is always harmless, while the first can contribute more
to the mean hitting time than the one from (i), unless μ(m)/cap(m, J) ≪
μ(n)/cap(m, n).
Remark 8.14 The above arguments use that quantities like μ(m)/μ(A(m)) are not
too small, i.e., the most massive points in a metastable set have a fairly large mass
(compared to, say, ρ in Definition 8.2). The most restrictive contribution comes
from case (iii), which is small only when ρ|S| is small. Physically speaking, the
latter avoids the situation where the time it takes for the dynamics to hit a target
point in J after crossing the respective saddle is much longer than the time it takes
to escape from the starting well.
Taking into account that Wn,J (m) appears with the prefactor μ(m)/μ(n) in
(8.3.3), we see that contributions from case (ii) are always subdominant. In particu-
lar, when J = M \n, the term m = n always gives the main contribution. The terms
from case (iii) have a chance to contribute only when μ(m) ≥ μ(n). In subcase
(iii1) they indeed contribute, and potentially dominate the sum, while in subcase
(iii2) they may or may not contribute.
The estimates obtained in Lemma 8.13 can now be inserted into the sum in (8.3.3)
and then into (8.1.6), to provide estimates on the mean hitting times En [τJ ]. We state
the outcome in the special case when only the term involving the starting minimum
contributes.
Theorem 8.15 (Mean metastable exit time) Let n ∈ M and J ⊂ M\n be such that
for all m ∉ J ∪ n, μ(m) ≪ μ(n) or cap(m, J)/μ(m) ≫ cap(m, n)/μ(n). Then
\[ E_n[\tau_J] = \frac{\mu(A(n))}{\operatorname{cap}(n,J)}\, [1 + \text{error}], \qquad 0 < \text{error} \ll 1. \tag{8.3.26} \]
Proof The proof is straightforward from (8.3.3) and Lemmas 8.11 and 8.13. See the
discussion before Remark 8.14.
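A one-dimensional double well gives a quick feel for (8.3.26). The sketch below is a toy illustration and rests on assumptions that are not in the text: a Metropolis nearest-neighbour chain on 31 grid points, the potential V(u) = (u² − 1)² − 0.2u, inverse temperature β = 10, and the valley A(n) taken as the set where $h_{n,J} > 1/2$. It compares the exact mean hitting time from the shallow minimum n to the deep minimum with μ(A(n))/cap(n, J).

```python
import numpy as np

def metropolis_chain(V, beta):
    """Nearest-neighbour Metropolis chain, reversible w.r.t. mu ~ exp(-beta V)."""
    n = len(V)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.5 * min(1.0, float(np.exp(-beta * (V[y] - V[x]))))
        P[x, x] = 1.0 - P[x].sum()
    mu = np.exp(-beta * (V - V.min()))
    return P, mu / mu.sum()

u = np.linspace(-1.5, 1.5, 31)
V = (u**2 - 1.0) ** 2 - 0.2 * u          # assumed tilted double-well potential
P, mu = metropolis_chain(V, beta=10.0)

n = int(np.argmin(V[:15]))               # shallow (left) minimum
j = 15 + int(np.argmin(V[15:]))          # deep (right) minimum, J = {j}

# Equilibrium potential h_{n,J}: 1 at n, 0 at j, harmonic elsewhere.
free = [i for i in range(len(V)) if i not in (n, j)]
h = np.zeros(len(V))
h[n] = 1.0
h[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)], P[free, n])

d = h[:, None] - h[None, :]
cap_nJ = 0.5 * float(np.sum(mu[:, None] * P * d * d))

valley_mass = float(mu[h > 0.5].sum())   # mu(A(n)) under the assumed valley
approx = valley_mass / cap_nJ

# Exact mean hitting time of j from n.
off_J = [i for i in range(len(V)) if i != j]
w = np.linalg.solve(np.eye(len(off_J)) - P[np.ix_(off_J, off_J)], np.ones(len(off_J)))
exact = float(w[off_J.index(n)])

ratio = exact / approx
print(exact, approx, ratio)
```

For this choice of parameters the ratio should be close to 1; lowering β makes the separation of scales weaker and the agreement worse.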
We call En [τMn ] the mean metastable exit time from the metastable point n. This
quantity plays an important rôle in Sect. 8.4 as well.
We will see in Parts IV–VIII that (8.3.26) is the key formula for the computation
of mean crossover times in metastable systems.
Fig. 8.4 Schematic picture of eigenvalues and Dirichlet eigenvalues when |M| = k, k ∈ N\{1}.
The eigenvalues $0 = \lambda_1 < \lambda_2 < \dots < \lambda_{k-1} < \lambda_k$ are indicated by dots, the Dirichlet eigenvalues
$\lambda_0^{M_1} < \dots < \lambda_0^{M_{k-2}} < \lambda_0^{M_{k-1}} < \lambda_0^{M_k}$ are indicated by stars. The latter correspond to a nested
sequence of subsets of M, namely, $M_\ell = \{x_1, \dots, x_\ell\}$, $\ell = 1, \dots, k$, with $M_k = M$, ordered according
to the depths of the valleys (see (8.4.61) below). The distance between the two spectra is
much smaller than the gaps within each of the two spectra
For the generator of a Markov process with countable state space, the eigenvalues
of −L are those values of λ for which
(−L − λ)ψλ (x) = 0, x ∈ S, (8.4.1)
has a solution. The smallest eigenvalue is called the principal eigenvalue. The comparison
of eigenvalues and Dirichlet eigenvalues will be an important tool in the
analysis of the spectra of metastable Markov processes.
In Sect. 8.4.1 we derive rough bounds on the smallest eigenvalue λM0 of −LM.
The first step consists in getting a rough bound on the principal eigenvalue of −LM ,
with M the set of metastable points. This uses an important tool that is due to
Donsker and Varadhan [94].
Lemma 8.16 (Lower bound on principal Dirichlet eigenvalues) Let I ⊂ S, and let
λI0 be the smallest eigenvalue of −LI . Then
\[ \lambda_0^{I} \ge \frac{1}{\sup_{z\in S} E_z[\tau_I]}. \tag{8.4.3} \]
\[ \varphi(x)\varphi(y) \le \frac12\, C\, \varphi(x)^2 + \frac12\, \frac{1}{C}\, \varphi(y)^2. \tag{8.4.4} \]
Let $w \in \mathbb{R}^S$ be such that w(x) > 0 whenever φ(x) ≠ 0. Using (8.4.4) with C =
w(y)/w(x) within the Dirichlet form, we get
\[ \mathcal{E}(\varphi,\varphi) \ge \sum_{x\in S} \mu(x)\varphi(x)^2\, \frac{(-Lw)(x)}{w(x)}. \tag{8.4.5} \]
Let w(x) = Ex [τI ], x ∈ S\I , and let φ be an eigenvector of −LI with eigenvalue λ.
Recalling that w solves the Dirichlet problem in (7.1.8), we get
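\[ \lambda \|\varphi\|_{2,\mu}^2 \ge \sum_{x\in S\setminus I} \mu(x)\varphi(x)^2\, \frac{1}{w(x)} \ge \frac{1}{\sup_{x\in S\setminus I} w(x)}\, \|\varphi\|_{2,\mu}^2. \tag{8.4.6} \]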
Since this holds for all eigenvalues of −LI , it implies the assertion in (8.4.3).
Lemma 8.16 links the time scale of the metastable dynamics to the smallest
eigenvalue of the Dirichlet operator in a way that is intuitively plausible. The es-
timate sometimes needs improvement, but at least it shows the basic twist. Note that
the bound is not very precise. We later derive a more precise relation for the cluster
of |M | small real eigenvalues alluded to above.
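The lower bound of Lemma 8.16 can be illustrated numerically. The sketch below makes several assumptions that are not in the text: discrete time with −L = 1 − P, a random walk on a randomly weighted complete graph, an arbitrary set I, and the supremum taken over S\I as in (8.4.6). The principal Dirichlet eigenvalue is computed from the symmetrised restriction of P.

```python
import numpy as np

def random_reversible_chain(n, seed=0):
    """Assumed toy model: random walk on a weighted complete graph."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(0.1, 1.0, size=(n, n))
    c = (c + c.T) / 2.0
    np.fill_diagonal(c, 0.0)
    deg = c.sum(axis=1)
    return c / deg[:, None], deg / deg.sum()

size = 12
P, mu = random_reversible_chain(size, seed=5)
I = [0, 1]                                   # arbitrary Dirichlet set
free = [x for x in range(size) if x not in I]

# Restriction of P to S \ I (Dirichlet conditions on I).
Q = P[np.ix_(free, free)]

# For a reversible chain sqrt(P(x,y) P(y,x)) = sqrt(mu(x)/mu(y)) P(x,y),
# so this symmetric matrix has the same spectrum as Q.
Q_sym = np.sqrt(Q * Q.T)
lam0 = 1.0 - float(np.linalg.eigvalsh(Q_sym).max())

# Mean hitting times w(x) = E_x[tau_I] on S \ I.
w = np.linalg.solve(np.eye(len(free)) - Q, np.ones(len(free)))
lower_bound = 1.0 / float(w.max())

print(lam0, lower_bound)
```

In this well-mixed example the two numbers are of the same order; the interest of the lemma lies in the metastable regime, where both are very small.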
Combining Lemma 8.16 with Lemma 8.10, we obtain the following.
\[ \lambda_0^{M} \ge \inf_{y\notin M} \frac{\operatorname{cap}(y,M)}{3\,|M|\,|S|\,\mu(y)}. \tag{8.4.7} \]
8.4 Spectral characterisation of metastability 203
We next obtain a representation formula for the eigenvalues that are smaller than
λM0 . We show that there are precisely |M | such eigenvalues. The idea is to use the
fact that the solution of the Dirichlet problem
Lemma 8.18 Assume that λ < λM 0 is an eigenvalue of −L and that φ is the cor-
responding eigenfunction. Then the unique solution of (8.4.8) with φx = φ(x),
x ∈ M , satisfies f (y) = φ(y) for all y ∈ S.
Proof Inserting f = φ into (8.4.8), we see that the first line in (8.4.8) is satisfied
because φ is an eigenfunction with eigenvalue λ. The second line holds by assumption.
For any λ < λM 0 , the boundary value problem in (8.4.8) has a unique solution for
any choice of boundary condition φ. Denote by hλx,M \x = hλx , x ∈ M , the solutions
for the special case
\[ \begin{aligned} (-L-\lambda)\, h_x^\lambda(y) &= 0, && y \in M^c, \\ h_x^\lambda(x) &= 1, \\ h_x^\lambda(y) &= 0, && y \in M\setminus x. \end{aligned} \tag{8.4.9} \]
Here, eλx,M \x (y) = ((−L − λ)hλx,M \x )(y) is the λ-analogue of the equilibrium mea-
sure. Thus, denoting by EM (λ) the (|M | × |M |)-matrix with elements
\[ \bigl(E_M(\lambda)\bigr)_{xy} = e^\lambda_{x,M\setminus x}(y), \qquad x, y \in M, \tag{8.4.12} \]
\[ {} = \frac{1}{\mu(y)} \sum_{z\in S} \mu(z) h_y(z)\, (-L-\lambda) h_x(z) + \frac{1}{\mu(y)} \sum_{z\in S} \mu(z) h_y(z)\, (-L-\lambda) \psi_x^\lambda(z). \tag{8.4.17} \]
The matrix with elements E (hx , hy ) is referred to as the capacity matrix. It will turn
out that the small eigenvalues of the generator are very close to the eigenvalues of
the capacity matrix.
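For the second sum, we use the symmetry of L, plus the fact that $\psi_x^\lambda$ vanishes on
M and $(-Lh_y)$ vanishes on $M^c$, to write it as
\[ \sum_{z\in S} \mu(z) h_y(z)\, (-L-\lambda)\psi_x^\lambda(z) = -\lambda\, (h_y, \psi_x^\lambda)_\mu + \sum_{z\in S} \mu(z)\, (-Lh_y)(z)\, \psi_x^\lambda(z) = -\lambda\, (h_y, \psi_x^\lambda)_\mu. \tag{8.4.19} \]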
\[ e^\lambda_{x,M\setminus x}(y) = \frac{1}{\mu(y)} \Bigl( \mathcal{E}(h_y,h_x) - \lambda\, (h_y,h_x)_\mu - \lambda\, (h_y,\psi_x^\lambda)_\mu \Bigr). \tag{8.4.20} \]
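Lemma 8.21 (ℓ2-bounds) Let λM0 denote the principal eigenvalue of the operator
−L with Dirichlet boundary conditions in M.
(i) If λ < λM0, then for all x ∈ M,
\[ \|\psi_x^\lambda\|_{2,\mu} \le \frac{\lambda}{\lambda_0^M - \lambda}\, \|h_x\|_{2,\mu}. \tag{8.4.21} \]
\[ \|\psi_x^\lambda - \psi_x^{\lambda'}\|_{2,\mu} \le \frac{|\lambda-\lambda'|}{\lambda_0^M - \lambda}\, \|h_x^{\lambda'}\|_{2,\mu}. \tag{8.4.23} \]
\[ \bigl| (\psi_x^\lambda - \psi_x^{\lambda'}, h_y)_\mu \bigr| \le \frac{|\lambda-\lambda'|}{\lambda_0^M - \lambda}\, \|h_x^{\lambda'}\|_{2,\mu}\, \|h_y\|_{2,\mu}. \tag{8.4.24} \]
\[ \|\psi_x^\lambda\|_{2,\mu} \le \frac{\lambda}{\lambda_0^M - \lambda}\, \|h_x\|_{2,\mu}, \tag{8.4.26} \]
which proves (i). The assertion in (ii) follows from (i) together with the Cauchy-Schwarz
inequality. Finally,
\[ -L\bigl(\psi_x^\lambda - \psi_x^{\lambda'}\bigr)(z) = (\lambda-\lambda')\, h_x^{\lambda'}(z) + \lambda\, \bigl(\psi_x^\lambda - \psi_x^{\lambda'}\bigr)(z). \tag{8.4.27} \]
Hence
\[ (-L-\lambda)\bigl(\psi_x^\lambda - \psi_x^{\lambda'}\bigr)(z) = (\lambda-\lambda')\, h_x^{\lambda'}(z), \tag{8.4.28} \]
and so (8.4.23) follows in the same way as (8.4.21). Assertion (iv) follows again via
the Cauchy-Schwarz inequality.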
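Lemma 8.22 (ℓ∞-bounds) With the notation of Lemma 8.21, the following bounds
hold.
(i) For all λ < λM0 and x ∈ M,
\[ \Bigl\| \frac{\psi_x^\lambda}{h_{x,M}} \Bigr\|_\infty \le \frac{\lambda |S| \sup_{y\notin M} \frac{\mu(y)}{\operatorname{cap}(y,M)}}{1 - \lambda |S| \sup_{y\notin M} \frac{\mu(y)}{\operatorname{cap}(y,M)}}. \tag{8.4.29} \]
\[ \Bigl\| \frac{\psi_x^\lambda - \psi_x^{\lambda'}}{h_{x,M}} \Bigr\|_\infty \le \frac{|\lambda-\lambda'|\, |S| \sup_{y\notin M} \frac{\mu(y)}{\operatorname{cap}(y,M)}}{1 - |\lambda-\lambda'|\, |S| \sup_{y\notin M} \frac{\mu(y)}{\operatorname{cap}(y,M)}}. \tag{8.4.30} \]
Thus, with $G_M$ the Green function with Dirichlet boundary conditions in M, $\psi_x^\lambda$
satisfies
\[ \psi_x^\lambda(y) = \lambda \sum_{a\notin M} G_M(y,a)\, h_x^\lambda(a). \tag{8.4.32} \]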
Using the representation of the Green function given in (7.1.23), we see that
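\[ G_M^{h_x}(y,a) = \frac{1}{h_{x,M\setminus x}(y)}\, \frac{h_{a,M}(y)}{e_{a,M}(a)}\, h_{x,M\setminus x}(a). \tag{8.4.35} \]
But
\[ \frac{h_{a,M}(y)\, h_{x,M\setminus x}(a)}{h_{x,M\setminus x}(y)} = \frac{P_y(\tau_a < \tau_M \wedge \tau_x = \tau_M)}{P_y(\tau_x = \tau_M)} = P_y(\tau_a < \tau_M \mid \tau_x = \tau_M). \tag{8.4.36} \]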
Hence
\[ E_y[\tau_M \mid \tau_x = \tau_M] \le \sup_{a\in S\setminus M} \frac{\mu(a)}{\operatorname{cap}(a,M)}\, |S|. \tag{8.4.37} \]
From (8.4.33)–(8.4.36) it follows that, for all y ∈ S\M,
Via the bound in (8.4.37) this implies (8.4.29). The bound in (8.4.30) is proven
analogously.
The main application of the bounds in Lemma 8.22 is the following improvement
of (8.4.22) and (8.4.24).
and
Proof To bound the numerator we can use the computations from the proof of The-
orem 8.15. Analogously to (8.3.3), we write
\[ \sum_{z\in S} \mu(z) h_x(z) h_y(z) = \sum_{m\in M} \mu(m)\, \widetilde W_{x,y}(m), \qquad \widetilde W_{x,y}(m) = \sum_{z\in A(m)} \frac{\mu(z)}{\mu(m)}\, h_x(z) h_y(z). \tag{8.4.42} \]
We need to distinguish the terms m ∈ {x, y} from the rest. First,
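\[ \widetilde W_{x,y}(x) \le \sum_{z\in A(x)} \frac{\mu(z)}{\mu(x)}\, h_y(z) = W_{y,M\setminus y}(x). \tag{8.4.43} \]
and analogously
\[ \widetilde W_{x,y}(y) \le C|M|\, a^{-1}\, \frac{\operatorname{cap}(x,y)}{\mu(y)}\, |A(y)|. \tag{8.4.45} \]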
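For m ∉ {x, y}, we use the bounds from Lemma 8.11(ii). This yields
\[ \widetilde W_{x,y}(m) = \sum_{z\in A(m)} \frac{\mu(z)}{\mu(m)}\, h_x(z) h_y(z) \le \sum_{z\in A(m)} 2|M|\, a^{-1}\, \frac{\min(\operatorname{cap}(x,m), \operatorname{cap}(y,m))}{\mu(m)} \le |A(m)|\, 4|M|\, a^{-1}\, \frac{\operatorname{cap}(x,y)}{\mu(m)}. \tag{8.4.46} \]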
This yields
\[ \sum_{z\in S} \mu(z) h_x(z) h_y(z) \le C|M|\, a^{-1}\, |S| \operatorname{cap}(x,y). \tag{8.4.47} \]
The denominator in (8.4.41) is trivially bounded from below by $\sqrt{\mu(x)\mu(y)}$. Thus,
the left-hand side of (8.4.41) is bounded from above by
where the last inequality uses Definition 8.2. The assertion of the lemma now fol-
lows with C = a/|S|.
Remark 8.25 The bounds can be improved. For instance, with a little more care we
get
\[ \|h_x\|_{2,\mu}\, \|h_y\|_{2,\mu} \ge \sqrt{\mu(A(x))\, \mu(A(y))}\, \bigl(1 - O(\rho)\bigr). \tag{8.4.49} \]
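\[ e^\lambda_{x,N\setminus x}(x) = 0. \tag{8.4.52} \]
Moreover,
\[ \frac{\operatorname{cap}(x,N\setminus x)}{\|h_{x,N\setminus x}\|_{2,\mu}^2}\, \bigl(1 - O(\delta)\bigr) \le \lambda \le \frac{\operatorname{cap}(x,N\setminus x)}{\|h_{x,N\setminus x}\|_{2,\mu}^2}, \tag{8.4.53} \]
where $\delta = \dfrac{\operatorname{cap}(x,N\setminus x)}{\|h_{x,N\setminus x}\|_{2,\mu}^2} \Big/ \lambda_0^{N}$.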
Proof The same argument leading to Lemma 8.19 shows that any eigenvalue of
−LN \x smaller than λN 0 must satisfy (8.4.52). To show (8.4.53), note that the
principal eigenvector of −LN \x must be strictly positive on (N \x)c . In particular,
it must be positive at x. Hence we can reformulate the Rayleigh-Ritz variational
principle in (8.4.50) as
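On the other hand, using (8.4.20) to write out (8.4.52), we see that this equation
implies
\[ \lambda_0^{N\setminus x} = \frac{\mathcal{E}(h_{x,N\setminus x}, h_{x,N\setminus x})}{\|h_{x,N\setminus x}\|_{2,\mu}^2 + (h_{x,N\setminus x}, \psi_x^\lambda)_\mu}. \tag{8.4.55} \]
Via the bound in (8.4.22) from Lemma 8.21, this implies that
\[ \lambda_0^{N\setminus x} \ge \frac{\mathcal{E}(h_{x,N\setminus x}, h_{x,N\setminus x})}{\|h_{x,N\setminus x}\|_{2,\mu}^2}\, \frac{1}{1 + \lambda_0^{N\setminus x} / (\lambda_0^{N} - \lambda_0^{N\setminus x})}. \tag{8.4.56} \]
Finally, using the upper bound on $\lambda_0^{N\setminus x}$ from (8.4.54) and the definition of δ, we
get
\[ \frac{1}{1 + \lambda_0^{N\setminus x} / (\lambda_0^{N} - \lambda_0^{N\setminus x})} \ge \frac{1}{1 + \frac{\delta}{1-\delta}} \ge 1 - O(\delta). \tag{8.4.57} \]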
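For ℓ = k, . . . , 1, set
\[ x_\ell = \operatorname{argmax}\Bigl\{ \frac{\operatorname{cap}(x, M_\ell\setminus x)}{\mu(x)} : x \in M_\ell \Bigr\} \tag{8.4.59} \]
and
\[ M_{\ell-1} = M_\ell \setminus x_\ell. \tag{8.4.60} \]
We call the set M non-degenerate if for any ℓ = k, . . . , 2 the set $M_\ell$ is itself a set
of metastable points in the sense of Definition 8.2.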
What the recursive construction in Definition 8.27 does is to look for the mini-
mum with the smallest stability level and remove it from the set of minima that are
left over. The sequence of sets thus obtained has the form
Proof Throughout we use the approximate ultrametricity of capacity and the non-
(j )
degeneracy assumptions. Consider a single set Mx . If all points in this set are in
M−1 , then there is nothing to prove. Otherwise, there is some with k ≥ >
(j )
for which a first point z ∈ Mx is selected to be removed as x . But then it must be
that
cap(z, M \z) cap(x , M \x ) cap(x , M−1 )
≥ ≥ . (8.4.65)
μ(z) μ(x ) μ(x )
(j )
Assume first that {z} = Mx . Then
cap(z, M \z) cap(z, x ) cap(z, x ) cap(x , M \x )
< < ≤ , (8.4.66)
μ(z) μ(z) μ(x ) μ(x )
which contradicts (8.4.65). Hence, z is not selected before the -th step, and so
(j )
z ∈ M−1 . Now let Mx contain several points. Then, for (8.4.65) to be satis-
(j )
fied, it must be a point y ∈ Mx such that cap(z, y) ∼ cap(z, M \z), and then z
is selected only if μ(y) > μ(z). Otherwise, z cannot be selected and must be in
(j ) (j )
M−1 . Continuing in this way, we must arrive at a point x∗ ∈ Mx with max-
imal invariant mass that cannot be removed at any step > , and thus must
(j )
be in M−1 . This proves (i). Since now for any point in Mx there is a point
y with μ(y) > μ(z), and cap(z, y)/μ(z) > cap(x , M−1 )/μ(x ), it follows that
(j )
cap(z, y)/μ(z) > cap(z, x∗ )/μ(z), which implies (ii).
(j )
Proof We have shown that each component Mx contains one point from M−1 .
We need the following lemma.
Proof The first inequality in (8.4.68) is trivial. For the second, note that
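\[ \begin{aligned} P_x(\tau_X < \tau_x) &= P_x(\tau_X < \tau_x \wedge \tau_Y < \tau_x) + P_x(\tau_X < \tau_x \wedge \tau_Y > \tau_x) \\ &\le P_x(\tau_Y < \tau_x) + \sum_{z\in X\setminus Y} P_x(\tau_z < \tau_{x\cup X\setminus z})\, P_z(\tau_x < \tau_Y) \\ &\le P_x(\tau_Y < \tau_x) + \sup_{z\in X\setminus Y} \frac{\operatorname{cap}(z,x)}{\operatorname{cap}(z,Y)}\, P_x(\tau_X < \tau_x). \end{aligned} \tag{8.4.69} \]
Using that $\operatorname{cap}(x,X) = \mu(x) P_x(\tau_X < \tau_x)$ and $\operatorname{cap}(x,Y) = \mu(x) P_x(\tau_Y < \tau_x)$ (see (7.1.19)), we get the upper bound in
(8.4.68).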
It follows from Lemma 8.28 and the non-degeneracy conditions that the ratios of
the capacities in the denominators are at most δ. Next, we use the same reasoning to
show that
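\[ \operatorname{cap}(x_\ell, M_{x_\ell} \cap M_{\ell-1}) \le \operatorname{cap}(x_\ell, M_{\ell-1}) \le \frac{\operatorname{cap}(x_\ell, M_{x_\ell} \cap M_{\ell-1})}{1 - \sup_{z\in M_{\ell-1}\setminus M_{x_\ell}} \frac{\operatorname{cap}(z,x_\ell)}{\operatorname{cap}(z, M_{x_\ell} \cap M_{\ell-1})}}. \tag{8.4.71} \]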
Here, again the ratios of the capacities in the denominator must all be smaller than δ,
since if for some z in the supremum this is not true, then it leads to a contradiction
with the definition of x .
The first important consequence is the following estimate on the ℓ2-norms of the
corresponding equilibrium potentials.
where the valley A(x ) is defined with respect to the original set M .
Proof Use the estimates on equilibrium potentials in Lemma 8.11 and Lemma 8.28.
and $\lambda_0^{M_{\ell-1}} \le O(\delta)\, \lambda_0^{M_\ell}$, and the sequence $M_\ell$, $\ell = k, \dots, 1$, realises the sequence
defined in (8.4.61).
We will show that each of these principal Dirichlet eigenvalues is very close to
one of the small eigenvalues of −L.
Theorem 8.34 (Sharp asymptotics of principal eigenvalues) Assume that there exists
an x ∈ M such that, for some 0 < δ ≪ 1,
\[ \lambda_x = \frac{\operatorname{cap}(x, M\setminus x)}{\|h_x\|_{2,\mu}^2}\, \bigl(1 + O(\delta)\bigr), \tag{8.4.76} \]
\[ \lambda \le C\delta\lambda_x. \tag{8.4.77} \]
Moreover, the eigenvector $\varphi^{(x)}$ corresponding to $\lambda_x$, normalised such that $\varphi^{(x)}(x) = 1$,
satisfies $\varphi^{(x)}(z) \le C\delta$, z ≠ x, for some constant C < ∞.
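Lemma 8.21, Corollary 8.23 and Lemma 8.24 already control the term involving
$\psi_x^\lambda$ and the scalar products $(h_x, h_y)_\mu$. The terms involving $\mathcal{E}(h_x, h_y)$, x ≠ y, can be
bounded using the Cauchy-Schwarz inequality,
\[ \mathcal{E}(h_x,h_y) = \frac12 \sum_{z,z'\in S} \mu(z)\, p(z,z')\, \bigl[h_x(z') - h_x(z)\bigr] \bigl[h_y(z') - h_y(z)\bigr] \le \sqrt{\mathcal{E}(h_x,h_x)\, \mathcal{E}(h_y,h_y)}, \tag{8.4.79} \]
and hence
\[ \frac{\mathcal{E}(h_x,h_y)}{\|h_x\|_{2,\mu}\, \|h_y\|_{2,\mu}} \le \sqrt{\frac{\mathcal{E}(h_x,h_x)}{\|h_x\|_{2,\mu}^2}}\, \sqrt{\frac{\mathcal{E}(h_y,h_y)}{\|h_y\|_{2,\mu}^2}}. \tag{8.4.80} \]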
\[ A_x = \frac{\mathcal{E}(h_x,h_x)}{\|h_x\|_{2,\mu}^2}. \tag{8.4.82} \]
Lemma 8.35
(i) Let x be the point specified in the assumptions in Theorem 8.34. Then
K1xx = Ax − λ 1 + O(λ) .
λ
(8.4.83)
Trivially, we may choose the vector c in such a way that maxz∈M |cz | = 1, and the
component realising the maximum is equal to 1. Assume that, with this normalisation,
cz = 1 for z ≠ x. Then the z-line of (8.4.86) reads
−K3 zz = K3zy cy , (8.4.87)
y=z
which implies that λ must be much smaller than Ax. Thus, such a c would not
correspond to an eigenvalue that is larger than $\lambda_0^{M\setminus x}$. Hence we may assume that
cx = 1 ≥ |cy| for all y ≠ x. Now, (8.4.86) with z = x,
K3
xx = K3
xy cy , (8.4.89)
y=x
i.e.,
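\[ |A_x - \lambda| \le C|M|\, \bigl( \sqrt{\delta}\, A_x + \lambda^2/\lambda_0^{M} \bigr), \tag{8.4.91} \]
which in turn implies that
\[ \lambda = A_x \bigl( 1 + O(\sqrt{\delta} + \rho) \bigr). \tag{8.4.92} \]
Solving for cz, using (ii) and (iii), and employing λ ∼ Ax, we see that
\[ |c_z| \le C|M|\, (\sqrt{\delta} + \rho). \tag{8.4.94} \]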
|Ax − λ| ≤ C 2 |M |2 Ax , (8.4.95)
which is the first claim in Theorem 8.34. The assertion on the eigenvector follows
from our estimates on the vector c.
It remains to show that a solution of (8.4.86) as specified above exists. This can
be shown with the help of a fixed-point argument. Rearranging terms, we can cast
(8.4.86) into the form
\[ \begin{aligned} \lambda &= \Lambda(\lambda, c_1, \dots, c_{k-1}), \\ c_\ell &= C_\ell(\lambda, c_1, \dots, c_{k-1}), \qquad \ell = 1, \dots, k-1. \end{aligned} \tag{8.4.96} \]
(ψ λ , hx ) λ
k−1
E (hx , hx )
Λ(λ, c) = −λ x 2 + Hxxj cj ,
hx 2,μ
2 hx 2,μ j =1
(8.4.97)
E (hx , hx ) (ψxλ , hx )
C (λ, c) = λ−1 c − λ c + Hx zj cj
λ
hx 22,μ hx 22,μ j =
Lemma 8.36 For any c with $\|c\|_\infty \le 1$, the map Λ(·, c) : (Ax/2, 3Ax/2) → R is a
contraction. More precisely,
\[ |\Lambda(\lambda,c) - \Lambda(\lambda',c)| \le |\lambda - \lambda'|\, C\rho, \tag{8.4.98} \]
Proof The estimate in (8.4.98) is straightforward from Lemma 8.21, Corollary 8.23
and Lemma 8.24, together with the assumption that all $M_\ell$ are ρ-metastable sets.
λ = Λ(λ, c) (8.4.99)
Proof Set λ(0) (c) = Ax , and λ(n) (c) = Λ(λ(n−1) (c), c). Then, as n → ∞, λ(n) (c)
converges to the unique fixed point of the map Λ(·, c) on (Ax /2, 3Ax /2), which is
the solution of (8.4.99).
Lemma 8.38 The solution of (8.4.99) from Corollary 8.37 is Lipschitz continuous
with respect to the ℓ1-norm in c with Lipschitz constant $C\sqrt{\delta}\, A_x$.
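Proof We show that, for fixed λ ∈ (Ax/2, 3Ax/2), Λ(λ, c) is Lipschitz in c.
Namely,
\[ |\Lambda(\lambda,c) - \Lambda(\lambda,c')| \le \sum_{\ell=1}^{k-1} |c_\ell - c'_\ell|\, |H^\lambda_{x x_\ell}| = \sum_{\ell=1}^{k-1} |c_\ell - c'_\ell|\, \bigl( \sqrt{\delta}\, A_x + 3A_x C\rho \bigr), \tag{8.4.100} \]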
which is dominated by the $\sqrt{\delta}\, A_x$-term. This gives the Lipschitz bound in ℓ1 and
in ℓ∞. Combining this with the bound (8.4.98), we get
√
λ(c) − λ c ≤ C δAx , (8.4.101)
1 − Cρ
Lemma 8.39 For λ ∈ (Ax /2, 3Ax /2), the map C(λ, ·) : [−1, 1]k−1 → [−1, 1]k−1
is a contraction. More precisely,
C(λ, c) − C λ, c ≤ Cδ c − c . (8.4.102)
1
Corollary 8.40 For any λ ∈ (Ax /2, 3Ax /2), the equation
c = C(λ, c) (8.4.103)
Lemma 8.41 Let c(λ) denote the solution of (8.4.103) from Corollary 8.40. Then
c(λ) is Lipschitz in λ. More precisely,
√
δ
c(λ) − c λ ≤ C λ − λ . (8.4.104)
Ax
Proof The proof goes like the proof of Lemma 8.38. The fairly large bound on the
Lipschitz constant comes from the term involving Hxλ x that gives rise to a term
−1 E (hxk−1 , hx )
λ − λ −1 , (8.4.105)
hxk−1 2,μ hx 2,μ
Corollary 8.42 The map T : (Ax /2, 3Ax /2) → (Ax /2, 3Ax /2) defined by T (λ) =
λ(c(λ)), where λ(c) is the unique solution of λ = Λ(λ, c) and c(λ) is the unique
solution of c = C(λ, c), is a contraction. More precisely,
\[ |T(\lambda) - T(\lambda')| \le C\sqrt{\delta}\, |\lambda - \lambda'|, \tag{8.4.106} \]
Corollary 8.42 implies that there exists a unique λ ∈ (Ax /2, 3Ax /2) such that
λ = λ(c(λ)). Hence (λ, c(λ)) is the unique solution of λ = Λ(λ, c) and c = C(λ, c)
with λ near Ax . This proves the existence of the solution to (8.4.86) and concludes
the proof of Theorem 8.34.
At this point we have shown that $\lambda_0^{M} > \lambda_x > \lambda_0^{M\setminus x}$, where the last two eigenvalues
are almost the same and are smaller by a factor at least δ than the first. This
procedure can now be repeated with M replaced by M \x, provided M \x satisfies
the hypothesis of a set of metastable points.
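Theorem 8.43 (Asymptotics of the spectrum and mean metastable exit times) Let
|M| = k ≥ 2, and let $M_\ell$, $\ell = k, \dots, 1$, be the sequence of sets defined in Theorem 8.33.
Assume further that, for each $\ell = 1, \dots, k$, $M_\ell$ is a set of metastable
points in the sense of Definition 8.2 (with the same parameter ρ). Then −L has k
eigenvalues $\lambda_1 < \lambda_2 < \dots < \lambda_k < \lambda_0^{M}$, where
\[ \lambda_1 = 0, \tag{8.4.108} \]
and
\[ \lambda_\ell = \frac{\operatorname{cap}(x_\ell, M_{\ell-1})}{\mu(A(x_\ell))}\, \bigl(1 + O(\delta)\bigr), \qquad \ell = 2, \dots, k. \tag{8.4.109} \]
Consequently,
\[ \lambda_\ell = \frac{1}{E_{x_\ell}[\tau_{M_{\ell-1}}]}\, \bigl(1 + O(\delta)\bigr) = \frac{1}{E_{x_\ell}[\tau_{M_{x_\ell}}]}\, \bigl(1 + O(\delta)\bigr). \tag{8.4.110} \]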
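Proof Applying Theorem 8.34 to the sets $M_\ell$, we successively show that −L has
k − 1 eigenvalues below $\lambda_0^{M}$ that satisfy
\[ \lambda_{x_\ell} = \frac{\operatorname{cap}(x_\ell, M_{\ell-1})}{\|h_{x_\ell,M_{\ell-1}}\|_{2,\mu}^2}\, \bigl(1 + O(\delta)\bigr), \qquad \ell = 2, \dots, k. \tag{8.4.112} \]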
Using the same arguments as in the proof of Lemma 8.24, we show that
\[ \|h_{x_\ell,M_{\ell-1}}\|_{2,\mu}^2 = \|h_{x_\ell,M_{\ell-1}}\|_{1,\mu}\, \bigl(1 + O(\rho)\bigr) = \|1_{A(x_\ell)}\|_{1,\mu}\, \bigl(1 + O(\rho)\bigr). \tag{8.4.113} \]
It remains to identify the right-hand side with the inverse mean hitting time of
the set $M_{x_\ell} = \{z \in M : \mu(z) > \mu(x_\ell)\}$.
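\[ E_{x_\ell}[\tau_{M_{x_\ell}}] = \frac{\mu(A(x_\ell))}{\operatorname{cap}(x_\ell, M_{x_\ell})}\, \bigl(1 + o(1)\bigr). \tag{8.4.115} \]
\[ E_{x_\ell}[\tau_{M_{\ell-1}}] = \frac{1}{\operatorname{cap}(x_\ell, M_{\ell-1})} \sum_{z\in \widetilde A(x_\ell)} \mu(z)\, h_{x_\ell,M_{\ell-1}}(z)\, \bigl(1 + o(1)\bigr), \tag{8.4.116} \]
where $\widetilde A(x_\ell)$ now refers to the set $M_\ell$. However, by the construction of the sequence
of sets $M_\ell$, $\ell = k, \dots, 1$, we have $\mu(\widetilde A(x_\ell)) = \mu(A(x_\ell))$, $\ell = k, \dots, 1$. By Corollary 8.29,
the capacities in the two formulas are also equal up to a factor 1 + O(δ),
which proves the lemma.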
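The link between small eigenvalues and mean metastable exit times in (8.4.110) can be illustrated on a double well with k = 2. The sketch below is a toy example under assumptions that are not in the text: a discrete-time Metropolis chain on 31 grid points with potential V(u) = (u² − 1)² − 0.2u at β = 10, so that −L = 1 − P. It compares the second eigenvalue of −L with the inverse mean hitting time of the deep minimum starting from the shallow one.

```python
import numpy as np

def metropolis_chain(V, beta):
    """Nearest-neighbour Metropolis chain, reversible w.r.t. mu ~ exp(-beta V)."""
    n = len(V)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.5 * min(1.0, float(np.exp(-beta * (V[y] - V[x]))))
        P[x, x] = 1.0 - P[x].sum()
    mu = np.exp(-beta * (V - V.min()))
    return P, mu / mu.sum()

u = np.linspace(-1.5, 1.5, 31)
V = (u**2 - 1.0) ** 2 - 0.2 * u          # assumed tilted double-well potential
P, mu = metropolis_chain(V, beta=10.0)

shallow = int(np.argmin(V[:15]))         # plays the role of x_2
deep = 15 + int(np.argmin(V[15:]))       # M_1 = {deep}

# Spectrum of -L = 1 - P. Reversibility makes sqrt(P * P^T) a symmetric
# matrix similar to P, so a symmetric eigenvalue solver can be used.
eigs = np.sort(1.0 - np.linalg.eigvalsh(np.sqrt(P * P.T)))
lam2 = float(eigs[1])

# Mean hitting time of the deep minimum from the shallow one.
free = [i for i in range(len(V)) if i != deep]
w = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)], np.ones(len(free)))
mean_exit = float(w[free.index(shallow)])

product = lam2 * mean_exit
print(lam2, 1.0 / mean_exit, product)
```

The product should be close to 1; the residual deviation is governed by the ratio of the invariant masses of the two valleys and shrinks as β grows.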
There are different ways to prove that the distribution of metastable exit times is
asymptotically exponential. The most robust argument is based on a renewal ar-
gument: Since the probability to reach the set Mx starting from x ∈ M without
returning to x is very small, the process returns many times to x before a success-
ful excursion happens. The number of such excursions is geometrically distributed,
and the time of an unsuccessful excursion is μ(A(x))/μ(x), by the ergodic theorem.
Since this time is small compared to the number of excursions, the rescaled time un-
til a successful excursion converges to an exponential distribution. Finally, the time
of the last excursion is negligible compared to this time.
Theorem 8.45 (Exponential law of metastable exit times) Under the non-degeneracy
hypothesis of Theorem 8.34 with δ satisfying (8.4.63), for all t > 0,
\[ \lim_{\rho\downarrow 0} P_x\bigl( \tau_{M_x} > t\, E_x[\tau_{M_x}] \bigr) = e^{-t}. \tag{8.4.117} \]
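The exponential law can be observed without simulation by computing the tail of the exit time from powers of the killed transition matrix. The sketch below is an illustration under assumptions not taken from the text: the same toy Metropolis double well as a stand-in for a metastable system (31 grid points, V(u) = (u² − 1)² − 0.2u, β = 10), with x the shallow minimum and $M_x$ the deep one.

```python
import numpy as np

def metropolis_chain(V, beta):
    """Nearest-neighbour Metropolis chain, reversible w.r.t. mu ~ exp(-beta V)."""
    n = len(V)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.5 * min(1.0, float(np.exp(-beta * (V[y] - V[x]))))
        P[x, x] = 1.0 - P[x].sum()
    return P

u = np.linspace(-1.5, 1.5, 31)
V = (u**2 - 1.0) ** 2 - 0.2 * u          # assumed tilted double-well potential
P = metropolis_chain(V, beta=10.0)

shallow = int(np.argmin(V[:15]))
deep = 15 + int(np.argmin(V[15:]))

# Transition matrix killed at the deep minimum.
free = [i for i in range(len(V)) if i != deep]
Q = P[np.ix_(free, free)]
start = free.index(shallow)

# Mean exit time E_x[tau].
mean_exit = float(np.linalg.solve(np.eye(len(free)) - Q, np.ones(len(free)))[start])

# Exact tail P_x(tau > t E_x[tau]) = (Q^s 1)(x) with s = round(t E_x[tau]).
tails = {}
for t in (0.5, 1.0, 2.0):
    steps = int(round(t * mean_exit))
    tails[t] = float(np.linalg.matrix_power(Q, steps).sum(axis=1)[start])

for t, p in tails.items():
    print(t, p, np.exp(-t))
```

Each computed tail should lie close to $e^{-t}$; the match degrades when β is lowered and the time scales are no longer well separated.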
Note that Rx(λ) < ∞ for all λ < 1, due to the fact that the principal eigenvalue of
the Dirichlet generator with Dirichlet conditions in Mx is essentially 1/Ex[τMx].
Moreover, Rx (λ) satisfies the following renewal equation.
Lemma 8.46 (Renewal equation for Laplace transforms) For all λ < 1,
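Proof Noting that $1 = 1_{\tau_{M_x} < \tau_x} + 1_{\tau_x < \tau_{M_x}}$ and using the strong Markov property,
we see that
\[ R_x(\lambda) = E_x\bigl[ e^{\lambda\hat\tau_{M_x}}\, 1_{\tau_{M_x} < \tau_x} \bigr] + E_x\bigl[ e^{\lambda\hat\tau_x}\, 1_{\tau_x < \tau_{M_x}} \bigr]\, E_x\bigl[ e^{\lambda\hat\tau_{M_x}} \bigr]. \tag{8.4.120} \]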
Lemma 8.47 With the notation in Lemma 8.46, for all λ < 1,
Proof The first step in the proof is the following crucial pointwise bound.
Proof Instead of proving (8.4.122) directly, we first show the (more natural) estimate
\[ E_x\bigl[\tau_{M_x} 1_{\tau_{M_x} < \tau_x}\bigr] \le C\rho. \tag{8.4.123} \]
Note that
\[ E_x\bigl[\tau_{M_x} 1_{\tau_{M_x} < \tau_x}\bigr] = P_x(\tau_{M_x} < \tau_x) + \frac{1}{\mu(x)} \sum_{y \notin x \cup M_x} \mu(y)\, h_{x,M_x}(y)\, h_{M_x,x}(y). \tag{8.4.132} \]
The first term in the right-hand side is exponentially small. The second term is
controlled in the same way as in Lemma 8.24 and is bounded by $C\rho$. Thus,
(8.4.123) holds. Finally, $E_x[\tau_{x \cup M_x}] \ge 1$ and so, via (8.4.124), it follows that
$E_x[\tau_x 1_{\tau_x < \tau_{M_x}}] \ge 1 - C\rho$, and we deduce (8.4.122).
Hence we have
\[ T = \frac{E_x[\tau_{x \cup M_x}]}{P_x(\tau_{M_x} < \tau_x)}. \tag{8.4.134} \]
Because 1 ≥ e−x ≥ 1 − x, x ≥ 0, it follows that the numerator in (8.4.119) satisfies,
for λ ≤ 0,
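\[ \frac{E_x\bigl[(1 - e^{\lambda \hat\tau_x}) 1_{\tau_x < \hat\tau_{M_x}}\bigr]}{\lambda P_x(\tau_{M_x} < \tau_x)} = -\frac{E_x\bigl[\hat\tau_x 1_{\tau_x < \hat\tau_{M_x}}\bigr]}{P_x(\tau_{M_x} < \tau_x)} + \frac{E_x\bigl[(1 + \lambda \hat\tau_x - e^{\lambda \hat\tau_x}) 1_{\tau_x < \hat\tau_{M_x}}\bigr]}{\lambda P_x(\tau_{M_x} < \tau_x)}. \tag{8.4.138} \]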
The first term is fine because
which tends to one rapidly. To deal with the second term, we use that, for u ≤ 0,
for $u \sim 1/E_x[\tau_{M_x}]$. This goes in a similar fashion as the derivation of the
$\ell^\infty$-bounds in Lemma 8.22.
Set $w(y) = E_y[\tau_{x \cup M_x}]$. We easily verify that $v_u$ solves the Dirichlet problem
where we set $\bar u = 1 - e^{-u}$. Note that $\bar u - u \le 0$ for $u \le 0$, and is $O(u^2)$ for small $u$.
The Dirichlet problem in (8.4.143) has a unique solution for $\bar u < \lambda_0^{x \cup M_x}$. Note also
that
\[ E_x\bigl[e^{u \tau_{x \cup M_x}}\bigr] = e^{u}\bigl((L w_u)(x) + 1\bigr), \tag{8.4.144} \]
and
\[ E_x[\tau_{x \cup M_x}] = (L w)(x) + 1. \tag{8.4.145} \]
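Identity (8.4.145) can be checked on a small example. The sketch below assumes a discrete-time chain with generator L = P − 1 and takes τ to be the first hitting time after time 0; the four-state walk, the set D = {0, 3} playing the role of x ∪ Mx, and all function names are hypothetical choices made for illustration. The function w solves (Lw)(y) = −1 off D with w = 0 on D.

```python
from fractions import Fraction as Fr

def solve(a, b):
    """Solve the linear system a x = b exactly by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[piv] = m[piv], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                m[r] = [vr - m[r][c] * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]

def mean_hitting_times(P, D):
    """w(y) = E_y[tau_D] for y outside D and w = 0 on D."""
    n = len(P)
    free = [y for y in range(n) if y not in D]
    a = [[Fr(int(y == z)) - P[y][z] for z in free] for y in free]  # (1 - P) on D^c
    sol = solve(a, [Fr(1)] * len(free))
    w = [Fr(0)] * n
    for y, v in zip(free, sol):
        w[y] = v
    return w

def mean_return_time(P, D, x):
    """E_x[tau_D] for x in D, computed as (Lw)(x) + 1 with L = P - 1."""
    w = mean_hitting_times(P, D)
    Lw_x = sum(P[x][y] * w[y] for y in range(len(P))) - w[x]
    return Lw_x + 1

half = Fr(1, 2)
# Symmetric nearest-neighbour walk on {0, 1, 2, 3}, reflected at both ends.
P = [[Fr(0), Fr(1), Fr(0), Fr(0)],
     [half, Fr(0), half, Fr(0)],
     [Fr(0), half, Fr(0), half],
     [Fr(0), Fr(0), Fr(1), Fr(0)]]
D = {0, 3}
```

Starting from 0 the walk must step to 1, from where the mean time to hit D is 2, so the return time to D is 3.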
Proceeding as in the proof of Lemma 8.22, we obtain the following estimates.
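\[ \sup_{y \notin x \cup M_x} E_y[\tau_{x \cup M_x}] \le |S| \sup_{y \notin x \cup M_x} \frac{\mu(y)}{\mathrm{cap}(y, M_x \cup x)} \le |S| / \lambda_0^{x \cup M_x} \le \rho / \lambda_0^{M_x}, \tag{8.4.149} \]
where the last inequality uses the assumption that $\lambda_0^{x \cup M_x} \ge \delta \lambda_0^{M_x}$ with $\delta |S| \le \rho$.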
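Finally, we recall that $\lambda_0^{M_x} \sim 1/E_x[\tau_{M_x}]$. Hence, for $u = \lambda / E_x[\tau_{M_x}]$, $\lambda \le 0$,
\[ \Bigl| E_x\bigl[(1 + \lambda \hat\tau_x - e^{\lambda \hat\tau_x}) 1_{\tau_x < \hat\tau_{M_x}}\bigr] \Bigr| \le C \lambda^2 \rho\, \frac{E_x[\tau_{x \cup M_x}]}{E_x[\tau_{M_x}]}. \tag{8.4.150} \]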
Inserting this bound into (8.4.138), we see that the second term in the right-hand
side is bounded in absolute value by λCρ, which tends to zero as desired. But this
implies the assertion of Lemma 8.47.
In Sects. 8.2–8.4 we have seen that the definition of metastability in Definition 8.2
through capacities leads to precise predictions on the connection between capacities,
metastable exit times and small eigenvalues of the generator. These results relied on
renewal arguments that are only available in the context of countable state spaces.
For uncountable state spaces, however, renewal arguments are not possible because
capacities of single points are either zero or are much smaller than capacities of
small neighbourhoods around them. This means that the process may take a very
long time to hit a point, even when it has already reached its neighbourhood. A
proper definition of metastable sets must look like (8.1.3), but with a suitable and
model-dependent choice of the sets Bi , i ∈ I .
In Chap. 11 we explain in detail, within the context of diffusion processes, how
the theory developed above carries over, provided the sets Bi , i ∈ I , can be chosen in
such a way that solutions of the Dirichlet problems we have encountered are almost
constant on these sets. At present this is the only example where the full picture
established in the present chapter has been carried over. For this reason, we do not
formulate an abstract model-independent result, but rather provide the details in the
specific context of Chap. 11.
An interesting setting where we would like to establish similar results is that of
interacting particle systems with state spaces like $S = \{-1, +1\}^{\Lambda}$ with $\Lambda \subset \mathbb{Z}^d$. In this
situation, only partial results in specific models are presently available. These are the
random-field Curie-Weiss model (Chap. 15), when Λ = {1, . . . , N} with N → ∞,
as well as the Ising model with Glauber dynamics (Chap. 19) and the lattice-gas
model with Kawasaki dynamics (Chap. 20), both at low temperature, when the size
of Λ diverges as the temperature tends to zero.
3. In much the same way as in Theorem 8.15, conditional mean hitting times such as
Ex [τJ |τJ ≤ τI ] can be computed (for an example see (8.4.37)). Formulas are given
in Bovier, Eckhoff, Gayrard and Klein [33, 34].
7. A probabilistic proof of Lemma 8.5 (with slightly different bounds) was given
in Bovier, Eckhoff, Gayrard and Klein [34]. An analytic proof can be found in
Slowik [219].
9. The proof of the asymptotic exponential law for metastable exit times given in
Sect. 8.4.4 can be extended to settings where only approximate renewal properties
hold. See, for instance, Bianchi, Bovier and Ioffe [25]. An alternative proof in the
discrete setting can be found in Bovier, Eckhoff, Gayrard and Klein [34]. An elegant
earlier proof was given by Martinelli and Scoppola [177] and Martinelli, Olivieri
and Scoppola [174, 175].
Chapter 9
Basic Techniques
This chapter collects techniques that are basic for the study of metastability and that
will be used throughout Parts IV–VIII. Section 9.1 focusses on capacity estimates,
and derives upper and lower bounds on capacities with the help of variational princi-
ples, namely, the Dirichlet principle and the Berman-Konsowa principle introduced
in Sect. 7.3. We outline the strategies that are used to exploit these variational prin-
ciples in an efficient way to derive matching upper and lower bounds. Section 9.2
introduces the notion of coarse-graining, which is particularly useful for mean-field
systems. Section 9.3 states conditions under which the Markov property is preserved
when states are lumped. Section 9.4 deals with regularity estimates for harmonic
functions with the help of elliptic regularity theory and coupling.
for 0 < ρ ≪ 1 suitably chosen, with μ the invariant measure on S. The idea is
that the set I will be irrelevant for the value of the capacity, no matter what values
hBx,By takes on I, and that the sets Dx and Dy give no contribution to the capacity
to leading order. The only problem therefore is to find the equilibrium potential,
or a reasonably good approximation of it, on the set S′ = S\(Dx ∪ Dy ∪ I) (see
Fig. 9.1). We return to this problem shortly.
9.1 Capacity estimates 229
Fig. 9.1 Schematic picture of the harmonic function hBx,By: trivial on Dx and Dy, irrelevant on
I, and nontrivial on S′. In many applications, S′ is small and well structured
Remark 9.1 Of course, the above approach can only make sense when the sets Dx
and Dy are connected through S′. If that is not the case, then we will have to analyse
the set S\(Dx ∪ Dy ) more carefully. If Dx ∪ Dy contains further metastable sets
Bw , then it will be possible to identify domains Dw ⊃ Bw on which hBx ,By takes
a constant value cw (to be determined later). Note that this can be done again with
the help of the renewal bounds we encountered in Chap. 8. The starting point is the
observation that
where we anticipate that Pw (τBx < τBy ) ∼ Pz (τBx < τBy ) = cw for all z ∈ Bw
(and later for all z ∈ Dw ). Thus, the problem (to be solved with the help of
the a priori bounds on the equilibrium potential in Lemma 8.4 and the a priori
bounds on capacities in Lemma 8.5) is to determine the set of points z for which
Pz (τw < τBx ∪By ) > 1 − δ. Once this is done, we proceed as before, after increasing
the set Dx ∪ Dy in the definition of I to D = Dx ∪ Dy ∪ Dw1 ∪ · · · ∪ Dwk when
k such sets can be identified. It should then be the case that the set D ∪ S′ is connected.
The remaining problem consists in determining the equilibrium potential on
the set S′ and the values $c_{w_1}, \dots, c_{w_k}$. At this stage we can obtain upper and lower
bounds in terms of variational formulas that involve only the set S′. To what extent
these variational formulas can be solved depends on the situation at hand.
h+ (z) = 1, z ∈ Dx ,
h+ (z) = 0, z ∈ Dy , (9.1.4)
h+ (z) = cwi , z ∈ Dwi , i = 1, . . . , k,
where the constants cwi are determined later. On I the function h+ can be chosen
essentially arbitrarily, while on S′ it must be chosen such that it optimises the
restriction of the Dirichlet form to S′ with boundary conditions implied by (9.1.4).
Finally, the constants cwi , i = 1, . . . , k, are determined by minimising the outcome
as a function of these constants.
A first strategy to obtain matching lower bounds that works well in many situations
goes as follows. If $h^*$ denotes the true minimiser in the Dirichlet form on $S$,
then
\[ \mathcal{E}(h^*, h^*) \ge \mathcal{E}_{S'}(h^*, h^*), \tag{9.1.5} \]
where $\mathcal{E}_{S'}$ is the restriction of the Dirichlet form to $S'$, i.e.,
\[ \mathcal{E}_{S'}(h, h) = \tfrac12 \sum_{x \in S' \text{ and/or } y \in S'} \mu(x) p(x, y) \bigl[h(x) - h(y)\bigr]^2, \tag{9.1.6} \]
where for the sake of exposition we focus on the case of countable $S$. We minorise
$\mathcal{E}_{S'}(h^*, h^*)$ by taking the infimum over all $h$ on $S'$, with boundary conditions
imposed by what we know a priori about the equilibrium potential. In particular, we
know that these boundary conditions are close to constants on the different com-
ponents of D. Of course, we do not know the constants cwi , i = 1, . . . , k, but by
minimising the result over their possible values at the end we get a lower bound.
Thus, if we can show that the minimisers in the lower bound differ little from the
minimisers with constant boundary conditions, then we get upper and lower bounds
that coincide up to small error terms. In general it may be difficult to compute these
minimisers. However, in metastable systems typically the problem reduces in com-
plexity compared to the original problem, and in many instances can be solved ex-
plicitly.
The method to get lower bounds described above leaves open the problem of how
to obtain a lower bound for the reduced Dirichlet form $\mathcal{E}_{S'}$. Sometimes this can be
done by setting sufficiently many transition probabilities p(x, y) equal to 0 until the
remaining Dirichlet problem can be solved exactly by hand (for instance, because it
corresponds to the one-dimensional chains discussed in Sects. 7.1 and 7.2).
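The effect of deleting transitions can be seen in a small example. The sketch below is an illustration under our own assumptions: a hypothetical four-vertex network with conductances c(u, v) = μ(u)p(u, v), capacities computed by Gauss-Seidel relaxation (our choice of solver, not a method from the text), and the series formula for a one-dimensional chain. Removing all edges except those of the path 0 → 1 → 3 can only lower the capacity, and the pruned problem is solved by hand.

```python
def dirichlet_capacity(cond, n, a, b, sweeps=200):
    """cap(a, b): minimum of sum over edges of c(u,v) (h(u)-h(v))^2, h(a)=1, h(b)=0.

    cond maps an undirected edge (u, v) to its conductance c = mu(u) p(u, v).
    The minimiser (equilibrium potential) is found by Gauss-Seidel relaxation.
    """
    h = [0.0] * n
    h[a] = 1.0
    for _ in range(sweeps):
        for x in range(n):
            if x in (a, b):
                continue
            nbrs = [(v if u == x else u, c) for (u, v), c in cond.items() if x in (u, v)]
            total = sum(c for _, c in nbrs)
            if total > 0:                       # isolated vertices carry no energy
                h[x] = sum(c * h[y] for y, c in nbrs) / total
    return sum(c * (h[u] - h[v]) ** 2 for (u, v), c in cond.items())

def path_capacity(cond, path):
    """Series formula for a one-dimensional chain: inverse of the summed resistances."""
    resistance = 0.0
    for u, v in zip(path, path[1:]):
        resistance += 1.0 / cond.get((u, v), cond.get((v, u)))
    return 1.0 / resistance

full = {(0, 1): 1.0, (1, 3): 1.0, (0, 2): 2.0, (2, 3): 1.0, (1, 2): 1.0}
path = [0, 1, 3]
pruned = {e: c for e, c in full.items() if e in {(0, 1), (1, 3)}}

cap_full = dirichlet_capacity(full, 4, 0, 3)      # exact value 13/11
cap_pruned = dirichlet_capacity(pruned, 4, 0, 3)  # equals the series formula
```

In this example the single path captures less than half of the true capacity, which shows why the choice of the retained transitions matters for the quality of the lower bound.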
A more versatile and systematic tool is provided by the variational principles
of Thomson and Berman-Konsowa presented in Sect. 7.3. The latter appears to be
particularly suitable, and we describe now the strategies used in exploiting it.
The Berman-Konsowa principle yields a lower bound whenever we insert some
flow. However, guessing a flow is much harder than guessing a potential, due to the
local constraints imposed by Kirchhoff’s law. In principle we would like to guess a
good approximation of the harmonic flow, but this is not easy in practice. A natural
idea is to use the approximate harmonic function that was guessed in the derivation
of the upper bound to produce an approximation of the harmonic flow. But since an
approximate harmonic function is not harmonic, it is not straightforward how to get
a flow out of it. It is useful to inspect the proof of the Berman-Konsowa principle in
Theorem 7.43 to see what flexibility we have in playing with the test flow. The flow
property is used only in the proof of Lemma 7.41, more precisely, in the derivation
of (7.3.28). We therefore define the notion of a defective flow.
Definition 9.2 (Defective loop-free unit flow) Let Γ = (S, E) be a graph with
vertex set S and edge set E. Let A, B ⊂ S be non-empty and disjoint. A map
f : E → R is called a defective loop-free unit flow from A to B if:
(i) There exists a defect function δ : S → R such that
\[ \sum_{y \in S :\, (y,x) \in E} f(y, x) = \sum_{z \in S :\, (x,z) \in E} f(x, z) + \delta(x), \qquad x \in S \setminus (A \cup B). \tag{9.1.7} \]
(iii) Any path γ from A to B such that f (e) > 0 for all e ∈ γ is self-avoiding. In
particular, if f ((x, y)) > 0, then f ((y, x)) = 0.
Given a defective loop-free unit flow f, we can construct a Markov chain with
transition rates
\[ q^f(x, y) = \frac{f((x, y))}{F(x)}, \tag{9.1.9} \]
where $F(x) = \sum_{y \in S} f((x, y))$, and with initial distribution
Here, without loss of generality, we assume that F (x) > 0 for all x ∈ S. We denote
the law of this Markov chain by Pf , and define the sets An , n ∈ N0 , with A0 = A, as
in (7.3.31) in Sect. 7.3. Recall that n∗ (z) is the unique value of n such that z ∈ An .
The elementary estimate that results is the following.
\[ P^f(\tau_z < \tau_B) \le \prod_{k=1}^{n^*(z)-1} \Bigl(1 + \max_{y \in A_k} \Bigl[\frac{\delta(y)}{F(y)}\Bigr]_+\Bigr) F(z), \qquad z \in S, \tag{9.1.11} \]
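Proof The proof of this estimate proceeds by induction on n in exactly the same
way as the proof of Lemma 7.41. We know that (9.1.11) holds for all $z \in A_0$ because
$F = 1$ on $A_0$. We assume that (9.1.11) holds for all $z \in A_k$ with $0 \le k \le n$. Then
the recursion in (7.3.33) yields, for $z \in A_{n+1}$,
\[ \begin{aligned}
P^f(\tau_z < \tau_B) &= \sum_{y \in A_0 \cup \dots \cup A_n} P^f(\tau_y < \tau_B)\, q^f(y, z) \\
&\le \sum_{k=0}^{n} \prod_{\ell=1}^{k-1} \Bigl(1 + \max_{y \in A_\ell} \Bigl[\frac{\delta(y)}{F(y)}\Bigr]_+\Bigr) \sum_{y \in A_k} f(y, z) \\
&\le \prod_{k=1}^{n-1} \Bigl(1 + \max_{y \in A_k} \Bigl[\frac{\delta(y)}{F(y)}\Bigr]_+\Bigr) \sum_{y \in S} f(y, z) \\
&\le \prod_{k=1}^{n} \Bigl(1 + \max_{y \in A_k} \Bigl[\frac{\delta(y)}{F(y)}\Bigr]_+\Bigr) F(z),
\end{aligned} \tag{9.1.12} \]
where the first inequality uses (9.1.9) and the induction hypothesis, and the last
inequality uses (9.1.7).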
\[ P^f\bigl((x, y) \in \gamma\bigr) \le \prod_{k=1}^{n} \Bigl(1 + \max_{y \in A_k} \Bigl[\frac{\delta(y)}{F(y)}\Bigr]_+\Bigr) f(x, y), \tag{9.1.13} \]
and hence
n (x)−1
∗
Inserting this lower bound into the Dirichlet form in (7.3.38), we get
\[ \mathcal{E}(h, h) \ge \prod_{k=1}^{M} \Bigl(1 + \max_{y \in A_k} \Bigl[\frac{\delta(y)}{F(y)}\Bigr]_+\Bigr)^{-1} E^f\Bigl[\sum_{(x,y) \in \gamma} \frac{\mu(x) p(x, y)}{f((x, y))} \bigl[h(x) - h(y)\bigr]^2\Bigr] \tag{9.1.15} \]
with $M = \max_{x \in S} n^*(x)$. Taking the infimum over h, we get the following lower
bound on the capacity (recall (7.3.39)).
Situations where we may want to apply Lemma 9.4 arise when an approximate
harmonic function has been guessed in the upper bound. Given such a function, say
g, we may define
\[ f(x, y) = N(g)^{-1} \mu(x) p(x, y) \bigl[g(y) - g(x)\bigr]_+, \tag{9.1.17} \]
where N(g) is a normalising constant that fixes the total outgoing flow from A to
be 1, and we may write, using reversibility,
\[ \begin{aligned}
\sum_{z \in S} f(z, w) &= N(g)^{-1} \sum_{z \in S,\, (z,w) \in E,\, g(w) \ge g(z)} \mu(z) p(z, w) \bigl[g(w) - g(z)\bigr] \\
&= N(g)^{-1} \sum_{z \in S,\, (z,w) \in E} \mu(w) p(w, z) \bigl[g(w) - g(z)\bigr] \\
&\quad + N(g)^{-1} \sum_{z \in S,\, (z,w) \in E,\, g(w) < g(z)} \mu(w) p(w, z) \bigl[g(z) - g(w)\bigr] \\
&= N(g)^{-1} \mu(w) (L g)(w) + F(w),
\end{aligned} \tag{9.1.18} \]
i.e., the defect function is given by $\delta(w) = N(g)^{-1} \mu(w) (L g)(w)$. In Chap. 10 we
will encounter a set-up where this strategy works nicely. In general, however, there is
quite a bit of artistry involved in working out good test flows. We will see examples
in Chaps. 15, 19 and 20.
9.2 Coarse-graining
\[ H_N(\sigma) = -\tfrac12 N m_N(\sigma)^2 - h N m_N(\sigma) \tag{9.2.1} \]
is again a Markov process, this time on the much smaller state space
$\{-1, -1 + 2N^{-1}, \dots, 1 - 2N^{-1}, 1\}$ (see Fig. 9.2), and is reversible with respect to the measure
$\exp[-N f_{\beta,N}(m)]$, where $f_{\beta,N}$ is a double-well potential for β > 1 and h small
enough (for more details, see Fig. 13.1). This Markov process is a nearest-neighbour
random walk that is attracted to the local minima of the function fβ,N and behaves
similarly to the Kramers diffusion. The key point here is that the effective inverse
temperature in the coarse-grained model is of order N , i.e., entropic effects on the
mesoscopic scale have become marginal compared to the original model, and can
be ignored in the limit as N → ∞.
We would like to consider a similar mapping down to mesoscopic variables in
similar situations. This is called lumping in the theory of Markov chains, and is ex-
tensively discussed in the monograph by Kemeny and Snell [151]. Let us briefly
state the main results. The technique is still on the level of an art, and will be illus-
trated in the example of the random-field Curie-Weiss model treated in Chaps. 14
and 15.
9.3 Lumping
Consider a Markov process X = (Xt )t∈R+ on some state space S. Let T be some
other state space, and let f be a map from S to T . Then Y = (Yt )t∈R+ with Yt =
f (Xt ) is again a stochastic process. Typically, Y is not Markov, but there are easy
conditions under which it is (see Burke and Rosenblatt [43]).
Theorem 9.5 (Preservation of Markov property under lumping) Let P be the law of
a Markov process $X = (X_t)_{t \in \mathbb{R}_+}$ with state space $(S, \mathcal{B}(S))$ and stationary
transition kernels $P_t$, $t \ge 0$. Let $\mathcal{F}_t = \sigma(X_s, 0 \le s \le t)$, $t \in \mathbb{R}_+$, be the σ-algebra
generated by X up to time t. Let $(T, \mathcal{B}(T))$ be a measurable space and f a measurable
map from S to T. Then $Y = (Y_t)_{t \in \mathbb{R}_+}$ with $Y_t = f(X_t)$ is a Markov process when
for every $t \ge 0$ and $B \in \mathcal{B}(T)$ the maps
\[ P\bigl(f(X_t) \in B \mid \mathcal{F}_s\bigr), \qquad 0 \le s < t, \tag{9.3.1} \]
In words, the image process Y is Markov when the original process X is Markov
and has a high degree of symmetry (see Fig. 9.3).
In the case of a countable state space, the conditions of Theorem 9.5 can be
restated as saying that, if $p(x, x')$, $x, x' \in S$, are the transition probabilities (or
transition rates) of X, then for any $y, y' \in T$ and $x \in S$ such that $f(x) = y$ the formula
\[ r(y, y') = \sum_{x' \in S :\, f(x') = y'} p(x, x'), \tag{9.3.3} \]
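The countable-state criterion lends itself to a mechanical check: the sum in (9.3.3) must depend on x only through f(x). The sketch below is a hypothetical illustration, not code from the literature; the spin-flip chain on {−1, +1}^n (a uniformly chosen spin is flipped) and the function names are our own choices. The total magnetisation gives a lumpable map, whereas the indicator of a single configuration does not.

```python
from fractions import Fraction as Fr
from itertools import product

def lumped_kernel(P, f):
    """Return r(y, y') from (9.3.3) if it does not depend on the choice of x
    with f(x) = y, and None otherwise."""
    images = {f(z) for z in P}
    r = {}
    for x, row in P.items():
        sums = {}
        for x2, p in row.items():
            sums[f(x2)] = sums.get(f(x2), 0) + p
        for y2 in images:
            val = sums.get(y2, 0)
            if r.setdefault((f(x), y2), val) != val:
                return None                      # two preimages disagree
    return r

def spin_flip_chain(n):
    """Transition probabilities of the chain that flips a uniformly chosen spin."""
    P = {}
    for s in product((-1, 1), repeat=n):
        P[s] = {s[:i] + (-s[i],) + s[i + 1:]: Fr(1, n) for i in range(n)}
    return P

P3 = spin_flip_chain(3)
r_magnetisation = lumped_kernel(P3, sum)                      # lumpable
r_indicator = lumped_kernel(P3, lambda s: s == (1, 1, 1))     # not lumpable
```

The magnetisation chain obtained in this way is a nearest-neighbour walk, which is the mechanism behind the coarse-graining of mean-field models described in Sect. 9.2.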
(i)
\[ h^X_{A,B}(x) = h^Y_{a,b}\bigl(f(x)\bigr), \qquad x \in S, \tag{9.3.5} \]
Proof Property (i) is immediate from (9.3.1) in combination with the representations
$h^X_{A,B}(x) = P(\tau^X_A < \tau^X_B \mid X_0 = x)$, $x \in S$, and $h^Y_{a,b}(y) = P(\tau^Y_a < \tau^Y_b \mid Y_0 = y)$,
$y \in T$, where $\tau^X_A$ is the first hitting time of A for X and $\tau^Y_a$ is the first hitting
time of a for Y. Property (ii) is immediate from (9.3.1) and the representations
$\mathrm{cap}^X(A, B) = \mathcal{E}^X(h^X_{A,B}, h^X_{A,B})$ and $\mathrm{cap}^Y(a, b) = \mathcal{E}^Y(h^Y_{a,b}, h^Y_{a,b})$, where $\mathcal{E}^X$, $\mathcal{E}^Y$
are the Dirichlet forms associated with X, Y (recall Lemmas 7.12 and 7.26).
For models with an uncountable state space, in order to carry over the general for-
malism discussed in Chap. 8 we need some a priori control on the behaviour of
harmonic functions and other solutions of relevant Dirichlet problems. There are
two methods that can work in different cases: elliptic regularity theory (Sect. 9.4.1)
and coupling methods (Sect. 9.4.2).
In the case of Markov processes with a state space that is a subset of Rd and with
a generator given by (the closure of) an elliptic operator of the form (7.2.15), there
is a well developed analytic theory that provides quantitative control on the regular-
ity of solutions of homogeneous and inhomogeneous Dirichlet problems associated
with these operators. The following two key lemmas are taken from Gilbarg and
Trudinger [126, Corollaries 9.24–9.25], and concern second-order elliptic operators
\[ L = \sum_{i,j} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i} b_i(x) \frac{\partial}{\partial x_i} + d(x) \tag{9.4.1} \]
Let $\gamma = C/c$, and choose ν such that $(\|b\|_\infty / c)^2 \le \nu$ and $\|b\|_\infty \le \nu$. For $n \in \mathbb{N}$,
let $W^{2,n}(\Omega)$ denote the Sobolev spaces of twice (weakly) differentiable functions
on Ω whose derivatives of order ≤ 2 are in $L^n(\Omega)$. Let $B_R(x)$ denote the ball of
radius R centred at x.
Lemma 9.8 If $u \in W^{2,n}(\Omega)$ is positive and satisfies $L u = 0$ in Ω, then for any
$x \in \Omega$ and $R > 0$ such that $B_{2R}(x) \subset \Omega$,
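Lemma 9.9 If $u \in W^{2,n}(\Omega)$ is positive and satisfies $L u = f$ in $B_{R_0}(x)$, then for
any $0 < R \le R_0$,
\[ \mathrm{osc}_{B_R(x)}\, u \le C \Bigl(\frac{R}{R_0}\Bigr)^{\alpha} \Bigl(\mathrm{osc}_{B_{R_0}(x)}\, u + R_0 \|f - cu\|_{n, B_{R_0}(x)}\Bigr), \tag{9.4.4} \]
where $\mathrm{osc}_A u = \sup_A u - \inf_A u$, $\alpha = \alpha(n, \gamma, \nu R_0^2) > 0$ and $C = C(n, \gamma, \nu R_0^2) < \infty$.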
In the context of reversible diffusions with small noise, the term involving second
derivatives is scaled by the small parameter ε, i.e., we deal with operators of the form
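\[ \begin{aligned}
L_\varepsilon &= \varepsilon\, e^{F(x)/\varepsilon} \sum_{i,j} \frac{\partial}{\partial x_i}\, a_{ij}(x)\, e^{-F(x)/\varepsilon} \frac{\partial}{\partial x_j} \\
&= \varepsilon \sum_{i,j} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i,j} \Bigl(\varepsilon \frac{\partial a_{ij}(x)}{\partial x_i} - a_{ij}(x) \frac{\partial F(x)}{\partial x_i}\Bigr) \frac{\partial}{\partial x_j}.
\end{aligned} \tag{9.4.5} \]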
This means, in particular, that the ellipticity constant scales with ε. The way we
will use Lemmas 9.8–9.9 is to consider a family of domains depending on ε, chosen
in such a way that the numerical constants C and α are independent of ε. For the
operator $L_\varepsilon$, both c and C are proportional to ε, γ = O(1), and we can choose
$\nu \sim \varepsilon^{-2} \sup_{y \in \Omega} \|\nabla F(y)\|_\infty^2$.
An important application of the regularity estimates in Lemmas 9.8–9.9 is to
obtain bounds on harmonic functions. The basic tool used throughout Chap. 8 for
countable state spaces was Lemma 8.4, which was based on a simple renewal argu-
ment contained in the renewal equation (recall (8.2.2))
\[ P_x(\tau_A < \tau_B) = \frac{P_x(\tau_A < \tau_{B \cup x})}{P_x(\tau_{A \cup B} < \tau_x)}. \tag{9.4.6} \]
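On a finite state space the renewal equation (9.4.6) can be verified directly. The sketch below is a minimal illustration under our own assumptions: a hypothetical lazy nearest-neighbour walk on {0, ..., 4} with A = {0}, B = {4} and x = 2, hitting times that count only times n ≥ 1, and equilibrium potentials computed by Gauss-Seidel relaxation (our choice of solver).

```python
def equilibrium_potential(P, A, B, sweeps=2000):
    """h(y) = P_y(hit A before B), with h = 1 on A and h = 0 on B."""
    n = len(P)
    h = [1.0 if y in A else 0.0 for y in range(n)]
    for _ in range(sweeps):
        for y in range(n):
            if y in A or y in B:
                continue
            # Solve h(y) = sum_z p(y, z) h(z) for h(y), allowing p(y, y) > 0.
            h[y] = sum(P[y][z] * h[z] for z in range(n) if z != y) / (1.0 - P[y][y])
    return h

def escape_prob(P, x, A, B):
    """P_x(tau_A < tau_B), where tau denotes the first hitting time after time 0."""
    h = equilibrium_potential(P, A, B)
    return sum(P[x][y] * h[y] for y in range(len(P)))

P = [[0.5, 0.5, 0.0, 0.0, 0.0],
     [0.5, 0.2, 0.3, 0.0, 0.0],
     [0.0, 0.5, 0.2, 0.3, 0.0],
     [0.0, 0.0, 0.5, 0.2, 0.3],
     [0.0, 0.0, 0.0, 0.5, 0.5]]
A, B, x = {0}, {4}, 2

lhs = equilibrium_potential(P, A, B)[x]
rhs = escape_prob(P, x, A, B | {x}) / escape_prob(P, x, A | B, {x})
```

Both sides equal 25/34 here, in agreement with the gambler's ruin formula for this walk.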
While this formula remains true in the diffusion setting, it is useless for d > 1 be-
cause the denominator equals 1 and so the numerator equals the left-hand side. For-
tunately, it is easy to obtain a useful analogue of (9.4.6) by purely analytic con-
siderations, contained in the following theorem for operators of the form (9.4.5).
\[ h_{A,B}(x) \le C\, \frac{\mathrm{cap}(B_\rho(x), A)}{\mathrm{cap}(B_\rho(x), B)}. \tag{9.4.7} \]
where ∂n(y) is the normal derivative defined in (7.2.51). We use that hA,B∪C (y) = 0
when y ∈ ∂C and GA∪B (y, x) = 0 when x ∈ ∂(A ∪ B). The last equality follows
from (7.2.56). (Note that the factor ε appears because the definition of the normal
derivative does not include the factor ε.)
Now choose C = Bρ (x). If we could replace G(A∪B)c (y, x) by a constant for
y ∈ ∂Bρ (x), then we could extract this constant from the integral, and the remaining
integral would be some partial capacity. In fact, on a countable state space instead
of the ball Bρ (x) we could choose the point x, in which case the problem would
be absent and we would readily get (9.4.6). In the present setting, by combining
(9.4.10)–(9.4.11), we still get two bounds, namely,
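\[ \begin{aligned}
h_{A,B}(x) &\ge -\sup_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{A, B \cup B_\rho(x)}(dy), \\
h_{A,B}(x) &\le -\inf_{z \in \partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{A, B \cup B_\rho(x)}(dy).
\end{aligned} \tag{9.4.12} \]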
At this point it is clear that we need to be able to control the Green function near
the diagonal. Before turning to estimates, we bring (9.4.12) into a more suitable
form.
Proof By the representation in (7.2.56), we have $e_{B \cup B_\rho(x), A}(dy) \le e_{B_\rho(x), A}(dy)$.
Hence, by (7.2.57),
\[ \int_{\partial B_\rho(x)} e^{-F(y)/\varepsilon}\, e_{B \cup B_\rho(x), A}(dy) \le \int_{\partial B_\rho(x)} e^{-F(x)/\varepsilon}\, e_{B_\rho(x), A}(dy) = \mathrm{cap}\bigl(B_\rho(x), A\bigr). \tag{9.4.14} \]
Thus, the upper bound in (9.4.8) implies the upper bound in (9.4.13).
We want to express the Green function in Lemma 9.11 in terms of capacity. Using
the symmetry of the Green function and the fundamental relation between the Green
function, the equilibrium measure and the equilibrium potential in Theorem 7.28,
we get
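\[ e^{F(x)/\varepsilon} \int_{\partial B_\rho(x)} e^{-F(z)/\varepsilon}\, G_{(A \cup B)^c}(x, z)\, e_{B_\rho(x), A \cup B}(dz) = \int_{\partial B_\rho(x)} G_{(A \cup B)^c}(z, x)\, e_{B_\rho(x), A \cup B}(dz) \]
i.e.,
\[ e^{F(x)/\varepsilon} \inf_{z \in B_\rho(x)} G_{(A \cup B)^c}(x, z) \le \frac{1}{\mathrm{cap}(B_\rho(x), A \cup B)}. \tag{9.4.17} \]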
It is clear at this point that we cannot continue unless we can compare the infimum
and the supremum of G(A∪B)c (z, x) with z ∈ Bρ (x). Such a comparison is provided
by the Harnack inequalities.
Lemma 9.13 (Harnack inequality for the Green function) If ρ = cε for some
c < ∞, then there exists a constant C, depending on c only, such that
Proof We will apply Lemma 9.8. If we choose $R \le \varepsilon$, then we can use (9.4.3)
with a constant that does not depend on ε. (If x is a quadratic critical point of F,
then we can even choose $R = \varepsilon^{1/2}$.) Let $u(z) = G_{(A \cup B)^c}(z, x)$, $z \in B_\rho(x)$. Then u
is harmonic in $(A \cup B)^c \setminus x$. Therefore, if $\rho > 2R$, then u is harmonic in $B_{2R}(y)$
for every $y \in \partial B_\rho(x)$. Let $a, b \in \partial B_\rho(x)$ be such that $\sup_{z \in \partial B_\rho(x)} u(z) = u(a)$ and
$\inf_{z \in \partial B_\rho(x)} u(z) = u(b)$. Then we can find k points $x_1, \dots, x_k \in \partial B_\rho(x)$, with
$k \le \pi\rho/R$, such that $x_1 = a$, $b \in B_R(x_k)$ and $B_R(x_i) \cap B_R(x_{i+1}) \ne \emptyset$. Clearly,
Thus, $u(a) \le C^{\rho/R} u(b)$, and so if $\rho = c\varepsilon$ and $R = \varepsilon$, then the supremum and the
infimum are related by at most a finite ε-independent constant.
An alternative way to obtain regularity estimates that are suitable for metastable
systems is via coupling. The basic idea is as follows. Take some function depending
on the initial value x of the process, for instance, the expected hitting time
$x \mapsto E_x[\tau_D]$ of a set D. We want to show that in a certain uniform sense this function
is continuous in x on a certain neighbourhood. To do so, we start two copies of the
process in two different initial values, say x and y, coupled in such a way as to
favour convergence of the trajectories over time. If the trajectories meet before D is
hit, then the processes realise the hit together. If this happens with large probability
after a small time, then the difference between $E_x[\tau_D]$ and $E_y[\tau_D]$ must be small.
See Fig. 9.4 for an illustration.
Fig. 9.4 Coupling of two trajectories starting from two different initial values in a certain
neighbourhood
To exemplify this technique, we present its application in finite- and infinite-
dimensional diffusion processes first used by Martinelli et al. [174–177]. To that
end we place ourselves in the setting of the SDEs and SPDEs discussed in Sects. 5.6
and 5.7, respectively, 6.2 and 6.3, where the noise is scaled by a small parameter ε.
As pointed out in Sect. 6.5, there may be several minima of the action functional that
are metastable states in the sense of Freidlin-Wentzell theory. The theory therefore
yields the exponential asymptotics of transition times between such states.
In the potential-theoretic approach, Corollary 7.30 gives us a formula for mean
transition times when the system is started in a specific initial distribution on a
ball around a metastable state, namely, the last-exit biased distribution. The desired
result, however, is that the mean transition times do not really depend on the initial
distribution, i.e., are essentially the same no matter where in the ball the system
starts.
We limit our discussion to the reversible setting, although the results of Martinelli
et al. apply also to non-reversible processes. The main conditions formulated below
guarantee that the deterministic dynamics is attractive in the proper sense. If we are
looking at an S(P)DE of the form
\[ dX_t = -\nabla F(X_t)\, dt + \sqrt{2\varepsilon}\, dB_t, \tag{9.4.20} \]
Theorem 9.14 For any m ∈ M there exist ρ0 > 0, η > 0 and ρ0 > ρ > 0 such
that, for any δ > 0 and for ε small enough,
Remark 9.15 This result applies both for finite-dimensional diffusions and for
SPDEs under the conditions stated in Sect. 5.7. For sequences of N -dimensional
discretisations of SPDEs as described in Sect. 5.7, the corresponding estimates hold
uniformly in N , as was shown in Barret [11].
The proof makes essential use of the attractive nature of the deterministic (ε = 0)
equation. The key estimate is the following bound on solutions of (9.4.20). This
holds both for SDEs and SPDEs.
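Lemma 9.16 Denote by $X^\varepsilon_z$ the solution of (9.4.20) starting in z. Let m be a minimum
of F. Then there exist $k, C > 0$ and $\varepsilon_0, \rho_0 > 0$ such that, for $\varepsilon_0 > \varepsilon > 0$,
\[ P\Bigl( \sup_{\|z - m\|_\infty < \rho_0} \|X^\varepsilon_z(t) - X^\varepsilon_m(t)\|_\infty \le e^{-kt}\, \|X^\varepsilon_z(0) - X^\varepsilon_m(0)\|_\infty,\ \forall t > 0 \Bigr) \ge 1 - e^{-C/\varepsilon}. \tag{9.4.23} \]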
The proof of this contraction result is tedious but relies on large deviation esti-
mates only. These are used to show that solutions cannot spend a substantial fraction
of time away from local minima. Two solutions driven by the same Brownian mo-
tion approach each other when they are in the neighbourhood of a minimum. Careful
book-keeping yields the result. For details see, in particular, the elegant proof given
in Martinelli, Sbano and Scoppola [176].
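The mechanism that two solutions driven by the same Brownian motion approach each other near a minimum can be seen in the simplest possible case. The sketch below is a one-dimensional Euler-Maruyama discretisation of (9.4.20) under our own assumptions: the quadratic potential F(x) = x²/2, the step size, and the function name are hypothetical choices. For this potential the noise cancels in the difference of the two copies, so the gap contracts deterministically by the factor (1 − dt) per step.

```python
import math
import random

def coupled_gaps(grad_f, x0, y0, eps, dt, steps, seed=0):
    """Run two Euler-Maruyama paths of dX = -F'(X) dt + sqrt(2 eps) dB driven by
    the same noise increments and return the gap |X - Y| after each step."""
    rng = random.Random(seed)
    x, y = x0, y0
    gaps = [abs(x - y)]
    for _ in range(steps):
        noise = math.sqrt(2.0 * eps * dt) * rng.gauss(0.0, 1.0)  # shared increment
        x += -grad_f(x) * dt + noise
        y += -grad_f(y) * dt + noise
        gaps.append(abs(x - y))
    return gaps

# Quadratic well F(x) = x^2 / 2, so that F'(x) = x.
gaps = coupled_gaps(lambda x: x, x0=1.5, y0=0.5, eps=0.05, dt=0.01, steps=500)
```

For a general potential the contraction holds only while both copies stay in a neighbourhood where F is convex, which is where the large deviation estimates mentioned above enter.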
9.5 Bibliographical notes
1. The first reference to lumping appears to be Burke and Rosenblatt [43]. In this paper,
a necessary and sufficient criterion is given for a function of a Markov process
to be Markovian. Kemeny and Snell [151] introduced the notion of lumping
and of a lumpable Markov process. A systematic presentation of conditions for
lumpability in terms of symmetries of the transition rates is given in Baake, Baake,
Bovier and Klein [9]. Liggett [166] discusses lumping in connection with capacities.
In the context of metastability, lumping was used heavily in Bovier, Eckhoff,
Gayrard and Klein [33]. Sharp bounds through coarse-graining in non-lumpable
models have been obtained in Bianchi, Bovier and Ioffe [24] for the random-field
Curie-Weiss model, and by Slowik [219] for the Potts version of this model.
2. Coupling methods to prove regularity were used by Martinelli, Olivieri and Scop-
pola [174, 175], and in an improved form by Martinelli, Sbano and Scoppola [176],
to prove exponential convergence of exit times for finite- and infinite-dimensional
diffusions. They were applied to discretisations of SPDEs by Barret [11].
3. Coupling techniques were used to obtain similar bounds as in Theorem 9.14 for
Glauber dynamics of the random-field Curie-Weiss model by Bianchi, Bovier and
Ioffe in [25]. The coupling used was an extension of a coupling constructed for the
Glauber dynamics of the Curie-Weiss model by Levin, Luczak and Peres in [164].
Parts IV–VIII bring the general theory outlined in Part III to bear on a number of
selected examples.
In Part IV we study diffusions with small noise. Chapter 10 deals with diffusions
on a lattice with a vanishing spacing. Chapter 11 looks at finite-dimensional diffu-
sions on subsets of Rd and sharpens the results of Freidlin and Wentzell by using
the potential-theoretic tools introduced in Part III. Chapter 12 looks at stochastic
partial differential equations, which are the infinite-dimensional analogues of the
diffusions dealt with in Chap. 11.
Chapter 10
Discrete Reversible Diffusions
One of the simplest settings in which the general theory of metastability outlined in
Part III can be applied is that of discrete diffusions. By this we understand discrete-
time or continuous-time (nearest-neighbour) random walks on d-dimensional lat-
tices of spacing ε > 0 subject to a drift field derived from a potential F that may
have several local minima. One of the motivations for studying discrete reversible
diffusions is that they appear as coarse-grained versions of mean-field spin systems.
The results of this chapter will be used in Part V.
In Sect. 10.1 we define the setting and state the necessary assumptions on the
potential. In Sects. 10.2 and 10.3 we derive upper and lower bounds on the relevant
capacities.
10.1 Definitions
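where the infimum runs over all continuous paths γ in Ω. The communication
level set between A and B is
\[ \mathcal{S}(A, B) = \bigl\{z \in \mathbb{R}^d : F(z) = \Phi(A, B)\bigr\}. \tag{10.1.4} \]
\[ (A \to B)_{\mathrm{opt}} = \Bigl\{\gamma \in C\bigl([0, 1], \Omega\bigr) : \gamma(0) \in A,\ \gamma(1) \in B,\ \sup_{t \in [0,1]} F\bigl(\gamma(t)\bigr) = \Phi(A, B)\Bigr\}. \tag{10.1.5} \]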
(c) A subset W ⊆ S (A, B) is a gate if it is a minimal subset with the property that
all optimal paths intersect W . A priori there may be several (not necessarily
disjoint) gates. Their union is denoted by G (A, B) and is called the essential
gate.
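A lattice analogue of the communication height can be computed with a minimax variant of Dijkstra's algorithm. The sketch below is an illustration under our own assumptions: F is given on a rectangular grid, paths are nearest-neighbour lattice paths rather than continuous curves, Φ(A, B) is taken to be the minimum over such paths of the maximum of F along the path, and the function name is hypothetical.

```python
import heapq

def communication_height(F, A, B):
    """Minimum over nearest-neighbour lattice paths from A to B of the maximal
    value of F along the path (a discrete analogue of Phi(A, B))."""
    rows, cols = len(F), len(F[0])
    best = {a: F[a[0]][a[1]] for a in A}         # best known bottleneck per site
    heap = [(height, a) for a, height in best.items()]
    heapq.heapify(heap)
    while heap:
        height, (i, j) = heapq.heappop(heap)
        if (i, j) in B:
            return height                        # first B-site popped is optimal
        if height > best[(i, j)]:
            continue                             # stale heap entry
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            k, l = i + di, j + dj
            if 0 <= k < rows and 0 <= l < cols:
                new_height = max(height, F[k][l])
                if new_height < best.get((k, l), float("inf")):
                    best[(k, l)] = new_height
                    heapq.heappush(heap, (new_height, (k, l)))
    return None                                  # B is not reachable from A

# Two wells at opposite corners, separated by a ridge with passes of height 5 and 4.
F = [[0, 5, 1],
     [2, 9, 1],
     [3, 4, 0]]
phi = communication_height(F, {(0, 0)}, {(2, 2)})
```

The sites where an optimal path attains the value Φ(A, B) are the lattice counterparts of the communication level set, and the site of height 4 in this example plays the role of a gate.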
Assumption 10.3
(i) The set M of local minima of F is finite, and for all pairs x, y ∈ M there is a
unique essential gate G (x, y) consisting of a finite collection of isolated saddle
points zk∗ (x, y), k ∈ I (x, y), with I (x, y) an index set.
(ii) At all local minima x ∈ M and all saddle points zk∗ (x, y), x, y ∈ M , k ∈
I (x, y), the Hessian matrix of F , denoted by A(x) and A(zk∗ (x, y)), is non-
degenerate (i.e., has only non-zero eigenvalues).
Remark 10.4 Assumption 10.3 amounts to saying that F is a Morse function. Under
this assumption, the saddle points zk∗ (x, y) are the critical points where A(zk∗ (x, y))
has exactly one negative eigenvalue.
We may encounter situations where saddle points in ∂Ω are relevant. While this
does not necessarily lead to problems, there are many instances where the formu-
lation of general results becomes somewhat cumbersome. In order to avoid these
complications we exclusively deal with situations where ∂Ω is never reached by $X^\varepsilon$:
Assumption 10.5 $\lim_{i \to \infty} F(x_i) = \infty$ for any sequence of points $(x_i)_{i \in \mathbb{N}}$ in Ω
such that $\lim_{i \to \infty} x_i = x \in \partial\Omega$.
In the setting described above, the general theory of metastability for Markov
chains on countable state spaces described in Chap. 8 applies.
Proof The proof of this fact is easy. As we will see later in full detail, if $x, y \in M$,
then $\mathrm{cap}(x, M \setminus x)/\mu_\varepsilon(x) \sim \exp(\varepsilon^{-1}[F(x) - F(z^*(x, y))])$, which is of order
$\exp(-C/\varepsilon)$ with $C > 0$. If $z \notin M$, then there is a path $\gamma = (\gamma_0, \dots, \gamma_n)$ from z
to M along which F is decreasing. As pointed out in Sect. 9.1.2, it follows that
$\mathrm{cap}(z, M) \ge \mathrm{cap}_\gamma(\gamma(0), \gamma(n))$, where $\mathrm{cap}_\gamma$ is the capacity of the Markov process
in which all connections except those on γ are removed. The lower bound can be
computed explicitly (see Sect. 7.1.4). This leads to the estimate $\mathrm{cap}(z, M)/\mu_\varepsilon(z) \ge O(\varepsilon^p)$
for some dimension-dependent $p < \infty$, which in turn implies that the conditions
of Definition 8.2 are satisfied.
Remark 10.7 Note that when $S_\varepsilon$ is infinite, our assumptions on F imply that there
exists a subset $S_{\varepsilon,0} = \Lambda \cap (\varepsilon\mathbb{Z})^d$, with Λ some finite box, satisfying the hypothesis
on the subset $S_0$ mentioned in the remark following Definition 8.2. We just need to
take Λ such that it contains all local minima of F and such that, outside Λ, F is
large enough.
In view of Theorem 10.6, all that is required is to compute capacities between the
single points in Mε . To further simplify the presentation, we assume that all gates
consist of a single saddle point. The general case is obtained simply by adding up
the contributions to the capacity coming from the different saddle points. Our goal is
to apply Theorem 8.15, which expresses the metastable crossover times between the
local minima in Mε in terms of capacities and the invariant measure. Theorem 8.43
automatically yields the associated spectral estimates.
Let A, B be disjoint non-empty subsets of Mε connected through a unique saddle
point z∗ (A, B), i.e., A and B are contained in two different connected components
of the level set {y ∈ Sε : F (y) < Φ(A, B)}. For z∗ ∈ Mε , let B(z∗ ) be the matrix
with elements
$$(B(z^*))_{\ell,k} = \sqrt{r_\ell}\,(A(z^*))_{\ell,k}\,\sqrt{r_k}, \tag{10.1.6}$$
and let γ̂1 (z∗ ) be the unique negative eigenvalue of B(z∗ ). For x ∈ Mε , write
Mε,x = {y ∈ Mε : F (y) < F (x)} and let z∗ (x, Mε,x ) be the unique saddle point
connecting x and Mε,x .
Our main results in this chapter are the following.
Theorem 10.10 (Link between spectrum and metastable exit times) Suppose that there exists a $\theta > 0$ such that the elements of $M_\varepsilon$ can be labeled in such a way that
$$\Phi(x_k, M_{k-1}) - F(x_k) \le \min_{1\le l<k}\bigl[\Phi(x_l, M_k\setminus x_l) - F(x_l)\bigr] - \theta, \qquad k = 2, \dots, |M_\varepsilon|, \tag{10.1.9}$$
where $M_k = \{x_1, \dots, x_k\}$, $k = 1, \dots, |M_\varepsilon|$. As $\varepsilon \downarrow 0$,
$$\lambda_k = \frac{1}{E_{x_k}[\tau_{M_k}]}\bigl[1 + O(e^{-\delta/\varepsilon})\bigr], \qquad k = 2, \dots, |M_\varepsilon|, \tag{10.1.10}$$
for some $\delta = \delta(\theta) > 0$, where $\lambda_k$ is the $k$-th eigenvalue of $-L_\varepsilon$ (in increasing order), and $\lambda_1 = 0$.
Theorem 10.11 (Exponential law of the metastable exit time) Under the assumptions of Theorem 10.10, for $k = 1, \dots, |M_\varepsilon|$,
$$\lim_{\varepsilon\downarrow 0} P_{x_k}\bigl[\tau_{M_k}/E_{x_k}[\tau_{M_k}] > t\bigr] = e^{-t}, \qquad t \ge 0. \tag{10.1.11}$$
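The link between the small eigenvalues and the mean metastable exit times can be checked numerically in the simplest case. The following is a minimal sketch, not taken from the text: it assumes a hypothetical one-dimensional tilted double-well potential, a nearest-neighbour Metropolis chain on a grid of spacing $\varepsilon$ (one possible reversible chain with invariant measure $\exp(-F/\varepsilon)$), and reads the hitting set for $k = 2$ as the deeper minimum $x_1$. All function names are illustrative.

```python
import numpy as np

def potential(x):
    # Hypothetical tilted double well: deep minimum near x = -1.05,
    # shallow minimum near x = 0.95, saddle near x = 0.10.
    return x**4 / 4 - x**2 / 2 + 0.1 * x

def build_chain(eps, lo=-2.0, hi=2.0):
    """Nearest-neighbour Metropolis chain on a grid of spacing eps,
    reversible with respect to mu(x) = exp(-F(x)/eps)."""
    n = int(round((hi - lo) / eps)) + 1
    x = lo + eps * np.arange(n)
    F = potential(x)
    P = np.zeros((n, n))
    for i in range(n - 1):
        dF = F[i + 1] - F[i]
        P[i, i + 1] = 0.5 * np.exp(-max(dF, 0.0) / eps)
        P[i + 1, i] = 0.5 * np.exp(-max(-dF, 0.0) / eps)
    P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)  # holding probabilities
    return x, F, P

def eigenvalues_of_minus_generator(P):
    """Eigenvalues of 1 - P via the symmetrised matrix D^{1/2}(1 - P)D^{-1/2},
    whose off-diagonal entries equal -sqrt(p(x,y) p(y,x)) by detailed balance."""
    S = -np.sqrt(P * P.T)
    np.fill_diagonal(S, 1.0 - np.diag(P))
    return np.linalg.eigvalsh(S)  # ascending order

def mean_hitting_time(P, start, target):
    """Solve (1 - P) w = 1 off the target point, with w(target) = 0."""
    keep = np.array([i for i in range(P.shape[0]) if i != target])
    A = np.eye(len(keep)) - P[np.ix_(keep, keep)]
    w = np.linalg.solve(A, np.ones(len(keep)))
    return w[np.flatnonzero(keep == start)[0]]

eps = 0.015
x, F, P = build_chain(eps)
deep = int(np.argmin(np.where(x < 0.0, F, np.inf)))     # grid point nearest x_1
shallow = int(np.argmin(np.where(x > 0.5, F, np.inf)))  # grid point nearest x_2
lam = eigenvalues_of_minus_generator(P)
ratio = lam[1] * mean_hitting_time(P, shallow, deep)
print(f"lambda_2 * E_x2[tau_x1] = {ratio:.6f}")
```

In this sketch the product of the second eigenvalue and the mean hitting time of the deep well from the shallow well should be close to 1, in line with the statement of Theorem 10.10 for $k = 2$.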
10.2 Upper bounds on capacities 251
Proof The lemma states that, up to the error given, the mass of A(x) is the same as if the potential were of the form $F(y) = F(x) + \tfrac12((y - x), A(x)(y - x))$. In that case the discrete sum over lattice points with spacing $\varepsilon$ is well approximated by the corresponding Gaussian integral. Furthermore, the contributions to both the Gaussian integral and the original sum coming from the region where $\|y - x\|_\infty \ge C\sqrt{\varepsilon\ln(1/\varepsilon)}$ are by a factor of order $\varepsilon^C$ smaller than the main contribution and can be neglected. On the remaining set, by Taylor expansion,
$$\varepsilon^{-1}\bigl|F(y) - F(x) - \tfrac12((y - x), A(x)(y - x))\bigr| \le C\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}. \tag{10.1.13}$$
This results in the error term in (10.1.12). More details can be found in the computation of capacities carried out in Sects. 10.2–10.3, which uses very similar approximations.
In this section we derive upper bounds on capacities between two local minima. For
this we use the Dirichlet principle. We only need to produce a good test function.
We want to estimate
and $H_{A,B} = \{h\colon S_\varepsilon \to \mathbb{R} :\ \mathcal{E}(h,h) < \infty,\ h|_A \ge 1,\ h|_B \le 0\}$. The general strategy to construct a test function is the following. We choose a strip $W_0$ of width $C\sqrt{\varepsilon\ln(1/\varepsilon)}$ with the following properties (see Fig. 10.1):
(i) The complement of W0 in Sε consists of two parts: W1 containing A and W2
containing B.
(ii) $W_0$ contains $z^*$, and for a cube $D_\varepsilon$ of linear size $C\sqrt{\varepsilon\ln(1/\varepsilon)}$ centered at $z^*$, with $C$ large enough, $W_0 \cap D_\varepsilon$ is contained in the set $\{x \in S_\varepsilon : F(x) > F(z^*) + c\,\varepsilon\ln(1/\varepsilon)\}$ for a suitably chosen $c > 1$.
Fig. 10.1 Domains for the construction of the test function in (10.2.3), with $m_1^* \in A$ and $m_2^* \in B$
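A test function $\tilde g$ is taken of the form
$$\tilde g(x) = \begin{cases} 0, & \text{if } x \in W_1,\\ 1, & \text{if } x \in W_2,\\ g(x), & \text{if } x \in W_0 \cap D_\varepsilon = W_0^{\mathrm{in}},\\ 0, & \text{if } x \in W_0 \cap D_\varepsilon^c = W_0^{\mathrm{out}}, \end{cases} \tag{10.2.3}$$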
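Lemma 10.13 Let $\mathcal{E}$, $\tilde{\mathcal{E}}$ be two Dirichlet forms defined for the same state space $S_\varepsilon$, corresponding to reversible measures $\mu_\varepsilon$, $\tilde\mu_\varepsilon$ and transition matrices $p_\varepsilon$, $\tilde p_\varepsilon$,
$$(1 - \delta)^2 \le \frac{\mathrm{cap}(A, B)}{\widetilde{\mathrm{cap}}(A, B)} \le (1 - \delta)^{-2}, \tag{10.2.5}$$
$${} = (1 - \delta)^2\,\widetilde{\mathrm{cap}}(A, B). \tag{10.2.6}$$
$$\widetilde{\mathrm{cap}}(A, B) \ge (1 - \delta)^2\,\mathrm{cap}(A, B), \tag{10.2.7}$$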
We need to control the transition probabilities $p_\varepsilon(x, x')$ and the reversible measure $\mu_\varepsilon(x) = \exp(-F(x)/\varepsilon)$ in terms of suitable modifications. Let $A(z^*) = A$ be the Hessian matrix of $F$ at the saddle point $z^*$, and set
$$\tilde\mu_\varepsilon(x) = \exp\bigl(-\tfrac12\bigl((x - z^*), A(x - z^*)\bigr)/\varepsilon\bigr). \tag{10.2.9}$$
With this in mind, we let $\tilde L_\varepsilon$ be the generator of the dynamics on $D_\varepsilon(\rho)$ with transition probabilities $\tilde r(x, y)$ given by
$$\begin{aligned} \tilde r(x, x + \varepsilon e_\ell) &= r_\ell,\\ \tilde r(x + \varepsilon e_\ell, x) &= r_\ell\,\tilde\mu_\varepsilon(x)/\tilde\mu_\varepsilon(x + \varepsilon e_\ell). \end{aligned} \tag{10.2.12}$$
The fact that we choose the transition probabilities in the directions $+e_\ell$ to be constant is arbitrary. In the directions $-e_\ell$ we must choose them such that reversibility with respect to the modified reversible measure $\tilde\mu_\varepsilon$ is satisfied.
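For $u \in H_{A,B}$, we write the corresponding Dirichlet form as
$$\tilde{\mathcal{E}}_{D_\varepsilon}(u, u) = \sum_{x \in D_\varepsilon(\rho)} \sum_{\ell=1}^{d} r_\ell\,\tilde\mu_\varepsilon(x)\,\bigl[u(x) - u(x + \varepsilon e_\ell)\bigr]^2, \tag{10.2.13}$$
In this section we construct a function that is almost harmonic with respect to the Dirichlet form $\tilde{\mathcal{E}}_{D_\varepsilon}$.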
Recall the matrix $B(z^*) = B$ defined in (10.1.6). Let $\hat v^{(i)}$, $i = 1, \dots, d$, be the normalized eigenvectors of $B$, and $\hat\gamma_i$ the corresponding eigenvalues. Denote by $\hat\gamma_1$ the unique negative eigenvalue of $B$. Define vectors $v^{(i)}$ by
$$v^{(i)}_\ell = \hat v^{(i)}_\ell/\sqrt{r_\ell}, \qquad \ell = 1, \dots, d, \tag{10.2.14}$$
and
$$\bigl(\check v^{(i)}, v^{(j)}\bigr) = \delta_{ij}. \tag{10.2.17}$$
$$(y, Ax) = \sum_{i=1}^{d} \hat\gamma_i\,\bigl(y, v^{(i)}\bigr)\bigl(x, v^{(i)}\bigr). \tag{10.2.18}$$
Finally, we single out the vectors $v = v^{(1)}$, $\check v = \check v^{(1)}$, $\hat v = \hat v^{(1)}$ and set
which is our choice for the approximately harmonic function in the definition of $\tilde g$ in (10.2.3). Note that $g(x)$ only varies in the direction of the vector $v$, and that it is close to 0 when $(v, x) \le -\rho$, and close to 1 when $(v, x) \ge \rho$. Moreover, the following estimate holds.
Lemma 10.14 Let $g$ be as in (10.2.20), and let $\tilde L_\varepsilon$ be the generator defined after (10.2.11). Then, for all $x \in D_\varepsilon(\rho)$, there exists a constant $c < \infty$ such that
$$(\tilde L_\varepsilon g)(x) \le \Bigl(\sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon} \sum_{\ell=1}^{d} r_\ell v_\ell\Bigr)\, O(\rho^2). \tag{10.2.21}$$
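Proof We choose coordinates such that $z^* = 0$, and set $A = A(z^*)$. Using reversibility, we get
$$\begin{aligned} \tilde r(x, x - \varepsilon e_\ell) &= \exp\bigl(-\tfrac12\varepsilon^{-1}\bigl[(x, Ax) - \bigl((x - \varepsilon e_\ell), A(x - \varepsilon e_\ell)\bigr)\bigr]\bigr)\, r_\ell\\ &= \exp\bigl(-(e_\ell, Ax)\bigr)\bigl[1 + O(\varepsilon)\bigr]\, r_\ell. \end{aligned} \tag{10.2.22}$$
Therefore
$$(\tilde L_\varepsilon g)(x) = \sum_{\ell=1}^{d} r_\ell\,\bigl[g(x + \varepsilon e_\ell) - g(x)\bigr]$$
Next, we use the explicit form of $g$ given in (10.2.20) to obtain, by Taylor expansion, that for some $\tilde x \in [x, x + \varepsilon e_\ell]$,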
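$$\begin{aligned} g(x + \varepsilon e_\ell) - g(x) &= f\bigl((v, x) + \varepsilon v_\ell\bigr) - f\bigl((v, x)\bigr)\\ &= v_\ell\,\varepsilon f'\bigl((v, x)\bigr) + \tfrac12 v_\ell^2\,\varepsilon^2 f''\bigl((v, x)\bigr) + \tfrac16 v_\ell^3\,\varepsilon^3 f'''\bigl((v, \tilde x)\bigr)\\ &= v_\ell\,\sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon}\,\bigl(1 - v_\ell|\hat\gamma_1|(v, x) + O(\rho^2)\bigr), \end{aligned} \tag{10.2.24}$$
where we use (10.2.19) and $\rho = C\sqrt{\varepsilon\ln(1/\varepsilon)}$. In particular, we get that
Now
$$1 - \exp\bigl(-(e_\ell, Ax) - |\hat\gamma_1| v_\ell (v, x)\bigr)\bigl[1 + O(\rho^2)\bigr] = (e_\ell, Ax) + |\hat\gamma_1| v_\ell (v, x) + O(\rho^2). \tag{10.2.27}$$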
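Using this fact, and collecting the leading order terms, we get
$$(\tilde L_\varepsilon g)(x) = \sqrt{\frac{\varepsilon|\hat\gamma_1|}{2\pi}}\; e^{-|\hat\gamma_1|(v,x)^2/2\varepsilon} \sum_{\ell=1}^{d} r_\ell v_\ell\,\bigl((e_\ell, Ax) + |\hat\gamma_1| v_\ell (v, x) + O(\rho^2)\bigr). \tag{10.2.28}$$
$$\sum_{\ell=1}^{d} r_\ell v_\ell\,\bigl((e_\ell, Ax) - \hat\gamma_1 v_\ell (v, x)\bigr) = 0. \tag{10.2.29}$$
$$(e_\ell, Ax) - \hat\gamma_1 v_\ell (v, x) = \sum_{j=2}^{d} \hat\gamma_j\, v^{(j)}_\ell\,\bigl(v^{(j)}, x\bigr). \tag{10.2.30}$$
Hence, recalling that $r_\ell v_\ell = \check v^{(1)}_\ell$ by (10.2.15) and that $\check v^{(1)}$ is orthogonal to $v^{(j)}$ for $j \ge 2$ by (10.2.17), we see that (10.2.29) holds.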
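The linear algebra behind (10.2.14), (10.2.17), (10.2.18) and the cancellation (10.2.29) can be checked numerically. The sketch below is illustrative only: the rates `r` and the matrix `A` are randomly generated stand-ins, and the definition $\check v^{(i)}_\ell = \sqrt{r_\ell}\,\hat v^{(i)}_\ell$ is inferred from the relation $r_\ell v_\ell = \check v_\ell$ quoted in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
r = rng.uniform(0.1, 0.5, size=d)              # hypothetical rates r_l > 0
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
A = Q @ np.diag([-1.0, 0.7, 2.0]) @ Q.T        # symmetric matrix with one negative eigenvalue

sqrt_r = np.sqrt(r)
B = sqrt_r[:, None] * A * sqrt_r[None, :]      # B_{lk} = sqrt(r_l) A_{lk} sqrt(r_k)
gamma_hat, V_hat = np.linalg.eigh(B)           # ascending eigenvalues, orthonormal columns
V = V_hat / sqrt_r[:, None]                    # columns v^(i), as in (10.2.14)
V_check = V_hat * sqrt_r[:, None]              # columns v-check^(i) (inferred definition)

# Biorthogonality (10.2.17) and spectral representation (10.2.18)
biorthogonal = np.allclose(V_check.T @ V, np.eye(d))
A_rebuilt = V @ np.diag(gamma_hat) @ V.T
represents_A = np.allclose(A_rebuilt, A)

# Cancellation (10.2.29) at a random point x
x = rng.normal(size=d)
v = V[:, 0]
residual = np.sum(r * v * (A @ x - gamma_hat[0] * v * (v @ x)))
print(biorthogonal, represents_A, abs(residual))
```

By Sylvester's law of inertia, `B` has the same number of negative eigenvalues as `A`, so `gamma_hat[0]` is the unique negative eigenvalue and `residual` vanishes up to rounding.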
Lemma 10.14 justifies the choice of g and will play an important rôle also in the
derivation of the lower bound in Sect. 10.3. But first we state the upper bound that
follows from it.
Proof Return to Fig. 10.1. We first estimate the contribution of the set $D_\varepsilon \cap W_0$. By Lemma 10.13, this can be controlled in terms of the modified Dirichlet form $\tilde{\mathcal{E}}_{D_\varepsilon}$ in (10.2.13). Thus, let $g$ be the function defined in (10.2.20), and choose coordinates such that $z^* = 0$ and $F(z^*) = 0$. Then, by (10.2.9) and (10.2.24),
$$\tilde{\mathcal{E}}_{D_\varepsilon}(g, g) = \sum_{x \in D_\varepsilon} e^{-(x, Ax)/2\varepsilon} \sum_{\ell=1}^{d} r_\ell\,\bigl[g(x + \varepsilon e_\ell) - g(x)\bigr]^2 \tag{10.2.32}$$
where we use that $\sum_{\ell=1}^{d} r_\ell v_\ell^2 = \sum_{\ell=1}^{d} \hat v_\ell^2 = 1$. It remains to compute the sum over $x$. Via a standard approximation of the sum by an integral we get, by (10.1.6), (10.2.14) and (10.2.19),
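$$\begin{aligned} &\sum_{x \in D_\varepsilon} \exp\bigl(-|\hat\gamma_1|(v, x)^2/\varepsilon - (x, Ax)/2\varepsilon\bigr)\\ &\quad= \varepsilon^{-d}\bigl[1 + O(\rho)\bigr] \int_{D_\varepsilon} dx\, \exp\bigl(-|\hat\gamma_1|(v, x)^2/\varepsilon - (x, Ax)/2\varepsilon\bigr)\\ &\quad= \bigl[1 + O(\rho)\bigr] \prod_{\ell=1}^{d} \frac{\sqrt{r_\ell}}{\varepsilon} \int_{\bar D_\varepsilon} dy\, \exp\bigl(-|\hat\gamma_1|(y, \hat v)^2/\varepsilon - (y, By)/2\varepsilon\bigr)\\ &\quad= \bigl[1 + O(\rho)\bigr] \prod_{\ell=1}^{d} \frac{\sqrt{r_\ell}}{\varepsilon} \int_{\bar D_\varepsilon} dy\, \exp\Bigl(-|\hat\gamma_1|(y, \hat v)^2/\varepsilon - \sum_{j=1}^{d} \hat\gamma_j \bigl(\hat v^{(j)}, y\bigr)^2/2\varepsilon\Bigr)\\ &\quad= \bigl[1 + O(\rho)\bigr] \prod_{\ell=1}^{d} \frac{\sqrt{r_\ell}}{\varepsilon} \int_{\bar D_\varepsilon} dy\, \exp\Bigl(-\sum_{j=1}^{d} |\hat\gamma_j| \bigl(\hat v^{(j)}, y\bigr)^2/2\varepsilon\Bigr)\\ &\quad= \bigl[1 + O(\rho)\bigr] \Bigl(\frac{2\pi}{\varepsilon}\Bigr)^{d/2} \prod_{\ell=1}^{d} \sqrt{\frac{r_\ell}{|\hat\gamma_\ell|}}\\ &\quad= \bigl[1 + O(\rho)\bigr] \Bigl(\frac{2\pi}{\varepsilon}\Bigr)^{d/2} \bigl(-\det A\bigr)^{-1/2}. \end{aligned} \tag{10.2.33}$$
Here, $\bar D_\varepsilon$ is the image of $D_\varepsilon$ under the change of variables $y_\ell = x_\ell/\sqrt{r_\ell}$, and in the last equality we use the fact that Gaussian integrals over intervals of length $\rho = C\sqrt{\varepsilon\ln(1/\varepsilon)}$ are equal to integrals over $\mathbb{R}$ up to errors of order $\varepsilon^C$.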
Inserting (10.2.33) into (10.2.32), we see that the left-hand side of (10.2.32) is
equal to the right-hand side of (10.2.31) up to error terms. It therefore remains to
show that the sum outside Dε in the Dirichlet form does not contribute significantly
to the capacity. But we can always choose Dε and W0 in such a way that the follow-
ing hold:
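(i) For $x \in W_0 \cap D_\varepsilon^c$, $\mu_\varepsilon(x) \le \mu_\varepsilon(z^*)\,\varepsilon^K$ with $K$ as large as desired.
(ii) If $x \in W_0 \cap D_\varepsilon$ and $y \in W_1$, with $p_\varepsilon(x, y) > 0$, then $g(x)^2\mu_\varepsilon(x)/\mu_\varepsilon(z^*) \le \varepsilon^K$. Similarly, if $x \in W_0 \cap D_\varepsilon$ and $y \in W_2$, with $p_\varepsilon(x, y) > 0$, then $(g(x) - 1)^2\mu_\varepsilon(x)/\mu_\varepsilon(z^*) \le \varepsilon^K$. Both facts follow from the explicit form of $g$ and the fact that $F$ is close to its quadratic approximation on $D_\varepsilon$.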
From these observations we easily derive that the contribution to the Dirichlet form
from (W0 ∩ Dε )c is negligible compared to the contribution from W0 ∩ Dε . This
yields Proposition 10.15.
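The lattice-sum asymptotics in (10.2.33) can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the proof: it takes $d = 2$, hypothetical rates `r`, a hypothetical saddle Hessian `A` with one negative eigenvalue, and sums over a finite box of $(\varepsilon\mathbb{Z})^2$ large enough that the truncated tails are negligible.

```python
import numpy as np

eps = 0.05
d = 2
r = np.array([0.3, 0.2])                       # hypothetical rates r_l
A = np.array([[-1.0, 0.3], [0.3, 2.0]])        # hypothetical Hessian at the saddle, det A < 0

sqrt_r = np.sqrt(r)
B = sqrt_r[:, None] * A * sqrt_r[None, :]      # matrix B from (10.1.6)
gamma_hat, V_hat = np.linalg.eigh(B)           # gamma_hat[0] is the negative eigenvalue
v = V_hat[:, 0] / sqrt_r                       # v = v^(1) from (10.2.14)
g1 = abs(gamma_hat[0])

# Lattice points of (eps Z)^2 inside the box [-2, 2]^2
k = np.arange(-40, 41)
X1, X2 = np.meshgrid(eps * k, eps * k, indexing="ij")
pts = np.stack([X1.ravel(), X2.ravel()], axis=1)

quad = np.einsum("ni,ij,nj->n", pts, A, pts)   # (x, A x)
vx = pts @ v                                   # (v, x)
lattice_sum = np.exp(-g1 * vx**2 / eps - quad / (2 * eps)).sum()

prediction = (2 * np.pi / eps) ** (d / 2) / np.sqrt(-np.linalg.det(A))
rel_err = abs(lattice_sum / prediction - 1.0)
print(lattice_sum, prediction, rel_err)
```

For a purely quadratic exponent the agreement is far better than the $O(\rho)$ error in (10.2.33), which accounts for the deviation of $F$ from its quadratic approximation.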
Proof We have to construct a defective unit flow fA,B from A to B that reproduces
the upper bound from Proposition 10.15. The construction of the flow is a bit more
artistic than the construction of the approximate harmonic function. The idea is to
channel the flow through a certain neighbourhood Gε of z∗ that plays a rôle similar
to that of the cube Dε in Sect. 10.2 (recall Fig. 10.2). We will construct the flow
from three pieces, fA , f, fB . Here, fA is a unit flow from A to ∂A Gε , fB is a unit
flow from ∂B Gε to B, and f is a defective unit flow from ∂A Gε to ∂B Gε associated
with the approximate harmonic function g that we used in the upper bound (recall
Fig. 10.1 and see Fig. 10.2). In fact, for x ∈ Gε we set
$$f(x, x + \varepsilon e_\ell) = \frac{\tilde\mu_\varepsilon(x)\, r_\ell\,\bigl[g(x + \varepsilon e_\ell) - g(x)\bigr]}{N(g)}, \tag{10.3.2}$$
where $N(g)$ is the normalising constant so that the total flow out of $\partial_A G_\varepsilon$ equals 1, which is given by
$$N(g) = \sum_{x \in \partial_A G_\varepsilon}\ \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} \tilde\mu_\varepsilon(x)\, r_\ell\,\bigl[g(x + \varepsilon e_\ell) - g(x)\bigr]. \tag{10.3.3}$$
This is essentially a directed nearest-neighbour random walk with drift in the direction of $\check v$. Recall that $\check v$ is the direction of steepest descent of $F$ at the saddle point $z^*$. This implies, in particular, that with probability tending to one paths starting in $\partial_A G_\varepsilon$ stay within a larger cylinder in the direction of $\check v$ with base containing $\partial_A G_\varepsilon$ before leaving $D_\varepsilon$.
The choice of the flow into ∂A Gε is rather arbitrary. Ideally, we would like to
take disjoint paths from A to each point in ∂A Gε and send the flow through this
path. Of course, this is not possible when e.g. A is a single point, since in that case
paths need to merge. However, this is not really a problem because these parts of the
paths will not give any relevant contributions to the capacity anyway. Likewise, the
flow arriving in ∂B Gε will be channeled into B along coalescing paths. Figure 10.2
depicts this choice.
We will only consider paths from A to B that enter Gε through ∂A Gε and exit Gε
through ∂B Gε . By construction, this set of paths has Pf -probability at least 1 − o(1).
Any such path consists of three pieces: γ1 : A → ∂A Gε , γ2 : ∂A Gε → ∂B Gε and
γ3 : ∂B Gε → B. Consequently,
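The term $K_2$ gives the desired contribution (recall (10.3.2)). Using (10.2.10–10.2.11) and the explicit form of $g$ in (10.2.19–10.2.20), we get (recall (10.1.2))
$$\begin{aligned} K_2 &= \sum_{(x,y) \in \gamma_2} \frac{f((x, y))}{\tilde\mu_\varepsilon(x)\, p_\varepsilon(x, y)} = \frac{1}{N(g)} \sum_{(x,y) \in \gamma_2} \bigl[g(y) - g(x)\bigr]\\ &= \frac{1}{N(g)} \bigl[g\bigl(\gamma_2(|\gamma_2|)\bigr) - g\bigl(\gamma_2(0)\bigr)\bigr]\\ &= \frac{1}{N(g)} \sqrt{\frac{|\hat\gamma_1|}{2\pi\varepsilon}} \int_{-C\sqrt{\varepsilon\ln(1/\varepsilon)}}^{C\sqrt{\varepsilon\ln(1/\varepsilon)}} e^{-|\hat\gamma_1| u^2/2\varepsilon}\, du\\ &= \frac{1}{N(g)} \bigl[1 - O(\varepsilon^C)\bigr]. \end{aligned} \tag{10.3.7}$$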
The terms K1 and K3 are negligible, provided the paths γ1 , γ3 stay within the
level set {x ∈ Sε : F (x) ≤ F (z∗ ) − C ε ln(1/ε)}, which can always be achieved due
to our assumptions on the function F . Namely, even the crudest possible bound
f ((x, y)) ≤ 1 implies that (recall (10.1.1)–(10.1.2))
fA ((x, y)) 1
K1 = ≤
με (x)pε (x, y) με (x)pε (x, y)
(x,y)∈γ1 (x,y)∈γ1
∗ )/2ε−C ln(1/ε) 1
≤ C|γ1 |eF (z , (10.3.8)
N (g)
provided C is large enough. The last estimate in (10.3.8) will be shown in (10.3.18–
10.3.19) below. The same argument applies to K3 and fB . Hence, we obtain with
10.3 Lower bounds on capacities 261
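In order to apply Lemma 9.4, we must ensure that the accumulated defect is negligible. Recall from Lemma 9.4 that the error factor is bounded by
$$\prod_{k=1}^{M} \Bigl(1 + \max_{y \in A_k} \frac{\delta(y)}{F(y)}\Bigr)^{-1}, \tag{10.3.10}$$
where $M$ is the length of the path and $A_k$, $k = 1, \dots, M$, are the sets defined in (7.3.31). For our choice of the flow,
$$\frac{\delta(y)}{F(y)} = \frac{(\tilde L_\varepsilon g)(y)}{\sum_{z \in G_\varepsilon:\ g(y) < g(z)} \mu_\varepsilon(y)\, p_\varepsilon(y, z)\,[g(z) - g(y)]}. \tag{10.3.11}$$
On the other hand, the paths in $G_\varepsilon$ have length at most $\rho/\varepsilon$, so that
$$\prod_{k=1}^{M} \Bigl(1 + \max_{y \in A_k} \frac{\delta(y)}{F(y)}\Bigr)^{-1} \ge \bigl[1 + O(\rho^2)\bigr]^{-\rho/\varepsilon} \ge 1 - O\bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\bigr), \tag{10.3.13}$$
$$F(x) = -\tfrac12 |\check\gamma_1|\, a_1^2 + \tfrac12 \sum_{i=2}^{d} \check\gamma_i\, a_i^2 + O\bigl(\varepsilon^{3/2}\bigr), \tag{10.3.14}$$
where we recall (10.1.13) and (10.2.16). For points in the front base, $a_1 = C\sqrt{\varepsilon\ln(1/\varepsilon)}$ and $a_i \le C\sqrt{\varepsilon\ln(1/\varepsilon)}$, $i = 2, \dots, d$, so by making $C$ large we can achieve our objective.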
Fig. 10.3 The cylinder Gε , and the pieces ∂A Gε and ∂B Gε of ∂Gε at the front and end base of
the cylinder
Let $\partial_B G_\varepsilon$ be the end base of the cylinder, in the direction of $B$. Let $\partial_A G_\varepsilon$ be the central part of radius $C'\sqrt{\varepsilon\ln(1/\varepsilon)}$ with $C' < C$ of the front base of the cylinder, in the direction of $A$. Any choice of $C' < C$ will actually be fine. The cylinder $G_\varepsilon$ is depicted in Fig. 10.3.
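By Lemma 10.13, inside $G_\varepsilon$ we may work with the modified Dirichlet form given in (10.2.13). The boundary $\partial G_\varepsilon$ of $G_\varepsilon$ consists of three disjoint pieces, $\partial G_\varepsilon = \partial_A G_\varepsilon \cup \partial_B G_\varepsilon \cup \partial_r G_\varepsilon$, where $\partial_r G_\varepsilon$ is simply what is left over after the other two pieces are removed. Let $g$ be the approximate harmonic function defined in (10.2.3) and (10.2.20). Proceeding along the lines of (10.2.32)–(10.2.33), we see that
$$\mathcal{E}_{G_\varepsilon}(g, g) = \bigl[1 + o(1)\bigr] \sum_{x \in G_\varepsilon} \tilde\mu_\varepsilon(x) \sum_{\ell=1}^{d} r_\ell\,\bigl[g(x + \varepsilon e_\ell) - g(x)\bigr]^2. \tag{10.3.15}$$
$$\begin{aligned} \sum_{x \in G_\varepsilon} \tilde\mu_\varepsilon(x) \sum_{\ell=1}^{d} r_\ell\,\bigl[g(x + \varepsilon e_\ell) - g(x)\bigr]^2 &= -\sum_{x \in G_\varepsilon} \tilde\mu_\varepsilon(x)\, g(x)\,(\tilde L_\varepsilon g)(x)\\ &\quad + \sum_{x \in \partial G_\varepsilon}\ \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} r_\ell\,\tilde\mu_\varepsilon(x)\, g(x)\,\bigl[g(x) - g(x + \varepsilon e_\ell)\bigr]. \end{aligned} \tag{10.3.16}$$
$${} = \bigl[1 + o(1)\bigr] \sum_{x \in \partial_A G_\varepsilon}\ \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} r_\ell\,\tilde\mu_\varepsilon(x)\,\bigl[g(x) - g(x + \varepsilon e_\ell)\bigr]. \tag{10.3.18}$$
Hence
$$\begin{aligned} \tilde{\mathcal{E}}_{G_\varepsilon}(g, g) &= \bigl[1 + o(1)\bigr] \sum_{x \in \partial_A G_\varepsilon}\ \sum_{\substack{1 \le \ell \le d:\\ x + \varepsilon e_\ell \in G_\varepsilon}} r_\ell\,\tilde\mu_\varepsilon(x)\,\bigl[g(x) - g(x + \varepsilon e_\ell)\bigr]\\ &= \bigl[1 + o(1)\bigr]\, N(g), \end{aligned} \tag{10.3.19}$$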
1. The class of models treated in this chapter served as the starting point of the
potential-theoretic approach to metastability initiated in Bovier, Eckhoff, Gayrard
and Klein [33]. This paper was leaning heavily on renewal ideas, and identified
prefactors of capacities and mean hitting times only up to constants.
The first steps towards describing metastability for models with a non-discrete state
space lead to finite-dimensional diffusions. These are the processes originally stud-
ied by Freidlin and Wentzell [115]. In the case of gradient drifts we are able to
recover the heuristic predictions by Eyring and Kramers explained in Sect. 2.1.1.
The presentation below contains two main parts. After describing the setting in
Sect. 11.1, we derive sharp estimates on average hitting times in Sect. 11.2. This
requires the use of sharp estimates on capacities, together with the regularity es-
timates that were presented in Sect. 9.4. In Sect. 11.3 we compute the low-lying
spectrum of the generator of the diffusion. The result is completely analogous to
the spectral result described in Chap. 8 for Markov processes with a discrete state
space. The main message of this chapter is that all the results about metastable sys-
tems that were obtained in Chap. 8 in the setting of a discrete state space carry over
to diffusions with the help of some regularity theory.
ima, and when ε is small. We assume that Xε is killed as soon as it exits Ω. This
is the natural extension of the paradigmatic double-well model of Kramers to the
multi-dimensional setting. We will show that the formalism outlined in Chap. 8 is
well suited to study the metastable behaviour of Xε in the limit as ε ↓ 0, and yields
sharp results in a relatively simple manner. Recall that the process Xε is reversible
with respect to the measure
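$$\mu_\varepsilon(dx) = \exp\bigl(-F(x)/\varepsilon\bigr)\, dx. \tag{11.1.2}$$
Assumption 11.1
(i) $\Omega \subseteq \mathbb{R}^d$ is open and connected, and $F \in C^3(\Omega)$.
(ii) If $\Omega$ is unbounded, then
(ii.1) $\lim_{\|x\| \to \infty} \bigl|\bigl(\tfrac{x}{\|x\|}, \nabla F(x)\bigr)\bigr| = \infty$,
(ii.2) $\lim_{\|x\| \to \infty} \bigl[\|\nabla F(x)\| - 2\Delta F(x)\bigr] = \infty$.
Assumption 11.1 ensures that the resolvent of the generator $-L_\varepsilon$ of $X_\varepsilon$ is compact for $\varepsilon$ sufficiently small. Moreover, it implies that $F$ has exponentially tight level sets, i.e.,
$$\int_{\{y \in \mathbb{R}^d:\ F(y) \ge a\}} e^{-F(y)/\varepsilon}\, dy \le C e^{-a/\varepsilon} \qquad \forall\, a > 0, \tag{11.1.5}$$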
that the precise choice of the hitting set is not important, and that the problem of
computing τA is virtually equivalent to computing the escape time from a suitably
chosen neighbourhood of x, provided this neighbourhood contains the relevant sad-
dle points connecting x and y. Figure 11.1 schematically depicts the motion of a
particle in a two-well potential.
The basis for the success of the potential-theoretic approach to metastability is the fact that capacities can be estimated sharply. Recall the definition of the communication height $\Phi(A, B)$ and the essential gate $\mathcal{G}(A, B)$ between two disjoint sets $A$, $B$, as introduced in Definition 10.2.
Theorem 11.2 (Capacity asymptotics) Assume that $A, B \subset \mathbb{R}^d$ are closed and disjoint such that
(i) $\mathrm{dist}(\mathcal{G}(A, B), A \cup B) \ge \delta > 0$ for some $\delta$ independent of $\varepsilon$.
(ii) Both $A$ and $B$ contain a closed ball of radius at least $\varepsilon$.
If $\mathcal{G}(A, B) = \{z_1^*, \dots, z_n^*\}$, then
$$\mathrm{cap}(A, B) = e^{-\Phi(A,B)/\varepsilon}\, \frac{(2\pi\varepsilon)^{d/2}}{2\pi} \sum_{i=1}^{n} \frac{[-\lambda_1^*(z_i^*)]}{\sqrt{-\det(\nabla^2 F(z_i^*))}}\, \Bigl[1 + O\bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\bigr)\Bigr], \tag{11.2.2}$$
where $\lambda_1^*(z_i^*)$ denotes the negative eigenvalue of the Hessian of $F$ at $z_i^*$.
268 11 Diffusion Processes with Gradient Drift
(11.2.3)
In the special case where there is only one saddle point z∗ , (11.2.3) reduces to
the classical Eyring-Kramers formula in (2.1.2):
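$$E_{x_i}[\tau_D] = \frac{2\pi\, e^{[F(z^*) - F(x_i)]/\varepsilon}}{[-\lambda_1^*(z^*)]} \sqrt{\frac{-\det(\nabla^2 F(z^*))}{\det(\nabla^2 F(x_i))}}\; \Bigl[1 + O\bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\bigr)\Bigr]. \tag{11.2.4}$$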
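In one dimension the Eyring-Kramers formula can be compared with a numerically exact mean hitting time. The sketch below is an illustration under stated assumptions: it uses the hypothetical symmetric double well $F(x) = x^4/4 - x^2/2$, the generator $\varepsilon\,d^2/dx^2 - F'(x)\,d/dx$, and the classical one-dimensional formula $E_{x}[\tau_a] = \varepsilon^{-1}\int_a^{x} dy\, e^{F(y)/\varepsilon}\int_y^\infty dz\, e^{-F(z)/\varepsilon}$ for hitting a point $a$ to the left of $x$; the upper cut-off of the inner integral is a numerical convenience.

```python
import numpy as np

def F(x):
    # Hypothetical double well: minima at x = -1 and x = +1, saddle at x = 0
    return x**4 / 4 - x**2 / 2

def mean_hitting_time(eps, a=-1.0, x0=1.0, upper=3.0, n=40001):
    """E_{x0}[tau_a] for the 1D diffusion with generator eps*d^2/dx^2 - F'*d/dx,
    evaluated with the trapezoidal rule on a uniform grid."""
    z = np.linspace(a, upper, n)
    dz = z[1] - z[0]
    w = np.exp(-F(z) / eps)
    seg = 0.5 * (w[:-1] + w[1:]) * dz
    # tail[i] approximates the integral of exp(-F/eps) over [z_i, upper]
    tail = np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])
    idx = np.searchsorted(z, x0 + 1e-9)        # keep grid points y <= x0
    y = z[:idx]
    integrand = np.exp(F(y) / eps) * tail[:idx] / eps
    return np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(y))

def eyring_kramers(eps):
    """Right-hand side of (11.2.4) in d = 1, without the error term:
    F''(0) = -1 at the saddle, F''(1) = 2 at the minimum, barrier height 1/4."""
    return 2 * np.pi / np.sqrt(1.0 * 2.0) * np.exp(0.25 / eps)

eps = 0.02
exact = mean_hitting_time(eps)
predicted = eyring_kramers(eps)
ratio = exact / predicted
print(f"exact = {exact:.4e}, Eyring-Kramers = {predicted:.4e}, ratio = {ratio:.4f}")
```

The ratio should approach 1 as $\varepsilon \downarrow 0$; at the value of $\varepsilon$ used here a deviation of a few percent is expected from the next-order Laplace corrections.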
To prove Theorems 11.2–11.3, we follow the general strategy outlined in Sect. 9.1.
In the present section we derive rough estimates on capacities. Via the renewal es-
timate in (9.4.6) these lead to rough estimates on harmonic functions. These will in
turn lead to sharp estimates on capacities and the equilibrium potential.
Lemma 11.4 Let $D \subset \mathbb{R}^d$ be closed, and let $x \in D^c$ be such that $d(x, D) \ge \rho$ for some $0 < \rho \le \varepsilon$. Let $z^* = z^*(x, D)$ be any point in $\mathcal{G}(x, D)$. Then there are constants $C_\ell, C_u > 0$ independent of $\varepsilon$ such that, for $\varepsilon$ small enough,
$$C_\ell\,\rho^{d-1}\, e^{-F(z^*)/\varepsilon} \le \mathrm{cap}\bigl(B_\rho(x), D\bigr) \le C_u\,\varepsilon\rho^{-1}\, e^{-F(z^*)/\varepsilon}. \tag{11.2.5}$$
Proof To prove the lower bound, we use the Dirichlet principle (recall Theorem 7.33) and monotonicity. We begin by choosing a smooth path $\omega$ from $x$ to $D$ remaining in the level set $\{z \in \mathbb{R}^d : F(z) \le F(z^*)\}$ and reaching the value $F(z^*)$ only when passing through $z^*$. (The canonical path can be constructed by using pieces of the deterministic trajectory of the unperturbed equation $dX_\varepsilon(t) = -\nabla F(X_\varepsilon(t))\,dt$ in a rather obvious manner, but this is not important.) Given this path, we parametrise it by arc-length, so that $\|\dot\omega(t)\|_2 = 1$ for all $t$.
Given $\omega(t)$, we consider the tube of width $\rho$ around $\omega(t)$:
$$\omega_\rho = \bigl\{z \in \mathbb{R}^d : \exists\, t \in [0, |\omega|] \text{ such that } \|\omega(t) - z\|_2 \le \rho\bigr\}. \tag{11.2.6}$$
11.2 Capacity estimates and mean hitting times 269
Let $D_\rho$ denote the $(d - 1)$-dimensional disk of radius $\rho$ centred at the origin. The important fact to note is that, for any $h$,
$$\bigl\|\nabla h\bigl(\omega(t) + z_\perp\bigr)\bigr\|_2^2 \ge \Bigl|\frac{d}{dt} h\bigl(\omega(t) + z_\perp\bigr)\Bigr|^2. \tag{11.2.7}$$
The minimisation problem is now trivial, i.e., it decomposes for each fixed $z_\perp$ into a one-dimensional problem whose solution is well known. In fact, the minimiser $h_{z_\perp}(t)$ is the solution of the 1-dimensional Dirichlet problem
$$\begin{aligned} \Bigl(-\varepsilon\frac{d}{dt} + \frac{d}{dt} F\bigl(\omega(t) + z_\perp\bigr)\Bigr) \frac{d}{dt} h_{z_\perp}(t) &= 0,\\ h_{z_\perp}(0) &= 1,\\ h_{z_\perp}(|\omega|) &= 0, \end{aligned} \tag{11.2.9}$$
From here the lower bound in (11.2.5) follows from simple saddle point evaluations of the integral in the denominator.
The upper bound is obtained by choosing a suitable test function that changes from zero to one over a distance $\rho$ along a plane separating $B_\rho(x)$ and $D$ passing at distance $\rho$ from $x$.
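Corollary 11.5 Let $A, D \subset \mathbb{R}^d$ be disjoint sets. Let $x \in (A \cup D)^c$. Then, for $0 < \rho \le \varepsilon$, there exists a constant $C$ such that, for $\varepsilon$ small enough,
$$h_{A,D}(x) \le C\rho^{-d}\, e^{-F(z^*)/\varepsilon}\, \mathrm{cap}\bigl(B_\rho(x), A\bigr) \tag{11.2.12}$$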
In this section we prove Theorem 11.2. The proof is similar to the one for discrete
diffusions in Chap. 10.
Proof The capacity $\mathrm{cap}(A, B)$ satisfies the Dirichlet principle in Theorem 7.33, where
$$\mathcal{H}_B^A = \bigl\{h \in W^{1,2}\bigl(\mathbb{R}^d, Q(dx)\bigr) :\ h(z) \in [0, 1]\ \forall z \in \mathbb{R}^d,\ h|_A = 1,\ h|_B = 0\bigr\} \tag{11.2.14}$$
with $Q(dx) = e^{-F(x)/\varepsilon}\,dx$. For simplicity we only consider the case of a single saddle point $z^*$ (i.e., $n = 1$ in (11.2.2)). We may assume without loss of generality that $F(z^*) = 0$ and $z^* = 0$. We choose coordinates that diagonalise the Hessian of $F$ at $z^*$, so that (with $\lambda_i^* = \lambda_i^*(z^*)$)
$$F(z) = \tfrac12 \lambda_1^* z_1^2 + \tfrac12 \sum_{i=2}^{d} \lambda_i^* z_i^2 + O\bigl(\|z\|_2^3\bigr), \qquad z \to 0. \tag{11.2.15}$$
$$C_\delta = \Bigl[-\delta/\sqrt{-\lambda_1^*},\ \delta/\sqrt{-\lambda_1^*}\Bigr] \times \prod_{i=2}^{d} \Bigl[-2\delta/\sqrt{\lambda_i^*},\ 2\delta/\sqrt{\lambda_i^*}\Bigr]. \tag{11.2.16}$$
Since we have assumed that there is a single saddle point at the communication height between $A$ and $B$, it is possible to choose $\delta > 0$ so small that there exists a strip $S_\delta$ of width $2\delta/\sqrt{[-\lambda_1^*]}$, containing 0, separating $A$ and $B$ in the sense that any path connecting them must cross $S_\delta$, and such that $F(z) \ge \delta^2$ for all $z \in S_\delta\setminus C_\delta$. Let $D_A$ and $D_B$ be the connected components of $\mathbb{R}^d\setminus S_\delta$ containing $A$ and $B$, respectively.
Upper bound
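$$\begin{aligned} \Bigl(-\varepsilon\frac{d}{dz_1} + \frac{d}{dz_1} F(z_1, 0)\Bigr) \frac{d}{dz_1} f(z_1) &= 0,\\ f\bigl(-\delta/\sqrt{-\lambda_1^*}\bigr) &= 1,\\ f\bigl(+\delta/\sqrt{-\lambda_1^*}\bigr) &= 0. \end{aligned} \tag{11.2.18}$$
$$f(z_1) = \frac{\int_{z_1}^{\delta/\sqrt{[-\lambda_1^*]}} e^{F(t,0)/\varepsilon}\, dt}{\int_{-\delta/\sqrt{[-\lambda_1^*]}}^{\delta/\sqrt{[-\lambda_1^*]}} e^{F(t,0)/\varepsilon}\, dt}. \tag{11.2.19}$$
$$\begin{aligned} \mathrm{cap}(A, B) &\le \varepsilon \int_{-2\delta/\sqrt{\lambda_2^*}}^{2\delta/\sqrt{\lambda_2^*}} dz_2 \cdots \int_{-2\delta/\sqrt{\lambda_d^*}}^{2\delta/\sqrt{\lambda_d^*}} dz_d \int_{-\delta/\sqrt{[-\lambda_1^*]}}^{\delta/\sqrt{[-\lambda_1^*]}} dz_1\, e^{-F(z)/\varepsilon}\, \bigl(f'(z_1)\bigr)^2\\ &\quad + \varepsilon c^2 \bigl[-\lambda_1^*\bigr] \delta^{-2} \int_{S_\delta\setminus C_\delta} dz\, e^{-F(z)/\varepsilon}. \end{aligned} \tag{11.2.20}$$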
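The one-dimensional minimiser in (11.2.19) has energy $\varepsilon/\int e^{F(t,0)/\varepsilon}\,dt$, which is what makes the upper bound sharp. The sketch below checks this numerically under illustrative assumptions: a purely quadratic model $F(t, 0) = \tfrac12\lambda_1^* t^2$ with a hypothetical $\lambda_1^* = -1$, $\delta = 1$, and a linear interpolation as a competing test function.

```python
import numpy as np

eps = 0.05
lam1 = -1.0                          # hypothetical negative Hessian eigenvalue at the saddle
delta = 1.0
a = -delta / np.sqrt(-lam1)          # left endpoint, where f = 1
b = delta / np.sqrt(-lam1)           # right endpoint, where f = 0

t = np.linspace(a, b, 20001)
dt = t[1] - t[0]
t_mid = 0.5 * (t[:-1] + t[1:])

def F1(s):
    # Quadratic model of F(s, 0) along the unstable direction
    return 0.5 * lam1 * s**2

def energy(f):
    """Discretised one-dimensional Dirichlet form: eps * int exp(-F/eps) f'(t)^2 dt."""
    return eps * np.sum(np.exp(-F1(t_mid) / eps) * np.diff(f) ** 2 / dt)

# Minimiser (11.2.19): f(t) = int_t^b exp(F/eps) / int_a^b exp(F/eps)
w = np.exp(F1(t) / eps)
seg = 0.5 * (w[:-1] + w[1:]) * dt
Z = seg.sum()
f_opt = 1.0 - np.concatenate([[0.0], np.cumsum(seg)]) / Z

# Competing test function with the same boundary values
f_lin = (b - t) / (b - a)

e_opt = energy(f_opt)
e_lin = energy(f_lin)
print(e_opt, eps / Z, e_lin)
```

The linear interpolation pays a much larger energy because it puts gradient where the weight $e^{-F/\varepsilon}$ is large, away from the saddle, whereas the minimiser concentrates its variation near $t = 0$.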
e−δ /ε
2
1/2 [ln(1/ε)]3/2 ) (2πε)1/2
≥ eO(ε eF (0)/ε −
[−λ∗1 ] δε −1/2
Lower bound
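$$\begin{aligned} \tilde C_\delta &= \Bigl[-2\delta/\sqrt{-\lambda_1^*},\ 2\delta/\sqrt{-\lambda_1^*}\Bigr] \times \prod_{i=2}^{d} \Bigl[-\delta/\sqrt{(d-1)\lambda_i^*},\ \delta/\sqrt{(d-1)\lambda_i^*}\Bigr]\\ &= \Bigl[-2\delta/\sqrt{-\lambda_1^*},\ 2\delta/\sqrt{-\lambda_1^*}\Bigr] \otimes \tilde C_\delta^\perp. \end{aligned} \tag{11.2.27}$$
Let $h^*$ denote the minimiser of the variational problem in (11.2.13), i.e., the equilibrium potential of the capacitor $(A, B)$. Then
$$\inf_{h \in \mathcal{H}_B^A} \mathcal{E}(h, h) = \mathcal{E}(h^*, h^*) \ge \mathcal{E}_{\tilde C_\delta}(h^*, h^*), \tag{11.2.28}$$
where $\mathcal{E}_{\tilde C_\delta}$ is the restriction of the Dirichlet form to the domain $\tilde C_\delta$. Obviously,
$$\begin{aligned} \mathcal{E}_{\tilde C_\delta}(h, h) \ge \bar{\mathcal{E}}_{\tilde C_\delta}(h, h) &= \varepsilon \int_{\tilde C_\delta} dz\, e^{-F(z)/\varepsilon}\, \Bigl|\frac{\partial h(z)}{\partial z_1}\Bigr|^2\\ &= \varepsilon \int_{\tilde C_\delta^\perp} dz_\perp \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dz_1\, e^{-F(z)/\varepsilon}\, \Bigl|\frac{\partial h(z_1, z_\perp)}{\partial z_1}\Bigr|^2\\ &\ge \varepsilon \int_{\tilde C_\delta^\perp} dz_\perp \inf_{f:\ f(\pm 2\delta/\sqrt{[-\lambda_1^*]}) = h^*(\pm 2\delta/\sqrt{[-\lambda_1^*]},\, z_\perp)} \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dz_1\, e^{-F(z)/\varepsilon}\, \bigl(f'(z_1)\bigr)^2. \end{aligned} \tag{11.2.29}$$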
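For fixed values of $z_\perp$, the minimiser of this problem is the solution of the Dirichlet problem
$$\begin{aligned} \Bigl(-\varepsilon\frac{d}{dz_1} + \frac{d}{dz_1} F(z_1, z_\perp)\Bigr) \frac{d}{dz_1} f(z_1) &= 0,\\ f\bigl(-2\delta/\sqrt{-\lambda_1^*}\bigr) &= h^*\bigl(-2\delta/\sqrt{-\lambda_1^*}, z_\perp\bigr),\\ f\bigl(+2\delta/\sqrt{-\lambda_1^*}\bigr) &= h^*\bigl(2\delta/\sqrt{-\lambda_1^*}, z_\perp\bigr). \end{aligned} \tag{11.2.30}$$
Set $a = h^*(-2\delta/\sqrt{[-\lambda_1^*]}, z_\perp)$, $b = h^*(2\delta/\sqrt{[-\lambda_1^*]}, z_\perp)$ and $g(z_1) = F(z_1, z_\perp)$. Then the general solution of the differential equation in (11.2.30) is
$$f(z_1) = c \int_{z_1}^{s} e^{g(t)/\varepsilon}\, dt, \tag{11.2.31}$$
where the constants $c$ and $s$ are determined by the boundary conditions, i.e.,
$$\begin{aligned} c \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{s} e^{g(t)/\varepsilon}\, dt &= a,\\ c \int_{2\delta/\sqrt{[-\lambda_1^*]}}^{s} e^{g(t)/\varepsilon}\, dt &= b, \end{aligned} \tag{11.2.32}$$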
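Inserting this solution into (11.2.29) and recalling the definition of $g$, we obtain
$$\begin{aligned} \mathcal{E}_{\tilde C_\delta}(h^*, h^*) &\ge \varepsilon \int_{\tilde C_\delta^\perp} dz_\perp \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dz_1\, \frac{e^{-F(z_1, z_\perp)/\varepsilon}\, \bigl[h^*\bigl(-2\delta/\sqrt{[-\lambda_1^*]}, z_\perp\bigr)\bigr]^2\, e^{2F(z_1, z_\perp)/\varepsilon}}{\Bigl(\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{s(z_\perp)} e^{F(t, z_\perp)/\varepsilon}\, dt\Bigr)^2}\\ &= \varepsilon \int_{\tilde C_\delta^\perp} dz_\perp\, \frac{\bigl[h^*\bigl(-2\delta/\sqrt{[-\lambda_1^*]}, z_\perp\bigr) - h^*\bigl(2\delta/\sqrt{[-\lambda_1^*]}, z_\perp\bigr)\bigr]^2}{\int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} e^{F(t, z_\perp)/\varepsilon}\, dt}. \end{aligned} \tag{11.2.35}$$
$$\begin{aligned} \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} e^{F(t, z_\perp)/\varepsilon}\, dt &= e^{\sum_{i=2}^{d} \lambda_i^* z_i^2/2\varepsilon} \int_{-2\delta/\sqrt{[-\lambda_1^*]}}^{2\delta/\sqrt{[-\lambda_1^*]}} dt\, e^{-[-\lambda_1^*] t^2/2\varepsilon + O(\delta^3/\varepsilon)}\\ &\le \frac{\sqrt{2\pi\varepsilon}}{\sqrt{[-\lambda_1^*]}}\, e^{\sum_{i=2}^{d} \lambda_i^* z_i^2/2\varepsilon + O(\delta^3/\varepsilon)}, \end{aligned} \tag{11.2.36}$$
and so
$$\begin{aligned} \mathcal{E}_{\tilde C_\delta}(h^*, h^*) &\ge \sqrt{\frac{\varepsilon[-\lambda_1^*]}{2\pi}} \int_{\tilde C_\delta^\perp} dz_\perp\, \exp\Bigl(-\sum_{i=2}^{d} \frac{\lambda_i^* z_i^2}{2\varepsilon} + O(\delta^3/\varepsilon)\Bigr)\\ &\quad \times \Bigl[h^*\bigl(-2\delta/\sqrt{-\lambda_1^*}, z_\perp\bigr) - h^*\bigl(2\delta/\sqrt{-\lambda_1^*}, z_\perp\bigr)\Bigr]^2. \end{aligned} \tag{11.2.37}$$
The following lemma shows how close the values of $h^*$ appearing in (11.2.37) are to 0 and 1, respectively.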
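Lemma 11.6 Uniformly in $z_\perp \in \tilde C_\delta^\perp$,
$$\begin{aligned} 1 - h^*\bigl(-2\delta/\sqrt{-\lambda_1^*}, z_\perp\bigr) &\le C\varepsilon^{-d/2}\, e^{-\delta^2/4\varepsilon},\\ h^*\bigl(2\delta/\sqrt{-\lambda_1^*}, z_\perp\bigr) &\le C\varepsilon^{-d/2}\, e^{-\delta^2/4\varepsilon}. \end{aligned} \tag{11.2.38}$$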
Proof Use that h∗ (z) = Pz [τA < τB ] = hA,B (z) in combination with Corol-
lary 11.5.
and
$$E_x\bigl[\tau_D \mathbf{1}_{\tau_D < \tau_A}\bigr] = \frac{\int_{(A \cup D)^c} dy\, e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), A \cup D}(y)\, h_{D,A}(y)}{\mathrm{cap}(B_\varepsilon(x), A \cup D)}\, \bigl[1 + O(\varepsilon^{\alpha/2})\bigr]. \tag{11.2.41}$$
Proof The proofs of (11.2.40) and (11.2.41) are analogous, and therefore we will only give the former. The strategy is to first use Corollary 7.30 with $A$ a small ball around $x$, and then to use that the average exit time does not vary much as a function of the starting point on this ball. In fact, by Corollary 7.30,
$$\int \nu_{B_\varepsilon(x), D}(dy)\, E_y\bigl[\tau_D \mathbf{1}_{\tau_D < \tau_A}\bigr] = \frac{\int_{(A \cup D)^c} dy\, e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x), A \cup D}(y)\, h_{D,A}(y)}{\mathrm{cap}(B_\varepsilon(x), A \cup D)}. \tag{11.2.42}$$
Recall that $w_D(y) = E_y[\tau_D]$, $y \notin D$, solves the Dirichlet problem in (7.2.74) with $g = 1$. Let $B_{R_0}(x)$ be the ball of radius $R_0$ centred at $x$, where $x$ is a critical point of $F$. Then there is a $K < \infty$ such that $\sup_{y \in B_{R_0}(x)} \|\nabla F(y)\|_\infty \le K R_0$ for $R_0$ small enough. Hence Lemmas 9.8–9.9 applied to this function have uniform constants when $R_0 \le \sqrt{\varepsilon}$. Thus, by Lemma 9.13, $w_D$ inherits from Lemma 9.8 the uniform Harnack bound
$$\sup_{y \in B_{\sqrt{\varepsilon}}(x)} w_D(y) \le C \inf_{y \in B_{\sqrt{\varepsilon}}(x)} w_D(y). \tag{11.2.43}$$
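and implies
$$E_{x_j}[\tau_{S_k}] = \frac{1}{\mathrm{cap}(B_\varepsilon(x_j), S_k)}\, \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{\det(\nabla^2 F(x_j))}}\, e^{-F(x_j)/\varepsilon}\, \Bigl[1 + O\bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3},\ \varepsilon^{\alpha/2}\bigr)\Bigr]. \tag{11.2.46}$$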
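Proof Fix $j > k$. Consider the set $\Gamma_j = \{y \in \Omega : F(y) \le \Phi(x_j, S_k) + \delta\}$ for $\delta > 0$ sufficiently small. Decompose $\Gamma_j$ into its connected components: $\Gamma_j = \cup_{\tilde\iota}\, \Gamma_j(\tilde\iota)$. Write
$$\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y) = \int_{\Gamma_j^c} dy\, e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y) + \sum_{\tilde\iota} \int_{\Gamma_j(\tilde\iota)} dy\, e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y). \tag{11.2.47}$$
$$0 \le h_{S_k, B_\varepsilon(x_j)}(y) \le C\varepsilon^{-d/2}\, e^{-[\Phi(x_{\tilde\iota}, S_k) - \Phi(x_{\tilde\iota}, x_j)]/\varepsilon}, \tag{11.2.49}$$
which is exponentially small. On the other hand, if $x_{\tilde\iota}$ denotes the absolute minimum of $F$ within $\Gamma_j(\tilde\iota)$ and the Hessian $\nabla^2 F(x_{\tilde\iota})$ at this minimum is non-degenerate, then
$$\int_{\Gamma_j(\tilde\iota)\setminus S_k} dy\, e^{-F(y)/\varepsilon} = \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{\det(\nabla^2 F(x_{\tilde\iota}))}}\, e^{-F(x_{\tilde\iota})/\varepsilon}\, \Bigl[1 + O\bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\bigr)\Bigr] \tag{11.2.50}$$
by standard Laplace asymptotics. Combining (11.2.48)–(11.2.50), we get
$$\sum_{\tilde\iota \in L} \int_{\Gamma_j(\tilde\iota)} dy\, e^{-F(y)/\varepsilon}\, h_{B_\varepsilon(x_j), S_k}(y) = \sum_{\tilde\iota \in L} \frac{(2\pi\varepsilon)^{d/2}}{\sqrt{\det(\nabla^2 F(x_{\tilde\iota}))}}\, e^{-F(x_{\tilde\iota})/\varepsilon}\, \Bigl[1 + O\bigl(\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\bigr)\Bigr]. \tag{11.2.51}$$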
Note that, under our assumptions on F , xj is the unique value in the sum over ι̃ for
which F (xι̃ ) takes its minimal value, and hence the sum is dominated by this single
term.
The terms with ι̃ ∈ R cannot be computed as precisely, but they are negligible.
Indeed, note that, under our assumptions, all components Γj (ι̃) that do not intersect
Sk give a contribution that is smaller than exp(−F (xj )/ε) times an exponentially
small factor, and hence are negligible compared to what we get from (11.2.51).
Proof The proof is immediate by inserting the formula for the capacity in Theorem 11.2 into (11.2.46), except for the error terms of order $\varepsilon^{\alpha/2}$, which we will show can be removed. Namely, note that nothing changes in the proof of Proposition 11.8 when we replace the starting point $x_j$ by some point $x \in B_{\sqrt{\varepsilon}}(y)$. Also, inspection of the proof of Theorem 11.2 shows that the difference between $\mathrm{cap}(B_\varepsilon(x_j), S_k)$ and $\mathrm{cap}(B_\varepsilon(x), S_k)$ for $x \in B_{\sqrt{\varepsilon}}(y)$ is in fact much smaller than the error terms. Thus, we get
$$\mathrm{osc}_{x \in B_{\sqrt{\varepsilon}}(x_j)}\, E_x[\tau_{S_k}] \le C\Bigl(\varepsilon^{\alpha/2} + \sqrt{\varepsilon[\ln(1/\varepsilon)]^3}\Bigr)\, E_{x_j}[\tau_{S_k}], \tag{11.2.53}$$
which improves the input in the Hölder estimate by a factor $\varepsilon^{\alpha/2}$, which in turn allows us to improve the error estimates in Proposition 11.8 from $\varepsilon^{\alpha/2}$ to $\varepsilon^{\alpha}$. Iterating this procedure, we can reduce these errors until they are of the same order as $\sqrt{\varepsilon[\ln(1/\varepsilon)]^3}$.
In this section we turn to the analysis of the low-lying spectrum of the generator (11.1.3) with Dirichlet boundary conditions on $\Omega^c$ (when $\Omega \ne \mathbb{R}^d$). The strategy
we follow is similar to that outlined in Sect. 8.4 in the context of discrete state
spaces. The additional input that is needed is again the regularity estimates.
Assumption 11.1 on F ensures that the spectrum of Lε is discrete. Moreover, it
is well known from Wentzell-Freidlin theory [115] that the spectrum has precisely
one exponentially small eigenvalue for each local minimum of the function F . We
show how to get sharp estimates as ε ↓ 0.
11.3 Spectral theory 279
In Sect. 11.3.1 we state our main results. In Sect. 11.3.2 we derive a priori lower
bounds on the spectrum. In Sect. 11.3.3 we look at the principal Dirichlet eigen-
value, in Sect. 11.3.4 at the small eigenvalues. In Sect. 11.3.5 we derive improved
error estimates. In Sect. 11.3.6 we show that the exit times are asymptotically expo-
nentially distributed.
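$$\lambda_1 = 0 \tag{11.3.2}$$
and
$$\begin{aligned} \lambda_k &= \frac{\mathrm{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}\, \bigl[1 + O(e^{-\delta/\varepsilon})\bigr]\\ &= \frac{1}{E_{x_k}[\tau_{S_{k-1}}]}\, \bigl[1 + O(e^{-\delta/\varepsilon})\bigr]\\ &= \frac{[-\lambda_1^*(z^*(x_k, M_{k-1}))]}{2\pi} \sqrt{\frac{\det(\nabla^2 F(x_k))}{-\det(\nabla^2 F(z^*(x_k, M_{k-1})))}} \end{aligned}$$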
where λ∗1 (z∗ ) denotes the unique negative eigenvalue of the Hessian of F at the
saddle point z∗ .
The conditions in (11.3.1) state that “all valleys of F have different depth”,
which is the generic situation. This is analogous to the condition that M be a regular
set of metastable points made in Chap. 8.
In the course of the proof of Theorem 11.9 we also obtain detailed control on the
eigenfunctions of −Lε corresponding to the small eigenvalues (see Fig. 11.3 for a
schematic representation of the first two eigenfunctions in a double-well potential
in one dimension).
Theorem 11.12 (Exponential law of metastable exit times) Suppose that the as-
sumptions of Theorem 11.9 are satisfied. Let D ⊂ Rd be a closed subset such that:
(i) If $M_k = \{y_1, \dots, y_l\} \subset M$ enumerates all the minima of $F$ such that $F(y_l) \le F(x_k)$, then $\bigcup_{l=1}^{k} B_\varepsilon(y_l) \subset D$.
(ii) dist(z∗ (xi , Mi ), D) ≥ δ > 0 for some δ > 0 independent of ε.
Then there exists a δ > 0, independent of ε and t, such that, for all t > 0,
−δ/ε
Pxk τD > t Exk [τD ] = 1 + O e−δ/ε e−t[1+O(e )]
−tλ Ex [τ ]
× O e−δ/ε e l k D + O(1) e−tO(ε ) Exk [τD ] .
d−1
l>k (11.3.6)
In this section we derive a priori lower bounds on principal eigenvalues for the Dirichlet problem in regular open sets $D \subset \Omega \subseteq \mathbb{R}^d$. The closure of $D$ is denoted by $\bar D$, the complement by $D^c$, and the boundary by $\partial D$. We denote by $\lambda_0^{D^c}$ the principal (= smallest) eigenvalue of the Dirichlet problem
We sometimes use the notation LεD to indicate the Dirichlet operator corresponding
to (11.3.7).
The following lemma improves the Donsker-Varadhan estimate in Lemma 8.16
when D is unbounded.
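$$\lambda_0^{D^c} \ge \frac{1}{\sup_{x \in A} E_x[\tau_{D^c}]} \Bigl(1 - \int_{D\setminus A} dy\, e^{-F(y)/\varepsilon}\, \phi_D^2(y)\Bigr). \tag{11.3.8}$$
Moreover, for any $\delta > 0$, there exists a bounded set $A \subset D$, independent of $\varepsilon$, such that
$$\lambda_0^{D^c} \ge \frac{1 - \delta}{\sup_{x \in A} E_x[\tau_{D^c}]}. \tag{11.3.9}$$
For $B \subset D$,
$$\lambda_0^{D^c \cup B} \ge \frac{1}{\sup_{x \in A} E_x[\tau_B \mid \tau_B \le \tau_{D^c}]} \Bigl(1 - \int_{D\setminus A} dy\, e^{-F(y)/\varepsilon}\, \phi_{D\setminus B}^2(y)\Bigr). \tag{11.3.10}$$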
$$\begin{aligned} (-L_\varepsilon w)(x) &= 1, & x &\in D,\\ w(x) &= 0, & x &\in D^c. \end{aligned} \tag{11.3.11}$$
Using that, for any $a, b \in \mathbb{R}$ and $C > 0$, $ab \le \tfrac12(Ca^2 + b^2/C)$, and picking $a = \phi(x + he_i)$, $b = \phi(x)$ and $C = w(x)/w(x + he_i)$, we have
where $\tilde F(y) = \min_{x \in M}[F(y) - F(x)]$. Clearly this implies (11.3.9). To see why (11.3.15) is true, set $v(y) = e^{-F(y)/2\varepsilon}\phi_D(y)$, which is the corresponding ground-state eigenfunction of the operator
$$-\bigl(e^{-F/2\varepsilon} L_\varepsilon\, e^{F/2\varepsilon}\bigr)(x) = -\varepsilon\Delta + \frac{1}{4\varepsilon}\bigl\|\nabla F(x)\bigr\|^2 - \frac12 \Delta F(x), \tag{11.3.16}$$
which is a symmetric operator on $L^2(D, dy)$. A semi-classical Agmon estimate for the ground-state eigenfunction $v$ that can be found in Helffer and Sjöstrand [138] yields
$$\int_D dy\, e^{(1 - \gamma)\tilde F(y)/\varepsilon}\, v(y)^2 < C_\gamma < \infty, \tag{11.3.17}$$
which in turn implies (11.3.15). To obtain (11.3.10), note that $w_{B,D}(x) = E_x[\tau_B \mid \tau_B \le \tau_{D^c}]$, $x \in D\setminus B$, solves the Dirichlet problem
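Lemma 11.14 Assume that $D \cap \mathcal{M}_{2\varepsilon} = \emptyset$. Then there is a finite positive constant $C$,
independent of $\varepsilon$, such that
$$
\sup_{x\in D} \mathbb{E}_x[\tau_{D^c}] \le C\varepsilon^{-2d+2} \sup_{x\in D} \int_\Omega \mathbf{1}_{F(y)\le F(x)}\, dy.
\tag{11.3.20}
$$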
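$$
\begin{aligned}
\sup_{z\in\partial B_\varepsilon(x)} \mathbb{E}_z[\tau_{D^c}]
&\le C\, \frac{\int_{\{y\in D\colon F(y)>F(x)\}} dy\, e^{-F(y)/\varepsilon}}{\mathrm{cap}(B_\varepsilon(x), D^c)}\\
&\quad + C\, \frac{1}{\mathrm{cap}(B_\varepsilon(x), D^c)} \int_{\{y\in D\colon F(y)\le F(x)\}} dy\, e^{-F(y)/\varepsilon}\,
\frac{\mathrm{cap}(B_\varepsilon(y), B_\varepsilon(x))}{\mathrm{cap}(B_\varepsilon(y), D^c)}.
\end{aligned}
\tag{11.3.24}
$$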
By our assumption on $F$, the first integral is bounded by a constant times $e^{-F(x)/\varepsilon}$
and the second integral is equal to the volume of the level set $\{y \in D\colon F(y) \le F(x)\}$. The second term in (11.3.25) is dominant.
284 11 Diffusion Processes with Gradient Drift
We can generalise the bounds obtained so far to sets D containing some of the
local minima of F . Let N ⊂ M be non-empty, and let
$$
N_\varepsilon = \bigl\{ y \in \mathbb{R}^d \colon \mathrm{dist}(y, N) \le \varepsilon \bigr\}.
\tag{11.3.27}
$$
Proof The proof is similar to that of Lemma 11.14 when combined with the estimate
on mean exit times given in Proposition 11.8. We leave the details to the reader.
Lemma 11.16 and Theorem 11.2 imply, under the assumptions of Theorem 11.9,
that
$$
\lambda_0^{D^c} \ge C_\varepsilon \min_{k\colon x_k \in N_\varepsilon} e^{-[\Phi(x_k, \mathcal{M}_k) - F(x_k)]/\varepsilon},
\tag{11.3.30}
$$
where Cε is polynomially bounded in ε. This rough bound will be made more pre-
cise in the next section.
General strategy
We first state a simple application of the Harnack and Hölder inequalities in Lem-
mas 9.8–9.9.
Proposition 11.18 Assume that D contains l local minima of the function F and
that there is a single minimum x ∈ D that realises
$$
\Phi(x, D^c) - F(x) = \max_{1\le i\le l} \bigl[\Phi(x_i, D^c) - F(x_i)\bigr].
\tag{11.3.34}
$$
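Write $B = B_\varepsilon(x)$. Then there exist $\alpha > 0$, $C < \infty$ and $\delta > 0$, independent of $\varepsilon$, such
that the principal eigenvalue $\lambda_0^{D^c}$ of the Dirichlet problem on $D$ satisfies
$$
\frac{\mathrm{cap}(B, D^c)}{\|h_{B,D^c}\|_{2,\mu_\varepsilon}^2}\,\bigl[1 - C\varepsilon^{\alpha/2}\bigr]\bigl[1 - e^{-\delta/\varepsilon}\bigr]
\le \lambda_0^{D^c} \le
\frac{\mathrm{cap}(B, D^c)}{\|h_{B,D^c}\|_{2,\mu_\varepsilon}^2}\,\bigl[1 + C\varepsilon^{\alpha/2}\bigr]\bigl[1 + e^{-\delta/\varepsilon}\bigr],
\tag{11.3.35}
$$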
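where $\|\cdot\|_{2,\mu_\varepsilon}$ denotes the $L^2$-norm with respect to the measure $\mu_\varepsilon(dy) = e^{-F(y)/\varepsilon}\,dy$. In particular,
$$
\frac{\mathrm{cap}(B_k, S_{k-1})}{\|h_{B_k,S_{k-1}}\|_{2,\mu_\varepsilon}^2}\,\bigl[1 - C\varepsilon^{\alpha/2}\bigr]\bigl[1 - e^{-\delta/\varepsilon}\bigr]
\le \bar\lambda_k \le
\frac{\mathrm{cap}(B_k, S_{k-1})}{\|h_{B_k,S_{k-1}}\|_{2,\mu_\varepsilon}^2}\,\bigl[1 + C\varepsilon^{\alpha/2}\bigr]\bigl[1 + e^{-\delta/\varepsilon}\bigr].
\tag{11.3.36}
$$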
where the boundary conditions φD are given by the actual principal eigenfunction.
We assume that dist(x, D c ) ≥ δ > 0, with δ independent of ε. Then B4√ε (x) ⊂
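$$
h^\lambda_{B,D^c}(y) = 0, \qquad y \in D^c,
$$
while $\chi^\lambda_{B,D^c}$ solves
$$
\begin{aligned}
(-L_\varepsilon - \lambda)\chi^\lambda_{B,D^c}(y) &= 0, & y &\in D\setminus\partial B,\\
\chi^\lambda_{B,D^c}(y) &= \phi_D(y) - 1, & y &\in \partial B,\\
\chi^\lambda_{B,D^c}(y) &= 0, & y &\in D^c.
\end{aligned}
\tag{11.3.41}
$$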
We want that (−Lε − λ)f λ vanishes also as a surface measure on ∂B. This
requires that there is no discontinuity in the derivative of f λ normal to ∂B, which
we can express as saying that, for g a smooth test function that vanishes on D c ,
$$
\int_{\partial B} e^{-F(y)/\varepsilon} \bigl[ g(y)\,\partial_{n(y)} f^\lambda(y) + g(y)\,\partial_{-n(y)} f^\lambda(y) \bigr]\, d\sigma_B(y) = 0,
\tag{11.3.42}
$$
where dσB (y) denotes the Euclidean surface measure on ∂B, and ∂±n(y) denotes the
normal derivative at y ∈ ∂B from the exterior and interior of B, respectively. As we
will see, it already suffices to require that this equation hold for functions g that are
equal to 1 on ∂B. In fact, we will choose g = hB,D c . To evaluate this expression, it
will be convenient to observe that hB,D c (y) = 1 for y ∈ ∂B. Moreover, hB,D c (y) =
1 on B, so that ∂−n(y) hB,D c (y) vanishes on ∂B. Using these facts, together with the
second Green identity, we get from (11.3.42) the condition
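$$
\begin{aligned}
0 &= \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, f^\lambda(y)\, d\sigma_B(y)
 - \frac{\lambda}{\varepsilon} \int_D dy\, e^{-F(y)/\varepsilon}\, h_{B,D^c}(y)\, f^\lambda(y)\\
&= \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, d\sigma_B(y)
 - \frac{\lambda}{\varepsilon} \int_D dy\, e^{-F(y)/\varepsilon}\, h_{B,D^c}(y)\, h^\lambda_{B,D^c}(y)\\
&\quad + \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, \chi^\lambda_{B,D^c}(y)\, d\sigma_B(y)
 - \frac{\lambda}{\varepsilon} \int_D dy\, e^{-F(y)/\varepsilon}\, h_{B,D^c}(y)\, \chi^\lambda_{B,D^c}(y).
\end{aligned}
\tag{11.3.43}
$$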
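(Note that the derivative $\partial_{n(y)}$ is in the direction of the interior of $B$.) The two terms
involving $\chi^\lambda_{B,D^c}$ will be naturally treated as error terms. Since $\partial_{n(y)} h_{B,D^c}(y) > 0$,
we get via Lemma 11.17 that
$$
0 \le \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y)\, \chi^\lambda_{B,D^c}(y)
\le C\varepsilon^{\alpha/2} \int_{\partial B} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_{B,D^c}(y).
\tag{11.3.44}
$$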
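Defining $\delta\chi^\lambda_{B,D^c} = \chi^\lambda_{B,D^c} - \chi^0_{B,D^c}$, we see that $\delta\chi^\lambda_{B,D^c}$ solves the Dirichlet problem
$$
\begin{aligned}
(-L_\varepsilon - \lambda)\,\delta\chi^\lambda_{B,D^c}(y) &= \lambda\,\chi^0_{B,D^c}(y), & y &\in D\setminus\partial B,\\
\delta\chi^\lambda_{B,D^c}(y) &= 0, & y &\in \partial B \cup D^c.
\end{aligned}
\tag{11.3.45}
$$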
In complete analogy with Lemma 8.21, we get the following L2 (με )-estimates.
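Lemma 11.19
(i)
$$
\bigl\|\delta\chi^\lambda_{B,D^c}\bigr\|_{2,\mu_\varepsilon} \le \frac{\lambda}{\lambda_0^{D^c\cup B} - \lambda}\, \bigl\|\chi^0_{B,D^c}\bigr\|_{2,\mu_\varepsilon}.
\tag{11.3.46}
$$
(ii)
$$
\bigl\|h^\lambda_{B,D^c} - h_{B,D^c}\bigr\|_{2,\mu_\varepsilon} \le \frac{\lambda}{\lambda_0^{D^c\cup B} - \lambda}\, \bigl\|h_{B,D^c}\bigr\|_{2,\mu_\varepsilon}.
\tag{11.3.47}
$$
(iii) For all $z \in D\setminus B$,
$$
0 \le \chi^0_{B,D^c}(z) \le C\varepsilon^{\alpha/2}\, h_{B,D^c}(z).
\tag{11.3.48}
$$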
Proof Items (i) and (ii) are the standard $L^2$-bounds as used in Lemma 8.21. Item
(iii) follows from the Poisson kernel representation of $\chi^0_{B,D^c}$,
$$
\chi^0_{B,D^c}(x) = -\varepsilon \int_{\partial B} \bigl(\phi_D(y) - 1\bigr)\, \partial_{n(y)} G_{D\setminus B}(x, y)\, d\sigma_B(y).
\tag{11.3.49}
$$
Since the normal derivative of the Green function $G_{D\setminus B}(x, y)$ is negative on $\partial B$ and
$\phi_D(y) \ge 1$ on $\partial B$, we get (11.3.48).
This yields the bounds on $\lambda_0^{D^c}$ in (11.3.35). Note that, while we have only used a
necessary condition for $\lambda_0^{D^c}$, the fact that there must be such an eigenvalue implies
that it actually lies between the bounds given by (11.3.51).
In complete analogy with Lemma 8.22 we can improve the L2 -estimates to uniform
estimates.
Lemma 11.20 With the notation above, the following estimates hold for all ε small
enough:
(i) For all z ∈ D,
λ λ 0
χ ≤2 χ . (11.3.52)
B,D c (z) B,D c (z)
λ0D ∪B
c
−λ
(ii) For all z ∈ D\B,
λ λ
h ≤ hB,D c (z),
B,D c (z) − hB,D c (z) (11.3.53)
a(D, B) − λ
Proof Items (i) and (ii) follow from the same arguments that were used in the proof
of Lemma 8.4.21. Item (iii) follows from the maximum principle. Combine these
estimates to get (iv).
Remark 11.21 Note that $a(D, B) = 1/\lambda_0^{D^c\cup B}\,[1 + o(1)]$ for sets $D\setminus B$ that do contain a local minimum of $F$.
The goal of this section is to generalise the analysis in Sect. 11.3.3 to all small
eigenvalues of −Lε . To do this, we need to first establish some a priori estimates
on the behaviour of eigenfunctions near the local minima of F .
For the analysis of harmonic functions that are not necessarily positive, we need an
estimate for sub-harmonic functions that allows us to relate the oscillation to the
L2 -norm.
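Lemma 11.22 Let $\phi$ be a strong solution of $(-L_\varepsilon - \lambda)\phi = 0$ on the ball $B_{c\sqrt{\varepsilon}}(x)$.
Then there exists a $C < \infty$, independent of $\varepsilon$, such that
$$
\mathrm{osc}_{B_{c\sqrt{\varepsilon}}}\, \phi \le C\varepsilon^{-d/4} \left( \int_{B_{2c\sqrt{\varepsilon}}} \phi(x)^2\, dx \right)^{1/2}.
\tag{11.3.55}
$$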
Proof This is just a specialisation of Gilbarg and Trudinger [126, Theorem 9.20]
(which gives upper bounds on suprema of sub-harmonic functions in terms of $L^p$-norms), and is obtained after choosing the balls in such a way that the constants are
uniform in $\varepsilon$.
We want to show that in $\sqrt{\varepsilon}$-neighbourhoods the eigenfunctions corresponding
to the exponentially small eigenvalues of $-L_\varepsilon$ either have a constant sign or are
irrelevantly small. This property is suggested by the following result.
Proof This proposition is stated and proved in Kolokoltsov [154] for smooth F , but
it is easy to check that the proof carries through for F ∈ C 3 (Ω).
Unfortunately Lemma 11.23 is not quite enough to conclude that $\phi$ does not change
sign near any minimum. We will, however, show that this is the case when the
contribution of $\phi$ coming from a neighbourhood of a given minimum is significant.
To that end, for $D \subset \Omega$ set
$$
\|f\|_{2,\mu_\varepsilon,D} = \left( \int_D f(x)^2\, \mu_\varepsilon(dx) \right)^{1/2}.
\tag{11.3.57}
$$
Proof We will first show that the weighted $L^2$-estimate on the deviation of $\phi$ from
a constant implies a local unweighted $L^2$-estimate on balls of radius $r = \sqrt{\varepsilon}$ near
the minima $x_j$, $j \in J$. To that end, note that (11.3.56) implies that
Set $\widetilde\phi(x) = \phi(x)/\|\phi\|_{2,\mu_\varepsilon,D_j}$ and $\hat c_j = c_j/\|\phi\|_{2,\mu_\varepsilon,D_j}$. Then, by the definition of $J$,
this locally normalised function satisfies the estimate
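This estimate does not change if we add a constant to $F(x)$. Thus, we can pretend
that $F(x_i) = 0$. Let $R > 0$ be such that $B_R(x_j) \subset D_j$. Since $x_j$ is a quadratic minimum,
there exists a positive and finite constant $b$ such that $F(x) \le b(x - x_j)^2$ for
$x \in B_R(x_j)$. Hence (11.3.60) implies, in particular, that
$$
\int_{B_R(x_j)} \bigl(\widetilde\phi(x) - \hat c_j\bigr)^2\, dx \le C e^{bR^2/\varepsilon}\, e^{-\gamma/2\varepsilon}.
\tag{11.3.61}
$$
$$
\mathrm{osc}_{B_{2\sqrt{\varepsilon}}}\, \widetilde\phi \le C\varepsilon^{-d/4}
\tag{11.3.63}
$$
$$
\mathrm{osc}_{B_r}\, \widetilde\phi \le C\varepsilon^{-d/4} \left( \frac{r}{\varepsilon^{1/2}} \right)^{\alpha},
\tag{11.3.64}
$$
we can achieve that $\mathrm{osc}_{B_r(x)} \le C\varepsilon^{\alpha/2} < \hat c_i/2$ for $\varepsilon$ small enough. By the estimate
(11.3.61), it then follows that $\widetilde\phi$ must be close to $\hat c_j$, uniformly on $B_r(x)$. Since this
argument holds for all $x \in B_{\sqrt{\varepsilon}}(x_j)$, we have $|\widetilde\phi - \hat c_j| \le C\varepsilon^{\alpha/2}$ on this ball.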
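Proof Since $i \notin J$ we may assume that $\phi$ changes sign on $B_{\sqrt{\varepsilon}}(x_i)$. Hence its absolute
value is bounded by its oscillation, and so, by Lemma 11.22,
$$
\sup_{x\in B_{\sqrt{\varepsilon}}(x_i)} |\phi(x)| \le C\varepsilon^{-d/4}\, \|\phi\|_{2,dx,B_{2\sqrt{\varepsilon}}(x_j)}
\tag{11.3.66}
$$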
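Recall that we are working under the assumption stated in Theorem 11.9. Suppose
that we want to compute eigenvalues below $\lambda_0^{S_k} = \bar\lambda_k$. We know that if $\phi^\lambda$ is an
eigenfunction with $\lambda < \bar\lambda_k$, then it can be represented as the solution of the Dirichlet
problem
$$
\begin{aligned}
(-L_\varepsilon - \lambda) f^\lambda(y) &= 0, & y &\in \Omega\setminus\partial S_k,\\
f^\lambda(y) &= \phi^\lambda(y), & y &\in \partial S_k.
\end{aligned}
\tag{11.3.67}
$$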
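As in the analysis of principal eigenvalues, the condition on $\lambda$ will be the existence
of a non-trivial $\phi^\lambda$ on $\partial S_k$ such that the surface measure
$$
dy\, e^{-F(y)/\varepsilon} (-L_\varepsilon - \lambda) f^\lambda(y) = e^{-F(y)/\varepsilon} \bigl[ \partial_{n(y)} f^\lambda(y) + \partial_{-n(y)} f^\lambda(y) \bigr]\, d\sigma_{S_k}(y)
\tag{11.3.68}
$$
vanishes. A necessary condition for this to happen is the vanishing of the total mass
on each of the surfaces $\partial B_i$, $1 \le i \le k$, i.e.,
$$
\int_{\partial B_i} e^{-F(y)/\varepsilon} \bigl[ \partial_{n(y)} f^\lambda(y) + \partial_{-n(y)} f^\lambda(y) \bigr]\, d\sigma_{S_k}(y) = 0.
\tag{11.3.69}
$$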
Let ci = infy∈Bi φ λ (y). In view of Lemmas 11.24 and 11.25, either of the follow-
ing two properties holds:
(i) supy∈Bi |φ λ (y)/ci − 1| ≤ Cε α/2 .
(ii) supy∈Bi |φ λ (y)| ≤ Cε −d/4 e−γ /2ε eF (xi )/2ε .
In what follows we analyse all possible cases. Let J ⊂ {1, . . . , k} be the set of
indices where (i) holds and J c = {1, . . . , k}\J the set of indices where (ii) holds.
Given this partition, set
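$$
f^\lambda = \sum_{j\in J} c_j \bigl( h^\lambda_{B_j,S_k\setminus B_j} + \chi^\lambda_{B_j,S_k\setminus B_j} \bigr) + \sum_{j\in J^c} \chi^\lambda_{B_j,S_k\setminus B_j}.
\tag{11.3.70}
$$
To lighten the notation we set $h_j^\lambda = h^\lambda_{B_j,S_k\setminus B_j}$ and $\chi_j^\lambda = \chi^\lambda_{B_j,S_k\setminus B_j}$, etc. in the sequel. For $j \in J$, $\chi_j^\lambda$ is the solution of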
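and, for $j \in J^c$,
$$
\left| \int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi_j^\lambda(y)\, d\sigma_{\partial S_k}(y) \right|
\le C\varepsilon^{-d/4}\, e^{-\gamma/2\varepsilon}\, e^{F(x_j)/2\varepsilon} \int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, d\sigma_{\partial S_k}(y).
\tag{11.3.75}
$$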
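Since the $h_i$ are harmonic, the first Green formula (7.23) implies that, for $i \ne j$,
$$
\begin{aligned}
\int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, d\sigma_{B_j}(y)
&= \int_{\partial B_j} e^{-F(y)/\varepsilon}\, h_j(y)\, \partial_{n(y)} h_i(y)\, d\sigma_{B_j}(y)\\
&= \varepsilon^{-1} \int_{S_k^c} dy\, e^{-F(y)/\varepsilon}\, \bigl\langle \nabla h_j(y), \nabla h_i(y) \bigr\rangle\\
&\le \varepsilon^{-1} \sqrt{\mathrm{cap}(B_i, S_k\setminus B_i)\, \mathrm{cap}(B_j, S_k\setminus B_j)},
\end{aligned}
\tag{11.3.76}
$$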
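where the last inequality uses the Cauchy-Schwarz inequality. Thus, for $j \in J\setminus i$,
$$
\left| \int_{\partial B_j} e^{-F(y)/\varepsilon}\, \partial_{n(y)} h_i(y)\, \chi_j^\lambda\, d\sigma_{\partial S_k}(y) \right|
\le C\varepsilon^{\alpha/2}\, \varepsilon^{-1} \sqrt{\mathrm{cap}(B_i, S_k\setminus B_i)\, \mathrm{cap}(B_j, S_k\setminus B_j)}.
\tag{11.3.77}
$$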
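and, for $j \in J^c$,
$$
\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, \chi_j^\lambda(y)
= O\bigl( \varepsilon^{-d/4}\, e^{-\gamma/2\varepsilon}\, e^{F(x_j)/2\varepsilon} \bigr) \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_i(y)\, h_j(y).
\tag{11.3.81}
$$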
To control the off-diagonal terms we need to show that the normalised functions hi
and hj are almost orthogonal.
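Lemma 11.26
$$
\frac{(h_i, h_j)_{\mu_\varepsilon}}{\|h_i\|_{2,\mu_\varepsilon}\, \|h_j\|_{2,\mu_\varepsilon}}
\le C\varepsilon^{-3d} \max\bigl\{ e^{-(\Phi(x_i,x_j) - F(x_i))/\varepsilon},\; e^{-(\Phi(x_i,x_j) - F(x_j))/\varepsilon} \bigr\}.
\tag{11.3.84}
$$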
Proof The proof goes in the same way as that of Lemma 8.24 in Chap. 8, and
uses the bounds in (11.2.12) on harmonic functions and the bounds in (11.2.5) on
capacities.
$$
\bigl| \chi_j^\lambda(z) \bigr| \le C\varepsilon^{-d/4}\, e^{-\gamma/2\varepsilon}\, \frac{h_j(z)}{\|h_j\|_{2,\mu_\varepsilon}}.
\tag{11.3.85}
$$
(iii) For j ∈ J ,
|Djj | ≤ Cε α/2 . (11.3.95)
(iv) For i = j ∈ J c ,
and
hj 2,με |Bij | ≤ Cε −d e−γ /ε Kii Kjj . (11.3.97)
Proof The bound in (11.3.93) is (11.3.77). The bounds in (11.3.94) and (11.3.95)
follow from (11.3.80) and (11.3.88). The bound in (11.3.96) is a consequence of
(11.3.78), while (11.3.97) follows from (11.3.81).
From here on the analysis of the solutions of (11.3.92) is very similar to that of
(8.4.86). Let us summarise the situation so far.
Theorem 11.28 Let $S_k = \bigcup_{i=1}^{k} B_\varepsilon(x_i)$, and let $\bar\lambda_k$ denote the principal eigenvalue
of the operator $-L_\varepsilon$ with Dirichlet boundary conditions on $\partial S_k$ (and $\partial\Omega$). Then a
necessary condition for a number $\lambda < \bar\lambda_k$ to be an eigenvalue of the operator $-L_\varepsilon$
is that there exist a non-empty set $J \subset \{1, \dots, k\}$ and constants $\hat c_j$, $j \in J$, with
$\sum_{j\in J} \hat c_j^2 = 1$, such that (11.3.92) holds for all $i \in J$.
Lemma 11.29 Let (Kij )1≤i,j ≤n be the normalised capacity matrix and assume that
while all other eigenvalues are smaller than Ce−δ/ε λk . Moreover, the eigenvec-
tor v = (v1 , . . . , vk ) corresponding to the largest eigenvalue normalised such that
vk = 1 satisfies |vi | ≤ Ce−δ/ε for 1 ≤ i < k.
Proof The proof is a simple perturbation argument. Note that we can write
$$
K = \hat K + \check K,
\tag{11.3.100}
$$
where $\hat K_{ij} = K_{kk}\,\delta_{ik}\,\delta_{jk}$. Estimate the norm of $\check K$ as in the proof of Lemma 11.26.
Recall that
$$
|K_{ij}| \le \sqrt{K_{ii} K_{jj}}.
\tag{11.3.101}
$$
Since Kˆ has one eigenvalue Kkk with the obvious eigenvector and all other eigen-
values are zero, the claim follows from standard perturbation theory.
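The perturbation argument can be illustrated numerically. The following is a minimal sketch, not the argument of the proof: it builds a hypothetical symmetric matrix whose diagonal entries differ by orders of magnitude and whose off-diagonal entries obey $|K_{ij}| \le \sqrt{K_{ii}K_{jj}}$, and then finds the dominant eigenpair by power iteration. The function names, the diagonal values and the `coupling` factor are assumptions chosen for illustration only.

```python
import math

def power_iteration(matrix, steps=200):
    """Return (eigenvalue, eigenvector) for the dominant eigenpair of a symmetric matrix."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(steps):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient of the unit vector v
    return eigenvalue, v

def toy_capacity_matrix(diagonal, coupling):
    """Hypothetical matrix with K_ij = coupling * sqrt(K_ii * K_jj) off the diagonal."""
    n = len(diagonal)
    return [[diagonal[i] if i == j else coupling * math.sqrt(diagonal[i] * diagonal[j])
             for j in range(n)] for i in range(n)]

# Illustrative values only: the last diagonal entry plays the role of K_kk.
diagonal = [1e-6, 1e-3, 1.0]
K = toy_capacity_matrix(diagonal, coupling=0.5)
top, vec = power_iteration(K)
scaled = [x / vec[-1] for x in vec]  # normalise so that the last component equals 1

print("largest eigenvalue:", top)
print("eigenvector (v_k = 1):", scaled)
```

In this toy example the largest eigenvalue stays close to the largest diagonal entry and the remaining eigenvector components are small, which is the qualitative picture described by the lemma. In the lemma itself the smallness is exponential in $1/\varepsilon$, which the fixed numbers above do not attempt to reproduce.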
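Since $K_{kk} = \mathrm{cap}(B_k, S_{k-1})/\|h_k\|_{2,\mu_\varepsilon}^2 \approx \bar\lambda_{k-1}$ (i.e., is equal to $\bar\lambda_{k-1}$ up to polynomial
terms in $\varepsilon$), Lemma 11.29 tells us that $\mu_k \approx \bar\lambda_{k-1}$, which is precisely the
value we expect.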
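$$
\lambda_k = \frac{\mathrm{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}\, \bigl[ 1 + O\bigl( \varepsilon^{\alpha/2}, e^{-\delta/\varepsilon} \bigr) \bigr].
\tag{11.3.103}
$$
(ii) The eigenvalue $\lambda_k$ is simple and the corresponding eigenfunction $f_k^\lambda$ can be
written as
$$
\phi_k^\lambda(y) = \frac{h_k(y)}{\|h_k\|_{2,\mu_\varepsilon}}\, \bigl[ 1 + O(\varepsilon^{\alpha/2}) \bigr] + \sum_{j=1}^{k-1} d_j(y)\, \frac{h_j(y)}{\|h_j\|_{2,\mu_\varepsilon}},
\tag{11.3.104}
$$
where $|d_j(y)| \le e^{-\delta/\varepsilon}$ for some $\delta > 0$ (uniformly on compact subsets when $\Omega$
is unbounded).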
Using the same arguments as in the proof of Lemma 11.20, and the bounds on
φ λ − cj on the boundaries ∂Bj , we get that, for j ∈ J ,
At this point we can further explore the eigenvalues below λ̄k−1 , etc., with the
same result. Thus, at the end of the procedure, we arrive at the conclusion that −Lε
can have at most the n simple eigenvalues given by the values in Corollary 11.30
below Cε d−1 . However, since we know that there must be n such eigenvalues, we
conclude that all these candidate eigenvalues are in fact the true eigenvalues, which
yields the following proposition.
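Proposition 11.31 Under the assumptions of Theorem 11.9, the spectrum of $-L_\varepsilon$
below $C\varepsilon^{d-1}$ consists of $n$ simple eigenvalues that satisfy:
$$
\begin{aligned}
\lambda_k &= \frac{\mathrm{cap}(B_k, S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}\, \bigl[ 1 + O\bigl( \varepsilon^{\alpha/2}, e^{-\delta/\varepsilon} \bigr) \bigr]\\
&= \mathrm{cap}(B_k, S_{k-1})\, \frac{\sqrt{\det(\nabla^2 F(x_k))}}{(\sqrt{2\pi\varepsilon})^{d}}\, e^{F(x_k)/\varepsilon}\, \bigl[ 1 + O\bigl( \varepsilon^{1/2}\ln(1/\varepsilon), \varepsilon^{\alpha/2}, e^{-\delta/\varepsilon} \bigr) \bigr]\\
&= \frac{1}{\mathbb{E}_{x_k}[\tau_{S_{k-1}}]}\, \bigl[ 1 + O\bigl( \varepsilon^{\alpha/2}, e^{-\delta/\varepsilon} \bigr) \bigr], \qquad k = 1, \dots, n.
\end{aligned}
\tag{11.3.108}
$$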
Proof We have seen that $\lambda_k = K_{kk}^{(k)} (1 + O(e^{-\theta/\varepsilon}, \varepsilon^{\alpha/2}))$, which proves the first
assertion. It remains to identify the eigenvalues with the inverse mean times. The
argument is essentially the same as in the proof of Theorem 8.43.
By virtue of Theorem 11.2 we need to show that
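$$
\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_k(y)^2 \sim \int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_k(y).
\tag{11.3.109}
$$
In fact, we will show more, namely, that both sides of (11.3.109) are asymptotically
equal to
$$
\frac{(\sqrt{2\pi\varepsilon})^{d}}{\sqrt{\det(\nabla^2 F(x_k))}}\, e^{-F(x_k)/\varepsilon}.
\tag{11.3.110}
$$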
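The Gaussian (Laplace) asymptotics behind this expression can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the hypothetical double-well $F(y) = y^4/4 - y^2/2$ with minimum $x = 1$ and $F''(1) = 2$, integrates $e^{-[F(y)-F(1)]/\varepsilon}$ over the neighbourhood $[0, 2]$ of the minimum (which stops at the saddle $y = 0$), and compares with $\sqrt{2\pi\varepsilon/F''(1)}$. The equilibrium potential $h_k$ is replaced by the indicator of that neighbourhood, and all names are illustrative.

```python
import math

def double_well(y):
    """Hypothetical one-dimensional potential with minima at -1 and +1 and a saddle at 0."""
    return 0.25 * y ** 4 - 0.5 * y ** 2

def local_mass_ratio(eps, lo=0.0, hi=2.0, steps=20000):
    """Ratio of the numerically integrated local mass to its Laplace approximation.

    The common factor exp(-F(1)/eps) is divided out of both sides to avoid overflow.
    """
    minimum, curvature = 1.0, 2.0  # x = 1 and F''(1) = 3 * 1**2 - 1 = 2
    width = (hi - lo) / steps
    mass = width * sum(
        math.exp(-(double_well(lo + (i + 0.5) * width) - double_well(minimum)) / eps)
        for i in range(steps)
    )  # midpoint rule
    laplace = math.sqrt(2.0 * math.pi * eps / curvature)
    return mass / laplace

for eps in (0.02, 0.01, 0.005):
    print(eps, local_mass_ratio(eps))
```

The printed ratios should approach 1 as $\varepsilon$ decreases, which is the one-dimensional analogue of the statement that the main contribution to the integrals comes from a small neighbourhood of $x_k$.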
We must show that the main contribution of the integrals comes from a small neigh-
bourhood of xk , which yields the contribution in (11.3.110). It is clear that all con-
tributions from the set {y ∈ Ω : F (y) > F (xk ) + ε ln(1/ε)} give only sub-leading
corrections. To treat the complement of this set, we use the bounds on the equilib-
$$
e^{-[F(y) - F(x_j)]/\varepsilon}\; e^{-\{F(x_k) + [\Phi(x_j, B_k) - F(x_k)] - [\Phi(x_j, S_{k-1}) - F(x_j)]\}/\varepsilon}.
\tag{11.3.113}
$$
while
Φ(xj , Bk ) = Φ(xk , Bj ) ≤ Φ(xk , Sj \Bk ). (11.3.116)
Therefore, our supposition implies that
which contradicts the conditions in (11.3.1) at stage j . In other words, if our suppo-
sition were true, then the set Bk would have to yield the largest eigenvalue at stage j ,
i.e., it would have to be labelled Bj . Hence (11.3.114) must hold.
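Since, by assumption, the inequalities are strict (which is more than we need), it
indeed follows that
$$
\int_\Omega dy\, e^{-F(y)/\varepsilon}\, h_k(y) = e^{-F(x_k)/\varepsilon}\, \frac{(\sqrt{2\pi\varepsilon})^{d}}{\sqrt{\det(\nabla^2 F(x_k))}}\, \bigl[ 1 + O\bigl( \varepsilon^{1/2} [\ln(1/\varepsilon)]^{3/2} \bigr) \bigr],
\tag{11.3.118}
$$
and the same bound holds when $h_k$ is replaced by $h_k^2$.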
To conclude the proofs of Theorems 11.9 and 11.11, we only need to improve the
error estimates. So far the proofs have produced error terms from two sources: the
exponentially small errors resulting from the perturbation around λ = 0 and from
the imperfect orthogonality of the functions hi , i = 1, . . . , n, and the much larger
errors of order ε α/2 resulting from the a priori control on the regularity of the eigen-
functions obtained from the Hölder estimate of Lemma 11.24. In the light of the
estimates obtained on the eigenfunctions, these can now be improved successively.
First, note that the eigenfunction corresponding to the minimum xk is small
enough at all the minima xl , 1 ≤ l < k, so that we can actually take J = {k} and
Jk = {1, . . . , k − 1} in (11.3.71) and (11.3.73). Then we know from Corollary 11.30
that
over the estimate in (11.3.33). This allows us to replace all errors of order ε α/2 by
errors of order ε α . This procedure can be iterated m times to get errors of order
ε mα/2 , which for m of order ln(1/ε) is as small as the exponentially small errors.
Finally, we want to improve the precision with which we relate the eigenvalues
to the inverse of the mean exit times. This precision is so far limited by the precision
with which
cap(Bε (xk ), Sk−1 )
Exk [τSk−1 ] ≈ . (11.3.121)
hk 2,με
From Proposition 11.7 we know that this precision is limited only by the variation of
Ex [τSk−1 ] on Bε (xk ). To improve (11.3.121), we need to control (hk = hBε (xk ),Sk−1 )
Namely,
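$$
\bigl| h_{B_\varepsilon(x),S_{k-1}}(y) - h_k(y) \bigr|
\le \mathbb{P}_y\bigl( \tau_{B_\varepsilon(x_k)} < \tau_{S_{k-1}} < \tau_{B_\varepsilon(x)} \bigr) + \mathbb{P}_y\bigl( \tau_{B_\varepsilon(x)} < \tau_{S_{k-1}} < \tau_{B_\varepsilon(x_k)} \bigr).
\tag{11.3.124}
$$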
The second term in (11.3.124) is bounded in the same way. This in turn implies that
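To get an analogous estimate for capacities, we take advantage of the fact that, as
long as $\lambda_0^{B_\varepsilon(x)\cup S_{k-1}} \gg \lambda_k$, we can replace $B_\varepsilon(x_k)$ by $B_\varepsilon(x)$ in the proof of Proposition 11.31 without further changes. Thus
$$
\lambda_k = \frac{\mathrm{cap}(B_\varepsilon(x), S_{k-1})}{\|h_{B_\varepsilon(x),S_{k-1}}\|_{2,\mu_\varepsilon}^2}\, \bigl[ 1 + O(e^{-\delta/\varepsilon}) \bigr]
= \frac{\mathrm{cap}(B_\varepsilon(x_k), S_{k-1})}{\|h_k\|_{2,\mu_\varepsilon}^2}\, \bigl[ 1 + O(e^{-\delta/\varepsilon}) \bigr],
\tag{11.3.127}
$$
which together with (11.3.126) implies that
$$
\bigl| \mathrm{cap}(B_\varepsilon(x), S_{k-1}) - \mathrm{cap}(B_\varepsilon(x_k), S_{k-1}) \bigr| \le e^{-\delta/\varepsilon}\, \mathrm{cap}(B_\varepsilon(x_k), S_{k-1}).
\tag{11.3.128}
$$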
The last assertion of Theorem 11.12, the asymptotic exponential distribution of the
metastable exit time, follows from the spectral estimates above exactly as in the
discrete case (see the proof of Theorem 8.45). This result can also be obtained via
the coupling method of Martinelli et al. [174, 177].
the Diploma Thesis of Erich Bauer [14]. Assumptions 10.3, 10.5 and 11.1 can be
relaxed. In particular, we may take F = Fε depending on ε, or F with infinitely
many local minima. See e.g. Berglund and Gentz [21].
2. A proof of the Eyring-Kramers formula for the special case when all minima of
the potential are at the same level was given in two little-noticed papers by Sug-
iura [224, 225]. The approach used there runs via a direct variational control on
principal eigenvalues.
3. If Assumption 10.3 fails, then the asymptotics in Theorem 11.2 becomes more
complicated. Berglund and Gentz [21] classify various cases where the saddle point
is not quadratic.
4. Rough estimates of the small eigenvalues λi associated with the local minima xi
of F were derived in Freidlin and Wentzell [115], Mathieu [179] and Miclo [185].
Wentzell [234] and Freidlin and Wentzell [115] obtained estimates for the exponential
rate $\lim_{\varepsilon\downarrow 0} \varepsilon \ln \lambda_k(\varepsilon)$ with the help of large deviation methods. Sharper estimates,
with multiplicative errors of order $\varepsilon^{\pm kd}$, were obtained for principal eigenvalues
by Holley, Kusuoka and Stroock [140] with the help of variational principles.
These methods were extended to the full set of exponentially small eigenvalues in
Miclo [185] and Mathieu [179].
5. For a long time sharp spectral estimates were known only in the one-dimensional
case (see e.g. Buslov and Makarov [44, 45] and references therein), whereas in the
multi-dimensional case only heuristic results based on formal power series expan-
sions of the so-called WKB-type existed (see e.g. Kolokoltsov [154]). The proof
in Sect. 11.3, which is based on potential theory, follows Bovier, Gayrard and
Klein [38] and uses ideas that appeared already in Wentzell [233, 234]. More re-
cently, a full analytic proof of the asymptotic expansion for these eigenvalues was
given by Helffer, Klein and Nier [136], and Helffer and Nier [137], using a micro-
local analysis of the so-called Witten complex. They show, in particular, that the
error bounds in Theorem 11.9 can be improved to O(ε). Moreover, they show that,
under the assumption that F is C ∞ , a full asymptotic expansion in ε for the eigen-
values can be computed.
below the level of xk is exponentially small in absolute terms. Note that this implies
that the zeros of φk are generally not in the neighbourhood of the saddle points, but
close to the minima in Mk−1 . This fact was observed in Schütte, Huisinga and Meyn
[216]. We would like to stress that the fact that the eigenfunctions drop sharply at the
saddle points makes them very good indicators of the actual valley structure of F ,
i.e., they are excellent approximations of the indicator functions of the metastable
sets corresponding to the metastable exit time 1/λk .
where $\|h\|_{C^2} = \|h\|_\infty + \|h'\|_\infty + \|h''\|_\infty$. The differentials $D_\phi F$ and $D_\phi^2 F$ can be
computed explicitly, namely,
(Dφ F )(h)(x) = −Dh (x) + V φ(x) h(x), (12.1.2)
while $(D_\phi^2 F)(h, h)$ is the quadratic form associated with the Hessian operator $H_\phi F$
given by
$$
(H_\phi F)(h)(x) = -D h''(x) + V''(\phi(x))\, h(x).
\tag{12.1.3}
$$
Note that $H_\phi F$ is a Sturm-Liouville operator (see Coddington and Levinson [66]).
We say that $\phi$ is a stationary point of $F$ when $\phi$ is a solution of the non-linear
differential equation
$$
-D\phi'' + V'(\phi) = 0.
\tag{12.1.4}
$$
The notions of saddle points, communication heights and gates are defined as in the
finite-dimensional setting.
The theory can be developed under assumptions that are analogous to those used
in the finite-dimensional setting:
Assumption 12.1
(i) F has finitely many local minima and saddle points.
(ii) All local minima and saddle points of F are non-degenerate: at each point the
Hessian operator has only non-zero eigenvalues.
However, in this chapter we will only do computations for the simplest non-trivial
case, namely,
$$
V(s) = -\tfrac12 s^2 + \tfrac14 s^4 + bs, \qquad s \in \mathbb{R},
\tag{12.1.5}
$$
with b ≥ 0 small enough so that the equation
s − s3 − b = 0 (12.1.6)
This assumption holds when $D > \pi^{-2}$. The saddle point is the function $O_b(x) = z_b^*$,
$x \in [0, 1]$. If $b = 0$, then $z_b^* = 0$.
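For the quartic potential in (12.1.5), the stationary values solving $s - s^3 - b = 0$ can be located numerically. The sketch below is illustrative only: it assumes $0 \le b < 2/(3\sqrt{3})$, the range in which the cubic has three real roots because $2/(3\sqrt{3})$ is the local maximum of $s - s^3$, and it brackets each root by bisection. The function names are hypothetical. The middle root is a local maximum of $V$ and reduces to $0$ when $b = 0$, in line with the statement about $z_b^*$; the outer roots are the two local minima of $V$.

```python
import math

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection on [lo, hi]; assumes func changes sign on the interval."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def stationary_values(b):
    """Return the three real roots of s - s**3 - b = 0 in increasing order."""
    threshold = 2.0 / (3.0 * math.sqrt(3.0))
    if not 0.0 <= b < threshold:
        raise ValueError("three real roots require 0 <= b < 2/(3*sqrt(3))")
    g = lambda s: s - s ** 3 - b
    turn = 1.0 / math.sqrt(3.0)  # turning points of s - s**3 are at -turn and +turn
    return [bisect(g, -2.0, -turn), bisect(g, -turn, turn), bisect(g, turn, 2.0)]

def v_second_derivative(s):
    """V''(s) = -1 + 3 s^2 for V(s) = -s^2/2 + s^4/4 + b s (independent of b)."""
    return -1.0 + 3.0 * s * s

print(stationary_values(0.0))   # roots near -1, 0 and 1
print(stationary_values(0.1))
```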
For Sturm-Liouville operators, the notion of a determinant can be defined in the
following way. For φ ∈ C([0, 1]), let f be the solution of the initial value problem
Lemma 12.3 For any $\phi$ and $\psi$ with non-degenerate Hessian operator, the infinite
product
$$
\prod_{k\in\mathbb{N}} \frac{\lambda_k(\phi)}{\lambda_k(\psi)} = \frac{\mathrm{Det}(H_\phi F)}{\mathrm{Det}(H_\psi F)}
\tag{12.1.9}
$$
is convergent.
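A toy computation shows why such a ratio of eigenvalue products can converge even though each product diverges on its own. The sketch assumes, purely for illustration, two hypothetical spectra of the form $\lambda_k = k^2\pi^2 + a$ and $\lambda_k = k^2\pi^2 + c$ with constants $a, c > 0$, $k \ge 1$ (this is not the spectrum of the Hessians in the text). By Euler's product formula $\sinh(x)/x = \prod_{k\ge 1}(1 + x^2/(k^2\pi^2))$, the limit of the ratio is known in closed form.

```python
import math

def partial_product(a, c, terms):
    """Product over k = 1..terms of (k^2 pi^2 + a) / (k^2 pi^2 + c)."""
    result = 1.0
    for k in range(1, terms + 1):
        base = (k * math.pi) ** 2
        result *= (base + a) / (base + c)
    return result

def closed_form_limit(a, c):
    """Limit of the partial products, obtained from Euler's product for sinh(x)/x."""
    root_a, root_c = math.sqrt(a), math.sqrt(c)
    return (math.sinh(root_a) / root_a) / (math.sinh(root_c) / root_c)

limit = closed_form_limit(2.0, 1.0)
for terms in (10, 1000, 100000):
    print(terms, partial_product(2.0, 1.0, terms) / limit)
```

The relative error of the partial products decays like $1/n$ in the number of factors, because the individual factors differ from 1 by a term of order $k^{-2}$.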
12.2 Approximation properties of the potential 307
For $\phi \in H^1_{bc}$, let
$$
B_\rho(\phi) = \bigl\{ \sigma \in H^1_{bc} \colon \|\sigma - \phi\|_{L^2} \le \rho \bigr\}.
\tag{12.1.10}
$$
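Theorem 12.4 (Mean metastable exit time) Suppose that Assumptions 12.1 and
12.2 are satisfied. Then there exists a $\rho_0 \in (0, \infty)$ such that, for any $\rho \in (0, \rho_0)$,
$$
\mathbb{E}_{I^+}\bigl[ \tau(B_\rho(I^-)) \bigr] = \frac{2\pi}{[-\lambda^-(O)]}\, \sqrt{\frac{[-\mathrm{Det}(H_O F)]}{\mathrm{Det}(H_{I^+} F)}}\; e^{[F(O) - F(I^+)]/\varepsilon}\, \bigl[ 1 + \Psi(\varepsilon) \bigr],
\tag{12.1.11}
$$
where $\lambda^-(O)$ is the unique negative eigenvalue of $H_O F$, and the error term satisfies
$\Psi(\varepsilon) = O\bigl( \sqrt{\varepsilon [\ln(1/\varepsilon)]^3} \bigr)$.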
The main idea behind the proof of Theorem 12.4 is the use of the space-
discretisation introduced in Sect. 5.7.2. The proof comes in three steps:
(1) Let $F_N$ be the space discretisation of $F$ defined in (5.7.24). According to Theorem 5.70,
given $\varepsilon > 0$ and sequences $I^{\pm,N}$, $N \in \mathbb{N}$, converging to $I^\pm$, we have
$$
\lim_{N\to\infty} \mathbb{E}_{I^{+,N}}\bigl[ \tau^N_\varepsilon(B_\rho(I^{-,N})) \bigr] = \mathbb{E}_{I^+}\bigl[ \tau(B_\rho(I^-)) \bigr].
\tag{12.1.12}
$$
(2) For fixed $N$, we compute the asymptotics of the transition time. This produces
a prefactor $a_N(\varepsilon)$ such that
$$
\frac{1}{a_N(\varepsilon)}\, \mathbb{E}_{I^{+,N}}\bigl[ \tau^N_\varepsilon(B_\rho(I^{-,N})) \bigr] - 1 = \psi(\varepsilon, N).
\tag{12.1.13}
$$
We show that $\psi(\varepsilon, N) \le \Psi(\varepsilon) = O\bigl( \sqrt{\varepsilon [\ln(1/\varepsilon)]^3} \bigr)$ for all $N$. This estimate is
first shown for the process starting in the last-exit biased distribution and then
transferred to a pointwise estimate with the help of a coupling argument as
explained in Sect. 9.4.2.
(3) We show that $a_N(\varepsilon)$ converges to the explicit expression given in (12.1.11) as
$N \to \infty$.
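Let $(\lambda^0_{k,N})_{1\le k\le N}$ be the eigenvalues of $D\Delta_N$ and $(\lambda^0_k)_{k\in\mathbb{N}}$ the eigenvalues of $D\Delta$,
in increasing order. In the case of periodic boundary conditions on $[0, 1]$, we have
$$
\lambda^0_{k,N} = D\,(2N)^2 \sin^2\!\left( \frac{k\pi}{2N} \right), \qquad \lambda^0_k = D\,k^2\pi^2, \qquad k \in \mathbb{N}.
\tag{12.2.2}
$$
Set
$$
e_{k,N} = \lambda^0_{k,N} - \lambda^0_k.
\tag{12.2.3}
$$
Note that $\lim_{N\to\infty} e_{k,N} = 0$ for fixed $k$, but there is no convergence uniformly in $k$.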
Fix $u^N \in \mathbb{R}^N$, $N \in \mathbb{N}$, converging to $u \in H^1$. Let $(\lambda_{k,N}(u^N))_{1\le k\le N}$ be the eigenvalues
of $N(HF_N)(u^N)$ and $(\lambda_k(u))_{k\in\mathbb{N}}$ the eigenvalues of $(HF)(u)$. We would
like to show that $(\lambda_{k,N}(u^N))_{1\le k\le N}$ converges to $(\lambda_k(u))_{k\in\mathbb{N}}$ in some appropriate
sense. Since (recall (5.7.24))
$$
N(HF_N)(u^N) = -\tfrac12 D\Delta_N + V''(u^N)
\tag{12.2.4}
$$
and $V''(u)$ is bounded for any $u$ fixed, we have the following estimates.
The following lemma, adapted from de Hoog and Anderssen [76], gives us tighter
control under stronger assumptions.
$$
|e_{k,N}| \le \frac{C_2 k^4}{N^2}.
\tag{12.2.7}
$$
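The behaviour of the discretisation error $e_{k,N} = D(2N)^2\sin^2(k\pi/2N) - Dk^2\pi^2$ can be explored numerically. The sketch below takes $D = 1$ as an illustrative assumption. It also uses the elementary inequality $t^2 - t^4/3 \le \sin^2 t \le t^2$, which yields the explicit bound $|e_{k,N}| \le D\pi^4 k^4/(12N^2)$; this constant is derived here for illustration and is not claimed to be the constant $C_2$ of the lemma.

```python
import math

def lam_discrete(k, n, diffusion=1.0):
    """Eigenvalue D (2N)^2 sin^2(k pi / 2N) of the discretised operator."""
    return diffusion * (2 * n) ** 2 * math.sin(k * math.pi / (2 * n)) ** 2

def lam_continuum(k, diffusion=1.0):
    """Limiting eigenvalue D k^2 pi^2."""
    return diffusion * k ** 2 * math.pi ** 2

def eigen_error(k, n, diffusion=1.0):
    """e_{k,N}: discretised eigenvalue minus continuum eigenvalue."""
    return lam_discrete(k, n, diffusion) - lam_continuum(k, diffusion)

def quartic_bound(k, n, diffusion=1.0):
    """Bound D pi^4 k^4 / (12 N^2) implied by sin^2 t >= t^2 - t^4 / 3."""
    return diffusion * math.pi ** 4 * k ** 4 / (12.0 * n ** 2)

# Fixed k: the error vanishes as N grows.
print([eigen_error(1, n) for n in (10, 100, 1000)])
# k comparable to N: the error grows like N^2, so convergence is not uniform in k.
print([eigen_error(n, n) for n in (10, 100, 1000)])
```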
$$
\lim_{N\to\infty} \prod_{k=0}^{N-1} \frac{\lambda_k(\phi)}{\lambda_k(\psi)} = \prod_{k\in\mathbb{N}_0} \frac{\lambda_k(\phi)}{\lambda_k(\psi)}.
\tag{12.2.9}
$$
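Then
$$
\frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)} = \frac{1 + \theta_{k,N}(\phi)}{1 + \theta_{k,N}(\psi)} = 1 + \frac{\theta_{k,N}(\phi) - \theta_{k,N}(\psi)}{1 + \theta_{k,N}(\psi)}.
\tag{12.2.14}
$$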
4
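For $\alpha$ small enough and $N$ large enough this gives $|\theta_{k,N}(\psi)| \le \tfrac12$, and hence
$$
\left| \ln \prod_{k=0}^{\alpha N} \frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)} \right|
\le 2 \sum_{k=0}^{\alpha N} \bigl| \theta_{k,N}(\phi) - \theta_{k,N}(\psi) \bigr| \le \frac{2C\alpha}{N},
\tag{12.2.16}
$$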
310 12 Stochastic Partial Differential Equations
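where we use Lemma 12.6 to estimate $|\theta_{k,N}(\phi) - \theta_{k,N}(\psi)| \le C/N^2$.
For $k > \alpha N$ we proceed similarly. Put
$$
\begin{aligned}
\theta_{k,N} &= \lambda_{k,N}(\psi)^{-1} \bigl[ \lambda_{k,N}(\phi^N) - \lambda_{k,N}(\psi^N) \bigr],\\
\theta_k &= \lambda_k(\psi)^{-1} \bigl[ \lambda_k(\phi) - \lambda_k(\psi) \bigr].
\end{aligned}
\tag{12.2.17}
$$
Then
$$
\frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)} = \frac{1 + \theta_{k,N}}{1 + \theta_k} = 1 + \frac{\theta_{k,N} - \theta_k}{1 + \theta_k},
\tag{12.2.18}
$$
For $\alpha$ fixed and $N$ large enough this gives $|\theta_k| \le \tfrac12$, and hence
$$
\left| \ln \prod_{k=\alpha N}^{N-1} \frac{\lambda_{k,N}(\phi^N)}{\lambda_k(\phi)}\, \frac{\lambda_k(\psi)}{\lambda_{k,N}(\psi^N)} \right|
\le 2 \sum_{k=\alpha N}^{N-1} |\theta_{k,N} - \theta_k| \le \sum_{k=\alpha N}^{N-1} \frac{2C}{k^2} \le \frac{2C}{\alpha N},
\tag{12.2.20}
$$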
where we use Lemma 12.6 to estimate $|\theta_{k,N} - \theta_k| \le C/k^2$. Combining (12.2.16)
and (12.2.20), and recalling that
$$
\frac{\det[(HF_N)(\phi^N)]}{\det[(HF_N)(\psi^N)]} = \prod_{k=0}^{N-1} \frac{\lambda_{k,N}(\phi^N)}{\lambda_{k,N}(\psi^N)},
\tag{12.2.21}
$$
It can be shown that the conclusion of Proposition 12.8 holds when the condition
in (12.2.11) is replaced by
$$
\|\phi^N - \phi\|_{L^2} \vee \|\psi^N - \psi\|_{L^2} \le \frac{C}{N}.
\tag{12.2.22}
$$
The next lemma shows that every stationary point of F can be approximated by
a sequence of stationary points of FN , N ∈ N, in the sense of (12.2.22). The proof
is elementary.
Lemma 12.9 There exist $C$, $N_0$ such that for all $N > N_0$ and all stationary points
$\phi$ of $F$ there is a stationary point $\phi^N$ of $F_N$ such that
$$
\|\phi - \phi^N\|_{L^2} \le \frac{C}{N}.
\tag{12.2.23}
$$
12.3 Estimate of the capacity 311
In this section we compute the relevant capacities for the discretised process. This
can be taken from Chap. 11, except that we have to take care of the N -dependence
of the error terms.
Recall from (5.7.27) that, after proper rescaling, we are considering the N -
dimensional diffusion
$$
dX_t^N = -N\,\nabla F_N(X_t^N)\, dt + \sqrt{2\varepsilon N}\, dB_t.
\tag{12.3.1}
$$
$$
\mu_\varepsilon^N(dx) = e^{-F_N(x)/\varepsilon}\, dx.
\tag{12.3.2}
$$
Let $B_\rho^N(x)$ denote the Euclidean ball of radius $\rho$ around $x \in \mathbb{R}^N$. Write $I^{+,N}$, $I^{-,N}$,
$O^N$, $\lambda^-_N(O^N)$ to denote the analogues of $I^+$, $I^-$, $O$ defined prior to Theorem 12.4.
The following proposition is the desired estimate for the capacity with an error term
that is uniform in N .
where $\limsup_{N\to\infty} |\psi_1(\varepsilon, N)| \le C \sqrt{\varepsilon [\ln(1/\varepsilon)]^3}$ for some constant $C$.
We need to control the potentials FN globally and near their critical points. It is
very convenient that in our setting the Hessians at all the three stationary points are
diagonal in the same basis, namely,
$$
v_l^k = \omega^{kl}/\sqrt{N}, \qquad k \in \{0, \dots, N-1\},\; l \in \{1, \dots, N\},
\tag{12.3.5}
$$
with $\omega = e^{2\pi i/N}$. This allows us to choose global coordinates for which all relevant
Hessians are diagonal. For $y \in \mathbb{R}^N$, define
$$
\hat y_k = \hat y_k(y) = \frac{1}{\sqrt{N}} \sum_{l=1}^{N} v_l^k\, y_l,
\tag{12.3.6}
$$
Recall that the explicit form of $F_N$ in the old coordinates is (recall (5.7.23))
$$
F_N(y) = N^{-1} \sum_{l=1}^{N} V(y_l) + \tfrac14 N D \sum_{l=1}^{N} (y_l - y_{l+1})^2.
\tag{12.3.8}
$$
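The discretised potential and the corresponding diffusion can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: periodic indexing $y_{N+1} = y_1$, the quartic $V(s) = -\tfrac12 s^2 + \tfrac14 s^4 + bs$, and a plain Euler-Maruyama step for $dX_t = -N\nabla F_N(X_t)\,dt + \sqrt{2\varepsilon N}\,dB_t$. The function names, the step size and the parameter values are hypothetical, and the scheme is a generic discretisation, not a method taken from the text.

```python
import math
import random

def potential_v(s, b=0.0):
    """Quartic local potential V(s) = -s^2/2 + s^4/4 + b s."""
    return -0.5 * s * s + 0.25 * s ** 4 + b * s

def potential_v_prime(s, b=0.0):
    """Derivative V'(s) = -s + s^3 + b."""
    return -s + s ** 3 + b

def f_n(y, diffusion, b=0.0):
    """F_N(y) = N^-1 sum V(y_l) + (N D / 4) sum (y_l - y_{l+1})^2, periodic in l."""
    n = len(y)
    local = sum(potential_v(s, b) for s in y) / n
    coupling = 0.25 * n * diffusion * sum((y[l] - y[(l + 1) % n]) ** 2 for l in range(n))
    return local + coupling

def grad_f_n(y, diffusion, b=0.0):
    """Gradient of F_N: V'(y_l)/N + (N D / 2)(2 y_l - y_{l-1} - y_{l+1})."""
    n = len(y)
    return [potential_v_prime(y[l], b) / n
            + 0.5 * n * diffusion * (2.0 * y[l] - y[l - 1] - y[(l + 1) % n])
            for l in range(n)]

def euler_maruyama_step(y, eps, dt, diffusion, rng, b=0.0):
    """One explicit Euler-Maruyama step for dX = -N grad F_N(X) dt + sqrt(2 eps N) dB."""
    n = len(y)
    grad = grad_f_n(y, diffusion, b)
    noise = math.sqrt(2.0 * eps * n * dt)
    return [y[l] - n * grad[l] * dt + noise * rng.gauss(0.0, 1.0) for l in range(n)]

rng = random.Random(0)
state = [1.0] * 8  # constant profile: a stationary point of F_N when b = 0
for _ in range(100):
    state = euler_maruyama_step(state, eps=0.01, dt=1e-4, diffusion=1.0, rng=rng)
print(f_n(state, diffusion=1.0))
```

For $b = 0$ the constant profile $y_l \equiv 1$ has zero gradient and energy $-\tfrac14$, so with small $\varepsilon$ the simulated path is expected to fluctuate near that value of $F_N$ for a long time.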
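$$
\begin{aligned}
F_N(y(\hat y)) &= \tfrac12 \sum_{k=0}^{N-1} \lambda^0_{k,N}\, \hat y_k^2 - \tfrac12 \hat y_0^2 + \frac{1}{4N}\, \|y(\hat y)\|_4^4\\
&= \tfrac12 \sum_{k=0}^{N-1} \lambda_{k,N}\, \hat y_k^2 + \frac{1}{4N}\, \|y(\hat y)\|_4^4
\end{aligned}
\tag{12.3.9}
$$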
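Since these all lie on the line $\hat y_1 = \hat y_2 = \dots = \hat y_{N-1} = 0$, it is useful to single out the
0-th coordinate. Note that
$$
\frac{1}{N} \sum_{l=1}^{N} y_l(\hat y)^4 = \frac{1}{N} \sum_{l=1}^{N} \Bigl( \hat y_0 + \sum_{k=1}^{N-1} \omega^{lk}\, \hat y_k \Bigr)^4 = \frac{1}{N} \sum_{l=1}^{N} \bigl( \hat y_0 + w_l(\hat y) \bigr)^4,
\tag{12.3.12}
$$
where $w_l = w_l(\hat y) = \sum_{k=1}^{N-1} \omega^{kl}\, \hat y_k$. The important point is that $\sum_{l=1}^{N} \omega^{kl} = 0$ for all $k \in \{1, \dots, N-1\}$.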
(i)
N −1
FN y(ŷ) + 12 ŷ02 1 + 3w22 − 14 ŷ04 − 1
λk,N ŷk2
2
k=1
1
≤ 4|ŷ0 |w33 + w44 . (12.3.13)
4N
(ii)
$$
F_N(y(\hat y)) \ge \tfrac12 \sum_{k=1}^{N-1} \lambda_{k,N}\, \hat y_k^2 - \tfrac12 \hat y_0^2.
\tag{12.3.14}
$$
Proof Item (i) follows from (12.3.9) and (12.3.13). Item (ii) follows trivially be-
cause the quartic term in (12.3.9) is non-negative.
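$$
N^{-1/2}\, \|y(\hat y)\|_2 = \|\hat y\|_2.
\tag{12.3.15}
$$
(ii) The Hausdorff-Young inequality holds, i.e., for any $p \ge 2$ and for $q = p/(p - 1)$
there exists a constant $C_q$ such that
$$
N^{-1/p}\, \|y(\hat y)\|_p \le C_p\, \|\hat y\|_q.
\tag{12.3.16}
$$
$$
\|y(\hat y)\|_\infty \le C\, \|\hat y\|_1.
\tag{12.3.17}
$$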
Together with the Parseval identity, this provides the input to obtain (12.3.16) from
the Riesz-Thorin interpolation theorem. See Reed and Simon [204, p. 328].
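The Parseval identity and the sup-norm bound can be verified directly for a small vector. The sketch below assumes the convention $\hat y_k = N^{-1}\sum_{l=1}^{N}\omega^{-kl}y_l$ with inverse $y_l = \sum_{k=0}^{N-1}\omega^{kl}\hat y_k$, where $\omega = e^{2\pi i/N}$; the sign of the exponent in the forward transform is an assumption made here, and for real $y$ the opposite sign only conjugates the coefficients, so the norms are unaffected. Function names are illustrative.

```python
import cmath
import math

def to_fourier(y):
    """Coefficients y_hat_k = (1/N) * sum_{l=1..N} omega^(-k l) y_l, omega = exp(2 pi i / N)."""
    n = len(y)
    omega = cmath.exp(2j * math.pi / n)
    return [sum(omega ** (-k * (l + 1)) * y[l] for l in range(n)) / n for k in range(n)]

def from_fourier(y_hat):
    """Inverse map y_l = sum_{k=0..N-1} omega^(k l) y_hat_k, for l = 1..N."""
    n = len(y_hat)
    omega = cmath.exp(2j * math.pi / n)
    return [sum(omega ** (k * (l + 1)) * y_hat[k] for k in range(n)) for l in range(n)]

y = [0.3, -1.2, 2.0, 0.7, -0.5]
y_hat = to_fourier(y)
recovered = from_fourier(y_hat)

lhs = math.sqrt(sum(s * s for s in y) / len(y))   # N^(-1/2) * ||y||_2
rhs = math.sqrt(sum(abs(c) ** 2 for c in y_hat))  # ||y_hat||_2
print(lhs, rhs)
print(max(abs(s) for s in y), sum(abs(c) for c in y_hat))  # ||y||_inf versus ||y_hat||_1
```

With this normalisation the sup-norm bound holds with constant 1, since each $y_l$ is a sum of the $\hat y_k$ multiplied by phases of modulus one.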
From the explicit form of the eigenvalues of the discrete Laplacian in (12.2.2) and
the relation in (12.3.10), we see that $\lambda_{k,N} = \lambda_{N-k,N}$, $k = 1, \dots, N - 1$. Using that,
for $0 \le t \le \tfrac{\pi}{2}$,
$$
0 < t^2 \bigl( 1 - \tfrac13 t^2 \bigr) \le \sin^2 t \le t^2,
\tag{12.3.20}
$$
we see that, for $1 \le k \le \tfrac{N}{2}$,
λk,N ≥ k 2 81 Dπ 2 1 − 1 2
12 π . (12.3.21)
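The constants $r_{k,N}$ are constructed as follows. For an increasing sequence $(\rho_k)_{k\in\mathbb{N}}$
set
$$
r_{k,N} = r_{N-k,N} = \rho_k, \qquad 1 \le k \le \left\lfloor \frac{N}{2} \right\rfloor.
\tag{12.3.22}
$$
Pick $\rho_k = k^\alpha$ with $\alpha > 0$ such that, for $q = \tfrac32, \tfrac43$,
$$
\sum_{k\in\mathbb{N}} \frac{\rho_k^q}{k^q} = B_q < \infty.
\tag{12.3.23}
$$
$$
N^{-1}\, \|y(\hat y)\|_4^4 \le \delta^4 C_4.
\tag{12.3.25}
$$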
The strategy for the upper bound is the same as in the proof of Theorem 11.2.
where CδN,⊥ is defined in (12.3.18) and c0 < ∞ is a constant to be chosen. For the
upper bound it is enough to replace FN by its lower bound in (12.3.14). Define the
set
$$
U_\delta^N = \bigl\{ y(\hat y) \in \mathbb{R}^N \colon |\hat y_0| \le c_0 \delta \bigr\}.
\tag{12.3.27}
$$
Choose a test function h+ in the Dirichlet principle for the Dirichlet form in (12.3.3)
to obtain an upper bound on the capacity of interest. The set (UδN )c decomposes into
two disjoint connected components, one of which contains I +,N . We set h+ (y) = 1
on the latter component and h+ (y) = 0 on the other component. On UδN we choose
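$h^+$ as $h^+(y) = f(\hat y_0)$, where (recall that $\lambda_{0,N} = -1$)
$$
f(s) = \frac{\int_s^{c_0\delta} e^{\lambda_{0,N} t^2/2\varepsilon}\, dt}{\int_{-c_0\delta}^{c_0\delta} e^{\lambda_{0,N} t^2/2\varepsilon}\, dt}.
\tag{12.3.28}
$$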
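$$
\begin{aligned}
&\le \varepsilon\, (\sqrt{N})^{N} \int_{-c_0\delta}^{c_0\delta} d\hat y_0\, e^{-\lambda_{0,N} \hat y_0^2/2\varepsilon}\, f'(\hat y_0)^2 \int_{\mathbb{R}^{N-1}} d\hat y_1 \dots d\hat y_{N-1}\, e^{-\sum_{k=1}^{N-1} \lambda_{k,N} \hat y_k^2/2\varepsilon}\\
&= \varepsilon\, (\sqrt{N})^{N}\, \frac{1}{\int_{-c_0\delta}^{c_0\delta} e^{\lambda_{0,N} s^2/2\varepsilon}\, ds}\, \prod_{k=1}^{N-1} \sqrt{\frac{2\pi\varepsilon}{\lambda_{k,N}}}\\
&= \varepsilon\, (\sqrt{N})^{N} \sqrt{\frac{-\lambda_{0,N}}{2\pi\varepsilon}}\, \prod_{k=1}^{N-1} \sqrt{\frac{2\pi\varepsilon}{\lambda_{k,N}}}\, \Bigl[ 1 + O\bigl( e^{c_0^2\delta^2\lambda_{0,N}/2\varepsilon} \bigr) \Bigr].
\end{aligned}
\tag{12.3.29}
$$
In the first and second equality, the change of variable $y \to \hat y$ gives rise to the factor
$(\sqrt{N})^{N}$ and the relation $\|\nabla h^+(y(\hat y))\|_2^2 = N^{-1} |f'(\hat y_0)|^2$. Taking $\delta = K\sqrt{\varepsilon \ln(1/\varepsilon)}$,
as in Chap. 11, we see that the right-hand side has the desired asymptotics. Thus we
obtain that, for $N$ large enough,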
cap BρN I +,N , BρN I −,N
(see (12.1.5), (12.3.8) and (12.3.11)). This is the upper bound with a better error
estimate than in (12.3.4).
Remark 12.14 Note that, due to the particularly simple form of the potential in
(12.1.5), we did not need to use the fact that FN is well approximated by a quadratic
Proof Around the saddle point O N we take a narrow corridor from one local mini-
mum to the other, and minimise the Dirichlet form on this corridor. We use the same
notation as in the proof of the upper bound.
We bound the capacity from below by
cap BρN I +,N , BρN I −,N (12.3.31)
√ N
2
≥ inf εN N ∇h y(ŷ) 2 e−FN (y(ŷ))/ε d ŷ
h : h(x)=1 ∀ x∈BN +,N ) CδN,⊥
ρ (I
h(x)=0 ∀ x∈BN −,N )
ρ (I
√ N
d 2 −F (y(ŷ))/ε
≥ inf εN N h y(ŷ) e N d ŷ.
h : h(x)=1 ∀ x∈BN +,N ) N,⊥ d ŷ
ρ (I Cδ 0
h(x)=0 ∀ x∈BN −,N )
ρ (I
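The infimum can now be performed for each value of the orthogonal coordinates
$\hat y^\perp = (\hat y_1, \dots, \hat y_{N-1})$ separately, i.e., the right-hand side of (12.3.31) is larger than
or equal to
$$
\begin{aligned}
&\varepsilon\, (\sqrt{N})^{N} \int_{C_\delta^{N,\perp}} d\hat y^\perp \inf_{f\colon f(1)=1,\, f(-1)=0} \int_{-1}^{1} d\hat y_0\, f'(\hat y_0)^2\, e^{-F_N(y(\hat y_0, \hat y^\perp))/\varepsilon}\\
&\qquad = \varepsilon\, (\sqrt{N})^{N} \int_{C_\delta^{N,\perp}} d\hat y^\perp \left( \int_{-1}^{1} d\hat y_0\, e^{F_N(y(\hat y_0, \hat y^\perp))/\varepsilon} \right)^{-1},
\end{aligned}
\tag{12.3.32}
$$
where we use that we already know how to solve the one-dimensional variational
problem.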
To conclude, we need to bound the second integral in (12.3.32) from above. Us-
ing the upper bound from Lemma 12.11 and bounding the norms of w appearing
there with the help of Lemma 12.13, we obtain
1 N−1
1
⊥ ))/ε 1 1 2 )]+ 1 ŷ 4 )/ε
λk,N ŷk2 /ε+O(δ 3 )/ε
d ŷ0 e( 2 λ0,N ŷ0 [1+O(δ
2
d ŷ0 eFN (y(ŷ0 ,ŷ ≤ e2 k=1 4 0
−1 −1
(12.3.33)
when $y^\perp \in C^{N,\perp}_\delta(O^N)$. We again choose $\delta = \sqrt{K\varepsilon\ln(1/\varepsilon)}$
for some sufficiently large K, and recall that $\lambda_{0,N} = -1$. Hence the exponent in the
integrand in the right-hand side of (12.3.33) without the error term achieves its unique maximum at
−1/4ε. It is therefore easy to see that
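\[
\int_{-1}^{1} d\hat y_0\, e^{(\frac12\lambda_{0,N}\hat y_0^2[1+O(\delta^2)] + \frac14\hat y_0^4)/\varepsilon}
= \sqrt{2\pi\varepsilon}\,(-\lambda_{0,N})^{-1/2}\,\bigl(1 + O(\varepsilon\ln(1/\varepsilon))\bigr). \tag{12.3.34}
\]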
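The Laplace asymptotics in (12.3.34) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it sets $\lambda_{0,N} = -1$, drops the $O(\delta^2)$ correction, and uses a composite Simpson rule; the function name `saddle_integral` and the value ε = 0.01 are hypothetical choices, not taken from the text.

```python
import math

def saddle_integral(eps, n=20000):
    """Composite Simpson approximation of the integral over [-1, 1] of
    exp((-y^2/2 + y^4/4) / eps), i.e. the left-hand side of (12.3.34)
    with lambda_{0,N} = -1 and the O(delta^2) term dropped (assumption)."""
    a, b = -1.0, 1.0
    h = (b - a) / n  # n must be even for Simpson's rule
    g = lambda y: math.exp((-0.5 * y * y + 0.25 * y ** 4) / eps)
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

eps = 0.01
# ratio of the integral to the leading Gaussian term sqrt(2*pi*eps)
ratio = saddle_integral(eps) / math.sqrt(2 * math.pi * eps)
print(ratio)
```

The ratio should be close to 1 for small ε, with the deviation coming from the quartic term in the exponent.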
Inserting this bound into (12.3.32), we can now carry out all the integrals over the
ŷk , 1 ≤ k ≤ N − 1. It is again elementary to show that
1 N−1
d ŷ ⊥ e− 2 k=1 λk,N ŷk /ε
2
CδN,⊥
' −1 4
(
N−1
N
λk,N
⊥ − 12 λk,N ŷk2 /ε − 12 λk,N ŷk2 /ε
≥ d ŷ e k=1 1− √ d ŷk e
RN−1 2πε |ŷk |≥δrk,N / λk,N
k=1
−1
' −1
(
√ N −1 "
N
1
N
1
−1 − 2 K ln(1/ε)rk,N
2
≥ 2πε 1− rk,N e . (12.3.35)
k=1
λk,N k=1
If we choose $r_{k,N}$ as in (12.3.22), with $\rho_k = k^{\alpha}$ for some α > 0, and choose K large
enough, then we can arrange that
N −1
1 %
−1 − 2 K ln(1/ε)rk,N 2
& K
1− rk,N e ≥ 1 − Kε (12.3.36)
k=1
In Sect. 12.3 we derived upper and lower bounds on the denominator in (12.4.1). We
next derive estimates on the numerator of (12.4.1). The point is to show that this is
essentially the mass of a small neighbourhood of the starting minimum $I^{+,N}$.
Proposition 12.15 For all 0 < ε < ε0 and ρ > 0 small enough,
hBρN (I +,N ),BρN (I −,N ) (x) dμN
ε (x)
RN
(2πε)N
e−FN (O
N )/ε
= 1 + ψ2 (ε, N ) , (12.4.2)
det[(H FN )(I +,N )]
where lim supN →∞ |ψ2 (ε, N )| ≤ C ε[ln(1/ε)]3 .
To estimate the left-hand side of (12.4.2) we need yet another lower bound on the
non-quadratic terms in FN . This time we write
4 4
y(ŷ) 4,N
= ŷ04 + y(ŷ) 4,N
− ŷ04 ≤ ŷ04 , (12.4.4)
Note, moreover, that the coordinates of the two local minima are
ŷ I ±,N k = ±δk,0 , (12.4.7)
and in the CδN -neighbourhoods of these local minima the quadratic approximation
is good. Finally, the sets CδN (I ±,N ) are subsets of BρN (I ±,N ), so that the integrand
is equal to 1 on the set CδN (I +,N ) and equal to 0 on the set CδN (I −,N ). The claimed
estimate on the integral is now straightforward.
Most of the analysis above carries over unchanged when b > 0. The saddle points
remain the same, while the positions of the minima are shifted. More importantly,
the value of FN is now smaller by bs on the negative side. To show that nonetheless
there is no contribution from the target valley, we need a bound on the equilibrium
potential. Let
\[ A = \bigl\{x \in \mathbb{R}^N : F_N(x) \le F_N(I^N_+) + \delta\bigr\} \tag{12.4.8} \]
for some δ > 0 small enough.
Lemma 12.16 For all η > 0 there exist ρ0 > 0, δ0 > 0 and ε0 > 0 such that for all
0 < ρ < ρ0 , 0 < δ < δ0 , 0 < ε < ε0 and x ∈ A ,
Proof By the definition of the set A , all paths from x ∈ A to I +,N must attain a
height at least FN (O N ). Therefore it follows from the large deviation principle and
the discussion on the exit problem (see Sect. 6.5.2) that for any T < ∞ fixed and all
x∈A,
\[ P_x\bigl(\tau_{B^N_\rho(I^{+,N})} < T\bigr) \le e^{-(F_N(O^N) - F_N(x) - \eta)/\varepsilon}. \tag{12.4.10} \]
On the other hand, for all x ∈ A there is a zero-action path from x to one of the
minima in BρN (I −,N ) that takes only a finite time T0 . All zero-action paths must
lead to BρN (I −,N ) in finite time. Therefore, to stay away from this set for a time T
requires the path not to follow a minimiser of the action integral for time T − T0 .
This costs a total action of order T a for some a > 0, and thus the probability of this
event is of order exp(−T a/ε), which can be made as small as desired by choosing
T large enough. In particular, it can be made much smaller than the probability in
(12.4.10). Now the simple bound
\[ P_x\bigl(\tau_{B^N_\rho(I^{+,N})} < \tau_{B^N_\rho(I^{-,N})}\bigr) \le P_x\bigl(\tau_{B^N_\rho(I^{+,N})} < T\bigr) + P_x\bigl(\tau_{B^N_\rho(I^{-,N})} > T\bigr) \tag{12.4.11} \]
Using the bound in Lemma 12.16, we see that the results for the symmetric case
b = 0 carry over to b > 0. This completes the proof of Proposition 12.15.
Remark 12.17 In more complicated situations, i.e., in the presence of multiple sta-
tionary points, the argument gets a little more involved. In that case, the process may
reach a small neighbourhood of some other stationary point before reaching its final
destination, and in this neighbourhood it could spend a large amount of time with-
out penalty. The probabilities to first reach the various stationary points are easily
computed with the help of large deviations, and by continuing the analysis step for
step from these new points as starting points we can show that this does not affect
the ultimate estimate on the harmonic function. This type of analysis is the basis
of the Freidlin-Wentzell theory [115]. All estimates involve only the potentials FN ,
and since these converge to F as discussed earlier, the control that is obtained in this
way is uniform in N .
Proof By putting all the estimates together, we obtain the following result on the
mean metastable exit time.
= 1 + Ψ (ε, N ) , (12.5.1)
[−λ0,N ] det (H FN )(I +,N )
Proof Inserting the estimates for the denominator (Proposition 12.10) and the numerator
(Proposition 12.15) into (12.4.1), we get that $E_{\nu^N}[\tau^N_\varepsilon]$ is equal to the
right-hand side of (12.5.1), where
\[ \nu^N = \nu^N_{B^N_\rho(I^{+,N}),\,B^N_\rho(I^{-,N})} \tag{12.5.3} \]
is the last-exit biased distribution on $B^N_\rho(I^{+,N})$. Then use Theorem 9.14 to replace
$\nu^N$ by the point $B^N_\rho(I^{+,N})$.
The assertion of Theorem 12.4 follows from Proposition 12.18 and the conver-
gence results established in Sect. 5.7, in particular, Theorem 5.70.
1. The system in (5.7.1) and its metastable behaviour have been studied for thirty
years. The main techniques employed in the literature are based on large deviation
principles and comparison estimates between the deterministic process ((5.7.1) with
ε = 0) and the stochastic process ((5.7.1) with ε > 0). Faris and Jona-Lasinio [107]
analysed (5.7.1) for the quartic double-well potential we considered here. Cassan-
dro, Olivieri and Picco [52] obtained similar asymptotics as in [107] when the space
interval [0, 1] is not fixed but tends to infinity as ε ↓ 0 (sufficiently slowly). These re-
sults established the existence of a suitable exponential time scale on which the pro-
cess undergoes a transition. For (6.3.1), Martinelli, Olivieri and Scoppola [175] ob-
tained the asymptotic exponential law of the transition times. Brassesco [41] proved
that the trajectories exhibit characteristics of metastable behaviour: the escape from
the basin of attraction of the minimum occurs through the lowest saddle points and
the process starting from this minimum spends most of its time before the transition
near this minimum.
2. As in the finite-dimensional setting, local minima and saddle points play a key
role in understanding metastability. In the infinite-dimensional setting, identifying
the critical points is already a difficult task in itself. Fortunately, elegant methods
are available to do so: see e.g. Fiedler and Rocha [113] and Wolfrum [237].
Part V deals with Markov processes that allow for coarse-graining, i.e., a lumping of
states that leads to a simpler Markov process on a reduced state space. For instance,
the reduction of the state space of a high-dimensional spin system to that of a low-
dimensional spin system, whenever possible, is a powerful tool for the analysis of
its dynamics. Some mean-field models allow for such a reduction.
Chapter 13 looks at the Curie-Weiss model, Chaps. 14–15 at the random-field
Curie-Weiss model.
Chapter 13
The Curie-Weiss Model
with h ∈ R the magnetic field. The fact that this is a mean-field model is expressed
by the fact that HN (σ ) depends on σ only through the empirical magnetisation
\[ m_N(\sigma) = \frac{1}{N}\sum_{i\in\Lambda}\sigma_i, \tag{13.1.2} \]
namely,
\[ H_N(\sigma) = -N\Bigl[\tfrac12\, m_N(\sigma)^2 + h\, m_N(\sigma)\Bigr] = N E\bigl(m_N(\sigma)\bigr). \tag{13.1.3} \]
where $\|\cdot\|_1$ is the $\ell_1$-norm on $S_\Lambda$, and the last line is put in to obtain a proper
normalisation. This dynamics is reversible w.r.t. the Gibbs measure
\[ \mu_{\beta,N}(\sigma) = \frac{1}{Z_{\beta,N}}\, e^{-\beta H_N(\sigma)}\, 2^{-N}, \qquad \sigma\in S_\Lambda, \tag{13.1.5} \]
with $Z_{\beta,N}$ the normalising partition function and β the inverse temperature.
Let us look at the evolution of the magnetisation $m_N(n) = m_N(\sigma(n))$ at time
$n\in\mathbb{N}_0$. Clearly, this quantity can only increase or decrease by $2N^{-1}$, and the
probability of doing so only depends on the number of −1's and +1's present in the
configuration σ(n), i.e., on $m_N(\sigma(n))$. In other words, with $\mathcal{F}_n$ denoting the
σ-algebra up to time n,
\[ P\bigl(m_N(n+1) = m' \mid \mathcal{F}_n\bigr) = r_{\beta,N}\bigl(m_N(n), m'\bigr), \qquad n\in\mathbb{N}_0, \tag{13.1.6} \]
is a function of $m_N(n)$ only, so that Theorem 9.5 applies and the image Markov
process has transition probabilities (recall (9.3.3))
\[
r_{\beta,N}(m,m') =
\begin{cases}
\frac{1-m}{2}\exp\bigl(-\beta N[E(m') - E(m)]_+\bigr), & \text{if } m' = m + 2N^{-1},\\[2pt]
\frac{1+m}{2}\exp\bigl(-\beta N[E(m') - E(m)]_+\bigr), & \text{if } m' = m - 2N^{-1},
\end{cases}
\tag{13.1.7}
\]
on the state space
\[ \Gamma_N = \bigl\{-1, -1+2N^{-1}, \dots, 1-2N^{-1}, 1\bigr\}. \tag{13.1.8} \]
Moreover, this Markov process is reversible with respect to the image Gibbs measure
\[ \nu_{\beta,N}(m) = \frac{1}{Z_{\beta,N}}\, e^{-\beta N E(m)} \binom{N}{\frac{1+m}{2}N}\, 2^{-N}, \qquad m\in\Gamma_N. \tag{13.1.9} \]
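Reversibility of the rates (13.1.7) with respect to the image Gibbs measure (13.1.9) can be checked numerically through the detailed balance relation $\nu_{\beta,N}(m)\,r_{\beta,N}(m,m') = \nu_{\beta,N}(m')\,r_{\beta,N}(m',m)$. The following is a minimal sketch; the function names and the parameter values N = 50, β = 1.5, h = −0.02 are illustrative assumptions, and the normalisation $Z_{\beta,N}$ is omitted because it cancels.

```python
import math

def energy(m, h):
    # E(m) = -m^2/2 - h*m, as in (13.1.3)
    return -0.5 * m * m - h * m

def rate(m, m_new, N, beta, h):
    # Metropolis transition probability (13.1.7) for a jump m -> m_new = m +/- 2/N
    cost = max(energy(m_new, h) - energy(m, h), 0.0)
    weight = (1 - m) / 2 if m_new > m else (1 + m) / 2
    return weight * math.exp(-beta * N * cost)

def log_nu(k, N, beta, h):
    # log of the unnormalised weight (13.1.9) at m = -1 + 2k/N (k spins equal to +1)
    m = -1 + 2 * k / N
    return -beta * N * energy(m, h) + math.log(math.comb(N, k)) - N * math.log(2)

N, beta, h = 50, 1.5, -0.02
max_violation = 0.0
for k in range(N):
    m, m_up = -1 + 2 * k / N, -1 + 2 * (k + 1) / N
    lhs = log_nu(k, N, beta, h) + math.log(rate(m, m_up, N, beta, h))
    rhs = log_nu(k + 1, N, beta, h) + math.log(rate(m_up, m, N, beta, h))
    max_violation = max(max_violation, abs(lhs - rhs))
print(max_violation)  # detailed balance holds up to rounding error
```

The check works on the logarithmic scale so that the binomial weights do not overflow for larger N.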
In exponential form the latter can be written as
\[ \nu_{\beta,N}(m) = \frac{1}{Z_{\beta,N}}\, e^{-\beta N f_{\beta,N}(m)}, \tag{13.1.10} \]
where
\[ f_{\beta,N}(m) = -\tfrac12 m^2 - hm + \beta^{-1} I_N(m), \tag{13.1.11} \]
with
\[ -I_N(m) = \frac{1}{N}\ln\Bigl[\binom{N}{\frac{1+m}{2}N}\, 2^{-N}\Bigr]. \tag{13.1.12} \]
with
\[
f_\beta(m) = -\tfrac12 m^2 - hm + \beta^{-1} I(m), \qquad
I(m) = \tfrac12(1+m)\ln(1+m) + \tfrac12(1-m)\ln(1-m). \tag{13.1.14}
\]
The latter is the Cramér rate function for coin tossing (recall Sect. 6.1). Since
$I(m) = I(-m)$ and $I(m)\sim\tfrac12 m^2$ as $m\to0$, we see from (13.1.11) that $m\mapsto f_\beta(m)$
is a double well when β > 1 and |h| is small enough (see Fig. 13.1). The stationary
points of $f_\beta$ are the solutions of the equation
\[ m = \tanh\bigl(\beta(m+h)\bigr). \tag{13.1.15} \]
The random walk mN is close to a diffusion on [−1, 1] given by the Kramers diffu-
sion equation in (2.1.1) with W (x) = βfβ (x) and ε = N −1 . In other words, for large
N the dynamics of the magnetisation in the Curie-Weiss model can be approximated
by a Brownian motion in a potential as encountered in Sect. 2.1. For β > 1 and |h|
small enough this potential is a double well and the diffusion exhibits metastable
behaviour.
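To make the double-well picture concrete, the stationary points of $f_\beta$ can be located by solving (13.1.15) numerically and classified by the sign of $f''_\beta(m) = -1 + 1/(\beta(1-m^2))$, which follows from (13.1.14). The sketch below uses a sign scan followed by bisection; the helper names and the parameters β = 1.5, h = −0.02 are illustrative assumptions.

```python
import math

def bisect(g, lo, hi, tol=1e-12):
    # plain bisection; assumes g(lo) and g(hi) have opposite signs
    g_lo_positive = g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def stationary_points(beta, h, grid=2000):
    # roots of g(m) = m - tanh(beta*(m+h)) on (-1, 1), located by a sign scan
    g = lambda m: m - math.tanh(beta * (m + h))
    pts = [-1 + 2 * (i + 0.5) / grid for i in range(grid)]
    return [bisect(g, a, b) for a, b in zip(pts, pts[1:]) if g(a) * g(b) < 0]

def f_second(m, beta):
    # second derivative of f_beta, using I''(m) = 1/(1 - m^2)
    return -1 + 1 / (beta * (1 - m * m))

beta, h = 1.5, -0.02
roots = stationary_points(beta, h)
# each entry: (stationary point, True if local minimum)
print([(round(m, 4), f_second(m, beta) > 0) for m in roots])
```

For these parameters the scan finds two minima with a saddle in between, and with h < 0 the saddle lies at a positive magnetisation, in line with the situation described for Fig. 13.1.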
Let $m^*_- < m^*_+$ be the two local minima of $m\mapsto f_\beta(m)$, and $z^*$ the saddle point in
between. Let $m^*_-(N), m^*_+(N)$ denote the points in $\Gamma_N$ that are closest to $m^*_-, m^*_+$.
These points form a metastable set in the sense of Definition 8.2. In the setting of
Fig. 13.1, we have $f_\beta(m^*_+) > f_\beta(m^*_-)$, so $m^*_+(N)$ is the metastable state and $m^*_-(N)$
is the stable state. Let $E_{m^*_+(N)}$ denote expectation w.r.t. the Markov process starting
in $m^*_+(N)$ and $\tau_{m^*_-(N)}$ the first hitting time of $m^*_-(N)$.
In the limit as N → ∞, the sums in (13.2.2) are dominated by the terms with
$m\to z^*$ and $m'\to m^*_+$, since for these terms $f_{\beta,N}(m) - f_{\beta,N}(m')$ is maximal. This
explains the exponential factor in (13.2.1). To get the prefactor in (13.2.1), we need
to look a bit more closely.
Note that $[E(m-2N^{-1}) - E(m)]_+ = 2N^{-1}[(m+h) - N^{-1}]_+$. For $m\to z^*$, the
first line of (13.2.3) converges to $\frac{1+z^*}{2}\exp(-2\beta[z^*+h]_+)$. In the situation depicted
in Fig. 13.1, we have $z^* > 0$. But $z^*$ is a solution of (13.1.15), and so we have
$\exp(2\beta[z^*+h]) = (1+z^*)/(1-z^*) > 1$. Therefore (13.2.2)–(13.2.3) imply that,
\[
E_{m^*_+(N)}\bigl[\tau_{m^*_-(N)}\bigr]
= e^{\beta N[f_{\beta,N}(z^*) - f_{\beta,N}(m^*_+)]}\,\frac{2}{1-z^*}\,[1+o(1)]
\times \sum_{\substack{m,m'\in\Gamma_N\\ |m-z^*|<\varepsilon,\ |m'-m^*_+|<\varepsilon}}
e^{\beta N[f_{\beta,N}(m) - f_{\beta,N}(z^*)] - \beta N[f_{\beta,N}(m') - f_{\beta,N}(m^*_+)]}. \tag{13.2.4}
\]
\[ I_N(m) - I(m) = [1+o(1)]\,\frac{1}{2N}\ln\frac{\pi N(1-m^2)}{2} \tag{13.2.5} \]
and hence
\[ e^{\beta N[f_{\beta,N}(m) - f_\beta(m)]} = [1+o(1)]\,\sqrt{\frac{\pi N(1-m^2)}{2}}. \tag{13.2.6} \]
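The Stirling-type correction in (13.2.5) is easy to verify numerically. The sketch below compares $N\,[I_N(m) - I(m)]$, computed from the exact binomial coefficient in (13.1.12), with $\tfrac12\ln(\pi N(1-m^2)/2)$; the function names and the values N = 2000, m = 0.4 are illustrative assumptions.

```python
import math

def cramer(m):
    # Cramér rate function I(m) of (13.1.14)
    return 0.5 * (1 + m) * math.log(1 + m) + 0.5 * (1 - m) * math.log(1 - m)

def entropy_N(m, N):
    # I_N(m) of (13.1.12): -(1/N) ln[ binom(N, (1+m)N/2) 2^{-N} ]
    k = round((1 + m) * N / 2)
    return -(math.log(math.comb(N, k)) - N * math.log(2)) / N

N, m = 2000, 0.4  # (1+m)N/2 = 1400 is an integer, so m lies in Gamma_N
lhs = N * (entropy_N(m, N) - cramer(m))
rhs = 0.5 * math.log(math.pi * N * (1 - m * m) / 2)
print(lhs, rhs)  # the two values should agree up to a correction of order 1/N
```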
Consequently,
\[
e^{\beta N[f_{\beta,N}(z^*) - f_{\beta,N}(m^*_+)]}
= [1+o(1)]\,\sqrt{\frac{1-z^{*2}}{1-m^{*2}_+}}\; e^{\beta N[f_\beta(z^*) - f_\beta(m^*_+)]}. \tag{13.2.7}
\]
where we use that $f'_\beta(z^*) = 0$ and $f'_\beta(m^*_+) = 0$. Changing to new variables
$u = \sqrt{N}(m - z^*)$ and $u' = \sqrt{N}(m' - m^*_+)$ and recalling (13.1.8), we see that the sum in
(13.2.8) equals
\[
[1+o(1)]\,\frac{N}{4}\int_{\mathbb{R}}\int_{\mathbb{R}} du\,du'\,
\exp\Bigl(\tfrac12\beta f''_\beta(z^*)\,u^2 - \tfrac12\beta f''_\beta(m^*_+)\,u'^2\Bigr). \tag{13.2.10}
\]
Since $f''_\beta(z^*) < 0$ and $f''_\beta(m^*_+) > 0$, the integral converges and equals
\[ \frac{2\pi}{\beta\sqrt{[-f''_\beta(z^*)]\,f''_\beta(m^*_+)}}. \tag{13.2.11} \]
The result in Theorem 13.1 fits the classical Arrhenius law with activation energy
$\beta[f_\beta(z^*) - f_\beta(m^*_+)]$ and amplitude given by the prefactor. The former coincides
with what was found in (2.1.2) for the Kramers model, with $W = \beta f_\beta$ and $\varepsilon = N^{-1}$,
while the latter differs by a factor
\[ \frac{N}{2}\,\frac{1}{1-z^*}\,\sqrt{\frac{1-z^{*2}}{1-m^{*2}_+}}. \tag{13.2.12} \]
This discrepancy comes from the discrete nature of the Curie-Weiss model. In particular,
the factor N is due to the fact that time is discrete and only one spin is flipped
per time step. In a continuous-time version, we would speed up time by a factor N,
after which N would disappear from the last term in the right-hand side of (13.2.1).
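For a nearest-neighbour random walk the mean hitting time can also be computed exactly, which gives a direct numerical check of the asymptotics. The sketch below is an illustration under stated assumptions: it uses the classical birth-death identity $E_k[\tau_{k-1}] = \sum_{j\ge k}\nu(j)\,/\,(\nu(k)\,r(k,k-1))$ for the exact value, and for the asymptotic value it multiplies out the factors in (13.2.4), (13.2.7), (13.2.10) and (13.2.11). The function names, the hand-chosen root brackets and the parameters N = 2000, β = 1.5, h = −0.02 are hypothetical.

```python
import math

def E(m, h):
    return -0.5 * m * m - h * m

def f_beta(m, beta, h):
    I = 0.5 * (1 + m) * math.log(1 + m) + 0.5 * (1 - m) * math.log(1 - m)
    return E(m, h) + I / beta

def f2(m, beta):
    return -1 + 1 / (beta * (1 - m * m))  # f_beta''(m)

def root(beta, h, lo, hi):
    # bisection for m = tanh(beta*(m+h)) on a bracket chosen by hand
    g = lambda m: m - math.tanh(beta * (m + h))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == (g(lo) > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def log_exact_hitting_time(N, beta, h, k_start, k_target):
    # k counts +1 spins, m = -1 + 2k/N; the factor 2^{-N}/Z cancels in the ratios
    m_of = lambda k: -1 + 2 * k / N
    log_nu = [-beta * N * E(m_of(k), h) + math.lgamma(N + 1)
              - math.lgamma(k + 1) - math.lgamma(N - k + 1) for k in range(N + 1)]
    tail = [0.0] * (N + 1)          # tail[k] = log sum_{j >= k} nu(j)
    tail[N] = log_nu[N]
    for k in range(N - 1, -1, -1):
        tail[k] = logsumexp([tail[k + 1], log_nu[k]])
    terms = []
    for k in range(k_target + 1, k_start + 1):
        m = m_of(k)
        cost = max(E(m - 2 / N, h) - E(m, h), 0.0)
        log_rate = math.log((1 + m) / 2) - beta * N * cost   # downward rate (13.1.7)
        terms.append(tail[k] - log_nu[k] - log_rate)
    return logsumexp(terms)

def log_asymptotic(N, beta, h, z, m_plus):
    # product of the factors in (13.2.4), (13.2.7), (13.2.10), (13.2.11), on log scale
    return (beta * N * (f_beta(z, beta, h) - f_beta(m_plus, beta, h))
            + math.log(math.pi * N / (1 - z))
            + 0.5 * math.log((1 - z * z) / (1 - m_plus * m_plus))
            - math.log(beta)
            - 0.5 * math.log(-f2(z, beta) * f2(m_plus, beta)))

N, beta, h = 2000, 1.5, -0.02
m_minus, z, m_plus = root(beta, h, -0.99, -0.5), root(beta, h, 0.0, 0.3), root(beta, h, 0.5, 0.99)
k_of = lambda m: round((1 + m) * N / 2)
gap = (log_exact_hitting_time(N, beta, h, k_of(m_plus), k_of(m_minus))
       - log_asymptotic(N, beta, h, z, m_plus))
print(gap)  # log of (exact / asymptotic); should be small for large N
```

Everything is kept on the logarithmic scale because the hitting time itself is exponentially large in N.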
As a corollary of Theorems 8.43 and 8.45 we get the exponential law of the
metastable crossover time.
\[
\lim_{N\to\infty} P_{m^*_+(N)}\Bigl(\frac{\tau_{m^*_-(N)}}{E_{m^*_+(N)}[\tau_{m^*_-(N)}]} > t\Bigr) = e^{-t}
\qquad \forall\, t\ge 0. \tag{13.2.13}
\]
2. There are a number of generalised mean-field models that allow for a similar re-
duction to a multi-dimensional diffusive Markov process. See e.g. Bovier, Eckhoff,
Gayrard and Klein [33] and Chap. 14 of this book.
3. The calculations in this chapter are not robust against small modifications. Indeed,
we are using the full permutation symmetry of the Hamiltonian, which is necessary
to ensure that mN = (mN (n))n∈N0 is a Markov process. Even when we merely re-
place the discrete spin variables by continuous spin variables (which leads to the
model of mean-field interacting diffusions), the Markov property fails and we are
required to consider the empirical measure rather than the empirical magnetisation
as the macroscopic variable in order to obtain a Markovian dynamics.
Chapter 14
The Curie-Weiss Model with a Random
Magnetic Field: Discrete Distributions
In Sect. 14.1 we introduce the model. In Sect. 14.2 we define the associated Gibbs
measure and the relevant order parameter. In Sect. 14.3 we define the Glauber
dynamics and state the main metastability result. Section 14.4 deals with coarse-
graining, which works because of the mean-field interaction and because the ran-
dom fields take finitely many values. We construct the effective Dirichlet form that
is obtained after the coarse-graining. Section 14.5 studies the energy landscape near
the critical points. Section 14.6 analyses the eigenvalues of the Hessian at the critical
points, while Sect. 14.7 looks at the overall topology of the energy landscape,
and indicates how the metastability results follow from those in Chap. 10.
\[ H_N[\omega](\sigma) = -\frac{1}{2N}\sum_{i,j\in\Lambda}\sigma_i\sigma_j - \sum_{i\in\Lambda} h_i[\omega]\,\sigma_i, \tag{14.1.1} \]
We briefly review some key features of the equilibrium behaviour of the RFCW-
model, for which we do not need any assumption on the distribution of the random
magnetic field.
The Gibbs measure of the RFCW-model is the random probability measure
\[ m_N(\sigma) = \frac{1}{N}\sum_{i\in\Lambda}\sigma_i \tag{14.2.3} \]
serves as the order parameter of the model, and we define its distribution under the
Gibbs measure in (14.2.1) as the induced measure
where
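\[
Z^1_{\beta,N}[\omega](m) = \sum_{\sigma\in S_N} 2^{-N}\, e^{\beta\sum_{i\in\Lambda} h_i[\omega]\sigma_i}\,
1_{\{N^{-1}\sum_{i\in\Lambda}\sigma_i = m\}}. \tag{14.2.6}
\]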
For simplicity, we identify functions f defined on the discrete set ΓN with functions
f defined on the interval [−1, 1] by setting f (m) = f (02N m1/2N ). Then, by using
sharp large deviation estimates (see Chaganty and Sethuraman [55]), ZN 1 (m), m ∈
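\[
Z^1_{\beta,N}[\omega](m) = \frac{\exp[-N I_N[\omega](m)]}{\sqrt{\frac{\pi}{2}\,N / I''_N[\omega](m)}}\,[1+o(1)], \tag{14.2.7}
\]
\[
U_N[\omega](t) = \frac{1}{N}\ln\sum_{\sigma\in S_N} 2^{-N}\, e^{\beta\sum_{i\in\Lambda}h_i[\omega]\sigma_i}\, e^{t\sum_{i\in\Lambda}\sigma_i}
= \frac{1}{N}\sum_{i\in\Lambda}\ln\cosh\bigl(t + \beta h_i[\omega]\bigr). \tag{14.2.8}
\]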
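The two expressions for $U_N[\omega](t)$ in (14.2.8) agree because the sum over configurations factorises over sites. For a small system this can be confirmed by brute-force enumeration, as in the following sketch; the function names and the field realisation are hypothetical.

```python
import itertools
import math

def U_bruteforce(t, beta, fields):
    # first line of (14.2.8): (1/N) ln sum_sigma 2^{-N} exp(beta*sum h_i s_i + t*sum s_i)
    N = len(fields)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        total += math.exp(sum((beta * hi + t) * s for hi, s in zip(fields, sigma)))
    return math.log(total / 2 ** N) / N

def U_closed_form(t, beta, fields):
    # second line of (14.2.8): (1/N) sum_i ln cosh(t + beta*h_i)
    return sum(math.log(math.cosh(t + beta * hi)) for hi in fields) / len(fields)

fields = [0.3, -0.1, 0.25, -0.4, 0.0, 0.15]  # hypothetical realisation of h_i[omega]
diff = abs(U_bruteforce(0.7, 1.2, fields) - U_closed_form(0.7, 1.2, fields))
print(diff)  # agreement up to rounding error
```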
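\[
a(m^*) = -1 + \frac{1}{\beta U''_N(\beta m^*)}
= -1 + \frac{1}{\frac{\beta}{N}\sum_{i\in\Lambda}\bigl[1 - \tanh^2\bigl(\beta(m^* + h_i[\omega])\bigr)\bigr]}. \tag{14.2.15}
\]
Thus, we see that, by the law of large numbers, the set of critical points converges
$P_h$-a.s. to the set of solutions of the equation
\[ m^* = E_h \tanh\bigl(\beta(m^* + h)\bigr), \tag{14.2.16} \]
and the second derivative of $F_{\beta,N}(m^*)$ converges to
\[
\lim_{N\to\infty} F''_{\beta,N}(m^*) = -1 + \frac{1}{\beta E_h\bigl[1 - \tanh^2\bigl(\beta(m^*+h)\bigr)\bigr]}. \tag{14.2.17}
\]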
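For a random field with finitely many values, the limiting fixed-point equation (14.2.16) and the limiting curvature (14.2.17) can be evaluated directly. The sketch below is illustrative only: the two-valued field law (values −0.2 and 0.3 with equal probability), β = 2 and the function names are assumptions, not taken from the text.

```python
import math

def critical_points(beta, values, probs, grid=4000):
    # roots of g(m) = m - E_h tanh(beta*(m+h)) for a discrete field law, cf. (14.2.16)
    g = lambda m: m - sum(p * math.tanh(beta * (m + b)) for b, p in zip(values, probs))
    pts = [-1 + 2 * (i + 0.5) / grid for i in range(grid)]
    roots = []
    for a, b in zip(pts, pts[1:]):
        if g(a) * g(b) < 0:          # sign change: refine by bisection
            lo, hi = a, b
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                if (g(mid) > 0) == (g(lo) > 0):
                    lo = mid
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
    return roots

def limiting_curvature(m, beta, values, probs):
    # right-hand side of (14.2.17)
    mean = sum(p * (1 - math.tanh(beta * (m + b)) ** 2) for b, p in zip(values, probs))
    return -1 + 1 / (beta * mean)

beta, values, probs = 2.0, (-0.2, 0.3), (0.5, 0.5)   # hypothetical field distribution
roots = critical_points(beta, values, probs)
curvatures = [limiting_curvature(m, beta, values, probs) for m in roots]
print(list(zip(roots, curvatures)))
```

A positive curvature marks a local minimum of the limiting free energy and a negative curvature a local maximum, which is the role played by $z^*$ in the sequel.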
Proposition 14.1 Let m∗ be a critical point of Fβ,N . Then Ph -a.s., for all but finitely
many values of N ,
with
\[
F_{\beta,N}(m^*) = \frac12\,(m^*)^2 - \frac{1}{\beta N}\sum_{i\in\Lambda}\ln\cosh\bigl(\beta(m^* + h_i[\omega])\bigr). \tag{14.2.21}
\]
The above observations provide a detailed picture of the distribution of the order
parameter. Note that m∗ depends on ω.
Next we add dynamics. As in Chap. 13, we consider the discrete-time Glauber dy-
namics with Metropolis transition probabilities (compare with (13.1.4))
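\[
p_N[\omega](\sigma,\sigma') =
\begin{cases}
N^{-1}\exp\bigl(-\beta[H_N[\omega](\sigma') - H_N[\omega](\sigma)]_+\bigr), & \text{if } \|\sigma-\sigma'\|_1 = 2,\\
0, & \text{if } \|\sigma-\sigma'\|_1 > 2,\\
1 - \sum_{\eta\neq\sigma} p_N[\omega](\sigma,\eta), & \text{if } \sigma = \sigma'.
\end{cases}
\tag{14.3.1}
\]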
We write Pσ [ω] = Pσ for the law of this Markov process (for a given realisation of
the magnetic fields) starting in σ . Note that this dynamics is ergodic and reversible
with respect to the Gibbs measure μβ,N [ω] for each ω.
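For a very small system the Metropolis kernel (14.3.1) can be written out in full, and reversibility with respect to the Gibbs measure can be verified by checking detailed balance on every pair of configurations. The following sketch does this for N = 4; the function names, β = 1.3 and the field realisation are hypothetical.

```python
import itertools
import math

def hamiltonian(sigma, fields):
    # H_N[omega](sigma) of (14.1.1); note sum_{i,j} s_i s_j = (sum_i s_i)^2
    N = len(sigma)
    return -sum(sigma) ** 2 / (2 * N) - sum(h * s for h, s in zip(fields, sigma))

def metropolis_matrix(beta, fields):
    # transition probabilities (14.3.1) on all 2^N configurations
    N = len(fields)
    states = list(itertools.product((-1, 1), repeat=N))
    P = {s: {} for s in states}
    for s in states:
        for i in range(N):
            t = s[:i] + (-s[i],) + s[i + 1:]           # single spin flip
            dH = hamiltonian(t, fields) - hamiltonian(s, fields)
            P[s][t] = math.exp(-beta * max(dH, 0.0)) / N
        P[s][s] = 1.0 - sum(P[s].values())             # holding probability
    return states, P

beta, fields = 1.3, [0.2, -0.5, 0.1, 0.4]              # hypothetical field realisation
states, P = metropolis_matrix(beta, fields)
gibbs = {s: math.exp(-beta * hamiltonian(s, fields)) for s in states}  # unnormalised
worst = max(abs(gibbs[s] * p - gibbs[t] * P[t][s])
            for s in states for t, p in P[s].items())
min_hold = min(P[s][s] for s in states)
print(worst, min_hold)
```

The holding probabilities are non-negative because each of the N flip probabilities is at most 1/N.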
A heuristic picture for the metastable behaviour of systems like the random-
field Curie-Weiss model is based on replacing the full Markov process on SN by
an effective Markov process for the order parameter, i.e., by a nearest-neighbour
random walk on ΓN with transition probabilities that are reversible with respect to
the induced measure Qβ,N . The ensuing model can be solved exactly. A natural
which are different from zero only when $m' = m - 2/N,\ m,\ m + 2/N$. The ensuing
Markov process is a one-dimensional nearest-neighbour random walk, for which
most quantities of interest can be computed explicitly by elementary means, as in
Chap. 13. In particular, it is easy to show that if M is the global minimum of $F_{\beta,N}$
and $m^*$ is a local minimum, then, as in Theorem 13.1,
\[
E_{m^*}[\tau_M] = \exp\bigl(\beta N\bigl[F_{\beta,N}(z^*) - F_{\beta,N}(m^*)\bigr]\bigr)
\times \frac{2}{1-z^*}\,\frac{2\pi N/4}{\beta[-a(z^*)]}\,
\sqrt{\frac{\beta E_h[1 - \tanh^2(\beta(z^*+h))] - 1}{1 - \beta E_h[1 - \tanh^2(\beta(m^*+h))]}}\;[1+o(1)], \tag{14.3.3}
\]
where $z^*$ is the saddle point between M and $m^*$, and $a(z^*)$ is defined in (14.2.15).
However, this naive approximation produces the wrong prefactor,
as is shown in our main theorem below.
To obtain precise results, we will need to introduce an exact lumping in the sense
of Sect. 9.2.
14.4 Coarse-graining
So far we did not need any assumption on the distribution of the random field. Now
we assume that the random field takes values in the finite set I = {b1 , . . . , bn }.
Each realisation of the random field {hi [ω]}i∈Λ induces a random partition of the
set Λ = {1, . . . , N} into subsets (see Fig. 14.1)
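\[ \Lambda_k[\omega] = \bigl\{i\in\Lambda : h_i[\omega] = b_k\bigr\}, \qquad k = 1,\dots,n. \tag{14.4.1} \]
\[ \boldsymbol{m}_k[\omega](\sigma) = \frac{1}{N}\sum_{i\in\Lambda_k[\omega]}\sigma_i, \qquad k = 1,\dots,n, \tag{14.4.2} \]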
and we denote by $\boldsymbol{m}[\omega]$ the n-dimensional vector $(\boldsymbol{m}_1[\omega],\dots,\boldsymbol{m}_n[\omega])$. In the sequel
we will use the convention that boldface symbols denote n-dimensional vectors and
their components, while the sum of the components is denoted by the corresponding
plain symbol, e.g. $m[\omega] = \sum_{k=1}^n \boldsymbol{m}_k[\omega]$. The vector $\boldsymbol{m}$ takes values in the set
\[
\Gamma^n_N[\omega] = \mathop{\times}_{k=1}^{n}
\bigl\{-\rho_{N,k}[\omega],\ -\rho_{N,k}[\omega] + \tfrac{2}{N},\ \dots,\ \rho_{N,k}[\omega] - \tfrac{2}{N},\ \rho_{N,k}[\omega]\bigr\}, \tag{14.4.3}
\]
where
\[ \rho_k = \rho_{N,k}[\omega] = \frac{|\Lambda_k[\omega]|}{N}. \tag{14.4.4} \]
Fig. 14.1 Coarse-graining: Λ is partitioned into sets where the magnetic field takes the same value
We denote by ek , k = 1, . . . , n, the lattice vectors of the set ΓNn [ω], i.e., the vectors
of length 2/N parallel to the unit vectors. Note that the random variables ρN,k [ω]
concentrate exponentially fast in N around their mean values Eh [ρN,k ] = Ph (h1 =
bk ) = pk . In particular, we have the following lemma.
where ZN [ω] is the normalising partition function. We use the same symbols Qβ,N ,
Fβ,N for functions defined on the n-dimensional variables x. Since we distinguish
vectors from scalars by using boldface type, there should be no confusion possible.
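Similarly, for a mesoscopic subset $\boldsymbol{A}\subseteq\Gamma^n_N[\omega]$, we define its microscopic
counterpart,
\[ \mathcal{A} = S_N[\boldsymbol{A}] = \bigl\{\sigma\in S_N : \boldsymbol{m}(\sigma)\in\boldsymbol{A}\bigr\}. \tag{14.4.9} \]
The vectors $(\boldsymbol{m}[\omega](\sigma(t)))_{t\in\mathbb{R}_+}$ form a Markov process with transition rates
\[
r_N[\omega](\boldsymbol{x},\boldsymbol{x}') = \frac{1}{Q_{\beta,N}[\omega](\boldsymbol{x})}
\sum_{\sigma\in S_N[\boldsymbol{x}]}\mu_{\beta,N}[\omega](\sigma)\sum_{\sigma'\in S_N[\boldsymbol{x}']} p[\omega](\sigma,\sigma'). \tag{14.4.10}
\]
This can be easily inferred by checking the conditions of Theorem 9.5 in Sect. 9.2.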
We can also check that the capacities of these processes are related. Let the sets
A, B ⊂ SN be defined in terms of the block variables m. This means that, for some
A, B ⊆ ΓNn , A = SN [A] and B = SN [B]. By symmetry under permutations that
leave the partition Λk [ω] invariant, we have
1 2
cap(A, B) = inf μβ,N [ω](σ )p σ, σ h(σ ) − h σ
h∈HA,B 2
σ,σ ∈SN
1 2
= inf μβ,N [ω](σ )p σ, σ u m(σ ) − u m σ
u∈GA,B 2
σ,σ ∈SN
2
= inf u(x) − u x μβ,N [ω](σ ) p σ, σ
u∈GA,B
x,x ∈ΓNn σ ∈SN [x] σ ∈SN [x ]
2
= inf Qβ,N [ω](x)rN x, x u(x) − u x
u∈GA,B
x,x ∈ΓNn
where
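\[
\begin{aligned}
\mathcal{H}_{\mathcal{A},\mathcal{B}} &= \bigl\{h : S_N\to[0,1] : h(\sigma) = 1\ \forall\,\sigma\in\mathcal{A},\ h(\sigma) = 0\ \forall\,\sigma\in\mathcal{B}\bigr\},\\
\mathcal{G}_{\boldsymbol{A},\boldsymbol{B}} &= \bigl\{u : \Gamma^n_N\to[0,1] : u(\boldsymbol{x}) = 1\ \forall\,\boldsymbol{x}\in\boldsymbol{A},\ u(\boldsymbol{x}) = 0\ \forall\,\boldsymbol{x}\in\boldsymbol{B}\bigr\},
\end{aligned}
\tag{14.4.12}
\]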
Theorem 14.3 (Metastable sets) Let MN be the set of (best lattice approximations
of) the local minima of the functions Fβ,N . Then MN is a metastable set in the sense
of Definition 8.2 for the induced dynamics with transition rates given by rN .
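Theorem 14.4 (Mean metastable exit times) Let $\boldsymbol{x}\in M_N$. Let $M_{\boldsymbol{x}}$ be the set of
local minima where $F_{\beta,N}$ is smaller than or equal to $F_{\beta,N}(\boldsymbol{x})$. For every $\sigma\in S[\boldsymbol{x}]$
and $\boldsymbol{x}\in M_N$, $P_h$-a.s. for all but finitely many values of N,
\[
E_\sigma\bigl[\tau_{S[M_{\boldsymbol{x}}]}\bigr] = \exp\bigl(\beta N\bigl[F_{\beta,N}(z^*) - F_{\beta,N}(x^*)\bigr]\bigr)
\times \frac{\pi N}{2\beta[-\bar\gamma_1]}\,
\sqrt{\frac{\beta E_h[1 - \tanh^2(\beta(z^*+h))] - 1}{1 - \beta E_h[1 - \tanh^2(\beta(m^*+h))]}}\;[1+o(1)], \tag{14.4.13}
\]
where $x^* = \sum_{\ell=1}^n \boldsymbol{x}_\ell$, $z^* = \sum_{\ell=1}^n \boldsymbol{z}_\ell$, $\boldsymbol{z}$ is the saddle point between $\boldsymbol{x}$ and $M_{\boldsymbol{x}}$, and
$\bar\gamma_1$ is the unique negative solution of the equation
\[
E_h\left[\frac{[1 - \tanh(\beta(z^*+h))]\,\exp[-2\beta(z^*+h)_+]}
{\dfrac{\exp[-2\beta(z^*+h)_+]}{\beta[1 + \tanh(\beta(z^*+h))]} - 2\gamma}\right] = 1. \tag{14.4.14}
\]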
We are very close to the setting of Chap. 10. To complete the connection we need
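to analyse the measures $Q_{\beta,N}[\omega](\boldsymbol{x})$. We henceforth suppress ω from the notation.
Note that
\[
Z_{\beta,N}\,Q_{\beta,N}(\boldsymbol{x}) = \exp\Bigl(N\beta\Bigl[\tfrac12\Bigl(\sum_{\ell=1}^n \boldsymbol{x}_\ell\Bigr)^2 + \sum_{\ell=1}^n \boldsymbol{x}_\ell b_\ell\Bigr]\Bigr)
\prod_{\ell=1}^n Z^\ell_N(\boldsymbol{x}_\ell/\rho_\ell), \tag{14.5.1}
\]
where
\[
Z^\ell_N(y) = \sum_{\sigma\in S_{\Lambda_\ell}} 2^{-|\Lambda_\ell|}\,1_{\{|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\sigma_i = y\}}. \tag{14.5.2}
\]
For $y\in(-1,1)$, $Z^\ell_N(y)$ can be expressed, via an elementary asymptotics of binomial
coefficients, as
\[
Z^\ell_N(y) = \frac{\exp[-|\Lambda_\ell|\,I(y)]}{\sqrt{\frac{\pi}{2}\,|\Lambda_\ell|/I''(y)}}\,[1+o(1)], \tag{14.5.3}
\]
where o(1) tends to zero as $|\Lambda_\ell|\to\infty$ and I is Cramér's rate function (13.1.13)
(again we identify functions on $\Gamma^n_N$ with their natural extensions to $\mathbb{R}^n$). This means
that we can express the right-hand side of (14.5.1) as
\[
Z_{\beta,N}\,Q_{\beta,N}(\boldsymbol{x}) = \exp\bigl(-N\beta F_{\beta,N}(\boldsymbol{x})\bigr)
\prod_{\ell=1}^n\sqrt{\frac{I''(\boldsymbol{x}_\ell/\rho_\ell)/\rho_\ell}{N\pi/2}}\;[1+o(1)], \tag{14.5.4}
\]
where
\[
F_{\beta,N}(\boldsymbol{x}) = -\frac12\Bigl(\sum_{\ell=1}^n \boldsymbol{x}_\ell\Bigr)^2 - \sum_{\ell=1}^n \boldsymbol{x}_\ell b_\ell
+ \frac{1}{\beta}\sum_{\ell=1}^n \rho_\ell\, I(\boldsymbol{x}_\ell/\rho_\ell). \tag{14.5.5}
\]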
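\[ \sum_{j=1}^n \boldsymbol{z}^*_j + b_\ell = \beta^{-1} I'(\boldsymbol{z}^*_\ell/\rho_\ell) = \beta^{-1} t^*_\ell, \tag{14.5.6} \]
or, with $z^* = \sum_{j=1}^n \boldsymbol{z}^*_j$,
\[ \beta(z^* + b_\ell) = I'(\boldsymbol{z}^*_\ell/\rho_\ell) = t^*_\ell, \tag{14.5.7} \]
which implies
\[ \boldsymbol{z}^*_\ell/\rho_\ell = \tanh\bigl(\beta(z^* + b_\ell)\bigr). \tag{14.5.8} \]
Summing over ℓ, we see that $z^*$ must satisfy the equation
\[ z^* = \frac{1}{N}\sum_{i\in\Lambda}\tanh\bigl(\beta(z^* + h_i)\bigr), \tag{14.5.9} \]
\[
\bigl(A(\boldsymbol{z}^*)\bigr)_{k\ell} = \frac{\partial^2 F_{\beta,N}(\boldsymbol{z}^*)}{\partial \boldsymbol{z}_k\,\partial \boldsymbol{z}_\ell}
= -1 + \delta_{k,\ell}\,\beta^{-1}\rho_\ell^{-1}\, I''_{N,\ell}(\boldsymbol{z}^*_\ell/\rho_\ell) = -1 + \delta_{\ell,k}\hat\lambda_\ell, \tag{14.5.10}
\]
\[ \hat\lambda_\ell = \frac{1}{\beta\rho_\ell\bigl[1 - \tanh^2(\beta(z^* + b_\ell))\bigr]}. \tag{14.5.11} \]
\[ = \Bigl(1 - \beta E_h\bigl[1 - \tanh^2\bigl(\beta(z^* + h)\bigr)\bigr]\Bigr)\prod_{\ell=1}^n\hat\lambda_\ell\;[1+o(1)]. \]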
Lemma 14.7 Let $z^*$ be a solution of (14.5.9). In addition, assume that all numbers
$\hat\lambda_k$ are distinct. Then γ is an eigenvalue of $A(\boldsymbol{z}^*)$ if and only if it is a solution of the
equation
\[
\sum_{\ell=1}^n \frac{1}{\dfrac{1}{\beta\rho_\ell[1 - \tanh^2(\beta(z^* + b_\ell))]} - \gamma} = 1. \tag{14.6.1}
\]
Moreover, (14.6.1) has at most one negative solution, and it has such a solution if
and only if
\[ \beta E_h\bigl[1 - \tanh^2\bigl(\beta(z^* + h)\bigr)\bigr] > 1. \tag{14.6.2} \]
Proof To find the eigenvalues of A, simply replace $\hat\lambda_k$ by $\hat\lambda_k - \gamma$ in the first line of
(14.5.12). This gives
\[
\det\bigl(A(\boldsymbol{z}^*) - \gamma\bigr) = \Bigl(1 - \sum_{\ell=1}^n\frac{1}{\hat\lambda_\ell - \gamma}\Bigr)\prod_{\ell=1}^n(\hat\lambda_\ell - \gamma), \tag{14.6.3}
\]
provided none of the $\hat\lambda_\ell - \gamma$ is zero. Then (14.6.1) is just the requirement that the
first factor in the right-hand side of (14.6.3) vanishes. It is easy to see that, under the
hypothesis of the lemma, this equation has n solutions, and that exactly one of them
is negative under the hypothesis in (14.6.2).
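The determinant identity (14.6.3) and the characterisation of the negative eigenvalue can be tested on a small example. The sketch below builds the matrix with entries $-1 + \delta_{k,\ell}\hat\lambda_\ell$ from (14.5.10), compares its shifted determinant with the right-hand side of (14.6.3), and locates the negative root of $\sum_\ell 1/(\hat\lambda_\ell - \gamma) = 1$ by bisection. The values chosen for $\hat\lambda_\ell$ and all function names are hypothetical.

```python
def det(M):
    # determinant by Gaussian elimination with partial pivoting
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            fac = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= fac * A[c][k]
    return d

def shifted_hessian(lam, gamma):
    # A - gamma * identity, with A_{k,l} = -1 + delta_{k,l} * lam_l as in (14.5.10)
    n = len(lam)
    return [[-1.0 + (lam[k] - gamma if k == l else 0.0) for l in range(n)] for k in range(n)]

def secular(gamma, lam):
    # left-hand side of (14.6.1): sum_l 1/(lam_l - gamma)
    return sum(1.0 / (x - gamma) for x in lam)

def negative_eigenvalue(lam):
    # bisection on gamma < 0; requires secular(0, lam) > 1, cf. (14.6.2)
    lo, hi = -1.0, 0.0
    while secular(lo, lam) > 1.0:
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if secular(mid, lam) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lam = [1.5, 2.5, 4.0]            # hypothetical hat-lambda values with sum of 1/lam_l > 1
gamma1 = negative_eigenvalue(lam)
residual = det(shifted_hessian(lam, gamma1))   # should vanish at an eigenvalue

g = -0.7                          # arbitrary test point for the identity (14.6.3)
prod = 1.0
for x in lam:
    prod *= (x - g)
lhs, rhs = det(shifted_hessian(lam, g)), (1.0 - secular(g, lam)) * prod
print(gamma1, residual, lhs - rhs)
```

On γ ≤ 0 the function $\gamma\mapsto\sum_\ell 1/(\hat\lambda_\ell-\gamma)$ is increasing, which is why a single bisection suffices to find the negative root.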
From the analysis of the critical points of Fβ,N it follows that the landscape of this
function is closely linked to the one-dimensional landscape described in Sect. 11.1
(see Fig. 14.2). We collect the following features:
(i) Let $m^*_1 < z^*_1 < m^*_2 < z^*_2 < \dots < z^*_k < m^*_{k+1}$ be the sequence of minima, respectively,
maxima of the one-dimensional function $F_{\beta,N}$ defined in (14.2.10). To
each minimum $m^*_i$ corresponds a minimum $\boldsymbol{m}^*_i$ of $F_{\beta,N}$ such that $\sum_{\ell=1}^n \boldsymbol{m}^*_{i,\ell} = m^*_i$,
and to each maximum $z^*_i$ corresponds a saddle point $\boldsymbol{z}^*_i$ of $F_{\beta,N}$ such that
$\sum_{\ell=1}^n \boldsymbol{z}^*_{i,\ell} = z^*_i$.
(ii) For any value m of the total magnetisation, the function $F_{\beta,N}(\boldsymbol{x})$ takes its
relative minimum on the set $\{\boldsymbol{y} : y = m\}$ at the point $\hat{\boldsymbol{x}}\in\mathbb{R}^n$ determined
\[ = \sum_{\ell=1}^n \rho_\ell\tanh\bigl(\beta(m + a + b_\ell)\bigr). \]
Moreover,
Remark 14.8 Note that the minimal energy curves x̂(·) defined by (14.7.1) pass
through the minima and the saddle points, but in general are not integral curves of
the gradient flow connecting them. Also note that, since we assume that the random
fields hi have bounded support, for every δ > 0 there exist two universal constants
$0 < c_1 \le c_2 < \infty$ such that
\[ c_1\rho_\ell \le \frac{d\hat{\boldsymbol{x}}_\ell(m)}{dm} \le c_2\rho_\ell, \tag{14.7.4} \]
uniformly in N, $m\in[-1+\delta, 1-\delta]$ and $\ell = 1,\dots,n$.
Finally, in order to apply the results from Chap. 10, we need the form of the
transition rates r near a saddle point z∗ . For σ ∈ SN , put
\[ \Lambda^\pm_k(\sigma) = \bigl\{i\in\Lambda_k : \sigma(i) = \pm1\bigr\}. \tag{14.7.5} \]
all x ∈ ΓN , we have
n
rN (x, x + e ) = Qβ,N (x)−1 μβ,N [ω](σ ) p σ, σ i (14.7.6)
σ ∈SN [x] i∈Λ−
(σ )
−2β[x− 1 +b ]+
= Λ −
(x) e
N .
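for some finite constant c > 0. Thus, as in Chap. 10, we replace the Dirichlet form
near the saddle point by a simplified one, where
\[
\tilde r(\boldsymbol{x}, \boldsymbol{x}+e_\ell) = r_N(\boldsymbol{z}^*, \boldsymbol{z}^*+e_\ell) = r_\ell, \qquad
\tilde r(\boldsymbol{x}+e_\ell, \boldsymbol{x}) = r_\ell\,\frac{\tilde Q_{\beta,N}(\boldsymbol{x})}{\tilde Q_{\beta,N}(\boldsymbol{x}+e_\ell)}, \tag{14.7.8}
\]
are the modified rates of a dynamics on $D_N(\rho)$ that is reversible w.r.t. the measure
$\tilde Q_{\beta,N}(\boldsymbol{x})$. Let $\tilde L_N$ denote the corresponding generator. For $u\in\mathcal{G}_{\boldsymbol{A},\boldsymbol{B}}$, we write the
corresponding Dirichlet form as
\[
\tilde{\mathcal{E}}_{D_N}(u,u) = Q_{\beta,N}(\boldsymbol{z}^*)\sum_{\boldsymbol{x}\in D_N(\rho)}\sum_{\ell=1}^n r_\ell\,
e^{-\beta N((\boldsymbol{x}-\boldsymbol{z}^*),\,A(\boldsymbol{z}^*)(\boldsymbol{x}-\boldsymbol{z}^*))}\,\bigl[u(\boldsymbol{x}) - u(\boldsymbol{x}+e_\ell)\bigr]^2. \tag{14.7.9}
\]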
We now have all the ingredients needed to apply the results of Chap. 10. The
only difference is that the free energy functional Fβ,N is random and depends on N .
But this presents no obstacle. What is still needed is the computation of the relevant
eigenvalues and eigenfunctions of the matrix B defined in (10.1.6).
Then z∗ defined through (14.5.8) is a saddle point, and the unique negative eigen-
value of B(z∗ ) is the unique negative solution γ̂1 = γ̂1 (N, n) of the equation
[1 − tanh2 (β(z∗ + h))]
E = 1. (14.7.11)
1 − 2γ exp (2β[z∗ + h]+ )β[1 + tanh(β(z∗ + h))]
Proof The particular form of the matrix B allows us to obtain a simple characterisa-
tion of all the eigenvalues and eigenvectors. The eigenvalue equations can be written
as
\[ -\sum_{\ell=1}^n\sqrt{r_\ell r_k}\,u_\ell + (r_k\hat\lambda_k - \gamma)\,u_k = 0 \qquad \forall\, 1\le k\le n. \tag{14.7.12} \]
\[ \sum_{k=1}^n \frac{r_k}{r_k\hat\lambda_k - \gamma} = 1. \tag{14.7.14} \]
Inserting the expressions for $r_\ell$ from (14.7.6), $\boldsymbol{z}^*_k/\rho_k$ from (14.5.8) and $\hat\lambda_k$ from
(14.5.11) into (14.7.14), we obtain (14.7.11). Since the left-hand side of (14.7.14)
is monotone increasing in γ as long as γ ≤ 0, and tends to zero as γ → −∞, it follows
that there can be at most one negative solution of this equation, and such a solution
exists if and only if the left-hand side is larger than 1 for γ = 0.
2. The equilibrium behaviour of the RFCW-model was analysed first by Salinas and
Wereszinski [238], and later in more detail by Amaro de Matos, Baêta Segundo and
Perez [5] and Külske [158].
3. For solutions of (14.5.6), see Bovier, Eckhoff, Gayrard and Klein [33] or Bovier,
Bianchi and Ioffe [24]. It is straightforward to analyse the case where some of the
λ̂k ’s in Lemma 14.7 coincide.
4. Another model that can be analysed with the methods of this chapter is the
Glauber dynamics of the Hopfield model of neural networks (see Bovier and
Gayrard [36] for a review) with finitely many stored patterns. This was done in
the thesis of an der Heiden [6] under somewhat restrictive conditions.
Chapter 15
The Curie-Weiss Model with Random Magnetic
Field: Continuous Distributions
We consider the same model with the same dynamics as in Chap. 14, but we drop
the assumption made in Sect. 14.4 that the random magnetic fields take on only
finitely many values. Instead we will only assume that the common distribution of
the random magnetic fields has bounded support. All the results from Sects. 14.1–
14.3 remain unchanged. What fails is the exact lumping procedure that allowed
us to realise the mesoscopic image of our Markov process as a discrete diffusion
process.
Our task is to obtain sharp estimates on metastable exit times. The main result
is formulated in the following theorem, whose proof is the content of the present
chapter.
Theorem 15.1 (Mean metastable exit times) Assume that β and the distribution of
the magnetic fields are such that there exist more than one local minimum of Fβ,N .
Let $m^*$ be a local minimum of $F_{\beta,N}$, $M = M(m^*)$ the set of minima of $F_{\beta,N}$ such
that $F_{\beta,N}(m) < F_{\beta,N}(m^*)$, and $z^*$ the minimax between $m^*$ and M, i.e., the lower
of the highest maxima separating $m^*$ from M to the left, respectively, right. Then,
$P_h$-a.s. and for all but finitely many values of N,
EνS[m∗ ],S[M] [τS[M] ] = exp N Fβ,N z∗ − Fβ,N m∗
2
πN βEh (1 − tanh2 (β(z∗ + h))) − 1
× 1 + o(1) ,
2β[−γ̄1 ] 1 − βEh (1 − tanh2 (β(m∗ + h)))
(15.1.1)
Note that
Fβ,N z∗ − Fβ,N m∗
(z∗ )2 − (m∗ )2
= exp βN − ln cosh β z∗ + hi − ln cosh β m∗ + hi .
2
i∈Λ
(15.1.3)
Remark 15.2 Theorem 15.1 can be improved with the help of coupling techniques
in two ways. First, the starting measure νS[m∗ ],S[M] can be replaced by any configu-
ration σ in a suitably defined subset of S[m∗ ]. Second, the law of the transition time
can be shown to be asymptotically exponential. Both these results rely on rather
intricate and technical coupling arguments (see Sect. 15.6).
The proof of Theorem 15.1 relies on the following estimate for capacities.
Theorem 15.3 (Capacity asymptotics) With the same notation as in Theorem 15.1,
Fig. 15.1 Coarse-graining: Λ is partitioned into sets where the magnetic field takes values in a
narrow interval. Compare with Fig. 14.1
15.2.1 Coarse-graining
Let I denote the support of the common distribution of the random fields $h_i$. Let $I_\ell$,
$\ell\in\{1,\dots,n\}$, be a partition of I such that $|I_\ell| \le C/n = \varepsilon$ for all ℓ and some
C < ∞. Each realisation of the random fields $\{h_i[\omega]\}_{i\in\mathbb{N}}$ induces a random
partition of the set Λ = {1, . . . , N} into subsets (see Fig. 15.1)
\[ \Lambda_\ell[\omega] = \bigl\{i\in\Lambda : h_i[\omega]\in I_\ell\bigr\}, \qquad \ell = 1,\dots,n. \tag{15.2.1} \]
Remark 15.4 To simplify the presentation in this chapter, all statements involving
random variables on (Ω, F , Ph ) are understood to be true with Ph -probability one,
for all but finitely many values of N .
We define
\[ \bar h_\ell = \frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell} h_i, \qquad \tilde h_i = h_i - \bar h_\ell. \tag{15.2.3} \]
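The coarse-graining in (15.2.1) and the centred fields in (15.2.3) are straightforward to compute for a sampled field realisation. The sketch below is an illustration under assumptions not made in the text: the fields are drawn uniformly from [−1, 1], the intervals $I_\ell$ have equal width, and the function names are hypothetical. It checks that the fluctuations $\tilde h_i$ are bounded by the interval width and sum to zero within each block.

```python
import random

def coarse_grain(fields, n, lo, hi):
    # split the support [lo, hi] into n intervals I_l of equal width (hi - lo)/n
    # and group the site indices accordingly, cf. (15.2.1)
    width = (hi - lo) / n
    blocks = [[] for _ in range(n)]
    for i, h in enumerate(fields):
        l = min(int((h - lo) / width), n - 1)   # right endpoint goes to the last block
        blocks[l].append(i)
    return blocks

def block_means(fields, blocks):
    # bar-h_l of (15.2.3); an empty block gets None
    return [sum(fields[i] for i in b) / len(b) if b else None for b in blocks]

random.seed(0)
N, n = 1000, 8
fields = [random.uniform(-1.0, 1.0) for _ in range(N)]   # hypothetical bounded field law
blocks = coarse_grain(fields, n, -1.0, 1.0)
hbar = block_means(fields, blocks)
# tilde-h_i = h_i - bar-h_l for i in block l
tilde = {i: fields[i] - hbar[l] for l, b in enumerate(blocks) for i in b}
max_fluct = max(abs(v) for v in tilde.values())
block_sums = [sum(tilde[i] for i in b) for b in blocks]
print(max_fluct, 2.0 / n)
```

Refining the partition (larger n) shrinks the residual fields $\tilde h_i$, which is what makes the mesoscopic approximation of this chapter accurate.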
form:
n
1 σ (h −h̄ )
Qβ,N [ω](x) = eβN E(x) Eσ 1{m[ω](σ )=x} e =1 i∈Λ i i . (15.2.5)
ZN [ω]
We now turn to the precise computation of the measures Qβ,N [ω](x) in the neigh-
bourhood of the critical points of Fβ,N [ω](x). We will see that this goes very much
along the lines of the analysis for discrete distributions. We get the same expression
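for $Z_{\beta,N}[\omega]\,Q_{\beta,N}[\omega](\boldsymbol{x})$ as in (14.5.1), again with $b_\ell$ replaced by $\bar h_\ell$, and
\[
Z^\ell_{\beta,N}[\omega](y) = E_{\sigma_{\Lambda_\ell}}\Bigl[\exp\Bigl(\beta\sum_{i\in\Lambda_\ell}\tilde h_i\sigma_i\Bigr)\,
1_{\{|\Lambda_\ell|^{-1}\sum_{i\in\Lambda_\ell}\sigma_i = y\}}\Bigr]
\]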
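As in Sect. 14.5, we can express $Z_{\beta,N}[\omega]\,Q_{\beta,N}[\omega](\boldsymbol{x})$ in the form of (14.5.4)
with $F_{\beta,N}$ given by (14.5.5), but $b_\ell$ replaced by $\bar h_\ell$, where the entropy function
$I_{N,\ell}[\omega](y)$ is now defined as the Legendre-Fenchel transform of the
log-moment-generating function,
\[
U_{N,\ell}[\omega](t) = \frac{1}{|\Lambda_\ell|}\ln E^{\tilde h}_{\sigma_{\Lambda_\ell}}\exp\Bigl(t\sum_{i\in\Lambda_\ell}\sigma_i\Bigr)
= \frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\ln\cosh(t + \beta\tilde h_i). \tag{15.2.7}
\]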
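The analysis of the free energy functions near critical points $\mathbf z^*$ of $F_{\beta,N}$ goes very much as in Sect. 14.5, with the obvious replacements. Using that, by standard properties of Legendre-Fenchel transforms, $I'_{N,\ell}(x) = (U'_{N,\ell})^{-1}(x)$, we see that (14.5.8) becomes
\[ z^*_\ell/\rho_\ell = U'_{N,\ell}\big(\beta(z^* + \bar h_\ell)\big) = \frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\tanh\big(\beta(z^*+h_i)\big), \tag{15.2.8} \]
\[ \big(A(\mathbf z^*)\big)_{k\ell} = -1 + \delta_{\ell,k}\hat\lambda_\ell, \tag{15.2.9} \]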
15.2 Coarse-graining and the mesoscopic approximation 349
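where
\[ \hat\lambda_\ell = \frac{I''_{N,\ell}(z^*_\ell/\rho_\ell)}{\beta\rho_\ell} = \frac{1}{\beta\rho_\ell\, U''_{N,\ell}(\beta(z^*+\bar h_\ell))} = \frac{1}{\frac{\beta}{N}\sum_{i\in\Lambda_\ell}\big(1-\tanh^2(\beta(z^*+h_i))\big)}, \tag{15.2.10} \]
which replaces (14.5.11).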
The following is the analogue of Proposition 14.6.
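Proof We start with the representation of $Z_{\beta,N}[\omega]Q_{\beta,N}[\omega](x)$ given in (14.5.4). Using (15.2.10) and the formula for the determinant of $A(\mathbf z^*)$ given in (14.5.12), we get the prefactor. For the exponential term $F_{\beta,N}$, note that by convex duality
\[ I_{N,\ell}(z^*_\ell/\rho_\ell) = t^*_\ell z^*_\ell/\rho_\ell - U_{N,\ell}(t^*_\ell) = \beta(z^*+\bar h_\ell)\, z^*_\ell/\rho_\ell - U_{N,\ell}\big(\beta(z^*+\bar h_\ell)\big). \tag{15.2.12} \]
Hence
\[
\begin{aligned}
F_{\beta,N}(\mathbf z^*) &= -\tfrac12 (z^*)^2 - \sum_{\ell=1}^n z^*_\ell \bar h_\ell + \frac1\beta\sum_{\ell=1}^n \Big[\rho_\ell\,\beta(z^*+\bar h_\ell)\,z^*_\ell/\rho_\ell - \rho_\ell\, U_{N,\ell}\big(\beta(z^*+\bar h_\ell)\big)\Big] \\
&= -\tfrac12 (z^*)^2 - \sum_{\ell=1}^n\Big[ z^*_\ell\bar h_\ell - z^* z^*_\ell - \bar h_\ell z^*_\ell + \frac{1}{\beta N}\sum_{i\in\Lambda_\ell}\ln\cosh\big(\beta(z^*+h_i)\big)\Big] \\
&= \tfrac12 (z^*)^2 - \frac{1}{\beta N}\sum_{i\in\Lambda}\ln\cosh\big(\beta(z^*+h_i)\big),
\end{aligned}
\tag{15.2.13}
\]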
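The last line of (15.2.13) is a function of the scalar $z^*$ only, so its critical points can be located numerically. The sketch below is an illustration under assumptions, not the method of the text: the field sample, the scanning grid and all function names are hypothetical. It uses that the derivative of the one-dimensional function vanishes exactly when (15.2.8), summed over the blocks, holds, and that its second derivative is negative exactly when the sum appearing in (15.2.15) exceeds one.

```python
import math


def free_energy(z, beta, fields):
    """F(z) = z^2/2 - (1/(beta*N)) sum_i ln cosh(beta*(z + h_i)), last line of (15.2.13)."""
    n = len(fields)
    return 0.5 * z * z - sum(math.log(math.cosh(beta * (z + h))) for h in fields) / (beta * n)


def d_free_energy(z, beta, fields):
    """F'(z) = z - (1/N) sum_i tanh(beta*(z + h_i))."""
    return z - sum(math.tanh(beta * (z + h)) for h in fields) / len(fields)


def d2_free_energy(z, beta, fields):
    """F''(z) = 1 - (beta/N) sum_i (1 - tanh^2(beta*(z + h_i)))."""
    return 1.0 - beta * sum(1.0 - math.tanh(beta * (z + h)) ** 2 for h in fields) / len(fields)


def critical_points(beta, fields, grid=401):
    """Zeros of F' in [-1, 1]: scan for sign changes, then refine by bisection.

    The odd default grid size keeps the point 0 off the grid (a hypothetical
    choice, convenient when the field sample is symmetric).
    """
    roots = []
    xs = [-1.0 + 2.0 * k / grid for k in range(grid + 1)]
    for a, b in zip(xs, xs[1:]):
        fa = d_free_energy(a, beta, fields)
        fb = d_free_energy(b, beta, fields)
        if fa * fb < 0.0:
            for _ in range(100):
                mid = 0.5 * (a + b)
                fm = d_free_energy(mid, beta, fields)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    return roots
```

For a symmetric sample of small fields at low temperature one expects two minima separated by a critical point at zero with negative second derivative, which is the one-dimensional shadow of the saddle point $\mathbf z^*$.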
Remark 15.6 The form given in Proposition 15.5 is highly suitable for our purposes, as the dependence on $n$ appears only in the denominator of the prefactor. We will see that this is just what we need to get a formula for capacities that is independent of the choice of the partition $I = \bigcup_{1\le\ell\le n} I_\ell$ and has a limit as $n \to \infty$.
350 15 The Curie-Weiss Model with Random Magnetic Field
The eigenvalues of the Hessian are characterised in the next lemma, which is the
analogue of Lemma 14.7.
Lemma 15.7 Let $\mathbf z^*$ be a solution of (14.5.9). In addition, assume that the distribution of $(h_i)_{i\in\mathbb N}$ is such that all numbers $\hat\lambda_k$ are $P_h$-a.s. distinct. Then $\gamma$ is an eigenvalue of $A(\mathbf z^*)$ if and only if it is a solution of the equation
\[ \sum_{\ell=1}^n \frac{1}{\dfrac{1}{\frac{\beta}{N}\sum_{i\in\Lambda_\ell}\big(1-\tanh^2(\beta(z^*+h_i))\big)} - \gamma} = 1. \tag{15.2.14} \]
Moreover, (15.2.14) has at most one negative solution, and it has such a negative solution if and only if
\[ \frac{\beta}{N}\sum_{i=1}^N \big(1-\tanh^2(\beta(z^*+h_i))\big) > 1. \tag{15.2.15} \]
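The structure behind Lemma 15.7 is that $A(\mathbf z^*)$ is a diagonal matrix minus the all-ones matrix, see (15.2.9). A minimal numerical sketch of this, with hypothetical function names and an illustrative field sample that is not taken from the text, finds the negative root of (15.2.14) by bisection and checks that the determinant of $A(\mathbf z^*) - \gamma$ vanishes there.

```python
import math


def lambda_hat(beta, z_star, blocks):
    """The numbers from (15.2.10), one per block of fields."""
    n_total = sum(len(block) for block in blocks)
    return [
        1.0 / ((beta / n_total) * sum(1.0 - math.tanh(beta * (z_star + h)) ** 2 for h in block))
        for block in blocks
    ]


def secular(gamma, lam):
    """Left-hand side of (15.2.14) minus one."""
    return sum(1.0 / (l - gamma) for l in lam) - 1.0


def negative_root(lam):
    """Unique negative solution of (15.2.14), or None if there is none.

    On gamma < min(lam) the secular function is increasing and tends to -1
    as gamma -> -infinity, so a negative root exists iff secular(0) > 0.
    """
    if secular(0.0, lam) <= 0.0:
        return None
    lo = -1.0
    while secular(lo, lam) > 0.0:
        lo *= 2.0
    hi = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if secular(mid, lam) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def shifted_hessian(gamma, lam):
    """Matrix with entries -1 + delta_{kl} (lambda_hat_l - gamma), cf. (15.2.9)."""
    n = len(lam)
    return [[-1.0 + (lam[k] - gamma if k == l else 0.0) for l in range(n)] for k in range(n)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det
```

Since the sum of the reciprocals of the numbers in (15.2.10) equals the left-hand side of (15.2.15), `negative_root` returns a value exactly in the regime described by the lemma.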
Sections 15.3–15.4 are devoted to the proof of Theorem 15.3. In this section we
prove the upper bound. Obtaining upper bounds on capacities just requires guessing
a test function. Basically, we may ignore the fact that our coarse-graining is not an
exact lumping, since (14.4.11) holds as an upper bound (only the second equality
has to be replaced by an inequality).
Let A = SN [A] and B = SN [B], for some A, B ⊆ ΓNn . Then
1 2
cap(A, B) = inf μβ,N [ω](σ )p σ, σ h(σ ) − h σ
h∈HA,B 2
σ,σ ∈SN
1 2
≤ inf μβ,N [ω](σ )p σ, σ u m(σ ) − u m σ
u∈GA,B 2
σ,σ ∈SN
2
= inf Qβ,N [ω](x)rN x, x u(x) − u x
u∈GA,B
x,x ∈ΓNn
with rN (β, x ), HA,B and GA,B defined precisely as in (14.4.10) and (14.4.12).
We proceed from here as in Chap. 10. For this we need the form of the transition
rates in the neighbourhood of a critical point. The formulas from Chap. 14 have to
15.3 Upper bounds on capacities 351
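Note that, for all $\sigma \in S_N[x]$, $|\Lambda^-_\ell(\sigma)|$ is a constant depending on $x$ only. Using that $h_i = \bar h_\ell + \tilde h_i$, with $\tilde h_i \in [-\varepsilon, \varepsilon]$, we get the bounds
\[ r_N(x, x+e_\ell) = \frac{|\Lambda^-_\ell(x)|}{N}\, e^{-2\beta[m(\sigma)+\bar h_\ell]_+}\big(1+O(\varepsilon)\big). \tag{15.3.3} \]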
It follows that, for all $x \in D_N(\rho)$,
\[ \Big|\frac{r_N(x, x+e_\ell)}{r_N(z^*, z^*+e_\ell)} - 1\Big| \le c\beta(\varepsilon + n\rho) \tag{15.3.4} \]
for some finite constant $c > 0$. With these minimal changes we arrive at the same form of the effective Dirichlet form $\tilde{\mathcal E}_{D_N}(u,u)$ as in (14.7.9).
From now on the upper bound follows as in the case of discrete magnetic fields.
There is just a slight change in that Lemma 14.9 needs to be replaced by the follow-
ing.
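\[ \frac{\beta}{N}\sum_{i=1}^N \big(1-\tanh^2(\beta(z^*+h_i))\big) > 1. \tag{15.3.5} \]
Then $\mathbf z^*$ defined through (15.2.8) is a saddle point, and the unique negative eigenvalue of $B(\mathbf z^*)$ is the unique negative solution $\hat\gamma_1 = \hat\gamma_1(N,n)$ of the equation
\[ \sum_{\ell=1}^n \rho_\ell\, \frac{\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\big(1-\tanh(\beta(z^*+h_i))\big)\exp\big(-2\beta[z^*+\bar h_\ell]_+\big)}{\dfrac{\frac{1}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\big(1-\tanh(\beta(z^*+h_i))\big)\exp\big(-2\beta[z^*+\bar h_\ell]_+\big)}{\frac{\beta}{|\Lambda_\ell|}\sum_{i\in\Lambda_\ell}\big(1-\tanh^2(\beta(z^*+h_i))\big)} - 2\gamma} = 1. \tag{15.3.6} \]
Moreover,
\[ \lim_{n\to\infty}\lim_{N\to\infty} \hat\gamma_1(N,n) = \bar\gamma_1, \tag{15.3.7} \]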
Proof The proof of (15.3.6) is identical to that of Lemma 14.9. The assertion on the
convergence follows from the fact that the size of the small fields tends to zero as
n → ∞.
This result yields the upper bound given in the next proposition.
Combining Proposition 15.9 with Proposition 14.6, we get (after some computa-
tions) the following more explicit representation of the upper bound.
Corollary 15.10 concludes the upper bound in the proof of Theorem 15.3.
We label the realisations of the mesoscopic Markov chain XA,B associated with
the mesoscopic flow fA,B as x = (x−A , . . . , xB ) in such a way that x−A ∈ A,
xB ∈ B, and m(x0 ) = m(z∗ ). We denote by PfA,B the corresponding law on the
mesoscopic paths. If e is a mesoscopic bond, then we write e ∈ x when e =
(x , x+1 ) for some = −A , . . . , B − 1. With each path x of positive probabil-
ity we associate a subordinate microscopic unit flow f x such that
and
f x (b) = 1 ∀ x, e ∈ x. (15.4.2)
b : e(b)=e
Therefore we have
PfA,B (b ∈ σ ) = PfA,B (XA,B = x)Px (b ∈ σ ). (15.4.5)
x
cap(A, B)
' −1 (−1
f
B
fA,B (x , x+1 )f x (σ , σ+1 )
≥ PNA,B (XA,B = x)E x
x
(1 + d(x ))μβ,N (σ )pN (σ , σ+1 )
=−A
' −1 (−1
f
B
fA,B (x , x+1 )f x (σ , σ+1 )
≥ ENA,B E x
, (15.4.9)
(1 + d(x ))μβ,N (σ )pN (σ , σ+1 )
=−A
cap(A, B)
' −1 (−1
f
B
fA,B (x , x+1 ) x
≥ ENA,B E x
φ (σ , σ+1 ) .
(1 + d(x ))Qβ,N (x )rN (x , x+1 )
=−A
(15.4.11)
The point of the above rewrite is that if $\phi^x_\ell(\sigma_\ell, \sigma_{\ell+1})$ were equal to one, then we would be in the same situation as in the case of discrete distributions of the magnetic fields, and the left-hand side would be equal to the upper bound in (15.3.9), up to errors of order $N^{-1/2}(\ln N)^{3/2} + \varepsilon$.
Recall from Sect. 10.3 that we can restrict the expectation to a subset of good realisations $x$ of the mesoscopic Markov chain $X_{A,B}$ whose probability under $P_N^{f_{A,B}}$ is close to one. It remains to construct $f^x$ such that $\phi^x_\ell$ in (15.4.10) is close to one (in a weak sense). This requires some additional notation. Given a mesoscopic trajectory $x = (x_{-\ell_A}, \dots, x_{\ell_B})$, define $k = k(\ell)$ as the direction of the increment of the $\ell$-th jump, i.e., $x_{\ell+1} = x_\ell + e_k$. On the microscopic level such a transition corresponds to a flip of a spin from the $\Lambda_k$-slot. Thus, recalling the notation $\Lambda^\pm_k(\sigma) = \{i \in \Lambda_k : \sigma(i) = \pm 1\}$, we have that if $\sigma_\ell \in S_N[x_\ell]$ and $\sigma_{\ell+1} \in S_N[x_{\ell+1}]$, then $\sigma_{\ell+1} = \theta^+_i \sigma_\ell$ for some $i \in \Lambda^-_{k(\ell)}(\sigma_\ell)$. By our choice of $p_N$ and $r_N$,
\[ \frac{r_N(x_\ell, x_{\ell+1})}{p_N(\sigma_\ell, \sigma_{\ell+1})} = \big|\Lambda^-_{k(\ell)}(\sigma_\ell)\big|\,\big(1 + O(\varepsilon)\big), \tag{15.4.12} \]
uniformly in $\ell$ and in all pairs of neighbours $\sigma_\ell, \sigma_{\ell+1}$. Note that the cardinality of $\Lambda^-_{k(\ell)}(\sigma_\ell)$ is the same for all $\sigma_\ell \in S_N[x_\ell]$.
For x ∈ ΓNn , define the measure
1{σ ∈SN [x]} μβ,N (σ )
μxβ,N (σ ) = (15.4.13)
Qβ,N (x)
15.4 Lower bounds on capacities 355
n
β =1 i∈Λ σi h̃i
e
= 1{σ ∈SN [x]} n
β =1 i∈Λ σi h̃i
σ: x(σ )=x e
"
n
e
β i∈Λ σi h̃i "
n
= 1{σ ∈SN [x]} = μx,
β,N (σ ).
β i∈Λ σi h̃i
=1 σΛ : x (σΛ() )=x e =1
x
Then we can write φ as
If the magnetic fields were constant on each set $I_k$, then we could choose
\[ f^x(\sigma_\ell, \sigma_{\ell+1}) = \mu^{x_\ell}_{\beta,N}(\sigma_\ell)\big/\big|\Lambda^-_{k(\ell)}(\sigma_\ell)\big|, \tag{15.4.15} \]
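Construction of $f^x$
STEP 1. The transition probabilities $q_\ell(\sigma_\ell, \sigma_{\ell+1})$ in (15.4.18) are defined in the following way: all the microscopic jumps are of the form $\sigma_\ell \to \theta^+_j \sigma_\ell$ for some $j \in \Lambda^-_{k(\ell)}(\sigma_\ell)$, where $\theta^+_j$ flips the $j$-th spin from $-1$ to $1$. For such a flip define
\[ q_\ell(\sigma_\ell, \theta^+_j \sigma_\ell) = \frac{e^{2\beta \tilde h_j}}{\sum_{i\in\Lambda^-_{k(\ell)}(\sigma_\ell)} e^{2\beta \tilde h_i}}. \tag{15.4.16} \]
Clearly, these ratios sum up to one. Note also that they satisfy
\[ q_\ell(\sigma_\ell, \theta^+_j \sigma_\ell) = \frac{1 + O(\varepsilon)}{|\Lambda^-_{k(\ell)}(\sigma_\ell)|}. \tag{15.4.17} \]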
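The two properties of the flip probabilities in STEP 1 are elementary and can be illustrated in a few lines. This is a minimal sketch with a hypothetical function name and hypothetical field values: the probabilities in (15.4.16) are normalised weights, and when all centred fields lie in $[-\varepsilon, \varepsilon]$ each of them differs from the uniform value by a factor between $e^{-4\beta\varepsilon}$ and $e^{4\beta\varepsilon}$, which is the content of (15.4.17).

```python
import math


def flip_probabilities(h_tilde_minus, beta):
    """Probabilities (15.4.16) of flipping each minus spin of the active block.

    h_tilde_minus lists the centred fields at the sites that currently carry
    a minus spin (hypothetical input format chosen for this illustration).
    """
    weights = [math.exp(2.0 * beta * h) for h in h_tilde_minus]
    total = sum(weights)
    return [w / total for w in weights]
```

The explicit factor $e^{\pm 4\beta\varepsilon}$ comes from bounding the numerator and the average of the denominator separately by $e^{\pm 2\beta\varepsilon}$.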
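STEP 2. As initial measure $\nu^x_0$, choose $\mu^{x_0}_{\beta,N}$. For $\ell = 0, \dots, \ell_B$, set
\[ \nu^x_{\ell+1}(\sigma_{\ell+1}) = \sum_{\sigma_\ell \in S_N[x_\ell]} \nu^x_\ell(\sigma_\ell)\, q_\ell(\sigma_\ell, \sigma_{\ell+1}). \tag{15.4.18} \]
Note that these measures are concentrated on $S_N[x_\ell]$ and are the marginals of $P^x$ at time $\ell$.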
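STEP 3. Define the microscopic flow through an admissible bond $b = (\sigma_\ell, \sigma_{\ell+1})$ as
\[ f^x(\sigma_\ell, \sigma_{\ell+1}) = P^x(b \in \Sigma) = \nu^x_\ell(\sigma_\ell)\, q_\ell(\sigma_\ell, \sigma_{\ell+1}). \tag{15.4.19} \]
Note that the fact that the $q_\ell$ are probabilities, together with the definition in (15.4.18), ensures that $f^x$ is a unit flow. Consequently,
\[ \phi^x_\ell(\sigma_\ell, \sigma_{\ell+1}) = \frac{\nu^x_\ell(\sigma_\ell)}{\mu^{x_\ell}_{\beta,N}(\sigma_\ell)}\, \frac{r_N(x_\ell, x_{\ell+1})}{p_N(\sigma_\ell, \sigma_{\ell+1})}\, q_\ell(\sigma_\ell, \sigma_{\ell+1}). \tag{15.4.20} \]
Note that $\Psi_0(\sigma_0) = 1$. We need to control the evolution of this quantity in time.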
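Proposition 15.11 There exists a set $T_{A,B}$ of good mesoscopic trajectories from $A$ to $B$ such that
\[ P_N^{f_{A,B}}(X_{A,B} \in T_{A,B}) = 1 - o(1), \tag{15.4.22} \]
and, uniformly in $x \in T_{A,B}$,
\[ E^x\Bigg\{\sum_{\ell=-\ell_A}^{\ell_B-1} \frac{f_{A,B}(x_\ell, x_{\ell+1})\,\Psi_\ell(\sigma_\ell)}{(1 + d(x_\ell))\,Q_{\beta,N}(x_\ell)\,r_N(x_\ell, x_{\ell+1})}\Bigg\} \le \big[1 + O(\varepsilon)\big]\frac{1}{\mathcal E_N(\tilde g)}. \tag{15.4.23} \]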
Let $x$ be given. We have seen in (15.4.13) that $\mu^{x_\ell}_{\beta,N}$ is a product measure. On the other hand, according to (15.4.16), the large microscopic Markov chain $\Sigma$ splits into a direct product of $n$ small microscopic Markov chains $\Sigma^{(1)}, \dots, \Sigma^{(n)}$, which independently evolve on $S_N^{(1)}, \dots, S_N^{(n)}$. Thus, $k(\ell) = k$ means that the $\ell$-th step of the mesoscopic Markov chain induces a step of the $k$-th small microscopic Markov chain $\Sigma^{(k)}$. Let $\tau_1[\ell], \dots, \tau_n[\ell]$ be the numbers of steps performed by each of the small microscopic Markov chains after $\ell$ steps of the mesoscopic Markov chain or, equivalently, after $\ell$ steps of the large microscopic Markov chain $\Sigma$. Then the corrector $\Psi_\ell$ in (15.4.21) also factorises and can be written as
\[ \Psi_\ell(\sigma_\ell) = \prod_{j=1}^n \psi^{(j)}_{\tau_j[\ell]}\big(\sigma_\ell^{(j)}\big). \tag{15.4.25} \]
Therefore we are left with two separate tasks: On the microscopic level we need to control the propagation of errors along small Markov chains, while on the mesoscopic level we need to control the statistics of $\tau_1[\ell], \dots, \tau_n[\ell]$.
To simplify notation we consider the error propagation along the small Markov chains in a more abstract setting. Fix $1 \ll M \in \mathbb N$ and $0 \le \varepsilon \ll 1$. Let $g_1, \dots, g_M \in [-1, 1]$. Consider spin configurations $\xi \in S_M = \{-1, 1\}^M$ with product weights
\[ w(\xi) = e^{\varepsilon \sum_i g_i \xi(i)}. \tag{15.4.26} \]
We denote by $P$ the law of this Markov chain and let $\nu_\tau$ be the distribution of $\Xi_\tau$ (which is concentrated on $S_M^\tau$), i.e., $\nu_\tau(\xi) = P(\Xi_\tau = \xi)$. The propagation of errors along paths of our Markov chain is then quantified in terms of $\psi_\tau(\cdot) = \nu_\tau(\cdot)/\mu_\tau(\cdot)$.
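\[ B_\tau(\xi) = \sum_{i=1}^M e^{2\varepsilon g_i}\, 1_{\{i \in \Lambda^-(\xi)\}} \tag{15.4.29} \]
\[ A_\tau = E_{\mu_\tau} B_\tau(\cdot) = \sum_{i=1}^M e^{2\varepsilon g_i}\, \mu_\tau\big(i \in \Lambda^-(\cdot)\big). \]
Then there exists a $c = c(\delta_0, \delta_1)$ such that, for any trajectory $\xi = (\xi_0, \dots, \xi_r)$,
\[ \psi_\tau(\xi_\tau) \le \Big(\frac{A_0}{B_0(\xi_0)}\Big)^{\tau} e^{c\varepsilon\tau^2/M} \tag{15.4.30} \]
for all $\tau = 0, 1, \dots, r$.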
Remark 15.13 The second factor in the bound in (15.4.30) will be seen to be what we want, since it grows much slower than $\varphi_{A,B}(x_\ell, x_{\ell+1})$ decays. The first factor involves the ratio of $A_0$ and $B_0$, which is more delicate. To control it we require a concentration estimate showing that $A_0/B_0(\xi_0) \le 1 + O(1/\sqrt M)$, which will be done later.
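\[ \frac{Z_{\tau+1}}{Z_\tau} = \frac{1}{Z_\tau}\sum_{\xi \in S_M^{\tau+1}} w(\xi) = \frac{1}{Z_\tau}\sum_{\xi \in S_M^{\tau+1}} \frac{1}{|\Lambda^+(\xi)|}\sum_{j \in \Lambda^+(\xi)} w(\theta^-_j \xi)\, e^{2\varepsilon g_j} \tag{15.4.34} \]
\[ = \frac{1}{Z_\tau}\,\frac{1}{|\Lambda^+_{\tau+1}|}\sum_{\xi \in S_M^{\tau}} w(\xi) \sum_{j \in \Lambda^-(\xi)} e^{2\varepsilon g_j} = \frac{1}{|\Lambda^+(\xi_{\tau+1})|}\, \mu_\tau\Big(\sum_{j \in \Lambda^-(\cdot)} e^{2\varepsilon g_j}\Big). \]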
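Consequently,
\[ \psi_{\tau+1}(\xi_{\tau+1}) = \frac{1}{|\Lambda^+(\xi_{\tau+1})|} \sum_{j \in \Lambda^+(\xi_{\tau+1})} \frac{A_\tau}{B_\tau(\theta^-_j \xi_{\tau+1})}\, \psi_\tau\big(\theta^-_j \xi_{\tau+1}\big). \tag{15.4.36} \]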
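The recursion (15.4.36) can be verified by exact enumeration on a very small system. The sketch below is an illustration under assumptions: it takes the small chain to start from the product-weight measure on a layer with a fixed number of plus spins and to flip one minus spin per step with probability proportional to $e^{2\varepsilon g_j}$, as in (15.4.16); the function names, the system size and the values of $g$ are hypothetical.

```python
import itertools
import math


def layer(size, plus_count):
    """All configurations in {-1,+1}^size with exactly plus_count plus spins."""
    configs = []
    for plus in itertools.combinations(range(size), plus_count):
        xi = [-1] * size
        for i in plus:
            xi[i] = 1
        configs.append(tuple(xi))
    return configs


def weight(xi, g, eps):
    """Product weight (15.4.26)."""
    return math.exp(eps * sum(gi * s for gi, s in zip(g, xi)))


def b_value(xi, g, eps):
    """B(xi) from (15.4.29): sum of exp(2*eps*g_i) over the minus spins of xi."""
    return sum(math.exp(2.0 * eps * gi) for gi, s in zip(g, xi) if s == -1)


def flip(xi, j, value):
    """Copy of xi with the j-th spin set to value."""
    return xi[:j] + (value,) + xi[j + 1:]


def run(size, start_plus, steps, g, eps):
    """Return [(psi_tau, A_tau)] for tau = 0..steps, with psi_tau = nu_tau / mu_tau."""
    configs = layer(size, start_plus)
    norm = sum(weight(x, g, eps) for x in configs)
    mu = {x: weight(x, g, eps) / norm for x in configs}
    nu = dict(mu)  # the chain starts in equilibrium on the initial layer
    history = []
    for tau in range(steps + 1):
        a_tau = sum(mu[x] * b_value(x, g, eps) for x in mu)
        history.append(({x: nu[x] / mu[x] for x in mu}, a_tau))
        if tau == steps:
            break
        nxt = layer(size, start_plus + tau + 1)
        new_nu = {x: 0.0 for x in nxt}
        for x, p in nu.items():
            b = b_value(x, g, eps)
            for j, s in enumerate(x):
                if s == -1:
                    new_nu[flip(x, j, 1)] += p * math.exp(2.0 * eps * g[j]) / b
        norm = sum(weight(x, g, eps) for x in nxt)
        mu = {x: weight(x, g, eps) / norm for x in nxt}
        nu = new_nu
    return history


def recursion_defect(history, g, eps):
    """Largest violation of (15.4.36) over all layers and configurations."""
    worst = 0.0
    for tau in range(len(history) - 1):
        psi, a_tau = history[tau]
        psi_next, _ = history[tau + 1]
        for x, value in psi_next.items():
            plus = [j for j, s in enumerate(x) if s == 1]
            rhs = sum(
                a_tau / b_value(flip(x, j, -1), g, eps) * psi[flip(x, j, -1)] for j in plus
            ) / len(plus)
            worst = max(worst, abs(value - rhs))
    return worst
```

With `eps = 0` all weights are equal and the corrector stays identically one, which is the situation of constant fields on each block.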
Iterating the above procedure, we arrive at the following conclusion. Consider the set $D(\xi_{\tau+1})$ of all paths $\xi = (\xi_0, \dots, \xi_\tau, \xi_{\tau+1})$ of positive probability that start in $S_M^0$ and end at $\xi_{\tau+1} \in S_M^{\tau+1}$. The number $D_{\tau+1} = |D(\xi_{\tau+1})|$ of such paths does not depend on $\xi_{\tau+1}$. Therefore, since $\psi_0 = 1$, we have
\[ \psi_{\tau+1}(\xi_{\tau+1}) = \frac{1}{D_{\tau+1}} \sum_{\xi \in D(\xi_{\tau+1})} \prod_{s=0}^{\tau} \frac{A_s}{B_s(\xi_s)}. \tag{15.4.37} \]
Lemma 15.14
\[ \frac{A_s}{B_s(\xi_s)} = \Big(1 + \frac{O(\varepsilon)}{M}\Big) \frac{A_{s-1}}{B_{s-1}(\xi_{s-1})}, \tag{15.4.38} \]
where $O(\varepsilon)$ is uniform in all parameters.
where for $\xi_0 \in S_M^0$ the relation $\xi_0 \sim \xi_\tau$ means that there is a path of positive probability from $\xi_0$ to $\xi_\tau$. But all such $\xi_0$ differ in at most $2\tau$ coordinates. It is straightforward to see that if $\xi_0 \sim \xi_\tau$ and $\xi'_0 \sim \xi_\tau$, then
\[ \frac{B_0(\xi_0)}{B_0(\xi'_0)} \le e^{O(\varepsilon)\tau/M}, \tag{15.4.40} \]
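Proof of Lemma 15.14 Let $\xi \in S_M^s$ and $\xi' = \theta^-_j \xi \in S_M^{s-1}$. Note that
\[ B_{s-1}(\xi') - B_s(\xi) = e^{2\varepsilon g_j} = 1 + O(\varepsilon). \tag{15.4.41} \]
Similarly,
\[ A_{s-1} - A_s = \sum_{i=1}^M e^{2\varepsilon g_i}\big[\mu_{s-1}(i \in \Lambda^-) - \mu_s(i \in \Lambda^-)\big] \tag{15.4.42} \]
\[ = 1 + \sum_{i=1}^M \big(e^{2\varepsilon g_i} - 1\big)\big[\mu_{s-1}(i \in \Lambda^-) - \mu_s(i \in \Lambda^-)\big]. \]
\[ \mu_{s-1}(i \in \Lambda^-) - \mu_s(i \in \Lambda^-) = O\Big(\frac1M\Big) \tag{15.4.43} \]
\[ A_{s-1} - B_{s-1}(\xi') = \sum_{i=1}^M \big(e^{2\varepsilon g_i} - 1\big)\big[\mu_{s-1}(i \in \Lambda^-) - 1_{\{i \in \Lambda^-(\xi')\}}\big] = O(\varepsilon) M. \tag{15.4.44} \]
Hence
\[ \frac{A_s}{B_s(\xi)} = \frac{A_{s-1} - 1 + O(\varepsilon)}{B_{s-1}(\xi') - 1 + O(\varepsilon)} = \frac{A_{s-1}}{B_{s-1}(\xi')}\Big(1 + \frac{O(\varepsilon)}{M}\Big), \tag{15.4.45} \]
which is (15.4.38).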
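\[ \Big(\frac{A_0}{B_0(\xi_0)}\Big)^{\tau} = \Big(1 + \frac{B_0(\xi_0) - A_0}{A_0}\Big)^{-\tau}. \tag{15.4.47} \]
Due to the fact that $|\Lambda^-(\xi)|$ only depends on the magnetisation of $\xi$, we can rewrite $Y$ as
\[ Y = \frac{1}{A_0}\sum_{i=1}^M \big(e^{2\varepsilon g_i} - 1\big)\big[1_{\{i \in \Lambda^-(\xi)\}} - \mu_\tau(i \in \Lambda^-)\big]. \tag{15.4.49} \]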
Lemma 15.16 There exist finite positive constants $c, C$ such that, for any $r > \varepsilon/\sqrt M$,
\[ \mu_0\big(|Y| > r\big) \le C e^{-cMr^2/\varepsilon^2}. \tag{15.4.51} \]
2
Eμ0 [1 + Y ]−τ ≤ Eμ0 e−τ Y +dτ Y (15.4.52)
√
≤ eετ/ M+dε τ/M + τ C e−τ r+dτ r e−cMr /ε dr
2 2 2 2
√
√
ετ/ M+dε 2 τ/M 2πτ 2 /(cM/ε 2 −dτ )
=e + eτ .
2(cM/ε 2 − dτ )
Since we have assumed that τ ≤ CM, and ε is small, the right-hand side of (15.4.52)
is as claimed in (15.4.46).
Going back to (15.4.25), we infer that the corrector of the large Markov chain $\Sigma$ satisfies the following upper bound. Let $\sigma = (\sigma_0, \sigma_1, \dots)$ be a trajectory of $\Sigma$ (as sampled from $P^x$). Then, for every $\ell = 0, 1, \dots, \ell_B - 1$,
\[ \Psi_\ell(\sigma_\ell) \le \exp\Big\{c\varepsilon \sum_{j=1}^n \frac{\tau_j[\ell]^2}{M_j}\Big\} \prod_{j=1}^n \Bigg(\frac{A_0^{(j)}}{B_0^{(j)}\big(\sigma_0^{(j)}\big)}\Bigg)^{\tau_j[\ell]}, \tag{15.4.53} \]
where $M_j = |\Lambda_j| = \rho_j N$, and $A_0^{(j)}$, $B_0^{(j)}$ are defined as in (15.4.29) with respect to the corresponding small microscopic Markov chains. We need to check that when this bound is inserted into the left-hand side of (15.4.23), we recover the right-hand side as upper bound.
By the construction of the mesoscopic Markov chain $P_N^{f_{A,B}}$, and in view of (15.2.8) and (14.5.9), the step frequencies $\tau_j[\ell]/\ell$ are on average proportional to $\rho_j$. Therefore there exists a constant $C_1$ such that, up to exponentially negligible $P_N^{f_{A,B}}$-probabilities,
\[ \max_{1 \le j \le n} \frac{\tau_j[\ell_B]}{M_j} \le C_1. \tag{15.4.54} \]
Our mesoscopic trajectories are constructed such that the assumptions of
Sect. 15.4.2 hold for each of them. Thus Lemma 15.4.46 together with Proposi-
tion 15.12 imply that
'
(
n
τj [] τj []2
Ψ (σ ) ≤ exp O(ε) max , (15.4.55)
Mj Mj
j =1
√ 2
≤ exp max O( ε) √ , O(ε) ,
N N
uniformly in $\ell = 0, \dots, \ell_B$. Note that to obtain the second line we use the Cauchy-Schwarz inequality and the fact that $\sum_{j=1}^n M_j = N$ and $\varepsilon = O(1/n)$. Inserting this into the bound (15.4.9), we have now proved that
f
cap(A, B) ≥ ENA,B 1TA,B
' −1 √ (
fA,B (x , x+1 ) exp(max(O( ε) √|| , O(ε) N )) −1
2
B
N
× E x
.
(1 + d(x ))Qβ,N (x )rN (x , x+1 )
=−A
(15.4.56)
fA,B (x , x+1 ) √ || 2
1 − exp max O( ε) √ , O(ε)
(1 + d(x ))Qβ,N (x )rN (x , x+1 ) N N
=−A
B −1
√ fA,B (x , x+1 )
≤ O( ε) . (15.4.59)
(1 + d(x ))Qβ,N (x )rN (x , x+1 )
=−A
15.5 Estimates on mean hitting times 363
Thus, we have established that, up to an error of order $\sqrt\varepsilon$, the lower bound on the capacity for the coarse-grained model is also a lower bound for the full model. This leads to the inequality in (15.4.24) which, together with the upper bound given in (15.3.10), concludes the proof of Theorem 15.3.
In this section we conclude the proof of Theorem 15.1. The capacity in the denom-
inator in the right-hand side of (7.1.41) is controlled by Theorem 15.3. It therefore
remains to control the equilibrium potential hA,B (σ ). We are in a situation where
the renewal inequality hA,B (σ ) ≤ cap(A, σ )/cap(B, σ ) cannot be used because ca-
pacities of single configurations are too small. We will need another method to cope
with this problem, explained in Sects. 15.5.1–15.5.2.
We expect the right-hand side of (15.5.1) to be of order $Q_{\beta,N}(m^*_0)$, so that all terms in the sum over $m$ with $Q_{\beta,N}(m)$ much smaller than $Q_{\beta,N}(m^*_0)$ can be ignored. More precisely, we choose $\delta > 0$ in such a way that, for all $N$ large enough, there is no critical point $z$ of $F_{\beta,N}$ with $F_{\beta,N}(z) \in [F_{\beta,N}(m^*_0), F_{\beta,N}(m^*_0) + \delta]$, and we define
\[ U_\delta = \big\{ m \in [-1, 1] : F_{\beta,N}(m) \le F_{\beta,N}(m^*_0) + \delta \big\}. \tag{15.5.2} \]
The main problem is to control the equilibrium potential $h_{A,B}(\sigma)$ for configurations $\sigma \in S_N[U_\delta]$. To do so, first note that
\[ U_\delta = U_\delta(m^*_0) \cup \bigcup_{m \in \mathcal M} U_\delta(m), \tag{15.5.4} \]
where $U_\delta(m)$ is the connected component of $U_\delta$ containing $m$ (see Fig. 15.2). Note that it may happen that $U_\delta(m) = U_\delta(m')$ for two different minima $m, m' \in \mathcal M$.
Fig. 15.2 Decomposition of $[-1, 1]$: $U^c_\delta$ is represented by dotted lines, $U_\delta = U_\delta(m^*_0) \cup \bigcup_{m \in \mathcal M} U_\delta(m)$ by continuous lines
With this notation we have the following lemma.
(ii)
\[ \sum_{\sigma \in S_N[U_\delta(m^*_0)]} \mu_{\beta,N}(\sigma)\big[1 - h_{A,B}(\sigma)\big] \le e^{-\beta N c}\, Q_{\beta,N}(m^*_0). \tag{15.5.6} \]
The treatment of (i) and (ii) is completely similar, as both rely on a rough estimate
of the probability of leaving the starting valley before visiting its minimum, which
will be discussed below.
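Assuming Lemma 15.18, we can readily conclude the proof of Theorem 15.1. Indeed, using (15.5.5) together with (15.5.3), we obtain the upper bound
\[ \sum_{\sigma \in S_N} \mu_{\beta,N}(\sigma) h_{A,B}(\sigma) \le \sum_{m \in U_\delta(m^*_0)} Q_{\beta,N}(m) + O\big(Q_{\beta,N}(m^*_0) e^{-\beta N c}\big) = Q_{\beta,N}(m^*_0) \sqrt{\frac{\pi N}{2\beta a(m^*_0)}}\,\big(1 + o(1)\big), \tag{15.5.7} \]
where $a(m^*_0)$ is given in (14.2.17). On the other hand, using (15.5.6) we get the corresponding lower bound
\[
\begin{aligned}
\sum_{\sigma \in S_N} \mu_{\beta,N}(\sigma) h_{A,B}(\sigma) &\ge \sum_{m \in U_\delta(m^*_0)} \sum_{\sigma \in S_N[m]} \mu_{\beta,N}(\sigma)\big[1 - \big(1 - h_{A,B}(\sigma)\big)\big] \\
&\ge \sum_{m \in U_\delta(m^*_0)} Q_{\beta,N}(m) - O\big(Q_{\beta,N}(m^*_0) e^{-\beta N c}\big) \\
&= Q_{\beta,N}(m^*_0) \sqrt{\frac{\pi N}{2\beta a(m^*_0)}}\,\big(1 + o(1)\big).
\end{aligned}
\tag{15.5.8}
\]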
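From (15.2.11) for $Q_{\beta,N}(m^*_0)$ and (15.1.4) for $\mathrm{cap}(A, B)$, we finally obtain
\[
\begin{aligned}
E_{\nu_{A,B}}[\tau_B] &= \frac{\sum_{\sigma \in S_N} \mu_{\beta,N}(\sigma) h_{A,B}(\sigma)}{\mathrm{cap}(A, B)} \\
&= \exp\Big(\beta N \big[F_{\beta,N}(z^*) - F_{\beta,N}(m^*_0)\big]\Big) \\
&\quad\times \frac{2\pi N}{\beta |\hat\gamma_1|} \sqrt{\frac{\beta E_h\big(1 - \tanh^2(\beta(z^* + h))\big) - 1}{1 - \beta E_h\big(1 - \tanh^2(\beta(m^*_0 + h))\big)}}\,\big(1 + o(1)\big),
\end{aligned}
\tag{15.5.9}
\]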
We next prove Lemma 15.18, giving a detailed proof only for (i) because the proof
of (ii) is completely analogous. This requires us to get an estimate on the minimiser
of the Dirichlet form, the harmonic function hA,B (σ ).
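First note that, since $h_{A,B}(\sigma) = P_\sigma(\tau_A < \tau_B)$ for all $\sigma \notin A \cup B$, the only non-zero contributions to the sum in (i) come from those sets $U_\delta(m)$ (at most two) whose corresponding $m$ is such that there are no minima of $\mathcal M$ between $m^*_0$ and $m$. By symmetry, we can just as well analyse one of these two sets, denoted by $U_\delta(m^*)$, assuming for definiteness that $m^*_0 < m^*$. Next note that, since $h_{A,B}(\sigma) = 0$ for all $\sigma$ such that $m^* \le m(\sigma)$, the problem can be reduced further to the set
\[ U^-_\delta = U_\delta(m^*) \cap \{ m \in [0, 1] : m < m^* \}. \tag{15.5.10} \]
Define the mesoscopic counterpart of $U^-_\delta$, namely, for fixed $m^* \in \mathcal M$ and $n \in \mathbb N$, let $\mathbf m^* \in \Gamma^n_N$ be the minimum of $F_{\beta,N}(x)$ corresponding to $m^*$, and define
\[ \mathbf U_\delta = \mathbf U_\delta(\mathbf m^*) = \{ x \in \Gamma^n_N : m(x) \in U^-_\delta \}. \tag{15.5.11} \]
We write the boundary of $\mathbf U_\delta$ as $\partial \mathbf U_\delta = \partial_A \mathbf U_\delta \sqcup \partial_B \mathbf U_\delta$, where $\partial_B \mathbf U_\delta = \partial \mathbf U_\delta \cap \mathbf B$, and observe that, for all $\sigma \in S_N[\mathbf U_\delta]$,
\[ h_{A,B}(\sigma) = P_\sigma[\tau_A < \tau_B] \le P_\sigma\big[\tau_{S[\partial_A \mathbf U_\delta]} < \tau_{S[\partial_B \mathbf U_\delta]}\big]. \tag{15.5.12} \]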
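Let $\max_{1 \le \ell \le n} \rho_\ell \ll \theta(\varepsilon) \ll 1$, and for $\theta = \theta(\varepsilon)$ define
\[ \mathbf G_\theta = \Big\{ \mathbf m \in \mathbf U_\delta : \sum_{\ell=1}^n \frac{(m_\ell - m^*_\ell)^2}{\rho_\ell} \le \frac{\varepsilon^2}{\theta} \Big\}. \tag{15.5.13} \]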
Fig. 15.3 Neighbourhoods of m∗0 and m∗ in the space ΓNn , where Uδ (m∗0 ) denotes the mesoscopic
counterpart of U (m∗0 )
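Proposition 15.19 For any $\alpha \in (0, 1)$ there exists an $n_0 \in \mathbb N$ such that the inequality
\[ P_\sigma\big(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) \le e^{-(1-\alpha)\beta N [F_{\beta,N}(m^*_0) + \delta - F_{\beta,N}(m(\sigma))]} \tag{15.5.14} \]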
Throughout the next computations $c$, $c'$ and $c''$ will denote positive constants that are independent of $n$ but may depend on $\beta$ and on the distribution of $h$. The value of $c$ and $c'$ may change from line to line.
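We first observe that, for all $\sigma \in S_N[\mathbf U_\delta \setminus \mathbf G_\theta]$,
\[ P_\sigma\big[\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big] \le P_\sigma\big[\tau_{S_N[\partial_A \mathbf U_\delta]} < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big]. \tag{15.5.15} \]
The probability in the right-hand side of (15.5.15) is the main object of investigation. The idea behind the proof of bound (15.5.14) is simple. Suppose that $\psi$ is a bounded super-harmonic function defined on $S_N[\mathbf U_\delta \setminus \mathbf G_\theta]$, with $L = L_N$ the generator of the Markov process defined in Sect. 14.3, i.e.,
Then $\psi(\sigma_t)$ is a supermartingale, and $T = \tau_{S_N[\partial_A \mathbf U_\delta]} \wedge \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}$ is an integrable stopping time, so that, by Doob's optional stopping theorem,
and hence
\[ P_\sigma\big(\tau_{S_N[\partial_A \mathbf U_\delta]} < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) \le \max_{\sigma' \in S_N[\partial_A \mathbf U_\delta]} \frac{\psi(\sigma)}{\psi(\sigma')}. \tag{15.5.19} \]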
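The mechanism behind (15.5.19) can be seen on a toy example that is much simpler than the spin dynamics of this chapter. The sketch below assumes a nearest-neighbour walk on $\{0, \dots, L\}$ stepping up with probability $p < 1/2$, and the super-harmonic function $\psi(x) = r^x$ with $1 < r \le (1-p)/p$; the walk, the choice of $\psi$ and the function names are all hypothetical and serve only to illustrate the optional stopping argument. The exact exit probability is the classical gambler's ruin formula.

```python
def exit_probability(x, length, p):
    """Exact P_x(hit `length` before 0) for a walk stepping +1 w.p. p, -1 w.p. 1-p (p != 1/2)."""
    ratio = (1.0 - p) / p
    return (1.0 - ratio ** x) / (1.0 - ratio ** length)


def relative_generator(r, p):
    """(L psi)(x) / psi(x) for psi(x) = r**x, i.e. p*r + (1-p)/r - 1.

    psi is super-harmonic for this walk exactly when this number is <= 0.
    """
    return p * r + (1.0 - p) / r - 1.0


def supermartingale_bound(x, length, r):
    """psi(x) / psi(length) for psi(x) = r**x, the analogue of the right-hand side of (15.5.19)."""
    return r ** (x - length)
```

Choosing $r = (1-p)/p$ makes $\psi$ harmonic and the bound nearly sharp, while smaller $r > 1$ gives a strictly super-harmonic $\psi$ and a weaker bound.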
Proposition 15.20 For any α ∈ (0, 1) there exists n0 ∈ N such that the function
ψ(σ ) = φ(m(σ )) with φ : Rn → R defined by
The proof of Proposition 15.20 will involve computations with differences of the
function Fβ,N . We collect some necessary properties that will be needed along the
way. First we need some control on the second derivative of this function. A simple
computation shows that
∂ 2 Fβ,N (x) 2 1
= −1 + IN, (x /ρ ) . (15.5.21)
∂x2 N βρ
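\[ y = \frac{1}{|\Lambda_\ell|} \sum_{i \in \Lambda_\ell} \tanh(t + \beta \tilde h_i), \tag{15.5.23} \]
and hence
\[ \tanh(t - \beta\varepsilon) \le y \le \tanh(t + \beta\varepsilon), \tag{15.5.24} \]
or, equivalently, (15.5.22).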
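\[ 0 \le I''_{N,\ell}(y) \le \frac{1}{1 - [|y| + \varepsilon\beta(1 - y^2)]^2}. \tag{15.5.25} \]
In particular, for all $y \in [-1 + \nu, 1 - \nu]$ with $\nu \in (0, 1/2)$,
\[ 0 \le I''_{N,\ell}(y) \le \frac{1}{2\nu + \nu^2 + O(\varepsilon)} \le c, \tag{15.5.26} \]
and, for all $y \in (-1, -1 + \nu] \cup [1 - \nu, 1)$,
\[ 0 \le I''_{N,\ell}(y) \le \frac{1}{1 - |y|}. \tag{15.5.27} \]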
Proof We only consider the case y ≥ 0, the case y < 0 being completely analogous.
(x) = (U (I (x)))−1 , setting t = I (y) arctanh(y),
Using the relation IN, N, N, N,
and using Lemma 15.21, we obtain
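\[
\begin{aligned}
I''_{N,\ell}(y) &= \frac{1}{\frac{1}{|\Lambda_\ell(x)|} \sum_{i \in \Lambda_\ell(x)} \big[1 - \tanh^2(\beta \tilde h_i + t)\big]} \\
&\le \frac{1}{1 - \tanh^2(\varepsilon\beta + t)} \\
&\le \frac{1}{1 - \tanh^2\big(\tanh^{-1}(y) + 2\varepsilon\beta\big)} \\
&\le \frac{1}{1 - \big[y + 2\varepsilon\beta \tanh'\big(\tanh^{-1}(y)\big)\big]^2} \\
&= \frac{1}{1 - [y + 2\varepsilon\beta(1 - y^2)]^2},
\end{aligned}
\tag{15.5.28}
\]
where we use that $\tanh$ is monotone increasing. The remainder of the proof is elementary algebra.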
Corollary 15.23
(i) If x /ρ ∈ [−1 + ν, 1 − ν] with ν > 0, then
1
g (x) = −x − h̄ + I (x /ρ ) + O(1/N ). (15.5.30)
β N,
The proof of this corollary is elementary and will not be detailed. The usefulness of (ii) results from the fact that $|I'_{N,\ell}|$ is large on the relevant domain. More precisely, we have the following lemma.
Lemma 15.24 There exists a $\nu > 0$ independent of $N$ and $n$ such that if $x_\ell/\rho_\ell > 1 - \nu$, then $g_\ell(x)$ is strictly increasing in $x_\ell$ and tends to $\infty$ as $x_\ell/\rho_\ell \uparrow 1$. Similarly, if $x_\ell/\rho_\ell < -1 + \nu$, then $g_\ell(x)$ is strictly decreasing in $x_\ell$ and tends to $-\infty$ as $x_\ell/\rho_\ell \downarrow -1$.
Proof Combine Corollary 15.23(ii) with Lemma 15.21 and note that $\bar h_\ell$ is bounded by hypothesis.
The next step towards the proof of Proposition 15.20 is the following lemma.
then
\[ \sum_{\ell \in S(\mathbf m)} \rho_\ell\, g_\ell^2(\mathbf m) \ge c\, \frac{\varepsilon^2}{\theta}. \tag{15.5.35} \]
1
m = tanh β g (m) 1 + o(1) + m + hi , (15.5.36)
N
i∈Λ
1/2
≤c ρ g (m) ≤ c ρ g2 (m) . (15.5.37)
∈S(m) ∈S(m)
Note that the function $m \mapsto m - \frac1N \sum_{i=1}^N \tanh(\beta(m + h_i))$ has, by (14.2.18), a non-zero derivative at $m^*$. Moreover, by construction, $m^*$ is the only zero of this function in $U^-_\delta(m^*)$. From this observation, together with (15.5.37), we conclude that
' (1/2
n
ρ g2 (m) ≥ cm − m∗ − 2 ρ (15.5.38)
=1 ∈S(m)
/
for some constant c < ∞, where we use the triangle inequality and the fact that
|m − N1 i∈Λ tanh(β(m + hi ))| ≤ 2ρ . Under the hypothesis of the lemma, this
√
gives the desired bound when |m − m∗ | ≥ c ε/ θ for some constant c < ∞. On
the other hand, we can write for ∈ S(m),
m − m∗ ≤ 1 tanh β g (m) 1 + ω(1) + m + hi − tanh β(m + hi )
N
i∈Λ
1
+ tanh β(m + hi ) − tanh β m∗ + hi
N
i∈Λ
≤ cρ m − m∗ + c ρ g (m). (15.5.39)
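Proof of Proposition 15.20 Let $\sigma \in S_N[\mathbf U_\delta \setminus \mathbf G_\theta]$, and set $x = m(\sigma)$. For $\psi$ as in Proposition 15.20, we abbreviate $(L\psi)(\sigma) = (L\phi)(x)$. Let $\sigma^i$ be the configuration obtained from $\sigma$ after a spin-flip at $i$, and introduce the notation
\[ (L\phi)(x) = \sum_{\ell=1}^n (L_\ell \phi)(x), \tag{15.5.41} \]
where
\[ (L_\ell \phi)(x) = \sum_{i \in \Lambda^-_\ell(x)} p_N(\sigma, \sigma^i)\big[\phi(x + e_\ell) - \phi(x)\big] + \sum_{i \in \Lambda^+_\ell(x)} p_N(\sigma, \sigma^i)\big[\phi(x - e_\ell) - \phi(x)\big]. \tag{15.5.42} \]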
and observe that they are uniformly close to the mesoscopic rates $r_N$, namely,
\[ e^{-c\varepsilon} \le \frac{P^\sigma_{\pm,\ell}}{r_N(x, x \pm e_\ell)} \le e^{c\varepsilon} \tag{15.5.44} \]
for some $c > 0$ and $\varepsilon = 1/n$. Note also that
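With the above notation and using the convention $0/0 = 0$, we get
\[
\begin{aligned}
(L_\ell \phi)(x) &= \phi(x) P^\sigma_{+,\ell}\big[\exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1\big] + \phi(x) P^\sigma_{-,\ell}\big[\exp\big(-2\beta(1-\alpha) g_\ell(x - e_\ell)\big) - 1\big] \\
&= \phi(x)\Big(1_{\{P^\sigma_{+,\ell} \ge P^\sigma_{-,\ell}\}} P^\sigma_{+,\ell}\, G^+_\ell(x) + 1_{\{P^\sigma_{-,\ell} > P^\sigma_{+,\ell}\}} P^\sigma_{-,\ell}\, G^-_\ell(x)\Big),
\end{aligned}
\tag{15.5.46}
\]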
(15.5.47)
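and
\[ G^-_\ell(x) = \exp\big(-2\beta(1-\alpha) g_\ell(x - e_\ell)\big) - 1 + \frac{P^\sigma_{+,\ell}}{P^\sigma_{-,\ell}}\big[\exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1\big]. \tag{15.5.48} \]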
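If $x_\ell/\rho_\ell = \pm 1$, then the local generator takes the simpler form
\[ (L_\ell \phi)(x) = \begin{cases} \phi(x) P^\sigma_{-,\ell}\big[\exp\big(-2\beta(1-\alpha) g_\ell(x - e_\ell)\big) - 1\big], & \text{if } x_\ell/\rho_\ell = 1, \\ \phi(x) P^\sigma_{+,\ell}\big[\exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1\big], & \text{if } x_\ell/\rho_\ell = -1. \end{cases} \tag{15.5.49} \]
From Lemma 15.24 and inequalities (15.5.45), it follows that, for all $\ell$ such that $x_\ell/\rho_\ell = \pm 1$,
\[ (L_\ell \phi)(x) \le -\big(1 + o(1)\big) \rho_\ell\, \phi(x). \tag{15.5.50} \]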
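Let us now return to the case when $x$ is not a boundary point. By the reversibility conditions,
\[
\begin{aligned}
r_N(x, x + e_\ell) &= \exp\big(-2\beta g_\ell(x)\big)\, r_N(x + e_\ell, x), \\
r_N(x, x - e_\ell) &= \exp\big(2\beta g_\ell(x - e_\ell)\big)\, r_N(x - e_\ell, x),
\end{aligned}
\tag{15.5.51}
\]
\[
\begin{aligned}
\exp\big(-2\beta g_\ell(x) - c\varepsilon\big) &\le \frac{P^\sigma_{+,\ell}}{P^\sigma_{-,\ell}} \le \exp\big(-2\beta g_\ell(x) + c\varepsilon\big), \\
\exp\big(2\beta g_\ell(x - e_\ell) - c\varepsilon\big) &\le \frac{P^\sigma_{-,\ell}}{P^\sigma_{+,\ell}} \le \exp\big(2\beta g_\ell(x - e_\ell) + c\varepsilon\big).
\end{aligned}
\tag{15.5.52}
\]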
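Inserting the last bounds into (15.5.47) and (15.5.48), we obtain, after some computations,
\[
\begin{aligned}
G^+_\ell(x) &\le \big[\exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1\big]\big[1 - \exp\big(2\beta\alpha g_\ell(x - e_\ell) \mp c\varepsilon\big)\big] \\
&\quad + \exp\big(2\beta g_\ell(x - e_\ell) \mp c\varepsilon\big)\big[\exp\big(2\beta(1-\alpha)(g_\ell(x) - g_\ell(x - e_\ell))\big) - 1\big]
\end{aligned}
\tag{15.5.53}
\]
and
\[
\begin{aligned}
G^-_\ell(x) &\le \big[\exp\big(-2\beta(1-\alpha) g_\ell(x - e_\ell)\big) - 1\big]\big[1 - \exp\big(-2\beta\alpha g_\ell(x) \mp c\varepsilon\big)\big] \\
&\quad + \exp\big(-2\beta g_\ell(x) \mp c\varepsilon\big)\big[\exp\big(2\beta(1-\alpha)(g_\ell(x) - g_\ell(x - e_\ell))\big) - 1\big],
\end{aligned}
\tag{15.5.54}
\]
where $\mp = -\mathrm{sign}(g_\ell(x)) = -\mathrm{sign}(g_\ell(x - e_\ell))$. For all $\ell$ such that $x_\ell/\rho_\ell \in [-1 + \nu, 1 - \nu]$, we can use (15.5.32) to get
\[ G^+_\ell(x) \le \big[\exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1\big]\big[1 - \exp\big(2\alpha\beta g_\ell(x) \mp c\varepsilon\big)\big] + c/N \tag{15.5.55} \]
and
\[ G^-_\ell(x) \le \big[\exp\big(-2\beta(1-\alpha) g_\ell(x)\big) - 1\big]\big[1 - \exp\big(-2\alpha\beta g_\ell(x) \mp c\varepsilon\big)\big] + c/N. \tag{15.5.56} \]
The right-hand sides of (15.5.55) and (15.5.56) are negative if and only if $|g_\ell| > \frac{c\varepsilon}{2\alpha\beta}$.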
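Let us define the index sets
\[ S^< = \Big\{ \ell : x_\ell/\rho_\ell \in [-1 + \nu, 1 - \nu],\ |g_\ell(x)| \le \frac{c\varepsilon}{\alpha\beta} \Big\}, \tag{15.5.57} \]
\[ S^> = \Big\{ \ell : x_\ell/\rho_\ell \in [-1 + \nu, 1 - \nu],\ |g_\ell(x)| > \frac{c\varepsilon}{\alpha\beta} \Big\}. \tag{15.5.58} \]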
\[ (L_\ell \phi)(x) \le \frac{c}{\alpha}\, \varepsilon^2 \rho_\ell\, \phi(x). \tag{15.5.60} \]
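To control the right-hand side of (15.5.55) and (15.5.56) when $\ell \in S^>$, we set
\[ y_\ell = \min\big(\beta |g_\ell(x)|, \tfrac12\big) \le \beta |g_\ell(x)|. \tag{15.5.61} \]
If $g_\ell(x) > \frac{c\varepsilon}{\alpha\beta}$, then
\[ \exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1 \ge \exp\big(2(1-\alpha) y_\ell\big) - 1 \ge 2(1-\alpha) y_\ell \tag{15.5.62} \]
and
\[ 1 - \exp\big(2\beta\alpha g_\ell(x) - c\varepsilon\big) \le 1 - \exp(\alpha y_\ell) \le -\alpha y_\ell, \tag{15.5.63} \]
so that the product in the right-hand side of (15.5.55) is bounded from above by $-\tfrac34 (1-\alpha)\alpha y_\ell^2$. On the other hand, if $g_\ell(x) < -\frac{c\varepsilon}{\alpha\beta}$, then
\[ \exp\big(2\beta(1-\alpha) g_\ell(x)\big) - 1 \le \exp\big(-2(1-\alpha) y_\ell\big) - 1 \le -(1-\alpha) y_\ell \tag{15.5.64} \]
and
\[ 1 - \exp\big(2\beta\alpha g_\ell(x) + c\varepsilon\big) \ge 1 - \exp(-\alpha y_\ell) \ge \tfrac34 \alpha y_\ell, \tag{15.5.65} \]
and the product in the right-hand side of (15.5.55) is bounded from above by $-\tfrac34 (1-\alpha)\alpha y_\ell^2$. Altogether this proves that, for all $\ell \in S^>$,
\[ G^+_\ell(x) \le -\tfrac34 (1-\alpha)\alpha y_\ell^2, \tag{15.5.66} \]
\[ G^-_\ell(x) \le -\tfrac34 (1-\alpha)\alpha y_\ell^2. \tag{15.5.67} \]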
c 2 c
ε ρ ≤ ε 2 . (15.5.71)
α <
α
∈S
We must distinguish two cases, according to whether the hypothesis of Lemma 15.25
is satisfied or not.
Case 1: $\sum_{\ell \notin S(x)} \rho_\ell > \frac{\varepsilon^2}{8\theta}$. By (15.5.50), we get
n
(L φ)(x) ≤ (L φ)(x) + (L φ)(x) (15.5.72)
=1 ∈S(x)
/ ∈S <
ε2 c
≤− 1 + o(1) φ(x) + ε 2 ,
8θ α
which is as negative as desired when θ is small enough, i.e., when ε is small enough.
Case 2: $\sum_{\ell \notin S(x)} \rho_\ell \le \frac{\varepsilon^2}{8\theta}$. In this case, the assertion of Lemma 15.25 holds. By (15.5.50), (15.5.68), and (15.5.70), we have that, for all $\ell \in S(x) \setminus S^<$,
\[ (L_\ell \phi)(x) \le -\rho_\ell\, \phi(x) \min\big(c\alpha y_\ell^2, 1\big) \le -c\alpha \rho_\ell\, y_\ell^2\, \phi(x), \tag{15.5.73} \]
where the last inequality holds for $\alpha < 4/c$. Let us write the generator as
\[ (L\phi)(x) \le \sum_{\ell \in S(x) \setminus S^<} (L_\ell \phi)(x) + \sum_{\ell \in S^<} (L_\ell \phi)(x). \tag{15.5.74} \]
ε2 c ε2
ρ g2 (x) ≥ c − 2 ε 2 ≥ c , (15.5.76)
θ α θ
∈S(x)\S <
By our choice of θ , taking n large enough we see that the condition c αθ −1 −
c α −1 > 0, or α > cθ , is satisfied for any α ∈ (0, 1). Hence, for such n and for
N large enough, we get that (Lψ)(σ ) ≤ 0, which concludes the proof of Proposi-
tion 15.20.
where the last inequality follows from the definition of Uδ , together with the bounds
in (14.7.3). This concludes the proof of Proposition 15.19.
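Let us now return to the proof of Lemma 15.18. An easy consequence of (15.5.14) is that, for all $\sigma \in S_N[\partial_A \mathbf G_\theta]$,
\[ P_\sigma\big(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) \le e^{-(1-\alpha)\beta N (F_{\beta,N}(m^*_0) + \delta)} \max_{\mathbf m \in \partial_A \mathbf G_\theta} e^{(1-\alpha)\beta N F_{\beta,N}(\mathbf m)}, \tag{15.5.80} \]
while obviously $P_\sigma(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}) = 0$ for all $\sigma \in S_N[\mathbf G_\theta \setminus \partial_A \mathbf G_\theta]$. To control the right-hand side of (15.5.80), we need the following lemma.
Lemma 15.26 There exists a constant $c < \infty$ independent of $n$ such that, for all $\mathbf m \in \mathbf G_\theta$,
\[ F_{\beta,N}(\mathbf m) \le F_{\beta,N}(\mathbf m^*) + c\varepsilon. \tag{15.5.81} \]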
where $A(\mathbf m^*)$ is the positive-definite matrix described in Sect. 15.2.2 (see (15.2.9)) and $x$ is a suitable element of the ball around $\mathbf m^*$. From the explicit representation of the eigenvalues of $A(\mathbf m^*)$, we see that $\|A(\mathbf m^*)\| \le c\varepsilon^{-1}$, and hence
\[ \big\langle v, A(\mathbf m^*) v \big\rangle \le c\varepsilon^{-1} \|v\|_2^2 \le c\varepsilon. \tag{15.5.84} \]
n
∂ 3 Fβ,N 1 1
n
D 3 Fβ,N (x)v3 = (x)v3 = I (x /ρ )v3
2 N,
(15.5.85)
=1
∂x3 β ρ
=1
1 1 UN, (t ) 3
n
=− (t )]3 v
β ρ 2 [UN,
=1
1 1 |Λ |−1 i∈Λ tanh(t + β h̃i )[1 − tanh (t + β h̃i )] 3
n 2
=− v ,
β ρ2
=1
(|Λ |−1 i∈Λ [1 − tanh2 (t + β h̃i )])3
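where $t_\ell = I'_{N,\ell}(x_\ell/\rho_\ell)$. Thus,
\[ \big|D^3 F_{\beta,N}(x) v^3\big| \le c \sum_{\ell=1}^n \frac{1}{\rho_\ell^2} |v_\ell|^3 \le c' \varepsilon^{-1} \|v\|_2^2 \le c'' \varepsilon, \tag{15.5.86} \]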
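where we use that $|v_\ell/\rho_\ell| \le 1$. Hence, for some $c < \infty$ independent of $n$,
\[ F_{\beta,N}(\mathbf m) \le F_{\beta,N}(\mathbf m^*) + c\varepsilon. \tag{15.5.87} \]
Inserting the result of Lemma 15.26 into (15.5.80), and recalling that $F_{\beta,N}(\mathbf m^*) = F_{\beta,N}(m^*)$, we get that, for all $\sigma \in S_N[\partial_A \mathbf G_\theta]$,
\[ P_\sigma\big(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) \le e^{-(1-\alpha)\beta N (F_{\beta,N}(m^*_0) + \delta - F_{\beta,N}(m^*) - c\varepsilon)}. \tag{15.5.88} \]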
The last ingredient in order to get a suitable estimate on Pσ (τA < τB ) is stated in
the following lemma.
Lemma 15.27 For any δ2 > 0 there exists an n0 ∈ N such that for all n ≥ n0 , all
σ ∈ SN [∂A Gθ ] and all N large enough,
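Proof Fix $\sigma \in S_N[\partial_A \mathbf G_\theta]$ and set $\mathbf m(0) = m(\sigma)$. As pointed out in the proof of Lemma 15.26, every $\mathbf m(0) \in \partial_A \mathbf G_\theta$ can be written in the form $\mathbf m(0) = \mathbf m^* + v$ with $v \in \Gamma^n_N$ such that $\|v\|_2 \le \varepsilon$. Let $\mathbf m = (\mathbf m(0), \mathbf m(1), \dots, \mathbf m(\|v\|_1 N) = \mathbf m^*)$ be a nearest-neighbour path in $\Gamma^n_N$ from $\mathbf m(0)$ to $\mathbf m^*$ of length $N \|v\|_1$ with the following property: with $\ell_t$ the unique index in $\{1, \dots, n\}$ such that $m_{\ell_t}(t) \ne m_{\ell_t}(t-1)$,
\[ m_{\ell_t}(t) = m_{\ell_t}(t-1) + s_t \frac{2}{N}, \qquad \forall\, t \ge 1, \tag{15.5.90} \]
where we define
\[ s_t = \mathrm{sign}\big(m^*_{\ell_t} - m_{\ell_t}(t-1)\big). \tag{15.5.91} \]
Note that, by property (15.5.90), $\mathbf m(t) \in \mathbf G_\theta$ for all $t \ge 0$. Thus, all microscopic paths $(\sigma(t))_{t \ge 0}$ such that $\sigma(0) = \sigma$ and $m(\sigma(t)) = \mathbf m(t)$ for all $t \ge 1$ are contained in the event $\{\tau_B < \tau_{S_N[\partial_A \mathbf G_\theta]}\}$. Therefore we get
\[
\begin{aligned}
P_\sigma\big(\tau_B < \tau_{S_N[\partial_A \mathbf G_\theta]}\big) &\ge P_\sigma\big(m(\sigma(t)) = \mathbf m(t)\ \forall\, t = 1, \dots, \|v\|_1 N\big) \\
&= \prod_{t=1}^{\|v\|_1 N} P_\sigma\big(m(\sigma(t)) = \mathbf m(t) \mid m(\sigma(t-1)) = \mathbf m(t-1)\big) \\
&= \prod_{t=1}^{\|v\|_1 N} \sum_{i \in \Lambda^{s_t}_{\ell_t}} p_N\big(\sigma(t-1), \sigma^i(t-1)\big).
\end{aligned}
\tag{15.5.92}
\]
Note that $\Lambda^{s_t}_{\ell_t}$ is the set of sites in which a spin-flip corresponds to a step from $\mathbf m(t-1)$ to $\mathbf m(t)$.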
The sum of the probabilities in the right-hand side of (15.5.92) corresponds to the
σ (t−1)
quantity Pst ,t defined in (15.5.43). From the inequalities in (15.5.44) and (15.3.3)
it follows that, for some constant c > 0 depending on β and on the distribution of
the magnetic field,
Psσt (t−1)
,t ≥ cΛstt m(t − 1) /N ≥ cΛstt m∗ /N, (15.5.93)
where the second inequality follows by our choice of the path m. Now, since
|Λ± ∗ ∗ ∗
(m )|/N = 2 (ρ ± m ), we can use the expression in (15.2.8) for mt and
1
Psσt (t−1)
,t ≥ c ρ t . (15.5.94)
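Inserting the last inequality into (15.5.92) and using that, by the definition of the path $\mathbf m$, the number of steps corresponding to a spin-flip in $\Lambda_\ell$ is equal to $|v_\ell| N$ for all $\ell \in \{1, \dots, n\}$, we get
\[
\begin{aligned}
P_\sigma\big(\tau_B < \tau_{S_N[\partial_A \mathbf G_\theta]}\big) &\ge \prod_{t=1}^{\|v\|_1 N} c' \rho_{\ell_t} = e^{\|v\|_1 N \ln(c')} \prod_{\ell=1}^n \rho_\ell^{|v_\ell| N} \\
&\ge e^{N \sqrt\varepsilon \ln(c')}\, e^{-N \sum_{\ell=1}^n |v_\ell| \ln(1/\rho_\ell)} \ge e^{N \sqrt\varepsilon \ln(c')}\, e^{-N \sum_{\ell=1}^n |v_\ell| / \sqrt{\rho_\ell}} \\
&\ge e^{N \sqrt\varepsilon \ln(c')}\, e^{-N \left(\sum_{\ell=1}^n v_\ell^2/\rho_\ell\right)^{1/2} \varepsilon^{-1/2}} \ge e^{-N\left(\sqrt{\varepsilon/\theta} - \sqrt\varepsilon \ln(c')\right)},
\end{aligned}
\tag{15.5.95}
\]
where in the third line we use the inequality $\|v\|_1 \le \varepsilon^{-1/2} \|v\|_2 \le \sqrt\varepsilon$, and in the last line we use that $\mathbf m(0) = \mathbf m^* + v \in \mathbf G_\theta$. By our choice of $\theta \gg \varepsilon$, there exists an $n_0 \in \mathbb N$ such that, for all $n \ge n_0$, $\sqrt{\varepsilon/\theta} - \sqrt\varepsilon \ln(c') \le \beta\delta_2$. For such $n$, the inequality in (15.5.95) yields the bound in (15.5.89).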
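\[
\begin{aligned}
P_\sigma(\tau_A < \tau_B) &\le P_\sigma\big(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) + \sum_{\eta \in S_N[\partial_A \mathbf G_\theta]} P_\sigma\big(\tau_A < \tau_B,\ \tau_\eta \le \tau_{S_N[\partial_A \mathbf G_\theta] \cup A \cup B}\big) \\
&\le P_\sigma\big(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) + \max_{\eta \in S_N[\partial_A \mathbf G_\theta]} P_\eta(\tau_A < \tau_B)\, P_\sigma\big(\tau_{S_N[\partial_A \mathbf G_\theta]} < \tau_B\big) \\
&\le P_\sigma\big(\tau_A < \tau_{S_N[\partial_A \mathbf G_\theta] \cup B}\big) + \max_{\eta \in S_N[\partial_A \mathbf G_\theta]} P_\eta(\tau_A < \tau_B)\,\big(1 - e^{-\beta N \delta_2}\big),
\end{aligned}
\tag{15.5.97}
\]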
where in the second line we use the Markov property, and in the last line we insert
the result in (15.28). Taking the maximum over σ ∈ SN [∂A Gθ ] on both sides of
(15.5.97), and rearranging the summation, we get
where in the last line we use the bound in (15.5.88). This concludes the proof of
(15.5.96) for σ ∈ SN [∂A Gθ ].
Next we consider σ ∈ SN [Uδ \ ∂A Gθ ]. As before,
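\[
\begin{aligned}
P_\sigma(\tau_A < \tau_B)
&\le P_\sigma\bigl(\tau_A < \tau_{S_N[\partial_A G_\theta]\cup B}\bigr)
 + \sum_{\eta\in S_N[\partial_A G_\theta]} P_\sigma\bigl(\tau_A < \tau_B,\ \tau_\eta \le \tau_{S_N[\partial_A G_\theta]\cup A\cup B}\bigr) \\
&\le P_\sigma\bigl(\tau_A < \tau_{S_N[\partial_A G_\theta]\cup B}\bigr)
 + \max_{\eta\in S_N[\partial_A G_\theta]} P_\eta(\tau_A < \tau_B)\, P_\sigma\bigl(\tau_{S_N[\partial_A G_\theta]} < \tau_B\bigr),
\end{aligned}
\]
where $P_\sigma(\tau_A < \tau_{S_N[\partial_A G_\theta]\cup B})$ is zero for all $\sigma \in S_N[G_\theta\setminus\partial_A G_\theta]$, and is exponentially
small in N for all $\sigma \in S_N[U_\delta\setminus G_\theta]$ (due to Proposition 15.19). Inserting the bound
in (15.5.98) into the last equation, we get (15.5.96) for $\sigma \in S_N[U_\delta\setminus\partial_A G_\theta]$.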
where in the second inequality we use the expression in (14.2.9) for Qβ,N (m∗0 ), and
in the last line we use the bounds Fβ,N (m) ≤ Fβ,N (m∗ ) = Fβ,N (m∗ ) and |Ud | ≤
N n . Finally, choosing α small enough, namely,
\[
\alpha < \frac{\delta - c\varepsilon - \delta_2}{F_{\beta,N}(m_0^*) - F_{\beta,N}(m^*) + \delta - c\varepsilon}, \tag{15.5.101}
\]
1. The results of this chapter have been obtained by Bianchi, Bovier and Ioffe
in [24]. The proof given here is streamlined, and the use of deficient flows on the
mesoscopic level leads to a considerable simplification. Coupling methods have
been used by the same authors in [25] to prove that transition times are asymptotically
exponentially distributed, and to establish independence of the initial law. We have
decided that this proof is too technical and too model-specific to be reproduced here.
The results have been generalised to the Potts version of the model in the thesis of
Slowik [219].
2. The dynamics of the random field Curie-Weiss model has been studied before. Dai
Pra and den Hollander [69] studied the short-time dynamics using large deviation
results and obtained the analogue of the McKean-Vlasov equations. Mathieu and
Picco [180] considered convergence to equilibrium in the particularly simple case
where the random field takes only the two values ±ε (with further restrictions on
the parameters that exclude the presence of more than two minima).
This chapter describes the metastable behaviour of lattice systems in small volumes
at low temperatures subject to a Metropolis dynamics. These theorems are derived
under two hypotheses on the energy landscape, i.e., on the interaction Hamiltonian.
These hypotheses, in turn, will be checked for Glauber dynamics in Chap. 17 and for
Kawasaki dynamics in Chap. 18. The theorems themselves are model-independent,
and therefore amplify the universal nature of metastability (in the setting considered
here). However, they involve a number of quantities that are model-dependent. The
identification of these quantities will be carried out in Chaps. 17 and 18 as well.
The outline is as follows. In Sect. 16.1 we define Metropolis dynamics on a
general configuration space with respect to a general Hamiltonian and for a general
set of allowed moves, we state the theorems subject to the hypotheses, and we place
the results in their proper context. Section 16.2 explains how this abstract set-up
fits into the potential-theoretic framework developed in Part III. The proofs of the
theorems are given in Sect. 16.3. In Sect. 16.4 we take a brief look at two other
dynamics, namely, heat-bath dynamics and probabilistic cellular automata, and we
indicate how these can be included into the same abstract set-up with only minor
modifications.
We begin by defining the abstract set-up of lattice models with short-range interac-
tion in finite volume from statistical physics.
Let Λ be a finite set (e.g. Λ ⊆ ℤ^d, d ≥ 1). We refer to elements of Λ as sites.
With each site x ∈ Λ we associate a variable ξ(x) ∈ Υ, where Υ is a finite set of
spin-values. A configuration ξ = {ξ(x) : x ∈ Λ} is an element of S = Υ^Λ. To each
configuration ξ we associate an energy given by a Hamiltonian H : S → ℝ, which
in general depends on one or more parameters. The Gibbs measure associated with
H is
\[
\mu_\beta(\xi) = \frac{1}{Z_\beta}\, e^{-\beta H(\xi)}, \qquad \xi \in S, \tag{16.1.1}
\]
where β ∈ (0, ∞) is the inverse temperature, and Z_β is the normalising partition
sum.
Equip S with a set of undirected edges E, connecting pairs of elements of S,
such that (S, E) is a connected graph. Write ξ ∼ ξ′ when (ξ, ξ′) ∈ E. As dynamics
we consider the continuous-time Markov process (ξ_t)_{t≥0} with state space S whose
transition rates are given by
\[
c_\beta(\xi,\xi') =
\begin{cases}
e^{-\beta[H(\xi')-H(\xi)]_+}, & \xi \sim \xi', \\
0, & \text{otherwise},
\end{cases} \tag{16.1.2}
\]
i.e., transitions occur along edges only. This dynamics is called the Metropolis
dynamics with respect to H at inverse temperature β. It is ergodic and reversible with
respect to μ_β:
\[
\mu_\beta(\xi)\,c_\beta(\xi,\xi') = \mu_\beta(\xi')\,c_\beta(\xi',\xi) \qquad \forall\, \xi,\xi' \in S, \tag{16.1.3}
\]
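As a minimal sketch of the Metropolis rates in (16.1.2) and the reversibility relation in (16.1.3), the following Python fragment builds the Gibbs measure and the rates on a small landscape and measures the largest violation of detailed balance. The four-configuration path, its energies and all function names are hypothetical choices made for illustration only.

```python
import math

# Hypothetical toy landscape (not from the text): a path a - b - c - d.
H = {"a": 0.0, "b": 2.0, "c": 1.0, "d": -1.0}
EDGES = {frozenset(pair) for pair in [("a", "b"), ("b", "c"), ("c", "d")]}


def gibbs(H, beta):
    """Gibbs measure mu_beta(xi) = exp(-beta * H(xi)) / Z_beta, cf. (16.1.1)."""
    weights = {xi: math.exp(-beta * energy) for xi, energy in H.items()}
    Z = sum(weights.values())
    return {xi: w / Z for xi, w in weights.items()}, Z


def metropolis_rate(H, edges, beta, xi, xi_p):
    """Rate exp(-beta * [H(xi') - H(xi)]_+) along edges, 0 otherwise."""
    if frozenset((xi, xi_p)) not in edges:
        return 0.0
    return math.exp(-beta * max(H[xi_p] - H[xi], 0.0))


def detailed_balance_gap(H, edges, beta):
    """Largest value of |mu(x) c(x, y) - mu(y) c(y, x)| over all pairs."""
    mu, _ = gibbs(H, beta)
    return max(
        abs(mu[x] * metropolis_rate(H, edges, beta, x, y)
            - mu[y] * metropolis_rate(H, edges, beta, y, x))
        for x in H for y in H
    )


if __name__ == "__main__":
    print(detailed_balance_gap(H, EDGES, beta=1.5))
```

The gap vanishes up to rounding because, along an edge, the product of Gibbs weight and rate equals exp(-β max{H(ξ), H(ξ′)})/Z_β, which is symmetric in the two configurations.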
(b) $\mathcal{S}(\xi,\xi')$ is the communication level set between ξ, ξ′ ∈ S defined by
\[
\mathcal{S}(\xi,\xi') = \Bigl\{\zeta \in S :\ \exists\, \gamma\colon \xi \to \xi',\ \gamma \ni \zeta :\ \max_{\eta\in\gamma} H(\eta) = H(\zeta) = \Phi(\xi,\xi')\Bigr\}. \tag{16.1.7}
\]
(c) V_ξ is the stability level of ξ ∈ S defined by
\[
V_\xi = \Phi(\xi, I_\xi) - H(\xi), \tag{16.1.8}
\]
where
\[
I_\xi = \{\zeta \in S : H(\zeta) < H(\xi)\} \tag{16.1.9}
\]
is the set of configurations with energy lower than ξ.
(d) S_stab is the set of configurations with minimal energy, called stable configurations,
defined by
\[
S_{\mathrm{stab}} = \Bigl\{\xi \in S : H(\xi) = \min_{\zeta\in S} H(\zeta)\Bigr\}. \tag{16.1.10}
\]
(e) S_meta is the set of non-minimal configurations with maximal stability, called
metastable configurations, defined by
\[
S_{\mathrm{meta}} = \Bigl\{\xi \in S : V_\xi = \max_{\zeta\in S\setminus S_{\mathrm{stab}}} V_\zeta\Bigr\}. \tag{16.1.11}
\]
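The quantities in items (c)–(e) can be computed mechanically on a small landscape. The sketch below is one possible way to do so, assuming a hypothetical seven-configuration path with made-up energies: the communication height is obtained with a bottleneck variant of Dijkstra's algorithm (a technique chosen here for illustration, not taken from the text), and stability levels, S_stab and S_meta then follow from (16.1.8)–(16.1.11).

```python
import heapq
import math

# Hypothetical toy landscape: configurations 0..6 on a path, with energies H.
H = {0: 0, 1: 3, 2: 1, 3: 4, 4: -2, 5: 1, 6: -1}
NBRS = {i: [j for j in (i - 1, i + 1) if j in H] for i in H}


def comm_heights(H, nbrs, src):
    """Return {xi: Phi(src, xi)}, the minimax of H over paths from src to xi."""
    best = {src: H[src]}
    heap = [(H[src], src)]
    while heap:
        level, xi = heapq.heappop(heap)
        if level > best[xi]:
            continue  # stale heap entry
        for eta in nbrs[xi]:
            new_level = max(level, H[eta])
            if new_level < best.get(eta, math.inf):
                best[eta] = new_level
                heapq.heappush(heap, (new_level, eta))
    return best


def stability_level(H, nbrs, xi):
    """V_xi = Phi(xi, I_xi) - H(xi); infinite when no lower configuration exists."""
    lower = [zeta for zeta in H if H[zeta] < H[xi]]
    if not lower:
        return math.inf
    phi = comm_heights(H, nbrs, xi)
    return min(phi[zeta] for zeta in lower) - H[xi]


def stable_and_metastable(H, nbrs):
    """Return (S_stab, S_meta, V) for a connected landscape with a non-stable state."""
    h_min = min(H.values())
    s_stab = {xi for xi in H if H[xi] == h_min}
    V = {xi: stability_level(H, nbrs, xi) for xi in H}
    v_max = max(V[xi] for xi in H if xi not in s_stab)
    s_meta = {xi for xi in H if V[xi] == v_max}
    return s_stab, s_meta, V


if __name__ == "__main__":
    print(stable_and_metastable(H, NBRS))
```

On this toy landscape the stable set is {4}, the metastable set is {0}, and the stability level of configuration 0 equals 4, which is the height of the barrier separating 0 from 4.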
Armed with these definitions we are ready to state our metastability theorems
(see Fig. 16.1).
Fig. 16.1 Schematic picture of H , Smeta , Sstab and G , the essential gate between Smeta and Sstab
Fig. 16.2 Schematic picture of the protocritical set and the critical set
configuration. Associated with (m, s) is a pair of sets (P (m, s), C (m, s)), which
will be referred to as the protocritical set, respectively, the critical set, defined as
follows.
Definition 16.3 (Protocritical and critical sets) (See Fig. 16.2.) Let
Then (P (m, s), C (m, s)) is the maximal subset of S × S such that:
(1) ∀ ξ ∈ P (m, s) ∃ ξ′ ∈ C (m, s) : ξ ∼ ξ′ and ∀ ξ ∈ C (m, s) ∃ ξ′ ∈ P (m, s) :
ξ ∼ ξ′.
(2) ∀ ξ ∈ P (m, s) : Φ(ξ, m) < Φ(ξ, s).
(3) ∀ ξ ∈ C (m, s) ∃ γ : ξ → s : maxζ ∈γ H (ζ ) − H (m) ≤ Γ , γ ∩ {ζ ∈ S :
Φ(ζ, m) < Φ(ζ, s)} = ∅.
Think of P (m, s) as the set of configurations where the dynamics starting from
m is “almost on top of the hill”, and of C (m, s) as the set of configurations where
the dynamics “has reached the top of the hill” and is “capable of crossing over” to s
16.1 Hypotheses and universal metastability theorems 387
without returning to “the valley around m”. The latter restriction is put in to remove
the dead-ends. Note that
Also note that C (m, s) ⊆ G (m, s), where the inclusion may be strict.
Theorems 16.4–16.6 below will be proved subject to two hypotheses:
(H1) Smeta = {m}, Sstab = {s}.
(H2) ξ′ ↦ |{ξ ∈ P (m, s) : ξ ∼ ξ′}| is constant on C (m, s).
(H1) says that Smeta and Sstab are singletons, while (H2) says that all configurations
in C (m, s) have the same number of configurations in P (m, s) from which they
can be reached via an allowed move. Any pair of configurations (m, s) satisfying
(H1) is referred to as a metastable pair. Without loss of generality we may assume
that
H (m) = 0. (16.1.14)
We write Pξ to denote the law of (ξt )t≥0 given ξ0 = ξ ∈ S, and
to denote the first hitting time of A ⊆ S after the starting configuration has been
left. In what follows we abbreviate P = P (m, s) and C = C (m, s).
Theorem 16.5 (Mean crossover time) There exists a constant K ∈ (0, ∞) such that
Chap. 17 that for Glauber dynamics there are no wells and K can be computed ex-
plicitly. We will see in Chap. 18 that for Kawasaki dynamics there are wells, but
they are sometimes harmless, e.g. when Λ is a large box in Z2 whose size tends to
infinity (after the limit β → ∞ has been taken).
While (H1) plays a central role in the derivation of Theorems 16.4–16.6, (H2) is
needed for Theorem 16.4(b) only.
16.1.3 Discussion
1. Theorem 16.4(a) says that C is a gate for the crossover, i.e., on its way from m
to s the dynamics passes through C with a probability tending to one in the limit of
low temperature. Theorem 16.4(b) says that, in this limit, all critical configurations
are equally likely to be seen upon first entrance in C . Theorem 16.5 says that the
average crossover time is asymptotic to $K e^{\Gamma\beta}$, which is the classical Arrhenius
law (see Sect. 1.3.1). Theorem 16.6(a) says that the spectral gap of $-L_\beta$ (the first
eigenvalue of $-L_\beta$ is zero) scales like the inverse of the average crossover time,
while Theorem 16.6(b) says that asymptotically the crossover time is exponentially
distributed on the scale of its average.
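To make the Arrhenius law in item 1 concrete, here is a minimal sketch on a hypothetical three-state landscape m – c – s with H(m) = 0, H(c) = Γ and H(s) < 0, under Metropolis rates. For this chain a direct computation gives a mean crossover time of 2e^{βΓ} + 1, so that the prefactor is K = 2 in this toy case. The code solves the linear system for the mean hitting time numerically; the landscape, the helper names and the small solver are illustrative assumptions, not part of the text.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mean_hitting_time(H, edges, beta, start, target):
    """Mean time to reach `target` from `start` under Metropolis rates."""
    states = [x for x in H if x != target]
    index = {x: i for i, x in enumerate(states)}
    A = [[0.0] * len(states) for _ in states]
    for x in states:
        for y in H:
            if frozenset((x, y)) in edges:
                rate = math.exp(-beta * max(H[y] - H[x], 0.0))
                A[index[x]][index[x]] += rate      # total exit rate of x
                if y != target:
                    A[index[x]][index[y]] -= rate  # coupling to E_y
    return solve_linear(A, [1.0] * len(states))[index[start]]


# Hypothetical three-state landscape m - c - s with barrier height GAMMA.
GAMMA = 1.0
H = {"m": 0.0, "c": GAMMA, "s": -1.0}
EDGES = {frozenset(("m", "c")), frozenset(("c", "s"))}

if __name__ == "__main__":
    for beta in (2.0, 5.0, 10.0):
        t = mean_hitting_time(H, EDGES, beta, "m", "s")
        print(beta, t * math.exp(-beta * GAMMA))  # approaches K = 2
```

The printed ratio tends to 2 as β grows, in line with the exponential scaling e^{βΓ} of the average crossover time.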
2. Theorems 16.4–16.6 are model-independent, i.e., they hold in the same form for
all stochastic dynamics in a finite volume in the limit of low temperature and for
any pair (m, s) satisfying hypotheses (H1–H2). In fact, we will see that (H1–H2)
are essentially the minimal hypotheses needed to prove Theorems 16.4–16.6. The
model-dependent ingredients of Theorems 16.4–16.6 are the pair (m, s) and the
triple (Γ , C , K). In Chaps. 17 and 18 we will identify these for Glauber dynamics
and Kawasaki dynamics, and prove (H1–H2).
3. There is some flexibility in letting our dynamics start and end at configurations
that are different from m and s. For instance, we will see that the same results apply
when the initial configuration is drawn from the “valley around m”, and the target
configuration is drawn near the bottom of the “valley around s” (see Sect. 16.2.3,
Eq. (16.1.17) for precise definitions).
4. Hypothesis (H1) can be relaxed. The Hamiltonian may have valleys that are
deeper than Γ (the energy barrier between m and s), but are shielded away from
m by an energy barrier that is higher than Γ . In that case the dynamics has a neg-
ligible probability to enter these valleys, and (H1) is required to hold only on the
subset of S obtained by removing all the configurations with energy > Γ + H (m).
The average crossover time on this subset is the relevant time scale, not the average
crossover time on S, which is much longer. See also Item 3 in Sect. 16.5.
Lemmas 16.7–16.10 below are immediate consequences of (H1) and will be needed
in Sect. 16.2. Recall that H (m) = 0 by (16.1.14). Recall also Figs. 16.1–16.2.
Since (H1) tells us that m has the largest stability level, we can proceed to reduce
the energy further until we hit s. Indeed, the finiteness of S guarantees that there
exists an m ∈ N0 and a sequence ξ1 , . . . , ξm ∈ S\m with ξm = s such that ξi+1 ∈ Iξi
and Φ(ξi , ξi+1 ) − H (ξi ) < Vm for i = 0, . . . , m − 1. Therefore we have
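\[
\Phi(\xi_0, s) \le \max_{i=0,\dots,m-1} \Phi(\xi_i,\xi_{i+1}) < \max_{i=0,\dots,m-1} H(\xi_i) + V_m
\]
where in the first inequality we use the ultrametricity of the communication height,
\[
\Phi(\xi,\chi) \le \max\bigl\{\Phi(\xi,\zeta),\, \Phi(\zeta,\chi)\bigr\} \qquad \forall\, \xi,\chi,\zeta \in S, \tag{16.1.19}
\]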
which is a contradiction.
Lemma 16.8 (H1) implies that H (ξ ) > 0 for all ξ ∈ S\m with Φ(ξ, m) ≤ Φ(ξ, s).
Proof The proof is again by contradiction. Fix ξ0 ∈ S\m with Φ(ξ0 , m) ≤ Φ(ξ0 , s)
and suppose that H (ξ0 ) ≤ 0. Then m ∉ I_{ξ0}. As in the proof of Lemma 16.7, there
exist an m ∈ N0 and a sequence ξ0 , . . . , ξm ∈ S with ξm = s such that ξi+1 ∈ Iξi and
Φ(ξi , ξi+1 ) − H (ξi ) < Vm = Γ for i = 0, . . . , m − 1. Therefore, as in (16.1.18),
we get Φ(ξ0 , s) − H (ξ0 ) < Vm = Γ . Hence
\[
\Gamma = \Phi(m,s) \le \max\bigl\{\Phi(m,\xi_0),\, \Phi(\xi_0,s)\bigr\} = \Phi(\xi_0,s) \le \Phi(\xi_0,s) - H(\xi_0) < \Gamma, \tag{16.1.21}
\]
which is a contradiction.
390 16 Abstract Set-Up and Metastability in the Zero-Temperature Limit
Lemma 16.9 (H1) implies that there exists a V < Γ such that Φ(ξ, {m, s}) −
H (ξ ) ≤ V for all ξ ∈ S\{m, s}.
Proof In the proof of Lemma 16.8 we have shown that Φ(ξ0 , s) − H (ξ0 ) < Γ for
all ξ0 ∈ S\m. But
\[
\Phi\bigl(\xi_0, \{m,s\}\bigr) = \min\bigl\{\Phi(\xi_0,m),\, \Phi(\xi_0,s)\bigr\} \le \Phi(\xi_0,s), \tag{16.1.22}
\]
16.2 Preliminaries
In this section we recall some facts from Part III, adapted to the present context,
and use them to derive a few lemmas that are needed in Sect. 16.3 to prove Theo-
rems 16.4–16.6.
where μβ is the Gibbs measure defined in (16.1.1) and cβ is the kernel of transition
rates defined in (16.1.2). Given a pair of non-empty disjoint sets A, B ⊆ S, the
capacity of the pair A, B is given by the Dirichlet principle,
where h|A = 1 means that h(ξ ) = 1 for all ξ ∈ A and h|B = 0 means that h(ξ ) = 0
for all ξ ∈ B. The unique minimizer hA,B of (16.2.2) is called the equilibrium po-
tential of the pair A, B, and is the solution of the equation
which is given by
Lemma 16.11 For every pair of non-empty disjoint sets A, B ⊆ S there exist con-
stants 0 < C1 ≤ C2 < ∞ (depending on A, B) such that
we get
\[
\mathrm{cap}_\beta(A,B) \le \mathcal{E}_\beta\bigl(1_{K(A,B)},\, 1_{K(A,B)}\bigr). \tag{16.2.9}
\]
Here note that A ⊂ K(A, B), while (16.2.7) guarantees that B ⊂ S\K(A, B), so
that the boundary conditions on A and B are met.
To estimate $\mathcal{E}_\beta(1_{K(A,B)}, 1_{K(A,B)})$, the key observation is that if ξ ∼ ξ′ with ξ ∈
K(A, B) and ξ′ ∈ S\K(A, B), then
\[
\begin{aligned}
&(1)\quad H(\xi') < H(\xi), \\
&(2)\quad H(\xi) \ge \Phi(A,B).
\end{aligned} \tag{16.2.10}
\]
Therefore ξ ∈ K(A, B), which is a contradiction. To see (2), note that (1) implies
\[
\Phi(\xi, C) = \Phi(\xi', C) \vee H(\xi) \qquad \forall\, C \subseteq S. \tag{16.2.13}
\]
where the second inequality uses that ξ ∈ S\K(A, B). Thus, we have Φ(ξ, A) >
Φ(ξ, B), which contradicts ξ ∈ K(A, B). From the equality H (ξ ) = Φ(ξ, B) and
(16.1.19) we obtain Φ(A, B) ≤ Φ(A, ξ ) ∨ Φ(ξ, B) = Φ(ξ, B) = H (ξ ), which
proves (2).
Combining (16.2.10) with (16.1.1)–(16.1.3), we find that
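\[
\begin{aligned}
\mu_\beta(\xi)\,c_\beta(\xi,\xi') &= \frac{1}{Z_\beta}\, e^{-\beta[H(\xi)\vee H(\xi')]} \\
&\le \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,B)} \qquad \forall\, \xi \in K(A,B),\ \xi' \in S\setminus K(A,B).
\end{aligned} \tag{16.2.15}
\]
Hence
\[
\mathcal{E}_\beta\bigl(1_{K(A,B)},\, 1_{K(A,B)}\bigr) \le C_2\, \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,B)} \tag{16.2.16}
\]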
\[
\begin{aligned}
&\exists\, \zeta \in B :\ \Phi(\zeta, A) = H(\zeta), \\
&\exists\, \zeta' \in A :\ \Phi(\zeta', B) = H(\zeta').
\end{aligned} \tag{16.2.18}
\]
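Estimating
\[
\begin{aligned}
\mathrm{cap}_\beta(A,B) \le \mathcal{E}_\beta(1_A, 1_A)
&= \sum_{\xi\in A,\ \xi'\in S\setminus A} \mu_\beta(\xi)\,c_\beta(\xi,\xi') \\
&= \frac{1}{Z_\beta} \sum_{\substack{\xi\in A,\ \xi'\in S\setminus A \\ \xi\sim\xi'}} e^{-\beta[H(\xi)\vee H(\xi')]}
 \le C_2\, \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,S\setminus A)}
\end{aligned} \tag{16.2.19}
\]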
Lower bound: The lower bound is obtained by picking any self-avoiding path
\[
\gamma = (\gamma_0, \gamma_1, \dots, \gamma_L) \tag{16.2.20}
\]
that realizes the minimax in Φ(A, B) and ignoring all the transitions that are not in
this path, i.e.,
\[
\mathrm{cap}_\beta(A,B) \ge \min_{\substack{h\colon \gamma\to[0,1] \\ h(\gamma_0)=1,\ h(\gamma_L)=0}} \mathcal{E}^{\gamma}_\beta(h,h), \tag{16.2.21}
\]
where the Dirichlet form $\mathcal{E}^{\gamma}_\beta$ is defined as $\mathcal{E}_\beta$ in (16.2.1) but with S replaced by γ.
Due to the one-dimensional nature of the set γ, the variational problem in the right-
hand side can be solved explicitly by elementary computations (recall Sect. 7.1.4).
We find that the minimum equals
\[
M = \Biggl[\,\sum_{l=0}^{L-1} \frac{1}{\mu_\beta(\gamma_l)\,c_\beta(\gamma_l,\gamma_{l+1})}\Biggr]^{-1}, \tag{16.2.22}
\]
and is attained at
\[
h(\gamma_l) = 1 - M \sum_{k=0}^{l-1} \frac{1}{\mu_\beta(\gamma_k)\,c_\beta(\gamma_k,\gamma_{k+1})}, \qquad l = 0, 1, \dots, L. \tag{16.2.23}
\]
Fig. 16.3 Schematic picture of the subgraphs S* (on or below the top line) and S** (below the top
line) and of the connected components S_m and S_s. The four vertical lines represent dead-ends
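We thus have
\[
\begin{aligned}
\mathrm{cap}_\beta(A,B) \ge M &\ge \frac{1}{L} \min_{l=0,1,\dots,L-1} \mu_\beta(\gamma_l)\,c_\beta(\gamma_l,\gamma_{l+1}) \\
&= \frac{1}{L}\,\frac{1}{Z_\beta} \min_{l=0,1,\dots,L-1} e^{-\beta[H(\gamma_l)\vee H(\gamma_{l+1})]} = C_1\, \frac{1}{Z_\beta}\, e^{-\beta\Phi(A,B)}
\end{aligned} \tag{16.2.24}
\]
with C_1 = 1/L.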
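The one-dimensional computation behind (16.2.22)–(16.2.24) can be checked numerically. The sketch below assumes Metropolis rates, so that Z_β μ_β(γ_l) c_β(γ_l, γ_{l+1}) = exp(-β max{H(γ_l), H(γ_{l+1})}), and uses a hypothetical list of energies along a path. It returns Z_β M together with the minimiser, normalised so that h(γ_0) = 1 and h(γ_L) = 0, and the accompanying checks compare Z_β M with the bounds (1/L) e^{-βΦ} and e^{-βΦ}.

```python
import math


def path_capacity(H_path, beta):
    """Return (Z_beta * M, conductances) for a path, cf. (16.2.22)."""
    # With Metropolis rates, Z_beta * mu(g_l) * c(g_l, g_{l+1})
    # equals exp(-beta * max(H(g_l), H(g_{l+1}))).
    cond = [math.exp(-beta * max(a, b)) for a, b in zip(H_path, H_path[1:])]
    return 1.0 / sum(1.0 / c for c in cond), cond


def path_potential(H_path, beta):
    """Minimiser h on the path, with h(g_0) = 1 and h(g_L) = 0."""
    M, cond = path_capacity(H_path, beta)
    h, acc = [1.0], 0.0
    for c in cond:
        acc += 1.0 / c
        h.append(1.0 - M * acc)
    return h


def path_dirichlet_form(H_path, beta, h):
    """Z_beta times the Dirichlet form restricted to the path."""
    _, cond = path_capacity(H_path, beta)
    return sum(c * (h[l] - h[l + 1]) ** 2 for l, c in enumerate(cond))


# Hypothetical energies along a self-avoiding path gamma_0, ..., gamma_L.
H_PATH = [0.0, 3.0, 1.0, 4.0, -2.0]

if __name__ == "__main__":
    M, _ = path_capacity(H_PATH, beta=2.0)
    print(M, path_potential(H_PATH, beta=2.0))
```

The factor 1/L in C_1 reflects the worst case in which all L steps of the path sit at the top energy level; when only a few steps do, Z_β M is closer to e^{-βΦ} divided by the number of such steps.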
In this section we have a closer look at the geometric structure of the set S.
Theorem 16.12 (Graph structure of the energy landscape) View S as a graph whose
vertices are the configurations and whose edges connect configurations that can be
obtained from each other via an allowed move, i.e., (ξ, ξ′) is an edge if and only if
ξ ∼ ξ′. Define (see Fig. 16.3)
– S* is the subgraph of S obtained by removing all vertices ξ with H (ξ ) > Γ and
all edges incident to these vertices;
– S** is the subgraph of S* obtained by removing all vertices ξ with H (ξ ) = Γ
and all edges incident to these vertices;
– S_m and S_s are the connected components of S** containing m and s, respectively.
Then
\[
\begin{aligned}
S_m &= \bigl\{\xi \in S : \Phi(\xi,m) < \Phi(\xi,s) = \Gamma\bigr\}, \\
S_s &= \bigl\{\xi \in S : \Phi(\xi,s) < \Phi(\xi,m) = \Gamma\bigr\}.
\end{aligned} \tag{16.2.25}
\]
Proof All paths connecting m and s reach energy level ≥ Γ (recall that H (m) = 0
by (16.1.14)). Therefore S_m and S_s are disconnected in S** (because S** does not
contain vertices with energy ≥ Γ ). The claims in (16.2.25) are immediate from
the definition of Sm and Ss . The claims in (16.2.26) are immediate consequences of
Definition 16.3.
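The graph construction in Theorem 16.12 can be sketched as follows, assuming a hypothetical landscape on a path with m = 0, s = 4 and Γ = 4. The variable names `S_star` and `S_2star` stand for the two nested subgraphs obtained by removing first the energies above Γ and then the energy level Γ itself. On a path graph the communication height is simply the maximal energy on the connecting segment, which is what `phi_path` uses; this shortcut is specific to the toy example.

```python
from collections import deque

# Hypothetical toy landscape on a path 0 - 1 - ... - 6 (not from the text).
H = {0: 0, 1: 3, 2: 1, 3: 4, 4: -2, 5: 1, 6: -1}
NBRS = {i: [j for j in (i - 1, i + 1) if j in H] for i in H}
M_CONF, S_CONF, GAMMA = 0, 4, 4  # metastable state, stable state, barrier


def component(vertices, nbrs, start):
    """Connected component of `start` in the subgraph induced by `vertices`."""
    seen, queue = {start}, deque([start])
    while queue:
        xi = queue.popleft()
        for eta in nbrs[xi]:
            if eta in vertices and eta not in seen:
                seen.add(eta)
                queue.append(eta)
    return seen


def phi_path(i, j):
    """Communication height on a path graph: maximal energy between i and j."""
    lo, hi = sorted((i, j))
    return max(H[k] for k in range(lo, hi + 1))


S_star = {xi for xi in H if H[xi] <= GAMMA}        # drop energies above GAMMA
S_2star = {xi for xi in S_star if H[xi] < GAMMA}   # also drop energy == GAMMA
S_m = component(S_2star, NBRS, M_CONF)
S_s = component(S_2star, NBRS, S_CONF)

if __name__ == "__main__":
    print(sorted(S_m), sorted(S_s))
```

In this example the two components are {0, 1, 2} and {4, 5, 6}, and they coincide with the sets described through communication heights in (16.2.25).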
Lemma 16.13 (Metastable pair) The pair {m, s} is a metastable set in the sense of
Definition 8.2:
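\[
\lim_{\beta\to\infty} \frac{\max_{\xi\notin\{m,s\}} \mu_\beta(\xi)/\mathrm{cap}_\beta(\xi,\{m,s\})}{\min_{\xi\in\{m,s\}} \mu_\beta(\xi)/\mathrm{cap}_\beta(\xi,\{m,s\}\setminus\xi)} = 0. \tag{16.2.27}
\]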
Proof Note that (16.1.1), Lemma 16.9 and the lower bound in (16.2.6) give that
the numerator is bounded from above by $e^{\beta(V - H(m))}/C_1 = e^{\beta(\Gamma-\delta)}/C_1$ for some
δ > 0, while (16.1.1), the definition of Γ and the upper bound in (16.2.6) give that
the denominator is bounded from below by $e^{\Gamma\beta}/C_2$ (the minimum being attained
at m).
Lemma 16.14 (Mean crossover time asymptotics) $E_m(\tau_s) = [Z_\beta\,\mathrm{cap}_\beta(m,s)]^{-1}[1 + o(1)]$
as β → ∞.
where
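\[
\begin{aligned}
A(m) &= \bigl\{\xi \in S : P_\xi(\tau_m < \tau_s) \ge P_\xi(\tau_s < \tau_m)\bigr\} \\
&= \bigl\{\xi \in S : h_{m,s}(\xi) \ge \tfrac12\bigr\}.
\end{aligned} \tag{16.2.29}
\]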
It follows from Lemma 16.15 below that
By Lemma 16.8, we have H (ξ ) > 0 = H (m) for all ξ ≠ m such that Φ(ξ, m) ≤
Φ(ξ, s). Therefore, by the second inclusion in (16.2.31),
The latter in turn implies that μβ (A(m))/μβ (m) = 1 + o(1). Since μβ (m) = 1/Zβ ,
we get the claim.
What Lemma 16.14 shows is that the proof of Theorem 16.5 revolves around
getting sharp bounds on Zβ capβ (m, s). The a priori estimates in (16.2.6) serve as a
springboard, because together with Lemma 16.14 they already yield the estimate
\[
\frac{1}{C_2} \le e^{-\beta\Gamma}\, E_m(\tau_s)\,[1+o(1)] \le \frac{1}{C_1}. \tag{16.2.33}
\]
Thus, our task is to narrow down the constants leading to the identification of the
prefactor K. The strategy in Sect. 16.3 to do so is the following:
– Note that all terms in the Dirichlet form in (16.2.1) involving configurations ξ
with H (ξ ) > Γ , i.e., ξ ∈ S\S*, contribute at most $Ce^{-\beta(\Gamma+\delta)}$ for some δ > 0
Proof Theorem 16.6 follows from the general theory in Sect. 8.4. The intuition
behind the exponential distribution of the crossover time is simple: each time the
dynamics reaches C (m, s) but fails to enter Ss and instead falls back into Sm , it has
a probability exponentially close to 1 to return to m because m lies at the bottom of
Sm (recall Lemma 16.8). Each time the dynamics returns to m, it starts from scratch.
16.3 Proof of the metastability theorems 397
Thus, the dynamics manages to reach a critical configuration and go over the hill
only after a number of unsuccessful attempts that tends to infinity as β → ∞, each
having a small probability that tends to zero as β → ∞. Consequently, the time to
go over the hill is exponentially distributed on the scale of its average.
Proof Our starting point is Lemma 16.14. Recalling (16.2.1)–(16.2.4), our task is
to show that
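\[
\begin{aligned}
Z_\beta\,\mathrm{cap}_\beta(m,s) &= \tfrac12\, Z_\beta \sum_{\xi,\xi'\in S} \mu_\beta(\xi)\,c_\beta(\xi,\xi')\,\bigl[h_{m,s}(\xi) - h_{m,s}(\xi')\bigr]^2 \\
&= [1 + o(1)]\,\Theta\, e^{-\beta\Gamma}, \qquad \beta \to \infty,
\end{aligned} \tag{16.3.1}
\]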
and to identify the constant Θ, since (16.3.1) will imply (16.1.16) with Θ = 1/K.
This is done in three steps: in the first two steps we derive sharp estimates on hm,s ,
in the third step we use these estimates to derive a variational formula for Θ.
1. For all ξ ∈ S\S* we have H (ξ ) > Γ , and so there exists a δ > 0 such that
$Z_\beta\mu_\beta(\xi) \le e^{-\beta(\Gamma+\delta)}$. Since $c_\beta(\xi,\xi') \le 1$ for all ξ, ξ′ ∈ S, we can therefore replace
S by S* in the sum in (16.3.1) at the cost of a prefactor $1 + O(e^{-\beta\delta})$ (for details, see
the proof of Lemma 16.17 below).
2. Because of Lemma 16.15, on the set S_m ∪ S_s, h_{m,s} is trivial and its contribution to
the sum in (16.3.1) can be put into the prefactor 1 + o(1) (for details, see the proof
of Lemma 16.17 below). Consequently, all that is needed is to understand what h_{m,s}
looks like on the set
\[
S^{*}\setminus(S_m \cup S_s) = \bigl\{\xi \in S^{*} : \Phi(\xi,m) = \Phi(\xi,s) = \Gamma\bigr\}. \tag{16.3.3}
\]
However, Lemma 16.16 below shows that h_{m,s} is also trivial on the set
\[
S^{**}\setminus(S_m \cup S_s) = \bigcup_{i=1}^{I} S_i, \tag{16.3.4}
\]
Lemma 16.16 shows that the contribution to the sum in (16.3.1) of the transitions
inside a well can also be put into the prefactor 1 + o(1) (for details, see the proof of
Lemma 16.17 below). Thus, only the transitions in and out of wells contribute.
3. In view of the above observations, the estimation of Zβ capβ (m, s) reduces to the
study of a simpler variational problem.
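with
\[
\Theta = \min_{C_1,\dots,C_I}\ \min_{\substack{h\colon S^{*}\to[0,1] \\ h|_{S_m}=1,\ h|_{S_s}=0,\ h|_{S_i}=C_i\ \forall\, i=1,\dots,I}}
\tfrac12 \sum_{\xi,\xi'\in S^{*}} 1_{\{\xi\sim\xi'\}}\,\bigl[h(\xi) - h(\xi')\bigr]^2. \tag{16.3.11}
\]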
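\[
\begin{aligned}
Z_\beta\,\mathrm{cap}_\beta(m,s)
&= Z_\beta \min_{\substack{h\colon S\to[0,1] \\ h(m)=1,\ h(s)=0}} \tfrac12 \sum_{\xi,\xi'\in S} \mu_\beta(\xi)\,c_\beta(\xi,\xi')\,\bigl[h(\xi) - h(\xi')\bigr]^2 \\
&= O\bigl(e^{-(\Gamma+\delta)\beta}\bigr)
 + Z_\beta \min_{\substack{h\colon S^{*}\to[0,1] \\ h(m)=1,\ h(s)=0}} \tfrac12 \sum_{\xi,\xi'\in S^{*}} \mu_\beta(\xi)\,c_\beta(\xi,\xi')\,\bigl[h(\xi) - h(\xi')\bigr]^2.
\end{aligned} \tag{16.3.12}
\]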
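\[
\begin{aligned}
&= \min_{C_1,\dots,C_I}\ \min_{\substack{h\colon S^{*}\to[0,1] \\ h|_{S_m}=1-O(e^{-\beta\delta}),\ h|_{S_s}=O(e^{-\beta\delta}),\ h|_{S_i}=C_i+O(e^{-\beta\delta})\ \forall\, i=1,\dots,I}}
 \tfrac12 \sum_{\xi,\xi'\in S^{*}} \mu_\beta(\xi)\,c_\beta(\xi,\xi')\,\bigl[h(\xi) - h(\xi')\bigr]^2 \\
&= \bigl[1 - O(e^{-\delta\beta})\bigr] \min_{C_1,\dots,C_I}\ \min_{\substack{h\colon S^{*}\to[0,1] \\ h|_{S_m}=1,\ h|_{S_s}=0,\ h|_{S_i}=C_i\ \forall\, i=1,\dots,I}}
 \tfrac12 \sum_{\xi,\xi'\in S^{*}} \mu_\beta(\xi)\,c_\beta(\xi,\xi')\,\bigl[h(\xi) - h(\xi')\bigr]^2,
\end{aligned} \tag{16.3.13}
\]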
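where the error term $O(e^{-\delta\beta})$ arises after we replace the approximate boundary
conditions
\[
h = \begin{cases}
1 - O(e^{-\beta\delta}) & \text{on } S_m, \\
O(e^{-\beta\delta}) & \text{on } S_s, \\
C_i + O(e^{-\beta\delta}) & \text{on } S_i,\ i = 1,\dots,I,
\end{cases} \tag{16.3.14}
\]
coming from Lemmas 16.15–16.16 by the sharp boundary conditions
\[
h = \begin{cases}
1 & \text{on } S_m, \\
0 & \text{on } S_s, \\
C_i & \text{on } S_i,\ i = 1,\dots,I.
\end{cases} \tag{16.3.15}
\]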
The minimum with the sharp boundary conditions is an upper bound for the mini-
mum with the approximate boundary conditions. Conversely, removal from the min-
imum with the approximate boundary conditions of all the transitions that stay in-
side Sm , Ss or Si for some i = 1, . . . , I yields a lower bound that is within a factor
1 − O(e−βδ ) of the minimum with the sharp boundary conditions.
Finally, by (16.1.1)–(16.1.3) we have
\[
Z_\beta\,\mu_\beta(\xi)\,c_\beta(\xi,\xi') = 1_{\{\xi\sim\xi'\}}\, e^{-\beta\Gamma} \tag{16.3.16}
\]
for all ξ, ξ′ ∈ S* that are not both in S_m or both in S_s or both in S_i for some i =
1, . . . , I . Indeed, by Theorem 16.12 and the decomposition in (16.3.4), in each of
these cases either H (ξ ) = Γ > H (ξ′) or H (ξ ) < Γ = H (ξ′), because there are no
allowed moves between S_m, S_s and S_i, i = 1, . . . , I . Combining (16.3.12)–(16.3.13)
and (16.3.16), we arrive at the claim.
Proof (a) We will show that there exist δ > 0 and C < ∞ such that for all β,
Because C ⊆ G (m, s), any path from m to s that does not pass through C
must hit a configuration ξ with H (ξ ) > Γ . Therefore there exists a set U , with
H (ξ ) ≥ Γ + δ for all ξ ∈ U and some δ > 0, such that
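\[
P_m\bigl(\{\tau_{\mathcal{C}} < \tau_s\}^c,\ \tau_s < \tau_m\bigr) \le P_m(\tau_U < \tau_m). \tag{16.3.20}
\]
\[
\begin{aligned}
&\le \frac{1}{c_\beta(m)} \sum_{\xi\in U} \bigl|\{\xi' \in S\setminus\xi : \xi' \sim \xi\}\bigr|\, e^{-\beta H(\xi)} \\
&\le \frac{1}{c_\beta(m)}\, C_2\, e^{-\beta(\Gamma+\delta)}
\end{aligned} \tag{16.3.21}
\]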
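(b) Write
\[
P_m(\xi_{\tau_{\mathcal{C}}} = \xi \mid \tau_{\mathcal{C}} < \tau_m) = \frac{P_m(\xi_{\tau_{\mathcal{C}}} = \xi,\ \tau_{\mathcal{C}} < \tau_m)}{P_m(\tau_{\mathcal{C}} < \tau_m)}, \qquad \xi \in \mathcal{C}. \tag{16.3.22}
\]
By reversibility,
\[
\begin{aligned}
P_m(\xi_{\tau_{\mathcal{C}}} = \xi,\ \tau_{\mathcal{C}} < \tau_m)
&= \frac{\mu_\beta(\xi)\,c_\beta(\xi)}{\mu_\beta(m)\,c_\beta(m)}\, P_\xi(\tau_m < \tau_{\mathcal{C}}) \\
&= e^{-\Gamma\beta}\, \frac{c_\beta(\xi)}{c_\beta(m)}\, P_\xi(\tau_m < \tau_{\mathcal{C}}), \qquad \xi \in \mathcal{C}.
\end{aligned} \tag{16.3.23}
\]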
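where
\[
h_{m,\mathcal{C}}(\xi') = \begin{cases}
0 & \text{if } \xi' \in \mathcal{C}, \\
1 & \text{if } \xi' = m, \\
P_{\xi'}(\tau_m < \tau_{\mathcal{C}}) & \text{otherwise}.
\end{cases} \tag{16.3.25}
\]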
Moreover, let
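\[
\bar{\mathcal{C}} = \bigl\{\xi' \in S\setminus(\mathcal{P}\cup\mathcal{C}) :\ H(\xi') \le \Gamma,\ \exists\, \xi \in \mathcal{C} : \xi \sim \xi'\bigr\}. \tag{16.3.27}
\]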
By Lemma 16.10, any path from C¯ to m that avoids C must reach an energy level
above Γ , and so hm,C (ξ ) ≤ hS\S ,C (ξ ) for all ξ ∈ C¯ . But Φ(ξ , S\S ) −
Φ(ξ , C ) = Φ(ξ , S\S ) − Γ ≥ δ for all ξ ∈ C¯ ∩ S and some δ > 0. Therefore,
again as in the proof of Lemma 8.4, it follows that
max hm,C ξ ≤ Ce−βδ . (16.3.28)
ξ ∈C¯ ∩S
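and, since $c_\beta(\xi) \le |S|$, it follows that $\xi \mapsto c_\beta(\xi,\mathcal{P})/c_\beta(\xi) \ge C > 0$. Combine this
observation with (16.3.29)–(16.3.30), to get
\[
P_\xi(\tau_m < \tau_{\mathcal{C}}) = \bigl[1 + O(e^{-\beta\delta})\bigr]\, \frac{c_\beta(\xi,\mathcal{P})}{c_\beta(\xi)}, \qquad \xi \in \mathcal{C}. \tag{16.3.32}
\]
Combine this in turn with (16.3.22)–(16.3.23), to arrive at
\[
\begin{aligned}
P_m(\xi_{\tau_{\mathcal{C}}} = \xi \mid \tau_{\mathcal{C}} < \tau_m)
&= \frac{c_\beta(\xi)\, P_\xi(\tau_m < \tau_{\mathcal{C}})}{\sum_{\xi'\in\mathcal{C}} c_\beta(\xi')\, P_{\xi'}(\tau_m < \tau_{\mathcal{C}})} \\
&= \bigl[1 + O(e^{-\beta\delta})\bigr]\, \frac{c_\beta(\xi,\mathcal{P})}{\sum_{\xi'\in\mathcal{C}} c_\beta(\xi',\mathcal{P})}, \qquad \xi \in \mathcal{C}.
\end{aligned} \tag{16.3.33}
\]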
16.4 Beyond Metropolis dynamics 403
There is nothing that prevents us from choosing a dynamics that is different from
the Metropolis dynamics in (16.1.2). We take a brief look at two examples, namely,
heat-bath dynamics (Sect. 16.4.1) and probabilistic cellular automata (Sect. 16.4.2).
We show that Theorems 16.4–16.6 in Sect. 16.1 carry over provided we modify
hypothesis (H2).
Return to the setting of Sect. 16.1.1. The heat-bath dynamics is the continuous-time
Markov process with state space S = Υ^Λ and transition rates
\[
c_\beta(\xi,\xi') =
\begin{cases}
\bigl[1 + e^{\beta[H(\xi')-H(\xi)]}\bigr]^{-1}, & \xi \sim \xi', \\
0, & \text{otherwise}.
\end{cases} \tag{16.4.1}
\]
This Markov process is reversible with respect to μ_β, the Gibbs measure associated
with H . Note that for large β the transition rates of the heat-bath dynamics and the
Metropolis dynamics are close to each other, except when H (ξ ) = H (ξ′), in which
case the former gives $c_\beta(\xi,\xi') = \tfrac12$ while the latter gives $c_\beta(\xi,\xi') = 1$.
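The comparison between the two sets of rates can be illustrated with a few lines of Python. This is a minimal sketch in which `dH` denotes the energy change H(ξ′) − H(ξ) of an allowed move; the function names and the sample values of β and `dH` are illustrative choices.

```python
import math


def metropolis_rate(dH, beta):
    """Metropolis rate exp(-beta * [dH]_+) for an allowed move."""
    return math.exp(-beta * max(dH, 0.0))


def heat_bath_rate(dH, beta):
    """Heat-bath rate [1 + exp(beta * dH)]^{-1} for the same move, cf. (16.4.1)."""
    return 1.0 / (1.0 + math.exp(beta * dH))


if __name__ == "__main__":
    beta = 20.0
    for dH in (-1.0, 0.0, 1.0):
        print(dH, metropolis_rate(dH, beta), heat_bath_rate(dH, beta))
```

For dH ≠ 0 the ratio of the two rates is within a factor 1 + e^{-β|dH|} of one, while for dH = 0 the heat-bath rate is exactly one half of the Metropolis rate.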
Proof The same proofs as in Sects. 16.2–16.3 apply, except for minor modifications
in a few spots:
the lower bound in (16.2.24) holds with C_1 = 1/(2L) instead of C_1 = 1/L. This does
not affect the a priori estimates in Lemma 16.11.
for all ξ, ξ ∈ S that are not both in Sm or both in Ss or both in Si for some i =
1, . . . , I . This does not affect Lemma 16.17, and so the same variational formula for
Θ = 1/K as in (16.3.11) holds.
3. In Sect. 16.3.3, no modification is needed all the way up to and including (16.3.30)
(the last term in the right-hand side of (16.3.30) comes with a factor $[1 + e^{\beta\delta}]^{-1} \le
e^{-\beta\delta}$). Also, instead of (16.3.31), we can estimate $c_\beta(\xi,\mathcal{P}) \ge \tfrac12\,|\{\xi' \in \mathcal{P} : \xi \sim
\xi'\}|$, so that also (16.3.32) and (16.3.33) carry over. Finally, $\xi \mapsto c_\beta(\xi,\mathcal{P})$ is not
constant on C . However,
−1
cβ ξ, ξ = 1 + eβ[H (ξ )−H (ξ )] = e−β[H (ξ )−H (ξ )] 1 + O e−βδ ,
(16.4.4)
ξ ∈ C ξ ∈ P .
Consequently, if we strengthen (H2) to (H2 ), then (16.3.33) again gives the uniform
entrance distribution (use that H (ξ ) = Γ for all ξ ∈ C ).
Again return to the setting of Sect. 16.1.1. A probabilistic cellular automaton (PCA)
is a discrete-time Markov chain with state space S = Υ^Λ and transition matrix
\[
p(\xi,\xi') = \prod_{x\in\Lambda} p_{x,\xi}\bigl(\xi'(x)\bigr), \qquad \xi,\xi' \in S, \tag{16.4.5}
\]
where, for each x ∈ Λ and ξ ∈ S, $p_{x,\xi}(\cdot)$ is a probability measure on Υ with full
support. This transition matrix corresponds to independent updates of all the spins
simultaneously at each unit of time (“parallel dynamics”), according to local updat-
ing rules that take into account both the location of the spin and the values of the
spins in its surroundings. Typically, px,ξ (·) is assumed to depend on ξ only through
the spins ξ(y), y ∈ N (x), in some small neighbourhood N (x) of x. If Λ has a lattice
structure (e.g. a torus in Zd , d ≥ 1), then typically N (x) = x + N , x ∈ Λ, for some
small N ⊆ Λ.
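The product structure in (16.4.5) can be sketched as follows. The local rule used here (each spin tends to align with its two neighbours on a ring of four sites) is a hypothetical choice made only to have something concrete with full support; it is not a rule taken from the text, and no claim is made that it is reversible with respect to a particular Gibbs measure.

```python
import math
from itertools import product

SPINS = (-1, +1)
N_SITES = 4  # hypothetical ring Lambda = {0, 1, 2, 3}


def local_prob(x, xi, s, beta):
    """Hypothetical local rule p_{x,xi}(s): spin x aligns with its neighbours."""
    field = xi[(x - 1) % N_SITES] + xi[(x + 1) % N_SITES]
    weights = {t: math.exp(beta * t * field) for t in SPINS}
    return weights[s] / sum(weights.values())


def pca_prob(xi, xi_p, beta):
    """p(xi, xi') as the product over sites of p_{x,xi}(xi'(x)), cf. (16.4.5)."""
    return math.prod(local_prob(x, xi, xi_p[x], beta) for x in range(N_SITES))


CONFIGS = list(product(SPINS, repeat=N_SITES))

if __name__ == "__main__":
    xi = CONFIGS[0]
    print(sum(pca_prob(xi, xi_p, beta=0.7) for xi_p in CONFIGS))
```

Because every local measure has full support, every entry of the resulting transition matrix is strictly positive, which is the "global moves" feature of parallel dynamics.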
What makes PCA’s into challenging objects is that they evolve via global moves
rather than local moves: all transitions—between any pair of configurations in S—
have positive probability, and therefore all transitions are allowed. This means that
∼ loses the role it played for Metropolis dynamics,
For β > 0, the PCA in (16.4.5) is reversible with respect to the Gibbs measure
$\mu_\beta(\xi) = e^{-\beta H(\xi)}/Z_\beta$, ξ ∈ S, associated with the Hamiltonian H : S → ℝ if
\[
\mu_\beta(\xi)\,p(\xi,\xi') = \mu_\beta(\xi')\,p(\xi',\xi) \qquad \forall\, \xi,\xi' \in S, \tag{16.4.6}
\]
where the maximum runs over all edges e in γ , i.e., over all pairs of successive con-
figurations visited by the path. This is different from Definition 16.1(a), where the
maximum runs over the single configurations in γ , and H was used instead of H
(note that Φ(ξ, ξ ) = H (ξ ) by convention). The definition of the communication
level set S (ξ, ξ ) in Definition 16.1(b) must be adapted accordingly: this becomes a
set of pairs of configurations rather than single configurations (S (ξ, ξ ) = ξ by con-
vention). A similar change applies to the definition of gates and dead-ends in Defi-
nition 16.2. What makes (16.4.8) non-trivial to compute is the fact that the Hamilto-
nian and the transition probabilities compete with each other: to make H (·, ·) small
we must make H (·) small and p(·, ·) large simultaneously.
Definition 16.3 must be changed into the following.
Definition 16.19 (Protocritical and critical sets for PCA dynamics) Let
Then (P (m, s), C (m, s)) is the maximal subset of S × S such that:
(1) ∀ ξ ∈ P (m, s) ∃ γ : ξ → m : maxe∈γ H (e) − H (m) < Γ .
(2) ∀ ξ ∈ C (m, s) ∃ γ : ξ → s : maxe∈γ H (e) − H (m) ≤ Γ , γ ∩ {ζ ∈ S :
Φ(ζ, m) < Φ(ζ, s)} = ∅.
Theorem 16.20 (Metastability for PCA dynamics) Theorems 16.4–16.6 are valid
for PCA dynamics subject to (H1) and
(H2 ) (ξ, ξ ) → p(ξ, ξ ) is constant on C × P .
Proof We will again go through the proofs in Sects. 16.2–16.3 to see what needs to
be modified.
while the definition of the capacity in (16.2.2) remains the same. Note that
\[
\mu_\beta(\xi)\,p(\xi,\xi') = e^{-\beta H(\xi,\xi')}/Z_\beta. \tag{16.4.11}
\]
Replace (16.2.5) by
capβ (A, B) = μβ (ξ )p ξ, ξ Pξ (τB < τA ), (16.4.12)
ξ ∈A
which is simpler than (16.2.5) because the PCA dynamics evolves in discrete rather
than continuous time.
3. In Sect. 16.2.3, Theorem 16.12 needs to be adapted as follows: S is the graph
consisting of all vertices and all edges; S* is the subgraph of S obtained by removing all
edges e with H (e) > Γ ; S** is the subgraph of S* obtained by removing all edges
e with H (e) = Γ ; S_m and S_s are the connected components of S** containing m
and s, respectively. With these modifications the claim in Theorem 16.12 stays the
same.
4. There are no changes in Sects. 16.2.4 and 16.3.1. Throughout Sect. 16.3.2,
μβ (ξ )cβ (ξ, ξ ) must be replaced by μβ (ξ )p(ξ, ξ ), while the indicator 1{ξ ∼ ξ }
must be removed from (16.3.11) and (16.3.16). The definition of Si , i = 1, . . . , I
in (16.3.4) stays the same, and so does the variational formula for Θ = 1/K in
(16.3.11).
5. Throughout Sect. 16.3.3, remove the terms cβ (ξ ) and cβ (m) (in view of (16.4.12))
and replace cβ (ξ, ξ ) by p(ξ, ξ ). Finally, if we replace (H2) by (H2 ), then
(16.3.33) again gives the uniform entrance distribution.
Fig. 16.5 Example where Smeta = {m1 , m2 }, yet for both these metastable configurations the same
results apply as in Theorems 16.4–16.6
Fig. 16.6 Example where there is a well of depth > Γ . The presence of this well does not influence
the typical crossover time from m to s, but enlarges its average by a factor $e^{\beta\Delta}$
Fig. 16.7 Example where Smeta = {m1 , m2 }, with s separated from m1 by m2 . In this case the dis-
tribution of the crossover time from m1 to s divided by its average is one half times the convolution
of two unit exponentials
paper also contains a careful analysis of the role of minimal gates and essential
gates for the metastable pair.
In this chapter we apply the results obtained in Chap. 16 to Ising spins in two and
three dimensions subject to Glauber dynamics. Spins live in a finite box, flip up
and down, want to align when they sit next to each other, and want to align with an
external magnetic field. We are interested in how the system magnetises, i.e., how
the dynamics aligns the spins with the magnetic field when initially all the spins are
pointing in the opposite direction. Our goal will be to prove hypotheses (H1–H2)
in Sect. 16.1.2, implying that Theorems 16.4–16.6 are valid. In two dimensions we
will identify (Γ , C , K). In three dimensions we will also identify Γ , but we will
obtain only partial information on C and K.
17.1.1 Model
Let Λ ⊂ ℤ² be a large square torus, centred at the origin. With each site x ∈ Λ we
associate a spin variable σ (x) assuming the values −1 or +1, indicating whether the
spin at x is pointing down or up (see Fig. 17.1). A configuration is denoted by σ ∈
S = {−1, +1}^Λ. Each configuration σ ∈ S has an energy given by the Hamiltonian
\[
H(\sigma) = -\frac{J}{2} \sum_{\{x,y\}\in\Lambda^{*}} \sigma(x)\sigma(y) - \frac{h}{2} \sum_{x\in\Lambda} \sigma(x), \tag{17.1.1}
\]
where
\[
\Lambda^{*} = \bigl\{\{x,y\} : x,y \in \Lambda,\ \|x - y\| = 1\bigr\} \tag{17.1.2}
\]
\[
h \in (0, 2J). \tag{17.1.3}
\]
This parameter range will be seen to correspond to metastable behaviour in the limit
as β → ∞ (see Fig. 17.4 below). A key role will be played by what we call the
critical droplet size:
\[
\ell_c = \Bigl\lceil \frac{2J}{h} \Bigr\rceil \tag{17.1.4}
\]
(⌈·⌉ denotes the upper integer part). For reasons that will become clear later on, we
will assume that
\[
\frac{2J}{h} \notin \mathbb{N}. \tag{17.1.5}
\]
Thus, an $(\ell_c - 1) \times (\ell_c - 1)$ droplet will be “subcritical” while an $\ell_c \times \ell_c$ droplet
will be “supercritical”. Moreover, we will assume that Λ is large enough so that it
contains a $2\ell_c \times 2\ell_c$ square, which is necessary for (H1) and will also prevent the
critical droplet from being a ring that wraps around Λ.
Analogous assumptions are needed in three dimensions (see Sect. 17.6).
17.1 Introduction and main results 411
Fig. 17.2 Configurations in Q , Q 1pr and Q 2pr . Inside the contours sit the up-spins, outside the
contours sit the down-spins
Definition 17.1
(a) Let
\[
\begin{aligned}
\boxminus &= \{\sigma \in S : \sigma(x) = -1\ \ \forall\, x \in \Lambda\}, \\
\boxplus &= \{\sigma \in S : \sigma(x) = +1\ \ \forall\, x \in \Lambda\},
\end{aligned} \tag{17.1.6}
\]
denote the configurations where all spins in Λ are down, respectively, up.
(b) Let Q be the set of configurations where the up-spins form a single $(\ell_c - 1) \times \ell_c$
quasi-square anywhere in Λ.
(c) Let $Q^{1\mathrm{pr}}$ be the set of configurations where the up-spins form a single
quasi-square $(\ell_c - 1) \times \ell_c$ anywhere in Λ with a single protuberance attached
anywhere to one of its longest sides.
(d) Let $Q^{2\mathrm{pr}}$ be the set of configurations where the up-spins form a single
quasi-square $(\ell_c - 1) \times \ell_c$ anywhere in Λ with a double protuberance attached
anywhere to one of its longest sides.
See Fig. 17.2 for a picture of the configurations in Q, Q 1pr and Q 2pr .
The main metastability theorems for Glauber dynamics are the following. Recall
Definition 16.3.
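Theorem 17.2 The pair $(\boxminus, \boxplus)$ satisfies hypotheses (H1–H2) in Sect. 16.1.2 and
hence Theorems 16.4–16.6 hold.
Theorem 17.3 The pair $(\boxminus, \boxplus)$ has protocritical set $\mathcal{P}^\star(\boxminus, \boxplus) = \mathcal{Q}$, critical set
$\mathcal{C}^\star(\boxminus, \boxplus) = \mathcal{Q}^{\mathrm{1pr}}$, and communication height
\[
\Gamma^\star = \Gamma^\star(\boxminus, \boxplus) = H(\mathcal{Q}^{\mathrm{1pr}}) - H(\boxminus) = J[4\ell_c] - h[\ell_c(\ell_c - 1) + 1]. \tag{17.1.7}
\]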
where we write $\sigma \le \sigma'$ when $\sigma(x) \le \sigma'(x)$ for all $x \in \Lambda$, and vice versa.
17.1.4 Discussion
1. The proof of Theorem 17.2 is given in Sect. 17.3. (H2) is easy to check, (H1) is
more involved and relies on certain isoperimetric inequalities.
2. The heuristics behind Theorem 17.3 is as follows. In Sect. 17.4 we will see that
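$\mathcal{Q}^{\mathrm{1pr}} \subseteq \mathcal{S}(\boxminus, \boxplus)$, the communication level set of the pair $(\boxminus, \boxplus)$. We will see that
on its way from $\boxminus$ to $\boxplus$ the dynamics passes through $\mathcal{S}(\boxminus, \boxplus)$ in three steps: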
(1) first it creates a quasi-square of up-spins; (2) next it attaches a single protu-
berance; (3) finally it turns this single protuberance into a double protuberance (see
Fig. 17.3). After these three steps are completed, the dynamics is “over the hill”
and proceeds downwards in energy to fill the box with up-spins. This also explains
where Theorem 17.5 comes from.
3. The heuristics behind Theorem 17.4 is as follows. The average time it takes for
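the dynamics to enter $\mathcal{C}^\star(\boxminus, \boxplus) = \mathcal{Q}^{\mathrm{1pr}}$ when starting from $\boxminus$ is
\[
\frac{1}{|\mathcal{Q}^{\mathrm{1pr}}|}\, e^{\beta \Gamma^\star} [1 + o(1)], \qquad \beta \to \infty, \tag{17.1.9}
\]
where $|\mathcal{Q}^{\mathrm{1pr}}|$ counts the number of critical droplets. Let $\pi(\ell_c)$ be the probability
that the single protuberance is turned into a double protuberance rather than being
removed. Then
\[
\frac{1}{\pi(\ell_c)} [1 + o(1)], \qquad \beta \to \infty, \tag{17.1.10}
\]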
is the average number of times a critical droplet just created attempts to move over
the hill before it finally manages to do so. The average nucleation time is the product
of (17.1.9) and (17.1.10), and so we conclude that
\[
K = \frac{1}{|\mathcal{Q}^{\mathrm{1pr}}|\, \pi(\ell_c)}. \tag{17.1.11}
\]
Outline The outline of the remainder of this chapter is as follows. In Sect. 17.2
we introduce some geometric definitions that are needed for the proof of Theo-
rems 17.2–17.5. These theorems are proved in Sects. 17.3–17.5. Section 17.6 looks
at the extension from two to three dimensions.
3. For σ ∈ S, let |σ | be the volume of C(σ ), ∂(σ ) the Euclidean boundary of C(σ ),
called the contour of σ , and |∂(σ )| the length of ∂(σ ). Then the energy associated
with σ is given by
\[
H(\sigma) = J|\partial(\sigma)| - h|\sigma| + H(\boxminus). \tag{17.2.1}
\]
4. To describe the shape of clusters, we need the following:
– An $\ell_1 \times \ell_2$ rectangle is a union of closed unit squares centred at the sites in $\Lambda$
with side lengths $\ell_1, \ell_2 \ge 1$. We use the convention $\ell_1 \le \ell_2$ and collect rectangles
in equivalence classes modulo translations and rotations.
– A quasi-square is an $\ell \times (\ell + \delta)$ rectangle with $\ell \ge 1$ and $\delta \in \{0, 1\}$. A square is
a quasi-square with $\delta = 0$.
– A bar is a 1 × k rectangle with k ≥ 1. A bar is called a row or a column if it fills
a side of a rectangle.
– A corner of a rectangle is an intersection of two bars attached to the rectangle.
– A 1-protuberance is a 1 × 1 bar attached to one side of a rectangle.
– A 2-protuberance is a 1 × 2 bar attached to one side of a rectangle.
5. The configuration space S can be partitioned as
\[
S = \bigcup_{n=0}^{|\Lambda|} \mathcal{V}_n, \tag{17.2.2}
\]
where
\[
\mathcal{V}_n = \{\sigma \in S \colon |\sigma| = n\} \tag{17.2.3}
\]
is the set of configurations with n up-spins.
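The contour representation (17.2.1) can be checked against the Hamiltonian (17.1.1) by brute force on a small torus. The sketch below is an illustration only: the helper names `hamiltonian`, `droplet` and `critical_droplet`, the torus side `L` and the parameter values are assumptions made for this example. It builds an $(\ell_c - 1) \times \ell_c$ quasi-square with a single protuberance on a longest side and compares its energy excess over the all-down configuration with $J[4\ell_c] - h[\ell_c(\ell_c-1)+1]$ from (17.1.7).

```python
from fractions import Fraction
from itertools import product


def hamiltonian(sigma, L, J, h):
    """H(sigma) as in (17.1.1) on an L x L torus; sigma maps (x, y) to -1 or +1."""
    bond_sum = 0
    for x, y in product(range(L), repeat=2):
        s = sigma[(x, y)]
        # Each nearest-neighbour bond is counted once (right and upper neighbour).
        bond_sum += s * sigma[((x + 1) % L, y)] + s * sigma[(x, (y + 1) % L)]
    field_sum = sum(sigma.values())
    return -Fraction(J) / 2 * bond_sum - Fraction(h) / 2 * field_sum


def droplet(L, up_sites):
    """Configuration with up-spins exactly on up_sites, down-spins elsewhere."""
    sigma = {site: -1 for site in product(range(L), repeat=2)}
    for site in up_sites:
        sigma[site] = +1
    return sigma


def critical_droplet(L, lc):
    """(lc - 1) x lc quasi-square with one protuberance on a side of length lc."""
    sites = {(x, y) for x in range(lc) for y in range(lc - 1)}
    sites.add((0, lc - 1))  # protuberance on top of the side of length lc
    return droplet(L, sites)


if __name__ == "__main__":
    J, h, lc, L = 1, Fraction(7, 10), 3, 8
    excess = hamiltonian(critical_droplet(L, lc), L, J, h) - hamiltonian(droplet(L, set()), L, J, h)
    print(excess, 4 * J * lc - h * (lc * (lc - 1) + 1))
```

Each broken bond contributes $J$ and each up-spin contributes $-h$, which is exactly the content of (17.2.1).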
Proof Let $\mathcal{D}_\ell$ denote the set of configurations where the up-spins form a single $\ell \times \ell$
square anywhere in $\Lambda$. The energy of the configurations in $\mathcal{D}_\ell$ equals (recall (17.1.1)
and see Fig. 17.4)
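local minimum of $H$. Thus, to settle (H1) it remains to show that $\boxminus$ has the unique
maximal stability level on $S \setminus \boxplus$.
Let $\gamma = (\gamma_0, \ldots, \gamma_{|\Lambda|}) \colon \boxminus \to \boxplus$ be any path that grows a droplet of up-spins
Let
\[
k^\star = \min\{k \in \mathbb{N} \colon H(\gamma_k) \le H(\boxminus)\} \ge 2 \tag{17.3.3}
\]
be the first time the reference path, after it has left $\boxminus$, hits an energy not exceeding
that of $\boxminus$.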
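For $\sigma, \sigma' \in S$, let $\sigma \vee \sigma'$ and $\sigma \wedge \sigma'$ denote the componentwise maximum, re-
spectively, minimum of $\sigma$ and $\sigma'$. An easy computation shows that, for all $\sigma, \sigma' \in S$,
\[
\begin{aligned}
|\partial(\sigma \vee \sigma')| + |\partial(\sigma \wedge \sigma')| &\le |\partial(\sigma)| + |\partial(\sigma')|, \\
|\sigma \vee \sigma'| + |\sigma \wedge \sigma'| &= |\sigma| + |\sigma'|.
\end{aligned} \tag{17.3.4}
\]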
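The two relations in (17.3.4) can be probed numerically on random configurations. The sketch below is illustrative rather than a proof; it assumes that on the torus the contour length equals the number of nearest-neighbour bonds joining opposite spins, and the names `contour_length` and `check_lattice_relations` are hypothetical.

```python
import random
from itertools import product


def contour_length(sigma, L):
    """Number of nearest-neighbour bonds of the L x L torus joining opposite spins."""
    return sum(
        (sigma[(x, y)] != sigma[((x + 1) % L, y)])
        + (sigma[(x, y)] != sigma[(x, (y + 1) % L)])
        for x, y in product(range(L), repeat=2)
    )


def volume(sigma):
    """Number of up-spins."""
    return sum(1 for s in sigma.values() if s == +1)


def check_lattice_relations(L=5, trials=200, seed=0):
    """Return True if both lines of (17.3.4) hold for all sampled pairs."""
    rng = random.Random(seed)
    sites = list(product(range(L), repeat=2))
    for _ in range(trials):
        a = {s: rng.choice((-1, +1)) for s in sites}
        b = {s: rng.choice((-1, +1)) for s in sites}
        join = {s: max(a[s], b[s]) for s in sites}  # componentwise maximum
        meet = {s: min(a[s], b[s]) for s in sites}  # componentwise minimum
        if contour_length(join, L) + contour_length(meet, L) > contour_length(a, L) + contour_length(b, L):
            return False
        if volume(join) + volume(meet) != volume(a) + volume(b):
            return False
    return True


if __name__ == "__main__":
    print(check_lattice_relations())
```

The first relation is the submodularity of the contour length, the second is additivity of the volume.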
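Pick any $\sigma \in S \setminus [\boxminus \cup \boxplus]$. Then there exists at least one pair of neighbouring sites $x$
and $y$ in $\Lambda$ such that $\sigma(x) = -1$ and $\sigma(y) = +1$. By translation invariance we may
assume without loss of generality that the first two spins that are flipped up in $\gamma$
are located at $x$ and $y$, respectively. Then
\[
\begin{aligned}
&\sigma \wedge \gamma_1 = \boxminus, \\
&1 \le |\sigma \wedge \gamma_k| < k \quad \forall\, k \ge 2.
\end{aligned} \tag{17.3.5}
\]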
where we use (17.2.1) and (17.3.2)–(17.3.5). (Note that the second lines of (17.3.2)
and (17.3.5) imply that $H(\sigma \wedge \gamma_k) > H(\boxminus)$ for $2 \le k \le k^\star$.) By picking $k = k^\star$ in
(17.3.7), we get
\[
H(\sigma \vee \gamma_{k^\star}) - H(\sigma) < H(\gamma_{k^\star}) - H(\boxminus) \le 0. \tag{17.3.8}
\]
Proof It is obvious from Definitions 17.1(b–c) and Theorem 17.3 that (H2) is satis-
fied. Indeed, each configuration in $\mathcal{C}^\star(\boxminus, \boxplus) = \mathcal{Q}^{\mathrm{1pr}}$ has exactly one configuration
in $\mathcal{P}^\star(\boxminus, \boxplus) = \mathcal{Q}$ from which it can be reached via an allowed move, namely, the
configuration that is obtained from it by removing the single protuberance.
Proposition 17.6
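(i) $\Phi(\boxminus, \boxplus) = \Gamma^\star$.
(ii) $\mathcal{S}(\boxminus, \boxplus) \supseteq \mathcal{Q}^{\mathrm{1pr}}$.
1. We first show that the configurations in $\mathcal{Q}$ are connected to $\boxminus$ by a path that stays
below $\Gamma^\star$.
Lemma 17.7 For any $\sigma \in \mathcal{Q}$ there exists a $\gamma \colon \sigma \to \boxminus$ such that $\max_{\xi \in \gamma} H(\xi) < \Gamma^\star$.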
First, we flip down a spin at a corner of the quasi-square, which increases the energy
by $h$. Next, we repeat this operation another $\ell_c - 3$ times, each time picking a spin
from a corner on the same shortest side. To guarantee that we never reach energy $\Gamma^\star$,
we must have that
\[
h(\ell_c - 2) < 2J - h, \tag{17.4.2}
\]
or
\[
\ell_c < \frac{2J}{h} + 1. \tag{17.4.3}
\]
But this inequality holds by the definition of $\ell_c$ in (17.1.4) and the non-degeneracy
hypothesis in (17.1.5). Finally, we flip down the last spin, which lowers the energy
by $2J - h$, so that we arrive at energy
\[
\Gamma^\star - (2J - h) - [2J - h(\ell_c - 1)], \tag{17.4.4}
\]
which is strictly smaller than (17.4.1) by (17.4.3). Thus, the removal of a row of
length $\ell_c - 1$ from the $(\ell_c - 1) \times \ell_c$ quasi-square in $\sigma$ lowers the energy (see
Fig. 17.5). We now have a square of side length $\ell_c - 1$. It is obvious that we can
remove further rows without encountering new conditions, until we reach $\boxminus$.
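2. We next show that the configurations in $\mathcal{Q}^{\mathrm{2pr}}$ are connected to $\boxplus$ by a path that
stays below $\Gamma^\star$.
Lemma 17.8 For any $\sigma \in \mathcal{Q}^{\mathrm{2pr}}$ there exists a $\gamma \colon \sigma \to \boxplus$ such that $\max_{\xi \in \gamma} H(\xi) < \Gamma^\star$.
\[
H(\sigma) = \Gamma^\star - h. \tag{17.4.5}
\]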
First, we flip up a spin next to the 2-protuberance. This lowers the energy by $h$. We
can repeat this operation another $\ell_c - 3$ times until the row is filled. By that time we
have a square of side length $\ell_c$ and energy
\[
\Gamma^\star - h - h(\ell_c - 2). \tag{17.4.6}
\]
Next, we flip up a spin to form a new 1-protuberance. This raises the energy by
$2J - h$. To make sure that we do not reach energy $\Gamma^\star$, we must have
\[
2J - h < h + h(\ell_c - 2), \tag{17.4.7}
\]
or
\[
\ell_c > \frac{2J}{h}, \tag{17.4.8}
\]
which holds by the definition of $\ell_c$ and the non-degeneracy hypothesis in (17.1.5).
We now have a square of side length $\ell_c$ with a 1-protuberance. By flipping up a spin
next to this 1-protuberance, we get a 2-protuberance and reach energy
\[
\Gamma^\star - h - [h\ell_c - 2J], \tag{17.4.9}
\]
which is strictly smaller than (17.4.5) by (17.4.8). Thus, the completion of a row of
length $\ell_c$ with a 2-protuberance and the creation of a new 2-protuberance lowers
the energy (see Fig. 17.5). It is obvious that we can complete further rows and create
further 2-protuberances without encountering new conditions, until we reach $\boxplus$.
3. We can now conclude the proof of $\Phi(\boxminus, \boxplus) \le \Gamma^\star$ as follows. The desired path
$\gamma \colon \boxminus \to \boxplus$ is realized by tracing the path in Lemma 17.7 in the reverse direction,
from $\boxminus$ to $\sigma \in \mathcal{Q}$, then going from $\sigma$ to $\sigma' \in \mathcal{Q}^{\mathrm{1pr}}$ by adding a 1-protuberance and
from $\sigma'$ to $\sigma'' \in \mathcal{Q}^{\mathrm{2pr}}$ by extending this 1-protuberance to a 2-protuberance, and
finally following the path in Lemma 17.8 from $\sigma''$ to $\boxplus$. This $\gamma$ will be called the
reference path for the magnetisation.
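The energy profile along such a path can be tabulated directly from (17.2.1). The sketch below is a simplified illustration: it grows a droplet one spin at a time by attaching a bar to a longest side of the current rectangle, so that squares and quasi-squares alternate, and it measures energies relative to $H(\boxminus)$. The function name `reference_path_energies` and the cut-off `max_side` are assumptions made for this example.

```python
from fractions import Fraction
from math import ceil


def reference_path_energies(J, h, max_side):
    """Energies H - H(all down) = J*perimeter - h*volume along a growth path.

    The list index equals the number of up-spins. Starting from a single
    up-spin, a bar is attached to a longest side of the current a x b
    rectangle (a <= b) and filled spin by spin.
    """
    J, h = Fraction(J), Fraction(h)
    energies = [Fraction(0), 4 * J - h]  # all down, then a single up-spin
    a, b = 1, 1
    while b < max_side:
        for k in range(1, b + 1):
            # A rectangle with a (partial or full) bar on a side of length b
            # has perimeter 2(a + b) + 2 and volume a*b + k.
            energies.append(J * (2 * (a + b) + 2) - h * (a * b + k))
        a, b = sorted((a + 1, b))
    return energies


if __name__ == "__main__":
    J, h = 1, Fraction(7, 10)
    lc = ceil(Fraction(2 * J) / h)
    profile = reference_path_energies(J, h, 2 * lc)
    print(max(profile), profile.index(max(profile)))
```

For $J = 1$, $h = 7/10$ the maximum is attained at volume $\ell_c(\ell_c - 1) + 1 = 7$, i.e. at the quasi-square with a single protuberance, and equals the value of $\Gamma^\star$ in (17.1.7).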
Proof Any path $\gamma \colon \boxminus \to \boxplus$ must cross the set $\mathcal{V}_{\ell_c(\ell_c - 1)}$. As shown in Alonso and
Cerf [4], Theorem 2.6, the following isoperimetric inequality holds as a conse-
quence of (17.2.1): in $\mathcal{V}_{\ell_c(\ell_c - 1)}$ the unique (modulo translations and rotations) con-
figuration of minimal energy is the $(\ell_c - 1) \times \ell_c$ quasi-square, which has energy
$H(\sigma) = \Gamma^\star - (2J - h)$. All other configurations in $\mathcal{V}_{\ell_c(\ell_c - 1)}$ have energy at least
$\Gamma^\star + h$, and thus any path not hitting $\mathcal{Q}$ exceeds energy $\Gamma^\star$.
Proof Follow the path until it hits the set $\mathcal{V}_{\ell_c(\ell_c - 1)}$. According to Lemma 17.9, the
configuration in this set must be an $(\ell_c - 1) \times \ell_c$ quasi-square. Since we need not
consider any paths that return to the set $\mathcal{V}_{\ell_c(\ell_c - 1)}$ afterwards, a first step beyond
the quasi-square must be the creation of a 1-protuberance. This brings us to en-
ergy $\Gamma^\star$. If the 1-protuberance is created on the side of length $\ell_c$, then we have
a configuration in $\mathcal{Q}^{\mathrm{1pr}}$. If, on the other hand, it is created on the side of length
$\ell_c - 1$, then completion of the row leads to an $(\ell_c - 1) \times (\ell_c + 1)$ rectangle with en-
ergy $\Gamma^\star - h(\ell_c - 2)$. After that the creation of a 1-protuberance brings us to energy
$\Gamma^\star - h(\ell_c - 2) + (2J - h)$, which exceeds energy $\Gamma^\star$ because of (17.4.2). Since
$(\ell_c - 1) \times (\ell_c + 1) + 1 = \ell_c \times \ell_c$, any other path that proceeds from the $(\ell_c - 1) \times \ell_c$
quasi-square with a 1-protuberance on the side of length $\ell_c - 1$ to the set $\mathcal{V}_{\ell_c^2}$ without
returning to the set $\mathcal{V}_{\ell_c(\ell_c - 1)}$ also exceeds energy $\Gamma^\star$. Indeed, according to Alonso
and Cerf [4], Theorem 2.6, the unique configuration with minimal energy in the set
$\mathcal{V}_{\ell_c^2}$ is the $\ell_c \times \ell_c$ square (modulo rotations and translations).
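Lemmas 17.9–17.10 imply that $\Phi(\boxminus, \boxplus) \ge \Gamma^\star$, and together with Steps 1–3
complete the proof of Proposition 17.6(i).
The relations $\mathcal{P}^\star(\boxminus, \boxplus) = \mathcal{Q}$ and $\mathcal{C}^\star(\boxminus, \boxplus) = \mathcal{Q}^{\mathrm{1pr}}$ and the formula for $\Gamma^\star$
claimed in Theorem 17.3 are an immediate consequence of Definition 16.3 and
Lemmas 17.7–17.10.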
The claim in Theorem 17.5 is immediate from Lemmas 17.7–17.8 in combination
with Proposition 16.12, Lemma 16.15, (16.2.4) and (17.3.6)–(17.3.7).
Proof Our starting point is the variational formula for Θ = 1/K in Lemma 16.17.
This variational problem simplifies considerably because of the following two facts
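that are specific to our Glauber dynamics (abbreviate $\mathcal{C}^\star = \mathcal{C}^\star(\boxminus, \boxplus)$):
• $S^\star \setminus [S_\boxminus \cup S_\boxplus] = \mathcal{C}^\star$, i.e., there are no wells inside $\mathcal{C}^\star$.
• There are no allowed moves within $\mathcal{C}^\star$, i.e., critical droplets cannot transform
into each other via single spin-flips.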
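Consequently, (16.3.11) reduces to
\[
\begin{aligned}
\Theta &= \min_{h \colon \mathcal{Q}^{\mathrm{1pr}} \to [0,1]} \sum_{\sigma \in \mathcal{Q}^{\mathrm{1pr}}}
\Big\{ [1 - h(\sigma)]^2 N^-(\sigma) + [0 - h(\sigma)]^2 N^+(\sigma) \Big\} \\
&= \sum_{\sigma \in \mathcal{Q}^{\mathrm{1pr}}} \frac{N^-(\sigma) N^+(\sigma)}{N^-(\sigma) + N^+(\sigma)},
\end{aligned} \tag{17.5.1}
\]
where
\[
\begin{aligned}
N^-(\sigma) &= |\{\sigma' \in \mathcal{Q} \colon \sigma \sim \sigma'\}|, \\
N^+(\sigma) &= |\{\sigma' \in \mathcal{Q}^{\mathrm{2pr}} \colon \sigma \sim \sigma'\}|,
\end{aligned} \tag{17.5.2}
\]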
where 2|Λ| counts the number of locations and rotations of the protocritical droplet.
Since K = 1/Θ, this completes the proof of Theorem 17.4.
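The sum in (17.5.1) can be evaluated by direct enumeration. The sketch below rests on assumptions read off from Definition 17.1 rather than quoted from the proof: for each critical droplet, $N^-(\sigma) = 1$ (remove the protuberance), and $N^+(\sigma) = 1$ if the protuberance sits at an end of a longest side and $2$ otherwise (extend it along the side). The multiplicity $2|\Lambda|$ for locations and rotations is taken from the text; the names `theta_per_droplet` and `prefactor` are hypothetical.

```python
from fractions import Fraction


def theta_per_droplet(lc):
    """Sum of N- N+ / (N- + N+) over the protuberance positions on the two
    longest sides (length lc) of one fixed (lc - 1) x lc quasi-square."""
    total = Fraction(0)
    for _side in range(2):
        for pos in range(lc):
            n_minus = 1                              # assumed: one way back to Q
            n_plus = 1 if pos in (0, lc - 1) else 2  # assumed: ways to reach Q^2pr
            total += Fraction(n_minus * n_plus, n_minus + n_plus)
    return total


def prefactor(lc, torus_volume):
    """K = 1 / Theta with Theta = 2 |Lambda| * theta_per_droplet(lc)."""
    return 1 / (2 * torus_volume * theta_per_droplet(lc))


if __name__ == "__main__":
    print(theta_per_droplet(3), prefactor(3, 100))
```

Under these assumptions the per-droplet sum is $2(2\ell_c - 1)/3$, so the sketch returns $K = 3 / (4(2\ell_c - 1)|\Lambda|)$.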
In this section we briefly indicate how to extend the main definitions and results
from two to three dimensions. No proofs are given. See Sect. 17.7 for references.
Let $\Lambda \subset \mathbb{Z}^3$ be a large cubic box, centred at the origin. The metastable parameter
range replacing (17.1.3) is
\[
h \in (0, 3J), \tag{17.6.1}
\]
and, similarly as in (17.1.5), we assume that
\[
\frac{2J}{h} \notin \mathbb{N}, \qquad \frac{4J}{h} \notin \mathbb{N}. \tag{17.6.2}
\]
The analogue of Definitions 17.1(b–c) reads:
Definition 17.11
(a) Let $\mathcal{Q}$ be the set of configurations where the up-spins form an $(m_c - 1) \times (m_c -
\delta_c) \times m_c$ quasi-cube with, attached to one of its faces, an $(\ell_c - 1) \times \ell_c$ quasi-
square, anywhere in $\Lambda$. Here, $\delta_c \in \{0, 1\}$ depends on the arithmetic properties
of $J$ and $h$, while
\[
\ell_c = \left\lceil \frac{2J}{h} \right\rceil, \qquad m_c = \left\lceil \frac{4J}{h} \right\rceil, \tag{17.6.3}
\]
are the two-dimensional critical droplet size on a face, respectively, the three-
dimensional critical droplet size, replacing (17.1.4). Note that $m_c \in \{2\ell_c - 1,
2\ell_c\}$.
(b) Let $\mathcal{Q}^{\mathrm{1pr}}$ be the set of configurations obtained from $\mathcal{Q}$ by adding a single protu-
berance anywhere to one of the longest sides of the quasi-square (see Fig. 17.6).
(c) Let
\[
\begin{aligned}
\Gamma^\star = \Gamma^\star(\boxminus, \boxplus) &= H(\mathcal{Q}^{\mathrm{1pr}}) - H(\boxminus) \\
&= J\big[2m_c(m_c - \delta_c) + 2m_c(m_c - 1) + 2(m_c - \delta_c)(m_c - 1) + 4\ell_c\big] \\
&\quad - h\big[m_c(m_c - \delta_c)(m_c - 1) + \ell_c(\ell_c - 1) + 1\big].
\end{aligned} \tag{17.6.4}
\]
Theorem 17.3 carries over: $\mathcal{P}^\star(\boxminus, \boxplus) = \mathcal{Q}$ and $\mathcal{C}^\star(\boxminus, \boxplus) = \mathcal{Q}^{\mathrm{1pr}}$. Also Theorem 17.2
carries over: the proof of (H1–H2) is the same as in Sects. 17.3.1–17.3.2.
As to Theorem 17.4, the prefactor $K$ can be computed explicitly, namely,
\[
K = K_{d=3} = \frac{K_{d=2}}{M_{d=3}} \tag{17.6.5}
\]
with Kd=2 the prefactor in two dimensions and Md=3 the number of quasi-cubes
in three dimensions that are contained in a three-dimensional critical droplet. The
1. The results in this chapter are taken from Bovier and Manzo [39]. Cruder versions
of the main results in Chap. 16 for Glauber dynamics, derived with the help of the
pathwise approach to metastability, were obtained by Neves and Schonmann [193]
in two dimensions and by Ben Arous and Cerf [19] in three dimensions.
2. The formula for K claimed in [39] contains a small error. This is corrected in
Theorem 17.4. The argument in Sect. 17.3.1 first appeared in den Hollander, Nardi,
Olivieri and Scoppola [84].
4. Detailed results are known about the tube of typical trajectories, i.e., the set of
paths within which the crossover from $\boxminus$ to $\boxplus$ takes place, also referred to as the
nucleation pattern. The identification of this tube requires an analysis of the dynamics
on shorter time scales, in particular, the typical time scales on which rows
and columns are grown. This is the realm of the pathwise approach to metastability.
Such a refined analysis is also necessary to improve on the result in Theorem 17.5.
See Olivieri and Vares [198, Sects. 7.3–7.4], and Gaudillière, Olivieri and Scop-
pola [122].
8. A version of the Glauber dynamics with three spin-values called the Blume-Capel
model, namely, Υ = {−1, 0, +1} and ∼ allowing for single-site changes of the
spins, was considered by Manzo and Olivieri [172, 173]. There are three regimes,
corresponding to the three equilibrium phases of the model (plus-phase, zero-phase
and minus-phase). The nucleation pattern is fairly complex. See [198, Sect. 7.11].
\[
p_{x,\sigma}(s) = \frac{1}{2}\Big[1 + s \tanh\big(\beta[U_\sigma(x) + h]\big)\Big], \qquad x \in \Lambda,\ \sigma \in S,\ s \in \Upsilon. \tag{17.7.2}
\]
“All right,” said the Cat; and this time it vanished quite slowly,
beginning with the end of the tail, and ending with the grin,
which remained some time after the rest of it had gone.
(Lewis Carroll, Alice’s Adventures in Wonderland)
In this chapter we apply the results obtained in Chap. 16 to the lattice gas in two and
three dimensions subject to Kawasaki dynamics. Particles live in a finite box, hop
between nearest-neighbour sites, feel an attractive interaction when they sit next to
each other, and are created, respectively, annihilated at the boundary of the box in a
way that reflects the presence of an infinite gas reservoir. We are interested in how
the system nucleates, i.e., how the box fills up when it is initially empty. Our goal
will be to prove hypotheses (H1–H2) in Sect. 16.1.2, implying that Theorems 16.4–
16.6 are valid. In two dimensions we will further identify $(\Gamma^\star, \mathcal{C}^\star)$ and obtain the
asymptotics of $K$ in the limit as the size of the box tends to infinity. In three dimensions
we will also identify $\Gamma^\star$, but we will obtain only partial information on $\mathcal{C}^\star$
and $K$.
Kawasaki differs from Glauber, treated in Chap. 17, in that it is a conservative
dynamics: particles are conserved in the interior of the box. Consequently, during
the growing and the shrinking of droplets, particles must travel between the droplet
and the boundary of the box, which causes several complications. Moreover, it turns
out that in the metastable regime particles move along the border of a droplet more
rapidly than they arrive from the boundary of the box. This leads to a shape of the
critical droplet that is more complicated than the one for Glauber dynamics. This
complexity needs to be handled in order to obtain information on $\mathcal{C}^\star$ and $K$.
18.1.1 Model
where
\[
(\Lambda^-)^* = \big\{ \{x,y\} \colon x,y \in \Lambda^-,\ \|x-y\| = 1 \big\} \tag{18.1.3}
\]
\[
\Delta \in (U, 2U). \tag{18.1.4}
\]
We will see that this parameter range corresponds to metastable behaviour in the
limit as $\beta \to \infty$ (see Fig. 18.3 below). A key role will be played by what we call
the critical droplet size:
\[
\ell_c = \left\lceil \frac{U}{2U - \Delta} \right\rceil. \tag{18.1.5}
\]
For reasons that will become clear later on, we will assume that
\[
\frac{U}{2U - \Delta} \notin \mathbb{N}. \tag{18.1.6}
\]
Thus, an $(\ell_c - 1) \times (\ell_c - 1)$ droplet will be “subcritical” while an $\ell_c \times \ell_c$ droplet
will be “supercritical”. Moreover, we will assume that $\Lambda$ is large enough so that $\Lambda^-$
contains a $2\ell_c \times 2\ell_c$ square.
Analogous assumptions are needed in three dimensions (see Sect. 18.6).
Definition 18.1
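(a) Let
\[
\begin{aligned}
\square &= \{\eta \in S \colon \eta(x) = 0 \ \forall\, x \in \Lambda\}, \\
\blacksquare &= \{\eta \in S \colon \eta(x) = 1 \ \forall\, x \in \Lambda^-,\ \eta(x) = 0 \ \forall\, x \in \partial^-\Lambda\},
\end{aligned} \tag{18.1.7}
\]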
Fig. 18.2 A configuration in $\bar{\mathcal{D}}$ with an $(\ell_c - 2) \times (\ell_c - 2)$ square in the center and four bars
attached to it. A similar picture applies for $\tilde{\mathcal{D}}$ with an $(\ell_c - 3) \times (\ell_c - 1)$ rectangle in the center
(c) Let $\mathcal{D}^{\mathrm{fp}}$ denote the set of configurations obtained from $\mathcal{D}$ by adding a free
particle anywhere in $\partial^-\Lambda$.
In the definition of $\bar{\mathcal{D}}$, the four bars may be placed anywhere in the ring around
the square, i.e., anywhere in the union of the two rows and the two columns forming
the outer layer of the square (see Fig. 18.2). A total of $3\ell_c - 3$ particles must be
accommodated in this ring in such a way that each side of the ring, i.e., each row or
column, contains precisely one bar. A bar may include a corner of the ring provided
the neighbouring bar also includes this corner. Similarly for $\tilde{\mathcal{D}}$.
In Sect. 18.1.4, item 2, we will see that the configurations in D arise from each
other via motion of particles along the border of the droplet, a phenomenon that is
specific to Kawasaki dynamics.
The main metastability theorems for Kawasaki dynamics are the following. Re-
call Definition 16.3.
Theorem 18.2 The pair $(\square, \blacksquare)$ satisfies hypotheses (H1–H2) in Sect. 16.1.2, and
hence Theorems 16.4–16.6 hold.
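Theorem 18.3 The pair $(\square, \blacksquare)$ has protocritical set $\mathcal{P}^\star(\square, \blacksquare) = \mathcal{D}$, critical set
$\mathcal{C}^\star(\square, \blacksquare) = \mathcal{D}^{\mathrm{fp}}$, and communication height
\[
\begin{aligned}
\Gamma^\star = \Gamma^\star(\square, \blacksquare) &= H(\mathcal{D}^{\mathrm{fp}}) - H(\square) = H(\mathcal{D}) + \Delta \\
&= -U\big[(\ell_c - 1)^2 + \ell_c(\ell_c - 2) + 1\big] + \Delta\big[\ell_c(\ell_c - 1) + 2\big] \\
&= 2U[\ell_c + 1] - (2U - \Delta)\big[\ell_c(\ell_c - 1) + 2\big].
\end{aligned} \tag{18.1.11}
\]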
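The two expressions for the communication height in (18.1.11) can be compared numerically. The following sketch uses exact fractions; the function names and the sample parameters $U = 1$, $\Delta = 17/10$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import ceil


def kawasaki_critical_size(U, Delta):
    """l_c = ceil(U / (2U - Delta)), cf. (18.1.4)-(18.1.6)."""
    U, Delta = Fraction(U), Fraction(Delta)
    if not (U < Delta < 2 * U):
        raise ValueError("metastable regime requires Delta in (U, 2U)")
    ratio = U / (2 * U - Delta)
    if ratio.denominator == 1:
        raise ValueError("degenerate case: U/(2U - Delta) is an integer")
    return ceil(ratio)


def gamma_bond_form(U, Delta, lc):
    """-U * (number of bonds) + Delta * (number of particles), second line of (18.1.11)."""
    U, Delta = Fraction(U), Fraction(Delta)
    return -U * ((lc - 1) ** 2 + lc * (lc - 2) + 1) + Delta * (lc * (lc - 1) + 2)


def gamma_contour_form(U, Delta, lc):
    """2U(l_c + 1) - (2U - Delta)(l_c(l_c - 1) + 2), third line of (18.1.11)."""
    U, Delta = Fraction(U), Fraction(Delta)
    return 2 * U * (lc + 1) - (2 * U - Delta) * (lc * (lc - 1) + 2)


if __name__ == "__main__":
    U, Delta = 1, Fraction(17, 10)
    lc = kawasaki_critical_size(U, Delta)
    print(lc, gamma_bond_form(U, Delta, lc), gamma_contour_form(U, Delta, lc))
```

For the sample parameters both forms give $\Gamma^\star = 29/5$ with $\ell_c = 4$.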
Remark The asymptotics in (18.1.12) does not depend on the shape of Λ, e.g. it
would be the same if Λ were a large circle rather than a large square.
In addition, we have the following geometric description of the configurations in
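the valleys $S_\square$, $S_\blacksquare$ around $\square$ and $\blacksquare$, defined in (16.2.25). Let
\[
\begin{aligned}
\mathcal{V}_{\le \mathcal{D}} &= \{\eta \in S \colon \eta \le \eta' \text{ for some } \eta' \in \mathcal{D}\}, \\
\mathcal{V}_{\ge \mathcal{C}^G} &= \{\eta \in S \colon \eta \ge \eta' \text{ for some } \eta' \in \mathcal{C}^G\},
\end{aligned} \tag{18.1.14}
\]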
18.1.4 Discussion
1. The proof of Theorem 18.2 is given in Sect. 18.3. (H2) is easy to check, (H1) is
more involved and relies on certain isoperimetric inequalities.
2. The heuristics behind Theorem 18.3 is as follows. In Sect. 18.4 we will see that
$\mathcal{D}^{\mathrm{fp}} \subseteq \mathcal{S}(\square, \blacksquare)$, the communication level set of the pair $(\square, \blacksquare)$. We will see that
the dynamics passes through $\mathcal{S}(\square, \blacksquare)$ in four steps: (1) first it creates a “canonical
protocritical droplet”, namely, a configuration in $\mathcal{D}$ with the property that three bars
have full length and one bar consists of a single protuberance; (2) next it allows
particles to “move along the border of the droplet”, thereby forming all the other
“protocritical droplets” in D ; (3) after that it brings in a free particle, thereby form-
ing a “critical droplet”; (4) finally it attaches this free particle to the boundary of the
protocritical droplet. After these four steps are completed, the dynamics is “over the
hill” and proceeds downwards in energy to fill up the box. This also explains where
Theorem 18.5 comes from.
Note: If the free particle attaches itself at a “bad site” in the outer layer of the proto-
critical droplet (i.e., next to one other particle), then either it may again detach itself
or it may cause a motion of particles along the border of the droplet, after which
another particle may detach itself, possibly leaving behind a different protocritical
droplet. However, since for large Λ a free particle has a small probability to escape
from the protocritical droplet and return to ∂ − Λ, it must eventually attach itself at a
“good site”. See Sect. 18.4.4 for more details.
3. The heuristics behind Theorem 18.4 is as follows. The average time it takes for
the dynamics to enter $\mathcal{C}^\star(\square, \blacksquare) = \mathcal{D}^{\mathrm{fp}}$ when starting from $\square$ is
\[
\frac{1}{|\mathcal{D}|}\, \frac{1}{|\partial^-\Lambda|}\, e^{\beta \Gamma^\star} [1 + o(1)], \qquad \beta \to \infty, \tag{18.1.15}
\]
where $|\mathcal{D}|$ counts the number of protocritical droplets and $|\partial^-\Lambda|$ counts the number
of locations where the free particle can be created. Let $\pi(\Lambda, \ell_c)$ be the probability
that the free particle moves from $\partial^-\Lambda$ to the protocritical droplet and attaches itself
at a good site, i.e., the probability that the dynamics after it enters $\mathcal{C}^\star(\square, \blacksquare)$ moves
onwards to $\blacksquare$ rather than returns to $\square$. Then
\[
\frac{1}{\pi(\Lambda, \ell_c)} [1 + o(1)], \qquad \beta \to \infty, \tag{18.1.16}
\]
is the average number of times a free particle just created in $\partial^-\Lambda$ attempts to move
to the protocritical droplet and attach itself at a good site before it finally manages
to do so. The average nucleation time is the product of (18.1.15) and (18.1.16), and
so we conclude that
\[
K = \frac{1}{|\mathcal{D}|\, |\partial^-\Lambda|\, \pi(\Lambda, \ell_c)}. \tag{18.1.17}
\]
To compute |D|, note that
\[
|\mathcal{D}| = [1 + o(1)]\, |\Lambda|\, N(\ell_c), \qquad \Lambda \to \mathbb{Z}^2. \tag{18.1.18}
\]
Indeed, as we will see in Sect. 18.5, the right-hand side of (18.1.19) is the probability
for large Λ that a particle detaching itself from the protocritical droplet reaches
∂ − Λ before re-attaching itself. Due to the recurrence of simple random walk in
two dimensions, for large Λ this probability is independent of the shape and the
location of the protocritical droplet, as long as it is far from ∂ − Λ. By reversibility,
the reverse motion has the same probability, which explains (18.1.19). Combine
(18.1.17)–(18.1.19) to get (18.1.12).
Outline The outline of the remainder of this chapter is as follows. In Sect. 18.2
we introduce some key geometric definitions that are needed for the proof of Theo-
rems 18.2–18.5. These theorems are proved in Sects. 18.3–18.5. Section 18.6 looks
at the extension from two to three dimensions.
3. For η ∈ S, let |η| be the number of particles in η, ∂(η) the Euclidean boundary
of C(η), called the contour of η, and |∂(η)| the length of ∂(η). Then the energy
associated with η is given by
\[
H(\eta) = \frac{U}{2} |\partial(\eta)| - (2U - \Delta)|\eta \cap \Lambda^-| + \Delta |\eta \cap \partial^-\Lambda|. \tag{18.2.1}
\]
4. To describe the shape of clusters, we need the following:
– An $\ell_1 \times \ell_2$ rectangle is a union of closed unit squares centred at the sites in-
side $\Lambda^-$ with side lengths $\ell_1, \ell_2 \ge 1$. We use the convention $\ell_1 \le \ell_2$ and collect
rectangles in equivalence classes modulo translations and rotations.
– A bar is a $1 \times k$ rectangle with $k \ge 1$. A bar is called a row or a column if it fills
a side of a rectangle.
– A corner of a rectangle is an intersection of two bars attached to the rectangle.
– A quasi-square is an $\ell \times (\ell + \delta)$ rectangle with $\ell \ge 1$ and $\delta \in \{0, 1\}$. A square is
a quasi-square with $\delta = 0$.
– If $\eta$ is a configuration with a single contour, then we denote by $\mathrm{CR}(\eta)$ the rectan-
gle circumscribing $\eta$, i.e., the smallest rectangle containing $\eta$. We write
\[
\begin{aligned}
\partial^- \mathrm{CR}(\eta) &= \{x \in \mathrm{CR}(\eta) \colon \exists\, y \notin \mathrm{CR}(\eta) \colon \|y - x\| = 1\}, \\
\partial^+ \mathrm{CR}(\eta) &= \{x \notin \mathrm{CR}(\eta) \colon \exists\, y \in \mathrm{CR}(\eta) \colon \|y - x\| = 1\},
\end{aligned} \tag{18.2.2}
\]
where αα ∈ {ne, nw, se, sw} with n = north, s = south, etc. By convention, cor-
ners are not part of rows. If equality holds in the last inequality, then we need to
place the bar in the row opposite to rα (η), say rα (η), a distance 1 away from
cα α (η) in order to be able to accommodate the shift of a bar in rα (η) that is
necessary to accommodate the particle that moves around the corner.
where
\[
\mathcal{V}_n = \{\eta \in S \colon |\eta| = n\} \tag{18.2.7}
\]
is the set of configurations with n particles.
Proof Let $\mathcal{D}_\ell$ denote the set of configurations where the particles form a single
$\ell \times \ell$ square anywhere inside $\Lambda^-$. The energy $E(\ell)$ of the configurations in $\mathcal{D}_\ell$
equals (recall (18.1.2) and see Fig. 18.3)
\[
E(\ell) = H(\mathcal{D}_\ell) - H(\square) = -U\, 2\ell(\ell - 1) + \Delta \ell^2 = 2U\ell - (2U - \Delta)\ell^2, \tag{18.3.1}
\]
which is maximal at $\ell = U/(2U - \Delta)$ and is negative for $\ell > 2U/(2U - \Delta)$. Since
$\Lambda$ is chosen large enough so that $\Lambda^-$ contains a $2\ell_c \times 2\ell_c$ square, it follows that
$H(\square) = H(0 \times 0) > H(\blacksquare)$. It is obvious from (18.2.1) that $\blacksquare$ is the global min-
imum of $H$, while $\square$ is a local minimum of $H$. Thus, to settle (H1) it remains to
show that $\square$ has the unique maximal stability level on $S \setminus \blacksquare$.
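The claims about $E(\ell)$ in (18.3.1) are easy to tabulate. The sketch below, with the hypothetical function name `square_energy` and sample parameters $U = 1$, $\Delta = 17/10$, evaluates the parabola on integer side lengths and illustrates why a $2\ell_c \times 2\ell_c$ square already has energy below that of the empty box.

```python
from fractions import Fraction
from math import ceil


def square_energy(U, Delta, side):
    """E(l) = 2*U*l - (2*U - Delta)*l**2 for an l x l square, cf. (18.3.1)."""
    U, Delta = Fraction(U), Fraction(Delta)
    return 2 * U * side - (2 * U - Delta) * side * side


if __name__ == "__main__":
    U, Delta = Fraction(1), Fraction(17, 10)
    lc = ceil(U / (2 * U - Delta))
    for side in range(0, 2 * lc + 1):
        print(side, square_energy(U, Delta, side))
```

For the sample parameters $U/(2U - \Delta) = 10/3$, so $\ell_c = 4$; the integer maximiser of $E$ is $\ell = 3$, and $E(2\ell_c) = E(8) < 0$ because $2\ell_c > 2U/(2U - \Delta)$.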
We can repeat the argument for Glauber dynamics in Sect. 17.3.1 by thinking
of up-spins as particles and down-spins as vacancies. The additional obstacle under
Kawasaki dynamics is that, when we are growing the configuration by considering
the union of η with the droplets in the reference path, particles cannot be created
where needed but have to arrive from ∂ − Λ. We have to make sure that at any stage
the configuration is such that a particle coming from ∂ − Λ can be moved to where
it is needed. This requires a technical construction with “pistons enclosing η”, for
which we refer to the literature (see the reference in Sect. 18.7).
Proof It is obvious from Definitions 18.1(b–c) and Theorem 18.3 that (H2) is sat-
isfied. Indeed, each configuration in $\mathcal{C}^\star(\square, \blacksquare) = \mathcal{D}^{\mathrm{fp}}$ has exactly one configuration
in $\mathcal{P}^\star(\square, \blacksquare) = \mathcal{D}$ from which it can be reached via an allowed move, namely, the
configuration that is obtained from it by removing the free particle in $\partial^-\Lambda$.
In this section we prove Theorems 18.3 and 18.5. In Sect. 18.4.1 we consider the set
$\mathcal{Q}$ consisting of those configurations in $\mathcal{D}$ where the single cluster is an $(\ell_c - 1) \times \ell_c$
quasi-square with a protuberance attached to one of its sides. We show that $\mathcal{D}$, our
target protocritical set, coincides with $\mathcal{Q}^U$, the set of all configurations that can be
obtained from $\mathcal{Q}$ via a $U$-path. In Sect. 18.4.2 we use the identity $\mathcal{D} = \mathcal{Q}^U$ to show
that $\Phi(\square, \blacksquare) = \Gamma^\star$ with $\Gamma^\star$ given by (18.1.11) and $\mathcal{S}(\square, \blacksquare) \supseteq \mathcal{D}^{\mathrm{fp}}$. In Sect. 18.4.3
we combine the results obtained in Sect. 18.4.2 to show that $\mathcal{P}^\star(\square, \blacksquare) = \mathcal{D}$ and
$\mathcal{C}^\star(\square, \blacksquare) = \mathcal{D}^{\mathrm{fp}}$, thereby completing the proof of Theorem 18.3. In Sect. 18.4.4 we
take a closer look at what happens when the free particle in $\mathcal{D}^{\mathrm{fp}}$ attaches itself to
the single cluster, where we distinguish between “good sites” and “bad sites” on the
border of the single cluster. The latter distinction will be needed in Sect. 18.5 for
the proof of Theorem 18.4. In Sect. 18.4.5, finally, we compute the cardinality of $\mathcal{D}$
modulo shifts, which will also be needed in Sect. 18.5 for the proof of Theorem 18.4.
The following definition formalises the notion of canonical protocritical droplet and
protocritical droplet mentioned in Item 2 of Sect. 18.1.4.
Definition 18.6
(a) Let $\mathcal{Q} \subseteq \mathcal{D}$ be the set of configurations consisting of an $(\ell_c - 1) \times \ell_c$ quasi-
square anywhere in $\Lambda^-$ with a protuberance attached to one of its sides (see
Fig. 18.4). These configurations are called canonical protocritical droplets.
(b) Let $\mathcal{Q}^U$ be the set of configurations that can be reached from some configuration
in $\mathcal{Q}$ via a $U$-path, i.e.,
\[
\mathcal{Q}^U = \big\{\eta \in \mathcal{V}_{n_c} \colon \exists\, \eta' \in \mathcal{Q} \colon H(\eta) = H(\eta'),\ \Phi_{\mathcal{V}_{n_c}}(\eta, \eta') \le H(\eta) + U \big\}, \tag{18.4.1}
\]
where $n_c = \ell_c(\ell_c - 1) + 1$ is the volume of the clusters in $\mathcal{Q}$ and $\Phi_{\mathcal{V}_{n_c}}$ is the
communication height within $\mathcal{V}_{n_c}$. These configurations are called protocritical
droplets.
Note that $\mathcal{Q} = \bar{\mathcal{Q}} \cup \tilde{\mathcal{Q}}$, where
– $\bar{\mathcal{Q}}$ are those configurations where the single particle is attached to one of the
longest sides of the $(\ell_c - 1) \times \ell_c$ quasi-square.
– $\tilde{\mathcal{Q}}$ are those configurations where the single particle is attached to one of the
shortest sides of the $(\ell_c - 1) \times \ell_c$ quasi-square.
Thus, $\bar{\mathcal{Q}}$ consists of precisely those configurations in $\bar{\mathcal{D}}$ where in (18.1.9) one $\bar{k}_i$
equals 1 and the others are maximal. Similarly, $\tilde{\mathcal{Q}}$ consists of precisely those con-
figurations in $\tilde{\mathcal{D}}$ where in (18.1.10) one $\tilde{k}_i$ equals 1 and the others are maximal.
We will see in Sect. 18.4 that the configurations in $\bar{\mathcal{D}}$, $\tilde{\mathcal{D}}$ arise from those in $\bar{\mathcal{Q}}$, $\tilde{\mathcal{Q}}$
via a motion of particles along the border of the droplet (see Figs. 18.5–18.6). This
property is special for Kawasaki dynamics.
Our main result in this section is the following relation, which will be needed in
Sect. 18.4.2.
Proposition 18.7 $\mathcal{D} = \mathcal{Q}^U$.
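\[
\begin{aligned}
&\text{(i)} \quad \mathcal{D} \subseteq \mathcal{Q}^U, \\
&\text{(ii)} \quad \mathcal{D} \supseteq \mathcal{Q}^U.
\end{aligned} \tag{18.4.2}
\]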
• Proof of (i): Recall the definition of $U$-path in (18.2.5) and of the protocritical set
$\mathcal{D} = \bar{\mathcal{D}} \cup \tilde{\mathcal{D}}$ in Definition 18.1(b). To prove (i) we must show that for all $\eta \in \mathcal{D}$,
• Proof of (i1): Any $\eta \in \mathcal{D}$ has a single contour $\partial(\eta)$ inside $\Lambda^-$ of length $|\partial(\eta)| =
4\ell_c$ and volume $|\eta \cap \Lambda^-| = \ell_c(\ell_c - 1) + 1 = n_c$, while $|\eta \cap \partial^-\Lambda| = 0$ (see Fig. 18.2).
Thus, by (18.2.1), $H$ is constant on $\mathcal{D}$. Since $\mathcal{Q} \subseteq \mathcal{D}$, this completes the proof
of (i1).
• Proof of (i2): Note that, because $\bar{\mathcal{Q}}$ and $\tilde{\mathcal{Q}}$ are connected via a $U$-path (disconnect
the 1-protuberance and re-attach it to one of the neighbouring sides of the $(\ell_c - 1) \times
\ell_c$ quasi-square), we have
What (18.4.5) says is that neither $\bar{\mathcal{D}}$ nor $\tilde{\mathcal{D}}$ can be exited via a clustering $U$-path.
From this in turn we deduce that for any $\eta \in \mathcal{D}$ and any $\eta'$ connected to $\eta$ by a
$U$-path we must have that $\eta' \in \mathcal{D}$, which is what we want to prove. The argument
for the latter goes as follows. Detaching a particle costs $2U$ unless the particle is a
1-protuberance, in which case the cost is $U$. The only configurations in $\mathcal{D}$ having a
1-protuberance are those in $\mathcal{Q}$. If we detach the 1-protuberance from a configuration
in $\mathcal{Q}$, at cost $U$, then we obtain an $(\ell_c - 1) \times \ell_c$ quasi-square plus a free particle.
Since now only moves at zero cost are allowed, only the free particle can move.
Since in a U -path the particle number is conserved, the only way to regain U and
complete the U -path is to re-attach the free particle to the quasi-square, in which
case we return to Q.
Remark Note that the motion of particles along the border of a droplet may shift
the droplet. Indeed, from any configuration in Q the 1-protuberance may detach
itself and re-attach itself to a different side of the quasi-square or rectangle. Thus,
the U -path may shift the protocritical droplet to anywhere in Λ− .
1. Let us first consider clustering $U$-paths along which we do not move a particle
from $\mathrm{CR}^-(\eta)$. Along such paths we only encounter configurations in $\mathcal{D}$ or configurations
obtained from $\mathcal{D}$ by breaking one of the bars in $\partial^-\mathrm{CR}(\eta)$ into two pieces, at
cost $U$ (because there is no particle outside $\mathrm{CR}(\eta)$ that can help to lower the cost).
From the latter only moves at zero cost are possible, so no particle can be detached,
and the only way to regain $U$ and complete the $U$-path is to restore a bar.
Fig. 18.8 Dumb-bell shape of $\mathcal{D} = \bar{\mathcal{D}} \cup \tilde{\mathcal{D}}$ for $U$-paths: the canonical protocritical droplets $\bar{\mathcal{Q}}$ and
$\tilde{\mathcal{Q}}$ are the gateways between the sets of protocritical droplets $\bar{\mathcal{D}}$ and $\tilde{\mathcal{D}}$
2. Let us next consider clustering $U$-paths along which we move a particle from a
corner of $\mathrm{CR}^-(\eta)$. This move costs $2U$, which exceeds $U$. The overshoot $U$ must be
regained by letting the particle slide next to a bar that is attached to a side of $\mathrm{CR}^-(\eta)$
(see Fig. 18.7). Since there are never two bars attached to the same side, we can at
most gain $U$. This is why it is not possible to move a particle from $\mathrm{CR}^-(\eta)$ other
than from a corner.
From here only moves at zero cost are allowed. There are no 1-protuberances
present anymore, because only the configurations in $\mathcal{Q}$ have a 1-protuberance. Thus,
no particle outside $\mathrm{CR}^-(\eta)$ can move, except the one that just detached itself from
$\mathrm{CR}^-(\eta)$. This particle can move back, in which case we return to the same config-
uration $\eta$. In fact, all possible moves at zero cost consist in moving the “hole” just
created in $\mathrm{CR}^-(\eta)$ along the side of $\mathrm{CR}^-(\eta)$, until it reaches the height of the top of
the bar attached to this side of $\mathrm{CR}^-(\eta)$, after which it cannot advance anymore at
zero cost (see Fig. 18.7). All these moves do not change the energy, except the one
that returns the particle to its original position and regains $U$.
This proves our claim in (18.4.5), completes the proof of (ii) in (18.4.2), and
hence of Proposition 18.7.
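We saw above that $U$-paths cannot exit $\mathcal{D} = \bar{\mathcal{D}} \cup \tilde{\mathcal{D}}$, but can make a crossover
between $\bar{\mathcal{D}}$ and $\tilde{\mathcal{D}}$. This crossover can, however, only occur between $\bar{\mathcal{Q}}$ and $\tilde{\mathcal{Q}}$.
A schematic picture of $\mathcal{D}$ therefore is as in Fig. 18.8.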
Proposition 18.8
(i) Φ(□, ■) = Γ.
(ii) S(□, ■) ⊇ D fp.
1. We first show that the configurations in Q are connected to □ by a path that stays
below Γ.
Lemma 18.9 For any η1pr ∈ Q there exists a γ : η1pr → □ such that maxξ∈γ H(ξ)
< Γ.
Γ − (2Δ − U ). (18.4.6)
Second, we detach a particle from a corner of the quasi-square, which costs 2U , and
move it out of the box, which pays Δ. Thus, the energy increases by 2U − Δ when
detaching and removing a particle from a corner of the quasi-square. We repeat this
operation another c − 3 times, each time picking particles from the bar on the same
shortest side. To guarantee that we never reach energy Γ , we have the condition
that
(2U − Δ)k + 2U < 2Δ − U for 0 ≤ k ≤ c − 3, (18.4.7)
or
3 ≤ c < U/(2U − Δ) + 1. (18.4.8)
The second inequality holds by the definition of c in (18.1.5) and the non-
degeneracy assumption in (18.1.6), the first inequality by our exclusion of c = 2
(recall the statement made at the end of Sect. 18.3). Third, detaching the last particle
of the bar costs only U, which gives the condition
(2U − Δ)(c − 2) + U < 2Δ − U, (18.4.9)
which is weaker than (18.4.7) because 2U − Δ < U. Removal of the last particle
pays Δ, so that we arrive at energy
Γ − (2Δ − U ) + (2U − Δ)(c − 2) + U − Δ = Γ − 2Δ + (2U − Δ)(c − 1),
(18.4.10)
which is strictly smaller than (18.4.6) by the second inequality in (18.4.8). Thus,
removal of a row of length c − 1 from the (c − 1) × c quasi-square in η1pr ∈ Q
lowers the energy (see Fig. 18.9). We now have a square of side length c − 1. It
is obvious that we can remove further rows without encountering new conditions,
until we reach □.
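The energy bookkeeping in Step 1 can be checked mechanically. The following Python sketch is illustrative only. It measures energies relative to Γ, assumes that H(η1pr) = Γ − Δ (consistent with H(η2pr) = Γ − 2U used in Lemma 18.10), takes c to be the smallest integer above U/(2U − Δ), and uses the hypothetical values U = 1, Δ = 17/10. The function name is ours.

```python
from fractions import Fraction
from math import ceil


def row_removal_profile(U, Delta):
    """Energies relative to Gamma along the path that removes the
    1-protuberance and then a row of length c - 1 from the quasi-square.

    Assumptions of this sketch: U < Delta < 2U, U/(2U - Delta) is not an
    integer, and the starting configuration has energy Gamma - Delta.
    """
    c = ceil(U / (2 * U - Delta))      # critical droplet size
    energies = [-Delta]                # H(eta_1pr) - Gamma

    def move(cost):
        energies.append(energies[-1] + cost)

    move(U)                            # detach the 1-protuberance
    move(-Delta)                       # remove it from the box
    for _ in range(c - 2):             # corner particles of the row
        move(2 * U)                    # detach from a corner
        move(-Delta)                   # remove from the box
    move(U)                            # detach the last particle of the row
    move(-Delta)                       # remove it from the box
    return c, energies


U, Delta = Fraction(1), Fraction(17, 10)
c, profile = row_removal_profile(U, Delta)
after_protuberance = profile[2]        # compare with (18.4.6)
final = profile[-1]                    # compare with (18.4.10)
```

With exact rational arithmetic the two marked values can be compared directly with (18.4.6) and (18.4.10), and the maximum of the profile with the level Γ (which is 0 in these units).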
2. For η1pr ∈ Q, let η2pr be the configuration obtained from η1pr by attaching an
extra particle next to the 1-protuberance, thereby forming a 2-protuberance. We next
show that η2pr is connected to ■ by a path that stays below Γ.
Lemma 18.10 For any η1pr ∈ Q there exists a γ : η2pr → ■ such that maxξ∈γ H(ξ)
< Γ.
Proof Without loss of generality we may assume that η1pr ∈ D̄ because of Propo-
sition 18.7. Fix η1pr ∈ Q. Note that H (η2pr ) = Γ − 2U . First, we create a par-
ticle, which costs Δ and raises the energy to Γ − (2U − Δ)(< Γ ), move it to
the droplet, which costs nothing, and attach it next to the 2-protuberance, which
pays 2U , thereby forming a bar of length 3. This operation pays 2U − Δ. We can
repeat this operation another c − 3 times until the row is filled. By that time we
have a square of side length c and energy
Γ − 2U − (2U − Δ)(c − 2). (18.4.11)
Second, we create another particle and attach it anywhere to the square to form a
new 1-protuberance. This operation costs Δ − U. We must make sure that we can
still create a particle without reaching energy Γ, which gives us the condition
Γ − 2U − (2U − Δ)(c − 2) + (Δ − U) + Δ < Γ, (18.4.12)
or
c > U/(2U − Δ), (18.4.13)
which holds by the definition of c and the non-degeneracy assumption in (18.1.6).
Third, we create another particle and attach it next to the new 1-protuberance. This
brings us to energy
Γ − U − (2U − Δ)c , (18.4.14)
which is below the energy of η2pr by (18.4.13). It is obvious that we can add further
rows without encountering new conditions, until we reach ■.
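The row-adding path of Step 2 admits the same kind of check. The sketch below is again illustrative: energies are measured relative to Γ, the starting value is H(η2pr) = Γ − 2U as stated in the proof, c is taken to be the smallest integer above U/(2U − Δ), and U = 1, Δ = 17/10 are hypothetical values. The function name is ours.

```python
from fractions import Fraction
from math import ceil


def row_addition_profile(U, Delta):
    """Energies relative to Gamma along the path that completes the row
    started by the 2-protuberance and then starts a new row.

    Assumptions of this sketch: U < Delta < 2U and U/(2U - Delta) is not
    an integer.
    """
    c = ceil(U / (2 * U - Delta))      # critical droplet size
    energies = [-2 * U]                # H(eta_2pr) - Gamma

    def move(cost):
        energies.append(energies[-1] + cost)

    for _ in range(c - 2):             # fill the row up to length c
        move(Delta)                    # create a particle
        move(-2 * U)                   # attach it next to the bar
    row_filled = energies[-1]
    move(Delta)                        # create a particle
    move(-U)                           # attach it as a new 1-protuberance
    move(Delta)                        # create one more particle
    move(-2 * U)                       # attach it next to the 1-protuberance
    return c, row_filled, energies


U, Delta = Fraction(1), Fraction(17, 10)
c, row_filled, profile = row_addition_profile(U, Delta)
final = profile[-1]                    # compare with (18.4.14)
```

The maximum of the profile stays below 0 (that is, below Γ) exactly when the condition (18.4.13) holds, and the final value can be compared with (18.4.14).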
Proof Any path γ : □ → ■ must cross the set Vc(c−1). As shown in Alonso and
Cerf [4], Theorem 2.6, in Vc(c−1) the unique (modulo translations and rotations)
configuration of minimal energy is the (c − 1) × c quasi-square, which we denote
by η and which has energy
Γ − 2Δ + U. (18.4.15)
All other configurations in Vc(c−1) have energy at least Γ − 2Δ + 2U. To increase
the particle number starting from any such configuration, we must create a particle
Proof Follow the path until it hits the set Vc (c −1) . According to Lemma 18.11, the
configuration in this set must be an (c − 1) × c quasi-square. Since we need not
consider any paths that return to the set Vc (c −1) afterwards, a first step beyond the
quasi-square must be the creation of a new particle. This brings us to energy
Γ − Δ + U. (18.4.16)
Before any new particle is created, we must lower the energy by at least U. Obviously,
the only possible way to do this is to move the particle to the quasi-square
and attach it to one of its sides, which reduces the energy to
Γ − Δ. (18.4.17)
6. It now suffices to show that to reach ■ from Q we must reach energy Γ. This
goes as follows. Starting from Q, it is impossible to reduce the energy without
lowering the particle number. Indeed, this follows from Alonso and Cerf [4], Theo-
rem 2.6, which asserts that the minimal energy in Vc (c −1)+1 is realised (although
not uniquely) by the configurations in Q. Since any further move to increase the
particle number involves the creation of a new particle, the energy must reach Γ .
Lemmas 18.11–18.12 imply that Φ(□, ■) = Γ, and together with Steps 1–3 complete
the proof of Proposition 18.8(i).
Lemma 18.13 The set of configurations in Vc(c−1)+1 that can be reached from □
by a path that stays below Γ and for which it is possible to add a particle without
exceeding Γ coincides with the set Q U defined in Definition 18.6(b).
Proof From step 2 above it is clear that the definition of Q U precisely assures that
the assertion holds true. Indeed, by Lemma 18.12, any γ ∈ (□ → ■)opt crosses
Vc (c −1)+1 in Q. Once it is in Q, before the arrival of the next particle, which
costs Δ, it can reach all configurations that have the same energy, the same particle
number, and can be reached at cost ≤ U < Δ.
The restriction in observation (1) that the free particle must be at lattice distance ≥ 3
from the protocritical droplet is needed for the following reason: If the protocritical
droplet is a configuration in D \ Q and the free particle sits at lattice distance 2
from a corner of a bar, diagonally opposite the particle that sits in the corner of the
bar, then at zero cost this particle may detach itself from the bar and slide in between
the quasi-square and the free particle. For observation (3) note the following: if we
start from the configuration described above and slide the remaining particles in the
bar one by one, all at zero cost except the last one, which pays U , then we reach a
configuration where the free particle is attached to the protocritical droplet with the
bar shifted.
The following definition introduces the notion of good sites (G) and bad sites (B)
on the border of protocritical droplets (see Fig. 18.10).
Definition 18.15
(a) For η ∈ D fp , write η = (η̂, x) with η̂ ∈ D the protocritical droplet and x ∈ ∂ − Λ
the position of the free particle.
(b) Let the configurations that can be reached from η = (η̂, x) ∈ D fp according to
observation (3) be denoted by
(c) Let
\[
C^G = \bigcup_{\hat\eta\in D} C^G(\hat\eta), \qquad C^B = \bigcup_{\hat\eta\in D} C^B(\hat\eta). \tag{18.4.19}
\]
The next proposition, which is the main result of this section, shows that when
the dynamics reaches C G it has gone “over the hill”, while when it reaches C B it
has not.
Proposition 18.16
(i) If η ∈ C G, then there exists a γ : η → ■ such that maxξ∈γ H(ξ) < Γ∗.
(ii) If η ∈ C B, then there are no γ : η → □ or γ : η → ■ such that maxξ∈γ H(ξ) <
Γ∗.
cle attaches itself on top of that bar, forming a 1-protuberance. Then the energy is
Γ − U . Slide this bar to the end of the side it is attached to (at cost and gain U ) and
slide the two bars on the neighboring sides to the end as well (at cost and gain U ).
Then the energy is again Γ − U . Next move the shorter bar on top of the longer
bar via a motion as in Fig. 18.6. When the last particle of the bar is moved, it can
be detached (at cost U ) and re-attached (at gain 2U ). Then the energy is Γ − 2U .
Now create a free particle (at cost Δ), move it to the droplet (at cost 0), and attach it
in a corner of the droplet (at gain 2U ). Continue “downhill” in this way, adding on
successive rows as in the reference path that was used above, until ■ is reached.
Proposition 18.16(ii) shows that the configurations in C B are wells, i.e., their
energy is < Γ, but to move to either □ or ■ the energy must return to Γ. The
configurations of the form “quasi-square plus dimer” described in observation (2)
are elements of S(□, ■) but not of C(□, ■). Indeed, the only possible move at
zero cost is the one where the free particle jumps back to the quasi-square.
In this section we show that the cardinality of D modulo shifts of the protocritical
droplet equals the formula given in (18.1.13).
Proof First we consider D̄ . We have to count the number of different shapes of the
clusters in D̄ (recall Fig. 18.2). We do this by counting in how many ways c − 1
particles can be removed from the four bars of an c × c square starting from
the four corners (recall Definition 18.1(b)). We split the counting according to the
number k = 1, 2, 3, 4 of corners from which particles are removed. The number of
ways in which we can choose k corners is the binomial coefficient (4 choose k). After we
have removed the particles at these corners, we need to remove c − 1 − k more particles
from either side of
= (c − 2 + k choose 2k − 1). (18.4.20)
The counting for D̃ is the same, except that we start from an (c − 1) × (c + 1)
rectangle and count in how many ways c − 2 particles can be removed from the
four bars. The answer is the same as in (18.4.20) with c − 1 replaced by c − 2,
except for an extra factor 2 that counts the two orientations of the rectangle.
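The binomial coefficient in (18.4.20) has the form of a stars-and-bars count: c − 1 − k indistinguishable removals distributed over the 2k bar ends adjacent to the k chosen corners. The Python sketch below checks that identity by brute force and then forms the sum over k weighted by (4 choose k). It is a sketch under that reading of the proof; it ignores any geometric constraints (for instance removals from both ends of the same bar meeting), and the function names are ours.

```python
from itertools import product
from math import comb


def ways_brute_force(c, k):
    """Number of ways to distribute c - 1 - k removals over 2k bar ends,
    where each end may receive 0, 1, 2, ... removals."""
    remaining = c - 1 - k
    if remaining < 0:
        return 0
    return sum(
        1
        for parts in product(range(remaining + 1), repeat=2 * k)
        if sum(parts) == remaining
    )


def ways_formula(c, k):
    """The binomial coefficient appearing in (18.4.20)."""
    return comb(c - 2 + k, 2 * k - 1)


def weighted_sum_over_corners(c):
    """Sum over k = 1, ..., 4 of (4 choose k) * (c - 2 + k choose 2k - 1)."""
    return sum(comb(4, k) * ways_formula(c, k) for k in range(1, 5))
```

For example, `ways_brute_force(5, 2)` and `ways_formula(5, 2)` can be compared directly; the brute-force enumeration is only practical for small c.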
In this section we prove Theorem 18.4. Our starting point is the variational formula
for Θ = 1/K given in Lemma 16.17. In Sect. 18.5.1 we define certain objects that
capture the geometry of critical droplets and wells. In Sect. 18.5.2 we derive upper
and lower bounds for Θ in terms of certain capacities of simple random walk on
Λ+ restricted not to enter the support of a protocritical droplet. In Sect. 18.5.3 we
compute the asymptotics of these capacities in the limit as Λ → Z2 , and show that
the upper and lower bounds merge because of the recurrence of simple random walk
on Z2 .
In the proof we need one more definition, which relies on the geometric structure
outlined in Sect. 18.4.4. Recall the definition of S , S and Si , i = 1, . . . , I , from
(16.2.25) and (16.3.3)–(16.3.4). Abbreviate supp+ (η̂) = supp(η̂) ∪ ∂ + supp(η̂).
Definition 18.17
(a) Let D^fp_Λ = {η = (η̂, x) : η̂ ∈ D, x ∈ Λ \ supp+(η̂)}.
be the set of good sites, respectively, bad sites for η̂. Note that (η̂, x) may be in
the same Si for different x ∈ B(η̂).
Note that B(η̂) can be partitioned into disjoint sets B1 (η̂), . . . , B|I (η̂)| (η̂) accord-
ing to which Si the configuration (η̂, x) belongs to.
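(d) Write
\[
CS(\hat\eta) = \mathrm{supp}(\hat\eta) \cup G(\hat\eta), \qquad
CS^+(\hat\eta) = CS(\hat\eta) \cup \partial^+ CS(\hat\eta), \qquad
CS^{++}(\hat\eta) = CS^+(\hat\eta) \cup \partial^+ CS^+(\hat\eta). \tag{18.5.3}
\]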
By Proposition 18.16, the link between the sets in Definitions 18.15(b) and
18.17(b) is
\[
C^G(\hat\eta) = \bigcup_{x\in G(\hat\eta)} (\hat\eta, x), \qquad
C^B(\hat\eta) = \bigcup_{x\in B(\hat\eta)} (\hat\eta, x). \tag{18.5.4}
\]
For the argument below it is important that G(η̂) ≠ ∅ for all η̂ ∈ D. On the other
hand, the sets B(η̂), η̂ ∈ D , will turn out to play no role for the asymptotics of K as
Λ → Z2 .
where
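\[
\mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, F\bigr)
= \min_{\substack{g\colon \Lambda^+\to[0,1] \\ g|_{\partial^+\Lambda}=1,\ g|_F=0}}
\ \tfrac12 \sum_{(x,x')\in(\Lambda^+)} \bigl[g(x)-g(x')\bigr]^2,
\qquad F\subset\Lambda, \tag{18.5.6}
\]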
To see how this bound arises from (16.3.11), pick h in (16.3.11) and g in (18.5.7)
such that
(η̂, x) ∈ S , x ∈ G(η̂),
(η̂, x) ∈ Si , x ∈ Bi (η̂), i ∈ I (η̂), (18.5.9)
(η̂, x) ∈ D ⊂ S , x ∈ ∂ + Λ.
A further lower bound is obtained by removing from the right-hand side of (18.5.9)
the boundary condition on the sets Bi (η̂), i ∈ I (η̂). This gives
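\[
\Theta \ge \sum_{\hat\eta\in D}\
\min_{\substack{g\colon \Lambda^+\to[0,1] \\ g|_{G(\hat\eta)}=0,\ g|_{\partial^+\Lambda}=1}}
\ \tfrac12 \sum_{(x,x')\in[\Lambda^+\setminus\mathrm{supp}(\hat\eta)]} \bigl[g(x)-g(x')\bigr]^2
= \sum_{\hat\eta\in D} \mathrm{cap}^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}\bigl(\partial^+\Lambda, G(\hat\eta)\bigr), \tag{18.5.10}
\]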
where the upper index Λ+ \supp(η̂) refers to the fact that no moves in and out of
supp(η̂) are allowed (i.e., this set acts as an obstacle for the free particle). To com-
plete the proof we show that, in the limit as Λ → Z2 ,
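\[
\mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, \mathrm{supp}(\hat\eta)\cup G(\hat\eta)\bigr)
\ge \mathrm{cap}^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}\bigl(\partial^+\Lambda, G(\hat\eta)\bigr)
\ge \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, \mathrm{supp}(\hat\eta)\cup G(\hat\eta)\bigr) - O\bigl([1/\ln|\Lambda|]^2\bigr). \tag{18.5.11}
\]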
We will show in Sect. 18.5.2 that cap^{Λ+}(∂+Λ, CS(η̂)) decays like 1/ln|Λ|. Since
CS(η̂) = supp(η̂) ∪ G(η̂) by Definition 18.17(d), the lower bound Θ ≥ Θ1 follows.
Remark 18.19 Before we prove (18.5.11), note that the capacity in the right-hand
side of (18.5.11) includes more transitions than the capacity in the left-hand side,
namely, all transitions from supp(η̂) to B(η̂). Let
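\[
g^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}_{\partial^+\Lambda,\,G(\hat\eta)}(x)
= \text{equilibrium potential for } \mathrm{cap}^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}\bigl(\partial^+\Lambda, G(\hat\eta)\bigr) \text{ at } x. \tag{18.5.12}
\]
Below we will show that
\[
g^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}_{\partial^+\Lambda,\,G(\hat\eta)}(x) \le C/\ln|\Lambda|
\quad \forall\, x\in B(\hat\eta) \text{ for some } C<\infty. \tag{18.5.13}
\]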
Since in the Dirichlet form in (18.5.6) the equilibrium potential appears squared, the
error made by adding to the capacity in the left-hand side of (18.5.11) the transitions
from supp(η̂) to B(η̂) is of order [1/ ln |Λ|]2 times |B(η̂)|, which explains how
(18.5.11) arises.
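Formally, let P^η̂_x be the law of the simple random walk that starts at x ∈ B(η̂) and
is forbidden to visit the sites in supp(η̂). Let y ∈ G(η̂). As in the proof of Lemma 8.4,
we have
\[
g^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}_{\partial^+\Lambda,\,G(\hat\eta)}(x)
= P^{\hat\eta}_x\bigl(\tau_{\partial^+\Lambda} < \tau_{G(\hat\eta)}\bigr)
= \frac{P^{\hat\eta}_x\bigl(\tau_{\partial^+\Lambda} < \tau_{G(\hat\eta)\cup x}\bigr)}
       {P^{\hat\eta}_x\bigl(\tau_{G(\hat\eta)\cup\partial^+\Lambda} < \tau_x\bigr)}
\le \frac{P^{\hat\eta}_x\bigl(\tau_{\partial^+\Lambda} < \tau_x\bigr)}
         {P^{\hat\eta}_x\bigl(\tau_y < \tau_x\bigr)}
\le \frac{\mathrm{cap}^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}(x, \partial^+\Lambda)}
         {\mathrm{cap}^{\Lambda^+\setminus\mathrm{supp}(\hat\eta)}(x, y)}. \tag{18.5.14}
\]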
The denominator of (18.5.14) can be bounded from below by some C′ > 0 that is
independent of x, y and supp(η̂). To see why, pick a path from x to y that avoids
supp(η̂) but stays inside a layer around supp(η̂), and argue as in the proof of the
lower bound of Lemma 6.11. On the other hand, the numerator is bounded from
above by cap^{Λ+}(x, ∂+Λ), i.e., by the capacity of the same pair of sets for a random
walk that is not forbidden to visit supp(η̂), since the Dirichlet problem associated
to the latter has the same boundary conditions but includes more transitions.
In the proof of Lemma 18.20 below, we will see that cap^{Λ+}(x, ∂+Λ) decays like
C″/ln|Λ| for some C″ < ∞ (see (18.5.21)–(18.5.22) below). We therefore conclude
that indeed (18.5.13) holds with C = C″/C′.
consists of those configurations in D^fp_Λ for which the free particle is at distance ≥ 2
of the protocritical droplet and the set of good sites. The choice in (18.5.15) gives
\[
\Theta \le \sum_{\hat\eta\in D} \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, CS^{++}(\hat\eta)\bigr). \tag{18.5.17}
\]
– Since D ⊂ S , the first line of (18.5.15) implies that h(η) = 1 for η = (η̂, x) with
η̂ ∈ D and x ∈ ∂ + Λ, which is consistent with the boundary condition g|∂ + Λ = 1
in (18.5.6).
– The third line of (18.5.15) implies that h(η) = 0 for η = (η̂, x) with η̂ ∈ D and
x ∈ CS++ (η̂), which is consistent with the boundary condition g|F = 0 in (18.5.6)
for F = CS++ (η̂).
Note further that:
– The only transitions in S between S and C ++ are those where a free particle
enters ∂ − Λ.
– The only transitions in S between C ++ and S \[S ∪ C ++ ] are those where the
free particle moves from distance 2 to distance 1 of the protocritical droplet. All
other transitions either involve a detachment of a particle from the protocritical
droplet (which raises the number of droplets) or an increase in the number of
particles in Λ. Such transitions lead to energy > Γ , which is not possible in S .
– There are no transitions between S and S \[S ∪ C ++ ].
The latter arguments show that (18.5.6) includes all the transitions in (16.3.11).
With Lemma 18.18 we have obtained upper and lower bounds on Θ in terms of
capacities for simple random walk on Z2 of the pairs of sets ∂ + Λ and CS(η̂), re-
spectively, CS++ (η̂), with η̂ summed over D . We use these bounds to prove Theo-
rem 18.4. The transition rates of the simple random walk are 1 between neighbour-
ing pairs of sites.
Fig. 18.12 Simple random walk of a free particle moving from ∂ + BM to CS(η̂), respectively,
CS++ (η̂)
Proof Lemma 18.20 below shows that, in the limit as Λ → Z2 , each of the ca-
pacities in the upper and lower bound on Θ has the same asymptotic behaviour,
namely, [1 + o(1)] 4π/ ln |Λ|, irrespective of the location and shape of the protocrit-
ical droplet (provided it is not too close to ∂ + Λ, which is a negligible fraction of the
possible locations). In what follows we take Λ = BM = [−M, +M]2 ∩ Z2 for some
M ∈ N large enough (M > 2c ).
Proof We only prove the first line of (18.5.19). The proof of the second line is
similar.
• Lower bound: For η̂ ∈ D , let y ∈ CS(η̂) ⊂ BM denote the site closest to the center
of CS(η̂). The capacity decreases when we enlarge the set over which the Dirichlet
form is minimised. Therefore we have
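\[
\mathrm{cap}^{B_M^+}\bigl(\partial^+B_M, CS(\hat\eta)\bigr)
\ge \mathrm{cap}^{B_M^+}\bigl(\partial^+B_M, y\bigr)
= \mathrm{cap}^{(B_M-y)^+}\bigl(\partial^+(B_M-y), 0\bigr)
\ge \mathrm{cap}^{B_{2M}^+}\bigl(\partial^+B_{2M}, 0\bigr), \tag{18.5.20}
\]
where the last inequality uses that (BM − y)+ ⊂ B+2M because y ∈ BM. By the analogue
of (16.2.5) for simple random walk, we have (compare (18.5.6) with (16.2.1)–(16.2.2))
\[
\mathrm{cap}^{B_{2M}^+}\bigl(\partial^+B_{2M}, 0\bigr)
= \mathrm{cap}^{B_{2M}^+}\bigl(0, \partial^+B_{2M}\bigr)
= 4\, P_0\bigl(\tau_{\partial^+B_{2M}} < \tau_0\bigr), \tag{18.5.21}
\]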
where P0 is the law on path space of the discrete-time simple random walk on Z2
starting at 0. It is a standard fact (see e.g. Révész [205], Lemma 22.1) that
\[
P_0\bigl(\tau_{\partial^+B_{2M}} < \tau_0\bigr) = [1+o(1)]\,\frac{\pi}{2\ln(2M)}, \qquad M\to\infty. \tag{18.5.22}
\]
Combining (18.5.20)–(18.5.22), we get the desired lower bound.
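The escape probability in (18.5.21) can be computed numerically on small boxes by solving the associated discrete Dirichlet problem. The Python sketch below does this with Gauss-Seidel relaxation for the discrete-time simple random walk on Z2. It is illustrative only: the function names, tolerance and iteration cap are our choices, the argument M is the half-width of the box (so it plays the role of 2M in (18.5.21)), and at the box sizes that are feasible here the values are still far from the logarithmic asymptotics in (18.5.22).

```python
def escape_probability(M, tol=1e-12, max_sweeps=100_000):
    """P_0(the walk leaves B_M = [-M, M]^2 before returning to 0) for the
    discrete-time simple random walk on Z^2, via Gauss-Seidel relaxation."""
    n = 2 * M + 3                      # B_M plus its outer boundary ring
    o = M + 1                          # array index of the origin
    # h[i][j] approximates P(exit B_M before hitting 0 | start at (i - o, j - o)).
    h = [[1.0] * n for _ in range(n)]  # the outer ring keeps the value 1
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            h[i][j] = 0.0
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if i == o and j == o:
                    continue           # h(0) = 0 is a boundary condition
                new = 0.25 * (h[i - 1][j] + h[i + 1][j] + h[i][j - 1] + h[i][j + 1])
                change = max(change, abs(new - h[i][j]))
                h[i][j] = new
        if change < tol:
            break
    # The first step from 0 goes to one of its four neighbours.
    return 0.25 * (h[o - 1][o] + h[o + 1][o] + h[o][o - 1] + h[o][o + 1])


def capacity_of_origin(M):
    """Capacity of the pair (0, outer boundary of B_M): 4 times the escape
    probability, following the normalisation in (18.5.21)."""
    return 4.0 * escape_probability(M)
```

For M = 1 the Dirichlet problem can be solved by hand using the symmetry of the box, which gives a convenient sanity check for the relaxation.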
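We are now ready to complete the proof of Theorem 18.4. Combining Lemmas 18.18–18.20,
we find that Θ ∈ [Θ1, Θ2] with
\[
\begin{aligned}
\Theta_1 &= O(\varepsilon M) + \sum_{\substack{\hat\eta\in D \\ d(\partial^+B_M,\,\mathrm{supp}(\hat\eta))\ge\varepsilon M}} \mathrm{cap}^{B_M^+}\bigl(\partial^+B_M, CS(\hat\eta)\bigr) \\
&= O(\varepsilon M) + \bigl|\{\hat\eta\in D\colon d(\partial^+B_M, \mathrm{supp}(\hat\eta))\ge\varepsilon M\}\bigr|\,[1+o(1)]\,\frac{2\pi}{\ln M} \\
&= O(\varepsilon M) + N(c)\,[2(1-\varepsilon)M]^2\,[1+o(1)]\,\frac{2\pi}{\ln M}, \qquad M\to\infty,
\end{aligned} \tag{18.5.25}
\]
for any ε > 0 and the same expression for Θ2, where we use that
\[
\mathrm{cap}^{B_M^+}\bigl(\partial^+B_M, CS(\hat\eta)\bigr)
\le \mathrm{cap}^{B_M^+}\bigl(B_M^+\setminus CS(\hat\eta), CS(\hat\eta)\bigr)
= \tfrac12\,\bigl|CS^+(\hat\eta)\bigr| \le \tfrac12\,(c+2)^2, \tag{18.5.26}
\]
and we recall that N(c) is the cardinality of D modulo shifts of the protocritical
droplets. Let M → ∞ followed by ε ↓ 0, to conclude that Θ ∼ 2πN(c)(2M)²/ln M.
Since |Λ| = (2M + 1)² and K = 1/Θ, this proves (18.1.12) in Theorem 18.4.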
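For the reader's convenience, the last step can be spelled out in terms of |Λ|. The following lines are a worked restatement of the asymptotics above and use nothing beyond |Λ| = (2M + 1)² and K = 1/Θ; the resulting form of K is presumably what (18.1.12) states.

```latex
\begin{align*}
\Theta &\sim \frac{2\pi N(c)\,(2M)^2}{\ln M},
\qquad |\Lambda| = (2M+1)^2 \sim (2M)^2,
\qquad \ln|\Lambda| = 2\ln(2M+1) \sim 2\ln M, \\
\Theta &\sim \frac{4\pi N(c)\,|\Lambda|}{\ln|\Lambda|},
\qquad K = \frac{1}{\Theta} \sim \frac{\ln|\Lambda|}{4\pi N(c)\,|\Lambda|}.
\end{align*}
```

This is consistent with the per-droplet capacity [1 + o(1)] 4π/ln|Λ| quoted in the proof above, multiplied by the N(c)|Λ| possible shapes and locations of the protocritical droplet.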
Let Λ ⊂ Z3 be a large cubic box, centred at the origin. The metastable parameter
range replacing (18.1.4) is
Δ ∈ (U, 3U ), (18.6.1)
and, similarly as in (18.1.6), we assume that
U/(3U − Δ) ∉ N, 2U/(3U − Δ) ∉ N. (18.6.2)
The analogue of Definitions 18.1(b–c) and 18.6 reads:
Definition 18.21
(a) Let Q denote the set of configurations having one cluster anywhere in Λ− con-
sisting of an (mc − 1) × (mc − δc ) × mc quasi-cube with, attached to one of
its faces, an (c − 1) × c quasi-square with, attached to one of its sides, a sin-
gle particle. Here, δc ∈ {0, 1} depends on the arithmetic properties of U and Δ,
while
c = ⌈U/(3U − Δ)⌉, mc = ⌈2U/(3U − Δ)⌉, (18.6.3)
are the two-dimensional critical droplet size on a face, respectively, the three-
dimensional critical droplet size, replacing (18.1.5). Note that mc ∈ {2c − 1,
2c }.
(b) For Δ ∈ (2U, 3U ), let Q 2U denote the set of configurations that can be reached
from some configuration in Q via a 2U -path, i.e.,
Q 2U = {η ∈ Vnc : ∃ η′ ∈ Q : H(η) = H(η′), ΦVnc(η, η′) ≤ H(η) + 2U}, (18.6.4)
where nc = mc (mc − δc )(mc − 1) + c (c − 1) + 1 is the volume of the clusters
in Q. For Δ ∈ (U, 2U ), use U instead of 2U in (18.6.4).
(c) Let [Q 2U ]fp denote the set of configurations obtained from Q 2U by adding a
free particle anywhere in ∂ − Λ (see Fig. 18.13).
(d) Let
Γ = Γ(□, ■) = H([Q 2U]fp) = H(Q 2U) + Δ = H(Q) + Δ
= U [mc(mc − δc) + mc(mc − 1) + (mc − δc)(mc − 1) + 2c + 3]
− (3U − Δ) [mc(mc − δc)(mc − 1) + c(c − 1) + 2]. (18.6.5)
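The formula (18.6.5) can be cross-checked against a direct bond count. The Python sketch below builds one configuration of the type described in Definition 18.21(a) on Z3, assumes the Hamiltonian H = −U × (number of nearest-neighbour bonds) + Δ × (number of particles), and compares H(Q) + Δ with the right-hand side of (18.6.5). The Hamiltonian, the specific placement of the quasi-square and of the single particle, the values U = 1, Δ = 27/10, and all function names are assumptions made for illustration; δc is treated as a free parameter in {0, 1}.

```python
from fractions import Fraction
from math import ceil


def critical_sizes(U, Delta):
    """(c, m_c) as in (18.6.3); assumes U < Delta < 3U and (18.6.2)."""
    return ceil(U / (3 * U - Delta)), ceil(2 * U / (3 * U - Delta))


def gamma_formula(U, Delta, c, m, d):
    """Right-hand side of (18.6.5), with m = m_c and d = delta_c."""
    surface = m * (m - d) + m * (m - 1) + (m - d) * (m - 1) + 2 * c + 3
    volume = m * (m - d) * (m - 1) + c * (c - 1) + 2
    return U * surface - (3 * U - Delta) * volume


def gamma_by_bond_counting(U, Delta, c, m, d):
    """H(Q) + Delta for an explicit cluster, with the assumed Hamiltonian
    H = -U * (#nearest-neighbour bonds) + Delta * (#particles)."""
    a, b, h = m - 1, m - d, m
    # Quasi-cube of size (m - 1) x (m - d) x m.
    sites = {(x, y, z) for x in range(a) for y in range(b) for z in range(h)}
    # Quasi-square of size (c - 1) x c attached to the top face.
    sites |= {(x, y, h) for x in range(c - 1) for y in range(c)}
    # Single particle attached to a side of the quasi-square, on the same face.
    sites.add((c - 1, 0, h))
    # Count each bond once by looking only in the positive directions.
    bonds = sum(
        (x + dx, y + dy, z + dz) in sites
        for (x, y, z) in sites
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    )
    # The final + Delta accounts for the free particle.
    return -U * bonds + Delta * len(sites) + Delta


U, Delta = Fraction(1), Fraction(27, 10)
c, m = critical_sizes(U, Delta)
```

The same helper also makes it easy to confirm the remark mc ∈ {2c − 1, 2c} for given parameter values.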
Fig. 18.14 An example of motion of particles along the border of the droplet
Also Theorem 18.2 carries over: the proof of (H1–H2) is the same as in
Sects. 18.3.1–18.3.2, except that for (H1) a little extra care is needed to handle the
geometry in three dimensions.
As in two dimensions, no easily computable formula for K is available. Similarly
as in Sect. 18.5, however, the prefactor K can be estimated in terms of capacities
associated with three-dimensional simple random walk. Since the latter is transient,
the large volume scaling of these capacities is no longer independent of the shape
and the location of the protocritical droplet. Therefore Theorem 18.4 carries over in
a somewhat weaker form.
\[
\lim_{\Lambda\to\mathbb{Z}^3} |\Lambda|\, K(\Lambda, c, m_c, \delta_c) = \frac{1}{M(c, m_c, \delta_c)\,N(c, m_c, \delta_c)}, \tag{18.6.6}
\]
with κ(m) the capacity of the m × m × m cube for simple random walk on Z3 .
Proof The extension of the proof in Sect. 18.5 from two to three dimensions is in
principle straightforward and involves no new ideas. The geometry of the commu-
nication level set is less explicit, but no detail is needed for the proof.
By the transience of simple random walk in three dimensions, we know that
\[
\lim_{\Lambda\to\mathbb{Z}^3} \mathrm{cap}^{\Lambda^+}\bigl(\partial^+\Lambda, F\bigr) = \mathrm{cap}^{\mathbb{Z}^3}(F) \tag{18.6.8}
\]
exists for any finite non-empty F ⊂ Z3. The limit, which is positive and finite, is the
capacity of F. Let κ(m) = cap^{Z3}(m × m × m) be the capacity of the m × m × m
with κ the capacity of the unit cube for standard Brownian motion on R3. Since
2πR is the capacity of the ball with radius R for standard Brownian motion on R3,
we have that κ ∈ (2π, 2π√3).
The lower bound in (18.6.7) comes from the fact that all protocritical droplets
contain a cube of side length mc − √mc. The upper bound comes from the fact that
all protocritical droplets are contained in a cube of side length mc + 1, and that as
long as the free particle is at distance ≥ 2 from the protocritical droplet no border
motion is possible. Both these facts are easy to establish.
With the help of (18.6.7) and (18.6.9), we have good control over M(c , mc , δc )
for mc large, i.e., for Δ close to 3U . We have no formula for N (c , mc , δc ) analo-
gous to (18.1.13). It would be nice to know its asymptotics for mc large.
18.7 Bibliographical notes
1. The results in this chapter are taken from Bovier, den Hollander and Nardi [31],
with geometric input from den Hollander, Nardi, Olivieri and Scoppola [84]. Cruder
versions of the main results in Chap. 16 for Kawasaki dynamics, derived with the
help of the pathwise approach to metastability, were obtained by den Hollander,
Olivieri and Scoppola [88–90] in two dimensions and by den Hollander, Nardi,
Olivieri and Scoppola [84] in three dimensions. The latter paper contains the “piston
construction” mentioned in Sect. 18.3.1.
2. The formula for the number of protocritical droplets modulo shifts claimed in [31]
is wrong. The correct formula is (18.1.13), as shown in Sect. 18.4.5. The authors are
grateful to Markus Mayer for pointing out the error.
3. For details of the argument needed in Sect. 18.3.1 to extend the proof of (H1) from
Glauber dynamics to Kawasaki dynamics, see [84]. For a comparison of Glauber
dynamics and Kawasaki dynamics, see den Hollander [81].
5. For more information on the tube of typical trajectories, or nucleation pattern, see
Olivieri and Vares [198], Sect. 7.13.
6. It would appear that the analysis in Sect. 18.6 could be extended to arbitrary di-
mension, like for Glauber dynamics (recall Sect. 17.6, Item 3). However, this exten-
sion has never been written out in detail. The set of critical droplets is quite complex
due to the motion of particles along the border of droplets. In two dimensions we
have a full understanding of this motion, in three dimensions a partial understanding
(see [84]), while in higher dimensions we know very little. It is clear that the critical
droplets for Glauber dynamics all are protocritical droplets for Kawasaki dynam-
ics. But the border motion can create many additional shapes, all via V -paths with
V < Δ.
8. Kawasaki dynamics with two types of particles, with binding energy −U < 0
between particles of different types (and no binding energy between particles of the
same type) and with different activation energies Δ1 > 0 and Δ2 > 0, was studied in
den Hollander, Nardi and Troiani [85–87]. There are several regimes, with critical
droplets being either square-shaped or rhombus-shaped. The proof of (H1)–(H2)
is quite involved, and is hampered by the fact that droplets with fixed volume and
minimal surface change shape when they come close to ∂Λ.
Part VII
Applications: Lattice Systems in Large
Volumes at Low Temperatures
Part VII looks at nucleation in lattice systems that grow to infinity as the tempera-
ture tends to zero. Spatial entropy comes into play: in large volumes, even at low
temperatures, entropy is competing with energy because the metastable state and
the states that evolve from it under the dynamics have a non-trivial spatial structure.
Chapter 19 looks at Glauber dynamics, Chap. 20 at Kawasaki dynamics.
The transition from the metastable state (with only subcritical droplets) to the
stable state (with one or more supercritical droplets) is triggered by the appearance
of a single critical droplet somewhere in the system. The main property driving the
results in Chaps. 19–20 is that the average time until this appearance is inversely
proportional to the volume. This property is referred to as homogeneous nucleation,
because it says that the critical droplet for the transition appears essentially inde-
pendently in small volumes that partition the large volume.
No information will be obtained about what happens to the system after the criti-
cal droplet has appeared. This belongs to the post-nucleation regime, which is much
harder than the pre-nucleation regime considered here, and which will be briefly ad-
dressed in Chap. 23. Our results are further limited in the sense that we need to draw
the initial configuration according to a specific distribution on the set of subcritical
configurations, namely, the last-exit biased distribution introduced in Chap. 8. To
show that the same results hold for more general initial distributions we would need
to establish strong recurrence properties of the dynamics within the metastable state.
Another limitation is that there will be no proof that the nucleation time divided by
its average converges to the exponential distribution.
Contrary to Chap. 16, where for small volumes we were able to deal with a gen-
eral dynamics under a general set of hypotheses, the situation for large volumes is
significantly more difficult. This is why we can so far offer results only for Glauber
and Kawasaki. It remains a challenge to develop a more abstract set-up.
Chapter 19
Glauber Dynamics
The goal of this chapter is to extend the analysis of Chap. 17 to volumes that grow
moderately fast as the temperature decreases. Let Λβ ⊂ Z2 be a square box with
periodic boundary conditions such that limβ→∞ |Λβ | = ∞. We run the Glauber dy-
namics on Λβ starting from a random initial configuration where all the droplets
(= clusters of plus-spins) are small. For large β, and in the parameter range cor-
responding to the metastable regime (recall Sect. 17.1.2), the transition from the
metastable state (with only subcritical droplets) to the stable state (with one or
more supercritical droplets) is triggered by the appearance of a single critical droplet
somewhere in Λβ . We will show that the average time until this happens scales like
e^{Γβ}/(N(c)|Λ|), where Γ and N(c) are the quantities as for small volumes (recall
Sect. 17.1.3). This scaling is valid as long as the average nucleation time tends to
infinity.
We retain the setting of Sect. 17.1.1, except that we replace the torus Λ ⊂ Z2 by
a β-dependent torus Λβ ⊂ Z2 . Accordingly, we write Sβ , Hβ instead of S, H to
indicate that the configuration space and the Hamiltonian also depend on β.
Definition 19.1
(a) Let CB (σ ), σ ∈ Sβ , be the configuration that is obtained from σ by a “bootstrap
percolation map”, i.e., by circumscribing all the droplets in σ with rectangles,
Definition 19.2
(a) S = {σ ∈ Sβ : CB (σ ) is subcritical}.
(b) P = {σ ∈ S : cβ(σ, σ′) > 0 for some σ′ ∈ S c}.
(c) C = {σ′ ∈ S c : cβ(σ, σ′) > 0 for some σ ∈ S}.
Remark 19.3 The sets P, C will play a similar rôle as, but are not directly compa-
rable with, the sets P , C in Chap. 17.
R1,2 (x), R2,2 (x), R2,3 (x), R3,3 (x), . . . , Rc −1,c −1 (x), Rc −1,c (x). (19.1.1)
Our starting configurations will be drawn from one of the sets SL ⊂ S defined by
SL = {σ ∈ S : each rectangle in CB(σ) fits inside QL(x) for some x ∈ Λβ}, (19.1.2)
for any L ∈ N that satisfies L∗ ≤ L ≤ 2c − 3 with
L∗ = min{1 ≤ L ≤ 2c − 3 : limβ→∞ μβ(SL)/μβ(S) = 1}. (19.1.3)
with ΓL+1 the energy needed to create a droplet QL+1 (0) at the origin. Thus, if
|Λβ| = e^{θβ}, then L∗ = L∗(θ) = (2c − 3) ∧ min{L ∈ N : ΓL+1 > θ}, which increases
stepwise from 1 to 2c − 3 as θ increases from 0 to Γ, with Γ the communication
height in Chap. 17.
Throughout this chapter we assume that we are in the metastable regime where
h ∈ (0, 2J ) (recall Sect. 17.1.2). We further assume that
The second condition ensures that the existence of a critical droplet anywhere in the
box is still a rare event and occurs only after a long time. If this condition were
violated, then the metastable transition would no longer be dominated by the time
of nucleation, but by the growth of supercritical droplets that exist somewhere far
away.
For σ ∈ Sβ , let Pσ denote the law of the Glauber dynamics starting from σ . For
ν a probability distribution on Sβ , write
Pν(·) = Σσ∈Sβ ν(σ) Pσ(·). (19.1.8)
Abbreviate
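Theorem 19.4 (Mean crossover time) Subject to (19.1.3) and (19.1.7), the following
hold:
(a)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma}\, E_{\nu_{S_L,\,S^c}}\bigl(\tau_{S^c}\bigr) = \frac{1}{N_1}. \tag{19.1.10}
\]
(b)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma}\, E_{\nu_{S_L,\,S^c\setminus C}}\bigl(\tau_{S^c\setminus C}\bigr) = \frac{1}{N_2}. \tag{19.1.11}
\]
(c)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, e^{-\beta\Gamma}\, E_{\nu_{S_L,\,D_M}}\bigl(\tau_{D_M}\bigr) = \frac{1}{N_2}, \qquad \forall\, c \le M \le 2c-1. \tag{19.1.12}
\]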
19.1.3 Discussion
1. Theorem 19.4(a) says that the average time to create a critical droplet is
[1 + o(1)] e^{βΓ}/(N1|Λβ|). Theorems 19.4(b–c) say that the average time to go beyond this
critical droplet and to grow a droplet that is twice as large is [1 + o(1)] e^{βΓ}/(N2|Λβ|).
The factor N1 counts the number of shapes of the critical droplet, while |Λβ | counts
the number of locations. The average times to create a critical, respectively, a su-
percritical droplet differ by a factor N2 /N1 < 1. This is because, as we saw in
Sect. 17.1.4, item 3, once the dynamics is “on top of the hill” C it has a posi-
tive probability to “fall back” to S . On average the dynamics makes N1 /N2 > 1
attempts to reach the top C before it finally “falls over” to S c \C . After that, it
rapidly grows a large droplet.
2. If the second condition in (19.1.7) fails, then there is a positive probability to see a
protocritical droplet in Λβ under the starting measure νSL ,S c , and nucleation sets in
immediately. In that situation different questions about the system become relevant,
which are no longer nucleation-driven but are growth-driven (see Chap. 23). Theo-
rem 19.4(a) continues to be true, but it no longer describes metastable behaviour.
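The ratio N1/N2 in item 1 can be read heuristically as the mean of a geometric number of attempts: if each arrival on top of the hill independently leads to a crossover with probability N2/N1, then the expected number of arrivals is N1/N2. The Python sketch below checks this against the defining series. It is illustrative only: the independence of attempts is a simplifying assumption, N1 = 4c is taken from the text, and the values c = 3 and N2 = 8 are hypothetical.

```python
def expected_attempts(p_success, n_terms=500):
    """Mean number of independent attempts until the first success,
    computed from the truncated series sum_n n * p * (1 - p)**(n - 1)."""
    q = 1.0 - p_success
    return sum(n * p_success * q ** (n - 1) for n in range(1, n_terms + 1))


c = 3          # hypothetical critical droplet size
N1 = 4 * c     # N1 = 4c, as stated in the text
N2 = 8         # hypothetical value with N2 < N1, for illustration only
mean_attempts = expected_attempts(N2 / N1)
```

The truncation level is more than enough here because the terms of the series decay geometrically.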
Outline Theorem 19.4 is proved in Sects. 19.2–19.4. Along the way we need
two technical facts whose proofs are deferred to Sects. 19.5–19.6. These deal with
sparseness of subcritical droplets and typicality of starting configurations, respec-
tively.
19.2 Average time to create a critical droplet
\[
\text{r.h.s. (19.2.1)} = \frac{1}{N_1|\Lambda_\beta|}\, e^{\beta\Gamma}\,[1+o(1)], \qquad \beta\to\infty. \tag{19.2.2}
\]
Lemma 19.5 Σσ∈S μβ(σ) hSL,S c(σ) = μβ(S)[1 + o(1)] as β → ∞.
N1 = 4c .
Proof The proof proceeds via upper and lower bounds, which are written out be-
low.
Fig. 19.1 Rc −1,c (x) (shaded box) and [Rc +1,c +2 (x − (1, 1))]c (complement of dotted box)
Upper bound
Proof We use the Dirichlet principle and a test function that is equal to 1 on S to
get the upper bound
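\[
\begin{aligned}
\mathrm{CAP}\bigl(S_L, S^c\bigr) \le \mathrm{CAP}\bigl(S, S^c\bigr)
&= \sum_{\substack{\sigma\in S,\ \sigma'\in S^c \\ c_\beta(\sigma,\sigma')>0}} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma') \\
&= \sum_{\substack{\sigma\in S,\ \sigma'\in S^c \\ c_\beta(\sigma,\sigma')>0}} \bigl[\mu_\beta(\sigma) \wedge \mu_\beta(\sigma')\bigr] \le \mu_\beta(C),
\end{aligned} \tag{19.2.4}
\]
where the second equality uses reversibility in combination with the fact that
cβ(σ, σ′) ∨ cβ(σ′, σ) = 1. Thus, it suffices to show that
\[
\mu_\beta(C) \le N_1\, |\Lambda_\beta|\, e^{-\beta\Gamma}\, \mu_\beta(S)\,[1+o(1)] \quad \text{as } \beta\to\infty. \tag{19.2.5}
\]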
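The second equality in (19.2.4) can be illustrated on a toy example. The Python sketch below assumes Metropolis rates cβ(σ, σ′) = exp(−β[H(σ′) − H(σ)]+), which is one choice compatible with the property cβ(σ, σ′) ∨ cβ(σ′, σ) = 1 used above, and checks that μβ(σ)cβ(σ, σ′) = μβ(σ) ∧ μβ(σ′) for every pair of states. The energies, the value of β and the function names are hypothetical.

```python
import math
from itertools import permutations


def metropolis_rate(h_from, h_to, beta):
    """Assumed Metropolis rate exp(-beta * [H(to) - H(from)]_+)."""
    return math.exp(-beta * max(h_to - h_from, 0.0))


def gibbs_measure(energies, beta):
    """Normalised Gibbs weights exp(-beta * H) / Z."""
    weights = [math.exp(-beta * h) for h in energies]
    z = sum(weights)
    return [w / z for w in weights]


def check_min_identity(energies, beta):
    """True if mu(a) * c(a, b) equals min(mu(a), mu(b)) for all pairs a != b."""
    mu = gibbs_measure(energies, beta)
    return all(
        math.isclose(
            mu[a] * metropolis_rate(energies[a], energies[b], beta),
            min(mu[a], mu[b]),
            rel_tol=1e-12,
        )
        for a, b in permutations(range(len(energies)), 2)
    )


toy_energies = [0.0, 1.3, 0.4, 2.1]   # hypothetical energies of four states
identity_holds = check_min_identity(toy_energies, beta=2.0)
```

The identity is the detailed-balance relation in disguise: the product μβ(σ)cβ(σ, σ′) is symmetric in σ and σ′ and equals the smaller of the two Gibbs weights.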
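For every $\sigma\in\mathcal{P}$ there are one or more rectangles $R_{\ell_c-1,\ell_c}(x)$, $x = x(\sigma)\in\Lambda_\beta$, that are filled by (+1)-spins in $C_B(\sigma)$. If $\sigma'\in\mathcal{C}$ is such that $\sigma' = \sigma^y$ for some $y\in\Lambda_\beta$, then $\sigma'$ has a (+1)-spin at $y$ situated on the boundary of one of these rectangles (recall Definition 19.2). Let
\[
\begin{aligned}
\hat{\mathcal{S}}(x) &= \big\{\sigma\in\mathcal{S}\colon \mathrm{supp}[\sigma] \subseteq R_{\ell_c-1,\ell_c}(x)\big\}, \\
\check{\mathcal{S}}(x) &= \big\{\sigma\in\mathcal{S}\colon \mathrm{supp}[\sigma] \subseteq [R_{\ell_c+1,\ell_c+2}(x-(1,1))]^c\big\}.
\end{aligned}
\tag{19.2.6}
\]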
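Hence
\[
\begin{aligned}
\mu_\beta(\mathcal{C}) &= \frac{1}{Z_\beta} \sum_{\sigma\in\mathcal{P}} \sum_{\substack{x\in\Lambda_\beta \\ \sigma^x\in\mathcal{C}}} e^{-\beta H_\beta(\sigma^x)} \\
&\le N_1\, e^{-\beta[2J-h-H_\beta(\boxminus)]}\, \frac{1}{Z_\beta} \sum_{x\in\Lambda_\beta} \sum_{\check\sigma\in\check{\mathcal{S}}(x)} e^{-\beta H_\beta(\check\sigma)} \sum_{\substack{\hat\sigma\in\hat{\mathcal{S}}(x) \\ \hat\sigma\vee\check\sigma\in\mathcal{P}}} e^{-\beta H_\beta(\hat\sigma)} \\
&\le [1+o(1)]\, N_1 |\Lambda_\beta|\, e^{-\beta\Gamma}\, \frac{1}{Z_\beta} \sum_{\check\sigma\in\check{\mathcal{S}}(0)} e^{-\beta H_\beta(\check\sigma)} \\
&= [1+o(1)]\, N_1 |\Lambda_\beta|\, e^{-\beta\Gamma}\, \mu_\beta\big(\check{\mathcal{S}}(0)\big),
\end{aligned}
\tag{19.2.9}
\]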
where the first inequality uses (19.2.7)–(19.2.8), with $N_1 = 2\times 2\ell_c = 4\ell_c$ counting the number of critical droplets that can arise from a protocritical droplet via a spin flip, and the second inequality uses that
\[
\hat\sigma\in\hat{\mathcal{S}}(0),\ \hat\sigma\vee\check\sigma\in\mathcal{P} \implies H_\beta(\hat\sigma) \ge H_\beta\big(R_{\ell_c-1,\ell_c}(0)\big) = \Gamma - (2J-h) + H_\beta(\boxminus) \tag{19.2.10}
\]
with equality in the right-hand side if and only if $\mathrm{supp}[\hat\sigma] = R_{\ell_c-1,\ell_c}(0)$. Combining (19.2.4) and (19.2.9) with the inclusion $\check{\mathcal{S}}(0)\subset\mathcal{S}$, we get the upper bound in (19.2.5).
Lower bound
Proof We exploit Theorem 7.43 by making a judicious choice for the flow f . In fact,
for Glauber dynamics this choice will be simple: with each configuration σ ∈ SL
we associate a configuration in C ⊂ S c containing a unique critical droplet and a
flow that, from each such configuration, follows a unique deterministic path along
which this droplet is broken down in the canonical order (see Fig. 19.2) until the set
SL is reached, i.e., a square or quasi-square droplet with label L is left over (recall
(19.1.1)–(19.1.2)).
The proof comes in 5 steps.
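Fig. 19.4 Going from $\mathcal{S}_L$ to $\mathcal{C}_L$ by adding a critical droplet $P_{(y)}(x)$ somewhere in $\Lambda_\beta$
and define
\[
\mathcal{W} = \big\{\sigma\in\mathcal{S}\colon |\mathrm{supp}[\sigma]| \le |\Lambda_\beta|/w(\beta)\big\}. \tag{19.2.12}
\]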
Let $\mathcal{C}_L\subset\mathcal{C}\subset\mathcal{S}^c$ be the set of configurations obtained by picking any $\sigma\in\mathcal{S}_L\cap\mathcal{W}$ and adding somewhere in $\Lambda_\beta$ a critical droplet at distance $\ge 2$ from $\mathrm{supp}[\sigma]$. Note that the density restriction imposed on $\mathcal{W}$ guarantees that adding such a droplet is possible almost everywhere in $\Lambda_\beta$ for β large enough. Denoting by $P_{(y)}(x)$ the critical droplet obtained by adding a protuberance at $y$ along the longest side of the rectangle $R_{\ell_c-1,\ell_c}(x)$, we may write
\[
\mathcal{C}_L = \big\{\sigma\cup P_{(y)}(x)\colon \sigma\in\mathcal{S}_L\cap\mathcal{W},\ x,y\in\Lambda_\beta,\ (x,y)\perp\sigma\big\}, \tag{19.2.13}
\]
where $(x,y)\perp\sigma$ stands for the restriction that the critical droplet $P_{(y)}(x)$ is not interacting with $\mathrm{supp}[\sigma]$, which implies that $H_\beta(\sigma\cup P_{(y)}(x)) = H_\beta(\sigma) + \Gamma$ (see Figs. 19.3 and 19.4).
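2. For each $\sigma\in\mathcal{C}_L$, we let $\gamma_\sigma = (\gamma_\sigma(0),\gamma_\sigma(1),\dots,\gamma_\sigma(K))$ be the canonical path from $\sigma = \gamma_\sigma(0)$ to $\mathcal{S}_L$ along which the critical droplet is broken down ($\gamma_\sigma(k) = \sigma_k$ in Fig. 19.2), where $K = v(2\ell_c-3) - v(L)$ with
\[
v(L) = |Q_L(0)| \tag{19.2.14}
\]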
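3. We see from (19.2.15) that the flow increases whenever paths merge. In our case this happens only after the first step, when the protuberance at $y$ is removed. Therefore we get the explicit form
\[
f(\sigma',\sigma'') =
\begin{cases}
\nu_0(\sigma), & \text{if } \sigma' = \sigma,\ \sigma'' = \gamma_\sigma(1) \text{ for some } \sigma\in\mathcal{C}_L, \\
C\nu_0(\sigma), & \text{if } \sigma' = \gamma_\sigma(k),\ \sigma'' = \gamma_\sigma(k+1) \text{ for some } k\ge 1,\ \sigma\in\mathcal{C}_L, \\
0, & \text{otherwise},
\end{cases}
\tag{19.2.16}
\]
where $C = 2\ell_c$ is the number of possible positions of the protuberance on the protocritical droplet (see Fig. 19.2). Using Theorem 7.43, we therefore have
\[
\begin{aligned}
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c) &= \mathrm{CAP}(\mathcal{S}^c,\mathcal{S}_L) \ge \mathrm{CAP}(\mathcal{C}_L,\mathcal{S}_L) \\
&\ge \sum_{\sigma\in\mathcal{C}_L} \nu_0(\sigma) \left[\sum_{k=0}^{K-1} \frac{f(\gamma_\sigma(k),\gamma_\sigma(k+1))}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))}\right]^{-1} \\
&= \sum_{\sigma\in\mathcal{C}_L} \left[\frac{1}{\mu_\beta(\sigma)\, c_\beta(\gamma_\sigma(0),\gamma_\sigma(1))} + \sum_{k=1}^{K-1} \frac{C}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))}\right]^{-1}.
\end{aligned}
\tag{19.2.17}
\]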
Thus, all we have to do is to control the sum between square brackets.
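4. Because $c_\beta(\gamma_\sigma(0),\gamma_\sigma(1)) = 1$ (removing the protuberance lowers the energy), the term with $k = 0$ equals $1/\mu_\beta(\sigma)$. To show that the terms with $k\ge 1$ are of higher order, we argue as follows. Abbreviate $\Xi = h(\ell_c-2)$. For every $k\ge 1$ and $\sigma(0)\in\mathcal{C}_L$, we have (see Fig. 19.5)
\[
\begin{aligned}
\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1)) &= \frac{1}{Z_\beta}\, e^{-\beta[H_\beta(\gamma_\sigma(k))\vee H_\beta(\gamma_\sigma(k+1))]} \\
&\ge \mu_\beta(\sigma_0)\, e^{\beta[2J-h-\Xi]} = \mu_\beta(\sigma)\, e^{\delta\beta},
\end{aligned}
\tag{19.2.18}
\]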
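\[
\sum_{k=1}^{K-1} \frac{C}{\mu_\beta(\gamma_\sigma(k))\, c_\beta(\gamma_\sigma(k),\gamma_\sigma(k+1))} \le \frac{1}{\mu_\beta(\sigma)}\, CK e^{-\delta\beta}, \tag{19.2.19}
\]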
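\[
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c) \ge \sum_{\sigma\in\mathcal{C}_L} \frac{\mu_\beta(\sigma)}{1+CKe^{-\beta\delta}} = \frac{\mu_\beta(\mathcal{C}_L)}{1+CKe^{-\beta\delta}} = [1+o(1)]\, \mu_\beta(\mathcal{C}_L). \tag{19.2.20}
\]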
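To get a feel for how quickly the correction factor in (19.2.20) disappears, the following minimal Python sketch evaluates $1/(1+CKe^{-\beta\delta})$. The formula is the one displayed in (19.2.20) with $C = 2\ell_c$; the numerical values of `ell_c`, `K` and `delta` are assumptions chosen only for illustration and are not taken from the text.

```python
import math

def flow_correction(C, K, beta, delta):
    """Factor 1 / (1 + C*K*exp(-beta*delta)) multiplying mu_beta(C_L) in (19.2.20)."""
    return 1.0 / (1.0 + C * K * math.exp(-beta * delta))

# Illustrative (assumed) parameters: C = 2*ell_c with ell_c = 5, path length K = 40, delta = 0.5.
ell_c, K, delta = 5, 40, 0.5
for beta in (5.0, 20.0, 50.0):
    print(beta, flow_correction(2 * ell_c, K, beta, delta))
```

The factor increases to 1 as β grows, which is why only the first step of the canonical path contributes to leading order.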
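\[
\begin{aligned}
\mu_\beta(\mathcal{C}_L) &= \frac{1}{Z_\beta} \sum_{\sigma\in\mathcal{C}_L} e^{-\beta H_\beta(\sigma)} = \frac{1}{Z_\beta} \sum_{\sigma\in\mathcal{S}_L\cap\mathcal{W}} \sum_{\substack{x,y\in\Lambda_\beta \\ (x,y)\perp\sigma}} e^{-\beta H_\beta(\sigma\cup P_{(y)}(x))} \\
&= e^{-\beta\Gamma}\, \frac{1}{Z_\beta} \sum_{\sigma\in\mathcal{S}_L\cap\mathcal{W}} e^{-\beta H_\beta(\sigma)} \sum_{\substack{x,y\in\Lambda_\beta \\ (x,y)\perp\sigma}} 1 \\
&\ge e^{-\beta\Gamma}\, \mu_\beta(\mathcal{S}_L\cap\mathcal{W})\, N_1 |\Lambda_\beta| \big[1 - (\ell_c+1)^2/w(\beta)\big].
\end{aligned}
\tag{19.2.21}
\]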
The last inequality uses that $|\Lambda_\beta|(\ell_c+1)^2/w(\beta)$ is the maximal number of sites in $\Lambda_\beta$ where it is not possible to insert a non-interacting critical droplet (recall (19.2.12) and note that a critical droplet fits inside an $\ell_c\times\ell_c$ square). Finally, according to Lemma 19.9 in Sect. 19.5, we have
\[
\mu_\beta(\mathcal{S}_L\cap\mathcal{W}) = \mu_\beta(\mathcal{S}_L)\, [1+o(1)], \tag{19.2.22}
\]
To prove Theorem 19.4(b) we use the same technique as in Sect. 19.2. Therefore we
only give a sketch of the proof.
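To estimate the average crossover time from $\mathcal{S}_L\subset\mathcal{S}$ to $\mathcal{S}^c\setminus\mathcal{C}$, we again use Corollary 7.11, this time with $A = \mathcal{S}_L$ and $B = \mathcal{S}^c\setminus\mathcal{C}$:
\[
\sum_{\sigma\in\mathcal{S}_L} \nu_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma)\, E_\sigma(\tau_{\mathcal{S}^c\setminus\mathcal{C}}) = \frac{1}{\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C})} \sum_{\sigma\in\mathcal{S}\cup\mathcal{C}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma). \tag{19.3.1}
\]
The left-hand side is the quantity of interest in (19.1.11). In Sects. 19.3.1–19.3.2 we estimate both $\sum_{\sigma\in\mathcal{S}\cup\mathcal{C}} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}}(\sigma)$ and $\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C})$. The estimates will show that
\[
\text{r.h.s. (19.3.1)} = \frac{1}{N_2 |\Lambda_\beta|}\, e^{\beta\Gamma}\, [1+o(1)], \qquad \beta\to\infty. \tag{19.3.2}
\]
$N_2 = \tfrac{4}{3}(2\ell_c-1)$.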
Proof The proof is similar to that of Lemma 19.6, except that it takes care of the transition probabilities away from the critical droplet (see Fig. 19.6, where σ is the configuration that is reached through these transitions). The proof again proceeds via upper and lower bounds, which are written out below.
Fig. 19.6 Canonical order to break down a proto-critical droplet plus a double protuberance. In the first step, the double protuberance has probability $\tfrac12$ to be broken down in either of the two possible ways. The subsequent steps are deterministic as in Fig. 19.2
Upper bound
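Proof Recalling (7.1.35) and Lemma 7.12, and noting that Glauber dynamics does not allow transitions within $\mathcal{C}$, we have, for all $h\colon \mathcal{C}\to[0,1]$,
\[
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}) \le \mathrm{CAP}(\mathcal{S},\mathcal{S}^c\setminus\mathcal{C}) \le \sum_{\sigma\in\mathcal{C}} \mu_\beta(\sigma) \Big[\hat c_\sigma \big(h(\sigma)-1\big)^2 + \check c_\sigma \big(h(\sigma)-0\big)^2\Big], \tag{19.3.4}
\]
where $\hat c_\sigma = \sum_{\eta\in\mathcal{S}} c_\beta(\sigma,\eta)$ and $\check c_\sigma = \sum_{\eta\in\mathcal{S}^c\setminus\mathcal{C}} c_\beta(\sigma,\eta)$. The quadratic form in the right-hand side of (19.3.4) achieves its minimum for $h(\sigma) = \hat c_\sigma/(\hat c_\sigma+\check c_\sigma)$, so
\[
\mathrm{CAP}(\mathcal{S}_L,\mathcal{S}^c\setminus\mathcal{C}) \le \sum_{\sigma\in\mathcal{C}} C_\sigma\, \mu_\beta(\sigma) \tag{19.3.5}
\]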
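\[
\begin{aligned}
&= e^{-\beta(2J-h)}\, \frac{1}{Z_\beta} \sum_{\sigma\in\mathcal{P}} e^{-\beta H_\beta(\sigma)}\; 2\Big[\tfrac12\, 4 + \tfrac23 (2\ell_c-4)\Big] \\
&= e^{-\beta(2J-h)}\, \mu_\beta(\mathcal{P})\, N_2 = \frac{1}{N_1}\, \mu_\beta(\mathcal{C})\, N_2,
\end{aligned}
\tag{19.3.6}
\]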
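The arithmetic behind $N_2$ can be checked with a short Python sketch. It assumes, as an interpretation of the factor $2[\tfrac12\,4 + \tfrac23(2\ell_c-4)]$ in (19.3.6), that the minimised quadratic form in (19.3.4) equals $\hat c_\sigma\check c_\sigma/(\hat c_\sigma+\check c_\sigma)$, with $(\hat c_\sigma,\check c_\sigma) = (1,1)$ when the protuberance sits at one of the 4 corner positions and $(1,2)$ at one of the $2\ell_c-4$ interior positions, for each of the two orientations. The function names are hypothetical.

```python
from fractions import Fraction

def escape_weight(c_hat, c_check):
    """Minimum over h of c_hat*(h-1)**2 + c_check*h**2, attained at h = c_hat/(c_hat+c_check)."""
    return Fraction(c_hat * c_check, c_hat + c_check)

def n2_from_positions(ell_c):
    """Sum the weights over protuberance positions: 4 corner and 2*ell_c - 4 interior
    positions per orientation, two orientations (rates are assumptions, see lead-in)."""
    corner = escape_weight(1, 1)    # one way back, one way forward  -> 1/2
    interior = escape_weight(1, 2)  # one way back, two ways forward -> 2/3
    return 2 * (4 * corner + (2 * ell_c - 4) * interior)

def n2_closed_form(ell_c):
    """N_2 = (4/3)(2*ell_c - 1) as stated in the text."""
    return Fraction(4, 3) * (2 * ell_c - 1)

for ell_c in range(2, 8):
    # Columns: ell_c, position sum, closed form, N_1 = 4*ell_c
    print(ell_c, n2_from_positions(ell_c), n2_closed_form(ell_c), 4 * ell_c)
```

Under these assumptions the position sum agrees with the closed form for every $\ell_c$, and $N_2 < N_1$, in line with the ratio $N_2/N_1 < 1$ discussed in Sect. 19.1.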
Lower bound
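Proof In analogy with (19.2.13), denoting by $P^2_{(y)}(x)$ the droplet obtained by adding a double protuberance at $y$ along the longest side of the rectangle $R_{\ell_c-1,\ell_c}(x)$, we define the set $\mathcal{D}_L\subset\mathcal{S}^c\setminus\mathcal{C}$ by
\[
\mathcal{D}_L = \big\{\sigma\cup P^2_{(y)}(x)\colon \sigma\in\mathcal{S}_L\cap\mathcal{W},\ x,y\in\Lambda_\beta,\ (x,y)\perp\sigma\big\}. \tag{19.3.7}
\]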
19.4 Average time to grow a droplet twice the critical size 473
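\[
\begin{aligned}
&= [1+o(1)]\, \mu_\beta(\mathcal{C}_L) \left[\frac{2\ell_c-4}{2\ell_c}\, \frac{1}{1+\frac12} + \frac12\, \frac{4}{2\ell_c}\, \frac{1}{\frac12+\frac12}\right] \\
&= [1+o(1)]\, \mu_\beta(\mathcal{C}_L)\, \frac{N_2}{N_1}.
\end{aligned}
\tag{19.3.11}
\]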
Using (19.2.21) and the remarks following it, we get the desired lower bound.
Figure 19.6 depicts the sequence of steps taken to break down a protocritical droplet plus a double protuberance.
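Proof Write
\[
\begin{aligned}
\sum_{\sigma\in\mathcal{D}_M^c} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{D}_M}(\sigma) &= \sum_{\sigma\in\mathcal{S}_L} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{D}_M}(\sigma) + \sum_{\sigma\in\mathcal{D}_M^c\setminus\mathcal{S}_L} \mu_\beta(\sigma)\, h_{\mathcal{S}_L,\mathcal{D}_M}(\sigma) \\
&= \mu_\beta(\mathcal{S}_L) + \sum_{\sigma\in\mathcal{D}_M^c\setminus\mathcal{S}_L} \mu_\beta(\sigma)\, P_\sigma(\tau_{\mathcal{S}_L} < \tau_{\mathcal{D}_M}).
\end{aligned}
\tag{19.4.1}
\]
The last sum is bounded above by $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) + \mu_\beta(\mathcal{D}_M^c\setminus\mathcal{S})$. But $\mu_\beta(\mathcal{S}\setminus\mathcal{S}_L) = o(\mu_\beta(\mathcal{S}))$ as $\beta\to\infty$ by our choice of $L$ in (19.1.3), while $\mu_\beta(\mathcal{D}_M^c\setminus\mathcal{S}) =$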
Recall Definition 19.2(a) and (19.2.11)–(19.2.12). In this section we prove the claim
made in (19.2.22).
Lemma 19.9 $\lim_{\beta\to\infty} \frac{1}{\beta} \ln \frac{\mu_\beta(\mathcal{S}\setminus\mathcal{W})}{\mu_\beta(\mathcal{S})} = -\infty$.
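\[
\mu_\beta(\mathcal{I}) \le \sum_{k=\frac{|\Lambda_\beta|}{C_1 w(\beta)}}^{K_{\max}} F(k) \quad \text{with} \quad F(k) = \frac{1}{Z_\beta} \sum_{\substack{\sigma\in S_\beta\colon \\ C(\sigma)\in\mathcal{D}(k)}} e^{-\beta H_\beta(\sigma)}, \tag{19.5.2}
\]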
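Since the bootstrap percolation map is downhill, the energy of a subcritical rectangle is bounded below by $C_2 = 2J-h$ (recall Fig. 19.5), and the number of ways to place $k$ rectangles in $\Lambda_\beta$ is at most $\binom{|\Lambda_\beta|}{k}$, it follows that for $k$ large enough
\[
\begin{aligned}
F(k) &\le 2^{C_1 k} \binom{|\Lambda_\beta|}{k}\, \mu_\beta(\boxminus)\, e^{-C_2\beta k} \\
&\le 2^{C_1 k} \big(C_1 e\, w(\beta)\big)^k\, \mu_\beta(\boxminus)\, e^{-C_2\beta k} \le \mu_\beta(\boxminus) \exp\big[-\tfrac12 C_2\beta k\big],
\end{aligned}
\tag{19.5.4}
\]
where the second inequality uses that $k! \ge k^k e^{-k}$, $k\in\mathbb{N}$, and the third inequality uses that $w(\beta) = e^{o(\beta)}$. We thus have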
K
max
|Λβ | C2 |Λβ |
F (k) ≤ 2μβ () w(β) exp − 12 β , (19.5.5)
|Λβ |
w(β) C1 w(β)
k= C w(β)
1
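The combinatorial step behind the second inequality in (19.5.4) is the bound $\binom{n}{k} \le n^k/k! \le (en/k)^k$, which follows from $k! \ge k^k e^{-k}$; with $n = |\Lambda_\beta|$ and $k \ge |\Lambda_\beta|/(C_1 w(\beta))$ this gives $(C_1 e\, w(\beta))^k$. The following Python sketch checks both inequalities on a logarithmic scale to avoid overflow. The function names and the sample values of `n` and `k` are illustrative assumptions.

```python
import math

def log_binomial(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.log(math.comb(n, k))

def log_upper_bound(n, k):
    """Natural logarithm of (e*n/k)**k."""
    return k * (1.0 + math.log(n / k))

def stirling_lower_bound_holds(k):
    """Check k! >= k**k * exp(-k) via log k! = lgamma(k + 1)."""
    return math.lgamma(k + 1) >= k * math.log(k) - k - 1e-9

for n, k in ((100, 10), (1000, 50), (10000, 2000)):
    print(n, k, log_binomial(n, k), log_upper_bound(n, k))
```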
Proof Split
S = SL ∪ (S \ SL ) = SL ∪ U>L , (19.6.1)
where $\mathcal{U}_{>L}\subset\mathcal{S}$ are those configurations σ for which $C_B(\sigma)$ has at least one rectangle that is larger than $Q_L(0)$. We have
\[
C_B(\sigma) = \bigcup_{x\in X(\sigma)} R_{\ell_1(x),\ell_2(x)}(x), \tag{19.6.2}
\]
where $X(\sigma)$ is the set of lower-left corners of the rectangles in $C_B(\sigma)$, which in turn can be split as
\[
X(\sigma) = X^{>L}(\sigma)\cup X^{\le L}(\sigma), \tag{19.6.3}
\]
where $X^{>L}(\sigma)$ labels the rectangles that are larger than $Q_L(0)$ and $X^{\le L}(\sigma)$ labels the rest.
Let $\sigma|_A$ denote the restriction of σ to the set $A\subset\mathbb{Z}^2$. Then, for any $x\in X(\sigma)$, we have
\[
H(\sigma) = H\big(\sigma|_{R_{\ell_1(x),\ell_2(x)}(x)}\big) + H\big(\sigma|_{R^c_{\ell_1(x),\ell_2(x)}(x)}\big), \tag{19.6.4}
\]
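\[
\begin{aligned}
\mu_\beta(\mathcal{U}_{>L}) &\le \sum_{x\in\Lambda_\beta} \sum_{\sigma\in\mathcal{S}} 1_{\{x\in X^{>L}(\sigma)\}}\, \mu_\beta(\sigma) \\
&= \frac{1}{Z_\beta} \sum_{x\in\Lambda_\beta} \sum_{\sigma\in\mathcal{S}} 1_{\{x\in X^{>L}(\sigma)\}} \exp\Big[-\beta\Big(H\big(\sigma|_{R_{\ell_1(x),\ell_2(x)}(x)}\big) + H\big(\sigma|_{R^c_{\ell_1(x),\ell_2(x)}(x)}\big)\Big)\Big] \\
&\le \frac{1}{Z_\beta}\, e^{-\beta\Gamma_{L+1}} \sum_{x\in\Lambda_\beta} \sum_{\sigma\in\mathcal{S}} 1_{\{x\in X^{>L}(\sigma)\}}\, e^{-\beta H(\sigma|_{R^c_{\ell_1(x),\ell_2(x)}(x)})},
\end{aligned}
\tag{19.6.5}
\]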
where ΓL+1 is the energy of QL+1 (0). In the last step we use the fact that the
bootstrap map is downhill and that the energy of QL (0) is increasing with L. Since
the energy of a subcritical rectangle is non-negative, we get
with NL+1 counting the number of configurations with support in QL+1 (0).
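On the other hand, by considering only those configurations in $\mathcal{U}_{>L}$ that have a $Q_{L+1}(0)$ droplet, we get
\[
\mu_\beta(\mathcal{U}_{>L}) \ge N_{L+1}\, e^{-\beta\Gamma_{L+1}}\, |\Lambda_\beta|\, \mu_\beta^{[Q_{L+1}(0)]^c}(\mathcal{S}), \tag{19.6.7}
\]
where the last factor is the Gibbs weight of the configurations in $\mathcal{S}$ with support in $[Q_{L+1}(0)]^c$. It is easy to show that $\mu_\beta^{[Q_{L+1}(0)]^c}(\mathcal{S}) = \mu_\beta(\mathcal{S})[1+o(1)]$ as $\beta\to\infty$, and so
\[
\mu_\beta(\mathcal{U}_{>L}) \ge N_{L+1}\, e^{-\beta\Gamma_{L+1}}\, |\Lambda_\beta|\, \mu_\beta(\mathcal{S})\, [1+o(1)], \qquad \beta\to\infty. \tag{19.6.8}
\]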
19.7 Bibliographical notes 477
2. If we draw the starting configuration from some subset of S that has a strong
recurrence property under the dynamics, then the choice of initial distribution on
this subset should not matter. This issue remains to be resolved. Gaudilliere, den
Hollander, Nardi, Olivieri and Scoppola [118–120] provide a partial answer within
the pathwise approach to metastability, i.e., up to exponential order in β.
4. The extension of the main theorem in Sect. 19.1.2 from two to three (and higher)
dimensions is straightforward. See also Sect. 17.6.
5. Theorem 19.4 identifies the first time when a critical droplet appears somewhere
in Λβ . It is a different issue to compute the first time when the plus-phase appears
near the origin. Two regimes have been studied: (1) |Λ| = ∞, h ∈ (0, 2J ), β → ∞;
(2) |Λ| = ∞, J > 0, β > 0 large enough, h ↓ 0. Regime (1) was considered in
two dimensions by Dehghanpour and Schonmann [77, 78], and in three and higher
dimensions by Cerf and Manzo [54]. Regime (2) was considered in two dimensions
by Schonmann [211–214], and Shlosman and Schonmann [215]. The invasion time
is identified up to errors that are subexponential in β, respectively, 1/ h. Proofs are
hard because the invasion time depends on where critical droplets appear for the
first time, how they grow and diffuse, how they meet other droplets along the way
and possibly merge with them, and how they eventually invade the origin. We will
return to this problem in Chap. 23.
6. The analogue of regime (1) in item 5 for the Blume-Capel model (recall
Sect. 17.7, item 8), was studied in Manzo and Olivieri [173].
Chapter 20
Kawasaki Dynamics
The goal of this chapter is to extend the analysis in Chap. 19 to Kawasaki dynamics.
We will see that, again, the average time until the appearance of a critical droplet
somewhere is inversely proportional to the volume, and is driven by the same quan-
tities Γ and K as for small volumes. However, in the proof we encounter several
difficult issues, all coming from the fact that Kawasaki dynamics is conservative.
The first is to understand why Γ , representing the energetic cost to create a critical
droplet in a small box with an open boundary, i.e., in a grand-canonical setting,
reappears even though we choose our box to have a closed boundary, i.e., we work
in a canonical setting. This “mystery” will be resolved by the observation that the
formation of a critical droplet reduces the entropy of the system: the precise compu-
tation of this entropy loss yields Γ via dynamical equivalence of ensembles. The
second problem is to control the probability of a particle moving from the gas to
the protocritical droplet at the last stage of the nucleation, which plays a key role
in understanding how K comes up. This non-locality issue will be dealt with via
upper and lower estimates. As we will see, the latter in fact causes the scaling to be
slightly different than for small volumes.
We retain the setting of Sect. 18.1.1, and again let Λβ , Sβ and Hβ depend on β.
The main difference with the small volume situation described in Chap. 18 is that
we consider the dynamics on a torus rather than on a box with an open boundary,
and do not allow particles to be created or annihilated. Indeed, as Hamiltonian we
choose
\[
H_\beta(\sigma) = -U \sum_{\{x,y\}\in(\Lambda_\beta)^*} \sigma(x)\sigma(y), \qquad \sigma\in S_\beta, \tag{20.1.1}
\]
Fig. 20.1 An example of a configuration in S : no box BLβ (·) of size Lβ contains more than a
protocritical number of particles
and we work in the canonical ensemble, i.e., the second term in (18.1.2) is removed.
The number of particles in $\Lambda_\beta$ is taken to be
\[
n_\beta = \big\lceil \rho_\beta |\Lambda_\beta| \big\rceil, \tag{20.1.2}
\]
Here, the activity parameter Δ that was removed from the Hamiltonian resurfaces via the density in $\Lambda_\beta$, i.e., we view $\Lambda_\beta$ as a gas reservoir surrounding local volumes. Because of particle conservation, the state space of our dynamics is the set
\[
S_\beta^{(n_\beta)} = \big\{\sigma\in S_\beta\colon |\mathrm{supp}[\sigma]| = n_\beta\big\}, \tag{20.1.4}
\]
and such that Lβ is odd. What this says is that Lβ is marginally below the typical
interparticle distance.
Definition 20.1 Let BLβ (x), x ∈ Λβ , be the square box with side length Lβ centred
at x (see Fig. 20.1).
20.1 Introduction and main results 481
Fig. 20.2 Schematic picture of the sets S , C − , C + defined in Definition 20.1 and the set C˜
interpolating between C − and C +
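(a) $\mathcal{S} = \{\sigma\in S_\beta^{(n_\beta)}\colon |\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)| \le \ell_c(\ell_c-1)+1\ \ \forall\, x\in\Lambda_\beta\}$.
(b) $\mathcal{P} = \{\sigma\in\mathcal{S}\colon c_\beta(\sigma,\sigma') > 0 \text{ for some } \sigma'\in\mathcal{S}^c\}$.
(c) $\mathcal{C} = \{\sigma'\in\mathcal{S}^c\colon c_\beta(\sigma,\sigma') > 0 \text{ for some } \sigma\in\mathcal{S}\}$.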
(d) C − = {σ ∈ C : ∃ x ∈ Λβ such that BLβ (x) contains a protocritical droplet
whose lower-left corner is at x plus a free particle}.
(e) C + = the set of configurations obtained from C − by moving the free particle
to a site at distance 2 from the protocritical droplet, i.e., next to its boundary.
(f) C˜ = the set of configurations “interpolating” between C − and C + , i.e., the free
particle is somewhere between the boundary of the protocritical droplet and the
boundary of the box of size Lβ around it (see Fig. 20.2).
Remark 20.2 The sets P, C will play a similar rôle as, but are not directly compa-
rable with, the sets P , C in Chap. 18.
where $\mu_\beta$ is the canonical Gibbs measure associated with $H_\beta$ living on $S_\beta^{(n_\beta)}$. In
words, SL is the subset of those subcritical configurations for which no box of size
Lβ carries more than L particles, with L chosen such that SL is typical within S
under the Gibbs measure μβ as β → ∞.
Note that $\mathcal{S}_{\ell_c(\ell_c-1)+1} = \mathcal{S}$. As for Glauber, the value of $L^*$ depends on how fast $\Lambda_\beta$ grows with β. In Sect. 20.4.4 we will show that, for every $1\le L\le \ell_c(\ell_c-1)$,
Throughout this chapter we assume that we are in the metastable regime where $\Delta\in(U,2U)$ (recall Sect. 18.1.2). We further assume that
The first condition says that the number of particles tends to infinity, and ensures
that the formation of a critical droplet somewhere does not globally deplete the
surrounding gas. The second condition ensures that the set of configurations with
a protocritical droplet and a free particle within distance Lβ is atypical compared
to S .
Write $N = N(\ell_c)$ to denote the number of protocritical droplets modulo shifts for Kawasaki dynamics in small volumes, which was identified in (18.1.13).
Theorem 20.3 (Mean crossover time) Subject to (20.1.8) and (20.1.11), the follow-
ing hold:
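(a)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\, E_{\nu_{\mathcal{S}_L,(\mathcal{S}^c\setminus\tilde{\mathcal{C}})\cup\mathcal{C}^+}}\big(\tau_{(\mathcal{S}^c\setminus\tilde{\mathcal{C}})\cup\mathcal{C}^+}\big) = \frac{1}{N}. \tag{20.1.12}
\]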
20.2 Average time to create a critical droplet 483
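(b)
\[
\lim_{\beta\to\infty} |\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\, E_{\nu_{\mathcal{S}_L,\mathcal{D}_M}}(\tau_{\mathcal{D}_M}) = \frac{1}{N}, \qquad \forall\, \ell_c\le M\le 2\ell_c-1. \tag{20.1.13}
\]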
20.1.3 Discussion
1. Theorem 20.3(a) says that the average time to create a critical droplet is
[1 + o(1)](βΔ/4π)eβΓ /N|Λβ |. The factor βΔ/4π comes from the simple ran-
dom walk that is performed by the free particle “from the gas to the protocritical
droplet” (i.e., as the dynamics goes from C − to C + ), while the factor N counts the
number of shapes of the protocritical droplet. Theorem 20.3(b) says that, once the
critical droplet is created, it rapidly grows to a droplet that has twice the size.
2. In Sect. 20.5 we will show that the average probability under the Gibbs measure
μβ of destroying a supercritical droplet and returning to a configuration in SL is
exponentially small in β. Hence, the crossover from SL to S c \C˜ ∪ C + represents
the threshold for nucleation, and Theorem 20.3(a) represents the nucleation time.
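As a small numerical illustration of item 1, the following Python sketch evaluates the leading-order expression $(\beta\Delta/4\pi)\, e^{\beta\Gamma}/(N|\Lambda_\beta|)$ for the average time to create a critical droplet, dropping the $[1+o(1)]$ correction. The function name and all parameter values are assumptions made for the example only.

```python
import math

def kawasaki_mean_nucleation_time(beta, Delta, Gamma, N, volume):
    """Leading-order mean time (beta*Delta/(4*pi)) * exp(beta*Gamma) / (N * volume),
    i.e. the expression in Discussion item 1 without the [1 + o(1)] factor."""
    return (beta * Delta / (4.0 * math.pi)) * math.exp(beta * Gamma) / (N * volume)

# Illustrative (assumed) parameters.
beta, Delta, Gamma, N = 2.0, 1.5, 3.0, 10
for volume in (1_000, 2_000, 4_000):
    print(volume, kawasaki_mean_nucleation_time(beta, Delta, Gamma, N, volume))
```

Doubling the volume halves the time, which is the inverse proportionality to $|\Lambda_\beta|$ stated in the opening of this chapter.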
Outline Theorem 20.3 is proved in Sects. 20.2–20.3. Along the way we need sev-
eral technical facts whose proofs are deferred to Sects. 20.4–20.5. These are all
related to the difficult issues mentioned in the opening of this chapter.
In this section we prove Theorem 20.3(a). Our starting point is the analogue of
(19.3.1) with S ∪ C and S c \C replaced by S ∪ (C˜\C + ) and (S c \C˜) ∪ C + .
Proof The argument is in the same spirit as that in Sect. 19.2.2. However, a number
of additional hurdles need to be taken that come from the conservative nature of
Kawasaki dynamics. The proof proceeds via upper and lower bounds, written out
below. Both take up quite a bit of space.
Upper bound
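where
\[
I = \min_{\substack{h\colon \tilde{\mathcal{C}}\to[0,1] \\ h|_{\mathcal{C}^-}=1,\ h|_{\mathcal{C}^+}=0}} \tfrac12 \sum_{\sigma,\sigma'\in\tilde{\mathcal{C}}} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma') \big[h(\sigma)-h(\sigma')\big]^2 \tag{20.2.4}
\]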
and $\gamma_1(\beta)$ is an error term that will be estimated in Step 7. This term will turn out to be small because $\mu_\beta(\sigma)c_\beta(\sigma,\sigma')$ is small when either $\sigma\in S_\beta^{(n_\beta)}\setminus\tilde{\mathcal{C}}$ or $\sigma'\in$
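\[
I = |\Lambda_\beta| \min_{\substack{h\colon \tilde{\mathcal{C}}(0)\to[0,1] \\ h|_{\mathcal{C}^-(0)}=1,\ h|_{\mathcal{C}^+(0)}=0}} \tfrac12 \sum_{\sigma,\sigma'\in\tilde{\mathcal{C}}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma') \big[h(\sigma)-h(\sigma')\big]^2. \tag{20.2.5}
\]
\[
\begin{aligned}
\hat{\mathcal{C}}(0) &= \big\{\sigma 1_{B_{L_\beta}(0)}\colon \sigma\in\tilde{\mathcal{C}}(0)\big\}, \\
\check{\mathcal{C}}(0) &= \big\{\sigma 1_{[B_{L_\beta}(0)]^c}\colon \sigma\in\tilde{\mathcal{C}}(0)\big\}.
\end{aligned}
\tag{20.2.6}
\]
\[
\mathcal{C}^{\mathrm{fp}}(0) = \big\{\sigma\in\tilde{\mathcal{C}}(0)\colon H_\beta(\sigma) = H_\beta(\hat\sigma) + H_\beta(\check\sigma)\big\}, \tag{20.2.7}
\]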
i.e., the set of configurations consisting of a protocritical droplet and a free particle
inside BLβ (0) not interacting with the particles outside BLβ (0). Write C fp,− (0) and
C fp,+ (0) to denote the subsets of C fp (0) where the free particle is at distance Lβ ,
respectively, 2 from the protocritical droplet. Split the right-hand side of (20.2.5)
into a contribution coming from σ, σ ∈ C fp (0) and the rest, i.e.,
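\[
\text{r.h.s. (20.2.5)} = |\Lambda_\beta| \big[II + \gamma_2(\beta)\big], \tag{20.2.8}
\]
where
\[
II = \min_{\substack{h\colon \mathcal{C}^{\mathrm{fp}}(0)\to[0,1] \\ h|_{\mathcal{C}^{\mathrm{fp},-}(0)}=1,\ h|_{\mathcal{C}^{\mathrm{fp},+}(0)}=0}} \tfrac12 \sum_{\sigma,\sigma'\in\mathcal{C}^{\mathrm{fp}}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma') \big[h(\sigma)-h(\sigma')\big]^2 \tag{20.2.9}
\]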
and γ2 (β) is an error term that will be estimated in Step 6. This term will turn out
to be small because of loss of entropy when the particle is at the boundary.
486 20 Kawasaki Dynamics
where $\hat{\mathcal{C}}^-(0)$, $\hat{\mathcal{C}}^+(0)$ denote the subsets of $\hat{\mathcal{C}}(0)$ where the free particle is at distance $L_\beta$, respectively, 2 from the protocritical droplet, and the inequality comes from substituting
and afterwards replacing the double sum over $\check\sigma,\check\sigma'\in\check{\mathcal{C}}(0)$ by the single sum over $\check\sigma\in\check{\mathcal{C}}(0)$ because $c_\beta(\hat\sigma\vee\check\sigma,\hat\sigma'\vee\check\sigma') > 0$ only if either $\hat\sigma = \hat\sigma'$ or $\check\sigma = \check\sigma'$ (the dynamics updates one pair of neighbouring sites at a time). Next, estimate
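\[
\begin{aligned}
\text{r.h.s. (20.2.10)} \le{}& \frac{1}{Z_\beta^{(n_\beta)}} \sum_{\check\sigma\in\check{\mathcal{C}}(0)} e^{-\beta H_\beta(\check\sigma)} \min_{\substack{g\colon \hat{\mathcal{C}}(0)\to[0,1] \\ g|_{\hat{\mathcal{C}}^-(0)}=1,\ g|_{\hat{\mathcal{C}}^+(0)}=0}} \tfrac12 \sum_{\substack{\hat\sigma,\hat\sigma'\in\hat{\mathcal{C}}(0) \\ \hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma\in\mathcal{C}^{\mathrm{fp}}(0)}} \\
& e^{-\beta H_\beta(\hat\sigma)}\, c_\beta(\hat\sigma,\hat\sigma') \big[g(\hat\sigma)-g(\hat\sigma')\big]^2,
\end{aligned}
\tag{20.2.12}
\]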
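with $\mathcal{P}(0)$ the set of protocritical droplets with lower-left corner at 0, and
\[
V_\beta(\sigma) = \min_{\substack{f\colon \mathbb{Z}^2\to[0,1] \\ f|_{P_\sigma(0)}=1,\ f|_{[B_{L_\beta}(0)]^c}=0}} \tfrac12 \sum_{\substack{x,x'\in\mathbb{Z}^2 \\ x\sim x'}} \big[f(x)-f(x')\big]^2, \tag{20.2.14}
\]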
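where $P_\sigma(0)$ is the support of the protocritical droplet in σ, and $x\sim x'$ means that $x$ and $x'$ are neighbouring sites. Indeed, (20.2.13) is obtained from the expression in (20.2.12) by dropping the restriction $\hat\sigma\vee\check\sigma,\ \hat\sigma'\vee\check\sigma\in\mathcal{C}^{\mathrm{fp}}(0)$, substituting
\[
g\big(P_\sigma(0)\cup\{x\}\big) = f(x), \qquad \sigma\in\mathcal{P}(0),\ x\in B_{L_\beta}(0)\setminus P_\sigma(0), \tag{20.2.15}
\]
and noting that $c_\beta(P_\sigma(0)\cup\{x\}, P_\sigma(0)\cup\{x'\}) = 1$ when $x\sim x'$ and zero otherwise. What (20.2.14) says is that
\[
V_\beta(\sigma) = \mathrm{cap}\big(P_\sigma(0), [B_{L_\beta}(0)]^c\big) \tag{20.2.16}
\]
is the capacity of simple random walk between the protocritical droplet $P_\sigma(0)$ in σ and the exterior of $B_{L_\beta}(0)$. Now, define
\[
\check Z_\beta^{(n_\beta-K)}(0) = \sum_{\check\sigma\in\check{\mathcal{C}}(0)} e^{-\beta H_\beta(\check\sigma)}. \tag{20.2.17}
\]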
where $\bar\Gamma = -U[(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1]$ is the binding energy of the protocritical droplet.
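The bond count inside $\bar\Gamma$ can be verified by brute force. The Python sketch below assumes that the protocritical droplet is an $(\ell_c-1)\times\ell_c$ quasi-square with a single protuberance attached to one of its longest sides (this shape is an assumption based on the description of the droplet elsewhere in the chapter), counts its nearest-neighbour bonds, and compares with $(\ell_c-1)^2 + \ell_c(\ell_c-2) + 1$. All function names are hypothetical.

```python
def count_bonds(sites):
    """Number of unordered nearest-neighbour pairs with both sites occupied."""
    occupied = set(sites)
    return sum(((x + 1, y) in occupied) + ((x, y + 1) in occupied) for x, y in occupied)

def protocritical_droplet(ell_c):
    """Assumed shape: ell_c columns by (ell_c - 1) rows, plus one protuberance
    sitting on top of the side of length ell_c."""
    droplet = {(x, y) for x in range(ell_c) for y in range(ell_c - 1)}
    droplet.add((0, ell_c - 1))
    return droplet

def binding_bonds(ell_c):
    """Bond count appearing in the formula for the binding energy."""
    return (ell_c - 1) ** 2 + ell_c * (ell_c - 2) + 1

for ell_c in range(2, 7):
    droplet = protocritical_droplet(ell_c)
    print(ell_c, len(droplet), count_bonds(droplet), binding_bonds(ell_c))
```

Under this assumption the droplet also has $\ell_c(\ell_c-1)+1$ particles, which is the protocritical number appearing in Definition 20.1(a).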
4. Capacity estimate. For future reference we state the following estimate on ca-
pacities for simple random walk.
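Lemma 20.6 Let $U\subset\mathbb{Z}^2$ be any set such that $\{0\}\subset U\subset B_k(0)$, with $k\in\mathbb{N}_0$ independent of β. Let $V\subset\mathbb{Z}^2$ be any set such that $[B_{KL_\beta}(0)]^c\subset V\subset[B_{L_\beta}(0)]^c$, with $K\in\mathbb{N}$ independent of β. Then
\[
\mathrm{cap}\big(\{0\}, [B_{KL_\beta}(0)]^c\big) \le \mathrm{cap}(U,V) \le \mathrm{cap}\big(B_k(0), [B_{L_\beta}(0)]^c\big). \tag{20.2.19}
\]
\[
V_\beta(\sigma) = \frac{4\pi}{\beta\Delta}\, [1+o(1)], \qquad \forall\, \sigma\in\mathcal{P}(0),\ \beta\to\infty. \tag{20.2.21}
\]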
βΔ
Moreover, from Theorem 18.4 we know that |P(0)|, the number of shapes of the
protocritical droplet, equals N .
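The variational problem (20.2.14) can be explored numerically. The Python sketch below is a minimal illustration, not the method used in the proof: it solves the discrete Dirichlet problem for simple random walk on $\mathbb{Z}^2$ between the single site $\{0\}$ and the complement of the box $\{x\colon \|x\|_\infty\le R\}$ by Gauss-Seidel iteration, and reads off the capacity as the total flow out of the origin. The parametrisation of the box by a radius `R`, the number of sweeps, and the comparison value $2\pi/\ln R$ (the expected leading order for large $R$, which matches $4\pi/(\beta\Delta)$ under the assumption $\ln L_\beta\approx\beta\Delta/2$) are assumptions of this example.

```python
import math

def srw_capacity(radius, sweeps=2000):
    """Capacity between {0} and the complement of the box {|x|_inf <= radius}
    for simple random walk on Z^2 with unit weight on each nearest-neighbour edge."""
    size = 2 * radius + 3          # one extra layer per side carries the boundary value 0
    c = radius + 1                 # array index of the origin
    h = [[0.0] * size for _ in range(size)]
    h[c][c] = 1.0                  # boundary condition 1 at the origin
    for _ in range(sweeps):        # Gauss-Seidel sweeps towards the harmonic function
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                if i == c and j == c:
                    continue
                h[i][j] = 0.25 * (h[i - 1][j] + h[i + 1][j] + h[i][j - 1] + h[i][j + 1])
    # The Dirichlet form of the minimiser equals the flow out of the origin.
    return sum(1.0 - h[c + di][c + dj] for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))

for radius in (2, 4, 8, 16):
    print(radius, srw_capacity(radius), 2.0 * math.pi / math.log(radius))
```

The computed capacity decreases slowly as the box grows, which reflects the logarithmic decay behind the factor $4\pi/(\beta\Delta)$ in (20.2.21).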
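\[
\mathrm{cap}(\mathcal{S},\mathcal{C}^+) \le \gamma_1(\beta) + |\Lambda_\beta|\gamma_2(\beta) + N|\Lambda_\beta|\, \frac{4\pi}{\beta\Delta}\, e^{-\beta\Gamma}\, \mu_\beta(\mathcal{S})\, [1+o(1)], \qquad \beta\to\infty, \tag{20.2.23}
\]
where we use that $\bar\Gamma + \Delta K = \Gamma$. This completes the proof of the upper bound, provided that the error terms $\gamma_1(\beta)$ and $\gamma_2(\beta)$ are negligible.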
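6. Second error term. To estimate the error term $\gamma_2(\beta)$, note that the configurations in $\tilde{\mathcal{C}}(0)\setminus\mathcal{C}^{\mathrm{fp}}(0)$ are those for which inside $B_{L_\beta}(0)$ there is a protocritical droplet whose lower-left corner is at 0, and at the boundary of $B_{L_\beta}(0)$ there is a particle that is attached to some cluster outside $B_{L_\beta}(0)$. Recalling (20.2.5)–(20.2.9), we therefore have
\[
\begin{aligned}
\gamma_2(\beta) &\le \sum_{\sigma\in\tilde{\mathcal{C}}(0)\setminus\mathcal{C}^{\mathrm{fp}}(0)} \sum_{\sigma'\in\tilde{\mathcal{C}}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\sigma') \big[h(\sigma)-h(\sigma')\big]^2 \\
&\le 6\, \mu_\beta\big(\tilde{\mathcal{C}}(0)\setminus\mathcal{C}^{\mathrm{fp}}(0)\big),
\end{aligned}
\tag{20.2.24}
\]
where we use that $h\colon \tilde{\mathcal{C}}(0)\to[0,1]$, $\mu_\beta(\sigma)c_\beta(\sigma,\sigma') = \mu_\beta(\sigma)\wedge\mu_\beta(\sigma')$, and there are at most 6 possible transitions from $\tilde{\mathcal{C}}(0)\setminus\mathcal{C}^{\mathrm{fp}}(0)$ to $\tilde{\mathcal{C}}(0)$: 3 through a move by the particle at the boundary of $B_{L_\beta}(0)$ and 3 through a move by a particle in the cluster outside $B_{L_\beta}(0)$. Since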
7. First error term. To estimate the error term γ1 (β), we define the sets of pairs of
configurations
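\[
\begin{aligned}
I_1 &= \big\{(\sigma,\eta)\in[S_\beta^{(n_\beta)}]^2\colon \sigma\in\mathcal{S},\ \eta\in\mathcal{S}^c\setminus\tilde{\mathcal{C}}\big\}, \\
I_2 &= \big\{(\sigma,\eta)\in[S_\beta^{(n_\beta)}]^2\colon \sigma\in\tilde{\mathcal{C}},\ \eta\in\mathcal{S}^c\setminus\tilde{\mathcal{C}}\big\},
\end{aligned}
\tag{20.2.27}
\]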
and estimate
2
γ1 (β) ≤ μβ (σ ) cβ (σ, η) = 12 Σ(I1 ) + 12 Σ(I2 ). (20.2.28)
i=1 (σ,η)∈Ii
where $k$ counts the number of pairs of particles interacting across the boundary of $B_{L_\beta}(0)$. Moreover, since $\eta\notin\tilde{\mathcal{C}}$, we have
\[
H_\beta(\hat\eta) \ge \bar\Gamma + U. \tag{20.2.31}
\]
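\[
\begin{aligned}
\Sigma(I_1) &\le |\Lambda_\beta|\, e^{-\beta\bar\Gamma}\, \mu_\beta(\mathcal{S})\, [1+o(1)] \sum_{k=0}^{K} (\rho_\beta)^{K+k}\, \big[4(K-1)\big]^k\, e^{\beta(k-1)U} \\
&= |\Lambda_\beta|\, e^{-\beta\Gamma}\, \mu_\beta(\mathcal{S})\, [1+o(1)]\, e^{-\beta U},
\end{aligned}
\tag{20.2.32}
\]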
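where $(\rho_\beta)^{K+k}$ comes from the fact that there are $n_\beta-(K+k)$ particles outside $B_{L_\beta+1}(0)$ (once more use Lemma 20.8 in Sect. 20.4.1), and the inequality again uses an argument similar to that in Steps 3 and 5. Therefore $\Sigma(I_1)$ is small compared to the main term of (20.2.23). The sum $\Sigma(I_2)$ can be estimated as
\[
\begin{aligned}
\Sigma(I_2) &= \sum_{\sigma\in\tilde{\mathcal{C}}} \sum_{\eta\in\mathcal{S}^c\setminus\tilde{\mathcal{C}}} \mu_\beta(\sigma)\, c_\beta(\sigma,\eta) \\
&= |\Lambda_\beta| \sum_{\sigma\in\tilde{\mathcal{C}}(0)} \sum_{\eta\in\mathcal{S}^c\setminus\tilde{\mathcal{C}}(0)} \mu_\beta(\sigma)\, c_\beta(\sigma,\eta) \\
&\le |\Lambda_\beta|\, \mu_\beta\big(\tilde{\mathcal{C}}(0)\big) \big\{e^{-\beta U} + (4L_\beta)\rho_\beta\, [1+o(1)]\big\},
\end{aligned}
\tag{20.2.33}
\]
where the first term comes from detaching a particle from the critical droplet and the second term from an extra particle entering $B_{L_\beta}(0)$. The term between braces is $o(1)$. Moreover, $\mu_\beta(\tilde{\mathcal{C}}(0)) = \mu_\beta(\mathcal{C}^{\mathrm{fp}}(0)) + \mu_\beta(\tilde{\mathcal{C}}(0)\setminus\mathcal{C}^{\mathrm{fp}}(0))$. The second term was estimated in (20.2.26); the first term can again be estimated as in Steps 3 and 5: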
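\[
\begin{aligned}
\mu_\beta\big(\mathcal{C}^{\mathrm{fp}}(0)\big) &= \sum_{\hat\sigma\in\hat{\mathcal{C}}(0)} \sum_{\substack{\check\sigma\in\check{\mathcal{C}}(0) \\ \hat\sigma\vee\check\sigma\in\mathcal{C}^{\mathrm{fp}}(0)}} \mu_\beta(\hat\sigma\vee\check\sigma) \\
&= N e^{-\beta\bar\Gamma}\, \frac{\check Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}} = N e^{-\beta\Gamma}\, \mu_\beta(\mathcal{S})\, [1+o(1)].
\end{aligned}
\tag{20.2.34}
\]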
Having completed the proof of the upper bound in Lemma 20.5, we next turn to
the proof of the lower bound.
Lower bound
For future reference we state the following property of the harmonic function for
simple random walk on Z2 .
Lemma 20.7 Let $g$ be the harmonic function of simple random walk on $B_{2L_\beta}(0)$ (which is equal to 1 on $\{0\}$ and 0 on $[B_{2L_\beta}(0)]^c$). Then there exists a constant $C < \infty$ such that
\[
\sum_e \big[g(z)-g(z+e)\big]_+ \le C/L_\beta \qquad \forall\, z\in[B_{L_\beta}(0)]^c. \tag{20.2.35}
\]
Proof See e.g. Lawler, Schramm and Werner [161], Lemma 5.1. The proof can be
given via the estimates in Lawler [160], Sect. 1.7, or via a coupling argument.
The proof of the lower bound follows the same line of argument as for Glauber
dynamics in that it relies on the construction of a suitable unit flow. This flow will,
however, be considerably more difficult. In particular, we will no longer be able
to get away with choosing a deterministic flow, and the full power of the Berman-
Konsowa variational principle has to be brought to bear.
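\[
\mathcal{C}_L = \big\{\sigma\cup P_{(y)}(x,z)\colon \sigma\in\mathcal{S}_2^{(n_\beta-K)},\ x,y\in\Lambda_\beta,\ (x,y,z)\perp\sigma\big\} \tag{20.2.36}
\]
\[
\mathrm{cap}(\mathcal{C}_L,\mathcal{S}_L) \ge |\Lambda_\beta| \sum_{\sigma\in\mathcal{C}_L(0)} \sum_{\gamma\colon \gamma_0=\sigma} P^f(\gamma) \left[\sum_{k=0}^{\tau(\gamma)-1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})}\right]^{-1} \tag{20.2.37}
\]
for a suitably constructed flow $f$ and associated path measure $P^f$, starting from some initial distribution on $\mathcal{C}_L(0)$ (which as for Glauber will be irrelevant), and $\tau(\gamma)$ the time at which the last of the $K-L$ particles exits the box $B_{L_\beta}(0)$.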
The difference between Glauber and Kawasaki is that, while in Glauber the
droplet can be torn down via single spin-flips, in Kawasaki after we have detached
a particle from the droplet we need to move it out of the box BLβ (0), which takes
a large number of steps. Thus, τ (γ ) is the sum of K − L stopping times, each of
which, except the first, is a sum of two stopping times itself, one to detach the parti-
cle and one to move it out of the box BLβ (0). With each motion of a single particle
we need to gain an entropy factor of order close to 1/ρβ . This will be done by con-
structing a flow that involves only the motion of this single particle, based on the
harmonic function of the simple random walk in the box B2Lβ (0) up to the boundary
of the box BLβ (0). Outside BLβ (0) the flow becomes more complex: we modify it
in such a way that a small fraction of the flow, of order $L_\beta^{-1+\varepsilon}$ for some $\varepsilon > 0$ small
enough, is going into the direction of removing the next particle from the droplet.
The reason for this choice is that we want to make sure that the flow becomes sufficiently small, of order $L_\beta^{-2+\varepsilon}$, so that this can compensate for the fact that the Gibbs
weight in the denominator of the lower bound in Theorem 7.43 is reduced by a fac-
tor e−βU when the protuberance is detached. The reason for the extra ε is that we
want to make sure that, along most of the paths, the protuberance is detached before
the first particle leaves the box B2Lβ (0).
Once the protuberance detaches itself from the protocritical droplet, the first par-
ticle stops and the second particle moves in the same way as the first particle did
when it moved away from the protocritical droplet, and so on. This is repeated until
no more than L particles remain in BLβ (0), by which time we have reached SL .
As we will see, the only significant contribution to the lower bound comes from the
motion of the first particle (as for Glauber), and this coincides with the upper bound
established earlier. The details of the construction are to some extent arbitrary and
there are many other choices imaginable.
3. First particle. We first construct the flow that moves the particle at distance 2
from the protocritical droplet to the boundary of the box BLβ (0). This flow will
consist of independent flows for each fixed shape and location of the critical droplet,
and will be seen to produce the essential contribution to the lower bound.
We label the configurations in CL (0) by σ , describing the shape of the critical
droplet, as well as the configuration outside the box B2Lβ (0), and we label the posi-
tion of the free particle in σ by z1 (σ ).
Let g be the harmonic function for simple random walk with boundary condi-
tions 0 on [B2Lβ (0)]c and 1 on the critical droplet. Then we choose our flow to be
\[
f\big(\sigma(z),\sigma(z')\big) =
\begin{cases}
C_1 [g(z)-g(z+e)]_+, & \text{if } z' = z+e,\ \|e\| = 1, \\
0, & \text{otherwise},
\end{cases}
\tag{20.2.38}
\]
where $\sigma(z)$ is the configuration obtained from σ by placing the first particle at site $z$. The constant $C_1$ is chosen to ensure that $f$ defines a unit flow, i.e.,
\[
C_1 \sum_{\sigma\in\mathcal{C}_L(0)} \sum_{z_1(\sigma),e} \big[g(z_1(\sigma)) - g(z_1(\sigma)+e)\big] = C_1 \sum_{\sigma\in\mathcal{C}_L(0)} \mathrm{cap}\big(P_\sigma(0), [B_{2L_\beta}(0)]^c\big) = 1, \tag{20.2.39}
\]
where $P_\sigma(0)$ denotes the support of the protocritical droplet in σ, and the capacity refers to the simple random walk.
Now, let $z_1(k)$ be the location of the first particle at time $k$, and
\[
\tau^1 = \inf\big\{k\in\mathbb{N}\colon z_1(k)\in[B_{L_\beta}(0)]^c\big\} \tag{20.2.40}
\]
be the first time when, under the Markov chain associated to the flow $f$, it exits $B_{L_\beta}(0)$. Let γ be a path of this Markov chain. Then, by (20.2.38)–(20.2.39), we have
\[
\sum_{k=0}^{\tau^1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})} = \frac{C_1\, [g(z_1(0)) - g(z_1(\tau^1))]}{\mu_\beta(\gamma_0)} \tag{20.2.41}
\]
where the sum over the $g$'s is telescoping because only paths along which the $g$-function decreases carry positive probability, and $c_\beta(\gamma_k,\gamma_{k+1}) = 1$ for all $0\le k\le\tau^1$ because the first particle is free. We have $g(z_1(0)) = 1$, while, by Lemma 20.7, there exists a $C < \infty$ such that
\[
g(x) \le C/\ln L_\beta, \qquad x\in[B_{L_\beta}(0)]^c. \tag{20.2.42}
\]
Therefore
\[
\sum_{k=0}^{\tau^1} \frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)\, c_\beta(\gamma_k,\gamma_{k+1})} = \frac{C_1}{\mu_\beta(\gamma_0)}\, [1+o(1)]. \tag{20.2.43}
\]
(because $\{0\}\subset P_\sigma(0)\subset B_{2\ell_c}(0)$ for all $\sigma\in\mathcal{C}_L(0)$). Since $N = |\mathcal{C}_L(0)|$, it follows from (20.2.39) that
\[
\frac{1}{C_1} = N\, \frac{4\pi}{\beta\Delta}\, [1+o(1)], \tag{20.2.45}
\]
To see why (20.2.47) is true, recall from (20.2.36) that $\mathcal{C}_L(0)$ is obtained from $\mathcal{S}_2^{(n_\beta-K)}$ by adding a critical droplet with lower-left corner at the origin that does not interact with the $n_\beta-K$ particles elsewhere in $\Lambda_\beta$. Hence
\[
\mu_\beta\big(\mathcal{C}_L(0)\big) = e^{-\beta\bar\Gamma}\, \frac{\tilde Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}}, \tag{20.2.48}
\]
where $\tilde Z_\beta^{(n_\beta-K)}(0)$ is the analogue of $\check Z_\beta^{(n_\beta-K)}(0)$ (defined in (20.2.17)) obtained by requiring that the $n_\beta-K$ particles are in $[R_{\ell_c,\ell_c}(0)]^c$ instead of $[B_{L_\beta}(0)]^c$. However, it will follow from the proofs of Lemmas 20.8–20.10 in Sect. 20.4 that, similarly as in (20.2.22),
\[
\frac{\tilde Z_\beta^{(n_\beta-K)}(0)}{Z_\beta^{(n_\beta)}} = (\rho_\beta)^K\, \mu_\beta(\mathcal{S})\, [1+o(1)], \qquad \beta\to\infty, \tag{20.2.49}
\]
which yields (20.2.47) because $\Gamma = \bar\Gamma + K\Delta$. For the remaining part of the construction of the flow it therefore suffices to ensure that the sum beyond $\tau^1$ gives a smaller contribution.
4. Second particle. Once the first particle (i.e., the free particle) has left the box
BLβ (0), we need to allow the second particle (i.e., the protuberance) to detach it-
self from the protocritical droplet and to move out of BLβ (0) as well. The problem is
that detaching the second particle reduces the Gibbs weight appearing in the denominator by $e^{-U\beta}$, while the increments of the flow are reduced only to about $1/L_\beta$.
Thus, we cannot immediately detach the second particle. Instead, we do this with
probability $L_\beta^{-1+\varepsilon}$ only. The idea is that, once the first particle is outside $B_{L_\beta}(0)$,
we leak some of the flow that drives the motion of the first particle into a flow that
detaches the second particle. To do this, we have to first construct a leaky flow in
B2Lβ (0)\BLβ (0) for simple random walk. This goes as follows.
Let $p(z,z+e)$ denote the transition probabilities of simple random walk driven by the harmonic function $g$ on $B_{2L_\beta}(0)$. Put
\[
\tilde p(z,z+e) =
\begin{cases}
p(z,z+e), & \text{if } z\in B_{L_\beta}(0), \\
(1-L_\beta^{-1+\varepsilon})\, p(z,z+e), & \text{if } z\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0).
\end{cases}
\tag{20.2.50}
\]
Use the transition probabilities $\tilde p(z,z+e)$ to define a path measure $\tilde P$. This path measure describes simple random walk driven by $g$, but with a killing probability $L_\beta^{-1+\varepsilon}$ inside the annulus $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$. Put
\[
k(z,z+e) = \sum_\gamma \tilde P(\gamma)\, 1_{(z,z+e)\in\gamma}, \qquad z\in B_{2L_\beta}(0). \tag{20.2.51}
\]
(which is possible by (20.1.5)–(20.1.6)). The important fact for us is that this leaky
flow is dominated by the harmonic flow associated with $g$; in particular, the flow in
satisfies
\[
\sum_{e} k(z+e,z) \le \sum_{e}\,[g(z+e)-g(z)]_+ \qquad \forall\, z\in B_{2L_\beta}(0) \tag{20.2.54}
\]
(and the same applies for the flow out). This inequality holds because $g$ satisfies the
same equations as in (20.2.50)–(20.2.51) but without the leaking factor $1-L_\beta^{-1+\varepsilon}$.
Using this leaky flow, we can now construct a flow involving the first two particles,
as follows:
• $f(\sigma(z_1,a),\sigma(z_1+e,a)) = C_1\,k(z_1,z_1+e)$,
  if $z_1\in B_{2L_\beta}(0)$,
• $f(\sigma(z_1,a),\sigma(z_1,b)) = C_1 L_\beta^{-1+\varepsilon}\sum_{e} k(z_1,z_1+e)$,
  if $z_1\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$,
• $f(\sigma(z_1,z_2),\sigma(z_1,z_2+e)) = C_1 L_\beta^{-1+\varepsilon}\big[\sum_{e'} k(z_1,z_1+e')\big]\,[g(z_2)-g(z_2+e)]_+$,
  if $z_1\in B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$, $z_2\in B_{L_\beta}(0)\setminus P_\sigma(0)$. (20.2.55)
Here, we write $a$ and $b$ for the locations of the second particle before and after it
detaches itself from the protocritical droplet, and $\sigma(z_1,z_2)$ for the configuration
obtained from $\sigma$ by placing the first particle (that was at distance 2 from the
protocritical droplet) at site $z_1$ and the second particle (that was the protuberance) at
site $z_2$. The flow for other motions is zero, and the constant $C_1$ is the same as in
(20.2.38)–(20.2.39).
We next define two further stopping times, namely,
\[
\zeta^2 = \inf\{k\in\mathbb N\colon z_2(\gamma_k)=b\}, \tag{20.2.56}
\]
i.e., the first time the second particle (the protuberance) detaches itself from the
protocritical droplet, and
\[
\tau^2 = \inf\{k\in\mathbb N\colon z_2(\gamma_k)\in[B_{L_\beta}(0)]^c\}, \tag{20.2.57}
\]
i.e., the first time the second particle exits the box $B_{L_\beta}(0)$. Note that, since we
choose the leaking probability to be $L_\beta^{-1+\varepsilon}$, the probability that $\zeta^2$ is larger than
the first time the first particle exits $B_{2L_\beta}(0)$ is of order $\exp[-L_\beta^{\varepsilon}]$ and hence is
negligible. We will disregard the contributions of such paths in the lower bound.
The remaining paths will be called good.
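The order of magnitude exp[−L^ε] quoted above can be checked with a short computation. The sketch below is only a heuristic illustration, not part of the proof: it assumes that the first particle needs at least of order L steps to cross the annulus (whose width is of order L), and that each step leaks independently with probability L^(−1+ε). The function name and the chosen values of L and ε are hypothetical.

```python
import math

def log_no_leak_probability(L: float, eps: float, steps: int) -> float:
    """Log of the probability that no leak occurs during `steps` steps,
    when each step leaks independently with probability L**(-1 + eps)."""
    leak = L ** (-1.0 + eps)
    # log1p keeps precision when the leak probability is small
    return steps * math.log1p(-leak)

if __name__ == "__main__":
    L, eps = 10**4, 0.5
    log_p = log_no_leak_probability(L, eps, steps=L)
    # Compare with the heuristic order exp(-L**eps)
    print(log_p, -(L ** eps))
```

Since log(1 − x) ≤ −x, the no-leak probability over L steps is at most exp[−L^ε], which is consistent with the order stated in the text.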
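We will next show that (20.2.41) also holds if we extend the sum along any path
of positive probability up to $\zeta^2$. The reason for this lies in Lemma 20.7. Let $\gamma$ be
a path that has a positive probability under the path measure $P^f$ associated with $f$
stopped at $\tau^2$. We will assume that this path is good in the sense described above.
To that end we decompose
\[
\begin{aligned}
\sum_{k=0}^{\tau^2}\frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)c_\beta(\gamma_k,\gamma_{k+1})}
&= \sum_{k=0}^{\tau^1}\frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)c_\beta(\gamma_k,\gamma_{k+1})}
 + \sum_{k=\tau^1+1}^{\zeta^2-2}\frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)c_\beta(\gamma_k,\gamma_{k+1})}\\
&\quad + \sum_{k=\zeta^2-1}^{\tau^2}\frac{f(\gamma_k,\gamma_{k+1})}{\mu_\beta(\gamma_k)c_\beta(\gamma_k,\gamma_{k+1})}
 = I + II + III.
\end{aligned}
\tag{20.2.58}
\]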
The first term corresponds to the move when the protuberance detaches itself from
the protocritical droplet. Its numerator is given by $f(\sigma(z_1,a),\sigma(z_1,b))$ (for some
$z_1\in[B_{L_\beta}(0)]^c$) which, by Lemma 20.7 and (20.2.54)–(20.2.55), is smaller than
$C_1L_\beta^{-1+\varepsilon}\,CL_\beta^{-1} = C_1CL_\beta^{-2+\varepsilon}$. On the other hand, its denominator is given by
The same holds for the denominators in all the other terms in III, while the
numerators in these terms satisfy the bound
\[
f(\gamma_k,\gamma_{k+1}) \le C_1C\,L_\beta^{-2+\varepsilon}\,\big[g(z_2(\gamma_k))-g(z_2(\gamma_{k+1}))\big]. \tag{20.2.62}
\]
5. Remaining particles. The lesson from the previous steps is that we can construct
a flow with the property that each time we remove a particle from the droplet we gain
a factor $L_\beta^{-2+\varepsilon}$, i.e., almost $e^{-\Delta\beta}$. (This entropy gain corresponds to the gain from
the magnetic field in Glauber dynamics, or from the activity in Kawasaki dynamics
on a finite open box.) We can continue our flow by tearing down the critical droplet
in the same order as we did for Glauber dynamics. Each removal corresponds to a
flow that is built in the same way as described in Step 4 for the second particle. There
will be some minor modifications involving a negligible fraction of paths where a
particle hits a particle that was moved out earlier, but this is of no consequence. As
a result of the construction, the sums along the remainders of these paths will give
only negligible contributions.
Thus, we have shown that the lower bound coincides, up to a factor 1 + o(1),
with the upper bound and the lemma is proven.
Combining the upper bound obtained in Sect. 20.2.2 with the lower bound ob-
tained in Sect. 20.2.2, we have finally completed the proof of Lemma 20.5, and
therefore of Theorem 20.3(a).
In this section we prove Theorem 20.3(b). The starting point is again the analogue
of (19.3.1) with S c \C replaced by DM and S ∪ C by DM c .
Proof The same observation holds as in (19.4.1). Therefore the proof follows
along the same lines as that of Theorem 20.3(a). The main point is to prove
cap(DM , SL ) = [1 + o(1)]cap(C + , SL ). Since cap(SL , DM ) ≤ cap(SL , C + ), all
we need to do is prove a lower bound on cap(DM , SL ). This is done in almost
exactly the same way as for Glauber, by using the construction given there and sub-
stituting each Glauber move by a flow involving the motion of just two particles.
Note that, as long as $M=e^{o(\beta)}$, an $M\times M$ droplet can be added at $|\Lambda_\beta|-o(|\Lambda_\beta|)$
locations to a configuration $\sigma\in\mathcal S$ (compare with (20.2.36)). The only
novelty is that we have to eventually remove the cloud of particles that is produced in
the annulus $B_{2L_\beta}(0)\setminus B_{L_\beta}(0)$. This is done in much the same way as before. As long
as only $e^{o(\beta)}$ particles have to be removed, potential collisions between particles can
be ignored as they are sufficiently unlikely.
and
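\[
\begin{aligned}
Z_\beta^{(n_\beta-m)} &= \sum_{\sigma\in\mathcal S^{(n_\beta-m)}} e^{-\beta H_\beta(\sigma)},\\
\check Z_\beta^{(n_\beta-m)} &= \sum_{\sigma\in\mathcal S^{(n_\beta-m)}} e^{-\beta H_\beta(\sigma)}\,1_{\{\mathrm{supp}[\sigma]\subset\Lambda_\beta\setminus B_{L_\beta}(0)\}}.
\end{aligned}
\tag{20.4.2}
\]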
The first is the partition function with nβ − m particles restricted such that no box
of size Lβ has ≥ K particles. The second is the same partition function but with the
additional restriction that no particle falls in BLβ (0).
The following lemma was used in (20.2.22), (20.2.26), (20.2.32) and (20.2.49).
In Sects. 20.4.1–20.4.2 two lemmas are proved that combine to yield Lemma 20.8.
Sections 20.4.3–20.4.4 prove atypicality of critical droplets and typicality of starting
configurations.
20.4 Equivalence of ensembles 499
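1. It suffices to give the proof for $m=1$. The same proof works for $m\ge 2$ after we
replace $n_\beta$ by $n_\beta-m+1$. Write
\[
\begin{aligned}
Z_\beta^{(n_\beta)} &= \frac{1}{n_\beta}\sum_{\substack{\mathrm{supp}[\sigma]\subset\Lambda_\beta\\ |\sigma|=n_\beta-1}}\ \sum_{x\in\Lambda_\beta\setminus\mathrm{supp}[\sigma]} e^{-\beta H_\beta(\sigma\vee 1_x)}\,1_{\{\sigma\vee 1_x\in\mathcal S^{(n_\beta)}\}}\\
&= \sum_{\substack{\mathrm{supp}[\sigma]\subset\Lambda_\beta\\ |\sigma|=n_\beta-1}} e^{-\beta H_\beta(\sigma)}\,\big[I(\sigma)+II(\sigma)\big] = I+II,
\end{aligned}
\tag{20.4.3}
\]
where
\[
\begin{aligned}
I(\sigma) &= \frac{1}{n_\beta}\sum_{\substack{x\in\Lambda_\beta\\ \mathrm{dist}(x,\mathrm{supp}[\sigma])>1}} 1_{\{\sigma\vee 1_x\in\mathcal S^{(n_\beta)}\}},\\
II(\sigma) &= \frac{1}{n_\beta}\sum_{\substack{x\in\Lambda_\beta\\ \mathrm{dist}(x,\mathrm{supp}[\sigma])=1}} e^{-\beta[H_\beta(\sigma\vee 1_x)-H_\beta(\sigma)]}\,1_{\{\sigma\vee 1_x\in\mathcal S^{(n_\beta)}\}}.
\end{aligned}
\tag{20.4.4}
\]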
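In the first sum the particle at $x$ is free and $H_\beta(\sigma\vee 1_x)=H_\beta(\sigma)$, while in the second
sum it is not free and $H_\beta(\sigma\vee 1_x)<H_\beta(\sigma)$. For every $\sigma\in\mathcal S^{(n_\beta-1)}$, we have (recall
(20.4.1))
\[
|\Lambda_\beta|-(2L_\beta+1)^2(n_\beta-1) \le \sum_{\substack{x\in\Lambda_\beta\\ \mathrm{dist}(x,\mathrm{supp}[\sigma])>1}} 1_{\{\sigma\vee 1_x\in\mathcal S^{(n_\beta)}\}} \le |\Lambda_\beta|. \tag{20.4.5}
\]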
We will show that II is exponentially smaller than I , which will prove the claim.
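2. Let us define a 1-cluster as a maximal set of particles such that for each particle
in the cluster there is another particle in the cluster at distance $\le 2$. Write
\[
II = \frac{1}{n_\beta}\sum_{N=1}^{n_\beta-1}\ \sum_{\substack{C_1,\dots,C_N\\ \sum_{m=1}^{N}|C_m|=n_\beta-1}}\ \sum_{a=1}^{N}\ \sum_{x\in\partial C_a}
e^{-\beta\sum_{b\neq a}H_\beta(C_b)-\beta H_\beta(C_a\vee 1_x)}\,
1_{\{x\cup[\cup_{a=1}^{N}C_a]\in\mathcal S^{(n_\beta)}\}}, \tag{20.4.7}
\]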
×1 N (nβ ) , (20.4.8)
k=1 ∪l=1 Cl ]∈S
{x∪[∪K−1 }
k k
Nk
K−1
k
≤ 4k e−βHk+1 exp −β Hβ Cl 1{(k ,l )=(k,l)} . (20.4.10)
k =1 l =1
The last sum no longer contains the 1-cluster Clk that x is attached to. Since
|Clk | = k, the other 1-clusters contain a total of nβ − (k + 1) particles. Hence, in-
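\[
II \le \frac{|\Lambda_\beta|}{n_\beta}\sum_{k=1}^{K-2} 4k\,e^{-\beta H_{k+1}}\,Z_\beta^{(n_\beta-k-1)}, \tag{20.4.11}
\]
where $|\Lambda_\beta|$ counts the possible locations of the 1-cluster that has been removed, we
trace back the decomposition in (20.4.7)–(20.4.8), and we use that if $x\in\partial C_l^k$, then
\[
\Big\{x\cup\bigcup_{k'=1}^{K-1}\bigcup_{l'=1}^{N_{k'}}C_{l'}^{k'}\in\mathcal S^{(n_\beta)}\Big\}
\subset
\Big\{\bigcup_{k'=1}^{K-1}\bigcup_{l'=1}^{N_{k'}}C_{l'}^{k'}\setminus C_l^k\in\mathcal S^{(n_\beta-k-1)}\Big\}. \tag{20.4.12}
\]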
Inserting this bound into (20.4.14) and using (20.1.3), we see that the sum in
(20.4.14) is $O(e^{-\beta\varepsilon})$ for some $\varepsilon>0$, and so $II$ indeed is exponentially smaller
than $I$. Here, note that if $\ell_c\ge 3$, then $K\le(2\ell_c-3)^2$, which means that (20.4.15)
covers the range of $k$-values needed in (20.4.14). For $\ell_c=2$ we have $K=4$, but
$H_2+\Delta=-U+\Delta>0$ and $H_3+2\Delta=-2U+2\Delta>0$ because $\Delta>U$, and so we
are done as well.
Hence
\[
H_k+(k-1)\Delta \ge -2U[k-\sqrt{k}]+(k-1)\Delta = -(2U-\Delta)k+2U\sqrt{k}-\Delta. \tag{20.4.18}
\]
Let $\ell_*=U/(2U-\Delta)$. Then the right-hand side equals $(2U-\Delta)[-k+2\ell_*\sqrt{k}-(2\ell_*-1)]$,
which is $=0$ for $k=1$ and $>0$ for $2\le k<(2\ell_*-1)^2$. Since $\ell_c=\lceil\ell_*\rceil$
and $\ell_*\notin\mathbb N$ (recall (18.1.5)–(18.1.6)), we have $2\ell_*-1>2\ell_c-3$, which proves the
claim.
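The algebra behind (20.4.18) is easy to check numerically. The following sketch assumes parameter values U = 1 and Δ = 1.6 (chosen here only for illustration, so that ℓ* = 2.5 is not an integer and U < Δ < 2U); the function names are hypothetical. It compares the two expressions for the right-hand side and evaluates them over the range 1 ≤ k ≤ (2ℓ* − 1)².

```python
import math

def rhs_expanded(k: int, U: float, Delta: float) -> float:
    """-2U[k - sqrt(k)] + (k - 1)Delta, the middle expression in (20.4.18)."""
    return -2.0 * U * (k - math.sqrt(k)) + (k - 1) * Delta

def rhs_factored(k: int, U: float, Delta: float) -> float:
    """(2U - Delta)[-k + 2 l* sqrt(k) - (2 l* - 1)] with l* = U/(2U - Delta)."""
    ell_star = U / (2.0 * U - Delta)
    return (2.0 * U - Delta) * (-k + 2.0 * ell_star * math.sqrt(k) - (2.0 * ell_star - 1.0))

if __name__ == "__main__":
    U, Delta = 1.0, 1.6            # illustrative values, assumed here
    ell_star = U / (2 * U - Delta)  # = 2.5
    upper = (2 * ell_star - 1) ** 2  # = 16
    for k in range(1, int(upper) + 1):
        print(k, rhs_expanded(k, U, Delta), rhs_factored(k, U, Delta))
```

With these values the expression vanishes at k = 1 and k = 16 and is strictly positive in between, in line with the factorisation −(√k − 1)(√k − (2ℓ* − 1)).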
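Proof It suffices to give the proof for $m=0$. The same proof works for $m\ge 1$ after
we replace $n_\beta$ by $n_\beta-m$. Since $\check Z_\beta^{(n_\beta)}\le Z_\beta^{(n_\beta)}$, it suffices to prove the lower bound.
Write
\[
\begin{aligned}
Z_\beta^{(n_\beta)} &= \check Z_\beta^{(n_\beta)} + \sum_{m=1}^{K}\ \sum_{\substack{\eta\in\mathcal S_\beta^{(m)},\ \zeta\in\mathcal S_\beta^{(n_\beta-m)}\\ \eta\vee\zeta\in\mathcal S^{(n_\beta)}}} e^{-\beta H_\beta(\eta\vee\zeta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,1_{\{\mathrm{supp}[\zeta]\subset[B_{L_\beta}(0)]^c\}}\\
&\le \check Z_\beta^{(n_\beta)}+\gamma_1(\beta)+\gamma_2(\beta),
\end{aligned}
\tag{20.4.19}
\]
where
\[
\gamma_1(\beta) = \sum_{m=1}^{K}\ \sum_{\substack{\eta\in\mathcal S_\beta^{(m)},\ \zeta\in\mathcal S_\beta^{(n_\beta-m)}\\ \eta\vee\zeta\in\mathcal S^{(n_\beta)}}} e^{-\beta[H_\beta(\eta)+H_\beta(\zeta)]}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,1_{\{\mathrm{supp}[\zeta]\subset[B_{L_\beta}(0)]^c\}} \tag{20.4.20}
\]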
and γ2 (β) is a term that arises from particles interacting across the boundary of
BLβ (0). We will show that both γ1 (β) and γ2 (β) are negligible.
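Estimate
\[
\begin{aligned}
\gamma_1(\beta) &\le \sum_{m=1}^{K}\check Z_\beta^{(n_\beta-m)}\sum_{\eta\in\mathcal S^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\\
&= [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{\eta\in\mathcal S^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\\
&= [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^{j}k_i=m}}\ \sum_{\substack{C=\cup_{i=1}^{j}C_i\subset B_{L_\beta}(0)\\ |C_i|=k_i\ \forall\, i}} e^{-\beta\sum_{i=1}^{j}H_\beta(C_i)},
\end{aligned}
\tag{20.4.21}
\]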
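where the first equality uses Lemma 20.9 with $\Lambda_\beta$ replaced by $\Lambda_\beta\setminus B_{L_\beta}(0)$, while
the second equality is an expansion in terms of clusters. Using once more the
isoperimetric inequality in (20.4.15), we get (recall (20.1.5))
\[
\begin{aligned}
\gamma_1(\beta) &\le [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^{j}k_i=m}} e^{-\beta\sum_{i=1}^{j}H_{k_i}}\sum_{\substack{C=\cup_{i=1}^{j}C_i\\ |C_i|=k_i\ \forall\, i}} 1\\
&\le A\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^{j}k_i=m}} L_\beta^{2j}\,e^{-\beta\sum_{i=1}^{j}H_{k_i}}\\
&= A\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}\sum_{j=1}^{m}\ \sum_{\substack{2\le k_1,\dots,k_j\le K\\ \sum_{i=1}^{j}k_i=m}} e^{-\beta\sum_{i=1}^{j}[H_{k_i}+k_i\Delta-(\Delta-\delta_\beta)]}\\
&\le B\,\check Z_\beta^{(n_\beta)}\,e^{-\beta\varepsilon}
\end{aligned}
\tag{20.4.22}
\]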
for some ε > 0 and some constants A, B < ∞ that are independent of β, i.e., γ1 (β)
is negligible. Estimate
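\[
\begin{aligned}
\gamma_2(\beta) &\le \sum_{m=1}^{K}\sum_{\eta\in\mathcal S^{(m)}}\sum_{k=1}^{m} e^{-\beta H_\beta(\eta)}\,e^{\beta kU}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,\check Z_\beta^{(n_\beta-m-k)}\\
&\le \sum_{m=1}^{K}\sum_{\eta\in\mathcal S^{(m)}}\sum_{k=1}^{m} e^{-\beta H_\beta(\eta)}\,e^{\beta kU}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\,(\rho_\beta)^{m+k}\,\check Z_\beta^{(n_\beta)}\,[1+o(1)]\\
&\le [1+o(1)]\,\check Z_\beta^{(n_\beta)}\sum_{m=1}^{K}(\rho_\beta)^m\sum_{\eta\in\mathcal S^{(m)}} e^{-\beta H_\beta(\eta)}\sum_{k=1}^{m} e^{-\beta k(\Delta-U)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}},
\end{aligned}
\tag{20.4.23}
\]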
The following lemma was used in Sect. 20.2.1. Recall Definition 20.1, and note that
S = S (nβ ) .
where the first sum runs over all configurations in BLβ (0) consisting of a proto-
critical droplet centred at the origin and a free particle elsewhere, and γ3 (β) is a
negligible term that arises from particles interacting across the boundary of BLβ (0),
similar to the term $\gamma_2(\beta)$ in Sect. 20.4.2. The double sum in the right-hand side of
(20.4.25) equals
\[
|B_{L_\beta}(0)|\,[1+o(1)]\,N\,e^{-\beta\bar\Gamma}\,\frac{\check Z_\beta^{(n_\beta-K)}}{Z_\beta^{(n_\beta)}}\,[1+o(1)], \tag{20.4.26}
\]
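Proof Split
\[
\mathcal S^{(n_\beta)} = \mathcal S = \mathcal S_L\cup(\mathcal S\setminus\mathcal S_L) = \mathcal S_L\cup\mathcal U_{>L}, \tag{20.4.28}
\]
where $\mathcal U_{>L}\subset\mathcal S$ are those configurations $\sigma$ for which there exists an $x$ such that
$|\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)|>L$. Then
\[
\mu_\beta(\mathcal U_{>L}) = \sum_{x\in\Lambda_\beta}\sum_{\sigma\in\mathcal S^{(n_\beta)}}\sum_{m=L+1}^{K}\mu_\beta(\sigma)\,1_{\{|\mathrm{supp}[\sigma]\cap B_{L_\beta}(x)|=m\}} = |\Lambda_\beta|\,[\varphi(\beta)+\gamma(\beta)], \tag{20.4.29}
\]
where
\[
\varphi(\beta) = \sum_{m=L+1}^{K}\ \sum_{\substack{\eta\in\mathcal S_\beta^{(m)},\ \zeta\in\mathcal S_\beta^{(n_\beta-m)}\\ \eta\vee\zeta\in\mathcal S^{(n_\beta)}}}\frac{e^{-\beta[H_\beta(\eta)+H_\beta(\zeta)]}}{Z_\beta^{(n_\beta)}}
\]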
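and $\gamma(\beta)$ is an error term arising from particles interacting across the boundary of
$B_{L_\beta}(0)$. By the same argument as in (20.4.23), this term is negligible. Moreover,
\[
\begin{aligned}
\varphi(\beta) &\le \sum_{m=L+1}^{K}\frac{\check Z_\beta^{(n_\beta-m)}}{Z_\beta^{(n_\beta)}}\sum_{\eta\in\mathcal S^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}}\\
&\le [1+o(1)]\,\mu_\beta(\mathcal S^{(n_\beta)})\sum_{m=L+1}^{K}(\rho_\beta)^m\sum_{\eta\in\mathcal S^{(m)}} e^{-\beta H_\beta(\eta)}\,1_{\{\mathrm{supp}[\eta]\subset B_{L_\beta}(0)\}},
\end{aligned}
\tag{20.4.31}
\]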
K
m j
ϕ(β) ≤ 1 + o(1) A μβ S (nβ ) e−β i=1 [Hki +ki Δ−(Δ−δβ )]
Clearly, the left-hand side of (20.5.1) is the escape probability to SL from ∂DM
averaged with respect to the canonical Gibbs measure μβ conditioned on ∂DM and
weighted by the outgoing rate cβ . To show that this quantity is exponentially small
in β, which is our goal, it suffices to show that in the right-hand side of (20.5.1) the
denominator is large compared to the numerator.
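By Lemma 20.5,
\[
\mathrm{cap}(\mathcal S_L,\mathcal D_M) \le \mathrm{cap}\big(\mathcal S_L,(\mathcal S^c\setminus\tilde{\mathcal C})\cup\mathcal C^+\big) = N\,|\Lambda_\beta|\,\frac{4\pi}{\Delta\beta}\,e^{-\beta\Gamma}\,\mu_\beta(\mathcal S)\,[1+o(1)]. \tag{20.5.2}
\]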
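On the other hand, note that $\partial\mathcal D_M$ contains all configurations $\sigma$ for which there is
an $M\times M$ droplet somewhere in $\Lambda_\beta$, all $L_\beta$-boxes not containing this droplet carry
at most $K$ particles, and there is a free particle somewhere in $\Lambda_\beta$. The last condition
ensures that $c_\beta(\sigma)\ge 1$. Therefore we can use Lemma 20.8 to estimate
\[
\begin{aligned}
\sum_{\sigma\in\partial\mathcal D_M}\mu_\beta(\sigma)c_\beta(\sigma) &\ge |\Lambda_\beta|\,e^{-\beta H_{M^2}}\,\frac{\check Z_\beta^{(n_\beta-M^2)}}{Z_\beta^{(n_\beta)}}\\
&= |\Lambda_\beta|\,e^{-\beta H_{M^2}}\,(\rho_\beta)^{M^2}\,\mu_\beta(\mathcal S)\,[1+o(1)],
\end{aligned}
\tag{20.5.3}
\]
\[
N\,\frac{4\pi}{\Delta\beta}\,\frac{\exp[-\beta\Gamma]}{\exp[-\beta(H_{M^2}+\Delta M^2)]}\,[1+o(1)], \tag{20.5.4}
\]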
2. In Gaudillière, den Hollander, Nardi, Olivieri, and Scoppola [118–120] the same
nucleation problem as in Sect. 20.1 is studied with the help of the pathwise approach
to metastability. Only the exponential asymptotics of the nucleation time is obtained,
but for a much wider class of initial distributions than we can presently handle with
the potential-theoretic approach. The techniques developed in these papers center
around the idea of approximating the low-temperature and low-density Kawasaki
lattice gas by an ideal gas (without interaction) and showing that this ideal gas
stays close to equilibrium while exchanging particles with droplets that are growing
and shrinking. In this way, the large system is shown to behave essentially like
the union of many small independent systems, leading to homogeneous nucleation.
The proofs are long and complicated, but they provide considerable detail about the
typical trajectory of the system prior to and shortly after the onset of nucleation,
something the potential-theoretic approach cannot offer.
where Δ > 0 is the usual activity parameter mimicking the presence of an infinite
gas reservoir around Λβ . This was the setting of Chap. 17 for small volumes. For
large volumes, however, even with this Hamiltonian we still have to face all the
difficult issues of non-locality we struggled with in Sects. 20.2–20.5.
4. As for Theorem 19.4(c), we expect Theorem 20.3(b) to hold for values of $M$ that
grow with $\beta$ as $M=e^{o(\beta)}$.
6. The extension of our results to higher dimensions is limited only by the com-
binatorial problems involved in the computation of the number of critical droplets
(which is hard in the case of Kawasaki dynamics) and of the probability for simple
random walk to hit a critical droplet of a given shape when coming from far. Recall
Sect. 18.6.
Part VIII describes lattice systems in small volumes at high densities. The focus is
on the zero-range process, which consists of a collection of continuous-time simple
random walks with on-site attraction and no on-site repulsion. We consider the limit
where the particle density is high, show that the process spends most of its time in
a “condensed state”, i.e., a configuration where most of the particles pile up on a
single site, and prove that the process evolves via a “metastable hopping” of this
pile from one site to another. Both the hopping time and the hopping distribution
are computed.
Chapter 21
The Zero-Range Process
The zero-range process offers yet another example of a system for which potential-
theoretic methods can be used to describe metastable behaviour. The free energy
landscape is of a different nature than what we encountered in the models treated so
far. In particular, there is no temperature parameter, and the key quantity to control
is entropy. This necessitates a different approach to the choice of test functions to
estimate capacities, which is worth exposing.
such that in configuration $\eta$ a particle jumps from site $x$ to site $y$ at rate $g(\eta_x)r(x,y)$.
Here, $\eta_x\in\mathbb N_0$ represents the number of particles at site $x\in S$, $r(\cdot,\cdot)$ is an
irreducible probability transition kernel associated with a reversible random walk
$X=(X(t))_{t\ge 0}$ on $S$, and $g$ is chosen as
\[
g(0)=0,\qquad g(1)=1,\qquad g(n)=\frac{a(n)}{a(n-1)},\quad n\in\mathbb N\setminus\{1\}, \tag{21.1.2}
\]
with
\[
a(0)=1,\qquad a(n)=n^\alpha\ \text{ for some } \alpha\in(1,\infty). \tag{21.1.3}
\]
where $\eta^{x,y}$ is the configuration obtained from $\eta$ by moving a particle from site $x$ to
site $y$. Note that the zero-range dynamics preserve particles.
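To make the definitions (21.1.2)–(21.1.3) concrete, here is a minimal sketch of the functions a and g and of a single particle move. The names `a`, `g` and `move_particle`, and the representation of a configuration as a tuple of occupation numbers, are choices made for this illustration only.

```python
import math

def a(n: int, alpha: float) -> float:
    """a(0) = 1 and a(n) = n**alpha for n >= 1, as in (21.1.3)."""
    return 1.0 if n == 0 else float(n) ** alpha

def g(n: int, alpha: float) -> float:
    """g(0) = 0 and g(n) = a(n)/a(n-1) for n >= 1; this gives g(1) = 1, as in (21.1.2)."""
    return 0.0 if n == 0 else a(n, alpha) / a(n - 1, alpha)

def move_particle(eta: tuple, x: int, y: int) -> tuple:
    """Return the configuration obtained from eta by moving one particle from x to y."""
    if eta[x] == 0:
        raise ValueError("no particle at site x")
    new = list(eta)
    new[x] -= 1
    new[y] += 1
    return tuple(new)

if __name__ == "__main__":
    alpha = 2.0
    # The product g(1) g(2) ... g(n) telescopes to a(n)
    print(math.prod(g(k, alpha) for k in range(1, 11)), a(10, alpha))
    print(move_particle((3, 0, 1), 0, 2))
```

Because g(n) = (n/(n−1))^α decreases towards 1 as n grows, particles leave a crowded site at a smaller rate per particle than an uncrowded one, which is the on-site attraction mentioned in the introduction to this part.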
Lemmas 21.1–21.2 and Theorem 21.3 below are well-known results for the zero-
range process in equilibrium. For references, see the bibliography in Sect. 21.6.
Lemma 21.1 $Y$ is irreducible, and is reversible with respect to the unique invariant
probability measure $\mu_N$ given by
\[
\mu_{N,S}(\eta) = \frac{N^\alpha}{Z_{N,S}}\,\frac{m_*^\eta}{a(\eta)}, \qquad \eta\in E_{N,S}, \tag{21.1.5}
\]
with
\[
m_*^\eta = \prod_{x\in S} m_*(x)^{\eta_x}, \qquad a(\eta) = \prod_{x\in S} a(\eta_x), \tag{21.1.6}
\]
where
\[
m_*(x) = \frac{m(x)}{M_*}, \qquad M_* = \max_{x\in S} m(x), \tag{21.1.7}
\]
with $m$ the invariant measure of the random walk $X$, and $Z_{N,S}$ denotes the
normalising partition function
\[
Z_{N,S} = N^\alpha\sum_{\zeta\in E_{N,S}}\frac{m_*^\zeta}{a(\zeta)}. \tag{21.1.8}
\]
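Reversibility in Lemma 21.1 can be verified by brute force on a very small system. The sketch below is an illustration under assumed data: three sites, the birth-death kernel R with invariant measure M (both chosen here, not taken from the text), and α = 2. It checks the detailed balance relation w(η) g(η_x) r(x,y) = w(η^{x,y}) g(η_y + 1) r(y,x) for the unnormalised weights w(η) = m_*^η / a(η); the constant N^α / Z_{N,S} cancels on both sides.

```python
import math
from itertools import product

ALPHA = 2.0
# Assumed example: reversible birth-death walk on three sites
R = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
M = [0.25, 0.5, 0.25]  # invariant measure of R: M[x] R[x][y] == M[y] R[y][x]

def a(n):
    return 1.0 if n == 0 else float(n) ** ALPHA

def g(n):
    return 0.0 if n == 0 else a(n) / a(n - 1)

def weight(eta):
    """Unnormalised invariant weight: product over sites of m_*(x)**eta_x / a(eta_x)."""
    m_star = [m / max(M) for m in M]
    return math.prod(ms ** n / a(n) for ms, n in zip(m_star, eta))

def configurations(n_particles, n_sites):
    """All occupation vectors with the given total number of particles."""
    return [eta for eta in product(range(n_particles + 1), repeat=n_sites)
            if sum(eta) == n_particles]

def max_detailed_balance_defect(n_particles):
    """Largest violation of detailed balance over all allowed jumps."""
    worst = 0.0
    sites = range(len(M))
    for eta in configurations(n_particles, len(M)):
        for x in sites:
            for y in sites:
                if x == y or eta[x] == 0 or R[x][y] == 0.0:
                    continue
                new = list(eta)
                new[x] -= 1
                new[y] += 1
                forward = weight(eta) * g(eta[x]) * R[x][y]
                backward = weight(tuple(new)) * g(new[y]) * R[y][x]
                worst = max(worst, abs(forward - backward))
    return worst

if __name__ == "__main__":
    print(max_detailed_balance_defect(4))
```

The defect is zero up to rounding, which reflects the identity a(η_x) / a(η_x − 1) = g(η_x) together with the reversibility of the underlying random walk.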
where
\[
\Gamma_x = \sum_{j\in\mathbb N_0}\frac{m_*(x)^j}{a(j)}, \qquad \Gamma(\alpha) = \sum_{j\in\mathbb N_0}\frac{1}{a(j)}. \tag{21.1.11}
\]
21.2 Metastable behaviour 513
\[
\lim_{N\to\infty}\ell_N = \infty, \qquad \lim_{N\to\infty}\frac{\ell_N}{N} = 0, \tag{21.1.12}
\]
Theorem 21.3 Suppose that $L=L(N)$ is such that $\lim_{N\to\infty}L(N)/N=0$. Then
there exists a sequence $(\ell_N)_{N\in\mathbb N}$ satisfying (21.1.12) such that
The question that will be addressed in this section is how Y moves between the
different condensate configurations.
First we state our results when L is kept fixed and N → ∞. Recall Definition 8.1.4.
Theorem 21.4 $Y$ is metastable with respect to the set $\mathcal M=\bigcup_{x\in S_*}\eta^x$ with $\rho=O(L(L^2+N)/(N^{\alpha+1}r))$,
where $r=\inf_{u\in S}r(u,u\pm 1)$.
The proof of Theorem 21.4 will be given in Sect. 21.5 and is based on a compu-
tation of capacities capN,S (η, ζ ) between configurations η, ζ ∈ EN,S , where capN,S
refers to capacity associated with Y .
Define
\[
I_\alpha(s) = \int_s^{1-s}u^\alpha(1-u)^\alpha\,du, \qquad 0\le s\le\tfrac12, \tag{21.2.1}
\]
and
\[
\eta^U = \bigcup_{x\in U}\eta^x, \qquad \emptyset\neq U\subseteq S_*. \tag{21.2.2}
\]
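The constant I_α(0) in (21.2.1) is a Beta integral, I_α(0) = B(α+1, α+1), and it enters the capacity estimates of Sect. 21.3 through sums of the form Σ_q a(q) a(N − q − 1), which behave like N^(2α+1) I_α(0) for large N. The sketch below illustrates both facts numerically. It is a simplified check (full range of q, no background particles), and `math.gamma` is Euler's gamma function, which should not be confused with the constant Γ(α) of (21.1.11). All function names are hypothetical.

```python
import math

def a(n: int, alpha: float) -> float:
    """a(0) = 1 and a(n) = n**alpha for n >= 1."""
    return 1.0 if n == 0 else float(n) ** alpha

def I_alpha_closed_form(alpha: float) -> float:
    """I_alpha(0) as a Beta integral, written with Euler's gamma function."""
    return math.gamma(alpha + 1.0) ** 2 / math.gamma(2.0 * alpha + 2.0)

def I_alpha_midpoint(alpha: float, s: float = 0.0, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of u^alpha (1-u)^alpha over [s, 1-s]."""
    h = (1.0 - 2.0 * s) / n
    total = 0.0
    for i in range(n):
        u = s + (i + 0.5) * h
        total += (u * (1.0 - u)) ** alpha
    return h * total

def two_site_sum_ratio(N: int, alpha: float) -> float:
    """Ratio of sum_q a(q) a(N-q-1) to N^(2 alpha + 1) I_alpha(0)."""
    total = sum(a(q, alpha) * a(N - q - 1, alpha) for q in range(N))
    return total / (N ** (2.0 * alpha + 1.0) * I_alpha_closed_form(alpha))

if __name__ == "__main__":
    print(I_alpha_closed_form(2.0), I_alpha_midpoint(2.0))
    print(two_site_sum_ratio(200, 2.0), two_site_sum_ratio(2000, 2.0))
```

For α = 2 the integral equals 1/30, and the ratio approaches 1 with an error of order 1/N.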
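Theorem 21.5 (Sharp asymptotics of capacities) Let $S_*^1,S_*^2\subset S_*$ be non-empty
disjoint sets. Then
\[
\mathrm{cap}_{N,S}\big(\eta^{S_*^1},\eta^{S_*^2}\big) = \frac{1+o(1)}{2N^{\alpha+1}M_*\,|S_*|\,I_\alpha(0)\,\Gamma(\alpha)}
\times\inf_{W\in\mathcal W(S_*^1,S_*^2)}\sum_{x,y\in S_*}\mathrm{cap}_S(x,y)\,[W_y-W_x]^2, \tag{21.2.3}
\]
where $\mathrm{cap}_{N,S}$ denotes capacity for $Y$, $\mathrm{cap}_S$ denotes capacity for $X$, and
\[
\mathcal W(S_*^1,S_*^2) = \big\{W=(W_z)_{z\in S_*}\in[0,1]^{S_*}\colon W|_{S_*^1}=1,\ W|_{S_*^2}=0\big\}. \tag{21.2.4}
\]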
Remark 21.6 Note that the second line in (21.2.3) is the conductance between S∗1
and S∗2 of a resistor network on S∗ with conductances capS (x, y) between sites
x, y ∈ S∗ .
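The variational problem in the second line of (21.2.3) can be solved numerically for a small network. The sketch below is an illustration only: it assumes a symmetric conductance matrix `cap` (made-up values standing in for cap_S(x,y)), fixes W = 1 on one set and W = 0 on the other, and relaxes the remaining values by Gauss–Seidel sweeps, using that the minimiser is harmonic at the free sites. The function names are hypothetical, and every free site is assumed to have at least one positive conductance.

```python
def network_energy(cap, W):
    """Sum over ordered pairs (x, y) of cap[x][y] * (W[y] - W[x])**2."""
    n = len(W)
    return sum(cap[x][y] * (W[y] - W[x]) ** 2 for x in range(n) for y in range(n))

def minimise_energy(cap, ones, zeros, sweeps=2000):
    """Minimise the energy subject to W = 1 on `ones` and W = 0 on `zeros`.

    Each Gauss-Seidel update replaces W[x] at a free site by the
    conductance-weighted average of its neighbours (the harmonic condition).
    """
    n = len(cap)
    W = [1.0 if x in ones else 0.0 if x in zeros else 0.5 for x in range(n)]
    free = [x for x in range(n) if x not in ones and x not in zeros]
    for _ in range(sweeps):
        for x in free:
            total = sum(cap[x][y] for y in range(n) if y != x)
            W[x] = sum(cap[x][y] * W[y] for y in range(n) if y != x) / total
    return W, network_energy(cap, W)

if __name__ == "__main__":
    # Three sites in series with unit conductances (assumed example values)
    series = [[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]]
    print(minimise_energy(series, ones={0}, zeros={2}))
```

Since the sum in (21.2.3) runs over ordered pairs, the minimal energy is twice the effective conductance of the network; in the series example the effective conductance is 1/2 and the minimal energy is 1.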
Theorem 21.5 allows us to use Corollary 7.11 and Theorem 8.45 to obtain the
following result for the metastable exit times τM \ηx , x ∈ S∗ .
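Corollary 21.7 (Mean and exponential law of metastable exit times) For every $x\in S_*$
the metastable exit time $\tau_{\mathcal M\setminus\eta^x}$
(i) has asymptotic mean
\[
E_{\eta^x}\big[\tau_{\mathcal M\setminus\eta^x}\big] = [1+o(1)]\,\frac{N^{\alpha+1}M_*\,I_\alpha(0)\,\Gamma(\alpha)}{\sum_{y\in S_*\setminus\{x\}}\mathrm{cap}_S(x,y)}, \qquad N\to\infty, \tag{21.2.5}
\]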
Remark 21.8 Combining the previous remark with Corollary 21.7 we see that, in the
limit as $N\to\infty$, on the time scale $N^{\alpha+1}$ the zero-range process observed when it
hits the set $\mathcal M=\bigcup_{x\in S_*}\eta^x$ behaves like a continuous-time random walk with transition
rates $\bar r(x,y)$ given by $\bar r(x,y)=M_*I_\alpha(0)\Gamma(\alpha)\,\mathrm{cap}_S(x,y)/\sum_{z\in S_*\setminus\{x\}}\mathrm{cap}_S(x,z)$.
\[
\lim_{N\to\infty}L(N)=\infty, \qquad \lim_{N\to\infty}\frac{L(N)}{N}=0. \tag{21.2.7}
\]
In this case the transition rates $r(x,y)$ and the set $S_*$ will typically depend on $N$.
We suppress this dependence to lighten the notation.
Define
\[
\mathcal E_N = \mathcal E_N(S_*) = \bigcup_{x\in S_*}\mathcal E_N^x. \tag{21.2.8}
\]
For general disjoint non-empty sets S∗1 , S∗2 we can only derive a lower bound and
an upper bound for capN,S (EN (S∗1 ), EN (S∗2 )) that coincide up to a constant. But for
partitions of S∗ we can get more.
As before, Theorem 21.9 allows us to use Corollary 7.11 to obtain the follow-
ing result for the metastable exit times τEN \ENx , x ∈ S∗ , where we recall that νA,B
denotes the last-exit biased distribution on A for the transition from A to B.
Corollary 21.10 (Mean metastable exit times) Suppose that L(N ) satisfies condi-
tion (21.2.7). For every x ∈ S∗ the metastable exit time τEN \ENx has asymptotic mean
Remark 21.11 We would like to show that the assertion in Corollary 21.10 also
holds for the process starting in a single configuration ηx ∈ M , and that the law of
the exit time is exponential. In Bovier, Bianchi and Ioffe [25] such results were ob-
tained for the Curie-Weiss model with random magnetic field described in Chap. 15,
through the use of coupling techniques. Such techniques, however, seem difficult to
implement for the zero-range model.
In this section we derive lower bounds and upper bounds on capacities that coincide
in the limit as N → ∞ with L fixed. These bounds will be used in Sect. 21.4 to
prove Theorem 21.5.
Lemma 21.12 Let $S_*^1,S_*^2\subset S_*$ be non-empty disjoint sets, and let $W$ denote the
equilibrium potential for the capacitor $(\eta^{S_*^1},\eta^{S_*^2})$. Then there is a constant $K_\alpha$ such
that
\[
|W(\xi)-W(\xi')| \le K_\alpha\,\frac{L(L^2+N)}{N^{\alpha+1}r}, \qquad \xi,\xi'\in\mathcal E_N^z,\ z\in S_*, \tag{21.3.1}
\]
where $r=\inf_{u\in S}r(u,u\pm 1)$.
Proof Clearly,
\[
W(\xi) = P_\xi\big[\tau_{\eta^{S_*^1}}<\tau_{\eta^{S_*^2}}\big] \tag{21.3.2}
\]
Pξ [τ 1 < τξ ] L(L2 + N )
ηS∗
≤ Kα . (21.3.4)
Pξ [τξ < τξ ] N α+1 r
Remark 21.13 Lemma 21.12 is most useful when L is fixed and N → ∞, but it also
allows us to include cases with slowly growing L = L(N ).
21.3 Capacity estimates 517
1
N ξ
m∗
× , (21.3.6)
2N α+1 M∗ Iα (0) ZN,S a(ξ )
k=0 ξ ∈Ek,S0
where
\[
\mathcal H_N(A,B) = \{h\colon E_{N,S}\to\mathbb R_+\colon h|_A=1,\ h|_B=0\}. \tag{21.3.8}
\]
The strategy is to set rates to zero so that disjoint one-dimensional paths are ob-
tained. Afterwards we can use that the sum of the Dirichlet forms over the one-
dimensional paths, which are computable, yields a lower bound for the Dirichlet
form in (21.3.7), and hence for the capacity.
The construction goes as follows (see Figs. 21.1, 21.2, 21.3):
• For each $\xi\in E_{k,S}$, $k\in\{0,\dots,\ell_N\}$, we obtain a one-dimensional path as follows.
Let $\{\xi,p_{x,y}\}\in E_{N-1,S}$ be the configuration given by $\{\xi,p_{x,y}\}_z=\xi_z$ for
$z\in S\setminus\{x,y\}$, and $\{\xi,p_{x,y}\}_x=\xi_x+p$ and $\{\xi,p_{x,y}\}_y=\xi_y+N-k-p-1$ for
$p\in\{0,\dots,N-k-1\}$. For each pair $x,y\in S_*$, the one-dimensional path consists
of the path-segments $\{\xi,p_{x,y}\}$, where the excess $N-k$ particles on site $x$ jump
one by one until they reach site $y$ (only one particle is jumping at any time).
• The path-segments are disjoint for the following reason. Let $\{\xi,p_{x,y}\},\{\xi',p_{x,y}\}\in E_{N-1,S}$
be two different path-segments. Suppose that at some time $t$ these path-segments
coincide in a single configuration due to a jump of the jumping particle.
However, since $\{\xi,p_{x,y}\}$ and $\{\xi',p_{x,y}\}$ are different, the sites at which this
particle is at time $t$ in these segments must differ. In the next step particles from
different sites jump in such a way that the resulting configurations are different,
and hence the path-segments cannot merge.
With the construction described above, we obtain one-dimensional paths that consist
of a Dirichlet form of a zero-range process on two sites multiplied by a term that we
can estimate by the capacity of the underlying random walk.
2. Let $d_z\in E_{1,S}$ be the configuration with exactly one particle at site $z\in S$ (the
jumping particle). Let $h_*\in\mathcal H_N(\mathcal E_N(S_*^1),\mathcal E_N(S_*^2))$ be the minimiser of (21.3.7).
Then
\[
\begin{aligned}
\mathrm{cap}_{N,S}\big(\mathcal E_N(S_*^1),\mathcal E_N(S_*^2)\big) &\ge \mathscr E_N^{\mathrm{lb}}(h_*)\\
&= \tfrac14\sum_{k=0}^{\ell_N}\sum_{\xi\in E_{k,S}}\sum_{x,y\in S_*}\sum_{p=0}^{N-k-1}\sum_{z,w\in S}\mu_{N,S}\big(\{\xi,p_{x,y}\}+d_z\big)\,g\big(\{\xi,p_{x,y}\}_z+1\big)\,r(z,w)\\
&\qquad\times\big[h_*(\{\xi,p_{x,y}\}+d_w)-h_*(\{\xi,p_{x,y}\}+d_z)\big]^2.
\end{aligned}
\tag{21.3.9}
\]
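Here an extra factor $\tfrac12$ arises because the sum over $x,y$ counts the configurations
$\{\xi,p_{x,y}\}_{p=0}^{N-k-1}$, $\xi\in E_{k,S}$, $0\le k\le\ell_N$, twice. Inserting the definition of $g$ in (21.1.2)
and $\mu_N$ in (21.1.5) into (21.3.9), we get
\[
\begin{aligned}
\mathscr E_N^{\mathrm{lb}}(h_*) &= \frac{N^\alpha}{4Z_{N,S}M_*}\sum_{k=0}^{\ell_N}\sum_{\xi\in E_{k,S}}\sum_{x,y\in S_*}\frac{m_*^\xi}{a(\xi\setminus\{x,y\})}\\
&\quad\times\sum_{p=0}^{N-k-1}\frac{1}{a(\xi_x+p)\,a(\xi_y+N-k-p-1)}\\
&\quad\times\sum_{z,w\in S}m(z)r(z,w)\,\big[h_*(\{\xi,p_{x,y}\}+d_w)-h_*(\{\xi,p_{x,y}\}+d_z)\big]^2,
\end{aligned}
\tag{21.3.10}
\]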
where ξ \{x, y} is the configuration without the sites x, y. Next, fix x, y ∈ S∗ and
ξ ∈ Ek,S , and let fx,y : S → R be given by
Note that
\[
f_{x,y}\in\mathcal B(x,y) = \{f\colon S\to\mathbb R_+\colon f(x)=1,\ f(y)=0\}. \tag{21.3.12}
\]
Inserting $f_{x,y}$ into (21.3.10), we see that the sum over $z,w\in S$ equals
\[
2\,\mathscr E_S(f_{x,y})\,\big[h_*(\{\xi,p_{x,y}\}+d_x)-h_*(\{\xi,p_{x,y}\}+d_y)\big]^2, \tag{21.3.13}
\]
where $\mathscr E_S$ is the Dirichlet form associated with the random walk on $S$. Since $f_{x,y}\in\mathcal B(x,y)$
and $\mathscr E_S(f_{x,y})\ge\mathrm{cap}_S(x,y)$, we get from (21.3.10) that
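\[
\begin{aligned}
\mathscr E_N^{\mathrm{lb}}(h_*) &\ge \frac{N^\alpha}{2Z_{N,S}M_*}\sum_{k=0}^{\ell_N}\sum_{\xi\in E_{k,S}}\sum_{x,y\in S_*}\mathrm{cap}_S(x,y)\,\frac{m_*^\xi}{a(\xi\setminus\{x,y\})}\\
&\qquad\times\sum_{p=0}^{N-k-1}\frac{[h_*(\{\xi,p_{x,y}\}+d_x)-h_*(\{\xi,p_{x,y}\}+d_y)]^2}{a(\xi_x+p)\,a(\xi_y+N-k-p-1)}\\
&\ge \frac{N^\alpha}{2Z_{N,S}M_*}\sum_{k=0}^{\ell_N}\sum_{\xi\in E_{k,S}}\ \inf_{W(\xi)\in\mathcal W(S_*^1,S_*^2)}\ \sum_{x,y\in S_*}\mathrm{cap}_S(x,y)\,\frac{m_*^\xi}{a(\xi\setminus\{x,y\})}\\
&\qquad\times\inf_{\substack{h_{W(\xi)}\in\mathcal H_N(\mathcal E_N(S_*^1),\mathcal E_N(S_*^2))\\ h_{W(\xi)}(\eta)=W_x(\xi)\ \forall\,\eta\in\mathcal E_N^x\\ h_{W(\xi)}(\eta)=W_y(\xi)\ \forall\,\eta\in\mathcal E_N^y}}\ \sum_{p=0}^{N-k-1}\frac{[h_{W(\xi)}(\{\xi,p_{x,y}\}+d_x)-h_{W(\xi)}(\{\xi,p_{x,y}\}+d_y)]^2}{a(\xi_x+p)\,a(\xi_y+N-k-p-1)}\\
&= \frac{N^\alpha}{2Z_{N,S}M_*}\sum_{k=0}^{\ell_N}\sum_{\xi\in E_{k,S}}\ \inf_{W(\xi)\in\mathcal W(S_*^1,S_*^2)}\ \sum_{x,y\in S_*}\mathrm{cap}_S(x,y)\,\frac{m_*^\xi}{a(\xi\setminus\{x,y\})}\,\big[W_x(\xi)-W_y(\xi)\big]^2\\
&\qquad\times\inf_{h_\xi\in\mathcal H_N(\mathcal E_N(S_*^1\cup x),\mathcal E_N(S_*^2\cup y))}\ \sum_{p=0}^{N-k-1}\frac{[h_\xi(\xi_x+p+1)-h_\xi(\xi_x+p)]^2}{a(\xi_x+p)\,a(\xi_y+N-k-p-1)}.
\end{aligned}
\tag{21.3.14}
\]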
3. Due to the boundary conditions on the function $h_\xi$ (recall (21.3.8)), the last factor
in (21.3.14) reduces to
\[
\inf_{h_\xi\in\mathcal H_N(\mathcal E_N(S_*^1\cup x),\mathcal E_N(S_*^2\cup y))}\ \sum_{p=\ell_N-k+\xi_y}^{N-\ell_N-\xi_x-1}\frac{[h_\xi(\xi_x+p+1)-h_\xi(\xi_x+p)]^2}{a(\xi_x+p)\,a(\xi_y+N-k-p-1)}. \tag{21.3.15}
\]
This is just the Dirichlet form of a zero-range process living on the two sites $x$ and
$y$ only, which is minimized by the function
\[
H(x) = \frac{\sum_{q=\ell_N-k+\xi_x+\xi_y+1}^{x}a(q-1)\,a(\xi_x+\xi_y+N-k-q)}{\sum_{q=\ell_N-k+\xi_x+\xi_y+1}^{N-\ell_N}a(q-1)\,a(\xi_x+\xi_y+N-k-q)}. \tag{21.3.16}
\]
(21.3.17)
Inserting (21.3.17) into (21.3.15), we obtain
\[
\frac{1}{\sum_{q=\ell_N-k+\xi_x+\xi_y}^{N-\ell_N-1}a(q)\,a(\xi_x+\xi_y+N-k-q-1)}. \tag{21.3.18}
\]
Since this expression depends on the configuration $\xi$ only through the number of
particles $k$, for fixed $k$ it is bounded from below by $[1-O(\ell_N/N)]/(N^{2\alpha+1}I_\alpha(0))$
(recall (21.2.1)).
1
N
ENlb h∗ =
2ZN,S N α+1 M∗ Iα (0)
k=0 ξ ∈Ek,S
ξ
m∗ 2
inf capS (x, y) Wx − Wy + O(δ)
W ∈W (S∗1 ,S∗2 ) a(ξ \{x, y})
x,y∈S∗
2
× 1 − O(δ)
1
=
2ZN,S N α+1 M ∗ Iα (0)
N
m∗
ξ
× 1 + O(δ) inf capS (x, y)[Wx − Wy ]2 ,
W ∈W (S∗1 ,S∗2 ) a(ξ )
x,y∈S∗ k=0 ξ ∈Ek,S0
(21.3.19)
Remark 21.15
(1) Note that if $S_*^2=S_*\setminus S_*^1$, then Lemma 21.12 is not needed in the proof of
Proposition 21.14 and the bound in (21.3.6) holds with $\delta=\ell_N/N$. This fact
establishes the lower bound in Theorem 21.9.
(2) If $\delta\downarrow 0$, which occurs when $L$ is independent of $N$, then the same bound as in
(21.3.6) holds for $\mathrm{cap}_{N,S}(\eta^{S_*^1},\eta^{S_*^2})$. This is because Lemma 21.12 implies that
on the sets $\mathcal E_N^x$, $x\in S_*^1\cup S_*^2$, the equilibrium potential $W$ is close to 1 or to 0.
This fact establishes the lower bound in Theorem 21.5.
Let ρc ∈ (0, ∞) denote the critical particle density mentioned below Theorem 21.3.
Proposition 21.16 Let S∗1 , S∗2 ⊂ S∗ be non-empty disjoint sets. Then, for ε > 0,
1 2 LCε N −α−1
capN,S ηS∗ , ηS∗ ≤ capN,S EN S∗1 , EN S∗2 ≤
(N − Lρc )α
+ inf capS (x, y)[Wy − Wx ]2
W ∈W (S∗1 ,S∗2 )
x,y∈S∗
N −α−1
N mξ∗
N
× 1+O ,
ZN,S M∗ Iα (3ε) a(ξ ) εN
m=0 ξ ∈Em,S0
(21.3.20)
Proof The first inequality in (21.3.20) is obvious. The proof of the second inequality
comes in 8 Steps.
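1. Let $\mathcal U=\{u\in\mathbb R_+^S\colon \sum_{x\in S}u_x=1\}$. Define the sets
• $\mathcal F^{x,y}=\{u\in\mathcal U\colon u_x+u_y\ge 1-\varepsilon\}$,
• $\mathcal L^x=\{u\in\mathcal U\colon u_x>1-3\varepsilon\}$,
• $\mathcal D^{x,y}=\mathcal F^{x,y}\setminus\mathcal L^x$,
• $\mathcal I^{x,y}=\mathcal F^{x,y}\setminus\{\mathcal L^x\cup\mathcal L^y\}$ and
• $\mathcal I^x=\bigcup_{y\in S_*}\mathcal I^{x,y}$, for $x,y\in S_*$.
We need the following facts.
Lemma 21.17 The sets $\mathcal D^{x,y}$ and $\mathcal I^{x,y}$, $x\neq y\in S_*$, are mutually disjoint.
Proof Since $\mathcal I^{x,y}\cup\mathcal L^y=\mathcal D^{x,y}$, it is enough to prove the assertion for the sets
$\mathcal D^{x,y}$, $x\neq y\in S_*$. Let $x\neq y\neq z\in S_*$. Assume that $\eta/N\in\mathcal D^{x,y}\cap\mathcal D^{x,z}$. Then
$\eta_y/N,\ \eta_z/N\ge 2\varepsilon$, and we get a contradiction because
$1\ge(\eta_x+\eta_y+\eta_z)/N\ge 1-\varepsilon+\eta_z/N\ge 1-\varepsilon+2\varepsilon=1+\varepsilon$.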
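The case treated in the proof of Lemma 21.17 can be checked exhaustively on a small lattice. The sketch below enumerates all configurations of N particles on a few sites and verifies that, for each fixed x, the rescaled configuration η/N lies in D^{x,y} for at most one y ≠ x. The values N = 30, four sites and ε = 1/10 are assumptions made for the illustration, all sites are treated as elements of S_*, and exact rational arithmetic is used so that boundary cases are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

def in_D(eta, x, y, eps):
    """True if eta/N lies in D^{x,y}: u_x + u_y >= 1 - eps and not u_x > 1 - 3 eps."""
    N = sum(eta)
    in_F = eta[x] + eta[y] >= (1 - eps) * N
    in_L = eta[x] > (1 - 3 * eps) * N
    return in_F and not in_L

def overlapping_points(N, sites, eps):
    """Configurations eta for which some x has eta/N in D^{x,y} for two different y."""
    bad = []
    for head in product(range(N + 1), repeat=sites - 1):
        if sum(head) > N:
            continue
        eta = head + (N - sum(head),)
        for x in range(sites):
            hits = [y for y in range(sites) if y != x and in_D(eta, x, y, eps)]
            if len(hits) > 1:
                bad.append((eta, x, hits))
    return bad

if __name__ == "__main__":
    print(overlapping_points(30, 4, Fraction(1, 10)))
```

An empty list is consistent with the lemma: membership in two such sets would force the occupation fractions to add up to at least 1 + ε.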
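Note that the sets $\mathcal I^{x,y}$ are symmetric for $x\neq y\in S_*$. From Lemma 21.17 we
know that if $\eta/N\in\mathcal I^x$, then there exists a unique $y\in S_*\setminus\{x\}$ such that
$\eta/N\in\mathcal I^{x,y}$, and therefore $\eta/N\in\mathcal I^y$ (see Fig. 21.4).
2. For the construction of the test function we define a smooth function $h^{x,y}\colon\mathcal U\to[0,1]$,
$y\in S_*\setminus\{x\}$, such that
\[
\sum_{y\in S_*\setminus\{x\}}h^{x,y}(u)=1,\quad u\in\mathcal U, \qquad h^{x,y}(u)=1,\quad u\in\mathcal D^{x,y}, \tag{21.3.21}
\]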
random walk and on the harmonic function of the zero-range process on two sites,
\[
G^{x,y}(\eta) = \sum_{k=1}^{L-1}\big[f_{xy}(z_k)-f_{xy}(z_{k+1})\big]\,H\Big(\frac{\eta_x}{N}+\min\Big\{\frac1N\sum_{l=2}^{k}\eta_{z_l},\ \varepsilon\Big\}\Big), \tag{21.3.23}
\]
Lemma 21.18 $G^{x,y}$ belongs to the set $\mathcal H_N(\mathcal E_N^x,\mathcal E_N^y)$.
Proof Let $\eta\in\mathcal E_N^x$. Then, for $N$ large enough, $\eta_x/N>1-3\varepsilon$. Due to the boundary
condition in (21.3.25) the harmonic function $H$ in (21.3.23) takes the value 1 for
each $k$. Hence
\[
G^{x,y}(\eta) = \sum_{k=1}^{L-1}\big[f_{xy}(z_k)-f_{xy}(z_{k+1})\big] = 1. \tag{21.3.26}
\]
Let $\eta\in\mathcal E_N^y$. Then, for $N$ large enough, $\sum_{z\in S\setminus\{y\}}\eta_z/N<2\varepsilon$, and again, through the
boundary condition (21.3.25), the harmonic function $H$ is always 0, which implies
$G^{x,y}(\eta)=0$.
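Proof Let $\eta\in\mathcal E_N^x$, $x\in S_*^1$. Then $\eta/N\in\mathcal L^x$ and $g^{x'}(\eta/N)=1_{\{x'=x\}}$. Moreover,
$G_W^{x,y}(\eta)=W_y+G^{x,y}(\eta)(W_x-W_y)=W_y+1-W_y=1$ because $\eta\in\mathcal E_N^x$. Therefore
$G^{x,y}(\eta)=1$ for all $y\in S_*\setminus\{x\}$, and $W_x=1$ because $x\in S_*^1$. Hence
\[
G_W^S(\eta) = \sum_{x'\in S_*}g^{x'}(\eta/N)\,G_W^{x'}(\eta) = \sum_{y\in S_*\setminus\{x\}}h^{x,y}(\eta/N)\,G_W^{x,y}(\eta) = \sum_{y\in S_*\setminus\{x\}}h^{x,y}(\eta/N) = 1. \tag{21.3.29}
\]
Let $\eta\in\mathcal E_N^x$, $x\in S_*^2$. Then $\eta/N\in\mathcal L^x$ and $g^{x'}(\eta/N)=1_{\{x'=x\}}$. Moreover, $W_x=0$
because $x\in S_*^2$. Therefore $G^{x,y}=1$ because $\eta\in\mathcal E_N^x$, and $G_W^{x,y}(\eta)=W_y+0-W_y=0$
for all $y\in S_*\setminus\{x\}$. Hence
\[
G_W^S(\eta) = \sum_{x'\in S_*}g^{x'}(\eta/N)\,G_W^{x'}(\eta) = \sum_{y\in S_*\setminus\{x\}}h^{x,y}(\eta/N)\,G_W^{x,y}(\eta) = 0, \tag{21.3.30}
\]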
Lemma 21.20 Let $\eta\in\mathcal F_N^{x,y}=\{\eta\in E_N\colon \eta_x+\eta_y\ge N-\ell_N\}$, $x,y\in S_*$. Then
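Proof We start with the last equation. For this we show that $G^{x,y}(\eta)+G^{y,x}(\eta)=1$
for $\eta/N\in\mathcal F^{x,y}$. If $\eta/N\in\mathcal L^x$, then this equality holds because $G^{x,y}(\eta)=1$ and
$G^{y,x}(\eta)=0$. The same holds for $\eta/N\in\mathcal L^y$. For $\eta/N\in\mathcal I^{x,y}$, let $z_1,\dots,z_L$ be the
enumeration obtained from $f_{x,y}$ and $w_1,\dots,w_L$ the enumeration obtained from $f_{y,x}$.
Since $f_{x,y}+f_{y,x}=1$, we can choose $w_{k+1}=z_{L-k}$ and get
\[
\begin{aligned}
G^{x,y}(\eta)+G^{y,x}(\eta)
&= \sum_{k=1}^{L-1}\big[f_{x,y}(z_k)-f_{x,y}(z_{k+1})\big]\,H\Big(\sum_{n=1}^{k}\frac{\eta_{z_n}}{N}\Big)
 + \sum_{k=1}^{L-1}\big[f_{y,x}(w_k)-f_{y,x}(w_{k+1})\big]\,H\Big(\sum_{n=1}^{k}\frac{\eta_{w_n}}{N}\Big)\\
&= \sum_{k=1}^{L-1}\big[f_{x,y}(z_k)-f_{x,y}(z_{k+1})\big]\,H\Big(\sum_{n=1}^{k}\frac{\eta_{z_n}}{N}\Big)
 + \sum_{k=1}^{L-1}\big[f_{y,x}(z_{L-k+1})-f_{y,x}(z_{L-k})\big]\,H\Big(\sum_{n=1}^{k}\frac{\eta_{z_{L-n+1}}}{N}\Big)\\
&= \sum_{k=1}^{L-1}\big[f_{x,y}(z_k)-f_{x,y}(z_{k+1})\big]\,H\Big(\sum_{n=1}^{k}\frac{\eta_{z_n}}{N}\Big)
 + \sum_{k=1}^{L-1}\big[f_{x,y}(z_k)-f_{x,y}(z_{k+1})\big]\,H\Big(\sum_{n=k+1}^{L}\frac{\eta_{z_n}}{N}\Big)\\
&= \sum_{k=1}^{L-1}\big[f_{x,y}(z_k)-f_{x,y}(z_{k+1})\big]\,\Big[H\Big(\sum_{n=1}^{k}\frac{\eta_{z_n}}{N}\Big)+H\Big(\sum_{n=k+1}^{L}\frac{\eta_{z_n}}{N}\Big)\Big]\\
&= \sum_{k=1}^{L-1}\big[f_{x,y}(z_k)-f_{x,y}(z_{k+1})\big]\,H(N-3\varepsilon N) = 1.
\end{aligned}
\tag{21.3.32}
\]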
Moreover, η/N ∈ D x,y and η/N ∈ D y,x . Hence hx,y (η/N ) = 1{y =y} and hy,y (η/
N) = 1{y =x} . Thus, (21.3.34) equals
x,y
y,y
x,y
1
2 h (η/N )GW (η) + 1
2 hy,y (η/N )GW (η)
y ∈S∗ \{x} y ∈S∗ \{x}
x,y y,x x,y x,y x,y
= 12 GW (η) + 12 GW (η) = 12 GW (η) + 12 GW (η) = GW (η), (21.3.35)
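4. Using the test function $G_W^S$, we can now derive an upper bound for the desired
capacity. For $x\in S_*$, let $\mathcal F_N^x=\bigcup_{y\in S_*\setminus\{x\}}\mathcal F_N^{x,y}$ and $\mathcal F_N=\bigcup_{x\in S_*}\mathcal F_N^x$. The Dirichlet
form $\mathscr E_N(G_W^S)$ of $G_W^S$ (not to be confused with the set $\mathcal E_N$ in (21.2.8)) is
\[
\begin{aligned}
\mathscr E_N(G_W^S) &= \tfrac12\sum_{\eta\in E_{N,S}}\sum_{z,w\in S}\mu_{N,S}(\eta)\,g(\eta_z)\,r(z,w)\,\big[G_W^S(\eta^{z,w})-G_W^S(\eta)\big]^2\\
&= \tfrac12\sum_{x,y\in S_*}\sum_{\eta\in\mathcal F_N^{x,y}}\sum_{z,w\in S}\mu_{N,S}(\eta)\,g(\eta_z)\,r(z,w)\,\big[G_W^S(\eta^{z,w})-G_W^S(\eta)\big]^2\\
&\quad+\tfrac12\sum_{\eta\in\mathcal F_N^c}\sum_{z,w\in S}\mu_{N,S}(\eta)\,g(\eta_z)\,r(z,w)\,\big[G_W^S(\eta^{z,w})-G_W^S(\eta)\big]^2\\
&= \sum_{x,y\in S_*}\mathscr E_N\big(G_W^S\,|\,\mathcal F_N^{x,y}\big)+\mathscr E_N\big(G_W^S\,|\,\mathcal F_N^c\big).
\end{aligned}
\tag{21.3.36}
\]
Thus, we have to estimate the Dirichlet form on the set of configurations $\mathcal F_N^c$ and
the set of configurations $\mathcal F_N^{x,y}$, $x,y\in S_*$.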
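5. We start with the set of configurations $\mathcal F_N^{x,y}$ with fixed $x\neq y\in S_*$. It follows from
Lemma 21.20 that
\[
\begin{aligned}
\mathscr E_N\big(G_W^S\,|\,\mathcal F_N^{x,y}\big)
&= \tfrac12\sum_{1\le i,j\le L}\sum_{\eta\in\mathcal F_N^{x,y}}\mu_{N,S}(\eta)\,g(\eta_{z_i})\,r(z_i,z_j)\,\big[G_W^S(\eta^{z_i,z_j})-G_W^S(\eta)\big]^2\\
&= \tfrac12\sum_{1\le i,j\le L}\sum_{\eta\in\mathcal F_N^{x,y}}\mu_{N,S}(\eta)\,g(\eta_{z_i})\,r(z_i,z_j)\,\big[G_W^{x,y}(\eta^{z_i,z_j})-G_W^{x,y}(\eta)\big]^2\\
&= \tfrac12\sum_{1\le i,j\le L}\sum_{\eta\in\mathcal F_N^{x,y}}\mu_{N,S}(\eta)\,g(\eta_{z_i})\,r(z_i,z_j)\,\big[(W_x-W_y)\big(G^{x,y}(\eta^{z_i,z_j})-G^{x,y}(\eta)\big)\big]^2\\
&= (W_x-W_y)^2\,\mathscr E_N\big(G^{x,y}\,|\,\mathcal F_N^{x,y}\big).
\end{aligned}
\tag{21.3.37}
\]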
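Furthermore,
\[
\begin{aligned}
\mathscr E_N\big(G^{x,y}\,|\,\mathcal F_N^{x,y}\big)
&= \tfrac12\sum_{1\le i,j\le L}\sum_{\eta\in\mathcal F_N^{x,y}}\mu_{N,S}(\eta)\,g(\eta_{z_i})\,r(z_i,z_j)\,\big[G^{x,y}(\eta^{z_i,z_j})-G^{x,y}(\eta)\big]^2\\
&= \frac{N^\alpha}{2Z_{N,S}}\sum_{1\le i,j\le L}\sum_{\eta\in\mathcal F_N^{x,y}}\frac{m_*^\eta}{a(\eta)}\,\frac{a(\eta_{z_i})}{a(\eta_{z_i}-1)}\,r(z_i,z_j)\,\big[G^{x,y}(\eta^{z_i,z_j})-G^{x,y}(\eta)\big]^2\\
&\le \frac{N^\alpha}{2Z_{N,S}M_*}\sum_{1\le i,j\le L}m(z_i)\,r(z_i,z_j)
\sum_{\substack{\xi\in E_{N-1,S}\\ \xi_x+\xi_y\ge N-\ell_N-1}}\frac{m_*^\xi}{a(\xi)}\,\big[G^{x,y}(\xi+d_{z_j})-G^{x,y}(\xi+d_{z_i})\big]^2,
\end{aligned}
\tag{21.3.38}
\]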
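\[
\sum_{m=0}^{\ell_N}\sum_{\zeta\in E_{m,S\setminus\{x,y\}}}\frac{m_*^\zeta}{a(\zeta)}\sum_{p=2\varepsilon N}^{N-3\varepsilon N-1}\frac{1}{a(p)\,a(N-m-p-1)}
\Big(\sum_{k=i}^{j-1}\big[f_{xy}(z_k)-f_{xy}(z_{k+1})\big]\Big[H\Big(\frac{p+m_k+1}{N}\Big)-H\Big(\frac{p+m_k}{N}\Big)\Big]\Big)^2. \tag{21.3.40}
\]
The sum over $p$ only runs from $2\varepsilon N$ to $N-3\varepsilon N-1$, due to the boundary
conditions in (21.3.25). Inserting the explicit form (21.3.24), we get
\[
\sum_{m=0}^{\ell_N}\sum_{\zeta\in E_{m,S\setminus\{x,y\}}}\frac{m_*^\zeta}{a(\zeta)}\sum_{p=2\varepsilon N}^{N-3\varepsilon N-1}\frac{1}{a(p)\,a(N-m-p-1)}
\Big(\sum_{k=i}^{j-1}\big[f_{xy}(z_k)-f_{xy}(z_{k+1})\big]\,\frac{a(p+m_k)\,a(N-m-m_k-p-1)}{\sum_{q=3\varepsilon N}^{N-3\varepsilon N-1}a(q)\,a(N-m-q-1)}\Big)^2. \tag{21.3.41}
\]
Since $m_k\le\ell_N$ and $p\ge 2\varepsilon N$, we can estimate
\[
a(p+m_k) = a(p)\Big(1+\frac{m_k}{p}\Big)^{\alpha} \le a(p)\Big(1+\frac{\ell_N}{2\varepsilon N}\Big)^{\alpha} \le a(p)\Big[1+O\Big(\frac{\ell_N}{\varepsilon N}\Big)\Big], \tag{21.3.42}
\]
and $a(N-m-m_k-p-1)\le a(N-m-p-1)$. Inserting these estimates into
(21.3.41) we get the upper bound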
N ζ N −3εN−1
m∗ 1
a(ζ ) a(p)a(N − m − p − 1)
m=0 ζ ∈Em,S\{x,y} p=2εN
j −1
a(p)a(N − m − p − 1)
× fxy (zk ) − fxy (zk+1 ) N −3εN −1
k=i q=3εN a(q)a(N − m − q − 1)
2
N
× 1+O
εN
j −1 2
N m∗
ζ
= fxy (zi ) − fxy (zj )
a(ζ )
m=0 ζ ∈Em,S\{x,y} k=i
N −3εN −1
p=2εN a(p)a(N − m − p − 1) N
× N −3εN −1 1+O . (21.3.43)
( q=3εN a(q)a(N − m − q − 1))2 εN
εN
R =O . (21.3.46)
N − εN
Nα 2
m(zi )r(zi , zj ) fxy (zi ) − fxy (zj )
2ZN,S M∗
1≤i,j ≤L
N ζ
m∗ 1 N
× 1 + O .
a(ζ ) N −3εN −1 a(p)a(N − m − p − 1) εN
m=0 ζ ∈Em,S\{x,y} p=3εN
(21.3.47)
The sum over i, j in (21.3.47) is just the capacity of the underlying random walk
between the two sites x and y. Since
N −3εN−1
N
a(p)a(N − m − p − 1) ≥ N 2α+1 Iα (3ε) 1 − O , (21.3.48)
εN
p=3εN
N ζ
x,y capS (x, y) m∗ N
EN Gx,y | FN ≤ 1+O .
N α+1 ZN,S M∗ Iα (3ε) a(ζ ) εN
m=0 ζ ∈Em,S0
(21.3.49)
6. Next we do the computation of EN (GSW | FNc ).
\[
\max_{\eta\in E_{N,S}\setminus F_N} \bigl| G^S_W(\eta^{z,w}) - G^S_W(\eta) \bigr| \le \frac{C_\varepsilon}{N}.
\tag{21.3.50}
\]
Proof Write
S z,w x z,w x z,w
G η
− GW (η) =
S
g η /N GW η − g (η/N )GW (η).
x x
W
x∈S∗
(21.3.51)
x x,y z,w C
= g (η/N) hx,y z,w
η /N GW η 1+
N
x∈S∗ y∈S∗ \{x}
−h x,y x,y
(η/N )GW (η) . (21.3.52)
Since $h^{x,y}$ is also a smooth function, there exists a constant $C$ such that $|h^{x,y}(\eta^{z,w}/N) - h^{x,y}(\eta/N)| \le C/N$. Hence (21.3.52) is at most
x x,y C 2
≤ − GW (η)
x,y
g (η/N ) hx,y (η/N) GW ηz,w 1 +
N
x∈S∗ y∈S∗ \{x}
= g x (η/N ) hx,y (η/N)
x∈S∗ y∈S∗ \{x}
x,y 2C C2 x,y z,w
×
x,y
GW ηz,w − GW (η) + + 2 GW η
N N
x
≤ g (η/N )
x∈S∗
C
× hx,y (η/N) (Wx − Wy ) Gx,y ηz,w − Gx,y (η) +
N
y∈S∗ \{x}
x,y
≤ g x (η/N ) h (η/N)|Wx − Wy |Gx,y ηz,w − Gx,y (η) + C ,
N
x∈S∗ y∈S∗ \{x}
(21.3.53)
x,y
where we use that GW (η) ≤ 1 for all η ∈ EN,S . It remains to estimate
maxc Gx,y ηz,w − Gx,y (η). (21.3.54)
η∈FN
mk + 1
ηx ηx mk
= maxc fx,y (zk ) − fx,y (zk+1 ) H + −H +
η∈FN N N N N
k=i
j −1
a(ηx + mk )a(N − ηx − mk − 1)
= maxc fx,y (zk ) − fx,y (zk+1 ) N −3εN
η∈FN a(p − 1)a(N − p)
k=i p=3εN+1
j −1
a(N/2)a(N/2)
≤ maxc fx,y (zk ) − fx,y (zk+1 ) N −3εN
η∈FN a(p − 1)a(N − p)
k=i p=3εN +1
(N/2)2α
≤ N −3εN . (21.3.55)
p=3εN+1 a(p − 1)a(N − p)
N −3εN N
Since p=3εN+1 a(p − 1)a(N − p) ≥ N 2α+1 Iα (3ε)[1 − O( εN )], we get that
there exists a constant Cε such that
(N/2)2α N Cε
r.h.s. (21.3.55) ≤ 2α+1
1 + O = . (21.3.56)
N Iα (3ε) εN N
Thus
\[
\text{r.h.s. (21.3.53)} \le \frac{C_\varepsilon}{N} \sum_{x\in S_*} g^x(\eta/N) \sum_{y\in S_*\setminus\{x\}} h^{x,y}(\eta/N) = \frac{C_\varepsilon}{N},
\tag{21.3.57}
\]
7. In order to proceed with the computation, we need a technical result that follows
from Großkinsky and Spohn [132]. The first statement says that all excess parti-
cles accumulate on a single site in S∗ . The second statement says that if there is a
constraint on the maximal occupation number of a single site, then as many excess
particles as possible accumulate on a single site.
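Proposition 21.22 Let $Z_{N,S}(k)$ be the constrained partition function with the condition $\eta_z < k$ for all $\eta \in E_{N,S}$ and $z \in S$. Then:
(i) $Z_{N,S}(k) = \sum_{x\in S_*} \dfrac{N^\alpha}{(N-\rho_c L)^\alpha (\rho_c L)^\alpha}\, Z_{\rho_c L,\,S\setminus\{x\}}\, [1+o(1)]$ for $k \ge N - \rho_c L$.
(ii) $Z_{N,S}(k) = \sum_{x\in S_*} \dfrac{N^\alpha}{(N-k)^\alpha k^\alpha}\, Z_{N-k,\,S\setminus\{x\}}(k)\, [1+o(1)]$ for $k < N - \rho_c L$.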
Proof Write
N −εN
ξ
Nα 1 m∗
μN,S (η) =
ZN,S kα a(ξ )
η∈A x k=εN ξ ∈EN−k,S\{x}
ξy <N−k−N ∀y∈S\{x}
N −εN
Nα 1 1
= ZN −k,S\{x} (N − k − N ). (21.3.59)
ZN,S k (N − k)α
α
k=εN
N −εN
1 (N − k)α ZN ,S\{x,y} (N − k − N )
=N α
k α (N − k)α αN (N − k − N )α ZN,S
εN y∈S∗ \{x}
× 1 + o(1) . (21.3.60)
N −εN
1 1
Nα
kα αN (N − k − N )α
k=εN y∈S∗ \{x}
αN [1 + o(1)] Z(L−2)ρc ,S\{x,y,z}
× . (21.3.61)
(N − (L − 2)ρc )α (ρc (L − 2))α ZN,S
z∈S∗ \{x,y}
r.h.s. (21.3.61)
N −εN
1 |S∗ − 1| |S∗ − 2| (N − (L − 2)ρc )α
≤ 1 + o(1)
k α (N − k − N )α (N − (L − 2)ρc )α |S∗ − 2|
k=εN
N −εN
N − (L − 2)ρc α 1 1
= |S∗ − 1| 1 + o(1)
N − (L − 2)ρc k (N − k − N )
α α
k=εN
(N −
N )/2
N − (L − 2)ρc α 1 1
≤ 2α+1 |S∗ − 1| 1 + o(1)
N − N (N − (L − 2)ρc )α k α
k=εN
α
2α+1 |S∗ − 1| N − (L − 2)ρc 1 1
≤ 1+
α−1 N − N (N − (L − 2)ρc )α ε α−1 N α−1
× 1 + o(1)
C∗ 1
≤ . (21.3.63)
(N − Lρc ) ε
α α−1 N α−1
This settles the claim.
8. Observe that configurations out of the set z∈S\S∗ {η ∈ EN,S : ηz ≥ N − N } do
not contribute to the Dirichlet form, because for these configurations Gx,y = 0 for
all x, y ∈ S∗ . Combining Lemmas 21.21–21.23, we get
1 2
EN GSW | FNc = μN,S (η)g(ηz )r(z, w) GSW ηz,w − GSW (η)
2 c η∈FN z,w∈S
21.4 Proof of the main theorems 533
1 2
≤ μN,S (η)r(z, w) GSW ηz,w − GSW (η)
2
x∈S∗ η∈EN \FNx z,w∈S
Cε2
≤ μ N,S (η) r(z, w)
2N 2 x
x∈S∗ η∈A z,w∈S
LC∗ Cε2 1
≤ . (21.3.64)
(N − Lρc )α N α+1
Combine (21.3.36), (21.3.37), (21.3.49) and (21.3.64) to arrive at
capN,S EN S∗1 , EN S∗2
= inf EN (G)
G∈HN (EN (S∗1 ),EN (S∗2 ))
≤ inf EN GSW
W ∈W (S∗1 ,S∗2 )
x,y
= inf EN GS | FN [Wy − Wx ]2 + EN GSW | FNc
W ∈W (S∗1 ,S∗2 )
x,y∈S∗
≤ inf capS (x, y)[Wy − Wx ]2
W ∈W (S∗1 ,S∗2 )
x,y∈S∗
N −α−1
N mξ∗
N
× 1+O
2ZN,S M∗ Iα (3ε) a(ξ ) εN
m=0 ξ ∈Em,S0
LC∗ Cε2 1
+ . (21.3.65)
(N − Lρc )α N α+1
This completes the proof.
Proof To prove Theorem 21.5, recall the remark made after the proof of Proposi-
tion 21.14. Using Lemma 21.2, we get for L fixed and N → ∞ the lower bound
1 2 N −α−1 [1 + o(1)]
capN,S ηS∗ , ηS∗ ≥
2ZS M∗ Iα (0)
N
m∗
ξ
× inf capS (x, y)[Wx − Wy ]
2
.
W ∈W (S∗1 ,S∗2 ) a(ξ )
x,y∈S∗ k=0 ξ ∈Ek,S0
(21.4.1)
into (21.3.20) and use Lemma 21.2. This gives the upper bound for (21.2.3).
1. In the case where S∗1 and S∗2 are a partition of S∗ , the lower bound in Proposi-
tion 21.14 takes the form
1
capN,S EN S∗1 , EN S∗2 ≥ capS (x, y)
N α+1 M ∗ Iα (0)
x∈S∗1 ,y∈S∗2
1
N
m∗
ξ
× 1 − O(δ) . (21.4.4)
ZN,S a(ξ )
k=0 ξ ∈Ek,S0
N
1 m∗
ξ
ZN,S = N α 1 + o(1)
(N − k)α a(ξ )
x∈S∗ k=0 ξ ∈Ek,S\{x}
α
N m∗
ξ
N
≤ 1+ 1 + o(1)
N − N a(ξ )
x∈S∗ k=0 ξ ∈Ek,S\{x}
N m∗
ξ
= |S∗ | 1 + o(1)
a(ξ )
k=0 ξ ∈Ek,S\{x∗ }
" m∗ (z)k
≤ |S∗ | 1 + o(1)
a(k)
z∈S\{x∗ } k≥0
21.5 Proof that the condensate configurations form a metastable set 535
"
= |S∗ | Γz 1 + o(1)
z∈S\{x∗ }
"
= |S∗ |Γ (α) Γz 1 + o(1) . (21.4.5)
z∈S0
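Hence we get
\[
\mathrm{cap}_{N,S}\bigl(E_N(S_*^1), E_N(S_*^2)\bigr) \ge \frac{1}{N^{\alpha+1} M_* |S_*|\, I_\alpha(0)\, \Gamma(\alpha)}\, [1+o(1)] \sum_{x\in S_*^1,\, y\in S_*^2} \mathrm{cap}_S(x,y),
\tag{21.4.6}
\]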
2. In the case where S∗1 and S∗2 are a partition of S∗ , we get from (21.3.20), together
with the estimate in (21.4.3), the upper bound
ZS 1
capN,S EN S∗1 , EN S∗2 ≤
ZN,S N α+1 M∗ |S∗ | Iα (0) Γ (α)
N C N
× capS (x, y) 1 + O . (21.4.7)
1 2
N
x∈S∗ ,y∈S∗
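Since $Z_S / Z_{N,S} = 1 + o(1)$, we obtain
\[
\mathrm{cap}_{N,S}\bigl(E_N(S_*^1), E_N(S_*^2)\bigr) \le \frac{1}{N^{\alpha+1} M_* |S_*|\, I_\alpha(0)\, \Gamma(\alpha)}\, [1+o(1)] \sum_{x\in S_*^1,\, y\in S_*^2} \mathrm{cap}_S(x,y),
\tag{21.4.8}
\]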
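Proposition 21.24
\[
\frac{\sup_{\eta\in\mathcal{M}} \bigl[\mathrm{cap}_{N,S}(\eta, \mathcal{M}\setminus\eta)/\mu_{N,S}(\eta)\bigr]}{\inf_{\xi\in\mathcal{M}^c} \bigl[\mathrm{cap}_{N,S}(\xi, \mathcal{M})/\mu_{N,S}(\xi)\bigr]} \le O\!\left(\frac{L(L^2+N)}{N^{\alpha+1}\, r}\right).
\tag{21.5.1}
\]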
Proof The following lemma bounds the denominator in (21.5.1) from below.
\[
\frac{\mathrm{cap}_{N,S}(\xi, \mathcal{M})}{\mu_{N,S}(\xi)} \ge \frac{r}{L(L^2+N)\,|S_*|\,K(\alpha)}, \qquad \xi \in \mathcal{M}^c,
\tag{21.5.2}
\]
Proof First we give the proof for |S∗ | = 3. After that we extend the proof to |S∗ | > 3
with the help of an algorithm.
• |S∗ | = 3. Let S∗ = {x, y, z} and r(x, y) ≥ r(x, z) ≥ r(y, z). To estimate the ca-
pacity from below we only need one path from ξ to M . First, we let all particles
of a valley (recall (8.2.10) for the definition of valleys) jump onto its attractor. The
resulting configuration, where only the three attractors x, y, z in the three valleys
A(x), A(y), A(z) are occupied, is called η. Assume that on y there are more parti-
cles than on x (see Fig. 21.5). Next, since r(x, y) ≥ r(x, z), r(y, z), and since y is
occupied by more particles than x, we let all particles from x jump to y. The result-
ing configuration with only the sites y and z occupied, is called σ x,y (see Fig. 21.5).
Finally, we pick the site y or z where σ xy has the highest occupation number and
move all particles to that site.
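Since
\[
\mathrm{cap}_{N,S}(\xi, \mathcal{M}) \ge \frac{1}{\mathrm{cap}^{-1}_{N,S}(\xi,\eta) + \mathrm{cap}^{-1}_{N,S}(\eta,\sigma^{xy}) + \mathrm{cap}^{-1}_{N,S}(\sigma^{xy},\mathcal{M})},
\tag{21.5.3}
\]
we must calculate a lower bound for each of the three capacities in the right-hand side of (21.5.3).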
For $w \in S_*$, let $(w)_n$ be a distance-to-$w$-increasing enumeration of $A(w)\setminus\{w\}$, and let $\xi_w^i = \xi_w + \sum_{j=1}^{i-1} \xi_{w_j}$ be the number of particles on site $w$ before the particles on site $w_i$ jump onto $w$. Let $|w| = \max\{\mathrm{dist}(w,w_i)\colon w \in S_*,\, w_i \in (w)_n\}$, where $\mathrm{dist}(w,w_i)$ is the Euclidean distance. For each transition of the particles from site $w_i$ to site $w$ via nearest-neighbour jumps, we use the explicit formula for the capacity of the one-dimensional chain. Doing so, we get for the capacity between $\xi$ and $\eta$ the lower bound
−1
|(w)
n |
wi ξ
|w|(N − 1)α
cap−1
N,S (ξ, η) ≤ , (21.5.4)
μN,S ({ξwi − k, ξwi + k})r N α
w∈S∗ i=1 k=0
Note that the μN -measure of the configuration increases after each transition of a
particle, because there are more particles on condensate-sites. Therefore (21.5.4)
equals
−1
|(w)
n |
wi ξ
ZN,S |w| a({ξwi − k, ξwi + k}\{wi , w})
Nα {ξw −k,ξwi +k}\{wi }
w∈S∗ i=1 k=0 m i ∗
k α k α 1 α
× 1− 1+ i 1− m∗ (wi )k . (21.5.6)
ξwi ξw N
k=0
|(w)
n | |w|LK (α) |(w)
n | L2 K (α)
cap−1
N,S (ξ, η) ≤ ≤ .
μN,S ({ξwi , ξwi })r μN,S ({ξwi , ξwi })r
w∈S∗ i=1 w∈S∗ i=1
(21.5.7)
If ηx = N or ηy = N , then we can stop here, because we already have a configura-
tion in M . Otherwise we continue the estimation of the capacities.
Without loss of generality, let ηx ≤ ηy . A similar estimation of the formula for
the one-dimensional chain yields
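\[
\begin{aligned}
\mathrm{cap}^{-1}_{N,S}\bigl(\eta, \sigma^{xy}\bigr)
&\le \sum_{k=0}^{\eta_x-1} \frac{\mathrm{dist}(x,y)\, K(\alpha)\, (N-1)^\alpha}{\mu_{N,S}(\{\eta_x-k,\, \eta_y+k\})\, r\, N^\alpha} \\
&= \frac{Z_{N,S}\, \mathrm{dist}(x,y)\, K(\alpha)}{N^\alpha\, r}\, \frac{a(\eta\setminus\{x,y\})}{m_*^\eta} \sum_{k=0}^{\eta_x-1} \frac{a(\eta_x-k)\, a(\eta_y+k)\, (N-1)^\alpha}{N^\alpha} \\
&= \frac{\mathrm{dist}(x,y)\, K(\alpha)}{\mu_{N,S}(\eta)\, r} \sum_{k=0}^{\eta_x-1} \Bigl(1-\frac{k}{\eta_x}\Bigr)^{\alpha} \Bigl(1+\frac{k}{\eta_y}\Bigr)^{\alpha} \Bigl(1-\frac{1}{N}\Bigr)^{\alpha},
\end{aligned}
\tag{21.5.8}
\]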
where {ηx − k, ηy + k} ∈ EN,S is the configuration where all sites are empty except
for site x with ηx − k particles, site y with ηy + k particles and site z with ηz
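particles. Observe that $\eta_x/\eta_y \le 1$. For all $k \in \{1,\dots,\eta_x-1\}$, we have
\[
\Bigl(1-\frac{1}{N}\Bigr)^{\alpha} \le 1, \qquad \Bigl(1-\frac{k}{\eta_x}\Bigr)^{\alpha} \le 1, \qquad \Bigl(1+\frac{k}{\eta_y}\Bigr)^{\alpha} \le 2^{\alpha}.
\tag{21.5.9}
\]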
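The three elementary bounds in (21.5.9) can be checked numerically. The following is a minimal sketch, not part of the original proof: the function names `chain_sum` and `crude_bound` are hypothetical, and the parameter values in the usage note are arbitrary illustrations. It evaluates the sum over $k$ appearing in the last line of (21.5.8) and compares it with the crude bound $2^{\alpha} N$ that follows from (21.5.9) when $\eta_x \le \eta_y$ and $\eta_x < N$.

```python
def chain_sum(eta_x, eta_y, n, alpha):
    """Sum over k = 0, ..., eta_x - 1 of the product of the three factors
    (1 - k/eta_x)^alpha * (1 + k/eta_y)^alpha * (1 - 1/n)^alpha."""
    return sum(
        (1 - k / eta_x) ** alpha
        * (1 + k / eta_y) ** alpha
        * (1 - 1 / n) ** alpha
        for k in range(eta_x)
    )


def crude_bound(n, alpha):
    """Bound 2^alpha * n: each of the eta_x < n terms is at most 2^alpha."""
    return 2 ** alpha * n


if __name__ == "__main__":
    # Illustrative values only (assumption): eta_x <= eta_y and eta_x + eta_y <= n.
    for eta_x, eta_y, n, alpha in [(3, 5, 10, 2.0), (20, 30, 60, 3.5), (1, 1, 2, 1.5)]:
        s = chain_sum(eta_x, eta_y, n, alpha)
        print(eta_x, eta_y, n, alpha, s, crude_bound(n, alpha), s <= crude_bound(n, alpha))
```

The bound is far from sharp, but only the order of magnitude in $N$ matters for the estimate of the quotient in (21.5.1).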
Since $\eta_x, \eta_y < N$, we can estimate the sum in (21.5.8) from above by $2^{\alpha} N$, and get
Since there are more particles on the sites of S∗ in the configurations {ξwi , ξwi }, η
and σ xy than in the configuration ξ , we have
• $|S_*| > 3$. The following algorithm generalizes the argument to $|S_*| > 3$. Fix a configuration $\xi \notin \mathcal{M}$. First, we let all particles in a valley jump onto its attractor. From the resulting configuration we construct a labeled tree. Each attractor corresponds to a leaf of the tree that is labeled with its occupation number. The local maxima of the potential of the random walk on $S$ are the vertices of the tree, where the largest local maximum is the root of the tree. The root is connected with the vertices of the next two largest local maxima, and so on, as illustrated in Fig. 21.6.
The algorithm works as follows. For each pair of leaves we calculate the length of the shortest path between them and choose the pair of leaves with the shortest path. If there are multiple shortest paths, then we choose the one with the lowest-labeled leaf. Next, we increase the label of the highest-labeled leaf in the pair by the value of the label of the lowest-labeled leaf, and delete the latter. We continue until we obtain a tree with only one leaf. This algorithm describes a path from the configuration $\xi \notin \mathcal{M}$ to a configuration in $\mathcal{M}$ (because the final tree corresponds to a configuration in $\mathcal{M}$). Thus, for the general case we have to calculate at most $|S_*| - 1$ transitions between condensate-sites, i.e., we have to estimate at most $|S_*| - 1$ capacities. Hence (21.5.14) also holds in the general case. Figure 21.7 illustrates the algorithm for the case $|S_*| = 3$.
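The leaf-merging procedure described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is encoded by a hypothetical `parent` map (the root maps to `None`), the path length between two leaves is counted in edges, the internal vertices are kept fixed when a leaf is deleted, and ties beyond the stated rule are broken by the order in which pairs are enumerated. The names `path_length` and `merge_leaves` are illustrative.

```python
from itertools import combinations


def path_length(parent, u, v):
    """Number of edges on the tree path between nodes u and v.
    Assumes `parent` describes a single rooted tree (root -> None)."""
    depth_from_u = {}
    node, d = u, 0
    while node is not None:
        depth_from_u[node] = d
        node, d = parent.get(node), d + 1
    node, d = v, 0
    while node not in depth_from_u:
        node, d = parent.get(node), d + 1
    return d + depth_from_u[node]


def merge_leaves(parent, labels):
    """Repeatedly merge the closest pair of leaves until one leaf remains.
    `labels` maps each leaf (attractor) to its occupation number.
    Returns the list of moves (from_leaf, to_leaf, particles) and the final labels."""
    labels = dict(labels)
    moves = []
    while len(labels) > 1:
        def key(pair):
            u, v = pair
            # Shortest path first; among ties prefer the pair with the lowest label.
            return (path_length(parent, u, v), min(labels[u], labels[v]))

        u, v = min(combinations(sorted(labels), 2), key=key)
        low, high = (u, v) if labels[u] <= labels[v] else (v, u)
        moves.append((low, high, labels[low]))
        labels[high] += labels[low]   # particles of the lower leaf move to the higher one
        del labels[low]               # the emptied leaf is deleted
    return moves, labels


if __name__ == "__main__":
    # Hypothetical tree with three attractors x, y, z (the case |S*| = 3).
    parent = {"x": "M", "y": "M", "M": "R", "z": "R", "R": None}
    moves, final = merge_leaves(parent, {"x": 2, "y": 5, "z": 3})
    print(moves, final)
```

With $|S_*|$ leaves the loop performs exactly $|S_*| - 1$ merges, which matches the count of capacities to be estimated in the text.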
Having concluded the proof of Lemma 21.25, we can now conclude the proof of
Proposition 21.24. Using Theorem 21.5 and Lemma 21.2, we obtain
ZS
= 1 + o(1) sup capS (w, v), (21.5.15)
N α+1 |S ∗ | M ∗ Iα (0) Γ (α) w∈S∗ v∈S∗ \{w}
where $\mu_{N,S}(\eta^w) = 1/Z_{N,S}$ is the equilibrium weight of the configuration $\eta^w$ where all $N$ particles are at site $w$. The expression in (21.5.15) gives us control of the numerator in (21.5.1) (recall (21.1.15) and the fact that $\mathcal{M} = \bigcup_{w\in S_*} \eta^w$). For the denominator in (21.5.1) we use Lemma 21.25. For the quotient we therefore get
1. Metastability for the zero-range process in the case where the random walk is
reversible was first investigated by Beltrán and Landim [16]. Convergence of the
time-scaled process, observed when it visits the metastable set, to a continuous-
time random walk was proved with the help of martingale techniques (recall Re-
mark 21.8). The results were recovered in the framework of the potential-theoretic
approach to metastability in the Diploma Thesis of Rebecca Neukirch [191], and
were published in Bovier and Neukirch [40]. The presentation in this chapter is
based on this work.
2. Lemma 21.1 was proved, in increasing generality, in Andjel [7], Evans [105],
Großkinsky and Spohn [132]. Lemma 21.2 was proved in Großkinsky and Spohn
[132], Beltrán and Landim [16]. The condensation phenomenon in Theorem 21.3
was proved in Großkinsky and Spohn [132], Großkinsky, Schütz and Spohn [131],
Großkinsky and Schütz [130].
3. Landim [159] deals with the zero-range process in the case where the random
walk is totally asymmetric (so that reversibility fails).
Part IX
Challenges
There are several challenges within metastability that as yet remain unsolved, but
are potentially within reach of the conceptual and technical machinery described in
the present monograph. This chapter is devoted to two models representing some
of these challenges. We state a few theorems—without proofs—and point to a few
open problems.
Section 22.1 looks at Ising spins in a small magnetic field subject to Glauber
dynamics. This is the same model as treated in Chaps. 17 and 19, but in a different
metastable regime, namely, where the critical droplet is very large. In Sect. 22.2 we
discuss a model of an interacting particle system in the continuum, to which the
methods described in the present monograph apply after appropriate modifications.
In Part VII we already dealt with large volumes for Glauber dynamics and Kawasaki
dynamics at low temperatures. In this section we again look at Glauber dynamics in
large volumes, but now at positive temperatures and small magnetic fields. The tem-
perature is chosen strictly below the critical temperature of the Ising model on the
infinite lattice Z2 in zero magnetic field. In the limit as the magnetic field tends to
zero the size of the critical droplet tends to infinity. The main idea is that its asymp-
totic shape is the Wulff shape from equilibrium statistical physics, i.e., the shape
that minimises the integrated surface tension between the minus-phase outside the
droplet and the plus-phase inside the droplet. In what follows we consider volumes
that are comparable to the volume of the critical droplet. In Chap. 23 we will see
what happens in larger volumes.
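Return to the setting of Sect. 17.1.1. The Ising-spin Hamiltonian on a finite square box $\Lambda \subset \mathbb{Z}^2$ reads (recall (17.1.1)–(17.1.2))
\[
H(\sigma) = -\frac{J}{2} \sum_{\{x,y\}\in\Lambda^*} \sigma(x)\sigma(y) - \frac{h}{2} \sum_{x\in\Lambda} \sigma(x), \qquad \sigma \in S = \{-1,+1\}^{\Lambda},
\tag{22.1.1}
\]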
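with $J, h > 0$, where we use periodic boundary conditions. The system follows a Metropolis dynamics $(\sigma_t)_{t\ge 0}$ with spin-flip rates
\[
c_\beta\bigl(\sigma, \sigma^x\bigr) =
\begin{cases}
e^{-\beta[H(\sigma^x) - H(\sigma)]_+}, & \sigma \in S,\ x \in \Lambda, \\
0, & \text{otherwise},
\end{cases}
\tag{22.1.2}
\]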
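As an illustration of (22.1.1) and (22.1.2), here is a minimal sketch that evaluates the Hamiltonian on a small periodic box and the Metropolis rate of a single spin flip. The function names, the list-of-lists encoding of a configuration, and the parameter values are assumptions made for this example only; each unordered nearest-neighbour bond is counted once, which requires a box of side at least 3. The local energy change used in `flip_rate` is $\sigma(x)\,(J \sum_{y \sim x} \sigma(y) + h)$, obtained by flipping $\sigma(x)$ in (22.1.1).

```python
import math


def energy(spins, J, h):
    """H(sigma) = -(J/2) * sum over bonds sigma(x)sigma(y) - (h/2) * sum_x sigma(x)
    on an L x L box with periodic boundary conditions (L >= 3 assumed)."""
    L = len(spins)
    bond_sum = 0
    field_sum = 0
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            # Count each bond once: the neighbour below and the neighbour to the right.
            bond_sum += s * (spins[(i + 1) % L][j] + spins[i][(j + 1) % L])
            field_sum += s
    return -0.5 * J * bond_sum - 0.5 * h * field_sum


def flip_rate(spins, i, j, J, h, beta):
    """Metropolis rate exp(-beta * [H(sigma^x) - H(sigma)]_+) for flipping site (i, j)."""
    L = len(spins)
    s = spins[i][j]
    neighbours = (
        spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
        + spins[i][(j + 1) % L] + spins[i][(j - 1) % L]
    )
    delta = s * (J * neighbours + h)  # energy change caused by the flip
    return math.exp(-beta * max(delta, 0.0))


if __name__ == "__main__":
    L, J, h, beta = 4, 1.0, 0.5, 2.0
    minus = [[-1] * L for _ in range(L)]
    flipped = [row[:] for row in minus]
    flipped[1][2] = +1
    print(energy(flipped, J, h) - energy(minus, J, h), flip_rate(minus, 1, 2, J, h, beta))
```

Flipping a single spin in the all-minus configuration costs energy $4J - h$, so for small $h$ the rate is exponentially small in $\beta$, which is the source of metastability.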
\[
\kappa_\beta = \frac{\beta\, w^*(\beta)^2}{4\, m^*(\beta)},
\tag{22.1.5}
\]
with $m^*(\beta)$ the spontaneous magnetisation of the plus-phase and $w^*(\beta)$ the integrated surface tension of the Wulff droplet of unit volume.
Theorem 22.1 says that the crossover from the minus-phase to the plus-phase
occurs at time exp[(κβ / h)[1 + o(1)]]. What is remarkable about (22.1.4) is that
it relates the crossover, which is a non-equilibrium quantity, to the spontaneous
magnetisation and the integrated surface tension, which are equilibrium quanti-
ties. A priori there is no reason why the critical droplet should have an equilibrium
shape (= Wulff shape). In fact, in Sect. 17.7, Items 5–8, we saw examples where
(sub)critical droplets do not take on an equilibrium shape.
Here is a brief description of the construction of the Wulff droplet (see Fig. 22.2). Let $S^1 = \{x \in \mathbb{R}^2\colon \|x\|_2 = 1\}$ be the surface of the Euclidean ball of radius 1. The surface tension in the Ising model on $\mathbb{Z}^2$ at $h = 0$ in the direction perpendicular to $n \in S^1$ is defined as
\[
T_\beta(n) = -\lim_{\ell\to\infty} \frac{1}{2\beta\,\|y(\ell)\|_2} \ln \frac{Z_{\ell,\sigma(n)}}{Z_{\ell,+}}.
\tag{22.1.6}
\]
Here, $y(\ell)$ and $-y(\ell)$ are the points where the straight line $\{x \in \mathbb{R}^2\colon (x,n) = 0\}$ intersects the boundary of the box $\Lambda_\ell = [-\ell,\ell]^2$, $Z_{\ell,\sigma(n)}$ is the partition sum on $\Lambda_\ell \cap \mathbb{Z}^2$ with the boundary condition $\sigma(n)$ given by
\[
\sigma(n)(x) =
\begin{cases}
+1 & \text{if } (x,n) \ge 0, \\
-1 & \text{if } (x,n) < 0,
\end{cases}
\qquad x \in \partial\Lambda_\ell,
\tag{22.1.7}
\]
and $Z_{\ell,+}$ is the partition sum with the plus boundary condition.
Let $\mathcal{D}$ denote the set of closed self-avoiding rectifiable curves in $\mathbb{R}^2$ that are the boundary of a bounded region in $\mathbb{R}^2$. For $\gamma \in \mathcal{D}$, define the surface tension along $\gamma$ as (see Fig. 22.1)
\[
I_\beta(\gamma) = \int_\gamma T_\beta(n_s)\, \mathrm{d}n_s,
\tag{22.1.8}
\]
Fig. 22.1 The surface tension of a droplet equals the integral of the local surface tension over the
boundary of the droplet. The local surface tension depends on the direction perpendicular to the
boundary
Fig. 22.2 Wulff construction. Left: Polar plot of the function n → Tβ (n), with three outward
directions and three orthogonal tangent lines demarking three inward half-spaces (of which only
one has been shaded). Right: The intersection of all the half-spaces (= the inner envelope of the
tangent lines) gives rise to the Wulff shape. The Wulff droplet is the scaling of the Wulff shape that
has unit volume
The latter region satisfies the scaling relation $W_\beta^{\lambda} = \lambda W_\beta^{1}$, i.e., its shape stays the same as $\lambda$ is varied. The Wulff droplet is defined as the region
\[
W_\beta = W_\beta^{\lambda(\beta)},
\tag{22.1.11}
\]
where $\lambda(\beta)$ is chosen such that $W_\beta$ has volume 1 (see Fig. 22.2). Clearly, $W_\beta$ is convex and hence $\partial W_\beta \in \mathcal{D}$. The integrated surface tension of the Wulff droplet, the quantity that appears in (22.1.5), reads
22.1.3 Heuristics
The heuristics behind Theorem 22.1 is as follows. Consider a droplet of the plus-phase inside the minus-phase. Let $S$ be the shape of this droplet and $\ell^2$ its volume (i.e., the number of sites inside). For large $\ell$, the free energy of this droplet is roughly
Here, $-m^*(\beta)h\ell^2$ is the change of the free energy due to the fact that inside the droplet the minus-phase is replaced by the plus-phase, and $w_S(\beta)\ell$ is the change of the free energy due to the surface tension along the border of the droplet. The two terms are of the same order of magnitude when $\ell$ is of order $1/h$. Therefore, putting $\ell = b/h$ and $\Phi_S(\ell) = \phi_S(b)/h$, we get
This function takes its maximal value at $b_c = w_S(\beta)/2m^*(\beta)$, reaching the value $\phi_S(b_c) = w_S(\beta)^2/4m^*(\beta)$. The height of this barrier is minimised by the Wulff shape, i.e., for $S$ with $w_S(\beta) = w^*(\beta)$.
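The maximisation in the heuristics can be reproduced numerically. The sketch below assumes that the rescaled free energy is the sum of the two terms named in the text, that is, $\phi_S(b) = -m^*(\beta)\, b^2 + w_S(\beta)\, b$; this form is inferred from the description and is not quoted from a displayed formula. The names `phi` and `barrier` and the numerical values are hypothetical.

```python
def phi(b, w, m):
    """Assumed rescaled droplet free energy: bulk gain -m*b^2 plus surface cost w*b."""
    return -m * b * b + w * b


def barrier(w, m):
    """Location and height of the maximum of phi: b_c = w/(2m), phi(b_c) = w^2/(4m)."""
    b_c = w / (2 * m)
    return b_c, w * w / (4 * m)


if __name__ == "__main__":
    w, m = 3.0, 0.75  # illustrative values for w_S(beta) and m*(beta)
    b_c, height = barrier(w, m)
    # Crude grid search as an independent check of the closed-form maximum.
    grid_max = max(phi(k / 1000, w, m) for k in range(0, 5001))
    print(b_c, height, grid_max)
```

Under this reading, multiplying the barrier height for the Wulff shape by $\beta$ gives $\beta w^*(\beta)^2 / 4m^*(\beta)$, which is the constant $\kappa_\beta$ in (22.1.5).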
and in the size and shape of the critical droplets in the limit as the temperature tends
to zero. We will show that this can be achieved subject to a number of hypotheses
on the energy landscape, which replace the hypotheses (H1) and (H2) in Chap. 16.
It will turn out that the prefactor depends in a delicate way on the temperature, the
chemical potential, and the shape of the pair potential near its minimum, which is
different from what we found in Chaps. 16–18 for lattice systems.
The problem with working in the continuum is that it is hard to control the en-
ergy landscape, especially in the vicinity of the set of critical droplets. We rely on
properties derived in the literature for minimal-energy configurations at fixed parti-
cle numbers. Our assumptions on the energy landscape are expected to be true for
a large class of pair potentials, but as yet can be proven only for a particular pair
potential in d = 2, called the soft disk potential.
Let
\[
\Omega_* = \bigl\{\omega \subset \mathbb{R}^d\colon \mathrm{card}(\omega) \in \mathbb{N}_0\bigr\}
\tag{22.2.1}
\]
with card(ω) the cardinality of ω. The set Ω∗ represents all the finite-particle config-
urations in Rd (particles have locations, do not overlap, and are indistinguishable),
and is endowed with the Hausdorff metric dH : Ω∗ × Ω∗ → [0, ∞).
Particles interact with each other through a pair potential v : [0, ∞) → R ∪ {∞}.
Let $U\colon \Omega_* \to \mathbb{R} \cup \{\infty\}$ be the energy function defined by
\[
U(\omega) = U\bigl(\{x_1,\dots,x_N\}\bigr) = \sum_{1\le i<j\le N} v\bigl(|x_i - x_j|\bigr),
\tag{22.2.2}
\]
Let β ∈ (0, ∞) denote the inverse temperature and μ ∈ R the chemical potential.
The grand-canonical Hamiltonian H = Hμ,Λ on Λ with chemical potential μ ∈ R
is the function on Ω defined by
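\[
\frac{\mathrm{d}P_\beta}{\mathrm{d}Q}(\omega) = \frac{1}{\Xi_\beta}\, e^{-\beta H(\omega)},
\tag{22.2.6}
\]
where $\Xi_\beta = \Xi_{\beta,\mu,\Lambda}$ is the grand-canonical partition function
\[
\Xi_\beta = \int_\Omega e^{-\beta H(\omega)}\, Q(\mathrm{d}\omega).
\tag{22.2.7}
\]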
Throughout the paper the labels μ, Λ will be suppressed from the notation.
on Ω with càdlàg paths. Particles are randomly created and annihilated inside Λ
as if the outside of Λ were an infinite gas reservoir with chemical potential μ. The
dynamics is Metropolis with grand-canonical Hamiltonian H in (22.2.5). Once in-
side Λ, particles cannot move.
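Our dynamics is the Markov process with generator $L_\beta$ given by
\[
(L_\beta f)(\omega) = \int_\Lambda b_\beta(x,\omega) \bigl[f(\omega \cup x) - f(\omega)\bigr]\, \mathrm{d}x + \sum_{x\in\omega} d_\beta(x,\omega) \bigl[f(\omega \setminus x) - f(\omega)\bigr], \qquad f \in C_b(\Omega),\ \omega \in \Omega,
\tag{22.2.9}
\]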
= argmin H, (22.2.12)
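\[
\lim_{\beta\to\infty} (24\beta)^{-(2k_c-3)}\, e^{-\beta\Gamma}\, E_\emptyset(\tau_N) = \frac{K_N(\mu)}{2\pi|\Lambda|},
\tag{22.2.17}
\]
where
\[
\Gamma = E_{k_c} - k_c\,\mu
\tag{22.2.18}
\]
with
\[
E_{k_c} = -3k_c + \bigl\lceil \sqrt{12k_c - 3}\, \bigr\rceil.
\tag{22.2.19}
\]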
(ii) There exists a χ ∈ R such that
In this section we state four hypotheses on the energy landscape under which the re-
sults in Sect. 22.2.3 carry over to other pair potentials, modulo minor modifications.
The class of pair potentials we are interested in is the following.
Definition 22.6 (Class of pair potentials) The pair potential $v\colon [0,\infty) \to (-\infty,\infty]$ is assumed to be lower-semicontinuous, to be non-positive, continuous and Lipschitz where finite, to have a hard-core repulsion, a finite range, and a unique strictly negative minimum, i.e., there exist $0 < r_1 \le r_2 < r_3 < \infty$ such that (see Fig. 22.4)
\[
v(r)
\begin{cases}
= \infty, & \text{for } 0 \le r < r_1, \\
\le 0, & \text{for } r_1 \le r < \infty, \\
> v(r_2), & \text{for } r \neq r_2, \\
= 0, & \text{for } r \ge r_3.
\end{cases}
\tag{22.2.21}
\]
The conditions in (22.2.21) imply that v is stable, i.e., there exists a C ∈ (0, ∞)
such that U (ω) ≥ −Ccard(ω) for all ω ∈ Ω∗ (see Ruelle [209, Sect. 3.2]).
For $k \in \mathbb{N}$, define
\[
E_k = \inf_{\substack{\omega\in\Omega_* \\ \mathrm{card}(\omega) = k}} U(\omega).
\tag{22.2.22}
\]
22.2 Crystallisation in small volumes at low temperatures 555
Fig. 22.4 Shape of the pair potential $r \mapsto v(r)$. The soft disk potential has $r_1 = r_2 = 1$, $r_3 = 25/24$ and $v(r_1) = v(r_2) = -1$ (see Fig. 22.3)
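To make the energy function (22.2.2) and the ground-state energies $E_k$ concrete, here is a minimal sketch for the soft disk potential in $d = 2$. Two assumptions are made that are not stated in the text above: the potential is taken to interpolate linearly between $v(1) = -1$ and $v(25/24) = 0$, i.e. $v(r) = 24r - 25$ on $[1, 25/24]$, and $E_k$ is evaluated with the closed form $-3k + \lceil\sqrt{12k-3}\,\rceil$ read off from (22.2.19). The function names and the small tolerance used to absorb floating-point error at the hard core are also hypothetical.

```python
import math
from itertools import combinations


def soft_disk(r, tol=1e-9):
    """Soft disk pair potential (assumed linear form): hard core for r < 1,
    24 r - 25 on [1, 25/24], zero beyond. `tol` absorbs rounding at r = 1."""
    if r < 1.0 - tol:
        return math.inf
    if r >= 25.0 / 24.0:
        return 0.0
    return 24.0 * max(r, 1.0) - 25.0


def energy(points):
    """U(omega): sum of v(|x_i - x_j|) over unordered pairs of planar points."""
    return sum(
        soft_disk(math.hypot(p[0] - q[0], p[1] - q[1]))
        for p, q in combinations(points, 2)
    )


def ground_energy(k):
    """Closed form -3k + ceil(sqrt(12k - 3)) for k >= 1, with E_0 = 0.
    ceil(sqrt(n)) is computed exactly as isqrt(n - 1) + 1."""
    if k == 0:
        return 0
    return -3 * k + math.isqrt(12 * k - 4) + 1


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    hexagon = [(0.0, 0.0)] + [
        (math.cos(j * math.pi / 3), math.sin(j * math.pi / 3)) for j in range(6)
    ]
    print(energy(triangle), ground_energy(3))
    print(energy(hexagon), ground_energy(7))
```

For pieces of the unit triangular lattice the energy is minus the number of nearest-neighbour bonds, so the triangle and the centred hexagon reproduce the closed form for $k = 3$ and $k = 7$.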
Hypothesis (H1) precludes certain special values of the chemical potential, and
guarantees that our system is in the metastable regime: the threshold for nucleation
is neither one particle nor infinitely many particles (Λ needs to be chosen large
enough so that the kc -particle ground states fit inside). The restrictions on the pair
potential in Definition 22.6 imply that e∞ ∈ (−∞, 0). Since E0 = 0 and Ek > ke∞
for all k ∈ N, it follows that kc = ∞ for all μ < e∞ . Hypothesis (H2) implies that
kc = ∞ also for μ = e∞ . On the other hand, kc < ∞ for all μ > e∞ . In order to
have kc ≥ 2, as required in Hypothesis (H1), we need to constrain μ from above.
Under Hypothesis (H2) there exists an hc ∈ (0, ∞) (depending on the fine details of
556 22 Challenges Within Metastability
Hypothesis (H3), finally, says that the k-particle ground states in Λ coincide with
those in Rd when k is not too large. (Λ needs to be chosen large enough so that no
interaction around the torus occurs.)
Our fourth hypothesis is analogous to what we had in Lemma 16.9.
Hypothesis (H4) says that the deepest wells in the energy landscape are those
containing ∅ and N , which makes (∅, N ) into a metastable pair in the sense of
Definition 8.2.
The following theorem states that the results in Sect. 22.2.3 indeed carry over.
Theorem 22.9 (Metastability for other pair potentials) Theorems 22.3–22.5 hold
under Hypotheses (H1)–(H4), with kc and Ekc as defined above, and with an appro-
priate modification of the scaling factors in (22.2.17) and (22.2.20).
Different pair potentials will have different scaling in $\beta$. E.g. when the pair potential is twice differentiable near its minimum, the scaling factor in the left-hand side of (22.2.17) becomes $[\sqrt{2\pi v''(1)\beta}]^{-(2k_c-3)}$ and the formula for $K_N$ in the right-hand side of (22.2.17) involves the determinant of a certain $(2k_c - 3)$-dimensional quadratic form. In the near-critical configurations the regions where the particles are located have a linear size of order $1/\sqrt{\beta}$.
Hypotheses (H1)–(H4) are satisfied for the soft disk potential in d = 2, with
e∞ = −3 and hc = 1. We expect that they are in fact satisfied for a large class of
pair potentials. For instance, (H1)–(H3) should hold under the conditions on the pair
potential stated in Definition 22.6, with $E_k - ke_\infty \asymp k^{(d-1)/d}$ as $k \to \infty$. Moreover,
if the well of the pair potential is narrow and deep enough (see Fig. 22.4), then
Theorem 22.2 should carry over as well, with the spacing in the triangular lattice
equal to r2 and with Ek multiplied by −v(r2 ) > 0. Hypothesis (H4) is more delicate,
and will no doubt require stronger conditions on the pair potential, e.g. unimodality.
Settling (H1)–(H4) beyond the soft disk potential in d = 2 represents a hard open
problem in the analytic theory of crystallisation. In d = 3, for instance, it is believed
that the ground states consist of stacked layers of triangular lattices, but the relative
position of these layers is a matter of debate.
delicate coupling and coarse-graining techniques are needed in which the micro-
scopic regions where the system changes from the minus-phase to the plus-phase
are approximated on a mesoscopic scale by local pieces of a continuum interface,
and the cost of these pieces is related to the direction-dependent surface tension, as
explained in Sects. 22.1.2–22.1.3.
2. Vanheuverzwijn [231] proved the existence of metastable states for the Ising
model on Z2 . Numerical studies by Rikvold, Tomita, Miyashita and Sides [206]
confirm Theorem 22.1.
4. The proof of Theorem 22.1 given in [215] in fact applies to spin-flip rates that
are more general than (22.1.1)–(22.1.2): translation invariant, finite range, attrac-
tive, monotone in the magnetic field, uniformly bounded away from zero and infin-
ity. Also the initial condition can be more general: any starting distribution that is
stochastically below the minus-phase.
5. It is shown in [215] that the metastable state, i.e., the state at time τ (h; κ) with
κ < κβ , is “infinitesimally larger” than the minus-phase. An asymptotic expansion in
powers of h is derived for the difference of a local average under the metastable state
and the minus-phase, which can be interpreted as describing the C ∞ -continuation in
h of the family of Gibbs distributions with negative h into the region of positive h.
This continuation is expected not to be analytic, a situation that should be typical
for metastable states. It is known that there is no analytic continuation of the minus-
phase across h = 0.
6. The extension of Theorem 22.1 to Kawasaki dynamics, where the limit of small
magnetic field h ↓ 0 is taken over by the limit of weak supersaturation Δ ↑ 2U , is
also still open. The main difficulty is that Kawasaki dynamics is conservative: the
growing of large droplets is hampered because the gas around the droplet gets de-
pleted. None of the techniques developed for Glauber dynamics seems easily trans-
portable.
7. Bodineau, Graham and Wouts [29] look at a diluted version of the model studied
in Sect. 22.1, where the pair interaction is switched off on a random set of sites with
density p ∈ (0, 1). The relaxation time is shown to be exp[κβ (p)/ hd−1 ], h ↓ 0, with
κβ (p) = w ∗ (p, β)2 /m∗ (p, β), where m∗ (p, β) and w ∗ (p, β) are the analogues of
m∗ (β) and w ∗ (β) in the non-diluted model, i.e., the spontaneous magnetisation of
the plus-phase and the integrated surface tension of the Wulff droplet of unit vol-
ume. Intuitively, dilution enhances relaxation because there is no surface tension in
diluted areas, and so it is expected that p → κβ (p) is non-increasing. However, no
proof is available, even though both p → m∗ (p, β) and p → w ∗ (p, β) are known
to be non-increasing. It is shown that limβ→∞ κβ (p)/κβ = 0 for all p ∈ (0, 1).
8. The results in Sect. 22.2 are taken from den Hollander and Jansen [83].
Parts VI and VII dealt with nucleation in small and large volumes at low tempera-
tures. In the former, where the volume was kept fixed, we were able to arrive at a full
description of metastability, with detailed computations for both Glauber dynamics
and Kawasaki dynamics. In the latter, where the volume grew exponentially with
the inverse temperature, we restricted ourselves to the computation of the time of
first appearance of a critical droplet somewhere in the large volume. We were un-
able to follow the subsequent growth of this droplet beyond twice its initial size. In
particular, it remained open how this droplet eventually invaded the large volume.
Especially for Kawasaki dynamics this is a formidable challenge, because a large
droplet tends to deplete the surrounding gas.
What happens in infinite volume? In that case a new mechanism of nucleation
becomes possible: the critical droplet is created somewhere far from the origin and
invades the neighbourhood of the origin by growing. The key question reads: Is this
mechanism more efficient than nucleation close to the origin? It turns out that the
answer is yes.
In this chapter we look at Glauber dynamics in infinite volume in two metastable
regimes. In Sect. 23.1 we consider the limit of low temperature at positive magnetic
field, while in Sect. 23.2 we turn to the limit of small magnetic field at positive tem-
perature. We restrict ourselves to presenting the main ideas only, omitting proofs.
For references we refer the reader to the bibliographical notes.
Post-nuclear growth is not part of metastability theory, the latter being concerned
with pre-nucleation and nucleation phenomena only. Key features, such as the re-
newal structure created by repeated unsuccessful trials to form a critical droplet, are
lost. In fact, so far potential theory has rather little to say about post-nuclear growth.
Consequently, sharp results are hard to get, and fully rely on ad hoc methods.
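\[
H(\sigma) = -\frac{J}{2} \sum_{\{x,y\}\in(\mathbb{Z}^d)^*} \sigma(x)\sigma(y) - \frac{h}{2} \sum_{x\in\mathbb{Z}^d} \sigma(x), \qquad \sigma \in S = \{-1,+1\}^{\mathbb{Z}^d},
\tag{23.1.1}
\]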
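with $J, h > 0$. The system follows a Metropolis dynamics $(\sigma_t)_{t\ge 0}$ with spin-flip rates given by
\[
c_\beta\bigl(\sigma, \sigma^x\bigr) =
\begin{cases}
e^{-\beta[\Delta_x H(\sigma)]_+}, & \sigma \in S,\ x \in \mathbb{Z}^d, \\
0, & \text{otherwise},
\end{cases}
\tag{23.1.2}
\]
where
\[
\Delta_x H(\sigma) = \sigma(x) \Bigl( J \sum_{\substack{y\in\mathbb{Z}^d \\ (x,y)\in(\mathbb{Z}^d)^*}} \sigma(y) + h \Bigr).
\tag{23.1.3}
\]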
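Theorem 23.1 (Mean crossover time from minus to plus) If $f$ is a local function, then
\[
\lim_{\beta\to\infty} E_{\boxminus}\bigl(f(\sigma_{\tau(\beta;\kappa)})\bigr) =
\begin{cases}
f(\boxminus), & \text{if } \kappa < \kappa_d, \\
f(\boxplus), & \text{if } \kappa > \kappa_d,
\end{cases}
\tag{23.1.4}
\]
where $\tau(\beta;\kappa) = \exp(\beta\kappa)$ and
\[
\kappa_d = \frac{1}{d+1} \sum_{k=1}^{d} \Gamma_k,
\tag{23.1.5}
\]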
Recall from Sect. 17.7, Item 3, that an explicit formula is available for Γk .
The heuristics behind Theorem 23.1 is as follows. The most efficient mechanism
for relaxation from minus to plus near the origin is for the system to create a crit-
ical droplet of plus-spins somewhere far away from the origin and let this droplet
grow and invade the origin. Suppose that nucleation in a finite box occurs at rate
exp(−βΓd ) (which we know from Chap. 19 is true if we ignore terms of order 1).
Suppose further that the speed of growth of a large supercritical droplet is vd (i.e.,
the speed at which the faces move outwards). Then, to invade the origin at time t,
the droplet must be born inside the space-time cone whose basis is a d-dimensional
hypercube with side length vd t and whose height is t. The critical space-time cone
is such that the nucleation rate is of order 1. Therefore, writing τd for the time when
the origin is invaded, we have
τd (vd τd )d exp −βΓd = 1, (23.1.6)
where we ignore terms of order exp(o(β)). Since large droplets are approximately
parallelepipeds, the dynamics on a face behaves like a d − 1-dimensional Glauber
dynamics, and so the time needed to fill a face is τd−1 . Hence
vd = 1/τd−1 . (23.1.7)
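The relations (23.1.6) and (23.1.7) can be turned into a recursion for the exponent in (23.1.5). What follows is a minimal sketch under two assumptions that are not stated in the text: the ansatz $\tau_d = \exp(\beta\kappa_d)$ (up to factors $\exp(o(\beta))$), and the base case $\kappa_0 = 0$, meaning that a zero-dimensional "face" (a single site) fills in a time of order 1. Taking logarithms in (23.1.6) and inserting $v_d = \exp(-\beta\kappa_{d-1})$ gives $(d+1)\kappa_d = \Gamma_d + d\,\kappa_{d-1}$, which telescopes to (23.1.5). The function names and the values used for $\Gamma_k$ are hypothetical placeholders; they are not the explicit formula of Sect. 17.7.

```python
from fractions import Fraction


def kappa_by_recursion(gammas):
    """Solve (d+1)*kappa_d = Gamma_d + d*kappa_{d-1} with kappa_0 = 0.

    gammas[k-1] plays the role of Gamma_k for k = 1, ..., d.
    Returns the list [kappa_1, ..., kappa_d] as exact fractions.
    """
    kappas = []
    kappa_prev = Fraction(0)  # assumed base case kappa_0 = 0
    for d, gamma_d in enumerate(gammas, start=1):
        kappa_d = (Fraction(gamma_d) + d * kappa_prev) / (d + 1)
        kappas.append(kappa_d)
        kappa_prev = kappa_d
    return kappas


def kappa_closed_form(gammas):
    """Closed form (23.1.5): kappa_d = (Gamma_1 + ... + Gamma_d) / (d + 1)."""
    d = len(gammas)
    return sum(Fraction(g) for g in gammas) / (d + 1)


# Hypothetical placeholder values for Gamma_1, Gamma_2, Gamma_3.
example_gammas = [2, 7, 19]
recursive = kappa_by_recursion(example_gammas)
closed = [kappa_closed_form(example_gammas[:d]) for d in range(1, 4)]
print(recursive == closed)
```

Exact rational arithmetic is used so that the recursion and the closed form can be compared with `==` rather than with a floating-point tolerance.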
neighbour, and at rate 1 if it has two or more occupied neighbours, and occupied
sites stay occupied forever. For this model it is shown that the speed is
$\varepsilon^{1/d}$ and the nucleation time is $\exp[\beta\kappa_c]$ with
$\kappa_c = \max\{\gamma, (\Gamma_d + \gamma)/(d+1)\}$,
provided $\Gamma_d \geq \gamma$. By making the appropriate choice for $\gamma$ as a function of $J, h$
(e.g. $\gamma = 2J - h$ in $d = 2$), it is shown that the nucleation time in the original
model is close to that of the nucleation-and-growth model. The proof requires
delicate coupling techniques.
(2) Energy landscape: A detailed study of the energy landscape is necessary in
order to show that the dynamics does not get caught in a deep well. For d =
2, 3 this can be done with the help of the combinatorial techniques mentioned
and used in Chap. 17, but for higher dimensions no analogous techniques are
available. The necessary estimates are obtained via rougher arguments.
(3) Space-time clusters: Some control on the size of space-time clusters is needed,
e.g. to show that it is very unlikely for large space-time clusters to be formed
prior to nucleation or for subcritical clusters to move over long distances. This
requires estimates on recurrence times as well as an analysis of “cycle com-
pounds”.
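The exponent $\kappa_c$ quoted in item (1) can be checked against the same space-time cone balance as in (23.1.6). The following is a minimal sketch, assuming that $\varepsilon$ stands for $\exp(-\beta\gamma)$; this identification is an assumption and is not stated in the text shown here. Under it the growth speed is $\exp(-\beta\gamma/d)$, the balance $\tau (v\tau)^d \exp(-\beta\Gamma_d) = 1$ gives the exponent $(\Gamma_d + \gamma)/(d+1)$, and taking the maximum with $\gamma$ gives $\kappa_c$. The function names and numerical values are hypothetical.

```python
from fractions import Fraction


def cone_exponent(gamma_nucleation, speed_exponent, d):
    """Exponent kappa solving tau * (v*tau)**d * exp(-beta*Gamma) = 1
    on the exponential scale, with tau = exp(beta*kappa) and
    v = exp(-beta*speed_exponent)."""
    return (Fraction(gamma_nucleation) + d * Fraction(speed_exponent)) / (d + 1)


def kappa_c(gamma_nucleation, gamma_growth, d):
    """kappa_c = max{gamma, (Gamma_d + gamma)/(d+1)}, for Gamma_d >= gamma,
    assuming the growth speed is exp(-beta*gamma/d)."""
    if gamma_nucleation < gamma_growth:
        raise ValueError("the formula is stated only for Gamma_d >= gamma")
    balance = cone_exponent(gamma_nucleation, Fraction(gamma_growth) / d, d)
    return max(Fraction(gamma_growth), balance)


# Hypothetical values: Gamma_d = 10, gamma = 1, d = 2.
print(kappa_c(10, 1, 2))
# Here Gamma_d <= d*gamma, so the maximum is attained by gamma itself.
print(kappa_c(3, 2, 2))
```

The maximum switches from the cone balance to $\gamma$ exactly when $\Gamma_d \leq d\gamma$, which is when a single growth step takes longer than the cone balance would predict.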
The factor $\frac{1}{d+1}$ in (23.1.5) shows that the mechanism of far-away nucleation
followed by invasion is faster than the mechanism of close-by nucleation alone.
Thus, space-time entropy plays a crucial role in infinite volume.
Return to the model in Sect. 22.1 with $\Lambda$ replaced by $\mathbb{Z}^2$ and again consider the
metastable regime $\beta \in (0, \beta_c)$ and $h \downarrow 0$.
Theorem 23.2 (Mean crossover time from minus to plus) The same result as in
Theorem 22.1 holds with $\kappa_\beta$ replaced by $\frac{1}{3}\kappa_\beta$.
The heuristics behind this theorem is the same as for the model in Sect. 23.1. The
extra factor $\frac{1}{d+1} = \frac{1}{3}$ in $d = 2$ again comes from space-time entropy: invasion of a
growing droplet that is created somewhere in a space-time cone of the appropriate
size. The same three obstacles have to be overcome to build a proof.
23.3 Bibliographical notes

1. Aizenman and Lebowitz [1] argued that a nucleation center in bootstrap percolation
plays a rôle similar to that of a growing Wulff droplet in the Ising model subject to
Glauber dynamics, and that therefore bootstrap percolation can serve as a paradigm
for the description of metastable behaviour. They sketched a program for Glauber
dynamics, which was subsequently carried out in Schonmann and Shlosman [215].
2. Theorem 23.1 was proved for $d = 2$ in Dehghanpour and Schonmann [77, 78],
and for $d \geq 3$ in Cerf and Manzo [54]. Theorem 23.2 was proved for $d = 2$ in Schonmann
and Shlosman [215]. The extension to $d \geq 3$ is still open. It is conjectured in
Schonmann [213] that $v_2 \sim Ch$, $h \downarrow 0$, which is much stronger than the scaling
property $v_2 = \exp[o(1/h)]$, $h \downarrow 0$, that is needed for the proof of Theorem 23.2.
References

1. Aizenman, M., Lebowitz, J.: Metastability effects in bootstrap percolation. J. Phys. A 21,
3801–3813 (1988)
2. Aldous, D., Fill, J.A.: Reversible Markov Chains and Random Walks on Graphs.
https://www.stat.berkeley.edu/~aldous/RWG/book.pdf (2002/2014)
3. Allen, S., Cahn, J.: Ground state structures in ordered binary alloys with second neighbor
interactions. Acta Metall. 20, 423–433 (1972)
4. Alonso, L., Cerf, R.: The three-dimensional polyominoes of minimal area. Electron. J. Comb.
3, 1–39 (1996)
5. Amaro de Matos, J.M.G., Baêta Segundo, J.A., Perez, J.F.: Fluctuations in dilute antiferro-
magnets: Curie-Weiss models. J. Phys. A 25, 2819–2830 (1992)
6. an der Heiden, M.: Metastability of Markov chains and in the Hopfield model. Ph.D. thesis,
Technische Universität Berlin (2007)
7. Andjel, E.D.: Invariant measures for the zero range processes. Ann. Probab. 10, 525–547
(1982)
8. Arrhenius, S.: On the reaction rate of the inversion of non-refined sugar upon souring. Z.
Phys. Chem. 4, 226–248 (1889)
9. Baake, E., Baake, M., Bovier, A., Klein, M.: An asymptotic maximum principle for essen-
tially linear evolution models. J. Math. Biol. 50, 83–114 (2005)
10. Ball, J.M., Carr, J., Penrose, O.: The Becker-Döring cluster equations: basic properties and
asymptotic behaviour of solutions. Commun. Math. Phys. 104, 657–692 (1986)
11. Barret, F.: Sharp asymptotics of metastable transition times for one-dimensional SPDEs.
Ann. Inst. Henri Poincaré Probab. Stat. 51, 129–166 (2015)
12. Barret, F., Bovier, A., Méléard, S.: Uniform estimates for metastable transition times in a
coupled bistable system. Electron. J. Probab. 15, 323–345 (2010)
13. Bauer, H.: Probability Theory and Elements of Measure Theory. Academic Press Inc. [Har-
court Brace Jovanovich Publishers], London (1981)
14. Baur, E.: Metastabilität von reversiblen Diffusionsprozessen. Diploma thesis, Bonn Univer-
sity (2011)
15. Becker, R., Döring, W.: Kinetische Behandlung der Keimbildung in übersättigten Dämpfen.
Ann. Phys. (Leipz.) 24, 719–752 (1935)
16. Beltrán, J., Landim, C.: Tunneling and metastability of continuous time Markov chains II,
the nonreversible case. J. Stat. Phys. 149, 598–618 (2012)
17. Beltrán, J., Landim, C.: A martingale approach to metastability. Probab. Theory Relat. Fields
161, 267–307 (2015)
18. Ben Arous, G., Bovier, A., Gayrard, V.: Glauber dynamics of the random energy model. I.
Metastable motion on the extreme states. Commun. Math. Phys. 235, 379–425 (2003)
19. Ben Arous, G., Cerf, R.: Metastability of the three-dimensional Ising model on a torus at
very low temperatures. Electron. J. Probab. 1, 1–55 (1996)
20. Berglund, N., Dutercq, S.: The Eyring–Kramers law for Markovian jump processes with
symmetries. J. Theor. Probab. (2015). doi:10.1007/s10959-015-0617-9
21. Berglund, N., Gentz, B.: The Eyring-Kramers law for potentials with nonquadratic saddles.
Markov Process. Relat. Fields 16, 549–598 (2010)
22. Berglund, N., Gentz, B.: Sharp estimates for metastable lifetimes in parabolic SPDEs:
Kramers’ law and beyond. Electron. J. Probab. 18, 1–58 (2013)
23. Berman, K.A., Konsowa, M.H.: Random paths and cuts, electrical networks, and reversible
Markov chains. SIAM J. Discrete Math. 3, 311–319 (1990)
24. Bianchi, A., Bovier, A., Ioffe, D.: Sharp asymptotics for metastability in the random field
Curie-Weiss model. Electron. J. Probab. 14, 1541–1603 (2009)
25. Bianchi, A., Bovier, A., Ioffe, D.: Pointwise estimates and exponential laws in metastable
systems via coupling methods. Ann. Probab. 40, 339–379 (2012)
26. Bianchi, A., Gaudillière, A.: Metastable states, quasi-stationary and soft measures, mixing
time asymptotics via variational principles. arXiv:1103.1143v1, to appear in Stoch. Proc.
Appl. (2011)
27. Bigelis, S., Cirillo, E., Lebowitz, J., Speer, E.: Critical droplets in metastable probabilistic
cellular automata. Phys. Rev. E 59, 3935–3941 (1999)
28. Billingsley, P.: Probability and Measure. Wiley Series in Probability and Mathematical Statis-
tics. Wiley, New York (1995)
29. Bodineau, T., Graham, B., Wouts, M.: Metastability in the dilute Ising model. Probab. Theory
Relat. Fields 157, 955–1009 (2013)
30. Bouchaud, J.-P., Cugliandolo, L., Kurchan, J., Mézard, M.: Out of equilibrium dynamics in
spin-glasses and other glassy systems. In: Young, A.P. (ed.) Spin Glasses and Random Fields.
World Scientific, Singapore (1998)
31. Bovier, A., den Hollander, F., Nardi, F.R.: Sharp asymptotics for Kawasaki dynamics on a
finite box with open boundary. Probab. Theory Relat. Fields 135, 265–310 (2006)
32. Bovier, A., den Hollander, F., Spitoni, C.: Homogeneous nucleation for Glauber and
Kawasaki dynamics in large volumes and low temperature. Ann. Probab. 38, 661–713 (2010)
33. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability in stochastic dynamics of
disordered mean-field models. Probab. Theory Relat. Fields 119, 99–161 (2001)
34. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability and low lying spectra in re-
versible Markov chains. Commun. Math. Phys. 228, 219–255 (2002)
35. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability in reversible diffusion pro-
cesses. I. Sharp asymptotics for capacities and exit times. J. Eur. Math. Soc. (JEMS) 6, 399–
424 (2004)
36. Bovier, A., Gayrard, V.: Hopfield models as generalized random mean field models. In: Math-
ematical Aspects of Spin Glasses and Neural Networks. Progr. Probab., vol. 41, pp. 3–89.
Birkhäuser, Boston (1998)
37. Bovier, A., Gayrard, V.: Sample path large deviations for a class of Markov chains related to
disordered mean field models. Preprint available at arXiv:math/9905022, 487, WIAS Berlin
(1999)
38. Bovier, A., Gayrard, V., Klein, M.: Metastability in reversible diffusion processes. II. Precise
asymptotics for small eigenvalues. J. Eur. Math. Soc. (JEMS) 7, 69–99 (2005)
39. Bovier, A., Manzo, F.: Metastability in Glauber dynamics in the low-temperature limit: be-
yond exponential asymptotics. J. Stat. Phys. 107, 757–779 (2002)
40. Bovier, A., Neukirch, R.: A note on metastable behaviour in the zero-range process. In:
Griebel, M. (ed.) Singular Phenomena and Scaling in Mathematical Models, pp. 365–376.
Springer, Berlin (2013)
41. Brassesco, S.: Some results on small random perturbations of an infinite-dimensional dy-
namical system. Stoch. Process. Appl. 38, 33–53 (1991)
42. Brassesco, S., Buttà, P.: Interface fluctuations for the D = 1 stochastic Ginzburg-Landau
equation with nonsymmetric reaction term. J. Stat. Phys. 93, 1111–1142 (1998)
43. Burke, C.J., Rosenblatt, M.: A Markovian function of a Markov chain. Ann. Math. Stat. 29,
1112–1122 (1958)
44. Buslov, V.A., Makarov, K.A.: A time-scale hierarchy with small diffusion. Teor. Mat. Fiz.
76, 219–230 (1988)
45. Buslov, V.A., Makarov, K.A.: Life spans and least eigenvalues of an operator of small diffu-
sion. Mat. Zametki 51, 160 (1992)
46. Cancrini, N., Cesi, F., Martinelli, F.: The spectral gap for the Kawasaki dynamics at low
temperature. J. Stat. Phys. 95, 215–271 (1999)
47. Cancrini, N., Martinelli, F.: On the spectral gap of Kawasaki dynamics under a mixing con-
dition revisited. J. Math. Phys. 41, 1391–1423 (2000)
48. Cancrini, N., Martinelli, F., Roberto, C.: The logarithmic Sobolev constant of Kawasaki dy-
namics under a mixing condition revisited. Ann. Inst. Henri Poincaré Probab. Stat. 38, 385–
436 (2002)
49. Cancrini, N., Martinelli, F., Roberto, C.: Spectral gap and logarithmic Sobolev constant of
Kawasaki dynamics under a mixing condition revisited. In: In and out of Equilibrium (Mam-
bucaba, 2000). Progr. Probab., vol. 51, pp. 259–271. Birkhäuser, Boston (2002)
50. Caputo, P., Lacoin, H., Martinelli, F., Simenhaus, F., Toninelli, F.L.: Polymer dynamics in
the depinned phase: metastability with logarithmic barriers. Probab. Theory Relat. Fields
153, 587–641 (2012)
51. Cassandro, M., Galves, A., Olivieri, E., Vares, M.E.: Metastable behavior of stochastic dy-
namics: a pathwise approach. J. Stat. Phys. 35, 603–634 (1984)
52. Cassandro, M., Olivieri, E., Picco, P.: Small random perturbations of infinite-dimensional
dynamical systems and nucleation theory. Ann. Inst. Henri Poincaré, Phys. Théor. 44, 343–
396 (1986)
53. Catoni, O., Cerf, R.: The exit path of a Markov chain with rare transitions. ESAIM Probab.
Stat. 1, 95–144 (1995/97)
54. Cerf, R., Manzo, F.: Nucleation and growth for the Ising model in d dimensions at very low
temperatures. Ann. Probab. 41, 3697–3785 (2013)
55. Chaganty, N.R., Sethuraman, J.: Strong large deviation and local limit theorems. Ann.
Probab. 21, 1671–1690 (1993)
56. Chatterjee, S., Durrett, R.: Contact processes on random graphs with power law degree dis-
tributions have critical value 0. Ann. Probab. 37, 2332–2356 (2009)
57. Chenal, F., Millet, A.: Uniform large deviations for parabolic SPDEs and applications. Stoch.
Process. Appl. 72, 161–186 (1997)
58. Chow, Y.S., Teicher, H.: Probability Theory, 3rd edn. Springer Texts in Statistics. Springer,
New York (1997)
59. Cirillo, E.: A note on the metastability of the Ising model: the alternate updating case. J. Stat.
Phys. 106, 335–390 (2002)
60. Cirillo, E., Nardi, F.: Metastability for the Ising model with a parallel dynamics. J. Stat. Phys.
110, 183–217 (2003)
61. Cirillo, E., Nardi, F., Polosa, A.: Magnetic order in the Ising model with parallel dynamics.
Phys. Rev. E 64, 57103 (2001)
62. Cirillo, E., Nardi, F., Spitoni, C.: Competitive nucleation in reversible probabilistic cellular
automata. Phys. Rev. E 78, 040601 (2008)
63. Cirillo, E., Nardi, F., Spitoni, C.: Metastability for reversible probabilistic cellular automata
with self-interaction. J. Stat. Phys. 132, 431–471 (2008)
64. Cirillo, E., Nardi, F., Spitoni, C.: Competitive nucleation in metastable systems. Commun.
SIMAI Congr. 3, 040601(R) (2009)
65. Cirillo, E.N.M., Nardi, F.R.: Relaxation height in energy landscapes: an application to mul-
tiple metastable states. J. Stat. Phys. 150, 1080–1114 (2013)
66. Coddington, E., Levinson, N.: Theory of Ordinary Differential Equations. McGraw-Hill,
New York-Toronto-London (1955)
67. Da Prato, G., Debussche, A.: Strong solutions to the stochastic quantization equations. Ann.
Probab. 31, 1900–1916 (2003)
68. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Encyclopedia of
Mathematics and Its Applications, vol. 44. Cambridge University Press, Cambridge (1992)
69. Dai Pra, P., den Hollander, F.: McKean-Vlasov limit for interacting random processes in
random media. J. Stat. Phys. 84, 735–772 (1996)
70. Davies, E.: Dynamical stability of metastable states. J. Funct. Anal. 46, 373–386 (1982)
71. Davies, E.: Spectral properties of metastable Markov semigroups. J. Funct. Anal. 52, 315–
329 (1983)
72. Davies, E.B.: Metastability and the Ising model. J. Stat. Phys. 27, 657–675 (1982)
73. Davies, E.B.: Metastable states of symmetric Markov semigroups. I. Proc. Lond. Math. Soc.
45, 133–150 (1982)
74. Davies, E.B.: Metastable states of symmetric Markov semigroups. II. J. Lond. Math. Soc. 26,
541–556 (1982)
75. Dawson, D., Greven, A.: Spatial Fleming-Viot Models with Selection and Mutation. Lecture
Notes in Mathematics, vol. 2092. Springer, Berlin (2014)
76. de Hoog, F.R., Anderssen, R.S.: Asymptotic formulas for discrete eigenvalue problems in
Liouville normal form. Math. Models Methods Appl. Sci. 11, 43–56 (2001)
77. Dehghanpour, P., Schonmann, R.H.: Metropolis dynamics relaxation via nucleation and
growth. Commun. Math. Phys. 188, 89–119 (1997)
78. Dehghanpour, P., Schonmann, R.H.: A nucleation-and-growth model. Probab. Theory Relat.
Fields 107, 123–135 (1997)
79. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Applica-
tions of Mathematics (New York), vol. 38. Springer, New York (1998)
80. den Hollander, F.: Large Deviations. Fields Institute Monographs, vol. 14. American Mathe-
matical Society, Providence (2000)
81. den Hollander, F.: Metastability under stochastic dynamics. Stoch. Process. Appl. 114, 1–26
(2004)
82. den Hollander, F., Jansen, S.: Berman-Konsowa principle for reversible Markov jump pro-
cesses. arXiv:1309.1305, to appear in Markov Process. Relat. Fields (2015)
83. den Hollander, F., Jansen, S.: Metastability at low temperature for continuum interacting
particle systems (2015, in preparation)
84. den Hollander, F., Nardi, F., Olivieri, E., Scoppola, E.: Droplet growth for three-dimensional
Kawasaki dynamics. Probab. Theory Relat. Fields 125, 153–194 (2003)
85. den Hollander, F., Nardi, F.R., Troiani, A.: Kawasaki dynamics with two types of particles:
stable/metastable configurations and communication heights. J. Stat. Phys. 145, 1423–1457
(2011)
86. den Hollander, F., Nardi, F.R., Troiani, A.: Kawasaki dynamics with two types of particles:
critical droplets. J. Stat. Phys. 149, 1013–1057 (2012)
87. den Hollander, F., Nardi, F.R., Troiani, A.: Metastability for Kawasaki dynamics at low tem-
perature with two types of particles. Electron. J. Probab. 17, 26 (2012)
88. den Hollander, F., Olivieri, E., Scoppola, E.: Metastability and nucleation for conservative
dynamics. J. Math. Phys. 41, 1424–1498 (2000)
89. den Hollander, F., Olivieri, E., Scoppola, E.: Nucleation in fluids: some rigorous results.
Physica A 279, 110–122 (2000)
90. den Hollander, F., Olivieri, E., Scoppola, E.: Metastability and nucleation for conservative
dynamics. Markov Process. Relat. Fields 7, 51–53 (2001)
91. Deuschel, J.-D., Stroock, D.: Large Deviations. Pure and Applied Mathematics, vol. 137.
Academic Press, Boston (1989)
92. Dobrushin, R., Shlosman, S.: “Non-Gibbsian” states and their Gibbs description. Commun.
Math. Phys. 200, 125–179 (1999)
93. Dommers, S.: Metastability of the Ising model on random regular graphs at zero temperature.
arXiv:1411.6802 (2014)
94. Donsker, M., Varadhan, S.: A law of the iterated logarithm for total occupation times of
transient Brownian motion. Commun. Pure Appl. Math. 33, 365–393 (1980)
95. Doob, J.L.: Classical Potential Theory and Its Probabilistic Counterpart. Grundlehren
der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences],
vol. 262. Springer, New York (1984)
96. Doyle, P., Snell, J.: Random Walks and Electric Networks. Carus Mathematical Monographs,
vol. 22. Mathematical Association of America, Washington (1984)
97. Doyle, P.G.: Energy for Markov chains. Preprint available at
http://math.dartmouth.edu/~doyle/docs/energy/energy.pdf (1994)
98. Dupuis, P., Ellis, R.: A Weak Convergence Approach to the Theory of Large Deviations.
Wiley Series in Probability and Statistics: Probability and Statistics. Wiley, New York (1997)
99. E, W., Ren, W., Vanden-Eijnden, E.: Energy landscapes and rare events. In: Proceedings of
the International Congress of Mathematicians, Vol. I (Beijing, 2002), pp. 621–630. Higher
Ed. Press, Beijing (2002)
100. E, W., Vanden-Eijnden, E.: Towards a theory of transition paths. J. Stat. Phys. 123, 503–523
(2006)
101. Eckhoff, M.: Capacity and the Low Lying Spectrum in Attractive Markov Chains. Ph.D.
thesis, Universität Potsdam (2000)
102. Eckhoff, M.: The low lying spectrum of irreversible, infinite state Markov chains in
the metastable regime. Technical report. Preprint, available at
http://www.math.uzh.ch/fileadmin/user/eckhoff/publikation/specirrev.pdf (2002)
103. Ellis, R.: Entropy, Large Deviations, and Statistical Mechanics. Grundlehren der Mathema-
tischen Wissenschaften, vol. 271. Springer, New York (1985)
104. Ethier, S., Kurtz, T.: Markov Processes. Characterization and Convergence. Wiley Series in
Probability and Mathematical Statistics: Probability and Mathematical Statistics. Wiley, New
York (1986)
105. Evans, M.: Phase transitions in one-dimensional nonequilibrium systems. Braz. J. Phys. 30,
42–57 (2000)
106. Eyring, H.: The activated complex in chemical reactions. J. Chem. Phys. 3, 107–115 (1935)
107. Faris, W., Jona-Lasinio, G.: Large fluctuations for a nonlinear heat equation with noise. J.
Phys. A 15, 3025–3055 (1982)
108. Feller, W.: An Introduction to Probability Theory and Its Applications. Vol. I, 3rd edn. Wiley,
New York (1968)
109. Feller, W.: An Introduction to Probability Theory and Its Applications. Vol. II, 2nd edn.
Wiley, New York (1971)
110. Feng, J., Kurtz, T.: Large Deviations for Stochastic Processes. Mathematical Surveys and
Monographs, vol. 131. American Mathematical Society, Providence (2006)
111. Fernandez, R., Manzo, F., Nardi, F., Scoppola, E., Sohier, J.: Conditioned, quasi-stationary,
restricted measures and escape from metastable states. ArXiv e-prints (Oct. 2014)
112. Fernandez, R., Manzo, F., Nardi, F.R., Scoppola, E.: Asymptotically exponential hitting times
and metastability. arXiv:1406.2637 (2014)
113. Fiedler, B., Rocha, C.: Heteroclinic orbits of semilinear parabolic equations. J. Differ. Equ.
125, 239–281 (1996)
114. Fontes, L.R., Mathieu, P., Picco, P.: On the averaged dynamics of the random field Curie-
Weiss model. Ann. Appl. Probab. 10, 1212–1245 (2000)
115. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Grundlehren
der Mathematischen Wissenschaften, vol. 260. Springer, New York (1984)
116. Fukushima, M., Oshima, Y., Takeda, M.: Dirichlet Forms and Symmetric Markov Processes.
de Gruyter Studies in Mathematics, vol. 19. de Gruyter, Berlin (2011), extended edition
117. Funaki, T.: Random motion of strings and related stochastic evolution equations. Nagoya
Math. J. 89, 129–193 (1983)
118. Gaudillière, A., den Hollander, F., Nardi, F.R., Olivieri, E., Scoppola, E.: Ideal gas approxi-
mation for a two-dimensional rarefied gas under Kawasaki dynamics. Stoch. Process. Appl.
119, 737–774 (2009)
119. Gaudillière, A., den Hollander, F., Nardi, F.R., Olivieri, E., Scoppola, E.: Droplet dynamics
in a two-dimensional rarified gas under Kawasaki dynamics (2015, in preparation)
120. Gaudillière, A., den Hollander, F., Nardi, F.R., Olivieri, E., Scoppola, E.: Homogeneous nu-
cleation for two-dimensional Kawasaki dynamics (2015, in preparation)
121. Gaudillière, A., Landim, C.: A Dirichlet principle for non reversible Markov chains and some
recurrence theorems. Probab. Theory Relat. Fields 158, 55–89 (2014)
122. Gaudillière, A., Olivieri, E., Scoppola, E.: Nucleation pattern at low temperature for local
Kawasaki dynamics in two dimensions. Markov Process. Relat. Fields 11, 553–628 (2005)
123. Gaveau, B., Moreau, M.: Metastable relaxation times and absorption probabilities for multi-
dimensional stochastic systems. J. Phys. A 33, 4837–4850 (2000)
124. Gaveau, B., Schulman, L.S.: Theory of nonequilibrium first-order phase transitions for
stochastic dynamics. J. Math. Phys. 39, 1517–1533 (1998)
125. Georgii, H.-O.: Gibbs Measures and Phase Transitions, 2nd edn. de Gruyter Studies in Math-
ematics, vol. 9. de Gruyter, Berlin (1988)
126. Gilbarg, D., Trudinger, N.: Elliptic Partial Differential Equations of Second Order, 2nd edn.
Grundlehren der Mathematischen Wissenschaften, vol. 224. Springer, Berlin (1983)
127. Glasstone, S., Laidler, K., Eyring, H.: The Theory of Rate Processes. McGraw-Hill, New
York (1941)
128. Gois, B., Landim, C.: Zero-temperature limit of the Kawasaki dynamics for the Ising lattice
gas in a large two-dimensional torus. Ann. Probab. 43, 2151–2203 (2015)
129. Grassberger, P., Barkema, G., Nadler, W.: Monte Carlo Approach to Biopolymers and Protein
Folding. World Scientific, Singapore (1998)
130. Großkinsky, S., Schütz, G.M.: Discontinuous condensation transition and nonequivalence of
ensembles in a zero-range process. J. Stat. Phys. 132, 77–108 (2008)
131. Großkinsky, S., Schütz, G.M., Spohn, H.: Condensation in the zero range process: stationary
and dynamical properties. J. Stat. Phys. 113, 389–410 (2003)
132. Großkinsky, S., Spohn, H.: Stationary measures and hydrodynamics of zero range processes
with several species of particles. Bull. Braz. Math. Soc. (N.S.), 489–507 (2003)
133. Gyöngy, I.: Lattice approximations for stochastic quasi-linear parabolic partial differential
equations driven by space-time white noise. I. Potential Anal. 9, 1–25 (1998)
134. Gyöngy, I., Pardoux, É.: On quasi-linear stochastic partial differential equations. Probab.
Theory Relat. Fields 94, 413–425 (1993)
135. Hairer, M.: A theory of regularity structures. Invent. Math. 198, 269–504 (2014)
136. Helffer, B., Klein, M., Nier, F.: Quantitative analysis of metastability in reversible diffusion
processes via a Witten complex approach. Mat. Contemp. 26, 41–85 (2004)
137. Helffer, B., Nier, F.: Quantitative analysis of metastability in reversible diffusion processes
via a Witten complex approach: the case with boundary. Mém. Soc. Math. Fr. (N.S.) 105,
vi+89 (2006)
138. Helffer, B., Sjöstrand, J.: Multiple wells in the semiclassical limit. I. Commun. Partial Differ.
Equ. 9, 337–408 (1984)
139. Holden, H., Oksendal, B., Uboe, J., Zhang, T.: Stochastic Partial Differential Equations.
Probability and Its Applications. Birkhäuser, Boston (1996)
140. Holley, R.A., Kusuoka, S., Stroock, D.W.: Asymptotics of the spectral gap with applications
to the theory of simulated annealing. J. Funct. Anal. 83, 333–347 (1989)
141. Huang, S.: The molecular and mathematical basis of Waddington’s epigenetic landscape:
A framework for post-Darwinian biology? BioEssays 34, 149–157 (2011)
142. Itō, K., McKean, H.: Diffusion Processes and Their Sample Paths. Springer, New York
(1965)
143. Jacod, J., Shiryaev, A.: Limit Theorems for Stochastic Processes, 2nd edn. Grundlehren der
Mathematischen Wissenschaften, vol. 288. Springer, Berlin (2003)
144. Kakutani, S.: Two-dimensional Brownian motion and harmonic functions. Proc. Imp. Acad.
(Tokyo) 20, 706–714 (1944)
145. Kakutani, S.: Markov process and the Dirichlet problem. Proc. Jpn. Acad. 21, 227–233
(1949), 1945
146. Kallenberg, O.: Foundations of Modern Probability, 2nd edn. Probability and Its Applications
(New York). Springer, New York (2002)
147. Kallianpur, G., Xiong, J.: Large deviations for a class of stochastic partial differential equa-
tions. Ann. Probab. 24, 320–345 (1996)
148. Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus. Graduate Texts in Math-
ematics. Springer, New York (1988)
149. Kauffman, S.: The Origins of Order. Oxford University Press, Oxford (1993)
150. Kemeny, J., Snell, J., Knapp, A.: Denumerable Markov Chains, 2nd edn. Springer, New York
(1976)
151. Kemeny, J.G., Snell, J.L.: Finite Markov Chains. The University Series in Undergraduate
Mathematics. D. Van Nostrand Co., Inc., Princeton-Toronto-London-New York (1960)
152. Kifer, Y.: Random Perturbations of Dynamical Systems. Progress in Probability and Statis-
tics, vol. 16. Birkhäuser, Boston (1988)
153. Knessl, C., Matkowsky, B.J., Schuss, Z., Tier, C.: An asymptotic theory of large deviations
for Markov jump processes. SIAM J. Appl. Math. 45, 1006–1028 (1985)
154. Kolokoltsov, V.: Semiclassical Analysis for Diffusions and Stochastic Processes. Lecture
Notes in Mathematics, vol. 1724. Springer, Berlin (2000)
155. Kotecký, R., Olivieri, E.: Droplet dynamics for asymmetric Ising model. J. Stat. Phys. 70,
1121–1148 (1993)
156. Kotecký, R., Olivieri, E.: Shapes of growing droplets—a model of escape from a metastable
phase. J. Stat. Phys. 75, 409–506 (1994)
157. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of chemical
reactions. Physica 7, 284–304 (1940)
158. Külske, C.: Metastates in disordered mean-field models: random field and Hopfield models.
J. Stat. Phys. 88, 1257–1293 (1997)
159. Landim, C.: Metastability for a non-reversible dynamics: The evolution of the condensate in
totally asymmetric zero range processes. Commun. Math. Phys. 330, 1–32 (2014)
160. Lawler, G.F.: Intersections of Random Walks. Probability and Its Applications. Birkhäuser,
Boston (1991)
161. Lawler, G.F., Schramm, O., Werner, W.: Conformal invariance of planar loop-erased random
walks and uniform spanning trees. Ann. Probab. 32, 939–995 (2004)
162. Lebowitz, J.L., Penrose, O.: Rigorous treatment of the van der Waals-Maxwell theory of the
liquid-vapor transition. J. Math. Phys. 7, 98–113 (1966)
163. Levin, D., Peres, Y., Wilmer, E.: Markov Chains and Mixing Times. American Mathematical
Society, Providence (2009)
164. Levin, D.A., Luczak, M.J., Peres, Y.: Glauber dynamics for the mean-field Ising model: cut-
off, critical power law, and metastability. Probab. Theory Relat. Fields 146, 223–265 (2010)
165. Levit, S., Smilansky, U.: A theorem on infinite products of eigenvalues of Sturm-Liouville
type operators. Proc. Am. Math. Soc. 65, 299–302 (1977)
166. Liggett, T.: Interacting Particle Systems. Grundlehren der Mathematischen Wissenschaften,
vol. 276. Springer, New York (1985)
167. Lubetzky, E., Martinelli, F., Sly, A., Toninelli, F.L.: Quasi-polynomial mixing of the 2D
stochastic Ising model with “plus” boundary up to criticality. J. Eur. Math. Soc. (JEMS) 15,
339–386 (2013)
168. Lubetzky, E., Sly, A.: Critical Ising on the square lattice mixes in polynomial time. Commun.
Math. Phys. 313, 815–836 (2012)
169. Maier, R.S., Stein, D.L.: Limiting exit location distributions in the stochastic exit problem.
SIAM J. Appl. Math. 57, 752–790 (1997)
170. Maier, R.S., Stein, D.L.: Droplet nucleation and domain wall motion in a bounded interval.
Phys. Rev. Lett. 87, 270601 (2001)
171. Manzo, F., Nardi, F.R., Olivieri, E., Scoppola, E.: On the essential features of metastability:
tunnelling time and critical configurations. J. Stat. Phys. 115, 591–642 (2004)
172. Manzo, F., Olivieri, E.: Relaxation patterns for competing metastable states: a nucleation and
growth model. Markov Process. Relat. Fields 4, 549–570 (1998)
173. Manzo, F., Olivieri, E.: Dynamical Blume-Capel model: competing metastable states at infi-
nite volume. J. Stat. Phys. 104, 1029–1090 (2001)
174. Martinelli, F., Olivieri, E., Scoppola, E.: Small random perturbations of finite- and infinite-
dimensional dynamical systems: unpredictability of exit times. J. Stat. Phys. 55, 477–504
(1989)
175. Martinelli, F., Olivieri, E., Scoppola, E.: Metastability and exponential approach to equilib-
rium for low-temperature stochastic Ising models. J. Stat. Phys. 61, 1105–1119 (1990)
176. Martinelli, F., Sbano, L., Scoppola, E.: Small random perturbation of dynamical systems:
recursive multiscale analysis. Stoch. Stoch. Rep. 49, 253–272 (1994)
177. Martinelli, F., Scoppola, E.: Small random perturbations of dynamical systems: exponential
loss of memory of the initial condition. Commun. Math. Phys. 120, 25–69 (1988)
178. Martinelli, F., Toninelli, F.L.: On the mixing time of the 2D stochastic Ising model with
“plus” boundary conditions at low temperature. Commun. Math. Phys. 296, 175–213 (2010)
179. Mathieu, P.: Spectra, exit times and long time asymptotics in the zero-white-noise limit.
Stoch. Stoch. Rep. 55, 1–20 (1995)
180. Mathieu, P., Picco, P.: Metastability and convergence to equilibrium for the random field
Curie-Weiss model. J. Stat. Phys. 91, 679–732 (1998)
181. Matkowsky, B.J., Schuss, Z.: On the lifetime of a metastable state at low noise. Phys. Lett.
A 95, 213–215 (1983)
182. Matkowsky, B.J., Schuss, Z., Tier, C.: Uniform expansion of the transition rate in Kramers’
problem. J. Stat. Phys. 35, 443–456 (1984)
183. Menz, G., Schlichting, A.: Poincaré and logarithmic Sobolev inequalities by decomposition
of the energy landscape. Ann. Probab. 42, 1809–1884 (2014)
184. Metzner, P., Schütte, C., Vanden-Eijnden, E.: Transition path theory for Markov jump pro-
cesses. Multiscale Model. Simul. 7, 1192–1219 (2008)
185. Miclo, L.: Comportement de spectres d’opérateurs de Schrödinger à basse température. Bull.
Sci. Math. 119, 529–553 (1995)
186. Mogul’skiı̆, A.A.: Large deviations for the trajectories of multidimensional random walks.
Teor. Verojatn. Primen. 21, 309–323 (1976)
187. Mourrat, J.-C., Valesin, D.: Phase transition of the contact process on random regular graphs.
arXiv:1405.0865 (2014)
188. Nardi, F.R., Olivieri, E.: Low temperature stochastic dynamics for an Ising model with alter-
nating field. Markov Process. Relat. Fields 2, 117–166 (1996)
189. Nardi, F.R., Olivieri, E., Scoppola, E.: Anisotropy effects in nucleation for conservative dy-
namics. J. Stat. Phys. 119, 539–595 (2005)
190. Nardi, F.R., Spitoni, C.: Sharp asymptotics for stochastic dynamics with parallel updating
rule. J. Stat. Phys. 146, 701–718 (2012)
191. Neukirch, R.: Metastability in the Zero-range Process. Diploma thesis, Bonn University
(2011)
192. Neves, E.: A discrete variational problem related to Ising droplets at low temperatures. J.
Stat. Phys. 80 (1995)
193. Neves, E.J., Schonmann, R.H.: Critical droplets and metastability for a Glauber dynamics at
very low temperatures. Commun. Math. Phys. 137, 209–230 (1991)
194. Newman, M., Barkema, G.: Monte Carlo Methods in Statistical Physics. Oxford University
Press, Oxford (1999)
195. Norris, J.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics,
vol. 2. Cambridge University Press, Cambridge (1998)
196. Olivieri, E., Scoppola, E.: Markov chains with exponentially small transition probabilities:
first exit problem from a general domain. I. The reversible case. J. Stat. Phys. 79, 613–647
(1995)
197. Olivieri, E., Scoppola, E.: Markov chains with exponentially small transition probabilities:
first exit problem from a general domain. II. The general case. J. Stat. Phys. 84, 987–1041
(1996)
198. Olivieri, E., Vares, M.E.: Large Deviations and Metastability. Encyclopedia of Mathematics
and Its Applications, vol. 100. Cambridge University Press, Cambridge (2005)
199. Penrose, O., Lebowitz, J.: Rigorous treatment of metastable states in the van der Waals-
Maxwell theory. J. Stat. Phys. 3, 211–236 (1971)
200. Penrose, O., Lebowitz, J.: Towards a rigorous molecular theory of metastability. In:
Fluctuation Phenomena, 2nd edn. North-Holland, Amsterdam (1987)
201. Pollak, E., Talkner, P.: Reaction rate theory: What it was, where is it today, and where is it
going? Chaos 15, 026116 (2005)
202. Presutti, E.: Scaling Limits in Statistical Mechanics and Microstructures in Continuum
Mechanics. Theoretical and Mathematical Physics. Springer, Berlin (2009)
203. Prévôt, C., Röckner, M.: A Concise Course on Stochastic Partial Differential Equations.
Lecture Notes in Mathematics, vol. 1905. Springer, Berlin (2007)
204. Reed, M., Simon, B.: Methods of Modern Mathematical Physics. I. Functional Analysis, 2nd
edn. Academic Press [Harcourt Brace Jovanovich, Publishers], New York (1980)
205. Révész, P.: Random Walk in Random and Nonrandom Environments. World Scientific,
Teaneck (1990)
206. Rikvold, P., Tomita, H., Miyashita, S., Sides, S.: Metastable lifetimes in a kinetic Ising model:
dependence on field and system size. Phys. Rev. E 49, 5080–5090 (1994)
207. Rogers, L., Williams, D.: Diffusions, Markov Processes, and Martingales. Vol. 2. Wiley
Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics.
Wiley, New York (1987)
208. Rogers, L., Williams, D.: Diffusions, Markov Processes, and Martingales. Vol. 1, 2nd edn.
Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical
Statistics. Wiley, New York (1994)
209. Ruelle, D.: Statistical Mechanics: Rigorous Results. Benjamin, New York-Amsterdam
(1969)
210. Schlichting, A.: The Eyring-Kramers Formula for Poincaré and Logarithmic Sobolev
Inequalities. Ph.D. thesis, Leipzig University (2012)
211. Schonmann, R.H.: The pattern of escape from metastability of a stochastic Ising model.
Commun. Math. Phys. 147, 231–240 (1992)
212. Schonmann, R.H.: Slow droplet-driven relaxation of stochastic Ising models in the vicinity
of the phase coexistence region. Commun. Math. Phys. 161, 1–49 (1994)
213. Schonmann, R.H.: Theorems and conjectures on the droplet-driven relaxation of stochastic
Ising models. In: Probability and Phase Transition (Cambridge, 1993). NATO Adv. Sci. Inst.
Ser. C Math. Phys. Sci., vol. 420, pp. 265–301. Kluwer Acad., Dordrecht (1994)
214. Schonmann, R.H.: Metastability and the Ising model. In: Proceedings of the International
Congress of Mathematicians (Berlin, 1998), vol. III, pp. 173–181 (1998)
215. Schonmann, R.H., Shlosman, S.B.: Wulff droplets and the metastable relaxation of kinetic
Ising models. Commun. Math. Phys. 194, 389–462 (1998)
216. Schütte, C., Huisinga, W., Meyn, S.: Metastability of diffusion processes. In: IUTAM
Symposium on Nonlinear Stochastic Dynamics. Solid Mech. Appl., vol. 110, pp. 71–81. Kluwer
Acad., Dordrecht (2003)
217. Schütte, C., Sarich, M.: Metastability and Markov State Models in Molecular Dynamics.
Courant Lecture Notes in Mathematics, vol. 24. Courant Institute of Mathematical Sciences/
American Mathematical Society, Providence/New York (2013)
218. Sewell, G.: Quantum Theory of Collective Phenomena. Oxford University Press, Oxford
(1986)
219. Slowik, M.: Contributions to the Potential Theoretic Approach to Metastability with
Applications to the Random Field Curie-Weiss-Potts Model. Ph.D. thesis, Technische Universität
Berlin (2012)
220. Slowik, M.: A note on variational representations of capacities for reversible and non-
reversible Markov chains. Unpublished, Technische Universität Berlin (2012)
221. Sowers, R.B.: Large deviations for a reaction-diffusion equation with non-Gaussian
perturbations. Ann. Probab. 20, 504–537 (1992)
222. Stroock, D.: An Introduction to Markov Processes. Graduate Texts in Mathematics, vol. 230.
Springer, Berlin (2005)
223. Stroock, D., Varadhan, S.: Multidimensional Diffusion Processes. Grundlehren der
Mathematischen Wissenschaften, vol. 233. Springer, Berlin (1979)
224. Sugiura, M.: Metastable behaviors of diffusion processes with small parameter. J. Math. Soc.
Jpn. 47, 755–788 (1995)
225. Sugiura, M.: Exponential asymptotics in the small parameter exit problem. Nagoya Math. J.
144, 137–154 (1996)
226. Sznitman, A.-S.: Brownian Motion, Obstacles and Random Media. Springer Monographs in
Mathematics. Springer, Berlin (1998)
227. van den Berg, M.: Exit and return of a simple random walk. Potential Anal. 23, 45–53 (2005)
228. van Kampen, N.: Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam
(1981)
229. van ’t Hoff, J.: Études de Dynamiques Chimiques. F. Muller, Amsterdam (1884)
230. Vanden-Eijnden, E., Westdickenberg, M.G.: Rare events in stochastic partial differential
equations on large spatial domains. J. Stat. Phys. 131, 1023–1038 (2008)
231. Vanheuverzwijn, P.: Metastable states in the infinite Ising model. J. Math. Phys. 20,
2665–2670 (1979)
232. Varadhan, S.R.S.: Large Deviations and Applications. CBMS-NSF Regional Conference
Series in Applied Mathematics, vol. 46. Society for Industrial and Applied Mathematics
(SIAM), Philadelphia (1984)
233. Ventcel’, A.D.: The asymptotic behavior of the first eigenvalue of a second order differential
operator with a small parameter multiplying the highest derivatives. Teor. Verojatn. Primen.
20, 610–613 (1975)
234. Ventcel’, A.D.: Formulas for eigenfunctions and eigenmeasures that are connected with a
Markov process. Teor. Verojatn. Primen. 18, 3–29 (1973)
235. Walsh, J.B.: An introduction to stochastic partial differential equations. In: École d’été de
probabilités de Saint-Flour, XIV—1984. Lecture Notes in Math., vol. 1180, pp. 265–439.
Springer, Berlin (1986)
236. Weidenmüller, H.A., Zhang, J.S.: Stationary diffusion over a multidimensional potential
barrier: a generalization of Kramers’ formula. J. Stat. Phys. 34, 191–201 (1984)
237. Wolfrum, M.: A sequence of order relations: encoding heteroclinic connections in scalar
parabolic PDE. J. Differ. Equ. 183, 56–78 (2002)
238. Wreszinski, W.F., Salinas, S.R.A.: The mean field Ising model in a random external magnetic
field. J. Stat. Phys. 41, 299–313 (1985)
Index
Heat kernel, 86
Heat-bath dynamics, 403
Hessian, 249, 267, 279, 280, 340
  eigenvalue, 267
Hille-Yosida theorem, 88, 92
Hitting time, 49, 200
Hysteresis, 7
I
Inequality
  Harnack, 240
  Hausdorff-Young, 313
  Hölder, 33
  Jensen, 33
  maximum, 48, 58
  Minkowski, 33
Initial distribution, 64
Inverse temperature, 326, 384
Itō
  calculus, 102
  formula, 107
  integral, 106
  isometry, 106
J
Jump process, 101
K
Kawasaki dynamics, 426
Kirchhoff’s law, 176
Kramers
  equation, 171
  formula, 9, 16, 173
  model, 9, 16, 265
L
Lagrangian, 138
Langevin equation, 16
Large deviation principle, 125, 129
  Dawson-Gärtner projective limit, 127
  exponential tightness, 126
  Gärtner-Ellis, 128
  path space, 129
  weak, 126
Large deviations
  for SPDEs, 136
  on path space, 129
Last-exit biased distribution, 153, 463, 482
Lebesgue integral, 31
Lebesgue’s dominated convergence theorem, 32
Legendre transform, 128
Local martingale, 61
Lumping, 234
M
Markov jump process, 79, 101
Markov process, 66, 86
  ρ-metastable, 191
  aperiodic, 75
  continuous-time, 84
  discrete-time, 63
  irreducible, 75
  metastable family, 190
  stationary, 66
  transition function, 86
Markov property, 65
  strong, 67, 91
Markov semigroup, 68
Martingale, 44
  convergence theorem, 45, 57
  local, 61
  sub, 44, 54
  super, 44, 54
Martingale problem, 69, 92, 96
  continuous-time, 84
  discrete-time, 69
  existence, 100
  Markov, 70, 96
  uniqueness, 99
Martingale transform, 45
Maximum inequality, 46, 48, 58
Maximum principle, 71
Mean hitting time, 195
  1-dim diffusion, 172
  1-dim discrete, 157
  basic formula, 154
Mean-field model, 325
Measurable map, 29
Measure, 28
  σ-finite, 28
  Borel, 29
  equilibrium, 149
  finite, 28
  induced, 30
  invariant, 66, 150, 162
  measurable space, 28
  measure space, 28
  reversible, 162
  Wiener, 83
Metastability
  characterisation, 189
  computational approach, 12
  pathwise approach, 10
  potential-theoretic approach, 11
  spectral approach, 11
  spectral characterisation, 200
Metastable
  configurations, 385
  exit time, 200
  points, 191
  set, 210, 249, 395
  state, 6, 15, 142
Metastable regime
  Glauber dynamics, 410, 420, 546
  Kawasaki dynamics, 427, 454
Metropolis dynamics, 384
Microscopic flow, 355
Minimal gate, 385
Model reduction, 19
Mogul’skiĭ’s theorem, 138
Monotone class theorem, 29
Monotone convergence theorem, 31
Morse function, 249
N
Normal semigroup, 87
Nucleation path, 462
O
Ohm’s law, 157
Optimal path, 248, 385
Optimal transport, 304
Optional sampling theorem, 60
P
Path large deviations
  Markov processes, 137
Phase transition, 5
Poisson kernel, 148, 152, 158, 167, 238
Polish space, 29
Positive recurrent, 75
Previsible process, 45
Principal Dirichlet eigenvalue, 281
Principal eigenvalue, 209
Probabilistic cellular automata, 404
Probability
  density, 37
  measure, 28
  regular conditional, 43
  space, 28
Process
  adapted, 44, 54
  ergodic, 75
  non-reversible, 181
  previsible, 45
  progressive, 59
  quadratic variation, 49
  reversible, 162
Progressive process, 59
Protocritical set, 386, 411, 421, 429
Q
Quadratic variation, 49
Quasi-invariant set, 190
Quasi-potential, 140
R
Radon-Nikodým
  derivative, 37
  theorem, 37
Random variable, 30
Random walk, 155
Random-field Curie-Weiss model, 331, 345
Rate function, 125
Rayleigh-Ritz variational principle, 209
Recurrence, 74
Regular conditional probability, 43
  existence, 43
Regular domain, 161
Regularisable function, 53
Renewal estimate, 192, 268
  approximate, 237
Renormalisation, 233
Reversed process, 182
Reversibility, 150, 162
Reversible
  measure, 162
  process, 162
S
Saddle points, 248
Sample path, 38
Sanov’s theorem, 128
Schilder’s theorem, 129
Semigroup, 87
Separable, 28
Simple function, 30
Simple process, 105
Simple random walk, 490
Small eigenvalues, 203, 289
Sobolev space, 237
Solution
  mild, 118
  strong, 108, 109
  weak, 108
Space
  L^p, 32
  complete, 28
  filtered, 44
  L^p, 32
  metric, 28
  Polish, 29
  Sobolev, 237
Spectrum, 200, 278, 396
Stability level, 385