Stochastic PDEs
M. Hairer
Mathematics Institute, The University of Warwick
Email: M.Hairer@Warwick.co.uk
Contents
1 Introduction
2 Definition of a Markov process
3 Dynamical systems
4 Stationary Markov processes
5 Main structure theorem
6 Existence of an invariant measure
7 A simple yet powerful uniqueness criterion
8 Hörmander’s condition
9 What about the infinite-dimensional case?
10 The Bismut-Elworthy-Li formula
11 The asymptotic strong Feller property
1 Introduction
These lecture notes cover the material presented at the LMS-EPSRC Short Course
on Stochastic Partial Differential Equations held at Imperial College London in July
2008. They extend the material presented at the ‘12th National Summer School in
Mathematics for Graduate Students’ that took place at the University of Wuhan in July
2007.
They are by no means meant to be exhaustive and a solid background in probability
theory and PDE theory is required to follow them. The structure of these notes is
as follows. In Sections 2 to 4, we introduce the concepts of a time-homogeneous
Markov process and its set of invariant measures. We then proceed in Section 5 to show a
very general structure theorem that gives us a feeling of what the set of all invariant
probability measures for a given Markov process can look like. Sections 6 and 7 are
then devoted to the presentation of a few criteria that yield existence and uniqueness of
the invariant measure(s). The main philosophy that we try to convey there is that:
• In order to have existence of an invariant measure, the Markov process should
satisfy some compactness property, together with some regularity.
• In order to have uniqueness of the invariant measure, the Markov process should
satisfy some irreducibility property, together with some regularity.
These two claims illustrate that the interplay between measure-theoretic notions (exis-
tence and uniqueness of an invariant measure) and topological concepts (compactness,
irreducibility) is a fundamental aspect of the ergodic theory of Markov processes.
Section 8 is devoted to an explanation (rather than a complete proof) of Hörman-
der’s famous ‘sums of squares’ theorem and how it can be used to check whether a
diffusion has transition probabilities that are continuous in the total variation distance,
thus satisfying the regularity condition required for showing the uniqueness of an in-
variant measure. In Section 9, we then show in which ways the proof of Hörmander’s
theorem breaks down for infinite-dimensional diffusions.
For situations where the forcing noise is sufficiently rough, we see however in Section 10
that not all is lost. In particular, we give a proof of the Bismut-Elworthy-Li
formula, which allows one to show the strong Feller property for a rather large class of
semilinear parabolic stochastic PDEs. In cases where the noise is very weak, this has no
chance of being applicable. It therefore motivates the introduction in Section 11 of the
asymptotic strong Feller property, which is the weakest type of regularity condition so
far, still ensuring uniqueness of the invariant measure when combined with topologi-
cal irreducibility. We also show that the asymptotic strong Feller property is satisfied
by a class of stochastic reaction-diffusion equations, provided that the noise acts on
sufficiently many Fourier modes with small wave number.
Definition 2.1 A stochastic process $\{X_t\}_{t \in T}$ taking values in a state space $\mathcal{X}$ is called
a Markov process if, for any $N > 0$, any ordered collection $t_{-N} < \ldots < t_0 < \ldots < t_N$
of times, and any two functions $f, g : \mathcal{X}^N \to \mathbb{R}$, the equality
Definition 2.3 A Markov operator over a Polish space X is a bounded linear operator
P : Bb (X ) → Bb (X ) such that:
• P1 = 1.
• Pϕ is positive whenever ϕ is positive.
• If a sequence {ϕn } ⊂ Bb (X ) converges pointwise to an element ϕ ∈ Bb (X ),
then Pϕn converges pointwise to Pϕ.
It is possible to check that the two definitions actually define one and the same
object in the following sense:
We will therefore use the terminologies ‘Markov transition kernel’ and ‘Markov
operator’ interchangeably. We will also use the symbol P for both a Markov operator
acting on bounded measurable functions and the corresponding transition probabilities.
In order to streamline the notation, we will make a further abuse of notation and use
the same symbol $\mathcal{P}$ for the operator acting on signed measures by
\[ (\mathcal{P}\mu)(A) = \int_{\mathcal{X}} \mathcal{P}(x, A)\, \mu(dx) . \]
It will hopefully always be clear from the context which is the object that we are
currently talking about.
When considering Markov processes with continuous time, it is natural to consider
a family of Markov operators indexed by time. We call such a family a Markov semigroup,
provided that it satisfies the relation $\mathcal{P}_{t+s} = \mathcal{P}_t \circ \mathcal{P}_s$ for any s, t > 0. We call
a Markov process X a time-homogeneous Markov process with semigroup {Pt } if, for
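any two times s < t and every measurable set $A \subset \mathcal{X}$, we have
\[ \mathbb{P}(X_t \in A \mid X_s) = \mathcal{P}_{t-s}(X_s, A) \]
almost surely.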
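A probability measure $\mu$ on $\mathcal{X}$ is invariant for the Markov operator $\mathcal{P}$ if the equality
\[ \int_{\mathcal{X}} (\mathcal{P}\varphi)(x)\, \mu(dx) = \int_{\mathcal{X}} \varphi(x)\, \mu(dx) \]
holds for every function $\varphi \in B_b(\mathcal{X})$. In other words, one has $\mathcal{P}\mu = \mu$. Similarly, $\mu$ is
invariant for a Markov semigroup $\{\mathcal{P}_t\}$ if $\mathcal{P}_t \mu = \mu$ for every positive time t.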
2.1 Examples
Example 2.5 (Finite state space) Take $\mathcal{X} = \{1, \ldots, n\}$ for some $n \in \mathbb{N}$. In this case,
both spaces $B_b(\mathcal{X})$ and $\mathcal{M}(\mathcal{X})$ are canonically isomorphic to $\mathbb{R}^n$ via the identifications
$\varphi_i = \varphi(i)$ for functions and $\mu_i = \mu(\{i\})$ for measures. The pairing between functions
and measures then corresponds to the Euclidean scalar product on $\mathbb{R}^n$.
A Markov operator over $\mathcal{X}$ acting on measures is therefore given by an n × n
matrix P with the properties that $P_{ij} \ge 0$ for any pair (i, j) and that $\sum_i P_{ij} = 1$ for
every j. The number $P_{ij}$ represents the probability of jumping to the state i, given that
the current state is j. The corresponding operator acting on functions is given by the
transpose matrix $P^T$, since $\langle P\mu, \varphi \rangle = \langle \mu, P^T \varphi \rangle$.
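As a small numerical illustration of the finite state space example, the following Python sketch checks the duality between the action on measures and the action on functions. The matrix entries and the helper names `act_on_measure`, `act_on_function` and `pairing` are made up for this illustration. With $P_{ij}$ read as the probability of jumping to i from j, conservation of mass requires every column of P to sum to one.

```python
from fractions import Fraction as Fr

# Hypothetical 3-state chain. Convention: P[i][j] is the probability of
# jumping to state i from state j, so every COLUMN sums to one.
P = [
    [Fr(1, 2), Fr(1, 5),  Fr(0)],
    [Fr(1, 2), Fr(3, 10), Fr(2, 5)],
    [Fr(0),    Fr(1, 2),  Fr(3, 5)],
]

def act_on_measure(P, mu):
    """(P mu)_i = sum_j P_ij mu_j."""
    n = len(P)
    return [sum(P[i][j] * mu[j] for j in range(n)) for i in range(n)]

def act_on_function(P, phi):
    """(P^T phi)_j = sum_i P_ij phi_i."""
    n = len(P)
    return [sum(P[i][j] * phi[i] for i in range(n)) for j in range(n)]

def pairing(mu, phi):
    """Euclidean scalar product between a measure and a function."""
    return sum(m * p for m, p in zip(mu, phi))

mu = [Fr(1, 5), Fr(3, 10), Fr(1, 2)]   # a probability vector
phi = [Fr(1), Fr(-2), Fr(4)]           # an arbitrary function on the states

lhs = pairing(act_on_measure(P, mu), phi)    # <P mu, phi>
rhs = pairing(mu, act_on_function(P, phi))   # <mu, P^T phi>
print(lhs == rhs)
```

Exact rational arithmetic is used so that the identity can be checked with `==` rather than up to rounding.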
Example 2.6 (i.i.d. random variables) Take an arbitrary state space $\mathcal{X}$ and a probability
measure $\mu \in \mathcal{M}_1(\mathcal{X})$. A sequence $\{X_i\}$ of independent, identically distributed,
$\mathcal{X}$-valued random variables with law $\mu$ is a Markov process. The corresponding
Markov operator is given by $(\mathcal{P}\varphi)(x) = \int_{\mathcal{X}} \varphi(y)\, \mu(dy)$, which is always a constant
function.
Example 2.7 (Random walk) Let $\{\xi_n\}$ be a sequence of i.i.d. real-valued random
variables with law $\mu$ and define $X_0 = 0$, $X_n = \sum_{k=1}^n \xi_k$. The process X is called
a random walk with increments $\xi$. The most prominent example is the simple random
walk, which corresponds to the case where the $\xi_n$ are Bernoulli random variables taking
the values ±1 with equal probabilities. More information about the behaviour of
random walks can be found in [Spi76].
Example 2.8 (Brownian motion) This is probably the most studied Markov process
in continuous time, see for example the monograph [RY99]. Its state space is $\mathbb{R}$ and,
for a given $\sigma > 0$, its Markov semigroup is given by
\[ (\mathcal{P}_t \varphi)(x) = \frac{1}{\sigma\sqrt{2\pi t}} \int_{\mathbb{R}} e^{-\frac{(x-y)^2}{2\sigma^2 t}} \varphi(y)\, dy . \tag{2.1} \]
In terms of transition probabilities, we can also write this as $\mathcal{P}_t(x, \cdot\,) = N(x, \sigma^2 t)$,
the Gaussian law with mean x and variance $\sigma^2 t$. The Brownian motion with variance
$\sigma^2 = 1$ is also called the standard Wiener process. Brownian motion is named after the
19th century botanist Robert Brown, who studied the motion of grains of pollen in
suspension in a fluid [Bro28]. It can be obtained as a scaling limit
\[ B_t \sim \lim_{N \to \infty} \frac{1}{\sqrt{N}} X_{[Nt]} , \]
where $X_n$ denotes the random walk from the previous example, provided that the increments
are centred and that $\sigma^2 = \int_{\mathbb{R}} x^2 \mu(dx) < \infty$. The modeling idea is that the grain of pollen is
constantly bombarded by water molecules which push it into a random direction.¹
¹ A more realistic model, with a parameter $\gamma > 0$ taking into account the inertia of the grain of pollen, is
given by the process $B_\gamma(t) = \gamma \int_0^t e^{-\gamma(t-s)} B(s)\, ds$, where B is the mathematical Brownian motion that
we just described. The process $B_\gamma$ is sometimes referred to as the physical Brownian motion and converges
to the mathematical Brownian motion in the limit $\gamma \to \infty$.
It is interesting to note that the kernel appearing in the right hand side of (2.1) is
the fundamental solution of the heat equation, which implies that $\psi(x, t) = (\mathcal{P}_t \varphi)(x)$
solves the partial differential equation
\[ \frac{\partial \psi}{\partial t} = \frac{\sigma^2}{2} \frac{\partial^2 \psi}{\partial x^2} , \qquad \psi(x, 0) = \varphi(x) . \]
This link between Brownian motion and the heat equation was discovered by Einstein
in [Ein05] and still permeates much of stochastic analysis. It allows one to give probabilistic
proofs of analytical statements and vice versa.
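The semigroup formula (2.1) can be checked numerically on a test function for which the Gaussian integral is known in closed form. The sketch below is only an illustration: it approximates (2.1) with the trapezoidal rule and compares the result with $(\mathcal{P}_t \cos)(x) = e^{-\sigma^2 t/2} \cos x$. The function names, the truncation width and the number of grid points are choices made for this example and do not come from the notes.

```python
import math

def heat_semigroup(phi, x, t, sigma=1.0, n=4000, width=10.0):
    """Approximate (P_t phi)(x) from (2.1) with the trapezoidal rule.

    The integral over R is truncated to x +/- width standard deviations;
    n and width are tuning parameters chosen for this sketch.
    """
    s = sigma * math.sqrt(t)          # standard deviation of N(x, sigma^2 t)
    a = x - width * s                 # left end of the truncated domain
    h = 2.0 * width * s / n           # grid spacing
    total = 0.0
    for k in range(n + 1):
        y = a + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-(x - y) ** 2 / (2.0 * s * s)) * phi(y)
    return total * h / (s * math.sqrt(2.0 * math.pi))

def exact_cosine(x, t, sigma=1.0):
    """Closed form of (P_t cos)(x), obtained from the Gaussian integral."""
    return math.exp(-0.5 * sigma ** 2 * t) * math.cos(x)

approx = heat_semigroup(math.cos, 0.3, 0.7, sigma=1.5)
print(abs(approx - exact_cosine(0.3, 0.7, sigma=1.5)))
```

Since cos is an eigenfunction of the second derivative, the closed form also makes the exponential decay rate $\sigma^2/2$ of this mode visible.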
Example 2.10 (Autoregressive process) Let $\{\xi_n\}$ be a sequence of i.i.d. random variables
on $\mathbb{R}^d$ with law $\mu$ and let $\alpha \in \mathbb{R}$. Given an $\mathbb{R}^d$-valued random variable $X_0$
independent of $\{\xi_n\}$, we construct a sequence $\{X_n\}$ by the recursion formula
\[ X_{n+1} = \alpha X_n + \xi_n . \]
This is a Markov process in discrete time and the corresponding Markov operator is
given by
\[ (\mathcal{P}\varphi)(x) = \int_{\mathbb{R}^d} \varphi(\alpha x + y)\, \mu(dy) . \]
Actually, we can look at more general recursions of the form
\[ X_{n+1} = F(X_n, \xi_n) , \]
which still yield a Markov process with Markov operator
\[ (\mathcal{P}\varphi)(x) = \int_{\mathbb{R}^d} \varphi(F(x, y))\, \mu(dy) . \]
The autoregressive process and the random walk are two examples of Markov processes
with this structure.
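The formula for the Markov operator of a recursion $X_{n+1} = F(X_n, \xi_n)$ translates directly into code when the law of the noise has finite support. The following sketch assumes exactly that (a list of value and probability pairs), which is a simplification made for this illustration. The names `markov_operator`, `coin`, `P_ar` and `P_rw` are hypothetical.

```python
def markov_operator(F, noise_law):
    """Return the map phi -> P phi for the recursion X_{n+1} = F(X_n, xi_n).

    noise_law is a list of (value, probability) pairs describing a finitely
    supported law mu for the xi_n (an assumption of this sketch), so that
    the integral defining P phi becomes a finite sum.
    """
    def P(phi):
        return lambda x: sum(p * phi(F(x, y)) for y, p in noise_law)
    return P

# Increments taking the values -1 and +1 with equal probabilities.
coin = [(-1.0, 0.5), (1.0, 0.5)]

P_ar = markov_operator(lambda x, y: 0.5 * x + y, coin)  # autoregressive, alpha = 1/2
P_rw = markov_operator(lambda x, y: x + y, coin)        # simple random walk

square = lambda x: x * x

# For phi(x) = x^2 one has (P phi)(x) = alpha^2 x^2 + 1 in both cases,
# because the increments are centred with unit variance.
print(P_ar(square)(2.0), P_rw(square)(3.0))
```

Because `markov_operator` returns a function acting on functions, powers of the operator are obtained by composition, for example `P_ar(P_ar(square))`.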
3 Dynamical systems
In this section, we give a short survey of the basic notions and results from the theory
of dynamical systems. For a much more exhaustive overview, we refer to the excellent
monographs [Sin94, Wal82].
Remark 3.2 The reason why we use the letter E to denote a Polish space in this section
instead of $\mathcal{X}$ is that in the application we are interested in, we will take $E = \mathcal{X}^{\mathbb{Z}}$ and
$\Theta_t$ the shift map.
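Given a dynamical system, a natural object of interest is the set of probability measures
that are invariant under the action of $\Theta_t$. Denoting by $\Theta_t^* \mu$ the push-forward of
$\mu$ under the map $\Theta_t$, we define the set of invariant measures for $\{\Theta_t\}$ by
\[ \mathcal{J}(\Theta) = \{ \mu \in \mathcal{M}_1(E) \mid \Theta_t^* \mu = \mu \text{ for all } t \in T \} . \]
We can also define in a similar way the σ-algebra of all invariant subsets of E:
\[ \mathcal{I} = \{ A \subset E \mid A \text{ Borel and } \Theta_t^{-1}(A) = A \text{ for every } t \in T \} . \]
(This set obviously also depends on the choice of dynamical system, but we will omit
this in order not to overcrowd our notation.)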
One of the most striking results of the theory of dynamical systems is that some
kind of ‘law of large numbers’ can be shown to hold in great generality:
Theorem 3.3 (Birkhoff’s Ergodic Theorem) Let $(\Theta_t)_{t \in T}$ be a measurable dynamical
system over a Polish space E. Fix an invariant measure $\mu \in \mathcal{J}(\Theta)$ and let
$f \in L^1(E, \mu)$. Then,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(\Theta_n(x)) = E_\mu(f \mid \mathcal{I})(x) \]
for $\mu$-almost every $x \in E$. In particular, the expression on the left converges to a limit
for $\mu$-almost every starting point x.
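Birkhoff's theorem can be observed numerically on a standard example that is not taken from these notes: the rotation $\Theta(x) = x + \alpha \bmod 1$ of the unit interval by an irrational angle. Lebesgue measure is invariant and ergodic for this map, so the conditional expectation on the right hand side reduces to the space average $\int_0^1 f(x)\, dx$. The sketch below is an illustration under these assumptions; `birkhoff_average`, `ALPHA` and `rotation` are hypothetical names.

```python
import math

def birkhoff_average(f, theta, x, N):
    """Time average (1/N) * sum_{n=0}^{N-1} f(theta^n(x))."""
    total = 0.0
    for _ in range(N):
        total += f(x)
        x = theta(x)
    return total / N

# Rotation by the (irrational) golden mean; this choice is an assumption
# made for the illustration.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0
rotation = lambda x: (x + ALPHA) % 1.0

wave = lambda x: math.cos(2.0 * math.pi * x)   # space average 0
identity = lambda x: x                         # space average 1/2

print(birkhoff_average(wave, rotation, 0.1, 10000))
print(birkhoff_average(identity, rotation, 0.1, 10000))
```

For a rational angle the orbit is periodic and the time average generally depends on the starting point, which is the finite-dimensional picture of a non-trivial invariant σ-algebra.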
This theorem suggests strongly that an important class of invariant measures is
given by those under which the invariant σ-algebra is trivial.
µ - almost surely.
which is a contradiction and similarly for $A_-$. This implies that $\mu(A_0) = 1$, and so
$\mu(\bar f = E\bar f) = 1$ as required.
Before we turn to the proof of Theorem 3.3, we establish the following important
result:
Theorem 3.6 (Maximal Ergodic Theorem) With the notations of Theorem 3.3, define
\[ S_N(x) = \sum_{n=0}^{N-1} f(\Theta_n x) , \qquad M_N(x) = \max\{S_0(x), S_1(x), \ldots, S_N(x)\} , \]
with the convention $S_0 = 0$. Then, $\int_{\{M_N > 0\}} f(x)\, \mu(dx) \ge 0$ for every $N \ge 1$.
Furthermore, $\max\{S_1(x), \ldots, S_N(x)\} = M_N(x)$ on the set $\{M_N > 0\}$, so that
\begin{align*}
\int_{\{M_N > 0\}} f(x)\, \mu(dx) &\ge \int_{\{M_N > 0\}} \bigl(M_N(x) - M_N(\Theta(x))\bigr)\, \mu(dx) \\
&\ge E M_N - \int_{A_N} M_N(x)\, \mu(dx) ,
\end{align*}
where $A_N = \{\Theta(x) \mid M_N(x) > 0\}$. The first inequality follows from the fact that
$M_N \ge 0$ and the second inequality follows from the fact that $\Theta$ is measure-preserving.
Since $M_N \ge 0$, $\int_A M_N(x)\, \mu(dx) \le E M_N$ for every set A, so that the expression above
is greater than or equal to 0, which is the required result.
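The maximal ergodic inequality can be tested exactly on a toy dynamical system. The sketch below assumes the cyclic shift $\Theta(x) = x + 1 \bmod m$ on $\{0, \ldots, m-1\}$, which preserves the uniform measure, and integer-valued functions f so that all sums are computed without rounding. The function name `maximal_ergodic_sum` and the random test data are made up for this illustration.

```python
import random

def maximal_ergodic_sum(f, N):
    """Return m times the integral of f over {M_N > 0} for the cyclic shift.

    f is a list of integers indexed by the states 0, ..., m-1, the map is
    Theta(x) = x + 1 mod m, and the invariant measure is uniform.
    """
    m = len(f)
    total = 0
    for x in range(m):
        S, M = 0, 0                      # S_0 = 0, hence M_N >= 0
        for n in range(N):
            S += f[(x + n) % m]          # S_{n+1}(x) = S_n(x) + f(Theta^n x)
            M = max(M, S)
        if M > 0:                        # x belongs to {M_N > 0}
            total += f[x]
    return total

rng = random.Random(0)
f = [rng.randint(-5, 5) for _ in range(12)]
print([maximal_ergodic_sum(f, N) for N in (1, 3, 12, 30)])
```

Every value printed is non-negative, as the theorem predicts, whatever the sign of the mean of f.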
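The sequence of sets $\{M_N^\varepsilon > 0\}$ increases to the set
$B^\varepsilon \equiv \{\sup_N S_N^\varepsilon > 0\} = \{\sup_N \frac{S_N^\varepsilon}{N} > 0\}$. It follows from (3.1) that
\[ B^\varepsilon = \{\bar\eta > \varepsilon\} \cap \Bigl\{ \sup_N \frac{S_N}{N} > \varepsilon \Bigr\} = \{\bar\eta > \varepsilon\} = A^\varepsilon . \]
Since $E|f^\varepsilon| \le E|f| + \varepsilon < \infty$, the dominated convergence theorem implies that
\[ \lim_{N \to \infty} \int_{\{M_N^\varepsilon > 0\}} f^\varepsilon(x)\, \mu(dx) = \int_{A^\varepsilon} f^\varepsilon(x)\, \mu(dx) \ge 0 , \]
and so
\begin{align*}
0 \le \int_{A^\varepsilon} f^\varepsilon(x)\, \mu(dx) &= \int_{A^\varepsilon} (f(x) - \varepsilon)\, \mu(dx) = \int_{A^\varepsilon} f(x)\, \mu(dx) - \varepsilon \mu(A^\varepsilon) \\
&= \int_{A^\varepsilon} E(f(x) \mid \mathcal{I})\, \mu(dx) - \varepsilon \mu(A^\varepsilon) = -\varepsilon \mu(A^\varepsilon) ,
\end{align*}
where we used the fact that $A^\varepsilon \in \mathcal{I}$ to go from the first to the second line. Therefore,
one must have $\mu(A^\varepsilon) = 0$ for every $\varepsilon > 0$, which implies that $\bar\eta \le 0$ almost surely.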
Therefore, the measure $\mathbf{P}_\mu$ is an invariant measure for the dynamical system $\theta_t$ over
$\mathcal{X}^{\mathbb{R}}$. This allows us to carry over in a natural way the following notions from the theory
of dynamical systems:
Theorem 5.1 The set $\mathcal{J}(\mathcal{P})$ of all invariant probability measures for a Markov semigroup
$\{\mathcal{P}_t\}$ is convex and $\mu \in \mathcal{J}(\mathcal{P})$ is ergodic if and only if it is an extremal point
of $\mathcal{J}(\mathcal{P})$ (that is, it cannot be decomposed as $\mu = t\mu_1 + (1 - t)\mu_2$ with $t \in (0, 1)$ and
$\mu_i \in \mathcal{J}(\mathcal{P})$).
Furthermore, any two ergodic invariant probability measures are either identical or
mutually singular and, for every invariant measure $\mu \in \mathcal{J}(\mathcal{P})$, there exists a probability
measure $\varrho_\mu$ on the set $\mathcal{E}(\mathcal{P})$ of ergodic invariant measures for $\mathcal{P}$ such that
$\mu(A) = \int_{\mathcal{E}(\mathcal{P})} \nu(A)\, \varrho_\mu(d\nu)$.
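The structure described by the theorem is easy to see on a finite state space. The sketch below uses a made-up three-state chain with two closed classes, {0, 1} and {2}, written in discrete time for simplicity (an assumption of this illustration). Each class carries exactly one invariant probability measure, the two measures have disjoint supports, and every convex combination of them is again invariant. The names `push`, `mu1`, `mu2` and `mixture` are hypothetical.

```python
from fractions import Fraction as Fr

# Column convention: P[i][j] is the probability of jumping to i from j.
# States {0, 1} form one closed class, state {2} another one.
P = [
    [Fr(1, 2), Fr(1, 4), Fr(0)],
    [Fr(1, 2), Fr(3, 4), Fr(0)],
    [Fr(0),    Fr(0),    Fr(1)],
]

def push(P, mu):
    """Action of the Markov operator on a measure: (P mu)_i = sum_j P_ij mu_j."""
    n = len(P)
    return [sum(P[i][j] * mu[j] for j in range(n)) for i in range(n)]

mu1 = [Fr(1, 3), Fr(2, 3), Fr(0)]   # invariant measure supported on {0, 1}
mu2 = [Fr(0), Fr(0), Fr(1)]         # invariant measure supported on {2}

def mixture(t):
    """Convex combination t * mu1 + (1 - t) * mu2."""
    return [t * a + (1 - t) * b for a, b in zip(mu1, mu2)]

mu = mixture(Fr(1, 4))
print(push(P, mu) == mu)                           # the mixture is invariant
print(all(a * b == 0 for a, b in zip(mu1, mu2)))   # mu1 and mu2 are mutually singular
```

Here the mixture with t = 1/4 is invariant but not extremal, and the set {0, 1} is an invariant set of measure 1/4, so the mixture is not ergodic.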
Before we turn to the proof of Theorem 5.1, we prove the following very important
preliminary result:
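Proposition 5.2 Let $\mathbf{P}$ be the law of a stationary Markov process on $\mathcal{X}^{\mathbb{Z}}$. Then, the
σ-algebra $\mathcal{I}$ of all subsets invariant under $\theta$ is contained in the completion $\bar{\mathcal{F}}_0^0$ of $\mathcal{F}_0^0$
under $\mathbf{P}$.
Proof. We introduce the following notation. For any subset $A \subset \mathcal{X}^{\mathbb{Z}}$ and any subset
$I \subset \mathbb{Z}$, denote by $\Pi_I A \subset \mathcal{X}^{\mathbb{Z}}$ the set²
\[ \Pi_I A = \{ y \in \mathcal{X}^{\mathbb{Z}} \mid \exists x \in A \text{ with } x_k = y_k \ \forall k \in I \} . \]
Note that one has $A \subset \Pi_I A$ for any I. Furthermore, we have $A = \bigcap_{n \ge 0} \Pi_{[-n,n]} A$, so
that $\mathbf{P}(\Pi_{[-n,n]} A \setminus A) \to 0$ as $n \to \infty$. Note also that if $A = \Pi_{[k,\ell]} A$, then $A \in \mathcal{F}_k^\ell$.
Fix now $k > 0$. Since $A \in \mathcal{I}$, one has
Since on the other hand $1_A^2 = 1_A$ and $E(1_A \mid \mathcal{F}_0^0) \in [0, 1]$, one has $E(1_A \mid \mathcal{F}_0^0) \in \{0, 1\}$
almost surely. Let $\hat A$ denote the points such that $E(1_A \mid \mathcal{F}_0^0) = 1$, so that $\hat A \in \mathcal{F}_0^0$
by the definition of conditional expectations. Furthermore, the same definition yields
$\mathbf{P}(\hat A \cap B) = \mathbf{P}(A \cap B)$ for every set $B \in \mathcal{F}_0^0$ and (using the same reasoning as above
for $1 - 1_A$) $\mathbf{P}(\hat A^c \cap B) = \mathbf{P}(A^c \cap B)$ as well. Using this for $B = \hat A$ and $B = \hat A^c$
respectively shows that $A \sim \hat A$, as required.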
Corollary 5.3 Let again $\mathbf{P}$ be the law of a stationary Markov process. Then, for every
set $A \in \mathcal{I}$ there exists a measurable set $\bar A \subset \mathcal{X}$ such that $A \sim \bar A^{\mathbb{Z}}$.
Proof. We know by Proposition 5.2 that $A \in \bar{\mathcal{F}}_0^0$, so that the event A is equivalent to
an event of the form $\{x_0 \in \bar A\}$ for some $\bar A \subset \mathcal{X}$. Since $\mathbf{P}$ is stationary and $A \in \mathcal{I}$,
the time 0 is not distinguishable from any other time, so that A is
equivalent to the event $\{x_n \in \bar A\}$ for every $n \in \mathbb{Z}$. In particular, it is equivalent to the
event $\{x_n \in \bar A \text{ for every } n\}$.
Note that this result is crucial in the proof of the structure theorem, since it allows
us to relate invariant sets A ∈ I to invariant sets Ā ⊂ X , in the following sense:
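Proof. It follows immediately from the definition of an invariant set that one has
$\mu(\bar A) = \mathbf{P}_\mu(\bar A^{\mathbb{Z}})$ for every $\mu$-invariant set $\bar A$.
Now if $\mu$ is ergodic, then $\mathbf{P}_\mu(\bar A^{\mathbb{Z}}) \in \{0, 1\}$ for every set $\bar A$, so that in particular
$\mu(\bar A) \in \{0, 1\}$ for every $\mu$-invariant set. If $\mu$ is not ergodic, then there exists a set
$A \in \mathcal{I}$ such that $\mathbf{P}_\mu(A) \notin \{0, 1\}$. By Corollary 5.3, there exists a set $\bar A \subset \mathcal{X}$ such
that $A \sim \{x_0 \in \bar A\} \sim \bar A^{\mathbb{Z}}$. The set $\bar A$ must be $\mu$-invariant, since otherwise the relation
$\{x_0 \in \bar A\} \sim \bar A^{\mathbb{Z}}$ would fail.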
Proof of Theorem 5.1. Assume first that $\mu \in \mathcal{J}(\mathcal{P})$ is not extremal, i.e. it is of the form
$\mu = t\mu_1 + (1 - t)\mu_2$ with $t \in (0, 1)$ and $\mu_i \in \mathcal{J}(\mathcal{P})$. (Note that therefore
$\mathbf{P}_\mu = t\mathbf{P}_{\mu_1} + (1 - t)\mathbf{P}_{\mu_2}$.) Assume by contradiction that $\mu$ is ergodic, so that $\mathbf{P}_\mu(A) \in \{0, 1\}$
for every $A \in \mathcal{I}$. If $\mathbf{P}_\mu(A) = 0$, then one must have $\mathbf{P}_{\mu_1}(A) = \mathbf{P}_{\mu_2}(A) = 0$ and
similarly if $\mathbf{P}_\mu(A) = 1$. Therefore, $\mathbf{P}_{\mu_1}$ and $\mathbf{P}_{\mu_2}$ agree on $\mathcal{I}$, so that both $\mathbf{P}_{\mu_1}$ and $\mathbf{P}_{\mu_2}$
are ergodic. Let now $f : \mathcal{X}^{\mathbb{Z}} \to \mathbb{R}$ be an arbitrary bounded measurable function and
consider the function $f^* : \mathcal{X}^{\mathbb{Z}} \to \mathbb{R}$ which is defined by
\[ f^*(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(\theta_k(x)) , \]
on the set E on which this limit exists and by $f^*(x) = 0$ otherwise. Denote by $E_i$ the
set of points x such that $f^*(x) = \int f(x)\, \mathbf{P}_{\mu_i}(dx)$. By Corollary 3.5, one has $\mathbf{P}_{\mu_i}(E_i) = 1$,
so that in particular $\mathbf{P}_\mu(E_1) = \mathbf{P}_\mu(E_2) = 1$. Since f was arbitrary, one can choose
it so that $\int f(x)\, \mathbf{P}_{\mu_1}(dx) \ne \int f(x)\, \mathbf{P}_{\mu_2}(dx)$, which would imply $E_1 \cap E_2 = \emptyset$, thus
contradicting the fact that $\mathbf{P}_\mu(E_1) = \mathbf{P}_\mu(E_2) = 1$.
Let now $\mu \in \mathcal{J}(\mathcal{P})$ be an invariant measure that is not ergodic. We want to show
that it can be written as $\mu = t\mu_1 + (1 - t)\mu_2$ for some $\mu_i \in \mathcal{J}(\mathcal{P})$ and $t \in (0, 1)$. By
Corollary 5.5, there exists a set $\bar A \subset \mathcal{X}$ such that $\mu(\bar A) = t$ and such that $\mathcal{P}(x, \bar A) = 1$
for $\mu$-almost every $x \in \bar A$. Furthermore, one has $\mu(\bar A^c) = 1 - t$ and the stationarity of
$\mu$ implies that one must have $\mathcal{P}(x, \bar A^c) = 1$ for $\mu$-almost every $x \in \bar A^c$. This invariance
property immediately implies that the measures $\mu_i$ defined by
\[ \mu_1(B) = \frac{1}{t}\, \mu(\bar A \cap B) , \qquad \mu_2(B) = \frac{1}{1 - t}\, \mu(\bar A^c \cap B) , \]
belong to $\mathcal{J}(\mathcal{P})$ and therefore have the required property.
The statement about the mutual singularity of any two elements of $\mathcal{E}(\mathcal{P})$ follows
immediately from Corollary 3.5. Let indeed $\mu_1$ and $\mu_2$ be two distinct ergodic invariant
probability measures. Since they are distinct, there exists a measurable bounded
function $f : \mathcal{X} \to \mathbb{R}$ such that $\int f(x)\, \mu_1(dx) \ne \int f(x)\, \mu_2(dx)$. Let us denote by $\{x_n\}$
the Markov process with transition operator $\mathcal{P}$ starting at $x_0$. Then, using the shift map
$\theta$ in Corollary 3.5, we find that the equality
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(x_n) = \int f(x)\, \mu_i(dx) \]
holds almost surely for $\mu_i$-almost every initial condition $x_0$ (which is the same as saying
that it holds for $\mathbf{P}_{\mu_i}$-almost every sequence x). Since $\int f(x)\, \mu_1(dx) \ne \int f(x)\, \mu_2(dx)$
by assumption, this implies that $\mu_1$ and $\mu_2$ are mutually singular.
The proof of the fact that every invariant measure can be obtained as a convex com-
bination of ergodic invariant measures is a consequence of the ergodic decomposition
and will not be given here.
Corollary 5.6 If a Markov process with transition operator P has a unique invariant
measure µ, then µ is ergodic.
Theorem 6.1 (Krylov-Bogolioubov) Let (Pt )t≥0 be a Feller Markov semigroup over
a Polish space X . Assume that there exists µ0 ∈ M1 (X ) such that the sequence {Pt µ0 }
is tight. Then, there exists at least one invariant probability measure for (Pt )t≥0 .
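Proof. Define the family of probability measures $\{\mu_t\}_{t > 0}$ by
\[ \mu_t(A) = \frac{1}{t} \int_0^t (\mathcal{P}_s \mu_0)(A)\, ds . \]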
Since we assumed that {Pt µ0 } is tight, it is straightforward to check that {µt } is also
tight (just take the same compact set). Therefore, there exists at least one accumulation
point µ∗ and a sequence tn with tn → ∞ such that µtn → µ∗ weakly. Take now an
arbitrary test function ϕ ∈ Cb (X ). One has
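\begin{align*}
|(\mathcal{P}_t \mu^*)(\varphi) - \mu^*(\varphi)| &= |\mu^*(\mathcal{P}_t \varphi) - \mu^*(\varphi)| = \lim_{n \to \infty} |\mu_{t_n}(\mathcal{P}_t \varphi) - \mu_{t_n}(\varphi)| \\
&= \lim_{n \to \infty} \frac{1}{t_n} \Bigl| \int_{t_n}^{t + t_n} \mu_0(\mathcal{P}_s \varphi)\, ds - \int_0^t \mu_0(\mathcal{P}_s \varphi)\, ds \Bigr| \\
&\le \lim_{n \to \infty} \frac{2t}{t_n} \sup_{x \in \mathcal{X}} |\varphi(x)| = 0 .
\end{align*}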
Here, the second equality relies on the fact that Pt ϕ is continuous since Pt was as-
sumed to be Feller. Since both ϕ and t were arbitrary, this shows that Pt µ∗ = µ∗ for
every t as requested.
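The role of the time averages in this proof can be seen on a two-state example in discrete time, which is an assumption made for this illustration (the theorem itself is stated for continuous-time semigroups). For the deterministic flip between two states, the sequence $\mathcal{P}^n \mu_0$ oscillates forever when $\mu_0$ is a Dirac mass, but the Cesàro averages converge to the invariant measure, and their defect of invariance is of order 1/N, mirroring the bound $2t/t_n$ above. The names `push`, `cesaro_average` and `invariance_defect` are hypothetical.

```python
from fractions import Fraction as Fr

# Deterministic flip between two states (column convention P[i][j]).
P = [[Fr(0), Fr(1)],
     [Fr(1), Fr(0)]]

def push(P, mu):
    """Action of the Markov operator on a measure."""
    n = len(P)
    return [sum(P[i][j] * mu[j] for j in range(n)) for i in range(n)]

def cesaro_average(P, mu0, N):
    """Discrete analogue of mu_t: (1/N) * sum_{n=0}^{N-1} P^n mu0."""
    total = [Fr(0)] * len(mu0)
    mu = list(mu0)
    for _ in range(N):
        total = [a + b for a, b in zip(total, mu)]
        mu = push(P, mu)
    return [a / N for a in total]

def invariance_defect(P, mu):
    """Sum over states of |(P mu)_i - mu_i|."""
    return sum(abs(a - b) for a, b in zip(push(P, mu), mu))

delta0 = [Fr(1), Fr(0)]
print(cesaro_average(P, delta0, 1000))                           # uniform measure
print(invariance_defect(P, cesaro_average(P, delta0, 1001)))     # of order 1/N
```

Tightness is automatic here because the state space is finite, so only the averaging step of the argument is being illustrated.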
Example 6.3 Take $\mathcal{X} = [0, 1]$ and consider the transition probabilities defined by
\[ \mathcal{P}(x, \cdot\,) = \begin{cases} \delta_{x/2} & \text{if } x > 0, \\ \delta_1 & \text{if } x = 0. \end{cases} \]
It is clear that this Markov operator cannot have any invariant probability measure. Indeed,
assume that $\mu$ is invariant. Clearly, one must have $\mu(\{0\}) = 0$ since $\mathcal{P}(x, \{0\}) = 0$
for every x. Since, for $x \ne 0$, one has $\mathcal{P}(x, (1/2, 1]) = 0$, one must also have
$\mu((1/2, 1]) = 0$. Proceeding by induction, we have that $\mu((1/2^n, 1]) = 0$ for every n
and therefore $\mu((0, 1]) = 0$. Therefore, $\mu(\mathcal{X}) = 0$ which is a contradiction.
Endowing $\mathcal{X}$ with the usual topology, it is clear that the ‘Feller’ assumption of the
Krylov-Bogolioubov criterion is not satisfied around 0. The tightness criterion however
is satisfied since $\mathcal{X}$ is a compact space. On the other hand, we could add the set {0}
to the topology of $\mathcal{X}$, therefore really interpreting it as $\mathcal{X} = \{0\} \sqcup (0, 1]$. Since {0}
already belongs to the Borel σ-algebra of $\mathcal{X}$, this change of topology does not affect
the Borel sets. Furthermore, the space $\mathcal{X}$ is still a Polish space and it is easy to check
that the Markov operator $\mathcal{P}$ now has the Feller property! However, the space $\mathcal{X}$ is no
longer compact and a sequence $\{x_n\}$ accumulating at 0 is no longer a precompact set,
so that it is now the tightness assumption that is no longer satisfied.
Lemma 7.1 If the set J (P) of invariant measures for a Markov operator P over a
Polish space X contains more than one element, then there exist at least two elements
µ1 , µ2 ∈ J (P) such that µ1 and µ2 are mutually singular.
Proof. Assume that J (P) has at least two elements. Since, by Theorem 5.1, every
invariant measure can be obtained as a convex combination of ergodic ones, J (P)
must contain at least two distinct ergodic invariant measures, say µ1 and µ2 , which are
mutually singular by Theorem 5.1.
As a consequence of this lemma, if J (P) contains more than one invariant measure,
the state space X can be partitioned into (at least) two disjoint parts X = X1 ⊔ X2 with
the property that if the process starts in X1 , then it will stay in X1 for all times almost
surely and the same applies to X2 . The intuition that derives from this consideration
is that uniqueness of the invariant measure is a consequence of the process visiting a
“sufficiently large” portion of the phase space, independently of its initial position. The
remainder of this section is devoted to several ways of formalising this intuition.
The following definition captures what we mean by the fact that a given point of
the state space can be ‘visited’ by the dynamic:
Definition 7.2 Let $\{\mathcal{P}_t\}$ be a Markov semigroup over a Polish space $\mathcal{X}$ and let $x \in \mathcal{X}$.
Define, for $\lambda > 0$, the resolvent operator $R_\lambda$ for $\mathcal{P}_t$ by
\[ R_\lambda(y, U) = \lambda \int_0^\infty e^{-\lambda t}\, \mathcal{P}_t(y, U)\, dt , \]
which is again a Markov operator over $\mathcal{X}$. We say that x is accessible for $\{\mathcal{P}_t\}$ if, for
every $y \in \mathcal{X}$ and every open neighborhood U of x, one has $R_\lambda(y, U) > 0$. (Note that
this definition does not depend on the choice of $\lambda$.)
It is straightforward to show that if a given point is accessible, then it must belong to
the topological support of every invariant measure of the semigroup:
Lemma 7.3 Let {Pt } be a Markov semigroup over a Polish space X and let x ∈ X be
accessible. Then, x ∈ supp µ for every µ ∈ J (P).
Proof. Let $\mu$ be invariant for the Markov semigroup $\{\mathcal{P}_t\}$, let $\lambda > 0$, and let $U \subset \mathcal{X}$
be an arbitrary neighborhood of x. The invariance of $\mu$ implies that
\[ \mu(U) = \int_{\mathcal{X}} R_\lambda(y, U)\, \mu(dy) > 0 , \]
as required.
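On a finite state space the resolvent can be computed in closed form, which gives a concrete feeling for the notion of accessibility. The sketch below assumes a continuous-time chain with generator Q (rows summing to zero, `Q[y][z]` being the jump rate from y to z), so that $\mathcal{P}_t = e^{tQ}$ and the Laplace transform in the definition evaluates to $R_\lambda = \lambda(\lambda I - Q)^{-1}$. The generator, the helper `invert` and the function `resolvent` are made up for this illustration.

```python
from fractions import Fraction as Fr

def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def resolvent(Q, lam):
    """R_lam = lam * (lam * I - Q)^{-1}; entry [y][z] is R_lam(y, {z})."""
    n = len(Q)
    A = [[(lam if i == j else 0) - Q[i][j] for j in range(n)] for i in range(n)]
    return [[lam * v for v in row] for row in invert(A)]

# Hypothetical cyclic chain 0 -> 1 -> 2 -> 0 with unit jump rates.
Q = [[-1, 1, 0],
     [0, -1, 1],
     [1, 0, -1]]

R = resolvent(Q, 1)
print(R[0])
```

Although state 2 cannot be reached from state 0 in a single jump, the entry `R[0][2]` is strictly positive: the resolvent aggregates all times, so every state of this chain is accessible.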
It is important to realise that this definition depends on the topology of X and not
just on the Borel σ-algebra. Considering again Example 6.3, we see that the point 0 is
accessible when [0, 1] is endowed with its usual topology, whereas it is not accessible if
we interpret the state space as {0} ⊔ (0, 1]. Therefore, as in the previous section, this
definition can be useful only in conjunction with an appropriate regularity property of
the Markov semigroup. The following example shows that the Feller property is too
weak to serve our purpose.
Example 7.4 (Ising model) The Ising model is one of the most popular toy models
of statistical mechanics. It is one of the simplest models describing the evolution of a
ferromagnet. The physical space is modelled by a lattice $\mathbb{Z}^d$ and the magnetisation at
each lattice site is modelled by a ‘spin’, an element of {±1}. The state space of the
system is therefore given by $\mathcal{X} = \{\pm 1\}^{\mathbb{Z}^2}$, which we endow with the product topology.
This topology can be metrized for example by the distance function
\[ d(x, y) = \sum_{k \in \mathbb{Z}^2} \frac{|x_k - y_k|}{2^{|k|}} , \]
and the space $\mathcal{X}$ endowed with this distance function is easily seen to be separable.
The (Glauber) dynamic for the Ising model depends on a parameter $\beta$ and can be
described in the following way. At each lattice site, we consider independent clocks
that ring at Poisson distributed times. Whenever the clock at a given site (say the site
k) rings, we consider the quantity $\delta E_k(x) = \sum_{j \sim k} x_j x_k$, where the sum runs over all
sites j that are nearest neighbors of k. We then flip the spin at site k with probability
$\min\{1, \exp(-\beta\, \delta E_k(x))\}$.
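The update rule just described can be sketched in a few lines of Python. Two simplifications are assumed here and are not part of the notes: the infinite lattice is replaced by a finite L × L box with periodic boundary conditions, and the independent Poisson clocks are replaced by picking a site uniformly at random at each step, which has the same law as the location of the next ring when the clocks are i.i.d. The names `delta_E`, `flip_probability` and `glauber_step` are hypothetical.

```python
import math
import random

def delta_E(x, k):
    """delta E_k(x) = sum over the nearest neighbours j of k of x_j * x_k."""
    L = len(x)
    i, j = k
    neighbours = [((i + 1) % L, j), ((i - 1) % L, j),
                  (i, (j + 1) % L), (i, (j - 1) % L)]
    return sum(x[a][b] * x[i][j] for a, b in neighbours)

def flip_probability(x, k, beta):
    """Probability min{1, exp(-beta * delta E_k(x))} of flipping the spin at k."""
    return min(1.0, math.exp(-beta * delta_E(x, k)))

def glauber_step(x, beta, rng):
    """One clock ring: choose a site uniformly and flip it with the above probability."""
    L = len(x)
    k = (rng.randrange(L), rng.randrange(L))
    if rng.random() < flip_probability(x, k, beta):
        x[k[0]][k[1]] *= -1
    return x

rng = random.Random(0)
spins = [[1] * 4 for _ in range(4)]      # all spins aligned
for _ in range(1000):
    glauber_step(spins, beta=0.5, rng=rng)
print(sum(sum(row) for row in spins))    # total magnetisation after 1000 rings
```

A spin aligned with all four neighbours flips with probability $e^{-4\beta}$, while a spin opposed to all of them flips with probability one, so large $\beta$ favours aligned configurations.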
Let us first show that every point is accessible for this dynamic. Fix an arbi-
trary configuration x ∈ X and a neighbourhood U containing x. By the definition
of the product topology, U contains an ‘elementary’ neighbourhood UN (x) of the type
UN (x) = {y ∈ X | yk = xk ∀ |k| ≤ N }. Given now an arbitrary initial condition
y ∈ X , we can find a sequence of m spin flips at distinct locations k1 , . . . , km , all of
them located inside the ball {|k| ≤ N }, that allows one to go from y into UN (x). Fix now
t > 0. There is a very small but nevertheless strictly positive probability that within
that time interval, the Poisson clocks located at k1 , . . . , km ring exactly once and ex-
actly in that order, whereas all the other clocks located in the ball {|k| ≤ N + 2} do
not ring. Furthermore, there is a strictly positive probability that all the corresponding
spin flips do actually happen. As a consequence, the Ising model is topologically irre-
ducible in the sense that for any state x ∈ X , any open set U ⊂ X and any t > 0, one
has Pt (x, U ) > 0.
It is also relatively straightforward to show that the dynamic has the Feller property,
but this is outside the scope of these notes. However, despite the fact that the dynamic
is both Feller and topologically irreducible, one has the following:
Theorem 7.5 For d ≥ 2 there exists βc > 0 such that the Ising model has at least two
distinct invariant measures for β > βc .
The proof of this theorem is not simple and we will not give it here. It was a celebrated
tour de force by Onsager to be able to compute the critical value $\beta_c = \ln(1 + \sqrt{2})/2$
explicitly in [Ons44] for the case d = 2. We refer to the monograph [Geo88] for a
more detailed discussion of this and related models.
This example shows that if we wish to base a uniqueness argument on the accessibility
of a point or on the topological irreducibility of a system, we need to combine
this with a stronger regularity property than the Feller property. One possible regularity
property that yields the required properties is the strong Feller property:
Definition 7.6 A Markov operator P over a Polish space X has the strong Feller prop-
erty if, for every function ϕ ∈ Bb (X ), one has Pϕ ∈ Cb (X ).
With this definition, one has:
Proposition 7.7 If a Markov operator P over a Polish space X has the strong Feller
property, then the topological supports of any two mutually singular invariant measures
are disjoint.
Proof. Let µ and ν be two mutually singular invariant measures for P. By the definition
of mutual singularity, there exists a set A ⊂ X such that µ(A) = 1 and
ν(A) = 0. The invariance of µ and ν then implies that P(x, A) = 1 for µ-almost every
x and P(x, A) = 0 for ν-almost every x.
Set ϕ = P1A , where 1A is the characteristic function of A. It follows from the pre-
vious remarks that ϕ(x) = 1 µ-almost everywhere and ϕ(x) = 0 ν-almost everywhere.
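Since ϕ is continuous by the strong Feller property, the claim now follows from the
fact that if a continuous function is constant µ-almost everywhere, it must be constant
on the topological support of µ. Indeed, ϕ is then equal to 1 on supp µ and to 0 on
supp ν, so that these two supports cannot intersect.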
Actually, by looking at the proof of Proposition 7.7, one realises that one could
have introduced a notion of being strong Feller at a point x ∈ X in a natural way by
imposing that Pϕ is continuous at x for every bounded measurable function ϕ. With
this definition, the same proof as above allows one to conclude that if P is strong Feller at
x, then x can belong to the support of at most one invariant probability measure. This
leads to one of the most general uniqueness criteria commonly used in the literature:
Proof. Combine Proposition 7.7 with Lemma 7.3 and the fact that if J (P) contains
more than one element, then by Theorem 5.1 there must be at least two distinct ergodic
invariant measures for P.
Exercise 7.9 Let $\xi_n$ be an i.i.d. sequence of real-valued random variables with law $\mu$.
Define a real-valued Markov process $x_n$ by $x_{n+1} = \frac{1}{2} x_n + \xi_n$. Show that if $\mu$ has a
continuous density with respect to Lebesgue measure, then the corresponding Markov
operator has the strong Feller property.
8 Hörmander’s condition
The Markov processes considered in this section are diffusions with smooth coefficients:
\[ dx(t) = f_0(x(t))\, dt + \sum_{i=1}^{m} f_i(x(t)) \circ dw_i(t) . \tag{8.1} \]
Here, the $f_i : \mathbb{R}^n \to \mathbb{R}^n$ are smooth vector fields
with bounded derivatives of all orders, and the $w_i$’s are i.i.d. standard Wiener processes.
It is a standard result from stochastic analysis that (8.1) has a unique solution for every
initial condition x0 ∈ Rn and that these solutions have the Markov property [Øks03b,
Kry95].
Denote by Pt the Markov semigroup associated to solutions of (8.1), that is
The aim of this section is to provide a criterion that is not difficult to verify in practice
and that guarantees that Pt ϕ is smooth for every bounded measurable function ϕ.
Given two smooth vector fields $f, g : \mathbb{R}^n \to \mathbb{R}^n$, we define their Lie bracket by
\[ [f, g](x) = Dg(x)\, f(x) - Df(x)\, g(x) . \]
Here, Df and Dg denote the Fréchet derivatives of f and g. With this notation at hand,
we define an increasing sequence of families of vector fields recursively by
\[ \mathcal{A}_0 = \{ f_j : j = 1, \ldots, m \} , \qquad \mathcal{A}_{k+1} = \mathcal{A}_k \cup \{ [g, f_j] : g \in \mathcal{A}_k ,\ j = 0, \ldots, m \} . \]
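To see how brackets with the drift can generate new directions, here is a small numerical sketch on a made-up two-dimensional example: a position and velocity pair $(q, v)$ with $dq = v\, dt$ and $dv = -v\, dt + dw$, so that $f_0(q, v) = (v, -v)$ and $f_1(q, v) = (0, 1)$. The sign convention $[f, g](x) = Dg(x) f(x) - Df(x) g(x)$ used in the code is an assumption of this illustration, as are the helper names and the use of central differences for the derivatives.

```python
def jacobian_times(f, x, v, h=1e-5):
    """Directional derivative Df(x) v, approximated by central differences."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2.0 * h) for a, b in zip(f(xp), f(xm))]

def lie_bracket(f, g):
    """Return the vector field [f, g] = Dg f - Df g (convention assumed here)."""
    def bracket(x):
        fx, gx = f(x), g(x)
        return [a - b for a, b in zip(jacobian_times(g, x, fx),
                                      jacobian_times(f, x, gx))]
    return bracket

def det2(u, v):
    """Determinant of the 2 x 2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

# Hypothetical example: noise acts on the velocity only.
f0 = lambda z: [z[1], -z[1]]    # drift
f1 = lambda z: [0.0, 1.0]       # single noise direction

point = [0.3, -1.2]
new_direction = lie_bracket(f1, f0)(point)   # the element [f1, f0] of the next family
print(new_direction, det2(f1(point), new_direction))
```

The single field $f_1$ spans only one direction at each point, but together with the bracket $[f_1, f_0]$ it spans the whole plane, which is the kind of spanning property that a bracket condition is designed to detect.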
We will say that Hörmander’s condition holds at a point x ∈ Rn if Ā∞ (x) = Rn . With
this notation, we have the following result:
Remark 8.2 The easiest way for Hörmander’s condition to hold is if $\bar{\mathcal{A}}_0(x) = \mathbb{R}^n$ for
every x. In this case, the generator
\[ L = f_0 \nabla + \sum_{i=1}^{m} (f_i \nabla)^2 \tag{8.2} \]
These approximations can be justified rigorously, and lead to the integration by parts
formula:
\[ E\, D_v f(w) = E \Bigl( f(w) \int_0^1 v(t)\, dw(t) \Bigr) . \tag{8.4} \]
exception is the situation where v is an adapted process. In this case, v(t) does not
depend on the increments of the Wiener process before time t, so that Ds v(t) = 0
for s < t. Therefore, one of the two factors appearing in the double integral always
vanishes and one recovers the usual Itô isometry.
How does all this help for the proof of Hörmander’s theorem? Using the chain rule
and Fubini’s theorem, we see that if ϕ is a sufficiently smooth function and ξ is an
arbitrary element of Rn , one has the identity
where we denoted by $J_{s,t}$ the Jacobian of the solution map of (8.1) between two times
s and t. Suppose now that, given $\xi \in \mathbb{R}^n$, we can find a process $v_\xi \in L^2([0, t], \mathbb{R}^m)$
such that the derivative $J_{0,t}\xi$ of the solution to (8.1) in the direction $\xi$ with respect to
its initial condition is equal to its Malliavin derivative $D_{v_\xi} x_t$ in the direction of the
process $v_\xi$. We could then use (8.4) to write
\[ D_\xi \mathcal{P}_t \varphi(x) = E\bigl( D\varphi(x_t)\, D_{v_\xi} x_t \bigr) = E\bigl( D_{v_\xi}(\varphi(x_t)) \bigr) = E \Bigl( \varphi(x_t) \int_0^t v_\xi(s)\, dw(s) \Bigr) , \]
thus obtaining a bound on the derivative of $\mathcal{P}_t \varphi$ which is uniform over all functions $\varphi$
with a given supremum bound. Iterating such a procedure would then lead to the proof
of Hörmander’s theorem. The main moral that one should take home from this story is
that Malliavin calculus allows one to transform a regularity problem (showing that $\mathcal{P}_t$ has a
smoothing property) into a linear control problem (find a control v such that perturbing
the noise by v has the same effect as a given perturbation in the initial condition).
The aim of the remainder of this section is to give an idea on how to construct such a
‘control’ vξ in the framework given by the assumptions of Hörmander’s theorem. The
main insight required to perform such a construction is to realise that the Malliavin
derivative of xt is intimately related to the Jacobian. Formally taking the derivative of
(8.1) in the direction of the wi indeed yields
\[ dD_v x_t = Df_0(x_t)\, D_v x_t\, dt + \sum_{i=1}^{m} Df_i(x_t)\, D_v x_t \circ dw_i(t) + \sum_{i=1}^{m} f_i(x_t)\, v_i(t)\, dt , \]
endowed with the initial condition $D_v x_0 = 0$, whereas taking derivatives with respect
to the initial condition yields the very similar expression
\[
dJ_{s,t}\xi = Df_0(x_t)\,J_{s,t}\xi\,dt + \sum_{i=1}^m Df_i(x_t)\,J_{s,t}\xi \circ dw_i(t),
\]
endowed with the initial condition $J_{s,s}\xi = \xi$. This allows us to solve the equation for
$D_v x_t$ using the variation of constants formula, thus obtaining the expression
\[
D_v x_t = \int_0^t J_{s,t}\,f_i(x_s)\,v_i(s)\,ds = J_{0,t}\int_0^t J_{0,s}^{-1}\,f_i(x_s)\,v_i(s)\,ds \equiv J_{0,t}A_{0,t}v,
\]
where the (random) linear operator $A_{0,t}$ maps $L^2([0,t],\mathbb{R}^m)$ into $\mathbb{R}^n$. With these notations
in place, our control problem is now to find a control $v_\xi$ such that one has
the identity $J_{0,t}A_{0,t}v_\xi = J_{0,t}\xi$ which, since the Jacobian is invertible for the class of
problems that we consider, is equivalent to the identity
\[
A_{0,t}v_\xi = \xi. \tag{8.6}
\]
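To make the control problem (8.6) concrete, the following minimal Python sketch solves it numerically for a hypothetical test system that is not taken from these notes: $dx_1 = x_2\,dt$, $dx_2 = dw$, so that $m = 1$, $n = 2$, and $J_{0,s}^{-1}f_1 = (-s, 1)$ is deterministic. It uses the least-squares control $v = A_{0,t}^*(A_{0,t}A_{0,t}^*)^{-1}\xi$, which is one natural choice and an assumption of this sketch; the function names `jinv_f`, `malliavin_matrix`, `control` and `apply_A` are illustrative only.

```python
# Hypothetical test system (not from the notes): dx1 = x2 dt, dx2 = dw.
# Its Jacobian is J_{0,s} = [[1, s], [0, 1]], so J_{0,s}^{-1} f_1 = (-s, 1).

def jinv_f(s):
    """Return J_{0,s}^{-1} f_1 for the test system, with f_1 = (0, 1)."""
    return (-s, 1.0)

def malliavin_matrix(t, n):
    """Midpoint-rule approximation of A A^* = int_0^t (J^{-1}f)(J^{-1}f)^T ds."""
    h = t / n
    m = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        g = jinv_f((k + 0.5) * h)
        for i in range(2):
            for j in range(2):
                m[i][j] += g[i] * g[j] * h
    return m

def solve2(m, rhs):
    """Solve the 2x2 linear system m y = rhs by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det)

def control(t, xi, n):
    """Least-squares control v(s) = <J^{-1}f(s), (A A^*)^{-1} xi> on the midpoint grid."""
    y = solve2(malliavin_matrix(t, n), xi)
    h = t / n
    v = []
    for k in range(n):
        g = jinv_f((k + 0.5) * h)
        v.append(g[0] * y[0] + g[1] * y[1])
    return v

def apply_A(t, v):
    """Approximate A_{0,t} v = int_0^t J_{0,s}^{-1} f_1 v(s) ds by the midpoint rule."""
    n = len(v)
    h = t / n
    out = [0.0, 0.0]
    for k in range(n):
        g = jinv_f((k + 0.5) * h)
        out[0] += g[0] * v[k] * h
        out[1] += g[1] * v[k] * h
    return out

if __name__ == "__main__":
    xi = (1.0, -2.0)
    v = control(1.0, xi, 200)
    print(apply_A(1.0, v))  # should reproduce xi up to rounding
```

In this example the noise acts only on the second coordinate, yet the matrix $A_{0,t}A_{0,t}^*$ has determinant $t^4/12 > 0$, so every direction $\xi$ can be reached. For a general equation this matrix is random and depends on the whole path, so the least-squares control need not be adapted.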
Exercise 8.3 Let $M$ be a random positive semidefinite $d \times d$ matrix such that $\|M\| \le 1$
almost surely. Assume that for every $p > 0$ one can find a constant $C_p > 0$ such that
the bound
\[
\sup_{\|\xi\|=1} \mathbf{P}\bigl(\langle \xi, M\xi\rangle \le \varepsilon\bigr) \le C_p\,\varepsilon^p
\]
holds for $\varepsilon$ sufficiently small. Show that this implies the existence of a possibly different
family of constants $C_p'$ such that the bound
\[
\mathbf{P}\Bigl(\inf_{\|\xi\|=1} \langle \xi, M\xi\rangle \le \varepsilon\Bigr) \le C_p'\,\varepsilon^p
\]
holds for $\varepsilon$ small enough. Deduce that the matrix $M$ is then invertible almost surely
and that its inverse has moments of all orders.
Hint: Decompose the sphere $\|\xi\| = 1$ into small patches of radius $\varepsilon^2$ and argue
separately on each patch.
Let $a$ and $b$ be two adapted processes, real-valued and $\mathbb{R}^m$-valued respectively, satisfying
sufficient regularity assumptions, and consider the process $z$ defined by
\[
z(t) = \int_0^t a(s)\,ds + \int_0^t b(s)\circ dw(s).
\]
Then Norris’ lemma states that if z is small then, with high probability, both a and b
are small. Using the fact that the inverse of the Jacobian satisfies the equation
\[
dJ_{s,t}^{-1} = -J_{s,t}^{-1}\,Df_0(x_t)\,dt - \sum_{i=1}^m J_{s,t}^{-1}\,Df_i(x_t)\circ dw_i(t),
\]
it is now straightforward to check that if $g$ is any smooth vector field, then the process
$z(t) = \langle \xi, J_{0,t}^{-1}g(x_t)\rangle$ satisfies the SDE
\[
dz(t) = \langle \xi, J_{0,t}^{-1}[f_0,g](x_t)\rangle\,dt + \sum_{i=1}^m \langle \xi, J_{0,t}^{-1}[f_i,g](x_t)\rangle\circ dw_i(t). \tag{8.8}
\]
In order to show that $\langle \xi, M_{0,t}\xi\rangle$ cannot be too small, we now argue by contradiction.
Assume that $\langle \xi, M_{0,t}\xi\rangle$ is very small. Then, by (8.7), the processes $\langle \xi, J_{0,s}^{-1}f_i(x_s)\rangle$
must all be very small as well. On the other hand, Norris' lemma combined with
(8.8) shows that if any process of the type $\langle \xi, J_{0,s}^{-1}g(x_s)\rangle$ is small, then the processes
$\langle \xi, J_{0,s}^{-1}[f_j,g](x_s)\rangle$ must also be small for $j = 0,\dots,m$. In particular, $\langle \xi, [f_j,g](x_0)\rangle$
must be small.
Iterating this argument shows that if $\langle \xi, M_{0,t}\xi\rangle$ is very small, then $\langle \xi, g(x_0)\rangle$ must
be small for all $g \in A_\infty$, which is in direct contradiction with Hörmander's condition.
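The bracket iteration can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses a hypothetical Langevin-type system with drift $f_0(q,p) = (p, -q^3 - p)$ and noise field $f_1 = (0,1)$, neither of which is taken from these notes, and it adopts the convention $[f,g] = Dg\,f - Df\,g$ (the opposite sign convention would not change the span). Jacobians are approximated by central differences.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: entry [i][j] approximates d f_i / d x_j at x."""
    n = len(x)
    cols = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def bracket(f, g, x):
    """Lie bracket [f, g](x) = Dg(x) f(x) - Df(x) g(x) (sign convention assumed)."""
    Df, Dg = jacobian(f, x), jacobian(g, x)
    fx, gx = f(x), g(x)
    n = len(x)
    return [sum(Dg[i][j] * fx[j] - Df[i][j] * gx[j] for j in range(n))
            for i in range(n)]

def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

# Hypothetical vector fields on R^2 with coordinates x = (q, p).
def f0(x):
    return [x[1], -x[0] ** 3 - x[1]]

def f1(x):
    return [0.0, 1.0]

if __name__ == "__main__":
    x = [0.7, -0.3]
    b = bracket(f0, f1, x)
    print(b, det2(f1(x), b))
```

Here $f_1$ alone spans only the $p$ direction, but $f_1$ together with $[f_0, f_1]$ spans $\mathbb{R}^2$ at every point, which is the kind of spanning property that the contradiction argument above exploits.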
Hörmander's theorem is a very neat way of showing that a diffusion has the strong
Feller property. Combined with Stroock-Varadhan's support theorem, it very often
allows one to verify that the assumptions of Corollary 7.8 hold:
Then, the support of the transition probabilities $P_t(x,\cdot)$ is precisely given by the closure
of the set of all points in $\mathbb{R}^d$ that can be reached in time $t$ by solutions to (8.9) with the $u_i$
given by arbitrary smooth functions.
Exercise 8.6 A slight elaboration on the previous example is given by a finite chain of
nonlinear oscillators coupled to heat baths at the ends:
\[
\begin{aligned}
dq_i &= p_i\,dt, \qquad i = 0,\dots,N,\\
dp_0 &= -\nabla V_1(q_0)\,dt - \nabla V_2(q_0 - q_1)\,dt - p_0\,dt + \sqrt{2T_L}\,dw_L(t),\\
dp_j &= -\nabla V_1(q_j)\,dt - \nabla V_2(q_j - q_{j-1})\,dt - \nabla V_2(q_j - q_{j+1})\,dt,\\
dp_N &= -\nabla V_1(q_N)\,dt - \nabla V_2(q_N - q_{N-1})\,dt - p_N\,dt + \sqrt{2T_R}\,dw_R(t).
\end{aligned}
\]
Show that if the coupling potential $V_2$ is strictly convex (so that its Hessian is strictly
positive definite at every point), then this equation does satisfy Hörmander's condition,
so that it satisfies the strong Feller property. Harder: Show that every point is
reachable, using Stroock-Varadhan's support theorem.
First, the Jacobian for a parabolic PDE (or SPDE) is not usually an invertible operator,
so we cannot reduce ourselves to the situation (8.6) where the process appearing
in the definition of $A$ is adapted. Furthermore, the question of the invertibility of
the operator $M_{0,t}$ is much more subtle in infinite dimensions. As a consequence, one
cannot expect in general that a smoothing theorem along the lines of the statement of
Hörmander's theorem given previously holds in infinite dimensions.
Let us for example consider the following infinite-dimensional system of SDEs:
\[
dx_k = -x_k\,dt + e^{-k^2}\,dw_k(t), \qquad k \in \mathbb{Z}, \tag{9.1}
\]
with the $w_k$'s a sequence of i.i.d. standard Wiener processes. Note that the sum in
(9.3) does not converge in $\ell^2$, but one can convince oneself that the expression (9.2) is
well-defined and does take values in $\ell^2$.
Exercise 9.1 Show that the solution to (9.1) does indeed live in $\ell^2$ almost surely and
that the Markov semigroup given by (9.2) has the Feller property.
Consider now the subset $A \subset \ell^2$ of sequences with fast decay:
\[
A = \Bigl\{ x \in \ell^2 : \sup_k |x_k|\,|k|^N < \infty \quad \forall N > 0 \Bigr\}.
\]
It is a straightforward calculation to show that, if $x_0 = 0$, then the right hand side of (9.2)
belongs to $A$ almost surely. Therefore, since $A$ is a vector space, one has $x(t) \in A$
if and only if $x_0 \in A$. In other words, the characteristic function of the set $A$ is left
invariant by the Markov semigroup associated to (9.2), thus showing that this semigroup does not
have the strong Feller property, despite the diffusion (9.1) looking perfectly 'elliptic'.
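The decay claims about (9.1) can be made quantitative with a few lines of Python. The sketch relies on the standard fact that each coordinate of (9.1) is a scalar Ornstein-Uhlenbeck process, so that, starting from $x_k(0) = 0$, the variable $x_k(t)$ is centred Gaussian with variance $e^{-2k^2}(1 - e^{-2t})/2$. The function names and the truncation parameter `kmax` are illustrative choices, not part of the notes.

```python
import math

def mode_variance(k, t):
    """Variance of x_k(t) for dx_k = -x_k dt + exp(-k^2) dw_k with x_k(0) = 0."""
    return math.exp(-2.0 * k * k) * (1.0 - math.exp(-2.0 * t)) / 2.0

def expected_sq_norm(t, kmax=20):
    """E ||x(t)||^2 in l^2, truncated to the modes |k| <= kmax."""
    return sum(mode_variance(k, t) for k in range(-kmax, kmax + 1))

def weighted_std(k, t, N):
    """|k|^N times the standard deviation of x_k(t)."""
    return abs(k) ** N * math.sqrt(mode_variance(k, t))

if __name__ == "__main__":
    print(expected_sq_norm(1.0))                          # finite second moment in l^2
    print([weighted_std(k, 1.0, 5) for k in (2, 5, 10)])  # decays faster than |k|^-5
```

The truncated sum stabilises after very few modes, and the weighted standard deviations tend to zero for any fixed $N$, which is consistent with the solution started at the origin lying in $\ell^2$ and in the set $A$.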
This is however not the generic situation. Before we give an example showing
that the strong Feller property can sometimes also be satisfied in infinite-dimensional
spaces, let us recall some basics of the theory of Gaussian measures on Hilbert spaces.
A measure $\mu$ on a (separable) Banach space $B$ is called Gaussian if the measure $\ell^*\mu$ is
Gaussian for every continuous linear functional $\ell\colon B \to \mathbb{R}$. The covariance operator
$C_\mu$ of $\mu$ is the bounded linear operator from $B^*$ to $B$ such that the identity
\[
u(C_\mu v) = \int_B u(x)\,v(x)\,\mu(dx)
\]
holds for any $u, v \in B^*$. The two main theorems from Gaussian measure theory are
then given by:
Exercise 9.6 Again in the case of a Gaussian measure on a Hilbert space, show that
$H_\mu$ is given by the range of $C_\mu^{1/2}$ and that $\|h\|_\mu = \|C_\mu^{-1/2}h\|$.
Consider now a general linear SPDE on a Hilbert space $H$ driven by additive noise,
written as an evolution equation:
In other words, the law of $x(t)$ is a Gaussian measure centred at $e^{-At}x_0$ with covariance
\[
Q_t = \int_0^t e^{-A(t-s)}\,QQ^*\,e^{-A^*(t-s)}\,ds. \tag{9.5}
\]
In view of Exercise 9.4, we obtain the following condition for a linear evolution equa-
tion on a Hilbert space to possess the strong Feller property:
Proposition 9.7 The Markov operator $P_t$ associated to (9.4) has the strong Feller
property if and only if the range of $e^{-At}$ is contained in the range of $Q_t^{1/2}$.
Proof. Let us first show that the condition is sufficient. Denote by $\mu$ the
centred Gaussian measure with covariance $Q_t$. It then follows from Theorem 9.3 that
the transition probabilities $P_t(x,dy)$ have a density $p_t(x,y)$ with respect to $\mu$ given by
\[
p_t(x,y) = \exp\Bigl(\langle Q_t^{-1/2}e^{-At}x, Q_t^{-1/2}y\rangle - \frac12\,\|Q_t^{-1/2}e^{-At}x\|^2\Bigr).
\]
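In particular, it follows that the directional derivative $D_\xi P_t\varphi$ in the direction $\xi \in H$ is
given by
\[
\begin{aligned}
D_\xi P_t\varphi(x) &= \int \langle Q_t^{-1/2}e^{-At}\xi, Q_t^{-1/2}(y - e^{-At}x)\rangle\,\varphi(y)\,p_t(x,y)\,\mu(dy)\\
&\le \sqrt{P_t\varphi^2(x)}\,\sqrt{\int \bigl|\langle Q_t^{-1/2}e^{-At}\xi, Q_t^{-1/2}y\rangle\bigr|^2\,\mu(dy)}\\
&= \|Q_t^{-1/2}e^{-At}\xi\|\,\sqrt{P_t\varphi^2(x)}.
\end{aligned}
\]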
Here, $Q_t^{-1/2}y$ should be interpreted as in Remark 9.5. Note that all of these calculations
were formal, but can easily be made rigorous by approximation. Since
$Q_t^{-1/2}e^{-At}$ is a bounded operator by assumption, the right hand side is bounded uniformly
over all measurable functions $\varphi$ with a given supremum bound, showing that $P_t\varphi$ is
Lipschitz continuous, so that $P_t$ is strong Feller.
Suppose now that the range of $e^{-At}$ is not contained in the range of $Q_t^{1/2}$. In this
case, one can find $x_0 \in H$ such that $h \equiv e^{-At}x_0$ does not belong to the Cameron-Martin
space of $\mu$. By Theorem 9.3, the measures $\tau_{\varepsilon h}^*\mu$ with $\varepsilon \neq 0$ are all mutually singular
with respect to $\mu$, so that we can find a measurable subset $B \subset H$ such that $\mu(B) = 1$ and such that
$(\tau_{\varepsilon h}^*\mu)(B) = 0$ for every nonzero rational $\varepsilon$. This shows that $P_t 1_B$ is equal to 1 at the origin,
but equal to 0 at $\varepsilon x_0$ for every nonzero rational $\varepsilon$. It is therefore discontinuous at 0, showing
that $P_t$ is not strong Feller.
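The criterion of Proposition 9.7 becomes very explicit when $A$ and $Q$ are diagonal in a common orthonormal basis, which is an extra assumption made only for this sketch. Writing $A e_k = a_k e_k$ and $Q e_k = q_k e_k$ with $a_k > 0$, formula (9.5) gives $Q_t e_k = q_k^2 (1 - e^{-2a_k t})/(2a_k)\, e_k$, and the range condition holds precisely when the diagonal entries of $Q_t^{-1/2}e^{-At}$ are bounded in $k$. The two eigenvalue sequences below are hypothetical examples: a heat-type case $a_k = k^2$, $q_k = 1/k$, and the system (9.1) restricted to $k \ge 1$.

```python
import math

def ratio(a, q, t):
    """Diagonal entry of Q_t^{-1/2} e^{-At} for one mode with eigenvalues a and q."""
    lam = q * q * (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)  # eigenvalue of Q_t
    return math.exp(-a * t) / math.sqrt(lam)

def sup_ratio(a_of_k, q_of_k, t, kmax):
    """Largest diagonal entry over the modes k = 1, ..., kmax."""
    return max(ratio(a_of_k(k), q_of_k(k), t) for k in range(1, kmax + 1))

# Hypothetical heat-type example: a_k = k^2, q_k = 1/k.
def heat_a(k):
    return float(k * k)

def heat_q(k):
    return 1.0 / k

# The system (9.1): a_k = 1, q_k = exp(-k^2).
def ou_a(k):
    return 1.0

def ou_q(k):
    return math.exp(-float(k * k))

if __name__ == "__main__":
    print(sup_ratio(heat_a, heat_q, 0.1, 10), sup_ratio(heat_a, heat_q, 0.1, 100))
    print(sup_ratio(ou_a, ou_q, 0.1, 5), sup_ratio(ou_a, ou_q, 0.1, 10))
```

In the heat-type case the supremum stabilises, so the operator is bounded and the semigroup is strong Feller. For (9.1) the entries grow like $e^{k^2}$, in agreement with the failure of the strong Feller property observed above. The second example is evaluated only for small `kmax` because $q_k^2$ underflows in floating point for large $k$.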
3. W is a cylindrical Wiener process on H and Q and A are such that the solution
to the linearised equation (that is (10.1) where we set F ≡ 0) has almost surely
continuous sample paths in H.
With these assumptions, one can show that (10.1) can be solved pathwise by the usual
Picard iteration procedure, see for example [Hai08]. This solution can be continued in
the usual way up to a (random) explosion time $\tau$ such that $\lim_{t \nearrow \tau}\|u(t)\| = \infty$. Since
we do not wish to deal with exploding solutions, we assume that one can obtain an a
priori estimate on the size of the solution such that
4. The explosion time τ is infinite almost surely.
Exercise 10.1 Given $T > 0$ and $\alpha > 0$ and a positive definite selfadjoint operator
$A$, show that there exists a constant $C$ such that $\|e^{-At}A^\alpha\| \le Ct^{-\alpha}$ holds for every
$t \in (0,T]$.
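As a sanity check on the bound in Exercise 10.1, one can look at the scalar function behind it. By the spectral theorem, $\|e^{-At}A^\alpha\|$ is bounded by $\sup_{\lambda > 0}\lambda^\alpha e^{-\lambda t}$, and elementary calculus gives the value $(\alpha/(et))^\alpha$, attained at $\lambda = \alpha/t$. The Python sketch below compares a grid search with this closed form; the grid parameters `lam_max` and `n` are arbitrary illustrative choices.

```python
import math

def sup_on_grid(alpha, t, lam_max=1e4, n=100000):
    """Grid approximation of sup over 0 < lam <= lam_max of lam^alpha * exp(-lam * t)."""
    best = 0.0
    for i in range(1, n + 1):
        lam = lam_max * i / n
        best = max(best, lam ** alpha * math.exp(-lam * t))
    return best

def closed_form(alpha, t):
    """Exact supremum over lam > 0, namely (alpha / (e t))^alpha."""
    return (alpha / (math.e * t)) ** alpha

if __name__ == "__main__":
    for alpha, t in ((0.5, 1.0), (0.7, 0.3), (0.5, 0.01)):
        print(alpha, t, sup_on_grid(alpha, t), closed_form(alpha, t))
```

The closed form shows that the constant in the exercise can be taken to be $C = (\alpha/e)^\alpha$, independently of $T$, under the spectral-theorem reduction used here.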
Exercise 10.2 Show that all the simplifying assumptions are satisfied for reaction-diffusion
equations, that is $H = L^2(D,\mathbb{R}^m)$ for some smooth domain $D \subset \mathbb{R}^d$, $A = -\Delta$
endowed with Dirichlet boundary conditions, $F(u)(x) = f(u(x))$ for some globally
Lipschitz continuous function $f\colon \mathbb{R}^m \to \mathbb{R}^m$, and $Q$ is a Hilbert-Schmidt operator
(that is $QQ^*$ is trace class) on $H$.
Under these assumptions, one can show that, as a straightforward consequence
of the implicit function theorem, the solution to (10.1) is Fréchet differentiable with
respect to its initial condition and its derivative $J_{s,t}\xi$ in the direction $\xi \in H$ satisfies
the equation
\[
dJ_{s,t}\xi = -AJ_{s,t}\xi\,dt + DF(u(t))\,J_{s,t}\xi\,dt. \tag{10.2}
\]
Furthermore, it is possible to show in the same way that the solution is Malliavin
differentiable and that its Malliavin derivative $D_v u_t$ in the direction $v \in L^2([0,t],H)$ is
given by the formula
\[
D_v u_t = \int_0^t J_{s,t}\,Q\,v(s)\,ds. \tag{10.3}
\]
This allows one to prove the following formula for the derivative of $P_t\varphi$ [EL94]:
Theorem 10.3 (Bismut-Elworthy-Li) Assume that the Jacobian $J$ is such that the
range of $J_{0,t}$ is contained in the range of $Q$ for $t > 0$ and that $\mathbf{E}\|Q^{-1}J_{0,t}\|^2 < \infty$
uniformly in $t$ over any bounded time interval bounded away from 0. Then, for all
Fréchet differentiable test functions $\varphi\colon H \to \mathbb{R}$, one has the identity
\[
D_\xi P_t\varphi(u) = \frac{2}{t}\,\mathbf{E}\Bigl(\varphi(u_t)\int_{t/4}^{3t/4}\langle Q^{-1}J_{0,s}\xi, dW(s)\rangle\Bigr).
\]
Proof. The proof works just like the proof of Hörmander's theorem, except that since
$Q$ is invertible on the range of the Jacobian, one can find explicitly a solution to (8.6).
For fixed $t$ and $\xi$, we write
\[
v_\xi(s) = \begin{cases} \frac{2}{t}\,Q^{-1}J_{0,s}\xi & \text{for } s \in [\frac{t}{4}, \frac{3t}{4}],\\ 0 & \text{otherwise.} \end{cases}
\]
It follows immediately from (10.3) that this particular choice of $v_\xi$ satisfies the identity
$D_{v_\xi}u_t = J_{0,t}\xi$, so that
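\[
D_\xi P_t\varphi(u) = \mathbf{E}\bigl(D\varphi(u_t)\,J_{0,t}\xi\bigr) = \mathbf{E}\bigl(D\varphi(u_t)\,D_{v_\xi}u_t\bigr) = \mathbf{E}\Bigl(\varphi(u_t)\int_0^t \langle v_\xi(s), dW(s)\rangle\Bigr),
\]
which is precisely the stated identity.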
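A quick Monte Carlo experiment illustrates the formula in the simplest possible setting. The sketch below assumes a hypothetical scalar test equation $du = -u\,dt + dW$ (so $A = 1$, $F \equiv 0$, $Q = 1$ and $J_{0,s} = e^{-s}$) with test function $\varphi = \sin$; none of these choices come from the notes. For this equation $u_t$ is Gaussian with mean $e^{-t}u$ and variance $(1 - e^{-2t})/2$, so the derivative of $P_t\varphi$ is known in closed form and can be compared with the Bismut-Elworthy-Li estimator, here discretised with the Euler-Maruyama scheme.

```python
import math
import random

def bel_estimate(u0, t, n_paths=20000, n_steps=50, seed=0):
    """Monte Carlo estimate of d/du P_t sin(u) at u = u0 via the
    Bismut-Elworthy-Li weight, for the test equation du = -u dt + dW."""
    rng = random.Random(seed)
    dt = t / n_steps
    sq = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        u = u0
        weight = 0.0
        for k in range(n_steps):
            s = k * dt
            dw = sq * rng.gauss(0.0, 1.0)
            if t / 4 <= s < 3 * t / 4:
                # Ito increment of int <Q^{-1} J_{0,s} xi, dW(s)> with Q = 1, xi = 1
                weight += math.exp(-s) * dw
            u += -u * dt + dw  # Euler-Maruyama step
        total += math.sin(u) * weight
    return 2.0 / t * total / n_paths

def exact_derivative(u0, t):
    """Closed form of d/du E sin(u_t), using that u_t is Gaussian."""
    var = (1.0 - math.exp(-2.0 * t)) / 2.0
    return math.exp(-t) * math.cos(math.exp(-t) * u0) * math.exp(-var / 2.0)

if __name__ == "__main__":
    print(bel_estimate(0.5, 1.0), exact_derivative(0.5, 1.0))
```

Note that the estimator never evaluates the derivative of $\varphi$, which is the whole point of the formula: the same code would work for a merely bounded measurable test function. The agreement is only up to Monte Carlo and time-discretisation error.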
Verifying the conditions of Theorem 10.3 is in general far from trivial.
However, there is a heuristic argument that allows one to 'guess' the right answer in many
cases of interest. It is based on the following two facts:
• The solution to (10.1) very often has the same regularity as the solution to the
linear equation with $F \equiv 0$. This is because most parabolic PDEs have some
smoothing property that would cause the solutions to the deterministic equation
($Q \equiv 0$) to become $C^\infty$ immediately. Therefore, the driving noise is the only
factor that limits the regularity of the solutions.
• The Jacobian $J_{0,t}$ typically has $1 - \alpha$ powers of $A$ more smoothness than the
solutions to (10.1). (Here, the exponent $\alpha$ is the one appearing in assumption 2
above.) This is about the maximal amount of regularity that one can expect from
(10.2). Indeed, the variation of constants formula yields
\[
J_{0,t}\xi = e^{-At}\xi + \int_0^t e^{-A(t-s)}\,DF(u(s))\,J_{0,s}\xi\,ds,
\]
where the first term is smooth for every $t > 0$.
Even if one assumes that $J_{0,s}\xi$ is extremely smooth, due to assumption 2 one
would in general expect $DF(u)$ to have $\alpha$ powers of $A$ less regularity than $u$.
The convolution with the semigroup generated by $A$ however allows one to gain one
power of $A$ in terms of regularity, since the operator $A^\beta e^{-At}$ behaves like $t^{-\beta}$
for small $t$, so that this singularity is integrable provided $\beta < 1$.
Of course, this heuristic can be expected to hold only if the range of Q can be described
as a space of functions with a given degree of regularity. This is the case for example
if Q is given by a negative power of the Laplacian or some other elliptic differential
operator.
Combining these facts with the expression (9.5) for the covariance of the linear
equation, we deduce from these heuristic considerations that the Bismut-Elworthy-Li
formula is applicable provided that the operator $Q^{-1}A^{\alpha-1}e^{-At}Q$ is Hilbert-Schmidt
and that its Hilbert-Schmidt norm is square-integrable at $t \approx 0$. If $Q$ and $A$ commute,
the borderline case for this condition occurs when $A^{\alpha-\frac32}$ is a Hilbert-Schmidt operator.
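The borderline exponent can be recovered by a short computation. The following derivation is a sketch under the additional assumption that $A$ is selfadjoint with eigenvalues $0 < \lambda_1 \le \lambda_2 \le \dots$ and commutes with $Q$, so that $Q^{-1}A^{\alpha-1}e^{-At}Q = A^{\alpha-1}e^{-At}$; the time horizon $T > 0$ is an arbitrary illustrative choice.

```latex
\begin{aligned}
\int_0^T \bigl\|A^{\alpha-1}e^{-At}\bigr\|_{HS}^2\,dt
  &= \sum_k \lambda_k^{2\alpha-2}\int_0^T e^{-2\lambda_k t}\,dt
   = \frac12\sum_k \lambda_k^{2\alpha-3}\bigl(1 - e^{-2\lambda_k T}\bigr),\\
\frac{1 - e^{-2\lambda_1 T}}{2}\,\bigl\|A^{\alpha-\frac32}\bigr\|_{HS}^2
  &\le \int_0^T \bigl\|A^{\alpha-1}e^{-At}\bigr\|_{HS}^2\,dt
   \le \frac12\,\bigl\|A^{\alpha-\frac32}\bigr\|_{HS}^2 .
\end{aligned}
```

Under these assumptions, the Hilbert-Schmidt norm is therefore square-integrable near $t = 0$ exactly when $\sum_k \lambda_k^{2\alpha-3} < \infty$, that is when $A^{\alpha-\frac32}$ is Hilbert-Schmidt.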
Definition 11.1 Given a Polish space $X$ and a metric $d$ on $X$, we lift $d$ to the corresponding
Wasserstein-1 metric on the space of probability measures on $X$ via the
formula
\[
\|\mu - \nu\|_d = \sup_{\mathrm{Lip}\,\varphi = 1}\Bigl(\int \varphi(x)\,\mu(dx) - \int \varphi(x)\,\nu(dx)\Bigr).
\]
Here, $\mathrm{Lip}\,\varphi$ denotes the best Lipschitz constant for $\varphi$, taken with respect to the metric
$d$.
The important fact about Wasserstein distances is:
Theorem 11.2 If d is a bounded metric that generates the topology of X , then the
corresponding Wasserstein-1 metric generates the topology of weak convergence on
the space of probability measures on X .
It is also possible to show that with this definition, the total variation distance between
two probability measures (actually half of the usual total variation distance, so
that the distance between mutually singular probability measures is normalised to 1) is
given by the Wasserstein-1 distance corresponding to the metric
\[
d_{TV}(x,y) = \begin{cases} 1 & \text{if } x \neq y,\\ 0 & \text{if } x = y. \end{cases}
\]
This is a metric that totally separates all the points of our space and therefore loses
completely all information about the topology of X . It suggests the following defi-
nition, which provides one way of approximating the total variation distance between
two probability measures by a sequence of Wasserstein-1 distances.
It is said to have the asymptotic strong Feller property if the above property holds at
every x ∈ X .
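The identification of the (normalised) total variation distance with the Wasserstein-1 distance for $d_{TV}$ can be verified by brute force on a finite state space. The sketch below assumes $X = \{0,\dots,n-1\}$ and uses the observation that, for $d_{TV}$, a function has Lipschitz constant at most 1 exactly when its oscillation is at most 1; since $\mu$ and $\nu$ are probability measures, the supremum of the linear functional in Definition 11.1 is then attained at an indicator function. The function names are illustrative.

```python
from itertools import product

def wasserstein_discrete_metric(mu, nu):
    """Sup over 0/1-valued phi of sum_i phi_i (mu_i - nu_i), by brute force."""
    n = len(mu)
    return max(sum(p * (m - v) for p, m, v in zip(phi, mu, nu))
               for phi in product((0, 1), repeat=n))

def half_l1(mu, nu):
    """Half of the l^1 distance between the two probability vectors."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))

if __name__ == "__main__":
    mu = [0.5, 0.3, 0.2, 0.0]
    nu = [0.1, 0.3, 0.2, 0.4]
    print(wasserstein_discrete_metric(mu, nu), half_l1(mu, nu))
    print(wasserstein_discrete_metric([1.0, 0.0], [0.0, 1.0]))  # mutually singular
```

The brute-force search enumerates $2^n$ indicator functions, so it is only meant for very small $n$.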
Remark 11.5 If $B(x,\gamma)$ denotes the open ball of radius $\gamma$ centred at $x$ in some metric
defining the topology of $X$, then it is immediate that (11.1) is equivalent to
\[
\lim_{\gamma\to 0}\,\limsup_{n\to\infty}\,\sup_{y\in B(x,\gamma)} \|P_{t_n}(x,\cdot) - P_{t_n}(y,\cdot)\|_{d_n} = 0.
\]
Remark 11.6 If there exists $t > 0$ such that $P_t$ is continuous in the total variation
topology, then this is also the case for $P_s$ for all $s > t$. In this case, it is a straightforward
exercise to check that the semigroup $\{P_t\}$ satisfies the asymptotic strong Feller
property. On the other hand, it is a known fact [DM83, Sei01] that if $P_t$ is strong Feller,
then $P_{2t}$ is continuous in the total variation topology. This shows that the asymptotic
strong Feller property is a genuine generalisation of the strong Feller property.
Proof. For every measurable set $A$, every $t > 0$, and every metric $d$ on $X$ with $d \le 1$,
the triangle inequality for $\|\cdot\|_d$ implies
\[
\|\mu - \nu\|_d \le 1 - \min\{\mu(A),\nu(A)\}\Bigl(1 - \max_{y,z\in A}\|P_t(z,\cdot) - P_t(y,\cdot)\|_d\Bigr). \tag{11.2}
\]
Continuing with the proof of the corollary, by the definition of the asymptotic strong
Feller property there exist a constant $N > 0$, a sequence of totally separating metrics
$\{d_n\}$, and an open set $U$ containing $x$ such that $\|P_{t_n}(z,\cdot) - P_{t_n}(y,\cdot)\|_{d_n} \le 1/2$ for
every $n > N$ and every $y, z \in U$. (Note that by the definition of totally separating
pseudo-metrics $d_n \le 1$.)
Let now $\mu$ and $\nu$ be two distinct ergodic invariant measures and assume by contradiction
that $x \in \operatorname{supp}\mu \cap \operatorname{supp}\nu$ and therefore that one has $\alpha = \min(\mu(U),\nu(U)) > 0$.
Taking $A = U$, $d = d_n$, and $t = t_n$ in (11.2), we then get $\|\mu - \nu\|_{d_n} \le 1 - \frac{\alpha}{2}$ for
every $n > N$. On the other hand, it is possible to show that if $\{d_n\}$ is a totally separating
sequence of metrics, then $\|\mu - \nu\|_{d_n} \to \|\mu - \nu\|_{TV}$, so that $\|\mu - \nu\|_{TV} \le 1 - \frac{\alpha}{2}$, thus
leading to a contradiction with the fact that $\mu$ and $\nu$ are mutually singular.
A useful criterion for checking that the asymptotic strong Feller property holds for a given
Markov semigroup is the following:
Proposition 11.8 Let $t_n$ and $\delta_n$ be two positive sequences with $\{t_n\}$ increasing to
infinity and $\{\delta_n\}$ converging to zero. A semigroup $P_t$ on a Hilbert space $H$ is asymptotically
strong Feller if, for all $\varphi\colon H \to \mathbb{R}$ with $\|\varphi\|_\infty$ and $\|D\varphi\|_\infty$ finite, one has the
bound
\[
\|DP_{t_n}\varphi(h)\| \le C(\|h\|)\bigl(\|\varphi\|_\infty + \delta_n\|D\varphi\|_\infty\bigr) \tag{11.3}
\]
for all $n > 0$ and $h \in H$, where $C\colon \mathbb{R}_+ \to \mathbb{R}$ is a fixed non-decreasing function.
Proof. For $\varepsilon > 0$, we define on $H$ the distance $d_\varepsilon(h_1,h_2) = 1 \wedge \varepsilon^{-1}\|h_1 - h_2\|_H$, and
we denote by $\|\cdot\|_\varepsilon$ the corresponding Wasserstein-1 distance. It is clear that if $\delta_n$ is any
decreasing sequence converging to 0, $\{d_{\delta_n}\}$ is a totally separating system of metrics
for $H$.
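The behaviour of the distances $d_\varepsilon$ as $\varepsilon \to 0$ is easy to tabulate. The sketch below takes $H = \mathbb{R}$ for simplicity (an assumption made only for illustration) and uses the elementary fact that the Wasserstein-1 distance between two Dirac masses $\delta_x$ and $\delta_y$ is just $d(x,y)$. It shows that $d_\varepsilon(x,y)$ increases to 1 for $x \neq y$ as $\varepsilon$ decreases, which is the property one expects of a totally separating system.

```python
def d_eps(h1, h2, eps):
    """Truncated distance min(1, |h1 - h2| / eps) on the real line."""
    return min(1.0, abs(h1 - h2) / eps)

def dirac_distances(x, y, eps_values):
    """Wasserstein-1 distance between delta_x and delta_y for each eps,
    which coincides with d_eps(x, y)."""
    return [d_eps(x, y, eps) for eps in eps_values]

if __name__ == "__main__":
    print(dirac_distances(0.0, 0.3, (1.0, 0.5, 0.25, 0.1)))
    print(dirac_distances(0.2, 0.2, (1.0, 0.1, 0.01)))
```

For distinct points the values saturate at 1 once $\varepsilon \le |x - y|$, whereas for $x = y$ they stay at 0, mirroring the metric $d_{TV}$ in the limit.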
References
[Bis81] J.-M. Bismut. Martingales, the Malliavin calculus and hypoellipticity under general
Hörmander's conditions. Z. Wahrsch. Verw. Gebiete 56, no. 4, (1981), 469–505.
[Bog98] V. I. Bogachev. Gaussian measures, vol. 62 of Mathematical Surveys and Monographs.
American Mathematical Society, Providence, RI, 1998.
[Bro28] R. Brown. A brief account of microscopical observations made in the months of
June, July and August, 1827, on the particles contained in the pollen of plants; and on
the general existence of active molecules in organic and inorganic bodies. Phil. Mag.
4, (1828), 161–173.
[DM83] C. Dellacherie and P.-A. Meyer. Probabilités et potentiel. Chapitres IX à XI.
Publications de l'Institut de Mathématiques de l'Université de Strasbourg [Publications
of the Mathematical Institute of the University of Strasbourg], XVIII. Hermann,
Paris, revised ed., 1983. Théorie discrète du potentiel. [Discrete potential theory],
Actualités Scientifiques et Industrielles [Current Scientific and Industrial Topics], 1410.
[DPZ92] G. Da Prato and J. Zabczyk. Stochastic Equations in Infinite Dimensions.
University Press, Cambridge, 1992.
[DPZ96] G. Da Prato and J. Zabczyk. Ergodicity for Infinite Dimensional Systems, vol.
229 of London Mathematical Society Lecture Note Series. University Press, Cambridge,
1996.
[Ein05] A. Einstein. Über die von der molekularkinetischen Theorie der Wärme geforderte
Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik
17, (1905), 549–560.
[EL94] K. D. Elworthy and X.-M. Li. Formulae for the derivatives of heat semigroups.
J. Funct. Anal. 125, no. 1, (1994), 252–286.
[Fed69] H. Federer. Geometric measure theory. Die Grundlehren der mathematischen
Wissenschaften, Band 153. Springer-Verlag New York Inc., New York, 1969.
[Geo88] H.-O. Georgii. Gibbs measures and phase transitions, vol. 9 of de Gruyter Studies
in Mathematics. Walter de Gruyter & Co., Berlin, 1988.
[Hai08] M. Hairer. An introduction to stochastic PDEs, 2008. URL
http://www.hairer.org/Teaching.html. Unpublished lecture notes.
[HM04] M. Hairer and J. C. Mattingly. Ergodic properties of highly degenerate 2D
stochastic Navier-Stokes equations. C. R. Math. Acad. Sci. Paris 339, no. 12, (2004),
879–882.
[Hör67] L. Hörmander. Hypoelliptic second order differential equations. Acta Math. 119,
(1967), 147–171.
[Hör85] L. Hörmander. The Analysis of Linear Partial Differential Operators I–IV.
Springer, New York, 1985.
[Kry95] N. V. Krylov. Introduction to the theory of diffusion processes, vol. 142 of
Translations of Mathematical Monographs. American Mathematical Society, Providence,
RI, 1995. Translated from the Russian manuscript by Valim Khidekel and Gennady
Pasechnik.
[KS84] S. Kusuoka and D. Stroock. Applications of the Malliavin calculus. I. In Stochastic
analysis (Katata/Kyoto, 1982), vol. 32 of North-Holland Math. Library, 271–306.
North-Holland, Amsterdam, 1984.
[KS85] S. Kusuoka and D. Stroock. Applications of the Malliavin calculus. II. J. Fac.
Sci. Univ. Tokyo Sect. IA Math. 32, no. 1, (1985), 1–76.
[KS87] S. Kusuoka and D. Stroock. Applications of the Malliavin calculus. III. J. Fac.
Sci. Univ. Tokyo Sect. IA Math. 34, no. 2, (1987), 391–442.
[Mal78] P. Malliavin. Stochastic calculus of variations and hypoelliptic operators. Proc.
Intern. Symp. SDE 195–263.
[Nor86] J. Norris. Simplified Malliavin calculus. In Séminaire de Probabilités, XX, 1984/85,
vol. 1204 of Lecture Notes in Math., 101–130. Springer, Berlin, 1986.
[Nua95] D. Nualart. The Malliavin calculus and related topics. Probability and its
Applications (New York). Springer-Verlag, New York, 1995.
[Øks03a] B. Øksendal. Stochastic differential equations. Universitext. Springer-Verlag,
Berlin, sixth ed., 2003. An introduction with applications.
[Øks03b] B. Øksendal. Stochastic differential equations. Universitext. Springer-Verlag,
Berlin, sixth ed., 2003. An introduction with applications.
[Ons44] L. Onsager. Crystal statistics. I. A two-dimensional model with an order-disorder
transition. Phys. Rev. (2) 65, (1944), 117–149.
[RY99] D. Revuz and M. Yor. Continuous martingales and Brownian motion, vol. 293 of
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical
Sciences]. Springer-Verlag, Berlin, third ed., 1999.
[Sei01] J. Seidler. A note on the strong Feller property, 2001. Unpublished lecture notes.
[Sin94] Y. G. Sinaĭ. Topics in ergodic theory, vol. 44 of Princeton Mathematical Series.
Princeton University Press, Princeton, NJ, 1994.
[Spi76] F. Spitzer. Principles of random walks. Springer-Verlag, New York, second ed.,
1976. Graduate Texts in Mathematics, Vol. 34.
[Wal82] P. Walters. An introduction to ergodic theory, vol. 79 of Graduate Texts in
Mathematics. Springer-Verlag, New York, 1982.