Advanced Probability
Perla Sousi
October 13, 2013
Contents

1 Conditional expectation
  1.1 Discrete case
  1.2 Existence and uniqueness
  1.3 Product measures and Fubini's theorem
  1.4 Examples
    1.4.1 Gaussian case
    1.4.2 Conditional density functions
2 Discrete-time martingales
  2.1 Stopping times
  2.2 Optional stopping
  2.3 Gambler's ruin
  2.4 Martingale convergence theorem
  2.5 Doob's inequalities
  2.6 Convergence in Lp
  2.7 Uniformly integrable martingales
  2.8 Backwards martingales
  2.9 Applications of martingales
    2.9.1 Radon–Nikodym derivatives
3 Continuous-time random processes
  3.1 Definitions
  3.2 Finite-dimensional distributions
4 Weak convergence
  4.1 Definitions
  4.2 Tightness
  4.3 Characteristic functions
5 Large deviations
  5.1 Introduction
  5.2 Cramér's theorem
  5.3 Examples
6 Brownian motion
  6.2 Wiener's theorem
  6.3 Invariance properties
  6.5 Reflection principle
1 Conditional expectation

Let A, B ∈ F be two events with P(B) > 0. Then the conditional probability of A given the event B is defined by

    P(A|B) = P(A ∩ B) / P(B).
Definition 1.3. The Borel σ-algebra, B(R), is the σ-algebra generated by the open sets in R, i.e., it is the intersection of all σ-algebras containing the open sets of R. More formally, let O be the open sets of R; then

    B(R) = ∩{E : E is a σ-algebra containing O}.

Informally speaking, consider the open sets of R, perform all possible operations, i.e., unions, intersections, complements, and take the smallest σ-algebra that you obtain.
Definition 1.4. X is a random variable, i.e., a measurable function with respect to F, if X : Ω → R is a function with the property that for all open sets V the inverse image X^{-1}(V) ∈ F.
Remark 1.5. If X is a random variable, then the collection of sets

    {B ⊆ R : X^{-1}(B) ∈ F}

is a σ-algebra (check!) and hence it must contain B(R).
Definition 1.6. For a collection A of subsets of Ω we write σ(A) for the smallest σ-algebra that contains A, i.e.,

    σ(A) = ∩{E : E is a σ-algebra containing A}.

Recall also the indicator function of an event A:

    1(A)(x) = 1(x ∈ A) = 1 if x ∈ A, and 0 otherwise.
Recall the definition of expectation. First, for positive simple random variables, i.e., linear combinations of indicator random variables, we define

    E[ Σ_{i=1}^{n} c_i 1(A_i) ] := Σ_{i=1}^{n} c_i P(A_i),

where the c_i are positive constants and the A_i are measurable events. Next, let X be a non-negative random variable. Then X is the increasing limit of positive simple variables. For example,

    X_n(ω) = 2^{-n} ⌊2^n X(ω)⌋ ∧ n ↑ X(ω) as n → ∞.

So we define

    E[X] := lim_{n→∞} E[X_n].

Finally, for a general random variable X, we can write X = X⁺ - X⁻, where X⁺ = max(X, 0) and X⁻ = max(-X, 0), and we define

    E[X] := E[X⁺] - E[X⁻],

if at least one of E[X⁺], E[X⁻] is finite. We call the random variable X integrable if it satisfies E[|X|] < ∞.
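The staircase approximation above is easy to watch numerically. The following small sketch (the grid of sample points and the choice X = U² are illustrative assumptions, not from the notes) checks that the averages of X_n = 2^{-n}⌊2^n X⌋ ∧ n increase to the average of X:

```python
import math

def staircase(x, n):
    # X_n = 2^{-n} * floor(2^n * x) ∧ n: dyadic approximation of x from below
    return min(math.floor(2 ** n * x) / 2 ** n, n)

# Take X = U^2 with U ranging over a fixed grid of 1000 midpoints of [0, 1];
# averages over the grid play the role of expectations here, and E[X] ≈ 1/3.
grid = [(k + 0.5) / 1000 for k in range(1000)]
exact = sum(u * u for u in grid) / len(grid)
approx = [sum(staircase(u * u, n) for u in grid) / len(grid) for n in range(1, 12)]

assert all(a <= b + 1e-12 for a, b in zip(approx, approx[1:]))  # E[X_n] increases in n
assert abs(approx[-1] - exact) < 1e-3                           # and converges to E[X]
```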
Let X be a random variable with E[|X|] < ∞. Let A be an event in F with P(A) > 0. Then the conditional expectation of X given A is defined by

    E[X|A] = E[X 1(A)] / P(A).

Our goal is to extend the definition of conditional expectation to σ-algebras. So far we have only defined it for events, and there it was a number. Now the conditional expectation will be a random variable, measurable with respect to the σ-algebra on which we are conditioning.
1.1 Discrete case

Let X be integrable, i.e., E[|X|] < ∞. Let us start with a σ-algebra which is generated by a countable family of disjoint events (B_i)_{i∈I} with ∪_{i∈I} B_i = Ω, i.e., G = σ(B_i, i ∈ I). It is easy to check that G = {∪_{i∈J} B_i : J ⊆ I}.
The natural thing to do is to define a new random variable X' = E[X|G] as follows:

    X' = Σ_{i∈I} E[X|B_i] 1(B_i).    (1.1)
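Formula (1.1) can be checked on a small example. The six-point space, two-block partition, and X below are toy choices made for illustration only:

```python
from fractions import Fraction

# Toy space Ω = {0,...,5} with uniform P, partition B_1 = {0,1,2}, B_2 = {3,4,5},
# and X(ω) = ω; exact arithmetic via Fraction.
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}
blocks = [{0, 1, 2}, {3, 4, 5}]
X = {w: w for w in omega}

def cond_exp(X, blocks, P):
    # X' = Σ_i E[X|B_i] 1(B_i): constant on every block of the partition
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        eXB = sum(X[w] * P[w] for w in B) / pB   # E[X 1(B_i)] / P(B_i)
        for w in B:
            out[w] = eXB
    return out

Xp = cond_exp(X, blocks, P)
assert Xp[0] == Xp[1] == Xp[2] == 1              # E[X|B_1] = (0+1+2)/3
assert Xp[3] == Xp[4] == Xp[5] == 4              # E[X|B_2] = (3+4+5)/3
# Defining property: E[X' 1(B)] = E[X 1(B)] for every block B of G
for B in blocks:
    assert sum(Xp[w] * P[w] for w in B) == sum(X[w] * P[w] for w in B)
```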
1.2 Existence and uniqueness
Before stating the existence and uniqueness theorem on conditional expectation, let us
quickly recall the notion of an event happening almost surely (a.s.), the Monotone convergence theorem and Lp spaces.
Let A F. We will say that A happens a.s., if P(A) = 1.
Theorem 1.7. [Monotone convergence theorem] Let (X_n)_n be random variables such that X_n ≥ 0 for all n and X_n ↑ X as n → ∞ a.s. Then

    E[X_n] ↑ E[X] as n → ∞.
Theorem 1.8. [Dominated convergence theorem] If X_n → X and |X_n| ≤ Y for all n a.s., for some integrable random variable Y, then

    E[X_n] → E[X].
Let p ∈ [1, ∞) and let f be a measurable function on (Ω, F, P). We define the norm

    ‖f‖_p = (E[|f|^p])^{1/p}

and we denote by L^p = L^p(Ω, F, P) the set of measurable functions f with ‖f‖_p < ∞. For p = ∞, we let

    ‖f‖_∞ = inf{λ : |f| ≤ λ a.e.}

and L^∞ the set of measurable functions with ‖f‖_∞ < ∞.
Formally, L^p is the collection of equivalence classes, where two functions are equivalent if they are equal almost everywhere (a.e.). In practice, we will represent an element of L^p by a function, but remember that equality in L^p means equality a.e.
Theorem 1.9. The space (L², ‖·‖₂) is a Hilbert space with ⟨f, g⟩ = E[f g]. If H is a closed subspace, then for all f ∈ L² there exists a unique (in the sense of a.e.) g ∈ H such that ‖f - g‖₂ = inf_{h∈H} ‖f - h‖₂ and ⟨f - g, h⟩ = 0 for all h ∈ H.
Notice that the left hand side is nonnegative, while the right hand side is non-positive,
implying that P(Y < 0) = 0.
2nd step: Suppose that X ≥ 0. For each n we define the random variables X_n = X ∧ n ≤ n, and hence X_n ∈ L². Thus from the first part of the existence proof we have that for each n there exists a G-measurable random variable Y_n satisfying, for all A ∈ G,

    E[Y_n 1(A)] = E[(X ∧ n) 1(A)].    (1.4)
Since the sequence (X_n)_n is increasing, from (1.3) we get that almost surely (Y_n)_n is increasing. If we now set Y = lim sup_n Y_n, then clearly Y is G-measurable and almost surely Y = lim_n Y_n. By the monotone convergence theorem, in (1.4) we get for all A ∈ G

    E[Y 1(A)] = E[X 1(A)],    (1.5)

since X_n ↑ X as n → ∞. In particular, if E[X] is finite, then E[Y] is also finite.
3rd step: Finally, for a general random variable X ∈ L¹ (not necessarily positive) we can apply the above construction to X⁺ = max(X, 0) and X⁻ = max(-X, 0), and then E[X|G] = E[X⁺|G] - E[X⁻|G] satisfies (a) and (b).
Remark 1.14. Note that the 2nd step of the above proof gives that if X ≥ 0, then there exists a G-measurable random variable Y such that

    for all A ∈ G, E[X 1(A)] = E[Y 1(A)],

i.e., all the conditions of Theorem 1.11 are satisfied except for the integrability one.
Definition 1.15. Sub-σ-algebras G₁, G₂, . . . of F are called independent if, whenever G_i ∈ G_i (i ∈ N) and i₁, . . . , i_n are distinct, then

    P(G_{i₁} ∩ . . . ∩ G_{i_n}) = Π_{k=1}^{n} P(G_{i_k}).

When we say that a random variable X is independent of a σ-algebra G, it means that σ(X) is independent of G.
The following properties are immediate consequences of Theorem 1.11 and its proof.
Proposition 1.16. Let X, Y ∈ L¹(Ω, F, P) and let G ⊆ F be a σ-algebra. Then
1. E[E[X|G]] = E[X].
2. If X is G-measurable, then E[X|G] = X a.s.
3. If X is independent of G, then E[X|G] = E[X] a.s.
4. If X ≥ 0 a.s., then E[X|G] ≥ 0 a.s.
Theorem 1.18. [Jensen's inequality] Let X be an integrable random variable and let φ : R → R be a convex function. Then

    E[φ(X)] ≥ φ(E[X]).
Proposition 1.19. Let G ⊆ F be a σ-algebra.
1. Conditional monotone convergence theorem: If (X_n)_{n≥0} is an increasing sequence of non-negative random variables with a.s. limit X, then

    E[X_n|G] ↑ E[X|G] as n → ∞, a.s.

2. Conditional Fatou's lemma: If X_n ≥ 0 for all n, then

    E[lim inf_n X_n | G] ≤ lim inf_n E[X_n|G] a.s.

Clearly, E[inf_{k≥n} X_k | G] ≤ inf_{k≥n} E[X_k|G]. Passing to the limit gives the desired inequality.
3. Since X_n + Y and Y - X_n are positive random variables for all n, applying conditional Fatou's lemma we get

    E[X + Y|G] ≤ lim inf_n E[X_n + Y|G]  and  E[Y - X|G] ≤ lim inf_n E[Y - X_n|G],

and combining the two inequalities gives E[X_n|G] → E[X|G] a.s.
4. A convex function is the supremum of countably many affine functions (see for instance [2, §6.6]):

    φ(x) = sup_i (a_i x + b_i), x ∈ R.

So for all i we have E[φ(X)|G] ≥ a_i E[X|G] + b_i a.s. Now, using the fact that the supremum is over a countable set, we get that

    E[φ(X)|G] ≥ sup_i (a_i E[X|G] + b_i) = φ(E[X|G]) a.s.
1.3 Product measures and Fubini's theorem

A measure space (E, E, μ) is called σ-finite if there exists a collection of sets (S_n)_{n≥0} in E such that ∪_n S_n = E and μ(S_n) < ∞ for all n.
Let (E₁, E₁, μ₁) and (E₂, E₂, μ₂) be two σ-finite measure spaces. The set

    A = {A₁ × A₂ : A₁ ∈ E₁, A₂ ∈ E₂}

is a π-system of subsets of E = E₁ × E₂. Define the product σ-algebra

    E₁ ⊗ E₂ = σ(A).

Set E = E₁ ⊗ E₂.
Theorem 1.25. [Product measure] Let (E₁, E₁, μ₁) and (E₂, E₂, μ₂) be two σ-finite measure spaces. There exists a unique measure μ = μ₁ ⊗ μ₂ on E such that

    μ(A₁ × A₂) = μ₁(A₁) μ₂(A₂)

for all A₁ ∈ E₁ and A₂ ∈ E₂.
Theorem 1.26. [Fubini's theorem] Let (E₁, E₁, μ₁) and (E₂, E₂, μ₂) be two σ-finite measure spaces. Let f be E-measurable and non-negative. Then

    μ(f) = ∫_{E₁} ( ∫_{E₂} f(x₁, x₂) μ₂(dx₂) ) μ₁(dx₁).    (1.6)

If f is integrable, then
1. x₂ ↦ f(x₁, x₂) is μ₂-integrable for μ₁-almost all x₁,
2. x₁ ↦ ∫_{E₂} f(x₁, x₂) μ₂(dx₂) is μ₁-integrable and formula (1.6) for μ(f) holds.
1.4 Examples

1.4.1 Gaussian case

Let (X, Y) be a Gaussian random vector in R². Set G = σ(Y). In this example we are going to compute X' = E[X|G]. We look for X' of the form X' = aY + b, choosing a, b ∈ R so that E[X'] = E[X] and Cov(X', Y) = Cov(X, Y); assuming Var(Y) > 0, this gives

    X' = E[X] + (Cov(X, Y)/Var(Y)) (Y - E[Y]).

Then X - X' and Y are uncorrelated, hence (by Gaussianity) independent, and it follows that E[X 1(A)] = E[X' 1(A)] for all A ∈ σ(Y), so that X' = E[X|G].
1.4.2 Conditional density functions

Suppose that X and Y are random variables having a joint density function f_{X,Y}(x, y) in R². Let h : R → R be a Borel function such that h(X) is integrable. In this example we want to compute E[h(X)|Y] = E[h(X)|σ(Y)].
The random variable Y has a density function f_Y, given by

    f_Y(y) = ∫_R f_{X,Y}(x, y) dx.

One can check that E[h(X)|Y] = g(Y), where

    g(y) = ∫_R h(x) ν(y, dx),

where ν(y, dx) = f_Y(y)^{-1} f_{X,Y}(x, y) 1(f_Y(y) > 0) dx = f_{X|Y}(x|y) dx. The measure ν(y, dx) is called the conditional distribution of X given Y = y, and f_{X|Y}(x|y) is the conditional density function of X given Y = y. Notice this function of x, y is defined only up to a zero-measure set.
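Numerically, the conditional density is obtained by normalizing a slice of the joint density. The concrete density f(x, y) = x + y on the unit square below is an illustrative assumption, not taken from the notes:

```python
def f_joint(x, y):
    # joint density f_{X,Y}(x, y) = x + y on [0,1]^2 (it integrates to 1)
    return x + y

def integrate(g, steps=1000):
    # midpoint rule on [0, 1]; exact up to rounding for affine integrands
    return sum(g((k + 0.5) / steps) for k in range(steps)) / steps

y = 0.25
fY = integrate(lambda x: f_joint(x, y))   # f_Y(y) = ∫ f(x, y) dx = 1/2 + y
assert abs(fY - 0.75) < 1e-9

def f_cond(x):
    # f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y) on {f_Y(y) > 0}
    return f_joint(x, y) / fY

assert abs(integrate(f_cond) - 1) < 1e-9   # a genuine probability density in x
```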
2 Discrete-time martingales

Let (Ω, F, P) be a probability space and (E, E) be a measurable space. (We will mostly consider E = R, R^d, or C. Unless otherwise indicated, it is to be understood from now on that E = R.)
Let X = (X_n)_{n≥0} be a sequence of random variables taking values in E. We call X a stochastic process in E.
A filtration (F_n)_n is an increasing family of sub-σ-algebras of F, i.e., F_n ⊆ F_{n+1} for all n. We can think of F_n as the information available to us at time n. Every process has a natural filtration (F_n^X)_n, given by

    F_n^X = σ(X_k, k ≤ n).

The process X is called adapted to the filtration (F_n)_n if X_n is F_n-measurable for all n. Of course, every process is adapted to its natural filtration. We say that X is integrable if X_n is integrable for all n.
Definition 2.1. Let (Ω, F, (F_n)_{n≥0}, P) be a filtered probability space. Let X = (X_n)_{n≥0} be an adapted integrable process taking values in R.
X is a martingale if E[X_n|F_m] = X_m a.s., for all n ≥ m.
X is a supermartingale if E[X_n|F_m] ≤ X_m a.s., for all n ≥ m.
X is a submartingale if E[X_n|F_m] ≥ X_m a.s., for all n ≥ m.
Note that every process which is a martingale (resp. super, sub) with respect to the given
filtration is also a martingale (resp. super, sub) with respect to its natural filtration by the
tower property of conditional expectation.
Example 2.2. Let (ξ_i)_{i≥1} be a sequence of i.i.d. random variables with E[ξ₁] = 0. Then it is easy to check that X_n = Σ_{i=1}^{n} ξ_i is a martingale.

Example 2.3. Let (ξ_i)_{i≥1} be a sequence of i.i.d. random variables with E[ξ₁] = 1. Then the product X_n = Π_{i=1}^{n} ξ_i is a martingale.
2.1 Stopping times

Example 2.5. Let (X_n)_{n≥0} be an adapted process taking values in R. Let A ∈ B(R). The first entrance time to A is

    T_A = inf{n ≥ 0 : X_n ∈ A},

with the convention that inf(∅) = ∞, so that T_A = ∞ if X never enters A. This is a stopping time, since

    {T_A ≤ n} = ∪_{k≤n} {X_k ∈ A} ∈ F_n.

The last exit time, though, e.g. L_A = sup{n ≤ 10 : X_n ∈ A}, is not always a stopping time.
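In discrete time the first entrance time is computable from the path seen so far, which is exactly why it is a stopping time. A minimal sketch (the helper name and the convention of returning None for ∞ are our own choices):

```python
def first_entrance(path, A):
    # T_A = inf{n >= 0 : X_n ∈ A}, with inf(∅) = ∞ encoded as None
    for n, x in enumerate(path):
        if x in A:
            return n
    return None  # the path never enters A

path = [0, 1, -1, 2, 3, 2]
assert first_entrance(path, {2, 3}) == 3
assert first_entrance(path, {7}) is None
# {T_A <= n} is decided by X_0,...,X_n alone (no peeking into the future):
assert first_entrance(path[:4], {2, 3}) == first_entrance(path, {2, 3})
```

A last exit time, by contrast, cannot be computed without knowing the whole future of the path, which is the intuitive reason it fails to be a stopping time.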
As an immediate consequence of the definition, one gets:
Proposition 2.6. Let S, T, (T_n)_n be stopping times on the filtered probability space (Ω, F, (F_n), P). Then S ∧ T, S ∨ T, inf_n T_n, sup_n T_n, lim inf_n T_n, and lim sup_n T_n are also stopping times.
Proof. Note that in discrete time everything follows straight from the definitions. But when
one considers continuous time processes, then right continuity of the filtration is needed to
ensure that the limits are indeed stopping times.
Definition 2.7. Let T be a stopping time on the filtered probability space (Ω, F, (F_n), P). Define the σ-algebra F_T via

    F_T = {A ∈ F : A ∩ {T ≤ t} ∈ F_t, for all t}.

Intuitively, F_T is the information available at time T.
It is easy to check that if T = t is constant, then T is a stopping time and F_T = F_t.
For a process X, we set X_T(ω) = X_{T(ω)}(ω) whenever T(ω) < ∞. We also define the stopped process X^T by X_t^T = X_{T∧t}.
Proposition 2.8. Let S and T be stopping times and let X = (X_n)_{n≥0} be an adapted process. Then
1. if S ≤ T, then F_S ⊆ F_T,
2. X_T 1(T < ∞) is an F_T-measurable random variable,
3. X^T is adapted,
4. if X is integrable, then X^T is integrable.
Proof. 2. Let A ∈ E. Then

    {X_T 1(T < ∞) ∈ A} ∩ {T ≤ t} = ∪_{s≤t} ({X_s ∈ A} ∩ {T = s}) ∈ F_t,

since X is adapted and T is a stopping time.
4. For all t we have

    E[|X_t^T|] = E[|X_{T∧t}|] ≤ Σ_{s=0}^{t-1} E[|X_s| 1(T = s)] + E[|X_t| 1(T ≥ t)] ≤ Σ_{s=0}^{t} E[|X_s|] < ∞.
2.2 Optional stopping

Let S ≤ T ≤ n be bounded stopping times and let X be a martingale. Let A ∈ F_S. Then

    E[X_T 1(A)] = E[X_S 1(A)] + Σ_{k=0}^{n} E[(X_{k+1} - X_k) 1(A ∩ {S ≤ k < T})] = E[X_S 1(A)],

since A ∩ {S ≤ k < T} ∈ F_k and E[X_{k+1} - X_k | F_k] = 0.
2.3 Gambler's ruin
this event is independent of the previous one. Hence T can be bounded from above by a geometric random variable of success probability 2^{-(a+b)}, times a + b. Therefore we get

    E[T] ≤ (a + b) 2^{a+b}.

We thus have a martingale with bounded increments and a stopping time with finite expectation. Hence, from the optional stopping theorem (5), we deduce that

    E[X_T] = E[X_0] = 0.

We also have

    E[X_T] = -a P(T_{-a} < T_b) + b P(T_b < T_{-a})  and  P(T_{-a} < T_b) + P(T_b < T_{-a}) = 1,

and hence we deduce that

    P(T_{-a} < T_b) = b/(a + b).
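A simulation agrees with this formula. The sketch below is our own numerical check, with the illustrative choice a = 2, b = 3: it estimates the probability that a simple random walk started at 0 hits -a before b.

```python
import random

def ruin_prob(a, b, trials, seed=0):
    # fraction of simple random walks started at 0 that reach -a before +b
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = 0
        while -a < x < b:
            x += 1 if rng.random() < 0.5 else -1
        hits += (x == -a)
    return hits / trials

a, b = 2, 3
est = ruin_prob(a, b, trials=20000)
assert abs(est - b / (a + b)) < 0.02   # theory: P(T_{-a} < T_b) = b/(a+b) = 0.6
```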
2.4 Martingale convergence theorem
Theorem 2.13. [A.s. martingale convergence theorem] Let X = (X_n)_n be a supermartingale which is bounded in L¹, i.e., sup_n E[|X_n|] < ∞. Then X_n → X_∞ a.s. as n → ∞, for some X_∞ ∈ L¹(F_∞), where F_∞ = σ(F_n, n ≥ 0).
Usually when we want to prove convergence of a sequence, we have an idea of what the limit
should be. In the case of the martingale convergence theorem though, we do not know the
limit. And, indeed in most cases, we just know the existence of the limit. In order to show
the convergence in the theorem, we will employ a beautiful trick due to Doob, which counts
the number of upcrossings of every interval with rational endpoints.
Corollary 2.14. Let X = (X_n)_n be a non-negative supermartingale. Then X converges a.s. towards an a.s. finite limit.

Proof. Since X is non-negative, we get that

    E[|X_n|] = E[X_n] ≤ E[X_0] < ∞,

hence X is bounded in L¹.
Let x = (x_n)_n be a sequence of real numbers. Let a < b be two real numbers. We define T_0(x) = 0 and inductively for k ≥ 0

    S_{k+1}(x) = inf{n ≥ T_k(x) : x_n ≤ a}  and  T_{k+1}(x) = inf{n ≥ S_{k+1}(x) : x_n ≥ b}.    (2.1)

We write N_n([a, b], x) = sup{k ≥ 0 : T_k(x) ≤ n} for the number of upcrossings of [a, b] completed by time n, and N([a, b], x) = lim_n N_n([a, b], x) for the total number of upcrossings of [a, b].
Figure 1. Upcrossings.
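The times S_k, T_k of (2.1) translate directly into a counter for completed upcrossings. A small sketch of Doob's counting device (the function name and the example sequences are our own):

```python
def upcrossings(x, a, b):
    # number of completed upcrossings of [a, b] by the sequence x: following (2.1),
    # wait until the path drops to <= a (a time S_k), then until it rises to >= b (T_k)
    count, below = 0, False
    for v in x:
        if not below and v <= a:
            below = True          # an S_k has occurred
        elif below and v >= b:
            below = False         # a T_k has occurred: one upcrossing completed
            count += 1
    return count

x = [0, 2, -1, 3, 0, -2, 4, 1]
assert upcrossings(x, 0, 1) == 3
# A convergent sequence crosses every band [a, b] only finitely often (Lemma 2.15);
# an oscillating one keeps crossing:
assert upcrossings([(-1) ** n for n in range(100)], -0.5, 0.5) == 49
```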
Before stating and proving Doob's upcrossing inequality, we give an easy lemma that will be used in the proof of Theorem 2.13.
Lemma 2.15. A sequence of real numbers x = (x_n)_n converges in R̄ = R ∪ {±∞} if and only if N([a, b], x) < ∞ for all rationals a < b.
Proof. Suppose that x converges. If for some a < b we had N([a, b], x) = ∞, that would imply lim inf_n x_n ≤ a < b ≤ lim sup_n x_n, which is a contradiction.
Next suppose that x does not converge. Then lim inf_n x_n < lim sup_n x_n, and taking rationals a < b between these two numbers gives N([a, b], x) = ∞.
Theorem 2.16. [Doob's upcrossing inequality] Let X be a supermartingale and let a < b be two real numbers. Then for all n ≥ 0

    (b - a) E[N_n([a, b], X)] ≤ E[(X_n - a)⁻].
Proof. We will omit the dependence on X from T_k and S_k and write N = N_n([a, b], X) to simplify notation. By the definition of the times (T_k) and (S_k), it is clear that for all k

    X_{T_k} - X_{S_k} ≥ b - a.    (2.2)
We have

    Σ_{k=1}^{n} (X_{T_k∧n} - X_{S_k∧n}) = Σ_{k=1}^{N} (X_{T_k} - X_{S_k}) + Σ_{k=N+1}^{n} (X_n - X_{S_k∧n}) 1(N < n)    (2.3)
        = Σ_{k=1}^{N} (X_{T_k} - X_{S_k}) + (X_n - X_{S_{N+1}}) 1(S_{N+1} ≤ n),    (2.4)

since the only term contributing to the second sum appearing on the right hand side of (2.3) is k = N + 1, by the definition of N. Indeed, if S_{N+2} ≤ n, then that would imply that T_{N+1} ≤ n, which would contradict the definition of N.
Using induction on k, it is easy to see that (T_k)_k and (S_k)_k are sequences of stopping times. Hence for all n we have that S_k ∧ n ≤ T_k ∧ n are bounded stopping times, and thus by the optional stopping theorem, Theorem 2.9, we get that E[X_{S_k∧n}] ≥ E[X_{T_k∧n}] for all k.
Therefore, taking expectations in (2.3) and (2.4) and using (2.2), we get

    0 ≥ E[ Σ_{k=1}^{n} (X_{T_k∧n} - X_{S_k∧n}) ] ≥ (b - a) E[N] - E[(X_n - a)⁻],

since (X_n - X_{S_{N+1}}) 1(S_{N+1} ≤ n) ≥ -(X_n - a)⁻. Rearranging gives the desired inequality.
Proof of Theorem 2.13. Let a < b ∈ Q. By Doob's upcrossing inequality, Theorem 2.16, we get that

    E[N_n([a, b], X)] ≤ (b - a)^{-1} E[(X_n - a)⁻] ≤ (b - a)^{-1} E[|X_n| + a].

By the monotone convergence theorem, since N_n([a, b], X) ↑ N([a, b], X) as n → ∞, we get that

    E[N([a, b], X)] ≤ (b - a)^{-1} (sup_n E[|X_n|] + a) < ∞,

by the assumption that X is bounded in L¹. Therefore N([a, b], X) < ∞ a.s. for every a < b ∈ Q. Hence

    P( ∩_{a<b∈Q} {N([a, b], X) < ∞} ) = 1.

Writing Ω₀ = ∩_{a<b∈Q} {N([a, b], X) < ∞}, we have that P(Ω₀) = 1, and by Lemma 2.15, on Ω₀ the process X converges to a possibly infinite limit X_∞. So we can define

    X_∞ = lim_n X_n on Ω₀, and X_∞ = 0 on Ω \ Ω₀.

Then X_∞ is F_∞-measurable, and by Fatou's lemma and the assumption that X is bounded in L¹ we get

    E[|X_∞|] = E[lim inf_n |X_n|] ≤ lim inf_n E[|X_n|] < ∞.

Hence X_∞ ∈ L¹, as required.
2.5 Doob's inequalities
Theorem 2.17. [Doob's maximal inequality] Let X = (X_n)_n be a non-negative submartingale. Writing X_n* = sup_{0≤k≤n} X_k, we have

    λ P(X_n* ≥ λ) ≤ E[X_n 1(X_n* ≥ λ)] ≤ E[X_n].

Proof. Let T = inf{k ≥ 0 : X_k ≥ λ}. Then T ∧ n is a bounded stopping time, hence by the optional stopping theorem, Theorem 2.9, we have

    E[X_n] ≥ E[X_{T∧n}] = E[X_T 1(T ≤ n)] + E[X_n 1(T > n)] ≥ λ P(T ≤ n) + E[X_n 1(T > n)].

It is clear that {T ≤ n} = {X_n* ≥ λ}. Hence we get

    λ P(X_n* ≥ λ) ≤ E[X_n 1(T ≤ n)] = E[X_n 1(X_n* ≥ λ)] ≤ E[X_n].
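A Monte Carlo sanity check of the maximal inequality. This is our own sketch: X_k = |S_k| for a simple random walk S is a non-negative submartingale, and the parameters n, λ and the number of trials are arbitrary illustrative choices.

```python
import random

rng = random.Random(1)
n, lam, trials = 50, 8, 5000
exceed, last = 0, 0.0
for _ in range(trials):
    s, m = 0, 0
    for _ in range(n):
        s += 1 if rng.random() < 0.5 else -1
        m = max(m, abs(s))                 # running maximum X_n* = max_{k<=n} |S_k|
    exceed += (m >= lam)
    last += abs(s)                         # accumulates X_n = |S_n|
lhs = exceed / trials                      # estimates P(X_n* >= lam)
rhs = (last / trials) / lam                # estimates E[X_n] / lam
assert lhs <= rhs + 0.02                   # Doob: lam * P(X_n* >= lam) <= E[X_n]
```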
Theorem 2.18. [Doob's Lp inequality] Let X be a martingale or a non-negative submartingale. Then for all p > 1, letting X_n* = sup_{k≤n} |X_k|, we have

    ‖X_n*‖_p ≤ (p/(p-1)) ‖X_n‖_p.

Proof. For k > 0 we have

    E[(X_n* ∧ k)^p] = E[ ∫_0^k p x^{p-1} 1(X_n* ≥ x) dx ] = ∫_0^k p x^{p-1} P(X_n* ≥ x) dx
        ≤ ∫_0^k p x^{p-2} E[X_n 1(X_n* ≥ x)] dx = (p/(p-1)) E[X_n (X_n* ∧ k)^{p-1}]
        ≤ (p/(p-1)) ‖X_n‖_p ‖(X_n* ∧ k)^{p-1}‖_{p/(p-1)},

where in the second and third equalities we used Fubini's theorem, for the first inequality we used Theorem 2.17, and for the last inequality we used Hölder's inequality. Rearranging, we get

    ‖X_n* ∧ k‖_p ≤ (p/(p-1)) ‖X_n‖_p.

Letting k → ∞ and using monotone convergence completes the proof.
2.6 Convergence in Lp
Theorem 2.19. Let X be a martingale and p > 1. Then the following statements are equivalent:
1. X is bounded in Lp(Ω, F, P): sup_{n≥0} ‖X_n‖_p < ∞.
2. X converges a.s. and in Lp to a random variable X_∞.
3. There exists a random variable Z ∈ Lp(Ω, F, P) such that X_n = E[Z|F_n] a.s.
Proof. 1 ⟹ 2: Suppose that X is bounded in Lp. Then by Jensen's inequality, X is also bounded in L¹. Hence by Theorem 2.13 we have that X converges to a finite limit X_∞ a.s. By Fatou's lemma we have

    E[|X_∞|^p] = E[lim inf_n |X_n|^p] ≤ lim inf_n E[|X_n|^p] ≤ sup_{n≥0} ‖X_n‖_p^p < ∞.

By Doob's Lp inequality, Theorem 2.18,

    ‖X_n*‖_p ≤ (p/(p-1)) ‖X_n‖_p,

where recall that X_n* = sup_{k≤n} |X_k|. If we now let n → ∞, then by monotone convergence we get that

    ‖X*‖_p ≤ (p/(p-1)) sup_{n≥0} ‖X_n‖_p < ∞,

where X* = sup_n |X_n|. Therefore

    |X_n - X_∞| ≤ 2X* ∈ Lp,    (2.5)

and dominated convergence gives that X_n → X_∞ in Lp.
2.7 Uniformly integrable martingales
For a stopping time T, define

    X_T = Σ_{n=0}^{∞} X_n 1(T = n) + X_∞ 1(T = ∞) = Σ_{n∈Z₊∪{∞}} X_n 1(T = n).    (2.6)

Let B ∈ F_T. Then

    E[1(B) X_T] = Σ_{n∈Z₊∪{∞}} E[1(B) 1(T = n) X_n] = Σ_{n∈Z₊∪{∞}} E[1(B) 1(T = n) X_∞] = E[1(B) X_∞],

where for the second equality we used that E[X_∞|F_n] = X_n a.s. and B ∩ {T = n} ∈ F_n. Also, clearly X_T is F_T-measurable, and hence

    E[X_∞|F_T] = X_T a.s.

Now using the tower property of conditional expectation, we get for stopping times S ≤ T, since F_S ⊆ F_T,

    E[X_T|F_S] = E[E[X_∞|F_T]|F_S] = E[X_∞|F_S] = X_S a.s.
2.8 Backwards martingales
Note that for a backwards martingale

    X_n = E[X_0|G_n] for all n ≥ 0.    (2.7)

Since X_0 ∈ L¹, from (2.7) and Theorem 2.23 we get that X is uniformly integrable. This is a nice property that backwards martingales have: they are automatically UI.
Theorem 2.31. Let X be a backwards martingale with X_0 ∈ Lp for some p ∈ [1, ∞). Then X_n converges a.s. and in Lp as n → ∞ to the random variable X_∞ = E[X_0|G_∞], where G_∞ = ∩_{n≥0} G_n.

Proof. We will first adapt Doob's upcrossing inequality, Theorem 2.16, to this setting. Let a < b be real numbers and let N_n([a, b], X) be the number of upcrossings of the interval [a, b] by X between times n and 0, as defined at the beginning of Section 2.4.
If we write F_k = G_{n-k} for 0 ≤ k ≤ n, then (F_k)_{0≤k≤n} is an increasing filtration and the process (X_{n-k}, 0 ≤ k ≤ n) is an F-martingale. Then N_n([a, b], X) is the number of upcrossings of the interval [a, b] by (X_{n-k}) between times 0 and n. Thus, applying Doob's upcrossing inequality to (X_{n-k}), we get that

    (b - a) E[N_n([a, b], X)] ≤ E[(X_0 - a)⁻].

Letting n → ∞, N_n([a, b], X) increases to the total number of upcrossings of [a, b] by X, and thus we deduce that

    X_m → X_∞ as m → ∞ a.s.,

for some random variable X_∞, which is G_∞-measurable, since the σ-algebras G_n are decreasing.
Since X_0 ∈ Lp, it follows that X_n ∈ Lp for all n ≥ 0. Also, by Fatou's lemma, X_∞ ∈ Lp. Now, by the conditional Jensen inequality, we obtain

    |X_n - X_∞|^p = |E[X_0 - X_∞|G_n]|^p ≤ E[|X_0 - X_∞|^p | G_n].

But the latter family of random variables, (E[|X_0 - X_∞|^p | G_n])_n, is UI by Theorem 2.23 again. Hence (|X_n - X_∞|^p)_n is also UI, and thus by [2, Theorem 13.7] we conclude that X_n → X_∞ as n → ∞ in Lp.
In order to show that X_∞ = E[X_0|G_∞] a.s., it only remains to show that if A ∈ G_∞, then

    E[X_0 1(A)] = E[X_∞ 1(A)].

Since A ∈ G_n for all n ≥ 0, we have by the martingale property that

    E[X_0 1(A)] = E[X_n 1(A)].

Letting n → ∞ in the above equality and using the L¹ convergence of X_n to X_∞ finishes the proof.
2.9 Applications of martingales
Theorem 2.32. [Kolmogorov's 0-1 law] Let (X_i)_{i≥1} be a sequence of i.i.d. random variables. Let F_n = σ(X_k, k ≥ n) and F_∞ = ∩_{n≥0} F_n. Then F_∞ is trivial, i.e., every A ∈ F_∞ has probability P(A) ∈ {0, 1}.

Proof. Let G_n = σ(X_k, k ≤ n) and A ∈ F_∞. Since G_n is independent of F_{n+1}, we have that

    E[1(A)|G_n] = P(A) a.s.

Theorem 2.26 gives that E[1(A)|G_n] converges to E[1(A)|G_∞] a.s. as n → ∞, where G_∞ = σ(G_n, n ≥ 0). Hence we deduce that

    E[1(A)|G_∞] = 1(A) = P(A) a.s.,

since F_∞ ⊆ G_∞. Therefore P(A) ∈ {0, 1}.
Theorem 2.33. [Strong law of large numbers] Let (X_i)_{i≥1} be a sequence of i.i.d. random variables in L¹ with μ = E[X₁]. Let S_n = X₁ + . . . + X_n for n ≥ 1 and S_0 = 0. Then S_n/n → μ as n → ∞, a.s. and in L¹.

Proof. Let G_n = σ(S_n, S_{n+1}, . . .) = σ(S_n, X_{n+1}, . . .). We will now show that (M_n)_{n≥1} = (S_n/n)_{n≥1} is a backwards martingale with respect to (G_n)_{n≥1}. We have for m ≥ 1

    E[M_{m-1} | G_m] = E[ S_{m-1}/(m-1) | G_m ].    (2.8)

Setting n = m, since X_n is independent of X_{n+1}, X_{n+2}, . . ., we obtain

    E[ S_{n-1}/(n-1) | G_n ] = E[ (S_n - X_n)/(n-1) | G_n ] = S_n/(n-1) - E[X_n | S_n]/(n-1).    (2.9)

By symmetry, notice that E[X_k|S_n] = E[X₁|S_n] for all k ≤ n. Indeed, for any A ∈ B(R), E[X_k 1(S_n ∈ A)] does not depend on k ≤ n. Clearly

    E[X₁|S_n] + . . . + E[X_n|S_n] = E[S_n|S_n] = S_n,

and hence E[X_n|S_n] = S_n/n a.s. Finally, putting everything together, we get

    E[ S_{n-1}/(n-1) | G_n ] = S_n/(n-1) - S_n/(n(n-1)) = S_n/n a.s.

Thus, by the backwards martingale convergence theorem, we deduce that S_n/n converges as n → ∞, a.s. and in L¹, to a random variable, say Y = lim S_n/n. Obviously, for all k,

    Y = lim_n (X_{k+1} + . . . + X_{k+n})/n,

so Y is measurable with respect to σ(X_{k+1}, X_{k+2}, . . .) for every k, hence measurable with respect to the tail σ-algebra. By Kolmogorov's 0-1 law (Theorem 2.32), Y is a.s. constant, and since E[Y] = lim E[S_n/n] = μ by the L¹ convergence, we conclude that Y = μ a.s.
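A single seeded trajectory illustrates the theorem numerically. This is our own sketch, with the illustrative choice X_i uniform on [0, 1], so μ = 1/2:

```python
import random

rng = random.Random(2)
s, averages = 0.0, []
for n in range(1, 100001):
    s += rng.random()                      # S_n = X_1 + ... + X_n
    if n in (100, 10000, 100000):
        averages.append(s / n)             # record S_n/n along the trajectory

assert abs(averages[-1] - 0.5) < 0.01      # S_n/n is close to mu = 1/2 for large n
```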
Theorem 2.34. [Kakutani's product martingale theorem] Let (X_n)_{n≥1} be a sequence of independent non-negative random variables of mean 1. We set

    M_0 = 1 and M_n = X₁ X₂ · · · X_n, n ∈ N.

Then (M_n)_{n≥0} is a non-negative martingale and M_n → M_∞ a.s. as n → ∞ for some random variable M_∞. We set a_n = E[√X_n]; then a_n ∈ (0, 1]. Moreover,
1. if Π_n a_n > 0, then M_n → M_∞ in L¹ and E[M_∞] = 1;
2. if Π_n a_n = 0, then M_∞ = 0 a.s.

Proof. Clearly (M_n)_n is a positive martingale and E[M_n] = 1 for all n, since the random variables (X_i) are independent and of mean 1. Hence, by the a.s. martingale convergence theorem, we get that M_n converges a.s. as n → ∞ to a finite random variable M_∞. By Cauchy–Schwarz, a_n ≤ 1 for all n.
We now define

    N_n = (√X₁ · · · √X_n)/(a₁ · · · a_n), for n ≥ 1.

Then (N_n) is a non-negative martingale that is bounded in L¹, and hence it converges a.s. towards a finite limit N_∞ as n → ∞.
1. We have

    sup_{n≥0} E[N_n²] = sup_{n≥0} 1/(Π_{i=1}^{n} a_i)² = 1/(Π_n a_n)² < ∞,    (2.10)

under the assumption that Π_n a_n > 0. Since M_n = N_n² (Π_{i=1}^{n} a_i)² ≤ N_n² for all n, we get

    E[sup_{k≤n} M_k] ≤ E[sup_{k≤n} N_k²] ≤ 4 E[N_n²],

where the last inequality follows by Doob's L²-inequality, Theorem 2.18. Hence by monotone convergence and (2.10) we deduce

    E[sup_n M_n] < ∞,

and since M_n ≤ sup_n M_n, we conclude that (M_n) is UI, and hence it also converges in L¹ towards M_∞. Finally, since E[M_n] = 1 for all n, it follows that E[M_∞] = 1.
2. We have M_n = N_n² (Π_{i=1}^{n} a_i)² → 0 as n → ∞, since Π_n a_n = 0 and N_∞ exists and is finite a.s. by the a.s. martingale convergence theorem. Hence M_∞ = 0 a.s.
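The dichotomy is easy to watch numerically. In this sketch of ours, X_i takes the values 1/2 and 3/2 with probability 1/2 each, so E[X_i] = 1 but a = E[√X_i] < 1 is the same for every i; hence Π_n a_n = a^n → 0 and case 2 of the theorem predicts M_n → 0:

```python
import random

a = (1.5 ** 0.5 + 0.5 ** 0.5) / 2   # a_n = E[sqrt(X_n)], identical for every n
assert a < 1                        # hence prod_n a_n = a^n -> 0: case 2 applies

rng = random.Random(3)
m = 1.0
for _ in range(2000):
    m *= 1.5 if rng.random() < 0.5 else 0.5   # M_n = X_1 ... X_n, with E[M_n] = 1
assert m < 1e-6                      # yet the martingale has collapsed towards 0
```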
2.9.1 Radon–Nikodym derivatives
    X_n = dQ/dP on F_n.

It is easy to check that (X_n)_n is a non-negative martingale with respect to the filtered probability space (Ω, F, (F_n), P). Indeed, if A ∈ F_n, then

    E[X_{n+1} 1(A)] = Q(A) = E[X_n 1(A)].

Therefore, by (b),

    E[X_n 1(X_n ≥ λ)] = Q({X_n ≥ λ}),

which proves the uniform integrability. Thus by the convergence theorem for UI martingales, Theorem 2.26, we get that X_n converges to X_∞ as n → ∞ in L¹ and E[X_∞] = 1. So for all A ∈ F_n we have

    E[X_n 1(A)] = E[X_∞ 1(A)].

Hence if we now define a new probability measure Q̃(A) = E[X_∞ 1(A)], then Q̃(A) = Q(A) for all A ∈ ∪_n F_n. But ∪_n F_n is a π-system that generates the σ-algebra F, and hence

    Q = Q̃ on F,

which implies (c).
The implication (c) ⟹ (a) is straightforward.
3 Continuous-time random processes

3.1 Definitions
When we consider processes in discrete time, if we equip N with the σ-algebra P(N) that contains all the subsets of N, then the process

    (ω, n) ↦ X_n(ω)

is clearly measurable with respect to the product σ-algebra F ⊗ P(N).
Back to continuous time: if we fix t ∈ R₊, then ω ↦ X_t(ω) is a random variable. But the mapping (ω, t) ↦ X_t(ω) has no reason to be measurable with respect to F ⊗ B(R₊) (B(R₊) is the Borel σ-algebra) unless some regularity conditions are imposed on X. Also, if A ⊆ R, then the first hitting time of A,

    T_A = inf{t : X_t ∈ A},

is not in general a stopping time, as the set

    {T_A ≤ t} = ∪_{0≤s≤t} {T_A = s} ∉ F_t in general,

since this is an uncountable union.
A quite natural requirement is that for fixed ω the mapping t ↦ X_t(ω) is continuous in t. Then, indeed, the mapping (ω, t) ↦ X_t(ω) is measurable. More generally, we will consider processes that are right-continuous and admit left limits everywhere a.s., and we will call such processes càdlàg, from the French "continu à droite, limite à gauche". Continuous and càdlàg processes are determined by their values on a countable dense subset of R₊, for instance Q₊.
Note that if a process X = (X_t)_{t∈(0,1]} is continuous, then the mapping

    (ω, t) ↦ X_t(ω)

is measurable with respect to F ⊗ B((0, 1]). To see this, note that by the continuity of X in t we can write

    X_t(ω) = lim_{n→∞} Σ_{k=0}^{2^n - 1} 1(t ∈ (k2^{-n}, (k + 1)2^{-n}]) X_{k2^{-n}}(ω).
Proposition 3.1. Let S and T be stopping times and X a càdlàg adapted process. Then
1. S ∧ T is a stopping time,
2. if S ≤ T, then F_S ⊆ F_T,
3. X_T 1(T < ∞) is an F_T-measurable random variable,
4. X^T is adapted.

Proof. Claims 1 and 2 follow directly from the definition, as in the discrete-time case. We will only show 3. Note that 4 follows from 3, since X_{T∧t} will then be F_{T∧t}-measurable, and hence F_t-measurable, since by 2, F_{T∧t} ⊆ F_t.
Note that a random variable Z is F_T-measurable if and only if Z 1(T ≤ t) is F_t-measurable for all t. It follows directly from the definition that if Z is F_T-measurable, then Z 1(T ≤ t) is F_t-measurable for all t. For the other implication, note that if Z = c 1(A), then the claim is true. This extends to all finite linear combinations of indicators, since if Z = Σ_{i=1}^{n} c_i 1(A_i), where the constants c_i are positive, then we can write Z as a linear combination of indicators of disjoint sets, and then the claim follows easily. Finally, any positive random variable Z can be approximated by Z_n = 2^{-n} ⌊2^n Z⌋ ∧ n ↑ Z as n → ∞. The claim follows for each Z_n, since if Z 1(T ≤ t) is F_t-measurable, then also Z_n 1(T ≤ t) is F_t-measurable for all t. Finally, the limit of F_T-measurable random variables is F_T-measurable.
So, in order to prove that X_T 1(T < ∞) is F_T-measurable, we will show that X_T 1(T ≤ t) is F_t-measurable for all t. We can write

    X_T 1(T ≤ t) = X_T 1(T < t) + X_t 1(T = t).

Clearly, the random variable X_t 1(T = t) is F_t-measurable. It only remains to show that X_T 1(T < t) is F_t-measurable. If we let T_n = 2^{-n} ⌈2^n T⌉, then it is easy to see that T_n is a stopping time taking values in the set D_n = {k2^{-n} : k ∈ N}. Indeed,

    {T_n ≤ t} = {⌈2^n T⌉ ≤ 2^n t} = {T ≤ 2^{-n} ⌊2^n t⌋} ∈ F_{2^{-n}⌊2^n t⌋} ⊆ F_t.

By the càdlàg property of X and the convergence T_n ↓ T, we get that

    X_T 1(T < t) = lim_n X_{T_n∧t} 1(T < t).

But T_n is a stopping time with respect to the filtration (F_t), and hence X_{T_n∧t} 1(T < t) is F_t-measurable for all n, and this finishes the proof.
Example 3.2. Note that when the time index set is R₊, hitting times are not always stopping times. Let J be a random variable that takes values +1 or -1, each with probability 1/2. Consider now the following process:

    X_t = t, if t ∈ [0, 1];  X_t = 1 + J(t - 1), if t > 1.

Let F_t = σ(X_s, s ≤ t) be the natural filtration of X. Then if A = (1, 2) and we consider T_A = inf{t ≥ 0 : X_t ∈ A}, clearly

    {T_A ≤ 1} ∉ F₁.
If we impose some regularity conditions on the process or the filtration though, then we get
stopping times like in the next two propositions.
Proposition 3.3. Let A be a closed set and let X be a continuous adapted process. Then the first hitting time of A,

    T_A = inf{t ≥ 0 : X_t ∈ A},

is a stopping time.

Proof. It suffices to show that

    {T_A ≤ t} = { inf_{s∈Q, s≤t} d(X_s, A) = 0 },    (3.1)

where d(x, A) stands for the distance of x from the set A. If T_A = s ≤ t, then there exists a sequence (s_n) of times such that X_{s_n} ∈ A and s_n → s as n → ∞. By continuity of X, we then deduce that X_{s_n} → X_s as n → ∞, and since A is closed, we must have X_s ∈ A. Thus we showed that X_{T_A} ∈ A. We can now find a sequence of rationals q_n such that q_n → T_A as n → ∞, and since d(X_{T_A}, A) = 0, we get that d(X_{q_n}, A) → 0 as n → ∞.
Suppose now that inf_{s∈Q, s≤t} d(X_s, A) = 0. Then there exists a sequence s_n ∈ Q with s_n ≤ t for all n such that

    d(X_{s_n}, A) → 0 as n → ∞.

We can extract a converging subsequence of the s_n, say with limit s ≤ t, and by continuity of X we get that X_{s_n} → X_s along this subsequence. Since d(X_s, A) = 0 and A is a closed set, we conclude that X_s ∈ A, and hence T_A ≤ t.
Definition 3.4. Let (F_t)_{t∈R₊} be a filtration. For each t we define

    F_{t+} = ∩_{s>t} F_s.
Proof. First we show that for all t, the event {T_A < t} ∈ F_t. Indeed, by the continuity of X and the fact that A is open, we get that

    {T_A < t} = ∪_{q∈Q, q<t} {X_q ∈ A} ∈ F_t.

Hence

    {T_A ≤ t} = ∩_n {T_A < t + 1/n} ∈ F_{t+},

so T_A is a stopping time with respect to the filtration (F_{t+}).
3.2 Finite-dimensional distributions
As we discussed at the beginning of the section, we can view a stochastic process indexed by R₊ as a random variable with values in the space of functions {f : R₊ → E}, endowed with the product σ-algebra that makes the projections f ↦ f(t) measurable. The law of the process X is the measure μ defined by

    μ(A) = P(X ∈ A),

where A is in the product σ-algebra. However, the measure μ is not easy to work with. Instead we consider simpler objects, which we define below.
Given a probability measure μ on D(R₊, E), we consider the probability measures μ_J, where J ⊆ R₊ is a finite set, defined as the laws of (X_t, t ∈ J). The probability measures (μ_J) are called the finite-dimensional distributions of μ. By a π-system uniqueness argument, μ is uniquely determined by its finite-dimensional distributions. Indeed, the set

    {∩_{s∈J} {X_s ∈ A_s} : J finite, A_s ∈ B(R)}

is a π-system generating the product σ-algebra. So, when we want to specify the law of a càdlàg process, it suffices to describe its finite-dimensional distributions. Of course, we have no a priori reason to believe there exists a càdlàg process whose finite-dimensional distributions coincide with a given family of measures (μ_J : J ⊆ R₊, J finite).
Even if we know the law of a process, this does not give us much information about the
sample path properties of the process. Namely, there could be different processes with the
same finite marginal distributions. This motivates the following definition:
Definition 3.6. Let $X$ and $X'$ be two processes defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We say that $X'$ is a version of $X$ if $X_t = X'_t$ a.s. for every $t$.
Remark 3.7. Note that two versions of the same process have the same finite marginal distributions. But they need not share the same sample path properties.
Example 3.8. Let $X = (X_t)_{t \in [0,1]}$ be the process that is identical to 0 for all $t$. Then obviously the finite marginal distributions will be Dirac measures at 0. Now let $U$ be a uniform random variable on $[0,1]$. We define $X'_t = \mathbb{1}(U = t)$. Then clearly the finite
Proof. First note that if $(s_n)$ is a sequence of rationals decreasing to $t$, then by Lemma 2.15 we get that the limit $\lim_n f(s_n)$ exists. Similarly if $(s'_n)$ is a sequence increasing to $t$, then the limit $\lim_n f(s'_n)$ exists. So far we have shown that for any sequence converging to $t$ from above (or below) the limit exists. It remains to show that the limit is the same along any sequence decreasing to $t$. To see this, note that if $(s_n)$ is a sequence decreasing to $t$ and $(q_n)$ is another sequence decreasing to $t$ with $\lim_n f(s_n) \neq \lim_n f(q_n)$, then we can combine the two sequences and get a decreasing sequence $(a_n)$ converging to $t$ such that $\lim_n f(a_n)$ does not exist, which is a contradiction, since we already showed that for every decreasing sequence the limit exists. Finally, the limits from above or below are finite, which follows by the assumption that $f$ is bounded on any bounded subset of $\mathbb{Q}_+$.
e as follows:
Proof of Theorem 3.10. The goal is to define X
et =
X
lim Xs
st,sQ+
where $K > \sup I$. Taking a monotone limit over finite subsets $J$ of $I$ with union $I$, we then get that
\[ \lambda\, \mathbb{P}\Big( \sup_{t \in I} |X_t| > \lambda \Big) \le \mathbb{E}[|X_K|]. \]
Let $a < b$ be rational numbers. Then we have $N([a,b], I, X) = \sup_{J \subseteq I,\, J \text{ finite}} N([a,b], J, X)$. Let $J = \{a_1, \ldots, a_n\}$ (in increasing order again) be a finite subset of $I$. Then $(X_{a_i})_{i \le n}$ is a martingale and Doob's upcrossing lemma gives that
\[ (b-a)\, \mathbb{E}[N([a,b], J, X)] \le \mathbb{E}[(X_{a_n} - a)^-] \le \mathbb{E}[(X_K - a)^-]. \tag{3.2} \]
By monotone convergence again, if we let $I_M = \mathbb{Q}_+ \cap [0, M]$, we then get that for all $M$
\[ N([a,b], I_M, X) < \infty \quad \text{a.s.} \]
Thus if we now let
\[ \Omega_0 = \bigcap_{M \in \mathbb{N}} \Big( \bigcap_{a<b,\; a,b \in \mathbb{Q}} \{N([a,b], I_M, X) < \infty\} \cap \Big\{ \sup_{t \in I_M} |X_t| < \infty \Big\} \Big), \]
then we obtain that $\mathbb{P}(\Omega_0) = 1$. For $\omega \in \Omega_0$, by Lemma 3.11 the following limits exist in $\mathbb{R}$:
\[ X_{t+}(\omega) = \lim_{s \downarrow t,\, s \in \mathbb{Q}} X_s(\omega), \quad t \ge 0. \]
We then define
\[ \widetilde{X}_t = \begin{cases} X_{t+}, & \text{on } \Omega_0; \\ 0, & \text{otherwise.} \end{cases} \]
Notice that the process $(X_{t_n} : n \ge 1)$ is a backwards martingale, and hence it converges a.s. and in $L^1$ as $n \to \infty$. Therefore,
\[ \mathbb{E}[X_{t_i}|\mathcal{F}_t] \to \mathbb{E}[\widetilde{X}_t|\mathcal{F}_t] \quad \text{in } L^1. \tag{3.3} \]
But $\mathbb{E}[X_{t_i}|\mathcal{F}_t] = X_t$. Therefore
\[ X_t = \mathbb{E}[\widetilde{X}_t|\mathcal{F}_t] \quad \text{a.s.} \tag{3.4} \]
If $s < t$, then by the tower property and (3.4) and (3.3) we get that
\[ \mathbb{E}[\widetilde{X}_t|\mathcal{F}_{s+}] = \widetilde{X}_s \quad \text{a.s.} \]
Notice that if $\mathcal{G}$ is any $\sigma$-algebra and $X$ is an integrable random variable, then
\[ \mathbb{E}[X|\sigma(\mathcal{G} \cup \mathcal{N})] = \mathbb{E}[X|\mathcal{G}] \quad \text{a.s.} \]
Finally we get that $\mathbb{E}[\widetilde{X}_t|\widetilde{\mathcal{F}}_s] = \widetilde{X}_s$ a.s., which shows that $\widetilde{X}$ is a martingale with respect to the filtration $\widetilde{\mathcal{F}}$.
The only thing that remains to prove is the cadlag property. Suppose that for some $\omega \in \Omega_0$ the path $\widetilde{X}$ is not right continuous. Then there exist $t \ge 0$, $\varepsilon > 0$ and a sequence $(s_n)$ such that $s_n \downarrow t$ as $n \to \infty$ and
\[ |\widetilde{X}_{s_n} - \widetilde{X}_t| > \varepsilon. \]
By the definition of $\widetilde{X}$ for $\omega \in \Omega_0$, there exists a sequence of rational numbers $(s'_n)$ such that $s'_n > s_n$, $s'_n \downarrow t$ as $n \to \infty$ and
\[ |\widetilde{X}_{s_n} - X_{s'_n}| \le \frac{\varepsilon}{2}. \]
Hence
\[ |X_{s'_n} - \widetilde{X}_t| > \frac{\varepsilon}{2}, \]
which contradicts the fact that $X_{s'_n} \to \widetilde{X}_t$ as $n \to \infty$.
The proof that $\widetilde{X}$ has left limits is left as an exercise (hint: use the finite up-crossing property of $X$ on rationals).
Example 3.12. Let $\xi, \eta$ be independent random variables taking values $+1$ or $-1$ with equal probability. We now define
\[ X_t = \begin{cases} 0, & \text{if } t < 1; \\ \xi, & \text{if } t = 1; \\ \xi + \eta, & \text{if } t > 1. \end{cases} \]
We also define $\mathcal{F}_t$ to be the natural filtration, i.e. $\mathcal{F}_t = \sigma(X_s, s \le t)$. Then clearly, $X$ is a martingale relative to the filtration $(\mathcal{F}_t)$, but it is not right continuous at 1. Also, it is easy to see that $\mathcal{F}_1 = \sigma(\xi)$ but $\mathcal{F}_{1+} = \sigma(\xi, \eta)$. We now define
\[ \widetilde{X}_t = \begin{cases} 0, & \text{if } t < 1; \\ \xi + \eta, & \text{if } t \ge 1. \end{cases} \]
It is easy to check that $X_t = \mathbb{E}[\widetilde{X}_t|\mathcal{F}_t]$ a.s. for all $t$ and that $\widetilde{X}$ is a martingale with respect to the filtration $(\mathcal{F}_{t+})$. It is obvious that $\widetilde{X}$ is cadlag. Note though that $\widetilde{X}$ is not a version of $X$, since $X_1 \neq \widetilde{X}_1$.
From now on when we work with martingales in continuous time, we will always consider
their cadlag version, provided that the filtration satisfies the usual conditions.
3.3
In this section we will give the continuous time analogues of Doob's inequalities and the convergence theorems for martingales.
Theorem 3.13. [A.s. martingale convergence] Let $(X_t : t \ge 0)$ be a cadlag martingale which is bounded in $L^1$. Then $X_t \to X_\infty$ a.s. as $t \to \infty$, for some $X_\infty \in L^1(\mathcal{F}_\infty)$.
Proof. If $N([a,b], I_M, X)$ stands for the number of up-crossings of the interval $[a,b]$ as defined in Lemma 3.11, then from (3.2) in the proof of the martingale regularization theorem, we get that
\[ (b-a)\, \mathbb{E}[N([a,b], I_M, X)] \le |a| + \sup_{t \ge 0} \mathbb{E}[|X_t|] < \infty. \]
\[ |X_t - X_q| < \frac{\varepsilon}{2}. \]
Hence we conclude that
\[ |X_t - X_\infty| \le \varepsilon. \]
\[ \sup_{s \in \{t\} \cup ([0,t] \cap \mathbb{Q}_+)} |X_s|. \]
The rest of the proof follows in the same way as the first part of the proof of Theorem 3.10
Theorem 3.15. [Doob's $L^p$-inequality] Let $(X_t : t \ge 0)$ be a cadlag martingale. Setting $X_t^* = \sup_{s \le t} |X_s|$, then for all $p > 1$ we have
\[ \|X_t^*\|_p \le \frac{p}{p-1} \|X_t\|_p. \]
(3.5)
Since $A \in \mathcal{F}_S$, the definition of $S_n$ implies that $A \in \mathcal{F}_{S_n}$. Hence from (3.5) we obtain that
\[ \mathbb{E}[X_{T_n} \mathbb{1}(A)] = \mathbb{E}[X_{S_n} \mathbb{1}(A)]. \]
Letting $n \to \infty$ and using the $L^1$ convergence of $X_{T_n}$ to $X_T$ and of $X_{S_n}$ to $X_S$ we have
\[ \mathbb{E}[X_T \mathbb{1}(A)] = \mathbb{E}[X_S \mathbb{1}(A)]. \]
3.4
\[ \sup_n \sup_{0 \le k < 2^n} \frac{|X_{k2^{-n}} - X_{(k+1)2^{-n}}|}{2^{-n\alpha}} \le M < \infty. \tag{3.6} \]
We will now show that there exists a random variable $M' < \infty$ a.s. so that for every $s, t \in D$ we have
\[ |X_t - X_s| \le M' |t - s|^\alpha. \]
Let $s, t \in D$ and let $r$ be the unique integer such that
\[ 2^{-(r+1)} < t - s \le 2^{-r}. \]
Then there exists $k$ such that $s < k2^{-(r+1)} < t$. Set $\tau = k2^{-(r+1)}$; then $0 < t - \tau < 2^{-r}$. So we have that
\[ t - \tau = \sum_{j \ge r+1} \frac{x_j}{2^j}, \]
where $x_j \in \{0,1\}$ for all $j$ (in fact this is a finite sum because $t$ is dyadic). Similarly we can write
\[ \tau - s = \sum_{j \ge r+1} \frac{y_j}{2^j}, \]
where $y_j \in \{0,1\}$ for all $j$. Thus we see that we can write the interval $[s,t)$ as a disjoint union of dyadic intervals of length $2^{-n}$ for $n \ge r+1$, where at most 2 such intervals have the same length. Therefore,
\[ |X_s - X_t| \le \sum_{d,n} |X_d - X_{d+2^{-n}}|, \]
where $d, d+2^{-n}$ in the summation above are the endpoints of the intervals in the decomposition of $[s,t)$. Hence using (3.6) we obtain that for all $s, t \in D$
\[ |X_s - X_t| \le 2 \sum_{n \ge r+1} M 2^{-n\alpha} = 2M \frac{2^{-(r+1)\alpha}}{1 - 2^{-\alpha}}. \]
4 Weak convergence
4.1 Definitions
Let (M, d) be a metric space endowed with its Borel -algebra. All the measures that we
will consider in this section will be measures on such a measurable space.
Definition 4.1. Let $(\mu_n, n \ge 0)$ be a sequence of probability measures on a metric space $(M,d)$. We say that $\mu_n$ converges weakly to $\mu$, and write $\mu_n \Rightarrow \mu$, if $\mu_n(f) \to \mu(f)$ as $n \to \infty$ for all bounded continuous functions $f$ on $M$, where $\mu(f) = \int_M f \, d\mu$.
Notice that by the definition $\mu$ is also a probability measure, since $\mu(1) = 1$.
Example 4.2. Let $(x_n)_{n \ge 0}$ be a sequence in a metric space $M$ that converges to $x$ as $n \to \infty$. Then $\delta_{x_n}$ converges weakly to $\delta_x$ as $n \to \infty$, since if $f$ is any continuous function, then $f(x_n) \to f(x)$ as $n \to \infty$.
Example 4.3. Let $M = [0,1]$ with the Euclidean metric and $\mu_n = \frac{1}{n}\sum_{0 \le k \le n-1} \delta_{k/n}$. Then $\mu_n(f)$ is the Riemann sum $\frac{1}{n}\sum_{0 \le k \le n-1} f(k/n)$ and it converges to $\int_0^1 f(x)\, dx$ if $f$ is continuous, which shows that $\mu_n$ converges weakly to Lebesgue measure on $[0,1]$.
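As a quick numerical illustration (an aside, not part of the notes; $f = \sin$ is an arbitrary bounded continuous test function), one can evaluate $\mu_n(f)$ and watch it approach the integral:

```python
import math

# Numerical check of Example 4.3: mu_n(f) is the left Riemann sum of f
# over [0,1] and converges to the Lebesgue integral as n grows.
def mu_n(f, n):
    return sum(f(k / n) for k in range(n)) / n

exact = 1 - math.cos(1)            # integral of sin over [0, 1]
for n in (10, 100, 1000):
    print(n, mu_n(math.sin, n), abs(mu_n(math.sin, n) - exact))
```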
Remark 4.4. Notice that if $A$ is a Borel set, then it is not always true that $\mu_n(A) \to \mu(A)$ as $n \to \infty$ when $\mu_n \Rightarrow \mu$. Indeed, let $x_n = 1/n$ and $\mu_n = \delta_{x_n}$. Then $\mu_n \Rightarrow \delta_0$, but $\mu_n(A) = 1$ for all $n$ when $A$ is the open set $(0,1)$, and $\delta_0(A) = 0$.
Theorem 4.5. Let $(\mu_n)_{n \ge 0}$ be a sequence of probability measures. The following are equivalent:
(a) $\mu_n \Rightarrow \mu$ as $n \to \infty$;
(b) $\liminf_n \mu_n(G) \ge \mu(G)$ for all open sets $G$;
(c) $\limsup_n \mu_n(A) \le \mu(A)$ for all closed sets $A$;
(d) $\lim_n \mu_n(A) = \mu(A)$ for all sets $A$ with $\mu(\partial A) = 0$.
Proof. (a) $\Rightarrow$ (b). Let $G$ be an open set with non-empty complement $G^c$. For every positive $M$ we now define
\[ f_M(x) = 1 \wedge (M\, d(x, G^c)). \]
Then $f_M$ is a continuous and bounded function and for all $M$ we have $f_M(x) \le \mathbb{1}(x \in G)$. Also $f_M \uparrow \mathbb{1}(G)$ as $M \to \infty$, since $G^c$ is a closed set. Since $f_M$ is continuous and bounded we have
\[ \mu_n(f_M) \to \mu(f_M) \quad \text{as } n \to \infty. \]
Hence
\[ \liminf_n \mu_n(G) \ge \liminf_n \mu_n(f_M) = \mu(f_M). \]
where $K$ is an upper bound for $f$. We will now show that for Lebesgue almost all $t$ we have
\[ \mu(\partial\{f \ge t\}) = 0. \tag{4.1} \]
(4.2)
Finally we deduce
\[ \liminf_n \mu_n(G) \ge \sum_k \liminf_n \mu_n((a_k, b_k)) \ge \sum_k \mu((a_k, b_k)) = \mu(G), \]
where the first inequality follows from Fatou's lemma and the second one from (4.2).
Definition 4.7. Let $(X_n)_n$ be a sequence of random variables taking values in a metric space $(M,d)$ but defined on possibly different probability spaces $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$. We say that $X_n$ converges in distribution to a random variable $X$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ if the law of $X_n$ converges weakly to the law of $X$ as $n \to \infty$; equivalently, if for all functions $f : M \to \mathbb{R}$ continuous and bounded
\[ \mathbb{E}_{\mathbb{P}_n}[f(X_n)] \to \mathbb{E}_{\mathbb{P}}[f(X)] \quad \text{as } n \to \infty. \]
Proposition 4.8. (a) Let $(X_n)_n$ be a sequence of random variables that converges to $X$ in probability as $n \to \infty$. Then $X_n$ converges to $X$ in distribution as $n \to \infty$.
(b) Let $(X_n)_n$ be a sequence of random variables that converges to a constant $c$ in distribution as $n \to \infty$. Then $X_n$ converges to $c$ in probability as $n \to \infty$.
Proof. See example sheet.
Example 4.9. [Central limit theorem] Let $(X_n)_n$ be a sequence of i.i.d. random variables in $L^2$ with $m = \mathbb{E}[X_1]$ and $\sigma^2 = \operatorname{var}(X_1)$. We set $S_n = X_1 + \ldots + X_n$. Then the central limit theorem states that the normalized sums $(S_n - nm)/(\sigma\sqrt{n})$ converge in distribution to a Gaussian $N(0,1)$ random variable as $n \to \infty$.
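A short Monte Carlo sketch of this statement (an illustrative aside, with the arbitrary choice $X_i \sim \mathrm{Uniform}[0,1]$, so $m = 1/2$ and $\sigma^2 = 1/12$):

```python
import random, statistics

rng = random.Random(42)

# Monte Carlo sketch of Example 4.9 with X_i ~ Uniform[0,1]:
# (S_n - n m) / (sigma sqrt(n)) should look approximately N(0, 1).
def normalized_sum(n):
    s = sum(rng.random() for _ in range(n))
    return (s - 0.5 * n) / ((1 / 12) ** 0.5 * n ** 0.5)

samples = [normalized_sum(100) for _ in range(5000)]
print(statistics.mean(samples), statistics.stdev(samples))  # near 0 and 1
```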
4.2 Tightness
Remark 4.11. Note that if a metric space M is compact, then every sequence of measures
is tight.
Theorem 4.12. [Prohorov's theorem] Let $(\mu_n)_n$ be a tight sequence of probability measures on a metric space $M$. Then there exist a subsequence $(n_k)$ and a probability measure $\mu$ on $M$ such that $\mu_{n_k} \Rightarrow \mu$.
Proof. We will prove the theorem in the case when $M = \mathbb{R}$. Let $F_n = F_{\mu_n}$ be the distribution function corresponding to the measure $\mu_n$. We will first show that there exist a subsequence $(n_k)$ and a non-decreasing function $F$ such that $F_{n_k}(x)$ converges to $F(x)$ for all $x \in \mathbb{Q}$. To prove that we will use a standard extraction argument.
Let $(x_1, x_2, \ldots)$ be an enumeration of $\mathbb{Q}$. Then $(F_n(x_1))_n$ is a sequence in $[0,1]$, and hence it has a converging subsequence. Let the converging subsequence be $F_{n_k^{(1)}}(x_1)$ and the limit $F(x_1)$. Then $(F_{n_k^{(1)}}(x_2))_k$ is a sequence in $[0,1]$ and thus also has a converging subsequence. If we continue in this way, we get for each $i \ge 1$ a sequence $n_k^{(i)}$ so that $F_{n_k^{(i)}}(x_j)$ converges to a limit $F(x_j)$ for all $j = 1, \ldots, i$. Then the diagonal sequence $m_k = n_k^{(k)}$ satisfies that $F_{m_k}(x)$ converges for all $x \in \mathbb{Q}$ to $F(x)$ as $k \to \infty$. Since the distribution functions $F_n(x)$ are non-decreasing in $x$, we get that $F(x)$ is also non-decreasing in $x$.
By the monotonicity of $F$ we can define for all $x \in \mathbb{R}$
\[ F(x) = \lim_{q \downarrow x,\, q \in \mathbb{Q}} F(q). \]
The definition of $F$ gives that it is right continuous and the monotonicity property gives that left limits exist, hence $F$ is cadlag.
We will next show that if $t$ is a point of continuity of $F$, i.e. $F(t-) = F(t)$, then
\[ \lim_{k \to \infty} F_{m_k}(t) = F(t). \]
Let $s_1 < t < s_2$ with $s_1, s_2 \in \mathbb{Q}$ and such that $|F(s_i) - F(t)| < \varepsilon/2$ for $i = 1, 2$. Note that such rational numbers $s_1$ and $s_2$ exist since $t$ is a continuity point of $F$. Then using the monotonicity of $F_{m_k}$ we get that for $k$ large enough
\[ F(t) - \varepsilon < F(s_1) - \tfrac{\varepsilon}{2} < F_{m_k}(s_1) \le F_{m_k}(t) \le F_{m_k}(s_2) < F(s_2) + \tfrac{\varepsilon}{2} < F(t) + \varepsilon. \]
Note that we can choose $N$ so that both $N$ and $-N$ are continuity points of $F$ ($F$ is monotone). Therefore it follows that
\[ F(-N) \le \varepsilon \quad \text{and} \quad 1 - F(N) \le \varepsilon. \]
Hence we see that
\[ \lim_{x \to -\infty} F(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F(x) = 1. \]
Finally we need to show that there exists a measure $\mu$ such that $F = F_\mu$. To this end, we define
\[ \mu((a,b]) = F(b) - F(a). \]
Then $\mu$ can be extended to a Borel probability measure by Caratheodory's extension theorem and $F_\mu = F$. Another way to construct the measure $\mu$ is given in [2, Section 3.12].
Proposition 4.6 now finishes the proof.
4.3 Characteristic functions
Definition 4.13. Let $X$ be a random variable taking values in $\mathbb{R}^d$ with law $\mu$. We define the characteristic function $\phi = \phi_X$ by
\[ \phi(u) = \mathbb{E}[e^{i\langle u, X\rangle}] = \int_{\mathbb{R}^d} e^{i\langle u, x\rangle}\, \mu(dx), \quad u \in \mathbb{R}^d. \]
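A minimal simulation sketch of this definition (an aside, not from the notes; it assumes $X \sim N(0,1)$, whose characteristic function $e^{-u^2/2}$ gives an exact comparison):

```python
import cmath, math, random

rng = random.Random(0)

# Monte Carlo estimate of phi(u) = E[exp(i u X)] for X ~ N(0,1),
# compared against the exact value exp(-u^2 / 2).
samples = [rng.gauss(0, 1) for _ in range(100000)]
for u in (0.5, 1.0, 2.0):
    est = sum(cmath.exp(1j * u * x) for x in samples) / len(samples)
    print(u, est.real, math.exp(-u * u / 2))  # estimate vs exact value
```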
Integrating over $u \in [-\lambda, \lambda]^d$ and using Fubini's theorem, we have
\[ \frac{1}{(2\lambda)^d} \int_{[-\lambda,\lambda]^d} \Re\,\phi_X(u)\, du = \Re \int_{\mathbb{R}^d} \prod_{j=1}^d \frac{e^{i\lambda x_j} - e^{-i\lambda x_j}}{2\lambda i x_j}\, d\mu(x) = \int_{\mathbb{R}^d} \prod_{j=1}^d \frac{\sin(\lambda x_j)}{\lambda x_j}\, d\mu(x). \]
Therefore we have
\[ \frac{1}{\lambda^d} \int_{[-\lambda,\lambda]^d} (1 - \Re\,\phi_X(u))\, du = 2^d \int_{\mathbb{R}^d} \Big( 1 - \prod_{j=1}^d \frac{\sin(\lambda x_j)}{\lambda x_j} \Big)\, d\mu(x). \tag{4.3} \]
Since $\prod_j \frac{\sin t_j}{t_j} \le 1$ always, and $\prod_j \frac{\sin t_j}{t_j} \le \sin 1 < 1$ whenever $\max_j |t_j| \ge 1$, there is a constant $C = C(d) < \infty$ such that
\[ \mathbb{P}(\|X\|_\infty \ge K) = \mathbb{P}\Big( \max_{j \le d} |K^{-1}X_j| \ge 1 \Big) \le C\, \mathbb{E}\Big[ 1 - \prod_{j=1}^d \frac{\sin(K^{-1}X_j)}{K^{-1}X_j} \Big] = C \int_{\mathbb{R}^d} \Big( 1 - \prod_{j=1}^d \frac{\sin(K^{-1}x_j)}{K^{-1}x_j} \Big)\, d\mu(x). \]
By the assumption and since $|1 - \Re\,\phi_{X_n}(u)| \le 2$ for all $n$, using the dominated convergence theorem we have
\[ \lim_n K^d \int_{[-K^{-1},K^{-1}]^d} (1 - \Re\,\phi_{X_n}(u))\, du = K^d \int_{[-K^{-1},K^{-1}]^d} (1 - \Re\,\phi(u))\, du. \]
Since $\phi$ is continuous at 0, if we take $K$ large enough we can make this limit $< \varepsilon/(2C)$, and so for all $n$ large enough
\[ \mathbb{P}(\|X_n\|_\infty > K) \le \varepsilon. \]
If we now take $K$ even larger, then the above inequality holds for all $n$, showing the tightness of the family $(\mathcal{L}(X_n))$.
By Prohorov's theorem there exists a subsequence $(X_{n_k})$ that converges in distribution to some random variable $X$. So $\phi_{X_{n_k}}$ converges pointwise to $\phi_X$, and hence $\phi_X = \phi$, which shows that $\phi$ is a characteristic function.
We will finally show that $X_n$ converges in distribution to $X$. If not, then there would exist a subsequence $(m_k)$ and a continuous and bounded function $f$ such that for some $\varepsilon > 0$ and all $k$
\[ |\mathbb{E}[f(X_{m_k})] - \mathbb{E}[f(X)]| > \varepsilon. \tag{4.4} \]
But since the laws of $(X_{m_k})$ are tight, we can extract a subsequence $(\ell_k)$ along which $(X_{\ell_k})$ converges in distribution to some $Y$, which would imply that $\phi = \phi_Y$ and thus $Y$ would have the same distribution as $X$, contradicting (4.4).
5 Large deviations
5.1 Introduction
Let $\{X_i\}$ be a sequence of i.i.d. random variables with $\mathbb{E}[X_1] = \bar{x}$ and we set $S_n = \sum_{i=1}^n X_i$. The central limit theorem tells us that
\[ \mathbb{P}(S_n \ge n\bar{x} + \sigma a\sqrt{n}) \to \mathbb{P}(Z \ge a) \quad \text{as } n \to \infty, \]
where $\sigma^2 = \operatorname{var}(X_1)$ and $Z \sim N(0,1)$.
Large deviations: what are the asymptotics of $\mathbb{P}(S_n \ge an)$ as $n \to \infty$, for $a > \bar{x}$?
Example 5.1. Let $X_i$ be i.i.d. distributed as $N(0,1)$. Then
\[ \mathbb{P}(S_n \ge an) = \mathbb{P}(X_1 \ge a\sqrt{n}) \sim \frac{1}{a\sqrt{2\pi n}} e^{-a^2 n/2}, \]
where we write $f(x) \sim g(x)$ if $f(x)/g(x) \to 1$ as $x \to \infty$. So
\[ \frac{1}{n} \log \mathbb{P}(S_n \ge an) \to -I(a) = -\frac{a^2}{2} \quad \text{as } n \to \infty. \]
In general we have
\[ \mathbb{P}(S_{n+m} \ge a(n+m)) \ge \mathbb{P}(S_n \ge an)\, \mathbb{P}(S_m \ge am), \]
so $b_n = -\log \mathbb{P}(S_n \ge an)$ satisfies
\[ b_{n+m} \le b_n + b_m, \]
and hence this implies the existence of the limit (exercise)
\[ \lim_n \frac{b_n}{n} = \lim_n \Big( -\frac{1}{n} \log \mathbb{P}(S_n \ge an) \Big) = I(a). \]
Note that if $\mathbb{P}(X_1 \le a_0) = 1$, then we will only consider $a \le a_0$, since clearly $\mathbb{P}(S_n \ge na) = 0$ for $a > a_0$.
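For the Gaussian case of Example 5.1 the probability can be computed exactly via the complementary error function, so the convergence of the rate can be observed directly (an illustrative aside, not part of the notes):

```python
import math

# For X_i ~ N(0,1), S_n ~ N(0, n), so P(S_n >= a n) = P(Z >= a sqrt(n))
# can be evaluated exactly with erfc; -(1/n) log P(S_n >= a n) should
# decrease towards the rate I(a) = a^2 / 2 from Example 5.1.
def gaussian_tail(x):
    """P(Z >= x) for Z ~ N(0,1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

a = 1.0
for n in (10, 100, 1000):
    print(n, -math.log(gaussian_tail(a * math.sqrt(n))) / n)
```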
5.2 Cramér's theorem
We will now obtain a bound for $\mathbb{P}(S_n \ge na)$ using the moment generating function of $X_1$. For $\lambda \ge 0$ we set
\[ M(\lambda) = \mathbb{E}[e^{\lambda X_1}], \]
which could also be infinite. We define
\[ \psi(\lambda) = \log M(\lambda) \quad \text{and} \quad \psi^*(a) = \sup_{\lambda \ge 0} (\lambda a - \psi(\lambda)). \]
For every $\lambda \ge 0$, Markov's inequality gives $\mathbb{P}(S_n \ge an) \le e^{-\lambda an} M(\lambda)^n$, and optimizing over $\lambda$,
\[ \mathbb{P}(S_n \ge an) \le e^{-n\psi^*(a)} \quad \text{for all } n, \]
whence
\[ \limsup_n \frac{1}{n} \log \mathbb{P}(S_n \ge an) \le -\psi^*(a). \tag{5.2} \]
Theorem 5.2. [Cramér's theorem] Let $(X_i)$ be i.i.d. random variables with $\mathbb{E}[X_1] = \bar{x}$ and $S_n = \sum_{i=1}^n X_i$. Then
\[ \lim_n \frac{1}{n} \log \mathbb{P}(S_n \ge na) = -\psi^*(a) \quad \text{for } a \ge \bar{x}. \]
\[ \psi'(\lambda) = \frac{M'(\lambda)}{M(\lambda)} \quad \text{for } \lambda \in D^\circ. \]
To justify differentiating under the integral, note that
\[ \Big| \frac{\partial}{\partial \lambda} e^{\lambda x} \Big| = |x| e^{\lambda x} \le \delta^{-1} (e^{\lambda_1 x} + e^{\lambda_2 x}), \]
where $\lambda + h$ is in $[\lambda_1, \lambda_2]$ if $|h| < \min_i |\lambda - \lambda_i| = \delta$.
Proof of Theorem 5.2. The direction
\[ \limsup_n \frac{1}{n} \log \mathbb{P}(S_n \ge na) \le -\psi^*(a) \]
is (5.2). For the lower bound, replacing $X_i$ by $X_i - a$ we may assume $a = 0$, so it suffices to prove
\[ \liminf_n \frac{1}{n} \log \mathbb{P}(S_n \ge 0) \ge \inf_{\lambda \ge 0} \psi(\lambda) \tag{5.3} \]
when $\bar{x} < 0$.
If $\mathbb{P}(X_1 \le 0) = 1$, then
\[ \inf_{\lambda \ge 0} \psi(\lambda) \le \lim_{\lambda \to \infty} \psi(\lambda) = \log \mu(\{0\}), \]
where $\mu = \mathcal{L}(X_1)$, so (5.3) holds in this case. Thus we may assume that $\mathbb{P}(X_1 > 0) > 0$.
Next consider the case $M(\lambda) < \infty$ for all $\lambda$. Define a new law $\mu_\lambda$ where
\[ \frac{d\mu_\lambda}{d\mu}(x) = \frac{e^{\lambda x}}{M(\lambda)}, \quad \text{so} \quad \mathbb{E}_\lambda[f(X_1)] = \int f(x) \frac{e^{\lambda x}}{M(\lambda)}\, d\mu(x). \]
More generally
\[ \mathbb{E}_\lambda[F(X_1, \ldots, X_n)] = \int F(x_1, \ldots, x_n) \frac{\prod_{i=1}^n e^{\lambda x_i}}{M(\lambda)^n}\, d\mu(x_1) \ldots d\mu(x_n). \]
The dominated convergence theorem gives that $g(\lambda) = \mathbb{E}_\lambda[X_1]$ is continuous and $g(0) = \bar{x} < 0$, while
\[ \lim_{\lambda \to \infty} g(\lambda) = \lim_{\lambda \to \infty} \frac{\int x e^{\lambda x}\, d\mu}{\int e^{\lambda x}\, d\mu} > 0, \]
since $\mu((0, \infty)) > 0$. Thus we can find $\theta > 0$ such that $\mathbb{E}_\theta[X_1] = 0$.
We now have, for any $\varepsilon > 0$,
\[ \mathbb{P}(S_n \ge 0) \ge \mathbb{P}(S_n \in [0, \varepsilon n]) \ge \mathbb{E}\big[ e^{\theta(S_n - \varepsilon n)} \mathbb{1}(S_n \in [0, \varepsilon n]) \big] = M(\theta)^n\, \mathbb{P}_\theta(S_n \in [0, \varepsilon n])\, e^{-\theta \varepsilon n}. \]
By the central limit theorem we have that $\mathbb{P}_\theta(S_n \in [0, \varepsilon n]) \to 1/2$ as $n \to \infty$, so
\[ \liminf_n \frac{1}{n} \log \mathbb{P}(S_n \ge 0) \ge \psi(\theta) - \theta \varepsilon. \]
Therefore
\[ \liminf_n \frac{1}{n} \log \mu^{*n}[0,\infty) \ge \log \mu[-K,K] + \liminf_n \frac{1}{n} \log \nu_K^{*n}[0,\infty) \ge \log \mu[-K,K] + \inf_{\lambda \ge 0} \psi_K(\lambda) = J_K, \]
where $\nu_K$ is the law of $X_1$ conditioned on $\{|X_1| \le K\}$ and $\psi_K$ the corresponding log-moment generating function. Letting $K \to \infty$,
\[ \liminf_n \frac{1}{n} \log \mu^{*n}[0,\infty) \ge J. \tag{5.4} \]
Therefore we obtain
\[ \psi(\lambda_0) = \lim_K \psi_K(\lambda_0) \le J, \]
and hence
\[ \liminf_n \frac{1}{n} \log \mu^{*n}[0,\infty) \ge J \ge \psi(\lambda_0) \ge \inf_{\lambda \ge 0} \psi(\lambda), \]
as claimed.
5.3 Examples
1. If $X_1 \sim N(0,1)$, then $\psi(\lambda) = \lambda^2/2$ and
\[ \psi^*(a) = \sup_{\lambda \ge 0} \Big( \lambda a - \frac{\lambda^2}{2} \Big) = \frac{a^2}{2}. \]
2. If $X_1$ is exponential with parameter 1, then for $\lambda < 1$
\[ M(\lambda) = \frac{1}{1 - \lambda}. \]
3. If $X_1$ is Poisson with parameter 1, then
\[ M(\lambda) = \sum_{k=0}^\infty e^{\lambda k} \frac{e^{-1}}{k!} = e^{e^\lambda - 1}. \]
6 Brownian motion
6.1 History and definition
Brownian motion is named after R. Brown, who observed in 1827 the erratic motion of small particles in water. A physical model was developed by Einstein in 1905, and the mathematical construction is due to N. Wiener in 1923, who used a random Fourier series to construct Brownian motion. Our treatment follows later ideas of Lévy and Kolmogorov.
Definition 6.1. Let $B = (B_t)_{t \ge 0}$ be a continuous process in $\mathbb{R}^d$. We say that $B$ is a Brownian motion in $\mathbb{R}^d$ started from $x \in \mathbb{R}^d$ if
(i) $B_0 = x$ a.s.,
(ii) $B_t - B_s \sim N(0, (t-s)I_d)$ for all $s < t$,
(iii) $B$ has independent increments, independent of $B_0$.
Remark 6.2. We say that $(B_t)_{t \ge 0}$ is a standard Brownian motion if $x = 0$.
Conditions (ii) and (iii) uniquely determine the law of a Brownian motion. In the next
section we will show that Brownian motion exists.
Example 6.3. Suppose that $(B_t, t \ge 0)$ is a standard Brownian motion and $U$ is an independent random variable uniformly distributed on $[0,1]$. Then the process $(\widetilde{B}_t, t \ge 0)$ defined by
\[ \widetilde{B}_t = \begin{cases} B_t, & \text{if } t \neq U; \\ 0, & \text{if } t = U \end{cases} \]
has the same finite-dimensional distributions as Brownian motion, but is discontinuous if $B_U \neq 0$, which happens with probability one, and hence it is not a Brownian motion.
6.2 Wiener's theorem
Theorem 6.4. [Wiener's theorem] There exists a Brownian motion on some probability space.
Proof. We will first prove the theorem in dimension $d = 1$: we will construct a process $(B_t, 0 \le t \le 1)$ and then extend it to the whole of $\mathbb{R}_+$ and to higher dimensions.
Let $D_0 = \{0,1\}$ and $D_n = \{k2^{-n}, 0 \le k \le 2^n\}$ for $n \ge 1$, and let $D = \bigcup_{n \ge 0} D_n$ be the set of dyadic rational numbers in $[0,1]$. Let $(Z_d, d \in D)$ be a sequence of independent random variables distributed according to $N(0,1)$ on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We will first construct $(B_d, d \in D)$ inductively.
First set $B_0 = 0$ and $B_1 = Z_1$. Inductively, given that we have constructed $(B_d, d \in D_{n-1})$ satisfying the conditions of the definition, we build $(B_d, d \in D_n)$ as follows:
Take $d \in D_n \setminus D_{n-1}$ and let $d^- = d - 2^{-n}$ and $d^+ = d + 2^{-n}$, so that $d^-, d^+$ are consecutive dyadic numbers in $D_{n-1}$. We set
\[ B_d = \frac{B_{d^-} + B_{d^+}}{2} + \frac{Z_d}{2^{(n+1)/2}}. \]
Then we have
\[ B_d - B_{d^-} = \frac{B_{d^+} - B_{d^-}}{2} + \frac{Z_d}{2^{(n+1)/2}} \quad \text{and} \quad B_{d^+} - B_d = \frac{B_{d^+} - B_{d^-}}{2} - \frac{Z_d}{2^{(n+1)/2}}. \tag{6.1} \]
Setting $N_d = \frac{B_{d^+} - B_{d^-}}{2}$ and $N'_d = \frac{Z_d}{2^{(n+1)/2}}$, we see by the induction hypothesis that $N_d$ and $N'_d$ are independent centred Gaussian random variables with variance $2^{-n-1}$. Therefore
where $(d_i)$ is a sequence in $D$ converging to $t$. It follows easily that $(B_t, t \in [0,1])$ is $\alpha$-Hölder continuous for all $\alpha < 1/2$ a.s.
Finally we will check that $(B_t, t \in [0,1])$ has the properties of Brownian motion. We will first prove the independence of the increments property. Let $0 = t_0 < t_1 < \ldots < t_k$ and let $0 = t_0^n \le t_1^n \le \ldots \le t_k^n$ be dyadic rational numbers such that $t_i^n \to t_i$ as $n \to \infty$ for each $i$. By continuity $(B_{t_1^n}, \ldots, B_{t_k^n})$ converges a.s. to $(B_{t_1}, \ldots, B_{t_k})$ as $n \to \infty$, while on the other hand the increments $(B_{t_j^n} - B_{t_{j-1}^n}, 1 \le j \le k)$ are independent Gaussian random variables with variances $(t_j^n - t_{j-1}^n, 1 \le j \le k)$. Then as $n \to \infty$ we have
\[ \mathbb{E}\Big[ \exp\Big( i \sum_{j=1}^k u_j (B_{t_j^n} - B_{t_{j-1}^n}) \Big) \Big] = \prod_{j=1}^k e^{-u_j^2 (t_j^n - t_{j-1}^n)/2} \to \prod_{j=1}^k e^{-u_j^2 (t_j - t_{j-1})/2}. \]
By Lévy's convergence theorem we now see that the increments converge in distribution to independent Gaussian random variables with respective variances $t_j - t_{j-1}$, which is thus the distribution of $(B_{t_j} - B_{t_{j-1}}, 1 \le j \le k)$ as desired.
To finish the proof we will construct Brownian motion indexed by $\mathbb{R}_+$. To this end, take a sequence $(B_t^i, t \in [0,1])$ for $i = 0, 1, \ldots$ of independent Brownian motions and glue them together, more precisely by
\[ B_t = B_{t - \lfloor t \rfloor}^{\lfloor t \rfloor} + \sum_{i=0}^{\lfloor t \rfloor - 1} B_1^i. \]
This defines a continuous random process $B : [0, \infty) \to \mathbb{R}$ and it is easy to see from what we have already shown that $B$ satisfies the properties of a Brownian motion.
Finally to construct Brownian motion in Rd we take d independent Brownian motions in 1
dimension, B 1 , . . . , B d , and set Bt = (Bt1 , . . . , Btd ). Then it is straightforward to check that
B has the required properties.
Remark 6.5. The proof above gives that the Brownian paths are a.s. $\alpha$-Hölder continuous for all $\alpha < 1/2$. However, a.s. there exists no interval $[a,b]$ with $a < b$ such that $B$ is Hölder continuous with exponent $1/2$ on $[a,b]$. See example sheet for the last fact.
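The inductive construction in the proof of Wiener's theorem translates almost line by line into code; the following sketch (an illustration, not part of the notes) builds $B$ on the dyadics $D_n$ of $[0,1]$ by midpoint refinement:

```python
import random

# Dyadic (Levy) construction: refine level by level via
# B_d = (B_{d-} + B_{d+}) / 2 + Z_d / 2^((n+1)/2).
def levy_construction(levels, rng=random.Random(0)):
    b = {0.0: 0.0, 1.0: rng.gauss(0, 1)}      # B_0 = 0, B_1 = Z_1
    for n in range(1, levels + 1):
        step = 2.0 ** (-n)
        for k in range(1, 2 ** n, 2):         # new points d in D_n \ D_{n-1}
            mid = (b[k * step - step] + b[k * step + step]) / 2
            b[k * step] = mid + rng.gauss(0, 1) / 2 ** ((n + 1) / 2)
    return b

path = levy_construction(10)                   # values of B on D_10
```

Dyadic rationals at these levels are exactly representable as binary floats, so the dictionary lookups of $B_{d^-}$ and $B_{d^+}$ are exact.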
6.3 Invariance properties
Theorem 6.7. [Time inversion] Suppose that $(B_t, t \ge 0)$ is a standard Brownian motion. Then the process $(X_t, t \ge 0)$ defined by
\[ X_t = \begin{cases} 0, & \text{if } t = 0; \\ tB_{1/t}, & \text{for } t > 0 \end{cases} \]
is also a standard Brownian motion.
Proof. The finite dimensional distributions $(B_{t_1}, \ldots, B_{t_n})$ of Brownian motion are Gaussian random vectors and are therefore characterized by their means $\mathbb{E}[B_{t_i}] = 0$ and covariances $\operatorname{Cov}(B_{t_i}, B_{t_j}) = t_i$ for $0 \le t_i \le t_j$.
So it suffices to show that the process $X$ is a continuous Gaussian process with the same means and covariances as Brownian motion. Clearly the vector $(X_{t_1}, \ldots, X_{t_n})$ is a centred Gaussian vector. The covariances for $s \le t$ are given by
\[ \operatorname{Cov}(X_s, X_t) = st \operatorname{Cov}(B_{1/s}, B_{1/t}) = st \cdot \frac{1}{t} = s. \]
Hence $X$ and $B$ have the same finite marginal distributions. The paths $t \mapsto X_t$ are clearly continuous for $t > 0$, so it remains to show that they are also continuous at $t = 0$. First notice that since $X$ and $B$ have the same finite marginal distributions, $(X_t, t \ge 0, t \in \mathbb{Q})$ has the same law as a Brownian motion restricted to the rationals, and hence
\[ \lim_{t \downarrow 0,\, t \in \mathbb{Q}} X_t = 0 \quad \text{a.s.} \]
Corollary 6.8. A.s. we have
\[ \lim_{t \to \infty} \frac{B_t}{t} = 0. \]
Remark 6.9. Of course one can show the above result directly using the strong law of large numbers, i.e. $\lim_n B_n/n = 0$. Then one needs to show that $B$ does not oscillate too much between $n$ and $n+1$. See example sheet.
Definition 6.10. We define $(\mathcal{F}_t^B, t \ge 0)$ to be the natural filtration of $(B_t, t \ge 0)$ and $\mathcal{F}_{s+}$ the slightly augmented $\sigma$-algebra defined by
\[ \mathcal{F}_{s+} = \bigcap_{t > s} \mathcal{F}_t^B. \]
Remark 6.11. By the simple Markov property of Brownian motion, $B_{t+s} - B_s$ is independent of $\mathcal{F}_s^B$. Clearly $\mathcal{F}_s^B \subseteq \mathcal{F}_{s+}$ for all $s$, since in $\mathcal{F}_{s+}$ we allow an additional infinitesimal glance into the future. But the next theorem shows that $B_{t+s} - B_s$ is still independent of $\mathcal{F}_{s+}$.
Theorem 6.12. For every $s \ge 0$ the process $(B_{t+s} - B_s, t \ge 0)$ is independent of $\mathcal{F}_{s+}$.
Proof. Let $(s_n)$ be a strictly decreasing sequence converging to $s$ as $n \to \infty$. By continuity
\[ B_{t+s} - B_s = \lim_n (B_{s_n+t} - B_{s_n}) \quad \text{a.s.} \]
Let $A \in \mathcal{F}_{s+}$ and $t_1, \ldots, t_m \ge 0$. For any $F$ continuous and bounded on $(\mathbb{R}^d)^m$ we have by the dominated convergence theorem
\[ \mathbb{E}[F(B_{t_1+s} - B_s, \ldots, B_{t_m+s} - B_s)\mathbb{1}(A)] = \lim_n \mathbb{E}[F(B_{t_1+s_n} - B_{s_n}, \ldots, B_{t_m+s_n} - B_{s_n})\mathbb{1}(A)]. \]
Since $A \in \mathcal{F}_{s+}$, we have that $A \in \mathcal{F}_{s_n}^B$ for all $n$, and hence by the simple Markov property we obtain that for all $n$
\[ \mathbb{E}[F(B_{t_1+s_n} - B_{s_n}, \ldots, B_{t_m+s_n} - B_{s_n})\mathbb{1}(A)] = \mathbb{P}(A)\, \mathbb{E}[F(B_{t_1+s_n} - B_{s_n}, \ldots, B_{t_m+s_n} - B_{s_n})]. \]
Therefore, taking the limit again we deduce that
\[ \mathbb{E}[F(B_{t_1+s} - B_s, \ldots, B_{t_m+s} - B_s)\mathbb{1}(A)] = \mathbb{E}[F(B_{t_1+s} - B_s, \ldots, B_{t_m+s} - B_s)]\, \mathbb{P}(A), \]
proving the claimed independence.
Theorem 6.13. [Blumenthal's 0-1 law] The $\sigma$-algebra $\mathcal{F}_{0+}$ is trivial, i.e. if $A \in \mathcal{F}_{0+}$, then $\mathbb{P}(A) \in \{0,1\}$.
Proof. Let $A \in \mathcal{F}_{0+}$. Then $A \in \sigma(B_t, t \ge 0)$, and hence by Theorem 6.12 we obtain that $A$ is independent of $\mathcal{F}_{0+}$, i.e. it is independent of itself:
\[ \mathbb{P}(A) = \mathbb{P}(A \cap A) = \mathbb{P}(A)^2, \]
which gives that $\mathbb{P}(A) \in \{0,1\}$.
Theorem 6.14. Suppose that $(B_t)_{t \ge 0}$ is a standard Brownian motion in 1 dimension. Define $\tau = \inf\{t > 0 : B_t > 0\}$ and $\sigma = \inf\{t > 0 : B_t = 0\}$. Then
\[ \mathbb{P}(\tau = 0) = \mathbb{P}(\sigma = 0) = 1. \]
Proof. For all $n$ we have
\[ \{\tau = 0\} = \bigcap_{k \ge n} \Big\{ \sup_{s \le 1/k} B_s > 0 \Big\}, \]
and thus $\{\tau = 0\} \in \mathcal{F}_{1/n}^B$ for all $n$, and hence
\[ \{\tau = 0\} \in \mathcal{F}_{0+}. \]
Therefore, $\mathbb{P}(\tau = 0) \in \{0,1\}$. It remains to show that it has positive probability. Clearly, for all $t > 0$ we have
\[ \mathbb{P}(\tau \le t) \ge \mathbb{P}(B_t > 0) = \frac{1}{2}. \]
Hence by letting $t \downarrow 0$ we get that $\mathbb{P}(\tau = 0) \ge 1/2$, and this finishes the proof. In exactly the same way we get that
\[ \inf\{t > 0 : B_t < 0\} = 0 \quad \text{a.s.} \]
Since $B$ is a continuous function, by the intermediate value theorem we deduce that $\mathbb{P}(\sigma = 0) = 1$.
Proposition 6.15. For $d = 1$ and $t \ge 0$ let $S_t = \sup_{0 \le s \le t} B_s$ and $I_t = \inf_{0 \le s \le t} B_s$.
1. Then for every $\varepsilon > 0$ we have
\[ S_\varepsilon > 0 \quad \text{and} \quad I_\varepsilon < 0 \quad \text{a.s.} \]
In particular, a.s. there exists a zero of $B$ in any interval of the form $(0, \varepsilon)$, for all $\varepsilon > 0$.
2. A.s. we have
\[ \sup_{t \ge 0} B_t = +\infty \quad \text{and} \quad \inf_{t \ge 0} B_t = -\infty. \]
By the scaling invariance property, for every $\sigma > 0$,
\[ \sigma S_\infty = \sup_{t \ge 0} \sigma B_{t/\sigma^2} \stackrel{(d)}{=} \sup_{t \ge 0} B_t = S_\infty. \]
Hence $S_\infty \stackrel{(d)}{=} \sigma S_\infty$ for all $\sigma > 0$. Thus for all $x > 0$ the probability $\mathbb{P}(S_\infty \le x)$ is a constant $c$, and hence
\[ \mathbb{P}(S_\infty \le 0) = c. \]
But we have already showed that $\mathbb{P}(S_\infty > 0) = 1$, so $c = 0$. Therefore, for all $x$ we have
\[ \mathbb{P}(S_\infty > x) = 1, \]
which gives that $\mathbb{P}(S_\infty = \infty) = 1$.
Proposition 6.16. Let $C$ be a cone in $\mathbb{R}^d$ with non-empty interior and origin at 0, i.e. a set of the form $\{tu : t > 0, u \in A\}$, where $A$ is a non-empty open subset of the unit sphere of $\mathbb{R}^d$. If
\[ H_C = \inf\{t > 0 : B_t \in C\} \]
is the first hitting time of $C$, then $H_C = 0$ a.s.
Proof. Since the cone $C$ is invariant under multiplication by a positive scalar, by the scaling invariance property of Brownian motion we get that for all $t$
\[ \mathbb{P}(B_t \in C) = \mathbb{P}(B_1 \in C). \]
Since $C$ has non-empty interior, it is straightforward to check that
\[ \mathbb{P}(B_1 \in C) > 0, \]
and then we can finish the proof using Blumenthal's 0-1 law as in the proposition above.
6.4
Let $(\mathcal{F}_t)_{t \ge 0}$ be a filtration. We say that a Brownian motion $B$ is an $(\mathcal{F}_t)$-Brownian motion if $B$ is adapted to $(\mathcal{F}_t)$ and $(B_{s+t} - B_s, t \ge 0)$ is independent of $\mathcal{F}_s$ for every $s \ge 0$.
In Proposition 3.3 we saw that the first hitting time of a closed set by a continuous process
is always a stopping time. This is not true in general though for an open set. However, if we
consider the right continuous filtration, i.e. (Ft+ ), then we showed in Proposition 3.5 that
the first hitting time of an open set by a continuous process is always an (Ft+ ) stopping time.
So, in what follows we will be considering the right continuous filtration. As this filtration
is larger, this choice produces more stopping times.
Theorem 6.17. [Strong Markov property] Let $T$ be an a.s. finite stopping time. Then the process
\[ (B_{T+t} - B_T, t \ge 0) \]
is a standard Brownian motion independent of $\mathcal{F}_T^+$.
Proof. We will first prove the theorem for the stopping times $T_n = 2^{-n} \lceil 2^n T \rceil$ that discretely approximate $T$ from above. We write $B_t^{(k)} = B_{t+k2^{-n}} - B_{k2^{-n}}$, which is a Brownian motion, and $B^*$ for the process defined by
\[ B^*(t) = B_{t+T_n} - B_{T_n}. \]
We will first show that $B^*$ is a Brownian motion independent of $\mathcal{F}_{T_n}^+$. Let $E \in \mathcal{F}_{T_n}^+$. For every event $\{B^* \in A\}$ we have
\[ \mathbb{P}(\{B^* \in A\} \cap E) = \sum_{k=0}^\infty \mathbb{P}\big( \{B^{(k)} \in A\} \cap E \cap \{T_n = k2^{-n}\} \big) = \sum_{k=0}^\infty \mathbb{P}(B^{(k)} \in A)\, \mathbb{P}\big( E \cap \{T_n = k2^{-n}\} \big), \]
since by the simple Markov property $\{B^{(k)} \in A\}$ is independent of $\mathcal{F}_{k2^{-n}}^+$ and $E \cap \{T_n = k2^{-n}\} \in \mathcal{F}_{k2^{-n}}^+$. Since $B^{(k)}$ is a Brownian motion, $\mathbb{P}(B^{(k)} \in A) = \mathbb{P}(B \in A)$ does not depend on $k$, and hence $B^*$ is a Brownian motion independent of $\mathcal{F}_{T_n}^+$.
The increments $B_{t+s+T_n} - B_{s+T_n}$ are normally distributed with mean 0 and variance equal to $t$. Thus for any $s \ge 0$ the increments $B_{t+s+T} - B_{s+T}$ are also normally distributed with mean 0 and variance $t$. As the process $(B_{t+T} - B_T, t \ge 0)$ is a.s. continuous, it is a Brownian motion. It only remains to show that it is independent of $\mathcal{F}_T^+$.
Let $A \in \mathcal{F}_T^+$ and $t_1, \ldots, t_k \ge 0$. We will show that for any function $F : (\mathbb{R}^d)^k \to \mathbb{R}$ continuous and bounded we have
\[ \mathbb{E}[\mathbb{1}(A) F(B_{t_1+T} - B_T, \ldots, B_{t_k+T} - B_T)] = \mathbb{P}(A)\, \mathbb{E}[F(B_{t_1+T} - B_T, \ldots, B_{t_k+T} - B_T)]. \]
Using the continuity again and the dominated convergence theorem, we get that
\[ \mathbb{E}[\mathbb{1}(A) F(B_{t_1+T} - B_T, \ldots, B_{t_k+T} - B_T)] = \lim_n \mathbb{E}[\mathbb{1}(A) F(B_{t_1+T_n} - B_{T_n}, \ldots, B_{t_k+T_n} - B_{T_n})]. \]
Since $T_n \ge T$, it follows that $A \in \mathcal{F}_{T_n}^+$. But we already showed that the process $(B_{t+T_n} - B_{T_n}, t \ge 0)$ is independent of $\mathcal{F}_{T_n}^+$, hence using the continuity and dominated convergence one more time gives the claimed independence.
Remark 6.18. Let $\tau = \inf\{t \ge 0 : B_t = \max_{0 \le s \le 1} B_s\}$. It is intuitively clear that $\tau$ is not a stopping time. To prove that, first show that $\tau < 1$ a.s. If $\tau$ were a stopping time, then $(B_{t+\tau} - B_\tau, t \ge 0)$ would be a standard Brownian motion; but by the definition of $\tau$ this increment is non-positive in a small neighbourhood of 0, which contradicts the strong Markov property.
6.5 Reflection principle
Theorem 6.19. [Reflection principle] Let $T$ be an a.s. finite stopping time and $(B_t, t \ge 0)$ a standard Brownian motion. Then the process $(\widetilde{B}_t, t \ge 0)$ defined by
\[ \widetilde{B}_t = B_t \mathbb{1}(t \le T) + (2B_T - B_t)\mathbb{1}(t > T) \]
is also a standard Brownian motion, and we call it Brownian motion reflected at $T$.
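One classical consequence of the reflection principle, applied at the hitting time of level $a > 0$, is $\mathbb{P}(\sup_{s \le t} B_s \ge a) = 2\, \mathbb{P}(B_t \ge a)$. A rough Monte Carlo sketch of this identity on a discretized path (an illustrative aside; the step count, sample count and level $a$ are arbitrary choices, and discretization slightly undershoots the supremum):

```python
import math, random

rng = random.Random(1)

# Compare P(sup_{s<=1} B_s >= a) with 2 P(B_1 >= a) on Gaussian-increment
# approximations of a Brownian path on [0, 1].
def sup_and_end(n_steps):
    """One discretized path; return (running sup, endpoint B_1)."""
    dt = 1.0 / n_steps
    b, s = 0.0, 0.0
    for _ in range(n_steps):
        b += rng.gauss(0, math.sqrt(dt))
        s = max(s, b)
    return s, b

a, trials, n_steps = 1.0, 10000, 200
hits = ends = 0
for _ in range(trials):
    s, b = sup_and_end(n_steps)
    hits += s >= a
    ends += b >= a

print(hits / trials, 2 * ends / trials)  # the two estimates should be close
```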
6.6
We saw above that if $f(x) = x^2$, then the right term to subtract from $f(B_t)$ in order to make it a martingale is $t$. More generally now, we are interested in finding what we need to subtract from $f$ in order to obtain a martingale. Before stating the theorem for Brownian motion, let's look at a discrete time analogue for a simple random walk on the integers. Let $(S_n)$ be the random walk. Then
\[ \mathbb{E}[f(S_{n+1})|S_1, \ldots, S_n] - f(S_n) = \frac{1}{2}\big( f(S_n+1) - 2f(S_n) + f(S_n-1) \big) = \frac{1}{2} \widetilde{\Delta} f(S_n), \]
where $\widetilde{\Delta} f(x) := f(x+1) - 2f(x) + f(x-1)$. Hence
\[ f(S_n) - \frac{1}{2} \sum_{k=0}^{n-1} \widetilde{\Delta} f(S_k) \]
defines a discrete time martingale. In the Brownian motion case we expect a similar result with $\widetilde{\Delta}$ replaced by its continuous analogue, the Laplacian
\[ \Delta f(x) = \sum_{i=1}^d \frac{\partial^2 f}{\partial x_i^2}. \]
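The discrete identity above can be checked exactly by enumerating all $2^n$ equally likely paths of the walk (an illustrative aside, not part of the notes; the test function $x^3$ is an arbitrary choice):

```python
from itertools import product

# Check that M_n = f(S_n) - (1/2) sum_{k<n} dd_f(S_k) has constant
# expectation, where dd_f(x) = f(x+1) - 2 f(x) + f(x-1).
def dd(f, x):
    return f(x + 1) - 2 * f(x) + f(x - 1)

def martingale_value(f, steps):
    """M_n along one +-1 path with the given increments."""
    s, comp = 0, 0.0
    for step in steps:
        comp += 0.5 * dd(f, s)
        s += step
    return f(s) - comp

f = lambda x: x ** 3            # any test function works
n = 8
# average over all 2^n equally likely paths: exact expectation of M_n
mean = sum(martingale_value(f, p) for p in product((-1, 1), repeat=n)) / 2 ** n
print(mean)  # equals M_0 = f(0) = 0
```

For $f(x) = x^2$ the compensator is $\frac{1}{2}\sum_{k<n} 2 = n$, recovering the familiar martingale $S_n^2 - n$.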
Theorem 6.26. Let $B$ be a Brownian motion in $\mathbb{R}^d$. Let $f(t,x) : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable in the variable $t$ and twice continuously differentiable in the variable $x$. Suppose in addition that $f$ and its derivatives up to second order are bounded. Then the following process
\[ M_t = f(t, B_t) - f(0, B_0) - \int_0^t \Big( \frac{\partial}{\partial s} + \frac{1}{2}\Delta \Big) f(s, B_s)\, ds, \quad t \ge 0, \]
is an $(\mathcal{F}_{t+})$-martingale.
Proof. Integrability follows trivially by the assumptions on the boundedness of $f$ and its derivatives.
We will now show the martingale property. Let $s, t \ge 0$. Then
\[ M_{t+s} - M_s = f(t+s, B_{t+s}) - f(s, B_s) - \int_s^{s+t} \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r, B_r)\, dr \]
\[ = f(t+s, B_{t+s}) - f(s, B_s) - \int_0^t \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, B_{r+s})\, dr. \]
Since $B_{t+s} - B_s$ is independent of $\mathcal{F}_{s+}$ by Theorem 6.12 and $B_s$ is $\mathcal{F}_{s+}$-measurable, writing $p_s(z,y) = (2\pi s)^{-d/2} e^{-|z-y|^2/(2s)}$ for the transition density in time $s$, we have (check!)
\[ \mathbb{E}[f(t+s, B_{t+s})|\mathcal{F}_{s+}] = \mathbb{E}[f(t+s, B_{t+s} - B_s + B_s)|\mathcal{F}_{s+}] = \int_{\mathbb{R}^d} f(t+s, B_s + x)\, p_t(0,x)\, dx. \]
Now notice that by the boundedness assumption on $f$ and all its derivatives
\[ \mathbb{E}\Big[ \int_0^t \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, B_{r+s})\, dr \Big| \mathcal{F}_{s+} \Big] = \int_0^t \mathbb{E}\Big[ \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, B_{r+s}) \Big| \mathcal{F}_{s+} \Big]\, dr. \]
(Check! using Fubini's theorem and the definition of conditional expectation.) Using again the fact that $B_{t+s} - B_s$ is independent of $\mathcal{F}_{s+}$, we get
\[ \mathbb{E}\Big[ \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, B_{r+s} - B_s + B_s) \Big| \mathcal{F}_{s+} \Big] = \int_{\mathbb{R}^d} \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, x+B_s)\, p_r(0,x)\, dx. \]
By the boundedness of $f$ and its derivatives, using the dominated convergence theorem we deduce
\[ \int_0^t \int_{\mathbb{R}^d} p_r(0,x) \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, x+B_s)\, dx\, dr = \lim_{\varepsilon \downarrow 0} \int_\varepsilon^t \int_{\mathbb{R}^d} p_r(0,x) \Big( \frac{\partial}{\partial r} + \frac{1}{2}\Delta \Big) f(r+s, x+B_s)\, dx\, dr. \]
The transition density $p_r(0,x)$ satisfies the heat equation, i.e. $(\partial_r - \frac{1}{2}\Delta)p = 0$, and hence, integrating by parts, this last expression is equal to
\[ \lim_{\varepsilon \downarrow 0} \int_{\mathbb{R}^d} \big( f(t+s, B_s+x)\, p_t(0,x) - f(\varepsilon+s, x+B_s)\, p_\varepsilon(0,x) \big)\, dx = \int_{\mathbb{R}^d} f(t+s, B_s+x)\, p_t(0,x)\, dx - f(s, B_s), \]
since the limit above is equal to $\lim_{\varepsilon \downarrow 0} \mathbb{E}[f(s+\varepsilon, B_{s+\varepsilon})|\mathcal{F}_{s+}]$, which by the continuity of the Brownian motion and of $f$ and by the conditional dominated convergence theorem is equal to $f(s, B_s)$.
Therefore we showed that
\[ \mathbb{E}[M_{t+s} - M_s | \mathcal{F}_{s+}] = 0 \quad \text{a.s.}, \]
and this finishes the proof.
6.7
We note that if a Brownian motion starts from $x \in \mathbb{R}^d$, i.e. $B_0 = x$, then $B$ can be written as
\[ B_t = x + \widetilde{B}_t, \]
where $\widetilde{B}$ is a standard Brownian motion.
We will write $\mathbb{P}_x$ to indicate that the Brownian motion starts from $x$, i.e. under $\mathbb{P}_x$ the process $(B_t - x, t \ge 0)$ is a standard Brownian motion.
\[ \phi(B_t) - \int_0^t \frac{1}{2} \Delta \phi(B_s)\, ds, \quad t \ge 0, \]
is a martingale.
We now set $S_\varepsilon = \inf\{t \ge 0 : |B_t| = \varepsilon\}$ and $T_R = \inf\{t \ge 0 : |B_t| = R\}$. Then $H = S_\varepsilon \wedge T_R$ is an a.s. finite stopping time and $(M_{t \wedge H})_{t \ge 0} = (\log|B_{t \wedge H}|, t \ge 0)$ is a bounded martingale. By the optional stopping theorem, since $H < \infty$ a.s., we thus obtain that
\[ \mathbb{E}_x[\log|B_H|] = \log|x|, \]
or equivalently,
\[ \log(\varepsilon)\, \mathbb{P}_x(S_\varepsilon < T_R) + \log(R)\, \mathbb{P}_x(T_R < S_\varepsilon) = \log|x|. \tag{6.2} \]
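Solving (6.2) together with $\mathbb{P}_x(S_\varepsilon < T_R) + \mathbb{P}_x(T_R < S_\varepsilon) = 1$ gives the explicit exit probability (a small completion of the computation, in the same notation):

```latex
\[
  \mathbb{P}_x(S_\varepsilon < T_R)
  = \frac{\log R - \log |x|}{\log R - \log \varepsilon},
  \qquad \varepsilon < |x| < R,
\]
```

which tends to $1$ as $R \to \infty$ for every fixed $\varepsilon > 0$.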
Letting $R \to \infty$ we have that $T_R \to \infty$ a.s., and hence $\mathbb{P}_x(S_\varepsilon < \infty) = 1$, which shows that
\[ \mathbb{P}_x(|B_t| \le \varepsilon \text{ for some } t > 0) = 1. \]
Applying the Markov property at time $n$ we get
\[ \mathbb{P}_x(|B_t| \le \varepsilon \text{ for some } t > n) = \mathbb{P}_x(|B_{t+n} - B_n + B_n| \le \varepsilon \text{ for some } t > 0) \]
\[ = \int_{\mathbb{R}^2} \mathbb{P}_0(|B_t + y| \le \varepsilon \text{ for some } t > 0)\, \mathbb{P}_x(B_n \in dy) = \int_{\mathbb{R}^2} \mathbb{P}_y(|B_t| \le \varepsilon \text{ for some } t > 0)\, \mathbb{P}_x(B_n \in dy). \]
($\mathbb{P}_x(B_n \in dy)$ is the law of $B_n$ under $\mathbb{P}_x$.) Since we showed above that for all $z$
\[ \mathbb{P}_z(|B_t| \le \varepsilon \text{ for some } t > 0) = 1, \]
we deduce that $\mathbb{P}_x(|B_t| \le \varepsilon \text{ for some } t > n) = 1$ for all $x$. Therefore the set $\{t \ge 0 : |B_t| \le \varepsilon\}$ is unbounded $\mathbb{P}_x$-a.s.
Letting $\varepsilon \to 0$ in (6.2) gives that the probability of hitting 0 before hitting the boundary of the ball around 0 of radius $R$ is 0. Therefore, letting $R \to \infty$ gives that the probability of ever hitting 0 is 0, i.e. for all $x \neq 0$
\[ \mathbb{P}_x(B_t = 0 \text{ for some } t > 0) = 0. \]
We only need to show now that
\[ \mathbb{P}_0(B_t = 0 \text{ for some } t > 0) = 0. \]
Applying again the Markov property at $a > 0$ we get
\[ \mathbb{P}_0(B_t = 0 \text{ for some } t \ge a) = \int_{\mathbb{R}^2} \mathbb{P}_0(B_{t+a} - B_a + y = 0 \text{ for some } t \ge 0)\, \mathbb{P}_0(B_a \in dy) \]
\[ = \int_{\mathbb{R}^2} \mathbb{P}_y(B_t = 0 \text{ for some } t \ge 0)\, \frac{1}{(2\pi a)^{d/2}} e^{-|y|^2/(2a)}\, dy = 0, \]
since for all $y \neq 0$ we have already proved that $\mathbb{P}_y(B_t = 0 \text{ for some } t > 0) = 0$.
Thus, since $\mathbb{P}_0(B_t = 0 \text{ for some } t \ge a) = 0$ for all $a > 0$, letting $a \downarrow 0$ we deduce that
\[ \mathbb{P}_0(B_t = 0 \text{ for some } t > 0) = 0. \]
(iii) Since the first three components of a Brownian motion in $\mathbb{R}^d$, $d \ge 3$, form a Brownian motion in $\mathbb{R}^3$, it suffices to treat the case $d = 3$. As we did above, let $f \in C_b^2(\mathbb{R}^3)$ be such that
\[ f(y) = \frac{1}{|y|} \quad \text{for } \varepsilon \le |y| \le R. \]
Note that $\Delta f(y) = 0$ for $\varepsilon \le |y| \le R$. Let $B_0 = x$ with $\varepsilon \le |x| \le R$. If we define again $S_\varepsilon$ and $T_R$ as above, the same argument shows that
\[ \mathbb{P}_x(S_\varepsilon < T_R) = \frac{|x|^{-1} - R^{-1}}{\varepsilon^{-1} - R^{-1}}. \]
As $R \to \infty$ this converges to $\varepsilon/|x|$, which is the probability of ever visiting the ball centred at 0 and of radius $\varepsilon$ when starting from $x$ with $|x| \ge \varepsilon$.
We will now show that
\[ \mathbb{P}_0\big(|B_t| \to \infty \text{ as } t \to \infty\big) = 1. \]
Let $T_r = \inf\{t > 0 : |B_t| = r\}$ for $r > 0$. We define the events
\[ A_n = \big\{|B_t| > n \text{ for all } t \ge T_{n^3}\big\}. \]
By the unboundedness of Brownian motion, it is clear that $\mathbb{P}_0(T_{n^3} < \infty) = 1$.
Applying the strong Markov property at the time $T_{n^3}$ we obtain
\[ \mathbb{P}_0(A_n^c) = \mathbb{P}_0\big(|B_{t+T_{n^3}} - B_{T_{n^3}} + B_{T_{n^3}}| \le n \text{ for some } t \ge 0\big) = \mathbb{E}_0\big[\mathbb{P}_{B_{T_{n^3}}}(T_n < \infty)\big] = \frac{n}{n^3} = \frac{1}{n^2}. \]
Since the right-hand side is summable, by the Borel–Cantelli lemma only finitely many of the events $A_n^c$ occur, which implies that $|B_t|$ diverges to $\infty$ as $t \to \infty$.
6.8 The Dirichlet problem
Theorem 6.29. [Dirichlet problem] Let $D$ be a bounded domain in $\mathbb{R}^d$ satisfying the Poincaré cone condition and let $\phi : \partial D \to \mathbb{R}$ be continuous. Then
\[ u(x) = \mathbb{E}_x\big[\phi(B_{T_{\partial D}})\big], \qquad x \in \overline{D}, \]
where $T_{\partial D}$ is the first hitting time of $\partial D$, is the unique function that is harmonic on $D$, continuous on $\overline{D}$ and equal to $\phi$ on $\partial D$.

Theorem 6.30. Let $u : \mathbb{R}^d \to \mathbb{R}$ be measurable and locally bounded and let $D$ be a domain. The following are equivalent:
(i) $u$ is harmonic on $D$;
(ii) for every ball $\overline{B(x,r)} \subseteq D$,
\[ u(x) = \frac{1}{L(B(x,r))} \int_{B(x,r)} u(y)\,dy; \]
(iii) for every ball $\overline{B(x,r)} \subseteq D$,
\[ u(x) = \int_{\partial B(x,r)} u(y)\, d\sigma_{x,r}(y), \]
where $\sigma_{x,r}$ denotes the uniform distribution on the sphere $\partial B(x,r)$.

Theorem 6.32. [Maximum principle] Let $u$ be harmonic on a domain $D \subseteq \mathbb{R}^d$.
(i) If $D$ is connected and $u$ attains its maximum at a point of $D$, then $u$ is constant on $D$.
(ii) If $D$ is bounded and $u$ is continuous on $\overline{D}$, then
\[ \max_{x \in \overline{D}} u(x) = \max_{x \in \partial D} u(x). \]
Proof. (i) Let $M$ be the maximum. Then the set $V = \{x \in D : u(x) = M\}$ is relatively closed in $D$ (if $x_n$ is a sequence of points in $V$ converging to $x \in D$, then $x \in V$), since $u$ is continuous. Since $D$ is open, for any $x \in V$ there exists $r > 0$ such that $B(x,r) \subseteq D$. From Theorem 6.30 we have
\[ M = u(x) = \frac{1}{L(B(x,r))} \int_{B(x,r)} u(y)\,dy \le M. \]
We thus deduce that $u(y) = M$ for almost all $y \in B(x,r)$. But since $u$ is continuous, this gives that $u(y) = M$ for all $y \in B(x,r)$. Therefore $B(x,r) \subseteq V$. Hence $V$ is also open and by assumption non-empty. But since $D$ is connected, we must have that $V = D$. Hence $u$ is constant on $D$.
(ii) Since $u$ is continuous and $\overline{D}$ is closed and bounded, $u$ attains a maximum on $\overline{D}$. By (i), the maximum has to be attained on $\partial D$.
Corollary 6.33. Suppose that $u_1, u_2 : \mathbb{R}^d \to \mathbb{R}$ are functions harmonic on a bounded domain $D$ and continuous on $\overline{D}$. If $u_1$ and $u_2$ agree on $\partial D$, then they are identical.
Proof. The function $u_1 - u_2$ is harmonic on $D$, continuous on $\overline{D}$ and vanishes on $\partial D$, so by the maximum principle
\[ \max_{x \in \overline{D}}\, (u_1 - u_2)(x) = \max_{x \in \partial D}\, (u_1 - u_2)(x) = 0, \]
and likewise for $u_2 - u_1$. Hence $u_1(x) = u_2(x)$ for all $x \in \overline{D}$, and in particular $u_1 = u_2$ on $D$.
Proof of Theorem 6.29. Since the domain $D$ is bounded, we get that $u$ is bounded. We will first show that $\Delta u = 0$ on $D$, by showing that $u$ satisfies condition (iii) of Theorem 6.30. Let $x \in D$. Then there exists $\delta > 0$ such that $\overline{B(x,\delta)} \subseteq D$. Let $\tau = \inf\{t > 0 : B_t \notin B(x,\delta)\}$. Then this is an a.s. finite stopping time, and hence applying the strong Markov property at $\tau$ we get
\[ u(x) = \mathbb{E}_x\big[\phi(B_{T_{\partial D}})\big] = \mathbb{E}_x\Big[\mathbb{E}_x\big[\phi(B_{T_{\partial D}}) \,\big|\, \mathcal{F}_\tau\big]\Big] = \mathbb{E}_x\Big[\mathbb{E}_{B_\tau}\big[\phi(B_{T_{\partial D}})\big]\Big] = \mathbb{E}_x[u(B_\tau)] = \int_{\partial B(x,\delta)} u(y)\, d\sigma_{x,\delta}(y), \]
since $B_\tau$ is uniformly distributed on the sphere $\partial B(x,\delta)$.
The uniqueness now follows from Corollary 6.33.
It remains to show that $u$ is continuous on $\overline{D}$. Clearly $u$ is continuous on $D$, so we only need to show that $u$ is continuous on $\partial D$. Let $z \in \partial D$. Since the domain $D$ satisfies the Poincaré cone condition, there exist $h > 0$ and a non-empty open cone $C_z$ with origin at $z$ such that $C_z \cap B(z,h) \subseteq D^c$.
Since $\phi$ is continuous on $\partial D$, for every $\varepsilon > 0$ there exists $0 < \delta \le h$ such that if $|y - z| \le \delta$ and $y \in \partial D$, then $|\phi(y) - \phi(z)| < \varepsilon$. Let $x$ be such that $|x - z| \le \delta\, 2^{-k}$ for some $k > 0$. Then we have
\begin{align*}
|u(x) - u(z)| &= \big|\mathbb{E}_x[\phi(B_{T_{\partial D}})] - \phi(z)\big| \le \mathbb{E}_x\big[|\phi(B_{T_{\partial D}}) - \phi(z)|\big] \\
&\le \varepsilon\, \mathbb{P}_x\big(T_{\partial D} < T_{\partial B(z,\delta)}\big) + 2\|\phi\|_\infty\, \mathbb{P}_x\big(T_{\partial B(z,\delta)} < T_{\partial D}\big) \\
&\le \varepsilon + 2\|\phi\|_\infty\, \mathbb{P}_x\big(T_{\partial B(z,\delta)} < T_{C_z}\big).
\end{align*}
Now we note that
\[ \mathbb{P}_x\big(T_{\partial B(z,\delta)} < T_{C_z}\big) \le a^k \]
for some $a < 1$, by applying the cone estimate successively at the scales $\delta 2^{-k}, \delta 2^{-k+1}, \dots, \delta$. Thus by choosing $k$ large enough we can make this last probability as small as we like, and this completes the proof of continuity.
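The representation $u(x) = \mathbb{E}_x[\phi(B_{T_{\partial D}})]$ lends itself to Monte Carlo: run Brownian motion from $x$ until it leaves $D$ and average $\phi$ at the exit point. The sketch below does this on the unit disc with boundary data $\phi(y) = y_1$, whose harmonic extension is $u(x) = x_1$; the domain, boundary function, step size and trial count are illustrative choices.

```python
import numpy as np

# Estimate u(x0) = E_{x0}[phi(B at exit of D)] on the unit disc with phi(y) = y_1;
# the harmonic extension is u(x) = x_1, so the answer should be near x0[0] = 0.3.
rng = np.random.default_rng(1)
x0 = np.array([0.3, 0.2])
dt, n_trials, max_steps = 1e-3, 4000, 50000

pos = np.tile(x0, (n_trials, 1))
active = np.ones(n_trials, dtype=bool)
for _ in range(max_steps):
    if not active.any():
        break
    pos[active] += rng.normal(scale=np.sqrt(dt), size=(active.sum(), 2))
    active &= np.linalg.norm(pos, axis=1) < 1.0  # freeze a path once it leaves the disc

exit_points = pos / np.linalg.norm(pos, axis=1, keepdims=True)  # project onto the circle
estimate = exit_points[:, 0].mean()              # average of phi(y) = y_1 at the exit
print(estimate)  # close to u(x0) = 0.3
```
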
We will now give an example where the domain does not satisfy the conditions of Theorem 6.29 and the function u as defined there fails to solve the Dirichlet problem.
Example 6.34. Let $v$ be a solution of the Dirichlet problem on $B(0,1)$ with boundary condition $\phi : \partial B(0,1) \to \mathbb{R}$. We now let $D = \{x \in \mathbb{R}^2 : 0 < |x| < 1\}$ be the punctured disc. We will show that the function $u(x) = \mathbb{E}_x[\phi(B_{T_{\partial D}})]$ given by Theorem 6.29 fails to solve the problem on $D$ with boundary condition $\phi : \partial B(0,1) \cup \{0\} \to \mathbb{R}$ if $\phi(0) \ne v(0)$. Indeed, since planar Brownian motion does not hit points, the first hitting time of $\partial D = \partial B(0,1) \cup \{0\}$ is a.s. equal to the first hitting time of $\partial B(0,1)$. Therefore,
\[ u(0) = \mathbb{E}_0\big[\phi(B_{T_{\partial D}})\big] = v(0) \ne \phi(0). \]
6.9 Donsker's invariance principle
In this section we will show that Brownian motion is the scaling limit of random walks with steps of mean 0 and finite variance. This can be seen as a generalization of the central limit theorem to processes.
For a function $f \in C([0,1], \mathbb{R})$ we define its uniform norm $\|f\| = \sup_t |f(t)|$. The uniform norm makes $C([0,1],\mathbb{R})$ into a metric space, so we can consider weak convergence of probability measures. The associated Borel $\sigma$-algebra coincides with the $\sigma$-algebra generated by the coordinate functions.
Theorem 6.35. [Donsker's invariance principle] Let $(X_n,\ n \ge 1)$ be a sequence of $\mathbb{R}$-valued integrable independent random variables with common law $\mu$ such that
\[ \int x\, d\mu(x) = 0 \quad \text{and} \quad \int x^2\, d\mu(x) = \sigma^2 \in (0,\infty). \]
Let $S_0 = 0$ and $S_n = X_1 + \dots + X_n$, and define a continuous process that interpolates linearly between values of $S$, namely
\[ S_t = (1 - \{t\})\, S_{[t]} + \{t\}\, S_{[t]+1}, \qquad t \ge 0, \]
where $[t]$ denotes the integer part of $t$ and $\{t\} = t - [t]$. Then $S^{[N]} := ((\sigma^2 N)^{-1/2} S_{Nt},\ 0 \le t \le 1)$ converges in distribution to a standard Brownian motion between times 0 and 1, i.e. for every bounded continuous function $F : C([0,1],\mathbb{R}) \to \mathbb{R}$,
\[ \mathbb{E}\big[F(S^{[N]})\big] \to \mathbb{E}[F(B)] \quad \text{as } N \to \infty. \]
Remark 6.36. Note that from Donsker's theorem we can infer that $(\sigma^2 N)^{-1/2} \sup_{0\le n\le N} S_n$ converges to $\sup_{0\le t\le 1} B_t$ in distribution as $N \to \infty$, since $f \mapsto \sup f$ is a continuous operation on $C([0,1],\mathbb{R})$.
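As a quick numerical illustration of the remark (with $\sigma = 1$), one can compare the scaled running maximum of a simple random walk with $\mathbb{E}[\sup_{0\le t\le1} B_t] = \sqrt{2/\pi}$; the walk length and sample size below are illustrative choices, and the comparison carries an $O(N^{-1/2})$ discretisation bias.

```python
import numpy as np

# Scaled running maximum of a +/-1 random walk vs E[sup of B on [0,1]] = sqrt(2/pi).
rng = np.random.default_rng(2)
N, n_walks = 400, 5000

steps = rng.choice([-1, 1], size=(n_walks, N))
S = np.cumsum(steps, axis=1)
running_max = np.maximum(S.max(axis=1), 0)   # include S_0 = 0 in the maximum
scaled = running_max / np.sqrt(N)

print(scaled.mean(), np.sqrt(2 / np.pi))     # close, up to O(N^{-1/2}) bias
```
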
The proof of Theorem 6.35 that we will give uses a coupling of the random walk with
the Brownian motion, called the Skorokhod embedding theorem. It is however specific to
dimension d = 1.
Theorem 6.37. [Skorokhod embedding for random walks] Let $\mu$ be a probability measure on $\mathbb{R}$ of mean 0 and variance $\sigma^2 < \infty$. Then there exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with filtration $(\mathcal{F}_t)_{t\ge0}$, on which are defined a Brownian motion $(B_t)_{t\ge0}$ and a sequence of stopping times
\[ 0 = T_0 \le T_1 \le T_2 \le \dots \]
such that, setting $S_n = B_{T_n}$,
(i) $(T_n)_{n\ge0}$ is a random walk with steps of mean $\sigma^2$,
(ii) $(S_n)_{n\ge0}$ is a random walk with step distribution $\mu$.
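For a two-point, mean-zero law the embedding is particularly transparent: the Brownian motion exits an interval $(-a, b)$, hitting $b$ with probability $a/(a+b)$ and taking mean time $ab$, which equals the variance of the law. The following sketch checks both facts by simulation; the discrete step size and trial count are illustrative choices.

```python
import numpy as np

# Embed mu(-1) = 2/3, mu(2) = 1/3 (mean zero, variance 2) by running an
# approximate Brownian motion until it exits (-a, b) with a = 1, b = 2.
rng = np.random.default_rng(3)
a, b = 1.0, 2.0
dt, n_trials, max_steps = 1e-3, 3000, 200000

pos = np.zeros(n_trials)
exit_time = np.zeros(n_trials)
active = np.ones(n_trials, dtype=bool)
for step in range(1, max_steps + 1):
    if not active.any():
        break
    pos[active] += rng.normal(scale=np.sqrt(dt), size=active.sum())
    just_exited = active & ((pos <= -a) | (pos >= b))
    exit_time[just_exited] = step * dt
    active &= ~just_exited

prob_top = (pos >= b).mean()
print(prob_top, a / (a + b))   # hitting probability of b; predicted 1/3
print(exit_time.mean())        # mean embedding time; predicted a*b = 2.0
```
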
Proof. Let $\mu^+$ be the restriction of $\mu$ to $(0,\infty)$ and $\mu^-$ the image of the restriction of $\mu$ to $(-\infty,0)$ under $x \mapsto -x$, and let $((X_n, Y_n))_{n\ge1}$ be i.i.d. pairs, independent of $B$, with law $\nu(dx,dy) = C(x+y)\,\mu^-(dx)\,\mu^+(dy)$ for a suitable normalizing constant $C$; the stopping times $T_n$ are obtained by successively running $B$ until it exits $(-X_n, Y_n)$. Write $T = T_1$, $X = X_1$ and $Y = Y_1$. By Proposition 6.24, conditional on $X = x$ and $Y = y$, we have $T < \infty$ a.s. and
\[ \mathbb{P}(B_T = Y \mid X, Y) = \frac{X}{X+Y} \quad \text{and} \quad \mathbb{E}[T \mid X, Y] = XY. \]
So, for $A \in \mathcal{B}([0,\infty))$,
\[ \mathbb{P}(B_T \in A) = \int\!\!\int \mathbf{1}(y \in A)\, \frac{x}{x+y}\, C(x+y)\, \mu^-(dx)\, \mu^+(dy), \]
so $\mathbb{P}(B_T \in A) = \mu(A)$. A similar argument shows this identity holds also for $A \in \mathcal{B}((-\infty, 0])$.
Next,
\[ \mathbb{E}[T] = \int_0^\infty\!\!\int_0^\infty xy\, C(x+y)\, \mu^-(dx)\, \mu^+(dy) = \int_{-\infty}^0 x^2\, \mu(dx) + \int_0^\infty y^2\, \mu(dy) = \sigma^2. \]
Now by the strong Markov property, for each $n \ge 0$ the process $(B_{T_n+t} - B_{T_n})_{t\ge0}$ is a Brownian motion, independent of $\mathcal{F}^B_{T_n}$. So by the above argument $B_{T_{n+1}} - B_{T_n}$ has law $\mu$, $T_{n+1} - T_n$ has mean $\sigma^2$, and both are independent of $\mathcal{F}^B_{T_n}$. The result follows.
Proof of Theorem 6.35. We assume for this proof that $\sigma = 1$; this is enough by scaling. Let $(B_t)_{t\ge0}$ be a Brownian motion and $(T_n)_{n\ge1}$ be the sequence of stopping times constructed in Theorem 6.37. Then $(B_{T_n})$ is a random walk with the same distribution as $(S_n)$. Let $(S_t)_{t\ge0}$ be the linear interpolation between the values of $(S_n)$.
For each $N \ge 1$ we set
\[ B^{(N)}_t = \sqrt{N}\, B_{N^{-1} t}, \]
which by the scaling invariance property of Brownian motion is again a Brownian motion.
We now perform the Skorokhod embedding construction with $(B_t)_{t\ge0}$ replaced by $(B^{(N)}_t)_{t\ge0}$, to obtain stopping times $T^{(N)}_n$. We then set $S^{(N)}_n = B^{(N)}_{T^{(N)}_n}$ and interpolate linearly to form $(S^{(N)}_t)_{t\ge0}$.
Next we set $\widetilde T^{(N)}_n = N^{-1} T^{(N)}_n$ and $\widetilde S^{(N)}_t = N^{-1/2} S^{(N)}_{Nt}$. Then $(\widetilde S^{(N)}_t,\ 0 \le t \le 1)$ has the same law as $S^{[N]}$, and $\widetilde S^{(N)}_{n/N} = B_{\widetilde T^{(N)}_n}$ for all $n$. We need to show that for all bounded continuous functions $F : C([0,1],\mathbb{R}) \to \mathbb{R}$, as $N \to \infty$,
\[ \mathbb{E}\big[F(\widetilde S^{(N)})\big] \to \mathbb{E}[F(B)]. \]
For this it suffices to show that $\widetilde S^{(N)} \to B$ uniformly on $[0,1]$ in probability: since $F$ is continuous, this implies that $F(\widetilde S^{(N)}) \to F(B)$ in probability, which by bounded convergence is enough.
Since $(T_n)$ is a random walk with increments of mean 1, by the strong law of large numbers we have that a.s.
\[ \frac{T_n}{n} \to 1 \quad \text{as } n \to \infty. \]
Hence a.s.
\[ N^{-1} \sup_{n \le N} |T_n - n| \to 0 \quad \text{as } N \to \infty. \]
Since $\widetilde S^{(N)}_{n/N} = B_{\widetilde T^{(N)}_n}$ for all $n$, for every $n/N \le t \le (n+1)/N$ there exists $u$ with $\widetilde T^{(N)}_n \le u \le \widetilde T^{(N)}_{n+1}$ such that $\widetilde S^{(N)}_t = B_u$. This follows by the intermediate value theorem and the fact that $(\widetilde S^{(N)}_t)$ is the linear interpolation between the values of $(S_n)$. Hence, for $\varepsilon, \delta > 0$, we have
\begin{align*}
\big\{|\widetilde S^{(N)}_t - B_t| > \varepsilon \text{ for some } t \in [0,1]\big\} \subseteq\ & \big\{|\widetilde T^{(N)}_n - n/N| > \delta \text{ for some } n \le N\big\} \\
&\cup \big\{|B_u - B_t| > \varepsilon \text{ for some } t \in [0,1] \text{ and } |u - t| \le \delta + 1/N\big\} \\
=\ & A_1 \cup A_2.
\end{align*}
The paths of $(B_t)_{t\ge0}$ are uniformly continuous on $[0,1]$. So for any $\varepsilon > 0$ we can find $\delta > 0$ so that $\mathbb{P}(A_2) \le \varepsilon/2$ whenever $N \ge 1/\delta$. Then by choosing $N$ even larger we can ensure that $\mathbb{P}(A_1) \le \varepsilon/2$ also. Hence $\widetilde S^{(N)} \to B$ uniformly on $[0,1]$ in probability, as required.
Remark 6.38. From the proof above we see that we can construct the Brownian motion and the random walk on the same probability space so that, for every $\varepsilon > 0$, as $N \to \infty$,
\[ \mathbb{P}\Big( \sup_{0\le t\le 1} \big|S^{[N]}_t - B_t\big| > \varepsilon \Big) \to 0. \]
6.10 Zeros of one-dimensional Brownian motion
Theorem 6.39. Let $(B_t)_{t\ge0}$ be a one-dimensional Brownian motion and let
\[ \mathrm{Zeros} = \{t \ge 0 : B_t = 0\} \]
be its zero set. Then, almost surely, $\mathrm{Zeros}$ is a closed set with no isolated points.
Proof. Since Brownian motion is continuous almost surely, the zero set is closed a.s. To prove that no point is isolated we do the following: for each rational $q \in [0,\infty)$ we consider the first zero after $q$, i.e.
\[ \tau_q = \inf\{t \ge q : B_t = 0\}. \]
Note that $\tau_q$ is an almost surely finite stopping time. Since $\mathrm{Zeros}$ is a closed set, this infimum is almost surely a minimum. By the strong Markov property, applied to $\tau_q$, we have that for each $q$, almost surely $\tau_q$ is not an isolated zero from the right. But since the rationals form a countable set, we get that almost surely, for all rational $q$, the zero $\tau_q$ is not isolated from the right.
The next thing to prove is that the remaining points of $\mathrm{Zeros}$ are not isolated from the left. We claim that any $t > 0$ in the zero set which is different from $\tau_q$ for all rational $q$ is not an isolated point from the left. Take a sequence $q_n \uparrow t$ with $q_n \in \mathbb{Q}$ and define $t_n = \tau_{q_n}$. Clearly $q_n \le t_n < t$, and so $t_n \to t$; since each $t_n$ is a zero, $t$ is not isolated from the left.
Theorem 6.40. Fix $t \ge 0$. Then, almost surely, Brownian motion in one dimension is not differentiable at $t$.
Proof. Exercise.
In fact, a much stronger statement is true:
Theorem 6.41. [Paley, Wiener and Zygmund 1933] Almost surely, Brownian motion
in one dimension is nowhere differentiable.
7 Poisson random measures
7.1 Construction
For $\lambda \in (0,\infty)$ we say that a random variable $X$ in $\mathbb{Z}_+$ is Poisson of parameter $\lambda$, and write $X \sim P(\lambda)$, if
\[ \mathbb{P}(X = n) = e^{-\lambda}\, \frac{\lambda^n}{n!}, \qquad n \ge 0. \]
We also write $X \sim P(0)$ to mean $X \equiv 0$ and write $X \sim P(\infty)$ to mean $X \equiv \infty$.
Proposition 7.1. [Addition property] Let $N_k$, $k \in \mathbb{N}$, be independent random variables, with $N_k \sim P(\lambda_k)$ for all $k$. Then
\[ \sum_k N_k \sim P\Big( \sum_k \lambda_k \Big). \]
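The addition property can be verified exactly, for two summands, by convolving the probability mass functions; the parameter values below are illustrative choices.

```python
import math

# Convolution of the P(2) and P(3) pmfs vs the P(5) pmf, on n = 0, ..., 19.
def poisson_pmf(lam, n):
    return math.exp(-lam) * lam**n / math.factorial(n)

lam1, lam2 = 2.0, 3.0
for n in range(20):
    conv = sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, n - k) for k in range(n + 1))
    direct = poisson_pmf(lam1 + lam2, n)
    assert abs(conv - direct) < 1e-12
print("P(2) * P(3) = P(5) verified on n = 0, ..., 19")
```
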
Proposition 7.2. [Splitting property] Let $N \sim P(\lambda)$ and let $(Y_n)_{n\ge1}$ be i.i.d. random variables in $\{1, \dots, k\}$ with $\mathbb{P}(Y_n = j) = p_j$, independent of $N$. Set
\[ N_j = \sum_{n=1}^{N} \mathbf{1}(Y_n = j), \qquad 1 \le j \le k. \]
Then $N_1, \dots, N_k$ are independent with $N_j \sim P(\lambda p_j)$ for all $j$.

Theorem 7.3. Let $(E, \mathcal{E}, \mu)$ be a $\sigma$-finite measure space. Then there exists a Poisson random measure $M$ on $E$ with intensity $\mu$, and its law $\mu^*$ on the space $E^*$ of integer-valued measures is uniquely determined.

Proof. (Uniqueness.) For disjoint sets $A_1, \dots, A_k \in \mathcal{E}$ of finite measure and integers $n_1, \dots, n_k \ge 0$, consider $A = \{m \in E^* : m(A_j) = n_j,\ 1 \le j \le k\}$; the law of a Poisson random measure must satisfy
\[ \mu^*(A) = \prod_{j=1}^{k} e^{-\mu(A_j)}\, \frac{\mu(A_j)^{n_j}}{n_j!}. \]
Since the set of such sets $A$ is a $\pi$-system generating $\mathcal{E}^*$, this implies that $\mu^*$ is uniquely determined on $\mathcal{E}^*$.
(Existence.) Consider first the case where $\lambda = \mu(E) < \infty$. There exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined independent random variables $N$ and $Y_n$, $n \in \mathbb{N}$, with $N \sim P(\lambda)$ and $Y_n \sim \mu/\lambda$ for all $n$. Set
\[ M(A) = \sum_{n=1}^{N} \mathbf{1}(Y_n \in A), \qquad A \in \mathcal{E}. \tag{7.1} \]
It is easy to check, by the Poisson splitting property, that $M$ is a Poisson random measure with intensity $\mu$. Indeed, for disjoint $A_1, \dots, A_k$ in $\mathcal{E}$ with finite measures, we let $X_n = j$ whenever $Y_n \in A_j$, so that $M(A_j)$, $1 \le j \le k$, are independent $P(\mu(A_j))$ random variables.
More generally, if $(E, \mathcal{E}, \mu)$ is $\sigma$-finite, then there exist disjoint sets $E_k \in \mathcal{E}$, $k \in \mathbb{N}$, such that $\bigcup_k E_k = E$ and $\mu(E_k) < \infty$ for all $k$. We can construct, on some probability space, independent Poisson random measures $M_k$, $k \in \mathbb{N}$, with $M_k$ having intensity $\mu|_{E_k}$. Set
\[ M(A) = \sum_{k\in\mathbb{N}} M_k(A \cap E_k), \qquad A \in \mathcal{E}. \]
It is easy to check, by the Poisson addition property, that $M$ is a Poisson random measure with intensity $\mu$. The law of $M$ on $E^*$ is then a measure with the required properties.
The above construction gives the following important property of Poisson random measures.
Proposition 7.4. Let $M$ be a Poisson random measure on $E$ with intensity $\mu$, and let $A \in \mathcal{E}$ be such that $\mu(A) < \infty$. Then $M(A)$ has law $P(\mu(A))$, and given $M(A) = k$, the restriction $M|_A$ has the same law as $\sum_{i=1}^{k} \delta_{X_i}$, where $X_1, \dots, X_k$ are independent with law $\mu(\cdot \cap A)/\mu(A)$. Moreover, if $A, B \in \mathcal{E}$ are disjoint, then the restrictions $M|_A$, $M|_B$ are independent.
Exercise 7.5. Let $E = \mathbb{R}_+$ and $\mu = \lambda\,\mathbf{1}(t \ge 0)\,dt$. Let $M$ be a Poisson random measure on $\mathbb{R}_+$ with intensity measure $\mu$, and let $(T_n)_{n\ge1}$, with $T_0 = 0$, be a sequence of random variables such that $(T_n - T_{n-1},\ n \ge 1)$ are independent exponential random variables with parameter $\lambda > 0$. Then
\[ \Big( N_t = \sum_{n\ge1} \mathbf{1}(T_n \le t),\ t \ge 0 \Big) \quad \text{and} \quad \big( N'_t = M([0,t]),\ t \ge 0 \big) \]
have the same law.
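A simulation sketch of the exercise: building the counting process from i.i.d. exponential gaps should produce counts whose mean and variance both equal $\lambda t$, as for $M([0,t]) \sim P(\lambda t)$. The parameter values below are illustrative choices.

```python
import numpy as np

# N_t built from i.i.d. Exp(lam) gaps; for a Poisson process both the mean and
# the variance of N_t equal lam * t = 6 here.
rng = np.random.default_rng(4)
lam, t, n_samples = 2.0, 3.0, 20000

gaps = rng.exponential(scale=1.0 / lam, size=(n_samples, 40))  # 40 >> lam * t arrivals
arrival_times = np.cumsum(gaps, axis=1)     # T_1 < T_2 < ... in each sample
counts = (arrival_times <= t).sum(axis=1)   # N_t

print(counts.mean(), counts.var())  # both close to 6
```
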
7.2 Integrals with respect to a Poisson random measure
Theorem 7.6. Let $M$ be a Poisson random measure on $E$ with intensity $\mu$. Then for $f \in L^1(\mu)$ the integral
\[ M(f) = \int_E f(y)\, M(dy) \]
is well defined and satisfies
\[ \mathbb{E}[M(f)] = \int_E f(y)\,\mu(dy), \qquad \operatorname{var}\big(M(f)\big) = \int_E f(y)^2\,\mu(dy), \]
and, for all $u \in \mathbb{R}$,
\[ \mathbb{E}\big[e^{iuM(f)}\big] = \exp\bigg( \int_E \big(e^{iuf(y)} - 1\big)\,\mu(dy) \bigg). \]
Proof. First assume that $f = \mathbf{1}_A$ for $A \in \mathcal{E}$. Then $M(f) = M(A)$ is a random variable by definition of $M$, and this extends to any finite linear combination of indicators. Since any measurable non-negative function is the increasing limit of finite linear combinations of such indicator functions, we obtain by monotone convergence that $M(f)$ is a random variable as a limit of random variables.

Let $E_n$, $n \ge 0$, be a measurable partition of $E$ into sets of finite $\mu$-measure. A similar approximation argument shows that $M(f\mathbf{1}_{E_n})$, $n \ge 0$, are independent random variables.

Let $f \in L^1(\mu)$. We will first show the formulas for the expectation and the variance. If $f = \mathbf{1}_A$, then these are clear. They extend to finite linear combinations and to any non-negative measurable function by approximation. For a general $f$, we follow the standard procedure, separating $f = f^+ - f^-$ and using the fact that $M(f^+)$ and $M(f^-)$ are independent.
Since by Proposition 7.4, given $M(E_n) = k$, the restriction $M|_{E_n}$ has the same law as $\sum_{i=1}^{k} \delta_{X_i}$, where $X_1, \dots, X_k$ are independent with law $\mu(\cdot \cap E_n)/\mu(E_n)$, we get
\begin{align*}
\mathbb{E}\big[\exp\big(iu\, M(f\mathbf{1}_{E_n})\big)\big] &= \sum_{k=0}^{\infty} e^{-\mu(E_n)}\, \frac{\mu(E_n)^k}{k!} \left( \int_{E_n} e^{iuf(x)}\, \frac{\mu(dx)}{\mu(E_n)} \right)^{\!k} \\
&= \exp\left( - \int_{E_n} \mu(dx)\, \big(1 - e^{iuf(x)}\big) \right).
\end{align*}
Since the random variables $M(f\mathbf{1}_{E_n})$ are independent over $n \ge 0$, we can take products over $n \ge 0$ and by monotone convergence we obtain the wanted formula.
To establish the formula in the case where $f \in L^1(\mu)$, we argue by the same kind of approximation. We first establish the formula for $f\mathbf{1}(A_n)$ in place of $f$, where $A_n = E_0 \cup \dots \cup E_n$. Then to obtain the result, we must show that
\[ \int_{A_n} \mu(dx)\,\big(e^{iuf(x)} - 1\big) \to \int_E \mu(dx)\,\big(e^{iuf(x)} - 1\big) \quad \text{as } n \to \infty, \]
which follows by dominated convergence, since $|e^{ix} - 1| \le |x| \wedge 2$ for all $x \in \mathbb{R}$ and $|uf| \in L^1(\mu)$.
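The moment formulas of Theorem 7.6 can also be checked by Monte Carlo using the finite construction (7.1); the choice of $\mu$, $f$ and sample size below is illustrative.

```python
import numpy as np

# mu = 2 * Lebesgue on [0,1], f(x) = x: construction (7.1) gives M(f) = sum f(Y_n)
# with N ~ P(2) and Y_n uniform on [0,1]; then E[M(f)] = 1 and var(M(f)) = 2/3.
rng = np.random.default_rng(5)
total_mass, n_samples = 2.0, 30000

N = rng.poisson(total_mass, size=n_samples)
Mf = np.array([rng.uniform(0.0, 1.0, size=n).sum() for n in N])

print(Mf.mean(), Mf.var())  # close to 1 and 2/3
```
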
7.3 Poisson point processes
In this section we are going to consider Poisson random measures in $\mathbb{R}^d$ for $d \ge 1$ with intensity measure given by $\mu = \lambda\,dx$, i.e. multiples of the Lebesgue measure in $d$ dimensions. Note that such a measure cannot be obtained from the finite construction (7.1) as $\sum_{i=1}^{N} \delta_{X_i}$, where $N$ is a Poisson random variable and the $X_i$ are i.i.d. random variables, since the Lebesgue measure of the whole space is infinite; one uses instead the $\sigma$-finite construction, and we identify the random measure with its set of atoms $\Pi = \{X_i\}$. We will sometimes say Poisson point process to mean a Poisson random measure in $\mathbb{R}^d$.
Proposition 7.7. [Thinning property] Let $\Pi = \{X_i\}$ be a Poisson point process in $\mathbb{R}^d$ of intensity $\lambda$. For each point $X_i$ we perform an independent experiment: we keep it with probability $p(X_i)$ and remove it with the complementary probability, where $p : \mathbb{R}^d \to [0,1]$ is a measurable function. Thus we define a new process $\bar\Pi$ that contains the points $X_i$ that we kept. The process $\bar\Pi$ is a Poisson random measure in $\mathbb{R}^d$ with intensity $\bar\lambda(A) = \lambda \int_A p(x)\,dx$.
Proof. The independence property follows easily from the independence properties of $\Pi$. We will now show that for any set $A$ with finite volume we have $\bar\Pi(A) \sim P(\bar\lambda(A))$, where $\bar\lambda$ is the intensity measure given in the statement. By Proposition 7.4 we have
\begin{align*}
\mathbb{P}\big(\bar\Pi(A) = k\big) &= \sum_{n\ge k} \mathbb{P}\big(\Pi(A) = n,\ \bar\Pi(A) = k\big) \\
&= \sum_{n\ge k} e^{-\lambda\,\mathrm{vol}(A)}\, \frac{(\lambda\,\mathrm{vol}(A))^n}{n!} \binom{n}{k} \bigg( \int_A \frac{p(x)\,dx}{\mathrm{vol}(A)} \bigg)^{\!k} \bigg( \int_A \frac{(1-p(x))\,dx}{\mathrm{vol}(A)} \bigg)^{\!n-k} \\
&= e^{-\lambda\,\mathrm{vol}(A)}\, \frac{\lambda^k}{k!} \bigg( \int_A p(x)\,dx \bigg)^{\!k} \sum_{n\ge k} \frac{\lambda^{n-k}}{(n-k)!} \bigg( \int_A (1-p(x))\,dx \bigg)^{\!n-k} \\
&= e^{-\lambda\,\mathrm{vol}(A)}\, \frac{\lambda^k}{k!} \bigg( \int_A p(x)\,dx \bigg)^{\!k} \exp\bigg( \lambda \int_A (1-p(x))\,dx \bigg) \\
&= \exp\bigg( -\lambda \int_A p(x)\,dx \bigg)\, \frac{\big( \lambda \int_A p(x)\,dx \big)^k}{k!}.
\end{align*}
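A quick simulation sketch of the thinning property with an inhomogeneous retention probability; the intensity and sample count are illustrative choices.

```python
import numpy as np

# Thin a rate-lam Poisson process on [0,1]^2 with retention p(x) = x_1;
# the kept count should be Poisson with mean lam * int p = lam / 2 = 5.
rng = np.random.default_rng(6)
lam, n_samples = 10.0, 20000

kept_counts = np.empty(n_samples)
for i in range(n_samples):
    n = rng.poisson(lam)                       # number of points in the unit square
    points = rng.uniform(size=(n, 2))          # their i.i.d. uniform locations
    keep = rng.uniform(size=n) < points[:, 0]  # independent retention with prob p(x)
    kept_counts[i] = keep.sum()

print(kept_counts.mean(), kept_counts.var())   # both close to lam / 2 = 5
```
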
Proposition 7.8. [Displacement] Let $\Pi = \{X_i\}$ be a Poisson point process in $\mathbb{R}^d$ of intensity $\lambda$ and let $(Y_i)_{i\ge1}$ be i.i.d. random variables in $\mathbb{R}^d$ with law $\nu$, independent of $\Pi$. Then the displaced process $\widetilde\Pi = \{X_i + Y_i\}$ is again a Poisson point process of intensity $\lambda$.

Proof. For a non-negative measurable function $f$ we can write
\[ \mathbb{E}\big[e^{u\widetilde\Pi(f)}\big] = \mathbb{E}\Big[e^{u\sum_i f(X_i + Y_i)}\Big] \]
and conditioning on $\Pi = \{X_i\}$ and using the independence of the $(Y_i)$'s we obtain
\begin{align*}
\mathbb{E}\big[e^{u\widetilde\Pi(f)}\big] &= \mathbb{E}\Big[\mathbb{E}\Big[e^{u\sum_i f(X_i+Y_i)} \,\Big|\, \Pi\Big]\Big] = \mathbb{E}\bigg[ \prod_i \int_{\mathbb{R}^d} e^{uf(X_i+y)}\,\nu(dy) \bigg] \\
&= \mathbb{E}\bigg[ \exp\bigg( \log \prod_i \int_{\mathbb{R}^d} e^{uf(X_i+y)}\,\nu(dy) \bigg) \bigg] = \mathbb{E}\bigg[ \exp\bigg( \sum_i \log \int_{\mathbb{R}^d} e^{uf(X_i+y)}\,\nu(dy) \bigg) \bigg] \\
&= \mathbb{E}\big[\exp\big(\Pi(g)\big)\big],
\end{align*}
where $g(x) = \log \int_{\mathbb{R}^d} e^{uf(x+y)}\,\nu(dy)$. By Theorem 7.6 we have
\begin{align*}
\mathbb{E}\big[\exp\big(\Pi(g)\big)\big] &= \exp\bigg( \lambda \int_{\mathbb{R}^d} \Big( \exp\Big( \log \int_{\mathbb{R}^d} e^{uf(x+y)}\,\nu(dy) \Big) - 1 \Big)\,dx \bigg) \\
&= \exp\bigg( \lambda \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \big( e^{uf(x+y)} - 1 \big)\,\nu(dy)\,dx \bigg) \\
&= \exp\bigg( \lambda \int_{\mathbb{R}^d} \big( e^{uf(x)} - 1 \big)\,dx \bigg),
\end{align*}
where in the last step we used Fubini's theorem and the fact that $\nu$ is a probability measure on $\mathbb{R}^d$. Since this is the Laplace functional of a Poisson point process of intensity $\lambda$, the claim follows.
For the rest of this section we are going to consider the following model: let $\Pi(0)$ be a Poisson point process in $\mathbb{R}^d$ of intensity $\lambda$, say $\Pi(0) = \{X_i\}$. We now let each point of the Poisson process move independently according to a standard Brownian motion in $d$ dimensions; namely, the point $X_i$ moves according to the Brownian motion $(\xi_i(t))_{t\ge0}$. This way at every time $t$ we obtain a new process $\Pi(t) = \{X_i + \xi_i(t)\}$, which by Proposition 7.8 is again a Poisson point process of intensity $\lambda$.
We can think of the points of the Poisson process $\Pi(0)$ as the users of a wireless network that can communicate with each other when they are at distance at most $r$ from each other. So it is natural to introduce mobility to the model, and this is why we let the points evolve in space.
We now fix a target particle which is at the origin of $\mathbb{R}^d$, and we are interested in the first time that one of the points of the Poisson process is within distance $r$ from it, i.e. we define
\[ T_{\mathrm{det}} = \inf\Big\{ t \ge 0 : 0 \in \bigcup_i B\big(X_i + \xi_i(t), r\big) \Big\}, \]
where $B(x,r)$ stands for the ball centred at $x$ of radius $r$.
Theorem 7.9. [Stochastic geometry formula] Let $\xi$ be a standard Brownian motion in $d$ dimensions and let $W(t) = \bigcup_{s\le t} B(\xi(s), r)$ be the so-called Wiener sausage up to time $t$. Then, for any dimension $d \ge 1$, the detection probability satisfies
\[ \mathbb{P}(T_{\mathrm{det}} > t) = \exp\big( -\lambda\, \mathbb{E}[\mathrm{vol}(W(t))] \big). \]
Proof. Let $\Psi$ be the set of points of $\Pi(0)$ that have detected 0 by time $t$, that is,
\[ \Psi = \big\{ X_i \in \Pi(0) : \exists\, s \le t \text{ s.t. } 0 \in B(X_i + \xi_i(s), r) \big\}. \]
Since the $\xi_i$'s are independent, we have by Proposition 7.7 that $\Psi$ is a thinned Poisson point process with intensity $\lambda p(x)\,dx$, where $p$ is given by
\[ p(x) = \mathbb{P}\Big( x \in \bigcup_{s\le t} B(\xi(s), r) \Big) \]
for $\xi$ a standard Brownian motion.
So for the probability that the detection time is greater than $t$ we have that
\[ \mathbb{P}(T_{\mathrm{det}} > t) = \mathbb{P}\big( \Psi(\mathbb{R}^d) = 0 \big) = \exp\bigg( -\lambda \int_{\mathbb{R}^d} \mathbb{P}\Big( x \in \bigcup_{s\le t} B(\xi(s), r) \Big)\,dx \bigg) = \exp\big( -\lambda\, \mathbb{E}[\mathrm{vol}(W(t))] \big), \]
where the last equality follows from Fubini's theorem.
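In $d = 1$ the expected sausage volume is $\sqrt{8t/\pi} + 2r$ (the range of the path plus $2r$), so the detection probability can be tested by direct simulation of the mobile network; the truncation of the infinite Poisson process to a finite window and all numerical values below are illustrative choices.

```python
import numpy as np

# d = 1 mobile network: points of a Poisson process (truncated to [-L, L]) move as
# Brownian motions; compare the empirical P(T_det > t) with
# exp(-lam * (sqrt(8 t / pi) + 2 r)).
rng = np.random.default_rng(8)
lam, r, t = 1.0, 0.5, 1.0
L = r + 5.0 * np.sqrt(t)      # points beyond this are very unlikely to detect by time t
n_steps, n_trials = 200, 3000
dt = t / n_steps

no_detection = 0
for _ in range(n_trials):
    n = rng.poisson(2 * lam * L)                 # number of points of Pi(0) in [-L, L]
    starts = rng.uniform(-L, L, size=n)
    paths = starts[:, None] + np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, n_steps)), axis=1)
    closest = min(np.abs(starts).min(initial=np.inf), np.abs(paths).min(initial=np.inf))
    no_detection += closest > r                  # no point came within distance r of 0

estimate = no_detection / n_trials
predicted = np.exp(-lam * (np.sqrt(8 * t / np.pi) + 2 * r))
print(estimate, predicted)  # both small and close to each other
```
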
Theorem 7.10. As $t \to \infty$, the expected volume of the Wiener sausage satisfies
\[ \mathbb{E}[\mathrm{vol}(W(t))] = \begin{cases} \sqrt{8t/\pi} + 2r & \text{for } d = 1, \\[4pt] \dfrac{2\pi t}{\log t}\,(1 + o(1)) & \text{for } d = 2, \\[6pt] \dfrac{2\pi^{d/2}\, r^{d-2}}{\Gamma\big(\frac{d-2}{2}\big)}\, t\,(1 + o(1)) & \text{for } d \ge 3. \end{cases} \]
Proof. Dimension d = 1 is left as an exercise.
For all $d$ we have that
\[ \mathbb{E}[\mathrm{vol}(W_t)] = \mathrm{vol}(B(0,r)) + \int_{\mathbb{R}^d \setminus B(0,r)} \mathbb{P}\big( \tau_{B(y,r)} \le t \big)\,dy, \]
where $\tau_A$ is the first hitting time of the set $A$ by the Brownian motion. Define
\[ Z^y_t = \int_0^t \mathbf{1}\big( \xi(s) \in B(y,r) \big)\,ds, \tag{7.2} \]
i.e. the time that the Brownian motion spends in the ball $B(y,r)$ before time $t$. It is clear by the continuity of the Brownian paths that $\{Z^y_t > 0\} = \{\tau_{B(y,r)} \le t\}$. We now have
\[ \mathbb{P}(Z^y_t > 0) = \frac{\mathbb{E}[Z^y_t]}{\mathbb{E}[Z^y_t \mid Z^y_t > 0]}, \]
and for the first moment we have
\[ \mathbb{E}[Z^y_t] = \int_0^t\!\!\int_{B(y,r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-|z|^2/(2s)}\,dz\,ds = \int_0^t\!\!\int_{B(0,r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-|z+y|^2/(2s)}\,dz\,ds, \]
and for the conditional expectation $\mathbb{E}[Z^y_t \mid Z^y_t > 0]$, if we write $T$ for the first time that the Brownian motion hits the boundary of the ball $B(y,r)$, then we get that in 2 dimensions, for all $y \notin B(0,r)$,
\begin{align*}
\mathbb{E}[Z^y_t \mid Z^y_t > 0] &= \mathbb{E}\bigg[ \int_T^t \mathbf{1}\big( \xi(s) \in B(y,r) \big)\,ds \,\bigg|\, Z^y_t > 0 \bigg] \le \int_0^t \mathbb{P}_0\big( \xi(s) \in B(0,r) \big)\,ds \\
&\le 1 + \int_1^t\!\!\int_{B(0,r)} \frac{1}{2\pi s}\, e^{-|z|^2/(2s)}\,dz\,ds \le 1 + \frac{r^2}{2}\log t.
\end{align*}
In dimensions $d \ge 3$ we have for all $y \notin B(0,r)$
\begin{align*}
\mathbb{E}[Z^y_t \mid Z^y_t > 0] &= \mathbb{E}\bigg[ \int_T^t \mathbf{1}\big( \xi(s) \in B(y,r) \big)\,ds \,\bigg|\, Z^y_t > 0 \bigg] \le \int_0^t \mathbb{P}_0\big( \xi(s) \in B((0,r),r) \big)\,ds \\
&= \int_0^t\!\!\int_{B((0,r),r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-|z|^2/(2s)}\,dz\,ds = \frac{1}{(2\pi)^{d/2}} \int_{B((0,r),r)} \Big( \frac{2}{|z|^2} \Big)^{\frac d2 - 1} \int_{\frac{|z|^2}{2t}}^{\infty} s^{d/2-2}\, e^{-s}\,ds\,dz,
\end{align*}
where $B((0,r),r)$ stands for the ball centred at $(0,\dots,0,r)$ and of radius $r$, and the last step follows by the change of variable $s \mapsto |z|^2/(2s)$. Now notice that
\[ \int_{\frac{|z|^2}{2t}}^{\infty} s^{d/2-2}\, e^{-s}\,ds \to \Gamma\Big( \frac{d-2}{2} \Big) \quad \text{as } t \to \infty, \]
and by the mean value property for the harmonic function $1/|z|^{d-2}$ we get that
\[ \int_{B((0,r),r)} \frac{dz}{|z|^{d-2}} = \mathrm{vol}\big(B(0,1)\big)\, r^2. \]
So, putting all things together we obtain that in 2 dimensions
\begin{align*}
\mathbb{E}[\mathrm{vol}(W_t)] &= \mathrm{vol}(B(0,r)) + \int_{\mathbb{R}^2 \setminus B(0,r)} \frac{\mathbb{E}[Z^y_t]}{\mathbb{E}[Z^y_t \mid Z^y_t > 0]}\,dy \\
&\ge \mathrm{vol}(B(0,r)) + \frac{\int_{\mathbb{R}^2}\int_0^t\int_{B(0,r)} \frac{1}{2\pi s}\, e^{-|z+y|^2/(2s)}\,dz\,ds\,dy - \int_{B(0,r)} \mathbb{E}[Z^y_t]\,dy}{1 + \frac{r^2}{2}\log t} \\
&= \mathrm{vol}(B(0,r)) + \frac{2\pi t r^2}{2 + r^2\log t} - \frac{2\int_{B(0,r)} \mathbb{E}[Z^y_t]\,dy}{2 + r^2\log t}.
\end{align*}
Since $\mathbb{E}[Z^y_t] \le 1 + \frac{r^2}{2}\log t$ for every $y$, the subtracted term above remains bounded, and hence in $d = 2$
\[ \liminf_{t\to\infty}\, \mathbb{E}[\mathrm{vol}(W_t)]\, \frac{\log t}{2\pi t} \ge 1, \]
and an analogous computation in $d \ge 3$ gives the corresponding lower bound there. It remains to show that in $d = 2$
\[ \limsup_{t\to\infty}\, \mathbb{E}[\mathrm{vol}(W_t)]\, \frac{\log t}{2\pi t} \le 1 \tag{7.3} \]
and in $d \ge 3$ that
\[ \limsup_{t\to\infty}\, \mathbb{E}[\mathrm{vol}(W_t)]\, \frac{\Gamma\big( \frac{d-2}{2} \big)}{2\pi^{d/2}\, r^{d-2}\, t} \le 1. \tag{7.4} \]
Let $\varepsilon > 0$. We define
\[ \widetilde Z^y_t = \int_0^{t(1+\varepsilon)} \mathbf{1}\big( \xi(s) \in B(y,r) \big)\,ds \]
and note that
\[ \mathbb{P}(Z^y_t > 0) \le \frac{\mathbb{E}[\widetilde Z^y_t]}{\mathbb{E}[\widetilde Z^y_t \mid Z^y_t > 0]}. \]
We can now lower bound the conditional expectation appearing in the denominator above as follows. In $d = 2$ we have, for all $y \notin B(0,r)$,
\begin{align*}
\mathbb{E}[\widetilde Z^y_t \mid Z^y_t > 0] &\ge \int_0^{\varepsilon t}\!\!\int_{B((0,r),r)} \frac{1}{2\pi s}\, e^{-|z|^2/(2s)}\,dz\,ds \ge \int_{\log t}^{\varepsilon t}\!\!\int_{B((0,r),r)} \frac{1}{2\pi s}\, e^{-|z|^2/(2s)}\,dz\,ds \\
&\ge \frac{r^2}{2}\, e^{-\frac{2r^2}{\log t}}\, \big( \log(\varepsilon t) - \log\log t \big),
\end{align*}
using in the last step that $|z| \le 2r$ for all $z \in B((0,r),r)$. Therefore
\[ \mathbb{E}[\mathrm{vol}(W_t)] \le \mathrm{vol}(B(0,r)) + \frac{2\pi t(1+\varepsilon)\, e^{\frac{2r^2}{\log t}}}{\log t + \log\varepsilon - \log\log t}, \]
and hence for $d = 2$
\[ \limsup_{t\to\infty}\, \mathbb{E}[\mathrm{vol}(W_t)]\, \frac{\log t}{2\pi t} \le 1 + \varepsilon \]
for all $\varepsilon > 0$; thus letting $\varepsilon$ go to 0 proves (7.3). In $d \ge 3$ we have, in the same way,
\[ \mathbb{E}[\widetilde Z^y_t \mid Z^y_t > 0] \ge \int_0^{\varepsilon t}\!\!\int_{B((0,r),r)} \frac{1}{(2\pi s)^{d/2}}\, e^{-|z|^2/(2s)}\,dz\,ds = \frac{1}{(2\pi)^{d/2}} \int_{B((0,r),r)} \Big( \frac{2}{|z|^2} \Big)^{\frac d2 - 1} \int_{\frac{|z|^2}{2\varepsilon t}}^{\infty} s^{d/2-2}\, e^{-s}\,ds\,dz, \]
which as $t \to \infty$ converges to $\frac{2^{d/2-1}}{(2\pi)^{d/2}}\, \Gamma\big( \tfrac{d-2}{2} \big)\, \mathrm{vol}(B(0,1))\, r^2$ as before. Arguing exactly as in $d = 2$ this proves (7.4) and completes the proof.
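The $d = 1$ case of the theorem, $\mathbb{E}[\mathrm{vol}(W(t))] = \sqrt{8t/\pi} + 2r$, left as an exercise above, is easy to check by simulation, since in one dimension the sausage is the range of the path enlarged by $r$ on each side; the step count and trial count below are illustrative choices.

```python
import numpy as np

# In d = 1, vol(W(t)) = (max - min of the path) + 2r; its mean is sqrt(8t/pi) + 2r.
rng = np.random.default_rng(7)
t, r, n_steps, n_trials = 4.0, 0.5, 2000, 2000
dt = t / n_steps

paths = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_trials, n_steps)), axis=1)
ranges = np.maximum(paths.max(axis=1), 0) - np.minimum(paths.min(axis=1), 0)
volumes = ranges + 2 * r

predicted = np.sqrt(8 * t / np.pi) + 2 * r
print(volumes.mean(), predicted)  # close to 4.19, up to discretisation bias
```
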
Now suppose that the target particle is moving according to a deterministic function $f : \mathbb{R}_+ \to \mathbb{R}^d$. We define the detection time
\[ T^f_{\mathrm{det}} = \inf\Big\{ t \ge 0 : f(t) \in \bigcup_i B\big(X_i + \xi_i(t), r\big) \Big\}. \]