Mathematical Finance
squash’: dilate the x-axis in the ratio a/b. So A ↦ A·(a/b) = πb²·(a/b) = πab.
Fine – what next? We have already used both the coordinate systems to
hand. There is no general way to continue this list. Indeed, I don’t know
another example of comparable neatness and importance to the above.
The only general procedure is to superimpose finer and finer sheets of
graph paper on our region, and count squares (‘interior squares’ and ‘edge
squares’). This yields numerical approximations – which is all we can hope
for, and all we need, in general.
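For illustration, here is a small Python sketch of the square-counting idea, taking the unit disc (area π) as the region and using the corners of each square as a crude inside/outside test; finer grids squeeze the lower and upper estimates together.

# Square-counting approximation to the area of the unit disc x^2 + y^2 <= 1.
# 'Interior' squares have all four corners inside; 'edge' squares have some
# but not all corners inside (a crude classification, good enough here).

def count_squares(n):
    """Overlay an n-by-n grid of squares on [-1, 1]^2 and classify them."""
    h = 2.0 / n                                     # side of one grid square
    inside = lambda x, y: x * x + y * y <= 1.0
    interior = edge = 0
    for i in range(n):
        for j in range(n):
            x0, y0 = -1 + i * h, -1 + j * h
            corners = [inside(x0 + a, y0 + b) for a in (0, h) for b in (0, h)]
            if all(corners):
                interior += 1
            elif any(corners):
                edge += 1
    return interior * h * h, (interior + edge) * h * h     # lower, upper

for n in (10, 50, 250):
    print(n, count_squares(n))     # both estimates approach pi = 3.14159...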
The question is whether this procedure always works. Where it is clearly
most likely to fail is with highly irregular regions: ‘all edge and no middle’.
It turns out that this procedure does not always work; it works for some
but not all sets – those whose structure is ‘nice enough’. This goes back to
the 1902 thesis of Henri LEBESGUE (1875-1941):
H. Lebesgue: Intégrale, longueur, aire. Annali di Mat. 7 (1902), 231-259.
Similarly in other dimensions. So: some but not all sets have a length/area/volume.
Those which do are called (Lebesgue) measurable; length/area/volume is
called (Lebesgue) measure; this subject is called Measure Theory.
We first meet integration in just this context – finding areas under curves
(say). The ‘Sixth Form integral’ proceeds by dividing up the range of inte-
gration on the x-axis into a large number of small subintervals, [x, x + dx]
say. This divides the required area up into a large number of thin strips, each
of which is approximately rectangular; we sum the areas of these rectangles
to approximate the area.
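By way of example, a few lines of Python summing thin rectangular strips under f(x) = x² on [0, 1], whose exact area is 1/3 (the function and interval are just an illustrative choice):

# 'Thin strips' approximation to the area under a curve.
def riemann_sum(f, a, b, n):
    """Sum the areas of n rectangles of width dx = (b - a)/n."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

for n in (10, 100, 1000, 10000):
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))
# The approximation tends to the exact area 1/3 as n grows.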
This informal procedure can be formalised as the Riemann integral (G.
F. B. RIEMANN (1826-66) in 1854). This (basically, the Sixth Form integral
formalised in the language of epsilons and deltas) is part of the undergradu-
ate Mathematics curriculum.
We see here the essence of calculus (the most powerful single weapon in
mathematics, and indeed in science). If something is reasonably smooth, and
we break it up finely enough, curves look straight, so we can handle them.
We make an error by this approximation, but when calculus applies, this er-
ror can be made arbitrarily small, so the approximation is effectively exact.
Example: We do this sort of thing automatically. If in a discussion of global
warming we hear an estimate of polar ice lost, this will translate into an
estimate of increase in sea level (neglecting the earth’s curvature).
Note. The ‘squashing’ argument above was deliberately presented informally.
It can be made quite precise – but this needs the mathematics of Haar mea-
sure, a fusion of Measure Theory and Topological Groups.
§2. Probability basics: Recapitulation from Years 1 and 2
This is true quite generally (as a Lebesgue-Stieltjes integral: see Ch. III). If
F has density f , this is
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
– so ‘dF(x) = f(x) dx’ in the density case. In the discrete case, if X takes
values x_n with probability a_n, this is a summation:

E[g(X)] = ∑_n a_n g(x_n).
These formulae will be familiar to you from Years 1 and 2; we will deal with
both together, and pass to a much better integral (there are several!) in Ch.
III.
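As a numerical illustration (assuming SciPy is available for the integration), take g(x) = x²: a standard normal X in the density case, a Poisson(3) X in the discrete case:

import math
from scipy.integrate import quad

# Density case: X standard normal, so E[X^2] = 1.
f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
val, _ = quad(lambda x: x * x * f(x), -math.inf, math.inf)
print(val)                                        # ~1.0

# Discrete case: X ~ Poisson(3), so E[X^2] = var + mean^2 = 3 + 9 = 12.
lam = 3.0
p = lambda n: math.exp(-lam) * lam ** n / math.factorial(n)
print(sum(n * n * p(n) for n in range(200)))      # ~12.0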
Several random variables: (X, Y), etc. Joint and marginal distributions:

F(x, y) = F_{X,Y}(x, y) := P(X ≤ x, Y ≤ y)

(joint: governs behaviour of X, Y together; marginals: govern their behaviour separately).
Covariances: cov(X, Y) := E[(X − E[X])(Y − E[Y])];
correlation ρ = ρ(X, Y) := cov(X, Y)/√(var X · var Y).
So: E[X] has the dimensions of X; var X has the dimensions of X²; cov(X, Y)
has those of XY; corr(X, Y) is dimensionless, and takes values in [−1, 1]
(Cauchy-Schwarz inequality).
cov(X, Y) = E[XY] − E[X]·E[Y]; var X = cov(X, X) = E[X²] − (E[X])²,
var X ≥ 0; var X = 0 iff X is constant a.s. (almost surely — with proba-
bility one), and then the constant is E[X].
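A quick Monte Carlo illustration of these identities; the choice Y = 2X + noise is arbitrary, made so that the correlation is strictly between 0 and 1:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100_000)
Y = 2 * X + rng.normal(size=100_000)

cov = np.mean(X * Y) - np.mean(X) * np.mean(Y)    # E[XY] - E[X]E[Y]
var_x = np.mean(X ** 2) - np.mean(X) ** 2         # E[X^2] - (E[X])^2
var_y = np.mean(Y ** 2) - np.mean(Y) ** 2
rho = cov / np.sqrt(var_x * var_y)

print(cov, var_x, rho)   # cov ~ 2, var X ~ 1, rho ~ 2/sqrt(5) ~ 0.894, in [-1, 1]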
Independence
A family of random variables X_1, ⋯, X_n are independent if, for any sub-
family X_{r_1}, ⋯, X_{r_k}, and any intervals I_i (more generally, measurable sets:
Ch. III),

P(X_{r_1} ∈ I_1, ⋯, X_{r_k} ∈ I_k) = P(X_{r_1} ∈ I_1) ⋯ P(X_{r_k} ∈ I_k).
That is, knowing how the Xs behave separately (as on the RHS) tells us how
they behave together (as on the LHS).
Taking the I_i = (−∞, x_i]:
the Xs are independent iff their joint distribution function factorises into the
product of the marginal distribution functions.
When the densities exist, differentiating wrt x1 , · · · , xn gives: the Xs are
independent iff their joint density function factorises into the product of the
marginal density functions.
Similarly for mass functions, in the discrete case.
Similarly for the various transforms we shall need; see below.
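For instance, a simulation check of the factorisation for two independent standard normals, with one particular (arbitrary) choice of intervals I_1, I_2:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=1_000_000)
Y = rng.normal(size=1_000_000)

in1 = (X > -1) & (X <= 0.5)            # I_1 = (-1, 0.5]
in2 = (Y > 0) & (Y <= 2)               # I_2 = (0, 2]

print(np.mean(in1 & in2))              # joint probability
print(np.mean(in1) * np.mean(in2))     # product of marginals: nearly equal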
(like) a double integral in calculus courses: we usually evaluate such a double
integral by integrating out first over one variable and then over the other,
so a double integral is reduced to two repeated single integrals. We will see
how to do this using Lebesgue-Stieltjes integrals (better!) in Ch. III. But on
the first RHS above, the one d means ‘integrate w.r.t.’ one two-dimensional
function (or, in Ch. III, measure).
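A small illustration (again assuming SciPy) of reducing a double integral to repeated single integrals, on P(X + Y ≤ 1) for independent standard exponentials X, Y, with joint density e^{−x−y} on the positive quadrant:

import math
from scipy.integrate import quad, dblquad

# One two-dimensional integral over the triangle {x + y <= 1, x, y >= 0}:
joint, _ = dblquad(lambda y, x: math.exp(-x - y), 0, 1,
                   lambda x: 0.0, lambda x: 1 - x)

# Two repeated single integrals: integrate out y first, then x.
inner = lambda x: quad(lambda y: math.exp(-x - y), 0, 1 - x)[0]
repeated, _ = quad(inner, 0, 1)

print(joint, repeated, 1 - 2 / math.e)    # all three agree: ~0.2642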
3. Transforms
If X, Y are independent with d/ns F , G, their sum X + Y has d/n H,
where H is the convolution
H = F ∗ G:
H(z) := P(X + Y ≤ z) = ∫∫_{x+y≤z} dF(x) dG(y) = ∫_{−∞}^{∞} dF(x) ∫_{−∞}^{z−x} dG(y)

= ∫_{−∞}^{∞} G(z − x) dF(x) = ∫_{−∞}^{∞} F(z − y) dG(y).
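In the discrete case the convolution is just a sum and can be computed directly; for instance, for the total shown by two fair dice:

import numpy as np

p = np.full(6, 1 / 6)            # P(X = 1), ..., P(X = 6)
q = np.full(6, 1 / 6)            # P(Y = 1), ..., P(Y = 6)

h = np.convolve(p, q)            # law of X + Y, supported on 2, ..., 12
for total, prob in enumerate(h, start=2):
    print(total, prob)           # the familiar triangular law, peak at 7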
M(t) := E[e^{tX}].
This looks just like the CF, except that we have no i. This looks simpler –
but it is in fact more complicated. For, |e^{itX}| ≤ 1, so all the expectations
above converge. But, as e^{tX} may grow exponentially, the MGF may fail to
be defined for all real t – may diverge to +∞ for some t. However, when it
does exist, we can expand the exponential and take expectations termwise:
with

µ_n := E[X^n]

the nth moment of X,

M(t) = E[e^{tX}] = E[∑_{n=0}^{∞} t^n X^n/n!] = ∑_{n=0}^{∞} t^n E[X^n]/n! = ∑_{n=0}^{∞} t^n µ_n/n!.
The one function M (t) on the left is said to generate the infinitely many
moments µn on the right, hence the name MGF.
Moments.
Recall from Complex Analysis (M2P3) that an analytic function deter-
mines and is determined by its power-series expansions. So if the power-series
expansion of M(t) above about the origin has radius of convergence R > 0,
we can find the moments by differentiation:

µ_n = M^{(n)}(0).

Similarly for CFs, remembering the i: i^n µ_n = φ^{(n)}(0) (here things always
converge, so we don’t have to worry about R).
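As an illustration (checked symbolically with SymPy), the exponential law E(λ) with λ = 2 has MGF λ/(λ − t) for t < λ, and differentiating it n times at 0 recovers the moments µ_n = n!/λ^n:

import sympy as sp

t, x = sp.symbols('t x')
lam = 2
M = lam / (lam - t)                        # MGF of E(2), valid for t < 2

for n in range(1, 5):
    from_mgf = sp.diff(M, t, n).subs(t, 0)                        # M^(n)(0)
    direct = sp.integrate(x ** n * lam * sp.exp(-lam * x), (x, 0, sp.oo))
    print(n, from_mgf, direct)             # both equal n!/2^n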
the LST above to the PGF (replace e^{−s}, s ≥ 0, by s ∈ [0, 1]): if X takes
values in N_0, with

P(X = n) = p_n   (n = 0, 1, 2, ⋯),

the PGF of X is

P(s) := E[s^X] = ∑_{n=0}^{∞} p_n s^n;

as P(1) = ∑_{n=0}^{∞} p_n = 1 (< ∞!), the radius of convergence of the power series
here is always at least 1.
As with the CF, so with the MGF, LST and PGF: the transform of an
independent sum (a convolution) is the product of the transforms, and similarly for moments.
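A quick numerical check of this product rule for PGFs, using two arbitrary (purely illustrative) mass functions on {0, 1, 2, 3}:

import numpy as np

p = np.array([0.1, 0.4, 0.3, 0.2])         # law of X
q = np.array([0.25, 0.25, 0.25, 0.25])     # law of Y

pgf = lambda probs, s: sum(pk * s ** k for k, pk in enumerate(probs))

s = 0.7
print(pgf(np.convolve(p, q), s))           # PGF of X + Y at s
print(pgf(p, s) * pgf(q, s))               # product of the PGFs: same number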
p_k := e^{−λ} λ^k/k!   (k = 0, 1, 2, ⋯).

From the exponential series, ∑_k p_k = 1, so this does indeed give a probability
distribution (or law, for short) on N_0. It is called the Poisson distribution
P(λ), with parameter λ, after S.-D. Poisson (1781-1840) in 1837.
The Poisson law has mean λ. For if N is a random variable with the
Poisson law P (λ), N ∼ P (λ), N has mean
E[N] = ∑_k k P(N = k) = ∑_k k p_k = ∑_{k≥1} k·e^{−λ} λ^k/k! = λ ∑_{k≥1} e^{−λ} λ^{k−1}/(k−1)! = λ,
the Poisson are called over-dispersed; data with variance less than Poisson
are under-dispersed.
3. The variance calculation above used the (second) factorial moment,
E[N (N − 1)]. These are better for count data than ordinary moments.
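For instance, with λ = 3 one can check numerically that E[N] = λ, that the second factorial moment is E[N(N − 1)] = λ², and hence that var N = E[N(N − 1)] + E[N] − (E[N])² = λ:

import math

lam = 3.0
p = lambda k: math.exp(-lam) * lam ** k / math.factorial(k)
ks = range(200)

mean = sum(k * p(k) for k in ks)
fact2 = sum(k * (k - 1) * p(k) for k in ks)     # second factorial moment
var = fact2 + mean - mean ** 2

print(mean, fact2, var)      # ~3, ~9, ~3: mean = variance = lambda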
We can find which laws show no aging, as follows. The law F has the
lack-of-memory property iff the components show no aging – that is, if a
component still in use behaves as if new. The condition for this is
P(X > s + t | X > s) = P(X > t)   (s, t > 0);

that is,

P(X > s + t) = P(X > s) P(X > t).
Writing F̄(x) := 1 − F(x) (x ≥ 0) for the tail of F, this says that

F̄(s + t) = F̄(s) F̄(t)   (s, t ≥ 0).
Obvious solutions are

F̄(t) = e^{−λt},   F(t) = 1 − e^{−λt}

for some λ > 0 – the exponential law E(λ). Now
f (s + t) = f (s)f (t) (s, t ≥ 0)
is a ‘functional equation’ – the Cauchy functional equation – and we quote
that these are the only solutions, subject to minimal regularity (such as one-
sided boundedness, as here – even on an interval of arbitrarily small length!).
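A Monte Carlo check of the lack-of-memory property for E(λ), with λ = 1.5 and s, t chosen arbitrarily:

import numpy as np

rng = np.random.default_rng(2)
lam, s, t = 1.5, 0.4, 0.9
X = rng.exponential(scale=1 / lam, size=1_000_000)

lhs = np.mean(X > s + t) / np.mean(X > s)     # P(X > s + t | X > s)
rhs = np.mean(X > t)                          # P(X > t)
print(lhs, rhs, np.exp(-lam * t))             # all ~ e^{-lambda t}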
So the exponential laws E(λ) are characterized by the lack-of-memory
property. Also, the lack-of-memory property corresponds in the renewal con-
text to the Markov property. The renewal process generated by E(λ) is
called the Poisson (point) process with rate λ, Ppp(λ). So: among renewal
processes, the only Markov processes are the Poisson processes. We meet
Lévy processes below: among renewal processes, the only Lévy processes are
the Poisson processes.
It is the lack of memory property of the exponential distribution that
(since the inter-arrival times of the Poisson process are exponentially dis-
tributed) makes the Poisson process the basic model for events occurring
‘out of the blue’. Typical examples are accidents, insurance claims, hospital
admissions, earthquakes, volcanic eruptions etc. So it is not surprising that
Poisson processes and their extensions (compound Poisson processes) domi-
nate in the actuarial and insurance professions, as well as geophysics, etc.
(x > 0 is needed for convergence at the origin). One can check (integration
by parts, and induction) that

Γ(x + 1) = x Γ(x),   so   Γ(n + 1) = n!   (n = 0, 1, 2, ⋯);

thus Gamma provides a continuous extension to the factorial. One can show
Γ(1/2) = √π

(the proof is essentially that ∫_ℝ e^{−x²/2} dx = √(2π), i.e. that the standard nor-
mal density integrates to 1). The Gamma function is needed for Statistics, as
it commonly occurs in the normalisation constants of the standard densities.
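Both facts are easy to check numerically; Python's math.gamma evaluates Γ:

import math

for n in range(1, 6):
    print(n, math.gamma(n + 1), math.factorial(n))    # Gamma(n + 1) = n!

print(math.gamma(0.5), math.sqrt(math.pi))            # both 1.77245385...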
S_n := W_1 + ⋯ + W_n ∼ Γ(n, λ).
The MGF of the sum of independent random variables is the product of the
MGFs (same for characteristic functions, CFs, and for probability generating
functions, PGFs – check). So W_1 + ⋯ + W_n has MGF (λ/(λ − t))^n, the MGF
of Γ(n, λ) as above:

S_n := W_1 + ⋯ + W_n ∼ Γ(n, λ).
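A simulation check (assuming SciPy) that a sum of n independent E(λ) lifetimes has the Γ(n, λ) law, here with the illustrative values n = 5, λ = 2; the mean and variance should be n/λ and n/λ², and the empirical distribution function should match the Gamma one:

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
n, lam = 5, 2.0
S = rng.exponential(scale=1 / lam, size=(200_000, n)).sum(axis=1)

print(S.mean(), n / lam)                  # ~2.5
print(S.var(), n / lam ** 2)              # ~1.25
for z in (1.0, 2.5, 5.0):
    print(z, np.mean(S <= z), gamma.cdf(z, a=n, scale=1 / lam))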
Proof. Part (i) is clear: the first lifetime is positive (they all are).
The link between the Poisson process, defined as above in terms of the
exponential distribution, and the Poisson distribution, is as follows. First,
This starts an induction, which continues (using integration by parts):
But the integral here is P(N_t = k − 1). So (passing from Gammas to factorials)

P(N_t = k) − e^{−λt} (λt)^k/k! = P(N_t = k − 1) − e^{−λt} (λt)^{k−1}/(k − 1)!,

completing the induction. This shows that

N_t ∼ P(λt).
This gives (ii) also: re-start the process at time t, which becomes the new
time-origin. The re-started process is a new Poisson process, by the lack-of-
memory property applied to the current item (lightbulb above); this gives
(ii) and (iii). Conversely, independent increments of N correspond to the
lack-of-memory property of the lifetime law, and we know that this charac-
terises the exponential law, and so the Poisson process. //
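A simulation of the Poisson process from its exponential lifetimes, checking that N_t has mean and variance λt (the values λ = 2, t = 3 are just an example):

import numpy as np

rng = np.random.default_rng(4)
lam, t, reps = 2.0, 3.0, 100_000

# Generate plenty of E(lambda) lifetimes per replication and count how many
# arrival times (partial sums) fall in [0, t].
W = rng.exponential(scale=1 / lam, size=(reps, 40))
arrivals = np.cumsum(W, axis=1)
N_t = np.sum(arrivals <= t, axis=1)

print(N_t.mean(), N_t.var(), lam * t)     # both ~ 6 = lambda * t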
Y_t := X_1 + ⋯ + X_{N(t)}