Portfolio Theory Notes

York notes of Portfolio Theory module

Uploaded by

Invisible Friend
Portfolio Theory and Risk Management

Contents

1 Two assets
1.1 Expected return
1.2 Variance as a risk measure
1.3 Semi–variance
1.4 Portfolios consisting of two assets
1.5 Return
1.6 Feasible set
1.7 Special cases
1.8 Minimum variance portfolio
1.9 Adding a risk–free security
1.10 Indifference curves

2 Many assets
2.1 Lagrange multipliers
2.2 Risk and return
2.3 Three risky securities
2.4 Minimum variance portfolio
2.5 Minimum variance line
2.6 Market portfolio
2.7 CAPM
2.8 Derivation of CAPM
2.9 Security Market Line

3 Utility functions
3.1 Basic notions and axioms
3.2 Utility maximisation
3.3 Relation to mean variance analysis
3.4 Risk aversion
3.5 Utility functions and indifference curves

4 Value at Risk
4.1 Quantiles
4.2 Measuring downside risk
4.3 Examples of computing VaR
4.4 VaR in the Black–Scholes model

Introduction
These lecture notes cover the material of the module Portfolio Theory and Risk Management, one of the three components of the certificate stage of the online MSc in Mathematical Finance offered by the University of York.
The module is divided into four parts, each corresponding to some exercises and coursework assignments as follows:
Chapter 1 – Exercises 1, Coursework Assignment 1
Chapter 2 – Exercises 2, Coursework Assignment 2
Chapter 3 – Exercises 3, Coursework Assignment 3
Chapter 4 – Exercises 4, Coursework Assignment 4
The notes contain links to some Excel files, which should be placed in the Excel subfolder.
The following textbooks are recommended as auxiliary sources:
[CK] M. J. Capinski, E. Kopp, Portfolio Theory and Risk Management, to appear in Cambridge University Press.
[CZ] M. Capinski, T. Zastawniak, Mathematics for Finance, 2nd ed., Springer 2010.

1 Two assets
In this chapter we first analyse various ways of introducing the two fundamental concepts
of finance: return and risk. In brief, return reflects the efficiency of an investment, risk is
concerned with uncertainty. The balance between these two is at the heart of portfolio theory.

1.1 Expected return


We are concerned with just two time instants: the present t = 0 and the future t = 1, where
1 may stand for any unit of time. Suppose we make a single period investment in some stock
with the current price S (0) known, and the future price S (1) unknown, hence assumed to be
represented by a random variable

S (1) : Ω → [0, +∞),

where Ω is the sample space of some probability space (Ω, F, P) .


When Ω is finite, $\Omega = \{\omega_1, \dots, \omega_N\}$, we shall adopt the notation

$$S(1, \omega_i) = S(1)(\omega_i) \quad \text{for } i = 1, \dots, N,$$

for the possible values of S(1). In this setting it is natural to equip Ω with the σ-field $\mathcal{F} = 2^{\Omega}$ of all its subsets. To define a probability measure $P : \mathcal{F} \to [0, 1]$ it is sufficient to give its values on single element sets, $P(\{\omega_i\}) = p_i$, by choosing $p_i \in [0, 1]$ such that $\sum_{i=1}^{N} p_i = 1$. We can then compute the expected price at the end of the period

$$E(S(1)) = \sum_{i=1}^{N} S(1, \omega_i)\, p_i,$$

and the variance of the price

$$\mathrm{Var}(S(1)) = \sum_{i=1}^{N} \bigl(S(1, \omega_i) - E(S(1))\bigr)^2 p_i.$$

Example 1.1
Assume that S(0) = 100 and

$$S(1) = \begin{cases} 120 & \text{with probability } \tfrac{1}{2}, \\ 90 & \text{with probability } \tfrac{1}{2}. \end{cases}$$

Then $E(S(1)) = \tfrac{1}{2} \cdot 120 + \tfrac{1}{2} \cdot 90 = 105$ and $\mathrm{Var}(S(1)) = (120 - 105)^2 \cdot \tfrac{1}{2} + (90 - 105)^2 \cdot \tfrac{1}{2} = 15^2$. Observe also that the standard deviation, which is the square root of the variance, is equal to $\sqrt{\mathrm{Var}(S(1))} = 15$. To open a file with this example click here: Excel.

Here is a Video of how the computation is done.

Here is a Video of how the expectation and variance change with the initial data.
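The computation of Example 1.1 can also be reproduced outside Excel. The following Python sketch is an illustration only; the function name `mean_and_variance` is our own choice and is not part of the notes.

```python
import math

def mean_and_variance(values, probs):
    """Expectation and variance of a discrete random variable
    taking values[i] with probability probs[i]."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var

# Data of Example 1.1: S(1) is 120 or 90, each with probability 1/2.
mean, var = mean_and_variance([120.0, 90.0], [0.5, 0.5])
std = math.sqrt(var)
print(mean, var, std)  # 105.0 225.0 15.0
```

Changing the two lists shows how the expectation and variance react to the initial data.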

When S(1) has a continuous distribution with a density function $f : \mathbb{R} \to \mathbb{R}$, then

$$E(S(1)) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

and

$$\mathrm{Var}(S(1)) = \int_{-\infty}^{\infty} (x - E(S(1)))^2 f(x)\,dx.$$

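Example 1.2
Assume that $S(1) = S(0)\exp(m + sZ)$, where Z is a random variable with standard normal distribution N(0, 1). This means that S(1) has lognormal distribution. The density function of S(1) is equal to

$$f(x) = \frac{1}{x s \sqrt{2\pi}}\, e^{-\frac{\left(\ln\frac{x}{S(0)} - m\right)^2}{2s^2}} \quad \text{for } x > 0,$$

and 0 for x ≤ 0. We can compute the expected price as

$$\begin{aligned}
E(S(1)) &= \int_{0}^{\infty} x f(x)\,dx \\
&= \int_{0}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{\left(\ln\frac{x}{S(0)} - m\right)^2}{2s^2}}\,dx \\
&= \int_{-\infty}^{\infty} S(0)e^{sy+m}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\,dy \quad \Bigl(\text{taking } y = \tfrac{1}{s}\bigl(\ln\tfrac{x}{S(0)} - m\bigr)\Bigr) \\
&= S(0)e^{m+\frac{s^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y-s)^2}{2}}\,dy \\
&= S(0)e^{m+\frac{s^2}{2}}.
\end{aligned}$$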
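The closed formula of Example 1.2 can be checked numerically. The sketch below approximates the integral over y from the third line of the computation by the trapezoidal rule and compares it with $S(0)e^{m+s^2/2}$. The values of S(0), m and s are hypothetical, chosen only for illustration, and the integration range [−12, 12] is an assumption that is adequate for moderate s.

```python
import math

def lognormal_mean_numeric(s0, m, s, lo=-12.0, hi=12.0, steps=20000):
    """Trapezoidal approximation of the integral of
    S(0) e^{sy+m} * (1/sqrt(2 pi)) e^{-y^2/2} over [lo, hi]."""
    h = (hi - lo) / steps

    def integrand(y):
        density = math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
        return s0 * math.exp(s * y + m) * density

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return total * h

def lognormal_mean_closed(s0, m, s):
    """Closed formula E(S(1)) = S(0) exp(m + s^2 / 2)."""
    return s0 * math.exp(m + s * s / 2)

# Hypothetical parameters, not taken from the notes.
s0, m, s = 100.0, 0.05, 0.2
numeric = lognormal_mean_numeric(s0, m, s)
closed = lognormal_mean_closed(s0, m, s)
print(numeric, closed)
```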

We may allow any probability space. However, we must make sure that negative values of
S (1) are excluded since these make no sense from the point of view of economics. This means
that the distribution of S (1) has to be supported on [0, +∞) (meaning that P(S (1) ≥ 0) = 1).
The return on the investment S is a random variable K : Ω → R, defined as

$$K = \frac{S(1) - S(0)}{S(0)}.$$

By the linearity of mathematical expectation

$$E(K) = \frac{E(S(1)) - S(0)}{S(0)}.$$

We introduce the convention of using the Greek letter µ for expectations of various random returns

$$\mu = E(K),$$

with various subscripts indicating the context, if necessary.

Here is a Video of an Excel computation of the return, expected return and the standard
deviation of the return.

The relationships between the prices and returns can be written as

S (1) = S (0)(1 + K),


E(S (1)) = S (0)(1 + µ),

which illustrates the possibility of reversing the approach: given the returns we can find the
prices.
The requirement that S(1) is nonnegative implies that we must have K ≥ −1. This in particular excludes the possibility of considering K with Gaussian (normal) distribution.
At time t = 1 a dividend may be paid. In practice, after the dividend is paid, the stock price drops by this amount, which is logical. Thus we have to distinguish between the price that includes the right to receive the dividend (the cum dividend price) and the price after the dividend is paid (the ex dividend price). We assume that S(1) denotes the latter, hence the definition of the return has to be modified to account for dividends:

$$K = \frac{S(1) + \mathrm{Div}(1) - S(0)}{S(0)}.$$

A bond is a special security that pays a certain sum of money, known in advance, at maturity; this sum is the same in each state. The return on a bond is not random if the bond is held to maturity. Consider a bond paying a unit of home currency at time t = 1, B(1) = 1, purchased for B(0) < 1. The return

$$r = \frac{1 - B(0)}{B(0)}$$

defines a risk-free interest rate (provided that the length of the period is one year; otherwise some technical adjustment is necessary). The bond price can be expressed by means of r

$$B(0) = \frac{1}{1 + r},$$

giving the present value of a unit at time 1.

1.2 Variance as a risk measure


The concept of risk in finance is captured in many ways. The basic and most widely used one
is concerned with risk as uncertainty of the unknown future value of some quantity in question
(here we are concerned with return). This uncertainty is understood as the scatter around some
reference point. A natural candidate for the reference value is the mathematical expectation
(though some other numbers can be also considered). The extent of scatter is conveniently
measured by the variance. This notion takes care of two aspects of risk:

(i). The distances between possible values and the expectation,

(ii). The probabilities of attaining these values.

Definition 1.3
By (the measure of) risk we mean the variance of the return

$$\mathrm{Var}(K) = E(K - \mu)^2 = E(K^2) - \mu^2$$

or the standard deviation denoted by

$$\sigma_K = \sqrt{\mathrm{Var}(K)}.$$

The variance of the return can be computed from the variance of S(1),

$$\begin{aligned}
\mathrm{Var}(K) &= \mathrm{Var}\left(\frac{S(1) - S(0)}{S(0)}\right) \\
&= \frac{1}{S(0)^2}\,\mathrm{Var}(S(1) - S(0)) \\
&= \frac{1}{S(0)^2}\,\mathrm{Var}(S(1)).
\end{aligned}$$

We introduce the convention of using the Greek letter σ for standard deviations of various random returns

$$\sigma = \sqrt{\mathrm{Var}(K)},$$

with various subscripts indicating the context, if necessary.

We have a Video of an application of the formulae.

Standard deviation alone does not fully capture the risk of an investment. We illustrate this
with a simple example.

Example 1.4
Consider three assets with today's prices $S_i(0) = 100$ for i = 1, 2, 3 and time 1 prices with the following distributions

$$S_1(1) = \begin{cases} 120 & \text{with probability } \tfrac{1}{2}, \\ 90 & \text{with probability } \tfrac{1}{2}, \end{cases}$$

$$S_2(1) = \begin{cases} 140 & \text{with probability } \tfrac{1}{2}, \\ 90 & \text{with probability } \tfrac{1}{2}, \end{cases}$$

$$S_3(1) = \begin{cases} 130 & \text{with probability } \tfrac{1}{2}, \\ 100 & \text{with probability } \tfrac{1}{2}. \end{cases}$$

We can see that

$$\sigma_1 = \sqrt{\mathrm{Var}(K_1)} = 0.15,$$
$$\sigma_2 = \sqrt{\mathrm{Var}(K_2)} = 0.25,$$
$$\sigma_3 = \sqrt{\mathrm{Var}(K_3)} = 0.15.$$

Although σ2 > σ1 and σ3 = σ1, both the second and third assets are preferable to the first: both have a higher expected return than the first (0.15 against 0.05), while the worst outcomes are the same for the first two assets. We shall return to this in the next section (see Excel file: Excel and Video).

When considering the risk of an investment we should take into account both the expected
return and its standard deviation. Given the choice between two securities a rational investor
will, if possible, choose that with the higher expected return and lower standard deviation, that
is, lower risk. This motivates the following definition.

Definition 1.5
We say that a security with expected return µ1 and standard deviation σ1 dominates
another security with expected return µ2 and standard deviation σ2 whenever

µ1 ≥ µ2 and σ1 ≤ σ2 .

The meaning of the word “dominates” is that we assume the investors to be risk averse.
One can imagine an investor whose personal goal is just the excitement. This person will not
pay any attention to return and will prefer higher risk. However, it is not our intention to cover
such individuals by our theory.
The playground for portfolio theory will be the (σ, µ)-plane, in fact the right half-plane since the standard deviation is non-negative. Each security is represented by a dot on this plane. This means that we are making a simplification by assuming that the expectation and variance are all that matters when investment decisions are made.

Figure 1.1 Efficient subset.

We assume that the dominating securities are preferred, which geometrically (geographically) means that the north-west moves are preferable. This ordering is only partial, since looking at Figure 1.1 we see for instance that the pairs (σ1, µ1) and (σ2, µ2) are not comparable.
Given a set A of securities, we consider the subset of all maximal elements with respect to the dominance relation and call it the efficient subset. If the set A is finite, finding the efficient subset reduces to eliminating the dominated securities. Figure 1.1 shows a set of five securities with the efficient subset consisting of just three, numbered 1, 2, and 4.
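For a finite set of securities the elimination of dominated elements is easy to automate. The following Python sketch is one possible way of doing it, under the dominance relation of Definition 1.5. The function names are our own, and the data are the (σ, µ) pairs computed from the prices of Example 1.4, not the five securities of Figure 1.1, whose coordinates are not given.

```python
def dominates(a, b):
    """a and b are (sigma, mu) pairs; a dominates b when
    mu_a >= mu_b and sigma_a <= sigma_b (Definition 1.5)."""
    sigma_a, mu_a = a
    sigma_b, mu_b = b
    return mu_a >= mu_b and sigma_a <= sigma_b

def efficient_subset(securities):
    """Names of the securities that are not dominated by any
    security located at a different point of the (sigma, mu)-plane."""
    result = []
    for name, point in securities.items():
        dominated = any(
            other != point and dominates(other, point)
            for other_name, other in securities.items()
            if other_name != name
        )
        if not dominated:
            result.append(name)
    return result

# (sigma, mu) for the three assets of Example 1.4, in return units.
assets = {
    "asset 1": (0.15, 0.05),
    "asset 2": (0.25, 0.15),
    "asset 3": (0.15, 0.15),
}
print(efficient_subset(assets))  # ['asset 3']
```

Here the third asset dominates both others, so it alone forms the efficient subset.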

1.3 Semi–variance
Consider the three assets described in Example 1.4. Although σ1 = σ3, the third asset carries no 'downside risk', since neither outcome for S3(1) involves a loss for the investor. Similarly, although σ2 > σ1, the downside risk for the second asset is the same as that for the first (a 50% chance of incurring a loss of 10), but the expected return for the second asset is 0.15, making it the more attractive investment even though, as measured by variance, it is more risky. Since investors regard risk as concerned with failure (i.e. downside risk), the following modification of variance is sometimes used. It is called semi-variance and is computed by the formula that takes into account only the unfavourable outcomes, where the return is below the expected value

$$E(\min\{0, K - \mu\})^2. \tag{1.1}$$

The square root of semi-variance is denoted by semi-σ. However, this notion still does not agree fully with the intuition.

Example 1.6

Assume that $\Omega = \{\omega_1, \omega_2\}$, $P(\{\omega_1\}) = P(\{\omega_2\}) = \tfrac{1}{2}$ and

$$K(\omega_1) = 10\%, \qquad K(\omega_2) = 20\%.$$

Consider a modification $K'$ with

$$K'(\omega_1) = 10\%, \qquad K'(\omega_2) = 30\%.$$

Then $K'$ is definitely better than K but the semi-variance and the variance for $K'$ are both higher than for K (see Excel and Video).

If variance or semi-variance are to represent risk, it is illogical that a better version should be regarded as more risky. This defect can be rectified by replacing the expectation by some other reference point, for instance the risk free rate, which gives the following modification of (1.1),

$$E(\min\{0, K - r\})^2,$$

and eliminates the above unwanted feature. Instead of the risk free rate, one can also consider the cost of capital, i.e. the return required by the investor.
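The effect of the reference point can be seen numerically on the data of Example 1.6. In the Python sketch below the function `semi_variance` is our own illustration of formula (1.1), and the fixed reference level of 15% is a hypothetical required return chosen only for the comparison.

```python
def semi_variance(returns, probs, reference=None):
    """E(min{0, K - ref})^2 for a discrete return K.
    With reference=None the expected return is used, as in (1.1)."""
    if reference is None:
        reference = sum(k * p for k, p in zip(returns, probs))
    return sum(min(0.0, k - reference) ** 2 * p for k, p in zip(returns, probs))

probs = [0.5, 0.5]
K = [0.10, 0.20]       # returns of Example 1.6
K_mod = [0.10, 0.30]   # the modified (better) returns

# Reference point = expected return: the better version looks riskier.
sv_K = semi_variance(K, probs)
sv_K_mod = semi_variance(K_mod, probs)
print(sv_K, sv_K_mod)

# Fixed reference point (hypothetical required return of 15%):
# both versions now carry the same downside risk.
r = 0.15
sv_K_fixed = semi_variance(K, probs, r)
sv_K_mod_fixed = semi_variance(K_mod, probs, r)
print(sv_K_fixed, sv_K_mod_fixed)
```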
These versions are not very popular in the financial world, the variance being the basic
measure of risk. This is explained by its simplicity, in particular differentiability, which gives
it the advantage that one can use calculus to solve minimisation problems. In addition, if the
probability distribution of a quantity is Gaussian (normal), the variance and the expectation
determine the whole distribution.
The Gaussian distributions are important due to the central limit theorem, since they naturally emerge as limits of some discrete cases. In Example 1.2 we have seen a typical application of Gaussian distributions for stocks with lognormal distribution. For such distributions it is convenient to use the logarithmic return

$$k = \ln\left(\frac{S(1)}{S(0)}\right),$$

and consider its expected return and standard deviation for portfolio analysis.
In our presentation of portfolio theory we follow the historical tradition and take variance as
the risk measure. It is however possible to develop a version of the theory for some alternative
risk measure. In most cases though such theory does not produce neat analytic formulae as is
the case for mean and variance.

1.4 Portfolios consisting of two assets
We begin a discussion of portfolio risk and expected return in the simplest situation of two
risky securities. We denote the prices of the securities as S 1 and S 2 . We start with a motivating
example.

Example 1.7
Let $\Omega = \{\omega_1, \omega_2\}$, $S_1(0) = 200$, $S_2(0) = 300$. Assume that

$$P(\{\omega_1\}) = P(\{\omega_2\}) = \tfrac{1}{2},$$

and that

$$S_1(1, \omega_1) = 260, \qquad S_2(1, \omega_1) = 270,$$
$$S_1(1, \omega_2) = 180, \qquad S_2(1, \omega_2) = 360.$$
The expected return and standard deviation for the two assets are

µ1 = 10%, µ2 = 5%
σ1 = 20%, σ2 = 15%.

Assume that we spend V(0) = 500, buying a single share of stock S 1 and a single share
of stock S 2 . At time 1 we will have

V(1, ω1 ) = 260 + 270 = 530,


V(1, ω2 ) = 180 + 360 = 540.

The expected return on the investment is 7% and the standard deviation is just 1%. We
can see that by diversifying the investment into two stocks we have considerably reduced
the risk.

See Video.
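The figures quoted in Example 1.7 can be recomputed directly from the scenario prices. The Python sketch below is only an illustration; the helper `mean_and_std` is our own.

```python
import math

def mean_and_std(values, probs):
    """Expectation and standard deviation of a discrete random variable."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, math.sqrt(var)

probs = [0.5, 0.5]
s1_0, s2_0 = 200.0, 300.0
s1_1 = [260.0, 180.0]   # S1(1) in scenarios omega1, omega2
s2_1 = [270.0, 360.0]   # S2(1) in scenarios omega1, omega2

# Returns of the two stocks in each scenario.
k1 = [(s - s1_0) / s1_0 for s in s1_1]
k2 = [(s - s2_0) / s2_0 for s in s2_1]

# One share of each stock: V(0) = 500.
v0 = s1_0 + s2_0
kv = [(a + b - v0) / v0 for a, b in zip(s1_1, s2_1)]

mu1, sigma1 = mean_and_std(k1, probs)
mu2, sigma2 = mean_and_std(k2, probs)
mu_v, sigma_v = mean_and_std(kv, probs)
print(mu1, sigma1, mu2, sigma2, mu_v, sigma_v)
```

The portfolio's standard deviation comes out far below that of either stock, because the two stocks move in opposite directions in each scenario.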

From the example we see that the risk can be reduced by diversification. In this section we
discuss how to minimize risk when investing in two stocks.

1.5 Return
Suppose that we buy $x_1$ shares of stock $S_1$ and $x_2$ shares of stock $S_2$. The initial value of this portfolio is

$$V_{x_1,x_2}(0) = x_1 S_1(0) + x_2 S_2(0).$$

When we design a portfolio, usually its initial value is the starting point of our considerations and it is given. The decision on the number of shares is secondary and follows from the decision on the percentage division of our wealth. This can be expressed by means of the weights defined by

$$w_1 = \frac{x_1 S_1(0)}{V_{x_1,x_2}(0)}, \qquad w_2 = \frac{x_2 S_2(0)}{V_{x_1,x_2}(0)}. \tag{1.2}$$

If the initial wealth V(0) and the weights $w_1, w_2$, $w_1 + w_2 = 1$, are given, the funds allocated to a particular stock are $w_1 V(0)$, $w_2 V(0)$, respectively, and the numbers of shares we receive are

$$x_1 = \frac{w_1 V(0)}{S_1(0)}, \qquad x_2 = \frac{w_2 V(0)}{S_2(0)}.$$

At the end of the period the securities prices change, which gives the final value of the portfolio as a random variable

$$V_{x_1,x_2}(1) = x_1 S_1(1) + x_2 S_2(1).$$

To express the return on a portfolio we shall employ the weights rather than the numbers of shares since this is more convenient.
The return on the investment in two assets depends on the method of allocation of the funds (the weights) and the corresponding returns. The vector of weights will be denoted by $w = (w_1, w_2)$, or in matrix notation

$$w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},$$

and the return of the corresponding portfolio by $K_w$.

Proposition 1.8
The return Kw on a portfolio consisting of two securities is the weighted average

Kw = w1 K1 + w2 K2 , (1.3)

where w1 and w2 are the weights and K1 and K2 the returns on the two components.

Proof
With the numbers of shares computed as above, we have the following formula for the value of the portfolio

$$\begin{aligned}
V_{x_1,x_2}(1) &= x_1 S_1(1) + x_2 S_2(1) \\
&= \frac{w_1 V_{x_1,x_2}(0)}{S_1(0)}\, S_1(0)(1 + K_1) + \frac{w_2 V_{x_1,x_2}(0)}{S_2(0)}\, S_2(0)(1 + K_2) \\
&= V_{x_1,x_2}(0)\,\bigl(w_1(1 + K_1) + w_2(1 + K_2)\bigr) \\
&= V_{x_1,x_2}(0)\,(1 + w_1 K_1 + w_2 K_2),
\end{aligned}$$

hence

$$K_w = \frac{V_{x_1,x_2}(1) - V_{x_1,x_2}(0)}{V_{x_1,x_2}(0)} = w_1 K_1 + w_2 K_2.$$

Here is a Video with an example.

In reality, the numbers of shares have to be integers. This, however, puts a constraint on
possible weights since not all percentage splits of our wealth can be realized. To simplify
matters we make an assumption that our stock position, that is, the number of shares, can be
any real number.
When the number of shares of a given stock is positive, we say that we have a long
position in the stock. We shall assume that we can also hold a negative number of shares of
stock. This is known as short selling. Short selling is a mechanism in which at time zero we
borrow stock and immediately sell it; we then need to buy it back at time one to return it to
the lender. This mechanism gives us additional money at time zero, that can be invested in a
different security.

Example 1.9
Consider the stocks S 1 and S 2 from Example 1.7. Suppose that at time zero we have
V(0) = 600. Suppose also that at time zero we borrow three shares of stock S 1 , meaning
that we choose x1 = −3. We sell the three shares of stock, which together with V(0)
gives us 3 · 200 + 600 = 1200 to invest in the second asset. We can thus take x2 = 4.
Note that
V x1 ,x2 (0) = x1 S 1 (0) + x2 S 2 (0) = 600 = V(0).
At time one we have the proceeds from holding four shares of S 2 , but we need to buy
back the three shares of S 1 at its market value. Since

V x1 ,x2 (1) = x1 S 1 (1) + x2 S 2 (1),

we see that

V x1 ,x2 (1, ω1 ) = −3 · 260 + 4 · 270 = 300,


V x1 ,x2 (1, ω2 ) = −3 · 180 + 4 · 360 = 900.

We can compute the weights using (1.2)

$$w_1 = \frac{-3 \cdot 200}{600} = -1, \qquad w_2 = \frac{4 \cdot 300}{600} = 2.$$

We see that, as expected, $w_1 + w_2 = 1$.

Here is an accompanying Video.
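Example 1.9 also provides a convenient check of Proposition 1.8 when one of the weights is negative. The Python sketch below, in which the function `weights_from_shares` is our own illustration of (1.2), computes the return of the portfolio in each scenario in two ways: directly from the portfolio values, and as the weighted average of the returns of the components.

```python
def weights_from_shares(shares, prices0):
    """Weights (1.2): value of each position divided by the portfolio value."""
    v0 = sum(x * s for x, s in zip(shares, prices0))
    return [x * s / v0 for x, s in zip(shares, prices0)]

shares = [-3.0, 4.0]             # short three shares of S1, long four of S2
prices0 = [200.0, 300.0]
prices1 = {"omega1": [260.0, 270.0], "omega2": [180.0, 360.0]}

w = weights_from_shares(shares, prices0)
v0 = sum(x * s for x, s in zip(shares, prices0))

direct, weighted = {}, {}
for scenario, p1 in prices1.items():
    v1 = sum(x * s for x, s in zip(shares, p1))
    direct[scenario] = (v1 - v0) / v0
    k = [(s1 - s0) / s0 for s1, s0 in zip(p1, prices0)]
    weighted[scenario] = sum(wi * ki for wi, ki in zip(w, k))

print(w)         # [-1.0, 2.0]
print(direct)
print(weighted)
```

In both scenarios the two computations agree: a loss of 50% in the first scenario and a gain of 50% in the second.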

When short selling is allowed, we assume that the weights can be any real numbers whose
sum is one. In real markets short selling comes with restrictions. To take a short position a
trader usually needs to pay a lending fee or to make a deposit. Throughout the discussion we
make a simplifying assumption that short selling is free of such charges. Since not all real
markets allow short selling, sometimes we distinguish a special case, where all the weights are
non-negative.

1.6 Feasible set


Finding the risk of a portfolio requires, apart from the risks of the components and the weights,
some knowledge about their statistical relationship.
Recall from probability theory the notion of covariance of two random variables X, Y:

$$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y), \tag{1.4}$$

with Cov(X, X) = Var(X) in particular.

Here is a Video with an example.

Let us introduce the following notation

$$\sigma_{ij} = \mathrm{Cov}(K_i, K_j),$$

for i, j = 1, 2. In particular,

$$\sigma_{11} = \mathrm{Cov}(K_1, K_1) = \mathrm{Var}(K_1) = \sigma_1^2,$$
$$\sigma_{22} = \mathrm{Cov}(K_2, K_2) = \mathrm{Var}(K_2) = \sigma_2^2.$$

From (1.4) we see that

$$\sigma_{12} = \sigma_{21}.$$

If the returns are independent, then we have $\sigma_{12} = 0$.
For convenience, the so-called correlation coefficient is also introduced

$$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}. \tag{1.5}$$

For this to make sense we have to assume that the variances of both returns are non-zero. A variance is zero in one case only, namely when the random variable is constant (almost surely). So we assume that the returns on stocks are genuine, non-constant, random variables, unlike bonds, where the return is the same in each state (scenario).
Since $|\sigma_{ij}| \le \sigma_i \sigma_j$ (which is a particular case of the Schwarz inequality), the correlation coefficient satisfies

$$-1 \le \rho_{ij} \le 1.$$
This makes correlation a good coefficient to measure dependence. If the correlation coefficient
is close to one or minus one, then there is a strong influence of one variable on the other. It is
more difficult to make such assertions by looking at covariance only. We can have examples
in which the covariance is a small number and yet the dependence is strong, or the other way
around.
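As an illustration of (1.4) and (1.5), the Python sketch below computes the covariance and the correlation coefficient for the returns of the two stocks of Example 1.7. The function names are our own; the scenario returns 30%, −10% and −10%, 20% follow from the prices given in that example.

```python
import math

def covariance(x, y, probs):
    """Cov(X, Y) = E[(X - E(X))(Y - E(Y))] for discrete X, Y
    defined on the same finite sample space."""
    ex = sum(a * p for a, p in zip(x, probs))
    ey = sum(b * p for b, p in zip(y, probs))
    return sum((a - ex) * (b - ey) * p for a, b, p in zip(x, y, probs))

def correlation(x, y, probs):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); both variances must be non-zero."""
    sx = math.sqrt(covariance(x, x, probs))
    sy = math.sqrt(covariance(y, y, probs))
    return covariance(x, y, probs) / (sx * sy)

probs = [0.5, 0.5]
k1 = [0.30, -0.10]   # returns on S1 in omega1, omega2
k2 = [-0.10, 0.20]   # returns on S2 in omega1, omega2

sigma12 = covariance(k1, k2, probs)
rho12 = correlation(k1, k2, probs)
print(sigma12, rho12)
```

The covariance −0.03 is a small number, yet the correlation coefficient equals −1, the strongest possible negative dependence.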

Theorem 1.10
The expected return and the variance of the return on a portfolio are given by

$$\mu_w = E(K_w) = w_1 \mu_1 + w_2 \mu_2, \tag{1.6}$$
$$\sigma_w^2 = \mathrm{Var}(K_w) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12}. \tag{1.7}$$

Here is a Video with the proof.

Proof
Equation (1.6) follows directly from (1.3) and linearity of mathematical expectation

$$\mu_w = E(K_w) = E(w_1 K_1 + w_2 K_2) = w_1 E(K_1) + w_2 E(K_2).$$

We wish to compute the variance of the return on a portfolio of two stocks:

$$\sigma_w^2 = E(K_w^2) - \mu_w^2.$$

Substituting (1.3) and (1.6), and using (1.4) in the last equality, gives

$$\begin{aligned}
\sigma_w^2 &= E(w_1^2 K_1^2 + w_2^2 K_2^2 + 2 w_1 w_2 K_1 K_2) - w_1^2 \mu_1^2 - w_2^2 \mu_2^2 - 2 w_1 w_2 \mu_1 \mu_2 \\
&= w_1^2 [E(K_1^2) - \mu_1^2] + w_2^2 [E(K_2^2) - \mu_2^2] + 2 w_1 w_2 [E(K_1 K_2) - \mu_1 \mu_2] \\
&= w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12}.
\end{aligned}$$

Corollary 1.11
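Using (1.5) we can rewrite the formula for variance of a portfolio as

$$\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho_{12} \sigma_1 \sigma_2. \tag{1.8}$$

Corollary 1.12
Using the following matrix notation

$$w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix},$$

equations (1.6–1.7) can be written as

$$\mu_w = w^T \mu, \tag{1.9}$$
$$\sigma_w^2 = w^T C w. \tag{1.10}$$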

Here we have a Video of how the formulae can be applied in Excel.
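The same formulae are easy to apply outside a spreadsheet. The Python sketch below evaluates (1.9) and (1.10) with plain lists and compares the result with (1.8). The data are those of Example 1.7 (weights 0.4 and 0.6 correspond to one share of each stock); the value ρ12 = −1 is computed from the scenario returns of that example and is not stated there explicitly. The function names are our own.

```python
def portfolio_mean(w, mu):
    """mu_w = w^T mu."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def portfolio_variance(w, cov):
    """sigma_w^2 = w^T C w, with C given as a list of rows."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))

mu = [0.10, 0.05]
sigma1, sigma2, rho12 = 0.20, 0.15, -1.0
sigma12 = rho12 * sigma1 * sigma2
C = [[sigma1 ** 2, sigma12],
     [sigma12, sigma2 ** 2]]

w = [0.4, 0.6]   # 200/500 and 300/500
mu_w = portfolio_mean(w, mu)
var_w = portfolio_variance(w, C)

# The same variance from formula (1.8).
var_w_formula = (w[0] ** 2 * sigma1 ** 2 + w[1] ** 2 * sigma2 ** 2
                 + 2 * w[0] * w[1] * rho12 * sigma1 * sigma2)
print(mu_w, var_w, var_w_formula)
```

Up to rounding, the expected return is 7% and the variance is 0.0001, that is, a standard deviation of 1%, in agreement with Example 1.7.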

The collection of all portfolios that can be manufactured by means of two given assets (in
other words, the feasible, or attainable set) can be conveniently depicted in the (σ, µ)-plane.

Figure 1.2 Attainable set.

Figure 1.3 Portfolio lines for various values of ρ12 .

Assume that $\sigma_1 \le \sigma_2$ and $\mu_1 \ne \mu_2$ (let $\mu_1 < \mu_2$ for instance). Take the first weight as a parameter, writing $w = w_1$. Hence $w_2 = 1 - w$, the vector of weights is $(w, 1 - w)$, and the expected return and variance of the portfolio as functions of w have the form

$$\mu_w = w \mu_1 + (1 - w) \mu_2, \tag{1.11}$$
$$\sigma_w^2 = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2 w (1 - w) \rho_{12} \sigma_1 \sigma_2.$$

The attainable set is therefore a curve parameterized by w. An example of such set is depicted
in Figure 1.2 (see Excel). If short selling is not allowed we restrict our attention to the segment
corresponding to w ∈ [0, 1]. This is the thicker part of the curve in Figure 1.2.
The shape of the line depends on the correlation coefficient ρ12 . This is shown in Figure 1.3
(see Excel). We see that for negative ρ12 we can reduce the risk of the portfolio, at the same
time achieving an expected return between the expected returns of the two risky assets.
Suppose that the position of the two basis securities is such as in Figure 1.4, namely one
dominates the other. Depending on the correlation coefficient the portfolios manufactured may
give the investor extra choice, for instance we may obtain the portfolios whose risk is lower
than the risk of any of the individual assets. This shows that rejecting the dominated security
would be a bad decision.
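The influence of ρ12 on the portfolio line can also be explored numerically. The following Python sketch traces the points (σw, µw) for w ∈ [0, 1] and reports the smallest risk attainable without short selling for a few values of ρ12. The asset data are hypothetical, chosen only for illustration, and the function names are our own.

```python
import math

def attainable_point(w, mu1, mu2, sigma1, sigma2, rho):
    """(sigma_w, mu_w) for the portfolio with weights (w, 1 - w)."""
    mu_w = w * mu1 + (1 - w) * mu2
    var_w = (w ** 2 * sigma1 ** 2 + (1 - w) ** 2 * sigma2 ** 2
             + 2 * w * (1 - w) * rho * sigma1 * sigma2)
    # max() guards against a tiny negative value caused by rounding
    return math.sqrt(max(var_w, 0.0)), mu_w

def portfolio_line(mu1, mu2, sigma1, sigma2, rho,
                   w_min=0.0, w_max=1.0, steps=100):
    """Points of the attainable set for equally spaced weights."""
    ws = [w_min + (w_max - w_min) * i / steps for i in range(steps + 1)]
    return [attainable_point(w, mu1, mu2, sigma1, sigma2, rho) for w in ws]

# Hypothetical data, not taken from the notes.
mu1, mu2, sigma1, sigma2 = 0.05, 0.10, 0.15, 0.20

smallest_risk = {}
for rho in (-0.5, 0.0, 0.5):
    line = portfolio_line(mu1, mu2, sigma1, sigma2, rho)
    smallest_risk[rho] = min(sigma for sigma, _ in line)
    print(rho, round(smallest_risk[rho], 4))
```

Setting `w_min` below 0 or `w_max` above 1 extends the curve to portfolios that involve short selling.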
From (1.11) we see that $\mu_w$ is affine, and $\sigma_w^2$ is a quadratic function with respect to w. Since a graph of the root of a quadratic function is a hyperbola, one can guess that the attainable set consisting of all points $(\sigma_w, \mu_w)$ is likely to be a hyperbola (see Excel).

Figure 1.4 Portfolio line with one asset dominating the other.

Figure 1.5 A hyperbola $\frac{(x-h)^2}{a^2} - \frac{(y-k)^2}{b^2} = 1$.

Here is a Video describing the change of the shape of the hyperbola depending on the choice of $\rho_{12}$.

Theorem 1.13
If $\mu_1 \ne \mu_2$ and $\rho_{12} \in (-1, 1)$, then the feasible set is a hyperbola with a center on the vertical axis.

Here is a Video with a sketch of the proof, focusing on the main idea.

Proof
For better clarity we change the notation, introducing the letters x, y for the coordinates so that we have the following description of the feasible set:

$$y = w \mu_1 + (1 - w) \mu_2, \tag{1.12}$$
$$x^2 = w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2 w (1 - w) \sigma_{12}. \tag{1.13}$$

The goal of further computations is to convert the above system of equations to the form

$$\frac{(x - h)^2}{a^2} - \frac{(y - k)^2}{b^2} = 1, \tag{1.14}$$

from which we will be able to read the properties of the hyperbola (see Figure 1.5).
Solving (1.12) for w

$$w = \frac{y - \mu_2}{\mu_1 - \mu_2}$$

(note the relevance of the assumption $\mu_1 \ne \mu_2$) and inserting into (1.13), we get

$$x^2 = \frac{1}{A}\bigl[(y - \mu_2)^2 \sigma_1^2 + (\mu_1 - y)^2 \sigma_2^2 + 2 (y - \mu_2)(\mu_1 - y) \sigma_{12}\bigr],$$

where $A = (\mu_1 - \mu_2)^2 > 0$. Simple computation gives

$$x^2 = \frac{1}{A}\bigl[B y^2 - 2 C y + D\bigr], \tag{1.15}$$

where

$$B = \sigma_1^2 + \sigma_2^2 - 2 \sigma_{12},$$
$$C = \sigma_1^2 \mu_2 + \sigma_2^2 \mu_1 - \sigma_{12} (\mu_1 + \mu_2),$$
$$D = \sigma_1^2 \mu_2^2 + \sigma_2^2 \mu_1^2 - 2 \sigma_{12} \mu_1 \mu_2.$$

Observe that B > 0 if $\rho_{12} < 1$, since $\sigma_1^2 + \sigma_2^2 - 2 \sigma_{12} > \sigma_1^2 + \sigma_2^2 - 2 \sigma_1 \sigma_2 \ge 0$.
Let us observe that

$$\begin{aligned}
B y^2 - 2 C y + D &= B \left( y^2 - 2 \frac{C}{B} y + \frac{D}{B} \right) \\
&= B \left[ \left( y - \frac{C}{B} \right)^2 - \frac{C^2}{B^2} + \frac{D}{B} \right] \\
&= B (y - k)^2 + c
\end{aligned}$$

with $k = \frac{C}{B}$ and $c = \frac{1}{B}\left(BD - C^2\right)$. Substituting into (1.15) gives

$$x^2 = \frac{1}{A}\bigl[B (y - k)^2 + c\bigr],$$

hence

$$\frac{x^2}{c/A} - \frac{(y - k)^2}{c/B} = 1. \tag{1.16}$$

We can see that we have obtained the desired hyperbola equation (1.14), with h = 0, meaning that the center of the hyperbola lies on the vertical axis (see Figure 1.5).
One loose end to tie up is to show that $c \ne 0$. Otherwise we would have a division by zero in (1.16). A simple but tedious computation shows that

$$BD - C^2 = A \sigma_1^2 \sigma_2^2 (1 - \rho_{12}^2).$$

Since $\rho_{12} \in (-1, 1)$, B > 0 and A > 0,

$$c = \frac{1}{B}\left(BD - C^2\right) = \frac{A}{B}\, \sigma_1^2 \sigma_2^2 (1 - \rho_{12}^2) > 0.$$

Figure 1.6 Efficient frontier.
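The "simple but tedious computation" in the proof of Theorem 1.13 can be supported by a numerical check. The Python sketch below computes A, B, C, D, k and c as defined in the proof and verifies that portfolios with various weights satisfy equation (1.16). The asset data are hypothetical and the function names are our own; this is a sanity check, not a replacement for the proof.

```python
def hyperbola_constants(mu1, mu2, sigma1, sigma2, rho):
    """Constants A, B, k, c from the proof of Theorem 1.13."""
    s12 = rho * sigma1 * sigma2
    A = (mu1 - mu2) ** 2
    B = sigma1 ** 2 + sigma2 ** 2 - 2 * s12
    C = sigma1 ** 2 * mu2 + sigma2 ** 2 * mu1 - s12 * (mu1 + mu2)
    D = sigma1 ** 2 * mu2 ** 2 + sigma2 ** 2 * mu1 ** 2 - 2 * s12 * mu1 * mu2
    k = C / B
    c = (B * D - C ** 2) / B
    return A, B, k, c

def hyperbola_lhs(w, mu1, mu2, sigma1, sigma2, rho):
    """Left-hand side of (1.16) at the portfolio with weights (w, 1 - w)."""
    A, B, k, c = hyperbola_constants(mu1, mu2, sigma1, sigma2, rho)
    y = w * mu1 + (1 - w) * mu2
    x2 = (w ** 2 * sigma1 ** 2 + (1 - w) ** 2 * sigma2 ** 2
          + 2 * w * (1 - w) * rho * sigma1 * sigma2)
    return x2 / (c / A) - (y - k) ** 2 / (c / B)

# Hypothetical data: mu1, mu2, sigma1, sigma2, rho12.
params = (0.05, 0.10, 0.15, 0.20, 0.3)
values = [hyperbola_lhs(w, *params) for w in (-1.0, 0.0, 0.25, 0.5, 1.0, 2.0)]
print(values)   # each entry should be 1 up to rounding
```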

We shall return to the above discussion later on, when we work with n assets. It will come as a surprise that from the point of view of technical difficulties, the general case will be simpler than the particular situation just worked out, where only two assets are involved. It will also turn out that the case of many assets reduces to the case of just two, and we will be able to draw from the discussion of the present chapter valuable conclusions that remain valid in the general case.
In practice we can reject some of these portfolios drawing on the basic preference prop-
erty, namely, given two portfolios with the same risk, the one with higher expected return is
preferable. So we may discard the lower part of the curve restricting our attention to the up-
per, called the efficient set or frontier, as shown in Figure 1.6. More precisely, a portfolio is
called efficient if there is no other portfolio, except itself, that dominates it. The set of efficient
portfolios among all attainable portfolios is called the efficient frontier.

Figure 1.7 Attainable set for ρ12 = ±1.

1.7 Special cases

Here is a Video with an overview of the two special cases.

Our first special case is when $\rho_{12} = -1$. From (1.8),

$$\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 - 2 w_1 w_2 \sigma_1 \sigma_2 = (w_1 \sigma_1 - w_2 \sigma_2)^2,$$

hence

$$\sigma_w = |w_1 \sigma_1 - w_2 \sigma_2|.$$

Since $\sigma_w$ is non-negative, the smallest possible value is $\sigma_w = 0$. Taking $w_1 = w$ and $w_2 = 1 - w$ gives

$$\sigma_w = |w \sigma_1 - (1 - w) \sigma_2|, \tag{1.17}$$

and we can solve for $\sigma_w = 0$, obtaining

$$w = \frac{\sigma_2}{\sigma_1 + \sigma_2}, \qquad 1 - w = \frac{\sigma_1}{\sigma_1 + \sigma_2}. \tag{1.18}$$

Since $\sigma_1, \sigma_2 \ge 0$, we can see that w ∈ [0, 1], hence we can minimize our risk to zero without short–selling.
From (1.17) and (1.11) one can show that the attainable set consists of two half lines, emanating from the vertical axis (see Figure 1.7).
Our second case is $\rho_{12} = 1$. Then

$$\sigma_w^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_1 \sigma_2 = (w_1 \sigma_1 + w_2 \sigma_2)^2,$$

and

$$\sigma_w = |w_1 \sigma_1 + w_2 \sigma_2|.$$

Figure 1.8 Portfolio line for one risky and one risk–free security.

Similarly to the previous case, we obtain $\sigma_w = 0$ for

$$w_1 = \frac{-\sigma_2}{\sigma_1 - \sigma_2}, \qquad w_2 = \frac{\sigma_1}{\sigma_1 - \sigma_2}. \tag{1.19}$$

This requires that $\sigma_1 \ne \sigma_2$; we exclude the trivial case $\sigma_1 = \sigma_2$. Since $\sigma_1, \sigma_2 \ge 0$, either w or 1 − w has to be negative, hence we cannot minimize risk to zero without short–selling. Without short–selling the smallest risk is attained either at w = 0 or at w = 1.

Here is a Video showing the derivation of the weights for which σw = 0 for the two
special cases.
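The weights (1.18) and (1.19) can be checked with a few lines of Python. In the sketch below the function names are our own. For ρ12 = ±1 the standard deviation of the portfolio is |wσ1 + ρ12(1 − w)σ2|, which covers both special cases at once. The values σ1 = 20%, σ2 = 15% are those of Example 1.7.

```python
def zero_risk_weight(sigma1, sigma2, rho):
    """Weight w of the first asset giving sigma_w = 0 when rho is -1 or 1."""
    if rho == -1:
        return sigma2 / (sigma1 + sigma2)          # formula (1.18)
    if rho == 1:
        if sigma1 == sigma2:
            raise ValueError("rho = 1 and sigma1 = sigma2: risk cannot be removed")
        return -sigma2 / (sigma1 - sigma2)         # formula (1.19)
    raise ValueError("this sketch only covers rho = -1 and rho = 1")

def sigma_special(w, sigma1, sigma2, rho):
    """sigma_w = |w sigma1 + rho (1 - w) sigma2|, valid for rho = -1 or 1."""
    return abs(w * sigma1 + rho * (1 - w) * sigma2)

sigma1, sigma2 = 0.20, 0.15

w_minus = zero_risk_weight(sigma1, sigma2, -1)
w_plus = zero_risk_weight(sigma1, sigma2, 1)
print(w_minus, sigma_special(w_minus, sigma1, sigma2, -1))
print(w_plus, sigma_special(w_plus, sigma1, sigma2, 1))
```

For ρ12 = −1 the weight lies in [0, 1], while for ρ12 = 1 it is negative, so the first asset has to be sold short.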

Finally, consider a particular case where one of the assets is risk free, $\sigma_1 = 0$, say. The return on this asset is sure, $\mu_1 = r$, and a reasonable assumption is that $r < \mu_2$, since otherwise risk averse investors would never invest in the risky asset, its price should fall and so the expected return should grow above the risk free level. (The preferences of investors will be discussed in more detail later.) The return and risk for portfolios take a simplified form

$$\mu_w = w_1 r + w_2 \mu_2, \qquad \sigma_w^2 = w_2^2 \sigma_2^2,$$

giving

$$\sigma_w = |w_2|\, \sigma_2,$$

and so the set in the (σ, µ)-plane is as shown in Figure 1.8, see Excel (with redundant lower part according to the preference relation).
The segment between the risk free asset and the asset characterized by (σ2 , µ2 ) corresponds
to positive weights. The line above (σ2 , µ2 ) requires taking short position in the risk free asset,
in other words, borrowing at the riskless rate (which we assume here to be possible). The
rejected lower segment shows portfolios with a short position in the risky asset.

1.8 Minimum variance portfolio
We wish to minimize the variance σ2w - or, equivalently, the standard deviation σw . We start
with a theorem where the problem is solved in the case when short–selling is allowed.

Theorem 1.14
If short selling is allowed, then the portfolio with minimum variance has the weights
wmin = (w1 , w2 ) with
a b
w1 = , w2 = ,
a+b a+b
where

a = σ22 − ρ12 σ1 σ2 ,
b = σ21 − ρ12 σ1 σ2 ,

unless both ρ12 = 1 and σ1 = σ2 .

Here is a Video with the proof.

Proof
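When ρ12 = −1, then from (1.18)
\[ w_1 = \frac{\sigma_2}{\sigma_1+\sigma_2} = \frac{\sigma_2(\sigma_1+\sigma_2)}{(\sigma_1+\sigma_2)^2} = \frac{a}{a+b}. \]
Similarly, for ρ12 = 1, using (1.19)
\[ w_1 = \frac{-\sigma_2}{\sigma_1-\sigma_2} = \frac{-\sigma_2(\sigma_1-\sigma_2)}{(\sigma_1-\sigma_2)^2} = \frac{a}{a+b}. \]
When ρ12 ∈ (−1, 1),
\[ \sigma_w^2 = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho_{12}\sigma_1\sigma_2 \]
is a quadratic function of w. We compute the derivative of \(\sigma_w^2\) with respect to w and equate it
to 0:
\[ 2w\sigma_1^2 - 2(1-w)\sigma_2^2 + 2(1-w)\rho_{12}\sigma_1\sigma_2 - 2w\rho_{12}\sigma_1\sigma_2 = 0. \]
Solving for w gives the above result. The second derivative is positive,
\[ 2\sigma_1^2 + 2\sigma_2^2 - 4\rho_{12}\sigma_1\sigma_2 > 2\sigma_1^2 + 2\sigma_2^2 - 4\sigma_1\sigma_2 = 2(\sigma_1-\sigma_2)^2 \ge 0, \]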
which shows that we have a global minimum.
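
As a quick numerical check of Theorem 1.14, the following Python sketch computes the weights
a/(a + b) and b/(a + b) and compares the resulting variance with a brute-force scan over w. The
function names and the input values are illustrative assumptions, not part of the notes.

```python
def min_variance_weights(sigma1, sigma2, rho12):
    """Weights (w1, w2) from Theorem 1.14; short selling is allowed."""
    a = sigma2 ** 2 - rho12 * sigma1 * sigma2
    b = sigma1 ** 2 - rho12 * sigma1 * sigma2
    if a + b == 0:
        # Excluded case of the theorem: rho12 = 1 and sigma1 = sigma2.
        raise ValueError("a + b = 0: the formula does not apply")
    return a / (a + b), b / (a + b)


def portfolio_variance(w, sigma1, sigma2, rho12):
    """Variance of the portfolio (w, 1 - w)."""
    return (w ** 2 * sigma1 ** 2 + (1 - w) ** 2 * sigma2 ** 2
            + 2 * w * (1 - w) * rho12 * sigma1 * sigma2)


# Hypothetical inputs chosen only for illustration.
sigma1, sigma2, rho12 = 0.2, 0.3, 0.1
w1, w2 = min_variance_weights(sigma1, sigma2, rho12)
best = portfolio_variance(w1, sigma1, sigma2, rho12)

# Brute-force scan over w in [-2, 3]; no grid point should beat the formula.
grid = [k / 1000 for k in range(-2000, 3001)]
grid_min = min(portfolio_variance(w, sigma1, sigma2, rho12) for w in grid)
print(w1, w2, best, grid_min)
```

The scan is only a sanity check on a finite grid; the proof above is what guarantees the global minimum.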

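In Corollary 1.12 the return and variance of a given portfolio were stated in terms of the
covariance matrix
\[ C = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} \]
for the two assets. We now do the same for the weights of the minimum variance portfolio.
By Cramer's Rule
\[ C^{-1} = \frac{1}{\det C} \begin{bmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{bmatrix}, \]
so we have, writing \(\mathbf{1} = (1, 1)\),
\[ C^{-1}\mathbf{1} = \frac{1}{\det C} \begin{bmatrix} \sigma_2^2 - \sigma_{12} \\ \sigma_1^2 - \sigma_{12} \end{bmatrix}, \qquad
\mathbf{1}^T C^{-1} \mathbf{1} = \frac{1}{\det C} (\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}). \]
Since σ12 = ρ12 σ1 σ2 , the two entries of \(C^{-1}\mathbf{1}\) are a/det C and b/det C, with a and b as in
Theorem 1.14.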
We have proved the following

Corollary 1.15
The vector w = (w1 , w2 ) of weights of the minimum variance portfolio found in Theorem
1.14 has the form
\[ w = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1} \mathbf{1}}, \]
provided that the denominator is non-zero.

Here is a Video with an Excel example of the fact that the matrix formula does work.
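
The agreement between Corollary 1.15 and Theorem 1.14 can also be checked in a few lines of
Python. This is a minimal sketch; the helper names and the numerical inputs are assumptions made
for illustration.

```python
def inverse_2x2(C):
    """Inverse of a 2x2 matrix via the adjugate (Cramer's rule)."""
    (p, q), (r, s) = C
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[s / det, -q / det], [-r / det, p / det]]


def min_variance_weights_matrix(C):
    """w = C^{-1} 1 / (1^T C^{-1} 1) for a 2x2 covariance matrix C."""
    C_inv = inverse_2x2(C)
    C_inv_one = [sum(row) for row in C_inv]   # C^{-1} 1
    denom = sum(C_inv_one)                    # 1^T C^{-1} 1
    return [x / denom for x in C_inv_one]


# Hypothetical inputs chosen only for illustration.
sigma1, sigma2, rho12 = 0.2, 0.3, 0.1
sigma12 = rho12 * sigma1 * sigma2
C = [[sigma1 ** 2, sigma12], [sigma12, sigma2 ** 2]]

w_matrix = min_variance_weights_matrix(C)

# Weights from Theorem 1.14 for comparison.
a = sigma2 ** 2 - sigma12
b = sigma1 ** 2 - sigma12
w_theorem = [a / (a + b), b / (a + b)]
print(w_matrix, w_theorem)
```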

We now discuss what happens when short–selling is not allowed. We need to find the
minimum of
\[ \sigma_w^2 = w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\rho_{12}\sigma_1\sigma_2 \]
for restricted values of the weight 0 ≤ w ≤ 1. Let w1 be the coefficient from Theorem 1.14.
The claim is illustrated in Figure 1.9, where the bold parts correspond to portfolios with no
short selling. We can see that the smallest variance is attained at wmin = (w, 1 − w) with
\[ w = \begin{cases} 0 & \text{if } w_1 < 0, \\ w_1 & \text{if } w_1 \in [0, 1], \\ 1 & \text{if } w_1 > 1. \end{cases} \]

Hence, if the global minimum is outside [0, 1], an embargo on short-selling means that an
investor wishing to minimise his risk should put all his funds into one of the two assets.
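
The case distinction above amounts to clipping the unconstrained weight to the interval [0, 1].
A minimal Python sketch, with hypothetical inputs and a hypothetical function name:

```python
def min_variance_weight_no_short(sigma1, sigma2, rho12):
    """Weight w of asset 1 in the minimum variance portfolio (w, 1 - w), with 0 <= w <= 1."""
    a = sigma2 ** 2 - rho12 * sigma1 * sigma2
    b = sigma1 ** 2 - rho12 * sigma1 * sigma2
    w1 = a / (a + b)                 # unconstrained weight from Theorem 1.14
    return min(1.0, max(0.0, w1))    # clip to [0, 1]


# Hypothetical inputs: a low-risk asset 1 strongly correlated with asset 2,
# so the unconstrained minimum requires shorting asset 2.
w_clipped = min_variance_weight_no_short(0.1, 0.3, 0.8)

# Hypothetical inputs for which the unconstrained minimum already lies in [0, 1].
w_interior = min_variance_weight_no_short(0.2, 0.3, 0.1)
print(w_clipped, w_interior)
```

In the first case all funds go into asset 1, in line with the remark above.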

Figure 1.9 Smallest variance with short–selling restrictions.

Figure 1.10 Feasible set after adding a risk–free security.

1.9 Adding a risk–free security


All portfolios built of the risk free asset (with rate of return r) and any other asset are
represented by a straight half-line starting from (0, r) and passing through the point corresponding
to that asset on the (σ, µ) plane. The new feasible region is thus obtained by taking any point on the
attainable set and linking it with the risk free asset, as shown in Figure 1.10.

Here is a Video with an Excel example and a discussion.

To find the new efficient frontier we seek a line with the highest slope according to the
preference relation. Note that it is reasonable to make the following restriction: the risk–free
rate is smaller than the expected return of the risk minimizing portfolio. Under this assumption
there is a unique portfolio on the efficient frontier, called the market portfolio, such that the
line with the highest slope passes through it (see Figure 1.11). This optimal line, called
the Capital Market Line, is tangent to the efficient frontier (as follows from the elementary
geometric properties of hyperbolas). Denoting the expected return of the market portfolio by
µm and its risk by σm , the capital market line is given by
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\,\sigma. \tag{1.20} \]

Theorem 1.16
The weights of the market portfolio are \(\mathbf{m} = (w, 1 - w)\), with
\[ w = \frac{c}{c+d}, \qquad 1 - w = \frac{d}{c+d}, \tag{1.21} \]
where
\[ c = \sigma_2^2(\mu_1 - r) - \sigma_{12}(\mu_2 - r), \qquad d = \sigma_1^2(\mu_2 - r) - \sigma_{12}(\mu_1 - r). \]

Here is a Video with a sketch of the proof.

Proof
For a portfolio (w, 1 − w), we denote its expected return by µ(w), and standard deviation
by σ(w). Optimization is based on maximizing the slope coefficient:
\[ s(w) = \frac{\mu(w) - r}{\sigma(w)}. \]
To this end it is necessary and sufficient to solve
\[ s'(w) = 0 \]
(sufficiency follows from the uniqueness of the solution). We have
\[ s'(w) = \frac{\mu'(w)\sigma(w) - (\mu(w) - r)\sigma'(w)}{\sigma^2(w)}. \]
Since
\[ \sigma'(w) = \left(\sqrt{\sigma^2(w)}\right)' = \frac{1}{2\sqrt{\sigma^2(w)}}\,(\sigma^2(w))' = \frac{1}{2\sigma(w)}\,(\sigma^2(w))', \]
the equation s′(w) = 0 reduces to
\[ 2\mu'(w)\sigma^2(w) - (\mu(w) - r)(\sigma^2(w))' = 0, \]
that is
\[ (\mu_1 - \mu_2)\left(w^2\sigma_1^2 + (1-w)^2\sigma_2^2 + 2w(1-w)\sigma_{12}\right)
 - (w\mu_1 + (1-w)\mu_2 - r)\left(w\sigma_1^2 - (1-w)\sigma_2^2 + (1-2w)\sigma_{12}\right) = 0. \]
This is in fact a linear equation in w since all terms involving w² cancel out. Elementary,
but tedious computations give
\[ w = \frac{c}{c+d}, \qquad 1 - w = \frac{d}{c+d}. \]

Figure 1.11 The minimum variance portfolio MVP, the market portfolio MP, and the capital market
line CML.

Corollary 1.17
The formulae (1.21) for the weights of the market portfolio can be written in matrix
notation as
\[ \mathbf{m} = \frac{C^{-1}(\mu - r\mathbf{1})}{\mathbf{1}^T C^{-1}(\mu - r\mathbf{1})}, \tag{1.22} \]
where C is the covariance matrix, µ is the vector of expected returns, and \(\mathbf{1}\) is a vector
consisting of numbers 1 on both coordinates.

Here is a Video with an Excel example of the fact that the matrix formula does work.
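
The formulae (1.21) can be tried out numerically. The sketch below, written in Python with
hypothetical inputs and function names, computes the market portfolio and checks on a grid of
weights that no other portfolio gives a steeper line from (0, r).

```python
import math


def market_portfolio(mu1, mu2, var1, var2, cov12, r):
    """Weights (w, 1 - w) of the market portfolio from (1.21)."""
    c = var2 * (mu1 - r) - cov12 * (mu2 - r)
    d = var1 * (mu2 - r) - cov12 * (mu1 - r)
    return c / (c + d), d / (c + d)


def slope(w, mu1, mu2, var1, var2, cov12, r):
    """Slope (mu(w) - r) / sigma(w) of the line from (0, r) through the portfolio (w, 1 - w)."""
    mu = w * mu1 + (1 - w) * mu2
    var = w ** 2 * var1 + (1 - w) ** 2 * var2 + 2 * w * (1 - w) * cov12
    return (mu - r) / math.sqrt(var)


# Hypothetical inputs; r lies below the expected return of the minimum variance portfolio.
params = dict(mu1=0.10, mu2=0.15, var1=0.04, var2=0.09, cov12=0.006, r=0.05)
w_m, _ = market_portfolio(**params)
best_slope = slope(w_m, **params)

# No portfolio on a grid of weights in [-2, 3] should give a steeper line.
grid_max = max(slope(k / 1000, **params) for k in range(-2000, 3001))
print(w_m, best_slope, grid_max)
```

The entries c and d are, up to the common factor 1/det C, the coordinates of the vector
C⁻¹(µ − r1), which is why (1.21) and (1.22) agree.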

The following argument shows a possible practical relevance of the market portfolio.
Suppose that the market consists of two securities and suppose that the investors make their
decisions on the basis of the expected returns and the covariance matrix, assuming in addition
that they all use the same numerical values (returns, variances and covariance for the assets).
If they all behave rationally, they perform the above computations and all arrive at the same
market portfolio. They may choose different portfolios on the Capital Market Line, but they all
invest in the two given components in the same proportions. The conclusion is that the weights
of the market portfolio are given by the percentage value of all shares of each asset.
To see this consider an example. Asset A is represented by 1000 shares at 20 dollars each,
asset B by 500 shares at 40 dollars each, so each asset is worth 20 000 dollars in total and
represents 50% of the market. If the investors hold these assets in any other proportion, this leads
to a contradiction with the fact that they all should have the same portfolio. Should any investor
hold more than 50% in asset A, say, this would leave some other investors unsatisfied, since they
wish to get more A than is available, and to sell some unwanted B. This would result in excess
supply of B and excess demand for A, which would alter the prices, the expected returns and
consequently the weights of the market portfolio. For the above argument to be valid we have to
assume that the market is in equilibrium.

The assumptions needed to arrive at this conclusion are never fulfilled in real life (for
instance the assumption that all investors believe that the parameters of the assets are identical)
so the practical relevance of the result is rather limited.

Example 1.18
Assume that the covariance matrix C, the vector of expected returns µ, and the risk free
rate r are given. Assume also that an investor wishes to spend V and his aim is to achieve
an expected return equal to a given rate m. The question is how much he should spend
on the risky assets, and how much he should invest risk free.
First we compute the market portfolio \(\mathbf{m}\) using (1.21). We can then compute the expected
return of the market portfolio using (1.9)
\[ \mu_m = \mathbf{m}^T \mu. \]
Optimal investments lie on the capital market line. The investor needs to hold a combination
of the market portfolio and the risk-free security. We assume that he spends λV
on the market portfolio and (1 − λ) V risk free. The desired λ can be computed from the
expected return of the position
\[ \lambda\mu_m + (1 - \lambda) r = m, \]
giving
\[ \lambda = \frac{m - r}{\mu_m - r}. \]
Since the investor spends λV on the market portfolio, the vector
\[ \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \lambda V \mathbf{m} \]
gives us the amount v1 invested in the first asset, and v2 invested in the second asset. As
mentioned above, (1 − λ) V is invested risk free.

Here is a Video with an Excel example.
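
The steps of Example 1.18 can be sketched in Python as follows. The market portfolio weights
are taken as given here (in practice they would come from (1.21)); the function name, argument
names and all numbers are assumptions made for illustration.

```python
def allocate(V, target, r, mu, m):
    """Split the amount V between the market portfolio m and the risk-free asset.

    Returns (lam, risky, risk_free): the fraction lam placed in the market
    portfolio, the amounts held in each risky asset, and the risk-free amount.
    """
    mu_m = sum(m_j * mu_j for m_j, mu_j in zip(m, mu))   # mu_m = m^T mu
    if mu_m == r:
        raise ValueError("market portfolio return equals the risk-free rate")
    lam = (target - r) / (mu_m - r)
    risky = [lam * V * m_j for m_j in m]
    risk_free = (1 - lam) * V
    return lam, risky, risk_free


# Hypothetical inputs: the market portfolio weights m are assumed to be known already.
lam, risky, risk_free = allocate(V=1000.0, target=0.08, r=0.04,
                                 mu=[0.10, 0.14], m=[0.5, 0.5])
print(lam, risky, risk_free)
```

With these inputs the market portfolio has expected return 0.12, so half of V goes into the
market portfolio and half is invested risk free. A value of λ above 1 would mean borrowing at
the risk-free rate.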

1.10 Indifference curves


The preference relation does not help us choose between two assets, one with higher expected
return and higher risk, the other less risky but with lower return. It seems impossible to extend
the relation to solve this decision problem so that this extension would be accepted by all
investors. The relation is based on risk aversion, but the investors who, as assumed, share this
attitude may differ in the intensity of their aversion. An investor who is sensitive to risk may
require much higher returns as a compensation for increased exposure. Another investor may
be cornered, forced to accept risk to earn the return needed to fulfill the requirement created
by the circumstances, or may be just less sensitive to risk. It is inevitable that we have to allow
modelling individual preferences. So let us fix our attention on one particular investor, and
fix one particular asset (or portfolio of assets). We assume that this investor can answer the
following question: which assets are equally attractive as the fixed one? The answer provides
us with a certain set of assets. Since the preference relation is valid, the intersection of this
set with any line parallel to either of the axes can contain at most one element. So it is a graph of
an increasing function. We assume in addition that this function is convex for each investor
(in other words, to retain his peace of mind, the investor demands that a unit increase of risk
be offset by more than one unit increase in return, as shown in Figure 1.12), and we call it an
indifference curve.

Figure 1.12 An indifference curve for (σ1 , µ1 ).

Here is a Video with a discussion.

We assume that indifference curves are level sets of a function

u : R² → R.

We assume that a curve {u = c2 } lies above {u = c1 } for c1 < c2 . In other words, the higher the
value of u, the higher the investor’s satisfaction with the investment. Given a set of attainable
portfolios, an investor chooses the one placed on the best indifference curve. It is geometrically
obvious as a result of convexity of the curves that the optimal portfolio is at the tangency point
for some indifference curve, as shown in Figure 1.13.
For another investor, who is less risk averse, that is, who has less steep indifference curves,
the optimal portfolio may be different (see Figure 1.13). It lies further to the right, which agrees
with our intuition regarding the risk preferences of this investor.

Figure 1.13 Indifference curves and optimal investment for a risk averse investor (left), and for a
risk indifferent investor (right).

Example 1.19
Assume that the covariance matrix C, the vector of expected returns µ, and the risk free
rate r are given, and that an investor’s indifference curves are given by

u(σ, µ) = µ − aσ² − bσ.

We show how the investor should spend V to maximize u.


Using (1.22), (1.9) and (1.10) we can find the market portfolio \(\mathbf{m}\), its expected return µm
and variance \(\sigma_m^2\). Since the slope of an indifference curve {u = c}, which is 2aσ + b, needs
to match the slope of the capital market line, the tangency point can be found by solving the
system of two linear equations
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\,\sigma, \qquad 2a\sigma + b = \frac{\mu_m - r}{\sigma_m}. \]
This means that
\[ \mu = r + \frac{\mu_m - r}{\sigma_m} \cdot \frac{1}{2a}\left(\frac{\mu_m - r}{\sigma_m} - b\right). \]
We can now decide how to divide V amongst the assets using the same method as in
Example 1.18.
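
A minimal Python sketch of the computation in Example 1.19 follows. It assumes that µm and σm
have already been found, and that a > 0 so that the indifference curves are convex; the names and
numbers are hypothetical.

```python
def optimal_point_on_cml(mu_m, sigma_m, r, a, b):
    """Tangency point (sigma, mu) of an indifference curve of u with the capital market line."""
    if a <= 0:
        raise ValueError("a must be positive for the indifference curves to be convex")
    cml_slope = (mu_m - r) / sigma_m
    sigma = (cml_slope - b) / (2 * a)     # solves 2 a sigma + b = slope of the CML
    mu = r + cml_slope * sigma            # corresponding point on the capital market line
    return sigma, mu


def u(sigma, mu, a, b):
    """The function whose level sets are the indifference curves."""
    return mu - a * sigma ** 2 - b * sigma


# Hypothetical inputs chosen only for illustration.
mu_m, sigma_m, r, a, b = 0.12, 0.2, 0.04, 1.0, 0.1
sigma_opt, mu_opt = optimal_point_on_cml(mu_m, sigma_m, r, a, b)

# Fraction of wealth placed in the market portfolio, computed as in Example 1.18.
lam = (mu_opt - r) / (mu_m - r)
print(sigma_opt, mu_opt, lam)
```

With these inputs the tangency point is approximately (0.15, 0.10), so about three quarters of V
would be placed in the market portfolio.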

We shall return to indifference curves in section 3.5, where we will discuss their relation
with utility functions.

2 Many assets
2.1 Lagrange multipliers
The objective of this section is to show how to solve the following problem:
\[ \min f(v) \quad \text{under constraints:} \quad g(v) = 0, \tag{2.1} \]
where
\[ f : \mathbb{R}^n \to \mathbb{R}, \qquad g : \mathbb{R}^n \to \mathbb{R}^k. \]
Before writing out a solution, we need to introduce some notation.
To keep better track of dimensions, we use a bold font whenever we are dealing with
vectors, and normal font when dealing with numbers. Note that above we used f for a function
taking values in the reals and g for a function
\[ g(v) = (g_1(v), \ldots, g_k(v)) \]
taking values in \(\mathbb{R}^k\).
For a function \(f : \mathbb{R}^n \to \mathbb{R}\) and \(v \in \mathbb{R}^n\) we write ∇f(v) for the vector
\[ \nabla f(v) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(v) \\ \vdots \\ \frac{\partial f}{\partial x_n}(v) \end{bmatrix}. \]

Here is a Video discussing the intuition behind the notion of a gradient.

Theorem 2.1
If v∗ is a solution of the problem (2.1), and the derivative of g at v∗ is a matrix of rank k,
then there exists a sequence of numbers λ1 , . . . , λk ∈ R such that

∇ f (v∗ ) − (λ1 ∇g1 (v∗ ) + . . . + λk ∇gk (v∗ )) = 0. (2.2)

The λ1 , . . . , λk from Theorem 2.1 are referred to as Lagrange multipliers, and the function

L(v) = ∇ f (v) − (λ1 ∇g1 (v) + . . . + λk ∇gk (v))

is referred to as the Lagrangian.


We give the proof of Theorem 2.1 later on in the section, after introducing some necessary
preliminaries. Before doing so let us comment that Theorem 2.1 provides only the necessary
conditions. Even when (2.2) holds for some v∗ , it does not necessarily imply that v∗ is a
minimum. This is very similar in spirit to searching for a minimum of a function f : R → R.
The obvious candidate for a minimum is a point x∗ satisfying f′(x∗) = 0. It is of course not
enough that f′(x∗) = 0 for x∗ to be a minimum though. Some additional conditions need
to be checked. Similarly, Theorem 2.1 is a handy tool for finding a candidate for a solution
of problem (2.1). To prove that this candidate is indeed a solution usually requires some
additional work.
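
To make the role of condition (2.2) concrete, here is a small worked instance in Python. The
problem (minimise x² + 2y² subject to x + y − 1 = 0) is a hypothetical example chosen for
illustration, not one taken from these notes.

```python
def f(x, y):
    """Objective function of the illustrative problem."""
    return x ** 2 + 2 * y ** 2


def grad_f(x, y):
    return (2 * x, 4 * y)


GRAD_G = (1.0, 1.0)   # gradient of the constraint g(x, y) = x + y - 1

# Condition (2.2) reads 2x = lam and 4y = lam; with x + y = 1 this gives lam = 4/3.
lam = 4 / 3
x_star, y_star = lam / 2, lam / 4

# Residual of (2.2) at the candidate point; both entries should vanish.
residual = [df - lam * dg for df, dg in zip(grad_f(x_star, y_star), GRAD_G)]

# (2.2) is only a necessary condition, so we compare with other points on the constraint line.
values_on_constraint = [f(t / 100, 1 - t / 100) for t in range(-300, 401)]
print(x_star, y_star, residual, min(values_on_constraint), f(x_star, y_star))
```

Here the candidate (2/3, 1/3) does turn out to be the minimum, which is consistent with the
remark that the extra check is a separate step.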
The proof of Theorem 2.1 will rely on the implicit function theorem, which is a classical
result in analysis. We therefore write out the theorem without giving its proof. We first
introduce some notation.
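For
\[ g = (g_1, \ldots, g_k) : \mathbb{R}^l \times \mathbb{R}^m \to \mathbb{R}^k \]
and \((x, y) \in \mathbb{R}^l \times \mathbb{R}^m\), \(x = (x_1, \ldots, x_l)\) and \(y = (y_1, \ldots, y_m)\), we write
\(\frac{\partial g}{\partial x}\) and \(\frac{\partial g}{\partial y}\) for the matrices
\[ \frac{\partial g}{\partial x}(x, y) = \begin{bmatrix}
\frac{\partial g_1}{\partial x_1}(x, y) & \frac{\partial g_1}{\partial x_2}(x, y) & \cdots & \frac{\partial g_1}{\partial x_l}(x, y) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_k}{\partial x_1}(x, y) & \frac{\partial g_k}{\partial x_2}(x, y) & \cdots & \frac{\partial g_k}{\partial x_l}(x, y)
\end{bmatrix}, \]
\[ \frac{\partial g}{\partial y}(x, y) = \begin{bmatrix}
\frac{\partial g_1}{\partial y_1}(x, y) & \frac{\partial g_1}{\partial y_2}(x, y) & \cdots & \frac{\partial g_1}{\partial y_m}(x, y) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_k}{\partial y_1}(x, y) & \frac{\partial g_k}{\partial y_2}(x, y) & \cdots & \frac{\partial g_k}{\partial y_m}(x, y)
\end{bmatrix}. \]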

Here is a Video with an example.

Theorem 2.2 Implicit function theorem

Consider n > k and a C¹ function
\[ g = (g_1, \ldots, g_k) : \mathbb{R}^{n-k} \times \mathbb{R}^k \to \mathbb{R}^k. \]
Assume that for a point \((x^*, y^*) \in \mathbb{R}^{n-k} \times \mathbb{R}^k\)
\[ g(x^*, y^*) = 0, \]
and that the matrix \(\frac{\partial g}{\partial y}(x^*, y^*)\) is invertible. Then there exist a neighbourhood
\(U \times V \subset \mathbb{R}^{n-k} \times \mathbb{R}^k\) of \((x^*, y^*)\) and a C¹ function
\[ h : U \to V \]
such that
\[ g(x, h(x)) = 0 \quad \text{for all } x \in U. \]
Moreover, for any v ∈ U × V, if g(v) = 0, then v = (x, h(x)) for some x ∈ U.

Here is a Video with some intuition behind the implicit function theorem.

Corollary 2.3
For the function h from Theorem 2.2
\[ h'(x) = -\left(\frac{\partial g}{\partial y}(x, h(x))\right)^{-1} \frac{\partial g}{\partial x}(x, h(x)). \]

Proof
Since g(x, h(x)) = 0, by computing a derivative with respect to x we obtain
\[ \frac{\partial g}{\partial x}(x, h(x)) + \frac{\partial g}{\partial y}(x, h(x))\,h'(x) = 0. \]
The claim follows by rearranging so that h′(x) is on the left hand side.
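
The formula of Corollary 2.3 can be illustrated on the unit circle, where the implicit function
is known explicitly. The Python sketch below uses the hypothetical example g(x, y) = x² + y² − 1
near the point (0.6, 0.8) and compares the formula with a central finite difference.

```python
import math


def h(x):
    """Upper branch of the unit circle: solves g(x, y) = x^2 + y^2 - 1 = 0 for y > 0."""
    return math.sqrt(1 - x * x)


def h_prime_from_corollary(x):
    """h'(x) = -(dg/dy)^{-1} (dg/dx), evaluated at (x, h(x))."""
    y = h(x)
    dg_dx = 2 * x
    dg_dy = 2 * y          # must be non-zero (an invertible 1x1 matrix)
    return -dg_dx / dg_dy


x0 = 0.6                   # hypothetical point; h(x0) is 0.8 up to rounding
eps = 1e-6
finite_difference = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)
print(h_prime_from_corollary(x0), finite_difference)
```

Both numbers should be close to −0.75. At x = ±1 the derivative ∂g/∂y vanishes and the theorem
does not apply, which matches the vertical tangent of the circle there.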

We are now ready to prove Theorem 2.1:

Proof of Theorem 2.1
Since the derivative of g at v∗ is a matrix of rank k, there exist k coordinates y such that
\(\frac{\partial g}{\partial y}(v^*)\) is invertible. We can always re-number the coordinates so that v = (x, y)
with \(x \in \mathbb{R}^{n-k}\) and \(y \in \mathbb{R}^k\).

See the Video with the proof up till now.

By the implicit function theorem, we know that there exists a function h such that

g(x, h(x)) = 0.

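Since v∗ = (x∗ , y∗ ) is a solution of problem (2.1), x∗ is a minimum of f (x, h(x)), meaning
that the derivative of f (x, h(x)) with respect to x is zero at x∗ . Applying Corollary 2.3,
this gives
\[ 0 = \frac{\partial f}{\partial x}(v^*) + \frac{\partial f}{\partial y}(v^*)\,h'(x^*)
 = \frac{\partial f}{\partial x}(v^*) - \frac{\partial f}{\partial y}(v^*)\left(\frac{\partial g}{\partial y}(v^*)\right)^{-1}\frac{\partial g}{\partial x}(v^*). \tag{2.3} \]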
∂x ∂y ∂y ∂x

The proof up to this point is discussed in this Video.

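We define a 1 × k matrix Λ as
\[ \Lambda = \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_k \end{bmatrix} = \frac{\partial f}{\partial y}(v^*)\left(\frac{\partial g}{\partial y}(v^*)\right)^{-1}. \]
From (2.3) it follows that
\[ \frac{\partial f}{\partial x}(v^*) = \Lambda\,\frac{\partial g}{\partial x}(v^*). \tag{2.4} \]
From the definition of Λ
\[ \frac{\partial f}{\partial y}(v^*) = \Lambda\,\frac{\partial g}{\partial y}(v^*). \tag{2.5} \]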

A more detailed explanation is given in this Video.

Conditions (2.4) and (2.5) combined give (2.2).

The following Video contains a more detailed explanation of the last line of the
proof.

In some cases, the necessary condition (2.2) turns out to be sufficient for v∗ to be a solution
of the problem (2.1).

Theorem 2.4
Assume that f (v) is a smooth convex function and that

g(v) = Av − c,

where A is a k × n matrix and \(c \in \mathbb{R}^k\). If there exist a sequence of numbers λ1 , . . . , λk ∈ R
and a point \(v^* \in \mathbb{R}^n\) such that (2.2) is satisfied, then v∗ is a solution of the problem (2.1).

Proof
Let us take any v satisfying g(v) = 0. We need to show that f (v) ≥ f (v∗ ).
Using a notation λ = (λ1 , . . . , λk ) we can write
λ1 ∇g1 (v∗ ) + . . . + λk ∇gk (v∗ ) = AT λ. (2.6)

The Video gives a more detailed explanation of the proof up to this point.

Let w = v − v∗ . Since g(v) = 0 and g(v∗ ) = 0


0 = g(v) = g(v∗ + w) = Av∗ + Aw − c = g(v∗ ) + Aw = Aw. (2.7)
By Taylor's formula with the remainder in integral form
\[ f(v^* + w) = f(v^*) + \nabla f(v^*) \cdot w + w^T \left( \int_0^1 (1 - t)\, H(f, v^* + tw)\,dt \right) w, \tag{2.8} \]
where the dot stands for the scalar product and H( f, w) is the Hessian of f at w, that is,
the n × n matrix
\[ H(f, w) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(w) \right)_{i,j=1,\ldots,n}. \]
Since f is convex, \(w^T H(f, v^* + tw)\, w \ge 0\), so for the matrix
\[ B(w) := \int_0^1 (1 - t)\, H(f, v^* + tw)\,dt \]
we see that
\[ w^T B(w)\, w = w^T \left( \int_0^1 (1 - t)\, H(f, v^* + tw)\,dt \right) w
 = \int_0^1 (1 - t)\, w^T H(f, v^* + tw)\, w\,dt \ge 0. \tag{2.9} \]

We can now compute

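\begin{align*}
f(v) &= f(v^* + w) \\
 &= f(v^*) + \nabla f(v^*) \cdot w + w^T B(w)\, w && \text{(from (2.8))} \\
 &\ge f(v^*) + A^T\lambda \cdot w && \text{(from (2.2), (2.6) and (2.9))} \\
 &= f(v^*) + (A^T\lambda)^T w \\
 &= f(v^*) + \lambda^T A w \\
 &= f(v^*) && \text{(from (2.7))}.
\end{align*}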

This concludes our proof.

Here is a Video with the proof.

To see a simple example of an application of the Lagrange multiplier method see the
Video.

2.2 Risk and return


A portfolio constructed from n different securities can be described by means of the vector of
weights
\[ w = (w_1, \ldots, w_n), \]
with the constraint \(\sum_{j=1}^n w_j = 1\). Writing \(\mathbf{1}\) for the n-dimensional vector
\[ \mathbf{1} = (1, \ldots, 1), \]
the constraint can be conveniently written as
\[ w^T \mathbf{1} = 1. \tag{2.10} \]
The attainable set A consists of all possible portfolios with weights w:
\[ A = \left\{ w : w^T \mathbf{1} = 1 \right\}. \]
If short-selling is not possible, a constraint w j ≥ 0 , for all j, is added throughout. Here, unless
stated otherwise, we assume availability of short sales.
Alternatively a portfolio is described by the vector of positions taken in particular compo-
nents (numbers of units of assets)
x = (x1 , . . . , xn ).

We have the following relations between the weights, prices and the numbers of shares
\[ w_j = \frac{x_j S_j(0)}{V(0)}, \qquad j = 1, \ldots, n, \]
where x j is the number of shares of type j in the portfolio, S j (0) is the initial price of security j,
and V(0) is the total money invested.
Denote the random returns on the securities by K1 , . . . , Kn , and the vector of expected
returns by
µ = (µ1 , . . . , µn ),
with
µ j = E(K j ), for j = 1, . . . , n.
The covariances between returns will be denoted by \(\sigma_{jk} = \mathrm{Cov}(K_j, K_k)\), in particular
\(\sigma_{jj} = \sigma_j^2 = \mathrm{Var}(K_j)\). These are the entries of the n × n covariance matrix
\[ C = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{bmatrix}. \]
We write as before
\[ K_w = \sum_{j=1}^n w_j K_j. \]

Theorem 1.10 can easily be generalised.

Theorem 2.5
The expected return \(\mu_w = E(K_w)\) and variance \(\sigma_w^2 = \mathrm{Var}(K_w)\) of a portfolio with weights
w are given by
\[ \mu_w = w^T \mu, \qquad \sigma_w^2 = w^T C w. \]

Here is a Video with the proof.

Proof
The formula for µw follows from the linearity of mathematical expectation,
\[ \mu_w = E(K_w) = E\left( \sum_{j=1}^n w_j K_j \right) = \sum_{j=1}^n w_j \mu_j = w^T \mu. \]
For \(\sigma_w^2\) we use the bilinearity of covariance:
\begin{align*}
\sigma_w^2 &= \mathrm{Var}(K_w) \\
 &= \mathrm{Cov}(K_w, K_w) \\
 &= \mathrm{Cov}\left( \sum_{j=1}^n w_j K_j, \sum_{k=1}^n w_k K_k \right) \\
 &= \sum_{j,k=1}^n w_j w_k \sigma_{jk} && \text{(since } \mathrm{Cov}(K_j, K_k) = \sigma_{jk}\text{)} \\
 &= w^T C w.
\end{align*}
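
The two formulae of Theorem 2.5 translate directly into code. A minimal Python sketch, with a
hypothetical three-asset data set and hypothetical function names:

```python
def expected_return(w, mu):
    """mu_w = w^T mu."""
    return sum(w_j * mu_j for w_j, mu_j in zip(w, mu))


def variance(w, C):
    """sigma_w^2 = w^T C w, written as the double sum over j and k."""
    n = len(w)
    return sum(w[j] * w[k] * C[j][k] for j in range(n) for k in range(n))


# Hypothetical three-asset data chosen only for illustration.
mu = [0.10, 0.15, 0.12]
C = [[0.0400, 0.0060, 0.0100],
     [0.0060, 0.0900, 0.0150],
     [0.0100, 0.0150, 0.0625]]
w = [0.5, 0.3, 0.2]

assert abs(sum(w) - 1) < 1e-12   # the weights must satisfy w^T 1 = 1
print(expected_return(w, mu), variance(w, C))
```

For these inputs the expected return is about 0.119 and the variance about 0.0262.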

Proposition 2.6
For any two portfolios
\[ w_A = (w_{A,1}, \ldots, w_{A,n}), \qquad w_B = (w_{B,1}, \ldots, w_{B,n}), \]
the covariance between the returns is
\[ \mathrm{Cov}(K_{w_A}, K_{w_B}) = w_A^T C w_B. \]

Here is a Video with the proof.

Proof
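Using the bilinearity of covariance we compute
\begin{align*}
\mathrm{Cov}(K_{w_A}, K_{w_B}) &= \mathrm{Cov}\left( \sum_{j=1}^n w_{A,j} K_j, \sum_{k=1}^n w_{B,k} K_k \right) \\
 &= \sum_{j,k=1}^n w_{A,j} w_{B,k} \sigma_{jk} && \text{(since } \mathrm{Cov}(K_j, K_k) = \sigma_{jk}\text{)} \\
 &= w_A^T C w_B.
\end{align*}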

Instead of using the covariance matrix as initial data, we can use correlations. Assume that we
have a matrix of correlations between returns
\[ \varrho = \begin{bmatrix}
1 & \rho_{12} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \cdots & \rho_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \cdots & 1
\end{bmatrix}, \]
with
\[ \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}. \]
We also use the notation diag(σ1 , . . . , σn ) to denote the n × n matrix with σ1 , . . . , σn on the
diagonal, and zeros in the remaining entries. The following lemma gives a link between C and ϱ.

Lemma 2.7
The covariance matrix is equal to
\[ C = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)\, \varrho\, \mathrm{diag}(\sigma_1, \ldots, \sigma_n). \]

Proof
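We first compute
\[ \varrho\, \mathrm{diag}(\sigma_1, \ldots, \sigma_n) = \begin{bmatrix}
\sigma_1 & \rho_{12}\sigma_2 & \cdots & \rho_{1n}\sigma_n \\
\rho_{21}\sigma_1 & \sigma_2 & \cdots & \rho_{2n}\sigma_n \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1}\sigma_1 & \rho_{n2}\sigma_2 & \cdots & \sigma_n
\end{bmatrix}. \]
Multiplying by diag(σ1 , . . . , σn ) on the left hand side we have
\[ \mathrm{diag}(\sigma_1, \ldots, \sigma_n)\, \varrho\, \mathrm{diag}(\sigma_1, \ldots, \sigma_n) = \begin{bmatrix}
\sigma_1\sigma_1 & \sigma_1\rho_{12}\sigma_2 & \cdots & \sigma_1\rho_{1n}\sigma_n \\
\sigma_2\rho_{21}\sigma_1 & \sigma_2\sigma_2 & \cdots & \sigma_2\rho_{2n}\sigma_n \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_n\rho_{n1}\sigma_1 & \sigma_n\rho_{n2}\sigma_2 & \cdots & \sigma_n\sigma_n
\end{bmatrix} = C \]
(since \(\sigma_{ij} = \sigma_i \rho_{ij} \sigma_j\), and \(\rho_{ii} = 1\)).

Figure 2.1 The plots of µw and σw with respect to w1 , w2 .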
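
Lemma 2.7 gives a practical way to build a covariance matrix from standard deviations and
correlations. A short Python sketch, with hypothetical inputs and a hypothetical function name:

```python
def covariance_from_correlation(sigmas, corr):
    """C = diag(sigmas) * corr * diag(sigmas), that is C[i][j] = sigma_i * rho_ij * sigma_j."""
    n = len(sigmas)
    return [[sigmas[i] * corr[i][j] * sigmas[j] for j in range(n)] for i in range(n)]


# Hypothetical standard deviations and correlations.
sigmas = [0.2, 0.3, 0.25]
corr = [[1.0, 0.1, 0.2],
        [0.1, 1.0, 0.2],
        [0.2, 0.2, 1.0]]
C = covariance_from_correlation(sigmas, corr)
for row in C:
    print(row)
```

The diagonal of the result contains the variances, and the matrix is symmetric whenever the
correlation matrix is.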

2.3 Three risky securities


The purpose of this section is to provide geometric intuition as to the shape of the attainable
set.
In the case when we have three risky assets, the third weight of a portfolio can be computed
from the first two weights
\[ w_3 = 1 - w_1 - w_2, \]
meaning that the attainable set is parameterized by w1 and w2 . We can write the formulae for
µw and σw with respect to these two parameters as
\begin{align*}
\mu_w &= w_1\mu_1 + w_2\mu_2 + w_3\mu_3 \\
 &= w_1\mu_1 + w_2\mu_2 + (1 - w_1 - w_2)\mu_3,
\end{align*}
and
\begin{align*}
\sigma_w^2 &= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + w_3^2\sigma_3^2 + 2w_1w_2\sigma_{12} + 2w_1w_3\sigma_{13} + 2w_2w_3\sigma_{23} \\
 &= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + (1 - w_1 - w_2)^2\sigma_3^2 + 2w_1w_2\sigma_{12} \\
 &\quad + 2w_1(1 - w_1 - w_2)\sigma_{13} + 2w_2(1 - w_1 - w_2)\sigma_{23}.
\end{align*}

Figure 2.2 The lines µw = m (left) and the curves σw = c (right).

The plots of µw and σw are given in Figure 2.1. The lines on the graphs represent the level sets
{µw = m} and {σw = c} for several values of m and c.
Since the third weight can be computed from the first two, the attainable set is represented
as the (w1 , w2 ) plane, as in Figure 2.2. The vertices of the gray triangle represent investments
in single assets. The point (1, 0) represents the first asset, (0, 1) the second asset, and since
w3 = 1 − w1 − w2 , the point (0, 0) represents the third asset. The gray triangle consists of the
points
{(w1 , w2 ) : w1 , w2 ≥ 0, w1 + w2 ≤ 1}, (2.11)
and contains portfolios attainable without short–selling. The level sets {µw = m} and {σw = c}
from Figure 2.1 can be projected onto the (w1 , w2 ) plane in Figure 2.2. These are the straight
lines and ellipses on the graphs in Figure 2.2, respectively. The middle point of the ellipses
is the minimum variance portfolio. In this particular figure, since the point lies outside of the
triangle, we see that the minimum variance portfolio requires short selling. In Figure 2.2 we
also see that if short selling is not allowed, then the smallest attainable σw lies on the ellipse,
which is tangent to the gray triangle. The minimum variance portfolio without short–selling is
the tangency point.
We now discuss the shape the attainable portfolios take on the (σ, µ) plane. We start with
Figure 2.3, where we see the plane corresponding to portfolios with µw = m, together with the
plot of σw . We see that there is a single point, which has smallest attainable variance under
the constraint µw = m. This is the point at the bottom of the intersection of the plane with the
hyperbola. From the plot we also see that for µw = m we can have portfolios with arbitrarily
large σ. This leads to the conclusion that on the (σ, µ) plane, the set of portfolios with µw = m is
a horizontal half line, which is depicted in Figure 2.4. Intuitively one can think of Figure 2.4
as the leftmost graph from Figure 2.3, rotated clockwise by ninety degrees, and projected onto
the plane. Since the plot of σw is a hyperbola, one is led to believe that the boundary of the
attainable set on the (σ, µ) plane should also be a hyperbola. This is just a geometric intuition,
and is by no means meant as a proof. We shall prove this fact later on.

Figure 2.3 The plot of σw together with µw = m.

Figure 2.4 Attainable portfolios.

Here is a Video that provides some intuition.

When short-selling is not allowed, the attainable set is restricted to the set from equation
(2.11). In such a case, on the (σ, µ) plane the attainable set takes the shape depicted in Figure
2.5. The three points represent the three assets. A hyperbola passing through any two points
represents portfolios involving investments in the two securities corresponding to the points.
The fragments of the hyperbolas between two points correspond to the edges of the triangle
from Figure 2.2. The attainable set in Figure 2.5 can therefore be interpreted as a distorted and
folded projection of the triangle from Figure 2.2.

Figure 2.5 Attainable portfolios with short-selling constraints.

2.4 Minimum variance portfolio


In this section we give the formula for the weights of the portfolio with smallest variance.
Before doing so, we need to consider a technical lemma.

Lemma 2.8
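We have the following formulae for the gradients with respect to w
\[ \nabla\left(w^T\mu\right) = \mu, \tag{2.12} \]
\[ \nabla\left(w^T\mathbf{1}\right) = \mathbf{1}, \tag{2.13} \]
\[ \nabla\left(w^TCw\right) = 2Cw, \tag{2.14} \]
and the Hessian of \(w^TCw\) is equal to 2C.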

Proof
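Since
\[ \frac{\partial}{\partial w_i}\left(w^T\mu\right) = \frac{\partial}{\partial w_i}(w_1\mu_1 + \ldots + w_n\mu_n) = \mu_i \]
we see that
\[ \nabla\left(w^T\mu\right) = \begin{bmatrix} \frac{\partial}{\partial w_1}\left(w^T\mu\right) \\ \vdots \\ \frac{\partial}{\partial w_n}\left(w^T\mu\right) \end{bmatrix}
 = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \mu, \]
which proves (2.12).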

Here is a Video of the proof up to this point.

The proof of (2.13) follows from a mirror argument, using 1 instead of µ.


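To prove (2.14) we observe that in
\[ \frac{\partial}{\partial w_i}\left(w^TCw\right) = \frac{\partial}{\partial w_i}\sum_{j=1}^n\sum_{k=1}^n w_jw_k\sigma_{jk} \]
the derivative of each term can be non zero only when j = i or k = i. This means that
\begin{align*}
\frac{\partial}{\partial w_i}\sum_{j=1}^n\sum_{k=1}^n w_jw_k\sigma_{jk}
 &= \frac{\partial}{\partial w_i}\left( w_iw_i\sigma_{ii} + \sum_{j=i}\sum_{k\ne i} w_jw_k\sigma_{jk} + \sum_{j\ne i}\sum_{k=i} w_jw_k\sigma_{jk} \right) \\
 &= 2w_i\sigma_{ii} + \sum_{k\ne i} w_k\sigma_{ik} + \sum_{j\ne i} w_j\sigma_{ji} \\
 &= 2\sum_{k=1}^n w_k\sigma_{ik} \qquad \text{(since } \sigma_{ji} = \sigma_{ij}\text{)} \tag{2.15} \\
 &= 2(Cw)_i,
\end{align*}
where \((Cw)_i\) stands for the i-th coordinate of the vector Cw. Combining the partial
derivatives on all coordinates gives (2.14).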

Here is a Video with the argument, that provides a bit more detail.

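Using (2.15) we can compute
\[ \frac{\partial}{\partial w_l}\frac{\partial}{\partial w_i}\left(w^TCw\right) = 2\frac{\partial}{\partial w_l}\left(\sum_{k=1}^n w_k\sigma_{ik}\right) = 2\sigma_{il} = 2\sigma_{li}, \]
hence
\[ \left(\frac{\partial^2}{\partial w_l\,\partial w_i}\left(w^TCw\right)\right)_{l,i\le n} = (2\sigma_{li})_{l,i\le n} = 2C, \]
which is the Hessian of \(w^TCw\).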

Here is a Video with the argument.

We are now ready to give the formula for the weights of the minimum variance portfolio.

Theorem 2.9
The portfolio with the smallest variance in the attainable set has weights
\[ w_{\min} = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^TC^{-1}\mathbf{1}}. \tag{2.16} \]

Here is a Video with the proof.

Proof
We need to find the minimum of \(w^TCw\) subject to the constraint
\[ w^T\mathbf{1} = 1. \tag{2.17} \]
To this end we use the method of Lagrange multipliers, taking the Lagrangian
\[ L(w) = \nabla\left(w^TCw\right) - \nabla\left(\lambda(\mathbf{1}^Tw - 1)\right). \]
By (2.13) and (2.14) from Lemma 2.8,
\[ L(w) = 2Cw - \lambda\mathbf{1} = 0, \]
hence
\[ w = \frac{\lambda}{2}C^{-1}\mathbf{1}. \tag{2.18} \]
Substituting this into the constraint (2.17), we obtain
\[ 1 = w^T\mathbf{1} = \mathbf{1}^Tw = \frac{\lambda}{2}\mathbf{1}^TC^{-1}\mathbf{1}. \]
Solving this for λ and substituting the result into (2.18) gives (2.16).
We have shown that (2.16) is the only candidate for a local extremum. From Lemma
2.8 we know that the Hessian of \(w^TCw\) is 2C, which is positive definite, so \(w^TCw\) is convex.
Since the constraint (2.17) is of the form Aw − c = 0, Theorem 2.4 shows that wmin is a
solution of the minimisation problem, that is, a global minimum.
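
Formula (2.16) can be evaluated without forming the inverse matrix, by solving the linear system
Cy = 1 and normalising. The following Python sketch does this with a small hand-written Gaussian
elimination; the solver, the function names and the covariance matrix are assumptions made for
illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]        # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def min_variance_portfolio(C):
    """w_min = C^{-1} 1 / (1^T C^{-1} 1), formula (2.16)."""
    y = solve(C, [1.0] * len(C))     # y = C^{-1} 1
    total = sum(y)                   # 1^T C^{-1} 1
    return [y_j / total for y_j in y]


# Hypothetical covariance matrix (symmetric and positive definite).
C = [[0.0400, 0.0060, 0.0100],
     [0.0060, 0.0900, 0.0150],
     [0.0100, 0.0150, 0.0625]]
w_min = min_variance_portfolio(C)

# First-order condition 2 C w = lambda 1: all entries of C w coincide.
Cw = [sum(C[i][j] * w_min[j] for j in range(3)) for i in range(3)]
print(w_min, Cw)
```

The common value of the entries of Cw equals the variance of the minimum variance portfolio.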

The minimum variance portfolio has a surprising property that its covariance with any other
portfolio is constant. This property will prove useful later on, when discussing the shape of the
attainable set on the (σ, µ) plane.

Corollary 2.10
For any portfolio w
\[ \mathrm{Cov}(K_w, K_{w_{\min}}) = \sigma_{w_{\min}}^2. \]

Here is a Video with the proof.

Proof
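By Proposition 2.6
\begin{align*}
\mathrm{Cov}(K_w, K_{w_{\min}}) &= w^TCw_{\min} \\
 &= w^TC\,\frac{C^{-1}\mathbf{1}}{\mathbf{1}^TC^{-1}\mathbf{1}} \\
 &= \frac{w^T\mathbf{1}}{\mathbf{1}^TC^{-1}\mathbf{1}} \\
 &= \frac{1}{\mathbf{1}^TC^{-1}\mathbf{1}}. \tag{2.19}
\end{align*}
The above holds for any portfolio w, hence also in particular for w = wmin , giving
\[ \sigma_{w_{\min}}^2 = \mathrm{Var}(K_{w_{\min}}) = \mathrm{Cov}(K_{w_{\min}}, K_{w_{\min}}) = \frac{1}{\mathbf{1}^TC^{-1}\mathbf{1}}. \tag{2.20} \]
Combining (2.19) with (2.20) we obtain our claim.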
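
Corollary 2.10 is easy to observe numerically. The Python sketch below uses a hypothetical
two-asset covariance matrix, takes wmin from Theorem 1.14, and evaluates the covariance with
several arbitrary portfolios.

```python
def cov_between(w_a, w_b, C):
    """Cov(K_wA, K_wB) = wA^T C wB (Proposition 2.6)."""
    n = len(w_a)
    return sum(w_a[j] * C[j][k] * w_b[k] for j in range(n) for k in range(n))


# Hypothetical two-asset covariance matrix.
C = [[0.04, 0.006],
     [0.006, 0.09]]

# Minimum variance portfolio from Theorem 1.14 (a and b written with sigma_12).
a = C[1][1] - C[0][1]
b = C[0][0] - C[0][1]
w_min = [a / (a + b), b / (a + b)]
var_min = cov_between(w_min, w_min, C)

# Covariance of w_min with several arbitrary portfolios (w, 1 - w).
covs = [cov_between([w, 1 - w], w_min, C) for w in (-1.0, 0.0, 0.3, 1.0, 2.5)]
print(var_min, covs)
```

All the printed covariances agree with the variance of wmin up to rounding.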

2.5 Minimum variance line


To find the efficient frontier, we have to recognise and eliminate the dominated portfolios. To
this end we fix a level of expected return, denote it by m, and consider all portfolios with
µw = m. All of these are redundant except the one with the smallest variance. The family of
such portfolios, parametrised by m, is called the minimum variance line (see Figure 2.6).
More precisely, portfolios on the minimum variance line are solutions of the following
problem:
\[ \min w^TCw \quad \text{under constraints:} \quad w^T\mu = m, \quad w^T\mathbf{1} = 1. \tag{2.21} \]

Figure 2.6 Minimum variance line (MVL).

Theorem 2.11
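Let M be a 2 × 2 matrix of the form
\[ M = \begin{bmatrix} \mu^TC^{-1}\mu & \mu^TC^{-1}\mathbf{1} \\ \mu^TC^{-1}\mathbf{1} & \mathbf{1}^TC^{-1}\mathbf{1} \end{bmatrix}. \]
If C and M are invertible, then the solution of problem (2.21) is given by
\[ w = \frac{1}{\det(M)}\,C^{-1}\left(\det(M_1)\,\mu + \det(M_2)\,\mathbf{1}\right), \tag{2.22} \]
where
\[ M_1 = \begin{bmatrix} m & \mu^TC^{-1}\mathbf{1} \\ 1 & \mathbf{1}^TC^{-1}\mathbf{1} \end{bmatrix}, \qquad
M_2 = \begin{bmatrix} \mu^TC^{-1}\mu & m \\ \mu^TC^{-1}\mathbf{1} & 1 \end{bmatrix}. \]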

For an outline of the proof see the Video.

Proof
We introduce the Lagrange multiplier λ = (λ1 , λ2 ), and the Lagrangian
\[ L(w) = \nabla\left(w^TCw\right) - \nabla\left(\lambda_1(w^T\mu - m) + \lambda_2(w^T\mathbf{1} - 1)\right) = 0. \]
Using Lemma 2.8 we can compute
\[ L(w) = 2Cw - \lambda_1\mu - \lambda_2\mathbf{1} = 0. \]
We solve this system for w
\[ w = \frac{1}{2}\lambda_1C^{-1}\mu + \frac{1}{2}\lambda_2C^{-1}\mathbf{1}. \tag{2.23} \]
Since \(w^T\mu = \mu^Tw\) and \(w^T\mathbf{1} = \mathbf{1}^Tw\), substituting (2.23) into the constraints from (2.21),
we obtain a system of linear equations
\[ \frac{1}{2}\lambda_1\mu^TC^{-1}\mu + \frac{1}{2}\lambda_2\mu^TC^{-1}\mathbf{1} = m, \]
\[ \frac{1}{2}\lambda_1\mathbf{1}^TC^{-1}\mu + \frac{1}{2}\lambda_2\mathbf{1}^TC^{-1}\mathbf{1} = 1. \]
We can solve the above system for λ1 and λ2 to obtain (note the relevance of the assumption
that M is invertible, which ensures that det(M) ≠ 0)
\[ \frac{1}{2}\lambda_1 = \frac{\det(M_1)}{\det(M)}, \qquad \frac{1}{2}\lambda_2 = \frac{\det(M_2)}{\det(M)}. \]
Substituting the above back into (2.23) gives (2.22).
We have found a candidate for the solution of (2.21). By Lemma 2.8 we know that the
Hessian of \(w^TCw\) is equal to 2C, which is a positive definite matrix. Since the constraints
are of the form Aw − c = 0, Theorem 2.4 ensures that we have found a global minimum.

The formula (2.22) is long and somewhat cumbersome to apply. Our aim will be to simplify
it. The first step towards this end is to notice that all portfolios on the minimum variance line
can be expressed as an affine function of two fixed vectors.

Corollary 2.12
There exist two vectors a and b, which depend only on C and µ, such that a solution of
the problem (2.21) is
w = ma + b.

The following Video provides the proof.

Proof
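Since
\[ \det(M_1) = m\,\mathbf{1}^TC^{-1}\mathbf{1} - \mu^TC^{-1}\mathbf{1}, \]
\[ \det(M_2) = \mu^TC^{-1}\mu - m\,\mu^TC^{-1}\mathbf{1}, \]
from (2.22) we see that w = ma + b for
\[ a = \frac{1}{\det(M)}\,C^{-1}\left(\left(\mathbf{1}^TC^{-1}\mathbf{1}\right)\mu - \left(\mu^TC^{-1}\mathbf{1}\right)\mathbf{1}\right), \]
\[ b = \frac{1}{\det(M)}\,C^{-1}\left(\left(\mu^TC^{-1}\mu\right)\mathbf{1} - \left(\mu^TC^{-1}\mathbf{1}\right)\mu\right). \]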
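
The vectors a and b of Corollary 2.12 can be computed once and then reused for every target
return m. The Python sketch below is one possible implementation under stated assumptions: the
hand-written solver, the function names and the data are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]        # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(u, v):
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def min_variance_line(C, mu):
    """Vectors a, b with w = m a + b on the minimum variance line (Corollary 2.12)."""
    ones = [1.0] * len(mu)
    C_inv_mu = solve(C, mu)            # C^{-1} mu
    C_inv_one = solve(C, ones)         # C^{-1} 1
    m11 = dot(mu, C_inv_mu)            # mu^T C^{-1} mu
    m12 = dot(mu, C_inv_one)           # mu^T C^{-1} 1
    m22 = dot(ones, C_inv_one)         # 1^T C^{-1} 1
    det_m = m11 * m22 - m12 ** 2
    if det_m == 0:
        raise ValueError("M is not invertible")
    a = [(m22 * x - m12 * y) / det_m for x, y in zip(C_inv_mu, C_inv_one)]
    b = [(m11 * y - m12 * x) / det_m for x, y in zip(C_inv_mu, C_inv_one)]
    return a, b


# Hypothetical data chosen only for illustration.
C = [[0.0400, 0.0060, 0.0100],
     [0.0060, 0.0900, 0.0150],
     [0.0100, 0.0150, 0.0625]]
mu = [0.10, 0.15, 0.12]
a, b = min_variance_line(C, mu)

m = 0.11                               # target expected return
w = [m * a_j + b_j for a_j, b_j in zip(a, b)]
print(w, sum(w), dot(w, mu))
```

The resulting weights sum to 1 and have expected return m, so both constraints of (2.21) hold.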

Figure 2.7 Efficient frontier, together with the minimum variance portfolio (MVP).

The efficient frontier, which is the set of all portfolios not dominated by any other portfolios,
consists of w = ma + b for \(m \ge \mu_{w_{\min}}\) (see Figure 2.7, Excel and Video ).
We will now show that the whole minimum variance line can be computed from just two
portfolios.

Corollary 2.13
Suppose that w1 and w2 are two portfolios on the minimum variance line with different
expected returns: \(\mu_{w_1} \ne \mu_{w_2}\). Then any portfolio w on the minimum variance line can be
obtained from these two, that is, there is an α such that w = αw1 + (1 − α)w2 .

For the intuition behind the proof see the Video.

Proof
We first find α so that
\[ \mu_w = \alpha\mu_{w_1} + (1 - \alpha)\mu_{w_2}. \]
This is possible since the returns are different:
\[ \alpha = \frac{\mu_w - \mu_{w_2}}{\mu_{w_1} - \mu_{w_2}}. \]
Since the two portfolios lie on the minimum variance line, they satisfy
\[ w_1 = \mu_{w_1}a + b, \qquad w_2 = \mu_{w_2}a + b. \]
From these relations we have
\[ \alpha w_1 + (1 - \alpha)w_2 = (\alpha\mu_{w_1} + (1 - \alpha)\mu_{w_2})a + b = \mu_wa + b, \]
but w is also on the minimum variance line so \(w = \mu_wa + b\), hence the result.

To see how Corollary 2.13 works in Excel see Video.

The minimum variance portfolio wmin lies on the minimum variance line (see Excel). We
therefore already have a simple formula (2.16) for one of the two portfolios needed to obtain
the minimum variance line. The second portfolio will be the market portfolio. We give the
formula for the market portfolio in the next section. The resulting parameterization of the
minimum variance line is then written out in equation (2.27).
An important observation follows from Corollary 2.13.

Theorem 2.14
Suppose that there exist two portfolios w1 and w2 on the minimum variance line with
different expected returns: \(\mu_{w_1} \ne \mu_{w_2}\). Then the minimum variance line is a hyperbola
with a center on the vertical axis.

Here is a Video with the proof.

Proof
Let Kw1 and Kw2 be the returns on portfolios w1 and w2 , respectively. From Corollary
2.13 we know that any portfolio on the minimum variance line can be expressed as

w = αw1 + (1 − α)w2 ,

hence its return is equal to

Kw = αKw1 + (1 − α)Kw2 .

We can treat each of the two portfolios as if it were a single security. Applying the results of Chapter 1 for portfolios consisting of two securities, we know that
\[ \mu_w = \alpha\mu_{w_1} + (1-\alpha)\mu_{w_2}, \]
\[ \sigma_w^2 = \alpha^2\sigma_{w_1}^2 + (1-\alpha)^2\sigma_{w_2}^2 + 2\alpha(1-\alpha)\,\mathrm{Cov}(K_{w_1}, K_{w_2}). \]

Since µw1 ≠ µw2, by Theorem 1.13 the curve (σw, µw) is a hyperbola.

2.6 Market portfolio


Recall that the market portfolio is the optimal portfolio on the efficient frontier when a risk–free asset is available. The line connecting the market portfolio with the risk–free asset is tangent to the minimum variance line and has the maximal slope among the lines determined by all portfolios (Figure 2.8 and Excel).

Figure 2.8 Minimum variance portfolio MVP, the market portfolio MP, and the capital market line CML.

Earlier we found the formula for the market portfolio in the case of two risky securities determining the efficient set. In view of Corollary 2.13 this result applies to the general situation as well. Nevertheless, we derive the formula again, this time using the parameters of all n securities.

Theorem 2.15
If the risk free rate is smaller than the expected return of the minimum variance portfolio,
then the market portfolio exists and is given by

\[ m = \frac{C^{-1}(\mu - r\mathbf{1})}{\mathbf{1}^T C^{-1}(\mu - r\mathbf{1})}. \tag{2.24} \]

Proof
From Theorem 2.14 we know that the minimum variance line is a hyperbola. Since its centre is on the vertical axis, there exists a single tangency point for a half line emanating from (0, r), which maximizes the slope (see Figure 2.8). The slope in question is of the form
\[ \frac{\mu_w - r}{\sigma_w} = \frac{w^T\mu - r}{\sqrt{w^TCw}}, \]
where w are the weights of a portfolio and r is the risk free rate of return. At the maximal slope the gradient of the Lagrangian,
\[ L(w) = \nabla\frac{w^T\mu - r}{\sqrt{w^TCw}} - \lambda\nabla(w^T\mathbf{1} - 1), \]
needs to be equal to zero. We can compute the gradients using Lemma 2.8 and equate them to zero:
\[ L(w) = \frac{\mu\sqrt{w^TCw} - (w^T\mu - r)\frac{1}{2\sqrt{w^TCw}}\,2Cw}{w^TCw} - \lambda\mathbf{1} = 0. \]

See Video for an overview of the argument so far.

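This yields
\[ \mu\sigma_w - (\mu_w - r)\frac{Cw}{\sigma_w} - \lambda\sigma_w^2\mathbf{1} = 0, \]
hence
\[ \frac{\mu_w - r}{\sigma_w^2}\,Cw = \mu - \lambda\sigma_w\mathbf{1}. \]
Multiplying by $w^T$ on the left and using the fact that $w^T\mathbf{1} = 1$ we get
\[ \frac{\mu_w - r}{\sigma_w^2}\,w^TCw = \mu_w - \lambda\sigma_w, \]
and since $w^TCw = \sigma_w^2$ the left hand side equals $\mu_w - r$, so
\[ \lambda = \frac{r}{\sigma_w}. \]
Therefore we have the equation
\[ \gamma Cw = \mu - r\mathbf{1}, \]
where $\gamma = \frac{\mu_w - r}{\sigma_w^2}$, so that
\[ \gamma w = C^{-1}(\mu - r\mathbf{1}). \]
Even though w appears in the formula for γ, the γ turns out to be a constant. This follows from multiplying the above equation by $\mathbf{1}^T$ on the left, which gives

\[ \gamma = \mathbf{1}^TC^{-1}(\mu - r\mathbf{1}). \tag{2.25} \]

Substituting (2.25) into $\gamma w = C^{-1}(\mu - r\mathbf{1})$ we obtain our claim.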

See Video for the second part of the proof.
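Formula (2.24) is straightforward to evaluate. The following sketch uses made-up data for three securities and a hypothetical risk free rate r = 3% (below the expected return of the minimum variance portfolio for these numbers); the helper names are illustrative only. It also compares the slope (µw − r)/σw of the computed portfolio with the slopes of two other fully invested portfolios.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def market_portfolio(C, mu, r):
    """Weights m = C^{-1}(mu - r 1) / (1^T C^{-1}(mu - r 1)), as in (2.24)."""
    excess = [x - r for x in mu]
    z = solve(C, excess)   # C^{-1}(mu - r 1)
    gamma = sum(z)         # 1^T C^{-1}(mu - r 1), see (2.25)
    return [zi / gamma for zi in z]


def slope(w, C, mu, r):
    """Slope (mu_w - r) / sigma_w of the line through (0, r) and the portfolio w."""
    n = len(w)
    var = sum(w[i] * C[i][j] * w[j] for i in range(n) for j in range(n))
    mu_w = sum(wi * x for wi, x in zip(w, mu))
    return (mu_w - r) / math.sqrt(var)


# Hypothetical data.
C = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.08, 0.12, 0.15]
r = 0.03

m = market_portfolio(C, mu, r)
print(m, slope(m, C, mu, r))
print(slope([1 / 3, 1 / 3, 1 / 3], C, mu, r), slope([1.0, 0.0, 0.0], C, mu, r))
```

For these numbers the market portfolio has the largest of the three printed slopes, in line with the tangency property.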

The line joining the risk–free security represented by (0, r) and the market portfolio with coordinates (σm, µm) satisfies the equation
\[ \mu = r + \frac{\mu_m - r}{\sigma_m}\,\sigma. \tag{2.26} \]
It is called the Capital Market Line, CML in brief. For a portfolio on the CML with risk σ the term $\frac{\mu_m - r}{\sigma_m}\sigma$ is called the risk premium, which is the additional return above the risk–free level as a reward or compensation for exposure to risk.
If each investor has the same view on the values of the model parameters (the expected returns on the basic assets and the entries of the covariance matrix) and if each investor chooses an optimal portfolio according to convex indifference curves on the basis of risk-return analysis, then all these optimal portfolios are placed on the CML. Consequently, all investors should hold just one risky portfolio, namely the market portfolio, combining it with the risk free asset in a preferred individual way. It follows that the market portfolio weights should represent the values of the particular shares relative to the value of the whole market (by an argument similar to the one in Chapter 1, where we discussed a baby market with just two ingredients). Such a portfolio is represented in reality by an index. An empirical test of the practical relevance of the whole theory is therefore to see whether the market index lies on the efficient frontier and whether it determines the tangent line when combined with the risk free asset.
We now return to the discussion on the shape of the minimum variance line. From Corollary 2.13 we know that the minimum variance line can be constructed using wmin and m. By Corollary 2.13, $\mathrm{Cov}(K_{w_{\min}}, K_m) = \sigma_{w_{\min}}^2$, which gives the following parametrization of all (σw, µw) on the minimum variance line:

\[ \mu_w = \alpha\mu_{w_{\min}} + (1-\alpha)\mu_m, \tag{2.27} \]
\[ \sigma_w^2 = \alpha^2\sigma_{w_{\min}}^2 + (1-\alpha)^2\sigma_m^2 + 2\alpha(1-\alpha)\sigma_{w_{\min}}^2. \]

The µwmin, σwmin, µm and σm are easy to compute, due to the simplicity of the expressions for wmin and m. This makes (2.27) a handy tool for making plots of the minimum variance line (see Excel and Video).
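As a rough illustration of how (2.27) can be used for plotting, the sketch below tabulates points (σw, µw) for a range of α. The four input numbers are hypothetical values of µwmin, σwmin, µm and σm, and `line_point` is an illustrative helper name.

```python
import math


def line_point(alpha, mu_min, sig_min, mu_m, sig_m):
    """Point (sigma_w, mu_w) on the minimum variance line given by (2.27)."""
    mu_w = alpha * mu_min + (1 - alpha) * mu_m
    var_w = (alpha ** 2 * sig_min ** 2
             + (1 - alpha) ** 2 * sig_m ** 2
             + 2 * alpha * (1 - alpha) * sig_min ** 2)
    return math.sqrt(var_w), mu_w


# Hypothetical parameters of the MVP and of the market portfolio.
mu_min, sig_min = 0.0986, 0.167
mu_m, sig_m = 0.1107, 0.181

alphas = [k / 4 for k in range(-8, 13)]   # alpha from -2 to 3
points = [line_point(al, mu_min, sig_min, mu_m, sig_m) for al in alphas]
for al, (s, m) in zip(alphas, points):
    print(f"alpha={al:5.2f}  sigma={s:.4f}  mu={m:.4f}")
```

Taking α = 1 gives the MVP and α = 0 the market portfolio; the variance can be rewritten as σ²wmin + (1 − α)²(σ²m − σ²wmin), so no point has risk below σwmin.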
We finish the section by considering a situation where we have different interest rates for risk free borrowing and investing. This is a more realistic setting than assuming that we have a single interest rate r.
Assume that we can invest risk free at a rate r1 and borrow at a rate r2. It is natural to consider r1 < r2. Any portfolio w invested in the risky securities can be combined with a risk free investment at the rate r1. This gives the following portfolios on the (σ, µ)-plane:

\[ \mu_\alpha = \alpha r_1 + (1-\alpha)\mu_w, \qquad \sigma_\alpha = |1-\alpha|\,\sigma_w, \qquad \text{for } \alpha \ge 0. \]

Note that we cannot take α < 0, since this implies a short position in the risk free investment, which would mean borrowing at r1.
We can also combine any portfolio w with borrowing at r2, giving

\[ \mu_\alpha = \alpha r_2 + (1-\alpha)\mu_w, \qquad \sigma_\alpha = (1-\alpha)\,\sigma_w, \qquad \text{for } \alpha \le 0. \]

We cannot take α > 0, since this would mean investing at r2, which is not allowed. We can only borrow at this rate.

Figure 2.9 Efficient frontier in the case of different rates for investing and borrowing risk free. [Plot in the (σ, µ)-plane showing r1 and r2 on the vertical axis and the tangency portfolios m1 and m2.]

To find the efficient frontier we first establish two tangency portfolios m1 and m2, for the half lines starting from (0, r1) and (0, r2), respectively. The m1 and m2 can be computed using (2.24) taking r1 and r2 instead of r, respectively. The frontier is depicted in Figure 2.9 and consists of the segment from (0, r1) to (σm1, µm1), the fragment of the minimum variance line between (σm1, µm1) and (σm2, µm2), and the half line starting from (σm2, µm2).

See Video for an Excel example.

2.7 CAPM
Paradoxically, in a world where decisions are based on risk expressed as variance, when we look at an asset to assess its risk, its variance turns out to be less relevant than its covariance with all the other assets. This claim will be illustrated by an example, supported by economic considerations, and finally proved.
We begin with a brief discussion of market equilibrium. The goal of an investment is to earn a suitable return. The question is how to understand the word ‘suitable’. We take a simplistic view that the level of this return, called the required return, depends on the risk concerned with a particular asset. If the risk is high, the required rate of return is also high. We leave for a moment the question of how to relate these notions in a quantitative way; an answer will be given later in this chapter. The required return is compared with the expected return (we assume that investors have already estimated it). If the required return is lower than the expected return, the investor decides to buy the asset. As a result of the emerging demand the price grows, which pushes the expected return down. If on the other hand the required return is higher than the expected return, the price should drop, thus leading to an increase of the expected return (as the very definition of return shows). In equilibrium the expected return should be equal to the required return.
Consider now a conjecture that the required return is related to variance as a measure of risk. Consider two assets with the same expected return but different variances and assume that the market is in equilibrium. We shall demonstrate that a contradiction emerges, which leads to rejecting this conjecture. Indeed, the asset with higher variance should have a higher required return, and in equilibrium its expected return should also be higher, matching the required return. This contradicts the assumption that the two expected returns are the same. Hence the situation described is impossible, that is, the required return cannot be related to variance.
We support this observation with a simple example.

Example 2.16

Suppose that the weights of a portfolio are of the form $w_j = \frac{1}{n}$, where n is the number of the assets. We shall investigate the risk of this portfolio depending on n. Assume that the variances of all assets on the market (suppose there are countably many) are uniformly bounded, $\sigma_j^2 \le L$. Then
\[ \sigma_w^2 = \sum_{j,k} w_j w_k \sigma_{jk} = \sum_j w_j^2\sigma_j^2 + \sum_{j \ne k} w_j w_k\sigma_{jk} \le \frac{1}{n^2}\,nL + \frac{1}{n^2}\sum_{j \ne k}\sigma_{jk}. \]

Assume further that the off-diagonal elements of the covariance matrix are all equal, σjk = c > 0, say. Then
\[ \sigma_w^2 \le \frac{L}{n} + \frac{1}{n^2}\,n(n-1)c. \]
The upper bound converges to c as n → ∞. Hence the risk of a portfolio containing many assets is determined by the covariances, since the variances of the ingredients become irrelevant for large n.
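The convergence in this example is easy to observe numerically. The sketch below assumes, purely for illustration, that every asset has the same variance `var` (so the bound holds with equality) and that all covariances equal `c`; the numbers and function names are made up.

```python
def equal_weight_variance(n, var, c):
    """Variance of the 1/n portfolio when each asset has variance `var`
    and each pair of distinct assets has covariance `c`."""
    return var / n + (n - 1) / n * c


def brute_force_variance(n, var, c):
    """Same quantity computed directly as the double sum of w_j w_k sigma_jk."""
    w = 1.0 / n
    return sum(w * w * (var if j == k else c) for j in range(n) for k in range(n))


var, c = 0.09, 0.02   # hypothetical variance and common covariance
for n in (1, 10, 100, 10_000):
    print(n, equal_weight_variance(n, var, c))
```

The printed values decrease from 0.09 towards the common covariance 0.02, so for large n only the covariance matters.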

This example motivates the following distinction between two kinds of risk: diversifiable,
or specific risk, which can be reduced to zero by expanding the portfolio, and undiversifiable,
systematic, or market risk which cannot be avoided because the securities are to some extent
linked to the market. We will return to this distinction later on in this chapter.
Let us now recall that the assumption that the market is in equilibrium combined with the assumption that all investors have homogeneous views on all parameters (expected returns and the entries of the covariance matrix) leads to the conclusion that they all invest in a single portfolio, called the market portfolio (as shown in the previous chapters). As a consequence, each investor has the same proportion of each asset, hence the weights of the market portfolio should be equal to the relative values of the assets as compared with the whole market:
\[ m_j = \frac{\text{market value of asset } j}{\text{market value of all assets}}. \]
To illustrate this idea consider a pizza with various ingredients arranged in rings. Each slice of this pizza contains each ingredient in the same proportion, though the sizes of slices may be different (Figure 2.10).

Figure 2.10 The proportional components in slices. [Pizza with rings of olives, tomatoes and ham.]


If the slices were not proper, some guests would be deprived of a fair share of some ingredients, which might create some tension. On a market such tension leads to demand for some assets and consequently to price movements contradicting the state of equilibrium, which is our basic assumption here.
The expected return on the market portfolio is one of the ingredients of a simple but effective model, which is the topic of this chapter. The Capital Asset Pricing Model (CAPM in brief) distinguishes some parameters influencing the return on an asset and describes this influence by means of a simple formula.

2.8 Derivation of CAPM


Let KA be a return on some asset or some portfolio of assets. We introduce the following
notation:

Definition 2.17
We call
\[ \beta_A = \frac{\mathrm{Cov}(K_A, K_m)}{\sigma_m^2} \]
the beta factor of a given asset (or portfolio) A.

We now have the famous theorem.

Theorem 2.18 CAPM


Suppose that the risk free rate r is lower than the expected return of the minimum variance portfolio (so that the market portfolio m exists). Then the expected return µA on any asset or portfolio is given by the formula

µA = r + βA (µm − r) (2.28)

Figure 2.11 A contradiction when there is no tangency at the market portfolio. [Plot in the (σ, µ)-plane showing the point (0, r), the CML, the market portfolio M and the asset A.]

Proof
As we know, the Capital Market Line is tangent to the efficient frontier at a certain point M corresponding to a portfolio, which we call the market portfolio; we denote its weights by m, so M = (σm, µm). Consider any other asset (it can be a portfolio) represented by a point A in the (σ, µ)-plane, A = (σA, µA). Consider all portfolios built by means of M and A. They form a hyperbola, which we claim to be tangent to the CML at M. Suppose to the contrary that this hyperbola intersects the CML. This clearly contradicts the fact that the slope of the CML is maximal, see Figure 2.11 and Video.
We compute the slope of the tangent to the hyperbola at M and then use the fact that the slope of the CML is the same. Denote the weights of A and M in a portfolio on the hyperbola by x = (x, 1 − x). The risk and return are of the form

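\[ \mu_x = x\mu_A + (1-x)\mu_m, \]
\[ \sigma_x = \left(x^2\sigma_A^2 + (1-x)^2\sigma_m^2 + 2x(1-x)\,\mathrm{Cov}(K_A, K_m)\right)^{1/2}, \]

and we compute the derivatives with respect to x at x = 0 to get

\[ \frac{\partial\mu_x}{\partial x}\Big|_{x=0} = \mu_A - \mu_m, \]
\[ \frac{\partial\sigma_x}{\partial x}\Big|_{x=0} = \frac{\mathrm{Cov}(K_A, K_m) - \sigma_m^2}{\sigma_m}. \]
The slope of the tangent is the ratio of these derivatives and we equate it to the slope of the CML:
\[ \frac{\mu_A - \mu_m}{\dfrac{\mathrm{Cov}(K_A, K_m) - \sigma_m^2}{\sigma_m}} = \frac{\mu_m - r}{\sigma_m}. \]

Solving for µA we get

\[ \mu_A = r + \frac{\mathrm{Cov}(K_A, K_m)}{\sigma_m^2}(\mu_m - r), \]

as required.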

Here is a Video with the proof.

The second term in (2.28) is called the risk premium. It represents the additional return
required by an investor who faces a risk represented by the link of the portfolio to the whole
market.
The CAPM formula is concerned with expectations. Our next step is to consider the returns, that is the random variables. Define the error as
\[ e_w = K_w - r - \beta_w(K_m - r). \]
By the CAPM formula we have
\[ E(e_w) = 0. \]
It is interesting to see that the principle of minimising the error implies the form of the beta coefficient:

Proposition 2.19
Given a portfolio w, let $e_w = K_w - r - \beta(K_m - r)$ for some number β. The variance of $e_w$ is minimal for $\beta = \frac{\mathrm{Cov}(K_w, K_m)}{\mathrm{Var}(K_m)}$.

Here is a Video with the proof.

Proof
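We can compute the variance of $e_w$ as

\[ \begin{aligned}
\mathrm{Var}(e_w) &= \mathrm{Var}(K_w - r - \beta(K_m - r)) \\
&= \mathrm{Var}(K_w - \beta K_m) \qquad (\mathrm{Var}(X + a) = \mathrm{Var}(X) \text{ for constant } a) \\
&= \mathrm{Var}(K_w) + \mathrm{Var}(-\beta K_m) + 2\,\mathrm{Cov}(K_w, -\beta K_m) \\
&= \mathrm{Var}(K_w) + \beta^2\,\mathrm{Var}(K_m) - 2\beta\,\mathrm{Cov}(K_w, K_m).
\end{aligned} \]

This is a quadratic function of β with a positive coefficient for β². The minimum is found when
\[ 0 = 2\beta\,\mathrm{Var}(K_m) - 2\,\mathrm{Cov}(K_w, K_m), \]
hence
\[ \beta = \frac{\mathrm{Cov}(K_w, K_m)}{\mathrm{Var}(K_m)}, \]
which concludes the proof.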

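We rewrite the definition of the error as

\[ K_w = r + \beta_w(K_m - r) + e_w \tag{2.29} \]

and use this formula to compute the variance of the portfolio with weights w. First we find the covariance between $e_w$ and $K_m$:

\[ \mathrm{Cov}(K_m, e_w) = \mathrm{Cov}(K_m, K_w - r - \beta_w(K_m - r)) = \mathrm{Cov}(K_m, K_w) - \beta_w\,\mathrm{Cov}(K_m, K_m) = 0 \]

by the definition of beta. Since the two random terms in (2.29) are uncorrelated, the variance of their sum is the sum of their variances:

\[ \mathrm{Var}(K_w) = \beta_w^2\,\mathrm{Var}(K_m - r) + \mathrm{Var}(e_w), \]

so, using the translation invariance of the variance,

\[ \sigma_w^2 = \beta_w^2\sigma_m^2 + \mathrm{Var}(e_w). \]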

This formula sheds better light on the above distinction between two kinds of risk. The first
term represents the systematic risk that cannot be avoided by adding more securities to the
portfolio and it is measured by the beta coefficient. The second term is the diversifiable part of
the risk. If w = m, then we have Var(em ) = 0 so this term can be discarded if we invest in the
market portfolio or in a portfolio sufficiently large to serve in practice as its substitute.
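The split of σ²w into a systematic and a diversifiable part can be checked on a small discrete model. The scenario returns, probabilities and the rate r below are invented for illustration, and the helper names are hypothetical. For such made-up data E(ew) need not vanish (that would require CAPM to hold), but the variance decomposition only relies on the definition of beta.

```python
def mean(xs, p):
    return sum(pi * x for pi, x in zip(p, xs))


def cov(xs, ys, p):
    """Covariance of two random variables given by their values in each scenario."""
    mx, my = mean(xs, p), mean(ys, p)
    return sum(pi * (x - mx) * (y - my) for pi, x, y in zip(p, xs, ys))


# Hypothetical four-scenario model.
p = [0.25, 0.25, 0.25, 0.25]
K_m = [0.20, 0.10, 0.02, -0.08]    # market return in each scenario
K_w = [0.25, 0.08, 0.05, -0.10]    # portfolio return in each scenario
r = 0.03

beta_w = cov(K_w, K_m, p) / cov(K_m, K_m, p)
e_w = [kw - r - beta_w * (km - r) for kw, km in zip(K_w, K_m)]

systematic = beta_w ** 2 * cov(K_m, K_m, p)   # beta^2 * sigma_m^2
diversifiable = cov(e_w, e_w, p)              # Var(e_w)
total = cov(K_w, K_w, p)                      # sigma_w^2
print(beta_w, systematic, diversifiable, total)
```

The sum of the two printed parts agrees with the total variance up to rounding, and the error is uncorrelated with the market return.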
CAPM shows the link between βw and the expected return µw and, consequently, the prices
of securities. An increase of the systematic risk increases the required return and pushes down
the prices. Diversifiable risk attracts no premium having no effect on µw since it can be elim-
inated by spreading an investment in a suitable portfolio, in particular, choosing the market
portfolio which is well approximated by some stock exchange indices.
The above relation between the returns (see (2.29)) and the connection to minimising the variance of the error lead to the following method of finding the beta from historical data. Beta can be recognised as the result of applying linear regression (see Excel and Video). If the realized past returns on the security are plotted against the realized returns on the market portfolio, the line of best fit, also known as the characteristic line, can be found. We can write the equation of the line obtained in the form

y = βx + α.

If the historical data are consistent with the expectations for the future returns (these are involved in the CAPM formula), then the β obtained from linear regression can be used to find the expected return on the security by means of CAPM.
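A minimal sketch of this procedure is given below. The return series, the risk free rate and the expected market return are invented for illustration, and `fit_line` is a hypothetical helper implementing ordinary least squares for a single regressor.

```python
def fit_line(x, y):
    """Least squares fit of y = beta * x + alpha; returns (beta, alpha)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return beta, alpha


# Hypothetical realized returns (market on the x axis, security on the y axis).
market = [0.04, -0.02, 0.03, 0.01, -0.01, 0.05]
security = [0.07, -0.04, 0.03, 0.02, -0.03, 0.08]

beta, alpha = fit_line(market, security)

# Expected return from CAPM, with assumed r and expected market return mu_m.
r, mu_m = 0.03, 0.10
expected_return = r + beta * (mu_m - r)
print(beta, alpha, expected_return)
```

The slope returned by `fit_line` is the sample covariance divided by the sample variance of the market returns, which matches the form of beta in Proposition 2.19.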

Figure 2.12 The capital market line and the security market line. [Two panels: the CML in the (σ, µ)-plane with σm, µm and r marked, and the SML in the (β, µ)-plane with βw, µw and βm = 1 marked.]

2.9 Security Market Line


Drawing on the results established in Section 2.6, we can give an alternative proof of the CAPM formula. Consider an arbitrary portfolio with weights w. The vector of weights in the market portfolio will be denoted by m and as we know it satisfies

\[ \gamma Cm = \mu - r\mathbf{1} \]
for some number γ > 0. The beta of the portfolio with weights w can, therefore, be written as

\[ \beta_w = \frac{\mathrm{Cov}(K_w, K_m)}{\sigma_m^2} = \frac{w^TCm}{m^TCm} = \frac{w^T\frac{1}{\gamma}(\mu - r\mathbf{1})}{m^T\frac{1}{\gamma}(\mu - r\mathbf{1})} = \frac{\mu_w - r}{\mu_m - r}. \]

Solving this for µw we obtain the CAPM relation again:

µw = r + βw (µm − r).
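The CAPM relation can be verified numerically for a small market. The sketch below uses two hypothetical risky securities (so that the 2×2 covariance matrix can be inverted by hand) and an assumed rate r; it builds the market portfolio from (2.24), computes betas from the covariance matrix and compares both sides of the formula for a few portfolios.

```python
# Hypothetical two-asset market.
C = [[0.04, 0.01],
     [0.01, 0.09]]
mu = [0.08, 0.12]
r = 0.03

# Explicit inverse of the 2x2 covariance matrix.
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
C_inv = [[C[1][1] / det, -C[0][1] / det],
         [-C[1][0] / det, C[0][0] / det]]

# Market portfolio m = C^{-1}(mu - r 1) / (1^T C^{-1}(mu - r 1)).
excess = [x - r for x in mu]
z = [sum(C_inv[i][j] * excess[j] for j in range(2)) for i in range(2)]
m = [zi / sum(z) for zi in z]


def cov(u, v):
    """Covariance of the returns on portfolios with weights u and v."""
    return sum(u[i] * C[i][j] * v[j] for i in range(2) for j in range(2))


mu_m = sum(mi * x for mi, x in zip(m, mu))
var_m = cov(m, m)


def capm_sides(w):
    """Return (mu_w, r + beta_w * (mu_m - r)) for a portfolio w."""
    beta_w = cov(w, m) / var_m
    mu_w = sum(wi * x for wi, x in zip(w, mu))
    return mu_w, r + beta_w * (mu_m - r)


for w in ([1.0, 0.0], [0.0, 1.0], [0.3, 0.7], [-0.5, 1.5]):
    print(w, capm_sides(w))
```

For each portfolio the two printed numbers coincide up to rounding, including the single securities and a portfolio with a short position.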
The expected return is an affine function of the beta coefficient. The graph of this function
in the (β, µ)-plane is called the security market line. This straight line is shown in Figure 2.12
where the CML is also plotted for comparison.
Going back to the discussion of the market equilibrium let us see the theoretical conse-
quences of possible departure from this state. In the state of equilibrium everyone is holding a
fraction of the market portfolio, the prices determine the expected returns which exactly match
the required returns given by the right hand sides of CAPM applied for each security. Any new
information about a particular security may affect the expected return and the CAPM may no
longer hold. Suppose, for example, that

µA > r + βA (µm − r)

for some security. In this case investors place buy orders and as a result of the demand created the price goes up, which pushes the expected return down. On the other hand, if

µA < r + βA (µm − r)

investors want to sell or even short-sell the security, the price falls because of the excess supply, and the expected return increases. In both cases we should observe some adjustments restoring the CAPM formula and the equilibrium.
Apart from illustrating the market equilibrium, CAPM has applications in analysing the performance of various investments. The right hand side of CAPM gives the target return and this is compared with the realized return. The difference, realized return minus the target return, is called the Jensen index. The goal is to achieve a positive value of this index, the higher the better.
Another approach to the evaluation of performance comes from comparing the market prices of risk, where the market price of risk is by definition the excess return per unit of risk:
\[ MPR_w = \frac{\mu_w - r}{\sigma_w}. \]
The Sharpe index or Sharpe Ratio is obtained if the expected returns are replaced by the realized returns and the standard deviation by the sample standard deviation. The benchmark is the market price of risk for the market portfolio, in other words the slope of the CML:
\[ MPR_m = \frac{\mu_m - r}{\sigma_m}. \]
The goal of an investor is to maximize the Sharpe index. Apart from the evaluation of performance, the above measures can be used to construct portfolios (on the basis of historical data).
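Both performance measures take only a few lines to compute. In the sketch below all inputs are hypothetical, the function names are illustrative, and the Sharpe ratio uses the mean of the realized returns in place of the expected return, as described above.

```python
import statistics


def jensen_index(realized_return, beta, r, mu_m):
    """Realized return minus the CAPM target return r + beta * (mu_m - r)."""
    return realized_return - (r + beta * (mu_m - r))


def sharpe_ratio(realized_returns, r):
    """Mean excess realized return divided by the sample standard deviation."""
    excess = statistics.mean(realized_returns) - r
    return excess / statistics.stdev(realized_returns)


# Hypothetical inputs.
print(jensen_index(realized_return=0.12, beta=1.2, r=0.03, mu_m=0.10))
print(sharpe_ratio([0.10, 0.02, 0.06], r=0.03))
```

A positive Jensen index means the investment beat its CAPM target; the Sharpe ratio would then be compared with the same quantity computed for the market portfolio.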

Remark 2.20
Capital Asset Pricing model is also called a single factor model. This single factor is
beta showing the dependence of required return on risk related to the whole market. This
theory has been generalised to a multi-factor version. The idea is similar and in addition
to the market we take some other economic quantities, like inflation, growth of the
national product, exchange rates for key foreign currencies, markets in other countries.
The formula for the expected return takes the form

µw = α + β1 F1 + β2 F2 + · · · + βm Fm

where β j are constants and F j represent the numerical values of the factors chosen.
Linear regression can be applied and the beta parameters can be estimated on the basis
of historical data. This theory is called Arbitrage Pricing Theory (APT in brief).

3 Utility functions
3.1 Basic notions and axioms
We begin with recalling some basic probability notation, slightly modified for our needs. The probability space is discrete, Ω = {1, . . . , N}, and the elements i ∈ Ω are called states. The prices of securities are denoted by $S_j(0)$, the initial prices, and $S_j(1, i)$, the prices at the end of the period, which depend on the state. A portfolio is described by the numbers $x_j$ of securities held, so it is represented by a vector $x = (x_0, x_1, \dots, x_n)$. The initial wealth of the investor is denoted by W, so the formation of a portfolio is subject to the bound
\[ \sum_{j=0}^{n} x_j S_j(0) \le W. \]

The final wealth is a random variable determined by the portfolio chosen, and we denote it by $V_x(1)$. In the state i it takes the value
\[ V_x(1, i) = \sum_{j=0}^{n} x_j S_j(1, i) = (Sx)_i, \]

where $S = [S_{ij}] = [S_j(1, i)]$. This amount can be consumed by the investor, which motivates the name feasible consumption set for the set
\[ FCS = \Big\{X : X(i) \ge 0 \text{ for all } i,\ X = V_x(1) \text{ where } \sum_{j=0}^{n} x_j S_j(0) \le W\Big\}. \]

We assume that the matrix S represents a monomorphism. This means in particular that the number of rows, i.e. the cardinality N of Ω, is at least the number n of columns, i.e. the number of assets. This also means that the rank of the matrix is maximal, i.e. n. Video

Remark 3.1
The following version of this setup is often considered in the literature. An investor (often called an agent) receives some known endowment e(0) at the beginning and a random one, e(1), at the end of the period, and consumes c(0) at the beginning (a known quantity) and c(1, i) in state i (a random variable). Then the initial constraint is of the form
\[ \sum_{j=0}^{n} x_j S_j(0) \le e(0) - c(0), \]

meaning that the money available for investment is what is left after initial consumption. The end of period consumption is the main object now and the feasible consumption plans are considered:
\[ FCP = \Big\{c(1) : c(1) \ge 0,\ c(1) \le e(1) + \sum_{j=0}^{n} x_j S_j(1),\ \sum_{j=0}^{n} x_j S_j(0) \le e(0) - c(0)\Big\}. \]

Our setting is a particular case with W = e(0) − c(0), e(1) = 0, and the assumption that the whole final value of the portfolio is consumed.

We assume that the investor can decide between any two possible final consumptions. So we assume that a binary relation on FCS is given: for X, Y ∈ FCS we write X ⪯ Y to mean that Y is (weakly) preferred to X, and X ∼ Y if the investor is indifferent between X and Y. We also write X ≺ Y if X ⪯ Y and X ≁ Y.

For a quick overview and motivation see Video.

The following axioms are formulated to describe the rational behaviour of investors.

Axiom 1 (transitivity) If X ⪯ Y and Y ⪯ Z then X ⪯ Z.

This axiom is sometimes called the consistency axiom since it excludes irrational prefer-
ences.

Axiom 2 (completeness) For all X, Y either X ⪯ Y or Y ⪯ X.

Here we assume that each individual can always arrive at a decision.


If Axioms 1 and 2 are satisfied, we call ⪯ a preference relation. In practice this relation may be difficult to specify. An alternative approach is based on employing a so-called utility. Consider a real function $U : \mathbb{R}^N \to \mathbb{R}$ representing the preferences of an investor who wishes to solve the problem

max{U(X) : X ∈ FCS }.

Usually, we assume that

(i). U is strictly increasing with respect to each variable,

(ii). U is differentiable,

(iii). U is strictly concave.

Such a function determines the preference relation:

X ⪯ Y if and only if U(X) ≤ U(Y).

The question is whether any preference relation can be represented by a utility. The answer is positive in the finite case.
The situation is different in the general case, where this representation may be impossible (unless some additional technical assumptions are made).
A particular case of utility is the expected utility determined by means of some u : R → R
called the utility function, and the formula

U(X) = E(u(X)).
If such a u exists we say that U is a von Neumann-Morgenstern utility. The crucial feature
of this representation of utility is that it is done by means of a single variable function. If X
is non-random (corresponding to a portfolio involving risk-free assets only) then X(i) = c for
some constant c and then U(c) = u(c). The following assumptions are usually imposed on
utility functions (analogous to those introduced for utilities):

(i). u is strictly increasing,

(ii). u is differentiable,

(iii). u is strictly concave and as a result the first derivative u′, called the marginal utility, is decreasing.

Typical examples of utility functions are as follows:

(i). Exponential: $u(x) = -e^{-ax}$;

(ii). Logarithmic: $u(x) = \ln x$;

(iii). Power: $u(x) = ax^a$ for $a \le 1$;

(iv). Quadratic: $u(x) = x - \frac{1}{2}bx^2$ (which is increasing only for $x < \frac{1}{b}$).
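The four examples can be coded directly and used to compute an expected utility U(X) = E(u(X)). The sketch below is illustrative only: the parameter values and the lottery are made up, the power utility is taken in the form u(x) = a·xᵃ, and the function names are hypothetical. It also compares E(u(X)) with u(E(X)), which for a strictly concave u shows a preference for the sure amount E(X) over the random X.

```python
import math


def exponential(a):
    return lambda x: -math.exp(-a * x)


def power(a):
    return lambda x: a * x ** a


def quadratic(b):
    return lambda x: x - 0.5 * b * x * x   # increasing only for x < 1/b


def expected_utility(u, outcomes, probs):
    """E(u(X)) for a discrete random variable X."""
    return sum(p * u(x) for p, x in zip(probs, outcomes))


# Hypothetical lottery and parameters.
outcomes = [50.0, 100.0, 150.0]
probs = [0.25, 0.5, 0.25]
mean_outcome = sum(p * x for p, x in zip(probs, outcomes))

utilities = {
    "exponential": exponential(0.01),
    "logarithmic": math.log,
    "power": power(0.5),
    "quadratic": quadratic(0.001),
}
for name, u in utilities.items():
    print(name, expected_utility(u, outcomes, probs), u(mean_outcome))
```

In each row the first number (expected utility of the lottery) is smaller than the second (utility of the expected outcome).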

The computation of the expected utility involves only the probability distribution $P_X$ of X, not the random variable itself, since
\[ E(u(X)) = \int u(x)\,dP_X(x). \]

For this reason the preference relation concerned with the expected utility is often discussed on the set P of all probability measures on R. Of course the relation given on random variables induces the relation on distributions: if X ⪯ Y then we say that PX ⪯ PY. The preference relation on P is assumed to satisfy the following Axioms.

Axiom 1 (transitivity) If P1 ⪯ P2 and P2 ⪯ P3 then P1 ⪯ P3.

Axiom 2 (completeness) For all P1, P2 ∈ P either P1 ⪯ P2 or P2 ⪯ P1.

Axiom 3 (independence) For all P1, P2, P3 ∈ P and a ∈ (0, 1], if P1 ≺ P2 then aP1 + (1 − a)P3 ≺ aP2 + (1 − a)P3.

In other words, the choice between P1 and P2 is not affected by the appearance of some other opportunity. However, this axiom is not consistent with empirical facts. We present the so-called Allais paradox. Given the choice between P1 = δ1M (1 million for sure; δ is the Dirac delta measure, δa(A) = 1 if and only if a ∈ A, so δa({a}) = 1) and P2 = 0.1 × δ5M + 0.89 × δ1M + 0.01 × δ0, most people choose P1. They also prefer P3 = 0.1 × δ5M + 0.9 × δ0 to P4 = 0.11 × δ1M + 0.89 × δ0. This, however, contradicts the independence axiom.
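To see where the contradiction comes from, the following worked step may help. It is a sketch, assuming that P4 is the lottery 0.11 × δ1M + 0.89 × δ0 and introducing an auxiliary lottery Q that does not appear in the text. All four lotteries can be written as mixtures with the common weight 0.11:

```latex
\[ Q = \tfrac{10}{11}\,\delta_{5M} + \tfrac{1}{11}\,\delta_0, \]
\[ P_1 = 0.11\,\delta_{1M} + 0.89\,\delta_{1M}, \qquad P_2 = 0.11\,Q + 0.89\,\delta_{1M}, \]
\[ P_4 = 0.11\,\delta_{1M} + 0.89\,\delta_0, \qquad P_3 = 0.11\,Q + 0.89\,\delta_0. \]
```

If Q ≺ δ1M, then Axiom 3 with a = 0.11 and third measure δ0 gives P3 ≺ P4, against the second observed choice. If δ1M ≺ Q, then Axiom 3 with third measure δ1M gives P1 ≺ P2, against the first observed choice. So a strict preference between δ1M and Q in either direction is incompatible with the pair of choices reported above (the case of indifference needs a separate argument, which is not covered by this sketch).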
Axiom 4 (Archimedean) For all P1 ≺ P2 ≺ P3 there exists a ∈ (0, 1) such that aP1 + (1 − a)P3 ≺ P2 and there exists b ∈ (0, 1) such that P2 ≺ bP1 + (1 − b)P3.

In other words, a ‘good’ plan P3 is never ultimately good since combined with bad P1
becomes worse than P2 . Similarly, there is no ultimately ‘bad’ plan. In financial terms this
axiom is acceptable though with some limitations: if ‘bad’ means ultimate bankruptcy, no
sweetener (even high probability of huge wealth) may be acceptable to some individuals. The
name ‘Archimedean’ comes from a well-known property of ordered fields: for all positive x, y
there is an integer n so that nx > y (despite x being possibly much smaller than y).
One can prove that Axioms 1-4 are equivalent to the existence of a von Neumann-Morgenstern utility function in the case of measures supported on a finite set. Some additional conditions allow an extension of this result to the general case. We shall not dwell on that.

3.2 Utility maximisation

Definition 3.2
We say that a portfolio y = (y1 , . . . , yn ) is an arbitrage opportunity if Vy (0) ≤ 0 and
Vy (1) ≥ 0 with Vy (1, i) > 0 for at least one i ∈ Ω = {1, . . . , N}.
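Definition 3.2 translates into a simple check for a given portfolio. The sketch below tests one candidate portfolio only (finding an arbitrage, or proving that none exists, is a different and harder task); the prices are hypothetical and `is_arbitrage` is an illustrative name. The array `S1[i][j]` stands for S_j(1, i).

```python
def is_arbitrage(y, S0, S1, tol=1e-12):
    """Check Definition 3.2 for the portfolio y.

    S0[j] is the time 0 price of asset j, S1[i][j] its time 1 price in state i.
    """
    v0 = sum(yj * s for yj, s in zip(y, S0))
    v1 = [sum(yj * s for yj, s in zip(y, row)) for row in S1]
    return v0 <= tol and all(v >= -tol for v in v1) and any(v > tol for v in v1)


# Hypothetical market: a risk free asset and a risky asset, three states.
S0 = [1.0, 10.0]
S1_fair = [[1.05, 12.0], [1.05, 11.0], [1.05, 9.0]]
# Same market, but the risky asset now beats the risk free one in every state.
S1_mispriced = [[1.05, 12.0], [1.05, 11.0], [1.05, 10.6]]

y = [-10.0, 1.0]   # borrow 10 risk free, buy one risky share
print(is_arbitrage(y, S0, S1_fair))        # False
print(is_arbitrage(y, S0, S1_mispriced))   # True
```

In the second market the borrowed amount grows to 10.5 while the share is worth more than that in every state, so the zero-cost portfolio never loses and wins with positive probability.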

A fundamental assumption of finance theory is that arbitrage opportunities do not exist (for a comment see Video). We shall see how this principle is related to the existence of a solution to the problem of maximising utility.

Theorem 3.3
Assume that the rank of S is n. If there is a solution to the problem

max{U(X) : X ∈ FCS },

where U satisfies the above conditions, then there is no arbitrage. Conversely, if U is continuous and there is no arbitrage, then the problem has a solution.

Proof
Suppose there is an x such that $U(X) \le U(V_x(1))$ for any feasible consumption X and suppose that there exists an arbitrage opportunity y. Take z = x + y. At the initial time $V_z(0) = V_x(0) + V_y(0) \le V_x(0) \le W$, and at time 1 we have $V_z(1) = V_x(1) + V_y(1) \ge 0$, so z is feasible. Since U is strictly increasing in each variable, $U(V_z(1)) = U(V_x(1) + V_y(1)) > U(V_x(1))$, which is a contradiction.

To see the proof so far click Video

For the converse, note that a continuous function attains its maximum on a compact set, so it is sufficient to see that the set of feasible consumptions is bounded, since it is obviously closed (being defined by weak inequalities). To this end it is sufficient to show that the set $A \subset \mathbb{R}^n$ of all portfolios x such that $V_x(1) \in FCS$ is bounded. Suppose to the contrary that there is a sequence $x_k \in A$ with $\|x_k\|$ unbounded. We may assume that $\|x_k\| \to \infty$, taking a subsequence if necessary. The sequence $z_k = \frac{x_k}{\|x_k\|}$ is bounded and has a subsequence convergent to a limit z. We shall see that z is an arbitrage opportunity, which will be a contradiction. First,
\[ V_z(0) = \sum_j z_j S_j(0) = \lim_{k\to\infty} \frac{1}{\|x_k\|}\sum_j (x_k)_j S_j(0) \le \lim_{k\to\infty}\frac{W}{\|x_k\|} = 0, \]
so $V_z(0) \le 0$. Second, $V_{x_k}(1) \ge 0$ by the definition of FCS, hence $V_{z_k}(1) \ge 0$, and this inequality is preserved in the limit, so $V_z(1) \ge 0$.

The second part of the proof is described in the Video

But $V_z(1) \ne 0$ since $\|z\| = 1$ (for otherwise $V_z(1) = Sz = 0$ would imply z = 0). This demonstrates that z is an arbitrage, which is a contradiction.

Final part of the argument Video

We turn to the question of the relation between the security prices at time 0 and 1, introducing the so-called state prices $\pi_i$, which are positive numbers such that
\[ S_j(0) = \sum_{i=1}^{N} S_j(1, i)\,\pi_i. \]

Suppose that one of the securities is risk-free, that is, $S_1(1, i) = 1$ for all i, say. Then
\[ S_1(0) = \sum_{i=1}^{N} \pi_i, \]

which is the price of a sure euro to be received at time 1, that is, it is the discount factor. We can then define the interest rate (suppose that the length of the period is one year)
\[ 1 + R = \frac{1}{\sum_i \pi_i}. \]

Remark 3.4
State prices are related to risk neutral probabilities. Recall that
\[ S_j(0) = \frac{1}{1+R}\,E_q(S_j(1)) = \frac{1}{1+R}\sum_i q_i S_j(1, i), \]
where $q_i$ are risk neutral (martingale) probabilities. The above relation says that the discounted prices form a martingale. The existence of such probabilities is guaranteed by the no-arbitrage condition; uniqueness holds in complete models, which in our setting means that each random variable X representing the payoff at time 1 can be replicated, i.e. written as $X = V_x(1)$ for some portfolio x. This is the case if the matrix $S = [S_j(1, i)]_{ij}$ is square and invertible. We can see that the existence of state prices is equivalent to the no-arbitrage principle and
\[ \pi_i = \frac{q_i}{1+R}. \]
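The relations between state prices, risk neutral probabilities and the interest rate can be illustrated with a few lines of code. The numbers below are illustrative (a three-state model with one risk free and one risky asset), and the variable names are hypothetical.

```python
R = 0.05
q = [0.3124, 0.2814, 0.4062]          # assumed risk neutral probabilities

# State prices pi_i = q_i / (1 + R).
pi = [qi / (1 + R) for qi in q]

payoffs = {
    "risk_free": [1.05, 1.05, 1.05],  # S_1(1, i)
    "risky": [12.0, 11.0, 9.0],       # S_2(1, i)
}


def price(payoff):
    """Time 0 price as the state-price-weighted sum of the time 1 payoffs."""
    return sum(s * x for s, x in zip(pi, payoff))


implied_R = 1 / sum(pi) - 1           # from 1 + R = 1 / sum of state prices
print(price(payoffs["risk_free"]), price(payoffs["risky"]), implied_R)
```

With these inputs the risk free asset costs 1, the risky asset costs 10, and the rate recovered from the state prices is 5%.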

We shall show that state prices can be found by means of utilities. An interesting fact is
that the result is to some extent independent of the particular form of utility.

Theorem 3.5
Suppose that $X^*$ is a solution of the maximisation problem for utility U. Then there is a number λ such that
\[ \pi_i = \lambda\,\frac{\partial U}{\partial X_i}(X^*) \]
are state prices.

The proof is given in Video

Proof
First let us observe that for a strategy $x^*$ for which $V_{x^*}(1) = X^*$ we must have $V_{x^*}(0) = W$. This is because if we had $V_{x^*}(0) < W$, we could invest the funds $W - V_{x^*}(0)$ in one of the assets, which would result in an investment that has higher time one value than $V_{x^*}(1)$. It would then also have higher utility than $X^* = V_{x^*}(1)$, which would contradict the fact that $X^*$ is optimal. This means that

\[ \max_{X \in FCS} U(X) = \max\{U(V_x(1)) : V_x(0) = W,\ V_x(1) \ge 0\}. \]

Let us now define two functions

\[ f, g : \mathbb{R}^n \to \mathbb{R} \]

as

\[ f(x) = U(X^* + V_x(1)) = U(X^* + Sx), \]
\[ g(x) = V_x(0). \]

We see that
\[ \max_{X \in FCS} U(X) = \max_{g(x)=0} f(x) = f(0). \]

From the method of Lagrange multipliers, there exists a λ such that

\[ \lambda\nabla f(0) - \nabla g(0) = 0. \tag{3.1} \]

Note that Sx is an N dimensional vector with coefficients $(Sx)_j = \sum_{i=1}^{n} S_i(1, j)\,x_i$. This means that

\[ \begin{aligned}
\frac{\partial f}{\partial x_m}(0) &= \sum_{j=1}^{N} \frac{\partial U}{\partial X_j}(X^* + Sx)\,\frac{\partial (Sx)_j}{\partial x_m}\Big|_{x=0} \\
&= \sum_{j=1}^{N} \frac{\partial U}{\partial X_j}(X^* + Sx)\,S_m(1, j)\Big|_{x=0} \\
&= \sum_{j=1}^{N} \frac{\partial U}{\partial X_j}(X^*)\,S_m(1, j).
\end{aligned} \]

From the fact that
\[ \frac{\partial g}{\partial x_m} = \frac{\partial}{\partial x_m}\left(\sum_{i=1}^{n} x_i S_i(0)\right) = S_m(0), \]

we see that (3.1) leads to
\[ \lambda\sum_{j=1}^{N} \frac{\partial U}{\partial X_j}(X^*)\,S_m(1, j) = S_m(0). \]

This means that $\pi_j = \lambda\frac{\partial U}{\partial X_j}(X^*)$ are state prices, as required.

Theorem 3.6
Assume that U(X) = E(u(X)). If $X^* = (X_1^*, \dots, X_N^*)$ is a solution of the problem

max{U(X) : X ∈ FCS },

then with $(u')^{-1}$ denoting the inverse function of $u'$, we have

\[ X_i^* = (u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right) \tag{3.2} \]

where λ is determined by the condition

\[ W = \sum_{i=1}^{N} \pi_i\,(u')^{-1}\left(\frac{\pi_i}{\lambda p_i}\right). \tag{3.3} \]

A Video of the proof.

Proof
For the particular case of expected utility we have
\[ \frac{\partial U}{\partial X_i}(X^*) = \frac{\partial}{\partial X_i} \sum_{k=1}^{N} u(X^*(k))\, p_k = u'(X^*(i))\, p_i, \]
hence the state prices are of the form
\[ \pi_i = \lambda u'(X^*(i))\, p_i. \]
Dividing by $\lambda p_i$ and applying $(u')^{-1}$ to both sides of the equation gives (3.2).

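There exists a strategy $x^*$ with $V_{x^*}(0) = W$ such that $V_{x^*}(1) = X^*$. We can compute
\[
\begin{aligned}
\sum_{i=1}^{N} \pi_i X_i^* &= \sum_{i=1}^{N} \pi_i V_{x^*}(1, i) \\
 &= \sum_{i=1}^{N} \pi_i \sum_{j=1}^{n} x_j^* S_j(1, i) \\
 &= \sum_{j=1}^{n} x_j^* \sum_{i=1}^{N} \pi_i S_j(1, i) \\
 &= \sum_{j=1}^{n} x_j^* S_j(0) && \text{(since $\pi_i$ are state prices)} \\
 &= V_{x^*}(0) \\
 &= W,
\end{aligned}
\]
which combined with (3.2) gives (3.3).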

Recall the definition of risk neutral probabilities and use the above:
\[ q_i = \frac{\pi_i}{\sum_{k=1}^{N} \pi_k} = \frac{p_i u'(X^*(i))}{\sum_{k=1}^{N} u'(X^*(k))\, p_k} = p_i \frac{u'(X^*(i))}{E(u'(X^*))}. \]

Now we have the following expressions for the current securities prices:
\[
\begin{aligned}
S_j(0) &= \sum_{i=1}^{N} \pi_i S_j(1, i) \\
 &= \lambda \sum_{i=1}^{N} u'(X^*(i))\, p_i\, S_j(1, i) \\
 &= \lambda E(u'(X^*) S_j(1)) \\
 &= \gamma E_q(S_j(1)),
\end{aligned}
\]
where $\gamma = \sum_i \pi_i$ is the discount factor.

Example 3.7
Consider a trinomial model with two assets: risk free with $S_1(1, i) = S_1(0)(1 + r)$ and
risky with $S_2(1, i) = S_2(0)(1 + K_2(i))$, i = 1, 2, 3. Let $u(x) = -\exp(-ax)$. Suppose that r = 5%,
\[
K_2(i) = \begin{cases} 20\% & \text{with probability } 0.4, \\ 10\% & \text{with probability } 0.3, \\ -10\% & \text{with probability } 0.3, \end{cases}
\]
$S_1(0) = 1$, $S_2(0) = 10$. For a = 0.1 maximising expected utility gives a portfolio
holding 1.6566 units of the risk-free asset and 1.8343 shares of the risky asset, with
$X^*(i) = 23.75$, 21.92 and 18.25, respectively. Then
$u'(X^*(i))p_i = 0.00372$, 0.00335, 0.00484 and after normalising we get the risk neutral
probabilities $q_i = 0.3124$, 0.2814, 0.4062, respectively. Finally, $\sum_i q_i K_2(i) = r = 5\%$, so the
discounted prices of the risky asset
form a martingale. For other values of a we get different measures but the martingale
property is preserved, as shown in general above (see Excel).
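
The numbers in Example 3.7 can be reproduced with a short script. This is only a sketch: the initial wealth W = 20 is an assumption (it is the value implied by the quoted $X^*(i)$, and is not stated in the example), and the bisection on the first-order condition is one of several ways to find the maximiser.

```python
import math

# Data from Example 3.7; W = 20 is an assumption (the initial wealth implied
# by the quoted X* values, not given explicitly in the text).
a, r, W = 0.1, 0.05, 20.0
S1_0, S2_0 = 1.0, 10.0
K = [0.20, 0.10, -0.10]
p = [0.4, 0.3, 0.3]

def terminal_wealth(x2):
    """Time-one wealth per state when x2 risky shares are held and the rest of W is risk free."""
    x1 = (W - x2 * S2_0) / S1_0
    return [x1 * S1_0 * (1 + r) + x2 * S2_0 * (1 + k) for k in K]

def foc(x2):
    """Derivative of E[u(X)] with respect to x2, for u(x) = -exp(-a x)."""
    X = terminal_wealth(x2)
    return sum(pi * a * math.exp(-a * Xi) * S2_0 * (k - r) for pi, Xi, k in zip(p, X, K))

# E[u] is concave in x2, so the first-order condition has a single root; bisect for it.
lo, hi = 0.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if foc(mid) > 0:
        lo = mid
    else:
        hi = mid
x2 = 0.5 * (lo + hi)
x1 = (W - x2 * S2_0) / S1_0

X_star = terminal_wealth(x2)
weights = [pi * a * math.exp(-a * Xi) for pi, Xi in zip(p, X_star)]  # u'(X*(i)) p_i
q = [w / sum(weights) for w in weights]                               # normalised weights
expected_return_q = sum(qi * k for qi, k in zip(q, K))

print(x1, x2, X_star, q, expected_return_q)
```

For exponential utility the optimal number of risky shares does not depend on W, so a different assumed wealth would change only the risk-free holding and the level of $X^*$, not the normalised weights.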

3.3 Relation to mean variance analysis


Decision making based on utility functions is consistent in a special case with Markowitz
theory based on finding balance between risk (variance or standard deviation) and return (ex-
pected return). This case is concerned with a special kind of utility, no restrictions imposed on
the distributions of returns.
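Consider the quadratic utility function $u(x) = ax - \frac{1}{2}bx^2$, which is applicable under the
assumption that the possible future wealth does not exceed $\frac{a}{b}$ (so that u is increasing). Video In this case,
since $E[W^2(1)] = E[W(1)]^2 + \mathrm{Var}(W(1))$,
\[ \max E\left[ aW(1) - \tfrac{1}{2} b W^2(1) \right] = \max \left( aE[W(1)] - \tfrac{1}{2} b E[W(1)]^2 - \tfrac{1}{2} b\, \mathrm{Var}(W(1)) \right). \]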

We can see that expectation and variance are all that is needed to arrive at a decision. For a
given level of expected wealth (which is equivalent to saying ”for a given expected return”)
we need to minimise the variance of wealth. But

Var(W(1)) = W 2 (0)Var(1 + K) = W 2 (0)Var(K)

which means that to optimise we need to minimise the variance of return. This means that
the optimal (from the point of view of utility approach) portfolio lies on the frontier (efficient
frontier, if the expected return is large enough).
Our next step is to close another circle of ideas in portfolio theory by showing the relationship
between the utility approach and the Capital Asset Pricing Model.
First, we will relate the results of the previous section to the approach based on returns. To
this end we go back to the description of investment decisions by means of weights and returns
rather than numbers of securities and their prices. Assume that the first security is risk free:

S 1 (1) = S 1 (0)(1 + R)

and for the remaining we have random returns

S j (1) = S j (0)(1 + K j ).

The final wealth can be written in the following way:
\[
\begin{aligned}
W(1) &= W(0)\Big( w_1 (1 + R) + \sum_{j=2}^{n} w_j (1 + K_j) \Big) \\
 &= W(0)\Big( \Big(1 - \sum_{j=2}^{n} w_j\Big)(1 + R) + \sum_{j=2}^{n} w_j (1 + K_j) \Big),
\end{aligned}
\]
where we eliminate the first weight since it is determined by the remaining ones.

Theorem 3.8
If every investor chooses his position by maximising the expectation of a quadratic
utility, then
E(K j ) − R = β j (E(K M ) − R).
(Here different investors can use different quadratic functions as their utility functions;
they do not need to use the same one.)

Proof
Consider an investor who uses a utility function u. The first order condition for the expected
utility maximisation, max E(u(W(1))), implies that
\[ \frac{\partial}{\partial w_j} E u(W(1)) = W(0) E[u'(W(1))(K_j - R)] = 0. \tag{3.4} \]

For a detailed explanation of the above see Video

Next, we have
\[ \mathrm{Cov}(u'(W(1)), K_j - R) = E[u'(W(1))(K_j - R)] - E(u'(W(1)))\, E(K_j - R), \]
so, by (3.4),
\[ E(u'(W(1)))\, E(K_j - R) = -\mathrm{Cov}(u'(W(1)), K_j - R) = -\mathrm{Cov}(u'(W(1)), K_j). \]
This relation holds for each investor separately since they have different utility functions.

For an explanation of the above steps and for a general idea of the following
argument see Video

Suppose that our investors choose utilities $u_m(x) = a_m x - \frac{1}{2} b_m x^2 + c_m$ which lead to optimal
wealths $W_m(1)$. Then $u_m'(x) = a_m - b_m x$, so $\mathrm{Cov}(u_m'(W_m(1)), K_j) = -b_m \mathrm{Cov}(W_m(1), K_j)$ and
\[ (a_m - b_m E(W_m(1)))\, E(K_j - R) = b_m \mathrm{Cov}(W_m(1), K_j). \]
Write
\[ \gamma_m = \frac{a_m}{b_m} - E(W_m(1)) \]
so that
\[ \gamma_m E(K_j - R) = \mathrm{Cov}(W_m(1), K_j). \tag{3.5} \]
Let $M = \sum_m W_m(1)$ be the value of the whole market. We can write $M = M(0)(1 + K_M)$,
where $K_M$ is the return on the market portfolio.
After adding up (3.5) for all the investors we obtain
\[ E(K_j - R) \sum_m \gamma_m = \mathrm{Cov}\Big( \sum_m W_m(1), K_j \Big) = \mathrm{Cov}(M, K_j) = M(0)\, \mathrm{Cov}(K_M, K_j). \]
We can write briefly
\[ E(K_j - R) = \eta\, \mathrm{Cov}(K_M, K_j) \]
for each return (on each asset), with $\eta = M(0) / \sum_m \gamma_m$. In particular, for the market portfolio
we have
\[ E(K_M - R) = \eta\, \mathrm{Cov}(K_M, K_M) = \eta\, \mathrm{Var}(K_M). \]
Eliminating η from the last two equations, and writing $\beta_j = \mathrm{Cov}(K_j, K_M) / \mathrm{Var}(K_M)$, we get
the familiar CAPM formula
\[ E(K_j) - R = \beta_j (E(K_M) - R), \]
as required.

For a live derivation with some additional explanations see Video

3.4 Risk aversion


An investor is said to be risk averse if

u(E(X)) ≥ E(u(X)) for all X.

An intuitive explanation is this: both sides represent an expected utility Video . On the left we
have sure consumption at the level E(X), on the right we have an uncertain wealth X, both with
the same expected value. The inequality says that the investor never prefers the uncertain wealth to the sure thing.
We say that the investor is risk neutral if

u(E(X)) = E(u(X)) for all X.

If the investor is risk averse we define the risk premium as a function γ : FCS → R such
that
u(E(X) − γ(X)) = E(u(X))
Video The number E(X) − γ(X) is called the certainty equivalent of X.
We shall find an approximate formula for γ. Assume that X takes only finitely many values
x1 , . . . , xn .

For an explanation of the derivation of ARA that follows below you can see Video

Take the Taylor expansion of u at $x_i$ around m = E(X) to get
\[ u(x_i) \approx u(m) + (x_i - m) u'(m) + \tfrac{1}{2} (x_i - m)^2 u''(m). \]
Multiplying by $p_i = \mathrm{prob}(X = x_i)$ and summing we get
\[ E(u(X)) \approx u(m) + E(X - m) u'(m) + \tfrac{1}{2} E[(X - m)^2] u''(m) = u(m) + \tfrac{1}{2} \mathrm{Var}(X) u''(m). \tag{3.6} \]
Taking the Taylor expansion at m − γ(X) around m gives
\[ u(m - \gamma(X)) \approx u(m) - \gamma(X) u'(m), \]
so (by the definition of the risk premium)
\[ E(u(X)) \approx u(m) - \gamma(X) u'(m). \tag{3.7} \]
Comparing the right hand sides of (3.6) and (3.7) we get
\[ u(m) + \tfrac{1}{2} \mathrm{Var}(X) u''(m) \approx u(m) - \gamma(X) u'(m), \]
which yields
\[ \gamma(X) \approx -\frac{u''(E(X))}{u'(E(X))} \cdot \frac{1}{2} \mathrm{Var}(X). \]
The number
\[ ARA = -\frac{u''(E(X))}{u'(E(X))} \]
is called the absolute risk aversion coefficient (introduced by Arrow and Pratt).
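
To get a feel for the quality of the approximation, one can compare it with the exact risk premium in a case where the latter has a closed form. The sketch below assumes the exponential utility $u(x) = -e^{-ax}$ (for which ARA = a) and a hypothetical three-point wealth distribution; neither choice comes from the text.

```python
import math

# Hypothetical inputs: exponential utility u(x) = -exp(-a x) and a three-point wealth distribution.
a = 0.1
values = [18.0, 22.0, 24.0]
probs = [0.3, 0.3, 0.4]

mean = sum(p * x for p, x in zip(probs, values))
var = sum(p * (x - mean) ** 2 for p, x in zip(probs, values))

# Exact premium: solve u(mean - gamma) = E[u(X)], which has a closed form for this u.
expected_utility = sum(p * -math.exp(-a * x) for p, x in zip(probs, values))
certainty_equivalent = -math.log(-expected_utility) / a
gamma_exact = mean - certainty_equivalent

# Approximation from the text: gamma ~ ARA * Var(X) / 2, with ARA = -u''/u' = a for this u.
gamma_approx = 0.5 * a * var

print(gamma_exact, gamma_approx)
```

The two values are close but not equal, as expected from a second-order Taylor argument; the gap grows as the distribution becomes more spread out or more skewed.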

3.5 Utility functions and indifference curves
In this section we make the simplifying assumption that the final wealth X of a portfolio has
a normal distribution and show how in such a setting utility functions are related to indifference
curves.
When the final wealth has a normal distribution, the expected utility depends on two
parameters: the mean and standard deviation, which we denote by m and s, respectively. The
random return K on this portfolio has a normal distribution as well since X = W(1 + K). The
mean and standard deviation of the return are denoted by µ and σ as before.
We now assign a real number to each pair (s, m) in the following way. Denote the normal
density of the wealth by $f_{s,m}(x)$ and put
\[ U(s, m) = E(u(X)) = \int_{\mathbb{R}} u(x) f_{s,m}(x)\, dx. \tag{3.8} \]

Since m and s are related to µ and σ in the following way:

m = E (X) = W (1 + E (K)) = W (1 + µ) , (3.9)


\[ s = \sqrt{\mathrm{Var}(X)} = \sqrt{\mathrm{Var}(W(1 + K))} = W \sqrt{\mathrm{Var}(K)} = W\sigma, \tag{3.10} \]

formula (3.8) automatically defines a function on the (σ, µ) plane:

\[ (\sigma, \mu) \mapsto U(W\sigma, W(1 + \mu)). \]

The locus of all points with the same value of U is the indifference curve (discussed in section
1.10).
We claim that concavity of the utility function (which is related to risk aversion) implies
convexity of the indifference curves in the (σ, µ) plane. We make the simplifying assumption that u is
differentiable; then u'(x) > 0 since u is increasing, and u' is decreasing since u is concave.
Next we note that the implicit function m(s) given by the condition U(s, m) = const = c is
increasing.

Proposition 3.9
The level set
{(s, m) : U(s, m) = c}
is parameterised as (s, m (s)) where m(s) is a differentiable function with positive deriva-
tive. Moreover, m(s) is convex.

Proof
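Let $Y = \frac{X - m}{s}$, which has the standard normal distribution. Consequently X = sY + m and
\[ U(s, m) = E(u(X)) = \int_{-\infty}^{+\infty} u(sy + m) f(y)\, dy, \tag{3.11} \]
where f is the standard normal density (that is, the density of the normal distribution with mean 0 and standard deviation 1).
Since u' > 0 and f > 0 we see that
\[ \frac{\partial U(s, m)}{\partial m} = \int_{-\infty}^{+\infty} u'(sy + m) f(y)\, dy = E(u'(sY + m)) > 0, \tag{3.12} \]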

which by the implicit function theorem implies the existence of the function m(s).

To see the proof so far see Video

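The implicit function theorem also says that m(s) is differentiable and that
\[ \frac{dm(s)}{ds} \Big|_{U=c} = -\frac{\partial U(s, m) / \partial s}{\partial U(s, m) / \partial m}. \]
We can compute the partial derivative
\[ \frac{\partial U(s, m)}{\partial s} = \int_{-\infty}^{+\infty} y\, u'(sy + m) f(y)\, dy. \]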

Since we have already shown (3.12), to establish that m'(s) > 0 it is sufficient to show
that
\[ \int_{-\infty}^{+\infty} y\, u'(sy + m) f(y)\, dy < 0. \tag{3.13} \]
We see that
\[
\begin{aligned}
\int_{-\infty}^{0} y\, u'(sy + m) f(y)\, dy
 &= \int_{0}^{+\infty} (-v)\, u'(-sv + m) f(-v)\, dv && \text{(we change $y = -v$)} \\
 &= -\int_{0}^{+\infty} v\, u'(-sv + m) f(v)\, dv && \text{(since $f(-v) = f(v)$)} \\
 &< -\int_{0}^{+\infty} v\, u'(sv + m) f(v)\, dv && \text{(since $u'(x_1) > u'(x_2)$ for $x_1 < x_2$)},
\end{aligned}
\]
which by moving the right hand side of the inequality to the left leads to (3.13).

The argument above is described in Video

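To see convexity of m(s) take two points on it, $U(s_1, m_1) = U(s_2, m_2) = c$; in other
words $m_1 = m(s_1)$ and $m_2 = m(s_2)$. We start by showing that $U\big(\frac{s_1 + s_2}{2}, \frac{m_1 + m_2}{2}\big) > c$. To
this end we use (3.11), then do simple arithmetic, then use concavity of u, and finally
(3.11) again:
\[
\begin{aligned}
U\Big( \frac{s_1 + s_2}{2}, \frac{m_1 + m_2}{2} \Big)
 &= \int_{-\infty}^{+\infty} u\Big( \frac{s_1 + s_2}{2} y + \frac{m_1 + m_2}{2} \Big) f(y)\, dy \\
 &= \int_{-\infty}^{+\infty} u\Big( \frac{m_1 + s_1 y}{2} + \frac{m_2 + s_2 y}{2} \Big) f(y)\, dy \\
 &> \int_{-\infty}^{+\infty} \Big[ \tfrac{1}{2} u(m_1 + s_1 y) + \tfrac{1}{2} u(m_2 + s_2 y) \Big] f(y)\, dy \\
 &= \tfrac{1}{2} U(s_1, m_1) + \tfrac{1}{2} U(s_2, m_2) \\
 &= c.
\end{aligned}
\]
From the above it follows that
\[ U\Big( \frac{s_1 + s_2}{2}, m\Big( \frac{s_1 + s_2}{2} \Big) \Big) = c < U\Big( \frac{s_1 + s_2}{2}, \frac{m_1 + m_2}{2} \Big). \tag{3.14} \]
Since $\frac{\partial U}{\partial m} > 0$ we see that (3.14) implies
\[ m\Big( \frac{s_1 + s_2}{2} \Big) < \frac{m_1 + m_2}{2} = \frac{m(s_1) + m(s_2)}{2}, \]
as required.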

The final part of the proof can be seen here Video

We can now obtain analogous properties on the (σ, µ) plane.

Theorem 3.10
The level sets
{(σ, µ) : U(Wσ, W(1 + µ)) = c}
are given by curves σ ↦ µ(σ) where µ'(σ) > 0. Moreover µ(σ) is convex.

The proof is discussed in the Video

Proof
We want to find µ(σ) such that U(Wσ, W(1 + µ(σ))) = c. We know that U(s, m(s)) = c, so
taking s = Wσ gives W(1 + µ(σ)) = m(Wσ), that is,

µ(σ) = m(Wσ)/W − 1.

Since m(s) is increasing and convex and W > 0 we see that µ(σ) is also increasing and
convex, as required.
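
As an illustration that is not part of the original argument, consider the exponential utility $u(x) = -e^{-ax}$ with a > 0 (an assumed choice). For normally distributed wealth the expected utility has a closed form, so the indifference curves can be written down explicitly and their monotonicity and convexity checked by hand; the last line uses the relations (3.9) and (3.10) between (s, m) and (σ, µ).

```latex
\begin{aligned}
U(s, m) &= E\left(-e^{-aX}\right) = -\exp\left(-am + \tfrac{1}{2} a^2 s^2\right), \\
U(s, m) = c \quad (c < 0) &\iff m(s) = -\tfrac{1}{a}\ln(-c) + \tfrac{1}{2} a s^2, \\
m'(s) &= a s > 0 \quad (s > 0), \qquad m''(s) = a > 0, \\
\mu(\sigma) &= \frac{m(W\sigma)}{W} - 1 = -\frac{\ln(-c)}{aW} - 1 + \tfrac{1}{2} a W \sigma^2 .
\end{aligned}
```

In this special case the indifference curves are parabolas in both planes, and a larger risk aversion coefficient a makes them steeper.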

4 Value at Risk
Until now we have focused our attention on variance, or more precisely, standard deviation,
as a tool for measuring risk. The standard deviation σK of the return K on a risky investment
measures the spread of the random values of K from their mean µK . In portfolio selection we
seek to minimize σK while maximizing µK . However, an investor seeking to measure the risk
inherent in an asset he holds is naturally more concerned to place a bound on his potential
losses, while remaining relaxed about possible high levels of profit! Thus one looks for risk
measures which focus on the downside risk, that is, measures concerned with the lower tail
of the distribution of K. Variance and standard deviation are symmetric, so they are not good
candidates in this search.
In looking for quantitative measures of the overall risk in a portfolio, we seek a statistic
which can be applied universally, enabling us to compare the risks of different types of risky
portfolio, irrespective of whether these are based on equities, currencies or commodities. Ide-
ally, we look for a number (or set of numbers) that expresses the potential loss with a given
level of confidence, enabling the risk manager to adjudge the risk as acceptable or not.
In the wake of spectacular financial collapses in the mid-1990s at Barings Bank and Orange
County, Value at Risk (henceforth abbreviated as VaR) became a standard benchmark for
measuring financial risk. It has the advantage of relative simplicity and ease of use when suffi-
cient data are available. Its principal drawback is that it does not provide full protection against
extreme (i.e. highly unlikely) events. In this chapter we explore this popular risk measure.

4.1 Quantiles
An investor holding an asset whose future value is uncertain may wish to determine whether
his final position, X, in the asset has at least 95% probability of remaining above a certain
(usually negative) level. Value at Risk at 5% answers this question by specifying the worst
of the best 95% of possible outcomes. Its calculation is therefore closely tied to the values of
the distribution function F X of X. This leads us to examine the so-called quantiles of F X more
closely.
We start with a simple example.

Example 4.1
Consider a two step binomial model in which the stock price starts at S(0) = 100 and in each
step is multiplied by 1.1 (up) or 0.9 (down):

S(0) = 100, S(1) ∈ {110, 90}, S(2) ∈ {121, 99, 81}.

Assume that the probability p of the price going up in a single step is p = 0.8. In this
example we neglect the time value of money and compute the gain at time two of
buying a single share of stock as

X = S(2) − S(0),

\[ X = \begin{cases} 21 & \text{with probability } p^2 = 0.64, \\ -1 & \text{with probability } 2p(1 - p) = 0.32, \\ -19 & \text{with probability } (1 - p)^2 = 0.04. \end{cases} \]

We can see that the probability that our investment will lead to a loss of less than 19 is

P(−X < 19) = P(X > −19) = 0.96.

This means that with probability 96% we can believe that we will lose no more
than 1. If we agree, for instance, to ignore the worst 5% of potential outcomes, our
‘worst-case scenario’ would be to expect a loss of 1. However, if we are only willing to
exclude (say) the worst 2.5%, the loss of 19 should be taken into account.

An outcome at a given probability can be expressed using quantiles. We recall the defini-
tion and some simple properties.
Let (Ω, F, P) be a probability space and let X : Ω → R be a random variable. The cumu-
lative distribution function F X : R → [0, 1], defined by F X (x) = P(X ≤ x) is right-continuous
and non-decreasing (see [PF] for details).

Figure 4.1 The upper and lower quantiles for various distribution functions.

Definition 4.2
For α ∈ (0, 1) the number
\[ q^{\alpha}(X) = \inf\{x : \alpha < F_X(x)\} \tag{4.1} \]
is called the upper α-quantile of X. The number
\[ q_{\alpha}(X) = \inf\{x : \alpha \le F_X(x)\} \tag{4.2} \]
is called the lower α-quantile of X. Any
\[ q \in [q_{\alpha}(X), q^{\alpha}(X)] \]
is called an α-quantile of X.

Here is a Video describing the intuition behind the notion.

The definition is best understood when looking at the graph of the cumulative distribution
function. In Figure 4.1 we can see that the upper and the lower quantiles differ when the plot
of F X (x) becomes flat at the value F X (x) = α, otherwise they are equal.

Figure 4.2 The plot of the distribution function for X from Example 4.1 (jumps at −19, −1 and 21).

Example 4.3
For X from Example 4.1 we can compute the upper and the lower α-quantiles, for α ∈
{0.025, 0.04, 0.05}, as (see Figure 4.2)
\[
\begin{aligned}
q^{0.025}(X) &= -19, & q_{0.025}(X) &= -19, \\
q^{0.04}(X) &= -1, & q_{0.04}(X) &= -19, \\
q^{0.05}(X) &= -1, & q_{0.05}(X) &= -1.
\end{aligned}
\]
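
The two infima in Definition 4.2 are easy to evaluate for a discrete distribution: walk through the values in increasing order and stop at the first one where the cumulative probability exceeds (upper quantile) or reaches (lower quantile) the level α. The sketch below does this for X from Example 4.1; the function names are illustrative, and exact fractions are used so that the comparison at α = 0.04 is not affected by rounding.

```python
from fractions import Fraction

def upper_quantile(values, probs, alpha):
    """q^alpha(X) = inf{x : alpha < F_X(x)} for a discrete X."""
    cumulative = Fraction(0)
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        if alpha < cumulative:
            return x
    raise ValueError("alpha must lie in (0, 1)")

def lower_quantile(values, probs, alpha):
    """q_alpha(X) = inf{x : alpha <= F_X(x)} for a discrete X."""
    cumulative = Fraction(0)
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        if alpha <= cumulative:
            return x
    raise ValueError("alpha must lie in (0, 1)")

# Gain X from Example 4.1.
values = [21, -1, -19]
probs = [Fraction("0.64"), Fraction("0.32"), Fraction("0.04")]

for alpha in (Fraction("0.025"), Fraction("0.04"), Fraction("0.05")):
    print(alpha, upper_quantile(values, probs, alpha), lower_quantile(values, probs, alpha))
```

The only level at which the two functions disagree is α = 0.04, which is exactly where the distribution function is flat at the value α.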

We list some basic properties of quantiles. The proofs are all elementary, but we defer the
more technical parts to the end of the chapter to avoid disturbing the flow of development.

Proposition 4.4
Let X, Y be random variables.

(i). X ≥ Y implies $q^{\alpha}(X) \ge q^{\alpha}(Y)$.

(ii). For any b ∈ R, $q^{\alpha}(X + b) = q^{\alpha}(X) + b$.

(iii). For b > 0, $q^{\alpha}(bX) = b\, q^{\alpha}(X)$.

(iv). $q^{\alpha}(-X) = -q_{1-\alpha}(X)$.

For an intuitive/geometric overview of the proof see:


Video for (i)
Video for (ii)
Video for (iii)
Video for (iv)

Proof
If X ≥ Y then
\[ F_X(x) = P(X \le x) \le P(Y \le x) = F_Y(x), \]
hence $\alpha < F_X(x)$ implies that $\alpha < F_Y(x)$. This means that
\[ \{x : \alpha < F_X(x)\} \subset \{x : \alpha < F_Y(x)\}, \]
which gives
\[ q^{\alpha}(X) = \inf\{x : \alpha < F_X(x)\} \ge \inf\{x : \alpha < F_Y(x)\} = q^{\alpha}(Y). \]
The second property follows since with Y = X + b we have
\[ F_Y(x + b) = P(X + b \le x + b) = F_X(x), \]
so that
\[
\begin{aligned}
q^{\alpha}(X + b) &= \inf\{x + b : \alpha < F_Y(x + b)\} \\
 &= \inf\{x : \alpha < F_Y(x + b)\} + b \\
 &= \inf\{x : \alpha < F_X(x)\} + b \\
 &= q^{\alpha}(X) + b.
\end{aligned}
\]
Since P(bX ≤ x) = P(X ≤ x/b) for b > 0, we see similarly that
\[ F_{bX}(x) = F_X(x/b), \]
hence for b > 0
\[
\begin{aligned}
q^{\alpha}(bX) &= \inf\{x : \alpha < F_{bX}(x)\} \\
 &= \inf\{x : \alpha < F_X(x/b)\} \\
 &= \inf\{by : \alpha < F_X(y)\} \\
 &= b \inf\{y : \alpha < F_X(y)\} \\
 &= b\, q^{\alpha}(X).
\end{aligned}
\]

To prove (iv) we first need to show that for any b ∈ R
\[ \inf\{x : b \le P(X \le x)\} = \inf\{x : b \le P(X < x)\}. \tag{4.3} \]
Since P(X < x) ≤ P(X ≤ x), if b ≤ P(X < x) then b ≤ P(X ≤ x), which means that
\[ \{x : b \le P(X < x)\} \subset \{x : b \le P(X \le x)\}, \]
hence
\[ \inf\{x : b \le P(X < x)\} \ge \inf\{x : b \le P(X \le x)\}. \]
Suppose that
\[ \inf\{x : b \le P(X \le x)\} < x^* < \inf\{x : b \le P(X < x)\} \tag{4.4} \]
for some $x^* \in \mathbb{R}$. Since $x^*$ lies strictly below the infimum on the right, we
can find an $x^{**} \in \mathbb{R}$ with $x^* < x^{**} < \inf\{x : b \le P(X < x)\}$, for which
\[ P(X < x^{**}) < b. \]
This would mean that
\[ P(X \le x^*) \le P(X < x^{**}) < b. \]
The fact that $P(X \le x^*) < b$ contradicts $\inf\{x : b \le P(X \le x)\} < x^*$, which means that
(4.4) cannot hold.
To prove (iv) we shall also use the fact that
\[ F_{-X}(x) = P(-X \le x) = P(X \ge -x) = 1 - P(X < -x). \tag{4.5} \]
We can now compute
\[
\begin{aligned}
q^{\alpha}(-X) &= \inf\{x : \alpha < F_{-X}(x)\} \\
 &= -\sup\{-x : \alpha < F_{-X}(x)\} \\
 &= -\sup\{-x : \alpha < 1 - P(X < -x)\} && \text{(using (4.5))} \\
 &= -\sup\{y : \alpha < 1 - P(X < y)\} && \text{(taking $y = -x$)} \\
 &= -\sup\{y : P(X < y) < 1 - \alpha\} \\
 &= -\inf\{y : 1 - \alpha \le P(X < y)\} && \text{(since $y \mapsto P(X < y)$ is non-decreasing)} \\
 &= -\inf\{y : 1 - \alpha \le P(X \le y)\} && \text{(using (4.3))} \\
 &= -\inf\{y : 1 - \alpha \le F_X(y)\} \\
 &= -q_{1-\alpha}(X).
\end{aligned}
\]

Figure 4.3 Case 1. from the proof of Lemma 4.6.

Figure 4.4 Case 2. from the proof of Lemma 4.6.

Lemma 4.5
If $F_X(x)$ is continuous and strictly increasing then
\[ q^{\alpha}(X) = F_X^{-1}(\alpha). \]

Proof
Since $F_X(x)$ is continuous and strictly increasing, the cumulative distribution function
$F_X(x)$ is invertible, and $\alpha < F_X(x)$ is equivalent to $F_X^{-1}(\alpha) < x$. This gives
\[ q^{\alpha}(X) = \inf\{x : \alpha < F_X(x)\} = \inf\{x : F_X^{-1}(\alpha) < x\} = F_X^{-1}(\alpha). \]

Lemma 4.6
Let X be a random variable. If f : R → R is right-continuous and non-decreasing then
\[ q^{\alpha}(f(X)) = f(q^{\alpha}(X)). \]

Proof
We know that
\[ q^{\alpha}(X) = \inf\{x : \alpha < F_X(x)\}, \]
so for every $x > q^{\alpha}(X)$
\[ \alpha < F_X(x) \tag{4.6} \]
and
\[ \alpha \le F_X(q^{\alpha}(X)). \tag{4.7} \]
We need to show that
\[ f(q^{\alpha}(X)) = q^{\alpha}(f(X)) = \inf\{y : \alpha < F_{f(X)}(y)\}, \]
which is equivalent to showing that
\[ \alpha \le F_{f(X)}(f(q^{\alpha}(X))) \tag{4.8} \]
and that for $y > f(q^{\alpha}(X))$
\[ \alpha < F_{f(X)}(y). \tag{4.9} \]
To show (4.8) consider two cases Video :
Case 1. We can have $f^{-1}((-\infty, f(q^{\alpha}(X))]) = (-\infty, x_1)$ where $x_1 > q^{\alpha}(X)$. See Figure
4.3. Then from (4.7)
\[ F_{f(X)}(f(q^{\alpha}(X))) = P(f(X) \le f(q^{\alpha}(X))) = P(X < x_1) \ge P(X \le q^{\alpha}(X)) = F_X(q^{\alpha}(X)) \ge \alpha. \]
Case 2. The second possibility is that $f^{-1}((-\infty, f(q^{\alpha}(X))]) = (-\infty, x_1]$ where $x_1 \ge q^{\alpha}(X)$.
See Figure 4.4. Then from (4.7)
\[ F_{f(X)}(f(q^{\alpha}(X))) = P(f(X) \le f(q^{\alpha}(X))) = P(X \le x_1) \ge P(X \le q^{\alpha}(X)) = F_X(q^{\alpha}(X)) \ge \alpha. \]

We have established (4.8).

To see an overview of the remaining part of the argument see Video.

We still need to show (4.9). Consider $y > f(q^{\alpha}(X))$. Since f is right-continuous, we
know that $f^{-1}((-\infty, y)) = (-\infty, x_2)$, for some $x_2 \in \mathbb{R}$. Since f is non-decreasing, we
must have $x_2 > q^{\alpha}(X)$. See Figures 4.3, 4.4. We can take $\bar{x} \in (q^{\alpha}(X), x_2)$ and by (4.6)
\[ F_{f(X)}(y) = P(f(X) \le y) \ge P(f(X) < y) = P(X < x_2) \ge P(X \le \bar{x}) = F_X(\bar{x}) > \alpha, \]

which shows (4.9) and concludes our proof.

Figure 4.5 −VaRα (X) is the upper α-quantile for X.

4.2 Measuring downside risk


We work in a single-step financial market model in which we invest at time t = 0 and terminate
our investment at t = T. We denote by X the proceeds from the investment at time T .

Definition 4.7
For α in (0, 1), we define the Value at Risk (VaR) of X, at confidence level 1 − α, as
(see Figure 4.5)
\[ \mathrm{VaR}^{\alpha}(X) = -q^{\alpha}(X) = -\inf\{x : \alpha < F_X(x)\}. \]

To gain some intuition, let us consider the following example.

Example 4.8
Let X be as in Example 4.1. By looking at the distribution function $F_X(x)$ (see Figure
4.2) we can see that
\[ \mathrm{VaR}^{0.05}(X) = 1, \qquad \mathrm{VaR}^{0.025}(X) = 19, \]
which agrees with our intuition of the worst possible loss at probability 0.95 and 0.975,
respectively.

Let us observe that since X denotes the gain from an investment, −X is the loss. We
can express VaR in terms of the loss as follows:
\[
\begin{aligned}
\mathrm{VaR}^{\alpha}(X) &= -q^{\alpha}(X) \\
 &= q_{1-\alpha}(-X) && \text{(by (iv) from Proposition 4.4)} \\
 &= \inf\{x : 1 - \alpha \le P(-X \le x)\} \\
 &= \inf\{x : P(x < -X) \le \alpha\} \\
 &= \inf\{x : P(X + x < 0) \le \alpha\}.
\end{aligned}
\]

In loose terms, this means that the probability of the loss exceeding VaRα is no greater than
α. In other words, VaRα is the worst possible loss at the confidence level 1 − α. Its simple
algebraic properties follow from those we proved for the upper quantile:

Proposition 4.9
Let X, Y be random variables.

(i). X ≥ Y implies VaRα (X) ≤ VaRα (Y),

(ii). For any a ∈ R, VaRα (X + a) = VaRα (X) − a,

(iii). For any a > 0, VaRα (aX) = aVaRα (X).

Proof
The proof of all above properties follows directly from the definition of VaRα (X) and
from the respective properties of quantiles proved in Proposition 4.4.

4.3 Examples of computing VaR


To familiarize ourselves with the definition of VaR let us consider a few simple examples.
We shall assume that at time zero we invest V(0) to receive V(T ) at time T . We use Ṽ(t) to
denote the discounted value
Ṽ(t) = e−rt V(t),
where r is the risk-free rate for continuous compounding. We use G(T) to denote the gain from
an investment at time T
G(T) = V(T) − V(0),
and G̃(T) to denote the discounted gain

G̃(T) = Ṽ(T) − V(0).

For investments starting at time zero and terminating at time T we shall be interested in
computing the VaR for
X = G̃(T ).

Example 4.10

Suppose that we invest V(0) risk-free. Then V(T ) = erT V(0) giving

X = G̃(T ) = e−rT V(T ) − V(0) = 0.

The distribution function of X is then
\[ F_X(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0. \end{cases} \]
For any α ∈ (0, 1), $q^{\alpha}(X) = 0$, which gives
\[ \mathrm{VaR}^{\alpha}(X) = -q^{\alpha}(X) = 0. \]

Example 4.11
Consider
\[ X = \begin{cases} -20 & \text{with probability } 0.05, \\ -10 & \text{with probability } 0.05, \end{cases} \tag{4.10} \]
and P(X > 0) = 0.9. Then
\[ F_X(x) = 0 \text{ for } x \in (-\infty, -20), \qquad F_X(x) = 0.05 \text{ for } x \in [-20, -10), \qquad F_X(x) \ge 0.1 \text{ for } x \ge -10. \]
Taking α = 0.05 we have
\[ \mathrm{VaR}^{0.05}(X) = -q^{0.05}(X) = 10. \]
For any α < 0.05,
\[ \mathrm{VaR}^{\alpha}(X) = -q^{\alpha}(X) = 20, \]
which demonstrates that VaR can be sensitive to the choice of α.
Let us now change the −20 from (4.10) to −2000. The $\mathrm{VaR}^{0.05}$ still remains equal to 10!
This illustrates that VaR does not take into consideration unlikely events, no matter the
severity of their outcome. This is an undesirable feature in a risk measure.

See Video.

Figure 4.6 The cumulative distribution functions for Example 4.12

Example 4.12
Consider two independent investments X1 , X2 with gains
(
0 with probability p
Xi =
1 with probability 1 − p

for i = 1, 2. We can think of these as corporate bonds with the same price and maturity
date, of two independent companies that each have a probability of default equal to p.
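If p < α then
\[ \mathrm{VaR}^{\alpha}(X_1) = \mathrm{VaR}^{\alpha}(X_2) = -1. \]
If we diversify our investment equally between the two bonds, then our gain will be
equal to
\[ \tfrac{1}{2} X_1 + \tfrac{1}{2} X_2 = \begin{cases} 0 & \text{with probability } p^2, \\ \tfrac{1}{2} & \text{with probability } 2p(1 - p), \\ 1 & \text{with probability } (1 - p)^2. \end{cases} \]
If we choose $\alpha \in (p, p^2 + 2p(1 - p))$ then (see Figure 4.6)
\[ F_{\frac{1}{2} X_1 + \frac{1}{2} X_2}\left( \tfrac{1}{2} \right) = p^2 + 2p(1 - p) > \alpha, \]
hence
\[ \mathrm{VaR}^{\alpha}\left( \tfrac{1}{2} X_1 + \tfrac{1}{2} X_2 \right) = -\tfrac{1}{2}. \]
We can see that
\[ -\tfrac{1}{2} = \mathrm{VaR}^{\alpha}\left( \tfrac{1}{2} X_1 + \tfrac{1}{2} X_2 \right) > \max\{\mathrm{VaR}^{\alpha}(X_1), \mathrm{VaR}^{\alpha}(X_2)\} = -1, \]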

which means that the risk of a diversified position, as measured by VaRα , is greater than
the risk of investing all our funds in a single bond. This runs counter to the principle that
diversification should reduce risk, and therefore illustrates a second serious drawback
in using VaR to measure risk. In the next chapter we will consider risk measures that
avoid these defects - for the present we present some further computations with VaR.
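
The effect described in Example 4.12 can be checked numerically. The sketch below uses hypothetical numbers, p = 3% and α = 5%, which satisfy p < α < p² + 2p(1 − p); the helper `var_discrete` is an illustrative name and simply evaluates the upper quantile from Definition 4.2 with exact fractions.

```python
from fractions import Fraction
from itertools import product

def var_discrete(outcomes, alpha):
    """VaR^alpha(X) = -q^alpha(X), with q^alpha(X) = inf{x : alpha < F_X(x)}.
    `outcomes` maps each value of X to its probability."""
    cumulative = Fraction(0)
    for x in sorted(outcomes):
        cumulative += outcomes[x]
        if alpha < cumulative:
            return -x
    raise ValueError("alpha must lie in (0, 1)")

# Hypothetical numbers: default probability p = 3%, alpha = 5%.
p, alpha = Fraction(3, 100), Fraction(5, 100)
bond = {Fraction(0): p, Fraction(1): 1 - p}

# Distribution of X1/2 + X2/2 for two independent bonds.
half_half = {}
for (x1, p1), (x2, p2) in product(bond.items(), repeat=2):
    value = (x1 + x2) / 2
    half_half[value] = half_half.get(value, Fraction(0)) + p1 * p2

var_single = var_discrete(bond, alpha)
var_diversified = var_discrete(half_half, alpha)
print(var_single, var_diversified)
```

With these inputs the diversified position has the larger VaR, in line with the inequality derived in the example.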

See Video.

From examples explored so far we see that finding VaR in the case of discrete distributions
is an easy task. This is summarized in the following lemma.

Lemma 4.13
Assume that X is a discrete random variable with $P(X = x_i) = p_i$, $\sum_{i=1}^{N} p_i = 1$, and
$x_1 < x_2 < \ldots < x_N$. Then
\[ \mathrm{VaR}^{\alpha}(X) = -x_{k_\alpha}, \]
where $k_\alpha \in \mathbb{N}$ is the largest number such that $\sum_{i=1}^{k_\alpha - 1} p_i \le \alpha$.

For an explanation of the proof see Video.

Proof
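Since X has discrete distribution and $x_1 < x_2 < \ldots < x_N$ we can see that
\[ P(X \le x_k) = \sum_{i=1}^{k} p_i. \tag{4.11} \]
We shall also use the fact that
\[ \min\Big\{k : \alpha < \sum_{i=1}^{k} p_i\Big\} = \max\Big\{k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\}. \tag{4.12} \]
This gives
\[
\begin{aligned}
q^{\alpha}(X) &= \inf\{x : \alpha < P(X \le x)\} \\
 &= \min\{x_k : \alpha < P(X \le x_k)\} && \text{(since $X \in \{x_1, \ldots, x_N\}$)} \\
 &= \min\Big\{x_k : \alpha < \sum_{i=1}^{k} p_i\Big\} && \text{(by (4.11))} \\
 &= \max\Big\{x_k : \sum_{i=1}^{k-1} p_i \le \alpha\Big\} && \text{(by (4.12))} \\
 &= x_{k_\alpha} && \text{(by definition of $k_\alpha$)}.
\end{aligned}
\]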
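
Lemma 4.13 translates directly into a loop over partial sums. The sketch below is one possible reading of the lemma, with an illustrative function name; it is checked against the gain from Example 4.1 and the distribution from Example 4.11, where the single positive outcome +10 is a hypothetical stand-in for the unspecified part of the distribution with P(X > 0) = 0.9.

```python
def var_by_lemma(xs, ps, alpha):
    """VaR^alpha(X) = -x_{k_alpha}, where k_alpha is the largest k with p_1 + ... + p_{k-1} <= alpha.
    xs must be strictly increasing and ps the matching probabilities."""
    k_alpha, partial_sum = 1, 0.0          # for k = 1 the empty sum is 0 <= alpha
    for k in range(2, len(xs) + 1):
        partial_sum += ps[k - 2]           # now equals p_1 + ... + p_{k-1}
        if partial_sum <= alpha:
            k_alpha = k
        else:
            break                          # partial sums only grow, so no larger k qualifies
    return -xs[k_alpha - 1]

# Gain from Example 4.1.
xs_41, ps_41 = [-19, -1, 21], [0.04, 0.32, 0.64]

# Example 4.11; the value +10 for the positive outcome is hypothetical.
xs_411, ps_411 = [-20, -10, 10], [0.05, 0.05, 0.90]
xs_411_severe = [-2000, -10, 10]

print(var_by_lemma(xs_41, ps_41, 0.05), var_by_lemma(xs_41, ps_41, 0.025))
print(var_by_lemma(xs_411, ps_411, 0.05), var_by_lemma(xs_411_severe, ps_411, 0.05))
```

Replacing −20 by −2000 leaves the result at α = 0.05 unchanged, which is the insensitivity to rare severe losses noted in Example 4.11.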

We now turn to random variables with continuous distributions.

Example 4.14
Suppose that today’s price of stock is equal to S (0). Assume also that the price of stock
at time T is equal to S (T ) = S (0)em+σZ , with Z having a standard normal distribution
N(0, 1). We shall compute VaRα (X) for

X = e−rT S (T ) − S (0).

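By Lemma 4.5, $q^{\alpha}(Z) = N^{-1}(\alpha)$, where N is the standard normal distribution function.
Observing that
X = f(Z),
where
\[ f(\zeta) = e^{-rT} S(0) e^{m + \sigma\zeta} - S(0) \]
is an increasing function,
\[
\begin{aligned}
\mathrm{VaR}^{\alpha}(X) &= -q^{\alpha}(f(Z)) \\
 &= -f(q^{\alpha}(Z)) && \text{(by Lemma 4.6)} \\
 &= -f(N^{-1}(\alpha)) && \text{(by Lemma 4.5)} \\
 &= S(0)\left( 1 - e^{m - rT + \sigma N^{-1}(\alpha)} \right).
\end{aligned}
\tag{4.13}
\]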

For a live derivation see Video.
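
Formula (4.13) needs only the inverse of the standard normal distribution function, which Python's standard library provides as `statistics.NormalDist().inv_cdf`. The parameter values below are hypothetical and serve only as an illustration; the last lines check that the discounted gain falls below $-\mathrm{VaR}^{\alpha}$ with probability α.

```python
import math
from statistics import NormalDist

def var_lognormal(S0, m, sigma, r, T, alpha):
    """Formula (4.13): VaR^alpha of X = exp(-r T) S(T) - S(0), with S(T) = S(0) exp(m + sigma Z)."""
    z_alpha = NormalDist().inv_cdf(alpha)          # N^{-1}(alpha)
    return S0 * (1.0 - math.exp(m - r * T + sigma * z_alpha))

# Hypothetical parameter values, chosen only for illustration.
S0, m, sigma, r, T = 100.0, 0.05, 0.2, 0.03, 1.0
var_5 = var_lognormal(S0, m, sigma, r, T, 0.05)
var_1 = var_lognormal(S0, m, sigma, r, T, 0.01)

# Sanity check: X <= -VaR exactly when Z <= threshold_z, so the probability should be alpha.
threshold_z = (math.log((S0 - var_5) * math.exp(r * T) / S0) - m) / sigma
prob_below = NormalDist().cdf(threshold_z)
print(var_5, var_1, prob_below)
```

Lowering α from 5% to 1% increases the VaR, since a smaller part of the lower tail is ignored.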

In Example 4.14 we have exploited the fact that X was a non-decreasing function of a
random variable with standard normal distribution, for which quantiles are easy to compute.
This idea can be formulated in more general terms as follows.

Lemma 4.15
Let f : R → R be a non-decreasing right-continuous function. Then

VaRα ( f (X)) = − f (qα (X)).

Proof
By Lemma 4.6
VaRα ( f (X)) = −qα ( f (X)) = − f (qα (X)).

4.4 VaR in the Black–Scholes model
In the Black–Scholes model we have a single stock and a risk-free asset. The time zero price
of the stock is S(0) > 0. The stock price at time T is given by
\[ S(T) = S(0)\, e^{\left( \mu - \frac{\sigma^2}{2} \right) T + \sigma \sqrt{T} Z}, \tag{4.14} \]

where µ and σ are positive real parameters, and Z is a random variable with standard normal
distribution N(0, 1). The parameter µ represents the drift and the parameter σ represents the
volatility of stock. The risk free rate is constant and equal to r > 0, with continuous com-
pounding, meaning that the time T price of the risk-free asset is

A(T ) = A(0)erT . (4.15)

A put option with strike price K which expires at time T has a payoff

(K − S (T ))+ = max(K − S (T ), 0),

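and costs
\[ P(r, T, K, S(0), \sigma) = K e^{-rT} N(-d_-) - S(0) N(-d_+), \tag{4.16} \]
where
\[ d_+ = \frac{\ln \frac{S(0)}{K} + \left( r + \frac{1}{2} \sigma^2 \right) T}{\sigma \sqrt{T}}, \qquad d_- = \frac{\ln \frac{S(0)}{K} + \left( r - \frac{1}{2} \sigma^2 \right) T}{\sigma \sqrt{T}}, \tag{4.17} \]
and N is the standard normal cumulative distribution function. For more details on the
Black–Scholes model see [BSM].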

For an overview and ideas behind this section see Video.

Let H(t) denote the value of a put option at time t ∈ {0, T }

H(0) = P(r, T, K, S (0), σ),


H(T ) = (K − S (T ))+ . (4.18)

We start with a simple Lemma.

Lemma 4.16
For S(T) and H(T) given by (4.14) and (4.18), respectively,
\[ q^{\alpha}(S(T)) = S(0)\, e^{\left( \mu - \frac{\sigma^2}{2} \right) T + \sigma \sqrt{T} N^{-1}(\alpha)}, \tag{4.19} \]
\[ q^{\alpha}(-H(T)) = -\left( K - q^{\alpha}(S(T)) \right)^{+}. \tag{4.20} \]

Proof
By Lemma 4.5, $q^{\alpha}(Z) = N^{-1}(\alpha)$. Since $z \mapsto S(0) e^{(\mu - \frac{\sigma^2}{2}) T + \sigma \sqrt{T} z}$ is an increasing
function, (4.19) follows from Lemma 4.6.
Similarly, since $\zeta \mapsto -(K - \zeta)^{+}$ is a non-decreasing function, (4.20) follows also from
Lemma 4.6.

Assume that we buy a single share of stock. The discounted gain from this investment is

G̃stock (T ) = S̃ (T ) − S (0) = e−rT S (T ) − S (0).

By Lemma 4.15 we can see that

VaRα (G̃stock (T )) = S (0) − e−rT qα (S (T )). (4.21)

We now consider an investment where at time zero we buy x shares of stock and y units of
the risk-free asset. For t ∈ {0, T }, we use V(x,y) (t) to denote the value of the portfolio at time t

V(x,y) (t) = xS (t) + yA(t),

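we use $\tilde{V}_{(x,y)}$ to denote the discounted value of the portfolio
\[ \tilde{V}_{(x,y)}(t) = e^{-rt} V_{(x,y)}(t), \]
and $\tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T)$ to denote the discounted gain
\[ \tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T) = \tilde{V}_{(x,y)}(T) - \tilde{V}_{(x,y)}(0). \]

Lemma 4.17
If x ≥ 0 then
\[ \mathrm{VaR}^{\alpha}\left( \tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T) \right) = V_{(x,y)}(0) - x e^{-rT} q^{\alpha}(S(T)) - yA(0). \tag{4.22} \]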

Proof
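Since x ≥ 0, the gain $\tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T)$ can be expressed as a non-decreasing function of S(T),
\[ \tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T) = f(S(T)), \]
with
\[ f(\zeta) = e^{-rT} (x\zeta + yA(T)) - V_{(x,y)}(0) = e^{-rT} x\zeta + yA(0) - V_{(x,y)}(0), \]
hence (4.22) follows from Lemma 4.15.

Choosing any x ∈ (0, 1) and $y = \frac{(1 - x) S(0)}{A(0)}$ we can see that
\[ V_{(x,y)}(0) = xS(0) + yA(0) = S(0), \]
and, provided $\mathrm{VaR}^{\alpha}(\tilde{G}^{\mathrm{stock}}(T)) > 0$,
\[
\begin{aligned}
\mathrm{VaR}^{\alpha}\left( \tilde{G}^{\mathrm{stock,rf}}_{(x,y)}(T) \right)
 &= V_{(x,y)}(0) - x e^{-rT} q^{\alpha}(S(T)) - yA(0) && \text{(from (4.22))} \\
 &= xS(0) - x e^{-rT} q^{\alpha}(S(T)) && \text{(since $V_{(x,y)}(0) = xS(0) + yA(0)$)} \\
 &= x\, \mathrm{VaR}^{\alpha}(\tilde{G}^{\mathrm{stock}}(T)) && \text{(from (4.21))} \\
 &< \mathrm{VaR}^{\alpha}(\tilde{G}^{\mathrm{stock}}(T)).
\end{aligned}
\]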
This means that diversifying an investment between the stock and the risk-free asset reduces
VaR (which is hardly a surprise!).
Another natural idea to reduce VaR is to buy European put options. By doing so one
can hedge against undesirable scenarios, while leaving oneself open to the positive outcomes.
Assume that at time zero we buy x shares of stock and z put options with a strike
price K. The value of such an investment is
\[ V_{(x,z)}(t) = xS(t) + zH(t), \]
and the discounted gain is
\[ \tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T) = \tilde{V}_{(x,z)}(T) - \tilde{V}_{(x,z)}(0) = e^{-rT} \left( xS(T) + z (K - S(T))^{+} \right) - V_{(x,z)}(0). \]

Lemma 4.18
If 0 < z < x then
\[ \mathrm{VaR}^{\alpha}\left( \tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T) \right) = V_{(x,z)}(0) - e^{-rT} \left( x q^{\alpha}(S(T)) + z \left( K - q^{\alpha}(S(T)) \right)^{+} \right). \tag{4.23} \]

Proof
Since 0 < z < x, we see that $\tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T)$ can be expressed as a non-decreasing function of
S(T),
\[ \tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T) = f(S(T)), \]
with
\[ f(\zeta) = e^{-rT} \left( x\zeta + z (K - \zeta)^{+} \right) - V_{(x,z)}(0). \]
By Lemma 4.15
\[ \mathrm{VaR}^{\alpha}\left( \tilde{G}^{\mathrm{stock,put}}_{(x,z)}(T) \right) = -f(q^{\alpha}(S(T))) = e^{-rT} \left( -x q^{\alpha}(S(T)) - z \left( K - q^{\alpha}(S(T)) \right)^{+} \right) + V_{(x,z)}(0), \]
which combined with (4.20) gives (4.23).

Figure 4.7 $\mathrm{VaR}^{5\%}\big( \tilde{G}^{\mathrm{stock,put}}_{(x,z(K))}(T) \big)$ for different choices of K, for parameters V0 = S(0) = 100,
µ = 0.1, σ = 0.2, r = 0.03, T = 1 and x = 0.99.

Example 4.19
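Assume that we want to invest V0 at time zero and buy x shares of stock. In order to
have $V_{(x,z)}(0) = V_0$ we need to buy
\[ z = z(K) = \frac{V_0 - xS(0)}{P(r, T, K, S(0), \sigma)} \]
put options (this equals $(1 - x) V_0 / P(r, T, K, S(0), \sigma)$ when V0 = S(0), as in Figure 4.7).
Depending on the choice of the strike price K we obtain different values of
\[ \mathrm{VaR}^{\alpha}\left( \tilde{G}^{\mathrm{stock,put}}_{(x,z(K))}(T) \right) = V_0 - e^{-rT} \left( x q^{\alpha}(S(T)) + z(K) \left( K - q^{\alpha}(S(T)) \right)^{+} \right) \]
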

(see Figure 4.7). The choice of a high strike price makes the term (K − qα (S (T )))+ large,
but since options with a high strike price are expensive, their number z(K) is small. On
the other hand, if we choose a low strike price, then we can buy a larger number of
options z(K), but each offers weaker protection (K − qα (S (T )))+ . An optimal choice of
the strike price K lies somewhere between these extremes (see Figure 4.7 and Excel).
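
The trade-off between strike and number of options can be explored with a few lines of code. The sketch below combines (4.16), (4.17), (4.19) and (4.23) using the parameters quoted in the caption of Figure 4.7; the function names and the grid of strikes are illustrative choices, and strikes for which 0 < z < x fails are reported as `None` because Lemma 4.18 does not cover them.

```python
import math
from statistics import NormalDist

N = NormalDist()

def put_price(r, T, K, S0, sigma):
    """Black-Scholes put price, formulas (4.16)-(4.17)."""
    d_plus = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * N.cdf(-d_minus) - S0 * N.cdf(-d_plus)

def stock_quantile(S0, mu, sigma, T, alpha):
    """Upper alpha-quantile of S(T), formula (4.19)."""
    return S0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * N.inv_cdf(alpha))

def var_stock_put(V0, x, K, S0, mu, sigma, r, T, alpha):
    """VaR from (4.23) with z(K) puts bought from the money left after x shares.
    Returns None when 0 < z < x fails, since Lemma 4.18 does not apply then."""
    z = (V0 - x * S0) / put_price(r, T, K, S0, sigma)
    if not 0 < z < x:
        return None
    q = stock_quantile(S0, mu, sigma, T, alpha)
    return V0 - math.exp(-r * T) * (x * q + z * max(K - q, 0.0))

# Parameters quoted in the caption of Figure 4.7.
V0 = S0 = 100.0
mu, sigma, r, T, x, alpha = 0.1, 0.2, 0.03, 1.0, 0.99, 0.05

for K in range(80, 131, 5):                      # illustrative grid of strikes
    print(K, var_stock_put(V0, x, K, S0, mu, sigma, r, T, alpha))

# Unhedged benchmark from (4.21): one share and no puts.
var_stock_only = S0 - math.exp(-r * T) * stock_quantile(S0, mu, sigma, T, alpha)
print(var_stock_only)
```

Very low strikes are excluded here because the puts are so cheap that the budget would buy more options than shares, which violates the assumption of the lemma.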

Usually we do not have full freedom of choice for the strike price of a put option and need
to choose between options which are available on the market. Let us assume that we can invest
in n put options with strike prices K1 , . . . , Kn and maturities T. We denote by Hi (t) the value of

98
a put option with strike price Ki ; in particular

Hi (0) = P(r, T, Ki , S (0), σ),


Hi (T ) = (Ki − S (T ))+ .

Assume that we buy x shares of stock and zi put options with strike price Ki , for i = 1, . . . , n.
Assume also that we buy y units of the risk free asset A. Let z, 1 and H(t) for t = 0, T be vectors
in Rn defined as
 z1   1   H1 (t) 
     

z =  ...  , 1 =  ...  , H(t) =  ...  .
     
     
zn 1 Hn (t)
The value of our investment at time t is

V(x,y,z) (t) = xS (t) + yA(t) + zT H(t).

We show how to compute VaRα for

\[
\tilde G^{\text{stock,rf,puts}}_{(x,y,z)}(T) = \tilde V_{(x,y,z)}(T) - \tilde V_{(x,y,z)}(0).
\]

Proposition 4.20
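If $z_i \ge 0$, for $i = 1, \dots, n$, and $\sum_{i=1}^{n} z_i = z^T \mathbf{1} \le x$, then
\[
\mathrm{VaR}^{\alpha}\!\left(\tilde G^{\text{stock,rf,puts}}_{(x,y,z)}(T)\right) = V_{(x,y,z)}(0) - e^{-rT}\left( x\,q^{\alpha}(S(T)) + yA(T) - z^T q^{\alpha}(-H(T)) \right), \tag{4.24}
\]
where
\[
q^{\alpha}(-H(T)) = -\begin{pmatrix} (K_1 - q^{\alpha}(S(T)))^+ \\ \vdots \\ (K_n - q^{\alpha}(S(T)))^+ \end{pmatrix}. \tag{4.25}
\]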

Proof
The formula (4.25) follows from Lemma 4.16.
Since $z^T \mathbf{1} \le x$, the function
\[
\zeta \longmapsto e^{-rT}\left( x\zeta + yA(T) + \sum_{i=1}^{n} z_i (K_i - \zeta)^+ \right) - V_{(x,y,z)}(0)
\]
is non-decreasing, which by Lemma 4.6 implies that
\[
\mathrm{VaR}^{\alpha}\!\left(\tilde G^{\text{stock,rf,puts}}_{(x,y,z)}(T)\right)
= V_{(x,y,z)}(0) - e^{-rT}\left( x\,q^{\alpha}(S(T)) + yA(T) + \sum_{i=1}^{n} z_i (K_i - q^{\alpha}(S(T)))^+ \right),
\]

which gives (4.24).
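
As an illustration of Proposition 4.20, here is a minimal sketch that evaluates (4.24) by plugging the quantile of S(T) into the discounted gain function used in the proof. It rests on several assumptions that are not stated in this section: put prices are taken to be Black–Scholes prices, the stock is log-normal with drift µ, and the risk-free asset satisfies A(0) = 1 and A(T) = e^{rT}. All function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution


def bs_put(r, T, K, S0, sigma):
    """Black-Scholes put price (assumed form of P(r, T, K, S(0), sigma))."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * N.cdf(-d2) - S0 * N.cdf(-d1)


def stock_quantile(alpha, S0, mu, sigma, T):
    """alpha-quantile of the log-normal stock price S(T)."""
    return S0 * exp((mu - 0.5 * sigma ** 2) * T
                    + sigma * sqrt(T) * N.inv_cdf(alpha))


def discounted_gain(zeta, x, y, z, strikes, v0, r, T, A0=1.0):
    """The function of zeta from the proof: discounted payoff minus V(0)."""
    payoff = x * zeta + y * A0 * exp(r * T)  # assumes A(T) = A(0) e^{rT}
    payoff += sum(zi * max(K - zeta, 0.0) for zi, K in zip(z, strikes))
    return exp(-r * T) * payoff - v0


def var_stock_rf_puts(alpha, x, y, z, strikes, S0, mu, sigma, r, T, A0=1.0):
    """VaR from (4.24); valid only under the hypotheses of Proposition 4.20."""
    if any(zi < 0 for zi in z) or sum(z) > x + 1e-9:
        raise ValueError("need z_i >= 0 and sum(z) <= x")
    v0 = x * S0 + y * A0 + sum(
        zi * bs_put(r, T, K, S0, sigma) for zi, K in zip(z, strikes))
    q = stock_quantile(alpha, S0, mu, sigma, T)
    # The gain is non-decreasing in S(T), so VaR is minus the gain at q.
    return -discounted_gain(q, x, y, z, strikes, v0, r, T, A0)


if __name__ == "__main__":
    # Illustrative position: 9.5 shares hedged with puts struck at 90 and 110.
    print(var_stock_rf_puts(0.05, 9.5, 0.0, [0.0, 6.95, 2.55],
                            [75.0, 90.0, 110.0], 100.0, 0.1, 0.2, 0.03, 1.0))
```

The hypothesis check matters: if the number of puts exceeded the number of shares, the gain would no longer be non-decreasing in S(T) and evaluating it at the quantile would not give VaR.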

From now on we shall assume that x and y are fixed and investigate how to minimize
$\mathrm{VaR}^{\alpha}\!\left(\tilde G^{\text{stock,rf,puts}}_{(x,y,z)}(T)\right)$ by choosing z. We assume that we have $V_0$ at our disposal for investing
and hedging. This means that we spend
\[
c = V_0 - xS(0) - yA(0)
\]
on put options. We assume that we do not take short positions in stock or puts, and that the
number of options does not exceed the number of shares of stock in our portfolio. These
restrictions are imposed by common sense. Later in this chapter we produce an example of
what might happen if these are violated. Under such assumptions, by (4.24), minimizing
$\mathrm{VaR}^{\alpha}\!\left(\tilde G^{\text{stock,rf,puts}}_{(x,y,z)}(T)\right)$ is equivalent to the following problem
\[
\begin{aligned}
\min\ & z^T q^{\alpha}(-H(T)) \\
\text{subject to:}\ & z^T H(0) = c, \\
& z^T \mathbf{1} \le x, \\
& z_1, \dots, z_n \ge 0.
\end{aligned} \tag{4.26}
\]

Since H(0) and qα (−H(T )) are fixed vectors in Rn , (4.26) is a typical linear programming
problem, which can be solved numerically.

Example 4.21
Consider the Black–Scholes model with parameters S (0) = 100, µ = 0.1, σ = 0.2 and
r = 0.03. Assume that we want to invest V0 = 1000 in stock and put options with strike
prices K1 = 75, K2 = 90, K3 = 110 with expiry T = 1. We shall solve the problem
(4.26) for α = 0.05, taking y = 0 and considering c = 0, 10, 30, 50 and 80.
We compute the prices of the put options using (4.16)
\[
H(0) = \begin{pmatrix} 0.406 \\ 2.769 \\ 12.042 \end{pmatrix}.
\]
Using the fact that $N^{-1}(0.05) = -1.645$ we compute
\[
q^{\alpha}(S(T)) = S(0)\,e^{\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\sqrt{T}\,N^{-1}(\alpha)} = 77.96
\]
and
\[
q^{\alpha}(-H(T)) = \begin{pmatrix} 0 \\ -12.04 \\ -32.04 \end{pmatrix}.
\]

The numerical solutions of (4.26) are given in Table 4.1.

See Video for a live Excel computation.

  c      x      z1      z2      z3     VaR^α
  0     10     0.00    0.00    0.00    243.44
 10      9.9   0.00    3.61    0.00    208.81
 30      9.7   0.00    9.36    0.34    146.23
 50      9.5   0.00    6.95    2.55    120.68
 80      9.2   0.00    3.32    5.88     82.35

Table 4.1 $\mathrm{VaR}^{\alpha}$ for various hedging expenditures c from Example 4.21.
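
The rows of Table 4.1 can be recomputed with a few lines of code. The sketch below is not the Excel computation from the notes. It relies on the fact that problem (4.26) has only two constraints besides non-negativity, so an optimal solution can be found among vertices with at most two non-zero entries (counting the slack of $z^T \mathbf{1} \le x$), and it simply enumerates those vertices. A general-purpose LP solver would be the usual choice for larger n. The numerical inputs are the rounded values quoted in Example 4.21. The names `solve_hedge_lp` and `var_alpha` are hypothetical, and x is taken as (V0 − c)/S(0), which matches the x column of the table for y = 0.

```python
from itertools import combinations
from math import exp

# Data quoted in Example 4.21 (alpha = 0.05, y = 0).
V0, S0, R, T = 1000.0, 100.0, 0.03, 1.0
Q_ALPHA = 77.96                    # q^alpha(S(T))
H0 = [0.406, 2.769, 12.042]        # put prices H(0)
Q_NEG_H = [0.0, -12.04, -32.04]    # q^alpha(-H(T))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_hedge_lp(h0, q_neg_h, c, x, tol=1e-12):
    """Minimise z.q_neg_h subject to z.h0 = c, sum(z) <= x, z >= 0.

    Brute-force vertex enumeration, adequate for a handful of strikes.
    """
    n = len(h0)
    candidates = []
    # Vertices where sum(z) <= x has slack: one option absorbs the budget c.
    for i in range(n):
        zi = c / h0[i]
        if -tol <= zi <= x + tol:
            z = [0.0] * n
            z[i] = max(zi, 0.0)
            candidates.append(z)
    # Vertices where sum(z) = x binds: two options solve a 2x2 linear system.
    for i, j in combinations(range(n), 2):
        det = h0[j] - h0[i]
        if abs(det) < tol:
            continue
        zj = (c - x * h0[i]) / det
        zi = x - zj
        if zi >= -tol and zj >= -tol:
            z = [0.0] * n
            z[i], z[j] = max(zi, 0.0), max(zj, 0.0)
            candidates.append(z)
    if not candidates:
        raise ValueError("problem (4.26) is infeasible for this c and x")
    return min(candidates, key=lambda z: dot(z, q_neg_h))


def var_alpha(z, x, y_a_T=0.0):
    """Formula (4.24) with V(0) = V0; y_a_T stands for y * A(T)."""
    return V0 - exp(-R * T) * (x * Q_ALPHA + y_a_T - dot(z, Q_NEG_H))


if __name__ == "__main__":
    for c in (0.0, 10.0, 30.0, 50.0, 80.0):
        x = (V0 - c) / S0  # y = 0: whatever is not spent on puts buys stock
        z = solve_hedge_lp(H0, Q_NEG_H, c, x)
        print(c, x, [round(v, 2) for v in z], round(var_alpha(z, x), 2))
```

Because the inputs are rounded to the precision shown in the example, the last digit of the output may differ slightly from the table.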

Evidently it does not make sense to buy put options with strike prices below $q^{\alpha}(S(T))$:
by (4.25) their term $(K_i - q^{\alpha}(S(T)))^+$ vanishes, so they use up budget without reducing VaR.
Looking at Table 4.1 we can see that when c is small we buy options which are cheaper. When
c is large we can afford to spend money on options with a higher strike price, which offer
better protection. A full picture is obtained when we look not only at VaR, but at the
distribution of the gain X in Figure 4.8.

Figure 4.8 The gain $X = \tilde G^{\text{stock,rf,puts}}_{(x,y,z)}(T)$ from Example 4.21 for various levels of c (left), and its
distribution function (right).

In the formulation of (4.26) we have added constraints that we do not take short positions
in puts, and that we do not buy more puts than stocks. We finish the section by demonstrating
that exercising such common sense is often necessary when dealing with VaR.

Example 4.22
Consider the data from Example 4.21. Suppose that we want to invest V0 = 1000 and
decide to buy x = 20 shares of stock and hedge them with z3 = 20 put options with
strike price $K_3$. Clearly $V_0$ does not provide enough funds to enter such a position.
We decide to finance our strategy by taking a short position in put options with strike
price $K_1$:
\[
z_1 = \frac{1}{H_1(0)}\left( V_0 - xS(0) - z_3 H_3(0) \right) = -3056.
\]
Clearly our strategy is not a good idea. Common sense dictates that the short position in
unhedged puts will be catastrophic if S (T ) < K1 . Since the probability of this is small,

P(S (T ) < K1 ) < P(S (T ) ≤ qα (S (T ))) = α,

such a scenario is ignored in the computation of VaR and we obtain (see Figure 4.9)

\[
\mathrm{VaR}^{\alpha}\!\left(\tilde G^{\text{stock,rf,puts}}_{(x,0,z)}(T)\right) = -1135,
\]

which can lull us into a false sense of security, again illustrating the most serious short-
coming of VaR as a risk measure.
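
The numbers in Example 4.22 can be reproduced with a short script. This is a sketch under stated assumptions: it uses the rounded put prices quoted in Example 4.21, the log-normal quantile formula for S(T), and a hypothetical helper `gain` for the discounted gain of the position. It also evaluates the gain at a stressed price S(T) = 60, a value chosen here purely for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal distribution

S0, MU, SIGMA, R, T, ALPHA = 100.0, 0.1, 0.2, 0.03, 1.0, 0.05
K1, K3 = 75.0, 110.0
H1_0, H3_0 = 0.406, 12.042        # put prices quoted in Example 4.21
V0, X, Z3 = 1000.0, 20.0, 20.0    # budget, shares, long puts struck at K3

# Short position in K1 puts that finances the rest of the strategy.
Z1 = (V0 - X * S0 - Z3 * H3_0) / H1_0


def gain(s_T):
    """Discounted gain of the position if the stock ends at s_T (y = 0)."""
    payoff = (X * s_T
              + Z1 * max(K1 - s_T, 0.0)
              + Z3 * max(K3 - s_T, 0.0))
    return exp(-R * T) * payoff - V0


# alpha-quantile of S(T) in the Black-Scholes model.
q = S0 * exp((MU - 0.5 * SIGMA ** 2) * T + SIGMA * sqrt(T) * N.inv_cdf(ALPHA))

# For these particular positions the gain is non-decreasing in S(T)
# (the short puts only hurt below K1), so VaR is minus the gain at q.
var = -gain(q)

# Probability that the short puts end in the money, P(S(T) < K1).
p_below_k1 = N.cdf((log(K1 / S0) - (MU - 0.5 * SIGMA ** 2) * T)
                   / (SIGMA * sqrt(T)))

if __name__ == "__main__":
    print("z1 =", round(Z1))
    print("VaR =", round(var))
    print("P(S(T) < K1) =", round(p_below_k1, 4))
    print("gain at S(T) = 60:", round(gain(60.0)))
```

The tail probability is below α, which is why VaR does not register the loss, while the stressed gain shows how large that loss can be.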

In Figure 4.10 we see that the strategy from Example 4.22 can suffer huge losses if stock
prices move against us. In Figure 4.9 we can see that with probability of 2% we will suffer a
severe loss. The conclusion is that looking at the loss at a given confidence level is not enough;
we also need to look at the expected loss.

Figure 4.9 The cumulative distribution function for the gain $X = \tilde G^{\text{stock,rf,puts}}_{(x,0,z)}(T)$ for the
strategy from Example 4.22. The full picture on the left, and a closeup on the right. Both panels carry the tick label 1135.

Figure 4.10 The gain $X = \tilde G^{\text{stock,rf,puts}}_{(x,0,z)}(T)$ for the strategy from Example 4.22.
